Thread: finding eventid in quakeml from fdsn-event ws

Started: 2017-03-01 19:45:17

Last activity: 2017-03-06 23:52:55

Topics: FDSN Working Group III

Philip Crotwell

finding eventid in quakeml from fdsn-event ws

2017-03-01 19:45:17

Hi all

A common access pattern is to first do an initial wide but shallow
query, and then return to do a deep but narrow query. For example
asking an fdsn-station ws for stations in a box, displaying those on a
map, and then only going back to ask for channels or response for
stations as the user clicks on them. Another example would be to query
an fdsn-event web service for earthquakes, and then return using
things like includeallorigins=true and includearrivals=true to get
more detailed information for a specific earthquake. I feel that the
combination of the fdsn-event ws query parameters and the quakeml xml
specification makes this harder than it should be: while fdsn-event
has a query based on eventid, there is no standard way to put the
eventid into the original quakeml.
QuakeML has a publicID for each event, but the structure of this is
complicated enough that it is challenging to parse in a way that
reliably extracts the value that should be returned to the service as
eventid.
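
As a rough illustration of the wide-then-narrow pattern described above, the two requests could be built as follows. This is only a sketch: the base URL is taken from the IRIS publicID quoted later in this message, `minmagnitude` and `starttime` are fdsnws-event query parameters not mentioned in this thread, and the helper names are made up for the example.

```python
from urllib.parse import urlencode

# Assumption: base URL taken from the IRIS publicID quoted in this thread.
BASE_URL = "http://service.iris.edu/fdsnws/event/1/query"


def shallow_query(base_url, **constraints):
    """Wide but shallow: many events, default level of detail."""
    return base_url + "?" + urlencode(constraints)


def deep_query(base_url, eventid):
    """Deep but narrow: one event, with all origins and arrivals."""
    params = {
        "eventid": eventid,
        "includeallorigins": "true",
        "includearrivals": "true",
    }
    return base_url + "?" + urlencode(params)


print(shallow_query(BASE_URL, minmagnitude=6, starttime="2017-01-01"))
print(deep_query(BASE_URL, "3337497"))
```

The open question in this thread is where the value passed to `deep_query` comes from once the shallow response has been parsed.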

I have collected example <event> elements from all of the fdsn-event
web services currently listed on
http://www.fdsn.org/webservices/datacenters/
and as you can see there is quite a variety of ways of including the
eventid, in publicID and elsewhere, which makes it harder for clients
as they have to have code that says if (host == USGS) { do this; }
else if (host == ETHZ) { do that; }
which is fragile and hard to maintain.

While this is probably not a big enough issue to justify a revision
of the web services spec, it would be really nice if, until the next
revision, there could be a consensus on how to provide the eventid.
When the next revision is created, this could then be made mandatory
and standardized.

I think I would prefer something simple like the USGS, NCEDC and SCEDC
style, where a simple attribute gives the eventid exactly, without
parsing, like catalog:eventid="71377596". The drawback, of course, is
that this is currently a separate schema definition from the quakeml
standard. Using the publicID would be better in that it is already
part of the quakeml spec, but the format of the URI is too varied and
complicated at present for easy parsing.

Another solution would be to allow the entire publicID to be returned
via the eventid parameter. This would require the server to be able to
parse its own style of publicID, which seems reasonable. However the
structure of the publicID may also cause problems as it looks like a
URL and so would require escaping/encoding of certain characters. Yet
another solution would be to use text format for the wide but shallow
query and then use quakeml for the deep, but this has the downside of
requiring the client to parse two unrelated data formats.
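
To make the escaping concern concrete, here is a minimal sketch of what passing a whole publicID as a query value would involve. It assumes the client percent-encodes every reserved character; whether a given server would accept such a value is exactly what is undecided. The helper name is made up for the example.

```python
from urllib.parse import quote, unquote


def encode_public_id(public_id):
    """Percent-encode a publicID so it can travel as a single query value."""
    # safe="" also encodes "/" and ":", which quote() would otherwise
    # leave alone or treat as path characters.
    return quote(public_id, safe="")


for public_id in (
    "smi:ch.ethz.sed/sc3a/2017eemfch",
    "quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=usc000lvb5&format=quakeml",
):
    encoded = encode_public_id(public_id)
    print(encoded)
    # Decoding restores the original identifier unchanged.
    assert unquote(encoded) == public_id
```

Without the encoding, the `?`, `=` and `&` inside the USGS-style publicID would be read as part of the outer query string.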

thanks
Philip

IRIS
<event publicID="smi:service.iris.edu/fdsnws/event/1/query?eventid=3337497">

NCEDC
<event publicID="quakeml:nc.anss.org/Event/NC/71377596"
catalog:datasource="nc" catalog:dataid="nc71377596"
catalog:eventsource="nc" catalog:eventid="71377596">

SCEDC
<event publicID="quakeml:service.scedc.caltech.edu/fdsnws/event/1/query?eventid=37300872"
catalog:datasource="ci" catalog:dataid="ci37300872"
catalog:eventsource="ci" catalog:eventid="37300872">

USGS
<event catalog:datasource="us" catalog:eventsource="us"
catalog:eventid="c000lvb5"
publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=usc000lvb5&format=quakeml">

ETHZ
<event publicID="smi:ch.ethz.sed/sc3a/2017eemfch">

INGV
<event publicID="smi:webservices.ingv.it/fdsnws/event/1/query?eventId=863301">

ISC
<event publicID="smi:ISC/evid=600516598">
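
For the services that provide it, reading the catalog:eventid attribute needs no string parsing at all. The sketch below assumes the `catalog` prefix is bound to `http://anss.org/xmlns/catalog/0.1` and the events live in the QuakeML BED 1.2 namespace; both URIs are assumptions and should be checked against a real response. The sample document is assembled from the examples above.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against the actual service output.
BED_NS = "http://quakeml.org/xmlns/bed/1.2"
CATALOG_NS = "http://anss.org/xmlns/catalog/0.1"

SAMPLE = f"""<eventParameters xmlns="{BED_NS}" xmlns:catalog="{CATALOG_NS}">
  <event publicID="quakeml:nc.anss.org/Event/NC/71377596"
         catalog:datasource="nc" catalog:dataid="nc71377596"
         catalog:eventsource="nc" catalog:eventid="71377596"/>
  <event publicID="smi:ch.ethz.sed/sc3a/2017eemfch"/>
</eventParameters>"""


def catalog_event_ids(xml_text):
    """Return (publicID, catalog:eventid or None) for every event element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for event in root.iter(f"{{{BED_NS}}}event"):
        # Namespaced attributes are looked up as "{namespace-uri}name".
        eventid = event.get(f"{{{CATALOG_NS}}}eventid")
        pairs.append((event.get("publicID"), eventid))
    return pairs


for public_id, eventid in catalog_event_ids(SAMPLE):
    print(public_id, eventid)
```

The ETHZ-style event yields `None`, which is the case a generic client still cannot handle.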

View this message in Google Groups at https://groups.google.com/a/fdsn.org/d/msgid/fdsn-wg3-products/CAGFrVcXR18c2m7f3KYjUhy6ZvOL5OvcSUhFKc7BLP8A9Z4TOKA%40mail.gmail.com.

Fabian Euchner

Re: finding eventid in quakeml from fdsn-event ws

2017-03-03 23:49:07

Hi Philip, hi all,

in the QuakeML world, entities (events, origins, picks) are identified through the publicID,
and *only* through the publicID. The publicID has been designed in a way that makes it
easy to be globally unique (authority part, then resource part that is in the hands of the
issuing agency, which ensures uniqueness). The "legacy" event IDs that are used in some
earthquake catalogs (often just integer numbers) cannot be unique; collisions are likely to
occur. When designing QuakeML there was a long discussion whether legacy IDs should
be part of the data model, and there was a consensus that they shouldn't, first because
their usage should be discouraged (non-uniqueness, not being future-proof, etc.), and
second because there was no semantically convincing place in the schema to put them.

The fdsnws-event standard says about the eventid query parameter: "event
identifiers are data center specific". It seems that most implementations expect the legacy
ID, not the publicID of the event (in fact, in your examples, this holds for all data centers
except for ETH). Thanks for pointing this out! I was totally unaware of the fact.

In my opinion this is a serious specification and implementation flaw. In the next version
of the event service spec it should be mandatory that eventid is the publicID of the event.
All services should be queried in the same way, but, e.g., for ETH it is not possible, because
there are no legacy IDs. In the current situation, the user has to know which service
requires legacy ID, and which service requires publicID. In addition, the legacy ID is not
per se contained in the returned QuakeML document, as it is not contained in the
standard. This makes it hard to find the legacy ID if it exists at all (can be hidden in the
publicID, or in an extension attribute which depends on the data center).

Furthermore, QuakeML publicIDs are designed to be opaque. They may be compiled from
other pieces of information, like timestamps, legacy IDs, etc., but they need not, they can
just be random strings (the resource part). Therefore, no user or service should rely on
parsing publicIDs.

Thanks again, Philip, for bringing up this important issue.

Best regards,
Fabian

View this message in Google Groups at https://groups.google.com/a/fdsn.org/d/msgid/fdsn-wg3-products/106346821.64utXHXUdN%40desdemona.ethz.ch.
- Jeremy Fee
  
  Re: finding eventid in quakeml from fdsn-event ws
  
  2017-03-03 20:51:17
  
  Hello,
  
  in the QuakeML world, entities (events, origins, picks) are identified
  through the publicID, and *only* through the publicID. The publicID has
  been designed in a way that makes it easy to be globally unique (authority
  part, then resource part that is in the hands of the issuing agency, which
  ensures uniqueness). The "legacy" event IDs that are used in some
  earthquake catalogs (often just integer numbers) cannot be unique;
  collisions are likely to occur. When designing QuakeML there was a long
  discussion whether legacy IDs should be part of the data model, and there
  was a consensus that they shouldn't, first because their usage should be
  discouraged (non-uniqueness, not being future-proof, etc.), and there was
  also no semantically convincing place in the schema to put them.
  
  USGS handled this by defining an eventsource (typically FDSN network code)
  as a "namespace" for eventids, which eliminates collisions and allows
  contributors to continue using the existing IDs for events without
  requiring yet another eventid system. We commented on this early in the
  Quakeml 1.2 process (see previous email to the quakeml mailing list below)
  and implemented a custom extension to Quakeml to support our requirements
  (https://github.com/usgs/eqmessageutils/blob/master/etc/quakeml_1.2/AnssCatalog-0.1.xsd)
  while remaining compatible with the original specification.
  
  USGS requirements may differ from other organizations, because we aggregate
  multiple earthquake catalogs from many contributors into a single
  "composite" catalog. We consider an event to have multiple IDs, one unique
  id from each contributor (USGS included), and allow events to be referenced
  using any of those IDs. Messages from multiple contributors are associated
  based on location in space and time, and automatic associations can be
  manually overridden when needed. This balances the requirements for
  individual organizations to a) assign a unique identifier and maintain a
  catalog of events and b) operate independently of any central authority.
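
The idea of using eventsource as a namespace, with one event reachable under several contributor IDs, can be pictured with a small lookup table. This is a toy sketch and not the USGS implementation; the class name, the contributor codes "aa" and "bb", and the IDs are all made up for illustration.

```python
class CompositeCatalog:
    """Toy index in which one event may be known under several contributor IDs."""

    def __init__(self):
        self._events_by_id = {}

    def add_event(self, event, ids):
        """Register an event under every (eventsource, eventid) pair given."""
        for eventsource, eventid in ids:
            key = (eventsource.lower(), eventid)
            existing = self._events_by_id.get(key)
            if existing is not None and existing is not event:
                raise ValueError(f"ID {key} already refers to another event")
            self._events_by_id[key] = event

    def find(self, eventsource, eventid):
        """Look an event up by any one of its contributor IDs."""
        return self._events_by_id.get((eventsource.lower(), eventid))


catalog = CompositeCatalog()
event_a = {"description": "hypothetical event A"}
event_b = {"description": "hypothetical event B"}

# The same legacy number under two contributors does not collide,
# because the eventsource is part of the key.
catalog.add_event(event_a, [("aa", "1001"), ("bb", "77")])
catalog.add_event(event_b, [("bb", "1001")])

print(catalog.find("aa", "1001") is event_a)
print(catalog.find("bb", "1001") is event_b)
```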
  
  The fdsnws-event standard says when it comes to the eventid query
  parameter: "event identifiers are data center specific". It seems that most
  implementations expect the legacy ID, not the publicID of the event (in
  fact, in your examples, this holds for all data centers except for ETH).
  Thanks for pointing this out! I was totally unaware of the fact.

  In my opinion this is a serious specification and implementation flaw. In
  the next version of the event service spec it should be mandatory that
  eventid is the publicID of the event. All services should be queried in the
  same way, but, e.g., for ETH it is not possible, because there are no
  legacy IDs. In the current situation, the user has to know which service
  requires legacy ID, and which service requires publicID. In addition, the
  legacy ID is not per se contained in the returned QuakeML document, as it
  is not contained in the standard. This makes it hard to find the legacy ID
  if it exists at all (can be hidden in the publicID, or in an extension
  attribute which depends on the data center).

  Furthermore, QuakeML publicIDs are designed to be opaque. They may be
  compiled from other pieces of information, like timestamps, legacy IDs,
  etc., but they need not, they can just be random strings (the resource
  part). Therefore, no user or service should rely on parsing publicIDs.
  
  If you want to add support to query using public IDs, I recommend
  defining a new "publicID" parameter for the fdsn service and leaving the
  existing eventid parameter unchanged for backward compatibility. An
  additional consideration is how a service should handle multiple versions
  of the same event element (assuming ordering based on
  event/creationInfo/creationTime). It may be simpler for an explicit
  "detailURL" or similar attribute to be added to the event element to
  support the explicit use case to obtain more information.
  
  We introduced our custom extension to support these requirements of being
  able to uniquely identify events, and individual pieces of information
  being contributed to those events, because of the suggestion (see below)
  that there may be aliases for publicIDs and they would not be guaranteed to
  be universally unique. I suggest that, rather than adding to or changing the
  meaning of an otherwise opaque and non-unique identifier, explicit attributes
  or elements be created for these purposes (or that the AnssCatalog
  extension be more widely adopted).
  
  Thanks,
  
  Jeremy
  
  Previous email to quakeml mailing list (couldn't find the list archives
  online):

  From: Fabian Euchner <fabian.euchner<at>sed.ethz.ch>
  Date: March 23, 2011 9:35:11 AM MDT
  To: Jeremy M Fee <jmfee<at>usgs.gov>
  Cc: <QuakeML<at>intensity.usc.edu>, <kfelzer<at>gps.caltech.edu>, Michelle Guy <mguy<at>usgs.gov>
  Subject: Re: Fwd: [QuakeML] question about authority-id and resource-id in public identifiers

  Hi Jeremy et al.,

  the section on resource identifiers in the standard doc is based on some, up
  to now rather theoretical, thoughts on how a resource metadata framework could
  look like. This is very much inspired (I could also say borrowed) from how
  this is handled in the Astrophysical Virtual Observatory community ;-) To be
  honest, I don't know how agencies that already use QuakeML interpret &
  implement it and you are of course right that it's not specified in detail in
  the standard doc. Since it is a standard doc on the markup language, I think
  it's not the right place to specify it, there should be a second document that
  is more focused on infrastructure. *I think it's pretty clear that an
  identifier cannot refer to two different resources, but I could imagine that a
  resource can be referenced by more than one identifier from the same authority
  (aliases).* Thanks for starting this discussion, it's an important point if
  QuakeML starts to play a more important role in our networked infrastructures.

  Cheers,
  Fabian

  On Fri March 18 2011 20:55:50 Jeremy M Fee wrote:

  Hi Fabian,

  I received a reply from Karen at CalTech, but I'd like to know if
  these assumptions are also safe across all QuakeML implementations:

  1) An authority always refers to the same resource using the same
     resourceID.
  2) When an authority updates a resource, the same resourceID is used.

  And as a result of the previous two assumptions:

  3) When one authority submits two different event resourceIDs, they
     refer to different events.

  I've read the QuakeML-BED.pdf for version 1.1, and cannot find
  anything imposing this restriction. At USGS we rely on this to 1)
  track updates to existing events, and 2) distinguish events that are
  so close in space and time they would otherwise be considered the
  same event.

  Thanks,
  Jeremy

  Begin forwarded message:

  From: Karen Felzer <kfelzer<at>gps.caltech.edu>
  Date: March 17, 2011 4:23:32 PM MDT
  To: Jeremy M Fee <jmfee<at>usgs.gov>
  Cc: quakeml<at>intensity.usc.edu
  Subject: Re: [QuakeML] question about authority-id and resource-id in public identifiers

  Yes -- information for the same earthquake should always be reported
  under the same earthquake ID number.

  regards,
  Karen Felzer

  On Mar 17, 2011, at 2:22 PM, Jeremy M Fee wrote:

  Hi,

  Is it a safe assumption that an authority will always refer to the
  same resource using the same resource id? Meaning, if an authority
  submits an event under one resource id, they will always reuse that
  same resource id when updating event information (and identify
  version information separately)? This would make it much easier to
  recognize updates, versus new information.

  Thanks,
  Jeremy
  
  View this message in Google Groups at https://groups.google.com/a/fdsn.org/d/msgid/fdsn-wg3-products/CAHdFqiZ4HLMeFGy6NRCjL9q9YwrzerchkQsYcucYET54K3MduA%40mail.gmail.com.
  - Fabian Euchner
    
    Re: finding eventid in quakeml from fdsn-event ws
    
    2017-03-04 05:47:06
    
    Hello Jeremy, hello all,
    
    first, let me apologize if somebody found my comment too harsh or offensive. That was
    absolutely not my intention.

    Since the fdsnws-event default output format is QuakeML, I assume that the QuakeML
    data model is the common minimum standard data model. Since legacy IDs are not
    contained therein, I think using them as a query parameter should not be the common
    standard way to query individual event information. Therefore, I would suggest that the
    next iteration of the event service specification define a new query parameter, maybe
    called eventpublicid, that is implemented by all data centers to query on event publicIDs,
    which are mandatory in all result QuakeML documents. If some data centers want to
    additionally provide a query parameter for legacy IDs, that's fine with me. Every user
    querying based on this has to know what she/he is doing, and how to deal with the results.
    
    All the best,
    Fabian
    
    Hello,
    
    in the QuakeML world, entities (events, origins, picks) are identified
    
    through the publicID, and *only*" through the publicID. The publicID has
    been designed in a way that makes it easy to be globally unique (authority
    part, then resource part that is in the hands of the issueing agency which
    ensures uniqueness). The "legacy" event IDs that are used in some
    earthquake catalogs (often just integer numbers) cannot be unique,
    collisions are likely to occur. When designing QuakeML there was a long
    discussion whether legacy IDs should be part of the data model, and there
    was a consensus that they shouldn't, first because their usage should be
    discouraged (non-uniqueness, not being future-proof, etc), and there was
    also no semantically convincing place in the schema to put them.
    
    USGS handled this by defining an eventsource (typically FDSN network code)
    as a "namespace" for eventids, which eliminates collisions and allowing
    contributors to continue using the existing IDs for events without
    requiring yet another eventid system. We commented on this early in the
    Quakeml 1.2 process (see previous email to the quakeml mailing list below)
    and implemented a custom extension to Quakeml to support our requirements (
    https://github.com/usgs/eqmessageutils/blob/master/etc/quakeml_1.2/AnssCatal
    og-0.1.xsd ) while remaining compatible with the original specification.
    
    USGS requirements may differ from other organizations, because we aggregate
    multiple earthquake catalogs from many contributors into a single
    "composite" catalog. We consider an event to have multiple IDs, one unique
    id from each contributor (USGS included), and allow events to be referenced
    using any of those IDs. Messages from multiple contributors are associated
    based on location in space and time, and automatic associations can be
    manually overridden when needed. This balances the requirements for a)
    individual organizations to assign a unique identifier and maintain a
    catalog of events and b) operate independently of any central authority.
    
    The fdsnws-event standard says when it comes to the eventid query
    
    parameter: "event identifiers are data center specific". It seems that
    most
    implementations expect the legacy ID, not the publicID of the event (in
    fact, in your examples, this holds for all data centers expect for ETH).
    Thanks for pointing this out! I was totally unaware of the fact.
    
    In my opinion this is a serious specification and implementation flaw. In
    
    the next version of the event service spec it should be mandatory that
    eventid is the publicID of the event. All services should be queried in
    the
    same way, but, e.g., for ETH it is not possible, because there are no
    legacy IDs. In the current situation, the user has to know which service
    requires legacy ID, and which service requires publicID. In addition, the
    legacy ID is not per se contained in the returned QuakeML document, as it
    is not contained in the standard. This makes it hard to find the legacy ID
    if it exists at all (can be hidden in the publicID, or in an extension
    attribute which depends on the data center).
    
    Furthermore, QuakeML publicIDs are designed to be opaque. They may be
    
    compiled from other pieces of information, like timestamps, legacy IDs,
    etc., but they need not, they can just be random strings (the resource
    part). Therefore, no user or service should rely on parsing publicIDs.
    
    If you want to add support to query using public IDs, I recommend
    definition of a new "publicID" parameter for the fdsn service and leave the
    existing eventid parameter unchanged for backward compatibility. An
    additional consideration is how a service should handle multiple versions
    of the same event element (assuming ordering based on
    event/creationInfo/creationTime). It may be simpler for an explicit
    "detailURL" or similar attribute to be added to the event element to
    support the explicit use case to obtain more information.
    
    We introduced our custom extension to support these requirements of being
    able to uniquely identify events, and individual pieces of information
    being contributed to those events, because of the suggestion (see below)
    that there may be aliases for publicIDs and they would not be guaranteed to
    be universally unique. I suggest that rather than adding/changing meaning
    of an otherwise opaque and non-unique identifier, that explicit attributes
    or elements be created for these purposes (or that the AnssCatalog
    extension be more widely adopted).
    
    Thanks,
    
    Jeremy
    
    Previous email to quakeml mailing list (couldn't find the list archives
    online):
    
    From: Fabian Euchner <fabian.euchner<at>sed.ethz.ch>
    
    Date: March 23, 2011 9:35:11 AM MDT
    
    View this message in Google Groups at https://groups.google.com/a/fdsn.org/d/msgid/fdsn-wg3-products/3475408.OiZpG6Se05%40desdemona.ethz.ch.
    
    Philip Crotwell
    
    Re: finding eventid in quakeml from fdsn-event ws
    
    2017-03-06 23:52:55
    
    Hi
    
    The main question I have as a client writer is how do I get from a
    general fdsn event query, with many events, to a detailed query for a
    single event without server-specific code. As best I can figure out,
    there is no simple answer now.
    
    The best I can come up with is this algorithm. I presume everyone
    agrees this is needlessly complicated, and as there is no good
    default action, it cannot handle a new fdsn event web service
    without rewriting the code.
    
    1) if (IRIS or INGV):
       use publicID as full URL after replacing "smi:" with "http://"
    2) if (USGS or SCEDC):
       use publicID as full URL after replacing "quakeml:" with "http://"
    3) if (NCEDC):
       use catalog:eventid as eventid parameter
    4) if (ETHZ):
       use entire publicID (including smi:) as eventid parameter
    5) if (ISC):
       parse publicID as a URL and use the value of the evid parameter as
       eventid parameter
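
For illustration, the five cases above could be sketched in Python roughly as follows. The function name `detail_request`, the host labels, and the return convention are assumptions made for this sketch, not part of any FDSN specification. Because the ISC example publicID (smi:ISC/evid=600516598) carries evid in the path rather than in a query string, the sketch uses a regular expression instead of a URL parser.

```python
import re


def detail_request(host, public_id, catalog_eventid=None):
    """Map one QuakeML <event> to a follow-up request (hypothetical helper).

    Returns ("url", full_url) when the publicID itself becomes the URL,
    or ("eventid", value) when a value goes into the eventid parameter.
    """
    if host in ("IRIS", "INGV"):
        return ("url", public_id.replace("smi:", "http://", 1))
    if host in ("USGS", "SCEDC"):
        return ("url", public_id.replace("quakeml:", "http://", 1))
    if host == "NCEDC":
        if catalog_eventid is None:
            raise ValueError("NCEDC needs the catalog:eventid attribute")
        return ("eventid", catalog_eventid)
    if host == "ETHZ":
        # The entire publicID, including the "smi:" prefix, is the eventid.
        return ("eventid", public_id)
    if host == "ISC":
        match = re.search(r"evid=([^&]+)", public_id)
        if match is None:
            raise ValueError("no evid in publicID: " + public_id)
        return ("eventid", match.group(1))
    # No sensible default exists, which is exactly the maintenance problem.
    raise ValueError("unknown data center: " + host)
```

The final `raise` is the point of the complaint: a data center that is not in the list cannot be handled without editing the client.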
    
    One further note is that although the USGS, SCEDC and NCEDC appear to
    use the same anss "catalog" quakeml extension, they interpret the
    fdsnevent eventid parameter differently. The USGS requires the
    concatenation of catalog:eventsource and catalog:eventid while NCEDC
    and SCEDC both accept only catalog:eventid as the eventid parameter.
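
As a rough sketch of that difference, the snippet below reads the two catalog attributes with Python's standard ElementTree and builds either form of eventid. The namespace URI bound to the catalog prefix and the helper name `anss_eventid` are assumptions for this example; check them against the actual service response.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the ANSS "catalog" extension prefix.
CATALOG_NS = "http://anss.org/xmlns/catalog/0.1"


def anss_eventid(event, include_source):
    """Build an eventid from the catalog:* attributes (hypothetical helper).

    include_source=True  -> eventsource + eventid (what USGS expects)
    include_source=False -> eventid only (what NCEDC and SCEDC expect)
    """
    source = event.get("{%s}eventsource" % CATALOG_NS)
    eventid = event.get("{%s}eventid" % CATALOG_NS)
    if eventid is None:
        return None
    if include_source:
        return (source or "") + eventid
    return eventid


# The USGS example event from this thread, made well-formed for parsing.
SAMPLE = (
    '<event xmlns="http://quakeml.org/xmlns/bed/1.2" '
    'xmlns:catalog="http://anss.org/xmlns/catalog/0.1" '
    'catalog:datasource="us" catalog:eventsource="us" '
    'catalog:eventid="c000lvb5" '
    'publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?'
    'eventid=usc000lvb5&amp;format=quakeml"/>'
)
event = ET.fromstring(SAMPLE)
usgs_style = anss_eventid(event, include_source=True)
ncedc_style = anss_eventid(event, include_source=False)
```

Here `usgs_style` matches the eventid embedded in the USGS publicID, while `ncedc_style` is the bare catalog:eventid.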
    
    What a client needs is to be able to use a single value from a quakeml
    event as the eventid. In theory the publicID appears to be meant as
    that value, but in practice it only works for one of the seven
    services (ETHZ). And the publicID as it is currently
    specified is not friendly to being used as a URL parameter as it is
    possible (and very common) to have it include the '&' character.
    Without escaping that char, the resulting URL will be wrong. IMHO, a
    friendly eventid value really should not require processing in order
    to be added to a URL.
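
To make the escaping problem concrete, here is a small sketch using Python's standard urllib.parse. The base URL and the helper name `event_query_url` are placeholders for this example; whether a given service accepts a full publicID as eventid is a separate question.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder base URL, not a real service.
BASE = "http://service.example.org/fdsnws/event/1/query"


def event_query_url(base_url, eventid, **extra):
    """Build a query URL with the eventid percent-encoded (hypothetical helper)."""
    params = {"eventid": eventid}
    params.update(extra)
    return base_url + "?" + urlencode(params)


public_id = ("quakeml:earthquake.usgs.gov/fdsnws/event/1/query?"
             "eventid=usc000lvb5&format=quakeml")

# Escaped: the whole publicID survives as one parameter value.
escaped = event_query_url(BASE, public_id, includeallorigins="true")
escaped_params = parse_qs(urlsplit(escaped).query)

# Unescaped: the '&' splits the publicID and leaks a stray "format" parameter.
naive = BASE + "?eventid=" + public_id
naive_params = parse_qs(urlsplit(naive).query)
```

With escaping, `escaped_params["eventid"]` round-trips to the original publicID. Without it, the server sees a truncated eventid plus a `format` parameter the client never intended to send.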
    
    There are two questions, I feel. First, can there be a recommendation
    as to what a current fdsn event web service should accept as
    the eventid parameter? Second, if there is a revision of the spec,
    what should we change to make this easier?
    
    Absent a more specific publicID format, I don't see a good option that
    doesn't require almost everyone to make server changes. Perhaps
    accepting the full publicID as the eventid, in addition to whatever
    the current implementation accepts, is the least bad?
    
    As to the longer term, perhaps adding a "publicid=" parameter to the
    fdsn event query is the clearest and most direct solution. But I still
    feel that existing publicIDs are too verbose and unfriendly for use in
    URLs. Perhaps some of this could be addressed in quakeml 2.0, both
    by making the structure of the publicID cleaner or simpler, and by an
    explicit mapping from quakeml event parameters to a url?
    
    thanks
    Philip
    
    On Fri, Mar 3, 2017 at 3:48 PM, Fabian Euchner
    <fabian.euchner<at>sed.ethz.ch> wrote:
    
    Hello Jeremy, hello all,
    
    first, let me apologize if somebody found my comment too harsh or offensive.
    That was absolutely not my intention.
    
    Since the fdsnws-event default output format is QuakeML, I assume that the
    QuakeML data model is the common minimum standard data model. Since legacy
    IDs are not contained therein, I think using them as a query parameter
    should not be the common standard way to query individual event information.
    Therefore, I would suggest that a next iteration of the event service
    specification defines a new query parameter, maybe called eventpublicid,
    that is implemented by all data centers to query on event publicIDs, which
    are mandatory in all result QuakeML documents. If some data centers want to
    additionally provide a query parameter for legacy IDs, that's fine for me.
    Every user querying based on this has to know what she/he does, and how to
    deal with results.
    
    All the best,
    
    Fabian
    
    Hello,

    in the QuakeML world, entities (events, origins, picks) are identified
    through the publicID, and *only* through the publicID. The publicID has
    been designed in a way that makes it easy to be globally unique (authority
    part, then resource part that is in the hands of the issuing agency which
    ensures uniqueness). The "legacy" event IDs that are used in some
    earthquake catalogs (often just integer numbers) cannot be unique,
    collisions are likely to occur. When designing QuakeML there was a long
    discussion whether legacy IDs should be part of the data model, and there
    was a consensus that they shouldn't, first because their usage should be
    discouraged (non-uniqueness, not being future-proof, etc), and there was
    also no semantically convincing place in the schema to put them.

    USGS handled this by defining an eventsource (typically FDSN network code)
    as a "namespace" for eventids, which eliminates collisions and allows
    contributors to continue using the existing IDs for events without
    requiring yet another eventid system. We commented on this early in the
    Quakeml 1.2 process (see previous email to the quakeml mailing list below)
    and implemented a custom extension to Quakeml to support our requirements
    ( https://github.com/usgs/eqmessageutils/blob/master/etc/quakeml_1.2/AnssCatalog-0.1.xsd )
    while remaining compatible with the original specification.

    USGS requirements may differ from other organizations, because we aggregate
    multiple earthquake catalogs from many contributors into a single
    "composite" catalog. We consider an event to have multiple IDs, one unique
    id from each contributor (USGS included), and allow events to be referenced
    using any of those IDs. Messages from multiple contributors are associated
    based on location in space and time, and automatic associations can be
    manually overridden when needed. This balances the requirements for a)
    individual organizations to assign a unique identifier and maintain a
    catalog of events and b) operate independently of any central authority.
    
    Previous email to quakeml mailing list (couldn't find the list archives
    online):
    
    From: Fabian Euchner <fabian.euchner<at>sed.ethz.ch>
    Date: March 23, 2011 9:35:11 AM MDT
    To: Jeremy M Fee <jmfee<at>usgs.gov>
    Cc: <QuakeML<at>intensity.usc.edu>, <kfelzer<at>gps.caltech.edu>, Michelle Guy <mguy<at>usgs.gov>
    Subject: Re: Fwd: [QuakeML] question about authority-id and resource-id in public identifiers

    Hi Jeremy et al.,

    the section on resource identifiers in the standard doc is based on some,
    up to now rather theoretical, thoughts on what a resource metadata
    framework could look like. This is very much inspired (I could also say
    borrowed) from how this is handled in the Astrophysical Virtual
    Observatory community ;-) To be honest, I don't know how agencies that
    already use QuakeML interpret & implement it and you are of course right
    that it's not specified in detail in the standard doc. Since it is a
    standard doc on the markup language, I think it's not the right place to
    specify it, there should be a second document that is more focused on
    infrastructure. *I think it's pretty clear that an identifier cannot refer
    to two different resources, but I could imagine that a resource can be
    referenced by more than one identifier from the same authority (aliases).*
    Thanks for starting this discussion, it's an important point if QuakeML
    starts to play a more important role in our networked infrastructures.

    Cheers,

    Fabian
    
    On Fri March 18 2011 20:55:50 Jeremy M Fee wrote:

    Hi Fabian,

    I received a reply from Karen at CalTech, but I'd like to know if
    these assumptions are also safe across all QuakeML implementations:

    1) An authority always refers to the same resource using the same
    resourceID.

    2) When an authority updates a resource, the same resourceID is used

    And as a result of the previous two assumptions:

    3) When one authority submits two different event resourceIDs, they
    refer to different events.

    I've read the QuakeML-BED.pdf for version 1.1, and cannot find
    anything imposing this restriction. At USGS we rely on this to 1)
    track updates to existing events, and 2) distinguish events that are
    so close in space and time they would otherwise be considered the
    same event.

    Thanks,

    Jeremy
    
    Begin forwarded message:

    From: Karen Felzer <kfelzer<at>gps.caltech.edu>
    Date: March 17, 2011 4:23:32 PM MDT
    To: Jeremy M Fee <jmfee<at>usgs.gov>
    Cc: quakeml<at>intensity.usc.edu
    Subject: Re: [QuakeML] question about authority-id and resource-id in public identifiers

    Yes -- information for the same earthquake should always be reported
    under the same earthquake ID number.

    regards,

    Karen Felzer

    On Mar 17, 2011, at 2:22 PM, Jeremy M Fee wrote:

    Hi,

    Is it a safe assumption that an authority will always refer to the
    same resource using the same resource id? Meaning, if an authority
    submits an event under one resource id, they will always reuse that
    same resource id when updating event information (and identify
    version information separately)? This would make it much easier to
    recognize updates, versus new information.

    Thanks,

    Jeremy
    
    _______________________________________________
    
    QuakeML mailing list
    
    QuakeML<at>intensity.usc.edu
    
    http://intensity.usc.edu/mailman/listinfo/quakeml
    
    --
    -------------------------------------------------------------------------
    Fabian Euchner                 phone   +41 44 633 7178
    Swiss Seismological Service    fax     +41 44 633 1065
    ETH Zurich, NO F67             e-mail  fabian<at>sed.ethz.ch
    Sonneggstrasse 5               www.fabian-euchner.de
    8092 Zurich (Switzerland)      www.earthquake.ethz.ch/people/feuchner
    -------------------------------------------------------------------------
    QuakeML  http://quakeml.org    AstroCat  http://astrocat.org
    QuakePy  http://quakepy.org    CVcat     http://cvcat.net
    CSEP     http://www.cseptesting.org
    -------------------------------------------------------------------------
    
    On Fri, Mar 3, 2017 at 7:50 AM, Fabian Euchner
    <fabian.euchner<at>sed.ethz.ch> wrote:
    
    Hi Philip, hi all,
    
    Thanks again, Philip, for bringing up this important issue.
    
    Best regards,
    
    Fabian
    
    A common access pattern is to first do an initial wide but shallow
    
    query, and then return to do a deep but narrow query. For example
    
    asking an fdsn-station ws for stations in a box, displaying those on a
    
    map, and then only going back to ask for channels or response for
    
    stations as the user clicks on them. Another example would be to query
    
    an fdsn-event web service for earthquakes, and then return using
    
    things like includeallorigins=true and includearrivals=true to get
    
    more detailed information for a specific earthquake. The current
    
    combination of the fdsn-event ws query parameters and the quakeml xml
    
    specification currently makes this harder than it should be I feel
    
    because while the fdsn-event has a query based on eventid, there is
    
    not a standard way to put the eventid into the original quakeml.
    
    QuakeML has a publicID for each event, but the structure of this is
    
    complicated enough that it is challenging to parse in a way that
    
    reliably extracts the value that should be returned to the service as
    
    eventid.
    
    I have collected example <event> elements from all of the fdsn-event
    
    web services currently listed on
    
    http://www.fdsn.org/webservices/datacenters/
    
    and as you can see there is quite a variety of ways of including the
    
    eventid, in publicID and elsewhere, which makes it harder for clients
    
    as they have to have code that says if (host == USGS) { do this; }
    
    else if (host == ETHZ) { do that; }
    
    which is hard to maintain and fragile.
    
    While this is not likely a big enough of an issue to issue a revision
    
    of the web services spec, it would be really nice if until the next
    
    revision there could be a consensus on how to provide the eventid. And
    
    when the next revision is created, to make this mandatory and
    
    standardized.
    
    I think I would prefer something simple like the USGS, NCEDC and SCEDC
    
    style where there is a simple attribute that gives the eventid
    
    exactly without parsing, like catalog:eventid="71377596", but of
    
    course the drawback is that currently this is a separate schema
    
    definition from the quakeml standard. Using the publicID would be
    
    better in that that is already part of the quakeml spec, but the
    
    format of the URI is too varied and complicated at present for easy
    
    parsing.
    
    Another solution would be to allow the entire publicID to be returned
    
    via the eventid parameter. This would require the server to be able to
    
    parse its own style of publicID, which seems reasonable. However the
    
    structure of the publicID may also cause problems as it looks like a
    
    URL and so would require escaping/encoding of certain characters. Yet
    
    another solution would be to use text format for the wide but shallow
    
    query and then use quakeml for the deep, but this has the downside of
    
    requiring the client to parse two unrelated data formats.
    
    thanks
    
    Philip
    
    IRIS

    <event publicID="smi:service.iris.edu/fdsnws/event/1/query?eventid=3337497">

    NCEDC

    <event publicID="quakeml:nc.anss.org/Event/NC/71377596"
    catalog:datasource="nc" catalog:dataid="nc71377596"
    catalog:eventsource="nc" catalog:eventid="71377596">

    SCEDC

    <event
    publicID="quakeml:service.scedc.caltech.edu/fdsnws/event/1/query?eventid=37300872"
    catalog:datasource="ci" catalog:dataid="ci37300872"
    catalog:eventsource="ci" catalog:eventid="37300872">

    USGS

    <event catalog:datasource="us" catalog:eventsource="us"
    catalog:eventid="c000lvb5"
    publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=usc000lvb5&format=quakeml">

    ETHZ

    <event publicID="smi:ch.ethz.sed/sc3a/2017eemfch">

    INGV

    <event
    publicID="smi:webservices.ingv.it/fdsnws/event/1/query?eventId=863301">

    ISC

    <event publicID="smi:ISC/evid=600516598">
    
    ----------------------
    
    FDSN Working Group III
    
    (http://www.fdsn.org/message-center/topic/fdsn-wg3-products/)
    
    Sent from the FDSN Message Center
    (http://www.fdsn.org/message-center/)
    
    Update subscription preferences at
    http://www.fdsn.org/account/profile/
    
    --
    -----------------------------------------------------------------------------
    Fabian Euchner             phone   +41 44 633 7178
    Institute of Geophysics    fax     +41 44 633 1065
    ETH Zurich, NO F5          e-mail  fabian<at>sed.ethz.ch
    Sonneggstrasse 5           orcid.org/0000-0001-6340-7439
    8092 Zurich (Switzerland)
    -----------------------------------------------------------------------------
    QuakeML  http://quakeml.org    QuakePy  http://quakepy.org
    CSEP     http://www.cseptesting.org/centers/eth
    -----------------------------------------------------------------------------
    
    ----------------------
    FDSN Working Group III
    (http://www.fdsn.org/message-center/topic/fdsn-wg3-products/)
    
    Sent from the FDSN Message Center (http://www.fdsn.org/message-center/)
    Update subscription preferences at http://www.fdsn.org/account/profile/
    
    View this message in Google Groups at https://groups.google.com/a/fdsn.org/d/msgid/fdsn-wg3-products/CAGFrVcVCrPYCBOkLQs6QMvQCqxCBgQzz5NVNHVTa6VbYEVjM7g%40mail.gmail.com.