International Federation of Digital Seismograph Networks

Thread: finding eventid in quakeml from fdsn-event ws

Started: 2017-03-01 19:45:17
Last activity: 2017-03-06 23:52:55
Philip Crotwell
2017-03-01 19:45:17
Hi all

A common access pattern is to first do an initial wide but shallow
query, and then return to do a deep but narrow query. For example
asking an fdsn-station ws for stations in a box, displaying those on a
map, and then only going back to ask for channels or response for
stations as the user clicks on them. Another example would be to query
an fdsn-event web service for earthquakes, and then return using
things like includeallorigins=true and includearrivals=true to get
more detailed information for a specific earthquake. I feel the current
combination of the fdsn-event ws query parameters and the quakeml xml
specification makes this harder than it should be, because while the
fdsn-event ws has a query based on eventid, there is not a standard way
to put the eventid into the original quakeml.
QuakeML has a publicID for each event, but the structure of this is
complicated enough that it is challenging to parse in a way that
reliably extracts the value that should be returned to the service as
eventid.
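
As a rough illustration of the wide-then-deep pattern described above, the sketch below only builds the two request URLs and does no network access. The base URL is borrowed from the IRIS example later in this message, the time window and magnitude cutoff are made-up values, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Base URL borrowed from the IRIS publicID example in this message;
# other data centers publish their own fdsnws-event endpoints.
BASE_URL = "http://service.iris.edu/fdsnws/event/1/query"


def shallow_query(starttime, endtime, minmagnitude):
    """Wide but shallow: many events, default level of detail."""
    params = {
        "starttime": starttime,
        "endtime": endtime,
        "minmagnitude": minmagnitude,
        "format": "xml",
    }
    return BASE_URL + "?" + urlencode(params)


def deep_query(eventid):
    """Deep but narrow: one event with all origins and arrivals."""
    params = {
        "eventid": eventid,
        "includeallorigins": "true",
        "includearrivals": "true",
        "format": "xml",
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative values only; the time window and magnitude are made up.
    print(shallow_query("2017-02-01", "2017-03-01", 6.0))
    # The open question in this thread: where does this eventid come from?
    print(deep_query("3337497"))
```

The second call is the step that needs a value the first response does not provide in a standard place.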

I have collected example <event> elements from all of the fdsn-event
web services currently listed on
http://www.fdsn.org/webservices/datacenters/
and as you can see there is quite a variety of ways of including the
eventid, in publicID and elsewhere, which makes it harder for clients
as they have to have code that says if (host == USGS) { do this; }
else if (host == ETHZ) { do that; }
which is hard to maintain and fragile.
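
To make the maintenance problem concrete, here is a minimal sketch of that kind of per-host code, using publicID strings copied from the examples at the end of this message. The patterns and the function name are hypothetical, and it is only an assumption that the captured value is what each service accepts as eventid.

```python
import re

# publicID values copied from the sample <event> elements in this message.
SAMPLES = {
    "IRIS": "smi:service.iris.edu/fdsnws/event/1/query?eventid=3337497",
    "INGV": "smi:webservices.ingv.it/fdsnws/event/1/query?eventId=863301",
    "ISC": "smi:ISC/evid=600516598",
    "ETHZ": "smi:ch.ethz.sed/sc3a/2017eemfch",
}

# One hand-written rule per data center. Assumption: the captured group is
# the value that service expects in its eventid parameter.
PATTERNS = {
    "IRIS": re.compile(r"[?&]eventid=([^&]+)"),
    "INGV": re.compile(r"[?&]eventId=([^&]+)"),  # note the capital "I"
    "ISC": re.compile(r"evid=(\d+)"),
}


def eventid_for(host, public_id):
    """Extract an eventid using the rule registered for this data center."""
    pattern = PATTERNS.get(host)
    if pattern is None:
        # No sensible default exists, so an unknown host is an error.
        raise ValueError(f"no extraction rule for data center {host!r}")
    match = pattern.search(public_id)
    if match is None:
        raise ValueError(f"publicID {public_id!r} does not match the rule for {host}")
    return match.group(1)


if __name__ == "__main__":
    for host, public_id in SAMPLES.items():
        try:
            print(host, eventid_for(host, public_id))
        except ValueError as err:
            print(host, "failed:", err)
```

Every new data center means another entry in the table, and a host without an entry can only fail.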

While this is likely not a big enough issue to warrant a revision
of the web services spec, it would be really nice if until the next
revision there could be a consensus on how to provide the eventid. And
when the next revision is created, this should be made mandatory and
standardized.

I think I would prefer something simple like the USGS, NCEDC and SCEDC
style where there is a simple attribute that gives the eventid
exactly without parsing, like catalog:eventid="71377596", but of
course the drawback is that currently this is a separate schema
definition from the quakeml standard. Using the publicID would be
better in that that is already part of the quakeml spec, but the
format of the URI is too varied and complicated at present for easy
parsing.
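
For comparison, reading the attribute needs no string parsing at all. The sketch below uses Python's standard ElementTree on a fragment modelled on the NCEDC example. The two namespace URIs are assumptions on my part and should be checked against what a given service actually returns; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify them against a real response.
BED_NS = "http://quakeml.org/xmlns/bed/1.2"
CATALOG_NS = "http://anss.org/xmlns/catalog/0.1"

# Fragment modelled on the NCEDC <event> element quoted in this message.
SAMPLE = f"""<eventParameters xmlns="{BED_NS}" xmlns:catalog="{CATALOG_NS}">
  <event publicID="quakeml:nc.anss.org/Event/NC/71377596"
         catalog:datasource="nc" catalog:dataid="nc71377596"
         catalog:eventsource="nc" catalog:eventid="71377596"/>
</eventParameters>"""


def catalog_eventids(xml_text):
    """Return (eventsource, eventid) per event; None where an attribute is absent."""
    root = ET.fromstring(xml_text)
    found = []
    for event in root.iter(f"{{{BED_NS}}}event"):
        found.append((
            event.get(f"{{{CATALOG_NS}}}eventsource"),
            event.get(f"{{{CATALOG_NS}}}eventid"),
        ))
    return found


if __name__ == "__main__":
    print(catalog_eventids(SAMPLE))
```

For services that do not emit the extension attributes, both values come back as None, which at least is an unambiguous signal.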

Another solution would be to allow the entire publicID to be returned
via the eventid parameter. This would require the server to be able to
parse its own style of publicID, which seems reasonable. However the
structure of the publicID may also cause problems as it looks like a
URL and so would require escaping/encoding of certain characters. Yet
another solution would be to use text format for the wide but shallow
query and then use quakeml for the deep, but this has the downside of
requiring the client to parse two unrelated data formats.
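
The escaping concern can be shown with the standard library alone. In this sketch the USGS publicID from the examples below is placed in an eventid parameter with and without percent-encoding, then parsed back the way a server might. The helper names are hypothetical, and no service is claimed to accept a publicID this way.

```python
from urllib.parse import parse_qs, quote

# publicID values copied from the examples in this message.
USGS_PUBLIC_ID = ("quakeml:earthquake.usgs.gov/fdsnws/event/1/query"
                  "?eventid=usc000lvb5&format=quakeml")
ETHZ_PUBLIC_ID = "smi:ch.ethz.sed/sc3a/2017eemfch"


def encode_public_id(public_id):
    """Percent-encode every reserved character, including '/', ':' and '&'."""
    return quote(public_id, safe="")


def eventid_query(public_id, encode=True):
    """Build the query-string fragment carrying the publicID as eventid."""
    value = encode_public_id(public_id) if encode else public_id
    return "eventid=" + value


if __name__ == "__main__":
    # Unencoded: the embedded "&format=quakeml" becomes a separate parameter.
    print(parse_qs(eventid_query(USGS_PUBLIC_ID, encode=False)))
    # Encoded: the full publicID survives the round trip.
    print(parse_qs(eventid_query(USGS_PUBLIC_ID)))
    print(eventid_query(ETHZ_PUBLIC_ID))
```

Without encoding, the server would see a truncated eventid plus an unintended format parameter.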

thanks
Philip


IRIS
<event publicID="smi:service.iris.edu/fdsnws/event/1/query?eventid=3337497">

NCEDC
<event publicID="quakeml:nc.anss.org/Event/NC/71377596"
catalog:datasource="nc" catalog:dataid="nc71377596"
catalog:eventsource="nc" catalog:eventid="71377596">


SCEDC
<event publicID="quakeml:service.scedc.caltech.edu/fdsnws/event/1/query?eventid=37300872"
catalog:datasource="ci" catalog:dataid="ci37300872"
catalog:eventsource="ci" catalog:eventid="37300872">


USGS
<event catalog:datasource="us" catalog:eventsource="us"
catalog:eventid="c000lvb5"
publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=usc000lvb5&format=quakeml">

ETHZ
<event publicID="smi:ch.ethz.sed/sc3a/2017eemfch">

INGV
<event publicID="smi:webservices.ingv.it/fdsnws/event/1/query?eventId=863301">

ISC
<event publicID="smi:ISC/evid=600516598">

  • Fabian Euchner
    2017-03-03 23:49:07
    Hi Philip, hi all,

    in the QuakeML world, entities (events, origins, picks) are identified through the publicID,
    and *only* through the publicID. The publicID has been designed in a way that makes it
    easy to be globally unique (authority part, then resource part that is in the hands of the
    issuing agency which ensures uniqueness). The "legacy" event IDs that are used in some
    earthquake catalogs (often just integer numbers) cannot be unique; collisions are likely to
    occur. When designing QuakeML there was a long discussion whether legacy IDs should
    be part of the data model, and there was a consensus that they shouldn't, first because
    their usage should be discouraged (non-uniqueness, not being future-proof, etc), and
    there was also no semantically convincing place in the schema to put them.

    The fdsnws-event standard says when it comes to the eventid query parameter: "event
    identifiers are data center specific". It seems that most implementations expect the legacy
    ID, not the publicID of the event (in fact, in your examples, this holds for all data centers
    except for ETH). Thanks for pointing this out! I was totally unaware of the fact.

    In my opinion this is a serious specification and implementation flaw. In the next version
    of the event service spec it should be mandatory that eventid is the publicID of the event.
    All services should be queried in the same way, but, e.g., for ETH it is not possible, because
    there are no legacy IDs. In the current situation, the user has to know which service
    requires legacy ID, and which service requires publicID. In addition, the legacy ID is not
    per se contained in the returned QuakeML document, as it is not contained in the
    standard. This makes it hard to find the legacy ID if it exists at all (can be hidden in the
    publicID, or in an extension attribute which depends on the data center).

    Furthermore, QuakeML publicIDs are designed to be opaque. They may be compiled from
    other pieces of information, like timestamps, legacy IDs, etc., but they need not, they can
    just be random strings (the resource part). Therefore, no user or service should rely on
    parsing publicIDs.
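
A small sketch of why parsing is unreliable: the heuristic below is hypothetical (keep whatever follows the last "/" or "=") and is applied to publicIDs quoted earlier in the thread. It looks plausible for some data centers and is clearly wrong for others.

```python
import re

# publicID values copied from the examples in the first message.
PUBLIC_IDS = {
    "IRIS": "smi:service.iris.edu/fdsnws/event/1/query?eventid=3337497",
    "NCEDC": "quakeml:nc.anss.org/Event/NC/71377596",
    "USGS": ("quakeml:earthquake.usgs.gov/fdsnws/event/1/query"
             "?eventid=usc000lvb5&format=quakeml"),
    "ETHZ": "smi:ch.ethz.sed/sc3a/2017eemfch",
}


def naive_eventid(public_id):
    """Hypothetical heuristic: keep the text after the last '/' or '='."""
    return re.split(r"[/=]", public_id)[-1]


if __name__ == "__main__":
    for host, public_id in PUBLIC_IDS.items():
        print(host, naive_eventid(public_id))
```

For the USGS string the heuristic returns the format value rather than an identifier, and for ETHZ it returns only a fragment of an ID that, per the message above, is meant to be used whole.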

    Thanks again, Philip, for bringing up this important issue.

    Best regards,
    Fabian





    • Jeremy Fee
      2017-03-03 20:51:17
      Hello,

      in the QuakeML world, entities (events, origins, picks) are identified
      through the publicID, and *only* through the publicID. The publicID has
      been designed in a way that makes it easy to be globally unique (authority
      part, then resource part that is in the hands of the issuing agency which
      ensures uniqueness). The "legacy" event IDs that are used in some
      earthquake catalogs (often just integer numbers) cannot be unique;
      collisions are likely to occur. When designing QuakeML there was a long
      discussion whether legacy IDs should be part of the data model, and there
      was a consensus that they shouldn't, first because their usage should be
      discouraged (non-uniqueness, not being future-proof, etc), and there was
      also no semantically convincing place in the schema to put them.


      USGS handled this by defining an eventsource (typically FDSN network code)
      as a "namespace" for eventids, which eliminates collisions and allows
      contributors to continue using the existing IDs for events without
      requiring yet another eventid system. We commented on this early in the
      Quakeml 1.2 process (see previous email to the quakeml mailing list below)
      and implemented a custom extension to Quakeml to support our requirements (
      https://github.com/usgs/eqmessageutils/blob/master/etc/quakeml_1.2/AnssCatalog-0.1.xsd
      ) while remaining compatible with the original specification.

      USGS requirements may differ from other organizations, because we aggregate
      multiple earthquake catalogs from many contributors into a single
      "composite" catalog. We consider an event to have multiple IDs, one unique
      id from each contributor (USGS included), and allow events to be referenced
      using any of those IDs. Messages from multiple contributors are associated
      based on location in space and time, and automatic associations can be
      manually overridden when needed. This balances the requirements for
      individual organizations to a) assign a unique identifier and maintain a
      catalog of events and b) operate independently of any central authority.
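
The namespacing idea can be sketched as a composite key. This is only an illustration under my own assumptions, not the USGS implementation: the class names, the event keys, and the associations between IDs are all made up, and only the ID strings "nc"/"71377596" and "us"/"c000lvb5" come from the examples in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogId:
    """An eventid qualified by the contributor (eventsource) that issued it."""
    eventsource: str
    eventid: str


class CompositeCatalog:
    """Hypothetical in-memory index from contributor IDs to composite events."""

    def __init__(self):
        self._event_for_id = {}

    def associate(self, catalog_id, event_key):
        """Record that a contributor's ID refers to the given composite event."""
        self._event_for_id[catalog_id] = event_key

    def lookup(self, catalog_id):
        """Return the composite event key, or None if the ID is unknown."""
        return self._event_for_id.get(catalog_id)


catalog = CompositeCatalog()
# The same bare number from two contributors stays distinct once qualified.
# (The "ci" entry is an invented collision for illustration.)
catalog.associate(CatalogId("nc", "71377596"), "event-A")
catalog.associate(CatalogId("ci", "71377596"), "event-B")
# A second contributor's ID for the same event is just another reference.
# (This association is invented as well.)
catalog.associate(CatalogId("us", "c000lvb5"), "event-A")

if __name__ == "__main__":
    print(catalog.lookup(CatalogId("nc", "71377596")))
    print(catalog.lookup(CatalogId("ci", "71377596")))
    print(catalog.lookup(CatalogId("us", "c000lvb5")))
```

Keying on the pair rather than the bare number is what removes the collision, and several pairs may point at one event.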


      The fdsnws-event standard says when it comes to the eventid query
      parameter: "event identifiers are data center specific". It seems that most
      implementations expect the legacy ID, not the publicID of the event (in
      fact, in your examples, this holds for all data centers except for ETH).
      Thanks for pointing this out! I was totally unaware of the fact.



      In my opinion this is a serious specification and implementation flaw. In
      the next version of the event service spec it should be mandatory that
      eventid is the publicID of the event. All services should be queried in the
      same way, but, e.g., for ETH it is not possible, because there are no
      legacy IDs. In the current situation, the user has to know which service
      requires legacy ID, and which service requires publicID. In addition, the
      legacy ID is not per se contained in the returned QuakeML document, as it
      is not contained in the standard. This makes it hard to find the legacy ID
      if it exists at all (can be hidden in the publicID, or in an extension
      attribute which depends on the data center).



      Furthermore, QuakeML publicIDs are designed to be opaque. They may be
      compiled from other pieces of information, like timestamps, legacy IDs,
      etc., but they need not, they can just be random strings (the resource
      part). Therefore, no user or service should rely on parsing publicIDs.


      If you want to add support to query using public IDs, I recommend
      defining a new "publicID" parameter for the fdsn service and leaving the
      existing eventid parameter unchanged for backward compatibility. An
      additional consideration is how a service should handle multiple versions
      of the same event element (assuming ordering based on
      event/creationInfo/creationTime). It may be simpler for an explicit
      "detailURL" or similar attribute to be added to the event element to
      support the explicit use case to obtain more information.

      We introduced our custom extension to support these requirements of being
      able to uniquely identify events, and individual pieces of information
      being contributed to those events, because of the suggestion (see below)
      that there may be aliases for publicIDs and they would not be guaranteed to
      be universally unique. I suggest that, rather than adding/changing the
      meaning of an otherwise opaque and non-unique identifier, explicit
      attributes or elements be created for these purposes (or that the
      AnssCatalog extension be more widely adopted).


      Thanks,

      Jeremy


      Previous email to quakeml mailing list (couldn't find the list archives
      online):

      From: Fabian Euchner <fabian.euchner<at>sed.ethz.ch>
      Date: March 23, 2011 9:35:11 AM MDT
      To: Jeremy M Fee <jmfee<at>usgs.gov>
      Cc: <QuakeML<at>intensity.usc.edu>, <kfelzer<at>gps.caltech.edu>, Michelle Guy <mguy<at>usgs.gov>
      Subject: Re: Fwd: [QuakeML] question about authority-id and resource-id in public identifiers

      Hi Jeremy et al.,

      the section on resource identifiers in the standard doc is based on some, up
      to now rather theoretical, thoughts on how a resource metadata framework could
      look like. This is very much inspired (I could also say borrowed) from how
      this is handled in the Astrophysical Virtual Observatory community ;-) To be
      honest, I don't know how agencies that already use QuakeML interpret &
      implement it and you are of course right that it's not specified in detail in
      the standard doc. Since it is a standard doc on the markup language, I think
      it's not the right place to specify it, there should be a second document that
      is more focused on infrastructure. *I think it's pretty clear that an
      identifier cannot refer to two different resources, but I could imagine that a
      resource can be referenced by more than one identifier from the same authority
      (aliases).* Thanks for starting this discussion, it's an important point if
      QuakeML starts to play a more important role in our networked
      infrastructures.

      Cheers,
      Fabian



      On Fri March 18 2011 20:55:50 Jeremy M Fee wrote:

      Hi Fabian,

      I received a reply from Karen at CalTech, but I'd like to know if
      these assumptions are also safe across all QuakeML implementations:
      1) An authority always refers to the same resource using the same
      resourceID.
      2) When an authority updates a resource, the same resourceID is used

      And as a result of the previous two assumptions:
      3) When one authority submits two different event resourceIDs, they
      refer to different events.

      I've read the QuakeML-BED.pdf for version 1.1, and cannot find
      anything imposing this restriction. At USGS we rely on this to 1)
      track updates to existing events, and 2) distinguish events that are
      so close in space and time they would otherwise be considered the same
      event.

      Thanks,

      Jeremy



      Begin forwarded message:

      From: Karen Felzer <kfelzer<at>gps.caltech.edu>
      Date: March 17, 2011 4:23:32 PM MDT
      To: Jeremy M Fee <jmfee<at>usgs.gov>
      Cc: quakeml<at>intensity.usc.edu
      Subject: Re: [QuakeML] question about authority-id and resource-id in public identifiers

      Yes -- information for the same earthquake should always be reported
      under the same earthquake ID number.

      regards,
      Karen Felzer



      On Mar 17, 2011, at 2:22 PM, Jeremy M Fee wrote:

      Hi,

      Is it a safe assumption that an authority will always refer to the
      same resource using the same resource id? Meaning, if an authority
      submits an event under one resource id, they will always reuse that
      same resource id when updating event information (and identify
      version information separately)? This would make it much easier to
      recognize updates, versus new information.

      Thanks,

      Jeremy


      ----------------------
      FDSN Working Group III (http://www.fdsn.org/message-center/topic/fdsn-wg3-products/)

      Sent from the FDSN Message Center (http://www.fdsn.org/message-center/)
      Update subscription preferences at http://www.fdsn.org/account/profile/



      • Fabian Euchner
        2017-03-04 05:47:06
        Hello Jeremy, hello all,

        first, let me apologize if somebody found my comment too harsh or offensive. That was
        absolutely not my intention.

        Since the fdsnws-event default output format is QuakeML, I assume that the QuakeML
        data model is the common minimum standard data model. Since legacy IDs are not
        contained therein, I think using them as a query parameter should not be the common
        standard way to query individual event information. Therefore, I would suggest that a
        next iteration of the event service specification defines a new query parameter, maybe
        called eventpublicid, that is implemented by all data centers to query on event publicIDs,
        which are mandatory in all result QuakeML documents. If some data centers want to
        additionally provide a query parameter for legacy IDs, that's fine with me. Every user
        querying based on this has to know what she/he does, and how to deal with results.

        All the best,
        Fabian


        Hello,

        in the QuakeML world, entities (events, origins, picks) are identified

        through the publicID, and *only*" through the publicID. The publicID has
        been designed in a way that makes it easy to be globally unique (authority
        part, then resource part that is in the hands of the issueing agency which
        ensures uniqueness). The "legacy" event IDs that are used in some
        earthquake catalogs (often just integer numbers) cannot be unique,
        collisions are likely to occur. When designing QuakeML there was a long
        discussion whether legacy IDs should be part of the data model, and there
        was a consensus that they shouldn't, first because their usage should be
        discouraged (non-uniqueness, not being future-proof, etc), and there was
        also no semantically convincing place in the schema to put them.

        USGS handled this by defining an eventsource (typically FDSN network code)
        as a "namespace" for eventids, which eliminates collisions and allowing
        contributors to continue using the existing IDs for events without
        requiring yet another eventid system. We commented on this early in the
        Quakeml 1.2 process (see previous email to the quakeml mailing list below)
        and implemented a custom extension to Quakeml to support our requirements (
        https://github.com/usgs/eqmessageutils/blob/master/etc/quakeml_1.2/AnssCatal
        og-0.1.xsd ) while remaining compatible with the original specification.

        USGS requirements may differ from other organizations, because we aggregate
        multiple earthquake catalogs from many contributors into a single
        "composite" catalog. We consider an event to have multiple IDs, one unique
        id from each contributor (USGS included), and allow events to be referenced
        using any of those IDs. Messages from multiple contributors are associated
        based on location in space and time, and automatic associations can be
        manually overridden when needed. This balances the requirements for a)
        individual organizations to assign a unique identifier and maintain a
        catalog of events and b) operate independently of any central authority.


        The fdsnws-event standard says when it comes to the eventid query

        parameter: "event identifiers are data center specific". It seems that
        most
        implementations expect the legacy ID, not the publicID of the event (in
        fact, in your examples, this holds for all data centers expect for ETH).
        Thanks for pointing this out! I was totally unaware of the fact.

        In my opinion this is a serious specification and implementation flaw. In

        the next version of the event service spec it should be mandatory that
        eventid is the publicID of the event. All services should be queried in
        the
        same way, but, e.g., for ETH it is not possible, because there are no
        legacy IDs. In the current situation, the user has to know which service
        requires legacy ID, and which service requires publicID. In addition, the
        legacy ID is not per se contained in the returned QuakeML document, as it
        is not contained in the standard. This makes it hard to find the legacy ID
        if it exists at all (can be hidden in the publicID, or in an extension
        attribute which depends on the data center).

        Furthermore, QuakeML publicIDs are designed to be opaque. They may be

        compiled from other pieces of information, like timestamps, legacy IDs,
        etc., but they need not, they can just be random strings (the resource
        part). Therefore, no user or service should rely on parsing publicIDs.

        If you want to add support to query using public IDs, I recommend
        definition of a new "publicID" parameter for the fdsn service and leave the
        existing eventid parameter unchanged for backward compatibility. An
        additional consideration is how a service should handle multiple versions
        of the same event element (assuming ordering based on
        event/creationInfo/creationTime). It may be simpler for an explicit
        "detailURL" or similar attribute to be added to the event element to
        support the explicit use case to obtain more information.

        We introduced our custom extension to support these requirements of being
        able to uniquely identify events, and individual pieces of information
        being contributed to those events, because of the suggestion (see below)
        that there may be aliases for publicIDs and they would not be guaranteed to
        be universally unique. I suggest that rather than adding/changing meaning
        of an otherwise opaque and non-unique identifier, that explicit attributes
        or elements be created for these purposes (or that the AnssCatalog
        extension be more widely adopted).


        Thanks,

        Jeremy


        Previous email to quakeml mailing list (couldn't find the list archives
        online):
        From: Fabian Euchner <fabian.euchner<at>sed.ethz.ch>
        Date: March 23, 2011 9:35:11 AM MDT

        • Philip Crotwell
          2017-03-06 23:52:55
          Hi

          The main question I have as a client writer is how do I get from a
          general fdsn event query, with many events, to a detailed query for a
          single event without server-specific code. As best I can figure out,
          there is no simple answer now.

          The best I can come up with is this algorithm. I presume everyone
          agrees this is needlessly complicated and, as there is no good
          default action, it cannot handle a new fdsn event web service
          without rewriting the code.

          1) if (IRIS or INGV):
             use publicID as full URL after replacing "smi:" with "http://"
          2) if (USGS or SCEDC):
             use publicID as full URL after replacing "quakeml:" with "http://"
          3) if (NCEDC):
             use catalog:eventid as eventid parameter
          4) if (ETHZ):
             use entire publicID (including smi:) as eventid parameter
          5) if (ISC):
             parse publicID as a URL and use the value of the evid parameter as
             eventid parameter
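
          The five rules above can be written down as a small dispatch function.
          This is only a sketch in Python: the name detail_url, its arguments, and
          the idea of passing the service's query endpoint as query_url are
          assumptions made for illustration, not part of any existing fdsn client.
          The last branch shows the problem: a service that is not on the list has
          no rule at all.

```python
from urllib.parse import urlencode

# Services whose publicID becomes a usable URL once its prefix is swapped
# for "http://" (rules 1 and 2 above).
URL_PREFIX = {"IRIS": "smi:", "INGV": "smi:", "USGS": "quakeml:", "SCEDC": "quakeml:"}


def detail_url(service, public_id, query_url=None, catalog_eventid=None):
    """Build a single-event URL for one of the seven services in this thread.

    query_url is the service's .../fdsnws/event/1/query endpoint. It is only
    needed when the publicID is not itself a usable URL (rules 3 to 5).
    """
    if service in URL_PREFIX:
        prefix = URL_PREFIX[service]
        if not public_id.startswith(prefix):
            raise ValueError("unexpected publicID for %s: %r" % (service, public_id))
        return "http://" + public_id[len(prefix):]

    if service == "NCEDC":
        eventid = catalog_eventid              # rule 3: catalog:eventid attribute
    elif service == "ETHZ":
        eventid = public_id                    # rule 4: whole publicID, "smi:" included
    elif service == "ISC":
        eventid = public_id.split("evid=", 1)[1]   # rule 5: value after "evid="
    else:
        # No sensible default exists, so a new service means new code.
        raise ValueError("no rule for service %r" % service)

    if eventid is None or query_url is None:
        raise ValueError("%s needs both an eventid and a query_url" % service)
    # urlencode escapes characters such as ':' and '/' in the value.
    return query_url + "?" + urlencode({"eventid": eventid})
```

          For the IRIS example from this thread,
          detail_url("IRIS", "smi:service.iris.edu/fdsnws/event/1/query?eventid=3337497")
          gives http://service.iris.edu/fdsnws/event/1/query?eventid=3337497.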


          One further note is that although the USGS, SCEDC and NCEDC appear to
          use the same anss "catalog" quakeml extension, they interpret the
          fdsnevent eventid parameter differently. The USGS requires the
          concatenation of catalog:eventsource and catalog:eventid while NCEDC
          and SCEDC both accept only catalog:eventid as the eventid parameter.
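
          To make the difference concrete, here is a rough Python sketch that reads
          the two attributes from a parsed event element. The namespace URI bound to
          the "catalog" prefix and the function name eventid_from_catalog are my
          assumptions; check the namespace declaration in the document a service
          actually returns before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the ANSS "catalog" extension.
CATALOG_NS = "http://anss.org/xmlns/catalog/0.1"


def eventid_from_catalog(event, service):
    """Return the eventid query value for a service using the catalog extension."""
    source = event.get("{%s}eventsource" % CATALOG_NS)
    eventid = event.get("{%s}eventid" % CATALOG_NS)
    if eventid is None:
        raise ValueError("event has no catalog:eventid attribute")
    if service == "USGS":
        if source is None:
            raise ValueError("event has no catalog:eventsource attribute")
        # USGS wants eventsource and eventid concatenated, e.g. "us" + "c000lvb5".
        return source + eventid
    if service in ("NCEDC", "SCEDC"):
        # NCEDC and SCEDC accept catalog:eventid on its own.
        return eventid
    raise ValueError("no rule for service %r" % service)


# Example element modelled on the USGS sample in this thread.
usgs_event = ET.fromstring(
    '<event xmlns:catalog="%s" catalog:eventsource="us" '
    'catalog:eventid="c000lvb5" publicID="quakeml:example"/>' % CATALOG_NS
)
usgs_eventid = eventid_from_catalog(usgs_event, "USGS")
```

          Even with a shared extension, the client still has to know which of the
          two interpretations a given service expects.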

          What a client needs is to be able to use a single value from a quakeml
          event as the eventid. Theoretically, the publicID appears as if it is
          supposed to be that value, but as a practical matter it only works for
          one out of the seven services. And the publicID as it is currently
          specified is not friendly to being used as a URL parameter, as it is
          possible (and very common) for it to include the '&' character.
          Without escaping that char, the resulting URL will be wrong. IMHO, a
          friendly eventid value really should not require processing in order
          to be added to a URL.
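
          A short Python sketch of the '&' problem, using the USGS publicID from
          this thread. The endpoint http://example.org/... is a placeholder and
          eventid_query is a hypothetical helper; no service is being claimed to
          accept a publicID this way.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# USGS publicID as it appears in this thread; note the embedded '?' and '&'.
PUBLIC_ID = ("quakeml:earthquake.usgs.gov/fdsnws/event/1/query"
             "?eventid=usc000lvb5&format=quakeml")
QUERY_URL = "http://example.org/fdsnws/event/1/query"  # placeholder endpoint


def eventid_query(query_url, eventid):
    """Append eventid as a properly percent-encoded query parameter."""
    return query_url + "?" + urlencode({"eventid": eventid})


# Pasting the publicID in unescaped: the '&' starts a second parameter.
naive = QUERY_URL + "?eventid=" + PUBLIC_ID
naive_params = parse_qs(urlsplit(naive).query)

# Escaping first: the server sees one eventid holding the whole publicID.
escaped = eventid_query(QUERY_URL, PUBLIC_ID)
escaped_params = parse_qs(urlsplit(escaped).query)
```

          In the unescaped case the eventid value is cut off at the '&' and a
          stray format parameter appears; in the escaped case the value survives
          the round trip intact.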

          There are two questions, I feel. First, can there be a recommendation
          as to what a current fdsn event web service should accept as
          the eventid parameter? Second, if there is a revision of the spec,
          what should we change to make this easier?

          Absent a more specific publicID format, I don't see a good option that
          doesn't require almost everyone to make server changes. Perhaps
          accepting the full publicID as the eventid, in addition to whatever
          the current implementation accepts, is the least bad?

          As to the longer term, perhaps adding a "publicid=" parameter to the
          fdsn event query is the clearest and most direct solution. But I still
          feel that existing publicIDs are too verbose and unfriendly for use in
          URLs. Perhaps some of this could be addressed in quakeml 2.0, both
          by making the structure of the publicID cleaner or simpler, and by an
          explicit mapping from quakeml event parameters to a url?

          thanks
          Philip


          On Fri, Mar 3, 2017 at 3:48 PM, Fabian Euchner
          <fabian.euchner<at>sed.ethz.ch> wrote:
          Hello Jeremy, hello all,

          first, let me apologize if somebody found my comment too harsh or
          offensive. That was absolutely not my intention.

          Since the fdsnws-event default output format is QuakeML, I assume that the
          QuakeML data model is the common minimum standard data model. Since legacy
          IDs are not contained therein, I think using them as a query parameter
          should not be the common standard way to query individual event
          information. Therefore, I would suggest that a next iteration of the event
          service specification defines a new query parameter, maybe called
          eventpublicid, that is implemented by all data centers to query on event
          publicIDs, which are mandatory in all result QuakeML documents. If some
          data centers want to additionally provide a query parameter for legacy
          IDs, that's fine with me. Every user querying based on this has to know
          what she/he does, and how to deal with results.

          All the best,
          Fabian





          Hello,

          in the QuakeML world, entities (events, origins, picks) are identified
          through the publicID, and *only* through the publicID. The publicID has
          been designed in a way that makes it easy to be globally unique (authority
          part, then resource part that is in the hands of the issuing agency which
          ensures uniqueness). The "legacy" event IDs that are used in some
          earthquake catalogs (often just integer numbers) cannot be unique,
          collisions are likely to occur. When designing QuakeML there was a long
          discussion whether legacy IDs should be part of the data model, and there
          was a consensus that they shouldn't, first because their usage should be
          discouraged (non-uniqueness, not being future-proof, etc), and there was
          also no semantically convincing place in the schema to put them.

          USGS handled this by defining an eventsource (typically FDSN network code)
          as a "namespace" for eventids, which eliminates collisions and allows
          contributors to continue using the existing IDs for events without
          requiring yet another eventid system. We commented on this early in the
          Quakeml 1.2 process (see previous email to the quakeml mailing list below)
          and implemented a custom extension to Quakeml to support our requirements
          ( https://github.com/usgs/eqmessageutils/blob/master/etc/quakeml_1.2/AnssCatalog-0.1.xsd )
          while remaining compatible with the original specification.

          USGS requirements may differ from other organizations, because we
          aggregate multiple earthquake catalogs from many contributors into a single
          "composite" catalog. We consider an event to have multiple IDs, one unique
          id from each contributor (USGS included), and allow events to be referenced
          using any of those IDs. Messages from multiple contributors are associated
          based on location in space and time, and automatic associations can be
          manually overridden when needed. This balances the requirements for a)
          individual organizations to assign a unique identifier and maintain a
          catalog of events and b) operate independently of any central authority.





          Previous email to quakeml mailing list (couldn't find the list archives
          online):

          From: Fabian Euchner <fabian.euchner<at>sed.ethz.ch>
          Date: March 23, 2011 9:35:11 AM MDT
          To: Jeremy M Fee <jmfee<at>usgs.gov>
          Cc: <QuakeML<at>intensity.usc.edu>, <kfelzer<at>gps.caltech.edu>, Michelle
          Guy <mguy<at>usgs.gov>
          Subject: Re: Fwd: [QuakeML] question about authority-id and resource-id
          in public identifiers

          Hi Jeremy et al.,

          the section on resource identifiers in the standard doc is based on some,
          up to now rather theoretical, thoughts on how a resource metadata
          framework could look like. This is very much inspired (I could also say
          borrowed) from how this is handled in the Astrophysical Virtual
          Observatory community ;-) To be honest, I don't know how agencies that
          already use QuakeML interpret & implement it and you are of course right
          that it's not specified in detail in the standard doc. Since it is a
          standard doc on the markup language, I think it's not the right place to
          specify it, there should be a second document that is more focused on
          infrastructure. *I think it's pretty clear that an identifier cannot refer
          to two different resources, but I could imagine that a resource can be
          referenced by more than one identifier from the same authority (aliases).*
          Thanks for starting this discussion, it's an important point if QuakeML
          starts to play a more important role in our networked infrastructures.

          Cheers,
          Fabian



          On Fri March 18 2011 20:55:50 Jeremy M Fee wrote:

          Hi Fabian,

          I received a reply from Karen at CalTech, but I'd like to know if
          these assumptions are also safe across all QuakeML implementations:

          1) An authority always refers to the same resource using the same
             resourceID.
          2) When an authority updates a resource, the same resourceID is used

          And as a result of the previous two assumptions:

          3) When one authority submits two different event resourceIDs, they
             refer to different events.

          I've read the QuakeML-BED.pdf for version 1.1, and cannot find
          anything imposing this restriction. At USGS we rely on this to 1)
          track updates to existing events, and 2) distinguish events that are
          so close in space and time they would otherwise be considered the
          same event.

          Thanks,
          Jeremy



          Begin forwarded message:

          From: Karen Felzer <kfelzer<at>gps.caltech.edu>
          Date: March 17, 2011 4:23:32 PM MDT
          To: Jeremy M Fee <jmfee<at>usgs.gov>
          Cc: quakeml<at>intensity.usc.edu
          Subject: Re: [QuakeML] question about authority-id and resource-id
          in public identifiers

          Yes -- information for the same earthquake should always be reported
          under the same earthquake ID number.

          regards,
          Karen Felzer



          On Mar 17, 2011, at 2:22 PM, Jeremy M Fee wrote:

          Hi,

          Is it a safe assumption that an authority will always refer to the
          same resource using the same resource id? Meaning, if an authority
          submits an event under one resource id, they will always reuse that
          same resource id when updating event information (and identify
          version information separately)? This would make it much easier to
          recognize updates, versus new information.

          Thanks,
          Jeremy

          _______________________________________________
          QuakeML mailing list
          QuakeML<at>intensity.usc.edu
          http://intensity.usc.edu/mailman/listinfo/quakeml



          --
          -------------------------------------------------------------------------
          Fabian Euchner phone +41 44 633 7178
          Swiss Seismological Service fax +41 44 633 1065
          ETH Zurich, NO F67 e-mail fabian<at>sed.ethz.ch
          Sonneggstrasse 5 www.fabian-euchner.de
          8092 Zurich (Switzerland)
          www.earthquake.ethz.ch/people/feuchner
          -------------------------------------------------------------------------
          QuakeML http://quakeml.org AstroCat http://astrocat.org
          QuakePy http://quakepy.org CVcat http://cvcat.net
          CSEP http://www.cseptesting.org
          -------------------------------------------------------------------------

          On Fri, Mar 3, 2017 at 7:50 AM, Fabian Euchner
          <fabian.euchner<at>sed.ethz.ch> wrote:

          Hi Philip, hi all,

          in the QuakeML world, entities (events, origins, picks) are identified
          through the publicID, and *only* through the publicID. The publicID has
          been designed in a way that makes it easy to be globally unique (authority
          part, then resource part that is in the hands of the issuing agency which
          ensures uniqueness). The "legacy" event IDs that are used in some
          earthquake catalogs (often just integer numbers) cannot be unique,
          collisions are likely to occur. When designing QuakeML there was a long
          discussion whether legacy IDs should be part of the data model, and there
          was a consensus that they shouldn't, first because their usage should be
          discouraged (non-uniqueness, not being future-proof, etc), and there was
          also no semantically convincing place in the schema to put them.

          The fdsnws-event standard says when it comes to the eventid query
          parameter: "event identifiers are data center specific". It seems that
          most implementations expect the legacy ID, not the publicID of the event
          (in fact, in your examples, this holds for all data centers except for
          ETH). Thanks for pointing this out! I was totally unaware of the fact.

          In my opinion this is a serious specification and implementation flaw. In
          the next version of the event service spec it should be mandatory that
          eventid is the publicID of the event. All services should be queried in
          the same way, but, e.g., for ETH it is not possible, because there are no
          legacy IDs. In the current situation, the user has to know which service
          requires legacy ID, and which service requires publicID. In addition, the
          legacy ID is not per se contained in the returned QuakeML document, as it
          is not contained in the standard. This makes it hard to find the legacy ID
          if it exists at all (can be hidden in the publicID, or in an extension
          attribute which depends on the data center).

          Furthermore, QuakeML publicIDs are designed to be opaque. They may be
          compiled from other pieces of information, like timestamps, legacy IDs,
          etc., but they need not, they can just be random strings (the resource
          part). Therefore, no user or service should rely on parsing publicIDs.

          Thanks again, Philip, for bringing up this important issue.

          Best regards,
          Fabian



          I think I would prefer something simple like the USGS, NCEDC and SCEDC
          style where there is a simple attribute that gives the eventid
          exactly without parsing, like catalog:eventid="71377596", but of
          course the drawback is that currently this is a separate schema
          definition from the quakeml standard. Using the publicID would be
          better in that that is already part of the quakeml spec, but the
          format of the URI is too varied and complicated at present for easy
          parsing.

          Another solution would be to allow the entire publicID to be returned
          via the eventid parameter. This would require the server to be able to
          parse its own style of publicID, which seems reasonable. However the
          structure of the publicID may also cause problems as it looks like a
          URL and so would require escaping/encoding of certain characters. Yet
          another solution would be to use text format for the wide but shallow
          query and then use quakeml for the deep, but this has the downside of
          requiring the client to parse two unrelated data formats.

          thanks
          Philip











          IRIS
          <event publicID="smi:service.iris.edu/fdsnws/event/1/query?eventid=3337497">

          NCEDC
          <event publicID="quakeml:nc.anss.org/Event/NC/71377596"
                 catalog:datasource="nc" catalog:dataid="nc71377596"
                 catalog:eventsource="nc" catalog:eventid="71377596">

          SCEDC
          <event publicID="quakeml:service.scedc.caltech.edu/fdsnws/event/1/query?eventid=37300872"
                 catalog:datasource="ci" catalog:dataid="ci37300872"
                 catalog:eventsource="ci" catalog:eventid="37300872">

          USGS
          <event catalog:datasource="us" catalog:eventsource="us"
                 catalog:eventid="c000lvb5"
                 publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=usc000lvb5&format=quakeml">

          ETHZ
          <event publicID="smi:ch.ethz.sed/sc3a/2017eemfch">

          INGV
          <event publicID="smi:webservices.ingv.it/fdsnws/event/1/query?eventId=863301">

          ISC
          <event publicID="smi:ISC/evid=600516598">







          ----------------------
          FDSN Working Group III
          (http://www.fdsn.org/message-center/topic/fdsn-wg3-products/)

          Sent from the FDSN Message Center (http://www.fdsn.org/message-center/)
          Update subscription preferences at http://www.fdsn.org/account/profile/

          --
          -----------------------------------------------------------------------------
          Fabian Euchner phone +41 44 633 7178
          Institute of Geophysics fax +41 44 633 1065
          ETH Zurich, NO F5 e-mail fabian<at>sed.ethz.ch
          Sonneggstrasse 5 orcid.org/0000-0001-6340-7439
          8092 Zurich (Switzerland)
          -----------------------------------------------------------------------------
          QuakeML http://quakeml.org QuakePy http://quakepy.org
          CSEP http://www.cseptesting.org/centers/eth
          -----------------------------------------------------------------------------








