A common access pattern is to first do an initial wide but shallow
query, and then return to do a deep but narrow query. For example,
asking an fdsn-station ws for stations in a box, displaying those on a
map, and then only going back to ask for channels or response for
stations as the user clicks on them. Another example would be to query
an fdsn-event web service for earthquakes, and then return using
parameters like includeallorigins=true and includearrivals=true to get
more detailed information for a specific earthquake. I feel the current
combination of the fdsn-event ws query parameters and the QuakeML XML
specification makes this harder than it should be: while fdsn-event
supports a query based on eventid, there is no standard way to put the
eventid into the original QuakeML. QuakeML has a publicID for each
event, but its structure is complicated enough that it is challenging
to parse in a way that reliably extracts the value that should be
passed back to the service as eventid.
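A minimal sketch of the two-step pattern described above, using only the standard fdsnws-event parameters starttime, endtime, minmagnitude, eventid, includeallorigins and includearrivals. The base URL is illustrative (taken from the IRIS publicID quoted in this thread) and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Illustrative base URL (from the IRIS example publicID); substitute the
# fdsnws-event service you actually query.
BASE_URL = "http://service.iris.edu/fdsnws/event/1/query"


def shallow_query_url(starttime: str, endtime: str, minmagnitude: float) -> str:
    """Wide but shallow: many events, default level of detail."""
    params = {
        "starttime": starttime,
        "endtime": endtime,
        "minmagnitude": minmagnitude,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def deep_query_url(eventid: str) -> str:
    """Deep but narrow: one event with all origins and arrivals.

    Only useful if the client can recover, from the shallow QuakeML,
    the eventid value that this particular service expects.
    """
    params = {
        "eventid": eventid,
        "includeallorigins": "true",
        "includearrivals": "true",
    }
    return f"{BASE_URL}?{urlencode(params)}"
```

The second function is trivial to write; the hard part, as the thread discusses, is knowing which string to pass as `eventid`.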
I have collected example <event> elements from all of the fdsn-event
web services currently listed on
http://www.fdsn.org/webservices/datacenters/
and, as you can see, there is quite a variety of ways of including the
eventid, in the publicID and elsewhere. This makes it harder for clients,
as they need code that says if (host == USGS) { do this; }
else if (host == ETHZ) { do that; }
which is hard to maintain and fragile.
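To make the fragility concrete, here is a hypothetical heuristic that tries to pull an eventid out of the example publicIDs in this thread. Each branch encodes one data center's current habit; nothing guarantees those formats stay stable, and the sketch assumes (without checking each service) that the extracted value is what that service accepts.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def guess_eventid(public_id: str) -> Optional[str]:
    """Hypothetical per-format heuristic; not a recommended practice."""
    # IRIS/SCEDC/USGS/INGV style: a query URL carrying eventid=...
    # (INGV spells it eventId, so keys are compared case-insensitively).
    if "?" in public_id:
        for key, values in parse_qs(urlsplit(public_id).query).items():
            if key.lower() == "eventid":
                return values[0]
    # ISC style: smi:ISC/evid=600516598
    match = re.search(r"evid=(\w+)$", public_id)
    if match:
        return match.group(1)
    # NCEDC style: quakeml:nc.anss.org/Event/NC/71377596
    if public_id.startswith("quakeml:nc.anss.org/Event/"):
        return public_id.rsplit("/", 1)[-1]
    # Anything else (e.g. ETHZ's smi:ch.ethz.sed/sc3a/2017eemfch) is unknown.
    return None
```

Every new data center, or any change in an existing one's publicID layout, means another branch.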
While this is probably not a big enough issue to warrant a revision of
the web services spec, it would be really nice if, until the next
revision, there could be a consensus on how to provide the eventid, and
if the next revision then made this mandatory and standardized.
I think I would prefer something simple like the USGS, NCEDC and SCEDC
style, where a simple attribute gives the eventid exactly, without
parsing, like catalog:eventid="71377596". Of course, the drawback is that
this attribute currently comes from a separate schema definition, not
from the QuakeML standard. Using the publicID would be better in that it
is already part of the QuakeML spec, but at present the format of the URI
is too varied and complicated for easy parsing.
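Reading the attribute-based style needs no publicID parsing at all. A minimal sketch with the standard library's ElementTree, assuming the `catalog` prefix is bound to the namespace URI `http://anss.org/xmlns/catalog/0.1` (check the `xmlns:catalog` declaration in real responses); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace URI of the ANSS "catalog" extension attributes.
CATALOG_NS = "http://anss.org/xmlns/catalog/0.1"


def catalog_eventid(event: ET.Element) -> Optional[str]:
    """Return the catalog:eventid attribute of an <event>, or None."""
    # ElementTree stores namespaced attributes under "{uri}localname".
    return event.get(f"{{{CATALOG_NS}}}eventid")
```

Services that do not emit the extension simply yield `None`, so a client still needs a fallback for them.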
Another solution would be to allow the entire publicID to be passed
via the eventid parameter. This would require the server to be able to
parse its own style of publicID, which seems reasonable. However, the
structure of the publicID may also cause problems: it looks like a URL
and so would require escaping/encoding of certain characters. Yet
another solution would be to use the text format for the wide but
shallow query and QuakeML for the deep one, but this has the downside
of requiring the client to parse two unrelated data formats.
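The escaping concern is real but mechanical: percent-encoding the whole publicID keeps its own `?`, `=` and `&` from leaking into the outer query string. A sketch assuming a service that accepts a full publicID as `eventid` (whether a given service does is an assumption); the base URL is illustrative and the function name hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Illustrative base URL, taken from the USGS example publicID.
BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def query_by_public_id(public_id: str) -> str:
    """Build a query URL that passes a complete publicID as eventid."""
    # urlencode() percent-encodes ':', '/', '?', '=' and '&', so a publicID
    # that itself looks like a URL cannot break the outer query string.
    return f"{BASE_URL}?{urlencode({'eventid': public_id})}"
```

On the server side, standard query-string decoding (e.g. `parse_qs`) recovers the original publicID unchanged.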
thanks
Philip
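IRIS
<event publicID="smi:service.iris.edu/fdsnws/event/1/query?eventid=3337497">
NCEDC
<event publicID="quakeml:nc.anss.org/Event/NC/71377596"
catalog:datasource="nc" catalog:dataid="nc71377596"
catalog:eventsource="nc" catalog:eventid="71377596">
SCEDC
<event
publicID="quakeml:service.scedc.caltech.edu/fdsnws/event/1/query?eventid=37300872"
catalog:datasource="ci" catalog:dataid="ci37300872"
catalog:eventsource="ci" catalog:eventid="37300872">
USGS
<event catalog:datasource="us" catalog:eventsource="us"
catalog:eventid="c000lvb5"
publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=usc000lvb5&format=quakeml">
ETHZ
<event publicID="smi:ch.ethz.sed/sc3a/2017eemfch">
INGV
<event
publicID="smi:webservices.ingv.it/fdsnws/event/1/query?eventId=863301">
ISC
<event publicID="smi:ISC/evid=600516598">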
On Fri, Mar 3, 2017 at 7:50 AM, Fabian Euchner <fabian.euchner<at>sed.ethz.ch> wrote:
Hi Philip, hi all,
in the QuakeML world, entities (events, origins, picks) are identified
through the publicID, and *only* through the publicID. The publicID has
been designed in a way that makes it easy to be globally unique (an
authority part, then a resource part that is in the hands of the issuing
agency, which ensures uniqueness). The "legacy" event IDs that are used
in some earthquake catalogs (often just integer numbers) cannot be
globally unique; collisions are likely to occur. When designing QuakeML
there was a long discussion about whether legacy IDs should be part of
the data model, and the consensus was that they shouldn't: first because
their usage should be discouraged (non-uniqueness, not being
future-proof, etc.), and second because there was no semantically
convincing place in the schema to put them.
The fdsnws-event standard says about the eventid query parameter: "event
identifiers are data center specific". It seems that most
implementations expect the legacy ID, not the publicID of the event (in
fact, in your examples, this holds for all data centers except ETH).
Thanks for pointing this out! I was totally unaware of this.
In my opinion this is a serious specification and implementation flaw. In
the next version of the event service spec it should be mandatory that
eventid is the publicID of the event. All services should be queried in
the same way; at present, querying by legacy ID is not possible for ETH,
for example, because ETH has no legacy IDs. In the current situation, the
user has to know which service requires a legacy ID and which service
requires a publicID. In addition, the legacy ID is not necessarily
contained in the returned QuakeML document, since the standard has no
place for it. This makes it hard to find the legacy ID, if it exists at
all (it can be hidden in the publicID, or in an extension attribute that
depends on the data center).
Furthermore, QuakeML publicIDs are designed to be opaque. They may be
compiled from other pieces of information, like timestamps, legacy IDs,
etc., but they need not be; they can just be random strings (the
resource part). Therefore, no user or service should rely on parsing
publicIDs.
Thanks again, Philip, for bringing up this important issue.
Best regards,
Fabian
----------------------
FDSN Working Group III
(http://www.fdsn.org/message-center/topic/fdsn-wg3-products/)
Sent from the FDSN Message Center (http://www.fdsn.org/message-center/)
Update subscription preferences at http://www.fdsn.org/account/profile/
-----------------------------------------------------------------------------
Fabian Euchner phone +41 44 633 7178
Institute of Geophysics fax +41 44 633 1065
ETH Zurich, NO F5 e-mail fabian<at>sed.ethz.ch
Sonneggstrasse 5 orcid.org/0000-0001-6340-7439
8092 Zurich (Switzerland)
-----------------------------------------------------------------------------
QuakeML http://quakeml.org QuakePy http://quakepy.org
CSEP http://www.cseptesting.org/centers/eth
-----------------------------------------------------------------------------
Hello,
USGS handled the collision problem by defining an eventsource (typically
the FDSN network code) as a "namespace" for event IDs, which eliminates
collisions and allows contributors to continue using their existing
event IDs without requiring yet another event ID system. We commented on
this early in the QuakeML 1.2 process (see the previous email to the
QuakeML mailing list below) and implemented a custom extension to QuakeML
to support our requirements
(https://github.com/usgs/eqmessageutils/blob/master/etc/quakeml_1.2/AnssCatalog-0.1.xsd)
while remaining compatible with the original specification.
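Using eventsource as a namespace amounts to concatenating it with eventid. In the USGS example element quoted in this thread, "us" + "c000lvb5" gives "usc000lvb5", the value that appears in that event's publicID. A sketch assuming the extension's namespace URI is `http://anss.org/xmlns/catalog/0.1`; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace URI of the ANSS catalog extension (AnssCatalog-0.1.xsd).
CATALOG = "{http://anss.org/xmlns/catalog/0.1}"


def namespaced_eventid(event: ET.Element) -> Optional[str]:
    """Join catalog:eventsource and catalog:eventid into one identifier."""
    source = event.get(CATALOG + "eventsource")
    code = event.get(CATALOG + "eventid")
    if source is None or code is None:
        return None  # extension attributes absent: nothing to combine
    return source + code
```

Two contributors may both issue, say, eventid "1234"; prefixing each with its own eventsource keeps them apart.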
USGS requirements may differ from those of other organizations, because
we aggregate multiple earthquake catalogs from many contributors into a
single "composite" catalog. We consider an event to have multiple IDs,
one unique ID from each contributor (USGS included), and allow events to
be referenced using any of those IDs. Messages from multiple contributors
are associated based on location in space and time, and automatic
associations can be manually overridden when needed. This balances two
requirements: a) individual organizations can assign unique identifiers
and maintain their own catalogs of events, and b) they can operate
independently of any central authority.
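A composite catalog where any contributor's ID resolves to the same event can be sketched as a simple index. This is a hypothetical illustration, not the USGS implementation: the association decision itself (space/time matching, manual review) is left out, and re-associating an ID with a different event stands in for a manual override.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CompositeEvent:
    """One earthquake in a composite catalog, known under several IDs."""
    preferred_id: str
    ids: List[str] = field(default_factory=list)


class EventIndex:
    """Hypothetical index resolving any contributor ID to its event."""

    def __init__(self) -> None:
        self._by_id: Dict[str, CompositeEvent] = {}

    def associate(self, event: CompositeEvent, contributor_id: str) -> None:
        """Record that contributor_id refers to event (overrides earlier)."""
        previous = self._by_id.get(contributor_id)
        if previous is not None and previous is not event:
            # Manual override: detach the ID from the old association.
            previous.ids.remove(contributor_id)
        if contributor_id not in event.ids:
            event.ids.append(contributor_id)
        self._by_id[contributor_id] = event

    def lookup(self, any_id: str) -> Optional[CompositeEvent]:
        """Return the composite event for any of its IDs, or None."""
        return self._by_id.get(any_id)
```

Because every lookup goes through the index, clients can keep using whichever contributor ID they already have.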
If you want to add support for querying by public IDs, I recommend
defining a new "publicID" parameter for the FDSN service and leaving the
existing eventid parameter unchanged for backward compatibility. An
additional consideration is how a service should handle multiple
versions of the same event element (assuming ordering based on
event/creationInfo/creationTime). It may be simpler to add an explicit
"detailURL" or similar attribute to the event element to support the
specific use case of obtaining more information.
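Choosing among several versions of one event, ordered by event/creationInfo/creationTime as suggested above, could look like the following sketch. It assumes QuakeML 1.2 (namespace `http://quakeml.org/xmlns/bed/1.2`) and ISO 8601 timestamps with an explicit timezone; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List

# QuakeML 1.2 BED namespace; adjust for other schema versions.
BED = "{http://quakeml.org/xmlns/bed/1.2}"


def creation_time(event: ET.Element) -> datetime:
    """Parse event/creationInfo/creationTime into an aware datetime."""
    text = event.findtext(f"{BED}creationInfo/{BED}creationTime")
    if text is None:
        raise ValueError("event has no creationInfo/creationTime")
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def latest_version(versions: List[ET.Element]) -> ET.Element:
    """Return the most recently created of several versions of one event."""
    return max(versions, key=creation_time)
```

Mixing timestamps with and without a timezone would make the comparison fail, which is one reason an explicit "detailURL" may be simpler for clients.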
We introduced our custom extension to support these requirements
(uniquely identifying events, and the individual pieces of information
contributed to those events) because of the suggestion (see below) that
there may be aliases for publicIDs, which would therefore not be
guaranteed to be universally unique. Rather than adding to or changing
the meaning of an otherwise opaque and non-unique identifier, I suggest
that explicit attributes or elements be created for these purposes (or
that the AnssCatalog extension be more widely adopted).
Thanks,
Jeremy
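Previous email to quakeml mailing list (couldn't find the list archives
online):
From: Fabian Euchner <fabian.euchner<at>sed.ethz.ch>
Date: March 23, 2011 9:35:11 AM MDT
To: Jeremy M Fee <jmfee<at>usgs.gov>
Cc: <QuakeML<at>intensity.usc.edu>, <kfelzer<at>gps.caltech.edu>, Michelle
Guy <mguy<at>usgs.gov>
Subject: Re: Fwd: [QuakeML] question about authority-id and resource-id
in public identifiers
Hi Jeremy et al.,
the section on resource identifiers in the standard doc is based on
some, up to now rather theoretical, thoughts on what a resource metadata
framework could look like. This is very much inspired (I could also say
borrowed) from how this is handled in the Astrophysical Virtual
Observatory community ;-) To be honest, I don't know how agencies that
already use QuakeML interpret & implement it, and you are of course
right that it's not specified in detail in the standard doc. Since it is
a standard doc on the markup language, I think it's not the right place
to specify it; there should be a second document that is more focused
on infrastructure. *I think it's pretty clear that an identifier cannot
refer to two different resources, but I could imagine that a resource
can be referenced by more than one identifier from the same authority
(aliases).* Thanks for starting this discussion, it's an important point
if QuakeML starts to play a more important role in our networked
infrastructures.
Cheers,
Fabian
On Fri March 18 2011 20:55:50 Jeremy M Fee wrote:
Hi Fabian,
I received a reply from Karen at CalTech, but I'd like to know if
these assumptions are also safe across all QuakeML implementations:
1) An authority always refers to the same resource using the same
resourceID.
2) When an authority updates a resource, the same resourceID is used.
And as a result of the previous two assumptions:
3) When one authority submits two different event resourceIDs, they
refer to different events.
I've read the QuakeML-BED.pdf for version 1.1, and cannot find
anything imposing this restriction. At USGS we rely on this to 1)
track updates to existing events, and 2) distinguish events that are
so close in space and time they would otherwise be considered the
same event.
Thanks,
Jeremy
Begin forwarded message:
From: Karen Felzer <kfelzer<at>gps.caltech.edu>
Date: March 17, 2011 4:23:32 PM MDT
To: Jeremy M Fee <jmfee<at>usgs.gov>
Cc: quakeml<at>intensity.usc.edu
Subject: Re: [QuakeML] question about authority-id and resource-id
in public identifiers
Yes -- information for the same earthquake should always be reported
under the same earthquake ID number.
regards,
Karen Felzer
On Mar 17, 2011, at 2:22 PM, Jeremy M Fee wrote:
Hi,
Is it a safe assumption that an authority will always refer to the
same resource using the same resource id? Meaning, if an authority
submits an event under one resource id, they will always reuse that
same resource id when updating event information (and identify
version information separately)? This would make it much easier to
recognize updates, versus new information.
Thanks,
Jeremy
_______________________________________________
QuakeML mailing list
QuakeML<at>intensity.usc.edu
http://intensity.usc.edu/mailman/listinfo/quakeml
-------------------------------------------------------------------------------
Fabian Euchner phone +41 44 633 7178
Swiss Seismological Service fax +41 44 633 1065
ETH Zurich, NO F67 e-mail fabian<at>sed.ethz.ch
Sonneggstrasse 5 www.fabian-euchner.de
8092 Zurich (Switzerland)
www.earthquake.ethz.ch/people/feuchner
-------------------------------------------------------------------------------
QuakePy http://quakepy.org CVcat http://cvcat.net
QuakeML http://quakeml.org AstroCat http://astrocat.org
CSEP http://www.cseptesting.org
-------------------------------------------------------------------------------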
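The assumptions in Jeremy's 2011 question (an authority reuses the same identifier when updating an event, and different identifiers from one authority mean different events) reduce to a very small piece of client logic, sketched below with a hypothetical function name. The publicID is compared only as a whole, opaque string. Aliases, which Fabian's 2011 reply considers possible, would break this: the same event arriving under a second identifier would be counted as new.

```python
from typing import Dict

# Latest known data for each event, keyed by the complete publicID string.
Catalog = Dict[str, dict]


def ingest(catalog: Catalog, public_id: str, data: dict) -> str:
    """Store an incoming event message and report "update" or "new".

    Valid only under the assumption that one authority never uses two
    identifiers for the same event (no aliases).
    """
    status = "update" if public_id in catalog else "new"
    catalog[public_id] = data  # the newest message replaces the old one
    return status
```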
Hello Jeremy, hello all,
first, let me apologize if anybody found my comment too harsh or
offensive. That was absolutely not my intention.
Since the fdsnws-event default output format is QuakeML, I assume that
the QuakeML data model is the common minimum standard data model. Since
legacy IDs are not contained therein, I think querying by them should not
be the common standard way to query individual event information.
Therefore, I would suggest that a next iteration of the event service
specification defines a new query parameter, maybe called eventpublicid,
that is implemented by all data centers to query on event publicIDs,
which are mandatory in all result QuakeML documents. If some data centers
want to additionally provide a query parameter for legacy IDs, that's
fine with me. Every user querying with it has to know what she/he is
doing, and how to deal with the results.
All the best,
Fabian
Previous email to quakeml mailing list (couldn't find the list archives
online):
From: Fabian Euchner <fabian.euchner<at>sed.ethz.ch>On Fri, Mar 3, 2017 at 7:50 AM, Fabian Euchner
Date: March 23, 2011 9:35:11 AM MDT
To: Jeremy M Fee <jmfee<at>usgs.gov>
Cc: <QuakeML<at>intensity.usc.edu>, <kfelzer<at>gps.caltech.edu>, Michelle
Guy <mguy<at>usgs.gov>
Subject: Re: Fwd: [QuakeML] question about authority-id and resource-
id in public identifiers
Hi Jeremy et al.,some, up
the section on resource identifiers in the standard doc is based on
to now rather theoretical, thoughts on how a resource metadata
framework could
look like. This is very much inspired (I could also say borrowed)
from how
this is handled in the Astrophysical Virtual Observatory
community ;-) To be
honest, I don't know how agencies that already use QuakeML interpret &
implement it and you are of course right that it's not specified in
detail in
the standard doc. Since it is a standard doc on the markup language,
I think
it's not the right place to specify it, there should be a second
document that
is more focused on infrastructure. *I think it's pretty clear that an*
* identifier cannot refer to two different resources, but I could*
* imagine that a*
* resource can be referenced by more than one identifier from the same*
* authority*
* (aliases).* Thanks for starting this discussion, it's an important
point if
QuakeML starts to play a more important role in our networked
infrastructures.
Cheers,Fabian
On Fri March 18 2011 20:55:50 Jeremy M Fee wrote:Swiss Seismological Service fax +41 44 633 1065
Hi Fabian,
I received a reply from Karen at CalTech, but I'd like to know if
these assumptions are also safe across all QuakeML implementations:
1) An authority always refers to the same resource using the same
resourceID.
2) When an authority updates a resource, the same resourceID is
used
And as a result of the previous two assumptions:
3) When one authority submits two different event resourceIDs, they
refer to different events.
I've read the QuakeML-BED.pdf for version 1.1, and cannot find
anything imposing this restriction. At USGS we rely on this to 1)
track updates to existing events, and 2) distinguish events that are
so close in space and time they would otherwise be considered the
same
event.
Thanks,
Jeremy
Begin forwarded message:
From: Karen Felzer <kfelzer<at>gps.caltech.edu>--
Date: March 17, 2011 4:23:32 PM MDT
To: Jeremy M Fee <jmfee<at>usgs.gov>
Cc: quakeml<at>intensity.usc.edu
Subject: Re: [QuakeML] question about authority-id and resource-id
in public identifiers
Yes -- information for the same earthquake should always be reported
under the same earthquake ID number.
regards,
Karen Felzer
On Mar 17, 2011, at 2:22 PM, Jeremy M Fee wrote:
Hi,
Is it a safe assumption that an authority will always refer to the
same resource using the same resource id? Meaning, if an authority
submits an event under one resource id, they will always reuse that
same resource id when updating event information (and identify
version information separately)? This would make it much easier to
recognize updates, versus new information.
Thanks,
Jeremy
_______________________________________________
QuakeML mailing list
QuakeML<at>intensity.usc.edu
http://intensity.usc.edu/mailman/listinfo/quakeml
-------------------------------------------------------------------------
------>>
Fabian Euchner phone +41 44 633 7178
ETH Zurich, NO F67 e-mail fabian<at>sed.ethz.ch
Sonneggstrasse 5 www.fabian-euchner.de
8092 Zurich (Switzerland)
www.earthquake.ethz.ch/people/feuchner
-------------------------------------------------------------------------QuakePy http://quakepy.org CVcat http://cvcat.net
------>>
QuakeML http://quakeml.org AstroCat http://astrocat.org
CSEP http://www.cseptesting.org
-------------------------------------------------------------------------
------
<fabian.euchner<at>sed.ethz.ch>
wrote:
Hi Philip, hi all,
in the QuakeML world, entities (events, origins, picks) are identified
through the publicID, and *only*" through the publicID. The publicID has
been designed in a way that makes it easy to be globally unique
(authority
part, then resource part that is in the hands of the issueing agency
which
ensures uniqueness). The "legacy" event IDs that are used in some
earthquake catalogs (often just integer numbers) cannot be unique,
collisions are likely to occur. When designing QuakeML there was a long
discussion whether legacy IDs should be part of the data model, and
there
was a consensus that they shouldn't, first because their usage should be
discouraged (non-uniqueness, not being future-proof, etc), and there was
also no semantically convincing place in the schema to put them.
The fdsnws-event standard says when it comes to the eventid query
parameter: "event identifiers are data center specific". It seems that
most
implementations expect the legacy ID, not the publicID of the event (in
fact, in your examples, this holds for all data centers expect for ETH).
Thanks for pointing this out! I was totally unaware of the fact.
In my opinion this is a serious specification and implementation flaw.
In
the next version of the event service spec it should be mandatory that
eventid is the publicID of the event. All services should be queried in
the
same way, but, e.g., for ETH it is not possible, because there are no
legacy IDs. In the current situation, the user has to know which service
requires legacy ID, and which service requires publicID. In addition,
the
legacy ID is not per se contained in the returned QuakeML document, as
it
is not contained in the standard. This makes it hard to find the legacy
ID
if it exists at all (can be hidden in the publicID, or in an extension
attribute which depends on the data center).
Furthermore, QuakeML publicIDs are designed to be opaque. They may be
compiled from other pieces of information, like timestamps, legacy IDs,
etc., but they need not be; they can just be random strings (the
resource part). Therefore, no user or service should rely on parsing
publicIDs.
Thanks again, Philip, for bringing up this important issue.
Best regards,
Fabian
A common access pattern is to first do an initial wide but shallow
query, and then return to do a deep but narrow query. For example,
asking an fdsn-station ws for stations in a box, displaying those on a
map, and then only going back to ask for channels or response for
stations as the user clicks on them. Another example would be to query
an fdsn-event web service for earthquakes, and then return using
things like includeallorigins=true and includearrivals=true to get
more detailed information for a specific earthquake. The combination
of the fdsn-event ws query parameters and the QuakeML XML
specification makes this harder than it should be, I feel, because
while fdsn-event supports a query by eventid, there is no standard way
to put the eventid into the original QuakeML. QuakeML has a publicID
for each event, but its structure is complicated enough that it is
challenging to parse in a way that reliably extracts the value that
should be sent back to the service as eventid.
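A minimal sketch of the two-step pattern against fdsnws-event, assuming the IRIS endpoint URL below (other fdsnws-event services share the /fdsnws/event/1/query path) and using only standard query parameters: starttime/endtime/minmagnitude for the shallow step, and eventid, includeallorigins and includearrivals for the deep one. The function names are illustrative. The open question in this thread is where the value passed as eventid in the second step should come from.

```python
from urllib.parse import urlencode

# Endpoint assumed for illustration; other fdsnws-event services use the same path layout.
BASE = "http://service.iris.edu/fdsnws/event/1/query"

def shallow_query(starttime, endtime, minmagnitude):
    """Wide but shallow: many events, summary information only."""
    params = {"starttime": starttime, "endtime": endtime,
              "minmagnitude": minmagnitude}
    return BASE + "?" + urlencode(params)

def deep_query(eventid):
    """Deep but narrow: one event, with all origins and arrivals."""
    params = {"eventid": eventid, "includeallorigins": "true",
              "includearrivals": "true"}
    return BASE + "?" + urlencode(params)

# The eventid is the IRIS value from the sample events in this thread.
print(shallow_query("2017-01-01", "2017-01-08", 6))
print(deep_query("3337497"))
```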
I have collected example <event> elements from all of the fdsn-event
web services currently listed on
http://www.fdsn.org/webservices/datacenters/
and, as you can see, there is quite a variety of ways of including the
eventid, in the publicID and elsewhere. This makes things harder for
clients, as they need code like
if (host == USGS) { do this; } else if (host == ETHZ) { do that; }
which is hard to maintain and fragile.
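To make the fragility concrete, here is a sketch of the kind of per-host branching a client currently ends up with, written against the sample publicIDs collected in this thread. The host labels, the function name eventid_for_host and every regular expression are hypothetical and reflect only those samples. ETHZ is passed through unchanged because, per the reply in this thread, ETH is the one data center in the samples that expects the publicID itself.

```python
import re

def eventid_for_host(host, public_id):
    """Hypothetical per-host rules derived only from the sample publicIDs.

    Every new data center, or any change in how one of them builds its
    publicIDs, means another branch here.
    """
    if host == "ETHZ":
        # ETH reportedly expects the publicID itself as eventid.
        return public_id
    if host == "ISC":
        m = re.search(r"evid=(\d+)", public_id)
    elif host in ("IRIS", "INGV", "SCEDC", "USGS"):
        # These publicIDs embed an eventid/eventId query parameter.
        m = re.search(r"[?&]eventid=([^&]+)", public_id, re.IGNORECASE)
    elif host == "NCEDC":
        m = re.search(r"/Event/NC/(\d+)$", public_id)
    else:
        raise ValueError(f"no eventid rule for host {host!r}")
    if m is None:
        raise ValueError(f"unexpected publicID format: {public_id!r}")
    return m.group(1)
```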
While this is probably not a big enough issue to warrant a revision of
the web services spec, it would be really nice if, until the next
revision, there could be a consensus on how to provide the eventid,
and if the next revision then made this mandatory and standardized.
I think I would prefer something simple like the USGS, NCEDC and SCEDC
style, where there is a simple attribute that gives the eventid
exactly, without parsing, like catalog:eventid="71377596", but of
course the drawback is that this is currently a separate schema
definition from the QuakeML standard. Using the publicID would be
better in that it is already part of the QuakeML spec, but the format
of the URI is at present too varied and complicated for easy parsing.
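The attribute-based style needs no parsing of the publicID at all. Below is a minimal sketch with Python's standard ElementTree, assuming the QuakeML 1.2 BED namespace and assuming http://anss.org/xmlns/catalog/0.1 as the URI bound to the catalog: prefix (check the xmlns declaration in real responses). The function name event_ids and the trimmed sample document are illustrative. Events without the extension attribute yield None, which is exactly the gap a standardized field would close.

```python
import xml.etree.ElementTree as ET

BED = "http://quakeml.org/xmlns/bed/1.2"
# Assumed URI for the catalog: prefix; verify against the service's xmlns declaration.
CATALOG = "http://anss.org/xmlns/catalog/0.1"

def event_ids(quakeml_text):
    """Return (publicID, catalog:eventid or None) for every <event>."""
    root = ET.fromstring(quakeml_text)
    return [(ev.get("publicID"), ev.get(f"{{{CATALOG}}}eventid"))
            for ev in root.iter(f"{{{BED}}}event")]

# Trimmed-down document built from two of the samples; the eventParameters publicID is a placeholder.
SAMPLE = f"""<q:quakeml xmlns:q="http://quakeml.org/xmlns/quakeml/1.2"
    xmlns="{BED}" xmlns:catalog="{CATALOG}">
  <eventParameters publicID="smi:example.org/eventParameters">
    <event publicID="quakeml:nc.anss.org/Event/NC/71377596"
        catalog:eventsource="nc" catalog:eventid="71377596"/>
    <event publicID="smi:ch.ethz.sed/sc3a/2017eemfch"/>
  </eventParameters>
</q:quakeml>"""

for public_id, eventid in event_ids(SAMPLE):
    print(public_id, "->", eventid)
```

Judging from the USGS sample, even this attribute may not be usable verbatim everywhere: its publicID queries eventid=usc000lvb5, which is catalog:eventsource followed by catalog:eventid.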
Another solution would be to allow the entire publicID to be sent back
via the eventid parameter. This would require the server to be able to
parse its own style of publicID, which seems reasonable. However, the
structure of the publicID may also cause problems, as it looks like a
URL and so would require escaping/encoding of certain characters. Yet
another solution would be to use the text format for the wide but
shallow query and then QuakeML for the deep one, but this has the
downside of requiring the client to parse two unrelated data formats.
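The escaping concern can be illustrated with the standard library. Assuming a service that accepted the full publicID as eventid (hypothetical behaviour, not how most of the sampled services work today), the client must percent-encode the value; naive concatenation lets the ?, = and & inside a URL-like publicID leak into the outer query string. The base URL and the function names query_by_public_id and naive_query are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint that accepts a full publicID as eventid.
BASE = "http://example.org/fdsnws/event/1/query"
USGS_PID = ("quakeml:earthquake.usgs.gov/fdsnws/event/1/query"
            "?eventid=usc000lvb5&format=quakeml")

def query_by_public_id(public_id):
    # urlencode's default quoting (quote_plus with safe='') escapes
    # ':', '/', '?', '=' and '&', so the publicID stays a single value.
    return BASE + "?" + urlencode({"eventid": public_id,
                                   "includeallorigins": "true"})

def naive_query(public_id):
    # Unescaped concatenation: '&format=quakeml' becomes a separate parameter.
    return BASE + "?eventid=" + public_id + "&includeallorigins=true"

encoded = parse_qs(urlsplit(query_by_public_id(USGS_PID)).query)
naive = parse_qs(urlsplit(naive_query(USGS_PID)).query)
print(encoded["eventid"][0] == USGS_PID)  # True: value survives the round trip
print(naive["eventid"][0] == USGS_PID)    # False: truncated at '&'
print(naive.get("format"))                # ['quakeml'], a parameter nobody asked for
```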
thanks
Philip
IRIS
<event publicID="smi:service.iris.edu/fdsnws/event/1/query?eventid=3337497">
NCEDC
<event publicID="quakeml:nc.anss.org/Event/NC/71377596"
    catalog:datasource="nc" catalog:dataid="nc71377596"
    catalog:eventsource="nc" catalog:eventid="71377596">
SCEDC
<event publicID="quakeml:service.scedc.caltech.edu/fdsnws/event/1/query?eventid=37300872"
    catalog:datasource="ci" catalog:dataid="ci37300872"
    catalog:eventsource="ci" catalog:eventid="37300872">
USGS
<event catalog:datasource="us" catalog:eventsource="us"
    catalog:eventid="c000lvb5"
    publicID="quakeml:earthquake.usgs.gov/fdsnws/event/1/query?eventid=usc000lvb5&format=quakeml">
ETHZ
<event publicID="smi:ch.ethz.sed/sc3a/2017eemfch">
INGV
<event publicID="smi:webservices.ingv.it/fdsnws/event/1/query?eventId=863301">
ISC
<event publicID="smi:ISC/evid=600516598">
----------------------
FDSN Working Group III
(http://www.fdsn.org/message-center/topic/fdsn-wg3-products/)
Sent from the FDSN Message Center
(http://www.fdsn.org/message-center/)
Update subscription preferences at
http://www.fdsn.org/account/profile/
-----------------------------------------------------------------------------
Fabian Euchner phone +41 44 633 7178
Institute of Geophysics fax +41 44 633 1065
ETH Zurich, NO F5 e-mail fabian<at>sed.ethz.ch
Sonneggstrasse 5 orcid.org/0000-0001-6340-7439
8092 Zurich (Switzerland)
-----------------------------------------------------------------------------
QuakeML http://quakeml.org QuakePy http://quakepy.org
CSEP http://www.cseptesting.org/centers/eth
-----------------------------------------------------------------------------