Dear all,
EIDA Data Centers welcome and support the recent letter by J. Steim,
consistent with our message posted on July 8th
(http://www.fdsn.org/message-center/thread/413/#m-659).
As we have mentioned and detailed in various emails to this list, we
are also deeply concerned about the potential disruption and
deterioration of services to users caused by a non-backward-compatible
format and the many changes proposed. Therefore, our position remains that, in order to
optimize the process and usage of resources, before getting into the
single items of the proposal we should get a better understanding of
what we really need and want from an extension to existing SEED, how
this can be designed, which are the expected rollout plans and what will
be the implications for all users. Without having this clearly laid down
it is difficult to understand and evaluate if the changes we are
proposing are worth the efforts they will imply throughout our community.
We appreciate the FDSN Chair's support for a meeting in late 2016; the
need for one is also clear from the ongoing discussion. The aim should be
not to discuss two alternative proposals but rather to discuss how we
can reach the goal of maintaining a widely accepted format that
addresses, as far as possible, the shortcomings of the current miniSEED
format, is supported by a wide section of the community, and is actively
embraced by data centers and end users. The meeting we proposed should include an
extensive discussion on what we really need from an extended or new
format and how we get there with a commonly agreed strategy.
As stated in the initial strawman, the main motivation behind
this effort is the need to expand the network code to satisfy the
ever-growing demand for codes: “Many FDSN members recognize that the current
two-character network code needs to expand. The miniSEED format is a
fixed length format and expanding the network code would render the
format incompatible with the current release. Such a small, but
disruptive change affords the opportunity to consider other changes to
the format, allowing the FDSN to address historical issues and create a
new foundation for current and future use.”
Therefore, we proposed a pragmatic, cost-effective way to solve this
issue immediately. Our proposal can nevertheless accommodate a number of
the other issues raised in the strawman, as listed at the bottom of this
e-mail [1].
Before moving forward with this process and iterations we would like to
invite everybody to carefully think about the general purpose of the
changes without being biased by the technical comments or change
proposals on the strawman. This can be done by setting up a dedicated
Working Group (as suggested by J. Steim) or in a dedicated meeting as we
proposed earlier. Indeed the dedicated meeting can be the fundamental
planning forum for this Working Group. In both cases the EIDA member
institutions are ready to actively contribute.
ORFEUS is ready to organize the meeting in Europe (possible location and
date will be communicated later), and travel costs for up to 5 or 6
participants from other continents can be covered/sponsored by ORFEUS or
by the hosting institute in Europe. A tentative agenda can be posted
here and discussed within the next few days. The intention is not to have
two competing proposals, but to discuss and agree jointly on the pathway to
the adoption and rollout of an extended or new standard that should not
be driven only by the urgent need for additional network codes.
Regards,
The ORFEUS/EIDA data centres
http://www.orfeus-eu.org/data/eida/nodes/
[1]
1. Expand the network code.
MS 2.5: Include expanded network code in b1002. Replace network code
in fixed header by "99" or another reserved code.
2. Add a miniSEED version field.
MS 2.5: Probably not needed, but can be included in b1002.
3. Add a data version field.
MS 2.5: Include data version field in b1002.
4. Move important Blockette details into fixed section of the header.
MS 2.5: Not applicable, MS 2.4 blockettes will be kept.
5. Simplify & improve the record start time.
MS 2.5: Not applicable, MS 2.4 time structure will be kept (millisecond
resolution is already supported by blockette 1001).
6. Combine and drop bit flags.
MS 2.5: Not applicable, MS 2.4 bit flags will be kept.
7. Eliminate the time correction field.
MS 2.5: Not applicable, MS 2.4 time correction field will be kept.
8. Forward compatibility mapping.
MS 2.5: Trivial -- since MS 2.5 is a superset of MS 2.4, any MS 2.4 file
is also an MS 2.5 file.
9. General compression and opaque data encodings.
MS 2.5: In MS 2.4, encodings 1..5 (general), 10..18 (FDSN networks) and
30..33 (older networks) are defined. Proposed new encodings 50, 51, 52
and 100 can be added, but should be used only in special cases when
compatibility is not an issue.
10. Add CRC field for validating integrity.
MS 2.5: Include CRC field in b1002. CRC should be calculated over the
entire record, with the CRC bytes assumed to be zero for purposes of the
calculation.
11. Expand the channel codes.
MS 2.5: Include expanded channel code in b1002. Replace channel code in
fixed header by a reserved value.
12. Expand the location identifier.
MS 2.5: Include expanded location identifier in b1002. Replace location
identifier in fixed header by a reserved value.
13. Fixed-point data sample encoding.
MS 2.5: See 9.
14. No SEED 2.4 blockettes, include support for opaque headers.
MS 2.5: Not applicable, MS 2.4 blockettes will be kept. Opaque headers,
though already supported by b2000, could be added to b1002 as well.
15. Eliminate sequence numbers.
MS 2.5: Not applicable, sequence numbers will be kept.
16. Eliminate the timing quality field.
MS 2.5: Not applicable, the timing quality field will be kept.
17. Variable record lengths.
MS 2.5: Not applicable. This is the only addition of MS3 that cannot be
implemented in MS 2.5. On the other hand, the proposal of variable
length records is rather controversial anyway and there are voices
against it.
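For illustration only, items 1 and 10 above could be handled by reading
software roughly as in the following Python sketch. It rests on
assumptions that are not part of any proposal text: the CRC is taken to
be a 32-bit CRC-32 (as provided by zlib) stored big-endian, the offset of
the CRC field within the record is passed in by the caller because the
b1002 layout is not defined here, and all function names are
hypothetical.

```python
import struct
import zlib

# Placeholder named in item 1; the final reserved code is not decided.
RESERVED_NETWORK = "99"
# Assumption: the CRC field is 32 bits wide.
CRC_SIZE = 4


def effective_network(fixed_code, extended_code=None):
    """Return the extended code only when the fixed header holds the
    reserved value and an extended code is actually present."""
    code = fixed_code.strip()
    if code == RESERVED_NETWORK and extended_code:
        return extended_code
    return code


def record_crc(record, crc_offset):
    """CRC-32 over the entire record, with the CRC bytes treated as zero."""
    zeroed = bytearray(record)
    zeroed[crc_offset:crc_offset + CRC_SIZE] = bytes(CRC_SIZE)
    return zlib.crc32(bytes(zeroed)) & 0xFFFFFFFF


def store_crc(record, crc_offset):
    """Return a copy of the record with its CRC field filled in."""
    out = bytearray(record)
    struct.pack_into(">I", out, crc_offset, record_crc(record, crc_offset))
    return bytes(out)


def crc_is_valid(record, crc_offset):
    """Compare the stored CRC with one recomputed over the zeroed record."""
    (stored,) = struct.unpack_from(">I", record, crc_offset)
    return stored == record_crc(record, crc_offset)
```

Because the CRC bytes are zeroed before the calculation, writer and
reader compute the checksum over identical input regardless of what is
currently stored in the field. Software that does not know b1002 would
simply see the reserved network code in the fixed header.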
On 23.08.2016 19:06, Joseph Steim wrote:
Participants in the discussion on the future of miniSEED
International data exchange in earthquake seismology has been effective
over decades because, among other reasons, SEED format has been mostly
static. There are places in SEED into which everything, albeit awkwardly
in some cases, has to fit. This creates a format that everybody may
grouse about equally, but lives within. As a result, there has been a
remarkable level of data sharing across networks. We were one of the
early participants in the design of miniSEED, and as a manufacturer, we
have supplied equipment embracing the advantages of a documented,
common, and efficient format. It has been gratifying to see seismology
benefit so greatly over recent years, helped along by the ability to
share high-quality data. After such a long, successful run, a few of the
format’s capabilities need refreshing, but the design remains sound.
There appear to be two main independent objectives in the present drive
to update miniSEED:
1. Extend representations of certain format elements, such as
network and location codes to accommodate growing needs.
2. Dropping some information now in the archive as defined
entities in favor of sanitizing the information permanently retained to
fit an idealized rendition of the data recorded by field equipment. As a
by-product, the extensible, documented “blockette” system would be
replaced by “opaque” data.
Point 1, it can be argued, is clearly needed, although whether it is
necessary to do a wholesale rewrite of MSEED handling software worldwide
to accomplish this goal is a worthy topic of discussion. These goals
could be accommodated within the existing format, for example, by
definition of new blockettes to contain the extended identifiers. For
example, reserved values could be used for the existing network and
location codes to indicate the presence of extended identifiers. Such an
approach would be forward and backward compatible, and impose minimal
changes on existing global infrastructure. I understand some FDSN
members have voiced a similar opinion that minimally invasive changes
could be developed that would address the requirements.
Point 2 is, as a matter of design philosophy, not a good idea. For an
archival format, as much information as possible about the recording
environment and the equipment should be maintained (and documented, not
filtered out) for potential use decades from now. Some of the
proposals, in the spirit of extensibility, propose moving some
information that is now fully enumerated in the published SEED format
specification into opaque headers - what might be called the information
“gray market”.
The objective of Point 2 is essentially to strip the published format
down to some clean bones, and neither mandate nor even define data
structures that may be pertinent to only one class of equipment in the
format’s definition. This is a nice idea from a data center’s view,
since all the burden of interpreting any information that might have its
formal specification decommissioned would be pushed onto the user. It’s
a bad idea from the point of view of future integrity and maximum
usefulness of the archive, since “opaque” data is likely to be
undocumented, poorly documented, or even omitted altogether as data are
passed from archive to archive over time. A diversity of information
should be supported, and defined in the archival format. The solution to
managing information that may be important to interpretation or future
harvesting is not to eliminate the information, but to document it. For
an analog, imagine WWSSN seismograms that have no writing on the back.
Some of the comments in email threads appear to agree with the point
that more information pertinent to the recording environment, not less,
is better in an archival format.
Of course changing the format in a non-backward-compatible way, as
proposed in the changes driven by Point 2, does risk blowing up a lot of
things that work now. Is it worth it? Ultimately all format definitions
are arbitrary. Much of what is being proposed is effectively an
arbitrary rearrangement. If this were 1988, the cost would be minimal.
Now, frankly, to arbitrarily change fundamental aspects of the design of
what has been one of the most successful collaborative undertakings in
earthquake seismology seems at least unnecessary, if not a wholly
unproductive use of resources. Everyone’s infrastructure will not be
simplified, but complicated by the major bifurcation in the format used
to exchange data worldwide. Every tool will have to support not one, but
both formats. This will not necessarily make things better, but it will
make work. A measured approach to solve the actual problems, such as
inadequate namespaces for certain format elements, might address the
task in a simple, direct, and efficient way that does not create an
enduring burden.
In a spirit of collaboration, we have responded to a number of the
proposed specific points in the relevant email threads. In general,
however, we are opposed to a redesign that would result in non-backward
compatibility. We would support a working group, and would be happy to
serve, to develop an approach to incorporate necessary changes, while
retaining as much backward compatibility as possible.
Best Regards
Dr. Joseph M. Steim
President
Quanterra, Inc.
steim<at>quanterra.com
--
Dr. ANGELO STROLLO
Department 2 Geophysics
Section 2.4 Seismology - GEOFON
Tel.: +49 (0)331/2881285
Mob.: +49 (0)172/8590874
Fax : +49 (0)331/2881277
Email: strollo<at>gfz-potsdam.de
_______________________________________
Helmholtz Centre Potsdam
GFZ German Research Centre For Geosciences
Public Law Foundation State of Brandenburg
Telegrafenberg, 14473 Potsdam
House A3 Room 207
http://geofon.gfz-potsdam.de/