Hi all,
Change proposal #12 to the 2016-3-30 straw man (iteration 1) is attached:
Reduce record length field from 4 bytes to 2 bytes.
Please use this thread to provide your feedback on this proposal by
Wednesday August 24th.
thanks,
Chad
----------------------
Posted to multiple topics:
FDSN Working Group II
(http://www.fdsn.org/message-center/topic/fdsn-wg2-data/)
FDSN Working Group III
(http://www.fdsn.org/message-center/topic/fdsn-wg3-products/)
Sent via IRIS Message Center (http://www.fdsn.org/message-center/)
Update subscription preferences at http://www.fdsn.org/account/profile/
On Aug 11, 2016, at 5:49 PM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
I think there should be a separation between what a datacenter permits in
its ingestion systems and what is allowed in the file format. I have
no problem with a datacenter saying "we only take records less than X
bytes" and it probably also makes sense for datacenters to give out
only small sized records. However, there is an advantage for client
software to be able to save a single continuous timespan of data as a
single array of floats, and 65k is kind of small for that. I know
there is an argument that miniseed is not for post processing, but
that seems to me to be a poor reason as it can handle it and it is
really nice to be able to save without switching file formats just
because you have done some processing. And for the most part,
processing means to take records that are continuous and turn them
into a single big float array, do something, and then save the array
out. Having to undo that combining process just to be able to save in
the file format is not ideal. And keep in mind that if some of the
other changes, like network code length, happen, the existing post
processing file formats like SAC will no longer be capable of holding
new data.
And in this case, the save would likely not compress the data, nor
would it need to do the CRC. I would also observe that the current
miniseed allows records of up to 2 to the 256 power, and datacenters
have not been swamped by huge records.
It is true that big records are bad in certain cases, but that doesn't
mean that they are bad in all cases. I feel the file format should not
be designed to prevent those other uses. The extra 2 bytes of storage
to allow up to 4 GB records seems well worth it to me.
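To make the capacity tradeoff concrete, here is a rough back-of-the-envelope sketch. The 64-byte header and uncompressed 4-byte samples are illustrative assumptions of mine, not values from the straw man:

```python
# Rough capacity of a single record for the two proposed length-field widths.
# Header size and sample encoding are illustrative assumptions only.
HEADER_BYTES = 64
BYTES_PER_SAMPLE = 4  # uncompressed 32-bit floats

def max_samples(length_field_bytes):
    """Largest sample count one record can hold for a given length-field width."""
    max_record_bytes = 2 ** (8 * length_field_bytes) - 1
    return (max_record_bytes - HEADER_BYTES) // BYTES_PER_SAMPLE

print(max_samples(2))  # 2-byte field: 65535-byte records, ~16k samples
print(max_samples(4))  # 4-byte field: ~4 GB records, ~10^9 samples
```

At 100 sps, ~16k samples is under three minutes of data, which is the core of the "single continuous timespan as one array" concern.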
thanks
Philip
On Thu, Aug 11, 2016 at 4:00 PM, Chad Trabant <chad<at>iris.washington.edu> wrote:
Hi all,
Change proposal #12 to the 2016-3-30 straw man (iteration 1) is attached:
Reduce record length field from 4 bytes to 2 bytes.
Please use this thread to provide your feedback on this proposal by
Wednesday August 24th.
thanks,
Chad
On Aug 17, 2016, at 1:06 PM, andres&lt;at&gt;gfz-potsdam.de wrote:

On 08/17/2016 08:18 PM, David Ketchum wrote:

My two cents is that the permitted length should be kept fairly small, so 65k should be fine. I do not know how many times I have dealt with formats like SAC, which can store a large time series segment with only a single timestamp for the first sample, so that the time of the last sample is inaccurate because the digitizing rate is either not constant or is "slightly off". Smaller record sizes force more frequent recording of timestamps and improve timing quality.

I also think variable length records are a really bad idea. I prefer fixed length records on power-of-two boundaries for a variety of reasons. Mostly they permit more rapid access to the data without having to build extensive indices for each data block.

Hi,

One alternative, which would be better suited for real-time, would be using fixed-size "frames" instead of records. Think of a record consisting of a header frame followed by a variable number of data frames. A frame might include a timecode (sequence no.), channel index (for multiplexing) and possibly a CRC. Due to the fixed size, finding the start of a frame would be unambiguous. Compared to a 512-byte mseed 2.x record (header + 7 data frames), latency would be 7 times smaller, because each data frame could be sent separately. And by using more data frames one could reduce overall bandwidth without increasing latency.

Transmitting data in 64-byte chunks was already attempted with mseed 2.4, but unfortunately the total number of samples and the last sample value must be sent before any data. In the new format I would put such values, if needed, into a "summary" frame that would be sent after the data frames.

I like this idea. I've been considering similar concepts, dubbed microSEED, with frames that are not necessarily fixed length. The idea was left out of the straw man because it's a pretty radical change from current miniSEED, where each record is independently usable. Lots of existing software would require significant redesign to read such data. But if this concept could be developed in such a way that multiple frames could be easily reassembled into a next generation miniSEED record, it might be a nice way to satisfy both archiving and real-time transmission needs.
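The fixed-size frame idea can be sketched as follows. The field widths, the CRC-32 choice and the 64-byte frame size are assumptions for illustration only; the proposal itself fixes none of these:

```python
import struct
import zlib

FRAME_SIZE = 64
HDR = struct.Struct("<IHI")           # sequence number, channel index, CRC-32
PAYLOAD_SIZE = FRAME_SIZE - HDR.size  # bytes of sample data per frame

def pack_frame(seq, chan, payload):
    """Build one fixed-size frame; short payloads are zero-padded."""
    payload = payload.ljust(PAYLOAD_SIZE, b"\0")
    return HDR.pack(seq, chan, zlib.crc32(payload)) + payload

def unpack_frame(frame):
    """Validate and split one frame; fixed size makes frame starts unambiguous."""
    seq, chan, crc = HDR.unpack_from(frame)
    payload = frame[HDR.size:]
    if zlib.crc32(payload) != crc:
        raise ValueError("corrupt frame")
    return seq, chan, payload

# Each filled frame can be sent immediately instead of waiting for a full
# 512-byte record -- this is the latency argument in the message above.
frame = pack_frame(7, 0, b"compressed samples")
```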
Hi Andres,

On Aug 19, 2016, at 5:55 AM, andres&lt;at&gt;gfz-potsdam.de wrote:

All existing software would require significant modifications even with the current straw man (especially if variable length records are allowed). SeedLink, Web Services, all user software. The overall cost of the transition would be huge.

If we want to design a format for the next 30 years, we should not restrict ourselves with limitations imposed by the current miniSEED format. On the other hand, if compatibility with the current miniSEED format is desired, just add another blockette to miniSEED 2.x (as suggested by Angelo Strollo earlier) and that's it.

Back to the idea of "frames" -- indeed, some info that is needed for real-time transfer could be stripped in an offline format. If records could be easily converted to frames and vice versa, it would be great. Currently the main problem is forward references (number of samples, detection flags, anything that refers to data that is not yet known when sending the header), so we need a "footer" in addition to the header.

Regards,
Andres.

A footer would work. Alternatively, the "micro" header on each frame could contain: the start time of the primary header (for sequencing), the start time of the first sample in the frame, the number of samples in the frame and any optional headers relevant for the frame (detection). Reassembly to a full record would require summing up the sample counts, combining the optional headers and stripping the micro/frame headers. Some care would be needed with details. If we created such a telemetry framing for otherwise complete "next generation" miniSEED, it would have the advantage of limiting the telemetry complexity to those systems that need it, allowing some degree of separation between the use cases of telemetry, archiving, etc. It's certainly an intriguing line of thought.
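The reassembly Chad describes could look roughly like this. The per-frame header fields and the dict representation are hypothetical; only the summing and stripping logic follows the message:

```python
def reassemble(frames):
    """Merge telemetry frames for one record back into a full record.

    Each frame is assumed to carry its own micro header: the start time of
    the primary header (used for grouping), its first-sample time, its
    sample count, and any optional headers.
    """
    frames = sorted(frames, key=lambda f: f["first_sample_time"])
    return {
        "starttime": frames[0]["first_sample_time"],
        "numsamples": sum(f["numsamples"] for f in frames),  # sum the counts
        "extra_headers": [h for f in frames for h in f["optional_headers"]],
        "payload": b"".join(f["payload"] for f in frames),   # micro headers stripped
    }
```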
Chad Trabant wrote on 19.08.2016 at 08:58:

The idea was left out of the straw man because it's a pretty radical change from current miniSEED where each record is independently usable. Lots of existing software would require significant redesign to read such data.

Thank you, Chad, for addressing an important point: the costs of the new
format!
Do you have a rough idea about what the costs of the transition to an
incompatible new data format would be? Reading this discussion one might
get the impression that the transition would be a piece of cake. A
version change, a few modified headers, an extended network code plus a
few other improvements like microsecond time resolution. Hitherto
stubborn network operators will be forced not to use empty location
codes. But all these benefits will come with a price tag because of the
incompatibility of the new format with MiniSEED.
So what will be the cost of the transition? Who will pay the bill? Will
the costs be spread across the community or will the data centers have
to cover the costs alone?
There are quite a few tasks ahead of "us". "Us" means a whole community
of data providers, data management centers, data users, software
developers, hardware manufacturers. World-wide! I.e., everyone who is
now working with MiniSEED and has got used to it. Everyone!
Tasks will include:
* Recoding of entire data archives
* Software updates. In some cases a redesign will be necessary;
legacy software will simply cease to work with the new format.
* Migrate data streaming and exchange between institutions world-wide.
It is easy to foresee that real-time data exchange, which was pretty
hard to establish in the first place with many partners world-wide, will
be heavily affected by migrating to the new format.
* Request tools: will there be a deadline like "by August 1st, 2017,
00:00:00 UTC, all fdsnws's have to support the new format"? Or will
there be a transition? If so, how will this be organized? Either access
to two archives (one for each format) will be required, or the fdsnws's
will have to be enabled to deliver both formats by conversion on the fly.
* Hardware manufacturers will have to support the new format.
* Station network operators will have to bear the costs of adopting the
new format even though it may not yield any benefit to them.
I could probably add more items to this list but thinking of the above
tasks causes me enough headaches already. That is why I am raising the
cost question publicly now: the proponents of the new format must have
thought about this and probably have some idea of how costly the
transition would be.
Speaking of costs I would like to remind you of the alternative proposal
presented on July 8th by Angelo Strollo on behalf of the major European
data centers. They propose to simply introduce a new blockette 1002 to
accommodate longer network codes but with enough space for additional
attributes such as extended location id's etc. This light-weight
solution is backward compatible with the existing MiniSEED. It is
therefore the least disruptive solution and minimizes the costs of the
transition.
Regards
Joachim
Just want to point out that a new blockette with extended network code
is NOT backwards compatible. Old software that does not recognize the
new blockette (and therefore likely ignores it) will report that it
successfully read the data, but will attribute new data records to the
wrong network. It may appear that this is a lower cost; however, this
would generate a new class of bugs that would likely be subtle and
would persist for decades to come. There is pain in both ways, but I
would much prefer a system that fails obviously when it fails to one
that seems to work but actually is wrong infrequently and in a way
that is hard to notice.
A failure that looks like a failure gets fixed quickly, a failure that
looks like a success can easily persist for a long time, causing much
more damage in the long run.
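Philip's failure mode can be sketched in a few lines. The record representation and the use of blockette number 1002 here are schematic, not real libmseed calls:

```python
def read_network_legacy(record):
    """A legacy reader: unknown blockette types are silently skipped,
    so only the 2-character network code in the fixed header is seen."""
    return record["fixed_header_network"]

def read_network_aware(record):
    """A blockette-1002-aware reader prefers the extended code."""
    for btype, fields in record["blockettes"]:
        if btype == 1002:
            return fields["network"]
    return record["fixed_header_network"]

record = {
    "fixed_header_network": "99",  # reserved placeholder code
    "blockettes": [(1002, {"network": "NEWNET"})],
}
# The legacy reader "succeeds" -- and attributes the data to network 99.
assert read_network_legacy(record) == "99"
assert read_network_aware(record) == "NEWNET"
```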
On Aug 19, 2016, at 8:15 AM, andres&lt;at&gt;gfz-potsdam.de wrote:

A special 2-letter network code can be reserved. AFAIK there are even
some obvious network codes, such as "99" or "XX", that have never been
used. If data records are attributed to network "99", it is quite
obvious what is going on. Yet, if I use my old PQLX to quickly look at
the data, I don't care about the network code.

Wasn't the network code added in SEED 2.3 in the first place? Any issues
known?

I agree with Philip, the proposed network extension blockette has a fundamental problem regarding backwards compatibility. It is only backwards compatible in that it can be read, but critical information will be quietly lost until a large number of legacy readers are replaced (which will take a very long time). Until then, when using legacy readers, all of the functions of a network code (ownership identification, logical station grouping) are lost, with many implications. You can easily imagine older data converters being used for a long time and the expanded network code going missing right away. I predict it wouldn't take very long before network 99 shows up in publications.
Chad Trabant wrote on 20.08.2016 02:01:
I agree with Philip, the proposed network extension blockette has a fundamental problem regarding backwards compatibility. It is only backwards compatible in that it can be read, but critical information will be quietly lost until a large number of legacy readers are replaced (which will take a very long time). Until then, when using legacy readers, all of the functions of a network code (ownership identification, logical station grouping) are lost with many implications.Hallo Chad,
what would be "a very long time"?
First of all note that most of the current infrastructures world-wide
will not be affected by the blockette-1002 extension at all. The reason
for this is that most institutions will simply not produce any data with
1002 blockettes because they don't need the extended attributes. They
will continue to produce and exchange 2.4 MiniSEED just as they have
been for many years. They will not have to upgrade their station
hardware/software in order to produce up-to-date, valid MiniSEED. NO CHANGE!
Of course, "most institutions" is not necessarily all and sooner or
later data with blockette 1002 will start to circulate. This will
require blockette-1002 aware decoders to make use of the extended
attributes.
The obvious question is now: How much time would it take to update
libmseed, qlib, seedlink et al. to support blockette 1002? A week? A
month? A year? A very long time?
As soon as blockette-1002 aware versions of said libraries are
available, the software using them needs to be re-compiled and linked
against them. A lot of software, if not most, is going to be
blockette-1002 enabled that way, without the need for further modifications.
And, very importantly, the software can be made blockette-1002-ready
WELL IN ADVANCE of the actual circulation of blockette-1002 data!
This means specifically: If a consensus about the blockette 1002
structure can be found, say, by December (e.g. AGU), then the work to
make libmseed, qlib, seedlink et al. blockette-1002 ready and
subsequently the software that uses them will take at most a few more
months. With an updated libmseed, software like ObsPy and SeisComP will
support at least the extended attributes out of the box. I haven't
looked at the PQLX details but since it also uses libmseed to read
MiniSEED, a blockette-1002-ready libmseed should allow the transition
with very little (if any) further effort. I am therefore sure that most
relevant, actively maintained software can likewise be made
blockette-1002 ready before the Kobe meeting.
There are, of course, details that need to be addressed. For instance,
the proposed 4-character location identifier and how it is converted to
Earthworm's tracebuf format, as pointed out by Dave. But these problems
would be the same for blockette-1002 MiniSEED and the proposed new format.
You can easily imagine older data converters being used for a long time and the expanded network code going missing right away.

Older data converters WILL continue to work fine with all currently
existing MiniSEED streams. Whereas NO older data converters will work
with ANY data converted to the proposed new and entirely incompatible
format!
I predict it wouldn't take very long before network 99 shows up in publications.

This implies authors who don't have a clue about what a network code is.
How would they be able to correctly use a network code? That's not an
issue of data formats but of channel naming in general.
I do not believe assertions that all users of SEED will think it obvious what is going on with network 99. The grad student doing their work with an old version of PQLX is simply not going to know.

Why not inform the grad student? What does it take for the grad student
to learn that in an FDSN network code context "IU" doesn't stand for
"Indiana University"?
http://www.fdsn.org/networks/detail/IU
That's all! In case that grad student happens to stumble upon "99" then
probably an explanation on http://www.fdsn.org/networks/detail/99 would
help him or her.
As Philip says, it'd be better to break things than quietly continue to work while losing network identifiers.

What do you mean by "things"? The proposed new format and its
implementation would not just break the grad student's PQLX but it would
break ENTIRE INFRASTRUCTURES. World-wide and from bottom to top!
Do you want to disrupt the entire FDSN data exchange to protect the grad
student using an old PQLX from getting a "99" network code? Is that what
you are saying?
Furthermore, even this small update would require modifications to all software chains,

You have a position and are trying your best to defend it. This is
legitimate, of course. But you are exaggerating minor problems in order
to discredit an approach that, you cannot deny, would be a lot less
disruptive and expensive than the proposed new format.
from data generation

No modifications are needed at the stations. Stations continue to
produce 2.4 MiniSEED, which remains valid. There is no need to
produce blockette 1002 except for stations that e.g. have extended
network or location codes. There will not be many (if any) in currently
existing networks.
to data centers

Data centers are the ones that benefit most from the continuity that the
blockette-1002 approach would allow, because they neither need to recode
entire archives nor have to provide "old" and "new" data formats in
parallel.
to users

Only users that actually use blockette-1002 data. If these users use
up-to-date versions of actively maintained software such as ObsPy,
SeisComP or MiniSEED-to-SAC converters, they will not notice any
difference. Legacy software will continue to work, with the exception of
the network code, which will show up as "99".
along with database schemas, protocols, etc., etc.

There are some cases where updates will require further effort. We
already read about Earthworm and the limited space for the location
identifier in the current Tracebuf2 format. But the effort at the
Earthworm end to accommodate a longer location identifier would be the
same for blockette-1002 data as for the proposed new format. It is
therefore understandable that the Earthworm community has reservations
about an extended location code, because it would have to pay the price
for something it probably doesn't need.
In general chances are high that most database schemas will remain
unaffected as well as most protocols.
But I am curious to hear about specific database schemas that would be
more difficult to update to blockette-1002 MiniSEED than to the proposed
new format.
That is a huge amount of work for such a small change.

I hope to have pointed out by now that the work required to implement
blockette 1002 would in fact be dramatically less compared to the work
required to upgrade entire infrastructures (indeed from the data loggers
all the way to data users) to a fully incompatible new format.
And now we are back at the beginning of this conversation that started in ~2013.

What conversation are you referring to?
Cheers
Joachim
Just like to point out that merely upgrading a library, like libmseed, to parse a new blockette does not suddenly make older software compatible with a longer network code.

The structure in libmseed that holds the record header attributes is 'MSRecord'. If the decoder of an updated libmseed sees a blockette 1002, it will have to take the information about the network code etc. from there and populate the MSRecord accordingly. That's all. The software will then use or copy the content of MSRecord.network, which by the way is large enough already (10 characters plus '\0') to accommodate the extended network code.

If the software itself is not also upgraded to use the information in the new blockette then the new information is effectively ignored.

There will of course be target data structures in which the network code is hard-coded to be only two characters long. In such cases (hopefully) only two characters are copied. I haven't found any software in which this would be an actual issue. There *is* a similar issue, though, with the extended location code and the Earthworm Tracebuf2 structure. This will be a pain to solve within the Earthworm community, but neither blockette 1002 nor the proposed new format can be blamed for it. It's a limitation of Earthworm that is due to the current SEED channel naming conventions.

I feel that this idea that there is a non-disruptive, easy "fix" to expanding the network code is unrealistic.

There will never be a solution involving zero effort.
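The update path described for libmseed can be sketched as follows. The parsing is mocked; the only fact taken from the message is that MSRecord.network holds 10 characters plus a terminating '\0':

```python
NETWORK_CHARS = 10  # MSRecord.network: 10 characters plus '\0', per the message

def populate_msrecord(msrecord, fixed_header_net, blockettes):
    """Fill MSRecord-like fields, preferring an extended network code
    from a (hypothetical) blockette 1002 when one is present."""
    net = fixed_header_net
    for btype, fields in blockettes:
        if btype == 1002 and "network" in fields:
            net = fields["network"]
    # The existing field is already wide enough for extended codes.
    msrecord["network"] = net[:NETWORK_CHARS]
    return msrecord

rec = populate_msrecord({}, "99", [(1002, {"network": "NEWNET"})])
```

Software that simply copies MSRecord.network after relinking against an updated library would pick up the extended code without further changes, which is the point being argued.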
> Just want to point out that a new blockette with extended network code
> is NOT backwards compatible.

As I wrote before, it *is* backward compatible with the *existing*
MiniSEED, which is *all* MiniSEED currently existing in *all* archives.
I didn't write "blockette-1002 MiniSEED", because it is obvious that
attributes specific to blockette 1002 need to be retrieved from there.

On 19.08.2016 at 08:58, Chad Trabant wrote:

> The idea was left out of the straw man because it's a pretty radical
> change from current miniSEED where each record is independently
> usable. Lots of existing software would require significant redesign
> to read such data.

Thank you, Chad, for addressing an important point: the costs of the new
format!
Do you have a rough idea about what the costs of the transition to an
incompatible new data format would be? Reading this discussion one might
get the impression that the transition would be a piece of cake. A
version change, a few modified headers, an extended network code plus a
few other improvements like microsecond time resolution. Hitherto
stubborn network operators will be forced to stop using empty location
codes. But all these benefits will come with a price tag because of the
incompatibility of the new format with MiniSEED.
So what will be the cost of the transition? Who will pay the bill? Will
the costs be spread across the community or will the data centers have
to cover the costs alone?
There are quite a few tasks ahead of "us". "Us" means a whole community
of data providers, data management centers, data users, software
developers, hardware manufacturers. World-wide! I.e., everyone who is
now working with MiniSEED and has got used to it. Everyone!
Tasks will include:
* Recoding of entire data archives
* Software updates. In some cases redesign will be necessary, while
legacy software will just cease to work with the new format.
* Migration of data streaming and exchange between institutions world-wide.
It is easy to foresee that real-time data exchange, which was pretty
hard to establish in the first place with many partners world-wide, will
be heavily affected by migrating to the new format.
* Request tools: will there be a deadline like "by August 1st, 2017,
00:00:00 UTC", by which all fdsnws services have to support the new
format? Or will there be a transition period? If so, how will it be
organized? Either access to two archives (one per format) will be
required, or the fdsnws services will have to deliver both formats by
conversion on the fly.
* Hardware manufacturers will have to support the new format.
* Station network operators will have to bear the costs of adopting the
new format even though it may not yield any benefit to them.
I could probably add more items to this list, but thinking of the above
tasks causes me enough headaches already. That is why I am publicly
raising the cost question now: the proponents of the new format must
have thought about this and probably have some idea of how costly the
transition would be.
Speaking of costs I would like to remind you of the alternative proposal
presented on July 8th by Angelo Strollo on behalf of the major European
data centers. They propose to simply introduce a new blockette 1002 to
accommodate longer network codes, but with enough space for additional
attributes such as extended location IDs. This lightweight
solution is backward compatible with the existing MiniSEED. It is
therefore the least disruptive solution and minimizes the costs of the
transition.
Regards
Joachim
On Aug 19, 2016, at 5:38 PM, Chad Trabant <chad<at>iris.washington.edu> wrote:
Hi Dave,
> I also think variable length records is a really bad idea. I prefer
> fixed length records on power of two boundaries for a variety of
> reasons. Mostly it permits more rapid accessing of the data without
> having to build extensive indices for each data block.

Can you share some of the other reasons?
I get the rapid access reasoning I think. As I've heard it described where one makes some educated guesses about where the data are in a file and skips around until you zero-in on the correct record(s).
The notion of a variable record length has been raised a number of times in the past, we finally added it to the straw man for these reasons:
a) In many ways it is a better fit for real time streams. No more waiting to "fill a record" or transmitting unfilled records, latency is much more controllable without waste. Also, data are usually generated at a regular rate, if one would like to package and transmit them at a regular rate with compression the output size is not readily predictable.
b) Adjustments to records such as adding optional headers become much easier. In 2.x miniSEED if you wanted to, for example, add a blockette but there is not enough room you are stuck with re-encoding the data into unfilled records or reprocessing a lot of data to pack it efficiently.
I'm on the fence with this one and would appreciate hearing about any other pros and cons regarding variable versus fixed record lengths.
thanks,
Chad
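The "zero in" access pattern described above works because with a fixed record length the offset of record k is simply k * reclen, so a sample time can be located by binary search with no per-block index. A minimal sketch over synthetic records (the 8-byte big-endian start-time field at offset 0 is a made-up layout for illustration, not miniSEED):

```python
import struct
from io import BytesIO

RECLEN = 512  # fixed, power-of-two record length

def make_record(start_time: int) -> bytes:
    """Synthetic record: 8-byte big-endian start time, zero padding."""
    return struct.pack(">Q", start_time).ljust(RECLEN, b"\0")

def find_record(f, target: int, nrecords: int) -> int:
    """Binary-search the index of the last record starting at or before target."""
    lo, hi = 0, nrecords - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        f.seek(mid * RECLEN)              # offset is computable: no index needed
        (t,) = struct.unpack(">Q", f.read(8))
        if t <= target:
            lo = mid
        else:
            hi = mid - 1
    return lo

# 100 records starting at t = 1000, 1010, ..., 1990
f = BytesIO(b"".join(make_record(1000 + 10 * i) for i in range(100)))
print(find_record(f, 1234, 100))          # -> 23 (record starting at t=1230)
```

With variable-length records this trick no longer works: every lookup either scans forward from the start or consults an index built beforehand, which is the cost being weighed here.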
On Aug 17, 2016, at 11:18 AM, David Ketchum <dckgov<at>stw-software.com> wrote:
Hi,
My two cents is that the permitted length should be kept fairly small, so 65k should be fine. I do not know how many times I have dealt with formats like SAC, which can store a large time series segment with only a single timestamp for the first sample and have the time of the last sample be inaccurate because the digitizing rate is either not constant or is “slightly off”. Smaller record sizes force more frequent recording of timestamps and improve timing quality.
I also think variable length records is a really bad idea. I prefer fixed length records on power of two boundaries for a variety of reasons. Mostly it permits more rapid accessing of the data without having to build extensive indices for each data block.
Dave
On Aug 11, 2016, at 5:49 PM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
I think there should be a separation from what a datacenter permits in
its ingestion systems and what is allowed in the file format. I have
no problem with a datacenter saying "we only take records less than X
bytes" and it probably also makes sense for datacenters to give out
only small sized records. However, there is an advantage for client
software to be able to save a single continuous timespan of data as a
single array of floats, and 65k is kind of small for that. I know
there is an argument that miniseed is not for post processing, but
that seems to me to be a poor reason as it can handle it and it is
really nice to be able to save without switching file formats just
because you have done some processing. And for the most part,
processing means to take records that are continuous and turn them
into a single big float array, do something, and then save the array
out. Having to undo that combining process just to be able to save in
the file format is not ideal. And keep in mind that if some of the
other changes, like network code length, happen, the existing post
processing file formats like SAC will no longer be capable of holding
new data.
And in this case, the save would likely not compress the data, nor
would it need to do the CRC. I would also observe that the current
miniseed allows records of up to 2 to the 256 power, and datacenters
have not been swamped by huge records.
It is true that big records are bad in certain cases, but that doesn't
mean that they are bad in all cases. I feel the file format should not
be designed to prevent those other uses. The extra 2 bytes of storage
to allow up to 4Gb records seems well worth it to me.
thanks
Philip
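The sizes being argued over are just the ceilings implied by the width of a binary record-length field; a quick check (the single-byte power-of-two exponent of current SEED is shown for comparison, which is where the enormous theoretical maximum Philip mentions comes from):

```python
# Ceilings implied by the width of a binary record-length field.

def max_record_len(field_bytes: int) -> int:
    """Largest value an unsigned field of this many bytes can hold."""
    return 2 ** (8 * field_bytes) - 1

print(max_record_len(2))   # 65535 -- the "65k" ceiling of proposal #12
print(max_record_len(4))   # 4294967295 -- the ~4 GB ceiling Philip argues for

# Current SEED instead stores a power-of-two exponent in a single byte, so
# record lengths are 2**exponent bytes -- hence the astronomically large
# theoretical maximum noted above. A typical value in practice:
print(2 ** 12)             # 4096-byte records (exponent 12)
```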