You are confusing data representation with data presentation. The flaws in 
Excel are NOT issues with the data format. So long as the format clearly and 
consistently represents the content, the representation is "good".

If you want to overcome limitations in Excel's presentation (import, 
interpretation), that's an Excel issue. You can overcome it by doing the 
import manually and explicitly asserting the data type of each column, or by 
building something more custom.

Realize that, more and more, your data is likely to be consumed by some other 
data science tool (R, Python/numpy/pandas, etc.), and you quickly see that 
pushing Excel issues into the data representation layer is a losing proposition.
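For illustration, a minimal sketch (using pandas, one of the tools named above; 
the column names here are made up) of what "explicitly asserting the data type 
of each column" looks like on the consumer side:

```python
import io

import pandas as pd

# A CSV snippet Excel would happily mangle on import: "12.2021" becomes a
# date, and the leading zero in "007" disappears.
raw = "period,badge,power\n12.2021,007,1.5e-3\n01.2022,042,2.1e-3\n"

# Asserting every column's type up front removes the guesswork.
df = pd.read_csv(
    io.StringIO(raw),
    dtype={"period": str, "badge": str, "power": float},
)

print(df["period"].tolist())  # ['12.2021', '01.2022'] -- still strings
print(df["badge"].tolist())   # ['007', '042'] -- leading zeros intact
```

The data file is unchanged; only the reader's interpretation is pinned down.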

---
Jim Melton


-----Original Message-----
Sent: Thursday, May 4, 2023 01:40
To: discuss-gnuradio@gnu.org
Subject: [EXTERNAL] Re: Getting GPS data into stream

Hey Marcus,

as you say, for a lot of science you don't get high rates – so I'm really less 
worried about that. I'm more worried about Excel interpreting some singular 
data point as a date; or, as soon as we involve textual data, all the fun with 
encodings, quoting/delimiting/escaping… (not to mention that an Excel set to 
German might interpret different things as numbers than a North American one).

I wish there were just one good CSV standard that tools adhered to. Alas, 
that's not the case, and Excel especially has a habit of autoconverting input 
and losing data at that point.

So I'm looking for an alternative that has these well-defined constraints and 
isn't as focused on hierarchical data (JSON, YAML, XML), isn't far too verbose 
despite being excellent to query with command-line tools (XML), and isn't 
completely impossible to parse correctly, as human or parser, in its full 
beauty (YAML)… Just some tabular data notation that's textual, appendable, and 
not a game of guesswork for the reading tool.
We could just canonicalize calling all our files

marcusdata.utf8.textalwaysquoted.iso8601.headerspecifies_fieldname_parentheses_type.csv

but even that wouldn't solve the issue of Excel seeing an unquoted 12.2021 and 
deciding the field is about Christmases past.
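(For what it's worth, the "textalwaysquoted" part at least is easy to produce 
with the Python stdlib – a sketch, not a fix for Excel's import behaviour:

```python
import csv
import io

buf = io.StringIO()
# QUOTE_ALL wraps every field in quotes, so "12.2021" at least leaves our
# side unambiguously marked as text.
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["period", "power"])
writer.writerow(["12.2021", 1.5e-3])

print(buf.getvalue())  # every field quoted, e.g. "12.2021","0.0015"
```

Whether the reading tool honours those quotes is, of course, exactly the 
problem.)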

So, maybe we just do some rootless JSON format that starts with a SigMF object 
describing the file and its columns, and then basically is just a sequence of 
JSON arrays:

[ 1.212e-1, 0, "Müller", 24712388823 ]
[ 1.444e-2, 1, "📡🔭  \"👽\"!", 11111111111 ]
[ 2.0115e-1, 0, "Cygnus-B", 0 ]

(I'm not even sure that's not valid JSON; gut feeling tells me we should be 
putting [] around the whole document, but we don't want that for streaming 
purposes. ECMA-404 doesn't seem to *forbid* it.)
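A reader for that layout really is trivial with stdlib tools – a minimal 
sketch, where the header keys and column names are invented for illustration, 
not actual SigMF fields:

```python
import json

# One JSON value per line: a describing object first, then one array per
# record. The header keys below are illustrative, not real SigMF fields.
stream = """\
{"datatype": "table", "columns": ["snr", "flag", "source", "timestamp"]}
[ 1.212e-1, 0, "Müller", 24712388823 ]
[ 1.444e-2, 1, "Cygnus-B", 0 ]
"""

lines = iter(stream.splitlines())
header = json.loads(next(lines))  # simpler tools can just skip this line
rows = [dict(zip(header["columns"], json.loads(line))) for line in lines]

print(rows[0]["source"])  # Müller
```

Since each record is a complete JSON value on its own line, appending to the 
file and streaming it both work without ever rewriting earlier content.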

That way, we get the metadata in a format that's easy for simpler tools to 
skip but trivial to parse with the right tools (I've grown to like `jq`), and 
the data in a well-defined format. Sure, you still can't dump that into Excel, 
but you know what, if it comes down to it, we can have a Python script that 
takes these files and actually converts them to valid XLSX without the 
misconversion footguns, and that same tool could also be run in a browser for 
those having a hard time executing Python on their machines.

Cheers,
Marcus
On 03.05.23 23:05, Marcus D. Leech wrote:
> On 03/05/2023 16:51, Marcus Müller wrote:
>>
>> Do agree, but I really don't like CSV: too underspecified a format, too
>> many ways it comes back to bite you (aside from a thousand SDR users
>> writing emails that their PC can't keep up with writing a few MS/s of
>> CSV…)
> I like CSV because you can hand your data files to someone who doesn't
> have a complete suite of astrophysics tools, and they can slurp it into
> Excel and play with it.
> 
>>
>> How important is plain-textness in your applications?
> I (and many others in my community) tend to throw ad-hoc tools at data
> from ad-hoc experiments.  In the past, I used a lot of AWK to
> post-process data, and these days I use a lot of Python.  Text-based
> formats lend themselves well to this kind of processing.  Rates are
> quite low, typically: like logging an integrated power spectrum a few
> times a minute, for example.
> 
> There are other observing modes where text-based formats aren't quite so
> obvious, like pulsar observations, where filterbank outputs might be
> recorded at 10s of kHz and then post-processed with any of a number of
> pulsar tools.
> 
> In all of this, part of the "science" is extracted in "real-time" and part in 
> post-processing.
> 
> 
>>
>> Best,
>> Marcus
>>

