Re: Unicode Digest, Vol 56, Issue 20

2018-08-30 Thread Doug Ewell via Unicode
UnicodeData.txt was devised long before any of the other UCD data files. Though it might seem like a simple enhancement to us, adding a header block, or even a single line, would break a lot of existing processes that were built long ago to parse this file. So Unicode can't add a header to this
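
The compatibility concern can be illustrated with a minimal sketch of the kind of long-standing parser the message describes. It assumes only the documented layout of UnicodeData.txt (15 semicolon-separated fields per line, no comments, no header); the function name `parse_unicode_data` and the header text used in the usage note are hypothetical.

```python
# Two real records from UnicodeData.txt, each with 15 ';'-separated fields.
SAMPLE = (
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041\n"
)


def parse_unicode_data(text):
    """Naive parser (hypothetical): assumes every line is a data record."""
    table = {}
    for line in text.splitlines():
        fields = line.split(";")
        # Field 0 is the code point in hex; a header line would fail here.
        code_point = int(fields[0], 16)
        # Field 1 is the character name, field 2 the General_Category.
        table[code_point] = (fields[1], fields[2])
    return table


table = parse_unicode_data(SAMPLE)
print(table[0x41])
```

Prepending any header line, for instance a hypothetical `# UnicodeData header`, makes `int(fields[0], 16)` raise `ValueError` in a parser written this way, which is the breakage the message warns about.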

Re: Unicode Digest, Vol 56, Issue 20

2018-08-30 Thread Philippe Verdy via Unicode
Well, an alternative to XML is JSON which is more compact and faster/simpler to process; however JSON has no explicit schema, unless the schema is being made part of the data itself, complicating its structure (with many levels of arrays of arrays, in which case it becomes less easy to read by huma
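
The trade-off described here can be sketched roughly as follows. This is only an illustration, not a proposed UCD format: the field list, the three sample records, and the `fields`/`rows` layout are assumptions chosen for the example. One encoding repeats the property names in every record; the other carries the "schema" once, inside the data, and stores the records as arrays of arrays.

```python
import json

# Illustrative property names and records (assumed for this sketch).
FIELDS = ["cp", "na", "gc", "ccc"]
ROWS = [
    ["0041", "LATIN CAPITAL LETTER A", "Lu", 0],
    ["0061", "LATIN SMALL LETTER A", "Ll", 0],
    ["0301", "COMBINING ACUTE ACCENT", "Mn", 230],
]


def as_objects(fields, rows):
    """Self-describing form: every record repeats the property names."""
    return [dict(zip(fields, row)) for row in rows]


def as_table(fields, rows):
    """Compact form: the field list is stored once, rows are bare arrays."""
    return {"fields": fields, "rows": rows}


def from_table(table):
    """Rebuild self-describing records from the compact form."""
    return [dict(zip(table["fields"], row)) for row in table["rows"]]


verbose = json.dumps(as_objects(FIELDS, ROWS), separators=(",", ":"))
compact = json.dumps(as_table(FIELDS, ROWS), separators=(",", ":"))
print(len(verbose), len(compact))
```

The compact form is shorter, but a row such as `["0301", "COMBINING ACUTE ACCENT", "Mn", 230]` is only readable with the field list at hand, which is the loss of human readability the message points out.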

Re: Unicode Digest, Vol 56, Issue 20

2018-08-30 Thread Marcel Schneider via Unicode
Thank you for looking into this. First, I’m unable to retrieve the publication you are citing, but a February thread had nearly the same subject, referring to Vol. 50. How did you compute these figures? Is that a code phrase to say: “The same questions over and over again; let’s settle this o

UCD in XML or in CSV? (was: Re: Unicode Digest, Vol 56, Issue 20)

2018-08-30 Thread Marcel Schneider via Unicode
On 30/08/18 23:34 Philippe Verdy via Unicode wrote: > > Well, an alternative to XML is JSON which is more compact and faster/simpler > to process; Thanks for pointing out the problem and the solution alike. Indeed the main drawback of the XML format of UCD is that it results in an “insane” filesize

Re: UCD in XML or in CSV? (was: Re: Unicode Digest, Vol 56, Issue 20)

2018-08-30 Thread Marius Spix via Unicode
A good compromise between human readability, machine processability and filesize would be using YAML. Unlike JSON, YAML supports comments, anchors and references, multiple documents in a file and several other features. Regards, Marius Spix On Fri, 31 Aug 2018 06:58:37 +0200 (CEST) Marcel Schn

Re: UCD in XML or in CSV? (was: Re: Unicode Digest, Vol 56, Issue 20)

2018-08-31 Thread Manuel Strehl via Unicode
To handle the UCD XML file a streaming parser like Expat is necessary. For codepoints.net I use that data to stuff everything in a MySQL database. If anyone is interested, the code for that is Open Source: https://github.com/Codepoints/unicode2mysql/ The example for handling the large XML file c
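
As a rough illustration of the streaming approach (not the code from the linked repository), the sketch below uses Python's standard-library Expat binding, `xml.parsers.expat`. The inline sample imitates the flat UCD XML layout, with one `char` element per code point carrying `cp`, `na` and `gc` attributes; the function name `stream_chars` and the 16-byte chunk size are assumptions made for the example.

```python
import xml.parsers.expat

# Tiny stand-in for the real multi-megabyte file (two records only).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
 <repertoire>
  <char cp="0041" na="LATIN CAPITAL LETTER A" gc="Lu"/>
  <char cp="0061" na="LATIN SMALL LETTER A" gc="Ll"/>
 </repertoire>
</ucd>"""


def stream_chars(chunks):
    """Feed byte chunks to Expat and collect (code point, name, gc) rows."""
    rows = []

    def start_element(name, attrs):
        # Called once per opening tag; the document tree is never built.
        if name == "char" and "cp" in attrs:
            rows.append((int(attrs["cp"], 16),
                         attrs.get("na", ""),
                         attrs.get("gc", "")))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    for chunk in chunks:
        parser.Parse(chunk, False)   # more input follows
    parser.Parse(b"", True)          # signal end of input
    return rows


# Simulate reading the file in small pieces.
chunks = [SAMPLE[i:i + 16] for i in range(0, len(SAMPLE), 16)]
rows = stream_chars(chunks)
print(rows)
```

The list here is only for demonstration; with the full file, each row would more plausibly be handed to a database insert inside the handler so that memory use stays flat.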

Re: UCD in XML or in CSV? (was: Re: Unicode Digest, Vol 56, Issue 20)

2018-08-31 Thread Marcel Schneider via Unicode
On 31/08/18 08:25 Marius Spix via Unicode wrote: > > A good compromise between human readability, machine processability and > filesize would be using YAML. > > Unlike JSON, YAML supports comments, anchors and references, multiple > documents in a file and several other features. Thanks for advi

Re: UCD in XML or in CSV? (was: Re: Unicode Digest, Vol 56, Issue 20)

2018-09-01 Thread Marius Spix via Unicode
Hello Marcel, YAML supports references, so you can refer to another character’s properties. Example: repertoire: char: - name_alias: - [NUL,abbreviation] - ["NULL",control] cp: na1: "NULL" props: & age: "1.1" na: "" JSN: "" gc: Cc ccc: 0