Re: NamesList.txt as data source

Asmus Freytag Thu, 10 Mar 2016 18:16:45 -0800

On 3/10/2016 5:49 PM, "J. S. Choi" wrote:

One thing about NamesList.txt is that, as far as I have been able to tell, it’s 
the only machine-readable, parseable source of those annotations and 
cross-references.

There are explanations about character use that are only maintained in the PDF of the core specification, where this information is packaged in a way that can be understood by a human reader, but is not amenable to extraction by machine.

While the annotations, comments, cross references etc. in NamesList.txt appear, formally, to be machine extractable, the way they are created and managed makes them just as much "human-accessible" only as the core specification.

The goal of getting a complete and machine-readable description of character behavior is illusory.


As part of the Unicode Standard and the UCD, the name lists’ annotations and 
cross-references contain much useful data on the intended usage of characters 
and code points beyond the core specification’s chapters. I have long held an 
interest in making the name-list data more universally accessible to the 
general public, especially to visually impaired people—i.e., using 
screen-reader-friendly HTML rather than PDF—while making clear that the 
annotations are merely references to the original, normative Standard’s actual 
code charts and name lists.

This is a different issue. NamesList.txt is a reasonable source for driving _formatting_ programs other than just Unibook. In fact, the possibility of reuse in this context was probably among the unstated rationales for making the information and syntax available in the first place.

Let's understand this properly: using the file to translate it into a "human-readable" output format is a proper use of this data, even if that translation is done using a mechanical tool, as long as the format is

a) a format that benefits from the special shortcuts taken in selecting the information present in the NamesList.txt file,
b) a format intended to be interpreted by an observant and intelligent human reader, and not
c) a format intended as direct input to any text-processing algorithm, or any algorithm that "understands" the contents.
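
To make the distinction concrete, here is a minimal sketch of the kind of formatting-only use described above: it turns a NamesList-style excerpt into an HTML definition list for a human reader and does not try to interpret the annotations. The function name `names_list_to_html`, the label wording, and the handling of only three line prefixes (`=`, `*`, `x`) are assumptions made for illustration; the real file has more line types, and this is not how Unibook works.

```python
import html

# Assumed subset of annotation prefixes (each follows a leading TAB).
# The labels are illustrative wording, not terms taken from the file.
MARKERS = {"=": "alias", "*": "note", "x": "see also"}

def names_list_to_html(text):
    """Render a NamesList-style excerpt as an HTML definition list.

    The annotations are passed through as text for a human reader;
    nothing here attempts to "understand" their contents.
    """
    out = ["<dl>"]
    for line in text.splitlines():
        # Skip blank lines, '@' header/notice lines and ';' comment lines.
        if not line or line.startswith(("@", ";")):
            continue
        if line.startswith("\t"):
            # Annotation line: TAB, a one-character marker, a space, the text.
            marker, _, rest = line[1:].partition(" ")
            label = MARKERS.get(marker)
            if label is None:
                continue  # line types outside the assumed subset are ignored
            out.append(f"<dd>{label}: {html.escape(rest)}</dd>")
        else:
            # Character line: code point, TAB, character name.
            code, _, name = line.partition("\t")
            out.append(f"<dt>U+{html.escape(code)} {html.escape(name)}</dt>")
    out.append("</dl>")
    return "\n".join(out)
```

The output stays in category b) above: it is meant to be read, for example through a screen reader, and is not meant to be fed to another algorithm.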


What are these other primary sources that maintain these other annotation data; 
are they publicly available? If the name list is the only place where these 
sources’ data have been published, then, for better or for worse, the name list 
is all that is available for much information on many code points’ usage.

See my first through third paragraphs.

A./
