On 3/10/2016 5:49 PM, "J. S. Choi" wrote:
One thing about NamesList.txt is that, as far as I have been able to tell, it’s 
the only machine-readable, parseable source of those annotations and 
cross-references.

There are explanations about character use that are maintained only in the PDF of the core specification, where the information is packaged in a way that a human reader can understand but that is not amenable to extraction by machine.

While the annotations, comments, cross-references, etc. in NamesList.txt formally appear to be machine-extractable, the way they are created and managed makes them just as "human-accessible only" as the core specification.

The goal of getting a complete and machine-readable description of character behavior is illusory.

As part of the Unicode Standard and the UCD, the name lists’ annotations and 
cross-references contain much useful data on the intended usage of characters 
and code points beyond the core specification’s chapters. I have long held an 
interest in making the name-list data more universally accessible to the 
general public, especially to visually impaired people—i.e., using 
screen-reader-friendly HTML rather than PDF—while making clear that the 
annotations are merely references to the original, normative Standard’s actual 
code charts and name lists.

This is a different issue. NamesList.txt is a reasonable source for driving _formatting_ programs other than Unibook. In fact, the possibility of reuse in this context was probably among the unstated rationales for making the information and syntax available in the first place.

Let's understand this properly: translating the file into a "human-readable" output format is a proper use of this data, even if the translation is done with a mechanical tool, as long as the format is a) one that benefits from the special shortcuts taken in selecting the information present in NamesList.txt, and b) one intended to be interpreted by an observant and intelligent human reader, and not c) one intended as direct input to any text-processing algorithm, or to any algorithm that "understands" the contents.
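As a rough illustration of such a formatting use (not Unibook's actual logic), here is a minimal Python sketch that turns NamesList.txt-style entries into a screen-reader-friendly HTML definition list. It assumes the file's conventions of a tab-separated code point and name for each entry, followed by tab-indented annotation lines introduced by a marker such as "=" (alias), "*" (note) or "x" (cross-reference). The function name names_list_to_html and the label wording are hypothetical. The output is meant for a human reader, in the spirit of a) and b) above, not as input to any algorithm that interprets the annotations.

```python
import html

# Hypothetical labels for a subset of NamesList.txt annotation markers;
# lines with any other marker are simply skipped.
MARKER_LABELS = {
    "=": "Also known as",
    "*": "Note",
    "x": "See also",
    "%": "Formal alias",
    ":": "Canonical decomposition",
    "#": "Compatibility decomposition",
    "~": "Variation sequence",
}


def names_list_to_html(lines):
    """Render NamesList.txt-style lines as a simple HTML definition list.

    Assumes an entry line looks like "XXXX<TAB>NAME" and an annotation
    line looks like "<TAB>M text", where M is one of the markers above.
    Header lines (starting with "@") and comment lines (";") are skipped.
    """
    out = ["<dl>"]
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith(("@", ";")):
            continue
        if not line.startswith("\t"):
            # Character entry: code point, tab, character name.
            code, _, name = line.partition("\t")
            out.append(f"<dt>U+{html.escape(code)} {html.escape(name)}</dt>")
        else:
            # Annotation: leading tab, marker, space, free text.
            marker, _, text = line.strip().partition(" ")
            label = MARKER_LABELS.get(marker)
            if label is not None:
                out.append(f"<dd>{label}: {html.escape(text)}</dd>")
    out.append("</dl>")
    return "\n".join(out)
```

Escaping every field with html.escape keeps names such as "<control>" from being read as markup, and using a definition list lets screen readers announce each annotation as belonging to its character.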

What are these other primary sources that maintain these other annotation data; 
are they publicly available? If the name list is the only place where these 
sources’ data have been published, then, for better or for worse, the name list 
is all that is available for much information on many code points’ usage.
See my first through third paragraphs.

A./
