On Thu, Mar 10 2016 at 22:40 CET, kenwhist...@att.net writes:

> The *reason* that NamesList.txt exists at all is to drive the tool,
> unibook, that formats the full Unicode code charts for posting.

[...]

On Fri, Mar 11 2016 at 3:13 CET, asm...@ix.netcom.com writes:

> On 3/10/2016 5:49 PM, "J. S. Choi" wrote:
>> One thing about NamesList.txt is that, as far as I have been able to
>> tell, it’s the only machine-readable, parseable source of those
>> annotations and cross-references.

[...]

> This is a different issue. The nameslist.txt is a reasonable source
> for driving other formatting programs than just Unibook.

Exactly. A student of mine wrote a font sampling program producing
output in a Unibook-like form. For this purpose he also wrote a
converter from the NamesList format to XML:

https://github.com/ppablo28/fntsample_ucd_comments
https://github.com/ppablo28/ucd_xml_parser

I use the XML version of NamesList to add my own comments on
characters (work in progress):

https://bitbucket.org/jsbien/parkosz-font/downloads/Parkosz1907draft.pdf

Other examples of NamesList.txt use are

http://www.fileformat.info/info/unicode/
https://codepoints.net/

Although these are not exactly formatting programs, in my opinion they
also constitute a valid use.

> In fact, the possibility of reuse in this context probably among the
> unstated rationales for making the information and syntax available in
> the first place.

Do I understand correctly that there is no intention to make an
official XML version of the file, as it would require changes in
Unibook?

[...]

>> What are these other primary sources that maintain these other
>> annotation data; are they publicly available? If the name list is the
>> only place where these sources’ data have been published, then, for
>> better or for worse, the name list is all that is available for much
>> information on many code points’ usage.
>
> See my first through third paragraph.

You wrote:

[...]

> There are explanations about character use that are only maintained in
> the PDF of the core specification, where this information is packaged
> in a way that can be understood by a human reader, but is not amenable
> to be extracted by machine.
>
> While the annotations, comments, cross references etc. in Namelist.txt
> appear, formally, to be machine extractable, the way they are created
> and managed make them just as much "human-accessible" only as the core
> specification.

I'm afraid it's not clear to me. Let's take an example. Some time ago I
inquired about a controversial alias for U+018D:

http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0014.html

Can I really find anything about "reversed Polish-hook o" in the core
specification which is not a literal copy of the information from
NamesList.txt?

Best regards

Janusz

-- 
Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
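To illustrate what "formally machine extractable" means for an entry such as U+018D, here is a minimal Python sketch. It is not the converter linked above. It assumes a simplified subset of the NamesList.txt line syntax: a character line is a hex code point, a TAB, and the name; an annotation line is a TAB, a one-character marker (`=` alias, `%` formal alias, `*` comment, `x` cross-reference, `:` and `#` decompositions, `~` variation), a space, and the text. Header lines beginning with `@` end the current entry, and everything else is skipped. The function name `parse_names_list` and the `SAMPLE` text (a hand-abbreviated excerpt) are hypothetical.

```python
import re

# Simplified subset of the NamesList.txt syntax (an assumption of this sketch):
#   character line:  <hex code point><TAB><name>
#   annotation line: <TAB><marker><SPACE><text>
CHAR_LINE = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")
ANNOTATION_LINE = re.compile(r"^\t([=%*x:#~]) (.+)$")

# Marker character -> descriptive label used in the output records.
MARKERS = {
    "=": "alias",
    "%": "formal_alias",
    "*": "comment",
    "x": "cross_ref",
    ":": "canonical_decomposition",
    "#": "compatibility_decomposition",
    "~": "variation",
}


def parse_names_list(text):
    """Map each code point to its name and a list of (kind, text) annotations."""
    entries = {}
    current = None  # code point whose annotations we are collecting
    for line in text.splitlines():
        char_match = CHAR_LINE.match(line)
        if char_match:
            current = int(char_match.group(1), 16)
            entries[current] = {"name": char_match.group(2), "annotations": []}
            continue
        ann_match = ANNOTATION_LINE.match(line)
        if ann_match and current is not None:
            kind = MARKERS[ann_match.group(1)]
            entries[current]["annotations"].append((kind, ann_match.group(2)))
        elif line.startswith("@"):
            # Block headers and subheaders are not part of a character entry.
            current = None
    return entries


# Hypothetical, abbreviated excerpt; the real file has more lines per entry.
SAMPLE = (
    "@@\t0180\tLatin Extended-B\t024F\n"
    "018D\tLATIN SMALL LETTER TURNED DELTA\n"
    "\t= reversed Polish-hook o\n"
)

entries = parse_names_list(SAMPLE)
for kind, value in entries[0x018D]["annotations"]:
    print(f"U+{0x018D:04X} {kind}: {value}")  # prints: U+018D alias: reversed Polish-hook o
```

The sketch only shows that the annotations are attached to entries in a regular, line-oriented way. Continuation lines, notices and other constructs of the full format are deliberately ignored here.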