On 14 July 2011 00:03, <announceme...@unicode.org> wrote:
> The Unicode Technical Committee has posted a new issue for public review and
> comment. Details are on the following web page:
>
> PRI #200 Draft UTR #49: Unicode Character Categories
>
> This document presents an approach to the categorization of Unicode
> characters, and documents data files that implementers can use for defining
> and labeling Unicode character categories.

==General Rant==

I like the idea of categorizing characters hierarchically, but any categorization scheme is necessarily subjective to a greater or lesser degree, and I do not think that the Unicode Consortium should be pushing one particular hierarchical categorization model as the definitive categorization of Unicode characters. It seems to me that this is one of several recent expansions to the scope of the Unicode Character Database (ScriptExtensions.txt is another example) that are neither necessary nor particularly helpful.

==Specific Comment==

There are 18 top-level categories:

[Control] [Diacritic] [Format] [Hieroglyph] [Ideogram] [Ideograph]
[Letter] [Logogram] [Logograph] [Mark] [Number] [Punctuation]
[Sign] [Syllable] [Symbol] [Virama] [Vowel] [Word]

What are the differences between [Ideograph] and [Ideogram], and between [Logograph] and [Logogram]? Even if UTR #49 does give distinctly different definitions for each of these four top-level categories, the difference between Ideograph and Ideogram, and between Logograph and Logogram, will not be obvious to most users of Categories.txt, because the -graph and -gram forms are synonymous in general use:

<http://en.wikipedia.org/wiki/Logogram>
<http://en.wikipedia.org/wiki/Ideogram>

Andrew