William Overington <WOverington at ngo dot globalnet dot co dot uk> wrote:
> I do note however that review 3 refers to a document which is only
> available to Unicode Consortium members, which seems a strange thing
> if views of interested individuals are being sought.

I agree.

> Also, it is a pity that this new era of Unicode glasnost (displayed
> with a ligature? :-) ) comes so shortly after the last Unicode
> Technical Committee meeting the minutes of which state the consensus
> about no more ligatures being added to the U+FBxx block.  Surely the
> matter of ligatures would be a good topic upon which to conduct such
> a public review.

No, it wouldn't, and here's why:

There is a concept in Unicode called "normalization" in which certain
characters or sequences are considered to be equal to other characters
or sequences for comparison purposes.  Using this concept, a capital A
plus a combining acute accent (U+0041, U+0301) can be considered
equivalent to a precomposed A-with-acute (U+00C1).  See Unicode
Standard Annex #15 [1] for more information.

It's important to realize that the *whole reason* this mechanism exists
is because of the precomposed ligatures and letters-with-diacritic and
compatibility characters in Unicode.  If there were only one way to
express the concept of "A with acute" in Unicode, there would be no
need for normalization.

Industry standards, such as the forthcoming Internationalized Domain
Name Architecture, depend on normalization to ensure that users don't
get unexpected mismatches between "A plus combining acute" and
"precomposed A-with-acute."  And because these standards and their
implementations are built to specific versions of the Unicode Standard,
they require stability in the normalization process.

If a new precomposed ligature "character" were added to Unicode, there
would now be two ways of "spelling" a sequence that supposedly only had
one spelling.  Let's suppose, JUST FOR ILLUSTRATION, that Unicode added
a "ct" ligature at U+FB07.
Now there would be two ways of writing the sequence "ct": with the
regular Latin letters (U+0063, U+0074) or with the ligature (U+FB07).
But none of the existing normalization tables would equate these two,
because the ct ligature did not exist in the (previous) version of
Unicode that was used to create the normalization table.  Thus
normalization would work in some cases but not others, which would make
the whole concept unstable, unpredictable, and useless.

That is why Unicode and WG2 have a policy [2] against adding new
precomposed ligatures and letters-with-diacritic, to the U+FBxx block
or anywhere else.  They would break the stability of normalization, a
concept whose entire value lies in its stability.  That is why the "ct"
ligature will not be added at U+FB07, and that is also why the National
Taitung Teachers College will not see their 42 precomposed Latin
letters added to Unicode.  It is a good, sensible, well-thought-out
policy that will not benefit from public review.

Now, the Plane 14 language tag characters are a different matter
entirely.  There the UTC proposes not to add something in violation of
its existing policy, but to formally discourage something that was
added only a couple of years ago.  I am actually arguing for *greater*
stability in the Unicode Standard, by arguing against the process of
adding and then immediately deprecating features like language tags.
(That is not my only argument for Plane 14, but it is one.)

-Doug Ewell
 Fullerton, California

[1] http://www.unicode.org/unicode/reports/tr15/
[2] http://std.dkuug.dk/jtc1/sc2/wg2/docs/principles.html
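
Addendum: for anyone who wants to see the equivalence described above
in running code, here is a minimal sketch using Python's standard
unicodedata module.  The helper name equivalent() is my own, purely for
illustration, and the results depend on the version of the Unicode data
shipped with the Python in use.  I am assuming that U+FB07 is still
unassigned in that data, so it stands in for the hypothetical "ct"
ligature: the normalization tables have no mapping for it, and it is
never equated with the letters c and t.

```python
import unicodedata

# Two spellings of "A with acute".
decomposed = "A\u0301"   # U+0041 + U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00C1"   # U+00C1 LATIN CAPITAL LETTER A WITH ACUTE


def equivalent(a, b, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A raw code point comparison sees two different strings.
raw_equal = decomposed == precomposed

# After canonical normalization the two spellings compare equal.
canon_equal = equivalent(decomposed, precomposed, "NFC")

# The existing "fi" ligature (U+FB01) has a compatibility mapping, so it
# matches the plain letters under NFKC, but not under NFC.
fi_equal_nfkc = equivalent("\uFB01", "fi", "NFKC")
fi_equal_nfc = equivalent("\uFB01", "fi", "NFC")

# Assumption: U+FB07 is unassigned in this Python's Unicode data.  With
# no entry in the tables it normalizes to itself and never matches "ct".
ct_equal = equivalent("\uFB07", "ct", "NFKC")

print(raw_equal, canon_equal, fi_equal_nfkc, fi_equal_nfc, ct_equal)
```

The last comparison is the instability in miniature: software built on
tables that predate a new ligature would keep treating it as distinct
from its letters, while newer software would not.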