A few years ago I asked about the way variant selectors are supposed to work with Mongolian. Unicode 3.2 contains a general explanation of variant selectors, along with a table of Mongolian variants. I must confess they left me confused: the general explanation seems to point to one solution, which I would call intuitive and character-based (and which is what the few applications I have seen actually do), while the table seems to do it exactly the other way around and is more or less glyph-based.
Simply put, my question is: are the variant selectors to be used only when a particular character is to be displayed with a glyph that is an exception to the general rules of Mongolian writing, OR is the variant selector always to be used with a particular glyph variant in a particular position, whether that glyph is predictable or not?

To give an easy example (I suppose most, if not all, cases would be similar): in Mongolian, a medial "n" is regularly displayed with a dot before a vowel, and without a dot before a consonant. The "n" in "ana" would be dotted (as would the "n" in initial "na"); the "n" in "anda" would not. A typical Mongolian application would display those variants automatically, of course.

However, there are a few words and cases (foreign names, place names, or grammar books explaining Mongolian orthography, etc.) where this rule breaks down. For the sake of argument, let's say there is a word "aNda" where the "n" would be dotted (I write "N" here for the unexpected case). In a typical Mongolian application, the user would have to make a special effort (a different key or variant) to get the right display. In theory an undotted "n" in "aNa" might also occur. (For some real examples, see http://userpage.fu-berlin.de/~corff/im/MLS/trans003.gif where capital characters are used, as here, for irregular formations.)

Now, if "/" stands for the variant selector and "N" for the unexpected variant of "n", I would have expected the variant selector to be used only in the unexpected cases, i.e., "N" would have the encoding "n-/". Regular "ana" and "anda" would be unmarked (even though they display the "n" with different glyphs); irregular "aNa" and "aNda" would be encoded "a-n-/-a" and "a-n-/-d-a" (again, even though the "n-/" sequence would denote different glyphs in the two words).
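To make the reading I expected concrete, here is a small toy model in Python. It is only a sketch under my own simplifying assumptions: plain Latin letters stand in for the Mongolian characters, "/" stands in for the variant selector as above (written without the hyphens), the only rule modelled is "n is dotted before a vowel, undotted otherwise", and the function names are made up for illustration. It is not meant to reflect what Unicode or any real rendering engine does.

```python
VOWELS = set("aeiou")
VS = "/"  # stand-in for a variant selector, as in the text above


def encode_exceptions_only(transliteration: str) -> str:
    """Encode 'N' (the irregular n) as 'n' + selector; leave the rest unmarked."""
    return transliteration.replace("N", "n" + VS)


def n_is_dotted(encoded: str, i: int) -> bool:
    """Decide whether the 'n' at index i is displayed with a dot.

    Toy rule: a regular n is dotted before a vowel and undotted otherwise.
    Under the exception-marking reading, a following selector flips
    whatever the regular rule would have produced.
    """
    marked = encoded[i + 1:i + 2] == VS
    next_index = i + 2 if marked else i + 1
    following = encoded[next_index:next_index + 1]
    regular_dotted = following in VOWELS
    return regular_dotted != marked


def dots(encoded: str) -> list[bool]:
    """Dotted/undotted decision for every 'n' in the encoded word."""
    return [n_is_dotted(encoded, i) for i, ch in enumerate(encoded) if ch == "n"]


if __name__ == "__main__":
    for word in ("ana", "anda", "aNa", "aNda"):
        encoded = encode_exceptions_only(word)
        print(word, "->", encoded, "dotted:", dots(encoded))
```

In this model the same sequence "n/" yields an undotted glyph in "aNa" and a dotted glyph in "aNda": the selector marks "the exception here", not a fixed glyph shape.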
The statement "For example, in languages employing the Mongolian script, sometimes a specific variant range of glyphs is needed for a specific textual purpose for which the range of "generic" glyphs is considered inappropriate" could be taken to mean this solution. However, the Mongolian table is very glyph-based, and says "The valid combinations are exhaustively listed and described in the following table." It seems to imply that medial dotted "n" is ALWAYS denoted by "n-/" (as is undotted initial "n"). That is, regular "ana" (dotted) would be "a-n-/-a", regular "anda" (undotted) would be "a-n-d-a", irregular "aNa" (undotted) would be encoded "a-n-a", and irregular "aNda" (dotted) would be "a-n-/-d-a". In other words, there would be regular formations marked with the variant selector, and irregular ones left unmarked.

Which of the two cases is meant by Unicode?

Martin Heijdra
Chinese Bibliographer
East Asian Library and the Gest Collection
Frist Campus Center, Room 317
Princeton University
33 Frist Campus Center
Princeton, NJ 08544
USA