On 11/22/2010 4:15 AM, Michael Everson wrote:
>> It boils down to this: just as there aren’t technical or usability reasons that
>> make it problematic to represent IPA text using two Greek characters in an
>> otherwise-Latin system,
> Yes there are. Sorting multilingual text including Greek and IPA
> transcriptions, for one. The glyph shape for IPA beta is practically unknown in
> Greek. Latin capital Chi is not the same as Greek capital chi.
>
>> so also there are no technical or usability reasons I’m aware of why it is
>> problematic to represent this historic Janalif orthography using two Cyrillic
>> characters.
> They are the same technical and usability reasons which led to the
> disunification of Cyrillic Ԛ and Ԝ from Latin Q and W.

The sorting problem I think I understand.

Because scripts are kept together in sorting, when you have a mixed-script list, you normally override the sorting only for the script to which the (sort-)language belongs. A mixed French-Russian list would use French ordering for the Latin characters, but the Russian words would all appear together (and be sorted according to some generic sort order for Cyrillic characters, except that for a bilingual list, sorting the Cyrillic according to Russian rules might also make sense).

Same for a French-Greek list. The Greek characters will be together and sorted either by a generic Greek (script) sort, or a specific Greek (language) sort.

When you sort a mixed list of IPA and Greek, the beta and chi will now sort with the Latin characters, in whatever sort order applies for IPA. That means the order of all Greek words in the list will get messed up. It will neither be a generic Greek (script) sort, nor a specific Greek (language) sort, because you can't tailor the same characters two different ways in the same sort.
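
To make the effect concrete, here is a toy sketch in Python. It is not a real collation implementation: it assumes that a character's script can be guessed from the prefix of its Unicode name, that the "generic" order is simply script first and code point second, and that an IPA tailoring places Greek beta right after Latin b. The names script_of, generic_weight, IPA_TAILORING and sort_key are made up for this illustration.

```python
import unicodedata

SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")

def script_of(ch):
    # Crude guess: take the script from the start of the Unicode character name.
    name = unicodedata.name(ch, "")
    for index, script in enumerate(SCRIPTS):
        if name.startswith(script):
            return index
    return len(SCRIPTS)

def generic_weight(ch):
    # Assumed generic order: scripts stay together, code point order inside a script.
    return (script_of(ch), float(ord(ch)))

# Hypothetical IPA tailoring: Greek beta is weighted as a Latin letter just after "b".
IPA_TAILORING = {"β": (0, ord("b") + 0.5)}

def sort_key(word, tailoring=None):
    tailoring = tailoring or {}
    return [tailoring.get(ch, generic_weight(ch)) for ch in word]

# Three IPA-like strings (Latin letters plus Greek beta) and three Greek words.
words = ["aca", "αλφα", "aβa", "γαλα", "aba", "βητα"]

generic = sorted(words, key=sort_key)
tailored = sorted(words, key=lambda w: sort_key(w, IPA_TAILORING))

print(generic)   # Greek words in alphabetical order, after all Latin-initial strings
print(tailored)  # the Greek word starting with beta now sorts ahead of the one starting with alpha
```

With the tailoring in place, the IPA strings come out in the intended order, but the Greek word beginning with beta moves ahead of the one beginning with alpha. One weight per character cannot serve both orders at once.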

That's the problem I understand is behind the issue with the Kurdish Q and W, and with the character pair proposed for disunification for Janalif.

Perhaps, it seems, there are some technical problems that would make the support for such "mixed-script" orthographies not as seamless as for regular orthographies after all.

In that case, a decision would boil down to whether these technical issues are significant enough (given the usage).

In other words, it becomes a cost-benefit analysis. Duplication of characters (except where their glyphs have acquired a different appearance in the other context) always has a cost in added confusability. Users can select the wrong character accidentally, spoofers can do so intentionally to try to cause harm. But Unicode was never just a list of distinct glyphs, so duplication between Latin and Greek, or Latin and Cyrillic is already widespread, especially among the capitals.
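
As a small illustration of what such duplication looks like at the code point level, the following Python sketch prints the Latin Q and W next to the disunified Cyrillic Ԛ and Ԝ mentioned earlier in this thread. It only uses the standard unicodedata module; the helper name describe is made up for this example.

```python
import unicodedata

def describe(ch):
    # Format a character as "U+XXXX NAME" using the Unicode Character Database.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Latin letters and the Cyrillic letters that were disunified from them.
pairs = [("\u0051", "\u051A"), ("\u0057", "\u051C")]

for latin, cyrillic in pairs:
    print(describe(latin), "|", describe(cyrillic))
    # Similar glyphs, but distinct code points that compare unequal.
    assert latin != cyrillic
```

The glyphs in each pair are close to identical in many fonts, which is exactly where the confusability cost comes from.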

Unlike what Michael claims for IPA, the Janalif characters don't seem to have a very different appearance, so there would not be any technical or usability issue there. Minor glyph variations can be handled by standard technologies, like OpenType, as long as the overall appearance remains legible should language binding of a text have gotten lost.

That seems to be true for IPA as well - because already, if you use the font binding for IPA, your a's and g's will not come out right, which means you don't even have to worry about betas and chis.

IPA being a notation, I would not be surprised to learn that mixed lists with both IPA and other terms are a rare thing. But for Janalif it would seem that mixed Janalif/Cyrillic lists would be rather common, relative to the size of the corpus, even if it's a dead (or currently out of use) orthography.

I'd like to see this addressed a bit more in detail by those who support the decision to keep the borrowed characters unified.

A./
