> >The ALA-LC conventions are not the only alternatives available for
> >representation of Abkhaz and/or Khanty/Mansi data in romanization.
> >In fact, you can find such data on the web using alternative
> >romanizations. So it isn't as if the current gap in figuring out
> >precisely how, in Unicode, to represent a double diacritic with
> >another diacritic applied outside the visible double diacritic
> >on a digraph is preventing anyone from using romanized Abkhaz or
> >Khanty/Mansi data in interchange.
>
> By the same argument, Unicode might as well stop taking new characters;
> surely, between the 500 Latin characters and dozens of punctuation marks
> and combining characters and the other 70,000 characters, you can find
> a way to communicate whatever language or data you need communicated.

Of course. Let them use ASCII, for that matter.

But that wasn't my point. As far as I can see, there is no particular
evidence that the ALA-LC conventions with the dot above the graphic
ligature ties are in widespread use for romanizations of these
particular languages. So the *urgency* of solving this problem isn't
there, unless the LC/library/bibliographic community comes to the UTC
and indicates that they have a data interchange problem with USMARC
records using ANSEL that requires a clear representation solution in
Unicode.

And before we go there, I'd like to have a clear specification of how
it works in USMARC records, so we would know how to do the following
conversion:

    USMARC <--> Unicode

for the two forms in question.

The 1990 version of the LC romanizations for this non-Slavic stuff used
all kinds of hand-drawn forms. And even the 1997 version of the ALA-LC
document is photo-offset from pages that include various kinds of
pasteup from who-knows-what sources, including some hand-drawn, with at
least one of these dots above being added by hand. So it isn't clear
that there is any connection between the ALA-LC document text and the
ANSEL character encoding actually used in the USMARC records; the
printed forms could be arbitrary markup in some system like TeX for
publication.

BTW, if we are blueskying about this topic, the *elegant* way to
resolve this would be to recategorize all the double diacritics as
*enclosing* combining marks (Me), rather than Mn, and then rewrite the
conventions for their use to match those of the enclosing circle and
such. Then they would subtend (or supertend) any grapheme cluster,
including arbitrary digraphs indicated with a COMBINING GRAPHEME JOINER
character. And a dot above would neatly apply to the entire subtended
cluster, as for circled characters, and so on.

Of course, that would invalidate anybody's current usage of the
characters. Oh well, you can't win 'em all.

--Ken
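
For anyone who wants to poke at the Mn-versus-Me point above, here is a minimal Python sketch using the standard `unicodedata` module. It only reports what the Unicode Character Database currently says about the characters involved and shows what canonical reordering does to one *candidate* spelling of "digraph with tie, plus dot above". The choice of "t" and "s" as the digraph and the helper names (`describe`, `TIE`, `DOT`, `CIRCLE`, `CGJ`) are assumptions made for illustration; nothing here is a specification of how the ALA-LC forms ought to be encoded.

```python
import unicodedata

# Characters relevant to the discussion (code points from the Unicode standard).
TIE = "\u0361"     # COMBINING DOUBLE INVERTED BREVE, a double diacritic (ligature tie)
DOT = "\u0307"     # COMBINING DOT ABOVE
CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE, an enclosing mark
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER


def describe(ch):
    """Return (name, General_Category, canonical combining class) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch)


for ch in (TIE, DOT, CIRCLE, CGJ):
    name, category, ccc = describe(ch)
    print(f"U+{ord(ch):04X} {name}: gc={category}, ccc={ccc}")

# Hypothetical spelling: base letter, tie, dot above, second letter.
# The intent is "dot sits above the tie", but the tie has a higher
# combining class than the dot, so canonical reordering moves the
# dot in front of the tie.
intended = "t" + TIE + DOT + "s"
nfd = unicodedata.normalize("NFD", intended)
nfc = unicodedata.normalize("NFC", intended)

print([f"U+{ord(c):04X}" for c in nfd])
print([f"U+{ord(c):04X}" for c in nfc])
```

The double diacritic reports as a nonspacing mark (Mn) with a nonzero combining class, while the enclosing circle reports as Me with class 0. Because of that nonzero class, normalization reorders the dot ahead of the tie, and NFC then composes it with the first letter, so the "dot outside the double diacritic" ordering does not survive as typed. A class-0 enclosing mark would not be reordered this way, which is part of what makes the recategorization attractive on paper.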