Kenneth Whistler <kenw at sybase dot com> wrote:

>> Surely all Unicode/10646 characters are expected to be preserved in
>> interchange. What have I got wrong, Ken?
>
> Your expectation that this stuff will actually work that way.
>
> Yes, the characters will be preserved in interchange. But the
> most likely result you will get is:
>
> <anchor1>text<anchor2>annotation<anchor3>
>
> where the anchors will just be blorts. You should not expect that
> the whole annotation *framework* will be implemented, and certainly
> not that these three characters will suffice for "nice[ly] marked
> up... furigana".
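For reference, the three characters behind Ken's <anchor1>text<anchor2>annotation<anchor3> schematic are U+FFF9 INTERLINEAR ANNOTATION ANCHOR, U+FFFA INTERLINEAR ANNOTATION SEPARATOR, and U+FFFB INTERLINEAR ANNOTATION TERMINATOR. A minimal Python sketch of how furigana might be framed with them (the `annotate` helper is my own illustration, not anything from the standard; how, or whether, a receiver renders the result is entirely up to the receiving application):

```python
# The interlinear annotation characters, U+FFF9..U+FFFB.
ANCHOR     = "\uFFF9"  # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR  = "\uFFFA"  # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR

def annotate(base: str, annotation: str) -> str:
    """Frame base text and its annotation (e.g. furigana) as
    <anchor>base<separator>annotation<terminator>."""
    return ANCHOR + base + SEPARATOR + annotation + TERMINATOR

# "Nihongo" with its kana reading as the annotation:
ruby = annotate("\u65e5\u672c\u8a9e", "\u306b\u307b\u3093\u3054")
```

An application with no support for the framework will, as Ken says, typically show the three framing characters as blorts (missing-glyph boxes) and the annotation inline after the base text.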
I don't have any problem with the idea that many, or even all, of today's applications lack meaningful support for the interlinear annotation characters, and will display them as blorts, and I doubt that Michael expects widespread support for them either. What worries me is what Ken says next:

> These animals are more like U+FFFC -- they are internal anchors
> that should not be exported, as there is no general expectation
> that once exported to plain text, a receiver will have sufficient
> context for making sense of them in the way the originator was
> dealing with them internally.
>
> By rights, this whole problem of synchronizing the internal anchor
> points for various ruby schemes should have been handled by
> noncharacters -- but that mechanism was not really understood
> and expanded sufficiently until after the interlinear annotation
> characters were standardized.

This moves the entire issue out of the realm of poor support and into the big, dark, scary cavern of pre-deprecation. Unicode 3.0 doesn't say exactly what Ken says. Unicode 3.0 (p. 326) says the annotation characters should be used only under "prior agreement between the sender and the receiver because the content may be misinterpreted otherwise." Fine, no problem; those are the same rules that apply to the PUA. Ken, though, seems to say they shouldn't be exported at all, and furthermore that they shouldn't even have been encoded in the first place, except that the noncharacters (which explicitly must not be interchanged) hadn't been invented yet.

This sounds like Plane 14, or the combining Vietnamese tone marks, all over again -- Unicode (and/or WG2) invents a mechanism, but then wishes it hadn't, or thinks of a better way, so the mechanism is "strongly discouraged" and eventually deprecated. (Not that I liked the separate Vietnamese tone marks; don't get me wrong.) Some groups, like IDN and the security mavens, criticize Unicode for its perceived "instability."
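For the record, the noncharacter mechanism Ken refers to was expanded in Unicode 3.1 to a fixed set of 66 code points: U+FDD0..U+FDEF, plus the last two code points (xxFFFE and xxFFFF) of each of the 17 planes. The rule is simple enough to sketch in a few lines of Python (`is_noncharacter` is my own helper, not a library function):

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters: U+FDD0..U+FDEF plus the
    last two code points (xxFFFE, xxFFFF) of each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# The interlinear annotation anchors are ordinary assigned characters,
# NOT noncharacters -- which is exactly Ken's complaint:
print(is_noncharacter(0xFFFE))  # a noncharacter
print(is_noncharacter(0xFFF9))  # the annotation anchor is not one
```

Noncharacters are explicitly reserved for internal use and must never be interchanged, which is why Ken says they would have been the right vehicle for purely internal anchor points.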
A lot of the attention seems to revolve around gray areas of normalization and bidi, or confusable glyphs (what I call "spoof buddies"). Can I suggest that a potentially larger source of instability comes from the creation of characters and encoding mechanisms that are subsequently discouraged or deprecated because, perhaps, they weren't fully thought out in the first place? The approval process in Unicode, and especially WG2, is a slow one, and some of these "on second thought" decisions race ahead of the approval process, so that the mechanisms are already doomed by the time of publication.

Everybody will welcome the new conventional, graphical-type characters and scripts that are coming with Unicode 4.0. But maybe before standardizing another COMBINING GRAPHEME JOINER or other control-type character, it would be prudent to study the angles even more thoroughly and carefully, and make *damn* sure the character is going to be usable and not discouraged or even deprecated at birth. (No, I have never been involved in the character standardization process -- but I *have* been on committees that encoded other types of things too hastily and then had to find a way to "take back" their decision.)

-Doug Ewell
 Fullerton, California