[Resend of a response which got eaten by the Unicode email list during the system maintenance last week. Carl already responded to me on this, but others may not have seen what he was responding to. --Ken]
> Proposed unknown and missing character representation. This would be an
> alternate to the method currently described in 5.3.
>
> The missing or unknown character would be represented as a series of
> vertical hex digit pairs for each byte of the character.

The problem I have with this is that it seems to be an overengineered approach that conflates two issues:

a. What does a font do when requested to display a character (or sequence) for which it has no glyph?

b. What does a user do to diagnose text content that may be causing a rendering failure?

For the first problem, we already have a widespread approach that seems adequate. And other correspondents on this topic have pointed out that the particular approach of displaying hex numbers for characters may pose technical difficulties for at least some font technologies.

[snip]

> This representation would be recognized by untrained people as
> unrenderable data or garbage. So it would serve the same function as a
> missing glyph character except that it would be different from normal
> glyphs so that they would know that something was wrong and the text
> did not just happen to have funny characters.

I don't see any particular problem in training people to recognize when they are seeing their fonts' notdef glyphs. The whole concept of "seeing little boxes where the characters should be" is not hard to explain to people -- even to people who otherwise have difficulty with a lot of computer abstractions.

Things will be better-behaved when applications finally get past the related but worse problem of screwing up the character encodings -- which results in the more typical misdisplay: lots of recognizable glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that must be another piece of Korean spam mail in my mail tray.)

> It would aid people in finding the problem and for people with Unicode
> books the text would be decipherable. If the information was truly
> critical they could have the text deciphered.

Rather than trying to engineer a questionable solution into the fonts, I'd like to step back and ask what would better serve the user in such circumstances. An approach which strikes me as a much more useful and extensible way to deal with this is the concept of a "What's This?" text accessory: essentially a small tool that a user could select a piece of text with (think of it as a little magnifying glass, if you will), which would then pop up the contents selected, deconstructed explicitly into its character sequence.

Limited versions of such things exist already -- such as the tooltip-like popup windows for Asmus' Unibook program, which give attribute information for characters in the code chart. But I'm thinking of something a little more generic, associated with textedit/richedit type text editing areas (or associated with general word processing programs).

The reason why such an approach is more extensible is that it is not merely focussed on the nondisplayable character glyph issue, but rather represents a general ability to "query" text, whether normally displayable or not. I could query a black box notdef glyph to find out what in the text caused its display; but I could just as well query a properly displayed Telugu glyph, for example, to find out what it was. This is comparable (although more point-oriented) to the concept of giving people a source display for HTML, so they can figure out what in the markup is causing rendering problems for their rich text content.
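To make the idea concrete, here is a minimal sketch of the kind of character-by-character deconstruction such a "What's This?" accessory might pop up for a selected run of text. This is purely illustrative Python, not anyone's actual implementation; the function name is made up, and it relies only on the standard unicodedata module.

    import unicodedata

    def whats_this(selection):
        # Report each character of the selection explicitly:
        # code point, general category, and character name.
        for ch in selection:
            name = unicodedata.name(ch, "<unnamed or unassigned>")
            print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {name}")

    # Querying a properly displayed Telugu glyph and a control
    # character both work the same way:
    whats_this("a\u0C24\u0007")
    # U+0061  Ll  LATIN SMALL LETTER A
    # U+0C24  Lo  TELUGU LETTER TA
    # U+0007  Cc  <unnamed or unassigned>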
[snip]

> This proposal would provide a standardized approach that vendors could
> adopt to clarify missing character rendering and reduce support costs.
> By including this in the standard we could provide a cross-vendor
> approach. This would provide a consistent solution.

In my opinion, the standard already provides a description of a cross-vendor approach to the notdef glyph problem, with the advantage that it is the de facto, widely adopted approach as well. As long as font vendors stay away from making {p}'s and {q}'s their notdef glyphs, as I think we can safely presume they will, and instead use variants on the themes of hollowed or filled boxes, then the problem of *recognition* of the notdef glyphs for what they are is a pretty marginal one.

As for how to provide users better diagnostics for figuring out the content of undisplayable text, I suppose the standard could suggest some implementation guidelines there, but this might be a better area to leave up to competing implementation practice until certain user interface models catch on and gain widespread acceptance.
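For what it's worth, the diagnostic side of this is easy to prototype today entirely outside the font machinery. Here is a minimal sketch, assuming the third-party fontTools package and some TrueType/OpenType font file (the path below is hypothetical), which reports which characters of a string have no cmap coverage in the font and would therefore fall through to its notdef glyph:

    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    def notdef_check(font_path, text):
        # Map code points to glyph names via the font's best cmap
        # subtable; any code point absent from the cmap will be
        # rendered with the font's notdef glyph.
        cmap = TTFont(font_path)["cmap"].getBestCmap()
        for ch in text:
            glyph = cmap.get(ord(ch), "** no glyph: will show notdef **")
            print(f"U+{ord(ch):04X}  {glyph}")

    notdef_check("SomeFont.ttf", "a\u0C24")  # hypothetical font file

Nothing in that sketch needs to be baked into the fonts themselves, which is rather the point: the query tool can evolve independently of the font format.

--Ken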