John Hudson wrote:

> At 03:52 PM 6/26/2003, Rick McGowan wrote:
>
> >I'll weigh in to agree with Ken here. The solution of cloning a whole set
> >of these things just to fix combining behavior is, to understate, not quite
> >nice.
>
> No, but would be far from the not nicest thing in Unicode, and there's a
> really good reason for it. I was originally intrigued by Ken's ZWJ idea --
> or by a variant of it using some new re-ordering inhibiting character, to
> avoid overloading ZWJ any further --, but the more I think about it, the
> more not nice I think it is to force Biblical scholars to carry the can for
> errors in the Unicode combining classes.

One of the reasons I keep poking around for alternatives that might work in a different way is that cloning sets of characters this way has a way of just displacing the problem.

You don't want to force Biblical scholars to "carry the can" for the errors in the current combining classes... But who then does end up carrying the can eventually, if we go the cloning route? Cloning 14 characters creates a *new* normalization problem, and forces non-Biblical-scholar users of pointed Hebrew text to carry *that* particular can.

How does a user of pointed Hebrew text know whether they are dealing with the legacy points, which people will have gone on using, outside the context of the group of cognoscenti who switch their applications and fonts over to the corrected set of points? What happens if they edit text represented in one scheme with a tool meant for the other? What about searches on data with pointed Hebrew -- should it normalize the two sets of points or not? (And here I am talking about normalization by an ad hoc, custom folding, rather than generic Unicode normalization.)

Who carries the can for writing the conversion routines from data in one scheme or the other? How about conversion from legacy character sets for bibliographic data -- does that need to be upgraded? How about database implementations -- do they need custom extensions to do this folding as part of their query optimizations? And if the problem with the existing set of points is that their use in a normalized context eliminates distinctions that should be maintained, how do I write any conversion routines in such a way as to not corrupt or otherwise contaminate data using the new scheme? Who do I blame if my Hebrew font works with one set of points but not the other, and I'm getting intermittently trashed display as a result?

... and so on...

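As a concrete illustration of the "normalized context eliminates distinctions" problem mentioned above, here is a minimal Python sketch using the standard `unicodedata` module. The choice of LAMED with PATAH and HIRIQ is only an illustrative assumption, not something taken from this message; any two Hebrew points with different non-zero combining classes would behave the same way.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, combining class 0
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14

def combining_classes(text):
    """Return the canonical combining class of each code point."""
    return [unicodedata.combining(ch) for ch in text]

# The order a scholar might intend: patah first, then hiriq.
intended = LAMED + PATAH + HIRIQ

# Canonical reordering sorts adjacent marks by combining class,
# so normalization moves HIRIQ (14) ahead of PATAH (17).
normalized = unicodedata.normalize("NFC", intended)
swapped = LAMED + HIRIQ + PATAH

print(combining_classes(intended))
print(normalized == swapped)
```

After normalization the two input orders are indistinguishable, which is the loss of distinction at issue.
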
I think if you really sit down and think about this in the larger context of users of Unicode Hebrew generically, instead of merely the Biblical Hebrew community that you are trying to find a solution for, you may realize that displacing the pain to *other* users may not be the best solution, either.

While the solution I am suggesting is not without its conversion problems, I think they are significantly more tractable than those posed by cloning code points. The folding issue is much more straightforward, since it would consist entirely of ignoring the CGJ (U+034F COMBINING GRAPHEME JOINER) and applying standard normalization (or not). The new scheme would essentially be transparent to systems that don't bother inserting CGJ between points, as long as their fonts could handle the combinations. Loss of ordering distinctions in data that is exported from the new systems, and then reimported, would be much less of an issue, since normalization could not destroy the distinctions without further intervention.

> I believe the aim in fixing this
> problem in Unicode should be to provide Biblical scholars with a good text
> processing experience, not with awkward kludges,

Yes, but I believe that is the responsibility of the systems and applications designers, given the tools and constraints we have to hand.

> even if that means making
> the Unicode Hebrew block look weird with duplicated marks.

I really believe there be dragons there, and the end result will be to make it *more* difficult for the systems and applications designers to provide a "good text processing experience" to all users of pointed Hebrew text.

--Ken
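
The folding described above ("ignoring the CGJ and applying standard normalization") could be sketched roughly as follows in Python. The function name `fold_points` and the sample LAMED + PATAH + HIRIQ sequence are hypothetical, chosen only for illustration; the sketch assumes nothing beyond the standard `unicodedata` module.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"

def fold_points(text, form="NFC"):
    """Hypothetical folding for search: drop CGJ, then normalize."""
    return unicodedata.normalize(form, text.replace(CGJ, ""))

plain = LAMED + PATAH + HIRIQ          # no CGJ: points get reordered
guarded = LAMED + PATAH + CGJ + HIRIQ  # CGJ between the two points

# CGJ has combining class 0, so canonical reordering cannot move
# HIRIQ across it; the guarded order survives normalization.
survives = unicodedata.normalize("NFC", guarded) == guarded

# For matching purposes, both spellings fold to the same string.
same_key = fold_points(guarded) == fold_points(plain)

print(survives, same_key)
```

A system that never inserts CGJ sees only the `plain` form and is unaffected, which is the transparency argued for above.
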