John Hudson wrote:

> At 03:52 PM 6/26/2003, Rick McGowan wrote:
> 
> >I'll weigh in to agree with Ken here. The solution of cloning a whole set
> >of these things just to fix combining behavior is, to understate, not quite
> >nice.
> 
> No, but would be far from the not nicest thing in Unicode, and there's a 
> really good reason for it. I was originally intrigued by Ken's ZWJ idea -- 
> or by a variant of it using some new re-ordering inhibiting character, to 
> avoid overloading ZWJ any further --, but the more I think about it, the 
> more not nice I think it is to force Biblical scholars to carry the can for 
> errors in the Unicode combining classes.

One of the reasons I keep poking around for alternatives that might
work differently is that cloning sets of characters like this
tends simply to displace the problem. You don't want to
force Biblical scholars to "carry the can" for the errors in
the current combining classes...

But who then does end up carrying the can eventually, if we go
the cloning route? Cloning 14 characters creates a *new*
normalization problem, and forces non-Biblical-scholar users of
pointed Hebrew text to carry *that* particular can.

How does a user of pointed Hebrew text know whether they are
dealing with the legacy points, which people will have gone
on using, outside the context of the group of cognoscenti who
switch their applications and fonts over to the corrected set
of points? What happens if they edit text represented in one
scheme with a tool meant for the other? What about searches
on data with pointed Hebrew -- should it normalize the two
sets of points or not? (And here I am talking about normalization
by an ad hoc, custom folding, rather than generic Unicode
normalization.) Who carries the can for writing the conversion
routines from data in one scheme or the other? How about
conversion from legacy character sets for bibliographic
data -- does that need to be upgraded? How about database
implementations -- do they need custom extensions to do this
folding as part of their query optimizations? And if the
problem with the existing set of points is that their
use in a normalized context eliminates distinctions that
should be maintained, how do I write any conversion routines
in such a way as to not corrupt or otherwise contaminate data
using the new scheme? Who do I blame if my Hebrew font works
with one set of points but not the other, and I'm getting
intermittently trashed display as a result? ... and so on...

I think if you really sit down and think about this in the
larger context of users of Unicode Hebrew generically, instead
of merely the Biblical Hebrew community that you are trying
to find a solution for, you may realize that displacing the
pain to *other* users may not be the best solution, either.

While the solution I am suggesting is not without its
conversion problems, I think they are significantly more
tractable than those posed by cloning code points. The
folding issue is much more straightforward, since it would
consist entirely of ignoring the CGJ and applying standard
normalization (or not). The new scheme would essentially be transparent
to systems that don't bother inserting CGJ between points,
as long as their fonts could handle the combinations.
Loss of ordering distinctions in data which is exported
from the new systems and then reimported would be much
less of an issue, since normalization alone could not destroy
those distinctions without further intervention.
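A minimal sketch of the folding described above, assuming Python's standard unicodedata module. The helper name fold and the lamed + patah + hiriq sequence are illustrative only, not taken from any proposal. The sketch relies on U+034F COMBINING GRAPHEME JOINER having canonical combining class 0, so canonical reordering cannot swap marks across it; stripping it before normalizing makes both spellings compare equal.

```python
import unicodedata

# U+034F COMBINING GRAPHEME JOINER: canonical combining class 0,
# so canonical reordering never moves marks across it.
CGJ = "\u034F"

LAMED = "\u05DC"
PATAH = "\u05B7"  # combining class 17
HIRIQ = "\u05B4"  # combining class 14


def fold(text: str) -> str:
    """Hypothetical search/comparison key: drop every CGJ, then apply NFD."""
    return unicodedata.normalize("NFD", text.replace(CGJ, ""))


# Illustrative sequence: patah written before hiriq on a lamed.
plain = LAMED + PATAH + HIRIQ
with_cgj = LAMED + PATAH + CGJ + HIRIQ

# Without CGJ, normalization reorders the marks by combining class.
print(unicodedata.normalize("NFC", plain) == LAMED + HIRIQ + PATAH)
# With CGJ, the authored order survives normalization untouched.
print(unicodedata.normalize("NFC", with_cgj) == with_cgj)
# Folding makes the two spellings match for search purposes.
print(fold(plain) == fold(with_cgj))
```

Because fold throws away the ordering distinction, it suits search keys and matching; the stored text itself keeps its CGJ.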

> I believe the aim in fixing this 
> problem in Unicode should be to provide Biblical scholars with a good text 
> processing experience, not with awkward kludges,

Yes, but I believe that is the responsibility of the systems and
applications designers, given the tools and constraints we have
to hand. 

> even if that means making 
> the Unicode Hebrew block look weird with duplicated marks. 

I really believe there be dragons there, and the end result will
be to make it *more* difficult for the systems and applications
designers to provide a "good text processing experience" to
all users of pointed Hebrew text.


--Ken

