Ted,

> The goal of the Maginot Line was longterm stability.

I'll resist the temptation to assault that metaphorical
defensive line directly, and instead just sweep right by it...

> Do I understand you correctly, Ken, that Sybase would rather have code
> versions that behave consistently but incorrectly (from a user's point of
> view) rather than inconsistent versions, the newer ones of which behave
> correctly? I can accept that such could be a company's business priority,
> but I just want to know if that's what you are saying.

Yes, I am saying that. However, I disagree with the presupposition
embedded in:

> but incorrectly (from a user's point of view)

which I think is as faulty as that of people who might claim that,
for example, storing ä for Swedish as <a, combining diaeresis>
would be incorrect from a user's point of view.
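
To make that comparison concrete, here is a minimal sketch. It assumes
Python's standard unicodedata module purely for illustration; nothing
in this thread depends on it. It shows that the precomposed and
decomposed spellings of ä are canonically equivalent, since each
normalization form maps both spellings to the same string.

```python
import unicodedata

precomposed = "\u00E4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # LATIN SMALL LETTER A + COMBINING DIAERESIS

# Normalize both spellings and collect the distinct results.
nfc_forms = {unicodedata.normalize("NFC", s) for s in (precomposed, decomposed)}
nfd_forms = {unicodedata.normalize("NFD", s) for s in (precomposed, decomposed)}

# Each set collapses to a single string: the two spellings are equivalent.
print(nfc_forms == {precomposed})
print(nfd_forms == {decomposed})
```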

It is very important to my company (and to many others implementing
Unicode) that normalization not be changed in ways such that
data normalized as specified in Unicode 4.0, for example, become
*un*normalized by the specification of Unicode 5.0, for example,
so that reapplication of a newer version of the algorithm would
potentially change normalized data. That is the issue.

Is it a priority for my company that Biblical Hebrew "behave
incorrectly from a user's point of view"? Of course not.

But if yerushala(y)im is "spelled correctly", in this case,
with a CGJ, then implementation of correct behavior from
a user's point of view -- even taking into account that
data may be subject to normalization beyond the user's
control (as for web publication) -- is possible, while not
destabilizing normalization whatsoever.
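
As a rough sketch of why the CGJ spelling survives normalization
(again assuming Python's unicodedata module, and using a bare
lamed + patah + hiriq sequence as a stand-in for the relevant part of
the word): canonical reordering sorts adjacent combining marks by
combining class, but U+034F COMBINING GRAPHEME JOINER has class 0,
so the marks on either side of it are never reordered across it.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER

def nfc(text):
    """Apply NFC using whatever Unicode version this Python ships with."""
    return unicodedata.normalize("NFC", text)

without_cgj = LAMED + PATAH + HIRIQ
with_cgj = LAMED + PATAH + CGJ + HIRIQ

# Without CGJ, hiriq (lower class) is moved in front of patah.
reordered = nfc(without_cgj)

# With CGJ between the points, the original order is preserved.
preserved = nfc(with_cgj)

for name, ch in (("patah", PATAH), ("hiriq", HIRIQ), ("CGJ", CGJ)):
    print(name, unicodedata.combining(ch))
print(reordered == LAMED + HIRIQ + PATAH)
print(preserved == with_cgj)
```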

Making it possible for potential customers to be satisfied
and happy with their software's behavior, while simultaneously
preserving the stability of infrastructure algorithms
important to our products, *is* a priority for my company.

> Also, I don't understand in what sense the normalization *algorithm* gets
> broken by changing combining classes. Could someone elaborate?

From the Unicode Standard, Version 4.0:

"D8a The logical description of a process used to achieve a
     specified result involving Unicode characters."
     
Part of the specification of the Unicode normalization algorithm
is idempotency *across* versions, so that the addition of new
characters to the standard, which requires extending the
tables for decomposition, recomposition, and composition
exclusion used by the algorithm, does *not* result in a situation
where applying a later version of the normalization algorithm
changes *any* string normalized by an earlier version
of the algorithm.

The suggested changes in combining class values would break *that*
specification.
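
A toy illustration of the point, not anyone's real implementation:
the sketch below implements only the canonical reordering step, driven
by a small combining class table passed in as a dictionary. The "old"
table uses the current values for hiriq and patah; the "new" table is
a hypothetical swap invented for this example and is not the actual
proposal. The function name canonical_order and both tables are
assumptions made for illustration.

```python
LAMED = "\u05DC"  # HEBREW LETTER LAMED (combining class 0)
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ

def canonical_order(text, ccc):
    """Stable-sort each run of combining marks by combining class."""
    result, run = [], []
    for ch in text:
        if ccc.get(ch, 0) == 0:
            # A class-0 character ends the current run of marks.
            result.extend(sorted(run, key=lambda c: ccc.get(c, 0)))
            run = []
            result.append(ch)
        else:
            run.append(ch)
    result.extend(sorted(run, key=lambda c: ccc.get(c, 0)))
    return "".join(result)

old_table = {HIRIQ: 14, PATAH: 17}  # values as currently assigned
new_table = {HIRIQ: 17, PATAH: 14}  # hypothetical swapped values

# Data normalized and stored under the old table.
stored = canonical_order(LAMED + PATAH + HIRIQ, old_table)

# Reapplying the same version leaves the data alone...
stable_same_version = canonical_order(stored, old_table) == stored

# ...but the swapped table reorders data that was already normalized.
stable_across_versions = canonical_order(stored, new_table) == stored

print(stable_same_version, stable_across_versions)
```

Nothing crashes here; the code runs fine with either table. What
breaks is the guarantee that already-normalized data stays normalized.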

I'm not suggesting that the code in anyone's particular
implementation would suddenly go haywire and start producing
segmentation faults if we swapped two numbers in the table
of combining class values that it uses.

--Ken

