Ted,

> The goal of the Maginot Line was longterm stability.

I'll resist the temptation to assault that metaphorical defensive line
directly, and instead just sweep right by it...

> Do I understand you correctly, Ken, that Sybase would rather have code
> versions that behave consistently but incorrectly (from a user's point of
> view) rather than inconsistent versions, the newer ones of which behave
> correctly? I can accept that such could be a company's business priority,
> but I just want to know if that's what you are saying.

Yes, I am saying that. However, I disagree with the presupposition
embedded in:

> but incorrectly (from a user's point of view)

which I think is as faulty as that of people who might claim that, for
example, storing ä for Swedish as <a, combining diaeresis> would be
incorrect from a user's point of view.

It is very important to my company (and to many others implementing
Unicode) that normalization not be changed in ways such that data
normalized as specified in Unicode 4.0, for example, become
*un*normalized by the specification of Unicode 5.0, for example, so that
reapplication of a newer version of the algorithm would potentially
change normalized data. That is the issue.

Is it a priority for my company that Biblical Hebrew "behave incorrectly
from a user's point of view"? Of course not. But if yerushala(y)im is
"spelled correctly", in this case, with a CGJ, then implementation of
correct behavior from a user's point of view -- even taking into account
that data may be subject to normalization beyond the user's control (as
for web publication) -- is possible, while not destabilizing
normalization whatsoever.

Making it possible for potential customers to be satisfied and happy
with their software's behavior, while simultaneously preserving the
stability of infrastructure algorithms important to our products, *is* a
priority for my company.

> Also, I don't understand in what sense the normalization *algorithm* gets
> broken by changing combining classes. Could someone elaborate?

From the Unicode Standard, Version 4.0:

"D8a The logical description of a process used to achieve a specified
result involving Unicode characters."

Part of the specification of the Unicode normalization algorithm is
idempotency *across* versions, so that addition of new characters to the
standard, which require extensions of the tables for decomposition,
recomposition, and composition exclusion in the algorithm, does *not*
result in a situation where application of a later version of the
normalization algorithm results in change of *any* string normalized by
an earlier version of the algorithm.

The suggested changes in combining class values would break *that*
specification. I'm not suggesting that the code in anyone's particular
implementation would suddenly go haywire and start producing
segmentation faults if we swapped two numbers in the table of combining
class values that it uses.

--Ken
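
The points above can be checked with Python's standard unicodedata
module. What follows is a minimal sketch and not part of the original
message. The helper name is_stable is hypothetical, and the sequence
lamed + patah + hiriq is an assumed stand-in for the pointing at issue
in yerushala(y)im (the message names only the word). The sketch shows
that the two spellings of ä are canonically equivalent, that canonical
reordering swaps the two Hebrew points unless a CGJ (U+034F) sits
between them, and that text normalized under the Unicode 3.2 tables is
left unchanged by the tables of the running Python.

```python
import unicodedata


def is_stable(s, form="NFC"):
    """Hypothetical helper: True if normalizing s again changes nothing."""
    return unicodedata.normalize(form, s) == s


# Swedish ä: precomposed form versus <a, combining diaeresis>.
precomposed = "\u00e4"
decomposed = "a\u0308"
assert unicodedata.normalize("NFD", precomposed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed

# Assumed stand-in for the Hebrew case: patah (combining class 17)
# written before hiriq (combining class 14) on a lamed.
LAMED, HIRIQ, PATAH, CGJ = "\u05dc", "\u05b4", "\u05b7", "\u034f"
assert unicodedata.combining(PATAH) > unicodedata.combining(HIRIQ)
assert unicodedata.combining(CGJ) == 0

plain = LAMED + PATAH + HIRIQ
with_cgj = LAMED + PATAH + CGJ + HIRIQ

# Canonical ordering sorts the marks by combining class, so the
# written order is lost without the CGJ and kept with it.
reordered = unicodedata.normalize("NFC", plain)
preserved = unicodedata.normalize("NFC", with_cgj)

# Cross-version check: normalize with the frozen Unicode 3.2 tables,
# then confirm the current tables leave that result untouched.
old = unicodedata.ucd_3_2_0.normalize("NFC", plain)
still_normalized = is_stable(old)

print(reordered == LAMED + HIRIQ + PATAH, preserved == with_cgj,
      still_normalized)
```

Swapping the combining class values of two such marks in a later
version would make the last check fail for data normalized earlier,
which is the breakage described above. The CGJ spelling avoids the
problem without touching the tables.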