As we are talking about rendering rather than operations on the backing store, this is actually irrelevant. If two sequences are visually indistinguishable (with the particular font in use), a rendering engine can safely map them together even if they are not canonically equivalent, as long as the backing store is unchanged.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Asmus Freytag
Canonical equivalence must be taken into account in rendering multiple accents, so that any two canonically equivalent sequences display as the same.

This statement goes to the core of Unicode. If it is followed, it guarantees that normalizing a string does not change its appearance (and therefore it remains the 'same' string as far as the user is concerned.)
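To illustrate the guarantee being described, here is a sketch of my own in Python, using only the standard unicodedata module; the strings are arbitrary examples, not taken from the earlier discussion:

import unicodedata

# Two canonically equivalent encodings of the same text:
precomposed = "\u00E9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # U+0065 + U+0301 COMBINING ACUTE ACCENT

# Different code point sequences in the backing store...
assert precomposed != decomposed

# ...but normalization maps each to the other's form, so a renderer that
# respects canonical equivalence displays both identically, and
# normalizing the stored string cannot change what the user sees.
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed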
I agree in principle. There are two ways in which the philosophy behind this breaks down in real life, though:
1. There are cases of combining marks given a class of 0, meaning that
combinations of marks in different positions relative to the base will
be visually indistinguishable, but the encoded representations are not
the same, and not canonically equivalent. E.g. (taken from someone else
on the Indic list) Devanagari ka + i + u vs. ka + u + i.
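For anyone who wants to check point 1, here is a small demonstration (my own sketch in Python with the standard unicodedata module): both Devanagari vowel signs carry combining class 0, so normalization leaves the two orders distinct.

import unicodedata

KA = "\u0915"  # DEVANAGARI LETTER KA
I = "\u093F"   # DEVANAGARI VOWEL SIGN I
U = "\u0941"   # DEVANAGARI VOWEL SIGN U

# Both vowel signs have canonical combining class 0, so canonical
# ordering never moves them past one another.
assert unicodedata.combining(I) == 0
assert unicodedata.combining(U) == 0

seq1 = KA + I + U
seq2 = KA + U + I

# Every normalization form leaves both sequences untouched; they remain
# distinct code point sequences even if a font renders them identically.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, seq1) == seq1
    assert unicodedata.normalize(form, seq2) == seq2
assert seq1 != seq2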
I wonder if anyone involved in this is speaking from real experience. Peter, I don't think your old company actually tried to implement such reordering; Sharon tells me that the idea was suggested, but rejected for reasons unrelated to performance. I have heard that your new company has tried it and has claimed that for Hebrew the performance hit is unacceptable. I am still sceptical of this claim. Presumably this was done by adding a reordering step to an existing rendering engine. But was this reordering properly optimised in binary code, or was it just bolted on to an unsuitable architecture using a high-level language designed for the different purpose of glyph-level reordering?

2. Relying on normalization, and specifically canonical ordering, to happen in a rendering engine IS liable to be a noticeable performance issue. I suggest that whoever wrote
Rendering systems should handle any of the canonically equivalent orders of combining marks. This is not a performance issue: The amount
of time necessary to reorder combining marks is insignificant compared
to the time necessary to carry out other work required for rendering.
was not speaking from experience.
Also, as I just pointed out in a separate posting, there should be no performance hit for unpointed modern Hebrew as there are no combining marks to be reordered. The relatively few users of pointed Hebrew would prefer to see their text rendered correctly if a little slowly rather than quickly but incorrectly.
If, as you agree in principle, this is an issue which goes to the core of Unicode, should you not be prepared to take some small performance hit in order to conform properly to the architecture?
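To give a feel for how little work the disputed step involves, here is a rough sketch of the canonical ordering algorithm taken on its own (my own Python, using the standard unicodedata module; real engines work on glyph runs rather than Python strings): a stable sort, by combining class, of each run of marks following a base character.

import unicodedata

def canonical_reorder(text: str) -> str:
    """Stable-sort each run of combining marks (ccc > 0) by combining class.

    This is the canonical ordering step of normalization in isolation;
    for the two or three marks that typically follow a base character,
    the cost is a handful of comparisons.
    """
    chars = list(text)
    i = 0
    while i < len(chars):
        if unicodedata.combining(chars[i]) > 0:
            j = i
            while j < len(chars) and unicodedata.combining(chars[j]) > 0:
                j += 1
            # sorted() is stable, so marks of equal class keep their order.
            chars[i:j] = sorted(chars[i:j], key=unicodedata.combining)
            i = j
        else:
            i += 1
    return "".join(chars)

# Hebrew lamed + dagesh (ccc 21) + patah (ccc 17), typed in that order,
# comes out in canonical order with patah before dagesh, as NFD gives.
typed = "\u05DC\u05BC\u05B7"
assert canonical_reorder(typed) == "\u05DC\u05B7\u05BC"
assert canonical_reorder(typed) == unicodedata.normalize("NFD", typed)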
...If it is unavoidable to call the same routine (for sorting or any other purpose) multiple times with the same data, the results can be cached so that they do not have to be recalculated each time.
If what is normalized is the backing store, yes. If what is normalized is a string at an intermediate stage in the rendering process, then this is not the case. The reason is the number of times text-rendering APIs get called. ...
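To make the two positions concrete, here is a toy Python memoisation of normalization (my own sketch, not anything from an actual rendering engine): caching pays off when the same backing-store string comes back repeatedly, but not when each rendering call constructs a fresh intermediate string, since every new string misses the cache.

import unicodedata
from functools import lru_cache

@lru_cache(maxsize=4096)
def normalized(text: str) -> str:
    """Normalize to NFC, remembering the result for strings seen before."""
    return unicodedata.normalize("NFC", text)

paragraph = "e\u0301tude"  # stand-in for some backing-store contents

# Same backing-store string passed in repeatedly: the work is done once,
# and the remaining calls are cache hits.
for _ in range(1000):
    normalized(paragraph)

# Fresh intermediate strings built per rendering call: every distinct
# string is a cache miss and has to be normalized from scratch.
for start in range(len(paragraph)):
    normalized(paragraph[start:])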
-- 
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

