On 16/10/2003 12:38, Peter Constable wrote:

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Asmus Freytag

Canonical equivalence must be taken into account in rendering multiple
accents, so that any two canonically equivalent sequences display as
the same.

This statement goes to the core of Unicode. If it is followed, it
guarantees that normalizing a string does not change its appearance
(and therefore it remains the 'same' string as far as the user is
concerned.)

I agree in principle. There are two ways in which the philosophy behind
this breaks down in real life, though:

1. There are cases of combining marks given a class of 0, meaning that
combinations of marks in different positions relative to the base will
be visually indistinguishable, but the encoded representations are not
the same, and not canonically equivalent. E.g. (taken from someone else
on the Indic list) Devanagari ka + i + u vs. ka + u + i.


As we are talking about rendering rather than operations on the backing store, this is actually irrelevant. If two sequences are visually indistinguishable (with the particular font in use), a rendering engine can safely map them together even if they are not canonically equivalent, as long as the backing store is unchanged.
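To make the class-0 point concrete, here is a quick check in Python (my own illustration, using the standard unicodedata module; I am reading the ka + i + u example above as U+0915 DEVANAGARI LETTER KA, U+093F DEVANAGARI VOWEL SIGN I and U+0941 DEVANAGARI VOWEL SIGN U):

import unicodedata

KA     = "\u0915"  # DEVANAGARI LETTER KA
SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I
SIGN_U = "\u0941"  # DEVANAGARI VOWEL SIGN U

seq1 = KA + SIGN_I + SIGN_U   # ka + i + u
seq2 = KA + SIGN_U + SIGN_I   # ka + u + i

# Both vowel signs have canonical combining class 0 ...
print(unicodedata.combining(SIGN_I), unicodedata.combining(SIGN_U))   # 0 0

# ... so normalization never reorders them and the two sequences stay
# distinct: visually indistinguishable, yet not canonically equivalent.
print(unicodedata.normalize("NFD", seq1) == unicodedata.normalize("NFD", seq2))   # False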

2. Relying on normalization, and specifically canonical ordering, to
happen in a rendering engine IS liable to be a noticeable performance
issue. I suggest that whoever wrote

Rendering systems should handle any of the canonically equivalent
orders of combining marks. This is not a performance issue: The amount
of time necessary to reorder combining marks is insignificant compared
to the time necessary to carry out other work required for rendering.

was not speaking from experience.




I wonder if anyone involved in this is speaking from real experience. Peter, I don't think your old company actually tried to implement such reordering; Sharon tells me that the idea was suggested, but rejected for reasons unrelated to performance. I have heard that your new company has tried it and has claimed that for Hebrew the performance hit is unacceptable. I am still sceptical of this claim. Presumably this was done by adding a reordering step to an existing rendering engine. But was this reordering properly optimised in binary code, or was it just bolted on to an unsuitable architecture using a high-level language designed for the different purpose of glyph-level reordering?
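For the record, the reordering the standard asks for is nothing more than a stable sort of each run of non-zero combining classes. A rough sketch in Python of just that step (my own illustration, not any particular engine's code):

import unicodedata

def canonical_reorder(text):
    # Canonical Ordering Algorithm in outline: within each run of
    # characters whose canonical combining class is non-zero, do a
    # stable sort by combining class. Starters (class 0) never move.
    chars = list(text)
    i = 0
    while i < len(chars):
        if unicodedata.combining(chars[i]) == 0:
            i += 1
            continue
        j = i
        while j < len(chars) and unicodedata.combining(chars[j]) != 0:
            j += 1
        chars[i:j] = sorted(chars[i:j], key=unicodedata.combining)
        i = j
    return "".join(chars)

# Pointed Hebrew example: shin + qamats (class 18) + shin dot (class 24).
# Either input order comes out in the single canonical order.
print(canonical_reorder("\u05E9\u05C1\u05B8") == canonical_reorder("\u05E9\u05B8\u05C1"))   # True

A linear scan plus a sort over runs that are rarely more than two or three marks long; it is hard to see how this could dominate the cost of shaping and glyph positioning.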

Also, as I just pointed out in a separate posting, there should be no performance hit for unpointed modern Hebrew as there are no combining marks to be reordered. The relatively few users of pointed Hebrew would prefer to see their text rendered correctly if a little slowly rather than quickly but incorrectly.
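Here is a sketch of the fast path I have in mind (again my own illustration in Python; the helper names needs_reordering and prepare_for_rendering are hypothetical): one cheap scan decides whether the reordering step is needed at all, so unpointed text never pays for it.

import unicodedata

def needs_reordering(text):
    # One cheap pass over the run: unpointed modern Hebrew (like most text)
    # contains no characters with a non-zero canonical combining class, so
    # the reordering step can be skipped entirely.
    return any(unicodedata.combining(ch) for ch in text)

def prepare_for_rendering(text):
    # Only text that actually carries reorderable marks pays for the
    # canonical-ordering step (NFD used here just as a stand-in for it).
    return unicodedata.normalize("NFD", text) if needs_reordering(text) else text

print(needs_reordering("\u05E9\u05DC\u05D5\u05DD"))              # unpointed shalom: False
print(needs_reordering("\u05E9\u05B8\u05C1\u05DC\u05D5\u05DD"))  # pointed: True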

If, as you agree in principle, this is an issue which goes to the core of Unicode, should you not be prepared to take some small performance hit in order to conform properly to the architecture?

...

If what is normalized is the backing store. If what is normalized is a
string at an intermediate stage in the rendering process, then this is
not the case. The reason is the number of times text-rendering APIs get
called. ...

If it is unavoidable to call the same routine (for sorting or any other purpose) multiple times with the same data, the results can be cached so that they do not have to be recalculated each time.
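For illustration only, with Python's standard functools.lru_cache standing in for whatever cache a real engine would keep at its API boundary, and NFD standing in for the reordering work:

import unicodedata
from functools import lru_cache

@lru_cache(maxsize=4096)
def reordered(run):
    # The work is done once per distinct run of text; repeated calls
    # with the same data are answered from the cache.
    return unicodedata.normalize("NFD", run)   # stand-in for the reordering step

# An API called a thousand times with the same pointed-Hebrew run
# recalculates nothing after the first call.
for _ in range(1000):
    reordered("\u05E9\u05C1\u05B8")
print(reordered.cache_info())   # hits=999, misses=1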


--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




