On 16/10/2003 12:38, Peter Constable wrote:

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Asmus Freytag

Canonical equivalence must be taken into account in rendering multiple
accents, so that any two canonically equivalent sequences display as
the same.

This statement goes to the core of Unicode. If it is followed, it
guarantees that normalizing a string does not change its appearance
(and therefore it remains the 'same' string as far as the user is
concerned.)

I agree in principle. There are two ways in which the philosophy behind
this breaks down in real life, though:

1. There are cases of combining marks given a class of 0, meaning that
combinations of marks in different positions relative to the base will
be visually indistinguishable, but the encoded representations are not
the same, and not canonically equivalent. E.g. (taken from someone else
on the Indic list) Devanagari ka + i + u vs. ka + u + i.


As we are talking about rendering rather than operations on the backing store, this is actually irrelevant. If two sequences are visually indistinguishable (with the particular font in use), a rendering engine can safely map them together even if they are not canonically equivalent, as long as the backing store is unchanged.
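To make the class-0 point concrete, here is a quick check in Python (my own illustration, using the standard unicodedata module; I am reading the ka + i + u example above as U+0915 DEVANAGARI LETTER KA, U+093F DEVANAGARI VOWEL SIGN I and U+0941 DEVANAGARI VOWEL SIGN U):

import unicodedata

KA     = "\u0915"  # DEVANAGARI LETTER KA
SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I
SIGN_U = "\u0941"  # DEVANAGARI VOWEL SIGN U

seq1 = KA + SIGN_I + SIGN_U   # ka + i + u
seq2 = KA + SIGN_U + SIGN_I   # ka + u + i

# Both vowel signs have canonical combining class 0 ...
print(unicodedata.combining(SIGN_I), unicodedata.combining(SIGN_U))   # 0 0

# ... so normalization never reorders them and the two sequences stay
# distinct: visually indistinguishable, yet not canonically equivalent.
print(unicodedata.normalize("NFD", seq1) == unicodedata.normalize("NFD", seq2))   # False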

2. Relying on normalization, and specifically canonical ordering, to
happen in a rendering engine IS liable to be a noticeable performance
issue. I suggest that whoever wrote

Rendering systems should handle any of the canonically equivalent
orders of combining marks. This is not a performance issue: The amount
of time necessary to reorder combining marks is insignificant compared
to the time necessary to carry out other work required for rendering.

was not speaking from experience.




I wonder if anyone involved in this is speaking from real experience. Peter, I don't think your old company actually tried to implement such reordering; Sharon tells me that the idea was suggested, but rejected for reasons unrelated to performance. I have heard that your new company has tried it and has claimed that for Hebrew the performance hit is unacceptable. I am still sceptical of this claim. Presumably this was done by adding a reordering step to an existing rendering engine. But was this reordering properly optimised in binary code, or was it just bolted on to an unsuitable architecture using a high-level language designed for the different purpose of glyph-level reordering?
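For the record, the reordering the standard asks for is nothing more than a stable sort of each run of non-zero combining classes. A rough sketch in Python of just that step (my own illustration, not any particular engine's code):

import unicodedata

def canonical_reorder(text):
    # Canonical Ordering Algorithm in outline: within each run of
    # characters whose canonical combining class is non-zero, do a
    # stable sort by combining class. Starters (class 0) never move.
    chars = list(text)
    i = 0
    while i < len(chars):
        if unicodedata.combining(chars[i]) == 0:
            i += 1
            continue
        j = i
        while j < len(chars) and unicodedata.combining(chars[j]) != 0:
            j += 1
        chars[i:j] = sorted(chars[i:j], key=unicodedata.combining)
        i = j
    return "".join(chars)

# Pointed Hebrew example: shin + qamats (class 18) + shin dot (class 24).
# Either input order comes out in the single canonical order.
print(canonical_reorder("\u05E9\u05C1\u05B8") == canonical_reorder("\u05E9\u05B8\u05C1"))   # True

A linear scan plus a sort over runs that are rarely more than two or three marks long; it is hard to see how this could dominate the cost of shaping and glyph positioning.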

Also, as I just pointed out in a separate posting, there should be no performance hit for unpointed modern Hebrew as there are no combining marks to be reordered. The relatively few users of pointed Hebrew would prefer to see their text rendered correctly if a little slowly rather than quickly but incorrectly.
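Here is a sketch of the fast path I have in mind (again my own illustration in Python; the helper names needs_reordering and prepare_for_rendering are hypothetical): one cheap scan decides whether the reordering step is needed at all, so unpointed text never pays for it.

import unicodedata

def needs_reordering(text):
    # One cheap pass over the run: unpointed modern Hebrew (like most text)
    # contains no characters with a non-zero canonical combining class, so
    # the reordering step can be skipped entirely.
    return any(unicodedata.combining(ch) for ch in text)

def prepare_for_rendering(text):
    # Only text that actually carries reorderable marks pays for the
    # canonical-ordering step (NFD used here just as a stand-in for it).
    return unicodedata.normalize("NFD", text) if needs_reordering(text) else text

print(needs_reordering("\u05E9\u05DC\u05D5\u05DD"))              # unpointed shalom: False
print(needs_reordering("\u05E9\u05B8\u05C1\u05DC\u05D5\u05DD"))  # pointed: True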

If, as you agree in principle, this is an issue which goes to the core of Unicode, should you not be prepared to take some small performance hit in order to conform properly to the architecture?

...

If what is normalized is the backing store. If what is normalized is a
string at an intermediate stage in the rendering process, then this is
not the case. The reason is the number of times text-rendering APIs get
called. ...

If it is unavoidable to call the same routine (for sorting or any other purpose) multiple times with the same data, the results can be cached so that they do not have to be recalculated each time.
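For illustration only, with Python's standard functools.lru_cache standing in for whatever cache a real engine would keep at its API boundary, and NFD standing in for the reordering work:

import unicodedata
from functools import lru_cache

@lru_cache(maxsize=4096)
def reordered(run):
    # The work is done once per distinct run of text; repeated calls
    # with the same data are answered from the cache.
    return unicodedata.normalize("NFD", run)   # stand-in for the reordering step

# An API called a thousand times with the same pointed-Hebrew run
# recalculates nothing after the first call.
for _ in range(1000):
    reordered("\u05E9\u05C1\u05B8")
print(reordered.cache_info())   # hits=999, misses=1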


--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




