On 23/07/2003 06:37, [EMAIL PROTECTED] wrote:

Philippe Verdy wrote on 07/22/2003 09:18:35 PM:



If there's an agreement about what should have been the best
combining classes...



Describing what the best combining classes would be can be tricky for RTL scripts if the canonical ordering is intended not only for normalization and string comparison but also as a preferred order for storage and editing interaction. The reason is that the combining classes are intentionally based on visual position relative to the base character, not on logical position. Arbitrarily, an LTR ordering ... < below left < below < below right < ... is used, which means that in an RTL script combinations of marks will be sequenced in the opposite order to the underlying line order, and so not in the logical order in which users will be thinking of them. As an example using Hebrew, for a combination of (say) beth with qamats and dehi, the preferred classes according to the visual basis on which classes are defined would be

qamats = 220
dehi = 222

and so you'd get an encoded sequence of < beth, qamats, dehi >. But for the
user, the pre-positive dehi, being to the right of the qamats, would
probably be thought of as occurring before the qamats.
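To make this concrete: canonical ordering amounts to a stable sort, by combining class, of each run of combining marks that follows a base character. The Python sketch below applies that sort using the hypothetical visually based classes proposed above (qamats = 220, dehi = 222), which are not the values actually assigned in the Unicode Character Database; the names `HYPOTHETICAL_CCC`, `ccc` and `canonical_order` are illustrative only.

```python
# Minimal sketch of canonical ordering (a stable sort of each run of
# non-starters by combining class), using the HYPOTHETICAL visually based
# classes from the message above, NOT the values in the Unicode database.

BETH = "\u05D1"    # HEBREW LETTER BET
QAMATS = "\u05B8"  # HEBREW POINT QAMATS
DEHI = "\u05AD"    # HEBREW ACCENT DEHI

# Hypothetical classes: qamats "below" = 220, dehi "below right" = 222.
# Anything not listed is treated as a starter (class 0).
HYPOTHETICAL_CCC = {QAMATS: 220, DEHI: 222}


def ccc(ch):
    return HYPOTHETICAL_CCC.get(ch, 0)


def canonical_order(text):
    """Stable-sort each maximal run of non-starters by combining class."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)


# The user may type dehi first (its logical position in RTL text), but the
# canonical order puts qamats (220) before dehi (222).
typed = BETH + DEHI + QAMATS
print([f"U+{ord(c):04X}" for c in canonical_order(typed)])
# prints ['U+05D1', 'U+05B8', 'U+05AD']
```

Whichever order the two marks are typed in, the result is < beth, qamats, dehi >, i.e. the visual LTR sequence rather than the order a Hebrew reader would think of.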

Now, I said above that the classes were based arbitrarily on a visual LTR
order. An RTL ordering ... < below right < below < below left < ... could
have been used, but then the same mismatch would exist for LTR scripts. So,
the problem is not with the arbitrary choice of LTR visual ordering for the
classes.



- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485







From Unicode 4.0 section 3.11, http://www.unicode.org/book/preview/ch03.pdf: "The particular numeric value of the combining class does not have any special significance; the intent of providing the numeric values is /only/ to distinguish the combining classes as being different, for use in equivalence comparisons. ... The canonical order of character sequences does /not/ imply any kind of linguistic correctness or linguistic preference for ordering of combining marks in sequences."

There is therefore no reason for combining classes to reflect ordering. The problem, if there is one, is with rendering software which expects to receive an input stream in a logical order although Unicode implies that the order is arbitrary, especially when normalised forms are used for data exchange. The implication of this is that rendering software should in general expect to perform its own reordering.
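A rough sketch of what such reordering might look like, assuming a renderer that keeps its own table of mark positions: the stored text is normalised with Python's standard `unicodedata` module, and the renderer then orders the marks in each cluster rightmost first for an RTL script, without depending on the stored order. The position table `MARK_POSITION` and the functions `clusters` and `rtl_mark_order` are hypothetical, not part of any real rendering engine.

```python
import unicodedata

BETH = "\u05D1"    # HEBREW LETTER BET
QAMATS = "\u05B8"  # HEBREW POINT QAMATS
DEHI = "\u05AD"    # HEBREW ACCENT DEHI

# HYPOTHETICAL renderer data: horizontal position of each mark relative to
# its base (-1 = right of centre, 0 = centred, +1 = left of centre).
MARK_POSITION = {QAMATS: 0, DEHI: -1}


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    result = []
    for ch in text:
        if unicodedata.combining(ch) and result:
            result[-1].append(ch)
        else:
            result.append([ch])
    return result


def rtl_mark_order(text):
    """Within each cluster, order marks rightmost first, ignoring stored order."""
    out = []
    for base, *marks in clusters(text):
        out.append(base)
        out.extend(sorted(marks, key=lambda m: MARK_POSITION.get(m, 0)))
    return "".join(out)


# Normalisation (NFD) puts qamats before dehi, whatever order was typed...
stored = unicodedata.normalize("NFD", BETH + DEHI + QAMATS)
# ...and the renderer derives its own right-to-left order for the marks.
print([f"U+{ord(c):04X}" for c in rtl_mark_order(stored)])
# prints ['U+05D1', 'U+05AD', 'U+05B8']
```

Because the renderer's sort key comes from its own data rather than from the stored sequence, typing the marks in either order gives the same result.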

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




