Exactly. See http://www.unicode.org/faq/normalization.html#8, for example. (Note: the last FAQ would change if the UTC accepts the proposal for usage of CGJ.)
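
As a rough illustration of the reordering discussed in the quoted message, here is a minimal Python sketch using the standard unicodedata module. The code points chosen for beth, qamats and dehi (U+05D1, U+05B8, U+05AD) and the helper name classes() are my own assumptions for the example, not something taken from the thread. The combining class values it prints come from whichever Unicode version the interpreter ships with, so they are the published classes rather than the hypothetical "preferred" ones discussed below.

```python
import unicodedata

# Assumed code points for the Hebrew example in the quoted message.
BET = "\u05d1"     # HEBREW LETTER BET (base character)
QAMATS = "\u05b8"  # HEBREW POINT QAMATS
DEHI = "\u05ad"    # HEBREW ACCENT DEHI


def classes(s):
    """Return the canonical combining class of each character in s (hypothetical helper)."""
    return [unicodedata.combining(ch) for ch in s]


# The order a user might think of as logical: dehi before qamats.
typed = BET + DEHI + QAMATS

# Normalization applies canonical ordering: marks with non-zero classes
# are sorted by ascending combining class after their base character.
normalized = unicodedata.normalize("NFD", typed)

print("typed:     ", [f"U+{ord(c):04X}" for c in typed], classes(typed))
print("normalized:", [f"U+{ord(c):04X}" for c in normalized], classes(normalized))
```

Whatever order the marks were entered in, the normalized string carries them in combining-class order, so software that wants a particular logical order cannot rely on normalized input to supply it.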
Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄

----- Original Message -----
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, July 23, 2003 07:24
Subject: Re: Yerushala(y)im - or Biblical Hebrew

> On 23/07/2003 06:37, [EMAIL PROTECTED] wrote:
>
> >Philippe Verdy wrote on 07/22/2003 09:18:35 PM:
> >
> >>If there's an agreement about what should have been the best
> >>combining classes...
> >
> >Describing what would be the best combining classes can be tricky for RTL
> >scripts if the canonical ordering is intended not only for purposes of
> >normalization and string comparison but also as a preferred order for
> >storage and editing interaction. The reason is that the combining classes
> >are intentionally based on visual relative position wrt the base character,
> >not logical. Arbitrarily, a LTR ordering ... < below left < below < below
> >right < ... is used, meaning that combinations of marks will be sequenced
> >in the opposite order to the underlying line order, and so not in the
> >logical order in terms of which users will be thinking. As an example using
> >Hebrew, for a combination of (say) beth with qamats and dehi, preferred
> >classes according to the visual basis on which classes are defined would be
> >
> >qamats = 220
> >dehi = 222
> >
> >and so you'd get an encoded sequence of < beth, qamats, dehi >. But for the
> >user, the pre-positive dehi, being to the right of the qamats, would
> >probably be thought of as occurring before the qamats.
> >
> >Now, I said above that the classes were based arbitrarily on a visual LTR
> >order. A RTL ordering ... < below right < below < below left < ... could
> >have been used, but then the same mismatch would exist for LTR scripts. So,
> >the problem is not with the arbitrary choice of LTR visual ordering for the
> >classes.
> >
> >- Peter
> >
> >---------------------------------------------------------------------------
> >Peter Constable
> >
> >Non-Roman Script Initiative, SIL International
> >7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> >Tel: +1 972 708 7485
> >
>
> From Unicode 4.0 section 3.11,
> http://www.unicode.org/book/preview/ch03.pdf: "The particular numeric
> value of the combining class does not have any special significance; the
> intent of providing the numeric values is /only/ to distinguish the
> combining classes as being different, for use in equivalence
> comparisons. ... The canonical order of character sequences does /not/
> imply any kind of linguistic correctness or linguistic preference for
> ordering of combining marks in sequences." There is therefore no reason
> for combining classes to reflect ordering. The problem, if there is one,
> is with rendering software which expects to receive an input stream in a
> logical order although Unicode implies that the order is arbitrary,
> especially when normalised forms are used for data exchange. The
> implication of this is that rendering software should in general expect
> to perform its own reordering.
>
> --
> Peter Kirk
> [EMAIL PROTECTED]
> http://web.onetel.net.uk/~peterkirk/