Exactly. See http://www.unicode.org/faq/normalization.html#8, for
example. (Note: the last FAQ would change if the UTC accepts the
proposal for usage of CGJ.)

Mark
__________________________________
http://www.macchiato.com
►  “Eppur si muove” ◄

----- Original Message ----- 
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, July 23, 2003 07:24
Subject: Re: Yerushala(y)im - or Biblical Hebrew


> On 23/07/2003 06:37, [EMAIL PROTECTED] wrote:
>
> >Philippe Verdy wrote on 07/22/2003 09:18:35 PM:
> >
> >
> >
> >>If there's an agreement about what should have been the best
> >>combining classes...
> >>
> >>
> >
> >Describing what would be the best combining classes can be tricky for RTL
> >scripts if the canonical ordering is intended not only for purposes of
> >normalization and string comparison but also as a preferred order for
> >storage and editing interaction. The reason is that the combining classes
> >are intentionally based on visual relative position wrt the base character,
> >not logical. Arbitrarily, a LTR ordering ... < below left < below < below
> >right < ... is used, meaning that combinations of marks will be sequenced
> >in the opposite order to the underlying line order, and so not in the
> >logical order in terms of which users will be thinking. As an example using
> >Hebrew, for a combination of (say) beth with qamats and dehi, preferred
> >classes according to the visual basis on which classes are defined would be
> >
> >qamats = 220
> >dehi = 222
> >
> >and so you'd get an encoded sequence of < beth, qamats, dehi >. But for the
> >user, the pre-positive dehi, being to the right of the qamats, would
> >probably be thought of as occurring before the qamats.
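
To make the beth/qamats/dehi example concrete, here is a minimal sketch using
Python's standard unicodedata module. The helper names (canonical_order,
classes) are made up for illustration. Note that in the character database
qamats actually has class 18 rather than the 220 used in the hypothetical
above; dehi is 222. The effect is the same either way: whatever order the
user types the marks in, canonical ordering puts qamats before dehi.

```python
import unicodedata

BETH = "\u05D1"    # HEBREW LETTER BET
QAMATS = "\u05B8"  # HEBREW POINT QAMATS
DEHI = "\u05AD"    # HEBREW ACCENT DEHI


def classes(text):
    """Return the canonical combining class of each character (0 for a base)."""
    return [unicodedata.combining(ch) for ch in text]


def canonical_order(text):
    """NFD applies canonical ordering to each run of combining marks."""
    return unicodedata.normalize("NFD", text)


# The order a user thinking right-to-left might type: dehi before qamats.
typed = BETH + DEHI + QAMATS
stored = canonical_order(typed)

print(classes(typed))
print(classes(stored))
```

The second list comes out sorted by class within the mark sequence, which is
the mismatch with the user's expectation described above.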
> >
> >Now, I said above that the classes were based arbitrarily on a visual LTR
> >order. A RTL ordering ... < below right < below < below left < ... could
> >have been used, but then the same mismatch would exist for LTR scripts. So,
> >the problem is not with the arbitrary choice of LTR visual ordering for the
> >classes.
> >
> >
> >
> >- Peter
> >
> >
>
> >---------------------------------------------------------------------------
> >Peter Constable
> >
> >Non-Roman Script Initiative, SIL International
> >7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> >Tel: +1 972 708 7485
> >
> From Unicode 4.0 section 3.11,
> http://www.unicode.org/book/preview/ch03.pdf: "The particular numeric
> value of the combining class does not have any special significance; the
> intent of providing the numeric values is /only/ to distinguish the
> combining classes as being different, for use in equivalence
> comparisons. ... The canonical order of character sequences does /not/
> imply any kind of linguistic correctness or linguistic preference for
> ordering of combining marks in sequences." There is therefore no reason
> for combining classes to reflect ordering. The problem, if there is one,
> is with rendering software which expects to receive an input stream in a
> logical order although Unicode implies that the order is arbitrary,
> especially when normalised forms are used for data exchange. The
> implication of this is that rendering software should in general expect
> to perform its own reordering.
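
A rough sketch of what "perform its own reordering" could look like, again
in Python with the standard unicodedata module. Everything here apart from
unicodedata itself is an assumption made for illustration: the function
names are invented, and the "higher class first" preference is just one
hypothetical choice a renderer might make, not anything the standard
prescribes. The point is only that a process can compare strings via a
normalized form and still rearrange marks internally, as long as marks of
the same class keep their relative order.

```python
import unicodedata

BETH, QAMATS, DEHI = "\u05D1", "\u05B8", "\u05AD"


def canonically_equal(a, b):
    """Equivalence comparison: compare the NFD forms of both strings."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def reorder_marks(text, key):
    """Stable-sort each run of combining marks (class != 0) by `key`.

    Characters with class 0 are left in place and end the current run.
    """
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch):
            run.append(ch)
        else:
            out.extend(sorted(run, key=key))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=key))
    return "".join(out)


def higher_class_first(ch):
    """Hypothetical renderer preference: marks with a larger class come first."""
    return -unicodedata.combining(ch)


received = BETH + QAMATS + DEHI  # canonical order, as exchanged
internal = reorder_marks(received, higher_class_first)
print(internal == BETH + DEHI + QAMATS, canonically_equal(received, internal))
```

Because the sort is stable and the key depends only on the combining class,
marks sharing a class are never swapped, so the renderer's internal string
stays canonically equivalent to what it received.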
>
> -- 
> Peter Kirk
> [EMAIL PROTECTED]
> http://web.onetel.net.uk/~peterkirk/
>
>
>
>

