On 25/09/2003 12:27, [EMAIL PROTECTED] wrote:

Is this actually correct? For example, if I have in my data the string <U+0104, U+05B0> (which I know is garbage, but that is irrelevant), that will decompose and reorder to <U+0041, U+05B0, U+0328>, as U+0328 has a higher combining class (202) than U+05B0 (10). What does this become in NFC? Is the reordering reversed and the combination reapplied?



First an attempt is made to compose U+0041 and U+05B0. No precomposed character exists for that pair, so the attempt fails. Then an attempt is made to compose U+0041 and U+0328, which produces U+0104. U+0041 is replaced with U+0104 and U+0328 is removed, resulting in <U+0104, U+05B0>.


It's not a reordering per se, as the first combining character is given the first "opportunity" to combine.
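
For anyone who wants to check the example above, here is a minimal sketch using Python's standard unicodedata module. The helper name codepoints is just for illustration, and the results depend on the Unicode data shipped with the interpreter, although these particular characters have long-standing, stable properties.

```python
import unicodedata

def codepoints(s):
    """Format a string as a list of U+XXXX labels (illustrative helper)."""
    return ["U+%04X" % ord(ch) for ch in s]

# The "garbage" sequence from the question: A with ogonek, then Hebrew sheva.
original = "\u0104\u05B0"

# NFD decomposes U+0104 and then sorts the combining marks by combining class.
nfd = unicodedata.normalize("NFD", original)
# NFC recomposes: each mark in turn gets a chance to combine with the starter.
nfc = unicodedata.normalize("NFC", original)

print(codepoints(nfd))  # ['U+0041', 'U+05B0', 'U+0328']
print([unicodedata.combining(ch) for ch in nfd])  # [0, 10, 202]
print(codepoints(nfc))  # ['U+0104', 'U+05B0']
```

The NFC result has the same order as the input only because U+0328 is absorbed into the base character; U+05B0 is never moved back in front of anything.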


Thanks for the clarification.



This is not only a theoretical issue, as the same applies to some real combinations. There was discussion only last week on the bidi list of a form which might be encoded <U+064A, U+0652, U+0654> but which would be messed up if composed into <U+0626, U+0652>.



Yes, NFC would perform that composition. Are you sure it would be an issue? Applying the bidi rules doesn't seem to make this an issue.

<U+064A, U+0652, U+0654>
bidi: AL, NSM, NSM
applying rule W1 from UAX #9:
AL, NSM, NSM -> AL, AL, NSM -> AL, AL, AL

<U+0626, U+0652>
bidi: AL, NSM
applying rule W1:
AL, NSM -> AL, AL
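
Both claims (that NFC performs the composition, and that W1 resolves either form to all AL) can be checked with a rough sketch like the one below. The function resolve_w1 is a hypothetical helper that implements only rule W1 in its simplest form, taking the whole string as one level run with an assumed start-of-run type of R; it is not a full bidi implementation.

```python
import unicodedata

def bidi_classes(s):
    """Return the bidi class of each character, e.g. 'AL' or 'NSM'."""
    return [unicodedata.bidirectional(ch) for ch in s]

def resolve_w1(classes, sor="R"):
    """Rule W1 only: each NSM takes the type of the previous character.

    Assumption: the input is a single level run; an NSM at the very start
    takes the start-of-run type, passed in as `sor`.
    """
    resolved = []
    prev = sor
    for c in classes:
        if c == "NSM":
            c = prev
        resolved.append(c)
        prev = c
    return resolved

decomposed = "\u064A\u0652\u0654"   # yeh, sukun, hamza above
composed = unicodedata.normalize("NFC", decomposed)

print(["U+%04X" % ord(ch) for ch in composed])   # ['U+0626', 'U+0652']
print(resolve_w1(bidi_classes(decomposed)))      # ['AL', 'AL', 'AL']
print(resolve_w1(bidi_classes(composed)))        # ['AL', 'AL']
```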

Or is the issue with something else, but it came up on the bidi list?



The problem isn't with the bidi rules but with more general Arabic shaping etc. There are two issues: one is the position of the hamza (in this case it should be to the left of the sukun), and the other is that the medial form of U+064A has dots below, which are required in this combination, whereas the medial form of U+0626 does not. But I think we concluded that U+0654 alone is not suitable for encoding this particular hamza.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




