On 23/11/16 11:20, Philippe Verdy wrote:
2016-11-23 12:00 GMT+01:00 Tom Hacohen <t...@osg.samsung.com>:


    Also take another look at
    http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules
    specifically the table that shows another way of writing the ignore
    rule. This again shows my understanding of rule 4 is correct.

    In particular, look at the following equivalence:
    X Y × Z W  ⇒  X (Extend | Format)* Y (Extend | Format)* × Z (Extend | Format)* W


This expansion only takes effect from rule WB4 onward; it cannot be used
to transform rules WB1 to WB3c, as the algorithm states explicitly.
Your case is handled by rule WB3c, so you are misreading the spec by
assuming the expansion applies there too.


I took a look at the ICU sources, and they explicitly mention this case, so it seems I was mistaken in my interpretation of the intent of the UAX. I still find it confusing, but based on this thread, it seems to be just me.

Sorry for the noise.

The comment from the ICU source code:
# Rule 3c ZWJ x (Extended_Pict | EmojiNRK). Precedes WB4, so no intervening Extend chars allowed.
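
To make the ordering concrete for anyone reading this later, here is a minimal Python sketch of how I now understand it. It is not ICU's implementation: the function name is_boundary, the property strings, and the toy rule set (only WB3c, WB4, a WB5-like ALetter rule, and the WB999 fallback) are my own assumptions for illustration, and WB4's exceptions for sot, CR, LF and Newline are left out.

```python
# Characters that WB4 tells later rules to ignore (simplified).
IGNORED = {"Extend", "Format", "ZWJ"}


def is_boundary(props, i):
    """Return True if there is a word boundary between props[i-1] and props[i].

    props is a list of (hypothetical) Word_Break property names, one per
    code point. Only a toy subset of the rules is modelled here.
    """
    left, right = props[i - 1], props[i]

    # WB3c: ZWJ x Extended_Pict. This rule comes before WB4, so it is
    # tested on the literally adjacent pair, with nothing skipped.
    if left == "ZWJ" and right == "Extended_Pict":
        return False

    # WB4, first half: never break before Extend, Format or ZWJ.
    if right in IGNORED:
        return False

    # WB4, second half: for all later rules, skip ignored characters
    # to find the effective left-hand character.
    j = i - 1
    while j > 0 and props[j] in IGNORED:
        j -= 1
    left = props[j]

    # WB5 (toy version): do not break between letters.
    if left == "ALetter" and right == "ALetter":
        return False

    # WB999: otherwise, break everywhere.
    return True


if __name__ == "__main__":
    print(is_boundary(["ZWJ", "Extended_Pict"], 1))
    print(is_boundary(["ZWJ", "Extend", "Extended_Pict"], 2))
    print(is_boundary(["ALetter", "Extend", "ALetter"], 2))
```

In this sketch, ZWJ directly followed by Extended_Pict never breaks, while ZWJ Extend Extended_Pict does, because WB3c is checked before any skipping happens. ALetter Extend ALetter still stays together, since the WB5-like rule runs after WB4.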

Thanks for your help,
Tom
