On 23/11/16 11:20, Philippe Verdy wrote:
2016-11-23 12:00 GMT+01:00 Tom Hacohen <t...@osg.samsung.com>:
Also take another look at
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules
specifically the table that shows another way of writing the ignore
rule. This again shows my understanding of rule 4 is correct.
Specifically, look at the following equivalence:
X Y × Z W ⇒ X (Extend | Format)* Y (Extend | Format)* × Z (Extend | Format)* W
This expansion does not occur before rule WB4; it cannot be used to
transform rules WB1 to WB3c; this is explicitly stated in the algorithm.
Rule WB3c is the one that handles your case, so you are misreading the
spec by applying the expansion there too.
I took a look at the ICU sources, and they explicitly mention this case,
so it seems I was mistaken in my interpretation of the UAX's intent. I
still find it confusing, but based on this thread, it seems to be just me.
Sorry for the noise.
The comment from the ICU source code:
# Rule 3c ZWJ x (Extended_Pict | EmojiNRK). Precedes WB4, so no intervening Extend chars allowed.
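To make the ordering concrete, here is a rough Python sketch of how I now read it. It is a toy model, not ICU's implementation: characters are represented only by their Word_Break class names, the function name is_break is made up for illustration, and only WB3c, WB4 and the default WB999 are modelled. WB4 is also simplified (the exceptions after sot, CR, LF and Newline are left out).

```python
# Toy model of the rule ordering discussed above (hypothetical helper, not ICU code).
# Each character is represented only by its Word_Break class name.

# Classes that attach to the preceding character under WB4 (simplified).
ATTACHING = {"Extend", "Format", "ZWJ"}
# Class names on the right-hand side of rule 3c, as spelled in the ICU comment.
PICTOGRAPHIC = {"Extended_Pict", "EmojiNRK"}


def is_break(classes, i):
    """Return True if a word boundary falls between classes[i-1] and classes[i]."""
    prev, cur = classes[i - 1], classes[i]
    # WB3c comes before WB4, so it only sees the literally adjacent pair:
    # no Extend/Format characters may sit between ZWJ and the pictograph.
    if prev == "ZWJ" and cur in PICTOGRAPHIC:
        return False
    # WB4 (simplified): never break before Extend, Format or ZWJ.
    if cur in ATTACHING:
        return False
    # WB999: otherwise, break everywhere.
    return True


adjacent = ["ZWJ", "Extended_Pict"]
separated = ["ZWJ", "Extend", "Extended_Pict"]

print(is_break(adjacent, 1))   # False: WB3c matches the adjacent pair
print(is_break(separated, 1))  # False: WB4 keeps Extend attached to ZWJ
print(is_break(separated, 2))  # True: WB3c does not see through the Extend
```

The last case is the one I had wrong: the (Extend | Format)* expansion only applies to rules after WB4, so it does not rescue WB3c here.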
Thanks for your help,
Tom