On Tuesday 22 November 2016 at 13:07, Tom Hacohen wrote:
> However, looking at the test case and the UAX[2], this does not look
> correct. More specifically, because of rule 4:
> ZWJ Extended GAZ -> ZWJ GAZ
> And then according to rule 3c, there should be no break opportunity 
> between them. 

I'd say this is not the right operational model. From [1]: 

"The rules are processed from top to bottom. As soon as a rule matches and 
produces a boundary status (boundary or no boundary) for that offset, the 
process is terminated."

So in this case, between COMBINING DIAERESIS and HEAVY BLACK HEART, rule WB4 
kicks in. It does not produce a boundary status; it only changes your offset 
context to ZWJ GAZ, as you mention. You then continue applying the rules 
sequentially: WB6 does not match, WB7 does not match, and so on, until you 
reach WB999, which matches and produces a boundary status. 

After WB4 you do not restart the matching process from the beginning. That 
restart is what you did, and it is what led you to conclude that WB3c should 
apply.
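
To make the processing order concrete, here is a rough Python sketch of that 
model. It is a toy, not a UAX #29 implementation: the function name 
boundary_status is made up, properties are passed in directly as strings 
(with "GAZ" standing for Glue_After_Zwj), and only WB3c, WB4, a WB5-like rule 
and WB999 are modelled. The "no break before Extend/Format/ZWJ" step is the 
usual way of expanding an ignore rule such as WB4.

```python
# Toy model of top-to-bottom rule processing; not a full UAX #29 implementation.
# Property names are passed in as plain strings (an assumption for illustration).

SKIPPED = {"Extend", "Format", "ZWJ"}


def boundary_status(props, i):
    """Return "break" or "no break" for the offset between props[i-1] and props[i]."""
    left, right = props[i - 1], props[i]

    # WB3c: ZWJ x Glue_After_Zwj. It sits above WB4, so it only ever sees
    # the original, unrewritten context.
    if left == "ZWJ" and right == "GAZ":
        return "no break"

    # WB4, first half of the usual expansion: never break before
    # Extend, Format or ZWJ.
    if right in SKIPPED:
        return "no break"

    # WB4, second half: X (Extend | Format | ZWJ)* -> X.
    # This rewrites the left context and produces NO boundary status.
    j = i - 1
    while j > 0 and props[j] in SKIPPED:
        j -= 1
    left = props[j]

    # Rules below WB4 see the rewritten context. Only a WB5-like rule is
    # modelled here. Control never jumps back up to WB3c.
    if left == "ALetter" and right == "ALetter":
        return "no break"

    # WB999: otherwise, break everywhere.
    return "break"


# ZWJ, COMBINING DIAERESIS (Extend), HEAVY BLACK HEART (GAZ)
sequence = ["ZWJ", "Extend", "GAZ"]
print(boundary_status(sequence, 2))
```

For the offset before HEAVY BLACK HEART, WB4 rewrites the left context to ZWJ, 
but WB3c has already been passed, so the sketch falls through to WB999 and 
reports a break. With ["ZWJ", "GAZ"] and no Extend in between, WB3c matches 
directly and the result is no break.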

Best, 

Daniel


[1] http://www.unicode.org/reports/tr29/#Notation

