Re: UAX #29 and WB4

2020-03-04 Thread Mark Davis ☕️ via Unicode
One thing we have considered for a while is whether to do a rewrite of the rules to simplify the processing (and avoid the "treat as" rules), but it would take a fair amount of design work that we haven't had time to do. If you (or others) are interested in getting involved, please let us know. Ma

Re: UAX #29 and WB4

2020-03-04 Thread Daniel Bünzli via Unicode
On 4 March 2020 at 18:48:09, Daniel Bünzli (daniel.buen...@erratique.ch) wrote:
> On 4 March 2020 at 18:01:25, Daniel Bünzli (daniel.buen...@erratique.ch) wrote:
>
> > Re-reading the text I suspect I should not restart the rules from the first one when a WB4 rewrite occurs but on

Re: UAX #29 and WB4

2020-03-04 Thread Daniel Bünzli via Unicode
On 4 March 2020 at 18:01:25, Daniel Bünzli (daniel.buen...@erratique.ch) wrote:
> Re-reading the text I suspect I should not restart the rules from the first one when a WB4 rewrite occurs but only apply the subsequent rules. Is that correct?

However, even if that's correct I don't understa

UAX #29 and WB4

2020-03-04 Thread Daniel Bünzli via Unicode
Hello,

My implementation of word break chokes only on the following test case from the file [1]:

  ÷ 0020 × 0308 ÷ 0020 ÷ #  ÷ [0.2] SPACE (WSegSpace) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] SPACE (WSegSpace) ÷ [0.3]

I find:

  ÷ 0020 × 0308 × 0020 ÷

Basically my implementation uses
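The discrepancy can be reproduced with a small sketch. It assumes a hypothetical two-entry property table (SPACE as WSegSpace, U+0308 as Extend) in place of the real WordBreakProperty.txt data, and it models only rules WB3d, WB4 and WB999; the names `is_break`, `segment` and `restart_after_wb4` are illustrative and not taken from any library. With `restart_after_wb4=False`, WB3d is tested only on the raw adjacent characters and only the later rules see the WB4 "treat as" left context, which yields the expected `÷ 0020 × 0308 ÷ 0020 ÷`. With `restart_after_wb4=True`, WB3d is re-run on the rewritten pair (SPACE, SPACE), which yields the `×` reported above.

```python
# Minimal sketch: only WB3d, WB4 and WB999 from UAX #29 are modelled.
# WB_PROP is a hypothetical two-entry stand-in for WordBreakProperty.txt.
WB_PROP = {0x0020: "WSegSpace", 0x0308: "Extend"}
IGNORABLE = {"Extend", "Format", "ZWJ"}


def prop(cp: int) -> str:
    """Word_Break property of a code point ("Other" if not in the toy table)."""
    return WB_PROP.get(cp, "Other")


def is_break(cps: list[int], i: int, restart_after_wb4: bool = False) -> bool:
    """Decide whether there is a break between cps[i] and cps[i + 1]."""
    left, right = prop(cps[i]), prop(cps[i + 1])

    # WB3d: WSegSpace × WSegSpace, tested on the raw adjacent characters.
    if left == "WSegSpace" and right == "WSegSpace":
        return False

    # WB4: X (Extend | Format | ZWJ)* → X, so never break before an ignorable.
    # (The sot/CR/LF/Newline exception is not modelled here.)
    if right in IGNORABLE:
        return False

    # "Treat as" part of WB4: later rules see the last non-ignorable
    # character to the left instead of the raw cps[i].
    j = i
    while j > 0 and prop(cps[j]) in IGNORABLE:
        j -= 1
    eff_left = prop(cps[j])

    # Variant under discussion: restarting from the first rule after the
    # rewrite lets WB3d fire again on the rewritten pair.
    if restart_after_wb4 and eff_left == "WSegSpace" and right == "WSegSpace":
        return False

    # WB5..WB16 would be checked against (eff_left, right) here; omitted.
    # WB999: otherwise, break everywhere.
    return True


def segment(cps: list[int], restart_after_wb4: bool = False) -> str:
    """Render breaks in the notation of WordBreakTest.txt."""
    out = ["÷"]  # WB1: break at start of text
    for i, cp in enumerate(cps):
        out.append(f"{cp:04X}")
        if i + 1 < len(cps):
            out.append("÷" if is_break(cps, i, restart_after_wb4) else "×")
    out.append("÷")  # WB2: break at end of text
    return " ".join(out)


if __name__ == "__main__":
    case = [0x0020, 0x0308, 0x0020]
    print(segment(case))                          # ÷ 0020 × 0308 ÷ 0020 ÷
    print(segment(case, restart_after_wb4=True))  # ÷ 0020 × 0308 × 0020 ÷
```

In this sketch the WB4 rewrite is applied lazily per boundary (scanning back over ignorables) rather than by deleting Extend characters from the string up front; deleting them first and then running every rule from the start is exactly what makes WB3d join the two spaces.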