Re: UAX 29 9.0.0 new emoji flag rules questions and comments

Mark Davis ☕️ Wed, 22 Jun 2016 04:36:02 -0700

That wouldn't work. The process works by taking each offset, and walking
through all the rules, using the first one that matches.


So with your rules and the following input:

RI RI RI RI RI RI

You'd get a break at any offset with at least 2 RI on both the left and the
right, and no break at every other offset, thus:

RI x RI ÷ RI ÷ RI ÷ RI x RI
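
The first-match process described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the names
RULES, decide and render are hypothetical, the rule table encodes just the
two proposed rules plus a catch-all "break everywhere" fallback, and none of
it is taken from UAX #29 or from any real implementation.

```python
# Each rule: (classes required just before the offset,
#             classes required just after the offset,
#             decision). Rules are tried in order; the first match wins.
RULES = [
    (["RI", "RI"], ["RI", "RI"], "÷"),  # proposed rule: RI RI ÷ RI RI
    (["RI"], ["RI"], "x"),              # proposed rule: RI x RI
    ([], [], "÷"),                      # assumed fallback: break everywhere
]


def decide(classes, offset):
    """Return the decision for the boundary before classes[offset]."""
    left, right = classes[:offset], classes[offset:]
    for before, after, decision in RULES:
        # Compare the tail of the left context and the head of the right one.
        tail = left[len(left) - len(before):]
        head = right[:len(after)]
        if tail == before and head == after:
            return decision
    raise AssertionError("the fallback rule always matches")


def render(classes):
    """Interleave the classes with the decision at each interior offset."""
    parts = [classes[0]]
    for offset in range(1, len(classes)):
        parts.append(decide(classes, offset))
        parts.append(classes[offset])
    return " ".join(parts)


print(render(["RI"] * 6))
```

Because the four-RI rule is tried first, it claims every interior offset, and
the RI x RI rule only gets to apply at the two ends of the run.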


Mark

On Wed, Jun 22, 2016 at 1:10 PM, Daniel Bünzli wrote:

>
> On Wednesday, 22 June 2016 at 01:32, Laurentiu Iancu wrote:
> > Re #1, the ^ symbol indeed denotes a start-of-line anchor, in usual
> > regex notation, and the corresponding rules could use sot instead.
>
> By the way it seems to me that an equivalent formulation of GB12/GB13 and
> WB15/WB16 would be to have the sequence of rules:
>
> RI RI ÷ RI RI
> RI x RI
>
> This fits particularly well in the case of word breaking since you already
> need as much context as this because of the rules WB{6,7,11,12}. It also
> avoids regexps and negation.
>
> Best,
>
> Daniel
>
>
