That wouldn't work. The process works by taking each offset, and walking through all the rules, using the first one that matches.
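
As a minimal sketch of that first-match procedure (the rule encoding and the names `RULES`, `boundary`, and `segment` are illustrative assumptions of mine, not anything defined in UAX #29), the two proposed rules could be applied to a run of six RI like this:

```python
# Hypothetical encoding: (left context, right context, action).
# Rules are tried in order; the first one that matches an offset wins.
RULES = [
    (["RI", "RI"], ["RI", "RI"], "÷"),  # RI RI ÷ RI RI
    (["RI"], ["RI"], "x"),              # RI x RI
]
DEFAULT = "÷"  # assumed fallback when no rule matches (Any ÷ Any)


def boundary(props, i, rules=RULES):
    """Return the action at the offset between props[i-1] and props[i]."""
    for left, right, action in rules:
        # A slice shorter than the required context simply fails to compare equal.
        left_ok = props[max(0, i - len(left)):i] == left
        right_ok = props[i:i + len(right)] == right
        if left_ok and right_ok:
            return action
    return DEFAULT


def segment(props, rules=RULES):
    """Interleave the property values with the action chosen at each offset."""
    if not props:
        return ""
    out = [props[0]]
    for i in range(1, len(props)):
        out += [boundary(props, i, rules), props[i]]
    return " ".join(out)


print(segment(["RI"] * 6))
```

Because the rules are tried in order, the rule with two RI on each side wins wherever it matches, and the shorter rule only applies at the remaining offsets.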

So with your rules and the following input:

RI RI RI RI RI RI

you'd get that any offset with at least 2 RI on the right and on the left would have a break, and everything else would have no break, thus:

RI x RI ÷ RI ÷ RI ÷ RI x RI

Mark

On Wed, Jun 22, 2016 at 1:10 PM, Daniel Bünzli <[email protected]> wrote:

> On Wednesday, 22 June 2016 at 01:32, Laurentiu Iancu wrote:
> > Re #1, the ^ symbol indeed denotes a start-of-line anchor, in usual
> > regex notation, and the corresponding rules could use sot instead.
>
> By the way it seems to me that an equivalent formulation of GB12/GB13 and
> WB15/WB16 would be to have the sequence of rules:
>
> RI RI ÷ RI RI
> RI x RI
>
> This fits particularly well in the case of word breaking since you already
> need as much context as this because of the rules WB{6,7,11,12}. It also
> avoids regexps and negation.
>
> Best,
>
> Daniel

