Re: Possible bug in formal grammar for extended grapheme cluster

2017-12-18 Thread Mark Davis ☕️ via Unicode
If you look back at http://www.unicode.org/reports/tr29/tr29-27.html#GB8a (2015), the rule was simply not to break sequences of RI characters. We changed that in http://www.unicode.org/reports/tr29/tr29-29.html#GB12 (2016) to only group pairs. Unfortunately, the (informative) table

Re: Possible bug in formal grammar for extended grapheme cluster

2017-12-18 Thread André Schappo via Unicode
Ah! That explains why pcre2grep -u '^\X{1}$' matches with     ...etc... André Schappo On 17 Dec 2017, at 17:17, Mark Davis ☕️ via Unicode wrote: Thanks for the feedback. You're correct about this; that is a holdover

Re: Possible bug in formal grammar for extended grapheme cluster

2017-12-17 Thread Mark Davis ☕️ via Unicode
Thanks for the feedback. You're correct about this; that is a holdover from an earlier version of the document when there was a more basic treatment of RI sequences. There is already an action to modify these. There is a placeholder review note about that just above

Possible bug in formal grammar for extended grapheme cluster

2017-12-17 Thread David P. Kendal via Unicode
Hi, It’s possible I’m missing something, but the formal grammar/regular expression given for extended grapheme clusters appears to have a bug in it. The bug is here: RI-Sequence := Regional_Indicator+
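
For readers following the thread, here is a rough sketch of the difference being discussed: the quoted production `RI-Sequence := Regional_Indicator+` treats any run of Regional Indicator characters as one cluster, whereas the 2016 rules mentioned in the replies group them in pairs. This is only an illustration in Python's `re` module, not the UAX #29 algorithm. The names `ri_run`, `ri_pairs`, and `regional_indicator` are made up for this example, and the pairing pattern assumes the run contains nothing but RI characters (no Extend characters or other rules are considered).

```python
import re

# Regional Indicator symbols occupy U+1F1E6..U+1F1FF.
RI = "[\U0001F1E6-\U0001F1FF]"

# The production quoted in the report: a whole run of RI is one cluster.
ri_run = re.compile(f"{RI}+")

# Assumed approximation of the pairing behaviour: take RI characters
# two at a time; a trailing unpaired RI is left as its own cluster.
ri_pairs = re.compile(f"{RI}{{2}}|{RI}")


def regional_indicator(letter: str) -> str:
    """Map an ASCII letter A-Z to the matching Regional Indicator symbol."""
    return chr(0x1F1E6 + ord(letter.upper()) - ord("A"))


# Five RI characters in a row: two complete pairs plus one left over.
text = "".join(regional_indicator(c) for c in "FRDEU")

run_clusters = ri_run.findall(text)
pair_clusters = ri_pairs.findall(text)

print(len(run_clusters), [len(c) for c in run_clusters])
print(len(pair_clusters), [len(c) for c in pair_clusters])
```

With the `+` production the five characters come back as a single match, while the pairing pattern yields three pieces of two, two, and one code points. That mismatch is what the original report is pointing at.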