On Tue, Oct 8, 2019 at 7:28 AM Richard Wordingham via Unicode < unicode@unicode.org> wrote:
> An example UTS#18 gives for matching a literal cluster can be simplified
> to, in its notation:
>
> [c \q{ch}]
>
> This is interpreted as 'match against "ch" if possible, otherwise
> against "c". Thus the strings "ca" and "cha" would both match the
> expression
>
> [c \q{ch}]a
>
> while "chh" but not "ch" would match against
>
> [c \q{ch}]h
>

Right. We just independently discussed this today in the UTC meeting,
connected with the "properties of strings" discussion in the proposed
update.

[c \q{ch}]h should work like (ch|c)h. Note that the order matters in the
alternation -- so this works equivalently if longer strings are sorted
first.

> May I correctly argue instead that matching against literal clusters
> would be satisfied by instead supporting, for this example, the regular
> subexpression "(c|ch)" or the UnicodeSet expression "[c{ch}]"?
>

ICU UnicodeSet [c{ch}] is equivalent to UTS #18 [c\q{ch}]. ICU's UnicodeSet
syntax is simpler, the UTS #18 syntax is more backward-compatible.

Best regards,
markus
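
To illustrate the point about alternation order, here is a minimal sketch using Python's standard `re` module. It is only an approximation: Python does not implement the UTS #18 `\q{...}` syntax, and `re` is a backtracking engine, so this shows how the order of alternatives changes which branch is taken, not the exact UTS #18 matching semantics. The helper name `first_group` is made up for this example.

```python
import re

def first_group(pattern, text):
    """Return (whole match, text captured by group 1), or None if no match.

    Hypothetical helper for illustration; it anchors at the start of
    `text` via re.match and does not require the whole string to match.
    """
    m = re.match(pattern, text)
    return (m.group(0), m.group(1)) if m else None

# Longer string first: "ch" is tried before "c".
longest_first = r"(ch|c)h"
# Shorter string first: "c" is tried before "ch".
shortest_first = r"(c|ch)h"

# With "ch" listed first, the cluster consumes "ch" and the trailing "h"
# matches the third character.
print(first_group(longest_first, "chh"))

# With "c" listed first, the group takes only "c", the trailing "h" matches
# the second character, and the match stops after two characters.
print(first_group(shortest_first, "chh"))

# Both orderings accept "ca" and "cha" when followed by "a".
print(first_group(r"(ch|c)a", "ca"))
print(first_group(r"(ch|c)a", "cha"))
```

Sorting the longer string first is what makes the alternation pick "ch" when it is available, which is the behaviour described above for [c \q{ch}].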