Re: Unicode Sets in 'Unicode Regular Expressions'

Richard Wordingham Tue, 27 May 2014 17:23:23 -0700

On Wed, 28 May 2014 00:56:40 +0200
Charlie Ruland ☘ <rul...@luckymail.com> wrote:


> So I take “Unicode set” to mean “set of Unicode characters” with
> their respective codepoints, whether decomposable or not.

The decomposability issue arises when trying to follow RL2.1
"Canonical Equivalence".  In a pattern such as "f\p{L}te",
\p{L} is not just a set of codepoints if the pattern is to be matched
by "fête" when processing NFD strings.  This is one reason I think Ken
is right when he says the ICU meaning is intended.  I believe I have a
coherent resolution of RL2.1, but I'm currently wrestling with the
other requirements that an implementation satisfying the spirit of
RL2.1 ought to address.
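
To make the problem concrete, here is a minimal sketch in Python. It is not the resolution mentioned above and not ICU's behaviour; the helper names (is_letter, match_codepoints, match_canonical) are hypothetical, and the match is anchored to the whole string for simplicity. It only shows that treating \p{L} as a plain set of codepoints fails on NFD "fête", and that one naive workaround is to compare in NFC. That workaround is an assumption for illustration and does not cover everything RL2.1 asks for.

```python
import unicodedata

def is_letter(ch):
    # True if the code point has General_Category L* (Lu, Ll, Lt, Lm, Lo).
    return unicodedata.category(ch).startswith("L")

def match_codepoints(text):
    # Whole-string match of f\p{L}te, with \p{L} read as a plain set of
    # code points: exactly one letter code point between "f" and "te".
    return (len(text) == 4
            and text[0] == "f"
            and is_letter(text[1])
            and text[2:] == "te")

def match_canonical(text):
    # Hypothetical workaround: normalise to NFC first, then match code
    # points. Only a sketch; it is not a full treatment of RL2.1.
    return match_codepoints(unicodedata.normalize("NFC", text))

nfc = unicodedata.normalize("NFC", "f\u00eate")  # f, U+00EA, t, e
nfd = unicodedata.normalize("NFD", nfc)          # f, e, U+0302, t, e

print(match_codepoints(nfc))  # the precomposed form matches
print(match_codepoints(nfd))  # the decomposed form does not
print(match_canonical(nfd))   # matches once canonical equivalence is applied
```

In the NFD string the circumflex is a separate combining mark (U+0302, category Mn), so no single letter codepoint sits between "f" and "te", which is why the codepoint-set reading of \p{L} falls short.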

Richard.
