Re: Unpaired surrogates

2015-10-19 Thread Richard Wordingham
On Mon, 19 Oct 2015 13:32:07 -0700 "Doug Ewell" wrote: > Richard Wordingham wrote: > > It was once the > > case that basic Unicode support in regular expressions required a > > regular expression engine to be able to search for specified lone > > surrogates - a real show-stopper for an engin

Re: Unpaired surrogates (was: Re: Why Work at Encoding Level?)

2015-10-19 Thread Markus Scherer
On Mon, Oct 19, 2015 at 1:32 PM, Doug Ewell wrote: > > ICU (but perhaps it's actually Java) seems to have a culture of > > tolerating lone surrogates, and rules for handling lone surrogates are > > strewn across the Unicode standards and annexes. > > I suspect you have an example. I have exampl

Re: Unpaired surrogates (was: Re: Why Work at Encoding Level?)

2015-10-19 Thread Philippe Verdy
2015-10-19 22:32 GMT+02:00 Doug Ewell : > Philippe Verdy wrote: > > > No! The "supplementary code points" (or "supplementary characters" > > when they are assigned to characters) are represented in UTF-16 as two > > **code units**, NOT as two "code points" (even if their binary values > > are rela
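
To make the code point / code unit distinction above concrete, here is a minimal sketch in Python of how one supplementary code point becomes two UTF-16 code units. The function name `to_utf16_code_units` is hypothetical and chosen only for illustration; the arithmetic follows the standard UTF-16 encoding form.

```python
from typing import List


def to_utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units for one Unicode scalar value.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogate code points are not scalar values and have no
        # well-formed UTF-16 representation of their own.
        raise ValueError("surrogate code point is not a scalar value")
    if code_point < 0x10000:
        # BMP code point: a single 16-bit code unit.
        return [code_point]
    # Supplementary code point: split the 20-bit offset into two halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # leading (high) surrogate code unit
    low = 0xDC00 + (offset & 0x3FF)   # trailing (low) surrogate code unit
    return [high, low]


if __name__ == "__main__":
    units = to_utf16_code_units(0x1F600)
    print([hex(u) for u in units])
```

The result is one code point encoded as two code units; neither 16-bit value is itself an encoded character in well-formed UTF-16.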

Re: Why Work at Encoding Level?

2015-10-19 Thread Richard Wordingham
On Mon, 19 Oct 2015 21:35:16 +0200 Philippe Verdy wrote: > 2015-10-19 20:53 GMT+02:00 Richard Wordingham < > richard.wording...@ntlworld.com>: > > The word > > 'codepoint' is even worse, as a supplementary plane codepoint is > > represented by two BMP codepoints. > No! The "supplementary code

Unpaired surrogates (was: Re: Why Work at Encoding Level?)

2015-10-19 Thread Doug Ewell
Richard Wordingham wrote: >> This discussion was originally about how to handle unpaired >> surrogates, as if that were a normal use case. > > And the subject line was changed when the topic changed to > traversing strings. Granted. I've changed it again to reflect this specific issue. > How abo

Re: Why Work at Encoding Level?

2015-10-19 Thread Philippe Verdy
2015-10-19 20:53 GMT+02:00 Richard Wordingham < richard.wording...@ntlworld.com>: > On Mon, 19 Oct 2015 10:07:31 -0700 > "Doug Ewell" wrote: > > > This discussion was originally about how to handle unpaired > > surrogates, as if that were a normal use case. > > And the subject line was changed wh

Re: Why Work at Encoding Level?

2015-10-19 Thread Richard Wordingham
On Mon, 19 Oct 2015 10:07:31 -0700 "Doug Ewell" wrote: > This discussion was originally about how to handle unpaired > surrogates, as if that were a normal use case. And the subject line was changed when the topic changed to traversing strings. > Regardless of what encoding model is used to han

Re: Why Work at Encoding Level?

2015-10-19 Thread Doug Ewell
This discussion was originally about how to handle unpaired surrogates, as if that were a normal use case. Regardless of what encoding model is used to handle characters under the hood, and regardless of how the Delete key should work with actual characters or clusters, there is never any excuse f
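
For readers unfamiliar with the term, an unpaired (lone) surrogate is a UTF-16 code unit in the range 0xD800..0xDFFF that is not part of a well-formed high/low pair. The following is a rough sketch, not taken from any message in the thread, of how such code units might be located in a sequence of 16-bit values; the function name `find_unpaired_surrogates` is an assumption made for illustration.

```python
from typing import List, Sequence


def find_unpaired_surrogates(units: Sequence[int]) -> List[int]:
    """Return the indexes of lone surrogates in a UTF-16 code unit sequence.

    Hypothetical helper for illustration only.
    """
    lone = []
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: well-formed only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # skip the complete pair
                continue
            lone.append(i)
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            lone.append(i)
        i += 1
    return lone


if __name__ == "__main__":
    sample = [0x41, 0xD83D, 0xDE00, 0xD800, 0x42, 0xDC00]
    print(find_unpaired_surrogates(sample))
```

What a program should do once it finds such a code unit (reject, replace, or pass through) is exactly the question the thread debates, so the sketch only reports positions.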