When it comes to methods operating on
buffers there's always the tension between viewing the buffer as
text elements vs. as data elements. For some purposes, from error
detection to data cleanup, you need to be able to treat the buffer
as data elements. For many ot
2015-10-20 2:07 GMT+02:00 Richard Wordingham <richard.wording...@ntlworld.com>:
> Now, as we know, UTF-32 does not handle the full range of Unicode code
> points;
??? All valid UTFs handle the full range of valid Unicode code points. This
includes UTF-32 as well as UTF-16 and UTF-8 (and their v
On Mon, 19 Oct 2015 13:32:07 -0700
"Doug Ewell" wrote:
> Richard Wordingham wrote:
> > It was once the
> > case that basic Unicode support in regular expressions required a
> > regular expression engine to be able to search for specified lone
> > surrogates - a real show-stopper for an engin
On Mon, Oct 19, 2015 at 1:32 PM, Doug Ewell wrote:
> > ICU (but perhaps it's actually Java) seems to have a culture of
> > tolerating lone surrogates, and rules for handling lone surrogates are
> > strewn across the Unicode standards and annexes.
>
> I suspect you have an example.
I have exampl
2015-10-19 22:32 GMT+02:00 Doug Ewell:
> Philippe Verdy wrote:
>
> > No ! The "supplementary code points" (or "supplementary characters"
> > when they are assigned to characters) are represented in UTF-16 as two
> > **code units**, NOT as two "code points" (even if their binary values
> > are rela
Richard Wordingham wrote:
>> This discussion was originally about how to handle unpaired
>> surrogates, as if that were a normal use case.
>
> And the subject line was changed when the topic changed to
> traversing strings.
Granted. I've changed it again to reflect t
6 matches