Re: UTF-8S (was: Re: ISO vs Unicode UTF-8)

Mark Davis Tue, 05 Jun 2001 08:38:20 -0700

[Sorry -- hit "Send" again too soon]

The 6-byte sequence is either one code point (lenient parser) or an error
(strict parser). It is never two.
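
As a rough illustration of the two behaviours, here is a minimal Python sketch. It assumes the 6-byte form means a high and a low surrogate, each written as a 3-byte UTF-8 sequence. The function name decode_six_byte and its lenient flag are made up for this example and do not come from any library or from the samples page.

```python
def decode_six_byte(seq: bytes, lenient: bool) -> int:
    """Decode a 6-byte surrogate-pair sequence to ONE code point, or raise.

    Hypothetical helper for illustration only.
    """
    if len(seq) != 6:
        raise ValueError("expected exactly 6 bytes")

    def three_byte(b: bytes) -> int:
        # Bit layout of a 3-byte form: 1110xxxx 10xxxxxx 10xxxxxx
        if (b[0] & 0xF0) != 0xE0 or (b[1] & 0xC0) != 0x80 or (b[2] & 0xC0) != 0x80:
            raise ValueError("not a 3-byte UTF-8 form")
        return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)

    hi = three_byte(seq[:3])
    lo = three_byte(seq[3:])
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a high surrogate followed by a low surrogate")

    if not lenient:
        # Strict parser: the whole sequence is an error.
        raise ValueError("strict parser: surrogate code points are not valid UTF-8")

    # Lenient parser: combine the pair into a single supplementary code point.
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


# U+10000 written as the surrogates D800 DC00, each in 3-byte form.
six = bytes([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])
print(hex(decode_six_byte(six, lenient=True)))
```

Either way the caller gets a single value or an exception. There is no path that hands back the two surrogates as separate code points.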

I put samples on:

http://www.macchiato.com/utc/samples_of_utf8.htm

Mark

----- Original Message -----
From: "Marco Cimarosti" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "'Mark Davis'" <[EMAIL PROTECTED]>
Sent: Tuesday, June 05, 2001 05:03
Subject: RE: UTF-8S (was: Re: ISO vs Unicode UTF-8)


> Mark Davis wrote:
> > - I am well aware that one can accept 6-byte supplementary characters on
> > input in UTF-8. (Did you really think I wasn't?)
>
> (O, no, I know you knew!)
>
> But how should this 6-byte sequence be interpreted by a standard UTF-8
> decoder? Does it become one or two code points?
>
> _ Marco
>
>
