[Sorry -- hit "Send" again too soon]

It is either one code point (lenient parser) or an error (strict parser). It is never two.

I put samples on:
http://www.macchiato.com/utc/samples_of_utf8.htm

Mark

----- Original Message -----
From: "Marco Cimarosti" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "'Mark Davis'" <[EMAIL PROTECTED]>
Sent: Tuesday, June 05, 2001 05:03
Subject: RE: UTF-8S (was: Re: ISO vs Unicode UTF-8)

> Mark Davis wrote:
> > - I am well aware that one can accept 6-byte supplementary characters on
> > input in UTF-8. (Did you really think I wasn't?)
>
> (O, no, I know you knew!)
>
> But how should this 6-byte sequence be interpreted by a standard UTF-8
> decoder? Does it become one or two code points?
>
> _ Marco
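
The rule stated at the top of this message (one code point from a lenient parser, an error from a strict one, never two) can be sketched roughly as follows. This is only an illustration under assumptions: the function names `decode_three_byte` and `decode_six_byte` and the `strict` flag are hypothetical and do not come from the thread or from any particular decoder. The sketch handles just the 6-byte form, that is, a high and a low surrogate each written as a 3-byte sequence.

```python
def decode_three_byte(b: bytes) -> int:
    """Decode one 3-byte sequence of the form 1110xxxx 10xxxxxx 10xxxxxx."""
    if (len(b) != 3 or (b[0] & 0xF0) != 0xE0
            or (b[1] & 0xC0) != 0x80 or (b[2] & 0xC0) != 0x80):
        raise ValueError("not a 3-byte sequence")
    return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)


def decode_six_byte(data: bytes, strict: bool) -> int:
    """Interpret a 6-byte supplementary sequence (hypothetical helper).

    strict=True  -> always an error for a surrogate pair
    strict=False -> exactly one supplementary code point
    Two separate code points are never returned.
    """
    if len(data) != 6:
        raise ValueError("expected exactly 6 bytes")
    high = decode_three_byte(data[:3])
    low = decode_three_byte(data[3:])
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    if strict:
        raise ValueError("surrogates encoded as UTF-8 are rejected")
    # Combine the pair into a single code point, as UTF-16 would.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    # D800 DC00, each written as a 3-byte sequence
    sample = b"\xED\xA0\x80\xED\xB0\x80"
    print(hex(decode_six_byte(sample, strict=False)))
```

For comparison, Python 3's built-in decoder takes the strict side: `sample.decode("utf-8")` raises `UnicodeDecodeError` for these bytes.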