Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

Tex Texin Thu, 27 Feb 2003 22:13:37 -0800

Kenneth Whistler wrote:
> Yes, it is true. All the standard *mandates* is what I quoted
> previously in this thread:
> 
> "C12a When a process interprets a code unit sequence which purports
>       to be in a Unicode character encoding form, it shall treat
>       ill-formed code unit sequences as an error condition, and
>       shall not interpret such sequences as characters."
> 
> > Is it ok then, if I detect an unpaired surrogate, mutter
> > "oops I have an error" and then drop that surrogate and continue processing
> > the file, resulting in a valid utf-8 file?
> 
> Hmm, I think you may be mixing the UTF-16 case with the UTF-8
> case, but...


Ken, thanks for the reply.
I thought that at some point this thread was discussing UTF-16 to UTF-8
conversion, which is where I was coming from. (I must have glommed some
threads, or even some lists, together.)

I certainly agree that reporting an error is the right design. However, there
is software out there that didn't anticipate that the conversion could
generate an error. With the advent of surrogates and the clarification of
how UTF-8 is to be generated for surrogates, this becomes an issue, and it can
be difficult to address when the upper layers aren't prepared for it. Anyway,
for some reason I thought that dropping the surrogate and carrying on was also
counter to the standard. Now I know it is just bad design.
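
A minimal sketch of a UTF-16 to UTF-8 converter that reports an unpaired
surrogate instead of dropping it might look like the following. It is an
illustration only: the names utf16_to_utf8 and UnpairedSurrogateError are
hypothetical, the input is assumed to be a list of 16-bit code units, and
raising an exception is just one possible way to surface the error condition.

```python
class UnpairedSurrogateError(ValueError):
    """Hypothetical error type for an ill-formed UTF-16 sequence."""


def utf16_to_utf8(units):
    """Convert a list of UTF-16 code units (ints) to UTF-8 bytes.

    Raises UnpairedSurrogateError rather than dropping or passing
    through a lone surrogate.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                i += 2
            else:
                raise UnpairedSurrogateError(
                    f"unpaired high surrogate U+{u:04X} at index {i}")
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            raise UnpairedSurrogateError(
                f"unpaired low surrogate U+{u:04X} at index {i}")
        else:
            cp = u
            i += 1
        # cp is now a Unicode scalar value, so encoding cannot fail.
        out += chr(cp).encode("utf-8")
    return bytes(out)
```

With this shape, an upper layer that was never designed for a failing
conversion at least gets an exception it can catch and log, rather than
output that has quietly lost data.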

tex


-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master                          http://www.i18nGuy.com
                         
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
