Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

Mark Davis Mon, 03 Mar 2003 13:39:05 -0800

> anything into the output buffer, even malformed Unicode, and still be
> conformant.

If your converter purports to produce any one of the Unicode encoding forms,
then it cannot conformantly produce malformed Unicode as a result.

If, of course, it does not purport to do that, it can do anything it wants
to.
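
A minimal Python sketch of the two conformant options discussed in this
thread: (1) substitute and report that something was substituted, or (2) stop
at the first error and hand the client enough information to decide what to
do. The function names are hypothetical and only for illustration; the sketch
relies on Python's built-in UTF-8 codec, not on any converter named in this
thread. In both cases the returned text is well-formed.

```python
from typing import Optional, Tuple


def decode_substituting(data: bytes) -> Tuple[str, bool]:
    """Option 1 (hypothetical helper): substitute U+FFFD for malformed
    input and return a flag saying whether any substitution happened."""
    try:
        return data.decode("utf-8"), False
    except UnicodeDecodeError:
        # errors="replace" emits U+FFFD, so the result is still valid Unicode.
        return data.decode("utf-8", errors="replace"), True


def decode_stopping(data: bytes) -> Tuple[str, Optional[int]]:
    """Option 2 (hypothetical helper): stop at the first malformed byte and
    return the well-formed prefix plus the byte offset of the error
    (None if the whole input was well-formed)."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first offending byte.
        return data[:exc.start].decode("utf-8"), exc.start


if __name__ == "__main__":
    sample = b"ab\xffcd"  # 0xFF can never appear in well-formed UTF-8
    print(decode_substituting(sample))
    print(decode_stopping(sample))
```

Neither function ever places malformed Unicode in its result; the error is
reported out of band, through the flag or the offset.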

Mark
________
[EMAIL PROTECTED]
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Asmus Freytag" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>; "Kent Karlsson"
<[EMAIL PROTECTED]>; "'Michael (michka) Kaplan'" <[EMAIL PROTECTED]>
Cc: "'Yung-Fong Tang'" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, March 03, 2003 12:17
Subject: Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for
review)


> At 11:52 AM 3/3/03 -0800, Mark Davis wrote:
> >Perhaps I wasn't clear; I agree with you on that.
> >
> >1) It is conformant to skip or substitute text, with just a code at the end
> >indicating that something of that sort was done.
>
> It's a subtle point, but can be put into your formulation:
>
> What I was after is where the "substitution" itself isn't legal Unicode,
> i.e. an unpaired surrogate in UTF-32. My take is that, formally speaking,
> as long as there's an indication of an error condition, I'm free to put
> anything into the output buffer, even malformed Unicode, and still be
> conformant.
>
> >2) Or, if someone wants more flexibility, to stop at possible errors, and
> >give the client of the API information so that they can do more complex
> >processing.
> >
> >Mark
>
>
>
