Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

Asmus Freytag Sun, 02 Mar 2003 21:33:39 -0800

At 07:21 AM 3/2/03 -0800, Mark Davis wrote:

>    "C12a When a process interprets a code unit sequence which
>     purports to be in a Unicode character encoding form, it
>     shall treat ill-formed code unit sequences as an error
>     condition, and shall not interpret such sequences as
>     characters."

Can we agree or disagree on whether an API that returns an error code, but also fills an output buffer with a simplistic conversion of the erroneous sequence, is or is not conformant?

To me it seems that by setting an error flag in the return code, the API has signalled that the user should not treat the output as containing correct Unicode.

Such an API design (at a low enough level) might strike the right balance between usability in many different environments and satisfying the formal requirement.
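To make the question concrete, here is a minimal sketch of the kind of API I have in mind, written in Python for brevity. The names convert_utf8, OK and ERR_ILL_FORMED are hypothetical, and using U+FFFD as the "simplistic conversion" is only one assumption among several possible ones. The point is that the caller gets both a status and a buffer, and the status says whether the buffer can be trusted as correct Unicode.

```python
from typing import Tuple

# Hypothetical status codes for this sketch.
OK = 0
ERR_ILL_FORMED = 1


def convert_utf8(data: bytes) -> Tuple[int, str]:
    """Return (status, text) for a byte sequence purporting to be UTF-8.

    On well-formed input the status is OK and text is the decoded string.
    On ill-formed input the status is ERR_ILL_FORMED and text holds a
    best-effort conversion in which each ill-formed subsequence has been
    replaced by U+FFFD. The caller is expected to check the status before
    treating the text as correct Unicode.
    """
    try:
        return OK, data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: substitution with U+FFFD stands in for the
        # "simplistic conversion" discussed above.
        return ERR_ILL_FORMED, data.decode("utf-8", errors="replace")


status, text = convert_utf8(b"a\xffb")
if status != OK:
    # The error condition has been signalled; the buffer is advisory only.
    print("ill-formed input, best-effort output:", repr(text))
```

Whether the best-effort buffer counts as "interpreting such sequences as characters" is exactly the point on which C12a would need to be read one way or the other.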

The ideal case is one where the converter stops in a restartable configuration, allowing the client to implement (or ask for) a variety of error-recovery options. However, such an interface requires a lot of thought and may be difficult to implement for some language/platform/library environments. Further, it may be unnecessarily difficult to use for at least some conceivable clients.
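A rough sketch of the restartable design, again in Python and again with hypothetical names (decode_until_error, decode_with_policy): the converter stops at the first ill-formed sequence and reports where it stopped and where conversion could resume, leaving the recovery policy to the client. It relies on the start and end attributes of Python's UnicodeDecodeError to locate the ill-formed bytes; a real converter would track this state itself.

```python
from typing import Callable, Optional, Tuple


def decode_until_error(data: bytes, start: int = 0) -> Tuple[str, int, Optional[int]]:
    """Decode data[start:] up to the first ill-formed sequence.

    Returns (text, stop, resume):
      text   - the well-formed portion decoded so far
      stop   - offset where decoding stopped (len(data) if no error)
      resume - offset just past the ill-formed bytes, or None if no error
    """
    try:
        return data[start:].decode("utf-8"), len(data), None
    except UnicodeDecodeError as exc:
        stop = start + exc.start
        resume = start + exc.end
        # Everything before the error is known to be well-formed.
        return data[start:stop].decode("utf-8"), stop, resume


def decode_with_policy(data: bytes, on_error: Callable[[bytes], str]) -> str:
    """Drive the restartable converter; on_error chooses the recovery."""
    out = []
    pos = 0
    while pos < len(data):
        text, stop, resume = decode_until_error(data, pos)
        out.append(text)
        if resume is None:
            break
        # The client decides what to do with the ill-formed bytes.
        out.append(on_error(data[stop:resume]))
        pos = resume
    return "".join(out)


# Three possible client policies: drop, substitute, or show as hex.
dropped = decode_with_policy(b"ab\xffcd", lambda bad: "")
replaced = decode_with_policy(b"ab\xffcd", lambda bad: "\ufffd")
escaped = decode_with_policy(
    b"ab\xffcd", lambda bad: "".join("<%02X>" % b for b in bad)
)
```

Even in this small form the client has to write a loop and carry offsets around, which illustrates why such an interface may be more than some clients want.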

A./
