UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

Kenneth Whistler Thu, 27 Feb 2003 13:38:40 -0800

Frank Tang responded to Kent Karlsson's response:

> The problem I need to deal with is not GENERATING that UTF-8, but how to 
> handle this DATA when my code receives it. For example, when I receive a 
> 10K UTF-8 file which has 1000 lines of text, if one UTF-8 
> sequence in line 990 is ill-formed, should I fire the "error" for
> 1. the whole file (10K, 1000 lines),
> 2. all the lines after line 899,
> 3. line 990 itself,


etc. etc.

> 
> There are other ways to scope the ERROR; I could probably go on and on 
> and tell you 10-20 other ways to scope it if I spent 20 more 
> minutes.
> 
> I do believe the error handling should be application specific.

Absolutely. Error handling is a matter of software design, and not
something mandated in detail by the Unicode Standard.
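To make the point concrete, here is a minimal Python sketch (not anything the Unicode Standard prescribes; the function names are hypothetical) of two of the scoping policies from Frank's list. Both rely on Python's built-in strict UTF-8 decoder to detect ill-formed data; they differ only in how far the error is allowed to reach. Splitting on the byte 0x0A before decoding is safe because every byte of a multi-byte UTF-8 sequence is 0x80 or above.

```python
from typing import List


def bad_line_numbers(data: bytes) -> List[int]:
    """Policy "scope the error to the line": return the 1-based numbers
    of lines that contain at least one ill-formed UTF-8 sequence."""
    bad = []
    # Splitting on LF (0x0A) before decoding cannot cut a valid sequence in
    # half: in UTF-8, 0x0A only ever appears as the encoding of U+000A.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")  # strict mode raises on any ill-formed sequence
        except UnicodeDecodeError:
            bad.append(number)
    return bad


def decode_whole_file(data: bytes) -> str:
    """Policy "scope the error to the file": one ill-formed sequence anywhere
    rejects the whole input (UnicodeDecodeError.start gives the byte offset)."""
    return data.decode("utf-8")


if __name__ == "__main__":
    sample = b"first line\n" + b"bad \xc0\xaf here\n" + b"third line\n"
    print(bad_line_numbers(sample))  # [2]
    try:
        decode_whole_file(sample)
    except UnicodeDecodeError as err:
        print("rejected at byte", err.start)  # rejected at byte 15
```

A third common policy, substituting U+FFFD for each ill-formed sequence and carrying on, is simply `data.decode("utf-8", errors="replace")`. Which of these is right depends entirely on the application.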

If you write software which handles a GIF image, and there is
a corrupted byte in the middle of a 118K GIF file, you don't go
to the GIF specification itself (e.g.,
http://www.w3.org/Graphics/GIF/spec-gif87.txt)
to tell your software what to do after it has processed the first
59K bytes (or whatever). The GIF specification just tells you
what a well-formed GIF image is.

Likewise, the Unicode Standard tells you what a well-formed
UTF-8 byte sequence is. But it is the software designer who has
to be smart about determining what his/her software will do when
it encounters an error condition and finds itself dealing
with a sequence which is ill-formed according to the specification
of UTF-8 in the Unicode Standard.
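The division of labour above shows up directly in code. The well-formedness half can be written straight from the table of well-formed UTF-8 byte sequences in Chapter 3 of the Unicode Standard (Table 3-7 in recent editions; the numbering may differ in 4.0). Below is a minimal Python sketch; the name `first_ill_formed_offset` is hypothetical, and the function deliberately only reports where the first ill-formed sequence begins. Deciding what to do with that offset is left to the caller.

```python
from typing import Optional

# Each row: (allowed lead-byte range, allowed ranges for each trailing byte),
# transcribed from the "Well-Formed UTF-8 Byte Sequences" table.
_ROWS = [
    ((0x00, 0x7F), []),
    ((0xC2, 0xDF), [(0x80, 0xBF)]),
    ((0xE0, 0xE0), [(0xA0, 0xBF), (0x80, 0xBF)]),
    ((0xE1, 0xEC), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xED, 0xED), [(0x80, 0x9F), (0x80, 0xBF)]),
    ((0xEE, 0xEF), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF0, 0xF0), [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF1, 0xF3), [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF4, 0xF4), [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),
]


def first_ill_formed_offset(data: bytes) -> Optional[int]:
    """Return the byte offset where the first ill-formed sequence starts,
    or None if all of `data` is well-formed UTF-8. Detection only."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        for (lo, hi), trail in _ROWS:
            if lo <= lead <= hi:
                break
        else:
            # No row matches: C0, C1, F5..FF, or a stray continuation byte.
            return i
        # Every trailing byte must be present and inside its allowed range.
        for k, (tlo, thi) in enumerate(trail, start=1):
            if i + k >= n or not (tlo <= data[i + k] <= thi):
                return i  # truncated sequence or bad continuation byte
        i += 1 + len(trail)
    return None


if __name__ == "__main__":
    print(first_ill_formed_offset("h\u00e9llo \u20ac".encode("utf-8")))  # None
    print(first_ill_formed_offset(b"ab\xc0\xaf"))  # 2 (overlong form of "/")
    print(first_ill_formed_offset(b"\xed\xa0\x80"))  # 0 (encoded surrogate)
```

The narrowed second-byte ranges (A0..BF after E0, 80..9F after ED, 90..BF after F0, 80..8F after F4) are what exclude overlong forms, encoded surrogates, and values above U+10FFFF; the rest of the table is the general lead-byte/continuation-byte pattern.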

--Ken
