Ken,
Hmm, is that true? Is it ok then, if I detect an unpaired surrogate, mutter
"oops I have an error" and then drop that surrogate and continue processing
the file, resulting in a valid utf-8 file?

I thought for some reason this was prohibited, but if the standard does not
prescribe error handling, then this seems legit.
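
To make the question concrete, here is a rough sketch of what I have in mind. It assumes the input arrives as a list of UTF-16 code units. The function name and the error counter are made up for illustration; nothing here comes from the standard. Dropping the stray surrogate is only one possible policy. Substituting U+FFFD or aborting would be equally permissible design choices.

```python
def drop_unpaired_surrogates(units):
    """Convert UTF-16 code units to UTF-8 bytes, dropping unpaired surrogates.

    Hypothetical helper: returns (utf8_bytes, number_of_dropped_code_units).
    """
    out = []
    errors = 0
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: only valid if a low surrogate follows directly.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                out.append(chr(cp))
                i += 2
                continue
            errors += 1  # "oops I have an error": drop it and carry on
        elif 0xDC00 <= u <= 0xDFFF:
            errors += 1  # low surrogate with no preceding high surrogate
        else:
            out.append(chr(u))
        i += 1
    # The string now holds only scalar values, so encoding cannot fail.
    return "".join(out).encode("utf-8"), errors


if __name__ == "__main__":
    data, dropped = drop_unpaired_surrogates(
        [0x0041, 0xD800, 0x0042, 0xD83D, 0xDE00, 0xDC00]
    )
    print(data, dropped)
```

The output is well-formed UTF-8 by construction, and the count lets the caller decide whether to warn, log, or reject the file.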

tex


Kenneth Whistler wrote:
> Absolutely. Error handling is a matter of software design, and not
> something mandated in detail by the Unicode Standard.
> 
> If you write software which handles a GIF image, and there is
> a corrupted byte in the middle of a 118K GIF file, you don't go
> to the GIF specification itself, e.g.,
> http://www.w3.org/Graphics/GIF/spec-gif87.txt
> to tell your software what to do after it has processed the first
> 59K bytes (or whatever). The GIF specification just tells you
> what a well-formed GIF image is.
> 
> Likewise, the Unicode Standard tells you what a well-formed
> UTF-8 byte sequence is. But it is the software designer who has
> to be smart about determining what his/her software will do when
> it encounters an error condition and finds itself dealing
> with a sequence which is ill-formed according to the specification
> of UTF-8 in the Unicode Standard.
> 
> --Ken

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master                          http://www.i18nGuy.com
                         
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
