Frank Tang responded to Kent Karlsson's response:

> The problem I need to deal with is not GENERATING that UTF-8, but how to handle this DATA when my code receives it. For example, when I receive a 10K UTF-8 file which has 1000 lines of text, if one UTF-8 sequence in line 990 is ill-formed, should I fire the "error" for
> 1. the whole file (10K, 1000 lines),
> 2. all the lines after line 899,
> 3. line 990 itself,
> etc. etc.
>
> If there are other ways you can scope the ERROR, I could probably go on and on and tell you 10-20 other ways to scope it if I spent 20 more minutes.
>
> I do believe the error handling should be application specific.

Absolutely. Error handling is a matter of software design, and not something mandated in detail by the Unicode Standard.

If you write software which handles a GIF image, and there is a corrupted byte in the middle of a 118K GIF file, you don't go to the GIF specification itself, e.g.,

http://www.w3.org/Graphics/GIF/spec-gif87.txt

to tell your software what to do after it has processed the first 59K bytes (or whatever). The GIF specification just tells you what a well-formed GIF image is. Likewise, the Unicode Standard tells you what a well-formed UTF-8 byte sequence is. But it is the software designer who has to be smart about determining what his/her software will do when it encounters an error condition and finds itself dealing with a sequence which is ill-formed according to the specification of UTF-8 in the Unicode Standard.

--Ken
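To make the "application specific" point concrete, here is a minimal sketch of one possible policy: option 3 in Frank's list, scoping the error to the line that contains the ill-formed sequence. It assumes Python, a file read as raw bytes, and lines delimited by LF; the function name `decode_lines_scoped` is hypothetical and not part of any standard or library. Strict decoding detects the ill-formed line, which is then recorded and re-decoded with U+FFFD substitutions so the rest of the file is unaffected.

```python
from typing import List, Tuple


def decode_lines_scoped(data: bytes, newline: bytes = b"\n") -> Tuple[List[str], List[int]]:
    """Hypothetical helper: decode UTF-8 line by line so that an ill-formed
    sequence only affects the line it occurs in.

    Returns (lines, bad_line_numbers), with line numbers starting at 1.
    """
    lines: List[str] = []
    bad: List[int] = []
    for number, raw in enumerate(data.split(newline), start=1):
        try:
            # Strict decoding raises UnicodeDecodeError on ill-formed UTF-8.
            lines.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Policy choice: record the offending line and substitute U+FFFD
            # for the ill-formed bytes instead of rejecting the whole file.
            bad.append(number)
            lines.append(raw.decode("utf-8", errors="replace"))
    return lines, bad
```

Splitting on the LF byte before decoding is safe for UTF-8, because 0x0A never occurs inside a multi-byte sequence. Another application could just as reasonably stop at the first `UnicodeDecodeError` and reject the whole file (option 1); the UTF-8 definition only determines which byte sequences are ill-formed, not which of these policies to adopt.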