Re: Clean and Unicode compliance

James Kass Sun, 16 Dec 2001 20:49:11 -0800


Martin Duerst wrote,


> As the person who implemented UTF-8 checking for http://validator.w3.org,
> I beg to disagree. In order to validate correctly, the validator has
> to make sure it correctly interprets the incoming byte sequence as
> a sequence of characters. For this, it has to know the character
> encoding. As an example, there are many files in iso-2022-jp or
> shift_jis that are perfectly valid as such, but will get rejected
> by some tools because they contain bytes that correspond to '<' in
> ASCII as part of a double-byte character.
> 

Excellent example.  It hadn't occurred to me that, in certain
encodings, the byte for the less-than sign (0x3C) can turn up
inside a multibyte character.
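
To make the point concrete, here is a minimal sketch in Python.  The
sample character and the variable names are my own choices for
illustration, not anything taken from the validator.  It encodes one
hiragana letter in iso-2022-jp and compares a byte-level search for
'<' with a search done after decoding.

```python
# Illustrative sketch only; the sample character is an assumption chosen
# because its JIS X 0208 code is 0x24 0x3C, and 0x3C is '<' in ASCII.
text = "\u305c"  # HIRAGANA LETTER ZE

# Encode with Python's built-in ISO-2022-JP codec.
raw = text.encode("iso2022_jp")

# A tool that scans raw bytes as if they were ASCII finds a '<' here.
naive_hit = b"<" in raw

# Decoding with the correct encoding first shows no '<' character at all.
decoded = raw.decode("iso2022_jp")
real_hit = "<" in decoded

print(naive_hit, real_hit)
```

A tool that takes the byte-level result at face value would report a
stray tag opener in a file that contains none.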

HTML validators need to be aware of the encoding used in the
file.  Based on your comments and other comments in this thread, 
I concede the point.  A validator should validate that the plain
text portion of an HTML file is properly encoded/well formed.
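
As a rough sketch of what such a check could look like (the function
name is hypothetical, and this is not how validator.w3.org is
implemented), a strict decode in Python rejects byte sequences that
are not well formed in the declared encoding:

```python
def is_well_formed(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly in the given encoding.

    Hypothetical helper for illustration. It relies on Python's
    strict error handling, which raises UnicodeDecodeError on any
    malformed sequence instead of substituting a replacement character.
    """
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Well-formed UTF-8 for U+00E9 passes; a truncated sequence and an
# overlong encoding of '/' are both rejected.
print(is_well_formed(b"\xc3\xa9", "utf-8"))
print(is_well_formed(b"\xc3", "utf-8"))
print(is_well_formed(b"\xc0\xaf", "utf-8"))
```

This only tests the encoding layer.  Checking the markup itself would
be a separate step performed on the decoded characters.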

Best regards,

James Kass.
