Martin Duerst wrote,
> As the person who implemented UTF-8 checking for http://validator.w3.org,
> I beg to disagree. In order to validate correctly, the validator has
> to make sure it correctly interprets the incoming byte sequence as
> a sequence of characters. For this, it has to know the character
> encoding. As an example, there are many files in iso-2022-jp or
> shift_jis that are perfectly valid as such, but will get rejected
> by some tools because they contain bytes that correspond to '<' in
> ASCII as part of a double-byte character.
>

Excellent example. It hadn't occurred to me that some encoding methods use the less-than bracket byte inside multi-byte characters. HTML validators need to be aware of the encoding used in the file.

Based on your comments and other comments in this thread, I concede the point. A validator should check that the plain text portion of an HTML file is properly encoded/well formed.

Best regards,
James Kass.
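
For anyone following the thread, here is a minimal sketch of the iso-2022-jp case described above. It assumes Python's built-in `iso2022_jp` codec, and the sample character (hiragana U+305C) is my own choice for illustration, not one taken from the discussion. A byte-level scan finds 0x3C in the encoded data, while the decoded text contains no '<' at all.

```python
# Hypothetical sample: hiragana U+305C, chosen only for illustration.
sample = "\u305c"

# Encode as ISO-2022-JP; the double-byte form of this character ends in 0x3C.
raw = sample.encode("iso2022_jp")

# A naive scan of the raw bytes sees the ASCII code for '<' ...
naive_hit = b"<" in raw

# ... but once the bytes are decoded with the right encoding,
# there is no '<' character in the text.
decoded = raw.decode("iso2022_jp")
real_hit = "<" in decoded

print(raw, naive_hit, real_hit)
```

A tool that looks for markup before decoding would treat that byte as the start of a tag, which is the kind of false rejection described in the quoted message.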

