Asmus Freytag wrote,
> A validator *should* look between the > and < in order to
> catch invalid entity references, esp. invalid NCRs.
>
> For UTF-8, it would ideally also check that no ill-formed,
> and therefore illegal, sequences are part of the UTF-8.

You've made a good point about invalid NCRs and named entities. But I
think it's up to the author to proofread the actual text in an
appropriate application. Is the HTML validator also going to be
expected to check grammar, spelling, and punctuation?

There is so much text on the web in so many different encodings.
Big-5, Shift-JIS, and similar encodings are fairly well standardised
and supported. Now, in addition to UTF-8, a web page might be in
UTF-16 or, eventually, perhaps even UTF-32. Plus, there's a plethora
of non-standard encodings in common use today.

An HTML validator should validate the mark-up, assuring an author that
(s)he hasn't done anything incredibly dumb, like having two </title>
tags appear consecutively. Really, this is all that we should expect
from an HTML validator. Extra features such as checking for invalid
UTF-8 sequences would probably be most welcome, but there are other
tools for doing this, which an author should already be using.

Best regards,

James Kass.
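
For illustration only, here is a rough Python sketch of the two checks raised in the quoted text: flagging suspicious NCRs and detecting ill-formed UTF-8. The function names invalid_ncrs and is_well_formed_utf8 are hypothetical, and the rule used for a "bad" NCR (a surrogate code point or a value above U+10FFFF) is an assumption made for this example, not the behaviour of any particular validator.

```python
import re

# Numeric character references such as &#233; (decimal) or &#xE9; (hex).
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def invalid_ncrs(text):
    """Return NCRs that cannot name a Unicode scalar value.

    Assumption for this sketch: an NCR is "invalid" if it refers to a
    surrogate code point or to a value beyond U+10FFFF.
    """
    bad = []
    for match in NCR_PATTERN.finditer(text):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        is_surrogate = 0xD800 <= code_point <= 0xDFFF
        if is_surrogate or code_point > 0x10FFFF:
            bad.append(match.group(0))
    return bad


def is_well_formed_utf8(data):
    """Return True if the bytes decode as strict UTF-8."""
    try:
        # Python's strict decoder rejects overlong forms, encoded
        # surrogates, and truncated sequences.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Something like invalid_ncrs("caf&#233; &#xD800;") would report only the surrogate reference, and is_well_formed_utf8 would reject an overlong sequence such as the bytes C0 AF. Neither check says anything about the mark-up itself, which is the separate job discussed above.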