Re: Clean and Unicode compliance

James Kass Fri, 14 Dec 2001 12:47:44 -0800


Asmus Freytag wrote,


> A validator *should* look between the > and < in order to
> catch invalid entity references, esp. invalid NCRs.
> 
> For UTF-8, it would ideally also check that no ill-formed,
> and therefore illegal, sequences are part of the UTF-8.

You've made a good point about invalid NCRs or named entities.
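
To illustrate the kind of check being discussed, here is a rough Python
sketch of how a tool might flag invalid NCRs. The function name
find_invalid_ncrs and the particular set of rejected code points (NUL,
surrogates, anything beyond U+10FFFF) are assumptions made for this example,
not the behaviour of any existing validator.

```python
import re

# Matches decimal (&#233;) and hexadecimal (&#xE9;) numeric character references.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def find_invalid_ncrs(text):
    """Return (reference, reason) pairs for NCRs that cannot name a character.

    Hypothetical helper: the rejection rules below are an assumption for
    illustration, not a complete statement of what HTML permits.
    """
    problems = []
    for match in NCR_PATTERN.finditer(text):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point > 0x10FFFF:
            problems.append((match.group(0), "beyond U+10FFFF"))
        elif 0xD800 <= code_point <= 0xDFFF:
            problems.append((match.group(0), "surrogate code point"))
        elif code_point == 0:
            problems.append((match.group(0), "NUL"))
    return problems


if __name__ == "__main__":
    sample = "<p>&#233; &#xD800; &#1114112; &#x41;</p>"
    for reference, reason in find_invalid_ncrs(sample):
        print(reference, reason)
```

A real validator would also need to handle named entities and references
that are missing the closing semicolon, which this sketch ignores.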

But I think it's up to the author to proofread the actual text
in an appropriate application.

Is the HTML validator also going to be expected to check
grammar, spelling, and use of punctuation?

There is so much text on the web using many different
encoding methods.  Big-5, Shift-JIS, and similar encodings
are fairly well standardised and supported.  Now, in addition
to UTF-8, a web page might be in UTF-16 or perhaps even 
UTF-32, eventually.  Plus, there's a plethora of non-standard 
encodings in common use today.  An HTML validator should
validate the mark-up, assuring an author that (s)he hasn't
done anything incredibly dumb like having two </title>
tags appearing consecutively.  Really, this is all that we should
expect from an HTML validator.  Extra features such as 
checking for invalid UTF-8 sequences would probably be most 
welcome, but there are other tools for doing this which an 
author should already be using.
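
As an example of such a separate check, the following Python sketch reports
where a byte string stops being well-formed UTF-8. It relies on Python's
built-in strict UTF-8 decoder, which rejects overlong forms, encoded
surrogates, and truncated sequences. The function name
first_ill_formed_utf8 is made up for this illustration.

```python
def first_ill_formed_utf8(data: bytes):
    """Return the byte offset of the first ill-formed UTF-8 sequence, or None.

    Hypothetical helper: it simply delegates to Python's strict decoder
    rather than walking the bytes by hand.
    """
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__":
    print(first_ill_formed_utf8("café".encode("utf-8")))
    print(first_ill_formed_utf8(b"ab\xc0\xafcd"))
```

An author could run something like this over a file before handing it to
a mark-up validator, which keeps the two concerns separate.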

Best regards,

James Kass.
