The HTML validation service from W3C at:
http://validator.w3.org
has been commended on this list and appears to be sophisticated
and fast.
Tests run on non-BMP text show no problem for Plane One using
UTF-8 encoding but error messages are generated when these
characters are referenced as NCRs.
At 3:07 AM -0800 12/16/01, James Kass wrote:
Tests run on non-BMP text show no problem for Plane One using
UTF-8 encoding but error messages are generated when these
characters are referenced as NCRs.
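For readers following along, a minimal Python sketch of the two ways a Plane One character can appear in a page (U+1D11E MUSICAL SYMBOL G CLEF is an assumed example; the messages above do not name specific characters):

```python
# A supplementary-plane (Plane One) character can be written either as
# raw UTF-8 bytes or as a numeric character reference (NCR).
# U+1D11E is an illustrative choice, not one from the original report.
ch = "\U0001D11E"                    # MUSICAL SYMBOL G CLEF
utf8_bytes = ch.encode("utf-8")      # four-byte UTF-8 sequence
ncr = "&#x{:X};".format(ord(ch))     # hexadecimal NCR for the same character

print(utf8_bytes)   # b'\xf0\x9d\x84\x9e'
print(ncr)          # &#x1D11E;
```

Both forms denote the same character; the reported bug was that the validator accepted the first and rejected the second.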
I suspect there's a lot of random mistakes like this waiting to be
discovered. I recently
Elliotte Rusty Harold wrote,
I suspect a lot of our tools haven't been thoroughly tested with
Plane-1 and are likely to have these sorts of bugs in them.
Since Plane One is still fairly new, this is understandable.
I'm also having trouble getting Plane Zero pages to validate.
Spent
Hello James (and everybody else),
Can you please send comments and bug reports on the validator to
[EMAIL PROTECTED]? Sending bug reports to the right address
seriously increases the chance that they get fixed.
Regards, Martin.
At 14:46 01/12/16 -0800, James Kass wrote:
Elliotte Rusty Harold
At 07:16 01/12/14 -0800, James Kass wrote:
Having an HTML validator, like Tidy.exe, which generates errors
or warnings every time it encounters a UTF-8 sequence is
unnerving. It's especially irritating when the validator
automatically converts each byte sequence making up a single UTF-8
character into two characters.
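The corruption James describes sounds like the classic one-byte-per-character misreading; a hedged Python sketch of that assumed mechanism:

```python
# Assumed mechanism (not confirmed by the message): if a tool treats each
# byte of a multi-byte UTF-8 sequence as a separate Latin-1 character,
# one character becomes two.
ch = "é"                          # U+00E9, a single character
utf8 = ch.encode("utf-8")         # two bytes: 0xC3 0xA9
mangled = utf8.decode("latin-1")  # each byte misread as its own character
print(mangled)                    # Ã© -- two characters instead of one
print(len(mangled))               # 2
```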
As the person who implemented UTF-8 checking for http://validator.w3.org,
I beg to disagree. In order to validate correctly, the validator has
to make sure it correctly interprets the incoming byte sequence as
a sequence of characters. For this, it has to know the character
encoding. As an
Martin Duerst wrote,
This is really bad. Have you made sure you have the right
options? Tidy has a lot of options.
It sure does. One of which is -utf8. Using this option
(tidy -utf8 -f output.txt -m input.htm)
works like a charm, directing the errors and warnings for
an HTML file called input.htm to a text file called output.txt.
Martin Duerst wrote,
As the person who implemented UTF-8 checking for http://validator.w3.org,
I beg to disagree. In order to validate correctly, the validator has
to make sure it correctly interprets the incoming byte sequence as
a sequence of characters. For this, it has to know the character
encoding.
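A minimal Python sketch of the first step Martin describes: decode the incoming bytes strictly, under the declared encoding, before validating anything else (the sample bytes are illustrative, not from the validator's code):

```python
# Before validating markup, interpret the incoming byte stream as a
# sequence of characters using the declared encoding. Python's UTF-8
# decoder is strict by default and rejects ill-formed sequences outright.
data = b"valid \xf0\x9d\x84\x9e then broken \xc0\xaf"  # 0xC0 0xAF: overlong, ill-formed
try:
    text = data.decode("utf-8")
    print("well-formed UTF-8:", text)
except UnicodeDecodeError as e:
    print("ill-formed UTF-8 at byte offset", e.start)
```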
Hello,
Does the Clean development team plan to make Concurrent Clean partially or fully Unicode compliant in their future releases? This is crucial for those of us who use non-European writing systems, and more generally for those who develop truly global applications.
Thanks in advance.
Welé Negga wrote,
Does the Clean development team plan to make Concurrent
Clean partially or fully Unicode compliant in their future
releases? This is crucial for those of us who use non-European
writing systems, and more generally for those who develop
truly global applications.
It is
W3C's HTML validation service seems to have no such problems.
We've been using it to validate all the files on the unicode
site regularly.
A validator *should* look between the <body> and </body> tags in order to
catch invalid entity references, esp. invalid NCRs.
For UTF-8, it would ideally also check that no
Asmus Freytag wrote,
A validator *should* look between the <body> and </body> tags in order to
catch invalid entity references, esp. invalid NCRs.
For UTF-8, it would ideally also check that no ill-formed,
and therefore illegal, sequences are part of the UTF-8.
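The NCR check Asmus describes could look something like this hedged Python sketch (the regex and the rejection policy here are illustrative; they are not the W3C validator's actual implementation):

```python
import re

# Scan markup for numeric character references and flag ones that
# designate code points no character can have: surrogates
# (U+D800..U+DFFF) and values beyond U+10FFFF.
NCR = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def invalid_ncrs(markup):
    bad = []
    for m in NCR.finditer(markup):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            bad.append(m.group(0))
    return bad

print(invalid_ncrs("ok: &#x1D11E; bad: &#xD800; worse: &#x110000;"))
# ['&#xD800;', '&#x110000;']
```

A fuller check would also reject noncharacters and the control-code ranges HTML forbids, but the surrogate and out-of-range cases above are the unambiguous ones.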
You've made a good point about invalid NCRs
James,
NCRs *are* markup. And validating that the encoding matches
the declaration (e.g. UTF-8 is not ill-formed) has nothing
whatsoever to do with content, but all with verifying that
the file conforms to the HTML specification.
All this is completely different from spelling and grammar
Asmus Freytag wrote,
NCRs *are* markup.
Whether they are called mark-up or macros, they are
certainly part of HTML and I was not disagreeing with you
that they should be checked by the validator.
And validating that the encoding matches
the declaration (e.g. UTF-8 is not ill-formed)