Re: [whatwg] Thesis draft about HTML5 conformance checking

olivier Thereaux Sun, 11 Mar 2007 19:27:19 -0800


On Mar 11, 2007, at 02:15 , Henri Sivonen wrote:

The draft of my master's thesis is available for commenting at:
http://hsivonen.iki.fi/thesis/

Henri, congratulations on your work on the HTML conformance checker and on the thesis. It has been a truly informative and enlightening read, especially the parts where you elaborate on the (im)possibility of using only schemas to describe conformance to the HTML5 specs. This is a question that has been bothering me for a long time, especially as there is only one (as of today) production-ready conformance checking tool not based on some kind (or combination) of schema-based parsers. Although, as is often pointed out, no browser uses a DTD-based parser in its engine today, I still think producing a schema representation of (most of) the conformance criteria helps adoption and implementation.



Some comments, based on a first read through the thesis, are below.

I'm cross-posting them to the www-validator list at W3C, as I think your thesis will be of interest to a number of subscribers of that list too.

For www-validator, Henri's announcement and RFC:

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-March/009941.html

[2.3.2] I share the view of the Web that holds WebKit, Presto, Gecko and Trident (the engines of Safari, Opera, Mozilla/Firefox and IE, respectively) to be the most important browser engines.

Did you have a chance to look at engines in authoring tools? What type of parser do NVU, Amaya, GoLive, etc. work on? How about parsing engines for search engine robots? These are probably as important as some of the browser engines, if not more so, in defining the "generic" engine for the web today.

[4.1] The W3C Validator sticks strictly to the SGML validity formalism. It is often argued that it would be inappropriate for a program to be called a “validator” unless it checks exactly for validity in the SGML sense of the word – nothing more, nothing less.

That's very true: there is strong reluctance from part of the validator's user community to have the tool do anything other than formal validation, mostly (?) out of fear that it would eventually make the term "validation" meaningless. The only things the validator does beyond DTD validation are the preparse checks on encoding, presence of doctype, media type, etc.

I think it will change over time; in fact it is already changing, as the innards of the validator have moved to SAX-based parsing. This will be an opportunity to add data type checking and move closer to a conformance checker than a validator. Work at W3C on Unicorn [1] and small modules such as the Appendix C checker [2] for XHTML 1.0 also goes in that direction.


[1] http://www.w3.org/QA/Tools/Unicorn/
[2] http://dev.w3.org/cvsweb/perl/modules/W3C/XHTML/HTMLCompatChecker/

[6.1.3] Erroneous Source Is Not Shown
The error messages do not show the erroneous markup. For this reason it is unnecessarily hard for the user to see where the problem is.

Was this for lack of time? Did you have a look at existing implementations? Oh, I see [8.10 Showing the Erroneous Source Markup] as future work. If you're looking for a decent, though by no means perfect, implementation, look for sub truncate_line in

http://dev.w3.org/cvsweb/~checkout~/validator/httpd/cgi-bin/check

(this is to be modularized out of the check script and into a CPAN module sooner or later, see [3])


[3] http://esw.w3.org/topic/SoftwareProjects
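
To make the idea of showing the erroneous source concrete, here is a minimal sketch in Python. It is not the validator's truncate_line (which is Perl, and which I am not reproducing here); the function name source_excerpt, the 0-based column convention and the default width are all assumptions made for illustration. It cuts a window out of the offending line around the reported column and puts a caret under the reported position.

```python
def source_excerpt(line, column, width=40):
    """Return a two-line excerpt: a window of `line` around `column`
    (0-based, an assumption of this sketch) and a caret line below it."""
    line = line.rstrip("\r\n")
    # Clamp the column so a bogus position cannot index outside the line.
    column = max(0, min(column, len(line)))

    # Centre the window on the column, then slide it back inside the line.
    start = max(0, column - width // 2)
    end = min(len(line), start + width)
    start = max(0, end - width)

    # Mark truncation on either side so the reader knows text was cut.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(line) else ""

    snippet = prefix + line[start:end] + suffix
    caret = " " * (len(prefix) + column - start) + "^"
    return snippet + "\n" + caret


if __name__ == "__main__":
    print(source_excerpt("<p>Hello <blink>world</blink></p>", 9))
```

A real implementation would also have to deal with tabs, wide characters and markup escaping before the excerpt is put into an HTML report; none of that is handled above.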

[6.2] Instead of modifying the libraries themselves, an alternative approach to localization would be reverse templating. The English messages would be matched against known patterns that would allow the variable parts to be extracted. The variable parts could then be plugged into a translated message corresponding to the matched pattern.

This is something I have been looking at, and had come to the same conclusion. I'm hoping to be able to reuse, in one way or another, the existing localization of some of the libraries being used (e.g. OpenSP, with all its issues, has a very impressive localization record).
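
For what it's worth, the reverse templating described in the quoted passage could be sketched roughly as follows in Python. The two English messages, the French templates and the names PATTERNS and translate are invented for the example; they are not taken from any of the libraries mentioned. Each English message is matched against a known pattern, the variable parts are captured as named groups, and those parts are plugged into the translated template for that pattern.

```python
import re

# Hypothetical pattern table: (English pattern, translated template).
# Named groups capture the variable parts of each English message.
PATTERNS = [
    (re.compile(r'^Element "(?P<name>[^"]+)" not allowed here\.$'),
     "L'élément « {name} » n'est pas autorisé ici."),
    (re.compile(r'^Attribute "(?P<attr>[^"]+)" is required on element '
                r'"(?P<name>[^"]+)"\.$'),
     "L'attribut « {attr} » est obligatoire sur l'élément « {name} »."),
]


def translate(message):
    """Return the translated message, or the English one if no pattern matches."""
    for pattern, template in PATTERNS:
        match = pattern.match(message)
        if match:
            # Plug the extracted variable parts into the translated template.
            return template.format(**match.groupdict())
    return message


if __name__ == "__main__":
    print(translate('Element "blink" not allowed here.'))
    print(translate("A message nobody wrote a pattern for."))
```

Falling back to the English text when nothing matches means that a library upgrade which rewords a message degrades the translation instead of breaking the report, which seems to be the main risk of this approach.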

[8.1] Even though the software developed in this project is Free Software / Open Source, it has not been developed in a way that would make it easily approachable to potential contributors. Perhaps the most pressing need for change in order to move the software forward after the completion of this thesis is moving the software to a public version control system and making building and deploying the software easy.

Making it available on a more open-sourcey system, with a multi-user revision system, will probably not create an explosion of code contributors (you've had very helpful contributions from e.g. Elika, and most OS projects, even successful ones, never have more than a handful of coders), but you may be able to create a healthy community of users, reviewers, bug spotters, translators and document editors beyond the whatwg community.

If you're interested in using W3C logistics and benefiting from the existing communities around W3C, I'm happy to invite you. SourceForge would be another excellent choice, only with different tools and a different community of users.

[8.8] To support the use of the conformance checker back end from other applications (non-Java applications in particular), a Web service would be useful.


Indeed. Did you have a chance to look at EARL?

I wrote some basic notes at http://lists.w3.org/Archives/Public/www-validator/2007Mar/0005 and the EARL WG staff contact helped me answer some questions, and reaffirmed that validators/conformance checkers were one of their main use cases.


Hope these initial thoughts/comments can be useful.
Thanks again for your interesting work.

--
olivier
