On Dec 2, 2006, at 18:24, Sam Ruby wrote:

It would not be wise for HTML5 to limit itself to the more constrained
character set of XML.  In particular, the form feed character is
pretty popular,

This is yet another case where "take HTML5, read it into a DOM, and
serialize it as XML, and voilà: you have valid XHTML" doesn't work.

What I am advocating is making sure that *conforming* HTML5 documents can be serialized as XHTML5 without data loss. This is important in order to be able to promise that an "XML tool chain" can be used for processing *conforming* HTML5 by sticking an HTML5 parser in front of the processing pipeline (for *non-browser* use cases like data mining, content management or conformance checking, where scripts aren't executed and CSS rendering isn't performed). The motivation is to make processing HTML5 in non-browser apps less expensive without giving an incentive for the solutions to violate the spec ad hoc on their own.

For example, an "XML tool chain" is important enough for my conformance checking service that if at this point the assumption of *conforming* HTML5 being convertible to XHTML5 were broken in corner cases, I'd probably come up with ad hoc trickery for masking it instead of throwing away the tool chain. I'd prefer not to have to do that, and not to have to explain to everyone else who finds an "XML tool chain" to be of value what tricks I needed to pull off to fake it.

I am not suggesting that HTML5 browsers halt and catch fire upon finding a form feed. And it is obvious that lossless conversion of all possible non-conforming HTML5 documents to XML is impossible anyway, so making that a goal would not be worthwhile.

But what legitimate and popular use would a form feed have in HTML5? Why can't we call it non-conforming? Are there use cases other than converting .txt RFCs to HTML with regexps without bothering to get rid of the form feeds?
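
To make the constraint concrete: the XML 1.0 Char production allows only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF], so U+000C (form feed) cannot appear in an XML document at all, not even as a character reference. Below is a minimal sketch of how a non-browser tool might flag such characters before handing a parsed document to an XML tool chain. The function name find_non_xml_chars is hypothetical and is not taken from any existing checker.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_non_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 cannot carry.

    Hypothetical helper for illustration only.
    """
    return [(m.start(), ord(m.group())) for m in _NOT_XML_CHAR.finditer(text)]


# A form feed left over from a paginated .txt RFC is reported;
# tab and newline are allowed by XML 1.0 and are not.
print(find_non_xml_chars("RFC page 1\fRFC page 2"))
print(find_non_xml_chars("plain\ttext\n"))
```

If form feed were non-conforming in HTML5, a check like this would be an ordinary conformance error rather than a point where the HTML5-to-XHTML5 conversion silently loses data.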

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

