Re: [whatwg] Valid Unicode

Henri Sivonen Sat, 02 Dec 2006 15:41:30 -0800

On Dec 2, 2006, at 18:24, Sam Ruby wrote:

It would not be wise for HTML5 to limit itself to the more constrained
character set of XML.  In particular, the form feed character is
pretty popular,


This is yet another case where "take HTML5, read it into a DOM, and
serialize it as XML, and voilà: you have valid XHTML" doesn't work.

What I am advocating is making sure that *conforming* HTML5 documents
can be serialized as XHTML5 without data loss. This is important in
order to be able to promise that an "XML tool chain" can be used for
processing *conforming* HTML5 by sticking an HTML5 parser in front of
the processing pipeline (for *non-browser* use cases like data
mining, content management or conformance checking where scripts
aren't executed nor CSS rendering performed). The motivation is to
make processing HTML5 in non-browser apps less expensive without
giving an incentive for the solutions to violate the spec ad hoc on
their own.

For example, an "XML tool chain" is important enough for my
conformance checking service that if at this point the assumption of
*conforming* HTML5 being convertible to XHTML5 was broken in corner
cases, I'd probably come up with ad hoc trickery for masking it
instead of throwing away the tool chain. I'd prefer not having to do
that and not having to explain to everyone else who finds an "XML
tool chain" to be of value what tricks I needed to pull off to fake it.

I am not suggesting that HTML5 browsers halt and catch fire upon
finding a form feed. And it is obvious that lossless conversion of
all possible non-conforming HTML5 documents to XML is impossible
anyway, so making that a goal would not be worthwhile.

But what legitimate and popular use would a form feed have in HTML5?
Why can't we call it non-conforming? Are there use cases other than
converting .txt RFCs to HTML with regexps without bothering to get
rid of the form feeds?
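
To make the constraint concrete, here is a minimal sketch of the kind of
check a non-browser tool could run on text before handing it to an XML
serializer. It assumes the XML 1.0 "Char" production (tab, LF, CR, and
U+0020 upwards, minus surrogates, U+FFFE and U+FFFF). The function name
find_non_xml_chars is made up for illustration and is not taken from any
existing parser or conformance checker.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything matched here cannot appear literally in an XML 1.0 document.
_XML10_DISALLOWED = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_non_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 cannot carry.

    Hypothetical helper for illustration only.
    """
    return [(m.start(), ord(m.group())) for m in _XML10_DISALLOWED.finditer(text)]


if __name__ == "__main__":
    # A form feed (U+000C), as left behind by a naive .txt-to-HTML conversion.
    for offset, code_point in find_non_xml_chars("page one\fpage two"):
        print("offset %d: U+%04X" % (offset, code_point))
```

Tab, line feed and carriage return pass this check; the form feed does
not, which is the corner case in question.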


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
