Sam Ruby wrote:
[snip]
HTML5 can do one better. Instead of handling presentational MathML as a special case, this support can be generalized. When a non-HTML element is encountered inside an HTML document, the parser could make one additional check: does this element have an xmlns attribute defined? If so, it can enter a "consume foreign markup" stage whereby these elements are simply placed into the resulting DOM. Such elements would therefore be made available to processors like JavaScript, which could enable some cool applications.

[snip]

Finally (whew!) unlike Microsoft's mis-advertised and undocumented XML data islands, this "architected HTML extension syntax" would clearly and unabashedly be parsed by HTML5 parser rules for things like comments and attributes.
[snip]


An HTML parser relies on knowledge of the schema, so it's not easy to parse an arbitrary, unknown schema with an HTML parser.

For example, HTML offers no syntactic way to differentiate between "void" elements like <br> and normal elements like <div>. The parser just "knows" that BR is void.
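
To make that concrete, here is a minimal Python sketch. It assumes the standard library's html.parser as the tokenizer; the Outline class, the outline() helper and the VOID set are hypothetical names of mine, and VOID is only a partial list for illustration. The markup is identical in both runs; only the hardcoded table changes the resulting nesting.

```python
from html.parser import HTMLParser

# Partial list of void elements, for illustration only.
VOID = frozenset({"br", "img", "hr", "input", "meta", "link"})


class Outline(HTMLParser):
    """Records each start tag, indented by its nesting depth."""

    def __init__(self, void=VOID):
        super().__init__()
        self.void = void
        self.stack = []   # currently open (non-void) elements
        self.lines = []   # indented outline of the document

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * len(self.stack) + tag)
        # Nothing in the syntax says whether this tag will be closed;
        # only the table of void elements tells us not to push it.
        if tag not in self.void:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def outline(markup, void=VOID):
    parser = Outline(void)
    parser.feed(markup)
    parser.close()
    return parser.lines


markup = "<div><br><span>x</span></div>"
print(outline(markup))               # parser "knows" that br is void
print(outline(markup, void=set()))   # parser with no such knowledge
```

With the table, span comes out as a sibling of br; without it, span ends up nested inside br, even though the input is byte-for-byte the same.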

Likewise, the content model of the <script> element is "hardcoded" into the parser; there's no way to discover it from the syntax alone. (I'll admit that there's no similar construct to the content model of <script> in XML, however, so this particular difference doesn't pose a problem.)

To handle custom elements in HTML and still have them appear in the DOM, you'd have to impose some rules, such as forbidding void elements. Elements that would otherwise be void would have to be written as, say, <img></img> for the parser to handle them correctly.

Even if you aren't constructing a DOM of these unknown elements, you need to be able to count opens and closes so that you can detect the end of the root custom element and resume normal parsing.
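
A rough Python sketch of that counting, under two assumptions of mine: an island starts at any element carrying an xmlns attribute, and every element inside it is closed explicitly. ForeignIslandScanner and scan() are hypothetical names, and html.parser is only standing in for a real HTML5 tokenizer.

```python
from html.parser import HTMLParser


class ForeignIslandScanner(HTMLParser):
    """Counts opens and closes to find where a foreign island ends."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # open elements inside the current island
        self.events = []   # ("enter", tag) / ("leave", tag) pairs

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Already inside an island: every open counts.
            self.depth += 1
        elif any(name == "xmlns" for name, _ in attrs):
            # Assumed trigger: an element with an xmlns attribute.
            self.depth = 1
            self.events.append(("enter", tag))

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Root of the island closed: resume normal parsing.
                self.events.append(("leave", tag))


def scan(markup):
    scanner = ForeignIslandScanner()
    scanner.feed(markup)
    scanner.close()
    return scanner.events


# Explicitly closed <img></img>: the end of the island is found.
print(scan('<p>a</p><x xmlns="u"><img></img><b></b></x><p>after</p>'))
# Unclosed <img>: the count never returns to zero, so the island
# swallows the rest of the document.
print(scan('<x xmlns="u"><img><b></b></x><p>after</p>'))
```

The second call never records a "leave" event, which is exactly why otherwise-void elements would have to be written with an explicit close.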

> Standard browsers would be advised to ignore extensions that they
> don't understand.  Including any text, so we don't have a repeat of
> the <table> problem again.

I'm sure you realise this, but there are already browsers out there that *don't* ignore extensions that they don't understand, so a mandate such as this would be meaningless.

