Sam Ruby wrote:
[snip]
HTML5 can do one better. Instead of handling presentational MathML as a
special case, this support can be generalized. When a non-HTML element
is encountered inside an HTML document, the parser could make one
additional check: does this element have an xmlns attribute defined? If
so, it can enter a "consume foreign markup" stage whereby these elements
are simply placed into the resulting DOM. Such elements would therefore
be made available to processors like JavaScript, which could enable some
cool applications.
[snip]
Finally (whew!) unlike Microsoft's mis-advertised and undocumented XML
data islands, this "architected HTML extension syntax" would clearly and
unabashedly be parsed by HTML5 parser rules for things like comments and
attributes.
[snip]
An HTML parser relies on knowledge of the schema, so it's not easy to
parse an arbitrary, unknown schema with an HTML parser.
For example, HTML offers no syntactic way to differentiate between
"void" elements like <br> and normal elements like <div>. The parser
just "knows" that BR is void.
Likewise, the content model of the <script> element is "hardcoded" into
the parser; there's no way to discover it from the syntax alone. (I'll
admit that XML has no construct similar to the content model of
<script>, however, so this particular difference doesn't pose a problem.)
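To make the void-element point concrete, here is a rough Python sketch.
It is not how any real HTML parser works; the function name, the regex,
and the shortened void list are all made up for illustration. It only
tracks which elements are still open after reading a run of tags.

```python
import re

# A subset of HTML's void elements. The parser has to carry this list
# itself, because nothing in the syntax of "<br>" marks it as void.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}

# Deliberately naive tag matcher: ignores comments and '>' in attributes.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def open_elements_at_end(markup, void_elements=VOID_ELEMENTS):
    """Return the stack of elements still open after reading all tags."""
    stack = []
    for match in TAG.finditer(markup):
        closing, name = match.group(1), match.group(2).lower()
        if closing:
            # Pop back to the matching open element, if there is one.
            if name in stack:
                while stack.pop() != name:
                    pass
        elif name not in void_elements:
            # Only non-void elements stay open and can take children.
            stack.append(name)
    return stack
```

With the built-in list, open_elements_at_end("<div>a<br>b") gives
['div'], so the "b" belongs to the div. Pass void_elements=set(), which
is the position a parser is in with an unknown vocabulary, and the same
input gives ['div', 'br']: the "b" would end up inside the br.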
In order to handle custom elements in HTML while still allowing them to
appear in the DOM, you'd have to impose some rules, for example that no
void elements are allowed. You'd have to write otherwise-void elements
as, say, <img></img> in order to have them handled correctly by the
parser.
Even if you aren't constructing a DOM of these unknown elements, you
need to be able to count opens and closes so that you can detect the end
of the root custom element and resume normal parsing.
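The open/close counting could look something like the following Python
sketch. It assumes the "no void elements" rule is in force, so every
open tag has an explicit close tag; the function name and the regex are
hypothetical, and comments, self-closing syntax, and '>' inside
attribute values are not handled.

```python
import re

# Matches an open or close tag; group 1 is "/" for a close tag.
TAG = re.compile(r"<(/?)[A-Za-z][^>]*>")

def foreign_subtree_end(markup, start=0):
    """Return the index just past the close tag that balances the first
    open tag found at or after `start`, or None if it never balances."""
    depth = 0
    for match in TAG.finditer(markup, start):
        depth += -1 if match.group(1) else 1
        if depth < 0:
            # A close tag arrived before any open tag.
            return None
        if depth == 0:
            # Back at the level of the root custom element: resume
            # normal HTML parsing from here.
            return match.end()
    return None
```

Given '<m:math xmlns:m="u"><m:img></m:img></m:math> tail', this returns
the index of " tail". Drop the </m:img> and the count never gets back to
zero, so the function returns None and the parser has no way to know
where the foreign markup stops.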
> Standard browsers would be advised to ignore extensions that they
> don't understand. Including any text, so we don't have a repeat of
> the <table> problem again.
I'm sure you realise this, but there are already browsers out there that
*don't* ignore extensions that they don't understand, so a mandate such
as this would be meaningless.