[Please don't cross-post without setting followup-to; set that to .performance]

Ernst Bauernfeind wrote:
as I can see, there are at least two different methods of parsing a (x)
html-document: a "tag-soup"-parser which is very error-prone and makes
the best out of non-valid websites. and a xml-parser, which is only
activated if a document is served with mime-type
application/xhtml+xml.

You mean "error-correcting", not "error-prone", right?

IMHO an xml-parser should be a lot faster, because it just stops if
there is an error, and has not to concern error-handling.

It also needs to implement very different parsing rules from HTML in general, keep track of namespaces, etc.

I would be very interested in performance-comparison between the
mostly used tag-soup-parser and the xml-parser for xhtml1.x documents
which are correctly served as application/xhtml+xml.

can you maybe give me any metrics?

Parser performance per se is more or less a wash, from what I've seen. It's also generally been a small enough component of pageload time (15% or less) that this aspect should be about the last factor in deciding whether to use XML or not.
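
To put the 15% figure in perspective, here is a rough back-of-the-envelope sketch. It assumes parsing takes exactly 15% of pageload time (the upper bound quoted above) and that nothing else changes when the parser gets faster; the function name and the speedup factors are made up for illustration.

```python
def pageload_fraction(parse_share: float, parser_speedup: float) -> float:
    """Return the new pageload time as a fraction of the old one.

    parse_share: portion of pageload time spent parsing (0.15 assumed here).
    parser_speedup: hypothetical factor by which the parser gets faster.
    """
    # Only the parsing portion shrinks; the rest of pageload is unchanged.
    return (1.0 - parse_share) + parse_share / parser_speedup


PARSE_SHARE = 0.15  # assumption: the "15% or less" upper bound

for speedup in (1.5, 2.0, 10.0):
    remaining = pageload_fraction(PARSE_SHARE, speedup)
    saved = (1.0 - remaining) * 100.0
    print(f"parser {speedup:>4}x faster -> pageload {saved:.1f}% shorter")
```

Under these assumptions, even a parser that is twice as fast shortens pageload by only 7.5%, and no parser speedup can save more than 15%.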

or, better, is it possible to extract the gecko document parser and
benchmark it standalone with various documents?

Why? That would give you a somewhat useless number for your purposes (see below).

the answer to this question is relevant if it is preferable to build
valid xml xhtml documents or just stick to html 4.01.

That's an entirely different question from a performance standpoint. Typically, the HTML codepath receives more attention in terms of optimization and profiling; if we have to sacrifice XHTML-as-XML performance in favor of HTML performance, we do so.

There are also common markup constructs that actually do significantly different things in XHTML-as-XML and in HTML. A good example:

  <table>
    <tr><td>Text</td></tr>
  </table>

This produces different DOMs in HTML and in XHTML-as-XML; the HTML one is faster to lay out, especially if you plan to do any dynamic addition or removal of rows in that table.
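
The message does not spell out what the difference is. One well-known one, which is probably what is meant here, is that an HTML parser inserts an implied <tbody> between <table> and <tr>, while an XML parser builds exactly what the markup says. The sketch below shows the XML side using Python's standard xml.etree parser (not Gecko's), and builds the HTML-side tree by hand as a stand-in, since Python's standard library has no HTML parser that adds the implied element. The helper name child_tags is made up for illustration.

```python
import xml.etree.ElementTree as ET

MARKUP = """
<table>
  <tr><td>Text</td></tr>
</table>
"""


def child_tags(element):
    """Return the tag names of an element's direct element children."""
    return [child.tag for child in element]


# XML parsing: the tree mirrors the markup, so <tr> is a direct child of <table>.
xml_table = ET.fromstring(MARKUP.strip())
xml_children = child_tags(xml_table)

# Hand-built stand-in for the tree an HTML parser would produce:
# an implied <tbody> sits between <table> and <tr>. (Assumption: this is
# constructed manually, not produced by a real HTML parser.)
html_table = ET.Element("table")
tbody = ET.SubElement(html_table, "tbody")
row = ET.SubElement(tbody, "tr")
ET.SubElement(row, "td").text = "Text"
html_children = child_tags(html_table)

print("XML  children of <table>:", xml_children)
print("HTML children of <table>:", html_children)
```

To check this against a real browser, one could parse the same string with DOMParser as both text/html and application/xhtml+xml and compare the children of the table element.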

-Boris
_______________________________________________
dev-tech-layout mailing list
dev-tech-layout@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-layout
