Re: [webkit-dev] HTML5 tokenizer landing soon

2010-06-15 Thread Geoffrey Garen
 Oliver, I certainly don't want you to be sad.  We spent a little more
 time on performance in
 https://bugs.webkit.org/show_bug.cgi?id=40592.  I've now measured
 the top-of-tree performance more carefully, and it looks like the
 HTML5 parser is roughly a 5% speedup on the parsing benchmark [1].

Nice!

Geoff
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] HTML5 tokenizer landing soon

2010-06-14 Thread Alexey Proskuryakov


On 14.06.2010, at 10:21, Adam Barth wrote:


In the new world, the
preload scanner is very simple because the tokenization algorithm is
separate from the rest of what the old HTMLTokenizer class did (which
was a lot).



Will we be able to also switch
TextResourceDecoder::checkForHeadCharset()? Currently, it implements a
custom parser to find meta charset, which is unfortunate.


- WBR, Alexey Proskuryakov
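
To illustrate the question above, here is a minimal sketch of charset
detection driven by a general-purpose tokenizer instead of a hand-written
parser. It is not WebKit code: it uses Python's standard html.parser module
as a stand-in tokenizer, and the names MetaCharsetScanner and
find_meta_charset are hypothetical. It also assumes the markup is already
available as text, whereas a real decoder has to work on undecoded bytes.

```python
from html.parser import HTMLParser


class MetaCharsetScanner(HTMLParser):
    """Hypothetical scanner: records the first charset declared by a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # The tokenizer hands us lowercased tag and attribute names.
        if self.charset is not None or tag != "meta":
            return
        attributes = dict(attrs)

        # HTML5 form: <meta charset="utf-8">
        if attributes.get("charset"):
            self.charset = attributes["charset"].strip().lower()
            return

        # Legacy form:
        # <meta http-equiv="Content-Type" content="text/html; charset=...">
        http_equiv = (attributes.get("http-equiv") or "").lower()
        content = attributes.get("content") or ""
        if http_equiv == "content-type":
            for part in content.split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset" and value:
                    self.charset = value.strip().strip("\"'").lower()
                    return


def find_meta_charset(markup):
    """Return the first declared charset in markup, or None if there is none."""
    scanner = MetaCharsetScanner()
    scanner.feed(markup)
    return scanner.charset
```

The scanner only reacts to start-tag tokens, so all of the tokenization
rules stay in one place. That is the property the question is after.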



[webkit-dev] HTML5 tokenizer landing soon

2010-06-13 Thread Adam Barth
People of WebKit,

As mentioned recently on webkit-dev, Eric, Tonyg, and I have been
working on implementing the HTML5 parsing algorithm in WebKit:

http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg11472.html

We're now ready to turn the new tokenization algorithm on by default
(probably early this week).  The new code passes all the existing
LayoutTests, with the exception of roughly 40 tests that expect
behavior that violates the HTML5 specification [1].

There are some differences between the old parser and the HTML5
parser.  We've written up a brief document outlining those
differences:

https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en

If these differences cause real compatibility issues on the web, we
should contribute this information to the working group so we can
improve the specification.  If these differences cause compatibility
issues for WebKit-specific HTML (e.g., for Dashboard widgets), we
might need to add a flag to support some subset of these parsing
quirks for non-web uses of WebKit.

Please be on the lookout for parsing-related regressions and CC Eric,
Tonyg, and me on the bugs.  There's still a lot of work to do
(including implementing the tree construction algorithm), but turning
the tokenization code on by default is an important milestone for the
project.

Happy parsing,
Adam

[1] See 
https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en
for details.