When loading web pages we are very frequently in a situation where we already have the source data (HTML text here, but the same applies to preloaded JavaScript, CSS, images, ...) and know we are likely to need it soon, but can't actually use it for an indeterminate time. This happens because pending external JS resources block the main parser (and pending CSS resources block JS execution) for web compatibility reasons. In this situation it makes sense to start processing the resources we have into forms that are faster to use when they are eventually needed (like the token stream here).
One thing we already do when the main parser gets blocked is preload scanning. We look through the unparsed HTML source we have and trigger loads for any resources found. It would be beneficial if this happened off the main thread. We could do it as new data arrives, in parallel with JS execution and other time-consuming engine work, potentially triggering resource loads earlier.

I think a good first step here would be to share the tokens between the preload scanner and the main parser and worry about the threading part afterwards. We often parse the HTML source more or less twice, so this is an unquestionable win.

  antti

On Thu, Jan 10, 2013 at 7:41 AM, Filip Pizlo <fpi...@apple.com> wrote:

> I think your biggest challenge will be ensuring that the latency of
> shoving things to another core and then shoving them back will be smaller
> than the latency of processing those same things on the main thread.
>
> For small documents, I expect concurrent tokenization to be a pure
> regression because the latency of waking up another thread to do just a
> small bit of work, plus the added cost of whatever synchronization
> operations will be needed to ensure safety, will involve more total work
> than just tokenizing locally.
>
> We certainly see this in the JSC parallel GC, and in line with traditional
> parallel GC design, we ensure that parallel threads only kick in when the
> main thread is unable to keep up with the work that it has created for
> itself.
>
> Do you have a vision for how to implement a similar self-throttling, where
> tokenizing continues on the main thread so long as it is cheap to do so?
>
> -Filip
>
>
> On Jan 9, 2013, at 6:00 PM, Eric Seidel <e...@webkit.org> wrote:
>
> > We're planning to move parts of the HTML Parser off of the main thread:
> > https://bugs.webkit.org/show_bug.cgi?id=106127
> >
> > This is driven by our testing showing that HTML parsing on mobile is
> > be slow, and long (causing user-visible delays averaging 10 frames /
> > 150ms).
> > https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
> > Complete data can be found at [1].
> >
> > Mozilla moved their parser onto a separate thread during their HTML5
> > parser re-write:
> > https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
> >
> > We plan to take a slightly simpler approach, moving only Tokenizing
> > off of the main thread:
> > https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
> > The left is our current design, the middle is a tokenizer-only design,
> > and the right is more like mozilla's threaded-parser design.
> >
> > Profiling shows Tokenizing accounts for about 10x the number of
> > samples as TreeBuilding. Including Antti's recent testing (.5% vs.
> > 3%):
> > https://bugs.webkit.org/show_bug.cgi?id=106127#c10
> > If after we do this we measure and find ourselves still spending a lot
> > of main-thread time parsing, we'll move the TreeBuilder too. :) (This
> > work is a nicely separable sub-set of larger work needed to move the
> > TreeBuilder.)
> >
> > We welcome your thoughts and comments.
> >
> >
> > 1.
> > https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
> > (Epic thanks to Nat Duca for helping us collect that data.)
> > _______________________________________________
> > webkit-dev mailing list
> > webkit-dev@lists.webkit.org
> > http://lists.webkit.org/mailman/listinfo/webkit-dev
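To make the token-sharing step concrete, here is a minimal single-threaded sketch in Python (not WebKit code). Everything in it is an assumption made for illustration: the names `SharedTokenStream`, `preload_scan` and `parse` are hypothetical, and the regex "tokenizer" only recognizes start tags with double-quoted attributes, whereas a real HTML tokenizer is a full state machine. The only point it tries to show is that the preload scanner and the main parser can read from one lazily filled token list, so the source is tokenized once rather than twice.

```python
import re
from dataclasses import dataclass, field

# Toy patterns (assumption): start tags with double-quoted attributes only.
TAG_RE = re.compile(r'<(\w+)((?:\s+[\w-]+="[^"]*")*)\s*/?>')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


@dataclass
class StartTag:
    name: str
    attrs: dict


@dataclass
class SharedTokenStream:
    """Hypothetical token cache shared by the preload scanner and the parser."""
    source: str
    tokens: list = field(default_factory=list)
    offset: int = 0           # how far into the source we have tokenized
    tokenized_count: int = 0  # number of tokens actually produced

    def token_at(self, index):
        """Return token number `index`, tokenizing more source only if needed."""
        while index >= len(self.tokens):
            match = TAG_RE.search(self.source, self.offset)
            if match is None:
                return None  # no more tokens in the source we have
            self.offset = match.end()
            self.tokenized_count += 1
            attrs = dict(ATTR_RE.findall(match.group(2)))
            self.tokens.append(StartTag(match.group(1).lower(), attrs))
        return self.tokens[index]


def preload_scan(stream):
    """Collect URLs worth preloading; runs while the main parser is blocked."""
    urls = []
    i = 0
    while (tok := stream.token_at(i)) is not None:
        if tok.name in ("script", "img") and "src" in tok.attrs:
            urls.append(tok.attrs["src"])
        elif (tok.name == "link" and tok.attrs.get("rel") == "stylesheet"
              and "href" in tok.attrs):
            urls.append(tok.attrs["href"])
        i += 1
    return urls


def parse(stream):
    """Stand-in for the main parser: consumes the already cached tokens."""
    names = []
    i = 0
    while (tok := stream.token_at(i)) is not None:
        names.append(tok.name)
        i += 1
    return names
```

Running `preload_scan` and then `parse` on the same `SharedTokenStream` leaves `tokenized_count` equal to the number of tokens in the source, since the second pass only reads the cache. Moving the tokenizing onto another thread would then be a separate change behind the same `token_at` interface, which is where the latency concerns raised above would have to be measured.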