Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

Eric Seidel Wed, 09 Jan 2013 19:07:38 -0800

On Wed, Jan 9, 2013 at 6:38 PM, Oliver Hunt <[email protected]> wrote:
> How will we ensure thread safety?  Even at just the tokenizing level don't we 
> use AtomicString?  AtomicString isn't threadsafe wrt StringImpl IIRC so this 
> seems like it should add a world of hurt.


AtomicString is already usable from other threads
(http://trac.webkit.org/changeset/38094), but you are correct that this
is the core concern!  PickledToken (or whatever it's called) will have to be
written very carefully in order to minimize/eliminate copies, while
still guaranteeing thread safety.  The correct design and handling of
PickledToken is the entire question of this whole endeavor.
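
To make the PickledToken concern concrete, here is a minimal sketch in
standard C++ rather than WebKit's own types.  Everything in it is an
assumption for illustration: the PickledToken layout, the TokenQueue
class, and the use of std::string in place of WebKit's strings.  The
idea shown is that a token owns all of its data by value, so nothing
ref-counted is shared between threads, and that tokens are moved (not
copied) through a locked queue.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the "PickledToken" idea: the token owns every
// string by value, so no ref-counted buffer is shared across threads.
struct PickledToken {
    enum class Type { StartTag, EndTag, Character, EndOfFile };
    Type type;
    std::string data;  // tag name or character data
    std::vector<std::pair<std::string, std::string>> attributes;
};

// Minimal thread-safe queue (hypothetical). Tokens are moved in and out,
// so handing one to another thread does not copy its strings.
class TokenQueue {
public:
    void push(PickledToken token) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tokens.push(std::move(token));
        }
        m_condition.notify_one();
    }

    // Blocks until a token is available, then moves it out of the queue.
    PickledToken pop() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_condition.wait(lock, [this] { return !m_tokens.empty(); });
        PickledToken token = std::move(m_tokens.front());
        m_tokens.pop();
        return token;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_condition;
    std::queue<PickledToken> m_tokens;
};
```

In this sketch the tokenizer thread would call push() and the main
thread would call pop().  The one unavoidable copy is from the input
buffer into the token's own strings; after that, ownership simply
transfers.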

> I realise it's been a long time since I've worked on this so it's completely 
> possible that I'm not aware of the current behavior.
>
> That aside I question what the benefit of this will be.  All those cases 
> where we've started parsing html are intrinsically tied to the web's general 
> "single thread of execution" model, which implies that even if we do push 
> parsing into a separate thread we'll just end up with the ui thread blocked 
> on the parsing thread which doesn't seem hugely superior.
>
> What is the objective here? To improve performance, add parallelism, or 
> reduce latency?

The core goal is to reduce latency -- to free up the main thread for
JavaScript and UI interaction -- which, as you correctly note, cannot
be moved off of the main thread due to the "single thread of
execution" model of the web.

One could view the pre-load scanner as a layman's attempt at this
type of "tokenize asynchronously" approach.  This model gets preload
scanning for free, and can also easily answer the wkb.ug/90751 request
for speculative tokenizing of the entire document.  (We just have to
save markers before every <script> token, as if the script uses
document.write, any tokens after </script> become invalid.)
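
The marker idea in the parenthetical can be sketched as follows.  This
is an illustrative sketch only, not WebKit code: SpeculativeToken,
SpeculativeTokenStream, and invalidateAfterScript are hypothetical
names, and tokens are reduced to a name plus an input offset.  A marker
is recorded for every <script> start tag; if that script turns out to
call document.write, everything tokenized after its </script> is thrown
away and tokenizing restarts from the returned offset.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified token: a name such as "p", "script",
// or "/script", plus the input offset just past the token.
struct SpeculativeToken {
    std::string name;
    std::size_t endOffset;
};

class SpeculativeTokenStream {
public:
    // Appends a speculatively produced token, saving a marker for every
    // <script> start tag.
    void append(SpeculativeToken token) {
        if (token.name == "script")
            m_scriptMarkers.push_back(m_tokens.size());
        m_tokens.push_back(std::move(token));
    }

    // Call when the script at marker `markerIndex` used document.write.
    // Drops every token after that script's </script> and returns the
    // input offset from which tokenizing has to restart.
    std::size_t invalidateAfterScript(std::size_t markerIndex) {
        std::size_t i = m_scriptMarkers.at(markerIndex);
        while (i < m_tokens.size() && m_tokens[i].name != "/script")
            ++i;
        if (i == m_tokens.size())
            return m_tokens.back().endOffset;  // </script> not seen yet
        m_tokens.resize(i + 1);                   // keep through </script>
        m_scriptMarkers.resize(markerIndex + 1);  // later markers are stale
        return m_tokens[i].endOffset;
    }

    std::size_t size() const { return m_tokens.size(); }

private:
    std::vector<SpeculativeToken> m_tokens;
    std::vector<std::size_t> m_scriptMarkers;  // indices of <script> tokens
};
```

Tokens before the marker stay valid, so the cost of a document.write is
limited to re-tokenizing what came after that one script.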

I should also note that not all HTML parsing can be moved off of the
main thread.  innerHTML for example, would still be done entirely on
the main thread.  I would imagine that when we land this on
trunk it would be behind a feature flag and ports could opt-in to the
threaded-parsing path, as we must maintain the main-thread parsing
ability for innerHTML anyway.
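
As a rough sketch of how the two paths might coexist behind a flag
(all names here are hypothetical, including ParserSettings,
threadedParsingEnabled, and choosePath; none of them are existing
WebKit settings):

```cpp
// Hypothetical per-port opt-in; off by default.
struct ParserSettings {
    bool threadedParsingEnabled = false;
};

enum class ParseRequest { Document, InnerHTMLFragment };
enum class ParsingPath { MainThread, BackgroundTokenizer };

// innerHTML always stays on the main thread; whole-document parsing uses
// the background tokenizer only when the port has opted in.
inline ParsingPath choosePath(const ParserSettings& settings, ParseRequest request) {
    if (request == ParseRequest::InnerHTMLFragment)
        return ParsingPath::MainThread;
    return settings.threadedParsingEnabled ? ParsingPath::BackgroundTokenizer
                                           : ParsingPath::MainThread;
}
```

Because the main-thread path has to stay for innerHTML, a port that
leaves the flag off would see no behavior change.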

> --Oliver
>
> On Jan 9, 2013, at 6:10 PM, Adam Barth <[email protected]> wrote:
>
>> On Wed, Jan 9, 2013 at 6:00 PM, Eric Seidel <[email protected]> wrote:
>>> We're planning to move parts of the HTML Parser off of the main thread:
>>> https://bugs.webkit.org/show_bug.cgi?id=106127
>>>
>>> This is driven by our testing showing that HTML parsing on mobile is
>>> slow and long (causing user-visible delays averaging 10 frames /
>>> 150ms).
>>> https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
>>> Complete data can be found at [1].
>>
>> In case it's not clear from that link, the "ParseHTML" column is the
>> total amount of time the web inspector attributes to HTML parsing when
>> loading those URLs on a Nexus 7 using a top-of-tree build of
>> Chromium's content_shell (similar to WebKitTestRunner).
>>
>> The HTML parser parses data a chunk at a time, which means the total
>> time doesn't tell the whole story.  The "ParseHTML_max" column shows
>> the largest single block of time spent in the HTML parser, which is
>> more of a measure of the main thread "jank" caused by the parser.
>>
>> Antti has pointed out that the inspector isn't the best source of
>> data.  He measured total time using Instruments, and got numbers that
>> are consistent (within a factor of 2) with the inspector measurements.
>> (We were using different data sets, so we wouldn't expect perfect
>> agreement even if we were measuring precisely the same thing.)
>>
>> Adam
>>
>>
>>> Mozilla moved their parser onto a separate thread during their HTML5
>>> parser re-write:
>>> https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
>>>
>>> We plan to take a slightly simpler approach, moving only Tokenizing
>>> off of the main thread:
>>> https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
>>> The left is our current design, the middle is a tokenizer-only design,
>>> and the right is more like Mozilla's threaded-parser design.
>>>
>>> Profiling shows Tokenizing accounts for about 10x as many samples as
>>> TreeBuilding, including in Antti's recent testing (.5% vs. 3%):
>>> https://bugs.webkit.org/show_bug.cgi?id=106127#c10
>>> If after we do this we measure and find ourselves still spending a lot
>>> of main-thread time parsing, we'll move the TreeBuilder too. :)  (This
>>> work is a nicely separable sub-set of larger work needed to move the
>>> TreeBuilder.)
>>>
>>> We welcome your thoughts and comments.
>>>
>>>
>>> 1. https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
>>> (Epic thanks to Nat Duca for helping us collect that data.)
>> _______________________________________________
>> webkit-dev mailing list
>> [email protected]
>> http://lists.webkit.org/mailman/listinfo/webkit-dev
>
