On Wed, Jan 9, 2013 at 6:38 PM, Oliver Hunt <[email protected]> wrote:
> How will we ensure thread safety? Even at just the tokenizing level don't we
> use AtomicString? AtomicString isn't threadsafe wrt StringImpl IIRC so this
> seems like it would add a world of hurt.

AtomicString is already usable from other threads
(http://trac.webkit.org/changeset/38094), but you are correct that this is
the core concern! PickledToken (or whatever it's called) will have to be
written very carefully in order to minimize/eliminate copies, while
still guaranteeing thread safety. The correct design and handling of
PickledToken is the central question of this whole endeavor.

> I realise it's been a long time since I've worked on this so it's completely
> possible that I'm not aware of the current behavior.
>
> That aside I question what the benefit of this will be. All those cases
> where we've started parsing html are intrinsically tied to the web's general
> "single thread of execution" model, which implies that even if we do push
> parsing into a separate thread we'll just end up with the ui thread blocked
> on the parsing thread which doesn't seem hugely superior.
>
> What is the objective here? To improve performance, add parallelism, or
> reduce latency?

The core goal is to reduce latency -- to free up the main thread for
JavaScript and UI interaction -- which, as you correctly note, cannot be
moved off of the main thread due to the "single thread of execution"
model of the web.

One could view the pre-load scanner as a layman's attempt at this type
of "tokenize asynchronously" approach. This model gets preload scanning
for free, and can easily answer the wkb.ug/90751 request for speculative
tokenizing of the entire document. (We just have to save markers before
every <script> token, because if the script uses document.write, any
tokens after </script> become invalid.)

I should also note that not all HTML parsing can be moved off of the
main thread. innerHTML, for example, would still be done entirely on the
main thread.

I would imagine that when we land this on trunk it will be behind a
feature flag, and ports can opt in to the threaded-parsing path, as we
must maintain the main-thread parsing ability for innerHTML anyway.
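
To make the "save markers before every <script> token" idea concrete,
here is a rough sketch of how a speculative token stream might record
markers and throw away tokens after a script that called
document.write. This is only an illustration under assumptions: the
PickledToken layout, the SpeculativeTokenStream class and all of its
method names are hypothetical and do not exist in WebKit, and the token
owns a plain std::string copy of its data purely to keep the example
free of shared StringImpl state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for PickledToken: it owns a private copy of its
// data, so nothing is shared between the tokenizer thread and the main thread.
struct PickledToken {
    enum class Type { StartTag, EndTag, Character };
    Type type;
    std::string data; // Tag name for tags, text for character tokens.
};

// Hypothetical container for tokens produced ahead of the tree builder.
class SpeculativeTokenStream {
public:
    // Appends a token, saving a marker before every <script> start tag.
    void append(PickledToken token)
    {
        if (token.type == PickledToken::Type::StartTag && token.data == "script")
            m_scriptMarkers.push_back(m_tokens.size());
        m_tokens.push_back(std::move(token));
    }

    // Call when the script at the given marker used document.write.
    // Everything after its </script> is dropped; returns how many tokens
    // were discarded so the caller knows how much must be re-tokenized.
    std::size_t discardTokensAfterScript(std::size_t markerNumber)
    {
        assert(markerNumber < m_scriptMarkers.size());

        // Walk forward from the <script> token to its closing </script>.
        std::size_t i = m_scriptMarkers[markerNumber];
        while (i < m_tokens.size() && !isScriptEnd(m_tokens[i]))
            ++i;

        // Keep the </script> itself if it was already tokenized.
        std::size_t keep = i < m_tokens.size() ? i + 1 : m_tokens.size();
        std::size_t discarded = m_tokens.size() - keep;
        m_tokens.erase(m_tokens.begin() + static_cast<std::ptrdiff_t>(keep), m_tokens.end());

        // Markers for later scripts pointed into the discarded tokens.
        m_scriptMarkers.resize(markerNumber + 1);
        return discarded;
    }

    std::size_t size() const { return m_tokens.size(); }
    std::size_t markerCount() const { return m_scriptMarkers.size(); }

private:
    static bool isScriptEnd(const PickledToken& token)
    {
        return token.type == PickledToken::Type::EndTag && token.data == "script";
    }

    std::vector<PickledToken> m_tokens;
    std::vector<std::size_t> m_scriptMarkers; // Index of each <script> token.
};
```

In this sketch the main thread would call discardTokensAfterScript()
only when a script actually writes into the document; otherwise the
speculative tokens are consumed as-is. A real implementation would also
have to restore the tokenizer's state and input position at the marker,
which is left out here.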

> --Oliver
>
> On Jan 9, 2013, at 6:10 PM, Adam Barth <[email protected]> wrote:
>
>> On Wed, Jan 9, 2013 at 6:00 PM, Eric Seidel <[email protected]> wrote:
>>> We're planning to move parts of the HTML Parser off of the main thread:
>>> https://bugs.webkit.org/show_bug.cgi?id=106127
>>>
>>> This is driven by our testing showing that HTML parsing on mobile is
>>> slow, and long (causing user-visible delays averaging 10 frames /
>>> 150ms).
>>> https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
>>> Complete data can be found at [1].
>>
>> In case it's not clear from that link, the "ParseHTML" column is the
>> total amount of time the web inspector attributes to HTML parsing when
>> loading those URLs on a Nexus 7 using a top-of-tree build of
>> Chromium's content_shell (similar to WebKitTestRunner).
>>
>> The HTML parser parses data a chunk at a time, which means the total
>> time doesn't tell the whole story. The "ParseHTML_max" column shows
>> the largest single block of time spent in the HTML parser, which is
>> more of a measure of the main thread "jank" caused by the parser.
>>
>> Antti has pointed out that the inspector isn't the best source of
>> data. He measured total time using instruments, and got numbers that
>> are consistent (within a factor of 2) with the inspector measurements.
>> (We were using different data sets, so we wouldn't expect perfect
>> agreement even if we were measuring precisely the same thing.)
>>
>> Adam
>>
>>
>>> Mozilla moved their parser onto a separate thread during their HTML5
>>> parser re-write:
>>> https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
>>>
>>> We plan to take a slightly simpler approach, moving only Tokenizing
>>> off of the main thread:
>>> https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
>>> The left is our current design, the middle is a tokenizer-only design,
>>> and the right is more like mozilla's threaded-parser design.
>>>
>>> Profiling shows Tokenizing accounts for about 10x the number of
>>> samples as TreeBuilding. Including Antti's recent testing (.5% vs.
>>> 3%):
>>> https://bugs.webkit.org/show_bug.cgi?id=106127#c10
>>> If after we do this we measure and find ourselves still spending a lot
>>> of main-thread time parsing, we'll move the TreeBuilder too. :) (This
>>> work is a nicely separable sub-set of larger work needed to move the
>>> TreeBuilder.)
>>>
>>> We welcome your thoughts and comments.
>>>
>>>
>>> 1.
>>> https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
>>> (Epic thanks to Nat Duca for helping us collect that data.)
>> _______________________________________________
>> webkit-dev mailing list
>> [email protected]
>> http://lists.webkit.org/mailman/listinfo/webkit-dev

