The data Eric and Adam were using comes from a Python library a few of us have been developing called "telemetry." It's basically a bunch of Python that lets us write performance tests against any browser that speaks the inspector websocket protocol. We're using it for a lot of "should we parallelize X?" questions, as well as regression-style "have our changes to X stayed a win over time?" questions.
They might have other ways in mind to obtain this data that are more webkit-y, but I figure a bit on how we got this far might be useful for this mailing list.

Roughly, telemetry scripts connect to a host and port where you've arranged to have an inspector websocket listening, e.g. $MY_PHONE_IP:9222, or google-chrome --remote-debugging-port=9222 && telemetry --browser=$LOCALHOST:9222. Once that's established, we have communication with WebCore's InspectorAgent and, assuming we trust the agent, can do some pretty powerful stuff from there. The benchmark being discussed here [webkit_benchmark] navigates the browser from page to page, enabling inspector's TimelineAgent as it does so in order to get performance data about the page load. We then postprocess that data stream into a human-consumable csv, and there is [some amount] of rejoicing. Assuming we trust the inspector timeline [Pavel's done a number of fixes to help us trust it more!], this gets pretty clean results, pretty easily.

A key challenge with telemetry has been getting stable runs on real-world sites. The archive.org techniques are cool, but they don't capture some of the big ones, like a logged-in gmail account. We've addressed this using tonyg and simonjam's http://code.google.com/p/web-page-replay/. If the browser under test supports web page replay [~= redirecting dns requests to the replay server instead of the real site], then you can get stable, repeatable runs against super-complex real-world sites --- it's worked on every site we've tried so far.

The core telemetry framework is here:
http://src.chromium.org/chrome/trunk/src/tools/telemetry/
It's in the chromium repo, but please don't hold that against it --- it's movable, given interest.
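To make the flow above a bit more concrete, here's a minimal, hypothetical sketch of the two pieces involved: building a JSON command of the kind sent over the inspector websocket, and flattening the returned timeline records into csv rows. The method name "Timeline.start" and the record fields ("type", "startTime", "endTime", "children") are illustrative of the message shapes, not the exact code telemetry uses.

```python
import csv
import io
import json

def make_command(cmd_id, method, params=None):
    # Inspector protocol messages are JSON objects with an id,
    # a method name, and optional params.
    msg = {"id": cmd_id, "method": method}
    if params:
        msg["params"] = params
    return json.dumps(msg)

def timeline_records_to_csv(records):
    # Timeline records form a tree; flatten it depth-first into
    # one csv row per record.
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["type", "startTime", "endTime"])

    def visit(record):
        writer.writerow([record.get("type"),
                         record.get("startTime"),
                         record.get("endTime")])
        for child in record.get("children", []):
            visit(child)

    for record in records:
        visit(record)
    return out.getvalue()
```

For example, make_command(1, "Timeline.start") produces '{"id": 1, "method": "Timeline.start"}', which you would send over the websocket opened against $HOST:9222; the postprocessing step is then just feeding the accumulated records through something like timeline_records_to_csv.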
The actual webkit benchmark is pretty simple, because most of the functionality comes from telemetry: https://codereview.chromium.org/11791043/

With the patch above landed, obtaining the benchmarking results that Eric got against chrome should be ~= getting a telemetry checkout and doing:

  ./run_multipage_benchmarks --browser=canary webkit_benchmark page_sets/top_25.json

Or, if you had an android with chrome on it:

  ./run_multipage_benchmarks --browser=android-chrome webkit_benchmark page_sets/top_25.json

Anyway, I'll leave it to Eric/Adam to speak to how this maps back into the WebKit ecosystem. The use of the inspector protocol makes it a theoretical possibility on other ports, but I know some people get nervous (or run away angrily!) when they hear that we're using Inspector as a perf data source. :)

- Nat

On Thu, Jan 10, 2013 at 1:44 AM, Antti Koivisto <koivi...@iki.fi> wrote:
> When loading web pages we are very frequently in a situation where we
> already have the source data (HTML text here, but the same applies to
> preloaded JavaScript, CSS, images, ...) and know we are likely to need it
> soon, but can't actually utilize it for an indeterminate time. This happens
> because pending external JS resources block the main parser (and pending
> CSS resources block JS execution) for web compatibility reasons. In this
> situation it makes sense to start processing the resources we have into
> forms that are faster to use when they are eventually actually needed
> (like the token stream here).
>
> One thing we already do when the main parser gets blocked is preload
> scanning. We look through the unparsed HTML source we have and trigger
> loads for any resources found. It would be beneficial if this happened off
> the main thread. We could do it when new data arrives, in parallel with JS
> execution and other time-consuming engine work, potentially triggering
> resource loads earlier.
> I think a good first step here would be to share the tokens between the
> preload scanner and the main parser and worry about the threading part
> afterwards. We often parse the HTML source more or less twice, so this is
> an unquestionable win.
>
>   antti
>
> On Thu, Jan 10, 2013 at 7:41 AM, Filip Pizlo <fpi...@apple.com> wrote:
>
>> I think your biggest challenge will be ensuring that the latency of
>> shoving things to another core and then shoving them back will be smaller
>> than the latency of processing those same things on the main thread.
>>
>> For small documents, I expect concurrent tokenization to be a pure
>> regression because the latency of waking up another thread to do just a
>> small bit of work, plus the added cost of whatever synchronization
>> operations will be needed to ensure safety, will involve more total work
>> than just tokenizing locally.
>>
>> We certainly see this in the JSC parallel GC, and in line with
>> traditional parallel GC design, we ensure that parallel threads only kick
>> in when the main thread is unable to keep up with the work that it has
>> created for itself.
>>
>> Do you have a vision for how to implement a similar self-throttling,
>> where tokenizing continues on the main thread so long as it is cheap to
>> do so?
>>
>> -Filip
>>
>> On Jan 9, 2013, at 6:00 PM, Eric Seidel <e...@webkit.org> wrote:
>>
>> > We're planning to move parts of the HTML Parser off of the main thread:
>> > https://bugs.webkit.org/show_bug.cgi?id=106127
>> >
>> > This is driven by our testing showing that HTML parsing on mobile is
>> > slow and long (causing user-visible delays averaging 10 frames /
>> > 150ms).
>> > https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
>> > Complete data can be found at [1].
>> > Mozilla moved their parser onto a separate thread during their HTML5
>> > parser re-write:
>> > https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
>> >
>> > We plan to take a slightly simpler approach, moving only Tokenizing
>> > off of the main thread:
>> > https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
>> > The left is our current design, the middle is a tokenizer-only design,
>> > and the right is more like Mozilla's threaded-parser design.
>> >
>> > Profiling shows Tokenizing accounts for about 10x the number of
>> > samples as TreeBuilding, consistent with Antti's recent testing
>> > (.5% vs. 3%):
>> > https://bugs.webkit.org/show_bug.cgi?id=106127#c10
>> > If, after we do this, we measure and find ourselves still spending a
>> > lot of main-thread time parsing, we'll move the TreeBuilder too. :)
>> > (This work is a nicely separable subset of the larger work needed to
>> > move the TreeBuilder.)
>> >
>> > We welcome your thoughts and comments.
>> >
>> > 1. https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
>> > (Epic thanks to Nat Duca for helping us collect that data.)
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev