Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Thu, Jan 10, 2013 at 9:19 PM, Maciej Stachowiak m...@apple.com wrote:
> On Jan 10, 2013, at 12:07 PM, Adam Barth aba...@webkit.org wrote:
>> On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak m...@apple.com wrote:
>>> I presume from your other comments that the goal of this work is responsiveness, rather than page load speed as such. I'm excited about the potential to improve responsiveness during page loading.
>>
>> The goals are described in the first link Eric gave in his email: https://bugs.webkit.org/show_bug.cgi?id=106127#c0. Specifically:
>>
>> ---8<---
>> 1) Moving parsing off the main thread could make web pages more responsive because the main thread is available for handling input events and executing JavaScript.
>> 2) Moving parsing off the main thread could make web pages load more quickly because WebCore can do other work in parallel with parsing HTML (such as parsing CSS or attaching elements to the render tree).
>> ---8<---
>
> OK - what test (if any) will be used to test whether the page load speed goal is achieved?

All of them. :)

More seriously, Chromium runs a very large battery of performance tests continuously on a matrix of different platforms, including desktop and mobile. You can see one of the overview dashboards here: http://build.chromium.org/f/chromium/perf/dashboard/overview.html

The tests that are particularly relevant to this work are the various page load tests, both with and without simulated network delays. For iterative benchmarking, we plan to use Chromium's Telemetry framework: http://www.chromium.org/developers/telemetry. Specifically, I expect we'll work with the top_25 data set (http://src.chromium.org/viewvc/chrome/trunk/src/tools/perf/page_sets/top_25.json?view=markup), but we might use other data sets if there are particular areas we want to measure more carefully.

>>> One question: what tests are you planning to use to validate whether this approach achieves its goals of better responsiveness?
>> The tests we've run so far are also described in the first link Eric gave in his email: https://bugs.webkit.org/show_bug.cgi?id=106127. They suggest that there's a good deal of room for improvement in this area. After we have a working implementation, we'll likely re-run those experiments, and run other experiments to do an A/B comparison of the two approaches. As Filip points out, we'll likely end up with a hybrid of the two designs that's optimized for handling various workloads.
>
> I agree the test suggests there is room for improvement. From the description of how the test is run, I can think of two potential ways to improve how well it correlates with actual user-perceived responsiveness:
>
> (1) It seems to look at the max parsing pause time without considering whether any content is being shown that it's possible to interact with. If the longest pauses happen before meaningful content is visible, then reducing those pauses is unlikely to materially improve responsiveness, at least in models where web content processing happens in a separate process or thread from the UI. One possibility is to track the max parsing pause time starting from the first visually non-empty layout. That would better approximate how much actual user interaction is blocked. Consider, also, that pages might be parsing in the same process in another tab, or in a frame in the current tab.
>
> (2) It might be helpful to track max and average pause time from non-parsing sources, for the sake of comparison.

If you looked at the information Eric provided in his initial email, you might have noticed https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0, which is precisely that.

> These might result in a more accurate assessment of the benefits.

>>> The reason I ask is that this sounds like a significant increase in complexity, so we should be very confident that there is a real and major benefit.
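Maciej's suggestion in (1), counting parser pauses toward the responsiveness metric only after the first visually non-empty layout, can be sketched as a toy metric. This is illustrative Python, not WebKit or Telemetry code; all names are made up:

```python
# Toy sketch of the proposed metric: parser pauses that happen before any
# content is visible cannot block user interaction, so only pauses after
# the first visually non-empty layout are recorded.

class PauseTracker:
    def __init__(self):
        self.first_paint_done = False
        self.max_pause_ms = 0
        self.pauses = []

    def on_first_visually_non_empty_layout(self):
        self.first_paint_done = True

    def on_parser_pause(self, duration_ms):
        # Pauses before any content is visible are ignored.
        if not self.first_paint_done:
            return
        self.pauses.append(duration_ms)
        self.max_pause_ms = max(self.max_pause_ms, duration_ms)

    def average_pause_ms(self):
        return sum(self.pauses) / len(self.pauses) if self.pauses else 0

tracker = PauseTracker()
tracker.on_parser_pause(300)                   # before first paint: ignored
tracker.on_first_visually_non_empty_layout()
tracker.on_parser_pause(50)
tracker.on_parser_pause(150)
print(tracker.max_pause_ms)                    # -> 150
print(tracker.average_pause_ms())              # -> 100.0
```

The same shape would work for Maciej's point (2): instrument non-parsing pause sources with a second tracker and compare the two distributions.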
>>> One thing I wonder about is how common it is to have enough of the page processed that the user could interact with it in principle, yet still have large parsing chunks remaining which would prevent that interaction from being smooth.
>>
>> If you're interested in reducing the complexity of the parser, I'd recommend removing the NEW_XML code. As previously discussed, that code creates significant complexity for zero benefit.
>
> Tu quoque fallacy. From your glib reply, I get the impression that you are not giving the complexity cost of multithreading due consideration. I hope that is not actually the case and I merely caught you at a bad moment or something.

I'm quite aware of the complexity of multithreaded code, having written a great deal of it for Chromium. One of the things I hope comes out of this project is a good example of how to do multithreaded processing in WebCore. Currently, every subsystem seems to roll its own threading abstractions, I think largely because there hasn't been a
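Since the thread keeps returning to a shared-nothing, message-passing style of threading, here is a toy sketch of the pattern in Python. This is illustrative only (WebCore's actual abstractions are C++ and far more involved): the tokenizer thread owns its own state and communicates with the "main thread" only by posting finished token batches to a queue, so no state is ever shared under a lock.

```python
# Toy illustration of shared-nothing message passing between a tokenizer
# thread and a main thread. Ownership of each token batch is transferred
# through the queue; neither side touches the other's state.
import queue
import threading

def tokenizer_thread(html_chunks, out_queue):
    for chunk in html_chunks:
        # Stand-in for real HTML tokenization: split into crude "tokens".
        tokens = tuple(chunk.replace("<", " <").replace(">", "> ").split())
        out_queue.put(tokens)          # hand the batch to the main thread
    out_queue.put(None)                # end-of-input marker

def main_thread(in_queue):
    dom = []
    while True:
        batch = in_queue.get()
        if batch is None:
            break
        dom.extend(batch)              # "tree building" stays on this thread
    return dom

q = queue.Queue()
t = threading.Thread(target=tokenizer_thread,
                     args=(["<p>hello", "world</p>"], q))
t.start()
dom = main_thread(q)
t.join()
print(dom)                             # -> ['<p>', 'hello', 'world', '</p>']
```

The appeal of this design, relative to fine-grained locking, is that the only synchronization point is the queue itself, which keeps the invariants easy to state and audit.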
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
Your comments here make me feel more positively towards this project. In particular, I'm happy that:

- There actually will be meaningful testing.
- You're prepared to abandon this approach if it doesn't meet its perf goals (presumably, at minimum, no regression to page load time or responsiveness while loading, and meaningful improvement to at least one of these).
- The shared-nothing message-passing approach to threading sounds likely to be a relatively less complex and fragile approach to threading than most others.

Thanks for following up. I have a comment on a tangential point that I'll split into another thread.

Cheers,
Maciej
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
Parsing, especially JS parsing, still takes a large amount of time during page loading. We tried to improve the preload scanner by moving it onto another thread, but there was no gain (except in some special cases). Synchronization between threads is surprisingly (ridiculously) costly; it is usually only worthwhile for tasks that need quite a few million instructions to execute (and tokenization takes far less in most cases). For smaller tasks, SIMD instruction sets can help, which is basically parallel execution on a single thread. Anyway, it is worth trying, but it is really challenging to make it work in practice. Good luck!

Regards,
Zoltan

> On Jan 9, 2013, at 10:04 PM, Ian Hickson i...@hixie.ch wrote:
>> On Wed, 9 Jan 2013, Eric Seidel wrote:
>>> The core goal is to reduce latency -- to free up the main thread for JavaScript and UI interaction -- which, as you correctly note, cannot be moved off of the main thread due to the single thread of execution model of the web.
>>
>> Parsing and (maybe to a lesser extent) compiling JS can be moved off the main thread, though, right? That's probably worth examining too, if it hasn't already been done.
>
> 100% agree. However, the same problem I brought up about tokenization applies here: a lot of JS functions are super cheap to parse and compile already, and the latency of doing so on the main thread is likely to be lower than the latency of chatting with another core. I suspect this could be alleviated by (1) aggressively pipelining the work, so that during page load or heavy JS use the compilation thread always has a non-empty queue of work to do, which means the latency of communication is paid only when the first compilation occurs; and (2) allowing the main thread to steal work from the compilation queue. I'm not sure how to make (2) work well.
>
> For parsing it's actually harder, since we rely heavily on the lazy parsing optimization: code is only parsed once we need it *right now* to run a function. For compilation, it's somewhat easier: the most expensive compilation step is the third-tier optimizing JIT, and we can delay this as long as we want, though the longer we delay it, the longer we spend running slower code.
>
> Hence, to make parsing concurrent, the main problem is figuring out how to do predictive parsing: have a concurrent thread start parsing something just before we need it. Without predictive parsing, making it concurrent would be a guaranteed loss, since the main thread would just be stuck waiting for the other thread to finish. To make optimized compiles concurrent without a regression, the main problem is ensuring that, in those cases where we believe the time taken to compile the function will be smaller than the time taken to wake the concurrent thread, we instead just compile it on the main thread right away. Though, if we could predict that a function was going to get hot in the future, we could speculatively tell a concurrent thread to compile it, knowing that it won't wake up and do so until exactly when we would have otherwise invoked the compiler on the main thread (that is, it'll wake up and start compiling once the main thread has executed the function enough times to get good profiling data).
>
> Anyway, you're absolutely right that this is an area that should be explored.
>
> -F
>
>> --
>> Ian Hickson
>> http://ln.hixie.ch/
>> Things that are impossible just take longer.

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
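Filip's point (2), letting the main thread steal a job it needs right now rather than wait for the compilation thread, can be sketched as a toy model. This is illustrative Python with made-up names, not JavaScriptCore code:

```python
# Toy model of a compilation queue with main-thread work stealing: a
# background thread drains queued jobs, but the main thread may remove
# and compile a job itself when it needs the result immediately.
import threading
from collections import deque

class CompileQueue:
    def __init__(self):
        self.jobs = deque()
        self.lock = threading.Lock()
        self.compiled = {}

    def enqueue(self, name, source):
        with self.lock:
            self.jobs.append((name, source))

    def _compile(self, name, source):
        self.compiled[name] = f"code({source})"   # stand-in for real JIT work

    def drain_on_background_thread(self):
        while True:
            with self.lock:
                if not self.jobs:
                    return
                name, source = self.jobs.popleft()
            self._compile(name, source)           # compile outside the lock

    def steal(self, wanted):
        # Main thread: compile `wanted` immediately if it is still queued.
        with self.lock:
            job = next((j for j in self.jobs if j[0] == wanted), None)
            if job is None:
                return self.compiled.get(wanted)
            self.jobs.remove(job)
        self._compile(*job)
        return self.compiled[wanted]

q = CompileQueue()
q.enqueue("f", "function f(){}")
q.enqueue("g", "function g(){}")
code_f = q.steal("f")                  # main thread needs f right now
worker = threading.Thread(target=q.drain_on_background_thread)
worker.start()                         # background thread compiles the rest
worker.join()
print(code_f, sorted(q.compiled))
```

The hard part Filip identifies is not the mechanism above but the policy: deciding when stealing (or compiling inline from the start) is cheaper than waking the worker at all.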
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
https://bugs.webkit.org/show_bug.cgi?id=63531

The work was done by Zoltan Horvath and Balazs Kelemen.

Regards,
Zoltan

Hi Zoltan,

I would be curious how you did the synchronization. I've had some luck reducing synchronization costs before. Was the patch ever uploaded anywhere?

-F
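Zoltan's observation that per-task synchronization dwarfs tiny tasks is easy to reproduce with a toy benchmark. Python timings say little about WebCore's absolute costs, but the shape of the comparison, inline execution versus a synchronous round trip through a worker thread per tiny task, is typically dramatic on any runtime:

```python
# Toy comparison: run a tiny task inline vs. shipping each task through
# queues to a worker thread and waiting for the result. The per-task
# queue/wakeup overhead vastly exceeds the work itself.
import queue
import threading
import time

TASKS = 2_000

def tiny_task(x):
    return x + 1                       # a few instructions of "work"

# 1) Inline on the "main thread".
start = time.perf_counter()
inline_result = sum(tiny_task(i) for i in range(TASKS))
inline_s = time.perf_counter() - start

# 2) One synchronous round trip per task through a worker thread.
in_q, out_q = queue.Queue(), queue.Queue()

def worker():
    while True:
        item = in_q.get()
        if item is None:
            return
        out_q.put(tiny_task(item))

t = threading.Thread(target=worker)
t.start()
start = time.perf_counter()
threaded_result = 0
for i in range(TASKS):
    in_q.put(i)
    threaded_result += out_q.get()     # block until the worker answers
threaded_s = time.perf_counter() - start
in_q.put(None)
t.join()

print(f"inline:   {inline_s:.4f}s")
print(f"threaded: {threaded_s:.4f}s")
```

Batching many tasks per message, as the tokenizer plan proposes, amortizes exactly this per-message overhead.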
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
I presume from your other comments that the goal of this work is responsiveness, rather than page load speed as such. I'm excited about the potential to improve responsiveness during page loading.

One question: what tests are you planning to use to validate whether this approach achieves its goals of better responsiveness? The reason I ask is that this sounds like a significant increase in complexity, so we should be very confident that there is a real and major benefit.

One thing I wonder about is how common it is to have enough of the page processed that the user could interact with it in principle, yet still have large parsing chunks remaining which would prevent that interaction from being smooth. Another thing I wonder about is whether yielding to the event loop more aggressively could achieve a similar benefit at a much lower complexity cost. Having a test to drive the work would allow us to answer these types of questions. (It may also be that the test data you cited would already answer these questions but I didn't sufficiently understand it; if so, further explanation would be appreciated.)

Regards,
Maciej

On Jan 9, 2013, at 6:00 PM, Eric Seidel e...@webkit.org wrote:

> We're planning to move parts of the HTML Parser off of the main thread: https://bugs.webkit.org/show_bug.cgi?id=106127
>
> This is driven by our testing showing that HTML parsing on mobile is slow and long (causing user-visible delays averaging 10 frames / 150 ms): https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002 Complete data can be found at [1].
>
> Mozilla moved their parser onto a separate thread during their HTML5 parser re-write: https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
>
> We plan to take a slightly simpler approach, moving only Tokenizing off of the main thread: https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit The left is our current design, the middle is a tokenizer-only design, and the right is more like Mozilla's threaded-parser design.
>
> Profiling shows Tokenizing accounts for about 10x the number of samples as TreeBuilding, including Antti's recent testing (0.5% vs. 3%): https://bugs.webkit.org/show_bug.cgi?id=106127#c10 If, after we do this, we measure and find ourselves still spending a lot of main-thread time parsing, we'll move the TreeBuilder too. :) (This work is a nicely separable subset of the larger work needed to move the TreeBuilder.)
>
> We welcome your thoughts and comments.
>
> 1. https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0 (Epic thanks to Nat Duca for helping us collect that data.)
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
When loading web pages we are very frequently in a situation where we already have the source data (HTML text here, but the same applies to preloaded JavaScript, CSS, images, ...) and know we are likely to need it soon, but can't actually utilize it for an indeterminate time. This happens because pending external JS resources block the main parser (and pending CSS resources block JS execution) for web compatibility reasons. In this situation it makes sense to start processing the resources we have into forms that are faster to use when they are eventually needed (like a token stream here).

One thing we already do when the main parser gets blocked is preload scanning. We look through the unparsed HTML source we have and trigger loads for any resources found. It would be beneficial if this happened off the main thread. We could do it when new data arrives, in parallel with JS execution and other time-consuming engine work, potentially triggering resource loads earlier.

I think a good first step here would be to share the tokens between the preload scanner and the main parser and worry about the threading part afterwards. We often parse the HTML source more or less twice, so this is an unquestionable win.

antti

On Thu, Jan 10, 2013 at 7:41 AM, Filip Pizlo fpi...@apple.com wrote:

> I think your biggest challenge will be ensuring that the latency of shoving things to another core and then shoving them back will be smaller than the latency of processing those same things on the main thread. For small documents, I expect concurrent tokenization to be a pure regression, because the latency of waking up another thread to do just a small bit of work, plus the added cost of whatever synchronization operations will be needed to ensure safety, will involve more total work than just tokenizing locally.
>
> We certainly see this in the JSC parallel GC, and in line with traditional parallel GC design, we ensure that parallel threads only kick in when the main thread is unable to keep up with the work that it has created for itself. Do you have a vision for how to implement a similar self-throttling, where tokenizing continues on the main thread so long as it is cheap to do so?
>
> -Filip
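One shape an answer to Filip's self-throttling question could take is a threshold policy: tokenize inline while the pending input is small (so tiny documents never pay any thread latency), and offload only once the parser is clearly backed up. Below is a toy Python sketch; the threshold, the class, and the thread-per-chunk handoff are purely illustrative (a real design would keep a persistent worker and would not block waiting for it):

```python
# Toy threshold policy: small pending input is tokenized inline with no
# synchronization at all; large backlogs are handed to a worker thread.
import queue
import threading

OFFLOAD_THRESHOLD = 64 * 1024          # bytes of pending input; made up

class AdaptiveTokenizer:
    def __init__(self):
        self.tokens = []
        self.inline_bytes = 0
        self.offloaded_bytes = 0

    def _tokenize(self, chunk):
        return chunk.split()           # stand-in for HTML tokenization

    def feed(self, chunk, pending_bytes):
        if pending_bytes < OFFLOAD_THRESHOLD:
            # Cheap case: do it right here, no thread wakeup.
            self.inline_bytes += len(chunk)
            self.tokens.extend(self._tokenize(chunk))
        else:
            # Backed up: ship the chunk to a worker thread. (A real
            # implementation would pipeline rather than join per chunk.)
            self.offloaded_bytes += len(chunk)
            result_q = queue.Queue()
            t = threading.Thread(
                target=lambda: result_q.put(self._tokenize(chunk)))
            t.start()
            t.join()
            self.tokens.extend(result_q.get())

tok = AdaptiveTokenizer()
tok.feed("<p>small doc</p>", pending_bytes=1_000)          # stays inline
tok.feed("<div>big backlog</div>", pending_bytes=500_000)  # offloaded
print(tok.inline_bytes, tok.offloaded_bytes)
```

This mirrors the parallel-GC design Filip describes: helper threads only engage when the main thread has demonstrably fallen behind its own workload.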
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
The data Eric and Adam were using comes from a python library a few of us have been developing called telemetry. Its basically a bunch of python that lets us write performance tests against any browser that speaks the inspector websocket protocol. We're using it a lot of should we parallelize X questions, as well as regression-style have our changes to X stayed a win over time? They might have other ways in mind to obtain this data that is more webkit-y, but I figure a bit on how we got this far might be useful for this mailing list. Roughly, telemetry scripts connect up to a host and port where you've arranged to have an inspector websocket listening, e.g. $MY_PHONE_IP:9222, or google-chrome --remote-debugging-port=9222 telemetry --browser=$LOCALHOST:9222. Once that's established, we have communication with WebCore's InspectorAgent, and assuming we trust the agent, can do some pretty powerful stuff from there. The benchmark being discussed here [webkit_benchmark] navigates the browser from page to page, enabling inspector's TimelineAgent as it does in order to get performance data about the page load. We then postprocess that data stream into a human consumable csv and there is [some amount] of rejoicing. Assuming we trust inspector timeline [Pavel's done a number of fixes to help us trust it more!] this gets pretty clean results, pretty easily. A key challenge with telemetry has been getting stable runs on real world sites. The archive.org technqiues are cool, but they dont capture some of the big ones, like a logged-in gmail account. We've addressed this using tonyg and simonjam's http://code.google.com/p/web-page-replay/. If the browser under test supports web page replay [~= redirecting dns requests to the replay server instead of the real site], then you can get stable, repeatable runs against super complex real world sites --- its worked on every site we've tried so far. 
The core telemetry framework is here: http://src.chromium.org/chrome/trunk/src/tools/telemetry/ It's in the Chromium repo, but please don't hold that against it --- it's movable, given interest. The actual webkit benchmark is pretty simple, because most of the functionality comes from telemetry: https://codereview.chromium.org/11791043/ With the patch above landed, obtaining the benchmarking results that Eric got against chrome should be ~= getting a telemetry checkout and doing: ./run_multipage_benchmarks --browser=canary webkit_benchmark page_sets/top_25.json Or if you had an android with chrome on it: ./run_multipage_benchmarks --browser=android-chrome webkit_benchmark page_sets/top_25.json Anyway, I'll leave it to Eric/Adam to speak to how this maps back into the WebKit ecosystem. The use of the inspector protocol makes it a theoretical possibility on other ports, but I know some people get nervous (or run away angrily!) when they hear that we're using Inspector as a perf data source. :) - Nat On Thu, Jan 10, 2013 at 1:44 AM, Antti Koivisto koivi...@iki.fi wrote: When loading web pages we are very frequently in a situation where we already have the source data (HTML text here, but the same applies to preloaded JavaScript, CSS, images, ...) and know we are likely to need it soon, but can't actually utilize it for an indeterminate time. This happens because pending external JS resources block the main parser (and pending CSS resources block JS execution) for web compatibility reasons. In this situation it makes sense to start processing the resources we have into forms that are faster to use when they are eventually actually needed (like the token stream here). One thing we already do when the main parser gets blocked is preload scanning. We look through the unparsed HTML source we have and trigger loads for any resources found. It would be beneficial if this happened off the main thread. 
We could do it when new data arrives in parallel with JS execution and other time consuming engine work, potentially triggering resource loads earlier. I think a good first step here would be to share the tokens between the preload scanner and the main parser and worry about the threading part afterwards. We often parse the HTML source more or less twice so this is an unquestionable win. antti On Thu, Jan 10, 2013 at 7:41 AM, Filip Pizlo fpi...@apple.com wrote: I think your biggest challenge will be ensuring that the latency of shoving things to another core and then shoving them back will be smaller than the latency of processing those same things on the main thread. For small documents, I expect concurrent tokenization to be a pure regression because the latency of waking up another thread to do just a small bit of work, plus the added cost of whatever synchronization operations will be needed to ensure safety, will involve more total work than just tokenizing locally. We certainly see this in the JSC parallel GC, and in line with traditional parallel GC design, we ensure that parallel threads only
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Thu, Jan 10, 2013 at 8:37 AM, Maciej Stachowiak m...@apple.com wrote: The reason I ask is that this sounds like a significant increase in complexity, so we should be very confident that there is a real and major benefit. One thing I wonder about is how common it is to have enough of the page processed that the user could interact with it in principle, yet still have large parsing chunks remaining which would prevent that interaction from being smooth. Another thing I wonder about is whether yielding to the event loop more aggressively could achieve a similar benefit at a much lower complexity cost. I don't want to let this point of Maciej's slip away: on mobile we may have fewer cores than desktop, and we're paying a pretty high complexity burden for multiple threads already; some of Nat's awesome recent work in Chromium is too multithreaded for my comfort. I'd back-of-enveloped yielding during page layout and guessed it wasn't worthwhile, but do we know that yielding during parsing isn't? Tom
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
Thanks everyone for your feedback. Detailed responses inline. On Wed, Jan 9, 2013 at 9:41 PM, Filip Pizlo fpi...@apple.com wrote: I think your biggest challenge will be ensuring that the latency of shoving things to another core and then shoving them back will be smaller than the latency of processing those same things on the main thread. Yes. That's something we know we have to worry about. Given that we need to retain the ability to parse HTML on the main thread to handle document.write and innerHTML, we should be able to easily do A/B comparisons to make sure we understand any performance trade-offs that might arise. For small documents, I expect concurrent tokenization to be a pure regression because the latency of waking up another thread to do just a small bit of work, plus the added cost of whatever synchronization operations will be needed to ensure safety, will involve more total work than just tokenizing locally. Once we have the ability to tokenize on a background thread, we can examine cases like these and heuristically decide whether to use the background thread or not at runtime. As I wrote above, we'll need this ability anyway, so keeping the ability to optimize these cases shouldn't add any new constraints to the design. We certainly see this in the JSC parallel GC, and in line with traditional parallel GC design, we ensure that parallel threads only kick in when the main thread is unable to keep up with the work that it has created for itself. Do you have a vision for how to implement a similar self-throttling, where tokenizing continues on the main thread so long as it is cheap to do so? It's certainly something we can tune in the optimization phase. I don't think we need a particular vision to be able to do it. Given that we want to implement speculative parsing (to replace preload scanning---more on this below), we'll already have the ability to checkpoint and restore the tokenizer state across threads. 
Once you have that primitive, it's easy to decide whether to continue tokenization on the main thread or on a background thread. On Wed, Jan 9, 2013 at 10:04 PM, Ian Hickson i...@hixie.ch wrote: Parsing and (maybe to a lesser extent) compiling JS can be moved off the main thread, though, right? That's probably worth examining too, if it hasn't already been done. Yes, once we have the tokenizer running on a background thread, that opens up the possibility of parsing other sorts of data on the background thread as well. For example, when the tokenizer encounters an inline script block, you could imagine parsing the script on the background thread as well so that the main thread has less work to do. (You could also imagine making these optimizations without a background tokenizer, but the design constraints would be a bit different.) On Thu, Jan 10, 2013 at 12:11 AM, Zoltan Herczeg zherc...@webkit.org wrote: Parsing, especially JS parsing, still takes a large amount of time on page loading. We tried to improve the preload scanner by moving it into another thread, but there was no gain (except in some special cases). Synchronization between threads is surprisingly (ridiculously) costly, usually only worthwhile for those tasks which need quite a few million instructions to be executed (and tokenization takes far less in most cases). For smaller tasks, SIMD instruction sets can help, which is basically parallel execution on a single thread. Anyway it is worth trying, but it is really challenging to make it work in practice. Good luck! This is something we're worried about and will need to be careful about. In the design we're proposing, preload scanning is replaced by speculative parsing, so the overhead of the preload scanner is removed entirely. The way this works is as follows: When running on the background thread, the tokenizer produces a queue of PickledTokens. As these tokens are queued, we can scan them to kick off any preloads that we find. 
Whenever the tokenizer queues a token that creates a new insertion point (in the terminology of the HTML specification), the tokenizer checkpoints itself but continues tokenizing speculatively. (Notice that tokens produced in this situation are still scanned for preloads but might not ever actually result in DOM being constructed.) After the main thread has processed the token that created the insertion point, if no characters were inserted, the main thread continues processing PickledTokens that were created speculatively. If some characters were inserted, the main thread instead instructs the tokenizer to roll back to that checkpoint and continue tokenizing in a new state. In this case, the queue of speculative tokens is discarded. Notice that in the common case, we execute JavaScript and tokenize in parallel, something that's not possible with a main-thread tokenizer. Once the script is done executing, we expect it to be common to be able to resume tree building immediately as the
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Jan 10, 2013, at 12:07 PM, Adam Barth aba...@webkit.org wrote: On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak m...@apple.com wrote: I presume from your other comments that the goal of this work is responsiveness, rather than page load speed as such. I'm excited about the potential to improve responsiveness during page loading. The goals are described in the first link Eric gave in his email: https://bugs.webkit.org/show_bug.cgi?id=106127#c0. Specifically: ---8--- 1) Moving parsing off the main thread could make web pages more responsive because the main thread is available for handling input events and executing JavaScript. 2) Moving parsing off the main thread could make web pages load more quickly because WebCore can do other work in parallel with parsing HTML (such as parsing CSS or attaching elements to the render tree). OK - what test (if any) will be used to test whether the page load speed goal is achieved? ---8--- One question: what tests are you planning to use to validate whether this approach achieves its goals of better responsiveness? The tests we've run so far are also described in the first link Eric gave in his email: https://bugs.webkit.org/show_bug.cgi?id=106127. They suggest that there's a good deal of room for improvement in this area. After we have a working implementation, we'll likely re-run those experiments and run other experiments to do an A/B comparison of the two approaches. As Filip points out, we'll likely end up with a hybrid of the two designs that's optimized for handling various work loads. I agree the test suggests there is room for improvement. From the description of how the test is run, I can think of two potential ways to improve how well it correlates with actual user-perceived responsiveness: (1) It seems to look at the max parsing pause time without considering whether there's any content being shown that it's possible to interact with. 
If the longest pauses happen before meaningful content is visible, then reducing those pauses is unlikely to actually materially improve responsiveness, at least in models where web content processing happens in a separate process or thread from the UI. One possibility is to track the max parsing pause time starting from the first visually non-empty layout. That would better approximate how much actual user interaction is blocked. (2) It might be helpful to track max and average pause time from non-parsing sources, for the sake of comparison. These might result in a more accurate assessment of the benefits. The reason I ask is that this sounds like a significant increase in complexity, so we should be very confident that there is a real and major benefit. One thing I wonder about is how common it is to have enough of the page processed that the user could interact with it in principle, yet still have large parsing chunks remaining which would prevent that interaction from being smooth. If you're interested in reducing the complexity of the parser, I'd recommend removing the NEW_XML code. As previously discussed, that code creates significant complexity for zero benefit. Tu quoque fallacy. From your glib reply, I get the impression that you are not giving the complexity cost of multithreading due consideration. I hope that is not actually the case and I merely caught you at a bad moment or something. (And also we agreed to a drop-dead date to remove the code, which has either passed or is very close.) Another thing I wonder about is whether yielding to the event loop more aggressively could achieve a similar benefit at a much lower complexity cost. Yielding to the event loop more could reduce the ParseHTML_max time, but it cannot reduce the ParseHTML time. Generally speaking, yielding to the event loop is a trade-off between throughput (i.e., page load time) and responsiveness. 
Moving work to a background thread should let us achieve a better trade-off between these quantities than we're likely to be able to achieve by tuning the yield parameter alone. I agree that is possible. But it also seems like the improvements that don't impose the complexity and hazards of multithreading in this area are worth trying first: things such as retuning yielding and replacing the preload scanner with (non-threaded) speculative pre-tokenizing, as suggested by Antti. That would let us better assess the benefits of the threading itself. Having a test to drive the work would allow us to answer these types of questions. (It may also be that the test data you cited would already answer these questions but I didn't sufficiently understand it; if so, further explanation would be appreciated.) If you're interested in building such a test, I would be interested in hearing the results. We don't plan to build such a test at this time. If you're actually planning to make a significant complexity-imposing architectural change
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
Adam, Thanks for your detailed reply. Seems like you guys have a pretty good plan in place. I hope this works and produces a performance improvement. That being said, this does look like a sufficiently complex work item that success is far from guaranteed. So to play devil's advocate, what is your plan for if this doesn't work out? I.e. are we talking about adding a bunch of threading support code in the optimistic hope that it makes things run fast, and then forgetting about it if it doesn't? Or are you prepared to rip out any complexity that got landed if this does not ultimately live up to its promise? Or is this going to be one giant patch that only lands if it works? I'm also trying to understand what would happen during the interim when this work is incomplete, we have thread-related goop in some critical paths, and we don't yet know if the WIP code is ever going to result in a speedup. And also, what will happen some time from now if that code is never successfully optimized to the point where it is worth enabling. I appreciate that this sort of question can be asked of any performance work, but in this particular case my gut tells me that this is going to result in significantly more complexity than the usual incremental performance work. So it's good to understand what plan B is. Probably a good answer to this sort of question would address some fears that people may have. If this work does lead to a performance win then probably everyone will be happy. But if it doesn't, then it would be great to have a plan of retreat. -Filip On Jan 10, 2013, at 12:07 PM, Adam Barth aba...@webkit.org wrote: Thanks everyone for your feedback. Detailed responses inline. On Wed, Jan 9, 2013 at 9:41 PM, Filip Pizlo fpi...@apple.com wrote: I think your biggest challenge will be ensuring that the latency of shoving things to another core and then shoving them back will be smaller than the latency of processing those same things on the main thread. Yes. 
[webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
We're planning to move parts of the HTML Parser off of the main thread: https://bugs.webkit.org/show_bug.cgi?id=106127 This is driven by our testing showing that HTML parsing on mobile is slow and happens in long chunks (causing user-visible delays averaging 10 frames / 150ms). https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002 Complete data can be found at [1]. Mozilla moved their parser onto a separate thread during their HTML5 parser re-write: https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading We plan to take a slightly simpler approach, moving only Tokenizing off of the main thread: https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit The left is our current design, the middle is a tokenizer-only design, and the right is more like Mozilla's threaded-parser design. Profiling shows Tokenizing accounts for about 10x the number of samples as TreeBuilding, including Antti's recent testing (.5% vs. 3%): https://bugs.webkit.org/show_bug.cgi?id=106127#c10 If after we do this we measure and find ourselves still spending a lot of main-thread time parsing, we'll move the TreeBuilder too. :) (This work is a nicely separable sub-set of the larger work needed to move the TreeBuilder.) We welcome your thoughts and comments. 1. https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0 (Epic thanks to Nat Duca for helping us collect that data.)
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Wed, Jan 9, 2013 at 6:00 PM, Eric Seidel e...@webkit.org wrote: We're planning to move parts of the HTML Parser off of the main thread: https://bugs.webkit.org/show_bug.cgi?id=106127 This is driven by our testing showing that HTML parsing on mobile is slow and happens in long chunks (causing user-visible delays averaging 10 frames / 150ms). https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002 Complete data can be found at [1]. In case it's not clear from that link, the ParseHTML column is the total amount of time the web inspector attributes to HTML parsing when loading those URLs on a Nexus 7 using a top-of-tree build of Chromium's content_shell (similar to WebKitTestRunner). The HTML parser parses data a chunk at a time, which means the total time doesn't tell the whole story. The ParseHTML_max column shows the largest single block of time spent in the HTML parser, which is more of a measure of the main-thread jank caused by the parser. Antti has pointed out that the inspector isn't the best source of data. He measured total time using Instruments, and got numbers that are consistent (within a factor of 2) with the inspector measurements. (We were using different data sets, so we wouldn't expect perfect agreement even if we were measuring precisely the same thing.) Adam
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
How will we ensure thread safety? Even at just the tokenizing level, don't we use AtomicString? AtomicString isn't threadsafe wrt StringImpl IIRC, so this seems like it would add a world of hurt. I realise it's been a long time since I've worked on this, so it's completely possible that I'm not aware of the current behaviour. That aside, I question what the benefit of this will be. All those cases where we've started parsing html are intrinsically tied to the web's general single-thread-of-execution model, which implies that even if we do push parsing into a separate thread, we'll just end up with the ui thread blocked on the parsing thread, which doesn't seem hugely superior. What is the objective here? To improve performance, add parallelism, or reduce latency? --Oliver
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Wed, Jan 9, 2013 at 6:38 PM, Oliver Hunt oli...@apple.com wrote: How will we ensure thread safety? Even at just the tokenizing level don't we use AtomicString? AtomicString isn't threadsafe wrt StringImpl IIRC so this seems like it would add a world of hurt. AtomicString is already usable from other threads (http://trac.webkit.org/changeset/38094), but you are correct that this is the core concern! PickledToken (or whatever it's called) will have to be written very carefully in order to minimize/eliminate copies while still guaranteeing thread safety. The correct design and handling of PickledToken is the entire question of this whole endeavor. I realise it's been a long time since I've worked on this so it's completely possible that I'm not aware of the current behavior. That aside I question what the benefit of this will be. All those cases where we've started parsing html are intrinsically tied to the web's general single thread of execution model, which implies that even if we do push parsing into a separate thread we'll just end up with the ui thread blocked on the parsing thread which doesn't seem hugely superior. What is the objective here? To improve performance, add parallelism, or reduce latency? The core goal is to reduce latency -- to free up the main thread for JavaScript and UI interaction -- which, as you correctly note, cannot be moved off of the main thread due to the single-thread-of-execution model of the web. One could view the pre-load scanner as a layman's attempt at this type of tokenize-asynchronously approach. This model gets preload scanning for free, and can easily answer wkb.ug/90751's request for speculative tokenizing of the entire document. (We just have to save markers before every script token, since if the script uses document.write, any tokens after </script> become invalid.) I should also note that not all HTML parsing can be moved off of the main thread. innerHTML, for example, would still be done entirely on the main thread. 
I would imagine that when we land this on trunk, it would be behind a feature flag, and ports could opt in to the threaded-parsing path, as we must maintain the main-thread parsing ability for innerHTML anyway. 
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Wed, Jan 9, 2013 at 7:07 PM, Eric Seidel e...@webkit.org wrote: On Wed, Jan 9, 2013 at 6:38 PM, Oliver Hunt oli...@apple.com wrote: How will we ensure thread safety? Even at just the tokenizing level don't we use AtomicString? AtomicString isn't threadsafe wrt StringImpl IIRC so this seems like it would add a world of hurt. AtomicString is already usable from other threads (http://trac.webkit.org/changeset/38094), but you are correct that this is the core concern! PickledToken (or whatever it's called) will have to be written very carefully in order to minimize/eliminate copies, while still guaranteeing thread safety. The correct design and handling of PickledToken is the entire question of this whole endeavor. That is probably what you meant, but just in case... AtomicString can be used from different threads, but is not thread safe. You must make an isolatedCopy() for message passing if you keep a reference to the String in your thread. Not the end of the world, but something to be aware of :) Cheers, Benjamin
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Wed, Jan 9, 2013 at 7:35 PM, Benjamin Poulain benja...@webkit.org wrote: On Wed, Jan 9, 2013 at 7:07 PM, Eric Seidel e...@webkit.org wrote: On Wed, Jan 9, 2013 at 6:38 PM, Oliver Hunt oli...@apple.com wrote: How will we ensure thread safety? Even at just the tokenizing level don't we use AtomicString? AtomicString isn't threadsafe wrt StringImpl IIRC, so this seems like it should add a world of hurt. AtomicString is already usable from other threads (http://trac.webkit.org/changeset/38094), but you are correct this is the core concern! PickledToken (or whatever it's called) will have to be written very carefully in order to minimize/eliminate copies, while still guaranteeing thread safety. The correct design and handling of PickledToken is the entire question of this whole endeavor. That is probably what you meant, but just in case... AtomicString can be used from different threads, but is not thread safe. You must make an isolatedCopy() for message passing if you keep a reference to the String in your thread. Not the end of the world, but something to be aware of :)

Yeah, we're aware of this issue. We'll probably end up doing something slightly customized for this use case. For example, many of the AtomicStrings used in parsing are tag and attribute names that are known at compile time (e.g., div, href). When moving these strings back to the main thread, we need only the hash of the string and not the underlying characters in the string (because we know statically that the hash will exist in the main thread's atomic string table). It's tempting to optimize these things prematurely. We'll likely start with a simple approach that makes copies and then optimize away the copies over the development of the feature as indicated by profiles.

Adam

___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo/webkit-dev
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
I think your biggest challenge will be ensuring that the latency of shoving things to another core and then shoving them back will be smaller than the latency of processing those same things on the main thread. For small documents, I expect concurrent tokenization to be a pure regression because the latency of waking up another thread to do just a small bit of work, plus the added cost of whatever synchronization operations will be needed to ensure safety, will involve more total work than just tokenizing locally. We certainly see this in the JSC parallel GC, and in line with traditional parallel GC design, we ensure that parallel threads only kick in when the main thread is unable to keep up with the work that it has created for itself. Do you have a vision for how to implement a similar self-throttling, where tokenizing continues on the main thread so long as it is cheap to do so?

-Filip

On Jan 9, 2013, at 6:00 PM, Eric Seidel e...@webkit.org wrote: We're planning to move parts of the HTML Parser off of the main thread: https://bugs.webkit.org/show_bug.cgi?id=106127 This is driven by our testing showing that HTML parsing on mobile is slow and long (causing user-visible delays averaging 10 frames / 150ms): https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002 Complete data can be found at [1]. Mozilla moved their parser onto a separate thread during their HTML5 parser re-write: https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading We plan to take a slightly simpler approach, moving only Tokenizing off of the main thread: https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit The left is our current design, the middle is a tokenizer-only design, and the right is more like Mozilla's threaded-parser design. Profiling shows Tokenizing accounts for about 10x the number of samples as TreeBuilding, including Antti's recent testing (0.5% vs. 3%): https://bugs.webkit.org/show_bug.cgi?id=106127#c10 If after we do this we measure and find ourselves still spending a lot of main-thread time parsing, we'll move the TreeBuilder too. :) (This work is a nicely separable subset of the larger work needed to move the TreeBuilder.) We welcome your thoughts and comments.

1. https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0 (Epic thanks to Nat Duca for helping us collect that data.)

___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo/webkit-dev
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Wed, 9 Jan 2013, Eric Seidel wrote: The core goal is to reduce latency -- to free up the main thread for JavaScript and UI interaction -- which as you correctly note, cannot be moved off of the main thread due to the single thread of execution model of the web. Parsing and (maybe to a lesser extent) compiling JS can be moved off the main thread, though, right? That's probably worth examining too, if it hasn't already been done. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.' ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo/webkit-dev
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Jan 9, 2013, at 10:04 PM, Ian Hickson i...@hixie.ch wrote: On Wed, 9 Jan 2013, Eric Seidel wrote: The core goal is to reduce latency -- to free up the main thread for JavaScript and UI interaction -- which as you correctly note, cannot be moved off of the main thread due to the single thread of execution model of the web. Parsing and (maybe to a lesser extent) compiling JS can be moved off the main thread, though, right? That's probably worth examining too, if it hasn't already been done.

100% agree. However, the same problem I brought up about tokenization applies here: a lot of JS functions are super cheap to parse and compile already, and the latency of doing so on the main thread is likely to be lower than the latency of chatting with another core. I suspect this could be alleviated by (1) aggressively pipelining the work, where during page load or during heavy JS use the compilation thread always has a non-empty queue of work to do; this will mean that the latency of communication is paid only when the first compilation occurs, and (2) allowing the main thread to steal work from the compilation queue. I'm not sure how to make (2) work well.

For parsing it's actually harder since we rely heavily on the lazy parsing optimization: code is only parsed once we need it *right now* to run a function. For compilation, it's somewhat easier: the most expensive compilation step is the third-tier optimizing JIT; we can delay this as long as we want, though the longer we delay it, the longer we spend running slower code. Hence, to make parsing concurrent, the main problem is figuring out how to do predictive parsing: have a concurrent thread start parsing something just before we need it. Without predictive parsing, making it concurrent would be a guaranteed loss since the main thread would just be stuck waiting for the thread to finish.
To make optimized compiles concurrent without a regression, the main problem is ensuring that in those cases where we believe that the time taken to compile the function will be smaller than the time taken to wake the concurrent thread, we will instead just compile it on the main thread right away. Though, if we could predict that a function was going to get hot in the future, we could speculatively tell a concurrent thread to compile it, fully knowing that it won't wake up and do so until exactly when we would have otherwise invoked the compiler on the main thread (that is, it'll wake up and start compiling it once the main thread has executed the function enough times to get good profiling data). Anyway, you're absolutely right that this is an area that should be explored.

-F

___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo/webkit-dev