On Jan 11, 2013, at 12:21 AM, Adam Barth <aba...@webkit.org> wrote:

> On Thu, Jan 10, 2013 at 9:19 PM, Maciej Stachowiak <m...@apple.com> wrote:
>> On Jan 10, 2013, at 12:07 PM, Adam Barth <aba...@webkit.org> wrote:
>>> On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak <m...@apple.com> wrote:
>>>> I presume from your other comments that the goal of this work is 
>>>> responsiveness, rather than page load speed as such. I'm excited about the 
>>>> potential to improve responsiveness during page loading.
>>> 
>>> The goals are described in the first link Eric gave in his email:
>>> <https://bugs.webkit.org/show_bug.cgi?id=106127#c0>.  Specifically:
>>> 
>>> ---8<---
>>> 1) Moving parsing off the main thread could make web pages more
>>> responsive because the main thread is available for handling input
>>> events and executing JavaScript.
>>> 2) Moving parsing off the main thread could make web pages load more
>>> quickly because WebCore can do other work in parallel with parsing
>>> HTML (such as parsing CSS or attaching elements to the render tree).
>>> --->8---
>> 
>> OK - what test (if any) will be used to test whether the page load speed 
>> goal is achieved?
> 
> All of them.  :)
> 
> More seriously, Chromium runs a very large battery of performance
> tests continuously on a matrix of different platforms, including
> desktop and mobile.  You can see one of the overview dashboards here:
> 
> http://build.chromium.org/f/chromium/perf/dashboard/overview.html
> 
> The ones that are particularly relevant to this work are the various
> page load tests, both with and without simulated network delays.  For
> iterative benchmarking, we plan to use Chromium's
> Telemetry framework <http://www.chromium.org/developers/telemetry>.
> Specifically, I expect we'll work with the top_25 dataset
> <http://src.chromium.org/viewvc/chrome/trunk/src/tools/perf/page_sets/top_25.json?view=markup>,
> but we might use some other data sets if there are particular areas we
> want to measure more carefully.
> 
>>>> One question: what tests are you planning to use to validate whether this 
>>>> approach achieves its goals of better responsiveness?
>>> 
>>> The tests we've run so far are also described in the first link Eric
>>> gave in his email: <https://bugs.webkit.org/show_bug.cgi?id=106127>.
>>> They suggest that there's a good deal of room for improvement in this
>>> area.  After we have a working implementation, we'll likely re-run
>>> those experiments and run other experiments to do an A/B comparison of
>>> the two approaches.  As Filip points out, we'll likely end up with a
>>> hybrid of the two designs that's optimized for handling various work
>>> loads.
>> 
>> I agree the test suggests there is room for improvement. From the 
>> description of how the test is run, I can think of two potential ways to 
>> improve how well it correlates with actual user-perceived responsiveness:
>> 
>> (1) It seems to look at the max parsing pause time without considering 
>> whether there's any content being shown that it's possible to interact with. 
>> If the longest pauses happen before meaningful content is visible, then 
>> reducing those pauses is unlikely to actually materially improve 
>> responsiveness, at least in models where web content processing happens in a 
>> separate process or thread from the UI. One possibility is to track the max 
>> parsing pause time starting from the first visually non-empty layout. That 
>> would better approximate how much actual user interaction is blocked.
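[For concreteness, the measurement described in (1) could be sketched like this. This is an illustrative stand-alone class, not WebKit code; all names here are hypothetical, and it also tracks the average pause mentioned in point (2) below.]

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical tracker: parse pauses only count toward the maximum once the
// first visually non-empty layout has happened, approximating how much
// *interactive* time parsing actually blocks.
class ParsePauseTracker {
public:
    void didFirstVisuallyNonEmptyLayout() { m_contentVisible = true; }

    void recordParseChunk(std::chrono::milliseconds duration)
    {
        if (!m_contentVisible)
            return; // Pauses before any content is visible can't block interaction.
        m_maxPause = std::max(m_maxPause, duration);
        m_totalPause += duration;
        ++m_chunkCount;
    }

    std::chrono::milliseconds maxPause() const { return m_maxPause; }
    std::chrono::milliseconds averagePause() const
    {
        return m_chunkCount ? m_totalPause / m_chunkCount : std::chrono::milliseconds(0);
    }

private:
    bool m_contentVisible { false };
    std::chrono::milliseconds m_maxPause { 0 };
    std::chrono::milliseconds m_totalPause { 0 };
    long m_chunkCount { 0 };
};
```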
> 
> Consider, also, that pages might be parsing in the same process in
> another tab, or in a frame in the current tab.
> 
>> (2) It might be helpful to track max and average pause time from non-parsing 
>> sources, for the sake of comparison.
> 
> If you looked at the information Eric provided in his initial email,
> you might have noticed
> <https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0>,
> which is precisely that.
> 
>> These might result in a more accurate assessment of the benefits.
>> 
>>>> The reason I ask is that this sounds like a significant increase in 
>>>> complexity, so we should be very confident that there is a real and major 
>>>> benefit. One thing I wonder about is how common it is to have enough of 
>>>> the page processed that the user could interact with it in principle, yet 
>>>> still have large parsing chunks remaining which would prevent that 
>>>> interaction from being smooth.
>>> 
>>> If you're interested in reducing the complexity of the parser, I'd
>>> recommend removing the NEW_XML code.  As previously discussed, that
>>> code creates significant complexity for zero benefit.
>> 
>> Tu quoque fallacy. From your glib reply, I get the impression that you are 
>> not giving the complexity cost of multithreading due consideration. I hope 
>> that is not actually the case and I merely caught you at a bad moment or 
>> something.
> 
> I'm quite aware of the complexity of multithreaded code, having written
> a great deal of it for Chromium.
> 
> One of the things I hope comes out of this project is a good example
> of how to do multithreaded processing in WebCore.  Currently, every
> subsystem seems to roll its own threading abstractions, I think
> largely because there hasn't been a strong technical example of how to
> do threading well.  By contrast, Chromium uses multithreading quite
> extensively and has put a lot of engineering effort into building
> high-quality threading abstractions:
> 
> http://dev.chromium.org/developers/design-documents/threading
> 
> I hope that this work will let us transfer some of that knowledge into
> WebCore.  Obviously cleaning up all of WebCore's threading
> abstractions is a big task, and one that will probably happen slowly
> over time.
> 
> Anyway, I can understand why you're worried about introducing more
> threading into WebCore given the current mishmash of threading
> abstractions that we have already.  The good news is that the folks
> working on this project probably have more experience writing
> multithreaded code than the average WebKit developer.  :)
> 
>> (And also, we agreed to a drop-dead date to remove that code, which has 
>> either passed or is very close.)
>> 
>>>> Another thing I wonder about is whether yielding to the event loop more 
>>>> aggressively could achieve a similar benefit at a much lower complexity 
>>>> cost.
>>> 
>>> Yielding to the event loop more could reduce the "ParseHTML_max" time,
>>> but it cannot reduce the "ParseHTML" time.  Generally speaking,
>>> yielding to the event loop is a trade-off between throughput (i.e.,
>>> page load time) and responsiveness.  Moving work to a background
>>> thread should let us achieve a better trade-off between these
>>> quantities than we're likely to be able to achieve by tuning the yield
>>> parameter alone.
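[The yield knob being discussed can be sketched as a budgeted pump loop. This is an illustrative stand-alone example, not WebKit's actual pump: shrinking the budget caps the maximum pause ("ParseHTML_max"), but each yield adds scheduling overhead, so total parse time ("ParseHTML") can only stay the same or grow.]

```cpp
#include <chrono>
#include <functional>

struct PumpResult {
    int tokensConsumed;
    bool finished;
};

// Parse tokens until the time budget expires, then yield to the event loop.
// parseOneToken returns false when the input is exhausted.
PumpResult pumpTokenizer(const std::function<bool()>& parseOneToken,
                         std::chrono::milliseconds budget)
{
    auto deadline = std::chrono::steady_clock::now() + budget;
    int consumed = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        if (!parseOneToken())
            return { consumed, true }; // Input exhausted: done parsing.
        ++consumed;
    }
    return { consumed, false }; // Budget spent: yield and reschedule.
}
```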
>> 
>> I agree that is possible. But it also seems worth first trying the 
>> improvements that don't impose the complexity and hazards of multithreading 
>> in this area, such as retuning yielding and replacing the preload scanner 
>> with (non-threaded) speculative pre-tokenizing, as suggested by Antti. That 
>> would let us better assess the benefits of the threading itself.
> 
> Given what I know about HTML parsing, I don't believe tuning the yield
> parameter will give nearly as good a result as moving tokenization to
> a background thread.  After we switched over to the new parser, we
> went through and tuned a bunch of these parameters (albeit not for
> maximum stop time as that wasn't on our radar at the time), so I have
> a good sense for what you can and can't achieve with these knobs.
> 
>>>> Having a test to drive the work would allow us to answer these types of 
>>>> questions. (It may also be that the test data you cited would already 
>>>> answer these questions but I didn't sufficiently understand it; if so, 
>>>> further explanation would be appreciated.)
>>> 
>>> If you're interested in building such a test, I would be interested in
>>> hearing the results.  We don't plan to build such a test at this time.
>> 
>> If you're actually planning to make a significant complexity-imposing 
>> architectural change for performance reasons, without any way to test 
>> whether it delivers the claimed performance benefits, or how it compares to 
>> less complex approaches, then why should any rational person agree with that 
>> approach? When attempting to improve performance, the burden of proof is on 
>> the person proposing the performance improvement, not on others to create a 
>> test to figure out if the performance improvement works. It's not valid to 
>> respond to a request for performance testing info with the equivalent of 
>> "patches welcome".
> 
> Is that really the case?  If so, I'm surprised that the patches for
> the shared memory cache and the NetworkProcess landed.  I raised
> similar questions to the ones you're raising now, but the folks
> proposing those changes basically ignored me and landed their patches
> anyway.
> 
>> But you and others have actually cited a performance test and described how 
>> to run it, and perhaps even shown openness to improving it. So here 
>> again, I hope your words merely sound more negative than what you actually 
>> mean. I don't understand why you would say you don't plan to build a test 
>> when above you cited a test and your plans to run it before and after. Am I 
>> missing some nuance?
> 
> I didn't quite follow this last paragraph.
> 
> On Thu, Jan 10, 2013 at 11:00 PM, Filip Pizlo <fpi...@apple.com> wrote:
>> Thanks for your detailed reply. Seems like you guys have a pretty good plan 
>> in place.
> 
> Thanks.
> 
>> I hope this works and produces a performance improvement. That being said 
>> this does look like a sufficiently complex work item that success is far 
>> from guaranteed. So to play devil's advocate, what is your plan for if this 
>> doesn't work out?
> 
> If it doesn't work, we'll rip it out.
> 
>> I.e. are we talking about adding a bunch of threading support code in the 
>> optimistic hope that it makes things run fast, and then forgetting about it 
>> if it doesn't?  Or are you prepared to rip out any complexity that got 
>> landed if this does not ultimately live up to promise?  Or is this going to 
>> be one giant patch that only lands if it works?
> 
> I'm happy to rip out the feature if it turns out not to work.  There's
> a risk that we'll disagree about whether the feature is a win, but I
> imagine we'll come to a decision somehow.  :)
> 
> I would prefer not to land the code in one giant patch.  That's not
> typically how we approach WebKit development.  Instead, we plan to
> work behind an ENABLE flag, which will make it easier to identify what
> needs to be ripped out (and also make it easier to do A/B
> comparisons).
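[The ENABLE-flag approach can be illustrated with a minimal sketch; the flag name below is hypothetical (WebKit's real guards use the ENABLE(...) macro family). The point is that a later rip-out becomes a grep for the guard, and flipping the flag gives a cheap A/B build.]

```cpp
#include <string>

// Hypothetical feature guard; set to 0 to build the current shipping path.
#define ENABLE_THREADED_HTML_PARSER 1

std::string activeParserPath()
{
#if ENABLE_THREADED_HTML_PARSER
    return "background-thread tokenizer"; // Experimental path.
#else
    return "main-thread tokenizer"; // Current shipping path.
#endif
}
```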
> 
>> I'm also trying to understand what would happen during the interim when this 
>> work is incomplete, we have thread-related goop in some critical paths, and 
>> we don't yet know if the WIP code is ever going to result in a speedup. And 
>> also, what will happen sometime from now if that code is never successfully 
>> optimized to the point where it is worth enabling.
> 
> Any threading goop will be behind an ENABLE flag until it's ready for
> prime time (i.e., until it is better than what we have currently).  As
> a project, we don't have a great track record of ripping out WIP code
> that's languishing (see, for example, the discussion about NEW_XML
> that I alluded to earlier), but I tend to be one of the people pushing
> to rip out unused code, so I'd lose a bit of my credibility if I
> wasn't willing to rip out my own code.  :)
> 
>> I appreciate that this sort of question can be asked of any performance work 
>> but in this particular case my gut tells me that this is going to result in 
>> significantly more complexity than the usual incremental performance work. 
>> So it's good to understand what plan B is.
>> 
>> Probably a good answer to this sort of question would address some fears 
>> that people may have. If this work does lead to a performance win then 
>> probably everyone will be happy. But if it doesn't then it would be great to 
>> have a "plan of retreat".
> 
> One thing that I've learned from this thread is that the community has
> more fear of multithreaded code than I expected.  In retrospect, this
> makes a lot of sense given that multithreaded code is rare in WebKit
> and that every subsystem rolls their own abstractions.
> 
> Our current plan is to use a simple design with two message queues:
> one queue of byte vectors being sent from the main thread to the
> background thread and one queue of PickledTokens being sent from the
> background thread to the main thread.  We're not planning on sharing
> any memory between the threads or on using any locking mechanisms
> (other than the ones that are already implemented in
> WTF::MessageQueue).  This general approach is widely used in the
> Chromium project and tends to be both high performance and relatively
> easy to reason about.  I'm happy to discuss this in more detail if
> you're interested.
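[A sketch of that two-queue shape, using the C++ standard library as a stand-in for WTF::MessageQueue. The class, the string-based "PickledToken", and the pipeline function are all illustrative, not WebKit's actual API; the key property is that the only synchronization lives inside the queues, so the two threads share no mutable state.]

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Minimal blocking message queue; all locking is internal.
template<typename T>
class MessageQueue {
public:
    void append(T message)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(message));
        }
        m_condition.notify_one();
    }

    T waitForMessage()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_condition.wait(lock, [this] { return !m_queue.empty(); });
        T message = std::move(m_queue.front());
        m_queue.pop();
        return message;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_condition;
    std::queue<T> m_queue;
};

// Placeholder for a self-contained, copyable token representation.
using PickledToken = std::string;

// Demo pipeline: the main thread feeds byte chunks in, the background thread
// turns each chunk into one pickled token. An empty chunk is the shutdown
// sentinel; an empty token marks end of stream.
std::vector<PickledToken> runPipeline(const std::vector<std::string>& chunks)
{
    MessageQueue<std::string> bytesToParser;  // main -> background
    MessageQueue<PickledToken> tokensToMain;  // background -> main

    std::thread parserThread([&] {
        for (;;) {
            std::string bytes = bytesToParser.waitForMessage();
            if (bytes.empty())
                break;
            tokensToMain.append("<token:" + bytes + ">"); // Stand-in for tokenizing.
        }
        tokensToMain.append(""); // End-of-stream marker.
    });

    for (const auto& chunk : chunks)
        bytesToParser.append(chunk);
    bytesToParser.append(""); // Shut down the parser thread.

    std::vector<PickledToken> tokens;
    for (;;) {
        PickledToken token = tokensToMain.waitForMessage();
        if (token.empty())
            break;
        tokens.push_back(std::move(token));
    }
    parserThread.join();
    return tokens;
}
```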
> 
> To return to your question, plan B is to rip out the code and continue
> using the current implementation.  If you have something else in mind,
> I'm happy to discuss that as well.

This sounds great.  This is what I had in mind.  Best of luck on this effort; I 
suspect it should be fun, and hopefully profitable, too!

-F


> 
> Adam
> _______________________________________________
> webkit-dev mailing list
> webkit-dev@lists.webkit.org
> http://lists.webkit.org/mailman/listinfo/webkit-dev
