On Jan 11, 2013, at 12:21 AM, Adam Barth <aba...@webkit.org> wrote:

> On Thu, Jan 10, 2013 at 9:19 PM, Maciej Stachowiak <m...@apple.com> wrote:
>> On Jan 10, 2013, at 12:07 PM, Adam Barth <aba...@webkit.org> wrote:
>>> On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak <m...@apple.com> wrote:
>>>> I presume from your other comments that the goal of this work is responsiveness, rather than page load speed as such. I'm excited about the potential to improve responsiveness during page loading.
>>>
>>> The goals are described in the first link Eric gave in his email: <https://bugs.webkit.org/show_bug.cgi?id=106127#c0>. Specifically:
>>>
>>> ---8<---
>>> 1) Moving parsing off the main thread could make web pages more responsive because the main thread is available for handling input events and executing JavaScript.
>>> 2) Moving parsing off the main thread could make web pages load more quickly because WebCore can do other work in parallel with parsing HTML (such as parsing CSS or attaching elements to the render tree).
>>> --->8---
>>
>> OK - what test (if any) will be used to test whether the page load speed goal is achieved?
>
> All of them. :)
>
> More seriously, Chromium runs a very large battery of performance tests continuously on a matrix of different platforms, including desktop and mobile. You can see one of the overview dashboards here:
>
> http://build.chromium.org/f/chromium/perf/dashboard/overview.html
>
> The ones that are particularly relevant to this work are the various page load tests, both with and without simulated network delays. For iterative benchmarking, we plan to use Chromium's Telemetry framework <http://www.chromium.org/developers/telemetry>.
> Specifically, I expect we'll work with the top_25 dataset <http://src.chromium.org/viewvc/chrome/trunk/src/tools/perf/page_sets/top_25.json?view=markup>, but we might use some other data sets if there are particular areas we want to measure more carefully.
>
>>>> One question: what tests are you planning to use to validate whether this approach achieves its goals of better responsiveness?
>>>
>>> The tests we've run so far are also described in the first link Eric gave in his email: <https://bugs.webkit.org/show_bug.cgi?id=106127>. They suggest that there's a good deal of room for improvement in this area. After we have a working implementation, we'll likely re-run those experiments and run other experiments to do an A/B comparison of the two approaches. As Filip points out, we'll likely end up with a hybrid of the two designs that's optimized for handling various workloads.
>>
>> I agree the test suggests there is room for improvement. From the description of how the test is run, I can think of two potential ways to improve how well it correlates with actual user-perceived responsiveness:
>>
>> (1) It seems to look at the max parsing pause time without considering whether there's any content being shown that it's possible to interact with. If the longest pauses happen before meaningful content is visible, then reducing those pauses is unlikely to materially improve responsiveness, at least in models where web content processing happens in a separate process or thread from the UI. One possibility is to track the max parsing pause time starting from the first visually non-empty layout. That would better approximate how much actual user interaction is blocked.
>
> Consider, also, that pages might be parsing in the same process in another tab, or in a frame in the current tab.
>
>> (2) It might be helpful to track max and average pause time from non-parsing sources, for the sake of comparison.
>
> If you looked at the information Eric provided in his initial email, you might have noticed <https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0>, which is precisely that.
>
>> These might result in a more accurate assessment of the benefits.
>>
>>>> The reason I ask is that this sounds like a significant increase in complexity, so we should be very confident that there is a real and major benefit. One thing I wonder about is how common it is to have enough of the page processed that the user could interact with it in principle, yet still have large parsing chunks remaining which would prevent that interaction from being smooth.
>>>
>>> If you're interested in reducing the complexity of the parser, I'd recommend removing the NEW_XML code. As previously discussed, that code creates significant complexity for zero benefit.
>>
>> Tu quoque fallacy. From your glib reply, I get the impression that you are not giving the complexity cost of multithreading due consideration. I hope that is not actually the case and I merely caught you at a bad moment or something.
>
> I'm quite aware of the complexity of multithreaded code, having written a great deal of it for Chromium.
>
> One of the things I hope comes out of this project is a good example of how to do multithreaded processing in WebCore. Currently, every subsystem seems to roll its own threading abstractions, I think largely because there hasn't been a strong technical example of how to do threading well.
> By contrast, Chromium uses multithreading quite extensively and has put a lot of engineering effort into building high-quality threading abstractions:
>
> http://dev.chromium.org/developers/design-documents/threading
>
> I hope that this work will let us transfer some of that knowledge into WebCore. Obviously cleaning up all of WebCore's threading abstractions is a big task, and one that will probably happen slowly over time.
>
> Anyway, I can understand why you're worried about introducing more threading into WebCore given the current mishmash of threading abstractions that we have already. The good news is that the folks working on this project probably have more experience writing multithreaded code than the average WebKit developer. :)
>
>> (And also we agreed to a drop-dead date to remove the code, which has either passed or is very close.)
>>
>>>> Another thing I wonder about is whether yielding to the event loop more aggressively could achieve a similar benefit at a much lower complexity cost.
>>>
>>> Yielding to the event loop more could reduce the "ParseHTML_max" time, but it cannot reduce the "ParseHTML" time. Generally speaking, yielding to the event loop is a trade-off between throughput (i.e., page load time) and responsiveness. Moving work to a background thread should let us achieve a better trade-off between these quantities than we're likely to be able to achieve by tuning the yield parameter alone.
>>
>> I agree that is possible. But it also seems like making the improvements that don't impose the complexity and hazards of multithreading in this area is worth trying first: things such as retuning yielding and replacing the preload scanner with (non-threaded) speculative pre-tokenizing, as suggested by Antti. That would let us better assess the benefits of the threading itself.
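The "ParseHTML" vs. "ParseHTML_max" distinction discussed above can be sketched with a toy model. Nothing below is WebCore code: the names (PumpResult, pumpWithYielding) and the one-token-per-character "tokenizer" are invented for illustration, and WebCore's real scheduler yields on a time budget rather than a token count. The point it demonstrates is Adam's: shrinking the per-slice budget bounds the longest uninterrupted pause, but the total amount of parsing work is unchanged.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Toy model of parser yielding: process the input in slices of at most
// `tokensPerSlice` tokens, "yielding" to the event loop between slices.
struct PumpResult {
    size_t tokensProcessed = 0; // total work done (analogous to "ParseHTML")
    size_t yields = 0;          // times we returned to the event loop
    size_t maxSliceTokens = 0;  // longest uninterrupted slice ("ParseHTML_max")
};

PumpResult pumpWithYielding(const std::string& input, size_t tokensPerSlice) {
    PumpResult result;
    size_t pos = 0;
    while (pos < input.size()) {
        size_t slice = 0;
        while (pos < input.size() && slice < tokensPerSlice) {
            ++pos; // "tokenize" one character
            ++slice;
            ++result.tokensProcessed;
        }
        result.maxSliceTokens = std::max(result.maxSliceTokens, slice);
        if (pos < input.size())
            ++result.yields; // yield to the event loop, resume later
    }
    return result;
}
```

Running this with a smaller `tokensPerSlice` reduces `maxSliceTokens` (better responsiveness) at the cost of more yields (worse throughput), while `tokensProcessed` stays constant, which is why yield tuning alone cannot reduce total parse time.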
>
> Given what I know about HTML parsing, I don't believe tuning the yield parameter will give nearly as good a result as moving tokenization to a background thread. After we switched over to the new parser, we went through and tuned a bunch of these parameters (albeit not for maximum stop time, as that wasn't on our radar at the time), so I have a good sense for what you can and can't achieve with these knobs.
>
>>>> Having a test to drive the work would allow us to answer these types of questions. (It may also be that the test data you cited would already answer these questions but I didn't sufficiently understand it; if so, further explanation would be appreciated.)
>>>
>>> If you're interested in building such a test, I would be interested in hearing the results. We don't plan to build such a test at this time.
>>
>> If you're actually planning to make a significant complexity-imposing architectural change for performance reasons, without any way to test whether it delivers the claimed performance benefits, or how it compares to less complex approaches, then why should any rational person agree with that approach? When attempting to improve performance, the burden of proof is on the person proposing the performance improvement, not on others to create a test to figure out if the performance improvement works. It's not valid to respond to a request for performance testing info with the equivalent of "patches welcome".
>
> Is that really the case? If so, I'm surprised that the patches for the shared memory cache and the NetworkProcess landed. I raised similar questions to the ones you're raising now, but the folks proposing those changes basically ignored me and landed their patches anyway.
>
>> But you and others have actually cited a performance test and described how to run it, and perhaps even shown openness to improving it.
>> So here again, I hope your words merely sound more negative than what you actually mean. I don't understand why you would say you don't plan to build a test when above you cited a test and your plans to run it before and after. Am I missing some nuance?
>
> I didn't quite follow this last paragraph.
>
> On Thu, Jan 10, 2013 at 11:00 PM, Filip Pizlo <fpi...@apple.com> wrote:
>> Thanks for your detailed reply. Seems like you guys have a pretty good plan in place.
>
> Thanks.
>
>> I hope this works and produces a performance improvement. That being said, this does look like a sufficiently complex work item that success is far from guaranteed. So to play devil's advocate, what is your plan for if this doesn't work out?
>
> If it doesn't work, we'll rip it out.
>
>> I.e. are we talking about adding a bunch of threading support code in the optimistic hope that it makes things run fast, and then forgetting about it if it doesn't? Or are you prepared to rip out any complexity that got landed if this does not ultimately live up to its promise? Or is this going to be one giant patch that only lands if it works?
>
> I'm happy to rip out the feature if it turns out not to work. There's a risk that we'll disagree about whether the feature is a win, but I imagine we'll come to a decision somehow. :)
>
> I would prefer not to land the code in one giant patch. That's not typically how we approach WebKit development. Instead, we plan to work behind an ENABLE flag, which will make it easier to identify what needs to be ripped out (and also make it easier to do A/B comparisons).
>
>> I'm also trying to understand what would happen during the interim when this work is incomplete, we have thread-related goop in some critical paths, and we don't yet know if the WIP code is ever going to result in a speedup.
>> And also, what will happen sometime from now if that code is never successfully optimized to the point where it is worth enabling.
>
> Any threading goop will be behind an ENABLE flag until it's ready for prime time (i.e., until it is better than what we have currently). As a project, we don't have a great track record of ripping out WIP code that's languishing (see, for example, the discussion about NEW_XML that I alluded to earlier), but I tend to be one of the people pushing to rip out unused code, so I'd lose a bit of my credibility if I wasn't willing to rip out my own code. :)
>
>> I appreciate that this sort of question can be asked of any performance work, but in this particular case my gut tells me that this is going to result in significantly more complexity than the usual incremental performance work. So it's good to understand what plan B is.
>>
>> Probably a good answer to this sort of question would address some fears that people may have. If this work does lead to a performance win then probably everyone will be happy. But if it doesn't then it would be great to have a "plan of retreat".
>
> One thing that I've learned from this thread is that the community has more fear of multithreaded code than I expected. In retrospect, this makes a lot of sense given that multithreaded code is rare in WebKit and that every subsystem rolls its own abstractions.
>
> Our current plan is to use a simple design with two message queues: one queue of byte vectors being sent from the main thread to the background thread and one queue of PickledTokens being sent from the background thread to the main thread. We're not planning on sharing any memory between the threads or on using any locking mechanisms (other than the ones that are already implemented in WTF::MessageQueue). This general approach is widely used in the Chromium project and tends to be both high-performance and relatively easy to reason about.
> I'm happy to discuss this in more detail if you're interested.
>
> To return to your question, plan B is to rip out the code and continue using the current implementation. If you have something else in mind, I'm happy to discuss that as well.
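The two-queue design described above can be sketched in standalone C++. This is only an illustration, not WebCore code: a std::mutex/std::condition_variable queue stands in for WTF::MessageQueue, and `FakeToken` and `tokenizeChunk` are invented stand-ins for PickledTokens and the real HTML tokenizer. The key property it models is that byte chunks flow main thread to background, token batches flow back, and the two threads share no other state.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Minimal blocking queue standing in for WTF::MessageQueue. All cross-thread
// communication goes through queues like this; there is no other shared state.
template <typename T>
class MessageQueue {
public:
    void append(T value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(std::move(value));
        m_condition.notify_one();
    }
    T waitForMessage() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_condition.wait(lock, [this] { return !m_queue.empty(); });
        T value = std::move(m_queue.front());
        m_queue.pop();
        return value;
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_condition;
    std::queue<T> m_queue;
};

// Stand-in for a "PickledToken": a tokenizer result passed by value so the
// two threads never share memory.
struct FakeToken {
    std::string text;
};

// Toy "tokenizer": splits a byte chunk on spaces.
std::vector<FakeToken> tokenizeChunk(const std::vector<char>& bytes) {
    std::vector<FakeToken> tokens;
    std::string current;
    for (char c : bytes) {
        if (c == ' ') {
            if (!current.empty()) tokens.push_back({current});
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) tokens.push_back({current});
    return tokens;
}

// Main-thread helper: feed byte chunks to the background tokenizer and collect
// the token batches it sends back. An empty chunk is the shutdown signal.
std::vector<std::string> parseOnBackgroundThread(const std::vector<std::string>& chunks) {
    MessageQueue<std::vector<char>> toBackground; // byte vectors, main -> background
    MessageQueue<std::vector<FakeToken>> toMain;  // tokens, background -> main

    std::thread background([&] {
        while (true) {
            std::vector<char> bytes = toBackground.waitForMessage();
            bool done = bytes.empty();
            toMain.append(tokenizeChunk(bytes));
            if (done) return;
        }
    });

    for (const std::string& chunk : chunks)
        toBackground.append(std::vector<char>(chunk.begin(), chunk.end()));
    toBackground.append({}); // empty chunk == end of input

    std::vector<std::string> tokenTexts;
    for (size_t i = 0; i < chunks.size() + 1; ++i)
        for (const FakeToken& token : toMain.waitForMessage())
            tokenTexts.push_back(token.text);

    background.join();
    return tokenTexts;
}
```

Because each thread only ever owns the values it has dequeued, no additional locking is needed beyond the queues themselves, which is what makes this style "relatively easy to reason about."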
This sounds great. This is what I had in mind. Best of luck on this effort; I suspect it should be fun, and hopefully profitable, too!

-F

> Adam
> _______________________________________________
> webkit-dev mailing list
> webkit-dev@lists.webkit.org
> http://lists.webkit.org/mailman/listinfo/webkit-dev