Re: [webkit-dev] parallel rendering in WebKit
Hi,

thanks to all of you for your help.

> I would be worried about correctness - if painting is not complete by the time the paint method returns, then you could get flashes of intermediate states showing up onscreen.

We plan to do the painting on a background image. When all the painting has finished, the image appears on the screen. We don't need to duplicate the DOM tree; only the painting commands should be sent to the thread. This way we could keep the changes below the RenderContext level.

> I suspect resources like fonts or images *will* need to be duplicated, either that or use thread-safe refcounting and copy-on-write. The internal state of images can be mutated by progress in loading the image, or by ongoing animation.

There is no need for thread-safe refcounting: once the thread has processed a request object, it sends the object back to the main thread, which can release the resources. This is necessary, since the alloc / free pair on a memory chunk must be executed by the same thread. We can post a series of commands to the thread message queues (including the main thread's), which will be processed later (no waiting is necessary).

> I'm also curious how this will help overall rendering time. Embedded platforms would normally only be displaying one document at a time, so how will one thread per document help?

Not anymore. The UI design of mobile or handheld devices uses more and more HTML-based content. It is easier for both UI designers (it is enough to know how to create HTML pages) and developers (a browser is needed anyway, and there is no need to maintain a separate UI rendering engine). Of course we need reliable and fast browser engines to achieve these features.

I can't promise any major gain at this moment, but there does not seem to be any design issue in WebKit that makes this approach impossible. Wish us luck :)

Regards,
Zoltan

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
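The round trip described in this message (record a paint command on the main thread, let a paint thread process it, then hand the object back so the main thread frees it) might look roughly like the sketch below. It is only an illustration under assumptions: `DrawRectCommand`, `MessageQueue` and `paintOnBackgroundThread` are hypothetical names, not WebKit code, and "painting" is reduced to adding up rectangle areas.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the recorded arguments of a drawRect call.
struct DrawRectCommand {
    int x, y, width, height;
    std::thread::id allocatedOn; // thread that created the command
};

// Minimal blocking queue; an assumption, not WebKit's message queue.
template <typename T>
class MessageQueue {
public:
    void post(T item) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_items.push_back(std::move(item));
        }
        m_condition.notify_one();
    }
    T take() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_condition.wait(lock, [this] { return !m_items.empty(); });
        T item = std::move(m_items.front());
        m_items.pop_front();
        return item;
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_condition;
    std::deque<T> m_items;
};

using CommandPtr = std::unique_ptr<DrawRectCommand>;

// Posts `count` commands to a paint thread, receives each one back and
// destroys it on the calling thread. Returns the total area "painted".
long paintOnBackgroundThread(int count) {
    assert(count >= 0);
    MessageQueue<CommandPtr> toPainter;
    MessageQueue<CommandPtr> toMain;
    long paintedArea = 0;

    std::thread painter([&] {
        for (;;) {
            CommandPtr command = toPainter.take();
            if (!command)
                break; // a null command means "shut down"
            paintedArea += static_cast<long>(command->width) * command->height;
            toMain.post(std::move(command)); // send the object back
        }
    });

    // Posting never waits for the painter.
    for (int i = 0; i < count; ++i)
        toPainter.post(CommandPtr(new DrawRectCommand{0, 0, 10, i + 1, std::this_thread::get_id()}));
    toPainter.post(nullptr);

    for (int i = 0; i < count; ++i) {
        CommandPtr command = toMain.take();
        // Freed by the same thread that allocated it.
        assert(command->allocatedOn == std::this_thread::get_id());
    }
    painter.join(); // makes the painter's writes to paintedArea visible
    return paintedArea;
}
```

In this sketch the calling thread blocks while collecting the returned commands; a real main loop would more likely drain the return queue whenever it is idle.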
Re: [webkit-dev] parallel rendering in WebKit
On Friday 19 February 2010 23:36:26 Zoltan Herczeg wrote:

> Hi,
>
> as all of you probably know, SMP-based systems are getting widespread even in the embedded domain, and we hope we can speed up WebKit on these systems. We did some profiling, and it seemed the platform-dependent rendering took 50% of the total runtime (at least on our test platforms). We are thinking about adding some parallel rendering support for WebKit, probably mostly platform-dependent code, but the threading support could be reused by different ports.

Hi Zoltan,

two questions. I assume you have done all your profiling with QtWebKit? And you are measuring (looking at the total cumulative time) everything under the paintEvent of QWebView?

My initial reaction to multi-threaded painting is: why don't we fix the obvious performance bottlenecks in Qt first? Would it be much work to verify your profiling results in WebKit/GTK+ or Chromium? My assumption/guess is that you will see quite different numbers.

regards
holger
Re: [webkit-dev] parallel rendering in WebKit
> Date: Sat, 20 Feb 2010 12:22:02 +0100
> From: zherc...@inf.u-szeged.hu
> To: m...@apple.com
> CC: webkit-dev@lists.webkit.org
> Subject: Re: [webkit-dev] parallel rendering in WebKit
>
> Hi,
>
> thanks to all of you for your help.
>
>> I'm also curious how this will help overall rendering time. Embedded platforms would normally only be displaying one document at a time, so how will one thread per document help?
>
> Not anymore. The UI design of mobile or handheld devices uses more and more HTML-based content. It is easier for both UI designers (it is enough to know how to create HTML pages) and developers (a browser is needed anyway, and there is no need to maintain a separate UI rendering engine). Of course we need reliable and fast browser engines to achieve these features.

(Again, with 1 GB of memory right now even Iceweasel is lighting up my disk as I try to type in this simple form LOL, but not nearly as bad as Firefox on a 500M laptop.)

We have a phone app that has a bunch of pieces made from HTML with various fake links to point to internal resources and variables. It is easy to write the HTML but speed is certainly an issue. One part of this time delay seems to be start-up or initialization time, and the few times I've profiled this (Java) I usually end up looking at calls to script compilers and stuff like that, but I'm not sure this unrelated thing would say much about WebKit.

You may already have to initialize a lot of stuff just to render a simple form with a few buttons, and you may be able to reduce that time to make it acceptable in more apps, but it will always be slower than using more direct components. Keep in mind that random initialization code to set up parallelism could be much slower than predictable local access, even if instruction counts (ignoring SIMD benefits, of course) are comparable, due to memory architecture.

If you only have one CPU, the likely benefits would be in things that amount to IO, and I guess that includes use of the GPU or scheduling the use of constrained resources or something. Threading of course creates a new class of resource, the lock or mutex, possibly leading to lock starvation as many workers compete for one resource. The extreme case is each launched thread holding the same lock for the duration of its execution, thereby forcing the threads to execute serially, and you may even find this improves performance over the less synchronized case even when it is not needed for correctness.

I guess offhand, from what I've seen, initialization and memory access patterns are often ignored components of performance, and there is a tendency to think that launching a thread creates a new CPU.

> I can't promise any major gain at this moment, but there does not seem to be any design issue in WebKit that makes this approach impossible. Wish us luck :)
>
> Regards,
> Zoltan
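The extreme case mentioned above, where every thread holds one common lock for its whole run, can be made visible with a small experiment. This is a minimal sketch with hypothetical names (`maxConcurrentWorkers`, `holdSharedLock`), and the short sleep is only a stand-in for real painting work. It counts how many workers are ever inside the work section at the same moment.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Launches workerCount threads and returns the largest number of them that
// were inside the work section at the same time. With holdSharedLock set,
// every thread holds one common mutex for its whole run.
int maxConcurrentWorkers(int workerCount, bool holdSharedLock) {
    assert(workerCount >= 0);
    std::mutex sharedLock;
    std::atomic<int> active{0};
    std::atomic<int> maxActive{0};

    auto work = [&] {
        std::unique_lock<std::mutex> guard(sharedLock, std::defer_lock);
        if (holdSharedLock)
            guard.lock(); // held until the worker returns

        int now = ++active;
        int seen = maxActive.load();
        while (now > seen && !maxActive.compare_exchange_weak(seen, now)) {
            // A failed exchange reloads `seen`; retry while we are larger.
        }
        // Stand-in for the actual work (an assumption of this sketch).
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
        --active;
    };

    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i)
        workers.emplace_back(work);
    for (std::thread& worker : workers)
        worker.join();
    return maxActive.load();
}
```

With the shared lock the function can only ever return 1 for a non-zero worker count, however many threads are launched. Without it the result depends on the scheduler and the number of cores, so it is not predictable.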
Re: [webkit-dev] parallel rendering in WebKit
Hi Zoltan,

On Fri, Feb 19, 2010 at 2:36 PM, Zoltan Herczeg zherc...@inf.u-szeged.hu wrote:

> The plan is to open a rendering thread for each document object. [...] This is still a vague idea, and we are still investigating the possibilities. What is your opinion? Do you know about any major blockers we should know about?

Didn't dhyatt and some others at Apple investigate something along these lines in the context of OpenCL and the new 'Grand Central Dispatch' stuff? As I recall, they didn't experience a significant improvement for a variety of reasons, though I think they were trying to parallelize rendering of a single document, not rendering different documents on different threads.

-Brent
Re: [webkit-dev] parallel rendering in WebKit
Hi,

On 2010-02-19 at 23:46:11 [+0100], Benbuck Nason bna...@netflix.com wrote:

> To me it seems like the biggest gain would be to have two threads: one for update/layout, and another for rendering. This would probably require double-buffering the render state though.

It's not really my place to jump into discussions about WebKit internals, since I am very new to the project, but I have some experience with this problem. By double-buffering the render state, do you mean having the document tree in two copies? Or do you mean double-buffering the graphics? The first is what needs to happen, since you want to render the document at a consistent point in time.

Once this separation is done, you can even use as many rendering threads as you want, i.e. split the output bitmap into areas and use one thread per area. Other ways of splitting are possible too, like rendering independent compositing layers in different threads. But one needs a point in time at which to take a snapshot of the current document tree and kick off the rendering of this snapshot, while the main document is allowed to evolve.

I can't really say if such a design makes sense for WebKit. Depending on the architecture of a port, rendering may already take place in another thread, for example when drawing commands are dispatched to the GPU. Or in the Haiku port, drawing commands are dispatched to a system service, which does the actual work. Synchronization happens when the contents are blitted to the screen.

What would be nice is if each page could run its own layout thread. But AFAICT, a lot of work would need to be done for this. When I started to pick up work on the Haiku port, I had to redesign the threading, and make sure I call into WebCore code only on the main thread. There are tons of assertions to make sure of this (nice for porting). But even nicer would be a concept of threading and locking critical sections, instead of asserting a specific thread.

Maybe this doesn't make so much sense for WebKit and I am just showing off my newbie status. :-)

Best regards,
-Stephan
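The "snapshot, then one thread per bitmap area" idea from this message could be sketched as follows. Everything here is an assumption made for illustration: the snapshot is just a list of coloured rectangles (`Rect`), the bitmap is split into horizontal bands, and `renderSnapshot` and `paintBand` are hypothetical names rather than anything in WebKit or the Haiku port.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical display item: a filled rectangle.
struct Rect {
    int x, y, width, height;
    std::uint32_t color;
};

struct Bitmap {
    int width, height;
    std::vector<std::uint32_t> pixels; // row-major, width * height entries
    Bitmap(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
};

// Paints only rows [rowBegin, rowEnd), so bands never touch the same pixel.
static void paintBand(const std::vector<Rect>& snapshot, Bitmap& bitmap,
                      int rowBegin, int rowEnd) {
    for (const Rect& rect : snapshot) {
        int top = std::max(rect.y, rowBegin);
        int bottom = std::min(rect.y + rect.height, rowEnd);
        int left = std::max(rect.x, 0);
        int right = std::min(rect.x + rect.width, bitmap.width);
        for (int y = top; y < bottom; ++y)
            for (int x = left; x < right; ++x)
                bitmap.pixels[static_cast<std::size_t>(y) * bitmap.width + x] = rect.color;
    }
}

// Taking `snapshot` by value is the "snapshot": the caller's list may keep
// changing while the bands are painted from this private copy.
Bitmap renderSnapshot(std::vector<Rect> snapshot, int width, int height,
                      int threadCount) {
    assert(threadCount >= 1 && width >= 0 && height >= 0);
    Bitmap bitmap(width, height);
    int rowsPerBand = (height + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (int i = 0; i < threadCount; ++i) {
        int rowBegin = std::min(i * rowsPerBand, height);
        int rowEnd = std::min(rowBegin + rowsPerBand, height);
        workers.emplace_back(paintBand, std::cref(snapshot), std::ref(bitmap),
                             rowBegin, rowEnd);
    }
    for (std::thread& worker : workers)
        worker.join();
    return bitmap;
}
```

Because each band owns a disjoint set of rows, the workers need no locking among themselves; the only synchronization point is the join before the finished bitmap is handed back, which corresponds to the blit-time synchronization mentioned above.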
Re: [webkit-dev] parallel rendering in WebKit
> Date: Fri, 19 Feb 2010 23:36:26 +0100
> From: zherc...@inf.u-szeged.hu
> To: webkit-dev@lists.webkit.org
> Subject: [webkit-dev] parallel rendering in WebKit
>
> Hi,
>
> as all of you probably know, SMP-based systems are getting widespread even in the embedded domain, and we hope we can speed up WebKit on these systems. We did some profiling, and it seemed the platform-dependent rendering took 50% of the total runtime (at least on our test platforms). We are thinking about adding some parallel rendering support for WebKit, probably mostly platform-dependent code, but the threading support could be reused by different ports.
>
> The plan is to open a rendering thread for each document object. This thread is dedicated to rendering; the resource management is still done by the main thread. In other words, functions like drawRect (in GraphicsContext.h) create a small object, which contains the arguments of the called function, and pass this object to the thread. The thread would do the painting, and send back the object after it is processed. The main thread can free the memory space of the object later, and dereference the resources (we hope resources like fonts and images do not need to be duplicated).
>
> This is still a vague idea, and we are still investigating the possibilities. What is your opinion? Do you know about any major blockers we should know about?

I'm new to WebKit and haven't looked at the details of browsers much, but I would suggest that you at least identify some bottlenecks more specifically and get some idea of how you expect to benefit. Certainly with one core you would have to be skeptical, but even with multiple CPUs you can end up slowing things down pretty quickly, based on some graphs I've seen of what can happen. Obviously, if you are waiting for IO, doing something on another thread may be a good idea, but offhand I'd start by trying to make coding or algorithmic changes and see if anything specific emerges while doing that which suggests a way to multi-thread.

If you really are limited by some CPU-intensive tasks and have a hierarchical memory, you can end up doing more harm through cache thrashing than anything else. In fact, you may be better off making sure any such algorithms are cache-aware and retain memory locality etc. before just throwing more threads at the thing. If you have inner loops that can be optimized or reordered to get better locality, that can be a big issue.

I would also point out that personally I am tired of seeing my disk light come on in Firefox, or even in Iceweasel sometimes, due to what appears to be VM problems and memory leaks (this was even more annoying when my first attempt to write a test program that invoked my WebKit build was dying on a smart or auto ptr assign LOL). So, I'd worry about saving memory first and making things more local, FWIW. Loading up memory with more housekeeping and idle threads may not be a good idea.

> Thanks in advance,
> Zoltan
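A small illustration of the loop-reordering point above: the two functions below do the same arithmetic over a row-major matrix, but only the first walks memory in address order. The `Matrix` type and function names are hypothetical, and how large the speed difference is (if any) depends on the matrix size and the cache hierarchy of the machine, so this is a sketch to measure with, not a benchmark result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix in one contiguous allocation.
struct Matrix {
    std::size_t rows, cols;
    std::vector<int> cells; // element (r, c) lives at cells[r * cols + c]
};

// Inner loop touches adjacent addresses, which suits caches and prefetchers.
long long sumRowMajor(const Matrix& m) {
    assert(m.cells.size() == m.rows * m.cols);
    long long total = 0;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            total += m.cells[r * m.cols + c];
    return total;
}

// Same result, but the inner loop jumps `cols` elements at a time, so for
// large matrices consecutive accesses tend to land on different cache lines.
long long sumColumnMajor(const Matrix& m) {
    assert(m.cells.size() == m.rows * m.cols);
    long long total = 0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            total += m.cells[r * m.cols + c];
    return total;
}
```

Timing both functions on a matrix much larger than the last-level cache is a cheap way to see how much locality matters on a given device before adding threads.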
Re: [webkit-dev] parallel rendering in WebKit
We've been experimenting in our model with various modes of parallel 2D rendering (basic theme: GPU/SMP support rocks) but have a more wide-open design space than WebKit's. For a not-too-painful approach, it looks like Firefox is doing well, and that's even for D2D (not retained mode): check out the last few posts @ http://www.basschouten.com/ . Flash does a bunch as well, but that's less obviously transferable.

Parallelism can be used at multiple levels within the renderer -- SIMD, threads, and/or GPU -- I was actually under the impression that WebKit on the iPhone already uses hardware acceleration for painting. For the latter case, the hardware was made for pushing pixels, so the performance question should be how much of a speedup there is, not whether there is one.

- Leo

[[ A little further out, I've been going over the CSS spec, and found that a lot of the CSS transform stuff maps nicely into OpenGL as it doesn't impact layout; structured extensions like adding shaders to CSS surfaces don't sound too crazy at this point. This seems well beyond the scope of this list, however. ]]

Maciej Stachowiak wrote:

> On Feb 19, 2010, at 2:36 PM, Zoltan Herczeg wrote:
>
>> Hi,
>>
>> as all of you probably know, SMP-based systems are getting widespread even in the embedded domain, and we hope we can speed up WebKit on these systems. We did some profiling, and it seemed the platform-dependent rendering took 50% of the total runtime (at least on our test platforms). We are thinking about adding some parallel rendering support for WebKit, probably mostly platform-dependent code, but the threading support could be reused by different ports.
>>
>> The plan is to open a rendering thread for each document object. This thread is dedicated to rendering; the resource management is still done by the main thread. In other words, functions like drawRect (in GraphicsContext.h) create a small object, which contains the arguments of the called function, and pass this object to the thread. The thread would do the painting, and send back the object after it is processed. The main thread can free the memory space of the object later, and dereference the resources (we hope resources like fonts and images do not need to be duplicated).
>>
>> This is still a vague idea, and we are still investigating the possibilities. What is your opinion? Do you know about any major blockers we should know about?
>
> I would be worried about correctness - if painting is not complete by the time the paint method returns, then you could get flashes of intermediate states showing up onscreen.
>
> I suspect resources like fonts or images *will* need to be duplicated, either that or use thread-safe refcounting and copy-on-write. The internal state of images can be mutated by progress in loading the image, or by ongoing animation.
>
> I'm also curious how this will help overall rendering time. Embedded platforms would normally only be displaying one document at a time, so how will one thread per document help?
>
> Regards,
> Maciej
Re: [webkit-dev] parallel rendering in WebKit
On Feb 19, 2010, at 7:20 PM, Leo Meyerovich wrote:

> We've been experimenting in our model with various modes of parallel 2D rendering (basic theme: GPU/SMP support rocks) but have a more wide-open design space than WebKit's. For a not-too-painful approach, it looks like Firefox is doing well, and that's even for D2D (not retained mode): check out the last few posts @ http://www.basschouten.com/ . Flash does a bunch as well, but that's less obviously transferable.
>
> Parallelism can be used at multiple levels within the renderer -- SIMD, threads, and/or GPU -- I was actually under the impression that WebKit on the iPhone already uses hardware acceleration for painting. For the latter case, the hardware was made for pushing pixels, so the performance question should be how much of a speedup there is, not whether there is one.

iPhone uses hardware acceleration for compositing and scrolling, not for painting per se. It also does most Web content processing on a separate thread from the UI thread, but that is for UI responsiveness, not painting throughput.

> - Leo
>
> [[ A little further out, I've been going over the CSS spec, and found that a lot of the CSS transform stuff maps nicely into OpenGL as it doesn't impact layout; structured extensions like adding shaders to CSS surfaces don't sound too crazy at this point. This seems well beyond the scope of this list, however. ]]

CSS transitions and transforms were designed to support being done in hardware. WebKit supports using hardware acceleration for animation and compositing (currently only really working for the Mac port).

Using graphics APIs that send more work to the GPU (OpenGL, OpenVG, Direct2D, etc.) is definitely a possibility and should not impact anything above the GraphicsContext layer. That's a different kind of change than dispatching individual drawing commands to a background thread though.

Regards,
Maciej