Thanks everyone for your feedback. Detailed responses inline.

On Wed, Jan 9, 2013 at 9:41 PM, Filip Pizlo <fpi...@apple.com> wrote:
> I think your biggest challenge will be ensuring that the latency of shoving
> things to another core and then shoving them back will be smaller than the
> latency of processing those same things on the main thread.

Yes. That's something we know we have to worry about. Given that we need to retain the ability to parse HTML on the main thread to handle document.write and innerHTML, we should be able to easily do A/B comparisons to make sure we understand any performance trade-offs that might arise.

> For small documents, I expect concurrent tokenization to be a pure regression
> because the latency of waking up another thread to do just a small bit of
> work, plus the added cost of whatever synchronization operations will be
> needed to ensure safety, will involve more total work than just tokenizing
> locally.

Once we have the ability to tokenize on a background thread, we can examine cases like these and heuristically decide at runtime whether to use the background thread. As I wrote above, we'll need to retain main-thread parsing anyway, so keeping the option to optimize these cases shouldn't add any new constraints to the design.

> We certainly see this in the JSC parallel GC, and in line with traditional
> parallel GC design, we ensure that parallel threads only kick in when the
> main thread is unable to keep up with the work that it has created for itself.
>
> Do you have a vision for how to implement a similar self-throttling, where
> tokenizing continues on the main thread so long as it is cheap to do so?

It's certainly something we can tune in the optimization phase. I don't think we need a particular vision to be able to do it. Given that we want to implement speculative parsing (to replace preload scanning; more on this below), we'll already have the ability to checkpoint and restore the tokenizer state across threads. Once we have that primitive, it's easy to decide whether to continue tokenization on the main thread or on a background thread.

On Wed, Jan 9, 2013 at 10:04 PM, Ian Hickson <i...@hixie.ch> wrote:
> Parsing and (maybe to a lesser extent) compiling JS can be moved off the
> main thread, though, right?
> That's probably worth examining too, if it
> hasn't already been done.

Yes, once we have the tokenizer running on a background thread, that opens up the possibility of parsing other sorts of data on the background thread as well. For example, when the tokenizer encounters an inline script block, you could imagine parsing the script on the background thread as well so that the main thread has less work to do. (You could also imagine making these optimizations without a background tokenizer, but the design constraints would be a bit different.)

On Thu, Jan 10, 2013 at 12:11 AM, Zoltan Herczeg <zherc...@webkit.org> wrote:
> Parsing, especially JS parsing still takes a large amount of time on page
> loading. We tried to improve the preload scanner by moving it into
> anouther thread, but there was no gain (except some special cases).
> Synchronization between threads is surprisingly (ridiculously) costly,
> usually worth for those tasks, which needs quite a few million
> instructions to be executed (and tokenization takes far less in most
> cases). For smaller tasks, SIMD instruction sets can help, which is
> basically a parallel execution on a single thread. Anyway it is worth
> trying, but it is really challenging to make it work in practice. Good
> luck!

This is something we're worried about and will need to be careful about. In the design we're proposing, preload scanning is replaced by speculative parsing, so the overhead of the preload scanner is removed entirely. The way this works is as follows:

When running on the background thread, the tokenizer produces a queue of PickledTokens. As these tokens are queued, we can scan them to kick off any preloads that we find. Whenever the tokenizer queues a token that creates a new insertion point (in the terminology of the HTML specification), the tokenizer checkpoints itself but continues tokenizing speculatively.
(Notice that tokens produced in this situation are still scanned for preloads but might never actually result in DOM being constructed.) After the main thread has processed the token that created the insertion point, if no characters were inserted, the main thread continues processing the PickledTokens that were created speculatively. If some characters were inserted, the main thread instead instructs the tokenizer to roll back to that checkpoint and continue tokenizing in a new state. In this case, the queue of speculative tokens is discarded.

Notice that in the common case, we execute JavaScript and tokenize in parallel, something that's not possible with a main-thread tokenizer. Once the script is done executing, we expect it to be common to be able to resume tree building immediately, as the tokenizer will have already tokenized much of the subsequent data.

On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak <m...@apple.com> wrote:
> I presume from your other comments that the goal of this work is
> responsiveness, rather than page load speed as such. I'm excited about the
> potential to improve responsiveness during page loading.

The goals are described in the first link Eric gave in his email: <https://bugs.webkit.org/show_bug.cgi?id=106127#c0>. Specifically:

---8<---
1) Moving parsing off the main thread could make web pages more responsive because the main thread is available for handling input events and executing JavaScript.

2) Moving parsing off the main thread could make web pages load more quickly because WebCore can do other work in parallel with parsing HTML (such as parsing CSS or attaching elements to the render tree).
--->8---

> One question: what tests are you planning to use to validate whether this
> approach achieves its goals of better responsiveness?

The tests we've run so far are also described in the first link Eric gave in his email: <https://bugs.webkit.org/show_bug.cgi?id=106127>.
They suggest that there's a good deal of room for improvement in this area. After we have a working implementation, we'll likely re-run those experiments and run other experiments to do an A/B comparison of the two approaches. As Filip points out, we'll likely end up with a hybrid of the two designs that's optimized for handling various workloads.

> The reason I ask is that this sounds like a significant increase in
> complexity, so we should be very confident that there is a real and major
> benefit. One thing I wonder about is how common it is to have enough of the
> page processed that the user could interact with it in principle, yet still
> have large parsing chunks remaining which would prevent that interaction from
> being smooth.

If you're interested in reducing the complexity of the parser, I'd recommend removing the NEW_XML code. As previously discussed, that code creates significant complexity for zero benefit.

> Another thing I wonder about is whether yielding to the event loop more
> aggressively could achieve a similar benefit at a much lower complexity cost.

Yielding to the event loop more often could reduce the "ParseHTML_max" time, but it cannot reduce the "ParseHTML" time. Generally speaking, yielding to the event loop is a trade-off between throughput (i.e., page load time) and responsiveness. Moving work to a background thread should let us achieve a better trade-off between these quantities than we're likely to be able to achieve by tuning the yield parameter alone.

> Having a test to drive the work would allow us to answer these types of
> questions. (It may also be that the test data you cited would already answer
> these questions but I didn't sufficiently understand it; if so, further
> explanation would be appreciated.)

If you're interested in building such a test, I would be interested in hearing the results. We don't plan to build such a test at this time.

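To make the "ParseHTML" versus "ParseHTML_max" point above a bit more concrete, here is a toy Python model of main-thread parsing that yields after every chunk of tokens. It is only a sketch: the function name, the record type, and all of the token counts and costs are made up for illustration and are not measurements or WebKit code.

```python
import math
from collections import namedtuple

# Hypothetical result record; the field names borrow the metric names from
# this thread, but the model and every number fed into it are assumptions.
ParseCost = namedtuple("ParseCost", "parse_html parse_html_max yield_overhead")


def main_thread_parse_cost(total_tokens, tokens_per_chunk,
                           cost_per_token_us, yield_cost_us):
    """Model main-thread parsing that yields after every chunk of tokens."""
    chunks = math.ceil(total_tokens / tokens_per_chunk)
    # Total parsing work does not depend on how often we yield.
    parse_html = total_tokens * cost_per_token_us
    # The longest uninterrupted stretch is bounded by the chunk size.
    parse_html_max = min(tokens_per_chunk, total_tokens) * cost_per_token_us
    # Each yield between chunks costs some throughput.
    yield_overhead = max(chunks - 1, 0) * yield_cost_us
    return ParseCost(parse_html, parse_html_max, yield_overhead)


# Same document, yielding rarely versus yielding often (made-up numbers).
coarse = main_thread_parse_cost(10000, 5000, 2, 100)
fine = main_thread_parse_cost(10000, 100, 2, 100)
```

In this model, yielding more often shrinks parse_html_max and grows yield_overhead, while parse_html stays the same. That is the trade-off that tuning the yield parameter alone cannot escape.
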
On Thu, Jan 10, 2013 at 1:44 AM, Antti Koivisto <koivi...@iki.fi> wrote:
> When loading web pages we are very frequently in a situation where we
> already have the source data (HTML text here but the same applies to
> preloaded Javascript, CSS, images, ...) and know we are likely to need it in
> soon, but can't actually utilize it for indeterminate time. This happens
> because pending external JS resources blocks the main parser (and pending
> CSS resources block JS execution) for web compatibility reasons. In this
> situation it makes sense to start processing resources we have to forms that
> are faster to use when they are eventually actually needed (like token
> stream here).

Indeed.

> One thing we already do when the main parser gets blocked is preload
> scanning. We look through the unparsed HTML source we have and trigger loads
> for any resources found. It would be beneficial if this happened off the
> main thread. We could do it when new data arrives in parallel with JS
> execution and other time consuming engine work, potentially triggering
> resource loads earlier.

A couple of people have tried to move preload scanning to a background thread, but they haven't had much success. Given that moving the parser to a background thread gets us background preload scanning for free, I don't think it's worth investing effort in moving just the preload scanner anymore.

> I think a good first step here would be to share the tokens between the
> preload scanner and the main parser and worry about the threading part
> afterwards. We often parse the HTML source more or less twice so this is an
> unquestionable win.

We've discussed doing that for a number of years, but no one has actually succeeded in doing it. Given that moving the parsing to a background thread gets us token reuse for free (because of the switch from preload scanning to speculative tokenization), I don't think it's worth investing effort in reusing the preload scanner's tokens anymore.

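For anyone who finds code easier to follow than prose, the checkpoint-and-rollback flow described earlier in this mail can be sketched roughly as below. This is a single-threaded Python toy, not the proposed implementation: ToyTokenizer, tokenize_ahead, parse, the whitespace-separated "tokens", and the "img:" preload convention are all hypothetical stand-ins, and plain strings take the place of PickledTokens.

```python
from collections import deque


class ToyTokenizer:
    """Hypothetical stand-in for the background tokenizer (not WebKit code).

    Tokens are whitespace-separated words, and the "checkpoint" is just the
    read position; a real tokenizer would have to save much more state.
    """

    def __init__(self, source):
        self.words = source.split()
        self.position = 0

    def next_token(self):
        if self.position >= len(self.words):
            return None
        token = self.words[self.position]
        self.position += 1
        return token

    def checkpoint(self):
        return self.position

    def rollback(self, checkpoint, inserted_text):
        # Splice the script's output in at the insertion point and resume there.
        self.words[checkpoint:checkpoint] = inserted_text.split()
        self.position = checkpoint


def tokenize_ahead(tokenizer, queue, preloads):
    """Queue every remaining token, scanning each one for preloads."""
    while True:
        token = tokenizer.next_token()
        if token is None:
            return
        if token.startswith("img:") and token not in preloads:
            preloads.append(token)
        # A script token creates an insertion point, so remember where we were.
        checkpoint = tokenizer.checkpoint() if token == "<script>" else None
        queue.append((token, checkpoint))


def parse(source, run_script):
    """Return (tree, preloads); run_script() returns any document.write text."""
    tokenizer = ToyTokenizer(source)
    queue, preloads, tree = deque(), [], []
    tokenize_ahead(tokenizer, queue, preloads)
    while queue:
        token, checkpoint = queue.popleft()
        tree.append(token)
        if checkpoint is None:
            continue
        inserted_text = run_script()
        if inserted_text:
            # Speculation failed: drop the speculative tokens and re-tokenize.
            queue.clear()
            tokenizer.rollback(checkpoint, inserted_text)
            tokenize_ahead(tokenizer, queue, preloads)
    return tree, preloads


# The first script writes nothing; the second writes two extra tokens.
outputs = iter(["", "written img:c.png"])
tree, preloads = parse("<p> <script> img:a.png <script> img:b.png end",
                       lambda: next(outputs, ""))
```

In this run the first script inserts nothing, so the speculative tokens are consumed as queued. The second script inserts text, so the queue is discarded and tokenization restarts from the checkpoint. The preload for img:b.png was still kicked off during speculation, before the main thread reached it.
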
Adam
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev