Re: [webkit-dev] arm jit

Gavin Barraclough Wed, 10 Jun 2009 14:53:54 -0700


On Jun 10, 2009, at 1:15 PM, Toshiyasu Morita wrote:

> --- On Wed, 6/10/09, Geoffrey Garen wrote:
>> I'm having a hard time understanding from your comment what optimization changes you think are appropriate, but if you can produce a patch that implements your idea, and shows a benefit on a benchmark, I'd be happy to review it.
> Consider something like op_call.
> This expands out to 95 inline instructions on the MIPS for the slow case alone, of which 3 are function calls to other functions. So this probably requires thousands of clock cycles to execute.
> IMHO it doesn't make sense to inline op_call because:

[ I'm sorry, I've been away from a net connection; I may be replicating a couple of things ggaren & olliej have already said. ]

Okay! First up, have you tried turning off ENABLE_JIT_OPTIMIZE_CALL? If you do so, it should address the majority of your concerns below (specifically, reducing code size, and removing the need for op_call to patch generated code).

Of course, we added the call optimizations because we measured them as a significant performance improvement, but feel free to test whether this is true on your platform, and once the MIPS JIT is in the tree we'd be happy to consider changes to the optimized mode that aid MIPS performance.
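ENABLE_JIT_OPTIMIZE_CALL is a compile-time switch, so trying the unoptimized mode means rebuilding with it turned off. The sketch below shows the general pattern only; DEMO_OPTIMIZE_CALL and callStrategy() are hypothetical names, and the real spelling and default for each platform should be checked in Platform.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a Platform.h-style feature switch. It defaults to
// on unless the build overrides it, e.g. with -DDEMO_OPTIMIZE_CALL=0.
#ifndef DEMO_OPTIMIZE_CALL
#define DEMO_OPTIMIZE_CALL 1
#endif

// Reports which call-compilation strategy this build selected.
// Illustrative only; this is not a WebKit function.
std::string callStrategy() {
#if DEMO_OPTIMIZE_CALL
    // Optimized mode: call sites are linked to their callee by patching code.
    return "patched-direct-call";
#else
    // Unoptimized mode: calls go through shared out-of-line code, so nothing
    // in the generated instruction stream is rewritten after the fact.
    return "shared-stub-call";
#endif
}
```

Because the choice is made by the preprocessor, only one of the two paths is compiled into a given build, which is what lets each mode be benchmarked on its own.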

> 1. It's a huge amount of JIT code just to save three or four instructions at runtime (call, return, and maybe some register shuffling).
> 2. The code which is executed is thousands of instructions, and saving three or four instructions is a microscopic net win.
> 4. It makes the generated machine code MUCH larger because instead of having one copy of this function that is written in C/C++ and statically compiled, there are multiple copies of this code for every instance of op_call, which makes the instruction cache much less effective.
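The code-size argument in points 1 and 4 can be made concrete with a rough model: inlining the slow case costs its full length at every call site, while sharing it costs a short stub per site plus one copy. This is a sketch under stated assumptions: the 95-instruction slow case is the MIPS figure quoted above, and the per-site stub length is a hypothetical parameter, not a measured value.

```cpp
#include <cassert>
#include <cstddef>

// Rough instruction-count model for the slow paths of `sites` call sites.
// slowCaseLen: length of one complete slow-case sequence (~95 on MIPS per the thread).
// stubLen: instructions each site needs to reach shared code (assumed, not measured).

// Every call site carries its own full copy of the slow case.
std::size_t inlinedSlowCaseSize(std::size_t sites, std::size_t slowCaseLen) {
    return sites * slowCaseLen;
}

// Every call site carries a short stub; a single shared copy holds the slow case.
std::size_t sharedSlowCaseSize(std::size_t sites, std::size_t slowCaseLen,
                               std::size_t stubLen) {
    return sites * stubLen + slowCaseLen;
}
```

For example, 1000 call sites give 95,000 instructions when inlined versus 5,095 when shared with an assumed 5-instruction stub. The model says nothing about the fast path, and sharing adds a call and return each time the slow case actually runs.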

I think it's worth making sure you understand the optimization here. The majority of calls can be optimized, and once optimized they only run the sequence of instructions planted in the main generation pass. This code path is only a handful of instructions long, and introducing an extra call and return onto this path would almost certainly degrade performance (feel free to try doing so, and please do submit any patches that provide a memory saving without significantly degrading performance). For such a short and performance-critical fragment of code it could clearly make sense to tweak the code for specific platforms, and doing so may well provide a significant performance benefit. We should certainly consider such patches.

The slow-case JIT code is much longer, and less frequently executed. Introducing a call and return here to share code between calls definitely makes sense. The way you know we think that: the JIT already works this way! The slow cases call out to a set of shared trampolines generated in privateCompileCTIMachineTrampolines. This is, however, a work in progress, and we are currently still clearly generating far more code than we should be in the slow cases. More work should be done to unify the pre-link and post-link slow-case states, and to move work into the trampolines (this is something I may be looking at again fairly soon).
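The split between a short inline fast path and a shared slow path can be modelled in plain C++. This is a simplified sketch, not WebKit's implementation: Function, CallSite, callThroughSite and sharedCallSlowPath are hypothetical names, "linking" is a pointer store standing in for patching generated code, and the sketch simply relinks on a miss rather than handling polymorphic call sites.

```cpp
#include <cassert>

// Hypothetical callee: a pointer to its compiled entry point.
struct Function {
    int (*entry)(int);
};

// Hypothetical linkable call site.
struct CallSite {
    const Function* linkedCallee = nullptr;  // callee this site is linked to
    int slowPathHits = 0;                    // times the shared slow path ran
};

// Shared "trampoline": one copy serves every call site. In a real JIT this is
// where the expensive work happens (checks, compiling the callee, patching the
// site); here it just records the hit and links the site.
int sharedCallSlowPath(CallSite& site, const Function& callee, int arg) {
    ++site.slowPathHits;
    site.linkedCallee = &callee;  // stands in for patching the generated code
    return callee.entry(arg);
}

// The short sequence each call site runs inline: compare, then call directly.
int callThroughSite(CallSite& site, const Function& callee, int arg) {
    if (site.linkedCallee == &callee)
        return callee.entry(arg);                 // fast path: no extra hop
    return sharedCallSlowPath(site, callee, arg); // miss: go to shared code
}
```

After the first call links the site, repeated calls to the same callee never touch the shared code, which is why an extra call and return is cheap to add on the slow path but costly on the fast one.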

It is certainly valid to question whether the work performed by the machine trampolines is better done in JIT-generated code, or in C++ code that the compiler can optimize. In the early stages of its development the JIT was more a context-threaded interpreter, calling out to C++ to perform almost all operations. We have migrated work into JIT-generated code only where it has been a performance benefit to do so. Of course, that doesn't mean that we always got it right, or that the trade-offs haven't changed, or that the policy might not need to be tweaked on different platforms. Please feel free to experiment, and if you can produce patches that reduce the amount of work done in these JIT-generated trampolines while improving performance then we'll be hugely appreciative (in fact, it needn't even be a performance win here; anything that doesn't degrade performance could be a nice simplification).

> 5. The generated machine code is weakly optimized, so instead of having calling code which is well optimized by the C/C++ compiler for MIPS, it is executing weakly optimized dynamically generated code. Since the code is weakly optimized, it is also much larger than it should be, which also makes the instruction cache much less effective.
> 6. The JIT-generated code resides in the data cache, and must be flushed to main memory, then the instruction cache must be invalidated so the new code will load into the instruction cache. Because the WebKit JIT seems to do lazy compilation of functions at call time (instead of compiling all the functions in one pass), this requires the data cache to be flushed and the instruction cache to be invalidated every time a new function is generated, which further degrades performance. This type of code generation strategy is OK for processors with unified caches (or pseudo-unified on x86), but for RISC machines with separate instruction and data caches, it's really awful.
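The flush-then-invalidate step described in point 6 is what GCC and Clang expose as `__builtin___clear_cache`: after new machine code is written through the data cache, the range is synchronized so instruction fetch sees it. A minimal sketch, assuming a GCC or Clang toolchain; publishCode is a hypothetical helper, and a real JIT would write into executable memory rather than the ordinary buffer used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies freshly generated machine code into `dest` and makes it visible to
// instruction fetch. On split-cache targets such as ARM or MIPS the builtin
// writes back the data cache and invalidates the instruction cache for the
// range; on x86, whose caches are kept coherent, it emits no code.
void publishCode(char* dest, const unsigned char* code, std::size_t len) {
    std::memcpy(dest, code, len);               // the stores land in the D-cache
    __builtin___clear_cache(dest, dest + len);  // sync the I-cache with the new bytes
}
```

Each lazily compiled function pays this once per publish, which is the recurring cost the point above is concerned with.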

Naturally on ARMv7 we face the same issue, and the costs associated with cache flushing are significantly outweighed by the performance improvements provided by the associated optimizations. There is, however, a cost here, and one that we are certainly interested in reducing. There is potential to coalesce cache flush operations to reduce the overhead. For some of the values that are patched it may make sense to replace the instruction patching with constant pool loads, to make the values cheaper to update (of course, having a constant pool available to the code may be beneficial on all platforms, and is something we would be interested in introducing in a cross-platform fashion).
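Coalescing flushes, as suggested above, could take the form of recording each patched address range and later issuing one flush per merged run of overlapping or adjacent ranges. This is a hedged sketch: FlushBatcher and Range are hypothetical, and the flush callback stands in for whatever platform cache-maintenance call would actually be used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Half-open address range [begin, end) of patched instructions.
struct Range {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Records patches, then flushes each merged run of overlapping or adjacent
// ranges once instead of flushing after every individual patch.
class FlushBatcher {
public:
    explicit FlushBatcher(std::function<void(Range)> flushFn)
        : flushFn_(std::move(flushFn)) {}

    void record(std::uintptr_t begin, std::uintptr_t end) {
        pending_.push_back({begin, end});
    }

    // Issues the batched flushes and returns how many were issued.
    int flush() {
        std::sort(pending_.begin(), pending_.end(),
                  [](const Range& a, const Range& b) { return a.begin < b.begin; });
        int issued = 0;
        std::size_t i = 0;
        while (i < pending_.size()) {
            Range merged = pending_[i++];
            // Absorb every following range that overlaps or touches this one.
            while (i < pending_.size() && pending_[i].begin <= merged.end) {
                merged.end = std::max(merged.end, pending_[i].end);
                ++i;
            }
            flushFn_(merged);
            ++issued;
        }
        pending_.clear();
        return issued;
    }

private:
    std::function<void(Range)> flushFn_;
    std::vector<Range> pending_;
};
```

Merging only overlapping or adjacent ranges never flushes bytes that were not patched; a variant could also merge ranges separated by small gaps, trading a slightly larger flush for fewer operations.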

Of course, it may not prove possible to make the optimizations that are currently implemented through code patching make sense on all platforms. For this reason (and to assist in bringing up new platforms) there are #defines in Platform.h to allow the patching optimizations to be disabled. We will be happy to accept performance improvements to the non-patching code paths.
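One way a non-patching path could work is the constant-pool idea mentioned above: the call target lives in a data word that generated code loads before calling indirectly, so retargeting is an ordinary store and needs no instruction-cache maintenance. The C++ below only models that behaviour; PooledCallSite, invoke and retarget are hypothetical names, and the extra load per call is the price paid for avoiding code patching.

```cpp
#include <cassert>

// Hypothetical compiled entry point type.
using Entry = int (*)(int);

// Models a call site whose target is read from a constant-pool slot instead
// of being encoded in the instruction stream.
struct PooledCallSite {
    Entry target;  // the data word generated code would load from
};

// Models "load the slot, then call through a register".
int invoke(const PooledCallSite& site, int arg) {
    return site.target(arg);
}

// Retargeting writes data memory only; no instructions are modified, so no
// instruction-cache flush is required.
void retarget(PooledCallSite& site, Entry newTarget) {
    site.target = newTarget;
}
```

Usage: construct a site with one target, call it, then `retarget` it and call again; the second call reaches the new target without any code having been rewritten.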


cheers,
G.

_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
