The existing code for the i386 ldst optimization does

	jmps .+5
	jmpl restart
	jmpl restart

for the store path.  This is idiotic to say the least, especially for
x86_64, where we have parameter registers available.  We replace that
with a simple

	leaq restart(%rip), %rdx

and we're also able to discard all of the code in the _mmu path that
decodes that "jmpl restart" to find the return address.

For arm, we have no free parameter registers, but we can generate a
conditional call instruction *into* the slow path, and then tail-call
from the slow path into the generic code.  This gets us the return
address set up exactly as we'd like, with the restriction that we must
instruct TCG to use the return value register for all loads.  This
turns out not to be much of a restriction in practice.


r~


Richard Henderson (8):
  tcg-i386: Add and use tcg_out64
  tcg-i386: Try pc-relative lea for constant formation
  tcg-i386: Tidy qemu_ld/st slow path
  tcg: Add mmu helpers that take a return address argument
  tcg: Tidy softmmu_template.h
  tcg-i386: Use new return-argument ld/st helpers
  tcg-arm: Use ldrd/strd for appropriate qemu_ld/st64
  tcg-arm: Rearrange slow-path qemu_ld/st

 include/exec/exec-all.h         |  36 +----
 include/exec/softmmu_defs.h     |  46 +++---
 include/exec/softmmu_template.h | 309 +++++++++++++++------------------------
 tcg/arm/tcg-target.c            | 313 ++++++++++++++++++++++------------------
 tcg/i386/tcg-target.c           | 259 +++++++++++++++------------------
 tcg/tcg.c                       |   6 +
 6 files changed, 449 insertions(+), 520 deletions(-)

-- 
1.8.3.1