[Qemu-devel] [PATCH for-next 0/8] Improve tcg ldst optimization

Richard Henderson Mon, 05 Aug 2013 11:17:58 -0700

The existing code for the i386 ldst optimization does

        jmps    .+5
        jmpl    restart
        jmpl    restart


for the store path.  This is idiotic, to say the least, especially
for x86_64, where we have parameter registers available.  We replace
that with a simple

        leaq    restart(%rip), %rdx

and we're also able to discard all of the code in the _mmu path that
decodes that "jmpl restart" to find the return address.
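
To illustrate the idea of handing the return address to the helper as an
ordinary argument, here is a toy sketch in C.  It is not QEMU's helper
interface: ToyEnv, toy_helper_st and the retaddr parameter are
hypothetical names, and the "memory" is a 16-entry array.  The only point
is that the helper receives the address directly and never has to look at
the instruction stream to find it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU state for illustration; not a QEMU structure. */
typedef struct {
    uint32_t  mem[16];        /* toy guest memory */
    uintptr_t fault_retaddr;  /* host address recorded by the slow path */
} ToyEnv;

/*
 * Hypothetical slow-path store helper.  The caller supplies retaddr
 * explicitly (on x86_64 the generated code could form it with a
 * pc-relative lea into a parameter register), so the helper does not
 * need to decode any instruction to discover it.
 */
static void toy_helper_st(ToyEnv *env, unsigned idx, uint32_t val,
                          uintptr_t retaddr)
{
    assert(idx < 16);
    env->fault_retaddr = retaddr;  /* kept so a fault can be attributed */
    env->mem[idx] = val;
}
```

A caller would invoke it as toy_helper_st(&env, 3, value, addr), where
addr stands in for the address of the "restart" label.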

For arm, we have no free parameter registers, but we can generate a
conditional call instruction *into* the slow path, and then tail-call
from the slow path into the generic code.  This gets us the return
address set up exactly as we'd like, with the restriction that we
must instruct TCG to use the return value register for all loads.
This turns out to not be much of a restriction in practice.
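
The call-then-tail-call shape can be modelled loosely in C.  This is a
sketch under assumptions, not the code the ARM backend emits: ToyCpu,
toy_generic_ld, toy_slow_ld and toy_ld are hypothetical names, and
__builtin_return_address is a GCC/Clang extension used here to stand in
for the link register.  The slow path is entered by a call, so its own
return address is already the one we want; it forwards that address to
the generic helper, and the loaded value comes back as the return value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy state; not a QEMU structure. */
typedef struct {
    uint32_t  mem[16];
    uintptr_t seen_retaddr;  /* return address the generic helper received */
} ToyCpu;

/* Stand-in for a generic load helper taking an explicit return address. */
static uint32_t toy_generic_ld(ToyCpu *cpu, unsigned idx, uintptr_t retaddr)
{
    assert(idx < 16);
    cpu->seen_retaddr = retaddr;
    return cpu->mem[idx];
}

/*
 * Stand-in for the per-access slow path.  Because it is reached by a
 * call, its return address identifies the calling code.  It passes that
 * address on and finishes with a call the compiler may emit as a tail
 * call.  noinline keeps it a real function so the builtin is meaningful.
 */
__attribute__((noinline))
static uint32_t toy_slow_ld(ToyCpu *cpu, unsigned idx)
{
    uintptr_t ra = (uintptr_t)__builtin_return_address(0);
    return toy_generic_ld(cpu, idx, ra);
}

/* Stand-in for the fast path: only a miss calls into the slow path. */
static uint32_t toy_ld(ToyCpu *cpu, unsigned idx, int tlb_hit)
{
    if (tlb_hit) {
        return cpu->mem[idx % 16];
    }
    return toy_slow_ld(cpu, idx);  /* value arrives as the return value */
}
```

On a hit, toy_ld never touches seen_retaddr; on a miss, the generic
helper sees a non-null address without any extra parameter register
having been set up by the fast path.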


r~


Richard Henderson (8):
  tcg-i386: Add and use tcg_out64
  tcg-i386: Try pc-relative lea for constant formation
  tcg-i386: Tidy qemu_ld/st slow path
  tcg: Add mmu helpers that take a return address argument
  tcg: Tidy softmmu_template.h
  tcg-i386: Use new return-argument ld/st helpers
  tcg-arm: Use ldrd/strd for appropriate qemu_ld/st64
  tcg-arm: Rearrange slow-path qemu_ld/st

 include/exec/exec-all.h         |  36 +----
 include/exec/softmmu_defs.h     |  46 +++---
 include/exec/softmmu_template.h | 309 +++++++++++++++------------------------
 tcg/arm/tcg-target.c            | 313 ++++++++++++++++++++++------------------
 tcg/i386/tcg-target.c           | 259 +++++++++++++++------------------
 tcg/tcg.c                       |   6 +
 6 files changed, 449 insertions(+), 520 deletions(-)

-- 
1.8.3.1
