So I was curious about TCG and bored at the same time. :)

This series achieves 2-6% improvements in all three benchmarks that I
tried ("dc -e10000k1000vp", 1500 digits of pi with bc, SHA-1 of 100 MB
of random data), so it must not be that bad.

It improves all three aspects of handling x86 condition codes: the good,
the bad, and the ugly.

The good is the things that are fast.  This series makes more things fast.
ZF, SF, CF and PF can be computed using TCG ops most of the time, while
remaining lazy.  They are cheap, so repeated computation is not a problem.
Only computations that require the overflow flag need the compute_eflags
helper.  Jumps whose result is computed in a previous basic block also
do, but even this could be eliminated in a future patch for ZF/SF/PF.
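
To illustrate what "remaining lazy" means here, below is a minimal
standalone sketch in plain C.  It is not the translate.c code: LazyAdd,
lazy_add and the flag_* functions are hypothetical names, and it only
covers a 32-bit ADD.  The instruction saves its result and one operand;
each flag is derived from those two values only when it is asked for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lazy state after a 32-bit ADD: only the result and the
 * first operand are saved; no flag is computed up front. */
typedef struct {
    uint32_t dst;   /* result of the addition */
    uint32_t src;   /* first operand */
} LazyAdd;

static inline LazyAdd lazy_add(uint32_t a, uint32_t b)
{
    LazyAdd cc = { .dst = a + b, .src = a };
    return cc;
}

/* Cheap flags: one comparison or shift on the saved values. */
static inline bool flag_zf(LazyAdd cc) { return cc.dst == 0; }
static inline bool flag_sf(LazyAdd cc) { return cc.dst >> 31; }
static inline bool flag_cf(LazyAdd cc) { return cc.dst < cc.src; }

/* PF is set when the low byte of the result has an even number of 1s. */
static inline bool flag_pf(LazyAdd cc)
{
    uint32_t b = cc.dst & 0xff;
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return !(b & 1);
}

/* OF needs both operands: recover the second one from the result, then
 * check whether both operands differ in sign from the result. */
static inline bool flag_of(LazyAdd cc)
{
    uint32_t src2 = cc.dst - cc.src;
    return ((cc.src ^ cc.dst) & (src2 ^ cc.dst)) >> 31;
}
```

The first four are short enough that recomputing them is no worse than
reading a cached value; the overflow flag is the one that takes more
work.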

The bad is the things that are slow.  After this series QEMU does slow
things less.  The result of the compute_eflags helper is cached, because
the CC_OP_EFLAGS state can produce just as fast code as the specialized
states---the only cost is setting it up the first time EFLAGS is needed,
and that's unavoidable.
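
A rough sketch of the caching idea, again as standalone C and not the
code in this series: CcState, OP_ADD32, compute_all_add32 and get_eflags
are hypothetical names, the slow_calls counter exists only to make the
effect visible, and AF is left out for brevity.  The first request for
EFLAGS pays for the full computation and switches the state to
OP_EFLAGS; every later request just returns the stored bits.

```c
#include <assert.h>
#include <stdint.h>

/* x86 EFLAGS bit positions (AF omitted in this sketch). */
enum { CC_C = 0x0001, CC_P = 0x0004, CC_Z = 0x0040,
       CC_S = 0x0080, CC_O = 0x0800 };

typedef enum { OP_ADD32, OP_EFLAGS } CcOp;

typedef struct {
    CcOp op;              /* how to interpret dst/src */
    uint32_t dst, src;    /* operands, or cached flags in src */
    unsigned slow_calls;  /* how often the full computation ran */
} CcState;

/* Stand-in for the slow helper: computes every flag of a 32-bit ADD. */
static inline uint32_t compute_all_add32(uint32_t dst, uint32_t src1)
{
    uint32_t src2 = dst - src1;
    uint32_t p = dst & 0xff;
    uint32_t fl = 0;

    p ^= p >> 4;
    p ^= p >> 2;
    p ^= p >> 1;
    if (dst < src1)  fl |= CC_C;
    if (!(p & 1))    fl |= CC_P;
    if (dst == 0)    fl |= CC_Z;
    if (dst >> 31)   fl |= CC_S;
    if (((src1 ^ dst) & (src2 ^ dst)) >> 31) fl |= CC_O;
    return fl;
}

/* Compute the flags at most once, then keep them in the state. */
static inline uint32_t get_eflags(CcState *s)
{
    if (s->op != OP_EFLAGS) {
        s->src = compute_all_add32(s->dst, s->src);
        s->op = OP_EFLAGS;
        s->slow_calls++;
    }
    return s->src;
}
```

Once the state is OP_EFLAGS, testing any flag is a single AND with the
stored value, which is why it can be as fast as the specialized states.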

The ugly is the things that are rare.  This series may make rare things
slower, but at the same time it makes them simpler.  For example, shifts
result in a CC_OP_EFLAGS state instead of CC_OP_DYNAMIC.  This may cost
an extra call to compute EFLAGS, but it also exposes more optimization
opportunities downstream.

I won't really have time to look at this further, but there is more
low-hanging fruit.  For example, implementing (and using) movcond should
be easier after this series.  I'm looking at you, Richard...

If anybody can help testing it less cursorily than I did, that would be
great.

Paolo

Paolo Bonzini (14):
  i386: use OT_* consistently
  i386: introduce gen_ext_tl
  i386: factor setting of s->cc_op handling for string functions
  i386: drop cc_op argument of gen_jcc1
  i386: move eflags computation closer to gen_op_set_cc_op
  i386: factor gen_op_set_cc_op/tcg_gen_discard_tl around computing flags
  i386: add helper functions to get other flags
  i386: do not compute eflags multiple times consecutively
  i386: do not call helper to compute ZF/SF
  i386: use inverted setcond when computing NS or NZ
  i386: convert gen_compute_eflags_c to TCG
  i386: change gen_setcc_slow_T0 to gen_setcc_slow
  i386: optimize setbe
  i386: optimize setcc instructions

 target-i386/cc_helper.c          | 118 -------
 target-i386/cc_helper_template.h |  76 -----
 target-i386/helper.h             |   1 -
 target-i386/translate.c          | 691 +++++++++++++++++++++------------------
 4 files changed, 365 insertions(+), 521 deletions(-)

-- 
1.7.12.1

