The MMU (TLB) is placed within CPUNegativeOffsetState, which means that
fields at the end of that struct get the smallest negative offsets from
the env pointer (see the comment for struct CPUTLB).
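To illustrate the layout effect (not part of the patch): below is a
minimal stand-alone C sketch. FastTLB and NegState are simplified
stand-in structs, not QEMU's real definitions; the sketch only shows
why array entries near the end of a struct placed just before the env
pointer end up at the smallest negative offsets.

#include <stdio.h>

#define NB_MMU_MODES 16

typedef struct FastTLB {    /* simplified stand-in for a per-MMU-index TLB entry */
    unsigned long mask;
    unsigned long table;
} FastTLB;

typedef struct NegState {   /* simplified stand-in for CPUNegativeOffsetState */
    FastTLB f[NB_MMU_MODES];
} NegState;

int main(void)
{
    /* Assume NegState sits immediately before the address the env
     * pointer (%rbp in the generated code) refers to, so f[n] lives at
     * offset n * sizeof(FastTLB) - sizeof(NegState) from that pointer. */
    for (int n = 0; n < NB_MMU_MODES; n++) {
        long off = (long)(n * sizeof(FastTLB)) - (long)sizeof(NegState);
        printf("mmu_idx %2d -> offset from env %ld\n", n, off);
    }
    /* Indexes close to NB_MMU_MODES-1 yield offsets in [-128, 0), which
     * x86-64 can encode as a 1-byte instead of a 4-byte displacement. */
    return 0;
}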
But in target/cpu.h usually MMU indexes in the range 0-8 are used, which
means that the negative offsets are bigger than if MMU indexes 9-15 had
been used instead.

This patch inverts the given MMU index, so that the MMU indices now
count down from (NB_MMU_MODES - 1) to 0 and TCG thus sees smaller
negative offsets.

Up to now, for every memory access in the guest, the x86-64 TCG backend
generated:

IN:
0x000ebdf5:  8b 04 24                 movl     (%esp), %eax

OUT:
...
0x003619:  48 23 bd 10 ff ff ff     andq     -0xf0(%rbp), %rdi
0x003620:  48 03 bd 18 ff ff ff     addq     -0xe8(%rbp), %rdi
...

With the smaller negative offsets it will now generate instead:

OUT:
...
0x003499:  48 23 7d c0              andq     -0x40(%rbp), %rdi
0x00349d:  48 03 7d c8              addq     -0x38(%rbp), %rdi

So, every memory access in the guest now saves 6 bytes (= 2 * 3) of
instruction code in the fast path.

Overall, this patch reduces the generated instruction size by ~3% and
may improve overall performance.

Signed-off-by: Helge Deller <del...@gmx.de>
---
 include/exec/cpu-defs.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 07bcdd38b2..7ba0481bc4 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -62,8 +62,13 @@
 
 /*
  * MMU_INDEX() helper to specify MMU index.
+ *
+ * Invert the number here to count downwards from NB_MMU_MODES-1 to 0. Since
+ * the MMU is placed within CPUNegativeOffsetState, this makes the negative
+ * offsets smaller, for which the tcg backend will generate shorter
+ * instruction sequences to access the MMU.
  */
-#define MMU_INDEX(n) (n)
+#define MMU_INDEX(n) (NB_MMU_MODES - 1 - (n))
 
 #if defined(CONFIG_SOFTMMU) && defined(CONFIG_TCG)
 #include "exec/tlb-common.h"
-- 
2.41.0
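For illustration only: with the inverted macro, a target header can keep
listing its indexes in ascending order while the stored values count
down from the top. The names MMU_KERNEL_IDX and MMU_USER_IDX below are
made-up examples, not taken from any real target header:

/* hypothetical target/cpu.h excerpt, assuming NB_MMU_MODES == 16 */
#define MMU_KERNEL_IDX  MMU_INDEX(0)   /* stored as 15, within disp8 range */
#define MMU_USER_IDX    MMU_INDEX(1)   /* stored as 14 */

The point of the design is that target code is unchanged at its
definition sites; only the numeric value behind each index moves toward
NB_MMU_MODES-1, where the TLB entry's negative offset from the env
pointer fits in a one-byte x86-64 displacement.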