Re: [Qemu-devel] outlined TLB lookup on x86

2014-01-22 Thread Xin Tong
I have submitted a patch to the QEMU devel list that implements a victim TLB in QEMU. I should have CC'ed you two on the patch email so that you can help review the patch in case no one else is reviewing it. The name of the patch is [Qemu-devel] [PATCH] cpu: implementing victim TLB for QEMU system emulate

Re: [Qemu-devel] outlined TLB lookup on x86

2014-01-22 Thread Richard Henderson
On 01/22/2014 09:35 AM, Peter Maydell wrote: > I don't really know the details of Alpha, but can you get away with just > "we implement N contexts, and only actually keep the most recently > used N"? This is effectively what we're doing at the moment, with N==1. Yes, I suppose we could do that. R

Re: [Qemu-devel] outlined TLB lookup on x86

2014-01-22 Thread Richard Henderson
On 01/22/2014 08:55 AM, Peter Maydell wrote: > Has anybody ever looked at implementing proper TLB contexts? I've thought about it. The best I could come up with is a pointer within ENV that points to the current TLB context. It definitely adds another load insn on the fast path, but we should be
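The "pointer within ENV" idea might be sketched roughly as follows. This is only an illustration, not QEMU code: `CpuEnv`, `TlbContext`, `env_switch`, the `asid` field, and the pool size of 4 are all hypothetical names and values chosen for the example. The sketch keeps a small pool of per-address-space TLBs and only flushes when a context has to be recycled (least recently used first).

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CONTEXTS 4            /* hypothetical pool size */
#define CTX_TLB_SIZE 256          /* entries per context (assumed) */
#define NO_ASID      UINT32_MAX   /* marks an unused context */
#define INVALID_TAG  UINT64_MAX   /* marks an empty TLB entry */

typedef struct {
    uint32_t asid;                /* hypothetical guest address-space id */
    uint64_t last_used;           /* logical timestamp for LRU recycling */
    uint64_t tags[CTX_TLB_SIZE];  /* stand-in for the real TLB entries */
} TlbContext;

typedef struct {
    TlbContext pool[NUM_CONTEXTS];
    TlbContext *cur;              /* the extra pointer the fast path would load */
    uint64_t clock;
    unsigned flushes;             /* counts how often a context was wiped */
} CpuEnv;

static void ctx_flush(TlbContext *c)
{
    for (size_t i = 0; i < CTX_TLB_SIZE; i++) {
        c->tags[i] = INVALID_TAG;
    }
}

static void env_init(CpuEnv *env)
{
    for (size_t i = 0; i < NUM_CONTEXTS; i++) {
        env->pool[i].asid = NO_ASID;
        env->pool[i].last_used = 0;
        ctx_flush(&env->pool[i]);
    }
    env->cur = &env->pool[0];
    env->clock = 0;
    env->flushes = 0;
}

/* Switch address space: reuse a live context if one exists,
 * otherwise flush and recycle the least recently used one. */
static void env_switch(CpuEnv *env, uint32_t asid)
{
    TlbContext *lru = &env->pool[0];

    for (size_t i = 0; i < NUM_CONTEXTS; i++) {
        TlbContext *c = &env->pool[i];
        if (c->asid == asid) {
            c->last_used = ++env->clock;
            env->cur = c;         /* no flush needed */
            return;
        }
        if (c->last_used < lru->last_used) {
            lru = c;
        }
    }
    ctx_flush(lru);
    env->flushes++;
    lru->asid = asid;
    lru->last_used = ++env->clock;
    env->cur = lru;
}
```

With a pool size of 1 this degenerates to flushing on every address-space switch; a larger pool trades memory for fewer flushes, at the price of the extra pointer load mentioned in the message.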

Re: [Qemu-devel] outlined TLB lookup on x86

2014-01-22 Thread Peter Maydell
On 22 January 2014 17:32, Richard Henderson wrote: > On 01/22/2014 08:55 AM, Peter Maydell wrote: >> Has anybody ever looked at implementing proper TLB contexts? > > I've thought about it. The best I could come up with is a pointer within ENV > that points to the current TLB context. It definite

Re: [Qemu-devel] outlined TLB lookup on x86

2014-01-22 Thread Peter Maydell
On 22 January 2014 15:28, Xin Tong wrote: > On Wed, Nov 27, 2013 at 8:12 PM, Richard Henderson wrote: >> I'd be interested to experiment with different TLB sizes, to see what effect >> that has on performance. But I suspect that lack of TLB contexts mean that >> we >> wind up flushing the TLB m

Re: [Qemu-devel] outlined TLB lookup on x86

2014-01-22 Thread Richard Henderson
On 01/22/2014 07:28 AM, Xin Tong wrote: > Can you tell me whether ARM is the only architecture that requires > special treatment for increasing tlb size beyond 256 entries so that i > can whip up a patch to the QEMU mainline. The major constraint for the non-arm ports is CPU_TLB_ENTRY_SIZE +

Re: [Qemu-devel] outlined TLB lookup on x86

2014-01-22 Thread Xin Tong
On Wed, Nov 27, 2013 at 8:12 PM, Richard Henderson wrote: > On 11/27/2013 08:41 PM, Xin Tong wrote: >> I am trying to implement a out-of-line TLB lookup for QEMU softmmu-x86-64 on >> x86-64 machine, potentially for better instruction cache performance, I have >> a >> few questions. >> >> 1. I se

Re: [Qemu-devel] outlined TLB lookup on x86

2014-01-21 Thread Peter Maydell
On 21 January 2014 14:22, Xin Tong wrote: > I have found that adding a small (8-entry) fully associative victim > TLB (http://en.wikipedia.org/wiki/Victim_Cache) before the refill path > (page table walking) improves the performance of QEMU x86_64 system > emulation mode significantly on the speci

Re: [Qemu-devel] outlined TLB lookup on x86

2014-01-21 Thread Xin Tong
Hi, I have found that adding a small (8-entry) fully associative victim TLB (http://en.wikipedia.org/wiki/Victim_Cache) before the refill path (page table walking) improves the performance of QEMU x86_64 system emulation mode significantly on the specint2006 benchmarks. This is primarily due to the
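For readers unfamiliar with victim caches, here is a minimal sketch of the general technique applied to a direct-mapped software TLB. It is not the submitted patch: `SoftTlb`, `TlbEntry`, `tlb_fill`, `tlb_lookup`, the 4 KiB page size, and the round-robin replacement policy are assumptions made for illustration. Only the 8-entry victim size and the 1<<8 main TLB size come from the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS   12            /* assumed 4 KiB guest pages */
#define TLB_BITS    8             /* 1 << 8 main entries, as in the thread */
#define TLB_SIZE    (1u << TLB_BITS)
#define VTLB_SIZE   8             /* small fully associative victim TLB */
#define INVALID_TAG UINT64_MAX    /* never equals a page-aligned address */
#define PAGE_MASK   (~((UINT64_C(1) << PAGE_BITS) - 1))

typedef struct {
    uint64_t tag;                 /* page-aligned guest virtual address */
    uint64_t addend;              /* hypothetical translation payload */
} TlbEntry;

typedef struct {
    TlbEntry main_tlb[TLB_SIZE];  /* direct-mapped */
    TlbEntry victim[VTLB_SIZE];   /* searched linearly on a main miss */
    unsigned victim_next;         /* round-robin replacement cursor */
} SoftTlb;

static void tlb_reset(SoftTlb *t)
{
    for (size_t i = 0; i < TLB_SIZE; i++) t->main_tlb[i].tag = INVALID_TAG;
    for (size_t i = 0; i < VTLB_SIZE; i++) t->victim[i].tag = INVALID_TAG;
    t->victim_next = 0;
}

static size_t tlb_index(uint64_t vaddr)
{
    return (vaddr >> PAGE_BITS) & (TLB_SIZE - 1);
}

/* Install a translation; a valid entry it displaces moves to the victim TLB. */
static void tlb_fill(SoftTlb *t, uint64_t vaddr, uint64_t addend)
{
    TlbEntry *e = &t->main_tlb[tlb_index(vaddr)];
    if (e->tag != INVALID_TAG) {
        t->victim[t->victim_next] = *e;
        t->victim_next = (t->victim_next + 1) % VTLB_SIZE;
    }
    e->tag = vaddr & PAGE_MASK;
    e->addend = addend;
}

/* Returns true on a hit in either structure. A victim hit swaps the
 * entry back into the main TLB so the next access takes the fast path. */
static bool tlb_lookup(SoftTlb *t, uint64_t vaddr, uint64_t *addend)
{
    uint64_t tag = vaddr & PAGE_MASK;
    TlbEntry *e = &t->main_tlb[tlb_index(vaddr)];

    if (e->tag == tag) {
        *addend = e->addend;
        return true;
    }
    for (size_t i = 0; i < VTLB_SIZE; i++) {
        if (t->victim[i].tag == tag) {
            TlbEntry tmp = *e;
            *e = t->victim[i];
            t->victim[i] = tmp;
            *addend = e->addend;
            return true;
        }
    }
    return false;                 /* caller would fall back to the page table walk */
}
```

The victim array only helps when two hot pages collide on the same direct-mapped index; in that case the pair keeps swapping between the two structures instead of repeatedly taking the full refill path.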

Re: [Qemu-devel] outlined TLB lookup on x86

2013-12-17 Thread Xin Tong
Why is the QEMU TLB organized by mode, e.g. on x86 there are 3 modes? What I think is that there may be conflicts between virtual addresses and physical addresses, and organizing it by mode guarantees that QEMU does not hit a physical address translation entry when in user mode and vice versa?

Re: [Qemu-devel] outlined TLB lookup on x86

2013-12-17 Thread Xin Tong
On Sun, Dec 8, 2013 at 2:54 AM, Xin Tong wrote: > > > > On Thu, Nov 28, 2013 at 8:12 AM, Lluís Vilanova wrote: >> >> Xin Tong writes: >> >> > Hi LIuis >> > we can probably generate vector intrinsics using the tcg, e.g. add >> > support to >> > tcg to emit vector instructions directly in code cach

Re: [Qemu-devel] outlined TLB lookup on x86

2013-12-09 Thread Lluís Vilanova
Xin Tong writes: > On Thu, Nov 28, 2013 at 8:12 AM, Lluís Vilanova wrote: > Xin Tong writes: >> Hi LIuis >> we can probably generate vector intrinsics using the tcg, e.g. add support > to >> tcg to emit vector instructions directly in code cache > There was some discuss

Re: [Qemu-devel] outlined TLB lookup on x86

2013-12-09 Thread Xin Tong
On Thu, Nov 28, 2013 at 8:12 AM, Lluís Vilanova wrote: > Xin Tong writes: > > > Hi LIuis > > we can probably generate vector intrinsics using the tcg, e.g. add > support to > > tcg to emit vector instructions directly in code cache > > There was some discussion long ago about adding vector instru

Re: [Qemu-devel] outlined TLB lookup on x86

2013-12-08 Thread Avi Kivity
On 11/28/2013 04:12 AM, Richard Henderson wrote: 2. Why not use a TLB of bigger size? Currently the TLB has 1<<8 entries. The TLB lookup is 10 x86 instructions, but every miss needs ~450 instructions; I measured this using Intel PIN. So even if the miss rate is low (say 3%) the overall time spent
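The snippet is cut off before its conclusion, but the quoted figures can be combined in a back-of-the-envelope estimate. The model below is an assumption, not something stated in the message: every access pays the 10-instruction lookup, a miss adds roughly 450 instructions on top, and the 3% miss rate is the hypothetical value from the quote.

```latex
\begin{align*}
E[\text{instructions per access}] &\approx 10 + 0.03 \times 450 = 10 + 13.5 = 23.5 \\
\text{share spent on misses} &\approx \frac{13.5}{23.5} \approx 57\%
\end{align*}
```

Under these assumptions, more than half of the TLB-related instruction count would come from the 3% of accesses that miss, which is the usual argument for reducing the miss rate or the miss cost.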

Re: [Qemu-devel] outlined TLB lookup on x86

2013-12-08 Thread Xin Tong
On Thu, Nov 28, 2013 at 8:12 AM, Lluís Vilanova wrote: > Xin Tong writes: > > > Hi LIuis > > we can probably generate vector intrinsics using the tcg, e.g. add > support to > > tcg to emit vector instructions directly in code cache > > There was some discussion long ago about adding vector instru

Re: [Qemu-devel] outlined TLB lookup on x86

2013-11-28 Thread Lluís Vilanova
Xin Tong writes: > Hi LIuis > we can probably generate vector intrinsics using the tcg, e.g. add support to > tcg to emit vector instructions directly in code cache There was some discussion long ago about adding vector instructions to TCG, but I don't remember what was the conclusion. Also reme

Re: [Qemu-devel] outlined TLB lookup on x86

2013-11-27 Thread Xin Tong
On Wed, Nov 27, 2013 at 6:12 PM, Richard Henderson wrote: > On 11/27/2013 08:41 PM, Xin Tong wrote: > > I am trying to implement a out-of-line TLB lookup for QEMU > softmmu-x86-64 on > > x86-64 machine, potentially for better instruction cache performance, I > have a > > few questions. > > > > 1

Re: [Qemu-devel] outlined TLB lookup on x86

2013-11-27 Thread Richard Henderson
On 11/27/2013 08:41 PM, Xin Tong wrote: > I am trying to implement a out-of-line TLB lookup for QEMU softmmu-x86-64 on > x86-64 machine, potentially for better instruction cache performance, I have a > few questions. > > 1. I see that tcg_out_qemu_ld_slow_path/tcg_out_qemu_st_slow_path are > gen

Re: [Qemu-devel] outlined TLB lookup on x86

2013-11-27 Thread Xin Tong
Hi Lluís, we can probably generate vector intrinsics using the tcg, e.g. add support to tcg to emit vector instructions directly in code cache. Why would a larger TLB make some operations slower? The TLB is a direct-mapped hash and lookup should be O(1) there. In the cputlb, the CPU_TLB_SIZE is a

Re: [Qemu-devel] outlined TLB lookup on x86

2013-11-27 Thread Lluís Vilanova
Xin Tong writes: > I am trying to implement a out-of-line TLB lookup for QEMU softmmu-x86-64 on > x86-64 machine, potentially for better instruction cache performance, I have a > few questions. > 1. I see that tcg_out_qemu_ld_slow_path/tcg_out_qemu_st_slow_path are > generated > when tcg_out_tb_

[Qemu-devel] outlined TLB lookup on x86

2013-11-26 Thread Xin Tong
I am trying to implement an out-of-line TLB lookup for QEMU softmmu-x86-64 on an x86-64 machine, potentially for better instruction cache performance. I have a few questions. 1. I see that tcg_out_qemu_ld_slow_path/tcg_out_qemu_st_slow_path are generated when tcg_out_tb_finalize is called. And when a