I have submitted a patch to the QEMU devel list on implementing a
victim TLB in QEMU. I have CC'ed the two of you on the patch email so
that you can help review the patch in case no one else is reviewing it. The
name of the patch is
name of the patch is
[Qemu-devel] [PATCH] cpu: implementing victim TLB for QEMU system emulate
On 01/22/2014 09:35 AM, Peter Maydell wrote:
> I don't really know the details of Alpha, but can you get away with just
> "we implement N contexts, and only actually keep the most recently
> used N"? This is effectively what we're doing at the moment, with N==1.
Yes, I suppose we could do that.
On 01/22/2014 08:55 AM, Peter Maydell wrote:
> Has anybody ever looked at implementing proper TLB contexts?
I've thought about it. The best I could come up with is a pointer within ENV
that points to the current TLB context. It definitely adds another load insn
on the fast path, but we should be
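The extra load Richard mentions can be seen in a small sketch. All names here are illustrative, not QEMU's actual structures: the point is only that the fast path must dereference env->tlb before it can index the table, while a context switch becomes a single pointer store.

```c
#include <stdint.h>
#include <stddef.h>

#define TLB_SIZE 256

typedef struct TLBEntry {
    uintptr_t addr_read;
    uintptr_t addend;
} TLBEntry;

typedef struct TLBContext {
    TLBEntry table[TLB_SIZE];
} TLBContext;

typedef struct CPUArchState {
    TLBContext *tlb;        /* current context: one extra load on the fast path */
    TLBContext contexts[4]; /* N most-recently-used contexts kept in memory */
} CPUArchState;

/* Fast path: must first load env->tlb, then index into its table. */
static inline TLBEntry *tlb_entry(CPUArchState *env, uintptr_t vaddr)
{
    return &env->tlb->table[(vaddr >> 12) & (TLB_SIZE - 1)];
}

/* Context switch becomes a single pointer store instead of a TLB flush. */
static inline void tlb_switch_context(CPUArchState *env, int ctx)
{
    env->tlb = &env->contexts[ctx];
}
```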
On 22 January 2014 15:28, Xin Tong wrote:
> On Wed, Nov 27, 2013 at 8:12 PM, Richard Henderson wrote:
>> I'd be interested to experiment with different TLB sizes, to see what effect
>> that has on performance. But I suspect that lack of TLB contexts means that
>> we wind up flushing the TLB m
On 01/22/2014 07:28 AM, Xin Tong wrote:
> Can you tell me whether ARM is the only architecture that requires
> special treatment for increasing the TLB size beyond 256 entries, so that I
> can whip up a patch for the QEMU mainline.
The major constraint for the non-arm ports is
CPU_TLB_ENTRY_SIZE +
Hi
I have found that adding a small (8-entry) fully associative victim
TLB (http://en.wikipedia.org/wiki/Victim_Cache) before the refill path
(page table walking) improves the performance of QEMU x86_64 system
emulation mode significantly on the specint2006 benchmarks. This is
primarily due to the
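The victim TLB described above can be sketched as follows. This is a minimal model with made-up names (SoftTLB, victim_tlb_lookup, tlb_refill); the actual patch works on QEMU's CPUTLBEntry machinery. The idea: on a main-TLB miss, scan the small fully associative victim array before walking page tables, and on a victim hit swap the entry back into the direct-mapped table.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdbool.h>

#define TLB_BITS   8
#define TLB_SIZE   (1 << TLB_BITS)
#define VTLB_SIZE  8
#define PAGE_BITS  12
#define PAGE_MASK  (~((uintptr_t)(1 << PAGE_BITS) - 1))

typedef struct {
    uintptr_t vaddr;   /* page-aligned tag */
    uintptr_t addend;  /* host - guest offset for the page */
} TLBEntry;

typedef struct {
    TLBEntry tlb[TLB_SIZE];      /* direct-mapped main TLB */
    TLBEntry victim[VTLB_SIZE];  /* small fully associative victim TLB */
    unsigned vclock;             /* round-robin victim replacement */
} SoftTLB;

/* On a main-TLB miss, scan the victim TLB before the page table walk.
 * On a victim hit, swap entries so the hot one is back in the
 * direct-mapped table. Returns false if a page walk is still needed. */
static bool victim_tlb_lookup(SoftTLB *s, uintptr_t vaddr)
{
    uintptr_t tag = vaddr & PAGE_MASK;
    size_t idx = (vaddr >> PAGE_BITS) & (TLB_SIZE - 1);
    for (int i = 0; i < VTLB_SIZE; i++) {
        if (s->victim[i].vaddr == tag) {
            TLBEntry tmp = s->tlb[idx];  /* evicted main entry ...   */
            s->tlb[idx] = s->victim[i];
            s->victim[i] = tmp;          /* ... becomes a new victim */
            return true;
        }
    }
    return false;  /* miss in both: caller does the page table walk */
}

/* Refill after a page walk: the displaced main-TLB entry is saved in
 * the victim TLB instead of being thrown away. */
static void tlb_refill(SoftTLB *s, uintptr_t vaddr, uintptr_t addend)
{
    size_t idx = (vaddr >> PAGE_BITS) & (TLB_SIZE - 1);
    s->victim[s->vclock++ % VTLB_SIZE] = s->tlb[idx];
    s->tlb[idx].vaddr = vaddr & PAGE_MASK;
    s->tlb[idx].addend = addend;
}
```

With an 8-entry victim array the scan is a handful of compares, far cheaper than the full refill path, which is why it pays off whenever conflict misses in the direct-mapped table are common.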
Why is the QEMU TLB organized based on the modes? E.g. on x86 there are 3
modes. What I think is that there may be conflicts between virtual
addresses and physical addresses, and organizing it by mode guarantees
that QEMU does not hit a physical-address translation entry when in
user mode, and vice versa?
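One way to read the per-mode layout, sketched with illustrative names (QEMU's cputlb similarly indexes a per-mode table by an mmu_idx): the mode selects which table is searched, so the same virtual address can never pick up an entry installed under a different mode.

```c
#include <stdint.h>
#include <stddef.h>

#define NB_MMU_MODES 3   /* e.g. the 3 modes on x86 mentioned above */
#define CPU_TLB_SIZE 256

typedef struct {
    uintptr_t addr_read;
    uintptr_t addend;
} TLBEntry;

typedef struct {
    /* one direct-mapped table per MMU mode */
    TLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];
} CPUArchState;

static inline TLBEntry *tlb_lookup(CPUArchState *env, int mmu_idx,
                                   uintptr_t vaddr)
{
    /* Same virtual address, different mode => different entry:
     * a user-mode access cannot hit a kernel-mode translation. */
    return &env->tlb_table[mmu_idx][(vaddr >> 12) & (CPU_TLB_SIZE - 1)];
}
```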
On 11/28/2013 04:12 AM, Richard Henderson wrote:
2. Why not use a TLB of bigger size? Currently the TLB has 1 << 8 entries. The
TLB lookup is ~10 x86 instructions, but every miss needs ~450 instructions; I
measured this using Intel PIN. So even if the miss rate is low (say 3%), the
overall time spent
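Plugging in the numbers above: at a 3% miss rate the average lookup costs roughly 23 instructions, and more than half of that is miss handling. A sketch of the arithmetic (the function name is illustrative):

```c
/* Expected instructions per lookup, given a hit cost, a miss cost,
 * and a miss rate: (1 - m) * hit + m * miss. */
static double avg_lookup_cost(double miss_rate, double hit_cost,
                              double miss_cost)
{
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
}
```

With hit = 10, miss = 450 and m = 0.03 this gives 0.97 * 10 + 0.03 * 450 = 23.2 instructions, of which 13.5 come from misses, which is why shrinking the miss rate (or the miss cost) matters more than shaving the hit path.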
Xin Tong writes:
> Hi Lluís
> we can probably generate vector intrinsics using the tcg, e.g. add support to
> tcg to emit vector instructions directly in code cache
There was some discussion long ago about adding vector instructions to TCG, but
I don't remember what the conclusion was.
Also reme
Hi Lluís
We can probably generate vector intrinsics using the TCG, e.g. add support
to TCG to emit vector instructions directly into the code cache.
Why would a larger TLB make some operations slower? The TLB is a
direct-mapped hash and lookup should be O(1) there. In cputlb, the
CPU_TLB_SIZE is a
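The O(1) claim can be made concrete: a direct-mapped lookup is one shift and one mask, so its cost does not depend on the table size. A sketch using QEMU-style constant names (illustrative, not the actual cputlb code):

```c
#include <stdint.h>
#include <stddef.h>

#define TARGET_PAGE_BITS 12
#define CPU_TLB_BITS     8
#define CPU_TLB_SIZE     (1 << CPU_TLB_BITS)

/* Direct-mapped index: shift out the page offset, mask to table size.
 * Growing CPU_TLB_SIZE only widens the mask; the lookup stays O(1). */
static inline size_t tlb_index(uintptr_t vaddr)
{
    return (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
}
```

The costs that do scale with the table size are elsewhere: flushing the TLB means clearing every entry, so a bigger table makes each flush proportionally more expensive.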
I am trying to implement an out-of-line TLB lookup for QEMU softmmu-x86-64
on an x86-64 machine, potentially for better instruction cache performance, and
I have a few questions.
1. I see that tcg_out_qemu_ld_slow_path/tcg_out_qemu_st_slow_path are
generated when tcg_out_tb_finalize is called. And when a
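For context, the shape being discussed, an inline fast-path tag compare with an out-of-line slow-path call, can be sketched in C. All names here are illustrative stand-ins for what the TCG backend actually emits as machine code; helper_ld_slow is a stub for the real refill helper that walks the page tables.

```c
#include <stdint.h>
#include <stddef.h>

#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_MASK (~((uintptr_t)((1 << TARGET_PAGE_BITS) - 1)))
#define CPU_TLB_SIZE     256

typedef struct {
    uintptr_t addr_read;  /* page-aligned tag, or all-ones if invalid */
    uintptr_t addend;     /* host_addr = guest_addr + addend */
} TLBEntry;

/* Stand-in for the real slow-path helper (page walk + TLB refill). */
static uint8_t helper_ld_slow(TLBEntry *tlb, uintptr_t vaddr)
{
    (void)tlb; (void)vaddr;
    return 0;  /* real code would walk page tables, refill, and retry */
}

static uint8_t qemu_ld_ub(TLBEntry *tlb, uintptr_t vaddr)
{
    TLBEntry *e = &tlb[(vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1)];
    if ((vaddr & TARGET_PAGE_MASK) == e->addr_read) {
        /* fast path: stays inline in the translated block */
        return *(uint8_t *)(vaddr + e->addend);
    }
    /* slow path: out-of-line call, kept off the hot instruction stream */
    return helper_ld_slow(tlb, vaddr);
}
```

Moving the slow-path call stubs out of line is what the instruction-cache argument is about: the hot translated code then contains only the compare-and-branch, not the call setup for the rare miss case.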