On 07/12/2012 11:25 AM, Raghavendra K T wrote:
>>
>> The problem occurs even with no overcommit at all. One vcpu is in a
>> legitimately long pause loop. All those exits accomplish nothing, since
>> all vcpus are scheduled. Better to let it spin in guest mode.
>>
>
> I agree. One idea is we
On Thu, 2012-07-12 at 11:12 +0300, Avi Kivity wrote:
> On 07/12/2012 05:17 AM, Benjamin Herrenschmidt wrote:
> >> ARM doesn't have an instruction for cpu_relax(), so it can't intercept
> >> it. Given ppc's dislike of overcommit, and the way it implements
> >> cpu_relax() by adjusting hw thread
On Wed, 11 Jul 2012 14:04:03 +0300, Avi Kivity wrote:
>
> > So this would probably improve guests that use cpu_relax, for example
> > stop_machine_run. I have no measurements, though.
>
> smp_call_function() too (though that can be converted to directed yield
> too). It seems worthwhile.
>
On 07/12/2012 01:41 PM, Avi Kivity wrote:
On 07/12/2012 08:11 AM, Raghavendra K T wrote:
Ah, I thought you objected to the CONFIG var. Maybe call it
cpu_relax_intercepted since that's the linuxy name for the instruction.
Ok, just to be on the same page. I'll have:
1. cpu_relax_intercepted
On 07/12/2012 01:45 PM, Avi Kivity wrote:
On 07/11/2012 05:01 PM, Raghavendra K T wrote:
On 07/11/2012 07:29 PM, Raghavendra K T wrote:
On 07/11/2012 02:30 PM, Avi Kivity wrote:
On 07/10/2012 12:47 AM, Andrew Theurer wrote:
For the cpu threads in the host that are actually active (in this
On 07/11/2012 05:01 PM, Raghavendra K T wrote:
> On 07/11/2012 07:29 PM, Raghavendra K T wrote:
>> On 07/11/2012 02:30 PM, Avi Kivity wrote:
>>> On 07/10/2012 12:47 AM, Andrew Theurer wrote:
For the cpu threads in the host that are actually active (in this case
1/2 of them), ~50% of
On 07/12/2012 05:17 AM, Benjamin Herrenschmidt wrote:
>> ARM doesn't have an instruction for cpu_relax(), so it can't intercept
>> it. Given ppc's dislike of overcommit, and the way it implements
>> cpu_relax() by adjusting hw thread priority, I'm guessing it doesn't
>> intercept those either,
On 07/12/2012 08:11 AM, Raghavendra K T wrote:
>> Ah, I thought you objected to the CONFIG var. Maybe call it
>> cpu_relax_intercepted since that's the linuxy name for the instruction.
>>
>
> Ok, just to be on the same page. I'll have:
> 1. cpu_relax_intercepted instead of pause_loop_exited.
>
> 2.
On 07/12/2012 08:11 AM, Raghavendra K T wrote:
Ah, I thought you objected to the CONFIG var. Maybe call it
cpu_relax_intercepted since that's the linuxy name for the instruction.
Ok, just to be on the same page. I'll have:
1. cpu_relax_intercepted instead of pause_loop_exited.
2.
On 07/12/2012 05:17 AM, Benjamin Herrenschmidt wrote:
ARM doesn't have an instruction for cpu_relax(), so it can't intercept
it. Given ppc's dislike of overcommit, and the way it implements
cpu_relax() by adjusting hw thread priority, I'm guessing it doesn't
intercept those either, but I'm
On 07/11/2012 05:01 PM, Raghavendra K T wrote:
On 07/11/2012 07:29 PM, Raghavendra K T wrote:
On 07/11/2012 02:30 PM, Avi Kivity wrote:
On 07/10/2012 12:47 AM, Andrew Theurer wrote:
For the cpu threads in the host that are actually active (in this case
1/2 of them), ~50% of their time is in
On 07/12/2012 01:45 PM, Avi Kivity wrote:
On 07/11/2012 05:01 PM, Raghavendra K T wrote:
On 07/11/2012 07:29 PM, Raghavendra K T wrote:
On 07/11/2012 02:30 PM, Avi Kivity wrote:
On 07/10/2012 12:47 AM, Andrew Theurer wrote:
For the cpu threads in the host that are actually active (in this
On 07/12/2012 01:41 PM, Avi Kivity wrote:
On 07/12/2012 08:11 AM, Raghavendra K T wrote:
Ah, I thought you objected to the CONFIG var. Maybe call it
cpu_relax_intercepted since that's the linuxy name for the instruction.
Ok, just to be on the same page. I'll have:
1. cpu_relax_intercepted
On Wed, 11 Jul 2012 14:04:03 +0300, Avi Kivity a...@redhat.com wrote:
So this would probably improve guests that use cpu_relax, for example
stop_machine_run. I have no measurements, though.
smp_call_function() too (though that can be converted to directed yield
too). It seems
On Thu, 2012-07-12 at 11:12 +0300, Avi Kivity wrote:
On 07/12/2012 05:17 AM, Benjamin Herrenschmidt wrote:
ARM doesn't have an instruction for cpu_relax(), so it can't intercept
it. Given ppc's dislike of overcommit, and the way it implements
cpu_relax() by adjusting hw thread priority,
On 07/12/2012 11:25 AM, Raghavendra K T wrote:
The problem occurs even with no overcommit at all. One vcpu is in a
legitimately long pause loop. All those exits accomplish nothing, since
all vcpus are scheduled. Better to let it spin in guest mode.
I agree. One idea is we can have a
On 07/11/2012 05:09 PM, Avi Kivity wrote:
On 07/11/2012 02:18 PM, Christian Borntraeger wrote:
On 11/07/12 13:04, Avi Kivity wrote:
On 07/11/2012 01:17 PM, Christian Borntraeger wrote:
On 11/07/12 11:06, Avi Kivity wrote:
[...]
Almost all s390 kernels use diag9c (directed yield to a given
On Wed, 2012-07-11 at 14:23 +0300, Avi Kivity wrote:
> On 07/11/2012 02:16 PM, Alexander Graf wrote:
> >>
> >>> yes the data structure itself seems based on the algorithm
> >>> and not on arch specific things. That should work. If we move that to
> >>> common
> >>> code then s390 will use that
> ARM doesn't have an instruction for cpu_relax(), so it can't intercept
> it. Given ppc's dislike of overcommit, and the way it implements
> cpu_relax() by adjusting hw thread priority, I'm guessing it doesn't
> intercept those either, but I'm copying the ppc people in case I'm
> wrong. So it's
On 07/11/2012 07:29 PM, Raghavendra K T wrote:
On 07/11/2012 02:30 PM, Avi Kivity wrote:
On 07/10/2012 12:47 AM, Andrew Theurer wrote:
For the cpu threads in the host that are actually active (in this case
1/2 of them), ~50% of their time is in kernel and ~43% in guest. This
is for a no-IO
On 07/11/2012 02:30 PM, Avi Kivity wrote:
On 07/10/2012 12:47 AM, Andrew Theurer wrote:
For the cpu threads in the host that are actually active (in this case
1/2 of them), ~50% of their time is in kernel and ~43% in guest. This
is for a no-IO workload, so that's just incredible to see so
On 07/11/2012 05:21 PM, Raghavendra K T wrote:
On 07/11/2012 03:47 PM, Christian Borntraeger wrote:
On 11/07/12 11:06, Avi Kivity wrote:
[...]
So there is no win here, but there are other cases where diag44 is
used, e.g. cpu_relax.
I have to double check with others, if these cases are
On 07/11/2012 02:52 PM, Alexander Graf wrote:
>
> On 11.07.2012, at 13:23, Avi Kivity wrote:
>
>> On 07/11/2012 02:16 PM, Alexander Graf wrote:
> yes the data structure itself seems based on the algorithm
> and not on arch specific things. That should work. If we move that to
On 07/11/2012 05:25 PM, Christian Borntraeger wrote:
On 11/07/12 13:51, Raghavendra K T wrote:
Almost all s390 kernels use diag9c (directed yield to a given guest cpu) for
spinlocks, though.
Perhaps x86 should copy this.
See arch/s390/lib/spinlock.c
The basic idea is using several
On 11/07/12 13:51, Raghavendra K T wrote:
Almost all s390 kernels use diag9c (directed yield to a given guest cpu)
for spinlocks, though.
>>>
>>> Perhaps x86 should copy this.
>>
>> See arch/s390/lib/spinlock.c
>> The basic idea is using several heuristics:
>> - loop for a given amount
On 07/11/2012 03:47 PM, Christian Borntraeger wrote:
On 11/07/12 11:06, Avi Kivity wrote:
[...]
Almost all s390 kernels use diag9c (directed yield to a given guest cpu) for
spinlocks, though.
Perhaps x86 should copy this.
See arch/s390/lib/spinlock.c
The basic idea is using several
On 11.07.2012, at 13:23, Avi Kivity wrote:
> On 07/11/2012 02:16 PM, Alexander Graf wrote:
>>>
yes the data structure itself seems based on the algorithm
and not on arch specific things. That should work. If we move that to
common
code then s390 will use that scheme
On 07/11/2012 02:18 PM, Christian Borntraeger wrote:
> On 11/07/12 13:04, Avi Kivity wrote:
>> On 07/11/2012 01:17 PM, Christian Borntraeger wrote:
>>> On 11/07/12 11:06, Avi Kivity wrote:
>>> [...]
> Almost all s390 kernels use diag9c (directed yield to a given guest cpu)
> for
On 07/11/2012 02:16 PM, Alexander Graf wrote:
>>
>>> yes the data structure itself seems based on the algorithm
>>> and not on arch specific things. That should work. If we move that to
>>> common
>>> code then s390 will use that scheme automatically for the cases where we
>>> call
>>>
On 11/07/12 13:04, Avi Kivity wrote:
> On 07/11/2012 01:17 PM, Christian Borntraeger wrote:
>> On 11/07/12 11:06, Avi Kivity wrote:
>> [...]
Almost all s390 kernels use diag9c (directed yield to a given guest cpu)
for spinlocks, though.
>>>
>>> Perhaps x86 should copy this.
>>
>> See
On 11.07.2012, at 13:04, Avi Kivity wrote:
> On 07/11/2012 01:17 PM, Christian Borntraeger wrote:
>> On 11/07/12 11:06, Avi Kivity wrote:
>> [...]
Almost all s390 kernels use diag9c (directed yield to a given guest cpu)
for spinlocks, though.
>>>
>>> Perhaps x86 should copy this.
>>
On 07/11/2012 01:17 PM, Christian Borntraeger wrote:
> On 11/07/12 11:06, Avi Kivity wrote:
> [...]
>>> Almost all s390 kernels use diag9c (directed yield to a given guest cpu)
>>> for spinlocks, though.
>>
>> Perhaps x86 should copy this.
>
> See arch/s390/lib/spinlock.c
> The basic idea is
On 11/07/12 11:06, Avi Kivity wrote:
[...]
>> Almost all s390 kernels use diag9c (directed yield to a given guest cpu) for
>> spinlocks, though.
>
> Perhaps x86 should copy this.
See arch/s390/lib/spinlock.c
The basic idea is using several heuristics:
- loop for a given amount of loops
- check
On 07/09/2012 10:55 AM, Christian Borntraeger wrote:
> On 09/07/12 08:20, Raghavendra K T wrote:
>> Currently Pause Loop Exit (PLE) handler is doing directed yield to a
>> random VCPU on PL exit. Though we already have filtering while choosing
>> the candidate to yield_to, we can do better.
>>
On 07/10/2012 12:47 AM, Andrew Theurer wrote:
>
> For the cpu threads in the host that are actually active (in this case
> 1/2 of them), ~50% of their time is in kernel and ~43% in guest. This
> is for a no-IO workload, so that's just incredible to see so much cpu
> wasted. I feel that
On 07/10/2012 12:47 AM, Andrew Theurer wrote:
For the cpu threads in the host that are actually active (in this case
1/2 of them), ~50% of their time is in kernel and ~43% in guest. This
is for a no-IO workload, so that's just incredible to see so much cpu
wasted. I feel that 2
On 07/09/2012 10:55 AM, Christian Borntraeger wrote:
On 09/07/12 08:20, Raghavendra K T wrote:
Currently Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering while choosing
the candidate to yield_to, we can do better.
Problem
On 11/07/12 11:06, Avi Kivity wrote:
[...]
Almost all s390 kernels use diag9c (directed yield to a given guest cpu) for
spinlocks, though.
Perhaps x86 should copy this.
See arch/s390/lib/spinlock.c
The basic idea is using several heuristics:
- loop for a given amount of loops
- check if the
On 07/11/2012 01:17 PM, Christian Borntraeger wrote:
On 11/07/12 11:06, Avi Kivity wrote:
[...]
Almost all s390 kernels use diag9c (directed yield to a given guest cpu)
for spinlocks, though.
Perhaps x86 should copy this.
See arch/s390/lib/spinlock.c
The basic idea is using several
On 11.07.2012, at 13:04, Avi Kivity wrote:
On 07/11/2012 01:17 PM, Christian Borntraeger wrote:
On 11/07/12 11:06, Avi Kivity wrote:
[...]
Almost all s390 kernels use diag9c (directed yield to a given guest cpu)
for spinlocks, though.
Perhaps x86 should copy this.
See
On 11/07/12 13:04, Avi Kivity wrote:
On 07/11/2012 01:17 PM, Christian Borntraeger wrote:
On 11/07/12 11:06, Avi Kivity wrote:
[...]
Almost all s390 kernels use diag9c (directed yield to a given guest cpu)
for spinlocks, though.
Perhaps x86 should copy this.
See arch/s390/lib/spinlock.c
On 07/11/2012 02:16 PM, Alexander Graf wrote:
yes the data structure itself seems based on the algorithm
and not on arch specific things. That should work. If we move that to
common
code then s390 will use that scheme automatically for the cases where we
call
kvm_vcpu_on_spin(). All
On 07/11/2012 02:18 PM, Christian Borntraeger wrote:
On 11/07/12 13:04, Avi Kivity wrote:
On 07/11/2012 01:17 PM, Christian Borntraeger wrote:
On 11/07/12 11:06, Avi Kivity wrote:
[...]
Almost all s390 kernels use diag9c (directed yield to a given guest cpu)
for spinlocks, though.
Perhaps
On 11.07.2012, at 13:23, Avi Kivity wrote:
On 07/11/2012 02:16 PM, Alexander Graf wrote:
yes the data structure itself seems based on the algorithm
and not on arch specific things. That should work. If we move that to
common
code then s390 will use that scheme automatically for the
On 07/11/2012 03:47 PM, Christian Borntraeger wrote:
On 11/07/12 11:06, Avi Kivity wrote:
[...]
Almost all s390 kernels use diag9c (directed yield to a given guest cpu) for
spinlocks, though.
Perhaps x86 should copy this.
See arch/s390/lib/spinlock.c
The basic idea is using several
On 11/07/12 13:51, Raghavendra K T wrote:
Almost all s390 kernels use diag9c (directed yield to a given guest cpu)
for spinlocks, though.
Perhaps x86 should copy this.
See arch/s390/lib/spinlock.c
The basic idea is using several heuristics:
- loop for a given amount of loops
- check if
On 07/11/2012 05:25 PM, Christian Borntraeger wrote:
On 11/07/12 13:51, Raghavendra K T wrote:
Almost all s390 kernels use diag9c (directed yield to a given guest cpu) for
spinlocks, though.
Perhaps x86 should copy this.
See arch/s390/lib/spinlock.c
The basic idea is using several
On 07/11/2012 02:52 PM, Alexander Graf wrote:
On 11.07.2012, at 13:23, Avi Kivity wrote:
On 07/11/2012 02:16 PM, Alexander Graf wrote:
yes the data structure itself seems based on the algorithm
and not on arch specific things. That should work. If we move that to
common
code then
On 07/11/2012 05:21 PM, Raghavendra K T wrote:
On 07/11/2012 03:47 PM, Christian Borntraeger wrote:
On 11/07/12 11:06, Avi Kivity wrote:
[...]
So there is no win here, but there are other cases where diag44 is
used, e.g. cpu_relax.
I have to double check with others, if these cases are
On 07/11/2012 02:30 PM, Avi Kivity wrote:
On 07/10/2012 12:47 AM, Andrew Theurer wrote:
For the cpu threads in the host that are actually active (in this case
1/2 of them), ~50% of their time is in kernel and ~43% in guest. This
is for a no-IO workload, so that's just incredible to see so
On 07/11/2012 07:29 PM, Raghavendra K T wrote:
On 07/11/2012 02:30 PM, Avi Kivity wrote:
On 07/10/2012 12:47 AM, Andrew Theurer wrote:
For the cpu threads in the host that are actually active (in this case
1/2 of them), ~50% of their time is in kernel and ~43% in guest. This
is for a no-IO
ARM doesn't have an instruction for cpu_relax(), so it can't intercept
it. Given ppc's dislike of overcommit, and the way it implements
cpu_relax() by adjusting hw thread priority, I'm guessing it doesn't
intercept those either, but I'm copying the ppc people in case I'm
wrong. So it's s390
On Wed, 2012-07-11 at 14:23 +0300, Avi Kivity wrote:
On 07/11/2012 02:16 PM, Alexander Graf wrote:
yes the data structure itself seems based on the algorithm
and not on arch specific things. That should work. If we move that to
common
code then s390 will use that scheme automatically
On 07/11/2012 05:09 PM, Avi Kivity wrote:
On 07/11/2012 02:18 PM, Christian Borntraeger wrote:
On 11/07/12 13:04, Avi Kivity wrote:
On 07/11/2012 01:17 PM, Christian Borntraeger wrote:
On 11/07/12 11:06, Avi Kivity wrote:
[...]
Almost all s390 kernels use diag9c (directed yield to a given
On Tue, 2012-07-10 at 17:24 +0530, Raghavendra K T wrote:
> On 07/10/2012 03:17 AM, Andrew Theurer wrote:
> > On Mon, 2012-07-09 at 11:50 +0530, Raghavendra K T wrote:
> >> Currently Pause Loop Exit (PLE) handler is doing directed yield to a
> >> random VCPU on PL exit. Though we already have
On 07/10/2012 03:17 AM, Andrew Theurer wrote:
On Mon, 2012-07-09 at 11:50 +0530, Raghavendra K T wrote:
Currently Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering while choosing
the candidate to yield_to, we can do better.
On 07/10/2012 03:17 AM, Andrew Theurer wrote:
> On Mon, 2012-07-09 at 11:50 +0530, Raghavendra K T wrote:
>> Currently Pause Loop Exit (PLE) handler is doing directed yield to a
>> random VCPU on PL exit. Though we already have filtering while choosing
>> the candidate to yield_to, we can do
On 07/09/2012 01:25 PM, Christian Borntraeger wrote:
On 09/07/12 08:20, Raghavendra K T wrote:
Currently Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering while choosing
the candidate to yield_to, we can do better.
Problem is,
On Tue, 2012-07-10 at 17:24 +0530, Raghavendra K T wrote:
On 07/10/2012 03:17 AM, Andrew Theurer wrote:
On Mon, 2012-07-09 at 11:50 +0530, Raghavendra K T wrote:
Currently Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering
On 07/09/2012 02:20 AM, Raghavendra K T wrote:
Currently Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering while choosing
the candidate to yield_to, we can do better.
Problem is, for large vcpu guests, we have more probability
On Mon, 2012-07-09 at 11:50 +0530, Raghavendra K T wrote:
> Currently Pause Loop Exit (PLE) handler is doing directed yield to a
> random VCPU on PL exit. Though we already have filtering while choosing
> the candidate to yield_to, we can do better.
Hi, Raghu.
> Problem is, for large vcpu
On 09/07/12 08:20, Raghavendra K T wrote:
> Currently Pause Loop Exit (PLE) handler is doing directed yield to a
> random VCPU on PL exit. Though we already have filtering while choosing
> the candidate to yield_to, we can do better.
>
> Problem is, for large vcpu guests, we have more
Currently Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering while choosing
the candidate to yield_to, we can do better.
Problem is, for large vcpu guests, we have more probability of yielding
to a bad vcpu. We are not able to
On 09/07/12 08:20, Raghavendra K T wrote:
Currently Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering while choosing
the candidate to yield_to, we can do better.
Problem is, for large vcpu guests, we have more probability of
On Mon, 2012-07-09 at 11:50 +0530, Raghavendra K T wrote:
Currently Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering while choosing
the candidate to yield_to, we can do better.
Hi, Raghu.
Problem is, for large vcpu guests,
On 07/09/2012 02:20 AM, Raghavendra K T wrote:
Currently Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering while choosing
the candidate to yield_to, we can do better.
Problem is, for large vcpu guests, we have more probability
72 matches