On 21/02/2023 7:58 am, Xenia Ragiadakou wrote:
>
> On 2/21/23 01:08, Andrew Cooper wrote:
>> On 17/02/2023 6:48 pm, Xenia Ragiadakou wrote:
>>> Remove the forward declaration of struct vcpu because it is not used.
>>
>> Huh, turns out that was my fault in c/s b158e72abe, shortly after I
>> introduced them in the first place.
>>
>> Also, looking into that, there's one legitimate use of svm.h from
>> outside, which is svm_load_segs*(), meaning we can't make all the
>> headers local.
>>
>> But still, most of svm.h shouldn't be includable in the rest of Xen.
>> Perhaps we can make a separate dedicated header for just this.
>>
>> [edit]  And svm_{domain,vcpu}.  Perhaps we want a brand new
>> include/asm/hvm/svm.h with only the things needed elsewhere.
>
> This can be done as part of the series aiming to make svm/vmx support
> configurable.

Ok, that's fine.

Honestly, there's a lot of cleanup which can be done.  We probably want
to end up making a number of $foo-types.h headers so we can construct
struct vcpu/domain without xen/sched.h including the majority of Xen in
one fell swoop.
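
For svm specifically, I'd imagine something along these lines (a
sketch only; all header and field names here are illustrative rather
than a concrete proposal):

    /* Hypothetical asm/hvm/svm/svm-types.h: just the state that struct
     * vcpu embeds, so xen/sched.h can see the type without dragging in
     * the rest of svm.h. */
    struct svm_vcpu {
        struct vmcb_struct *vmcb;   /* pointer, so the full VMCB layout
                                     * needn't be visible here */
        paddr_t vmcb_pa;
        /* ... */
    };

    /* Hypothetical new include/asm/hvm/svm.h: only what the rest of
     * Xen actually uses, i.e. the svm_load_segs*() declarations and
     * very little else. */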

>
>>
>> That said, we do need to stea^W borrow adaptive PLE, and make the
>
> I cannot understand what you mean by "stea^W borrow adaptive PLE".

Pause Loop Exiting is tricky to get right.

The common expectation is that PLE hits in a spinlock or other mutual
exclusion primitive.

It is generally good to let the vCPU spin for a bit, in the expectation
that the other vCPU holding the lock will release it in a timely
fashion.  Spinning for a few iterations (even a few hundred) is far
lower overhead than taking a vmexit.
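
On SVM the knob for this is the PAUSE Filter in the VMCB.  A rough
sketch of programming it, using a stand-in struct and illustrative
numbers (real code would go via the VMCB accessors and check CPUID for
PauseFilter/PauseFilterThreshold support):

    #include <stdint.h>

    #define GENERAL1_INTERCEPT_PAUSE (1u << 23)  /* illustrative bit */

    /* Stand-in for the relevant corner of the real VMCB. */
    struct vmcb_sketch {
        uint32_t general1_intercepts;
        uint16_t pause_filter_thresh;
        uint16_t pause_filter_count;
    };

    static void ple_setup(struct vmcb_sketch *vmcb)
    {
        vmcb->general1_intercepts |= GENERAL1_INTERCEPT_PAUSE;
        vmcb->pause_filter_count = 3000;  /* PAUSEs tolerated before a
                                           * vmexit is taken */
        vmcb->pause_filter_thresh = 400;  /* gap (in cycles) below which
                                           * successive PAUSEs count as
                                           * the same spin loop */
    }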

But if the other vCPU isn't executing, it can never release the lock,
and letting the current vCPU spin does double damage because it consumes
the domain's scheduler credit, which in turn pushes out the reschedule
of the vCPU that's actually holding the lock.  (This is why paravirt
spinlocks are so useful in virt.  If you get them right, you end up with
only the vcpus that can usefully do work to release a lock executing.)
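
To make that concrete (a sketch only; hv_block()/hv_kick() are
hypothetical stand-ins for the real block/kick hypercalls, and
SPIN_THRESHOLD would want tuning):

    #include <stdatomic.h>

    #define SPIN_THRESHOLD 1024            /* illustrative */

    typedef atomic_flag pv_lock_t;

    extern void hv_block(pv_lock_t *l);    /* sleep until kicked */
    extern void hv_kick(pv_lock_t *l);     /* wake one blocked vCPU */

    static void pv_lock(pv_lock_t *l)
    {
        unsigned int spins = 0;

        while ( atomic_flag_test_and_set_explicit(l, memory_order_acquire) )
        {
            if ( ++spins < SPIN_THRESHOLD )
            {
                __builtin_ia32_pause();    /* cheap; stays in the guest */
                continue;
            }
            /* The holder is probably descheduled: block in the
             * hypervisor rather than burning our own credit. */
            hv_block(l);
            spins = 0;
        }
    }

    static void pv_unlock(pv_lock_t *l)
    {
        atomic_flag_clear_explicit(l, memory_order_release);
        hv_kick(l);                        /* wake any blocked waiter */
    }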


For overall system performance, there is a tradeoff between how long
you let a vCPU spin and when you force it to yield.  The break-even
point depends on the typical spinlock contention inside the guest and
on overall system busyness, both of which vary over time.

Picking fixed numbers for PLE is better than not having PLE in the first
place, but only just.  There is an algorithm called adaptive-PLE which
tries to balance the PLE settings over time to optimise overall system
performance.
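
I haven't dug into the details recently, but the usual shape is a
grow/shrink heuristic on the filter count, loosely like this (all
names and constants are illustrative, not lifted from any particular
implementation):

    struct ple_state {
        unsigned int window;       /* current PAUSE-exit threshold */
    };

    #define PLE_WINDOW_MIN     4096
    #define PLE_WINDOW_MAX   262144
    #define PLE_WINDOW_GROW       2
    #define PLE_WINDOW_SHRINK     2

    /* On a PLE vmexit: we may have trapped on a short, healthy spin,
     * so widen the window before the vCPU runs again. */
    static void ple_grow(struct ple_state *p)
    {
        unsigned int w = p->window * PLE_WINDOW_GROW;
        p->window = (w > PLE_WINDOW_MAX) ? PLE_WINDOW_MAX : w;
    }

    /* After a quiet period with no PLE exits: contention has eased,
     * so shrink back towards the minimum and catch futile spinning
     * sooner next time. */
    static void ple_shrink(struct ple_state *p)
    {
        unsigned int w = p->window / PLE_WINDOW_SHRINK;
        p->window = (w < PLE_WINDOW_MIN) ? PLE_WINDOW_MIN : w;
    }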

~Andrew
