On 11/08/15 18:05, Tim Deegan wrote:
Hi,
At 17:51 +0100 on 11 Aug (1439315508), Ben Catterall wrote:
On 11/08/15 10:55, Tim Deegan wrote:
At 11:14 +0100 on 10 Aug (1439205273), Andrew Cooper wrote:
On 10/08/15 10:49, Tim Deegan wrote:
Hi,
At 17:45 +0100 on 06 Aug (1438883118), Ben Catterall wrote:
The process to switch into and out of deprivileged mode can be likened to
setjmp/longjmp.
To enter deprivileged mode, we take a copy of the stack from the guest's
registers up to the current stack pointer.
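To make the setjmp/longjmp analogy concrete, here is a minimal user-space sketch. It is not Xen code: `priv_ctx`, `depriv_body` and `run_deprivileged` are hypothetical names, and no privilege change or stack copy takes place. It only shows the control flow of saving a context, running other code, and jumping back to the saved point.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical names throughout; a user-space analogy only. */
static jmp_buf priv_ctx;   /* saved "privileged" context */
static int depriv_calls;   /* how many times the depriv body has run */

/* Stand-in for deprivileged code: does its work, then leaves the mode. */
static void depriv_body(void)
{
    depriv_calls++;
    longjmp(priv_ctx, 1);  /* resume at the point saved by setjmp */
}

/* Returns 1 once the depriv body has run and control has come back. */
int run_deprivileged(void)
{
    if (setjmp(priv_ctx) == 0) {
        depriv_body();     /* does not return normally */
        return 0;          /* not reached */
    }
    /* We get here via longjmp, i.e. on "exit" from the mode. */
    assert(depriv_calls > 0);
    return 1;
}
```

In the real mechanism the saved state would also have to cover the privileged stack contents, which is what the copy described above is for.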
This copy is pretty unfortunate, but I can see that avoiding it will
be a bit complex. Could we do something with more stacks? AFAICS
there have to be three stacks anyway:
- one to hold the depriv execution context;
- one to hold the privileged execution context; and
- one to take interrupts on.
So maybe we could do some fiddling to make Xen take interrupts on a
different stack while we're depriv'd?
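As a rough illustration of the three-stack idea, the sketch below describes the stacks as plain address ranges and checks that they do not overlap. All type and function names are hypothetical and none of this is taken from the Xen tree; it only models the bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical roles matching the three stacks listed above. */
enum stack_role { STACK_DEPRIV, STACK_PRIV, STACK_IRQ, STACK_COUNT };

struct stack_region {
    uintptr_t base;   /* lowest address of the region */
    size_t size;      /* size in bytes */
};

struct depriv_stacks {
    struct stack_region s[STACK_COUNT];
};

/* Two half-open ranges [base, base + size) overlap if each starts
 * before the other ends. */
static int regions_overlap(const struct stack_region *a,
                           const struct stack_region *b)
{
    return a->base < b->base + b->size && b->base < a->base + a->size;
}

/* Returns 1 if the three stacks are pairwise disjoint, 0 otherwise. */
int stacks_disjoint(const struct depriv_stacks *d)
{
    for (int i = 0; i < STACK_COUNT; i++) {
        assert(d->s[i].size > 0);
        for (int j = i + 1; j < STACK_COUNT; j++)
            if (regions_overlap(&d->s[i], &d->s[j]))
                return 0;
    }
    return 1;
}
```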
That should happen naturally by virtue of the privilege level change
involved in taking the interrupt.
Right, and this is why we need a third stack - so interrupts don't
trash the existing priv state on the 'normal' Xen stack. And so we
either need to copy the priv stack out (and maybe copy it back), or
tell the CPU to use a different stack.
The copy is relatively small and paid only on the first and last entries
into the mode. I don't know if this is cheaper than the bookwork that
would be needed on entering and returning from the mode to switch to
these stacks. I'm assuming the sp pointers in the TSS and ISTs would
need changing on the first and last entry/exit if we have the extra
stack, is that correct?
Yep.
Or is this a more dramatic change, in that everything would use this
three-stack model rather than just this feature?
Well, some other parts would have to change to accommodate this new
behaviour - that was what Andrew was talking about.
BTW, I think there need to be three stacks anyway, since the depriv
code shouldn't be allowed to write to the priv code's stack frames.
Or maybe I've misunderstood how much access the depriv code will have.
So, just to clarify:
We have a separate deprivileged stack allocated which the deprivileged
code uses. This is mapped in user mode.
We have the privileged stack which Xen runs on. To prevent this being
clobbered when we are in our mode and take an interrupt, we copy this
out to a buffer. This buffer is the saved privileged stack state.
So, we sort of have three stacks already, just the privileged stack is
copied out to a buffer, rather than switching pointers to another
interrupt stack.
Hopefully that clarifies?
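The copy-out scheme described above can be sketched as follows. This is a user-space model under assumptions, not the patch itself: `saved_priv_stack`, `save_priv_stack`, `restore_priv_stack` and `PRIV_STACK_SIZE` are hypothetical, and the live region is assumed to be the bytes between the current stack pointer and the top of the stack (where the guest's registers sit).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PRIV_STACK_SIZE 4096   /* hypothetical size for this sketch */

/* Buffer holding the saved privileged stack state. */
struct saved_priv_stack {
    unsigned char data[PRIV_STACK_SIZE];
    size_t len;                /* bytes that were live between sp and top */
};

/* Copy the live part of the stack, [sp, top), into the buffer.
 * Returns 0 on success, -1 if the live region does not fit. */
int save_priv_stack(struct saved_priv_stack *save,
                    const unsigned char *sp, const unsigned char *top)
{
    assert(sp <= top);
    size_t len = (size_t)(top - sp);
    if (len > sizeof(save->data))
        return -1;
    memcpy(save->data, sp, len);
    save->len = len;
    return 0;
}

/* Copy the saved bytes back so that they end at 'top' again. */
void restore_priv_stack(const struct saved_priv_stack *save,
                        unsigned char *top)
{
    memcpy(top - save->len, save->data, save->len);
}
```

Between the save and the restore, interrupts are free to overwrite the original region, which is the point of copying it out.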
I'm not sure how much in Xen would need changing to switch across to
using three stacks. Would this also need to be done for PV guests?
Would that need to be a separate patch series?
What's the overall consensus? Thanks!
I'm not sure there is one yet -- needs some more discussion of
whether the non-copying approach is feasible.
If we had enough headroom, we could try to be clever and tell the CPU
to take interrupts on the priv stack _below_ the existing state. That
would avoid the first of your problems below.
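A sketch of the "take interrupts below the existing state" idea, assuming a downward-growing stack: pick an interrupt entry pointer just under the current privileged stack pointer, and refuse if too little room is left. `irq_sp_below` and `IRQ_HEADROOM` are hypothetical; in Xen the resulting value would presumably have to be loaded into the stack pointer fields of the TSS mentioned earlier, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>

#define IRQ_HEADROOM 2048u   /* hypothetical minimum room for interrupt frames */

/* Returns the stack pointer the CPU should switch to when an interrupt
 * arrives in depriv mode, or 0 if there is not enough headroom between
 * the bottom of the stack and the existing privileged state. */
uintptr_t irq_sp_below(uintptr_t stack_base, uintptr_t current_sp)
{
    assert(current_sp >= stack_base);

    /* Align down to 16 bytes so interrupt frames start aligned. */
    uintptr_t sp = current_sp & ~(uintptr_t)15;

    /* Everything at or above sp belongs to the saved privileged state;
     * only [stack_base, sp) is available for interrupt handling. */
    if (sp < stack_base || sp - stack_base < IRQ_HEADROOM)
        return 0;
    return sp;
}
```

If the function returns 0, the caller would need to fall back to the copying approach or to a separate stack.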
* Under this model, PV exception handlers should copy themselves onto
the privileged execution stack.
* Currently, the IST handlers copy themselves onto the primary stack if
they interrupt guest context.
* AMD Task Register on vmexit. (this old gem)
Gah, this thing.
Curious (and I can't seem to find this in the manuals): What is this thing?
IIRC: AMD processors don't context switch TR on vmexit, which makes
using IST handlers tricky there. We'd have to do the TR context
switch ourselves, and that would be expensive. Andrew, am I
remembering that right?
Thanks!
Tim.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel