Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-24 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, September 24, 2015 3:52 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: RE: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> >>> On 24.09.15 at 03:50, <feng...@intel.com> wrote:
> > One issue is the number of vmexits is far far bigger than the number of
> > context switch. I test it for a quite short time and it shows there are
> > 2910043 vmexits and 71733 context switch (only count the number in
> > __context_switch() since we only change the PI state in this function).
> > If we change the PI state in each vmexit/vmentry, I am afraid this will
> > hurt the performance.
> 
> Note that George has already asked whether the updating of the
> PI descriptor is expensive, without you answering. 

Updating the PI descriptor needs to be atomic, I think it should be a little
expensive.

> If this is basically
> just a memory or VMCS field write, I don't think it really matters in
> which code path it sits, regardless of the frequency of either path
> being used. Also note that whatever measuring you do in an area
> like this, it'll only be an example, 

I DON'T think it is just an example, the vmexit numbers is definitely
far larger than context switch.

> unlikely to be representative of anything. 

I don't think so!

> Plus with the tendency to eliminate VMEXITs with newer
> hardware, the penalty of this sitting in the VMEXIT path ought to go
> down.

Broadwell is really very new hardware, the VMEXITs and the number
of context switch is not in the same order of magnitudes.

Thanks,
Feng

> 
> Jan
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-24 Thread Jan Beulich
>>> On 24.09.15 at 03:50,  wrote:
> One issue is the number of vmexits is far far bigger than the number of
> context switch. I test it for a quite short time and it shows there are
> 2910043 vmexits and 71733 context switch (only count the number in
> __context_switch() since we only change the PI state in this function).
> If we change the PI state in each vmexit/vmentry, I am afraid this will
> hurt the performance.

Note that George has already asked whether the updating of the
PI descriptor is expensive, without you answering. If this is basically
just a memory or VMCS field write, I don't think it really matters in
which code path it sits, regardless of the frequency of either path
being used. Also note that whatever measuring you do in an area
like this, it'll only be an example, unlikely to be representative of
anything. Plus with the tendency to eliminate VMEXITs with newer
hardware, the penalty of this sitting in the VMEXIT path ought to go
down.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread Wu, Feng


> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: Wednesday, September 23, 2015 5:44 PM
> To: Jan Beulich; Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On 09/22/2015 03:01 PM, Jan Beulich wrote:
> >>>> On 22.09.15 at 15:40, <feng...@intel.com> wrote:
> >
> >>
> >>> -Original Message-
> >>> From: Jan Beulich [mailto:jbeul...@suse.com]
> >>> Sent: Tuesday, September 22, 2015 5:00 PM
> >>> To: Wu, Feng
> >>> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; George Dunlap; Tian,
> >> Kevin;
> >>> xen-devel@lists.xen.org; Keir Fraser
> >>> Subject: RE: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core
> logic
> >>> handling
> >>>
> >>>>>> On 22.09.15 at 09:19, <feng...@intel.com> wrote:
> >>>> However, I do find some issues with my proposal above, see below:
> >>>>
> >>>> 1. Set _VPF_blocked
> >>>> 2. ret = arch_block()
> >>>> 3. if ( ret || local_events_need_delivery() )
> >>>>  clear_bit(_VPF_blocked, >pause_flags);
> >>>>
> >>>> After step #2, if ret == false, that means, we successfully changed the 
> >>>> PI
> >>>> descriptor in arch_block(), if local_events_need_delivery() returns true,
> >>>> _VPF_blocked is cleared. After that, external interrupt come in, hence
> >>>> pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
> >>>> so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.
> >>>>
> >>>> One possible solution for it is:
> >>>> - Disable the interrupts before the check in step #3 above
> >>>> - if local_events_need_delivery() returns true, undo all the operations
> >>>>  done in arch_block()
> >>>> - Enable interrupts after _VPF_blocked gets cleared.
> >>>
> >>> Couldn't this be taken care of by, if necessary, adjusting PI state
> >>> in vmx_do_resume()?
> >>
> >> What do you mean here? Could you please elaborate? Thanks!
> >
> > From the discussion so far I understand that all you're after is that
> > the PI descriptor is correct for the period of time while the guest
> > vCPU is actually executing in guest context. If that's a correct
> > understanding of mine, then setting the vector and flags back to
> > what's needed while the guest is running would be sufficient to be
> > done right before entering the guest, i.e. in vmx_do_resume().
> 
> Along those lines, is setting the SN and NDST relatively expensive?
> 
> If they are, then of course switching them in __context_switch() makes
> sense.
> 
> But if setting them is fairly cheap, then we could just clear SN on
> every vmexit, and set SN and NDST on every vmentry, couldn't we?  

Do you mean _set_ SN (Suppress Notification) on vmexit and _clear_
SN on vmentry? I think this might be an alternative.

> Then we wouldn't need hooks on the context switch path at all (which are only
> there to catch running -> runnable and runnable -> running) -- we could
> just have the block/unblock hooks (to change NV and add / remove things
> from the list).

We still need to _clear_ SN when vCPU is being blocked.

> 
> Setting NDST on vmentry also has the advantage of being more robust --
> you don't have to carefully think about cpu migration and all that; you
> simply set it right when you're about to actually use it.

In the current solution, we set the NDST in the end of vmx_ctxt_switch_to(),
it also doesn't suffer from cpu migration, right?

> 
> Looking at your comment in pi_notification_interrupt() -- I take it that
> "pending interrupt in PIRR will be synced to vIRR" somewhere in the call
> to vmx_intr_assist()? 

Yes.
vmx_intr_assist() --> hvm_vcpu_has_pending_irq()
--> vlapic_has_pending_irq() --> vlapic_find_highest_irr()
--> hvm_funcs.sync_pir_to_irr()

> So if we were to set NDST and enable SN before
> that call, we should be able to set VCPU_KICK if we get an interrupt
> between vmx_intr_assist() and the actual vmentry as necessary.

But if the interrupt comes in between last vmexit and enabling SN here,
it cannot be injected to guest during this vmentry. This will affect the
interrupt latency, each PI happening during vCPU is in root-mode needs
to be injected to the guest during the next vmentry.

I think the current solution we have discussed these days is very clear,
and I am about to implement it. Does the above method have obvious
advantage compare to what we discussed so far?

Thanks,
Feng

> 
>  -George


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread George Dunlap
On 09/23/2015 01:35 PM, Wu, Feng wrote:
> 
> 
>> -Original Message-
>> From: George Dunlap [mailto:george.dun...@citrix.com]
>> Sent: Wednesday, September 23, 2015 5:44 PM
>> To: Jan Beulich; Wu, Feng
>> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
>> xen-devel@lists.xen.org; Keir Fraser
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
>> logic
>> handling
>>
>> On 09/22/2015 03:01 PM, Jan Beulich wrote:
>>>>>> On 22.09.15 at 15:40, <feng...@intel.com> wrote:
>>>
>>>>
>>>>> -Original Message-
>>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>>>> Sent: Tuesday, September 22, 2015 5:00 PM
>>>>> To: Wu, Feng
>>>>> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; George Dunlap; Tian,
>>>> Kevin;
>>>>> xen-devel@lists.xen.org; Keir Fraser
>>>>> Subject: RE: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core
>> logic
>>>>> handling
>>>>>
>>>>>>>> On 22.09.15 at 09:19, <feng...@intel.com> wrote:
>>>>>> However, I do find some issues with my proposal above, see below:
>>>>>>
>>>>>> 1. Set _VPF_blocked
>>>>>> 2. ret = arch_block()
>>>>>> 3. if ( ret || local_events_need_delivery() )
>>>>>>  clear_bit(_VPF_blocked, >pause_flags);
>>>>>>
>>>>>> After step #2, if ret == false, that means, we successfully changed the 
>>>>>> PI
>>>>>> descriptor in arch_block(), if local_events_need_delivery() returns true,
>>>>>> _VPF_blocked is cleared. After that, external interrupt come in, hence
>>>>>> pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
>>>>>> so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.
>>>>>>
>>>>>> One possible solution for it is:
>>>>>> - Disable the interrupts before the check in step #3 above
>>>>>> - if local_events_need_delivery() returns true, undo all the operations
>>>>>>  done in arch_block()
>>>>>> - Enable interrupts after _VPF_blocked gets cleared.
>>>>>
>>>>> Couldn't this be taken care of by, if necessary, adjusting PI state
>>>>> in vmx_do_resume()?
>>>>
>>>> What do you mean here? Could you please elaborate? Thanks!
>>>
>>> From the discussion so far I understand that all you're after is that
>>> the PI descriptor is correct for the period of time while the guest
>>> vCPU is actually executing in guest context. If that's a correct
>>> understanding of mine, then setting the vector and flags back to
>>> what's needed while the guest is running would be sufficient to be
>>> done right before entering the guest, i.e. in vmx_do_resume().
>>
>> Along those lines, is setting the SN and NDST relatively expensive?
>>
>> If they are, then of course switching them in __context_switch() makes
>> sense.
>>
>> But if setting them is fairly cheap, then we could just clear SN on
>> every vmexit, and set SN and NDST on every vmentry, couldn't we?  
> 
> Do you mean _set_ SN (Suppress Notification) on vmexit and _clear_
> SN on vmentry? I think this might be an alternative.

Er, yes, that's what I meant. :-)  Sorry, getting the set / clear
"suppress notification" mixed up with the normal sti / cli (set / clear
interrupts).

> 
>> Then we wouldn't need hooks on the context switch path at all (which are only
>> there to catch running -> runnable and runnable -> running) -- we could
>> just have the block/unblock hooks (to change NV and add / remove things
>> from the list).
> 
> We still need to _clear_ SN when vCPU is being blocked.

Yes, thanks for the reminder.

>> Setting NDST on vmentry also has the advantage of being more robust --
>> you don't have to carefully think about cpu migration and all that; you
>> simply set it right when you're about to actually use it.
> 
> In the current solution, we set the NDST in the end of vmx_ctxt_switch_to(),
> it also doesn't suffer from cpu migration, right?

It works, yes.

>> Looking at your comment in pi_notification_interrupt() -- I take it that
>> "pending interrupt in PIRR will be synced to vIRR" somewhere in the call
>> to vmx_intr_assist()? 
> 
> Yes.
> vmx_intr_assist() --> hvm_vcpu_has_pending_irq()
&g

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread Jan Beulich
>>> On 23.09.15 at 17:25,  wrote:
> So, at the moment my *advice* is to look into setting SN / NDST on
> vmexit/vmentry, and only having hooks at block/unblock.
> 
> However, the __context_switch() path is also OK with me, if Jan and
> Dario are happy.
> 
> Jan / Dario, do you guys have any opinions / preferences?

If this model works (and it looks to me as if it should), then I
second this being the better structured and hence preferable
approach.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Wednesday, September 23, 2015 3:12 PM
> To: Wu, Feng; George Dunlap; George Dunlap
> Cc: Jan Beulich; Tian, Kevin; Keir Fraser; Andrew Cooper;
> xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> >
> > Yes, we can go to blocked only from running state. let me clarify a
> > question first: Xen doesn't support kernel preemption, right?
> >
> No, it does not.
> 
> > ( i.e. we
> > can only perform context switch before returning to guest.)
> >
> Yes, we schedule only when SCHEDULE_SOFTIRQ is checked and found to be
> on.

Good, thanks for the confirmation.

Thanks,
Feng
> 
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread Dario Faggioli
On Wed, 2015-09-23 at 06:35 +, Wu, Feng wrote:
> > From: George Dunlap [mailto:george.dun...@citrix.com]
> > On 09/22/2015 08:19 AM, Wu, Feng wrote:

> > > In the arch_block() hook, we actually need to
> > >   - Put vCPU to the blocking list
> > >   - Set the NV to wakeup vector
> > >   - Set NDST to the right pCPU
> > >   - Clear SN
> > 
> > Nit: We shouldn't need to actually clear SN here; SN should already
> > be
> > clear because the vcpu should be currently running.  (I don't think
> > there's a way for a vcpu to go from runnable->blocked, is there?) 
> >  And
> > if it's just been running, then NDST should also already be the
> > correct
> > pcpu.
> 
> Yes, we can go to blocked only from running state. let me clarify a
> question first: Xen doesn't support kernel preemption, right?
>
No, it does not.

> ( i.e. we
> can only perform context switch before returning to guest.) 
>
Yes, we schedule only when SCHEDULE_SOFTIRQ is checked and found to be
on.

Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread Wu, Feng


> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: Tuesday, September 22, 2015 6:27 PM
> To: Wu, Feng; Dario Faggioli; George Dunlap
> Cc: Jan Beulich; Tian, Kevin; Keir Fraser; Andrew Cooper;
> xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On 09/22/2015 08:19 AM, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> >> Sent: Monday, September 21, 2015 10:25 PM
> >> To: Wu, Feng; George Dunlap; George Dunlap
> >> Cc: Jan Beulich; Tian, Kevin; Keir Fraser; Andrew Cooper;
> >> xen-devel@lists.xen.org
> >> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core
> logic
> >> handling
> >>
> >> On Mon, 2015-09-21 at 12:22 +, Wu, Feng wrote:
> >>>
> >>>> -Original Message-
> >>>> From: George Dunlap [mailto:george.dun...@citrix.com]
> >>
> >>>> You also need to check that local_events_need_delivery() will
> >>>> return
> >>>> "true" if you get an interrupt between that time and entering the
> >>>> hypervisor.  Will that happen automatically from
> >>>> hvm_local_events_need_delivery() -> hvm_vcpu_has_pending_irq() ->
> >>>> vlapic_has_pending_irq()?  Or will you need to add a hook in
> >>>> hvm_vcpu_has_pending_irq()?
> >>>
> >>> I think I don't need to add hook in hvm_vcpu_has_pending_irq(), what
> >>> I need
> >>> to do in vcpu_block() and do_poll() is as below:
> >>>
> >>> 1. set_bit(_VPF_blocked, >pause_flags);
> >>>
> >>> 2. ret = v->arch.arch_block(), in this hook, we can re-use the same
> >>> logic in
> >>> vmx_pre_ctx_switch_pi(), and check whether ON bit is set during
> >>> updating
> >>> posted-interrupt descriptor, can return 1 when ON is set
> >>>
> >> It think it would be ok for an hook to return a value (maybe, if doing
> >> that, let's pick variable names and use comments to explain what goes
> >> on as good as we can).
> >>
> >> I think I also see why you seem to prefer doing it that way, rather
> >> than hacking local_events_need_delivery(), but can you please elaborate
> >> on that? (My feeling is that you want to avoid having to update the
> >> data structures in between _VPF_blocked and the check
> >> local_events_need_delivery(), and then having to roll such update back
> >> if local_events_need_delivery() ends up being false, is that the
> >> case?).
> >
> > In the arch_block() hook, we actually need to
> > - Put vCPU to the blocking list
> > - Set the NV to wakeup vector
> > - Set NDST to the right pCPU
> > - Clear SN
> 
> Nit: We shouldn't need to actually clear SN here; SN should already be
> clear because the vcpu should be currently running.  (I don't think
> there's a way for a vcpu to go from runnable->blocked, is there?)  And
> if it's just been running, then NDST should also already be the correct
> pcpu.

Yes, we can go to blocked only from running state. let me clarify a
question first: Xen doesn't support kernel preemption, right?( i.e. we
can only perform context switch before returning to guest.) If that is
case, we can make sure the pcpu is not changed for the running
vCPU before we set the NDST filed in PI descriptor, so we don't need
to update NDST.

> 
> > During we are updating the posted-interrupt descriptor, the VT-d
> > hardware can issue notification event and hence ON is set. If that is the
> > case, it is straightforward to return directly, and it doesn't make sense
> > we continue to do the above operations since we don't need to actually.
> 
> But checking to see if any interrupts have come in in the middle of your
> code just adds unnecessary complication.  We need to have the code in
> place to un-do all the blocking steps in the case that
> local_events_need_delivery() returns true anyway.
> 
> Additionally, while local_events_need_delivery() is only called from
> do_block and do_poll, hvm_local_events_need_delivery() is called from a
> number of other places, as is hvm_vcpu_has_pending_irq().  All those
> places presumably also need to know whether there's a PI pending to work
> properly.

local_events_need_delivery() can be called in other places, since it is wrapped
in hypercall_preempt_check(), which can be called in bunch of places. But
that shouldn

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Wednesday, September 23, 2015 4:00 PM
> To: Wu, Feng; George Dunlap
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> >
> > I cannot think the bad effect of the spurious PI as well. I was just
> > a little
> > confused about we can do this and why we don't do this. Maybe
> > context_switch() is a critical path, if we can bear those spurious
> > PI,
> > it is not worth adding those logic in it at the cost of some
> > performance
> > lost during scheduling. Is this your concern?
> >
> The, however small, performance implications of even only checking
> whether the hooks should be invoked is certainly good to be avoided,
> especially, on non-PI enabled (and even more so on non-VMX) hardware.
> 
> However, what I think it is more important in this case, is that not
> having the hooks in context_switch() yields a better results from an
> architectural and code organization point of view.
> It makes both the context switch code, and the PI code, easier to
> understand and to maintain.

Good to know that, thanks for the explanation!

Thanks,
Feng

> 
> So, thanks to you and Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread Dario Faggioli
On Wed, 2015-09-23 at 05:52 +, Wu, Feng wrote:
> George & Dario, thanks much for sharing so many scheduler knowledge
> to me, it is very useful. 
>
Well, we're lucky enough that it's our job to do that. :-D

> > > So the only downside to doing everything in block(), wake(), and
> > > __context_switch() is that if a VM is offline, or preempted by a
> > > tasklet, and an interrupt comes in, we will get a spurious PI
> > > (i.e.,
> > > one
> > > which interrupts us but we can't do anything useful about).
> > > 
> > Indeed. And that also seems bearable to me. Feng, are there reasons
> > why
> > you think it's not?
> 
> I cannot think the bad effect of the spurious PI as well. I was just
> a little
> confused about we can do this and why we don't do this. Maybe
> context_switch() is a critical path, if we can bear those spurious
> PI,
> it is not worth adding those logic in it at the cost of some
> performance
> lost during scheduling. Is this your concern?
> 
The, however small, performance implications of even only checking
whether the hooks should be invoked is certainly good to be avoided,
especially, on non-PI enabled (and even more so on non-VMX) hardware.

However, what I think it is more important in this case, is that not
having the hooks in context_switch() yields a better results from an
architectural and code organization point of view.
It makes both the context switch code, and the PI code, easier to
understand and to maintain.

So, thanks to you and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread Dario Faggioli
On Thu, 2015-09-24 at 01:50 +, Wu, Feng wrote:

> > -Original Message-
> > From: George Dunlap [mailto:george.dun...@citrix.com]

> > This seems to me to be a cleaner design.  It removes the need to
> > modify
> > any code on context-switch path.  It moves modification of NDST and
> > most
> > modifications of SN closer to where I think they logically go.  It
> > reduces the number of unnecessary PI interrupts delivered to the
> > hypervisor (by suppressing notifications most of the time spend in
> > the
> > hypervisor), and automatically deals with the "spurious interrupts
> > during tasklet execution / vcpu offline lazy context switch" issue
> > we
> > were talking about with the other thread.
> 
> One issue is the number of vmexits is far far bigger than the number
> of
> context switch. I test it for a quite short time and it shows there
> are
> 2910043 vmexits and 71733 context switch (only count the number in
> __context_switch() since we only change the PI state in this
> function).
> If we change the PI state in each vmexit/vmentry, I am afraid this
> will
> hurt the performance.
> 
Interesting. Hard to tell without actually trying, though.

Personally, I'd agree with George and Jan that the vmexit solution is
more self contained, and hence preferable.

I don't really dislike the __context_switch() solution, though, and I'd
be fine to live with it, especially considering these numbers.

I guess the absolute best would be for you to prototype both, and try
gathering some performance numbers for comparison... Is this asking too
much? :-)

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread George Dunlap
On 09/22/2015 03:01 PM, Jan Beulich wrote:
>>>> On 22.09.15 at 15:40, <feng...@intel.com> wrote:
> 
>>
>>> -Original Message-
>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>> Sent: Tuesday, September 22, 2015 5:00 PM
>>> To: Wu, Feng
>>> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; George Dunlap; Tian, 
>> Kevin;
>>> xen-devel@lists.xen.org; Keir Fraser
>>> Subject: RE: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
>>> logic
>>> handling
>>>
>>>>>> On 22.09.15 at 09:19, <feng...@intel.com> wrote:
>>>> However, I do find some issues with my proposal above, see below:
>>>>
>>>> 1. Set _VPF_blocked
>>>> 2. ret = arch_block()
>>>> 3. if ( ret || local_events_need_delivery() )
>>>>clear_bit(_VPF_blocked, >pause_flags);
>>>>
>>>> After step #2, if ret == false, that means, we successfully changed the PI
>>>> descriptor in arch_block(), if local_events_need_delivery() returns true,
>>>> _VPF_blocked is cleared. After that, external interrupt come in, hence
>>>> pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
>>>> so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.
>>>>
>>>> One possible solution for it is:
>>>> - Disable the interrupts before the check in step #3 above
>>>> - if local_events_need_delivery() returns true, undo all the operations
>>>>  done in arch_block()
>>>> - Enable interrupts after _VPF_blocked gets cleared.
>>>
>>> Couldn't this be taken care of by, if necessary, adjusting PI state
>>> in vmx_do_resume()?
>>
>> What do you mean here? Could you please elaborate? Thanks!
> 
> From the discussion so far I understand that all you're after is that
> the PI descriptor is correct for the period of time while the guest
> vCPU is actually executing in guest context. If that's a correct
> understanding of mine, then setting the vector and flags back to
> what's needed while the guest is running would be sufficient to be
> done right before entering the guest, i.e. in vmx_do_resume().

Along those lines, is setting the SN and NDST relatively expensive?

If they are, then of course switching them in __context_switch() makes
sense.

But if setting them is fairly cheap, then we could just clear SN on
every vmexit, and set SN and NDST on every vmentry, couldn't we?  Then
we wouldn't need hooks on the context switch path at all (which are only
there to catch running -> runnable and runnable -> running) -- we could
just have the block/unblock hooks (to change NV and add / remove things
from the list).

Setting NDST on vmentry also has the advantage of being more robust --
you don't have to carefully think about cpu migration and all that; you
simply set it right when you're about to actually use it.

Looking at your comment in pi_notification_interrupt() -- I take it that
"pending interrupt in PIRR will be synced to vIRR" somewhere in the call
to vmx_intr_assist()?  So if we were to set NDST and enable SN before
that call, we should be able to set VCPU_KICK if we get an interrupt
between vmx_intr_assist() and the actual vmentry as necessary.

 -George


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-23 Thread Wu, Feng


> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: Wednesday, September 23, 2015 11:26 PM
> To: Wu, Feng; Jan Beulich
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On 09/23/2015 01:35 PM, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: George Dunlap [mailto:george.dun...@citrix.com]
> >> Sent: Wednesday, September 23, 2015 5:44 PM
> >> To: Jan Beulich; Wu, Feng
> >> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> >> xen-devel@lists.xen.org; Keir Fraser
> >> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core
> logic
> >> handling
> >>
> >> On 09/22/2015 03:01 PM, Jan Beulich wrote:
> >>>>>> On 22.09.15 at 15:40, <feng...@intel.com> wrote:
> >>>
> >>>>
> >>>>> -Original Message-
> >>>>> From: Jan Beulich [mailto:jbeul...@suse.com]
> >>>>> Sent: Tuesday, September 22, 2015 5:00 PM
> >>>>> To: Wu, Feng
> >>>>> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; George Dunlap;
> Tian,
> >>>> Kevin;
> >>>>> xen-devel@lists.xen.org; Keir Fraser
> >>>>> Subject: RE: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt
> core
> >> logic
> >>>>> handling
> >>>>>
> >>>>>>>> On 22.09.15 at 09:19, <feng...@intel.com> wrote:
> >>>>>> However, I do find some issues with my proposal above, see below:
> >>>>>>
> >>>>>> 1. Set _VPF_blocked
> >>>>>> 2. ret = arch_block()
> >>>>>> 3. if ( ret || local_events_need_delivery() )
> >>>>>>clear_bit(_VPF_blocked, >pause_flags);
> >>>>>>
> >>>>>> After step #2, if ret == false, that means, we successfully changed the
> PI
> >>>>>> descriptor in arch_block(), if local_events_need_delivery() returns 
> >>>>>> true,
> >>>>>> _VPF_blocked is cleared. After that, external interrupt come in, hence
> >>>>>> pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
> >>>>>> so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.
> >>>>>>
> >>>>>> One possible solution for it is:
> >>>>>> - Disable the interrupts before the check in step #3 above
> >>>>>> - if local_events_need_delivery() returns true, undo all the operations
> >>>>>>  done in arch_block()
> >>>>>> - Enable interrupts after _VPF_blocked gets cleared.
> >>>>>
> >>>>> Couldn't this be taken care of by, if necessary, adjusting PI state
> >>>>> in vmx_do_resume()?
> >>>>
> >>>> What do you mean here? Could you please elaborate? Thanks!
> >>>
> >>> From the discussion so far I understand that all you're after is that
> >>> the PI descriptor is correct for the period of time while the guest
> >>> vCPU is actually executing in guest context. If that's a correct
> >>> understanding of mine, then setting the vector and flags back to
> >>> what's needed while the guest is running would be sufficient to be
> >>> done right before entering the guest, i.e. in vmx_do_resume().
> >>
> >> Along those lines, is setting the SN and NDST relatively expensive?
> >>
> >> If they are, then of course switching them in __context_switch() makes
> >> sense.
> >>
> >> But if setting them is fairly cheap, then we could just clear SN on
> >> every vmexit, and set SN and NDST on every vmentry, couldn't we?
> >
> > Do you mean _set_ SN (Suppress Notification) on vmexit and _clear_
> > SN on vmentry? I think this might be an alternative.
> 
> Er, yes, that's what I meant. :-)  Sorry, getting the set / clear
> "suppress notification" mixed up with the normal sti / cli (set / clear
> interrupts).
> 
> >
> >> Then we wouldn't need hooks on the context switch path at all (which are
> only
> >> there to catch running -> runnable and runnable -> running) -- we could
> >> just have the block/unblock hooks (to change NV and add / remo

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread Wu, Feng


> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: Tuesday, September 22, 2015 10:28 PM
> To: Wu, Feng; Dario Faggioli
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On 09/22/2015 02:25 PM, Wu, Feng wrote:
> >>> But if we want to avoid spurious PI interrupts when running idle, then
> >>> yes, we need *some* kind of a hook on the lazy context switch path.
> >>>
> >>> /me does some more thinking...
> >>
> >> To be honest, since we'll be get spurious PI interrupts in the
> >> hypervisor all the time anyway, I'm inclined at the moment not to worry
> >> to much about this case.
> >
> > Why do you say "we'll be get spurious PI interrupts in the  hypervisor all 
> > the
> time"?
> >
> > And could you please share what is your concern to handle this case to avoid
> > such spurious PI interrupts? Thanks!
> 
> So please correct me if I'm wrong in my understanding:
> 
> * When a vcpu is in runstate "running", with PI enabled, you have the PI
> vector set to "posted_interrupt_vector", SN=0.
> 
> * When in this state in non-root mode, PI interrupts result in an
> interrupt being delivered directly to the guest.
> 
> * When in this state in root mode, PI interrupts result in a
> posted_interrupt_vector interrupt being delivered to Xen.
> 
> Is that the case?

Exactly, it is how PI works today.

Thanks,
Feng

> 
> So basically, if the PI happens to come in when the guest is making a
> hypercall, or the guest is doing any other number of things that involve
> the hypervisor, then Xen will get a "spurious" PI interrupt -- spurious
> because there's nothing Xen actually needs to do about it; the guest
> interrupt will be delivered the next time we do a VMENTER.
> 
> So spurious PI interrupts are already going to happen from time to time
> -- they're not really that big of a deal.  Having them happen when a VM
> is running a tasklet or idle waiting for qemu isn't such a big deal either.



> 
>  -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Tuesday, September 22, 2015 10:39 PM
> To: George Dunlap; Wu, Feng
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On Tue, 2015-09-22 at 15:15 +0100, George Dunlap wrote:
> > On 09/22/2015 02:52 PM, Wu, Feng wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> 
> > > > Yes, the idle to vCPUB switch is covered by __context_switch(),
> > > > but
> > > it cannot change the PI state of vCPUA at that point. Like
> > > mentioned
> > > above, in this case, spurious PI interrupts happens.
> >
> > On the contrary, when __context_switch() is called for the idle ->
> > vcpuB
> > transition in the above scenario, it is *actually* context switching
> > from vcpuA to vcpuB, since vcpuA is still actually on the cpu.
> >  Which
> > means that if you add PI code into arch->ctxt_switch_from(), the case
> > you describe above will be handled automatically.
> >
> Yep, that's exactly what I was also writing... luckily, I've seen
> George's email before getting too far with that! :-P
> 
> That's the point of lazy context switch. The context of vCPUA is still
> on pCPUA during the time idle executes. If, at the next transition
> point, we switch from idle to vCPUA again, then nothing is needed, and
> we just saved one context saving and one context restoring operation.
> If it is something else, like in your case, we need to save vCPUA's
> context, which is actually what we have on pCPUA, despite idle (and not
> vCPUA itself) was what was running.

George & Dario, thanks much for sharing so many scheduler knowledge
to me, it is very useful. I think I was making a mistake about how
__context_switch() work when I wrote the emails above. Now it is
clear, the scenario I mentioned above can be covered in __context_switch().

> 
> > So the only downside to doing everything in block(), wake(), and
> > __context_switch() is that if a VM is offline, or preempted by a
> > tasklet, and an interrupt comes in, we will get a spurious PI (i.e.,
> > one
> > which interrupts us but we can't do anything useful about).
> >
> Indeed. And that also seems bearable to me. Feng, are there reasons why
> you think it's not?

I cannot think the bad effect of the spurious PI as well. I was just a little
confused about we can do this and why we don't do this. Maybe
context_switch() is a critical path, if we can bear those spurious PI,
it is not worth adding those logic in it at the cost of some performance
lost during scheduling. Is this your concern?

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread Jan Beulich
>>> On 22.09.15 at 09:19,  wrote:
> However, I do find some issues with my proposal above, see below:
> 
> 1. Set _VPF_blocked
> 2. ret = arch_block()
> 3. if ( ret || local_events_need_delivery() )
>   clear_bit(_VPF_blocked, >pause_flags);
> 
> After step #2, if ret == false, that means, we successfully changed the PI
> descriptor in arch_block(), if local_events_need_delivery() returns true,
> _VPF_blocked is cleared. After that, external interrupt come in, hence
> pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
> so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.
> 
> One possible solution for it is:
> - Disable the interrupts before the check in step #3 above
> - if local_events_need_delivery() returns true, undo all the operations
>  done in arch_block()
> - Enable interrupts after _VPF_blocked gets cleared.

Couldn't this be taken care of by, if necessary, adjusting PI state
in vmx_do_resume()?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Monday, September 21, 2015 10:25 PM
> To: Wu, Feng; George Dunlap; George Dunlap
> Cc: Jan Beulich; Tian, Kevin; Keir Fraser; Andrew Cooper;
> xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On Mon, 2015-09-21 at 12:22 +, Wu, Feng wrote:
> >
> > > -Original Message-
> > > From: George Dunlap [mailto:george.dun...@citrix.com]
> 
> > > You also need to check that local_events_need_delivery() will
> > > return
> > > "true" if you get an interrupt between that time and entering the
> > > hypervisor.  Will that happen automatically from
> > > hvm_local_events_need_delivery() -> hvm_vcpu_has_pending_irq() ->
> > > vlapic_has_pending_irq()?  Or will you need to add a hook in
> > > hvm_vcpu_has_pending_irq()?
> >
> > I think I don't need to add hook in hvm_vcpu_has_pending_irq(), what
> > I need
> > to do in vcpu_block() and do_poll() is as below:
> >
> > 1. set_bit(_VPF_blocked, >pause_flags);
> >
> > 2. ret = v->arch.arch_block(), in this hook, we can re-use the same
> > logic in
> > vmx_pre_ctx_switch_pi(), and check whether ON bit is set during
> > updating
> > posted-interrupt descriptor, can return 1 when ON is set
> >
> It think it would be ok for an hook to return a value (maybe, if doing
> that, let's pick variable names and use comments to explain what goes
> on as good as we can).
> 
> I think I also see why you seem to prefer doing it that way, rather
> than hacking local_events_need_delivery(), but can you please elaborate
> on that? (My feeling is that you want to avoid having to update the
> data structures in between _VPF_blocked and the check
> local_events_need_delivery(), and then having to roll such update back
> if local_events_need_delivery() ends up being false, is that the
> case?).

In the arch_block() hook, we actually need to
- Put vCPU to the blocking list
- Set the NV to wakeup vector
- Set NDST to the right pCPU
- Clear SN

During we are updating the posted-interrupt descriptor, the VT-d
hardware can issue notification event and hence ON is set. If that is the
case, it is straightforward to return directly, and it doesn't make sense
we continue to do the above operations since we don't need to actually.

> 
> Code would look better, IMO, if we manage to fold that somehow inside
> local_events_need_delivery(), but that:

As mentioned above, during updating the PI descriptor for blocking, we
need to check whether ON is set, and return if it is set. This logic cannot
be included in local_events_need_delivery(), since the update itself
has not much relationship with local_events_need_delivery().

However, I do find some issues with my proposal above, see below:

1. Set _VPF_blocked
2. ret = arch_block()
3. if ( ret || local_events_need_delivery() )
clear_bit(_VPF_blocked, >pause_flags);

After step #2, if ret == false, that means, we successfully changed the PI
descriptor in arch_block(), if local_events_need_delivery() returns true,
_VPF_blocked is cleared. After that, external interrupt come in, hence
pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.

One possible solution for it is:
- Disable the interrupts before the check in step #3 above
- if local_events_need_delivery() returns true, undo all the operations
 done in arch_block()
- Enable interrupts after _VPF_blocked gets cleared.

It is a little annoying.

Thanks,
Feng


>  1. is hard to tell without actually seeing how the code will end up
> being
>  2. might be my opinion only, so let's see what others think.
> 
> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Tuesday, September 22, 2015 9:40 PM
> To: Wu, Feng; George Dunlap
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On Tue, 2015-09-22 at 13:25 +, Wu, Feng wrote:
> >
> 
> > > -Original Message-
> > > From: George Dunlap [mailto:george.dun...@citrix.com]
> 
> > Specifically, consider the following scheduling case happened on
> > pCPUA:
> > vCPUA --> idle --> vCPUB
> >
> > 1. First, vCPUA is running on pCPUA, so the NDST filed in PI
> > descriptor is pCPUA
> > 2. Then vCPUA is switched out and idle is switched in running in
> > pCPUA
> > 3. Sometime later, vCPUB is switched in on pCPUA. However, the NDST
> > field
> > for vCPUA is still pCPUA, and currently, vCPUB is running on it. That
> > means
> > the spurious PI interrupts for vCPUA can disturb vCPUB (because the
> > NDST
> > field is pCPUA), it seems not so good to me.
> >
> Mmm... Ok, but you're not saying what caused the transition from vCPUA
> to idle, and from idle to vCPUB. That matters because, if this all
> happened because blockings and wakeups, it's nothing to do with lazy
> context switch, which is what we are discussing here (in the sense that
> PI data structures would, in those cases, be updated appropriately in
> block and wake hooks).

Like George mentioned before, Let's assume it is because of tasklets or
vCPU is going to offline to wait device model's IO operations, so idle
is switched in.

> 
> Also, if we're going from vCPUA to vCPUB, even if there's idle in
> between, that can't be done via lazy context switch.

Yes, in the above scenario, vCPUB to idle transition has nothing to
do with lazy context switch, however, the point here is vCPUA to
idle is related to lazy context switch, and if we don't set the SN
for vCPUA here, it will remain clear and the NDST field of vCPUA
will remain pCPUA, even some time later vCPUB is running on it.
In that case, the spurious Pi interrupts for vCPUA can disturb vCPUB.

> In fact, in this
> case, __context_switch() must be called at some point (during the idle-
> ->vCPUB switch, if my understanding is correct), and the hook will
> actually get called!

Yes, the idle to vCPUB switch is covered by __context_switch(), but
it cannot change the PI state of vCPUA at that point. Like mentioned
above, in this case, spurious PI interrupts happens.

Thanks,
Feng

> 
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread Wu, Feng


> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: Tuesday, September 22, 2015 6:46 PM
> To: Wu, Feng; Dario Faggioli
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On 09/22/2015 11:43 AM, George Dunlap wrote:
> > On 09/22/2015 06:10 AM, Wu, Feng wrote:
> >>
> >>
> >>> -Original Message-
> >>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> >>> Sent: Monday, September 21, 2015 10:12 PM
> >>> To: Wu, Feng; George Dunlap
> >>> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap;
> Andrew
> >>> Cooper; Jan Beulich
> >>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core
> logic
> >>> handling
> >>>
> >>> On Mon, 2015-09-21 at 13:50 +, Wu, Feng wrote:
> >>>>
> >>>>> -Original Message-
> >>>>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> >>>
> >>>>> Note that, in case of preemptions, we are switching from a non-idle
> >>>>> vcpu to another, non-idle, vcpu, so lazy context switching to the
> >>>>> idle
> >>>>> vcpu should not have much to do with this case...
> >>>>
> >>>> So do you mean in preemptions, we cannot switch from non-idle to idle
> >>>> or
> >>>> Idle to non-idle, i.e, we can only switch from non-idle to non-idle?
> >>>>
> >>> That's what I meant. It's the definition of 'preemption' and of 'idle
> >>> task/vcpu', AFICT. I mean, the idle vcpu has the lowest possible
> >>> priority ever, so it can't really preempt anyone.
> >>>
> >>> OTOH, if the idle vcpu is running, that means there weren't any active
> >>> vcpus because, e.g., all were blocked; therefore, any active vcpu
> >>> wanting to run would have to wake up (and hence go throught the proper
> >>> wake up logic) before trying to preempt idle.
> >>>
> >>> There is one possible caveat: tasklets. In fact (as you can check
> >>> yourself by looking, in the code, for tasklet_work_scheduled), it is
> >>> possible that we force the idle vcpu to execute, even when we have
> >>> active and runnable vcpus, to let it process tasklets. I'm not really
> >>> sure whether this could be a problem for you or not, can you have a
> >>> look (and/or, a try) and report back?
> >>
> >> Thanks for your information about the tasklets part, it is very important,
> >> consider the following scenario:
> >>
> >> non-idle vCPUA --> idle (tasklet) --> non-idle vCPUA
> >>
> >> The above context switch will use the lazy context switch and cannot be
> >> covered in __context_switch(), we need to change SN's state during the
> >> "non-idle to idle" and "idle to non-idle" transition, so that means we need
> >> add the PI related logic in context_switch() instead of only in
> __context_switch().
> >>
> >> Besides that, it is more robust to do the PI switch in context_switch() 
> >> which
> >> can cover lazy context switch. Maybe in future, there are some other
> >> feature needing execute task _inside_ an idle context, and it may introduce
> >> some issues due to no PI state transition, and it is a little hard to 
> >> debug.
> >
> > So a transition like the above could happen in the case of a cpu that's
> > gone offline (e.g., to allow the devicemodel to handle an IO); or, as
> > you say, if we're doing urgent work in a tasklet such that it preempts a
> > running task.
> >
> > One option would be to just ignore this -- in which case we would get
> > spurious PI interrupts, but no other major issues, right?

Specifically, consider the following scheduling case happened on pCPUA:
vCPUA --> idle --> vCPUB

1. First, vCPUA is running on pCPUA, so the NDST filed in PI descriptor is pCPUA
2. Then vCPUA is switched out and idle is switched in running in pCPUA
3. Sometime later, vCPUB is switched in on pCPUA. However, the NDST field
for vCPUA is still pCPUA, and currently, vCPUB is running on it. That means
the spurious PI interrupts for vCPUA can disturb vCPUB (because the NDST
field is pCPUA), it seems not so good to me.

> >
> > But if we want to avoid spurious PI interrupts when running idle, then
> > yes, we need *some* kind of a hook on the lazy context switch path.
> >
> > /me does some more thinking...
> 
> To be honest, since we'll be get spurious PI interrupts in the
> hypervisor all the time anyway, I'm inclined at the moment not to worry
> to much about this case.

Why do you say "we'll be get spurious PI interrupts in the  hypervisor all the 
time"?

And could you please share what is your concern to handle this case to avoid
such spurious PI interrupts? Thanks!

Thanks,
Feng

> 
>  -George
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread Dario Faggioli
On Tue, 2015-09-22 at 13:25 +, Wu, Feng wrote:
> 

> > -Original Message-
> > From: George Dunlap [mailto:george.dun...@citrix.com]

> Specifically, consider the following scheduling case happened on
> pCPUA:
> vCPUA --> idle --> vCPUB
> 
> 1. First, vCPUA is running on pCPUA, so the NDST filed in PI
> descriptor is pCPUA
> 2. Then vCPUA is switched out and idle is switched in running in
> pCPUA
> 3. Sometime later, vCPUB is switched in on pCPUA. However, the NDST
> field
> for vCPUA is still pCPUA, and currently, vCPUB is running on it. That
> means
> the spurious PI interrupts for vCPUA can disturb vCPUB (because the
> NDST
> field is pCPUA), it seems not so good to me.
> 
Mmm... Ok, but you're not saying what caused the transition from vCPUA
to idle, and from idle to vCPUB. That matters because, if this all
happened because blockings and wakeups, it's nothing to do with lazy
context switch, which is what we are discussing here (in the sense that
PI data structures would, in those cases, be updated appropriately in
block and wake hooks).

Also, if we're going from vCPUA to vCPUB, even if there's idle in
between, that can't be done via lazy context switch. In fact, in this
case, __context_switch() must be called at some point (during the idle-
->vCPUB switch, if my understanding is correct), and the hook will
actually get called!

Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Tuesday, September 22, 2015 5:00 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: RE: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> >>> On 22.09.15 at 09:19, <feng...@intel.com> wrote:
> > However, I do find some issues with my proposal above, see below:
> >
> > 1. Set _VPF_blocked
> > 2. ret = arch_block()
> > 3. if ( ret || local_events_need_delivery() )
> > clear_bit(_VPF_blocked, >pause_flags);
> >
> > After step #2, if ret == false, that means, we successfully changed the PI
> > descriptor in arch_block(), if local_events_need_delivery() returns true,
> > _VPF_blocked is cleared. After that, external interrupt come in, hence
> > pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
> > so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.
> >
> > One possible solution for it is:
> > - Disable the interrupts before the check in step #3 above
> > - if local_events_need_delivery() returns true, undo all the operations
> >  done in arch_block()
> > - Enable interrupts after _VPF_blocked gets cleared.
> 
> Couldn't this be taken care of by, if necessary, adjusting PI state
> in vmx_do_resume()?

What do you mean here? Could you please elaborate? Thanks!

Thanks,
Feng

> 
> Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread George Dunlap
On 09/22/2015 06:10 AM, Wu, Feng wrote:
> 
> 
>> -Original Message-
>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
>> Sent: Monday, September 21, 2015 10:12 PM
>> To: Wu, Feng; George Dunlap
>> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
>> Cooper; Jan Beulich
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
>> logic
>> handling
>>
>> On Mon, 2015-09-21 at 13:50 +, Wu, Feng wrote:
>>>
>>>> -Original Message-
>>>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
>>
>>>> Note that, in case of preemptions, we are switching from a non-idle
>>>> vcpu to another, non-idle, vcpu, so lazy context switching to the
>>>> idle
>>>> vcpu should not have much to do with this case...
>>>
>>> So do you mean in preemptions, we cannot switch from non-idle to idle
>>> or
>>> Idle to non-idle, i.e, we can only switch from non-idle to non-idle?
>>>
>> That's what I meant. It's the definition of 'preemption' and of 'idle
>> task/vcpu', AFICT. I mean, the idle vcpu has the lowest possible
>> priority ever, so it can't really preempt anyone.
>>
>> OTOH, if the idle vcpu is running, that means there weren't any active
>> vcpus because, e.g., all were blocked; therefore, any active vcpu
>> wanting to run would have to wake up (and hence go throught the proper
>> wake up logic) before trying to preempt idle.
>>
>> There is one possible caveat: tasklets. In fact (as you can check
>> yourself by looking, in the code, for tasklet_work_scheduled), it is
>> possible that we force the idle vcpu to execute, even when we have
>> active and runnable vcpus, to let it process tasklets. I'm not really
>> sure whether this could be a problem for you or not, can you have a
>> look (and/or, a try) and report back?
> 
> Thanks for your information about the tasklets part, it is very important,
> consider the following scenario:
> 
> non-idle vCPUA --> idle (tasklet) --> non-idle vCPUA
> 
> The above context switch will use the lazy context switch and cannot be
> covered in __context_switch(), we need to change SN's state during the
> "non-idle to idle" and "idle to non-idle" transition, so that means we need
> add the PI related logic in context_switch() instead of only in 
> __context_switch().
> 
> Besides that, it is more robust to do the PI switch in context_switch() which
> can cover lazy context switch. Maybe in future, there are some other
> feature needing execute task _inside_ an idle context, and it may introduce
> some issues due to no PI state transition, and it is a little hard to debug.

So a transition like the above could happen in the case of a cpu that's
gone offline (e.g., to allow the devicemodel to handle an IO); or, as
you say, if we're doing urgent work in a tasklet such that it preempts a
running task.

One option would be to just ignore this -- in which case we would get
spurious PI interrupts, but no other major issues, right?

But if we want to avoid spurious PI interrupts when running idle, then
yes, we need *some* kind of a hook on the lazy context switch path.

/me does some more thinking...

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread George Dunlap
On 09/22/2015 08:19 AM, Wu, Feng wrote:
> 
> 
>> -Original Message-
>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
>> Sent: Monday, September 21, 2015 10:25 PM
>> To: Wu, Feng; George Dunlap; George Dunlap
>> Cc: Jan Beulich; Tian, Kevin; Keir Fraser; Andrew Cooper;
>> xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
>> logic
>> handling
>>
>> On Mon, 2015-09-21 at 12:22 +, Wu, Feng wrote:
>>>
>>>> -Original Message-
>>>> From: George Dunlap [mailto:george.dun...@citrix.com]
>>
>>>> You also need to check that local_events_need_delivery() will
>>>> return
>>>> "true" if you get an interrupt between that time and entering the
>>>> hypervisor.  Will that happen automatically from
>>>> hvm_local_events_need_delivery() -> hvm_vcpu_has_pending_irq() ->
>>>> vlapic_has_pending_irq()?  Or will you need to add a hook in
>>>> hvm_vcpu_has_pending_irq()?
>>>
>>> I think I don't need to add hook in hvm_vcpu_has_pending_irq(), what
>>> I need
>>> to do in vcpu_block() and do_poll() is as below:
>>>
>>> 1. set_bit(_VPF_blocked, >pause_flags);
>>>
>>> 2. ret = v->arch.arch_block(), in this hook, we can re-use the same
>>> logic in
>>> vmx_pre_ctx_switch_pi(), and check whether ON bit is set during
>>> updating
>>> posted-interrupt descriptor, can return 1 when ON is set
>>>
>> It think it would be ok for an hook to return a value (maybe, if doing
>> that, let's pick variable names and use comments to explain what goes
>> on as good as we can).
>>
>> I think I also see why you seem to prefer doing it that way, rather
>> than hacking local_events_need_delivery(), but can you please elaborate
>> on that? (My feeling is that you want to avoid having to update the
>> data structures in between _VPF_blocked and the check
>> local_events_need_delivery(), and then having to roll such update back
>> if local_events_need_delivery() ends up being false, is that the
>> case?).
> 
> In the arch_block() hook, we actually need to
>   - Put vCPU to the blocking list
>   - Set the NV to wakeup vector
>   - Set NDST to the right pCPU
>   - Clear SN

Nit: We shouldn't need to actually clear SN here; SN should already be
clear because the vcpu should be currently running.  (I don't think
there's a way for a vcpu to go from runnable->blocked, is there?)  And
if it's just been running, then NDST should also already be the correct
pcpu.

> During we are updating the posted-interrupt descriptor, the VT-d
> hardware can issue notification event and hence ON is set. If that is the
> case, it is straightforward to return directly, and it doesn't make sense
> we continue to do the above operations since we don't need to actually.

But checking to see if any interrupts have come in in the middle of your
code just adds unnecessary complication.  We need to have the code in
place to un-do all the blocking steps in the case that
local_events_need_delivery() returns true anyway.

Additionally, while local_events_need_delivery() is only called from
do_block and do_poll, hvm_local_events_need_delivery() is called from a
number of other places, as is hvm_vcpu_has_pending_irq().  All those
places presumably also need to know whether there's a PI pending to work
properly.

>> Code would look better, IMO, if we manage to fold that somehow inside
>> local_events_need_delivery(), but that:
> 
> As mentioned above, during updating the PI descriptor for blocking, we
> need to check whether ON is set, and return if it is set. This logic cannot
> be included in local_events_need_delivery(), since the update itself
> has not much relationship with local_events_need_delivery().
> 
> However, I do find some issues with my proposal above, see below:
> 
> 1. Set _VPF_blocked
> 2. ret = arch_block()
> 3. if ( ret || local_events_need_delivery() )
>   clear_bit(_VPF_blocked, >pause_flags);
> 
> After step #2, if ret == false, that means, we successfully changed the PI
> descriptor in arch_block(), if local_events_need_delivery() returns true,
> _VPF_blocked is cleared. After that, external interrupt come in, hence
> pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
> so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.
> 
> One possible solution for it is:
> - Disable the interrupts before the check in step #3 above
> - if local_events_need_delivery() returns true, undo all the operations
>  done in arch_block()
> - E

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread George Dunlap
On 09/22/2015 11:43 AM, George Dunlap wrote:
> On 09/22/2015 06:10 AM, Wu, Feng wrote:
>>
>>
>>> -Original Message-
>>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
>>> Sent: Monday, September 21, 2015 10:12 PM
>>> To: Wu, Feng; George Dunlap
>>> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
>>> Cooper; Jan Beulich
>>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
>>> logic
>>> handling
>>>
>>> On Mon, 2015-09-21 at 13:50 +, Wu, Feng wrote:
>>>>
>>>>> -Original Message-
>>>>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
>>>
>>>>> Note that, in case of preemptions, we are switching from a non-idle
>>>>> vcpu to another, non-idle, vcpu, so lazy context switching to the
>>>>> idle
>>>>> vcpu should not have much to do with this case...
>>>>
>>>> So do you mean in preemptions, we cannot switch from non-idle to idle
>>>> or
>>>> Idle to non-idle, i.e, we can only switch from non-idle to non-idle?
>>>>
>>> That's what I meant. It's the definition of 'preemption' and of 'idle
>>> task/vcpu', AFICT. I mean, the idle vcpu has the lowest possible
>>> priority ever, so it can't really preempt anyone.
>>>
>>> OTOH, if the idle vcpu is running, that means there weren't any active
>>> vcpus because, e.g., all were blocked; therefore, any active vcpu
>>> wanting to run would have to wake up (and hence go throught the proper
>>> wake up logic) before trying to preempt idle.
>>>
>>> There is one possible caveat: tasklets. In fact (as you can check
>>> yourself by looking, in the code, for tasklet_work_scheduled), it is
>>> possible that we force the idle vcpu to execute, even when we have
>>> active and runnable vcpus, to let it process tasklets. I'm not really
>>> sure whether this could be a problem for you or not, can you have a
>>> look (and/or, a try) and report back?
>>
>> Thanks for your information about the tasklets part, it is very important,
>> consider the following scenario:
>>
>> non-idle vCPUA --> idle (tasklet) --> non-idle vCPUA
>>
>> The above context switch will use the lazy context switch and cannot be
>> covered in __context_switch(), we need to change SN's state during the
>> "non-idle to idle" and "idle to non-idle" transition, so that means we need
>> add the PI related logic in context_switch() instead of only in 
>> __context_switch().
>>
>> Besides that, it is more robust to do the PI switch in context_switch() which
>> can cover lazy context switch. Maybe in future, there are some other
>> feature needing execute task _inside_ an idle context, and it may introduce
>> some issues due to no PI state transition, and it is a little hard to debug.
> 
> So a transition like the above could happen in the case of a cpu that's
> gone offline (e.g., to allow the devicemodel to handle an IO); or, as
> you say, if we're doing urgent work in a tasklet such that it preempts a
> running task.
> 
> One option would be to just ignore this -- in which case we would get
> spurious PI interrupts, but no other major issues, right?
> 
> But if we want to avoid spurious PI interrupts when running idle, then
> yes, we need *some* kind of a hook on the lazy context switch path.
> 
> /me does some more thinking...

To be honest, since we'll be get spurious PI interrupts in the
hypervisor all the time anyway, I'm inclined at the moment not to worry
to much about this case.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread Jan Beulich
>>> On 22.09.15 at 15:40, <feng...@intel.com> wrote:

> 
>> -Original Message-
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Tuesday, September 22, 2015 5:00 PM
>> To: Wu, Feng
>> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; George Dunlap; Tian, 
> Kevin;
>> xen-devel@lists.xen.org; Keir Fraser
>> Subject: RE: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
>> logic
>> handling
>> 
>> >>> On 22.09.15 at 09:19, <feng...@intel.com> wrote:
>> > However, I do find some issues with my proposal above, see below:
>> >
>> > 1. Set _VPF_blocked
>> > 2. ret = arch_block()
>> > 3. if ( ret || local_events_need_delivery() )
>> >clear_bit(_VPF_blocked, >pause_flags);
>> >
>> > After step #2, if ret == false, that means, we successfully changed the PI
>> > descriptor in arch_block(), if local_events_need_delivery() returns true,
>> > _VPF_blocked is cleared. After that, external interrupt come in, hence
>> > pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
>> > so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.
>> >
>> > One possible solution for it is:
>> > - Disable the interrupts before the check in step #3 above
>> > - if local_events_need_delivery() returns true, undo all the operations
>> >  done in arch_block()
>> > - Enable interrupts after _VPF_blocked gets cleared.
>> 
>> Couldn't this be taken care of by, if necessary, adjusting PI state
>> in vmx_do_resume()?
> 
> What do you mean here? Could you please elaborate? Thanks!

>From the discussion so far I understand that all you're after is that
the PI descriptor is correct for the period of time while the guest
vCPU is actually executing in guest context. If that's a correct
understanding of mine, then setting the vector and flags back to
what's needed while the guest is running would be sufficient to be
done right before entering the guest, i.e. in vmx_do_resume().

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread George Dunlap
On 09/22/2015 02:52 PM, Wu, Feng wrote:
> 
> 
>> -Original Message-
>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
>> Sent: Tuesday, September 22, 2015 9:40 PM
>> To: Wu, Feng; George Dunlap
>> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
>> Cooper; Jan Beulich
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
>> logic
>> handling
>>
>> On Tue, 2015-09-22 at 13:25 +, Wu, Feng wrote:
>>>
>>
>>>> -Original Message-
>>>> From: George Dunlap [mailto:george.dun...@citrix.com]
>>
>>> Specifically, consider the following scheduling case happened on
>>> pCPUA:
>>> vCPUA --> idle --> vCPUB
>>>
>>> 1. First, vCPUA is running on pCPUA, so the NDST filed in PI
>>> descriptor is pCPUA
>>> 2. Then vCPUA is switched out and idle is switched in running in
>>> pCPUA
>>> 3. Sometime later, vCPUB is switched in on pCPUA. However, the NDST
>>> field
>>> for vCPUA is still pCPUA, and currently, vCPUB is running on it. That
>>> means
>>> the spurious PI interrupts for vCPUA can disturb vCPUB (because the
>>> NDST
>>> field is pCPUA), it seems not so good to me.
>>>
>> Mmm... Ok, but you're not saying what caused the transition from vCPUA
>> to idle, and from idle to vCPUB. That matters because, if this all
>> happened because blockings and wakeups, it's nothing to do with lazy
>> context switch, which is what we are discussing here (in the sense that
>> PI data structures would, in those cases, be updated appropriately in
>> block and wake hooks).
> 
> Like George mentioned before, Let's assume it is because of tasklets or
> vCPU is going to offline to wait device model's IO operations, so idle
> is switched in.
> 
>>
>> Also, if we're going from vCPUA to vCPUB, even if there's idle in
>> between, that can't be done via lazy context switch.
> 
> Yes, in the above scenario, vCPUB to idle transition has nothing to
> do with lazy context switch, however, the point here is vCPUA to
> idle is related to lazy context switch, and if we don't set the SN
> for vCPUA here, it will remain clear and the NDST field of vCPUA
> will remain pCPUA, even some time later vCPUB is running on it.
> In that case, the spurious Pi interrupts for vCPUA can disturb vCPUB.
> 
>> In fact, in this
>> case, __context_switch() must be called at some point (during the idle-
>> ->vCPUB switch, if my understanding is correct), and the hook will
>> actually get called!
> 
> Yes, the idle to vCPUB switch is covered by __context_switch(), but
> it cannot change the PI state of vCPUA at that point. Like mentioned
> above, in this case, spurious PI interrupts happens.

On the contrary, when __context_switch() is called for the idle -> vcpuB
transition in the above scenario, it is *actually* context switching
from vcpuA to vcpuB, since vcpuA is still actually on the cpu.   Which
means that if you add PI code into arch->ctxt_switch_from(), the case
you describe above will be handled automatically.

So the only downside to doing everything in block(), wake(), and
__context_switch() is that if a VM is offline, or preempted by a
tasklet, and an interrupt comes in, we will get a spurious PI (i.e., one
which interrupts us but we can't do anything useful about).

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread George Dunlap
On 09/22/2015 02:25 PM, Wu, Feng wrote:
>>> But if we want to avoid spurious PI interrupts when running idle, then
>>> yes, we need *some* kind of a hook on the lazy context switch path.
>>>
>>> /me does some more thinking...
>>
>> To be honest, since we'll be get spurious PI interrupts in the
>> hypervisor all the time anyway, I'm inclined at the moment not to worry
>> to much about this case.
> 
> Why do you say "we'll be get spurious PI interrupts in the  hypervisor all 
> the time"?
> 
> And could you please share what is your concern to handle this case to avoid
> such spurious PI interrupts? Thanks!

So please correct me if I'm wrong in my understanding:

* When a vcpu is in runstate "running", with PI enabled, you have the PI
vector set to "posted_interrupt_vector", SN=0.

* When in this state in non-root mode, PI interrupts result in an
interrupt being delivered directly to the guest.

* When in this state in root mode, PI interrupts result in a
posted_interrupt_vector interrupt being delivered to Xen.

Is that the case?

So basically, if the PI happens to come in when the guest is making a
hypercall, or the guest is doing any other number of things that involve
the hypervisor, then Xen will get a "spurious" PI interrupt -- spurious
because there's nothing Xen actually needs to do about it; the guest
interrupt will be delivered the next time we do a VMENTER.

So spurious PI interrupts are already going to happen from time to time
-- they're not really that big of a deal.  Having them happen when a VM
is running a tasklet or idle waiting for qemu isn't such a big deal either.

 -George


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-22 Thread Dario Faggioli
On Tue, 2015-09-22 at 15:15 +0100, George Dunlap wrote:
> On 09/22/2015 02:52 PM, Wu, Feng wrote:
> > 
> > 
> > > -Original Message-
> > > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]

> > > Yes, the idle to vCPUB switch is covered by __context_switch(),
> > > but
> > it cannot change the PI state of vCPUA at that point. Like
> > mentioned
> > above, in this case, spurious PI interrupts happens.
> 
> On the contrary, when __context_switch() is called for the idle ->
> vcpuB
> transition in the above scenario, it is *actually* context switching
> from vcpuA to vcpuB, since vcpuA is still actually on the cpu.  
>  Which
> means that if you add PI code into arch->ctxt_switch_from(), the case
> you describe above will be handled automatically.
> 
Yep, that's exactly what I was also writing... luckily, I've seen
George's email before getting too far with that! :-P

That's the point of lazy context switch. The context of vCPUA is still
on pCPUA during the time idle executes. If, at the next transition
point, we switch from idle to vCPUA again, then nothing is needed, and
we just saved one context saving and one context restoring operation.
If it is something else, like in your case, we need to save vCPUA's
context, which is actually what we have on pCPUA, despite idle (and not
vCPUA itself) was what was running.

> So the only downside to doing everything in block(), wake(), and
> __context_switch() is that if a VM is offline, or preempted by a
> tasklet, and an interrupt comes in, we will get a spurious PI (i.e.,
> one
> which interrupts us but we can't do anything useful about).
> 
Indeed. And that also seems bearable to me. Feng, are there reasons why
you think it's not?

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-21 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Monday, September 21, 2015 10:12 PM
> To: Wu, Feng; George Dunlap
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On Mon, 2015-09-21 at 13:50 +, Wu, Feng wrote:
> >
> > > -Original Message-
> > > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> 
> > > Note that, in case of preemptions, we are switching from a non-idle
> > > vcpu to another, non-idle, vcpu, so lazy context switching to the
> > > idle
> > > vcpu should not have much to do with this case...
> >
> > So do you mean in preemptions, we cannot switch from non-idle to idle
> > or
> > Idle to non-idle, i.e, we can only switch from non-idle to non-idle?
> >
> That's what I meant. It's the definition of 'preemption' and of 'idle
> task/vcpu', AFICT. I mean, the idle vcpu has the lowest possible
> priority ever, so it can't really preempt anyone.
> 
> OTOH, if the idle vcpu is running, that means there weren't any active
> vcpus because, e.g., all were blocked; therefore, any active vcpu
> wanting to run would have to wake up (and hence go throught the proper
> wake up logic) before trying to preempt idle.
> 
> There is one possible caveat: tasklets. In fact (as you can check
> yourself by looking, in the code, for tasklet_work_scheduled), it is
> possible that we force the idle vcpu to execute, even when we have
> active and runnable vcpus, to let it process tasklets. I'm not really
> sure whether this could be a problem for you or not, can you have a
> look (and/or, a try) and report back?

Thanks for your information about the tasklets part, it is very important,
consider the following scenario:

non-idle vCPUA --> idle (tasklet) --> non-idle vCPUA

The above context switch will use the lazy context switch and cannot be
covered in __context_switch(), we need to change SN's state during the
"non-idle to idle" and "idle to non-idle" transition, so that means we need
add the PI related logic in context_switch() instead of only in 
__context_switch().

Besides that, it is more robust to do the PI switch in context_switch() which
can cover lazy context switch. Maybe in future, there are some other
feature needing execute task _inside_ an idle context, and it may introduce
some issues due to no PI state transition, and it is a little hard to debug.

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-21 Thread Dario Faggioli
On Mon, 2015-09-21 at 11:59 +, Wu, Feng wrote:
> 
> > -Original Message-
> > From: George Dunlap [mailto:george.dun...@citrix.com]

> > > I think the handling for lazy context switch is not only for the
> > > blocking case,
> > > we still need to do something for lazy context switch even we
> > > handled the
> > > blocking case in vcpu_block(), such as,
> > > 1. For non-idle -> idle
> > > - set 'SN'
> > 
> > If we set SN in vcpu_block(), then we don't need to set it on
> > context
> > switch -- 是不是?
> 
> For preemption case (not blocking case) , we still need to clear/set
> SN, and
> this has no business with vcpu_block()/vcpu_wake(), right? Do I miss
> something
> here? BTW, you Chinese is good! :)
> 
Well, sure, preemptions are fine being dealt with during context
switches.

AFAICR, Geoge was suggesting investigating the possibility of doing
that within the already existing arch specific part of the context
switch itself. Have you checked whether that would be possible? If yes,
it really would be great, IMO.

Note that, in case of preemptions, we are switching from a non-idle
vcpu to another, non-idle, vcpu, so lazy context switching to the idle
vcpu should not have much to do with this case... Was this something
you were saying something/asking about above? (seems so, but I can't be
sure, so I thought I better ask :-) ).

Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-21 Thread Wu, Feng


> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: Monday, September 21, 2015 5:19 PM
> To: Wu, Feng; Dario Faggioli
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On 09/21/2015 06:08 AM, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: George Dunlap [mailto:george.dun...@citrix.com]
> >> Sent: Thursday, September 17, 2015 5:38 PM
> >> To: Dario Faggioli; Wu, Feng
> >> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap;
> Andrew
> >> Cooper; Jan Beulich
> >> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core
> logic
> >> handling
> >>
> >> On 09/17/2015 09:48 AM, Dario Faggioli wrote:
> >>> On Thu, 2015-09-17 at 08:00 +, Wu, Feng wrote:
> >>>
> >>>>> -Original Message-
> >>>>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> >>>
> >>>>> So, I guess, first of all, can you confirm whether or not it's exploding
> >>>>> in debug builds?
> >>>>
> >>>> Does the following information in Config.mk mean it is a debug build?
> >>>>
> >>>> # A debug build of Xen and tools?
> >>>> debug ?= y
> >>>> debug_symbols ?= $(debug)
> >>>>
> >>> I think so. But as I said in my other email, I was wrong, and this is
> >>> probably not an issue.
> >>>
> >>>>> And in either case (just tossing out ideas) would it be
> >>>>> possible to deal with the "interrupt already raised when blocking" case:
> >>>>
> >>>> Thanks for the suggestions below!
> >>>>
> >>> :-)
> >>>
> >>>>>  - later in the context switching function ?
> >>>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi()
> instead
> >>>> of calling vcpu_unblock() directly, then when it returns to
> context_switch(),
> >>>> we can check the flag and don't really block the vCPU.
> >>>>
> >>> Yeah, and that would still be rather hard to understand and maintain,
> >>> IMO.
> >>>
> >>>> But I don't have a clear
> >>>> picture about how to archive this, here are some questions from me:
> >>>> - When we are in context_switch(), we have done the following changes to
> >>>> vcpu's state:
> >>>>  * sd->curr is set to next
> >>>>  * vCPU's running state (both prev and next ) is changed by
> >>>>vcpu_runstate_change()
> >>>>  * next->is_running is set to 1
> >>>>  * periodic timer for prev is stopped
> >>>>  * periodic timer for next is setup
> >>>>  ..
> >>>>
> >>>> So what point should we perform the action to _unblock_ the vCPU? We
> >>>> Need to roll back the formal changes to the vCPU's state, right?
> >>>>
> >>> Mmm... not really. Not blocking prev does not mean that prev would be
> >>> kept running on the pCPU, and that's true for your current solution as
> >>> well! As you say yourself, you're already in the process of switching
> >>> between prev and next, at a point where it's already a thing that next
> >>> will be the vCPU that will run. Not blocking means that prev is
> >>> reinserted to the runqueue, and a new invocation to the scheduler is
> >>> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
> >>> __runq_tickle()), but it's only when such new scheduling happens that
> >>> prev will (potentially) be selected to run again.
> >>>
> >>> So, no, unless I'm fully missing your point, there wouldn't be no
> >>> rollback required. However, I still would like the other solution (doing
> >>> stuff in vcpu_block()) better (see below).
> >>>
> >>>>>  - with another hook, perhaps in vcpu_block() ?
> >>>>
> >>>> We could check this in vcpu_block(), however, the logic here is that 
> >>>> before
> >>>> vCPU is blocked, we need to change the posted-interrupt descriptor,
> >>>> and during changing it, if 'ON' bit is set, which means VT-d hardware
> >>>> issues a no

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-21 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Monday, September 21, 2015 9:32 PM
> To: Wu, Feng; George Dunlap
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On Mon, 2015-09-21 at 11:59 +, Wu, Feng wrote:
> >
> > > -Original Message-
> > > From: George Dunlap [mailto:george.dun...@citrix.com]
> 
> > > > I think the handling for lazy context switch is not only for the
> > > > blocking case,
> > > > we still need to do something for lazy context switch even we
> > > > handled the
> > > > blocking case in vcpu_block(), such as,
> > > > 1. For non-idle -> idle
> > > > - set 'SN'
> > >
> > > If we set SN in vcpu_block(), then we don't need to set it on
> > > context
> > > switch -- 是不是?
> >
> > For preemption case (not blocking case) , we still need to clear/set
> > SN, and
> > this has no business with vcpu_block()/vcpu_wake(), right? Do I miss
> > something
> > here? BTW, you Chinese is good! :)
> >
> Well, sure, preemptions are fine being dealt with during context
> switches.
> 
> AFAICR, Geoge was suggesting investigating the possibility of doing
> that within the already existing arch specific part of the context
> switch itself. Have you checked whether that would be possible? If yes,
> it really would be great, IMO.

This is George's suggestion about this:

" And at that point, could we actually get rid of the PI-specific context
switch hooks altogether, and just put the SN state changes required for
running->runnable and runnable->running in vmx_ctxt_switch_from() and
vmx_ctxt_switch_to()?"

However, vmx_ctxt_switch_from() and vmx_ctxt_switch_to() are called in
__context_switch(), which still cannot cover "non-idle -> idle" and "idle -> 
non-idle"
lazy transition. And I think we need to change SN's state during the two
transitions.

> 
> Note that, in case of preemptions, we are switching from a non-idle
> vcpu to another, non-idle, vcpu, so lazy context switching to the idle
> vcpu should not have much to do with this case... 

So do you mean in preemptions, we cannot switch from non-idle to idle or
Idle to non-idle, i.e, we can only switch from non-idle to non-idle?

Thanks,
Feng

> Was this something
> you were saying something/asking about above? (seems so, but I can't be
> sure, so I thought I better ask :-) ).



> 
> Regards,
> Dario
> 
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-21 Thread Dario Faggioli
On Mon, 2015-09-21 at 12:22 +, Wu, Feng wrote:
> 
> > -Original Message-
> > From: George Dunlap [mailto:george.dun...@citrix.com]

> > You also need to check that local_events_need_delivery() will
> > return
> > "true" if you get an interrupt between that time and entering the
> > hypervisor.  Will that happen automatically from
> > hvm_local_events_need_delivery() -> hvm_vcpu_has_pending_irq() ->
> > vlapic_has_pending_irq()?  Or will you need to add a hook in
> > hvm_vcpu_has_pending_irq()?
> 
> I think I don't need to add hook in hvm_vcpu_has_pending_irq(), what
> I need
> to do in vcpu_block() and do_poll() is as below:
> 
> 1. set_bit(_VPF_blocked, >pause_flags);
> 
> 2. ret = v->arch.arch_block(), in this hook, we can re-use the same
> logic in
> vmx_pre_ctx_switch_pi(), and check whether ON bit is set during
> updating
> posted-interrupt descriptor, can return 1 when ON is set
>
It think it would be ok for an hook to return a value (maybe, if doing
that, let's pick variable names and use comments to explain what goes
on as good as we can).

I think I also see why you seem to prefer doing it that way, rather
than hacking local_events_need_delivery(), but can you please elaborate
on that? (My feeling is that you want to avoid having to update the
data structures in between _VPF_blocked and the check
local_events_need_delivery(), and then having to roll such update back
if local_events_need_delivery() ends up being false, is that the
case?).

Code would look better, IMO, if we manage to fold that somehow inside
local_events_need_delivery(), but that:
 1. is hard to tell without actually seeing how the code will end up 
being
 2. might be my opinion only, so let's see what others think.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-21 Thread Dario Faggioli
On Mon, 2015-09-21 at 13:50 +, Wu, Feng wrote:
> 
> > -Original Message-
> > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]

> > Note that, in case of preemptions, we are switching from a non-idle
> > vcpu to another, non-idle, vcpu, so lazy context switching to the
> > idle
> > vcpu should not have much to do with this case... 
> 
> So do you mean in preemptions, we cannot switch from non-idle to idle
> or
> Idle to non-idle, i.e, we can only switch from non-idle to non-idle?
> 
That's what I meant. It's the definition of 'preemption' and of 'idle
task/vcpu', AFICT. I mean, the idle vcpu has the lowest possible
priority ever, so it can't really preempt anyone.

OTOH, if the idle vcpu is running, that means there weren't any active
vcpus because, e.g., all were blocked; therefore, any active vcpu
wanting to run would have to wake up (and hence go throught the proper
wake up logic) before trying to preempt idle.

There is one possible caveat: tasklets. In fact (as you can check
yourself by looking, in the code, for tasklet_work_scheduled), it is
possible that we force the idle vcpu to execute, even when we have
active and runnable vcpus, to let it process tasklets. I'm not really
sure whether this could be a problem for you or not, can you have a
look (and/or, a try) and report back?

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-21 Thread George Dunlap
On 09/21/2015 06:09 AM, Wu, Feng wrote:
> 
> 
>> -Original Message-
>> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of George
>> Dunlap
>> Sent: Friday, September 18, 2015 10:34 PM
>> To: Dario Faggioli
>> Cc: Jan Beulich; George Dunlap; Tian, Kevin; Keir Fraser; Andrew Cooper;
>> xen-devel@lists.xen.org; Wu, Feng
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
>> logic
>> handling
>>
>> On Fri, Sep 18, 2015 at 3:31 PM, George Dunlap
>> <george.dun...@eu.citrix.com> wrote:
>>>> As said, me too. Perhaps we can go for option 1, which is simpler,
>>>> cleaner and more consistent, considering the current status of the
>>>> code. We can always investigate, in future, whether and how to
>>>> implement the optimization for all the blockings, if beneficial and fea
>>>> sible, or have them diverge, if deemed worthwhile.
>>>
>>> Sounds like a plan.
>>
>> Er, just in case that idiom wasn't clear: Option 1 sounds like a
>> *good* plan, so unless Feng disagrees, let's go with that. :-)
> 
> Sorry for the late response, I was on leave last Friday.
> 
> Thanks for your discussions and suggestions. I have one question about option 
> 1.
> I find that there are two places where '_VPF_blocked' can get set: 
> vcpu_block()
> and do_poll(). After putting the logic in vcpu_block(), do we need to care 
> about
> do_poll(). I don't know the purpose of do_poll() and the usage case of it.
> Dario/George, could you please share some knowledge about it? Thanks a lot!

Yes, you'll need to make the callback everywhere _VPF_blocked is set.

Normally you'd want to try to refactor both of those to share a commmon
codepath, but it looks like there are specific reasons why they have to
be different codepaths; so you'll just have to make the callback in both
places (after setting VPF_blocked).

You also need to check that local_events_need_delivery() will return
"true" if you get an interrupt between that time and entering the
hypervisor.  Will that happen automatically from
hvm_local_events_need_delivery() -> hvm_vcpu_has_pending_irq() ->
vlapic_has_pending_irq()?  Or will you need to add a hook in
hvm_vcpu_has_pending_irq()?

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-21 Thread Wu, Feng


> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: Monday, September 21, 2015 5:54 PM
> To: Wu, Feng; George Dunlap; Dario Faggioli
> Cc: Jan Beulich; Tian, Kevin; Keir Fraser; Andrew Cooper;
> xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On 09/21/2015 06:09 AM, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> George
> >> Dunlap
> >> Sent: Friday, September 18, 2015 10:34 PM
> >> To: Dario Faggioli
> >> Cc: Jan Beulich; George Dunlap; Tian, Kevin; Keir Fraser; Andrew Cooper;
> >> xen-devel@lists.xen.org; Wu, Feng
> >> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core
> logic
> >> handling
> >>
> >> On Fri, Sep 18, 2015 at 3:31 PM, George Dunlap
> >> <george.dun...@eu.citrix.com> wrote:
> >>>> As said, me too. Perhaps we can go for option 1, which is simpler,
> >>>> cleaner and more consistent, considering the current status of the
> >>>> code. We can always investigate, in future, whether and how to
> >>>> implement the optimization for all the blockings, if beneficial and fea
> >>>> sible, or have them diverge, if deemed worthwhile.
> >>>
> >>> Sounds like a plan.
> >>
> >> Er, just in case that idiom wasn't clear: Option 1 sounds like a
> >> *good* plan, so unless Feng disagrees, let's go with that. :-)
> >
> > Sorry for the late response, I was on leave last Friday.
> >
> > Thanks for your discussions and suggestions. I have one question about 
> > option
> 1.
> > I find that there are two places where '_VPF_blocked' can get set:
> vcpu_block()
> > and do_poll(). After putting the logic in vcpu_block(), do we need to care
> about
> > do_poll(). I don't know the purpose of do_poll() and the usage case of it.
> > Dario/George, could you please share some knowledge about it? Thanks a lot!
> 
> Yes, you'll need to make the callback everywhere _VPF_blocked is set.
> 
> Normally you'd want to try to refactor both of those to share a commmon
> codepath, but it looks like there are specific reasons why they have to
> be different codepaths; so you'll just have to make the callback in both
> places (after setting VPF_blocked).
> 
> You also need to check that local_events_need_delivery() will return
> "true" if you get an interrupt between that time and entering the
> hypervisor.  Will that happen automatically from
> hvm_local_events_need_delivery() -> hvm_vcpu_has_pending_irq() ->
> vlapic_has_pending_irq()?  Or will you need to add a hook in
> hvm_vcpu_has_pending_irq()?

I think I don't need to add hook in hvm_vcpu_has_pending_irq(), what I need
to do in vcpu_block() and do_poll() is as below:

1. set_bit(_VPF_blocked, >pause_flags);

2. ret = v->arch.arch_block(), in this hook, we can re-use the same logic in
vmx_pre_ctx_switch_pi(), and check whether ON bit is set during updating
posted-interrupt descriptor, can return 1 when ON is set, something like the
following:

do {
old.control = new.control = pi_desc->control;

/* Should not block the vCPU if an interrupt was posted for it. */
if ( pi_test_on() )
{
spin_unlock_irqrestore(>arch.hvm_vmx.pi_lock, flags);
return 1;
}

/*
 * Change the 'NDST' field to v->arch.hvm_vmx.pi_block_cpu,
 * so when external interrupts from assigned deivces happen,
 * wakeup notifiction event will go to
 * v->arch.hvm_vmx.pi_block_cpu, then in pi_wakeup_interrupt()
 * we can find the vCPU in the right list to wake up.
 */
dest = cpu_physical_id(v->arch.hvm_vmx.pi_block_cpu);

if ( x2apic_enabled )
new.ndst = dest;
else
new.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);

pi_clear_sn();
new.nv = pi_wakeup_vector;
} while ( cmpxchg(_desc->control, old.control, new.control) !=
  old.control );

3. After returning from arch_block() we can simple check:
if ( ret || local_events_need_delivery() )
Don't block the vCPU;

Thanks,
Feng


> 
>  -George
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-21 Thread George Dunlap
On 09/21/2015 06:08 AM, Wu, Feng wrote:
> 
> 
>> -Original Message-
>> From: George Dunlap [mailto:george.dun...@citrix.com]
>> Sent: Thursday, September 17, 2015 5:38 PM
>> To: Dario Faggioli; Wu, Feng
>> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
>> Cooper; Jan Beulich
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
>> logic
>> handling
>>
>> On 09/17/2015 09:48 AM, Dario Faggioli wrote:
>>> On Thu, 2015-09-17 at 08:00 +, Wu, Feng wrote:
>>>
>>>>> -Original Message-
>>>>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
>>>
>>>>> So, I guess, first of all, can you confirm whether or not it's exploding
>>>>> in debug builds?
>>>>
>>>> Does the following information in Config.mk mean it is a debug build?
>>>>
>>>> # A debug build of Xen and tools?
>>>> debug ?= y
>>>> debug_symbols ?= $(debug)
>>>>
>>> I think so. But as I said in my other email, I was wrong, and this is
>>> probably not an issue.
>>>
>>>>> And in either case (just tossing out ideas) would it be
>>>>> possible to deal with the "interrupt already raised when blocking" case:
>>>>
>>>> Thanks for the suggestions below!
>>>>
>>> :-)
>>>
>>>>>  - later in the context switching function ?
>>>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi() 
>>>> instead
>>>> of calling vcpu_unblock() directly, then when it returns to 
>>>> context_switch(),
>>>> we can check the flag and don't really block the vCPU.
>>>>
>>> Yeah, and that would still be rather hard to understand and maintain,
>>> IMO.
>>>
>>>> But I don't have a clear
>>>> picture about how to archive this, here are some questions from me:
>>>> - When we are in context_switch(), we have done the following changes to
>>>> vcpu's state:
>>>>* sd->curr is set to next
>>>>* vCPU's running state (both prev and next ) is changed by
>>>>  vcpu_runstate_change()
>>>>* next->is_running is set to 1
>>>>* periodic timer for prev is stopped
>>>>* periodic timer for next is setup
>>>>..
>>>>
>>>> So what point should we perform the action to _unblock_ the vCPU? We
>>>> Need to roll back the formal changes to the vCPU's state, right?
>>>>
>>> Mmm... not really. Not blocking prev does not mean that prev would be
>>> kept running on the pCPU, and that's true for your current solution as
>>> well! As you say yourself, you're already in the process of switching
>>> between prev and next, at a point where it's already a thing that next
>>> will be the vCPU that will run. Not blocking means that prev is
>>> reinserted to the runqueue, and a new invocation to the scheduler is
>>> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
>>> __runq_tickle()), but it's only when such new scheduling happens that
>>> prev will (potentially) be selected to run again.
>>>
>>> So, no, unless I'm fully missing your point, there wouldn't be no
>>> rollback required. However, I still would like the other solution (doing
>>> stuff in vcpu_block()) better (see below).
>>>
>>>>>  - with another hook, perhaps in vcpu_block() ?
>>>>
>>>> We could check this in vcpu_block(), however, the logic here is that before
>>>> vCPU is blocked, we need to change the posted-interrupt descriptor,
>>>> and during changing it, if 'ON' bit is set, which means VT-d hardware
>>>> issues a notification event because interrupts from the assigned devices
>>>> is coming, we don't need to block the vCPU and hence no need to update
>>>> the PI descriptor in this case.
>>>>
>>> Yep, I saw that. But could it be possible to do *everything* related to
>>> blocking, including the update of the descriptor, in vcpu_block(), if no
>>> interrupt have been raised yet at that time? I mean, would you, if
>>> updating the descriptor in there, still get the event that allows you to
>>> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
>>> the blocking, no matter whether that resulted in an actual context
>>> switch already or not?
>>>
>

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-20 Thread Wu, Feng


> -Original Message-
> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of George
> Dunlap
> Sent: Friday, September 18, 2015 10:34 PM
> To: Dario Faggioli
> Cc: Jan Beulich; George Dunlap; Tian, Kevin; Keir Fraser; Andrew Cooper;
> xen-devel@lists.xen.org; Wu, Feng
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On Fri, Sep 18, 2015 at 3:31 PM, George Dunlap
> <george.dun...@eu.citrix.com> wrote:
> >> As said, me too. Perhaps we can go for option 1, which is simpler,
> >> cleaner and more consistent, considering the current status of the
> >> code. We can always investigate, in future, whether and how to
> >> implement the optimization for all the blockings, if beneficial and fea
> >> sible, or have them diverge, if deemed worthwhile.
> >
> > Sounds like a plan.
> 
> Er, just in case that idiom wasn't clear: Option 1 sounds like a
> *good* plan, so unless Feng disagrees, let's go with that. :-)

Sorry for the late response, I was on leave last Friday.

Thanks for your discussions and suggestions. I have one question about option 1.
I find that there are two places where '_VPF_blocked' can get set: vcpu_block()
and do_poll(). After putting the logic in vcpu_block(), do we need to care about
do_poll(). I don't know the purpose of do_poll() and the usage case of it.
Dario/George, could you please share some knowledge about it? Thanks a lot!

Thanks,
Feng


> 
>  -George
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-18 Thread Jan Beulich
>>> George Dunlap  09/17/15 4:30 PM >>>
>On 09/17/2015 01:40 PM, Dario Faggioli wrote:
>So one option is to do the "blocking" stuff in an arch-specific call
>from vcpu_block():
>
>vcpu_block()
 >set(_VPF_blocked)
 >v->arch.block()
  >- Add v to pcpu.pi_blocked_vcpu
  >- NV => pi_wakeup_vector
 >local_events_need_delivery()
   >hvm_vcpu_has_pending_irq()
>
 >...
 >context_switch(): nothing
>
>The other is to do the "blocking" stuff in the context switch (similar
>to what's done now):
>
>vcpu_block()
  >set(_VPF_blocked)
  >local_events_need_delivery()
>hvm_vcpu_has_pending_irq()
  >...
>context_switch
   >v->arch.block()
>- Add v to pcpu.pi_blocked_vcpu
>- NV => pi_wakeup_vector
>
>If we do it the first way, and an interrupt comes in before the context
>switch is finished, it will call pi_wakeup_vector, which will DTRT --
>take v off the pi_blocked_vcpu list and call vcpu_unblock() (which in
>turn will set NV back to posted_intr_vector).
>
>If we do it the second way, and an interrupt comes in before the context
>switch is finished, it will call posted_intr_vector.  We can, at that
>point, check to see if the current vcpu is marked as blocked.  If it is,
>we can call vcpu_unblock() without having to modify NV or worry about
>adding / removing the vcpu from the pi_blocked_vcpu list.
>
>The thing I like about the first one is that it makes PI blocking the
>same as normal blocking -- everything happens in the same place; so the
>code is cleaner and easier to understand.
>
>The thing I like about the second one is that it cleverly avoids having
>to do all the work of adding the vcpu to the list and then searching to
>remove it if the vcpu in question gets woken up on the way to being
>blocked.  So the code may end up being faster for workloads where that
>happens frequently.

 But wouldn't such an optimization argument apply to some/all other
blocking ways too? I.e. shouldn't, if we were to go that route, other
early wakeups get treated equally? Unless that's wrong thinking of mine
or being implemented that way, I'd clearly favor option 1 for consistency
reasons.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-18 Thread Dario Faggioli
On Fri, 2015-09-18 at 00:27 -0600, Jan Beulich wrote:
> > George Dunlap  09/17/15 4:30 PM >>>The

> > > > vcpu_block()
>  >set(_VPF_blocked)
>  >v->arch.block()
>   >- Add v to pcpu.pi_blocked_vcpu
>   >- NV => pi_wakeup_vector
>  >local_events_need_delivery()
>>hvm_vcpu_has_pending_irq()
> > 
>  >...
>  >context_switch(): nothing
> > 
> > The other is to do the "blocking" stuff in the context switch
> > (similar
> > to what's done now):
> > 
> > vcpu_block()
>   >set(_VPF_blocked)
>   >local_events_need_delivery()
> >hvm_vcpu_has_pending_irq()
>   >...
> > context_switch
>>v->arch.block()
> >- Add v to pcpu.pi_blocked_vcpu
> >- NV => pi_wakeup_vector
> >
> > [...]
> >
> > > > thing I like about the first one is that it makes PI blocking
> > > > the
> > same as normal blocking -- everything happens in the same place; so
> > the
> > code is cleaner and easier to understand.
> > 
> > The thing I like about the second one is that it cleverly avoids
> > having
> > to do all the work of adding the vcpu to the list and then
> > searching to
> > remove it if the vcpu in question gets woken up on the way to being
> > blocked.  So the code may end up being faster for workloads where
> > that
> > happens frequently.
> 
>  But wouldn't such an optimization argument apply to some/all other
> blocking ways too? I.e. shouldn't, if we were to go that route, other
> early wakeups get treated equally? 
>
Good point, actually.

However, with "regular blockings" this would probably be less of an
optimization.

In fact, in PI case, George's solution 2 is, potentially, avoiding
having to switch the descriptor and manipulating the list of blocked
vcpus.
In regular blockings there are no such things. We may be able to avoid
a context switch, but that also depends, AFAICT, on when the interrupt
exactly happens and/or is notified (i.e., before the call to schedule()
as opposed to after that and before the actual __context_switch()). I'm
not saying this wouldn't be better, but it's certainly optimizing less
than in the PI case.

> Unless that's wrong thinking of mine
> or being implemented that way, I'd clearly favor option 1 for
> consistency
> reasons.
> 
As said, me too. Perhaps we can go for option 1, which is simpler,
cleaner and more consistent, considering the current status of the
code. We can always investigate, in future, whether and how to
implement the optimization for all the blockings, if beneficial and fea
sible, or have them diverge, if deemed worthwhile.

Regards,
Dario
--
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-18 Thread George Dunlap
On Fri, Sep 18, 2015 at 3:31 PM, George Dunlap
 wrote:
>> As said, me too. Perhaps we can go for option 1, which is simpler,
>> cleaner and more consistent, considering the current status of the
>> code. We can always investigate, in future, whether and how to
>> implement the optimization for all the blockings, if beneficial and fea
>> sible, or have them diverge, if deemed worthwhile.
>
> Sounds like a plan.

Er, just in case that idiom wasn't clear: Option 1 sounds like a
*good* plan, so unless Feng disagrees, let's go with that. :-)

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-18 Thread George Dunlap
On Fri, Sep 18, 2015 at 10:22 AM, Dario Faggioli
 wrote:
> On Fri, 2015-09-18 at 00:27 -0600, Jan Beulich wrote:
>> > George Dunlap  09/17/15 4:30 PM >>>The
>
>> > > > vcpu_block()
>>  >set(_VPF_blocked)
>>  >v->arch.block()
>>   >- Add v to pcpu.pi_blocked_vcpu
>>   >- NV => pi_wakeup_vector
>>  >local_events_need_delivery()
>>>hvm_vcpu_has_pending_irq()
>> >
>>  >...
>>  >context_switch(): nothing
>> >
>> > The other is to do the "blocking" stuff in the context switch
>> > (similar
>> > to what's done now):
>> >
>> > vcpu_block()
>>   >set(_VPF_blocked)
>>   >local_events_need_delivery()
>> >hvm_vcpu_has_pending_irq()
>>   >...
>> > context_switch
>>>v->arch.block()
>> >- Add v to pcpu.pi_blocked_vcpu
>> >- NV => pi_wakeup_vector
>> >
>> > [...]
>> >
>> > > > thing I like about the first one is that it makes PI blocking
>> > > > the
>> > same as normal blocking -- everything happens in the same place; so
>> > the
>> > code is cleaner and easier to understand.
>> >
>> > The thing I like about the second one is that it cleverly avoids
>> > having
>> > to do all the work of adding the vcpu to the list and then
>> > searching to
>> > remove it if the vcpu in question gets woken up on the way to being
>> > blocked.  So the code may end up being faster for workloads where
>> > that
>> > happens frequently.
>>
>>  But wouldn't such an optimization argument apply to some/all other
>> blocking ways too? I.e. shouldn't, if we were to go that route, other
>> early wakeups get treated equally?
>>
> Good point, actually.
>
> However, with "regular blockings" this would probably be less of an
> optimization.
>
> In fact, in PI case, George's solution 2 is, potentially, avoiding
> having to switch the descriptor and manipulating the list of blocked
> vcpus.
> In regular blockings there are no such things. We may be able to avoid
> a context switch, but that also depends, AFAICT, on when the interrupt
> exactly happens and/or is notified (i.e., before the call to schedule()
> as opposed to after that and before the actual __context_switch()). I'm
> not saying this wouldn't be better, but it's certainly optimizing less
> than in the PI case.
>
>> Unless that's wrong thinking of mine
>> or being implemented that way, I'd clearly favor option 1 for
>> consistency
>> reasons.
>>
> As said, me too. Perhaps we can go for option 1, which is simpler,
> cleaner and more consistent, considering the current status of the
> code. We can always investigate, in future, whether and how to
> implement the optimization for all the blockings, if beneficial and fea
> sible, or have them diverge, if deemed worthwhile.

Sounds like a plan.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-17 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Thursday, September 17, 2015 1:18 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On Fri, 2015-09-11 at 16:29 +0800, Feng Wu wrote:
> > This patch includes the following aspects:
> > - Handling logic when vCPU is blocked:
> > * Add a global vector to wake up the blocked vCPU
> >   when an interrupt is being posted to it (This part
> >   was sugguested by Yang Zhang <yang.z.zh...@intel.com>).
> > * Define two per-cpu variables:
> >   1. pi_blocked_vcpu:
> > A list storing the vCPUs which were blocked
> > on this pCPU.
> >
> >   2. pi_blocked_vcpu_lock:
> > The spinlock to protect pi_blocked_vcpu.
> >
> > - Add some scheduler hooks, this part was suggested
> >   by Dario Faggioli <dario.faggi...@citrix.com>.
> > * vmx_pre_ctx_switch_pi()
> >   It is called before context switch, we update the
> >   posted interrupt descriptor when the vCPU is preempted,
> >   go to sleep, or is blocked.
> >
> > * vmx_post_ctx_switch_pi()
> >   It is called after context switch, we update the posted
> >   interrupt descriptor when the vCPU is going to run.
> >
> > * arch_vcpu_wake_prepare()
> >   It will be called when waking up the vCPU, we update
> >   the posted interrupt descriptor when the vCPU is
> >   unblocked.
> >
> > CC: Keir Fraser <k...@xen.org>
> > CC: Jan Beulich <jbeul...@suse.com>
> > CC: Andrew Cooper <andrew.coop...@citrix.com>
> > CC: Kevin Tian <kevin.t...@intel.com>
> > CC: George Dunlap <george.dun...@eu.citrix.com>
> > CC: Dario Faggioli <dario.faggi...@citrix.com>
> > Sugguested-by: Dario Faggioli <dario.faggi...@citrix.com>
> > Signed-off-by: Feng Wu <feng...@intel.com>
> > Reviewed-by: Dario Faggioli <dario.faggi...@citrix.com>
> > ---
> > v7:
> > - Merge [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> >   and "[PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is
> blocked"
> >   into this patch, so it is self-contained and more convenient
> >   for code review.
> > - Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
> > - Coding style
> > - Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
> > - Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
> > - Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
> > - Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
> > - Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
> > - Use 'spin_lock' and 'spin_unlock' when the interrupt has been
> >   already disabled.
> > - Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
> > - Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
> > - Call .pi_ctxt_switch_to() __context_switch() instead of directly
> >   calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
> > - Make .pi_block_cpu unsigned int
> > - Use list_del() instead of list_del_init()
> > - Coding style
> >
> > One remaining item:
> > Jan has concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi(),
> > need Dario or George's input about this.
> >
> Hi,
> 
> Sorry for the delay in replying, I was on PTO for a few time.

That's fine. Thanks for your reply!

> 
> Coming to the issue, well, it's a though call.
> 
> First of all, Feng, have you tested this with a debug build of Xen? I'm
> asking because it looks to me that you're ending up calling vcpu_wake()
> with IRQ disabled which, if my brain is not too rusty after a few weeks
> of vacation, should result in check_lock() (in xen/common/spinlock.c)
> complaining, doesn't it?
> 
> In fact, in principle this is not too much different from what happens
> in other places. More specifically, what we have is a vcpu being
> re-inserted in  a runqueue, and the need for re-running the scheduler on
> a(some) PCPU(s) is evaluated. That is similar to what happens in Credit2
> (and in RTDS) in csched2_context_saved(), which is called from within
> context_saved(), still from the context switch code (if
> __CSFLAG_delayed_runq_add is true).
> 
> So it's not the thing per se that is that terrible, IMO. The differences
> between that and your case are:
>  - in the Credit2 case, it happens

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-17 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Thursday, September 17, 2015 4:48 PM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On Thu, 2015-09-17 at 08:00 +, Wu, Feng wrote:
> 
> > > -Original Message-
> > > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> 
> > > So, I guess, first of all, can you confirm whether or not it's exploding
> > > in debug builds?
> >
> > Does the following information in Config.mk mean it is a debug build?
> >
> > # A debug build of Xen and tools?
> > debug ?= y
> > debug_symbols ?= $(debug)
> >
> I think so. But as I said in my other email, I was wrong, and this is
> probably not an issue.
> 
> > > And in either case (just tossing out ideas) would it be
> > > possible to deal with the "interrupt already raised when blocking" case:
> >
> > Thanks for the suggestions below!
> >
> :-)
> 
> > >  - later in the context switching function ?
> > In this case, we might need to set a flag in vmx_pre_ctx_switch_pi() instead
> > of calling vcpu_unblock() directly, then when it returns to 
> > context_switch(),
> > we can check the flag and don't really block the vCPU.
> >
> Yeah, and that would still be rather hard to understand and maintain,
> IMO.
> 
> > But I don't have a clear
> > picture about how to archive this, here are some questions from me:
> > - When we are in context_switch(), we have done the following changes to
> > vcpu's state:
> > * sd->curr is set to next
> > * vCPU's running state (both prev and next ) is changed by
> >   vcpu_runstate_change()
> > * next->is_running is set to 1
> > * periodic timer for prev is stopped
> > * periodic timer for next is setup
> > ..
> >
> > So what point should we perform the action to _unblock_ the vCPU? We
> > Need to roll back the formal changes to the vCPU's state, right?
> >
> Mmm... not really. Not blocking prev does not mean that prev would be
> kept running on the pCPU, and that's true for your current solution as
> well! As you say yourself, you're already in the process of switching
> between prev and next, at a point where it's already a thing that next
> will be the vCPU that will run. Not blocking means that prev is
> reinserted to the runqueue, and a new invocation to the scheduler is
> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
> __runq_tickle()), but it's only when such new scheduling happens that
> prev will (potentially) be selected to run again.
> 
> So, no, unless I'm fully missing your point, there wouldn't be no
> rollback required. However, I still would like the other solution (doing
> stuff in vcpu_block()) better (see below).

Thanks for the detailed clarification. Yes, maybe my description above
is a little vague. Yes, the non-blocking vCPU should be put into the
runqueue. I shouldn't use the term "roll back". :)

> 
> > >  - with another hook, perhaps in vcpu_block() ?
> >
> > We could check this in vcpu_block(), however, the logic here is that before
> > vCPU is blocked, we need to change the posted-interrupt descriptor,
> > and during changing it, if 'ON' bit is set, which means VT-d hardware
> > issues a notification event because interrupts from the assigned devices
> > is coming, we don't need to block the vCPU and hence no need to update
> > the PI descriptor in this case.
> >
> Yep, I saw that. But could it be possible to do *everything* related to
> blocking, including the update of the descriptor, in vcpu_block(), if no
> interrupt have been raised yet at that time? I mean, would you, if
> updating the descriptor in there, still get the event that allows you to
> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
> the blocking, no matter whether that resulted in an actual context
> switch already or not?
> 
> I appreciate that this narrows the window for such an event to happen by
> quite a bit, making the logic itself a little less useful (it makes
> things more similar to "regular" blocking vs. event delivery, though,
> AFAICT), but if it's correct, ad if it allows us to save the ugly
> invocation of vcpu_unblock from context switch context, I'd give it a
> try.
> 
> After all, this PI thing requires actions to be taken when a vCPU is
> scheduled or descheduled because of blocking, unblocking and
&g

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-17 Thread George Dunlap
On 09/17/2015 10:38 AM, George Dunlap wrote:
> Is it the case that the interrupt is not actually delivered to the
> processor, but that the pending bit will be set in the pi field, so that
> the interrupt will be delivered the next time the hypervisor returns
> into the guest?
> 
> (I am assuming that is the case, because if the hypervisor *does* get an
> interrupt, then it can just unblock it there.)

Actually, it looks like you *do* in fact get a
pi_notification_interrupt() in this case.  Could we to check to see if
the current vcpu is blocked and unblock it?

I haven't yet decided whether I prefer my original suggestion of
switching the interrupt and putting things on the wake-up list in
vcpu_block(), or of deferring adding things to the wake-up list until
the actual context switch.

 -George


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-17 Thread George Dunlap
On 09/17/2015 10:38 AM, George Dunlap wrote:
> On 09/17/2015 09:48 AM, Dario Faggioli wrote:
>> On Thu, 2015-09-17 at 08:00 +, Wu, Feng wrote:
>>
 -Original Message-
 From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
>>
 So, I guess, first of all, can you confirm whether or not it's exploding
 in debug builds?
>>>
>>> Does the following information in Config.mk mean it is a debug build?
>>>
>>> # A debug build of Xen and tools?
>>> debug ?= y
>>> debug_symbols ?= $(debug)
>>>
>> I think so. But as I said in my other email, I was wrong, and this is
>> probably not an issue.
>>
 And in either case (just tossing out ideas) would it be
 possible to deal with the "interrupt already raised when blocking" case:
>>>
>>> Thanks for the suggestions below!
>>>
>> :-)
>>
  - later in the context switching function ?
>>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi() instead
>>> of calling vcpu_unblock() directly, then when it returns to 
>>> context_switch(),
>>> we can check the flag and don't really block the vCPU. 
>>>
>> Yeah, and that would still be rather hard to understand and maintain,
>> IMO.
>>
>>> But I don't have a clear
>>> picture about how to archive this, here are some questions from me:
>>> - When we are in context_switch(), we have done the following changes to
>>> vcpu's state:
>>> * sd->curr is set to next
>>> * vCPU's running state (both prev and next ) is changed by
>>>   vcpu_runstate_change()
>>> * next->is_running is set to 1
>>> * periodic timer for prev is stopped
>>> * periodic timer for next is setup
>>> ..
>>>
>>> So what point should we perform the action to _unblock_ the vCPU? We
>>> Need to roll back the formal changes to the vCPU's state, right?
>>>
>> Mmm... not really. Not blocking prev does not mean that prev would be
>> kept running on the pCPU, and that's true for your current solution as
>> well! As you say yourself, you're already in the process of switching
>> between prev and next, at a point where it's already a thing that next
>> will be the vCPU that will run. Not blocking means that prev is
>> reinserted to the runqueue, and a new invocation to the scheduler is
>> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
>> __runq_tickle()), but it's only when such new scheduling happens that
>> prev will (potentially) be selected to run again.
>>
>> So, no, unless I'm fully missing your point, there wouldn't be no
>> rollback required. However, I still would like the other solution (doing
>> stuff in vcpu_block()) better (see below).
>>
  - with another hook, perhaps in vcpu_block() ?
>>>
>>> We could check this in vcpu_block(), however, the logic here is that before
>>> vCPU is blocked, we need to change the posted-interrupt descriptor,
>>> and during changing it, if 'ON' bit is set, which means VT-d hardware
>>> issues a notification event because interrupts from the assigned devices
>>> is coming, we don't need to block the vCPU and hence no need to update
>>> the PI descriptor in this case. 
>>>
>> Yep, I saw that. But could it be possible to do *everything* related to
>> blocking, including the update of the descriptor, in vcpu_block(), if no
>> interrupt have been raised yet at that time? I mean, would you, if
>> updating the descriptor in there, still get the event that allows you to
>> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
>> the blocking, no matter whether that resulted in an actual context
>> switch already or not?
>>
>> I appreciate that this narrows the window for such an event to happen by
>> quite a bit, making the logic itself a little less useful (it makes
>> things more similar to "regular" blocking vs. event delivery, though,
>> AFAICT), but if it's correct, ad if it allows us to save the ugly
>> invocation of vcpu_unblock from context switch context, I'd give it a
>> try.
>>
>> After all, this PI thing requires actions to be taken when a vCPU is
>> scheduled or descheduled because of blocking, unblocking and
>> preemptions, and it would seem natural to me to:
>>  - deal with blockings in vcpu_block()
>>  - deal with unblockings in vcpu_wake()
>>  - deal with preemptions in context_switch()
>>
>> This does not mean being able to consolidate some of the cases (like
>> blockings and preemptions, in the current version of the code) were not
>> a nice thing... But we don't want it at all costs . :-)
> 
> So just to clarify the situation...

Er, and to clarify something else -- Technically I'm responding to Dario
here, but my mail is actually addressed to Wu Feng.  This was just a
good point to "put my oar in" to the conversation. :-)

 -George


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-17 Thread Dario Faggioli
On Thu, 2015-09-17 at 08:00 +, Wu, Feng wrote:

> > -Original Message-
> > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]

> > So, I guess, first of all, can you confirm whether or not it's exploding
> > in debug builds?
> 
> Does the following information in Config.mk mean it is a debug build?
> 
> # A debug build of Xen and tools?
> debug ?= y
> debug_symbols ?= $(debug)
> 
I think so. But as I said in my other email, I was wrong, and this is
probably not an issue.

> > And in either case (just tossing out ideas) would it be
> > possible to deal with the "interrupt already raised when blocking" case:
> 
> Thanks for the suggestions below!
> 
:-)

> >  - later in the context switching function ?
> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi() instead
> of calling vcpu_unblock() directly, then when it returns to context_switch(),
> we can check the flag and don't really block the vCPU. 
>
Yeah, and that would still be rather hard to understand and maintain,
IMO.

> But I don't have a clear
> picture about how to archive this, here are some questions from me:
> - When we are in context_switch(), we have done the following changes to
> vcpu's state:
>   * sd->curr is set to next
>   * vCPU's running state (both prev and next ) is changed by
> vcpu_runstate_change()
>   * next->is_running is set to 1
>   * periodic timer for prev is stopped
>   * periodic timer for next is setup
>   ..
> 
> So what point should we perform the action to _unblock_ the vCPU? We
> Need to roll back the formal changes to the vCPU's state, right?
> 
Mmm... not really. Not blocking prev does not mean that prev would be
kept running on the pCPU, and that's true for your current solution as
well! As you say yourself, you're already in the process of switching
between prev and next, at a point where it's already a thing that next
will be the vCPU that will run. Not blocking means that prev is
reinserted to the runqueue, and a new invocation to the scheduler is
(potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
__runq_tickle()), but it's only when such new scheduling happens that
prev will (potentially) be selected to run again.

So, no, unless I'm fully missing your point, there wouldn't be no
rollback required. However, I still would like the other solution (doing
stuff in vcpu_block()) better (see below).

> >  - with another hook, perhaps in vcpu_block() ?
> 
> We could check this in vcpu_block(), however, the logic here is that before
> vCPU is blocked, we need to change the posted-interrupt descriptor,
> and during changing it, if 'ON' bit is set, which means VT-d hardware
> issues a notification event because interrupts from the assigned devices
> is coming, we don't need to block the vCPU and hence no need to update
> the PI descriptor in this case. 
>
Yep, I saw that. But could it be possible to do *everything* related to
blocking, including the update of the descriptor, in vcpu_block(), if no
interrupt have been raised yet at that time? I mean, would you, if
updating the descriptor in there, still get the event that allows you to
call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
the blocking, no matter whether that resulted in an actual context
switch already or not?

I appreciate that this narrows the window for such an event to happen by
quite a bit, making the logic itself a little less useful (it makes
things more similar to "regular" blocking vs. event delivery, though,
AFAICT), but if it's correct, ad if it allows us to save the ugly
invocation of vcpu_unblock from context switch context, I'd give it a
try.

After all, this PI thing requires actions to be taken when a vCPU is
scheduled or descheduled because of blocking, unblocking and
preemptions, and it would seem natural to me to:
 - deal with blockings in vcpu_block()
 - deal with unblockings in vcpu_wake()
 - deal with preemptions in context_switch()

This does not mean being able to consolidate some of the cases (like
blockings and preemptions, in the current version of the code) were not
a nice thing... But we don't want it at all costs . :-)

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-17 Thread George Dunlap
On 09/17/2015 09:48 AM, Dario Faggioli wrote:
> On Thu, 2015-09-17 at 08:00 +, Wu, Feng wrote:
> 
>>> -Original Message-
>>> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> 
>>> So, I guess, first of all, can you confirm whether or not it's exploding
>>> in debug builds?
>>
>> Does the following information in Config.mk mean it is a debug build?
>>
>> # A debug build of Xen and tools?
>> debug ?= y
>> debug_symbols ?= $(debug)
>>
> I think so. But as I said in my other email, I was wrong, and this is
> probably not an issue.
> 
>>> And in either case (just tossing out ideas) would it be
>>> possible to deal with the "interrupt already raised when blocking" case:
>>
>> Thanks for the suggestions below!
>>
> :-)
> 
>>>  - later in the context switching function ?
>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi() instead
>> of calling vcpu_unblock() directly, then when it returns to context_switch(),
>> we can check the flag and don't really block the vCPU. 
>>
> Yeah, and that would still be rather hard to understand and maintain,
> IMO.
> 
>> But I don't have a clear
>> picture about how to archive this, here are some questions from me:
>> - When we are in context_switch(), we have done the following changes to
>> vcpu's state:
>>  * sd->curr is set to next
>>  * vCPU's running state (both prev and next ) is changed by
>>vcpu_runstate_change()
>>  * next->is_running is set to 1
>>  * periodic timer for prev is stopped
>>  * periodic timer for next is setup
>>  ..
>>
>> So what point should we perform the action to _unblock_ the vCPU? We
>> Need to roll back the formal changes to the vCPU's state, right?
>>
> Mmm... not really. Not blocking prev does not mean that prev would be
> kept running on the pCPU, and that's true for your current solution as
> well! As you say yourself, you're already in the process of switching
> between prev and next, at a point where it's already a thing that next
> will be the vCPU that will run. Not blocking means that prev is
> reinserted to the runqueue, and a new invocation to the scheduler is
> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
> __runq_tickle()), but it's only when such new scheduling happens that
> prev will (potentially) be selected to run again.
> 
> So, no, unless I'm fully missing your point, there wouldn't be no
> rollback required. However, I still would like the other solution (doing
> stuff in vcpu_block()) better (see below).
> 
>>>  - with another hook, perhaps in vcpu_block() ?
>>
>> We could check this in vcpu_block(), however, the logic here is that before
>> vCPU is blocked, we need to change the posted-interrupt descriptor,
>> and during changing it, if 'ON' bit is set, which means VT-d hardware
>> issues a notification event because interrupts from the assigned devices
>> is coming, we don't need to block the vCPU and hence no need to update
>> the PI descriptor in this case. 
>>
> Yep, I saw that. But could it be possible to do *everything* related to
> blocking, including the update of the descriptor, in vcpu_block(), if no
> interrupt have been raised yet at that time? I mean, would you, if
> updating the descriptor in there, still get the event that allows you to
> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
> the blocking, no matter whether that resulted in an actual context
> switch already or not?
> 
> I appreciate that this narrows the window for such an event to happen by
> quite a bit, making the logic itself a little less useful (it makes
> things more similar to "regular" blocking vs. event delivery, though,
> AFAICT), but if it's correct, ad if it allows us to save the ugly
> invocation of vcpu_unblock from context switch context, I'd give it a
> try.
> 
> After all, this PI thing requires actions to be taken when a vCPU is
> scheduled or descheduled because of blocking, unblocking and
> preemptions, and it would seem natural to me to:
>  - deal with blockings in vcpu_block()
>  - deal with unblockings in vcpu_wake()
>  - deal with preemptions in context_switch()
> 
> This does not mean being able to consolidate some of the cases (like
> blockings and preemptions, in the current version of the code) were not
> a nice thing... But we don't want it at all costs . :-)

So just to clarify the situation...

If a vcpu configured for the "running" state (i.e., NV set to
"posted_intr_vector", notifications enabled), and an interrupt happens
in the hypervisor -- what happens?

Is it the case that the interrupt is not actually delivered to the
processor, but that the pending bit will be set in the pi field, so that
the interrupt will be delivered the next time the hypervisor returns
into the guest?

(I am assuming that is the case, because if the hypervisor *does* get an
interrupt, then it can just unblock it there.)

This sort of race condition -- where we get an interrupt to wake up a
vcpu as we're blocking -- is 

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-17 Thread Dario Faggioli
On Thu, 2015-09-17 at 12:44 +0100, George Dunlap wrote:
> On 09/17/2015 10:38 AM, George Dunlap wrote:
> > Is it the case that the interrupt is not actually delivered to the
> > processor, but that the pending bit will be set in the pi field, so that
> > the interrupt will be delivered the next time the hypervisor returns
> > into the guest?
> > 
> > (I am assuming that is the case, because if the hypervisor *does* get an
> > interrupt, then it can just unblock it there.)
> 
> Actually, it looks like you *do* in fact get a
> pi_notification_interrupt() in this case.  Could we to check to see if
> the current vcpu is blocked and unblock it?
> 
> I haven't yet decided whether I prefer my original suggestion of
> switching the interrupt and putting things on the wake-up list in
> vcpu_block(), or of deferring adding things to the wake-up list until
> the actual context switch.
> 
Sorry but I don't get what you mean with the latter.

I particular, I don't think I understand what you mean with and how it
would work to "defer[ring] adding things to the wake-up list until
actual context switch"... In what case would you defer stuff to context
switch?

Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-17 Thread Dario Faggioli
On Thu, 2015-09-17 at 15:30 +0100, George Dunlap wrote:
> On 09/17/2015 01:40 PM, Dario Faggioli wrote:

> >> I haven't yet decided whether I prefer my original suggestion of
> >> switching the interrupt and putting things on the wake-up list in
> >> vcpu_block(), or of deferring adding things to the wake-up list until
> >> the actual context switch.
> >>
> > Sorry but I don't get what you mean with the latter.
> > 
> > I particular, I don't think I understand what you mean with and how it
> > would work to "defer[ring] adding things to the wake-up list until
> > actual context switch"... In what case would you defer stuff to context
> > switch?
> 
> So one option is to do the "blocking" stuff in an arch-specific call
> from vcpu_block():
> 
> vcpu_block()
>  set(_VPF_blocked)
>  v->arch.block()
>   - Add v to pcpu.pi_blocked_vcpu
>   - NV => pi_wakeup_vector
>  local_events_need_delivery()
>hvm_vcpu_has_pending_irq()
> 
>  ...
>  context_switch(): nothing
> 
> The other is to do the "blocking" stuff in the context switch (similar
> to what's done now):
> 
> vcpu_block()
>   set(_VPF_blocked)
>   local_events_need_delivery()
> hvm_vcpu_has_pending_irq()
>   ...
> context_switch
>v->arch.block()
> - Add v to pcpu.pi_blocked_vcpu
> - NV => pi_wakeup_vector
> 
Ok, thanks for elaborating.

> If we do it the first way, and an interrupt comes in before the context
> switch is finished, it will call pi_wakeup_vector, which will DTRT --
> take v off the pi_blocked_vcpu list and call vcpu_unblock() (which in
> turn will set NV back to posted_intr_vector).
> 
> If we do it the second way, and an interrupt comes in before the context
> switch is finished, it will call posted_intr_vector.  We can, at that
> point, check to see if the current vcpu is marked as blocked.  If it is,
> we can call vcpu_unblock() without having to modify NV or worry about
> adding / removing the vcpu from the pi_blocked_vcpu list.
> 
Right.

> The thing I like about the first one is that it makes PI blocking the
> same as normal blocking -- everything happens in the same place; so the
> code is cleaner and easier to understand.
> 
Indeed.

> The thing I like about the second one is that it cleverly avoids having
> to do all the work of adding the vcpu to the list and then searching to
> remove it if the vcpu in question gets woken up on the way to being
> blocked.  So the code may end up being faster for workloads where that
> happens frequently.
> 
Maybe some instrumentation to figure out how frequent this is, at least
in a couple of (thought to be) common workloads can be added, and a few
test performed? Feng?

One thing that may be worth giving some thinking at is whether
interrupts are enabled or not. I mean, during vcpu_block()
SCHEDULE_SOFTIRQ is raised, for invoking the scheduler. Then (at least)
during schedule(), IRQs are disabled. They're re-enabled near the end of
schedule(), and re-disabled for the actual context switch ( ~= around
__context_switch()).

My point being that IRQs are going to be disabled for a significant
amount of time, between vcpu_block() and the actual context being
switched. And solution 2 requires IRQs to be enabled in order to avoid a
potentially pointless NV update, doesn't it? If yes, that may
(negatively) affect the probability of being able to actually benefit
from the optimization...

Anyway, I'm not sure. I think my gut feelings are in favour of solution
1, but, really, it's hard to tell without actually trying. 

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-17 Thread George Dunlap
On 09/17/2015 01:40 PM, Dario Faggioli wrote:
> On Thu, 2015-09-17 at 12:44 +0100, George Dunlap wrote:
>> On 09/17/2015 10:38 AM, George Dunlap wrote:
>>> Is it the case that the interrupt is not actually delivered to the
>>> processor, but that the pending bit will be set in the pi field, so that
>>> the interrupt will be delivered the next time the hypervisor returns
>>> into the guest?
>>>
>>> (I am assuming that is the case, because if the hypervisor *does* get an
>>> interrupt, then it can just unblock it there.)
>>
>> Actually, it looks like you *do* in fact get a
>> pi_notification_interrupt() in this case.  Could we to check to see if
>> the current vcpu is blocked and unblock it?
>>
>> I haven't yet decided whether I prefer my original suggestion of
>> switching the interrupt and putting things on the wake-up list in
>> vcpu_block(), or of deferring adding things to the wake-up list until
>> the actual context switch.
>>
> Sorry but I don't get what you mean with the latter.
> 
> I particular, I don't think I understand what you mean with and how it
> would work to "defer[ring] adding things to the wake-up list until
> actual context switch"... In what case would you defer stuff to context
> switch?

So one option is to do the "blocking" stuff in an arch-specific call
from vcpu_block():

vcpu_block()
 set(_VPF_blocked)
 v->arch.block()
  - Add v to pcpu.pi_blocked_vcpu
  - NV => pi_wakeup_vector
 local_events_need_delivery()
   hvm_vcpu_has_pending_irq()

 ...
 context_switch(): nothing

The other is to do the "blocking" stuff in the context switch (similar
to what's done now):

vcpu_block()
  set(_VPF_blocked)
  local_events_need_delivery()
hvm_vcpu_has_pending_irq()
  ...
context_switch
   v->arch.block()
- Add v to pcpu.pi_blocked_vcpu
- NV => pi_wakeup_vector

If we do it the first way, and an interrupt comes in before the context
switch is finished, it will call pi_wakeup_vector, which will DTRT --
take v off the pi_blocked_vcpu list and call vcpu_unblock() (which in
turn will set NV back to posted_intr_vector).

If we do it the second way, and an interrupt comes in before the context
switch is finished, it will call posted_intr_vector.  We can, at that
point, check to see if the current vcpu is marked as blocked.  If it is,
we can call vcpu_unblock() without having to modify NV or worry about
adding / removing the vcpu from the pi_blocked_vcpu list.

The thing I like about the first one is that it makes PI blocking the
same as normal blocking -- everything happens in the same place; so the
code is cleaner and easier to understand.

The thing I like about the second one is that it cleverly avoids having
to do all the work of adding the vcpu to the list and then searching to
remove it if the vcpu in question gets woken up on the way to being
blocked.  So the code may end up being faster for workloads where that
happens frequently.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-16 Thread Dario Faggioli
On Fri, 2015-09-11 at 16:29 +0800, Feng Wu wrote:

> CC: Keir Fraser 
> CC: Jan Beulich 
> CC: Andrew Cooper 
> CC: Kevin Tian 
> CC: George Dunlap 
> CC: Dario Faggioli 
> Sugguested-by: Dario Faggioli 
> Signed-off-by: Feng Wu 
> Reviewed-by: Dario Faggioli 
>
Ehm, just one thing, for now...

> ---
> v7:
> - Merge [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted 
> interrupts
>   and "[PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked"
>   into this patch, so it is self-contained and more convenient
>   for code review.
> - Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
> - Coding style
> - Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
> - Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
> - Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
> - Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
> - Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
> - Use 'spin_lock' and 'spin_unlock' when the interrupt has been
>   already disabled.
> - Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
> - Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
> - Call .pi_ctxt_switch_to() __context_switch() instead of directly
>   calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
> - Make .pi_block_cpu unsigned int
> - Use list_del() instead of list_del_init()
> - Coding style
> 
...With soo many changes having done to this patch, I'd say you should
have dropped any Acked/Reviewed-by tag!

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-16 Thread Dario Faggioli
On Wed, 2015-09-16 at 19:18 +0200, Dario Faggioli wrote:
> On Fri, 2015-09-11 at 16:29 +0800, Feng Wu wrote:
> > One remaining item:
> > Jan has concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi(),
> > need Dario or George's input about this.
> > 
> Hi,
> 
> Sorry for the delay in replying, I was on PTO for a few time.
> 
> Coming to the issue, well, it's a though call.
> 
> First of all, Feng, have you tested this with a debug build of Xen? I'm
> asking because it looks to me that you're ending up calling vcpu_wake()
> with IRQ disabled which, if my brain is not too rusty after a few weeks
> of vacation, should result in check_lock() (in xen/common/spinlock.c)
> complaining, doesn't it?
> 
Mmm.. My bad (so, yes, I'm a bit rusty, I guess). check_lock(), in case
of spin_lock_irqsave(), is called _after_ local_irq_disable(), inside
_spin_lock_irqsave(), so this above should not be an issue. Sorry for
the noise. :-(

The rest of what's said below, and the fact that I agree with George's
and Jan's concerns, still stand. :-)

So, in particular...

> In fact, in principle this is not too much different from what happens
> in other places. More specifically, what we have is a vcpu being
> re-inserted in  a runqueue, and the need for re-running the scheduler on
> a(some) PCPU(s) is evaluated. That is similar to what happens in Credit2
> (and in RTDS) in csched2_context_saved(), which is called from within
> context_saved(), still from the context switch code (if
> __CSFLAG_delayed_runq_add is true).
> 
> So it's not the thing per se that is that terrible, IMO. The differences
> between that and your case are:
>  - in the Credit2 case, it happens later down in the context switch
>path (which would look already better to me) and, more important,
>with IRQs already re-enabled;
>  - in the Credit2 case, the effect that something like that can have on 
>the scheduler is much more evident, as it happens inside a scheduler
>hook, rather than buried down in arch specific code, which makes me a
>lot less concerned about the possibility of latent issues Jan was
>hinting at, with which I concur.
> 
> So, I guess, first of all, can you confirm whether or not it's exploding
> in debug builds? And in either case (just tossing out ideas) would it be
> possible to deal with the "interrupt already raised when blocking" case:
>  - later in the context switching function ?
>  - with another hook, perhaps in vcpu_block() ?
> 
... let me/us know what you think about these alternatives.

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling

2015-09-16 Thread Dario Faggioli
On Fri, 2015-09-11 at 16:29 +0800, Feng Wu wrote:
> This patch includes the following aspects:
> - Handling logic when vCPU is blocked:
> * Add a global vector to wake up the blocked vCPU
>   when an interrupt is being posted to it (This part
>   was sugguested by Yang Zhang ).
> * Define two per-cpu variables:
>   1. pi_blocked_vcpu:
> A list storing the vCPUs which were blocked
> on this pCPU.
> 
>   2. pi_blocked_vcpu_lock:
> The spinlock to protect pi_blocked_vcpu.
> 
> - Add some scheduler hooks, this part was suggested
>   by Dario Faggioli .
> * vmx_pre_ctx_switch_pi()
>   It is called before context switch, we update the
>   posted interrupt descriptor when the vCPU is preempted,
>   go to sleep, or is blocked.
> 
> * vmx_post_ctx_switch_pi()
>   It is called after context switch, we update the posted
>   interrupt descriptor when the vCPU is going to run.
> 
> * arch_vcpu_wake_prepare()
>   It will be called when waking up the vCPU, we update
>   the posted interrupt descriptor when the vCPU is
>   unblocked.
> 
> CC: Keir Fraser 
> CC: Jan Beulich 
> CC: Andrew Cooper 
> CC: Kevin Tian 
> CC: George Dunlap 
> CC: Dario Faggioli 
> Sugguested-by: Dario Faggioli 
> Signed-off-by: Feng Wu 
> Reviewed-by: Dario Faggioli 
> ---
> v7:
> - Merge [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted 
> interrupts
>   and "[PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked"
>   into this patch, so it is self-contained and more convenient
>   for code review.
> - Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
> - Coding style
> - Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
> - Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
> - Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
> - Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
> - Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
> - Use 'spin_lock' and 'spin_unlock' when the interrupt has been
>   already disabled.
> - Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
> - Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
> - Call .pi_ctxt_switch_to() __context_switch() instead of directly
>   calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
> - Make .pi_block_cpu unsigned int
> - Use list_del() instead of list_del_init()
> - Coding style
> 
> One remaining item:
> Jan has concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi(),
> need Dario or George's input about this.
> 
Hi,

Sorry for the delay in replying, I was on PTO for a few time.

Coming to the issue, well, it's a though call.

First of all, Feng, have you tested this with a debug build of Xen? I'm
asking because it looks to me that you're ending up calling vcpu_wake()
with IRQ disabled which, if my brain is not too rusty after a few weeks
of vacation, should result in check_lock() (in xen/common/spinlock.c)
complaining, doesn't it?

In fact, in principle this is not too much different from what happens
in other places. More specifically, what we have is a vcpu being
re-inserted in  a runqueue, and the need for re-running the scheduler on
a(some) PCPU(s) is evaluated. That is similar to what happens in Credit2
(and in RTDS) in csched2_context_saved(), which is called from within
context_saved(), still from the context switch code (if
__CSFLAG_delayed_runq_add is true).

So it's not the thing per se that is that terrible, IMO. The differences
between that and your case are:
 - in the Credit2 case, it happens later down in the context switch
   path (which would look already better to me) and, more important,
   with IRQs already re-enabled;
 - in the Credit2 case, the effect that something like that can have on 
   the scheduler is much more evident, as it happens inside a scheduler
   hook, rather than buried down in arch specific code, which makes me a
   lot less concerned about the possibility of latent issues Jan was
   hinting at, with which I concur.

So, I guess, first of all, can you confirm whether or not it's exploding
in debug builds? And in either case (just tossing out ideas) would it be
possible to deal with the "interrupt already raised when blocking" case:
 - later in the context switching function ?
 - with another hook, perhaps in vcpu_block() ?

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list