Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-21 Thread Wu, Feng
> >> Thanks for the comments!
> >>
> >> From my understanding, __sync_local_execstate() can only get called
> >> in the following two cases:
> >> #1) this_cpu(curr_vcpu) == current, in this case, __context_switch() is
> >> not called.
> >> #2) this_cpu(curr_vcpu) != current, and current == idle_vcpu, that means
> >> we just switched from a non-idle vCPU to idle vCPU, so here we need to
> >> call __context_switch() to copy things to the original vcpu struct.
> >>
> >> Please correct me if the above understanding is wrong or incomplete?
> >
> > Hi George / Dario,
> >
> > Could you please confirm the above understanding is correct? (In fact, it is
> > related to lazy context switch, right?) If so I can continue with the
> > pi_context_switch() way George suggested.
> 
> Yes, that's the general idea.  Normally, you can access the registers of
> a non-running vcpu from the vcpu struct.  But if we've done a lazy
> context switch, that's not true -- so to access those registers properly
> we need to go through and do the full context switch *on that pcpu*.
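The lazy-switch rule George describes can be sketched as a standalone model. This is a hypothetical simplification of the decision made in xen/arch/x86/domain.c:context_switch(); the real condition also checks cpu_online(), and the struct here is illustrative, not Xen's actual vcpu layout:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for Xen's vcpu; not the real structure. */
struct vcpu {
    bool is_idle;   /* true for the per-pcpu idle vcpu */
    int  id;
};

/*
 * A full __context_switch() (saving prev's register state into its
 * vcpu struct) is skipped in two cases: when we merely switch to the
 * idle vcpu (the "lazy" switch), or when we switch back to the vcpu
 * whose state is still loaded on this pcpu.
 */
static bool full_switch_needed(const struct vcpu *curr_state_owner,
                               const struct vcpu *next)
{
    if ( curr_state_owner == next )  /* state already on this pcpu */
        return false;
    if ( next->is_idle )             /* lazy switch into idle */
        return false;
    return true;                     /* full state save/restore required */
}
```

It is exactly the "full switch needed" case that __sync_local_execstate() later forces when a non-running vcpu's registers must be read from its struct.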

Thanks for the clarification!

Thanks,
Feng
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-21 Thread George Dunlap
On 09/21/2015 06:07 AM, Wu, Feng wrote:
> 
> 
>> -Original Message-
>> From: Wu, Feng
>> Sent: Thursday, September 17, 2015 2:16 PM
>> To: George Dunlap; Jan Beulich
>> Cc: Tian, Kevin; Keir Fraser; Andrew Cooper; Dario Faggioli;
>> xen-devel@lists.xen.org; Wu, Feng
>> Subject: RE: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for
>> VT-d posted interrupts
>>
>>
>>
>>> -Original Message-
>>> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
>> George
>>> Dunlap
>>> Sent: Thursday, September 17, 2015 12:57 AM
>>> To: Jan Beulich
>>> Cc: Wu, Feng; Tian, Kevin; Keir Fraser; Andrew Cooper; Dario Faggioli;
>>> xen-devel@lists.xen.org
>>> Subject: Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for
>>> VT-d posted interrupts
>>>
>>> On Mon, Sep 7, 2015 at 1:54 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>>>>> On 25.08.15 at 03:57, <feng...@intel.com> wrote:
>>>>> --- a/xen/arch/x86/domain.c
>>>>> +++ b/xen/arch/x86/domain.c
>>>>> @@ -1573,6 +1573,22 @@ static void __context_switch(void)
>>>>>  per_cpu(curr_vcpu, cpu) = n;
>>>>>  }
>>>>>
>>>>> +static inline void pi_ctxt_switch_from(struct vcpu *prev)
>>>>> +{
>>>>> +/*
>>>>> + * When switching from non-idle to idle, we only do a lazy context
>>> switch.
>>>>> + * However, in order for posted interrupt (if available and enabled)
>> to
>>>>> + * work properly, we at least need to update the descriptors.
>>>>> + */
>>>>> +if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
>>>>> +prev->arch.pi_ctxt_switch_from(prev);
>>>>> +}
>>>>> +
>>>>> +static inline void pi_ctxt_switch_to(struct vcpu *next)
>>>>> +{
>>>>> +if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
>>>>> +next->arch.pi_ctxt_switch_to(next);
>>>>> +}
>>>>>
>>>>>  void context_switch(struct vcpu *prev, struct vcpu *next)
>>>>>  {
>>>>> @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct
>>> vcpu *next)
>>>>>
>>>>>  set_current(next);
>>>>>
>>>>> +pi_ctxt_switch_from(prev);
>>>>> +
>>>>>  if ( (per_cpu(curr_vcpu, cpu) == next) ||
>>>>>   (is_idle_domain(nextd) && cpu_online(cpu)) )
>>>>>  {
>>>>> +pi_ctxt_switch_to(next);
>>>>>  local_irq_enable();
>>>>
>>>> This placement, if really intended that way, needs explanation (in a
>>>> comment) and perhaps even renaming of the involved symbols, as
>>>> looking at it from a general perspective it seems wrong (with
>>>> pi_ctxt_switch_to() excluding idle vCPU-s it effectively means you
>>>> want this only when switching back to what got switched out lazily
>>>> before, i.e. this would be not something to take place on an arbitrary
>>>> context switch). As to possible alternative names - maybe make the
>>>> hooks ctxt_switch_prepare() and ctxt_switch_cancel()?
>>>
>>> Why on earth is this more clear than what he had before?
>>>
>>> In the first call, he's not "preparing" anything -- he's actually
>>> switching the PI context out for prev.  And in the second call, he's
>>> not "cancelling" anything -- he's actually switching the PI context in
>>> for next.  The names you suggest are actively confusing, not helpful.
>>>
>>> But before talking about how to make things more clear, one side
>>> question -- do we need to actually call pi_ctxt_switch_to() in
>>> __context_switch()?
>>>
>>> The only other place __context_switch() is called is
>>> from __sync_local_execstate().  But the only reason that needs to be
>>> called is because sometimes we *don't* call __context_switch(), and so
>>> there are things on the cpu that aren't copied into the vcpu struct.
>>
>> Thanks for the comments!
>>
>> From my understanding, __sync_local_execstate() can only get called
>> in the following two cases:
>> #1) this_cpu(curr_vcpu) == current, in this case, __context_switch() is
>> not called.
>> #2) this_cpu(curr_vcpu) != current, and current == idle_vcpu, that means
>> we just switched from a non-idle vCPU to idle vCPU, so here we need to
>> call __context_switch() to copy things to the original vcpu struct.

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-21 Thread Jan Beulich
>>> On 21.09.15 at 11:28,  wrote:
> On 09/21/2015 09:23 AM, Jan Beulich wrote:
> On 16.09.15 at 18:56,  wrote:
>>> On Mon, Sep 7, 2015 at 1:54 PM, Jan Beulich  wrote:
>>> On 25.08.15 at 03:57,  wrote:
> @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct vcpu 
> *next)
>
>  set_current(next);
>
> +pi_ctxt_switch_from(prev);
> +
>  if ( (per_cpu(curr_vcpu, cpu) == next) ||
>   (is_idle_domain(nextd) && cpu_online(cpu)) )
>  {
> +pi_ctxt_switch_to(next);
>  local_irq_enable();

 This placement, if really intended that way, needs explanation (in a
 comment) and perhaps even renaming of the involved symbols, as
 looking at it from a general perspective it seems wrong (with
 pi_ctxt_switch_to() excluding idle vCPU-s it effectively means you
 want this only when switching back to what got switched out lazily
 before, i.e. this would be not something to take place on an arbitrary
 context switch). As to possible alternative names - maybe make the
 hooks ctxt_switch_prepare() and ctxt_switch_cancel()?
>>>
>>> Why on earth is this more clear than what he had before?
>>>
>>> In the first call, he's not "preparing" anything -- he's actually
>>> switching the PI context out for prev.  And in the second call, he's
>>> not "cancelling" anything -- he's actually switching the PI context in
>>> for next.  The names you suggest are actively confusing, not helpful.
>> 
>> While I think later discussion on this thread moved in a good direction,
>> I still think I should reply here (even if late): To me, the use of
>> pi_ctxt_switch_to() in the patch fragment still seen above is very
>> much the cancellation of the immediately preceding pi_ctxt_switch_from(),
>> as it's the "we don't want to do anything else" path that it gets put
>> into.
> 
> Either we have different understandings about what the code does, or I
> don't understand what you're saying here.
> 
> The codepath in question will only be called if we're switching *into*
> or *out of* the "lazy context switch" -- i.e., switching from a vcpu to
> the idle vcpu, but not saving or restoring state.

Oh, I'm sorry - you're right.

Jan




Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-21 Thread George Dunlap
On 09/21/2015 09:23 AM, Jan Beulich wrote:
 On 16.09.15 at 18:56,  wrote:
>> On Mon, Sep 7, 2015 at 1:54 PM, Jan Beulich  wrote:
>> On 25.08.15 at 03:57,  wrote:
 --- a/xen/arch/x86/domain.c
 +++ b/xen/arch/x86/domain.c
 @@ -1573,6 +1573,22 @@ static void __context_switch(void)
  per_cpu(curr_vcpu, cpu) = n;
  }

 +static inline void pi_ctxt_switch_from(struct vcpu *prev)
 +{
 +/*
 + * When switching from non-idle to idle, we only do a lazy context 
 switch.
 + * However, in order for posted interrupt (if available and enabled) 
 to
 + * work properly, we at least need to update the descriptors.
 + */
 +if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
 +prev->arch.pi_ctxt_switch_from(prev);
 +}
 +
 +static inline void pi_ctxt_switch_to(struct vcpu *next)
 +{
 +if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
 +next->arch.pi_ctxt_switch_to(next);
 +}

  void context_switch(struct vcpu *prev, struct vcpu *next)
  {
 @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct vcpu 
 *next)

  set_current(next);

 +pi_ctxt_switch_from(prev);
 +
  if ( (per_cpu(curr_vcpu, cpu) == next) ||
   (is_idle_domain(nextd) && cpu_online(cpu)) )
  {
 +pi_ctxt_switch_to(next);
  local_irq_enable();
>>>
>>> This placement, if really intended that way, needs explanation (in a
>>> comment) and perhaps even renaming of the involved symbols, as
>>> looking at it from a general perspective it seems wrong (with
>>> pi_ctxt_switch_to() excluding idle vCPU-s it effectively means you
>>> want this only when switching back to what got switched out lazily
>>> before, i.e. this would be not something to take place on an arbitrary
>>> context switch). As to possible alternative names - maybe make the
>>> hooks ctxt_switch_prepare() and ctxt_switch_cancel()?
>>
>> Why on earth is this more clear than what he had before?
>>
>> In the first call, he's not "preparing" anything -- he's actually
>> switching the PI context out for prev.  And in the second call, he's
>> not "cancelling" anything -- he's actually switching the PI context in
>> for next.  The names you suggest are actively confusing, not helpful.
> 
> While I think later discussion on this thread moved in a good direction,
> I still think I should reply here (even if late): To me, the use of
> pi_ctxt_switch_to() in the patch fragment still seen above is very
> much the cancellation of the immediately preceding pi_ctxt_switch_from(),
> as it's the "we don't want to do anything else" path that it gets put
> into.

Either we have different understandings about what the code does, or I
don't understand what you're saying here.

The codepath in question will only be called if we're switching *into*
or *out of* the "lazy context switch" -- i.e., switching from a vcpu to
the idle vcpu, but not saving or restoring state.

For the "switch into" case:
* prev will be a domain vcpu, next will be the idle vcpu
* pi_ctxt_switch_from(prev) will either change NV to 'pi_wake' or set SN
for prev's PI vectors (depending on whether prev is blocked or offline)
* pi_ctxt_switch_to(next) will do nothing, since next is the idle vcpu

For the "switching out" case:
* prev will be the idle vcpu, next will be a domain vcpu
* pi_ctxt_switch_from(prev) will do nothing, since prev is the idle vcpu
* pi_ctxt_switch_to(next) will clear SN and change the vector to
'posted_intr'

So there is no situation in which pi_ctxt_switch_to() "cancels" the
immediately preceding pi_ctxt_switch_from().  Either the preceding
from() did something, in which case to() does nothing; or the preceding
from() did nothing, in which case to() does something.
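The two cases above can be modeled in a small standalone sketch. This is hypothetical code: the struct layout, the string-valued notification vector, and the blocked/else distinction are simplifications of the VMX posted-interrupt descriptor handling being discussed, not the actual Xen API:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative model of a posted-interrupt descriptor. */
struct pi_desc {
    bool sn;            /* suppress-notification bit */
    const char *nv;     /* notification vector: "posted_intr" or "pi_wake" */
};

struct vcpu {
    bool is_idle;
    bool blocked;
    struct pi_desc pi;
};

/* "Switch into idle" case: prev is a domain vcpu, so act on it. */
static void pi_ctxt_switch_from(struct vcpu *prev)
{
    if ( prev->is_idle )
        return;                      /* nothing to do for the idle vcpu */
    if ( prev->blocked )
        prev->pi.nv = "pi_wake";     /* wake the vcpu when an interrupt posts */
    else
        prev->pi.sn = true;          /* suppress notifications while off-cpu */
}

/* "Switch out of idle" case: next is a domain vcpu, so restore it. */
static void pi_ctxt_switch_to(struct vcpu *next)
{
    if ( next->is_idle )
        return;
    next->pi.sn = false;
    next->pi.nv = "posted_intr";
}
```

Running both hooks for either direction shows the asymmetry George describes: exactly one of the two does any work on each lazy switch.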

 -George



Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-21 Thread Jan Beulich
>>> On 16.09.15 at 18:56,  wrote:
> On Mon, Sep 7, 2015 at 1:54 PM, Jan Beulich  wrote:
> On 25.08.15 at 03:57,  wrote:
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -1573,6 +1573,22 @@ static void __context_switch(void)
>>>  per_cpu(curr_vcpu, cpu) = n;
>>>  }
>>>
>>> +static inline void pi_ctxt_switch_from(struct vcpu *prev)
>>> +{
>>> +/*
>>> + * When switching from non-idle to idle, we only do a lazy context 
>>> switch.
>>> + * However, in order for posted interrupt (if available and enabled) to
>>> + * work properly, we at least need to update the descriptors.
>>> + */
>>> +if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
>>> +prev->arch.pi_ctxt_switch_from(prev);
>>> +}
>>> +
>>> +static inline void pi_ctxt_switch_to(struct vcpu *next)
>>> +{
>>> +if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
>>> +next->arch.pi_ctxt_switch_to(next);
>>> +}
>>>
>>>  void context_switch(struct vcpu *prev, struct vcpu *next)
>>>  {
>>> @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct vcpu 
>>> *next)
>>>
>>>  set_current(next);
>>>
>>> +pi_ctxt_switch_from(prev);
>>> +
>>>  if ( (per_cpu(curr_vcpu, cpu) == next) ||
>>>   (is_idle_domain(nextd) && cpu_online(cpu)) )
>>>  {
>>> +pi_ctxt_switch_to(next);
>>>  local_irq_enable();
>>
>> This placement, if really intended that way, needs explanation (in a
>> comment) and perhaps even renaming of the involved symbols, as
>> looking at it from a general perspective it seems wrong (with
>> pi_ctxt_switch_to() excluding idle vCPU-s it effectively means you
>> want this only when switching back to what got switched out lazily
>> before, i.e. this would be not something to take place on an arbitrary
>> context switch). As to possible alternative names - maybe make the
>> hooks ctxt_switch_prepare() and ctxt_switch_cancel()?
> 
> Why on earth is this more clear than what he had before?
> 
> In the first call, he's not "preparing" anything -- he's actually
> switching the PI context out for prev.  And in the second call, he's
> not "cancelling" anything -- he's actually switching the PI context in
> for next.  The names you suggest are actively confusing, not helpful.

While I think later discussion on this thread moved in a good direction,
I still think I should reply here (even if late): To me, the use of
pi_ctxt_switch_to() in the patch fragment still seen above is very
much the cancellation of the immediately preceding pi_ctxt_switch_from(),
as it's the "we don't want to do anything else" path that it gets put
into.

Jan




Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-20 Thread Wu, Feng


> -Original Message-
> From: Wu, Feng
> Sent: Thursday, September 17, 2015 2:16 PM
> To: George Dunlap; Jan Beulich
> Cc: Tian, Kevin; Keir Fraser; Andrew Cooper; Dario Faggioli;
> xen-devel@lists.xen.org; Wu, Feng
> Subject: RE: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for
> VT-d posted interrupts
> 
> 
> 
> > -Original Message-
> > From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> George
> > Dunlap
> > Sent: Thursday, September 17, 2015 12:57 AM
> > To: Jan Beulich
> > Cc: Wu, Feng; Tian, Kevin; Keir Fraser; Andrew Cooper; Dario Faggioli;
> > xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for
> > VT-d posted interrupts
> >
> > On Mon, Sep 7, 2015 at 1:54 PM, Jan Beulich <jbeul...@suse.com> wrote:
> > >>>> On 25.08.15 at 03:57, <feng...@intel.com> wrote:
> > >> --- a/xen/arch/x86/domain.c
> > >> +++ b/xen/arch/x86/domain.c
> > >> @@ -1573,6 +1573,22 @@ static void __context_switch(void)
> > >>  per_cpu(curr_vcpu, cpu) = n;
> > >>  }
> > >>
> > >> +static inline void pi_ctxt_switch_from(struct vcpu *prev)
> > >> +{
> > >> +/*
> > >> + * When switching from non-idle to idle, we only do a lazy context
> > switch.
> > >> + * However, in order for posted interrupt (if available and enabled)
> to
> > >> + * work properly, we at least need to update the descriptors.
> > >> + */
> > >> +if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
> > >> +prev->arch.pi_ctxt_switch_from(prev);
> > >> +}
> > >> +
> > >> +static inline void pi_ctxt_switch_to(struct vcpu *next)
> > >> +{
> > >> +if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
> > >> +next->arch.pi_ctxt_switch_to(next);
> > >> +}
> > >>
> > >>  void context_switch(struct vcpu *prev, struct vcpu *next)
> > >>  {
> > >> @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct
> > vcpu *next)
> > >>
> > >>  set_current(next);
> > >>
> > >> +pi_ctxt_switch_from(prev);
> > >> +
> > >>  if ( (per_cpu(curr_vcpu, cpu) == next) ||
> > >>   (is_idle_domain(nextd) && cpu_online(cpu)) )
> > >>  {
> > >> +pi_ctxt_switch_to(next);
> > >>  local_irq_enable();
> > >
> > > This placement, if really intended that way, needs explanation (in a
> > > comment) and perhaps even renaming of the involved symbols, as
> > > looking at it from a general perspective it seems wrong (with
> > > pi_ctxt_switch_to() excluding idle vCPU-s it effectively means you
> > > want this only when switching back to what got switched out lazily
> > > before, i.e. this would be not something to take place on an arbitrary
> > > context switch). As to possible alternative names - maybe make the
> > > hooks ctxt_switch_prepare() and ctxt_switch_cancel()?
> >
> > Why on earth is this more clear than what he had before?
> >
> > In the first call, he's not "preparing" anything -- he's actually
> > switching the PI context out for prev.  And in the second call, he's
> > not "cancelling" anything -- he's actually switching the PI context in
> > for next.  The names you suggest are actively confusing, not helpful.
> >
> > But before talking about how to make things more clear, one side
> > question -- do we need to actually call pi_ctxt_switch_to() in
> > __context_switch()?
> >
> > The only other place __context_switch() is called is
> > from __sync_local_execstate().  But the only reason that needs to be
> > called is because sometimes we *don't* call __context_switch(), and so
> > there are things on the cpu that aren't copied into the vcpu struct.
> 
> Thanks for the comments!
> 
> From my understanding, __sync_local_execstate() can only get called
> in the following two cases:
> #1) this_cpu(curr_vcpu) == current, in this case, __context_switch() is
> not called.
> #2) this_cpu(curr_vcpu) != current, and current == idle_vcpu, that means
> we just switched from a non-idle vCPU to idle vCPU, so here we need to
> call __context_switch() to copy things to the original vcpu struct.
> 
> Please correct me if the above understanding is wrong or incomplete?

Hi George / Dario,

Could you please confirm the above understanding is correct? (In fact, it is
related to lazy context switch, right?) If so I can continue with the
pi_context_switch() way George suggested.

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-17 Thread Wu, Feng


> -Original Message-
> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of George
> Dunlap
> Sent: Thursday, September 17, 2015 12:57 AM
> To: Jan Beulich
> Cc: Wu, Feng; Tian, Kevin; Keir Fraser; Andrew Cooper; Dario Faggioli;
> xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for
> VT-d posted interrupts
> 
> On Mon, Sep 7, 2015 at 1:54 PM, Jan Beulich <jbeul...@suse.com> wrote:
> >>>> On 25.08.15 at 03:57, <feng...@intel.com> wrote:
> >> --- a/xen/arch/x86/domain.c
> >> +++ b/xen/arch/x86/domain.c
> >> @@ -1573,6 +1573,22 @@ static void __context_switch(void)
> >>  per_cpu(curr_vcpu, cpu) = n;
> >>  }
> >>
> >> +static inline void pi_ctxt_switch_from(struct vcpu *prev)
> >> +{
> >> +/*
> >> + * When switching from non-idle to idle, we only do a lazy context
> switch.
> >> + * However, in order for posted interrupt (if available and enabled) 
> >> to
> >> + * work properly, we at least need to update the descriptors.
> >> + */
> >> +if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
> >> +prev->arch.pi_ctxt_switch_from(prev);
> >> +}
> >> +
> >> +static inline void pi_ctxt_switch_to(struct vcpu *next)
> >> +{
> >> +if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
> >> +next->arch.pi_ctxt_switch_to(next);
> >> +}
> >>
> >>  void context_switch(struct vcpu *prev, struct vcpu *next)
> >>  {
> >> @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct
> vcpu *next)
> >>
> >>  set_current(next);
> >>
> >> +pi_ctxt_switch_from(prev);
> >> +
> >>  if ( (per_cpu(curr_vcpu, cpu) == next) ||
> >>   (is_idle_domain(nextd) && cpu_online(cpu)) )
> >>  {
> >> +pi_ctxt_switch_to(next);
> >>  local_irq_enable();
> >
> > This placement, if really intended that way, needs explanation (in a
> > comment) and perhaps even renaming of the involved symbols, as
> > looking at it from a general perspective it seems wrong (with
> > pi_ctxt_switch_to() excluding idle vCPU-s it effectively means you
> > want this only when switching back to what got switched out lazily
> > before, i.e. this would be not something to take place on an arbitrary
> > context switch). As to possible alternative names - maybe make the
> > hooks ctxt_switch_prepare() and ctxt_switch_cancel()?
> 
> Why on earth is this more clear than what he had before?
> 
> In the first call, he's not "preparing" anything -- he's actually
> switching the PI context out for prev.  And in the second call, he's
> not "cancelling" anything -- he's actually switching the PI context in
> for next.  The names you suggest are actively confusing, not helpful.
> 
> But before talking about how to make things more clear, one side
> question -- do we need to actually call pi_ctxt_switch_to() in
> __context_switch()?
> 
> The only other place __context_switch() is called is
> from __sync_local_execstate().  But the only reason that needs to be
> called is because sometimes we *don't* call __context_switch(), and so
> there are things on the cpu that aren't copied into the vcpu struct.

Thanks for the comments!

From my understanding, __sync_local_execstate() can only get called
in the following two cases:
#1) this_cpu(curr_vcpu) == current, in this case, __context_switch() is
not called.
#2) this_cpu(curr_vcpu) != current, and current == idle_vcpu, that means
we just switched from a non-idle vCPU to idle vCPU, so here we need to
call __context_switch() to copy things to the original vcpu struct.

Please correct me if the above understanding is wrong or incomplete?

I think calling pi_ctxt_switch_to() in __context_switch() is needed when
we are switching to a non-idle vCPU (we need to change the PI state of the
target vCPU), and the call is not needed when switching to the idle vCPU.
So if the above understanding is correct, I think your suggestion below
is really good, it makes things clearer.
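The two cases described above can be captured in a standalone sketch. This is a hypothetical, much-simplified model; the real function is __sync_local_execstate() in xen/arch/x86/domain.c, and the pcpu struct here is invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>

struct vcpu { bool is_idle; };

/* Illustrative per-pcpu bookkeeping, not Xen's real per_cpu data. */
struct pcpu {
    struct vcpu *curr_vcpu;   /* vcpu whose state is loaded on the pcpu */
    struct vcpu *current;     /* vcpu the scheduler considers running */
    int full_switches;        /* counts full __context_switch() calls */
};

static void sync_local_execstate(struct pcpu *p)
{
    /* case #1: no lazy switch happened, nothing to sync back */
    if ( p->curr_vcpu == p->current )
        return;

    /*
     * case #2: we lazily switched to idle, so the previous vcpu's
     * state is still on the pcpu; do the full switch now to copy it
     * back into that vcpu's struct.
     */
    p->full_switches++;
    p->curr_vcpu = p->current;
}
```

This also shows why the PI hooks don't belong here: no PI state is copied off the processor, so there is nothing for the sync path to recover.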

> 
> That doesn't apply to the PI state -- for one, nothing is copied from
> the processor; and for two, pi_ctxt_switch_from() is called
> unconditionally anyway.
> 
> Would it make more sense to call pi_context_switch(prev, next) just
> after "set_current"?

I think it is a good point.

Thanks,
Feng

> 
> (Keeping in mind I totally may have missed something...)
> 
>  -George


Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-17 Thread Wu, Feng


> -Original Message-
> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of George
> Dunlap
> Sent: Thursday, September 17, 2015 1:08 AM
> To: Wu, Feng
> Cc: Jan Beulich; Tian, Kevin; Keir Fraser; Andrew Cooper; Dario Faggioli;
> xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for
> VT-d posted interrupts
> 
> On Thu, Sep 10, 2015 at 9:59 AM, Wu, Feng <feng...@intel.com> wrote:
> >> >> >> Calling vcpu_unblock() in the middle of context_switch()? Why? And
> >> >> >> is this safe?
> >> >> >
> >> >> > I cannot see anything unsafe so far, can some scheduler maintainer
> >> >> > help to confirm it? Dario? George?
> >> >>
> >> >> But first and foremost you ought to answer the "why".
> >> >
> >> > I think if you think this solution is unsafe, you need to point out where
> >> > it is, not just ask "is this safe?"; I don't think this is an effective
> >> > comment.
> >>
> >> It is my general understanding that people wanting code to be
> >> included have to prove their code is okay, not reviewers to prove
> >> the code is broken.
> >>
> >> > Anyway, I go through the main path of the code as below, I really don't 
> >> > find
> >> > anything unsafe here.
> >> >
> >> > context_switch() -->
> >> > pi_ctxt_switch_from() -->
> >> > vmx_pre_ctx_switch_pi() -->
> >> > vcpu_unblock() -->
> >> > vcpu_wake() -->
> >> > SCHED_OP(VCPU2OP(v), wake, v) -->
> >> > csched_vcpu_wake() -->
> >> > __runq_insert()
> >> > __runq_tickle()
> >> >
> >> > If you continue to think it is unsafe, pointing out the place will be
> >> > welcome!
> >>
> >> Once again - I didn't say it's unsafe (and hence can't point at a
> >> particular place). What bothers me is that vcpu_unblock() affects
> >> scheduler state, and altering scheduler state from inside the
> >> context switch path looks problematic at the first glance. I'd be
> >> happy if I was told there is no such problem, but be aware that
> >> this would have to include latent ones: Even if right now things
> >> are safe, having such feedback have the potential of becoming
> >> an actual problem with a later scheduler change is already an
> >> issue, as finding the problem is then likely going to be rather hard
> >> (and I suspect it's not going to be you to debug this).
> >
> > What I can say is that after investigation, I don't find anything bad
> > for this vcpu_unblock(), I tested it in my experiment. So that is why
> > I'd like to ask for ideas from the scheduler experts.
> 
> You still didn't answer Jan's question as to why you chose to call it there.

The reason I need to call vcpu_unblock() here is that we are about
to block the vCPU and put it on the blocking list if everything goes okay.
But if 'ON' gets set while we are updating the posted-interrupt descriptor,
it means a new interrupt has arrived for this vCPU, so we shouldn't block
it after all; I call vcpu_unblock() to undo the block.
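The check-then-undo pattern being described could be sketched roughly as follows. All names here are illustrative (pi_prepare_block(), the boolean ON flag, the string vector values); this is not the actual Xen/VMX code, just the control flow under discussion:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the descriptor and vcpu state. */
struct pi_desc { bool on; const char *nv; };
struct vcpu { struct pi_desc pi; bool runnable; };

static void vcpu_unblock(struct vcpu *v) { v->runnable = true; }

/* Returns true if the vcpu was actually left blocked. */
static bool pi_prepare_block(struct vcpu *v)
{
    v->runnable = false;
    v->pi.nv = "pi_wake";        /* arm the wakeup notification vector */

    if ( v->pi.on )              /* interrupt posted while we prepared? */
    {
        vcpu_unblock(v);         /* then don't block after all */
        return false;
    }
    return true;                 /* safe to stay on the blocking list */
}
```

The point of contention in the thread is precisely the vcpu_unblock() call above: it mutates scheduler state from inside the context-switch path.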

> 
> As to why Jan is suspicious, the hypervisor code is incredibly
> complicated, and even hypervisor maintainers are only mortals. :-) One
> way we deal with the complication is by trying to restrict the ways
> different code interacts, so that people reading, writing, and
> maintaining the code can make simplifying assumptions to make it
> easier to grasp / harder to make mistakes.
> 
> People reading or working on vcpu_wake() code expect it to be called
> from interrupt contexts, and also expect it to be called from normal
> hypercalls.  They *don't* expect it to be called from in the middle of
> a context switch, where "current", the registers, and all those things
> are all up in the air.  It's possible that the code *already*
> implicitly assumes it's not called from context switch (i.e., that v
> == current means certain state), and even if not, it's possible
> someone will make that assumption in the future, causing a very
> hard-to-detect bug.
> 
> Now I haven't looked at the code in detail yet; it may be that the
> only way to make PI work is to make this call.  If that's the case we
> need to carefully examine assumptions, and carefully comment the code
> to make sure people working on the scheduling code continue to think
> carefully about their assumptions in the future.  But if we can get
> away without doing that, it will make things much easier.
> 
> But it's certainly reasonable to expect the maintainers to offer you
> an alternate solution if your proposed solution is unacceptable. :-)
> Let me take a look and see what I think.

I really appreciate your detailed description; it is reasonable,
and thank you for your comments!

Thanks,
Feng

> 
>  -George


Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-16 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, September 10, 2015 5:26 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> 
> >> > Anyway, I go through the main path of the code as below, I really don't 
> >> > find
> >> > anything unsafe here.
> >> >
> >> > context_switch() -->
> >> > pi_ctxt_switch_from() -->
> >> > vmx_pre_ctx_switch_pi() -->
> >> > vcpu_unblock() -->
> >> > vcpu_wake() -->
> >> > SCHED_OP(VCPU2OP(v), wake, v) -->
> >> > csched_vcpu_wake() -->
> >> > __runq_insert()
> >> > __runq_tickle()
> >> >
> >> > If you continue to think it is unsafe, pointing out the place will be
> >> > welcome!
> >>
> >> Once again - I didn't say it's unsafe (and hence can't point at a
> >> particular place). What bothers me is that vcpu_unblock() affects
> >> scheduler state, and altering scheduler state from inside the
> >> context switch path looks problematic at the first glance. I'd be
> >> happy if I was told there is no such problem, but be aware that
> >> this would have to include latent ones: Even if right now things
> >> are safe, having such feedback have the potential of becoming
> >> an actual problem with a later scheduler change is already an
> >> issue, as finding the problem is then likely going to be rather hard
> >> (and I suspect it's not going to be you to debug this).
> >
> > What I can say is that after investigation, I don't find anything bad
> > for this vcpu_unblock(), I tested it in my experiment. So that is why
> > I'd like to ask for ideas from the scheduler experts.
> 
> Agreed - I'm also awaiting their input.

Hi Dario & George, could you please give your thoughts on this?
Your input is very important to us!

Thanks,
Feng



Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-16 Thread George Dunlap
On Mon, Sep 7, 2015 at 1:54 PM, Jan Beulich  wrote:
 On 25.08.15 at 03:57,  wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1573,6 +1573,22 @@ static void __context_switch(void)
>>  per_cpu(curr_vcpu, cpu) = n;
>>  }
>>
>> +static inline void pi_ctxt_switch_from(struct vcpu *prev)
>> +{
>> +/*
>> + * When switching from non-idle to idle, we only do a lazy context 
>> switch.
>> + * However, in order for posted interrupt (if available and enabled) to
>> + * work properly, we at least need to update the descriptors.
>> + */
>> +if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
>> +prev->arch.pi_ctxt_switch_from(prev);
>> +}
>> +
>> +static inline void pi_ctxt_switch_to(struct vcpu *next)
>> +{
>> +if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
>> +next->arch.pi_ctxt_switch_to(next);
>> +}
>>
>>  void context_switch(struct vcpu *prev, struct vcpu *next)
>>  {
>> @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct vcpu 
>> *next)
>>
>>  set_current(next);
>>
>> +pi_ctxt_switch_from(prev);
>> +
>>  if ( (per_cpu(curr_vcpu, cpu) == next) ||
>>   (is_idle_domain(nextd) && cpu_online(cpu)) )
>>  {
>> +pi_ctxt_switch_to(next);
>>  local_irq_enable();
>
> This placement, if really intended that way, needs explanation (in a
> comment) and perhaps even renaming of the involved symbols, as
> looking at it from a general perspective it seems wrong (with
> pi_ctxt_switch_to() excluding idle vCPU-s it effectively means you
> want this only when switching back to what got switched out lazily
> before, i.e. this would be not something to take place on an arbitrary
> context switch). As to possible alternative names - maybe make the
> hooks ctxt_switch_prepare() and ctxt_switch_cancel()?

Why on earth is this more clear than what he had before?

In the first call, he's not "preparing" anything -- he's actually
switching the PI context out for prev.  And in the second call, he's
not "cancelling" anything -- he's actually switching the PI context in
for next.  The names you suggest are actively confusing, not helpful.

But before talking about how to make things more clear, one side
question -- do we need to actually call pi_ctxt_switch_to() in
__context_switch()?

The only other place __context_switch() is called is
from __sync_local_execstate().  But the only reason that needs to be
called is because sometimes we *don't* call __context_switch(), and so
there are things on the cpu that aren't copied into the vcpu struct.

That doesn't apply to the PI state -- for one, nothing is copied from
the processor; and for two, pi_ctxt_switch_from() is called
unconditionally anyway.

Would it make more sense to call pi_context_switch(prev, next) just
after "set_current"?

(Keeping in mind I totally may have missed something...)

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-16 Thread George Dunlap
On Thu, Sep 10, 2015 at 9:59 AM, Wu, Feng  wrote:
>> >> >> Calling vcpu_unblock() in the middle of context_switch()? Why? And
>> >> >> is this safe?
>> >> >
>> >> > I cannot see anything unsafe so far, can some scheduler maintainer
>> >> > help to confirm it? Dario? George?
>> >>
>> >> But first and foremost you ought to answer the "why".
>> >
>> > I think if you think this solution is unsafe, you need to point out where
>> > it is, not just ask "is this safe?"; I don't think that is an effective
>> > comment.
>>
>> It is my general understanding that people wanting code to be
>> included have to prove their code is okay, not reviewers to prove
>> the code is broken.
>>
>> > Anyway, I go through the main path of the code as below, I really don't 
>> > find
>> > anything unsafe here.
>> >
>> > context_switch() -->
>> > pi_ctxt_switch_from() -->
>> > vmx_pre_ctx_switch_pi() -->
>> > vcpu_unblock() -->
>> > vcpu_wake() -->
>> > SCHED_OP(VCPU2OP(v), wake, v) -->
>> > csched_vcpu_wake() -->
>> > __runq_insert()
>> > __runq_tickle()
>> >
>> > If you continue to think it is unsafe, pointing out the place will be
>> > welcome!
>>
>> Once again - I didn't say it's unsafe (and hence can't point at a
>> particular place). What bothers me is that vcpu_unblock() affects
>> scheduler state, and altering scheduler state from inside the
>> context switch path looks problematic at the first glance. I'd be
>> happy if I was told there is no such problem, but be aware that
>> this would have to include latent ones: Even if right now things
>> are safe, having such feedback have the potential of becoming
>> an actual problem with a later scheduler change is already an
>> issue, as finding the problem is then likely going to be rather hard
>> (and I suspect it's not going to be you to debug this).
>
> What I can say is that after investigation, I don't find anything bad
> for this vcpu_unblock(), I tested it in my experiment. So that is why
> I'd like to ask for ideas from scheduler experts.

You still didn't answer Jan's question as to why you chose to call it there.

As to why Jan is suspicious, the hypervisor code is incredibly
complicated, and even hypervisor maintainers are only mortals. :-) One
way we deal with the complication is by trying to restrict the ways
different code interacts, so that people reading, writing, and
maintaining the code can make simplifying assumptions to make it
easier to grasp / harder to make mistakes.

People reading or working on vcpu_wake() code expect it to be called
from interrupt contexts, and also expect it to be called from normal
hypercalls.  They *don't* expect it to be called from in the middle of
a context switch, where "current", the registers, and all those things
are all up in the air.  It's possible that the code *already*
implicitly assumes it's not called from context switch (i.e., that v
== current means certain state), and even if not, it's possible
someone will make that assumption in the future, causing a very
hard-to-detect bug.

Now I haven't looked at the code in detail yet; it may be that the
only way to make PI work is to make this call.  If that's the case we
need to carefully examine assumptions, and carefully comment the code
to make sure people working on the scheduling code continue to think
carefully about their assumptions in the future.  But if we can get
away without doing that, it will make things much easier.

But it's certainly reasonable to expect the maintainers to offer you
an alternate solution if your proposed solution is unacceptable. :-)
Let me take a look and see what I think.

 -George



Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-10 Thread Jan Beulich
>>> On 10.09.15 at 10:59,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, September 10, 2015 4:28 PM
>> >>> On 10.09.15 at 04:07,  wrote:
>> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> Sent: Wednesday, September 09, 2015 6:27 PM
>> >> >>> On 09.09.15 at 10:56,  wrote:
>> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> >> Sent: Monday, September 07, 2015 8:55 PM
>> >> >> >>> On 25.08.15 at 03:57,  wrote:
>> >> >> > +
>> >> >> > +/*
>> >> >> > + * Delete the vCPU from the related block list
>> >> >> > + * if we are resuming from blocked state.
>> >> >> > + */
>> >> >> > +if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
>> >> >> > +{
>> >> >> > +spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> >> >> > +                  v->arch.hvm_vmx.pi_block_cpu), flags);
>> >> >> > +list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> >> >>
>> >> >> Shouldn't you set .pi_block_cpu back to -1 along with removing
>> >> >> the vCPU from the list? Which then ought to allow using just
>> >> >> list_del() here?
>> >> >
>> >> > Here is the story about using list_del_init() instead of list_del(), 
>> >> > (the
>> >> > same reason for using list_del_init() in pi_wakeup_interrupt() ).
>> >> > If we use list_del(), that means we need to set .pi_block_cpu to
>> >> > -1 after removing the vCPU from the block list, there are two
>> >> > places where the vCPU is removed from the list:
>> >> > - arch_vcpu_wake_prepare()
>> >> > - pi_wakeup_interrupt()
>> >> >
>> >> > That means we also need to set .pi_block_cpu to -1 in
>> >> > pi_wakeup_interrupt(), which seems problematic to me, since
>> >> > .pi_block_cpu is used to get the per-CPU spin lock, it is not safe
>> >> > to change it in pi_wakeup_interrupt(), maybe someone is using
>> >> > it at the same time.
>> >> >
>> >> > And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
>> >> > is only used when .pi_block_cpu is set to -1 at the initial time.
>> >> >
>> >> > If you have any good suggestions about this, I will be all ears.
>> >>
>> >> Latching pi_block_cpu into a local variable would take care of that
>> >> part of the problem. Of course you then may also need to check
>> >> that while waiting to acquire the lock pi_block_cpu didn't change.
>> >
>> > Thanks for the suggestion! But I think it is not easy to "check
>> > "that while waiting to acquire the lock pi_block_cpu didn't change".
>> > Let's take the following pseudo code as an example:
>> >
>> > int local_pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
>> >
>> > ..
>> >
>> > spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> >local_pi_block_cpu), flags);
>> > list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> > spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
>> >local_pi_block_cpu), flags);
>> >
>> > If .pi_block_cpu is changed to -1 by others during calling list_del()
>> > above, that means the vCPU is removed by others, then calling
>> > list_del() here again would be problematic. I think there might be
>> > other corner cases for this. So I suggest adding some comments
>> > for list_del_init() as you mentioned below.
>> 
>> That's why I said "check that while waiting to acquire the lock
>> pi_block_cpu didn't change." The code you present does not do
>> this.
> 
> I didn't mean we need to implement it as the above code; I just listed it
> here to show it is hard to do that.
> First, how to check it while waiting to acquire the lock .pi_block_cpu
> didn't change?

Note the difference between "check while waiting" and "check that
while waiting": The former is indeed hard to implement, while the
latter is pretty straightforward (and we do so elsewhere).

> Secondly, even if we can check it, what should we do if .pi_block_cpu
> is changed after acquiring the lock as I mentioned above?

Drop the lock and start over. I.e. (taking your pseudo code)

restart:
local_pi_block_cpu = ...;
bail-if-invalid (e.g. -1 in current model)
spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
if(local_pi_block_cpu != actual_pi_block_cpu) {
spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
goto restart;
}
list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);

>> > Anyway, I go through the main path of the code as below, I really don't 
>> > find
>> > anything unsafe here.
>> >
>> > context_switch() -->
>> > pi_ctxt_switch_from() -->
>> > vmx_pre_ctx_switch_pi() -->
>> > vcpu_unblock() -->
>> > vcpu_wake() -->
>> > SCHED_OP(VCPU2OP(v), wake, v) -->
>> > csched_vcpu_wake() -->
>> > __runq_insert()
>> > __runq_tickle()
>> >
>> > If you continue to think it is unsafe, 

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-10 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, September 10, 2015 4:28 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> 
> >>> On 10.09.15 at 04:07,  wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Wednesday, September 09, 2015 6:27 PM
> >> >>> On 09.09.15 at 10:56,  wrote:
> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> >> Sent: Monday, September 07, 2015 8:55 PM
> >> >> >>> On 25.08.15 at 03:57,  wrote:
> >> >> > +
> >> >> > +/*
> >> >> > + * Delete the vCPU from the related block list
> >> >> > + * if we are resuming from blocked state.
> >> >> > + */
> >> >> > +if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
> >> >> > +{
> >> >> > +spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >> >> > +                  v->arch.hvm_vmx.pi_block_cpu), flags);
> >> >> > +list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >> >>
> >> >> Shouldn't you set .pi_block_cpu back to -1 along with removing
> >> >> the vCPU from the list? Which then ought to allow using just
> >> >> list_del() here?
> >> >
> >> > Here is the story about using list_del_init() instead of list_del(), (the
> >> > same reason for using list_del_init() in pi_wakeup_interrupt() ).
> >> > If we use list_del(), that means we need to set .pi_block_cpu to
> >> > -1 after removing the vCPU from the block list, there are two
> >> > places where the vCPU is removed from the list:
> >> > - arch_vcpu_wake_prepare()
> >> > - pi_wakeup_interrupt()
> >> >
> >> > That means we also need to set .pi_block_cpu to -1 in
> >> > pi_wakeup_interrupt(), which seems problematic to me, since
> >> > .pi_block_cpu is used to get the per-CPU spin lock, it is not safe
> >> > to change it in pi_wakeup_interrupt(), maybe someone is using
> >> > it at the same time.
> >> >
> >> > And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
> >> > is only used when .pi_block_cpu is set to -1 at the initial time.
> >> >
> >> > If you have any good suggestions about this, I will be all ears.
> >>
> >> Latching pi_block_cpu into a local variable would take care of that
> >> part of the problem. Of course you then may also need to check
> >> that while waiting to acquire the lock pi_block_cpu didn't change.
> >
> > Thanks for the suggestion! But I think it is not easy to "check
> > "that while waiting to acquire the lock pi_block_cpu didn't change".
> > Let's take the following pseudo code as an example:
> >
> > int local_pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> >
> > ..
> >
> > spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >local_pi_block_cpu), flags);
> > list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> > spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> >local_pi_block_cpu), flags);
> >
> > If .pi_block_cpu is changed to -1 by others during calling list_del()
> > above, that means the vCPU is removed by others, then calling
> > list_del() here again would be problematic. I think there might be
> > other corner cases for this. So I suggest adding some comments
> > for list_del_init() as you mentioned below.
> 
> That's why I said "check that while waiting to acquire the lock
> pi_block_cpu didn't change." The code you present does not do
> this.

I didn't mean we need to implement it as the above code; I just listed it
here to show it is hard to do that.
First, how to check it while waiting to acquire the lock .pi_block_cpu
didn't change?
Secondly, even if we can check it, what should we do if .pi_block_cpu
is changed after acquiring the lock as I mentioned above?

> 
> >> >> > +do {
> >> >> > +old.control = new.control = pi_desc->control;
> >> >> > +
> >> >> > +/* Should not block the vCPU if an interrupt was posted for it. */
> >> >> > +if ( pi_test_on() )
> >> >> > +{
> >> >> > +spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
> >> >> > +vcpu_unblock(v);
> >> >>
> >> >> Calling vcpu_unblock() in the middle of context_switch()? Why? And
> >> >> is this safe?
> >> >
> >> > I cannot see anything unsafe so far, can some scheduler maintainer
> >> > help to confirm it? Dario? George?
> >>
> >> But first and foremost you ought to answer the "why".
> >
> > I think if you think this solution is unsafe, you need to point out where
> > it is, not just ask "is this safe?"; I don't think that is an effective
> > comment.
> 
> It is my general understanding that people wanting code to be
> included have to prove their code is okay, not reviewers to prove
> the code is broken.
> 
> > Anyway, I go through the main path of the code as below, I really don't find
> > 

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-10 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, September 10, 2015 5:26 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> 
> >>> On 10.09.15 at 10:59,  wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Thursday, September 10, 2015 4:28 PM
> >> >>> On 10.09.15 at 04:07,  wrote:
> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> >> Sent: Wednesday, September 09, 2015 6:27 PM
> >> >> >>> On 09.09.15 at 10:56,  wrote:
> >> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> >> >> Sent: Monday, September 07, 2015 8:55 PM
> >> >> >> >>> On 25.08.15 at 03:57,  wrote:
> >> >> >> > +
> >> >> >> > +/*
> >> >> >> > + * Delete the vCPU from the related block list
> >> >> >> > + * if we are resuming from blocked state.
> >> >> >> > + */
> >> >> >> > +if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
> >> >> >> > +{
> >> >> >> > +spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >> >> >> > +                  v->arch.hvm_vmx.pi_block_cpu), flags);
> >> >> >> > +list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >> >> >>
> >> >> >> Shouldn't you set .pi_block_cpu back to -1 along with removing
> >> >> >> the vCPU from the list? Which then ought to allow using just
> >> >> >> list_del() here?
> >> >> >
> >> >> > Here is the story about using list_del_init() instead of list_del(), 
> >> >> > (the
> >> >> > same reason for using list_del_init() in pi_wakeup_interrupt() ).
> >> >> > If we use list_del(), that means we need to set .pi_block_cpu to
> >> >> > -1 after removing the vCPU from the block list, there are two
> >> >> > places where the vCPU is removed from the list:
> >> >> > - arch_vcpu_wake_prepare()
> >> >> > - pi_wakeup_interrupt()
> >> >> >
> >> >> > That means we also need to set .pi_block_cpu to -1 in
> >> >> > pi_wakeup_interrupt(), which seems problematic to me, since
> >> >> > .pi_block_cpu is used to get the per-CPU spin lock, it is not safe
> >> >> > to change it in pi_wakeup_interrupt(), maybe someone is using
> >> >> > it at the same time.
> >> >> >
> >> >> > And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
> >> >> > is only used when .pi_block_cpu is set to -1 at the initial time.
> >> >> >
> >> >> > If you have any good suggestions about this, I will be all ears.
> >> >>
> >> >> Latching pi_block_cpu into a local variable would take care of that
> >> >> part of the problem. Of course you then may also need to check
> >> >> that while waiting to acquire the lock pi_block_cpu didn't change.
> >> >
> >> > Thanks for the suggestion! But I think it is not easy to "check
> >> > "that while waiting to acquire the lock pi_block_cpu didn't change".
> >> > Let's take the following pseudo code as an example:
> >> >
> >> > int local_pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> >> >
> >> > ..
> >> >
> >> > spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >> >local_pi_block_cpu), flags);
> >> > list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >> > spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> >> >local_pi_block_cpu), flags);
> >> >
> >> > If .pi_block_cpu is changed to -1 by others during calling list_del()
> >> > above, that means the vCPU is removed by others, then calling
> >> > list_del() here again would be problematic. I think there might be
> >> > other corner cases for this. So I suggest adding some comments
> >> > for list_del_init() as you mentioned below.
> >>
> >> That's why I said "check that while waiting to acquire the lock
> >> pi_block_cpu didn't change." The code you present does not do
> >> this.
> >
> > I didn't mean we need to implement it as the above code; I just listed it
> > here to show it is hard to do that.
> > First, how to check it while waiting to acquire the lock .pi_block_cpu
> > didn't change?
> 
> Note the difference between "check while waiting" and "check that
> while waiting": The former is indeed hard to implement, while the
> latter is pretty straightforward (and we do so elsewhere).
> 
> > Secondly, even if we can check it, what should we do if .pi_block_cpu
> > is changed after acquiring the lock as I mentioned above?
> 
> Drop the lock and start over. I.e. (taking your pseudo code)
> 
> restart:
> local_pi_block_cpu = ...;
> bail-if-invalid (e.g. -1 in current model)
> spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
> if(local_pi_block_cpu != actual_pi_block_cpu) {
> spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
> goto restart;
> }

Thanks a lot for showing me this pseudo code! My concern is if
.pi_block_cpu is changed to -1 at this point, it doesn't work.
.pi_block_cpu being -1 here means the vCPU is 

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-10 Thread Jan Beulich
>>> On 10.09.15 at 11:41,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, September 10, 2015 5:26 PM
>> >>> On 10.09.15 at 10:59,  wrote:
>> > First, how to check it while waiting to acquire the lock .pi_block_cpu
>> > didn't change?
>> 
>> Note the difference between "check while waiting" and "check that
>> while waiting": The former is indeed hard to implement, while the
>> latter is pretty straightforward (and we do so elsewhere).
>> 
>> > Secondly, even if we can check it, what should we do if .pi_block_cpu
>> > is changed after acquiring the lock as I mentioned above?
>> 
>> Drop the lock and start over. I.e. (taking your pseudo code)
>> 
>> restart:
>> local_pi_block_cpu = ...;
>> bail-if-invalid (e.g. -1 in current model)
>> spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
>> if(local_pi_block_cpu != actual_pi_block_cpu) {
>> spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
>> goto restart;
>> }
> 
> Thanks a lot for showing me this pseudo code! My concern is if
> .pi_block_cpu is changed to -1 at this point, it doesn't work.
> .pi_block_cpu being -1 here means the vCPU is removed from
> the blocking list by others, then we cannot delete it again via
> list_del() here.

Did you miss the "bail-if-invalid" above?

> BTW, I cannot see performance overhead for list_del_init()
> compared to list_del().
> 
> list_del():
> static inline void list_del(struct list_head *entry)
> {
> ASSERT(entry->next->prev == entry);
> ASSERT(entry->prev->next == entry);
> __list_del(entry->prev, entry->next);
> entry->next = LIST_POISON1;
> entry->prev = LIST_POISON2;
> }
> 
> list_del_init():
> static inline void list_del_init(struct list_head *entry)
> {
> __list_del(entry->prev, entry->next);
> INIT_LIST_HEAD(entry);
> }

Well, yes, both do two stores (I forgot about the poisoning), but
arguably the poisoning could become a debug-build-only thing. I.e.
it is an implementation detail that the number of stores currently
is the same. From an abstract perspective one should still prefer
list_del() when the re-init isn't really needed. And in the specific
case here asking you to use list_del() makes sure the code ends
up not even trying the deletion when not needed.

Jan




Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-10 Thread Jan Beulich
>>> On 10.09.15 at 04:07,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Wednesday, September 09, 2015 6:27 PM
>> >>> On 09.09.15 at 10:56,  wrote:
>> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> Sent: Monday, September 07, 2015 8:55 PM
>> >> >>> On 25.08.15 at 03:57,  wrote:
>> >> > +
>> >> > +/*
>> >> > + * Delete the vCPU from the related block list
>> >> > + * if we are resuming from blocked state.
>> >> > + */
>> >> > +if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
>> >> > +{
>> >> > +spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> >> > +                  v->arch.hvm_vmx.pi_block_cpu), flags);
>> >> > +list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> >>
>> >> Shouldn't you set .pi_block_cpu back to -1 along with removing
>> >> the vCPU from the list? Which then ought to allow using just
>> >> list_del() here?
>> >
>> > Here is the story about using list_del_init() instead of list_del(), (the
>> > same reason for using list_del_init() in pi_wakeup_interrupt() ).
>> > If we use list_del(), that means we need to set .pi_block_cpu to
>> > -1 after removing the vCPU from the block list, there are two
>> > places where the vCPU is removed from the list:
>> > - arch_vcpu_wake_prepare()
>> > - pi_wakeup_interrupt()
>> >
>> > That means we also need to set .pi_block_cpu to -1 in
>> > pi_wakeup_interrupt(), which seems problematic to me, since
>> > .pi_block_cpu is used to get the per-CPU spin lock, it is not safe
>> > to change it in pi_wakeup_interrupt(), maybe someone is using
>> > it at the same time.
>> >
>> > And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
>> > is only used when .pi_block_cpu is set to -1 at the initial time.
>> >
>> > If you have any good suggestions about this, I will be all ears.
>> 
>> Latching pi_block_cpu into a local variable would take care of that
>> part of the problem. Of course you then may also need to check
>> that while waiting to acquire the lock pi_block_cpu didn't change.
> 
> Thanks for the suggestion! But I think it is not easy to "check
> "that while waiting to acquire the lock pi_block_cpu didn't change".
> Let's take the following pseudo code as an example:
> 
> int local_pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> 
> ..
> 
> spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>local_pi_block_cpu), flags);
> list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
>local_pi_block_cpu), flags);
> 
> If .pi_block_cpu is changed to -1 by others during calling list_del()
> above, that means the vCPU is removed by others, then calling
> list_del() here again would be problematic. I think there might be
> other corner cases for this. So I suggest adding some comments
> for list_del_init() as you mentioned below.

That's why I said "check that while waiting to acquire the lock
pi_block_cpu didn't change." The code you present does not do
this.

>> >> > +do {
>> >> > +old.control = new.control = pi_desc->control;
>> >> > +
>> >> > +/* Should not block the vCPU if an interrupt was posted for it. */
>> >> > +if ( pi_test_on() )
>> >> > +{
>> >> > +spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
>> >> > +vcpu_unblock(v);
>> >>
>> >> Calling vcpu_unblock() in the middle of context_switch()? Why? And
>> >> is this safe?
>> >
>> > I cannot see anything unsafe so far, can some scheduler maintainer
>> > help to confirm it? Dario? George?
>> 
>> But first and foremost you ought to answer the "why".
> 
> I think if you think this solution is unsafe, you need to point out where
> it is, not just ask "is this safe?"; I don't think that is an effective
> comment.

It is my general understanding that people wanting code to be
included have to prove their code is okay, not reviewers to prove
the code is broken.

> Anyway, I go through the main path of the code as below, I really don't find
> anything unsafe here.
> 
> context_switch() -->
> pi_ctxt_switch_from() -->
> vmx_pre_ctx_switch_pi() -->
> vcpu_unblock() -->
> vcpu_wake() -->
> SCHED_OP(VCPU2OP(v), wake, v) -->
> csched_vcpu_wake() -->
> __runq_insert()
> __runq_tickle()
> 
> If you continue to think it is unsafe, pointing out the place will be 
> welcome!

Once again - I didn't say it's unsafe (and hence can't point at a
particular place). What bothers me is that vcpu_unblock() affects
scheduler state, and altering scheduler state from inside the
context switch path looks problematic at the first glance. I'd be
happy if I was told there is no such problem, but be aware that
this would have to include latent ones: Even if 

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-10 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, September 10, 2015 8:45 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> 
> >>> On 10.09.15 at 14:34,  wrote:
> 
> >
> >> -Original Message-
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Thursday, September 10, 2015 6:01 PM
> >> To: Wu, Feng
> >> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> >> xen-devel@lists.xen.org; Keir Fraser
> >> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d
> posted
> >> interrupts
> >>
> >> >>> On 10.09.15 at 11:41,  wrote:
> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> >> Sent: Thursday, September 10, 2015 5:26 PM
> >> >> >>> On 10.09.15 at 10:59,  wrote:
> >> >> > First, how to check it while waiting to acquire the lock .pi_block_cpu
> >> >> > didn't change?
> >> >>
> >> >> Note the difference between "check while waiting" and "check that
> >> >> while waiting": The former is indeed hard to implement, while the
> >> >> latter is pretty straightforward (and we do so elsewhere).
> >> >>
> >> >> > Secondly, even if we can check it, what should we do if .pi_block_cpu
> >> >> > is changed after acquiring the lock as I mentioned above?
> >> >>
> >> >> Drop the lock and start over. I.e. (taking your pseudo code)
> >> >>
> >> >> restart:
> >> >> local_pi_block_cpu = ...;
> >> >> bail-if-invalid (e.g. -1 in current model)
> >> >> spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
> >> >> if(local_pi_block_cpu != actual_pi_block_cpu) {
> >> >> spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
> >> >> goto restart;
> >> >> }
> >> >
> >> > Thanks a lot for showing me this pseudo code! My concern is if
> >> > .pi_block_cpu is changed to -1 at this point, it doesn't work.
> >> > .pi_block_cpu being -1 here means the vCPU is removed from
> >> > the blocking list by others, then we cannot delete it again via
> >> > list_del() here.
> >>
> >> Did you miss the "bail-if-invalid" above?
> >
> > I am sorry, do I miss something here? If .pi_block_cpu becomes
> > -1 here (after the above 'if' statement is finished with
> > local_pi_block_cpu == actual_pi_block_cpu ), how can "bail-if-invalid"
> > above help?
> 
> The (obvious I thought) implication is that all assignments to
> pi_block_cpu (along with all list manipulations) now need to happen
> with the lock held.

If all the assignments to pi_block_cpu are done with the lock held, I don't
think we need the above check; we can safely use .pi_block_cpu, right?

Thanks,
Feng

> 
> Jan




Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-10 Thread Jan Beulich
>>> On 10.09.15 at 14:34,  wrote:

> 
>> -Original Message-
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, September 10, 2015 6:01 PM
>> To: Wu, Feng
>> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
>> xen-devel@lists.xen.org; Keir Fraser
>> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
>> interrupts
>> 
>> >>> On 10.09.15 at 11:41,  wrote:
>> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> Sent: Thursday, September 10, 2015 5:26 PM
>> >> >>> On 10.09.15 at 10:59,  wrote:
>> >> > First, how to check it while waiting to acquire the lock .pi_block_cpu
>> >> > didn't change?
>> >>
>> >> Note the difference between "check while waiting" and "check that
>> >> while waiting": The former is indeed hard to implement, while the
>> >> latter is pretty straightforward (and we do so elsewhere).
>> >>
>> >> > Secondly, even if we can check it, what should we do if .pi_block_cpu
>> >> > is changed after acquiring the lock as I mentioned above?
>> >>
>> >> Drop the lock and start over. I.e. (taking your pseudo code)
>> >>
>> >> restart:
>> >> local_pi_block_cpu = ...;
>> >> bail-if-invalid (e.g. -1 in current model)
>> >> spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
>> >> if(local_pi_block_cpu != actual_pi_block_cpu) {
>> >> spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
>> >> goto restart;
>> >> }
>> >
>> > Thanks a lot for showing me this pseudo code! My concern is if
>> > .pi_block_cpu is changed to -1 at this point, it doesn't work.
>> > .pi_block_cpu being -1 here means the vCPU is removed from
>> > the blocking list by others, then we cannot delete it again via
>> > list_del() here.
>> 
>> Did you miss the "bail-if-invalid" above?
> 
> I am sorry, do I miss something here? If .pi_block_cpu becomes
> -1 here (after the above 'if' statement is finished with
> local_pi_block_cpu == actual_pi_block_cpu ), how can "bail-if-invalid"
> above help?

The (obvious I thought) implication is that all assignments to
pi_block_cpu (along with all list manipulations) now need to happen
with the lock held.

Jan




Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-10 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, September 10, 2015 9:15 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> 
> >>> On 10.09.15 at 14:58,  wrote:
> 
> >
> >> -Original Message-
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Thursday, September 10, 2015 8:45 PM
> >> To: Wu, Feng
> >> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> >> xen-devel@lists.xen.org; Keir Fraser
> >> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d
> posted
> >> interrupts
> >>
> >> >>> On 10.09.15 at 14:34,  wrote:
> >>
> >> >
> >> >> -Original Message-
> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> >> Sent: Thursday, September 10, 2015 6:01 PM
> >> >> To: Wu, Feng
> >> >> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> >> >> xen-devel@lists.xen.org; Keir Fraser
> >> >> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d
> >> posted
> >> >> interrupts
> >> >>
> >> >> >>> On 10.09.15 at 11:41,  wrote:
> >> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> >> >> Sent: Thursday, September 10, 2015 5:26 PM
> >> >> >> >>> On 10.09.15 at 10:59,  wrote:
> >> >> >> > First, how to check it while waiting to acquire the lock 
> >> >> >> > .pi_block_cpu
> >> >> >> > didn't change?
> >> >> >>
> >> >> >> Note the difference between "check while waiting" and "check that
> >> >> >> while waiting": The former is indeed hard to implement, while the
> >> >> >> latter is pretty straightforward (and we do so elsewhere).
> >> >> >>
> >> >> >> > Secondly, even if we can check it, what should we do if 
> >> >> >> > .pi_block_cpu
> >> >> >> > is changed after acquiring the lock as I mentioned above?
> >> >> >>
> >> >> >> Drop the lock and start over. I.e. (taking your pseudo code)
> >> >> >>
> >> >> >> restart:
> >> >> >> local_pi_block_cpu = ...;
> >> >> >> bail-if-invalid (e.g. -1 in current model)
> >> >> >> spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
> >> >> >> if(local_pi_block_cpu != actual_pi_block_cpu) {
> >> >> >> spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
> >> >> >> goto restart;
> >> >> >> }
> >> >> >
> >> >> > Thanks a lot for showing me this pseudo code! My concern is if
> >> >> > .pi_block_vcpu is changed to -1 at this point, it doesn't work.
> >> >> > .pi_block_vcpu being -1 here means the vCPU is removed from
> >> >> > the blocking list by others, then we cannot delete it again via
> >> >> > list_del() here.
> >> >>
> >> >> Did you miss the "bail-if-invalid" above?
> >> >
> >> > I am sorry, do I miss something here? If .pi_block_cpu becomes
> >> > -1 here (after the above 'if' statement is finished with
> >> > local_pi_block_cpu == actual_pi_block_cpu ), how can "bail-if-invalid"
> >> > above help?
> >>
> >> The (obvious I thought) implication is that all assignments to
> >> pi_block_cpu (along with all list manipulations) now need to happen
> >> with the lock held.
> >
> > If all the assignments to pi_block_cpu are done with one lock held, I don't think
> > we need the above check; we can safely use .pi_block_cpu, right?
> 
> No. In order to use it you need to make sure it's valid (or else
> there's no valid lock to acquire). And once you acquired the
> lock, you have to check whether changed / became invalid.

If all the assignments to .pi_block_cpu are done with one lock held,
how can it get changed / become invalid once I have acquired
the lock?

Thanks,
Feng

> Plus the up front check allows to avoid acquiring the lock in the
> first place when the field is invalid anyway.
> 
> Jan




Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-10 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, September 10, 2015 6:01 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> 
> >>> On 10.09.15 at 11:41,  wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Thursday, September 10, 2015 5:26 PM
> >> >>> On 10.09.15 at 10:59,  wrote:
> >> > First, how to check it while waiting to acquire the lock .pi_block_cpu
> >> > didn't change?
> >>
> >> Note the difference between "check while waiting" and "check that
> >> while waiting": The former is indeed hard to implement, while the
> >> latter is pretty straightforward (and we do so elsewhere).
> >>
> >> > Secondly, even if we can check it, what should we do if .pi_block_cpu
> >> > is changed after acquiring the lock as I mentioned above?
> >>
> >> Drop the lock and start over. I.e. (taking your pseudo code)
> >>
> >> restart:
> >> local_pi_block_cpu = ...;
> >> bail-if-invalid (e.g. -1 in current model)
> >> spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
> >> if(local_pi_block_cpu != actual_pi_block_cpu) {
> >> spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
> >> goto restart;
> >> }
> >
> > Thanks a lot for showing me this pseudo code! My concern is if
> > .pi_block_vcpu is changed to -1 at this point, it doesn't work.
> > .pi_block_vcpu being -1 here means the vCPU is removed from
> > the blocking list by others, then we cannot delete it again via
> > list_del() here.
> 
> Did you miss the "bail-if-invalid" above?

I am sorry, do I miss something here? If .pi_block_cpu becomes
-1 here (after the above 'if' statement is finished with
local_pi_block_cpu == actual_pi_block_cpu ), how can "bail-if-invalid"
above help?

Thanks,
Feng

> 
> > BTW, I cannot see performance overhead for list_del_init()
> > compared to list_del().
> >
> > list_del():
> > static inline void list_del(struct list_head *entry)
> > {
> > ASSERT(entry->next->prev == entry);
> > ASSERT(entry->prev->next == entry);
> > __list_del(entry->prev, entry->next);
> > entry->next = LIST_POISON1;
> > entry->prev = LIST_POISON2;
> > }
> >
> > list_del_init():
> > static inline void list_del_init(struct list_head *entry)
> > {
> > __list_del(entry->prev, entry->next);
> > INIT_LIST_HEAD(entry);
> > }
> 
> Well, yes, both do two stores (I forgot about the poisoning), but
> arguably the poisoning could become a debug-build-only thing. I.e.
> it is an implementation detail that the number of stores currently
> is the same. From an abstract perspective one should still prefer
> list_del() when the re-init isn't really needed. And in the specific
> case here asking you to use list_del() makes sure the code ends
> up not even trying the deletion when not needed.
> 
> Jan




Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-10 Thread Jan Beulich
>>> On 10.09.15 at 15:27,  wrote:

> 
>> -Original Message-
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, September 10, 2015 9:15 PM
>> To: Wu, Feng
>> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
>> xen-devel@lists.xen.org; Keir Fraser
>> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
>> interrupts
>> 
>> >>> On 10.09.15 at 14:58,  wrote:
>> 
>> >
>> >> -Original Message-
>> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> Sent: Thursday, September 10, 2015 8:45 PM
>> >> To: Wu, Feng
>> >> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
>> >> xen-devel@lists.xen.org; Keir Fraser
>> >> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d
>> posted
>> >> interrupts
>> >>
>> >> >>> On 10.09.15 at 14:34,  wrote:
>> >>
>> >> >
>> >> >> -Original Message-
>> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> >> Sent: Thursday, September 10, 2015 6:01 PM
>> >> >> To: Wu, Feng
>> >> >> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
>> >> >> xen-devel@lists.xen.org; Keir Fraser
>> >> >> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d
>> >> posted
>> >> >> interrupts
>> >> >>
>> >> >> >>> On 10.09.15 at 11:41,  wrote:
>> >> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> >> >> Sent: Thursday, September 10, 2015 5:26 PM
>> >> >> >> >>> On 10.09.15 at 10:59,  wrote:
>> >> >> >> > First, how to check it while waiting to acquire the lock 
>> >> >> >> > .pi_block_cpu
>> >> >> >> > didn't change?
>> >> >> >>
>> >> >> >> Note the difference between "check while waiting" and "check that
>> >> >> >> while waiting": The former is indeed hard to implement, while the
>> >> >> >> latter is pretty straightforward (and we do so elsewhere).
>> >> >> >>
>> >> >> >> > Secondly, even if we can check it, what should we do if 
>> >> >> >> > .pi_block_cpu
>> >> >> >> > is changed after acquiring the lock as I mentioned above?
>> >> >> >>
>> >> >> >> Drop the lock and start over. I.e. (taking your pseudo code)
>> >> >> >>
>> >> >> >> restart:
>> >> >> >> local_pi_block_cpu = ...;
>> >> >> >> bail-if-invalid (e.g. -1 in current model)
>> >> >> >> spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
>> >> >> >> if(local_pi_block_cpu != actual_pi_block_cpu) {
>> >> >> >> spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, local_pi_block_cpu), flags);
>> >> >> >> goto restart;
>> >> >> >> }
>> >> >> >
>> >> >> > Thanks a lot for showing me this pseudo code! My concern is if
>> >> >> > .pi_block_vcpu is changed to -1 at this point, it doesn't work.
>> >> >> > .pi_block_vcpu being -1 here means the vCPU is removed from
>> >> >> > the blocking list by others, then we cannot delete it again via
>> >> >> > list_del() here.
>> >> >>
>> >> >> Did you miss the "bail-if-invalid" above?
>> >> >
>> >> > I am sorry, do I miss something here? If .pi_block_cpu becomes
>> >> > -1 here (after the above 'if' statement is finished with
>> >> > local_pi_block_cpu == actual_pi_block_cpu ), how can "bail-if-invalid"
>> >> > above help?
>> >>
>> >> The (obvious I thought) implication is that all assignments to
>> >> pi_block_cpu (along with all list manipulations) now need to happen
>> >> with the lock held.
>> >
>> > If all the assignments to pi_block_cpu are done with one lock held, I don't think
>> > we need the above check; we can safely use .pi_block_cpu, right?
>> 
>> No. In order to use it you need to make sure it's valid (or else
>> there's no valid lock to acquire). And once you acquired the
>> lock, you have to check whether changed / became invalid.
> 
> If all the assignments to .pi_block_cpu are done with one lock held,
> how can it get changed / become invalid once I have acquired
> the lock?

Again - if pi_block_cpu is invalid, which lock do you want to acquire?

Jan




Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-09 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, September 07, 2015 8:55 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: Re: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> 
> >>> On 25.08.15 at 03:57,  wrote:
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -1573,6 +1573,22 @@ static void __context_switch(void)
> >  per_cpu(curr_vcpu, cpu) = n;
> >  }
> >
> > +static inline void pi_ctxt_switch_from(struct vcpu *prev)
> > +{
> > +/*
> > + * When switching from non-idle to idle, we only do a lazy context
> switch.
> > + * However, in order for posted interrupt (if available and enabled) to
> > + * work properly, we at least need to update the descriptors.
> > + */
> > +if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
> > +prev->arch.pi_ctxt_switch_from(prev);
> > +}
> > +
> > +static inline void pi_ctxt_switch_to(struct vcpu *next)
> > +{
> > +if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
> > +next->arch.pi_ctxt_switch_to(next);
> > +}
> >
> >  void context_switch(struct vcpu *prev, struct vcpu *next)
> >  {
> > @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct
> vcpu *next)
> >
> >  set_current(next);
> >
> > +pi_ctxt_switch_from(prev);
> > +
> >  if ( (per_cpu(curr_vcpu, cpu) == next) ||
> >   (is_idle_domain(nextd) && cpu_online(cpu)) )
> >  {
> > +pi_ctxt_switch_to(next);
> >  local_irq_enable();
> 
> This placement, if really intended that way, needs explanation (in a
> comment) and perhaps even renaming of the involved symbols, as

I think maybe removing the static wrapper function can make things
clearer. What is your opinion?

> looking at it from a general perspective it seems wrong (with
> pi_ctxt_switch_to() excluding idle vCPU-s it effectively means you
> want this only when switching back to what got switched out lazily
> before, i.e. this would be not something to take place on an arbitrary
> context switch). As to possible alternative names - maybe make the
> hooks ctxt_switch_prepare() and ctxt_switch_cancel()?
> 
> > @@ -117,10 +119,20 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >> >  INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >> >  INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
> >> >
> >> > +v->arch.hvm_vmx.pi_block_cpu = -1;
> >> > +
> >> > +spin_lock_init(&v->arch.hvm_vmx.pi_lock);
> > +
> >  v->arch.schedule_tail= vmx_do_resume;
> >  v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
> >  v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> >
> > +if ( iommu_intpost && is_hvm_vcpu(v) )
> > +{
> > +v->arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi;
> > +v->arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi;
> > +}
> 
> Why conditional upon is_hvm_vcpu()?

I only think we need to add these hooks for PV guests, right?
Or you mean I should use has_hvm_container_vcpu() instead?

> 
> > @@ -718,6 +730,140 @@ static void vmx_fpu_leave(struct vcpu *v)
> >  }
> >  }
> >
> > +void arch_vcpu_wake_prepare(struct vcpu *v)
> 
> A non-static function with this name should not live in vmx.c.
> 
> > +{
> > +unsigned long gflags;
> 
> Just "flags" please.
> 
> > +if ( !iommu_intpost || !is_hvm_vcpu(v)
> || !has_arch_pdevs(v->domain) )
> 
> DYM !has_hvm_container_vcpu()?
> 
> > +return;
> > +
> >> > +spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, gflags);
> > +
> > +if ( likely(vcpu_runnable(v)) ||
> >> > + !test_bit(_VPF_blocked, &v->pause_flags) )
> 
> _VPF_blocked set implies !vcpu_runnable(), i.e. afaict just the
> latter check would suffice if this is really what you mean.
> 
> > +{
> >> > +struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +unsigned long flags;
> 
> You don't need this - you already know IRQs are off for the lock
> acquire below.
> 
> > +/*
> > + * We don't need to send notification event to a non-running
> > + * vcpu, the interrupt information will be delivered to it before
> > + * VM-ENTRY when the vcpu is scheduled to run next time.
> > + */
> > +pi_set_sn(pi_desc);
> > +
> > +/*
> > + * Set 'NV' field back to posted_intr_vector, so the
> > + * Posted-Interrupts can be delivered to the vCPU by
> > + * VT-d HW after it is scheduled to run.
> > + */
> >> > +write_atomic((uint8_t *)&pi_desc->nv, posted_intr_vector);
> 
> Bogus cast.
> 
> > +
> > +/*
> > + * Delete the vCPU from the related block list
> > + * if we are resuming from blocked state.
> > + */
> > +if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
> > +{
> >> > +spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> > +  

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-09 Thread Jan Beulich
>>> On 09.09.15 at 10:56,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Monday, September 07, 2015 8:55 PM
>> >>> On 25.08.15 at 03:57,  wrote:
>> > @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct
>> vcpu *next)
>> >
>> >  set_current(next);
>> >
>> > +pi_ctxt_switch_from(prev);
>> > +
>> >  if ( (per_cpu(curr_vcpu, cpu) == next) ||
>> >   (is_idle_domain(nextd) && cpu_online(cpu)) )
>> >  {
>> > +pi_ctxt_switch_to(next);
>> >  local_irq_enable();
>> 
>> This placement, if really intended that way, needs explanation (in a
>> comment) and perhaps even renaming of the involved symbols, as
> 
> I think maybe removing the static wrapper function can make things
> clearer. What is your opinion?

Considering that there's going to be a second use of the
switch-to variant, I think the helpers are okay to be kept (but
I wouldn't insist on that), just that they need to be named in
a away making clear what their purpose is.

>> > @@ -117,10 +119,20 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>> >  INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> >  INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
>> >
>> > +v->arch.hvm_vmx.pi_block_cpu = -1;
>> > +
>> > +spin_lock_init(&v->arch.hvm_vmx.pi_lock);
>> > +
>> >  v->arch.schedule_tail= vmx_do_resume;
>> >  v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>> >  v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>> >
>> > +if ( iommu_intpost && is_hvm_vcpu(v) )
>> > +{
>> > +v->arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi;
>> > +v->arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi;
>> > +}
>> 
>> Why conditional upon is_hvm_vcpu()?
> 
> I only think we need to add these hooks for PV guests, right?

"... we don't need to ..."?

> Or you mean I should use has_hvm_container_vcpu() instead?

Exactly.

>> > +
>> > +/*
>> > + * Delete the vCPU from the related block list
>> > + * if we are resuming from blocked state.
>> > + */
>> > +if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
>> > +{
>> > +spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> > +  v->arch.hvm_vmx.pi_block_cpu),
>> flags);
>> > +list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> 
>> Shouldn't you set .pi_block_cpu back to -1 along with removing
>> the vCPU from the list? Which then ought to allow using just
>> list_del() here?
> 
> Here is the story about using list_del_init() instead of list_del(), (the
> same reason for using list_del_init() in pi_wakeup_interrupt() ).
> If we use list_del(), that means we need to set .pi_block_cpu to
> -1 after removing the vCPU from the block list, there are two
> places where the vCPU is removed from the list:
> - arch_vcpu_wake_prepare()
> - pi_wakeup_interrupt()
> 
> That means we also need to set .pi_block_cpu to -1 in
> pi_wakeup_interrupt(), which seems problematic to me, since
> .pi_block_cpu is used to get the per-CPU spin lock, it is not safe
> to change it in pi_wakeup_interrupt(), maybe someone is using
> it at the same time.
> 
> And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
> is only used when .pi_block_cpu is set to -1 at the initial time.
> 
> If you have any good suggestions about this, I will be all ears.

Latching pi_block_cpu into a local variable would take care of that
part of the problem. Of course you then may also need to check
that while waiting to acquire the lock pi_block_cpu didn't change.
And if that absolutely can't be made work correctly, these
apparently strange uses of list_del_init() would each need a
justifying comment.

>> > +do {
>> > +old.control = new.control = pi_desc->control;
>> > +
>> > +/* Should not block the vCPU if an interrupt was posted for 
>> > it.
>> */
>> > +if ( pi_test_on() )
>> > +{
> >> > +spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock,
>> gflags);
>> > +vcpu_unblock(v);
>> 
>> Calling vcpu_unblock() in the middle of context_switch()? Why? And
>> is this safe? 
> 
> I cannot see anything unsafe so far, can some scheduler maintainer
> help to confirm it? Dario? George?

But first and foremost you ought to answer the "why".

>> And if really needed to cover something not dealt with
>> elsewhere, wouldn't this need to happen _after_ having switched
>> the notification mechanism below?
> 
> If ON is set, we don't need to block the vCPU hence no need to change the
> notification vector to the wakeup one, which is used when vCPU is blocked.
> 
>> 
>> > +return;

Oh, right, I somehow managed to ignore the "return" here.

>> > +static void vmx_post_ctx_switch_pi(struct vcpu *v)
>> > +{
> >> > +struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> > +
>> > +if ( !has_arch_pdevs(v->domain) )
>> > +return;
>> > +
>> > +if ( 

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-09 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, September 09, 2015 6:27 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@lists.xen.org; Keir Fraser
> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> 
> >>> On 09.09.15 at 10:56,  wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Monday, September 07, 2015 8:55 PM
> >> >>> On 25.08.15 at 03:57,  wrote:
> >> > +
> >> > +/*
> >> > + * Delete the vCPU from the related block list
> >> > + * if we are resuming from blocked state.
> >> > + */
> >> > +if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
> >> > +{
> >> > +spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >> > +  v->arch.hvm_vmx.pi_block_cpu),
> >> flags);
> >> > +list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >>
> >> Shouldn't you set .pi_block_cpu back to -1 along with removing
> >> the vCPU from the list? Which then ought to allow using just
> >> list_del() here?
> >
> > Here is the story about using list_del_init() instead of list_del(), (the
> > same reason for using list_del_init() in pi_wakeup_interrupt() ).
> > If we use list_del(), that means we need to set .pi_block_cpu to
> > -1 after removing the vCPU from the block list, there are two
> > places where the vCPU is removed from the list:
> > - arch_vcpu_wake_prepare()
> > - pi_wakeup_interrupt()
> >
> > That means we also need to set .pi_block_cpu to -1 in
> > pi_wakeup_interrupt(), which seems problematic to me, since
> > .pi_block_cpu is used to get the per-CPU spin lock, it is not safe
> > to change it in pi_wakeup_interrupt(), maybe someone is using
> > it at the same time.
> >
> > And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
> > is only used when .pi_block_cpu is set to -1 at the initial time.
> >
> > If you have any good suggestions about this, I will be all ears.
> 
> Latching pi_block_cpu into a local variable would take care of that
> part of the problem. Of course you then may also need to check
> that while waiting to acquire the lock pi_block_cpu didn't change.

Thanks for the suggestion! But I think it is not easy to "check
that while waiting to acquire the lock pi_block_cpu didn't change".
Let's take the following pseudo code as an example:

int local_pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;

..

spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
                  local_pi_block_cpu), flags);
list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
                       local_pi_block_cpu), flags);

If .pi_block_cpu is changed to -1 by others while calling list_del()
above, that means the vCPU has been removed by others, so calling
list_del() here again would be problematic. I think there might be
other corner cases for this. So I suggest adding some comments
for list_del_init() as you mentioned below.

> And if that absolutely can't be made work correctly, these
> apparently strange uses of list_del_init() would each need a
> justifying comment.
> 
> >> > +do {
> >> > +old.control = new.control = pi_desc->control;
> >> > +
> >> > +/* Should not block the vCPU if an interrupt was posted for
> it.
> >> */
> >> > +if ( pi_test_on() )
> >> > +{
> >> > +spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock,
> >> gflags);
> >> > +vcpu_unblock(v);
> >>
> >> Calling vcpu_unblock() in the middle of context_switch()? Why? And
> >> is this safe?
> >
> > I cannot see anything unsafe so far, can some scheduler maintainer
> > help to confirm it? Dario? George?
> 
> But first and foremost you ought to answer the "why".

I think if you think this solution is unsafe, you need to point out where it is,
not just ask "is this safe?"; I don't think that is an effective comment.

Anyway, I go through the main path of the code as below, I really don't find
anything unsafe here.

context_switch() -->
pi_ctxt_switch_from() -->
vmx_pre_ctx_switch_pi() -->
vcpu_unblock() -->
vcpu_wake() -->
SCHED_OP(VCPU2OP(v), wake, v) -->
csched_vcpu_wake() -->
__runq_insert()
__runq_tickle()

If you continue to think it is unsafe, pointing out the place will be welcome!

> 
> >> And if really needed to cover something not dealt with
> >> elsewhere, wouldn't this need to happen _after_ having switched
> >> the notification mechanism below?
> >
> > If ON is set, we don't need to block the vCPU hence no need to change the
> > notification vector to the wakeup one, which is used when vCPU is blocked.
> >
> >>
> >> > +return;
> 
> Oh, right, I somehow managed to ignore the "return" here.
> 
> >> > +static 

[Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-08-24 Thread Feng Wu
This patch adds the following arch hooks in scheduler:
- vmx_pre_ctx_switch_pi():
It is called before context switch, we update the posted
interrupt descriptor when the vCPU is preempted, go to sleep,
or is blocked.

- vmx_post_ctx_switch_pi()
It is called after context switch, we update the posted
interrupt descriptor when the vCPU is going to run.

- arch_vcpu_wake_prepare()
It will be called when waking up the vCPU, we update
the posted interrupt descriptor when the vCPU is unblocked.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Kevin Tian kevin.t...@intel.com
CC: George Dunlap george.dun...@eu.citrix.com
CC: Dario Faggioli dario.faggi...@citrix.com
Suggested-by: Dario Faggioli dario.faggi...@citrix.com
Signed-off-by: Feng Wu feng...@intel.com
Reviewed-by: Dario Faggioli dario.faggi...@citrix.com
---
v6:
- Add two static inline functions for pi context switch
- Fix typos

v5:
- Rename arch_vcpu_wake to arch_vcpu_wake_prepare
- Make arch_vcpu_wake_prepare() inline for ARM
- Merge the ARM dummy hooks together
- Changes to some code comments
- Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if
  PI is disabled or the vCPU is not in HVM
- Coding style

v4:
- Newly added

 xen/arch/x86/domain.c  |  19 +
 xen/arch/x86/hvm/vmx/vmx.c | 147 +
 xen/common/schedule.c  |   2 +
 xen/include/asm-arm/domain.h   |   2 +
 xen/include/asm-x86/domain.h   |   3 +
 xen/include/asm-x86/hvm/hvm.h  |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   8 ++
 7 files changed, 183 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..443986e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1573,6 +1573,22 @@ static void __context_switch(void)
 per_cpu(curr_vcpu, cpu) = n;
 }
 
+static inline void pi_ctxt_switch_from(struct vcpu *prev)
+{
+/*
+ * When switching from non-idle to idle, we only do a lazy context switch.
+ * However, in order for posted interrupt (if available and enabled) to
+ * work properly, we at least need to update the descriptors.
+ */
+if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
+prev->arch.pi_ctxt_switch_from(prev);
+}
+
+static inline void pi_ctxt_switch_to(struct vcpu *next)
+{
+if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
+next->arch.pi_ctxt_switch_to(next);
+}
 
 void context_switch(struct vcpu *prev, struct vcpu *next)
 {
@@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
 set_current(next);
 
+pi_ctxt_switch_from(prev);
+
 if ( (per_cpu(curr_vcpu, cpu) == next) ||
 (is_idle_domain(nextd) && cpu_online(cpu)) )
 {
+pi_ctxt_switch_to(next);
 local_irq_enable();
 }
 else
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 5167fae..889ede3 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -67,6 +67,8 @@ enum handler_return { HNDL_done, HNDL_unhandled, 
HNDL_exception_raised };
 
 static void vmx_ctxt_switch_from(struct vcpu *v);
 static void vmx_ctxt_switch_to(struct vcpu *v);
+static void vmx_pre_ctx_switch_pi(struct vcpu *v);
+static void vmx_post_ctx_switch_pi(struct vcpu *v);
 
 static int  vmx_alloc_vlapic_mapping(struct domain *d);
 static void vmx_free_vlapic_mapping(struct domain *d);
@@ -117,10 +119,20 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
 INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
 
+v->arch.hvm_vmx.pi_block_cpu = -1;
+
+spin_lock_init(&v->arch.hvm_vmx.pi_lock);
+
 v->arch.schedule_tail= vmx_do_resume;
 v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
 v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
 
+if ( iommu_intpost && is_hvm_vcpu(v) )
+{
+v->arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi;
+v->arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi;
+}
+
 if ( (rc = vmx_create_vmcs(v)) != 0 )
 {
 dprintk(XENLOG_WARNING,
@@ -718,6 +730,140 @@ static void vmx_fpu_leave(struct vcpu *v)
 }
 }
 
+void arch_vcpu_wake_prepare(struct vcpu *v)
+{
+unsigned long gflags;
+
+if ( !iommu_intpost || !is_hvm_vcpu(v) || !has_arch_pdevs(v->domain) )
+return;
+
+spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, gflags);
+
+if ( likely(vcpu_runnable(v)) ||
+ !test_bit(_VPF_blocked, &v->pause_flags) )
+{
+struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+unsigned long flags;
+
+/*
+ * We don't need to send notification event to a non-running
+ * vcpu, the interrupt information will be delivered to it before
+ * VM-ENTRY when the vcpu is scheduled to run next time.
+ */
+pi_set_sn(pi_desc);
+
+/*
+ * Set 'NV' field back to posted_intr_vector, so the
+ *