Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-10 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, July 10, 2015 4:50 PM
> To: Wu, Feng
> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z;
> xen-devel@lists.xen.org; k...@xen.org
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> >>> On 10.07.15 at 09:29,  wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Friday, July 10, 2015 2:32 PM
> >> >>> On 10.07.15 at 08:21,  wrote:
> >> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> >> Sent: Thursday, July 09, 2015 3:26 PM
> >> >> >>> On 09.07.15 at 00:49,  wrote:
> >> >> >>  From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> >> >> >> Sent: Wednesday, July 08, 2015 9:09 PM
> >> >> >> On 08/07/2015 13:46, Jan Beulich wrote:
> >> >> >> >>>> On 08.07.15 at 13:00,  wrote:
> >> >> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table
> >> __initdata
> >> >> >> >>> vmx_function_table = {
> >> >> >> >>>  .enable_msr_exit_interception =
> >> >> vmx_enable_msr_exit_interception,
> >> >> >> >>>  };
> >> >> >> >>>
> >> >> >> >>> +/*
> >> >> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
> >> >> >> >>> + */
> >> >> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >> >> >> >>> +{
> >> >> >> >>> +struct arch_vmx_struct *vmx;
> >> >> >> >>> +unsigned int cpu = smp_processor_id();
> >> >> >> >>> +
> >> >> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> >> >> >> >>> +
> >> >> >> >>> +/*
> >> >> >> >>> + * FIXME: The length of the list depends on how many
> >> >> >> >>> + * vCPU is current blocked on this specific pCPU.
> >> >> >> >>> + * This may hurt the interrupt latency if the list
> >> >> >> >>> + * grows to too many entries.
> >> >> >> >>> + */
> >> >> >> >> let's go with this linked list first until a real issue is 
> >> >> >> >> identified.
> >> >> >> > This is exactly the way of thinking I dislike when it comes to code
> >> >> >> > that isn't intended to be experimental only: We shouldn't wait
> >> >> >> > for problems to surface when we already can see them. I.e. if
> >> >> >> > there are no plans to deal with this, I'd ask for the feature to be
> >> >> >> > off by default and be properly marked experimental in the
> >> >> >> > command line option documentation (so people know to stay
> >> >> >> > away from it).
> >> >> >>
> >> >> >> And in this specific case, there is no balancing of vcpus across the
> >> >> >> pcpus lists.
> >> >> >>
> >> >> >> One can construct a pathological case using pinning and pausing to
> get
> >> >> >> almost every vcpu on a single pcpu list, and vcpus receiving fewer
> >> >> >> interrupts will exacerbate the problem by staying on the list for
> >> >> >> longer
> >> >> >> periods of time.
> >> >> >
> >> >> > In that extreme case I believe many contentions in other code paths
> will
> >> >> > be much larger than overhead caused by this structure limitation.
> >> >>
> >> >> Examples?
> >> >>
> >> >> >> IMO, the PI feature cannot be declared as done/supported with this
> bug
> >> >> >> remaining.  OTOH, it is fine to be experimental, and disabled by
> default
> >> >> >> for people who wish to experiment.
> >> >> >>
> >> >> >
> >> >> > Again, I don't expect to see it disabled as experimental. For good
> >> >> > production
> >> >> > environment where vcpus 

Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-10 Thread Jan Beulich
>>> On 10.07.15 at 09:29,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, July 10, 2015 2:32 PM
>> >>> On 10.07.15 at 08:21,  wrote:
>> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> Sent: Thursday, July 09, 2015 3:26 PM
>> >> >>> On 09.07.15 at 00:49,  wrote:
>> >> >>  From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
>> >> >> Sent: Wednesday, July 08, 2015 9:09 PM
>> >> >> On 08/07/2015 13:46, Jan Beulich wrote:
>> >> >>  On 08.07.15 at 13:00,  wrote:
>> >> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table
>> __initdata
>> >> >> >>> vmx_function_table = {
>> >> >> >>>  .enable_msr_exit_interception =
>> >> vmx_enable_msr_exit_interception,
>> >> >> >>>  };
>> >> >> >>>
>> >> >> >>> +/*
>> >> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
>> >> >> >>> + */
>> >> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>> >> >> >>> +{
>> >> >> >>> +struct arch_vmx_struct *vmx;
>> >> >> >>> +unsigned int cpu = smp_processor_id();
>> >> >> >>> +
>> >> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
>> >> >> >>> +
>> >> >> >>> +/*
>> >> >> >>> + * FIXME: The length of the list depends on how many
>> >> >> >>> + * vCPU is current blocked on this specific pCPU.
>> >> >> >>> + * This may hurt the interrupt latency if the list
>> >> >> >>> + * grows to too many entries.
>> >> >> >>> + */
>> >> >> >> let's go with this linked list first until a real issue is 
>> >> >> >> identified.
>> >> >> > This is exactly the way of thinking I dislike when it comes to code
>> >> >> > that isn't intended to be experimental only: We shouldn't wait
>> >> >> > for problems to surface when we already can see them. I.e. if
>> >> >> > there are no plans to deal with this, I'd ask for the feature to be
>> >> >> > off by default and be properly marked experimental in the
>> >> >> > command line option documentation (so people know to stay
>> >> >> > away from it).
>> >> >>
>> >> >> And in this specific case, there is no balancing of vcpus across the
>> >> >> pcpus lists.
>> >> >>
>> >> >> One can construct a pathological case using pinning and pausing to get
>> >> >> almost every vcpu on a single pcpu list, and vcpus receiving fewer
>> >> >> interrupts will exacerbate the problem by staying on the list for
>> >> >> longer
>> >> >> periods of time.
>> >> >
>> >> > In that extreme case I believe many contentions in other code paths will
>> >> > be much larger than overhead caused by this structure limitation.
>> >>
>> >> Examples?
>> >>
>> >> >> IMO, the PI feature cannot be declared as done/supported with this bug
>> >> >> remaining.  OTOH, it is fine to be experimental, and disabled by 
>> >> >> default
>> >> >> for people who wish to experiment.
>> >> >>
>> >> >
>> >> > Again, I don't expect to see it disabled as experimental. For good
>> >> > production
>> >> > environment where vcpus are well balanced and interrupt latency is
>> >> > sensitive,
>> >> > linked list should be efficient here. For bad environment like extreme 
> case
>> >> > you raised, I don't know whether it really matters to just tune 
>> >> > interrupt
>> >> > path.
>> >>
>> >> Can you _guarantee_ that everything potentially leading to such a
>> >> pathological situation is covered by XSA-77? And even if it is now,
>> >> removing elements from the waiver list would become significantly
>> >> more difficult if disconnected behavior like this one would need to
>> >> be taken into account.
>> >>
>> >> Please understand that history has told us to be rather more careful
>> >> than might seem necessary with this: ATS originally having been
>> >> enabled by default is one bold example, and the recent flood of MSI
>> >> related XSAs is another; I suppose I could find more. All affecting
>> >> code originating from Intel, apparently written with only functionality
>> >> in mind, while having left out (other than basic) security considerations.
>> >>
>> >> IOW, with my committer role hat on, the feature is going to be
>> >> experimental (and hence default off) unless the issue here gets
>> >> addressed. And no, I cannot immediately suggest a good approach,
>> >> and with all of the rush before the feature freeze I also can't justify
>> >> taking a lot of time to think of options.
>> >
> > Is it acceptable to you if I only add the blocked vcpus that have
>> > assigned devices to the list? I think that should shorten the
>> > length of the list.
>> 
>> I actually implied this to be the case already, i.e.
>> - if it's not, this needs to be fixed anyway,
>> - it's not going to eliminate the concern (just think of a couple of
>>   many-vCPU guests all having devices assigned).
> 
> So how about allocating multiple wakeup vectors (say, 16; maybe
> we can make this configurable) and multiplexing them amongst all the
> blocked vCPUs?

For such an approach to be effective, you'd need to know up front
how many vCPU-s you may need to deal with, or allocate vect
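A sketch of the multiplexing idea quoted above, purely as an illustration and not part of the posted patch (every name below is invented for the sketch): blocked vCPUs are spread over a small number of per-pCPU sub-lists, each sub-list is notified through its own wakeup vector, so a single notification only has to scan a fraction of the blocked vCPUs on that pCPU.

#define PI_WAKEUP_VECTORS 16            /* assumed to become configurable */

struct pi_blocked_list {
    struct list_head list;
    spinlock_t lock;
};

static DEFINE_PER_CPU(struct pi_blocked_list,
                      pi_blocked[PI_WAKEUP_VECTORS]);
static uint8_t __read_mostly pi_wakeup_vector[PI_WAKEUP_VECTORS];

/* Choose the sub-list (and hence the notification vector) for a vCPU. */
static unsigned int pi_wakeup_index(const struct vcpu *v)
{
    return v->vcpu_id % PI_WAKEUP_VECTORS;
}

/* Wakeup handler registered for pi_wakeup_vector[idx] on this pCPU:
 * it scans only sub-list idx instead of all blocked vCPUs. */
static void pi_wakeup_interrupt_idx(unsigned int idx)
{
    struct pi_blocked_list *pbl = &this_cpu(pi_blocked)[idx];
    struct arch_vmx_struct *vmx, *tmp;

    spin_lock(&pbl->lock);
    list_for_each_entry_safe ( vmx, tmp, &pbl->list, pi_blocked_vcpu_list )
        if ( vmx->pi_desc.on )
        {
            list_del_init(&vmx->pi_blocked_vcpu_list);
            tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
        }
    spin_unlock(&pbl->lock);
}

This trades a larger vector footprint for shorter scans; as Jan notes in his reply, the number of vectors would have to be chosen up front.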

Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-10 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, July 10, 2015 2:32 PM
> To: Wu, Feng
> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z;
> xen-devel@lists.xen.org; k...@xen.org
> Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> >>> On 10.07.15 at 08:21,  wrote:
> 
> >
> >> -Original Message-
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Thursday, July 09, 2015 3:26 PM
> >> To: Wu, Feng; Tian, Kevin
> >> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Zhang, Yang Z;
> >> xen-devel@lists.xen.org; k...@xen.org
> >> Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when
> vCPU
> >> is blocked
> >>
> >> >>> On 09.07.15 at 00:49,  wrote:
> >> >>  From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> >> >> Sent: Wednesday, July 08, 2015 9:09 PM
> >> >> On 08/07/2015 13:46, Jan Beulich wrote:
> >> >> >>>> On 08.07.15 at 13:00,  wrote:
> >> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table
> __initdata
> >> >> >>> vmx_function_table = {
> >> >> >>>  .enable_msr_exit_interception =
> >> vmx_enable_msr_exit_interception,
> >> >> >>>  };
> >> >> >>>
> >> >> >>> +/*
> >> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
> >> >> >>> + */
> >> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >> >> >>> +{
> >> >> >>> +struct arch_vmx_struct *vmx;
> >> >> >>> +unsigned int cpu = smp_processor_id();
> >> >> >>> +
> >> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> >> >> >>> +
> >> >> >>> +/*
> >> >> >>> + * FIXME: The length of the list depends on how many
> >> >> >>> + * vCPU is current blocked on this specific pCPU.
> >> >> >>> + * This may hurt the interrupt latency if the list
> >> >> >>> + * grows to too many entries.
> >> >> >>> + */
> >> >> >> let's go with this linked list first until a real issue is 
> >> >> >> identified.
> >> >> > This is exactly the way of thinking I dislike when it comes to code
> >> >> > that isn't intended to be experimental only: We shouldn't wait
> >> >> > for problems to surface when we already can see them. I.e. if
> >> >> > there are no plans to deal with this, I'd ask for the feature to be
> >> >> > off by default and be properly marked experimental in the
> >> >> > command line option documentation (so people know to stay
> >> >> > away from it).
> >> >>
> >> >> And in this specific case, there is no balancing of vcpus across the
> >> >> pcpus lists.
> >> >>
> >> >> One can construct a pathological case using pinning and pausing to get
> >> >> almost every vcpu on a single pcpu list, and vcpus receiving fewer
> >> >> interrupts will exacerbate the problem by staying on the list for longer
> >> >> periods of time.
> >> >
> >> > In that extreme case I believe many contentions in other code paths will
> >> > be much larger than overhead caused by this structure limitation.
> >>
> >> Examples?
> >>
> >> >> IMO, the PI feature cannot be declared as done/supported with this bug
> >> >> remaining.  OTOH, it is fine to be experimental, and disabled by default
> >> >> for people who wish to experiment.
> >> >>
> >> >
> >> > Again, I don't expect to see it disabled as experimental. For good
> >> > production
> >> > environment where vcpus are well balanced and interrupt latency is
> >> > sensitive,
> >> > linked list should be efficient here. For bad environment like extreme 
> >> > case
> >> > you raised, I don't know whether it really matters to just tune interrupt
> >> > path.
> >>
> >> Can you _guarantee_ that everything potentially leading to such a

Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-09 Thread Jan Beulich
>>> On 10.07.15 at 08:21,  wrote:

> 
>> -Original Message-
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, July 09, 2015 3:26 PM
>> To: Wu, Feng; Tian, Kevin
>> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Zhang, Yang Z;
>> xen-devel@lists.xen.org; k...@xen.org 
>> Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
>> is blocked
>> 
>> >>> On 09.07.15 at 00:49,  wrote:
>> >>  From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
>> >> Sent: Wednesday, July 08, 2015 9:09 PM
>> >> On 08/07/2015 13:46, Jan Beulich wrote:
>> >> >>>> On 08.07.15 at 13:00,  wrote:
>> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
>> >> >>> vmx_function_table = {
>> >> >>>  .enable_msr_exit_interception =
>> vmx_enable_msr_exit_interception,
>> >> >>>  };
>> >> >>>
>> >> >>> +/*
>> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
>> >> >>> + */
>> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>> >> >>> +{
>> >> >>> +struct arch_vmx_struct *vmx;
>> >> >>> +unsigned int cpu = smp_processor_id();
>> >> >>> +
>> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
>> >> >>> +
>> >> >>> +/*
>> >> >>> + * FIXME: The length of the list depends on how many
>> >> >>> + * vCPU is current blocked on this specific pCPU.
>> >> >>> + * This may hurt the interrupt latency if the list
>> >> >>> + * grows to too many entries.
>> >> >>> + */
>> >> >> let's go with this linked list first until a real issue is identified.
>> >> > This is exactly the way of thinking I dislike when it comes to code
>> >> > that isn't intended to be experimental only: We shouldn't wait
>> >> > for problems to surface when we already can see them. I.e. if
>> >> > there are no plans to deal with this, I'd ask for the feature to be
>> >> > off by default and be properly marked experimental in the
>> >> > command line option documentation (so people know to stay
>> >> > away from it).
>> >>
>> >> And in this specific case, there is no balancing of vcpus across the
>> >> pcpus lists.
>> >>
>> >> One can construct a pathological case using pinning and pausing to get
>> >> almost every vcpu on a single pcpu list, and vcpus receiving fewer
>> >> interrupts will exacerbate the problem by staying on the list for longer
>> >> periods of time.
>> >
>> > In that extreme case I believe many contentions in other code paths will
>> > be much larger than overhead caused by this structure limitation.
>> 
>> Examples?
>> 
>> >> IMO, the PI feature cannot be declared as done/supported with this bug
>> >> remaining.  OTOH, it is fine to be experimental, and disabled by default
>> >> for people who wish to experiment.
>> >>
>> >
>> > Again, I don't expect to see it disabled as experimental. For good
>> > production
>> > environment where vcpus are well balanced and interrupt latency is
>> > sensitive,
>> > linked list should be efficient here. For bad environment like extreme case
>> > you raised, I don't know whether it really matters to just tune interrupt
>> > path.
>> 
>> Can you _guarantee_ that everything potentially leading to such a
>> pathological situation is covered by XSA-77? And even if it is now,
>> removing elements from the waiver list would become significantly
>> more difficult if disconnected behavior like this one would need to
>> be taken into account.
>> 
>> Please understand that history has told us to be rather more careful
>> than might seem necessary with this: ATS originally having been
>> enabled by default is one bold example, and the recent flood of MSI
>> related XSAs is another; I suppose I could find more. All affecting
>> code originating from Intel, apparently written with only functionality
>> in mind, while having left out (other than basic) security considerations.
>> 
>> IOW, with my committer role hat on, the feature is going to be
>> experimental (and hence default off) unless the issue here gets
>> addressed. And no, I cannot immediately suggest a good approach,
>> and with all of the rush before the feature freeze I also can't justify
>> taking a lot of time to think of options.
> 
> Is it acceptable to you if I only add the blocked vcpus that have
> assigned devices to the list? I think that should shorten the
> length of the list.

I actually implied this to be the case already, i.e.
- if it's not, this needs to be fixed anyway,
- it's not going to eliminate the concern (just think of a couple of
  many-vCPU guests all having devices assigned).

Jan



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-09 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, July 09, 2015 3:26 PM
> To: Wu, Feng; Tian, Kevin
> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Zhang, Yang Z;
> xen-devel@lists.xen.org; k...@xen.org
> Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> >>> On 09.07.15 at 00:49,  wrote:
> >>  From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> >> Sent: Wednesday, July 08, 2015 9:09 PM
> >> On 08/07/2015 13:46, Jan Beulich wrote:
> >> >>>> On 08.07.15 at 13:00,  wrote:
> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> >> >>> vmx_function_table = {
> >> >>>  .enable_msr_exit_interception =
> vmx_enable_msr_exit_interception,
> >> >>>  };
> >> >>>
> >> >>> +/*
> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
> >> >>> + */
> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >> >>> +{
> >> >>> +struct arch_vmx_struct *vmx;
> >> >>> +unsigned int cpu = smp_processor_id();
> >> >>> +
> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> >> >>> +
> >> >>> +/*
> >> >>> + * FIXME: The length of the list depends on how many
> >> >>> + * vCPU is current blocked on this specific pCPU.
> >> >>> + * This may hurt the interrupt latency if the list
> >> >>> + * grows to too many entries.
> >> >>> + */
> >> >> let's go with this linked list first until a real issue is identified.
> >> > This is exactly the way of thinking I dislike when it comes to code
> >> > that isn't intended to be experimental only: We shouldn't wait
> >> > for problems to surface when we already can see them. I.e. if
> >> > there are no plans to deal with this, I'd ask for the feature to be
> >> > off by default and be properly marked experimental in the
> >> > command line option documentation (so people know to stay
> >> > away from it).
> >>
> >> And in this specific case, there is no balancing of vcpus across the
> >> pcpus lists.
> >>
> >> One can construct a pathological case using pinning and pausing to get
> >> almost every vcpu on a single pcpu list, and vcpus receiving fewer
> >> interrupts will exacerbate the problem by staying on the list for longer
> >> periods of time.
> >
> > In that extreme case I believe many contentions in other code paths will
> > be much larger than overhead caused by this structure limitation.
> 
> Examples?
> 
> >> IMO, the PI feature cannot be declared as done/supported with this bug
> >> remaining.  OTOH, it is fine to be experimental, and disabled by default
> >> for people who wish to experiment.
> >>
> >
> > Again, I don't expect to see it disabled as experimental. For good
> > production
> > environment where vcpus are well balanced and interrupt latency is
> > sensitive,
> > linked list should be efficient here. For bad environment like extreme case
> > you raised, I don't know whether it really matters to just tune interrupt
> > path.
> 
> Can you _guarantee_ that everything potentially leading to such a
> pathological situation is covered by XSA-77? And even if it is now,
> removing elements from the waiver list would become significantly
> more difficult if disconnected behavior like this one would need to
> be taken into account.
> 
> Please understand that history has told us to be rather more careful
> than might seem necessary with this: ATS originally having been
> enabled by default is one bold example, and the recent flood of MSI
> related XSAs is another; I suppose I could find more. All affecting
> code originating from Intel, apparently written with only functionality
> in mind, while having left out (other than basic) security considerations.
> 
> IOW, with my committer role hat on, the feature is going to be
> experimental (and hence default off) unless the issue here gets
> addressed. And no, I cannot immediately suggest a good approach,
> and with all of the rush before the feature freeze I also can't justify
> taking a lot of time to think of options.
> 

Is it acceptable to you if I only add the blocked vcpus that have
assigned devices to the list? I think that should shorten the
length of the list.

Thanks,
Feng

> Jan
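Purely to illustrate the suggestion above (the hook name below is hypothetical, the arch_vmx field names come from the quoted patch, and has_arch_pdevs() is assumed to test whether the domain has any passed-through device): the blocking path could skip vCPUs of domains without assigned devices, so they never land on the per-pCPU list at all.

/* Sketch: queue a vCPU for posted-interrupt wakeup only if its domain
 * actually has a device assigned; otherwise no posted interrupt can
 * target it while blocked, so there is nothing to track. */
static void vmx_pre_block(struct vcpu *v)
{
    unsigned int cpu = v->processor;
    unsigned long flags;

    if ( !has_arch_pdevs(v->domain) )
        return;

    /* The wakeup handler runs in interrupt context, so the non-IRQ
     * side has to disable interrupts while holding the lock. */
    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, cpu), flags);
    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
                  &per_cpu(pi_blocked_vcpu, cpu));
    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, cpu), flags);

    /* Redirecting the PI descriptor's notification vector/destination
     * to pi_wakeup_vector on this pCPU is omitted here. */
}

As Jan points out, though, this only shortens the list; it does not bound it.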




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-09 Thread Jan Beulich
>>> On 09.07.15 at 00:49,  wrote:
>>  From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
>> Sent: Wednesday, July 08, 2015 9:09 PM
>> On 08/07/2015 13:46, Jan Beulich wrote:
>>  On 08.07.15 at 13:00,  wrote:
>> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
>> >>> vmx_function_table = {
>> >>>  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>> >>>  };
>> >>>
>> >>> +/*
>> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
>> >>> + */
>> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>> >>> +{
>> >>> +struct arch_vmx_struct *vmx;
>> >>> +unsigned int cpu = smp_processor_id();
>> >>> +
>> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
>> >>> +
>> >>> +/*
>> >>> + * FIXME: The length of the list depends on how many
>> >>> + * vCPU is current blocked on this specific pCPU.
>> >>> + * This may hurt the interrupt latency if the list
>> >>> + * grows to too many entries.
>> >>> + */
>> >> let's go with this linked list first until a real issue is identified.
>> > This is exactly the way of thinking I dislike when it comes to code
>> > that isn't intended to be experimental only: We shouldn't wait
>> > for problems to surface when we already can see them. I.e. if
>> > there are no plans to deal with this, I'd ask for the feature to be
>> > off by default and be properly marked experimental in the
>> > command line option documentation (so people know to stay
>> > away from it).
>> 
>> And in this specific case, there is no balancing of vcpus across the
>> pcpus lists.
>> 
>> One can construct a pathological case using pinning and pausing to get
>> almost every vcpu on a single pcpu list, and vcpus receiving fewer
>> interrupts will exacerbate the problem by staying on the list for longer
>> periods of time.
> 
> In that extreme case I believe many contentions in other code paths will
> be much larger than overhead caused by this structure limitation.

Examples?

>> IMO, the PI feature cannot be declared as done/supported with this bug
>> remaining.  OTOH, it is fine to be experimental, and disabled by default
>> for people who wish to experiment.
>> 
> 
> Again, I don't expect to see it disabled as experimental. For good 
> production
> environment where vcpus are well balanced and interrupt latency is 
> sensitive,
> linked list should be efficient here. For bad environment like extreme case
> you raised, I don't know whether it really matters to just tune interrupt 
> path.

Can you _guarantee_ that everything potentially leading to such a
pathological situation is covered by XSA-77? And even if it is now,
removing elements from the waiver list would become significantly
more difficult if disconnected behavior like this one would need to
be taken into account.

Please understand that history has told us to be rather more careful
than might seem necessary with this: ATS originally having been
enabled by default is one bold example, and the recent flood of MSI
related XSAs is another; I suppose I could find more. All affecting
code originating from Intel, apparently written with only functionality
in mind, while having left out (other than basic) security considerations.

IOW, with my committer role hat on, the feature is going to be
experimental (and hence default off) unless the issue here gets
addressed. And no, I cannot immediately suggest a good approach,
and with all of the rush before the feature freeze I also can't justify
taking a lot of time to think of options. 

Jan




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Wednesday, July 08, 2015 9:09 PM
> 
> On 08/07/2015 13:46, Jan Beulich wrote:
>  On 08.07.15 at 13:00,  wrote:
> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> >>> vmx_function_table = {
> >>>  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >>>  };
> >>>
> >>> +/*
> >>> + * Handle VT-d posted-interrupt when VCPU is blocked.
> >>> + */
> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >>> +{
> >>> +struct arch_vmx_struct *vmx;
> >>> +unsigned int cpu = smp_processor_id();
> >>> +
> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> >>> +
> >>> +/*
> >>> + * FIXME: The length of the list depends on how many
> >>> + * vCPU is current blocked on this specific pCPU.
> >>> + * This may hurt the interrupt latency if the list
> >>> + * grows to too many entries.
> >>> + */
> >> let's go with this linked list first until a real issue is identified.
> > This is exactly the way of thinking I dislike when it comes to code
> > that isn't intended to be experimental only: We shouldn't wait
> > for problems to surface when we already can see them. I.e. if
> > there are no plans to deal with this, I'd ask for the feature to be
> > off by default and be properly marked experimental in the
> > command line option documentation (so people know to stay
> > away from it).
> 
> And in this specific case, there is no balancing of vcpus across the
> pcpus lists.
> 
> One can construct a pathological case using pinning and pausing to get
> almost every vcpu on a single pcpu list, and vcpus receiving fewer
> interrupts will exacerbate the problem by staying on the list for longer
> periods of time.

In that extreme case I believe contention in many other code paths will
be much larger than the overhead caused by this structure's limitation.

> 
> IMO, the PI feature cannot be declared as done/supported with this bug
> remaining.  OTOH, it is fine to be experimental, and disabled by default
> for people who wish to experiment.
> 

Again, I don't expect to see it disabled as experimental. For a good production
environment, where vcpus are well balanced and interrupt latency is sensitive,
a linked list should be efficient here. For a bad environment like the extreme case
you raised, I don't know whether it really matters to just tune the interrupt path.

Thanks
Kevin



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, July 08, 2015 8:46 PM
> 
> >>> On 08.07.15 at 13:00,  wrote:
> >> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> >> vmx_function_table = {
> >>  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >>  };
> >>
> >> +/*
> >> + * Handle VT-d posted-interrupt when VCPU is blocked.
> >> + */
> >> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >> +{
> >> +struct arch_vmx_struct *vmx;
> >> +unsigned int cpu = smp_processor_id();
> >> +
> >> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> >> +
> >> +/*
> >> + * FIXME: The length of the list depends on how many
> >> + * vCPU is current blocked on this specific pCPU.
> >> + * This may hurt the interrupt latency if the list
> >> + * grows to too many entries.
> >> + */
> >
> > let's go with this linked list first until a real issue is identified.
> 
> This is exactly the way of thinking I dislike when it comes to code
> that isn't intended to be experimental only: We shouldn't wait
> for problems to surface when we already can see them. I.e. if
> there are no plans to deal with this, I'd ask for the feature to be
> off by default and be properly marked experimental in the
> command line option documentation (so people know to stay
> away from it).
> 

I don't see a big problem here. For a typical server consolidation ratio of
1:4 or 1:8, the linked list should be short. It's not experimental; it's
based on our judgment that a linked list should be fine here. But any
structure may have potential problems which we don't know about now,
just like the tasklet scalability issue Andrew raised earlier (in that case
we can't blame anyone for using tasklets before the issue was reported).

So, there's no plan to address this since we don't see a real problem
now. I'd suggest Feng remove the comment, since FIXME implies
something experimental. Or, if you have a better suggestion, please
articulate it instead of asking people to stay away from it.

Thanks
Kevin





Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Andrew Cooper
On 08/07/2015 13:46, Jan Beulich wrote:
 On 08.07.15 at 13:00,  wrote:
>>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
>>> vmx_function_table = {
>>>  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>>>  };
>>>
>>> +/*
>>> + * Handle VT-d posted-interrupt when VCPU is blocked.
>>> + */
>>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>>> +{
>>> +struct arch_vmx_struct *vmx;
>>> +unsigned int cpu = smp_processor_id();
>>> +
>>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
>>> +
>>> +/*
>>> + * FIXME: The length of the list depends on how many
>>> + * vCPU is current blocked on this specific pCPU.
>>> + * This may hurt the interrupt latency if the list
>>> + * grows to too many entries.
>>> + */
>> let's go with this linked list first until a real issue is identified.
> This is exactly the way of thinking I dislike when it comes to code
> that isn't intended to be experimental only: We shouldn't wait
> for problems to surface when we already can see them. I.e. if
> there are no plans to deal with this, I'd ask for the feature to be
> off by default and be properly marked experimental in the
> command line option documentation (so people know to stay
> away from it).

And in this specific case, there is no balancing of vcpus across the
pcpus lists.

One can construct a pathological case using pinning and pausing to get
almost every vcpu on a single pcpu list, and vcpus receiving fewer
interrupts will exacerbate the problem by staying on the list for longer
periods of time.

IMO, the PI feature cannot be declared as done/supported with this bug
remaining.  OTOH, it is fine to be experimental, and disabled by default
for people who wish to experiment.

~Andrew



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Jan Beulich
>>> On 08.07.15 at 13:00,  wrote:
>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
>> vmx_function_table = {
>>  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>>  };
>> 
>> +/*
>> + * Handle VT-d posted-interrupt when VCPU is blocked.
>> + */
>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>> +{
>> +struct arch_vmx_struct *vmx;
>> +unsigned int cpu = smp_processor_id();
>> +
>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
>> +
>> +/*
>> + * FIXME: The length of the list depends on how many
>> + * vCPU is current blocked on this specific pCPU.
>> + * This may hurt the interrupt latency if the list
>> + * grows to too many entries.
>> + */
> 
> let's go with this linked list first until a real issue is identified.

This is exactly the way of thinking I dislike when it comes to code
that isn't intended to be experimental only: We shouldn't wait
for problems to surface when we already can see them. I.e. if
there are no plans to deal with this, I'd ask for the feature to be
off by default and be properly marked experimental in the
command line option documentation (so people know to stay
away from it).

Jan




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 7:00 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> 
> > From: Wu, Feng
> > Sent: Wednesday, June 24, 2015 1:18 PM
> >
> > This patch includes the following aspects:
> > - Add a global vector to wake up the blocked vCPU
> >   when an interrupt is being posted to it (This
> >   part was sugguested by Yang Zhang ).
> > - Adds a new per-vCPU tasklet to wakeup the blocked
> >   vCPU. It can be used in the case vcpu_unblock
> >   cannot be called directly.
> > - Define two per-cpu variables:
> >   * pi_blocked_vcpu:
> >   A list storing the vCPUs which were blocked on this pCPU.
> >
> >   * pi_blocked_vcpu_lock:
> >   The spinlock to protect pi_blocked_vcpu.
> >
> > Signed-off-by: Feng Wu 
> > ---
> > v3:
> > - This patch is generated by merging the following three patches in v2:
> >[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
> >[RFC v2 10/15] vmx: Define two per-cpu variables
> >[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d
> Posted-Interrupts
> > - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> > - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct 
> > arch_vmx_struct'
> > - rename 'vcpu_wakeup_tasklet_handler' to
> 'pi_vcpu_wakeup_tasklet_handler'
> > - Make pi_wakeup_interrupt() static
> > - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> > - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> > - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> > - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
> >
> >  xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
> >  xen/arch/x86/hvm/vmx/vmx.c | 54
> > ++
> >  xen/include/asm-x86/hvm/hvm.h  |  1 +
> >  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
> >  xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
> >  5 files changed, 68 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index 11dc1b5..0c5ce3f 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
> >  if ( cpu_has_vmx_vpid )
> >  vpid_sync_all();
> >
> > +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> > +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > +
> >  return 0;
> >  }
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index b94ef6a..7db6009 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr,
> uint64_t
> > *msr_content);
> >  static int vmx_msr_write_intercept(unsigned int msr, uint64_t
> msr_content);
> >  static void vmx_invlpg_intercept(unsigned long vaddr);
> >
> > +/*
> > + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we
> > + * can find which vCPU should be waken up.
> > + */
> > +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> > +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> > +
> >  uint8_t __read_mostly posted_intr_vector;
> > +uint8_t __read_mostly pi_wakeup_vector;
> > +
> > +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
> > +{
> > +vcpu_unblock((struct vcpu *)arg);
> > +}
> >
> >  static int vmx_domain_initialise(struct domain *d)
> >  {
> > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >  if ( v->vcpu_id == 0 )
> >  v->arch.user_regs.eax = 1;
> >
> > +tasklet_init(
> > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
> > +pi_vcpu_wakeup_tasklet_handler,
> > +(unsigned long)v);
> > +
> > +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> > +
> >  return 0;
> >  }
> >
> >  static void vmx_vcpu_destroy(struct vcpu *v)
> >  {
> > +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
> >  /*
> >   * There are cases that domain still remains in log-dirty mode when it
> is
> >   * about to be destroyed (ex, user types 'xl destroy '), in which
> case
> > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> > vmx_function_table = {
> >  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >  };
> >
> > +/*
> > + * Handle VT-d posted-interrupt when VCPU is blocked.
> > + */
> > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > +{
> > +struct arch_vmx_struct *vmx;
> > +unsigned int cpu = smp_processor_id();
> > +
> > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > +
> > +/*
> > + * FIXME: The length of the list depends on how many
> > + * vCPU is current blocked on this specific pCPU.
> > + * This may hurt the interrupt latency if the list
> > + * grows to too many entries.
> > + */
> 

Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Tian, Kevin
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
> 
> This patch includes the following aspects:
> - Add a global vector to wake up the blocked vCPU
>   when an interrupt is being posted to it (This
>   part was sugguested by Yang Zhang ).
> - Adds a new per-vCPU tasklet to wakeup the blocked
>   vCPU. It can be used in the case vcpu_unblock
>   cannot be called directly.
> - Define two per-cpu variables:
>   * pi_blocked_vcpu:
>   A list storing the vCPUs which were blocked on this pCPU.
> 
>   * pi_blocked_vcpu_lock:
>   The spinlock to protect pi_blocked_vcpu.
> 
> Signed-off-by: Feng Wu 
> ---
> v3:
> - This patch is generated by merging the following three patches in v2:
>[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
>[RFC v2 10/15] vmx: Define two per-cpu variables
>[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
> - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
> - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
> - Make pi_wakeup_interrupt() static
> - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
> 
>  xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
>  xen/arch/x86/hvm/vmx/vmx.c | 54
> ++
>  xen/include/asm-x86/hvm/hvm.h  |  1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
>  xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
>  5 files changed, 68 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 11dc1b5..0c5ce3f 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
>  if ( cpu_has_vmx_vpid )
>  vpid_sync_all();
> 
> +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +
>  return 0;
>  }
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index b94ef6a..7db6009 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, 
> uint64_t
> *msr_content);
>  static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
> 
> +/*
> + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we
> + * can find which vCPU should be waken up.
> + */
> +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
> +
> +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
> +{
> +vcpu_unblock((struct vcpu *)arg);
> +}
> 
>  static int vmx_domain_initialise(struct domain *d)
>  {
> @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  if ( v->vcpu_id == 0 )
>  v->arch.user_regs.eax = 1;
> 
> +tasklet_init(
> +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
> +pi_vcpu_wakeup_tasklet_handler,
> +(unsigned long)v);
> +
> +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
>  return 0;
>  }
> 
>  static void vmx_vcpu_destroy(struct vcpu *v)
>  {
> +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
>  /*
>   * There are cases that domain still remains in log-dirty mode when it is
>   * about to be destroyed (ex, user types 'xl destroy '), in which 
> case
> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
>  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>  };
> 
> +/*
> + * Handle VT-d posted-interrupt when VCPU is blocked.
> + */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +struct arch_vmx_struct *vmx;
> +unsigned int cpu = smp_processor_id();
> +
> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +
> +/*
> + * FIXME: The length of the list depends on how many
> + * vCPU is current blocked on this specific pCPU.
> + * This may hurt the interrupt latency if the list
> + * grows to too many entries.
> + */

let's go with this linked list first until a real issue is identified.

> +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> +pi_blocked_vcpu_list)
> +if ( vmx->pi_desc.on )
> +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

Not sure where the vcpu is removed from the list (possibly in a later patch).
But at least removing the vcpu from the list at this point should be safe and
the right way to go. IIRC Andrew and others raised a similar concern earlier. :-)

Thanks
Kevin
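For reference, the removal discussed here might look roughly like the sketch below. It reuses the names from the quoted patch but is not the posted code; the exact point at which to drop the entry (and how that interacts with the vCPU being scheduled again) was still under discussion in this thread.

/* Sketch: kick every blocked vCPU whose descriptor has an outstanding
 * notification and take it off the per-pCPU list, so that later
 * notifications do not rescan it. */
static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
{
    struct arch_vmx_struct *vmx, *tmp;
    unsigned int cpu = smp_processor_id();

    ack_APIC_irq();    /* assumed; the quoted excerpt stops short of this */

    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));

    list_for_each_entry_safe ( vmx, tmp, &per_cpu(pi_blocked_vcpu, cpu),
                               pi_blocked_vcpu_list )
        if ( vmx->pi_desc.on )
        {
            /* Remove before kicking so each entry is woken exactly once. */
            list_del_init(&vmx->pi_blocked_vcpu_list);
            tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
        }

    spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu));
}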


Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Jan Beulich
>>> On 08.07.15 at 12:36,  wrote:
>> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
>> Sent: Tuesday, June 30, 2015 1:07 AM
>> On 24/06/15 06:18, Feng Wu wrote:
>> > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>> >  if ( v->vcpu_id == 0 )
>> >  v->arch.user_regs.eax = 1;
>> >
>> > +tasklet_init(
>> > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
>> > +pi_vcpu_wakeup_tasklet_handler,
>> > +(unsigned long)v);
>> 
>> c/s f6dd295 indicates that the global tasklet lock causes a bottleneck
>> when injecting interrupts, and replaced a tasklet with a softirq to fix
>> the scalability issue.
>> 
>> I would expect exactly the bottleneck to exist here.
> 
> I am still considering this comment. Jan, what is your opinion about this?

"My opinion" here is that I expect you to respond to Andrew.

Jan
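For context, the bottleneck Andrew refers to is the single global lock shared by all tasklets. One way around it, shown only as an illustration and assuming vcpu_unblock() may safely be called from this interrupt context, would be to drop the per-vCPU tasklet and unblock the vCPU directly in the wakeup loop sketched earlier:

    list_for_each_entry_safe ( vmx, tmp, &per_cpu(pi_blocked_vcpu, cpu),
                               pi_blocked_vcpu_list )
        if ( vmx->pi_desc.on )
        {
            list_del_init(&vmx->pi_blocked_vcpu_list);
            /* Assumes arch_vmx_struct is embedded in struct vcpu as
             * arch.hvm_vmx, so the owning vCPU can be recovered. */
            vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
        }

With that, the per-vCPU pi_vcpu_wakeup_tasklet (and its tasklet_init/tasklet_kill calls) could go away entirely.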




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Wu, Feng


> -Original Message-
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Tuesday, June 30, 2015 1:07 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; Tian, Kevin; Zhang, Yang Z;
> george.dun...@eu.citrix.com
> Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> 
> On 24/06/15 06:18, Feng Wu wrote:
> > This patch includes the following aspects:
> > - Add a global vector to wake up the blocked vCPU
> >   when an interrupt is being posted to it (This
> >   part was sugguested by Yang Zhang ).
> > - Adds a new per-vCPU tasklet to wakeup the blocked
> >   vCPU. It can be used in the case vcpu_unblock
> >   cannot be called directly.
> > - Define two per-cpu variables:
> >   * pi_blocked_vcpu:
> >   A list storing the vCPUs which were blocked on this pCPU.
> >
> >   * pi_blocked_vcpu_lock:
> >   The spinlock to protect pi_blocked_vcpu.
> >
> > Signed-off-by: Feng Wu 
> > ---
> > v3:
> > - This patch is generated by merging the following three patches in v2:
> >[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
> >[RFC v2 10/15] vmx: Define two per-cpu variables
> >[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d
> Posted-Interrupts
> > - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> > - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct 
> > arch_vmx_struct'
> > - rename 'vcpu_wakeup_tasklet_handler' to
> 'pi_vcpu_wakeup_tasklet_handler'
> > - Make pi_wakeup_interrupt() static
> > - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> > - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> > - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> > - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
> >
> >  xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
> >  xen/arch/x86/hvm/vmx/vmx.c | 54
> ++
> >  xen/include/asm-x86/hvm/hvm.h  |  1 +
> >  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
> >  xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
> >  5 files changed, 68 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index 11dc1b5..0c5ce3f 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
> >  if ( cpu_has_vmx_vpid )
> >  vpid_sync_all();
> >
> > +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> > +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > +
> >  return 0;
> >  }
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index b94ef6a..7db6009 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr,
> uint64_t *msr_content);
> >  static int vmx_msr_write_intercept(unsigned int msr, uint64_t
> msr_content);
> >  static void vmx_invlpg_intercept(unsigned long vaddr);
> >
> > +/*
> > + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we
> > + * can find which vCPU should be waken up.
> > + */
> > +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> > +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> > +
> >  uint8_t __read_mostly posted_intr_vector;
> > +uint8_t __read_mostly pi_wakeup_vector;
> > +
> > +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
> > +{
> > +vcpu_unblock((struct vcpu *)arg);
> > +}
> >
> >  static int vmx_domain_initialise(struct domain *d)
> >  {
> > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >  if ( v->vcpu_id == 0 )
> >  v->arch.user_regs.eax = 1;
> >
> > +tasklet_init(
> > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
> > +pi_vcpu_wakeup_tasklet_handler,
> > +(unsigned long)v);
> 
> c/s f6dd295 indicates that the global tasklet lock causes a bottleneck
> when injecting interrupts, and replaced a tasklet with a softirq to fix
> the scalability issue.
> 
> I would expect exactly the bottleneck to exist here.

I am still considering this comment. Jan, what is your opinion about this?

Thanks,
Feng

> 
> > +
> > +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> > +
> >  return 0;
> >  }
> >
> >  static void vmx_vcpu_destroy(struct vcpu *v)
> >  {
> > +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
> >  /*
> >   * There are cases that domain still remains in log-dirty mode when it
> is
> >   * about to be destroyed (ex, user types 'xl destroy '), in which
> case
> > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
> >  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >  };
> >
> > +/*
> > + * Handle VT-d posted-interrupt when VCPU is blocked.
> > + */
> > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > +{
> > +struct arch_vmx_struct *vmx;
> > +unsigned int cpu = smp_processor_id();
> > +

Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Wu, Feng


> -Original Message-
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Thursday, July 02, 2015 9:00 PM
> To: Dario Faggioli
> Cc: Wu, Feng; Tian, Kevin; k...@xen.org; george.dun...@eu.citrix.com;
> xen-devel@lists.xen.org; jbeul...@suse.com; Zhang, Yang Z
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> On 02/07/15 13:38, Dario Faggioli wrote:
> > On Thu, 2015-07-02 at 13:16 +0100, Andrew Cooper wrote:
> >> On 02/07/15 13:04, Dario Faggioli wrote:
> >>> On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:
> >>>> I can't currently decide whether this will be quicker or slower overall,
> >>>> or (most likely) it will even out to equal in the general case.
> >>>>
> >>> Well, given the thing works as you (two) just described, I think
> >>> draining the list is the only thing we can do.
> >>>
> >>> In fact, AFAICT, since we can't know for what vcpu a particular
> >>> notification is intended, we don't have alternatives to waking them all,
> >>> do we?
> >> Perhaps you misunderstand.
> >>
> > I'm quite sure I was. While I think now I'm getting it.
> >
> >> Every single vcpu has a PI descriptor which is shared memory with
> hardware.
> >>
> > Right.
> >
> >> A NV is delivered strictly when hardware atomically changes desc.on from
> >> 0 to 1.  i.e. the first time that an oustanding notification arrives.
> >> (iirc, desc.on is later cleared by hardware when the vcpu is scheduled
> >> and the vector(s) actually injected.)
> >>
> >> Part of the scheduling modifications alter when a vcpu is eligible to
> >> have NV's delivered on its behalf.  non-scheduled vcpus get NV's while
> >> scheduled vcpus have direct injection instead.
> >>
> > Blocked vcpus, AFAICT. But that's not relevant here.
> >
> >> Therefore, in the case that an NV arrives, we know for certain that one
> >> of the NV-eligible vcpus has had desc.on set by hardware, and we can
> >> uniquely identify it by searching for the vcpu for which desc.on is set.
> >>
> > Yeah, but we can have more than one of them. You said "I can't currently
> > decide whether this will be quicker or slower", which I read like you
> > were suggesting that not draining the queue was a plausible alternative,
> > while I now think it's not.
> >
> > Perhaps you were not meaning anything like that, so it was not necessary
> > for me to point this out, in which case, sorry for the noise. :-)
> 
> To be clear, (assuming that a kicked vcpu is removed from the list),
> then both options of kicking exactly one vcpu or kicking all vcpus will
> function.  The end result after all processing of NVs will be that every
> vcpu with desc.on set will be kicked exactly once.

We need to remove the kicked vCPU from the list; this will be included
in the next version.

Thanks,
Feng

> 
> I just was concerned about the O() of searching the list on a subsequent
> NV, knowing that we most likely took the relevant entry off the list on
> the previous NV.
> 
> >
> >> In the case of stacked NV's, we cannot associate which specific vcpu
> >> caused which NV, but we know that we will get one NV per vcpu needing
> >> kicking.
> >>
> > Exactly, and that's what I'm talking about, and why I'm saying that
> > waking everyone is the only solution. The bottom line being that, even
> > in case this is deemed too slow, we don't have the option of waking only
> > one vcpu at each NV, as we wouldn't know who to wake, and hence we'd
> > need to make things faster in some other way.
> 
> Ah - I see your point now.
> 
> Yes - kicking exactly one vcpu per NV could result in a different vcpu
> being deferred based on the interrupt activity of other vcpus and its
> position in the list.
> 
> In which case, we should eagerly kick all vcpus.
> 
> ~Andrew


Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Andrew Cooper
On 02/07/15 13:38, Dario Faggioli wrote:
> On Thu, 2015-07-02 at 13:16 +0100, Andrew Cooper wrote:
>> On 02/07/15 13:04, Dario Faggioli wrote:
>>> On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:
 I can't currently decide whether this will be quicker or slower overall,
 or (most likely) it will even out to equal in the general case.

>>> Well, given the thing works as you (two) just described, I think
>>> draining the list is the only thing we can do.
>>>
>>> In fact, AFAICT, since we can't know for what vcpu a particular
>>> notification is intended, we don't have alternatives to waking them all,
>>> do we?
>> Perhaps you misunderstand.
>>
> I'm quite sure I was. While I think now I'm getting it.
>
>> Every single vcpu has a PI descriptor which is shared memory with hardware.
>>
> Right.
>
>> A NV is delivered strictly when hardware atomically changes desc.on from
>> 0 to 1.  i.e. the first time that an oustanding notification arrives. 
>> (iirc, desc.on is later cleared by hardware when the vcpu is scheduled
>> and the vector(s) actually injected.)
>>
>> Part of the scheduling modifications alter when a vcpu is eligible to
>> have NV's delivered on its behalf.  non-scheduled vcpus get NV's while
>> scheduled vcpus have direct injection instead.
>>
> Blocked vcpus, AFAICT. But that's not relevant here.
>
>> Therefore, in the case that an NV arrives, we know for certain that one
>> of the NV-eligible vcpus has had desc.on set by hardware, and we can
>> uniquely identify it by searching for the vcpu for which desc.on is set.
>>
> Yeah, but we can have more than one of them. You said "I can't currently
> decide whether this will be quicker or slower", which I read like you
> were suggesting that not draining the queue was a plausible alternative,
> while I now think it's not.
>
> Perhaps you were not meaning anything like that, so it was not necessary
> for me to point this out, in which case, sorry for the noise. :-)

To be clear, assuming that a kicked vcpu is removed from the list,
both options of kicking exactly one vcpu or kicking all vcpus will
function.  The end result after all processing of NVs will be that every
vcpu with desc.on set will be kicked exactly once.

I just was concerned about the O() of searching the list on a subsequent
NV, knowing that we most likely took the relevant entry off the list on
the previous NV.

>
>> In the case of stacked NV's, we cannot associate which specific vcpu
>> caused which NV, but we know that we will get one NV per vcpu needing
>> kicking.
>>
> Exactly, and that's what I'm talking about, and why I'm saying that
> waking everyone is the only solution. The bottom line being that, even
> in case this is deemed too slow, we don't have the option of waking only
> one vcpu at each NV, as we wouldn't know who to wake, and hence we'd
> need to make things faster in some other way.

Ah - I see your point now.

Yes - kicking exactly one vcpu per NV could result in a different vcpu
being deferred based on the interrupt activity of other vcpus and its
position in the list.

In which case, we should eagerly kick all vcpus.

~Andrew



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 13:16 +0100, Andrew Cooper wrote:
> On 02/07/15 13:04, Dario Faggioli wrote:
> > On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:

> >> I can't currently decide whether this will be quicker or slower overall,
> >> or (most likely) it will even out to equal in the general case.
> >>
> > Well, given the thing works as you (two) just described, I think
> > draining the list is the only thing we can do.
> >
> > In fact, AFAICT, since we can't know for what vcpu a particular
> > notification is intended, we don't have alternatives to waking them all,
> > do we?
> 
> Perhaps you misunderstand.
> 
I'm quite sure I was. But I think now I'm getting it.

> Every single vcpu has a PI descriptor which is shared memory with hardware.
> 
Right.

> An NV is delivered strictly when hardware atomically changes desc.on from
> 0 to 1, i.e. the first time that an outstanding notification arrives.
> (iirc, desc.on is later cleared by hardware when the vcpu is scheduled
> and the vector(s) actually injected.)
> 
> Part of the scheduling modifications alter when a vcpu is eligible to
> have NV's delivered on its behalf.  non-scheduled vcpus get NV's while
> scheduled vcpus have direct injection instead.
> 
Blocked vcpus, AFAICT. But that's not relevant here.

> Therefore, in the case that an NV arrives, we know for certain that one
> of the NV-eligible vcpus has had desc.on set by hardware, and we can
> uniquely identify it by searching for the vcpu for which desc.on is set.
> 
Yeah, but we can have more than one of them. You said "I can't currently
decide whether this will be quicker or slower", which I read as you
suggesting that not draining the queue was a plausible alternative,
while I now think it's not.

Perhaps you were not meaning anything like that, so it was not necessary
for me to point this out, in which case, sorry for the noise. :-)

> In the case of stacked NV's, we cannot associate which specific vcpu
> caused which NV, but we know that we will get one NV per vcpu needing
> kicking.
> 
Exactly, and that's what I'm talking about, and why I'm saying that
waking everyone is the only solution. The bottom line being that, even
in case this is deemed too slow, we don't have the option of waking only
one vcpu at each NV, as we wouldn't know who to wake, and hence we'd
need to make things faster in some other way.


Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Andrew Cooper
On 02/07/15 13:04, Dario Faggioli wrote:
> On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:
>> On 02/07/15 09:30, Dario Faggioli wrote:
>>> It is, therefore, not effective in making sure that, even with only one
>>> notification, you only kick the interested vcpu.
>>>
>>> This is the third time that I ask:
>>>  (1) whether it is possible to have more vcpus queued on one pcpu PI 
>>>  blocked list with desc.on (I really believe it is);
>>>  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
>>>  soon as any notification arrives, instead of putting together a
>>>  mechanism for kicking only a specific one.
>> We will receive one NV for every time the hardware managed to
>> successfully set desc.on
>>
> Right, I see it now, thanks.
>
>> If multiple stack up and we proactively drain the list, we will
>> subsequently search the list to completion for all remaining NV's, due
>> to finding no appropriate entries.
>>
>> I can't currently decide whether this will be quicker or slower overall,
>> or (most likely) it will even out to equal in the general case.
>>
> Well, given the thing works as you (two) just described, I think
> draining the list is the only thing we can do.
>
> In fact, AFAICT, since we can't know for what vcpu a particular
> notification is intended, we don't have alternatives to waking them all,
> do we?

Perhaps you misunderstand.

Every single vcpu has a PI descriptor which is shared memory with hardware.

An NV is delivered strictly when hardware atomically changes desc.on from
0 to 1, i.e. the first time that an outstanding notification arrives.
(iirc, desc.on is later cleared by hardware when the vcpu is scheduled
and the vector(s) actually injected.)

Part of the scheduling modifications alter when a vcpu is eligible to
have NV's delivered on its behalf.  non-scheduled vcpus get NV's while
scheduled vcpus have direct injection instead.

Therefore, in the case that an NV arrives, we know for certain that one
of the NV-eligible vcpus has had desc.on set by hardware, and we can
uniquely identify it by searching for the vcpu for which desc.on is set.

In the case of stacked NV's, we cannot associate which specific vcpu
caused which NV, but we know that we will get one NV per vcpu needing
kicking.

~Andrew



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Thursday, July 02, 2015 8:04 PM
> To: Andrew Cooper
> Cc: Wu, Feng; Tian, Kevin; k...@xen.org; george.dun...@eu.citrix.com;
> xen-devel@lists.xen.org; jbeul...@suse.com; Zhang, Yang Z
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:
> > On 02/07/15 09:30, Dario Faggioli wrote:
> 
> > > It is, therefore, not effective in making sure that, even with only one
> > > notification, you only kick the interested vcpu.
> > >
> > > This is the third time that I ask:
> > >  (1) whether it is possible to have more vcpus queued on one pcpu PI
> > >  blocked list with desc.on (I really believe it is);
> > >  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
> > >  soon as any notification arrives, instead of putting together a
> > >  mechanism for kicking only a specific one.
> >
> > We will receive one NV for every time the hardware managed to
> > successfully set desc.on
> >
> Right, I see it now, thanks.
> 
> > If multiple stack up and we proactively drain the list, we will
> > subsequently search the list to completion for all remaining NV's, due
> > to finding no appropriate entries.
> >
> > I can't currently decide whether this will be quicker or slower overall,
> > or (most likely) it will even out to equal in the general case.
> >
> Well, given the thing works as you (two) just described, I think
> draining the list is the only thing we can do.
> 
> In fact, AFAICT, since we can't know for what vcpu a particular
> notification is intended,

Exactly. When a notification event happens, the hardware sets 'ON',
and software will find the vCPUs with 'ON' set. In fact, software doesn't
know which vCPU the wakeup event is targeting, so the only thing it
can do is kick the vCPUs with desc.on = 1.

Thanks,
Feng

> we don't have alternatives to waking them all,
> do we?
> 
> Dario
> 
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:
> On 02/07/15 09:30, Dario Faggioli wrote:

> > It is, therefore, not effective in making sure that, even with only one
> > notification, you only kick the interested vcpu.
> >
> > This is the third time that I ask:
> >  (1) whether it is possible to have more vcpus queued on one pcpu PI 
> >  blocked list with desc.on (I really believe it is);
> >  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
> >  soon as any notification arrives, instead of putting together a
> >  mechanism for kicking only a specific one.
> 
> We will receive one NV for every time the hardware managed to
> successfully set desc.on
> 
Right, I see it now, thanks.

> If multiple stack up and we proactively drain the list, we will
> subsequently search the list to completion for all remaining NV's, due
> to finding no appropriate entries.
> 
> I can't currently decide whether this will be quicker or slower overall,
> or (most likely) it will even out to equal in the general case.
> 
Well, given the thing works as you (two) just described, I think
draining the list is the only thing we can do.

In fact, AFAICT, since we can't know for what vcpu a particular
notification is intended, we don't have alternatives to waking them all,
do we?

Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Wu, Feng


> -Original Message-
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Thursday, July 02, 2015 6:30 PM
> To: Dario Faggioli; Wu, Feng
> Cc: xen-devel@lists.xen.org; Zhang, Yang Z; george.dun...@eu.citrix.com;
> Tian, Kevin; k...@xen.org; jbeul...@suse.com
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> On 02/07/15 09:30, Dario Faggioli wrote:
> > On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:
> >
> >>>>> +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> >>>>> +pi_blocked_vcpu_list)
> >>>>> +if ( vmx->pi_desc.on )
> >>>>> +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
> >>>> There is a logical bug here.  If we have two NV's delivered to this
> >>>> pcpu, we will kick the first vcpu twice.
> >>>>
> >>>> On finding desc.on, a kick should be scheduled, then the vcpu removed
> >>>> from this list.  With desc.on set, we know for certain that another NV
> >>>> will not arrive for it until it has been scheduled again and the
> >>>> interrupt posted.
> >>>>
> >>> Yes, that seems a possible issue (and one that should indeed be
> >>> avoided).
> >>>
> >>> I'm still unsure about the one that I raised myself but, if it is
> >>> possible to have more than one vcpu in a pcpu list, with desc.on==true,
> >>> then it looks to me that we kick all of them, for each notification.
> >>>
> >>> Add to that what Andrew's spotted: if there are a bunch of vcpus queued with
> >>> desc.on==true, and a bunch of notifications arrive before the tasklet
> >>> gets executed, we'll be kicking the whole bunch of them a bunch of
> >>> times! :-/
> >> As Andrew mentioned, removing the vCPUs with desc.on = true from the
> >> list can avoid kicking vCPUs multiple times.
> >>
> > It avoids kicking vcpus multiple times if more than one notification
> > arrives, yes.
> >
> > It is, therefore, not effective in making sure that, even with only one
> > notification, you only kick the interested vcpu.
> >
> > This is the third time that I ask:
> >  (1) whether it is possible to have more vcpus queued on one pcpu PI
> >  blocked list with desc.on (I really believe it is);
> >  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
> >  soon as any notification arrives, instead of putting together a
> >  mechanism for kicking only a specific one.
> 
> We will receive one NV for every time the hardware managed to
> successfully set desc.on
> 
> If multiple stack up and we proactively drain the list, we will
> subsequently search the list to completion for all remaining NV's, due
> to finding no appropriate entries.
> 
> I can't currently decide whether this will be quicker or slower overall,
> or (most likely) it will even out to equal in the general case.

What do you mean by "general case"?

Thanks,
Feng

> 
> ~Andrew


Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Thursday, July 02, 2015 6:10 PM
> To: Wu, Feng
> Cc: Andrew Cooper; xen-devel@lists.xen.org; Zhang, Yang Z;
> george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> On Thu, 2015-07-02 at 08:58 +, Wu, Feng wrote:
> 
> > > -Original Message-
> > > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> > >
> > > This is the third time that I ask:
> > >  (1) whether it is possible to have more vcpus queued on one pcpu PI
> > >  blocked list with desc.on (I really believe it is);
> >
> > I think it is, please see the following scenario:
> >
> > When cpu masks the interrupts, and an external interrupt occurs for the
> > assigned device while the target vCPU2 is blocked, the wakeup notification
> > event handler has no chance to run, after a while, another wakeup
> > notification event for vCPU4 blocking on the same pCPU occurs,
> > after cpu unmasks the interrupts, wakeup notification handler
> > gets called. Then we get:
> > vCPU2, desc.on = 1 and vCPU4, desc.on = 1
> > Then in the handler we need to kick both of them.
> >
> Ok, first of all, thanks for answering! :-)
> 
> And yes, this makes sense.
> 
> > >  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
> > >  soon as any notification arrives, instead of putting together a
> > >  mechanism for kicking only a specific one.
> > >
> > Why can't we kick all of them? 'desc.on = 1' means there is a pending
> > interrupt; when we meet this condition, kicking the related vCPU should
> > be the right thing to do.
> >
> Right, I see it now. I felt like I was missing something, and that's why
> I was asking you to elaborate a bit more.
> Thanks again for having done this. I was missing/forgetting half of the
> way desc.on is actually handled, sorry for this.
> 
> BTW, I'm finding it hard reading this series from the archives; there
> appear to be some threading issues and some missing messages. I also
> don't have it in my inbox, because my filters failed to spot and flag it
> properly. If you send a new version, please, Cc me, so it will be easier
> for me to look at all the patches, and provide a more helpful review.

Sure, thanks for the review!

Thanks,
Feng

> 
> Thanks and Regards,
> Dario
> 
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Andrew Cooper
On 02/07/15 09:30, Dario Faggioli wrote:
> On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:
>
>>>>> +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
>>>>> +pi_blocked_vcpu_list)
>>>>> +if ( vmx->pi_desc.on )
>>>>> +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
>>>> There is a logical bug here.  If we have two NV's delivered to this
>>>> pcpu, we will kick the first vcpu twice.
>>>>
>>>> On finding desc.on, a kick should be scheduled, then the vcpu removed
>>>> from this list.  With desc.on set, we know for certain that another NV
>>>> will not arrive for it until it has been scheduled again and the
>>>> interrupt posted.

>>> Yes, that seems a possible issue (and one that should indeed be
>>> avoided).
>>>
>>> I'm still unsure about the one that I raised myself but, if it is
>>> possible to have more than one vcpu in a pcpu list, with desc.on==true,
>>> then it looks to me that we kick all of them, for each notification.
>>>
>>> Add to that what Andrew's spotted: if there are a bunch of vcpus queued with
>>> desc.on==true, and a bunch of notifications arrive before the tasklet
>>> gets executed, we'll be kicking the whole bunch of them a bunch of
>>> times! :-/
>> As Andrew mentioned, removing the vCPUs with desc.on = true from the
>> list can avoid kicking vCPUs multiple times.
>>
> It avoids kicking vcpus multiple times if more than one notification
> arrives, yes.
>
> It is, therefore, not effective in making sure that, even with only one
> notification, you only kick the interested vcpu.
>
> This is the third time that I ask:
>  (1) whether it is possible to have more vcpus queued on one pcpu PI 
>  blocked list with desc.on (I really believe it is);
>  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
>  soon as any notification arrives, instead of putting together a
>  mechanism for kicking only a specific one.

We will receive one NV for every time the hardware managed to
successfully set desc.on

If multiple stack up and we proactively drain the list, we will
subsequently search the list to completion for all remaining NV's, due
to finding no appropriate entries.

I can't currently decide whether this will be quicker or slower overall,
or (most likely) it will even out to equal in the general case.

~Andrew



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 08:58 +, Wu, Feng wrote:

> > -Original Message-
> > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> >
> > This is the third time that I ask:
> >  (1) whether it is possible to have more vcpus queued on one pcpu PI
> >  blocked list with desc.on (I really believe it is);
> 
> I think it is, please see the following scenario:
> 
> When cpu masks the interrupts, and an external interrupt occurs for the
> assigned device while the target vCPU2 is blocked, the wakeup notification
> event handler has no chance to run, after a while, another wakeup
> notification event for vCPU4 blocking on the same pCPU occurs,
> after cpu unmasks the interrupts, wakeup notification handler
> gets called. Then we get:
>   vCPU2, desc.on = 1 and vCPU4, desc.on = 1
> Then in the handler we need to kick both of them.
> 
Ok, first of all, thanks for answering! :-)

And yes, this makes sense.

> >  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
> >  soon as any notification arrives, instead that putting together a
> >  mechanism for kicking only a specific one.
> > 
> Why can't we kick all of them? 'desc.on = 1' means there is a pending
> interrupt; when we meet this condition, kicking the related vCPU should
> be the right thing to do.
> 
Right, I see it now. I felt like I was missing something, and that's why
I was asking you to elaborate a bit more.
Thanks again for having done this. I was missing/forgetting half of the
way desc.on is actually handled, sorry for this.

BTW, I'm finding it hard reading this series from the archives; there
appear to be some threading issues and some missing messages. I also
don't have it in my inbox, because my filters failed to spot and flag it
properly. If you send a new version, please, Cc me, so it will be easier
for me to look at all the patches, and provide a more helpful review.

Thanks and Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Thursday, July 02, 2015 4:30 PM
> To: Wu, Feng
> Cc: Andrew Cooper; xen-devel@lists.xen.org; Zhang, Yang Z;
> george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:
> 
> > > > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> > > > > +pi_blocked_vcpu_list)
> > > > > +if ( vmx->pi_desc.on )
> > > > > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
> > > >
> > > > There is a logical bug here.  If we have two NV's delivered to this
> > > > pcpu, we will kick the first vcpu twice.
> > > >
> > > > On finding desc.on, a kick should be scheduled, then the vcpu removed
> > > > from this list.  With desc.on set, we know for certain that another NV
> > > > will not arrive for it until it has been scheduled again and the
> > > > interrupt posted.
> > > >
> > > Yes, that seems a possible issue (and one that should indeed be
> > > avoided).
> > >
> > > I'm still unsure about the one that I raised myself but, if it is
> > > possible to have more than one vcpu in a pcpu list, with desc.on==true,
> > > then it looks to me that we kick all of them, for each notification.
> > >
> > > Add to that what Andrew's spotted: if there are a bunch of vcpus queued with
> > > desc.on==true, and a bunch of notifications arrive before the tasklet
> > > gets executed, we'll be kicking the whole bunch of them a bunch of
> > > times! :-/
> >
> > As Andrew mentioned, removing the vCPUs with desc.on = true from the
> > list can avoid kicking vCPUs multiple times.
> >
> It avoids kicking vcpus multiple times if more than one notification
> arrives, yes.
> 
> It is, therefore, not effective in making sure that, even with only one
> notification, you only kick the interested vcpu.
> 
> This is the third time that I ask:
>  (1) whether it is possible to have more vcpus queued on one pcpu PI
>  blocked list with desc.on (I really believe it is);

I think it is, please see the following scenario:

While the cpu has interrupts masked, an external interrupt occurs for the
assigned device while the target vCPU2 is blocked, so the wakeup notification
event handler has no chance to run. After a while, another wakeup
notification event occurs for vCPU4, which is blocking on the same pCPU.
Once the cpu unmasks the interrupts, the wakeup notification handler
gets called. Then we get:
vCPU2, desc.on = 1 and vCPU4, desc.on = 1
Then in the handler we need to kick both of them.

>  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
>  soon as any notification arrives, instead of putting together a
>  mechanism for kicking only a specific one.
> 
Why can't we kick all of them? 'desc.on = 1' means there is a pending
interrupt; when we meet this condition, kicking the related vCPU should
be the right thing to do.

Thanks,
Feng

> The fact that you're not answering is not so much of a big deal for
> me... I'll just keep asking! :-D
> 
> 
> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:

> > > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> > > > +pi_blocked_vcpu_list)
> > > > +if ( vmx->pi_desc.on )
> > > > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
> > >
> > > There is a logical bug here.  If we have two NV's delivered to this
> > > pcpu, we will kick the first vcpu twice.
> > >
> > > On finding desc.on, a kick should be scheduled, then the vcpu removed
> > > from this list.  With desc.on set, we know for certain that another NV
> > > will not arrive for it until it has been scheduled again and the
> > > interrupt posted.
> > >
> > Yes, that seems a possible issue (and one that should indeed be
> > avoided).
> > 
> > I'm still unsure about the one that I raised myself but, if it is
> > possible to have more than one vcpu in a pcpu list, with desc.on==true,
> > then it looks to me that we kick all of them, for each notification.
> > 
> > Add to that what Andrew's spotted: if there are a bunch of vcpus queued with
> > desc.on==true, and a bunch of notifications arrive before the tasklet
> > gets executed, we'll be kicking the whole bunch of them a bunch of
> > times! :-/
> 
> As Andrew mentioned, removing the vCPUs with desc.on = true from the
> list can avoid kicking vCPUs multiple times.
> 
It avoids kicking vcpus multiple times if more than one notification
arrives, yes.

It is, therefore, not effective in making sure that, even with only one
notification, you only kick the interested vcpu.

This is the third time that I ask:
 (1) whether it is possible to have more vcpus queued on one pcpu PI 
 blocked list with desc.on (I really believe it is);
 (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
 soon as any notification arrives, instead of putting together a
 mechanism for kicking only a specific one.

The fact that you're not answering is not so much of a big deal for
me... I'll just keep asking! :-D


Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-01 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Wednesday, July 01, 2015 9:26 PM
> To: Andrew Cooper
> Cc: Wu, Feng; xen-devel@lists.xen.org; Zhang, Yang Z;
> george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> On Tue, 2015-06-30 at 11:11 +0100, Andrew Cooper wrote:
> > On 24/06/15 06:18, Feng Wu wrote:
> 
> > > +/*
> > > + * Handle VT-d posted-interrupt when VCPU is blocked.
> > > + */
> > > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > > +{
> > > +struct arch_vmx_struct *vmx;
> > > +unsigned int cpu = smp_processor_id();
> > > +
> > > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > > +
> > > +/*
> > > + * FIXME: The length of the list depends on how many
> > > + * vCPUs are currently blocked on this specific pCPU.
> > > + * This may hurt the interrupt latency if the list
> > > + * grows to too many entries.
> > > + */
> > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> > > +pi_blocked_vcpu_list)
> > > +if ( vmx->pi_desc.on )
> > > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
> >
> > There is a logical bug here.  If we have two NV's delivered to this
> > pcpu, we will kick the first vcpu twice.
> >
> > On finding desc.on, a kick should be scheduled, then the vcpu removed
> > from this list.  With desc.on set, we know for certain that another NV
> > will not arrive for it until it has been scheduled again and the
> > interrupt posted.
> >
> Yes, that seems a possible issue (and one that should indeed be
> avoided).
> 
> I'm still unsure about the one that I raised myself but, if it is
> possible to have more than one vcpu in a pcpu list, with desc.on==true,
> then it looks to me that we kick all of them, for each notification.
> 
> Add to that what Andrew's spotted: if there are a bunch of vcpus queued with
> desc.on==true, and a bunch of notifications arrive before the tasklet
> gets executed, we'll be kicking the whole bunch of them a bunch of
> times! :-/

As Andrew mentioned, removing the vCPUs with desc.on = true from the
list can avoid kicking vCPUs multiple times.

Thanks,
Feng

> 
> Regards,
> Dario
> 
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-01 Thread Wu, Feng


> -Original Message-
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Tuesday, June 30, 2015 6:12 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; Tian, Kevin; Zhang, Yang Z;
> george.dun...@eu.citrix.com
> Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> 
> On 24/06/15 06:18, Feng Wu wrote:
> > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
> >  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >  };
> >
> > +/*
> > + * Handle VT-d posted-interrupt when VCPU is blocked.
> > + */
> > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > +{
> > +struct arch_vmx_struct *vmx;
> > +unsigned int cpu = smp_processor_id();
> > +
> > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > +
> > +/*
> > + * FIXME: The length of the list depends on how many
> > + * vCPUs are currently blocked on this specific pCPU.
> > + * This may hurt the interrupt latency if the list
> > + * grows to too many entries.
> > + */
> > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> > +pi_blocked_vcpu_list)
> > +if ( vmx->pi_desc.on )
> > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
> 
> There is a logical bug here.  If we have two NV's delivered to this
> pcpu, we will kick the first vcpu twice.
> 
> On finding desc.on, a kick should be scheduled, then the vcpu removed
> from this list.  

So removing the vCPU from the blocking list here can avoid kicking the
vCPU multiple times, right?

Thanks,
Feng

> With desc.on set, we know for certain that another NV
> will not arrive for it until it has been scheduled again and the
> interrupt posted.
> 
> ~Andrew
> 
> > +
> > +spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > +
> > +ack_APIC_irq();
> > +this_cpu(irq_count)++;
> > +}
> > +
> >  const struct hvm_function_table * __init start_vmx(void)
> >  {
> >  set_in_cr4(X86_CR4_VMXE);
> >




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-01 Thread Dario Faggioli
On Tue, 2015-06-30 at 11:11 +0100, Andrew Cooper wrote:
> On 24/06/15 06:18, Feng Wu wrote:

> > +/*
> > + * Handle VT-d posted-interrupt when VCPU is blocked.
> > + */
> > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > +{
> > +struct arch_vmx_struct *vmx;
> > +unsigned int cpu = smp_processor_id();
> > +
> > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > +
> > +/*
> > + * FIXME: The length of the list depends on how many
> > + * vCPUs are currently blocked on this specific pCPU.
> > + * This may hurt the interrupt latency if the list
> > + * grows to too many entries.
> > + */
> > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> > +pi_blocked_vcpu_list)
> > +if ( vmx->pi_desc.on )
> > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
> 
> There is a logical bug here.  If we have two NV's delivered to this
> pcpu, we will kick the first vcpu twice.
> 
> On finding desc.on, a kick should be scheduled, then the vcpu removed
> from this list.  With desc.on set, we know for certain that another NV
> will not arrive for it until it has been scheduled again and the
> interrupt posted.
> 
Yes, that seems a possible issue (and one that should indeed be
avoided).

I'm still unsure about the one that I raised myself but, if it is
possible to have more than one vcpu in a pcpu list, with desc.on==true,
then it looks to me that we kick all of them, for each notification.

Add to that what Andrew's spotted: if there are a bunch of vcpus queued with
desc.on==true, and a bunch of notifications arrive before the tasklet
gets executed, we'll be kicking the whole bunch of them a bunch of
times! :-/

Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-06-30 Thread Andrew Cooper
On 24/06/15 06:18, Feng Wu wrote:
> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata 
> vmx_function_table = {
>  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>  };
>  
> +/*
> + * Handle VT-d posted-interrupt when VCPU is blocked.
> + */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +struct arch_vmx_struct *vmx;
> +unsigned int cpu = smp_processor_id();
> +
> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +
> +/*
> + * FIXME: The length of the list depends on how many
> + * vCPUs are currently blocked on this specific pCPU.
> + * This may hurt the interrupt latency if the list
> + * grows to too many entries.
> + */
> +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> +pi_blocked_vcpu_list)
> +if ( vmx->pi_desc.on )
> +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

There is a logical bug here.  If we have two NV's delivered to this
pcpu, we will kick the first vcpu twice.

On finding desc.on, a kick should be scheduled, then the vcpu removed
from this list.  With desc.on set, we know for certain that another NV
will not arrive for it until it has been scheduled again and the
interrupt posted.

~Andrew

> +
> +spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +
> +ack_APIC_irq();
> +this_cpu(irq_count)++;
> +}
> +
>  const struct hvm_function_table * __init start_vmx(void)
>  {
>  set_in_cr4(X86_CR4_VMXE);
>




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-06-29 Thread Andrew Cooper
On 24/06/15 06:18, Feng Wu wrote:
> This patch includes the following aspects:
> - Add a global vector to wake up the blocked vCPU
>   when an interrupt is being posted to it (This
> part was suggested by Yang Zhang ).
> - Adds a new per-vCPU tasklet to wakeup the blocked
>   vCPU. It can be used in the case vcpu_unblock
>   cannot be called directly.
> - Define two per-cpu variables:
>   * pi_blocked_vcpu:
>   A list storing the vCPUs which were blocked on this pCPU.
>
>   * pi_blocked_vcpu_lock:
>   The spinlock to protect pi_blocked_vcpu.
>
> Signed-off-by: Feng Wu 
> ---
> v3:
> - This patch is generated by merging the following three patches in v2:
>[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
>[RFC v2 10/15] vmx: Define two per-cpu variables
>[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
> - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
> - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
> - Make pi_wakeup_interrupt() static
> - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
>
>  xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
>  xen/arch/x86/hvm/vmx/vmx.c | 54 
> ++
>  xen/include/asm-x86/hvm/hvm.h  |  1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
>  xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
>  5 files changed, 68 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 11dc1b5..0c5ce3f 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
>  if ( cpu_has_vmx_vpid )
>  vpid_sync_all();
>  
> +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +
>  return 0;
>  }
>  
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index b94ef6a..7db6009 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, 
> uint64_t *msr_content);
>  static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
>  
> +/*
> + * We maintain a per-CPU linked list of vCPUs, so in the PI wakeup handler we
> + * can find which vCPU should be woken up.
> + */
> +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
> +
> +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
> +{
> +vcpu_unblock((struct vcpu *)arg);
> +}
>  
>  static int vmx_domain_initialise(struct domain *d)
>  {
> @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  if ( v->vcpu_id == 0 )
>  v->arch.user_regs.eax = 1;
>  
> +tasklet_init(
> +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
> +pi_vcpu_wakeup_tasklet_handler,
> +(unsigned long)v);

c/s f6dd295 indicates that the global tasklet lock caused a bottleneck
when injecting interrupts, and it replaced a tasklet with a softirq to fix
the scalability issue.

I would expect exactly the same bottleneck to exist here.
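
One option (just a sketch, untested) might be to drop the per-vcpu tasklet
and unblock the vcpu straight from the wakeup handler, along the lines of:

    /* Instead of tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet): */
    vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));

vcpu_unblock() only takes the scheduler lock with interrupts disabled, so
I'd expect it to be callable from this context, and it avoids the global
tasklet lock in the interrupt path entirely.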

> +
> +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
>  return 0;
>  }
>  
>  static void vmx_vcpu_destroy(struct vcpu *v)
>  {
> +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
>  /*
>   * There are cases that domain still remains in log-dirty mode when it is
>   * about to be destroyed (ex, user types 'xl destroy '), in which 
> case
> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata 
> vmx_function_table = {
>  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>  };
>  
> +/*
> + * Handle VT-d posted-interrupt when VCPU is blocked.
> + */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +struct arch_vmx_struct *vmx;
> +unsigned int cpu = smp_processor_id();
> +
> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));

this_cpu($foo) should be used in preference to per_cpu($foo, $myself).

However, always hoist repeated uses of this/per_cpu into local
variables, as the compiler is unable to elide repeated accesses (because
of a deliberate anti-optimisation behind the scenes).

spinlock_t *lock = &this_cpu(pi_blocked_vcpu_lock);
struct list_head *blocked_vcpus = &this_cpu(pi_blocked_vcpu);
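
and then the body can use them directly, e.g. (sketch only, same loop as in
the patch):

    spin_lock(lock);

    list_for_each_entry(vmx, blocked_vcpus, pi_blocked_vcpu_list)
        if ( vmx->pi_desc.on )
            tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

    spin_unlock(lock);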

~Andrew



[Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-06-23 Thread Feng Wu
This patch includes the following aspects:
- Add a global vector to wake up the blocked vCPU
  when an interrupt is being posted to it (This
  part was suggested by Yang Zhang ).
- Add a new per-vCPU tasklet to wake up the blocked
  vCPU. It can be used in cases where vcpu_unblock
  cannot be called directly.
- Define two per-cpu variables:
  * pi_blocked_vcpu:
  A list storing the vCPUs which were blocked on this pCPU.

  * pi_blocked_vcpu_lock:
  The spinlock to protect pi_blocked_vcpu.

Signed-off-by: Feng Wu 
---
v3:
- This patch is generated by merging the following three patches in v2:
   [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
   [RFC v2 10/15] vmx: Define two per-cpu variables
   [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
- rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
- Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
- rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
- Make pi_wakeup_interrupt() static
- Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
- move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
- Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
- Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'

 xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
 xen/arch/x86/hvm/vmx/vmx.c | 54 ++
 xen/include/asm-x86/hvm/hvm.h  |  1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
 xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
 5 files changed, 68 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 11dc1b5..0c5ce3f 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -631,6 +631,9 @@ int vmx_cpu_up(void)
 if ( cpu_has_vmx_vpid )
 vpid_sync_all();
 
+INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
+spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
+
 return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b94ef6a..7db6009 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t 
*msr_content);
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
 
+/*
+ * We maintain a per-CPU linked list of vCPUs, so in the PI wakeup handler we
+ * can find which vCPU should be woken up.
+ */
+DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
+DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
+
 uint8_t __read_mostly posted_intr_vector;
+uint8_t __read_mostly pi_wakeup_vector;
+
+static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
+{
+vcpu_unblock((struct vcpu *)arg);
+}
 
 static int vmx_domain_initialise(struct domain *d)
 {
@@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 if ( v->vcpu_id == 0 )
 v->arch.user_regs.eax = 1;
 
+tasklet_init(
+&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
+pi_vcpu_wakeup_tasklet_handler,
+(unsigned long)v);
+
+INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+
 return 0;
 }
 
 static void vmx_vcpu_destroy(struct vcpu *v)
 {
+tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
 /*
  * There are cases that domain still remains in log-dirty mode when it is
  * about to be destroyed (ex, user types 'xl destroy '), in which case
@@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata 
vmx_function_table = {
 .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
 };
 
+/*
+ * Handle VT-d posted-interrupt when VCPU is blocked.
+ */
+static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
+{
+struct arch_vmx_struct *vmx;
+unsigned int cpu = smp_processor_id();
+
+spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
+
+/*
+ * FIXME: The length of the list depends on how many
+ * vCPUs are currently blocked on this specific pCPU.
+ * This may hurt the interrupt latency if the list
+ * grows to too many entries.
+ */
+list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
+pi_blocked_vcpu_list)
+if ( vmx->pi_desc.on )
+tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
+
+spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu));
+
+ack_APIC_irq();
+this_cpu(irq_count)++;
+}
+
 const struct hvm_function_table * __init start_vmx(void)
 {
 set_in_cr4(X86_CR4_VMXE);
@@ -1884,11 +1932,17 @@ const struct hvm_function_table * __init start_vmx(void)
 }
 
 if ( cpu_has_vmx_posted_intr_processing )
+{
 alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+
+if ( iommu_intpost )
+alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
+}
 else
 {
 vmx_function_table.deliver_posted_intr = NULL;
 vmx_function_table.sync_pir_to_irr =