Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Jan Beulich [mailto:jbeul...@suse.com] > Sent: Friday, July 10, 2015 4:50 PM > To: Wu, Feng > Cc: Andrew Cooper; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; > xen-devel@lists.xen.org; k...@xen.org > Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU > is blocked > > >>> On 10.07.15 at 09:29, wrote: > >> From: Jan Beulich [mailto:jbeul...@suse.com] > >> Sent: Friday, July 10, 2015 2:32 PM > >> >>> On 10.07.15 at 08:21, wrote: > >> >> From: Jan Beulich [mailto:jbeul...@suse.com] > >> >> Sent: Thursday, July 09, 2015 3:26 PM > >> >> >>> On 09.07.15 at 00:49, wrote: > >> >> >> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > >> >> >> Sent: Wednesday, July 08, 2015 9:09 PM > >> >> >> On 08/07/2015 13:46, Jan Beulich wrote: > >> >> >> >>>> On 08.07.15 at 13:00, wrote: > >> >> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table > >> __initdata > >> >> >> >>> vmx_function_table = { > >> >> >> >>> .enable_msr_exit_interception = > >> >> vmx_enable_msr_exit_interception, > >> >> >> >>> }; > >> >> >> >>> > >> >> >> >>> +/* > >> >> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked. > >> >> >> >>> + */ > >> >> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > >> >> >> >>> +{ > >> >> >> >>> +struct arch_vmx_struct *vmx; > >> >> >> >>> +unsigned int cpu = smp_processor_id(); > >> >> >> >>> + > >> >> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > >> >> >> >>> + > >> >> >> >>> +/* > >> >> >> >>> + * FIXME: The length of the list depends on how many > >> >> >> >>> + * vCPU is current blocked on this specific pCPU. > >> >> >> >>> + * This may hurt the interrupt latency if the list > >> >> >> >>> + * grows to too many entries. > >> >> >> >>> + */ > >> >> >> >> let's go with this linked list first until a real issue is > >> >> >> >> identified. 
> >> >> >> > This is exactly the way of thinking I dislike when it comes to code > >> >> >> > that isn't intended to be experimental only: We shouldn't wait > >> >> >> > for problems to surface when we already can see them. I.e. if > >> >> >> > there are no plans to deal with this, I'd ask for the feature to be > >> >> >> > off by default and be properly marked experimental in the > >> >> >> > command line option documentation (so people know to stay > >> >> >> > away from it). > >> >> >> > >> >> >> And in this specific case, there is no balancing of vcpus across the > >> >> >> pcpus lists. > >> >> >> > >> >> >> One can construct a pathological case using pinning and pausing to > get > >> >> >> almost every vcpu on a single pcpu list, and vcpus recieving fewer > >> >> >> interrupts will exasperate the problem by staying on the list for > >> >> >> longer > >> >> >> periods of time. > >> >> > > >> >> > In that extreme case I believe many contentions in other code paths > will > >> >> > be much larger than overhead caused by this structure limitation. > >> >> > >> >> Examples? > >> >> > >> >> >> IMO, the PI feature cannot be declared as done/supported with this > bug > >> >> >> remaining. OTOH, it is fine to be experimental, and disabled by > default > >> >> >> for people who wish to experiment. > >> >> >> > >> >> > > >> >> > Again, I don't expect to see it disabled as experimental. For good > >> >> > production > >> >> > environment where vcpus
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
>>> On 10.07.15 at 09:29, wrote: >> From: Jan Beulich [mailto:jbeul...@suse.com] >> Sent: Friday, July 10, 2015 2:32 PM >> >>> On 10.07.15 at 08:21, wrote: >> >> From: Jan Beulich [mailto:jbeul...@suse.com] >> >> Sent: Thursday, July 09, 2015 3:26 PM >> >> >>> On 09.07.15 at 00:49, wrote: >> >> >> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] >> >> >> Sent: Wednesday, July 08, 2015 9:09 PM >> >> >> On 08/07/2015 13:46, Jan Beulich wrote: >> >> >> On 08.07.15 at 13:00, wrote: >> >> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table >> __initdata >> >> >> >>> vmx_function_table = { >> >> >> >>> .enable_msr_exit_interception = >> >> vmx_enable_msr_exit_interception, >> >> >> >>> }; >> >> >> >>> >> >> >> >>> +/* >> >> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked. >> >> >> >>> + */ >> >> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) >> >> >> >>> +{ >> >> >> >>> +struct arch_vmx_struct *vmx; >> >> >> >>> +unsigned int cpu = smp_processor_id(); >> >> >> >>> + >> >> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); >> >> >> >>> + >> >> >> >>> +/* >> >> >> >>> + * FIXME: The length of the list depends on how many >> >> >> >>> + * vCPU is current blocked on this specific pCPU. >> >> >> >>> + * This may hurt the interrupt latency if the list >> >> >> >>> + * grows to too many entries. >> >> >> >>> + */ >> >> >> >> let's go with this linked list first until a real issue is >> >> >> >> identified. >> >> >> > This is exactly the way of thinking I dislike when it comes to code >> >> >> > that isn't intended to be experimental only: We shouldn't wait >> >> >> > for problems to surface when we already can see them. I.e. if >> >> >> > there are no plans to deal with this, I'd ask for the feature to be >> >> >> > off by default and be properly marked experimental in the >> >> >> > command line option documentation (so people know to stay >> >> >> > away from it). 
>> >> >> >> >> >> And in this specific case, there is no balancing of vcpus across the >> >> >> pcpus lists. >> >> >> >> >> >> One can construct a pathological case using pinning and pausing to get >> >> >> almost every vcpu on a single pcpu list, and vcpus recieving fewer >> >> >> interrupts will exasperate the problem by staying on the list for >> >> >> longer >> >> >> periods of time. >> >> > >> >> > In that extreme case I believe many contentions in other code paths will >> >> > be much larger than overhead caused by this structure limitation. >> >> >> >> Examples? >> >> >> >> >> IMO, the PI feature cannot be declared as done/supported with this bug >> >> >> remaining. OTOH, it is fine to be experimental, and disabled by >> >> >> default >> >> >> for people who wish to experiment. >> >> >> >> >> > >> >> > Again, I don't expect to see it disabled as experimental. For good >> >> > production >> >> > environment where vcpus are well balanced and interrupt latency is >> >> > sensitive, >> >> > linked list should be efficient here. For bad environment like extreme > case >> >> > you raised, I don't know whether it really matters to just tune >> >> > interrupt >> >> > path. >> >> >> >> Can you _guarantee_ that everything potentially leading to such a >> >> pathological situation is covered by XSA-77? And even if it is now, >> >> removing elements from the waiver list would become significantly >> >> more difficult if disconnected behavior like this one would need to >> >> be taken into account. >> >> >> >> Please understand that history has told us to be rather more careful >> >> than might seem necessary with this: ATS originally having been >> >> enabled by default is one bold example, and the recent flood of MSI >> >> related XSAs is another; I suppose I could find more. All affecting >> >> code originating from Intel, apparently written with only functionality >> >> in mind, while having left out (other than basic) security considerations. 
>> >> IOW, with my committer role hat on, the feature is going to be
>> >> experimental (and hence default off) unless the issue here gets
>> >> addressed. And no, I cannot immediately suggest a good approach,
>> >> and with all of the rush before the feature freeze I also can't justify
>> >> taking a lot of time to think of options.
>> >
>> > Is it acceptable to you if I only add the blocked vcpus that have
>> > assigned devices to the list? I think that should shorten the
>> > length of the list.
>>
>> I actually implied this to be the case already, i.e.
>> - if it's not, this needs to be fixed anyway,
>> - it's not going to eliminate the concern (just think of a couple of
>>   many-vCPU guests all having devices assigned).
>
> So how about allocating multiple wakeup vectors (say, 16, maybe
> we can make this configurable) and multiplexing them amongst all the
> blocked vCPUs?

For such an approach to be effective, you'd need to know up front how
many vCPU-s you may need to deal with, or allocate vect
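The multiplexed-vector idea floated above can be sketched roughly as follows. This is a toy user-space model, not Xen code: every name in it (`toy_vcpu`, `pi_pick_vector`, and so on) is invented for illustration. It also makes the objection concrete: the shard count (here 16) has to be fixed up front, regardless of how many vCPUs later block.

```c
/*
 * Hypothetical sketch of the "multiple wakeup vectors" idea from the
 * thread: instead of one per-pCPU list served by a single vector,
 * shard blocked vCPUs across NR_PI_WAKEUP_VECTORS lists, one vector
 * per shard, so each interrupt only walks its own (shorter) list.
 * Toy single-threaded model -- none of these names exist in Xen.
 */

#define NR_PI_WAKEUP_VECTORS 16

struct toy_vcpu {
    unsigned int id;
    int notification_pending;   /* models the PI descriptor "ON" bit */
    struct toy_vcpu *next;      /* singly linked shard list */
};

static struct toy_vcpu *blocked_shard[NR_PI_WAKEUP_VECTORS];

/* Pick a shard (and thus a wakeup vector) for a vCPU about to block. */
static unsigned int pi_pick_vector(const struct toy_vcpu *v)
{
    return v->id % NR_PI_WAKEUP_VECTORS;
}

static void pi_block(struct toy_vcpu *v)
{
    unsigned int s = pi_pick_vector(v);

    v->next = blocked_shard[s];
    blocked_shard[s] = v;
}

/*
 * Handler for one vector: walk only that shard, unlinking and waking
 * vCPUs with a pending notification.  Returns how many were woken.
 */
static unsigned int pi_wakeup_one_vector(unsigned int s)
{
    unsigned int woken = 0;
    struct toy_vcpu **pv = &blocked_shard[s];

    while ( *pv )
    {
        struct toy_vcpu *v = *pv;

        if ( v->notification_pending )
        {
            *pv = v->next;              /* unlink == "unblock" */
            v->notification_pending = 0;
            ++woken;
        }
        else
            pv = &v->next;
    }
    return woken;
}
```

With N vectors the worst-case scan shrinks by roughly a factor of N, but a pathological pinning setup can still concentrate all vCPUs whose ids hash alike onto one shard, so this only dilutes the concern rather than removing it.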
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Jan Beulich [mailto:jbeul...@suse.com] > Sent: Friday, July 10, 2015 2:32 PM > To: Wu, Feng > Cc: Andrew Cooper; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; > xen-devel@lists.xen.org; k...@xen.org > Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU > is blocked > > >>> On 10.07.15 at 08:21, wrote: > > > > >> -Original Message- > >> From: Jan Beulich [mailto:jbeul...@suse.com] > >> Sent: Thursday, July 09, 2015 3:26 PM > >> To: Wu, Feng; Tian, Kevin > >> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Zhang, Yang Z; > >> xen-devel@lists.xen.org; k...@xen.org > >> Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when > vCPU > >> is blocked > >> > >> >>> On 09.07.15 at 00:49, wrote: > >> >> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > >> >> Sent: Wednesday, July 08, 2015 9:09 PM > >> >> On 08/07/2015 13:46, Jan Beulich wrote: > >> >> >>>> On 08.07.15 at 13:00, wrote: > >> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table > __initdata > >> >> >>> vmx_function_table = { > >> >> >>> .enable_msr_exit_interception = > >> vmx_enable_msr_exit_interception, > >> >> >>> }; > >> >> >>> > >> >> >>> +/* > >> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked. > >> >> >>> + */ > >> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > >> >> >>> +{ > >> >> >>> +struct arch_vmx_struct *vmx; > >> >> >>> +unsigned int cpu = smp_processor_id(); > >> >> >>> + > >> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > >> >> >>> + > >> >> >>> +/* > >> >> >>> + * FIXME: The length of the list depends on how many > >> >> >>> + * vCPU is current blocked on this specific pCPU. > >> >> >>> + * This may hurt the interrupt latency if the list > >> >> >>> + * grows to too many entries. > >> >> >>> + */ > >> >> >> let's go with this linked list first until a real issue is > >> >> >> identified. 
> >> >> > This is exactly the way of thinking I dislike when it comes to code > >> >> > that isn't intended to be experimental only: We shouldn't wait > >> >> > for problems to surface when we already can see them. I.e. if > >> >> > there are no plans to deal with this, I'd ask for the feature to be > >> >> > off by default and be properly marked experimental in the > >> >> > command line option documentation (so people know to stay > >> >> > away from it). > >> >> > >> >> And in this specific case, there is no balancing of vcpus across the > >> >> pcpus lists. > >> >> > >> >> One can construct a pathological case using pinning and pausing to get > >> >> almost every vcpu on a single pcpu list, and vcpus recieving fewer > >> >> interrupts will exasperate the problem by staying on the list for longer > >> >> periods of time. > >> > > >> > In that extreme case I believe many contentions in other code paths will > >> > be much larger than overhead caused by this structure limitation. > >> > >> Examples? > >> > >> >> IMO, the PI feature cannot be declared as done/supported with this bug > >> >> remaining. OTOH, it is fine to be experimental, and disabled by default > >> >> for people who wish to experiment. > >> >> > >> > > >> > Again, I don't expect to see it disabled as experimental. For good > >> > production > >> > environment where vcpus are well balanced and interrupt latency is > >> > sensitive, > >> > linked list should be efficient here. For bad environment like extreme > >> > case > >> > you raised, I don't know whether it really matters to just tune interrupt > >> > path. > >> > >> Can you _guarantee_ that everything potentially leading to such a
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
>>> On 10.07.15 at 08:21, wrote: > >> -Original Message- >> From: Jan Beulich [mailto:jbeul...@suse.com] >> Sent: Thursday, July 09, 2015 3:26 PM >> To: Wu, Feng; Tian, Kevin >> Cc: Andrew Cooper; george.dun...@eu.citrix.com; Zhang, Yang Z; >> xen-devel@lists.xen.org; k...@xen.org >> Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU >> is blocked >> >> >>> On 09.07.15 at 00:49, wrote: >> >> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] >> >> Sent: Wednesday, July 08, 2015 9:09 PM >> >> On 08/07/2015 13:46, Jan Beulich wrote: >> >> >>>> On 08.07.15 at 13:00, wrote: >> >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata >> >> >>> vmx_function_table = { >> >> >>> .enable_msr_exit_interception = >> vmx_enable_msr_exit_interception, >> >> >>> }; >> >> >>> >> >> >>> +/* >> >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked. >> >> >>> + */ >> >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) >> >> >>> +{ >> >> >>> +struct arch_vmx_struct *vmx; >> >> >>> +unsigned int cpu = smp_processor_id(); >> >> >>> + >> >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); >> >> >>> + >> >> >>> +/* >> >> >>> + * FIXME: The length of the list depends on how many >> >> >>> + * vCPU is current blocked on this specific pCPU. >> >> >>> + * This may hurt the interrupt latency if the list >> >> >>> + * grows to too many entries. >> >> >>> + */ >> >> >> let's go with this linked list first until a real issue is identified. >> >> > This is exactly the way of thinking I dislike when it comes to code >> >> > that isn't intended to be experimental only: We shouldn't wait >> >> > for problems to surface when we already can see them. I.e. if >> >> > there are no plans to deal with this, I'd ask for the feature to be >> >> > off by default and be properly marked experimental in the >> >> > command line option documentation (so people know to stay >> >> > away from it). 
>> >> >> >> And in this specific case, there is no balancing of vcpus across the >> >> pcpus lists. >> >> >> >> One can construct a pathological case using pinning and pausing to get >> >> almost every vcpu on a single pcpu list, and vcpus recieving fewer >> >> interrupts will exasperate the problem by staying on the list for longer >> >> periods of time. >> > >> > In that extreme case I believe many contentions in other code paths will >> > be much larger than overhead caused by this structure limitation. >> >> Examples? >> >> >> IMO, the PI feature cannot be declared as done/supported with this bug >> >> remaining. OTOH, it is fine to be experimental, and disabled by default >> >> for people who wish to experiment. >> >> >> > >> > Again, I don't expect to see it disabled as experimental. For good >> > production >> > environment where vcpus are well balanced and interrupt latency is >> > sensitive, >> > linked list should be efficient here. For bad environment like extreme case >> > you raised, I don't know whether it really matters to just tune interrupt >> > path. >> >> Can you _guarantee_ that everything potentially leading to such a >> pathological situation is covered by XSA-77? And even if it is now, >> removing elements from the waiver list would become significantly >> more difficult if disconnected behavior like this one would need to >> be taken into account. >> >> Please understand that history has told us to be rather more careful >> than might seem necessary with this: ATS originally having been >> enabled by default is one bold example, and the recent flood of MSI >> related XSAs is another; I suppose I could find more. All affecting >> code originating from Intel, apparently written with only functionality >> in mind, while having left out (other than basic) security considerations. >> >> IOW, with my committer role hat on, the feature is going to be >> experimental (and hence default off) unless the issue here gets >> addressed. 
>> And no, I cannot immediately suggest a good approach,
>> and with all of the rush before the feature freeze I also can't justify
>> taking a lot of time to think of options.
>
> Is it acceptable to you if I only add the blocked vcpus that have
> assigned devices to the list? I think that should shorten the
> length of the list.

I actually implied this to be the case already, i.e.
- if it's not, this needs to be fixed anyway,
- it's not going to eliminate the concern (just think of a couple of
  many-vCPU guests all having devices assigned).

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Jan Beulich [mailto:jbeul...@suse.com] > Sent: Thursday, July 09, 2015 3:26 PM > To: Wu, Feng; Tian, Kevin > Cc: Andrew Cooper; george.dun...@eu.citrix.com; Zhang, Yang Z; > xen-devel@lists.xen.org; k...@xen.org > Subject: RE: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU > is blocked > > >>> On 09.07.15 at 00:49, wrote: > >> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > >> Sent: Wednesday, July 08, 2015 9:09 PM > >> On 08/07/2015 13:46, Jan Beulich wrote: > >> >>>> On 08.07.15 at 13:00, wrote: > >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata > >> >>> vmx_function_table = { > >> >>> .enable_msr_exit_interception = > vmx_enable_msr_exit_interception, > >> >>> }; > >> >>> > >> >>> +/* > >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked. > >> >>> + */ > >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > >> >>> +{ > >> >>> +struct arch_vmx_struct *vmx; > >> >>> +unsigned int cpu = smp_processor_id(); > >> >>> + > >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > >> >>> + > >> >>> +/* > >> >>> + * FIXME: The length of the list depends on how many > >> >>> + * vCPU is current blocked on this specific pCPU. > >> >>> + * This may hurt the interrupt latency if the list > >> >>> + * grows to too many entries. > >> >>> + */ > >> >> let's go with this linked list first until a real issue is identified. > >> > This is exactly the way of thinking I dislike when it comes to code > >> > that isn't intended to be experimental only: We shouldn't wait > >> > for problems to surface when we already can see them. I.e. if > >> > there are no plans to deal with this, I'd ask for the feature to be > >> > off by default and be properly marked experimental in the > >> > command line option documentation (so people know to stay > >> > away from it). > >> > >> And in this specific case, there is no balancing of vcpus across the > >> pcpus lists. 
> >> > >> One can construct a pathological case using pinning and pausing to get > >> almost every vcpu on a single pcpu list, and vcpus recieving fewer > >> interrupts will exasperate the problem by staying on the list for longer > >> periods of time. > > > > In that extreme case I believe many contentions in other code paths will > > be much larger than overhead caused by this structure limitation. > > Examples? > > >> IMO, the PI feature cannot be declared as done/supported with this bug > >> remaining. OTOH, it is fine to be experimental, and disabled by default > >> for people who wish to experiment. > >> > > > > Again, I don't expect to see it disabled as experimental. For good > > production > > environment where vcpus are well balanced and interrupt latency is > > sensitive, > > linked list should be efficient here. For bad environment like extreme case > > you raised, I don't know whether it really matters to just tune interrupt > > path. > > Can you _guarantee_ that everything potentially leading to such a > pathological situation is covered by XSA-77? And even if it is now, > removing elements from the waiver list would become significantly > more difficult if disconnected behavior like this one would need to > be taken into account. > > Please understand that history has told us to be rather more careful > than might seem necessary with this: ATS originally having been > enabled by default is one bold example, and the recent flood of MSI > related XSAs is another; I suppose I could find more. All affecting > code originating from Intel, apparently written with only functionality > in mind, while having left out (other than basic) security considerations. > > IOW, with my committer role hat on, the feature is going to be > experimental (and hence default off) unless the issue here gets > addressed. 
> And no, I cannot immediately suggest a good approach,
> and with all of the rush before the feature freeze I also can't justify
> taking a lot of time to think of options.

Is it acceptable to you if I only add the blocked vcpus that have
assigned devices to the list? I think that should shorten the
length of the list.

Thanks,
Feng

> Jan
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
>>> On 09.07.15 at 00:49, wrote: >> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] >> Sent: Wednesday, July 08, 2015 9:09 PM >> On 08/07/2015 13:46, Jan Beulich wrote: >> On 08.07.15 at 13:00, wrote: >> >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata >> >>> vmx_function_table = { >> >>> .enable_msr_exit_interception = vmx_enable_msr_exit_interception, >> >>> }; >> >>> >> >>> +/* >> >>> + * Handle VT-d posted-interrupt when VCPU is blocked. >> >>> + */ >> >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) >> >>> +{ >> >>> +struct arch_vmx_struct *vmx; >> >>> +unsigned int cpu = smp_processor_id(); >> >>> + >> >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); >> >>> + >> >>> +/* >> >>> + * FIXME: The length of the list depends on how many >> >>> + * vCPU is current blocked on this specific pCPU. >> >>> + * This may hurt the interrupt latency if the list >> >>> + * grows to too many entries. >> >>> + */ >> >> let's go with this linked list first until a real issue is identified. >> > This is exactly the way of thinking I dislike when it comes to code >> > that isn't intended to be experimental only: We shouldn't wait >> > for problems to surface when we already can see them. I.e. if >> > there are no plans to deal with this, I'd ask for the feature to be >> > off by default and be properly marked experimental in the >> > command line option documentation (so people know to stay >> > away from it). >> >> And in this specific case, there is no balancing of vcpus across the >> pcpus lists. >> >> One can construct a pathological case using pinning and pausing to get >> almost every vcpu on a single pcpu list, and vcpus recieving fewer >> interrupts will exasperate the problem by staying on the list for longer >> periods of time. > > In that extreme case I believe many contentions in other code paths will > be much larger than overhead caused by this structure limitation. Examples? 
>> IMO, the PI feature cannot be declared as done/supported with this bug
>> remaining. OTOH, it is fine to be experimental, and disabled by default
>> for people who wish to experiment.
>
> Again, I don't expect to see it disabled as experimental. For good
> production environment where vcpus are well balanced and interrupt
> latency is sensitive, linked list should be efficient here. For bad
> environment like extreme case you raised, I don't know whether it
> really matters to just tune interrupt path.

Can you _guarantee_ that everything potentially leading to such a
pathological situation is covered by XSA-77? And even if it is now,
removing elements from the waiver list would become significantly
more difficult if disconnected behavior like this one would need to
be taken into account.

Please understand that history has told us to be rather more careful
than might seem necessary with this: ATS originally having been
enabled by default is one bold example, and the recent flood of MSI
related XSAs is another; I suppose I could find more. All affecting
code originating from Intel, apparently written with only functionality
in mind, while having left out (other than basic) security considerations.

IOW, with my committer role hat on, the feature is going to be
experimental (and hence default off) unless the issue here gets
addressed. And no, I cannot immediately suggest a good approach,
and with all of the rush before the feature freeze I also can't justify
taking a lot of time to think of options.

Jan
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > Sent: Wednesday, July 08, 2015 9:09 PM > > On 08/07/2015 13:46, Jan Beulich wrote: > On 08.07.15 at 13:00, wrote: > >>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata > >>> vmx_function_table = { > >>> .enable_msr_exit_interception = vmx_enable_msr_exit_interception, > >>> }; > >>> > >>> +/* > >>> + * Handle VT-d posted-interrupt when VCPU is blocked. > >>> + */ > >>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > >>> +{ > >>> +struct arch_vmx_struct *vmx; > >>> +unsigned int cpu = smp_processor_id(); > >>> + > >>> +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > >>> + > >>> +/* > >>> + * FIXME: The length of the list depends on how many > >>> + * vCPU is current blocked on this specific pCPU. > >>> + * This may hurt the interrupt latency if the list > >>> + * grows to too many entries. > >>> + */ > >> let's go with this linked list first until a real issue is identified. > > This is exactly the way of thinking I dislike when it comes to code > > that isn't intended to be experimental only: We shouldn't wait > > for problems to surface when we already can see them. I.e. if > > there are no plans to deal with this, I'd ask for the feature to be > > off by default and be properly marked experimental in the > > command line option documentation (so people know to stay > > away from it). > > And in this specific case, there is no balancing of vcpus across the > pcpus lists. > > One can construct a pathological case using pinning and pausing to get > almost every vcpu on a single pcpu list, and vcpus recieving fewer > interrupts will exasperate the problem by staying on the list for longer > periods of time. In that extreme case I believe many contentions in other code paths will be much larger than overhead caused by this structure limitation. > > IMO, the PI feature cannot be declared as done/supported with this bug > remaining. 
> OTOH, it is fine to be experimental, and disabled by default
> for people who wish to experiment.

Again, I don't expect to see it disabled as experimental. For good
production environment where vcpus are well balanced and interrupt
latency is sensitive, linked list should be efficient here. For bad
environment like extreme case you raised, I don't know whether it
really matters to just tune interrupt path.

Thanks
Kevin
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, July 08, 2015 8:46 PM
>
> >>> On 08.07.15 at 13:00, wrote:
> >> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
> >> vmx_function_table = {
> >>     .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> >> };
> >>
> >> +/*
> >> + * Handle VT-d posted-interrupt when VCPU is blocked.
> >> + */
> >> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >> +{
> >> +    struct arch_vmx_struct *vmx;
> >> +    unsigned int cpu = smp_processor_id();
> >> +
> >> +    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
> >> +
> >> +    /*
> >> +     * FIXME: The length of the list depends on how many
> >> +     * vCPU is current blocked on this specific pCPU.
> >> +     * This may hurt the interrupt latency if the list
> >> +     * grows to too many entries.
> >> +     */
> >
> > let's go with this linked list first until a real issue is identified.
>
> This is exactly the way of thinking I dislike when it comes to code
> that isn't intended to be experimental only: We shouldn't wait
> for problems to surface when we already can see them. I.e. if
> there are no plans to deal with this, I'd ask for the feature to be
> off by default and be properly marked experimental in the
> command line option documentation (so people know to stay
> away from it).

I don't see a big problem here. For a typical server consolidation
ratio of 1:4 or 1:8, the linked list should be short.

It's not experimental; it's based on our judgment that a linked list
should be fine here. But any structure may have potential problems we
don't know about yet, just like the tasklet scalability issue Andrew
raised earlier (in that case we can't blame anyone for using tasklets
before the issue was reported). So there's no plan to address this,
since we don't see a real problem now. I'd suggest Feng remove the
comment, since FIXME implies something experimental. Or if you have a
better suggestion, please articulate it instead of asking people to
stay away from it.

Thanks
Kevin
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 08/07/2015 13:46, Jan Beulich wrote:
>>>> On 08.07.15 at 13:00, wrote:
>>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
>>> vmx_function_table = {
>>>     .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>>> };
>>>
>>> +/*
>>> + * Handle VT-d posted-interrupt when VCPU is blocked.
>>> + */
>>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>>> +{
>>> +    struct arch_vmx_struct *vmx;
>>> +    unsigned int cpu = smp_processor_id();
>>> +
>>> +    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
>>> +
>>> +    /*
>>> +     * FIXME: The length of the list depends on how many
>>> +     * vCPU is current blocked on this specific pCPU.
>>> +     * This may hurt the interrupt latency if the list
>>> +     * grows to too many entries.
>>> +     */
>> let's go with this linked list first until a real issue is identified.
> This is exactly the way of thinking I dislike when it comes to code
> that isn't intended to be experimental only: We shouldn't wait
> for problems to surface when we already can see them. I.e. if
> there are no plans to deal with this, I'd ask for the feature to be
> off by default and be properly marked experimental in the
> command line option documentation (so people know to stay
> away from it).

And in this specific case, there is no balancing of vcpus across the
pcpus lists.

One can construct a pathological case using pinning and pausing to get
almost every vcpu on a single pcpu list, and vcpus receiving fewer
interrupts will exacerbate the problem by staying on the list for longer
periods of time.

IMO, the PI feature cannot be declared as done/supported with this bug
remaining. OTOH, it is fine to be experimental, and disabled by default
for people who wish to experiment.

~Andrew
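The pathological case described above is easy to quantify with a toy model: the wakeup handler's cost is proportional to the length of the per-pCPU list it walks under the spinlock, and pinning can make one list absorb every blocked vCPU. The helper below is purely illustrative; no such function exists in Xen.

```c
#define NR_PCPUS 4

/*
 * Longest per-pCPU blocked list produced by a given vcpu -> pcpu
 * assignment.  The wakeup interrupt on the busiest pCPU must
 * traverse this many entries while holding its spinlock, so this is
 * a proxy for worst-case handler latency.  Illustrative model only.
 */
static unsigned int max_list_len(const unsigned int *pcpu_of_vcpu,
                                 unsigned int nr_vcpus)
{
    unsigned int len[NR_PCPUS] = { 0 };
    unsigned int i, max = 0;

    /* Count how many blocked vCPUs land on each pCPU's list. */
    for ( i = 0; i < nr_vcpus; i++ )
        ++len[pcpu_of_vcpu[i]];

    for ( i = 0; i < NR_PCPUS; i++ )
        if ( len[i] > max )
            max = len[i];

    return max;
}
```

With 8 blocked vCPUs spread evenly over 4 pCPUs no list exceeds 2 entries, but pinning all 8 to one pCPU gives a single 8-entry list, and nothing in the scheme itself bounds that growth.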
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
>>> On 08.07.15 at 13:00, wrote:
>> @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
>> vmx_function_table = {
>>     .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
>> };
>>
>> +/*
>> + * Handle VT-d posted-interrupt when VCPU is blocked.
>> + */
>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>> +{
>> +    struct arch_vmx_struct *vmx;
>> +    unsigned int cpu = smp_processor_id();
>> +
>> +    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
>> +
>> +    /*
>> +     * FIXME: The length of the list depends on how many
>> +     * vCPU is current blocked on this specific pCPU.
>> +     * This may hurt the interrupt latency if the list
>> +     * grows to too many entries.
>> +     */
>
> let's go with this linked list first until a real issue is identified.

This is exactly the way of thinking I dislike when it comes to code
that isn't intended to be experimental only: We shouldn't wait
for problems to surface when we already can see them. I.e. if
there are no plans to deal with this, I'd ask for the feature to be
off by default and be properly marked experimental in the
command line option documentation (so people know to stay
away from it).

Jan
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Tian, Kevin > Sent: Wednesday, July 08, 2015 7:00 PM > To: Wu, Feng; xen-devel@lists.xen.org > Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, > Yang Z; george.dun...@eu.citrix.com > Subject: RE: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked > > > From: Wu, Feng > > Sent: Wednesday, June 24, 2015 1:18 PM > > > > This patch includes the following aspects: > > - Add a global vector to wake up the blocked vCPU > > when an interrupt is being posted to it (This > > part was sugguested by Yang Zhang ). > > - Adds a new per-vCPU tasklet to wakeup the blocked > > vCPU. It can be used in the case vcpu_unblock > > cannot be called directly. > > - Define two per-cpu variables: > > * pi_blocked_vcpu: > > A list storing the vCPUs which were blocked on this pCPU. > > > > * pi_blocked_vcpu_lock: > > The spinlock to protect pi_blocked_vcpu. > > > > Signed-off-by: Feng Wu > > --- > > v3: > > - This patch is generated by merging the following three patches in v2: > >[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU > >[RFC v2 10/15] vmx: Define two per-cpu variables > >[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d > Posted-Interrupts > > - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet' > > - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct > > arch_vmx_struct' > > - rename 'vcpu_wakeup_tasklet_handler' to > 'pi_vcpu_wakeup_tasklet_handler' > > - Make pi_wakeup_interrupt() static > > - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list' > > - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct' > > - Rename 'blocked_vcpu' to 'pi_blocked_vcpu' > > - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock' > > > > xen/arch/x86/hvm/vmx/vmcs.c| 3 +++ > > xen/arch/x86/hvm/vmx/vmx.c | 54 > > ++ > > xen/include/asm-x86/hvm/hvm.h | 1 + > > xen/include/asm-x86/hvm/vmx/vmcs.h | 5 > > xen/include/asm-x86/hvm/vmx/vmx.h | 5 > > 5 files changed, 68 insertions(+) > > > > diff --git 
a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c > > index 11dc1b5..0c5ce3f 100644 > > --- a/xen/arch/x86/hvm/vmx/vmcs.c > > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > > @@ -631,6 +631,9 @@ int vmx_cpu_up(void) > > if ( cpu_has_vmx_vpid ) > > vpid_sync_all(); > > > > +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu)); > > +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu)); > > + > > return 0; > > } > > > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > > index b94ef6a..7db6009 100644 > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, > uint64_t > > *msr_content); > > static int vmx_msr_write_intercept(unsigned int msr, uint64_t > msr_content); > > static void vmx_invlpg_intercept(unsigned long vaddr); > > > > +/* > > + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we > > + * can find which vCPU should be waken up. > > + */ > > +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu); > > +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock); > > + > > uint8_t __read_mostly posted_intr_vector; > > +uint8_t __read_mostly pi_wakeup_vector; > > + > > +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg) > > +{ > > +vcpu_unblock((struct vcpu *)arg); > > +} > > > > static int vmx_domain_initialise(struct domain *d) > > { > > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v) > > if ( v->vcpu_id == 0 ) > > v->arch.user_regs.eax = 1; > > > > +tasklet_init( > > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet, > > +pi_vcpu_wakeup_tasklet_handler, > > +(unsigned long)v); > > + > > +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list); > > + > > return 0; > > } > > > > static void vmx_vcpu_destroy(struct vcpu *v) > > { > > +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet); > > /* > > * There are cases that domain still remains in log-dirty mode when it > is > > * about to be destroyed (ex, user types 'xl destroy '), 
in which > case > > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata > > vmx_function_table = { > > .enable_msr_exit_interception = vmx_enable_msr_exit_interception, > > }; > > > > +/* > > + * Handle VT-d posted-interrupt when VCPU is blocked. > > + */ > > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > > +{ > > +struct arch_vmx_struct *vmx; > > +unsigned int cpu = smp_processor_id(); > > + > > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > > + > > +/* > > + * FIXME: The length of the list depends on how many > > + * vCPU is current blocked on this specific pCPU. > > + * This may hurt the interrupt latency if the list > > + * grows to too many entries. > > + */ >
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> From: Wu, Feng > Sent: Wednesday, June 24, 2015 1:18 PM > > This patch includes the following aspects: > - Add a global vector to wake up the blocked vCPU > when an interrupt is being posted to it (This > part was sugguested by Yang Zhang ). > - Adds a new per-vCPU tasklet to wakeup the blocked > vCPU. It can be used in the case vcpu_unblock > cannot be called directly. > - Define two per-cpu variables: > * pi_blocked_vcpu: > A list storing the vCPUs which were blocked on this pCPU. > > * pi_blocked_vcpu_lock: > The spinlock to protect pi_blocked_vcpu. > > Signed-off-by: Feng Wu > --- > v3: > - This patch is generated by merging the following three patches in v2: >[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU >[RFC v2 10/15] vmx: Define two per-cpu variables >[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts > - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet' > - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct' > - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler' > - Make pi_wakeup_interrupt() static > - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list' > - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct' > - Rename 'blocked_vcpu' to 'pi_blocked_vcpu' > - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock' > > xen/arch/x86/hvm/vmx/vmcs.c| 3 +++ > xen/arch/x86/hvm/vmx/vmx.c | 54 > ++ > xen/include/asm-x86/hvm/hvm.h | 1 + > xen/include/asm-x86/hvm/vmx/vmcs.h | 5 > xen/include/asm-x86/hvm/vmx/vmx.h | 5 > 5 files changed, 68 insertions(+) > > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c > index 11dc1b5..0c5ce3f 100644 > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -631,6 +631,9 @@ int vmx_cpu_up(void) > if ( cpu_has_vmx_vpid ) > vpid_sync_all(); > > +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu)); > +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu)); > + > return 0; > } > > diff --git 
a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > index b94ef6a..7db6009 100644 > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, > uint64_t > *msr_content); > static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content); > static void vmx_invlpg_intercept(unsigned long vaddr); > > +/* > + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we > + * can find which vCPU should be waken up. > + */ > +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu); > +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock); > + > uint8_t __read_mostly posted_intr_vector; > +uint8_t __read_mostly pi_wakeup_vector; > + > +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg) > +{ > +vcpu_unblock((struct vcpu *)arg); > +} > > static int vmx_domain_initialise(struct domain *d) > { > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v) > if ( v->vcpu_id == 0 ) > v->arch.user_regs.eax = 1; > > +tasklet_init( > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet, > +pi_vcpu_wakeup_tasklet_handler, > +(unsigned long)v); > + > +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list); > + > return 0; > } > > static void vmx_vcpu_destroy(struct vcpu *v) > { > +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet); > /* > * There are cases that domain still remains in log-dirty mode when it is > * about to be destroyed (ex, user types 'xl destroy '), in which > case > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata > vmx_function_table = { > .enable_msr_exit_interception = vmx_enable_msr_exit_interception, > }; > > +/* > + * Handle VT-d posted-interrupt when VCPU is blocked. 
> + */ > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > +{ > +struct arch_vmx_struct *vmx; > +unsigned int cpu = smp_processor_id(); > + > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > + > +/* > + * FIXME: The length of the list depends on how many > + * vCPU is current blocked on this specific pCPU. > + * This may hurt the interrupt latency if the list > + * grows to too many entries. > + */ let's go with this linked list first until a real issue is identified. > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), > +pi_blocked_vcpu_list) > +if ( vmx->pi_desc.on ) > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); Not sure where the vcpu is removed from the list (possibly in later patch). But at least removing vcpu from the list at this point should be safe and right way to go. IIRC Andrew and other guys raised similar concern earlier. :-) Thanks Kevin
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
>>> On 08.07.15 at 12:36, wrote: >> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] >> Sent: Tuesday, June 30, 2015 1:07 AM >> On 24/06/15 06:18, Feng Wu wrote: >> > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v) >> > if ( v->vcpu_id == 0 ) >> > v->arch.user_regs.eax = 1; >> > >> > +tasklet_init( >> > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet, >> > +pi_vcpu_wakeup_tasklet_handler, >> > +(unsigned long)v); >> >> c/s f6dd295 indicates that the global tasklet lock causes a bottleneck >> when injecting interrupts, and replaced a tasklet with a softirq to fix >> the scalability issue. >> >> I would expect exactly the bottleneck to exist here. > > I am still considering this comments. Jan, what is your opinion about this? "My opinion" here is that I expect you to respond to Andrew. Jan
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > Sent: Tuesday, June 30, 2015 1:07 AM > To: Wu, Feng; xen-devel@lists.xen.org > Cc: k...@xen.org; jbeul...@suse.com; Tian, Kevin; Zhang, Yang Z; > george.dun...@eu.citrix.com > Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked > > On 24/06/15 06:18, Feng Wu wrote: > > This patch includes the following aspects: > > - Add a global vector to wake up the blocked vCPU > > when an interrupt is being posted to it (This > > part was sugguested by Yang Zhang ). > > - Adds a new per-vCPU tasklet to wakeup the blocked > > vCPU. It can be used in the case vcpu_unblock > > cannot be called directly. > > - Define two per-cpu variables: > > * pi_blocked_vcpu: > > A list storing the vCPUs which were blocked on this pCPU. > > > > * pi_blocked_vcpu_lock: > > The spinlock to protect pi_blocked_vcpu. > > > > Signed-off-by: Feng Wu > > --- > > v3: > > - This patch is generated by merging the following three patches in v2: > >[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU > >[RFC v2 10/15] vmx: Define two per-cpu variables > >[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d > Posted-Interrupts > > - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet' > > - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct > > arch_vmx_struct' > > - rename 'vcpu_wakeup_tasklet_handler' to > 'pi_vcpu_wakeup_tasklet_handler' > > - Make pi_wakeup_interrupt() static > > - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list' > > - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct' > > - Rename 'blocked_vcpu' to 'pi_blocked_vcpu' > > - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock' > > > > xen/arch/x86/hvm/vmx/vmcs.c| 3 +++ > > xen/arch/x86/hvm/vmx/vmx.c | 54 > ++ > > xen/include/asm-x86/hvm/hvm.h | 1 + > > xen/include/asm-x86/hvm/vmx/vmcs.h | 5 > > xen/include/asm-x86/hvm/vmx/vmx.h | 5 > > 5 files changed, 68 insertions(+) > > > > diff --git 
a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c > > index 11dc1b5..0c5ce3f 100644 > > --- a/xen/arch/x86/hvm/vmx/vmcs.c > > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > > @@ -631,6 +631,9 @@ int vmx_cpu_up(void) > > if ( cpu_has_vmx_vpid ) > > vpid_sync_all(); > > > > +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu)); > > +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu)); > > + > > return 0; > > } > > > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > > index b94ef6a..7db6009 100644 > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, > uint64_t *msr_content); > > static int vmx_msr_write_intercept(unsigned int msr, uint64_t > msr_content); > > static void vmx_invlpg_intercept(unsigned long vaddr); > > > > +/* > > + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we > > + * can find which vCPU should be waken up. > > + */ > > +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu); > > +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock); > > + > > uint8_t __read_mostly posted_intr_vector; > > +uint8_t __read_mostly pi_wakeup_vector; > > + > > +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg) > > +{ > > +vcpu_unblock((struct vcpu *)arg); > > +} > > > > static int vmx_domain_initialise(struct domain *d) > > { > > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v) > > if ( v->vcpu_id == 0 ) > > v->arch.user_regs.eax = 1; > > > > +tasklet_init( > > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet, > > +pi_vcpu_wakeup_tasklet_handler, > > +(unsigned long)v); > > c/s f6dd295 indicates that the global tasklet lock causes a bottleneck > when injecting interrupts, and replaced a tasklet with a softirq to fix > the scalability issue. > > I would expect exactly the bottleneck to exist here. I am still considering this comments. Jan, what is your opinion about this? 
Thanks, Feng > > > + > > +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list); > > + > > return 0; > > } > > > > static void vmx_vcpu_destroy(struct vcpu *v) > > { > > +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet); > > /* > > * There are cases that domain still remains in log-dirty mode when it > is > > * about to be destroyed (ex, user types 'xl destroy '), in which > case > > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata > vmx_function_table = { > > .enable_msr_exit_interception = vmx_enable_msr_exit_interception, > > }; > > > > +/* > > + * Handle VT-d posted-interrupt when VCPU is blocked. > > + */ > > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > > +{ > > +struct arch_vmx_struct *vmx; > > +unsigned int cpu = smp_processor_id(); > > +
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > Sent: Thursday, July 02, 2015 9:00 PM > To: Dario Faggioli > Cc: Wu, Feng; Tian, Kevin; k...@xen.org; george.dun...@eu.citrix.com; > xen-devel@lists.xen.org; jbeul...@suse.com; Zhang, Yang Z > Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU > is blocked > > On 02/07/15 13:38, Dario Faggioli wrote: > > On Thu, 2015-07-02 at 13:16 +0100, Andrew Cooper wrote: > >> On 02/07/15 13:04, Dario Faggioli wrote: > >>> On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: > >>>> I can't currently decide whether this will be quicker or slower overall, > >>>> or (most likely) it will even out to equal in the general case. > >>>> > >>> Well, given the thing works as you (two) just described, I think > >>> draining the list is the only thing we can do. > >>> > >>> In fact, AFAICT, since we can't know for what vcpu a particular > >>> notification is intended, we don't have alternatives to waking them all, > >>> do we? > >> Perhaps you misunderstand. > >> > > I'm quite sure I was. While I think now I'm getting it. > > > >> Every single vcpu has a PI descriptor which is shared memory with > hardware. > >> > > Right. > > > >> A NV is delivered strictly when hardware atomically changes desc.on from > >> 0 to 1. i.e. the first time that an oustanding notification arrives. > >> (iirc, desc.on is later cleared by hardware when the vcpu is scheduled > >> and the vector(s) actually injected.) > >> > >> Part of the scheduling modifications alter when a vcpu is eligible to > >> have NV's delivered on its behalf. non-scheduled vcpus get NV's while > >> scheduled vcpus have direct injection instead. > >> > > Blocked vcpus, AFAICT. But that's not relevant here. 
> > > >> Therefore, in the case that an NV arrives, we know for certain that one > >> of the NV-eligible vcpus has had desc.on set by hardware, and we can > >> uniquely identify it by searching for the vcpu for which desc.on is set. > >> > > Yeah, but we ca have more than one of them. You said "I can't currently > > decide whether this will be quicker or slower", which I read like you > > were suggesting that not draining the queue was a plausible alternative, > > while I now think it's not. > > > > Perhaps you were not meaning anything like that, so it was not necessary > > for me to point this out, in which case, sorry for the noise. :-) > > To be clear, (assuming that a kicked vcpu is removed from the list), > then both options of kicking exactly one vcpu or kicking all vcpus will > function. The end result after all processing of NVs will be that every > vcpu with desc.on set will be kicked exactly once. We need to remove the kicked vCPU from the list, this is will be included in the next version. Thanks, Feng > > I just was concerned about the O() of searching the list on a subsequent > NV, knowing that we most likely took the relevant entry off the list on > the previous NV. > > > > >> In the case of stacked NV's, we cannot associate which specific vcpu > >> caused which NV, but we know that we will get one NV per vcpu needing > >> kicking. > >> > > Exactly, and that's what I'm talking about, and why I'm saying that > > waking everyone is the only solution. The bottom line being that, even > > in case this is deemed too slow, we don't have the option of waking only > > one vcpu at each NV, as we wouldn't know who to wake, and hence we'd > > need to make things faster in some other way. > > Ah - I see your point now. > > Yes - kicking exactly one vcpu per NV could result in a different vcpu > being deferred based on the interrupt activity of other vcpus and its > position in the list. > > In which case, we should eagerly kick all vcpus. 
> > ~Andrew
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 02/07/15 13:38, Dario Faggioli wrote: > On Thu, 2015-07-02 at 13:16 +0100, Andrew Cooper wrote: >> On 02/07/15 13:04, Dario Faggioli wrote: >>> On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: I can't currently decide whether this will be quicker or slower overall, or (most likely) it will even out to equal in the general case. >>> Well, given the thing works as you (two) just described, I think >>> draining the list is the only thing we can do. >>> >>> In fact, AFAICT, since we can't know for what vcpu a particular >>> notification is intended, we don't have alternatives to waking them all, >>> do we? >> Perhaps you misunderstand. >> > I'm quite sure I was. While I think now I'm getting it. > >> Every single vcpu has a PI descriptor which is shared memory with hardware. >> > Right. > >> A NV is delivered strictly when hardware atomically changes desc.on from >> 0 to 1. i.e. the first time that an oustanding notification arrives. >> (iirc, desc.on is later cleared by hardware when the vcpu is scheduled >> and the vector(s) actually injected.) >> >> Part of the scheduling modifications alter when a vcpu is eligible to >> have NV's delivered on its behalf. non-scheduled vcpus get NV's while >> scheduled vcpus have direct injection instead. >> > Blocked vcpus, AFAICT. But that's not relevant here. > >> Therefore, in the case that an NV arrives, we know for certain that one >> of the NV-eligible vcpus has had desc.on set by hardware, and we can >> uniquely identify it by searching for the vcpu for which desc.on is set. >> > Yeah, but we ca have more than one of them. You said "I can't currently > decide whether this will be quicker or slower", which I read like you > were suggesting that not draining the queue was a plausible alternative, > while I now think it's not. > > Perhaps you were not meaning anything like that, so it was not necessary > for me to point this out, in which case, sorry for the noise. 
:-) To be clear, (assuming that a kicked vcpu is removed from the list), then both options of kicking exactly one vcpu or kicking all vcpus will function. The end result after all processing of NVs will be that every vcpu with desc.on set will be kicked exactly once. I just was concerned about the O() of searching the list on a subsequent NV, knowing that we most likely took the relevant entry off the list on the previous NV. > >> In the case of stacked NV's, we cannot associate which specific vcpu >> caused which NV, but we know that we will get one NV per vcpu needing >> kicking. >> > Exactly, and that's what I'm talking about, and why I'm saying that > waking everyone is the only solution. The bottom line being that, even > in case this is deemed too slow, we don't have the option of waking only > one vcpu at each NV, as we wouldn't know who to wake, and hence we'd > need to make things faster in some other way. Ah - I see your point now. Yes - kicking exactly one vcpu per NV could result in a different vcpu being deferred based on the interrupt activity of other vcpus and its position in the list. In which case, we should eagerly kick all vcpus. ~Andrew
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 13:16 +0100, Andrew Cooper wrote: > On 02/07/15 13:04, Dario Faggioli wrote: > > On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: > >> I can't currently decide whether this will be quicker or slower overall, > >> or (most likely) it will even out to equal in the general case. > >> > > Well, given the thing works as you (two) just described, I think > > draining the list is the only thing we can do. > > > > In fact, AFAICT, since we can't know for what vcpu a particular > > notification is intended, we don't have alternatives to waking them all, > > do we? > > Perhaps you misunderstand. > I'm quite sure I was. While I think now I'm getting it. > Every single vcpu has a PI descriptor which is shared memory with hardware. > Right. > A NV is delivered strictly when hardware atomically changes desc.on from > 0 to 1. i.e. the first time that an oustanding notification arrives. > (iirc, desc.on is later cleared by hardware when the vcpu is scheduled > and the vector(s) actually injected.) > > Part of the scheduling modifications alter when a vcpu is eligible to > have NV's delivered on its behalf. non-scheduled vcpus get NV's while > scheduled vcpus have direct injection instead. > Blocked vcpus, AFAICT. But that's not relevant here. > Therefore, in the case that an NV arrives, we know for certain that one > of the NV-eligible vcpus has had desc.on set by hardware, and we can > uniquely identify it by searching for the vcpu for which desc.on is set. > Yeah, but we ca have more than one of them. You said "I can't currently decide whether this will be quicker or slower", which I read like you were suggesting that not draining the queue was a plausible alternative, while I now think it's not. Perhaps you were not meaning anything like that, so it was not necessary for me to point this out, in which case, sorry for the noise. 
:-) > In the case of stacked NV's, we cannot associate which specific vcpu > caused which NV, but we know that we will get one NV per vcpu needing > kicking. > Exactly, and that's what I'm talking about, and why I'm saying that waking everyone is the only solution. The bottom line being that, even in case this is deemed too slow, we don't have the option of waking only one vcpu at each NV, as we wouldn't know who to wake, and hence we'd need to make things faster in some other way. Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 02/07/15 13:04, Dario Faggioli wrote: > On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: >> On 02/07/15 09:30, Dario Faggioli wrote: >>> It is, therefore, not effective in making sure that, even with only one >>> notification, you only kick the interested vcpu. >>> >>> This is the third time that I ask: >>> (1) whether it is possible to have more vcpus queued on one pcpu PI >>> blocked list with desc.on (I really believe it is); >>> (2) if yes, whether it is TheRightThing(TM) to kick all of them, as >>> soon as any notification arrives, instead that putting together a >>> mechanism for kicking only a specific one. >> We will receive one NV for every time the hardware managed to >> successfully set desc.on >> > Right, I see it now, thanks. > >> If multiple stack up and we proactively drain the list, we will >> subsequently search the list to completion for all remaining NV's, due >> to finding no appropriate entries. >> >> I can't currently decide whether this will be quicker or slower overall, >> or (most likely) it will even out to equal in the general case. >> > Well, given the thing works as you (two) just described, I think > draining the list is the only thing we can do. > > In fact, AFAICT, since we can't know for what vcpu a particular > notification is intended, we don't have alternatives to waking them all, > do we? Perhaps you misunderstand. Every single vcpu has a PI descriptor which is shared memory with hardware. A NV is delivered strictly when hardware atomically changes desc.on from 0 to 1. i.e. the first time that an oustanding notification arrives. (iirc, desc.on is later cleared by hardware when the vcpu is scheduled and the vector(s) actually injected.) Part of the scheduling modifications alter when a vcpu is eligible to have NV's delivered on its behalf. non-scheduled vcpus get NV's while scheduled vcpus have direct injection instead. 
Therefore, in the case that an NV arrives, we know for certain that one of the NV-eligible vcpus has had desc.on set by hardware, and we can uniquely identify it by searching for the vcpu for which desc.on is set. In the case of stacked NV's, we cannot associate which specific vcpu caused which NV, but we know that we will get one NV per vcpu needing kicking. ~Andrew
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Dario Faggioli [mailto:dario.faggi...@citrix.com] > Sent: Thursday, July 02, 2015 8:04 PM > To: Andrew Cooper > Cc: Wu, Feng; Tian, Kevin; k...@xen.org; george.dun...@eu.citrix.com; > xen-devel@lists.xen.org; jbeul...@suse.com; Zhang, Yang Z > Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU > is blocked > > On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: > > On 02/07/15 09:30, Dario Faggioli wrote: > > > > It is, therefore, not effective in making sure that, even with only one > > > notification, you only kick the interested vcpu. > > > > > > This is the third time that I ask: > > > (1) whether it is possible to have more vcpus queued on one pcpu PI > > > blocked list with desc.on (I really believe it is); > > > (2) if yes, whether it is TheRightThing(TM) to kick all of them, as > > > soon as any notification arrives, instead that putting together a > > > mechanism for kicking only a specific one. > > > > We will receive one NV for every time the hardware managed to > > successfully set desc.on > > > Right, I see it now, thanks. > > > If multiple stack up and we proactively drain the list, we will > > subsequently search the list to completion for all remaining NV's, due > > to finding no appropriate entries. > > > > I can't currently decide whether this will be quicker or slower overall, > > or (most likely) it will even out to equal in the general case. > > > Well, given the thing works as you (two) just described, I think > draining the list is the only thing we can do. > > In fact, AFAICT, since we can't know for what vcpu a particular > notification is intended, Exactly, when notification event happens, the hardware sets 'ON', software will find the vCPU with 'ON' set, in fact, software doesn't know which vCPU the wakeup event is targeting, the only thing it can do is kicking the vCPUs with desc.on = 1. Thanks, Feng we don't have alternatives to waking them all, > do we? 
> > Dario > > -- > <> (Raistlin Majere) > - > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote: > On 02/07/15 09:30, Dario Faggioli wrote: > > It is, therefore, not effective in making sure that, even with only one > > notification, you only kick the interested vcpu. > > > > This is the third time that I ask: > > (1) whether it is possible to have more vcpus queued on one pcpu PI > > blocked list with desc.on (I really believe it is); > > (2) if yes, whether it is TheRightThing(TM) to kick all of them, as > > soon as any notification arrives, instead that putting together a > > mechanism for kicking only a specific one. > > We will receive one NV for every time the hardware managed to > successfully set desc.on > Right, I see it now, thanks. > If multiple stack up and we proactively drain the list, we will > subsequently search the list to completion for all remaining NV's, due > to finding no appropriate entries. > > I can't currently decide whether this will be quicker or slower overall, > or (most likely) it will even out to equal in the general case. > Well, given the thing works as you (two) just described, I think draining the list is the only thing we can do. In fact, AFAICT, since we can't know for what vcpu a particular notification is intended, we don't have alternatives to waking them all, do we? Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > Sent: Thursday, July 02, 2015 6:30 PM > To: Dario Faggioli; Wu, Feng > Cc: xen-devel@lists.xen.org; Zhang, Yang Z; george.dun...@eu.citrix.com; > Tian, Kevin; k...@xen.org; jbeul...@suse.com > Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU > is blocked > > On 02/07/15 09:30, Dario Faggioli wrote: > > On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote: > > > >>>>> +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), > >>>>> +pi_blocked_vcpu_list) > >>>>> +if ( vmx->pi_desc.on ) > >>>>> +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); > >>>> There is a logical bug here. If we have two NV's delivered to this > >>>> pcpu, we will kick the first vcpu twice. > >>>> > >>>> On finding desc.on, a kick should be scheduled, then the vcpu removed > >>>> from this list. With desc.on set, we know for certain that another NV > >>>> will not arrive for it until it has been scheduled again and the > >>>> interrupt posted. > >>>> > >>> Yes, that seems a possible issue (and one that should indeed be > >>> avoided). > >>> > >>> I'm still unsure about the one that I raised myself but, if it is > >>> possible to have more than one vcpu in a pcpu list, with desc.on==true, > >>> then it looks to me that we kick all of them, for each notification. > >>> > >>> Added what Andrew's spotted, if there are a bunch of vcpus, queued with > >>> desc.on==ture, and a bunch of notifications arrives before the tasklet > >>> gets executed, we'll be kicking the whole bunch of them for a bunch of > >>> times! :-/ > >> As Andrew mentioned, removing the vCPUs with desc.on = true from the > >> list can avoid kick vCPUs for multiple times. > >> > > It avoids kicking vcpus multiple times if more than one notification > > arrives, yes. > > > > It is, therefore, not effective in making sure that, even with only one > > notification, you only kick the interested vcpu. 
> > > > This is the third time that I ask: > > (1) whether it is possible to have more vcpus queued on one pcpu PI > > blocked list with desc.on (I really believe it is); > > (2) if yes, whether it is TheRightThing(TM) to kick all of them, as > > soon as any notification arrives, instead that putting together a > > mechanism for kicking only a specific one. > > We will receive one NV for every time the hardware managed to > successfully set desc.on > > If multiple stack up and we proactively drain the list, we will > subsequently search the list to completion for all remaining NV's, due > to finding no appropriate entries. > > I can't currently decide whether this will be quicker or slower overall, > or (most likely) it will even out to equal in the general case. What do you mean by "general case"? Thanks, Feng > > ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Dario Faggioli [mailto:dario.faggi...@citrix.com] > Sent: Thursday, July 02, 2015 6:10 PM > To: Wu, Feng > Cc: Andrew Cooper; xen-devel@lists.xen.org; Zhang, Yang Z; > george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com > Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU > is blocked > > On Thu, 2015-07-02 at 08:58 +, Wu, Feng wrote: > > > > -Original Message- > > > From: Dario Faggioli [mailto:dario.faggi...@citrix.com] > > > > > > This is the third time that I ask: > > > (1) whether it is possible to have more vcpus queued on one pcpu PI > > > blocked list with desc.on (I really believe it is); > > > > I think it is, please see the following scenario: > > > > When cpu masks the interrupts, and an external interrupt occurs for the > > assigned device while the target vCPU2 is blocked, the wakeup notification > > event handler has no chance to run, after a while, another wakeup > > notification event for vCPU4 blocking on the same pCPU occurs, > > after cpu unmakes the interrupts, wakeup notification handler > > gets called. Then we get: > > vCPU2, desc.on = 1 and vCPU4, desc.on = 1 > > Then in the handler we need to kick both of them. > > > Ok, first of all, thanks for answering! :-) > > And yes, this makes sense. > > > > (2) if yes, whether it is TheRightThing(TM) to kick all of them, as > > > soon as any notification arrives, instead that putting together a > > > mechanism for kicking only a specific one. > > > > > Why can't we kick all of them, 'desc.on = 1' means there is a pending > > interrupt, when we meet this condition, kicking the related vCPU should > > be the right thing to do. > > > Right, I see it now. I felt like I was missing something, and that's why > I was asking to you to elaborate a bit more. > Thanks again for having done this. I was missing/forgetting half of the > way desc.on is actually handled, sorry for this. 
> > BTW, I'm finding it hard reading this series from the archives; there > appears to be some threading issues and some missing messages. I also > don't have it in my inbox, because my filters failed to spot and flag it > properly. If you send a new version, please, Cc me, so it will be easier > for me to look at all the patches, and provide a more helpful review. Sure, thanks for the review! Thanks, Feng > > Thanks and Regards, > Dario > > -- > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 02/07/15 09:30, Dario Faggioli wrote: > On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote: > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), > +pi_blocked_vcpu_list) > +if ( vmx->pi_desc.on ) > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); There is a logical bug here. If we have two NV's delivered to this pcpu, we will kick the first vcpu twice. On finding desc.on, a kick should be scheduled, then the vcpu removed from this list. With desc.on set, we know for certain that another NV will not arrive for it until it has been scheduled again and the interrupt posted. >>> Yes, that seems a possible issue (and one that should indeed be >>> avoided). >>> >>> I'm still unsure about the one that I raised myself but, if it is >>> possible to have more than one vcpu in a pcpu list, with desc.on==true, >>> then it looks to me that we kick all of them, for each notification. >>> >>> Added what Andrew's spotted, if there are a bunch of vcpus, queued with >>> desc.on==ture, and a bunch of notifications arrives before the tasklet >>> gets executed, we'll be kicking the whole bunch of them for a bunch of >>> times! :-/ >> As Andrew mentioned, removing the vCPUs with desc.on = true from the >> list can avoid kick vCPUs for multiple times. >> > It avoids kicking vcpus multiple times if more than one notification > arrives, yes. > > It is, therefore, not effective in making sure that, even with only one > notification, you only kick the interested vcpu. > > This is the third time that I ask: > (1) whether it is possible to have more vcpus queued on one pcpu PI > blocked list with desc.on (I really believe it is); > (2) if yes, whether it is TheRightThing(TM) to kick all of them, as > soon as any notification arrives, instead that putting together a > mechanism for kicking only a specific one. 
We will receive one NV for every time the hardware managed to successfully set desc.on. If multiple stack up and we proactively drain the list, we will subsequently search the list to completion for all remaining NVs, due to finding no appropriate entries. I can't currently decide whether this will be quicker or slower overall, or (most likely) whether it will even out to equal in the general case. ~Andrew
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 08:58 +, Wu, Feng wrote: > > -Original Message- > > From: Dario Faggioli [mailto:dario.faggi...@citrix.com] > > > > This is the third time that I ask: > > (1) whether it is possible to have more vcpus queued on one pcpu PI > > blocked list with desc.on (I really believe it is); > > I think it is, please see the following scenario: > > When cpu masks the interrupts, and an external interrupt occurs for the > assigned device while the target vCPU2 is blocked, the wakeup notification > event handler has no chance to run, after a while, another wakeup > notification event for vCPU4 blocking on the same pCPU occurs, > after cpu unmasks the interrupts, wakeup notification handler > gets called. Then we get: > vCPU2, desc.on = 1 and vCPU4, desc.on = 1 > Then in the handler we need to kick both of them. > Ok, first of all, thanks for answering! :-) And yes, this makes sense. > > (2) if yes, whether it is TheRightThing(TM) to kick all of them, as > > soon as any notification arrives, instead that putting together a > > mechanism for kicking only a specific one. > > > Why can't we kick all of them, 'desc.on = 1' means there is a pending > interrupt, when we meet this condition, kicking the related vCPU should > be the right thing to do. > Right, I see it now. I felt like I was missing something, and that's why I was asking you to elaborate a bit more. Thanks again for having done this. I was missing/forgetting half of the way desc.on is actually handled, sorry for this.
Thanks and Regards, Dario -- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Dario Faggioli [mailto:dario.faggi...@citrix.com] > Sent: Thursday, July 02, 2015 4:30 PM > To: Wu, Feng > Cc: Andrew Cooper; xen-devel@lists.xen.org; Zhang, Yang Z; > george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com > Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU > is blocked > > On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote: > > > > > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), > > > > > +pi_blocked_vcpu_list) > > > > > +if ( vmx->pi_desc.on ) > > > > > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); > > > > > > > > There is a logical bug here. If we have two NV's delivered to this > > > > pcpu, we will kick the first vcpu twice. > > > > > > > > On finding desc.on, a kick should be scheduled, then the vcpu removed > > > > from this list. With desc.on set, we know for certain that another NV > > > > will not arrive for it until it has been scheduled again and the > > > > interrupt posted. > > > > > > > Yes, that seems a possible issue (and one that should indeed be > > > avoided). > > > > > > I'm still unsure about the one that I raised myself but, if it is > > > possible to have more than one vcpu in a pcpu list, with desc.on==true, > > > then it looks to me that we kick all of them, for each notification. > > > > > > Added what Andrew's spotted, if there are a bunch of vcpus, queued with > > > desc.on==ture, and a bunch of notifications arrives before the tasklet > > > gets executed, we'll be kicking the whole bunch of them for a bunch of > > > times! :-/ > > > > As Andrew mentioned, removing the vCPUs with desc.on = true from the > > list can avoid kick vCPUs for multiple times. > > > It avoids kicking vcpus multiple times if more than one notification > arrives, yes. > > It is, therefore, not effective in making sure that, even with only one > notification, you only kick the interested vcpu. 
> > This is the third time that I ask: > (1) whether it is possible to have more vcpus queued on one pcpu PI > blocked list with desc.on (I really believe it is); I think it is, please see the following scenario: When cpu masks the interrupts, and an external interrupt occurs for the assigned device while the target vCPU2 is blocked, the wakeup notification event handler has no chance to run, after a while, another wakeup notification event for vCPU4 blocking on the same pCPU occurs, after cpu unmasks the interrupts, wakeup notification handler gets called. Then we get: vCPU2, desc.on = 1 and vCPU4, desc.on = 1 Then in the handler we need to kick both of them. > (2) if yes, whether it is TheRightThing(TM) to kick all of them, as > soon as any notification arrives, instead that putting together a > mechanism for kicking only a specific one. > Why can't we kick all of them, 'desc.on = 1' means there is a pending interrupt, when we meet this condition, kicking the related vCPU should be the right thing to do. Thanks, Feng > The fact that you're not answering is not so much of a big deal for > me... I'll just keep asking! :-D > > > Regards, > Dario > -- > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote: > > > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), > > > > +pi_blocked_vcpu_list) > > > > +if ( vmx->pi_desc.on ) > > > > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); > > > > > > There is a logical bug here. If we have two NV's delivered to this > > > pcpu, we will kick the first vcpu twice. > > > > > > On finding desc.on, a kick should be scheduled, then the vcpu removed > > > from this list. With desc.on set, we know for certain that another NV > > > will not arrive for it until it has been scheduled again and the > > > interrupt posted. > > > > > Yes, that seems a possible issue (and one that should indeed be > > avoided). > > > > I'm still unsure about the one that I raised myself but, if it is > > possible to have more than one vcpu in a pcpu list, with desc.on==true, > > then it looks to me that we kick all of them, for each notification. > > > > Add to that what Andrew spotted: if there are a bunch of vcpus, queued with > > desc.on==true, and a bunch of notifications arrive before the tasklet > > gets executed, we'll be kicking the whole bunch of them for a bunch of > > times! :-/ > > As Andrew mentioned, removing the vCPUs with desc.on = true from the > list can avoid kicking vCPUs multiple times. > It avoids kicking vcpus multiple times if more than one notification arrives, yes. It is, therefore, not effective in making sure that, even with only one notification, you only kick the interested vcpu. This is the third time that I ask: (1) whether it is possible to have more vcpus queued on one pcpu PI blocked list with desc.on (I really believe it is); (2) if yes, whether it is TheRightThing(TM) to kick all of them, as soon as any notification arrives, instead that putting together a mechanism for kicking only a specific one. The fact that you're not answering is not so much of a big deal for me... I'll just keep asking!
:-D Regards, Dario -- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Dario Faggioli [mailto:dario.faggi...@citrix.com] > Sent: Wednesday, July 01, 2015 9:26 PM > To: Andrew Cooper > Cc: Wu, Feng; xen-devel@lists.xen.org; Zhang, Yang Z; > george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com > Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU > is blocked > > On Tue, 2015-06-30 at 11:11 +0100, Andrew Cooper wrote: > > On 24/06/15 06:18, Feng Wu wrote: > > > > +/* > > > + * Handle VT-d posted-interrupt when VCPU is blocked. > > > + */ > > > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > > > +{ > > > +struct arch_vmx_struct *vmx; > > > +unsigned int cpu = smp_processor_id(); > > > + > > > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > > > + > > > +/* > > > + * FIXME: The length of the list depends on how many > > > + * vCPU is current blocked on this specific pCPU. > > > + * This may hurt the interrupt latency if the list > > > + * grows to too many entries. > > > + */ > > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), > > > +pi_blocked_vcpu_list) > > > +if ( vmx->pi_desc.on ) > > > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); > > > > There is a logical bug here. If we have two NV's delivered to this > > pcpu, we will kick the first vcpu twice. > > > > On finding desc.on, a kick should be scheduled, then the vcpu removed > > from this list. With desc.on set, we know for certain that another NV > > will not arrive for it until it has been scheduled again and the > > interrupt posted. > > > Yes, that seems a possible issue (and one that should indeed be > avoided). > > I'm still unsure about the one that I raised myself but, if it is > possible to have more than one vcpu in a pcpu list, with desc.on==true, > then it looks to me that we kick all of them, for each notification. 
> > Add to that what Andrew spotted: if there are a bunch of vcpus, queued with > desc.on==true, and a bunch of notifications arrive before the tasklet > gets executed, we'll be kicking the whole bunch of them for a bunch of > times! :-/ As Andrew mentioned, removing the vCPUs with desc.on = true from the list can avoid kicking vCPUs multiple times. Thanks, Feng > > Regards, > Dario > > -- > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
> -Original Message- > From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > Sent: Tuesday, June 30, 2015 6:12 PM > To: Wu, Feng; xen-devel@lists.xen.org > Cc: k...@xen.org; jbeul...@suse.com; Tian, Kevin; Zhang, Yang Z; > george.dun...@eu.citrix.com > Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked > > On 24/06/15 06:18, Feng Wu wrote: > > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata > vmx_function_table = { > > .enable_msr_exit_interception = vmx_enable_msr_exit_interception, > > }; > > > > +/* > > + * Handle VT-d posted-interrupt when VCPU is blocked. > > + */ > > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > > +{ > > +struct arch_vmx_struct *vmx; > > +unsigned int cpu = smp_processor_id(); > > + > > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > > + > > +/* > > + * FIXME: The length of the list depends on how many > > + * vCPU is current blocked on this specific pCPU. > > + * This may hurt the interrupt latency if the list > > + * grows to too many entries. > > + */ > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), > > +pi_blocked_vcpu_list) > > +if ( vmx->pi_desc.on ) > > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); > > There is a logical bug here. If we have two NV's delivered to this > pcpu, we will kick the first vcpu twice. > > On finding desc.on, a kick should be scheduled, then the vcpu removed > from this list. So removing the vCPU from the blocking list here can avoid kicking the vCPU multiple times, right? Thanks, Feng With desc.on set, we know for certain that another NV > will not arrive for it until it has been scheduled again and the > interrupt posted. 
> > ~Andrew > > > + > > +spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > > + > > +ack_APIC_irq(); > > +this_cpu(irq_count)++; > > +} > > + > > const struct hvm_function_table * __init start_vmx(void) > > { > > set_in_cr4(X86_CR4_VMXE); > >
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On Tue, 2015-06-30 at 11:11 +0100, Andrew Cooper wrote: > On 24/06/15 06:18, Feng Wu wrote: > > +/* > > + * Handle VT-d posted-interrupt when VCPU is blocked. > > + */ > > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > > +{ > > +struct arch_vmx_struct *vmx; > > +unsigned int cpu = smp_processor_id(); > > + > > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > > + > > +/* > > + * FIXME: The length of the list depends on how many > > + * vCPU is current blocked on this specific pCPU. > > + * This may hurt the interrupt latency if the list > > + * grows to too many entries. > > + */ > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), > > +pi_blocked_vcpu_list) > > +if ( vmx->pi_desc.on ) > > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); > > There is a logical bug here. If we have two NV's delivered to this > pcpu, we will kick the first vcpu twice. > > On finding desc.on, a kick should be scheduled, then the vcpu removed > from this list. With desc.on set, we know for certain that another NV > will not arrive for it until it has been scheduled again and the > interrupt posted. > Yes, that seems a possible issue (and one that should indeed be avoided). I'm still unsure about the one that I raised myself but, if it is possible to have more than one vcpu in a pcpu list, with desc.on==true, then it looks to me that we kick all of them, for each notification. Add to that what Andrew spotted: if there are a bunch of vcpus, queued with desc.on==true, and a bunch of notifications arrive before the tasklet gets executed, we'll be kicking the whole bunch of them for a bunch of times! :-/ Regards, Dario -- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 24/06/15 06:18, Feng Wu wrote: > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata > vmx_function_table = { > .enable_msr_exit_interception = vmx_enable_msr_exit_interception, > }; > > +/* > + * Handle VT-d posted-interrupt when VCPU is blocked. > + */ > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > +{ > +struct arch_vmx_struct *vmx; > +unsigned int cpu = smp_processor_id(); > + > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); > + > +/* > + * FIXME: The length of the list depends on how many > + * vCPU is current blocked on this specific pCPU. > + * This may hurt the interrupt latency if the list > + * grows to too many entries. > + */ > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu), > +pi_blocked_vcpu_list) > +if ( vmx->pi_desc.on ) > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet); There is a logical bug here. If we have two NV's delivered to this pcpu, we will kick the first vcpu twice. On finding desc.on, a kick should be scheduled, then the vcpu removed from this list. With desc.on set, we know for certain that another NV will not arrive for it until it has been scheduled again and the interrupt posted. ~Andrew
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 24/06/15 06:18, Feng Wu wrote: > This patch includes the following aspects: > - Add a global vector to wake up the blocked vCPU > when an interrupt is being posted to it (This > part was sugguested by Yang Zhang ). > - Adds a new per-vCPU tasklet to wakeup the blocked > vCPU. It can be used in the case vcpu_unblock > cannot be called directly. > - Define two per-cpu variables: > * pi_blocked_vcpu: > A list storing the vCPUs which were blocked on this pCPU. > > * pi_blocked_vcpu_lock: > The spinlock to protect pi_blocked_vcpu. > > Signed-off-by: Feng Wu > --- > v3: > - This patch is generated by merging the following three patches in v2: >[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU >[RFC v2 10/15] vmx: Define two per-cpu variables >[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts > - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet' > - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct' > - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler' > - Make pi_wakeup_interrupt() static > - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list' > - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct' > - Rename 'blocked_vcpu' to 'pi_blocked_vcpu' > - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock' > > xen/arch/x86/hvm/vmx/vmcs.c| 3 +++ > xen/arch/x86/hvm/vmx/vmx.c | 54 > ++ > xen/include/asm-x86/hvm/hvm.h | 1 + > xen/include/asm-x86/hvm/vmx/vmcs.h | 5 > xen/include/asm-x86/hvm/vmx/vmx.h | 5 > 5 files changed, 68 insertions(+) > > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c > index 11dc1b5..0c5ce3f 100644 > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -631,6 +631,9 @@ int vmx_cpu_up(void) > if ( cpu_has_vmx_vpid ) > vpid_sync_all(); > > +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu)); > +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu)); > + > return 0; > } > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c 
b/xen/arch/x86/hvm/vmx/vmx.c > index b94ef6a..7db6009 100644 > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, > uint64_t *msr_content); > static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content); > static void vmx_invlpg_intercept(unsigned long vaddr); > > +/* > + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we > + * can find which vCPU should be waken up. > + */ > +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu); > +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock); > + > uint8_t __read_mostly posted_intr_vector; > +uint8_t __read_mostly pi_wakeup_vector; > + > +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg) > +{ > +vcpu_unblock((struct vcpu *)arg); > +} > > static int vmx_domain_initialise(struct domain *d) > { > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v) > if ( v->vcpu_id == 0 ) > v->arch.user_regs.eax = 1; > > +tasklet_init( > +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet, > +pi_vcpu_wakeup_tasklet_handler, > +(unsigned long)v); c/s f6dd295 indicates that the global tasklet lock causes a bottleneck when injecting interrupts, and replaced a tasklet with a softirq to fix the scalability issue. I would expect exactly the bottleneck to exist here. > + > +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list); > + > return 0; > } > > static void vmx_vcpu_destroy(struct vcpu *v) > { > +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet); > /* > * There are cases that domain still remains in log-dirty mode when it is > * about to be destroyed (ex, user types 'xl destroy '), in which > case > @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata > vmx_function_table = { > .enable_msr_exit_interception = vmx_enable_msr_exit_interception, > }; > > +/* > + * Handle VT-d posted-interrupt when VCPU is blocked. 
> + */ > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) > +{ > +struct arch_vmx_struct *vmx; > +unsigned int cpu = smp_processor_id(); > + > +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu)); this_cpu($foo) should be used in preference to per_cpu($foo, $myself). However, always hoist repeated uses of this/per_cpu into local variables, as the compiler is unable to elide repeated accesses (because of a deliberate anti-optimisation behind the scenes). spinlock_t *lock = &this_cpu(pi_blocked_vcpu_lock); struct list_head *blocked_vcpus = &this_cpu(pi_blocked_vcpu); ~Andrew