Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
-----Original Message-----
From: Tian, Kevin
Sent: Tuesday, March 10, 2015 10:22 AM
To: Wu, Feng; xen-devel@lists.xen.org
Cc: Jan Beulich; Zhang, Yang Z
Subject: RE: VT-d Posted-interrupt (PI) design for XEN

> From: Wu, Feng
> Sent: Wednesday, March 04, 2015 9:30 PM
>
>> VT-d Posted-interrupt (PI) design for XEN
>>
>> Background
>> ==========
>> With the development of virtualization, there are more and more device assignment requirements. However, today when a VM is running with assigned devices (such as a NIC), external interrupt handling for the assigned devices always needs VMM intervention.
>>
>> VT-d Posted-interrupt is an enhanced method for handling interrupts in a virtualization environment. Interrupt posting is the process by which an interrupt request is recorded in a memory-resident posted-interrupt-descriptor structure by the root-complex, followed by an optional notification event issued to the CPU complex.
>>
>> With VT-d Posted-interrupt we get the following advantages:
>> - Directly delivery of external interrupts to running vCPUs without VMM intervention
>
> Directly -> Direct
>
>> - Decrease the interrupt migration complexity. On vCPU migration, software can atomically co-migrate all interrupts targeting the migrating vCPU.
>
> Could you elaborate on this benefit? I didn't see any discussion of migration anywhere in the proposal.
>
>> Posted-interrupt Introduction
>>
>> There are two components to the Posted-interrupt architecture: Processor Support and Root-Complex Support.
>>
>> - Processor Support
>> Posted-interrupt processing is a feature by which a processor processes virtual interrupts by recording them as pending on the virtual-APIC page.
>>
>> Posted-interrupt processing is enabled by setting the "process posted interrupts" VM-execution control. The processing is performed in response to the arrival of an interrupt with the posted-interrupt notification vector. In response to such an interrupt, the processor processes virtual interrupts recorded in a data structure called a posted-interrupt descriptor.
>> For more information about APICv and CPU-side Posted-interrupt, please refer to Chapter 29 and Section 29.6 of the Intel SDM:
>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>
>> - Root-Complex Support
>> Interrupt posting is the process by which an interrupt request (from IOAPIC or MSI/MSI-X capable sources) is recorded in a memory-resident posted-interrupt-descriptor structure by the root-complex, followed by an optional notification event issued to the CPU complex.
>>
>> The interrupt request arriving at the root-complex carries the identity of the interrupt request source and a 'remapping-index'. The remapping-index is used to look up an entry in the memory-resident interrupt-remap-table. Unlike with interrupt-remapping, the interrupt-remap-table-entry for a posted-interrupt specifies a virtual-vector and a pointer to the posted-interrupt descriptor. The virtual-vector specifies the vector of the interrupt to be recorded in the posted-interrupt descriptor. The posted-interrupt descriptor hosts storage for the virtual-vectors and contains the attributes of the notification event (interrupt) to be issued to the CPU complex to inform CPU/software about pending interrupts recorded in the posted-interrupt descriptor.
>>
>> For more information about VT-d PI, please refer to
>> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
>>
>> Design Overview
>> ===============
>> In this design, we will cover the following items:
>> 1. Add a variable to control whether to enable VT-d posted-interrupt or not.
>> 2. VT-d PI feature detection.
>> 3. Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.
>> 4. Extend IRTE structure to support VT-d PI.
>> 5. Introduce a new global vector which is used for waking up the HLT'ed vCPU.
>
> HLT'ed -> blocked
>
>> 6. Update IRTE when guest modifies the interrupt configuration (MSI/MSI-X configuration).
>> 7. Update posted-interrupt descriptor during vCPU scheduling (when the state of the vCPU transitions among RUNSTATE_running / RUNSTATE_blocked / RUNSTATE_runnable / RUNSTATE_offline).
>> 8. New boot command line option for Xen, which lets the user control the VT-d PI feature.
>> 9. Multicast/broadcast and lowest priority interrupts consideration.
>
> Add a step on the notification handler, as you described in another mail.
>
>> Implementation details
>> ======================
>> - New variable to control VT-d PI
>> Like the variable 'iommu_intremap' for interrupt remapping, it is very straightforward to add a new one, 'iommu_intpost', for posted-interrupt. 'iommu_intpost' is set only when interrupt remapping and VT-d posted-interrupt are both enabled.
>>
>> - VT-d PI feature detection.
>> Bit 59 in VT-d Capability Register is used to report
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
-----Original Message-----
From: Tian, Kevin
Sent: Tuesday, March 10, 2015 10:01 AM
To: Andrew Cooper; Tim Deegan; Wu, Feng
Cc: Zhang, Yang Z; Jan Beulich; xen-devel@lists.xen.org
Subject: RE: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN

> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Monday, March 09, 2015 7:46 PM
>
>> On 09/03/15 10:33, Tim Deegan wrote:
>>> At 02:03 + on 09 Mar (1425863009), Wu, Feng wrote:
>>>> -----Original Message-----
>>>> From: Tim Deegan [mailto:t...@xen.org]
>>>> Sent: Friday, March 06, 2015 5:44 PM
>>>> To: Wu, Feng
>>>> Cc: Jan Beulich; Zhang, Yang Z; Tian, Kevin; xen-devel@lists.xen.org
>>>> Subject: Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
>>>>> At 02:07 + on 06 Mar (1425604054), Wu, Feng wrote:
>>>>>> From: Tim Deegan [mailto:t...@xen.org]
>>>>>>> But I don't understand why we would need a new global vector for RUNSTATE_blocked rather than suppressing the posted interrupts as you suggest for RUNSTATE_runnable. (Or conversely why not use the new global vector for RUNSTATE_runnable too?)
>>>>>> If we suppress the posted-interrupts when vCPU is blocked, it cannot be unblocked by the external interrupts, which is not correct.
>>>>> OK, I don't understand at all now. :) When the posted interrupt is suppressed, what happens to the interrupt?
>>>> When the posted interrupt is suppressed, the VT-d engine will not issue notification events.
>>>>> If it's just dropped, then we can't use that for _any_ cases.
>>>> We can suppress the posted-interrupt when vCPU is waiting in the runqueue (vCPU is in RUNSTATE_runnable state). There is no need to send a notification event when the vCPU is in this state, since when an interrupt happens, the interrupt information is not _dropped_; instead, it is stored in PIR, and this will be synced to vIRR before VM-Entry.
>>> So you think you can use the same system for RUNSTATE_runnable as RUNSTATE_blocked? That seems like a good idea. I'll leave the details (e.g. single global vector + queue vs any other way to wake the vcpu) to people who know the x86 irq code better than I do. :)
>>
>> From my reading the relevant section in the VT-d spec, to the best of my understanding:
>>
>> We only need the second vector if Xen wishes to be informed that an interrupt has been queued for a vcpu. The spec suggests that, for one usecase, this information should affect scheduling decisions.
>>
>> If we do not wish to make scheduling alterations based on interrupt delivery, the extra vector can be ignored.
>>
>> If we do wish to make scheduling alterations, we will need to be able to uniquely identify a vcpu from a vector, which will involve allocating one vector per vcpu.
>>
>> If my understanding is correct, I would suggest that Xen opt for not getting notifications. Interrupting one guest to indicate that another vcpu has been interrupted scales progressively worse with the number of running VMs, and there are existing usecases which have already exhausted the x86 vector space completely.
>>
>> It might be sensible to have the option available as a per-domain opt-in option. A usecase such as device driver domain could easily want to deal with its interrupts ahead of running the domains it is servicing.
>
> IMO we don't need such an option. A blocked VCPU may not be woken up when losing a virtual interrupt notification, and if you look at the earlier reply to Jan, it's not necessary to have one-vector-per-vcpu. It's just a global vector: when it is sent to a specific pcpu, the handler will walk through the blocked vcpus on that pcpu to decide which one should be woken up. So only one new vector is required.
>
> From Feng's design, the notification may be disabled in one scenario, i.e. when the vcpu is in runnable state. That works if real-time is not considered, since we know a runnable vcpu is already unblocked. Later, when considering real-time, this notification will be required too.

Thanks for your clarification, Kevin!

Thanks,
Feng

> Thanks
> Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
-----Original Message-----
From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
Sent: Monday, March 09, 2015 7:46 PM
To: Tim Deegan; Wu, Feng
Cc: Zhang, Yang Z; Tian, Kevin; Jan Beulich; xen-devel@lists.xen.org
Subject: Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN

> On 09/03/15 10:33, Tim Deegan wrote:
>> At 02:03 + on 09 Mar (1425863009), Wu, Feng wrote:
>>> -----Original Message-----
>>> From: Tim Deegan [mailto:t...@xen.org]
>>> Sent: Friday, March 06, 2015 5:44 PM
>>> To: Wu, Feng
>>> Cc: Jan Beulich; Zhang, Yang Z; Tian, Kevin; xen-devel@lists.xen.org
>>> Subject: Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
>>>> At 02:07 + on 06 Mar (1425604054), Wu, Feng wrote:
>>>>> From: Tim Deegan [mailto:t...@xen.org]
>>>>>> But I don't understand why we would need a new global vector for RUNSTATE_blocked rather than suppressing the posted interrupts as you suggest for RUNSTATE_runnable. (Or conversely why not use the new global vector for RUNSTATE_runnable too?)
>>>>> If we suppress the posted-interrupts when vCPU is blocked, it cannot be unblocked by the external interrupts, which is not correct.
>>>> OK, I don't understand at all now. :) When the posted interrupt is suppressed, what happens to the interrupt?
>>> When the posted interrupt is suppressed, the VT-d engine will not issue notification events.
>>>> If it's just dropped, then we can't use that for _any_ cases.
>>> We can suppress the posted-interrupt when vCPU is waiting in the runqueue (vCPU is in RUNSTATE_runnable state). There is no need to send a notification event when the vCPU is in this state, since when an interrupt happens, the interrupt information is not _dropped_; instead, it is stored in PIR, and this will be synced to vIRR before VM-Entry.
>> So you think you can use the same system for RUNSTATE_runnable as RUNSTATE_blocked? That seems like a good idea. I'll leave the details (e.g. single global vector + queue vs any other way to wake the vcpu) to people who know the x86 irq code better than I do. :)
>
> From my reading the relevant section in the VT-d spec, to the best of my understanding:
>
> We only need the second vector if Xen wishes to be informed that an interrupt has been queued for a vcpu. The spec suggests that, for one usecase, this information should affect scheduling decisions.
>
> If we do not wish to make scheduling alterations based on interrupt delivery, the extra vector can be ignored.

As I mentioned in the previous mail in this thread, the second vector is used to wake up the blocked vCPU when an external interrupt arrives for that vCPU.

Thanks,
Feng

> If we do wish to make scheduling alterations, we will need to be able to uniquely identify a vcpu from a vector, which will involve allocating one vector per vcpu.
>
> If my understanding is correct, I would suggest that Xen opt for not getting notifications. Interrupting one guest to indicate that another vcpu has been interrupted scales progressively worse with the number of running VMs, and there are existing usecases which have already exhausted the x86 vector space completely.
>
> It might be sensible to have the option available as a per-domain opt-in option. A usecase such as device driver domain could easily want to deal with its interrupts ahead of running the domains it is servicing.
>
> ~Andrew
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
On 09/03/15 10:33, Tim Deegan wrote:
> At 02:03 + on 09 Mar (1425863009), Wu, Feng wrote:
>> -----Original Message-----
>> From: Tim Deegan [mailto:t...@xen.org]
>> Sent: Friday, March 06, 2015 5:44 PM
>> To: Wu, Feng
>> Cc: Jan Beulich; Zhang, Yang Z; Tian, Kevin; xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
>>> At 02:07 + on 06 Mar (1425604054), Wu, Feng wrote:
>>>> From: Tim Deegan [mailto:t...@xen.org]
>>>>> But I don't understand why we would need a new global vector for RUNSTATE_blocked rather than suppressing the posted interrupts as you suggest for RUNSTATE_runnable. (Or conversely why not use the new global vector for RUNSTATE_runnable too?)
>>>> If we suppress the posted-interrupts when vCPU is blocked, it cannot be unblocked by the external interrupts, which is not correct.
>>> OK, I don't understand at all now. :) When the posted interrupt is suppressed, what happens to the interrupt?
>> When the posted interrupt is suppressed, the VT-d engine will not issue notification events.
>>> If it's just dropped, then we can't use that for _any_ cases.
>> We can suppress the posted-interrupt when vCPU is waiting in the runqueue (vCPU is in RUNSTATE_runnable state). There is no need to send a notification event when the vCPU is in this state, since when an interrupt happens, the interrupt information is not _dropped_; instead, it is stored in PIR, and this will be synced to vIRR before VM-Entry.
> So you think you can use the same system for RUNSTATE_runnable as RUNSTATE_blocked? That seems like a good idea. I'll leave the details (e.g. single global vector + queue vs any other way to wake the vcpu) to people who know the x86 irq code better than I do. :)

From my reading the relevant section in the VT-d spec, to the best of my understanding:

We only need the second vector if Xen wishes to be informed that an interrupt has been queued for a vcpu. The spec suggests that, for one usecase, this information should affect scheduling decisions.

If we do not wish to make scheduling alterations based on interrupt delivery, the extra vector can be ignored.

If we do wish to make scheduling alterations, we will need to be able to uniquely identify a vcpu from a vector, which will involve allocating one vector per vcpu.

If my understanding is correct, I would suggest that Xen opt for not getting notifications. Interrupting one guest to indicate that another vcpu has been interrupted scales progressively worse with the number of running VMs, and there are existing usecases which have already exhausted the x86 vector space completely.

It might be sensible to have the option available as a per-domain opt-in option. A usecase such as device driver domain could easily want to deal with its interrupts ahead of running the domains it is servicing.

~Andrew
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Monday, March 09, 2015 7:46 PM
>
> On 09/03/15 10:33, Tim Deegan wrote:
>> At 02:03 + on 09 Mar (1425863009), Wu, Feng wrote:
>>> -----Original Message-----
>>> From: Tim Deegan [mailto:t...@xen.org]
>>> Sent: Friday, March 06, 2015 5:44 PM
>>> To: Wu, Feng
>>> Cc: Jan Beulich; Zhang, Yang Z; Tian, Kevin; xen-devel@lists.xen.org
>>> Subject: Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
>>>> At 02:07 + on 06 Mar (1425604054), Wu, Feng wrote:
>>>>> From: Tim Deegan [mailto:t...@xen.org]
>>>>>> But I don't understand why we would need a new global vector for RUNSTATE_blocked rather than suppressing the posted interrupts as you suggest for RUNSTATE_runnable. (Or conversely why not use the new global vector for RUNSTATE_runnable too?)
>>>>> If we suppress the posted-interrupts when vCPU is blocked, it cannot be unblocked by the external interrupts, which is not correct.
>>>> OK, I don't understand at all now. :) When the posted interrupt is suppressed, what happens to the interrupt?
>>> When the posted interrupt is suppressed, the VT-d engine will not issue notification events.
>>>> If it's just dropped, then we can't use that for _any_ cases.
>>> We can suppress the posted-interrupt when vCPU is waiting in the runqueue (vCPU is in RUNSTATE_runnable state). There is no need to send a notification event when the vCPU is in this state, since when an interrupt happens, the interrupt information is not _dropped_; instead, it is stored in PIR, and this will be synced to vIRR before VM-Entry.
>> So you think you can use the same system for RUNSTATE_runnable as RUNSTATE_blocked? That seems like a good idea. I'll leave the details (e.g. single global vector + queue vs any other way to wake the vcpu) to people who know the x86 irq code better than I do. :)
>
> From my reading the relevant section in the VT-d spec, to the best of my understanding:
>
> We only need the second vector if Xen wishes to be informed that an interrupt has been queued for a vcpu. The spec suggests that, for one usecase, this information should affect scheduling decisions.
>
> If we do not wish to make scheduling alterations based on interrupt delivery, the extra vector can be ignored.
>
> If we do wish to make scheduling alterations, we will need to be able to uniquely identify a vcpu from a vector, which will involve allocating one vector per vcpu.
>
> If my understanding is correct, I would suggest that Xen opt for not getting notifications. Interrupting one guest to indicate that another vcpu has been interrupted scales progressively worse with the number of running VMs, and there are existing usecases which have already exhausted the x86 vector space completely.
>
> It might be sensible to have the option available as a per-domain opt-in option. A usecase such as device driver domain could easily want to deal with its interrupts ahead of running the domains it is servicing.

IMO we don't need such an option. A blocked VCPU may not be woken up when losing a virtual interrupt notification, and if you look at the earlier reply to Jan, it's not necessary to have one-vector-per-vcpu. It's just a global vector: when it is sent to a specific pcpu, the handler will walk through the blocked vcpus on that pcpu to decide which one should be woken up. So only one new vector is required.

From Feng's design, the notification may be disabled in one scenario, i.e. when the vcpu is in runnable state. That works if real-time is not considered, since we know a runnable vcpu is already unblocked. Later, when considering real-time, this notification will be required too.

Thanks
Kevin
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
-----Original Message-----
From: Tim Deegan [mailto:t...@xen.org]
Sent: Friday, March 06, 2015 5:44 PM
To: Wu, Feng
Cc: Jan Beulich; Zhang, Yang Z; Tian, Kevin; xen-devel@lists.xen.org
Subject: Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN

> At 02:07 + on 06 Mar (1425604054), Wu, Feng wrote:
>> From: Tim Deegan [mailto:t...@xen.org]
>>> But I don't understand why we would need a new global vector for RUNSTATE_blocked rather than suppressing the posted interrupts as you suggest for RUNSTATE_runnable. (Or conversely why not use the new global vector for RUNSTATE_runnable too?)
>> If we suppress the posted-interrupts when vCPU is blocked, it cannot be unblocked by the external interrupts, which is not correct.
> OK, I don't understand at all now. :) When the posted interrupt is suppressed, what happens to the interrupt?

When the posted interrupt is suppressed, the VT-d engine will not issue notification events.

> If it's just dropped, then we can't use that for _any_ cases.

We can suppress the posted-interrupt when vCPU is waiting in the runqueue (vCPU is in RUNSTATE_runnable state). There is no need to send a notification event when the vCPU is in this state, since when an interrupt happens, the interrupt information is not _dropped_; instead, it is stored in PIR, and this will be synced to vIRR before VM-Entry.

> If it goes through the old path, via the vlapic, that should be enough to wake any HLT'ed vcpu. It sounds like it might be a little slower, but not necessarily once you've had to add a new list of potentially-HLT'd-and-wakeable vcpus, especially with many idle vcpus.

When Posted-interrupt is used, how do we go to the old path?

Thanks,
Feng

>> Thanks,
>> Feng
>
> Tim.
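The point made in this message (a suppressed notification still records the interrupt in PIR, and PIR is folded into vIRR before VM entry) can be illustrated with a small model. This is only a sketch in C under stated assumptions: the struct layouts, the function names (`model_post_interrupt`, `model_sync_pir_to_virr`, `model_virr_pending`) and the simplified notification rule are hypothetical and are not taken from Xen or from the VT-d specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a posted-interrupt descriptor (NOT the real layout). */
struct pi_desc_model {
    uint32_t pir[8]; /* posted-interrupt requests, one bit per vector (256) */
    bool     on;     /* ON: a notification is outstanding */
    bool     sn;     /* SN: suppress notification */
};

/* Simplified model of the vCPU's virtual IRR. */
struct vapic_model {
    uint32_t virr[8];
};

/*
 * Models what the posting hardware does for one device interrupt.
 * The vector is always recorded in PIR. Assumption: a notification is
 * sent only when SN is clear and no notification is already outstanding.
 * Returns true if a notification event would be issued.
 */
static bool model_post_interrupt(struct pi_desc_model *pi, uint8_t vector)
{
    assert(pi != NULL);
    pi->pir[vector / 32] |= UINT32_C(1) << (vector % 32);
    if (pi->sn || pi->on)
        return false;
    pi->on = true;
    return true;
}

/* Models the sync done before VM entry: OR PIR into vIRR, then clear PIR. */
static void model_sync_pir_to_virr(struct pi_desc_model *pi,
                                   struct vapic_model *va)
{
    assert(pi != NULL && va != NULL);
    for (int i = 0; i < 8; i++) {
        va->virr[i] |= pi->pir[i];
        pi->pir[i] = 0;
    }
    pi->on = false;
}

/* True if the given vector is pending in the modelled vIRR. */
static bool model_virr_pending(const struct vapic_model *va, uint8_t vector)
{
    assert(va != NULL);
    return (va->virr[vector / 32] >> (vector % 32)) & 1u;
}
```

With `sn` set, `model_post_interrupt()` returns false but the vector still shows up in vIRR after `model_sync_pir_to_virr()`, which is the "not dropped" behaviour described for RUNSTATE_runnable.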
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
At 02:07 + on 06 Mar (1425604054), Wu, Feng wrote:
> From: Tim Deegan [mailto:t...@xen.org]
>> But I don't understand why we would need a new global vector for RUNSTATE_blocked rather than suppressing the posted interrupts as you suggest for RUNSTATE_runnable. (Or conversely why not use the new global vector for RUNSTATE_runnable too?)
> If we suppress the posted-interrupts when vCPU is blocked, it cannot be unblocked by the external interrupts, which is not correct.

OK, I don't understand at all now. :) When the posted interrupt is suppressed, what happens to the interrupt? If it's just dropped, then we can't use that for _any_ cases. If it goes through the old path, via the vlapic, that should be enough to wake any HLT'ed vcpu. It sounds like it might be a little slower, but not necessarily once you've had to add a new list of potentially-HLT'd-and-wakeable vcpus, especially with many idle vcpus.

Tim.
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
-----Original Message-----
From: Tim Deegan [mailto:t...@xen.org]
Sent: Thursday, March 05, 2015 8:03 PM
To: Jan Beulich
Cc: Wu, Feng; Zhang, Yang Z; Tian, Kevin; xen-devel@lists.xen.org
Subject: Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN

> Hi,
>
> At 08:52 + on 05 Mar (1425541947), Jan Beulich wrote:
>> On 05.03.15 at 09:29, feng...@intel.com wrote:
>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>> Sent: Thursday, March 05, 2015 3:13 PM
>>>> And if it can know, why couldn't the handler for posted_intr_vector not know either (i.e. after introducing a specific handler for it in place of the currently used event_check_interrupt)?
>>> Coming back to the above scenario: vCPU1 is running on pCPU0 while vCPU0 is blocked, and we still use posted_intr_vector for the blocked vCPU0. If vCPU1 is running in non-root mode and external interrupts happen for it, the notification event will be handled by CPU hardware (in non-root mode) automatically, so we cannot get any control in the handler for posted_intr_vector.
>> And how would this be different with your separate new vector? I feel I'm missing something, but I'm afraid I have to rely on you to point out what it is. Just again - please explain what it is you need two global vectors for that can't be done with one.
> I think the relevant detail is that the posted_intr_vector is consumed by the CPU's posted-interrupt logic and doesn't cause an exit to Xen.

Exactly!

> But I don't understand why we would need a new global vector for RUNSTATE_blocked rather than suppressing the posted interrupts as you suggest for RUNSTATE_runnable. (Or conversely why not use the new global vector for RUNSTATE_runnable too?)

If we suppress the posted-interrupts when vCPU is blocked, it cannot be unblocked by the external interrupts, which is not correct.

Thanks,
Feng

> Cheers,
>
> Tim.
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
-----Original Message-----
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Thursday, March 05, 2015 6:15 PM
To: Wu, Feng
Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org
Subject: RE: VT-d Posted-interrupt (PI) design for XEN

> On 05.03.15 at 10:07, feng...@intel.com wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, March 05, 2015 4:52 PM
>>> And how would this be different with your separate new vector? I feel I'm missing something, but I'm afraid I have to rely on you to point out what it is. Just again - please explain what it is you need two global vectors for that can't be done with one.
>>
>> Still using the above scenario: vCPU1 is running in non-root mode and external interrupts happen for vCPU0 (which is HLT'ed). If 'posted_intr_vector' is used for vCPU0, and 'posted_intr_vector' is also used for other vCPUs, including vCPU1, the VT-d engine will issue the notification event using this global vector, and this SPECIAL vector will be handled this way (from Section 29.6 in the Intel SDM: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf):
>>
>> 1. The local APIC is acknowledged; this provides the processor core with an interrupt vector, called here the physical vector.
>> 2. If the physical vector equals the posted-interrupt notification vector, the logical processor continues to the next step. Otherwise, a VM exit occurs as it would normally due to an external interrupt; the vector is saved in the VM-exit interruption-information field.
>> 3. The processor clears the outstanding-notification bit in the posted-interrupt descriptor. This is done atomically so as to leave the remainder of the descriptor unmodified (e.g., with a locked AND operation).
>> 4. The processor writes zero to the EOI register in the local APIC; this dismisses the interrupt with the posted-interrupt notification vector from the local APIC.
>> 5. The logical processor performs a logical-OR of PIR into VIRR and clears PIR. No other agent can read or write a PIR bit (or group of bits) between the time it is read (to determine what to OR into VIRR) and when it is cleared.
>> 6. The logical processor sets RVI to be the maximum of the old value of RVI and the highest index of all bits that were set in PIR; if no bit was set in PIR, RVI is left unmodified.
>> 7. The logical processor evaluates pending virtual interrupts as described in Section 29.2.1.
>>
>> This is totally handled by CPU hardware, so we cannot get control in the handler for posted_intr_vector.
>>
>> OTOH, if 'pi_wakeup_vector' is used for vCPU0, the VT-d engine will issue the notification event using this new vector. Since this new vector is not a SPECIAL one to the CPU, it is just a normal vector. The CPU just receives a normal external interrupt, so we can get control in the handler of this new vector. In this case, the hypervisor can do something in it, such as waking up the HLT'ed vCPU.
>>
>> Hope this can clarify your confusion.
>
> Thanks, yes - it is this vector-is-special-to-CPU that makes a second vector necessary. Please make sure this is being properly explained in the description and/or code comments of the patches to come (of course without need to quote the SDM, but a reference to the respective section may be useful).

Sure, I will add the description later!

So things are a little clearer now. Could you please take some time to review this design again and give more comments? Thanks a lot!!

Thanks,
Feng

> Jan
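The seven SDM steps quoted in this message can be restated as a small software model, which also shows why a vector other than the notification vector ends up in the hypervisor. This is a minimal sketch in C, not hardware-accurate and not Xen code: `struct pi_state`, `model_external_interrupt` and `highest_set` are hypothetical names, the atomicity requirements of steps 3 and 5 are ignored, and steps 1, 4 and 7 are only noted in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_VECTOR_WORDS 8 /* 256 vectors / 32 bits per word */

/* Hypothetical, simplified state touched by posted-interrupt processing. */
struct pi_state {
    uint32_t pir[NR_VECTOR_WORDS];  /* posted-interrupt requests */
    uint32_t virr[NR_VECTOR_WORDS]; /* virtual IRR */
    bool     on;                    /* outstanding-notification bit */
    uint8_t  rvi;                   /* requesting virtual interrupt */
    uint8_t  notification_vector;   /* value programmed in the VMCS */
};

enum pi_outcome { PI_VM_EXIT, PI_HANDLED_IN_GUEST };

/* Index of the highest set bit across all words, or -1 if none is set. */
static int highest_set(const uint32_t *bits)
{
    for (int w = NR_VECTOR_WORDS - 1; w >= 0; w--)
        for (int b = 31; b >= 0; b--)
            if (bits[w] & (UINT32_C(1) << b))
                return w * 32 + b;
    return -1;
}

/* Models an external interrupt arriving while the CPU is in non-root mode. */
static enum pi_outcome model_external_interrupt(struct pi_state *s,
                                                uint8_t physical_vector)
{
    assert(s != NULL);

    /* Step 2: any other vector causes a normal VM exit to the hypervisor. */
    if (physical_vector != s->notification_vector)
        return PI_VM_EXIT;

    /* Step 3: clear ON (done atomically on real hardware). */
    s->on = false;

    /* Step 4 (EOI write to the local APIC) is not modelled. */

    /* Steps 5 and 6: OR PIR into VIRR, clear PIR, raise RVI if needed. */
    int top = highest_set(s->pir);
    for (int w = 0; w < NR_VECTOR_WORDS; w++) {
        s->virr[w] |= s->pir[w];
        s->pir[w] = 0;
    }
    if (top > s->rvi)
        s->rvi = (uint8_t)top;

    /* Step 7 (evaluating pending virtual interrupts) is not modelled. */
    return PI_HANDLED_IN_GUEST;
}
```

In this model, calling `model_external_interrupt()` with the notification vector never returns `PI_VM_EXIT`, which is the reason the thread gives for needing a second, ordinary vector to wake a blocked vCPU.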
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
-----Original Message-----
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Thursday, March 05, 2015 3:13 PM
To: Wu, Feng
Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org
Subject: RE: VT-d Posted-interrupt (PI) design for XEN

> On 05.03.15 at 06:04, feng...@intel.com wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Wednesday, March 04, 2015 11:19 PM
>>> On 04.03.15 at 14:30, feng...@intel.com wrote:
>>>> - Introduce a new global vector which is used to wake up the HLT'ed vCPU.
>>>>
>>>> Currently, there is a global vector 'posted_intr_vector', which is used as the global notification vector for all vCPUs in the system. This vector is stored in the VMCS, and the CPU considers it a special vector and uses it to notify the related pCPU when an interrupt is recorded in the posted-interrupt descriptor.
>>>>
>>>> With VT-d PI, the VT-d engine can issue a notification event when the assigned devices issue interrupts. We need to add a new global vector to wake up the HLT'ed vCPU; please refer to the following scenario for the usage of this new global vector:
>>>>
>>>> 1. vCPU0 is running on pCPU0
>>>> 2. vCPU0 is HLT'ed and vCPU1 is currently running on pCPU0
>>>> 3. An external interrupt from an assigned device occurs for vCPU0. If we still use 'posted_intr_vector' as the notification vector for vCPU0, the notification event for vCPU0 (the event will go to pCPU0) will be consumed by vCPU1 incorrectly. The worst case is that vCPU0 will never be woken up again, since the wakeup event for it is always consumed by other vCPUs incorrectly. So we need to introduce another global vector, named 'pi_wakeup_vector', to wake up the HLT'ed vCPU.
>>>
>>> I'm afraid you describe a particular scenario here, but I don't see how this is related to the introduction of another global vector: Either the current (global) vector is sufficient, or another global vector also can't solve your problem. I'm sure I'm missing something here, so please be explicit.
>>
>> In fact, the new global vector is used for the above scenario. Let me explain this a bit more:
>>
>> With VT-d PI, when an external interrupt from an assigned device happens, here is the hardware processing flow:
>> 1. Interrupts happen.
>> 2. Find the associated IRTE.
>> 3. Find the destination vCPU from the IRTE (from the posted-interrupt descriptor address).
>> 4. Sync the interrupt (stored in the IRTE as 'virtual vector') to the PIR field in the posted-interrupt descriptor.
>> 5. If needed (please refer to the VT-d spec for the condition for issuing a notification event), issue a notification event to the destination CPU, which is stored in the posted-interrupt descriptor as 'NDST'.
>>
>> Back to the above scenario:
>> 1. vCPU0 is running on pCPU0, and the 'NDST' field of vCPU0's posted-interrupt descriptor is pCPU0.
>> 2. vCPU0 is HLT'ed and vCPU1 is currently running on pCPU0.
>> 3. An external interrupt from an assigned device happens. The destination of this interrupt is determined by the above flow (IRTE -> posted-interrupt descriptor address/vCPU -> notification event to 'NDST'). If this external interrupt is for vCPU0, the notification event will be delivered to pCPU0, since the 'NDST' field of vCPU0's posted-interrupt descriptor is pCPU0.
>>
>> If we use the current (global) vector for the notification event for vCPU0 in the above case, then since the current global vector (notification vector) is a special vector to the CPU, vCPU1 will consume it while it is running on pCPU0, so we fail to wake up the HLT'ed vCPU0. Please refer to Section 29.6 in the Intel SDM for how the CPU handles this special vector:
>> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>
>> After introducing a new global vector named 'pi_wakeup_vector', before the vCPU is HLT'ed we set the 'NV' field (Notification Vector) in the vCPU's posted-interrupt descriptor to 'pi_wakeup_vector'. This is a normal vector to the CPU, and the CPU will not do special things for it (unlike the current global vector). In the handler of this vector, we can wake up the HLT'ed vCPU.
>
> So suppose you have more than one vCPU which most recently ran on pCPU0 - how will the handler for the new vector know which of the vCPU-s it should kick?

Oh, sorry, I thought I had described how to wake up the HLT'ed vCPU in this design, but it seems I missed it. Here it is:

1. Define a per-cpu list 'blocked_vcpu_on_cpu_lock', which stores the blocked vCPUs on the pCPU.
2. When the vCPU's state is changed to RUNSTATE_blocked, insert the vCPU into the per-cpu list belonging to the pCPU it was running on.
3. When the vCPU is unblocked, remove the vCPU from the related pCPU list.

In the handler of 'pi_wakeup_vector', we do:
1. Get the physical CPU.
2. Iterate the list 'blocked_vcpu_on_cpu_lock' of the current pCPU; if 'ON' is set, we unblock the associated vCPU
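The per-pCPU list and the wakeup handler described above could look roughly like the following. This is a minimal single-threaded sketch in C, assuming a fixed number of pCPUs and a singly linked list; `struct vcpu_model`, `blocked_list`, `model_vcpu_block`, `model_vcpu_unblock` and `model_pi_wakeup_handler` are hypothetical names and not Xen's. A real implementation would also need a per-pCPU lock around the list, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PCPUS 4 /* hypothetical number of physical CPUs */

/* Hypothetical stand-in for a vCPU and the 'ON' bit of its PI descriptor. */
struct vcpu_model {
    int  id;
    bool blocked;
    bool on;                 /* 'ON' bit: an interrupt has been posted */
    struct vcpu_model *next; /* link in the per-pCPU blocked list */
};

/* One list of blocked vCPUs per pCPU (locking omitted in this sketch). */
static struct vcpu_model *blocked_list[NR_PCPUS];

/* State changes to blocked: add the vCPU to the list of the pCPU it ran on. */
static void model_vcpu_block(struct vcpu_model *v, int pcpu)
{
    assert(v != NULL && pcpu >= 0 && pcpu < NR_PCPUS);
    v->blocked = true;
    v->next = blocked_list[pcpu];
    blocked_list[pcpu] = v;
}

/* The vCPU is unblocked: remove it from that pCPU's list. */
static void model_vcpu_unblock(struct vcpu_model *v, int pcpu)
{
    assert(v != NULL && pcpu >= 0 && pcpu < NR_PCPUS);
    for (struct vcpu_model **pp = &blocked_list[pcpu]; *pp; pp = &(*pp)->next) {
        if (*pp == v) {
            *pp = v->next;
            break;
        }
    }
    v->next = NULL;
    v->blocked = false;
}

/*
 * Handler for the wakeup vector on 'pcpu': walk the blocked list and
 * unblock every vCPU whose 'ON' bit is set. Returns the number woken.
 */
static int model_pi_wakeup_handler(int pcpu)
{
    assert(pcpu >= 0 && pcpu < NR_PCPUS);
    int woken = 0;
    struct vcpu_model *v = blocked_list[pcpu];
    while (v != NULL) {
        struct vcpu_model *next = v->next; /* saved: unblock unlinks v */
        if (v->on) {
            model_vcpu_unblock(v, pcpu);
            woken++;
        }
        v = next;
    }
    return woken;
}
```

Because the handler checks each descriptor's 'ON' bit, one global vector is enough to find the right vCPU among several that last ran on the same pCPU, at the cost of a list walk per wakeup interrupt.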
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
On 05.03.15 at 10:07, feng...@intel.com wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, March 05, 2015 4:52 PM
>> And how would this be different with your separate new vector? I feel I'm missing something, but I'm afraid I have to rely on you to point out what it is. Just again - please explain what it is you need two global vectors for that can't be done with one.
>
> Still using the above scenario: vCPU1 is running in non-root mode and an external interrupt arrives for vCPU0 (which is HLT'ed).
>
> If 'posted_intr_vector' is used for vCPU0, and 'posted_intr_vector' is also used for other vCPUs, including vCPU1, the VT-d engine will issue the notification event using this global vector, and this SPECIAL vector will be handled this way (from Section 29.6 in the Intel SDM: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf):
>
> 1. The local APIC is acknowledged; this provides the processor core with an interrupt vector, called here the physical vector.
> 2. If the physical vector equals the posted-interrupt notification vector, the logical processor continues to the next step. Otherwise, a VM exit occurs as it would normally due to an external interrupt; the vector is saved in the VM-exit interruption-information field.
> 3. The processor clears the outstanding-notification bit in the posted-interrupt descriptor. This is done atomically so as to leave the remainder of the descriptor unmodified (e.g., with a locked AND operation).
> 4. The processor writes zero to the EOI register in the local APIC; this dismisses the interrupt with the posted-interrupt notification vector from the local APIC.
> 5. The logical processor performs a logical-OR of PIR into VIRR and clears PIR. No other agent can read or write a PIR bit (or group of bits) between the time it is read (to determine what to OR into VIRR) and when it is cleared.
> 6. The logical processor sets RVI to be the maximum of the old value of RVI and the highest index of all bits that were set in PIR; if no bit was set in PIR, RVI is left unmodified.
> 7. The logical processor evaluates pending virtual interrupts as described in Section 29.2.1.
>
> This is totally handled by CPU hardware, so we cannot get control in the handler for posted_intr_vector.
>
> OTOH, if 'pi_wakeup_vector' is used for vCPU0, the VT-d engine will issue the notification event using this new vector. Since this new vector is not a SPECIAL one to the CPU, it is just a normal vector: the CPU simply receives a normal external interrupt, and we get control in the handler of this new vector. In this case, the hypervisor can do something in it, such as waking up the HLT'ed vCPU.
>
> Hope this can clarify your confusion.

Thanks, yes - it is this vector-is-special-to-CPU that makes a second vector necessary. Please make sure this is being properly explained in the description and/or code comments of the patches to come (of course without need to quote the SDM, but a reference to the respective section may be useful).

Jan
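To make the "special vector" point concrete, steps 2, 3, 5 and 6 of the SDM sequence quoted in this message can be modelled in software roughly as below. This is only an illustrative sketch: the type and function names are made up, the atomicity requirements and the EOI write are not modelled, and any vector numbers used with it are arbitrary placeholders. The return value shows the key property: when the physical vector matches the notification vector, the hardware consumes the event and no hypervisor handler runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the parts of the posted-interrupt descriptor
 * that the CPU touches during posted-interrupt processing. */
struct pi_desc_model {
    uint64_t pir[4];   /* 256 posted-interrupt request bits */
    bool on;           /* outstanding-notification bit */
};

/* Hypothetical model of the virtual-APIC state involved. */
struct vapic_model {
    uint64_t virr[4];  /* virtual interrupt-request register */
    int rvi;           /* requesting virtual interrupt */
};

/* Index of the highest set bit in a 256-bit map, or -1 if none is set. */
static int highest_set_bit(const uint64_t bits[4])
{
    for (int w = 3; w >= 0; w--)
        for (int b = 63; b >= 0; b--)
            if (bits[w] & (UINT64_C(1) << b))
                return w * 64 + b;
    return -1;
}

/* Returns true if the interrupt was consumed by (modelled) hardware,
 * false if it would cause a VM exit and reach a hypervisor handler. */
bool process_notification(int physical_vector, int notification_vector,
                          struct pi_desc_model *pi, struct vapic_model *vapic)
{
    if (physical_vector != notification_vector)
        return false;                    /* step 2: ordinary VM exit */

    pi->on = false;                      /* step 3 (atomic in hardware) */
    /* step 4, the EOI write, is not modelled */

    int top = highest_set_bit(pi->pir);  /* remembered for step 6 */
    for (int w = 0; w < 4; w++) {        /* step 5: OR PIR into VIRR, clear PIR */
        vapic->virr[w] |= pi->pir[w];
        pi->pir[w] = 0;
    }
    if (top > vapic->rvi)                /* step 6: RVI = max(old RVI, top) */
        vapic->rvi = top;
    return true;                         /* step 7 (evaluation) not modelled */
}
```

In this model the function never consults which vCPU the descriptor belongs to, only the descriptor of whatever is currently running, which is why a HLT'ed vCPU cannot be woken through 'posted_intr_vector'.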
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
On 04.03.15 at 14:30, feng...@intel.com wrote:
> - Introduce a new global vector which is used to wake up the HLT'ed vCPU.
>
> Currently, there is a global vector 'posted_intr_vector', which is used as the global notification vector for all vCPUs in the system. This vector is stored in VMCS and CPU considers it as a special vector, uses it to notify the related pCPU when an interrupt is recorded in the posted-interrupt descriptor.
>
> After having VT-d PI, VT-d engine can issue notification event when the assigned devices issue interrupts. We need to add a new global vector to wake up the HLT'ed vCPU, please refer to the following scenario for the usage of this new global vector:
>
> 1. vCPU0 is running on pCPU0
> 2. vCPU0 is HLT'ed and vCPU1 is currently running on pCPU0
> 3. An external interrupt from an assigned device occurs for vCPU0, if we still use 'posted_intr_vector' as the notification vector for vCPU0, the notification event for vCPU0 (the event will go to pCPU0) will be consumed by vCPU1 incorrectly. The worst case is that vCPU0 will never be woken up again since the wakeup event for it is always consumed by other vCPUs incorrectly. So we need to introduce another global vector, named 'pi_wakeup_vector', to wake up the HLT'ed vCPU.

I'm afraid you describe a particular scenario here, but I don't see how this is related to the introduction of another global vector: Either the current (global) vector is sufficient, or another global vector also can't solve your problem. I'm sure I'm missing something here, so please be explicit.

> - Update posted-interrupt descriptor during vCPU scheduling
>
> The basic idea here is:
> 1. When vCPU's state is RUNSTATE_running,
>    - Set 'NV' to 'posted_intr_vector'.
>    - Clear 'SN' to accept posted-interrupts.
>    - Set 'NDST' to the pCPU on which the vCPU will be running.
> [...]

This is pretty hard to read without knowing what the abbreviations actually stand for, and suggesting to hunt for them in the spec isn't very reader friendly either. Please explain these fields, at the very least by way of comments on the structure fields presented earlier.

> On Xen side, what is your opinion about support lowest-priority interrupts for VT-d PI?

I certainly think (as with every other virtualized piece of hardware) that hardware behavior should be emulated as closely as possible. I.e. yes, we should have it eventually. As to the two stage approach mentioned for KVM - I've grown reservations against Intel people making promises towards future implementation of something, i.e. I'm kind of hesitant to agree to such an implementation model. Yet you're to contribute the patches, and I'm surely not planning to veto a stage-1-only implementation as it would be an improvement anyway.

Jan
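For readers following along: in the VT-d specification, NV is the Notification Vector, SN is the Suppress Notification bit and NDST is the Notification Destination. A minimal sketch of the RUNSTATE_running update quoted above, together with the NV switch to 'pi_wakeup_vector' before a vCPU blocks that is described elsewhere in this thread, could look like the following. The struct layout, the function name `pi_update_on_runstate` and the vector values are assumptions for illustration only; the handling of the remaining runstates is elided in the quoted text, so the sketch deliberately leaves them untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder vector numbers: assumptions, not the values Xen allocates. */
enum { POSTED_INTR_VECTOR = 0xf2, PI_WAKEUP_VECTOR = 0xf1 };

/* Local stand-in for the vCPU runstates named in the design. */
enum runstate {
    RUNSTATE_running,
    RUNSTATE_runnable,
    RUNSTATE_blocked,
    RUNSTATE_offline
};

/* Only the scheduling-related descriptor fields, as plain members
 * (the real descriptor packs them into a 64-byte structure). */
struct pi_sched_fields {
    uint8_t  nv;    /* NV: Notification Vector */
    bool     sn;    /* SN: Suppress Notification */
    uint32_t ndst;  /* NDST: Notification Destination (target pCPU) */
};

void pi_update_on_runstate(struct pi_sched_fields *pi,
                           enum runstate new_state, uint32_t pcpu)
{
    switch (new_state) {
    case RUNSTATE_running:
        pi->nv = POSTED_INTR_VECTOR;  /* CPU handles notifications itself */
        pi->sn = false;               /* accept posted interrupts */
        pi->ndst = pcpu;              /* pCPU the vCPU will run on */
        break;
    case RUNSTATE_blocked:
        pi->nv = PI_WAKEUP_VECTOR;    /* ordinary vector: hypervisor gets control */
        break;
    default:
        /* RUNSTATE_runnable / RUNSTATE_offline: elided in the quoted
         * design text, so nothing is assumed here. */
        break;
    }
}
```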
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
On 04/03/15 13:30, Wu, Feng wrote:
> VT-d Posted-interrupt (PI) design for XEN

Thank you very much for this!

> Background
> ==
> With the development of virtualization, there are more and more device assignment requirements. However, today when a VM is running with assigned devices (such as, NIC), external interrupt handling for the assigned devices always needs VMM intervention.
>
> VT-d Posted-interrupt is a more enhanced method to handle interrupts in the virtualization environment. Interrupt posting is the process by which an interrupt request is recorded in a memory-resident posted-interrupt-descriptor structure by the root-complex, followed by an optional notification event issued to the CPU complex.
>
> With VT-d Posted-interrupt we can get the following advantages:
> - Directly delivery of external interrupts to running vCPUs without VMM intervention
> - Decease the interrupt migration complexity. On vCPU migration, software can atomically co-migrate all interrupts targeting the migrating vCPU.

I presume you mean Decrease ? Decease means something quite different.

> Posted-interrupt Introduction
>
> There are two components to the Posted-interrupt architecture: Processor Support and Root-Complex Support
>
> - Processor Support
> Posted-interrupt processing is a feature by which a processor processes the virtual interrupts by recording them as pending on the virtual-APIC page.
>
> Posted-interrupt processing is enabled by setting the process posted interrupts VM-execution control. The processing is performed in response to the arrival of an interrupt with the posted-interrupt notification vector. In response to such an interrupt, the processor processes virtual interrupts recorded in a data structure called a posted-interrupt descriptor.
>
> More information about APICv and CPU-side Posted-interrupt, please refer to Chapter 29, and Section 29.6 in the Intel SDM: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>
> - Root-Complex Support
> Interrupt posting is the process by which an interrupt request (from IOAPIC or MSI/MSIx capable sources) is recorded in a memory-resident posted-interrupt-descriptor structure by the root-complex, followed by an optional notification event issued to the CPU complex. The interrupt request arriving at the root-complex carry the identity of the interrupt request source and a 'remapping-index'. The remapping-index is used to look-up an entry from the memory-resident interrupt-remap-table. Unlike with interrupt-remapping, the interrupt-remap-table-entry for a posted-interrupt, specifies a virtual-vector and a pointer to the posted-interrupt descriptor. The virtual-vector specifies the vector of the interrupt to be recorded in the posted-interrupt descriptor. The posted-interrupt descriptor hosts storage for the virtual-vectors and contains the attributes of the notification event (interrupt) to be issued to the CPU complex to inform CPU/software about pending interrupts recorded in the posted-interrupt descriptor.
>
> More information about VT-d PI, please refer to http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
>
> Design Overview
> ==
> In this design, we will cover the following items:
> 1. Add a variant to control whether enable VT-d posted-interrupt or not.
> 2. VT-d PI feature detection.
> 3. Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.
> 4. Extend IRTE structure to support VT-d PI.
> 5. Introduce a new global vector which is used for waking up the HLT'ed vCPU.
> 6. Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
> 7. Update posted-interrupt descriptor during vCPU scheduling (when the state of the vCPU is transmitted among RUNSTATE_running / RUNSTATE_blocked/ RUNSTATE_runnable / RUNSTATE_offline).
> 8. New boot command line for Xen, which controls VT-d PI feature by user.
> 9. Multicast/broadcast and lowest priority interrupts consideration.
>
> Implementation details
> ===
> - New variant to control VT-d PI

I know what you are trying to say, but New variant does not express what you mean. A new control relating to VT-d PI perhaps?

> Like variant 'iommu_intremap' for interrupt remapping, it is very straightforward to add a new one 'iommu_intpost' for posted-interrupt. 'iommu_intpost' is set only when interrupt remapping and VT-d posted-interrupt are both enabled.

I would avoid mixing names such as PI and intpost. If anything, it should be iommu_postint to keep the naming consistent. (Here and elsewhere).

> - VT-d PI feature detection.
> Bit 59 in VT-d Capability Register is used to report VT-d Posted-interrupt support.
>
> - Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.
> Here is the new structure for posted-interrupt descriptor:
>
> struct pi_desc {
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
-Original Message-
From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
Sent: Thursday, March 05, 2015 2:48 AM
To: Wu, Feng; xen-devel@lists.xen.org
Cc: Zhang, Yang Z; Tian, Kevin; Jan Beulich
Subject: Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN

> On 04/03/15 13:30, Wu, Feng wrote:
>> VT-d Posted-interrupt (PI) design for XEN
>
> Thank you very much for this!
>
>> Background
>> ==
>> With the development of virtualization, there are more and more device assignment requirements. However, today when a VM is running with assigned devices (such as, NIC), external interrupt handling for the assigned devices always needs VMM intervention.
>>
>> VT-d Posted-interrupt is a more enhanced method to handle interrupts in the virtualization environment. Interrupt posting is the process by which an interrupt request is recorded in a memory-resident posted-interrupt-descriptor structure by the root-complex, followed by an optional notification event issued to the CPU complex.
>>
>> With VT-d Posted-interrupt we can get the following advantages:
>> - Directly delivery of external interrupts to running vCPUs without VMM intervention
>> - Decease the interrupt migration complexity. On vCPU migration, software can atomically co-migrate all interrupts targeting the migrating vCPU.
>
> I presume you mean Decrease ?

Yes!

> Decease means something quite different.

Sorry for the typo.

>> Posted-interrupt Introduction
>>
>> There are two components to the Posted-interrupt architecture: Processor Support and Root-Complex Support
>>
>> - Processor Support
>> Posted-interrupt processing is a feature by which a processor processes the virtual interrupts by recording them as pending on the virtual-APIC page.
>>
>> Posted-interrupt processing is enabled by setting the process posted interrupts VM-execution control. The processing is performed in response to the arrival of an interrupt with the posted-interrupt notification vector. In response to such an interrupt, the processor processes virtual interrupts recorded in a data structure called a posted-interrupt descriptor.
>>
>> More information about APICv and CPU-side Posted-interrupt, please refer to Chapter 29, and Section 29.6 in the Intel SDM: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>>
>> - Root-Complex Support
>> Interrupt posting is the process by which an interrupt request (from IOAPIC or MSI/MSIx capable sources) is recorded in a memory-resident posted-interrupt-descriptor structure by the root-complex, followed by an optional notification event issued to the CPU complex. The interrupt request arriving at the root-complex carry the identity of the interrupt request source and a 'remapping-index'. The remapping-index is used to look-up an entry from the memory-resident interrupt-remap-table. Unlike with interrupt-remapping, the interrupt-remap-table-entry for a posted-interrupt, specifies a virtual-vector and a pointer to the posted-interrupt descriptor. The virtual-vector specifies the vector of the interrupt to be recorded in the posted-interrupt descriptor. The posted-interrupt descriptor hosts storage for the virtual-vectors and contains the attributes of the notification event (interrupt) to be issued to the CPU complex to inform CPU/software about pending interrupts recorded in the posted-interrupt descriptor.
>>
>> More information about VT-d PI, please refer to http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
>>
>> Design Overview
>> ==
>> In this design, we will cover the following items:
>> 1. Add a variant to control whether enable VT-d posted-interrupt or not.
>> 2. VT-d PI feature detection.
>> 3. Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.
>> 4. Extend IRTE structure to support VT-d PI.
>> 5. Introduce a new global vector which is used for waking up the HLT'ed vCPU.
>> 6. Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
>> 7. Update posted-interrupt descriptor during vCPU scheduling (when the state of the vCPU is transmitted among RUNSTATE_running / RUNSTATE_blocked/ RUNSTATE_runnable / RUNSTATE_offline).
>> 8. New boot command line for Xen, which controls VT-d PI feature by user.
>> 9. Multicast/broadcast and lowest priority interrupts consideration.
>>
>> Implementation details
>> ===
>> - New variant to control VT-d PI
>
> I know what you are trying to say, but New variant does not express what you mean. A new control relating to VT-d PI perhaps?
>
>> Like variant 'iommu_intremap' for interrupt remapping, it is very straightforward to add a new one 'iommu_intpost' for posted-interrupt. 'iommu_intpost' is set only when interrupt remapping and VT-d posted-interrupt are both enabled.
>
> I would
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
On 05.03.15 at 06:04, feng...@intel.com wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Wednesday, March 04, 2015 11:19 PM
>> On 04.03.15 at 14:30, feng...@intel.com wrote:
>>> - Introduce a new global vector which is used to wake up the HLT'ed vCPU.
>>>
>>> Currently, there is a global vector 'posted_intr_vector', which is used as the global notification vector for all vCPUs in the system. This vector is stored in VMCS and CPU considers it as a special vector, uses it to notify the related pCPU when an interrupt is recorded in the posted-interrupt descriptor.
>>>
>>> After having VT-d PI, VT-d engine can issue notification event when the assigned devices issue interrupts. We need to add a new global vector to wake up the HLT'ed vCPU, please refer to the following scenario for the usage of this new global vector:
>>>
>>> 1. vCPU0 is running on pCPU0
>>> 2. vCPU0 is HLT'ed and vCPU1 is currently running on pCPU0
>>> 3. An external interrupt from an assigned device occurs for vCPU0, if we still use 'posted_intr_vector' as the notification vector for vCPU0, the notification event for vCPU0 (the event will go to pCPU0) will be consumed by vCPU1 incorrectly. The worst case is that vCPU0 will never be woken up again since the wakeup event for it is always consumed by other vCPUs incorrectly. So we need to introduce another global vector, named 'pi_wakeup_vector', to wake up the HLT'ed vCPU.
>>
>> I'm afraid you describe a particular scenario here, but I don't see how this is related to the introduction of another global vector: Either the current (global) vector is sufficient, or another global vector also can't solve your problem. I'm sure I'm missing something here, so please be explicit.
>
> In fact, the new global vector is used for the above scenario. Let me explain this a bit more:
>
> After having VT-d PI, when an external interrupt from an assigned device happens, here is the hardware processing flow:
> 1. Interrupts happen.
> 2. Find the associated IRTE.
> 3. Find the destination vCPU from IRTE (from Posted-interrupt descriptor address)
> 4. Sync the interrupt (stored in IRTE as 'virtual vector') to the PIR field in Posted-interrupt descriptor.
> 5. If needed (Please refer to the VT-d Spec about the condition of issuing Notification Event), issue notification event to the destination CPU which is stored in posted-interrupt descriptor as 'NDST'
>
> Back to the above scenario:
> 1. vCPU0 is running on pCPU0, and the 'NDST' field of vCPU0's posted-interrupt descriptor is pCPU0
> 2. vCPU0 is HLT'ed and vCPU1 is currently running on pCPU0.
> 3. An external interrupt from an assigned device happens, the destination of this interrupt will be determined as above flow (IRTE -- posted-interrupt descriptor address/vCPU -- notification event to 'NDST'). If this external interrupt is for vCPU0, the notification event will be delivered to pCPU0 since the 'NDST' field of vCPU0's posted-interrupt descriptor is pCPU0.
>
> If we use the current (global) vector for the notification event for vCPU0 in the above case, since the current global vector (notification vector) is a particular vector to CPU, vCPU1 will consume it while vCPU1 is currently running on pCPU0, so we fail to wake up the HLT'ed vCPU0. Please refer to Section 29.6 in the Intel SDM about how CPU handles this particular vector: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
>
> After introducing a new global vector named 'pi_wakeup_vector', before the vCPU is HLT'ed, we set the 'NV' field (Notification Vector) in the vCPU's posted-interrupt descriptor to 'pi_wakeup_vector', and this is a normal vector to CPU and CPU will not do special things for it (different from the current global vector). In the handler of this vector, we can wake up the HLT'ed vCPU.

So suppose you have more than one vCPU which most recently ran on pCPU0 - how will the handler for the new vector know which of the vCPU-s it should kick? And if it can know, why couldn't the handler for posted_intr_vector know either (i.e. after introducing a specific handler for it in place of the currently used event_check_interrupt)? (One of the reasons I'm asking, i.e. apart from wanting to understand the model, is the limited amount of vectors we have.)

Jan
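The hardware flow quoted in this message (IRTE, then posted-interrupt descriptor, then notification to 'NDST') might be modelled as below. This is a rough sketch under stated assumptions, not the actual VT-d logic: the struct and function names are invented, the descriptor fields are plain members instead of a packed 64-byte layout, and the notification condition is simplified to "ON clear and SN clear", whereas the quoted text defers to the VT-d specification for the exact rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, unpacked view of a posted-interrupt descriptor. */
struct pi_desc_sketch {
    uint64_t pir[4];  /* posted-interrupt requests, one bit per vector */
    bool     on;      /* ON: outstanding notification */
    bool     sn;      /* SN: suppress notification */
    uint8_t  nv;      /* NV: notification vector */
    uint32_t ndst;    /* NDST: notification destination (pCPU) */
};

/* Hypothetical posted-format IRTE: a virtual vector plus a pointer to
 * the destination vCPU's descriptor. */
struct irte_sketch {
    uint8_t virtual_vector;
    struct pi_desc_sketch *pi;
};

/* What the remapping hardware would emit towards the CPU complex. */
struct notification {
    bool     sent;
    uint32_t dest;    /* taken from NDST */
    uint8_t  vector;  /* taken from NV */
};

struct notification vtd_post_interrupt(const struct irte_sketch *irte)
{
    struct notification n = { false, 0, 0 };
    struct pi_desc_sketch *pi = irte->pi;     /* flow step 3 */
    uint8_t v = irte->virtual_vector;

    /* Flow step 4: record the virtual vector in PIR. */
    pi->pir[v / 64] |= UINT64_C(1) << (v % 64);

    /* Flow step 5, simplified (assumption): notify only when no
     * notification is outstanding and notifications are not suppressed. */
    if (!pi->on && !pi->sn) {
        pi->on = true;
        n.sent = true;
        n.dest = pi->ndst;
        n.vector = pi->nv;
    }
    return n;
}
```

Note that the destination pCPU and the vector both come from the descriptor, so switching 'NV' to 'pi_wakeup_vector' before a vCPU blocks changes what arrives on pCPU0 without touching the IRTE.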