[Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel
Rather than assuming only PV guests need special treatment (and dealing
with that directly when an IRQ gets set up), keep all guest MSI IRQs
masked until either the (HVM) guest unmasks them via vMSI or the (PV,
PVHVM, or PVH) guest sets up an event channel for it.

To not further clutter the common evtchn_bind_pirq() with x86-specific
code, introduce an arch_evtchn_bind_pirq() hook instead.

Reported-by: Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Jan Beulich jbeul...@suse.com
Tested-by: Sander Eikelenboom li...@eikelenboom.it

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2502,6 +2502,25 @@ int unmap_domain_pirq_emuirq(struct doma
     return ret;
 }
 
+void arch_evtchn_bind_pirq(struct domain *d, int pirq)
+{
+    int irq = domain_pirq_to_irq(d, pirq);
+    struct irq_desc *desc;
+    unsigned long flags;
+
+    if ( irq <= 0 )
+        return;
+
+    if ( is_hvm_domain(d) )
+        map_domain_emuirq_pirq(d, pirq, IRQ_PT);
+
+    desc = irq_to_desc(irq);
+    spin_lock_irqsave(&desc->lock, flags);
+    if ( desc->msi_desc )
+        guest_mask_msi_irq(desc, 0);
+    spin_unlock_irqrestore(&desc->lock, flags);
+}
+
 bool_t hvm_domain_use_pirq(const struct domain *d, const struct pirq *pirq)
 {
     return is_hvm_domain(d) && pirq &&
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -422,10 +422,7 @@ void guest_mask_msi_irq(struct irq_desc
 
 static unsigned int startup_msi_irq(struct irq_desc *desc)
 {
-    bool_t guest_masked = (desc->status & IRQ_GUEST) &&
-                          is_hvm_domain(desc->msi_desc->dev->domain);
-
-    msi_set_mask_bit(desc, 0, guest_masked);
+    msi_set_mask_bit(desc, 0, !!(desc->status & IRQ_GUEST));
     return 0;
 }
 
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -502,10 +502,7 @@ static long evtchn_bind_pirq(evtchn_bind
 
     bind->port = port;
 
-#ifdef CONFIG_X86
-    if ( is_hvm_domain(d) && domain_pirq_to_irq(d, pirq) > 0 )
-        map_domain_emuirq_pirq(d, pirq, IRQ_PT);
-#endif
+    arch_evtchn_bind_pirq(d, pirq);
 
  out:
     spin_unlock(&d->event_lock);
--- a/xen/include/asm-arm/irq.h
+++ b/xen/include/asm-arm/irq.h
@@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,
 
 void arch_move_irqs(struct vcpu *v);
 
+#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
+
 /* Set IRQ type for an SPI */
 int irq_set_spi_type(unsigned int spi, unsigned int type);
 
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
 unsigned int arch_hwdom_irqs(domid_t);
 #endif
 
+#ifndef arch_evtchn_bind_pirq
+void arch_evtchn_bind_pirq(struct domain *, int pirq);
+#endif
+
 #endif /* __XEN_IRQ_H__ */
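The masking rule from the commit message can be modelled in a few lines of standalone C. This is only an illustrative sketch, not Xen code: struct model_irq and the model_*() functions are hypothetical names invented for this example, standing in for the behaviour of startup_msi_irq(), the vMSI unmask path and arch_evtchn_bind_pirq().

```c
#include <stdbool.h>

/* Hypothetical model of one MSI IRQ; not a Xen structure. */
struct model_irq {
    bool guest_owned;   /* stands in for (desc->status & IRQ_GUEST) */
    bool is_msi;        /* stands in for desc->msi_desc != NULL */
    bool guest_masked;  /* the "guest mask" state of the IRQ */
};

/* At startup every guest MSI IRQ begins masked, whatever the guest type. */
void model_startup(struct model_irq *irq)
{
    irq->guest_masked = irq->guest_owned;
}

/* An HVM guest lifts the mask through its emulated MSI (vMSI) registers. */
void model_vmsi_unmask(struct model_irq *irq)
{
    irq->guest_masked = false;
}

/* A PV, PVHVM or PVH guest lifts the mask by binding an event channel. */
void model_bind_evtchn(struct model_irq *irq)
{
    if ( irq->is_msi )
        irq->guest_masked = false;
}
```

In this model a PVHVM guest that never touches vMSI still gets its interrupt unmasked once it binds the event channel, which is the case the patch is meant to cover.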
Re: [Xen-devel] [PATCH v2] net/bridge: Use __in6_dev_get rather than in6_dev_get in br_validate_ipv6
On Tue, Jul 07, 2015 at 11:34:34AM -0700, Stephen Hemminger wrote:
> On Tue, 7 Jul 2015 15:55:21 +0100
> Julien Grall julien.gr...@citrix.com wrote:
>
> > The commit efb6de9b4ba0092b2c55f6a52d16294a8a698edd "netfilter: bridge:
> > forward IPv6 fragmented packets" introduced a new function
> > br_validate_ipv6 which take a reference on the inet6 device. Although,
> > the reference is not released at the end.
> >
> > This will result to the impossibility to destroy any netdevice using
> > ipv6 and bridge.
> >
> > It's possible to directly retrieve the inet6 device without taking a
> > reference as all netfilter hooks are protected by rcu_read_lock via
> > nf_hook_slow.
> >
> > Spotted while trying to destroy a Xen guest on the upstream Linux:
> > "unregister_netdevice: waiting for vif1.0 to become free. Usage count = 1"
> >
> > Signed-off-by: Julien Grall julien.gr...@citrix.com
> > Cc: Bernhard Thaler bernhard.tha...@wvnet.at
> > Cc: Pablo Neira Ayuso pa...@netfilter.org
> > Cc: f...@strlen.de
> > Cc: ian.campb...@citrix.com
> > Cc: wei.l...@citrix.com
> > Cc: Bob Liu bob@oracle.com
> >
> > ---
> > Note that it's impossible to create new guest after this message.
> > I'm not sure if it's normal.
> >
> > Changes in v2:
> >     - Don't take a reference to inet6.
> >     - This was "net/bridge: Add missing in6_dev_put in
> >       br_validate_ipv6" [0]
> >
> > [0] https://lkml.org/lkml/2015/7/3/443
> > ---
> >  net/bridge/br_netfilter_ipv6.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> I like this simple solution
>
> Acked-by: Stephen Hemminger step...@networkplumber.org

Applied, thanks.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel
On 08/07/2015 09:56, Jan Beulich wrote:
> Rather than assuming only PV guests need special treatment (and dealing
> with that directly when an IRQ gets set up), keep all guest MSI IRQs
> masked until either the (HVM) guest unmasks them via vMSI or the (PV,
> PVHVM, or PVH) guest sets up an event channel for it.
>
> To not further clutter the common evtchn_bind_pirq() with x86-specific
> code, introduce an arch_evtchn_bind_pirq() hook instead.
>
> Reported-by: Sander Eikelenboom li...@eikelenboom.it
> Signed-off-by: Jan Beulich jbeul...@suse.com
> Tested-by: Sander Eikelenboom li...@eikelenboom.it

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
On 08.07.15 at 12:36, feng...@intel.com wrote:
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Tuesday, June 30, 2015 1:07 AM
>
> > On 24/06/15 06:18, Feng Wu wrote:
> > > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> > >      if ( v->vcpu_id == 0 )
> > >          v->arch.user_regs.eax = 1;
> > >
> > > +    tasklet_init(
> > > +        &v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
> > > +        pi_vcpu_wakeup_tasklet_handler,
> > > +        (unsigned long)v);
> >
> > c/s f6dd295 indicates that the global tasklet lock causes a bottleneck
> > when injecting interrupts, and replaced a tasklet with a softirq to fix
> > the scalability issue.
> >
> > I would expect exactly the bottleneck to exist here.
>
> I am still considering this comments. Jan, what is your opinion about this?

My opinion here is that I expect you to respond to Andrew.

Jan
Re: [Xen-devel] [RFC PATCH v3 11/18] xen/arm: ITS: Add GITS registers emulation
On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote:
> From: Vijaya Kumar K vijaya.ku...@caviumnetworks.com
>
> Emulate GITS* registers and handle LPI configuration
> table update trap.

These need to only be exposed to a guest which has been configured with
an ITS. For dom0 that means at a minimum it needs to be based on the
capabilities of the underlying hardware.

The same is true of the next patch adding the GICR registers.

For domU it seems there is currently no ITS exposed to them, since there
is no toolstack changes here, so the emulation should be configured
accordingly.

> Signed-off-by: Vijaya Kumar K vijaya.ku...@caviumnetworks.com
> ---
>  xen/arch/arm/vgic-v3-its.c    | 516 +
>  xen/include/asm-arm/gic-its.h | 14 ++
>  2 files changed, 530 insertions(+)
>
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index 0671434..fa9dccc 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -63,6 +63,46 @@ static void dump_cmd(its_cmd_block *cmd)
>  }
>  #endif
>
> +void vgic_its_disable_lpis(struct vcpu *v, uint32_t vlpi)
> +{
> +    struct pending_irq *p;
> +    unsigned long flags;
> +
> +    p = irq_to_pending(v, vlpi);
> +    clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> +    gic_remove_from_queues(v, vlpi);
> +    if ( p->desc != NULL )
> +    {
> +        spin_lock_irqsave(&p->desc->lock, flags);
> +        p->desc->handler->disable(p->desc);
> +        spin_unlock_irqrestore(&p->desc->lock, flags);
> +    }
> +}
> +
> +void vgic_its_enable_lpis(struct vcpu *v, uint32_t vlpi, uint8_t priority)
> +{
> +    struct pending_irq *p;
> +    unsigned long flags;
> +
> +    /* Get plpi for the given vlpi */
> +    p = irq_to_pending(v, vlpi);
> +    p->priority = priority;
> +    set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> +
> +    spin_lock_irqsave(&v->arch.vgic.lock, flags);
> +
> +    if ( !list_empty(&p->inflight) &&
> +         !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
> +        gic_raise_guest_irq(v, irq_to_virq(p->desc), p->priority);
> +
> +    spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> +    if ( p->desc != NULL )
> +    {
> +        spin_lock_irqsave(&p->desc->lock, flags);
> +        p->desc->handler->enable(p->desc);
> +        spin_unlock_irqrestore(&p->desc->lock, flags);
> +    }
> +}
>  /* ITS device table helper functions */
>  int vits_vdevice_entry(struct domain *d, uint32_t dev_id,
>                         struct vdevice_table *entry, int set)
> @@ -649,6 +689,482 @@ err:
>      return 0;
>  }
>
> +static int vgic_v3_gits_lpi_mmio_read(struct vcpu *v, mmio_info_t *info)
> +{
> +    uint32_t offset;
> +    struct hsr_dabt dabt = info->dabt;
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    register_t *r = select_user_reg(regs, dabt.reg);
> +    uint8_t cfg;
> +
> +    offset = info->gpa -
> +             (v->domain->arch.vits->propbase & 0xf000UL);
> +
> +    if ( offset < SZ_64K )
> +    {
> +        DPRINTK("vITS:d%dv%d LPI Table read offset 0x%x\n",
> +                v->domain->domain_id, v->vcpu_id, offset);
> +        cfg = readb_relaxed(v->domain->arch.vits->prop_page + offset);
> +        *r = cfg;
> +        return 1;
> +    }
> +    else
> +        dprintk(XENLOG_G_ERR, "vITS:d%dv%d LPI Table read with wrong offset 0x%x\n",
> +                v->domain->domain_id, v->vcpu_id, offset);
> +
> +
> +    return 0;
> +}
> +
> +static int vgic_v3_gits_lpi_mmio_write(struct vcpu *v, mmio_info_t *info)
> +{
> +    uint32_t offset;
> +    uint32_t vid;
> +    uint8_t cfg;
> +    bool_t enable;
> +    struct hsr_dabt dabt = info->dabt;
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    register_t *r = select_user_reg(regs, dabt.reg);
> +
> +    offset = info->gpa -
> +             (v->domain->arch.vits->propbase & 0xf000UL);
> +
> +    vid = offset + NR_GIC_LPI;
> +    if ( offset < SZ_64K )
> +    {
> +        DPRINTK("vITS:d%dv%d LPI Table write offset 0x%x\n",
> +                v->domain->domain_id, v->vcpu_id, offset);
> +        cfg = readb_relaxed(v->domain->arch.vits->prop_page + offset);
> +        enable = (cfg & *r) & 0x1;
> +
> +        if ( !enable )
> +            vgic_its_enable_lpis(v, vid, (*r & 0xfc));
> +        else
> +            vgic_its_disable_lpis(v, vid);
> +
> +        /* Update virtual prop page */
> +        writeb_relaxed((*r & 0xff),
> +                       v->domain->arch.vits->prop_page + offset);
> +
> +        return 1;
> +    }
> +    else
> +        dprintk(XENLOG_G_ERR, "vITS:d%dv%d LPI Table invalid write @ 0x%x\n",
> +                v->domain->domain_id, v->vcpu_id, offset);
> +
> +    return 0;
> +}
> +
> +static const struct mmio_handler_ops vgic_gits_lpi_mmio_handler = {
> +    .read_handler  = vgic_v3_gits_lpi_mmio_read,
> +    .write_handler = vgic_v3_gits_lpi_mmio_write,
> +};
> +
> +int vgic_its_unmap_lpi_prop(struct vcpu *v)
> +{
> +    paddr_t maddr;
> +    uint32_t lpi_size;
> +    int i;
> +
> +    maddr = v->domain->arch.vits->propbase & 0xf000UL;
> +    lpi_size = 1UL << ((v->domain->arch.vits->propbase & 0x1f) + 1);
> +
> +    DPRINTK("vITS:d%dv%d Unmap guest
Re: [Xen-devel] [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
>
> The basic idea here is:
> 1. When vCPU's state is RUNSTATE_running,
>    - Set 'NV' to 'Notification Vector'.
>    - Clear 'SN' to accept PI.
>    - Set 'NDST' to the right pCPU.
> 2. When vCPU's state is RUNSTATE_blocked,
>    - Set 'NV' to 'Wake-up Vector', so we can wake up the
>      related vCPU when posted-interrupt happens for it.
>    - Clear 'SN' to accept PI.
> 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
>    - Set 'SN' to suppress non-urgent interrupts.
>      (Currently, we only support non-urgent interrupts)
>    - Set 'NV' back to 'Notification Vector' if needed.
>
> Signed-off-by: Feng Wu feng...@intel.com

Acked-by: Kevin Tian kevin.t...@intel.com
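The three cases listed in the commit message can be written down as a small state update. The following is a minimal sketch under stated assumptions, not the code from the patch: struct model_pi_desc, enum model_runstate, model_pi_update() and the two vector numbers are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the RUNSTATE_* values named in the patch. */
enum model_runstate {
    MODEL_RUNNING,
    MODEL_RUNNABLE,
    MODEL_BLOCKED,
    MODEL_OFFLINE
};

/* Simplified descriptor holding only the three fields discussed above. */
struct model_pi_desc {
    uint8_t  nv;    /* 'NV': notification vector */
    bool     sn;    /* 'SN': suppress notification */
    uint32_t ndst;  /* 'NDST': notification destination (pCPU id) */
};

/* Placeholder vector numbers, chosen only for this example. */
enum { MODEL_NOTIFY_VECTOR = 0xf2, MODEL_WAKEUP_VECTOR = 0xf1 };

void model_pi_update(struct model_pi_desc *pi, enum model_runstate st,
                     uint32_t pcpu)
{
    switch ( st )
    {
    case MODEL_RUNNING:
        /* Case 1: post directly to the pCPU the vCPU runs on. */
        pi->nv = MODEL_NOTIFY_VECTOR;
        pi->sn = false;
        pi->ndst = pcpu;
        break;

    case MODEL_BLOCKED:
        /* Case 2: a posted interrupt must wake the vCPU up. */
        pi->nv = MODEL_WAKEUP_VECTOR;
        pi->sn = false;
        break;

    case MODEL_RUNNABLE:
    case MODEL_OFFLINE:
        /* Case 3: suppress notifications until the vCPU runs again. */
        pi->sn = true;
        pi->nv = MODEL_NOTIFY_VECTOR;
        break;
    }
}
```

A real implementation has to update these fields atomically with respect to the hardware, which this model deliberately leaves out.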
Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set
> From: Wu, Feng
> Sent: Wednesday, July 08, 2015 6:11 PM
>
> > From: Tian, Kevin
> > Sent: Wednesday, July 08, 2015 5:06 PM
> >
> > > From: Wu, Feng
> > > Sent: Wednesday, June 24, 2015 1:18 PM
> > >
> > > Currently, we don't support urgent interrupt, all interrupts
> > > are recognized as non-urgent interrupt, so we cannot send
> > > posted-interrupt when 'SN' is set.
> > >
> > > Signed-off-by: Feng Wu feng...@intel.com
> > > ---
> > > v3:
> > > use cmpxchg to test SN/ON and set ON
> > >
> > >  xen/arch/x86/hvm/vmx/vmx.c | 32
> > >  1 file changed, 28 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > > index 0837627..b94ef6a 100644
> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -1686,6 +1686,8 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> > >
> > >  static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
> > >  {
> > > +    struct pi_desc old, new, prev;
> > > +
> >
> > move to 'else if'.
> >
> > >      if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
> > >          return;
> > >
> > > @@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
> > >           */
> > >          pi_set_on(&v->arch.hvm_vmx.pi_desc);
> > >      }
> > > -    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> > > +    else
> > >      {
> > > +        prev.control = 0;
> > > +
> > > +        do {
> > > +            old.control = v->arch.hvm_vmx.pi_desc.control &
> > > +                          ~(1 << POSTED_INTR_ON | 1 << POSTED_INTR_SN);
> > > +            new.control = v->arch.hvm_vmx.pi_desc.control |
> > > +                          1 << POSTED_INTR_ON;
> > > +
> > > +            /*
> > > +             * Currently, we don't support urgent interrupt, all
> > > +             * interrupts are recognized as non-urgent interrupt,
> > > +             * so we cannot send posted-interrupt when 'SN' is set.
> > > +             * Besides that, if 'ON' is already set, we cannot set
> > > +             * posted-interrupts as well.
> > > +             */
> > > +            if ( prev.sn || prev.on )
> > > +            {
> > > +                vcpu_kick(v);
> > > +                return;
> > > +            }
> >
> > would it make more sense to move above check after cmpxchg?
>
> My original idea is that, we only need to do the check when
> prev.control != old.control, which means the cmpxchg is not
> successful completed. If we add the check between cmpxchg and
> while ( prev.control != old.control ), it seems the logic is not
> so clear, since we don't need to check prev.sn and prev.on when
> cmxchg succeeds in setting the new value.
>
> Thanks,
> Feng

Then it'd be clearer if you move the check the start of the loop, so
you can avoid two additional reads when the prev.on/sn is set. :-)

Thanks
Kevin
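The loop shape being discussed, with the ON/SN check at the top of each iteration, can be sketched with C11 atomics. This is a standalone illustration and not the Xen implementation: model_pi_try_set_on() and the MODEL_PI_* bit positions are assumptions made for the example, and Xen uses its own cmpxchg() primitive rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions within the control word, for illustration only. */
#define MODEL_PI_ON (UINT64_C(1) << 0)  /* outstanding notification */
#define MODEL_PI_SN (UINT64_C(1) << 1)  /* suppress notification */

/*
 * Try to set ON atomically.
 *
 * Returns true if ON was newly set, in which case the caller would send
 * the notification event. Returns false if ON or SN was already set, in
 * which case the caller would fall back to something like vcpu_kick().
 */
bool model_pi_try_set_on(_Atomic uint64_t *control)
{
    uint64_t old = atomic_load(control);

    do {
        /* Check at the start of every iteration, before the exchange. */
        if ( old & (MODEL_PI_ON | MODEL_PI_SN) )
            return false;
        /* On failure the compare-exchange reloads 'old' for the retry. */
    } while ( !atomic_compare_exchange_weak(control, &old,
                                            old | MODEL_PI_ON) );

    return true;
}
```

Because a failed compare-exchange writes the current value back into old, the check at the top of the loop always sees fresh state without a separate re-read of the descriptor.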
Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes
> -----Original Message-----
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 7:46 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z;
> george.dun...@eu.citrix.com
> Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config changes
>
> > From: Wu, Feng
> > Sent: Wednesday, July 08, 2015 6:32 PM
> >
> > > -----Original Message-----
> > > From: Tian, Kevin
> > > Sent: Wednesday, July 08, 2015 6:23 PM
> > > To: Wu, Feng; xen-devel@lists.xen.org
> > > Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z;
> > > george.dun...@eu.citrix.com
> > > Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config changes
> > >
> > > > From: Wu, Feng
> > > > Sent: Wednesday, June 24, 2015 1:18 PM
> > > >
> > > > When guest changes its interrupt configuration (such as, vector, etc.)
> > > > for direct-assigned devices, we need to update the associated IRTE
> > > > with the new guest vector, so external interrupts from the assigned
> > > > devices can be injected to guests without VM-Exit.
> > > >
> > > > For lowest-priority interrupts, we use vector-hashing mechanism to find
> > > > the destination vCPU. This follows the hardware behavior, since modern
> > > > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> > > >
> > > > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > > > still use interrupt remapping.
> > > >
> > > > Signed-off-by: Feng Wu feng...@intel.com
> > > > ---
> > > > v3:
> > > > - Use bitmap to store the all the possible destination vCPUs of an
> > > >   interrupt, then trying to find the right destination from the bitmap
> > > > - Typo and some small changes
> > > >
> > > >  xen/drivers/passthrough/io.c | 96 +++-
> > > >  1 file changed, 95 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > > > index 9b77334..18e24e1 100644
> > > > --- a/xen/drivers/passthrough/io.c
> > > > +++ b/xen/drivers/passthrough/io.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include <asm/hvm/iommu.h>
> > > >  #include <asm/hvm/support.h>
> > > >  #include <xen/hvm/irq.h>
> > > > +#include <asm/io_apic.h>
> > > >
> > > >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> > > >
> > > > @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
> > > >      xfree(dpci);
> > > >  }
> > > >
> > > > +/*
> > > > + * The purpose of this routine is to find the right destination vCPU for
> > > > + * an interrupt which will be delivered by VT-d posted-interrupt. There
> > > > + * are several cases as below:
> > >
> > > If you aim to have this interface common to more usages, don't restrict to
> > > VT-d posted-interrupt which should be just an example.
> >
> > Yes, making this a common interface should be better.
> >
> > > > + *
> > > > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > > > + *   guest vector using vector-hashing mechanism and return true. This follows
> > > > + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> > > > + *   handle the lowest-priority interrupt.
> > >
> > > Does AMD use same hashing mechanism? Can this interface be reused by other
> > > IOMMU type or it's an Intel specific implementation?
> >
> > I am not sure how AMD handle lowest-priority. Intel hardware guys told me
> > recent Intel hardware platform use this method to deliver lowest-priority
> > interrupts. What do you mean by other IOMMU type?
>
> OS doesn't assume how vector hashing is done in hardware level. So it should
> be fine to use Intel algorithm in this emulation path. However my point is
> just about the comment "since modern Intel CPUs use vector hashing to handle
> the lowest-priority interrupt". It's not because Intel does so. It's the
> implementation option that you choose Intel algorithm here.

here I can mention: we choose vector-hashing for lowest-priority handling and
list Intel as an example to use it, okay?

Thanks,
Feng

> Thanks
> Kevin
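To make the "bitmap plus vector hashing" idea from the changelog concrete, here is a minimal sketch. The thread does not spell out the hash, so taking the guest vector modulo the number of candidate vCPUs is an assumption of this example, as are the name model_pick_dest_vcpu() and the 64-bit bitmap.

```c
#include <stdint.h>

/*
 * Pick a destination from a bitmap of candidate vCPUs, where bit i set
 * means vCPU i is a possible destination of the interrupt.
 *
 * Returns the chosen vCPU index, or -1 if the bitmap is empty.
 */
int model_pick_dest_vcpu(uint64_t candidates, uint8_t vector)
{
    unsigned int count = 0, target, i;

    /* Count the candidate vCPUs. */
    for ( i = 0; i < 64; i++ )
        if ( candidates & (UINT64_C(1) << i) )
            count++;

    if ( count == 0 )
        return -1;

    /* Assumed hash: guest vector modulo the number of candidates. */
    target = vector % count;

    /* Walk the bitmap to the target-th set bit. */
    for ( i = 0; i < 64; i++ )
    {
        if ( !(candidates & (UINT64_C(1) << i)) )
            continue;
        if ( target-- == 0 )
            return (int)i;
    }

    return -1; /* not reached when count > 0 */
}
```

With candidates 0xA (vCPUs 1 and 3), vector 0x30 selects vCPU 1 and vector 0x31 selects vCPU 3, so a given vector always lands on the same vCPU as long as the candidate set does not change.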
Re: [Xen-devel] [libvirt] [PATCH] libxl: support dom0
On 07.07.2015 01:27, Jim Fehlig wrote:
> On 07/06/2015 03:46 PM, Jim Fehlig wrote:
>> In Xen, dom0 is really just another domain that supports ballooning,
>> adding/removing devices, changing vcpu configuration, etc. This patch
>> adds support to the libxl driver for managing dom0. Note that the
>> legacy xend driver has long supported managing dom0.
>>
>> Operations that are not supported on dom0 are filtered in libvirt
>> where a sensible error is reported. Errors from libxl are not
>> always helpful. E.g., attempting a save on dom0 results in
>>
>> 2015-06-23 15:25:05 MDT libxl: debug: libxl_dom.c:1570:libxl__toolstack_save: domain=0 toolstack data size=8
>> 2015-06-23 15:25:05 MDT libxl: debug: libxl.c:979:do_libxl_domain_suspend: ao 0x7f7e68000b70: inprogress: poller=0x7f7e68000930, flags=i
>> 2015-06-23 15:25:05 MDT libxl-save-helper: debug: starting save: Success
>> 2015-06-23 15:25:05 MDT xc: detail: xc_domain_save_suse: starting save of domid 0
>> 2015-06-23 15:25:05 MDT xc: error: Couldn't map live_shinfo (3 = No such process): Internal error
>> 2015-06-23 15:25:05 MDT xc: detail: Save exit of domid 0 with errno=3
>> 2015-06-23 15:25:05 MDT libxl-save-helper: debug: complete r=1: No such process
>> 2015-06-23 15:25:05 MDT libxl: error: libxl_dom.c:1876:libxl__xc_domain_save_done: saving domain: domain did not respond to suspend request: No such process
>> 2015-06-23 15:25:05 MDT libxl: error: libxl_dom.c:2033:remus_teardown_done: Remus: failed to teardown device for guest with domid 0, rc -8
>>
>> Signed-off-by: Jim Fehlig jfeh...@suse.com
>> ---
>>  src/libxl/libxl_driver.c | 95
>>  1 file changed, 95 insertions(+)
>>
>> diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
>> index 149ef70..d0b76ac 100644
>> --- a/src/libxl/libxl_driver.c
>> +++ b/src/libxl/libxl_driver.c
>> @@ -79,6 +79,15 @@ VIR_LOG_INIT("libxl.libxl_driver");
>>  /* Number of Xen scheduler parameters */
>>  #define XEN_SCHED_CREDIT_NPARAM 2
>>
>> +#define LIBXL_CHECK_DOM0_GOTO(name, label) \
>> +    do { \
>> +        if (STREQ_NULLABLE(name, "Domain-0")) { \
>> +            virReportError(VIR_ERR_OPERATION_INVALID, "%s", \
>> +                           _("Domain-0 does not support requested operation")); \
>> +            goto label; \
>> +        } \
>> +    } while (0)
>> +
>>
>>  static libxlDriverPrivatePtr libxl_driver;
>>
>> @@ -501,6 +510,62 @@ const struct libxl_event_hooks ev_hooks = {
>>  };
>>
>>  static int
>> +libxlAddDom0(libxlDriverPrivatePtr driver)
>> +{
>> +    libxlDriverConfigPtr cfg = libxlDriverConfigGet(driver);
>> +    virDomainDefPtr def = NULL;
>> +    virDomainObjPtr vm = NULL;
>> +    virDomainDefPtr oldDef = NULL;
>> +    libxl_dominfo d_info;
>> +    int ret = -1;
>> +
>> +    libxl_dominfo_init(&d_info);
>> +
>> +    /* Ensure we have a dom0 */
>> +    if (libxl_domain_info(cfg->ctx, &d_info, 0) != 0) {
>> +        virReportError(VIR_ERR_INTERNAL_ERROR,
>> +                       "%s", _("unable to get Domain-0 information from libxenlight"));
>> +        goto cleanup;
>> +    }
>> +
>> +    if (!(def = virDomainDefNew()))
>> +        goto cleanup;
>> +
>> +    def->id = 0;
>> +    def->virtType = VIR_DOMAIN_VIRT_XEN;
>> +    if (VIR_STRDUP(def->name, "Domain-0") < 0)
>> +        goto cleanup;
>> +
>> +    def->os.type = VIR_DOMAIN_OSTYPE_XEN;
>> +
>> +    if (virUUIDParse("----", def->uuid) < 0)
>> +        goto cleanup;
>> +
>> +    vm->def->vcpus = d_info.vcpu_online;
>> +    vm->def->maxvcpus = d_info.vcpu_max_id + 1;
>> +    vm->def->mem.cur_balloon = d_info.current_memkb;
>> +    vm->def->mem.max_balloon = d_info.max_memkb;
>
> Oops. Before sending the patch, but after testing it again, I moved the call to
> libxl_domain_info to the beginning of this function. I also moved setting the
> vcpu and memory info earlier, but
>
>> +
>> +    if (!(vm = virDomainObjListAdd(driver->domains, def,
>> +                                   driver->xmlopt,
>> +                                   0,
>> +                                   &oldDef)))
>> +        goto cleanup;
>> +
>> +    def = NULL;
>> +    ret = 0;
>
> before getting a virDomainObj - ouch. Consider the following obvious fix squashed in
>
> diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
> index d0b76ac..c0dd00b 100644
> --- a/src/libxl/libxl_driver.c
> +++ b/src/libxl/libxl_driver.c
> @@ -541,18 +541,19 @@ libxlAddDom0(libxlDriverPrivatePtr driver)
>      if (virUUIDParse("----", def->uuid) < 0)
>          goto cleanup;
>
> +    if (!(vm = virDomainObjListAdd(driver->domains, def,
> +                                   driver->xmlopt,
> +                                   0,
> +                                   &oldDef)))
> +        goto cleanup;
> +
> +    def = NULL;
> +
>      vm->def->vcpus = d_info.vcpu_online;
>      vm->def->maxvcpus = d_info.vcpu_max_id + 1;
[Xen-devel] [PATCH OSSTEST v8 06/14] Test pygrub and pvgrub on the regular flights
Since we now have the ability to test these drop one of each of
pygrub, pvgrub-32 and pvgrub-64 into the standard flights.

Omitting the {Guest}_diver runvar causes ts-debian-di-install to use
the d-i images in the location configured via TftpDiVersion, so they
are Version Controlled along with the d-i version used for the host.

This adds three new jobs:

+test-amd64-amd64-amd64-pvgrub all_hostflags arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test
+test-amd64-amd64-amd64-pvgrub arch amd64
+test-amd64-amd64-amd64-pvgrub buildjob build-amd64
+test-amd64-amd64-amd64-pvgrub debian_arch amd64
+test-amd64-amd64-amd64-pvgrub debian_bootloader pvgrub
+test-amd64-amd64-amd64-pvgrub debian_method netboot
+test-amd64-amd64-amd64-pvgrub debian_suite wheezy
+test-amd64-amd64-amd64-pvgrub kernbuildjob build-amd64-pvops
+test-amd64-amd64-amd64-pvgrub kernkind pvops
+test-amd64-amd64-amd64-pvgrub toolstack xl
+test-amd64-amd64-amd64-pvgrub xenbuildjob build-amd64

+test-amd64-amd64-i386-pvgrub all_hostflags arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test
+test-amd64-amd64-i386-pvgrub arch amd64
+test-amd64-amd64-i386-pvgrub buildjob build-amd64
+test-amd64-amd64-i386-pvgrub debian_arch i386
+test-amd64-amd64-i386-pvgrub debian_bootloader pvgrub
+test-amd64-amd64-i386-pvgrub debian_method netboot
+test-amd64-amd64-i386-pvgrub debian_suite wheezy
+test-amd64-amd64-i386-pvgrub kernbuildjob build-amd64-pvops
+test-amd64-amd64-i386-pvgrub kernkind pvops
+test-amd64-amd64-i386-pvgrub toolstack xl
+test-amd64-amd64-i386-pvgrub xenbuildjob build-amd64

+test-amd64-amd64-pygrub all_hostflags arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test
+test-amd64-amd64-pygrub arch amd64
+test-amd64-amd64-pygrub buildjob build-amd64
+test-amd64-amd64-pygrub debian_arch amd64
+test-amd64-amd64-pygrub debian_bootloader pygrub
+test-amd64-amd64-pygrub debian_method netboot
+test-amd64-amd64-pygrub debian_suite wheezy
+test-amd64-amd64-pygrub kernbuildjob build-amd64-pvops
+test-amd64-amd64-pygrub kernkind pvops
+test-amd64-amd64-pygrub toolstack xl
+test-amd64-amd64-pygrub xenbuildjob build-amd64

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v7: Use {Guest}_suite not {Guest}_dist as runvar to choose version.
    Refreshed runvars in commit message.

v3: added runvar details
---
 make-flight | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/make-flight b/make-flight
index de8393a..725da26 100755
--- a/make-flight
+++ b/make-flight
@@ -325,6 +325,42 @@ do_passthrough_tests () {
   done
 }
 
+do_pygrub_tests () {
+  if [ $xenarch != amd64 -o $dom0arch != amd64 -o "$kern" != "" ]; then
+    return
+  fi
+
+  job_create_test test-$xenarch$kern-$dom0arch-pygrub \
+    test-debian-di xl $xenarch $dom0arch \
+      debian_arch=amd64 \
+      debian_suite=$guestsuite \
+      debian_method=netboot \
+      debian_bootloader=pygrub \
+      all_hostflags=$most_hostflags
+}
+
+do_pvgrub_tests () {
+  if [ $xenarch != amd64 -o $dom0arch != amd64 -o "$kern" != "" ]; then
+    return
+  fi
+
+  job_create_test test-$xenarch$kern-$dom0arch-amd64-pvgrub \
+    test-debian-di xl $xenarch $dom0arch \
+      debian_arch=amd64 \
+      debian_suite=$guestsuite \
+      debian_method=netboot \
+      debian_bootloader=pvgrub \
+      all_hostflags=$most_hostflags \
+
+  job_create_test test-$xenarch$kern-$dom0arch-i386-pvgrub \
+    test-debian-di xl $xenarch $dom0arch \
[Xen-devel] [PATCH OSSTEST v8 01/14] mfi-common: Allow make-*flight to filter the set of build jobs to include
By using the same job_create_build(_filter_callback) scheme used for
the test jobs.

Will be used in make-distros-flight.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v8: Moved to head of queue, make-distros-flight isn't introduced yet
    so that hunk is dropped here and comes back later on.
---
 make-flight | 4
 mfi-common  | 21 +++--
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/make-flight b/make-flight
index c763ce9..de8393a 100755
--- a/make-flight
+++ b/make-flight
@@ -34,6 +34,10 @@ flight=`./cs-flight-create $blessing $branch`
 defsuite=`getconfig DebianSuite`
 defguestsuite=`getconfig GuestDebianSuite`
 
+job_create_build_filter_callback () {
+  :
+}
+
 if [ x$buildflight = x ]; then
 
   create_build_jobs
diff --git a/mfi-common b/mfi-common
index a9e966f..a100afb 100644
--- a/mfi-common
+++ b/mfi-common
@@ -54,6 +54,15 @@ xenbranch_xsm_variants () {
   esac
 }
 
+job_create_build () {
+  job_create_build_filter_callback "$@" || return 0
+
+  local job=$1; shift
+  local recipe=$1; shift
+
+  ./cs-job-create $flight $job $recipe "$@"
+}
+
 create_build_jobs () {
 
   local arch
@@ -164,7 +173,7 @@ create_build_jobs () {
       else
         xsm_suffix=
       fi
-      ./cs-job-create $flight build-$arch$xsm_suffix build \
+      job_create_build build-$arch$xsm_suffix build \
         arch=$arch enable_xend=$build_defxend enable_ovmf=$enable_ovmf \
         enable_xsm=$enable_xsm \
         tree_qemu=$TREE_QEMU \
@@ -183,7 +192,7 @@ create_build_jobs () {
     done
 
     if [ $build_extraxend = true ] ; then
-    ./cs-job-create $flight build-$arch-xend build \
+    job_create_build build-$arch-xend build \
         arch=$arch enable_xend=true enable_ovmf=$enable_ovmf \
         tree_qemu=$TREE_QEMU \
         tree_qemuu=$TREE_QEMU_UPSTREAM \
@@ -196,7 +205,7 @@ create_build_jobs () {
         revision_qemuu=$REVISION_QEMU_UPSTREAM
     fi
 
-    ./cs-job-create $flight build-$arch-pvops build-kern \
+    job_create_build build-$arch-pvops build-kern \
         arch=$arch kconfighow=xen-enable-xen-config \
         $RUNVARS $BUILD_RUNVARS $BUILD_LINUX_RUNVARS $arch_runvars \
         $suite_runvars \
@@ -208,7 +217,7 @@ create_build_jobs () {
 
     if [ x$REVISION_LIBVIRT != xdisable ]; then
 
-    ./cs-job-create $flight build-$arch-libvirt build-libvirt \
+    job_create_build build-$arch-libvirt build-libvirt \
         arch=$arch \
         tree_xen=$TREE_XEN \
         $RUNVARS $BUILD_RUNVARS $BUILD_LIBVIRT_RUNVARS $arch_runvars \
@@ -223,7 +232,7 @@ create_build_jobs () {
 
     case $arch in
     i386|amd64)
-    ./cs-job-create $flight build-$arch-rumpuserxen build-rumpuserxen \
+    job_create_build build-$arch-rumpuserxen build-rumpuserxen \
         arch=$arch \
         tree_xen=$TREE_XEN \
         $RUNVARS $BUILD_RUNVARS $BUILD_RUMPUSERXEN_RUNVARS $arch_runvars \
@@ -252,7 +261,7 @@ create_build_jobs () {
 
     if [ x$REVISION_LINUX_OLD != xdisable ]; then
 
-      ./cs-job-create $flight build-$arch-oldkern build-kern \
+      job_create_build build-$arch-oldkern build-kern \
         arch=$arch kconfighow=create-config-sh \
         kimagefile=vmlinux \
         $RUNVARS $BUILD_RUNVARS $BUILD_LINUX_OLD_RUNVARS \
-- 
2.1.4
[Xen-devel] [PATCH OSSTEST v8 02/14] TestSupport: Add helper to fetch a URL on a host
Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v8: Use \Q...\E to safely quote $url and $path

v7: Quote $url and $path, switch to a heredoc to avoid resulting over
    long line

v5: Support http_proxy via $c{HttpProxy}

v3: Make sure wget is installed
---
 Osstest/Debian.pm      | 2 +-
 Osstest/TestSupport.pm | 12 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index 718a7e2..2d49ff8 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -841,7 +841,7 @@ d-i apt-setup/another boolean false
 d-i apt-setup/non-free boolean false
 d-i apt-setup/contrib boolean false
 
-d-i pkgsel/include string openssh-server, ntp, ntpdate, ethtool, chiark-utils-bin, $extra_packages
+d-i pkgsel/include string openssh-server, ntp, ntpdate, ethtool, chiark-utils-bin, wget, $extra_packages
 
 d-i grub-installer/force-efi-extra-removable boolean true
 
diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index b5994a4..1cace4f 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -55,7 +55,7 @@ BEGIN {
                       target_putfilecontents_stash
                       target_putfilecontents_root_stash
                       target_put_guest_image target_editfile
-                      target_editfile_cancel
+                      target_editfile_cancel target_fetchurl
                       target_editfile_root target_file_exists
                       target_editfile_kvp_replace
                       target_run_apt
@@ -1595,6 +1595,16 @@ END
     return $cfgpath;
 }
 
+sub target_fetchurl($$$;$) {
+    my ($ho, $url, $path, $timeo) = @_;
+    $timeo ||= 2000;
+    my $useproxy = "export http_proxy=$c{HttpProxy};" if $c{HttpProxy};
+    target_cmd_root($ho, <<END, $timeo);
+    $useproxy wget --progress=dot:mega -O \Q$path\E \Q$url\E
+END
+}
+
+
 sub target_put_guest_image ($$;$) {
     my ($ho, $gho, $default) = @_;
     my $specimage = $r{"$gho->{Guest}_image"};
-- 
2.1.4
[Xen-devel] [PATCH OSSTEST v8 00/14] add distro domU testing flight
Hi,

Since v7 I've done the switch from lvm to none as discussed, fixed (I
hope!) the quoting in the fetchurl helper and added the runvar docs to
the ts-debian-di-install script.

I also pushed the build job filtering to the head (which resulted in
some other patches being folded in to the introduction in
make-distros-flight instead of later).

I retained acks even when changing things due to either the moving of
the make-*flight filter or the moving of the runvar docs to the script,
otherwise I dropped them, I hope that is ok.

Summary of (A)cks, (M)odified and (N)ew (NM==Replaced something):

AM mfi-common: Allow make-*flight to filter the set of build jobs to include
 M TestSupport: Add helper to fetch a URL on a host
AM distros: add support for installing Debian PV guests via d-i, flight and jobs
AM distros: support booting Debian PV (d-i installed) guests with pvgrub.
 M distros: Support pvgrub for Wheezy too.
A  Test pygrub and pvgrub on the regular flights
A  distros: add branch infrastructure
N  crontab-cambridge: Use hard tabs for alignment.
 M distros: Run one suite per day on a weekly basis
A  Debian: Handle lack of bootloader support in d-i on ARM.
A  ts-debian-di-install: Refactor root_disk specification
A  make-flight: refactor PV debian tests
 M Add testing of file backed disk formats
   make-distros-flight: Use ftp.debian.org directly

Results for an adhoc xen-unstable flight are at
http://osstest.xs.citrite.net/~osstest/testlogs/logs/37711/

And for Jessie:
http://osstest.xs.citrite.net/~osstest/testlogs/logs/37717/

Ian.
[Xen-devel] [PATCH OSSTEST v8 08/14] crontab-cambridge: Use hard tabs for alignment.
Also quote the value of BRANCHES=.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v8: Split out from distros: Run one suite per day on a weekly basis
---
 crontab-cambridge | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/crontab-cambridge b/crontab-cambridge
index 60bb4fd..e0c3eff 100644
--- a/crontab-cambridge
+++ b/crontab-cambridge
@@ -1,5 +1,5 @@
 PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 MAILTO=ian.jack...@citrix.com,ian.campb...@eu.citrix.com
-# mh dom mon dow command
-4-59/30* * * * cd testing.git BRANCHES=osstest ./cr-for-branches branches -q ./cr-daily-branch --real
-3 4 * * * savelog -c28 testing.git/tmp/cr-for-branches.log /dev/null
+# mh dom mon dow command
+4-59/30* * * * cd testing.git BRANCHES='osstest'./cr-for-branches branches -q ./cr-daily-branch --real
+3 4 * * * savelog -c28 testing.git/tmp/cr-for-branches.log /dev/null
--
2.1.4
Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel
On 08/07/15 11:58, Jan Beulich wrote:
On 08.07.15 at 11:39, david.vra...@citrix.com wrote:
On 08/07/15 09:56, Jan Beulich wrote:

Rather than assuming only PV guests need special treatment (and dealing with that directly when an IRQ gets set up), keep all guest MSI IRQs masked until either the (HVM) guest unmasks them via vMSI or the (PV, PVHVM, or PVH) guest sets up an event channel for it. To not further clutter the common evtchn_bind_pirq() with x86-specific code, introduce an arch_evtchn_bind_pirq() hook instead.

Can you describe the symptoms of the bug being fixed here?

Interrupts simply didn't get unmasked for PVHVM Linux guests.

--- a/xen/include/asm-arm/irq.h
+++ b/xen/include/asm-arm/irq.h
@@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,
 void arch_move_irqs(struct vcpu *v);
+#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))

Would this be better as an inline function?

+
 /* Set IRQ type for an SPI */
 int irq_set_spi_type(unsigned int spi, unsigned int type);
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
 unsigned int arch_hwdom_irqs(domid_t);
 #endif
+#ifndef arch_evtchn_bind_pirq
+void arch_evtchn_bind_pirq(struct domain *, int pirq);

... moving this into xen/include/asm-x86/irq.h

Oh, right, (also to Julien) - this is exactly the reason I do not want it to be an inline function for ARM: I want the declaration here, not replicated in every interested arch's header.

Ok. FWIW, with this requirement I would (instead of the macros) add a weak arch_evtchn_bind_pirq() that's a no-op.

David
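For readers following the macro versus inline versus weak-symbol discussion, the weak no-op default suggested at the end of this message might look roughly like the sketch below. It is an illustration only, not code from the patch: `struct domain` is reduced to a stub, `evtchn_bind_pirq_demo()` is a hypothetical caller, and `__attribute__((weak))` assumes a GCC- or Clang-compatible compiler.

```c
/* Stub standing in for Xen's real struct domain (assumption). */
struct domain { int domain_id; };

/* Single declaration, as it would sit in a common header. */
void arch_evtchn_bind_pirq(struct domain *d, int pirq);

/*
 * Weak default: a no-op. An architecture that needs real work
 * (x86 in the patch) would supply a strong definition in its own
 * object file, and the linker would prefer that one.
 */
__attribute__((weak)) void arch_evtchn_bind_pirq(struct domain *d, int pirq)
{
    (void)d;
    (void)pirq;
}

/* Hypothetical common caller: invokes the hook with no #ifdef. */
int evtchn_bind_pirq_demo(struct domain *d, int pirq)
{
    arch_evtchn_bind_pirq(d, pirq);
    return 0;
}
```

The trade-off compared with the macro in the patch is that the call is a real function call on every architecture, in exchange for keeping one declaration in the common header.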
Re: [Xen-devel] [v3 15/15] Add a command line parameter for VT-d posted-interrupts
From: Wu, Feng Sent: Wednesday, June 24, 2015 1:18 PM Enable VT-d Posted-Interrupts and add a command line parameter for it. Signed-off-by: Feng Wu feng...@intel.com --- v3: Remove the redundant no intremp then no intpost logic docs/misc/xen-command-line.markdown | 9 - xen/drivers/passthrough/iommu.c | 4 +++- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index aa684c0..f8ec15f 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -875,6 +875,13 @@ debug hypervisor only). Control the use of interrupt remapping (DMA remapping will always be enabled if IOMMU functionality is enabled). + `intpost` + + Default: `true` + + Control the use of interrupt posting, interrupt posting is dependant on + interrupt remapping. Control the use of interrupt posting, which depends on the availability of interrupt remapping. + `qinval` (VT-d) Default: `true` diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index 597f676..e13251c 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -52,7 +52,7 @@ bool_t __read_mostly iommu_passthrough; bool_t __read_mostly iommu_snoop = 1; bool_t __read_mostly iommu_qinval = 1; bool_t __read_mostly iommu_intremap = 1; -bool_t __read_mostly iommu_intpost; +bool_t __read_mostly iommu_intpost = 1; bool_t __read_mostly iommu_hap_pt_share = 1; bool_t __read_mostly iommu_debug; bool_t __read_mostly amd_iommu_perdev_intremap = 1; @@ -97,6 +97,8 @@ static void __init parse_iommu_param(char *s) iommu_qinval = val; else if ( !strcmp(s, intremap) ) iommu_intremap = val; +else if ( !strcmp(s, intpost) ) +iommu_intpost = val; else if ( !strcmp(s, debug) ) { iommu_debug = val; -- 2.1.0 Reviewed-by: Kevin Tian kevin.t...@intel.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
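As a rough illustration of how a boolean sub-option such as `intpost` is typically consumed, the sketch below parses a comma-separated list in which a `no-` prefix switches an option off. It is a simplified stand-in, not the hypervisor's parse_iommu_param(): the `demo_` names are hypothetical and only two options are handled.

```c
#include <string.h>

/* Illustrative stand-ins for the hypervisor's flags (assumption). */
int demo_intremap = 1;
int demo_intpost = 1;   /* default on, matching the patch */

/*
 * Parse a comma-separated option string such as "no-intpost,intremap".
 * A "no-" prefix turns the named option off. The buffer is modified
 * in place, so it must be writable.
 */
void demo_parse_iommu_param(char *s)
{
    while ( s )
    {
        int val = 1;
        char *ss = strchr(s, ',');

        /* Terminate the current token at the comma, if any. */
        if ( ss )
            *ss = '\0';

        if ( !strncmp(s, "no-", 3) )
        {
            val = 0;
            s += 3;
        }

        if ( !strcmp(s, "intremap") )
            demo_intremap = val;
        else if ( !strcmp(s, "intpost") )
            demo_intpost = val;

        /* Advance to the next token, or stop after the last one. */
        s = ss ? ss + 1 : NULL;
    }
}
```

With this shape, a string like `no-intpost` would disable posting while leaving remapping at its default.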
Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes
From: Wu, Feng Sent: Wednesday, July 08, 2015 6:32 PM -Original Message- From: Tian, Kevin Sent: Wednesday, July 08, 2015 6:23 PM To: Wu, Feng; xen-devel@lists.xen.org Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z; george.dun...@eu.citrix.com Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config changes From: Wu, Feng Sent: Wednesday, June 24, 2015 1:18 PM When guest changes its interrupt configuration (such as, vector, etc.) for direct-assigned devices, we need to update the associated IRTE with the new guest vector, so external interrupts from the assigned devices can be injected to guests without VM-Exit. For lowest-priority interrupts, we use vector-hashing mechamisn to find the destination vCPU. This follows the hardware behavior, since modern Intel CPUs use vector hashing to handle the lowest-priority interrupt. For multicast/broadcast vCPU, we cannot handle it via interrupt posting, still use interrupt remapping. Signed-off-by: Feng Wu feng...@intel.com --- v3: - Use bitmap to store the all the possible destination vCPUs of an interrupt, then trying to find the right destination from the bitmap - Typo and some small changes xen/drivers/passthrough/io.c | 96 +++- 1 file changed, 95 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index 9b77334..18e24e1 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -26,6 +26,7 @@ #include asm/hvm/iommu.h #include asm/hvm/support.h #include xen/hvm/irq.h +#include asm/io_apic.h static DEFINE_PER_CPU(struct list_head, dpci_list); @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci) xfree(dpci); } +/* + * The purpose of this routine is to find the right destination vCPU for + * an interrupt which will be delivered by VT-d posted-interrupt. 
There + * are several cases as below: If you aim to have this interface common to more usages, don't restrict to VT-d posted-interrupt which should be just an example. Yes, making this a common interface should be better. + * + * - For lowest-priority interrupts, we find the destination vCPU from the + * guest vector using vector-hashing mechanism and return true. This follows + * the hardware behavior, since modern Intel CPUs use vector hashing to + * handle the lowest-priority interrupt. Does AMD use same hashing mechanism? Can this interface be reused by other IOMMU type or it's an Intel specific implementation? I am not sure how AMD handle lowest-priority. Intel hardware guys told me recent Intel hardware platform use this method to deliver lowest-priority interrupts. What do you mean by other IOMMU type? OS doesn't assume how vector hashing is done in hardware level. So it should be fine to use Intel algorithm in this emulation path. However my point is just about the comment since modern Intel CPUs use vector hashing to handle the lowest-priority interrupt. It's not because Intel does so. It's the implementation option that you choose Intel algorithm here. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes
From: Wu, Feng
Sent: Wednesday, July 08, 2015 7:52 PM

+ * - For lowest-priority interrupts, we find the destination vCPU from the
+ * guest vector using vector-hashing mechanism and return true. This follows
+ * the hardware behavior, since modern Intel CPUs use vector hashing to
+ * handle the lowest-priority interrupt.

Does AMD use the same hashing mechanism? Can this interface be reused by other IOMMU types, or is it an Intel-specific implementation?

I am not sure how AMD handles lowest-priority. Intel hardware guys told me recent Intel hardware platforms use this method to deliver lowest-priority interrupts. What do you mean by other IOMMU type?

The OS doesn't assume how vector hashing is done at the hardware level, so it should be fine to use the Intel algorithm in this emulation path. However, my point is just about the comment "since modern Intel CPUs use vector hashing to handle the lowest-priority interrupt". It's not because Intel does so. It's an implementation option that you chose the Intel algorithm here.

Here I can mention: we choose vector-hashing for lowest-priority handling and list Intel as an example that uses it, okay?

Yes. :-)

Thanks
Kevin
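To make the vector-hashing idea in this thread concrete, here is a minimal sketch. The thread only says that the destination is derived from the guest vector and a bitmap of possible destination vCPUs; the modulo used below, the 64-bit mask and the `demo_` name are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Pick the destination for a lowest-priority interrupt by hashing the
 * guest vector over the set of candidate vCPUs.
 *
 * dest_mask: bit i set means vCPU i is a possible destination
 * gvec:      guest interrupt vector
 * Returns the chosen vCPU id, or -1 if the mask is empty.
 */
int demo_vector_hash_dest(uint64_t dest_mask, uint8_t gvec)
{
    unsigned int count = 0, idx, i;

    /* Count the candidate vCPUs. */
    for ( i = 0; i < 64; i++ )
        if ( dest_mask & (UINT64_C(1) << i) )
            count++;

    if ( count == 0 )
        return -1;

    /* The vector selects the idx-th candidate (0-based). */
    idx = gvec % count;

    for ( i = 0; i < 64; i++ )
    {
        if ( !(dest_mask & (UINT64_C(1) << i)) )
            continue;
        if ( idx-- == 0 )
            return (int)i;
    }

    return -1; /* not reached when count > 0 */
}
```

The same vector always maps to the same vCPU for a given candidate set, which is the property that makes the choice deterministic without tracking per-vCPU priority.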
Re: [Xen-devel] [PATCH v25 06/15] x86/VPMU: Initialize PMU for PV(H) guests
Am Freitag 19 Juni 2015, 14:44:37 schrieb Boris Ostrovsky: Code for initializing/tearing down PMU for PV guests Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Acked-by: Daniel De Graaf dgde...@tycho.nsa.gov Acked-by: Jan Beulich jbeul...@suse.com Acked-by: Kevin Tian kevin.t...@intel.com Reviewed-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com --- tools/flask/policy/policy/modules/xen/xen.te | 4 + xen/arch/x86/domain.c| 2 + xen/arch/x86/hvm/hvm.c | 1 + xen/arch/x86/hvm/svm/svm.c | 4 +- xen/arch/x86/hvm/svm/vpmu.c | 16 +++- xen/arch/x86/hvm/vmx/vmx.c | 4 +- xen/arch/x86/hvm/vmx/vpmu_core2.c| 30 -- xen/arch/x86/hvm/vpmu.c | 131 --- xen/common/event_channel.c | 1 + xen/include/asm-x86/hvm/vpmu.h | 2 + xen/include/public/pmu.h | 2 + xen/include/public/xen.h | 1 + xen/include/xsm/dummy.h | 3 + xen/xsm/flask/hooks.c| 4 + xen/xsm/flask/policy/access_vectors | 2 + 15 files changed, 181 insertions(+), 26 deletions(-) diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te index 45b5cb2..f553eb5 100644 --- a/tools/flask/policy/policy/modules/xen/xen.te +++ b/tools/flask/policy/policy/modules/xen/xen.te @@ -130,6 +130,10 @@ if (guest_writeconsole) { dontaudit domain_type xen_t : xen writeconsole; } +# Allow all domains to use PMU (but not to change its settings --- that's what +# pmu_ctrl is for) +allow domain_type xen_t:xen2 pmu_use; + ### # # Domain creation diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index dc18565..b699f68 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -438,6 +438,8 @@ int vcpu_initialise(struct vcpu *v) vmce_init_vcpu(v); } +spin_lock_init(v-arch.vpmu.vpmu_lock); + if ( has_hvm_container_domain(d) ) { rc = hvm_vcpu_initialise(v); diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index d5e5242..83a81f5 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4931,6 +4931,7 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = 
{ HYPERCALL(hvm_op), HYPERCALL(sysctl), HYPERCALL(domctl), +HYPERCALL(xenpmu_op), [ __HYPERVISOR_arch_1 ] = (hvm_hypercall_t *)paging_domctl_continuation }; diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index a02f983..680eebe 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1165,7 +1165,9 @@ static int svm_vcpu_initialise(struct vcpu *v) return rc; } -vpmu_initialise(v); +/* PVH's VPMU is initialized via hypercall */ +if ( is_hvm_vcpu(v) ) +vpmu_initialise(v); svm_guest_osvw_init(v); diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index b60ca40..a8572a6 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -364,13 +364,11 @@ static void amd_vpmu_destroy(struct vcpu *v) amd_vpmu_unset_msr_bitmap(v); xfree(vpmu-context); -vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); if ( vpmu_is_set(vpmu, VPMU_RUNNING) ) -{ -vpmu_reset(vpmu, VPMU_RUNNING); release_pmu_ownship(PMU_OWNER_HVM); -} + +vpmu_clear(vpmu); } /* VPMU part of the 'q' keyhandler */ @@ -482,6 +480,16 @@ int __init amd_vpmu_init(void) return -EINVAL; } +if ( sizeof(struct xen_pmu_data) + + 2 * sizeof(uint64_t) * num_counters PAGE_SIZE ) +{ +printk(XENLOG_WARNING + VPMU: Register bank does not fit into VPMU shared page\n); +counters = ctrls = NULL; +num_counters = 0; +return -ENOSPC; +} + return 0; } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 0837627..50e11dd 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -140,7 +140,9 @@ static int vmx_vcpu_initialise(struct vcpu *v) } } -vpmu_initialise(v); +/* PVH's VPMU is initialized via hypercall */ +if ( is_hvm_vcpu(v) ) +vpmu_initialise(v); vmx_install_vlapic_mapping(v); diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 025c970..e7642e5 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -365,13 +365,16 @@ static int 
core2_vpmu_alloc_resource(struct vcpu *v) if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) return 0; -
Re: [Xen-devel] [PATCH V4 3/3] xen/vm_event: Deny register writes if refused by vm_event reply
On Wed, Jul 8, 2015 at 6:22 AM, Razvan Cojocaru rcojoc...@bitdefender.com wrote:

Deny register writes if a vm_client subscribed to mov_to_msr or control register write events forbids them. Currently supported for MSR, CR0, CR3 and CR4 events.

Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
---
Changes since V3:
- Renamed MEM_ACCESS_FLAG_DENY to VM_EVENT_FLAG_DENY (and fixed the bit shift appropriately).
- Moved the DENY vm_event response logic from p2m.c to newly added dedicated files for vm_event handling, as suggested by Tamas Lengyel.

This looks good to me. It will have to be rebased on staging once the other series is merged, as a couple of things will conflict. If this series lands first, however, the newly added asm/vm_event files lack the required license header. With that:

Acked-by: Tamas K Lengyel tleng...@novetta.com
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
From: Wu, Feng Sent: Wednesday, June 24, 2015 1:18 PM This patch includes the following aspects: - Add a global vector to wake up the blocked vCPU when an interrupt is being posted to it (This part was sugguested by Yang Zhang yang.z.zh...@intel.com). - Adds a new per-vCPU tasklet to wakeup the blocked vCPU. It can be used in the case vcpu_unblock cannot be called directly. - Define two per-cpu variables: * pi_blocked_vcpu: A list storing the vCPUs which were blocked on this pCPU. * pi_blocked_vcpu_lock: The spinlock to protect pi_blocked_vcpu. Signed-off-by: Feng Wu feng...@intel.com --- v3: - This patch is generated by merging the following three patches in v2: [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU [RFC v2 10/15] vmx: Define two per-cpu variables [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet' - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct' - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler' - Make pi_wakeup_interrupt() static - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list' - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct' - Rename 'blocked_vcpu' to 'pi_blocked_vcpu' - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock' xen/arch/x86/hvm/vmx/vmcs.c| 3 +++ xen/arch/x86/hvm/vmx/vmx.c | 54 ++ xen/include/asm-x86/hvm/hvm.h | 1 + xen/include/asm-x86/hvm/vmx/vmcs.h | 5 xen/include/asm-x86/hvm/vmx/vmx.h | 5 5 files changed, 68 insertions(+) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 11dc1b5..0c5ce3f 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -631,6 +631,9 @@ int vmx_cpu_up(void) if ( cpu_has_vmx_vpid ) vpid_sync_all(); +INIT_LIST_HEAD(per_cpu(pi_blocked_vcpu, cpu)); +spin_lock_init(per_cpu(pi_blocked_vcpu_lock, cpu)); + return 0; } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 
b94ef6a..7db6009 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content); static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content); static void vmx_invlpg_intercept(unsigned long vaddr); +/* + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we + * can find which vCPU should be waken up. + */ +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu); +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock); + uint8_t __read_mostly posted_intr_vector; +uint8_t __read_mostly pi_wakeup_vector; + +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg) +{ +vcpu_unblock((struct vcpu *)arg); +} static int vmx_domain_initialise(struct domain *d) { @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v) if ( v-vcpu_id == 0 ) v-arch.user_regs.eax = 1; +tasklet_init( +v-arch.hvm_vmx.pi_vcpu_wakeup_tasklet, +pi_vcpu_wakeup_tasklet_handler, +(unsigned long)v); + +INIT_LIST_HEAD(v-arch.hvm_vmx.pi_blocked_vcpu_list); + return 0; } static void vmx_vcpu_destroy(struct vcpu *v) { +tasklet_kill(v-arch.hvm_vmx.pi_vcpu_wakeup_tasklet); /* * There are cases that domain still remains in log-dirty mode when it is * about to be destroyed (ex, user types 'xl destroy dom'), in which case @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata vmx_function_table = { .enable_msr_exit_interception = vmx_enable_msr_exit_interception, }; +/* + * Handle VT-d posted-interrupt when VCPU is blocked. + */ +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) +{ +struct arch_vmx_struct *vmx; +unsigned int cpu = smp_processor_id(); + +spin_lock(per_cpu(pi_blocked_vcpu_lock, cpu)); + +/* + * FIXME: The length of the list depends on how many + * vCPU is current blocked on this specific pCPU. + * This may hurt the interrupt latency if the list + * grows to too many entries. 
+ */

Let's go with this linked list first until a real issue is identified.

+    list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
+                        pi_blocked_vcpu_list)
+        if ( vmx->pi_desc.on )
+            tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

Not sure where the vcpu is removed from the list (possibly in a later patch). But at least removing the vcpu from the list at this point should be safe and the right way to go. IIRC Andrew and other guys raised a similar concern earlier. :-)

Thanks
Kevin
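The suggestion to unlink a vCPU at the point where it is woken can be sketched as below. This is a self-contained illustration using a hand-rolled circular list, not Xen's list.h or the patch's data structures: `demo_vcpu`, its `on` and `woken` fields and the helper names are all hypothetical, and the caller is assumed to hold the per-pCPU lock.

```c
#include <stddef.h>

/* Minimal stand-in for a blocked vCPU (names are illustrative). */
struct demo_vcpu {
    int on;      /* models pi_desc.on: a posted interrupt is pending */
    int woken;   /* set where the real code would schedule a wakeup */
    struct demo_vcpu *prev, *next;
};

/* Circular list with a sentinel head; one per pCPU in the real code. */
struct demo_list { struct demo_vcpu head; };

void demo_list_init(struct demo_list *l)
{
    l->head.prev = l->head.next = &l->head;
}

/* Append v at the tail of the list. */
void demo_list_add(struct demo_list *l, struct demo_vcpu *v)
{
    v->next = &l->head;
    v->prev = l->head.prev;
    l->head.prev->next = v;
    l->head.prev = v;
}

/*
 * Wake every vCPU with a pending posted interrupt and unlink it in the
 * same pass. The next pointer is saved first so that unlinking the
 * current entry does not break the walk. Returns the number woken.
 */
int demo_pi_wakeup(struct demo_list *l)
{
    struct demo_vcpu *v, *next;
    int n = 0;

    for ( v = l->head.next; v != &l->head; v = next )
    {
        next = v->next;
        if ( !v->on )
            continue;

        /* Unlink, then leave the entry pointing at itself. */
        v->prev->next = v->next;
        v->next->prev = v->prev;
        v->prev = v->next = v;

        v->woken = 1;
        n++;
    }

    return n;
}
```

Removing entries during the walk keeps the list limited to vCPUs that are still blocked, which also bounds the latency concern raised in the FIXME.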
Re: [Xen-devel] [PATCH v5 2/3] arm: Allow the user to specify the GIC version
On Wed, 2015-07-08 at 11:17 +0100, Ian Campbell wrote: On Tue, 2015-07-07 at 17:22 +0100, Julien Grall wrote: diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index e1632fa..11f6461 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -369,6 +369,12 @@ libxl_vnode_info = Struct(vnode_info, [ (vcpus, libxl_bitmap), # vcpus in this node ]) +libxl_gic_version = Enumeration(gic_version, [ +(0, DEFAULT), +(0x20, v2), +(0x30, v3) +], init_val = LIBXL_GIC_VERSION_DEFAULT) + libxl_domain_build_info = Struct(domain_build_info,[ (max_vcpus, integer), (avail_vcpus, libxl_bitmap), @@ -480,6 +486,11 @@ libxl_domain_build_info = Struct(domain_build_info,[ ])), (invalid, None), ], keyvar_init_val = LIBXL_DOMAIN_TYPE_INVALID)), + + +(arch_arm, Struct(None, [(gic_version, libxl_gic_version), + ])), + ], dir=DIR_IN This results in the following when building the ocaml bindings: Traceback (most recent call last): File genwrap.py, line 529, in module ml.write(gen_ocaml_ml(ty, False)) File genwrap.py, line 217, in gen_ocaml_ml s += gen_struct(ty) File genwrap.py, line 119, in gen_struct x = ocaml_instance_of_field(f) File genwrap.py, line 112, in ocaml_instance_of_field return %s : %s % (munge_name(name), ocaml_type_of(f.type)) File genwrap.py, line 90, in ocaml_type_of return ty.rawname.capitalize() + .t AttributeError: 'NoneType' object has no attribute 'capitalize' make[7]: *** No rule to make target '_libxl_types.ml.in', needed by 'xenlight.ml'. Stop. I'll take a look. I have a patch to genwrap.py which results in the following diff to the generate ml files for the anonymous sub-struct added by the IDL change above. Dave/Euan/Rob, is that idiomatic ocaml or is it possible to have anonymous structs in ocaml like it is in C? If there is a better/more usual way to do this would you mind supplying me with the ocaml I should be aiming for please? Ian. 
--- tools/ocaml/libs/xl/_libxl_BACKUP_types.ml.in 2015-07-08 11:22:35.0 +0100 +++ tools/ocaml/libs/xl/_libxl_types.ml.in 2015-07-08 12:25:56.0 +0100 @@ -508,6 +508,17 @@ module Vnode_info = struct external default : ctx - unit - t = stub_libxl_vnode_info_init end +(* libxl_gic_version implementation *) +type gic_version = +| GIC_VERSION_DEFAULT +| GIC_VERSION_V2 +| GIC_VERSION_V3 + +let string_of_gic_version = function + | GIC_VERSION_DEFAULT - DEFAULT + | GIC_VERSION_V2 - V2 + | GIC_VERSION_V3 - V3 + (* libxl_domain_build_info implementation *) module Domain_build_info = struct @@ -566,6 +577,10 @@ module Domain_build_info = struct type type__union = Hvm of type_hvm | Pv of type_pv | Invalid + type arch_arm__anon = { + gic_version : gic_version; + } + type t = { max_vcpus : int; @@ -607,6 +622,7 @@ module Domain_build_info = struct ramdisk : string option; device_tree : string option; xl_type : type__union; + arch_arm : arch_arm__anon; } external default : ctx - ?xl_type:domain_type - unit - t = stub_libxl_domain_build_info_init end ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
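For comparison with the OCaml record above, the C shape that the question refers to (an untagged sub-struct embedded in a larger struct) might look like the sketch below. The enum values come from the IDL in the patch; the `demo_` type and function names are hypothetical and are not the identifiers libxl generates.

```c
/* Values taken from the IDL: DEFAULT = 0, v2 = 0x20, v3 = 0x30. */
typedef enum demo_gic_version {
    DEMO_GIC_VERSION_DEFAULT = 0,
    DEMO_GIC_VERSION_V2 = 0x20,
    DEMO_GIC_VERSION_V3 = 0x30
} demo_gic_version;

/* Cut-down build info with an untagged sub-struct as a named field. */
typedef struct demo_domain_build_info {
    int max_vcpus;
    struct {
        demo_gic_version gic_version;
    } arch_arm;
} demo_domain_build_info;

/* Initialise to defaults; DEFAULT is 0, so zeroing gives the same result. */
void demo_build_info_init(demo_domain_build_info *info)
{
    info->max_vcpus = 0;
    info->arch_arm.gic_version = DEMO_GIC_VERSION_DEFAULT;
}
```

In C the inner type needs no name of its own, which is why the generated OCaml has to invent one such as `arch_arm__anon`.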
Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes
From: Wu, Feng
Sent: Wednesday, July 08, 2015 7:05 PM

-----Original Message-----
From: Wu, Feng
Sent: Wednesday, July 08, 2015 6:32 PM
To: Tian, Kevin; xen-devel@lists.xen.org
Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z; george.dun...@eu.citrix.com; Wu, Feng
Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config changes

-----Original Message-----
From: Tian, Kevin
Sent: Wednesday, July 08, 2015 6:23 PM
To: Wu, Feng; xen-devel@lists.xen.org
Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z; george.dun...@eu.citrix.com
Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config changes

From: Wu, Feng
Sent: Wednesday, June 24, 2015 1:18 PM

When guest changes its interrupt configuration (such as, vector, etc.) for direct-assigned devices, we need to update the associated IRTE with the new guest vector, so external interrupts from the assigned devices can be injected to guests without VM-Exit. For lowest-priority interrupts, we use vector-hashing mechanism to find the destination vCPU. This follows the hardware behavior, since modern Intel CPUs use vector hashing to handle the lowest-priority interrupt. For multicast/broadcast vCPU, we cannot handle it via interrupt posting, still use interrupt remapping.
Signed-off-by: Feng Wu feng...@intel.com --- v3: - Use bitmap to store the all the possible destination vCPUs of an interrupt, then trying to find the right destination from the bitmap - Typo and some small changes xen/drivers/passthrough/io.c | 96 +++- 1 file changed, 95 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index 9b77334..18e24e1 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -26,6 +26,7 @@ #include asm/hvm/iommu.h #include asm/hvm/support.h #include xen/hvm/irq.h +#include asm/io_apic.h static DEFINE_PER_CPU(struct list_head, dpci_list); @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci) xfree(dpci); } +/* + * The purpose of this routine is to find the right destination vCPU for + * an interrupt which will be delivered by VT-d posted-interrupt. There + * are several cases as below: If you aim to have this interface common to more usages, don't restrict to VT-d posted-interrupt which should be just an example. Yes, making this a common interface should be better. Thinking about this a little more, this function itself is kind of restricted to VT-d posted-interrupt, since it doesn't handle multicast/broadcast interrupts, it only handle lowest-priority and single destination interrupts. However, I can make the vector-hashing logic as a separate function, which can be used elsewhere. iommu_intpost is a general option, not VT-d specific. It's fine to keep this function here. My earlier comment is more about the accuracy of the code comment above. :-) Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 1/3] xen/mem_access: Support for memory-content hiding
On Wed, Jul 8, 2015 at 6:22 AM, Razvan Cojocaru rcojoc...@bitdefender.com wrote:

This patch adds support for memory-content hiding, by modifying the value returned by emulated instructions that read certain memory addresses that contain sensitive data. The patch only applies to cases where MEM_ACCESS_EMULATE or MEM_ACCESS_EMULATE_NOWRITE have been set to a vm_event response.

Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
---
Changes since V3:
- Renamed MEM_ACCESS_SET_EMUL_READ_DATA to VM_EVENT_FLAG_SET_EMUL_READ_DATA and updated its comment.
- Removed xfree(v->arch.vm_event.emul_read_data) from free_vcpu_struct().
- Returning X86EMUL_UNHANDLEABLE from hvmemul_cmpxchg() when !curr->arch.vm_event.emul_read_data.
- Replaced xmalloc_bytes() with xmalloc_array() in hvmemul_rep_outs_set_context().
- Setting the rest of the buffer to zero in hvmemul_rep_movs() (no longer leaking heap contents).
- No longer memset()ing the whole buffer before copy (just zeroing out the rest).
- Moved hvmemul_ctxt->set_context = 0 to hvm_emulate_prepare() and removed hvm_emulate_one_set_context().
---
 tools/tests/xen-access/xen-access.c |   2 +-
 xen/arch/x86/hvm/emulate.c          | 138 ++-
 xen/arch/x86/hvm/event.c            |  50 ++---
 xen/arch/x86/mm/p2m.c               |  92 +--
 xen/common/domain.c                 |   2 +
 xen/common/vm_event.c               |  23 ++
 xen/include/asm-x86/domain.h        |   2 +
 xen/include/asm-x86/hvm/emulate.h   |  10 ++-
 xen/include/public/vm_event.h       |  31 ++--
 9 files changed, 274 insertions(+), 76 deletions(-)

Acked-by: Tamas K Lengyel tleng...@novetta.com
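The changelog items about zeroing only the rest of the buffer describe a common pattern: copy as much replacement data as is available, then clear the remainder so no stale contents are returned. A minimal sketch of that pattern follows; the function name and parameters are hypothetical and this is not the code in emulate.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Fill an emulated read of `bytes` bytes from replacement data of
 * `data_size` bytes. If the replacement is shorter than the read, the
 * tail is zeroed so that no stale buffer contents reach the guest.
 */
void demo_set_context_data(void *buffer, size_t bytes,
                           const void *data, size_t data_size)
{
    /* Never copy more than either side can hold. */
    size_t safe = data_size < bytes ? data_size : bytes;

    if ( safe )
        memcpy(buffer, data, safe);

    /* Zero only the part that the copy did not cover. */
    if ( bytes > safe )
        memset((char *)buffer + safe, 0, bytes - safe);
}
```

Zeroing just the tail avoids writing the whole buffer twice, which matches the intent of the "just zeroing out the rest" item.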
[Xen-devel] [PATCH OSSTEST v8 12/14] make-flight: refactor PV debian tests
No functional change, standalone-generate-dump-flight-runvars confirms no change to the runvars. Includes a hook which is not used yet, $recipe_sfx. Signed-off-by: Ian Campbell ian.campb...@citrix.com Acked-by: Ian Jackson ian.jack...@eu.citrix.com --- v4: new patch --- make-flight | 24 ++-- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/make-flight b/make-flight index 725da26..2a132df 100755 --- a/make-flight +++ b/make-flight @@ -361,6 +361,17 @@ do_pvgrub_tests () { all_hostflags=$most_hostflags } +do_pv_debian_test_one () { + testname=$1; shift + recipe_sfx=$1; shift + toolstack=$1; shift + + job_create_test test-$xenarch$kern-$dom0arch-$testname\ + test-debian$recipe_sfx $toolstack \ +$xenarch $dom0arch \ +$debian_runvars all_hostflags=$most_hostflags $@ +} + do_pv_debian_tests () { xsms=$(xenbranch_xsm_variants) @@ -376,20 +387,13 @@ do_pv_debian_tests () { suffix=${platform:+-$platform} hostflags=${most_hostflags}${platform:+,platform-$platform} - job_create_test test-$xenarch$kern-$dom0arch-xl$suffix \ - test-debian xl \ - $xenarch $dom0arch \ - enable_xsm=$xsm \ - $debian_runvars all_hostflags=$hostflags + do_pv_debian_test_one xl$suffix '' xl enable_xsm=$xsm + done done for xsm in $xsms ; do -job_create_test test-$xenarch$kern-$dom0arch-libvirt \ -test-debian libvirt \ -$xenarch $dom0arch \ -enable_xsm=$xsm \ -$debian_runvars all_hostflags=$most_hostflags +do_pv_debian_test_one libvirt '' libvirt enable_xsm=$xsm done } -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH OSSTEST v8 13/14] Add testing of file backed disk formats
xen-create-image makes this tricky to do since it is rather LVM centric. Now that we have the ability to install from d-i it's possible to arrange fairly easily that they use something other than a phy backend over a bare LVM device.

Here we add support to the test script and infra and create a bunch of new jobs testing the cross product of {xl,libvirt} x {raw,qcow2,vhd}.

A disk format of "raw" means a raw backing file, whereas "none" (the default) means to continue to use the base LVM device.

The test scripts are modified such that when constructing a domain whose diskfmt runvar specifies a file backed disk format (i.e. not none):

- the LVM device is slightly enlarged to account for file format headers (1M should be plenty).
- the LVM device will have an ext3 filesystem created on it instead of being used as a phy device for the guest. Reusing the LVM volume in this way means we don't need to do more storage management in dom0 (i.e. arranging for / to be large enough, or managing a special images LV)
- the relevant type of container is created within the filesystem using the appropriate tool.
- New properties Disk{fmt,spec} are added to all $gho, containing the format used for the root disk and the xl diskspec to load it.
- lvm backed guests use a xend/xm compatible spec, everything else uses the improved xl syntax which libvirt also supports. We won't test non-LVM on xend.
- New properties Disk{mnt,img} are added to $gho which are not using LVM. These contain the mount point to use (configurable via OSSTEST_CONFIG and runvars) and the full path (including mount point) to the image itself.
- When starting or stopping a guest we arrange for the filesystem to be (u)mounted.
- The preparation when starting a guest copes gracefully with the disk already being prepared.
- Hooks are called from guest_create() and guest_destroy() to manipulate the disk as needed.
Using standalone-generate-dump-flight-runvars a representative set of runvars is:

+test-amd64-amd64-xl-qcow2 all_hostflags     arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test
+test-amd64-amd64-xl-qcow2 arch              amd64
+test-amd64-amd64-xl-qcow2 buildjob          build-amd64
+test-amd64-amd64-xl-qcow2 debian_arch       amd64
+test-amd64-amd64-xl-qcow2 debian_bootloader pygrub
+test-amd64-amd64-xl-qcow2 debian_diskfmt    qcow2
+test-amd64-amd64-xl-qcow2 debian_kernkind   pvops
+test-amd64-amd64-xl-qcow2 debian_method     netboot
+test-amd64-amd64-xl-qcow2 debian_suite      wheezy
+test-amd64-amd64-xl-qcow2 kernbuildjob      build-amd64-pvops
+test-amd64-amd64-xl-qcow2 kernkind          pvops
+test-amd64-amd64-xl-qcow2 toolstack         xl
+test-amd64-amd64-xl-qcow2 xenbuildjob       build-amd64

Compared to test-amd64-amd64-pygrub (which is the most similar job) and normalising the test name the difference is:

 test-amd64-amd64-SUFFIX all_hostflags     arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test
 test-amd64-amd64-SUFFIX arch              amd64
 test-amd64-amd64-SUFFIX buildjob          build-amd64
 test-amd64-amd64-SUFFIX debian_arch       amd64
 test-amd64-amd64-SUFFIX debian_bootloader pygrub
+test-amd64-amd64-SUFFIX debian_diskfmt    qcow2
+test-amd64-amd64-SUFFIX debian_kernkind   pvops
 test-amd64-amd64-SUFFIX debian_method     netboot
 test-amd64-amd64-SUFFIX debian_suite      wheezy
 test-amd64-amd64-SUFFIX kernbuildjob      build-amd64-pvops
 test-amd64-amd64-SUFFIX kernkind          pvops
 test-amd64-amd64-SUFFIX toolstack         xl
 test-amd64-amd64-SUFFIX xenbuildjob       build-amd64

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v8: Default diskfmt is none (was lvm), i.e. use the LVM device directly. Reword the commit log to reflect this.
v7: Use the right arch for tests, not always amd64 (doesn't work well on arm!) Defer guest_find_diskimg until _vg runvar and thence Lvdev are setup: selectguest calls guest_find_lv then guest_find_diskimg, using preexisting runvars. But prepare_guest calls selectguest before setting disk_lv, so Lvdev ends up undefined, after setting
[Xen-devel] [PATCH OSSTEST v8 14/14] make-distros-flight: Use ftp.debian.org directly
The local proxy seems to serve stale packages for Jessie etc. I blame the intercepting cache on the way out of our network, similar to b5f15136900d "mg-debian-installer-update: workaround caching proxies", except it is between the apt-cache and the world, not the osstest vm and the world.

Since the netboot kernel+initrd are reasonably small, these flights are infrequent and they are intended to test the current upstream version, I think this is tolerable.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
 make-distros-flight | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/make-distros-flight b/make-distros-flight
index 49f4b60..d407fcb 100755
--- a/make-distros-flight
+++ b/make-distros-flight
@@ -79,7 +79,9 @@ test_do_one_netboot () {
 gsuite=sid
 gver=daily
 else
-local mirror=http://`getconfig DebianMirrorHost`/`getconfig DebianMirrorSubpath`
+#local mirror=http://`getconfig DebianMirrorHost`/`getconfig DebianMirrorSubpath`
+# XXX local mirror seems to serve up stale files.
+local mirror=http://ftp.debian.org/debian;
 diurl=$mirror/dists/$gsuite/main/installer-$domU/current/images/netboot
 gver=$gsuite
 fi
--
2.1.4
___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH OSSTEST v8 05/14] distros: Support pvgrub for Wheezy too.
This requires us to install pv-grub-menu from backports, which we do using a late_command. Signed-off-by: Ian Campbell ian.campb...@citrix.com --- v8: - Use a heredoc for sources.list additions, since this was suggested by Ian I would have retained the ack apart from the second change. - Dropped unused $cd argument to setup_netinst. (Noticed when I came to document gident_cd and found it wasn't used!) v7: - Remove vestigial attempts to enable -backports via d-i preseeding. v3: - Remove spurious () from (END) (and the prexisting too) - Remove $xopts{EnableBackports} and automatically handle the need to add backports in preseed_base. - Install via late_command not apt-setup, since the former has issues, hence subject drops attempt to... --- Osstest/Debian.pm| 40 --- make-distros-flight | 58 +-- ts-debian-di-install | 76 +++- 3 files changed, 168 insertions(+), 6 deletions(-) diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm index 1edf49f..7c94b6c 100644 --- a/Osstest/Debian.pm +++ b/Osstest/Debian.pm @@ -777,8 +777,6 @@ sub preseed_base (;@) { preseed_hook_overlay($ho, $sfx, 'overlay', 'overlay.tar'); my $preseed = END; -d-i mirror/suite string $suite - d-i debian-installer/locale string en_GB d-i console-keymaps-at/keymap select gb d-i keyboard-configuration/xkb-keymap string en_GB @@ -854,6 +852,11 @@ END d-i clock-setup/ntp-server string $ntpserver END +# For CDROM the suite is part of the image +$preseed .= END unless $xopts{CDROM}; +d-i mirror/suite string $suite +END + $preseed .= END; ### END OF DEBIAN PRESEED BASE @@ -867,7 +870,38 @@ sub preseed_create_guest ($$;@) { my $suite= $xopts{Suite} || $c{DebianSuite}; -my $extra_packages = pv-grub-menu if $xopts{PvMenuLst}; +my $extra_packages = ; +if ($xopts{PvMenuLst}) { +if ($suite =~ m/wheezy/) { +# pv-grub-menu/wheezy-backports + using apt-setup to add +# backports results in iproute, ifupdown and +# isc-dhcp-client getting removed because tasksel's +# invocation of apt-get install somehow decides the +# 
iproute2 from wheezy-backports is a thing it wants to +# install. So instead lets fake it with a late command... +# +# This also has the bonus of working round an issue with +# 1.2.1~bpo70+1 which created an invalid menu.lst using +# root(/dev/xvda,0) which pvgrub cannot parse because +# the Grub device.map isn't present at pkgsel/include time +# but it is by late_command time. This was fixed by +# version 1.3 which is in Jessie onwards. +preseed_hook_command($ho, 'late_command', $sfx, END); +#!/bin/sh +set -ex + +cat EOF /target/etc/apt/sources.list + +\# $suite backports +deb http://$c{DebianMirrorHost}/$c{DebianMirrorSubpath} $suite-backports main +EOF +in-target apt-get update +in-target apt-get install -y -t wheezy-backports pv-grub-menu +END +} else { +$extra_packages = pv-grub-menu; +} +} my $preseed_file= preseed_base($ho, $suite, $sfx, $extra_packages, %xopts); $preseed_file.= (END); diff --git a/make-distros-flight b/make-distros-flight index c19e3ba..49f4b60 100755 --- a/make-distros-flight +++ b/make-distros-flight @@ -106,9 +106,9 @@ test_do_one_netboot () { arm*_arm*_*) bootloader=pygrub;; # no pvgrub for arm # Needs a menu.lst, not present in Squeeze+ due to switch to grub2, -# workedaround in Jessie+ with pv-grub-menu package. +# workedaround in Wheezy+ with pv-grub-menu package (backports in Wheezy, +# in Jessie+ main). *_squeeze) bootloader=pygrub;; -*_wheezy) bootloader=pygrub;; # pv-grub-x86_64.gz is not built by 32-bit dom0 userspace build. 
i386_amd64_*) bootloader=pygrub;; @@ -127,6 +127,48 @@ test_do_one_netboot () { all_hostflags=$most_hostflags } +test_do_one_netinst () { + local path_arch + case $domU in +amd64|i386) path_arch=multi-arch; file_arch=amd64-i386;; +*) path_arch=$domU; file_arch=$domU;; + esac + case $domU in +amd64) iso_path=/install.amd/xen;; +i386) iso_path=/install.386/xen;; +*) iso_path=/install.$domU;; + esac + + local cdurl= + case $cd in +current) + cdurl=http://cdimage.debian.org/debian-cd/current/${path_arch}/jigdo-cd; + ;; +weekly) + cdurl=http://cdimage.debian.org/cdimage/weekly-builds/${path_arch}/jigdo-cd; + ;; +*) + echo cd $cd? + exit 1 + ;; + esac + + # Always pygrub since no pv-grub-menu on CD + job_create_test \ + test-$xenarch$kern-$dom0arch-$domU-$cd-netinst-pygrub\ +test-debian-di xl $xenarch $dom0arch\ +
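The architecture-to-path mapping in test_do_one_netinst above can be read in isolation as the following sketch. The two function names and the way results are returned on stdout are assumptions for illustration; the case arms and the cdimage.debian.org URLs are copied from the hunk.

```shell
# Hypothetical helpers (not part of osstest) mirroring the case statements
# in test_do_one_netinst.

# Print "path_arch file_arch iso_path" for a guest architecture.
netinst_paths () {
    local domU=$1 path_arch file_arch iso_path
    case "$domU" in
    amd64|i386) path_arch=multi-arch; file_arch=amd64-i386 ;;
    *)          path_arch=$domU;      file_arch=$domU ;;
    esac
    case "$domU" in
    amd64) iso_path=/install.amd/xen ;;
    i386)  iso_path=/install.386/xen ;;
    *)     iso_path=/install.$domU ;;
    esac
    echo "$path_arch $file_arch $iso_path"
}

# Print the jigdo URL for a CD flavour ("current" or "weekly").
cd_url () {
    local cd=$1 path_arch=$2
    case "$cd" in
    current) echo "http://cdimage.debian.org/debian-cd/current/${path_arch}/jigdo-cd" ;;
    weekly)  echo "http://cdimage.debian.org/cdimage/weekly-builds/${path_arch}/jigdo-cd" ;;
    *)       echo "cd $cd?" >&2; return 1 ;;
    esac
}
```

Both x86 guest architectures share the multi-arch image and differ only in the installer directory on the ISO; any other architecture uses its own name throughout.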
[Xen-devel] [PATCH OSSTEST v8 09/14] distros: Run one suite per day on a weekly basis
Once a week should be sufficient for these tests. Perhaps in the future we will want to increase the frequency for the suites under active development (testing, unstable).

For now run this on the Citrix Cambridge instance until the XenProject instance has sufficient capacity.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v8: Switch to hard tabs split to previous patch.

v7: Replaces "distros: Run a flight over the weekend".
    Now run in Cambridge
    Run separate flight per-suite
    Dropped Ack
---
 crontab-cambridge | 5 +
 1 file changed, 5 insertions(+)

diff --git a/crontab-cambridge b/crontab-cambridge
index e0c3eff..7d3ed57 100644
--- a/crontab-cambridge
+++ b/crontab-cambridge
@@ -2,4 +2,9 @@ PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 MAILTO=ian.jack...@citrix.com,ian.campb...@eu.citrix.com
 # mh dom mon dow command
 4-59/30 * * * * cd testing.git BRANCHES='osstest' ./cr-for-branches branches -q ./cr-daily-branch --real
+46 7 * * 6 cd testing.git BRANCHES='distros-debian-snapshot' ./cr-for-branches branches -w ./cr-daily-branch --real
+46 7 * * 5 cd testing.git BRANCHES='distros-debian-sid' ./cr-for-branches branches -w ./cr-daily-branch --real
+46 7 * * 4 cd testing.git BRANCHES='distros-debian-jessie' ./cr-for-branches branches -w ./cr-daily-branch --real
+46 7 * * 3 cd testing.git BRANCHES='distros-debian-wheezy' ./cr-for-branches branches -w ./cr-daily-branch --real
+46 7 * * 2 cd testing.git BRANCHES='distros-debian-squeeze' ./cr-for-branches branches -w ./cr-daily-branch --real
 3 4 * * * savelog -c28 testing.git/tmp/cr-for-branches.log /dev/null
--
2.1.4
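The schedule added by this crontab hunk is one suite per weekday, Tuesday to Saturday, all at 07:46. As a rough sketch, the day-of-week field maps to branches as follows; the function name branch_for_dow is invented for the example and nothing like it exists in osstest, where the mapping lives only in the crontab.

```shell
# Hypothetical helper: given a cron day-of-week number (0 = Sunday),
# print the distros branch scheduled on that day, or fail if none is.
branch_for_dow () {
    case "$1" in
    2) echo distros-debian-squeeze ;;
    3) echo distros-debian-wheezy ;;
    4) echo distros-debian-jessie ;;
    5) echo distros-debian-sid ;;
    6) echo distros-debian-snapshot ;;
    *) return 1 ;;   # Sunday and Monday: no distros flight
    esac
}
```

Since `date +%w` uses the same numbering as the crontab field, `branch_for_dow "$(date +%w)"` answers "which suite runs today".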
[Xen-devel] [PATCH OSSTEST v8 04/14] distros: support booting Debian PV (d-i installed) guests with pvgrub.
This requires the use of the pv-grub-menu package which is in Jessie onwards. (it is in wheezy-backports which is the subject of a subsequent patch). The bootloader to use is specified via a runvar {Guest}_bootloader. Adjust make-distros-flight to use pvgrub for some subset of i386 and amd64 guests to get coverage. Signed-off-by: Ian Campbell ian.campb...@citrix.com Acked-by: Ian Jackson ian.jack...@eu.citrix.com --- v8: Added comment regarding new runvar. Since this was inspired by Ian's comment on distros: support PV guest install from Debian netinst media I have retained the ack. v7: Move definition of $extra_packages variable to here which is its first usage. Use {Guest}_suite not {Guest}_dist as runvar to choose version. v3: Define and use arch_debian2xen and arch_xen2debian Avoid pv-grub-x86_64.gz on i386 dom0, we don't built it there. Fiddle with py vs pv grub stripy a bit. --- Osstest.pm | 7 +++ Osstest/Debian.pm| 4 +++- make-distros-flight | 20 +++- ts-debian-di-install | 18 ++ 4 files changed, 43 insertions(+), 6 deletions(-) diff --git a/Osstest.pm b/Osstest.pm index 6535401..8f97dd2 100644 --- a/Osstest.pm +++ b/Osstest.pm @@ -39,6 +39,7 @@ BEGIN { db_begin_work db_prepare ensuredir get_filecontents_core_quiet system_checked nonempty visible_undef show_abs_time + %arch_debian2xen %arch_xen2debian ); %EXPORT_TAGS = ( ); @@ -54,6 +55,12 @@ scalar *main::DEBUG; # declaration prevents `Name main::DEBUG used only once' # scalar prevents `useless use of a variable in void context' +our %arch_debian2xen = qw(i386 x86_32 + amd64 x86_64 + armhf armhf); +our %arch_xen2debian; +$arch_xen2debian{$arch_debian2xen{$_}} = $_ foreach keys %arch_debian2xen; + #-- static default config settings -- our %c = qw( diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm index 2d49ff8..1edf49f 100644 --- a/Osstest/Debian.pm +++ b/Osstest/Debian.pm @@ -867,7 +867,9 @@ sub preseed_create_guest ($$;@) { my $suite= $xopts{Suite} || $c{DebianSuite}; -my $preseed_file= preseed_base($ho, 
$suite, $sfx, '', %xopts); +my $extra_packages = pv-grub-menu if $xopts{PvMenuLst}; + +my $preseed_file= preseed_base($ho, $suite, $sfx, $extra_packages, %xopts); $preseed_file.= (END); d-i partman-auto/method string regular d-i partman-auto/choose_recipe \\ diff --git a/make-distros-flight b/make-distros-flight index bdca7d1..c19e3ba 100755 --- a/make-distros-flight +++ b/make-distros-flight @@ -90,6 +90,11 @@ test_do_one_netboot () { *) ;; esac + stripy bootloader pvgrub pygrub \ +$xenarch amd64 \ +$dom0arch i386 \ +$domU amd64 \ + case $domU in i386|amd64) diurl=$diurl/xen;; @@ -97,8 +102,20 @@ test_do_one_netboot () { diurl=$diurl/debian-installer/arm64;; esac + case ${dom0arch}_${domU}_${gsuite} in +arm*_arm*_*) bootloader=pygrub;; # no pvgrub for arm + +# Needs a menu.lst, not present in Squeeze+ due to switch to grub2, +# workedaround in Jessie+ with pv-grub-menu package. +*_squeeze) bootloader=pygrub;; +*_wheezy) bootloader=pygrub;; + +# pv-grub-x86_64.gz is not built by 32-bit dom0 userspace build. +i386_amd64_*) bootloader=pygrub;; + esac + job_create_test \ - test-$xenarch$kern-$dom0arch-$domU-$gver-netboot-pygrub \ + test-$xenarch$kern-$dom0arch-$domU-$gver-netboot-$bootloader \ test-debian-di xl $xenarch $dom0arch\ kernbuildjob=${bfi}build-$dom0arch-$kernbuild \ debian_arch=$domU \ @@ -106,6 +123,7 @@ test_do_one_netboot () { debian_method=netboot \ debian_netboot_kernel=$diurl/vmlinuz\ debian_netboot_ramdisk=$diurl/initrd.gz \ + debian_bootloader=$bootloader \ all_hostflags=$most_hostflags } diff --git a/ts-debian-di-install b/ts-debian-di-install index 08019a9..a59194a 100755 --- a/ts-debian-di-install +++ b/ts-debian-di-install @@ -22,13 +22,16 @@ # Debian arch to install. # - gident_method: # Install method, currently only netboot. +# - gident_bootloader: +# The PV bootloader to use when booting the guest. One of +# pvgrub or pygrub. Default is pygrub. 
# # For method=netboot: # # - gident_netboot_kernel: -# URL of the kernel to boot +# URL of the kernel to boot. # - gident_netboot_ramdisk: -# URL of the ramdisk to boot +# URL of the ramdisk to boot. # #If neither kernel nor ramdisk are specified then the current #TftpDiVersion of d-i will be used, and the runvars will be set to @@
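The bootloader override added to test_do_one_netboot in this patch can be sketched as a standalone function. The name choose_bootloader and the idea of passing in the value previously picked by stripy are assumptions for illustration; the case arms follow the hunk as posted in this version of the patch, where Wheezy still falls back to pygrub.

```shell
# Hypothetical helper (not part of osstest): start from a proposed
# bootloader and force pygrub where pvgrub cannot be used.
#   $1 = proposed bootloader (pvgrub or pygrub)
#   $2 = dom0 architecture, $3 = guest architecture, $4 = guest suite
choose_bootloader () {
    local bootloader=$1 dom0arch=$2 domU=$3 gsuite=$4
    case "${dom0arch}_${domU}_${gsuite}" in
    arm*_arm*_*)  bootloader=pygrub ;;  # no pvgrub for arm
    *_squeeze)    bootloader=pygrub ;;  # no menu.lst, no pv-grub-menu
    *_wheezy)     bootloader=pygrub ;;  # pv-grub-menu only in backports
    i386_amd64_*) bootloader=pygrub ;;  # pv-grub-x86_64.gz not built on i386 dom0
    esac
    echo "$bootloader"
}
```

The first matching arm wins, so an armhf guest is forced to pygrub regardless of suite, and a pygrub proposal is never upgraded to pvgrub.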
[Xen-devel] [PATCH OSSTEST v8 11/14] ts-debian-di-install: Refactor root_disk specification
Signed-off-by: Ian Campbell ian.campb...@citrix.com Acked-by: Ian Jackson ian.jack...@eu.citrix.com --- v4: new patch --- ts-debian-di-install | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/ts-debian-di-install b/ts-debian-di-install index 373fad1..6fafd6d 100755 --- a/ts-debian-di-install +++ b/ts-debian-di-install @@ -227,12 +227,14 @@ END OnPowerOff = preserve ); +my $root_disk = 'phy:$gho-{Lvdev},xvda,w'; + prepareguest_part_xencfg($ho, $gho, $ram_mb, \%install_xopts, END); $method_cfg extra = $cmdline # disk= [ -$extra_disk 'phy:$gho-{Lvdev},xvda,w' +$extra_disk $root_disk ] END @@ -256,7 +258,7 @@ END $blcfg # disk= [ -'phy:$gho-{Lvdev},xvda,w' +$root_disk ] END return; -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH OSSTEST v8 07/14] distros: add branch infrastructure
Since the distro nightlies are not version controlled we cannot use the usual mechanisms for detecting regressions. Special case things appropriately. We use an OLD_REVISION of flight-NNN to signify that the old revision is another flight and not a tree revision. A grep over $NEW_REVISION needed adjusting since NEW_REVISION is empty in this mode, leading to grep filename which hangs waiting for stdin. Signed-off-by: Ian Campbell ian.campb...@citrix.com Acked-by: Ian Jackson ian.jack...@eu.citrix.com --- v7: Handle empty $NEW_REVISION by quoting it instead of a needless test -n Switch to flight-per-suite model v3: Handle within cr-daily-branch, since ap-fetch-version* don't make sense for a branch such as this. --- cr-daily-branch | 36 cri-common | 1 + 2 files changed, 29 insertions(+), 8 deletions(-) diff --git a/cr-daily-branch b/cr-daily-branch index 34b6d2b..1fcfd9d 100755 --- a/cr-daily-branch +++ b/cr-daily-branch @@ -68,23 +68,34 @@ fetch_version () { printf '%s\n' $fetch_version_result } -treeurl=`./ap-print-url $branch` +case $branch in +distros-*) + treeurl=none;; +*) + treeurl=`./ap-print-url $branch`;; +esac force_baseline=false skipidentical=true wantpush=$OSSTEST_PUSH -if [ x$OLD_REVISION = x ]; then -OLD_REVISION=`./ap-fetch-version-old $branch` -export OLD_REVISION -fi - check_tested () { ./sg-check-tested --debug --branch=$branch \ --blessings=${DAILY_BRANCH_TESTED_BLESSING:-$OSSTEST_BLESSING} \ $@ } +if [ x$OLD_REVISION = x ]; then +case $branch in + distros-*) + OSSTEST_NO_BASELINE=y + OLD_REVISION=flight-`check_tested` + ;; + *) OLD_REVISION=`./ap-fetch-version-old $branch`;; +esac +export OLD_REVISION +fi + if [ x$OSSTEST_NO_BASELINE != xy ] ; then testedflight=`check_tested --revision-$tree=$OLD_REVISION` @@ -227,6 +238,11 @@ if [ x$OLD_REVISION = xdetermine-late ]; then OLD_REVISION=`./ap-fetch-version-baseline-late $branch $NEW_REVISION` fi +case $branch in +distros-*) makeflight=./make-distros-flight ;; +*) makeflight=./make-flight ;; +esac 
+ if [ x$NEW_REVISION = x$OLD_REVISION ]; then wantpush=false for checkbranch in x $BRANCHES_ALWAYS; do @@ -241,7 +257,7 @@ if [ x$NEW_REVISION = x$OLD_REVISION ]; then fi $DAILY_BRANCH_PREMAKE_HOOK -flight=`./make-flight $branch $xenbranch $OSSTEST_BLESSING $@` +flight=`$makeflight $branch $xenbranch $OSSTEST_BLESSING $@` $DAILY_BRANCH_POSTMAKE_HOOK heading=tmp/$flight.heading-info @@ -261,6 +277,10 @@ fi revlog=tmp/$flight.revision-log case $NEW_REVISION/$OLD_REVISION in +/flight-[0-9]*) + echo 2 SGR COMPARISON AGAINST ${OLD_REVISION} + sgr_args+= --that-flight=${OLD_REVISION#flight-} + ;; */*[^0-9a-f]* | *[^0-9a-f]*/*) echo 2 NO SGR COMPARISON badchar $NEW_REVISION/$OLD_REVISION ;; @@ -321,7 +341,7 @@ start_email $flight $branch $sgr_args $subject_prefix push=false if grep '^tolerable$' $mrof /dev/null 21; then push=$wantpush; fi if test -f $branch.force; then push=$OSSTEST_PUSH; fi -if grep -xF $NEW_REVISION $branch.force-rev; then push=$OSSTEST_PUSH; fi +if grep -xF $NEW_REVISION $branch.force-rev; then push=$OSSTEST_PUSH; fi if test -f $branch.block; then push=false; fi if test -e $mrof test -e $tree_bisect ! grep '^broken' $mrof; then diff --git a/cri-common b/cri-common index ad44546..58b08f2 100644 --- a/cri-common +++ b/cri-common @@ -72,6 +72,7 @@ select_xenbranch () { rumpuserxen) tree=rumpuserxen; xenbranch=xen-unstable ;; seabios)tree=seabios; xenbranch=xen-unstable ;; ovmf) tree=ovmf; xenbranch=xen-unstable ;; + distros-*) tree=none; xenbranch=xen-unstable ;; osstest)tree=osstest; xenbranch=xen-unstable ;; esac if [ x$tree = xlinux ]; then -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
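Two details of this patch are easy to miss in the diff: the "flight-NNN" form of OLD_REVISION, and the grep that hangs when NEW_REVISION is empty and unquoted. The sketch below illustrates both under stated assumptions: the function names are invented, the strings printed for the non-flight cases are placeholders, and the bracket pattern is written with `!` (the portable spelling) where the posted hunk uses `^`.

```shell
# Hypothetical helper: decide how a NEW/OLD revision pair would be compared.
classify_revisions () {
    local new=$1 old=$2
    case "$new/$old" in
    /flight-[0-9]*)
        # Empty new revision, old revision names a flight: strip the
        # "flight-" prefix and compare against that flight.
        echo "--that-flight=${old#flight-}"
        ;;
    */*[!0-9a-f]* | *[!0-9a-f]*/*)
        # Something that is not a plain hex revision on either side.
        echo "no-comparison"
        ;;
    *)
        echo "revision-comparison"
        ;;
    esac
}

# Hypothetical helper: check whether a revision is listed in a file.
# Quoting "$1" matters: with an empty, unquoted revision grep would treat
# the file name as the pattern and wait on stdin. Quoted, the empty
# pattern simply fails to match any non-empty line.
is_forced_rev () {
    grep -xF -- "$1" "$2" >/dev/null 2>&1
}
```

With an empty new revision and an old revision of flight-12345, classify_revisions prints --that-flight=12345, which is the argument the patch appends to sgr_args.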
[Xen-devel] [PATCH OSSTEST v8 03/14] distros: add support for installing Debian PV guests via d-i, flight and jobs
This patch introduces ts-debian-di-install which can install Debian from a netboot (PXE) debian installer image. By default it installs from the d-i image used by osstest (using the special Xen PV guest enabled flavour where necessary) but it can also fetch the kernel and ramdisk from URLs specified in runvars.

The resulting guests boot the distro kernel using pygrub (pvgrub will follow).

The distros flights differ substantially from the existing flights. Introduce make-distros-flight using the functionality previously refactored into mfi-common. The new flight tests all versions of Debian from Squeeze onward as amd64, i386 and armhf guests (armhf from Jessie onwards only) using the usual smoke tests.

Test names are suffixed -pygrub pending the addition of pvgrub variants in a future commit.

Add the new cases to sg-run-job.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v8: "mfi-common: Allow make-*flight to filter the set of build jobs to include" was moved to the front of the series, so the bits of that and "make-distros-flight: don't bother building for XSM or libvirt." have been folded in here. This patch and those two were all already acked so I have retained the ack here.

    Added description of configuration runvars to ts-debian-di-install itself. This was due to Ian's feedback on "distros: support PV guest install from Debian netinst media." further down the series, so I have retained the ack.

v7: Use {Guest}_suite as runvar. Also use $suite not $dist in make-distros-flight for consistency.

    Switch to a flight per Debian suite model rather than one enormous flight.

    Switch to constructing the URLs in make-distros-flight

v6: Only apply -xen suffix to x86 images when doing a netboot using the osstest version of d-i, since that is the only arch where we create such files, other arches can use the bare names.
Use the guest $arch not the host $r{arch} when finding the kernel+initrd to use for d-i install using the osstest d-i. v4: use guest create v3: $BUILD_LVEXTEND_MAX now handled in mfi-common Consolidate setting of ruvars Include $flight and $job in tmpdir name Use Osstest::Debian::di_installcmdline_core Document the usage of get_host_property on a guest object Correct ARM netboot paths Include bootloader in test name Should include -pv too? console= repetition for Jessie onwards. Wait for up to an hour for the install. I'd seen timeouts right at the end of the install with the previous value --- Osstest/TestSupport.pm | 3 + make-distros-flight| 138 + sg-run-job | 11 +++ ts-debian-di-install | 180 + 4 files changed, 332 insertions(+) create mode 100755 make-distros-flight create mode 100755 ts-debian-di-install diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm index 1cace4f..3a7a535 100644 --- a/Osstest/TestSupport.pm +++ b/Osstest/TestSupport.pm @@ -931,8 +931,11 @@ sub propname_massage ($) { return $prop; } +# It is fine to call this on a guest object too, in which case it will +# always return $defval. sub get_host_property ($$;$) { my ($ho, $prop, $defval) = @_; +return $defval unless $ho-{Properties}; my $val = $ho-{Properties}{propname_massage($prop)}; return defined($val) ? $val : $defval; } diff --git a/make-distros-flight b/make-distros-flight new file mode 100755 index 000..bdca7d1 --- /dev/null +++ b/make-distros-flight @@ -0,0 +1,138 @@ +#!/bin/bash + +# This is part of osstest, an automated testing framework for Xen. +# Copyright (C) 2009-2013 Citrix Inc. +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. 
+# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. + + +set -e + +branch=$1 +xenbranch=$2 +blessing=$3 +buildflight=$4 + +flight=`./cs-flight-create $blessing $branch` + +. cri-common +. ap-common +. mfi-common + +defsuite=`getconfig DebianSuite` +defguestsuite=`getconfig GuestDebianSuite` + +case $branch in + distros-debian-*) debian_suite=${branch#distros-debian-} ;; + *)echo $branch 2; exit 1 ;; +esac + +job_create_build_filter_callback () { + local job=$1; shift + + case $job in +build-*-libvirt) return 1;; + esac + case $* in +* enable_xsm=true *) return 1;; +
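The start of make-distros-flight does two small pieces of string handling: it derives the Debian suite from the branch name, and it filters out build jobs that the distros flights do not need. A minimal sketch of both follows. The function names are invented, and the meaning of the return code in keep_build_job (non-zero means "do not create this job") is an assumption, since the posted hunk is cut off before the callback ends.

```shell
# Hypothetical helper: extract the suite from a distros-debian-* branch.
suite_from_branch () {
    case "$1" in
    distros-debian-*) echo "${1#distros-debian-}" ;;
    *)                echo "$1" >&2; return 1 ;;
    esac
}

# Hypothetical helper modelled on job_create_build_filter_callback:
# reject libvirt builds and any build with enable_xsm=true among its
# arguments. $1 is the job name, the rest are runvar assignments.
keep_build_job () {
    local job=$1; shift
    case "$job" in
    build-*-libvirt) return 1 ;;
    esac
    # Pad with spaces so the runvar matches as a whole word.
    case " $* " in
    *" enable_xsm=true "*) return 1 ;;
    esac
    return 0
}
```

For instance, `suite_from_branch distros-debian-jessie` prints jessie, and `keep_build_job build-amd64-libvirt` fails so that no libvirt build is scheduled.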
[Xen-devel] [PATCH OSSTEST v8 10/14] Debian: Handle lack of bootloader support in d-i on ARM.
Debian doesn't currently know what bootloader to install in a Xen guest on ARM. We install pv-grub-menu above which actually does what we need, but the installer doesn't treat that as a bootloader. Most ARM platforms end up installing a u-boot boot.scr, based on a platform whitelist. This doesn't seem appropriate for us. Grub is not available for arm32. For arm64 we will eventually end up with in-guest UEFI and therefore grub-efi and things will work normally. I'm not sure what the answer is going to be for arm32. This patch enables the workaround for Wheezy, Jessie and Sid, post-Jessie should be enabled as we add them. (Pre-wheezy does not support running as a Xen guest on ARM so we don't test them at all). Signed-off-by: Ian Campbell ian.campb...@citrix.com Acked-by: Ian Jackson ian.jack...@eu.citrix.com --- v4: Handle sid too v3: New --- Osstest/Debian.pm| 14 -- ts-debian-di-install | 6 -- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm index 7c94b6c..4669047 100644 --- a/Osstest/Debian.pm +++ b/Osstest/Debian.pm @@ -865,8 +865,8 @@ END return $preseed; } -sub preseed_create_guest ($$;@) { -my ($ho, $sfx, %xopts) = @_; +sub preseed_create_guest ($$$;@) { +my ($ho, $arch, $sfx, %xopts) = @_; my $suite= $xopts{Suite} || $c{DebianSuite}; @@ -913,6 +913,16 @@ d-i grub-installer/bootdev string /dev/xvda END +# Debian doesn't currently know what bootloader to install in a +# Xen guest on ARM. We install pv-grub-menu above which actually +# does what we need, but the installer doesn't treat that as a +# bootloader. 
+logm(\$arch is $arch, \$suite is $suite); +$preseed_file.= (END) if $arch =~ /^arm/ $suite =~ /wheezy|jessie|sid/; +d-i nobootloader/confirmation_common boolean true + +END + $preseed_file .= preseed_hook_cmds(); return create_webfile($ho, preseed$sfx, $preseed_file); diff --git a/ts-debian-di-install b/ts-debian-di-install index 1a7e1d0..373fad1 100755 --- a/ts-debian-di-install +++ b/ts-debian-di-install @@ -192,7 +192,9 @@ END $method_cfg = setup_netboot($tmpdir, $arch, $suite); - $ps_url = preseed_create_guest($gho, '', Suite=$suite, PvMenuLst=($bl eq pvgrub)); + $ps_url = preseed_create_guest($gho, $arch, '', + Suite=$suite, + PvMenuLst=($bl eq pvgrub)); $extra_disk = ; } @@ -202,7 +204,7 @@ END ($method_cfg,$extra_disk) = setup_netinst($tmpdir, $arch); - $ps_url = preseed_create_guest($gho, '', CDROM=1); + $ps_url = preseed_create_guest($gho, $arch, '', CDROM=1); } else { -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
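The condition guarding the nobootloader preseed line is a pair of unanchored-suite and anchored-arch matches in Perl. Transcribed into shell for illustration (the function name is invented, and the `&&` between the two matches is inferred from the commit message rather than visible in the flattened hunk), it reads:

```shell
# Hypothetical helper: succeed when the "no bootloader" confirmation
# should be preseeded, i.e. for ARM guests on Wheezy, Jessie or Sid.
needs_nobootloader_confirmation () {
    local arch=$1 suite=$2
    # Perl: $arch =~ /^arm/ (anchored at the start)
    case "$arch" in
    arm*) ;;
    *)    return 1 ;;
    esac
    # Perl: $suite =~ /wheezy|jessie|sid/ (matches anywhere in the string)
    case "$suite" in
    *wheezy*|*jessie*|*sid*) return 0 ;;
    esac
    return 1
}
```

Suites released after Jessie would need adding to the second pattern, as the commit message notes.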
Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel
On 08.07.15 at 11:39, david.vra...@citrix.com wrote:
> On 08/07/15 09:56, Jan Beulich wrote:
>> Rather than assuming only PV guests need special treatment (and
>> dealing with that directly when an IRQ gets set up), keep all guest MSI
>> IRQs masked until either the (HVM) guest unmasks them via vMSI or the
>> (PV, PVHVM, or PVH) guest sets up an event channel for it.
>>
>> To not further clutter the common evtchn_bind_pirq() with x86-specific
>> code, introduce an arch_evtchn_bind_pirq() hook instead.
>
> Can you describe the symptoms of the bug being fixed here?

Interrupts simply didn't get unmasked for PVHVM Linux guests.

>> --- a/xen/include/asm-arm/irq.h
>> +++ b/xen/include/asm-arm/irq.h
>> @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,
>>
>>  void arch_move_irqs(struct vcpu *v);
>>
>> +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
>
> Would this be better as an inline function?
>
>> +
>>  /* Set IRQ type for an SPI */
>>  int irq_set_spi_type(unsigned int spi, unsigned int type);
>> --- a/xen/include/xen/irq.h
>> +++ b/xen/include/xen/irq.h
>> @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
>>  unsigned int arch_hwdom_irqs(domid_t);
>>  #endif
>>
>> +#ifndef arch_evtchn_bind_pirq
>> +void arch_evtchn_bind_pirq(struct domain *, int pirq);
>
> ... moving this into xen/include/asm-x86/irq.h

Oh, right, (also to Julien) - this is exactly the reason I do not want it to be an inline function for ARM: I want the declaration here, not replicated in every interested arch's header.

Jan
Re: [Xen-devel] [v3 13/15] vmx: Properly handle notification event when vCPU is running
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
>
> When a vCPU is running in Root mode and a notification event
> has been injected to it, we need to set VCPU_KICK_SOFTIRQ for
> the current cpu, so the pending interrupt in PIRR will be
> synced to vIRR before VM-Exit in time.
>
> Signed-off-by: Feng Wu feng...@intel.com

Acked-by: Kevin Tian kevin.t...@intel.com
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
-Original Message- From: Tian, Kevin Sent: Wednesday, July 08, 2015 7:00 PM To: Wu, Feng; xen-devel@lists.xen.org Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z; george.dun...@eu.citrix.com Subject: RE: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked From: Wu, Feng Sent: Wednesday, June 24, 2015 1:18 PM This patch includes the following aspects: - Add a global vector to wake up the blocked vCPU when an interrupt is being posted to it (This part was sugguested by Yang Zhang yang.z.zh...@intel.com). - Adds a new per-vCPU tasklet to wakeup the blocked vCPU. It can be used in the case vcpu_unblock cannot be called directly. - Define two per-cpu variables: * pi_blocked_vcpu: A list storing the vCPUs which were blocked on this pCPU. * pi_blocked_vcpu_lock: The spinlock to protect pi_blocked_vcpu. Signed-off-by: Feng Wu feng...@intel.com --- v3: - This patch is generated by merging the following three patches in v2: [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU [RFC v2 10/15] vmx: Define two per-cpu variables [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet' - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct' - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler' - Make pi_wakeup_interrupt() static - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list' - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct' - Rename 'blocked_vcpu' to 'pi_blocked_vcpu' - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock' xen/arch/x86/hvm/vmx/vmcs.c| 3 +++ xen/arch/x86/hvm/vmx/vmx.c | 54 ++ xen/include/asm-x86/hvm/hvm.h | 1 + xen/include/asm-x86/hvm/vmx/vmcs.h | 5 xen/include/asm-x86/hvm/vmx/vmx.h | 5 5 files changed, 68 insertions(+) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 11dc1b5..0c5ce3f 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ 
b/xen/arch/x86/hvm/vmx/vmcs.c @@ -631,6 +631,9 @@ int vmx_cpu_up(void) if ( cpu_has_vmx_vpid ) vpid_sync_all(); +INIT_LIST_HEAD(per_cpu(pi_blocked_vcpu, cpu)); +spin_lock_init(per_cpu(pi_blocked_vcpu_lock, cpu)); + return 0; } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index b94ef6a..7db6009 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content); static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content); static void vmx_invlpg_intercept(unsigned long vaddr); +/* + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we + * can find which vCPU should be waken up. + */ +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu); +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock); + uint8_t __read_mostly posted_intr_vector; +uint8_t __read_mostly pi_wakeup_vector; + +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg) +{ +vcpu_unblock((struct vcpu *)arg); +} static int vmx_domain_initialise(struct domain *d) { @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v) if ( v-vcpu_id == 0 ) v-arch.user_regs.eax = 1; +tasklet_init( +v-arch.hvm_vmx.pi_vcpu_wakeup_tasklet, +pi_vcpu_wakeup_tasklet_handler, +(unsigned long)v); + +INIT_LIST_HEAD(v-arch.hvm_vmx.pi_blocked_vcpu_list); + return 0; } static void vmx_vcpu_destroy(struct vcpu *v) { +tasklet_kill(v-arch.hvm_vmx.pi_vcpu_wakeup_tasklet); /* * There are cases that domain still remains in log-dirty mode when it is * about to be destroyed (ex, user types 'xl destroy dom'), in which case @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata vmx_function_table = { .enable_msr_exit_interception = vmx_enable_msr_exit_interception, }; +/* + * Handle VT-d posted-interrupt when VCPU is blocked. 
+ */ +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) +{ +struct arch_vmx_struct *vmx; +unsigned int cpu = smp_processor_id(); + +spin_lock(per_cpu(pi_blocked_vcpu_lock, cpu)); + +/* + * FIXME: The length of the list depends on how many + * vCPU is current blocked on this specific pCPU. + * This may hurt the interrupt latency if the list + * grows to too many entries. + */ let's go with this linked list first until a real issue is identified. +list_for_each_entry(vmx, per_cpu(pi_blocked_vcpu, cpu), +pi_blocked_vcpu_list) +if ( vmx-pi_desc.on ) +
Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel
On 08/07/2015 11:55, Jan Beulich wrote:
> On 08.07.15 at 11:07, julien.gr...@citrix.com wrote:
>> On 08/07/2015 09:56, Jan Beulich wrote:
>>> --- a/xen/include/asm-arm/irq.h
>>> +++ b/xen/include/asm-arm/irq.h
>>> @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,
>>>
>>>  void arch_move_irqs(struct vcpu *v);
>>>
>>> +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
>>> +
>>
>> This addition is here in order to ensure that d and pirq are evaluated,
>> right?
>
> Sure.
>
>> If so, I didn't find it obvious to understand. Why didn't you use a
>> static inline? Or maybe add a comment explicitly say this is not
>> implemented.
>
> A static inline could be used in this case, yes. But I see no
> significant advantages. As to the comment - it is implemented,
> it's just a no-op. And stating that it is a no-op would be redundant
> with it obviously being so by looking at it.

It's not that obvious, given that I had to ask about it. The first thing I saw was (d) + (pirq) and I thought: "Why do we want to add a domain to a pirq?". I only saw the (void) afterwards, and only because I remembered we talked about a similar case a year ago.

Having a comment doesn't hurt and helps comprehension.

--
Julien Grall
Re: [Xen-devel] x86, arm: remove asm/spinlock.h from all architectures removed x86's _raw_read_unlock()
On 08/07/15 11:45, Jan Beulich wrote:
> David,
>
> I'm afraid we'll need another fixup here, even if things build fine
> despite the removal.

Ah, we get a generic implementation instead. Thanks for pointing this out. I'll fix it.

David
Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel
On 08.07.15 at 11:07, julien.gr...@citrix.com wrote:
> On 08/07/2015 09:56, Jan Beulich wrote:
>> --- a/xen/include/asm-arm/irq.h
>> +++ b/xen/include/asm-arm/irq.h
>> @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,
>>  void arch_move_irqs(struct vcpu *v);
>>
>> +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
>> +
>
> This addition is here in order to ensure that d and pirq are evaluated, right?

Sure.

> If so, I didn't find it obvious to understand. Why didn't you use a static inline? Or maybe add a comment explicitly saying this is not implemented.

A static inline could be used in this case, yes. But I see no significant advantages. As to the comment - it is implemented, it's just a no-op. And stating that it is a no-op would be redundant with it obviously being so by looking at it.

Jan
Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes
-----Original Message-----
From: Wu, Feng
Sent: Wednesday, July 08, 2015 6:32 PM
To: Tian, Kevin; xen-devel@lists.xen.org
Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z; george.dun...@eu.citrix.com; Wu, Feng
Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config changes

-----Original Message-----
From: Tian, Kevin
Sent: Wednesday, July 08, 2015 6:23 PM
To: Wu, Feng; xen-devel@lists.xen.org
Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z; george.dun...@eu.citrix.com
Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config changes

From: Wu, Feng
Sent: Wednesday, June 24, 2015 1:18 PM

When guest changes its interrupt configuration (such as, vector, etc.) for direct-assigned devices, we need to update the associated IRTE with the new guest vector, so external interrupts from the assigned devices can be injected to guests without VM-Exit.

For lowest-priority interrupts, we use vector-hashing mechanism to find the destination vCPU. This follows the hardware behavior, since modern Intel CPUs use vector hashing to handle the lowest-priority interrupt.

For multicast/broadcast vCPU, we cannot handle it via interrupt posting, still use interrupt remapping.

Signed-off-by: Feng Wu feng...@intel.com --- v3: - Use bitmap to store the all the possible destination vCPUs of an interrupt, then trying to find the right destination from the bitmap - Typo and some small changes xen/drivers/passthrough/io.c | 96 +++- 1 file changed, 95 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index 9b77334..18e24e1 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -26,6 +26,7 @@ #include asm/hvm/iommu.h #include asm/hvm/support.h #include xen/hvm/irq.h +#include asm/io_apic.h static DEFINE_PER_CPU(struct list_head, dpci_list); @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci) xfree(dpci); } +/* + * The purpose of this routine is to find the right destination vCPU for + * an interrupt which will be delivered by VT-d posted-interrupt. There + * are several cases as below: If you aim to have this interface common to more usages, don't restrict to VT-d posted-interrupt which should be just an example. Yes, making this a common interface should be better. Thinking about this a little more, this function itself is kind of restricted to VT-d posted-interrupt, since it doesn't handle multicast/broadcast interrupts, it only handle lowest-priority and single destination interrupts. However, I can make the vector-hashing logic as a separate function, which can be used elsewhere. Thanks, Feng + * + * - For lowest-priority interrupts, we find the destination vCPU from the + * guest vector using vector-hashing mechanism and return true. This follows + * the hardware behavior, since modern Intel CPUs use vector hashing to + * handle the lowest-priority interrupt. Does AMD use same hashing mechanism? Can this interface be reused by other IOMMU type or it's an Intel specific implementation? I am not sure how AMD handle lowest-priority. Intel hardware guys told me recent Intel hardware platform use this method to deliver lowest-priority interrupts. 
What do you mean by other IOMMU type? Thanks, Feng + * - Otherwise, for single destination interrupt, it is straightforward to + * find the destination vCPU and return true. + * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting, + * so return false. + * + * Here is the details about the vector-hashing mechanism: + * 1. For lowest-priority interrupts, store all the possible destination + * vCPUs in an array. + * 2. Use gvec % max number of destination vCPUs to find the right + * destination vCPU in the array for the lowest-priority interrupt. + */ +static struct vcpu *pi_find_dest_vcpu(struct domain *d, uint8_t dest_id, + uint8_t dest_mode, uint8_t delivery_mode, + uint8_t gvec) +{ +unsigned long *dest_vcpu_bitmap = NULL; +unsigned int dest_vcpu_num = 0, idx = 0; +int size = (d-max_vcpus + BITS_PER_LONG - 1) / BITS_PER_LONG; +struct vcpu *v, *dest = NULL; +int i; + +dest_vcpu_bitmap = xzalloc_array(unsigned long, size); +if ( !dest_vcpu_bitmap ) +{ +dprintk(XENLOG_G_INFO, +dom%d: failed to allocate memory\n, d-domain_id); +return NULL; +} + +for_each_vcpu ( d, v ) +{ +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0, +
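The vector-hashing steps described in the comment above (collect the candidate vCPUs, then index them with gvec modulo the number of candidates) can be sketched as a standalone function. This is only an illustration under assumptions: the quoted patch is cut off before its selection loop, so pick_dest_vcpu(), bitmap_test() and demo_pick() are hypothetical names and the loop is not claimed to match what the patch does.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Test bit nr in a bitmap stored as an array of unsigned long. */
static int bitmap_test(const unsigned long *bitmap, unsigned int nr)
{
    return (int)((bitmap[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/*
 * Hypothetical helper: given a bitmap of candidate vCPU ids for a
 * lowest-priority interrupt, pick candidate number (gvec % candidates).
 * Returns the chosen vCPU id, or -1 if the bitmap is empty.
 */
static int pick_dest_vcpu(const unsigned long *bitmap, unsigned int max_vcpus,
                          unsigned char gvec)
{
    unsigned int i, count = 0, idx;

    for ( i = 0; i < max_vcpus; i++ )
        if ( bitmap_test(bitmap, i) )
            count++;

    if ( count == 0 )
        return -1;

    idx = gvec % count;

    /* Walk the set bits again and stop at the idx-th one. */
    for ( i = 0; i < max_vcpus; i++ )
    {
        if ( !bitmap_test(bitmap, i) )
            continue;
        if ( idx == 0 )
            return (int)i;
        idx--;
    }

    return -1;
}

/* Candidates are vCPUs 1, 3 and 4 (bitmap 0x1a) out of 8. */
static int demo_pick(unsigned char gvec)
{
    unsigned long bitmap[1] = { 0x1aUL };

    return pick_dest_vcpu(bitmap, 8, gvec);
}
```

With three candidates, consecutive guest vectors rotate through vCPUs 1, 3 and 4, so the choice depends only on the vector and the candidate set.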
Re: [Xen-devel] [PATCH] xen: Use module_pci_driver() in platform pci driver.
On 08/07/15 06:54, Rajat Jain wrote:
> Eliminate the module_init function by using module_pci_driver()

This is not equivalent since this adds a useless module_exit() function.

David
Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel
On 08.07.15 at 13:14, david.vra...@citrix.com wrote:
> On 08/07/15 11:58, Jan Beulich wrote:
>> On 08.07.15 at 11:39, david.vra...@citrix.com wrote:
>>> On 08/07/15 09:56, Jan Beulich wrote:
>>>> +
>>>>  /* Set IRQ type for an SPI */
>>>>  int irq_set_spi_type(unsigned int spi, unsigned int type);
>>>> --- a/xen/include/xen/irq.h
>>>> +++ b/xen/include/xen/irq.h
>>>> @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
>>>>  unsigned int arch_hwdom_irqs(domid_t);
>>>>  #endif
>>>>
>>>> +#ifndef arch_evtchn_bind_pirq
>>>> +void arch_evtchn_bind_pirq(struct domain *, int pirq);
>>>
>>> ... moving this into xen/include/asm-x86/irq.h
>>
>> Oh, right, (also to Julien) - this is exactly the reason I do not want it to be an inline function for ARM: I want the declaration here, not replicated in every interested arch's header.
>
> Ok. FWIW, with this requirement I would (instead of the macros) add a weak arch_evtchn_bind_pirq() that's a no-op.

Yeah, that's how Linux likes to do it. But we learned the hard way that weak conflicts with our making symbols hidden by default, so no, weak is not an option either I'm afraid.

Jan
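The arrangement settled on above (one declaration in the common header, guarded by #ifndef, with architectures that have nothing to do supplying a macro of the same name) can be sketched in one file. The "header" sections are collapsed into comments, and DEMO_ARCH_NEEDS_HOOK, bound_pirq and demo_bind() are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for struct domain. */
struct domain {
    int domain_id;
    int bound_pirq;
};

/* ---- what an arch header would contain ---- */
#ifndef DEMO_ARCH_NEEDS_HOOK
/* An arch with nothing to do overrides the hook with a no-op macro. */
#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
#endif

/* ---- what the common header would contain ---- */
#ifndef arch_evtchn_bind_pirq
/* Declared once here, not replicated in every interested arch's header. */
void arch_evtchn_bind_pirq(struct domain *, int pirq);
#endif

/* ---- what an arch needing real work would define ---- */
#ifdef DEMO_ARCH_NEEDS_HOOK
void arch_evtchn_bind_pirq(struct domain *d, int pirq)
{
    d->bound_pirq = pirq;
}
#endif

/* ---- common code: calls the hook without any arch #ifdef ---- */
static int demo_bind(void)
{
    struct domain d = { .domain_id = 1, .bound_pirq = -1 };

    /* pirq is 1 so the macro's discarded pointer sum stays at &d + 1. */
    arch_evtchn_bind_pirq(&d, 1);
    return d.bound_pirq;
}
```

Built as is, the macro path is taken and demo_bind() leaves bound_pirq at -1; built with -DDEMO_ARCH_NEEDS_HOOK, the real function runs instead. No weak symbol is involved in either case.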
Re: [Xen-devel] [PATCH V4 3/3] xen/vm_event: Deny register writes if refused by vm_event reply
> Are the license headers required? I just tried to make the change as small as possible, and looking at the other headers (for example in xen/include/asm-arm), at least half of them have no license header. I'm guessing this is something we'd now like to start correcting in new patches?
>
> Thanks,
> Razvan

The wiki's definition that goes with the Signed-off-by tag reads: "The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file;". So, the open source license should be indicated in the file. If it's a new file being created, I would say it's the creator's responsibility to add the license. But I haven't seen any discussion/documentation on the matter so I'm just guessing.

Cheers,
Tamas
Re: [Xen-devel] [PATCH v25 05/15] x86/VPMU: Initialize VPMUs with __initcall
On Friday, 19 June 2015, 14:44:36, Boris Ostrovsky wrote:
> Move some VPMU initialization operations into __initcalls to avoid performing the same tests and calculations for each vcpu.
>
> Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
> Acked-by: Jan Beulich jbeul...@suse.com

For the Intel/VMX part:
Reviewed-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com

> ---
>  xen/arch/x86/hvm/svm/vpmu.c | 106 --
>  xen/arch/x86/hvm/vmx/vpmu_core2.c | 151 +++---
>  xen/arch/x86/hvm/vpmu.c | 32
>  xen/include/asm-x86/hvm/vpmu.h | 2 +
>  4 files changed, 156 insertions(+), 135 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> index 481ea7b..b60ca40 100644
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -356,54 +356,6 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>      return 1;
>  }
>
> -static int amd_vpmu_initialise(struct vcpu *v)
> -{
> -    struct xen_pmu_amd_ctxt *ctxt;
> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    uint8_t family = current_cpu_data.x86;
> -
> -    if ( counters == NULL )
> -    {
> -        switch ( family )
> -        {
> -        case 0x15:
> -            num_counters = F15H_NUM_COUNTERS;
> -            counters = AMD_F15H_COUNTERS;
> -            ctrls = AMD_F15H_CTRLS;
> -            k7_counters_mirrored = 1;
> -            break;
> -        case 0x10:
> -        case 0x12:
> -        case 0x14:
> -        case 0x16:
> -        default:
> -            num_counters = F10H_NUM_COUNTERS;
> -            counters = AMD_F10H_COUNTERS;
> -            ctrls = AMD_F10H_CTRLS;
> -            k7_counters_mirrored = 0;
> -            break;
> -        }
> -    }
> -
> -    ctxt = xzalloc_bytes(sizeof(*ctxt) +
> -                         2 * sizeof(uint64_t) * num_counters);
> -    if ( !ctxt )
> -    {
> -        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> -                 " PMU feature is unavailable on domain %d vcpu %d.\n",
> -                 v->vcpu_id, v->domain->domain_id);
> -        return -ENOMEM;
> -    }
> -
> -    ctxt->counters = sizeof(*ctxt);
> -    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
> -
> -    vpmu->context = ctxt;
> -    vpmu->priv_context = NULL;
> -    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
> -    return 0;
> -}
> -
>  static void amd_vpmu_destroy(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> @@ -474,30 +426,62 @@ struct arch_vpmu_ops
amd_vpmu_ops = { int svm_vpmu_initialise(struct vcpu *v) { +struct xen_pmu_amd_ctxt *ctxt; struct vpmu_struct *vpmu = vcpu_vpmu(v); -uint8_t family = current_cpu_data.x86; -int ret = 0; -/* vpmu enabled? */ if ( vpmu_mode == XENPMU_MODE_OFF ) return 0; -switch ( family ) +if ( !counters ) +return -EINVAL; + +ctxt = xzalloc_bytes(sizeof(*ctxt) + + 2 * sizeof(uint64_t) * num_counters); +if ( !ctxt ) { +printk(XENLOG_G_WARNING Insufficient memory for PMU, +PMU feature is unavailable on domain %d vcpu %d.\n, + v-vcpu_id, v-domain-domain_id); +return -ENOMEM; +} + +ctxt-counters = sizeof(*ctxt); +ctxt-ctrls = ctxt-counters + sizeof(uint64_t) * num_counters; + +vpmu-context = ctxt; +vpmu-priv_context = NULL; + +vpmu-arch_vpmu_ops = amd_vpmu_ops; + +vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); +return 0; +} + +int __init amd_vpmu_init(void) +{ +switch ( current_cpu_data.x86 ) +{ +case 0x15: +num_counters = F15H_NUM_COUNTERS; +counters = AMD_F15H_COUNTERS; +ctrls = AMD_F15H_CTRLS; +k7_counters_mirrored = 1; +break; case 0x10: case 0x12: case 0x14: -case 0x15: case 0x16: -ret = amd_vpmu_initialise(v); -if ( !ret ) -vpmu-arch_vpmu_ops = amd_vpmu_ops; -return ret; +num_counters = F10H_NUM_COUNTERS; +counters = AMD_F10H_COUNTERS; +ctrls = AMD_F10H_CTRLS; +k7_counters_mirrored = 0; +break; +default: +printk(XENLOG_WARNING VPMU: Unsupported CPU family %#x\n, + current_cpu_data.x86); +return -EINVAL; } -printk(VPMU: Initialization failed. 
- AMD processor family %d has not - been supported\n, family); -return -EINVAL; +return 0; } diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index cfcdf42..025c970 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -708,62 +708,6 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) return 1; } -static int core2_vpmu_initialise(struct vcpu *v) -{ -struct vpmu_struct *vpmu = vcpu_vpmu(v); -u64 msr_content; -static bool_t ds_warned; - -if ( !(vpmu_features XENPMU_FEATURE_INTEL_BTS) ) -
Re: [Xen-devel] [v6][PATCH 10/16] tools: introduce some new parameters to set rdm policy
Tiejun Chen writes ("[v6][PATCH 10/16] tools: introduce some new parameters to set rdm policy"):
> This patch introduces user configurable parameters to specify RDM resource and according policies,

Thanks.

I appreciate that I have come to this review late. While I have found the review conversation quite unsatisfactory, I don't really feel that I can reject the patch series pending better answers to my questions.

Instead, I feel that I need to make a set of decisions which will avoid my review comments being a blocker for this series.

After discussing matters with the other tools maintainers, I have concluded:

* On the question of whether the default should be `strategy=host' or `strategy=none':

  I still don't understand what is going on here and I am frustrated because I don't feel that the replies I have been getting are actually answers to my questions. They seem to be answers to different questions.

  However, the patch series with `strategy=none' is strictly less of a change to the codebase than with `strategy=host' and it is easy to change defaults later. It would be perverse to block this functionality on the grounds that it is not enabled strongly enough by default.

  Therefore, despite the fact that after several rounds of emails I still do not have a convincing explanation, I am going to drop this line of questioning.

* On the question of the documentation:

  The documentation is unfortunately a poor guide to a user. Many of my questions were prompted by reading the documentation. Having gone several rounds of emails I still do not know enough to suggest improvements.

  In my view the effect of the poor documentation will be that most users will simply ignore the whole feature as too confusing. (Unless they have somehow divined that they are having RDM trouble in which case they may flail at random experimenting with various options.)

  Again, the effect therefore is that knowledgeable users might be able to do better, but for most users this is just yet another piece of docs for some feature they don't want to use.

  While I'm not entirely comfortable with accepting documentation which reduces the overall readability and usefulness of the manual, I think this is a relatively minor objection which I am prepared to overlook.

  Of course there is some opportunity for improving the documentation during the freeze.

* On the question of option naming, `strategy' vs `type':

  `type' was definitely wrong. It may be that a better name than `strategy' would be correct. This depends on the contemplated direction for future expansion.

  Sadly, I do not expect that further discussion is going to illuminate this further. `strategy' will do.

* On the question of option naming, `none' vs `ignore':

  I asked whether the submitter agreed that `none' should be renamed `ignore'. I have not received a clear opinion. Instead, the submitter indicated a willingness to change this on my request. The latest resubmission just did the rename.

  The purpose of asking `do you agree', in this way, is to try to help the submitters and the maintainers come up with the best answers.

  Note that it is a fundamental assumption of the patch review process that the submitter understands the design and implementation decisions embodied in the patchset. The submitter needs to be able to respond to suggestions with evaluations, not simply acquiescence.

  (If it happens that some of the decisions were made by someone else, the submitter needs to 1. state this clearly where relevant and 2. either consult the designers/authors, or if they aren't available, reverse-engineer the intent.)

  In the absence of a clear statement of the submitter's own opinion, I remain doubtful that this rename was correct. But, I don't think it important enough to make any more fuss about.

* On the question of option naming, the `reserve='.

  Ian Campbell points out that the API structure for `[rdm_]reserve' as submitted is anomalous. I agree with him. The existing API and config file arrangements are rather too confusing.

  Please change `reserve' to `policy', in the following places:

  * In the xl rdm config parsing, `reserve=' should be `policy='.
  * In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  * The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  * The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

I think that with these changes I will be able to ack the remaining tools parts of this series, and drop my objections to the parts acked by Wei.

I can't speak for the hypervisor side, which I haven't really looked at.

Ian.
Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set
-Original Message- From: Tian, Kevin Sent: Wednesday, July 08, 2015 7:31 PM To: Wu, Feng; xen-devel@lists.xen.org Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z; george.dun...@eu.citrix.com Subject: RE: [v3 08/15] Suppress posting interrupts when 'SN' is set From: Wu, Feng Sent: Wednesday, July 08, 2015 6:11 PM From: Tian, Kevin Sent: Wednesday, July 08, 2015 5:06 PM From: Wu, Feng Sent: Wednesday, June 24, 2015 1:18 PM Currently, we don't support urgent interrupt, all interrupts are recognized as non-urgent interrupt, so we cannot send posted-interrupt when 'SN' is set. Signed-off-by: Feng Wu feng...@intel.com --- v3: use cmpxchg to test SN/ON and set ON xen/arch/x86/hvm/vmx/vmx.c | 32 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 0837627..b94ef6a 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1686,6 +1686,8 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector) { +struct pi_desc old, new, prev; + move to 'else if'. if ( pi_test_and_set_pir(vector, v-arch.hvm_vmx.pi_desc) ) return; @@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector) */ pi_set_on(v-arch.hvm_vmx.pi_desc); } -else if ( !pi_test_and_set_on(v-arch.hvm_vmx.pi_desc) ) +else { +prev.control = 0; + +do { +old.control = v-arch.hvm_vmx.pi_desc.control + ~(1 POSTED_INTR_ON | 1 POSTED_INTR_SN); +new.control = v-arch.hvm_vmx.pi_desc.control | + 1 POSTED_INTR_ON; + +/* + * Currently, we don't support urgent interrupt, all + * interrupts are recognized as non-urgent interrupt, + * so we cannot send posted-interrupt when 'SN' is set. + * Besides that, if 'ON' is already set, we cannot set + * posted-interrupts as well. + */ +if ( prev.sn || prev.on ) +{ +vcpu_kick(v); +return; +} would it make more sense to move above check after cmpxchg? 
My original idea is that we only need to do the check when prev.control != old.control, which means the cmpxchg did not complete successfully. If we add the check between cmpxchg and while ( prev.control != old.control ), it seems the logic is not so clear, since we don't need to check prev.sn and prev.on when cmpxchg succeeds in setting the new value.

Thanks,
Feng

Then it'd be clearer if you move the check to the start of the loop, so you can avoid two additional reads when the prev.on/sn is set. :-)

Good idea!

Thanks,
Feng

Thanks
Kevin
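As an illustration of the loop shape being discussed (test ON/SN on the value last observed, then try the compare-and-exchange), here is a sketch using C11 atomics on a plain 32-bit word. It is not the Xen code: the real descriptor is struct pi_desc updated with Xen's cmpxchg(), and the bit positions, pi_try_set_on() and demo_post() below are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions for this sketch only. */
#define PI_ON (UINT32_C(1) << 0)   /* outstanding notification */
#define PI_SN (UINT32_C(1) << 1)   /* suppress notification */

/*
 * Try to set ON. Returns true if this caller set it and should send the
 * notification, false if ON or SN was already set in the observed value
 * (the patch under review falls back to vcpu_kick() in that case).
 */
static bool pi_try_set_on(_Atomic uint32_t *control)
{
    uint32_t old = atomic_load(control);

    do {
        /* Check first, on the value we last saw, as suggested in review. */
        if ( old & (PI_ON | PI_SN) )
            return false;
        /* On failure the exchange reloads 'old' and the check runs again. */
    } while ( !atomic_compare_exchange_weak(control, &old, old | PI_ON) );

    return true;
}

/* Returns (notify << 8) | final control word for a given starting value. */
static unsigned int demo_post(uint32_t initial)
{
    _Atomic uint32_t control;
    bool notify;

    atomic_init(&control, initial);
    notify = pi_try_set_on(&control);
    return (notify ? 0x100u : 0u) | atomic_load(&control);
}
```

Placing the test at the top of the loop means a descriptor that already has ON or SN set is never written to, and a failed exchange simply re-runs the same test on the refreshed value.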
Re: [Xen-devel] [PATCH] x86: correct socket_cpumask allocation for AP
On 08.07.15 at 11:36, chao.p.p...@linux.intel.com wrote:
> @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
>  static void smp_store_cpu_info(int id)
>  {
>      struct cpuinfo_x86 *c = cpu_data + id;
> +    unsigned int socket;
>
>      *c = boot_cpu_data;
>      if ( id != 0 )
> +    {
>          identify_cpu(c);
> +        socket = cpu_to_socket(id);
> +        if ( !socket_cpumask[socket] )
> +        {
> +            socket_cpumask[socket] = secondary_socket_cpumask;
> +            secondary_socket_cpumask = NULL;

I don't think this will build with small enough NR_CPUS. Which raises the question whether the use of cpumask_var_t is suitable here in the first place.

Jan
Re: [Xen-devel] [PATCH v10 01/13] x86: add socket_cpumask
On 08.07.15 at 04:43, chao.p.p...@linux.intel.com wrote: On Tue, Jul 07, 2015 at 06:32:55PM -0400, Boris Ostrovsky wrote: @@ -245,6 +248,8 @@ static void set_cpu_sibling_map(int cpu) cpumask_set_cpu(cpu, cpu_sibling_setup_map); +cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]); This patch crashes Xen on my 32-cpu Intel box here for cpu 16, which is the first CPU on the second socket (i.e. on socket 1). The reason appears to be that cpu_to_socket(16) is (correctly) 1 here, but ... + if ( c[cpu].x86_num_siblings 1 ) { for_each_cpu ( i, cpu_sibling_setup_map ) @@ -649,7 +654,13 @@ void cpu_exit_clear(unsigned int cpu) static void cpu_smpboot_free(unsigned int cpu) { -unsigned int order; +unsigned int order, socket = cpu_to_socket(cpu); + +if ( cpumask_empty(socket_cpumask[socket]) ) +{ +free_cpumask_var(socket_cpumask[socket]); +socket_cpumask[socket] = NULL; +} free_cpumask_var(per_cpu(cpu_sibling_mask, cpu)); free_cpumask_var(per_cpu(cpu_core_mask, cpu)); @@ -694,6 +705,7 @@ static int cpu_smpboot_alloc(unsigned int cpu) nodeid_t node = cpu_to_node(cpu); struct desc_struct *gdt; unsigned long stub_page; +unsigned int socket = cpu_to_socket(cpu); ... is zero here, meaning that socket_cpumask[1] is NULL. I suspect that phys_proc_id is probably not set at this point but is by the time we get to set_cpu_sibling_map(). I haven't looked any further yet. I might do this tomorrow unless Chao does it before me. Thanks for testing. Boris' report first of all raises the question: Did you test this at all on a multi-socket system? Considering you not having tested the CPU removal case either, I'm starting to wonder how much testing this series has seen overall... I think I have found the reason. For AP, phys_proc_id is set in: start_secondary()=smp_callin()=smp_store_cpu_info()=identify_cpu() which is behind cpu_smpboot_alloc() called from CPU_PREPARE. 
One way would be to move 'zalloc_cpumask_var(socket_cpumask + socket)' to set_cpu_sibling_map() to fix it, if Jan agrees; otherwise another solution needs to be found.

Looks sensible at a first glance, but in order to be able to do proper error handling the allocation needs to remain in cpu_smpboot_alloc(). I.e. you'd add a static variable, pre-allocate a cpumask into it if it's currently NULL, and consume the allocation in set_cpu_sibling_map() (or maybe even better in smp_store_cpu_info() right after the identify_cpu() call) if socket_cpumask[socket] is NULL.

And then you test this on an affected system, and submit asap, so we can preferably avoid reverting the whole series.

Jan
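The pre-allocate-then-consume scheme outlined above can be sketched in standalone C. calloc() stands in for zalloc_cpumask_var(), and cpu_prepare(), cpu_store_info(), spare_mask and demo_bringup() are hypothetical names; the point is only that the fallible allocation stays in the step that can report errors, while the later step, where the socket id is finally known, cannot fail.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_SOCKETS 2

static unsigned long *socket_mask[NR_SOCKETS]; /* per-socket masks, NULL until used */
static unsigned long *spare_mask;              /* pre-allocated, not yet assigned */
static unsigned int allocations;               /* counts successful allocations */

/* Early step: socket id not known yet, but errors can still be reported. */
static int cpu_prepare(void)
{
    if ( !spare_mask )
    {
        spare_mask = calloc(1, sizeof(*spare_mask));
        if ( !spare_mask )
            return -1;
        allocations++;
    }
    return 0;
}

/* Late step: socket id is known; takes the spare only if the slot is empty. */
static void cpu_store_info(unsigned int socket)
{
    if ( !socket_mask[socket] )
    {
        socket_mask[socket] = spare_mask;
        spare_mask = NULL;
    }
}

/* Bring up three CPUs on sockets 0, 0 and 1; returns the allocation count. */
static int demo_bringup(void)
{
    static const unsigned int socket_of_cpu[] = { 0, 0, 1 };
    unsigned int i;

    /* Reset state so the demo gives the same answer on every call. */
    for ( i = 0; i < NR_SOCKETS; i++ )
    {
        free(socket_mask[i]);
        socket_mask[i] = NULL;
    }
    free(spare_mask);
    spare_mask = NULL;
    allocations = 0;

    for ( i = 0; i < 3; i++ )
    {
        if ( cpu_prepare() )
            return -1;
        cpu_store_info(socket_of_cpu[i]);
    }

    return (int)allocations;
}
```

The second CPU on socket 0 leaves the spare untouched, so the third CPU reuses it: three CPUs across two sockets cost two allocations, and at most one unused mask is held at any time.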
Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d - ffff82d080239d85 and other dom0 induced log messages
On 08.07.15 at 10:45, li...@eikelenboom.it wrote:
> Here we go:
>
> (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 -> 82d080239d85
> (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 -> 82d080239d85
>
> which leads to:
>
> # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080195583
> /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758
>
> # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080239d85
> ??:?
>
> Where /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to:
>
>     case MSR_EFER:
>  rdmsr_normal:
>         /* Everyone can read the MSR space. */
>         /* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
>                     _p(regs->ecx));*/
> HERE --> if ( rdmsr_safe(regs->ecx, val) )

Right, so as Andrew suspected - we won't know whether that's legitimate/reasonable without knowing the MSR being accessed.

Jan
[Xen-devel] [PATCH] x86: correct socket_cpumask allocation for AP
For AP, phys_proc_id is still not valid in CPU_PREPARE notifier (cpu_smpboot_alloc), so cpu_to_socket(cpu) is not valid as well. Introduce a pre-allocated secondary_cpu_mask so that later in smp_store_cpu_info() socket_cpumask[socket] can consume it. Signed-off-by: Chao Peng chao.p.p...@linux.intel.com --- This is targeted for staging branch. I tested on a 2-sockets machine and looks fine. --- xen/arch/x86/smpboot.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c index c73aa1b..49b8497 100644 --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -62,6 +62,7 @@ EXPORT_SYMBOL(cpu_online_map); unsigned int __read_mostly nr_sockets; cpumask_var_t *__read_mostly socket_cpumask; +static cpumask_var_t secondary_socket_cpumask; struct cpuinfo_x86 cpu_data[NR_CPUS]; @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS]; static void smp_store_cpu_info(int id) { struct cpuinfo_x86 *c = cpu_data + id; +unsigned int socket; *c = boot_cpu_data; if ( id != 0 ) +{ identify_cpu(c); +socket = cpu_to_socket(id); +if ( !socket_cpumask[socket] ) +{ +socket_cpumask[socket] = secondary_socket_cpumask; +secondary_socket_cpumask = NULL; +} +} + /* * Certain Athlons might work (for various values of 'work') in SMP * but they are not certified as MP capable. 
@@ -705,7 +716,6 @@ static int cpu_smpboot_alloc(unsigned int cpu) nodeid_t node = cpu_to_node(cpu); struct desc_struct *gdt; unsigned long stub_page; -unsigned int socket = cpu_to_socket(cpu); if ( node != NUMA_NO_NODE ) memflags = MEMF_node(node); @@ -748,8 +758,8 @@ static int cpu_smpboot_alloc(unsigned int cpu) goto oom; per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu); -if ( !socket_cpumask[socket] - !zalloc_cpumask_var(socket_cpumask + socket) ) +if ( !secondary_socket_cpumask + !zalloc_cpumask_var(secondary_socket_cpumask) ) goto oom; if ( zalloc_cpumask_var(per_cpu(cpu_sibling_mask, cpu)) -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V4 0/3] Vm_event memory introspection helpers
Version 4 of the series addresses V3 reviews, and consists of:

[PATCH 1/3] xen/mem_access: Support for memory-content hiding
[PATCH 2/3] xen/vm_event: Support for guest-requested events
[PATCH 3/3] xen/vm_event: Deny register writes if refused by vm_event reply

All the patches in this version have been acked by at least one person. For [PATCH 3/3], Tamas has suggested that I move the DENY logic from p2m.c to dedicated files, which I've done here. Since this is simply a trivial move without any modifications to the logic itself, I've kept both acks received for the patch; George's ack should in any case not be an issue, as it only concerned the mm parts which are unchanged, but if I shouldn't have kept Jan's ack then please disregard it.

This version of the series assumes the patch "vm_event: Rename MEM_ACCESS_EMULATE and MEM_ACCESS_EMULATE_NOWRITE" that I submitted yesterday. I've not added that patch to this series because I wanted it to be available for Tamas as well, as he's working on a parallel series and I had hoped that this way would be better than him having to wait for this whole series to go in.

Thank you,
Razvan
[Xen-devel] [PATCH V4 3/3] xen/vm_event: Deny register writes if refused by vm_event reply
Deny register writes if a vm_client subscribed to mov_to_msr or control register write events forbids them. Currently supported for MSR, CR0, CR3 and CR4 events. Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com Acked-by: George Dunlap george.dun...@eu.citrix.com Acked-by: Jan Beulich jbeul...@suse.com --- Changes since V3: - Renamed MEM_ACCESS_FLAG_DENY to VM_EVENT_FLAG_DENY (and fixed the bit shift appropriately). - Moved the DENY vm_event response logic from p2m.c to newly added dedicated files for vm_event handling, as suggested by Tamas Lengyel. --- MAINTAINERS |1 + xen/arch/x86/Makefile |1 + xen/arch/x86/domain.c |2 + xen/arch/x86/hvm/emulate.c|8 +-- xen/arch/x86/hvm/event.c |5 +- xen/arch/x86/hvm/hvm.c| 118 - xen/arch/x86/hvm/svm/nestedsvm.c | 14 ++--- xen/arch/x86/hvm/svm/svm.c|2 +- xen/arch/x86/hvm/vmx/vmx.c| 15 +++-- xen/arch/x86/hvm/vmx/vvmx.c | 18 +++--- xen/arch/x86/vm_event.c | 33 +++ xen/common/vm_event.c |9 +++ xen/include/asm-arm/vm_event.h| 12 xen/include/asm-x86/domain.h | 18 +- xen/include/asm-x86/hvm/event.h |9 ++- xen/include/asm-x86/hvm/support.h |9 +-- xen/include/asm-x86/vm_event.h|8 +++ xen/include/public/vm_event.h |6 ++ 18 files changed, 242 insertions(+), 46 deletions(-) create mode 100644 xen/arch/x86/vm_event.c create mode 100644 xen/include/asm-arm/vm_event.h create mode 100644 xen/include/asm-x86/vm_event.h diff --git a/MAINTAINERS b/MAINTAINERS index 6b1068e..59c0822 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -383,6 +383,7 @@ F: xen/common/vm_event.c F: xen/common/mem_access.c F: xen/arch/x86/hvm/event.c F: xen/arch/x86/monitor.c +F: xen/arch/x86/vm_event.c XENTRACE M: George Dunlap george.dun...@eu.citrix.com diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile index 37e547c..5f24951 100644 --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -60,6 +60,7 @@ obj-y += machine_kexec.o obj-y += crash.o obj-y += tboot.o obj-y += hpet.o +obj-y += vm_event.o obj-y += xstate.o obj-$(crash_debug) += gdbstub.o diff 
--git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index a8fe046..c688ab9 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -678,6 +678,8 @@ void arch_domain_destroy(struct domain *d) cleanup_domain_irq_mapping(d); psr_free_rmid(d); + +xfree(d-arch.event_write_data); } void arch_domain_shutdown(struct domain *d) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index c6ccb1f..780adb4 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -1389,14 +1389,14 @@ static int hvmemul_write_cr( switch ( reg ) { case 0: -return hvm_set_cr0(val); +return hvm_set_cr0(val, 1); case 2: current-arch.hvm_vcpu.guest_cr[2] = val; return X86EMUL_OKAY; case 3: -return hvm_set_cr3(val); +return hvm_set_cr3(val, 1); case 4: -return hvm_set_cr4(val); +return hvm_set_cr4(val, 1); default: break; } @@ -1417,7 +1417,7 @@ static int hvmemul_write_msr( uint64_t val, struct x86_emulate_ctxt *ctxt) { -return hvm_msr_write_intercept(reg, val); +return hvm_msr_write_intercept(reg, val, 1); } static int hvmemul_wbinvd( diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c index 17638ea..042e583 100644 --- a/xen/arch/x86/hvm/event.c +++ b/xen/arch/x86/hvm/event.c @@ -90,7 +90,7 @@ static int hvm_event_traps(uint8_t sync, vm_event_request_t *req) return 1; } -void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old) +bool_t hvm_event_cr(unsigned int index, unsigned long value, unsigned long old) { struct arch_domain *currad = current-domain-arch; unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index); @@ -109,7 +109,10 @@ void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old) hvm_event_traps(currad-monitor.write_ctrlreg_sync ctrlreg_bitmask, req); +return 1; } + +return 0; } void hvm_event_msr(unsigned int msr, uint64_t value) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 536d1c8..abfca33 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ 
-52,6 +52,7 @@ #include asm/traps.h #include asm/mc146818rtc.h #include asm/mce.h +#include asm/monitor.h #include asm/hvm/hvm.h #include asm/hvm/vpt.h #include asm/hvm/support.h @@ -468,6 +469,35 @@ void hvm_do_resume(struct vcpu *v) } } +if ( unlikely(d-arch.event_write_data) ) +{ +struct monitor_write_data *w = d-arch.event_write_data[v-vcpu_id]; + +if ( w-do_write.msr ) +{ +
[Xen-devel] [PATCH V4 1/3] xen/mem_access: Support for memory-content hiding
This patch adds support for memory-content hiding, by modifying the value returned by emulated instructions that read certain memory addresses that contain sensitive data. The patch only applies to cases where MEM_ACCESS_EMULATE or MEM_ACCESS_EMULATE_NOWRITE have been set in a vm_event response.

Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
Acked-by: George Dunlap george.dun...@eu.citrix.com

---
Changes since V3:
 - Renamed MEM_ACCESS_SET_EMUL_READ_DATA to VM_EVENT_FLAG_SET_EMUL_READ_DATA and updated its comment.
 - Removed xfree(v->arch.vm_event.emul_read_data) from free_vcpu_struct().
 - Returning X86EMUL_UNHANDLEABLE from hvmemul_cmpxchg() when !curr->arch.vm_event.emul_read_data.
 - Replaced xmalloc_bytes() with xmalloc_array() in hvmemul_rep_outs_set_context().
 - Setting the rest of the buffer to zero in hvmemul_rep_movs() (no longer leaking heap contents).
 - No longer memset()ing the whole buffer before copy (just zeroing out the rest).
 - Moved hvmemul_ctxt->set_context = 0 to hvm_emulate_prepare() and removed hvm_emulate_one_set_context().
--- tools/tests/xen-access/xen-access.c |2 +- xen/arch/x86/hvm/emulate.c | 138 ++- xen/arch/x86/hvm/event.c| 50 ++--- xen/arch/x86/mm/p2m.c | 92 +-- xen/common/domain.c |2 + xen/common/vm_event.c | 23 ++ xen/include/asm-x86/domain.h|2 + xen/include/asm-x86/hvm/emulate.h | 10 ++- xen/include/public/vm_event.h | 31 ++-- 9 files changed, 274 insertions(+), 76 deletions(-) diff --git a/tools/tests/xen-access/xen-access.c b/tools/tests/xen-access/xen-access.c index 12ab921..e6ca9ba 100644 --- a/tools/tests/xen-access/xen-access.c +++ b/tools/tests/xen-access/xen-access.c @@ -530,7 +530,7 @@ int main(int argc, char *argv[]) break; case VM_EVENT_REASON_SOFTWARE_BREAKPOINT: printf(Breakpoint: rip=%016PRIx64, gfn=%PRIx64 (vcpu %d)\n, - req.regs.x86.rip, + req.data.regs.x86.rip, req.u.software_breakpoint.gfn, req.vcpu_id); diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index fe5661d..c6ccb1f 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -653,6 +653,31 @@ static int hvmemul_read( unsigned int bytes, struct x86_emulate_ctxt *ctxt) { +struct hvm_emulate_ctxt *hvmemul_ctxt = +container_of(ctxt, struct hvm_emulate_ctxt, ctxt); + +if ( unlikely(hvmemul_ctxt-set_context) ) +{ +struct vcpu *curr = current; +unsigned int safe_bytes; + +if ( !curr-arch.vm_event.emul_read_data ) +return X86EMUL_UNHANDLEABLE; + +safe_bytes = min_t(unsigned int, + bytes, curr-arch.vm_event.emul_read_data-size); + +if ( safe_bytes ) +{ +memcpy(p_data, curr-arch.vm_event.emul_read_data-data, safe_bytes); + +if ( bytes safe_bytes ) +memset(p_data + safe_bytes, 0, bytes - safe_bytes); +} + +return X86EMUL_OKAY; +} + return __hvmemul_read( seg, offset, p_data, bytes, hvm_access_read, container_of(ctxt, struct hvm_emulate_ctxt, ctxt)); @@ -893,6 +918,28 @@ static int hvmemul_cmpxchg( unsigned int bytes, struct x86_emulate_ctxt *ctxt) { +struct hvm_emulate_ctxt *hvmemul_ctxt = +container_of(ctxt, struct hvm_emulate_ctxt, ctxt); + +if ( 
unlikely(hvmemul_ctxt-set_context) ) +{ +struct vcpu *curr = current; + +if ( curr-arch.vm_event.emul_read_data ) +{ +unsigned int safe_bytes = min_t(unsigned int, bytes, +curr-arch.vm_event.emul_read_data-size); + +memcpy(p_new, curr-arch.vm_event.emul_read_data-data, + safe_bytes); + +if ( bytes safe_bytes ) +memset(p_new + safe_bytes, 0, bytes - safe_bytes); +} +else +return X86EMUL_UNHANDLEABLE; +} + /* Fix this in case the guest is really relying on r-m-w atomicity. */ return hvmemul_write(seg, offset, p_new, bytes, ctxt); } @@ -935,6 +982,43 @@ static int hvmemul_rep_ins( !!(ctxt-regs-eflags X86_EFLAGS_DF), gpa); } +static int hvmemul_rep_outs_set_context( +enum x86_segment src_seg, +unsigned long src_offset, +uint16_t dst_port, +unsigned int bytes_per_rep, +unsigned long *reps, +struct x86_emulate_ctxt *ctxt) +{ +unsigned int bytes = *reps * bytes_per_rep; +struct vcpu *curr = current; +unsigned int safe_bytes; +char *buf = NULL; +int rc; + +if ( !curr-arch.vm_event.emul_read_data ) +return X86EMUL_UNHANDLEABLE; + +buf = xmalloc_array(char, bytes); + +if ( buf == NULL ) +return
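The copy-then-zero-pad pattern that hvmemul_read() and hvmemul_cmpxchg() use above can be sketched in isolation. This is only an illustration under stated assumptions: struct emul_read_data and fill_from_read_data() are hypothetical stand-ins, not names from the patch, and the -1 return stands in for X86EMUL_UNHANDLEABLE.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-vCPU vm_event.emul_read_data buffer. */
struct emul_read_data {
    unsigned int size;       /* number of valid bytes in data[] */
    unsigned char data[64];  /* replacement contents supplied by the monitor */
};

/*
 * Fill dst with `bytes` bytes: as many as the replacement buffer provides,
 * then zeros, so that no stale memory reaches the emulated instruction.
 * Returns 0 on success, or -1 when no replacement buffer exists (the patch
 * returns X86EMUL_UNHANDLEABLE in that case).
 */
static int fill_from_read_data(void *dst, unsigned int bytes,
                               const struct emul_read_data *rd)
{
    unsigned int avail, safe_bytes;

    if ( rd == NULL )
        return -1;

    /* Never trust size beyond the backing array (assumption of this sketch). */
    avail = rd->size < sizeof(rd->data) ? rd->size
                                        : (unsigned int)sizeof(rd->data);
    safe_bytes = bytes < avail ? bytes : avail;

    memcpy(dst, rd->data, safe_bytes);

    /* Zero only the tail, rather than memset()ing the whole buffer first. */
    if ( bytes > safe_bytes )
        memset((unsigned char *)dst + safe_bytes, 0, bytes - safe_bytes);

    return 0;
}
```

Zeroing only the remainder matches the V4 changelog entry about no longer clearing the whole buffer before the copy.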
[Xen-devel] [PATCH V4 2/3] xen/vm_event: Support for guest-requested events
Added support for a new class of vm_events: VM_EVENT_REASON_REQUEST, sent via HVMOP_request_vm_event. The guest can request that a generic vm_event (containing only the vm_event-filled guest registers as information) be sent to userspace by setting up the correct registers and doing a VMCALL. For example, for a 32-bit guest, this means: EAX = 34 (hvmop), EBX = 24 (HVMOP_guest_request_vm_event), ECX = 0 (NULL required for the hypercall parameter, reserved). Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com Acked-by: Tamas K Lengyel tleng...@novetta.com Acked-by: Wei Liu wei.l...@citrix.com Acked-by: Jan Beulich jbeul...@suse.com --- Changes since V3: - None, just addded acks. --- tools/libxc/include/xenctrl.h |2 ++ tools/libxc/xc_monitor.c| 15 +++ xen/arch/x86/hvm/event.c| 16 xen/arch/x86/hvm/hvm.c |8 +++- xen/arch/x86/monitor.c | 16 xen/include/asm-x86/domain.h| 16 +--- xen/include/asm-x86/hvm/event.h |1 + xen/include/public/domctl.h |6 ++ xen/include/public/hvm/hvm_op.h |2 ++ xen/include/public/vm_event.h |2 ++ 10 files changed, 76 insertions(+), 8 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index d1d2ab3..4ce519a 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2384,6 +2384,8 @@ int xc_monitor_mov_to_msr(xc_interface *xch, domid_t domain_id, bool enable, int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id, bool enable); int xc_monitor_software_breakpoint(xc_interface *xch, domid_t domain_id, bool enable); +int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id, + bool enable, bool sync); /*** * Memory sharing operations. 
diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c index 63013de..d979122 100644 --- a/tools/libxc/xc_monitor.c +++ b/tools/libxc/xc_monitor.c @@ -105,3 +105,18 @@ int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id, return do_domctl(xch, domctl); } + +int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id, bool enable, + bool sync) +{ +DECLARE_DOMCTL; + +domctl.cmd = XEN_DOMCTL_monitor_op; +domctl.domain = domain_id; +domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE +: XEN_DOMCTL_MONITOR_OP_DISABLE; +domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST; +domctl.u.monitor_op.u.guest_request.sync = sync; + +return do_domctl(xch, domctl); +} diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c index 5341937..17638ea 100644 --- a/xen/arch/x86/hvm/event.c +++ b/xen/arch/x86/hvm/event.c @@ -126,6 +126,22 @@ void hvm_event_msr(unsigned int msr, uint64_t value) hvm_event_traps(1, req); } +void hvm_event_guest_request(void) +{ +struct vcpu *curr = current; +struct arch_domain *currad = curr-domain-arch; + +if ( currad-monitor.guest_request_enabled ) +{ +vm_event_request_t req = { +.reason = VM_EVENT_REASON_GUEST_REQUEST, +.vcpu_id = curr-vcpu_id, +}; + +hvm_event_traps(currad-monitor.guest_request_sync, req); +} +} + int hvm_event_int3(unsigned long gla) { int rc = 0; diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 535d622..536d1c8 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5974,7 +5974,6 @@ static int hvmop_get_param( #define HVMOP_op_mask 0xff long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) - { unsigned long start_iter, mask; long rc = 0; @@ -6388,6 +6387,13 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) break; } +case HVMOP_guest_request_vm_event: +if ( guest_handle_is_null(arg) ) +hvm_event_guest_request(); +else +rc = -EINVAL; +break; + default: { gdprintk(XENLOG_DEBUG, Bad HVM op %ld.\n, op); diff --git 
a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c index 896acf7..f8df7d2 100644 --- a/xen/arch/x86/monitor.c +++ b/xen/arch/x86/monitor.c @@ -161,6 +161,22 @@ int monitor_domctl(struct domain *d, struct xen_domctl_monitor_op *mop) break; } +case XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST: +{ +bool_t status = ad-monitor.guest_request_enabled; + +rc = status_check(mop, status); +if ( rc ) +return rc; + +ad-monitor.guest_request_sync = mop-u.guest_request.sync; + +domain_pause(d); +ad-monitor.guest_request_enabled = !status; +domain_unpause(d); +break; +} + default: return -EOPNOTSUPP; diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index 7908844..f712caa 100644 ---
Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes
From: Wu, Feng Sent: Wednesday, June 24, 2015 1:18 PM When guest changes its interrupt configuration (such as, vector, etc.) for direct-assigned devices, we need to update the associated IRTE with the new guest vector, so external interrupts from the assigned devices can be injected to guests without VM-Exit. For lowest-priority interrupts, we use vector-hashing mechamisn to find the destination vCPU. This follows the hardware behavior, since modern Intel CPUs use vector hashing to handle the lowest-priority interrupt. For multicast/broadcast vCPU, we cannot handle it via interrupt posting, still use interrupt remapping. Signed-off-by: Feng Wu feng...@intel.com --- v3: - Use bitmap to store the all the possible destination vCPUs of an interrupt, then trying to find the right destination from the bitmap - Typo and some small changes xen/drivers/passthrough/io.c | 96 +++- 1 file changed, 95 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index 9b77334..18e24e1 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -26,6 +26,7 @@ #include asm/hvm/iommu.h #include asm/hvm/support.h #include xen/hvm/irq.h +#include asm/io_apic.h static DEFINE_PER_CPU(struct list_head, dpci_list); @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci) xfree(dpci); } +/* + * The purpose of this routine is to find the right destination vCPU for + * an interrupt which will be delivered by VT-d posted-interrupt. There + * are several cases as below: If you aim to have this interface common to more usages, don't restrict to VT-d posted-interrupt which should be just an example. + * + * - For lowest-priority interrupts, we find the destination vCPU from the + * guest vector using vector-hashing mechanism and return true. This follows + * the hardware behavior, since modern Intel CPUs use vector hashing to + * handle the lowest-priority interrupt. Does AMD use same hashing mechanism? 
Can this interface be reused by other IOMMU type or it's an Intel specific implementation? + * - Otherwise, for single destination interrupt, it is straightforward to + * find the destination vCPU and return true. + * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting, + * so return false. + * + * Here is the details about the vector-hashing mechanism: + * 1. For lowest-priority interrupts, store all the possible destination + * vCPUs in an array. + * 2. Use gvec % max number of destination vCPUs to find the right + * destination vCPU in the array for the lowest-priority interrupt. + */ +static struct vcpu *pi_find_dest_vcpu(struct domain *d, uint8_t dest_id, + uint8_t dest_mode, uint8_t delivery_mode, + uint8_t gvec) +{ +unsigned long *dest_vcpu_bitmap = NULL; +unsigned int dest_vcpu_num = 0, idx = 0; +int size = (d-max_vcpus + BITS_PER_LONG - 1) / BITS_PER_LONG; +struct vcpu *v, *dest = NULL; +int i; + +dest_vcpu_bitmap = xzalloc_array(unsigned long, size); +if ( !dest_vcpu_bitmap ) +{ +dprintk(XENLOG_G_INFO, +dom%d: failed to allocate memory\n, d-domain_id); +return NULL; +} + +for_each_vcpu ( d, v ) +{ +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0, +dest_id, dest_mode) ) +continue; + +__set_bit(v-vcpu_id, dest_vcpu_bitmap); +dest_vcpu_num++; +} + +if ( delivery_mode == dest_LowestPrio ) +{ +if ( dest_vcpu_num != 0 ) +{ Having 'idx=0' here is more readable than initializing it earlier. +for ( i = 0; i = gvec % dest_vcpu_num; i++) +idx = find_next_bit(dest_vcpu_bitmap, d-max_vcpus, idx) + 1; +idx--; + +BUG_ON(idx = d-max_vcpus || idx 0); idx is unsigned int. can't 0 +dest = d-vcpu[idx]; +} +} +else if ( dest_vcpu_num == 1 ) a comment would be applausive to explain the condition means fixed destination, while multicast/broadcast will have num as ZERO. 
+{ +idx = find_first_bit(dest_vcpu_bitmap, d-max_vcpus); +BUG_ON(idx = d-max_vcpus || idx 0); +dest = d-vcpu[idx]; +} + +xfree(dest_vcpu_bitmap); + +return dest; +} + int pt_irq_create_bind( struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind) { @@ -257,7 +330,7 @@ int pt_irq_create_bind( { case PT_IRQ_TYPE_MSI: { -uint8_t dest, dest_mode; +uint8_t dest, dest_mode, delivery_mode; int dest_vcpu_id; if ( !(pirq_dpci-flags HVM_IRQ_DPCI_MAPPED) ) @@ -330,11 +403,32 @@ int pt_irq_create_bind( /* Calculate dest_vcpu_id for MSI-type pirq
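The vector-hashing selection described in the comment block above (pick entry gvec % N out of the N candidate vCPUs) can be sketched without Xen's bitmap helpers. This is a minimal stand-alone sketch: pick_dest() and test_bit_ul() are hypothetical names, and the bitmap layout (one bit per vCPU id in an array of unsigned long) is an assumption mirroring dest_vcpu_bitmap.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Return 1 if `bit` is set in the bitmap, 0 otherwise. */
static int test_bit_ul(const unsigned long *bitmap, unsigned int bit)
{
    return (int)((bitmap[bit / BITS_PER_ULONG] >> (bit % BITS_PER_ULONG)) & 1UL);
}

/*
 * Return the index of the (gvec % count)-th set bit (0-based) among the
 * first nbits bits, where count is the number of set bits, or -1 if no
 * bit is set at all.
 */
static int pick_dest(const unsigned long *bitmap, unsigned int nbits,
                     unsigned char gvec)
{
    unsigned int count = 0, target, seen = 0, i;

    for ( i = 0; i < nbits; i++ )
        count += (unsigned int)test_bit_ul(bitmap, i);

    if ( count == 0 )
        return -1;

    target = gvec % count;

    for ( i = 0; i < nbits; i++ )
    {
        if ( !test_bit_ul(bitmap, i) )
            continue;
        if ( seen++ == target )
            return (int)i;
    }

    return -1; /* not reached when count != 0 */
}
```

With candidates {1, 3, 4} and gvec 65, count is 3, 65 % 3 is 2, and the third candidate (vCPU 4) is chosen. The find_next_bit() loop in the patch walks the same sequence of set bits; the sketch trades that helper for a plain scan to stay self-contained.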
Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes
-Original Message- From: Tian, Kevin Sent: Wednesday, July 08, 2015 6:23 PM To: Wu, Feng; xen-devel@lists.xen.org Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z; george.dun...@eu.citrix.com Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config changes From: Wu, Feng Sent: Wednesday, June 24, 2015 1:18 PM When guest changes its interrupt configuration (such as, vector, etc.) for direct-assigned devices, we need to update the associated IRTE with the new guest vector, so external interrupts from the assigned devices can be injected to guests without VM-Exit. For lowest-priority interrupts, we use vector-hashing mechamisn to find the destination vCPU. This follows the hardware behavior, since modern Intel CPUs use vector hashing to handle the lowest-priority interrupt. For multicast/broadcast vCPU, we cannot handle it via interrupt posting, still use interrupt remapping. Signed-off-by: Feng Wu feng...@intel.com --- v3: - Use bitmap to store the all the possible destination vCPUs of an interrupt, then trying to find the right destination from the bitmap - Typo and some small changes xen/drivers/passthrough/io.c | 96 +++- 1 file changed, 95 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index 9b77334..18e24e1 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -26,6 +26,7 @@ #include asm/hvm/iommu.h #include asm/hvm/support.h #include xen/hvm/irq.h +#include asm/io_apic.h static DEFINE_PER_CPU(struct list_head, dpci_list); @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci) xfree(dpci); } +/* + * The purpose of this routine is to find the right destination vCPU for + * an interrupt which will be delivered by VT-d posted-interrupt. There + * are several cases as below: If you aim to have this interface common to more usages, don't restrict to VT-d posted-interrupt which should be just an example. 
Yes, making this a common interface should be better. + * + * - For lowest-priority interrupts, we find the destination vCPU from the + * guest vector using vector-hashing mechanism and return true. This follows + * the hardware behavior, since modern Intel CPUs use vector hashing to + * handle the lowest-priority interrupt. Does AMD use same hashing mechanism? Can this interface be reused by other IOMMU type or it's an Intel specific implementation? I am not sure how AMD handle lowest-priority. Intel hardware guys told me recent Intel hardware platform use this method to deliver lowest-priority interrupts. What do you mean by other IOMMU type? Thanks, Feng + * - Otherwise, for single destination interrupt, it is straightforward to + * find the destination vCPU and return true. + * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting, + * so return false. + * + * Here is the details about the vector-hashing mechanism: + * 1. For lowest-priority interrupts, store all the possible destination + * vCPUs in an array. + * 2. Use gvec % max number of destination vCPUs to find the right + * destination vCPU in the array for the lowest-priority interrupt. 
+ */ +static struct vcpu *pi_find_dest_vcpu(struct domain *d, uint8_t dest_id, + uint8_t dest_mode, uint8_t delivery_mode, + uint8_t gvec) +{ +unsigned long *dest_vcpu_bitmap = NULL; +unsigned int dest_vcpu_num = 0, idx = 0; +int size = (d-max_vcpus + BITS_PER_LONG - 1) / BITS_PER_LONG; +struct vcpu *v, *dest = NULL; +int i; + +dest_vcpu_bitmap = xzalloc_array(unsigned long, size); +if ( !dest_vcpu_bitmap ) +{ +dprintk(XENLOG_G_INFO, +dom%d: failed to allocate memory\n, d-domain_id); +return NULL; +} + +for_each_vcpu ( d, v ) +{ +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0, +dest_id, dest_mode) ) +continue; + +__set_bit(v-vcpu_id, dest_vcpu_bitmap); +dest_vcpu_num++; +} + +if ( delivery_mode == dest_LowestPrio ) +{ +if ( dest_vcpu_num != 0 ) +{ Having 'idx=0' here is more readable than initializing it earlier. +for ( i = 0; i = gvec % dest_vcpu_num; i++) +idx = find_next_bit(dest_vcpu_bitmap, d-max_vcpus, idx) + 1; +idx--; + +BUG_ON(idx = d-max_vcpus || idx 0); idx is unsigned int. can't 0 +dest = d-vcpu[idx]; +} +} +else if ( dest_vcpu_num == 1 ) a comment would be applausive to explain the condition means fixed destination, while
[Xen-devel] [PATCH 4.0 01/55] config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected
4.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>

commit a6dfa128ce5c414ab46b1d690f7a1b8decb8526d upstream.

A huge number of NIC drivers use the DMA API; however, if compiled
under 32-bit, a very important part of the DMA API can be omitted,
leading to the drivers not working at all (especially if used with
'swiotlb=force iommu=soft').

As Prashant Sreedharan explains it: "the driver [tg3] uses
DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of the
dma mapping and dma_unmap_addr() to get the mapping value. On most
of the platforms this is a no-op, but ... with iommu=soft and
swiotlb=force this house keeping is required, ... otherwise we pass 0
while calling pci_unmap_/pci_dma_sync_ instead of the DMA address."

As such, enable this even when using 32-bit kernels.

Reported-by: Ian Jackson <ian.jack...@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Acked-by: David S. Miller <da...@davemloft.net>
Acked-by: Prashant Sreedharan <prash...@broadcom.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Michael Chan <mc...@broadcom.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: boris.ostrov...@oracle.com
Cc: casca...@linux.vnet.ibm.com
Cc: david.vra...@citrix.com
Cc: sanje...@broadcom.com
Cc: siva.kal...@broadcom.com
Cc: vyasev...@gmail.com
Cc: xen-de...@lists.xensource.com
Link: http://lkml.kernel.org/r/20150417190448.ga9...@l.oracle.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Cc: Ben Hutchings <b...@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
 arch/x86/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -177,7 +177,7 @@ config SBUS

 config NEED_DMA_MAP_STATE
 	def_bool y
-	depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG
+	depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG || SWIOTLB

 config NEED_SG_DMA_LENGTH
 	def_bool y

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
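Why a missing NEED_DMA_MAP_STATE silently breaks a driver can be seen from a user-space mock of the unmap-state macros. The sketch below is modelled on the kernel's dma-mapping header but is not kernel code: the NEED_DMA_MAP_STATE switch stands in for CONFIG_NEED_DMA_MAP_STATE, and struct ring_entry and remembered_mapping() are hypothetical names used only for illustration.

```c
#include <stdint.h>

typedef uint64_t dma_addr_t;   /* stand-in for the kernel type */

#ifndef NEED_DMA_MAP_STATE
#define NEED_DMA_MAP_STATE 1   /* stand-in for CONFIG_NEED_DMA_MAP_STATE */
#endif

#if NEED_DMA_MAP_STATE
/* State is kept: the field exists and the accessors read and write it. */
#define DEFINE_DMA_UNMAP_ADDR(name)        dma_addr_t name
#define dma_unmap_addr(ptr, name)          ((ptr)->name)
#define dma_unmap_addr_set(ptr, name, val) (((ptr)->name) = (val))
#else
/* State is dropped: no field, the setter is a no-op, the getter yields 0. */
#define DEFINE_DMA_UNMAP_ADDR(name)
#define dma_unmap_addr(ptr, name)          (0)
#define dma_unmap_addr_set(ptr, name, val) do { } while (0)
#endif

/* Hypothetical per-descriptor bookkeeping, as a NIC driver might keep it. */
struct ring_entry {
    void *skb;
    DEFINE_DMA_UNMAP_ADDR(mapping);
};

/* Store a DMA address and read it back the way an unmap path would. */
static dma_addr_t remembered_mapping(dma_addr_t addr)
{
    struct ring_entry e = { 0 };

    dma_unmap_addr_set(&e, mapping, addr);
    return dma_unmap_addr(&e, mapping);
}
```

Built with -DNEED_DMA_MAP_STATE=0, remembered_mapping() returns 0 for any input, which is the "we pass 0 ... instead of the DMA address" failure quoted in the commit message. With swiotlb forced, the unmap path needs the real address to find its bounce buffer, hence the added SWIOTLB dependency.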
Re: [Xen-devel] [v3 05/15] vt-d: VT-d Posted-Interrupts feature detection
From: Wu, Feng Sent: Wednesday, June 24, 2015 1:18 PM VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode. This patch adds feature detection logic for VT-d posted-interrupt. Signed-off-by: Feng Wu feng...@intel.com --- v3: - Remove the if no intremap then no intpost logic in intel_vtd_setup(), it is covered in the iommu_setup(). - Add if no intremap then no intpost logic in the end of init_vtd_hw() which is called by vtd_resume(). So the logic exists in the following three places: - parse_iommu_param() - iommu_setup() - init_vtd_hw() xen/drivers/passthrough/vtd/iommu.c | 18 -- xen/drivers/passthrough/vtd/iommu.h | 1 + 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 9053a1f..4221185 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -2071,6 +2071,9 @@ static int init_vtd_hw(void) disable_intremap(drhd-iommu); } +if ( !iommu_intremap ) +iommu_intpost = 0; + /* * Set root entries for each VT-d engine. After set root entry, * must globally invalidate context cache, and then globally @@ -2133,8 +2136,8 @@ int __init intel_vtd_setup(void) } /* We enable the following features only if they are supported by all VT-d - * engines: Snoop Control, DMA passthrough, Queued Invalidation and - * Interrupt Remapping. + * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt + * Remapping, and Posted Interrupt */ for_each_drhd_unit ( drhd ) { @@ -2162,6 +2165,15 @@ int __init intel_vtd_setup(void) if ( iommu_intremap !ecap_intr_remap(iommu-ecap) ) iommu_intremap = 0; +/* + * We cannot use posted interrupt if X86_FEATURE_CX16 is + * not supported, since we count on this feature to + * atomically update 16-byte IRTE in posted format. 
+ */
+if ( !iommu_intremap &&
+ (!cap_intr_post(iommu->cap) || !cpu_has_cx16) )
+iommu_intpost = 0;
+

Looks like a typo here: "&&" should be "||".

Thanks
Kevin
Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d - ffff82d080239d85 and other dom0 induced log messages
Monday, July 6, 2015, 11:33:09 AM, you wrote: On 26.06.15 at 17:57, li...@eikelenboom.it wrote: On 2015-06-26 17:51, Jan Beulich wrote: On 26.06.15 at 17:41, li...@eikelenboom.it wrote: from 3.16 to 3.19 we gained a lot of these, if i remember correctly related to perf being enabled in the kernel: + traps.c:2655:d0v0 Domain attempted WRMSR c081 from 0xe023e008 to 0x00230010. + traps.c:2655:d0v0 Domain attempted WRMSR c082 from 0x82d0b000 to 0x81bc2670. + traps.c:2655:d0v0 Domain attempted WRMSR c083 from 0x82d0b020 to 0x81bc4630. These are the SYSCALL (STAR) MSRs, which the kernel has no business touching when running on Xen. from 3.19 to 4.0 we gained: + d0 attempted to change d0v0's CR4 flags 0660 - 0760 + d0 attempted to change d0v1's CR4 flags 0660 - 0760 + d0 attempted to change d0v2's CR4 flags 0660 - 0760 + d0 attempted to change d0v3's CR4 flags 0660 - 0760 + d0 attempted to change d0v4's CR4 flags 0660 - 0760 + d0 attempted to change d0v5's CR4 flags 0660 - 0760 This is X86_CR4_PCE - not sure how to properly handle that. Andrew, you're fiddling with the CR4 handling right now anyway - any thoughts? and from 4.0 to 4.1 we gained the ones you were interested in: + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 For these to be meaningful you need to translate them to symbolic addresses. (And yes, we should see to make the code print them in a more useful manner.) How ? addr2line against xen-syms (or xen.efi if you use that one). And of course the result may need manual adjustment to account for eventual patches you have in your tree. Jan Ah yeah .. silly me .. somehow i had in mind it would be kernel addresses instead of xen, so running it against vmlinux of course lead no where. 
Here we go: (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 - 82d080239d85 (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 - 82d080239d85 which leads to: # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080195583 /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080239d85 ??:? Were /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to: case MSR_EFER: rdmsr_normal: /* Everyone can read the MSR space. */ /* gdprintk(XENLOG_WARNING,Domain attempted RDMSR %p.\n, _p(regs-ecx));*/ HERE --if ( rdmsr_safe(regs-ecx, val) ) goto fail; rdmsr_writeback: regs-eax = (uint32_t)val; regs-edx = (uint32_t)(val 32); break; } break; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, July 08, 2015 4:44 PM To: Wu, Feng Cc: Andrew Cooper; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: RE: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64 On 08.07.15 at 10:33, feng...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, July 08, 2015 4:13 PM On 08.07.15 at 09:06, feng...@intel.com wrote: From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper Sent: Thursday, June 25, 2015 2:35 AM On 24/06/15 06:18, Feng Wu wrote: +{ +uint128_t prev; + +ASSERT(cpu_has_cx16); Given that if this assertion were to fail, cmpxchg16b would fail with #UD, I would hand-code a asm_fixup section which in turn panics. This avoids a situation where non-debug builds could die with an unqualified #UD exception. Is there an existing way to panic the hypervisor in assembler code, I don't find it, it would be appreciated if you can point it out. I'm not convinced such a #UD would be a significant problem: Looking at the disassembly will show the cause right away. The out of line ud2-s in some of VMX'es inline assembly wrappers are far worse. So, do you agree with the fixup section or not? I'd rather not go that route, unless Andrew or your manage to convince me otherwise. I think Andrew's enforce really means ASSERT() or BUG_ON(), again to avoid an unqualified exception. However - see above. Plus, all that said, without having seen the actual use sites of cmpxchg16b yet, I'm not at all convinced we really need this patch. After introducing posted format in IRTE, some fields exist in both the High 64 bit and the low 64 bit,such as pda_h and pda_l, how to make sure it is atomic when updating the pda field? Is there a need for updating these _after_ initially setting up an entry? 
Each time the guest sets the affinity, we need to change this field to refer to the new destination.

Thanks,
Feng

Jan
Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64
On 08/07/2015 09:12, Jan Beulich wrote:
>>>> +{
>>>> +    uint128_t prev;
>>>> +
>>>> +    ASSERT(cpu_has_cx16);
>>> Given that if this assertion were to fail, cmpxchg16b would fail with
>>> #UD, I would hand-code an asm_fixup section which in turn panics.
>>> This avoids a situation where non-debug builds could die with an
>>> unqualified #UD exception.
>> Is there an existing way to panic the hypervisor in assembler code,
>> I don't find it, it would be appreciated if you can point it out.

When I asked for this, I was thinking of having an assertion frame with
the cmpxchg16b instruction in the place of the regular ud2a. This way,
if it were to fail with #UD, there is a more useful error message.

However, there is no easy way of doing this at the moment, and it is an
obscure set of circumstances, so probably not worth the hassle.

> I'm not convinced such a #UD would be a significant problem: Looking
> at the disassembly will show the cause right away. The out of line
> ud2-s in some of VMX'es inline assembly wrappers are far worse.

Unqualified #UDs are harder to debug than qualified ones, and I have an
annoying habit of hitting them. In some copious free time, I want to
continue the work started with c/s 0a3e27e and 881d6bf. git grep
suggests there isn't actually too much to fix up in this regard.

~Andrew
Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
>
> This patch adds an API which is used to update the IRTE
> for posted-interrupt when guest changes MSI/MSI-X information.
>
> Signed-off-by: Feng Wu <feng...@intel.com>

Acked-by: Kevin Tian <kevin.t...@intel.com>, with one small comment:

> +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint8_t gvec)
> +{
> +    struct irq_desc *desc;
> +    struct msi_desc *msi_desc;
> +    int remap_index;
> +    int rc = 0;
> +    struct pci_dev *pci_dev;
> +    struct acpi_drhd_unit *drhd;
> +    struct iommu *iommu;
> +    struct ir_ctrl *ir_ctrl;
> +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> +    struct iremap_entry new_ire;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    unsigned long flags;
> +    uint128_t old_ire, ret;
> +
> +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> +    if ( !desc )
> +        return -ENOMEM;

-EINVAL?
Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set
-Original Message- From: Tian, Kevin Sent: Wednesday, July 08, 2015 5:06 PM To: Wu, Feng; xen-devel@lists.xen.org Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z; george.dun...@eu.citrix.com Subject: RE: [v3 08/15] Suppress posting interrupts when 'SN' is set From: Wu, Feng Sent: Wednesday, June 24, 2015 1:18 PM Currently, we don't support urgent interrupt, all interrupts are recognized as non-urgent interrupt, so we cannot send posted-interrupt when 'SN' is set. Signed-off-by: Feng Wu feng...@intel.com --- v3: use cmpxchg to test SN/ON and set ON xen/arch/x86/hvm/vmx/vmx.c | 32 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 0837627..b94ef6a 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1686,6 +1686,8 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector) { +struct pi_desc old, new, prev; + move to 'else if'. if ( pi_test_and_set_pir(vector, v-arch.hvm_vmx.pi_desc) ) return; @@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector) */ pi_set_on(v-arch.hvm_vmx.pi_desc); } -else if ( !pi_test_and_set_on(v-arch.hvm_vmx.pi_desc) ) +else { +prev.control = 0; + +do { +old.control = v-arch.hvm_vmx.pi_desc.control + ~(1 POSTED_INTR_ON | 1 POSTED_INTR_SN); +new.control = v-arch.hvm_vmx.pi_desc.control | + 1 POSTED_INTR_ON; + +/* + * Currently, we don't support urgent interrupt, all + * interrupts are recognized as non-urgent interrupt, + * so we cannot send posted-interrupt when 'SN' is set. + * Besides that, if 'ON' is already set, we cannot set + * posted-interrupts as well. + */ +if ( prev.sn || prev.on ) +{ +vcpu_kick(v); +return; +} would it make more sense to move above check after cmpxchg? My original idea is that, we only need to do the check when prev.control != old.control, which means the cmpxchg is not successful completed. 
If we add the check between cmpxchg and while ( prev.control != old.control ), it seems the logic is not so clear, since we don't need to check prev.sn and prev.on when cmxchg succeeds in setting the new value. Thanks, Feng + +prev.control = cmpxchg(v-arch.hvm_vmx.pi_desc.control, + old.control, new.control); +} while ( prev.control != old.control ); + __vmx_deliver_posted_interrupt(v); -return; } - -vcpu_kick(v); } static void vmx_sync_pir_to_irr(struct vcpu *v) -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
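The control flow being debated (set ON only if neither SN nor ON is already set, otherwise fall back to kicking the vCPU) can be sketched with C11 atomics. This is a variation rather than the patch's code: it tests the freshly observed value on every iteration instead of the previous cmpxchg result, and pi_try_set_on(), PI_ON and PI_SN, including their bit positions, are assumptions made for the sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions within the descriptor's control word. */
#define PI_ON (UINT64_C(1) << 0)   /* outstanding notification */
#define PI_SN (UINT64_C(1) << 1)   /* suppress notification */

/*
 * Atomically set ON unless SN or ON is already set.
 * Returns true if the caller should send the posted-interrupt
 * notification, false if it should kick the vCPU instead.
 */
static bool pi_try_set_on(_Atomic uint64_t *control)
{
    uint64_t old = atomic_load(control);

    do {
        if ( old & (PI_SN | PI_ON) )
            return false;
        /* On failure the compare-exchange reloads `old` with the current value. */
    } while ( !atomic_compare_exchange_weak(control, &old, old | PI_ON) );

    return true;
}
```

Because the compare-exchange hands back the current value on failure, the SN/ON test always runs against what was last seen in memory, which is one way to address the question of where the check belongs relative to the cmpxchg.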
Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
>
> Extend struct pi_desc according to VT-d Posted-Interrupts Spec.
>
> Signed-off-by: Feng Wu <feng...@intel.com>

Acked-by: Kevin Tian <kevin.t...@intel.com>
Re: [Xen-devel] [PATCH v10 00/13] enable Cache Allocation Technology (CAT) for VMs
On Tue, Jul 07, 2015 at 03:46:21PM +0100, Ian Campbell wrote: On Fri, 2015-06-26 at 16:43 +0800, Chao Peng wrote: Chao Peng (13): x86: add socket_cpumask x86: detect and initialize Intel CAT feature x86: maintain COS to CBM mapping for each socket x86: add COS information for each domain x86: expose CBM length and COS number information x86: dynamically get/set CBM for a domain x86: add scheduling support for Intel CAT xsm: add CAT related xsm policies Jan applied to here. So I was going to apply these 5: tools/libxl: minor name changes for CMT commands tools/libxl: add command to show PSR hardware info tools/libxl: introduce some socket helpers tools: add tools support for Intel CAT docs: add xl-psr.markdown But, on i686 I see: xl_cmdimpl.c: In function ‘psr_cat_hwinfo’: xl_cmdimpl.c:8390:16: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ [-Werror=format=] (1ul info-cbm_len) - 1); ^ xl_cmdimpl.c: In function ‘psr_cat_print_socket’: xl_cmdimpl.c:8450:5: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ [-Werror=format=] printf(%-16s: %#PRIx64\n, Default CBM, (1ul info-cbm_len) - 1); ^ cc1: all warnings being treated as errors It seems there is some mismatch between your types and the printf formats used. The appropriate format specifier for an unsigned long (which you have from the ul in the constant) is %#lx and not %#PRIxXX which is associated with uintXX_t types. If you need a 64 bit type then you might have meant instead to use ull in which case you want %#llx as the format specifier. This is what I need. Thanks for suggestion. Chao If you really want/need an exactly 64 bit type then you'll have to do some nasty casting, something like ((uint64_t)1) info-cbm_len) - 1 or something, that's pretty ugly though. If you have to go this route then please test both builds, in case I've gotten my ()'s wrong. Ian. 
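The format mismatch Ian describes goes away when the value's type and the conversion specifier are chosen together. A minimal sketch, assuming an exactly-64-bit mask is wanted: cbm_mask() and format_default_cbm() are hypothetical helpers, not functions from xl_cmdimpl.c.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Mask with the low cbm_len bits set, computed in exactly 64 bits. */
static uint64_t cbm_mask(unsigned int cbm_len)
{
    /* Shifting a 64-bit value by 64 or more is undefined, so clamp. */
    return cbm_len >= 64 ? UINT64_MAX : (UINT64_C(1) << cbm_len) - 1;
}

/*
 * Format the "Default CBM" line into buf. The argument is a uint64_t,
 * so PRIx64 is the matching specifier on both 32-bit and 64-bit builds.
 */
static int format_default_cbm(char *buf, size_t len, unsigned int cbm_len)
{
    return snprintf(buf, len, "%-16s: %#" PRIx64 "\n",
                    "Default CBM", cbm_mask(cbm_len));
}
```

The alternative pairings from the mail are equally valid: `1ul` with `%#lx`, or `1ull` with `%#llx`. What fails on i686 is mixing an unsigned long argument with a 64-bit specifier.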
___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.
[Trimming the Cc-list a bit, to avoid bothering Wei and Jan]

On Tue, 2015-07-07 at 22:56 -0700, Meng Xu wrote:
> Hi Dario,
>
Hi,

> 2015-07-07 7:03 GMT-07:00 Dario Faggioli <dario.faggi...@citrix.com>:
> >
> > On Mon, 2015-07-06 at 22:51 -0700, Meng Xu wrote:
> >
> > So, it looks to me that, as far as (1) and (2) are concerned, since we
> > are just inserting a vCPU in the runq, if we have M pCPUs, and we know
> > whether we inserted it within the first M spots, we already have what
> > we want, or am I missing something? And if __runq_insert() now (with
> > Dagaen patch) tells us this, well, we can simplify the tickling logic,
> > can't we?
>
> I think you might assume that the first M VCPUs in the runq are the
> current running VCPUs on the M pCPUs. Am I correct? (From what you
> described in the following example, I think I'm correct. ;-) )
>
Mmm... Interesting. Yes, I was. I was basing this assumption on this
chunk on Dagaen's patch:

    // If we become one of top [# CPUs] in the runq, tickle it
    // TODO: make this work when multiple tickles are required
    if ( new_position > 0 && new_position <= prv->NUM_CPUS )
        runq_tickle(ops, svc);

And forgot (and did not go check) about the __q_remove() in
rt_schedule(). My bad again.

But then, since we don't have the running vCPUs in the runq, how the
code above is supposed to be correct?

> > With an example:
> > We are waking up (or re-inserting, in rt_context_saved()) vCPU j. We
> > have 6 pCPUs. __runq_insert() tells us that it put vCPU j at the 3rd
> > place in the runq. This means vCPU j should be set to run as soon as
> > possible. So, if vCPU j is 3rd in runq, either
> >  (a) there are only 3 runnable vCPUs (i.e., if we are waking up j,
> >      there were 2 of them, and j is the third; if we are in
> >      context_saved, there already were 3, and j just got its deadline
> >      postponed, or someone else got its one replenished);
> >  (b) there are more than 3 runnable vCPUs, i.e., there is at least a
> >      4th vCPU --say vCPU k-- in the runq, which was the 3rd before
> >      vCPU j were woken (or re-inserted), but now became the 4th,
> >      because deadline(j)<deadline(k).
> > In case (a), there are for sure idle pCPUs, and we should tickle one
> > of them.
>
> I tell that you make the above assumption from here.
>
> However, in the current implementation, runq does not hold the current
> running VCPUs on the pCPUs. We remove the vcpu from runq in
> rt_schedule() function. What you described above make perfect sense if
> we decide to make runq hold the current running VCPUs.
>
Yep. And it indeed seems to me that we may well think about doing so.
It will make it possible to base on the position for
making/optimizing scheduling decisions, and at the same time I don't
think I see much downsides in that, do you?

> Actually, after thinking about the example you described, I think we
> can hold the current running VCPUs *and* the current idle pCPUs in the
> scheduler-wide structure;
>
What do you mean with 'current idle pCPUs'? I said something similar as
well, and what I meant was a cpumask with bit i set if i-eth pCPU is
idle, do you also mean this?

About the running vCPUs, why just not leave them in the actual runq?

> In other words, we can have another runningq
> (not runq) and a idle_pcpu list in the rt_private; Now all VCPUs are
> stored in three queues: runningq, runq, and depletedq, in increasing
> priority order.
>
Perhaps, but I'm not sure I see the need for another list. Again, why
just not leave them in runq? I appreciate this is a rather big change
(although, perhaps it looks bigger said than done), but I think it could
be worth pursuing.

For double checking, asserting, and making sure that we are able to
identify the running svc-s, we have the __RTDS_scheduled flag.

> When we make the tickle decision, we only need to scan the idle_pcpu
> and then runningq to figure out which pCPU to tickle. All of other
> design you describe still hold here, except that the position where a
> VCPU is inserted into runq cannot directly give us which pCPU to
> tickle. What do you think?
>
I think that I'd like to know why you think adding another queue is
necessary, instead of just leaving the vCPUs in the actual runq. Is
there something bad about that which I'm missing?

> > In case (b) there may be idle pCPUs (and, if that's the case, we
> > should tickle one of them, of course) or not. If not, we need to go
> > figure out which pCPU to tickle, which is exactly what runq_tickle()
> > does, but we at least know for sure that we want to tickle the pCPU
> > where vCPU k runs, or others where vCPUs with deadline greater than
> > vCPU k run.
> >
> > Does this make sense?
>
> Yes, if we decide to hold the currently running VCPUs in
> scheduler-wide structure: it can be runq or runningq.
>
Yes, but if we use two queues, we defeat at least part of this
optimization/simplification.

> > Still, I think I gave enough material for an actual optimization. What
> > do you think?
>
> Yes. It is very clear.
> The only thing is how we are going
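To make the position-based idea in this thread concrete, here is a rough, self-contained model of the decision. It is only a sketch, not RTDS code: it assumes the variant in which running vCPUs stay in the runq, represents idle pCPUs as a 64-bit mask (so at most 64 pCPUs), and the name pick_pcpu_to_tickle() and all of its parameters are invented for illustration.

```c
#include <stdint.h>

#define NO_TICKLE (-1)

/*
 * Hypothetical model of the tickling decision discussed above.
 *   pos              1-based position at which the vCPU landed in the runq
 *                    (0 means it was not inserted)
 *   nr_pcpus         number of pCPUs, assumed <= 64
 *   idle_mask        bit i set if pCPU i is idle
 *   new_deadline     deadline of the vCPU that was just inserted
 *   running_deadline deadline of the vCPU currently running on each pCPU
 * Returns the pCPU to tickle, or NO_TICKLE.
 */
static int pick_pcpu_to_tickle(unsigned int pos, unsigned int nr_pcpus,
                               uint64_t idle_mask, int64_t new_deadline,
                               const int64_t *running_deadline)
{
    unsigned int cpu;
    int victim = NO_TICKLE;
    int64_t latest = new_deadline;

    /* Outside the first M spots: nothing running needs to be preempted. */
    if ( pos == 0 || pos > nr_pcpus )
        return NO_TICKLE;

    /* Cases (a) and (b) with an idle pCPU: just wake an idle one. */
    for ( cpu = 0; cpu < nr_pcpus; cpu++ )
        if ( idle_mask & (UINT64_C(1) << cpu) )
            return (int)cpu;

    /* Case (b), no idle pCPU: preempt the latest running deadline. */
    for ( cpu = 0; cpu < nr_pcpus; cpu++ )
        if ( running_deadline[cpu] > latest )
        {
            latest = running_deadline[cpu];
            victim = (int)cpu;
        }

    return victim;
}
```

With a separate runningq, as Meng proposes, the first check would no longer be available, since the insert position would not say anything about what is running.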
Re: [Xen-devel] [v3 05/15] vt-d: VT-d Posted-Interrupts feature detection
> -----Original Message-----
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 3:32 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z;
> george.dun...@eu.citrix.com
> Subject: RE: [v3 05/15] vt-d: VT-d Posted-Interrupts feature detection
>
> > From: Wu, Feng
> > Sent: Wednesday, June 24, 2015 1:18 PM
> >
> > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > With VT-d Posted-Interrupts enabled, external interrupts from
> > direct-assigned devices can be delivered to guests without VMM
> > intervention when guest is running in non-root mode.
> >
> > This patch adds feature detection logic for VT-d posted-interrupt.
> >
> > Signed-off-by: Feng Wu <feng...@intel.com>
> > ---
> > v3:
> > - Remove the "if no intremap then no intpost" logic in
> >   intel_vtd_setup(), it is covered in the iommu_setup().
> > - Add "if no intremap then no intpost" logic in the end
> >   of init_vtd_hw() which is called by vtd_resume().
> >
> > So the logic exists in the following three places:
> > - parse_iommu_param()
> > - iommu_setup()
> > - init_vtd_hw()
> >
> >  xen/drivers/passthrough/vtd/iommu.c | 18 ++++++++++++++++--
> >  xen/drivers/passthrough/vtd/iommu.h |  1 +
> >  2 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> > index 9053a1f..4221185 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -2071,6 +2071,9 @@ static int init_vtd_hw(void)
> >              disable_intremap(drhd->iommu);
> >      }
> >
> > +    if ( !iommu_intremap )
> > +        iommu_intpost = 0;
> > +
> >      /*
> >       * Set root entries for each VT-d engine. After set root entry,
> >       * must globally invalidate context cache, and then globally
> > @@ -2133,8 +2136,8 @@ int __init intel_vtd_setup(void)
> >      }
> >
> >      /* We enable the following features only if they are supported by all VT-d
> > -     * engines: Snoop Control, DMA passthrough, Queued Invalidation and
> > -     * Interrupt Remapping.
> > +     * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
> > +     * Remapping, and Posted Interrupt
> >       */
> >      for_each_drhd_unit ( drhd )
> >      {
> > @@ -2162,6 +2165,15 @@ int __init intel_vtd_setup(void)
> >          if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
> >              iommu_intremap = 0;
> >
> > +        /*
> > +         * We cannot use posted interrupt if X86_FEATURE_CX16 is
> > +         * not supported, since we count on this feature to
> > +         * atomically update 16-byte IRTE in posted format.
> > +         */
> > +        if ( !iommu_intremap &&
> > +             (!cap_intr_post(iommu->cap) || !cpu_has_cx16) )
> > +            iommu_intpost = 0;
> > +
>
> Looks a typo here. && -> ||

Yes, this is a typo. Thanks for the review.

Thanks,
Feng

>
> Thanks
> Kevin
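The detection rule under review can be summarised in a small stand-alone model. This is only a sketch, assuming the agreed fix is to turn posting off whenever remapping is off or either capability is missing; struct vtd_unit_caps and intpost_usable() are invented names, not VT-d driver code.

```c
#include <stdbool.h>

/* Hypothetical per-engine capability summary, for illustration only. */
struct vtd_unit_caps {
    bool intr_remap;   /* engine supports interrupt remapping */
    bool intr_post;    /* engine supports posted interrupts */
};

/*
 * Posted interrupts are usable only if interrupt remapping is enabled,
 * the CPU has CX16 (needed to update the 16-byte IRTE atomically), and
 * every engine supports both remapping and posting.
 */
static bool intpost_usable(bool intremap_enabled, bool cpu_has_cx16,
                           const struct vtd_unit_caps *units,
                           unsigned int nr_units)
{
    unsigned int i;

    if ( !intremap_enabled || !cpu_has_cx16 )
        return false;

    for ( i = 0; i < nr_units; i++ )
        if ( !units[i].intr_remap || !units[i].intr_post )
            return false;

    return true;
}
```

A single engine lacking the capability is enough to disable the feature globally, which mirrors the "supported by all VT-d engines" comment in the patch.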
Re: [Xen-devel] [v3 07/15] vmx: Initialize VT-d Posted-Interrupts Descriptor
> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
>
> This patch initializes the VT-d Posted-interrupt Descriptor.
>
> Signed-off-by: Feng Wu <feng...@intel.com>

Acked-by: Kevin Tian <kevin.t...@intel.com>
Re: [Xen-devel] [PATCH v3 for Xen 4.6 1/4] xen: enable per-VCPU parameter settings for RTDS scheduler
On Tue, 2015-07-07 at 23:06 -0700, Meng Xu wrote:
> 2015-07-07 7:39 GMT-07:00 Dario Faggioli <dario.faggi...@citrix.com>:
> > On Tue, 2015-07-07 at 09:59 +0100, Jan Beulich wrote:
> > > On 29.06.15 at 04:44, <lichong...@gmail.com> wrote:
> > > > --- a/xen/common/Makefile
> > > > +++ b/xen/common/Makefile
> > > > @@ -31,7 +31,6 @@ obj-y += rbtree.o
> > > >  obj-y += rcupdate.o
> > > >  obj-y += sched_credit.o
> > > >  obj-y += sched_credit2.o
> > > > -obj-y += sched_sedf.o
> > > >  obj-y += sched_arinc653.o
> > > >  obj-y += sched_rt.o
> > > >  obj-y += schedule.o
> > >
> > > Stray change. Or perhaps the file doesn't build anymore, in which
> > > case you should instead have stated that the patch is dependent
> > > upon the series removing SEDF.
> > >
> > This indeed does not belong in here. And of course, things should
> > build... So, Chong, either deal with SEDF as well, if basing your
> > patches on a tree where it is still there, or base on top of my
> > patches, ignore it, but state the dependency, as Jan is asking.
> >
> > > > @@ -1157,8 +1158,75 @@ rt_dom_cntl(
> > > > +    case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
> > > > +        spin_lock_irqsave(&prv->lock, flags);
> > > > +        for( index = 0; index < op->u.v.nr_vcpus; index++ )
> > > > +        {
> > > > +            if ( copy_from_guest_offset(&local_sched,
> > > > +                                        op->u.v.vcpus, index, 1) )
> > > > +            {
> > > > +                rc = -EFAULT;
> > > > +                break;
> > > > +            }
> > > > +            if ( local_sched.vcpuid >= d->max_vcpus
> > > > +                 || d->vcpu[local_sched.vcpuid] == NULL )
> > > > +            {
> > > > +                rc = -EINVAL;
> > > > +                break;
> > > > +            }
> > > > +            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
> > > > +            svc->period = MICROSECS(local_sched.s.rtds.period);
> > > > +            svc->budget = MICROSECS(local_sched.s.rtds.budget);
> > >
> > > Are all input values valid here?
> > >
> > That's a good point, actually. Right now, SEDF does some range
> > enforcement, by means of these values:
> >
> > #define PERIOD_MAX MILLISECS(10000) /* 10s  */
> > #define PERIOD_MIN (MICROSECS(10))  /* 10us */
> > #define SLICE_MIN (MICROSECS(5))    /*  5us */
> >
> > Chong, it probably makes sense to (in a separate patch), introduce
> > something like this in RTDS too (with SLICE_MIN-->BUDGET_MIN), and
> > then use them, in this patch, for sanity checking the input.
> >
> > It also makes sense to check and enforce budget<=period, IMO.
> >
> > About the specific values, I'm open to proposals. I think something
> > like the SEDF's one is fine. Meng?
>
> We are trying to make some range enforcement for RTDS scheduler. Is my
> understanding correct? (It should be, but just in case. :-) )
>
We are wondering whether that could be necessary/useful, and IMO, it
would.

> As to the range of period, I think the max value can be as large as
> the type of period (ie. s_time_t) can represent. When we want a
> dedicated CPU for a guest, we will set budget=period and can set the
> period to a very very large value to avoid the unnecessarily
> invocation of the scheduler.
>
Makes sense. We do have STIME_MAX and, given that period is something
that is added to current time during scheduling, STIME_DELTA_MAX. Maybe,
put something together basing on those?

> As to the min value of period, I think it should be >=100us. The
> scheduler overhead of running a large box could be 1us if the runq is
> long and competition of the runq lock is heavy. If the scheduler is
> potentially invoked every 10us, the scheduler overhead will be 10% of
> total computation time, which seems a lot to me.
>
Ok.

> As to the range of budget, the min value can be 5us, the same with
> SEDF;
>
Well, wouldn't the above reasoning about overhead apply here too?
Budgets of 5us mean the scheduler can be invoked every 5us for budget
enforcement. If 10us was unreasonable, 5 is even more so.

Therefore, 100us here too? Or maybe let's allow for lower values (like
50us or 10us), but print a warning?

> the max value is the value of period of the same VCPU.
>
Yep.

And, whatever the values, it would be useful to have comments somewhere
(either when the values are defined or enforced), stating what you said
above.

Regards,
Dario

--
This happens because I choose it to happen! (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
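As a rough illustration of the sanity checking being asked for, the sketch below validates a period/budget pair before it would be stored. The bounds are placeholders taken from the numbers floated in this thread, not agreed values, and rtds_check_params(), RTDS_MIN_PERIOD and RTDS_MIN_BUDGET are hypothetical names that do not exist in sched_rt.c.

```c
#include <errno.h>
#include <stdint.h>

typedef int64_t s_time_t;                 /* nanoseconds, as in Xen */
#define MICROSECS(us) ((s_time_t)(us) * 1000LL)

/* Placeholder bounds from the discussion; the final values are open. */
#define RTDS_MIN_PERIOD MICROSECS(100)
#define RTDS_MIN_BUDGET MICROSECS(10)

/*
 * Hypothetical check: inputs are in microseconds, as passed in through
 * the domctl. Returns 0 if the pair is acceptable, -EINVAL otherwise.
 */
static int rtds_check_params(uint32_t period_us, uint32_t budget_us)
{
    s_time_t period = MICROSECS(period_us);
    s_time_t budget = MICROSECS(budget_us);

    if ( period < RTDS_MIN_PERIOD )
        return -EINVAL;           /* scheduler would run too often */
    if ( budget < RTDS_MIN_BUDGET )
        return -EINVAL;           /* budget enforcement too frequent */
    if ( budget > period )
        return -EINVAL;           /* enforce budget <= period */

    return 0;
}
```

No upper bound on the period is checked here because a 32-bit microsecond count cannot overflow s_time_t; an STIME_DELTA_MAX-style cap, as Dario suggests, would matter if the interface took wider values.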
[Xen-devel] [v6][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
Currently we do not intend to support devices with a shared RMRR, simply
because a shared RMRR is a rare case in our experience. Later we can
group the devices that share an RMRR, and then allow all devices within
a group to be assigned to the same domain.

CC: Yang Zhang <yang.z.zh...@intel.com>
CC: Kevin Tian <kevin.t...@intel.com>
Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
Acked-by: Kevin Tian <kevin.t...@intel.com>
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c833290..095fb1d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2297,13 +2297,39 @@ static int intel_iommu_assign_device(
     if ( list_empty(&acpi_drhd_units) )
         return -ENODEV;
 
+    seg = pdev->seg;
+    bus = pdev->bus;
+    /*
+     * In rare cases one given rmrr is shared by multiple devices but
+     * obviously this would put the security of a system at risk. So
+     * we should prevent from this sort of device assignment.
+     *
+     * TODO: in the future we can introduce group device assignment
+     * interface to make sure devices sharing RMRR are assigned to the
+     * same domain together.
+     */
+    for_each_rmrr_device( rmrr, bdf, i )
+    {
+        if ( rmrr->segment == seg &&
+             PCI_BUS(bdf) == bus &&
+             PCI_DEVFN2(bdf) == devfn )
+        {
+            if ( rmrr->scope.devices_cnt > 1 )
+            {
+                printk(XENLOG_G_ERR VTDPREFIX
+                       " cannot assign %04x:%02x:%02x.%u"
+                       " with shared RMRR for Dom%d.\n",
+                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                       d->domain_id);
+                return -EPERM;
+            }
+        }
+    }
+
     ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
     if ( ret )
         return ret;
 
-    seg = pdev->seg;
-    bus = pdev->bus;
-
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
     {
-- 
1.9.1
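The check added by this patch can be modelled outside the hypervisor as follows. This is an illustrative sketch only: struct rmrr_unit here is a simplified stand-in for the real acpi_rmrr_unit, and check_rmrr_not_shared() is an invented name.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for an RMRR and its device scope. */
struct rmrr_unit {
    uint16_t segment;
    const uint16_t *bdfs;        /* bus/dev/fn of each device in scope */
    unsigned int devices_cnt;
};

/*
 * Refuse assignment (-EPERM) if the device identified by seg/bdf is
 * covered by an RMRR whose scope lists more than one device.
 */
static int check_rmrr_not_shared(const struct rmrr_unit *rmrrs,
                                 unsigned int nr_rmrrs,
                                 uint16_t seg, uint16_t bdf)
{
    unsigned int i, j;

    for ( i = 0; i < nr_rmrrs; i++ )
    {
        if ( rmrrs[i].segment != seg )
            continue;

        for ( j = 0; j < rmrrs[i].devices_cnt; j++ )
            if ( rmrrs[i].bdfs[j] == bdf && rmrrs[i].devices_cnt > 1 )
                return -EPERM;
    }

    return 0;
}
```

Doing this before reassign_device_ownership(), as the patch does by moving the seg/bus assignments up, means a refused device is never detached from the hardware domain in the first place.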
[Xen-devel] [v6][PATCH 06/16] hvmloader/pci: skip reserved ranges
When allocating mmio address for PCI bars, we need to make sure they don't overlap with reserved regions. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v6: * Nothing is changed. v5: * Rename that field, is_64bar, inside struct bars with flag, and then extend to also indicate if this bar is already allocated. v4: * We have to re-design this as follows: #1. Goal MMIO region should exclude all reserved device memory #2. Requirements #2.1 Still need to make sure MMIO region is fit all pci devices as before #2.2 Accommodate the not aligned reserved memory regions If I'm missing something let me know. #3. How to #3.1 Address #2.1 We need to either of populating more RAM, or of expanding more highmem. But we should know just 64bit-bar can work with highmem, and as you mentioned we also should avoid expanding highmem as possible. So my implementation is to allocate 32bit-bar and 64bit-bar orderly. 1. The first allocation round just to 32bit-bar If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar with all remaining resources including low pci memory. If not, we need to calculate how much RAM should be populated to allocate the remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go to the second allocation round 2. 2. The second allocation round to the remaining 32bit-bar We should can finish allocating all 32bit-bar in theory, then go to the third allocation round 3. 3. The third allocation round to 64bit-bar We'll try to first allocate from the remaining low memory resource. If that isn't enough, we try to expand highmem to allocate for 64bit-bar. This process should be same as the original. 
#3.2 Address #2.2 I'm trying to accommodate the not aligned reserved memory regions: We should skip all reserved device memory, but we also need to check if other smaller bars can be allocated if a mmio hole exists between resource-base and reserved device memory. If a hole exists between base and reserved device memory, lets go out simply to try allocate for next bar since all bars are in descending order of size. If not, we need to move resource-base to reserved_end just to reallocate this bar. tools/firmware/hvmloader/pci.c | 194 ++--- 1 file changed, 164 insertions(+), 30 deletions(-) diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c index 5ff87a7..397f3b7 100644 --- a/tools/firmware/hvmloader/pci.c +++ b/tools/firmware/hvmloader/pci.c @@ -38,6 +38,31 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0; enum virtual_vga virtual_vga = VGA_none; unsigned long igd_opregion_pgbase = 0; +static void relocate_ram_for_pci_memory(unsigned long cur_pci_mem_start) +{ +struct xen_add_to_physmap xatp; +unsigned int nr_pages = min_t( +unsigned int, +hvm_info-low_mem_pgend - (cur_pci_mem_start PAGE_SHIFT), +(1u 16) - 1); +if ( hvm_info-high_mem_pgend == 0 ) +hvm_info-high_mem_pgend = 1ull (32 - PAGE_SHIFT); +hvm_info-low_mem_pgend -= nr_pages; +printf(Relocating 0x%x pages from PRIllx to PRIllx\ +for lowmem MMIO hole\n, + nr_pages, + PRIllx_arg(((uint64_t)hvm_info-low_mem_pgend)PAGE_SHIFT), + PRIllx_arg(((uint64_t)hvm_info-high_mem_pgend)PAGE_SHIFT)); +xatp.domid = DOMID_SELF; +xatp.space = XENMAPSPACE_gmfn_range; +xatp.idx = hvm_info-low_mem_pgend; +xatp.gpfn = hvm_info-high_mem_pgend; +xatp.size = nr_pages; +if ( hypercall_memory_op(XENMEM_add_to_physmap, xatp) != 0 ) +BUG(); +hvm_info-high_mem_pgend += nr_pages; +} + void pci_setup(void) { uint8_t is_64bar, using_64bar, bar64_relocate = 0; @@ -50,17 +75,22 @@ void pci_setup(void) /* Resources assignable to PCI devices via BARs. 
*/ struct resource { uint64_t base, max; -} *resource, mem_resource, high_mem_resource, io_resource; +} *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource; /* Create a list of device BARs in descending order of size. */ struct bars { -uint32_t is_64bar; +#define PCI_BAR_IS_64BIT0x1 +#define PCI_BAR_IS_ALLOCATED0x2 +uint32_t flag; uint32_t devfn; uint32_t bar_reg; uint64_t bar_sz; } *bars = (struct bars *)scratch_start; -unsigned int i, nr_bars = 0; -uint64_t mmio_hole_size = 0; +unsigned int i, j, n, nr_bars = 0; +uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size; +bool bar32_allocating = 0; +uint64_t mmio32_unallocated_total = 0; +unsigned long
[Xen-devel] [v6][PATCH 14/16] xen/vtd: enable USB device assignment
USB RMRR may conflict with guest BIOS region. In such case, identity
mapping setup is simply skipped in previous implementation. Now we can
handle this scenario cleanly with new policy mechanism so previous hack
code can be removed now.

CC: Yang Zhang <yang.z.zh...@intel.com>
CC: Kevin Tian <kevin.t...@intel.com>
Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
Acked-by: Kevin Tian <kevin.t...@intel.com>
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
 xen/drivers/passthrough/vtd/utils.c |  7 -------
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {                                                \
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 56f5911..c833290 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2245,11 +2245,9 @@ static int reassign_device_ownership(
     /*
      * If the device belongs to the hardware domain, and it has RMRR, don't
      * remove it from the hardware domain, because BIOS may use RMRR at
-     * booting time. Also account for the special casing of USB below (in
-     * intel_iommu_assign_device()).
+     * booting time.
      */
-    if ( !is_hardware_domain(source) &&
-         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+    if ( !is_hardware_domain(source) )
     {
         const struct acpi_rmrr_unit *rmrr;
         u16 bdf;
@@ -2303,13 +2301,8 @@ static int intel_iommu_assign_device(
     if ( ret )
         return ret;
 
-    /* FIXME: Because USB RMRR conflicts with guest bios region,
-     * ignore USB RMRR temporarily.
-     */
     seg = pdev->seg;
     bus = pdev->bus;
-    if ( is_usb_device(seg, bus, pdev->devfn) )
-        return 0;
 
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include <asm/io_apic.h>
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-    u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                                PCI_CLASS_DEVICE);
-    return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1
[Xen-devel] [v6][PATCH 10/16] tools: introduce some new parameters to set rdm policy
This patch introduces user configurable parameters to specify RDM resource and according policies, Global RDM parameter: rdm = strategy=host,reserve=strict/relaxed Per-device RDM parameter: pci = [ 'sbdf, rdm_reserve=strict/relaxed' ] Global RDM parameter, strategy, allows user to specify reserved regions explicitly, Currently, using 'host' to include all reserved regions reported on this platform which is good to handle hotplug scenario. In the future this parameter may be further extended to allow specifying random regions, e.g. even those belonging to another platform as a preparation for live migration with passthrough devices. By default this isn't set so we don't check all rdms. Instead, we just check rdm specific to a given device if you're assigning this kind of device. Note this option is not recommended unless you can make sure any conflict does exist. 'strict/relaxed' policy decides how to handle conflict when reserving RDM regions in pfn space. If conflict exists, 'strict' means an immediate error so VM can't keep running, while 'relaxed' allows moving forward with a warning message thrown out. Default per-device RDM policy is same as default global RDM policy as being 'relaxed'. And the per-device policy would override the global policy like others. CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v6: * Some rename to make our policy reasonable type - strategy none - ignore * Don't expose ignore in xl level and just keep that as a default. And then sync docs and the patch head description v5: * Just make sure the per-device plicy always override the global policy, and so cleanup some associated comments and the patch head description. * A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED. * Improve all descriptions in doc. 
* Make all rdm variables specific to .hvm v4: * No need to define init_val for libxl_rdm_reserve_type since its just zero * Grab those changes to xl/libxlu to as a final patch docs/man/xl.cfg.pod.5| 81 docs/misc/vtd.txt| 24 + tools/libxl/libxl_create.c | 7 tools/libxl/libxl_internal.h | 2 ++ tools/libxl/libxl_pci.c | 9 + tools/libxl/libxl_types.idl | 18 ++ 6 files changed, 141 insertions(+) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index a3e0e2e..091e80d 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -655,6 +655,79 @@ assigned slave device. =back +=item Brdm=RDM_RESERVATION_STRING + +(HVM/x86 only) Specifies information about Reserved Device Memory (RDM), +which is necessary to enable robust device passthrough. One example of RDM +is reported through ACPI Reserved Memory Region Reporting (RMRR) structure +on x86 platform. + +BRDM_RESERVE_STRING has the form C[KEY=VALUE,KEY=VALUE,... where: + +=over 4 + +=item BKEY=VALUE + +Possible BKEYs are: + +=over 4 + +=item Bstrategy=STRING + +Currently there is only one valid type: + +host means all reserved device memory on this platform should be checked to +reserve regions in this VM's guest address space. This global rdm parameter +allows user to specify reserved regions explicitly, and using host includes +all reserved regions reported on this platform, which is useful when doing +hotplug. + +By default this isn't set so we don't check all rdms. Instead, we just check +rdm specific to a given device if you're assigning this kind of device. Note +this option is not recommended unless you can make sure any conflict does exist. + +For example, you're trying to set memory = 2800 to allocate memory to one +given VM but the platform owns two RDM regions like, + +Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff +Device B [sbdf_B]: RMRR region_B: base_addr ad80 end_address afff + +In this conflict case, + +#1. 
If Bstrategy is set to host, for example, + +rdm = strategy=host,reserve=strict or rdm = strategy=host,reserve=relaxed + +It means all conflicts will be handled according to the policy +introduced by Breserve as described below. + +#2. If Bstrategy is not set at all, but + +pci = [ 'sbdf_A, rdm_reserve=x' ] + +It means only one conflict of region_A will be handled according to the policy +introduced by Brdm_reserve=STRING as described inside pci options. + +=item Breserve=STRING + +Specifies how to deal with conflicts when reserving reserved device +memory in guest address space. + +When that conflict is unsolved, + +strict means VM can't be created, or the associated device can't be +attached in the case of hotplug. + +relaxed allows VM to be created but may cause VM to crash if +pass-through device accesses RDM. For exampl,e Windows IGD GFX driver +always accessed RDM regions so it leads to VM crash. + +Note this may be overridden by
[Xen-devel] [v6][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
Here we'll construct a basic guest e820 table via XENMEM_set_memory_map. This table includes lowmem, highmem and RDMs if they exist, and hvmloader would need this info later. Note this guest e820 table would be same as before if the platform has no any RDM or we disable RDM (by default). CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v6: * Nothing is changed. v5: * Rephrase patch's short log * Make libxl__domain_construct_e820() hidden v4: * Use goto style error handling. * Instead of NOGC, we shoud use libxl__malloc(gc,XXX) to allocate local e820. tools/libxl/libxl_dom.c | 5 +++ tools/libxl/libxl_internal.h | 24 + tools/libxl/libxl_x86.c | 83 3 files changed, 112 insertions(+) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 62ef120..41da479 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, goto out; } +if (libxl__domain_construct_e820(gc, d_config, domid, args)) { +LOG(ERROR, setting domain memory map failed); +goto out; +} + ret = hvm_build_set_params(ctx-xch, domid, info, state-store_port, state-store_mfn, state-console_port, state-console_mfn, state-store_domid, diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index b4d8419..a50449a 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -3794,6 +3794,30 @@ static inline void libxl__update_config_vtpm(libxl__gc *gc, */ void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr, const libxl_bitmap *sptr); + +/* + * Here we're just trying to set these kinds of e820 mappings: + * + * #1. 
Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * Note: Those stuffs below 1M are still constructed with multiple + * e820 entries by hvmloader. At this point we don't change anything. + * + * #2. RDM region if it exists + * + * #3. High memory region if it exists + * + * Note: these regions are not overlapping since we already check + * to adjust them. Please refer to libxl__domain_device_construct_rdm(). + */ +_hidden int libxl__domain_construct_e820(libxl__gc *gc, + libxl_domain_config *d_config, + uint32_t domid, + struct xc_hvm_build_args *args); + #endif /* diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c index ed2bd38..be297b2 100644 --- a/tools/libxl/libxl_x86.c +++ b/tools/libxl/libxl_x86.c @@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq) } /* + * Here we're just trying to set these kinds of e820 mappings: + * + * #1. Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * Note: Those stuffs below 1M are still constructed with multiple + * e820 entries by hvmloader. At this point we don't change anything. + * + * #2. RDM region if it exists + * + * #3. High memory region if it exists + * + * Note: these regions are not overlapping since we already check + * to adjust them. Please refer to libxl__domain_device_construct_rdm(). + */ +#define GUEST_LOW_MEM_START_DEFAULT 0x10 +int libxl__domain_construct_e820(libxl__gc *gc, + libxl_domain_config *d_config, + uint32_t domid, + struct xc_hvm_build_args *args) +{ +int rc = 0; +unsigned int nr = 0, i; +/* We always own at least one lowmem entry. */ +unsigned int e820_entries = 1; +struct e820entry *e820 = NULL; +uint64_t highmem_size = +args-highmem_end ? 
args-highmem_end - (1ull 32) : 0; + +/* Add all rdm entries. */ +for (i = 0; i d_config-num_rdms; i++) +if (d_config-rdms[i].flag != LIBXL_RDM_RESERVE_FLAG_INVALID) +e820_entries++; + + +/* If we should have a highmem range. */ +if (highmem_size) +e820_entries++; + +if (e820_entries = E820MAX) { +LOG(ERROR, Ooops! Too many entries in the memory map!\n); +rc = ERROR_INVAL; +goto out; +} + +e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries); + +/* Low memory */ +e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT; +e820[nr].size = args-lowmem_end - GUEST_LOW_MEM_START_DEFAULT; +e820[nr].type
[Xen-devel] [v6][PATCH 02/16] xen/vtd: create RMRR mapping
RMRR reserved regions must be setup in the pfn space with an identity mapping to reported mfn. However existing code has problem to setup correct mapping when VT-d shares EPT page table, so lead to problem when assigning devices (e.g GPU) with RMRR reported. So instead, this patch aims to setup identity mapping in p2m layer, regardless of whether EPT is shared or not. And we still keep creating VT-d table. And we also need to introduce a pair of helper to create/clear this sort of identity mapping as follows: set_identity_p2m_entry(): If the gfn space is unoccupied, we just set the mapping. If space is already occupied by desired identity mapping, do nothing. Otherwise, failure is returned. clear_identity_p2m_entry(): We just define macro to wrapper guest_physmap_remove_page() with a returning value as necessary. CC: Tim Deegan t...@xen.org CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Yang Zhang yang.z.zh...@intel.com CC: Kevin Tian kevin.t...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com Reviewed-by: Tim Deegan t...@xen.org Acked-by: George Dunlap george.dun...@eu.citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v6: * Nothing is changed. v5: * Fold our original patch #2 and #3 as this new * Introduce a new, clear_identity_p2m_entry, which can wrapper guest_physmap_remove_page(). And we use this to clean our identity mapping. v4: * Change that orginal condition, if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm ) to make sure we catch those invalid mfn mapping as we expected. * To have if ( !paging_mode_translate(p2m-domain) ) return 0; at the start, instead of indenting the whole body of the function in an inner scope. * extend guest_physmap_remove_page() to return a value as a proper unmapping helper * Instead of intel_iommu_unmap_page(), we should use guest_physmap_remove_page() to unmap rmrr mapping correctly. 
* Drop iommu_map_page() since actually ept_set_entry() can do this internally. xen/arch/x86/mm/p2m.c | 40 +++-- xen/drivers/passthrough/vtd/iommu.c | 5 ++--- xen/include/asm-x86/p2m.h | 13 +--- 3 files changed, 50 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 6b39733..99a26ca 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn, p2m-default_access); } -void +int guest_physmap_remove_page(struct domain *d, unsigned long gfn, unsigned long mfn, unsigned int page_order) { struct p2m_domain *p2m = p2m_get_hostp2m(d); +int rc; gfn_lock(p2m, gfn, page_order); -p2m_remove_page(p2m, gfn, mfn, page_order); +rc = p2m_remove_page(p2m, gfn, mfn, page_order); gfn_unlock(p2m, gfn, page_order); +return rc; } int @@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access); } +int set_identity_p2m_entry(struct domain *d, unsigned long gfn, + p2m_access_t p2ma) +{ +p2m_type_t p2mt; +p2m_access_t a; +mfn_t mfn; +struct p2m_domain *p2m = p2m_get_hostp2m(d); +int ret; + +if ( !paging_mode_translate(p2m-domain) ) +return 0; + +gfn_lock(p2m, gfn, 0); + +mfn = p2m-get_entry(p2m, gfn, p2mt, a, 0, NULL); + +if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm ) +ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K, +p2m_mmio_direct, p2ma); +else if ( mfn_x(mfn) == gfn p2mt == p2m_mmio_direct a == p2ma ) +ret = 0; +else +{ +ret = -EBUSY; +printk(XENLOG_G_WARNING + Cannot setup identity map d%d:%lx, +gfn already mapped to %lx.\n, + d-domain_id, gfn, mfn_x(mfn)); +} + +gfn_unlock(p2m, gfn, 0); +return ret; +} + /* Returns: 0 for success, -errno for failure */ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn) { diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 44ed23d..8415958 
100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map, while ( base_pfn end_pfn ) { -if ( intel_iommu_unmap_page(d, base_pfn) ) +if ( clear_identity_p2m_entry(d, base_pfn, 0) ) ret = -ENXIO; base_pfn++; } @@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map, while ( base_pfn end_pfn ) { -int err = intel_iommu_map_page(d, base_pfn, base_pfn, -
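The three outcomes described for set_identity_p2m_entry() (set the mapping if the slot is free, do nothing if the identity mapping is already there, fail otherwise) can be sketched outside the hypervisor as below. This is only a model: the types and names prefixed with sketch_ are hypothetical stand-ins, not the real p2m structures, and locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the p2m entry types; not the Xen definitions. */
enum sketch_p2m_type {
    SKETCH_INVALID,
    SKETCH_MMIO_DM,
    SKETCH_MMIO_DIRECT,
    SKETCH_RAM
};

/* One modelled p2m slot: what the given gfn currently maps to. */
struct sketch_entry {
    enum sketch_p2m_type type;
    uint64_t mfn;
    int access;
};

/*
 * Model of the decision made in the patch:
 *  - unoccupied slot           -> install gfn == mfn, return 0
 *  - identical identity entry  -> nothing to do, return 0
 *  - anything else             -> -EBUSY, slot left untouched
 */
static int sketch_set_identity(struct sketch_entry *e, uint64_t gfn, int access)
{
    assert(e != NULL);

    if (e->type == SKETCH_INVALID || e->type == SKETCH_MMIO_DM) {
        e->type = SKETCH_MMIO_DIRECT;
        e->mfn = gfn;
        e->access = access;
        return 0;
    }

    if (e->type == SKETCH_MMIO_DIRECT && e->mfn == gfn && e->access == access)
        return 0;

    return -EBUSY;
}
```

Calling sketch_set_identity() twice on the same free slot succeeds both times, which is the idempotence the commit message asks for; a slot that already holds RAM is refused rather than overwritten.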
[Xen-devel] [v6][PATCH 04/16] xen: enable XENMEM_memory_map in hvm
This patch enables XENMEM_memory_map in hvm, so hvmloader can use it to set up the e820 mappings.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
---
v6:
* Nothing is changed.
v5:
* Nothing is changed.
v4:
* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c      | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)

     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)

     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }

-        if ( is_hvm_domain(d) )
-        {
-            rcu_unlock_domain(d);
-            return -EPERM;
-        }
-
         e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
         if ( e820 == NULL )
         {
--
1.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [v6][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy
This patch passes rdm reservation policy to xc_assign_device() so the policy is checked when assigning devices to a VM. Note this also bring some fallout to python usage of xc_assign_device(). CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com CC: David Scott dave.sc...@eu.citrix.com Acked-by: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v6: * Nothing is changed. v5: * Fix the flag field as 0 to DT device v4: * In the patch head description, I add to explain why we need to sync the xc.c file tools/libxc/include/xenctrl.h | 3 ++- tools/libxc/xc_domain.c | 9 - tools/libxl/libxl_pci.c | 3 ++- tools/ocaml/libs/xc/xenctrl_stubs.c | 16 tools/python/xen/lowlevel/xc/xc.c | 30 -- 5 files changed, 44 insertions(+), 17 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 9160623..89cbc5a 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch, /* HVM guest pass-through */ int xc_assign_device(xc_interface *xch, uint32_t domid, - uint32_t machine_sbdf); + uint32_t machine_sbdf, + uint32_t flag); int xc_get_device_group(xc_interface *xch, uint32_t domid, diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 0951291..ef41228 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch, int xc_assign_device( xc_interface *xch, uint32_t domid, -uint32_t machine_sbdf) +uint32_t machine_sbdf, +uint32_t flag) { DECLARE_DOMCTL; @@ -1705,6 +1706,7 @@ int xc_assign_device( domctl.domain = domid; domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI; domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf; +domctl.u.assign_device.flag = flag; return do_domctl(xch, domctl); } @@ -1792,6 +1794,11 @@ int 
xc_assign_dt_device( domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT; domctl.u.assign_device.u.dt.size = size; +/* + * DT doesn't own any RDM so actually DT has nothing to do + * for any flag and here just fix that as 0. + */ +domctl.u.assign_device.flag = 0; set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path); rc = do_domctl(xch, domctl); diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index e0743f8..632c15e 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i FILE *f; unsigned long long start, end, flags, size; int irq, i, rc, hvm = 0; +uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED; if (type == LIBXL_DOMAIN_TYPE_INVALID) return ERROR_FAIL; @@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i out: if (!libxl_is_stubdom(ctx, domid, NULL)) { -rc = xc_assign_device(ctx-xch, domid, pcidev_encode_bdf(pcidev)); +rc = xc_assign_device(ctx-xch, domid, pcidev_encode_bdf(pcidev), flag); if (rc 0 (hvm || errno != ENOSYS)) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, xc_assign_device failed); return ERROR_FAIL; diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c index 64f1137..b7de615 100644 --- a/tools/ocaml/libs/xc/xenctrl_stubs.c +++ b/tools/ocaml/libs/xc/xenctrl_stubs.c @@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d CAMLreturn(Val_bool(ret == 0)); } -CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc) +static int domain_assign_device_rdm_flag_table[] = { +XEN_DOMCTL_DEV_RDM_RELAXED, +}; + +CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc, +value rflag) { - CAMLparam3(xch, domid, desc); + CAMLparam4(xch, domid, desc, rflag); int ret; int domain, bus, dev, func; - uint32_t sbdf; + uint32_t sbdf, flag; domain = Int_val(Field(desc, 0)); bus = 
Int_val(Field(desc, 1)); @@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc) func = Int_val(Field(desc, 3)); sbdf = encode_sbdf(domain, bus, dev, func); - ret = xc_assign_device(_H(xch), _D(domid), sbdf); + ret = Int_val(Field(rflag, 0)); + flag = domain_assign_device_rdm_flag_table[ret]; + + ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag); if (ret 0)
[Xen-devel] [v6][PATCH 05/16] hvmloader: get guest memory map into memory_map[]
Now we get this map layout by call XENMEM_memory_map then save them into one global variable memory_map[]. It should include lowmem range, rdm range and highmem range. Note rdm range and highmem range may not exist in some cases. And here we need to check if any reserved memory conflicts with [RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END]. This range is used to allocate memory in hvmloder level, and we would lead hvmloader failed in case of conflict since its another rare possibility in real world. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com --- v6: * Nothing is changed. v5: * Nothing is changed. v4: * Move some codes related to e820 to that specific file, e820.c. * Consolidate printf()+BUG() and BUG_ON() * Avoid another fixed width type for the parameter of get_mem_mapping_layout() tools/firmware/hvmloader/e820.c | 35 +++ tools/firmware/hvmloader/e820.h | 7 +++ tools/firmware/hvmloader/hvmloader.c | 2 ++ tools/firmware/hvmloader/util.c | 26 ++ tools/firmware/hvmloader/util.h | 12 5 files changed, 82 insertions(+) diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c index 2e05e93..3e53c47 100644 --- a/tools/firmware/hvmloader/e820.c +++ b/tools/firmware/hvmloader/e820.c @@ -23,6 +23,41 @@ #include config.h #include util.h +struct e820map memory_map; + +void memory_map_setup(void) +{ +unsigned int nr_entries = E820MAX, i; +int rc; +uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1; +uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr; + +rc = get_mem_mapping_layout(memory_map.map, nr_entries); + +if ( rc || !nr_entries ) +{ +printf(Get guest memory maps[%d] failed. 
(%d)\n, nr_entries, rc); +BUG(); +} + +memory_map.nr_map = nr_entries; + +for ( i = 0; i nr_entries; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED ) +{ +if ( check_overlap(alloc_addr, alloc_size, + memory_map.map[i].addr, + memory_map.map[i].size) ) +{ +printf(Fail to setup memory map due to conflict); +printf( on dynamic reserved memory range.\n); +BUG(); +} +} +} +} + void dump_e820_table(struct e820entry *e820, unsigned int nr) { uint64_t last_end = 0, start, end; diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h index b2ead7f..8b5a9e0 100644 --- a/tools/firmware/hvmloader/e820.h +++ b/tools/firmware/hvmloader/e820.h @@ -15,6 +15,13 @@ struct e820entry { uint32_t type; } __attribute__((packed)); +#define E820MAX128 + +struct e820map { +unsigned int nr_map; +struct e820entry map[E820MAX]; +}; + #endif /* __HVMLOADER_E820_H__ */ /* diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c index 25b7f08..84c588c 100644 --- a/tools/firmware/hvmloader/hvmloader.c +++ b/tools/firmware/hvmloader/hvmloader.c @@ -262,6 +262,8 @@ int main(void) init_hypercalls(); +memory_map_setup(); + xenbus_setup(); bios = detect_bios(); diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c index 80d822f..122e3fa 100644 --- a/tools/firmware/hvmloader/util.c +++ b/tools/firmware/hvmloader/util.c @@ -27,6 +27,17 @@ #include xen/memory.h #include xen/sched.h +/* + * Check whether there exists overlap in the specified memory range. + * Returns true if exists, else returns false. 
+ */ +bool check_overlap(uint64_t start, uint64_t size, + uint64_t reserved_start, uint64_t reserved_size) +{ +return (start + size reserved_start) +(start reserved_start + reserved_size); +} + void wrmsr(uint32_t idx, uint64_t v) { asm volatile ( @@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid) *p = '\0'; } +int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries) +{ +int rc; +struct xen_memory_map memmap = { +.nr_entries = *max_entries +}; + +set_xen_guest_handle(memmap.buffer, entries); + +rc = hypercall_memory_op(XENMEM_memory_map, memmap); +*max_entries = memmap.nr_entries; + +return rc; +} + void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns) { static int over_allocated; diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h index f99c0f19..1100a3b 100644 --- a/tools/firmware/hvmloader/util.h +++ b/tools/firmware/hvmloader/util.h @@ -4,8 +4,10 @@
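The comparison operators in check_overlap() did not survive in this archived copy. A minimal sketch of the usual half-open interval test it appears to implement is shown below; the operators are an assumption reconstructed from the function's comment, and the _sketch names are hypothetical. The second helper mirrors the way memory_map_setup() treats a conflict as fatal, using assert() in place of hvmloader's BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [start, start + size) intersects
 * [reserved_start, reserved_start + reserved_size).
 * Ranges that merely touch at a boundary do not overlap.
 */
static bool check_overlap_sketch(uint64_t start, uint64_t size,
                                 uint64_t reserved_start,
                                 uint64_t reserved_size)
{
    return (start + size > reserved_start) &&
           (start < reserved_start + reserved_size);
}

/* Stand-in for the printf()+BUG() path: abort if the ranges conflict. */
static void require_no_conflict_sketch(uint64_t start, uint64_t size,
                                       uint64_t reserved_start,
                                       uint64_t reserved_size)
{
    assert(!check_overlap_sketch(start, size, reserved_start, reserved_size));
}
```

Note that the sketch does not guard against start + size wrapping around 2^64; callers are assumed to pass sane ranges, as the patch does.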
[Xen-devel] [v6][PATCH 07/16] hvmloader/e820: construct guest e820 table
Now we can use that memory map to build our final e820 table but it may need to reorder all e820 entries. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- v6: * Nothing is changed. v5: * Nothing is changed. v4: * Rename local variable, low_mem_pgend, to low_mem_end. * Improve some code comments * Adjust highmem after lowmem is changed. tools/firmware/hvmloader/e820.c | 80 + 1 file changed, 66 insertions(+), 14 deletions(-) diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c index 3e53c47..aa2569f 100644 --- a/tools/firmware/hvmloader/e820.c +++ b/tools/firmware/hvmloader/e820.c @@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820, unsigned int lowmem_reserved_base, unsigned int bios_image_base) { -unsigned int nr = 0; +unsigned int nr = 0, i, j; +uint64_t add_high_mem = 0; +uint64_t low_mem_end = hvm_info-low_mem_pgend PAGE_SHIFT; if ( !lowmem_reserved_base ) lowmem_reserved_base = 0xA; @@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820, e820[nr].type = E820_RESERVED; nr++; -/* Low RAM goes here. Reserve space for special pages. */ -BUG_ON((hvm_info-low_mem_pgend PAGE_SHIFT) (2u 20)); -e820[nr].addr = 0x10; -e820[nr].size = (hvm_info-low_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; -nr++; - /* * Explicitly reserve space for special pages. * This space starts at RESERVED_MEMBASE an extends to cover various @@ -194,16 +189,73 @@ int build_e820_table(struct e820entry *e820, nr++; } - -if ( hvm_info-high_mem_pgend ) +/* + * Construct E820 table according to recorded memory map. + * + * The memory map created by toolstack may include, + * + * #1. 
Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * + * #2. Reserved regions if they exist + * + * #3. High memory region if it exists + */ +for ( i = 0; i memory_map.nr_map; i++ ) { -e820[nr].addr = ((uint64_t)1 32); -e820[nr].size = -((uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; +e820[nr] = memory_map.map[i]; nr++; } +/* Low RAM goes here. Reserve space for special pages. */ +BUG_ON(low_mem_end (2u 20)); + +/* + * We may need to adjust real lowmem end since we may + * populate RAM to get enough MMIO previously. + */ +for ( i = 0; i memory_map.nr_map; i++ ) +{ +uint64_t end = e820[i].addr + e820[i].size; +if ( e820[i].type == E820_RAM + low_mem_end e820[i].addr low_mem_end end ) +{ +add_high_mem = end - low_mem_end; +e820[i].size = low_mem_end - e820[i].addr; +} +} + +/* + * And then we also need to adjust highmem. + */ +if ( add_high_mem ) +{ +for ( i = 0; i memory_map.nr_map; i++ ) +{ +if ( e820[i].type == E820_RAM + e820[i].addr (1ull 32)) +e820[i].size += add_high_mem; +} +} + +/* Finally we need to reorder all e820 entries. */ +for ( j = 0; j nr-1; j++ ) +{ +for ( i = j+1; i nr; i++ ) +{ +if ( e820[j].addr e820[i].addr ) +{ +struct e820entry tmp; +tmp = e820[j]; +e820[j] = e820[i]; +e820[i] = tmp; +} +} +} + return nr; } -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
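The final reordering step in build_e820_table() is a plain exchange sort on the entry base address. A self-contained sketch of that step follows; the struct is a simplified stand-in for hvmloader's e820entry, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for hvmloader's struct e820entry. */
struct e820entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Sort entries in place by ascending base address (exchange sort). */
static void sort_e820_by_addr(struct e820entry_sketch *e820, unsigned int nr)
{
    unsigned int i, j;

    assert(e820 != NULL || nr == 0);

    /* "j + 1 < nr" rather than "j < nr - 1" so that nr == 0 cannot wrap. */
    for (j = 0; j + 1 < nr; j++) {
        for (i = j + 1; i < nr; i++) {
            if (e820[j].addr > e820[i].addr) {
                struct e820entry_sketch tmp = e820[j];

                e820[j] = e820[i];
                e820[i] = tmp;
            }
        }
    }
}
```

With at most E820MAX (128) entries the quadratic cost is negligible, which is presumably why the patch does not bother with anything more elaborate.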
Re: [Xen-devel] Xen-unstable: pci-passthrough of device using MSI-X interrupts not working after commit x86/MSI: track host and guest masking separately
Tuesday, July 7, 2015, 6:08:25 PM, you wrote: On 26.06.15 at 17:48, li...@eikelenboom.it wrote: On 2015-06-26 17:22, Jan Beulich wrote: I have an idea: In static unsigned int startup_msi_irq(struct irq_desc *desc) { bool_t guest_masked = (desc-status IRQ_GUEST) is_hvm_domain(desc-msi_desc-dev-domain); if ( unlikely(!msi_set_mask_bit(desc, 0, guest_masked)) ) WARN(); return 0; } I think we need to also exclude the emuirq case (which is what I understand backs the pvhvm interrupt in the guest - Stefano, please confirm). For testing purposes, could you try simply passing zero instead of guest_masked here? I can confirm, with 0 it works ! Okay, here's something that hopefully could go in (provided of course it too works for you). Hi Jan, Just tested and it works fine :-) -- Sander Jan --- unstable.orig/xen/arch/x86/irq.c2015-07-07 17:56:52.0 +0200 +++ unstable/xen/arch/x86/irq.c 2015-07-07 17:04:08.0 +0200 @@ -2502,6 +2502,25 @@ int unmap_domain_pirq_emuirq(struct doma return ret; } +void arch_evtchn_bind_pirq(struct domain *d, int pirq) +{ +int irq = domain_pirq_to_irq(d, pirq); +struct irq_desc *desc; +unsigned long flags; + +if ( irq = 0 ) +return; + +if ( is_hvm_domain(d) ) +map_domain_emuirq_pirq(d, pirq, IRQ_PT); + +desc = irq_to_desc(irq); +spin_lock_irqsave(desc-lock, flags); +if ( desc-msi_desc ) +guest_mask_msi_irq(desc, 0); +spin_unlock_irqrestore(desc-lock, flags); +} + bool_t hvm_domain_use_pirq(const struct domain *d, const struct pirq *pirq) { return is_hvm_domain(d) pirq --- unstable.orig/xen/arch/x86/msi.c2015-07-07 17:56:53.0 +0200 +++ unstable/xen/arch/x86/msi.c 2015-07-07 16:50:02.0 +0200 @@ -422,10 +422,7 @@ void guest_mask_msi_irq(struct irq_desc static unsigned int startup_msi_irq(struct irq_desc *desc) { -bool_t guest_masked = (desc-status IRQ_GUEST) - is_hvm_domain(desc-msi_desc-dev-domain); - -msi_set_mask_bit(desc, 0, guest_masked); +msi_set_mask_bit(desc, 0, !!(desc-status IRQ_GUEST)); return 0; } --- 
unstable.orig/xen/common/event_channel.c2015-07-07 17:56:51.0 +0200 +++ unstable/xen/common/event_channel.c 2015-07-07 16:53:47.0 +0200 @@ -456,10 +456,7 @@ static long evtchn_bind_pirq(evtchn_bind bind-port = port; -#ifdef CONFIG_X86 -if ( is_hvm_domain(d) domain_pirq_to_irq(d, pirq) 0 ) -map_domain_emuirq_pirq(d, pirq, IRQ_PT); -#endif +arch_evtchn_bind_pirq(d, pirq); out: spin_unlock(d-event_lock); --- unstable.orig/xen/include/asm-arm/irq.h 2015-07-07 17:56:49.0 +0200 +++ unstable/xen/include/asm-arm/irq.h 2015-07-07 17:02:00.0 +0200 @@ -48,6 +48,8 @@ int release_guest_irq(struct domain *d, void arch_move_irqs(struct vcpu *v); +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq))) + /* Set IRQ type for an SPI */ int irq_set_spi_type(unsigned int spi, unsigned int type); --- unstable.orig/xen/include/xen/irq.h 2015-07-07 17:56:49.0 +0200 +++ unstable/xen/include/xen/irq.h 2015-07-07 17:02:49.0 +0200 @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir unsigned int arch_hwdom_irqs(domid_t); #endif +#ifndef arch_evtchn_bind_pirq +void arch_evtchn_bind_pirq(struct domain *, int pirq); +#endif + #endif /* __XEN_IRQ_H__ */ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
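The behaviour the patch is after can be modelled with two independent mask bits. The sketch below assumes the interrupt is delivered only when neither the host nor the guest mask is set, which is how I read the "track host and guest masking separately" commit; the struct and function names are hypothetical and nothing here touches real MSI registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two mask bits kept per MSI. */
struct msi_mask_sketch {
    bool host_masked;
    bool guest_masked;
};

/* Assumption: the effective mask is the OR of both bits. */
static bool msi_effectively_masked(const struct msi_mask_sketch *m)
{
    assert(m != NULL);
    return m->host_masked || m->guest_masked;
}

/*
 * Models the patched startup_msi_irq(): the host side is unmasked,
 * but an IRQ owned by a guest stays masked on the guest side.
 */
static void sketch_startup(struct msi_mask_sketch *m, bool is_guest_irq)
{
    m->host_masked = false;
    m->guest_masked = is_guest_irq;
}

/*
 * Models arch_evtchn_bind_pirq() calling guest_mask_msi_irq(desc, 0):
 * binding the event channel clears the guest-side mask.
 */
static void sketch_bind_evtchn(struct msi_mask_sketch *m)
{
    m->guest_masked = false;
}
```

In this model a guest IRQ stays masked after startup until the event channel is bound, while a non-guest IRQ is live immediately, which matches the test result reported in the thread.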
Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, July 08, 2015 4:13 PM To: Wu, Feng Cc: Andrew Cooper; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: RE: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64 On 08.07.15 at 09:06, feng...@intel.com wrote: -Original Message- From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper Sent: Thursday, June 25, 2015 2:35 AM To: Wu, Feng; xen-devel@lists.xen.org Cc: george.dun...@eu.citrix.com; Zhang, Yang Z; Tian, Kevin; k...@xen.org; jbeul...@suse.com Subject: Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64 On 24/06/15 06:18, Feng Wu wrote: This patch adds cmpxchg16b support for x86-64, so software can perform 128-bit atomic write/read. Signed-off-by: Feng Wu feng...@intel.com --- v3: Newly added. xen/include/asm-x86/x86_64/system.h | 28 xen/include/xen/types.h | 5 + 2 files changed, 33 insertions(+) diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h index 662813a..a910d00 100644 --- a/xen/include/asm-x86/x86_64/system.h +++ b/xen/include/asm-x86/x86_64/system.h @@ -6,6 +6,34 @@ (unsigned long)(n),sizeof(*(ptr /* + * Atomic 16 bytes compare and exchange. Compare OLD with MEM, if + * identical, store NEW in MEM. Return the initial value in MEM. + * Success is indicated by comparing RETURN with OLD. + * + * This function can only be called when cpu_has_cx16 is ture. + */ + +static always_inline uint128_t __cmpxchg16b( +volatile void *ptr, uint128_t old, uint128_t new) It is not nice for register scheduling taking uint128_t's by value. Instead, I would pass them by pointer and let the inlining sort the eventual references out. +{ +uint128_t prev; + +ASSERT(cpu_has_cx16); Given that if this assertion were to fail, cmpxchg16b would fail with #UD, I would hand-code a asm_fixup section which in turn panics. 
This avoids a situation where non-debug builds could die with an unqualified #UD exception. Is there an existing way to panic the hypervisor in assembler code, I don't find it, it would be appreciated if you can point it out. I'm not convinced such a #UD would be a significant problem: Looking at the disassembly will show the cause right away. The out of line ud2-s in some of VMX'es inline assembly wrappers are far worse. So, do you agree with the fixup section or not? As to panic()ing from assembly code: movq$string-label, %rdi callpanic Also, you must enforce 16-byte alignment of the memory reference, as described in the manual. What should I do if the caller passes an non 16-byte alignment data (struct iremap_entry in this case) ? Do this mean I need to define it like this? struct iremap_entry { .. } __attribute__ ((aligned (16))); How would that help? The table entries hardware uses are supposed to be 16-byte aligned anyway, aren't they? Oh, yes, the base address of the remapping table is 4K aligned. I think Andrew's enforce really means ASSERT() or BUG_ON(), again to avoid an unqualified exception. However - see above. Plus, all that said, without having seen the actual use sites of cmpxchg16b yet, I'm not at all convinced we really need this patch. After introducing posted format in IRTE, some fields exist in both the High 64 bit and the low 64 bit,such as pda_h and pda_l, how to make sure it is atomic when updating the pda field? Thanks, Feng Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
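For the alignment point raised above, the check itself is a one-liner. A minimal sketch, with hypothetical helper names, of what an ASSERT()/BUG_ON() style guard in front of cmpxchg16b could test (plain assert() stands in for the hypervisor macros here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if ptr is aligned to a 16-byte boundary. */
static bool is_16byte_aligned(const volatile void *ptr)
{
    return ((uintptr_t)ptr & 15) == 0;
}

/* Stand-in for ASSERT()/BUG_ON(): fail loudly before the instruction would. */
static void require_16byte_aligned(const volatile void *ptr)
{
    assert(is_16byte_aligned(ptr));
}
```

Since the remapping table base is 4K aligned and each entry is 16 bytes, every entry address passes this check, so the guard would only ever fire on a caller bug.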
Re: [Xen-devel] [PATCH v10 00/13] enable Cache Allocation Technology (CAT) for VMs
On Wed, Jul 08, 2015 at 05:40:47PM +0800, Chao Peng wrote:
> On Tue, Jul 07, 2015 at 03:46:21PM +0100, Ian Campbell wrote:
> > On Fri, 2015-06-26 at 16:43 +0800, Chao Peng wrote:
> > > Chao Peng (13):
> > >   x86: add socket_cpumask
> > >   x86: detect and initialize Intel CAT feature
> > >   x86: maintain COS to CBM mapping for each socket
> > >   x86: add COS information for each domain
> > >   x86: expose CBM length and COS number information
> > >   x86: dynamically get/set CBM for a domain
> > >   x86: add scheduling support for Intel CAT
> > >   xsm: add CAT related xsm policies
> >
> > Jan applied to here. So I was going to apply these 5:
> >
> > >   tools/libxl: minor name changes for CMT commands
> > >   tools/libxl: add command to show PSR hardware info
> > >   tools/libxl: introduce some socket helpers
> > >   tools: add tools support for Intel CAT
> > >   docs: add xl-psr.markdown
> >
> > But, on i686 I see:
> >
> > xl_cmdimpl.c: In function ‘psr_cat_hwinfo’:
> > xl_cmdimpl.c:8390:16: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ [-Werror=format=]
> >                 (1ul << info->cbm_len) - 1);
> >                 ^
> > xl_cmdimpl.c: In function ‘psr_cat_print_socket’:
> > xl_cmdimpl.c:8450:5: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ [-Werror=format=]
> >      printf("%-16s: %#"PRIx64"\n", "Default CBM", (1ul << info->cbm_len) - 1);
> >      ^
> > cc1: all warnings being treated as errors
> >
> > It seems there is some mismatch between your types and the printf formats used.
> >
> > The appropriate format specifier for an unsigned long (which you have from the ul in the constant) is %#lx and not %#PRIxXX which is associated with uintXX_t types.
> >
> > If you need a 64 bit type then you might have meant instead to use ull in which case you want %#llx as the format specifier.
>
> This is what I need. Thanks for suggestion.

Chao, 4.6 freeze is on Friday. Can you fix that minor bug and repost your series within two days?

Wei.

> Chao
>
> > If you really want/need an exactly 64 bit type then you'll have to do some nasty casting, something like
> >     ((uint64_t)1) << info->cbm_len) - 1
> > or something, that's pretty ugly though. If you have to go this route then please test both builds, in case I've gotten my ()'s wrong.
> >
> > Ian.
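To make the format/type pairing from this exchange concrete, here is a small sketch that builds the mask as a uint64_t and prints it with the matching PRIx64 macro, so it compiles cleanly on both 32-bit and 64-bit builds. The function names are hypothetical and this is not the code that was eventually committed to xl.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* All-ones mask of cbm_len bits, computed in exactly 64 bits. */
static uint64_t cbm_mask(unsigned int cbm_len)
{
    /* Shifting by the full width is undefined, so handle it explicitly. */
    if (cbm_len >= 64)
        return UINT64_MAX;
    return (((uint64_t)1) << cbm_len) - 1;
}

/* Format the mask into buf; returns what snprintf() returns. */
static int format_default_cbm(char *buf, size_t len, unsigned int cbm_len)
{
    assert(buf != NULL);
    /* uint64_t pairs with PRIx64; "ul" would pair with %lx, "ull" with %llx. */
    return snprintf(buf, len, "%-16s: %#" PRIx64, "Default CBM",
                    cbm_mask(cbm_len));
}
```

The design point is simply that the constant's type and the conversion specifier have to be chosen together: 1ul with %lx, 1ull with %llx, or a uint64_t expression with PRIx64.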
Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used
> -----Original Message-----
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 6:00 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang, Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used
>
> > From: Wu, Feng
> > Sent: Wednesday, June 24, 2015 1:18 PM
> >
> > This patch adds an API which is used to update the IRTE for posted-interrupt when guest changes MSI/MSI-X information.
> >
> > Signed-off-by: Feng Wu feng...@intel.com
>
> Acked-by: Kevin Tian kevin.t...@intel.com, with one small comment:
>
> > +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint8_t gvec)
> > +{
> > +    struct irq_desc *desc;
> > +    struct msi_desc *msi_desc;
> > +    int remap_index;
> > +    int rc = 0;
> > +    struct pci_dev *pci_dev;
> > +    struct acpi_drhd_unit *drhd;
> > +    struct iommu *iommu;
> > +    struct ir_ctrl *ir_ctrl;
> > +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> > +    struct iremap_entry new_ire;
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +    unsigned long flags;
> > +    uint128_t old_ire, ret;
> > +
> > +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> > +    if ( !desc )
> > +        return -ENOMEM;
>
> -EINVAL?

I think -EINVAL is reasonable.

Thanks,
Feng
Re: [Xen-devel] [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
Hi,

At 17:38 + on 07 Jul (1436290689), Sahita, Ravi wrote:
> In order to make forward progress, do the other maintainers (Jan, Andrew, Tim) agree with the patch direction that George has suggested for this particular patch?

I'm no longer a maintainer for this code, but FWIW I think that this direction (adding a new argument to the internal APIs rather than adding new internal APIs) is correct.

Because the sve bit must be _set_ to get the old/default behaviour, I think the p2m_pt implementation should always return sve = 1 on _get and possibly also assert sve != 0 on _set.

Cheers,

Tim.
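A rough sketch of the suggestion for the p2m_pt side, purely as an illustration: the function names and the bool parameter are hypothetical and do not reflect the actual internal p2m API, and assert() stands in for the hypervisor's ASSERT().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of a _get hook for an implementation that cannot clear the
 * suppress-#VE bit: it always reports sve as set (the default behaviour).
 */
static void sketch_pt_get_entry(bool *sve)
{
    if (sve != NULL)
        *sve = true;
}

/*
 * Model of the matching _set hook: a caller asking for sve == 0 is a bug
 * for this implementation, so assert rather than silently ignore it.
 */
static int sketch_pt_set_entry(bool sve)
{
    assert(sve);
    return 0;
}
```

The point of the pairing is that a value read back from _get can always be passed to _set unchanged without tripping the assertion.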
Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d - ffff82d080239d85 and other dom0 induced log messages
On 08/07/2015 11:04, Sander Eikelenboom wrote: Wednesday, July 8, 2015, 10:58:02 AM, you wrote: On 08/07/2015 09:45, Sander Eikelenboom wrote: Monday, July 6, 2015, 11:33:09 AM, you wrote: On 26.06.15 at 17:57, li...@eikelenboom.it wrote: On 2015-06-26 17:51, Jan Beulich wrote: On 26.06.15 at 17:41, li...@eikelenboom.it wrote: from 3.16 to 3.19 we gained a lot of these, if i remember correctly related to perf being enabled in the kernel: + traps.c:2655:d0v0 Domain attempted WRMSR c081 from 0xe023e008 to 0x00230010. + traps.c:2655:d0v0 Domain attempted WRMSR c082 from 0x82d0b000 to 0x81bc2670. + traps.c:2655:d0v0 Domain attempted WRMSR c083 from 0x82d0b020 to 0x81bc4630. These are the SYSCALL (STAR) MSRs, which the kernel has no business touching when running on Xen. from 3.19 to 4.0 we gained: + d0 attempted to change d0v0's CR4 flags 0660 - 0760 + d0 attempted to change d0v1's CR4 flags 0660 - 0760 + d0 attempted to change d0v2's CR4 flags 0660 - 0760 + d0 attempted to change d0v3's CR4 flags 0660 - 0760 + d0 attempted to change d0v4's CR4 flags 0660 - 0760 + d0 attempted to change d0v5's CR4 flags 0660 - 0760 This is X86_CR4_PCE - not sure how to properly handle that. Andrew, you're fiddling with the CR4 handling right now anyway - any thoughts? and from 4.0 to 4.1 we gained the ones you were interested in: + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 For these to be meaningful you need to translate them to symbolic addresses. (And yes, we should see to make the code print them in a more useful manner.) How ? addr2line against xen-syms (or xen.efi if you use that one). And of course the result may need manual adjustment to account for eventual patches you have in your tree. Jan Ah yeah .. silly me .. 
somehow i had in mind it would be kernel addresses instead of xen, so running it against vmlinux of course lead no where. Here we go: (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 - 82d080239d85 (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 - 82d080239d85 which leads to: # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080195583 /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080239d85 ??:? The second one is not. It is the fixup label, which will be hidden away out-of-line, and lacking debug symbols. Were /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to: case MSR_EFER: rdmsr_normal: /* Everyone can read the MSR space. */ /* gdprintk(XENLOG_WARNING,Domain attempted RDMSR %p.\n, _p(regs-ecx));*/ HERE --if ( rdmsr_safe(regs-ecx, val) ) goto fail; Moving the printk into the fail case will identify which is the problematic MSR. We need the value of regs-_ecx here (the low 32bits, not the full 64 as the commented printk currently has). I have a small todo list of misc debugging improvements. I will add this to the list. ~Andrew rdmsr_writeback: regs-eax = (uint32_t)val; regs-edx = (uint32_t)(val 32); break; } break; Don't know if the full 64bits is of equal use It is (just with an unhelpful quantity of zeroes) , but here it is: (XEN) [2015-07-08 10:01:58.717] traps.c:2760:d14v0 Domain attempted but failed RDMSR 0570. Looks to be MSR_IA32_RTIT_CTL, which is part of the Intel Processor Trace PMU driver (Linux/arch/x86/kernel/cpu/perf_event_intel_pt.c). A PV domain running on AMD absolutely shouldn't be attempting to read this. It appears that pt_init() blindly probes the MSR without any cpuid/vendor detection. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 2/3] arm: Allow the user to specify the GIC version
On Tue, 2015-07-07 at 17:22 +0100, Julien Grall wrote: diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index e1632fa..11f6461 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -369,6 +369,12 @@ libxl_vnode_info = Struct(vnode_info, [ (vcpus, libxl_bitmap), # vcpus in this node ]) +libxl_gic_version = Enumeration(gic_version, [ +(0, DEFAULT), +(0x20, v2), +(0x30, v3) +], init_val = LIBXL_GIC_VERSION_DEFAULT) + libxl_domain_build_info = Struct(domain_build_info,[ (max_vcpus, integer), (avail_vcpus, libxl_bitmap), @@ -480,6 +486,11 @@ libxl_domain_build_info = Struct(domain_build_info,[ ])), (invalid, None), ], keyvar_init_val = LIBXL_DOMAIN_TYPE_INVALID)), + + +(arch_arm, Struct(None, [(gic_version, libxl_gic_version), + ])), + ], dir=DIR_IN This results in the following when building the ocaml bindings: Traceback (most recent call last): File genwrap.py, line 529, in module ml.write(gen_ocaml_ml(ty, False)) File genwrap.py, line 217, in gen_ocaml_ml s += gen_struct(ty) File genwrap.py, line 119, in gen_struct x = ocaml_instance_of_field(f) File genwrap.py, line 112, in ocaml_instance_of_field return %s : %s % (munge_name(name), ocaml_type_of(f.type)) File genwrap.py, line 90, in ocaml_type_of return ty.rawname.capitalize() + .t AttributeError: 'NoneType' object has no attribute 'capitalize' make[7]: *** No rule to make target '_libxl_types.ml.in', needed by 'xenlight.ml'. Stop. I'll take a look. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [linux-3.4 test] 59139: regressions - FAIL
flight 59139 linux-3.4 real [real] http://logs.test-lab.xenproject.org/osstest/logs/59139/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemut-win7-amd64 6 xen-boot fail REGR. vs. 30511 Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-sedf-pin 6 xen-boot fail in 58831 pass in 58798 test-amd64-amd64-xl 6 xen-boot fail in 59091 pass in 59139 test-amd64-amd64-pair10 xen-boot/dst_host fail pass in 58798 test-amd64-amd64-pair 9 xen-boot/src_host fail pass in 58798 test-amd64-i386-pair 10 xen-boot/dst_host fail pass in 58831 test-amd64-i386-pair 9 xen-boot/src_host fail pass in 58831 test-amd64-i386-xl-qemuu-win7-amd64 9 windows-install fail pass in 59091 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested test-amd64-amd64-xl-multivcpu 6 xen-boot fail baseline untested test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested test-amd64-amd64-libvirt-xsm 6 xen-bootfail baseline untested test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 6 xen-boot fail baseline untested test-amd64-i386-libvirt-xsm 6 xen-bootfail baseline untested test-amd64-amd64-xl-credit2 6 xen-bootfail baseline untested test-amd64-i386-xl-xsm6 xen-bootfail baseline untested test-amd64-amd64-xl-xsm 6 xen-bootfail baseline untested test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate fail baseline untested test-amd64-amd64-xl-sedf 6 xen-boot fail in 58831 like 30406 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate/x10 fail in 59091 baseline untested test-amd64-i386-libvirt 11 guest-start fail like 30511 test-amd64-amd64-libvirt 11 guest-start fail like 30511 
test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 30511 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 30511 test-amd64-amd64-xl-qemuu-ovmf-amd64 6 xen-bootfail like 53709-bisect test-amd64-i386-xl6 xen-bootfail like 53725-bisect test-amd64-i386-freebsd10-amd64 6 xen-boot fail like 58780-bisect test-amd64-i386-xl-qemuu-winxpsp3 6 xen-boot fail like 58786-bisect test-amd64-i386-qemut-rhel6hvm-intel 6 xen-bootfail like 58788-bisect test-amd64-i386-rumpuserxen-i386 6 xen-bootfail like 58799-bisect test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 6 xen-bootfail like 58801-bisect test-amd64-amd64-xl-qemuu-debianhvm-amd64 6 xen-boot fail like 58803-bisect test-amd64-amd64-xl-qemut-winxpsp3 6 xen-boot fail like 58804-bisect test-amd64-i386-freebsd10-i386 6 xen-boot fail like 58805-bisect test-amd64-i386-xl-qemuu-ovmf-amd64 6 xen-boot fail like 58806-bisect test-amd64-amd64-xl-qemuu-winxpsp3 6 xen-boot fail like 58807-bisect test-amd64-i386-xl-qemut-winxpsp3 6 xen-boot fail like 58808-bisect test-amd64-i386-xl-qemut-winxpsp3-vcpus1 6 xen-bootfail like 58809-bisect test-amd64-amd64-rumpuserxen-amd64 6 xen-boot fail like 58810-bisect test-amd64-i386-xl-qemuu-debianhvm-amd64 6 xen-bootfail like 58811-bisect test-amd64-amd64-xl-qemut-debianhvm-amd64 6 xen-boot fail like 58813-bisect test-amd64-i386-qemuu-rhel6hvm-intel 6 xen-bootfail like 58814-bisect test-amd64-i386-xl-qemut-debianhvm-amd64 6 xen-bootfail like 58815-bisect Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail in 58831 never pass test-amd64-i386-libvirt 12 migrate-support-check fail in 58831 never pass test-amd64-amd64-libvirt 12 migrate-support-check fail in 58831 never pass test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail in 59091 never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass version targeted for testing: 
linuxcf1b3dad6c5699b977273276bada8597636ef3e2 baseline version: linuxbb4a05a0400ed6d2f1e13d1f82f289ff74300a70 Last test of basis30511 2014-09-29 16:37:46 Z 281 days Failing since 32004 2014-12-02 04:10:03 Z 218 days 167 attempts Testing same since58781 2015-06-20 14:15:50 Z 17 days 21 attempts 500 people touched revisions under test,
Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy
On Wed, 2015-07-08 at 08:54 +0800, Chen, Tiejun wrote:

+none is the default value and it means we don't check any reserved regions
+and then all rdm policies would be ignored. Guest just works as before and
+the conflict of RDM and guest address space wouldn't be handled, and then
+this may result in the associated device not being able to work or even crash
+the VM. So if you're assigning this kind of device, this option is not
+recommended unless you can make sure any conflict doesn't exist.
+

One issue didn't come to conclusion during last round of review. Ian was asking what's the difference with type=none vs not specifying rdm option at all. You need to either convince Ian or remove type=none in *xl* level. I.e. don't touch the libxl IDL. It still needs a none type.

I'll update this next revision. And also rephrase this doc to address your comments below.

FTR I think I indicated yesterday that I was satisfied with your explanation for why type=none exists as an option even at the xl level, namely that it allows us to change the default in the future.

Ian.
Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d - ffff82d080239d85 and other dom0 induced log messages
On 08/07/2015 09:45, Sander Eikelenboom wrote: Monday, July 6, 2015, 11:33:09 AM, you wrote: On 26.06.15 at 17:57, li...@eikelenboom.it wrote: On 2015-06-26 17:51, Jan Beulich wrote: On 26.06.15 at 17:41, li...@eikelenboom.it wrote: from 3.16 to 3.19 we gained a lot of these, if i remember correctly related to perf being enabled in the kernel: + traps.c:2655:d0v0 Domain attempted WRMSR c081 from 0xe023e008 to 0x00230010. + traps.c:2655:d0v0 Domain attempted WRMSR c082 from 0x82d0b000 to 0x81bc2670. + traps.c:2655:d0v0 Domain attempted WRMSR c083 from 0x82d0b020 to 0x81bc4630. These are the SYSCALL (STAR) MSRs, which the kernel has no business touching when running on Xen. from 3.19 to 4.0 we gained: + d0 attempted to change d0v0's CR4 flags 0660 - 0760 + d0 attempted to change d0v1's CR4 flags 0660 - 0760 + d0 attempted to change d0v2's CR4 flags 0660 - 0760 + d0 attempted to change d0v3's CR4 flags 0660 - 0760 + d0 attempted to change d0v4's CR4 flags 0660 - 0760 + d0 attempted to change d0v5's CR4 flags 0660 - 0760 This is X86_CR4_PCE - not sure how to properly handle that. Andrew, you're fiddling with the CR4 handling right now anyway - any thoughts? and from 4.0 to 4.1 we gained the ones you were interested in: + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 + traps.c:3227: GPF (): 82d080194a4d - 82d080239d85 For these to be meaningful you need to translate them to symbolic addresses. (And yes, we should see to make the code print them in a more useful manner.) How ? addr2line against xen-syms (or xen.efi if you use that one). And of course the result may need manual adjustment to account for eventual patches you have in your tree. Jan Ah yeah .. silly me .. 
somehow i had in mind it would be kernel addresses instead of xen, so running it against vmlinux of course lead no where. Here we go: (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 - 82d080239d85 (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 - 82d080239d85 which leads to: # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080195583 /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080239d85 ??:? The second one is not. It is the fixup label, which will be hidden away out-of-line, and lacking debug symbols. Were /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to: case MSR_EFER: rdmsr_normal: /* Everyone can read the MSR space. */ /* gdprintk(XENLOG_WARNING,Domain attempted RDMSR %p.\n, _p(regs-ecx));*/ HERE --if ( rdmsr_safe(regs-ecx, val) ) goto fail; Moving the printk into the fail case will identify which is the problematic MSR. We need the value of regs-_ecx here (the low 32bits, not the full 64 as the commented printk currently has). I have a small todo list of misc debugging improvements. I will add this to the list. ~Andrew rdmsr_writeback: regs-eax = (uint32_t)val; regs-edx = (uint32_t)(val 32); break; } break; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
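A small sketch of the logging change suggested above, assuming a helper that is only called on the failure path. The names `msr_index_from_rcx` and `format_failed_rdmsr` are hypothetical and stand in for a printk placed in the fail case; the point is that the MSR index is the low 32 bits of the register, so the upper half is dropped before printing.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The MSR index is carried in ECX, i.e. the low 32 bits of RCX. */
static uint32_t msr_index_from_rcx(uint64_t rcx)
{
    return (uint32_t)rcx;
}

/*
 * Hypothetical stand-in for a message emitted only when the read
 * faulted. Returns the number of characters written, as snprintf does.
 */
static int format_failed_rdmsr(char *buf, size_t len, uint64_t rcx)
{
    assert(buf != NULL && len > 0);

    return snprintf(buf, len,
                    "Domain attempted but failed RDMSR %08" PRIx32 ".",
                    msr_index_from_rcx(rcx));
}
```

Printing the full 64-bit value would carry the same information, only with whatever happens to sit in the upper half of the register in front of it.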
[Xen-devel] [linux-4.1 test] 59143: regressions - FAIL
flight 59143 linux-4.1 real [real] http://logs.test-lab.xenproject.org/osstest/logs/59143/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate/x10 fail REGR. vs. 59031 Regressions which are regarded as allowable (not blocking): test-amd64-i386-libvirt 11 guest-start fail REGR. vs. 59031 Tests which did not succeed, but are not blocking: test-amd64-i386-freebsd10-amd64 9 freebsd-install fail never pass test-amd64-i386-freebsd10-i386 9 freebsd-install fail never pass test-amd64-amd64-xl-pvh-intel 13 guest-saverestorefail never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-i386-libvirt-xsm 11 guest-start fail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 11 guest-start fail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass version targeted for testing: linux6a010c0abd49388a49af3d5a5bfc00e0d5767607 baseline version: linuxb953c0d234bc72e8489d3bf51a276c5c4ec85345 Last test of basis59031 2015-07-02 23:39:59 Z5 days Testing same since59054 2015-07-05 10:20:43 Z2 days3 attempts People who touched revisions under test: 
Alexander Shishkin alexander.shish...@linux.intel.com Alexey Sokolov soko...@7pikes.com Andi Kleen a...@linux.intel.com Arnaldo Carvalho de Melo a...@redhat.com Borislav Petkov b...@alien8.de Borislav Petkov b...@suse.de Dmitry Tunin hanipouspi...@gmail.com Greg Kroah-Hartman gre...@linuxfoundation.org Imre Palik im...@amazon.de Ingo Molnar mi...@kernel.org Jiri Olsa jo...@kernel.org Kalle Valo kv...@codeaurora.org Lukas Wunner lu...@wunner.de Marcel Holtmann mar...@holtmann.org Oleg Nesterov o...@redhat.com Palik, Imre im...@amazon.de Peter Zijlstra (Intel) pet...@infradead.org RafaÅ MiÅecki zaj...@gmail.com jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass build-amd64-rumpuserxen pass build-i386-rumpuserxen pass test-amd64-amd64-xl pass test-armhf-armhf-xl pass test-amd64-i386-xl pass test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmpass test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsmpass test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm fail test-amd64-amd64-libvirt-xsm pass test-armhf-armhf-libvirt-xsm pass test-amd64-i386-libvirt-xsm fail test-amd64-amd64-xl-xsm pass test-armhf-armhf-xl-xsm pass test-amd64-i386-xl-xsm
Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set
From: Wu, Feng
Sent: Wednesday, June 24, 2015 1:18 PM

Currently, we don't support urgent interrupt, all interrupts
are recognized as non-urgent interrupt, so we cannot send
posted-interrupt when 'SN' is set.

Signed-off-by: Feng Wu feng...@intel.com
---
v3:
use cmpxchg to test SN/ON and set ON

 xen/arch/x86/hvm/vmx/vmx.c | 32
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 0837627..b94ef6a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1686,6 +1686,8 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)

 static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
 {
+    struct pi_desc old, new, prev;
+

move to 'else if'.

     if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
         return;

@@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
          */
         pi_set_on(&v->arch.hvm_vmx.pi_desc);
     }
-    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
+    else
     {
+        prev.control = 0;
+
+        do {
+            old.control = v->arch.hvm_vmx.pi_desc.control &
+                          ~(1 << POSTED_INTR_ON | 1 << POSTED_INTR_SN);
+            new.control = v->arch.hvm_vmx.pi_desc.control |
+                          1 << POSTED_INTR_ON;
+
+            /*
+             * Currently, we don't support urgent interrupt, all
+             * interrupts are recognized as non-urgent interrupt,
+             * so we cannot send posted-interrupt when 'SN' is set.
+             * Besides that, if 'ON' is already set, we cannot set
+             * posted-interrupts as well.
+             */
+            if ( prev.sn || prev.on )
+            {
+                vcpu_kick(v);
+                return;
+            }

would it make more sense to move above check after cmpxchg?

+
+            prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
+                                   old.control, new.control);
+        } while ( prev.control != old.control );
+
         __vmx_deliver_posted_interrupt(v);
-        return;
     }
-
-    vcpu_kick(v);
 }

 static void vmx_sync_pir_to_irr(struct vcpu *v)
--
2.1.0
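To make the ordering question in the review concrete, here is a minimal, self-contained sketch of a "set ON unless ON or SN is already set" loop. It uses C11 atomics rather than Xen's cmpxchg(), and the bit positions and the name `pi_try_set_on` are assumptions made for illustration, not the patch's code. The check is performed on the freshly observed value in every iteration, so a bit that gets set concurrently is noticed after a failed compare-and-swap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions within the 64-bit control word. */
#define PI_ON (UINT64_C(1) << 0)  /* outstanding notification */
#define PI_SN (UINT64_C(1) << 1)  /* suppress notification */

/*
 * Returns true if this call set ON (the caller would then send the
 * notification), false if ON or SN was already set (the caller would
 * fall back to kicking the vCPU).
 */
static bool pi_try_set_on(_Atomic uint64_t *control)
{
    uint64_t old;

    assert(control != NULL);
    old = atomic_load(control);

    do {
        /* Re-evaluated on every pass against the latest value. */
        if (old & (PI_ON | PI_SN))
            return false;
        /* On failure the CAS stores the current value back into 'old'. */
    } while (!atomic_compare_exchange_weak(control, &old, old | PI_ON));

    return true;
}
```

A caller in the spirit of the patch would deliver the posted interrupt when this returns true and kick the vCPU otherwise.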
Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel
Hi,

On 08/07/2015 09:56, Jan Beulich wrote:

    --- a/xen/include/asm-arm/irq.h
    +++ b/xen/include/asm-arm/irq.h
    @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,

     void arch_move_irqs(struct vcpu *v);

    +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
    +

This addition is here in order to ensure that d and pirq are evaluated, right? If so, I didn't find it obvious to understand.

Why didn't you use a static inline? Or maybe add a comment explicitly saying this is not implemented.

Regards,

--
Julien Grall
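For comparison, a sketch of what the static inline alternative could look like. This is an illustration only, with `struct domain` left opaque; it is not the code that was committed. Like the macro, it evaluates both arguments exactly once, and it additionally has the compiler check their types.

```c
#include <assert.h> /* only needed for checks such as the one below */

struct domain; /* opaque here; the real definition lives elsewhere */

/* No-op stub: nothing to do on this architecture. */
static inline void arch_evtchn_bind_pirq(struct domain *d, int pirq)
{
    (void)d;
    (void)pirq;
}
```

A call such as `arch_evtchn_bind_pirq(d, next_pirq())` still runs `next_pirq()` once, and passing something that is not a `struct domain *` would be diagnosed at compile time, which the `(d) + (pirq)` form does not guarantee.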
Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy
On Wed, 2015-07-08 at 17:06 +0800, Chen, Tiejun wrote:

#2. Don't expose ignore to user and just keep host as the default

He told me he would discuss this with you, but sounds he didn't do this, or I'm missing something here?

My question was regarding how xl rdm=type=none differed from not saying anything (i.e. getting the default). You explained that this was useful to allow the default to be changed, which I agreed with.

The question regarding the actual naming of the options at either the xl level or the libxl level (which seems to be what Ian J's comments were on) is orthogonal to the question of whether there should be a way to explicitly ask for the default (as opposed to implicitly asking for it by omission of the option).

Ian.
Re: [Xen-devel] [PATCH v4 1/7] libxl: get rid of the SEDF scheduler
On Tue, 2015-07-07 at 18:43 +0200, Dario Faggioli wrote: only the interface is left in place, for backward compile-time compatibility, but every attempt to use it would throw an error. Signed-off-by: Dario Faggioli dario.faggi...@citrix.com --- Cc: George Dunlap george.dun...@eu.citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com Cc: Wei Liu wei.l...@citrix.com Changes from v3: - drop George's Rev-by: which should not be there since v2; - better grouping of fields in libxl_domain_sched_params, as suggested during review; - improved comment for ERROR_FEATURE_REMOVED, as suggested during review. Changes from v2: - introduce and use ERROR_FEATURE_REMOVED, as requested during review; - mark the SEDF only parameter as deprecated in libxl_types.idl, as requested during review. --- tools/libxl/libxl.c | 73 ++- tools/libxl/libxl_create.c | 61 tools/libxl/libxl_types.idl |8 - 3 files changed, 11 insertions(+), 131 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 3a83903..38aff8d 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -5728,73 +5728,6 @@ static int sched_credit2_domain_set(libxl__gc *gc, uint32_t domid, return 0; } -static int sched_sedf_domain_get(libxl__gc *gc, uint32_t domid, - libxl_domain_sched_params *scinfo) -{ -uint64_t period; -uint64_t slice; -uint64_t latency; -uint16_t extratime; -uint16_t weight; -int rc; - -rc = xc_sedf_domain_get(CTX-xch, domid, period, slice, latency, -extratime, weight); -if (rc != 0) { -LOGE(ERROR, getting domain sched sedf); -return ERROR_FAIL; -} - -libxl_domain_sched_params_init(scinfo); -scinfo-sched = LIBXL_SCHEDULER_SEDF; -scinfo-period = period / 100; -scinfo-slice = slice / 100; -scinfo-latency = latency / 100; -scinfo-extratime = extratime; -scinfo-weight = weight; - -return 0; -} - -static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid, - const libxl_domain_sched_params 
*scinfo) -{ -uint64_t period; -uint64_t slice; -uint64_t latency; -uint16_t extratime; -uint16_t weight; - -int ret; - -ret = xc_sedf_domain_get(CTX-xch, domid, period, slice, latency, -extratime, weight); -if (ret != 0) { -LOGE(ERROR, getting domain sched sedf); -return ERROR_FAIL; -} - -if (scinfo-period != LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT) -period = (uint64_t)scinfo-period * 100; -if (scinfo-slice != LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT) -slice = (uint64_t)scinfo-slice * 100; -if (scinfo-latency != LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT) -latency = (uint64_t)scinfo-latency * 100; -if (scinfo-extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT) -extratime = scinfo-extratime; -if (scinfo-weight != LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT) -weight = scinfo-weight; - -ret = xc_sedf_domain_set(CTX-xch, domid, period, slice, latency, -extratime, weight); -if ( ret 0 ) { -LOGE(ERROR, setting domain sched sedf); -return ERROR_FAIL; -} - -return 0; -} - static int sched_rtds_domain_get(libxl__gc *gc, uint32_t domid, libxl_domain_sched_params *scinfo) { @@ -5873,7 +5806,8 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, uint32_t domid, switch (sched) { case LIBXL_SCHEDULER_SEDF: -ret=sched_sedf_domain_set(gc, domid, scinfo); +LOG(ERROR, SEDF scheduler no longer available); +ret=ERROR_FEATURE_REMOVED; break; case LIBXL_SCHEDULER_CREDIT: ret=sched_credit_domain_set(gc, domid, scinfo); @@ -5909,7 +5843,8 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid, switch (scinfo-sched) { case LIBXL_SCHEDULER_SEDF: -ret=sched_sedf_domain_get(gc, domid, scinfo); +LOG(ERROR, SEDF scheduler no longer available); +ret=ERROR_FEATURE_REMOVED; break; case LIBXL_SCHEDULER_CREDIT: ret=sched_credit_domain_get(gc, domid, scinfo); diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 9c2303c..3f31a3b 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -50,61 +50,6 @@ int libxl__domain_create_info_setdefault(libxl__gc 
*gc, return 0; } -static int sched_params_valid(libxl__gc *gc, - uint32_t domid, libxl_domain_sched_params *scp) -{ -int
Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
-Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: Tuesday, June 30, 2015 1:07 AM To: Wu, Feng; xen-devel@lists.xen.org Cc: k...@xen.org; jbeul...@suse.com; Tian, Kevin; Zhang, Yang Z; george.dun...@eu.citrix.com Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked On 24/06/15 06:18, Feng Wu wrote: This patch includes the following aspects: - Add a global vector to wake up the blocked vCPU when an interrupt is being posted to it (This part was sugguested by Yang Zhang yang.z.zh...@intel.com). - Adds a new per-vCPU tasklet to wakeup the blocked vCPU. It can be used in the case vcpu_unblock cannot be called directly. - Define two per-cpu variables: * pi_blocked_vcpu: A list storing the vCPUs which were blocked on this pCPU. * pi_blocked_vcpu_lock: The spinlock to protect pi_blocked_vcpu. Signed-off-by: Feng Wu feng...@intel.com --- v3: - This patch is generated by merging the following three patches in v2: [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU [RFC v2 10/15] vmx: Define two per-cpu variables [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet' - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct' - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler' - Make pi_wakeup_interrupt() static - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list' - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct' - Rename 'blocked_vcpu' to 'pi_blocked_vcpu' - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock' xen/arch/x86/hvm/vmx/vmcs.c| 3 +++ xen/arch/x86/hvm/vmx/vmx.c | 54 ++ xen/include/asm-x86/hvm/hvm.h | 1 + xen/include/asm-x86/hvm/vmx/vmcs.h | 5 xen/include/asm-x86/hvm/vmx/vmx.h | 5 5 files changed, 68 insertions(+) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 11dc1b5..0c5ce3f 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ 
b/xen/arch/x86/hvm/vmx/vmcs.c @@ -631,6 +631,9 @@ int vmx_cpu_up(void) if ( cpu_has_vmx_vpid ) vpid_sync_all(); +INIT_LIST_HEAD(per_cpu(pi_blocked_vcpu, cpu)); +spin_lock_init(per_cpu(pi_blocked_vcpu_lock, cpu)); + return 0; } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index b94ef6a..7db6009 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content); static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content); static void vmx_invlpg_intercept(unsigned long vaddr); +/* + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we + * can find which vCPU should be waken up. + */ +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu); +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock); + uint8_t __read_mostly posted_intr_vector; +uint8_t __read_mostly pi_wakeup_vector; + +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg) +{ +vcpu_unblock((struct vcpu *)arg); +} static int vmx_domain_initialise(struct domain *d) { @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v) if ( v-vcpu_id == 0 ) v-arch.user_regs.eax = 1; +tasklet_init( +v-arch.hvm_vmx.pi_vcpu_wakeup_tasklet, +pi_vcpu_wakeup_tasklet_handler, +(unsigned long)v); c/s f6dd295 indicates that the global tasklet lock causes a bottleneck when injecting interrupts, and replaced a tasklet with a softirq to fix the scalability issue. I would expect exactly the bottleneck to exist here. I am still considering this comments. Jan, what is your opinion about this? 
Thanks, Feng + +INIT_LIST_HEAD(v-arch.hvm_vmx.pi_blocked_vcpu_list); + return 0; } static void vmx_vcpu_destroy(struct vcpu *v) { +tasklet_kill(v-arch.hvm_vmx.pi_vcpu_wakeup_tasklet); /* * There are cases that domain still remains in log-dirty mode when it is * about to be destroyed (ex, user types 'xl destroy dom'), in which case @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata vmx_function_table = { .enable_msr_exit_interception = vmx_enable_msr_exit_interception, }; +/* + * Handle VT-d posted-interrupt when VCPU is blocked. + */ +static void pi_wakeup_interrupt(struct cpu_user_regs *regs) +{ +struct arch_vmx_struct *vmx; +unsigned int cpu = smp_processor_id(); + +spin_lock(per_cpu(pi_blocked_vcpu_lock, cpu)); this_cpu($foo) should be used in preference to per_cpu($foo, $myself). However, always hoist repeated uses of this/per_cpu into local variables, as
[Xen-devel] "x86, arm: remove asm/spinlock.h from all architectures" removed x86's _raw_read_unlock()
David,

I'm afraid we'll need another fixup here, even if things build fine despite the removal.

Thanks, Jan
Re: [Xen-devel] Performance problem about address translation
On 2015-07-08 14:26, xinyue wrote:

Very sorry for sending the wrong message before.

On 2015-07-08 14:13, xinyue wrote:
On 2015-07-07 19:49, Ian Campbell wrote:
On Tue, 2015-07-07 at 11:24 +0800, xinyue wrote:

Please don't use HTML mail and do proper quoting

And after analyzing the performance of the HVM domU, I found a process named evolution-data- using almost 99.9% CPU. Does someone know what this is and why it appears?

evolution-data-server is part of the evolution mail client. It has nothing to do with Xen I'm afraid so you will have to look elsewhere for why it is taking so much CPU.

Ian.

Sorry for that and thanks very much.

I think the problem may be caused by the address alignment. The HVM DomU crashed after the hypercall, and Dom0 sometimes crashed later with a bus error. I think the function that caused the crash is get_gfn. The related code is:

    unsigned long gfn;
    unsigned long mfn;
    struct vcpu *vcpu = current;
    struct domain *d = vcpu->domain;
    uint32_t pfec = PFEC_page_present;
    p2m_type_t t;

    gfn = paging_gva_to_gfn(current, 0xc029, &pfec);
    mfn = get_gfn(d, gfn, &t);

Did I miss some type conversion?

Thanks and best regards!
xinyue

Thanks for all the advice. I found that the problem appeared because I forgot to add the matching call to put_gfn.

Thanks again and best regards!
xinyue
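The resolution above comes down to pairing every get with a put. The following toy model shows only that pairing; `toy_p2m`, `toy_get_gfn` and `toy_put_gfn` are hypothetical stand-ins with a made-up translation, and do not reproduce Xen's get_gfn()/put_gfn() signatures or locking.

```c
#include <assert.h>

/* Toy stand-in for whatever state a lookup pins; not Xen's p2m. */
struct toy_p2m {
    int held; /* number of outstanding references */
};

/* Takes a reference and returns a made-up translation of 'gfn'. */
static unsigned long toy_get_gfn(struct toy_p2m *p2m, unsigned long gfn)
{
    p2m->held++;
    return gfn + 0x1000;
}

/* Drops the reference taken by toy_get_gfn(). */
static void toy_put_gfn(struct toy_p2m *p2m, unsigned long gfn)
{
    (void)gfn;
    assert(p2m->held > 0);
    p2m->held--;
}

/* Every successful get is matched by exactly one put on all paths. */
static unsigned long toy_translate(struct toy_p2m *p2m, unsigned long gfn)
{
    unsigned long mfn = toy_get_gfn(p2m, gfn);

    /* ... use mfn while the reference is held ... */

    toy_put_gfn(p2m, gfn);
    return mfn;
}
```

In this model, forgetting the put leaves `held` non-zero forever, which is the toy equivalent of the leaked reference described in the thread.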
Re: [Xen-devel] [PATCH v25 00/15] x86/PMU: Xen PMU PV(H) support
On 19.06.15 at 20:44, boris.ostrov...@oracle.com wrote:

While making another scan through this series now that some more reviews from Dietmar are trickling in, I notice:

Boris Ostrovsky (15):
  common/symbols: Export hypervisor symbols to privileged guest
  x86/VPMU: Add public xenpmu.h
  x86/VPMU: Make vpmu not HVM-specific
  x86/VPMU: Interface for setting PMU mode and flags

still missing a VMX maintainer's ack

  x86/VPMU: Initialize VPMUs with __initcall

same here plus no review (albeit I wouldn't make the latter a requirement)

  x86/VPMU: Initialize PMU for PV(H) guests

same regarding review state

  x86/VPMU: Save VPMU state for PV guests during context switch
  x86/VPMU: When handling MSR accesses, leave fault injection to callers

again same regarding review state

  x86/VPMU: Add support for PMU register handling on PV guests
  x86/VPMU: Use pre-computed masks when checking validity of MSRs
  VPMU/AMD: Check MSR values before writing to hardware

no review yet (and here I'd really like to have one)

  x86/VPMU: Handle PMU interrupts for PV(H) guests

same here

  x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  x86/VPMU: Add privileged PMU mode

here a review would again be nice, but I'd again not make it a requirement

  x86/VPMU: Move VPMU files up from hvm/ directory

Jan
Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64
On 08.07.15 at 10:33, feng...@intel.com wrote:
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Wednesday, July 08, 2015 4:13 PM
On 08.07.15 at 09:06, feng...@intel.com wrote:
From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper
Sent: Thursday, June 25, 2015 2:35 AM
On 24/06/15 06:18, Feng Wu wrote:

+{
+    uint128_t prev;
+
+    ASSERT(cpu_has_cx16);

Given that if this assertion were to fail, cmpxchg16b would fail with #UD, I would hand-code an asm_fixup section which in turn panics. This avoids a situation where non-debug builds could die with an unqualified #UD exception.

Is there an existing way to panic the hypervisor in assembler code, I don't find it, it would be appreciated if you can point it out.

I'm not convinced such a #UD would be a significant problem: Looking at the disassembly will show the cause right away. The out of line ud2-s in some of VMX'es inline assembly wrappers are far worse.

So, do you agree with the fixup section or not?

I'd rather not go that route, unless Andrew or you manage to convince me otherwise.

I think Andrew's enforce really means ASSERT() or BUG_ON(), again to avoid an unqualified exception.

However - see above.

Plus, all that said, without having seen the actual use sites of cmpxchg16b yet, I'm not at all convinced we really need this patch.

After introducing posted format in IRTE, some fields exist in both the high 64 bits and the low 64 bits, such as pda_h and pda_l. How to make sure it is atomic when updating the pda field?

Is there a need for updating these _after_ initially setting up an entry?

Jan