date:20150708

[Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Jan Beulich

Rather than assuming only PV guests need special treatment (and
dealing with that directly when an IRQ gets set up), keep all guest MSI
IRQs masked until either the (HVM) guest unmasks them via vMSI or the
(PV, PVHVM, or PVH) guest sets up an event channel for it.

To not further clutter the common evtchn_bind_pirq() with x86-specific
code, introduce an arch_evtchn_bind_pirq() hook instead.

Reported-by: Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Jan Beulich jbeul...@suse.com
Tested-by: Sander Eikelenboom li...@eikelenboom.it

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2502,6 +2502,25 @@ int unmap_domain_pirq_emuirq(struct doma
     return ret;
 }
 
+void arch_evtchn_bind_pirq(struct domain *d, int pirq)
+{
+    int irq = domain_pirq_to_irq(d, pirq);
+    struct irq_desc *desc;
+    unsigned long flags;
+
+    if ( irq <= 0 )
+        return;
+
+    if ( is_hvm_domain(d) )
+        map_domain_emuirq_pirq(d, pirq, IRQ_PT);
+
+    desc = irq_to_desc(irq);
+    spin_lock_irqsave(&desc->lock, flags);
+    if ( desc->msi_desc )
+        guest_mask_msi_irq(desc, 0);
+    spin_unlock_irqrestore(&desc->lock, flags);
+}
+
 bool_t hvm_domain_use_pirq(const struct domain *d, const struct pirq *pirq)
 {
     return is_hvm_domain(d) && pirq &&
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -422,10 +422,7 @@ void guest_mask_msi_irq(struct irq_desc 
 
 static unsigned int startup_msi_irq(struct irq_desc *desc)
 {
-    bool_t guest_masked = (desc->status & IRQ_GUEST) &&
-                          is_hvm_domain(desc->msi_desc->dev->domain);
-
-    msi_set_mask_bit(desc, 0, guest_masked);
+    msi_set_mask_bit(desc, 0, !!(desc->status & IRQ_GUEST));
     return 0;
 }
 
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -502,10 +502,7 @@ static long evtchn_bind_pirq(evtchn_bind
 
     bind->port = port;
 
-#ifdef CONFIG_X86
-    if ( is_hvm_domain(d) && domain_pirq_to_irq(d, pirq) > 0 )
-        map_domain_emuirq_pirq(d, pirq, IRQ_PT);
-#endif
+    arch_evtchn_bind_pirq(d, pirq);
 
  out:
     spin_unlock(&d->event_lock);
--- a/xen/include/asm-arm/irq.h
+++ b/xen/include/asm-arm/irq.h
@@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d, 
 
 void arch_move_irqs(struct vcpu *v);
 
+#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
+
 /* Set IRQ type for an SPI */
 int irq_set_spi_type(unsigned int spi, unsigned int type);
 
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
 unsigned int arch_hwdom_irqs(domid_t);
 #endif
 
+#ifndef arch_evtchn_bind_pirq
+void arch_evtchn_bind_pirq(struct domain *, int pirq);
+#endif
+
 #endif /* __XEN_IRQ_H__ */
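
The hook arrangement in the last two hunks can be sketched in isolation. This is a minimal sketch under stated assumptions, not the Xen code: struct domain is reduced to a single hypothetical counter, and bind_pirq() is a made-up stand-in for the common caller. An architecture with nothing to do defines arch_evtchn_bind_pirq as a macro in its own header; otherwise the common header declares a real function, and common code calls the hook without any #ifdef.

```c
/* Hypothetical stand-in for Xen's struct domain; only for this sketch. */
struct domain {
    int bind_count;   /* counts how often the arch hook did real work */
};

/*
 * "Arch header" part: an architecture with nothing to do would define the
 * hook as a macro that merely evaluates its arguments, e.g.
 *
 *   #define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
 *
 * This sketch takes the other branch and leaves the macro undefined.
 */

/* "Common header" part: declare the function only if no macro exists. */
#ifndef arch_evtchn_bind_pirq
void arch_evtchn_bind_pirq(struct domain *d, int pirq);
#endif

/* "Arch source" part: the real hook (here it only records the call). */
#ifndef arch_evtchn_bind_pirq
void arch_evtchn_bind_pirq(struct domain *d, int pirq)
{
    if ( pirq <= 0 )
        return;
    d->bind_count++;
}
#endif

/* "Common code" part: calls the hook with no architecture-specific #ifdef. */
int bind_pirq(struct domain *d, int pirq)
{
    arch_evtchn_bind_pirq(d, pirq);
    return d->bind_count;
}
```

Calling bind_pirq() with a positive pirq bumps the counter, while a non-positive one returns early, loosely mirroring the "irq <= 0" check in the patch.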




Re: [Xen-devel] [PATCH v2] net/bridge: Use __in6_dev_get rather than in6_dev_get in br_validate_ipv6

2015-07-08 Thread Pablo Neira Ayuso

On Tue, Jul 07, 2015 at 11:34:34AM -0700, Stephen Hemminger wrote:
> On Tue, 7 Jul 2015 15:55:21 +0100
> Julien Grall julien.gr...@citrix.com wrote:
>
> > The commit efb6de9b4ba0092b2c55f6a52d16294a8a698edd "netfilter: bridge:
> > forward IPv6 fragmented packets" introduced a new function
> > br_validate_ipv6 which take a reference on the inet6 device. Although,
> > the reference is not released at the end.
> >
> > This will result to the impossibility to destroy any netdevice using
> > ipv6 and bridge.
> >
> > It's possible to directly retrieve the inet6 device without taking a
> > reference as all netfilter hooks are protected by rcu_read_lock via
> > nf_hook_slow.
> >
> > Spotted while trying to destroy a Xen guest on the upstream Linux:
> > "unregister_netdevice: waiting for vif1.0 to become free. Usage count = 1"
> >
> > Signed-off-by: Julien Grall julien.gr...@citrix.com
> > Cc: Bernhard Thaler bernhard.tha...@wvnet.at
> > Cc: Pablo Neira Ayuso pa...@netfilter.org
> > Cc: f...@strlen.de
> > Cc: ian.campb...@citrix.com
> > Cc: wei.l...@citrix.com
> > Cc: Bob Liu bob@oracle.com
> >
> > ---
> > Note that it's impossible to create new guest after this message.
> > I'm not sure if it's normal.
> >
> > Changes in v2:
> > - Don't take a reference to inet6.
> > - This was "net/bridge: Add missing in6_dev_put in
> > br_validate_ipv6" [0]
> >
> > [0] https://lkml.org/lkml/2015/7/3/443
> > ---
> >  net/bridge/br_netfilter_ipv6.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> I like this simple solution
>
> Acked-by: Stephen Hemminger step...@networkplumber.org

Applied, thanks.
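
The leak described in the quoted commit message can be illustrated with a toy model. This is only a sketch: toy_dev and the toy_* helpers are hypothetical and merely mimic the difference between a getter that takes a reference (as in6_dev_get does) and one that borrows the pointer (as __in6_dev_get does, which is safe only while the caller holds the RCU read lock).

```c
/* Hypothetical miniature of a reference-counted device object. */
struct toy_dev {
    int refcnt;
};

/* Takes a reference; the caller is expected to drop it with toy_dev_put(). */
struct toy_dev *toy_dev_get(struct toy_dev *dev)
{
    dev->refcnt++;
    return dev;
}

/* Drops a reference taken earlier. */
void toy_dev_put(struct toy_dev *dev)
{
    dev->refcnt--;
}

/* Borrows the pointer without touching the count (caller holds the lock). */
struct toy_dev *toy_dev_peek(struct toy_dev *dev)
{
    return dev;
}

/* Leaky variant: takes a reference and never releases it. */
int validate_leaky(struct toy_dev *dev)
{
    struct toy_dev *idev = toy_dev_get(dev);

    return idev != 0;   /* missing toy_dev_put(): count stays elevated */
}

/* Fixed variant in the spirit of v2: no reference is taken at all. */
int validate_fixed(struct toy_dev *dev)
{
    struct toy_dev *idev = toy_dev_peek(dev);

    return idev != 0;
}
```

After validate_leaky() the count stays at 1, which corresponds to the "Usage count = 1" message that blocks unregistering the device; validate_fixed() leaves the count untouched.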

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Andrew Cooper

On 08/07/2015 09:56, Jan Beulich wrote:
> Rather than assuming only PV guests need special treatment (and
> dealing with that directly when an IRQ gets set up), keep all guest MSI
> IRQs masked until either the (HVM) guest unmasks them via vMSI or the
> (PV, PVHVM, or PVH) guest sets up an event channel for it.
>
> To not further clutter the common evtchn_bind_pirq() with x86-specific
> code, introduce an arch_evtchn_bind_pirq() hook instead.
>
> Reported-by: Sander Eikelenboom li...@eikelenboom.it
> Signed-off-by: Jan Beulich jbeul...@suse.com
> Tested-by: Sander Eikelenboom li...@eikelenboom.it

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com


Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Jan Beulich

On 08.07.15 at 12:36, feng...@intel.com wrote:
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Tuesday, June 30, 2015 1:07 AM
> > On 24/06/15 06:18, Feng Wu wrote:
> > > @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> > >      if ( v->vcpu_id == 0 )
> > >          v->arch.user_regs.eax = 1;
> > >
> > > +    tasklet_init(
> > > +        &v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
> > > +        pi_vcpu_wakeup_tasklet_handler,
> > > +        (unsigned long)v);
> >
> > c/s f6dd295 indicates that the global tasklet lock causes a bottleneck
> > when injecting interrupts, and replaced a tasklet with a softirq to fix
> > the scalability issue.
> >
> > I would expect exactly the bottleneck to exist here.
>
> I am still considering this comment. Jan, what is your opinion about this?

My opinion here is that I expect you to respond to Andrew.

Jan



Re: [Xen-devel] [RFC PATCH v3 11/18] xen/arm: ITS: Add GITS registers emulation

2015-07-08 Thread Ian Campbell

On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote:
> From: Vijaya Kumar K vijaya.ku...@caviumnetworks.com
>
> Emulate GITS* registers and handle LPI configuration
> table update trap.

These need to only be exposed to a guest which has been configured with
an ITS. For dom0 that means at a minimum it needs to be based on the
capabilities of the underlying hardware.

The same is true of the next patch adding the GICR registers.

For domU it seems there is currently no ITS exposed to them, since there
are no toolstack changes here, so the emulation should be configured
accordingly.

 
> Signed-off-by: Vijaya Kumar K vijaya.ku...@caviumnetworks.com
> ---
>  xen/arch/arm/vgic-v3-its.c    |  516 +
>  xen/include/asm-arm/gic-its.h |   14 ++
>  2 files changed, 530 insertions(+)
>
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index 0671434..fa9dccc 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -63,6 +63,46 @@ static void dump_cmd(its_cmd_block *cmd)
>  }
>  #endif
>
> +void vgic_its_disable_lpis(struct vcpu *v, uint32_t vlpi)
> +{
> +    struct pending_irq *p;
> +    unsigned long flags;
> +
> +    p = irq_to_pending(v, vlpi);
> +    clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> +    gic_remove_from_queues(v, vlpi);
> +    if ( p->desc != NULL )
> +    {
> +        spin_lock_irqsave(&p->desc->lock, flags);
> +        p->desc->handler->disable(p->desc);
> +        spin_unlock_irqrestore(&p->desc->lock, flags);
> +    }
> +}
> +
> +void vgic_its_enable_lpis(struct vcpu *v, uint32_t vlpi, uint8_t priority)
> +{
> +    struct pending_irq *p;
> +    unsigned long flags;
> +
> +    /* Get plpi for the given vlpi */
> +    p = irq_to_pending(v, vlpi);
> +    p->priority = priority;
> +    set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> +
> +    spin_lock_irqsave(&v->arch.vgic.lock, flags);
> +
> +    if ( !list_empty(&p->inflight) &&
> +         !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
> +        gic_raise_guest_irq(v, irq_to_virq(p->desc), p->priority);
> +
> +    spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> +    if ( p->desc != NULL )
> +    {
> +        spin_lock_irqsave(&p->desc->lock, flags);
> +        p->desc->handler->enable(p->desc);
> +        spin_unlock_irqrestore(&p->desc->lock, flags);
> +    }
> +}
>  /* ITS device table helper functions */
>  int vits_vdevice_entry(struct domain *d, uint32_t dev_id,
>                         struct vdevice_table *entry, int set)
> @@ -649,6 +689,482 @@ err:
>      return 0;
>  }
>
> +static int vgic_v3_gits_lpi_mmio_read(struct vcpu *v, mmio_info_t *info)
> +{
> +    uint32_t offset;
> +    struct hsr_dabt dabt = info->dabt;
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    register_t *r = select_user_reg(regs, dabt.reg);
> +    uint8_t cfg;
> +
> +    offset = info->gpa -
> +             (v->domain->arch.vits->propbase & 0xf000UL);
> +
> +    if ( offset < SZ_64K )
> +    {
> +        DPRINTK("vITS:d%dv%d LPI Table read offset 0x%x\n",
> +                v->domain->domain_id, v->vcpu_id, offset);
> +        cfg = readb_relaxed(v->domain->arch.vits->prop_page + offset);
> +        *r = cfg;
> +        return 1;
> +    }
> +    else
> +        dprintk(XENLOG_G_ERR, "vITS:d%dv%d LPI Table read with wrong offset 0x%x\n",
> +                v->domain->domain_id, v->vcpu_id, offset);
> +
> +
> +    return 0;
> +}
> +
> +static int vgic_v3_gits_lpi_mmio_write(struct vcpu *v, mmio_info_t *info)
> +{
> +    uint32_t offset;
> +    uint32_t vid;
> +    uint8_t cfg;
> +    bool_t enable;
> +    struct hsr_dabt dabt = info->dabt;
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    register_t *r = select_user_reg(regs, dabt.reg);
> +
> +    offset = info->gpa -
> +             (v->domain->arch.vits->propbase & 0xf000UL);
> +
> +    vid = offset + NR_GIC_LPI;
> +    if ( offset < SZ_64K )
> +    {
> +        DPRINTK("vITS:d%dv%d LPI Table write offset 0x%x\n",
> +                v->domain->domain_id, v->vcpu_id, offset);
> +        cfg = readb_relaxed(v->domain->arch.vits->prop_page + offset);
> +        enable = (cfg & *r) & 0x1;
> +
> +        if ( !enable )
> +             vgic_its_enable_lpis(v, vid,  (*r & 0xfc));
> +        else
> +             vgic_its_disable_lpis(v, vid);
> +
> +        /* Update virtual prop page */
> +        writeb_relaxed((*r & 0xff),
> +                       v->domain->arch.vits->prop_page + offset);
> +
> +        return 1;
> +    }
> +    else
> +        dprintk(XENLOG_G_ERR, "vITS:d%dv%d LPI Table invalid write @ 0x%x\n",
> +                v->domain->domain_id, v->vcpu_id, offset);
> +
> +    return 0;
> +}
> +
> +static const struct mmio_handler_ops vgic_gits_lpi_mmio_handler = {
> +    .read_handler  = vgic_v3_gits_lpi_mmio_read,
> +    .write_handler = vgic_v3_gits_lpi_mmio_write,
> +};
> +
> +int vgic_its_unmap_lpi_prop(struct vcpu *v)
> +{
> +    paddr_t maddr;
> +    uint32_t lpi_size;
> +    int i;
> +
> +    maddr = v->domain->arch.vits->propbase & 0xf000UL;
> +    lpi_size = 1UL << ((v->domain->arch.vits->propbase & 0x1f) + 1);
> +
> +    DPRINTK("vITS:d%dv%d Unmap guest

Re: [Xen-devel] [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

2015-07-08 Thread Tian, Kevin

> From: Wu, Feng
> Sent: Wednesday, June 24, 2015 1:18 PM
>
> The basic idea here is:
> 1. When vCPU's state is RUNSTATE_running,
>     - set 'NV' to 'Notification Vector'.
>     - Clear 'SN' to accept PI.
>     - set 'NDST' to the right pCPU.
> 2. When vCPU's state is RUNSTATE_blocked,
>     - set 'NV' to 'Wake-up Vector', so we can wake up the
>       related vCPU when posted-interrupt happens for it.
>     - Clear 'SN' to accept PI.
> 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
>     - Set 'SN' to suppress non-urgent interrupts.
>       (Currently, we only support non-urgent interrupts)
>     - Set 'NV' back to 'Notification Vector' if needed.
>
> Signed-off-by: Feng Wu feng...@intel.com

Acked-by: Kevin Tian kevin.t...@intel.com
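
The three cases in the quoted description amount to a small state table, sketched below. Everything here is an assumption for illustration: toy_pi_desc is not the real descriptor layout, the enum merely reuses the RUNSTATE_* names, and both vector values are invented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enum reusing the RUNSTATE_* names from the description. */
enum runstate {
    RUNSTATE_running,
    RUNSTATE_runnable,
    RUNSTATE_blocked,
    RUNSTATE_offline
};

/* Invented vector numbers, for the sketch only. */
#define NOTIFICATION_VECTOR 0xf2
#define WAKEUP_VECTOR       0xf1

/* Simplified descriptor holding only the fields named in the description. */
struct toy_pi_desc {
    uint8_t  nv;     /* 'NV': which vector notifies the pCPU */
    bool     sn;     /* 'SN': suppress notification */
    uint32_t ndst;   /* 'NDST': destination pCPU */
};

/* Apply the per-runstate rules listed above to the descriptor. */
void pi_update(struct toy_pi_desc *pi, enum runstate rs, uint32_t pcpu)
{
    switch ( rs )
    {
    case RUNSTATE_running:
        pi->nv = NOTIFICATION_VECTOR;
        pi->sn = false;            /* accept posted interrupts */
        pi->ndst = pcpu;
        break;

    case RUNSTATE_blocked:
        pi->nv = WAKEUP_VECTOR;    /* a posted interrupt wakes the vCPU */
        pi->sn = false;
        break;

    case RUNSTATE_runnable:
    case RUNSTATE_offline:
        pi->sn = true;             /* suppress non-urgent interrupts */
        pi->nv = NOTIFICATION_VECTOR;
        break;
    }
}
```

A real implementation would have to update these fields atomically with respect to the hardware; the sketch ignores that entirely.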



Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set

2015-07-08 Thread Tian, Kevin

> From: Wu, Feng
> Sent: Wednesday, July 08, 2015 6:11 PM
> > From: Tian, Kevin
> > Sent: Wednesday, July 08, 2015 5:06 PM
> >
> > > From: Wu, Feng
> > > Sent: Wednesday, June 24, 2015 1:18 PM
> > >
> > > Currently, we don't support urgent interrupt, all interrupts
> > > are recognized as non-urgent interrupt, so we cannot send
> > > posted-interrupt when 'SN' is set.
> > >
> > > Signed-off-by: Feng Wu feng...@intel.com
> > > ---
> > > v3:
> > > use cmpxchg to test SN/ON and set ON
> > >
> > >  xen/arch/x86/hvm/vmx/vmx.c | 32
> > >  1 file changed, 28 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > > index 0837627..b94ef6a 100644
> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -1686,6 +1686,8 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> > >
> > >  static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
> > >  {
> > > +    struct pi_desc old, new, prev;
> > > +
> >
> > move to 'else if'.
> >
> > >      if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
> > >          return;
> > >
> > > @@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
> > >           */
> > >          pi_set_on(&v->arch.hvm_vmx.pi_desc);
> > >      }
> > > -    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> > > +    else
> > >      {
> > > +        prev.control = 0;
> > > +
> > > +        do {
> > > +            old.control = v->arch.hvm_vmx.pi_desc.control &
> > > +                          ~(1 << POSTED_INTR_ON | 1 << POSTED_INTR_SN);
> > > +            new.control = v->arch.hvm_vmx.pi_desc.control |
> > > +                          1 << POSTED_INTR_ON;
> > > +
> > > +            /*
> > > +             * Currently, we don't support urgent interrupt, all
> > > +             * interrupts are recognized as non-urgent interrupt,
> > > +             * so we cannot send posted-interrupt when 'SN' is set.
> > > +             * Besides that, if 'ON' is already set, we cannot set
> > > +             * posted-interrupts as well.
> > > +             */
> > > +            if ( prev.sn || prev.on )
> > > +            {
> > > +                vcpu_kick(v);
> > > +                return;
> > > +            }
> >
> > would it make more sense to move above check after cmpxchg?
>
> My original idea is that, we only need to do the check when
> prev.control != old.control, which means the cmpxchg is not
> successful completed. If we add the check between cmpxchg
> and while ( prev.control != old.control ), it seems the logic is
> not so clear, since we don't need to check prev.sn and prev.on
> when cmpxchg succeeds in setting the new value.
>
> Thanks,
> Feng
 

Then it'd be clearer if you move the check to the start of the loop, so
you can avoid two additional reads when the prev.on/sn is set. :-)
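
The loop shape being suggested can be sketched with C11 atomics. This is not the Xen code: pi_try_set_on() is a hypothetical helper, the PI_ON/PI_SN bit positions are assumptions, and atomic_compare_exchange_weak() stands in for Xen's cmpxchg(). The check sits at the top of the loop, so a descriptor that already has ON or SN set is rejected after a single read.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions within the control word, for the sketch only. */
#define PI_ON (1u << 0)   /* outstanding notification */
#define PI_SN (1u << 1)   /* suppress notification */

/*
 * Try to set ON.  Returns true if this call set it (the caller would then
 * send the notification), false if ON or SN was already set (the caller
 * would fall back to kicking the vCPU).
 */
bool pi_try_set_on(_Atomic uint32_t *control)
{
    uint32_t old = atomic_load(control);

    do {
        /* Check first: no compare-exchange attempt if either bit is set. */
        if ( old & (PI_ON | PI_SN) )
            return false;
        /* On failure, 'old' is refreshed with the current value. */
    } while ( !atomic_compare_exchange_weak(control, &old, old | PI_ON) );

    return true;
}
```

Because a failed compare-exchange reloads the expected value, every retry re-runs the SN/ON check against fresh data, which is the property the discussion is after.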

Thanks
Kevin


Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Wu, Feng

> -----Original Message-----
> From: Tian, Kevin
> Sent: Wednesday, July 08, 2015 7:46 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> Yang Z; george.dun...@eu.citrix.com
> Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
> changes
>
> > From: Wu, Feng
> > Sent: Wednesday, July 08, 2015 6:32 PM
> >
> > > -----Original Message-----
> > > From: Tian, Kevin
> > > Sent: Wednesday, July 08, 2015 6:23 PM
> > > To: Wu, Feng; xen-devel@lists.xen.org
> > > Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
> > > Yang Z; george.dun...@eu.citrix.com
> > > Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
> > > changes
> > >
> > > > From: Wu, Feng
> > > > Sent: Wednesday, June 24, 2015 1:18 PM
> > > >
> > > > When guest changes its interrupt configuration (such as, vector, etc.)
> > > > for direct-assigned devices, we need to update the associated IRTE
> > > > with the new guest vector, so external interrupts from the assigned
> > > > devices can be injected to guests without VM-Exit.
> > > >
> > > > For lowest-priority interrupts, we use vector-hashing mechanism to find
> > > > the destination vCPU. This follows the hardware behavior, since modern
> > > > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> > > >
> > > > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > > > still use interrupt remapping.
> > > >
> > > > Signed-off-by: Feng Wu feng...@intel.com
> > > > ---
> > > > v3:
> > > > - Use bitmap to store the all the possible destination vCPUs of an
> > > > interrupt, then trying to find the right destination from the bitmap
> > > > - Typo and some small changes
> > > >
> > > >  xen/drivers/passthrough/io.c | 96
> > > >  1 file changed, 95 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > > > index 9b77334..18e24e1 100644
> > > > --- a/xen/drivers/passthrough/io.c
> > > > +++ b/xen/drivers/passthrough/io.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include <asm/hvm/iommu.h>
> > > >  #include <asm/hvm/support.h>
> > > >  #include <xen/hvm/irq.h>
> > > > +#include <asm/io_apic.h>
> > > >
> > > >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> > > >
> > > > @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
> > > >      xfree(dpci);
> > > >  }
> > > >
> > > > +/*
> > > > + * The purpose of this routine is to find the right destination vCPU for
> > > > + * an interrupt which will be delivered by VT-d posted-interrupt. There
> > > > + * are several cases as below:
> > >
> > > If you aim to have this interface common to more usages, don't restrict to
> > > VT-d posted-interrupt which should be just an example.
> >
> > Yes, making this a common interface should be better.
> >
> > > > + *
> > > > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > > > + *   guest vector using vector-hashing mechanism and return true. This follows
> > > > + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> > > > + *   handle the lowest-priority interrupt.
> > >
> > > Does AMD use same hashing mechanism? Can this interface be reused by
> > > other IOMMU type or it's an Intel specific implementation?
> >
> > I am not sure how AMD handle lowest-priority. Intel hardware guys told me
> > recent Intel hardware platform use this method to deliver lowest-priority
> > interrupts. What do you mean by other IOMMU type?
>
> OS doesn't assume how vector hashing is done in hardware level. So it should
> be fine to use Intel algorithm in this emulation path. However my point is just
> about the comment "since modern Intel CPUs use vector hashing to handle
> the lowest-priority interrupt". It's not because Intel does so. It's the
> implementation option that you choose Intel algorithm here.

here I can mention: we choose vector-hashing for lowest-priority handling and
list Intel as an example to use it, okay?
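
For readers unfamiliar with the idea, destination selection by vector hashing can be sketched as below. This is an illustration only: pick_dest_vcpu() is a hypothetical helper, the 64-bit candidate bitmap is a simplification, and using "vector modulo number of candidates" as the hash is an assumption of this sketch rather than something the thread states.

```c
#include <stdint.h>

/*
 * Pick a destination vCPU for a lowest-priority interrupt.
 * 'candidates' has bit N set if vCPU N is a possible destination.
 * Returns the chosen vCPU id, or -1 if there is no candidate.
 */
int pick_dest_vcpu(uint64_t candidates, uint8_t vector)
{
    int count = 0;
    int target;

    /* Count candidates by clearing the lowest set bit each round. */
    for ( uint64_t m = candidates; m; m &= m - 1 )
        count++;

    if ( count == 0 )
        return -1;

    /* Assumed hash: the vector selects the n-th candidate. */
    target = vector % count;

    for ( int id = 0; id < 64; id++ )
    {
        if ( !(candidates & (UINT64_C(1) << id)) )
            continue;
        if ( target-- == 0 )
            return id;
    }

    return -1;
}
```

With candidates 0x16 (vCPUs 1, 2 and 4) and vector 0x31, the hash is 49 % 3 = 1, so the second candidate, vCPU 2, is chosen.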

Thanks,
Feng

> Thanks
> Kevin


Re: [Xen-devel] [libvirt] [PATCH] libxl: support dom0

2015-07-08 Thread Michal Privoznik

On 07.07.2015 01:27, Jim Fehlig wrote:
> On 07/06/2015 03:46 PM, Jim Fehlig wrote:
> > In Xen, dom0 is really just another domain that supports ballooning,
> > adding/removing devices, changing vcpu configuration, etc. This patch
> > adds support to the libxl driver for managing dom0. Note that the
> > legacy xend driver has long supported managing dom0.
> >
> > Operations that are not supported on dom0 are filtered in libvirt
> > where a sensible error is reported. Errors from libxl are not
> > always helpful. E.g., attempting a save on dom0 results in
> >
> > 2015-06-23 15:25:05 MDT libxl: debug:
> > libxl_dom.c:1570:libxl__toolstack_save: domain=0 toolstack data size=8
> > 2015-06-23 15:25:05 MDT libxl: debug:
> > libxl.c:979:do_libxl_domain_suspend: ao 0x7f7e68000b70: inprogress:
> > poller=0x7f7e68000930, flags=i
> > 2015-06-23 15:25:05 MDT libxl-save-helper: debug: starting save: Success
> > 2015-06-23 15:25:05 MDT xc: detail: xc_domain_save_suse: starting save
> > of domid 0
> > 2015-06-23 15:25:05 MDT xc: error: Couldn't map live_shinfo (3 = No
> > such process): Internal error
> > 2015-06-23 15:25:05 MDT xc: detail: Save exit of domid 0 with errno=3
> > 2015-06-23 15:25:05 MDT libxl-save-helper: debug: complete r=1: No
> > such process
> > 2015-06-23 15:25:05 MDT libxl: error:
> > libxl_dom.c:1876:libxl__xc_domain_save_done: saving domain: domain did
> > not respond to suspend request: No such process
> > 2015-06-23 15:25:05 MDT libxl: error:
> > libxl_dom.c:2033:remus_teardown_done: Remus: failed to teardown device
> > for guest with domid 0, rc -8
> >
> > Signed-off-by: Jim Fehlig jfeh...@suse.com
> > ---
> >  src/libxl/libxl_driver.c | 95
> >  1 file changed, 95 insertions(+)
> >
> > diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
> > index 149ef70..d0b76ac 100644
> > --- a/src/libxl/libxl_driver.c
> > +++ b/src/libxl/libxl_driver.c
> > @@ -79,6 +79,15 @@ VIR_LOG_INIT("libxl.libxl_driver");
> >  /* Number of Xen scheduler parameters */
> >  #define XEN_SCHED_CREDIT_NPARAM   2
> > +#define LIBXL_CHECK_DOM0_GOTO(name, label)                               \
> > +    do {                                                                  \
> > +        if (STREQ_NULLABLE(name, "Domain-0")) {                           \
> > +            virReportError(VIR_ERR_OPERATION_INVALID, "%s",               \
> > +                           _("Domain-0 does not support requested operation")); \
> > +            goto label;                                                   \
> > +        }                                                                 \
> > +    } while (0)
> > +
> >  static libxlDriverPrivatePtr libxl_driver;
> > @@ -501,6 +510,62 @@ const struct libxl_event_hooks ev_hooks = {
> >  };
> >  static int
> > +libxlAddDom0(libxlDriverPrivatePtr driver)
> > +{
> > +    libxlDriverConfigPtr cfg = libxlDriverConfigGet(driver);
> > +    virDomainDefPtr def = NULL;
> > +    virDomainObjPtr vm = NULL;
> > +    virDomainDefPtr oldDef = NULL;
> > +    libxl_dominfo d_info;
> > +    int ret = -1;
> > +
> > +    libxl_dominfo_init(&d_info);
> > +
> > +    /* Ensure we have a dom0 */
> > +    if (libxl_domain_info(cfg->ctx, &d_info, 0) != 0) {
> > +        virReportError(VIR_ERR_INTERNAL_ERROR,
> > +                       "%s", _("unable to get Domain-0 information from libxenlight"));
> > +        goto cleanup;
> > +    }
> > +
> > +    if (!(def = virDomainDefNew()))
> > +        goto cleanup;
> > +
> > +    def->id = 0;
> > +    def->virtType = VIR_DOMAIN_VIRT_XEN;
> > +    if (VIR_STRDUP(def->name, "Domain-0") < 0)
> > +        goto cleanup;
> > +
> > +    def->os.type = VIR_DOMAIN_OSTYPE_XEN;
> > +
> > +    if (virUUIDParse(----, def->uuid) < 0)
> > +        goto cleanup;
> > +
> > +    vm->def->vcpus = d_info.vcpu_online;
> > +    vm->def->maxvcpus = d_info.vcpu_max_id + 1;
> > +    vm->def->mem.cur_balloon = d_info.current_memkb;
> > +    vm->def->mem.max_balloon = d_info.max_memkb;
>
> Oops. Before sending the patch, but after testing it again, I moved the
> call to libxl_domain_info to the beginning of this function.  I also
> moved setting the vcpu and memory info earlier, but
>
> > +
> > +    if (!(vm = virDomainObjListAdd(driver->domains, def,
> > +                                   driver->xmlopt,
> > +                                   0,
> > +                                   &oldDef)))
> > +        goto cleanup;
> > +
> > +    def = NULL;
> > +    ret = 0;
>
> before getting a virDomainObj - ouch.  Consider the following obvious
> fix squashed in
>
> diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
> index d0b76ac..c0dd00b 100644
> --- a/src/libxl/libxl_driver.c
> +++ b/src/libxl/libxl_driver.c
> @@ -541,18 +541,19 @@ libxlAddDom0(libxlDriverPrivatePtr driver)
>      if (virUUIDParse(----, def->uuid) < 0)
>          goto cleanup;
>
> +    if (!(vm = virDomainObjListAdd(driver->domains, def,
> +                                   driver->xmlopt,
> +                                   0,
> +                                   &oldDef)))
> +        goto cleanup;
> +
> +    def = NULL;
> +
>      vm->def->vcpus = d_info.vcpu_online;
>      vm->def->maxvcpus = d_info.vcpu_max_id + 1;

[Xen-devel] [PATCH OSSTEST v8 06/14] Test pygrub and pvgrub on the regular flights

2015-07-08 Thread Ian Campbell

Since we now have the ability to test these, drop one of each of
pygrub, pvgrub-32 and pvgrub-64 into the standard flights. Omitting
the {Guest}_diver runvar causes ts-debian-di-install to use the d-i
images in the location configured via TftpDiVersion, so they are
Version Controlled along with the d-i version used for the host.

This adds three new jobs:

+test-amd64-amd64-amd64-pvgrub all_hostflags     arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test
+test-amd64-amd64-amd64-pvgrub arch              amd64
+test-amd64-amd64-amd64-pvgrub buildjob          build-amd64
+test-amd64-amd64-amd64-pvgrub debian_arch       amd64
+test-amd64-amd64-amd64-pvgrub debian_bootloader pvgrub
+test-amd64-amd64-amd64-pvgrub debian_method     netboot
+test-amd64-amd64-amd64-pvgrub debian_suite      wheezy
+test-amd64-amd64-amd64-pvgrub kernbuildjob      build-amd64-pvops
+test-amd64-amd64-amd64-pvgrub kernkind          pvops
+test-amd64-amd64-amd64-pvgrub toolstack         xl
+test-amd64-amd64-amd64-pvgrub xenbuildjob       build-amd64

+test-amd64-amd64-i386-pvgrub  all_hostflags     arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test
+test-amd64-amd64-i386-pvgrub  arch              amd64
+test-amd64-amd64-i386-pvgrub  buildjob          build-amd64
+test-amd64-amd64-i386-pvgrub  debian_arch       i386
+test-amd64-amd64-i386-pvgrub  debian_bootloader pvgrub
+test-amd64-amd64-i386-pvgrub  debian_method     netboot
+test-amd64-amd64-i386-pvgrub  debian_suite      wheezy
+test-amd64-amd64-i386-pvgrub  kernbuildjob      build-amd64-pvops
+test-amd64-amd64-i386-pvgrub  kernkind          pvops
+test-amd64-amd64-i386-pvgrub  toolstack         xl
+test-amd64-amd64-i386-pvgrub  xenbuildjob       build-amd64

+test-amd64-amd64-pygrub       all_hostflags     arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test
+test-amd64-amd64-pygrub       arch              amd64
+test-amd64-amd64-pygrub       buildjob          build-amd64
+test-amd64-amd64-pygrub       debian_arch       amd64
+test-amd64-amd64-pygrub       debian_bootloader pygrub
+test-amd64-amd64-pygrub       debian_method     netboot
+test-amd64-amd64-pygrub       debian_suite      wheezy
+test-amd64-amd64-pygrub       kernbuildjob      build-amd64-pvops
+test-amd64-amd64-pygrub       kernkind          pvops
+test-amd64-amd64-pygrub       toolstack         xl
+test-amd64-amd64-pygrub       xenbuildjob       build-amd64

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v7: Use {Guest}_suite not {Guest}_dist as runvar to choose version.
Refreshed runvars in commit message.
v3: added runvar details
---
 make-flight | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/make-flight b/make-flight
index de8393a..725da26 100755
--- a/make-flight
+++ b/make-flight
@@ -325,6 +325,42 @@ do_passthrough_tests () {
   done
 }
 
+do_pygrub_tests () {
+  if [ "$xenarch" != "amd64" -o "$dom0arch" != "amd64" -o "$kern" != "" ]; then
+    return
+  fi
+
+  job_create_test test-$xenarch$kern-$dom0arch-pygrub   \
+test-debian-di xl $xenarch $dom0arch\
+  debian_arch=amd64 \
+  debian_suite=$guestsuite  \
+  debian_method=netboot \
+  debian_bootloader=pygrub  \
+  all_hostflags=$most_hostflags
+}
+
+do_pvgrub_tests () {
+  if [ "$xenarch" != "amd64" -o "$dom0arch" != "amd64" -o "$kern" != "" ]; then
+    return
+  fi
+
+  job_create_test test-$xenarch$kern-$dom0arch-amd64-pvgrub \
+test-debian-di xl $xenarch $dom0arch\
+  debian_arch=amd64 \
+  debian_suite=$guestsuite  \
+  debian_method=netboot \
+  debian_bootloader=pvgrub  \
+  all_hostflags=$most_hostflags \
+
+  job_create_test test-$xenarch$kern-$dom0arch-i386-pvgrub  \
+test-debian-di xl $xenarch $dom0arch\
+

[Xen-devel] [PATCH OSSTEST v8 01/14] mfi-common: Allow make-*flight to filter the set of build jobs to include

2015-07-08 Thread Ian Campbell

By using the same job_create_build(_filter_callback) scheme used for
the test jobs.

Will be used in make-distros-flight.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v8: Moved to head of queue, make-distros-flight isn't introduced yet
so that hunk is dropped here and comes back later on.
---
 make-flight |  4 
 mfi-common  | 21 +++--
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/make-flight b/make-flight
index c763ce9..de8393a 100755
--- a/make-flight
+++ b/make-flight
@@ -34,6 +34,10 @@ flight=`./cs-flight-create $blessing $branch`
 defsuite=`getconfig DebianSuite`
 defguestsuite=`getconfig GuestDebianSuite`
 
+job_create_build_filter_callback () {
+:
+}
+
 if [ x$buildflight = x ]; then
 
   create_build_jobs
diff --git a/mfi-common b/mfi-common
index a9e966f..a100afb 100644
--- a/mfi-common
+++ b/mfi-common
@@ -54,6 +54,15 @@ xenbranch_xsm_variants () {
 esac
 }
 
+job_create_build () {
+  job_create_build_filter_callback "$@" || return 0
+
+  local job=$1; shift
+  local recipe=$1; shift
+
+  ./cs-job-create $flight $job $recipe "$@"
+}
+
 create_build_jobs () {
 
   local arch
@@ -164,7 +173,7 @@ create_build_jobs () {
   else
 xsm_suffix=
   fi
-  ./cs-job-create $flight build-$arch$xsm_suffix build   \
+  job_create_build build-$arch$xsm_suffix build  \
 arch=$arch enable_xend=$build_defxend enable_ovmf=$enable_ovmf\
 enable_xsm=$enable_xsm   \
 tree_qemu=$TREE_QEMU \
@@ -183,7 +192,7 @@ create_build_jobs () {
 done
 
 if [ $build_extraxend = true ] ; then
-./cs-job-create $flight build-$arch-xend build   \
+job_create_build build-$arch-xend build  \
 arch=$arch enable_xend=true enable_ovmf=$enable_ovmf \
 tree_qemu=$TREE_QEMU \
 tree_qemuu=$TREE_QEMU_UPSTREAM   \
@@ -196,7 +205,7 @@ create_build_jobs () {
 revision_qemuu=$REVISION_QEMU_UPSTREAM
 fi
 
-./cs-job-create $flight build-$arch-pvops build-kern \
+job_create_build build-$arch-pvops build-kern\
 arch=$arch kconfighow=xen-enable-xen-config  \
 $RUNVARS $BUILD_RUNVARS $BUILD_LINUX_RUNVARS $arch_runvars   \
 $suite_runvars   \
@@ -208,7 +217,7 @@ create_build_jobs () {
 
 if [ x$REVISION_LIBVIRT != xdisable ]; then
 
-./cs-job-create $flight build-$arch-libvirt build-libvirt\
+job_create_build build-$arch-libvirt build-libvirt   \
 arch=$arch   \
 tree_xen=$TREE_XEN   \
 $RUNVARS $BUILD_RUNVARS $BUILD_LIBVIRT_RUNVARS $arch_runvars \
@@ -223,7 +232,7 @@ create_build_jobs () {
 
 case $arch in
 i386|amd64)
-./cs-job-create $flight build-$arch-rumpuserxen build-rumpuserxen\
+job_create_build build-$arch-rumpuserxen build-rumpuserxen   \
 arch=$arch   \
 tree_xen=$TREE_XEN   \
 $RUNVARS $BUILD_RUNVARS $BUILD_RUMPUSERXEN_RUNVARS $arch_runvars \
@@ -252,7 +261,7 @@ create_build_jobs () {
 
 if [ x$REVISION_LINUX_OLD != xdisable ]; then
 
-  ./cs-job-create $flight build-$arch-oldkern build-kern\
+  job_create_build build-$arch-oldkern build-kern   \
 arch=$arch kconfighow=create-config-sh  \
 kimagefile=vmlinux  \
 $RUNVARS $BUILD_RUNVARS $BUILD_LINUX_OLD_RUNVARS\
-- 
2.1.4



[Xen-devel] [PATCH OSSTEST v8 02/14] TestSupport: Add helper to fetch a URL on a host

2015-07-08 Thread Ian Campbell

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v8: Use \Q...\E to safely quote $url and $path
v7: Quote $url and $path, switch to a heredoc to avoid resulting over
long line
v5: Support http_proxy via $c{HttpProxy}
v3: Make sure wget is installed
---
 Osstest/Debian.pm  |  2 +-
 Osstest/TestSupport.pm | 12 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index 718a7e2..2d49ff8 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -841,7 +841,7 @@ d-i apt-setup/another boolean false
 d-i apt-setup/non-free boolean false
 d-i apt-setup/contrib boolean false
 
-d-i pkgsel/include string openssh-server, ntp, ntpdate, ethtool, chiark-utils-bin, $extra_packages
+d-i pkgsel/include string openssh-server, ntp, ntpdate, ethtool, chiark-utils-bin, wget, $extra_packages
 
 d-i grub-installer/force-efi-extra-removable boolean true
 
diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index b5994a4..1cace4f 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -55,7 +55,7 @@ BEGIN {
   target_putfilecontents_stash
  target_putfilecontents_root_stash
   target_put_guest_image target_editfile
-  target_editfile_cancel
+  target_editfile_cancel target_fetchurl
   target_editfile_root target_file_exists
   target_editfile_kvp_replace
   target_run_apt
@@ -1595,6 +1595,16 @@ END
 return $cfgpath;
 }
 
+sub target_fetchurl($$$;$) {
+my ($ho, $url, $path, $timeo) = @_;
+$timeo ||= 2000;
+my $useproxy = "export http_proxy=$c{HttpProxy};" if $c{HttpProxy};
+target_cmd_root($ho, <<END, $timeo);
+$useproxy wget --progress=dot:mega -O \Q$path\E \Q$url\E
+END
+}
+
+
 sub target_put_guest_image ($$;$) {
 my ($ho, $gho, $default) = @_;
 my $specimage = $r{"$gho->{Guest}_image"};
-- 
2.1.4



[Xen-devel] [PATCH OSSTEST v8 00/14] add distro domU testing flight

2015-07-08 Thread Ian Campbell

Hi,

Since v7 I've done the switch from lvm to none as discussed, fixed
(I hope!) the quoting in the fetchurl helper and added the runvar docs
to the ts-debian-di-install script. I also pushed the build job
filtering to the head (which resulted in some other patches being folded
into the introduction in make-distros-flight instead of later).

I retained acks even when changing things due to either the moving of
the make-*flight filter or the moving of the runvar docs to the script;
otherwise I dropped them. I hope that is OK.

Summary of (A)cks, (M)odified and (N)ew (NM==Replaced something):

AM  mfi-common: Allow make-*flight to filter the set of build jobs to include
 M  TestSupport: Add helper to fetch a URL on a host
AM  distros: add support for installing Debian PV guests via d-i, flight and jobs
AM  distros: support booting Debian PV (d-i installed) guests with pvgrub.
 M  distros: Support pvgrub for Wheezy too.
A   Test pygrub and pvgrub on the regular flights
A   distros: add branch infrastructure
  N crontab-cambridge: Use hard tabs for alignment.
 M  distros: Run one suite per day on a weekly basis
A   Debian: Handle lack of bootloader support in d-i on ARM.
A   ts-debian-di-install: Refactor root_disk specification
A   make-flight: refactor PV debian tests
 M  Add testing of file backed disk formats
    make-distros-flight: Use ftp.debian.org directly

Results for an adhoc xen-unstable flight are at 
http://osstest.xs.citrite.net/~osstest/testlogs/logs/37711/
And for Jessie:
http://osstest.xs.citrite.net/~osstest/testlogs/logs/37717/

Ian.



[Xen-devel] [PATCH OSSTEST v8 08/14] crontab-cambridge: Use hard tabs for alignment.

2015-07-08 Thread Ian Campbell

Also quote the value of BRANCHES=.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v8: Split out from distros: Run one suite per day on a weekly basis
---
 crontab-cambridge | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/crontab-cambridge b/crontab-cambridge
index 60bb4fd..e0c3eff 100644
--- a/crontab-cambridge
+++ b/crontab-cambridge
@@ -1,5 +1,5 @@
 PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 MAILTO=ian.jack...@citrix.com,ian.campb...@eu.citrix.com
-# m h   dom mon dow command
-4-59/30 *   * * *   cd testing.git && BRANCHES=osstest ./cr-for-branches branches -q "./cr-daily-branch --real"
-3  4   * * *   savelog -c28 testing.git/tmp/cr-for-branches.log >/dev/null
+# m h   dom mon dow command
+4-59/30 *   * * *   cd testing.git && BRANCHES='osstest' ./cr-for-branches branches -q "./cr-daily-branch --real"
+3  4   * * *   savelog -c28 testing.git/tmp/cr-for-branches.log >/dev/null
-- 
2.1.4



Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread David Vrabel

On 08/07/15 11:58, Jan Beulich wrote:
 On 08.07.15 at 11:39, david.vra...@citrix.com wrote:
 On 08/07/15 09:56, Jan Beulich wrote:
 Rather than assuming only PV guests need special treatment (and
 dealing with that directly when an IRQ gets set up), keep all guest MSI
 IRQs masked until either the (HVM) guest unmasks them via vMSI or the
 (PV, PVHVM, or PVH) guest sets up an event channel for it.

 To not further clutter the common evtchn_bind_pirq() with x86-specific
 code, introduce an arch_evtchn_bind_pirq() hook instead.

 Can you describe the symptoms of the bug being fixed here?
 
 Interrupts simply didn't get unmasked for PVHVM Linux guests.
 
 --- a/xen/include/asm-arm/irq.h
 +++ b/xen/include/asm-arm/irq.h
 @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d, 
  
  void arch_move_irqs(struct vcpu *v);
  
 +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))

 Would this be better as an inline function?

 +
  /* Set IRQ type for an SPI */
  int irq_set_spi_type(unsigned int spi, unsigned int type);
  
 --- a/xen/include/xen/irq.h
 +++ b/xen/include/xen/irq.h
 @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
  unsigned int arch_hwdom_irqs(domid_t);
  #endif
  
 +#ifndef arch_evtchn_bind_pirq
 +void arch_evtchn_bind_pirq(struct domain *, int pirq);

 ... moving this into xen/include/asm-x86/irq.h
 
 Oh, right, (also to Julien) - this is exactly the reason I do not want it
 to be an inline function for ARM: I want the declaration here, not
 replicated in every interested arch's header.

Ok.

FWIW, with this requirement I would (instead of the macros) add a weak
arch_evtchn_bind_pirq() that's a no-op.

David
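
For illustration only, the weak no-op suggested above might look roughly like the sketch below. It assumes a GCC- or Clang-style `__attribute__((weak))`; `evtchn_bind_pirq_common()` and `bind_calls` are hypothetical stand-ins invented for this example and are not Xen code. An architecture that needs real work (x86 in this thread) would supply an ordinary, strong definition that replaces the weak one at link time.

```c
#include <assert.h>
#include <stddef.h>

struct domain;                 /* opaque stand-in for Xen's struct domain */

int bind_calls;                /* hypothetical counter, for demonstration only */

/*
 * Weak default: used only when no architecture supplies its own (strong)
 * definition, in which case the hook simply does nothing.
 */
__attribute__((weak)) void arch_evtchn_bind_pirq(struct domain *d, int pirq)
{
    (void)d;
    (void)pirq;
}

/* Hypothetical common caller: no #ifdef CONFIG_X86 is needed here. */
int evtchn_bind_pirq_common(struct domain *d, int pirq)
{
    assert(pirq >= 0);         /* precondition assumed by this sketch */
    bind_calls++;
    arch_evtchn_bind_pirq(d, pirq);
    return 0;
}
```

With this shape the declaration can live in one common header, and architectures without special needs add nothing at all.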


Re: [Xen-devel] [v3 15/15] Add a command line parameter for VT-d posted-interrupts

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, June 24, 2015 1:18 PM
 
 Enable VT-d Posted-Interrupts and add a command line
 parameter for it.
 
 Signed-off-by: Feng Wu feng...@intel.com
 ---
 v3:
 Remove the redundant no intremp then no intpost logic
 
  docs/misc/xen-command-line.markdown | 9 -
  xen/drivers/passthrough/iommu.c | 4 +++-
  2 files changed, 11 insertions(+), 2 deletions(-)
 
 diff --git a/docs/misc/xen-command-line.markdown
 b/docs/misc/xen-command-line.markdown
 index aa684c0..f8ec15f 100644
 --- a/docs/misc/xen-command-line.markdown
 +++ b/docs/misc/xen-command-line.markdown
 @@ -875,6 +875,13 @@ debug hypervisor only).
   Control the use of interrupt remapping (DMA remapping will always be 
 enabled
   if IOMMU functionality is enabled).
 
 + `intpost`
 +
 + Default: `true`
 +
 + Control the use of interrupt posting, interrupt posting is dependant on
 + interrupt remapping.

Control the use of interrupt posting, which depends on the availability of 
interrupt remapping.

 +
   `qinval` (VT-d)
 
   Default: `true`
 diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
 index 597f676..e13251c 100644
 --- a/xen/drivers/passthrough/iommu.c
 +++ b/xen/drivers/passthrough/iommu.c
 @@ -52,7 +52,7 @@ bool_t __read_mostly iommu_passthrough;
  bool_t __read_mostly iommu_snoop = 1;
  bool_t __read_mostly iommu_qinval = 1;
  bool_t __read_mostly iommu_intremap = 1;
 -bool_t __read_mostly iommu_intpost;
 +bool_t __read_mostly iommu_intpost = 1;
  bool_t __read_mostly iommu_hap_pt_share = 1;
  bool_t __read_mostly iommu_debug;
  bool_t __read_mostly amd_iommu_perdev_intremap = 1;
 @@ -97,6 +97,8 @@ static void __init parse_iommu_param(char *s)
  iommu_qinval = val;
  else if ( !strcmp(s, intremap) )
  iommu_intremap = val;
 +else if ( !strcmp(s, intpost) )
 +iommu_intpost = val;
  else if ( !strcmp(s, debug) )
  {
  iommu_debug = val;
 --
 2.1.0

Reviewed-by: Kevin Tian kevin.t...@intel.com


Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, July 08, 2015 6:32 PM

  -Original Message-
  From: Tian, Kevin
  Sent: Wednesday, July 08, 2015 6:23 PM
  To: Wu, Feng; xen-devel@lists.xen.org
  Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
  Yang Z; george.dun...@eu.citrix.com
  Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
  changes

   From: Wu, Feng
   Sent: Wednesday, June 24, 2015 1:18 PM

   When guest changes its interrupt configuration (such as, vector, etc.)
   for direct-assigned devices, we need to update the associated IRTE
   with the new guest vector, so external interrupts from the assigned
   devices can be injected to guests without VM-Exit.

   For lowest-priority interrupts, we use vector-hashing mechamisn to find
   the destination vCPU. This follows the hardware behavior, since modern
   Intel CPUs use vector hashing to handle the lowest-priority interrupt.

   For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
   still use interrupt remapping.

   Signed-off-by: Feng Wu feng...@intel.com
   ---
   v3:
   - Use bitmap to store the all the possible destination vCPUs of an
   interrupt, then trying to find the right destination from the bitmap
   - Typo and some small changes

xen/drivers/passthrough/io.c | 96
   +++-
1 file changed, 95 insertions(+), 1 deletion(-)

   diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
   index 9b77334..18e24e1 100644
   --- a/xen/drivers/passthrough/io.c
   +++ b/xen/drivers/passthrough/io.c
   @@ -26,6 +26,7 @@
#include asm/hvm/iommu.h
#include asm/hvm/support.h
#include xen/hvm/irq.h
   +#include asm/io_apic.h

static DEFINE_PER_CPU(struct list_head, dpci_list);

   @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
xfree(dpci);
}

   +/*
   + * The purpose of this routine is to find the right destination vCPU for
   + * an interrupt which will be delivered by VT-d posted-interrupt. There
   + * are several cases as below:

  If you aim to have this interface common to more usages, don't restrict to
  VT-d posted-interrupt which should be just an example.

 Yes, making this a common interface should be better.

   + *
   + * - For lowest-priority interrupts, we find the destination vCPU from 
   the
   + *   guest vector using vector-hashing mechanism and return true. This
  follows
   + *   the hardware behavior, since modern Intel CPUs use vector hashing to
   + *   handle the lowest-priority interrupt.

  Does AMD use same hashing mechanism? Can this interface be reused by
  other IOMMU type or it's an Intel specific implementation?

 I am not sure how AMD handle lowest-priority. Intel hardware guys told me
 recent Intel hardware platform use this method to deliver lowest-priority
 interrupts. What do you mean by other IOMMU type?

An OS doesn't assume how vector hashing is done at the hardware level, so it
should be fine to use the Intel algorithm in this emulation path. However, my
point is just about the comment "since modern Intel CPUs use vector hashing to
handle the lowest-priority interrupt". It's not because Intel does so; it is an
implementation choice that you use the Intel algorithm here.

Thanks
Kevin
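
As a rough sketch of the vector-hashing idea discussed above (not the code from the patch under review): collect the candidate destination vCPUs in a bitmap, then let the guest vector pick one of them. The function name `pick_dest_vcpu()`, the 64-bit mask and the use of a plain modulo are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick a destination vCPU for a lowest-priority interrupt.
 * dest_mask: bit i set means vCPU i is a possible destination.
 * gvec:      guest vector of the interrupt.
 * Returns the chosen vCPU id, or -1 if dest_mask is empty.
 */
int pick_dest_vcpu(uint64_t dest_mask, uint8_t gvec)
{
    unsigned int count = 0, target, id;
    uint64_t m;

    /* Count the candidates (one loop iteration per set bit). */
    for ( m = dest_mask; m != 0; m &= m - 1 )
        count++;

    if ( count == 0 )
        return -1;

    /* Hash: the vector selects the target-th candidate, counting from 0. */
    target = gvec % count;

    for ( id = 0; id < 64; id++ )
    {
        if ( !(dest_mask & (UINT64_C(1) << id)) )
            continue;
        if ( target == 0 )
            return (int)id;
        target--;
    }

    /* Not reached: target < count guarantees a match in the loop above. */
    assert(0);
    return -1;
}
```

The same vector always maps to the same vCPU for a given candidate set, which is the property the emulation relies on; whether real hardware hashes in exactly this way is not something this sketch claims.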


Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, July 08, 2015 7:52 PM
 + * - For lowest-priority interrupts, we find the destination vCPU 
 from the
 + *   guest vector using vector-hashing mechanism and return true. 
 This
follows
 + *   the hardware behavior, since modern Intel CPUs use vector
  hashing to
 + *   handle the lowest-priority interrupt.
   
Does AMD use same hashing mechanism? Can this interface be reused by
other IOMMU type or it's an Intel specific implementation?
  
   I am not sure how AMD handle lowest-priority. Intel hardware guys told me
   recent Intel hardware platform use this method to deliver lowest-priority
   interrupts. What do you mean by other IOMMU type?
  
 
  OS doesn't assume how vector hashing is done in hardware level. So it should
  be fine to use Intel algorithm in this emulation path. However my point is 
  just
  about the comment  since modern Intel CPUs use vector hashing to handle
  the lowest-priority interrupt. It's not because Intel does so. It's the
  implementation option that you choose Intel algorithm here.
 
 here I can mention: we choose vector-hashing for lowest-priority handling and
 list Intel as an example to use it, okay?
 

Yes. :-)

Thanks
Kevin


Re: [Xen-devel] [PATCH v25 06/15] x86/VPMU: Initialize PMU for PV(H) guests

2015-07-08 Thread Dietmar Hahn

Am Freitag 19 Juni 2015, 14:44:37 schrieb Boris Ostrovsky:
 Code for initializing/tearing down PMU for PV guests
 
 Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
 Acked-by: Daniel De Graaf dgde...@tycho.nsa.gov
 Acked-by: Jan Beulich jbeul...@suse.com
 Acked-by: Kevin Tian kevin.t...@intel.com

Reviewed-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com

 ---
  tools/flask/policy/policy/modules/xen/xen.te |   4 +
  xen/arch/x86/domain.c|   2 +
  xen/arch/x86/hvm/hvm.c   |   1 +
  xen/arch/x86/hvm/svm/svm.c   |   4 +-
  xen/arch/x86/hvm/svm/vpmu.c  |  16 +++-
  xen/arch/x86/hvm/vmx/vmx.c   |   4 +-
  xen/arch/x86/hvm/vmx/vpmu_core2.c|  30 --
  xen/arch/x86/hvm/vpmu.c  | 131 
 ---
  xen/common/event_channel.c   |   1 +
  xen/include/asm-x86/hvm/vpmu.h   |   2 +
  xen/include/public/pmu.h |   2 +
  xen/include/public/xen.h |   1 +
  xen/include/xsm/dummy.h  |   3 +
  xen/xsm/flask/hooks.c|   4 +
  xen/xsm/flask/policy/access_vectors  |   2 +
  15 files changed, 181 insertions(+), 26 deletions(-)
 
 diff --git a/tools/flask/policy/policy/modules/xen/xen.te 
 b/tools/flask/policy/policy/modules/xen/xen.te
 index 45b5cb2..f553eb5 100644
 --- a/tools/flask/policy/policy/modules/xen/xen.te
 +++ b/tools/flask/policy/policy/modules/xen/xen.te
 @@ -130,6 +130,10 @@ if (guest_writeconsole) {
   dontaudit domain_type xen_t : xen writeconsole;
  }
  
 +# Allow all domains to use PMU (but not to change its settings --- that's 
 what
 +# pmu_ctrl is for)
 +allow domain_type xen_t:xen2 pmu_use;
 +
  
 ###
  #
  # Domain creation
 diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
 index dc18565..b699f68 100644
 --- a/xen/arch/x86/domain.c
 +++ b/xen/arch/x86/domain.c
 @@ -438,6 +438,8 @@ int vcpu_initialise(struct vcpu *v)
  vmce_init_vcpu(v);
  }
  
 +spin_lock_init(v-arch.vpmu.vpmu_lock);
 +
  if ( has_hvm_container_domain(d) )
  {
  rc = hvm_vcpu_initialise(v);
 diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
 index d5e5242..83a81f5 100644
 --- a/xen/arch/x86/hvm/hvm.c
 +++ b/xen/arch/x86/hvm/hvm.c
 @@ -4931,6 +4931,7 @@ static hvm_hypercall_t *const 
 pvh_hypercall64_table[NR_hypercalls] = {
  HYPERCALL(hvm_op),
  HYPERCALL(sysctl),
  HYPERCALL(domctl),
 +HYPERCALL(xenpmu_op),
  [ __HYPERVISOR_arch_1 ] = (hvm_hypercall_t *)paging_domctl_continuation
  };
  
 diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
 index a02f983..680eebe 100644
 --- a/xen/arch/x86/hvm/svm/svm.c
 +++ b/xen/arch/x86/hvm/svm/svm.c
 @@ -1165,7 +1165,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
  return rc;
  }
  
 -vpmu_initialise(v);
 +/* PVH's VPMU is initialized via hypercall */
 +if ( is_hvm_vcpu(v) )
 +vpmu_initialise(v);
  
  svm_guest_osvw_init(v);
  
 diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
 index b60ca40..a8572a6 100644
 --- a/xen/arch/x86/hvm/svm/vpmu.c
 +++ b/xen/arch/x86/hvm/svm/vpmu.c
 @@ -364,13 +364,11 @@ static void amd_vpmu_destroy(struct vcpu *v)
  amd_vpmu_unset_msr_bitmap(v);
  
  xfree(vpmu-context);
 -vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
  
  if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
 -{
 -vpmu_reset(vpmu, VPMU_RUNNING);
  release_pmu_ownship(PMU_OWNER_HVM);
 -}
 +
 +vpmu_clear(vpmu);
  }
  
  /* VPMU part of the 'q' keyhandler */
 @@ -482,6 +480,16 @@ int __init amd_vpmu_init(void)
  return -EINVAL;
  }
  
 +if ( sizeof(struct xen_pmu_data) +
 + 2 * sizeof(uint64_t) * num_counters  PAGE_SIZE )
 +{
 +printk(XENLOG_WARNING
 +   VPMU: Register bank does not fit into VPMU shared page\n);
 +counters = ctrls = NULL;
 +num_counters = 0;
 +return -ENOSPC;
 +}
 +
  return 0;
  }
  
 diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
 index 0837627..50e11dd 100644
 --- a/xen/arch/x86/hvm/vmx/vmx.c
 +++ b/xen/arch/x86/hvm/vmx/vmx.c
 @@ -140,7 +140,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
  }
  }
  
 -vpmu_initialise(v);
 +/* PVH's VPMU is initialized via hypercall */
 +if ( is_hvm_vcpu(v) )
 +vpmu_initialise(v);
  
  vmx_install_vlapic_mapping(v);
  
 diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c 
 b/xen/arch/x86/hvm/vmx/vpmu_core2.c
 index 025c970..e7642e5 100644
 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
 +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
 @@ -365,13 +365,16 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
  if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
  return 0;
  
 -

Re: [Xen-devel] [PATCH V4 3/3] xen/vm_event: Deny register writes if refused by vm_event reply

2015-07-08 Thread Lengyel, Tamas

On Wed, Jul 8, 2015 at 6:22 AM, Razvan Cojocaru rcojoc...@bitdefender.com
wrote:

 Deny register writes if a vm_client subscribed to mov_to_msr or
 control register write events forbids them. Currently supported for
 MSR, CR0, CR3 and CR4 events.

 Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
 Acked-by: George Dunlap george.dun...@eu.citrix.com
 Acked-by: Jan Beulich jbeul...@suse.com

 ---
 Changes since V3:
  - Renamed MEM_ACCESS_FLAG_DENY to VM_EVENT_FLAG_DENY (and fixed
the bit shift appropriately).
  - Moved the DENY vm_event response logic from p2m.c to newly
added dedicated files for vm_event handling, as suggested
by Tamas Lengyel.


This looks good to me. It will have to be rebased on staging once the other
series is merged, as a couple of things will conflict. If this series lands
first, however, the newly added asm/vm_event files lack the required license
header.

With that:
Acked-by: Tamas K Lengyel tleng...@novetta.com

Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, June 24, 2015 1:18 PM
 
 This patch includes the following aspects:
 - Add a global vector to wake up the blocked vCPU
   when an interrupt is being posted to it (This
   part was sugguested by Yang Zhang yang.z.zh...@intel.com).
 - Adds a new per-vCPU tasklet to wakeup the blocked
   vCPU. It can be used in the case vcpu_unblock
   cannot be called directly.
 - Define two per-cpu variables:
   * pi_blocked_vcpu:
   A list storing the vCPUs which were blocked on this pCPU.
 
   * pi_blocked_vcpu_lock:
   The spinlock to protect pi_blocked_vcpu.
 
 Signed-off-by: Feng Wu feng...@intel.com
 ---
 v3:
 - This patch is generated by merging the following three patches in v2:
[RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
[RFC v2 10/15] vmx: Define two per-cpu variables
[RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
 - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
 - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
 - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
 - Make pi_wakeup_interrupt() static
 - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
 - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
 - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
 - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
 
  xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
  xen/arch/x86/hvm/vmx/vmx.c | 54
 ++
  xen/include/asm-x86/hvm/hvm.h  |  1 +
  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
  xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
  5 files changed, 68 insertions(+)
 
 diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
 index 11dc1b5..0c5ce3f 100644
 --- a/xen/arch/x86/hvm/vmx/vmcs.c
 +++ b/xen/arch/x86/hvm/vmx/vmcs.c
 @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
  if ( cpu_has_vmx_vpid )
  vpid_sync_all();
 
 +INIT_LIST_HEAD(per_cpu(pi_blocked_vcpu, cpu));
 +spin_lock_init(per_cpu(pi_blocked_vcpu_lock, cpu));
 +
  return 0;
  }
 
 diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
 index b94ef6a..7db6009 100644
 --- a/xen/arch/x86/hvm/vmx/vmx.c
 +++ b/xen/arch/x86/hvm/vmx/vmx.c
 @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr, 
 uint64_t
 *msr_content);
  static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
  static void vmx_invlpg_intercept(unsigned long vaddr);
 
 +/*
 + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we
 + * can find which vCPU should be waken up.
 + */
 +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
 +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
 +
  uint8_t __read_mostly posted_intr_vector;
 +uint8_t __read_mostly pi_wakeup_vector;
 +
 +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
 +{
 +vcpu_unblock((struct vcpu *)arg);
 +}
 
  static int vmx_domain_initialise(struct domain *d)
  {
 @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
  if ( v-vcpu_id == 0 )
  v-arch.user_regs.eax = 1;
 
 +tasklet_init(
 +v-arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
 +pi_vcpu_wakeup_tasklet_handler,
 +(unsigned long)v);
 +
 +INIT_LIST_HEAD(v-arch.hvm_vmx.pi_blocked_vcpu_list);
 +
  return 0;
  }
 
  static void vmx_vcpu_destroy(struct vcpu *v)
  {
 +tasklet_kill(v-arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
  /*
   * There are cases that domain still remains in log-dirty mode when it is
   * about to be destroyed (ex, user types 'xl destroy dom'), in which 
 case
 @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
 vmx_function_table = {
  .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
  };
 
 +/*
 + * Handle VT-d posted-interrupt when VCPU is blocked.
 + */
 +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
 +{
 +struct arch_vmx_struct *vmx;
 +unsigned int cpu = smp_processor_id();
 +
 +spin_lock(per_cpu(pi_blocked_vcpu_lock, cpu));
 +
 +/*
 + * FIXME: The length of the list depends on how many
 + * vCPU is current blocked on this specific pCPU.
 + * This may hurt the interrupt latency if the list
 + * grows to too many entries.
 + */

Let's go with this linked list for now, until a real issue is identified.

 +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
 +pi_blocked_vcpu_list)
 +if ( vmx->pi_desc.on )
 +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);

Not sure where the vCPU is removed from the list (possibly in a later patch),
but removing the vCPU from the list at this point should at least be safe, and
the right way to go. IIRC Andrew and others raised a similar concern earlier. :-)

Thanks
Kevin
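
To make the suggestion above concrete, here is a small stand-alone sketch of walking a blocked list and unlinking each entry as its wakeup is requested. It is not the patch's code: `struct blocked_vcpu`, the `woken` flag (standing in for scheduling the wakeup tasklet) and all function names are hypothetical, and the per-CPU spinlock that the real handler holds is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node; "on" stands in for pi_desc.on. */
struct blocked_vcpu {
    struct blocked_vcpu *prev, *next;
    bool on;       /* an interrupt has been posted to this vCPU */
    bool woken;    /* stand-in for scheduling the wakeup tasklet */
};

/* Circular doubly linked list with a sentinel head. */
void blocked_list_init(struct blocked_vcpu *head)
{
    head->prev = head->next = head;
}

void blocked_list_add_tail(struct blocked_vcpu *head, struct blocked_vcpu *v)
{
    v->prev = head->prev;
    v->next = head;
    head->prev->next = v;
    head->prev = v;
}

/* Wake every entry with a posted interrupt and unlink it from the list. */
int wake_and_unlink_pending(struct blocked_vcpu *head)
{
    struct blocked_vcpu *v, *next;
    int woken = 0;

    assert(head != NULL);

    for ( v = head->next; v != head; v = next )
    {
        next = v->next;            /* saved first: v may be unlinked below */
        if ( !v->on )
            continue;
        v->prev->next = v->next;   /* unlink v */
        v->next->prev = v->prev;
        v->prev = v->next = v;     /* leave v as a self-contained node */
        v->woken = true;
        woken++;
    }

    return woken;
}
```

Unlinking inside the walk keeps the list limited to vCPUs that are still blocked, which also bounds the work done on later invocations.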


Re: [Xen-devel] [PATCH v5 2/3] arm: Allow the user to specify the GIC version

2015-07-08 Thread Ian Campbell

On Wed, 2015-07-08 at 11:17 +0100, Ian Campbell wrote:
 On Tue, 2015-07-07 at 17:22 +0100, Julien Grall wrote:
  diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
  index e1632fa..11f6461 100644
  --- a/tools/libxl/libxl_types.idl
  +++ b/tools/libxl/libxl_types.idl
  @@ -369,6 +369,12 @@ libxl_vnode_info = Struct(vnode_info, [
   (vcpus, libxl_bitmap), # vcpus in this node
   ])
   
  +libxl_gic_version = Enumeration(gic_version, [
  +(0, DEFAULT),
  +(0x20, v2),
  +(0x30, v3)
  +], init_val = LIBXL_GIC_VERSION_DEFAULT)
  +
   libxl_domain_build_info = Struct(domain_build_info,[
   (max_vcpus,   integer),
   (avail_vcpus, libxl_bitmap),
  @@ -480,6 +486,11 @@ libxl_domain_build_info = Struct(domain_build_info,[
 ])),
(invalid, None),
], keyvar_init_val = LIBXL_DOMAIN_TYPE_INVALID)),
  +
  +
  +(arch_arm, Struct(None, [(gic_version, libxl_gic_version),
  +  ])),
  +
   ], dir=DIR_IN
 
 This results in the following when building the ocaml bindings:
 
 Traceback (most recent call last):
   File "genwrap.py", line 529, in <module>
     ml.write(gen_ocaml_ml(ty, False))
   File "genwrap.py", line 217, in gen_ocaml_ml
     s += gen_struct(ty)
   File "genwrap.py", line 119, in gen_struct
     x = ocaml_instance_of_field(f)
   File "genwrap.py", line 112, in ocaml_instance_of_field
     return "%s : %s" % (munge_name(name), ocaml_type_of(f.type))
   File "genwrap.py", line 90, in ocaml_type_of
     return ty.rawname.capitalize() + ".t"
 AttributeError: 'NoneType' object has no attribute 'capitalize'
 make[7]: *** No rule to make target '_libxl_types.ml.in', needed by 'xenlight.ml'.  Stop.
 
 I'll take a look.

I have a patch to genwrap.py which results in the following diff to the
generated ml files for the anonymous sub-struct added by the IDL change
above.

Dave/Euan/Rob, is that idiomatic OCaml, or is it possible to have
anonymous structs in OCaml as it is in C?

If there is a better/more usual way to do this, would you mind supplying
me with the OCaml I should be aiming for, please?

Ian.

--- tools/ocaml/libs/xl/_libxl_BACKUP_types.ml.in   2015-07-08 11:22:35.0 +0100
+++ tools/ocaml/libs/xl/_libxl_types.ml.in  2015-07-08 12:25:56.0 +0100
@@ -508,6 +508,17 @@ module Vnode_info = struct
 external default : ctx -> unit -> t = "stub_libxl_vnode_info_init"
 end
 
+(* libxl_gic_version implementation *)
+type gic_version = 
+| GIC_VERSION_DEFAULT
+| GIC_VERSION_V2
+| GIC_VERSION_V3
+
+let string_of_gic_version = function
+   | GIC_VERSION_DEFAULT -> "DEFAULT"
+   | GIC_VERSION_V2 -> "V2"
+   | GIC_VERSION_V3 -> "V3"
+
 (* libxl_domain_build_info implementation *)
 module Domain_build_info = struct
 
@@ -566,6 +577,10 @@ module Domain_build_info = struct

type type__union = Hvm of type_hvm | Pv of type_pv | Invalid

+   type arch_arm__anon = {
+   gic_version : gic_version;
+   }
+   
type t =
{
max_vcpus : int;
@@ -607,6 +622,7 @@ module Domain_build_info = struct
ramdisk : string option;
device_tree : string option;
xl_type : type__union;
+   arch_arm : arch_arm__anon;
}
 external default : ctx -> ?xl_type:domain_type -> unit -> t = "stub_libxl_domain_build_info_init"
 end





Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, July 08, 2015 7:05 PM

  -Original Message-
  From: Wu, Feng
  Sent: Wednesday, July 08, 2015 6:32 PM
  To: Tian, Kevin; xen-devel@lists.xen.org
  Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
  Yang Z; george.dun...@eu.citrix.com; Wu, Feng
  Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
  changes

   -Original Message-
   From: Tian, Kevin
   Sent: Wednesday, July 08, 2015 6:23 PM
   To: Wu, Feng; xen-devel@lists.xen.org
   Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
   Yang Z; george.dun...@eu.citrix.com
   Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
   changes

From: Wu, Feng
Sent: Wednesday, June 24, 2015 1:18 PM

When guest changes its interrupt configuration (such as, vector, etc.)
for direct-assigned devices, we need to update the associated IRTE
with the new guest vector, so external interrupts from the assigned
devices can be injected to guests without VM-Exit.

For lowest-priority interrupts, we use vector-hashing mechamisn to find
the destination vCPU. This follows the hardware behavior, since modern
Intel CPUs use vector hashing to handle the lowest-priority interrupt.

For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
still use interrupt remapping.

Signed-off-by: Feng Wu feng...@intel.com
---
v3:
- Use bitmap to store the all the possible destination vCPUs of an
interrupt, then trying to find the right destination from the bitmap
- Typo and some small changes

 xen/drivers/passthrough/io.c | 96
+++-
 1 file changed, 95 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 9b77334..18e24e1 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -26,6 +26,7 @@
 #include asm/hvm/iommu.h
 #include asm/hvm/support.h
 #include xen/hvm/irq.h
+#include asm/io_apic.h

 static DEFINE_PER_CPU(struct list_head, dpci_list);

@@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci
  *dpci)
 xfree(dpci);
 }

+/*
+ * The purpose of this routine is to find the right destination vCPU 
for
+ * an interrupt which will be delivered by VT-d posted-interrupt. There
+ * are several cases as below:

   If you aim to have this interface common to more usages, don't restrict to
   VT-d posted-interrupt which should be just an example.

  Yes, making this a common interface should be better.

 Thinking about this a little more, this function itself is kind of restricted 
 to
 VT-d posted-interrupt, since it doesn't handle multicast/broadcast interrupts,
 it only handle lowest-priority and single destination interrupts. However, I
 can make the vector-hashing logic as a separate function, which can be
 used elsewhere.

iommu_intpost is a general option, not VT-d specific. It's fine to keep this 
function here. My earlier comment is more about the accuracy of the code
comment above. :-)

Thanks
Kevin


Re: [Xen-devel] [PATCH V4 1/3] xen/mem_access: Support for memory-content hiding

2015-07-08 Thread Lengyel, Tamas

On Wed, Jul 8, 2015 at 6:22 AM, Razvan Cojocaru rcojoc...@bitdefender.com
wrote:

 This patch adds support for memory-content hiding, by modifying the
 value returned by emulated instructions that read certain memory
 addresses that contain sensitive data. The patch only applies to
 cases where MEM_ACCESS_EMULATE or MEM_ACCESS_EMULATE_NOWRITE have
 been set to a vm_event response.

 Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
 Acked-by: George Dunlap george.dun...@eu.citrix.com

 ---
 Changes since V3:
  - Renamed MEM_ACCESS_SET_EMUL_READ_DATA to
VM_EVENT_FLAG_SET_EMUL_READ_DATA and updated its comment.
  - Removed xfree(v-arch.vm_event.emul_read_data) from
free_vcpu_struct().
  - Returning X86EMUL_UNHANDLEABLE from hvmemul_cmpxchg() when
!curr-arch.vm_event.emul_read_data.
  - Replaced in xmalloc_bytes() with xmalloc_array() in
hvmemul_rep_outs_set_context().
  - Setting the rest of the buffer to zero in hvmemul_rep_movs()
(no longer leaking heap contents).
  - No longer memset()ing the whole buffer before copy (just zeroing
out the rest).
  - Moved hvmemul_ctxt-set_context = 0 to hvm_emulate_prepare() and
removed hvm_emulate_one_set_context().
 ---
  tools/tests/xen-access/xen-access.c |2 +-
  xen/arch/x86/hvm/emulate.c  |  138
 ++-
  xen/arch/x86/hvm/event.c|   50 ++---
  xen/arch/x86/mm/p2m.c   |   92 +--
  xen/common/domain.c |2 +
  xen/common/vm_event.c   |   23 ++
  xen/include/asm-x86/domain.h|2 +
  xen/include/asm-x86/hvm/emulate.h   |   10 ++-
  xen/include/public/vm_event.h   |   31 ++--
  9 files changed, 274 insertions(+), 76 deletions(-)


Acked-by: Tamas K Lengyel tleng...@novetta.com

[Xen-devel] [PATCH OSSTEST v8 12/14] make-flight: refactor PV debian tests

2015-07-08 Thread Ian Campbell

No functional change, standalone-generate-dump-flight-runvars confirms
no change to the runvars.

Includes a hook which is not used yet, $recipe_sfx.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v4: new patch
---
 make-flight | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/make-flight b/make-flight
index 725da26..2a132df 100755
--- a/make-flight
+++ b/make-flight
@@ -361,6 +361,17 @@ do_pvgrub_tests () {
   all_hostflags=$most_hostflags
 }
 
+do_pv_debian_test_one () {
+  testname=$1; shift
+  recipe_sfx=$1; shift
+  toolstack=$1; shift
+
+  job_create_test test-$xenarch$kern-$dom0arch-$testname\
+ test-debian$recipe_sfx $toolstack  \
+$xenarch $dom0arch  \
+$debian_runvars all_hostflags=$most_hostflags $@
+}
+
 do_pv_debian_tests () {
   xsms=$(xenbranch_xsm_variants)
 
@@ -376,20 +387,13 @@ do_pv_debian_tests () {
   suffix=${platform:+-$platform}
   hostflags=${most_hostflags}${platform:+,platform-$platform}
 
-  job_create_test test-$xenarch$kern-$dom0arch-xl$suffix   \
-  test-debian xl   \
-  $xenarch $dom0arch   \
-  enable_xsm=$xsm  \
-  $debian_runvars all_hostflags=$hostflags
+  do_pv_debian_test_one xl$suffix '' xl enable_xsm=$xsm
+
 done
   done
 
   for xsm in $xsms ; do
-job_create_test test-$xenarch$kern-$dom0arch-libvirt \
-test-debian libvirt  \
-$xenarch $dom0arch   \
-enable_xsm=$xsm  \
-$debian_runvars all_hostflags=$most_hostflags
+do_pv_debian_test_one libvirt '' libvirt enable_xsm=$xsm
   done
 }
 
-- 
2.1.4



[Xen-devel] [PATCH OSSTEST v8 13/14] Add testing of file backed disk formats

2015-07-08 Thread Ian Campbell

xen-create-image makes this tricky to do since it is rather LVM
centric. Now that we have the ability to install from d-i it's
possible to arrange fairly easily that they use something other than
a phy backend over a bare LVM device.

Here we add support to the test script and infra and create a bunch of
new jobs testing the cross product of {xl,libvirt} x {raw,qcow2,vhd}.

A disk format of raw means a raw backing file, whereas none (the
default) means to continue to use the base LVM device.

The test scripts are modified such that when constructing a domain
whose diskfmt runvar specifies a file backed disk format (i.e. not
none):

 - the LVM device is slightly enlarged to account for file format
   headers (1M should be plenty).
 - the LVM device will have an ext3 filesystem created on it instead
   of being used as a phy device for the guest. Reusing the LVM volume
   in this way means we don't need to do more storage management in
   dom0 (i.e. arranging for / to be large enough, or managing a
   special images LV)
 - the relevant type of container is created within the filesystem
   using the appropriate tool.
 - New properties Disk{fmt,spec} are added to all $gho, containing
   the format used for the root disk and the xl diskspec to load it.
 - lvm backed guests use a xend/xm compatible spec, everything
   else uses the improved xl syntax which libvirt also supports.
   We won't test non-LVM on xend.
 - New properties Disk{mnt,img} are added to $gho which are not using
   LVM. These contain the mount point to use (configurable via
   OSSTEST_CONFIG and runvars) and the full path (including mount
   point) to the image itself.
 - When starting or stopping a guest we arrange for the filesystem to
   be (u)mounted.
 - The preparation when starting a guest copes gracefully with
   the disk already being prepared.
 - Hooks are called from guest_create() and guest_destroy() to
   manipulate the disk as needed.
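
The "relevant type of container" step in the list above could look roughly like the sketch below. It is not taken from the patch: the helper name is hypothetical, and using qemu-img for all three formats (with its "vpc" name for VHD) is an assumption, since the patch only says "the appropriate tool". The function prints the command rather than running it.

```shell
#!/bin/bash
# Hypothetical helper: print the command that would create the disk container
# for a given diskfmt. Using qemu-img for every format is an assumption.
diskimg_create_cmd () {
  local fmt=$1 img=$2 size_mb=$3
  case "$fmt" in
    none)  echo "# no image: use the LVM device directly" ;;
    raw)   echo "qemu-img create -f raw $img ${size_mb}M" ;;
    qcow2) echo "qemu-img create -f qcow2 $img ${size_mb}M" ;;
    vhd)   echo "qemu-img create -f vpc $img ${size_mb}M" ;;  # qemu-img calls VHD "vpc"
    *)     echo "unknown disk format: $fmt" >&2; return 1 ;;
  esac
}

# Example paths and sizes are made up for illustration.
for fmt in none raw qcow2 vhd; do
  diskimg_create_cmd "$fmt" "/mnt/guest/disk.$fmt" 10000
done
```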

Using standalone-generate-dump-flight-runvars a representative set of
runvars is:
+test-amd64-amd64-xl-qcow2 all_hostflags       arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test
+test-amd64-amd64-xl-qcow2 arch                amd64
+test-amd64-amd64-xl-qcow2 buildjob            build-amd64
+test-amd64-amd64-xl-qcow2 debian_arch         amd64
+test-amd64-amd64-xl-qcow2 debian_bootloader   pygrub
+test-amd64-amd64-xl-qcow2 debian_diskfmt      qcow2
+test-amd64-amd64-xl-qcow2 debian_kernkind     pvops
+test-amd64-amd64-xl-qcow2 debian_method       netboot
+test-amd64-amd64-xl-qcow2 debian_suite        wheezy
+test-amd64-amd64-xl-qcow2 kernbuildjob        build-amd64-pvops
+test-amd64-amd64-xl-qcow2 kernkind            pvops
+test-amd64-amd64-xl-qcow2 toolstack           xl
+test-amd64-amd64-xl-qcow2 xenbuildjob         build-amd64

Compared to test-amd64-amd64-pygrub (which is the most similar job) and
normalising the test name the difference is:
 test-amd64-amd64-SUFFIX   all_hostflags       arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test
 test-amd64-amd64-SUFFIX   arch                amd64
 test-amd64-amd64-SUFFIX   buildjob            build-amd64
 test-amd64-amd64-SUFFIX   debian_arch         amd64
 test-amd64-amd64-SUFFIX   debian_bootloader   pygrub
+test-amd64-amd64-SUFFIX   debian_diskfmt      qcow2
+test-amd64-amd64-SUFFIX   debian_kernkind     pvops
 test-amd64-amd64-SUFFIX   debian_method       netboot
 test-amd64-amd64-SUFFIX   debian_suite        wheezy
 test-amd64-amd64-SUFFIX   kernbuildjob        build-amd64-pvops
 test-amd64-amd64-SUFFIX   kernkind            pvops
 test-amd64-amd64-SUFFIX   toolstack           xl
 test-amd64-amd64-SUFFIX   xenbuildjob         build-amd64

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v8: Default diskfmt is none (was lvm), i.e. use the LVM device
directly. Reword the commit log to reflect this.

v7: Use the right arch for tests, not always amd64 (doesn't work well
on arm!)
Defer guest_find_diskimg until _vg runvar and thence Lvdev are
setup:
selectguest calls guest_find_lv then guest_find_diskimg, using
preexisting runvars.

But prepare_guest calls selectguest before setting disk_lv, so
Lvdev ends up undefined, after setting

[Xen-devel] [PATCH OSSTEST v8 14/14] make-distros-flight: Use ftp.debian.org directly

2015-07-08 Thread Ian Campbell

The local proxy seems to serve stale packages for Jessie etc, I blame
the intercepting cache on the way out of our network, similar to
b5f15136900d mg-debian-installer-update: workaround caching proxies,
except it is between the apt-cache and the world not the osstest vm
and the world.

Since the netboot kernel+initrd are reasonably small, these flights
are infrequent and they are intended to test the current upstream
version, I think this is tolerable.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
 make-distros-flight | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/make-distros-flight b/make-distros-flight
index 49f4b60..d407fcb 100755
--- a/make-distros-flight
+++ b/make-distros-flight
@@ -79,7 +79,9 @@ test_do_one_netboot () {
   gsuite=sid
   gver=daily
   else
-local mirror=http://`getconfig DebianMirrorHost`/`getconfig DebianMirrorSubpath`
+#local mirror=http://`getconfig DebianMirrorHost`/`getconfig DebianMirrorSubpath`
+# XXX local mirror seems to serve up stale files.
+local mirror=http://ftp.debian.org/debian;
 diurl=$mirror/dists/$gsuite/main/installer-$domU/current/images/netboot
 gver=$gsuite
   fi
-- 
2.1.4



[Xen-devel] [PATCH OSSTEST v8 05/14] distros: Support pvgrub for Wheezy too.

2015-07-08 Thread Ian Campbell

This requires us to install pv-grub-menu from backports, which we do
using a late_command.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v8:
  - Use a heredoc for sources.list additions, since this was suggested
by Ian I would have retained the ack apart from the second change.
  - Dropped unused $cd argument to setup_netinst. (Noticed when I came
to document gident_cd and found it wasn't used!)
v7:
  - Remove vestigial attempts to enable -backports via d-i preseeding.
v3:
  - Remove spurious () from (END) (and the preexisting  too)
  - Remove $xopts{EnableBackports} and automatically handle the need to add
backports in preseed_base.
  - Install via late_command not apt-setup, since the former has
issues, hence subject drops attempt to...
---
 Osstest/Debian.pm| 40 ---
 make-distros-flight  | 58 +--
 ts-debian-di-install | 76 +++-
 3 files changed, 168 insertions(+), 6 deletions(-)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index 1edf49f..7c94b6c 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -777,8 +777,6 @@ sub preseed_base (;@) {
 preseed_hook_overlay($ho, $sfx, 'overlay', 'overlay.tar');
 
 my $preseed = END;
-d-i mirror/suite string $suite
-
 d-i debian-installer/locale string en_GB
 d-i console-keymaps-at/keymap select gb
 d-i keyboard-configuration/xkb-keymap string en_GB
@@ -854,6 +852,11 @@ END
 d-i clock-setup/ntp-server string $ntpserver
 END
 
+# For CDROM the suite is part of the image
+$preseed .= END unless $xopts{CDROM};
+d-i mirror/suite string $suite
+END
+
 $preseed .= END;
 
 ### END OF DEBIAN PRESEED BASE
@@ -867,7 +870,38 @@ sub preseed_create_guest ($$;@) {
 
 my $suite= $xopts{Suite} || $c{DebianSuite};
 
-my $extra_packages = pv-grub-menu if $xopts{PvMenuLst};
+my $extra_packages = ;
+if ($xopts{PvMenuLst}) {
+if ($suite =~ m/wheezy/) {
+# pv-grub-menu/wheezy-backports + using apt-setup to add
+# backports results in iproute, ifupdown and
+# isc-dhcp-client getting removed because tasksel's
+# invocation of apt-get install somehow decides the
+# iproute2 from wheezy-backports is a thing it wants to
+# install. So instead lets fake it with a late command...
+#
+# This also has the bonus of working round an issue with
+# 1.2.1~bpo70+1 which created an invalid menu.lst using
+# root(/dev/xvda,0) which pvgrub cannot parse because
+# the Grub device.map isn't present at pkgsel/include time
+# but it is by late_command time. This was fixed by
+# version 1.3 which is in Jessie onwards.
+preseed_hook_command($ho, 'late_command', $sfx, END);
+#!/bin/sh
+set -ex
+
+cat EOF /target/etc/apt/sources.list
+
+\# $suite backports
+deb http://$c{DebianMirrorHost}/$c{DebianMirrorSubpath} $suite-backports main
+EOF
+in-target apt-get update
+in-target apt-get install -y -t wheezy-backports pv-grub-menu
+END
+} else {
+$extra_packages = pv-grub-menu;
+}
+}
 
 my $preseed_file= preseed_base($ho, $suite, $sfx, $extra_packages, %xopts);
 $preseed_file.= (END);
diff --git a/make-distros-flight b/make-distros-flight
index c19e3ba..49f4b60 100755
--- a/make-distros-flight
+++ b/make-distros-flight
@@ -106,9 +106,9 @@ test_do_one_netboot () {
 arm*_arm*_*) bootloader=pygrub;; # no pvgrub for arm
 
 # Needs a menu.lst, not present in Squeeze+ due to switch to grub2,
-# workedaround in Jessie+ with pv-grub-menu package.
+# workedaround in Wheezy+ with pv-grub-menu package (backports in Wheezy,
+# in Jessie+ main).
 *_squeeze) bootloader=pygrub;;
-*_wheezy) bootloader=pygrub;;
 
 # pv-grub-x86_64.gz is not built by 32-bit dom0 userspace build.
 i386_amd64_*) bootloader=pygrub;;
@@ -127,6 +127,48 @@ test_do_one_netboot () {
   all_hostflags=$most_hostflags
 }
 
+test_do_one_netinst () {
+  local path_arch
+  case $domU in
+amd64|i386) path_arch=multi-arch; file_arch=amd64-i386;;
+*)  path_arch=$domU;  file_arch=$domU;;
+  esac
+  case $domU in
+amd64) iso_path=/install.amd/xen;;
+i386)  iso_path=/install.386/xen;;
+*) iso_path=/install.$domU;;
+  esac
+
+  local cdurl=
+  case $cd in
+current)
+  cdurl=http://cdimage.debian.org/debian-cd/current/${path_arch}/jigdo-cd;
+  ;;
+weekly)
+      cdurl=http://cdimage.debian.org/cdimage/weekly-builds/${path_arch}/jigdo-cd;
+  ;;
+*)
+  echo cd $cd?
+  exit 1
+  ;;
+  esac
+
+  # Always pygrub since no pv-grub-menu on CD
+  job_create_test   \
+   test-$xenarch$kern-$dom0arch-$domU-$cd-netinst-pygrub\
+test-debian-di xl $xenarch $dom0arch\
+

[Xen-devel] [PATCH OSSTEST v8 09/14] distros: Run one suite per day on a weekly basis

2015-07-08 Thread Ian Campbell

Once a week should be sufficient for these tests. Perhaps in the
future we will want to increase the frequency for the suites under
active development (testing, unstable).

For now run this on the Citrix Cambridge instance until the XenProject
instance has sufficient capacity.
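
The schedule this patch adds to crontab-cambridge amounts to a mapping from cron day-of-week to branch. The sketch below restates it as a lookup; the function name is hypothetical, and the day numbers are read off the crontab entries in the diff.

```shell
#!/bin/bash
# Hypothetical lookup: cron day-of-week (2=Tue .. 6=Sat) -> distros branch,
# following the crontab entries added by this patch.
distros_branch_for_dow () {
  case "$1" in
    2) echo distros-debian-squeeze ;;
    3) echo distros-debian-wheezy ;;
    4) echo distros-debian-jessie ;;
    5) echo distros-debian-sid ;;
    6) echo distros-debian-snapshot ;;
    *) return 1 ;;   # no distros flight on the remaining days
  esac
}

for dow in 1 2 3 4 5 6 7; do
  if branch=$(distros_branch_for_dow "$dow"); then
    echo "dow $dow: $branch"
  else
    echo "dow $dow: no distros flight"
  fi
done
```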

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v8: Switch to hard tabs split to previous patch.
v7: Replaces "distros: Run a flight over the weekend".
Now run in Cambridge
Run separate flight per-suite
Dropped Ack
---
 crontab-cambridge | 5 +
 1 file changed, 5 insertions(+)

diff --git a/crontab-cambridge b/crontab-cambridge
index e0c3eff..7d3ed57 100644
--- a/crontab-cambridge
+++ b/crontab-cambridge
@@ -2,4 +2,9 @@ PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 MAILTO=ian.jack...@citrix.com,ian.campb...@eu.citrix.com
 # mh   dom mon dow command
 4-59/30 *   * * *   cd testing.git  BRANCHES='osstest' ./cr-for-branches branches -q ./cr-daily-branch --real
+46 7   * * 6   cd testing.git  BRANCHES='distros-debian-snapshot' ./cr-for-branches branches -w ./cr-daily-branch --real
+46 7   * * 5   cd testing.git  BRANCHES='distros-debian-sid' ./cr-for-branches branches -w ./cr-daily-branch --real
+46 7   * * 4   cd testing.git  BRANCHES='distros-debian-jessie'  ./cr-for-branches branches -w ./cr-daily-branch --real
+46 7   * * 3   cd testing.git  BRANCHES='distros-debian-wheezy'  ./cr-for-branches branches -w ./cr-daily-branch --real
+46 7   * * 2   cd testing.git  BRANCHES='distros-debian-squeeze' ./cr-for-branches branches -w ./cr-daily-branch --real
 3  4   * * *   savelog -c28 testing.git/tmp/cr-for-branches.log /dev/null
-- 
2.1.4



[Xen-devel] [PATCH OSSTEST v8 04/14] distros: support booting Debian PV (d-i installed) guests with pvgrub.

2015-07-08 Thread Ian Campbell

This requires the use of the pv-grub-menu package which is in Jessie
onwards.  (it is in wheezy-backports which is the subject of a
subsequent patch).

The bootloader to use is specified via a runvar {Guest}_bootloader.

Adjust make-distros-flight to use pvgrub for some subset of i386 and
amd64 guests to get coverage.
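
The bootloader choice made in make-distros-flight can be read as a small decision function. The sketch below is an illustration under assumptions: the function name and argument order are hypothetical, and the default is passed in explicitly where the script gets it from stripy. The case arms mirror the ones in this patch's diff.

```shell
#!/bin/bash
# Hypothetical wrapper around the case statement in this patch:
# start from a default bootloader and fall back to pygrub where pvgrub
# cannot be used.
pick_bootloader () {
  local dom0arch=$1 domU=$2 suite=$3 bootloader=$4
  case "${dom0arch}_${domU}_${suite}" in
    arm*_arm*_*)  bootloader=pygrub ;;  # no pvgrub for arm
    *_squeeze)    bootloader=pygrub ;;  # no menu.lst without pv-grub-menu
    *_wheezy)     bootloader=pygrub ;;  # (lifted by a later patch in the series)
    i386_amd64_*) bootloader=pygrub ;;  # pv-grub-x86_64.gz not built on 32-bit dom0
  esac
  echo "$bootloader"
}

pick_bootloader amd64 amd64 jessie pvgrub
pick_bootloader amd64 i386  wheezy pvgrub
pick_bootloader i386  amd64 jessie pvgrub
```

The result would then be passed on as the debian_bootloader runvar and used in the test name.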

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v8: Added comment regarding new runvar. Since this was inspired by
Ian's comment on "distros: support PV guest install from Debian
netinst media" I have retained the ack.
v7: Move definition of $extra_packages variable to here which is its
first usage.
Use {Guest}_suite not {Guest}_dist as runvar to choose version.
v3: Define and use arch_debian2xen and arch_xen2debian
Avoid pv-grub-x86_64.gz on i386 dom0, we don't build it there.
Fiddle with py vs pv grub stripy a bit.
---
 Osstest.pm   |  7 +++
 Osstest/Debian.pm|  4 +++-
 make-distros-flight  | 20 +++-
 ts-debian-di-install | 18 ++
 4 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/Osstest.pm b/Osstest.pm
index 6535401..8f97dd2 100644
--- a/Osstest.pm
+++ b/Osstest.pm
@@ -39,6 +39,7 @@ BEGIN {
   db_begin_work db_prepare
   ensuredir get_filecontents_core_quiet system_checked
   nonempty visible_undef show_abs_time
+  %arch_debian2xen %arch_xen2debian
   );
 %EXPORT_TAGS = ( );
 
@@ -54,6 +55,12 @@ scalar *main::DEBUG;
 # declaration prevents `Name main::DEBUG used only once'
 # scalar prevents `useless use of a variable in void context'
 
+our %arch_debian2xen = qw(i386 x86_32
+ amd64 x86_64
+ armhf armhf);
+our %arch_xen2debian;
+$arch_xen2debian{$arch_debian2xen{$_}} = $_ foreach keys %arch_debian2xen;
+
 #-- static default config settings --
 
 our %c = qw(
diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index 2d49ff8..1edf49f 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -867,7 +867,9 @@ sub preseed_create_guest ($$;@) {
 
 my $suite= $xopts{Suite} || $c{DebianSuite};
 
-my $preseed_file= preseed_base($ho, $suite, $sfx, '', %xopts);
+my $extra_packages = pv-grub-menu if $xopts{PvMenuLst};
+
+my $preseed_file= preseed_base($ho, $suite, $sfx, $extra_packages, %xopts);
 $preseed_file.= (END);
 d-i partman-auto/method string regular
 d-i partman-auto/choose_recipe \\
diff --git a/make-distros-flight b/make-distros-flight
index bdca7d1..c19e3ba 100755
--- a/make-distros-flight
+++ b/make-distros-flight
@@ -90,6 +90,11 @@ test_do_one_netboot () {
 *) ;;
   esac
 
+  stripy bootloader pvgrub pygrub \
+$xenarch amd64 \
+$dom0arch i386 \
+$domU amd64 \
+
   case $domU in
 i386|amd64)
   diurl=$diurl/xen;;
@@ -97,8 +102,20 @@ test_do_one_netboot () {
   diurl=$diurl/debian-installer/arm64;;
   esac
 
+  case ${dom0arch}_${domU}_${gsuite} in
+arm*_arm*_*) bootloader=pygrub;; # no pvgrub for arm
+
+# Needs a menu.lst, not present in Squeeze+ due to switch to grub2,
+# workedaround in Jessie+ with pv-grub-menu package.
+*_squeeze) bootloader=pygrub;;
+*_wheezy) bootloader=pygrub;;
+
+# pv-grub-x86_64.gz is not built by 32-bit dom0 userspace build.
+i386_amd64_*) bootloader=pygrub;;
+  esac
+
   job_create_test   \
-   test-$xenarch$kern-$dom0arch-$domU-$gver-netboot-pygrub  \
+   test-$xenarch$kern-$dom0arch-$domU-$gver-netboot-$bootloader \
 test-debian-di xl $xenarch $dom0arch\
   kernbuildjob=${bfi}build-$dom0arch-$kernbuild \
   debian_arch=$domU \
@@ -106,6 +123,7 @@ test_do_one_netboot () {
   debian_method=netboot \
   debian_netboot_kernel=$diurl/vmlinuz\
   debian_netboot_ramdisk=$diurl/initrd.gz \
+  debian_bootloader=$bootloader \
   all_hostflags=$most_hostflags
 }
 
diff --git a/ts-debian-di-install b/ts-debian-di-install
index 08019a9..a59194a 100755
--- a/ts-debian-di-install
+++ b/ts-debian-di-install
@@ -22,13 +22,16 @@
 #  Debian arch to install.
 # - gident_method:
 #  Install method, currently only netboot.
+# - gident_bootloader:
+#  The PV bootloader to use when booting the guest. One of
+#  pvgrub or pygrub. Default is pygrub.
 #
 # For method=netboot:
 #
 #  - gident_netboot_kernel:
-#   URL of the kernel to boot
+#   URL of the kernel to boot.
 #  - gident_netboot_ramdisk:
-#   URL of the ramdisk to boot
+#   URL of the ramdisk to boot.
 #
 #If neither kernel nor ramdisk are specified then the current
 #TftpDiVersion of d-i will be used, and the runvars will be set to
@@

[Xen-devel] [PATCH OSSTEST v8 11/14] ts-debian-di-install: Refactor root_disk specification

2015-07-08 Thread Ian Campbell

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v4: new patch
---
 ts-debian-di-install | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/ts-debian-di-install b/ts-debian-di-install
index 373fad1..6fafd6d 100755
--- a/ts-debian-di-install
+++ b/ts-debian-di-install
@@ -227,12 +227,14 @@ END
OnPowerOff = preserve
 );
 
+my $root_disk = 'phy:$gho-{Lvdev},xvda,w';
+
 prepareguest_part_xencfg($ho, $gho, $ram_mb, \%install_xopts, END);
 $method_cfg
 extra   = $cmdline
 #
 disk= [
-$extra_disk 'phy:$gho-{Lvdev},xvda,w'
+$extra_disk $root_disk
 ]
 END
 
@@ -256,7 +258,7 @@ END
 $blcfg
 #
 disk= [
-'phy:$gho-{Lvdev},xvda,w'
+$root_disk
 ]
 END
 return;
-- 
2.1.4



[Xen-devel] [PATCH OSSTEST v8 07/14] distros: add branch infrastructure

2015-07-08 Thread Ian Campbell

Since the distro nightlies are not version controlled we cannot use
the usual mechanisms for detecting regressions. Special case things
appropriately. We use an OLD_REVISION of flight-NNN to signify that
the old revision is another flight and not a tree revision.
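
A minimal sketch of how an OLD_REVISION value might be classified under this convention. The function name and the output strings are hypothetical; the flight-[0-9]* pattern and the ${OLD_REVISION#flight-} expansion are the ones used in the diff below.

```shell
#!/bin/bash
# Hypothetical classifier for OLD_REVISION values.
old_revision_kind () {
  case "$1" in
    flight-[0-9]*)  echo "flight ${1#flight-}" ;;  # strip the prefix to get NNN
    *[!0-9a-f]*|'') echo "other" ;;                # not a plain hex revision
    *)              echo "tree $1" ;;              # looks like a tree revision
  esac
}

old_revision_kind flight-37012
old_revision_kind 1fcfd9d
old_revision_kind determine-late
```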

A grep over $NEW_REVISION needed adjusting since NEW_REVISION is empty
in this mode, leading to grep filename which hangs waiting for
stdin.
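
The hang can be avoided by quoting the variable, as in the sketch below. The helper name and the temporary file are assumptions for illustration; the grep -xF invocation is the one from cr-daily-branch.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if FILE has a line exactly equal to REV.
# Quoting "$rev" keeps an empty value as an (empty) pattern argument.
# Unquoted, an empty $rev would vanish, FILE would become the pattern,
# and grep would wait on stdin.
force_rev_matches () {
  local rev=$1 file=$2
  grep -xF -- "$rev" "$file" >/dev/null
}

tmp=$(mktemp)
printf '%s\n' 1fcfd9d > "$tmp"
for rev in 1fcfd9d ''; do
  if force_rev_matches "$rev" "$tmp"; then
    echo "'$rev': forced"
  else
    echo "'$rev': not forced"
  fi
done
rm -f "$tmp"
```

With -x an empty pattern only matches empty lines, so an empty NEW_REVISION simply fails to match instead of blocking.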

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v7:
  Handle empty $NEW_REVISION by quoting it instead of a needless test -n
  Switch to flight-per-suite model
v3:
  Handle within cr-daily-branch, since ap-fetch-version* don't make sense for
  a branch such as this.
---
 cr-daily-branch | 36 
 cri-common  |  1 +
 2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/cr-daily-branch b/cr-daily-branch
index 34b6d2b..1fcfd9d 100755
--- a/cr-daily-branch
+++ b/cr-daily-branch
@@ -68,23 +68,34 @@ fetch_version () {
printf '%s\n' $fetch_version_result
 }
 
-treeurl=`./ap-print-url $branch`
+case $branch in
+distros-*)
+   treeurl=none;;
+*)
+   treeurl=`./ap-print-url $branch`;;
+esac
 
 force_baseline=false
 skipidentical=true
 wantpush=$OSSTEST_PUSH
 
-if [ x$OLD_REVISION = x ]; then
-OLD_REVISION=`./ap-fetch-version-old $branch`
-export OLD_REVISION
-fi
-
 check_tested () {
./sg-check-tested --debug --branch=$branch \
  --blessings=${DAILY_BRANCH_TESTED_BLESSING:-$OSSTEST_BLESSING} \
  $@
 }
 
+if [ x$OLD_REVISION = x ]; then
+case $branch in
+   distros-*)
+   OSSTEST_NO_BASELINE=y
+   OLD_REVISION=flight-`check_tested`
+   ;;
+   *) OLD_REVISION=`./ap-fetch-version-old $branch`;;
+esac
+export OLD_REVISION
+fi
+
 if [ x$OSSTEST_NO_BASELINE != xy ] ; then
testedflight=`check_tested --revision-$tree=$OLD_REVISION`
 
@@ -227,6 +238,11 @@ if [ x$OLD_REVISION = xdetermine-late ]; then
OLD_REVISION=`./ap-fetch-version-baseline-late $branch $NEW_REVISION`
 fi
 
+case $branch in
+distros-*) makeflight=./make-distros-flight ;;
+*) makeflight=./make-flight ;;
+esac
+
 if [ x$NEW_REVISION = x$OLD_REVISION ]; then
 wantpush=false
for checkbranch in x $BRANCHES_ALWAYS; do
@@ -241,7 +257,7 @@ if [ x$NEW_REVISION = x$OLD_REVISION ]; then
 fi
 
 $DAILY_BRANCH_PREMAKE_HOOK
-flight=`./make-flight $branch $xenbranch $OSSTEST_BLESSING $@`
+flight=`$makeflight $branch $xenbranch $OSSTEST_BLESSING $@`
 $DAILY_BRANCH_POSTMAKE_HOOK
 
 heading=tmp/$flight.heading-info
@@ -261,6 +277,10 @@ fi
 revlog=tmp/$flight.revision-log
 
 case $NEW_REVISION/$OLD_REVISION in
+/flight-[0-9]*)
+   echo 2 SGR COMPARISON AGAINST ${OLD_REVISION}
+   sgr_args+= --that-flight=${OLD_REVISION#flight-}
+   ;;
 */*[^0-9a-f]* | *[^0-9a-f]*/*)
 echo 2 NO SGR COMPARISON badchar $NEW_REVISION/$OLD_REVISION
 ;;
@@ -321,7 +341,7 @@ start_email $flight $branch $sgr_args $subject_prefix
 push=false
 if grep '^tolerable$' $mrof /dev/null 21; then push=$wantpush; fi
 if test -f $branch.force; then push=$OSSTEST_PUSH; fi
-if grep -xF $NEW_REVISION $branch.force-rev; then push=$OSSTEST_PUSH; fi
+if grep -xF $NEW_REVISION $branch.force-rev; then push=$OSSTEST_PUSH; fi
 if test -f $branch.block; then push=false; fi
 
 if test -e $mrof  test -e $tree_bisect  ! grep '^broken' $mrof; then
diff --git a/cri-common b/cri-common
index ad44546..58b08f2 100644
--- a/cri-common
+++ b/cri-common
@@ -72,6 +72,7 @@ select_xenbranch () {
rumpuserxen)  tree=rumpuserxen; xenbranch=xen-unstable ;;
seabios)tree=seabios;   xenbranch=xen-unstable ;;
ovmf)   tree=ovmf;  xenbranch=xen-unstable ;;
+   distros-*)  tree=none;  xenbranch=xen-unstable ;;
osstest)tree=osstest;   xenbranch=xen-unstable ;;
esac
if [ x$tree = xlinux ]; then
-- 
2.1.4



[Xen-devel] [PATCH OSSTEST v8 03/14] distros: add support for installing Debian PV guests via d-i, flight and jobs

2015-07-08 Thread Ian Campbell

This patch introduces ts-debian-di-install which can install Debian
from a netboot (PXE) debian installer image. By default it installs
from the d-i image used by osstest (using the special Xen PV guest
enabled flavour where necessary) but it can also fetch the kernel and
ramdisk from URLs specified in runvars. The resulting guests boot the
distro kernel using pygrub (pvgrub will follow).

The distros flights differ substantially from the existing flights.
Introduce make-distros-flight using the functionality previously
refactored into mfi-common. The new flight tests all versions of
Debian from Squeeze onward as amd64, i386 and armhf guests (armhf
from Jessie onwards only) using the usual smoke tests.

Test names are suffixed -pygrub pending the addition of pvgrub
variants in a future commit.

Add the new cases to sg-run-job.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v8: "mfi-common: Allow make-*flight to filter the set of build jobs to
include" was moved to the front of the series, so the bits of that
and "make-distros-flight: don't bother building for XSM or libvirt."
have been folded in here. This patch and those two were all
already acked so I have retained the ack here.

Added description of configuration runvars to
ts-debian-di-install itself. This was due to Ian's feedback on
"distros: support PV guest install from Debian netinst media."
further down the series, so I have retained the ack.

v7: Use {Guest}_suite as runvar. Also use $suite not $dist in
make-distros-flight for consistency.
Switch to a flight per Debian suite model rather than one
enormous flight.
Switch to constructing the URLs in make-distros-flight
v6: Only apply -xen suffix to x86 images when doing a netboot using
  the osstest version of d-i, since that is the only arch where we
  create such files, other arches can use the bare names.
Use the guest $arch not the host $r{arch} when finding the
  kernel+initrd to use for d-i install using the osstest d-i.
v4: use guest create
v3: $BUILD_LVEXTEND_MAX now handled in mfi-common
Consolidate setting of runvars
Include $flight and $job in tmpdir name
Use Osstest::Debian::di_installcmdline_core
Document the usage of get_host_property on a guest object
Correct ARM netboot paths
Include bootloader in test name
   Should include -pv too?
console= repetition for Jessie onwards.
Wait for up to an hour for the install. I'd seen timeouts right at
the end of the install with the previous value
---
 Osstest/TestSupport.pm |   3 +
 make-distros-flight| 138 +
 sg-run-job |  11 +++
 ts-debian-di-install   | 180 +
 4 files changed, 332 insertions(+)
 create mode 100755 make-distros-flight
 create mode 100755 ts-debian-di-install

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 1cace4f..3a7a535 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -931,8 +931,11 @@ sub propname_massage ($) {
 return $prop;
 }
 
+# It is fine to call this on a guest object too, in which case it will
+# always return $defval.
 sub get_host_property ($$;$) {
 my ($ho, $prop, $defval) = @_;
+return $defval unless $ho-{Properties};
 my $val = $ho-{Properties}{propname_massage($prop)};
 return defined($val) ? $val : $defval;
 }
diff --git a/make-distros-flight b/make-distros-flight
new file mode 100755
index 000..bdca7d1
--- /dev/null
+++ b/make-distros-flight
@@ -0,0 +1,138 @@
+#!/bin/bash
+
+# This is part of osstest, an automated testing framework for Xen.
+# Copyright (C) 2009-2013 Citrix Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+
+
+set -e
+
+branch=$1
+xenbranch=$2
+blessing=$3
+buildflight=$4
+
+flight=`./cs-flight-create $blessing $branch`
+
+. cri-common
+. ap-common
+. mfi-common
+
+defsuite=`getconfig DebianSuite`
+defguestsuite=`getconfig GuestDebianSuite`
+
+case $branch in
+  distros-debian-*) debian_suite=${branch#distros-debian-} ;;
+  *)echo $branch 2; exit 1   ;;
+esac
+
+job_create_build_filter_callback () {
+  local job=$1; shift
+
+  case $job in
+build-*-libvirt) return 1;;
+  esac
+  case  $*  in
+* enable_xsm=true *) return 1;;
+

[Xen-devel] [PATCH OSSTEST v8 10/14] Debian: Handle lack of bootloader support in d-i on ARM.

2015-07-08 Thread Ian Campbell

Debian doesn't currently know what bootloader to install in a Xen
guest on ARM. We install pv-grub-menu above which actually does what
we need, but the installer doesn't treat that as a bootloader.

Most ARM platforms end up installing a u-boot boot.scr, based on a
platform whitelist. This doesn't seem appropriate for us. Grub is not
available for arm32. For arm64 we will eventually end up with in-guest
UEFI and therefore grub-efi and things will work normally. I'm not
sure what the answer is going to be for arm32.

This patch enables the workaround for Wheezy, Jessie and Sid,
post-Jessie should be enabled as we add them. (Pre-wheezy does not
support running as a Xen guest on ARM so we don't test them at all).
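
The condition under which the workaround applies can be sketched as a predicate. The patch itself implements this in Perl inside preseed_create_guest; the shell function below, including its name, is only an illustrative restatement of the arch and suite test.

```shell
#!/bin/bash
# Hypothetical predicate: should the preseed confirm "no bootloader"?
# Mirrors the patch's test: arch starts with "arm" and the suite name
# contains wheezy, jessie or sid.
needs_nobootloader_confirmation () {
  local arch=$1 suite=$2
  case "$arch" in
    arm*) ;;
    *)    return 1 ;;
  esac
  case "$suite" in
    *wheezy*|*jessie*|*sid*) return 0 ;;
    *)                       return 1 ;;
  esac
}

for pair in "armhf jessie" "amd64 jessie" "armhf squeeze"; do
  set -- $pair
  if needs_nobootloader_confirmation "$1" "$2"; then
    echo "$1/$2: add d-i nobootloader/confirmation_common boolean true"
  else
    echo "$1/$2: no change"
  fi
done
```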

Signed-off-by: Ian Campbell ian.campb...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
---
v4: Handle sid too
v3: New
---
 Osstest/Debian.pm| 14 --
 ts-debian-di-install |  6 --
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index 7c94b6c..4669047 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -865,8 +865,8 @@ END
 return $preseed;
 }
 
-sub preseed_create_guest ($$;@) {
-my ($ho, $sfx, %xopts) = @_;
+sub preseed_create_guest ($$$;@) {
+my ($ho, $arch, $sfx, %xopts) = @_;
 
 my $suite= $xopts{Suite} || $c{DebianSuite};
 
@@ -913,6 +913,16 @@ d-i grub-installer/bootdev  string /dev/xvda
 
 END
 
+# Debian doesn't currently know what bootloader to install in a
+# Xen guest on ARM. We install pv-grub-menu above which actually
+# does what we need, but the installer doesn't treat that as a
+# bootloader.
+logm(\$arch is $arch, \$suite is $suite);
+$preseed_file.= (END) if $arch =~ /^arm/  $suite =~ /wheezy|jessie|sid/;
+d-i nobootloader/confirmation_common boolean true
+
+END
+
 $preseed_file .= preseed_hook_cmds();
 
 return create_webfile($ho, preseed$sfx, $preseed_file);
diff --git a/ts-debian-di-install b/ts-debian-di-install
index 1a7e1d0..373fad1 100755
--- a/ts-debian-di-install
+++ b/ts-debian-di-install
@@ -192,7 +192,9 @@ END
 
$method_cfg = setup_netboot($tmpdir, $arch, $suite);
 
-   $ps_url = preseed_create_guest($gho, '', Suite=$suite, PvMenuLst=($bl eq pvgrub));
+   $ps_url = preseed_create_guest($gho, $arch, '',
+  Suite=$suite,
+  PvMenuLst=($bl eq pvgrub));
 
$extra_disk = ;
 }
@@ -202,7 +204,7 @@ END
 
($method_cfg,$extra_disk) = setup_netinst($tmpdir, $arch);
 
-   $ps_url = preseed_create_guest($gho, '', CDROM=1);
+   $ps_url = preseed_create_guest($gho, $arch, '', CDROM=1);
 }
 else
 {
-- 
2.1.4



Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Jan Beulich

 On 08.07.15 at 11:39, david.vra...@citrix.com wrote:
 On 08/07/15 09:56, Jan Beulich wrote:
 Rather than assuming only PV guests need special treatment (and
 dealing with that directly when an IRQ gets set up), keep all guest MSI
 IRQs masked until either the (HVM) guest unmasks them via vMSI or the
 (PV, PVHVM, or PVH) guest sets up an event channel for it.
 
 To not further clutter the common evtchn_bind_pirq() with x86-specific
 code, introduce an arch_evtchn_bind_pirq() hook instead.
 
 Can you describe the symptoms of the bug being fixed here?

Interrupts simply didn't get unmasked for PVHVM Linux guests.

 --- a/xen/include/asm-arm/irq.h
 +++ b/xen/include/asm-arm/irq.h
 @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d, 
  
  void arch_move_irqs(struct vcpu *v);
  
 +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
 
 Would this be better as an inline function?
 
 +
  /* Set IRQ type for an SPI */
  int irq_set_spi_type(unsigned int spi, unsigned int type);
  
 --- a/xen/include/xen/irq.h
 +++ b/xen/include/xen/irq.h
 @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
  unsigned int arch_hwdom_irqs(domid_t);
  #endif
  
 +#ifndef arch_evtchn_bind_pirq
 +void arch_evtchn_bind_pirq(struct domain *, int pirq);
 
 ... moving this into xen/include/asm-x86/irq.h

Oh, right, (also to Julien) - this is exactly the reason I do not want it
to be an inline function for ARM: I want the declaration here, not
replicated in every interested arch's header.

Jan



Re: [Xen-devel] [v3 13/15] vmx: Properly handle notification event when vCPU is running

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, June 24, 2015 1:18 PM
 
 When a vCPU is running in Root mode and a notification event
 has been injected to it, we need to set VCPU_KICK_SOFTIRQ for
 the current cpu, so the pending interrupt in PIRR will be
 synced to vIRR before VM-Exit in time.
 
 Signed-off-by: Feng Wu feng...@intel.com

Acked-by: Kevin Tian kevin.t...@intel.com


Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Wu, Feng



 -Original Message-
 From: Tian, Kevin
 Sent: Wednesday, July 08, 2015 7:00 PM
 To: Wu, Feng; xen-devel@lists.xen.org
 Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
 Yang Z; george.dun...@eu.citrix.com
 Subject: RE: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
 
  From: Wu, Feng
  Sent: Wednesday, June 24, 2015 1:18 PM
 
  This patch includes the following aspects:
  - Add a global vector to wake up the blocked vCPU
when an interrupt is being posted to it (This
part was suggested by Yang Zhang yang.z.zh...@intel.com).
  - Adds a new per-vCPU tasklet to wakeup the blocked
vCPU. It can be used in the case vcpu_unblock
cannot be called directly.
  - Define two per-cpu variables:
* pi_blocked_vcpu:
A list storing the vCPUs which were blocked on this pCPU.
 
* pi_blocked_vcpu_lock:
The spinlock to protect pi_blocked_vcpu.
 
  Signed-off-by: Feng Wu feng...@intel.com
  ---
  v3:
  - This patch is generated by merging the following three patches in v2:
 [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
 [RFC v2 10/15] vmx: Define two per-cpu variables
 [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d
 Posted-Interrupts
  - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
  - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct 
  arch_vmx_struct'
  - rename 'vcpu_wakeup_tasklet_handler' to
 'pi_vcpu_wakeup_tasklet_handler'
  - Make pi_wakeup_interrupt() static
  - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
  - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
  - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
  - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
 
   xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
   xen/arch/x86/hvm/vmx/vmx.c | 54
  ++
   xen/include/asm-x86/hvm/hvm.h  |  1 +
   xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
   xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
   5 files changed, 68 insertions(+)
 
  diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
  index 11dc1b5..0c5ce3f 100644
  --- a/xen/arch/x86/hvm/vmx/vmcs.c
  +++ b/xen/arch/x86/hvm/vmx/vmcs.c
  @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
   if ( cpu_has_vmx_vpid )
   vpid_sync_all();
 
  +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
  +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
  +
   return 0;
   }
 
  diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
  index b94ef6a..7db6009 100644
  --- a/xen/arch/x86/hvm/vmx/vmx.c
  +++ b/xen/arch/x86/hvm/vmx/vmx.c
  @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr,
 uint64_t
  *msr_content);
   static int vmx_msr_write_intercept(unsigned int msr, uint64_t
 msr_content);
   static void vmx_invlpg_intercept(unsigned long vaddr);
 
  +/*
  + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
  + * can find which vCPU should be woken up.
  + */
  +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
  +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
  +
   uint8_t __read_mostly posted_intr_vector;
  +uint8_t __read_mostly pi_wakeup_vector;
  +
  +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
  +{
  +vcpu_unblock((struct vcpu *)arg);
  +}
 
   static int vmx_domain_initialise(struct domain *d)
   {
  @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
   if ( v->vcpu_id == 0 )
   v->arch.user_regs.eax = 1;
 
  +tasklet_init(
  +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
  +pi_vcpu_wakeup_tasklet_handler,
  +(unsigned long)v);
  +
  +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
  +
   return 0;
   }
 
   static void vmx_vcpu_destroy(struct vcpu *v)
   {
  +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
   /*
* There are cases that domain still remains in log-dirty mode when it
 is
* about to be destroyed (ex, user types 'xl destroy dom'), in which
 case
  @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
  vmx_function_table = {
   .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
   };
 
  +/*
  + * Handle VT-d posted-interrupt when VCPU is blocked.
  + */
  +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
  +{
  +struct arch_vmx_struct *vmx;
  +unsigned int cpu = smp_processor_id();
  +
  +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
  +
  +/*
  + * FIXME: The length of the list depends on how many
  + * vCPU is current blocked on this specific pCPU.
  + * This may hurt the interrupt latency if the list
  + * grows to too many entries.
  + */
 
 let's go with this linked list first until a real issue is identified.
 
  +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
  +pi_blocked_vcpu_list)
  +if ( vmx->pi_desc.on )
  +

Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Julien Grall




On 08/07/2015 11:55, Jan Beulich wrote:

On 08.07.15 at 11:07, julien.gr...@citrix.com wrote:

On 08/07/2015 09:56, Jan Beulich wrote:

--- a/xen/include/asm-arm/irq.h
+++ b/xen/include/asm-arm/irq.h
@@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,

   void arch_move_irqs(struct vcpu *v);

+#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
+


This addition is here in order to ensure that d and pirq are evaluated,
right?


Sure.


If so, I didn't find it obvious to understand. Why didn't you use a
static inline? Or maybe add a comment explicitly saying this is not
implemented.


A static inline could be used in this case, yes. But I see no
significant advantages. As to the comment - it is implemented,
it's just a no-op. And stating that it is a no-op would be
redundant with it obviously being so by looking at it.


It's not that obvious, given that I had to ask about it.

The first thing I saw was (d) + (pirq), and I thought: why do we want to
add a domain to a pirq? I only noticed the (void) afterwards, and only
because I remembered we talked about a similar case a year ago.


Having a comment doesn't hurt and helps comprehension.

--
Julien Grall


Re: [Xen-devel] x86, arm: remove asm/spinlock.h from all architectures removed x86's _raw_read_unlock()

2015-07-08 Thread David Vrabel

On 08/07/15 11:45, Jan Beulich wrote:
 David,
 
 I'm afraid we'll need another fixup here, even if things build fine
 despite the removal.

Ah, we get a generic implementation instead.  Thanks for pointing this
out.  I'll fix it.

David


Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Jan Beulich

 On 08.07.15 at 11:07, julien.gr...@citrix.com wrote:
 On 08/07/2015 09:56, Jan Beulich wrote:
 --- a/xen/include/asm-arm/irq.h
 +++ b/xen/include/asm-arm/irq.h
 @@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,

   void arch_move_irqs(struct vcpu *v);

 +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
 +
 
 This addition is here in order to ensure that d and pirq are evaluated, 
 right?

Sure.

 If so, I didn't find it obvious to understand. Why didn't you use a
 static inline? Or maybe add a comment explicitly saying this is not
 implemented.

A static inline could be used in this case, yes. But I see no
significant advantages. As to the comment - it is implemented,
it's just a no-op. And stating that it is a no-op would be
redundant with it obviously being so by looking at it.

Jan



Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Wu, Feng

 -Original Message-
 From: Wu, Feng
 Sent: Wednesday, July 08, 2015 6:32 PM
 To: Tian, Kevin; xen-devel@lists.xen.org
 Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
 Yang Z; george.dun...@eu.citrix.com; Wu, Feng
 Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
 changes

  -Original Message-
  From: Tian, Kevin
  Sent: Wednesday, July 08, 2015 6:23 PM
  To: Wu, Feng; xen-devel@lists.xen.org
  Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
  Yang Z; george.dun...@eu.citrix.com
  Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
  changes

   From: Wu, Feng
   Sent: Wednesday, June 24, 2015 1:18 PM

   When guest changes its interrupt configuration (such as, vector, etc.)
   for direct-assigned devices, we need to update the associated IRTE
   with the new guest vector, so external interrupts from the assigned
   devices can be injected to guests without VM-Exit.

   For lowest-priority interrupts, we use vector-hashing mechanism to find
   the destination vCPU. This follows the hardware behavior, since modern
   Intel CPUs use vector hashing to handle the lowest-priority interrupt.

   For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
   still use interrupt remapping.

   Signed-off-by: Feng Wu feng...@intel.com
   ---
   v3:
   - Use bitmap to store the all the possible destination vCPUs of an
   interrupt, then trying to find the right destination from the bitmap
   - Typo and some small changes

xen/drivers/passthrough/io.c | 96
   +++-
1 file changed, 95 insertions(+), 1 deletion(-)

   diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
   index 9b77334..18e24e1 100644
   --- a/xen/drivers/passthrough/io.c
   +++ b/xen/drivers/passthrough/io.c
   @@ -26,6 +26,7 @@
#include <asm/hvm/iommu.h>
#include <asm/hvm/support.h>
#include <xen/hvm/irq.h>
   +#include <asm/io_apic.h>

static DEFINE_PER_CPU(struct list_head, dpci_list);

   @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci
 *dpci)
xfree(dpci);
}

   +/*
   + * The purpose of this routine is to find the right destination vCPU for
   + * an interrupt which will be delivered by VT-d posted-interrupt. There
   + * are several cases as below:

  If you aim to have this interface common to more usages, don't restrict to
  VT-d posted-interrupt which should be just an example.

 Yes, making this a common interface should be better.

Thinking about this a little more, this function itself is kind of restricted to
VT-d posted-interrupt, since it doesn't handle multicast/broadcast interrupts;
it only handles lowest-priority and single-destination interrupts. However, I
can make the vector-hashing logic a separate function, which can be
used elsewhere.
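
A rough sketch of what such a separate vector-hashing helper might look like, assuming the candidate destinations are collected in a bitmap indexed by vCPU id and that "gvec % max number of destination vCPUs" means the guest vector modulo the number of candidates. The names pick_dest_by_vector_hash() and bitmap_weight_demo() are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Test whether bit 'nr' is set in the bitmap. */
static int bit_is_set(const unsigned long *map, unsigned int nr)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Count the set bits among the first 'nbits' bits. */
static unsigned int bitmap_weight_demo(const unsigned long *map,
                                       unsigned int nbits)
{
    unsigned int i, n = 0;

    for ( i = 0; i < nbits; i++ )
        n += bit_is_set(map, i);

    return n;
}

/*
 * Pick the destination for a lowest-priority interrupt: take the guest
 * vector modulo the number of candidates, then walk the bitmap to the
 * candidate with that index. Returns the vCPU id, or -1 if the bitmap
 * holds no candidate at all.
 */
int pick_dest_by_vector_hash(const unsigned long *dest_map,
                             unsigned int max_vcpus, unsigned char gvec)
{
    unsigned int candidates = bitmap_weight_demo(dest_map, max_vcpus);
    unsigned int i, idx;

    if ( candidates == 0 )
        return -1;

    idx = gvec % candidates;

    for ( i = 0; i < max_vcpus; i++ )
    {
        if ( !bit_is_set(dest_map, i) )
            continue;
        if ( idx-- == 0 )
            return (int)i;
    }

    return -1; /* not reached when candidates > 0 */
}
```

With candidates on vCPUs 1, 3 and 6, vector 49 hashes to index 1 and therefore selects vCPU 3. Keeping the helper free of any posted-interrupt state is what would allow it to be reused elsewhere.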

Thanks,
Feng

   + *
   + * - For lowest-priority interrupts, we find the destination vCPU from 
   the
   + *   guest vector using vector-hashing mechanism and return true. This
  follows
   + *   the hardware behavior, since modern Intel CPUs use vector hashing
 to
   + *   handle the lowest-priority interrupt.

  Does AMD use same hashing mechanism? Can this interface be reused by
  other IOMMU type or it's an Intel specific implementation?

 I am not sure how AMD handles lowest-priority interrupts. Intel hardware
 guys told me recent Intel hardware platforms use this method to deliver
 lowest-priority interrupts. What do you mean by other IOMMU type?

 Thanks,
 Feng

   + * - Otherwise, for single destination interrupt, it is straightforward 
   to
   + *   find the destination vCPU and return true.
   + * - For multicast/broadcast vCPU, we cannot handle it via interrupt
 posting,
   + *   so return false.
   + *
   + *   Here are the details about the vector-hashing mechanism:
   + *   1. For lowest-priority interrupts, store all the possible 
   destination
   + *  vCPUs in an array.
   + *   2. Use gvec % max number of destination vCPUs to find the right
   + *  destination vCPU in the array for the lowest-priority interrupt.
   + */
   +static struct vcpu *pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
   +  uint8_t dest_mode, uint8_t
  delivery_mode,
   +  uint8_t gvec)
   +{
   +unsigned long *dest_vcpu_bitmap = NULL;
   +unsigned int dest_vcpu_num = 0, idx = 0;
   +int size = (d->max_vcpus + BITS_PER_LONG - 1) / BITS_PER_LONG;
   +struct vcpu *v, *dest = NULL;
   +int i;
   +
   +dest_vcpu_bitmap = xzalloc_array(unsigned long, size);
   +if ( !dest_vcpu_bitmap )
   +{
   +dprintk(XENLOG_G_INFO,
   +"dom%d: failed to allocate memory\n", d->domain_id);
   +return NULL;
   +}
   +
   +for_each_vcpu ( d, v )
   +{
   +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
   +

Re: [Xen-devel] [PATCH] xen: Use module_pci_driver() in platform pci driver.

2015-07-08 Thread David Vrabel

On 08/07/15 06:54, Rajat Jain wrote:
 Eliminate the module_init function by using module_pci_driver()

This is not equivalent since this adds a useless module_exit() function.

David


Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Jan Beulich

 On 08.07.15 at 13:14, david.vra...@citrix.com wrote:
 On 08/07/15 11:58, Jan Beulich wrote:
 On 08.07.15 at 11:39, david.vra...@citrix.com wrote:
 On 08/07/15 09:56, Jan Beulich wrote:
 +
  /* Set IRQ type for an SPI */
  int irq_set_spi_type(unsigned int spi, unsigned int type);
  
 --- a/xen/include/xen/irq.h
 +++ b/xen/include/xen/irq.h
 @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
  unsigned int arch_hwdom_irqs(domid_t);
  #endif
  
 +#ifndef arch_evtchn_bind_pirq
 +void arch_evtchn_bind_pirq(struct domain *, int pirq);

 ... moving this into xen/include/asm-x86/irq.h
 
 Oh, right, (also to Julien) - this is exactly the reason I do not want it
 to be an inline function for ARM: I want the declaration here, not
 replicated in every interested arch's header.
 
 Ok.
 
 FWIW, with this requirement I would (instead of the macros) add a weak
 arch_evtchn_bind_pirq() that's a no-op.

Yeah, that's how Linux likes to do it. But we learned the hard way
that weak conflicts with our making symbols hidden by default, so
no, weak is not an option either I'm afraid.

Jan



Re: [Xen-devel] [PATCH V4 3/3] xen/vm_event: Deny register writes if refused by vm_event reply

2015-07-08 Thread Lengyel, Tamas


 Are the license headers required? I just tried to make the change as
 small as possible, and looking at the other headers (for example in
 xen/include/asm-arm), at least half of them have no license header. I'm
 guessing this is something we'd now like to start correcting in new
 patches?


 Thanks,
 Razvan


The wiki's definition that goes with the Signed-off-by tag reads: "The
contribution was created in whole or in part by me and I have the right to
submit it under the open source license indicated in the file". So, the
open source license should be indicated in the file. If it's a new file
being created, I would say it's the creator's responsibility to add the
license. But I haven't seen any discussion/documentation on the matter so
I'm just guessing.

Cheers,
Tamas

Re: [Xen-devel] [PATCH v25 05/15] x86/VPMU: Initialize VPMUs with __initcall

2015-07-08 Thread Dietmar Hahn

Am Freitag 19 Juni 2015, 14:44:36 schrieb Boris Ostrovsky:
 Move some VPMU initialization operations into __initcalls to avoid performing
 the same tests and calculations for each vcpu.
 
 Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com
 Acked-by: Jan Beulich jbeul...@suse.com

For the Intel/VMX part:

Reviewed-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com

 ---
  xen/arch/x86/hvm/svm/vpmu.c   | 106 --
  xen/arch/x86/hvm/vmx/vpmu_core2.c | 151 
 +++---
  xen/arch/x86/hvm/vpmu.c   |  32 
  xen/include/asm-x86/hvm/vpmu.h|   2 +
  4 files changed, 156 insertions(+), 135 deletions(-)
 
 diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
 index 481ea7b..b60ca40 100644
 --- a/xen/arch/x86/hvm/svm/vpmu.c
 +++ b/xen/arch/x86/hvm/svm/vpmu.c
 @@ -356,54 +356,6 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t 
 *msr_content)
  return 1;
  }
  
 -static int amd_vpmu_initialise(struct vcpu *v)
 -{
 -struct xen_pmu_amd_ctxt *ctxt;
 -struct vpmu_struct *vpmu = vcpu_vpmu(v);
 -uint8_t family = current_cpu_data.x86;
 -
 -if ( counters == NULL )
 -{
 - switch ( family )
 -  {
 -  case 0x15:
 -  num_counters = F15H_NUM_COUNTERS;
 -  counters = AMD_F15H_COUNTERS;
 -  ctrls = AMD_F15H_CTRLS;
 -  k7_counters_mirrored = 1;
 -  break;
 -  case 0x10:
 -  case 0x12:
 -  case 0x14:
 -  case 0x16:
 -  default:
 -  num_counters = F10H_NUM_COUNTERS;
 -  counters = AMD_F10H_COUNTERS;
 -  ctrls = AMD_F10H_CTRLS;
 -  k7_counters_mirrored = 0;
 -  break;
 -  }
 -}
 -
 -ctxt = xzalloc_bytes(sizeof(*ctxt) +
 - 2 * sizeof(uint64_t) * num_counters);
 -if ( !ctxt )
 -{
 -gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
 - "PMU feature is unavailable on domain %d vcpu %d.\n",
 -v->vcpu_id, v->domain->domain_id);
 -return -ENOMEM;
 -}
 -
 -ctxt->counters = sizeof(*ctxt);
 -ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
 -
 -vpmu->context = ctxt;
 -vpmu->priv_context = NULL;
 -vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 -return 0;
 -}
 -
  static void amd_vpmu_destroy(struct vcpu *v)
  {
  struct vpmu_struct *vpmu = vcpu_vpmu(v);
 @@ -474,30 +426,62 @@ struct arch_vpmu_ops amd_vpmu_ops = {
  
  int svm_vpmu_initialise(struct vcpu *v)
  {
 +struct xen_pmu_amd_ctxt *ctxt;
  struct vpmu_struct *vpmu = vcpu_vpmu(v);
 -uint8_t family = current_cpu_data.x86;
 -int ret = 0;
  
 -/* vpmu enabled? */
  if ( vpmu_mode == XENPMU_MODE_OFF )
  return 0;
  
 -switch ( family )
 +if ( !counters )
 +return -EINVAL;
 +
 +ctxt = xzalloc_bytes(sizeof(*ctxt) +
 + 2 * sizeof(uint64_t) * num_counters);
 +if ( !ctxt )
  {
 +printk(XENLOG_G_WARNING "Insufficient memory for PMU, "
 +"PMU feature is unavailable on domain %d vcpu %d.\n",
 +   v->vcpu_id, v->domain->domain_id);
 +return -ENOMEM;
 +}
 +
 +ctxt->counters = sizeof(*ctxt);
 +ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
 +
 +vpmu->context = ctxt;
 +vpmu->priv_context = NULL;
 +
 +vpmu->arch_vpmu_ops = &amd_vpmu_ops;
 +
 +vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 +return 0;
 +}
 +
 +int __init amd_vpmu_init(void)
 +{
 +switch ( current_cpu_data.x86 )
 +{
 +case 0x15:
 +num_counters = F15H_NUM_COUNTERS;
 +counters = AMD_F15H_COUNTERS;
 +ctrls = AMD_F15H_CTRLS;
 +k7_counters_mirrored = 1;
 +break;
  case 0x10:
  case 0x12:
  case 0x14:
 -case 0x15:
  case 0x16:
 -ret = amd_vpmu_initialise(v);
 -if ( !ret )
 -vpmu->arch_vpmu_ops = &amd_vpmu_ops;
 -return ret;
 +num_counters = F10H_NUM_COUNTERS;
 +counters = AMD_F10H_COUNTERS;
 +ctrls = AMD_F10H_CTRLS;
 +k7_counters_mirrored = 0;
 +break;
 +default:
 +printk(XENLOG_WARNING "VPMU: Unsupported CPU family %#x\n",
 +   current_cpu_data.x86);
 +return -EINVAL;
  }
  
 -printk("VPMU: Initialization failed. "
 -   "AMD processor family %d has not "
 -   "been supported\n", family);
 -return -EINVAL;
 +return 0;
  }
  
 diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c 
 b/xen/arch/x86/hvm/vmx/vpmu_core2.c
 index cfcdf42..025c970 100644
 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
 +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
 @@ -708,62 +708,6 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs 
 *regs)
  return 1;
  }
  
 -static int core2_vpmu_initialise(struct vcpu *v)
 -{
 -struct vpmu_struct *vpmu = vcpu_vpmu(v);
 -u64 msr_content;
 -static bool_t ds_warned;
 -
 -if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
 -

Re: [Xen-devel] [v6][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Ian Jackson

Tiejun Chen writes ([v6][PATCH 10/16] tools: introduce some new parameters to 
set rdm policy):
 This patch introduces user configurable parameters to specify RDM
 resource and according policies,

Thanks.


I appreciate that I have come to this review late.  While I have found
the review conversation quite unsatisfactory, I don't really feel that
I can reject the patch series pending better answers to my questions.

Instead, I feel that I need to make a set of decisions which will
avoid my review comments being a blocker for this series.  After
discussing matters with the other tools maintainers, I have concluded:


* On the question of whether the default should be `strategy=host' or
  `strategy=none':

  I still don't understand what is going on here and I am frustrated
  because I don't feel that the replies I have been getting are
  actually answers to my questions.  They seem to be answers to
  different questions.

  However, the patch series with `strategy=none' is strictly less of a
  change to the codebase than with `strategy=host' and it is easy to
  change defaults later.  It would be perverse to block this
  functionality on the grounds that it is not enabled strongly enough
  by default.

  Therefore, despite the fact that after several rounds of emails I
  still do not have a convincing explanation, I am going to drop this
  line of questioning.


* On the question of the documentation: The documentation is
  unfortunately a poor guide to a user.  Many of my questions were
  prompted by reading the documentation.  Having gone several rounds
  of emails I still do not know enough to suggest improvements.

  In my view the effect of the poor documentation will be that most
  users will simply ignore the whole feature as too confusing.
  (Unless they have somehow divined that they are having RDM trouble
  in which case they may flail at random experimenting with various
  options.)

  Again, the effect therefore is that knowledgeable users might be
  able to do better, but for most users this is just yet another piece
  of docs for some feature they don't want to use.

  While I'm not entirely comfortable with accepting documentation
  which reduces the overall readability and usefulness of the manual,
  I think this is a relatively minor objection which I am prepared to
  overlook.

  Of course there is some opportunity for improving the documentation
  during the freeze.


* On the question of option naming, `strategy' vs `type':

  `type' was definitely wrong.  It may be that a better name than
  `strategy' would be correct.  This depends on the contemplated
  direction for future expansion.

  Sadly, I do not expect that further discussion is going to
  illuminate this further.  `strategy' will do.


* On the question of option naming, `none' vs `ignore':

  I asked whether the submitter agreed that `none' should be renamed
  `ignore'.  I have not received a clear opinion.  Instead, the
  submitter indicated a willingness to change this on my request.  The
  latest resubmission just did the rename.

  The purpose of asking `do you agree', in this way, is to try to help
  the submitters and the maintainers come up with the best answers.

  Note that it is a fundamental assumption of the patch review process
  that the submitter understands the design and implementation
  decisions embodied in the patchset.  The submitter needs to be able
  to respond to suggestions with evaluations, not simply acquiescence.
  (If it happens that some of the decisions were made by someone else,
  the submitter needs to 1. state this clearly where relevant and
  2. either consult the designers/authors, or if they aren't
  available, reverse-engineer the intent.)

  In the absence of a clear statement of the submitter's own opinion,
  I remain doubtful that this rename was correct.  But, I don't think
  it important enough to make any more fuss about.


* On the question of option naming, the `reserve='.

  Ian Campbell points out that the API structure for `[rdm_]reserve'
  as submitted is anomalous.  I agree with him.  The existing
  API and config file arrangements are rather too confusing.

  Please change `reserve' to `policy', in the following places:

  * In the xl rdm config parsing, `reserve=' should be `policy='.
  * In the xl pci config parsing, `rdm_reserve=' should be
`rdm_policy='.
  * The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  * The field name `reserve' in `libxl_rdm_reserve' should be
`policy'.


I think that with these changes I will be able to ack the remaining
tools parts of this series, and drop my objections to the parts acked
by Wei.

I can't speak for the hypervisor side, which I haven't really looked
at.

Ian.


Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set

2015-07-08 Thread Wu, Feng



 -Original Message-
 From: Tian, Kevin
 Sent: Wednesday, July 08, 2015 7:31 PM
 To: Wu, Feng; xen-devel@lists.xen.org
 Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
 Yang Z; george.dun...@eu.citrix.com
 Subject: RE: [v3 08/15] Suppress posting interrupts when 'SN' is set
 
  From: Wu, Feng
  Sent: Wednesday, July 08, 2015 6:11 PM
   From: Tian, Kevin
   Sent: Wednesday, July 08, 2015 5:06 PM
  
From: Wu, Feng
Sent: Wednesday, June 24, 2015 1:18 PM
   
Currently, we don't support urgent interrupt, all interrupts
are recognized as non-urgent interrupt, so we cannot send
posted-interrupt when 'SN' is set.
   
Signed-off-by: Feng Wu feng...@intel.com
---
v3:
use cmpxchg to test SN/ON and set ON
   
 xen/arch/x86/hvm/vmx/vmx.c | 32
  
 1 file changed, 28 insertions(+), 4 deletions(-)
   
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 0837627..b94ef6a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1686,6 +1686,8 @@ static void
 __vmx_deliver_posted_interrupt(struct
   vcpu *v)
   
 static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
 {
+struct pi_desc old, new, prev;
+
  
   move to 'else if'.
  
 if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
 return;
   
@@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct
 vcpu
   *v, u8
vector)
  */
 pi_set_on(&v->arch.hvm_vmx.pi_desc);
 }
-else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
+else
 {
+prev.control = 0;
+
+do {
+old.control = v->arch.hvm_vmx.pi_desc.control &
+  ~(1 << POSTED_INTR_ON | 1 <<
   POSTED_INTR_SN);
+new.control = v->arch.hvm_vmx.pi_desc.control |
+  1 << POSTED_INTR_ON;
+
+/*
+ * Currently, we don't support urgent interrupt, all
+ * interrupts are recognized as non-urgent interrupt,
+ * so we cannot send posted-interrupt when 'SN' is set.
+ * Besides that, if 'ON' is already set, we cannot set
+ * posted-interrupts as well.
+ */
+if ( prev.sn || prev.on )
+{
+vcpu_kick(v);
+return;
+}
  
   would it make more sense to move above check after cmpxchg?
 
  My original idea is that we only need to do the check when
  prev.control != old.control, which means the cmpxchg did not
  complete successfully. If we add the check between cmpxchg
  and while ( prev.control != old.control ), it seems the logic is
  not so clear, since we don't need to check prev.sn and prev.on
  when cmpxchg succeeds in setting the new value.
 
  Thanks,
  Feng
 
 
 Then it'd be clearer if you move the check to the start of the loop, so
 you can avoid two additional reads when the prev.on/sn is set. :-)

Good idea!
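
The loop shape agreed on here could look roughly like the sketch below. It is written against C11 atomics rather than Xen's cmpxchg() so that it stands alone, and everything in it is an assumption made for illustration: the PI_ON/PI_SN bit positions, the plain 64-bit control word, and the name pi_try_set_on(). The point is only that 'ON' and 'SN' are tested at the top of every iteration, before the compare-and-exchange is attempted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions within the control word (assumed). */
#define PI_ON (UINT64_C(1) << 0)   /* outstanding notification */
#define PI_SN (UINT64_C(1) << 1)   /* suppress notification */

/*
 * Try to set 'ON' unless 'ON' or 'SN' is already set.
 *
 * Returns true if this call set 'ON', in which case the caller would go
 * on to send the posted-interrupt notification. Returns false if either
 * bit was found set, in which case the caller would fall back to kicking
 * the vCPU instead.
 */
bool pi_try_set_on(_Atomic uint64_t *control)
{
    uint64_t old = atomic_load(control);

    do {
        /* Check first, so no exchange is attempted needlessly. */
        if ( old & (PI_ON | PI_SN) )
            return false;
        /* On failure 'old' is refreshed with the current value. */
    } while ( !atomic_compare_exchange_weak(control, &old, old | PI_ON) );

    return true;
}
```

A caller would then do something like "if ( pi_try_set_on(&ctl) ) send the notification; else kick the vCPU", where both actions are placeholders for whatever the real code does.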

Thanks,
Feng

 
 Thanks
 Kevin


Re: [Xen-devel] [PATCH] x86: correct socket_cpumask allocation for AP

2015-07-08 Thread Jan Beulich

 On 08.07.15 at 11:36, chao.p.p...@linux.intel.com wrote:
 @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
  static void smp_store_cpu_info(int id)
  {
  struct cpuinfo_x86 *c = cpu_data + id;
 +unsigned int socket;
  
  *c = boot_cpu_data;
  if ( id != 0 )
 +{
  identify_cpu(c);
  
 +socket = cpu_to_socket(id);
 +if ( !socket_cpumask[socket] )
 +{
 +socket_cpumask[socket] = secondary_socket_cpumask;
 +secondary_socket_cpumask = NULL;

I don't think this will build with small enough NR_CPUS. Which
raises the question whether the use of cpumask_var_t is suitable
here in the first place.

Jan



Re: [Xen-devel] [PATCH v10 01/13] x86: add socket_cpumask

2015-07-08 Thread Jan Beulich

 On 08.07.15 at 04:43, chao.p.p...@linux.intel.com wrote:
 On Tue, Jul 07, 2015 at 06:32:55PM -0400, Boris Ostrovsky wrote:
 @@ -245,6 +248,8 @@ static void set_cpu_sibling_map(int cpu)
   cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
 +cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]);
 
 This patch crashes Xen on my 32-cpu Intel box here for cpu 16, which is the
 first CPU on the second socket (i.e. on socket 1).
 
 The reason appears to be that cpu_to_socket(16) is (correctly) 1 here, but
 ...
 
 +
   if ( c[cpu].x86_num_siblings > 1 )
   {
   for_each_cpu ( i, &cpu_sibling_setup_map )
 @@ -649,7 +654,13 @@ void cpu_exit_clear(unsigned int cpu)
   static void cpu_smpboot_free(unsigned int cpu)
   {
 -unsigned int order;
 +unsigned int order, socket = cpu_to_socket(cpu);
 +
 +if ( cpumask_empty(socket_cpumask[socket]) )
 +{
 +free_cpumask_var(socket_cpumask[socket]);
 +socket_cpumask[socket] = NULL;
 +}
   free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
   free_cpumask_var(per_cpu(cpu_core_mask, cpu));
 @@ -694,6 +705,7 @@ static int cpu_smpboot_alloc(unsigned int cpu)
   nodeid_t node = cpu_to_node(cpu);
   struct desc_struct *gdt;
   unsigned long stub_page;
 +unsigned int socket = cpu_to_socket(cpu);
 
 ... is zero here, meaning that socket_cpumask[1] is NULL. I suspect that
 phys_proc_id is probably not set at this point but is by the time we get to
 set_cpu_sibling_map(). I haven't looked any further yet. I might do this
 tomorrow unless Chao does it before me.
 
 Thanks for testing.

Boris' report first of all raises the question: Did you test this at all
on a multi-socket system? Considering you not having tested the
CPU removal case either, I'm starting to wonder how much testing
this series has seen overall...

 I think I have found the reason. For AP, phys_proc_id is set in:
 start_secondary()=>smp_callin()=>smp_store_cpu_info()=>identify_cpu()
 which is behind cpu_smpboot_alloc() called from CPU_PREPARE.
 
 One way would be to move 'zalloc_cpumask_var(socket_cpumask + socket)' into
 set_cpu_sibling_map(), if Jan agrees; otherwise another solution needs
 to be found.

Looks sensible at a first glance, but in order to be able to do
proper error handling the allocation needs to remain in
cpu_smpboot_alloc(). I.e. you'd add a static variable, pre-
allocate a cpumask into it if it's currently NULL, and consume the
allocation in set_cpu_sibling_map() (or maybe even better in
smp_store_cpu_info() right after the identify_cpu() call) if
socket_cpumask[socket] is NULL.
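
In outline, the pre-allocate-then-consume scheme suggested above might look like this. It is a stand-alone sketch with stand-in types, not the actual smpboot.c code: mask_t, NR_SOCKETS_DEMO, socket_mask, spare_mask, prepare_cpu() and store_cpu_info() are all hypothetical names, and calloc() stands in for the hypervisor's allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NR_SOCKETS_DEMO 4                 /* illustrative table size */

typedef struct { unsigned long bits; } mask_t;   /* stand-in for a cpumask */

static mask_t *socket_mask[NR_SOCKETS_DEMO];     /* one mask per socket */
static mask_t *spare_mask;                       /* pre-allocated spare */

/*
 * Preparation stage: the socket of the new CPU is not known yet, but
 * this is the place where an allocation failure can still be reported.
 * Keep exactly one spare mask around.
 */
int prepare_cpu(void)
{
    if ( !spare_mask )
    {
        spare_mask = calloc(1, sizeof(*spare_mask));
        if ( !spare_mask )
            return -ENOMEM;
    }

    return 0;
}

/*
 * Bring-up stage: the socket is known now and failing is not an option.
 * The first CPU of a socket consumes the spare; later CPUs of the same
 * socket leave it untouched for the next preparation round.
 */
void store_cpu_info(unsigned int socket, unsigned int cpu)
{
    assert(socket < NR_SOCKETS_DEMO);

    if ( !socket_mask[socket] )
    {
        assert(spare_mask != NULL);
        socket_mask[socket] = spare_mask;
        spare_mask = NULL;
    }

    socket_mask[socket]->bits |= 1UL << cpu;
}
```

The spare is only re-allocated when it has actually been consumed, so bringing up further CPUs on an already known socket costs no extra allocation.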

And then you test this on an affected system, and submit
asap, so we can preferably avoid reverting the whole series.

Jan



Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d - ffff82d080239d85 and other dom0 induced log messages

2015-07-08 Thread Jan Beulich

 On 08.07.15 at 10:45, li...@eikelenboom.it wrote:
 Here we go:
 
 (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (0000): ffff82d080195583 ->
 ffff82d080239d85
 (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (0000): ffff82d080195583 ->
 ffff82d080239d85
 
 which leads to:
 # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable ffff82d080195583
 /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758
 
 # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable ffff82d080239d85
 ??:?
 
 Where /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to:
 
 case MSR_EFER:
  rdmsr_normal:
 /* Everyone can read the MSR space. */
 /* gdprintk(XENLOG_WARNING,Domain attempted RDMSR %p.\n,
 _p(regs-ecx));*/
 HERE --if ( rdmsr_safe(regs-ecx, val) )

Right, so as Andrew suspected - we won't know whether that's
legitimate/reasonable without knowing the MSR being accessed.

Jan



[Xen-devel] [PATCH] x86: correct socket_cpumask allocation for AP

2015-07-08 Thread Chao Peng

For AP, phys_proc_id is still not valid in the CPU_PREPARE notifier
(cpu_smpboot_alloc), so cpu_to_socket(cpu) is not valid either.

Introduce a pre-allocated secondary_socket_cpumask so that later in
smp_store_cpu_info() socket_cpumask[socket] can consume it.

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
This is targeted at the staging branch.
I tested it on a 2-socket machine and it looks fine.
---
 xen/arch/x86/smpboot.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index c73aa1b..49b8497 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -62,6 +62,7 @@ EXPORT_SYMBOL(cpu_online_map);
 
 unsigned int __read_mostly nr_sockets;
 cpumask_var_t *__read_mostly socket_cpumask;
+static cpumask_var_t secondary_socket_cpumask;
 
 struct cpuinfo_x86 cpu_data[NR_CPUS];
 
@@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
 static void smp_store_cpu_info(int id)
 {
 struct cpuinfo_x86 *c = cpu_data + id;
+unsigned int socket;
 
 *c = boot_cpu_data;
 if ( id != 0 )
+{
 identify_cpu(c);
 
+socket = cpu_to_socket(id);
+if ( !socket_cpumask[socket] )
+{
+socket_cpumask[socket] = secondary_socket_cpumask;
+secondary_socket_cpumask = NULL;
+}
+}
+
 /*
  * Certain Athlons might work (for various values of 'work') in SMP
  * but they are not certified as MP capable.
@@ -705,7 +716,6 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 nodeid_t node = cpu_to_node(cpu);
 struct desc_struct *gdt;
 unsigned long stub_page;
-unsigned int socket = cpu_to_socket(cpu);
 
 if ( node != NUMA_NO_NODE )
 memflags = MEMF_node(node);
@@ -748,8 +758,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 goto oom;
 per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu);
 
-if ( !socket_cpumask[socket] &&
- !zalloc_cpumask_var(socket_cpumask + socket) )
+if ( !secondary_socket_cpumask &&
+ !zalloc_cpumask_var(&secondary_socket_cpumask) )
 goto oom;
 
 if ( zalloc_cpumask_var(per_cpu(cpu_sibling_mask, cpu)) 
-- 
1.9.1



[Xen-devel] [PATCH V4 0/3] Vm_event memory introspection helpers

2015-07-08 Thread Razvan Cojocaru

Version 4 of the series addresses V3 reviews, and consists of:

[PATCH 1/3] xen/mem_access: Support for memory-content hiding
[PATCH 2/3] xen/vm_event: Support for guest-requested events
[PATCH 3/3] xen/vm_event: Deny register writes if refused by
vm_event reply

All the patches in this version have been acked by at least one
person. For [PATCH 3/3], Tamas has suggested that I move the
DENY logic from p2m.c to dedicated files, which I've done here.
Since this is simply a trivial move without any modifications
to the logic itself, I've kept both acks received for the patch;
George's ack should in any case not be an issue, as it only
concerned the mm parts which are unchanged, but if I shouldn't
have kept Jan's ack then please disregard it.

This version of the series assumes the patch "vm_event: Rename
MEM_ACCESS_EMULATE and MEM_ACCESS_EMULATE_NOWRITE" that I
submitted yesterday. I've not added that patch to this series
because I wanted it to be available for Tamas as well, as he's
working on a parallel series and I had hoped that this way
would be better than him having to wait for this whole series
to go in.


Thank you,
Razvan



[Xen-devel] [PATCH V4 3/3] xen/vm_event: Deny register writes if refused by vm_event reply

2015-07-08 Thread Razvan Cojocaru

Deny register writes if a vm_client subscribed to mov_to_msr or
control register write events forbids them. Currently supported for
MSR, CR0, CR3 and CR4 events.

Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
Acked-by: Jan Beulich jbeul...@suse.com

---
Changes since V3:
 - Renamed MEM_ACCESS_FLAG_DENY to VM_EVENT_FLAG_DENY (and fixed
   the bit shift appropriately).
 - Moved the DENY vm_event response logic from p2m.c to newly
   added dedicated files for vm_event handling, as suggested
   by Tamas Lengyel.
---
 MAINTAINERS   |1 +
 xen/arch/x86/Makefile |1 +
 xen/arch/x86/domain.c |2 +
 xen/arch/x86/hvm/emulate.c|8 +--
 xen/arch/x86/hvm/event.c  |5 +-
 xen/arch/x86/hvm/hvm.c|  118 -
 xen/arch/x86/hvm/svm/nestedsvm.c  |   14 ++---
 xen/arch/x86/hvm/svm/svm.c|2 +-
 xen/arch/x86/hvm/vmx/vmx.c|   15 +++--
 xen/arch/x86/hvm/vmx/vvmx.c   |   18 +++---
 xen/arch/x86/vm_event.c   |   33 +++
 xen/common/vm_event.c |9 +++
 xen/include/asm-arm/vm_event.h|   12 
 xen/include/asm-x86/domain.h  |   18 +-
 xen/include/asm-x86/hvm/event.h   |9 ++-
 xen/include/asm-x86/hvm/support.h |9 +--
 xen/include/asm-x86/vm_event.h|8 +++
 xen/include/public/vm_event.h |6 ++
 18 files changed, 242 insertions(+), 46 deletions(-)
 create mode 100644 xen/arch/x86/vm_event.c
 create mode 100644 xen/include/asm-arm/vm_event.h
 create mode 100644 xen/include/asm-x86/vm_event.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6b1068e..59c0822 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -383,6 +383,7 @@ F:  xen/common/vm_event.c
 F: xen/common/mem_access.c
 F: xen/arch/x86/hvm/event.c
 F: xen/arch/x86/monitor.c
+F: xen/arch/x86/vm_event.c
 
 XENTRACE
 M: George Dunlap george.dun...@eu.citrix.com
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 37e547c..5f24951 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -60,6 +60,7 @@ obj-y += machine_kexec.o
 obj-y += crash.o
 obj-y += tboot.o
 obj-y += hpet.o
+obj-y += vm_event.o
 obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a8fe046..c688ab9 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -678,6 +678,8 @@ void arch_domain_destroy(struct domain *d)
 cleanup_domain_irq_mapping(d);
 
 psr_free_rmid(d);
+
+xfree(d-arch.event_write_data);
 }
 
 void arch_domain_shutdown(struct domain *d)
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index c6ccb1f..780adb4 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1389,14 +1389,14 @@ static int hvmemul_write_cr(
 switch ( reg )
 {
 case 0:
-return hvm_set_cr0(val);
+return hvm_set_cr0(val, 1);
 case 2:
 current-arch.hvm_vcpu.guest_cr[2] = val;
 return X86EMUL_OKAY;
 case 3:
-return hvm_set_cr3(val);
+return hvm_set_cr3(val, 1);
 case 4:
-return hvm_set_cr4(val);
+return hvm_set_cr4(val, 1);
 default:
 break;
 }
@@ -1417,7 +1417,7 @@ static int hvmemul_write_msr(
 uint64_t val,
 struct x86_emulate_ctxt *ctxt)
 {
-return hvm_msr_write_intercept(reg, val);
+return hvm_msr_write_intercept(reg, val, 1);
 }
 
 static int hvmemul_wbinvd(
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 17638ea..042e583 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -90,7 +90,7 @@ static int hvm_event_traps(uint8_t sync, vm_event_request_t 
*req)
 return 1;
 }
 
-void hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
+bool_t hvm_event_cr(unsigned int index, unsigned long value, unsigned long old)
 {
 struct arch_domain *currad = current-domain-arch;
 unsigned int ctrlreg_bitmask = monitor_ctrlreg_bitmask(index);
@@ -109,7 +109,10 @@ void hvm_event_cr(unsigned int index, unsigned long value, 
unsigned long old)
 
 hvm_event_traps(currad-monitor.write_ctrlreg_sync  ctrlreg_bitmask,
 req);
+return 1;
 }
+
+return 0;
 }
 
 void hvm_event_msr(unsigned int msr, uint64_t value)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 536d1c8..abfca33 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -52,6 +52,7 @@
 #include asm/traps.h
 #include asm/mc146818rtc.h
 #include asm/mce.h
+#include asm/monitor.h
 #include asm/hvm/hvm.h
 #include asm/hvm/vpt.h
 #include asm/hvm/support.h
@@ -468,6 +469,35 @@ void hvm_do_resume(struct vcpu *v)
 }
 }
 
+if ( unlikely(d-arch.event_write_data) )
+{
+struct monitor_write_data *w = d-arch.event_write_data[v-vcpu_id];
+
+if ( w-do_write.msr )
+{
+

[Xen-devel] [PATCH V4 1/3] xen/mem_access: Support for memory-content hiding

2015-07-08 Thread Razvan Cojocaru

This patch adds support for memory-content hiding, by modifying the
value returned by emulated instructions that read certain memory
addresses that contain sensitive data. The patch only applies to
cases where MEM_ACCESS_EMULATE or MEM_ACCESS_EMULATE_NOWRITE have
been set to a vm_event response.

Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
Acked-by: George Dunlap george.dun...@eu.citrix.com

---
Changes since V3:
 - Renamed MEM_ACCESS_SET_EMUL_READ_DATA to
   VM_EVENT_FLAG_SET_EMUL_READ_DATA and updated its comment.
 - Removed xfree(v-arch.vm_event.emul_read_data) from
   free_vcpu_struct().
 - Returning X86EMUL_UNHANDLEABLE from hvmemul_cmpxchg() when
   !curr-arch.vm_event.emul_read_data.
 - Replaced xmalloc_bytes() with xmalloc_array() in
   hvmemul_rep_outs_set_context().
 - Setting the rest of the buffer to zero in hvmemul_rep_movs()
   (no longer leaking heap contents).
 - No longer memset()ing the whole buffer before copy (just zeroing
   out the rest).
 - Moved hvmemul_ctxt-set_context = 0 to hvm_emulate_prepare() and
   removed hvm_emulate_one_set_context().
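
The "copy what is available, zero out the rest" handling mentioned in the
change list can be sketched as below. This is a minimal illustration, not the
patch's code: copy_and_zero_pad() and its parameters are hypothetical names,
and unlike the hunk quoted further down it zero-fills the destination even
when no source bytes are available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill dst (dst_len bytes) from src (src_len bytes). Only the bytes that
 * exist in both buffers are copied; any remaining destination bytes are
 * zeroed, so no stale memory contents are exposed to the reader.
 */
static void copy_and_zero_pad(void *dst, size_t dst_len,
                              const void *src, size_t src_len)
{
    size_t safe = dst_len < src_len ? dst_len : src_len;

    assert(dst != NULL && (src != NULL || src_len == 0));

    if (safe)
        memcpy(dst, src, safe);

    if (dst_len > safe)
        memset((char *)dst + safe, 0, dst_len - safe);
}
```

Zeroing only the tail avoids a memset() of the whole buffer before the copy,
which is the saving the change list refers to.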
---
 tools/tests/xen-access/xen-access.c |2 +-
 xen/arch/x86/hvm/emulate.c  |  138 ++-
 xen/arch/x86/hvm/event.c|   50 ++---
 xen/arch/x86/mm/p2m.c   |   92 +--
 xen/common/domain.c |2 +
 xen/common/vm_event.c   |   23 ++
 xen/include/asm-x86/domain.h|2 +
 xen/include/asm-x86/hvm/emulate.h   |   10 ++-
 xen/include/public/vm_event.h   |   31 ++--
 9 files changed, 274 insertions(+), 76 deletions(-)

diff --git a/tools/tests/xen-access/xen-access.c 
b/tools/tests/xen-access/xen-access.c
index 12ab921..e6ca9ba 100644
--- a/tools/tests/xen-access/xen-access.c
+++ b/tools/tests/xen-access/xen-access.c
@@ -530,7 +530,7 @@ int main(int argc, char *argv[])
 break;
 case VM_EVENT_REASON_SOFTWARE_BREAKPOINT:
 printf(Breakpoint: rip=%016PRIx64, gfn=%PRIx64 (vcpu 
%d)\n,
-   req.regs.x86.rip,
+   req.data.regs.x86.rip,
req.u.software_breakpoint.gfn,
req.vcpu_id);
 
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index fe5661d..c6ccb1f 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -653,6 +653,31 @@ static int hvmemul_read(
 unsigned int bytes,
 struct x86_emulate_ctxt *ctxt)
 {
+struct hvm_emulate_ctxt *hvmemul_ctxt =
+container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+if ( unlikely(hvmemul_ctxt-set_context) )
+{
+struct vcpu *curr = current;
+unsigned int safe_bytes;
+
+if ( !curr-arch.vm_event.emul_read_data )
+return X86EMUL_UNHANDLEABLE;
+
+safe_bytes = min_t(unsigned int,
+   bytes, curr-arch.vm_event.emul_read_data-size);
+
+if ( safe_bytes )
+{
+memcpy(p_data, curr-arch.vm_event.emul_read_data-data, 
safe_bytes);
+
+if ( bytes  safe_bytes )
+memset(p_data + safe_bytes, 0, bytes - safe_bytes);
+}
+
+return X86EMUL_OKAY;
+}
+
 return __hvmemul_read(
 seg, offset, p_data, bytes, hvm_access_read,
 container_of(ctxt, struct hvm_emulate_ctxt, ctxt));
@@ -893,6 +918,28 @@ static int hvmemul_cmpxchg(
 unsigned int bytes,
 struct x86_emulate_ctxt *ctxt)
 {
+struct hvm_emulate_ctxt *hvmemul_ctxt =
+container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+
+if ( unlikely(hvmemul_ctxt-set_context) )
+{
+struct vcpu *curr = current;
+
+if ( curr-arch.vm_event.emul_read_data )
+{
+unsigned int safe_bytes = min_t(unsigned int, bytes,
+curr-arch.vm_event.emul_read_data-size);
+
+memcpy(p_new, curr-arch.vm_event.emul_read_data-data,
+   safe_bytes);
+
+if ( bytes  safe_bytes )
+memset(p_new + safe_bytes, 0, bytes - safe_bytes);
+}
+else
+return X86EMUL_UNHANDLEABLE;
+}
+
 /* Fix this in case the guest is really relying on r-m-w atomicity. */
 return hvmemul_write(seg, offset, p_new, bytes, ctxt);
 }
@@ -935,6 +982,43 @@ static int hvmemul_rep_ins(
!!(ctxt-regs-eflags  X86_EFLAGS_DF), gpa);
 }
 
+static int hvmemul_rep_outs_set_context(
+enum x86_segment src_seg,
+unsigned long src_offset,
+uint16_t dst_port,
+unsigned int bytes_per_rep,
+unsigned long *reps,
+struct x86_emulate_ctxt *ctxt)
+{
+unsigned int bytes = *reps * bytes_per_rep;
+struct vcpu *curr = current;
+unsigned int safe_bytes;
+char *buf = NULL;
+int rc;
+
+if ( !curr-arch.vm_event.emul_read_data )
+return X86EMUL_UNHANDLEABLE;
+
+buf = xmalloc_array(char, bytes);
+
+if ( buf == NULL )
+return

[Xen-devel] [PATCH V4 2/3] xen/vm_event: Support for guest-requested events

2015-07-08 Thread Razvan Cojocaru

Added support for a new class of vm_events: VM_EVENT_REASON_REQUEST,
sent via HVMOP_request_vm_event. The guest can request that a
generic vm_event (containing only the vm_event-filled guest registers
as information) be sent to userspace by setting up the correct
registers and doing a VMCALL. For example, for a 32-bit guest, this
means: EAX = 34 (hvmop), EBX = 24 (HVMOP_guest_request_vm_event),
ECX = 0 (NULL required for the hypercall parameter, reserved).

Signed-off-by: Razvan Cojocaru rcojoc...@bitdefender.com
Acked-by: Tamas K Lengyel tleng...@novetta.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Jan Beulich jbeul...@suse.com

---
Changes since V3:
 - None, just added acks.
---
 tools/libxc/include/xenctrl.h   |2 ++
 tools/libxc/xc_monitor.c|   15 +++
 xen/arch/x86/hvm/event.c|   16 
 xen/arch/x86/hvm/hvm.c  |8 +++-
 xen/arch/x86/monitor.c  |   16 
 xen/include/asm-x86/domain.h|   16 +---
 xen/include/asm-x86/hvm/event.h |1 +
 xen/include/public/domctl.h |6 ++
 xen/include/public/hvm/hvm_op.h |2 ++
 xen/include/public/vm_event.h   |2 ++
 10 files changed, 76 insertions(+), 8 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..4ce519a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2384,6 +2384,8 @@ int xc_monitor_mov_to_msr(xc_interface *xch, domid_t 
domain_id, bool enable,
 int xc_monitor_singlestep(xc_interface *xch, domid_t domain_id, bool enable);
 int xc_monitor_software_breakpoint(xc_interface *xch, domid_t domain_id,
bool enable);
+int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id,
+ bool enable, bool sync);
 
 /***
  * Memory sharing operations.
diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c
index 63013de..d979122 100644
--- a/tools/libxc/xc_monitor.c
+++ b/tools/libxc/xc_monitor.c
@@ -105,3 +105,18 @@ int xc_monitor_singlestep(xc_interface *xch, domid_t 
domain_id,
 
 return do_domctl(xch, domctl);
 }
+
+int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id, bool enable,
+ bool sync)
+{
+DECLARE_DOMCTL;
+
+domctl.cmd = XEN_DOMCTL_monitor_op;
+domctl.domain = domain_id;
+domctl.u.monitor_op.op = enable ? XEN_DOMCTL_MONITOR_OP_ENABLE
+: XEN_DOMCTL_MONITOR_OP_DISABLE;
+domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST;
+domctl.u.monitor_op.u.guest_request.sync = sync;
+
+return do_domctl(xch, domctl);
+}
diff --git a/xen/arch/x86/hvm/event.c b/xen/arch/x86/hvm/event.c
index 5341937..17638ea 100644
--- a/xen/arch/x86/hvm/event.c
+++ b/xen/arch/x86/hvm/event.c
@@ -126,6 +126,22 @@ void hvm_event_msr(unsigned int msr, uint64_t value)
 hvm_event_traps(1, req);
 }
 
+void hvm_event_guest_request(void)
+{
+struct vcpu *curr = current;
+struct arch_domain *currad = curr-domain-arch;
+
+if ( currad-monitor.guest_request_enabled )
+{
+vm_event_request_t req = {
+.reason = VM_EVENT_REASON_GUEST_REQUEST,
+.vcpu_id = curr-vcpu_id,
+};
+
+hvm_event_traps(currad-monitor.guest_request_sync, req);
+}
+}
+
 int hvm_event_int3(unsigned long gla)
 {
 int rc = 0;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..536d1c8 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5974,7 +5974,6 @@ static int hvmop_get_param(
 #define HVMOP_op_mask 0xff
 
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
-
 {
 unsigned long start_iter, mask;
 long rc = 0;
@@ -6388,6 +6387,13 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case HVMOP_guest_request_vm_event:
+if ( guest_handle_is_null(arg) )
+hvm_event_guest_request();
+else
+rc = -EINVAL;
+break;
+
 default:
 {
 gdprintk(XENLOG_DEBUG, Bad HVM op %ld.\n, op);
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index 896acf7..f8df7d2 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -161,6 +161,22 @@ int monitor_domctl(struct domain *d, struct 
xen_domctl_monitor_op *mop)
 break;
 }
 
+case XEN_DOMCTL_MONITOR_EVENT_GUEST_REQUEST:
+{
+bool_t status = ad-monitor.guest_request_enabled;
+
+rc = status_check(mop, status);
+if ( rc )
+return rc;
+
+ad-monitor.guest_request_sync = mop-u.guest_request.sync;
+
+domain_pause(d);
+ad-monitor.guest_request_enabled = !status;
+domain_unpause(d);
+break;
+}
+
 default:
 return -EOPNOTSUPP;
 
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 7908844..f712caa 100644
---

Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, June 24, 2015 1:18 PM
 
 When guest changes its interrupt configuration (such as, vector, etc.)
 for direct-assigned devices, we need to update the associated IRTE
 with the new guest vector, so external interrupts from the assigned
 devices can be injected to guests without VM-Exit.
 
 For lowest-priority interrupts, we use vector-hashing mechanism to find
 the destination vCPU. This follows the hardware behavior, since modern
 Intel CPUs use vector hashing to handle the lowest-priority interrupt.
 
 For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
 still use interrupt remapping.
 
 Signed-off-by: Feng Wu feng...@intel.com
 ---
 v3:
 - Use bitmap to store the all the possible destination vCPUs of an
 interrupt, then trying to find the right destination from the bitmap
 - Typo and some small changes
 
  xen/drivers/passthrough/io.c | 96
 +++-
  1 file changed, 95 insertions(+), 1 deletion(-)
 
 diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
 index 9b77334..18e24e1 100644
 --- a/xen/drivers/passthrough/io.c
 +++ b/xen/drivers/passthrough/io.c
 @@ -26,6 +26,7 @@
  #include asm/hvm/iommu.h
  #include asm/hvm/support.h
  #include xen/hvm/irq.h
 +#include asm/io_apic.h
 
  static DEFINE_PER_CPU(struct list_head, dpci_list);
 
 @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
  xfree(dpci);
  }
 
 +/*
 + * The purpose of this routine is to find the right destination vCPU for
 + * an interrupt which will be delivered by VT-d posted-interrupt. There
 + * are several cases as below:

If you aim to make this interface common to more usages, don't restrict it to
VT-d posted-interrupt, which should be just an example.

 + *
 + * - For lowest-priority interrupts, we find the destination vCPU from the
 + *   guest vector using vector-hashing mechanism and return true. This 
 follows
 + *   the hardware behavior, since modern Intel CPUs use vector hashing to
 + *   handle the lowest-priority interrupt.

Does AMD use the same hashing mechanism? Can this interface be reused by
other IOMMU types, or is it an Intel-specific implementation?

 + * - Otherwise, for single destination interrupt, it is straightforward to
 + *   find the destination vCPU and return true.
 + * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
 + *   so return false.
 + *
 + *   Here is the details about the vector-hashing mechanism:
 + *   1. For lowest-priority interrupts, store all the possible destination
 + *  vCPUs in an array.
 + *   2. Use gvec % max number of destination vCPUs to find the right
 + *  destination vCPU in the array for the lowest-priority interrupt.
 + */
 +static struct vcpu *pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
 +  uint8_t dest_mode, uint8_t 
 delivery_mode,
 +  uint8_t gvec)
 +{
 +unsigned long *dest_vcpu_bitmap = NULL;
 +unsigned int dest_vcpu_num = 0, idx = 0;
 +int size = (d-max_vcpus + BITS_PER_LONG - 1) / BITS_PER_LONG;
 +struct vcpu *v, *dest = NULL;
 +int i;
 +
 +dest_vcpu_bitmap = xzalloc_array(unsigned long, size);
 +if ( !dest_vcpu_bitmap )
 +{
 +dprintk(XENLOG_G_INFO,
 +dom%d: failed to allocate memory\n, d-domain_id);
 +return NULL;
 +}
 +
 +for_each_vcpu ( d, v )
 +{
 +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
 +dest_id, dest_mode) )
 +continue;
 +
 +__set_bit(v-vcpu_id, dest_vcpu_bitmap);
 +dest_vcpu_num++;
 +}
 +
 +if ( delivery_mode == dest_LowestPrio )
 +{
 +if (  dest_vcpu_num != 0 )
 +{

Having 'idx=0' here is more readable than initializing it earlier.

 +for ( i = 0; i = gvec % dest_vcpu_num; i++)
 +idx = find_next_bit(dest_vcpu_bitmap, d-max_vcpus, idx) + 1;
 +idx--;
 +
 +BUG_ON(idx >= d->max_vcpus || idx < 0);

idx is unsigned int, so it can't be < 0.
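
For reference, the vector-hashing selection described in the comment block
above (pick entry "gvec % number of candidates" among the matching vCPUs) can
be sketched with an unsigned index throughout. This is only an illustration:
nth_set_bit() and pick_lowest_prio_dest() are hypothetical helpers, not Xen
functions, and a return value equal to nbits means "no destination".

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Index of the (n+1)-th set bit in bitmap, or nbits if there is none. */
static unsigned int nth_set_bit(const unsigned long *bitmap,
                                unsigned int nbits, unsigned int n)
{
    assert(bitmap != NULL);

    for (unsigned int i = 0; i < nbits; i++) {
        if (bitmap[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD))) {
            if (n == 0)
                return i;
            n--;
        }
    }
    return nbits;
}

/*
 * Lowest-priority delivery: hash the guest vector onto the set of
 * candidate destinations. count is the number of set bits in bitmap.
 */
static unsigned int pick_lowest_prio_dest(const unsigned long *bitmap,
                                          unsigned int nbits,
                                          unsigned int count, uint8_t gvec)
{
    if (count == 0)
        return nbits;
    return nth_set_bit(bitmap, nbits, gvec % count);
}
```

With candidates 2, 3 and 5 (three set bits), vector 65 hashes to 65 % 3 = 2,
i.e. the third candidate.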

 +dest = d-vcpu[idx];
 +}
 +}
 +else if (  dest_vcpu_num == 1 )

a comment would be appreciated to explain that the condition means
fixed destination, while multicast/broadcast will have num as ZERO.

 +{
 +idx = find_first_bit(dest_vcpu_bitmap, d-max_vcpus);
 +BUG_ON(idx = d-max_vcpus || idx  0);
 +dest = d-vcpu[idx];
 +}
 +
 +xfree(dest_vcpu_bitmap);
 +
 +return dest;
 +}
 +
  int pt_irq_create_bind(
  struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
  {
 @@ -257,7 +330,7 @@ int pt_irq_create_bind(
  {
  case PT_IRQ_TYPE_MSI:
  {
 -uint8_t dest, dest_mode;
 +uint8_t dest, dest_mode, delivery_mode;
  int dest_vcpu_id;
 
  if ( !(pirq_dpci-flags  HVM_IRQ_DPCI_MAPPED) )
 @@ -330,11 +403,32 @@ int pt_irq_create_bind(
  /* Calculate dest_vcpu_id for MSI-type pirq

Re: [Xen-devel] [v3 11/15] Update IRTE according to guest interrupt config changes

2015-07-08 Thread Wu, Feng



 -Original Message-
 From: Tian, Kevin
 Sent: Wednesday, July 08, 2015 6:23 PM
 To: Wu, Feng; xen-devel@lists.xen.org
 Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
 Yang Z; george.dun...@eu.citrix.com
 Subject: RE: [v3 11/15] Update IRTE according to guest interrupt config
 changes
 
  From: Wu, Feng
  Sent: Wednesday, June 24, 2015 1:18 PM
 
  When guest changes its interrupt configuration (such as, vector, etc.)
  for direct-assigned devices, we need to update the associated IRTE
  with the new guest vector, so external interrupts from the assigned
  devices can be injected to guests without VM-Exit.
 
  For lowest-priority interrupts, we use vector-hashing mechanism to find
  the destination vCPU. This follows the hardware behavior, since modern
  Intel CPUs use vector hashing to handle the lowest-priority interrupt.
 
  For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
  still use interrupt remapping.
 
  Signed-off-by: Feng Wu feng...@intel.com
  ---
  v3:
  - Use bitmap to store the all the possible destination vCPUs of an
  interrupt, then trying to find the right destination from the bitmap
  - Typo and some small changes
 
   xen/drivers/passthrough/io.c | 96
  +++-
   1 file changed, 95 insertions(+), 1 deletion(-)
 
  diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
  index 9b77334..18e24e1 100644
  --- a/xen/drivers/passthrough/io.c
  +++ b/xen/drivers/passthrough/io.c
  @@ -26,6 +26,7 @@
   #include asm/hvm/iommu.h
   #include asm/hvm/support.h
   #include xen/hvm/irq.h
  +#include asm/io_apic.h
 
   static DEFINE_PER_CPU(struct list_head, dpci_list);
 
  @@ -199,6 +200,78 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
   xfree(dpci);
   }
 
  +/*
  + * The purpose of this routine is to find the right destination vCPU for
  + * an interrupt which will be delivered by VT-d posted-interrupt. There
  + * are several cases as below:
 
 If you aim to have this interface common to more usages, don't restrict to
 VT-d posted-interrupt which should be just an example.

Yes, making this a common interface should be better.

 
  + *
  + * - For lowest-priority interrupts, we find the destination vCPU from the
  + *   guest vector using vector-hashing mechanism and return true. This
 follows
  + *   the hardware behavior, since modern Intel CPUs use vector hashing to
  + *   handle the lowest-priority interrupt.
 
 Does AMD use same hashing mechanism? Can this interface be reused by
 other IOMMU type or it's an Intel specific implementation?

I am not sure how AMD handles lowest-priority interrupts. Intel hardware guys told me
that recent Intel hardware platforms use this method to deliver lowest-priority
interrupts. What do you mean by other IOMMU types?

Thanks,
Feng

 
  + * - Otherwise, for single destination interrupt, it is straightforward to
  + *   find the destination vCPU and return true.
  + * - For multicast/broadcast vCPU, we cannot handle it via interrupt 
  posting,
  + *   so return false.
  + *
  + *   Here is the details about the vector-hashing mechanism:
  + *   1. For lowest-priority interrupts, store all the possible destination
  + *  vCPUs in an array.
  + *   2. Use gvec % max number of destination vCPUs to find the right
  + *  destination vCPU in the array for the lowest-priority interrupt.
  + */
  +static struct vcpu *pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
  +  uint8_t dest_mode, uint8_t
 delivery_mode,
  +  uint8_t gvec)
  +{
  +unsigned long *dest_vcpu_bitmap = NULL;
  +unsigned int dest_vcpu_num = 0, idx = 0;
  +int size = (d-max_vcpus + BITS_PER_LONG - 1) / BITS_PER_LONG;
  +struct vcpu *v, *dest = NULL;
  +int i;
  +
  +dest_vcpu_bitmap = xzalloc_array(unsigned long, size);
  +if ( !dest_vcpu_bitmap )
  +{
  +dprintk(XENLOG_G_INFO,
  +dom%d: failed to allocate memory\n, d-domain_id);
  +return NULL;
  +}
  +
  +for_each_vcpu ( d, v )
  +{
  +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
  +dest_id, dest_mode) )
  +continue;
  +
  +__set_bit(v-vcpu_id, dest_vcpu_bitmap);
  +dest_vcpu_num++;
  +}
  +
  +if ( delivery_mode == dest_LowestPrio )
  +{
  +if (  dest_vcpu_num != 0 )
  +{
 
 Having 'idx=0' here is more readable than initializing it earlier.
 
  +for ( i = 0; i = gvec % dest_vcpu_num; i++)
  +idx = find_next_bit(dest_vcpu_bitmap, d-max_vcpus,
 idx) + 1;
  +idx--;
  +
  +BUG_ON(idx >= d->max_vcpus || idx < 0);
 
 idx is unsigned int, so it can't be < 0.
 
  +dest = d-vcpu[idx];
  +}
  +}
  +else if (  dest_vcpu_num == 1 )
 
 a comment would be appreciated to explain that the condition means
 fixed destination, while

[Xen-devel] [PATCH 4.0 01/55] config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected

2015-07-08 Thread Greg Kroah-Hartman

4.0-stable review patch.  If anyone has any objections, please let me know.

--

From: Konrad Rzeszutek Wilk konrad.w...@oracle.com

commit a6dfa128ce5c414ab46b1d690f7a1b8decb8526d upstream.

A huge number of NIC drivers use the DMA API; however, if
compiled for 32-bit, a very important part of the DMA API can
be omitted, leading to the drivers not working at all
(especially if used with 'swiotlb=force iommu=soft').

As Prashant Sreedharan explains it: the driver [tg3] uses
DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of
the dma mapping and dma_unmap_addr() to get the mapping
value. On most of the platforms this is a no-op, but ... with
iommu=soft and swiotlb=force this house keeping is required,
... otherwise we pass 0 while calling pci_unmap_/pci_dma_sync_
instead of the DMA address.

As such enable this even when using 32-bit kernels.

Reported-by: Ian Jackson ian.jack...@eu.citrix.com
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Acked-by: David S. Miller da...@davemloft.net
Acked-by: Prashant Sreedharan prash...@broadcom.com
Cc: Borislav Petkov b...@alien8.de
Cc: H. Peter Anvin h...@zytor.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Michael Chan mc...@broadcom.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: boris.ostrov...@oracle.com
Cc: casca...@linux.vnet.ibm.com
Cc: david.vra...@citrix.com
Cc: sanje...@broadcom.com
Cc: siva.kal...@broadcom.com
Cc: vyasev...@gmail.com
Cc: xen-de...@lists.xensource.com
Link: http://lkml.kernel.org/r/20150417190448.ga9...@l.oracle.com
Signed-off-by: Ingo Molnar mi...@kernel.org
Cc: Ben Hutchings b...@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman gre...@linuxfoundation.org

---
 arch/x86/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -177,7 +177,7 @@ config SBUS
 
 config NEED_DMA_MAP_STATE
def_bool y
-   depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG
+   depends on X86_64 || INTEL_IOMMU || DMA_API_DEBUG || SWIOTLB
 
 config NEED_SG_DMA_LENGTH
def_bool y




Re: [Xen-devel] [v3 05/15] vt-d: VT-d Posted-Interrupts feature detection

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, June 24, 2015 1:18 PM
 
 VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
 With VT-d Posted-Interrupts enabled, external interrupts from
 direct-assigned devices can be delivered to guests without VMM
 intervention when guest is running in non-root mode.
 
 This patch adds feature detection logic for VT-d posted-interrupt.
 
 Signed-off-by: Feng Wu feng...@intel.com
 ---
 v3:
 - Remove the if no intremap then no intpost logic in
   intel_vtd_setup(), it is covered in the iommu_setup().
 - Add if no intremap then no intpost logic in the end
   of init_vtd_hw() which is called by vtd_resume().
 
 So the logic exists in the following three places:
 - parse_iommu_param()
 - iommu_setup()
 - init_vtd_hw()
 
  xen/drivers/passthrough/vtd/iommu.c | 18 --
  xen/drivers/passthrough/vtd/iommu.h |  1 +
  2 files changed, 17 insertions(+), 2 deletions(-)
 
 diff --git a/xen/drivers/passthrough/vtd/iommu.c
 b/xen/drivers/passthrough/vtd/iommu.c
 index 9053a1f..4221185 100644
 --- a/xen/drivers/passthrough/vtd/iommu.c
 +++ b/xen/drivers/passthrough/vtd/iommu.c
 @@ -2071,6 +2071,9 @@ static int init_vtd_hw(void)
  disable_intremap(drhd-iommu);
  }
 
 +if ( !iommu_intremap )
 +iommu_intpost = 0;
 +
  /*
   * Set root entries for each VT-d engine.  After set root entry,
   * must globally invalidate context cache, and then globally
 @@ -2133,8 +2136,8 @@ int __init intel_vtd_setup(void)
  }
 
  /* We enable the following features only if they are supported by all 
 VT-d
 - * engines: Snoop Control, DMA passthrough, Queued Invalidation and
 - * Interrupt Remapping.
 + * engines: Snoop Control, DMA passthrough, Queued Invalidation, 
 Interrupt
 + * Remapping, and Posted Interrupt
   */
  for_each_drhd_unit ( drhd )
  {
 @@ -2162,6 +2165,15 @@ int __init intel_vtd_setup(void)
  if ( iommu_intremap  !ecap_intr_remap(iommu-ecap) )
  iommu_intremap = 0;
 
 +/*
 + * We cannot use posted interrupt if X86_FEATURE_CX16 is
 + * not supported, since we count on this feature to
 + * atomically update 16-byte IRTE in posted format.
 + */
 +if ( !iommu_intremap &&
 + (!cap_intr_post(iommu->cap) || !cpu_has_cx16) )
 +iommu_intpost = 0;
 +

Looks like a typo here: && -> ||
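
To illustrate the point, here is a small sketch comparing the two forms of the
condition. It assumes the operator in the quoted hunk was "&&" (the archive
has lost some characters), and the function names are hypothetical; the
boolean parameters stand in for iommu_intremap, cap_intr_post() and
cpu_has_cx16.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as (presumably) posted: clears intpost only if intremap is off. */
static bool intpost_cleared_and(bool intremap, bool cap_post, bool has_cx16)
{
    return !intremap && (!cap_post || !has_cx16);
}

/* Condition as suggested: any missing prerequisite clears intpost. */
static bool intpost_cleared_or(bool intremap, bool cap_post, bool has_cx16)
{
    return !intremap || (!cap_post || !has_cx16);
}
```

With interrupt remapping enabled but no posted-interrupt capability, the "&&"
form leaves posting enabled, while the "||" form disables it.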

Thanks
Kevin


Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d - ffff82d080239d85 and other dom0 induced log messages

2015-07-08 Thread Sander Eikelenboom


Monday, July 6, 2015, 11:33:09 AM, you wrote:

 On 26.06.15 at 17:57, li...@eikelenboom.it wrote:
 On 2015-06-26 17:51, Jan Beulich wrote:
 On 26.06.15 at 17:41, li...@eikelenboom.it wrote:
 from 3.16 to 3.19 we gained a lot of these, if i remember correctly
 related to
 perf being enabled in the kernel:
 
 +   traps.c:2655:d0v0 Domain attempted WRMSR c081 from
 0xe023e008 to 0x00230010.
 +   traps.c:2655:d0v0 Domain attempted WRMSR c082 from
 0x82d0b000 to 0x81bc2670.
 +   traps.c:2655:d0v0 Domain attempted WRMSR c083 from
 0x82d0b020 to 0x81bc4630.
 
 These are the SYSCALL (STAR) MSRs, which the kernel has no business
 touching when running on Xen.
 
 from 3.19 to 4.0 we gained:
 +   d0 attempted to change d0v0's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v1's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v2's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v3's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v4's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v5's CR4 flags 0660 - 0760
 
 This is X86_CR4_PCE - not sure how to properly handle that.
 Andrew, you're fiddling with the CR4 handling right now anyway -
 any thoughts?
 
 and from 4.0 to 4.1 we gained the ones you were interested in:
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 
 For these to be meaningful you need to translate them to symbolic
 addresses. (And yes, we should see to make the code print them
 in a more useful manner.)
 
 How ?

 addr2line against xen-syms (or xen.efi if you use that one). And of
 course the result may need manual adjustment to account for
 eventual patches you have in your tree.

 Jan

Ah yeah .. silly me .. somehow i had in mind it would be kernel addresses
instead of Xen ones, so running it against vmlinux of course led nowhere.

Here we go:

(XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 -> 82d080239d85
(XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (): 82d080195583 -> 82d080239d85

which leads to:
# addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080195583
/usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758

# addr2line -e /usr/lib/debug/xen-syms-4.6-unstable 82d080239d85
??:?

Where /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to:

        case MSR_EFER:
 rdmsr_normal:
            /* Everyone can read the MSR space. */
            /* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
                        _p(regs->ecx));*/
HERE -->    if ( rdmsr_safe(regs->ecx, val) )
                goto fail;
 rdmsr_writeback:
            regs->eax = (uint32_t)val;
            regs->edx = (uint32_t)(val >> 32);
            break;
        }
        break;
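
For anyone reading along, the write-back step after the rdmsr_safe() call can be illustrated in isolation. This is only a minimal standalone sketch, not the Xen code: the struct and function names are made up for illustration, and the only thing taken from the snippet above is that the 64-bit MSR value is returned to the guest as a low half in EAX and a high half in EDX.

```c
#include <stdint.h>

/* Hypothetical stand-in for the guest register pair that RDMSR fills in. */
struct msr_regs_sketch {
    uint32_t eax;   /* low 32 bits of the MSR value */
    uint32_t edx;   /* high 32 bits of the MSR value */
};

/* Split a 64-bit MSR value the way the rdmsr_writeback path does. */
static struct msr_regs_sketch msr_writeback_sketch(uint64_t val)
{
    struct msr_regs_sketch regs;

    regs.eax = (uint32_t)val;
    regs.edx = (uint32_t)(val >> 32);

    return regs;
}

/* Inverse operation: rebuild the 64-bit value from the two halves. */
static uint64_t msr_recombine_sketch(struct msr_regs_sketch regs)
{
    return ((uint64_t)regs.edx << 32) | regs.eax;
}
```

So the GPF in the log is raised by the MSR read itself (the rdmsr_safe() line), before either half is written back.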



Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

2015-07-08 Thread Wu, Feng

 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 08, 2015 4:44 PM
 To: Wu, Feng
 Cc: Andrew Cooper; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z;
 xen-devel@lists.xen.org; k...@xen.org
 Subject: RE: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

  On 08.07.15 at 10:33, feng...@intel.com wrote:
  From: Jan Beulich [mailto:jbeul...@suse.com]
  Sent: Wednesday, July 08, 2015 4:13 PM
   On 08.07.15 at 09:06, feng...@intel.com wrote:
   From: xen-devel-boun...@lists.xen.org
   [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper
   Sent: Thursday, June 25, 2015 2:35 AM
   On 24/06/15 06:18, Feng Wu wrote:
+{
+uint128_t prev;
+
+ASSERT(cpu_has_cx16);

   Given that if this assertion were to fail, cmpxchg16b would fail with
   #UD, I would hand-code a asm_fixup section which in turn panics.  This
   avoids a situation where non-debug builds could die with an unqualified
   #UD exception.

   Is there an existing way to panic the hypervisor in assembler code, I
   don't find it, it would be appreciated if you can point it out.

  I'm not convinced such a #UD would be a significant problem: Looking
  at the disassembly will show the cause right away. The out of line
  ud2-s in some of VMX'es inline assembly wrappers are far worse.

  So, do you agree with the fixup section or not?

 I'd rather not go that route, unless Andrew or you manage to
 convince me otherwise.

  I think Andrew's enforce
  really means ASSERT() or BUG_ON(), again to avoid an unqualified
  exception. However - see above.

  Plus, all that said, without having seen the actual use sites of
  cmpxchg16b yet, I'm not at all convinced we really need this patch.

  After introducing the posted format in the IRTE, some fields span both the
  high 64 bits and the low 64 bits, such as pda_h and pda_l. How do we make
  sure the update is atomic when changing the pda field?

 Is there a need for updating these _after_ initially setting up an
 entry?

Each time the guest sets the affinity, we need to change this
field to refer to the new destination.

Thanks,
Feng

 Jan
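
To illustrate why a changed descriptor address touches both halves of the entry, here is a rough sketch of how the address could be split into pda_l and pda_h. The field names come from the discussion above; the 26-bit width of the low field, the 64-byte alignment of the descriptor and the helper names are my assumptions from reading the posted IRTE layout, not something taken from this patch.

```c
#include <stdint.h>

#define PDA_LOW_BITS_SKETCH 26   /* assumption: width of the low address field */

/* Hypothetical container for the two address fields of a posted IRTE. */
struct irte_pda_sketch {
    uint32_t pda_l;  /* descriptor address bits 31:6, lives in the low 64 bits */
    uint32_t pda_h;  /* descriptor address bits 63:32, lives in the high 64 bits */
};

/* Split a 64-byte aligned descriptor address into the two IRTE fields. */
static struct irte_pda_sketch pda_split_sketch(uint64_t pi_desc_maddr)
{
    struct irte_pda_sketch f;

    f.pda_l = (uint32_t)((pi_desc_maddr >> (32 - PDA_LOW_BITS_SKETCH)) &
                         ((1u << PDA_LOW_BITS_SKETCH) - 1));
    f.pda_h = (uint32_t)(pi_desc_maddr >> 32);

    return f;
}

/* Rebuild the address; what hardware would see if both fields are consistent. */
static uint64_t pda_join_sketch(struct irte_pda_sketch f)
{
    return ((uint64_t)f.pda_h << 32) |
           ((uint64_t)f.pda_l << (32 - PDA_LOW_BITS_SKETCH));
}
```

If the two fields were written with separate 64-bit stores, hardware could observe a new pda_l combined with an old pda_h, which is the torn update that a 16-byte cmpxchg is meant to rule out.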


Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

2015-07-08 Thread Andrew Cooper

On 08/07/2015 09:12, Jan Beulich wrote:


 +{
 +uint128_t prev;
 +
 +ASSERT(cpu_has_cx16);
 Given that if this assertion were to fail, cmpxchg16b would fail with
 #UD, I would hand-code a asm_fixup section which in turn panics.  This
 avoids a situation where non-debug builds could die with an unqualified
 #UD exception.
 Is there an existing way to panic the hypervisor in assembler code, I
 don't find it, it would be appreciated if you can point it out.

When I asked for this, I was thinking of having an assertion frame with
the cmpxchg16b instruction in the place of the regular ud2a.  This way,
if it were to fail with #UD, there is a more useful error message.

However, there is no easy way of doing this at the moment, and it is an
obscure set of circumstances, so probably not worth the hassle.

 I'm not convinced such a #UD would be a significant problem: Looking
 at the disassembly will show the cause right away. The out of line
 ud2-s in some of VMX'es inline assembly wrappers are far worse.

Unqualified #UDs are harder to debug than qualified ones, and I have an
annoying habit of hitting them.  In some copious free time, I want to
continue the work started with c/s 0a3e27e and 881d6bf.  git grep
suggests there isn't actually too much to fix up in this regard.

~Andrew


Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, June 24, 2015 1:18 PM
 
 This patch adds an API which is used to update the IRTE
 for posted-interrupt when guest changes MSI/MSI-X information.
 
 Signed-off-by: Feng Wu feng...@intel.com

Acked-by: Kevin Tian kevin.t...@intel.com, with one small comment:

 +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint8_t gvec)
 +{
 +struct irq_desc *desc;
 +struct msi_desc *msi_desc;
 +int remap_index;
 +int rc = 0;
 +struct pci_dev *pci_dev;
 +struct acpi_drhd_unit *drhd;
 +struct iommu *iommu;
 +struct ir_ctrl *ir_ctrl;
 +struct iremap_entry *iremap_entries = NULL, *p = NULL;
 +struct iremap_entry new_ire;
 +struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
 +unsigned long flags;
 +uint128_t old_ire, ret;
 +
 +desc = pirq_spin_lock_irq_desc(pirq, NULL);
 +if ( !desc )
 +return -ENOMEM;

-EINVAL?




Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set

2015-07-08 Thread Wu, Feng



 -Original Message-
 From: Tian, Kevin
 Sent: Wednesday, July 08, 2015 5:06 PM
 To: Wu, Feng; xen-devel@lists.xen.org
 Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
 Yang Z; george.dun...@eu.citrix.com
 Subject: RE: [v3 08/15] Suppress posting interrupts when 'SN' is set
 
  From: Wu, Feng
  Sent: Wednesday, June 24, 2015 1:18 PM
 
  Currently, we don't support urgent interrupt, all interrupts
  are recognized as non-urgent interrupt, so we cannot send
  posted-interrupt when 'SN' is set.
 
  Signed-off-by: Feng Wu feng...@intel.com
  ---
  v3:
  use cmpxchg to test SN/ON and set ON
 
   xen/arch/x86/hvm/vmx/vmx.c | 32 
   1 file changed, 28 insertions(+), 4 deletions(-)
 
  diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
  index 0837627..b94ef6a 100644
  --- a/xen/arch/x86/hvm/vmx/vmx.c
  +++ b/xen/arch/x86/hvm/vmx/vmx.c
  @@ -1686,6 +1686,8 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
 
   static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
   {
  +struct pi_desc old, new, prev;
  +
 
 move to 'else if'.
 
   if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
   return;
 
  @@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
*/
   pi_set_on(&v->arch.hvm_vmx.pi_desc);
   }
  -else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
  +else
   {
  +prev.control = 0;
  +
  +do {
  +old.control = v->arch.hvm_vmx.pi_desc.control &
  +  ~(1 << POSTED_INTR_ON | 1 << POSTED_INTR_SN);
  +new.control = v->arch.hvm_vmx.pi_desc.control |
  +  1 << POSTED_INTR_ON;
  +
  +/*
  + * Currently, we don't support urgent interrupt, all
  + * interrupts are recognized as non-urgent interrupt,
  + * so we cannot send posted-interrupt when 'SN' is set.
  + * Besides that, if 'ON' is already set, we cannot set
  + * posted-interrupts as well.
  + */
  +if ( prev.sn || prev.on )
  +{
  +vcpu_kick(v);
  +return;
  +}
 
 would it make more sense to move above check after cmpxchg?

My original idea is that we only need to do the check when
prev.control != old.control, which means the cmpxchg did not
complete successfully. If we add the check between the cmpxchg
and while ( prev.control != old.control ), the logic seems
less clear, since we don't need to check prev.sn and prev.on
when cmpxchg succeeds in setting the new value.

Thanks,
Feng

 
  +
  +prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
  +   old.control, new.control);
  +} while ( prev.control != old.control );
  +
   __vmx_deliver_posted_interrupt(v);
  -return;
   }
  -
  -vcpu_kick(v);
   }
 
   static void vmx_sync_pir_to_irr(struct vcpu *v)
  --
  2.1.0
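
For reference, the test-SN/ON-and-set-ON step discussed above can be sketched outside of Xen with C11 atomics. This is not the code from the patch: the function name and the use of <stdatomic.h> are mine, and the bit positions (ON = bit 0, SN = bit 1 of the control word) are an assumption. It only shows the shape of the loop: bail out if either bit is already set, otherwise try to set ON atomically and retry on a race.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions inside the 64-bit control word. */
#define POSTED_INTR_ON_SKETCH 0
#define POSTED_INTR_SN_SKETCH 1

/*
 * Returns true if ON was newly set (the caller would then send the
 * notification), false if SN or ON was already set (the caller would
 * fall back to kicking the vCPU instead).
 */
static bool pi_try_set_on_sketch(_Atomic uint64_t *control)
{
    const uint64_t on = UINT64_C(1) << POSTED_INTR_ON_SKETCH;
    const uint64_t sn = UINT64_C(1) << POSTED_INTR_SN_SKETCH;
    uint64_t old = atomic_load(control);

    for ( ; ; )
    {
        if ( old & (on | sn) )
            return false;

        /* On failure, 'old' is refreshed with the current value. */
        if ( atomic_compare_exchange_weak(control, &old, old | on) )
            return true;
    }
}
```

Written this way the SN/ON check is made on every freshly observed value, including the first one, which is one way to address the question of where the check belongs relative to the cmpxchg.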



Re: [Xen-devel] [v3 06/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, June 24, 2015 1:18 PM
 
 Extend struct pi_desc according to VT-d Posted-Interrupts Spec.
 
 Signed-off-by: Feng Wu feng...@intel.com

Acked-by: Kevin Tian kevin.t...@intel.com



Re: [Xen-devel] [PATCH v10 00/13] enable Cache Allocation Technology (CAT) for VMs

2015-07-08 Thread Chao Peng

On Tue, Jul 07, 2015 at 03:46:21PM +0100, Ian Campbell wrote:
 On Fri, 2015-06-26 at 16:43 +0800, Chao Peng wrote:
  Chao Peng (13):
x86: add socket_cpumask
x86: detect and initialize Intel CAT feature
x86: maintain COS to CBM mapping for each socket
x86: add COS information for each domain
x86: expose CBM length and COS number information
x86: dynamically get/set CBM for a domain
x86: add scheduling support for Intel CAT
xsm: add CAT related xsm policies
 
 Jan applied to here.
 
 So I was going to apply these 5:
 
tools/libxl: minor name changes for CMT commands
tools/libxl: add command to show PSR hardware info
tools/libxl: introduce some socket helpers
tools: add tools support for Intel CAT
docs: add xl-psr.markdown
 
 But, on i686 I see:
 
 xl_cmdimpl.c: In function ‘psr_cat_hwinfo’:
 xl_cmdimpl.c:8390:16: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ [-Werror=format=]
 (1ul << info->cbm_len) - 1);
 ^
 xl_cmdimpl.c: In function ‘psr_cat_print_socket’:
 xl_cmdimpl.c:8450:5: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ [-Werror=format=]
  printf("%-16s: %#"PRIx64"\n", "Default CBM", (1ul << info->cbm_len) - 1);
  ^
 cc1: all warnings being treated as errors
 
 It seems there is some mismatch between your types and the printf
 formats used.
 
 The appropriate format specifier for an unsigned long (which you have
 from the ul in the constant) is %#lx and not %#PRIxXX which is
 associated with uintXX_t types.
 
 If you need a 64 bit type then you might have meant instead to use ull
 in which case you want %#llx as the format specifier.

This is what I need. Thanks for suggestion.

Chao
 
 If you really want/need an exactly 64 bit type then you'll have to do
 some nasty casting, something like ((uint64_t)1) << info->cbm_len) - 1
 or something, that's pretty ugly though. If you have to go this route
 then please test both builds, in case I've gotten my ()'s wrong.
 
 Ian.
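
As a concrete illustration of the uint64_t variant mentioned above, here is a small standalone sketch. The helper names are hypothetical and this is not the xl code; it just pairs a uint64_t value with the PRIx64 format so that the same source builds cleanly on both 32-bit and 64-bit targets.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* All-ones mask of cbm_len bits, computed in a type that is 64 bits everywhere. */
static uint64_t default_cbm_sketch(unsigned int cbm_len)
{
    /* Shifting a 64-bit value by 64 or more is undefined, so handle it apart. */
    if ( cbm_len >= 64 )
        return UINT64_MAX;

    return (((uint64_t)1) << cbm_len) - 1;
}

/* Format the mask; PRIx64 always matches uint64_t, unlike %lx or %llx. */
static int format_default_cbm_sketch(char *buf, size_t len, unsigned int cbm_len)
{
    return snprintf(buf, len, "%-16s: %#" PRIx64, "Default CBM",
                    default_cbm_sketch(cbm_len));
}
```

The point is only that the argument type and the format macro have to agree: ul goes with %lx, ull with %llx, and uint64_t with PRIx64.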
 
 

Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.

2015-07-08 Thread Dario Faggioli

[Trimming the Cc-list a bit, to avoid bothering Wei and Jan]

On Tue, 2015-07-07 at 22:56 -0700, Meng Xu wrote:
 Hi Dario,
 
Hi,

 2015-07-07 7:03 GMT-07:00 Dario Faggioli dario.faggi...@citrix.com:
 
  On Mon, 2015-07-06 at 22:51 -0700, Meng Xu wrote:

  So, it looks to me that, as far as (1) and (2) are concerned, since we
  are just inserting a vCPU in the runq, if we have M pCPUs, and we know
  whether we inserted it within the first M spots, we already have what we
  want, or am I missing something? And if __runq_insert() now (with Dagaen
  patch) tells us this, well, we can simplify the tickling logic, can't
  we?
 
 I think you might assume that the first M VCPUs  in the runq are the
 current running VCPUs on the M pCPUs. Am I correct? (From what you
 described in the following example, I think I'm correct. ;-) )
 
Mmm... Interesting. Yes, I was. I was basing this assumption on this
chunk on Dagaen's patch:

// If we become one of top [# CPUs] in the runq, tickle it
// TODO: make this work when multiple tickles are required
if ( new_position > 0 && new_position <= prv->NUM_CPUS )
    runq_tickle(ops, svc);

And forgot (and did not go check) about the __q_remove() in
rt_schedule(). My bad again.

But then, since we don't have the running vCPUs in the runq, how the
code above is supposed to be correct?

  With an example:
  We are waking up (or re-inserting, in rt_context_saved()) vCPU j. We
  have 6 pCPUs. __runq_insert() tells us that it put vCPU j at the 3rd
  place in the runq. This means vCPU j should be set to run as soon as
  possible. So, if vCPU j is 3rd in runq, either
   (a) there are only 3 runnable vCPUs (i.e., if we are waking up j, there
   were 2 of them, and j is the third; if we are in context_saved,
   there already were 3, and j just got its deadline postponed, or
   someone else got its one replenished);
   (b) there are more than 3 runnable vCPUs, i.e., there is at least a 4th
   vCPU --say vCPU k-- in the runq, which was the 3rd before vCPU j
   was woken (or re-inserted), but now became the 4th, because
   deadline(j)<deadline(k).
  In case (a), there are for sure idle pCPUs, and we should tickle one of
  them.
 
 I tell that you make the above assumption from here.
 
 However, in the current implementation, runq does not hold the current
 running VCPUs on the pCPUs. We remove the vcpu from runq in
 rt_schedule() function. What you described above make perfect sense
 if we decide to make runq hold the current running VCPUs.
 
Yep. And it indeed seems to me that we may well think about doing so. It
will make it possible to base on the position for making/optimizing
scheduling decisions, and at the same time I don't think I see much
downsides in that, do you?

 Actually, after thinking about the example you described, I think we
 can hold the current running VCPUs *and* the current idle pCPUs in the
 scheduler-wide structure; 

What do you mean with 'current idle pCPUs'? I said something similar as
well, and what I meant was a cpumask with bit i set if i-eth pCPU is
idle, do you also mean this?

About the running vCPUs, why just not leave them in the actual runq?

 In other words, we can have another runningq
 (not runq) and a idle_pcpu list in the rt_private; Now all VCPUs are
 stored in three queues: runningq, runq, and depletedq, in increasing
 priority order.
 
Perhaps, but I'm not sure I see the need for another list. Again, why
just not leave them in runq? I appreciate this is a rather big  change
(although, perhaps it looks bigger said than done), but I think it could
be worth pursuing.

For double checking, asserting, and making sure that we are able to
identify the running svc-s, we have the __RTDS_scheduled flag.

 When we make the tickle decision, we only need to scan the idle_pcpu
 and then runningq to figure out which pCPU to tickle. All of other
 design you describe still hold here, except that the position where a
 VCPU is inserted into runq cannot directly give us which pCPU to
 tickle. What do you think?
 
I think that I'd like to know why you think adding another queue is
necessary, instead of just leaving the vCPUs in the actual runq. Is
there something bad about that which I'm missing?

  In case (b) there may be idle pCPUs (and, if that's the case, we
  should tickle one of them, of course) or not. If not, we need to go
  figure out which pCPU to tickle, which is exactly what runq_tickle()
  does, but we at least know for sure that we want to tickle the pCPU
  where vCPU k runs, or others where vCPUs with deadline greater than vCPU
  k run.
 
  Does this make sense?
 
 Yes, if we decide to hold the currently running VCPUs in
 scheduler-wide structure: it can be runq or runningq.
 
Yes, but if we use two queues, we defeat at least part of this
optimization/simplification.
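
To make the position-based idea concrete, here is a rough standalone sketch. It is not the RTDS code: the array-based queue, its capacity and all names are invented for illustration, and it assumes the variant discussed above where the running vCPUs stay in the (single) runq, so that a position within the first M slots means the vCPU deserves a pCPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RUNQ_CAP_SKETCH 16

/* Hypothetical runq: deadlines kept in ascending (EDF) order. */
struct runq_sketch {
    int64_t deadline[RUNQ_CAP_SKETCH];
    size_t len;
};

/*
 * Insert a deadline keeping EDF order. Returns the 1-based position of
 * the new entry, or 0 if the queue is full and nothing was inserted.
 */
static size_t runq_insert_sketch(struct runq_sketch *q, int64_t deadline)
{
    size_t pos = 0;

    if ( q->len >= RUNQ_CAP_SKETCH )
        return 0;

    /* Skip entries with an earlier or equal deadline. */
    while ( pos < q->len && q->deadline[pos] <= deadline )
        pos++;

    memmove(&q->deadline[pos + 1], &q->deadline[pos],
            (q->len - pos) * sizeof(q->deadline[0]));
    q->deadline[pos] = deadline;
    q->len++;

    return pos + 1;
}

/* Tickle only if the vCPU landed within the first nr_pcpus slots. */
static bool should_tickle_sketch(size_t position, size_t nr_pcpus)
{
    return position > 0 && position <= nr_pcpus;
}
```

With a separate runningq the returned position would no longer say anything about the running vCPUs, which is the part of the simplification that gets lost.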

  Still, I think I gave enough material for an actual optimization. What
  do you think?
 
 Yes. It is very clear.
 The only thing is how we are going

Re: [Xen-devel] [v3 05/15] vt-d: VT-d Posted-Interrupts feature detection

2015-07-08 Thread Wu, Feng



 -Original Message-
 From: Tian, Kevin
 Sent: Wednesday, July 08, 2015 3:32 PM
 To: Wu, Feng; xen-devel@lists.xen.org
 Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
 Yang Z; george.dun...@eu.citrix.com
 Subject: RE: [v3 05/15] vt-d: VT-d Posted-Interrupts feature detection
 
  From: Wu, Feng
  Sent: Wednesday, June 24, 2015 1:18 PM
 
  VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
  With VT-d Posted-Interrupts enabled, external interrupts from
  direct-assigned devices can be delivered to guests without VMM
  intervention when guest is running in non-root mode.
 
  This patch adds feature detection logic for VT-d posted-interrupt.
 
  Signed-off-by: Feng Wu feng...@intel.com
  ---
  v3:
  - Remove the if no intremap then no intpost logic in
intel_vtd_setup(), it is covered in the iommu_setup().
  - Add if no intremap then no intpost logic in the end
of init_vtd_hw() which is called by vtd_resume().
 
  So the logic exists in the following three places:
  - parse_iommu_param()
  - iommu_setup()
  - init_vtd_hw()
 
   xen/drivers/passthrough/vtd/iommu.c | 18 --
   xen/drivers/passthrough/vtd/iommu.h |  1 +
   2 files changed, 17 insertions(+), 2 deletions(-)
 
  diff --git a/xen/drivers/passthrough/vtd/iommu.c
  b/xen/drivers/passthrough/vtd/iommu.c
  index 9053a1f..4221185 100644
  --- a/xen/drivers/passthrough/vtd/iommu.c
  +++ b/xen/drivers/passthrough/vtd/iommu.c
  @@ -2071,6 +2071,9 @@ static int init_vtd_hw(void)
   disable_intremap(drhd->iommu);
   }
 
  +if ( !iommu_intremap )
  +iommu_intpost = 0;
  +
   /*
* Set root entries for each VT-d engine.  After set root entry,
* must globally invalidate context cache, and then globally
  @@ -2133,8 +2136,8 @@ int __init intel_vtd_setup(void)
   }
 
   /* We enable the following features only if they are supported by all VT-d
  - * engines: Snoop Control, DMA passthrough, Queued Invalidation and
  - * Interrupt Remapping.
  + * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
  + * Remapping, and Posted Interrupt
*/
   for_each_drhd_unit ( drhd )
   {
  @@ -2162,6 +2165,15 @@ int __init intel_vtd_setup(void)
   if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
   iommu_intremap = 0;
 
  +/*
  + * We cannot use posted interrupt if X86_FEATURE_CX16 is
  + * not supported, since we count on this feature to
  + * atomically update 16-byte IRTE in posted format.
  + */
  +if ( !iommu_intremap &&
  + (!cap_intr_post(iommu->cap) || !cpu_has_cx16) )
  +iommu_intpost = 0;
  +
 
 Looks like a typo here: && -> ||

Yes, this is a typo. Thanks for the review.

Thanks,
Feng
 
 Thanks
 Kevin
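
For clarity, the intended gating once the typo is fixed could be sketched as below. This is a standalone illustration, not the VT-d code: the struct and function names are hypothetical and the capability bits are plain booleans here. It shows that posted interrupts end up disabled as soon as any one condition fails: no interrupt remapping, no posted-interrupt capability on some engine, or no CX16 on the CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-engine capability summary. */
struct vtd_unit_sketch {
    bool intr_remap;   /* engine supports interrupt remapping */
    bool intr_post;    /* engine supports posted interrupts */
};

/*
 * Posted interrupts are usable only if every engine supports both
 * remapping and posting, and the CPU has cmpxchg16b.
 */
static bool intpost_usable_sketch(const struct vtd_unit_sketch *units,
                                  size_t nr_units, bool cpu_has_cx16)
{
    size_t i;

    for ( i = 0; i < nr_units; i++ )
    {
        /* Note the ||: any single missing piece disables the feature. */
        if ( !units[i].intr_remap || !units[i].intr_post || !cpu_has_cx16 )
            return false;
    }

    return true;
}
```

With && in the first position, as in the posted hunk, an engine lacking the posted-interrupt capability would not clear the flag as long as remapping is enabled.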


Re: [Xen-devel] [v3 07/15] vmx: Initialize VT-d Posted-Interrupts Descriptor

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, June 24, 2015 1:18 PM
 
 This patch initializes the VT-d Posted-interrupt Descriptor.
 
 Signed-off-by: Feng Wu feng...@intel.com

Acked-by: Kevin Tian kevin.t...@intel.com



Re: [Xen-devel] [PATCH v3 for Xen 4.6 1/4] xen: enable per-VCPU parameter settings for RTDS scheduler

2015-07-08 Thread Dario Faggioli

On Tue, 2015-07-07 at 23:06 -0700, Meng Xu wrote:
 2015-07-07 7:39 GMT-07:00 Dario Faggioli dario.faggi...@citrix.com:
  On Tue, 2015-07-07 at 09:59 +0100, Jan Beulich wrote:
   On 29.06.15 at 04:44, lichong...@gmail.com wrote:
   --- a/xen/common/Makefile
   +++ b/xen/common/Makefile
   @@ -31,7 +31,6 @@ obj-y += rbtree.o
obj-y += rcupdate.o
obj-y += sched_credit.o
obj-y += sched_credit2.o
   -obj-y += sched_sedf.o
obj-y += sched_arinc653.o
obj-y += sched_rt.o
obj-y += schedule.o
 
  Stray change. Or perhaps the file doesn't build anymore, in which case
  you should instead have stated that the patch is dependent upon the
  series removing SEDF.
 
  This indeed does not belong in here. And of course, things should
  build... So, Chong, either deal with SEDF as well, if basing your
  patches on a tree where it is still there, or base on top of my patches,
  ignore it, but state the dependency, as Jan is asking.
 
   @@ -1157,8 +1158,75 @@ rt_dom_cntl(
 
   +case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
   +spin_lock_irqsave(prv-lock, flags);
   +for( index = 0; index  op-u.v.nr_vcpus; index++ )
   +{
   +if ( copy_from_guest_offset(local_sched,
   +op-u.v.vcpus, index, 1) )
   +{
   +rc = -EFAULT;
   +break;
   +}
   +if ( local_sched.vcpuid = d-max_vcpus
   +|| d-vcpu[local_sched.vcpuid] == NULL )
   +{
   +rc = -EINVAL;
   +break;
   +}
   +svc = rt_vcpu(d-vcpu[local_sched.vcpuid]);
   +svc-period = MICROSECS(local_sched.s.rtds.period);
   +svc-budget = MICROSECS(local_sched.s.rtds.budget);
 
  Are all input values valid here?
 
  That's a good point, actually. Right now, SEDF does some range
  enforcement, by means of these values:
 
  #define PERIOD_MAX MILLISECS(10000) /* 10s  */
  #define PERIOD_MIN (MICROSECS(10))  /* 10us */
  #define SLICE_MIN (MICROSECS(5))/*  5us */
 
  Chong, it probably makes sense to (in a separate patch) introduce
  something like this in RTDS too (with SLICE_MIN-->BUDGET_MIN), and then
  use them, in this patch, for sanity checking the input.
 
  It also makes sense to check and enforce budget<=period, IMO.
 
  About the specific values, I'm open to proposals. I think something like
  the SEDF's one is fine. Meng?
 
 We are trying to make some range enforcement for RTDS scheduler. Is my
 understanding correct? (It should be, but just in case. :-) )
 
We are wondering whether that could be necessary/useful, and IMO, it
would.

 As to the range of period, I think the max value can be as large as
 the type of period (ie. s_time_t) can represent. When we want a
 dedicated CPU for a guest, we will set budget=period and  can set the
 period to a very very large value to avoid the unnecessarily
 invocation of the scheduler.

Makes sense. We do have STIME_MAX and, given that period is something
that is added to current time during scheduling, STIME_DELTA_MAX.

Maybe, put something together basing on those? 

 As to the min value of period, I think it should be >=100us. The
 scheduler overhead of running a large box could be 1us if the runq is
 long and competition for the runq lock is heavy. If the scheduler is
 potentially invoked every 10us, the scheduler overhead will be 10% of
 total computation time, which seems a lot to me.
 
Ok.

 As to the range of budget, the min value can be 5us, the same with
 SEDF; 

Well, wouldn't the above reasoning about overhead apply here too?
Budgets of 5us mean the scheduler can be invoked every 5us for budget
enforcement. If 10us was unreasonable, 5 is even more so.

Therefore, 100us here too? Or maybe let's allow for lower values (like
50us or 10us), but print a warning?

 the max value is the value of period of the same VCPU.
 
Yep.

And, whatever the values, it would be useful to have comments somewhere
(either when the values are defined or enforced), stating what you said
above.
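
Just to give an idea of what I have in mind, a rough standalone sketch follows. It is not meant as the actual patch: the macro and function names are made up, the 100us minimums are placeholders for whatever we end up agreeing on, and I'm using INT64_MAX where the real code would probably build on STIME_MAX or STIME_DELTA_MAX.

```c
#include <errno.h>
#include <stdint.h>

/* Microseconds to nanoseconds, mirroring what MICROSECS() does. */
#define MICROSECS_SKETCH(us)    ((int64_t)(us) * 1000)

/* Placeholder limits; the real values are still under discussion. */
#define RTDS_PERIOD_MIN_SKETCH  MICROSECS_SKETCH(100)
#define RTDS_BUDGET_MIN_SKETCH  MICROSECS_SKETCH(100)

/* Returns 0 if the (period, budget) pair is acceptable, -EINVAL otherwise. */
static int rtds_check_params_sketch(uint64_t period_us, uint64_t budget_us)
{
    /* The period must still fit in a signed 64-bit time after conversion. */
    if ( period_us > (uint64_t)(INT64_MAX / 1000) )
        return -EINVAL;

    /* budget <= period, which also bounds the budget conversion. */
    if ( budget_us > period_us )
        return -EINVAL;

    /* Too small a period or budget means too many scheduler invocations. */
    if ( MICROSECS_SKETCH(period_us) < RTDS_PERIOD_MIN_SKETCH ||
         MICROSECS_SKETCH(budget_us) < RTDS_BUDGET_MIN_SKETCH )
        return -EINVAL;

    return 0;
}
```

If we go for allowing lower budgets with a warning, the last check would become a printk instead of an error.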

Regards,
Dario
-- 
This happens because I choose it to happen! (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



[Xen-devel] [v6][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-07-08 Thread Tiejun Chen

Currently we simply refuse to assign this kind of device
with a shared RMRR, since the shared RMRR case is
rare according to our previous experience. But
later we can group the devices which share an RMRR, and
then allow all devices within a group to be assigned to
the same domain.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v6:

* Nothing is changed.

v5:
 
* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c833290..095fb1d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2297,13 +2297,39 @@ static int intel_iommu_assign_device(
     if ( list_empty(&acpi_drhd_units) )
         return -ENODEV;
 
+    seg = pdev->seg;
+    bus = pdev->bus;
+    /*
+     * In rare cases one given rmrr is shared by multiple devices but
+     * obviously this would put the security of a system at risk. So
+     * we should prevent from this sort of device assignment.
+     *
+     * TODO: in the future we can introduce group device assignment
+     * interface to make sure devices sharing RMRR are assigned to the
+     * same domain together.
+     */
+    for_each_rmrr_device( rmrr, bdf, i )
+    {
+        if ( rmrr->segment == seg &&
+             PCI_BUS(bdf) == bus &&
+             PCI_DEVFN2(bdf) == devfn )
+        {
+            if ( rmrr->scope.devices_cnt > 1 )
+            {
+                printk(XENLOG_G_ERR VTDPREFIX
+                       "cannot assign %04x:%02x:%02x.%u"
+                       " with shared RMRR for Dom%d.\n",
+                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                       d->domain_id);
+                return -EPERM;
+            }
+        }
+    }
+
     ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
     if ( ret )
         return ret;
 
-    seg = pdev->seg;
-    bus = pdev->bus;
-
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
     {
-- 
1.9.1
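
The check added by this patch can also be read as a small standalone function. The following is only an illustrative sketch with made-up types: the real code walks the ACPI RMRR list with for_each_rmrr_device(), whereas here an RMRR is a hypothetical struct carrying its segment and the list of devices in its scope.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one RMRR and the devices it covers. */
struct rmrr_sketch {
    uint16_t segment;
    const uint16_t *bdfs;   /* devices in scope, encoded as (bus << 8) | devfn */
    size_t devices_cnt;
};

/*
 * Returns -EPERM if the device sits in an RMRR that is shared with at
 * least one other device, 0 otherwise.
 */
static int check_shared_rmrr_sketch(const struct rmrr_sketch *rmrrs, size_t nr,
                                    uint16_t seg, uint8_t bus, uint8_t devfn)
{
    uint16_t bdf = (uint16_t)(((uint16_t)bus << 8) | devfn);
    size_t i, j;

    for ( i = 0; i < nr; i++ )
    {
        if ( rmrrs[i].segment != seg )
            continue;

        for ( j = 0; j < rmrrs[i].devices_cnt; j++ )
        {
            /* The device matches and is not the only one in this RMRR. */
            if ( rmrrs[i].bdfs[j] == bdf && rmrrs[i].devices_cnt > 1 )
                return -EPERM;
        }
    }

    return 0;
}
```

As in the patch, the check is meant to run before ownership is reassigned, so a refused device is left untouched.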



[Xen-devel] [v6][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-08 Thread Tiejun Chen

When allocating mmio address for PCI bars, we need to make
sure they don't overlap with reserved regions.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Nothing is changed.

v5:

* Rename that field, is_64bar, inside struct bars with flag, and
  then extend to also indicate if this bar is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either populate more RAM or expand highmem. But note that
  only a 64bit-bar can live in highmem, and as you mentioned we should
  also avoid expanding highmem where possible. So my implementation
  allocates 32bit-bars and 64bit-bars in order.

  1. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2.

  2. The second allocation round to the remaining 32bit-bar

  In theory we should be able to finish allocating all 32bit-bars, then go to
  the third allocation round 3.

  3. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource->base and
  reserved device memory. If a hole exists between base and reserved device
  memory, we simply move on and try to allocate the next bar, since all bars are
  in descending order of size. If not, we need to move resource->base to
  reserved_end just to reallocate this bar.
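
  The skip-reserved step in #3.2 can be sketched in isolation as follows. This is
  a simplified illustration, not the hvmloader code: the types and helper names
  are hypothetical, BAR sizes are assumed to be powers of two, and the "try the
  next smaller bar in the hole" part is left out. It only shows moving the
  candidate base past any reserved range the bar would overlap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved device memory range. */
struct rsv_sketch {
    uint64_t start;
    uint64_t size;
};

/* True if [s1, s1 + size1) and [s2, s2 + size2) intersect. */
static bool ranges_overlap_sketch(uint64_t s1, uint64_t size1,
                                  uint64_t s2, uint64_t size2)
{
    return s1 < s2 + size2 && s2 < s1 + size1;
}

/* Round v up to a multiple of align (align must be a power of two). */
static uint64_t align_up_sketch(uint64_t v, uint64_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/*
 * Find the lowest naturally aligned address >= base where a bar of
 * bar_sz bytes does not overlap any reserved range.
 */
static uint64_t place_bar_sketch(uint64_t base, uint64_t bar_sz,
                                 const struct rsv_sketch *rsv, size_t nr_rsv)
{
    uint64_t addr = align_up_sketch(base, bar_sz);
    size_t i = 0;

    while ( i < nr_rsv )
    {
        if ( ranges_overlap_sketch(addr, bar_sz, rsv[i].start, rsv[i].size) )
        {
            /* Move past this reserved range and re-check all ranges. */
            addr = align_up_sketch(rsv[i].start + rsv[i].size, bar_sz);
            i = 0;
        }
        else
            i++;
    }

    return addr;
}
```

  A caller would still have to check the returned address against resource->max
  before using it.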

 tools/firmware/hvmloader/pci.c | 194 ++---
 1 file changed, 164 insertions(+), 30 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..397f3b7 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,31 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+static void relocate_ram_for_pci_memory(unsigned long cur_pci_mem_start)
+{
+    struct xen_add_to_physmap xatp;
+    unsigned int nr_pages = min_t(
+        unsigned int,
+        hvm_info->low_mem_pgend - (cur_pci_mem_start >> PAGE_SHIFT),
+        (1u << 16) - 1);
+    if ( hvm_info->high_mem_pgend == 0 )
+        hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
+    hvm_info->low_mem_pgend -= nr_pages;
+    printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
+           " for lowmem MMIO hole\n",
+           nr_pages,
+           PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
+           PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
+    xatp.domid = DOMID_SELF;
+    xatp.space = XENMAPSPACE_gmfn_range;
+    xatp.idx   = hvm_info->low_mem_pgend;
+    xatp.gpfn  = hvm_info->high_mem_pgend;
+    xatp.size  = nr_pages;
+    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+        BUG();
+    hvm_info->high_mem_pgend += nr_pages;
+}
+
 void pci_setup(void)
 {
 uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -50,17 +75,22 @@ void pci_setup(void)
 /* Resources assignable to PCI devices via BARs. */
 struct resource {
 uint64_t base, max;
-} *resource, mem_resource, high_mem_resource, io_resource;
+} *resource, mem_resource, high_mem_resource, io_resource, 
exp_mem_resource;
 
 /* Create a list of device BARs in descending order of size. */
 struct bars {
-uint32_t is_64bar;
+#define PCI_BAR_IS_64BIT        0x1
+#define PCI_BAR_IS_ALLOCATED    0x2
+uint32_t flag;
 uint32_t devfn;
 uint32_t bar_reg;
 uint64_t bar_sz;
 } *bars = (struct bars *)scratch_start;
-unsigned int i, nr_bars = 0;
-uint64_t mmio_hole_size = 0;
+unsigned int i, j, n, nr_bars = 0;
+uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
+bool bar32_allocating = 0;
+uint64_t mmio32_unallocated_total = 0;
+unsigned long

[Xen-devel] [v6][PATCH 14/16] xen/vtd: enable USB device assignment

2015-07-08 Thread Tiejun Chen

A USB RMRR may conflict with the guest BIOS region. In that case, the
previous implementation simply skipped setting up the identity mapping.
Now that the new policy mechanism handles this scenario cleanly, the
previous hack can be removed.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++-
 xen/drivers/passthrough/vtd/utils.c |  7 ---
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 56f5911..c833290 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2245,11 +2245,9 @@ static int reassign_device_ownership(
 /*
  * If the device belongs to the hardware domain, and it has RMRR, don't
  * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.
  */
-    if ( !is_hardware_domain(source) &&
-         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+    if ( !is_hardware_domain(source) )
 {
 const struct acpi_rmrr_unit *rmrr;
 u16 bdf;
@@ -2303,13 +2301,8 @@ static int intel_iommu_assign_device(
 if ( ret )
 return ret;
 
-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
     seg = pdev->seg;
     bus = pdev->bus;
-    if ( is_usb_device(seg, bus, pdev->devfn) )
-        return 0;
 
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include <asm/io_apic.h>
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v6][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Tiejun Chen

This patch introduces user-configurable parameters to specify RDM
resources and the corresponding policies.

Global RDM parameter:
rdm = "strategy=host,reserve=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

The global RDM parameter, strategy, allows the user to specify reserved
regions explicitly. Currently, 'host' includes all reserved regions reported
on this platform, which is good for handling the hotplug scenario. In the
future this parameter may be further extended to allow specifying arbitrary
regions, e.g. even those belonging to another platform, as a preparation for
live migration with passthrough devices. By default this isn't set, so we
don't check all RDMs. Instead, we just check the RDM specific to a given
device if you're assigning this kind of device. Note this option is not
recommended unless you can make sure any conflict does exist.

The 'strict/relaxed' policy decides how to handle a conflict when reserving
RDM regions in pfn space. If a conflict exists, 'strict' means an immediate
error so the VM can't keep running, while 'relaxed' allows moving forward
with a warning message.

The default per-device RDM policy is the same as the default global RDM
policy, namely 'relaxed'. The per-device policy overrides the global policy,
as with other options.
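
The precedence described above (per-device over global, 'relaxed' as the
default) can be pictured with a small standalone sketch. The enum and function
names here are hypothetical and chosen only for illustration; they are not the
libxl identifiers introduced by this series.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not the libxl types. */
enum rdm_policy {
    RDM_POLICY_UNSET = 0,
    RDM_POLICY_STRICT,
    RDM_POLICY_RELAXED
};

/*
 * A per-device policy wins when it is set; otherwise fall back to the
 * global policy, and finally to the 'relaxed' default.
 */
enum rdm_policy rdm_effective_policy(enum rdm_policy global,
                                     enum rdm_policy per_device)
{
    if (per_device != RDM_POLICY_UNSET)
        return per_device;
    if (global != RDM_POLICY_UNSET)
        return global;
    return RDM_POLICY_RELAXED;
}

/*
 * Returns true if the operation may continue. With no conflict it always
 * may; with a conflict only 'relaxed' continues (after a warning).
 */
bool rdm_may_proceed(enum rdm_policy effective, bool conflict)
{
    if (!conflict)
        return true;
    return effective == RDM_POLICY_RELAXED;
}
```

For instance, a global 'strict' combined with a per-device 'relaxed' resolves
to 'relaxed' in this sketch, so a conflict would only produce a warning.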

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Some rename to make our policy reasonable
  type - strategy
  none - ignore
* Don't expose ignore in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device policy always overrides the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since its just zero
* Grab those changes to xl/libxlu to as a final patch

 docs/man/xl.cfg.pod.5| 81 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   |  7 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  9 +
 tools/libxl/libxl_types.idl  | 18 ++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..091e80d 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item B<rdm="RDM_RESERVATION_STRING">
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<strategy="STRING">
+
+Currently there is only one valid type:
+
+"host" means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using host includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does exist.
+
+For example, you're trying to set memory = 2800 to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad800000 end_address afffffff
+
+In this conflict case,
+
+#1. If B<strategy> is set to "host", for example,
+
+rdm = "strategy=host,reserve=strict" or rdm = "strategy=host,reserve=relaxed"
+
+It means all conflicts will be handled according to the policy
+introduced by B<reserve> as described below.
+
+#2. If B<strategy> is not set at all, but
+
+pci = [ 'sbdf_A, rdm_reserve=xxxxx' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by B<rdm_reserve="STRING"> as described inside pci options.
+
+=item B<reserve="STRING">
+
+Specifies how to deal with conflicts when reserving reserved device
+memory in guest address space.
+
+When that conflict is unsolved,
+
+"strict" means VM can't be created, or the associated device can't be
+attached in the case of hotplug.
+
+"relaxed" allows VM to be created but may cause VM to crash if
+pass-through device accesses RDM. For example, Windows IGD GFX driver
+always accesses RDM regions so it leads to VM crash.
+
+Note this may be overridden by

[Xen-devel] [v6][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-08 Thread Tiejun Chen

Here we construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist; hvmloader will need this information
later.

Note that this guest e820 table is the same as before if the
platform has no RDM or if RDM is disabled (the default).

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Nothing is changed.

v5:

* Rephrase patch's short log
* Make libxl__domain_construct_e820() hidden

v4:

* Use goto style error handling.
* Instead of NOGC, we should use libxl__malloc(gc,XXX) to allocate local e820.

 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_internal.h | 24 +
 tools/libxl/libxl_x86.c  | 83 
 3 files changed, 112 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 62ef120..41da479 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
+    if (libxl__domain_construct_e820(gc, d_config, domid, args)) {
+        LOG(ERROR, "setting domain memory map failed");
+        goto out;
+    }
+
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                state->store_mfn, state->console_port,
                                state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b4d8419..a50449a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3794,6 +3794,30 @@ static inline void libxl__update_config_vtpm(libxl__gc *gc,
  */
 void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
 const libxl_bitmap *sptr);
+
+/*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+_hidden int libxl__domain_construct_e820(libxl__gc *gc,
+ libxl_domain_config *d_config,
+ uint32_t domid,
+ struct xc_hvm_build_args *args);
+
 #endif
 
 /*
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..be297b2 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__domain_construct_e820(libxl__gc *gc,
+                                 libxl_domain_config *d_config,
+                                 uint32_t domid,
+                                 struct xc_hvm_build_args *args)
+{
+    int rc = 0;
+    unsigned int nr = 0, i;
+    /* We always own at least one lowmem entry. */
+    unsigned int e820_entries = 1;
+    struct e820entry *e820 = NULL;
+    uint64_t highmem_size =
+                    args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+    /* Add all rdm entries. */
+    for (i = 0; i < d_config->num_rdms; i++)
+        if (d_config->rdms[i].flag != LIBXL_RDM_RESERVE_FLAG_INVALID)
+            e820_entries++;
+
+
+    /* If we should have a highmem range. */
+    if (highmem_size)
+        e820_entries++;
+
+    if (e820_entries >= E820MAX) {
+        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+    e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+    /* Low memory */
+    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+e820[nr].type

[Xen-devel] [v6][PATCH 02/16] xen/vtd: create RMRR mapping

2015-07-08 Thread Tiejun Chen

RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However, the existing code fails to set up
the correct mapping when VT-d shares the EPT page table, which leads to
problems when assigning devices (e.g. GPUs) that have RMRRs reported. So
instead, this patch sets up the identity mapping in the p2m layer,
regardless of whether EPT is shared or not. We still keep creating the
VT-d table.

We also need to introduce a pair of helpers to create/clear this
sort of identity mapping, as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If the space
is already occupied by the desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define a macro to wrap guest_physmap_remove_page(), which
now returns a value as necessary.
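
The three outcomes listed for set_identity_p2m_entry() can be modelled with a
toy table. This is only a sketch under simplifying assumptions: the toy_slot
structure and toy_set_identity() are hypothetical stand-ins and ignore p2m
types, access rights and locking, all of which the real helper has to handle.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one guest frame slot; not Xen's p2m structures. */
struct toy_slot {
    int mapped;      /* 0: unoccupied, 1: occupied */
    uint64_t mfn;    /* valid only when mapped != 0 */
};

/*
 * Model of the rule described above:
 *  - unoccupied gfn           -> install gfn == mfn, return 0
 *  - already identity-mapped  -> nothing to do, return 0
 *  - mapped to something else -> refuse with -EBUSY
 */
int toy_set_identity(struct toy_slot *table, size_t nr, uint64_t gfn)
{
    if (gfn >= nr)
        return -EINVAL;

    if (!table[gfn].mapped) {
        table[gfn].mapped = 1;
        table[gfn].mfn = gfn;
        return 0;
    }

    if (table[gfn].mfn == gfn)
        return 0;

    return -EBUSY;
}
```

Calling the function twice on the same unoccupied gfn succeeds both times,
which is the idempotent behaviour the description asks for.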

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Acked-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Nothing is changed.

v5:

* Fold our original patches #2 and #3 into this new one

* Introduce a new helper, clear_identity_p2m_entry, which wraps
  guest_physmap_remove_page(). We use this to clean up our
  identity mapping.

v4:

* Change that original condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )

  to make sure we catch those invalid mfn mappings as we expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
      return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c   | 40 +++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h   | 13 +---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
                          p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
   unsigned long mfn, unsigned int page_order)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int rc;
 gfn_lock(p2m, gfn, page_order);
-p2m_remove_page(p2m, gfn, mfn, page_order);
+rc = p2m_remove_page(p2m, gfn, mfn, page_order);
 gfn_unlock(p2m, gfn, page_order);
+return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
 return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma)
+{
+    p2m_type_t p2mt;
+    p2m_access_t a;
+    mfn_t mfn;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int ret;
+
+    if ( !paging_mode_translate(p2m->domain) )
+        return 0;
+
+    gfn_lock(p2m, gfn, 0);
+
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+    if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+        ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+                            p2m_mmio_direct, p2ma);
+    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+        ret = 0;
+    else
+    {
+        ret = -EBUSY;
+        printk(XENLOG_G_WARNING
+               "Cannot setup identity map d%d:%lx,"
+               " gfn already mapped to %lx.\n",
+               d->domain_id, gfn, mfn_x(mfn));
+    }
+
+    gfn_unlock(p2m, gfn, 0);
+    return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
 while ( base_pfn < end_pfn )
 {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( clear_identity_p2m_entry(d, base_pfn, 0) )
 ret = -ENXIO;
 base_pfn++;
 }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
 while ( base_pfn < end_pfn )
 {
-int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-

[Xen-devel] [v6][PATCH 04/16] xen: enable XENMEM_memory_map in hvm

2015-07-08 Thread Tiejun Chen

This patch enables XENMEM_memory_map for HVM guests, so that hvmloader
can use it to set up the e820 mappings.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c  | 6 --
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd & MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd & MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-if ( is_hvm_domain(d) )
-{
-rcu_unlock_domain(d);
-return -EPERM;
-}
-
 e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
 if ( e820 == NULL )
 {
-- 
1.9.1



[Xen-devel] [v6][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy

2015-07-08 Thread Tiejun Chen

This patch passes the RDM reservation policy to xc_assign_device() so the
policy is checked when assigning devices to a VM.

Note this also brings some fallout to the Python usage of xc_assign_device().

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
CC: David Scott dave.sc...@eu.citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Nothing is changed.

v5:

* Fix the flag field as 0 to DT device

v4:

* In the patch head description, I add to explain why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h   |  3 ++-
 tools/libxc/xc_domain.c |  9 -
 tools/libxl/libxl_pci.c |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 
 tools/python/xen/lowlevel/xc/xc.c   | 30 --
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
  uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
  uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
 xc_interface *xch,
 uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
 {
 DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
 domctl.domain = domid;
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
 domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;
 
 return do_domctl(xch, &domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
 domctl.u.assign_device.u.dt.size = size;
+/*
+ * DT doesn't own any RDM so actually DT has nothing to do
+ * for any flag and here just fix that as 0.
+ */
+domctl.u.assign_device.flag = 0;
 set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
 rc = do_domctl(xch, &domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
-        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
             return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
 {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;
 
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
 
-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
   if (ret < 0)

[Xen-devel] [v6][PATCH 05/16] hvmloader: get guest memory map into memory_map[]

2015-07-08 Thread Tiejun Chen

Now we get this map layout by calling XENMEM_memory_map and then
saving the entries into one global variable, memory_map[]. It should
include the lowmem range, RDM ranges and the highmem range. Note that
the RDM and highmem ranges may not exist in some cases.

We also need to check whether any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory at the hvmloader level, and
we make hvmloader fail in case of a conflict, since this is
another rare possibility in the real world.
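
As a worked illustration of the conflict check described above, the following
standalone sketch tests whether two ranges, each given as a start and a size,
intersect. The function name ranges_overlap() and the addresses used in the
note after the block are hypothetical; they are not the hvmloader constants.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Treat both ranges as half-open: [start, start + size) and
 * [rstart, rstart + rsize). They intersect only if each one begins
 * before the other one ends.
 */
bool ranges_overlap(uint64_t start, uint64_t size,
                    uint64_t rstart, uint64_t rsize)
{
    return (start + size > rstart) && (start < rstart + rsize);
}
```

With an allocation window of 0x1000 bytes at 0x1000, a one-byte reserved
region at 0x1fff conflicts, while a reserved region starting at 0x2000 does
not, because the window ends exactly where that region begins.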

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate printf()+BUG() and BUG_ON()

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c  | 35 +++
 tools/firmware/hvmloader/e820.h  |  7 +++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c  | 26 ++
 tools/firmware/hvmloader/util.h  | 12 
 5 files changed, 82 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..3e53c47 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,41 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+    unsigned int nr_entries = E820MAX, i;
+    int rc;
+    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
+    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+    if ( rc || !nr_entries )
+    {
+        printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+        BUG();
+    }
+
+    memory_map.nr_map = nr_entries;
+
+    for ( i = 0; i < nr_entries; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED )
+        {
+            if ( check_overlap(alloc_addr, alloc_size,
+                               memory_map.map[i].addr,
+                               memory_map.map[i].size) )
+            {
+                printf("Fail to setup memory map due to conflict");
+                printf(" on dynamic reserved memory range.\n");
+                BUG();
+            }
+        }
+    }
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
 uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX    128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
 init_hypercalls();
 
+memory_map_setup();
+
 xenbus_setup();
 
 bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size)
+{
+    return (start + size > reserved_start) &&
+            (start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
 asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
 *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+    int rc;
+    struct xen_memory_map memmap = {
+        .nr_entries = *max_entries
+    };
+
+    set_xen_guest_handle(memmap.buffer, entries);
+
+    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+    *max_entries = memmap.nr_entries;
+
+    return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
 static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index f99c0f19..1100a3b 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@

[Xen-devel] [v6][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-08 Thread Tiejun Chen

Now we can use that memory map to build our final
e820 table, but we may need to reorder all e820
entries.
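
The reordering step amounts to sorting the entries by start address. A minimal
host-side sketch follows; it uses the C library's qsort() for brevity, which
is an assumption of this example (hvmloader itself is freestanding, and the
patch uses a simple in-place exchange loop instead). The toy_e820entry type
and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for an e820 entry; field names are illustrative. */
struct toy_e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Order entries by ascending start address without overflowing an int. */
static int toy_cmp_addr(const void *a, const void *b)
{
    const struct toy_e820entry *x = a;
    const struct toy_e820entry *y = b;

    return (x->addr > y->addr) - (x->addr < y->addr);
}

/* Sort the table so that lower addresses come first. */
void toy_sort_e820(struct toy_e820entry *e820, size_t nr)
{
    qsort(e820, nr, sizeof(*e820), toy_cmp_addr);
}
```

Appending the toolstack-provided entries first and sorting afterwards keeps
the construction code simple, at the cost of one extra pass over a table that
holds at most a few dozen entries.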

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.

 tools/firmware/hvmloader/e820.c | 80 +
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 3e53c47..aa2569f 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint64_t add_high_mem = 0;
+    uint64_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
     if ( !lowmem_reserved_base )
         lowmem_reserved_base = 0xA0000;
@@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
     e820[nr].type = E820_RESERVED;
     nr++;
 
-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -194,16 +189,73 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
-
-if ( hvm_info-high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1  32);
-e820[nr].size =
-((uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
+e820[nr] = memory_map.map[i];
 nr++;
 }
 
+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end  (2u  20));
+
+/*
+ * We may need to adjust real lowmem end since we may
+ * populate RAM to get enough MMIO previously.
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+uint64_t end = e820[i].addr + e820[i].size;
+if ( e820[i].type == E820_RAM 
+ low_mem_end  e820[i].addr  low_mem_end  end )
+{
+add_high_mem = end - low_mem_end;
+e820[i].size = low_mem_end - e820[i].addr;
+}
+}
+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+if ( e820[i].type == E820_RAM 
+ e820[i].addr  (1ull  32))
+e820[i].size += add_high_mem;
+}
+}
+
+/* Finally we need to reorder all e820 entries. */
+for ( j = 0; j < nr-1; j++ )
+{
+for ( i = j+1; i < nr; i++ )
+{
+if ( e820[j].addr > e820[i].addr )
+{
+struct e820entry tmp;
+tmp = e820[j];
+e820[j] = e820[i];
+e820[i] = tmp;
+}
+}
+}
+
 return nr;
 }
 
-- 
1.9.1
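
The fix-up steps in the hunk above (trim the RAM entry that straddles
low_mem_end, grow high memory by the same amount, then sort by address)
can be tried outside hvmloader. What follows is only a minimal standalone
sketch under stated assumptions: the struct layout, the E820_* values and
the helper name fixup_e820() are illustrative and not taken from the patch,
and the sketch treats any RAM entry starting at or above 4GiB as high
memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values and layout, not copied from hvmloader. */
#define E820_RAM      1
#define E820_RESERVED 2

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Hypothetical helper: trim the RAM entry that straddles low_mem_end,
 * add the trimmed amount to the first RAM entry at or above 4GiB, then
 * sort all entries by start address.  Returns the number of bytes moved.
 */
static uint64_t fixup_e820(struct e820entry *e820, size_t nr,
                           uint64_t low_mem_end)
{
    uint64_t add_high_mem = 0;
    size_t i, j;

    assert(e820);

    for ( i = 0; i < nr; i++ )
    {
        uint64_t end = e820[i].addr + e820[i].size;

        if ( e820[i].type == E820_RAM &&
             low_mem_end > e820[i].addr && low_mem_end < end )
        {
            add_high_mem = end - low_mem_end;
            e820[i].size = low_mem_end - e820[i].addr;
        }
    }

    if ( add_high_mem )
    {
        for ( i = 0; i < nr; i++ )
        {
            if ( e820[i].type == E820_RAM && e820[i].addr >= (1ull << 32) )
            {
                e820[i].size += add_high_mem;
                break;
            }
        }
    }

    /* Simple exchange sort, ascending by start address. */
    for ( j = 0; j + 1 < nr; j++ )
        for ( i = j + 1; i < nr; i++ )
            if ( e820[j].addr > e820[i].addr )
            {
                struct e820entry tmp = e820[j];

                e820[j] = e820[i];
                e820[i] = tmp;
            }

    return add_high_mem;
}
```

Note that the sketch does nothing if no high RAM entry exists, so the
trimmed amount would be dropped in that case; whether the patch needs to
handle that is a question for review, not something the sketch settles.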


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen-unstable: pci-passthrough of device using MSI-X interrupts not working after commit x86/MSI: track host and guest masking separately

2015-07-08 Thread Sander Eikelenboom


Tuesday, July 7, 2015, 6:08:25 PM, you wrote:

 On 26.06.15 at 17:48, li...@eikelenboom.it wrote:
 On 2015-06-26 17:22, Jan Beulich wrote:
 I have an idea: In
 
 static unsigned int startup_msi_irq(struct irq_desc *desc)
 {
 bool_t guest_masked = (desc->status & IRQ_GUEST) &&
   is_hvm_domain(desc->msi_desc->dev->domain);
 
 if ( unlikely(!msi_set_mask_bit(desc, 0, guest_masked)) )
 WARN();
 return 0;
 }
 
 I think we need to also exclude the emuirq case (which is what I
 understand backs the pvhvm interrupt in the guest - Stefano,
 please confirm). For testing purposes, could you try simply passing
 zero instead of guest_masked here?
 
 I can confirm, with 0 it works !

 Okay, here's something that hopefully could go in (provided of
 course it too works for you).

Hi Jan,

Just tested and it works fine :-)

--
Sander

 Jan

 --- unstable.orig/xen/arch/x86/irq.c2015-07-07 17:56:52.0 +0200
 +++ unstable/xen/arch/x86/irq.c   2015-07-07 17:04:08.0 +0200
 @@ -2502,6 +2502,25 @@ int unmap_domain_pirq_emuirq(struct doma
  return ret;
  }
  
 +void arch_evtchn_bind_pirq(struct domain *d, int pirq)
 +{
 +int irq = domain_pirq_to_irq(d, pirq);
 +struct irq_desc *desc;
 +unsigned long flags;
 +
 +if ( irq <= 0 )
 +return;
 +
 +if ( is_hvm_domain(d) )
 +map_domain_emuirq_pirq(d, pirq, IRQ_PT);
 +
 +desc = irq_to_desc(irq);
 +spin_lock_irqsave(&desc->lock, flags);
 +if ( desc->msi_desc )
 +guest_mask_msi_irq(desc, 0);
 +spin_unlock_irqrestore(&desc->lock, flags);
 +}
 +
  bool_t hvm_domain_use_pirq(const struct domain *d, const struct pirq *pirq)
  {
  return is_hvm_domain(d) && pirq &&
 --- unstable.orig/xen/arch/x86/msi.c2015-07-07 17:56:53.0 +0200
 +++ unstable/xen/arch/x86/msi.c   2015-07-07 16:50:02.0 +0200
 @@ -422,10 +422,7 @@ void guest_mask_msi_irq(struct irq_desc 
  
  static unsigned int startup_msi_irq(struct irq_desc *desc)
  {
 -bool_t guest_masked = (desc->status & IRQ_GUEST) &&
 -  is_hvm_domain(desc->msi_desc->dev->domain);
 -
 -msi_set_mask_bit(desc, 0, guest_masked);
 +msi_set_mask_bit(desc, 0, !!(desc->status & IRQ_GUEST));
  return 0;
  }
  
 --- unstable.orig/xen/common/event_channel.c2015-07-07 17:56:51.0 
 +0200
 +++ unstable/xen/common/event_channel.c   2015-07-07 16:53:47.0 
 +0200
 @@ -456,10 +456,7 @@ static long evtchn_bind_pirq(evtchn_bind
  
  bind->port = port;
  
 -#ifdef CONFIG_X86
 -if ( is_hvm_domain(d) && domain_pirq_to_irq(d, pirq) > 0 )
 -map_domain_emuirq_pirq(d, pirq, IRQ_PT);
 -#endif
 +arch_evtchn_bind_pirq(d, pirq);
  
   out:
  spin_unlock(&d->event_lock);
 --- unstable.orig/xen/include/asm-arm/irq.h 2015-07-07 17:56:49.0 
 +0200
 +++ unstable/xen/include/asm-arm/irq.h  2015-07-07 17:02:00.0 +0200
 @@ -48,6 +48,8 @@ int release_guest_irq(struct domain *d, 
  
  void arch_move_irqs(struct vcpu *v);
  
 +#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
 +
  /* Set IRQ type for an SPI */
  int irq_set_spi_type(unsigned int spi, unsigned int type);
  
 --- unstable.orig/xen/include/xen/irq.h   2015-07-07 17:56:49.0 
 +0200
 +++ unstable/xen/include/xen/irq.h  2015-07-07 17:02:49.0 +0200
 @@ -172,4 +172,8 @@ unsigned int set_desc_affinity(struct ir
  unsigned int arch_hwdom_irqs(domid_t);
  #endif
  
 +#ifndef arch_evtchn_bind_pirq
 +void arch_evtchn_bind_pirq(struct domain *, int pirq);
 +#endif
 +
  #endif /* __XEN_IRQ_H__ */





Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

2015-07-08 Thread Wu, Feng

 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 08, 2015 4:13 PM
 To: Wu, Feng
 Cc: Andrew Cooper; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z;
 xen-devel@lists.xen.org; k...@xen.org
 Subject: RE: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

  On 08.07.15 at 09:06, feng...@intel.com wrote:

  -Original Message-
  From: xen-devel-boun...@lists.xen.org
  [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper
  Sent: Thursday, June 25, 2015 2:35 AM
  To: Wu, Feng; xen-devel@lists.xen.org
  Cc: george.dun...@eu.citrix.com; Zhang, Yang Z; Tian, Kevin; k...@xen.org;
  jbeul...@suse.com
  Subject: Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

  On 24/06/15 06:18, Feng Wu wrote:
   This patch adds cmpxchg16b support for x86-64, so software
   can perform 128-bit atomic write/read.

   Signed-off-by: Feng Wu feng...@intel.com
   ---
   v3:
   Newly added.

xen/include/asm-x86/x86_64/system.h | 28

xen/include/xen/types.h |  5 +
2 files changed, 33 insertions(+)

   diff --git a/xen/include/asm-x86/x86_64/system.h
  b/xen/include/asm-x86/x86_64/system.h
   index 662813a..a910d00 100644
   --- a/xen/include/asm-x86/x86_64/system.h
   +++ b/xen/include/asm-x86/x86_64/system.h
   @@ -6,6 +6,34 @@
   (unsigned
  long)(n),sizeof(*(ptr

/*
   + * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
   + * identical, store NEW in MEM.  Return the initial value in MEM.
   + * Success is indicated by comparing RETURN with OLD.
   + *
   + * This function can only be called when cpu_has_cx16 is true.
   + */
   +
   +static always_inline uint128_t __cmpxchg16b(
   +volatile void *ptr, uint128_t old, uint128_t new)

  It is not nice for register scheduling taking uint128_t's by value.
  Instead, I would pass them by pointer and let the inlining sort the
  eventual references out.

   +{
   +uint128_t prev;
   +
   +ASSERT(cpu_has_cx16);

  Given that if this assertion were to fail, cmpxchg16b would fail with
  #UD, I would hand-code an asm_fixup section which in turn panics.  This
  avoids a situation where non-debug builds could die with an unqualified
  #UD exception.

  Is there an existing way to panic the hypervisor from assembler code? I
  couldn't find one; I would appreciate it if you could point it out.

 I'm not convinced such a #UD would be a significant problem: Looking
 at the disassembly will show the cause right away. The out of line
 ud2-s in some of VMX'es inline assembly wrappers are far worse.

So, do you agree with the fixup section or not?

 As to panic()ing from assembly code:

   movq $string-label, %rdi
   call panic

  Also, you must enforce 16-byte alignment of the memory reference, as
  described in the manual.

  What should I do if the caller passes data that is not 16-byte aligned
  (struct iremap_entry in this case)? Does this mean I need to define
  it like this?

  struct iremap_entry {

  ..

  } __attribute__ ((aligned (16)));

 How would that help? The table entries hardware uses are supposed
 to be 16-byte aligned anyway, aren't they?

Oh, yes, the base address of the remapping table is 4K aligned.

 I think Andrew's "enforce"
 really means ASSERT() or BUG_ON(), again to avoid an unqualified
 exception. However - see above.

 Plus, all that said, without having seen the actual use sites of
 cmpxchg16b yet, I'm not at all convinced we really need this patch.

After introducing the posted format in the IRTE, some fields span both the
high 64 bits and the low 64 bits, such as pda_h and pda_l. How can we make
sure the pda field is updated atomically?

Thanks,
Feng

 Jan
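
For readers following the thread, the semantics under discussion (compare
16 bytes, store on match, always return the previous value, and require a
16-byte aligned operand) can be written down as a plain C reference model.
This is only a sketch under assumptions: it is NOT atomic and does not use
the cmpxchg16b instruction, the type u128_model and both function names
are made up for illustration, and it takes its operands by pointer as
Andrew suggested rather than by value as in the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit container; the patch under review uses uint128_t. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_model;

/* cmpxchg16b requires its memory operand to be 16-byte aligned. */
static bool is_aligned16(const volatile void *ptr)
{
    return ((uintptr_t)ptr & 15) == 0;
}

/*
 * Non-atomic reference model of a 16-byte compare-and-exchange:
 * if *mem equals *old, store *new_val; always return the value that
 * was in *mem before the call.  Success is detected by comparing the
 * return value with *old.
 */
static u128_model cmpxchg16_model(u128_model *mem, const u128_model *old,
                                  const u128_model *new_val)
{
    u128_model prev;

    /* Catch misaligned operands here instead of via a bare exception. */
    assert(is_aligned16(mem));

    prev = *mem;
    if ( prev.lo == old->lo && prev.hi == old->hi )
        *mem = *new_val;

    return prev;
}
```

The assertion corresponds to the ASSERT()/BUG_ON() style of enforcement
mentioned above; a real implementation would still need the instruction
itself to make the update atomic.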


Re: [Xen-devel] [PATCH v10 00/13] enable Cache Allocation Technology (CAT) for VMs

2015-07-08 Thread Wei Liu

On Wed, Jul 08, 2015 at 05:40:47PM +0800, Chao Peng wrote:
 On Tue, Jul 07, 2015 at 03:46:21PM +0100, Ian Campbell wrote:
  On Fri, 2015-06-26 at 16:43 +0800, Chao Peng wrote:
   Chao Peng (13):
 x86: add socket_cpumask
 x86: detect and initialize Intel CAT feature
 x86: maintain COS to CBM mapping for each socket
 x86: add COS information for each domain
 x86: expose CBM length and COS number information
 x86: dynamically get/set CBM for a domain
 x86: add scheduling support for Intel CAT
 xsm: add CAT related xsm policies
  
  Jan applied to here.
  
  So I was going to apply these 5:
  
 tools/libxl: minor name changes for CMT commands
 tools/libxl: add command to show PSR hardware info
 tools/libxl: introduce some socket helpers
 tools: add tools support for Intel CAT
 docs: add xl-psr.markdown
  
  But, on i686 I see:
  
  xl_cmdimpl.c: In function ‘psr_cat_hwinfo’:
  xl_cmdimpl.c:8390:16: error: format ‘%llx’ expects argument of type ‘long 
  long unsigned int’, but argument 3 has type ‘long unsigned int’ 
  [-Werror=format=]
  (1ul << info->cbm_len) - 1);
  ^
  xl_cmdimpl.c: In function ‘psr_cat_print_socket’:
  xl_cmdimpl.c:8450:5: error: format ‘%llx’ expects argument of type ‘long 
  long unsigned int’, but argument 3 has type ‘long unsigned int’ 
  [-Werror=format=]
   printf("%-16s: %#"PRIx64"\n", "Default CBM", (1ul << info->cbm_len) - 
  1);
   ^
  cc1: all warnings being treated as errors
  
  It seems there is some mismatch between your types and the printf
  formats used.
  
  The appropriate format specifier for an unsigned long (which you have
  from the ul in the constant) is %#lx and not %#PRIxXX which is
  associated with uintXX_t types.
  
  If you need a 64 bit type then you might have meant instead to use ull
  in which case you want %#llx as the format specifier.
 
 This is what I need. Thanks for suggestion.
 

Chao, 4.6 freeze is on Friday. Can you fix that minor bug and
repost your series within two days?

Wei.

 Chao
  
  If you really want/need an exactly 64 bit type then you'll have to do
  some nasty casting, something like ((uint64_t)1) << info->cbm_len) - 1
  or something, that's pretty ugly though. If you have to go this route
  then please test both builds, in case I've gotten my ()'s wrong.
  
  Ian.
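
The two options Ian describes can be checked side by side. A minimal
sketch, assuming hypothetical helper names format_cbm() and
format_cbm_ull() (neither is from the series) and a mask width strictly
between 0 and 64 bits:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Build the mask as a uint64_t so that PRIx64 is the matching
 * conversion on both 32-bit and 64-bit builds.
 */
static int format_cbm(char *buf, size_t len, unsigned int cbm_len)
{
    uint64_t mask;

    assert(cbm_len > 0 && cbm_len < 64);
    mask = (((uint64_t)1) << cbm_len) - 1;

    return snprintf(buf, len, "%#" PRIx64, mask);
}

/* Same mask built from a ull constant, which pairs with %#llx. */
static int format_cbm_ull(char *buf, size_t len, unsigned int cbm_len)
{
    assert(cbm_len > 0 && cbm_len < 64);

    return snprintf(buf, len, "%#llx", (1ull << cbm_len) - 1);
}
```

Either variant avoids the i686 failure quoted above, because the argument
type and the conversion specifier agree regardless of the width of long.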
  
  


Re: [Xen-devel] [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used

2015-07-08 Thread Wu, Feng



 -Original Message-
 From: Tian, Kevin
 Sent: Wednesday, July 08, 2015 6:00 PM
 To: Wu, Feng; xen-devel@lists.xen.org
 Cc: k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com; Zhang,
 Yang Z; george.dun...@eu.citrix.com
 Subject: RE: [v3 10/15] vt-d: Add API to update IRTE when VT-d PI is used
 
  From: Wu, Feng
  Sent: Wednesday, June 24, 2015 1:18 PM
 
  This patch adds an API which is used to update the IRTE
  for posted-interrupt when guest changes MSI/MSI-X information.
 
  Signed-off-by: Feng Wu feng...@intel.com
 
 Acked-by: Kevin Tian kevin.t...@intel.com, with one small comment:
 
  +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint8_t gvec)
  +{
  +struct irq_desc *desc;
  +struct msi_desc *msi_desc;
  +int remap_index;
  +int rc = 0;
  +struct pci_dev *pci_dev;
  +struct acpi_drhd_unit *drhd;
  +struct iommu *iommu;
  +struct ir_ctrl *ir_ctrl;
  +struct iremap_entry *iremap_entries = NULL, *p = NULL;
  +struct iremap_entry new_ire;
  +struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
  +unsigned long flags;
  +uint128_t old_ire, ret;
  +
  +desc = pirq_spin_lock_irq_desc(pirq, NULL);
  +if ( !desc )
  +return -ENOMEM;
 
 -EINVAL?
 

I think -EINVAL is reasonable.

Thanks,
Feng



Re: [Xen-devel] [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.

2015-07-08 Thread Tim Deegan

Hi,

At 17:38 + on 07 Jul (1436290689), Sahita, Ravi wrote:
 In order to make forward progress, do the other maintainers (Jan,
 Andrew, Tim) agree with the patch direction that George has
 suggested for this particular patch?

I'm no longer a maintainer for this code, but FWIW I think that this
direction (adding a new argument to the internal APIs rather than
adding new internal APIs) is correct.

Because the sve bit must be _set_ to get the old/default behaviour, I
think the p2m_pt implementation should always return sve = 1 on _get
and possibly also assert sve != 0 on _set.

Cheers,

Tim.


Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d - ffff82d080239d85 and other dom0 induced log messages

2015-07-08 Thread Andrew Cooper

On 08/07/2015 11:04, Sander Eikelenboom wrote:
 Wednesday, July 8, 2015, 10:58:02 AM, you wrote:

 On 08/07/2015 09:45, Sander Eikelenboom wrote:
 Monday, July 6, 2015, 11:33:09 AM, you wrote:

 On 26.06.15 at 17:57, li...@eikelenboom.it wrote:
 On 2015-06-26 17:51, Jan Beulich wrote:
 On 26.06.15 at 17:41, li...@eikelenboom.it wrote:
 from 3.16 to 3.19 we gained a lot of these, if i remember correctly
 related to
 perf being enabled in the kernel:

 +   traps.c:2655:d0v0 Domain attempted WRMSR c081 from
 0xe023e008 to 0x00230010.
 +   traps.c:2655:d0v0 Domain attempted WRMSR c082 from
 0x82d0b000 to 0x81bc2670.
 +   traps.c:2655:d0v0 Domain attempted WRMSR c083 from
 0x82d0b020 to 0x81bc4630.
 These are the SYSCALL (STAR) MSRs, which the kernel has no business
 touching when running on Xen.

 from 3.19 to 4.0 we gained:
 +   d0 attempted to change d0v0's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v1's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v2's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v3's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v4's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v5's CR4 flags 0660 - 0760
 This is X86_CR4_PCE - not sure how to properly handle that.
 Andrew, you're fiddling with the CR4 handling right now anyway -
 any thoughts?

 and from 4.0 to 4.1 we gained the ones you were interested in:
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 For these to be meaningful you need to translate them to symbolic
 addresses. (And yes, we should see to make the code print them
 in a more useful manner.)
 How ?
 addr2line against xen-syms (or xen.efi if you use that one). And of
 course the result may need manual adjustment to account for
 eventual patches you have in your tree.
 Jan
 Ah yeah .. silly me .. somehow i had in mind it would be kernel addresses 
 instead of xen, so running it against vmlinux of course led nowhere.

 Here we go:

 (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (0000): ffff82d080195583 
 -> ffff82d080239d85
 (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (0000): ffff82d080195583 
 -> ffff82d080239d85

 which leads to:
 # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable ffff82d080195583
 /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758

 # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable ffff82d080239d85
 ??:?
 The second one is not.  It is the fixup label, which will be hidden away
 out-of-line, and lacking debug symbols.
 Where /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to:

 case MSR_EFER:
  rdmsr_normal:
 /* Everyone can read the MSR space. */
 /* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
 _p(regs->ecx));*/
 HERE -->if ( rdmsr_safe(regs->ecx, val) )
 goto fail;
 Moving the printk into the fail case will identify which is the
 problematic MSR.  We need the value of regs->_ecx here (the low 32bits,
 not the full 64 as the commented printk currently has).
 I have a small todo list of misc debugging improvements.  I will add
 this to the list.
 ~Andrew
  rdmsr_writeback:
 regs->eax = (uint32_t)val;
 regs->edx = (uint32_t)(val >> 32);
 break;
 }
 break;

 Don't know if the full 64bits is of equal use

It is (just with an unhelpful quantity of zeroes)

 , but here it is:

 (XEN) [2015-07-08 10:01:58.717] traps.c:2760:d14v0 Domain attempted but 
 failed RDMSR 0570.

Looks to be MSR_IA32_RTIT_CTL, which is part of the Intel Processor
Trace PMU driver (Linux/arch/x86/kernel/cpu/perf_event_intel_pt.c).  A
PV domain running on AMD absolutely shouldn't be attempting to read this.

It appears that pt_init() blindly probes the MSR without any
cpuid/vendor detection.

~Andrew
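
The two register details discussed in this thread (the MSR index is the
low 32 bits of the count register, and the 64-bit result is written back
split across EDX:EAX) can be sketched in isolation. The struct and
function names below are hypothetical stand-ins, not Xen's cpu_user_regs
or its emulation code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down register file for illustration only. */
struct regs_model {
    uint64_t rax, rcx, rdx;
};

/* RDMSR/WRMSR take the MSR index from ECX, i.e. the low 32 bits of RCX. */
static uint32_t msr_index(const struct regs_model *regs)
{
    assert(regs);

    return (uint32_t)regs->rcx;
}

/* RDMSR returns the 64-bit value split across EDX:EAX. */
static void msr_writeback(struct regs_model *regs, uint64_t val)
{
    assert(regs);

    regs->rax = (uint32_t)val;
    regs->rdx = (uint32_t)(val >> 32);
}
```

Printing msr_index() rather than the full 64-bit register is what drops
the run of leading zeroes mentioned above.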


Re: [Xen-devel] [PATCH v5 2/3] arm: Allow the user to specify the GIC version

2015-07-08 Thread Ian Campbell

On Tue, 2015-07-07 at 17:22 +0100, Julien Grall wrote:
 diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
 index e1632fa..11f6461 100644
 --- a/tools/libxl/libxl_types.idl
 +++ b/tools/libxl/libxl_types.idl
 @@ -369,6 +369,12 @@ libxl_vnode_info = Struct("vnode_info", [
  ("vcpus", libxl_bitmap), # vcpus in this node
  ])
  
 +libxl_gic_version = Enumeration("gic_version", [
 +(0, "DEFAULT"),
 +(0x20, "v2"),
 +(0x30, "v3")
 +], init_val = "LIBXL_GIC_VERSION_DEFAULT")
 +
  libxl_domain_build_info = Struct("domain_build_info",[
  ("max_vcpus",   integer),
  ("avail_vcpus", libxl_bitmap),
 @@ -480,6 +486,11 @@ libxl_domain_build_info = Struct("domain_build_info",[
])),
   ("invalid", None),
   ], keyvar_init_val = "LIBXL_DOMAIN_TYPE_INVALID")),
 +
 +
 +("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
 +  ])),
 +
  ], dir=DIR_IN

This results in the following when building the ocaml bindings:

Traceback (most recent call last):
  File "genwrap.py", line 529, in <module>
    ml.write(gen_ocaml_ml(ty, False))
  File "genwrap.py", line 217, in gen_ocaml_ml
    s += gen_struct(ty)
  File "genwrap.py", line 119, in gen_struct
    x = ocaml_instance_of_field(f)
  File "genwrap.py", line 112, in ocaml_instance_of_field
    return "%s : %s" % (munge_name(name), ocaml_type_of(f.type))
  File "genwrap.py", line 90, in ocaml_type_of
    return ty.rawname.capitalize() + ".t"
AttributeError: 'NoneType' object has no attribute 'capitalize'
make[7]: *** No rule to make target '_libxl_types.ml.in', needed by 
'xenlight.ml'.  Stop.

I'll take a look.




[Xen-devel] [linux-3.4 test] 59139: regressions - FAIL

2015-07-08 Thread osstest service owner

flight 59139 linux-3.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59139/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-win7-amd64  6 xen-boot  fail REGR. vs. 30511

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-sedf-pin  6 xen-boot   fail in 58831 pass in 58798
 test-amd64-amd64-xl   6 xen-boot   fail in 59091 pass in 59139
 test-amd64-amd64-pair 10 xen-boot/dst_host   fail pass in 58798
 test-amd64-amd64-pair 9 xen-boot/src_host   fail pass in 58798
 test-amd64-i386-pair 10 xen-boot/dst_host   fail pass in 58831
 test-amd64-i386-pair  9 xen-boot/src_host   fail pass in 58831
 test-amd64-i386-xl-qemuu-win7-amd64  9 windows-install  fail pass in 59091

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-multivcpu  6 xen-boot   fail baseline untested
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-libvirt-xsm  6 xen-boot fail baseline untested
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-libvirt-xsm   6 xen-boot fail baseline untested
 test-amd64-amd64-xl-credit2   6 xen-boot fail baseline untested
 test-amd64-i386-xl-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-xsm   6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 guest-localmigrate fail baseline untested
 test-amd64-amd64-xl-sedf  6 xen-boot  fail in 58831 like 30406
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate/x10 fail in 59091 baseline untested
 test-amd64-i386-libvirt  11 guest-start  fail   like 30511
 test-amd64-amd64-libvirt 11 guest-start  fail   like 30511
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 30511
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail like 30511
 test-amd64-amd64-xl-qemuu-ovmf-amd64  6 xen-boot fail like 53709-bisect
 test-amd64-i386-xl 6 xen-boot fail like 53725-bisect
 test-amd64-i386-freebsd10-amd64  6 xen-boot fail like 58780-bisect
 test-amd64-i386-xl-qemuu-winxpsp3  6 xen-boot fail like 58786-bisect
 test-amd64-i386-qemut-rhel6hvm-intel  6 xen-boot fail like 58788-bisect
 test-amd64-i386-rumpuserxen-i386  6 xen-boot fail like 58799-bisect
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  6 xen-boot fail like 58801-bisect
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  6 xen-boot fail like 58803-bisect
 test-amd64-amd64-xl-qemut-winxpsp3  6 xen-boot fail like 58804-bisect
 test-amd64-i386-freebsd10-i386  6 xen-boot fail like 58805-bisect
 test-amd64-i386-xl-qemuu-ovmf-amd64  6 xen-boot fail like 58806-bisect
 test-amd64-amd64-xl-qemuu-winxpsp3  6 xen-boot fail like 58807-bisect
 test-amd64-i386-xl-qemut-winxpsp3  6 xen-boot fail like 58808-bisect
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  6 xen-boot fail like 58809-bisect
 test-amd64-amd64-rumpuserxen-amd64  6 xen-boot fail like 58810-bisect
 test-amd64-i386-xl-qemuu-debianhvm-amd64  6 xen-boot fail like 58811-bisect
 test-amd64-amd64-xl-qemut-debianhvm-amd64  6 xen-boot fail like 58813-bisect
 test-amd64-i386-qemuu-rhel6hvm-intel  6 xen-boot fail like 58814-bisect
 test-amd64-i386-xl-qemut-debianhvm-amd64  6 xen-boot fail like 58815-bisect

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail in 58831 never pass
 test-amd64-i386-libvirt  12 migrate-support-check fail in 58831 never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail in 58831 never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail in 59091 never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass

version targeted for testing:
 linux                cf1b3dad6c5699b977273276bada8597636ef3e2
baseline version:
 linux                bb4a05a0400ed6d2f1e13d1f82f289ff74300a70

Last test of basis    30511  2014-09-29 16:37:46 Z  281 days
Failing since         32004  2014-12-02 04:10:03 Z  218 days  167 attempts
Testing same since    58781  2015-06-20 14:15:50 Z   17 days   21 attempts


500 people touched revisions under test,

Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Ian Campbell

On Wed, 2015-07-08 at 08:54 +0800, Chen, Tiejun wrote:
  +none is the default value and it means we don't check any reserved 
  regions
  +and then all rdm policies would be ignored. Guest just works as before and
  +the conflict of RDM and guest address space wouldn't be handled, and then
  +this may result in the associated device not being able to work or even 
  crash
  +the VM. So if you're assigning this kind of device, this option is not
  +recommended unless you can make sure any conflict doesn't exist.
  +
 
  One issue didn't come to conclusion during last round of review. Ian was
  asking what's the difference with type=none vs not specifying rdm option
  at all.
 
  You need to either convince Ian or remove type=none in *xl* level.
  I.e. don't touch the libxl IDL. It still needs a none type.
 
 I'll update this next revision. And also rephrase this doc to address 
 your comments below.

FTR I think I indicated yesterday that I was satisfied with your
explanation for why type=none exists as an option even at the xl level,
namely that it allows us to change the default in the future.

Ian.



Re: [Xen-devel] traps.c:3227: GPF (0000): ffff82d080194a4d - ffff82d080239d85 and other dom0 induced log messages

2015-07-08 Thread Andrew Cooper

On 08/07/2015 09:45, Sander Eikelenboom wrote:
 Monday, July 6, 2015, 11:33:09 AM, you wrote:

 On 26.06.15 at 17:57, li...@eikelenboom.it wrote:
 On 2015-06-26 17:51, Jan Beulich wrote:
 On 26.06.15 at 17:41, li...@eikelenboom.it wrote:
 from 3.16 to 3.19 we gained a lot of these, if i remember correctly
 related to
 perf being enabled in the kernel:

 +   traps.c:2655:d0v0 Domain attempted WRMSR c081 from
 0xe023e008 to 0x00230010.
 +   traps.c:2655:d0v0 Domain attempted WRMSR c082 from
 0x82d0b000 to 0x81bc2670.
 +   traps.c:2655:d0v0 Domain attempted WRMSR c083 from
 0x82d0b020 to 0x81bc4630.
 These are the SYSCALL (STAR) MSRs, which the kernel has no business
 touching when running on Xen.

 from 3.19 to 4.0 we gained:
 +   d0 attempted to change d0v0's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v1's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v2's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v3's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v4's CR4 flags 0660 - 0760
 +   d0 attempted to change d0v5's CR4 flags 0660 - 0760
 This is X86_CR4_PCE - not sure how to properly handle that.
 Andrew, you're fiddling with the CR4 handling right now anyway -
 any thoughts?

 and from 4.0 to 4.1 we gained the ones you were interested in:
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 +   traps.c:3227: GPF (0000): ffff82d080194a4d -> ffff82d080239d85
 For these to be meaningful you need to translate them to symbolic
 addresses. (And yes, we should see to make the code print them
 in a more useful manner.)
 How ?
 addr2line against xen-syms (or xen.efi if you use that one). And of
 course the result may need manual adjustment to account for
 eventual patches you have in your tree.
 Jan
 Ah yeah .. silly me .. somehow i had in mind it would be kernel addresses 
 instead of xen, so running it against vmlinux of course led nowhere.

 Here we go:

 (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (0000): ffff82d080195583 -> 
 ffff82d080239d85
 (XEN) [2015-07-08 08:31:00.384] traps.c:3227: GPF (0000): ffff82d080195583 -> 
 ffff82d080239d85

 which leads to:
 # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable ffff82d080195583
 /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758

 # addr2line -e /usr/lib/debug/xen-syms-4.6-unstable ffff82d080239d85
 ??:?

The second one is not.  It is the fixup label, which will be hidden away
out-of-line, and lacking debug symbols.


 Where /usr/src/new/xen-unstable/xen/arch/x86/traps.c:2758 leads to:

 case MSR_EFER:
  rdmsr_normal:
 /* Everyone can read the MSR space. */
 /* gdprintk(XENLOG_WARNING,"Domain attempted RDMSR %p.\n",
 _p(regs->ecx));*/
 HERE -->if ( rdmsr_safe(regs->ecx, val) )
 goto fail;

Moving the printk into the fail case will identify which is the
problematic MSR.  We need the value of regs->_ecx here (the low 32bits,
not the full 64 as the commented printk currently has).

I have a small todo list of misc debugging improvements.  I will add
this to the list.

~Andrew

  rdmsr_writeback:
 regs->eax = (uint32_t)val;
 regs->edx = (uint32_t)(val >> 32);
 break;
 }
 break;




[Xen-devel] [linux-4.1 test] 59143: regressions - FAIL

2015-07-08 Thread osstest service owner

flight 59143 linux-4.1 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59143/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-localmigrate/x10 fail REGR. vs. 59031

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-libvirt  11 guest-start   fail REGR. vs. 59031

Tests which did not succeed, but are not blocking:
 test-amd64-i386-freebsd10-amd64  9 freebsd-install fail never pass
 test-amd64-i386-freebsd10-i386  9 freebsd-install  fail never pass
 test-amd64-amd64-xl-pvh-intel 13 guest-saverestore fail never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  11 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
 test-armhf-armhf-xl  12 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 11 guest-start  fail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass

version targeted for testing:
 linux                6a010c0abd49388a49af3d5a5bfc00e0d5767607
baseline version:
 linux                b953c0d234bc72e8489d3bf51a276c5c4ec85345

Last test of basis    59031  2015-07-02 23:39:59 Z    5 days
Testing same since    59054  2015-07-05 10:20:43 Z    2 days    3 attempts


People who touched revisions under test:
  Alexander Shishkin alexander.shish...@linux.intel.com
  Alexey Sokolov soko...@7pikes.com
  Andi Kleen a...@linux.intel.com
  Arnaldo Carvalho de Melo a...@redhat.com
  Borislav Petkov b...@alien8.de
  Borislav Petkov b...@suse.de
  Dmitry Tunin hanipouspi...@gmail.com
  Greg Kroah-Hartman gre...@linuxfoundation.org
  Imre Palik im...@amazon.de
  Ingo Molnar mi...@kernel.org
  Jiri Olsa jo...@kernel.org
  Kalle Valo kv...@codeaurora.org
  Lukas Wunner lu...@wunner.de
  Marcel Holtmann mar...@holtmann.org
  Oleg Nesterov o...@redhat.com
  Palik, Imre im...@amazon.de
  Peter Zijlstra (Intel) pet...@infradead.org
  Rafał Miłecki zaj...@gmail.com

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm  pass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm  pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm  pass
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm fail
 test-amd64-amd64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  fail
 test-amd64-amd64-xl-xsm  pass
 test-armhf-armhf-xl-xsm  pass
 test-amd64-i386-xl-xsm

Re: [Xen-devel] [v3 08/15] Suppress posting interrupts when 'SN' is set

2015-07-08 Thread Tian, Kevin

 From: Wu, Feng
 Sent: Wednesday, June 24, 2015 1:18 PM
 
 Currently, we don't support urgent interrupt, all interrupts
 are recognized as non-urgent interrupt, so we cannot send
 posted-interrupt when 'SN' is set.
 
 Signed-off-by: Feng Wu feng...@intel.com
 ---
 v3:
 use cmpxchg to test SN/ON and set ON
 
  xen/arch/x86/hvm/vmx/vmx.c | 32 
  1 file changed, 28 insertions(+), 4 deletions(-)
 
 diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
 index 0837627..b94ef6a 100644
 --- a/xen/arch/x86/hvm/vmx/vmx.c
 +++ b/xen/arch/x86/hvm/vmx/vmx.c
 @@ -1686,6 +1686,8 @@ static void __vmx_deliver_posted_interrupt(struct vcpu 
 *v)
 
  static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
  {
 +struct pi_desc old, new, prev;
 +

move to 'else if'.

  if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
  return;
 
 @@ -1698,13 +1700,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8
 vector)
   */
  pi_set_on(&v->arch.hvm_vmx.pi_desc);
  }
 -else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
 +else
  {
 +prev.control = 0;
 +
 +do {
 +old.control = v->arch.hvm_vmx.pi_desc.control &
 +  ~(1 << POSTED_INTR_ON | 1 << POSTED_INTR_SN);
 +new.control = v->arch.hvm_vmx.pi_desc.control |
 +  1 << POSTED_INTR_ON;
 +
 +/*
 + * Currently, we don't support urgent interrupt, all
 + * interrupts are recognized as non-urgent interrupt,
 + * so we cannot send posted-interrupt when 'SN' is set.
 + * Besides that, if 'ON' is already set, we cannot set
 + * posted-interrupts as well.
 + */
 +if ( prev.sn || prev.on )
 +{
 +vcpu_kick(v);
 +return;
 +}

would it make more sense to move above check after cmpxchg?

 +
 +prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
 +   old.control, new.control);
 +} while ( prev.control != old.control );
 +
  __vmx_deliver_posted_interrupt(v);
 -return;
  }
 -
 -vcpu_kick(v);
  }
 
  static void vmx_sync_pir_to_irr(struct vcpu *v)
 --
 2.1.0
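
For reference, one way to order the check relative to the compare-and-exchange can be sketched in standalone C11. This is only an illustration under stated assumptions, not the code from the patch: the names demo_try_set_on, DEMO_ON and DEMO_SN and the bit positions are hypothetical, and C11 atomics stand in for Xen's cmpxchg(). The SN/ON test is applied to the value the exchange will compare against, so a failed exchange re-tests the freshly observed value.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for this sketch only. */
#define DEMO_ON (UINT64_C(1) << 0)  /* outstanding notification */
#define DEMO_SN (UINT64_C(1) << 1)  /* suppress notification */

/*
 * Try to set ON atomically.
 * Returns true if ON was newly set while SN was clear (the caller would
 * then send the notification), false if ON or SN was already set (the
 * caller would fall back to kicking the vCPU).
 */
static bool demo_try_set_on(_Atomic uint64_t *control)
{
    uint64_t old = atomic_load(control);

    do {
        /* Test the value that the exchange below will compare against. */
        if ( old & (DEMO_ON | DEMO_SN) )
            return false;
        /* On failure the exchange stores the current value into 'old'. */
    } while ( !atomic_compare_exchange_weak(control, &old, old | DEMO_ON) );

    return true;
}
```

With a control word of 0 the first call returns true and leaves ON set; a second call, or a call with SN set, returns false and leaves the word unchanged.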


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/MSI: fix guest unmasking when handling IRQ via event channel

2015-07-08 Thread Julien Grall


Hi,

On 08/07/2015 09:56, Jan Beulich wrote:

--- a/xen/include/asm-arm/irq.h
+++ b/xen/include/asm-arm/irq.h
@@ -47,6 +47,8 @@ int release_guest_irq(struct domain *d,

  void arch_move_irqs(struct vcpu *v);

+#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
+


This addition is here in order to ensure that d and pirq are evaluated, 
right?


If so, I didn't find that obvious. Why didn't you use a 
static inline? Or maybe add a comment explicitly saying this is not 
implemented.
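
To make the two options concrete, here is a small standalone C sketch. All names (demo_domain, demo_bind_macro, demo_bind_inline, demo_next_pirq) are hypothetical stand-ins and not Xen code. Both stubs evaluate their arguments exactly once; the inline form additionally type-checks them and states the no-op explicitly.

```c
/* Hypothetical stand-in for struct domain. */
struct demo_domain { int id; };

/* Macro form, shaped like the one in the patch: evaluates both
 * arguments and discards the result. */
#define demo_bind_macro(d, pirq) ((void)((d) + (pirq)))

/* Inline form: arguments are type-checked, the body is an explicit no-op. */
static inline void demo_bind_inline(struct demo_domain *d, int pirq)
{
    (void)d;
    (void)pirq;
}

/* Counts how often an argument expression is evaluated. */
static int demo_calls;
static int demo_next_pirq(void) { return ++demo_calls; }

/* Runs both stubs once and returns the number of argument evaluations. */
static int demo_evaluations(void)
{
    struct demo_domain dom = { 0 };

    demo_calls = 0;
    demo_bind_macro(&dom, demo_next_pirq());
    demo_bind_inline(&dom, demo_next_pirq());
    return demo_calls;
}
```

The macro would also accept, say, two integers without complaint, whereas the inline function rejects a mistyped first argument at compile time.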


Regards,

--
Julien Grall


Re: [Xen-devel] [v5][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Ian Campbell

On Wed, 2015-07-08 at 17:06 +0800, Chen, Tiejun wrote:

 #2. Don't expose ignore to user and just keep host as the default
 
 He told me he would discuss this with you, but sounds he didn't do this, 
 or I'm missing something here?

My question was regarding how xl rdm=type=none differed from not
saying anything (i.e. getting the default). You explained that this was
useful to allow the default to be changed, which I agreed with.

The question regarding the actual naming of the options at either the
xl level or the libxl level (which seems to be what Ian J's comments
were on) is orthogonal to the question of whether there should be a way
to explicitly ask for the default (as opposed to implicitly asking for
it by omission of the option).

Ian.




Re: [Xen-devel] [PATCH v4 1/7] libxl: get rid of the SEDF scheduler

2015-07-08 Thread Ian Campbell

On Tue, 2015-07-07 at 18:43 +0200, Dario Faggioli wrote:
 only the interface is left in place, for backward
 compile-time compatibility, but every attempt to
 use it would throw an error.
 
 Signed-off-by: Dario Faggioli dario.faggi...@citrix.com
 ---
 Cc: George Dunlap george.dun...@eu.citrix.com
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com

Acked-by: Ian Campbell ian.campb...@citrix.com

 Cc: Wei Liu wei.l...@citrix.com
 
 Changes from v3:
  - drop George's Rev-by: which should not be there since v2;
  - better grouping of fields in libxl_domain_sched_params, as
suggested during review;
  - improved comment for ERROR_FEATURE_REMOVED, as suggested
during review.
 
 Changes from v2:
  - introduce and use ERROR_FEATURE_REMOVED, as requested
during review;
  - mark the SEDF only parameter as deprecated in libxl_types.idl,
as requested during review.
 ---
  tools/libxl/libxl.c |   73 
 ++-
  tools/libxl/libxl_create.c  |   61 
  tools/libxl/libxl_types.idl |8 -
  3 files changed, 11 insertions(+), 131 deletions(-)
 
 diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
 index 3a83903..38aff8d 100644
 --- a/tools/libxl/libxl.c
 +++ b/tools/libxl/libxl.c
 @@ -5728,73 +5728,6 @@ static int sched_credit2_domain_set(libxl__gc *gc, 
 uint32_t domid,
  return 0;
  }
  
 -static int sched_sedf_domain_get(libxl__gc *gc, uint32_t domid,
 - libxl_domain_sched_params *scinfo)
 -{
 -uint64_t period;
 -uint64_t slice;
 -uint64_t latency;
 -uint16_t extratime;
 -uint16_t weight;
 -int rc;
 -
 -rc = xc_sedf_domain_get(CTX-xch, domid, period, slice, latency,
 -extratime, weight);
 -if (rc != 0) {
 -LOGE(ERROR, getting domain sched sedf);
 -return ERROR_FAIL;
 -}
 -
 -libxl_domain_sched_params_init(scinfo);
 -scinfo-sched = LIBXL_SCHEDULER_SEDF;
 -scinfo-period = period / 100;
 -scinfo-slice = slice / 100;
 -scinfo-latency = latency / 100;
 -scinfo-extratime = extratime;
 -scinfo-weight = weight;
 -
 -return 0;
 -}
 -
 -static int sched_sedf_domain_set(libxl__gc *gc, uint32_t domid,
 - const libxl_domain_sched_params *scinfo)
 -{
 -uint64_t period;
 -uint64_t slice;
 -uint64_t latency;
 -uint16_t extratime;
 -uint16_t weight;
 -
 -int ret;
 -
 -ret = xc_sedf_domain_get(CTX-xch, domid, period, slice, latency,
 -extratime, weight);
 -if (ret != 0) {
 -LOGE(ERROR, getting domain sched sedf);
 -return ERROR_FAIL;
 -}
 -
 -if (scinfo-period != LIBXL_DOMAIN_SCHED_PARAM_PERIOD_DEFAULT)
 -period = (uint64_t)scinfo-period * 100;
 -if (scinfo-slice != LIBXL_DOMAIN_SCHED_PARAM_SLICE_DEFAULT)
 -slice = (uint64_t)scinfo-slice * 100;
 -if (scinfo-latency != LIBXL_DOMAIN_SCHED_PARAM_LATENCY_DEFAULT)
 -latency = (uint64_t)scinfo-latency * 100;
 -if (scinfo-extratime != LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT)
 -extratime = scinfo-extratime;
 -if (scinfo-weight != LIBXL_DOMAIN_SCHED_PARAM_WEIGHT_DEFAULT)
 -weight = scinfo-weight;
 -
 -ret = xc_sedf_domain_set(CTX-xch, domid, period, slice, latency,
 -extratime, weight);
 -if ( ret  0 ) {
 -LOGE(ERROR, setting domain sched sedf);
 -return ERROR_FAIL;
 -}
 -
 -return 0;
 -}
 -
  static int sched_rtds_domain_get(libxl__gc *gc, uint32_t domid,
 libxl_domain_sched_params *scinfo)
  {
 @@ -5873,7 +5806,8 @@ int libxl_domain_sched_params_set(libxl_ctx *ctx, 
 uint32_t domid,
  
  switch (sched) {
  case LIBXL_SCHEDULER_SEDF:
 -ret=sched_sedf_domain_set(gc, domid, scinfo);
 +LOG(ERROR, SEDF scheduler no longer available);
 +ret=ERROR_FEATURE_REMOVED;
  break;
  case LIBXL_SCHEDULER_CREDIT:
  ret=sched_credit_domain_set(gc, domid, scinfo);
 @@ -5909,7 +5843,8 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, 
 uint32_t domid,
  
  switch (scinfo-sched) {
  case LIBXL_SCHEDULER_SEDF:
 -ret=sched_sedf_domain_get(gc, domid, scinfo);
 +LOG(ERROR, SEDF scheduler no longer available);
 +ret=ERROR_FEATURE_REMOVED;
  break;
  case LIBXL_SCHEDULER_CREDIT:
  ret=sched_credit_domain_get(gc, domid, scinfo);
 diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
 index 9c2303c..3f31a3b 100644
 --- a/tools/libxl/libxl_create.c
 +++ b/tools/libxl/libxl_create.c
 @@ -50,61 +50,6 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc,
  return 0;
  }
  
 -static int sched_params_valid(libxl__gc *gc,
 -  uint32_t domid, libxl_domain_sched_params *scp)
 -{
 -int

Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-08 Thread Wu, Feng



 -Original Message-
 From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
 Sent: Tuesday, June 30, 2015 1:07 AM
 To: Wu, Feng; xen-devel@lists.xen.org
 Cc: k...@xen.org; jbeul...@suse.com; Tian, Kevin; Zhang, Yang Z;
 george.dun...@eu.citrix.com
 Subject: Re: [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked
 
 On 24/06/15 06:18, Feng Wu wrote:
  This patch includes the following aspects:
  - Add a global vector to wake up the blocked vCPU
when an interrupt is being posted to it (This
part was sugguested by Yang Zhang yang.z.zh...@intel.com).
  - Adds a new per-vCPU tasklet to wakeup the blocked
vCPU. It can be used in the case vcpu_unblock
cannot be called directly.
  - Define two per-cpu variables:
* pi_blocked_vcpu:
A list storing the vCPUs which were blocked on this pCPU.
 
* pi_blocked_vcpu_lock:
The spinlock to protect pi_blocked_vcpu.
 
  Signed-off-by: Feng Wu feng...@intel.com
  ---
  v3:
  - This patch is generated by merging the following three patches in v2:
 [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
 [RFC v2 10/15] vmx: Define two per-cpu variables
 [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d
 Posted-Interrupts
  - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
  - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct 
  arch_vmx_struct'
  - rename 'vcpu_wakeup_tasklet_handler' to
 'pi_vcpu_wakeup_tasklet_handler'
  - Make pi_wakeup_interrupt() static
  - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
  - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
  - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
  - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
 
   xen/arch/x86/hvm/vmx/vmcs.c|  3 +++
   xen/arch/x86/hvm/vmx/vmx.c | 54
 ++
   xen/include/asm-x86/hvm/hvm.h  |  1 +
   xen/include/asm-x86/hvm/vmx/vmcs.h |  5 
   xen/include/asm-x86/hvm/vmx/vmx.h  |  5 
   5 files changed, 68 insertions(+)
 
  diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
  index 11dc1b5..0c5ce3f 100644
  --- a/xen/arch/x86/hvm/vmx/vmcs.c
  +++ b/xen/arch/x86/hvm/vmx/vmcs.c
  @@ -631,6 +631,9 @@ int vmx_cpu_up(void)
   if ( cpu_has_vmx_vpid )
   vpid_sync_all();
 
  +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
  +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
  +
   return 0;
   }
 
  diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
  index b94ef6a..7db6009 100644
  --- a/xen/arch/x86/hvm/vmx/vmx.c
  +++ b/xen/arch/x86/hvm/vmx/vmx.c
  @@ -82,7 +82,20 @@ static int vmx_msr_read_intercept(unsigned int msr,
 uint64_t *msr_content);
   static int vmx_msr_write_intercept(unsigned int msr, uint64_t
 msr_content);
   static void vmx_invlpg_intercept(unsigned long vaddr);
 
  +/*
  + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we
  + * can find which vCPU should be waken up.
  + */
  +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
  +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
  +
   uint8_t __read_mostly posted_intr_vector;
  +uint8_t __read_mostly pi_wakeup_vector;
  +
  +static void pi_vcpu_wakeup_tasklet_handler(unsigned long arg)
  +{
  +vcpu_unblock((struct vcpu *)arg);
  +}
 
   static int vmx_domain_initialise(struct domain *d)
   {
  @@ -148,11 +161,19 @@ static int vmx_vcpu_initialise(struct vcpu *v)
   if ( v->vcpu_id == 0 )
   v->arch.user_regs.eax = 1;
 
  +tasklet_init(
  +&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet,
  +pi_vcpu_wakeup_tasklet_handler,
  +(unsigned long)v);
 
 c/s f6dd295 indicates that the global tasklet lock causes a bottleneck
 when injecting interrupts, and replaced a tasklet with a softirq to fix
 the scalability issue.
 
 I would expect exactly the bottleneck to exist here.

I am still considering this comment. Jan, what is your opinion on this?

Thanks,
Feng

 
  +
  +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
  +
   return 0;
   }
 
   static void vmx_vcpu_destroy(struct vcpu *v)
   {
  +tasklet_kill(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
   /*
* There are cases that domain still remains in log-dirty mode when it
 is
* about to be destroyed (ex, user types 'xl destroy dom'), in which
 case
  @@ -1848,6 +1869,33 @@ static struct hvm_function_table __initdata
 vmx_function_table = {
   .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
   };
 
  +/*
  + * Handle VT-d posted-interrupt when VCPU is blocked.
  + */
  +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
  +{
  +struct arch_vmx_struct *vmx;
  +unsigned int cpu = smp_processor_id();
  +
  +spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
 
 this_cpu($foo) should be used in preference to per_cpu($foo, $myself).
 
 However, always hoist repeated uses of this/per_cpu into local
 variables, as

[Xen-devel] "x86, arm: remove asm/spinlock.h from all architectures" removed x86's _raw_read_unlock()

2015-07-08 Thread Jan Beulich

David,

I'm afraid we'll need another fixup here, even if things build fine
despite the removal.

Thanks, Jan



Re: [Xen-devel] Performance problem about address translation

2015-07-08 Thread xinyue



On 2015-07-08 14:26, xinyue wrote:

Very sorry for the incorrect message sent earlier.
On 2015-07-08 14:13, xinyue wrote:


On 2015-07-07 19:49, Ian Campbell wrote:

On Tue, 2015-07-07 at 11:24 +0800, xinyue wrote:

Please don't use HTML mail and do proper ">" quoting


And after analyzing the performance of hvm domu, I found a process
named evolution-data- using almost 99.9% cpu. Does someone known
what's this and why it appears?

evolution-data-server is part of the evolution mail client. It has
nothing to do with Xen I'm afraid so you will have to look elsewhere
for why it is taking so much CPU.

Ian.




Sorry for that and thanks very much.

I think the problem may be caused by address alignment. The HVM 
DomU crashed after the hypercall, and Dom0 sometimes crashed later with 
a bus error.


I think the function that caused the crash is get_gfn. The related 
code is:


unsigned long gfn;
unsigned long mfn;
struct vcpu *vcpu = current;
struct domain *d = vcpu->domain;
uint32_t pfec = PFEC_page_present;
p2m_type_t t;
gfn = paging_gva_to_gfn(current, 0xc029, &pfec);
mfn = get_gfn(d, gfn, &t);

Am I missing some type conversion?


Thanks and best regards!

xinyue



Thanks for all the advice. I found that the problem appeared because I 
forgot to call put_gfn.
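
The general pattern behind that fix, pairing every acquire with a release, can be sketched in standalone C. This is a simplified model with hypothetical names (demo_entry, demo_get, demo_put, demo_lookup); it uses a plain counter and does not reproduce what Xen's get_gfn and put_gfn do internally.

```c
/* Hypothetical entry: a held-reference count and the value looked up. */
struct demo_entry {
    int held;            /* number of outstanding demo_get() calls */
    unsigned long mfn;   /* value handed back to the caller */
};

/* Acquire: hands out the value and records that it is in use. */
static unsigned long demo_get(struct demo_entry *e)
{
    e->held++;
    return e->mfn;
}

/* Release: must be called once for every demo_get(). */
static void demo_put(struct demo_entry *e)
{
    e->held--;
}

/* Balanced use: the value is consumed between get and put. */
static unsigned long demo_lookup(struct demo_entry *e)
{
    unsigned long mfn = demo_get(e);

    /* ... work with mfn while it is held ... */
    demo_put(e);
    return mfn;
}
```

After demo_lookup() returns, the held count is back at its starting value; omitting the demo_put() call would leave it raised on every lookup.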


Thanks again and best regards!

xinyue


Re: [Xen-devel] [PATCH v25 00/15] x86/PMU: Xen PMU PV(H) support

2015-07-08 Thread Jan Beulich

 On 19.06.15 at 20:44, boris.ostrov...@oracle.com wrote:

While making another scan through this series now that some more
reviews from Dietmar are trickling in, I notice:

 Boris Ostrovsky (15):
   common/symbols: Export hypervisor symbols to privileged guest
   x86/VPMU: Add public xenpmu.h
   x86/VPMU: Make vpmu not HVM-specific
   x86/VPMU: Interface for setting PMU mode and flags

still missing a VMX maintainer's ack

   x86/VPMU: Initialize VPMUs with __initcall

same here plus no review (albeit I wouldn't make the latter a
requirement)

   x86/VPMU: Initialize PMU for PV(H) guests

same regarding review state

   x86/VPMU: Save VPMU state for PV guests during context switch
   x86/VPMU: When handling MSR accesses, leave fault injection to callers

again same regarding review state

   x86/VPMU: Add support for PMU register handling on PV guests
   x86/VPMU: Use pre-computed masks when checking validity of MSRs
   VPMU/AMD: Check MSR values before writing to hardware

no review yet (and here I'd really like to have one)

   x86/VPMU: Handle PMU interrupts for PV(H) guests

same here

   x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
   x86/VPMU: Add privileged PMU mode

here a review would again be nice, but I'd again not make it a
requirement

   x86/VPMU: Move VPMU files up from hvm/ directory

Jan



Re: [Xen-devel] [v3 03/15] Add cmpxchg16b support for x86-64

2015-07-08 Thread Jan Beulich

 On 08.07.15 at 10:33, feng...@intel.com wrote:
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, July 08, 2015 4:13 PM
  On 08.07.15 at 09:06, feng...@intel.com wrote:
  From: xen-devel-boun...@lists.xen.org 
  [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Andrew Cooper
  Sent: Thursday, June 25, 2015 2:35 AM
  On 24/06/15 06:18, Feng Wu wrote:
   +{
   +uint128_t prev;
   +
   +ASSERT(cpu_has_cx16);

  Given that if this assertion were to fail, cmpxchg16b would fail with
  #UD, I would hand-code a asm_fixup section which in turn panics.  This
  avoids a situation where non-debug builds could die with an unqualified
  #UD exception.

  Is there an existing way to panic the hypervisor in assembler code, I
  don't find it, it would be appreciated if you can point it out.

 I'm not convinced such a #UD would be a significant problem: Looking
 at the disassembly will show the cause right away. The out of line
 ud2-s in some of VMX'es inline assembly wrappers are far worse.

 So, do you agree with the fixup section or not?

I'd rather not go that route, unless Andrew or you manage to
convince me otherwise.

 I think Andrew's enforce
 really means ASSERT() or BUG_ON(), again to avoid an unqualified
 exception. However - see above.

 Plus, all that said, without having seen the actual use sites of
 cmpxchg16b yet, I'm not at all convinced we really need this patch.

 After introducing posted format in IRTE, some fields exist in both the
 High 64 bit and the low 64 bit,such as pda_h and pda_l, how to make
 sure it is atomic when updating the pda field?

Is there a need for updating these _after_ initially setting up an
entry?

Jan

