date:20150413

flight 50399 rumpuserxen real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50399/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386-rumpuserxen5 rumpuserxen-build fail REGR. vs. 33866
 build-amd64-rumpuserxen   5 rumpuserxen-build fail REGR. vs. 33866

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a

version targeted for testing:
 rumpuserxen  80294081448af74270ca2dd2ba758f0ed308f46c
baseline version:
 rumpuserxen  30d72f3fc5e35cd53afd82c8179cc0e0b11146ad


People who touched revisions under test:
  Antti Kantee po...@iki.fi
  Ian Jackson ian.jack...@eu.citrix.com
  Martin Lucina mar...@lucina.net
  Wei Liu l...@liuw.name


jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-rumpuserxen-amd64   blocked 
 test-amd64-i386-rumpuserxen-i386 blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/osstest/pub/logs
images: /home/osstest/pub/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 527 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC v1 00/15] Add VT-d Posted-Interrupts support

2015-04-13 Thread Wu, Feng

 -Original Message-
 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Monday, April 13, 2015 8:13 PM
 To: Wu, Feng
 Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
 Subject: RE: [RFC v1 00/15] Add VT-d Posted-Interrupts support

  On 01.04.15 at 15:21, feng...@intel.com wrote:
  Hi Jan,

  Any more comments about this series? Thanks a lot!

 I think it makes little sense to wait for my review before posting v2
 with issues addressed which others have pointed out. I just
 returned from vacation, and would have time to look at v1 next
 week the earliest (with the week after being occupied by travel).

Thanks a lot, Jan! Any comments are appreciated!

Thanks,
Feng

 Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC][PATCH 10/13] tools: extend XENMEM_set_memory_map

2015-04-13 Thread Chen, Tiejun


On 2015/4/13 19:02, Wei Liu wrote:

On Mon, Apr 13, 2015 at 10:09:51AM +0800, Chen, Tiejun wrote:
[...]

Hardcoded value?


Yes. Actually, we intend to use this to present that lowmem entry,

tools/firmware/hvmloader/e820.c:

 /* Low RAM goes here. Reserve space for special pages. */
 ...
 e820[nr].addr = 0x10;



I don't like the idea of having two hardcoded values in different


Just one place since based on our logic, hvmloader doesn't have this 
setting anymore and actually it really grab that info from here. Please 
refer to patch #13.



locations. Please put this value into a header file and reference it
here and in hvmloader.


Anyway, I'd like to define this here directly since no one consumes this 
again.


diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 5134b33..af747e6 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -787,6 +787,7 @@ out:
 return rc;
 }

+#define GUEST_LOW_MEM_START_DEFAULT 0x10
 static int libxl__domain_construct_memmap(libxl__gc *gc,
   libxl_domain_config *d_config,
   uint32_t domid,
@@ -812,8 +813,8 @@ static int libxl__domain_construct_memmap(libxl__gc *gc,
 e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);

 /* Low memory */
-e820[nr].addr = 0x10;
-e820[nr].size = args-lowmem_size - 0x10;
+e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+e820[nr].size = args-lowmem_size - GUEST_LOW_MEM_START_DEFAULT;
 e820[nr].type = E820_RAM;
 nr++;

Is this fine to you?

Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/2] Re-factoring passthrough/pci.c and adding place-holder code for ARM/PCI

2015-04-13 Thread Jaggi, Manish

Hi Julien,

From: Julien Grall julien.grall@gmail.com
Sent: Monday, April 13, 2015 3:49 PM
To: Jaggi, Manish; xen Devel; Stefano Stabellini; Julien Grall; Ian Campbell; 
Kumar, Vijaya; prasun.kap...@cavium.com
Subject: Re: [Xen-devel] [PATCH 0/2] Re-factoring passthrough/pci.c and adding 
place-holder code for ARM/PCI

Hi Manish,

On 13/04/15 08:37, Manish Jaggi wrote:
 Xen currently does not have PCI support for ARM builds. This patch set
 makes the code compilable for ARM PCI and adds places-holder code
 which would be replaced with PCI pass-through support patch series.

May I ask why you did send directly all the code to support PCI on ARM?

Without the rest it's hard to tell whether these patches make sense or not.
[Manish] As I have mentioned these two patches are only for refactoring msi 
specific  functions to x86 files and providing the constructs that make PCI ARM 
code compilable.

The code for PCI ARM is very basic in the patch and PCI passthrough code 
patches will follow in next series. Please let me know which code does not 
makes sense and I will modify it.

 Re-factor MSI Handling
 -
 There is a some x86 specific code which is found in common code:
 xen/drivers/passthrough/pci.c which needs to be re factored.

 MSI/X are configured and handled by dom0 or domU code on ARM64 and is not
 required to be part of common code. However there are functions which are
 used as part of common code and calls to these functions cannot be easily
 re factored like pci_cleanup_msi.

On x86, the hypervisor is taking care of MSI (enabling/disabling in the
config space) as long as doing sanity check on MSI used by a given domain.

How do you plan to handle it on ARM? IHMO, the MSI code is very useful
and would also help GIC MSI implement (ITS + V2M).

[manish] As we had discussed in Linaro Connect and over the mailing list MSIs 
(LPIs in ARMv8) are handled directly on ARM by the guest and hypervisor is only 
doing injection of the LPIs using GIVv3 emulation. 

FWIW, I've pointed out the same issue on the ITS series a couple of
weeks ago.

Regards,

--
Julien Grall
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/2] xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code

2015-04-13 Thread Jaggi, Manish

Hi Julien,


From: xen-devel-boun...@lists.xen.org xen-devel-boun...@lists.xen.org on 
behalf of Julien Grall julien.gr...@citrix.com
Sent: Monday, April 13, 2015 4:12 PM
To: Jaggi, Manish; Xen Devel; Stefano Stabellini; Julien Grall; Ian Campbell; 
Kumar, Vijaya; prasun.kap...@cavium.com
Subject: Re: [Xen-devel] [PATCH 2/2] xen/arm: Make HAS_PCI compilable on ARM by 
adding place-holder code

Hi Manish,

On 13/04/15 08:48, Manish Jaggi wrote:
 Add ARM PCI Support
 ---
 a) Place holder functions are added for pci_conf_read/write calls.
 b) Macros dev_is_pci, pci_to_dev are implemented in
 drivers/passthrough/pci/arm code

 Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com
 ---
  xen/arch/arm/Makefile|1 +
  xen/arch/arm/pci.c   |   60 +++
  xen/drivers/passthrough/arm/Makefile |1 +
  xen/drivers/passthrough/arm/pci.c|   88
 ++
  xen/drivers/passthrough/arm/smmu.c   |1 -
  xen/drivers/passthrough/pci.c|9 ++--
  xen/include/asm-arm/device.h |   33 +
  xen/include/asm-arm/domain.h |3 ++
  xen/include/asm-arm/pci.h|7 +--
  9 files changed, 186 insertions(+), 17 deletions(-)

[...]
 diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
 index 004aba9..68c292b 100644
 --- a/xen/drivers/passthrough/pci.c
 +++ b/xen/drivers/passthrough/pci.c
 @@ -1252,9 +1252,12 @@ static int assign_device(struct domain *d, u16
 seg, u8 bus, u8 devfn)
  /* Prevent device assign if mem paging or mem sharing have been
   * enabled for this domain */
  if ( unlikely(!need_iommu(d) 
 -(d-arch.hvm_domain.mem_sharing_enabled ||
 - d-mem_event-paging.ring_page ||
 - p2m_get_hostp2m(d)-global_logdirty)) )
 +(d-mem_event-paging.ring_page
 +#ifdef CONFIG_X86
 + || d-arch.hvm_domain.mem_sharing_enabled ||
 + p2m_get_hostp2m(d)-global_logdirty
 +#endif
 +)) )

Please avoid #ifdef CONFIG_* and introduce an arch macro. 
[Manish] ok
  return -EXDEV;

  if ( !spin_trylock(pcidevs_lock) )
 diff --git a/xen/include/asm-arm/device.h b/xen/include/asm-arm/device.h
 index a72f7c9..009ff0a 100644
 --- a/xen/include/asm-arm/device.h
 +++ b/xen/include/asm-arm/device.h
 @@ -6,6 +6,17 @@
  enum device_type
  {
  DEV_DT,
 +DEV_ENUMERATED,
 +};
 +
 +enum device_class
 +{
 +DEVICE_SERIAL,
 +DEVICE_IOMMU,
 +DEVICE_GIC,
 +DEVICE_PCI,
 +/* Use for error */
 +DEVICE_UNKNOWN,
  };

Why? It will be very unlikely that we have to create a structure device
for the IOMMU, GIC and SERIAL.

It would make more sense to add a DEV_PCI directly to device_type.
[manish] ok. 


  struct dev_archdata {
 @@ -16,28 +27,30 @@ struct dev_archdata {
  struct device
  {
  enum device_type type;
 +enum device_class class;
  #ifdef HAS_DEVICE_TREE
  struct dt_device_node *of_node; /* Used by drivers imported from
 Linux */
  #endif
  struct dev_archdata archdata;
 +#ifdef HAS_PCI
 +void *pci_dev;

This field is not necessary. I've added one for the DT for keeping
compatibility with Linux.
[Manish] pci_dev is not in struct device currently. Didn't get you
 I have added as It is required  for to_pci_dev call.

 +#endif

[..]

 +int dev_is_pci(device_t *dev);

This could easily be a macro.
[manish] ok


  struct device_desc {
  /* Device name */
 diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
 index 9e0419e..41ae847 100644
 --- a/xen/include/asm-arm/domain.h
 +++ b/xen/include/asm-arm/domain.h
 @@ -42,6 +42,8 @@ struct vtimer {
  uint64_t cval;
  };

 +#define has_arch_pdevs(d)(!list_empty((d)-arch.pdev_list))
 +
  struct arch_domain
  {
  #ifdef CONFIG_ARM_64
 @@ -56,6 +58,7 @@ struct arch_domain
  xen_pfn_t *grant_table_gpfn;

  struct io_handler io_handlers;
 +struct list_head pdev_list;
  /* Continuable domain_relinquish_resources(). */
  enum {
  RELMEM_not_started,
 diff --git a/xen/include/asm-arm/pci.h b/xen/include/asm-arm/pci.h
 index de13359..b8ec882 100644
 --- a/xen/include/asm-arm/pci.h
 +++ b/xen/include/asm-arm/pci.h
 @@ -1,7 +1,8 @@
 -#ifndef __X86_PCI_H__
 -#define __X86_PCI_H__
 +#ifndef __ARM_PCI_H__
 +#define __ARM_PCI_H__

  struct arch_pci_dev {
 +void *dev;

void * is error-prone. Why can't you use the use the real structure?

[manish]Will change it.  I believe dev_archdata structure has also a void *  
(in asm-arm/device.h). So all void * are error prone in xen ?

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [libvirt test] 50401: regressions - FAIL

flight 50401 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50401/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   5 libvirt-build fail REGR. vs. 50368

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm  11 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 11 guest-start  fail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  b487bb810ec95df862e7e80468c8e861ed80b0cb
baseline version:
 libvirt  225aa80246d5e4a9e3a16ebd4c482525045da3db


People who touched revisions under test:
  CÃ©dric Bosdonnat cbosdon...@suse.com
  Dawid Zamirski dzamir...@datto.com
  Dmitry Guryanov dgurya...@parallels.com
  Erik Skultety eskul...@redhat.com
  John Ferlan jfer...@redhat.com
  JÃ¡n Tomko jto...@redhat.com
  Lubomir Rintel lkund...@v3.sk
  Luyao Huang lhu...@redhat.com
  Maxim Nestratov mnestra...@parallels.com
  Michael Chapman m...@very.puzzling.org
  Michal Privoznik mpriv...@redhat.com
  Peter Krempa pkre...@redhat.com


jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  fail
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-xsm fail
 test-armhf-armhf-libvirt-xsm blocked
 test-amd64-i386-libvirt-xsm  fail
 test-amd64-amd64-libvirt pass
 test-armhf-armhf-libvirt blocked
 test-amd64-i386-libvirt  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/osstest/pub/logs
images: /home/osstest/pub/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 848 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/1] xc_domain_restore: Allow QEMU to increase memory

2015-04-13 Thread Don Slutz

On 04/13/15 12:20, Wei Liu wrote:
 On Mon, Apr 13, 2015 at 12:09:13PM -0400, Don Slutz wrote:
 If QEMU has called on xc_domain_setmaxmem to add more memory for
 option ROMs, domain restore needs to also increase the memory.

 Signed-off-by: Don Slutz dsl...@verizon.com
 ---
 To see the hvmloader loader issue:

...
  
 +/* Leave some slack so that hvmloader does not complain about lack of
 + * memory at boot time (Could not allocate order=0 extent).
 + * Once hvmloader is modified to cope with that situation without
 + * printing warning messages, QEMU_SPARE_PAGES can be removed.
 Yes, please properly modify hvmloader to do this instead of trying to
 workaround this over and over again.

At some point I might get to this.

 But didn't Stefano make some changes to libxl too to handle the extra
 ram needed?

Not that I have seen.

 + */
 +#define QEMU_SPARE_PAGES 16
 +
 This is still arbitrary and prone to breakage.

Yes, but turns out not to be the important part.


  struct restore_ctx {
  unsigned long max_mfn; /* max mfn of the current host machine */
  unsigned long hvirt_start; /* virtual starting address of the 
 hypervisor */
 @@ -209,12 +216,44 @@ static int uncanonicalize_pagetable(
  if (!ctx-hvm  ctx-superpages)
  rc = alloc_superpage_mfns(xch, dom, ctx, nr_mfns);
  else
 +{
...
 +}
 This is not maintainable. Trying to do smart things inside libxc
 without libxl knowing it is prone to breakage.

Some how I forgot to include that this is a bug in master
(4.6-unstable).  Much more info in
my reply to Andrew Cooper (short form below).


[root@dcs-xen-52 don]# xl cre -p /home/don/e1000x8.xfg
Parsing config from /home/don/e1000x8.xfg
got a tsc mode string: native_paravirt
[root@dcs-xen-52 don]# xl save e1000x8 e1000x8.save
Saving to e1000x8.save new xl format (info 0x1/0x0/3506)
xc: Saving memory: iter 0 (last sent 0 skipped 0): 1044481/1044481  100%
[root@dcs-xen-52 don]# xl restore e1000x8.save  
Loading new save file e1000x8.save (new xl fmt info 0x1/0x0/3506)
 Savefile contains xl domain config in JSON format
Parsing config from saved
xc: error: Failed to allocate memory for batch.!: Internal error
libxl: error: libxl_create.c:1057:libxl__xc_domain_restore_done:
restoring domain: Cannot allocate memory
libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot
(re-)build domain: -3
libxl: error: libxl.c:1576:libxl__destroy_domid: non-existant domain 2
libxl: error: libxl.c:1540:domain_destroy_callback: unable to destroy
guest with domid 2
libxl: error: libxl_create.c:1490:domcreate_destruction_cb: unable to
destroy domain 2 following failed creation
[root@dcs-xen-52 don]# 


   -Don Slutz


 Wei.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/hvm: Fix the unknown nested vmexit reason 80000021 bug

2015-04-13 Thread Li, Liang Z

  This would be easier to read as
 
  if ( cpu_has_vmx_vnmi 
   (idtv_info  INTR_INFO_INTR_TYPE_MASK) ==
  (X86_EVENTTYPE_NMI8)) )
 
  I was going to say something similar, but I think in the past Jan has
  said that Liang's original is more in line with the coding style.
 
 No, my complaint here wouldn't be about coding style, but about the hard-
 coded 8 - it's not been that long ago that I replaced may of them, and I'd
 really like to see it replaced here too. Liang - can you please submit an
 incremental change (as the original one got committed already)? There
 should be several examples in VMX code on how the 8 can be avoided.
 
 Jan

Of course, with pleasure.

Liang

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/1] xc_domain_restore: Allow QEMU to increase memory

2015-04-13 Thread Hongyang Yang


This patch also fix a triple fault when guests running under COLO mode.
(XEN) d0v1 Over-allocation for domain 1: 524545  524544
(XEN) memory.c:155:d0v1 Could not allocate order=0 extent: id=1 memflags=0 (181 
of 235)

(XEN) d1v1 Triple fault - invoking HVM shutdown action 1

On 04/14/2015 12:09 AM, Don Slutz wrote:

If QEMU has called on xc_domain_setmaxmem to add more memory for
option ROMs, domain restore needs to also increase the memory.

Signed-off-by: Don Slutz dsl...@verizon.com
---
To see the hvmloader loader issue:

   xl cre -p e1000x8.xfg
   xl save e1000x8 e1000x8.save
   xl restore e1000x8.save

With e1000x8.xfg:
---
builder = hvm
device_model_args_hvm = [
  -monitor,
  pty,
  -boot,
  menu=on,
]
device_model_version = qemu-xen
disk = [
  /dev/etherd/e500.1,,xvda,
  /dev/etherd/e500.2,,xvdb,
  /dev/etherd/e500.3,,xvde,
  /local-isos/iso/centos/x86_64/CentOS-6.3-x86_64-bin-DVD1.iso,,hdc,cdrom,
  /local-isos/iso/centos/x86_64/CentOS-6.3-x86_64-bin-DVD2.iso,,hdd,cdrom,
]
maxmem = 8192
memory = 8192
name = e1000x8
serial = pty
tsc_mode = native_paravirt
uuid = e5000105-3d83-c962-07ae-2bc46c3644a0
videoram = 16
vif = [
  model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:a0,
  model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:aa,
  model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:b4,
  model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:be,
  model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:c8,
  model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:d2,
  model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:dc,
  model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:e6,
  model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:f0,
  model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:fa,
]
viridian = 0
xen_platform_pci = 1
mmio_hole = 2048
vcpus = 1
maxvcpus = 6
on_poweroff = preserve
on_reboot = preserve
on_watchdog = preserve
on_crash = preserve
---


  tools/libxc/xc_domain_restore.c | 75 +++--
  1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 2ab9f46..28b4fa6 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -47,6 +47,13 @@
  #include xen/hvm/ioreq.h
  #include xen/hvm/params.h

+/* Leave some slack so that hvmloader does not complain about lack of
+ * memory at boot time (Could not allocate order=0 extent).
+ * Once hvmloader is modified to cope with that situation without
+ * printing warning messages, QEMU_SPARE_PAGES can be removed.
+ */
+#define QEMU_SPARE_PAGES 16
+
  struct restore_ctx {
  unsigned long max_mfn; /* max mfn of the current host machine */
  unsigned long hvirt_start; /* virtual starting address of the hypervisor 
*/
@@ -209,12 +216,44 @@ static int uncanonicalize_pagetable(
  if (!ctx-hvm  ctx-superpages)
  rc = alloc_superpage_mfns(xch, dom, ctx, nr_mfns);
  else
+{
+xc_domaininfo_t info;
+unsigned long free_pages;
+
+if ((xc_domain_getinfolist(xch, dom, 1, info) != 1) ||
+(info.domain != dom)) {
+ERROR(Failed xc_domain_getinfolist for batch 
(uncanonicalize_pagetable)\n);
+errno = ENOMEM;
+return 0;
+}
+free_pages = info.max_pages - info.tot_pages;
+if (free_pages  QEMU_SPARE_PAGES) {
+free_pages -= QEMU_SPARE_PAGES;
+} else {
+free_pages = 0;
+}
+if (free_pages  nr_mfns)
+{
+DPRINTF(Adjust memory for batch (uncanonicalize_pagetable): 
free_pages=%lu nr_mfns=%d max_pages=%lu tot_pages=%lu max_mfn=%lu\n,
+free_pages, nr_mfns, (unsigned long)info.max_pages,
+(unsigned long)info.tot_pages, ctx-max_mfn);
+if (xc_domain_setmaxmem(xch, dom,
+((info.max_pages + nr_mfns - 
free_pages)
+  (XC_PAGE_SHIFT - 10)))  0)
+{
+ERROR(Failed xc_domain_setmaxmem for batch 
(uncanonicalize_pagetable)\n);
+errno = ENOMEM;
+return 0;
+}
+}
  rc = xc_domain_populate_physmap_exact(xch, dom, nr_mfns, 0, 0,
ctx-p2m_batch);
+}

  if (rc)
  {
-ERROR(Failed to allocate memory for batch.!\n);
+ERROR(Failed to allocate memory for batch. rc=%d nr_mfns=%d!\n,
+  rc, nr_mfns);
  errno = ENOMEM;
  return 0;
  }
@@ -1241,12 +1280,44 @@ static int apply_batch(xc_interface *xch, uint32_t dom, 
struct restore_ctx *ctx,
  if (!ctx-hvm  ctx-superpages)
  rc =

Re: [Xen-devel] [RFC PATCH COLO v5 01/29] Add readme

2015-04-13 Thread Hongyang Yang


Wei,

  Thanks for the review and sorry for the late reply, was debugging some triple
fault bug. However, it is now fixed, and COLO running much more stable now.

  For the readme, it is kind of outdated, we will update it and also address
your comments, then we'll put it onto wiki page. The intree readme will be some
simple desciption and links to wiki pages.

On 04/09/2015 02:11 AM, Wei Liu wrote:

On Wed, Apr 01, 2015 at 02:41:37PM +0800, Yang Hongyang wrote:

From: Wen Congyang we...@cn.fujitsu.com

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
  docs/README.colo | 92 
  1 file changed, 92 insertions(+)
  create mode 100644 docs/README.colo

diff --git a/docs/README.colo b/docs/README.colo
new file mode 100644
index 000..60f487d
--- /dev/null
+++ b/docs/README.colo
@@ -0,0 +1,92 @@
+COLO provides fault tolerance for virtual machines by sending continuous
+checkpoints to a backup, which will activate if the target VM fails. It
+only supports HVM guest(without pv extensions).

^ PV


+
+Requriements:
+1. Hardware requriements
+   There is at least one directly connected nic to forward the nic from client
+   to secondary vm. The directly connected nic must not be used by any other
+   purpose. If your guest has more than one nic, you should have directly
+   connected nic for each guest nic. If you don't have enouth directly 
connected
+   nic, you can use vlan.
+2. Dom0 requirements
+   - Support dom0
+   - kernel module:
+sch_ingress
+cls_basic
+cls_tcindex
+cls_u32
+act_mirred
+   - libnl-tools = 3.0. This package provides the command nl-qdisc-list, and
+ colo need this command.

 ^ COLO


+   - If your host os has OEM-released xen tools, please uninstall it first.

  ^OS ^ Xen

(and please fix other occurrences of wrong capitalisations as well)


OK



This is a very broad statement and it is not very helpful from both
developers and users' point of view. Can you elaborate on what
functionalities that COLO needs to have exclusive access to?


+   - You can load the module which is not provided by OEM.


What does this mean?


Means you may need to compile the module yourself, will make it clear.




+3. Guest requirements
+   Only HVM guest(without pv extensions) is supported now. If you want to
+   use OEM released guest os, please use SUSE. REDHAT and Ubuntu is not
+   supported now because I don't find any way to disable pv extensions.
+   If you want to use REDHAT or Ubuntu, you need to build the newest
+   kernel which has the parameter xen_nopv.
+


FWIW, does xen_platform_pci=0 in your xl.cfg work for RH and Ubuntu
guests?


It works only if you compile the newer kernel by yourself which support this
option.




+Network link topology
+   Please refer to: http://wiki.qemu.org/Features/COLO#Network_link_topology
+
+The steps to setup COLO environment:
+You need to recompile your host kernel because colo-proxy module need cooperate
+with linux kernel.
+Please refer to: http://wiki.qemu.org/Features/COLO#Test_environment_prepare
+1. Build and install xen
+2. Apply the patch for qemu xen, and rebuild xen tools:
+- cd tools/qemu-xen-dir
+- use git am to apply the patch:
+  
https://raw.githubusercontent.com/wencongyang/colo-files/master/patch_for_qemu/*.patch
+- make tools  make install-tools
+Note: You must use qemu-xen. qemu-xen-traditional is not supported.


Note that you will eventually need to upstream your changes to QEMU.


Sure, we have already posted the block patches to QEMU. It is under review now.
http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg00399.html




+3. Install COLO proxy module:
+3.1 Download COLO proxy, compile and install it:
+https://github.com/gao-feng/colo-proxy.git
+3.2 Download iptables patch, it is based on v1.4.21 compile and install it:
+
https://github.com/gao-feng/colo-proxy/blob/master/colo-patch-for-kernel.patch
+4. Install the guest
+4.1 Add xen_platform_pci=0 into the guest configfile
+4.2 If you use suse, please select physical machine
+4.3 copy the disk image to the secondary host
+5. Update your guest config file for COLO:
+5.1 disk
+disk = [
+
'format=raw,devtype=disk,access=w,vdev=hda,backendtype=qdisk,colo,colo-params=192.168.3.1:9000:exportname=qdisk1,active-disk=/mnt/ramfs/active_disk.img,hidden-disk=/mnt/ramfs/hidden_disk.img,target=/root/images/colo-hvm.img'
 ]


It's unclear which parts are updated compared to the original config,
i.e. can you list the additional bits to enable COLO? Presumably it's
only those options starting with colo?


mostly, will update the doc to make it clear.




+5.2 nic
+vif = [ 'mac=00:16:4f:00:00:11, bridge=br0, model=e1000, 
forwarddev=eth0, forwardbr=br1' ]
+Note:
+a. The

[Xen-devel] [linux-3.16 test] 50396: regressions - FAIL

flight 50396 linux-3.16 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50396/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-credit2  17 guest-localmigrate/x10fail REGR. vs. 34167

Tests which are failing intermittently (not blocking):
 test-amd64-i386-pair 21 guest-migrate/src_host/dst_host fail pass in 50289
 test-amd64-amd64-xl-qemuu-win7-amd64  9 windows-install fail pass in 50360

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-xsm 7 capture-logs(7) broken in 50360 blocked in 34167
 test-armhf-armhf-libvirt-xsm 7 capture-logs(7) broken in 50360 blocked in 34167
 test-armhf-armhf-libvirt  12 capture-logs(12) broken in 50360 blocked in 34167
 test-armhf-armhf-xl-cubietruck 18 capture-logs(18) broken in 50360 blocked in 
34167
 test-armhf-armhf-xl   18 capture-logs(18) broken in 50360 blocked in 34167
 test-armhf-armhf-xl-sedf-pin 18 capture-logs(18) broken in 50360 blocked in 
34167
 test-armhf-armhf-xl-sedf  18 capture-logs(18) broken in 50360 blocked in 34167
 test-armhf-armhf-xl-multivcpu 18 capture-logs(18) broken in 50360 blocked in 
34167
 test-armhf-armhf-xl-credit2 18 capture-logs(18) broken in 50360 blocked in 
34167
 test-armhf-armhf-xl-arndale 18 capture-logs(18) broken in 50360 blocked in 
34167
 test-armhf-armhf-libvirt  9 guest-start   fail in 50289 like 34167

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stopfail in 50289 never pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail never 
pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-libvirt-xsm 11 guest-start  fail   never pass
 test-amd64-i386-xl-xsm   11 guest-start  fail   never pass
 test-amd64-amd64-xl-sedf 17 guest-localmigrate/x10   fail   never pass
 test-amd64-amd64-xl-xsm  11 guest-start  fail   never pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail never 
pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail never 
pass
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail never 
pass
 test-amd64-amd64-xl-multivcpu 17 guest-localmigrate/x10   fail  never pass
 test-amd64-i386-libvirt-xsm  11 guest-start  fail   never pass
 test-armhf-armhf-xl-xsm   6 xen-boot fail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-freebsd10-i386  9 freebsd-install  fail never pass
 test-armhf-armhf-libvirt-xsm  6 xen-boot fail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-win7-amd64 16 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-amd64-i386-freebsd10-amd64  9 freebsd-install fail never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-winxpsp3 16 guest-stopfail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 16 guest-stop   fail never pass
 test-amd64-amd64-xl-winxpsp3 16 guest-stop   fail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-sedf-pin 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-sedf 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-amd64-amd64-xl-qemut-winxpsp3 16 guest-stop   fail never pass
 test-amd64-i386-xl-win7-amd64 16 guest-stop   fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-winxpsp3  16 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 16 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 test-amd64-amd64-xl-sedf-pin 17 guest-localmigrate/x10   fail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 16 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 16 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 16 guest-stopfail never pass

version targeted for testing:
 linux0596eeda8839481a3a2d64b07116581c0fc5fa0a
baseline version:

Re: [Xen-devel] [PATCH 4/4] hvmloader: add knob for fixed VGABIOS date string

 On 02.04.15 at 11:45, ian.campb...@citrix.com wrote:
 On Wed, 2015-04-01 at 13:28 +, Olaf Hering wrote:
 To allow reproducible builds of hvmloader introduce a make variable
 VGABIOS_REL_DATE=dd Mon  to provide a fixed date string. Without
 this change the hvmloader binary changes with every rebuild.

 Signed-off-by: Olaf Hering o...@aepfle.de
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com

 Acked-by: Ian Campbell ian.campb...@citrix.com

 It's not clear if vgabios should be maintained by us tools folks or if
 it should have been taken under the hvmloader umbrella and maintained by
 the hypervisor folks.

I'm not sure either - hvmloader is really code we grew and hence
maintaining it together with its hypervisor side makes sense. For
the various firmware blobs, otoh, I think they match more closely
with the qemu/tools pattern.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH 2/2] xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code


Add ARM PCI Support
---
a) Place holder functions are added for pci_conf_read/write calls.
b) Macros dev_is_pci, pci_to_dev are implemented in
drivers/passthrough/pci/arm code

Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com
---
 xen/arch/arm/Makefile|1 +
 xen/arch/arm/pci.c   |   60 +++
 xen/drivers/passthrough/arm/Makefile |1 +
 xen/drivers/passthrough/arm/pci.c|   88 ++
 xen/drivers/passthrough/arm/smmu.c   |1 -
 xen/drivers/passthrough/pci.c|9 ++--
 xen/include/asm-arm/device.h |   33 +
 xen/include/asm-arm/domain.h |3 ++
 xen/include/asm-arm/pci.h|7 +--
 9 files changed, 186 insertions(+), 17 deletions(-)

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 935999e..aaf5d88 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -39,6 +39,7 @@ obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
 obj-y += smc.o
+obj-$(HAS_PCI) += pci.o
 
 #obj-bin-y += o
 
diff --git a/xen/arch/arm/pci.c b/xen/arch/arm/pci.c

new file mode 100644
index 000..9438f41
--- /dev/null
+++ b/xen/arch/arm/pci.c
@@ -0,0 +1,60 @@
+#include xen/pci.h
+
+void _pci_conf_write(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg, uint32_t data, int bytes)
+{
+}
+
+uint32_t _pci_conf_read(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg, int bytes)
+{
+return 0;
+}
+
+uint8_t pci_conf_read8(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg)
+{
+return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 1);
+}
+
+uint16_t pci_conf_read16(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg)
+{
+return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 2);
+}
+
+uint32_t pci_conf_read32(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg)
+{
+return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 4);
+}
+
+void pci_conf_write8(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg, uint8_t data)
+{
+ _pci_conf_write(seg, bus, dev, func, reg, data, 1);
+}
+
+void pci_conf_write16(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg, uint16_t data)
+{
+ _pci_conf_write(seg, bus, dev, func, reg, data, 2);
+}
+
+void pci_conf_write32(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg, uint32_t data)
+{
+ _pci_conf_write(seg, bus, dev, func, reg, data, 4);
+}
+
+void arch_pci_ro_device(int seg, int bdf)
+{
+}
diff --git a/xen/drivers/passthrough/arm/Makefile 
b/xen/drivers/passthrough/arm/Makefile
index f4cd26e..1a41549 100644
--- a/xen/drivers/passthrough/arm/Makefile
+++ b/xen/drivers/passthrough/arm/Makefile
@@ -1,2 +1,3 @@
 obj-y += iommu.o
 obj-y += smmu.o
+obj-$(HAS_PCI) += pci.o
diff --git a/xen/drivers/passthrough/arm/pci.c 
b/xen/drivers/passthrough/arm/pci.c
new file mode 100644
index 000..07a5a78
--- /dev/null
+++ b/xen/drivers/passthrough/arm/pci.c
@@ -0,0 +1,88 @@
+/*
+ * PCI helper functions for arm for passthrough support.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2015 Cavium Networks
+ *
+ * Author: Manish Jaggi manish.ja...@cavium.com
+ */
+#include xen/pci.h
+#include asm/device.h
+#include xen/sched.h
+
+int _dump_pci_devices(struct pci_seg *pseg, void *arg)
+{
+struct pci_dev *pdev;
+
+printk( segment %04x \n, pseg-nr);
+
+list_for_each_entry ( pdev, pseg-alldevs_list, alldevs_list )
+{
+printk(%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs  ,
+   pseg-nr, pdev-bus,
+   PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn),
+   pdev-domain ? pdev-domain-domain_id : -1,
+   (pdev-node != NUMA_NO_NODE) ? pdev-node : -1);
+printk(\n);
+}
+
+return 0;
+}
+
+struct device* pci_to_dev(struct pci_dev *pci)
+{
+if(!pci-arch.dev)
+{
+device_t *dev;
+dev = xzalloc (struct device);
+pci-arch.dev = dev;
+dev-class = DEVICE_PCI;
+dev-type = DEV_ENUMERATED;
+}
+return pci-arch.dev;
+}
+
+struct pci_dev* to_pci_dev(struct device *dev)
+{
+if(dev-class == DEVICE_PCI)
+return

Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

 On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote:
 On Wed, 25 Mar 2015, Jan Beulich wrote:
 When a device gets detached from a guest, pciback will clear its
 command register, thus disabling both memory and I/O decoding. The
 disabled memory decoding, however, has an effect on the MSI-X table
 accesses the hypervisor does: These won't have the intended effect
 anymore. Even worse, for PCIe devices (but not SR-IOV virtual
 functions) such accesses may (will?) be treated as Unsupported
 Requests, causing respective errors to be surfaced, potentially in the
 form of NMIs that may be fatal to the hypervisor or Dom0 is different
 ways. Hence rather than carrying out these accesses, we should avoid
 them where we can, and use alternative (e.g. PCI config space based)
 mechanisms to achieve at least the same effect.
 
 I don't think that it is a good idea for both Xen and Linux to access
 the command register simultaneously.  Working around Linux in Xen
 doesn't sound like an optimal solution.   Maybe we could just fix the
 pciback and that would be enough.

I'm afraid that would just eliminate the specific case, but not the
general issue. While we trust Dom0 to not do outright bad things,
the hypervisor should still avoid doing things that can go wrong
due to the state a device is put (or left) in by Dom0.

 In any case we should make it clear somewhere who is supposed to write
 to the command register (and other PCI reigsters) at any given time,
 otherwise it would be very easy for a new kernel update to break the
 hypervisor and we wouldn't even know why it happened.

We should, but at this point in time this is going to be rather
problematic. Such a separation of responsibilities should have been
done before all the pass-through code got written.

 @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_
  }
  break;
  case PCI_CAP_ID_MSIX:
 -{
 -int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
 -writel(flag, entry-mask_base + offset);
 -readl(entry-mask_base + offset);
 -break;
 -}
 +if ( likely(memory_decoded(pdev)) )
 +{
 +writel(flag, entry-mask_base + 
 PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 +readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 +break;
 +}
 +if ( flag )
 +{
 +u16 control;
 +domid_t domid = pdev-domain-domain_id;
 +
 +control = pci_conf_read16(seg, bus, slot, func,
 +  
 msix_control_reg(entry-msi_attrib.pos));
 +if ( control  PCI_MSIX_FLAGS_MASKALL )
 +break;
 +pci_conf_write16(seg, bus, slot, func,
 + msix_control_reg(entry-msi_attrib.pos),
 + control | PCI_MSIX_FLAGS_MASKALL);
 
 How is that going to interact with Linux (Dom0) writing to the command
 register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for
 devices assigned to HVM guests. Could this cause any conflicts?

I don't see the relation to the comment register here - all quoted code
only deals with the MSI-X control register.

 One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but
 as a matter of fact QEMU has done that for years and we cannot break
 that behaviour without introducing regressions.  In fact as it stands
 QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM
 guests, not Xen.
 
 Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed
 through devices to running domains?  I think that might be a good enough
 separation of responsibilities between Xen and QEMU.

No, the bits has to be under hypervisor control (just like any other
hardware bits controlling interrupts). Dealing with this is the subject
of patches I have ready, but didn't post yet.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [OSSTEST Nested PATCH v8 7/7] Add test job for nest test case

1. This patch adds creation of the nested test job, when job creation
procedure is invoked.
2. Set nested L1's vif model as e1000 by make-flight.

Signed-off-by: longtao.pang longtaox.p...@intel.com
---
Changes in v8:
1. Use 'nestedl1' and 'nestedl2' as L1 and L2's clearer name in nested
test job.
2. Setting Debian_ISO as 'debian-7.2.0-amd64-CD-1.iso' in make-flight.
---
 make-flight |   20 
 1 file changed, 20 insertions(+)

diff --git a/make-flight b/make-flight
index cc8ecdb..40ca808 100755
--- a/make-flight
+++ b/make-flight
@@ -204,6 +204,25 @@ do_hvm_win7_x64_tests () {
 all_hostflags=$most_hostflags,hvm
 }
 
+do_hvm_debian_nested_tests () {
+  if [ $xenarch != amd64 -a $dom0arch != amd64 ]; then
+return
+  fi
+
+  job_create_test test-$xenarch$kern-$dom0arch-nested test-nested xl \
+$xenarch $dom0arch \
+nestedl1_image=debian-7.2.0-amd64-CD-1.iso \
+nestedl2_image=debian-7.2.0-amd64-CD-1.iso \
+bios=seabios \
+kernbuildjob=build-amd64-pvops \
+kernkind=pvops \
+nestedl1_vifmodel='e1000' \
+nested_disk='15000' \
+nestedl1_memory='3072' \
+device_model_version=qemu-xen \
+all_hostflags=$most_hostflags,hvm
+}
+
 do_hvm_debian_test_one () {
   testname=$1
   bios=$2
@@ -430,6 +449,7 @@ test_matrix_do_one () {
 done
 
   fi
+  do_hvm_debian_nested_tests
   #do_passthrough_tests
 }
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [OSSTEST Nested PATCH v8 4/7] Changes on test step of Debian hvm guest install

1. Increase disk size to accommodate to nested test requirement.
2. Since 'Debain-xxx-.iso' image will be stored in rootfs of L1 guest,
therefore needs more disk capacity, increase root partition size in
preseed generation.
3. In L1 installation context, assign more memory to it; Since it
acts as a nested hypervisor anyway.
4. Comment out CDROM entry in sources.list to make HTTP URL entry
available for L1 hvm guest.
5. Enable nestedhvm feature in ExtraConfig for nested job.

Signed-off-by: longtao.pang longtaox.p...@intel.com
---
Changes in v8:
1. Update a conventional way to comment out CDROM entry in sources.list.
2. Enable nestedhvm feature only for the nested job.
3. Set nested disk and memroy size to be driven from runvars in
'ts-debian-hvm-install'.
---
 ts-debian-hvm-install |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/ts-debian-hvm-install b/ts-debian-hvm-install
index cfd5144..13009d0 100755
--- a/ts-debian-hvm-install
+++ b/ts-debian-hvm-install
@@ -36,7 +36,7 @@ our $ho= selecthost($whhost);
 
 # guest memory size will be set based on host free memory, see below
 our $ram_mb;
-our $disk_mb= 1;
+our $disk_mb= $r{'nested_disk'} ? $r{'nested_disk'} : 1;
 
 our $guesthost= $gn.guest.osstest;
 our $gho;
@@ -60,7 +60,7 @@ d-i partman-auto/expert_recipe string \\
 use_filesystem{ } filesystem{ vfat } \\
 mountpoint{ /boot/efi } \\
 . \\
-5000 50 5000 ext4 \\
+1 50 1 ext4 \\
 method{ format } format{ } \\
 use_filesystem{ } filesystem{ ext4 } \\
 mountpoint{ / } \\
@@ -75,6 +75,7 @@ d-i preseed/late_command string \\
 in-target mkdir -p /boot/efi/EFI/boot; \\
 in-target cp /boot/efi/EFI/debian/grubx64.efi 
/boot/efi/EFI/boot/bootx64.efi ;\\
 in-target mkdir -p /root/.ssh; \\
+in-target sed -i 's/^deb *cdrom/#/g' /etc/apt/sources.list; \\
 in-target sh -c echo -e '$authkeys' /root/.ssh/authorized_keys;
 END
 return $preseed_file;
@@ -149,9 +150,11 @@ sub prep () {
 target_putfilecontents_root_stash($ho, 10, preseed(),
   $preseed_file_path);
 
+my $nestedhvm_feature = $gn eq 'nestedl1' ? 'nestedhvm=1' : '';
 more_prepareguest_hvm($ho,$gho, $ram_mb, $disk_mb,
   OnReboot = 'preserve',
   Bios = $r{bios},
+  ExtraConfig = $nestedhvm_feature,
   PostImageHook = sub {
 my $cmds = iso_copy_content_from_image($gho, $newiso);
 $cmds .= prepare_initrd($initrddir,$newiso,$preseed_file_path);
@@ -174,7 +177,7 @@ my $ram_lots = 5000;
 if ($host_freemem_mb  $ram_lots * 2 + $ram_minslop) {
 $ram_mb = $ram_lots;
 } else {
-$ram_mb = 768;
+$ram_mb = $r{${gn}_memory} ? $r{${gn}_memory} : 768;
 }
 logm(Host has $host_freemem_mb MB free memory, setting guest memory size to 
$ram_mb MB);
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [OSSTEST Nested PATCH v8 3/7] Edit some APIs in TestSupport.pm for nested test

1. Designate vif model by make-flight.
2. In L2 installation context, its host (L1) IP address is not queried
from DNS, but after running ts-nested-setup + host + nested, L1 IP is
stored in runvar.

Signed-off-by: longtao.pang longtaox.p...@intel.com
---
Changes in v8:
Remove the unnecessary symbol of ';' in 'TestSupport.pm'.
---
 Osstest/TestSupport.pm |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 1bde67d..4cdacfc 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -818,6 +818,10 @@ sub selecthost ($) {
logm(Host $name is in HostGroup $ho-{Properties}{HostGroup});
 }
 
+if ( $r{${name}_ip} ) {
+$setprop-(IpAddr, $r{${name}_ip});
+}
+
 # Next, we use the config file's general properites as defaults
 foreach my $k (keys %c) {
next unless $k =~ m/^HostProp_([A-Z].*)$/;
@@ -1548,11 +1552,13 @@ sub prepareguest_part_xencfg ($) {
 my $oncrash= $xopts-{OnCrash} || 'preserve';
 my $vcpus= guest_var($gho, 'vcpus', $xopts-{DefVcpus} || 2);
 my $xoptcfg= $xopts-{ExtraConfig};
+my $vif= guest_var($gho, 'vifmodel','');
+my $vifmodel= $vif ? model=$vif : '';
 $xoptcfg='' unless defined $xoptcfg;
 my $xencfg= END;
 name= '$gho-{Name}'
 memory = ${ram_mb}
-vif = [ 'type=ioemu,mac=$gho-{Ether}' ]
+vif = [ 'type=ioemu,${vifmodel},mac=$gho-{Ether}' ]
 #
 on_poweroff = '$onpoweroff'
 on_reboot   = '$onreboot'
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [OSSTEST Nested PATCH v8 6/7] Compose the main recipe of nested test job

Signed-off-by: longtao.pang longtaox.p...@intel.com
---
Changes in v8:
Change the patch order from 6 to 5.
---
 sg-run-job |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/sg-run-job b/sg-run-job
index eae159d..2ca5ebf 100755
--- a/sg-run-job
+++ b/sg-run-job
@@ -299,6 +299,17 @@ proc run-job/test-pair {} {
 #run-ts . remus-failover ts-remus-check src_host dst_host + debian
 }
 
+proc need-hosts/test-nested {} {return host}
+proc run-job/test-nested {} {
+run-ts . = ts-debian-hvm-install + host + nestedl1
+run-ts . = ts-nested-setup + host + nestedl1
+run-ts . = ts-xen-install nested_l1
+run-ts . = ts-host-reboot nested_l1
+run-ts . = ts-debian-hvm-install nested_l1 nestedl2
+run-ts . = ts-guest-stop nested_l1 nestedl2
+run-ts . = ts-guest-destroy host nestedl1
+}
+
 proc test-guest-migr {g} {
 if {[catch { run-ts . = ts-migrate-support-check + host $g }]} return
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [OSSTEST Nested PATCH v8 5/7] Add new script to customize nested test configuration

1. In this script, make some appropriate runvars which selecthost would
recognise.
2. Prepare the configurations for installing L2 guest VM.
3. Create a lv disk in L0 and hot-attach it to L1, need to restart L1 to
make the block disk to be recognized by L1 system, then using this disk
to create a VG that used for installing L2.

Signed-off-by: longtao.pang longtaox.p...@intel.com
---
Changes in v8:
1. Replace '$nested_host' by '$l1-{Guest}'.
---
 ts-nested-setup |   52 
 1 file changed, 52 insertions(+)
 create mode 100755 ts-nested-setup

diff --git a/ts-nested-setup b/ts-nested-setup
new file mode 100755
index 000..41d5e80
--- /dev/null
+++ b/ts-nested-setup
@@ -0,0 +1,52 @@
+#!/usr/bin/perl -w
+# This is part of osstest, an automated testing framework for Xen.
+# Copyright (C) 2015 Intel Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+
+use strict qw(vars);
+use DBI;
+use Osstest;
+use Osstest::Debian;
+use Osstest::TestSupport;
+
+tsreadconfig();
+our ($l0,$l1) = ts_get_host_guest(@ARGV);
+
+guest_check_ip($l1);
+target_cmd_root($l1, mkdir -p /home/osstest/.ssh  cp 
/root/.ssh/authorized_keys /home/osstest/.ssh/);
+my $url = $c{WebspaceUrl}.$c{WebspaceCommon}.$l0-{Name}._.'overlay.tar';
+target_cmd_root($l1, END);
+wget -O overlay.tar $url
+tar -xf overlay.tar -C /
+rm overlay.tar -f
+update-rc.d osstest-confirm-booted start 99 2 .
+END
+
+store_runvar('nested_l1',$l1-{Guest});
+store_runvar($l1-{Guest}_ip,$l1-{Ip});
+
+my $l2_disk_mb = 2;
+my $l2_lv_name = 'nestedl2-disk';
+my $vgname = $l0-{Name};
+$vgname .= .$c{TestHostDomain} if ($l0-{Suite} =~ m/lenny/);
+target_cmd_root($l0, END);
+lvremove -f /dev/${vgname}/${l2_lv_name} ||:
+lvcreate -L ${l2_disk_mb}M -n $l2_lv_name $vgname
+dd if=/dev/zero of=/dev/${vgname}/${l2_lv_name} count=10
+xl block-attach $l1-{Name} /dev/${vgname}/${l2_lv_name},raw,sdb,rw
+END
+target_install_packages_norec($l1, qw(lvm2 rsync genisoimage));
+target_reboot($l1);
+target_cmd_root($l1, pvcreate /dev/xvdb  vgcreate ${l2_lv_name}_vg 
/dev/xvdb);
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [OSSTEST Nested PATCH v8 2/7] Changes to support '/boot' leading paths of kernel, xen, in grub

Support situations of grub that have vmlinuz and other things starting
with path of '/boot' rather than '/'.

Signed-off-by: longtao.pang longtaox.p...@intel.com
---
 Osstest/Debian.pm |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index 378af54..1a57cdd 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -449,21 +449,21 @@ sub setupboot_grub2 () {
 if (m/^submenu\s+[\'\](.*)[\'\].*\{\s*$/) {
 $submenu={ StartLine =$.};
 }
-if (m/^\s*multiboot\s*\/(xen\-[0-9][-+.0-9a-z]*\S+)/) {
+if (m/^\s*multiboot\s*(?:\/boot)?\/(xen\S+)/) {
 die unless $entry;
 $entry-{Hv}= $1;
 }
-if (m/^\s*multiboot\s*\/(vmlinu[xz]-(\S+))/) {
+if (m/^\s*multiboot\s*(?:\/boot)?\/(vmlinu[xz]-(\S+))/) {
 die unless $entry;
 $entry-{KernOnly}= $1;
 $entry-{KernVer}= $2;
 }
-if (m/^\s*module\s*\/(vmlinu[xz]-(\S+))/) {
+if (m/^\s*module\s*(?:\/boot)?\/(vmlinu[xz]-(\S+))/) {
 die unless $entry;
 $entry-{KernDom0}= $1;
 $entry-{KernVer}= $2;
 }
-if (m/^\s*module\s*\/(initrd\S+)/) {
+if (m/^\s*module\s*(?:\/boot)?\/(initrd\S+)/) {
 $entry-{Initrd}= $1;
 }
if (m/^\s*module\s*\/(xenpolicy\S+)/) {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [OSSTEST Nested PATCH v8 1/7] parsing grub which has 'submenu' primitive

From a hvm kernel build from Linux stable Kernel tree,
the auto generated grub2 menu will have 'submenu' primitive, upon the
'menuentry' items. Xen boot entries will be grouped into a submenu. This
patch adds capability to support such grub formats.

Signed-off-by: longtao.pang longtaox.p...@intel.com
---
Changes in v8:
Separate unrelated codes to the use of submenu.
---
 Osstest/Debian.pm |   13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm
index 69530fb..378af54 100644
--- a/Osstest/Debian.pm
+++ b/Osstest/Debian.pm
@@ -404,10 +404,18 @@ sub setupboot_grub2 () {
 
 my $count= 0;
 my $entry;
+my $submenu;
 while ($f) {
 next if m/^\s*\#/ || !m/\S/;
 if (m/^\s*\}\s*$/) {
-die unless $entry;
+die unless $entry || $submenu;
+if(!defined $entry  defined $submenu){
+logm(Met end of a submenu starting from .
+$submenu-{StartLine}. .
+Our want kern is $want_kernver);
+$submenu=undef;
+next;
+}
 my (@missing) =
 grep { !defined $entry-{$_} } 
(defined $xenhopt
@@ -438,6 +446,9 @@ sub setupboot_grub2 () {
 $entry= { Title = $1, StartLine = $., Number = $count };
 $count++;
 }
+if (m/^submenu\s+[\'\](.*)[\'\].*\{\s*$/) {
+$submenu={ StartLine =$.};
+}
 if (m/^\s*multiboot\s*\/(xen\-[0-9][-+.0-9a-z]*\S+)/) {
 die unless $entry;
 $entry-{Hv}= $1;
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH 0/2] Re-factoring passthrough/pci.c and adding place-holder code for ARM/PCI


Xen currently does not have PCI support for ARM builds. This patch set
makes the code compilable for ARM PCI and adds places-holder code
which would be replaced with PCI pass-through support patch series.

Re-factor MSI Handling
-
There is a some x86 specific code which is found in common code:
xen/drivers/passthrough/pci.c which needs to be re factored.

MSI/X are configured and handled by dom0 or domU code on ARM64 and is not
required to be part of common code. However there are functions which are
used as part of common code and calls to these functions cannot be easily
re factored like pci_cleanup_msi.

xen/drivers/passthrough/arch/pci.c files handle these functions.

Add ARM PCI Support
---
a) Place holder functions are added for pci_conf_read/write calls.
b) Macros dev_is_pci, pci_to_dev are implemented in
drivers/passthrough/pci/arm code

Manish Jaggi (2):
  xen/x86: Patch re-factors MSI/X config code from
drivers/passthrough/pci.c to x86 specific
  xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code

 xen/arch/arm/Makefile|1 +
 xen/arch/arm/pci.c   |   60 ++
 xen/drivers/passthrough/arm/Makefile |1 +
 xen/drivers/passthrough/arm/pci.c|   88 ++
 xen/drivers/passthrough/arm/smmu.c   |1 -
 xen/drivers/passthrough/pci.c|  111 +++-
 xen/drivers/passthrough/x86/Makefile |1 +
 xen/drivers/passthrough/x86/pci.c|  115 ++
 xen/include/asm-arm/device.h |   33 +++---
 xen/include/asm-arm/domain.h |3 +
 xen/include/asm-arm/pci.h|7 ++-
 xen/include/asm-x86/msi.h|1 -
 xen/include/xen/pci.h|   20 +-
 13 files changed, 323 insertions(+), 119 deletions(-)
 create mode 100644 xen/arch/arm/pci.c
 create mode 100644 xen/drivers/passthrough/arm/pci.c
 create mode 100644 xen/drivers/passthrough/x86/pci.c

--
1.7.9.5


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] preparing for 4.5.1

 On 02.04.15 at 12:01, ian.campb...@citrix.com wrote:
 On Thu, 2015-03-26 at 17:07 +, Jan Beulich wrote:
 having been released mid January, it is time to get ready for 4.5.1-rc1.
 Please reply with backport requests which you consider necessary but
 still missing. Please also be patient with them being pulled in - I'll be
 gone the coming two weeks, and I'd hope to get an RC1 put together
 soon thereafter.
 
 On my list for ARM I have this patch, which also touches x86. What do
 you think?

If it fixes a real problem in 4.5 (rather than just a theoretical one,
exposed only in 4.6), then I'm fine with this (and assume since it's
mostly for ARM you'll take care of doing and applying the backport).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Outstanding fixes for Linux 4.1?

2015-04-13 Thread Bob Liu


Hi David,

On 04/03/2015 03:22 AM, Konrad Rzeszutek Wilk wrote:

On Thu, Apr 02, 2015 at 05:21:50PM +0100, David Vrabel wrote:

All,

Are there any outstanding patches that should be added to Linux 4.1
(the merge window for which will be opening shortly)?


Bob's two patches to the deferred grant mechanism.



I've already posted them to xen-devel several days ago.

[PATCH 1/2] xen/blkback: safely unmap purge persistent grants
http://lists.xenproject.org/archives/html/xen-devel/2015-04/msg00392.html

[PATCH v2 2/2] xen/grant: introduce func gnttab_unmap_refs_sync()
http://lists.xenproject.org/archives/html/xen-devel/2015-04/msg00393.html

Should I rebase them to your latest tree and repost?

Thanks,
-Bob


Also the XSA-120 addendum. Will send that out on Friday along
with some other potential fixes.


So far I have queued:

Dan Carpenter (1):
   xen/mce: fix up xen_late_init_mcelog() error handling

David Vrabel (2):
   xen: unify foreign GFN map/unmap for auto-xlated physmap guests
   xen/privcmd: improve performance of MMAPBATCH_V2

Jan Beulich (1):
   xen-pciback: also support disabling of bus-mastering and 
memory-write-inva

Juergen Gross (6):
   xen: build infrastructure for generating hypercall depending symbols
   xen: synchronize include/xen/interface/xen.h with xen
   xen: use generated hypervisor symbols in arch/x86/xen/trace.c
   xen: use generated hypercall symbols in arch/x86/xen/xen-head.S
   xen: scsiback: add LUN of restored domain
   xen: support suspend/resume in pvscsi frontend

Konrad Rzeszutek Wilk (3):
   xen/pciback: Don't print scary messages when unsupported by hypervisor.
   x86/xen: Provide a Xen PV APIC driver to support 255 VCPUs
   x86/xen/apic: WARN with details.

Takashi Iwai (2):
   xen: pcpu: Use static attribute groups for sysfs entry
   xen: balloon: Use static attribute groups for sysfs entries

Tao Chen (1):
   xen-scsiback: define a pr_fmt macro with xen-pvscsi

David


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [OSSTEST Nested PATCH v8 0/7] Introduction of netsted HVM test job

This patch set adds nested HVM test case for osstest.
In this test case, a Xen hypervisor (L1) runs on top of another Xen 
hypervisor (L0). 
Upon L1 hypervisor, we will then create a nested guest (L2), and test if the 
Linux guest can then be installed and run well.
About nested Xen virtualization, 
refer to http://wiki.xenproject.org/wiki/Nested_Virtualization_in_Xen.

Test steps
0. To run osstest in standalone mode, write a config file in
   '~/.xen-osstest/config', and then create a standalone.config file 
   to define 'TREE_LINUX', 'REVISION_LINUX' which will be used for
   nested test. The directory path of 'Debian Images' could be defined 
   in '~/.xen-osstest/config'. 
1. run 'build-amd64' job and then 'build-amd64-pvops', to prepare xen
   installation tarball and hvm guest kernel.
2. run 'test-amd64-amd6-nested' job, it does following:
a. invoke test step of 'ts-debain-hvm-install' to install 
   a normal HVM guest
b. invoke test step of 'ts-nested-setup' to make some
   appropriate runvars which selecthost() would recognise and
   prepare the configurations for installing L2 guest VM. 
c. invoke test step of 'ts-xen-install' to install xen on 
   the normal guest, alter it into a L1 hypervisor
d. invoke test step of 'ts-debain-hvm-install' again, but 
   take the L1 hypervisor as host, install the L2 guest on it
e. invoke test step of 'ts-guest-stop', stop L2 guest.
f. invoke test step of 'ts-guest-destroy' to destroy L1 guest.

This patch set reuse 'ts-debian-hvm-install' for both L1 installation and L2
installation, define 'nestedl1' and 'nestedl2' as L1 and L2's hostname,
define 'nested_l1 as L1's host identity.
It also reuses 'ts-xen-install' with L1's hostname 'nestedl1' input parameter 
to differentiate from L0 Xen installation.
This patch series has been tested on test machines of amd64 arch,
Debian-7.2.0-amd64 as guests OS, with hvm domain0 of Linux kernel 3.18.5, 
in standalone mode.
Also, we use linux-stable tree as domain0 kernel source.

longtao.pang (7):
  parsing grub which has 'submenu' primitive
  Changes to support '/boot' leading paths of kernel, xen, in grub
  Edit some APIs in TestSupport.pm for nested test
  Changes on test step of Debian hvm guest install
  Add new script to customize nested test configuration
  Compose the main recipe of nested test job
  Add test job for nest test case

 Osstest/Debian.pm  |   21 -
 Osstest/TestSupport.pm |8 +++-
 make-flight|   20 
 sg-run-job |   11 +++
 ts-debian-hvm-install  |9 ++---
 ts-nested-setup|   52 

 6 files changed, 112 insertions(+), 9 deletions(-)
 create mode 100755 ts-nested-setup


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH 1/2] xen/x86: Patch re-factors MSI/X config code from, drivers/passthrough/pci.c to x86 specific


This patch re-factors msi specific code to x86 specific files from
xen/drivers/passthrough/pci.c.

Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com
---
 xen/drivers/passthrough/pci.c|  102 +-
 xen/drivers/passthrough/x86/Makefile |1 +
 xen/drivers/passthrough/x86/pci.c|  115 ++
 xen/include/asm-x86/msi.h|1 -
 xen/include/xen/pci.h|   20 +-
 5 files changed, 137 insertions(+), 102 deletions(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index ecd061e..004aba9 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -23,7 +23,6 @@
 #include xen/iommu.h
 #include xen/irq.h
 #include asm/hvm/iommu.h
-#include asm/hvm/irq.h
 #include xen/delay.h
 #include xen/keyhandler.h
 #include xen/event.h
@@ -33,21 +32,6 @@
 #include xen/softirq.h
 #include xen/tasklet.h
 #include xsm/xsm.h
-#include asm/msi.h
-
-struct pci_seg {
-struct list_head alldevs_list;
-u16 nr;
-unsigned long *ro_map;
-/* bus2bridge_lock protects bus2bridge array */
-spinlock_t bus2bridge_lock;
-#define MAX_BUSES 256
-struct {
-u8 map;
-u8 bus;
-u8 devfn;
-} bus2bridge[MAX_BUSES];
-};
 
 spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED;

 static struct radix_tree_root pci_segments;
@@ -282,22 +266,10 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, 
u8 bus, u8 devfn)
 *((u8*) pdev-bus) = bus;
 *((u8*) pdev-devfn) = devfn;
 pdev-domain = NULL;
-INIT_LIST_HEAD(pdev-msi_list);
-
-if ( pci_find_cap_offset(pseg-nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
- PCI_CAP_ID_MSIX) )
+if (!pci_alloc_msix (pdev, pseg, bus, devfn))
 {
-struct arch_msix *msix = xzalloc(struct arch_msix);
-
-if ( !msix )
-{
-xfree(pdev);
-return NULL;
-}
-spin_lock_init(msix-table_lock);
-pdev-msix = msix;
+return NULL;
 }
-
 list_add(pdev-alldevs_list, pseg-alldevs_list);
 
 /* update bus2bridge */

@@ -755,54 +727,6 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
 return ret;
 }
 
-static int pci_clean_dpci_irq(struct domain *d,

-  struct hvm_pirq_dpci *pirq_dpci, void *arg)
-{
-struct dev_intx_gsi_link *digl, *tmp;
-
-pirq_guest_unbind(d, dpci_pirq(pirq_dpci));
-
-if ( pt_irq_need_timer(pirq_dpci-flags) )
-kill_timer(pirq_dpci-timer);
-
-list_for_each_entry_safe ( digl, tmp, pirq_dpci-digl_list, list )
-{
-list_del(digl-list);
-xfree(digl);
-}
-
-return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0;
-}
-
-static int pci_clean_dpci_irqs(struct domain *d)
-{
-struct hvm_irq_dpci *hvm_irq_dpci = NULL;
-
-if ( !iommu_enabled )
-return 0;
-
-if ( !is_hvm_domain(d) )
-return 0;
-
-spin_lock(d-event_lock);
-hvm_irq_dpci = domain_get_irq_dpci(d);
-if ( hvm_irq_dpci != NULL )
-{
-int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
-
-if ( ret )
-{
-spin_unlock(d-event_lock);
-return ret;
-}
-
-d-arch.hvm_domain.irq.dpci = NULL;
-free_hvm_irq_dpci(hvm_irq_dpci);
-}
-spin_unlock(d-event_lock);
-return 0;
-}
-
 int pci_release_devices(struct domain *d)
 {
 struct pci_dev *pdev;
@@ -1186,28 +1110,6 @@ bool_t pcie_aer_get_firmware_first(const struct pci_dev 
*pdev)
 }
 #endif
 
-static int _dump_pci_devices(struct pci_seg *pseg, void *arg)

-{
-struct pci_dev *pdev;
-struct msi_desc *msi;
-
-printk( segment %04x \n, pseg-nr);
-
-list_for_each_entry ( pdev, pseg-alldevs_list, alldevs_list )
-{
-printk(%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs  ,
-   pseg-nr, pdev-bus,
-   PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn),
-   pdev-domain ? pdev-domain-domain_id : -1,
-   (pdev-node != NUMA_NO_NODE) ? pdev-node : -1);
-list_for_each_entry ( msi, pdev-msi_list, list )
-   printk(%d , msi-irq);
-printk(\n);
-}
-
-return 0;
-}
-
 static void dump_pci_devices(unsigned char ch)
 {
 printk( PCI devices \n);
diff --git a/xen/drivers/passthrough/x86/Makefile 
b/xen/drivers/passthrough/x86/Makefile
index a70cf94..a2bcf94 100644
--- a/xen/drivers/passthrough/x86/Makefile
+++ b/xen/drivers/passthrough/x86/Makefile
@@ -1,2 +1,3 @@
 obj-y += ats.o
 obj-y += iommu.o
+obj-$(HAS_PCI) += pci.o
diff --git a/xen/drivers/passthrough/x86/pci.c 
b/xen/drivers/passthrough/x86/pci.c
new file mode 100644
index 000..cf37b0a
--- /dev/null
+++ b/xen/drivers/passthrough/x86/pci.c
@@ -0,0 +1,115 @@
+/*
+ * x86 specific code for PCI MSI
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ *

[Xen-devel] linux-next: build failure after merge of the xen-tip tree

2015-04-13 Thread Stephen Rothwell

Hi all,

After merging the xen-tip tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

drivers/char/tpm/xen-tpmfront.c: In function 'setup_ring':
drivers/char/tpm/xen-tpmfront.c:203:7: warning: passing argument 2 of 
'xenbus_grant_ring' makes pointer from integer without a cast
  rv = xenbus_grant_ring(dev, virt_to_mfn(priv-shr));
   ^
In file included from drivers/char/tpm/xen-tpmfront.c:17:0:
include/xen/xenbus.h:206:5: note: expected 'void *' but argument is of type 
'long unsigned int'
 int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
 ^
drivers/char/tpm/xen-tpmfront.c:203:7: error: too few arguments to function 
'xenbus_grant_ring'
  rv = xenbus_grant_ring(dev, virt_to_mfn(priv-shr));
   ^
In file included from drivers/char/tpm/xen-tpmfront.c:17:0:
include/xen/xenbus.h:206:5: note: declared here
 int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
 ^

Caused by commit 1b1586eeeb8c (xenbus_client: Extend interface to
support multi-page ring).

I have used the xen-tip tree from next-20150410 for today.
-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


pgpc23mvwiL3u.pgp
Description: OpenPGP digital signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] Question. Inject virq to Domain on Xen ARM.

2015-04-13 Thread 신정섭

HI
I have a question about Inject virq to Domain on Xen ARM.
 
Function 'vgic_vcpu_inject_irq' is inject virq to target vcpu.
At the end of vgic_vcpu_inject_irq, like below
--
running = v-gt;is_running;
vcpu_unblock(v); 
if ( running amp;amp; v != current )
smp_send_event_check_mask(cpumask_of(v-gt;processor)); 
--
In code if target vcpu is not current, send SGI to v-gt;processor;
 
I think this function help inject virq to target vcpu immediately in Xen 4.5.0.
In Xen 4.5.0, vcpu that receive SGI, will execute 'enter_hypervisor_head' 
function
and move irq from lr_queue to linked register. 
 
But I think that in Xen 4.4.X above code is not help inject virq. because there 
is code
to move irq from lr_queue to linked register in hyp handler routine.
 
Thanks
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register

 On 01.04.15 at 11:59, m...@redhat.com wrote:
 On Wed, Apr 01, 2015 at 10:41:12AM +0100, Andrew Cooper wrote:
 On 01/04/15 10:20, Stefano Stabellini wrote:
  CC'ing the author of the patch and xen-devel.
  FYI I think that Jan is going to be on vacation for a couple of weeks.
 
  On Wed, 1 Apr 2015, Michael S. Tsirkin wrote:
  On Tue, Mar 31, 2015 at 03:18:03PM +0100, Stefano Stabellini wrote:
  From: Jan Beulich jbeul...@suse.com
 
  Otherwise the guest can abuse that control to cause e.g. PCIe
  Unsupported Request responses (by disabling memory and/or I/O decoding
  and subsequently causing [CPU side] accesses to the respective address
  ranges), which (depending on system configuration) may be fatal to the
  host.
 
  This is CVE-2015-2756 / XSA-126.
 
  Signed-off-by: Jan Beulich jbeul...@suse.com
  Reviewed-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Acked-by: Ian Campbell ian.campb...@citrix.com
  The patch description seems somewhat incorrect to me.
  UR should not be fatal to the system, and it's not platform
  specific.
  I think that people have been able to repro this, but I'll let Jan
  comment on it.
 
 Depending on how the BIOS sets up AER (if even available), a UR can very
 easily be fatal to the system.
 
 If firmware first mode is set, Xen (or indeed Linux) can't fix a
 problematic setup.  Experimentally, doing so can cause infinite loops in
 certain vendors SMM handlers.
 
 I think it can, just disable UR reporting, this is up to OS.  This is
 what the PCI spec says - you have snipped the relevant part out from the
 mail you are replying to.

As already said by Andrew, the OS must not do such when the
system is in APEI firmware first mode.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [xen-4.3-testing test] 50394: regressions - FAIL

flight 50394 xen-4.3-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50394/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-rumpuserxen   4 capture-logs!broken in 50358 [st=!broken!]
 build-i386-rumpuserxen4 capture-logs!broken in 50358 [st=!broken!]
 build-amd64-rumpuserxen  3 host-install(3) broken in 50358 REGR. vs. 36755
 build-i386-rumpuserxen   3 host-install(3) broken in 50358 REGR. vs. 36755
 test-amd64-i386-freebsd10-i386 10 guest-start fail REGR. vs. 36755

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-sedf  3 host-install(3)  broken in 50358 pass in 50394
 test-amd64-amd64-xl-qemuu-winxpsp3 15 guest-localmigrate/x10 fail in 50358 
pass in 50394
 test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail pass in 
50358

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-credit2 7 capture-logs(7) broken in 50358 blocked in 36755
 test-armhf-armhf-xl-cubietruck 7 capture-logs(7) broken in 50358 blocked in 
36755
 test-armhf-armhf-libvirt7 capture-logs(7) broken in 50358 blocked in 36755
 test-armhf-armhf-xl-multivcpu 7 capture-logs(7) broken in 50358 blocked in 
36755
 test-armhf-armhf-xl 7 capture-logs(7) broken in 50358 blocked in 36755
 test-armhf-armhf-xl-sedf-pin 7 capture-logs(7) broken in 50358 blocked in 36755
 test-armhf-armhf-xl-arndale 7 capture-logs(7) broken in 50358 blocked in 36755
 test-amd64-i386-pair21 guest-migrate/src_host/dst_host fail like 36755

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-armhf-armhf-xl-sedf  4 capture-logs(4) broken in 50358 never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail in 50358 never pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64  9 debian-hvm-install fail never pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  9 debian-hvm-install  fail never pass
 test-armhf-armhf-xl-credit2   6 xen-boot fail   never pass
 test-armhf-armhf-xl-cubietruck  6 xen-boot fail never pass
 test-armhf-armhf-libvirt  6 xen-boot fail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu  6 xen-boot fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 16 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 16 guest-stop   fail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass
 test-amd64-amd64-xl-win7-amd64 16 guest-stop   fail never pass
 test-amd64-i386-xl-win7-amd64 16 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 16 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 16 guest-stop fail never pass
 test-amd64-amd64-xl-winxpsp3 16 guest-stop   fail   never pass
 test-amd64-i386-xend-winxpsp3 20 leak-check/check fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 build-amd64-rumpuserxen   6 xen-buildfail   never pass
 build-i386-rumpuserxen6 xen-buildfail   never pass
 test-armhf-armhf-xl-sedf  6 xen-boot fail   never pass
 test-armhf-armhf-xl   6 xen-boot fail   never pass
 test-armhf-armhf-xl-sedf-pin  6 xen-boot fail   never pass
 test-armhf-armhf-xl-arndale   6 xen-boot fail   never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 16 guest-stop   fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 20 leak-check/checkfail never pass

version targeted for testing:
 xen  46ed0083a76efa82713ea979b312fa69250380b2
baseline version:
 xen  c58b16ef1572176cf2f6a424b527b5ed4bb73f17


People who touched revisions under test:
  Andrew Cooper andrew.coop...@citrix.com
  Ian Campbell ian.campb...@citrix.com
  Ian Jackson ian.jack...@eu.citrix.com
  Jan Beulich jbeul...@suse.com
  Konrad Rzeszutek Wilk konrad.w...@oracle.com


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass

Re: [Xen-devel] linux-next: build failure after merge of the xen-tip tree

2015-04-13 Thread Bob Liu


Hi Stephen,

On 04/13/2015 04:09 PM, Stephen Rothwell wrote:

Hi all,

After merging the xen-tip tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

drivers/char/tpm/xen-tpmfront.c: In function 'setup_ring':
drivers/char/tpm/xen-tpmfront.c:203:7: warning: passing argument 2 of 
'xenbus_grant_ring' makes pointer from integer without a cast
   rv = xenbus_grant_ring(dev, virt_to_mfn(priv-shr));
^
In file included from drivers/char/tpm/xen-tpmfront.c:17:0:
include/xen/xenbus.h:206:5: note: expected 'void *' but argument is of type 
'long unsigned int'
  int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
  ^
drivers/char/tpm/xen-tpmfront.c:203:7: error: too few arguments to function 
'xenbus_grant_ring'
   rv = xenbus_grant_ring(dev, virt_to_mfn(priv-shr));
^
In file included from drivers/char/tpm/xen-tpmfront.c:17:0:
include/xen/xenbus.h:206:5: note: declared here
  int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
  ^

Caused by commit 1b1586eeeb8c (xenbus_client: Extend interface to
support multi-page ring).

I have used the xen-tip tree from next-20150410 for today.



Sorry for this issue, I missed the xentpm-front.c file in that patch.
(Original patch from Wei Liu already included the right modification 
which didn't exist in Paul's.)


Attached patch should fix this build failure.

--
Regards,
-Bob
From 973eacee793595b9790957186ffd27f192f5dd4f Mon Sep 17 00:00:00 2001
From: Bob Liu bob@oracle.com
Date: Mon, 13 Apr 2015 16:29:10 +0800
Subject: [PATCH] xen-tpmfront: fix build failure

Fix build failure caused by commit 1b1586eeeb8c (xenbus_client: Extend
interface to support multi-page ring), xen-tpmfront was missed in that commit.

Signed-off-by: Bob Liu bob@oracle.com
---
 drivers/char/tpm/xen-tpmfront.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/xen-tpmfront.c b/drivers/char/tpm/xen-tpmfront.c
index c3b4f5a..3111f27 100644
--- a/drivers/char/tpm/xen-tpmfront.c
+++ b/drivers/char/tpm/xen-tpmfront.c
@@ -193,6 +193,7 @@ static int setup_ring(struct xenbus_device *dev, struct tpm_private *priv)
 	struct xenbus_transaction xbt;
 	const char *message = NULL;
 	int rv;
+	grant_ref_t gref;
 
 	priv-shr = (void *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
 	if (!priv-shr) {
@@ -200,11 +201,11 @@ static int setup_ring(struct xenbus_device *dev, struct tpm_private *priv)
 		return -ENOMEM;
 	}
 
-	rv = xenbus_grant_ring(dev, virt_to_mfn(priv-shr));
+	rv = xenbus_grant_ring(dev, priv-shr, 1, gref);
 	if (rv  0)
 		return rv;
 
-	priv-ring_ref = rv;
+	priv-ring_ref = gref;
 
 	rv = xenbus_alloc_evtchn(dev, priv-evtchn);
 	if (rv)
-- 
1.7.10.4

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] preparing for 4.5.1

 On 30.03.15 at 15:15, roger@citrix.com wrote:
 El 26/03/15 a les 18.07, Jan Beulich ha escrit:
 All,
 
 having been released mid January, it is time to get ready for 4.5.1-rc1.
 Please reply with backport requests which you consider necessary but
 still missing. Please also be patient with them being pulled in - I'll be
 gone the coming two weeks, and I'd hope to get an RC1 put together
 soon thereafter.
 
 I would like to request the backport of the shared page tables fix
 between EPT and IOMMU for PVH:
 
 797842 iommu: fix usage of shared EPT/IOMMU page tables on PVH

Apart from you having failed to also request the fixup on top (as
pointed out by Julien) you also didn't give a reason for the request.
Aiui this is a PVH Dom0 fix only, and PVH Dom0 doesn't even come
near to be supported in 4.5.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 2/4] x86/MSI-X: access MSI-X table only after having enabled MSI-X

 On 10.04.15 at 22:02, konrad.w...@oracle.com wrote:
 On Wed, Mar 25, 2015 at 04:39:49PM +, Jan Beulich wrote:
 As done in Linux by f598282f51 (PCI: Fix the NIU MSI-X problem in a
 better way) and its broken predecessor, make sure we don't access the
 MSI-X table without having enabled MSI-X first, using the mask-all flag
 instead to prevent interrupts from occurring.
 
 This causes an regression with an Linux guest that has the XSA120 + XSA120
 addendum with PV guests (hadn't tried yet HVM).

You mentioning XSA-120 and its addendum - are these requirements
for the problem to be seen? I admit I may have tested a PV guest
only with an SR-IOV VF (and only a HVM guest also with an ordinary
device), but I'd like to be clear about the validity of the connection.

 When PV guest requests an MSI-X, pciback gets:
 
 [  122.778654] xen-pciback: :0a:00.0: enable MSI-X
 [  122.778861] pciback :0a:00.0: xen map irq failed -6 for 1 domain
 [  122.778887] xen_pciback: :0a:00.0: error enabling MSI-X for guest 1: 
 err -6!

Yeah, there were a few adjustments necessary to get similar issues
under control for HVM guests - maybe there's a path left not covered
by what's done for HVM guests.

 The device has the PCI_COMMAND enabled correctly:
 
 # lspci -s 0a:0.0 -vvv | head
 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network 
 Connection (rev 01)
 Subsystem: Super Micro Computer Inc Device 10c9
 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
 Stepping- SERR- FastB2B- DisINTx-
 
 Attaching the 'xl dmesg', 'dmesg', and 'lspci'

Right - MSI-X is still disabled if the lspci really matches the state at
the time the failure occurred. I'm attaching the original patch with
some debugging code left in, which I suppose would show where
the problematic path is in case you can get to it before I would.

Jan


TODO: drop //temp-s

As done in Linux by f598282f51 (PCI: Fix the NIU MSI-X problem in a
better way) and its broken predecessor, make sure we don't access the
MSI-X table without having enabled MSI-X first, using the mask-all flag
instead to prevent interrupts from occurring.

--- unstable.orig/xen/arch/x86/msi.c2015-02-10 08:19:17.0 +0100
+++ unstable/xen/arch/x86/msi.c 2015-02-10 09:05:12.0 +0100
@@ -142,6 +142,23 @@ static bool_t memory_decoded(const struc
   PCI_COMMAND_MEMORY);
 }
 
+static bool_t msix_memory_decoded(const struct pci_dev *dev, unsigned int pos)
+{
+u16 control = pci_conf_read16(dev-seg, dev-bus,
+  PCI_SLOT(dev-devfn),
+  PCI_FUNC(dev-devfn),
+  msix_control_reg(pos));
+
+if ( !(control  PCI_MSIX_FLAGS_ENABLE) )
+{//temp
+ static bool_t warned;
+ WARN_ON(!test_and_set_bool(warned));
+return 0;
+}
+
+return memory_decoded(dev);
+}
+
 /*
  * MSI message composition
  */
@@ -219,7 +236,8 @@ static bool_t read_msi_msg(struct msi_de
 void __iomem *base;
 base = entry-mask_base;
 
-if ( unlikely(!memory_decoded(entry-dev)) )
+if ( unlikely(!msix_memory_decoded(entry-dev,
+   entry-msi_attrib.pos)) )
 return 0;
 msg-address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
 msg-address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
@@ -285,7 +303,8 @@ static int write_msi_msg(struct msi_desc
 void __iomem *base;
 base = entry-mask_base;
 
-if ( unlikely(!memory_decoded(entry-dev)) )
+if ( unlikely(!msix_memory_decoded(entry-dev,
+   entry-msi_attrib.pos)) )
 return -ENXIO;
 writel(msg-address_lo,
base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
@@ -379,7 +398,7 @@ static bool_t msi_set_mask_bit(struct ir
 {
 struct msi_desc *entry = desc-msi_desc;
 struct pci_dev *pdev;
-u16 seg;
+u16 seg, control;
 u8 bus, slot, func;
 
 ASSERT(spin_is_locked(desc-lock));
@@ -401,35 +420,38 @@ static bool_t msi_set_mask_bit(struct ir
 }
 break;
 case PCI_CAP_ID_MSIX:
+control = pci_conf_read16(seg, bus, slot, func,
+  msix_control_reg(entry-msi_attrib.pos));
+if ( unlikely(!(control  PCI_MSIX_FLAGS_ENABLE)) )
+pci_conf_write16(seg, bus, slot, func,
+ msix_control_reg(entry-msi_attrib.pos),
+ control | (PCI_MSIX_FLAGS_ENABLE |
+PCI_MSIX_FLAGS_MASKALL));
 if ( likely(memory_decoded(pdev)) )
 {
 writel(flag, entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
-break;
+if ( likely(control  PCI_MSIX_FLAGS_ENABLE) )
+break;
+flag = 1;
 }
-if ( flag )
+

Re: [Xen-devel] [PATCH] xen: arm: X-Gene Storm check GIC DIST address for EOI quirk

2015-04-13 Thread Pranavkumar Sawargaonkar

On 8 April 2015 at 13:58, Christoffer Dall christoffer.d...@linaro.org wrote:
 Hi Pranav,

 On Mon, Apr 06, 2015 at 02:24:01PM +0530, Pranavkumar Sawargaonkar wrote:
 In old X-Gene Storm firmware and DT, secure mode addresses have been
 mentioned in GICv2 node. In this case maintenance interrupt is used
 instead of EOI HW method.

 This patch checks the GIC Distributor Base Address to enable EOI quirk
 for old firmware.

 Ref:
 http://lists.xen.org/archives/html/xen-devel/2014-07/msg01263.html

 Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org

 I have tested this on the m400 cluster and can confirm that it prevents
 the issue with ab trashing the machine:

 Tested-by: Christoffer Dall christoffer.d...@linaro.org

Thanks Christoffer for cross checking.
 will sent out a new revision of the patch with comments addressed.

Thanks,
Pranav

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] PMU in xen enviroment

2015-04-13 Thread 蒋雄伟(蒋冲)


Hi，everyone


We would like to obtain the distribution of CPU-specific hardware events
such as clock cycles and cache and TLB misses and branch-miss and etc. in
Xen environment . It is said that Xenoprof can be used to do so. We use
Intel's Ivy Bridge Processor which is released in 2012, so I think it is
better to use oprofile 0.9.8 or above. But http://xenoprof.sourceforge.net/
only released oprofile-0.9.5-xen.patch which is released in 2009. Does it
mean no patch is needed since OProfile 0.9.6 ? is there a detailed document
describing how to use oprofile in xen environment ? is there any other tools
to do so ?


Thanks a lot. 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4


Monday, April 13, 2015, 11:50:51 AM, you wrote:

 On 13/04/15 10:39, Sander Eikelenboom wrote:
 Hi David,
 
 I seem to have spotted some trouble with a 4.0 dom0 kernel with the 
 devel/for-linus-4.1 branch pulled on top.
 
 Does this remind you of any specific commits in the devel/for-linus-4.1 
 branch that could
 likely be involved that i could try to revert ?

 Yes.  This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve
 performance of MMAPBATCH_V2) which makes the kernel try harder to map
 all GFNs instead of failing on the first one.

 I think this is qemu incorrectly trying to map GFNs.

 David

Reverted that specific one, but still get those messages.

--
Sander


 I now get a very large number of these at guest with passthrough start:
 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
 gfn=001ed type:4




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

 On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote:
 On Mon, 13 Apr 2015, Jan Beulich wrote:
  On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote:
  On Wed, 25 Mar 2015, Jan Beulich wrote:
  When a device gets detached from a guest, pciback will clear its
  command register, thus disabling both memory and I/O decoding. The
  disabled memory decoding, however, has an effect on the MSI-X table
  accesses the hypervisor does: These won't have the intended effect
  anymore. Even worse, for PCIe devices (but not SR-IOV virtual
  functions) such accesses may (will?) be treated as Unsupported
  Requests, causing respective errors to be surfaced, potentially in the
  form of NMIs that may be fatal to the hypervisor or Dom0 is different
  ways. Hence rather than carrying out these accesses, we should avoid
  them where we can, and use alternative (e.g. PCI config space based)
  mechanisms to achieve at least the same effect.
  
  I don't think that it is a good idea for both Xen and Linux to access
  the command register simultaneously.  Working around Linux in Xen
  doesn't sound like an optimal solution.   Maybe we could just fix the
  pciback and that would be enough.
 
 I'm afraid that would just eliminate the specific case, but not the
 general issue.
 
 If we trust Dom0 to do the right thing, then I don't think there is a
 general issue to be solved. Dom0 can break the system at any time, I
 don't see any differences here, unless we have a plan to actually be
 able to handle a misbehaving dom0, in that case I am all for it.

No, that gets us in the wrong direction. Dom0 can have legitimate
reasons to have to clear memory or I/O decoding on a device at
run time (even if current Linux doesn't do so). The more general
problem we may need to solve is that of racing config space
accesses (one by Dom0, the other by the hypervisor). But that's
beyond this series' scope.

 While we trust Dom0 to not do outright bad things,
 the hypervisor should still avoid doing things that can go wrong
 due to the state a device is put (or left) in by Dom0.
 
 Xen should also avoid doing things that can go wrong because of the
 state a device is put in by QEMU or other components in the system.
 There isn't much room for Xen to play with.

Qemu is either part of Dom0, or doesn't play with devices directly.

 And how are we going to deal with older unfixed QEMUs?
 So far we have been using the same policy for QEMU and the Dom0 kernel:
 Xen doesn't break them -- old Linux kernels and QEMUs are supposed to
 just work.

I'm not sure that's really true for qemu, or if it is, then only by pure
luck: The tool stack interface of the hypervisor as well as the libxc
interfaces are subject to change between any two releases. I view
it as unavoidable to break older qemu here.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register

2015-04-13 Thread Michael S. Tsirkin

On Mon, Apr 13, 2015 at 12:34:34PM +0100, Jan Beulich wrote:
  On 13.04.15 at 13:19, m...@redhat.com wrote:
  Yes Linux can't fix firmware 1st mode, but
  PCI express spec says what firmware should do in this case:
  
  IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices
  
  With 1.0a device Functions, 96 if the Unsupported Request Reporting 
  Enable bit is set, the Function
  when operating as a Completer will send an uncorrectable error 
  Message (if enabled) when a UR
  error is detected. On platforms where an uncorrectable error 
  Message 
  is handled as a System Error,
  this will break PC-compatible Configuration Space probing, so 
  software/firmware on such
  platforms may need to avoid setting the Unsupported Request 
  Reporting Enable bit.
  With device Functions implementing Role-Based Error Reporting, 
  setting the Unsupported Request
  Reporting Enable bit will not interfere with PC-compatible 
  Configuration Space probing, assuming
  that the severity for UR is left at its default of non-fatal. 
  However, setting the Unsupported Request
  Reporting Enable bit will enable the Function to report UR errors 
  detected with posted Requests,
  helping avoid this case for potential silent data corruption.
  On platforms where robust error handling and PC-compatible 
  Configuration Space probing is
  required, it is suggested that software or firmware have the 
  Unsupported Request Reporting Enable
  bit Set for Role-Based Error Reporting Functions, but clear for 
  1.0a 
  Functions. Software or
  firmware can distinguish the two classes of Functions by examining 
  the Role-Based Error Reporting
  bit in the Device Capabilities register.
  
  
  What I think you have is a very old 1.0a system, and you set Unsupported
  Request Reporting Enable.
  
  Can you confirm?
 
 No. In at least one of the two cases we got reports of the original
 problem, triggering the finding of this issue, this is a brand new one,
 only soon to become available publicly. Furthermore I'm being
 confused by the mention of PC-compatible config space probing
 above: The URs we talk about here don't result from config space
 accessed at all.

OK. Can you please explain why does UR cause a system error then?
It looks like a hardware bug: PCIE 1.1 seems to say it shouldn't.

  You will have other problems if your firmware doesn't follow the spec. So 
  how about either
  
  - Don't use firmware 1st mode with pci express
(Seems no reason to do firmware 1st for PCIE, architecture is completely
 standard. I saw mentions of using combined/parallel mode, using AER for 
  some
 devices but not others, but I don't know how this is supposed to be 
  enabled.
 Any idea?)
  
  or
  
  - ask your vendor to update firmware if it doesn't do the right thing
 
 Both not very practical suggestions, based on experience.
 
 Jan

Well using OS native mode is definitely practical, the question
is how to detect the problematic configurations.

There's always XSA-124 which says buggy hardware can cause
security problems.

-- 
MST

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [xen-unstable test] 50291: regressions - FAIL

 On 10.04.15 at 19:03, ian.jack...@eu.citrix.com wrote:
 osstest service user writes ([xen-unstable test] 50291: regressions - FAIL):
 flight 50291 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/50291/ 
 
 Regressions :-(
 
 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-amd64-i386-freebsd10-i386 11 guest-localmigrate  fail REGR. vs. 36514
  test-amd64-i386-freebsd10-amd64 11 guest-localmigrate fail REGR. vs. 36514
 
 I have done extensive testing of the network and found no network
 problems, infrastructure VM stalls, or anything, so I'm pretty sure
 these are not infrastructure problems but rather real bugs with the
 software under test.
 
 We believe this to be a bug in FreeBSD and are trying to update to
 FreeBSD 10.1 which may fix it.
 
 We think this is not anything to do with Xen.  This bug seems to be
 exposed more in the new colo than the old test lab, for some reason.
 
 These were the only problems in this flight.  Under the circumstances
 I think we would be justified in force pushing to xen.git#master:
 
 version targeted for testing:
  xen  3dca8a17f639f4b53cd2eeb687cec4e9f83acf3d

Yes please.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4


Monday, April 13, 2015, 2:07:02 PM, you wrote:

 On 13/04/15 12:21, Sander Eikelenboom wrote:
 
 Monday, April 13, 2015, 11:50:51 AM, you wrote:
 
 On 13/04/15 10:39, Sander Eikelenboom wrote:
 Hi David,

 I seem to have spotted some trouble with a 4.0 dom0 kernel with the 
 devel/for-linus-4.1 branch pulled on top.

 Does this remind you of any specific commits in the devel/for-linus-4.1 
 branch that could
 likely be involved that i could try to revert ?
 
 Yes.  This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve
 performance of MMAPBATCH_V2) which makes the kernel try harder to map
 all GFNs instead of failing on the first one.
 
 I think this is qemu incorrectly trying to map GFNs.
 
 David
 
 Reverted that specific one, but still get those messages.

 You'll have to bisect it then.  Because I don't see any other relevant
 commits in devel/for-linus-4.1

 David

Ok .. hmm first candidate of the bisect also looks interessting:
[628c28eefd6f2cef03b212081b466ae43fd093a3] xen: unify foreign GFN map/unmap for 
auto-xlated physmap guests


--
Sander


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC v1 00/15] Add VT-d Posted-Interrupts support

 On 01.04.15 at 15:21, feng...@intel.com wrote:
 Hi Jan,
 
 Any more comments about this series? Thanks a lot!

I think it makes little sense to wait for my review before posting v2
with issues addressed which others have pointed out. I just
returned from vacation, and would have time to look at v1 next
week the earliest (with the week after being occupied by travel).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/2] xen/x86: Patch re-factors MSI/X config code from, drivers/passthrough/pci.c to x86 specific



On Monday 13 April 2015 03:45 PM, Stefano Stabellini wrote:

On Mon, 13 Apr 2015, Manish Jaggi wrote:

This patch re-factors msi specific code to x86 specific files from
xen/drivers/passthrough/pci.c.

Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com
---
  xen/drivers/passthrough/pci.c|  102 +-
  xen/drivers/passthrough/x86/Makefile |1 +
  xen/drivers/passthrough/x86/pci.c|  115
++
  xen/include/asm-x86/msi.h|1 -
  xen/include/xen/pci.h|   20 +-
  5 files changed, 137 insertions(+), 102 deletions(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index ecd061e..004aba9 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -23,7 +23,6 @@
  #include xen/iommu.h
  #include xen/irq.h
  #include asm/hvm/iommu.h
-#include asm/hvm/irq.h
  #include xen/delay.h
  #include xen/keyhandler.h
  #include xen/event.h
@@ -33,21 +32,6 @@
  #include xen/softirq.h
  #include xen/tasklet.h
  #include xsm/xsm.h
-#include asm/msi.h
-
-struct pci_seg {
-struct list_head alldevs_list;
-u16 nr;
-unsigned long *ro_map;
-/* bus2bridge_lock protects bus2bridge array */
-spinlock_t bus2bridge_lock;
-#define MAX_BUSES 256
-struct {
-u8 map;
-u8 bus;
-u8 devfn;
-} bus2bridge[MAX_BUSES];
-};
   spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED;
  static struct radix_tree_root pci_segments;
@@ -282,22 +266,10 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg,
u8 bus, u8 devfn)
  *((u8*) pdev-bus) = bus;
  *((u8*) pdev-devfn) = devfn;
  pdev-domain = NULL;
-INIT_LIST_HEAD(pdev-msi_list);
-
-if ( pci_find_cap_offset(pseg-nr, bus, PCI_SLOT(devfn),
PCI_FUNC(devfn),
- PCI_CAP_ID_MSIX) )
+if (!pci_alloc_msix (pdev, pseg, bus, devfn))
  {
-struct arch_msix *msix = xzalloc(struct arch_msix);
-
-if ( !msix )
-{
-xfree(pdev);
-return NULL;
-}
-spin_lock_init(msix-table_lock);
-pdev-msix = msix;
+return NULL;
  }
-

 From the look of it, this code doesn't seem x86 specific
MSIX is only used for x86, dom0/U handles MSI/X in ARM. I think If ARM 
is not using MSI/X then code has to be in x86 specific file.



  list_add(pdev-alldevs_list, pseg-alldevs_list);
   /* update bus2bridge */
@@ -755,54 +727,6 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
  return ret;
  }
  -static int pci_clean_dpci_irq(struct domain *d,
-  struct hvm_pirq_dpci *pirq_dpci, void *arg)
-{
-struct dev_intx_gsi_link *digl, *tmp;
-
-pirq_guest_unbind(d, dpci_pirq(pirq_dpci));
-
-if ( pt_irq_need_timer(pirq_dpci-flags) )
-kill_timer(pirq_dpci-timer);
-
-list_for_each_entry_safe ( digl, tmp, pirq_dpci-digl_list, list )
-{
-list_del(digl-list);
-xfree(digl);
-}
-
-return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0;
-}
-
-static int pci_clean_dpci_irqs(struct domain *d)
-{
-struct hvm_irq_dpci *hvm_irq_dpci = NULL;
-
-if ( !iommu_enabled )
-return 0;
-
-if ( !is_hvm_domain(d) )
-return 0;
-
-spin_lock(d-event_lock);
-hvm_irq_dpci = domain_get_irq_dpci(d);
-if ( hvm_irq_dpci != NULL )
-{
-int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
-
-if ( ret )
-{
-spin_unlock(d-event_lock);
-return ret;
-}
-
-d-arch.hvm_domain.irq.dpci = NULL;
-free_hvm_irq_dpci(hvm_irq_dpci);
-}
-spin_unlock(d-event_lock);
-return 0;
-}

..but these two functions do



  int pci_release_devices(struct domain *d)
  {
  struct pci_dev *pdev;
@@ -1186,28 +1110,6 @@ bool_t pcie_aer_get_firmware_first(const struct
pci_dev *pdev)
  }
  #endif
  -static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
-{
-struct pci_dev *pdev;
-struct msi_desc *msi;
-
-printk( segment %04x \n, pseg-nr);
-
-list_for_each_entry ( pdev, pseg-alldevs_list, alldevs_list )
-{
-printk(%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs  ,
-   pseg-nr, pdev-bus,
-   PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn),
-   pdev-domain ? pdev-domain-domain_id : -1,
-   (pdev-node != NUMA_NO_NODE) ? pdev-node : -1);
-list_for_each_entry ( msi, pdev-msi_list, list )
-   printk(%d , msi-irq);
-printk(\n);
-}
-
-return 0;
-}
-
  static void dump_pci_devices(unsigned char ch)
  {
  printk( PCI devices \n);
diff --git a/xen/drivers/passthrough/x86/Makefile
b/xen/drivers/passthrough/x86/Makefile
index a70cf94..a2bcf94 100644
--- a/xen/drivers/passthrough/x86/Makefile
+++ b/xen/drivers/passthrough/x86/Makefile
@@ -1,2 +1,3 @@
  obj-y += ats.o
  obj-y += iommu.o
+obj-$(HAS_PCI) += pci.o
diff --git

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-13 Thread George Dunlap

On Thu, Apr 9, 2015 at 5:36 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Thu, 9 Apr 2015, Eric Dumazet wrote:
 On Thu, 2015-04-09 at 16:46 +0100, Stefano Stabellini wrote:
  Hi all,
 
  I found a performance regression when running netperf -t TCP_MAERTS from
  an external host to a Xen VM on ARM64: v3.19 and v4.0-rc4 running in the
  virtual machine are 30% slower than v3.18.
 
  Through bisection I found that the perf regression is caused by the
  prensence of the following commit in the guest kernel:
 
 
  commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89
  Author: Eric Dumazet eduma...@google.com
  Date:   Sun Dec 7 12:22:18 2014 -0800
 
  tcp: refine TSO autosizing

[snip]

 This commit restored original TCP Small Queue behavior, which is the
 first step to fight bufferbloat.

 Some network drivers are known to be problematic because of a delayed TX
 completion.

[snip]

 Try to tweak /proc/sys/net/ipv4/tcp_limit_output_bytes to see if it
 makes a difference ?

 A very big difference:

 echo 262144  /proc/sys/net/ipv4/tcp_limit_output_bytes
 brings us much closer to the original performance, the slowdown is just
 8%

 echo 1048576  /proc/sys/net/ipv4/tcp_limit_output_bytes
 fills the gap entirely, same performance as before refine TSO
 autosizing


 What would be the next step for here?  Should I just document this as an
 important performance tweaking step for Xen, or is there something else
 we can do?

Is the problem perhaps that netback/netfront delays TX completion?
Would it be better to see if that can be addressed properly, so that
the original purpose of the patch (fighting bufferbloat) can be
achieved while not degrading performance for Xen?  Or at least, so
that people get decent perfomance out of the box without having to
tweak TCP parameters?

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4

On 13/04/15 12:21, Sander Eikelenboom wrote:
 
 Monday, April 13, 2015, 11:50:51 AM, you wrote:
 
 On 13/04/15 10:39, Sander Eikelenboom wrote:
 Hi David,

 I seem to have spotted some trouble with a 4.0 dom0 kernel with the 
 devel/for-linus-4.1 branch pulled on top.

 Does this remind you of any specific commits in the devel/for-linus-4.1 
 branch that could
 likely be involved that i could try to revert ?
 
 Yes.  This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve
 performance of MMAPBATCH_V2) which makes the kernel try harder to map
 all GFNs instead of failing on the first one.
 
 I think this is qemu incorrectly trying to map GFNs.
 
 David
 
 Reverted that specific one, but still get those messages.

You'll have to bisect it then.  Because I don't see any other relevant
commits in devel/for-linus-4.1

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle

Hi Ian,

On 09/04/15 18:04, Ian Jackson wrote:
 Julien Grall writes (Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: 
 Use an higher value for the GIC phandle):
 On 09/04/15 17:52, Ian Jackson wrote:
 What would happen if our assumption about the DT compiler were
 violated ?

 The phandle would be present in 2 different nodes of the DT. FYI, that
 may also happen if a user use 2 times the same phandle in the partial DT.
 
 So we are making an assumption about the DT compiler which we don't
 check, and when the assumption is violated, we produce a corrupted DT
 which produces undefined behaviour in the guest.
 
 Can we do some kind of check ?

As discussed IRL, I will add a note in the documentation about this
restriction.

This will be removed when the phandle will be generated dynamically.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/2] Re-factoring passthrough/pci.c and adding place-holder code for ARM/PCI

Hi Manish,

On 13/04/15 08:37, Manish Jaggi wrote:
 Xen currently does not have PCI support for ARM builds. This patch set
 makes the code compilable for ARM PCI and adds places-holder code
 which would be replaced with PCI pass-through support patch series.

May I ask why you did send directly all the code to support PCI on ARM?

Without the rest it's hard to tell whether these patches make sense or not.

 Re-factor MSI Handling
 -
 There is a some x86 specific code which is found in common code:
 xen/drivers/passthrough/pci.c which needs to be re factored.
 
 MSI/X are configured and handled by dom0 or domU code on ARM64 and is not
 required to be part of common code. However there are functions which are
 used as part of common code and calls to these functions cannot be easily
 re factored like pci_cleanup_msi.

On x86, the hypervisor is taking care of MSI (enabling/disabling in the
config space) as long as doing sanity check on MSI used by a given domain.

How do you plan to handle it on ARM? IHMO, the MSI code is very useful
and would also help GIC MSI implement (ITS + V2M).

FWIW, I've pointed out the same issue on the ITS series a couple of
weeks ago.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 0/2] Re-factoring passthrough/pci.c and adding place-holder code for ARM/PCI

BTW, your series is not threaded.

I've already said it on another series. Please look at [1] to see how to
send correctly your series.

[1]
http://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches#Sending_the_patches_to_the_list

Regards,

On 13/04/15 11:19, Julien Grall wrote:
Hi Manish,

On 13/04/15 08:37, Manish Jaggi wrote:
Xen currently does not have PCI support for ARM builds. This patch set
makes the code compilable for ARM PCI and adds places-holder code
which would be replaced with PCI pass-through support patch series.

May I ask why you did send directly all the code to support PCI on ARM?

Without the rest it's hard to tell whether these patches make sense or not.

Re-factor MSI Handling
-
There is a some x86 specific code which is found in common code:
xen/drivers/passthrough/pci.c which needs to be re factored.

MSI/X are configured and handled by dom0 or domU code on ARM64 and is not
required to be part of common code. However there are functions which are
used as part of common code and calls to these functions cannot be easily
re factored like pci_cleanup_msi.

On x86, the hypervisor is taking care of MSI (enabling/disabling in the
config space) as long as doing sanity check on MSI used by a given domain.

How do you plan to handle it on ARM? IHMO, the MSI code is very useful
and would also help GIC MSI implement (ITS + V2M).

FWIW, I've pointed out the same issue on the ITS series a couple of
weeks ago.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register

 On 13.04.15 at 13:19, m...@redhat.com wrote:
 Yes Linux can't fix firmware 1st mode, but
 PCI express spec says what firmware should do in this case:
 
 IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices
 
 With 1.0a device Functions, 96 if the Unsupported Request Reporting 
 Enable bit is set, the Function
 when operating as a Completer will send an uncorrectable error 
 Message (if enabled) when a UR
 error is detected. On platforms where an uncorrectable error Message 
 is handled as a System Error,
 this will break PC-compatible Configuration Space probing, so 
 software/firmware on such
 platforms may need to avoid setting the Unsupported Request 
 Reporting Enable bit.
 With device Functions implementing Role-Based Error Reporting, 
 setting the Unsupported Request
 Reporting Enable bit will not interfere with PC-compatible 
 Configuration Space probing, assuming
 that the severity for UR is left at its default of non-fatal. 
 However, setting the Unsupported Request
 Reporting Enable bit will enable the Function to report UR errors 
 detected with posted Requests,
 helping avoid this case for potential silent data corruption.
 On platforms where robust error handling and PC-compatible 
 Configuration Space probing is
 required, it is suggested that software or firmware have the 
 Unsupported Request Reporting Enable
 bit Set for Role-Based Error Reporting Functions, but clear for 1.0a 
 Functions. Software or
 firmware can distinguish the two classes of Functions by examining 
 the Role-Based Error Reporting
 bit in the Device Capabilities register.
 
 
 What I think you have is a very old 1.0a system, and you set Unsupported
 Request Reporting Enable.
 
 Can you confirm?

No. In at least one of the two cases we got reports of the original
problem, triggering the finding of this issue, this is a brand new one,
only soon to become available publicly. Furthermore I'm being
confused by the mention of PC-compatible config space probing
above: The URs we talk about here don't result from config space
accessed at all.

 You will have other problems if your firmware doesn't follow the spec. So 
 how about either
 
 - Don't use firmware 1st mode with pci express
   (Seems no reason to do firmware 1st for PCIE, architecture is completely
standard. I saw mentions of using combined/parallel mode, using AER for 
 some
devices but not others, but I don't know how this is supposed to be 
 enabled.
Any idea?)
 
 or
 
 - ask your vendor to update firmware if it doesn't do the right thing

Both not very practical suggestions, based on experience.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/2] xen/x86: Patch re-factors MSI/X config code from, drivers/passthrough/pci.c to x86 specific

On Mon, 13 Apr 2015, Manish Jaggi wrote:
 This patch re-factors msi specific code to x86 specific files from
 xen/drivers/passthrough/pci.c.
 
 Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com
 ---
  xen/drivers/passthrough/pci.c|  102 +-
  xen/drivers/passthrough/x86/Makefile |1 +
  xen/drivers/passthrough/x86/pci.c|  115
 ++
  xen/include/asm-x86/msi.h|1 -
  xen/include/xen/pci.h|   20 +-
  5 files changed, 137 insertions(+), 102 deletions(-)
 
 diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
 index ecd061e..004aba9 100644
 --- a/xen/drivers/passthrough/pci.c
 +++ b/xen/drivers/passthrough/pci.c
 @@ -23,7 +23,6 @@
  #include xen/iommu.h
  #include xen/irq.h
  #include asm/hvm/iommu.h
 -#include asm/hvm/irq.h
  #include xen/delay.h
  #include xen/keyhandler.h
  #include xen/event.h
 @@ -33,21 +32,6 @@
  #include xen/softirq.h
  #include xen/tasklet.h
  #include xsm/xsm.h
 -#include asm/msi.h
 -
 -struct pci_seg {
 -struct list_head alldevs_list;
 -u16 nr;
 -unsigned long *ro_map;
 -/* bus2bridge_lock protects bus2bridge array */
 -spinlock_t bus2bridge_lock;
 -#define MAX_BUSES 256
 -struct {
 -u8 map;
 -u8 bus;
 -u8 devfn;
 -} bus2bridge[MAX_BUSES];
 -};
   spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED;
  static struct radix_tree_root pci_segments;
 @@ -282,22 +266,10 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg,
 u8 bus, u8 devfn)
  *((u8*) pdev-bus) = bus;
  *((u8*) pdev-devfn) = devfn;
  pdev-domain = NULL;
 -INIT_LIST_HEAD(pdev-msi_list);
 -
 -if ( pci_find_cap_offset(pseg-nr, bus, PCI_SLOT(devfn),
 PCI_FUNC(devfn),
 - PCI_CAP_ID_MSIX) )
 +if (!pci_alloc_msix (pdev, pseg, bus, devfn))
  {
 -struct arch_msix *msix = xzalloc(struct arch_msix);
 -
 -if ( !msix )
 -{
 -xfree(pdev);
 -return NULL;
 -}
 -spin_lock_init(msix-table_lock);
 -pdev-msix = msix;
 +return NULL;
  }
 -

From the look of it, this code doesn't seem x86 specific


  list_add(pdev-alldevs_list, pseg-alldevs_list);
   /* update bus2bridge */
 @@ -755,54 +727,6 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
  return ret;
  }
  -static int pci_clean_dpci_irq(struct domain *d,
 -  struct hvm_pirq_dpci *pirq_dpci, void *arg)
 -{
 -struct dev_intx_gsi_link *digl, *tmp;
 -
 -pirq_guest_unbind(d, dpci_pirq(pirq_dpci));
 -
 -if ( pt_irq_need_timer(pirq_dpci-flags) )
 -kill_timer(pirq_dpci-timer);
 -
 -list_for_each_entry_safe ( digl, tmp, pirq_dpci-digl_list, list )
 -{
 -list_del(digl-list);
 -xfree(digl);
 -}
 -
 -return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0;
 -}
 -
 -static int pci_clean_dpci_irqs(struct domain *d)
 -{
 -struct hvm_irq_dpci *hvm_irq_dpci = NULL;
 -
 -if ( !iommu_enabled )
 -return 0;
 -
 -if ( !is_hvm_domain(d) )
 -return 0;
 -
 -spin_lock(d-event_lock);
 -hvm_irq_dpci = domain_get_irq_dpci(d);
 -if ( hvm_irq_dpci != NULL )
 -{
 -int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL);
 -
 -if ( ret )
 -{
 -spin_unlock(d-event_lock);
 -return ret;
 -}
 -
 -d-arch.hvm_domain.irq.dpci = NULL;
 -free_hvm_irq_dpci(hvm_irq_dpci);
 -}
 -spin_unlock(d-event_lock);
 -return 0;
 -}

..but these two functions do


  int pci_release_devices(struct domain *d)
  {
  struct pci_dev *pdev;
 @@ -1186,28 +1110,6 @@ bool_t pcie_aer_get_firmware_first(const struct
 pci_dev *pdev)
  }
  #endif
  -static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
 -{
 -struct pci_dev *pdev;
 -struct msi_desc *msi;
 -
 -printk( segment %04x \n, pseg-nr);
 -
 -list_for_each_entry ( pdev, pseg-alldevs_list, alldevs_list )
 -{
 -printk(%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs  ,
 -   pseg-nr, pdev-bus,
 -   PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn),
 -   pdev-domain ? pdev-domain-domain_id : -1,
 -   (pdev-node != NUMA_NO_NODE) ? pdev-node : -1);
 -list_for_each_entry ( msi, pdev-msi_list, list )
 -   printk(%d , msi-irq);
 -printk(\n);
 -}
 -
 -return 0;
 -}
 -
  static void dump_pci_devices(unsigned char ch)
  {
  printk( PCI devices \n);
 diff --git a/xen/drivers/passthrough/x86/Makefile
 b/xen/drivers/passthrough/x86/Makefile
 index a70cf94..a2bcf94 100644
 --- a/xen/drivers/passthrough/x86/Makefile
 +++ b/xen/drivers/passthrough/x86/Makefile
 @@ -1,2 +1,3 @@
  obj-y += ats.o
  obj-y += iommu.o
 +obj-$(HAS_PCI) += pci.o
 diff --git a/xen/drivers/passthrough/x86/pci.c
 b/xen/drivers/passthrough/x86/pci.c
 new file mode 100644

Re: [Xen-devel] [PATCH v3 3/8] x86: detect and initialize Intel CAT feature

On 13/04/15 11:51, Jan Beulich wrote:
 On 27.03.15 at 19:31, andrew.coop...@citrix.com wrote:
 On 26/03/15 12:38, Chao Peng wrote:
 +cpuid_count(PSR_CPUID_LEVEL_CAT, 0, eax, ebx, ecx, edx);
 +if ( ebx  PSR_RESOURCE_TYPE_L3 )
 +{
 +cpuid_count(PSR_CPUID_LEVEL_CAT, 1, eax, ebx, ecx, edx);
 +info-cbm_len = (eax  0x1f) + 1;
 +info-cos_max = (edx  0x);
 +
 +info-enabled = 1;
 +printk(XENLOG_DEBUG CAT: enabled on socket %u, cos_max:%u, 
 cbm_len:%u\n,
 +   socket, info-cos_max, info-cbm_len);
 I would bump this to INFO, at least for socket 0.  Similar to CMT, a
 user which has gone to the effort of enabling CAT will want some
 indication in the boot log without also having to bump the logging level.
 Which INFO wouldn't be sufficient for (at least not with the xen.org
 defaults, not sure about XenServer's).

Huh - you are right.  XenServer doesn't patch this, but I spend most of
my time using debug builds so I tend not to notice.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register

2015-04-13 Thread Michael S. Tsirkin

On Mon, Apr 13, 2015 at 09:17:04AM +0100, Jan Beulich wrote:
  On 01.04.15 at 11:59, m...@redhat.com wrote:
  On Wed, Apr 01, 2015 at 10:41:12AM +0100, Andrew Cooper wrote:
  On 01/04/15 10:20, Stefano Stabellini wrote:
   CC'ing the author of the patch and xen-devel.
   FYI I think that Jan is going to be on vacation for a couple of weeks.
  
   On Wed, 1 Apr 2015, Michael S. Tsirkin wrote:
   On Tue, Mar 31, 2015 at 03:18:03PM +0100, Stefano Stabellini wrote:
   From: Jan Beulich jbeul...@suse.com
  
   Otherwise the guest can abuse that control to cause e.g. PCIe
   Unsupported Request responses (by disabling memory and/or I/O decoding
   and subsequently causing [CPU side] accesses to the respective address
   ranges), which (depending on system configuration) may be fatal to the
   host.
  
   This is CVE-2015-2756 / XSA-126.
  
   Signed-off-by: Jan Beulich jbeul...@suse.com
   Reviewed-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
   Acked-by: Ian Campbell ian.campb...@citrix.com
   The patch description seems somewhat incorrect to me.
   UR should not be fatal to the system, and it's not platform
   specific.
   I think that people have been able to repro this, but I'll let Jan
   comment on it.
  
  Depending on how the BIOS sets up AER (if even available), a UR can very
  easily be fatal to the system.
  
  If firmware first mode is set, Xen (or indeed Linux) can't fix a
  problematic setup.  Experimentally, doing so can cause infinite loops in
  certain vendors SMM handlers.
  
  I think it can, just disable UR reporting, this is up to OS.  This is
  what the PCI spec says - you have snipped the relevant part out from the
  mail you are replying to.
 
 As already said by Andrew, the OS must not do such when the
 system is in APEI firmware first mode.
 
 Jan


Yes Linux can't fix firmware 1st mode, but
PCI express spec says what firmware should do in this case:

IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices

With 1.0a device Functions, 96 if the Unsupported Request Reporting 
Enable bit is set, the Function
when operating as a Completer will send an uncorrectable error Message 
(if enabled) when a UR
error is detected. On platforms where an uncorrectable error Message is 
handled as a System Error,
this will break PC-compatible Configuration Space probing, so 
software/firmware on such
platforms may need to avoid setting the Unsupported Request Reporting 
Enable bit.
With device Functions implementing Role-Based Error Reporting, setting 
the Unsupported Request
Reporting Enable bit will not interfere with PC-compatible 
Configuration Space probing, assuming
that the severity for UR is left at its default of non-fatal. However, 
setting the Unsupported Request
Reporting Enable bit will enable the Function to report UR errors 
detected with posted Requests,
helping avoid this case for potential silent data corruption.
On platforms where robust error handling and PC-compatible 
Configuration Space probing is
required, it is suggested that software or firmware have the 
Unsupported Request Reporting Enable
bit Set for Role-Based Error Reporting Functions, but clear for 1.0a 
Functions. Software or
firmware can distinguish the two classes of Functions by examining the 
Role-Based Error Reporting
bit in the Device Capabilities register.


What I think you have is a very old 1.0a system, and you set Unsupported
Request Reporting Enable.

Can you confirm?

If you have access to the system in question, I can provide
a test script to detect this.

You will have other problems if your firmware doesn't follow the spec. So how
about either

- Don't use firmware 1st mode with pci express
  (Seems no reason to do firmware 1st for PCIE, architecture is completely
   standard. I saw mentions of using combined/parallel mode, using AER for some
   devices but not others, but I don't know how this is supposed to be enabled.
   Any idea?)

or

- ask your vendor to update firmware if it doesn't do the right thing

-- 
MST

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4

On 13/04/15 10:39, Sander Eikelenboom wrote:
 Hi David,
 
 I seem to have spotted some trouble with a 4.0 dom0 kernel with the 
 devel/for-linus-4.1 branch pulled on top.
 
 Does this remind you of any specific commits in the devel/for-linus-4.1 
 branch that could
 likely be involved that i could try to revert ?

Yes.  This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve
performance of MMAPBATCH_V2) which makes the kernel try harder to map
all GFNs instead of failing on the first one.

I think this is qemu incorrectly trying to map GFNs.

David

 I now get a very large number of these at guest with passthrough start:
 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
 gfn=001ed type:4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/8] Add core.sh and wrapper function

2015-04-13 Thread George Dunlap

On 04/10/2015 03:29 PM, Stefano Stabellini wrote:
 diff --git a/raise b/raise
 new file mode 100755
 index 000..7f3faae
 --- /dev/null
 +++ b/raise
 @@ -0,0 +1,16 @@
 +#!/usr/bin/env bash
 
 It is important to set -e immediately to catch all possible errors:
 
  -e  Exit immediately if a command exits with a non-zero status.
 
 Then you can drop set -e from raise.sh and unraise.sh when you turn them
 into libraries.

set -e gets tricky when you sometimes expect commands to fail;
particularly if you may want to do some sort of clean-up.  But let's
cross that bridge when we come to it. :-)

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/2] xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code



On Monday 13 April 2015 03:44 PM, Stefano Stabellini wrote:

On Mon, 13 Apr 2015, Manish Jaggi wrote:

Add ARM PCI Support
---
a) Place holder functions are added for pci_conf_read/write calls.
b) Macros dev_is_pci, pci_to_dev are implemented in
drivers/passthrough/pci/arm code

Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com
---
  xen/arch/arm/Makefile|1 +
  xen/arch/arm/pci.c   |   60 +++
  xen/drivers/passthrough/arm/Makefile |1 +
  xen/drivers/passthrough/arm/pci.c|   88
++
  xen/drivers/passthrough/arm/smmu.c   |1 -
  xen/drivers/passthrough/pci.c|9 ++--
  xen/include/asm-arm/device.h |   33 +
  xen/include/asm-arm/domain.h |3 ++
  xen/include/asm-arm/pci.h|7 +--
  9 files changed, 186 insertions(+), 17 deletions(-)

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 935999e..aaf5d88 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -39,6 +39,7 @@ obj-y += device.o
  obj-y += decode.o
  obj-y += processor.o
  obj-y += smc.o
+obj-$(HAS_PCI) += pci.o
   #obj-bin-y += o
  diff --git a/xen/arch/arm/pci.c b/xen/arch/arm/pci.c
new file mode 100644
index 000..9438f41
--- /dev/null
+++ b/xen/arch/arm/pci.c
@@ -0,0 +1,60 @@
+#include xen/pci.h
+
+void _pci_conf_write(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg, uint32_t data, int bytes)
+{
+}
+
+uint32_t _pci_conf_read(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg, int bytes)
+{
+return 0;
+}

I guess that they are going to be implemented with platform specific
code? Although I thought that the mmcfg stuff is somewhat standard across
architectures.


yes.
_pci_conf_read/write will call pcihb_conf_read/write (pcihb = 
pci_host_bridge).

ref: http://lists.xen.org/archives/html/xen-devel/2015-02/msg02717.html

+uint8_t pci_conf_read8(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg)
+{
+return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 1);
+}
+
+uint16_t pci_conf_read16(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg)
+{
+return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 2);
+}
+
+uint32_t pci_conf_read32(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg)
+{
+return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 4);
+}
+
+void pci_conf_write8(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg, uint8_t data)
+{
+ _pci_conf_write(seg, bus, dev, func, reg, data, 1);
+}
+
+void pci_conf_write16(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg, uint16_t data)
+{
+ _pci_conf_write(seg, bus, dev, func, reg, data, 2);
+}
+
+void pci_conf_write32(
+uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
+uint32_t reg, uint32_t data)
+{
+ _pci_conf_write(seg, bus, dev, func, reg, data, 4);
+}
+
+void arch_pci_ro_device(int seg, int bdf)
+{
+}

This is also missing an implementation. Maybe you should add /* TODO */
here too

ok.




diff --git a/xen/drivers/passthrough/arm/Makefile
b/xen/drivers/passthrough/arm/Makefile
index f4cd26e..1a41549 100644
--- a/xen/drivers/passthrough/arm/Makefile
+++ b/xen/drivers/passthrough/arm/Makefile
@@ -1,2 +1,3 @@
  obj-y += iommu.o
  obj-y += smmu.o
+obj-$(HAS_PCI) += pci.o
diff --git a/xen/drivers/passthrough/arm/pci.c
b/xen/drivers/passthrough/arm/pci.c
new file mode 100644
index 000..07a5a78
--- /dev/null
+++ b/xen/drivers/passthrough/arm/pci.c
@@ -0,0 +1,88 @@
+/*
+ * PCI helper functions for arm for passthrough support.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
USA.
+ *
+ * Copyright (C) 2015 Cavium Networks
+ *
+ * Author: Manish Jaggi manish.ja...@cavium.com
+ */
+#include xen/pci.h
+#include asm/device.h
+#include xen/sched.h
+
+int _dump_pci_devices(struct pci_seg *pseg, void *arg)
+{
+struct pci_dev *pdev;
+
+printk( segment %04x \n, pseg-nr);
+
+list_for_each_entry ( pdev, pseg-alldevs_list, alldevs_list )
+{
+printk(%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs  ,
+   pseg-nr, pdev-bus,
+   PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn),
+

Re: [Xen-devel] [PATCH 1/8] Add core.sh and wrapper function

2015-04-13 Thread George Dunlap

On 04/10/2015 03:29 PM, Stefano Stabellini wrote:
 +arg_parse_cmd=\
 +local -a args;
 +local _a;
 +local _vn;
 +local _m;
 +
 +_m=true;
 +
 +for _a in \\$@\ ; do
 +false  echo \Evaluating \${_a} [[ \\${_a/=}\ = \\${_a}\ ]]\;
 +if \$_m  [[ \\${_a/=}\ != \\${_a}\ ]] ; then
 +false  echo Parameter;
 +_vn=\${_a%%=*};
 +eval \local \$_vn\;
 +eval \\$_a\;
 +elif \$_m  [[ \\${_a}\ == \--\ ]] ; then
 +false  echo Separator;
 +_m=false;
 +else
 +false  echo Argument;
 +_m=false;
 +args+=(\\$_a\);
 +fi;
 +done
 
 I am sorry but I cannot bear myself to introduce so much complexity into
 the world.  raisin is supposed to be easy to read.  If we end up
 actually needing something so powerful and so complex in the future we
 can import it then, but I think that at the moment we can do nicely
 without it.

I can certainly understand the sentiment, but the idea behind this
calling convention is actually to make it *easier* both to read and
debug most of the source file, at the cost of two very complex, but
well-tested and generic macros ($arg_parse and $requireargs).

Consider some criteria:
- Difficulty of writing or reading code that calls a function
- Difficulty of writing a function which will be called
- Availability of sensible defaults
- Ability to make sure all necessary arguments have been specified
- Passing function arguments into sub-functions
- Global variables are nasty to debug

Consider alternate ways of passing arguments into functions:

1. Positional

i.e., cp SRC DEST

In this model, it's easy to make sure that all necessary arguments have
been specified; and there are no nasty global variables to work with.

However is really hard to program and read because you have to remember
what argument goes where.  You also have to specify all possible
arguments -- you can't have a default.  And if you need to take an
argument and pass it to another function you call, you have to repeat it
again.

2. Option

i.e., -v --components='a b'

This is easier to read if you use long options (or common ones like -v
or -y), and you can also have defaults.  Additionally, you can have a
certain amount of error checking -- if you get an argument you don't
recognize you can throw an error.

But you have to write a lot of boilerplate option processing at the top
of each function; and if you have anything a bit complex (line
--components= above), it makes it a lot harder to read.  And you have
to write new code every time, giving you more opportunity for bugs.

Consider the argument parsing code you have in raise at the moment -- it
assumes that you will run a command without any additional parameters.
What if you wanted to do something like this:

raise -v build xen

The current code would have to be refactored.

And you have to write a lot of custom code to check to make sure that
all the required arguments have been passed in.

And you have the same problem that if you need to pass an argument
further down the call stack, you have to repeat it again.

3. Global variables

i.e.,
VAR=foo
 function

This is a bit easier to read in the sense that you're using full
variable names for options.  It's easier to write, since you don't have
any parsing code.  Additionally, variables can be passed further down
the calling stack automatically without having to be explicitly copied
around.  In theory you should be able to have sensible local defaults.

But you have all the problems with global variables being really hard to
debug.

4. Prefix variable

i.e., YES=y check-builddep

This is a bit easier to read in the sense that you're using full
variable names for options.  It's easier to write, since you don't have
any parsing code.  Additionally, variables can be passed further down
the calling stack automatically without having to be explicitly copied
around.  In theory you should be able to have sensible local defaults.

I hadn't thought about this as a calling convention.  It certainly has
the advantage that it's build into bash and fairly well-understood by
experienced users.

But I do think it's a bit unnatural to have to prefix all the variables
to the function.

5. VAR=VAL automatic parsing with inheritance

This allows you to have a rich, consistent way to pass in arguments that
are descriptive.  It's easy to write code to parse the arguments and
check to make sure you have all the ones you need: Just add $arg_parse
and $requireargs at the top of your function.  They are automatically
declared local, so there's no worry about polluting the global
namespace.  They are automatically inherited, so you don't have to keep
passing information further down the stack.  And you can have sensible
defaults.

I think #5 is well worth the little bit of macro hackery confined to a
handful of well-tested macros, particularly if raise becomes a fairly
large and fully-functional script.  I've been using this calling
convention for several years now in my

Re: [Xen-devel] [PATCH 2/2] xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code

Hi Manish,

On 13/04/15 08:48, Manish Jaggi wrote:
 Add ARM PCI Support
 ---
 a) Place holder functions are added for pci_conf_read/write calls.
 b) Macros dev_is_pci, pci_to_dev are implemented in
 drivers/passthrough/pci/arm code
 
 Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com
 ---
  xen/arch/arm/Makefile|1 +
  xen/arch/arm/pci.c   |   60 +++
  xen/drivers/passthrough/arm/Makefile |1 +
  xen/drivers/passthrough/arm/pci.c|   88
 ++
  xen/drivers/passthrough/arm/smmu.c   |1 -
  xen/drivers/passthrough/pci.c|9 ++--
  xen/include/asm-arm/device.h |   33 +
  xen/include/asm-arm/domain.h |3 ++
  xen/include/asm-arm/pci.h|7 +--
  9 files changed, 186 insertions(+), 17 deletions(-)
 
 diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
 index 935999e..aaf5d88 100644
 --- a/xen/arch/arm/Makefile
 +++ b/xen/arch/arm/Makefile
 @@ -39,6 +39,7 @@ obj-y += device.o
  obj-y += decode.o
  obj-y += processor.o
  obj-y += smc.o
 +obj-$(HAS_PCI) += pci.o
  
  #obj-bin-y += o
  
 diff --git a/xen/arch/arm/pci.c b/xen/arch/arm/pci.c
 new file mode 100644
 index 000..9438f41
 --- /dev/null
 +++ b/xen/arch/arm/pci.c
 @@ -0,0 +1,60 @@
 +#include xen/pci.h
 +
 +void _pci_conf_write(

The _ is not necessary here. Please name the function pci_conf_write.

 +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
 +uint32_t reg, uint32_t data, int bytes)

unsigned int bytes?

 +{
 +}
 +
 +uint32_t _pci_conf_read(
 +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
 +uint32_t reg, int bytes)
 +{
 +return 0;
 +}

Same here.

 +
 +uint8_t pci_conf_read8(
 +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
 +uint32_t reg)
 +{
 +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 1);
 +}
 +
 +uint16_t pci_conf_read16(
 +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
 +uint32_t reg)
 +{
 +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 2);

Wrong cast.

 +}
 +
 +uint32_t pci_conf_read32(
 +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func,
 +uint32_t reg)
 +{
 +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 4);

Wrong cast.

[..]

 diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
 index 004aba9..68c292b 100644
 --- a/xen/drivers/passthrough/pci.c
 +++ b/xen/drivers/passthrough/pci.c
 @@ -1252,9 +1252,12 @@ static int assign_device(struct domain *d, u16
 seg, u8 bus, u8 devfn)
  /* Prevent device assign if mem paging or mem sharing have been
   * enabled for this domain */
  if ( unlikely(!need_iommu(d) 
 -(d-arch.hvm_domain.mem_sharing_enabled ||
 - d-mem_event-paging.ring_page ||
 - p2m_get_hostp2m(d)-global_logdirty)) )
 +(d-mem_event-paging.ring_page
 +#ifdef CONFIG_X86
 + || d-arch.hvm_domain.mem_sharing_enabled ||
 + p2m_get_hostp2m(d)-global_logdirty
 +#endif
 +)) )

Please avoid #ifdef CONFIG_* and introduce an arch macro.

  return -EXDEV;
  
  if ( !spin_trylock(pcidevs_lock) )
 diff --git a/xen/include/asm-arm/device.h b/xen/include/asm-arm/device.h
 index a72f7c9..009ff0a 100644
 --- a/xen/include/asm-arm/device.h
 +++ b/xen/include/asm-arm/device.h
 @@ -6,6 +6,17 @@
  enum device_type
  {
  DEV_DT,
 +DEV_ENUMERATED,
 +};
 +
 +enum device_class
 +{
 +DEVICE_SERIAL,
 +DEVICE_IOMMU,
 +DEVICE_GIC,
 +DEVICE_PCI,
 +/* Use for error */
 +DEVICE_UNKNOWN,
  };

Why? It will be very unlikely that we have to create a structure device
for the IOMMU, GIC and SERIAL.

It would make more sense to add a DEV_PCI directly to device_type.

  
  struct dev_archdata {
 @@ -16,28 +27,30 @@ struct dev_archdata {
  struct device
  {
  enum device_type type;
 +enum device_class class;
  #ifdef HAS_DEVICE_TREE
  struct dt_device_node *of_node; /* Used by drivers imported from
 Linux */
  #endif
  struct dev_archdata archdata;
 +#ifdef HAS_PCI
 +void *pci_dev;

This field is not necessary. I've added one for the DT for keeping
compatibility with Linux.

 +#endif

[..]

 +int dev_is_pci(device_t *dev);

This could easily be a macro.

  
  struct device_desc {
  /* Device name */
 diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
 index 9e0419e..41ae847 100644
 --- a/xen/include/asm-arm/domain.h
 +++ b/xen/include/asm-arm/domain.h
 @@ -42,6 +42,8 @@ struct vtimer {
  uint64_t cval;
  };
  
 +#define has_arch_pdevs(d)(!list_empty((d)-arch.pdev_list))
 +
  struct arch_domain
  {
  #ifdef CONFIG_ARM_64
 @@ -56,6 +58,7 @@ struct arch_domain
  xen_pfn_t *grant_table_gpfn;
  
  struct io_handler io_handlers;
 +struct list_head pdev_list;
  /* Continuable domain_relinquish_resources(). */
  enum {

[Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4

Hi David,

I seem to have spotted some trouble with a 4.0 dom0 kernel with the 
devel/for-linus-4.1 branch pulled on top.

Does this remind you of any specific commits in the devel/for-linus-4.1 branch 
that could
likely be involved that i could try to revert ?

--
Sander


I now get a very large number of these at guest with passthrough start:
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001ed type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001ee type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001ef type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001f0 type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001f1 type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001f2 type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001f3 type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001f4 type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001f5 type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001f6 type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001f7 type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001f8 type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001f9 type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001fa type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001fb type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001fc type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001fd type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001fe type:4
(XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! 
gfn=001ff type:4
(XEN) [2015-04-12 14:55:20.226] memory_map: error -22 removing dom1 access to 
[fe200,fe3ff]


But the guest starts after that .. and the device seems to work fine:
(d1) [2015-04-12 14:55:20.230] pci dev 05:0 INTA-IRQ10
(d1) [2015-04-12 14:55:20.246] No RAM in high memory; setting high_mem resource 
base to 1
(d1) [2015-04-12 14:55:20.247] pci dev 03:0 bar 10 size 00200: 0f008
(d1) [2015-04-12 14:55:20.248] pci dev 02:0 bar 14 size 00100: 0f208
(d1) [2015-04-12 14:55:20.250] pci dev 05:0 bar 10 size 00020: 0f304
(XEN) [2015-04-12 14:55:20.250] memory_map:add: dom1 gfn=f3000 mfn=fe200 nr=200
(d1) [2015-04-12 14:55:20.255] pci dev 04:0 bar 30 size 4: 0f320
(d1) [2015-04-12 14:55:20.257] pci dev 04:0 bar 10 size 2: 0f324
(d1) [2015-04-12 14:55:20.257] pci dev 03:0 bar 30 size 1: 0f326
(d1) [2015-04-12 14:55:20.259] pci dev 03:0 bar 14 size 01000: 0f327
(d1) [2015-04-12 14:55:20.260] pci dev 02:0 bar 10 size 00100: 0c001
(d1) [2015-04-12 14:55:20.262] pci dev 04:0 bar 14 size 00040: 0c101
(d1) [2015-04-12 14:55:20.263] pci dev 01:1 bar 20 size 00010: 0c141
(d1) [2015-04-12 14:55:20.266] Multiprocessor initialisation:
(d1) [2015-04-12 14:55:20.286]  - CPU0 ... 48-bit phys ... fixed MTRRs ... var 
MTRRs [1/8] ... done.
(d1) [2015-04-12 14:55:20.307]  - CPU1 ... 48-bit phys ... fixed MTRRs ... var 
MTRRs [1/8] ... done.
(d1) [2015-04-12 14:55:20.333]  - CPU2 ... 48-bit phys ... fixed MTRRs ... var 
MTRRs [1/8] ... done.
(d1) [2015-04-12 14:55:20.359]  - CPU3 ... 48-bit phys ... fixed MTRRs ... var 
MTRRs [1/8] ... done.
(d1) [2015-04-12 14:55:20.359] Testing HVM environment:
(d1) [2015-04-12 14:55:20.375]  - REP INSB across page boundaries ... passed
(d1) [2015-04-12 14:55:20.388]  - GS base MSRs and SWAPGS ... passed
(d1) [2015-04-12 14:55:20.388] Passed 2 of 2 tests
(d1) [2015-04-12 14:55:20.388] Writing SMBIOS tables ...
(d1) [2015-04-12 14:55:20.389] Loading SeaBIOS ...
(d1) [2015-04-12 14:55:20.389] Creating MP tables ...
(d1) [2015-04-12 14:55:20.389] Loading ACPI ...
(d1) [2015-04-12 14:55:20.390] vm86 TSS at fc00a200
(d1) [2015-04-12 14:55:20.391] BIOS map:
(d1) [2015-04-12 14:55:20.391]  1-100d3: Scratch space
(d1) [2015-04-12 14:55:20.391]  c-f: Main BIOS
(d1) [2015-04-12 14:55:20.391] E820 table:
(d1) [2015-04-12 14:55:20.391]  [00]: : - :000a: RAM
(d1) [2015-04-12 14:55:20.391]  HOLE: :000a - :000c
(d1) [2015-04-12 14:55:20.391]  [01]: :000c - :0010: 
RESERVED
(d1) [2015-04-12 14:55:20.391]  [02]: :0010 - :3f80: RAM
(d1) [2015-04-12 14:55:20.391]  HOLE: :3f80 - :fc00
(d1) [2015-04-12 14:55:20.391]  [03]: :fc00 - 0001:: 
RESERVED
(d1) [2015-04-12 14:55:20.391] Invoking SeaBIOS ...
(d1)

Re: [Xen-devel] Outstanding fixes for Linux 4.1?

On 02/04/15 20:22, Konrad Rzeszutek Wilk wrote:
 On Thu, Apr 02, 2015 at 05:21:50PM +0100, David Vrabel wrote:
 All,

 Are there any outstanding patches that should be added to Linux 4.1
 (the merge window for which will be opening shortly)?
 
 Bob's two patches to the deferred grant mechanism.

These need acking by a blkback maintainer.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

On Mon, 13 Apr 2015, Jan Beulich wrote:
  On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote:
  On Wed, 25 Mar 2015, Jan Beulich wrote:
  When a device gets detached from a guest, pciback will clear its
  command register, thus disabling both memory and I/O decoding. The
  disabled memory decoding, however, has an effect on the MSI-X table
  accesses the hypervisor does: These won't have the intended effect
  anymore. Even worse, for PCIe devices (but not SR-IOV virtual
  functions) such accesses may (will?) be treated as Unsupported
  Requests, causing respective errors to be surfaced, potentially in the
  form of NMIs that may be fatal to the hypervisor or Dom0 is different
  ways. Hence rather than carrying out these accesses, we should avoid
  them where we can, and use alternative (e.g. PCI config space based)
  mechanisms to achieve at least the same effect.
  
  I don't think that it is a good idea for both Xen and Linux to access
  the command register simultaneously.  Working around Linux in Xen
  doesn't sound like an optimal solution.   Maybe we could just fix the
  pciback and that would be enough.
 
 I'm afraid that would just eliminate the specific case, but not the
 general issue.

If we trust Dom0 to do the right thing, then I don't think there is a
general issue to be solved. Dom0 can break the system at any time, I
don't see any differences here, unless we have a plan to actually be
able to handle a misbehaving dom0, in that case I am all for it.


 While we trust Dom0 to not do outright bad things,
 the hypervisor should still avoid doing things that can go wrong
 due to the state a device is put (or left) in by Dom0.

Xen should also avoid doing things that can go wrong because of the
state a device is put in by QEMU or other components in the system.
There isn't much room for Xen to play with.


  In any case we should make it clear somewhere who is supposed to write
  to the command register (and other PCI reigsters) at any given time,
  otherwise it would be very easy for a new kernel update to break the
  hypervisor and we wouldn't even know why it happened.
 
 We should, but at this point in time this is going to be rather
 problematic. Such a separation of responsibilities should have been
 done before all the pass-through code got written.

That is true.


  @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_
   }
   break;
   case PCI_CAP_ID_MSIX:
  -{
  -int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
  -writel(flag, entry-mask_base + offset);
  -readl(entry-mask_base + offset);
  -break;
  -}
  +if ( likely(memory_decoded(pdev)) )
  +{
  +writel(flag, entry-mask_base + 
  PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
  +readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
  +break;
  +}
  +if ( flag )
  +{
  +u16 control;
  +domid_t domid = pdev-domain-domain_id;
  +
  +control = pci_conf_read16(seg, bus, slot, func,
  +  
  msix_control_reg(entry-msi_attrib.pos));
  +if ( control  PCI_MSIX_FLAGS_MASKALL )
  +break;
  +pci_conf_write16(seg, bus, slot, func,
  + msix_control_reg(entry-msi_attrib.pos),
  + control | PCI_MSIX_FLAGS_MASKALL);
  
  How is that going to interact with Linux (Dom0) writing to the command
  register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for
  devices assigned to HVM guests. Could this cause any conflicts?
 
 I don't see the relation to the comment register here - all quoted code
 only deals with the MSI-X control register.

I meant control register, sorry for the confusion.


  One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but
  as a matter of fact QEMU has done that for years and we cannot break
  that behaviour without introducing regressions.  In fact as it stands
  QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM
  guests, not Xen.
  
  Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed
  through devices to running domains?  I think that might be a good enough
  separation of responsibilities between Xen and QEMU.
 
 No, the bits has to be under hypervisor control (just like any other
 hardware bits controlling interrupts).

Maybe they should be under hypervisor control, but keep in mind that
they are not and they haven't been for years now.  We cannot go ahead
and break existing code paths and use cases. As it stands, QEMU is in
charge of PCI_MSIX_FLAGS_MASKALL for passthrough devices and this is a
regression.


 Dealing with this is the subject of patches I have ready, but didn't post yet.

OK, that is very good. But if we really want to make this change,
then I think that those patches would need to be part of this series so
that they can be committed

Re: [Xen-devel] [PATCH v3 3/8] x86: detect and initialize Intel CAT feature

 On 27.03.15 at 19:31, andrew.coop...@citrix.com wrote:
 On 26/03/15 12:38, Chao Peng wrote:
 +cpuid_count(PSR_CPUID_LEVEL_CAT, 0, eax, ebx, ecx, edx);
 +if ( ebx  PSR_RESOURCE_TYPE_L3 )
 +{
 +cpuid_count(PSR_CPUID_LEVEL_CAT, 1, eax, ebx, ecx, edx);
 +info-cbm_len = (eax  0x1f) + 1;
 +info-cos_max = (edx  0x);
 +
 +info-enabled = 1;
 +printk(XENLOG_DEBUG CAT: enabled on socket %u, cos_max:%u, 
 cbm_len:%u\n,
 +   socket, info-cos_max, info-cbm_len);
 
 I would bump this to INFO, at least for socket 0.  Similar to CMT, a
 user which has gone to the effort of enabling CAT will want some
 indication in the boot log without also having to bump the logging level.

Which INFO wouldn't be sufficient for (at least not with the xen.org
defaults, not sure about XenServer's).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC][PATCH 10/13] tools: extend XENMEM_set_memory_map

2015-04-13 Thread Wei Liu

On Mon, Apr 13, 2015 at 10:09:51AM +0800, Chen, Tiejun wrote:
[...]
 Hardcoded value?
 
 Yes. Actually, we intend to use this to present that lowmem entry,
 
 tools/firmware/hvmloader/e820.c:
 
 /* Low RAM goes here. Reserve space for special pages. */
 ...
 e820[nr].addr = 0x10;
 

I don't like the idea of having two hardcoded values in different
locations. Please put this value into a header file and reference it
here and in hvmloader.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

On Mon, 13 Apr 2015, Jan Beulich wrote:
  On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote:
  On Mon, 13 Apr 2015, Jan Beulich wrote:
   On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote:
   On Wed, 25 Mar 2015, Jan Beulich wrote:
   When a device gets detached from a guest, pciback will clear its
   command register, thus disabling both memory and I/O decoding. The
   disabled memory decoding, however, has an effect on the MSI-X table
   accesses the hypervisor does: These won't have the intended effect
   anymore. Even worse, for PCIe devices (but not SR-IOV virtual
   functions) such accesses may (will?) be treated as Unsupported
   Requests, causing respective errors to be surfaced, potentially in the
   form of NMIs that may be fatal to the hypervisor or Dom0 is different
   ways. Hence rather than carrying out these accesses, we should avoid
   them where we can, and use alternative (e.g. PCI config space based)
   mechanisms to achieve at least the same effect.
   
   I don't think that it is a good idea for both Xen and Linux to access
   the command register simultaneously.  Working around Linux in Xen
   doesn't sound like an optimal solution.   Maybe we could just fix the
   pciback and that would be enough.
  
  I'm afraid that would just eliminate the specific case, but not the
  general issue.
  
  If we trust Dom0 to do the right thing, then I don't think there is a
  general issue to be solved. Dom0 can break the system at any time, I
  don't see any differences here, unless we have a plan to actually be
  able to handle a misbehaving dom0, in that case I am all for it.
 
 No, that gets us in the wrong direction. Dom0 can have legitimate
 reasons to have to clear memory or I/O decoding on a device at
 run time (even if current Linux doesn't do so). The more general
 problem we may need to solve is that of racing config space
 accesses (one by Dom0, the other by the hypervisor). But that's
 beyond this series' scope.

This is why I was asking for a document that describes who is in charge
of what and when. I don't think we can move forward without it.


  While we trust Dom0 to not do outright bad things,
  the hypervisor should still avoid doing things that can go wrong
  due to the state a device is put (or left) in by Dom0.
  
  Xen should also avoid doing things that can go wrong because of the
  state a device is put in by QEMU or other components in the system.
  There isn't much room for Xen to play with.
 
 Qemu is either part of Dom0, or doesn't play with devices directly.

I don't understand the point you are trying to make here.
If your intention is to point out that QEMU shouldn't be writing to the
control register, as I wrote earlier, the current codebase disagrees
with you and has been that way for years.


  And how are we going to deal with older unfixed QEMUs?
  So far we have been using the same policy for QEMU and the Dom0 kernel:
  Xen doesn't break them -- old Linux kernels and QEMUs are supposed to
  just work.
 
 I'm not sure that's really true for qemu, or if it is, then only by pure
 luck:

This is false, it is so by design.  QEMU has an internal libxc
compatibility layer.


 The tool stack interface of the hypervisor as well as the libxc
 interfaces are subject to change between any two releases.

QEMU knows how to cope with libxc interface changes.


 I view it as unavoidable to break older qemu here.

I disagree and I am opposed to any patches series that would
deliberately break QEMU.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/efi: Reserve SMBIOS table region when EFI booting

 On 02.04.15 at 10:24, ross.lagerw...@citrix.com wrote:
 Some EFI firmware implementations may place the SMBIOS table in RAM
 marked as BootServicesData, which Xen does not consider as reserved.
 When dom0 tries to access the SMBIOS, the region is not contained in the
 initial P2M and it crashes with a page fault. To fix this, reserve the
 SMBIOS region.
 
 Also, fix the memcmp check for existence of the SMBIOS

As Andrew said, put the memcmp() fixes all in a separate patch, or
fold the one remaining one in here.

 and the DMI
 checksum calculation.

This I don't agree with, and even less so without explanation. The
spec says

15h
 Intermediate Checksum
 BYTE
 Checksum of Intermediate Entry Point Structure (IEPS). This value,
 when added to all other bytes in the IEPS, results in the value 00h
 (using 8-bit addition calculations). Values in the IEPS are summed
 starting at offset 10h, for 0Fh bytes.

I.e. I can't see why the sizeof() you replace would be wrong (other
than for broken firmware).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register

2015-04-13 Thread Michael S. Tsirkin

On Mon, Apr 13, 2015 at 01:40:59PM +0100, Jan Beulich wrote:
  On 13.04.15 at 13:47, m...@redhat.com wrote:
  On Mon, Apr 13, 2015 at 12:34:34PM +0100, Jan Beulich wrote:
   On 13.04.15 at 13:19, m...@redhat.com wrote:
   Yes Linux can't fix firmware 1st mode, but
   PCI express spec says what firmware should do in this case:
   
   IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices
   
   With 1.0a device Functions, 96 if the Unsupported Request 
   Reporting 
  
   Enable bit is set, the Function
   when operating as a Completer will send an uncorrectable error 
   Message (if enabled) when a UR
   error is detected. On platforms where an uncorrectable error 
  Message 
   is handled as a System Error,
   this will break PC-compatible Configuration Space probing, so 
   software/firmware on such
   platforms may need to avoid setting the Unsupported Request 
   Reporting Enable bit.
   With device Functions implementing Role-Based Error Reporting, 
   setting the Unsupported Request
   Reporting Enable bit will not interfere with PC-compatible 
   Configuration Space probing, assuming
   that the severity for UR is left at its default of non-fatal. 
   However, setting the Unsupported Request
   Reporting Enable bit will enable the Function to report UR 
   errors 
   detected with posted Requests,
   helping avoid this case for potential silent data corruption.
   On platforms where robust error handling and PC-compatible 
   Configuration Space probing is
   required, it is suggested that software or firmware have the 
   Unsupported Request Reporting Enable
   bit Set for Role-Based Error Reporting Functions, but clear for 
   1.0a 
  
   Functions. Software or
   firmware can distinguish the two classes of Functions by 
   examining 
   the Role-Based Error Reporting
   bit in the Device Capabilities register.
   
   
   What I think you have is a very old 1.0a system, and you set Unsupported
   Request Reporting Enable.
   
   Can you confirm?
  
  No. In at least one of the two cases we got reports of the original
  problem, triggering the finding of this issue, this is a brand new one,
  only soon to become available publicly. Furthermore I'm being
  confused by the mention of PC-compatible config space probing
  above: The URs we talk about here don't result from config space
  accessed at all.
  
  OK. Can you please explain why does UR cause a system error then?
  It looks like a hardware bug: PCIE 1.1 seems to say it shouldn't.
 
 Quite possible. Looking at the ITP log we were provided, the UR
 severity bit is clear (non-fatal), yet the error got surfaced to the
 OS as a fatal one (I would guess because it validly gets flagged as
 uncorrectable at the same time).
 
 Jan

No, that's not valid.
Can you check device capabilities register, offset 0x4 within
pci express capability structure?
Bit 15 is 15 Role-Based Error Reporting.
Is it set?

The spec says:

15
On platforms where robust error handling and PC-compatible 
Configuration Space probing is
required, it is suggested that software or firmware have the 
Unsupported Request Reporting Enable
bit Set for Role-Based Error Reporting Functions, but clear for 1.0a 
Functions. Software or
firmware can distinguish the two classes of Functions by examining the 
Role-Based Error Reporting
bit in the Device Capabilities register.

-- 
MST

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCHv2 1/6] x86/hvm: don't include asm/spinlock.h

On 10/04/15 16:24, Andrew Cooper wrote:
 On 10/04/15 15:19, David Vrabel wrote:
 asm/spinlock.h should not be included directly.

 Signed-off-by: David Vrabel david.vra...@citrix.com
 
 s/asm/xen/g instead of a straight delete?
 
 Otherwise you are relying on pulling in xen/spinlock.h implicitly.

None of these files declare a spinlock so getting the header implicitly
is correct, IMO.

David

  xen/arch/x86/hvm/hvm.c |1 -
  xen/arch/x86/hvm/svm/svm.c |1 -
  xen/arch/x86/hvm/vmx/vmx.c |1 -


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [xen-unstable test] 50291: regressions - FAIL

2015-04-13 Thread Ian Jackson

Jan Beulich writes (Re: [Xen-devel] [xen-unstable test] 50291: regressions - 
FAIL):
 On 10.04.15 at 19:03, ian.jack...@eu.citrix.com wrote:
  These were the only problems in this flight.  Under the circumstances
  I think we would be justified in force pushing to xen.git#master:
  
  version targeted for testing:
   xen  3dca8a17f639f4b53cd2eeb687cec4e9f83acf3d
 
 Yes please.

Done.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [xen-unstable test] 50291: regressions - FAIL

 On 13.04.15 at 15:55, ian.jack...@eu.citrix.com wrote:
 Jan Beulich writes (Re: [Xen-devel] [xen-unstable test] 50291: regressions - 
 FAIL):
 On 10.04.15 at 19:03, ian.jack...@eu.citrix.com wrote:
  These were the only problems in this flight.  Under the circumstances
  I think we would be justified in force pushing to xen.git#master:
  
  version targeted for testing:
   xen  3dca8a17f639f4b53cd2eeb687cec4e9f83acf3d
 
 Yes please.
 
 Done.

Thanks! Perhaps similarly consider flight 50355 a good one for 4.5 then?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC] tcp: Allow sk_wmem_alloc to exceed sysctl_tcp_limit_output_bytes

2015-04-13 Thread Eric Dumazet

On Mon, 2015-04-13 at 14:46 +0100, David Vrabel wrote:

  And the proof-of-concept patch for idea (b) I used was:
  
  @@ -552,6 +552,8 @@ static int xennet_start_xmit(struct sk_buff *skb,
  struct net_device *dev)
   goto drop;
   }
  
  +skb_orphan(skb);
  +
   page = virt_to_page(skb-data);
   offset = offset_in_page(skb-data);
   len = skb_headlen(skb);
 
 No. This a bunch of allocations and  a full memcpy of all the frags.

skb_orphan() does nothing like that.

But the main concern here is it basically breaks back pressure.

And you do not want this, unless there is no other choice.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH] xen-pt: Fix bug cause PCI devices re-attach failed

2015-04-13 Thread Liang Li

Use the option like 'pci=[ '07:10,1', '0b:10.1', '81:10.1']' in HVM
guest configuration file to assign more than one VF PCI devices to
a guest, after the guest boot up, detach the VFs in sequence by
'xl pci-detach $DOM_ID $VF_BDF', and then attach the VFs by
'xl pci-attach $DOM_ID $VF_BDF' in sequence, an error message will
be reported like following:

libxl: error: libxl_qmp.c:287:qmp_handle_error_response: receive
an error message from QMP server: Duplicate ID 'pci-pt-81_10.1'
for device.

The error message will be printed in two cases:
1. Attach and detach very quickly.

The message will disappear if retry, it is expected because of
the asynchronous unplug mechanism.

2. Do the attach and detach operation with a time interval. eg. 10s.

The error message will not  disappear if retry, in this case, it's
a bug.

In the 'xen_pt_region_add' and 'xen_pt_region_del', we should only care
about the 'xen-pci-pt-*' memory region, this can avoid the region's
reference count is not equal with the dereference count when the
device is detached and prevent the device's related QemuOpts object from
being released properly, and then trigger the bug when the device is
re-attached.

I sent a patch to fix a similar bug before, but the patch could not
fix the issue completely.

Signed-off-by: Liang Li liang.z...@intel.com
---
 hw/xen/xen_pt.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index f2893b2..8aab769 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -61,6 +61,7 @@
 #include qemu/range.h
 #include exec/address-spaces.h
 
+#define XEN_PT_NAME_PREFIX xen-pci-pt
 #define XEN_PT_NR_IRQS (256)
 static uint8_t xen_pt_mapped_machine_irq[XEN_PT_NR_IRQS] = {0};
 
@@ -585,6 +586,10 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
 
 static void xen_pt_region_add(MemoryListener *l, MemoryRegionSection *sec)
 {
+if (strncmp(sec-mr-name, XEN_PT_NAME_PREFIX,
+sizeof(XEN_PT_NAME_PREFIX))) {
+return;
+}
 XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState,
  memory_listener);
 
@@ -594,6 +599,11 @@ static void xen_pt_region_add(MemoryListener *l, 
MemoryRegionSection *sec)
 
 static void xen_pt_region_del(MemoryListener *l, MemoryRegionSection *sec)
 {
+if (strncmp(sec-mr-name, XEN_PT_NAME_PREFIX,
+sizeof(XEN_PT_NAME_PREFIX))) {
+return;
+}
+
 XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState,
  memory_listener);
 
@@ -603,6 +613,10 @@ static void xen_pt_region_del(MemoryListener *l, 
MemoryRegionSection *sec)
 
 static void xen_pt_io_region_add(MemoryListener *l, MemoryRegionSection *sec)
 {
+if (strncmp(sec-mr-name, XEN_PT_NAME_PREFIX,
+sizeof(XEN_PT_NAME_PREFIX))) {
+return;
+}
 XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState,
  io_listener);
 
@@ -612,6 +626,10 @@ static void xen_pt_io_region_add(MemoryListener *l, 
MemoryRegionSection *sec)
 
 static void xen_pt_io_region_del(MemoryListener *l, MemoryRegionSection *sec)
 {
+if (strncmp(sec-mr-name, XEN_PT_NAME_PREFIX,
+sizeof(XEN_PT_NAME_PREFIX))) {
+return;
+}
 XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState,
  io_listener);
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-13 Thread Peter Zijlstra

On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
 +void __init __pv_init_lock_hash(void)
 +{
 +   int pv_hash_size = 4 * num_possible_cpus();
 +
 +   if (pv_hash_size  (1U  LFSR_MIN_BITS))
 +   pv_hash_size = (1U  LFSR_MIN_BITS);
 +   /*
 +* Allocate space from bootmem which should be page-size aligned
 +* and hence cacheline aligned.
 +*/
 +   pv_lock_hash = alloc_large_system_hash(PV qspinlock,
 +  sizeof(struct pv_hash_bucket),
 +  pv_hash_size, 0, HASH_EARLY,
 +   pv_lock_hash_bits, NULL,
 +  pv_hash_size, pv_hash_size);
  pv_taps = lfsr_taps(pv_lock_hash_bits);
 
 
 I don't understand what you meant here.

Let me explain (even though I propose taking all the LFSR stuff out).

pv_lock_hash_bit is a runtime variable, therefore it cannot compile time
evaluate the forest of if statements required to compute the taps value.

Therefore its best to compute the taps _once_, and this seems like a
good place to do so.

 +   goto done;
 +   }
 +   }
 +
 +   hash = lfsr(hash, pv_lock_hash_bits, 0);
 Since pv_lock_hash_bits is a variable, you end up running through that
 massive if() forest to find the corresponding tap every single time. It
 cannot compile-time optimize it.
 
 The minimum bits size is now 8. So unless the system has more than 64 vCPUs,
 it will get the right value in the first if statement.

Still, no reason to not pre-compute the taps value, its simple enough.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] More force pushes (xen-4.5-testing, xen-unstable, qemu-mainline)

2015-04-13 Thread Ian Jackson

Here are some more which I propose to force push. In each case I show
all the blocking tests mentioned in the osstest test report.

osstest service user writes ([xen-4.5-testing test] 50373: regressions -
FAIL):
flight 50373 xen-4.5-testing real [real]
...
test-amd64-i386-freebsd10-amd64 14 guest-localmigrate/x10 fail in 50317
REGR. vs. 50268
...
version targeted for testing:
xen 0b754fb3ed6b7b6d0f2e1f7c1877d3c0a7da2168

osstest service user writes ([xen-unstable test] 50390: regressions - FAIL):
flight 50390 xen-unstable real [real]
test-amd64-i386-freebsd10-i386 13 guest-localmigrate fail REGR. vs.
36514
test-amd64-i386-freebsd10-amd64 15 guest-localmigrate.2 fail REGR. vs.
36514
...
version targeted for testing:
xen 123c7793797502b222300eb710cd3873dcca41ee

osstest service user writes ([qemu-mainline test] 50391: regressions - FAIL):
flight 50391 qemu-mainline real [real]
...
test-amd64-i386-freebsd10-i386 13 guest-localmigrate fail REGR. vs.
36709
test-amd64-i386-freebsd10-amd64 15 guest-localmigrate.2 fail REGR. vs.
36709
version targeted for testing:
qemuu58c24a4775ef7ccf16cfcd695753dfdf149fe29d

These are all FreeBSD migration tests. These seem to be failing much
more often in the new colo. As I say I don't know why this is. We
hope to fix it with the FreeBSD 10.1 update which Roger has submitted
and which is currently in the osstest self-push-gate.

In the general case they fail after a number of successful runs. In
each case the symptom is that the guest simply doesn't come back onto
the network after the one of the migrations. I double checked

http://logs.test-lab.xenproject.org/osstest/logs/50391/test-amd64-i386-freebsd10-amd64/info.html
and as with the earlier tests there doesn't seem to be any hints in
any of the logs.

These are all localhost migrations. This rules out a problem with the
colo infrastructure dropping or not honouring gratuitous ARP or MAC
address migration, since the guest's MAC and IP address remain (from
the infrastructure's point of view) on the same switch port.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] xen-pt: Fix bug cause PCI devices re-attach failed

2015-04-13 Thread Paolo Bonzini



On 13/04/2015 16:12, Liang Li wrote:
 2. Do the attach and detach operation with a time interval. eg. 10s.
 
 The error message will not  disappear if retry, in this case, it's
 a bug.
 
 In the 'xen_pt_region_add' and 'xen_pt_region_del', we should only care
 about the 'xen-pci-pt-*' memory region, this can avoid the region's
 reference count is not equal with the dereference count when the
 device is detached and prevent the device's related QemuOpts object from
 being released properly, and then trigger the bug when the device is
 re-attached.

This doesn't explain _which_ region is causing the bug and how.

Assuming this is the right fix, should you instead move the
memory_region_ref/unref pair from xen_pt_region_add/del after this
conditional:

if (bar == -1  (!s-msix || s-msix-mmio != mr)) {
return;
}

in xen_pt_region_update?

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/2] x86/domctl: cleanup

  On 01.04.15 at 17:31, andrew.coop...@citrix.com wrote:
  case XEN_DOMCTL_gettscinfo:
 -{
 -xen_guest_tsc_info_t info;
 -
 -ret = -EINVAL;
 -if ( d == current-domain ) /* no domain_pause() */
 -break;
 -
 -domain_pause(d);
 -tsc_get_info(d, info.tsc_mode,
 -info.elapsed_nsec,
 -info.gtsc_khz,
 -info.incarnation);
 -if ( copy_to_guest(domctl-u.tsc_info.out_info, info, 1) )
 -ret = -EFAULT;
 +if ( d == currd ) /* no domain_pause() */
 +ret = -EINVAL;
  else
 -ret = 0;
 -domain_unpause(d);
 -}
 -break;
 +{
 +xen_guest_tsc_info_t info;
 +
 +domain_pause(d);
 +tsc_get_info(d, info.tsc_mode,
 + info.elapsed_nsec,
 + info.gtsc_khz,
 + info.incarnation);
 +domain_unpause(d);
 +copyback = 1;

If you want to use copyback here, you need to pass pointers into
domctl-u.tsc_info.out_info to tsc_get_info().

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v9 10/15] tools/libxc: x86 HVM save code

2015-04-13 Thread Vitaly Kuznetsov

Andrew Cooper andrew.coop...@citrix.com writes:

 Save the x86 HVM specific parts of the domain.  This is considerably simpler
 than an x86 PV domain.  Only the HVM_CONTEXT and HVM_PARAMS records are
 needed.

 There is no need for any page normalisation.

 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
 CC: Ian Campbell ian.campb...@citrix.com
 CC: Ian Jackson ian.jack...@eu.citrix.com
 CC: Wei Liu wei.l...@citrix.com
 ---
  tools/libxc/Makefile |1 +
  tools/libxc/xc_sr_common.h   |1 +
  tools/libxc/xc_sr_save_x86_hvm.c |  220 
 ++
  3 files changed, 222 insertions(+)
  create mode 100644 tools/libxc/xc_sr_save_x86_hvm.c

 diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
 index fc1e668..ba124f4 100644
 --- a/tools/libxc/Makefile
 +++ b/tools/libxc/Makefile
 @@ -58,6 +58,7 @@ GUEST_SRCS-$(CONFIG_X86) += xc_sr_common_x86.c
  GUEST_SRCS-$(CONFIG_X86) += xc_sr_common_x86_pv.c
  GUEST_SRCS-$(CONFIG_X86) += xc_sr_restore_x86_pv.c
  GUEST_SRCS-$(CONFIG_X86) += xc_sr_save_x86_pv.c
 +GUEST_SRCS-$(CONFIG_X86) += xc_sr_save_x86_hvm.c
  GUEST_SRCS-y += xc_sr_restore.c
  GUEST_SRCS-y += xc_sr_save.c
  GUEST_SRCS-y += xc_offline_page.c xc_compression.c
 diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
 index 4f33839..ec14d59 100644
 --- a/tools/libxc/xc_sr_common.h
 +++ b/tools/libxc/xc_sr_common.h
 @@ -286,6 +286,7 @@ struct xc_sr_context
  };

  extern struct xc_sr_save_ops save_ops_x86_pv;
 +extern struct xc_sr_save_ops save_ops_x86_hvm;

  extern struct xc_sr_restore_ops restore_ops_x86_pv;

 diff --git a/tools/libxc/xc_sr_save_x86_hvm.c 
 b/tools/libxc/xc_sr_save_x86_hvm.c
 new file mode 100644
 index 000..0928f19
 --- /dev/null
 +++ b/tools/libxc/xc_sr_save_x86_hvm.c
 @@ -0,0 +1,220 @@
 +#include assert.h
 +
 +#include xc_sr_common_x86.h
 +
 +#include xen/hvm/params.h
 +
 +/*
 + * Query for the HVM context and write an HVM_CONTEXT record into the stream.
 + */
 +static int write_hvm_context(struct xc_sr_context *ctx)
 +{
 +xc_interface *xch = ctx-xch;
 +int rc, hvm_buf_size;
 +struct xc_sr_record hvm_rec =
 +{
 +.type = REC_TYPE_HVM_CONTEXT,
 +};
 +
 +hvm_buf_size = xc_domain_hvm_getcontext(xch, ctx-domid, 0, 0);
 +if ( hvm_buf_size  0 )
 +{
 +PERROR(Couldn't get HVM context size from Xen);
 +rc = -1;
 +goto out;
 +}
 +
 +hvm_rec.data = malloc(hvm_buf_size);
 +if ( !hvm_rec.data )
 +{
 +PERROR(Couldn't allocate memory);
 +rc = -1;
 +goto out;
 +}
 +
 +hvm_buf_size = xc_domain_hvm_getcontext(xch, ctx-domid,
 +hvm_rec.data, hvm_buf_size);
 +if ( hvm_buf_size  0 )
 +{
 +PERROR(Couldn't get HVM context from Xen);
 +rc = -1;
 +goto out;
 +}
 +
 +hvm_rec.length = hvm_buf_size;
 +rc = write_record(ctx, hvm_rec);
 +if ( rc  0 )
 +{
 +PERROR(error write HVM_CONTEXT record);
 +goto out;
 +}
 +
 + out:
 +free(hvm_rec.data);
 +return rc;
 +}
 +
 +/*
 + * Query for a range of HVM parameters and write an HVM_PARAMS record into 
 the
 + * stream.
 + */
 +static int write_hvm_params(struct xc_sr_context *ctx)
 +{
 +static const unsigned int params[] = {
 +HVM_PARAM_STORE_PFN,
 +HVM_PARAM_IOREQ_PFN,
 +HVM_PARAM_BUFIOREQ_PFN,
 +HVM_PARAM_PAGING_RING_PFN,
 +HVM_PARAM_MONITOR_RING_PFN,
 +HVM_PARAM_SHARING_RING_PFN,
 +HVM_PARAM_VM86_TSS,
 +HVM_PARAM_CONSOLE_PFN,
 +HVM_PARAM_ACPI_IOPORTS_LOCATION,
 +HVM_PARAM_VIRIDIAN,
 +HVM_PARAM_IDENT_PT,
 +HVM_PARAM_PAE_ENABLED,
 +HVM_PARAM_VM_GENERATION_ID_ADDR,
 +HVM_PARAM_IOREQ_SERVER_PFN,
 +HVM_PARAM_NR_IOREQ_SERVER_PAGES,
 +};
 +
 +xc_interface *xch = ctx-xch;
 +struct xc_sr_rec_hvm_params_entry entries[ARRAY_SIZE(params)];
 +struct xc_sr_rec_hvm_params hdr = {
 +.count = 0,
 +};
 +struct xc_sr_record rec = {
 +.type   = REC_TYPE_HVM_PARAMS,
 +.length = sizeof(hdr),
 +.data   = hdr,
 +};
 +unsigned int i;
 +int rc;
 +
 +for ( i = 0; i  ARRAY_SIZE(params); i++ )
 +{
 +uint32_t index = params[i];
 +uint64_t value;
 +
 +rc = xc_hvm_param_get(xch, ctx-domid, index, value);
 +if ( rc )
 +{
 +PERROR(Failed to get HVMPARAM at index %u, index);
 +return rc;
 +}
 +
 +if ( value != 0 )
 +{
 +entries[hdr.count].index = index;
 +entries[hdr.count].value = value;
 +hdr.count++;
 +}
 +}

While reviewing my 'soft reset' series Ian C raised a question about the
unsafeness of sequential get/set of HVM params:
http://lists.xen.org/archives/html/xen-devel/2015-01/msg01177.html

In case the concern is valid I think the requirements for migration are

Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4


Monday, April 13, 2015, 2:21:21 PM, you wrote:

 On 13/04/15 13:14, Sander Eikelenboom wrote:
 
 Monday, April 13, 2015, 2:07:02 PM, you wrote:
 
 On 13/04/15 12:21, Sander Eikelenboom wrote:

 Monday, April 13, 2015, 11:50:51 AM, you wrote:

 On 13/04/15 10:39, Sander Eikelenboom wrote:
 Hi David,

 I seem to have spotted some trouble with a 4.0 dom0 kernel with the 
 devel/for-linus-4.1 branch pulled on top.

 Does this remind you of any specific commits in the devel/for-linus-4.1 
 branch that could
 likely be involved that i could try to revert ?

 Yes.  This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve
 performance of MMAPBATCH_V2) which makes the kernel try harder to map
 all GFNs instead of failing on the first one.

 I think this is qemu incorrectly trying to map GFNs.

 David

 Reverted that specific one, but still get those messages.
 
 You'll have to bisect it then.  Because I don't see any other relevant
 commits in devel/for-linus-4.1
 
 David
 
 Ok .. hmm first candidate of the bisect also looks interessting:
 [628c28eefd6f2cef03b212081b466ae43fd093a3] xen: unify foreign GFN map/unmap 
 for auto-xlated physmap guests

 Unless your dom0 is PVH, no.

 David

Nope .. but bisect is underway .. so hopefully we will know it soon enough :-)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-13 Thread Jonathan Davies


On 13/04/15 11:56, George Dunlap wrote:

On Thu, Apr 9, 2015 at 5:36 PM, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:

On Thu, 9 Apr 2015, Eric Dumazet wrote:

On Thu, 2015-04-09 at 16:46 +0100, Stefano Stabellini wrote:

Hi all,

I found a performance regression when running netperf -t TCP_MAERTS from
an external host to a Xen VM on ARM64: v3.19 and v4.0-rc4 running in the
virtual machine are 30% slower than v3.18.

Through bisection I found that the perf regression is caused by the
prensence of the following commit in the guest kernel:


commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89
Author: Eric Dumazet eduma...@google.com
Date:   Sun Dec 7 12:22:18 2014 -0800

 tcp: refine TSO autosizing


[snip]


I recently discussed this issue on netdev in the following thread:

https://www.marc.info/?l=linux-netdevm=142738853820517


This commit restored original TCP Small Queue behavior, which is the
first step to fight bufferbloat.

Some network drivers are known to be problematic because of a delayed TX
completion.


[snip]


Try to tweak /proc/sys/net/ipv4/tcp_limit_output_bytes to see if it
makes a difference ?


A very big difference:

echo 262144  /proc/sys/net/ipv4/tcp_limit_output_bytes
brings us much closer to the original performance, the slowdown is just
8%

echo 1048576  /proc/sys/net/ipv4/tcp_limit_output_bytes
fills the gap entirely, same performance as before refine TSO
autosizing


What would be the next step for here?  Should I just document this as an
important performance tweaking step for Xen, or is there something else
we can do?


Is the problem perhaps that netback/netfront delays TX completion?
Would it be better to see if that can be addressed properly, so that
the original purpose of the patch (fighting bufferbloat) can be
achieved while not degrading performance for Xen?  Or at least, so
that people get decent perfomance out of the box without having to
tweak TCP parameters?


I agree; reducing the completion latency should be the ultimate goal. 
However, that won't be easy, so we need a work-around in the short term. 
I don't like the idea of relying on documenting the recommendation to 
change tcp_limit_output_bytes; too many people won't read this advice 
and will expect the out-of-the-box defaults to be reasonable.


Following Eric's pointers to where a similar problem had been 
experienced in wifi drivers, I came up with two proof-of-concept patches 
that gave a similar performance gain without any changes to sysctl 
parameters or core tcp/ip code. See 
https://www.marc.info/?l=linux-netdevm=142746161307283.


I haven't yet received any feedback from the xen-netfront maintainers 
about whether those ideas could be reasonably adopted.


Jonathan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register

 On 13.04.15 at 13:47, m...@redhat.com wrote:
 On Mon, Apr 13, 2015 at 12:34:34PM +0100, Jan Beulich wrote:
  On 13.04.15 at 13:19, m...@redhat.com wrote:
  Yes Linux can't fix firmware 1st mode, but
  PCI express spec says what firmware should do in this case:
  
  IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices
  
  With 1.0a device Functions, 96 if the Unsupported Request 
  Reporting 
 
  Enable bit is set, the Function
  when operating as a Completer will send an uncorrectable error 
  Message (if enabled) when a UR
  error is detected. On platforms where an uncorrectable error 
 Message 
  is handled as a System Error,
  this will break PC-compatible Configuration Space probing, so 
  software/firmware on such
  platforms may need to avoid setting the Unsupported Request 
  Reporting Enable bit.
  With device Functions implementing Role-Based Error Reporting, 
  setting the Unsupported Request
  Reporting Enable bit will not interfere with PC-compatible 
  Configuration Space probing, assuming
  that the severity for UR is left at its default of non-fatal. 
  However, setting the Unsupported Request
  Reporting Enable bit will enable the Function to report UR errors 
  detected with posted Requests,
  helping avoid this case for potential silent data corruption.
  On platforms where robust error handling and PC-compatible 
  Configuration Space probing is
  required, it is suggested that software or firmware have the 
  Unsupported Request Reporting Enable
  bit Set for Role-Based Error Reporting Functions, but clear for 
  1.0a 
 
  Functions. Software or
  firmware can distinguish the two classes of Functions by examining 
  the Role-Based Error Reporting
  bit in the Device Capabilities register.
  
  
  What I think you have is a very old 1.0a system, and you set Unsupported
  Request Reporting Enable.
  
  Can you confirm?
 
 No. In at least one of the two cases we got reports of the original
 problem, triggering the finding of this issue, this is a brand new one,
 only soon to become available publicly. Furthermore I'm being
 confused by the mention of PC-compatible config space probing
 above: The URs we talk about here don't result from config space
 accessed at all.
 
 OK. Can you please explain why does UR cause a system error then?
 It looks like a hardware bug: PCIE 1.1 seems to say it shouldn't.

Quite possible. Looking at the ITP log we were provided, the UR
severity bit is clear (non-fatal), yet the error got surfaced to the
OS as a fatal one (I would guess because it validly gets flagged as
uncorrectable at the same time).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown

 On 13.04.15 at 14:01, stefano.stabell...@eu.citrix.com wrote:
 On Mon, 13 Apr 2015, Jan Beulich wrote:
  On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote:
  On Mon, 13 Apr 2015, Jan Beulich wrote:
  While we trust Dom0 to not do outright bad things,
  the hypervisor should still avoid doing things that can go wrong
  due to the state a device is put (or left) in by Dom0.
  
  Xen should also avoid doing things that can go wrong because of the
  state a device is put in by QEMU or other components in the system.
  There isn't much room for Xen to play with.
 
 Qemu is either part of Dom0, or doesn't play with devices directly.
 
 I don't understand the point you are trying to make here.
 If your intention is to point out that QEMU shouldn't be writing to the
 control register, as I wrote earlier, the current codebase disagrees
 with you and has been that way for years.

Yes, this was precisely my intention. Qemu having done a certain
thing for many years doesn't mean this is correct, and shouldn't
be fixed if broken.

  And how are we going to deal with older unfixed QEMUs?
  So far we have been using the same policy for QEMU and the Dom0 kernel:
  Xen doesn't break them -- old Linux kernels and QEMUs are supposed to
  just work.
 
 I'm not sure that's really true for qemu, or if it is, then only by pure
 luck:
 
 This is false, it is so by design.  QEMU has an internal libxc
 compatibility layer.

Which could necessarily only ever be updated after the fact (of
a libxc change) and only ever in maintained branches. I.e. as
widely or as narrow as fixing the issues here would require.

 I view it as unavoidable to break older qemu here.
 
 I disagree and I am opposed to any patches series that would
 deliberately break QEMU.

If that was without also adjusting qemu I would understand you.
Considering that the issues here haven't been brought up just
yesterday, the lack of any substantial counter proposal on how
to fix this is signaling to me that there are little alternatives. Or
do you have any going beyond do it another way?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [xen-4.2-testing test] 50395: regressions - FAIL

flight 50395 xen-4.2-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/50395/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemuu-freebsd10-i386 10 guest-start   fail REGR. vs. 36512
 test-i386-i386-pair21 guest-migrate/src_host/dst_host fail REGR. vs. 36512
 test-amd64-i386-pair   21 guest-migrate/src_host/dst_host fail REGR. vs. 36512

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-pair 9 xen-boot/src_host   fail pass in 50375
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-localmigrate.2 fail pass in 50375

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-amd64 18 guest-start/debianhvm.repeat fail 
in 50375 blocked in 36512

Tests which did not succeed, but are not blocking:
 test-i386-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail in 50375 never pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64  9 debian-hvm-install fail never pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  9 debian-hvm-install  fail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass
 test-i386-i386-xl-qemut-winxpsp3 16 guest-stop fail never pass
 test-i386-i386-libvirt   12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 16 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 test-i386-i386-xl-qemuu-winxpsp3 16 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 16 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 16 guest-stop   fail never pass
 test-amd64-amd64-xl-winxpsp3 16 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemut-winxpsp3 16 guest-stop   fail never pass
 test-amd64-i386-xl-win7-amd64 16 guest-stop   fail  never pass
 test-i386-i386-xl-winxpsp3   16 guest-stop   fail   never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 16 guest-stop fail never pass
 build-amd64-rumpuserxen   5 rumpuserxen-buildfail   never pass
 test-amd64-i386-xend-winxpsp3 20 leak-check/check fail  never pass
 build-i386-rumpuserxen5 rumpuserxen-buildfail   never pass
 test-amd64-i386-xend-qemut-winxpsp3 20 leak-check/checkfail never pass
 test-amd64-amd64-xl-win7-amd64 16 guest-stop   fail never pass

version targeted for testing:
 xen  e3bfa4003ceaa2746cdd77655953ab2601acaf9c
baseline version:
 xen  5bec01c19839e150e489dd04376c65f961830c86


People who touched revisions under test:
  Ian Campbell ian.campb...@citrix.com
  Ian Jackson ian.jack...@eu.citrix.com
  Jan Beulich jbeul...@suse.com
  Konrad Rzeszutek Wilk konrad.w...@oracle.com


jobs:
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  pass
 test-amd64-i386-xl   pass
 test-i386-i386-xlpass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-qemuu-freebsd10-amd64pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
 test-amd64-i386-xl-qemuu-ovmf-amd64

Re: [Xen-devel] [PATCH RFC] tcp: Allow sk_wmem_alloc to exceed sysctl_tcp_limit_output_bytes

On 27/03/15 13:06, Jonathan Davies wrote:
 On 26/03/15 17:14, Eric Dumazet wrote:
 On Thu, 2015-03-26 at 16:46 +, Jonathan Davies wrote:
 Network drivers with slow TX completion can experience poor network
 transmit
 throughput, limited by hitting the sk_wmem_alloc limit check in
 tcp_write_xmit.
 The limit is 128 KB (by default), which means we are limited to two
 64 KB skbs
 in-flight. This has been observed to limit transmit throughput with
 xen-netfront
 because its TX completion can be relatively slow compared to physical
 NIC
 drivers.

 There have been several modifications to the calculation of the
 sk_wmem_alloc
 limit in the past. Here is a brief history:

   * Since TSQ was introduced, the queue size limit was
 sysctl_tcp_limit_output_bytes.

   * Commit c9eeec26 (tcp: TSQ can use a dynamic limit) made the limit
 max(skb-truesize, sk-sk_pacing_rate  10).
 This allows more packets in-flight according to the estimated rate.

   * Commit 98e09386 (tcp: tsq: restore minimal amount of queueing)
 made the
 limit
 max_t(unsigned int, sysctl_tcp_limit_output_bytes,
 sk-sk_pacing_rate  10).
 This ensures at least sysctl_tcp_limit_output_bytes in flight but
 allowed
 more if rate estimation shows this to be worthwhile.

   * Commit 605ad7f1 (tcp: refine TSO autosizing) made the limit
 min_t(u32, max(2 * skb-truesize, sk-sk_pacing_rate  10),
sysctl_tcp_limit_output_bytes).
 This meant that the limit can never exceed
 sysctl_tcp_limit_output_bytes,
 regardless of what rate estimation suggests. It's not clear from
 the commit
 message why this significant change was justified, changing
 sysctl_tcp_limit_output_bytes from being a lower bound to an
 upper bound.

 This patch restores the behaviour that allows the limit to grow above
 sysctl_tcp_limit_output_bytes according to the rate estimation.

 This has been measured to improve xen-netfront throughput from a domU
 to dom0
 from 5.5 Gb/s to 8.0 Gb/s. Or, in the case of transmitting from one
 domU to
 another (on the same host), throughput rose from 2.8 Gb/s to 8.0
 Gb/s. In the
 latter case, TX completion is especially slow, explaining the large
 improvement.
 These values were measured against 4.0-rc5 using iperf -c ip -i 1
 using
 CentOS 7.0 VM(s) on Citrix XenServer 6.5 on a Dell R730 host with a
 pair of Xeon
 E5-2650 v3 CPUs.

 Fixes: 605ad7f184b6 (tcp: refine TSO autosizing)
 Signed-off-by: Jonathan Davies jonathan.dav...@citrix.com
 ---
   net/ipv4/tcp_output.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
 index 1db253e..3a49af8 100644
 --- a/net/ipv4/tcp_output.c
 +++ b/net/ipv4/tcp_output.c
 @@ -2052,7 +2052,7 @@ static bool tcp_write_xmit(struct sock *sk,
 unsigned int mss_now, int nonagle,
* One example is wifi aggregation (802.11 AMPDU)
*/
   limit = max(2 * skb-truesize, sk-sk_pacing_rate  10);
 -limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
 +limit = max_t(u32, limit, sysctl_tcp_limit_output_bytes);

   if (atomic_read(sk-sk_wmem_alloc)  limit) {
   set_bit(TSQ_THROTTLED, tp-tsq_flags);


 That will get a NACK from me and Google TCP team of course.

 Try your patch with low throughput flows, like 64kbps, GSO off...

 And observe TCP timestamps and rtt estimations, critical for vegas or
 any CC based on delay.

 I get line rate on 40Gbps NIC using this (low) limit of 2 TSO packets.

 This topic was discussed for wifi recently, and I suggested multiple
 ways to solve the problem on problematic drivers.

 There is a reason sysctl_tcp_limit_output_bytes exists : You can change
 its value to whatever you want.

 vi +652 Documentation/networking/ip-sysctl.txt

 tcp_limit_output_bytes - INTEGER
  Controls TCP Small Queue limit per tcp socket.
  TCP bulk sender tends to increase packets in flight until it
  gets losses notifications. With SNDBUF autotuning, this can
  result in a large amount of packets queued in qdisc/device
  on the local machine, hurting latency of other flows, for
  typical pfifo_fast qdiscs.
  tcp_limit_output_bytes limits the number of bytes on qdisc
  or device to reduce artificial RTT/cwnd and reduce bufferbloat.
  Default: 131072

 Documentation is pretty clear : This is the max value, not a min one.
 
 Okay, thanks for your feedback. It wasn't clear to me from the commit
 message in 605ad7f1 (tcp: refine TSO autosizing) why
 tcp_limit_output_bytes changed from being a lower bound to an upper
 bound in that patch. But it's now clear from your reply that that's the
 behaviour you intend as correct.
 
 Thanks for drawing my attention to the wifi thread, which is helpful. As
 I understand it, the only available options for fixing the performance
 regression experienced by xen-netfront are:

Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4

On 13/04/15 13:14, Sander Eikelenboom wrote:
 
 Monday, April 13, 2015, 2:07:02 PM, you wrote:
 
 On 13/04/15 12:21, Sander Eikelenboom wrote:

 Monday, April 13, 2015, 11:50:51 AM, you wrote:

 On 13/04/15 10:39, Sander Eikelenboom wrote:
 Hi David,

 I seem to have spotted some trouble with a 4.0 dom0 kernel with the 
 devel/for-linus-4.1 branch pulled on top.

 Does this remind you of any specific commits in the devel/for-linus-4.1 
 branch that could
 likely be involved that i could try to revert ?

 Yes.  This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve
 performance of MMAPBATCH_V2) which makes the kernel try harder to map
 all GFNs instead of failing on the first one.

 I think this is qemu incorrectly trying to map GFNs.

 David

 Reverted that specific one, but still get those messages.
 
 You'll have to bisect it then.  Because I don't see any other relevant
 commits in devel/for-linus-4.1
 
 David
 
 Ok .. hmm first candidate of the bisect also looks interessting:
 [628c28eefd6f2cef03b212081b466ae43fd093a3] xen: unify foreign GFN map/unmap 
 for auto-xlated physmap guests

Unless your dom0 is PVH, no.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register

 On 13.04.15 at 14:47, m...@redhat.com wrote:
 On Mon, Apr 13, 2015 at 01:40:59PM +0100, Jan Beulich wrote:
 Quite possible. Looking at the ITP log we were provided, the UR
 severity bit is clear (non-fatal), yet the error got surfaced to the
 OS as a fatal one (I would guess because it validly gets flagged as
 uncorrectable at the same time).
 
 No, that's not valid.
 Can you check device capabilities register, offset 0x4 within
 pci express capability structure?
 Bit 15 is 15 Role-Based Error Reporting.
 Is it set?
 
 The spec says:
 
   15
   On platforms where robust error handling and PC-compatible 
 Configuration 
 Space probing is
   required, it is suggested that software or firmware have the 
 Unsupported 
 Request Reporting Enable
   bit Set for Role-Based Error Reporting Functions, but clear for 1.0a 
 Functions. Software or
   firmware can distinguish the two classes of Functions by examining the 
 Role-Based Error Reporting
   bit in the Device Capabilities register.

Yes, that bit is set.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCHv2 1/6] x86/hvm: don't include asm/spinlock.h

On 13/04/15 14:13, David Vrabel wrote:
 On 10/04/15 16:24, Andrew Cooper wrote:
 On 10/04/15 15:19, David Vrabel wrote:
 asm/spinlock.h should not be included directly.

 Signed-off-by: David Vrabel david.vra...@citrix.com
 s/asm/xen/g instead of a straight delete?

 Otherwise you are relying on pulling in xen/spinlock.h implicitly.
 None of these files declare a spinlock so getting the header implicitly
 is correct, IMO.

In which case, good riddance.

Acked-by: Andrew Cooper andrew.coop...@citrix.com

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v9 10/15] tools/libxc: x86 HVM save code

On 13/04/15 13:28, Vitaly Kuznetsov wrote:
 Andrew Cooper andrew.coop...@citrix.com writes:

 +/*
 + * Query for a range of HVM parameters and write an HVM_PARAMS record into 
 the
 + * stream.
 + */
 +static int write_hvm_params(struct xc_sr_context *ctx)
 +{
 +static const unsigned int params[] = {
 +HVM_PARAM_STORE_PFN,
 +HVM_PARAM_IOREQ_PFN,
 +HVM_PARAM_BUFIOREQ_PFN,
 +HVM_PARAM_PAGING_RING_PFN,
 +HVM_PARAM_MONITOR_RING_PFN,
 +HVM_PARAM_SHARING_RING_PFN,
 +HVM_PARAM_VM86_TSS,
 +HVM_PARAM_CONSOLE_PFN,
 +HVM_PARAM_ACPI_IOPORTS_LOCATION,
 +HVM_PARAM_VIRIDIAN,
 +HVM_PARAM_IDENT_PT,
 +HVM_PARAM_PAE_ENABLED,
 +HVM_PARAM_VM_GENERATION_ID_ADDR,
 +HVM_PARAM_IOREQ_SERVER_PFN,
 +HVM_PARAM_NR_IOREQ_SERVER_PAGES,
 +};
 +
 +xc_interface *xch = ctx-xch;
 +struct xc_sr_rec_hvm_params_entry entries[ARRAY_SIZE(params)];
 +struct xc_sr_rec_hvm_params hdr = {
 +.count = 0,
 +};
 +struct xc_sr_record rec = {
 +.type   = REC_TYPE_HVM_PARAMS,
 +.length = sizeof(hdr),
 +.data   = hdr,
 +};
 +unsigned int i;
 +int rc;
 +
 +for ( i = 0; i  ARRAY_SIZE(params); i++ )
 +{
 +uint32_t index = params[i];
 +uint64_t value;
 +
 +rc = xc_hvm_param_get(xch, ctx-domid, index, value);
 +if ( rc )
 +{
 +PERROR(Failed to get HVMPARAM at index %u, index);
 +return rc;
 +}
 +
 +if ( value != 0 )
 +{
 +entries[hdr.count].index = index;
 +entries[hdr.count].value = value;
 +hdr.count++;
 +}
 +}
 While reviewing my 'soft reset' series Ian C raised a question about the
 unsafeness of sequential get/set of HVM params:
 http://lists.xen.org/archives/html/xen-devel/2015-01/msg01177.html

 In case the concern is valid I think the requirements for migration are
 even stricter as we can migrate live. In case it is not, would it be
 possible to move your params[] array to some common place so it can be
 reused?

This code currently mirrors what the old migration did.  This was one
area we deliberately didn't try to clean up (we were focusing on a
functional replacement).

I am not a fan of hvm params.  They are used as a dumping ground for
things which IMO should live elsewhere.  The result is this mess,
without any clear direction of who owns which parameter, which should
move on migration, which shouldn't etc.  There are many areas of the
toolstack which independently poke at the parameters.

I would be hesitant to blindly lift this code out, although it is
probably the best start you will find.

What is really needed is formal statement of who owns which hvm params,
and what their purpose is.  That would at least be a good starting point
for evaluating questions like this.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen

2015-04-13 Thread Eric Dumazet

On Mon, 2015-04-13 at 11:56 +0100, George Dunlap wrote:

 Is the problem perhaps that netback/netfront delays TX completion?
 Would it be better to see if that can be addressed properly, so that
 the original purpose of the patch (fighting bufferbloat) can be
 achieved while not degrading performance for Xen?  Or at least, so
 that people get decent perfomance out of the box without having to
 tweak TCP parameters?

Sure, please provide a patch, that does not break back pressure.

But just in case, if Xen performance relied on bufferbloat, it might be
very difficult to reach a stable equilibrium : Any small change in stack
or scheduling might introduce a significant difference in 'raw
performance'.




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125

On 13/04/15 17:33, Ian Jackson wrote:
 Andrew Cooper writes ([Xen-devel] [PATCH] tools/libxc: Fix build of 32bit 
 toolstacks on CentOS 5.x following XSA-125):
 gcc 4.1 of CentOS 5.x era does not like the typecheck in min() between
 uint64_t and unsigned long.

 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
 CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 CC: Ian Campbell ian.campb...@citrix.com
 CC: Ian Jackson ian.jack...@eu.citrix.com
 CC: Wei Liu wei.l...@citrix.com
 As and when this is acked and pushed to #staging, we should update the
 advisory.

 This needs backporting to 4.5
 To be clear, do you know whether this is a trivial cherry pick ?

It is trivial to cherrypick.

4.4 is the problematic case which I did the backport for separately, but
then that is trivial to cherrypick back to 4.2

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH] new functions libxl_bitmap_{or,and}

2015-04-13 Thread Linda Jacobson

provide logical and and or of two bitmaps

---

v.1  updated comments and format
v.2  rewrote bitmap functions to manipulate bytes not bits

Signed-off-by: Linda Jacobson lin...@jma3.com
---
 tools/libxl/libxl_utils.c | 74 +++
 tools/libxl/libxl_utils.h |  5 
 2 files changed, 79 insertions(+)

diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 9053b27..5c7178f 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -691,6 +691,80 @@ void libxl_bitmap_reset(libxl_bitmap *bitmap, int bit)
 bitmap-map[bit / 8] = ~(1  (bit  7));
 }
 
+/* provide logical or and logical and of two bitmaps */
+int libxl_bitmap_or(libxl_ctx *ctx, libxl_bitmap *or_map,
+libxl_bitmap *map1, libxl_bitmap *map2)
+{
+GC_INIT(ctx);
+int rc;
+uint32_t i;
+libxl_bitmap *lgmap;
+libxl_bitmap *smap;
+
+if (map1-size  map2-size) {
+lgmap = map1;
+smap = map2;
+}
+else {
+lgmap = map2;
+smap = map1;
+}
+
+rc = libxl_bitmap_alloc(ctx, or_map, lgmap-size * 8);
+if (rc)
+goto out;
+
+/*
+ *  if bitmaps aren't the same size, their union (logical or) will
+ *  be size of larger bit map.  Any bit past the end of the
+ *  smaller bit map, will match the larger one.
+ */
+for (i = 0; i  smap-size; i++)
+or_map-map[i] = (smap-map[i] | lgmap-map[i]);
+
+for (i = smap-size; i  lgmap-size; i++)
+or_map-map[i] = lgmap-map[i];
+
+out:
+GC_FREE;
+return rc;
+}
+
+int libxl_bitmap_and(libxl_ctx *ctx, libxl_bitmap *and_map,
+ libxl_bitmap *map1, libxl_bitmap *map2)
+{
+GC_INIT(ctx);
+int rc;
+uint32_t i;
+libxl_bitmap *lgmap, *smap;
+
+if (map1-size   map2-size) {
+lgmap = map1;
+smap = map2;
+}
+else {
+lgmap = map2;
+smap = map1;
+}
+
+
+rc = libxl_bitmap_alloc(ctx, and_map, smap-size * 8);
+if (rc)
+goto out;
+
+/* 
+ *  if bitmaps aren't same size, their 'and' will be size of
+ *  smaller bit map
+ */
+for (i = 0; i  and_map-size; i++)
+and_map-map[i] = (lgmap-map[i]  smap-map[i]);
+
+out: 
+GC_FREE;
+return rc;
+
+}
+
 int libxl_bitmap_count_set(const libxl_bitmap *bitmap)
 {
 int i, nr_set_bits = 0;
diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h
index acacdd9..1c0086b 100644
--- a/tools/libxl/libxl_utils.h
+++ b/tools/libxl/libxl_utils.h
@@ -90,6 +90,11 @@ int libxl_bitmap_test(const libxl_bitmap *bitmap, int bit);
 void libxl_bitmap_set(libxl_bitmap *bitmap, int bit);
 void libxl_bitmap_reset(libxl_bitmap *bitmap, int bit);
 int libxl_bitmap_count_set(const libxl_bitmap *cpumap);
+/* or and and functions for two bitmaps */
+int libxl_bitmap_or(libxl_ctx *ctx, libxl_bitmap *or_map,
+libxl_bitmap *map1, libxl_bitmap *map2); 
+int libxl_bitmap_and(libxl_ctx *ctx, libxl_bitmap *and_map,
+ libxl_bitmap *map1, libxl_bitmap *map2);
 char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *cpumap);
 static inline void libxl_bitmap_set_any(libxl_bitmap *bitmap)
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC] tcp: Allow sk_wmem_alloc to exceed sysctl_tcp_limit_output_bytes

2015-04-13 Thread Malcolm Crossley

On 13/04/15 15:05, Eric Dumazet wrote:
 On Mon, 2015-04-13 at 14:46 +0100, David Vrabel wrote:
 
 And the proof-of-concept patch for idea (b) I used was:

 @@ -552,6 +552,8 @@ static int xennet_start_xmit(struct sk_buff *skb,
 struct net_device *dev)
  goto drop;
  }

 +skb_orphan(skb);
 +
  page = virt_to_page(skb-data);
  offset = offset_in_page(skb-data);
  len = skb_headlen(skb);

 No. This a bunch of allocations and  a full memcpy of all the frags.
 
 skb_orphan() does nothing like that.

I think David accidently read the skb_orphan_frags() code.

 
 But the main concern here is it basically breaks back pressure.
 
 And you do not want this, unless there is no other choice.
 

virtio_net already use's skb_orphan() in it's transmit path. It seems
only fair that other virtual network drivers behave in the same way.

There are no easy solutions to decrease the transmit latency for
netback/netfront. We map the guest memory through to the backend to
avoid memory copies. The frontend memory can only be freed once the
network driver has completed transmitting the packet in the backend.

Modern network drivers can be quite slow at freeing the skb's once
transmitted (the packet is already on the wire as far as they are
concerned) and this delay is compounded by needing the signal the
completion of the transmit back to the frontend (by IPI in worst case).

From a networking point of view, the backend is a switch. Is it OK to
consider the packet to have been transmitted from the guest point of
view once the backend is aware of the packet?

This would help justify the skb_orphan() in the frontend.

 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel
 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-13 Thread Peter Zijlstra

On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:

 +static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
 +{
 +   struct __qspinlock *l = (void *)lock;
 +   struct qspinlock **lp = NULL;
 +   struct pv_node *pn = (struct pv_node *)node;
 +   int slow_set = false;
 +   int loop;
 +
 +   for (;;) {
 +   for (loop = SPIN_THRESHOLD; loop; loop--) {
 +   if (!READ_ONCE(l-locked))
 +   return;
 +
 +   cpu_relax();
 +   }
 +
 +   WRITE_ONCE(pn-state, vcpu_halted);
 +   if (!lp)
 +   lp = pv_hash(lock, pn);
 +   /*
 +* lp must be set before setting _Q_SLOW_VAL
 +*
 +* [S] lp = lock[RmW] l = l-locked = 0
 +* MB MB
 +* [S] l-locked = _Q_SLOW_VAL  [L]   lp
 +*
 +* Matches the cmpxchg() in pv_queue_spin_unlock().
 +*/
 +   if (!slow_set
 +   !cmpxchg(l-locked, _Q_LOCKED_VAL, _Q_SLOW_VAL)) {
 +   /*
 +* The lock is free and _Q_SLOW_VAL has never been
 +* set. Need to clear the hash bucket before getting
 +* the lock.
 +*/
 +   WRITE_ONCE(*lp, NULL);
 +   return;
 +   } else if (slow_set  !READ_ONCE(l-locked))
 +   return;
 +   slow_set = true;

 I'm somewhat puzzled by the slow_set thing; what is wrong with the thing
 I had, namely:
 
  if (!lp) {
  lp = pv_hash(lock, pn);
 
  /*
   * comment
   */
  lv = cmpxchg(l-locked, _Q_LOCKED_VAL, _Q_SLOW_VAL);
  if (lv != _Q_LOCKED_VAL) {
  /* we're woken, unhash and return */
  WRITE_ONCE(*lp, NULL);
  return;
  }
  }
 +
 +   pv_wait(l-locked, _Q_SLOW_VAL);
 
 If we get a spurious wakeup (due to device interrupts or random kick)
 we'll loop around but -locked will remain _Q_SLOW_VAL.
 
 The purpose of the slow_set flag is not about the lock value. It is to make
 sure that pv_hash_find() will always find a match. Consider the following
 scenario:
 
 cpu1cpu2cpu3
 
 pv_wait
 spurious wakeup
 loop l-locked
 
 read _Q_SLOW_VAL
 pv_hash_find()
 unlock
 
 pv_hash() - same entry
 
 cmpxchg(l-locked)
 clear hash (?)
 
 Here, the entry for cpu3 will be removed leading to panic when
 pv_hash_find() can find the entry. So the hash entry can only be
 removed if the other cpu has no chance to see _Q_SLOW_VAL.

Still confused. Afaict that cannot happen with my code. We only do the
cmpxchg(, _Q_SLOW_VAL) _once_.

Only on the first time around that loop will we hash the lock and set
the slow flag. And cpu3 cannot come in on the same entry because we've
not yet released the lock when we find and unhash.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH 1/1] xc_domain_restore: Allow QEMU to increase memory

2015-04-13 Thread Don Slutz

If QEMU has called on xc_domain_setmaxmem to add more memory for
option ROMs, domain restore needs to also increase the memory.

Signed-off-by: Don Slutz dsl...@verizon.com
---
To see the hvmloader loader issue:

  xl cre -p e1000x8.xfg
  xl save e1000x8 e1000x8.save
  xl restore e1000x8.save

With e1000x8.xfg:
---
builder = hvm
device_model_args_hvm = [
 -monitor,
 pty,
 -boot,
 menu=on,
]
device_model_version = qemu-xen
disk = [
 /dev/etherd/e500.1,,xvda,
 /dev/etherd/e500.2,,xvdb,
 /dev/etherd/e500.3,,xvde,
 /local-isos/iso/centos/x86_64/CentOS-6.3-x86_64-bin-DVD1.iso,,hdc,cdrom,
 /local-isos/iso/centos/x86_64/CentOS-6.3-x86_64-bin-DVD2.iso,,hdd,cdrom,
]
maxmem = 8192
memory = 8192
name = e1000x8
serial = pty
tsc_mode = native_paravirt
uuid = e5000105-3d83-c962-07ae-2bc46c3644a0
videoram = 16
vif = [
 model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:a0,
 model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:aa,
 model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:b4,
 model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:be,
 model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:c8,
 model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:d2,
 model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:dc,
 model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:e6,
 model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:f0,
 model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:fa,
]
viridian = 0
xen_platform_pci = 1
mmio_hole = 2048
vcpus = 1
maxvcpus = 6
on_poweroff = preserve
on_reboot = preserve
on_watchdog = preserve
on_crash = preserve
---


 tools/libxc/xc_domain_restore.c | 75 +++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 2ab9f46..28b4fa6 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -47,6 +47,13 @@
 #include xen/hvm/ioreq.h
 #include xen/hvm/params.h
 
+/* Leave some slack so that hvmloader does not complain about lack of
+ * memory at boot time (Could not allocate order=0 extent).
+ * Once hvmloader is modified to cope with that situation without
+ * printing warning messages, QEMU_SPARE_PAGES can be removed.
+ */
+#define QEMU_SPARE_PAGES 16
+
 struct restore_ctx {
 unsigned long max_mfn; /* max mfn of the current host machine */
 unsigned long hvirt_start; /* virtual starting address of the hypervisor */
@@ -209,12 +216,44 @@ static int uncanonicalize_pagetable(
 if (!ctx-hvm  ctx-superpages)
 rc = alloc_superpage_mfns(xch, dom, ctx, nr_mfns);
 else
+{
+xc_domaininfo_t info;
+unsigned long free_pages;
+
+if ((xc_domain_getinfolist(xch, dom, 1, info) != 1) ||
+(info.domain != dom)) {
+ERROR(Failed xc_domain_getinfolist for batch 
(uncanonicalize_pagetable)\n);
+errno = ENOMEM;
+return 0;
+}
+free_pages = info.max_pages - info.tot_pages;
+if (free_pages  QEMU_SPARE_PAGES) {
+free_pages -= QEMU_SPARE_PAGES;
+} else {
+free_pages = 0;
+}
+if (free_pages  nr_mfns)
+{
+DPRINTF(Adjust memory for batch (uncanonicalize_pagetable): 
free_pages=%lu nr_mfns=%d max_pages=%lu tot_pages=%lu max_mfn=%lu\n,
+free_pages, nr_mfns, (unsigned long)info.max_pages,
+(unsigned long)info.tot_pages, ctx-max_mfn);
+if (xc_domain_setmaxmem(xch, dom,
+((info.max_pages + nr_mfns - 
free_pages)
+  (XC_PAGE_SHIFT - 10)))  0)
+{
+ERROR(Failed xc_domain_setmaxmem for batch 
(uncanonicalize_pagetable)\n);
+errno = ENOMEM;
+return 0;
+}
+}
 rc = xc_domain_populate_physmap_exact(xch, dom, nr_mfns, 0, 0,
   ctx-p2m_batch);
+}
 
 if (rc)
 {
-ERROR(Failed to allocate memory for batch.!\n);
+ERROR(Failed to allocate memory for batch. rc=%d nr_mfns=%d!\n,
+  rc, nr_mfns);
 errno = ENOMEM;
 return 0;
 }
@@ -1241,12 +1280,44 @@ static int apply_batch(xc_interface *xch, uint32_t dom, 
struct restore_ctx *ctx,
 if (!ctx-hvm  ctx-superpages)
 rc = alloc_superpage_mfns(xch, dom, ctx, nr_mfns);
 else
+{
+xc_domaininfo_t info;
+unsigned long free_pages;
+
+if ((xc_domain_getinfolist(xch, dom, 1, info) != 1) ||
+(info.domain != dom)) {
+ERROR(Failed xc_domain_getinfolist for apply_batch\n);
+errno = ENOMEM;
+

Re: [Xen-devel] [PATCH 1/1] x86: Set regs-entry_vector for early_page_fault

On 13/04/15 17:04, Don Slutz wrote:
 This changes:

 (XEN) Early fatal page fault at e008:82d080164252 (cr2=, 
 ec=)
 (XEN) [ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]
 (XEN) CPU:0
 (XEN) RIP:e008:[82d080164252] arch_domain_create+0x3e/0x4ef
 ...
 (XEN) Xen call trace:
 (XEN)[82d080164252] arch_domain_create+0x3e/0x4ef
 (XEN)[82d080105262] domain_create+0x384/0x556
 (XEN)[82d0802a0de4] scheduler_init+0x1c4/0x244
 (XEN)[82d0802be359] __start_xen+0x1d0e/0x22a1
 (XEN)[82d080100067] __high_start+0x53/0x58
 (XEN)
 (XEN)
 (XEN) 
 (XEN) Panic on CPU 0:
 (XEN) FATAL TRAP: vector = 0 (divide error)
 (XEN) [error_code=] , IN INTERRUPT CONTEXT
 (XEN) 
 ...

 to:

 (XEN) Early fatal page fault at e008:82d080164252 (cr2=, 
 ec=)
 (XEN) [ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]
 (XEN) CPU:0
 (XEN) RIP:e008:[82d080164252] arch_domain_create+0x3e/0x4ef
 ...
 (XEN) Xen call trace:
 (XEN)[82d080164252] arch_domain_create+0x3e/0x4ef
 (XEN)[82d080105262] domain_create+0x384/0x556
 (XEN)[82d0802a0de4] scheduler_init+0x1c4/0x244
 (XEN)[82d0802be359] __start_xen+0x1d0e/0x22a1
 (XEN)[82d080100067] __high_start+0x53/0x58
 (XEN)
 (XEN) Faulting linear address: 
 (XEN) Pagetable walk from :
 (XEN)  L4[0x000] = 00083a1a6063 
 (XEN)  L3[0x000] = 00083a1a5063 
 (XEN)  L2[0x000] = 00083a1a4063 
 (XEN)  L1[0x000] =  
 (XEN)
 (XEN) 
 (XEN) Panic on CPU 0:
 (XEN) FATAL TRAP: vector = 14 (page fault)
 (XEN) [error_code=] , IN INTERRUPT CONTEXT
 (XEN) 
 ...

 Signed-off-by: Don Slutz dsl...@verizon.com

Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

 ---
  xen/arch/x86/x86_64/entry.S | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
 index 2d25d57..7e63e64 100644
 --- a/xen/arch/x86/x86_64/entry.S
 +++ b/xen/arch/x86/x86_64/entry.S
 @@ -626,6 +626,7 @@ ENTRY(double_fault)
  
  .pushsection .init.text, ax, @progbits
  ENTRY(early_page_fault)
 +movl  $TRAP_page_fault,4(%rsp)
  SAVE_ALL
  movq  %rsp,%rdi
  call  do_early_page_fault


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-13 Thread Waiman Long


On 04/13/2015 11:09 AM, Peter Zijlstra wrote:

On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:

+__visible void __pv_queue_spin_unlock(struct qspinlock *lock)
+{
+   struct __qspinlock *l = (void *)lock;
+   struct pv_node *node;
+
+   if (likely(cmpxchg(l-locked, _Q_LOCKED_VAL, 0) == _Q_LOCKED_VAL))
+   return;
+
+   /*
+* The queue head has been halted. Need to locate it and wake it up.
+*/
+   node = pv_hash_find(lock);
+   smp_store_release(l-locked, 0);

Ah yes, clever that.


+   /*
+* At this point the memory pointed at by lock can be freed/reused,
+* however we can still use the PV node to kick the CPU.
+*/
+   if (READ_ONCE(node-state) == vcpu_halted)
+   pv_kick(node-cpu);
+}
+PV_CALLEE_SAVE_REGS_THUNK(__pv_queue_spin_unlock);

However I feel the PV_CALLEE_SAVE_REGS_THUNK thing belongs in the x86
code.

That is why I originally put my version of the qspinlock_paravirt.h header
file under arch/x86/include/asm. Maybe we should move it back there. Putting
the thunk in arch/x86/kernel/kvm.c didn't work when you consider that the
Xen code also need that.

Well the function is 'generic' and belong here I think. Its just the
PV_CALLEE_SAVE_REGS_THUNK thing that arch specific. Should have live in
arch/x86/kernel/paravirt-spinlocks.c instead?


Another alternative is to put the PV_CALLEE_SAVE_REGS_THUNK into a 
separate asm/qspinlock_paravirt.h file and included in the generic 
qspinlock_paravirt.h file. By putting the thunk with the 
__pv_queue_spin_unlock together in the same file, we can make the thunk 
and the unlock fast path (_Q_SLOW_VAL not set) go into two adjacent 
instruction cachelines. I think that may help a bit in term of performance.


Cheers,
Longman


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/8] Add core.sh and wrapper function

On Mon, 13 Apr 2015, George Dunlap wrote:
 On 04/10/2015 03:29 PM, Stefano Stabellini wrote:
  +arg_parse_cmd=\
  +local -a args;
  +local _a;
  +local _vn;
  +local _m;
  +
  +_m=true;
  +
  +for _a in \\$@\ ; do
  +false  echo \Evaluating \${_a} [[ \\${_a/=}\ = \\${_a}\ ]]\;
  +if \$_m  [[ \\${_a/=}\ != \\${_a}\ ]] ; then
  +false  echo Parameter;
  +_vn=\${_a%%=*};
  +eval \local \$_vn\;
  +eval \\$_a\;
  +elif \$_m  [[ \\${_a}\ == \--\ ]] ; then
  +false  echo Separator;
  +_m=false;
  +else
  +false  echo Argument;
  +_m=false;
  +args+=(\\$_a\);
  +fi;
  +done
  
  I am sorry but I cannot bear myself to introduce so much complexity into
  the world.  raisin is supposed to be easy to read.  If we end up
  actually needing something so powerful and so complex in the future we
  can import it then, but I think that at the moment we can do nicely
  without it.
 
 I can certainly understand the sentiment, but the idea behind this
 calling convention is actually to make it *easier* both to read and
 debug most of the source file, at the cost of two very complex, but
 well-tested and generic macros ($arg_parse and $requireargs).
 
 Consider some criteria:
 - Difficulty of writing or reading code that calls a function
 - Difficulty of writing a function which will be called
 - Availability of sensible defaults
 - Ability to make sure all necessary arguments have been specified
 - Passing function arguments into sub-functions
 - Global variables are nasty to debug

 Consider alternate ways of passing arguments into functions:
 
 1. Positional
 
 i.e., cp SRC DEST
 
 In this model, it's easy to make sure that all necessary arguments have
 been specified; and there are no nasty global variables to work with.
 
 However is really hard to program and read because you have to remember
 what argument goes where.  You also have to specify all possible
 arguments -- you can't have a default.  And if you need to take an
 argument and pass it to another function you call, you have to repeat it
 again.
 
 2. Option
 
 i.e., -v --components='a b'
 
 This is easier to read if you use long options (or common ones like -v
 or -y), and you can also have defaults.  Additionally, you can have a
 certain amount of error checking -- if you get an argument you don't
 recognize you can throw an error.
 
 But you have to write a lot of boilerplate option processing at the top
 of each function; and if you have anything a bit complex (line
 --components= above), it makes it a lot harder to read.  And you have
 to write new code every time, giving you more opportunity for bugs.
 
 Consider the argument parsing code you have in raise at the moment -- it
 assumes that you will run a command without any additional parameters.
 What if you wanted to do something like this:
 
 raise -v build xen
 
 The current code would have to be refactored.
 
 And you have to write a lot of custom code to check to make sure that
 all the required arguments have been passed in.
 
 And you have the same problem that if you need to pass an argument
 further down the call stack, you have to repeat it again.
 
 3. Global variables
 
 i.e.,
 VAR=foo
  function
 
 This is a bit easier to read in the sense that you're using full
 variable names for options.  It's easier to write, since you don't have
 any parsing code.  Additionally, variables can be passed further down
 the calling stack automatically without having to be explicitly copied
 around.  In theory you should be able to have sensible local defaults.
 
 But you have all the problems with global variables being really hard to
 debug.
 
 4. Prefix variable
 
 i.e., YES=y check-builddep
 
 This is a bit easier to read in the sense that you're using full
 variable names for options.  It's easier to write, since you don't have
 any parsing code.  Additionally, variables can be passed further down
 the calling stack automatically without having to be explicitly copied
 around.  In theory you should be able to have sensible local defaults.
 
 I hadn't thought about this as a calling convention.  It certainly has
 the advantage that it's build into bash and fairly well-understood by
 experienced users.
 
 But I do think it's a bit unnatural to have to prefix all the variables
 to the function.
 
 5. VAR=VAL automatic parsing with inheritance
 
 This allows you to have a rich, consistent way to pass in arguments that
 are descriptive.  It's easy to write code to parse the arguments and
 check to make sure you have all the ones you need: Just add $arg_parse
 and $requireargs at the top of your function.  They are automatically
 declared local, so there's no worry about polluting the global
 namespace.  They are automatically inherited, so you don't have to keep
 passing information further down the stack.  And you can have sensible
 defaults.
 
 I think #5 is well worth the little bit of macro hackery confined to a

Re: [Xen-devel] [PATCH 1/1] xc_domain_restore: Allow QEMU to increase memory

On 13/04/15 17:09, Don Slutz wrote:
 If QEMU has called on xc_domain_setmaxmem to add more memory for
 option ROMs, domain restore needs to also increase the memory.

 Signed-off-by: Don Slutz dsl...@verizon.com

hvmloader has no interaction with xc_domain_restore().

It is xl's job to propagate the current memory as part of migration.  If
qemu has had to bump it up, this should be reflected in the domain
config file.

The migration path should not be hacked up to fix a higher level
toolstack bug.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-13 Thread Waiman Long


On 04/13/2015 10:47 AM, Peter Zijlstra wrote:

On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:

+void __init __pv_init_lock_hash(void)
+{
+   int pv_hash_size = 4 * num_possible_cpus();
+
+   if (pv_hash_size   (1U   LFSR_MIN_BITS))
+   pv_hash_size = (1U   LFSR_MIN_BITS);
+   /*
+* Allocate space from bootmem which should be page-size aligned
+* and hence cacheline aligned.
+*/
+   pv_lock_hash = alloc_large_system_hash(PV qspinlock,
+  sizeof(struct pv_hash_bucket),
+  pv_hash_size, 0, HASH_EARLY,
+   pv_lock_hash_bits, NULL,
+  pv_hash_size, pv_hash_size);

pv_taps = lfsr_taps(pv_lock_hash_bits);


I don't understand what you meant here.

Let me explain (even though I propose taking all the LFSR stuff out).

pv_lock_hash_bit is a runtime variable, therefore it cannot compile time
evaluate the forest of if statements required to compute the taps value.

Therefore its best to compute the taps _once_, and this seems like a
good place to do so.


OK, I got it. That make sense.


+   goto done;
+   }
+   }
+
+   hash = lfsr(hash, pv_lock_hash_bits, 0);

Since pv_lock_hash_bits is a variable, you end up running through that
massive if() forest to find the corresponding tap every single time. It
cannot compile-time optimize it.

The minimum bits size is now 8. So unless the system has more than 64 vCPUs,
it will get the right value in the first if statement.

Still, no reason to not pre-compute the taps value, its simple enough.



Still, we need to keep the hash_bits value as it will needed by the 
hashing function.


I have taken out the lfsr code and use linear probing in the updated 
qspinlock patch that I am working on. However, we can always add that 
back in as an additional patch.


Cheers,
Longman

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v2 1/2] x86/domctl: cleanup

 * latch curr/currd once at start
 * drop redundant ret = 0 and braces
 * use copyback = 1 when appropriate
 * move break statements inside case-specific braced scopes
 * don't bother check for NULL before calling xfree()
 * eliminate trailing whitespace
 * Xen style corrections

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
---
v2: Fix use of copyback in XEN_DOMCTL_gettscinfo
---
 xen/arch/x86/domctl.c |  274 -
 1 file changed, 111 insertions(+), 163 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9450795..b6df23a 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1,6 +1,6 @@
 /**
  * Arch-specific domctl.c
- * 
+ *
  * Copyright (c) 2002-2006, K A Fraser
  */
 
@@ -39,7 +39,7 @@
 
 static int gdbsx_guest_mem_io(
 domid_t domid, struct xen_domctl_gdbsx_memio *iop)
-{   
+{
 ulong l_uva = (ulong)iop-uva;
 iop-remain = dbg_rw_mem(
 (dbgva_t)iop-gva, (dbgbyte_t *)l_uva, iop-len, domid,
@@ -53,6 +53,8 @@ long arch_do_domctl(
 struct xen_domctl *domctl, struct domain *d,
 XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 {
+struct vcpu *curr = current;
+struct domain *currd = curr-domain;
 long ret = 0;
 bool_t copyback = 0;
 unsigned long i;
@@ -61,15 +63,13 @@ long arch_do_domctl(
 {
 
 case XEN_DOMCTL_shadow_op:
-{
 ret = paging_domctl(d, domctl-u.shadow_op,
 guest_handle_cast(u_domctl, void), 0);
 if ( ret == -ERESTART )
 return hypercall_create_continuation(__HYPERVISOR_arch_1,
  h, u_domctl);
 copyback = 1;
-}
-break;
+break;
 
 case XEN_DOMCTL_ioport_permission:
 {
@@ -79,8 +79,7 @@ long arch_do_domctl(
 
 if ( (fp + np) = fp || (fp + np)  MAX_IOPORTS )
 ret = -EINVAL;
-else if ( !ioports_access_permitted(current-domain,
-fp, fp + np - 1) ||
+else if ( !ioports_access_permitted(currd, fp, fp + np - 1) ||
   xsm_ioport_permission(XSM_HOOK, d, fp, fp + np - 1, allow) )
 ret = -EPERM;
 else if ( allow )
@@ -89,8 +88,8 @@ long arch_do_domctl(
 ret = ioports_deny_access(d, fp, fp + np - 1);
 if ( !ret )
 memory_type_changed(d);
+break;
 }
-break;
 
 case XEN_DOMCTL_getpageframeinfo:
 {
@@ -127,16 +126,16 @@ long arch_do_domctl(
 break;
 }
 }
-
+
 put_page(page);
 }
 
 copyback = 1;
+break;
 }
-break;
 
 case XEN_DOMCTL_getpageframeinfo3:
-if (!has_32bit_shinfo(current-domain))
+if ( !has_32bit_shinfo(currd) )
 {
 unsigned int n, j;
 unsigned int num = domctl-u.getpageframeinfo3.num;
@@ -150,7 +149,7 @@ long arch_do_domctl(
 break;
 }
 
-page = alloc_domheap_page(current-domain, MEMF_no_owner);
+page = alloc_domheap_page(currd, MEMF_no_owner);
 if ( !page )
 {
 ret = -ENOMEM;
@@ -251,7 +250,7 @@ long arch_do_domctl(
 ret = -ENOMEM;
 break;
 }
- 
+
 ret = 0;
 for ( n = 0; n  num; )
 {
@@ -266,9 +265,9 @@ long arch_do_domctl(
 ret = -EFAULT;
 break;
 }
- 
+
 for ( j = 0; j  k; j++ )
-{  
+{
 struct page_info *page;
 unsigned long gfn = arr32[j];
 
@@ -320,8 +319,8 @@ long arch_do_domctl(
 }
 
 free_xenheap_page(arr32);
+break;
 }
-break;
 
 case XEN_DOMCTL_getmemlist:
 {
@@ -329,7 +328,8 @@ long arch_do_domctl(
 uint64_t mfn;
 struct page_info *page;
 
-if ( unlikely(d-is_dying) ) {
+if ( unlikely(d-is_dying) )
+{
 ret = -EINVAL;
 break;
 }
@@ -346,7 +346,7 @@ long arch_do_domctl(
  * rather than trying to fix it we restrict it for the time being.
  */
 if ( /* No nested locks inside copy_to_guest_offset(). */
- paging_mode_external(current-domain) ||
+ paging_mode_external(currd) ||
  /* Arbitrary limit capping processing time. */
  max_pfns  GB(4) / PAGE_SIZE )
 {
@@ -375,8 +375,8 @@ long arch_do_domctl(
 
 domctl-u.getmemlist.num_pfns = i;
 copyback = 1;
+break;
 }
-break;
 
 case XEN_DOMCTL_hypercall_init:
 {
@@ -403,15 +403,15 @@ long arch_do_domctl(
 unmap_domain_page(hypercall_page);

Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-13 Thread Waiman Long


On 04/13/2015 11:08 AM, Peter Zijlstra wrote:

On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:


+static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
+{
+   struct __qspinlock *l = (void *)lock;
+   struct qspinlock **lp = NULL;
+   struct pv_node *pn = (struct pv_node *)node;
+   int slow_set = false;
+   int loop;
+
+   for (;;) {
+   for (loop = SPIN_THRESHOLD; loop; loop--) {
+   if (!READ_ONCE(l-locked))
+   return;
+
+   cpu_relax();
+   }
+
+   WRITE_ONCE(pn-state, vcpu_halted);
+   if (!lp)
+   lp = pv_hash(lock, pn);
+   /*
+* lp must be set before setting _Q_SLOW_VAL
+*
+* [S] lp = lock[RmW] l = l-locked = 0
+* MB MB
+* [S] l-locked = _Q_SLOW_VAL  [L]   lp
+*
+* Matches the cmpxchg() in pv_queue_spin_unlock().
+*/
+   if (!slow_set
+   !cmpxchg(l-locked, _Q_LOCKED_VAL, _Q_SLOW_VAL)) {
+   /*
+* The lock is free and _Q_SLOW_VAL has never been
+* set. Need to clear the hash bucket before getting
+* the lock.
+*/
+   WRITE_ONCE(*lp, NULL);
+   return;
+   } else if (slow_set   !READ_ONCE(l-locked))
+   return;
+   slow_set = true;

I'm somewhat puzzled by the slow_set thing; what is wrong with the thing
I had, namely:

if (!lp) {
lp = pv_hash(lock, pn);

/*
 * comment
 */
lv = cmpxchg(l-locked, _Q_LOCKED_VAL, _Q_SLOW_VAL);
if (lv != _Q_LOCKED_VAL) {
/* we're woken, unhash and return */
WRITE_ONCE(*lp, NULL);
return;
}
}

+
+   pv_wait(l-locked, _Q_SLOW_VAL);

If we get a spurious wakeup (due to device interrupts or random kick)
we'll loop around but -locked will remain _Q_SLOW_VAL.

The purpose of the slow_set flag is not about the lock value. It is to make
sure that pv_hash_find() will always find a match. Consider the following
scenario:

cpu1cpu2cpu3

pv_wait
spurious wakeup
loop l-locked

 read _Q_SLOW_VAL
 pv_hash_find()
 unlock

 pv_hash()- same entry

cmpxchg(l-locked)
clear hash (?)

Here, the entry for cpu3 will be removed leading to panic when
pv_hash_find() can find the entry. So the hash entry can only be
removed if the other cpu has no chance to see _Q_SLOW_VAL.

Still confused. Afaict that cannot happen with my code. We only do the
cmpxchg(, _Q_SLOW_VAL) _once_.

Only on the first time around that loop will we hash the lock and set
the slow flag. And cpu3 cannot come in on the same entry because we've
not yet released the lock when we find and unhash.




Maybe I am not clear in my illustration. More than one locks can be 
hashed to the same value. So cpu3 is accessing a different lock which 
has the same hashed value as the lock used by cpu1 and cpu2.


Anyway, I remove the slow_set flag by unrolling the retry loop so that 
after pv_wait(), it goes into the 2nd loop instead of going back to the 
top. As a result, cmpxchg and pv_hash can only be called once.


Cheers,
Longman

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v6 1/5] sysctl: Make XEN_SYSCTL_numainfo a little more efficient