[Xen-devel] [seabios test] 36524: regressions - FAIL

2015-03-19 Thread xen . org
flight 36524 seabios real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36524/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-win7-amd64  7 windows-install   fail REGR. vs. 35697

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel              9 guest-start           fail  never pass
 test-amd64-amd64-xl-pvh-amd                9 guest-start           fail  never pass
 test-amd64-i386-libvirt                   10 migrate-support-check fail  never pass
 test-amd64-amd64-xl-pcipt-intel            9 guest-start           fail  never pass
 test-amd64-amd64-libvirt                  10 migrate-support-check fail  never pass
 test-amd64-amd64-xl-qemut-winxpsp3        14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  14 guest-stop            fail  never pass
 test-amd64-amd64-xl-winxpsp3              14 guest-stop            fail  never pass
 test-amd64-i386-xl-winxpsp3               14 guest-stop            fail  never pass
 test-amd64-amd64-xl-win7-amd64            14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3         14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64       14 guest-stop            fail  never pass
 test-amd64-i386-xl-winxpsp3-vcpus1        14 guest-stop            fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64      14 guest-stop            fail  never pass
 test-amd64-i386-xl-win7-amd64             14 guest-stop            fail  never pass
 test-amd64-amd64-xl-qemuu-winxpsp3        14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64       14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemut-winxpsp3         14 guest-stop            fail  never pass

version targeted for testing:
 seabios  a1ac8861049a5ffefc26ca294293ad666954fcc8
baseline version:
 seabios  d23eba6ea3d429ed8a4a34bae7faad20ce44d8a1


People who touched revisions under test:
  Gerd Hoffmann 
  Kevin O'Connor 
  Marcel Apfelbaum 
  Paolo Bonzini 


jobs:
 build-amd64                                 pass
 build-i386                                  pass
 build-amd64-libvirt                         pass
 build-i386-libvirt                          pass
 build-amd64-pvops                           pass
 build-i386-pvops                            pass
 test-amd64-amd64-xl                         pass
 test-amd64-i386-xl                          pass
 test-amd64-amd64-xl-pvh-amd                 fail
 test-amd64-i386-rhel6hvm-amd                pass
 test-amd64-i386-qemut-rhel6hvm-amd          pass
 test-amd64-i386-qemuu-rhel6hvm-amd          pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64   pass
 test-amd64-i386-xl-qemut-debianhvm-amd64    pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64   pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64    pass
 test-amd64-i386-freebsd10-amd64             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64        pass
 test-amd64-i386-xl-qemuu-ovmf-amd64         pass
 test-amd64-amd64-xl-qemut-win7-amd64        fail
 test-amd64-i386-xl-qemut-win7-amd64         fail
 test-amd64-amd64-xl-qemuu-win7-amd64        fail
 test-amd64-i386-xl-qemuu-win7-amd64         fail
 test-amd64-amd64-xl-win7-amd64              fail
 test-amd64-i386-xl-win7-amd64               fail
 test-amd64-amd64-xl-credit2                 pass
 test-amd64-i386-freebsd10-i386              pass
 test-amd64-amd64-xl-pcipt-intel             fail
 test-amd64-amd64-xl-pvh-intel               fail
 test-amd64-i386-rhel6hvm-intel              pass
 test-amd64-i386-qemut-rhel6hvm-intel        pass
 test-amd64-i386-qemuu-rhel6hvm-intel        pass
 test-amd64-amd64-libvirt                    pass
 test-amd64-i386-libvirt                     pass
 test-amd64-amd64-xl-multivcpu               pass
 test-amd64-amd64-pair                       pass
 test-amd64-i386-pair                        pass
 test-amd64-amd64-xl-sedf-pin                pass
 test-amd64-amd64-

[Xen-devel] [rumpuserxen bisection] complete build-i386-rumpuserxen

2015-03-19 Thread xen . org
branch xen-unstable
xen branch xen-unstable
job build-i386-rumpuserxen
test rumpuserxen-build

Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
Tree: rumpuserxen https://github.com/rumpkernel/rumprun-xen
Tree: rumpuserxen_buildrumpsh https://github.com/rumpkernel/buildrump.sh.git
Tree: rumpuserxen_netbsdsrc https://github.com/rumpkernel/src-netbsd
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  rumpuserxen https://github.com/rumpkernel/rumprun-xen
  Bug introduced:  ac3ac63c62f33d4bd043ae008cf9e2909d485215
  Bug not present: 9b21ee776d66e5cc31bd27da3cf95b1d38e39eb9


  commit ac3ac63c62f33d4bd043ae008cf9e2909d485215
  Author: Antti Kantee 
  Date:   Tue Mar 3 22:20:44 2015 +
  
  Move files into platform/xen
  
  Prepares for the rumprun merge.


For bisection revision-tuple graph see:
   http://www.chiark.greenend.org.uk/~xensrcts/results/bisect.rumpuserxen.build-i386-rumpuserxen.rumpuserxen-build.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Searching for failure / basis pass:
 36177 fail [host=grain-weevil] / 35854 ok.
Failure / basis pass flights: 36177 / 35854
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
Tree: rumpuserxen https://github.com/rumpkernel/rumprun-xen
Tree: rumpuserxen_buildrumpsh https://github.com/rumpkernel/buildrump.sh.git
Tree: rumpuserxen_netbsdsrc https://github.com/rumpkernel/src-netbsd
Tree: xen git://xenbits.xen.org/xen.git
Latest a4b276b4ce49c8d70dd841ff885b900ec652b994 
0d37748342e29854db7c9f6c47d7f58c6cfba6b2 
7b103921add3cc1b30204d416ba246bfc8bdc05f 
3a7baff4c60335672b2905c3900ceb6a036c3ef6 
a2f6cbdc2a407e618af697b80db866c15e68656f 
f919dbc0583797d1c5c09da815518084ce77eb81
Basis pass a4b276b4ce49c8d70dd841ff885b900ec652b994 
0d37748342e29854db7c9f6c47d7f58c6cfba6b2 
fe65ed93201e2e522204a065ca2b2f153b610388 
b2e05fb7d236093a0e14f66ce21bdac6b36e7b35 
a2f6cbdc2a407e618af697b80db866c15e68656f 
24b2b8dea180098a3acc809a91cde6c0cc4c8607
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/staging/qemu-xen-unstable.git#a4b276b4ce49c8d70dd841ff885b900ec652b994-a4b276b4ce49c8d70dd841ff885b900ec652b994
 
git://xenbits.xen.org/staging/qemu-upstream-unstable.git#0d37748342e29854db7c9f6c47d7f58c6cfba6b2-0d37748342e29854db7c9f6c47d7f58c6cfba6b2
 
https://github.com/rumpkernel/rumprun-xen#fe65ed93201e2e522204a065ca2b2f153b610388-7b103921add3cc1b30204d416ba246bfc8bdc05f
 
https://github.com/rumpkernel/buildrump.sh.git#b2e05fb7d236093a0e14f66ce21bdac6b36e7b35-3a7baff4c60335672b2905c3900ceb6a036c3ef6
 
https://github.com/rumpkernel/src-netbsd#a2f6cbdc2a407e618af697b80db866c15e68656f-a2f6cbdc2a407e618af697b80db866c15e68656f
 
git://xenbits.xen.org/xen.git#24b2b8dea180098a3acc809a91cde6c0cc4c8607-f919dbc0583797d1c5c09da815518084ce77eb81
+ exec
+ sh -xe
+ cd /export/home/osstest/repos/rumprun-xen
+ git remote set-url origin 
git://drall.uk.xensource.com:9419/https://github.com/rumpkernel/rumprun-xen
+ git fetch -p origin +refs/heads/*:refs/remotes/origin/*
+ exec
+ sh -xe
+ cd /export/home/osstest/repos/buildrump.sh
+ git remote set-url origin 
git://drall.uk.xensource.com:9419/https://github.com/rumpkernel/buildrump.sh.git
+ git fetch -p origin +refs/heads/*:refs/remotes/origin/*
+ exec
+ sh -xe
+ cd /export/home/osstest/repos/xen
+ git remote set-url origin 
git://drall.uk.xensource.com:9419/git://xenbits.xen.org/xen.git
+ git fetch -p origin +refs/heads/*:refs/remotes/origin/*
+ exec
+ sh -xe
+ cd /export/home/osstest/repos/rumprun-xen
+ git remote set-url origin 
git://drall.uk.xensource.com:9419/https://github.com/rumpkernel/rumprun-xen
+ git fetch -p origin +refs/heads/*:refs/remotes/origin/*
+ exec
+ sh -xe
+ cd /export/home/osstest/repos/buildrump.sh
+ git remote set-url origin 
git://drall.uk.xensource.com:9419/https://github.com/rumpkernel/buildrump.sh.git
+ git fetch -p origin +refs/heads/*:refs/remotes/origin/*
+ exec
+ sh -xe
+ cd /export/home/osstest/repos/xen
+ git remote set-url origin 
git://drall.uk.xensource.com:9419/git://xenbits.xen.org/xen.git
+ git fetch -p origin +refs/heads/*:refs/remotes/origin/*
Use of uninitialized value $parents in array dereference at ./adhoc-revtuple-generator line 461.
Use of uninitialized value in concatenation (.) or string at ./adhoc-revtuple-generator line 461.
Use of uninitialized value $parents in array dereference at ./adhoc-revtuple-generator line 461.
Use of uninitialized value in concatenation (.) or string at ./adhoc-revtuple-generator line 461.
Loaded 2077 nodes in revision graph
Searching for test results:
 35897 fail irrelevant
 35854 pass a4b276b4ce49c8d70dd841ff885b900ec652b994 
0d37748342e29854db7c9f6c47d7f58c6cfba6b2 
fe65ed93201e2e52220

Re: [Xen-devel] Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)

2015-03-19 Thread Marek Marczykowski-Górecki
On Thu, Mar 19, 2015 at 03:10:49PM +0200, Vitaly Chernooky wrote:
> David,
> 
> On Thu, Mar 19, 2015 at 3:00 PM, David Vrabel 
> wrote:
> 
> > On 19/03/15 12:10, Iurii Konovalenko wrote:
> > > Hi, guys!
> > >
> > > > When I read that I am not alone and that the issue depends on the kernel
> > > > version, I decided to continue the investigation.
> > > > And I found why our threads lock up on read/write operations.
> > > > In Linux kernel 3.14+ the file read and write syscalls changed a bit:
> > > > the fdget() function was replaced by fdget_pos(), which is fdget()
> > > > plus an additional position mutex lock for files with FMODE_ATOMIC_POS
> > > > (files for inodes with the S_IFREG flag set, i.e. regular files). As I thought
> > > > our xen files are not regular and are nonseekable, I hoped this flag was
> > > > not set. But it is set, because our file system is created by the
> > > > function simple_fill_super(), which hard-codes this flag:
> > > > inode->i_mode = S_IFREG | files->mode;
> > > > So, as a quick hack I made a patch: I just made a copy of this function for
> > > > xen which does not set this flag. It works for me. Could you please
> > > > check if it works for you?
> >
> > I still can't get this to deadlock, but why not clear FMODE_ATOMIC_POS
> > in xenbus_file_open() ?
> >
> 
> Because it is not the root of the issue. FMODE_ATOMIC_POS is just one of the
> results of the bug. Iurii has fixed the root of the issue, but in a suboptimal
> way. So we just need to find the optimal way.

I can just confirm that:
1. (unsurprisingly) the bug is still present in 4.0-rc4
2. both proposed fixes are effective

I'm not sure removing S_IFREG completely is a good idea; I guess
there would be many more side effects...
What about another idea: xenbus_file_open uses nonseekable_open. This
looks like a good place to clear FMODE_ATOMIC_POS if present. It
doesn't make sense to take a position lock on a nonseekable file,
right?
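
As a rough illustration of the idea above, the following userspace C model
(not kernel code) mimics an open helper for nonseekable files that also drops
FMODE_ATOMIC_POS. The struct, the `_model` function names and the flag values
are assumptions made for this sketch; the real definitions live in the
kernel's include/linux/fs.h, and whether the bit should be cleared in
nonseekable_open() itself or in xenbus_file_open() is left open here.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits for this model only. */
#define FMODE_LSEEK       0x0004u
#define FMODE_PREAD       0x0008u
#define FMODE_PWRITE      0x0010u
#define FMODE_ATOMIC_POS  0x8000u

/* Minimal stand-in for the kernel's struct file. */
struct file_model {
    unsigned int f_mode;
};

/*
 * Models a nonseekable open helper: drop the seek-related mode bits and,
 * as proposed in the mail, the atomic-position bit as well.
 */
static int nonseekable_open_model(struct file_model *filp)
{
    assert(filp != NULL);
    filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
    filp->f_mode &= ~FMODE_ATOMIC_POS;
    return 0;
}

/* Models the decision made on read/write: is the position mutex taken? */
static int needs_pos_lock(const struct file_model *filp)
{
    return (filp->f_mode & FMODE_ATOMIC_POS) != 0;
}
```

With this model, a file that starts out with FMODE_ATOMIC_POS set no longer
requests the position lock once the open helper has run.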

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] SeaBios/vTPM: Enable Xen stubdom vTPM for HVM virtual machine

2015-03-19 Thread Xu, Quan


> -Original Message-
> From: Ian Campbell [mailto:ian.campb...@citrix.com]
> Sent: Thursday, March 19, 2015 8:57 PM
> To: Xu, Quan
> Cc: ke...@koconnor.net; stef...@linux.vnet.ibm.com; xen-devel@lists.xen.org;
> qemu-de...@nongnu.org; stefano.stabell...@eu.citrix.com
> Subject: Re: [Xen-devel] [PATCH] SeaBios/vTPM: Enable Xen stubdom vTPM for
> HVM virtual machine
> 
> On Tue, 2015-03-10 at 08:16 -0400, Quan Xu wrote:
> > @@ -151,6 +152,8 @@ device_hardware_setup(void)
> >  esp_scsi_setup();
> >  megasas_setup();
> >  pvscsi_setup();
> > +if (runningOnXen())
> > +vtpm4hvm_setup();
> 
> Is there anything which is actually Xen specific about the driver in tpm.[ch]?
> Would it be better to just probe for it, perhaps gated by a Kconfig option
> which enables TPM support?
> 
> And following that train of thought I think you could reasonably drop "4hvm"
> from the name. And possibly even the leading "v", since I suppose seabios
> shouldn't really care if the tpm is emulated or real so long as it looks like
> a real tpm.
> 
> Ian.

Thanks for your review. Makes sense.

Quan


Re: [Xen-devel] [PATCH v4 00/33] xen/arm: Add support for non-PCI passthrough

2015-03-19 Thread Edgar E. Iglesias
On Thu, Mar 19, 2015 at 07:29:26PM +, Julien Grall wrote:
> Hello all,
> 
> This is the fourth version of this patch series to add support for platform
> device passthrough on ARM.
> 
> In order to pass through a non-PCI device, the user will have to:
> - Manually map the MMIO/IRQ
> - Describe the device in the new partial device tree support
> - Specify the list of devices protected by an IOMMU to assign to the
> guest.
> 
> While this solution is primitive, it allows us to support more complex
> devices in Xen with a little additional work for the user. Attempting to
> do it automatically is more difficult because we may not know the dependencies
> between devices (for instance a network card and a PHY).
> 
> To avoid adding code in DOM0 to manage platform device deassignment, the
> user has to add the property "xen,passthrough" to the device tree node
> describing the device. This can be easily done via U-Boot. For instance,
> if we want to pass through the second network card of a Midway server to the
> guest, the user will have to add the following line to the U-Boot script:
> 
> fdt set /soc/ethernet@fff51000 xen,passthrough
> 
> This series has been tested on Midway by assigning the secondary network card
> to a guest (see instructions below). It requires a separate patch, though, as
> we decided not to support the Midway SMMU within the new drivers.
> 
> I plan to do further testing on other boards.

Hi Julien,

I did a bring-up of your work (an older version of your patches) on
ZynqMP QEMU and it works nicely. Thanks for working on this!

The partial device-tree support is nice and very flexible. I couldn't help
thinking that it would be nice to be able to describe more of the
guest with device-trees. It may be controversial but it would be cool
to be able to go:

xl create my-guest.dtb

A more down-to-earth thing I ran into is that on the ZynqMP, the Cortex-A53
is set up to have 40-bit physical addresses. Our SMMU announces support
for up to 48-bit input addresses (but can be configured for 40 bits
as well).

When Xen sets up passthrough for a device, it probes the SMMU for the
max input address size and uses that as the input size for the
context. But because Xen reuses the page tables from the p2m for the
SMMU, we end up with a mismatch.

I haven't looked at the details of how to fix but my gut feeling
is that we should be re-using the input size of the stage 2
page-tables as the input-size for the SMMU.
And only use the max input size of the SMMU to assert that it
is big enough. I may be missing something though.

The code in question is at the end of arm_smmu_device_cfg_probe(),
already merged.
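
The selection rule suggested above can be sketched as a small standalone C
helper. This is only an illustration under stated assumptions: the function
name smmu_pick_input_bits() and its parameters are hypothetical and do not
exist in the Xen SMMU driver; the sketch just encodes "use the stage-2 input
size, and use the SMMU maximum only as a sanity check".

```c
#include <assert.h>

/*
 * Hypothetical helper: choose the input address size (in bits) to program
 * into an SMMU context that shares its page tables with the p2m.
 *
 * p2m_ipa_bits        - input size the stage-2 page tables were built for
 * smmu_max_input_bits - largest input size the SMMU reports it supports
 *
 * Returns the size to use, or -1 if the SMMU cannot cover the p2m range.
 */
static int smmu_pick_input_bits(unsigned int p2m_ipa_bits,
                                unsigned int smmu_max_input_bits)
{
    assert(p2m_ipa_bits > 0);

    /* The SMMU maximum is only used to check that it is big enough. */
    if (smmu_max_input_bits < p2m_ipa_bits)
        return -1;

    /* Otherwise follow the stage-2 tables, not the SMMU maximum. */
    return (int)p2m_ipa_bits;
}
```

For the ZynqMP case described above (40-bit stage-2 input, SMMU reporting
48 bits), this picks 40 rather than 48.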

Best regards,
Edgar



[Xen-devel] [qemu-upstream-unstable test] 36521: regressions - FAIL

2015-03-19 Thread xen . org
flight 36521 qemu-upstream-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36521/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-pair   17 guest-migrate/src_host/dst_host fail REGR. vs. 33488
 test-amd64-amd64-xl-winxpsp3  7 windows-install   fail REGR. vs. 33488

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel              9 guest-start           fail  never pass
 test-armhf-armhf-xl                       10 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu             10 migrate-support-check fail  never pass
 test-armhf-armhf-xl-sedf-pin              10 migrate-support-check fail  never pass
 test-armhf-armhf-libvirt                  10 migrate-support-check fail  never pass
 test-amd64-i386-libvirt                   10 migrate-support-check fail  never pass
 test-amd64-amd64-libvirt                  10 migrate-support-check fail  never pass
 test-armhf-armhf-xl-sedf                  10 migrate-support-check fail  never pass
 test-armhf-armhf-xl-midway                10 migrate-support-check fail  never pass
 test-amd64-amd64-xl-pvh-amd                9 guest-start           fail  never pass
 test-amd64-amd64-xl-pcipt-intel            9 guest-start           fail  never pass
 test-armhf-armhf-xl-credit2               10 migrate-support-check fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64       14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemut-winxpsp3         14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  14 guest-stop            fail  never pass
 test-amd64-amd64-xl-qemut-winxpsp3        14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64       14 guest-stop            fail  never pass
 test-amd64-i386-xl-win7-amd64             14 guest-stop            fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64      14 guest-stop            fail  never pass
 test-amd64-amd64-xl-win7-amd64            14 guest-stop            fail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64      14 guest-stop            fail  never pass
 test-amd64-i386-xl-winxpsp3-vcpus1        14 guest-stop            fail  never pass
 test-amd64-i386-xl-winxpsp3               14 guest-stop            fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3         14 guest-stop            fail  never pass
 test-amd64-amd64-xl-qemuu-winxpsp3        14 guest-stop            fail  never pass

version targeted for testing:
 qemuu  42ffdf360dd9df66b0a4a7ada059c02a3cf3a8de
baseline version:
 qemuu  0d37748342e29854db7c9f6c47d7f58c6cfba6b2


People who touched revisions under test:
  Alex Williamson 
  Alexander Graf 
  Andreas Färber 
  Cornelia Huck 
  Daniel P. Berrange 
  David Gibson 
  Dinar Valeev 
  Don Slutz 
  Dr. David Alan Gilbert 
  Eduardo Habkost 
  Fam Zheng 
  Gary R Hook 
  Gerd Hoffmann 
  Igor Mammedov 
  Juan Quintela 
  Jun Li 
  Kevin Wolf 
  Laurent Desnogues 
  Leon Alrae 
  Marcel Apfelbaum 
  Max Filippov 
  Max Reitz 
  Michael Roth 
  Michael S. Tsirkin 
  Michael Tokarev 
  Paolo Bonzini 
  Paul Durrant 
  Peter Maydell 
  Peter Wu 
  Riku Voipio 
  Stefan Hajnoczi 
  Stefano Stabellini 
  Vladimir Sementsov-Ogievskiy 
  Zhang Haoyu 


jobs:
 build-amd64                                 pass
 build-armhf                                 pass
 build-i386                                  pass
 build-amd64-libvirt                         pass
 build-armhf-libvirt                         pass
 build-i386-libvirt                          pass
 build-amd64-pvops                           pass
 build-armhf-pvops                           pass
 build-i386-pvops                            pass
 test-amd64-amd64-xl                         pass
 test-armhf-armhf-xl                         pass
 test-amd64-i386-xl                          pass
 test-amd64-amd64-xl-pvh-amd                 fail
 test-amd64-i386-rhel6hvm-amd                pass
 test-amd64-i386-qemut-rhel6hvm-amd          pass
 test-amd64-i386-qemuu-rhel6hvm-amd          pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64   pass
 test-amd64-i386-xl-qemut-debianhvm-amd64    pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64   pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64    pass
 test-amd64-i386-freebsd10-amd64             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64        pass
 test-amd64-i386-xl-qemuu-ovmf-amd64         pass
 test-amd64-amd64-

Re: [Xen-devel] [v2][PATCH 2/2] libxl: introduce gfx_passthru_kind

2015-03-19 Thread Chen, Tiejun



On 2015/3/19 18:44, Ian Campbell wrote:

On Thu, 2015-03-19 at 10:07 +0800, Chen, Tiejun wrote:

This duplicates the code from above. I think this would be best done as:

static int libxl__detect_gfx_passthru_kind(libxl__gc *gc, guest_config)
{
    if (b_info->u.hvm.gfx_passthru_kind != LIBXL_GFX_PASSTHRU_KIND_DEFAULT)
        return 0;

    if (libxl__is_igd_vga_passthru(gc, guest_config)) {
        b_info->u.hvm.gfx_passthru_kind = LIBXL_GFX_PASSTHRU_KIND_IGD;
        return 0;
    }

    LOG(ERROR, "Unable to detect graphics passthru kind");
    return 1;
}

Then for the code in libxl__build_device_model_args_new:

    if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) {
        /* The helper returns nonzero when detection fails. */
        if (libxl__detect_gfx_passthru_kind(gc, guest_config))
            return NULL;
        switch (b_info->u.hvm.gfx_passthru_kind) {
        case LIBXL_GFX_PASSTHRU_KIND_IGD:
            machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg);
            break;
        default:
            LOG(ERROR, "unknown gfx_passthru_kind\n");
            return NULL;
        }
    }

That is, a helper to try and autodetect kind if it is default and then a
single switch entry for each kind.


+default:
+LOG(WARN, "gfx_passthru_kind is invalid so ignored.\n");


Please return an error here, as I've shown above.


Looks good and thanks, but here 'guest_config' is a const so we
shouldn't/can't reset b_info->u.hvm.gfx_passthru_kind like this,

b_info->u.hvm.gfx_passthru_kind = LIBXL_GFX_PASSTHRU_KIND_IGD;

So I tried to refactor a little bit to follow up yours,

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 8599a6a..605b17c 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -409,6 +409,23 @@ static char *dm_spice_options(libxl__gc *gc,
   return opt;
   }

+static int
+libxl__detect_gfx_passthru_kind(libxl__gc *gc,
+const libxl_domain_config *guest_config)
+{
+const libxl_domain_build_info *b_info = &guest_config->b_info;
+
+if (b_info->u.hvm.gfx_passthru_kind != LIBXL_GFX_PASSTHRU_KIND_DEFAULT)
+return b_info->u.hvm.gfx_passthru_kind;
+
+if (libxl__is_igd_vga_passthru(gc, guest_config)) {
+return LIBXL_GFX_PASSTHRU_KIND_IGD;
+}
+
+LOG(ERROR, "Unable to detect graphics passthru kind");
+return -1;


I think you can make this function return enum
libxl_gfx_passthrough_kind and then return
LIBXL_GFX_PASSTHRU_KIND_DEFAULT in this case.


+}
+
   static char ** libxl__build_device_model_args_new(libxl__gc *gc,
   const char *dm, int guest_domid,
   const libxl_domain_config
*guest_config,
@@ -427,7 +444,7 @@ static char **
libxl__build_device_model_args_new(libxl__gc *gc,
   const char *keymap = dm_keymap(guest_config);
   char *machinearg;
   flexarray_t *dm_args;
-int i, connection, devid;
+int i, connection, devid, gfx_passthru_kind;


Please declare this in the smallest necessary scope...
[...]

+if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) {
+gfx_passthru_kind = libxl__detect_gfx_passthru_kind(gc,


i.e. here.


+
guest_config);
+switch (gfx_passthru_kind) {
+case LIBXL_GFX_PASSTHRU_KIND_IGD:
+machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg);
+break;
+default:


With the suggestion to return KIND_DEFAULT if detection fails then I
think an extra case should be added:
 case LIBXL_GFX_PASSTHRU_KIND_DEFAULT:
 LOG(ERROR, "unable to detect required
 gfx_passthru_kind");
 return NULL;


+LOG(ERROR, "unknown gfx_passthru_kind\n");


I think LOG is supposed to not include the final \n.



Refactor again,

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 8599a6a..05c8916 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -409,6 +409,23 @@ static char *dm_spice_options(libxl__gc *gc,
 return opt;
 }

+static enum libxl_gfx_passthru_kind
+libxl__detect_gfx_passthru_kind(libxl__gc *gc,
+const libxl_domain_config *guest_config)
+{
+const libxl_domain_build_info *b_info = &guest_config->b_info;
+
+if (b_info->u.hvm.gfx_passthru_kind != LIBXL_GFX_PASSTHRU_KIND_DEFAULT)
+return b_info->u.hvm.gfx_passthru_kind;
+
+if (libxl__is_igd_vga_passthru(gc, guest_config)) {
+return LIBXL_GFX_PASSTHRU_KIND_IGD;
+}
+
+LOG(ERROR, "Unable to detect graphics passthru kind");
+return LIBXL_GFX_PASSTHRU_KIND_DEFAULT;
+}
+
 static char ** libxl__build_device_model_args_new(libxl__gc *gc,
 const char *dm, int guest_domid,
 const libxl_domain_config 
*guest_config,
@@ -757,6 +771,21 @

Re: [Xen-devel] [PATCH v2 08/13] libxc: Check xc_domain_maximum_gpfn for negative return values

2015-03-19 Thread Konrad Rzeszutek Wilk
> How about this (compile tested but not yet runtime tested):

Runtime tested now too.

I've put the whole lot (including this patch) on 

 git://xenbits.xen.org/people/konradwilk/xen.git xc_cleanup.v3

To ease pulling it in.

Thank you!
> 
> 
> From 92085d29b7e2906095a2bc6849b5a17b478e5c79 Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk 
> Date: Fri, 13 Mar 2015 14:57:44 -0400
> Subject: [PATCH v3] libxc: Check xc_domain_maximum_gpfn for negative return
>  values
> 
> Instead of assuming everything is always OK. We stash
> the gpfns value in a parameter. Since we use it in three
> places we might as well stick it in a common file for
> all three of them to use.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  tools/libxc/xc_core_arm.c| 11 ---
>  tools/libxc/xc_core_x86.c| 18 ++
>  tools/libxc/xc_domain_save.c |  6 +-
>  tools/libxc/xc_private.c | 12 
>  tools/libxc/xc_private.h |  2 ++
>  5 files changed, 33 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/libxc/xc_core_arm.c b/tools/libxc/xc_core_arm.c
> index 16508e7..846bc6c 100644
> --- a/tools/libxc/xc_core_arm.c
> +++ b/tools/libxc/xc_core_arm.c
> @@ -30,12 +30,6 @@ xc_core_arch_gpfn_may_present(struct xc_core_arch_context 
> *arch_ctxt,
>  return 0;
>  }
>  
> -
> -static int nr_gpfns(xc_interface *xch, domid_t domid)
> -{
> -return xc_domain_maximum_gpfn(xch, domid) + 1;
> -}
> -
>  int
>  xc_core_arch_auto_translated_physmap(const xc_dominfo_t *info)
>  {
> @@ -48,9 +42,12 @@ xc_core_arch_memory_map_get(xc_interface *xch, struct 
> xc_core_arch_context *unus
>  xc_core_memory_map_t **mapp,
>  unsigned int *nr_entries)
>  {
> -unsigned long p2m_size = nr_gpfns(xch, info->domid);
> +unsigned long p2m_size = 0;
>  xc_core_memory_map_t *map;
>  
> +if ( xc_nr_gpfns(xch, info->domid, &p2m_size) < 0 )
> +return -1;
> +
>  map = malloc(sizeof(*map));
>  if ( map == NULL )
>  {
> diff --git a/tools/libxc/xc_core_x86.c b/tools/libxc/xc_core_x86.c
> index d8846f1..2f5ffea 100644
> --- a/tools/libxc/xc_core_x86.c
> +++ b/tools/libxc/xc_core_x86.c
> @@ -35,12 +35,6 @@ xc_core_arch_gpfn_may_present(struct xc_core_arch_context 
> *arch_ctxt,
>  return 1;
>  }
>  
> -
> -static int nr_gpfns(xc_interface *xch, domid_t domid)
> -{
> -return xc_domain_maximum_gpfn(xch, domid) + 1;
> -}
> -
>  int
>  xc_core_arch_auto_translated_physmap(const xc_dominfo_t *info)
>  {
> @@ -53,9 +47,12 @@ xc_core_arch_memory_map_get(xc_interface *xch, struct 
> xc_core_arch_context *unus
>  xc_core_memory_map_t **mapp,
>  unsigned int *nr_entries)
>  {
> -unsigned long p2m_size = nr_gpfns(xch, info->domid);
> +unsigned long p2m_size = 0;
>  xc_core_memory_map_t *map;
>  
> +if ( xc_nr_gpfns(xch, info->domid, &p2m_size) < 0 )
> +return -1;
> +
>  map = malloc(sizeof(*map));
>  if ( map == NULL )
>  {
> @@ -88,7 +85,12 @@ xc_core_arch_map_p2m_rw(xc_interface *xch, struct 
> domain_info_context *dinfo, xc
>  int err;
>  int i;
>  
> -dinfo->p2m_size = nr_gpfns(xch, info->domid);
> +if ( xc_nr_gpfns(xch, info->domid, &dinfo->p2m_size) < 0 )
> +{
> +ERROR("Could not get maximum GPFN!");
> +goto out;
> +}
> +
>  if ( dinfo->p2m_size < info->nr_pages  )
>  {
>  ERROR("p2m_size < nr_pages -1 (%lx < %lx", dinfo->p2m_size, 
> info->nr_pages - 1);
> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> index 254fdb3..b410273 100644
> --- a/tools/libxc/xc_domain_save.c
> +++ b/tools/libxc/xc_domain_save.c
> @@ -939,7 +939,11 @@ int xc_domain_save(xc_interface *xch, int io_fd, 
> uint32_t dom, uint32_t max_iter
>  }
>  
>  /* Get the size of the P2M table */
> -dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;
> +if ( xc_nr_gpfns(xch, dom, &dinfo->p2m_size) < 0 )
> +{
> +ERROR("Could not get maximum GPFN!");
> +goto out;
> +}
>  
>  if ( dinfo->p2m_size > ~XEN_DOMCTL_PFINFO_LTAB_MASK )
>  {
> diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
> index 0735e23..0eb49ee 100644
> --- a/tools/libxc/xc_private.c
> +++ b/tools/libxc/xc_private.c
> @@ -540,6 +540,18 @@ long xc_maximum_ram_page(xc_interface *xch)
>  return do_memory_op(xch, XENMEM_maximum_ram_page, NULL, 0);
>  }
>  
> +int xc_nr_gpfns(xc_interface *xch, domid_t domid, unsigned long *gpfns)
> +{
> +int rc = xc_domain_maximum_gpfn(xch, domid);
> +
> +if ( rc >= 0 )
> +{
> +*gpfns = rc + 1;
> +rc = 0;
> +}
> +return rc;
> +}
> +
>  long long xc_domain_get_cpu_usage( xc_interface *xch, domid_t domid, int 
> vcpu )
>  {
>  DECLARE_DOMCTL;
> diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
> index 45b8644..4b7f001 100644
> --- a/tools/libxc/xc_private.h
> +++ b/tools/lib

[Xen-devel] xe timer

2015-03-19 Thread HANNAS YAYA Issa

Hello
I want to implement a timer in the Xen hypervisor but I don't know how to
do it. I searched on Google but did not find how to use the Xen timer (not
the Linux one).

When I build and run Xen, the timer fires only once. Here is my code:

static void timer_handler(void *unused)
{
    printk("hello world in timer\n");
}
static struct timer *domain_timer;

somewhere in my xen source I initialise the timer:

domain_timer = xmalloc(struct timer);
init_timer(domain_timer, timer_handler,NULL,0);
set_timer(domain_timer, SECONDS(60));

Can anybody please explain what is wrong with my code?

Thank you
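
One likely explanation, offered as an assumption rather than a confirmed
diagnosis: Xen timers are one-shot, and set_timer() expects an absolute
expiry time (typically NOW() plus an interval), so a periodic timer has to be
re-armed from its own handler. The userspace C model below illustrates only
that re-arming pattern; struct timer_model, set_timer_model(), advance_time()
and model_now are invented for this sketch and are not the hypervisor's API.
Unlike the code in the question, the handler here receives the timer through
its data pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t s_time_t;                      /* nanoseconds in this model */
#define SECONDS(s) ((s_time_t)(s) * 1000000000LL)

/* Minimal one-shot timer model. */
struct timer_model {
    s_time_t expires;                          /* absolute expiry time */
    int armed;                                 /* nonzero while pending */
    void (*function)(void *);
    void *data;
};

static s_time_t model_now;                     /* stand-in for NOW() */
static int fire_count;                         /* how often the handler ran */

/* Arm the timer for an absolute point in time. */
static void set_timer_model(struct timer_model *t, s_time_t expires)
{
    assert(t != NULL);
    t->expires = expires;
    t->armed = 1;
}

/* Handler that re-arms itself, which is what makes the timer periodic. */
static void timer_handler(void *data)
{
    struct timer_model *t = data;

    fire_count++;
    set_timer_model(t, model_now + SECONDS(60));
}

/* Advance the model clock, firing the timer each time it expires. */
static void advance_time(struct timer_model *t, s_time_t new_now)
{
    while (t->armed && t->expires <= new_now) {
        model_now = t->expires;
        t->armed = 0;                          /* one-shot: disarm, then call */
        t->function(t->data);
    }
    model_now = new_now;
}
```

Without the set_timer_model() call inside the handler, the model fires once
and then stays disarmed, which matches the behaviour described in the
question.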



[Xen-devel] [xen-4.3-testing test] 36526: regressions - FAIL

2015-03-19 Thread xen . org
flight 36526 xen-4.3-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36526/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386    5 xen-build fail REGR. vs. 36483
 build-amd64   5 xen-build fail REGR. vs. 36483

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt    1 build-check(1)   blocked  n/a
 build-amd64-rumpuserxen   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 build-i386-rumpuserxen    1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-pv    1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pcipt-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-rhel6hvm-intel  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl    1 build-check(1)   blocked  n/a
 test-amd64-i386-rhel6hvm-amd  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-qemut-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-armhf-armhf-xl-multivcpu  5 xen-boot fail  never pass
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  5 xen-boot fail   never pass
 test-amd64-amd64-xl-sedf-pin  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-sedf-pin  5 xen-boot fail   never pass
 test-armhf-armhf-xl-sedf  5 xen-boot fail   never pass
 test-armhf-armhf-xl-midway    5 xen-boot fail   never pass
 test-armhf-armhf-xl   5 xen-boot fail   never pass
 test-amd64-amd64-pv   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   5 xen-boot fail   never pass
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-sedf  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-win7-amd64  1 build-check(1)   blocked  n/a
 test-amd64-i386-xend-qemut-winxpsp3  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-winxpsp3-vcpus1  1 build-check(1)   blocked n/a
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemut-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl-win7-amd64  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-winxpsp3  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-winxpsp3  1 build-check(1)   blocked  n/a
 test-amd64-i386-xend-winxpsp3  1 build-check(1)   blocked  n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-winxpsp3  1 build-check(1)   blocked n/a

version targeted for testing:
 xen  c58b16ef1572176cf2f6a424b527b5ed4bb73f17
baseline version:
 xen  1cf1e6024bfec941e10fe7308b04c9da1a7e74e4


People who touched revisions under test:
  Jan Beulich 


jobs:
 build-amd64

Re: [Xen-devel] [PATCH 8/9] qspinlock: Generic paravirt support

2015-03-19 Thread Waiman Long

On 03/19/2015 08:25 AM, Peter Zijlstra wrote:

On Thu, Mar 19, 2015 at 11:12:42AM +0100, Peter Zijlstra wrote:

So I was now thinking of hashing the lock pointer; let me go and quickly
put something together.

A little something like so; ideally we'd allocate the hashtable since
NR_CPUS is kinda bloated, but it shows the idea I think.

And while this has loops in (the rehashing thing) their fwd progress
does not depend on other CPUs.

And I suspect that for the typical lock contention scenarios it's
unlikely we ever really get into long rehashing chains.

---
  include/linux/lfsr.h|   49 
  kernel/locking/qspinlock_paravirt.h |  143 

  2 files changed, 178 insertions(+), 14 deletions(-)


This is a much better alternative.
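
To make the rehashing idea concrete, here is a minimal standalone sketch of a right-shifting Galois LFSR step, written in plain C outside the kernel. The helper names lfsr_step() and lfsr_period() are made up for illustration and are not part of the patch; the 4-bit tap mask 0x9 is the one listed in the quoted lfsr_taps(). A maximal-length mask makes the probe sequence visit every nonzero value once before repeating, which is what lets a rehash walk the table without depending on other CPUs.

```c
#include <stdint.h>

/*
 * One step of a right-shifting Galois LFSR.
 * `taps` is the feedback mask, e.g. 0x9 for a 4-bit register.
 */
static uint32_t lfsr_step(uint32_t val, uint32_t taps)
{
    uint32_t bit = val & 1u;

    val >>= 1;
    if (bit)
        val ^= taps;
    return val;
}

/*
 * Count the steps needed to return to `seed`.
 * Assumes `seed` is nonzero and that `taps` has its top bit set,
 * so that the step function is invertible and the walk is a cycle.
 */
static unsigned lfsr_period(uint32_t seed, uint32_t taps)
{
    uint32_t val = seed;
    unsigned steps = 0;

    do {
        val = lfsr_step(val, taps);
        steps++;
    } while (val != seed);

    return steps;
}
```

With the 4-bit mask 0x9 the walk starting at 1 goes 9, 13, 15, 14, 7, ... and returns to 1 after 15 steps, i.e. it covers all nonzero 4-bit values.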


--- /dev/null
+++ b/include/linux/lfsr.h
@@ -0,0 +1,49 @@
+#ifndef _LINUX_LFSR_H
+#define _LINUX_LFSR_H
+
+/*
+ * Simple Binary Galois Linear Feedback Shift Register
+ *
+ * http://en.wikipedia.org/wiki/Linear_feedback_shift_register
+ *
+ */
+
+extern void __lfsr_needs_more_taps(void);
+
+static __always_inline u32 lfsr_taps(int bits)
+{
+   if (bits ==  1) return 0x0001;
+   if (bits ==  2) return 0x0001;
+   if (bits ==  3) return 0x0003;
+   if (bits ==  4) return 0x0009;
+   if (bits ==  5) return 0x0012;
+   if (bits ==  6) return 0x0021;
+   if (bits ==  7) return 0x0041;
+   if (bits ==  8) return 0x008E;
+   if (bits ==  9) return 0x0108;
+   if (bits == 10) return 0x0204;
+   if (bits == 11) return 0x0402;
+   if (bits == 12) return 0x0829;
+   if (bits == 13) return 0x100D;
+   if (bits == 14) return 0x2015;
+
+   /*
+* For more taps see:
+*   http://users.ece.cmu.edu/~koopman/lfsr/index.html
+*/
+   __lfsr_needs_more_taps();
+
+   return 0;
+}
+
+static inline u32 lfsr(u32 val, int bits)
+{
+   u32 bit = val & 1;
+
+   val >>= 1;
+   if (bit)
+       val ^= lfsr_taps(bits);
+   return val;
+}
+
+#endif /* _LINUX_LFSR_H */
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -2,6 +2,9 @@
  #error "do not include this file"
  #endif

+#include
+#include
+
  /*
   * Implement paravirt qspinlocks; the general idea is to halt the vcpus instead
   * of spinning them.
@@ -107,7 +110,120 @@ static void pv_kick_node(struct mcs_spin
pv_kick(pn->cpu);
  }

-static DEFINE_PER_CPU(struct qspinlock *, __pv_lock_wait);
+/*
+ * Hash table using open addressing with an LFSR probe sequence.
+ *
+ * Since we should not be holding locks from NMI context (very rare indeed) the
+ * max load factor is 0.75, which is around the point where open addressing
+ * breaks down.
+ *
+ * Instead of probing just the immediate bucket we probe all buckets in the
+ * same cacheline.
+ *
+ * http://en.wikipedia.org/wiki/Hash_table#Open_addressing
+ *
+ */
+
+#define HB_RESERVED    ((struct qspinlock *)1)
+
+struct pv_hash_bucket {
+   struct qspinlock *lock;
+   int cpu;
+};
+
+/*
+ * XXX dynamic allocate using nr_cpu_ids instead...
+ */
+#define PV_LOCK_HASH_BITS  (2 + NR_CPUS_BITS)
+


As said here, we should make it dynamically allocated depending on 
num_possible_cpus().



+#if PV_LOCK_HASH_BITS < 6
+#undef PV_LOCK_HASH_BITS
+#define PV_LOCK_HASH_BITS  6
+#endif
+
+#define PV_LOCK_HASH_SIZE  (1 << PV_LOCK_HASH_BITS)
+
+static struct pv_hash_bucket __pv_lock_hash[PV_LOCK_HASH_SIZE] ____cacheline_aligned;
+
+#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket))
+
+static inline u32 hash_align(u32 hash)
+{
+   return hash & ~(PV_HB_PER_LINE - 1);
+}
+
+static struct qspinlock **pv_hash(struct qspinlock *lock)
+{
+   u32 hash = hash_ptr(lock, PV_LOCK_HASH_BITS);
+   struct pv_hash_bucket *hb, *end;
+
+   if (!hash)
+   hash = 1;
+
+   hb = &__pv_lock_hash[hash_align(hash)];
+   for (;;) {
+   for (end = hb + PV_HB_PER_LINE; hb < end; hb++) {
+   if (cmpxchg(&hb->lock, NULL, HB_RESERVED)) {
+   WRITE_ONCE(hb->cpu, smp_processor_id());
+   /*
+* Since we must read lock first and cpu
+* second, we must write cpu first and lock
+* second, therefore use HB_RESERVED to mark an
+* entry in use before writing the values.
+*
+* This can cause hb_hash_find() to not find a
+* cpu even though _Q_SLOW_VAL, this is not a
+* problem since we re-check l->locked before
+* going to sleep and the unlock will have
+* cleared l->locked already.
+*/
+   smp_wmb();

[Xen-devel] [PATCH v5 5/8] sysctl: Add sysctl interface for querying PCI topology

2015-03-19 Thread Boris Ostrovsky
Signed-off-by: Boris Ostrovsky 
---

Changes in v5:
* Increment ti->first_dev in the loop
* Make node in xen_sysctl_pcitopoinfo a uint32
* Move sysctl to follow header file's order
* Update comments in sysctl.h 

 xen/common/sysctl.c |   61 +++
 xen/include/public/sysctl.h |   28 +++
 2 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index acaeeb2..c73dfc9 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -399,6 +399,67 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 break;
 #endif
 
+#ifdef HAS_PCI
+case XEN_SYSCTL_pcitopoinfo:
+{
+xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
+
+if ( guest_handle_is_null(ti->devs) ||
+ guest_handle_is_null(ti->nodes) ||
+ (ti->first_dev > ti->num_devs) )
+{
+ret = -EINVAL;
+break;
+}
+
+while ( ti->first_dev < ti->num_devs )
+{
+physdev_pci_device_t dev;
+uint32_t node;
+struct pci_dev *pdev;
+
+if ( copy_from_guest_offset(&dev, ti->devs, ti->first_dev, 1) )
+{
+ret = -EFAULT;
+break;
+}
+
+spin_lock(&pcidevs_lock);
+pdev = pci_get_pdev(dev.seg, dev.bus, dev.devfn);
+if ( !pdev || (pdev->node == NUMA_NO_NODE) )
+node = XEN_INVALID_NODE_ID;
+else
+node = pdev->node;
+spin_unlock(&pcidevs_lock);
+
+if ( copy_to_guest_offset(ti->nodes, ti->first_dev, &node, 1) )
+{
+ret = -EFAULT;
+break;
+}
+
+ti->first_dev++;
+
+if ( hypercall_preempt_check() )
+break;
+}
+
+if ( !ret )
+{
+if ( __copy_field_to_guest(u_sysctl, op, u.pcitopoinfo.first_dev) )
+{
+ret = -EFAULT;
+break;
+}
+
+if ( ti->first_dev < ti->num_devs )
+ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
+"h", u_sysctl);
+}
+}
+break;
+#endif
+
 default:
 ret = arch_do_sysctl(op, u_sysctl);
 copyback = 0;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 7e0d5fe..ceb8ac8 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -33,6 +33,7 @@
 
 #include "xen.h"
 #include "domctl.h"
+#include "physdev.h"
 
 #define XEN_SYSCTL_INTERFACE_VERSION 0x000C
 
@@ -668,6 +669,31 @@ struct xen_sysctl_psr_cmt_op {
 typedef struct xen_sysctl_psr_cmt_op xen_sysctl_psr_cmt_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cmt_op_t);
 
+/* XEN_SYSCTL_pcitopoinfo */
+struct xen_sysctl_pcitopoinfo {
+/* IN: Number of elements in 'devs' and 'nodes' arrays */
+uint32_t num_devs;
+
+/*
+ * IN/OUT: First element of the devs array that needs to be processed by
+ * hypervisor.
+ * This is used by hypercall continuations, callers must set it to zero.
+ */
+uint32_t first_dev;
+
+/* IN: list of devices */
+XEN_GUEST_HANDLE_64(physdev_pci_device_t) devs;
+
+/*
+ * OUT: node identifier for each device.
+ * If information for a particular device is not available then set
+ * to XEN_INVALID_NODE_ID.
+ */
+XEN_GUEST_HANDLE_64(uint32) nodes;
+};
+typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t);
+
 struct xen_sysctl {
 uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -690,12 +716,14 @@ struct xen_sysctl {
 #define XEN_SYSCTL_scheduler_op  19
 #define XEN_SYSCTL_coverage_op   20
 #define XEN_SYSCTL_psr_cmt_op                    21
+#define XEN_SYSCTL_pcitopoinfo   22
 uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
 union {
 struct xen_sysctl_readconsole   readconsole;
 struct xen_sysctl_tbuf_op   tbuf_op;
 struct xen_sysctl_physinfo  physinfo;
 struct xen_sysctl_cputopoinfo   cputopoinfo;
+struct xen_sysctl_pcitopoinfo   pcitopoinfo;
 struct xen_sysctl_numainfo  numainfo;
 struct xen_sysctl_sched_id  sched_id;
 struct xen_sysctl_perfc_op  perfc_op;
-- 
1.7.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 2/8] pci: Stash device's PXM information in struct pci_dev

2015-03-19 Thread Boris Ostrovsky
If ACPI provides PXM data for IO devices then dom0 will pass it to
the hypervisor during the PHYSDEVOP_pci_device_add call. This information,
however, is currently ignored.

We will store this information (in the form of a node ID) in the pci_dev
structure so that we can provide it, for example, to the toolstack
when it adds support (in the following patches) for querying the
hypervisor about device topology.

We will also print it when the user requests a device information dump.

Signed-off-by: Boris Ostrovsky 
---

Changes in v5:
* Replace u8 with nodeid_t

 xen/arch/x86/physdev.c|   23 ---
 xen/drivers/passthrough/pci.c |   13 +
 xen/include/public/physdev.h  |6 ++
 xen/include/xen/pci.h |5 -
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 1be1d50..57b7800 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( copy_from_guest(&manage_pci, arg, 1) != 0 )
 break;
 
-ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL);
+ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn,
+ NULL, NUMA_NO_NODE);
 break;
 }
 
@@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
 ret = pci_add_device(0, manage_pci_ext.bus,
  manage_pci_ext.devfn,
- &pdev_info);
+ &pdev_info, NUMA_NO_NODE);
 break;
 }
 
 case PHYSDEVOP_pci_device_add: {
 struct physdev_pci_device_add add;
 struct pci_dev_info pdev_info;
+nodeid_t node;
 
 ret = -EFAULT;
 if ( copy_from_guest(&add, arg, 1) != 0 )
@@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 }
 else
 pdev_info.is_virtfn = 0;
-ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
+
+if ( add.flags & XEN_PCI_DEV_PXM )
+{
+uint32_t pxm;
+size_t optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
+sizeof(add.optarr[0]);
+
+if ( copy_from_guest_offset(&pxm, arg, optarr_off, 1) )
+break;
+
+node = pxm_to_node(pxm);
+}
+else
+node = NUMA_NO_NODE;
+
+ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info, node);
 break;
 }
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 4b83583..ecd061e 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -568,7 +568,8 @@ static void pci_enable_acs(struct pci_dev *pdev)
 pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
 }
 
-int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
+int pci_add_device(u16 seg, u8 bus, u8 devfn,
+   const struct pci_dev_info *info, nodeid_t node)
 {
 struct pci_seg *pseg;
 struct pci_dev *pdev;
@@ -586,7 +587,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
 pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
 spin_unlock(&pcidevs_lock);
 if ( !pdev )
-pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL);
+pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
+   NULL, node);
 pdev_type = "virtual function";
 }
 else
@@ -609,6 +611,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
 if ( !pdev )
 goto out;
 
+pdev->node = node;
+
 if ( info )
 pdev->info = *info;
 else if ( !pdev->vf_rlen[0] )
@@ -1191,10 +1195,11 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
 
 list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
 {
-printk("%04x:%02x:%02x.%u - dom %-3d - MSIs < ",
+printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ",
pseg->nr, pdev->bus,
PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
-   pdev->domain ? pdev->domain->domain_id : -1);
+   pdev->domain ? pdev->domain->domain_id : -1,
+   (pdev->node != NUMA_NO_NODE) ? pdev->node : -1);
 list_for_each_entry ( msi, &pdev->msi_list, list )
printk("%d ", msi->irq);
 printk(">\n");
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index 2683719..f33845d 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -293,6 +293,12 @@ struct physdev_pci_device_add {
 uint8_t bus;
 uint8_t devfn;
 } physfn;
+
+/*
+ * Optional parame

[Xen-devel] [PATCH v5 0/8] Display IO topology when PXM data is available (plus some cleanup)

2015-03-19 Thread Boris Ostrovsky
Changes in v5:
* Make CPU topology and NUMA info sysctls behave more like XEN_DOMCTL_get_vcpu_msrs
  when passed NULL buffers. This required toolstack changes as well
* Don't use 8-bit data types in interfaces
* Fold interface version update into patch#3

Changes in v4:
* Split cputopology and NUMA info changes into separate patches
* Added patch#1 (partly because patch#4 needs to know when distance is invalid,
  i.e. NUMA_NO_DISTANCE)
* Split sysctl version update into a separate patch
* Other changes are listed in each patch
* NOTE: I did not test python's xc changes since I don't think I know how.

Changes in v3:
* Added patch #1 to more consistently define nodes as a u8 and properly
  use NUMA_NO_NODE.
* Make changes to xen_sysctl_numainfo, similar to those made to
  xen_sysctl_topologyinfo. (Q: I kept both sets of changes in the same
  patch #3 to avoid bumping interface version twice. Perhaps it's better
  to split it into two?)
* Instead of copying data for each loop index allocate a buffer and copy
  once for all three queries in sysctl.c.
* Move hypercall buffer management from libxl to libxc (as requested by
  Dario, patches #5 and #6).
* Report topology info for offlined CPUs as well
* Added LIBXL_HAVE_PCITOPO macro

Changes in v2:
* Split topology sysctls into two --- one for CPU topology and the other
  for devices
* Avoid long loops in the hypervisor by using continuations. (I am not
  particularly happy about using first_dev in the interface, suggestions
  for a better interface would be appreciated)
* Use proper libxl conventions for interfaces
* Avoid hypervisor stack corruption when copying PXM data from guest
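
The continuation scheme mentioned in the v2 changes can be sketched in plain C as follows. This is only an illustration under stated assumptions: struct topo_req, process_some() and run_to_completion() are hypothetical names, the per-call "budget" stands in for the hypervisor's preemption check, and the node lookup is a dummy. The point is that first_dev acts as a resume cursor, so a re-issued call picks up where the previous one stopped.

```c
#include <stdint.h>

/* Hypothetical request layout, loosely modelled on num_devs/first_dev. */
struct topo_req {
    uint32_t num_devs;    /* IN: number of entries in devs[] and nodes[] */
    uint32_t first_dev;   /* IN/OUT: next entry to process (resume cursor) */
    const uint32_t *devs; /* IN: device identifiers (dummy values here) */
    uint32_t *nodes;      /* OUT: one node per device */
};

/*
 * Process at most `budget` entries (budget must be nonzero).
 * Returns 1 if work remains, 0 if finished, -1 on a bad cursor.
 */
static int process_some(struct topo_req *req, unsigned budget)
{
    unsigned done = 0;

    if (req->first_dev > req->num_devs)
        return -1;

    while (req->first_dev < req->num_devs && done < budget) {
        /* Dummy lookup: pretend even devices sit on node 0, odd on node 1. */
        req->nodes[req->first_dev] = req->devs[req->first_dev] % 2;
        req->first_dev++;
        done++;
    }

    return req->first_dev < req->num_devs;
}

/* Re-issue the call until it reports completion; returns the call count. */
static unsigned run_to_completion(struct topo_req *req, unsigned budget)
{
    unsigned calls = 0;
    int rc;

    do {
        rc = process_some(req, budget);
        calls++;
    } while (rc > 0);

    return calls;
}
```

In this model the caller loops explicitly; in the series itself the re-invocation is arranged by the hypervisor, and the caller only has to start with first_dev set to zero.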


A few patches that add an interface for querying the hypervisor about device
topology and allow 'xl info -n' to display this information if a PXM object
is provided by ACPI.

This series also makes some optimizations and cleanup of current CPU
topology and NUMA sysctl queries.



Boris Ostrovsky (8):
  numa: __node_distance() should return u8
  pci: Stash device's PXM information in struct pci_dev
  sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  sysctl: Make XEN_SYSCTL_numainfo a little more efficient
  sysctl: Add sysctl interface for querying PCI topology
  libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer
management to libxc
  libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management
to libxc
  libxl: Add interface for querying hypervisor about PCI topology

 tools/libxc/include/xenctrl.h |   12 ++-
 tools/libxc/xc_misc.c |  102 ---
 tools/libxl/libxl.c   |  183 --
 tools/libxl/libxl.h   |   12 ++
 tools/libxl/libxl_freebsd.c   |   12 ++
 tools/libxl/libxl_internal.h  |5 +
 tools/libxl/libxl_linux.c |   69 +
 tools/libxl/libxl_netbsd.c|   12 ++
 tools/libxl/libxl_types.idl   |7 ++
 tools/libxl/libxl_utils.c |8 ++
 tools/libxl/xl_cmdimpl.c  |   40 ++--
 tools/misc/xenpm.c|  101 --
 tools/python/xen/lowlevel/xc/xc.c |  105 +++-
 xen/arch/x86/physdev.c|   23 -
 xen/arch/x86/srat.c   |   13 ++-
 xen/common/page_alloc.c   |4 +-
 xen/common/sysctl.c   |  200 +++--
 xen/drivers/passthrough/pci.c |   13 ++-
 xen/include/asm-x86/numa.h|2 +-
 xen/include/public/physdev.h  |6 +
 xen/include/public/sysctl.h   |  138 -
 xen/include/xen/numa.h|3 +-
 xen/include/xen/pci.h |5 +-
 23 files changed, 715 insertions(+), 360 deletions(-)




[Xen-devel] [PATCH v5 1/8] numa: __node_distance() should return u8

2015-03-19 Thread Boris Ostrovsky
SLIT values are byte-sized and some of them (0-9 and 255) have
special meaning. Adjust __node_distance() to reflect this and
modify scrub_heap_pages() and do_sysctl() to deal with
__node_distance() returning an invalid SLIT entry.
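
As a rough standalone illustration of the rule described above (a sketch, not the patch itself): slit_to_distance() and distance_for_guest() are hypothetical helper names, and the ~0u "unknown" marker mirrors what the sysctl hunk below reports when a distance is invalid.

```c
#include <stdint.h>

#define NUMA_NO_DISTANCE 0xFF

/* Map a raw SLIT byte to a distance, rejecting 0-9 and 0xFF. */
static uint8_t slit_to_distance(uint8_t slit_val)
{
    if (slit_val == 0xFF || slit_val <= 9)
        return NUMA_NO_DISTANCE;
    return slit_val;
}

/* Widen to the 32-bit value reported to callers; ~0u means "no distance". */
static uint32_t distance_for_guest(uint8_t slit_val)
{
    uint8_t d = slit_to_distance(slit_val);

    return (d != NUMA_NO_DISTANCE) ? d : ~0u;
}
```

Callers that compare distances, such as a "find the closest node" loop, need to skip NUMA_NO_DISTANCE explicitly, since 0xFF would otherwise look like a valid (very large) distance.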

Signed-off-by: Boris Ostrovsky 
---

Changes in v5:
* XEN_SYSCTL_numainfo knows about NUMA_NO_DISTANCE
* Cleaner changes in __node_distance()

 xen/arch/x86/srat.c|   13 ++---
 xen/common/page_alloc.c|4 ++--
 xen/common/sysctl.c|6 +-
 xen/include/asm-x86/numa.h |2 +-
 xen/include/xen/numa.h |3 ++-
 5 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index dfabba3..92c89a5 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -496,14 +496,21 @@ static unsigned node_to_pxm(nodeid_t n)
return 0;
 }
 
-int __node_distance(nodeid_t a, nodeid_t b)
+u8 __node_distance(nodeid_t a, nodeid_t b)
 {
-   int index;
+   unsigned index;
+   u8 slit_val;
 
if (!acpi_slit)
return a == b ? 10 : 20;
index = acpi_slit->locality_count * node_to_pxm(a);
-   return acpi_slit->entry[index + node_to_pxm(b)];
+   slit_val = acpi_slit->entry[index + node_to_pxm(b)];
+
+   /* ACPI defines 0xff as an unreachable node and 0-9 are undefined */
+   if ((slit_val == 0xff) || (slit_val <= 9))
+   return NUMA_NO_DISTANCE;
+   else
+   return slit_val;
 }
 
 EXPORT_SYMBOL(__node_distance);
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index d999296..bfb356e 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1434,13 +1434,13 @@ void __init scrub_heap_pages(void)
 /* Figure out which NODE CPUs are close. */
 for_each_online_node ( j )
 {
-int distance;
+u8 distance;
 
 if ( cpumask_empty(&node_to_cpumask(j)) )
 continue;
 
 distance = __node_distance(i, j);
-if ( distance < last_distance )
+if ( (distance < last_distance) && (distance != NUMA_NO_DISTANCE) )
 {
 last_distance = distance;
 best_node = j;
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 0cb6ee1..6fdd029 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -304,7 +304,11 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
 uint32_t distance = ~0u;
 if ( node_online(i) && node_online(j) )
-distance = __node_distance(i, j);
+{
+u8 d = __node_distance(i, j);
+if ( d != NUMA_NO_DISTANCE )
+distance = d;
+}
 if ( copy_to_guest_offset(
 ni->node_to_node_distance,
 i*(max_node_index+1) + j, &distance, 1) )
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index cc5b5d1..7a489d3 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -85,6 +85,6 @@ extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
 #endif
 
 void srat_parse_regions(u64 addr);
-extern int __node_distance(nodeid_t a, nodeid_t b);
+extern u8 __node_distance(nodeid_t a, nodeid_t b);
 
 #endif
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index ac4b391..7aef1a8 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -7,7 +7,8 @@
 #define NODES_SHIFT 0
 #endif
 
-#define NUMA_NO_NODE    0xFF
+#define NUMA_NO_NODE 0xFF
+#define NUMA_NO_DISTANCE 0xFF
 
 #define MAX_NUMNODES    (1 << NODES_SHIFT)
 
-- 
1.7.1




[Xen-devel] [PATCH v5 8/8] libxl: Add interface for querying hypervisor about PCI topology

2015-03-19 Thread Boris Ostrovsky
.. and use this new interface to display it along with CPU topology
and NUMA information when the 'xl info -n' command is issued.

The output will look like
...
cpu_topology   :
cpu:    core    socket     node
  0:       0        0        0
...
device topology:
device   node
:00:00.0  0
:00:01.0  0
...

Signed-off-by: Boris Ostrovsky 
Reviewed-by: Dario Faggioli 
Acked-by: Ian Campbell 
---
 tools/libxc/include/xenctrl.h |3 ++
 tools/libxc/xc_misc.c |   31 ++
 tools/libxl/libxl.c   |   42 +
 tools/libxl/libxl.h   |   12 +++
 tools/libxl/libxl_freebsd.c   |   12 +++
 tools/libxl/libxl_internal.h  |5 +++
 tools/libxl/libxl_linux.c |   69 +
 tools/libxl/libxl_netbsd.c|   12 +++
 tools/libxl/libxl_types.idl   |7 
 tools/libxl/libxl_utils.c |8 +
 tools/libxl/xl_cmdimpl.c  |   40 +++
 11 files changed, 234 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 54540e7..f1207fa 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1229,6 +1229,7 @@ typedef xen_sysctl_physinfo_t xc_physinfo_t;
 typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 typedef xen_sysctl_meminfo_t xc_meminfo_t;
+typedef xen_sysctl_pcitopoinfo_t xc_pcitopoinfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
 typedef uint32_t xc_cpu_to_socket_t;
@@ -1242,6 +1243,8 @@ int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
xc_cputopo_t *cputopo);
 int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
 xc_meminfo_t *meminfo, uint32_t *distance);
+int xc_pcitopoinfo(xc_interface *xch, unsigned num_devs,
+   physdev_pci_device_t *devs, uint32_t *nodes);
 
 int xc_sched_id(xc_interface *xch,
 int *sched_id);
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 607ae61..6e10429 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -254,6 +254,37 @@ out:
 return ret;
 }
 
+int xc_pcitopoinfo(xc_interface *xch, unsigned num_devs,
+   physdev_pci_device_t *devs,
+   uint32_t *nodes)
+{
+int ret;
+DECLARE_SYSCTL;
+DECLARE_HYPERCALL_BOUNCE(devs, num_devs * sizeof(*devs),
+ XC_HYPERCALL_BUFFER_BOUNCE_IN);
+DECLARE_HYPERCALL_BOUNCE(nodes, num_devs* sizeof(*nodes),
+ XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+if ((ret = xc_hypercall_bounce_pre(xch, devs)))
+goto out;
+if ((ret = xc_hypercall_bounce_pre(xch, nodes)))
+goto out;
+
+sysctl.u.pcitopoinfo.first_dev = 0;
+sysctl.u.pcitopoinfo.num_devs = num_devs;
+set_xen_guest_handle(sysctl.u.pcitopoinfo.devs, devs);
+set_xen_guest_handle(sysctl.u.pcitopoinfo.nodes, nodes);
+
+sysctl.cmd = XEN_SYSCTL_pcitopoinfo;
+
+ret = do_sysctl(xch, &sysctl);
+
+ out:
+xc_hypercall_bounce_post(xch, devs);
+xc_hypercall_bounce_post(xch, nodes);
+
+return ret;
+}
 
 int xc_sched_id(xc_interface *xch,
 int *sched_id)
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index ee97a54..1c51c53 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5091,6 +5091,48 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 return ret;
 }
 
+libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
+{
+GC_INIT(ctx);
+physdev_pci_device_t *devs;
+uint32_t *nodes;
+libxl_pcitopology *ret = NULL;
+int i;
+
+*num_devs = libxl__pci_numdevs(gc);
+if (*num_devs < 0) {
+LOG(ERROR, "Unable to determine number of PCI devices");
+goto out;
+}
+
+devs = libxl__zalloc(gc, sizeof(*devs) * *num_devs);
+nodes = libxl__zalloc(gc, sizeof(*nodes) * *num_devs);
+
+if (libxl__pci_topology_init(gc, devs, *num_devs)) {
+LOGE(ERROR, "Cannot initialize PCI hypercall structure");
+goto out;
+}
+
+if (xc_pcitopoinfo(ctx->xch, *num_devs, devs, nodes) != 0) {
+LOGE(ERROR, "PCI topology info hypercall failed");
+goto out;
+}
+
+ret = libxl__zalloc(NOGC, sizeof(libxl_pcitopology) * *num_devs);
+
+for (i = 0; i < *num_devs; i++) {
+ret[i].seg = devs[i].seg;
+ret[i].bus = devs[i].bus;
+ret[i].devfn = devs[i].devfn;
+ret[i].node = (nodes[i] == XEN_INVALID_NODE_ID) ?
+LIBXL_PCITOPOLOGY_INVALID_ENTRY : nodes[i];
+}
+
+ out:
+GC_FREE;
+return ret;
+}
+
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 {
 GC_INIT(ctx);
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5eec092..c447636 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -720,6 +720,14 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, libxl_mac *src);
 #de

[Xen-devel] [PATCH v5 6/8] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc

2015-03-19 Thread Boris Ostrovsky
xc_cputopoinfo() is not expected to be used on a hot path and therefore
hypercall buffer management can be pushed into libxc. This will simplify
life for callers.

Also update error reporting macros.

Signed-off-by: Boris Ostrovsky 
---
Changes in v5:
* Adjust to new interface (as described in changelog for patch 3/8)

 tools/libxc/include/xenctrl.h |5 ++-
 tools/libxc/xc_misc.c |   28 +++--
 tools/libxl/libxl.c   |   37 ++--
 tools/misc/xenpm.c|   41 +++-
 tools/python/xen/lowlevel/xc/xc.c |   20 +++--
 5 files changed, 61 insertions(+), 70 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index f64f815..14d22ce 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch,
 int xc_send_debug_keys(xc_interface *xch, char *keys);
 
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
-typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
+typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
@@ -1237,7 +1237,8 @@ typedef uint64_t xc_node_to_memfree_t;
 typedef uint32_t xc_node_to_node_dist_t;
 
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
-int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info);
+int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
+   xc_cputopo_t *cputopo);
 int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
 
 int xc_sched_id(xc_interface *xch,
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index be68291..411128e 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -177,22 +177,36 @@ int xc_physinfo(xc_interface *xch,
 return 0;
 }
 
-int xc_cputopoinfo(xc_interface *xch,
-   xc_cputopoinfo_t *put_info)
+int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
+   xc_cputopo_t *cputopo)
 {
 int ret;
 DECLARE_SYSCTL;
+DECLARE_HYPERCALL_BOUNCE(cputopo, *max_cpus * sizeof(*cputopo),
+ XC_HYPERCALL_BUFFER_BOUNCE_OUT);
 
-sysctl.cmd = XEN_SYSCTL_cputopoinfo;
+if (cputopo) {
+if ((ret = xc_hypercall_bounce_pre(xch, cputopo)))
+goto out;
+
+sysctl.u.cputopoinfo.num_cpus = *max_cpus;
+set_xen_guest_handle(sysctl.u.cputopoinfo.cputopo, cputopo);
+} else
+set_xen_guest_handle(sysctl.u.cputopoinfo.cputopo,
+ HYPERCALL_BUFFER_NULL);
 
-memcpy(&sysctl.u.cputopoinfo, put_info, sizeof(*put_info));
+sysctl.cmd = XEN_SYSCTL_cputopoinfo;
 
 if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
-return ret;
+goto out;
 
-memcpy(put_info, &sysctl.u.cputopoinfo, sizeof(*put_info));
+*max_cpus = sysctl.u.cputopoinfo.num_cpus;
 
-return 0;
+out:
+if (cputopo)
+xc_hypercall_bounce_post(xch, cputopo);
+
+return ret;
 }
 
 int xc_numainfo(xc_interface *xch,
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0234e36..2b7d19c 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5054,37 +5054,28 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 {
 GC_INIT(ctx);
-xc_cputopoinfo_t tinfo;
-DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+xc_cputopo_t *cputopo;
 libxl_cputopology *ret = NULL;
 int i;
+unsigned num_cpus;
 
-/* Setting buffer to NULL makes the hypercall return number of CPUs */
-set_xen_guest_handle(tinfo.cputopo, HYPERCALL_BUFFER_NULL);
-if (xc_cputopoinfo(ctx->xch, &tinfo) != 0)
+/* Setting buffer to NULL makes the call return number of CPUs */
+if (xc_cputopoinfo(ctx->xch, &num_cpus, NULL))
 {
-LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
-ret = NULL;
+LOGEV(ERROR, errno, "Unable to determine number of CPUS");
 goto out;
 }
 
-cputopo = xc_hypercall_buffer_alloc(ctx->xch, cputopo,
-sizeof(*cputopo) * tinfo.num_cpus);
-if (cputopo == NULL) {
-LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
-"Unable to allocate hypercall arguments");
-goto fail;
-}
-set_xen_guest_handle(tinfo.cputopo, cputopo);
+cputopo = libxl__zalloc(gc, sizeof(*cputopo) * num_cpus);
 
-if (xc_cputopoinfo(ctx->xch, &tinfo) != 0) {
-LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "CPU topology info hypercall failed");
-goto fail;
+if (xc_cputopoinfo(ctx->xch, &num_cpus, cputopo)) {
+LOGEV(ERROR, errno, "CPU topology info hypercall failed");
+goto out;
 }
 
-ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * tinfo.num_cpus);
+ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * nu

[Xen-devel] [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient

2015-03-19 Thread Boris Ostrovsky
A number of changes to XEN_SYSCTL_topologyinfo interface:

* Instead of copying data for each field in xen_sysctl_topologyinfo separately
  put cpu/socket/node into a single structure and do a single copy for each
  processor.
* A NULL cputopo handle passed is a request for the maximum number of CPUs
  (num_cpus). If cputopo is valid and num_cpus is smaller than the number of
  CPUs in the system then -ENOBUFS is returned (and the correct num_cpus is
  provided).
* Do not use max_cpu_index, which is almost always used for calculating the
  number of CPUs (thus requiring adding or subtracting one); replace it with
  num_cpus.
* There is no need to copy whole op in sysctl to user at the end, we only need
  num_cpus.
* Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
  that these are used for CPU topology. Subsequent patch will add support for
  PCI topology sysctl.
* Replace INVALID_TOPOLOGY_ID with "XEN_"-prefixed macros for each invalid type
  (core, socket, node).
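
The NULL-handle convention in the list above amounts to a two-call pattern on the caller's side. Below is a minimal sketch under stated assumptions: query_cpu_topology() is a hypothetical stand-in for the real hypercall wrapper, backed here by a fake four-CPU system, and struct cpu_topo only mimics the single per-CPU structure. It is not the libxc implementation.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define FAKE_NR_CPUS 4u /* assumption: pretend the system has 4 CPUs */

struct cpu_topo {
    uint32_t core, socket, node;
};

/*
 * Hypothetical query: a NULL buffer asks only for the CPU count; a buffer
 * that is too small yields -ENOBUFS and the required count in *num_cpus.
 */
static int query_cpu_topology(unsigned *num_cpus, struct cpu_topo *buf)
{
    unsigned i;

    if (buf == NULL) {
        *num_cpus = FAKE_NR_CPUS;
        return 0;
    }
    if (*num_cpus < FAKE_NR_CPUS) {
        *num_cpus = FAKE_NR_CPUS;
        return -ENOBUFS;
    }
    for (i = 0; i < FAKE_NR_CPUS; i++) {
        buf[i].core = i % 2;
        buf[i].socket = i / 2;
        buf[i].node = 0;
    }
    *num_cpus = FAKE_NR_CPUS;
    return 0;
}

/* Caller side: ask for the size first, then allocate and fetch the data. */
static struct cpu_topo *get_cpu_topology(unsigned *count)
{
    struct cpu_topo *buf;

    if (query_cpu_topology(count, NULL) != 0)
        return NULL;

    buf = calloc(*count, sizeof(*buf));
    if (buf == NULL)
        return NULL;

    if (query_cpu_topology(count, buf) != 0) {
        free(buf);
        return NULL;
    }
    return buf; /* caller frees */
}
```

Reporting the required count alongside -ENOBUFS lets a caller that guessed too small retry once with the right size instead of probing.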

Update sysctl version to 0x000C

Signed-off-by: Boris Ostrovsky 
---

Changes in v5:
* Make XEN_SYSCTL_cputopoinfo treat passed in NULL handles as requests
  for array size (and a too-small size passed in results in -ENOBUFS)
* Make node in xen_sysctl_cputopo a uint32
* On the toolstack side use NULL handle to determine array size in 
  libxl_get_cpu_topology() and use dynamic arrays in python's xc.c and
  xenpm.c

 tools/libxc/include/xenctrl.h |4 +-
 tools/libxc/xc_misc.c |   10 ++--
 tools/libxl/libxl.c   |   55 +++
 tools/misc/xenpm.c|  106 ++---
 tools/python/xen/lowlevel/xc/xc.c |   57 +++-
 xen/common/sysctl.c   |   68 
 xen/include/public/sysctl.h   |   57 +++-
 7 files changed, 177 insertions(+), 180 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index df18292..f64f815 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch,
 int xc_send_debug_keys(xc_interface *xch, char *keys);
 
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
-typedef xen_sysctl_topologyinfo_t xc_topologyinfo_t;
+typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
@@ -1237,7 +1237,7 @@ typedef uint64_t xc_node_to_memfree_t;
 typedef uint32_t xc_node_to_node_dist_t;
 
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
-int xc_topologyinfo(xc_interface *xch, xc_topologyinfo_t *info);
+int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info);
 int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
 
 int xc_sched_id(xc_interface *xch,
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index e253a58..be68291 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -177,20 +177,20 @@ int xc_physinfo(xc_interface *xch,
 return 0;
 }
 
-int xc_topologyinfo(xc_interface *xch,
-xc_topologyinfo_t *put_info)
+int xc_cputopoinfo(xc_interface *xch,
+   xc_cputopoinfo_t *put_info)
 {
 int ret;
 DECLARE_SYSCTL;
 
-sysctl.cmd = XEN_SYSCTL_topologyinfo;
+sysctl.cmd = XEN_SYSCTL_cputopoinfo;
 
-memcpy(&sysctl.u.topologyinfo, put_info, sizeof(*put_info));
+memcpy(&sysctl.u.cputopoinfo, put_info, sizeof(*put_info));
 
 if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
 return ret;
 
-memcpy(put_info, &sysctl.u.topologyinfo, sizeof(*put_info));
+memcpy(put_info, &sysctl.u.cputopoinfo, sizeof(*put_info));
 
 return 0;
 }
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 94b4d59..c989abf 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5054,64 +5054,51 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
*physinfo)
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 {
 GC_INIT(ctx);
-xc_topologyinfo_t tinfo;
-DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
-DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
-DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
+xc_cputopoinfo_t tinfo;
+DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
 libxl_cputopology *ret = NULL;
 int i;
-int max_cpus;
 
-max_cpus = libxl_get_max_cpus(ctx);
-if (max_cpus < 0)
+/* Setting buffer to NULL makes the hypercall return number of CPUs */
+set_xen_guest_handle(tinfo.cputopo, HYPERCALL_BUFFER_NULL);
+if (xc_cputopoinfo(ctx->xch, &tinfo) != 0)
 {
 LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
 ret = NULL;
 goto out;
 }
 
-coremap = xc_hypercall_buffer_alloc
-(ctx->xch, coremap, sizeof(*coremap) * max_cpus);
-socketmap = xc_hypercall_buffer_alloc
-(ctx->xch, socketmap, sizeof(*socketmap) * max_cpus);
-nodemap = x

[Xen-devel] [PATCH v5 7/8] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc

2015-03-19 Thread Boris Ostrovsky
xc_numainfo() is not expected to be used on a hot path and therefore
hypercall buffer management can be pushed into libxc. This will simplify
life for callers.

Also update error logging macros.

Signed-off-by: Boris Ostrovsky 
---
Changes in v5:
* Adjust to new interface (as described in changelog for patch 4/8)

 tools/libxc/include/xenctrl.h |4 ++-
 tools/libxc/xc_misc.c |   41 -
 tools/libxl/libxl.c   |   52 -
 tools/python/xen/lowlevel/xc/xc.c |   38 ++-
 4 files changed, 68 insertions(+), 67 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 14d22ce..54540e7 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1228,6 +1228,7 @@ int xc_send_debug_keys(xc_interface *xch, char *keys);
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
 typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
+typedef xen_sysctl_meminfo_t xc_meminfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
 typedef uint32_t xc_cpu_to_socket_t;
@@ -1239,7 +1240,8 @@ typedef uint32_t xc_node_to_node_dist_t;
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
 int xc_cputopoinfo(xc_interface *xch, unsigned *max_cpus,
xc_cputopo_t *cputopo);
-int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
+int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
+xc_meminfo_t *meminfo, uint32_t *distance);
 
 int xc_sched_id(xc_interface *xch,
 int *sched_id);
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 411128e..607ae61 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -209,22 +209,49 @@ out:
 return ret;
 }
 
-int xc_numainfo(xc_interface *xch,
-xc_numainfo_t *put_info)
+int xc_numainfo(xc_interface *xch, unsigned *max_nodes,
+xc_meminfo_t *meminfo, uint32_t *distance)
 {
 int ret;
 DECLARE_SYSCTL;
+DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
+ XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+DECLARE_HYPERCALL_BOUNCE(distance,
+ *max_nodes * *max_nodes * sizeof(*distance),
+ XC_HYPERCALL_BUFFER_BOUNCE_OUT);
 
-sysctl.cmd = XEN_SYSCTL_numainfo;
+if (meminfo && distance) {
+if ((ret = xc_hypercall_bounce_pre(xch, meminfo)))
+goto out;
+if ((ret = xc_hypercall_bounce_pre(xch, distance)))
+goto out;
 
-memcpy(&sysctl.u.numainfo, put_info, sizeof(*put_info));
+sysctl.u.numainfo.num_nodes = *max_nodes;
+set_xen_guest_handle(sysctl.u.numainfo.meminfo, meminfo);
+set_xen_guest_handle(sysctl.u.numainfo.distance, distance);
+} else if (meminfo || distance) {
+errno = EINVAL;
+return -1;
+} else {
+set_xen_guest_handle(sysctl.u.numainfo.meminfo,
+ HYPERCALL_BUFFER_NULL);
+set_xen_guest_handle(sysctl.u.numainfo.distance,
+ HYPERCALL_BUFFER_NULL);
+}
 
+sysctl.cmd = XEN_SYSCTL_numainfo;
 if ((ret = do_sysctl(xch, &sysctl)) != 0)
-return ret;
+goto out;
 
-memcpy(put_info, &sysctl.u.numainfo, sizeof(*put_info));
+*max_nodes = sysctl.u.numainfo.num_nodes;
 
-return 0;
+out:
+if (meminfo && distance) {
+xc_hypercall_bounce_post(xch, meminfo);
+xc_hypercall_bounce_post(xch, distance);
+}
+
+return ret;
 }
 
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 2b7d19c..ee97a54 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5094,62 +5094,44 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx 
*ctx, int *nb_cpu_out)
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 {
 GC_INIT(ctx);
-xc_numainfo_t ninfo;
-DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
-DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
+xc_meminfo_t *meminfo;
+uint32_t *distance;
 libxl_numainfo *ret = NULL;
 int i, j;
+unsigned num_nodes;
 
-set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
-set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
-if ( xc_numainfo(ctx->xch, &ninfo) != 0)
-{
-LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
-ret = NULL;
+if (xc_numainfo(ctx->xch, &num_nodes, NULL, NULL)) {
+LOGEV(ERROR, errno, "Unable to determine number of nodes");
 goto out;
 }
 
-meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo,
-sizeof(*meminfo) * ninfo.num_nodes);
-distance = xc_hypercall_buffer_alloc(ctx->xch, distance,
- sizeof(*distance) *
- ninfo.num_nodes * ninfo.num_nodes);
-if ((meminfo == NULL) || (dist

[Xen-devel] [PATCH v5 4/8] sysctl: Make XEN_SYSCTL_numainfo a little more efficient

2015-03-19 Thread Boris Ostrovsky
A number of changes to XEN_SYSCTL_numainfo interface:

* Make sysctl NUMA topology query use fewer copies by combining some
  fields into a single structure and copying distances for each node
  in a single copy.
* NULL meminfo and distance handles are a request for maximum number
  of nodes (num_nodes). If those handles are valid and num_nodes is
  smaller than the number of nodes in the system then -ENOBUFS is
  returned (and the correct num_nodes is provided)
* Instead of using max_node_index for passing number of nodes keep this
  value in num_nodes: almost all uses of max_node_index required adding
  or subtracting one to eventually get to number of nodes anyway.
* Replace INVALID_NUMAINFO_ID with XEN_INVALID_MEM_SZ and add
  XEN_INVALID_NODE_DIST.

Signed-off-by: Boris Ostrovsky 
---
Changes in v5:
* Similar to 3/8 patch:
  * Make XEN_SYSCTL_numainfo treat passed-in NULL handles as requests
    for the array size (passing in too small a size results in -ENOBUFS)
  * Make distance in xen_sysctl_numainfo a uint32
  * On the toolstack side use NULL handles to determine array size in
libxl_get_numainfo() and use dynamic arrays in python's xc.c
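
The distance handle described above carries a num_nodes x num_nodes matrix in one flat uint32 array, indexed row-major as in the libxl code in this patch (i * num_nodes + j). A minimal sketch of reading it follows; the helper names and the `DEMO_INVALID_NODE_DIST` value are assumptions made for illustration, standing in for XEN_INVALID_NODE_DIST.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel for "no distance known"; the value is illustrative. */
#define DEMO_INVALID_NODE_DIST (~0u)

/* distance[] holds num_nodes rows of num_nodes entries, row-major. */
static uint32_t demo_node_distance(const uint32_t *distance,
                                   unsigned num_nodes,
                                   unsigned from, unsigned to)
{
    assert(from < num_nodes && to < num_nodes);
    return distance[(size_t)from * num_nodes + to];
}

/* Count the nodes for which row 'from' carries a real distance. */
static unsigned demo_reachable_nodes(const uint32_t *distance,
                                     unsigned num_nodes, unsigned from)
{
    unsigned to, count = 0;

    for (to = 0; to < num_nodes; to++)
        if (demo_node_distance(distance, num_nodes, from, to) !=
            DEMO_INVALID_NODE_DIST)
            count++;
    return count;
}
```

Keeping the count in num_nodes rather than a maximum index means the row stride is the value returned by the hypercall, with no +1/-1 adjustment.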


 tools/libxl/libxl.c   |   65 +++-
 tools/python/xen/lowlevel/xc/xc.c |   58 +---
 xen/common/sysctl.c   |   75 
 xen/include/public/sysctl.h   |   53 +++---
 4 files changed, 129 insertions(+), 122 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index c989abf..0234e36 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5108,65 +5108,60 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int 
*nr)
 {
 GC_INIT(ctx);
 xc_numainfo_t ninfo;
-DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize);
-DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree);
-DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists);
+DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
+DECLARE_HYPERCALL_BUFFER(uint32_t, distance);
 libxl_numainfo *ret = NULL;
-int i, j, max_nodes;
+int i, j;
 
-max_nodes = libxl_get_max_nodes(ctx);
-if (max_nodes < 0)
+set_xen_guest_handle(ninfo.meminfo, HYPERCALL_BUFFER_NULL);
+set_xen_guest_handle(ninfo.distance, HYPERCALL_BUFFER_NULL);
+if ( xc_numainfo(ctx->xch, &ninfo) != 0)
 {
 LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
 ret = NULL;
 goto out;
 }
 
-memsize = xc_hypercall_buffer_alloc
-(ctx->xch, memsize, sizeof(*memsize) * max_nodes);
-memfree = xc_hypercall_buffer_alloc
-(ctx->xch, memfree, sizeof(*memfree) * max_nodes);
-node_dists = xc_hypercall_buffer_alloc
-(ctx->xch, node_dists, sizeof(*node_dists) * max_nodes * max_nodes);
-if ((memsize == NULL) || (memfree == NULL) || (node_dists == NULL)) {
+meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo,
+sizeof(*meminfo) * ninfo.num_nodes);
+distance = xc_hypercall_buffer_alloc(ctx->xch, distance,
+ sizeof(*distance) *
+ ninfo.num_nodes * ninfo.num_nodes);
+if ((meminfo == NULL) || (distance == NULL)) {
 LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
 "Unable to allocate hypercall arguments");
 goto fail;
 }
 
-set_xen_guest_handle(ninfo.node_to_memsize, memsize);
-set_xen_guest_handle(ninfo.node_to_memfree, memfree);
-set_xen_guest_handle(ninfo.node_to_node_distance, node_dists);
-ninfo.max_node_index = max_nodes - 1;
+set_xen_guest_handle(ninfo.meminfo, meminfo);
+set_xen_guest_handle(ninfo.distance, distance);
 if (xc_numainfo(ctx->xch, &ninfo) != 0) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo");
 goto fail;
 }
 
-if (ninfo.max_node_index < max_nodes - 1)
-max_nodes = ninfo.max_node_index + 1;
+*nr = ninfo.num_nodes;
 
-*nr = max_nodes;
+ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * ninfo.num_nodes);
+for (i = 0; i < ninfo.num_nodes; i++)
+ret[i].dists = libxl__calloc(NOGC, ninfo.num_nodes, sizeof(*distance));
 
-ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * max_nodes);
-for (i = 0; i < max_nodes; i++)
-ret[i].dists = libxl__calloc(NOGC, max_nodes, sizeof(*node_dists));
-
-for (i = 0; i < max_nodes; i++) {
-#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? \
-LIBXL_NUMAINFO_INVALID_ENTRY : mem[i]
-ret[i].size = V(memsize, i);
-ret[i].free = V(memfree, i);
-ret[i].num_dists = max_nodes;
-for (j = 0; j < ret[i].num_dists; j++)
-ret[i].dists[j] = V(node_dists, i * max_nodes + j);
+for (i = 0; i < ninfo.num_nodes; i++) {
+#define V(val, invalid) (val == invalid) ? \
+   LIBXL_NUMAINFO_INVALID_ENTRY : val
+ret[i].

Re: [Xen-devel] [PATCH 9/9] qspinlock, x86, kvm: Implement KVM support for paravirt qspinlock

2015-03-19 Thread Waiman Long

On 03/19/2015 06:01 AM, Peter Zijlstra wrote:
> On Wed, Mar 18, 2015 at 10:45:55PM -0400, Waiman Long wrote:
> > On 03/16/2015 09:16 AM, Peter Zijlstra wrote:
> > I do have some concern about this call site patching mechanism as the
> > modification is not atomic. The spin_unlock() calls are in many places in
> > the kernel. There is a possibility that a thread is calling a certain
> > spin_unlock call site while it is being patched by another one with the
> > alternative() function call.
> >
> > So far, I don't see any problem with bare metal where paravirt_patch_insns()
> > is used to patch it to the move instruction. However, in a virtual guest
> > environment where paravirt_patch_call() was used, there were situations
> > where the system panicked because of a page fault on some invalid memory in
> > the kthread. If you look at paravirt_patch_call(), you will see:
> >
> >  :
> > b->opcode = 0xe8; /* call */
> > b->delta = delta;
> >
> > If another CPU reads the instruction at the call site at the right moment,
> > it will get the modified call instruction, but not the new delta value. It
> > will then jump to a random location. I believe that was causing the system
> > panic that I saw.
> >
> > So I think it is kind of risky to use it here unless we can guarantee that
> > call site patching is atomic wrt other CPUs.
>
> Just look at where the patching is done:
>
> init/main.c:start_kernel()
>    check_bugs()
>  alternative_instructions()
>    apply_paravirt()
>
> We're UP and not holding any locks, disable IRQs (see text_poke_early())
> and have NMIs 'disabled'.


You are probably right. The initial apply_paravirt() was done before the 
SMP boot. Subsequent ones were at kernel module load time. I put a 
counter in the __native_queue_spin_unlock() and it registered 26949 
unlock calls in a 16-cpu guest before it got patched out.


The panic that I observed before might be due to some coding error of my 
own.


-Longman

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 08/13] libxc: Check xc_domain_maximum_gpfn for negative return values

2015-03-19 Thread Konrad Rzeszutek Wilk
On Thu, Mar 19, 2015 at 04:47:58PM +, Ian Campbell wrote:
> On Wed, 2015-03-18 at 20:24 -0400, Konrad Rzeszutek Wilk wrote:
> > Instead of assuming everything is always OK. We stash
> > the gpfns value as a parameter.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> > ---
> >  tools/libxc/xc_core_arm.c| 17 ++---
> >  tools/libxc/xc_core_x86.c| 24 
> >  tools/libxc/xc_domain_save.c |  8 +++-
> >  3 files changed, 41 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tools/libxc/xc_core_arm.c b/tools/libxc/xc_core_arm.c
> > index 16508e7..26cec04 100644
> > --- a/tools/libxc/xc_core_arm.c
> > +++ b/tools/libxc/xc_core_arm.c
> > @@ -31,9 +31,16 @@ xc_core_arch_gpfn_may_present(struct 
> > xc_core_arch_context *arch_ctxt,
> >  }
> >  
> > 
> > -static int nr_gpfns(xc_interface *xch, domid_t domid)
> > +static int nr_gpfns(xc_interface *xch, domid_t domid, unsigned long *gpfns)
> 
> You didn't fancy merging the two versions of this then ;-)
> > diff --git a/tools/libxc/xc_core_x86.c b/tools/libxc/xc_core_x86.c
> > index d8846f1..02377e8 100644
> > --- a/tools/libxc/xc_core_x86.c
> > +++ b/tools/libxc/xc_core_x86.c
> 
> > @@ -88,7 +99,12 @@ xc_core_arch_map_p2m_rw(xc_interface *xch, struct 
> > domain_info_context *dinfo, xc
> >  int err;
> >  int i;
> >  
> > -dinfo->p2m_size = nr_gpfns(xch, info->domid);
> > +err = nr_gpfns(xch, info->domid, &dinfo->p2m_size);
> 
> Please could you avoid reusing err here? The reason is that its sole
> use now is to save errno over the cleanup path, whereas here it looks
> like it is going to be used for something but it isn't.
> 
>  if ( nr_gpfns(...)  < 0 )
> 
> is ok per the Xen coding style if you don't actually need the return
> code.
> 
> Or
> 
> ret = nr_gpfns()
> if ( ret < 0 )
> error, goto out
> 
> ret = -1;
> .. the rest
> 
> would be ok too I guess. (coding style here allows
> if ( (ret = nr_gpfns(...)) < 0 )
> too FWIW).
> 
> > +if ( err < 0 )
> > +{
> > +ERROR("nr_gpfns returns errno: %d.", errno);
> > +goto out;
> > +}
> >  if ( dinfo->p2m_size < info->nr_pages  )
> >  {
> >  ERROR("p2m_size < nr_pages -1 (%lx < %lx", dinfo->p2m_size, 
> > info->nr_pages - 1);
> > diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> > index 254fdb3..6346c12 100644
> > --- a/tools/libxc/xc_domain_save.c
> > +++ b/tools/libxc/xc_domain_save.c
> > @@ -939,7 +939,13 @@ int xc_domain_save(xc_interface *xch, int io_fd, 
> > uint32_t dom, uint32_t max_iter
> >  }
> >  
> >  /* Get the size of the P2M table */
> > -dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;
> > +rc = xc_domain_maximum_gpfn(xch, dom);
> > +if ( rc < 0 )
> > +{
> > +ERROR("Could not get maximum GPFN!");
> > +goto out;
> > +}
> > +dinfo->p2m_size = rc + 1;
> 
> Shame this can't use the same helper as the others.

How about this (compile tested but not yet runtime tested):


From 92085d29b7e2906095a2bc6849b5a17b478e5c79 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk 
Date: Fri, 13 Mar 2015 14:57:44 -0400
Subject: [PATCH v3] libxc: Check xc_domain_maximum_gpfn for negative return
 values

Instead of assuming everything is always OK, check the return value
and pass the gpfns value back through a parameter. Since we use it in
three places we might as well stick it in a common file for all three
of them to use.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 tools/libxc/xc_core_arm.c| 11 ---
 tools/libxc/xc_core_x86.c| 18 ++
 tools/libxc/xc_domain_save.c |  6 +-
 tools/libxc/xc_private.c | 12 
 tools/libxc/xc_private.h |  2 ++
 5 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/xc_core_arm.c b/tools/libxc/xc_core_arm.c
index 16508e7..846bc6c 100644
--- a/tools/libxc/xc_core_arm.c
+++ b/tools/libxc/xc_core_arm.c
@@ -30,12 +30,6 @@ xc_core_arch_gpfn_may_present(struct xc_core_arch_context 
*arch_ctxt,
 return 0;
 }
 
-
-static int nr_gpfns(xc_interface *xch, domid_t domid)
-{
-return xc_domain_maximum_gpfn(xch, domid) + 1;
-}
-
 int
 xc_core_arch_auto_translated_physmap(const xc_dominfo_t *info)
 {
@@ -48,9 +42,12 @@ xc_core_arch_memory_map_get(xc_interface *xch, struct 
xc_core_arch_context *unus
 xc_core_memory_map_t **mapp,
 unsigned int *nr_entries)
 {
-unsigned long p2m_size = nr_gpfns(xch, info->domid);
+unsigned long p2m_size = 0;
 xc_core_memory_map_t *map;
 
+if ( xc_nr_gpfns(xch, info->domid, &p2m_size) < 0 )
+return -1;
+
 map = malloc(sizeof(*map));
 if ( map == NULL )
 {
diff --git a/tools/libxc/xc_core_x86.c b/tools/libxc/xc_core_x86.c
index d8846f1..2f5ffea 100644
--- a/tools/libxc/xc_core_x86.c
+++ b/tools/libxc/xc_core_x86.c
@@ -35,12 +35,6 @@ xc_core_arch_gpfn_may_present(struct xc_core_arch_context 
*arch_ctxt,
 return 1;
 }
 
-
-sta

Re: [Xen-devel] [PATCH v4 28/33] tools/libxl: Check if fdt_{first, next}_subnode are present in libfdt

2015-03-19 Thread Julien Grall
Hi,

On 19/03/15 19:29, Julien Grall wrote:
> The functions fdt_{first,next}_subnode may not be available because:
> * They were introduced in 2013 => not available on Wheezy
> * The prototype exists but the functions are not exposed. Don't ask
> why...
> 
> The latter has been fixed recently in the dtc repo [1]
> 
> When the functions are not available, implement our own in order to use
> them in a following patch.
> 
> [1] git://git.kernel.org/pub/scm/utils/dtc/dtc.git
> commit a4b093f7366fdb429ca1781144d3985fa50d0fbb
> 
> Signed-off-by: Julien Grall 
> Cc: Ian Jackson 
> Cc: Wei Liu 
> 
> ---

I forgot to add that this patch modifies tools/configure.ac and therefore
requires regenerating tools/configure.

Regards,

-- 
Julien Grall



[Xen-devel] [PATCH v4 29/33] tools/(lib)xl: Add partial device tree support for ARM

2015-03-19 Thread Julien Grall
Let the user pass additional nodes to the guest device tree. For
this purpose, everything in the node /passthrough from the partial
device tree will be copied into the guest device tree.

The node /aliases will be also copied to allow the user to define
aliases which can be used by the guest kernel.

A simple partial device tree will look like:

/dts-v1/;

/ {
    #address-cells = <2>;
    #size-cells = <2>;

    passthrough {
        compatible = "simple-bus";
        ranges;
        #address-cells = <2>;
        #size-cells = <2>;

        /* List of your nodes */
    };
};

Note that:
* The interrupt-parent property will be added by the toolstack in
the root node
* The properties compatible, ranges, #address-cells and #size-cells
in /passthrough are mandatory.

The helpers provided by libfdt don't perform all the necessary
security checks on a given device tree. Therefore, only trusted device
trees should be used.

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Wei Liu 

---
An example of the partial device tree, as well as how to pass through
a non-PCI device, will be added to the tree in a follow-up patch.

A new LIBXL_HAVE_* will be added in the patch which adds support for
non-PCI passthrough, as the two are tied together.

Changes in v4:
- Mark the option as unsafe
- The _fdt_* helpers have been moved to a separate patch/file.
Only the prototype is declared
- The partial DT is considered valid. Remove some security checks,
which makes the code cleaner
- Typos

Changes in v3:
- Patch added
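
The header checks this patch keeps (minimum size, magic, version, totalsize) can be illustrated without libfdt by reading the big-endian header fields directly. This is a sketch under stated assumptions: the `demo_*` names and return codes are made up, and the offsets follow the standard v17 FDT header layout of ten 32-bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FDT_MAGIC    0xd00dfeedu
#define DEMO_FDT_V17_SIZE 40u /* ten 32-bit header fields */
#define DEMO_FDT_VERSION  17u /* 0x11 */

/* All FDT header fields are stored big-endian. */
static uint32_t demo_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 if the header looks plausible, a negative code otherwise. */
static int demo_check_fdt_header(const uint8_t *blob, size_t size)
{
    assert(blob != NULL);
    if (size < DEMO_FDT_V17_SIZE)
        return -1; /* too small to hold a header */
    if (demo_be32(blob + 0) != DEMO_FDT_MAGIC)
        return -2; /* not a flattened device tree */
    if (demo_be32(blob + 20) != DEMO_FDT_VERSION)
        return -3; /* unsupported version */
    if (demo_be32(blob + 4) > size)
        return -4; /* totalsize runs past the buffer */
    return 0;
}
```

Passing these checks says nothing about the structure block or the strings block, which is consistent with the advice to use only trusted device trees.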
---
 docs/man/xl.cfg.pod.5   |  10 +++
 tools/libxl/libxl_arm.c | 171 
 tools/libxl/libxl_types.idl |   1 +
 tools/libxl/xl_cmdimpl.c|   1 +
 4 files changed, 183 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 93cd7d2..bcbc277 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -453,6 +453,16 @@ not emulated.
 Specify that this domain is a driver domain. This enables certain
 features needed in order to run a driver domain.
 
+=item B
+
+Specify a partial device tree (compiled via the Device Tree Compiler).
+Everything under the node "/passthrough" will be copied into the guest
+device tree. For convenience, the node "/aliases" is also copied to allow
+the user to define aliases which can be used by the guest kernel.
+
+Given the complexity of verifying the validity of a device tree, this
+option should only be used with trusted device tree.
+
 =back
 
 =head2 Devices
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 06e940b..54d197b 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -542,6 +542,156 @@ out:
 }
 }
 
+/* Only FDT v17 is supported */
+#define FDT_REQUIRED_VERSION0x11
+
+static int check_partial_fdt(libxl__gc *gc, void *fdt, size_t size)
+{
+int r;
+
+if (size < FDT_V17_SIZE) {
+LOG(ERROR, "Partial FDT is too small");
+return ERROR_FAIL;
+}
+
+if (fdt_magic(fdt) != FDT_MAGIC) {
+LOG(ERROR, "Partial FDT is not a valid Flat Device Tree");
+return ERROR_FAIL;
+}
+
+if (fdt_version(fdt) != FDT_REQUIRED_VERSION) {
+LOG(ERROR, "Partial FDT version not supported. Required 0x%x got 0x%x",
+FDT_REQUIRED_VERSION, fdt_version(fdt));
+return ERROR_FAIL;
+}
+
+r = fdt_check_header(fdt);
+if (r) {
+LOG(ERROR, "Failed to check the partial FDT (%d)", r);
+return ERROR_FAIL;
+}
+
+if (fdt_totalsize(fdt) > size) {
+LOG(ERROR, "Partial FDT totalsize is too big");
+return ERROR_FAIL;
+}
+
+return 0;
+}
+
+static int copy_properties(libxl__gc *gc, void *fdt, void *pfdt,
+   int nodeoff)
+{
+int propoff, nameoff, r;
+const struct fdt_property *prop;
+
+for (propoff = fdt_first_property_offset(pfdt, nodeoff);
+ propoff >= 0;
+ propoff = fdt_next_property_offset(pfdt, propoff)) {
+
+if (!(prop = fdt_get_property_by_offset(pfdt, propoff, NULL))) {
+return -FDT_ERR_INTERNAL;
+}
+
+nameoff = fdt32_to_cpu(prop->nameoff);
+r = fdt_property(fdt, fdt_string(pfdt, nameoff),
+ prop->data, fdt32_to_cpu(prop->len));
+if (r) return r;
+}
+
+/* FDT_ERR_NOTFOUND => There is no more properties for this node */
+return (propoff != -FDT_ERR_NOTFOUND)? propoff : 0;
+}
+
+/*
+ * These functions are defined by libfdt or libxl_fdt.c if it's not
+ * present on the former.
+ */
+int fdt_next_subnode(const void *fdt, int offset);
+int fdt_first_subnode(const void *fdt, int offset);
+
+/* Copy a node from the partial device tree to the guest device tree */
+static int copy_node(libxl__gc *gc, void *fdt, void *pfdt,
+ int nodeoff, int depth)
+{
+int r;
+
+r = fdt_begin_node(fdt,

[Xen-devel] [PATCH v4 31/33] libxl: Add support for non-PCI passthrough

2015-03-19 Thread Julien Grall
On ARM, every non-PCI device is described in the device tree. Each of
them can be found via a path.

This patch introduces very basic support: only the IOMMU will be set
up correctly. The user will have to:
- Describe the device in the partial device tree
- Map MMIO/IRQ manually

This is a first approach that provides basic non-PCI passthrough
support in Xen. It could be improved later.

Furthermore add LIBXL_HAVE_DEVICETREE_PASSTHROUGH to indicate we
support non-PCI passthrough and partial device tree (introduced by a
previous patch).

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Wei Liu 

---
Changes in v4:
- Add LIBXL_HAVE_DEVICETREE_PASSTHROUGH to indicate we support
non-PCI passthrough. This is also used to indicate that
partial device tree is supported
- Remove libxl_dtdev.c as it contained only a two-line function,
and call xc_* directly from libxl_create.c
- Introduce domcreate_attach_dtdev

Changes in v3:
- Dynamic allocation has been dropped
- Rework the commit message in accordance with the previous
item

Changes in v2:
- Get DT infos earlier
- Allocate/map IRQ in libxl__arch_domain_create rather than in
libxl__device_dt_add
- Use VIRQ rather than the PIRQ to construct the interrupts
properties of the device tree
- Correct cpumask in make_dtdev_node. We allow the interrupt to
be used on the 8 CPUs
- Fix LOGE when we map the MMIO region in the guest in
libxl__device_dt_add. The domain and the IRQ were inverted
- Calculate the number of SPIs to configure the VGIC
- xc_physdev_dtdev_* helpers have been renamed to xc_dtdev_*
- Rename libxl_device_dt to libxl_device_dtdev
---
 tools/libxl/libxl.h  |  6 ++
 tools/libxl/libxl_create.c   | 32 
 tools/libxl/libxl_internal.h |  5 +
 tools/libxl/libxl_types.idl  |  5 +
 4 files changed, 48 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 6bc75c5..baaf06b 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -179,6 +179,12 @@
 #define LIBXL_HAVE_BUILDINFO_HVM_MMIO_HOLE_MEMKB 1
 
 /*
+ * libxl_domain_build_info has device_tree and libxl_device_dtdev
+ * exists. This means non-PCI passthrough is supported for ARM
+ */
+#define LIBXL_HAVE_DEVICETREE_PASSTHROUGH 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 15b464e..39c828b 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -751,6 +751,8 @@ static void domcreate_attach_vtpms(libxl__egc *egc, 
libxl__multidev *multidev,
int ret);
 static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *aodevs,
  int ret);
+static void domcreate_attach_dtdev(libxl__egc *egc,
+   libxl__domain_create_state *dcs);
 
 static void domcreate_console_available(libxl__egc *egc,
 libxl__domain_create_state *dcs);
@@ -1444,6 +1446,36 @@ static void domcreate_attach_pci(libxl__egc *egc, 
libxl__multidev *multidev,
 }
 }
 
+domcreate_attach_dtdev(egc, dcs);
+return;
+
+error_out:
+assert(ret);
+domcreate_complete(egc, dcs, ret);
+}
+
+static void domcreate_attach_dtdev(libxl__egc *egc,
+   libxl__domain_create_state *dcs)
+{
+STATE_AO_GC(dcs->ao);
+int i;
+int ret;
+int domid = dcs->guest_domid;
+
+/* convenience aliases */
+libxl_domain_config *const d_config = dcs->guest_config;
+
+for (i = 0; i < d_config->num_dtdevs; i++) {
+const libxl_device_dtdev *dtdev = &d_config->dtdevs[i];
+
+LOG(DEBUG, "Assign device \"%s\" to dom%u", dtdev->path, domid);
+ret = xc_assign_dt_device(CTX->xch, domid, dtdev->path);
+if (ret < 0) {
+LOG(ERROR, "xc_assign_dtdevice failed: %d\n", ret);
+goto error_out;
+}
+}
+
 domcreate_console_available(egc, dcs);
 
 domcreate_complete(egc, dcs, 0);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 46fa624..1191098 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1198,6 +1198,11 @@ _hidden int libxl__create_pci_backend(libxl__gc *gc, 
uint32_t domid,
   libxl_device_pci *pcidev, int num);
 _hidden int libxl__device_pci_destroy_all(libxl__gc *gc, uint32_t domid);
 
+/* from libxl_dtdev */
+
+_hidden int libxl__device_dt_add(libxl__gc *gc, uint32_t domid,
+ const libxl_device_dtdev *dtdev);
+
 /*- xswait: wait for a xenstore node to be suitable -*/
 
 typedef struct libxl__xswait_state libxl__xswait_state;
diff --git a/tool

[Xen-devel] [PATCH v4 27/33] tools/libxl: Create a per-arch function to map IRQ to a domain

2015-03-19 Thread Julien Grall
ARM and x86 use a different hypercall to map an IRQ to a domain.

The hypercall to give IRQ permission to the domain has also been moved
to the x86-specific function, as an ARM guest won't be able to manage the IRQ.
We may want to support it later.

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Wei Liu 

---
Changes in v4:
- Patch added
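
The split described above, where common code calls one function and each architecture supplies its own body, can be sketched with mocks as follows. Everything here is illustrative: `DEMO_ARCH_ARM` is a hypothetical build macro (the patch instead selects libxl_arm.c or libxl_x86.c at build time), and the `demo_*` functions only count calls rather than issue hypercalls.

```c
#include <assert.h>

static int demo_hypercalls; /* number of mock hypercalls issued */

#ifdef DEMO_ARCH_ARM
/* ARM flavour: a single call binds the IRQ to the guest. */
static int demo_bind_spi(int domid, int irq)
{
    (void)domid;
    demo_hypercalls++;
    return irq >= 0 ? 0 : -1;
}

static int demo_arch_map_irq(int domid, int irq)
{
    return demo_bind_spi(domid, irq);
}
#else
/* x86 flavour: map the IRQ, then grant the domain permission. */
static int demo_map_pirq(int domid, int irq)
{
    (void)domid;
    demo_hypercalls++;
    return irq >= 0 ? 0 : -1;
}

static int demo_irq_permission(int domid, int irq)
{
    (void)domid;
    (void)irq;
    demo_hypercalls++;
    return 0;
}

static int demo_arch_map_irq(int domid, int irq)
{
    int ret = demo_map_pirq(domid, irq);

    if (ret)
        return ret; /* skip the permission step when mapping failed */
    return demo_irq_permission(domid, irq);
}
#endif
```

The common caller only sees `demo_arch_map_irq()` and a single return code, which is the same simplification the libxl_create.c hunk makes.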
---
 tools/libxl/libxl_arch.h   |  4 
 tools/libxl/libxl_arm.c|  7 +++
 tools/libxl/libxl_create.c |  6 ++
 tools/libxl/libxl_x86.c| 13 +
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index cae64c0..77b1f2a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -39,4 +39,8 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
   uint32_t domid,
   libxl_domain_build_info *b_info,
   libxl__domain_build_state *state);
+
+/* arch specific irq map function */
+int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 2407c2e..06e940b 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -742,6 +742,13 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 return libxl__vnuma_build_vmemrange_pv_generic(gc, domid, info, state);
 }
 
+int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
+{
+return xc_domain_bind_pt_irq(CTX->xch, domid, irq, PT_IRQ_TYPE_SPI,
+ 0 /* Not used */, 0 /* Not used */,
+ 0 /* Not used */, irq);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e5a343f..15b464e 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1205,11 +1205,9 @@ static void domcreate_launch_dm(libxl__egc *egc, 
libxl__multidev *multidev,
 
 LOG(DEBUG, "dom%d irq %d", domid, irq);
 
-ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, &irq)
+ret = irq >= 0 ? libxl__arch_domain_map_irq(gc, domid, irq)
: -EOVERFLOW;
-if (!ret)
-ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
-if (ret < 0) {
+if (ret) {
 LOGE(ERROR, "failed give dom%d access to irq %d", domid, irq);
 ret = ERROR_FAIL;
 goto error_out;
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 3149896..9f6ec18 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -427,6 +427,19 @@ out:
 return rc;
 }
 
+int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
+{
+int ret;
+
+ret = xc_physdev_map_pirq(CTX->xch, domid, irq, &irq);
+if (ret)
+return ret;
+
+ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
+
+return ret;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.1.4




[Xen-devel] [PATCH v4 28/33] tools/libxl: Check if fdt_{first, next}_subnode are present in libfdt

2015-03-19 Thread Julien Grall
The functions fdt_{first,next}_subnode may not be available because:
* They were introduced in 2013 => not available on Wheezy
* The prototype exists but the functions are not exposed. Don't ask
why...

The latter has been fixed recently in the dtc repo [1]

When the functions are not available, implement our own in order to use
them in a following patch.

[1] git://git.kernel.org/pub/scm/utils/dtc/dtc.git
commit a4b093f7366fdb429ca1781144d3985fa50d0fbb

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Wei Liu 

---

Changes in v4:
- Patch added
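
The "implement our own when configure did not find them" approach can be sketched as fallbacks guarded by the HAVE_* macros that AC_CHECK_FUNCS generates. To stay self-contained, the sketch below does not use libfdt: it walks a mock tree stored as a pre-order list of node depths, and every `demo_*` / `HAVE_DEMO_*` name is a hypothetical stand-in for the real functions and macros.

```c
#include <assert.h>

/* Mock tree: the depth of each node, listed in pre-order. */
struct demo_tree {
    const int *depth;
    int nr_nodes;
};

#ifndef HAVE_DEMO_FIRST_SUBNODE
/* Fallback: first child of 'parent', or -1 if it has none. */
static int demo_first_subnode(const struct demo_tree *t, int parent)
{
    int child = parent + 1;

    assert(parent >= 0 && parent < t->nr_nodes);
    if (child < t->nr_nodes && t->depth[child] == t->depth[parent] + 1)
        return child;
    return -1;
}
#endif

#ifndef HAVE_DEMO_NEXT_SUBNODE
/* Fallback: next sibling of 'node', or -1 if it is the last one. */
static int demo_next_subnode(const struct demo_tree *t, int node)
{
    int i;

    assert(node >= 0 && node < t->nr_nodes);
    for (i = node + 1; i < t->nr_nodes; i++) {
        if (t->depth[i] == t->depth[node])
            return i;  /* same depth: a sibling */
        if (t->depth[i] < t->depth[node])
            break;     /* left the parent's subtree */
    }
    return -1;
}
#endif

/* Callers iterate the same way whichever implementation is used. */
static int demo_count_subnodes(const struct demo_tree *t, int parent)
{
    int n = 0, node;

    for (node = demo_first_subnode(t, parent); node >= 0;
         node = demo_next_subnode(t, node))
        n++;
    return n;
}
```

When the library does export the functions, the guards compile the fallbacks out, so callers need no conditional code of their own.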
---
 tools/config.h.in   |  6 
 tools/configure.ac  |  5 +++
 tools/libxl/Makefile|  2 +-
 tools/libxl/libxl_fdt.c | 84 +
 4 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxl/libxl_fdt.c

diff --git a/tools/config.h.in b/tools/config.h.in
index 2a0ae48..06f8b6c 100644
--- a/tools/config.h.in
+++ b/tools/config.h.in
@@ -3,6 +3,12 @@
 /* Blktap2 enabled */
 #undef HAVE_BLKTAP2
 
+/* Define to 1 if you have the `fdt_first_subnode' function. */
+#undef HAVE_FDT_FIRST_SUBNODE
+
+/* Define to 1 if you have the `fdt_next_subnode' function. */
+#undef HAVE_FDT_NEXT_SUBNODE
+
 /* Define to 1 if you have the  header file. */
 #undef HAVE_INTTYPES_H
 
diff --git a/tools/configure.ac b/tools/configure.ac
index d31c2f3..cc13336 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -355,6 +355,11 @@ AC_SUBST(libiconv)
 case "$host_cpu" in
 arm*|aarch64)
 AC_CHECK_LIB([fdt], [fdt_create], [], [AC_MSG_ERROR([Could not find libfdt])])
+
+# The functions fdt_{first,next}_subnode may not be available because:
+#   * It has been introduced in 2013 => Doesn't work on Wheezy
+#   * The prototype exists but the functions are not exposed. Don't ask why...
+AC_CHECK_FUNCS([fdt_first_subnode fdt_next_subnode])
 esac
 
 # Checks for header files.
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 1b16598..d74aee1 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -59,7 +59,7 @@ endif
 LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
-LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
+LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_fdt.o
 
 ifeq ($(CONFIG_NetBSD),y)
 LIBXL_OBJS-y += libxl_netbsd.o
diff --git a/tools/libxl/libxl_fdt.c b/tools/libxl/libxl_fdt.c
new file mode 100644
index 000..f88e9f1
--- /dev/null
+++ b/tools/libxl/libxl_fdt.c
@@ -0,0 +1,84 @@
+/*
+ * libfdt - Flat Device Tree manipulation
+ * Copyright (C) 2006 David Gibson, IBM Corporation.
+ *
+ * libfdt is dual licensed: you can use it either under the terms of
+ * the GPL, or the BSD license, at your option.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this library; if not, write to the Free
+ * Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ * MA 02110-1301 USA
+ *
+ * Alternatively,
+ *
+ *  b) Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ * 2. Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+ * EVEN IF 

[Xen-devel] [PATCH v4 23/33] xen/passthrough: iommu_deassign_device_dt: By default reassign device to nobody

2015-03-19 Thread Julien Grall
Currently, when the device is deassigned from a domain, we directly reassign
to DOM0.

As the device may not have been correctly reset, this may lead to corruption or
expose some part of DOM0 memory. Also, we may have no way to reset some
platform devices.

If Xen reassigns the device to "nobody", it may receive some global/context
faults because the transaction has failed (indeed the context has been
marked invalid). Unfortunately there is no simple way to quiesce buggy
hardware. I think we could live with that for a first version of platform
device passthrough.

DOM0 will have to issue a hypercall to assign the device to itself if it
wants to use it.
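
The ownership rule described above can be sketched in plain C. This is only
an illustration under stated assumptions: struct domain, struct device and
reassign_dev() are hypothetical stand-ins, not the SMMU driver's types, and a
NULL target models "nobody".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the hypervisor structures. */
struct domain { int id; };
struct device { struct domain *owner; };   /* NULL means "nobody" */

/*
 * Move dev from domain s to domain t.
 * t == NULL hands the device to nobody; any other target must be the
 * hardware domain (hwdom), mirroring the "t && t != hardware_domain" check.
 */
int reassign_dev(struct device *dev, struct domain *s, struct domain *t,
                 const struct domain *hwdom)
{
    if (t && t != hwdom)
        return -EPERM;

    if (t == s)
        return 0;

    /* Assumption for this sketch: refuse if s does not own the device. */
    if (dev->owner != s)
        return -ENODEV;

    dev->owner = t;   /* NULL leaves the device unassigned */
    return 0;
}
```

With this model, deassigning from a guest leaves owner == NULL, and the
hardware domain has to ask for the device explicitly before using it.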

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 

---
Note: This behavior is documented in a following patch which extends
DOMCTL_*assign_device to support non-PCI passthrough.

Changes in v4:
- Add Stefano's ack

Changes in v3:
- Use the coding style of the new SMMU drivers

Changes in v2:
- Fix typos in the commit message
- Update commit message
---
 xen/drivers/passthrough/arm/smmu.c| 8 +++-
 xen/drivers/passthrough/device_tree.c | 9 +++--
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index a7a7da9..7261834 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2649,7 +2649,7 @@ static int arm_smmu_reassign_dev(struct domain *s, struct domain *t,
int ret = 0;
 
/* Don't allow remapping on other domain than hwdom */
-   if (t != hardware_domain)
+   if (t && t != hardware_domain)
return -EPERM;
 
if (t == s)
@@ -2659,6 +2659,12 @@ static int arm_smmu_reassign_dev(struct domain *s, struct domain *t,
if (ret)
return ret;
 
+   if (t) {
+   ret = arm_smmu_assign_dev(t, devfn, dev);
+   if (ret)
+   return ret;
+   }
+
return 0;
 }
 
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 05ab274..0ec4103 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -80,15 +80,12 @@ int iommu_deassign_dt_device(struct domain *d, struct dt_device_node *dev)
 
 spin_lock(&dtdevs_lock);
 
-rc = hd->platform_ops->reassign_device(d, hardware_domain,
-   0, dt_to_dev(dev));
+rc = hd->platform_ops->reassign_device(d, NULL, 0, dt_to_dev(dev));
 if ( rc )
 goto fail;
 
-list_del(&dev->domain_list);
-
-dt_device_set_used_by(dev, hardware_domain->domain_id);
-list_add(&dev->domain_list, &domain_hvm_iommu(hardware_domain)->dt_devices);
+list_del_init(&dev->domain_list);
+dt_device_set_used_by(dev, DOMID_IO);
 
 fail:
 spin_unlock(&dtdevs_lock);
-- 
2.1.4




[Xen-devel] [PATCH v4 16/33] xen/arm: Let the toolstack configure the number of SPIs

2015-03-19 Thread Julien Grall
Each domain may have a different number of IRQs depending on the devices
assigned to it.

Rather than re-using the number of IRQs used by the hardware GIC, let the
toolstack specify the number of SPIs when the domain is created. This
avoids wasting memory.

To calculate the number of SPIs, we take advantage of the fact that the
libxl interface can only expose 1:1 mapping and look for the largest SPI
in the list.
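
As a rough, standalone illustration of that calculation (a sketch, not the
libxl code itself): count_spis() is a hypothetical helper name, and the
constant 32 is the number of per-CPU interrupts (SGIs and PPIs) that precede
the first SPI.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_LOCAL_IRQS 32u   /* IRQs 0-31 are SGIs/PPIs, not SPIs */

/*
 * Return the number of SPIs a domain needs so that every IRQ in the
 * list fits, assuming a 1:1 mapping between physical and virtual IRQs.
 * The result is (largest SPI index + 1), or 0 if the list has no SPI.
 */
uint32_t count_spis(const uint32_t *irqs, size_t num_irqs)
{
    uint32_t nr_spis = 0;

    for (size_t i = 0; i < num_irqs; i++) {
        uint32_t spi;

        if (irqs[i] < NR_LOCAL_IRQS)
            continue;               /* not an SPI, ignore it */

        spi = irqs[i] - NR_LOCAL_IRQS;
        if (nr_spis <= spi)
            nr_spis = spi + 1;
    }

    return nr_spis;
}
```

For example, with the IRQ list { 5, 33, 40 } the largest SPI is 40 - 32 = 8,
so nine SPIs (0 through 8) have to be allocated.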

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Wei Liu 

---
Changes in v4:
- Check the number of SPIs supported by the virtual GIC against
the number supported by the hardware GIC.
- Use uint32_t rather than int in the toolstack code.
- Initialize spi after the check in the toolstack code
- Typos

Changes in v3:
- Fix typos
- A separate patch has been created to extend the DOMCTL create domain

Changes in v2:
- Patch added
---
 tools/libxc/xc_domain.c   |  1 +
 tools/libxl/libxl_arm.c   | 21 +
 xen/arch/arm/domain.c |  2 +-
 xen/arch/arm/setup.c  |  1 +
 xen/arch/arm/vgic.c   | 11 ++-
 xen/include/asm-arm/domain.h  |  2 ++
 xen/include/asm-arm/setup.h   |  1 +
 xen/include/asm-arm/vgic.h|  2 +-
 xen/include/public/arch-arm.h |  2 ++
 9 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 448e958..579d266 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -67,6 +67,7 @@ int xc_domain_create(xc_interface *xch,
 /* No arch-specific configuration for now */
 #elif defined (__arm__) || defined(__aarch64__)
 config.gic_version = XEN_DOMCTL_CONFIG_GIC_DEFAULT;
+config.nr_spis = 0;
 #else
 errno = ENOSYS;
 return -1;
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index dc0919a..2407c2e 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -39,6 +39,27 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
   libxl_domain_config *d_config,
   xc_domain_configuration_t *xc_config)
 {
+uint32_t nr_spis = 0;
+unsigned int i;
+
+for (i = 0; i < d_config->b_info.num_irqs; i++) {
+uint32_t irq = d_config->b_info.irqs[i];
+uint32_t spi;
+
+if (irq < 32)
+continue;
+
+spi = irq - 32;
+
+if (nr_spis <= spi)
+nr_spis = spi + 1;
+}
+
+LOG(DEBUG, "Configure the domain");
+
+xc_config->nr_spis = nr_spis;
+LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
+
 xc_config->gic_version = XEN_DOMCTL_CONFIG_GIC_DEFAULT;
 
 return 0;
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index ce9f349..7c5bf9f 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -562,7 +562,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
 if ( (rc = gicv_setup(d)) != 0 )
 goto fail;
 
-if ( (rc = domain_vgic_init(d)) != 0 )
+if ( (rc = domain_vgic_init(d, config->nr_spis)) != 0 )
 goto fail;
 
 if ( (rc = domain_vtimer_init(d)) != 0 )
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index defe39e..c935266 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -830,6 +830,7 @@ void __init start_xen(unsigned long boot_phys_offset,
 /* Create initial domain 0. */
 /* The vGIC for DOM0 is exactly emulated the hardware GIC */
 config.gic_version = XEN_DOMCTL_CONFIG_GIC_DEFAULT;
+config.nr_spis = gic_number_lines() - 32;
 
 dom0 = domain_create(0, 0, 0, &config);
 if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index c07822f..74751e0 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -68,16 +68,17 @@ static void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq)
 p->irq = virq;
 }
 
-int domain_vgic_init(struct domain *d)
+int domain_vgic_init(struct domain *d, unsigned int nr_spis)
 {
 int i;
 
 d->arch.vgic.ctlr = 0;
 
-if ( is_hardware_domain(d) )
-d->arch.vgic.nr_spis = gic_number_lines() - 32;
-else
-d->arch.vgic.nr_spis = 0; /* We don't need SPIs for the guest */
+/* Limit the number of SPIs supported base on the hardware */
+if ( nr_spis > (gic_number_lines() - NR_LOCAL_IRQS) )
+return -EINVAL;
+
+d->arch.vgic.nr_spis = nr_spis;
 
 switch ( gic_hw_version() )
 {
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 9e0419e..6dacfef 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -125,6 +125,8 @@ struct arch_domain
 unsigned int evtchn_irq;
 }  __cacheline_aligned;
 
+#define domain_is_configured(d) ((d)->arch.is_configured)
+
 struct arch_vcpu
 {
 struct {
diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
index ba5a67d..254cc17 100644
--- a/xen/

[Xen-devel] [PATCH v4 09/33] xen: Extend DOMCTL createdomain to support arch configuration

2015-03-19 Thread Julien Grall
On ARM, the virtual GIC may differ from guest to guest (emulated GIC version,
number of SPIs...). This information is already known at domain creation
and can never change.

For now only the gic_version is set. In the long run, there will be more
parameters such as the number of SPIs. All will be required to be set at the
same time.

A new arch-specific structure, arch_domainconfig, has been created. x86
doesn't have any specific configuration for now, so a dummy structure
(C-spec compliant) has been created for it.

Some external tools (qemu, xenstore) may be required to create a domain.
Rather than asking them to take care of the arch-specific domain
configuration, let the current function (xc_domain_create) choose a
default configuration and introduce a new one (xc_domain_create_config).

This patch also drops the DOMCTL arm_configure_domain introduced in
Xen 4.5, as it is no longer needed.

Signed-off-by: Julien Grall 
Acked-by: Jan Beulich 
Acked-by: Daniel De Graaf 
Acked-by: Stefano Stabellini 
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Keir Fraser 
Cc: Andrew Cooper 
Cc: George Dunlap 

---
This is a follow-up of 
http://lists.xen.org/archives/html/xen-devel/2014-11/msg00522.html

Interesting discussion about the implication of this series:
- http://lists.xen.org/archives/html/xen-devel/2015-02/msg02721.html
- http://lists.xen.org/archives/html/xen-devel/2015-02/msg03306.html

As ARM will only support migration v2, it would be possible to
create a specific section which will store arch-configuration.

For the xc_domain_create, Stefano S. was looking to drop PV domain
creation support in QEMU. So maybe I could simply extend xc_domain_create
and drop the xc_domain_create_config.

Note: XEN_DOMCTL_INTERFACE_VERSION has been bumped by the commit
e1890c4 "time: widen wallclock seconds to 64 bits".

Changes in v4:
- Typos in the commit message
- Bump the XEN_DOMCTL_INTERFACE_VERSION
- Add Jan's ack for x86 and the common hypervisors pieces
- Add Daniel's ack for XSM
- Add Stefano's ack
- Drop unused include in arch/arm/domctl.c (was added specially
  for the DOMCTL)
- Remove a spurious line change in libxl_create.c

Changes in v3:
- Patch was previously sent in a separate series [1]
- Rename arch_domainconfig to xen_arch_domainconfig
- Drop the typedef
- Pass NULL for DOM0 config on x86
- Drop spurious changes
- Update comment in start_xen in arch/arm/setup.c

[1] https://patches.linaro.org/41083/
---
 tools/flask/policy/policy/modules/xen/xen.if |  2 +-
 tools/libxc/include/xenctrl.h| 14 +
 tools/libxc/xc_domain.c  | 46 
 tools/libxl/libxl_arch.h |  6 
 tools/libxl/libxl_arm.c  | 28 ++---
 tools/libxl/libxl_create.c   | 20 +---
 tools/libxl/libxl_dm.c   |  3 +-
 tools/libxl/libxl_dom.c  |  2 +-
 tools/libxl/libxl_internal.h |  7 +++--
 tools/libxl/libxl_x86.c  | 10 ++
 xen/arch/arm/domain.c| 28 -
 xen/arch/arm/domctl.c| 36 --
 xen/arch/arm/mm.c|  6 ++--
 xen/arch/arm/setup.c |  6 +++-
 xen/arch/x86/domain.c|  3 +-
 xen/arch/x86/mm.c|  6 ++--
 xen/arch/x86/setup.c |  8 +++--
 xen/common/domain.c  |  7 +++--
 xen/common/domctl.c  |  3 +-
 xen/common/schedule.c|  3 +-
 xen/include/public/arch-arm.h|  8 +
 xen/include/public/arch-x86/xen.h|  4 +++
 xen/include/public/domctl.h  | 18 +--
 xen/include/xen/domain.h |  3 +-
 xen/include/xen/sched.h  |  9 --
 xen/xsm/flask/hooks.c|  3 --
 xen/xsm/flask/policy/access_vectors  |  2 --
 27 files changed, 169 insertions(+), 122 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
index 2d32e1c..620d151 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -51,7 +51,7 @@ define(`create_domain_common', `
getaffinity setaffinity setvcpuextstate };
allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim
set_max_evtchn set_vnumainfo get_vnumainfo cacheflush
-   psr_cmt_op configure_domain };
+   psr_cmt_op };
allow $1 $2:security check_context;
allow $1 $2:shadow enable;
allow $1 $2:mmu { map_read map_write adju

[Xen-devel] [PATCH v4 32/33] xl: Add new option dtdev

2015-03-19 Thread Julien Grall
The option "dtdev" will be used to pass through a non-PCI device described
in the device tree to a guest.

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Wei Liu 

---
Changes in v4:
- Typos in the documentation
- Wrap the line in xl_cmdimpl.c

Changes in v2:
- libxl_device_dt has been renamed to libxl_device_dtdev
- use xrealloc instead of realloc
---
 docs/man/xl.cfg.pod.5|  5 +
 tools/libxl/xl_cmdimpl.c | 22 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index bcbc277..9241000 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -779,6 +779,11 @@ More information about Xen gfx_passthru feature is available
 on the XenVGAPassthrough L
 wiki page.
 
+=item B
+
+Specifies the host device tree nodes to passthrough to this guest. Each
+DTDEV_PATH is the absolute path in the device tree.
+
 =item B
 
 Allow guest to access specific legacy I/O ports. Each B
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c2415ba..ed82b6a 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1169,7 +1169,7 @@ static void parse_config_data(const char *config_source,
 long l, vcpus = 0;
 XLU_Config *config;
 XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms;
-XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian;
+XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian, *dtdevs;
 int num_ioports, num_irqs, num_iomem, num_cpus, num_viridian;
 int pci_power_mgmt = 0;
 int pci_msitranslate = 0;
@@ -1941,6 +1941,26 @@ skip_vfb:
 libxl_defbool_set(&b_info->u.pv.e820_host, true);
 }
 
+if (!xlu_cfg_get_list (config, "dtdev", &dtdevs, 0, 0)) {
+d_config->num_dtdevs = 0;
+d_config->dtdevs = NULL;
+for (i = 0; (buf = xlu_cfg_get_listitem(dtdevs, i)) != NULL; i++) {
+libxl_device_dtdev *dtdev;
+
+d_config->dtdevs = xrealloc(d_config->dtdevs,
+sizeof (libxl_device_dtdev) * (i + 1));
+dtdev = d_config->dtdevs + d_config->num_dtdevs;
+libxl_device_dtdev_init(dtdev);
+
+dtdev->path = strdup(buf);
+if (dtdev->path == NULL) {
+fprintf(stderr, "unable to duplicate string for dtdevs\n");
+exit(-1);
+}
+d_config->num_dtdevs++;
+}
+}
+
 switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) {
 case 0:
 {
-- 
2.1.4




[Xen-devel] [PATCH v4 21/33] xen/passthrough: Introduce iommu_construct

2015-03-19 Thread Julien Grall
This new function will correctly initialize the IOMMU page table for the
current domain.

Also use it in iommu_assign_dt_device even though the current IOMMU
implementation on ARM shares P2M with the processor.

Signed-off-by: Julien Grall 
Cc: Jan Beulich 

---
Changes in v4:
- Move memory_type_changed in iommu_construct. Added by commit
06ed8cc "x86: avoid needless EPT table ajustment and cache
flush"
- Add an ASSERT and a comment in iommu_assign_dt_device to
explain why the call is safe for DOM0

Changes in v3:
- The ASSERT in iommu_construct was redundant with the if ()
- Remove d->need_iommu = 1 in assign_device as it's already
done by iommu_construct.
- Simplify the code in the caller of iommu_construct

Changes in v2:
- Add missing Signed-off-by
- Rename iommu_buildup to iommu_construct
---
 xen/drivers/passthrough/arm/iommu.c   |  6 ++
 xen/drivers/passthrough/device_tree.c | 12 
 xen/drivers/passthrough/iommu.c   | 26 ++
 xen/drivers/passthrough/pci.c | 22 --
 xen/include/xen/iommu.h   |  2 ++
 5 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/xen/drivers/passthrough/arm/iommu.c b/xen/drivers/passthrough/arm/iommu.c
index 3007b99..9234657 100644
--- a/xen/drivers/passthrough/arm/iommu.c
+++ b/xen/drivers/passthrough/arm/iommu.c
@@ -68,3 +68,9 @@ void arch_iommu_domain_destroy(struct domain *d)
 {
 iommu_dt_domain_destroy(d);
 }
+
+int arch_iommu_populate_page_table(struct domain *d)
+{
+/* The IOMMU shares the p2m with the CPU */
+return -ENOSYS;
+}
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 377d41d..4d82a09 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -41,6 +41,18 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
 if ( !list_empty(&dev->domain_list) )
 goto fail;
 
+if ( need_iommu(d) <= 0 )
+{
+/*
+ * The hwdom is forced to use IOMMU for protecting assigned
+ * device. Therefore the IOMMU data is already set up.
+ */
+ASSERT(!is_hardware_domain(d));
+rc = iommu_construct(d);
+if ( rc )
+goto fail;
+}
+
 rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
 
 if ( rc )
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 92ea26f..faddd50 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -187,6 +187,32 @@ void iommu_teardown(struct domain *d)
 tasklet_schedule(&iommu_pt_cleanup_tasklet);
 }
 
+int iommu_construct(struct domain *d)
+{
+int rc = 0;
+
+if ( need_iommu(d) > 0 )
+return 0;
+
+if ( !iommu_use_hap_pt(d) )
+{
+rc = arch_iommu_populate_page_table(d);
+if ( rc )
+return rc;
+}
+
+d->need_iommu = 1;
+/*
+ * There may be dirty cache lines when a device is assigned
+ * and before need_iommu(d) becoming true, this will cause
+ * memory_type_changed lose effect if memory type changes.
+ * Call memory_type_changed here to amend this.
+ */
+memory_type_changed(d);
+
+return rc;
+}
+
 void iommu_domain_destroy(struct domain *d)
 {
 struct hvm_iommu *hd = domain_hvm_iommu(d);
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 4b83583..18b74f4 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1353,25 +1353,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 if ( !spin_trylock(&pcidevs_lock) )
 return -ERESTART;
 
-if ( need_iommu(d) <= 0 )
+rc = iommu_construct(d);
+if ( rc )
 {
-if ( !iommu_use_hap_pt(d) )
-{
-rc = arch_iommu_populate_page_table(d);
-if ( rc )
-{
-spin_unlock(&pcidevs_lock);
-return rc;
-}
-}
-d->need_iommu = 1;
-/*
- * There may be dirty cache lines when a device is assigned
- * and before need_iommu(d) becoming true, this will cause
- * memory_type_changed lose effect if memory type changes.
- * Call memory_type_changed here to amend this.
- */
-memory_type_changed(d);
+spin_unlock(&pcidevs_lock);
+return rc;
 }
 
 pdev = pci_get_pdev_by_domain(hardware_domain, seg, bus, devfn);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index bf4aff0..e9d2d5c 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -65,6 +65,8 @@ int arch_iommu_domain_init(struct domain *d);
 int arch_iommu_populate_page_table(struct domain *d);
 void arch_iommu_check_autotranslated_hwdom(struct domain *d);
 
+int iommu_construct(struct domain *d);
+
 /* Function use

[Xen-devel] [PATCH v4 07/33] xen: guestcopy: Provide an helper to safely copy string from guest

2015-03-19 Thread Julien Grall
Flask code already provides a helper to copy a string from guest. In a later
patch, the new DT hypercalls will need a similar function.

To avoid code duplication, copy the flask helper (flask_copyin_string) to
common code:
- Rename it to safe_copy_string_from_guest
- Add comment to explain the extra +1
- Return the buffer directly and use the macros provided by
xen/err.h to return an error code if necessary.
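
The "return the buffer or an encoded error" convention can be sketched in
userspace C as follows. This is an illustration only: err_ptr(), ptr_err(),
is_err() and copy_string_bounded() are hypothetical names that imitate the
idea behind the xen/err.h macros, and memcpy stands in for copy_from_guest.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, and decode it again. */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
    /* The top MAX_ERRNO addresses are reserved for error codes. */
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Copy size bytes from src into a fresh buffer and NUL-terminate it.
 * Returns the buffer, or an encoded error instead of a valid pointer.
 */
char *copy_string_bounded(const char *src, size_t size, size_t max_size)
{
    char *tmp;

    if (size > max_size)
        return err_ptr(-ENOBUFS);

    /* The extra +1 leaves room for the terminating NUL. */
    tmp = malloc(size + 1);
    if (!tmp)
        return err_ptr(-ENOMEM);

    memcpy(tmp, src, size);
    tmp[size] = '\0';

    return tmp;
}
```

A caller checks the result with is_err() and, on failure, returns ptr_err()
of it, so no separate output parameter is needed for the buffer.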

Signed-off-by: Julien Grall 
Acked-by: Daniel De Graaf 
Acked-by: Ian Campbell 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Keir Fraser 

---
Changes in v4:
- Use -ENOBUFS rather than -ENOENT
- Fix coding style in comment
- Typos in commit message
- Convert the new flask_copyin_string (for DT) to
safe_copy_string_from_guest
- Add Ian and Daniel's ack

Changes in v3:
- Use macros of xen/err.h to return either the buffer or an
error code
- Reuse size_t instead of unsigned long
- Update comment and commit message

Changes in v2:
- Rename copy_string_from_guest into safe_copy_string_from_guest
- Update commit message and comment in the code
---
 xen/common/Makefile|  1 +
 xen/common/guestcopy.c | 31 ++
 xen/include/xen/guest_access.h |  5 +
 xen/xsm/flask/flask_op.c   | 49 +++---
 4 files changed, 50 insertions(+), 36 deletions(-)
 create mode 100644 xen/common/guestcopy.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 1956091..cf15887 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -9,6 +9,7 @@ obj-y += event_2l.o
 obj-y += event_channel.o
 obj-y += event_fifo.o
 obj-y += grant_table.o
+obj-y += guestcopy.o
 obj-y += irq.o
 obj-y += kernel.o
 obj-y += keyhandler.o
diff --git a/xen/common/guestcopy.c b/xen/common/guestcopy.c
new file mode 100644
index 000..1645cbd
--- /dev/null
+++ b/xen/common/guestcopy.c
@@ -0,0 +1,31 @@
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * The function copies a string from the guest and adds a NUL to
+ * make sure the string is correctly terminated.
+ */
+void *safe_copy_string_from_guest(XEN_GUEST_HANDLE(char) u_buf,
+  size_t size, size_t max_size)
+{
+char *tmp;
+
+if ( size > max_size )
+return ERR_PTR(-ENOBUFS);
+
+/* Add an extra +1 to append \0 */
+tmp = xmalloc_array(char, size + 1);
+if ( !tmp )
+return ERR_PTR(-ENOMEM);
+
+if ( copy_from_guest(tmp, u_buf, size) )
+{
+xfree(tmp);
+return ERR_PTR(-EFAULT);
+}
+tmp[size] = 0;
+
+return tmp;
+}
diff --git a/xen/include/xen/guest_access.h b/xen/include/xen/guest_access.h
index 373454e..55645e6 100644
--- a/xen/include/xen/guest_access.h
+++ b/xen/include/xen/guest_access.h
@@ -8,6 +8,8 @@
 #define __XEN_GUEST_ACCESS_H__
 
 #include 
+#include 
+#include 
 
 #define copy_to_guest(hnd, ptr, nr) \
 copy_to_guest_offset(hnd, 0, ptr, nr)
@@ -27,4 +29,7 @@
 #define __clear_guest(hnd, nr)  \
 __clear_guest_offset(hnd, 0, nr)
 
+void *safe_copy_string_from_guest(XEN_GUEST_HANDLE(char) u_buf,
+ size_t size, size_t max_size);
+
 #endif /* __XEN_GUEST_ACCESS_H__ */
diff --git a/xen/xsm/flask/flask_op.c b/xen/xsm/flask/flask_op.c
index 47aacc1..802ffd4 100644
--- a/xen/xsm/flask/flask_op.c
+++ b/xen/xsm/flask/flask_op.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -93,29 +94,6 @@ static int domain_has_security(struct domain *d, u32 perms)
 perms, NULL);
 }
 
-static int flask_copyin_string(XEN_GUEST_HANDLE(char) u_buf, char **buf,
-   size_t size, size_t max_size)
-{
-char *tmp;
-
-if ( size > max_size )
-return -ENOENT;
-
-tmp = xmalloc_array(char, size + 1);
-if ( !tmp )
-return -ENOMEM;
-
-if ( copy_from_guest(tmp, u_buf, size) )
-{
-xfree(tmp);
-return -EFAULT;
-}
-tmp[size] = 0;
-
-*buf = tmp;
-return 0;
-}
-
 #endif /* COMPAT */
 
 static int flask_security_user(struct xen_flask_userlist *arg)
@@ -129,9 +107,9 @@ static int flask_security_user(struct xen_flask_userlist *arg)
 if ( rv )
 return rv;
 
-rv = flask_copyin_string(arg->u.user, &user, arg->size, PAGE_SIZE);
-if ( rv )
-return rv;
+user = safe_copy_string_from_guest(arg->u.user, arg->size, PAGE_SIZE);
+if ( IS_ERR(user) )
+return PTR_ERR(user);
 
 rv = security_get_user_sids(arg->start_sid, user, &sids, &nsids);
 if ( rv < 0 )
@@ -244,9 +222,9 @@ static int flask_security_context(struct xen_flask_sid_context *arg)
 if ( rv )
 return rv;
 
-rv = flask_copyin_string(arg->context, &buf, arg->size, PAGE_SIZE);
-if ( rv )
-return rv;
+buf = safe_copy_string_from_guest(arg->context, arg->size, PAGE_SIZE);

[Xen-devel] [PATCH v4 04/33] xen/arm: vgic: Introduce a function to initialize pending_irq

2015-03-19 Thread Julien Grall
The structure pending_irq is initialized in the same way in 2 different
places. Introduce vgic_init_pending_irq to avoid code duplication.

Also move the setting of the irq field into this function as we need to
initialize it once rather than every time an IRQ is injected to the guest.
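
A minimal standalone sketch of the idea, assuming a simplified pending_irq
with only the fields mentioned here: struct list_head, init_list_head() and
init_irq_array() are local stand-ins written for this example, not the Xen
definitions.

```c
#include <stddef.h>

/* Minimal doubly linked list head, standing in for Xen's list type. */
struct list_head { struct list_head *next, *prev; };

static inline void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Simplified version of the structure: only the fields set up here. */
struct pending_irq {
    struct list_head inflight;
    struct list_head lr_queue;
    unsigned int irq;
};

/* One helper performs all per-IRQ setup, including the IRQ number. */
static inline void init_pending_irq(struct pending_irq *p, unsigned int virq)
{
    init_list_head(&p->inflight);
    init_list_head(&p->lr_queue);
    p->irq = virq;
}

/* Initialize count entries whose virtual IRQ numbers start at first_virq. */
static inline void init_irq_array(struct pending_irq *irqs,
                                  unsigned int count,
                                  unsigned int first_virq)
{
    for (unsigned int i = 0; i < count; i++)
        init_pending_irq(&irqs[i], first_virq + i);
}
```

Per-vCPU interrupts would use first_virq = 0 and SPIs first_virq = 32, which
matches the two call sites the patch converts.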

Finally, use unsigned int for the "irq" field to be consistent with the
virq variable.

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 
Acked-by: Ian Campbell 

---
Changes in v4:
- Add Ian's ack
- Typos in the commit message

Changes in v3:
- Add Stefano's ack
- The irq field is now unsigned int
- Update commit message to speak about the int -> unsigned int
change
- Use "unsigned int" rather than "unsigned"

Changes in v2:
- Patch added
---
 xen/arch/arm/gic.c |  2 +-
 xen/arch/arm/vgic.c| 19 ++-
 xen/include/asm-arm/vgic.h |  2 +-
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 8e7f24b..ba7950b 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -630,7 +630,7 @@ void gic_dump_info(struct vcpu *v)
 
 list_for_each_entry ( p, &v->arch.vgic.inflight_irqs, inflight )
 {
-printk("Inflight irq=%d lr=%u\n", p->irq, p->lr);
+printk("Inflight irq=%u lr=%u\n", p->irq, p->lr);
 }
 
 list_for_each_entry( p, &v->arch.vgic.lr_pending, lr_queue )
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index c14d79d..0b4fa57 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -61,6 +61,13 @@ struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned int irq)
 return vgic_get_rank(v, rank);
 }
 
+static void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq)
+{
+INIT_LIST_HEAD(&p->inflight);
+INIT_LIST_HEAD(&p->lr_queue);
+p->irq = virq;
+}
+
 int domain_vgic_init(struct domain *d)
 {
 int i;
@@ -101,10 +108,8 @@ int domain_vgic_init(struct domain *d)
 return -ENOMEM;
 
     for (i=0; i<d->arch.vgic.nr_spis; i++)
-{
-INIT_LIST_HEAD(&d->arch.vgic.pending_irqs[i].inflight);
-INIT_LIST_HEAD(&d->arch.vgic.pending_irqs[i].lr_queue);
-}
+vgic_init_pending_irq(&d->arch.vgic.pending_irqs[i], i + 32);
+
     for (i=0; i<DOMAIN_NR_RANKS(d); i++)
         spin_lock_init(&d->arch.vgic.shared_irqs[i].lock);
 
@@ -148,10 +153,7 @@ int vcpu_vgic_init(struct vcpu *v)
 
 memset(&v->arch.vgic.pending_irqs, 0, sizeof(v->arch.vgic.pending_irqs));
 for (i = 0; i < 32; i++)
-{
-INIT_LIST_HEAD(&v->arch.vgic.pending_irqs[i].inflight);
-INIT_LIST_HEAD(&v->arch.vgic.pending_irqs[i].lr_queue);
-}
+vgic_init_pending_irq(&v->arch.vgic.pending_irqs[i], i);
 
 INIT_LIST_HEAD(&v->arch.vgic.inflight_irqs);
 INIT_LIST_HEAD(&v->arch.vgic.lr_pending);
@@ -409,7 +411,6 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int irq)
 goto out;
 }
 
-n->irq = irq;
 n->priority = priority;
 
 /* the irq is enabled */
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index dd93872..0d0d114 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -67,7 +67,7 @@ struct pending_irq
 #define GIC_IRQ_GUEST_MIGRATING   4
 unsigned long status;
     struct irq_desc *desc; /* only set it the irq corresponds to a physical irq */
-int irq;
+unsigned int irq;
 #define GIC_INVALID_LR ~(uint8_t)0
 uint8_t lr;
 uint8_t priority;
-- 
2.1.4




[Xen-devel] [PATCH v4 19/33] xen/arm: Implement hypercall DOMCTL_{, un}bind_pt_pirq

2015-03-19 Thread Julien Grall
On x86, an IRQ is assigned to an HVM guest in 2 steps:
- The toolstack calls PHYSDEVOP_map_pirq in order to create a
guest PIRQ (IRQ bound to an event channel)
- The emulator (QEMU) calls DOMCTL_bind_pt_irq in order to
bind the IRQ

On ARM, there is no concept of PIRQ as the IRQ can be assigned to a
virtual IRQ using the interrupt controller.

It's not clear whether we will need 2 different hypercalls on ARM to assign
an IRQ and, for now, only the toolstack will manage IRQs.

In order to avoid re-using a fixed ABI hypercall (PHYSDEVOP_*) for a
different purpose and to allow us more time to figure out the right approach,
only DOMCTL_{,un}bind_pt_irq is implemented on ARM.

The DOMCTL is extended with a new type PT_IRQ_TYPE_SPI and only IRQ ==
vIRQ (i.e machine_irq == spi) is supported.

Concerning XSM, even if ARM is using one hypercall rather than 2, the
resulting check is nearly the same.

XSM PHYSDEVOP_map_pirq:
1) Check if the current domain can add resource to the domain
2) Check if the current domain has permission to add the IRQ
3) Check if the target domain has permission to use the IRQ

XSM DOMCTL_bind_pt_irq:
1) Check if the current domain can add resource to the domain
2) Check if the current domain has permission to bind the IRQ
3) Check if the target domain has permission to use the IRQ

Rather than checking that the current domain can both add and bind the
IRQ, we only check the bind permission. I think this is not a big deal
because we don't have an emulator on ARM and therefore no disaggregation is
required.

Note: The toolstack changes for routing an IRQ to a guest will be done
in a separate patch.

Signed-off-by: Julien Grall 
Cc: Jan Beulich 

---
Unlike PHYSDEVOP, the DOMCTL interface is not fixed. This version
uses a DOMCTL in order to give us more time to see whether we need a new
PHYSDEVOP for vIRQ assignment.

DOMCTL_unbind_pt_irq has been implemented, although I haven't tested
it. I'm not sure if we want to keep it.

Concerning XSM, the final security check is nearly the same.

Changes in v4:
- Move the implementation from PHYSDEV to DOMCTL. Reuse
DOMCTL_{,un}bind_pt_irq for this purpose.

Changes in v3:
- Functions to allocate/release/reserve a VIRQ have been moved
to a separate patch
- Make clear that only MAP_PIRQ_GSI is supported for now

Changes in v2:
- Add PHYSDEVOP_unmap_pirq
- Rework commit message
- Add functions to allocate/release a VIRQ
- is_routable_irq has been renamed into is_assignable_irq
---
 tools/libxc/include/xenctrl.h |  8 +++--
 tools/libxc/xc_domain.c   | 18 +--
 xen/arch/arm/domctl.c | 66 
 xen/include/public/domctl.h   |  4 +++
 xen/include/xsm/dummy.h   | 24 +++
 xen/include/xsm/xsm.h | 28 -
 xen/xsm/dummy.c   |  4 +--
 xen/xsm/flask/hooks.c | 70 ++-
 8 files changed, 156 insertions(+), 66 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 60b61b6..b6212bf 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2090,7 +2090,7 @@ int xc_domain_bind_pt_irq(xc_interface *xch,
   uint8_t bus,
   uint8_t device,
   uint8_t intx,
-  uint8_t isa_irq);
+  uint16_t isa_irq);
 
 int xc_domain_unbind_pt_irq(xc_interface *xch,
   uint32_t domid,
@@ -2099,7 +2099,7 @@ int xc_domain_unbind_pt_irq(xc_interface *xch,
   uint8_t bus,
   uint8_t device,
   uint8_t intx,
-  uint8_t isa_irq);
+  uint16_t isa_irq);
 
 int xc_domain_bind_pt_pci_irq(xc_interface *xch,
   uint32_t domid,
@@ -2112,6 +2112,10 @@ int xc_domain_bind_pt_isa_irq(xc_interface *xch,
   uint32_t domid,
   uint8_t machine_irq);
 
+int xc_domain_bind_pt_spi_irq(xc_interface *xch,
+  uint32_t domid,
+  uint16_t spi);
+
 int xc_domain_set_machine_address_size(xc_interface *xch,
   uint32_t domid,
   unsigned int width);
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 579d266..8243b70 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1764,7 +1764,7 @@ int xc_domain_bind_pt_irq(
 uint8_t bus,
 uint8_t device,
 uint8_t intx,
-uint8_t isa_irq)
+uint16_t isa_irq)
 {
 int rc;
 xen_domctl_bind_pt_irq_t * bind;
@@ -1788,6 +1788,9 @@ int xc_domain_bind_pt_irq(
 case PT_IRQ_TYPE_ISA:
 bind->u.isa.isa_irq = 

[Xen-devel] [PATCH v4 22/33] xen/passthrough: arm: release the DT devices assigned to a guest earlier

2015-03-19 Thread Julien Grall
The toolstack may not have deassigned every device used by a guest.
Therefore we have to go through the device list and remove them before
asking the IOMMU drivers to release memory for this domain.

This can be done by moving the call to the release function to the point
where we relinquish the resources. The IOMMU part will be destroyed later
when the domain is freed.
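
As a rough illustration of the release step, the sketch below walks a
device list and stops at the first failure so the caller can report the
error. The list type, the fail_on_release test hook and the -EIO error
code are assumptions made for the example; the real code uses Xen's own
list helpers and iommu_deassign_dt_device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device record; Xen uses struct dt_device_node instead. */
struct device {
    const char *name;
    int assigned;
    int fail_on_release;    /* test hook: simulate a driver error */
    struct device *next;
};

static int deassign_device(struct device *dev)
{
    assert(dev != NULL);

    if ( dev->fail_on_release )
        return -EIO;

    dev->assigned = 0;
    return 0;
}

/* Release every device in the list; return the first error seen. */
static int release_devices(struct device *head)
{
    struct device *dev, *next;
    int rc;

    for ( dev = head; dev != NULL; dev = next )
    {
        next = dev->next;   /* saved first, as in a "safe" list walk */
        rc = deassign_device(dev);
        if ( rc )
            return rc;
    }

    return 0;
}
```

Returning the error rather than only logging it lets the relinquish path
stop and report the failure.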

Signed-off-by: Julien Grall 
Signed-off-by: Robert VanVossen 
Acked-by: Jan Beulich 

---
Changes in v4:
- Typos in commit message
- Add Jan's ack
- iommu_release_dt_devices was only releasing the first device by
mistake. Thanks to Robert VanVossen for spotting it.

Changes in v3:
- Patch added. Supersedes the patch "xen/passthrough: call
arch_iommu_domain_destroy before calling iommu teardown" in
the previous patch series.
---
 xen/arch/arm/domain.c | 4 
 xen/drivers/passthrough/arm/iommu.c   | 1 -
 xen/drivers/passthrough/device_tree.c | 7 ++-
 xen/include/xen/iommu.h   | 2 +-
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 7c5bf9f..7ebdce3 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -767,6 +767,10 @@ int domain_relinquish_resources(struct domain *d)
 switch ( d->arch.relmem )
 {
 case RELMEM_not_started:
+ret = iommu_release_dt_devices(d);
+if ( ret )
+return ret;
+
 d->arch.relmem = RELMEM_xen;
 /* Falltrough */
 
diff --git a/xen/drivers/passthrough/arm/iommu.c 
b/xen/drivers/passthrough/arm/iommu.c
index 9234657..95b1abb 100644
--- a/xen/drivers/passthrough/arm/iommu.c
+++ b/xen/drivers/passthrough/arm/iommu.c
@@ -66,7 +66,6 @@ int arch_iommu_domain_init(struct domain *d)
 
 void arch_iommu_domain_destroy(struct domain *d)
 {
-iommu_dt_domain_destroy(d);
 }
 
 int arch_iommu_populate_page_table(struct domain *d)
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 4d82a09..05ab274 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -105,7 +105,7 @@ int iommu_dt_domain_init(struct domain *d)
 return 0;
 }
 
-void iommu_dt_domain_destroy(struct domain *d)
+int iommu_release_dt_devices(struct domain *d)
 {
 struct hvm_iommu *hd = domain_hvm_iommu(d);
 struct dt_device_node *dev, *_dev;
@@ -115,7 +115,12 @@ void iommu_dt_domain_destroy(struct domain *d)
 {
 rc = iommu_deassign_dt_device(d, dev);
 if ( rc )
+{
 dprintk(XENLOG_ERR, "Failed to deassign %s in domain %u\n",
 dt_node_full_name(dev), d->domain_id);
+return rc;
+}
 }
+
+return 0;
 }
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index e9d2d5c..d9c9ede 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -117,7 +117,7 @@ void iommu_read_msi_from_ire(struct msi_desc *msi_desc, 
struct msi_msg *msg);
 int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev);
 int iommu_deassign_dt_device(struct domain *d, struct dt_device_node *dev);
 int iommu_dt_domain_init(struct domain *d);
-void iommu_dt_domain_destroy(struct domain *d);
+int iommu_release_dt_devices(struct domain *d);
 
 #endif /* HAS_DEVICE_TREE */
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 20/33] xen/dts: Provide an helper to get a DT node from a path provided by a guest

2015-03-19 Thread Julien Grall
The maximum size of the copied string has been chosen based on the value
used by XSM in a similar case.

Furthermore, Linux seems to allow paths of up to 4096 characters, though
this could vary from one OS to another.
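
The idea of a bounded copy can be sketched in plain C as follows. This is
only an illustration: it copies from an ordinary pointer rather than a
guest handle, MAX_PATH_LEN is a name invented here to mirror the 4096
limit, and the error codes are chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: mirrors the 4096-byte limit discussed above. */
#define MAX_PATH_LEN 4096

/*
 * Copy `len` bytes from `src` into a freshly allocated, NUL-terminated
 * buffer. Returns 0 and stores the buffer in *out, or -errno on failure.
 * The caller frees the buffer.
 */
static int copy_bounded_string(const char *src, size_t len, size_t max,
                               char **out)
{
    char *buf;

    assert(src != NULL && out != NULL);

    if ( len > max )
        return -ENOBUFS;    /* refuse over-long input before allocating */

    buf = malloc(len + 1);
    if ( buf == NULL )
        return -ENOMEM;

    memcpy(buf, src, len);
    buf[len] = '\0';        /* the source is not trusted to be terminated */
    *out = buf;

    return 0;
}
```

Checking the length before allocating keeps a caller from forcing an
arbitrarily large allocation.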

Signed-off-by: Julien Grall 

---
Changes in v4:
- Drop DEVICE_TREE_MAX_PATHLEN
- Bump the value to PAGE_SIZE (i.e. 4096). It's used in XSM and
this value seems sensible for Linux
- Clarify how the maximum size has been chosen

Changes in v3:
- Use the new prototype of safe_copy_string_from_guest

Changes in v2:
- guest_copy_string_from_guest has been renamed into
safe_copy_string_from_guest
---
 xen/common/device_tree.c  | 18 ++
 xen/include/xen/device_tree.h | 14 ++
 2 files changed, 32 insertions(+)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 02cae91..31f169b 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -23,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 const void *device_tree_flattened;
 dt_irq_xlate_func dt_irq_xlate;
@@ -277,6 +279,22 @@ struct dt_device_node *dt_find_node_by_path(const char 
*path)
 return np;
 }
 
+int dt_find_node_by_gpath(XEN_GUEST_HANDLE(char) u_path, uint32_t u_plen,
+  struct dt_device_node **node)
+{
+char *path;
+
+path = safe_copy_string_from_guest(u_path, u_plen, PAGE_SIZE);
+if ( IS_ERR(path) )
+return PTR_ERR(path);
+
+*node = dt_find_node_by_path(path);
+
+xfree(path);
+
+return (*node == NULL) ? -ESRCH : 0;
+}
+
 struct dt_device_node *dt_find_node_by_alias(const char *alias)
 {
 const struct dt_alias_prop *app;
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index 57eb3ee..e187780 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -456,6 +456,20 @@ struct dt_device_node *dt_find_node_by_alias(const char 
*alias);
  */
 struct dt_device_node *dt_find_node_by_path(const char *path);
 
+
+/**
+ * dt_find_node_by_gpath - Same as dt_find_node_by_path but retrieve the
+ * path from the guest
+ *
+ * @u_path: Xen Guest handle to the buffer containing the path
+ * @u_plen: Length of the buffer
+ * @node: TODO
+ *
+ * Return 0 if succeed otherwise -errno
+ */
+int dt_find_node_by_gpath(XEN_GUEST_HANDLE(char) u_path, uint32_t u_plen,
+  struct dt_device_node **node);
+
 /**
  * dt_get_parent - Get a node's parent if any
  * @node: Node to get parent
-- 
2.1.4




[Xen-devel] [PATCH v4 30/33] tools/libxl: arm: Use an higher value for the GIC phandle

2015-03-19 Thread Julien Grall
The partial device tree may contain phandles. The Device Tree Compiler
tends to allocate phandles starting from 1.

Reserve the ID 65000 for the GIC phandle. I think we can safely assume
that the partial device tree will never contain such an ID.

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Wei Liu 

---
It's not easy to track the maximum phandle in the partial
device tree.

We would need to parse it twice: once to look for the maximum
phandle, and once to copy the nodes. This is because we have to
know the phandle of the GIC when we create the properties of the
root.

As the phandle is encoded as an unsigned 32-bit value, I could use a
higher value. Though, having 65000 phandles is already a lot...

TODO: If it's necessary, I can check if the value has been used by
another phandle in the device tree.
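
A collision check of the kind mentioned in the TODO could look roughly
like the sketch below. It assumes the phandles of the partial device tree
have already been collected into an array, which is not something this
patch does; the function name is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHANDLE_GIC 65000u

/*
 * Return 1 if any phandle found in the partial device tree equals the
 * value reserved for the GIC, 0 otherwise.
 */
static int gic_phandle_collides(const uint32_t *phandles, size_t count)
{
    size_t i;

    assert(phandles != NULL || count == 0);

    for ( i = 0; i < count; i++ )
        if ( phandles[i] == PHANDLE_GIC )
            return 1;

    return 0;
}
```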

Changes in v3:
- Patch added
---
 tools/libxl/libxl_arm.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 54d197b..0723a47 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -80,10 +80,11 @@ static struct arch_info {
 {"xen-3.0-aarch64", "arm,armv8-timer", "arm,armv8" },
 };
 
-enum {
-PHANDLE_NONE = 0,
-PHANDLE_GIC,
-};
+/*
+ * The device tree compiler (DTC) is allocating the phandle from 1 to
+ * onwards. Reserve a high value for the GIC phandle.
+ */
+#define PHANDLE_GIC (65000)
 
 typedef uint32_t be32;
 typedef be32 gic_interrupt[3];
-- 
2.1.4




[Xen-devel] [PATCH v4 33/33] docs/misc: arm: Add documentation about non-PCI passthrough

2015-03-19 Thread Julien Grall
Note that the example is done on Midway, whose SMMU driver is not
supported in upstream Xen.

Currently, I don't have another platform where I can test non-PCI
passthrough.

Signed-off-by: Julien Grall 

---
Changes in v4:
- Patch added
---
 docs/misc/arm/passthrough.txt | 58 +++
 1 file changed, 58 insertions(+)
 create mode 100644 docs/misc/arm/passthrough.txt

diff --git a/docs/misc/arm/passthrough.txt b/docs/misc/arm/passthrough.txt
new file mode 100644
index 000..cf6cc61
--- /dev/null
+++ b/docs/misc/arm/passthrough.txt
@@ -0,0 +1,58 @@
+Passthrough a non-pci device to a guest
+===
+
+The example will use the secondary network card for the midway server.
+
+1) Mark the device to let Xen knowns the device will be used for passthrough.
+This is done in the device tree node describing the device by adding the
+property "xen,passthrough". The command to do it in U-Boot is:
+
+fdt set /soc/ethernet@fff51000 xen,passthrough
+
+2) Create the partial device tree describing the device. The IRQ are mapped
+1:1 to the guest (i.e VIRQ == IRQ). For MMIO will have to find hole in the
+guest memory layout (see xen/include/public/arch-arm.h, noted the layout
+is not stable and can change between 2 releases version of Xen).
+
+/dts-v1/;
+
+/ {
+/* #*cells are here to keep DTC happy */
+#address-cells = <2>;
+#size-cells = <2>;
+
+aliases {
+net = &mac0;
+};
+
+passthrough {
+compatible = "simple-bus";
+ranges;
+#address-cells = <2>;
+#size-cells = <2>;
+   mac0: ethernet@1000 {
+   compatible = "calxeda,hb-xgmac";
+reg = <0 0x1000 0 0x1000>;
+   interrupts = <0 80 4  0 81 4  0 82 4>;
+   };
+};
+};
+
+Note:
+* The interrupt-parent property will be added by the toolstack in the
+root node;
+* The properties compatible, ranges, #address-cells and #size-cells
+in /passthrough are mandatory.
+
+3) Compile the partial guest device with dtc (Device Tree Compiler).
+For our purpose, the compiled file will be called guest-midway.dtb and
+placed in /root in DOM0.
+
+3) Add the following options in the guest configuration file:
+
+device_tree = "/root/guest-midway.dtb"
+dtdev = [ "/soc/ethernet@fff51000" ]
+irqs = [ 112, 113, 114 ]
+iomem = [ "0xfff51,1@0x1" ]
+
+
-- 
2.1.4




[Xen-devel] [PATCH v4 26/33] xen/passthrough: Extend XEN_DOMCTL_*assign_device to support DT device

2015-03-19 Thread Julien Grall
A device node is described by a path. It will be used to retrieve the
node in the device tree and assign the related device to the domain.

Only non-PCI devices protected by an IOMMU can be assigned to a guest.

Also document the behavior of XEN_DOMCTL_deassign_device in the public
headers, which differs between non-PCI and PCI.
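
The split between PCI and device tree requests can be pictured as a
tagged union. The sketch below is a simplified model and not the public
header: the type names, the MAX_DT_PATH limit and the error codes are
assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DT_PATH 4096u   /* assumption: same bound as the path helper */

/* Hypothetical model of a request that selects a device by kind. */
enum dev_kind { DEV_PCI, DEV_DT };

struct assign_request {
    enum dev_kind dev;
    union {
        struct { uint32_t machine_sbdf; } pci;
        struct { const char *path; uint32_t size; } dt;
    } u;
};

/* Return 0 if the request is well formed, -errno otherwise. */
static int check_assign_request(const struct assign_request *req)
{
    assert(req != NULL);

    switch ( req->dev )
    {
    case DEV_PCI:
        return 0;           /* any SBDF is accepted at this stage */

    case DEV_DT:
        if ( req->u.dt.path == NULL || req->u.dt.size == 0 ||
             req->u.dt.size > MAX_DT_PATH )
            return -EINVAL;
        return 0;

    default:
        return -ENODEV;     /* unknown device kind */
    }
}
```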

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Jan Beulich 

---
Changes in v4:
- Add XSM bits
- Return -ENODEV rather than -ENOSYS
- Move the if (...) into the ifdef (see iommu.c)
- Document the behavior of XEN_DOMCTL_deassign_device
- Use PCI_BUS and PCI_DEVFN2 when it's possible
- iommu_dt_device_is_assigned now returns 0 when the device is
not protected

Changes in v2:
- Use a different number for XEN_DOMCTL_assign_dt_device
---
 tools/libxc/include/xenctrl.h |  10 
 tools/libxc/xc_domain.c   |  95 --
 xen/drivers/passthrough/device_tree.c | 108 +-
 xen/drivers/passthrough/iommu.c   |   9 ++-
 xen/drivers/passthrough/pci.c |  47 ++-
 xen/include/public/domctl.h   |  24 +++-
 xen/include/xen/iommu.h   |   3 +
 7 files changed, 271 insertions(+), 25 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index b6212bf..4648cb0 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2055,6 +2055,16 @@ int xc_deassign_device(xc_interface *xch,
  uint32_t domid,
  uint32_t machine_sbdf);
 
+int xc_assign_dt_device(xc_interface *xch,
+uint32_t domid,
+char *path);
+int xc_test_assign_dt_device(xc_interface *xch,
+ uint32_t domid,
+ char *path);
+int xc_deassign_dt_device(xc_interface *xch,
+  uint32_t domid,
+  char *path);
+
 int xc_domain_memory_mapping(xc_interface *xch,
  uint32_t domid,
  unsigned long first_gfn,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 8243b70..924a180 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1633,7 +1633,8 @@ int xc_assign_device(
 
 domctl.cmd = XEN_DOMCTL_assign_device;
 domctl.domain = domid;
-domctl.u.assign_device.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
+domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
 
 return do_domctl(xch, &domctl);
 }
@@ -1682,7 +1683,8 @@ int xc_test_assign_device(
 
 domctl.cmd = XEN_DOMCTL_test_assign_device;
 domctl.domain = domid;
-domctl.u.assign_device.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
+domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
 
 return do_domctl(xch, &domctl);
 }
@@ -1696,11 +1698,96 @@ int xc_deassign_device(
 
 domctl.cmd = XEN_DOMCTL_deassign_device;
 domctl.domain = domid;
-domctl.u.assign_device.machine_sbdf = machine_sbdf;
- 
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
+domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+
 return do_domctl(xch, &domctl);
 }
 
+int xc_assign_dt_device(
+xc_interface *xch,
+uint32_t domid,
+char *path)
+{
+int rc;
+size_t size = strlen(path);
+DECLARE_DOMCTL;
+DECLARE_HYPERCALL_BOUNCE(path, size, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( xc_hypercall_bounce_pre(xch, path) )
+return -1;
+
+domctl.cmd = XEN_DOMCTL_assign_device;
+domctl.domain = (domid_t)domid;
+
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
+domctl.u.assign_device.u.dt.size = size;
+set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
+
+rc = do_domctl(xch, &domctl);
+
+xc_hypercall_bounce_post(xch, path);
+
+return rc;
+}
+
+int xc_test_assign_dt_device(
+xc_interface *xch,
+uint32_t domid,
+char *path)
+{
+int rc;
+size_t size = strlen(path);
+DECLARE_DOMCTL;
+DECLARE_HYPERCALL_BOUNCE(path, size, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( xc_hypercall_bounce_pre(xch, path) )
+return -1;
+
+domctl.cmd = XEN_DOMCTL_test_assign_device;
+domctl.domain = (domid_t)domid;
+
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
+domctl.u.assign_device.u.dt.size = size;
+set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
+
+rc = do_domctl(xch, &domctl);
+
+xc_hypercall_bounce_post(xch, path);
+
+return rc;
+}
+
+int xc_deassign_dt_device(
+xc_interface *xch,
+uint32_t domid,
+char *path)
+{
+int rc;
+size_t size = strlen(path);
+DECLARE_DOMCTL;
+DECLARE_HYPERCALL_BOUNCE(path, size, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( xc_hypercall_bounce_pre(xch, path) )
+return -1;
+
+do

[Xen-devel] [PATCH v4 12/33] xen/arm: gic: Add sanity checks gic_route_irq_to_guest

2015-03-19 Thread Julien Grall
With the addition of interrupt assignment to guest, we need to make sure
the guest can't blow up the interrupt management in Xen.

Before associating the IRQ to a vIRQ we need to make sure:
- the vIRQ is not already associated to another IRQ
- the guest didn't enable the vIRQ
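
The two conditions above amount to a guard before the assignment. A
minimal standalone model is shown below; struct pending_virq and
attach_irq are hypothetical names, and the locking done by the real code
is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of the per-vIRQ state. */
struct pending_virq {
    const void *desc;       /* non-NULL once a physical IRQ is attached */
    bool guest_enabled;     /* the guest has already enabled the vIRQ */
};

/* Attach a physical IRQ descriptor to a vIRQ, or fail with -EBUSY. */
static int attach_irq(struct pending_virq *p, const void *desc)
{
    assert(p != NULL && desc != NULL);

    /* Refuse if another IRQ is attached or the guest enabled the vIRQ. */
    if ( p->desc != NULL || p->guest_enabled )
        return -EBUSY;

    p->desc = desc;
    return 0;
}
```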

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 

---
Changes in v4:
- Move functional change (GIC_IRQ_PRI -> priority) in a separate
patch.
- Split the 2 conditions of the ASSERT in 2 different ASSERTs
- Typos
- Add Stefano's ack

Changes in v3:
- Patch added
---
 xen/arch/arm/gic.c| 35 +++
 xen/arch/arm/irq.c| 12 ++--
 xen/include/asm-arm/gic.h |  7 +++
 3 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index fe8f69b..2709415 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -126,22 +126,41 @@ void gic_route_irq_to_xen(struct irq_desc *desc, const 
cpumask_t *cpu_mask,
 /* Program the GIC to route an interrupt to a guest
  *   - desc.lock must be held
  */
-void gic_route_irq_to_guest(struct domain *d, unsigned int virq,
-struct irq_desc *desc,
-const cpumask_t *cpu_mask, unsigned int priority)
+int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
+   struct irq_desc *desc, unsigned int priority)
 {
-struct pending_irq *p;
+unsigned long flags;
+/* Use vcpu0 to retrieve the pending_irq struct. Given that we only
+ * route SPIs to guests, it doesn't make any difference. */
+struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);
+struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq);
+struct pending_irq *p = irq_to_pending(v_target, virq);
+int res = -EBUSY;
+
 ASSERT(spin_is_locked(&desc->lock));
+/* Caller has already checked that the IRQ is an SPI */
+ASSERT(virq >= 32);
+ASSERT(virq < vgic_num_irqs(d));
+
+vgic_lock_rank(v_target, rank, flags);
+
+if ( p->desc ||
+ /* The VIRQ should not be already enabled by the guest */
+ test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+goto out;
 
 desc->handler = gic_hw_ops->gic_guest_irq_type;
 set_bit(_IRQ_GUEST, &desc->status);
 
-gic_set_irq_properties(desc, cpumask_of(smp_processor_id()), GIC_PRI_IRQ);
+gic_set_irq_properties(desc, cpumask_of(v_target->processor), GIC_PRI_IRQ);
 
-/* Use vcpu0 to retrieve the pending_irq struct. Given that we only
- * route SPIs to guests, it doesn't make any difference. */
-p = irq_to_pending(d->vcpu[0], virq);
 p->desc = desc;
+res = 0;
+
+out:
+vgic_unlock_rank(v_target, rank, flags);
+
+return res;
 }
 
 int gic_irq_xlate(const u32 *intspec, unsigned int intsize,
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 4c3e381..b2ddf6b 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -492,14 +492,22 @@ int route_irq_to_guest(struct domain *d, unsigned int 
virq,
 if ( retval )
 goto out;
 
-gic_route_irq_to_guest(d, virq, desc, cpumask_of(smp_processor_id()),
-   GIC_PRI_IRQ);
+retval = gic_route_irq_to_guest(d, virq, desc, GIC_PRI_IRQ);
+
 spin_unlock_irqrestore(&desc->lock, flags);
+
+if ( retval )
+{
+release_irq(desc->irq, info);
+goto free_info;
+}
+
 return 0;
 
 out:
 spin_unlock_irqrestore(&desc->lock, flags);
 xfree(action);
+free_info:
 xfree(info);
 
 return retval;
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index bb2a922..ef4bf9a 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -216,10 +216,9 @@ extern enum gic_version gic_hw_version(void);
 /* Program the GIC to route an interrupt */
 extern void gic_route_irq_to_xen(struct irq_desc *desc, const cpumask_t 
*cpu_mask,
  unsigned int priority);
-extern void gic_route_irq_to_guest(struct domain *, unsigned int virq,
-   struct irq_desc *desc,
-   const cpumask_t *cpu_mask,
-   unsigned int priority);
+extern int gic_route_irq_to_guest(struct domain *, unsigned int virq,
+  struct irq_desc *desc,
+  unsigned int priority);
 
 extern void gic_inject(void);
 extern void gic_clear_pending_irqs(struct vcpu *v);
-- 
2.1.4




[Xen-devel] [PATCH v4 25/33] xen/xsm: Add helpers to check permission for device tree passthrough

2015-03-19 Thread Julien Grall
This is a follow-up of commit 525ee49 "xsm: add device tree labeling
support" which adds support for device tree labelling in flask.

Those helpers will be used later when non-PCI passthrough (i.e. device
tree) is added.

Signed-off-by: Julien Grall 
Cc: Daniel De Graaf 

---
Changes in v4:
- Patch added
---
 xen/include/xsm/dummy.h | 23 +
 xen/include/xsm/xsm.h   | 27 +++
 xen/xsm/dummy.c |  6 
 xen/xsm/flask/avc.c |  3 ++
 xen/xsm/flask/hooks.c   | 69 -
 xen/xsm/flask/include/avc.h |  2 ++
 xen/xsm/flask/policy/access_vectors |  2 +-
 7 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index da414c7..8157252 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -350,6 +350,29 @@ static XSM_INLINE int xsm_deassign_device(XSM_DEFAULT_ARG 
struct domain *d, uint
 
 #endif /* HAS_PASSTHROUGH && HAS_PCI */
 
+#if defined(HAS_PASSTHROUGH) && defined(HAS_DEVICE_TREE)
+static XSM_INLINE int xsm_test_assign_dtdevice(XSM_DEFAULT_ARG const char 
*dtpath)
+{
+XSM_ASSERT_ACTION(XSM_HOOK);
+return xsm_default_action(action, current->domain, NULL);
+}
+
+static XSM_INLINE int xsm_assign_dtdevice(XSM_DEFAULT_ARG struct domain *d,
+  const char *dtpath)
+{
+XSM_ASSERT_ACTION(XSM_HOOK);
+return xsm_default_action(action, current->domain, d);
+}
+
+static XSM_INLINE int xsm_deassign_dtdevice(XSM_DEFAULT_ARG struct domain *d,
+const char *dtpath)
+{
+XSM_ASSERT_ACTION(XSM_HOOK);
+return xsm_default_action(action, current->domain, d);
+}
+
+#endif /* HAS_PASSTHROUGH && HAS_DEVICE_TREE */
+
 static XSM_INLINE int xsm_resource_plug_core(XSM_DEFAULT_VOID)
 {
 XSM_ASSERT_ACTION(XSM_HOOK);
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 99a59d0..a0eaaa1 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -121,6 +121,12 @@ struct xsm_operations {
 int (*deassign_device) (struct domain *d, uint32_t machine_bdf);
 #endif
 
+#if defined(HAS_PASSTHROUGH) && defined(HAS_DEVICE_TREE)
+int (*test_assign_dtdevice) (const char *dtpath);
+int (*assign_dtdevice) (struct domain *d, const char *dtpath);
+int (*deassign_dtdevice) (struct domain *d, const char *dtpath);
+#endif
+
 int (*resource_plug_core) (void);
 int (*resource_unplug_core) (void);
 int (*resource_plug_pci) (uint32_t machine_bdf);
@@ -473,6 +479,27 @@ static inline int xsm_deassign_device(xsm_default_t def, 
struct domain *d, uint3
 }
 #endif /* HAS_PASSTHROUGH && HAS_PCI) */
 
+#if defined(HAS_PASSTHROUGH) && defined(HAS_DEVICE_TREE)
+static inline int xsm_assign_dtdevice(xsm_default_t def, struct domain *d,
+  const char *dtpath)
+{
+return xsm_ops->assign_dtdevice(d, dtpath);
+}
+
+static inline int xsm_test_assign_dtdevice(xsm_default_t def,
+   const char *dtpath)
+{
+return xsm_ops->test_assign_dtdevice(dtpath);
+}
+
+static inline int xsm_deassign_dtdevice(xsm_default_t def, struct domain *d,
+const char *dtpath)
+{
+return xsm_ops->deassign_dtdevice(d, dtpath);
+}
+
+#endif /* HAS_PASSTHROUGH && HAS_DEVICE_TREE */
+
 static inline int xsm_resource_plug_pci (xsm_default_t def, uint32_t 
machine_bdf)
 {
 return xsm_ops->resource_plug_pci(machine_bdf);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index b69a019..cd88e76 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -96,6 +96,12 @@ void xsm_fixup_ops (struct xsm_operations *ops)
 set_to_dummy_if_null(ops, deassign_device);
 #endif
 
+#if defined(HAS_PASSTHROUGH) && defined(HAS_DEVICE_TREE)
+set_to_dummy_if_null(ops, test_assign_dtdevice);
+set_to_dummy_if_null(ops, assign_dtdevice);
+set_to_dummy_if_null(ops, deassign_dtdevice);
+#endif
+
 set_to_dummy_if_null(ops, resource_plug_core);
 set_to_dummy_if_null(ops, resource_unplug_core);
 set_to_dummy_if_null(ops, resource_plug_pci);
diff --git a/xen/xsm/flask/avc.c b/xen/xsm/flask/avc.c
index b1a4f8a..31bc702 100644
--- a/xen/xsm/flask/avc.c
+++ b/xen/xsm/flask/avc.c
@@ -600,6 +600,9 @@ void avc_audit(u32 ssid, u32 tsid, u16 tclass, u32 
requested,
 case AVC_AUDIT_DATA_MEMORY:
 avc_printk(&buf, "pte=%#lx mfn=%#lx ", a->memory.pte, a->memory.mfn);
 break;
+case AVC_AUDIT_DATA_DTDEV:
+avc_printk(&buf, "dtdevice=%s ", a->dtdev);
+break;
 }
 
 avc_dump_query(&buf, ssid, tsid, tclass);
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index e1cc16a..9652034 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -589,7 +589,12 @@ static int flask_domctl(struct domain *d, int cmd)
 case XEN_DOMCTL_shadow_op:
 case XEN_DOMCTL_ioport_permissi

[Xen-devel] [PATCH v4 17/33] xen/arm: vgic: Add spi_to_pending

2015-03-19 Thread Julien Grall
Introduce spi_to_pending in order to retrieve the pending_irq structure
for a specific SPI.

It's not possible to re-use irq_to_pending because it requires a VCPU
and some calls of the new function may happen during domain destruction,
after the VCPUs are freed.
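
The lookup is a plain array index once the 32 per-CPU interrupts are
skipped. The sketch below models it with simplified types; struct vgic
and spi_lookup are illustrative names, and the upper-bound assertion is
an addition for the example that is not in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_LOCAL_IRQS 32    /* SGIs and PPIs, IDs 0 to 31 */

struct pending_irq { unsigned int irq; };

/* Hypothetical, reduced view of the per-domain vGIC state. */
struct vgic {
    unsigned int nr_spis;
    struct pending_irq *pending_irqs;   /* index 0 is IRQ 32 */
};

/* Per-domain lookup: no VCPU is needed because SPIs are shared. */
static struct pending_irq *spi_lookup(struct vgic *vgic, unsigned int irq)
{
    assert(vgic != NULL);
    assert(irq >= NR_LOCAL_IRQS);
    assert(irq - NR_LOCAL_IRQS < vgic->nr_spis);

    return &vgic->pending_irqs[irq - NR_LOCAL_IRQS];
}
```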

Signed-off-by: Julien Grall 

---
Changes in v4:
- Patch added
---
 xen/arch/arm/vgic.c| 7 +++
 xen/include/asm-arm/vgic.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 74751e0..fc283ec 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -371,6 +371,13 @@ struct pending_irq *irq_to_pending(struct vcpu *v, 
unsigned int irq)
 return n;
 }
 
+struct pending_irq *spi_to_pending(struct domain *d, unsigned int irq)
+{
+ASSERT(irq >= NR_LOCAL_IRQS);
+
+return &d->arch.vgic.pending_irqs[irq - 32];
+}
+
 void vgic_clear_pending_irqs(struct vcpu *v)
 {
 struct pending_irq *p, *t;
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 647f2fe..8d22532 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -185,6 +185,7 @@ extern void vgic_vcpu_inject_irq(struct vcpu *v, unsigned 
int virq);
 extern void vgic_vcpu_inject_spi(struct domain *d, unsigned int virq);
 extern void vgic_clear_pending_irqs(struct vcpu *v);
 extern struct pending_irq *irq_to_pending(struct vcpu *v, unsigned int irq);
+extern struct pending_irq *spi_to_pending(struct domain *d, unsigned int irq);
 extern struct vgic_irq_rank *vgic_rank_offset(struct vcpu *v, int b, int n, 
int s);
 extern struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned int irq);
 extern int vgic_emulate(struct cpu_user_regs *regs, union hsr hsr);
-- 
2.1.4




[Xen-devel] [PATCH v4 06/33] xen/arm: Introduce xen,passthrough property

2015-03-19 Thread Julien Grall
When a device is marked for passthrough (via the new property
"xen,passthrough"), dom0 must not access the device (i.e. not
load a driver), but should be able to manage the MMIO/interrupt
of the passthrough device.

The latter part will allow the toolstack to map MMIO/IRQ when a device
is passed through to a guest.

The property "xen,passthrough" will be translated as 'status="disabled"'
in the device tree to avoid DOM0 using the device. We assume that DOM0 is
able to cope with this property (already the case for Linux, and
required by ePAPR).

Rework the function map_device (renamed into handle_device) to:

* For a given device node:
- Give permission to manage IRQ/MMIO for this device
- Retrieve the IRQ configuration (i.e. edge/level) from the device
tree
* When the device is not marked for guest passthrough:
- Assign the device to the guest if it's protected by an IOMMU
- Map the IRQs and MMIOs regions to the guest
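
The decision described in the list above can be summarised in a small
standalone model. The types and the function name are invented for the
sketch; the real handle_device works on struct dt_device_node and
performs the operations directly instead of returning a plan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced description of a device tree node. */
struct dt_node_info {
    bool for_passthrough;   /* "xen,passthrough" is set */
    bool behind_iommu;      /* device is protected by an IOMMU */
};

struct device_plan {
    bool grant_irq_mmio_permission;
    bool assign_device;
    bool map_irq_mmio;
};

static struct device_plan plan_device(const struct dt_node_info *node)
{
    /* Permission to manage IRQ/MMIO is always granted. */
    struct device_plan plan = { true, false, false };

    assert(node != NULL);

    if ( !node->for_passthrough )
    {
        plan.assign_device = node->behind_iommu;
        plan.map_irq_mmio = true;
    }

    return plan;
}
```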

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 

---
Changes in v4:
- Add Stefano's ack
- Rebase on the latest staging (no function changes)
- Rework the device passthrough documentation to be weaker
- Coding style and typos

Changes in v3:
- This patch was formerly "xen/arm: Follow-up to allow DOM0
manage IRQ and MMIO". It has been split in 2 parts [1].
- Update commit title and improve message
- Remove spurious change

[1] https://patches.linaro.org/34669/
---
 docs/misc/arm/device-tree/passthrough.txt |  9 +++
 xen/arch/arm/device.c |  2 +-
 xen/arch/arm/domain_build.c   | 97 ---
 xen/common/device_tree.c  |  6 ++
 xen/include/xen/device_tree.h | 11 
 5 files changed, 103 insertions(+), 22 deletions(-)
 create mode 100644 docs/misc/arm/device-tree/passthrough.txt

diff --git a/docs/misc/arm/device-tree/passthrough.txt 
b/docs/misc/arm/device-tree/passthrough.txt
new file mode 100644
index 000..6715646
--- /dev/null
+++ b/docs/misc/arm/device-tree/passthrough.txt
@@ -0,0 +1,9 @@
+Device passthrough
+===
+
+Any device with the property "xen,passthrough" set will not be exposed to
+DOM0 and therefore no driver will be loaded.
+
+It is highly recommended to set this property on devices which are passed
+through since many devices will not cope with being accessed by dom0 and
+then handed over to another domain.
diff --git a/xen/arch/arm/device.c b/xen/arch/arm/device.c
index 675784c..0b53f6a 100644
--- a/xen/arch/arm/device.c
+++ b/xen/arch/arm/device.c
@@ -30,7 +30,7 @@ int __init device_init(struct dt_device_node *dev, enum 
device_class class,
 
 ASSERT(dev != NULL);
 
-if ( !dt_device_is_available(dev) )
+if ( !dt_device_is_available(dev) || dt_device_for_passthrough(dev) )
 return  -ENODEV;
 
 for ( desc = _sdevice; desc != _edevice; desc++ )
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 31b32df..2eb31ad 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -402,7 +402,7 @@ static int write_properties(struct domain *d, struct 
kernel_info *kinfo,
 const struct dt_device_node *node)
 {
 const char *bootargs = NULL;
-const struct dt_property *prop;
+const struct dt_property *prop, *status = NULL;
 int res = 0;
 int had_dom0_bootargs = 0;
 
@@ -457,6 +457,17 @@ static int write_properties(struct domain *d, struct 
kernel_info *kinfo,
 }
 }
 
+/* Don't expose the property "xen,passthrough" to the guest */
+if ( dt_property_name_is_equal(prop, "xen,passthrough") )
+continue;
+
+/* Remember and skip the status property as Xen may modify it later */
+if ( dt_property_name_is_equal(prop, "status") )
+{
+status = prop;
+continue;
+}
+
 res = fdt_property(kinfo->fdt, prop->name, prop_data, prop_len);
 
 xfree(new_data);
@@ -465,6 +476,19 @@ static int write_properties(struct domain *d, struct 
kernel_info *kinfo,
 return res;
 }
 
+/*
+ * Override the property "status" to disable the device when it's
+ * marked for passthrough.
+ */
+if ( dt_device_for_passthrough(node) )
+res = fdt_property_string(kinfo->fdt, "status", "disabled");
+else if ( status )
+res = fdt_property(kinfo->fdt, "status", status->value,
+   status->length);
+
+if ( res )
+return res;
+
 if ( dt_node_path_is_equal(node, "/chosen") )
 {
 const struct bootmodule *mod = kinfo->initrd_bootmodule;
@@ -903,8 +927,15 @@ static int make_timer_node(const struct domain *d, void 
*fdt,
 return res;
 }
 
-/* Map the device in the domain */
-static int map_device(struct domain *d, struct dt_device_node *dev)
+/*
+ * For a given device node:
+ *  - Give permission to the gues

[Xen-devel] [PATCH v4 14/33] xen/arm: vgic: Correctly calculate GICD_TYPER.ITLinesNumber

2015-03-19 Thread Julien Grall
The maximum number of interrupt lines is 32(N + 1), where N is
GICD_TYPER.ITLinesNumber.

As the number of SPIs supported by the domain may not be a multiple of
32, we have to round up the number before using it.

At the same time remove the mask GICD_TYPE_LINES which is pointless.
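
A short worked example of the rounding, written as standalone C. The
helper names are made up for the illustration; DIV_ROUND_UP is redefined
locally with the usual meaning.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Value reported in ITLinesNumber for a domain with nr_spis SPIs. */
static unsigned int it_lines_number(unsigned int nr_spis)
{
    return DIV_ROUND_UP(nr_spis, 32);
}

/* Number of interrupt lines implied by ITLinesNumber == n. */
static unsigned int max_interrupt_lines(unsigned int n)
{
    return 32 * (n + 1);
}
```

With 33 SPIs the domain needs 32 + 33 = 65 interrupt IDs. Truncating
division gives N = 1, i.e. 64 lines, which is one short; rounding up
gives N = 2, i.e. 96 lines, which covers all of them.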

Signed-off-by: Julien Grall 

---
Changes in v4:
- Patch added
---
 xen/arch/arm/vgic-v2.c | 2 +-
 xen/arch/arm/vgic-v3.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 40619b2..b5a8f29 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -55,7 +55,7 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info)
 /* No secure world support for guests. */
 vgic_lock(v);
 *r = ( ((v->domain->max_vcpus - 1) << GICD_TYPE_CPUS_SHIFT) )
-|( ((v->domain->arch.vgic.nr_spis / 32)) & GICD_TYPE_LINES );
+| DIV_ROUND_UP(v->domain->arch.vgic.nr_spis, 32);
 vgic_unlock(v);
 return 1;
 case GICD_IIDR:
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index ec79c2a..96c1be8 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -700,7 +700,7 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info)
 if ( dabt.size != DABT_WORD ) goto bad_width;
 /* No secure world support for guests. */
 *r = ((ncpus - 1) << GICD_TYPE_CPUS_SHIFT |
-  ((v->domain->arch.vgic.nr_spis / 32) & GICD_TYPE_LINES));
+  DIV_ROUND_UP(v->domain->arch.vgic.nr_spis, 32));
 
 *r |= (irq_bits - 1) << GICD_TYPE_ID_BITS_SHIFT;
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 11/33] xen/arm: route_irq_to_guest: Check validity of the IRQ

2015-03-19 Thread Julien Grall
Currently Xen only supports routing SPIs to guests. Add a function,
is_assignable_irq, to check whether a given IRQ can be assigned to the guest.

Secondly, make sure the vIRQ is not greater than the number of IRQs
configured in the vGIC and that it is an SPI.

Thirdly, when the IRQ is already assigned to the domain, check that the user
is not asking to use a different vIRQ than the one already bound.

Finally, desc->arch.type, which contains the IRQ type (i.e. level/edge), must
be correctly configured beforehand. The misconfiguration can happen when:
- the device has been blacklisted for the current platform
- the IRQ has not been described in the device tree

Also, use XENLOG_G_ERR in the error messages within the function, as it will
later be called from a guest.
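
The order of the range checks described above can be sketched as follows.
This is a simplified standalone model, not the Xen implementation: struct
limits and check_route are hypothetical, and the two fields stand in for the
values returned by vgic_num_irqs(d) and gic_number_lines().

```c
#include <errno.h>
#include <stdbool.h>

/* SGIs and PPIs occupy IDs 0-31; SPIs start at 32. */
#define NR_LOCAL_IRQS 32u

/* Hypothetical stand-in for the per-domain and per-GIC limits. */
struct limits {
    unsigned int vgic_num_irqs; /* models vgic_num_irqs(d) */
    unsigned int gic_lines;     /* models gic_number_lines() */
};

/* Only physical SPIs handled by the GIC can be assigned. */
static bool is_assignable_irq(const struct limits *l, unsigned int irq)
{
    return irq >= NR_LOCAL_IRQS && irq < l->gic_lines;
}

/* Validate the (vIRQ, IRQ) pair in the same order as the patch. */
static int check_route(const struct limits *l, unsigned int virq,
                       unsigned int irq)
{
    if ( virq >= l->vgic_num_irqs )   /* vIRQ beyond the vGIC */
        return -EINVAL;
    if ( virq < NR_LOCAL_IRQS )       /* vIRQ is not an SPI */
        return -EINVAL;
    if ( !is_assignable_irq(l, irq) ) /* physical IRQ not routable */
        return -EINVAL;
    return 0;
}
```

The vIRQ is checked first so that a bad guest-visible number is rejected
before the physical IRQ descriptor is looked up.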

Signed-off-by: Julien Grall 

---
Changes in v4:
- Use NR_LOCAL_IRQS rather than 32
- Move the check to the IRQ and irq_to_desc after the vIRQ check
- Typos and rewording in the commit message and in the patch
- Use printk rather than dprintk.

Changes in v3:
- Fix typo in commit message and comment
- Add a check that the vIRQ is an SPI
- Check if the user is not asking for a different vIRQ when the
IRQ is already assigned to the guest

Changes in v2:
- Rename is_routable_irq into is_assignable_irq
- Check that the IRQ is not greater than the number of IRQs
handled by the GIC
- Move is_assignable_irq in irq.c rather than defining in the
header irq.h
- Retrieve the irq descriptor after checking the validity of the
IRQ
- vgic_num_irqs has been moved in a separate patch
- Fix the irq check against vgic_num_irqs
- Use virq instead of irq for vGIC sanity check
---
 xen/arch/arm/irq.c| 59 +++
 xen/include/asm-arm/irq.h |  2 ++
 2 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index beb746a..4c3e381 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -387,6 +387,16 @@ err:
 return rc;
 }
 
+bool_t is_assignable_irq(unsigned int irq)
+{
+/* For now, we can only route SPIs to the guest */
+return ((irq >= NR_LOCAL_IRQS) && (irq < gic_number_lines()));
+}
+
+/*
+ * Route an IRQ to a specific guest.
+ * For now only SPIs are assignable to the guest.
+ */
 int route_irq_to_guest(struct domain *d, unsigned int virq,
unsigned int irq, const char * devname)
 {
@@ -396,6 +406,28 @@ int route_irq_to_guest(struct domain *d, unsigned int virq,
 unsigned long flags;
 int retval = 0;
 
+if ( virq >= vgic_num_irqs(d) )
+{
+printk(XENLOG_G_ERR
+   "the vIRQ number %u is too high for domain %u (max = %u)\n",
+   virq, d->domain_id, vgic_num_irqs(d));
+return -EINVAL;
+}
+
+/* Only routing to virtual SPIs is supported */
+if ( virq < NR_LOCAL_IRQS )
+{
+printk(XENLOG_G_ERR "IRQ can only be routed to an SPI\n");
+return -EINVAL;
+}
+
+if ( !is_assignable_irq(irq) )
+{
+printk(XENLOG_G_ERR "the IRQ%u is not routable\n", irq);
+return -EINVAL;
+}
+desc = irq_to_desc(irq);
+
 action = xmalloc(struct irqaction);
 if ( !action )
 return -ENOMEM;
@@ -416,8 +448,18 @@ int route_irq_to_guest(struct domain *d, unsigned int virq,
 
 spin_lock_irqsave(&desc->lock, flags);
 
-/* If the IRQ is already used by someone
- *  - If it's the same domain -> Xen doesn't need to update the IRQ desc
+if ( desc->arch.type == DT_IRQ_TYPE_INVALID )
+{
+printk(XENLOG_G_ERR "IRQ %u has not been configured\n", irq);
+retval = -EIO;
+goto out;
+}
+
+/*
+ * If the IRQ is already used by someone
+ *  - If it's the same domain -> Xen doesn't need to update the IRQ desc.
+ *  For safety check if we are not trying to assign the IRQ to a
+ *  different vIRQ.
  *  - Otherwise -> For now, don't allow the IRQ to be shared between
  *  Xen and domains.
  */
@@ -426,13 +468,22 @@ int route_irq_to_guest(struct domain *d, unsigned int virq,
 struct domain *ad = irq_get_domain(desc);
 
 if ( test_bit(_IRQ_GUEST, &desc->status) && d == ad )
+{
+if ( irq_get_guest_info(desc)->virq != virq )
+{
+printk(XENLOG_G_ERR
+   "d%u: IRQ %u is already assigned to vIRQ %u\n",
+   d->domain_id, irq, irq_get_guest_info(desc)->virq);
+retval = -EBUSY;
+}
 goto out;
+}
 
 if ( test_bit(_IRQ_GUEST, &desc->status) )
-printk(XENLOG_ERR "ERROR: IRQ %u is already used by domain %u\n",
+printk(XENLOG_G_ERR "IRQ %u is already used by domain %u\n",
irq, ad->domain_id);
 else
-printk(XENLOG_ERR "ERROR:

[Xen-devel] [PATCH v4 24/33] xen/iommu: arm: Wire iommu DOMCTL for ARM

2015-03-19 Thread Julien Grall
Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 
Acked-by: Ian Campbell 

---
Changes in v4:
- Add Ian's ack

Changes in v3:
- Add Stefano's ack

Changes in v2:
- Don't move the call in common code.
---
 xen/arch/arm/domctl.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index f5d5a10..fab9ff7 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -97,7 +97,16 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
 return 0;
 }
 default:
-return subarch_do_domctl(domctl, d, u_domctl);
+{
+int rc;
+
+rc = subarch_do_domctl(domctl, d, u_domctl);
+
+if ( rc == -ENOSYS )
+rc = iommu_do_domctl(domctl, d, u_domctl);
+
+return rc;
+}
 }
 }
 
-- 
2.1.4




[Xen-devel] [PATCH v4 05/33] xen/arm: Map disabled device in DOM0

2015-03-19 Thread Julien Grall
The check to avoid mapping disabled devices in DOM0 was added in
anticipation of device passthrough. However, a brand new property will
be added later to mark devices which will be passed through.

Also, remove the memory type check, as those nodes are already skipped
earlier in the function via skip_matches.

Furthermore, some platforms (such as OMAP) may try to poke a device even
if its "status" property is set to "disabled".

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 
Acked-by: Ian Campbell 
Cc: Andrii Tseglytskyi 

---
Changes in v4:
- Typos in the commit message
- Add Ian and Stefano's ack

Changes in v3:
- Patch added
- "xen/arm: follow-up to allow DOM0 manage IRQ and MMIO" has
been split into 2 patches [1]
- Drop the check for memory type. Those nodes have been
blacklisted.

[1] https://patches.linaro.org/34669/
---
 xen/arch/arm/domain_build.c| 19 +++
 xen/arch/arm/platforms/omap5.c | 12 
 2 files changed, 3 insertions(+), 28 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index ab4ad65..31b32df 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1085,22 +1085,9 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
 return 0;
 }
 
-/*
- * Some device doesn't need to be mapped in Xen:
- *  - Memory: the guest will see a different view of memory. It will
- *  be allocated later.
- *  - Disabled device: Linux is able to cope with status="disabled"
- *  property. Therefore these device doesn't need to be mapped. This
- *  solution can be use later for pass through.
- */
-if ( !dt_device_type_is_equal(node, "memory") &&
- dt_device_is_available(node) )
-{
-res = map_device(d, node);
-
-if ( res )
-return res;
-}
+res = map_device(d, node);
+if ( res)
+return res;
 
 /*
  * The property "name" is used to have a different name on older FDT
diff --git a/xen/arch/arm/platforms/omap5.c b/xen/arch/arm/platforms/omap5.c
index 9d6e504..e7bf30d 100644
--- a/xen/arch/arm/platforms/omap5.c
+++ b/xen/arch/arm/platforms/omap5.c
@@ -155,17 +155,6 @@ static const char * const dra7_dt_compat[] __initconst =
 NULL
 };
 
-static const struct dt_device_match dra7_blacklist_dev[] __initconst =
-{
-/* OMAP Linux kernel handles devices with status "disabled" in a
- * weird manner - tries to reset them. While their memory ranges
- * are not mapped, this leads to data aborts, so skip these devices
- * from DT for dom0.
- */
-DT_MATCH_NOT_AVAILABLE(),
-{ /* sentinel */ },
-};
-
 PLATFORM_START(omap5, "TI OMAP5")
 .compatible = omap5_dt_compat,
 .init_time = omap5_init_time,
@@ -185,7 +174,6 @@ PLATFORM_START(dra7, "TI DRA7")
 
 .dom0_gnttab_start = 0x4b00,
 .dom0_gnttab_size = 0x2,
-.blacklist_dev = dra7_blacklist_dev,
 PLATFORM_END
 
 /*
-- 
2.1.4




[Xen-devel] [PATCH v4 18/33] xen/arm: Release IRQ routed to a domain when it's destroying

2015-03-19 Thread Julien Grall
Xen has to release the IRQs routed to a domain in order to reuse them later.
Currently only SPIs can be routed to the guest, so we only need to
browse the SPIs of a specific domain.

Furthermore, a guest can crash and leave the IRQ in an incorrect state
(i.e. it has not been EOIed). Xen will have to reset the IRQ in order to
be able to reuse it later.

Introduce 2 new functions to release an IRQ routed to a domain:
- release_guest_irq: upper level to retrieve the IRQ, call the GIC
code and release the action
- gic_remove_irq_from_guest: Check if we can remove the IRQ, and reset
it if necessary
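
The decision taken by the GIC-level function can be modelled as below. This
is a simplified sketch under stated assumptions, not the Xen code: struct
irq_state and try_remove are hypothetical, and the two booleans stand in for
the _IRQ_INPROGRESS and _IRQ_DISABLED status bits.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two status bits the removal path looks at. */
struct irq_state {
    bool inprogress; /* models _IRQ_INPROGRESS: guest has not EOIed yet */
    bool disabled;   /* models _IRQ_DISABLED */
};

/*
 * Returns 0 if the IRQ can be taken away from the guest, -EBUSY otherwise.
 * When the domain is dying the IRQ is reset unconditionally, since the
 * guest will never EOI it.
 */
static int try_remove(bool domain_dying, struct irq_state *s)
{
    if ( domain_dying )
    {
        s->disabled = true;    /* shut the IRQ down */
        s->inprogress = false; /* EOI on behalf of the guest */
        return 0;
    }

    /* A live guest must have disabled the IRQ and finished handling it. */
    if ( s->inprogress || !s->disabled )
        return -EBUSY;

    return 0;
}
```

For a dying domain the removal always succeeds; for a running domain it is
refused while the interrupt is inflight or still enabled.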

Signed-off-by: Julien Grall 

---
Changes in v4:
- Reorder the code flow
- Typos and coding style
- Use the new helper spi_to_pending

Changes in v3:
- Take the vgic rank lock to protect p->desc
- Correctly check if the IRQ is disabled
- Extend the check on the virq in release_guest_irq
- Use vgic_get_target_vcpu to get the target vCPU
- Remove spurious change

Changes in v2:
- Drop the desc->handler = &no_irq_type in release_irq as it's
buggy if the IRQ is routed to Xen
- Add release_guest_irq and gic_remove_guest_irq
---
 xen/arch/arm/gic.c| 45 +
 xen/arch/arm/irq.c| 46 ++
 xen/arch/arm/vgic.c   | 16 
 xen/include/asm-arm/gic.h |  4 
 xen/include/asm-arm/irq.h |  2 ++
 5 files changed, 113 insertions(+)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 5f34997..f023e4f 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -163,6 +163,51 @@ out:
 return res;
 }
 
+/* This function only works with SPIs for now */
+int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
+  struct irq_desc *desc)
+{
+struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);
+struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq);
+struct pending_irq *p = irq_to_pending(v_target, virq);
+unsigned long flags;
+
+ASSERT(spin_is_locked(&desc->lock));
+ASSERT(test_bit(_IRQ_GUEST, &desc->status));
+ASSERT(p->desc == desc);
+
+vgic_lock_rank(v_target, rank, flags);
+
+if ( d->is_dying )
+{
+desc->handler->shutdown(desc);
+
+/* EOI the IRQ if it has not been done by the guest */
+if ( test_bit(_IRQ_INPROGRESS, &desc->status) )
+gic_hw_ops->deactivate_irq(desc);
+clear_bit(_IRQ_INPROGRESS, &desc->status);
+}
+else
+{
+/*
+ * TODO: Handle eviction from LRs For now, deny
+ * remove if the IRQ is inflight or not disabled.
+ */
+if ( test_bit(_IRQ_INPROGRESS, &desc->status) ||
+ !test_bit(_IRQ_DISABLED, &desc->status) )
+return -EBUSY;
+}
+
+clear_bit(_IRQ_GUEST, &desc->status);
+desc->handler = &no_irq_type;
+
+p->desc = NULL;
+
+vgic_unlock_rank(v_target, rank, flags);
+
+return 0;
+}
+
 int gic_irq_xlate(const u32 *intspec, unsigned int intsize,
   unsigned int *out_hwirq,
   unsigned int *out_type)
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index b2ddf6b..376c9f2 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -513,6 +513,52 @@ free_info:
 return retval;
 }
 
+int release_guest_irq(struct domain *d, unsigned int virq)
+{
+struct irq_desc *desc;
+struct irq_guest *info;
+unsigned long flags;
+struct pending_irq *p;
+int ret;
+
+/* Only SPIs are supported */
+if ( virq < NR_LOCAL_IRQS || virq >= vgic_num_irqs(d) )
+return -EINVAL;
+
+p = spi_to_pending(d, virq);
+if ( !p->desc )
+return -EINVAL;
+
+desc = p->desc;
+
+spin_lock_irqsave(&desc->lock, flags);
+
+ret = -EINVAL;
+if ( !test_bit(_IRQ_GUEST, &desc->status) )
+goto unlock;
+
+info = irq_get_guest_info(desc);
+ret = -EINVAL;
+if ( d != info->d )
+goto unlock;
+
+ret = gic_remove_irq_from_guest(d, virq, desc);
+if ( ret )
+goto unlock;
+
+spin_unlock_irqrestore(&desc->lock, flags);
+
+release_irq(desc->irq, info);
+xfree(info);
+
+return 0;
+
+unlock:
+spin_unlock_irqrestore(&desc->lock, flags);
+
+return ret;
+}
+
 /*
  * pirq event channels. We don't use these on ARM, instead we use the
  * features of the GIC to inject virtualised normal interrupts.
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index fc283ec..93d0139 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -135,6 +135,22 @@ void register_vgic_ops(struct domain *d, const struct vgic_ops *ops)
 
 void domain_vgic_free(struct domain *d)
 {
+int i;
+int ret;
+
+for ( i = 0; i < (d->arch.vgic.nr_spis); i++ )
+{
+struct pending_irq *p = spi_to_pending(d, i + 32);
+
+if ( p->desc )
+{
+ret = releas

[Xen-devel] [PATCH v4 02/33] xen/dts: Allow only IRQ translation that are mapped to main GIC

2015-03-19 Thread Julien Grall
Xen is only able to handle one GIC controller. Some platforms may contain
other interrupt controllers.

Make sure to only translate IRQs that are mapped to the GIC handled by Xen.
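
The idea is a pointer comparison against the primary interrupt controller
before any translation takes place. A minimal standalone sketch follows; the
types controller and raw_irq and the function translate are hypothetical
simplifications, not the device tree API itself.

```c
#include <errno.h>

/* Hypothetical, simplified stand-ins for the device tree structures. */
struct controller {
    const char *name;
};

struct raw_irq {
    const struct controller *controller; /* interrupt parent of the device */
    unsigned int specifier;              /* simplified interrupt specifier */
};

/*
 * Translate only interrupts whose parent is the primary controller;
 * anything behind another interrupt controller is rejected.
 */
static int translate(const struct raw_irq *raw,
                     const struct controller *primary,
                     unsigned int *out_irq)
{
    if ( raw->controller != primary )
        return -EINVAL;

    *out_irq = raw->specifier;
    return 0;
}
```

An interrupt routed through a secondary controller is refused with -EINVAL
rather than being misinterpreted as a GIC interrupt.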

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 
Acked-by: Ian Campbell 

---

Changes in v4:
- Add Ian's and Stefano's ack

Changes in v3:
- Patch was previously sent in a separate series [1]
- Rework the comment in dt_irq_translate.

Changelog based on the separate series:

Changes in v3:
- Add an ASSERT to check that dt_interrupt_controller is not
NULL.

Changes in v2:
- Fix compilation...

[1] https://patches.linaro.org/33312/
---
 xen/common/device_tree.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index d1c716f..26fa298 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -1058,8 +1058,14 @@ int dt_irq_translate(const struct dt_raw_irq *raw,
  struct dt_irq *out_irq)
 {
 ASSERT(dt_irq_xlate != NULL);
+ASSERT(dt_interrupt_controller != NULL);
 
-/* TODO: Retrieve the right irq_xlate. This is only work for the gic */
+/*
+ * TODO: Retrieve the right irq_xlate. This only works for the primary
+ * interrupt controller.
+ */
+if ( raw->controller != dt_interrupt_controller )
+return -EINVAL;
 
 return dt_irq_xlate(raw->specifier, raw->size,
 &out_irq->irq, &out_irq->type);
-- 
2.1.4




[Xen-devel] [PATCH v4 13/33] xen/arm: gic_route_irq_to_guest: Honor the priority given in parameter

2015-03-19 Thread Julien Grall
The priority is already hardcoded in route_irq_to_guest and therefore
can't be controlled by the guest.

Signed-off-by: Julien Grall 

---
Changes in v4:
- Patch added
---
 xen/arch/arm/gic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 2709415..5f34997 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -152,7 +152,7 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
 desc->handler = gic_hw_ops->gic_guest_irq_type;
 set_bit(_IRQ_GUEST, &desc->status);
 
-gic_set_irq_properties(desc, cpumask_of(v_target->processor), GIC_PRI_IRQ);
+gic_set_irq_properties(desc, cpumask_of(v_target->processor), priority);
 
 p->desc = desc;
 res = 0;
-- 
2.1.4




[Xen-devel] [PATCH v4 15/33] xen/arm: gic: GICv2 & GICv3 only supports 1020 physical interrupts

2015-03-19 Thread Julien Grall
GICD_TYPER.ITLinesNumber can encode up to 1024 interrupts. However,
IRQs 1020-1023 are reserved for special purposes.

The result is used by the callers of gic_number_lines in order to check
the validity of an IRQ.
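
As a standalone illustration of the clamping, consider the sketch below. The
mask value matches the 5-bit ITLinesNumber field; the helper names hw_lines
and usable_lines are hypothetical and not part of the GIC drivers.

```c
#include <stdint.h>

/* ITLinesNumber occupies bits [4:0] of GICD_TYPER. */
#define GICD_TYPE_LINES 0x1fu

/* Number of lines the distributor reports: 32(N + 1). */
static unsigned int hw_lines(uint32_t typer)
{
    return 32 * ((typer & GICD_TYPE_LINES) + 1);
}

/* Number of lines usable as IRQs: IDs 1020-1023 are reserved. */
static unsigned int usable_lines(uint32_t typer)
{
    unsigned int nr_lines = hw_lines(typer);

    return nr_lines < 1020u ? nr_lines : 1020u;
}
```

Only the maximum field value (N = 31, i.e. 1024 lines) is affected by the
clamp; smaller configurations are returned unchanged.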

Signed-off-by: Julien Grall 
Cc: Frediano Ziglio 
Cc: Zoltan Kiss 

---
The GIC HIP04 driver would need a similar change if it has some IRQs
reserved below 512. Maintainers are CCed.

Changes in v4:
- This patch was formerly sent separately
https://patches.linaro.org/45373/
- s/(unsigned)1020/1020U/
---
 xen/arch/arm/gic-v2.c | 16 ++--
 xen/arch/arm/gic-v3.c | 16 ++--
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 3be4ad6..cfefb39 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -256,6 +256,7 @@ static void __init gicv2_dist_init(void)
 uint32_t type;
 uint32_t cpumask;
 uint32_t gic_cpus;
+unsigned int nr_lines;
 int i;
 
 cpumask = readl_gicd(GICD_ITARGETSR) & 0xff;
@@ -266,31 +267,34 @@ static void __init gicv2_dist_init(void)
 writel_gicd(0, GICD_CTLR);
 
 type = readl_gicd(GICD_TYPER);
-gicv2_info.nr_lines = 32 * ((type & GICD_TYPE_LINES) + 1);
+nr_lines = 32 * ((type & GICD_TYPE_LINES) + 1);
 gic_cpus = 1 + ((type & GICD_TYPE_CPUS) >> 5);
 printk("GICv2: %d lines, %d cpu%s%s (IID %8.8x).\n",
-   gicv2_info.nr_lines, gic_cpus, (gic_cpus == 1) ? "" : "s",
+   nr_lines, gic_cpus, (gic_cpus == 1) ? "" : "s",
(type & GICD_TYPE_SEC) ? ", secure" : "",
readl_gicd(GICD_IIDR));
 
 /* Default all global IRQs to level, active low */
-for ( i = 32; i < gicv2_info.nr_lines; i += 16 )
+for ( i = 32; i < nr_lines; i += 16 )
 writel_gicd(0x0, GICD_ICFGR + (i / 16) * 4);
 
 /* Route all global IRQs to this CPU */
-for ( i = 32; i < gicv2_info.nr_lines; i += 4 )
+for ( i = 32; i < nr_lines; i += 4 )
 writel_gicd(cpumask, GICD_ITARGETSR + (i / 4) * 4);
 
 /* Default priority for global interrupts */
-for ( i = 32; i < gicv2_info.nr_lines; i += 4 )
+for ( i = 32; i < nr_lines; i += 4 )
 writel_gicd(GIC_PRI_IRQ << 24 | GIC_PRI_IRQ << 16 |
 GIC_PRI_IRQ << 8 | GIC_PRI_IRQ,
 GICD_IPRIORITYR + (i / 4) * 4);
 
 /* Disable all global interrupts */
-for ( i = 32; i < gicv2_info.nr_lines; i += 32 )
+for ( i = 32; i < nr_lines; i += 32 )
 writel_gicd(~0x0, GICD_ICENABLER + (i / 32) * 4);
 
+/* Only 1020 interrupts are supported */
+gicv2_info.nr_lines = min(1020U, nr_lines);
+
 /* Turn on the distributor */
 writel_gicd(GICD_CTL_ENABLE, GICD_CTLR);
 }
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 48772f1..b0f498e 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -528,23 +528,24 @@ static void __init gicv3_dist_init(void)
 uint32_t type;
 uint32_t priority;
 uint64_t affinity;
+unsigned int nr_lines;
 int i;
 
 /* Disable the distributor */
 writel_relaxed(0, GICD + GICD_CTLR);
 
 type = readl_relaxed(GICD + GICD_TYPER);
-gicv3_info.nr_lines = 32 * ((type & GICD_TYPE_LINES) + 1);
+nr_lines = 32 * ((type & GICD_TYPE_LINES) + 1);
 
 printk("GICv3: %d lines, (IID %8.8x).\n",
-   gicv3_info.nr_lines, readl_relaxed(GICD + GICD_IIDR));
+   nr_lines, readl_relaxed(GICD + GICD_IIDR));
 
 /* Default all global IRQs to level, active low */
-for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 16 )
+for ( i = NR_GIC_LOCAL_IRQS; i < nr_lines; i += 16 )
 writel_relaxed(0, GICD + GICD_ICFGR + (i / 16) * 4);
 
 /* Default priority for global interrupts */
-for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 4 )
+for ( i = NR_GIC_LOCAL_IRQS; i < nr_lines; i += 4 )
 {
 priority = (GIC_PRI_IRQ << 24 | GIC_PRI_IRQ << 16 |
 GIC_PRI_IRQ << 8 | GIC_PRI_IRQ);
@@ -552,7 +553,7 @@ static void __init gicv3_dist_init(void)
 }
 
 /* Disable all global interrupts */
-for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i += 32 )
+for ( i = NR_GIC_LOCAL_IRQS; i < nr_lines; i += 32 )
 writel_relaxed(0x, GICD + GICD_ICENABLER + (i / 32) * 4);
 
 gicv3_dist_wait_for_rwp();
@@ -566,8 +567,11 @@ static void __init gicv3_dist_init(void)
 /* Make sure we don't broadcast the interrupt */
 affinity &= ~GICD_IROUTER_SPI_MODE_ANY;
 
-for ( i = NR_GIC_LOCAL_IRQS; i < gicv3_info.nr_lines; i++ )
+for ( i = NR_GIC_LOCAL_IRQS; i < nr_lines; i++ )
 writeq_relaxed(affinity, GICD + GICD_IROUTER + i * 8);
+
+/* Only 1020 interrupts are supported */
+gicv3_info.nr_lines = min(1020U, nr_lines);
 }
 
 static int gicv3_enable_redist(void)
-- 
2.1.4




[Xen-devel] [PATCH v4 03/33] xen/dts: Use unsigned int for MMIO and IRQ index

2015-03-19 Thread Julien Grall
There is no reason to use a signed integer for an index.

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 
Acked-by: Ian Campbell 

---
Changes in v4:
- Add Ian's ack

Changes in v3:
- Slightly update commit message to drop the reference to new
hypercalls.
- Add Stefano's ack

Changes in v2:
- Use unsigned int instead of a fancy one like unsigned or uint32_t
---
 xen/common/device_tree.c  | 11 ++-
 xen/include/xen/device_tree.h |  7 ---
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 26fa298..25880e8 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -496,7 +496,7 @@ static const struct dt_bus *dt_match_bus(const struct dt_device_node *np)
 }
 
 static const __be32 *dt_get_address(const struct dt_device_node *dev,
-int index, u64 *size,
+unsigned int index, u64 *size,
 unsigned int *flags)
 {
 const __be32 *prop;
@@ -683,7 +683,7 @@ bail:
 }
 
 /* dt_device_address - Translate device tree address and return it */
-int dt_device_get_address(const struct dt_device_node *dev, int index,
+int dt_device_get_address(const struct dt_device_node *dev, unsigned int index,
   u64 *addr, u64 *size)
 {
 const __be32 *addrp;
@@ -1006,7 +1006,8 @@ fail:
 return -EINVAL;
 }
 
-int dt_device_get_raw_irq(const struct dt_device_node *device, int index,
+int dt_device_get_raw_irq(const struct dt_device_node *device,
+  unsigned int index,
   struct dt_raw_irq *out_irq)
 {
 const struct dt_device_node *p;
@@ -1014,7 +1015,7 @@ int dt_device_get_raw_irq(const struct dt_device_node *device, int index,
 u32 intsize, intlen;
 int res = -EINVAL;
 
-dt_dprintk("dt_device_get_raw_irq: dev=%s, index=%d\n",
+dt_dprintk("dt_device_get_raw_irq: dev=%s, index=%u\n",
device->full_name, index);
 
 /* Get the interrupts property */
@@ -1071,7 +1072,7 @@ int dt_irq_translate(const struct dt_raw_irq *raw,
 &out_irq->irq, &out_irq->type);
 }
 
-int dt_device_get_irq(const struct dt_device_node *device, int index,
+int dt_device_get_irq(const struct dt_device_node *device, unsigned int index,
   struct dt_irq *out_irq)
 {
 struct dt_raw_irq raw;
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index c8a0375..6bbee6d 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -474,7 +474,7 @@ const struct dt_device_node *dt_get_parent(const struct dt_device_node *node);
  * This function resolves an address, walking the tree, for a give
  * device-tree node. It returns 0 on success.
  */
-int dt_device_get_address(const struct dt_device_node *dev, int index,
+int dt_device_get_address(const struct dt_device_node *dev, unsigned int index,
   u64 *addr, u64 *size);
 
 /**
@@ -504,7 +504,7 @@ unsigned int dt_number_of_address(const struct dt_device_node *device);
  * This function resolves an interrupt, walking the tree, for a given
  * device-tree node. It's the high level pendant to dt_device_get_raw_irq().
  */
-int dt_device_get_irq(const struct dt_device_node *device, int index,
+int dt_device_get_irq(const struct dt_device_node *device, unsigned int index,
   struct dt_irq *irq);
 
 /**
@@ -516,7 +516,8 @@ int dt_device_get_irq(const struct dt_device_node *device, int index,
  * This function resolves an interrupt for a device, no translation is
  * made. dt_irq_translate can be called after.
  */
-int dt_device_get_raw_irq(const struct dt_device_node *device, int index,
+int dt_device_get_raw_irq(const struct dt_device_node *device,
+  unsigned int index,
   struct dt_raw_irq *irq);
 
 /**
-- 
2.1.4




[Xen-devel] [PATCH v4 08/33] MAINTAINERS: move drivers/passthrough/device_tree.c in "DEVICE TREE"

2015-03-19 Thread Julien Grall
Suggested-by: Jan Beulich 
Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Keir Fraser 

---
Changes in v4:
- Patch added
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d88fca3..3558164 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -158,6 +158,7 @@ F:  xen/common/libfdt/
 F: xen/common/device_tree.c
 F: xen/include/xen/libfdt/
 F: xen/include/xen/device_tree.h
+F: xen/drivers/passthrough/device_tree.c
 
 EFI
 M: Jan Beulich 
@@ -214,6 +215,7 @@ F:  xen/drivers/passthrough/
 X: xen/drivers/passthrough/amd/
 X: xen/drivers/passthrough/arm/
 X: xen/drivers/passthrough/vtd/
+X: xen/drivers/passthrough/device_tree.c
 F: xen/include/xen/iommu.h
 
 KEXEC
-- 
2.1.4




[Xen-devel] [PATCH v4 10/33] xen/arm: Allow virq != irq

2015-03-19 Thread Julien Grall
Currently, Xen assumes that the virtual IRQ will always be the same
as the physical IRQ.

Modify route_irq_to_guest to take the virtual IRQ as a parameter, which
allows Xen to assign a different IRQ number. Also store the vIRQ in the desc
action to easily retrieve the IRQ target when we need to inject the
interrupt.

As DOM0 will get most of the devices, the vIRQ is equal to the IRQ in that case.

At the same time modify the behavior of irq_get_domain. The function now
requires that the irq_desc belongs to an IRQ assigned to a guest.
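
The effect of storing the vIRQ next to the domain can be sketched as below.
The structures are simplified, hypothetical stand-ins (a domain ID replaces
the struct domain pointer, and the guest info hangs directly off the
descriptor), so this only illustrates the lookup, not the actual layout.

```c
/* Simplified model of the per-IRQ guest information added by the patch. */
struct irq_guest {
    unsigned int domain_id; /* stand-in for the struct domain pointer */
    unsigned int virq;      /* IRQ number as seen by the guest */
};

/* Simplified descriptor of a physical IRQ routed to a guest. */
struct irq_desc {
    unsigned int irq;              /* physical IRQ number */
    const struct irq_guest *guest; /* stand-in for desc->action->dev_id */
};

/* On a physical interrupt, the vIRQ to inject comes from the guest info. */
static unsigned int virq_to_inject(const struct irq_desc *desc)
{
    return desc->guest->virq;
}
```

With this layout, physical IRQ 100 can be delivered to a guest as vIRQ 40,
while a 1:1 mapping (as used for DOM0) simply stores the same number twice.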

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 

---
Changes in v4:
- Add Stefano's ack
- Typos and rewording the commit message

Changes in v3:
- Spelling/grammar nits
- Fix compilation on ARM64. Forgot to update route_irq_to_guest
  call for xgene platform.
- Add a word about irq_get_domain behavior change
- More s/irq/virq/ because of the rebasing on the latest staging

Changes in v2:
- Patch added
---
 xen/arch/arm/domain_build.c  |  2 +-
 xen/arch/arm/gic.c   |  5 ++--
 xen/arch/arm/irq.c   | 47 ++--
 xen/arch/arm/platforms/xgene-storm.c |  2 +-
 xen/arch/arm/vgic.c  | 20 +++
 xen/include/asm-arm/gic.h|  3 ++-
 xen/include/asm-arm/irq.h|  4 +--
 xen/include/asm-arm/vgic.h   |  4 +--
 8 files changed, 55 insertions(+), 32 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 2eb31ad..24a0242 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1020,7 +1020,7 @@ static int handle_device(struct domain *d, struct dt_device_node *dev)
  * twice the IRQ. This can happen if the IRQ is shared
  */
 vgic_reserve_virq(d, irq);
-res = route_irq_to_guest(d, irq, dt_node_name(dev));
+res = route_irq_to_guest(d, irq, irq, dt_node_name(dev));
 if ( res )
 {
 printk(XENLOG_ERR "Unable to route IRQ %u to domain %u\n",
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index ba7950b..fe8f69b 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -126,7 +126,8 @@ void gic_route_irq_to_xen(struct irq_desc *desc, const cpumask_t *cpu_mask,
 /* Program the GIC to route an interrupt to a guest
  *   - desc.lock must be held
  */
-void gic_route_irq_to_guest(struct domain *d, struct irq_desc *desc,
+void gic_route_irq_to_guest(struct domain *d, unsigned int virq,
+struct irq_desc *desc,
 const cpumask_t *cpu_mask, unsigned int priority)
 {
 struct pending_irq *p;
@@ -139,7 +140,7 @@ void gic_route_irq_to_guest(struct domain *d, struct irq_desc *desc,
 
 /* Use vcpu0 to retrieve the pending_irq struct. Given that we only
  * route SPIs to guests, it doesn't make any difference. */
-p = irq_to_pending(d->vcpu[0], desc->irq);
+p = irq_to_pending(d->vcpu[0], virq);
 p->desc = desc;
 }
 
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index cb9c99b..beb746a 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -31,6 +31,13 @@
 static unsigned int local_irqs_type[NR_LOCAL_IRQS];
 static DEFINE_SPINLOCK(local_irqs_type_lock);
 
+/* Describe an IRQ assigned to a guest */
+struct irq_guest
+{
+struct domain *d;
+unsigned int virq;
+};
+
 static void ack_none(struct irq_desc *irq)
 {
 printk("unexpected IRQ trap at irq %02x\n", irq->irq);
@@ -122,18 +129,20 @@ void __cpuinit init_secondary_IRQ(void)
 BUG_ON(init_local_irq_data() < 0);
 }
 
-static inline struct domain *irq_get_domain(struct irq_desc *desc)
+static inline struct irq_guest *irq_get_guest_info(struct irq_desc *desc)
 {
 ASSERT(spin_is_locked(&desc->lock));
-
-if ( !test_bit(_IRQ_GUEST, &desc->status) )
-return dom_xen;
-
+ASSERT(test_bit(_IRQ_GUEST, &desc->status));
 ASSERT(desc->action != NULL);
 
 return desc->action->dev_id;
 }
 
+static inline struct domain *irq_get_domain(struct irq_desc *desc)
+{
+return irq_get_guest_info(desc)->d;
+}
+
 void irq_set_affinity(struct irq_desc *desc, const cpumask_t *cpu_mask)
 {
 if ( desc != NULL )
@@ -204,7 +213,7 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
 
 if ( test_bit(_IRQ_GUEST, &desc->status) )
 {
-struct domain *d = irq_get_domain(desc);
+struct irq_guest *info = irq_get_guest_info(desc);
 
 perfc_incr(guest_irqs);
 desc->handler->end(desc);
@@ -214,7 +223,7 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
 
 /* the irq cannot be a PPI, we only support delivery of SPIs to
  * guests */
-vgic_vcpu_inject_spi(d, irq);
+vgic_vcpu_inject_spi(info->d, info->virq);
 goto out_no_end;
 }
 
@@ -378,19 +387,30 @@ err:
 return rc;
 }
 
-int route_irq_to_g

[Xen-devel] [PATCH v4 00/33] xen/arm: Add support for non-PCI passthrough

2015-03-19 Thread Julien Grall
Hello all,

This is the fourth version of this patch series to add support for platform
device passthrough on ARM.

In order to pass through a non-PCI device, the user will have to:
- Manually map the MMIO regions and IRQs
- Describe the device using the new partial device tree support
- Specify the list of devices protected by an IOMMU to assign to the
guest.

While this solution is primitive, it allows us to support more complex
devices in Xen with a little additional work for the user. Attempting to
do it automatically is more difficult because we may not know the dependencies
between devices (for instance a network card and a PHY).

To avoid adding code in DOM0 to manage platform device deassignment, the
user has to add the property "xen,passthrough" to the device tree node
describing the device. This can be easily done via U-Boot. For instance,
to pass through the second network card of a Midway server to the guest,
the user will have to add the following line to the U-Boot script:

fdt set /soc/ethernet@fff51000 xen,passthrough

This series has been tested on Midway by assigning the secondary network card
to a guest (see the instructions below). However, it requires a separate patch,
as we decided not to support the Midway SMMU within the new drivers.

I plan to do further testing on other boards.

A working tree can be found here:
git://xenbits.xen.org/julieng/xen-unstable.git branch passthrough-v4

Major changes in v4:
- The partial device tree option can only be used with a trusted
device tree
- Add more documentation
- Use DOMCTL_bind_pt_irq rather than PHYSDEVOP_map_pirq
- Add XSM support for non-PCI passthrough

Major changes in v3:
- Rework the approach to pass through a device (xen,passthrough +
  partial device tree).
- Extend the existing hypercalls to assign/deassign devices rather than
adding new ones.
- Merge series [4] and [5] into this series.

Major changes in v2:
 - Drop patch #1 of the previous version
 - Virtual IRQs are no longer equal to the physical interrupts
 - Move the hypercall to get DT information from privcmd to domctl
 - Split the domain creation in two parts to allow per-guest
 VGIC configuration (such as the number of SPIs).
 - Bunch of typos, commit message improvements, function renaming.

For all changes, see each patch.

Ian: I think patches #1-#5 have been acked by all relevant people. Can you push
them to xen-unstable?

Sincerely yours,

[1] http://lists.xen.org/archives/html/xen-devel/2014-07/msg04090.html
[2] http://lists.xenproject.org/archives/html/xen-devel/2014-12/msg01386.html
[3] http://lists.xenproject.org/archives/html/xen-devel/2014-12/msg01612.html
[4] http://lists.xen.org/archives/html/xen-devel/2014-11/msg01672.html
[5] http://lists.xenproject.org/archives/html/xen-devel/2014-07/msg02098.html

=

Instructions to passthrough a non-PCI device

The example will use the secondary network card of the Midway server.

1) Mark the device to let Xen know that it will be used for passthrough.
This is done in the device tree node describing the device by adding the
property "xen,passthrough". The command to do it in U-Boot is:

fdt set /soc/ethernet@fff51000 xen,passthrough

2) Create the partial device tree describing the device. The IRQs are mapped
1:1 to the guest (i.e. VIRQ == IRQ). For MMIO, you will have to find a hole in
the guest memory layout (see xen/include/public/arch-arm.h; note that the layout
is not stable and can change between two releases of Xen).

/dts-v1/;

/ {
    #address-cells = <2>;
    #size-cells = <2>;

    aliases {
        net = &mac0;
    };

    passthrough {
        compatible = "simple-bus";
        ranges;
        #address-cells = <2>;
        #size-cells = <2>;
        mac0: ethernet@1000 {
            compatible = "calxeda,hb-xgmac";
            reg = <0 0x1000 0 0x1000>;
            interrupts = <0 80 4  0 81 4  0 82 4>;
            /* dma-coherent can't be set because it requires platform
             * specific code for highbank
             */
            /*  dma-coherent; */
        };

        foo {
            my = <&mac0>;
        };
    };
};

3) Compile the partial guest device tree with dtc (Device Tree Compiler).
For our purpose, the compiled file will be called guest-midway.dtb and
placed in /root in DOM0.

4) Add the following options in the guest configuration file:

device_tree = "/root/guest-midway.dtb"
dtdev = [ "/soc/ethernet@fff51000" ]
irqs = [ 112, 113, 114 ]
iomem = [ "0xfff51,1@0x1" ]
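The numbers in the options above can be derived from the device tree entries. A minimal sketch, assuming the usual ARM GIC convention that SPI n has interrupt ID n + 32 and assuming 4 KiB pages; the helper names are illustrative only and are not part of Xen or libxl:

```c
#include <stdint.h>

#define GIC_SPI_BASE  32u  /* SPIs start at interrupt ID 32 on the ARM GIC */
#define PAGE_SHIFT_4K 12u  /* assumption: 4 KiB pages */

/* Hypothetical helper: device tree SPI number -> IRQ number for "irqs". */
static unsigned int dt_spi_to_irq(unsigned int spi)
{
    return spi + GIC_SPI_BASE;
}

/* Hypothetical helper: byte address -> page frame number for "iomem". */
static uint64_t addr_to_pfn(uint64_t addr)
{
    return addr >> PAGE_SHIFT_4K;
}
```

With these helpers, the SPIs 80, 81 and 82 from the "interrupts" property give IRQs 112, 113 and 114, the host address 0xfff51000 gives frame 0xfff51, and the guest address 0x1000 from the "reg" property gives guest frame 0x1, which matches "0xfff51,1@0x1" (one page).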

Cc: manish.ja...@caviumnetworks.com
Cc: suravee.suthikulpa...@amd.com
Cc: andrii.tseglyts...@globallogic.com
Cc: robert.vanvos...@dornerworks.com
Cc: josh.whiteh...@dornerworks.com

Julien Grall (33):
  xen/arm: Divide GIC initialization in 2 parts
  xen/dts: Allow only IRQ translation that are mapped to main GIC
  xen/dts: Use unsigned int for MMIO and IRQ index
  xen/ar

[Xen-devel] [PATCH v4 01/33] xen/arm: Divide GIC initialization in 2 parts

2015-03-19 Thread Julien Grall
Currently the function to translate IRQs from the device tree is set
unconditionally, to be able to retrieve the serial/timer IRQs before the
GIC has been initialized.

It assumes that the xlate function will never change. We may also need to
have the primary interrupt controller very early.

Rework the GIC initialization in 2 parts:
- gic_preinit: Get the interrupt controller device tree node and set
up GIC and xlate callbacks
- gic_init: Initialize the interrupt controller and the boot CPU
interrupts.

The former function will be called just after the IRQ subsystem has been
initialized.
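The split described above can be sketched as follows. This is only an illustration of the two-phase pattern, not the code in this patch: the structure, the node path and every name ending in _sketch or starting with demo_ are assumptions.

```c
#include <stddef.h>

/* Illustrative stand-in for the driver callbacks registered in phase 1. */
struct gic_ops_sketch {
    const char *node_path;  /* stand-in for the device tree node */
    int (*init)(void);      /* hardware setup, deferred to phase 2 */
};

static const struct gic_ops_sketch *registered_ops;
static int hw_initialized;

/* Hypothetical driver hook that would program the hardware. */
static int demo_hw_init(void)
{
    hw_initialized = 1;
    return 0;
}

static const struct gic_ops_sketch demo_ops = {
    .node_path = "/interrupt-controller",
    .init      = demo_hw_init,
};

/* Phase 1: record the controller and its callbacks; no hardware access. */
static void gic_preinit_sketch(void)
{
    registered_ops = &demo_ops;
}

/* Phase 2: initialize the hardware through the registered callback. */
static int gic_init_sketch(void)
{
    if (registered_ops == NULL || registered_ops->init == NULL)
        return -1;
    return registered_ops->init();
}
```

The point of the pattern is that IRQ translation callbacks are available after phase 1, while the controller itself is only touched in phase 2.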

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 
Acked-by: Ian Campbell 
Cc: Frediano Ziglio 
Cc: Zoltan Kiss 

---
Note that the HIP04 GIC driver has not been modified because I don't
have a platform where I can test my changes. The code still builds,
though.

I will let the Hisilicon guys (Frediano and Zoltan) provide a suitable
patch for their platform.

Meanwhile, I think it can go upstream as it has been acked by both
Ian and Stefano.

Changes in v4:
- Rebase on the latest staging (no functional changes)
- Add Ian and Stefano's ack
- Typo in the commit message

Changes in v3:
- Patch was previously sent in a separate series [1]
- Reorder the function to avoid forward declaration
- Make gic-v3 driver compliant to the new interface
- Remove spurious field addition in gicv2 structure

Changelog based on the separate series:

Changes in v3:
- Patch added.

[1] https://patches.linaro.org/33313/
---
 xen/arch/arm/gic-v2.c | 70 ++-
 xen/arch/arm/gic-v3.c | 75 ---
 xen/arch/arm/gic.c| 16 --
 xen/arch/arm/setup.c  |  3 +-
 xen/include/asm-arm/gic.h |  8 +
 5 files changed, 100 insertions(+), 72 deletions(-)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 20cdbc9..3be4ad6 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -674,37 +674,10 @@ static hw_irq_controller gicv2_guest_irq_type = {
 .set_affinity = gicv2_irq_set_affinity,
 };
 
-const static struct gic_hw_operations gicv2_ops = {
-.info= &gicv2_info,
-.secondary_init  = gicv2_secondary_cpu_init,
-.save_state  = gicv2_save_state,
-.restore_state   = gicv2_restore_state,
-.dump_state  = gicv2_dump_state,
-.gicv_setup  = gicv2v_setup,
-.gic_host_irq_type   = &gicv2_host_irq_type,
-.gic_guest_irq_type  = &gicv2_guest_irq_type,
-.eoi_irq = gicv2_eoi_irq,
-.deactivate_irq  = gicv2_dir_irq,
-.read_irq= gicv2_read_irq,
-.set_irq_properties  = gicv2_set_irq_properties,
-.send_SGI= gicv2_send_SGI,
-.disable_interface   = gicv2_disable_interface,
-.update_lr   = gicv2_update_lr,
-.update_hcr_status   = gicv2_hcr_status,
-.clear_lr= gicv2_clear_lr,
-.read_lr = gicv2_read_lr,
-.write_lr= gicv2_write_lr,
-.read_vmcr_priority  = gicv2_read_vmcr_priority,
-.read_apr= gicv2_read_apr,
-.make_dt_node= gicv2_make_dt_node,
-};
-
-/* Set up the GIC */
-static int __init gicv2_init(struct dt_device_node *node, const void *data)
+static int __init gicv2_init(void)
 {
 int res;
-
-dt_device_set_used_by(node, DOMID_XEN);
+const struct dt_device_node *node = gicv2_info.node;
 
 res = dt_device_get_address(node, 0, &gicv2.dbase, NULL);
 if ( res || !gicv2.dbase || (gicv2.dbase & ~PAGE_MASK) )
@@ -727,9 +700,6 @@ static int __init gicv2_init(struct dt_device_node *node, const void *data)
 panic("GICv2: Cannot find the maintenance IRQ");
 gicv2_info.maintenance_irq = res;
 
-/* Set the GIC as the primary interrupt controller */
-dt_interrupt_controller = node;
-
 /* TODO: Add check on distributor, cpu size */
 
 printk("GICv2 initialization:\n"
@@ -774,8 +744,42 @@ static int __init gicv2_init(struct dt_device_node *node, const void *data)
 
 spin_unlock(&gicv2.lock);
 
+return 0;
+}
+
+const static struct gic_hw_operations gicv2_ops = {
+.info= &gicv2_info,
+.init= gicv2_init,
+.secondary_init  = gicv2_secondary_cpu_init,
+.save_state  = gicv2_save_state,
+.restore_state   = gicv2_restore_state,
+.dump_state  = gicv2_dump_state,
+.gicv_setup  = gicv2v_setup,
+.gic_host_irq_type   = &gicv2_host_irq_type,
+.gic_guest_irq_type  = &gicv2_guest_irq_type,
+.eoi_irq = gicv2_eoi_irq,
+.deactivate_irq  = gicv2_dir_irq,
+.read_irq= gicv2_read_irq,
+.set_irq_properties  = gicv2_set_irq_properties,
+.send_SGI= gicv2_send_SGI,
+.disable_interface   = gicv2_disable_i

Re: [Xen-devel] [PATCH v2] xentop: add support for qdisks

2015-03-19 Thread Charles Arnold
>>> On 3/19/2015 at 12:09 PM, Anthony PERARD  wrote: 
> On Wed, Mar 18, 2015 at 04:12:26PM +, Ian Campbell wrote:
>> My second concern here is with the use of /var/run/xen/qmp-libxl-%i from
>> outside of libxl. I can't remember if qemu is safe against multiple
>> users of the socket. ISTR asking Anthony this before, but I don't recall
>> the answer, sorry :-(
> 
> Last time I checked, only one client at a time can connect to the socket.
> If a second user wants to connect to the socket, it will be blocked until
> the first one disconnects.

This seems correct based on some of my testing.  In rare cases (perhaps not so
rare for VMs under heavy I/O load), reading the socket to get the stats times
out and so xentop will report '0' for the read/write stats until the next read
attempt that succeeds.

It looks as if we do need a second socket to qemu.  I will include that in the
next patch version.

- Charles



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.

2015-03-19 Thread Konrad Rzeszutek Wilk
On Thu, Mar 19, 2015 at 07:53:35AM +, Jan Beulich wrote:
> >>> On 17.03.15 at 16:38,  wrote:
> > There is a race when we clear the STATE_SCHED in the softirq
> > - which allows the 'raise_softirq_for' (on another CPU or
> > on the one running the softirq) to schedule the dpci.
> > 
> > Specifically this can happen when the other CPU receives
> > an interrupt, calls 'raise_softirq_for', and puts the dpci
> > on its per-cpu list (same dpci structure). Note that
> > this could also happen on the same physical CPU, however
> > for simplicity the explanation will assume two CPU actors.
> > 
> > There would be two 'dpci_softirq' running at the same time
> > (on different CPUs) where on one CPU it would be executing
> > hvm_dirq_assist (so had cleared STATE_SCHED and set STATE_RUN)
> > and on the other CPU it is trying to call:
> > 
> >if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> > BUG();
> > 
> > Since STATE_RUN is already set it would end badly.
> > 
> > The reason we can get hit with this is when an interrupt
> > affinity is set over multiple CPUs.
> > 
> > Potential solutions:
> > 
> > a) Instead of the BUG() we can put the dpci back on the per-cpu
> > list to deal with later (when the softirqs are activated again).
> > Putting the 'dpci' back on the per-cpu list amounts to a spin
> > until the bad condition clears.
> > 
> > b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for
> > to detect the 'STATE_RUN' bit being set and schedule the dpci.
> > The BUG() check in dpci_softirq would be replaced with a spin until
> > 'STATE_RUN' has been cleared. The dpci would still not
> > be scheduled when the STATE_SCHED bit was set.
> > 
> > c) Only schedule the dpci when the state is cleared
> > (no STATE_SCHED and no STATE_RUN).  It would spin if STATE_RUN is set
> > (as it is in progress and will finish). If the STATE_SCHED is set
> > (so hasn't run yet) we won't try to spin and just exit.
> > 
> > Down-sides of the solutions:
> > 
> > a). Live-lock of the CPU. We could be finishing a dpci, then adding
> > the dpci, exiting, and then processing the dpci once more. And so on.
> > We would eventually stop as the TIMER_SOFTIRQ would be set, which will
> > cause SCHEDULER_SOFTIRQ to be set as well and we would exit this loop.
> > 
> > Interestingly the old ('tasklet') code used this mechanism.
> > If the function assigned to the tasklet was running, the softirq
> > that ran said function (hvm_dirq_assist) would be responsible for
> > putting the tasklet back on the per-cpu list. This would allow
> > having a running tasklet and a 'to-be-scheduled' tasklet
> > at the same time.
> > 
> > b). is similar to a) - instead of re-entering the dpci_softirq
> > we are looping in the softirq waiting for the correct condition to
> > arrive. As it does not allow unwedging ourselves because the other
> > softirqs are not called - it is less preferable.
> > 
> > c) can cause a dead-lock if the interrupt comes in when we are
> > processing the dpci in the softirq - iff this happens on the same CPU.
> > We would be looping in raise_softirq waiting for STATE_RUN
> > to be cleared, while the softirq that was to clear it is preempted
> > by our interrupt handler.
> > 
> > As such, this patch - which implements a) is the best candidate
> > for this quagmire.
> > 
> > Reported-and-Tested-by: Sander Eikelenboom 
> > Reported-and-Tested-by: Malcolm Crossley 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> 
> So I now agree that in the state we're in this is the most reasonable
> fix. My reservations against the extra logic introduced earlier (and

Thank you.

Are you OK with me checking it in?
> being fixed here) stand though: From an abstract perspective the
> IRQ and softirq logic alone should be sufficient to deal with the
> needs we have. The complications really result from the desire to
> use a per-CPU list of hvm_pirq_dpci-s, which I still think a simpler
> alternative should be found for (after all real hardware doesn't use
> such lists).
> 
> A first thought would be to put them all on a per-domain list and
> have a cpumask tracking which CPUs they need servicing on. The
> downside of this would be (apart from again not being a proper
> equivalent of how actual hardware handles this) that - the softirq
> handler not having any other context - domains needing servicing
> would also need to be tracked in some form (in order to avoid
> having to iterate over all of them), and a per-CPU list would be
> undesirable for the exact same reasons. Yet a per-CPU
> domain-needs-service bitmap doesn't seem very attractive either,
> i.e. this would need further thought (also to make sure such an
> alternative model doesn't become even more involved than what
> we have now).

HA! (yes, I completely agree on - "complex" == "unpleasant")

Perhaps we can brainstorm some of this at XenHackathon in Shanghai?
> 
> Jan
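Option a) from the quoted commit message can be sketched with C11 atomics. This is a simplified, single-entry illustration and not the actual Xen code: the structure, the counters and the _sketch helpers are assumptions, and the real handler also manipulates the per-CPU list and re-raises the softirq.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { STATE_SCHED = 0, STATE_RUN = 1 };

/* Illustrative stand-in for one hvm_pirq_dpci entry. */
struct dpci_sketch {
    atomic_uint state;
    unsigned int requeued;  /* times the entry was put back on the list */
    unsigned int serviced;  /* times the handler body ran */
};

/* Atomically set a bit and report whether it was already set. */
static bool test_and_set_bit_sketch(unsigned int bit, atomic_uint *word)
{
    unsigned int mask = 1u << bit;
    return (atomic_fetch_or(word, mask) & mask) != 0;
}

static void clear_bit_sketch(unsigned int bit, atomic_uint *word)
{
    atomic_fetch_and(word, ~(1u << bit));
}

/* One pass of the softirq handler over one entry; true if it was serviced. */
static bool dpci_softirq_one(struct dpci_sketch *d)
{
    if (test_and_set_bit_sketch(STATE_RUN, &d->state)) {
        /* Another CPU is still servicing this entry: requeue, no BUG(). */
        d->requeued++;
        return false;
    }
    clear_bit_sketch(STATE_SCHED, &d->state);
    d->serviced++;          /* stand-in for hvm_dirq_assist() */
    clear_bit_sketch(STATE_RUN, &d->state);
    return true;
}
```

A requeued entry is simply retried on a later softirq pass, which is why this option behaves like a spin that still lets other softirqs run in between.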


Re: [Xen-devel] [PATCH 1/2] xen: prepare p2m list for memory hotplug

2015-03-19 Thread Paul Bolle
On Thu, 2015-03-19 at 15:31 +0100, Juergen Gross wrote:
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c

> +#ifdef CONFIG_X86_32
> +BUILD_BUG_ON_MSG(CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT > 64)
> +#endif

I assume BUILD_BUG_ON_MSG() aborts the build. 

> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig

> +config XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
> + int
> + default 512 if X86_64
> + default 4 if X86_32
> + depends on XEN_HAVE_PVMMU
> + depends on XEN_BALLOON_MEMORY_HOTPLUG
> + help
> +   Upper limit in GBs a pv domain can be expanded to using memory
> +   hotplug.
> +
> +   This value is used to allocate enough space in internal tables needed
> +   for physical memory administration.
> +

I think adding a
range 1 64 if X86_32

would allow dropping the BUILD_BUG_ON_MSG(). (I haven't tested this so
you're allowed to bark at me if this ends up wasting your time.)
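For readers unfamiliar with build-time checks, the effect of such a check can be illustrated in plain C11. This is only an analogy using the standard _Static_assert, not the kernel macro, and HOTPLUG_LIMIT_GB is a hypothetical stand-in for the Kconfig value:

```c
/* Hypothetical stand-in for CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT on 32-bit. */
#define HOTPLUG_LIMIT_GB 4

/* Compilation stops with this message if the value is out of range. */
_Static_assert(HOTPLUG_LIMIT_GB >= 1 && HOTPLUG_LIMIT_GB <= 64,
               "hotplug limit must be between 1 and 64 GB on 32-bit");

/* Accessor so the value can be inspected at run time. */
static int hotplug_limit_gb(void)
{
    return HOTPLUG_LIMIT_GB;
}
```

A Kconfig range rejects the bad value at configuration time instead, so the source would never be compiled with it.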


Paul Bolle




Re: [Xen-devel] (v2) VT-d Posted-interrupt (PI) design for XEN

2015-03-19 Thread Konrad Rzeszutek Wilk
On Thu, Mar 19, 2015 at 03:03:55AM +, Wu, Feng wrote:
> Thanks for the comments!
> 
> > -Original Message-
> > From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
> > Sent: Thursday, March 19, 2015 12:10 AM
> > To: Wu, Feng
> > Cc: xen-devel@lists.xen.org; Zhang, Yang Z; Tian, Kevin; Keir Fraser
> > (k...@xen.org); Jan Beulich (jbeul...@suse.com)
> > Subject: Re: [Xen-devel] (v2) VT-d Posted-interrupt (PI) design for XEN
> > 
> > On Wed, Mar 18, 2015 at 12:44:21PM +, Wu, Feng wrote:
> > > VT-d Posted-interrupt (PI) design for XEN
> > >
> > > Background
> > > ==
> > > With the development of virtualization, there are more and more device
> > > assignment requirements. However, today when a VM is running with
> > > assigned devices (such as a NIC), external interrupt handling for the
> > > assigned devices always needs VMM intervention.
> > >
> > > VT-d Posted-interrupt is a more enhanced method to handle interrupts
> > > in the virtualization environment. Interrupt posting is the process by
> > > which an interrupt request is recorded in a memory-resident
> > > posted-interrupt-descriptor structure by the root-complex, followed by
> > > an optional notification event issued to the CPU complex.
> > >
> > > With VT-d Posted-interrupt we can get the following advantages:
> > > - Direct delivery of external interrupts to running vCPUs without VMM
> > > intervention
> > 
> > 
> > I haven't dug deep into what Xen has currently - but I would assume that
> > this is exactly what we have now in Xen?
> 
> Here is what Xen currently does for external interrupts from assigned devices:
> 
> When a VM is running and an external interrupt from an assigned device
> occurs for it, a VM-Exit happens, then:
> 
> vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci() -->
> raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
> 
> softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
> 
> dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() -->
> vmsi_deliver() --> vmsi_inj_irq() --> vlapic_set_irq()

 This would be fantastic to put in the design document to help
people make sure that their expectations are in line.

> 
> vlapic_set_irq() does the following things:
> 1. If CPU-side posted-interrupt is supported (I think it has been supported
> since Xen 4.3 or Xen 4.4; sorry, I do not quite remember the exact version),
> call vmx_deliver_posted_intr() to deliver the virtual interrupt via the
> posted-interrupt infrastructure.

The benefit is that if an interrupt comes for VCPU0 instead of
VCPU1 we can inject the interrupt in the VCPU1 without having it
do a VMEXIT.

However if we pin the vCPUs, then CPU-side posted interrupts do not
help - we still have to process the interrupt in the Xen hypervisor.

> 2. Else, if CPU-side posted-interrupt is not supported, set the related vIRR
> in the vLAPIC page and call vcpu_kick() to kick the related vCPU. Before
> VM-Entry, vmx_intr_assist() will help to inject the interrupt to guests.
> 
> However, after VT-d PI is supported, when a guest is running in non-root mode
> and an external interrupt from an assigned device occurs for it, _no_ VM-Exit
> is needed; the guest can handle this totally in non-root mode, thus avoiding
> all of the above code flow.

 However it does require Linux PVHVM guests to not use the
vector callback mechanism - or rather - not use the event mechanism.

What you require for this to work on the Linux side is for the PCIe
device to use the 'baremetal' mechanism to set up MSIs (program the
IOAPIC, etc). It would be worth mentioning this in the document too.
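The two delivery paths of vlapic_set_irq() quoted earlier in this message can be sketched as a simple branch. This is an illustration only: the structure and counters are assumptions standing in for the real vCPU state, and the real function takes a vector and more context.

```c
#include <stdbool.h>

/* Illustrative stand-in for the per-vCPU state involved in delivery. */
struct vcpu_sketch {
    bool cpu_pi_supported;  /* CPU-side posted-interrupt available? */
    unsigned int posted;    /* deliveries via the posted-interrupt path */
    unsigned int virr_set;  /* vectors recorded in the vLAPIC page */
    unsigned int kicks;     /* vCPU kicks issued */
};

static void vlapic_set_irq_sketch(struct vcpu_sketch *v)
{
    if (v->cpu_pi_supported) {
        /* Path 1: stand-in for vmx_deliver_posted_intr(). */
        v->posted++;
    } else {
        /* Path 2: record the vector, then kick the vCPU so the
         * interrupt is injected before the next VM-Entry. */
        v->virr_set++;
        v->kicks++;         /* stand-in for vcpu_kick() */
    }
}
```

Both paths still start inside the hypervisor; VT-d PI, as described in the quoted text, is what removes the VM-Exit for a running guest.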

> 
> > 
> > Hm, actually we seem to be still invoking the hypervisor on the
> > interrupts  -except that if we need to dispatch it to another CPU
> > using an normal vector to do so - which would still cause the
> > hypervisor to be invoked? Or does it actually go straight in the
> > guest?
> > 
> 
> As I mentioned above, if the guest is running, we don't need to invoke the
> hypervisor.
> 
> > So what kind of support do we currently have in Xen from posted
> > interrupt? Could you add a bit about this in the background please?
> 
> Good suggestion.
> 
> Currently, Xen only supports the CPU-side posted-interrupt. As I mentioned
> above, vlapic_set_irq() can use this to deliver virtual interrupts.
> Basically there are several methods to deliver virtual interrupts to guests:
> - Event delivery before VM-Entry via __vmx_inject_exception(); this is the
> oldest way.
> - After APICv was enabled, we had hardware support for virtual interrupt
> delivery: virtual interrupts are stored in the virtual LAPIC page, and after
> VM-Entry, guests can evaluate these virtual interrupts and handle them in
> non-root mode.
> - As an enhancement to APICv, CPU-side posted-interrupt was introduced. As
> noted above, with this new feature we don't need to kick the vCPU and can
> deliver the virtual interrupts directly to it.
> 
> About AP

Re: [Xen-devel] [PATCH v2 07/13] libxc: Fix xc_tmem_control to return proper error.

2015-03-19 Thread Konrad Rzeszutek Wilk
On Thu, Mar 19, 2015 at 04:39:49PM +, Ian Campbell wrote:
> On Wed, 2015-03-18 at 20:24 -0400, Konrad Rzeszutek Wilk wrote:
> > The API now returns negative values on error and stashes
> > the error in errno. Fix the user of this API.
> > 
> > The 'xc_hypercall_bounce_pre' can fail - and if so it will
> > stash its errno values - no need to overwrite it.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> 
> Acked-by: Ian Campbell 
> 
> I'm still a little concerned about xenstat.c's handling of
> errno != -ENOSYS, but not enough to nack.

You mean not handling it :-)

Yeah, there are certainly some more tmem-related changes needed (another
wrapper function) so that it returns 0 when 'tmem' is not enabled
(and does not modify 'errno'). But not this week..
> 
> 



Re: [Xen-devel] [PATCH v2 08/13] libxc: Check xc_domain_maximum_gpfn for negative return values

2015-03-19 Thread Konrad Rzeszutek Wilk
On Thu, Mar 19, 2015 at 04:47:58PM +, Ian Campbell wrote:
> On Wed, 2015-03-18 at 20:24 -0400, Konrad Rzeszutek Wilk wrote:
> > Instead of assuming everything is always OK, we stash
> > the gpfns value as a parameter.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> > ---
> >  tools/libxc/xc_core_arm.c| 17 ++---
> >  tools/libxc/xc_core_x86.c| 24 
> >  tools/libxc/xc_domain_save.c |  8 +++-
> >  3 files changed, 41 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tools/libxc/xc_core_arm.c b/tools/libxc/xc_core_arm.c
> > index 16508e7..26cec04 100644
> > --- a/tools/libxc/xc_core_arm.c
> > +++ b/tools/libxc/xc_core_arm.c
> > @@ -31,9 +31,16 @@ xc_core_arch_gpfn_may_present(struct 
> > xc_core_arch_context *arch_ctxt,
> >  }
> >  
> > 
> > -static int nr_gpfns(xc_interface *xch, domid_t domid)
> > +static int nr_gpfns(xc_interface *xch, domid_t domid, unsigned long *gpfns)
> 
> You didn't fancy merging the two versions of this then ;-)

I was not sure where you would want to put them. xc_private looks
like the best place, but perhaps it should be in a new file?

> > diff --git a/tools/libxc/xc_core_x86.c b/tools/libxc/xc_core_x86.c
> > index d8846f1..02377e8 100644
> > --- a/tools/libxc/xc_core_x86.c
> > +++ b/tools/libxc/xc_core_x86.c
> 
> > @@ -88,7 +99,12 @@ xc_core_arch_map_p2m_rw(xc_interface *xch, struct 
> > domain_info_context *dinfo, xc
> >  int err;
> >  int i;
> >  
> > -dinfo->p2m_size = nr_gpfns(xch, info->domid);
> > +err = nr_gpfns(xch, info->domid, &dinfo->p2m_size);
> 
> Please could you avoid reusing err here, the reason is that its sole
> use now is to save errno over the cleanup path, whereas here it looks
> like it is going to be used for something but it isn't.
> 
>  if ( nr_gpfns(...)  < 0 )
> 
> is ok per the Xen coding style if you don't actually need the return
> code.
> 
> Or
> 
> ret = nr_gpfns()
> if ( ret < 0 )
> error, goto out
> 
> ret = -1;
> .. the rest
> 
> would be ok too I guess. (coding style here allows
> if ( (ret = nr_gpfns(...)) < 0 )
> too FWIW).
> 
> > +if ( err < 0 )
> > +{
> > +ERROR("nr_gpfns returns errno: %d.", errno);
> > +goto out;
> > +}
> >  if ( dinfo->p2m_size < info->nr_pages  )
> >  {
> >  ERROR("p2m_size < nr_pages -1 (%lx < %lx", dinfo->p2m_size, 
> > info->nr_pages - 1);
> > diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> > index 254fdb3..6346c12 100644
> > --- a/tools/libxc/xc_domain_save.c
> > +++ b/tools/libxc/xc_domain_save.c
> > @@ -939,7 +939,13 @@ int xc_domain_save(xc_interface *xch, int io_fd, 
> > uint32_t dom, uint32_t max_iter
> >  }
> >  
> >  /* Get the size of the P2M table */
> > -dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;
> > +rc = xc_domain_maximum_gpfn(xch, dom);
> > +if ( rc < 0 )
> > +{
> > +ERROR("Could not get maximum GPFN!");
> > +goto out;
> > +}
> > +dinfo->p2m_size = rc + 1;
> 
> Shame this can't use the same helper as the others.

But if we do stick that 'nr_gpfns' in xc_private.c it could!
> 
> Ian.
> 
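The convention discussed in this review (a negative return on failure, errno left as set by the callee, and the result passed back through a parameter) can be sketched as a shared helper. This is a hedged illustration: fake_maximum_gpfn() is a hypothetical stand-in for xc_domain_maximum_gpfn() and the xc_interface handle is omitted so the example is self-contained.

```c
#include <errno.h>

/* Hypothetical stand-in for xc_domain_maximum_gpfn(): returns the highest
 * gpfn, or -1 with errno set on failure. */
static long fake_maximum_gpfn(int domid)
{
    if (domid < 0) {
        errno = ESRCH;
        return -1;
    }
    return 0xfffffL;
}

/* Sketch of a common nr_gpfns helper: 0 on success, negative on error. */
static int nr_gpfns_sketch(int domid, unsigned long *gpfns)
{
    long rc = fake_maximum_gpfn(domid);

    if (rc < 0)
        return -1;          /* errno was set by the callee; leave it alone */

    *gpfns = (unsigned long)rc + 1;
    return 0;
}
```

A caller that only needs success or failure can write `if ( nr_gpfns_sketch(domid, &size) < 0 )` without a temporary, which is the style suggested in the quoted review.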



Re: [Xen-devel] [PATCH 2/2] xen: before ballooning hotplugged memory, set frames to invalid

2015-03-19 Thread Daniel Kiper
On Thu, Mar 19, 2015 at 06:22:06PM +0100, Juergen Gross wrote:
> On 03/19/2015 05:21 PM, Daniel Kiper wrote:
> >On Thu, Mar 19, 2015 at 03:31:02PM +0100, Juergen Gross wrote:
> >>Commit 25b884a83d487fd62c3de7ac1ab5549979188482 ("x86/xen: set
> >>regions above the end of RAM as 1:1") introduced a regression.
> >>
> >>To be able to add memory pages which were added via memory hotplug to
> >>a pv domain, the pages must be "invalid" instead of "identity" in the
> >>p2m list before they can be added.
> >>
> >>Suggested-by: David Vrabel 
> >>Signed-off-by: Juergen Gross 
> >>---
> >>  drivers/xen/balloon.c | 13 +++--
> >>  1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> >>index 0b52d92..52e331f 100644
> >>--- a/drivers/xen/balloon.c
> >>+++ b/drivers/xen/balloon.c
> >>@@ -221,15 +221,24 @@ static bool balloon_is_inflated(void)
> >>
> >>  static enum bp_state reserve_additional_memory(long credit)
> >>  {
> >>-   int nid, rc;
> >>+   int nid, rc = 0;
> >>u64 hotplug_start_paddr;
> >>unsigned long balloon_hotplug = credit;
> >>+   unsigned long pfn;
> >>
> >>hotplug_start_paddr = PFN_PHYS(SECTION_ALIGN_UP(max_pfn));
> >>balloon_hotplug = round_up(balloon_hotplug, PAGES_PER_SECTION);
> >>nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
> >>
> >>-   rc = add_memory(nid, hotplug_start_paddr, balloon_hotplug << 
> >>PAGE_SHIFT);
> >>+   for (pfn = PFN_DOWN(hotplug_start_paddr);
> >>+!rc && pfn < PFN_DOWN(hotplug_start_paddr) + balloon_hotplug;
> >>+pfn++)
> >>+   if (!set_phys_to_machine(pfn, INVALID_P2M_ENTRY))
> >
> >rc = set_phys_to_machine(pfn, INVALID_P2M_ENTRY)?
>
> Not really. set_phys_to_machine returns false on failure...
>
> >
> >>+   rc = 1;
> >
> >I do not think that this stuff is needed for HVM or PVH guests.
>
> True.
>
> >
> >>+   if (!rc)
> >>+   rc = add_memory(nid, hotplug_start_paddr,
> >>+   balloon_hotplug << PAGE_SHIFT);
> >>
> >>if (rc) {
> >>pr_warn("Cannot add additional memory (%i)\n", rc);
> >
> >It would be nice to know which part of the infrastructure failed.
> >Could you create a separate pr_warn() message for set_phys_to_machine()?
>
> Value 1 for rc is the indicator for that case.

Well... Personally I prefer explicit messages telling what happened to
something which requires digging through the code to understand a problem.
Additionally, I think that we should use negative numbers (as David
suggested) to signal an error. Most kernel code works that way.
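What is being asked for could look roughly like the sketch below. It is a userspace illustration, not the kernel patch: set_invalid_sketch() and add_memory_sketch() are hypothetical stand-ins for set_phys_to_machine() and add_memory(), fprintf() replaces pr_warn(), and the choice of -ENOMEM is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Test hook: the pfn for which the stand-in p2m update fails. */
static unsigned long fail_at_pfn = ~0UL;

/* Hypothetical stand-in for set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 * like the original, it returns false on failure. */
static bool set_invalid_sketch(unsigned long pfn)
{
    return pfn != fail_at_pfn;
}

/* Hypothetical stand-in for add_memory(); always succeeds here. */
static int add_memory_sketch(unsigned long start_pfn, unsigned long nr_pages)
{
    (void)start_pfn;
    (void)nr_pages;
    return 0;
}

static int reserve_sketch(unsigned long start_pfn, unsigned long nr_pages)
{
    unsigned long pfn;
    int rc;

    for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
        if (!set_invalid_sketch(pfn)) {
            /* Separate message for the p2m step, negative error code. */
            fprintf(stderr, "Cannot mark pfn %#lx invalid in the p2m\n", pfn);
            return -ENOMEM;
        }
    }

    rc = add_memory_sketch(start_pfn, nr_pages);
    if (rc)
        fprintf(stderr, "Cannot add additional memory (%d)\n", rc);
    return rc;
}
```

With distinct messages, the log alone tells which step failed, and the return value follows the usual negative errno convention.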

Daniel



[Xen-devel] [xen-unstable test] 36514: tolerable FAIL - PUSHED

2015-03-19 Thread xen . org
flight 36514 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36514/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail like 35957

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64 13 rumpuserxen-demo-xenstorels/xenstorels.repeat  fail  never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-check  fail  never pass
 test-armhf-armhf-xl 10 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 10 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt 10 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-midway 10 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt 10 migrate-support-check  fail  never pass
 test-amd64-amd64-xl-pvh-amd 9 guest-start  fail  never pass
 test-amd64-amd64-xl-pvh-intel 9 guest-start  fail  never pass
 test-amd64-amd64-libvirt 10 migrate-support-check  fail  never pass
 test-amd64-amd64-xl-pcipt-intel 9 guest-start  fail  never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stop  fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop  fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stop  fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail  never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop  fail  never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop  fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop  fail  never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop  fail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop  fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail  never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop  fail  never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop  fail  never pass
 test-amd64-i386-xl-winxpsp3 14 guest-stop  fail  never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop  fail  never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop  fail  never pass

version targeted for testing:
 xen  3a28f760508fb35c430edac17a9efde5aff6d1d5
baseline version:
 xen  f919dbc0583797d1c5c09da815518084ce77eb81


People who touched revisions under test:
  Andrew Cooper 
  Daniel De Graaf 
  Dario Faggioli 
  David Vrabel 
  Edgar E. Iglesias 
  Edgar E. Iglesias 
  George Dunlap 
  Ian Campbell 
  Ian Jackson 
  Jan Beulich 
  Julien Grall 
  Kevin Tian 
  Konrad Rzeszutek Wilk 
  Mike Latimer 
  Philipp Hahn 
  Roger Pau Monné 
  Ross Lagerwall 
  Samuel Thibault 
  Sander Eikelenboom 
  Stefano Stabellini 
  Tiejun Chen 
  Tim Deegan 
  Wei Liu 


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-oldkern  pass
 build-i386-oldkern   pass
 build-amd64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-amd64-xl-pvh-amd  fail
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64  pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64   

Re: [Xen-devel] [PATCH 0/9] qspinlock stuff -v15

2015-03-19 Thread Peter Zijlstra
On Thu, Mar 19, 2015 at 06:01:34PM +, David Vrabel wrote:
> This seems to work for me, but I've not got time to give it a more thorough
> testing.
> 
> You can fold this into your series.

Thanks!

> There doesn't seem to be a way to disable QUEUE_SPINLOCKS when supported by
> the arch, is this intentional?  If so, the existing ticketlock code could go.

Yeah, it's left as a rudiment such that if we find issues with the
qspinlock code we can 'revert' with a trivial patch. If no issues show
up we can rip out all the old code in a subsequent release.



[Xen-devel] [PATCH] xen/passthrough: Support a single iommu_domain(context bank) per xen domain per SMMU

2015-03-19 Thread Robbie VanVossen
If multiple devices are being passed through to the same domain and they
share a single SMMU, then they only require a single iommu_domain.

In arm_smmu_assign_dev, before a new iommu_domain is created, the
xen_domain->contexts is checked for any iommu_domains that are already
assigned to device that uses the same SMMU as the current device. If one
is found, attach the device to that iommu_domain. If a new one isn't
found, create a new iommu_domain just like before.

The arm_smmu_deassign_dev function assumes that there is a single
device per iommu_domain. This meant that when the first device was
deassigned, the iommu_domain was freed and when another device was
deassigned a crash occurred in Xen.

To fix this, a reference counter was added to the iommu_domain struct.
When an arm_smmu_xen_device references an iommu_domain, the
iommu_domain's ref is incremented. When that reference is removed, the
iommu_domain's ref is decremented. The iommu_domain will only be freed
when the ref is 0.
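
The counting scheme described above can be sketched in user-space C. This is
only an illustration, not the patch's code: struct shared_ctx, ctx_alloc(),
ctx_attach() and ctx_detach() are hypothetical names, and C11 atomics stand in
for Xen's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct iommu_domain; only the counter matters. */
struct shared_ctx {
    atomic_int ref;   /* number of devices currently attached */
};

/* Allocate a context with no devices attached; returns NULL on failure. */
static struct shared_ctx *ctx_alloc(void)
{
    struct shared_ctx *ctx = malloc(sizeof(*ctx));

    if (ctx)
        atomic_init(&ctx->ref, 0);
    return ctx;
}

/* Attach one device: take a reference on the shared context. */
static void ctx_attach(struct shared_ctx *ctx)
{
    atomic_fetch_add(&ctx->ref, 1);
}

/*
 * Detach one device: drop a reference and free the context only when the
 * last reference goes away. Returns 1 if the context was freed, 0 otherwise.
 */
static int ctx_detach(struct shared_ctx *ctx)
{
    int prev = atomic_fetch_sub(&ctx->ref, 1);   /* value before the decrement */

    assert(prev > 0);
    if (prev == 1) {
        free(ctx);
        return 1;
    }
    return 0;
}
```

With two devices attached, the first ctx_detach() leaves the context alive and
only the second one frees it, which is the behaviour the description asks for.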

Signed-off-by: Robbie VanVossen 
---
 xen/drivers/passthrough/arm/smmu.c |  113 
 1 file changed, 88 insertions(+), 25 deletions(-)

diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index a7a7da9..9b46054 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -223,6 +223,7 @@ struct iommu_domain
/* Runtime SMMU configuration for this iommu_domain */
struct arm_smmu_domain  *priv;
 
+   atomic_t ref;
/* Used to link iommu_domain contexts for a same domain.
 * There is at least one per-SMMU to used by the domain.
 * */
@@ -315,6 +316,26 @@ static struct iommu_group *iommu_group_get(struct device 
*dev)
 
 #define iommu_group_get_iommudata(group) (group)->cfg
 
+static int iommu_domain_add_device(struct iommu_domain *domain,
+ struct device *dev)
+{
+   dev_iommu_domain(dev) = domain;
+
+   atomic_inc(&domain->ref);
+
+   return 0;
+}
+
+static int iommu_domain_remove_device(struct iommu_domain *domain,
+ struct device *dev)
+{
+   dev_iommu_domain(dev) = NULL;
+
+   atomic_dec(&domain->ref);
+
+   return 0;
+}
+
 /* Start of Linux SMMU code */
 
 /* Maximum number of stream IDs assigned to a single device */
@@ -1583,7 +1604,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
ret = arm_smmu_domain_add_master(smmu_domain, cfg);
 
if (!ret)
-   dev_iommu_domain(dev) = domain;
+   ret = iommu_domain_add_device(domain, dev);
return ret;
 }
 
@@ -1596,7 +1617,7 @@ static void arm_smmu_detach_dev(struct iommu_domain 
*domain, struct device *dev)
if (!cfg)
return;
 
-   dev_iommu_domain(dev) = NULL;
+   iommu_domain_remove_device(domain, dev);
arm_smmu_domain_remove_master(smmu_domain, cfg);
 }
 
@@ -2569,7 +2590,9 @@ static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
 {
struct iommu_domain *domain;
struct arm_smmu_xen_domain *xen_domain;
+   struct arm_smmu_device *smmu;
int ret;
+   int existing_ctxt_fnd = 0;
 
xen_domain = domain_hvm_iommu(d)->arch.priv;
 
@@ -2585,29 +2608,65 @@ static int arm_smmu_assign_dev(struct domain *d, u8 
devfn,
return ret;
}
 
-   /*
-* TODO: Share the context bank (i.e iommu_domain) when the device is
-* under the same SMMU as another device assigned to this domain.
-* Would it useful for PCI
+   /* 
+* Check to see if a context bank (iommu_domain) already exists for 
this xen domain
+* under the same SMMU 
 */
-   domain = xzalloc(struct iommu_domain);
-   if (!domain)
-   return -ENOMEM;
+   if (!list_empty(&xen_domain->contexts)) {
+   smmu = find_smmu_for_device(dev);
+   if (!smmu) {
+   dev_err(dev, "cannot find SMMU\n");
+   return -ENXIO;
+   }
 
-   ret = arm_smmu_domain_init(domain);
-   if (ret)
-   goto err_dom_init;
+   /* Loop through the &xen_domain->contexts to locate a context 
assigned to this SMMU */
+   spin_lock(&xen_domain->lock);
+   list_for_each_entry(domain, &xen_domain->contexts, list) {
+   if(domain->priv->smmu == smmu)
+   {
+   /* We have found a context already associated 
with the same xen domain and SMMU */
+   ret = arm_smmu_attach_dev(domain, dev);
+   if (ret) {
+   /* 
+* TODO: If arm_smmu_attach_dev fails, 
should we perform arm_smmu_domain_destroy,
+* eventhough another 

Re: [Xen-devel] [PATCH v2] xentop: add support for qdisks

2015-03-19 Thread Anthony PERARD
On Wed, Mar 18, 2015 at 04:12:26PM +, Ian Campbell wrote:
> My second concern here is with the use of /var/run/xen/qmp-libxl-%i from
> outside of libxl. I can't remember if qemu is safe against multiple
> users of the socket. ISTR asking Anthony this before, but I don't recall
> the answer, sorry :-(

Last time I checked, only one client at a time can connect to the socket.
If a second user wants to connect to the socket, it will be blocked until
the first one disconnects.

-- 
Anthony PERARD



[Xen-devel] [libvirt test] 36520: tolerable all pass - PUSHED

2015-03-19 Thread xen . org
flight 36520 libvirt real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36520/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 10 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  10 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 10 migrate-support-check fail   never pass

version targeted for testing:
 libvirt  bd235cd873f406efac0d5b79c968caa384b8e438
baseline version:
 libvirt  b39b1397ea6e6155b5e363d456196504093edd07


People who touched revisions under test:
  Antoni Segura Puimedon 
  Antoni Segura Puimedon 
  Chen Fan 
  Dawid Zamirski 
  Dawid Zamirski 
  Deepak C Shetty 
  Deepak Shetty 
  Eric Blake 
  Erik Skultety 
  Gao Haifeng 
  Jim Fehlig 
  Jiri Denemark 
  John Ferlan 
  Ján Tomko 
  Laine Stump 
  Luyao Huang 
  Marek Marczykowski 
  Marek Marczykowski-Górecki 
  Martin Kletzander 
  Maxim Nestratov 
  Michael Chapman 
  Michal Privoznik 
  Mikhail Feoktistov 
  Nehal J Wani 
  Pavel Hrdina 
  Peter Krempa 
  Zhang Bo 
  Zhou Yimin 


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops pass
 build-armhf-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass



sg-report-flight on osstest.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=libvirt
+ revision=bd235cd873f406efac0d5b79c968caa384b8e438
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getconfig Repos
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
++ repos=/export/home/osstest/repos
++ repos_lock=/export/home/osstest/repos/lock
++ '[' x '!=' x/export/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/export/home/osstest/repos/lock
++ exec with-lock-ex -w /export/home/osstest/repos/lock ./ap-push libvirt 
bd235cd873f406efac0d5b79c968caa384b8e438
+ branch=libvirt
+ revision=bd235cd873f406efac0d5b79c968caa384b8e438
+ . cri-lock-repos
++ . cri-common
+++ . cri-getconfig
+++ umask 002
+++ getconfig Repos
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
++ repos=/export/home/osstest/repos
++ repos_lock=/export/home/osstest/repos/lock
++ '[' x/export/home/osstest/repos/lock '!=' x/export/home/osstest/repos/lock 
']'
+ . cri-common
++ . cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=libvirt
+ xenbranch=xen-unstable
+ '[' xlibvirt = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ : tested/2.6.39.x
+ . ap-common
++ : osst...@xenbits.xensource.com
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xensource.com:/home/xen/git/xen.git
++ : git://xenbits.xen.org/staging/qemu-xen-unstable.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://libvirt.org/libvirt.git
++ : osst...@xenbits.xensource.com:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/rumpuser-xen.git
++ : git
++ : git://xenbits.xen.org/rumpuser-xen.git
++ : osst...@xenbits.xensource.com:/home/xen/git/rumpuser-xen.git
+++ besteffort_repo https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ cached_repo https://github.com/rumpkernel/rumpkernel-netbsd-src 
'[fetch=try]'
+++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src
+++ local 'options=[fetch=try]'
 getconfig GitCacheProxy
 perl -e '
use Osstest;
readglobalconfig();
print $c{"GitCacheProxy"} or die $!;
'
+++ local cache=git://drall.uk.xensource.com:9419/
+++ '[' xgit://drall.uk.xensource.com:9419/ '!=' x ']'
+++ echo 
'git://drall.uk.xensource.com:9419/https://github.com/rum

[Xen-devel] Outreachy / OPW application deadline on March 24th

2015-03-19 Thread Lars Kurth
Hi all,
I have not seen anyone applying for Outreachy or working on any small projects 
for Xen Project. But then I spot checked a number of mailing lists for other 
orgs participating and it has not been that different for the other projects I 
checked. It is possible that the name change is having a negative impact. In 
any case, if you do know of applicants, do point them to your list as well as
* http://wiki.xenproject.org/wiki/Outreachy/Round10
* https://blog.xenproject.org/2015/03/18/xen-project-participates-in-outreachy-formerly-opw/

Best Regards
Lars


Re: [Xen-devel] [PATCH 0/9] qspinlock stuff -v15

2015-03-19 Thread David Vrabel
On 16/03/15 13:16, Peter Zijlstra wrote:
> 
> I feel that if someone were to do a Xen patch we can go ahead and merge this
> stuff (finally!).

This seems to work for me, but I've not got time to give it a more thorough
testing.

You can fold this into your series.

There doesn't seem to be a way to disable QUEUE_SPINLOCKS when supported by
the arch, is this intentional?  If so, the existing ticketlock code could go.

David

8<--
x86/xen: paravirt support for qspinlocks

Provide the wait and kick ops necessary for paravirt-aware queue
spinlocks.

Signed-off-by: David Vrabel 
---
 arch/x86/xen/spinlock.c |   40 +---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 956374c..b019b2a 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -95,17 +95,43 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #endif  /* CONFIG_XEN_DEBUG_FS */
 
+static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(char *, irq_name);
+static bool xen_pvspin = true;
+
+#ifdef CONFIG_QUEUE_SPINLOCK
+
+#include 
+
+PV_CALLEE_SAVE_REGS_THUNK(__pv_queue_spin_unlock);
+
+static void xen_qlock_wait(u8 *ptr, u8 val)
+{
+   int irq = __this_cpu_read(lock_kicker_irq);
+
+   xen_clear_irq_pending(irq);
+
+   barrier();
+
+   if (READ_ONCE(*ptr) == val)
+   xen_poll_irq(irq);
+}
+
+static void xen_qlock_kick(int cpu)
+{
+   xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
+}
+
+#else
+
 struct xen_lock_waiting {
struct arch_spinlock *lock;
__ticket_t want;
 };
 
-static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
-static DEFINE_PER_CPU(char *, irq_name);
 static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
 static cpumask_t waiting_cpus;
 
-static bool xen_pvspin = true;
 __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 {
int irq = __this_cpu_read(lock_kicker_irq);
@@ -217,6 +243,7 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
}
}
 }
+#endif /* !QUEUE_SPINLOCK */
 
 static irqreturn_t dummy_handler(int irq, void *dev_id)
 {
@@ -280,8 +307,15 @@ void __init xen_init_spinlocks(void)
return;
}
printk(KERN_DEBUG "xen: PV spinlocks enabled\n");
+#ifdef CONFIG_QUEUE_SPINLOCK
+   pv_lock_ops.queue_spin_lock_slowpath = __pv_queue_spin_lock_slowpath;
+   pv_lock_ops.queue_spin_unlock = PV_CALLEE_SAVE(__pv_queue_spin_unlock);
+   pv_lock_ops.wait = xen_qlock_wait;
+   pv_lock_ops.kick = xen_qlock_kick;
+#else
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
+#endif
 }
 
 /*
-- 
1.7.10.4




[Xen-devel] [xen-4.2-testing test] 36512: tolerable FAIL - PUSHED

2015-03-19 Thread xen . org
flight 36512 xen-4.2-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36512/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-i386-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 debian-hvm-install  fail never pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 debian-hvm-install fail never pass
 test-amd64-amd64-libvirt 10 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-i386-i386-libvirt   10 migrate-support-check fail   never pass
 build-i386-rumpuserxen 5 rumpuserxen-build fail   never pass
 build-amd64-rumpuserxen   5 rumpuserxen-build fail   never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/check fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-i386-i386-xl-winxpsp3   14 guest-stop   fail   never pass
 test-i386-i386-xl-qemuu-winxpsp3 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xend-winxpsp3 17 leak-check/check fail  never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-i386-i386-xl-qemut-winxpsp3 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass

version targeted for testing:
 xen  5bec01c19839e150e489dd04376c65f961830c86
baseline version:
 xen  bbf1b2bde00075648c96065ba0dc390150c4808f


People who touched revisions under test:
  Ian Campbell 
  Ian Jackson 
  Jan Beulich 


jobs:
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops pass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  pass
 test-amd64-i386-xl   pass
 test-i386-i386-xl pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64 pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-qemuu-freebsd10-amd64 pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
 test-amd64-i386-xl-qemuu-ovmf-amd64  fail
 test-amd64-amd64-rumpuserxen-amd64   blocked 
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64 fail
 test-amd64-amd64-xl-credit2  pass
 test-i386-i386-xl-credit2pass
 test-amd64-i386-qemuu-freebsd10-i386

Re: [Xen-devel] [PATCH v2] xentop: add support for qdisks

2015-03-19 Thread Charles Arnold
>>> On 3/18/2015 at 10:12 AM, Ian Campbell  wrote: 
> On Wed, 2015-03-11 at 11:51 -0600, Charles Arnold wrote:
>> Now that Xen uses qdisks by default and qemu does not write out
>> statistics to sysfs this patch queries the QMP for disk statistics.
>> 
>> This patch depends on libyajl for parsing statistics returned from
>> QMP. The runtime requires libyajl 2.0.3 or newer for required bug
>> fixes in yajl_tree_parse().
> 
> Elsewhere we currently support libyajl1 even, which means that this is
> all configure tests for.
> 
> You say bug fixes here, but the code comment says:
>  /* Use libyajl version 2.1.x or newer for the tree parser feature with bug 
> fixes */
> 
> which suggests it perhaps didn't even exist in earlier versions. Also I
> note the quoted versions differ, FWIW.

I looked it up in libyajl's ChangeLog and noticed it was fixed specifically
in 2.0.3 and noted that in the patch header but failed to go back and fix
the code comment. I'll fix the code comment.

> 
> Whether the interface exists (even in buggy form) or not in older
> versions is important because if it doesn't exist then that would be a
> build failure, which we would want to avoid.

Right. The tree feature was added in version 2.0.0 (again according
to the ChangeLog file).  I guess you would prefer not making this a
requirement in tools/configure given the statement below.

> 
> Whereas a functional failure would perhaps be tolerable. However, given
> the existing HAVE_YAJL_YAJL_VERSION_H define I think the code could
> easily check if the YAJL library is good enough at compile time and stub
> itself out -- i.e. not report qdisk stats if the yajl doesn't do the
> job.

Ok, I'll do it this way.

> 
> My second concern here is with the use of /var/run/xen/qmp-libxl-%i from
> outside of libxl. I can't remember if qemu is safe against multiple
> users of the socket. ISTR asking Anthony this before, but I don't recall
> the answer, sorry :-(
> 
> Even if it is strictly speaking ok it seems a bit warty to do it, but
> perhaps for an in-tree user like libxenstat it is tolerable.
> Alternatively we could (relatively) easily arrange for their to be a
> second qemp-libxenstat-%i socket, assuming the qemu overhead of a second
> one is sane.

As a test I modified libxl to create a qmp-libxenstat-%i socket and updated
libxenstat to use it instead of qmp-libxl-%i.  It works fine although I don't
know if there is any performance penalty for having a second socket. I am
ok with going with this solution if this is preferred.

> 
> Would it be possible to include somewhere, either in a code comment or
> in the changelog, an example of the JSON response to the QMP commands.

No problem.

> 
> (I'm also consistently surprised by the lack of a qmp client library,
> but that's not your fault!)
> 
>> diff --git a/tools/xenstat/libxenstat/src/xenstat.c 
> b/tools/xenstat/libxenstat/src/xenstat.c
>> index 8072a90..f3847be 100644
>> --- a/tools/xenstat/libxenstat/src/xenstat.c
>> +++ b/tools/xenstat/libxenstat/src/xenstat.c
>> @@ -657,6 +657,24 @@ static void xenstat_uninit_xen_version(xenstat_handle * 
> handle)
>>   * VBD functions
>>   */
>>  
>> +/* Save VBD information */
>> +xenstat_vbd *xenstat_save_vbd(xenstat_domain *domain, xenstat_vbd *vbd)
>> +{
>> +if (domain->vbds == NULL) {
>> +domain->num_vbds = 1;
>> +domain->vbds = malloc(sizeof(xenstat_vbd));
>> +} else {
>> +domain->num_vbds++;
>> +domain->vbds = realloc(domain->vbds,
>> +   domain->num_vbds *
>> +   sizeof(xenstat_vbd));
>> +}
> 
> FYI realloc handles the old pointer being NULL just fine, so you don't
> need to special case that so long as num_vbds starts out initialised to
> 0.
> 
> Also, if realloc returns NULL then you need to have remembered the old
> value to free it, else it gets leaked.
> 
>> @@ -477,18 +480,10 @@ int xenstat_collect_vbds(xenstat_node * node)
>>  continue;
>>  }
>>  
>> -if (domain->vbds == NULL) {
>> -domain->num_vbds = 1;
>> -domain->vbds = malloc(sizeof(xenstat_vbd));
>> -} else {
>> -domain->num_vbds++;
>> -domain->vbds = realloc(domain->vbds,
>> -   domain->num_vbds *
>> -   sizeof(xenstat_vbd));
>> -}
> 
> Oh, I see my comments above were actually on the old code you were
> moving.

I'll look at fixing this up based on your realloc comments above.
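
The two realloc points raised above can be sketched as follows. struct vbd,
struct vbd_list and vbd_list_append() are hypothetical stand-ins for the
xenstat types, not the code that will go into the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for xenstat_vbd and the per-domain VBD array. */
struct vbd {
    int dev;
};

struct vbd_list {
    struct vbd *items;     /* NULL while the list is empty */
    unsigned int count;    /* must start out as 0 */
};

/*
 * Append a copy of *v. Returns a pointer to the stored element, or NULL on
 * allocation failure, in which case the list is left unchanged.
 */
static struct vbd *vbd_list_append(struct vbd_list *l, const struct vbd *v)
{
    struct vbd *tmp;

    assert(l != NULL && v != NULL);

    /* realloc(NULL, n) behaves like malloc(n): no special case needed. */
    tmp = realloc(l->items, (l->count + 1) * sizeof(*tmp));
    if (tmp == NULL)
        return NULL;       /* l->items is still valid, nothing is leaked */

    l->items = tmp;
    l->items[l->count] = *v;
    return &l->items[l->count++];
}
```

Assigning the result to a temporary first is what keeps the old array
reachable when realloc fails; the caller can then free it or carry on with
the entries collected so far.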

> 
> 
>> +/* Use libyajl version 2.1.x or newer for the tree parser feature with 
>> bug 
> fixes */
>> +if ((info = yajl_tree_parse((char *)qmp_stats, errbuf, sizeof(errbuf))) 
>> == 
> NULL) {
> 
> You don't want to log something using errbuf? If not then it may as well
> be as small as possible.

Ok.

> 
>> +/* Use

[Xen-devel] [qemu-upstream-4.4-testing test] 36519: regressions - FAIL

2015-03-19 Thread xen . org
flight 36519 qemu-upstream-4.4-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36519/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-winxpsp3  7 windows-install fail REGR. vs. 31663

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-sedf 15 guest-localmigrate/x10 fail REGR. vs. 31663

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 10 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xend-winxpsp3 17 leak-check/check fail  never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/check fail never pass

version targeted for testing:
 qemuu d173a0c20d7970c17fa593cf86abc1791a8a4a3a
baseline version:
 qemuu b04df88d41f64fc6b56d193b6e90fb840cedb1d3


People who touched revisions under test:
  Benoit Canet 
  Benoît Canet 
  Dmitry Fleytman 
  Gerd Hoffmann 
  Jason Wang 
  Jeff Cody 
  Juan Quintela 
  Kevin Wolf 
  Laszlo Ersek 
  Michael Roth 
  Michael S. Tsirkin 
  Peter Maydell 
  Petr Matousek 
  Stefan Hajnoczi 
  Stefano Stabellini 


jobs:
 build-amd64-xend pass
 build-i386-xend  pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64 pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64 fail
 test-amd64-amd64-xl-credit2  pass
 test-amd64-i386-freebsd10-i386   pass
 test-amd64-amd64-xl-pcipt-intel  fail
 test-amd64-i386-rhel6hvm-intel   pass
 test-amd64-i386-qemut-rhel6hvm-intel pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-xl-multivcpu pass
 test-amd64-amd64-pair pass
 test-amd64-i386-pair pass
 test-amd64-amd64-xl-sedf-pin pass
 test-amd64-amd64-pv  pass
 test-amd64-i386-pv 

Re: [Xen-devel] [PATCH 2/2] xen: before ballooning hotplugged memory, set frames to invalid

2015-03-19 Thread David Vrabel
On 19/03/15 17:19, Juergen Gross wrote:
> On 03/19/2015 05:18 PM, David Vrabel wrote:
>> On 19/03/15 14:31, Juergen Gross wrote:
>>> Commit 25b884a83d487fd62c3de7ac1ab5549979188482 ("x86/xen: set
>>> regions above the end of RAM as 1:1") introduced a regression.
>>>
>>> To be able to add memory pages which were added via memory hotplug to
>>> a pv domain, the pages must be "invalid" instead of "identity" in the
>>> p2m list before they can be added.
>> [...]
>>> --- a/drivers/xen/balloon.c
>>> +++ b/drivers/xen/balloon.c
>>> @@ -221,15 +221,24 @@ static bool balloon_is_inflated(void)
>>>
>>>   static enum bp_state reserve_additional_memory(long credit)
>>>   {
>>> -int nid, rc;
>>> +int nid, rc = 0;
>>>   u64 hotplug_start_paddr;
>>>   unsigned long balloon_hotplug = credit;
>>> +unsigned long pfn;
>>>
>>>   hotplug_start_paddr = PFN_PHYS(SECTION_ALIGN_UP(max_pfn));
>>>   balloon_hotplug = round_up(balloon_hotplug, PAGES_PER_SECTION);
>>>   nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
>>>
>>> -rc = add_memory(nid, hotplug_start_paddr, balloon_hotplug <<
>>> PAGE_SHIFT);
>>> +for (pfn = PFN_DOWN(hotplug_start_paddr);
>>> + !rc && pfn < PFN_DOWN(hotplug_start_paddr) + balloon_hotplug;
>>> + pfn++)
>>> +if (!set_phys_to_machine(pfn, INVALID_P2M_ENTRY))
>>> +rc = 1;
>>
>>  rc = -ENOMEM;
> 
> I used the value 1 on purpose to be able to identify the case by the
> value printed in the warning below.

Ok.

> 
>>  break;
> 
> Why? !rc is already tested in the for() clause.

I prefer an explicit break.

>>> +
>>> +if (!rc)
>>> +rc = add_memory(nid, hotplug_start_paddr,
>>> +balloon_hotplug << PAGE_SHIFT);
>>
>> Use else here.
> 
> Huh? I want the message to be printed if either set_phys_to_machine()
> or add_memory() failed.

Ok.

David



Re: [Xen-devel] [PATCH 2/2] xen: before ballooning hotplugged memory, set frames to invalid

2015-03-19 Thread Juergen Gross

On 03/19/2015 05:21 PM, Daniel Kiper wrote:
> On Thu, Mar 19, 2015 at 03:31:02PM +0100, Juergen Gross wrote:
>> Commit 25b884a83d487fd62c3de7ac1ab5549979188482 ("x86/xen: set
>> regions above the end of RAM as 1:1") introduced a regression.
>>
>> To be able to add memory pages which were added via memory hotplug to
>> a pv domain, the pages must be "invalid" instead of "identity" in the
>> p2m list before they can be added.
>>
>> Suggested-by: David Vrabel 
>> Signed-off-by: Juergen Gross 
>> ---
>>   drivers/xen/balloon.c | 13 +++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
>> index 0b52d92..52e331f 100644
>> --- a/drivers/xen/balloon.c
>> +++ b/drivers/xen/balloon.c
>> @@ -221,15 +221,24 @@ static bool balloon_is_inflated(void)
>>
>>   static enum bp_state reserve_additional_memory(long credit)
>>   {
>> -   int nid, rc;
>> +   int nid, rc = 0;
>> u64 hotplug_start_paddr;
>> unsigned long balloon_hotplug = credit;
>> +   unsigned long pfn;
>>
>> hotplug_start_paddr = PFN_PHYS(SECTION_ALIGN_UP(max_pfn));
>> balloon_hotplug = round_up(balloon_hotplug, PAGES_PER_SECTION);
>> nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
>>
>> -   rc = add_memory(nid, hotplug_start_paddr, balloon_hotplug << 
>> PAGE_SHIFT);
>> +   for (pfn = PFN_DOWN(hotplug_start_paddr);
>> +!rc && pfn < PFN_DOWN(hotplug_start_paddr) + balloon_hotplug;
>> +pfn++)
>> +   if (!set_phys_to_machine(pfn, INVALID_P2M_ENTRY))
>
> rc = set_phys_to_machine(pfn, INVALID_P2M_ENTRY)?

Not really. set_phys_to_machine returns false on failure...

>> +   rc = 1;
>
> I do not think that this stuff is needed for HVM or PVH guests.

True.

>> +   if (!rc)
>> +   rc = add_memory(nid, hotplug_start_paddr,
>> +   balloon_hotplug << PAGE_SHIFT);
>>
>> if (rc) {
>> pr_warn("Cannot add additional memory (%i)\n", rc);
>
> It will be nice to know what part of infrastructure failed.
> Could you create separate pr_warn() message for set_phys_to_machine()?

Value 1 for rc is the indicator for that case.


Juergen



Re: [Xen-devel] [PATCH 1/2] xen: prepare p2m list for memory hotplug

2015-03-19 Thread David Vrabel
On 19/03/15 17:16, Juergen Gross wrote:
> On 03/19/2015 05:18 PM, David Vrabel wrote:
>> On 19/03/15 14:31, Juergen Gross wrote:
>>> Commit 054954eb051f35e74b75a566a96fe756015352c8 ("xen: switch to linear
>>> virtual mapped sparse p2m list") introduced a regression regarding to
>>> memory hotplug for a pv-domain: as the virtual space for the p2m list
>>> is allocated for the to be expected memory size of the domain only,
>>> hotplugged memory above that size will not be usable by the domain.
>>>
>>> Correct this by using a configurable size for the p2m list in case of
>>> memory hotplug enabled (default supported memory size is 512 GB for
>>> 64 bit domains and 4 GB for 32 bit domains).
>> [...]
>>> --- a/arch/x86/xen/p2m.c
>>> +++ b/arch/x86/xen/p2m.c
>>> @@ -91,6 +91,17 @@ EXPORT_SYMBOL_GPL(xen_p2m_size);
>>>   unsigned long xen_max_p2m_pfn __read_mostly;
>>>   EXPORT_SYMBOL_GPL(xen_max_p2m_pfn);
>>>
>>> +#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
>>> +#ifdef CONFIG_X86_32
>>> +BUILD_BUG_ON_MSG(CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT > 64)
>>> +#endif
>>> +#define P2M_LIMIT max(xen_max_p2m_pfn,\
>>> +((unsigned long)((u64)CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
>>> *\
>>> +1024 * 1024 * 1024 / PAGE_SIZE)))
>>> +#else
>>> +#define P2M_LIMIT xen_max_p2m_pfn
>>> +#endif
>>
>> Can you arrange the #ifdef's to set xen_max_p2m_pfn to the right value
>> instead of introducing P2M_LIMIT?
> 
> Hmm, this would require additional checks in setup.c. What about:
> 
> #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
> #define P2M_LIMIT CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
> #else
> #define P2M_LIMIT 0
> #endif
> 
> and do the max(...) calculation in xen_vmalloc_p2m_tree()?

Yes, this is fine.
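
The max() calculation agreed on above could be sketched like this.
p2m_limit_pfns() and SKETCH_PAGE_SIZE are hypothetical names, 4 KiB pages are
an assumption, and a limit of 0 stands for "no hotplug limit configured" as in
the proposed #else branch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL                 /* assumption: 4 KiB pages */
#define SKETCH_GIB       (1024ULL * 1024 * 1024)

/*
 * Number of page frames the p2m list has to cover: the larger of the
 * current maximum and the configured hotplug limit (given in GB, 0 = none).
 */
static uint64_t p2m_limit_pfns(uint64_t max_p2m_pfn, uint64_t limit_gb)
{
    uint64_t limit_pfns;

    assert(limit_gb <= UINT64_MAX / SKETCH_GIB);   /* no 64-bit overflow */
    limit_pfns = limit_gb * SKETCH_GIB / SKETCH_PAGE_SIZE;

    return limit_pfns > max_p2m_pfn ? limit_pfns : max_p2m_pfn;
}
```

Under these assumptions the 512 GB default for 64 bit domains corresponds to
134217728 page frames, and a limit of 0 simply leaves xen_max_p2m_pfn in
charge.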

David



Re: [Xen-devel] [PATCH 2/2] xen: before ballooning hotplugged memory, set frames to invalid

2015-03-19 Thread Juergen Gross

On 03/19/2015 05:18 PM, David Vrabel wrote:
> On 19/03/15 14:31, Juergen Gross wrote:
>> Commit 25b884a83d487fd62c3de7ac1ab5549979188482 ("x86/xen: set
>> regions above the end of RAM as 1:1") introduced a regression.
>>
>> To be able to add memory pages which were added via memory hotplug to
>> a pv domain, the pages must be "invalid" instead of "identity" in the
>> p2m list before they can be added.
> [...]
>> --- a/drivers/xen/balloon.c
>> +++ b/drivers/xen/balloon.c
>> @@ -221,15 +221,24 @@ static bool balloon_is_inflated(void)
>>
>>   static enum bp_state reserve_additional_memory(long credit)
>>   {
>> -   int nid, rc;
>> +   int nid, rc = 0;
>> u64 hotplug_start_paddr;
>> unsigned long balloon_hotplug = credit;
>> +   unsigned long pfn;
>>
>> hotplug_start_paddr = PFN_PHYS(SECTION_ALIGN_UP(max_pfn));
>> balloon_hotplug = round_up(balloon_hotplug, PAGES_PER_SECTION);
>> nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
>>
>> -   rc = add_memory(nid, hotplug_start_paddr, balloon_hotplug << 
>> PAGE_SHIFT);
>> +   for (pfn = PFN_DOWN(hotplug_start_paddr);
>> +!rc && pfn < PFN_DOWN(hotplug_start_paddr) + balloon_hotplug;
>> +pfn++)
>> +   if (!set_phys_to_machine(pfn, INVALID_P2M_ENTRY))
>> +   rc = 1;
>
>  rc = -ENOMEM;

I used the value 1 on purpose to be able to identify the case by the
value printed in the warning below.

>  break;

Why? !rc is already tested in the for() clause.

>> +
>> +   if (!rc)
>> +   rc = add_memory(nid, hotplug_start_paddr,
>> +   balloon_hotplug << PAGE_SHIFT);
>
> Use else here.

Huh? I want the message to be printed if either set_phys_to_machine()
or add_memory() failed.

>> if (rc) {
>> pr_warn("Cannot add additional memory (%i)\n", rc);


Juergen




[Xen-devel] [PATCH] docs: Mention a common pitfall in ballooning

2015-03-19 Thread George Dunlap
Several users have reported that their available free memory in the
guest when they used maxmem >> memory was much smaller than when
maxmem == memory.  This is the unavoidable consequence of how
ballooning works, but it's not something users expect.

Warn them of this effect in the place we tell them how to make it
happen, so they aren't surprised.
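
A rough feel for the size of the effect, as a back-of-the-envelope sketch
only: the per-page tracking cost depends on the guest kernel, and the figure
of 64 bytes per 4 KiB page used below is an illustrative assumption, not a
measured value. tracking_overhead_mib() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the memory (in MiB) a guest spends tracking maxmem_mib MiB of
 * potential memory, given an assumed per-page bookkeeping cost in bytes.
 */
static uint64_t tracking_overhead_mib(uint64_t maxmem_mib,
                                      uint64_t bytes_per_page)
{
    uint64_t pages;

    assert(bytes_per_page > 0);
    pages = maxmem_mib * 256;          /* 256 pages of 4 KiB per MiB */

    return pages * bytes_per_page / (1024 * 1024);
}
```

Under that assumption maxmem=262144 costs about 4096 MiB of bookkeeping while
maxmem=8096 costs about 126 MiB, which is why a guest started with
memory=8096 sees so much less free memory in the first case.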

Signed-off-by: George Dunlap 
---
CC: Ian Campbell 
CC: Ian Jackson 
CC: Wei Liu 
---
 docs/man/xl.cfg.pod.5 | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 93cd7d2..4432f95 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -264,6 +264,13 @@ if the values of B and B differ.
 A "pre-ballooned" HVM guest needs a balloon driver, without a balloon driver
 it will crash.
 
+NOTE: Because of the way ballooning works, the guest has to allocate
+memory to keep track of maxmem pages, regardless of how much memory it
+actually has available to it.  A guest with maxmem=262144 and
+memory=8096 will report significantly less memory available for use
+than a system with maxmem=8096 memory=8096 due to the memory overhead
+of having to track the unused pages.
+
 =back
 
 =head3 Guest Virtual NUMA Configuration
-- 
1.9.1
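
To get a feel for the overhead described in the NOTE above, the following rough sketch estimates how much memory a guest spends on per-page bookkeeping for maxmem. The figure of 64 bytes of tracking data per 4 KiB page is an assumption for illustration only (it is not stated in the patch and varies by kernel and configuration), and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Estimate the bytes a guest needs to track every page up to maxmem.
 * maxmem_mib:     the maxmem= value from the guest config, in MiB
 * bytes_per_page: assumed bookkeeping cost per page (illustrative only)
 */
uint64_t tracking_overhead_bytes(uint64_t maxmem_mib, uint64_t bytes_per_page)
{
    uint64_t pages = maxmem_mib * 1024 * 1024 / SKETCH_PAGE_SIZE;

    return pages * bytes_per_page;
}
```

With the assumed 64 bytes per page, tracking_overhead_bytes(262144, 64) comes to 4 GiB, while tracking_overhead_bytes(8096, 64) is roughly 126 MiB. Whatever the real per-page figure is, a guest started with memory=8096 pays the larger cost when maxmem=262144.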




Re: [Xen-devel] [PATCH v2 5/5] Revert "x86/hvm: wait for at least one ioreq server to be enabled"

2015-03-19 Thread Ian Campbell
On Thu, 2015-03-19 at 13:18 +, Wei Liu wrote:
> This reverts commit dd748d128d86996592afafea02e578cc7d4e6d42.
> 
> We don't need this workaround anymore since we have fixed the toolstack
> interlock problem that affects stubdom.
> 
> Signed-off-by: Wei Liu 

Acked-by: Ian Campbell 

But, really this needs acks from the x86 guys. Cc Added.

> ---
>  xen/arch/x86/hvm/hvm.c   | 21 -
>  xen/include/asm-x86/hvm/domain.h |  1 -
>  2 files changed, 22 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 4734d71..32905d0 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -892,13 +892,6 @@ static void hvm_ioreq_server_enable(struct 
> hvm_ioreq_server *s,
>  
>done:
>  spin_unlock(&s->lock);
> -
> -/* This check is protected by the domain ioreq server lock. */
> -if ( d->arch.hvm_domain.ioreq_server.waiting )
> -{
> -d->arch.hvm_domain.ioreq_server.waiting = 0;
> -domain_unpause(d);
> -}
>  }
>  
>  static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s,
> @@ -1450,20 +1443,6 @@ int hvm_domain_initialise(struct domain *d)
>  
>  spin_lock_init(&d->arch.hvm_domain.ioreq_server.lock);
>  INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
> -
> -/*
> - * In the case where a stub domain is providing emulation for
> - * the guest, there is no interlock in the toolstack to prevent
> - * the guest from running before the stub domain is ready.
> - * Hence the domain must remain paused until at least one ioreq
> - * server is created and enabled.
> - */
> -if ( !is_pvh_domain(d) )
> -{
> -domain_pause(d);
> -d->arch.hvm_domain.ioreq_server.waiting = 1;
> -}
> -
>  spin_lock_init(&d->arch.hvm_domain.irq_lock);
>  spin_lock_init(&d->arch.hvm_domain.uc_lock);
>  
> diff --git a/xen/include/asm-x86/hvm/domain.h 
> b/xen/include/asm-x86/hvm/domain.h
> index 0702bf5..2757c7f 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -83,7 +83,6 @@ struct hvm_domain {
>  struct {
>  spinlock_t   lock;
>  ioservid_t   id;
> -bool_t   waiting;
>  struct list_head list;
>  } ioreq_server;
>  struct hvm_ioreq_server *default_ioreq_server;





Re: [Xen-devel] [PATCH v2 4/5] libxl: use new QEMU xenstore protocol

2015-03-19 Thread Ian Campbell
On Thu, 2015-03-19 at 13:18 +, Wei Liu wrote:
>  entries = libxl__xs_directory(gc, 0, GCSPRINTF(
> -"/local/domain/0/device-model/%d/physmap", domid), &num);
> +"/local/domain/%d/device-model/%d/physmap",
> +dm_domid, domid), &num);

You've missed an opportunity to use your new helper, I think.

With that fixed: Acked-by: Ian Campbell 





Re: [Xen-devel] [PATCH 1/2] xen: prepare p2m list for memory hotplug

2015-03-19 Thread Juergen Gross

On 03/19/2015 05:18 PM, David Vrabel wrote:

On 19/03/15 14:31, Juergen Gross wrote:

Commit 054954eb051f35e74b75a566a96fe756015352c8 ("xen: switch to linear
virtual mapped sparse p2m list") introduced a regression regarding
memory hotplug for a pv-domain: as the virtual space for the p2m list
is allocated only for the expected memory size of the domain,
hotplugged memory above that size will not be usable by the domain.

Correct this by using a configurable size for the p2m list when
memory hotplug is enabled (default supported memory size is 512 GB for
64 bit domains and 4 GB for 32 bit domains).

[...]

--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -91,6 +91,17 @@ EXPORT_SYMBOL_GPL(xen_p2m_size);
  unsigned long xen_max_p2m_pfn __read_mostly;
  EXPORT_SYMBOL_GPL(xen_max_p2m_pfn);

+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
+#ifdef CONFIG_X86_32
+BUILD_BUG_ON_MSG(CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT > 64)
+#endif
+#define P2M_LIMIT max(xen_max_p2m_pfn, \
+   ((unsigned long)((u64)CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT * \
+   1024 * 1024 * 1024 / PAGE_SIZE)))
+#else
+#define P2M_LIMIT xen_max_p2m_pfn
+#endif


> Can you arrange the #ifdef's to set xen_max_p2m_pfn to the right value
> instead of introducing P2M_LIMIT?


Hmm, this would require additional checks in setup.c. What about:

#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
#define P2M_LIMIT CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
#else
#define P2M_LIMIT 0
#endif

and do the max(...) calculation in xen_vmalloc_p2m_tree()?
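
A rough sketch of that calculation, written as ordinary user-space C so it can be tried outside the kernel. The function names and the fixed 4 KiB page size are assumptions for illustration; the real code would use PAGE_SIZE and the kernel's max() helper inside xen_vmalloc_p2m_tree().

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumption: 4 KiB pages */

/* Convert a hotplug limit given in GiB into a number of page frames. */
uint64_t limit_gb_to_pfns(uint64_t limit_gb)
{
    return limit_gb * 1024 * 1024 * 1024 / SKETCH_PAGE_SIZE;
}

/*
 * Number of p2m entries to reserve virtual space for: the larger of the
 * domain's current maximum pfn and the configured limit. A limit of 0
 * (no hotplug limit configured) leaves max_p2m_pfn unchanged.
 */
uint64_t p2m_entries_to_reserve(uint64_t max_p2m_pfn, uint64_t limit_gb)
{
    uint64_t limit_pfns = limit_gb_to_pfns(limit_gb);

    return max_p2m_pfn > limit_pfns ? max_p2m_pfn : limit_pfns;
}
```

For the proposed 64 bit default of 512 this yields 134217728 entries, i.e. 1 GiB of virtual address space at 8 bytes per entry, before the ALIGN() rounding.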


Juergen




Re: [Xen-devel] [PATCH 1/2] xen: prepare p2m list for memory hotplug

2015-03-19 Thread Juergen Gross

On 03/19/2015 05:14 PM, Daniel Kiper wrote:

On Thu, Mar 19, 2015 at 03:31:01PM +0100, Juergen Gross wrote:

Commit 054954eb051f35e74b75a566a96fe756015352c8 ("xen: switch to linear
virtual mapped sparse p2m list") introduced a regression regarding
memory hotplug for a pv-domain: as the virtual space for the p2m list
is allocated only for the expected memory size of the domain,
hotplugged memory above that size will not be usable by the domain.

Correct this by using a configurable size for the p2m list when
memory hotplug is enabled (default supported memory size is 512 GB for
64 bit domains and 4 GB for 32 bit domains).

Signed-off-by: Juergen Gross 
---
  arch/x86/xen/p2m.c  | 13 -
  drivers/xen/Kconfig | 13 +
  2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 9f93af5..30e84ae 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -91,6 +91,17 @@ EXPORT_SYMBOL_GPL(xen_p2m_size);
  unsigned long xen_max_p2m_pfn __read_mostly;
  EXPORT_SYMBOL_GPL(xen_max_p2m_pfn);

+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
+#ifdef CONFIG_X86_32
+BUILD_BUG_ON_MSG(CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT > 64)
+#endif
+#define P2M_LIMIT max(xen_max_p2m_pfn, \
+   ((unsigned long)((u64)CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT * \
+   1024 * 1024 * 1024 / PAGE_SIZE)))
+#else
+#define P2M_LIMIT xen_max_p2m_pfn
+#endif
+
  static DEFINE_SPINLOCK(p2m_update_lock);

  static unsigned long *p2m_mid_missing_mfn;
@@ -387,7 +398,7 @@ void __init xen_vmalloc_p2m_tree(void)
static struct vm_struct vm;

vm.flags = VM_ALLOC;
-   vm.size = ALIGN(sizeof(unsigned long) * xen_max_p2m_pfn,
+   vm.size = ALIGN(sizeof(unsigned long) * P2M_LIMIT,
PMD_SIZE * PMDS_PER_MID_PAGE);


> What happens when somebody allocates more memory for the guest than
> XEN_BALLOON_MEMORY_HOTPLUG_LIMIT?

It will fail to add, like it does without this patch.

> Additionally, I think that we do not need to allocate extra virtual
> address space if memory hotplug is disabled.

As XEN_BALLOON_MEMORY_HOTPLUG_LIMIT depends on
XEN_BALLOON_MEMORY_HOTPLUG which in turn depends on MEMORY_HOTPLUG,
this is already the case.




vm_area_register_early(&vm, PMD_SIZE * PMDS_PER_MID_PAGE);
pr_notice("p2m virtual area at %p, size is %lx\n", vm.addr, vm.size);
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index b812462..0a61ddf 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -55,6 +55,19 @@ config XEN_BALLOON_MEMORY_HOTPLUG

  In that case step 3 should be omitted.

+config XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
+   int
+   default 512 if X86_64
+   default 4 if X86_32
+   depends on XEN_HAVE_PVMMU


> Do we need dependency on XEN_HAVE_PVMMU?

Yes. Needed only for pv domains.

> If yes, then maybe "depends on XEN_HAVE_PVMMU && XEN_BALLOON_MEMORY_HOTPLUG".

Matter of taste, I guess.




+   depends on XEN_BALLOON_MEMORY_HOTPLUG


> Should not this option be available even when memory hotplug is disabled?

Why?


Juergen



Re: [Xen-devel] [PATCH v2 3/5] libxl: use LIBXL_TOOLSTACK_DOMID

2015-03-19 Thread Ian Campbell
On Thu, 2015-03-19 at 13:18 +, Wei Liu wrote:
> The function in question is libxl__spawn_local_dm. We should use
> LIBXL_TOOLSTACK_DOMID when constructing xenstore path.
> 
> Currently LIBXL_TOOLSTACK_DOMID is 0, so this patch introduces no
> functional change.
> 
> Use helper function to generate xenstore path.
> 
> Signed-off-by: Wei Liu 

Acked-by: Ian Campbell 





Re: [Xen-devel] [PATCH v2 13/13] libxc: Fix do_memory_op to return negative value on errors

2015-03-19 Thread Ian Campbell
On Wed, 2015-03-18 at 20:24 -0400, Konrad Rzeszutek Wilk wrote:
> instead of the -Exx values (which should go in errno).
> 
> This patch has HUGE implications. There is a lot of APIs
> that are using do_memory_op. Fortunately most of them
> check for 'if (do_memory_op(..) < 0)' so will function
> properly. However there were some which printed the return
> value to the user. They have been fixed in:
> 
>  libxc: Don't assign return value to errno for E820 get/set xc_ calls.
>  libxc: Check xc_sharing_* for proper return values.
>  libxc: If xc_domain_add_to_physmap fails, include errno value
>  libxc: Check xc_maximum_ram_page for negative return values.
>  libxc: Check xc_domain_maximum_gpfn for negative return values
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 






Re: [Xen-devel] [PATCH v2 2/5] libxl: remove device model path in libxl__device_model_destroy

2015-03-19 Thread Ian Campbell
On Thu, 2015-03-19 at 13:18 +, Wei Liu wrote:
> ... and not devices_destroy_cb because it is the right place to clean up
> device model stuff.
> 
> And the path should use LIBXL_TOOLSTACK_DOMID instead of hardcoded 0.
> 
> Signed-off-by: Wei Liu 

Acked-by: Ian Campbell 





Re: [Xen-devel] [PATCH v2 09/13] libxc: Check xc_maximum_ram_page for negative return values.

2015-03-19 Thread Ian Campbell
On Wed, 2015-03-18 at 20:24 -0400, Konrad Rzeszutek Wilk wrote:
> Instead of assuming everything is always OK. As such
> we return now the return value (or zero for success).
> The max_mfn is now passed in as the parameter.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 





Re: [Xen-devel] [PATCH v2 1/5] libxl: introduce libxl__device_model_xs_path

2015-03-19 Thread Ian Campbell
On Thu, 2015-03-19 at 13:18 +, Wei Liu wrote:
> Introduce this helper to return xenstore path for device model to avoid
> handcoded paths.
> 
> Signed-off-by: Wei Liu 
> ---
>  tools/libxl/libxl_internal.c | 22 ++
>  tools/libxl/libxl_internal.h |  3 +++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/tools/libxl/libxl_internal.c b/tools/libxl/libxl_internal.c
> index ddc68ab..8877288 100644
> --- a/tools/libxl/libxl_internal.c
> +++ b/tools/libxl/libxl_internal.c
> @@ -555,6 +555,28 @@ void libxl__update_domain_configuration(libxl__gc *gc,
>  dst->b_info.video_memkb = src->b_info.video_memkb;
>  }
>  
> +char *libxl__device_model_xs_path(libxl__gc *gc, uint32_t dm_domid,
> +  uint32_t domid, const char *format,  ...)
> +{
> +char *s, *fmt;
> +va_list ap;
> +int ret;
> +
> +fmt = GCSPRINTF("/local/domain/%u/device-model/%u%s", dm_domid,
> +domid, format);
> +
> +va_start(ap, format);
> +ret = vsnprintf(NULL, 0, fmt, ap);
> +va_end(ap);
> +
> +s = libxl__zalloc(gc, ret + 1);
> +va_start(ap, format);
> +ret = vsnprintf(s, ret + 1, fmt, ap);
> +va_end(ap);

Please could you refactor the existing libxl__sprintf into a
libxl__vsprintf (i.e. which takes a va_list, and uses va_copy for the
two calls to vsnprintf). Then implement your new helper in terms of the
libxl__vsprintf.
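
The refactoring requested above could look roughly like the following sketch. It is plain user-space C with hypothetical names (sketch_vsprintf, sketch_dm_xs_path) and uses malloc() where libxl would use its gc allocator, so treat it as an illustration of the va_copy pattern rather than the actual libxl code.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a libxl__vsprintf(): formats into a freshly
 * allocated buffer. The sizing pass works on a copy of the argument
 * list, so 'ap' is still intact for the formatting pass.
 */
char *sketch_vsprintf(const char *fmt, va_list ap)
{
    va_list aq;
    char *s;
    int len;

    va_copy(aq, ap);
    len = vsnprintf(NULL, 0, fmt, aq);
    va_end(aq);
    if (len < 0)
        return NULL;

    s = malloc((size_t)len + 1);
    if (!s)
        return NULL;

    vsnprintf(s, (size_t)len + 1, fmt, ap);
    return s;
}

/* Hypothetical path helper implemented in terms of the va_list variant. */
char *sketch_dm_xs_path(unsigned int dm_domid, unsigned int domid,
                        const char *format, ...)
{
    static const char prefix[] = "/local/domain/%u/device-model/%u%s";
    char *fmt, *s;
    va_list ap;
    int len;

    /* Build the combined format string: fixed prefix plus caller's format. */
    len = snprintf(NULL, 0, prefix, dm_domid, domid, format);
    if (len < 0)
        return NULL;
    fmt = malloc((size_t)len + 1);
    if (!fmt)
        return NULL;
    snprintf(fmt, (size_t)len + 1, prefix, dm_domid, domid, format);

    va_start(ap, format);
    s = sketch_vsprintf(fmt, ap);
    va_end(ap);

    free(fmt);
    return s;   /* caller frees; libxl would leave this to the gc */
}
```

The helper then no longer repeats the vsnprintf() sizing logic, and any other formatter that receives a va_list can reuse the same function.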

> +
> +return s;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 934465a..9ef2ec6 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1794,6 +1794,9 @@ _hidden libxl__json_object *libxl__json_parse(libxl__gc 
> *gc_opt, const char *s);
>  _hidden int libxl__device_model_version_running(libxl__gc *gc, uint32_t 
> domid);
>/* Return the system-wide default device model */
>  _hidden libxl_device_model_version libxl__default_device_model(libxl__gc 
> *gc);
> +_hidden char *libxl__device_model_xs_path(libxl__gc *gc, uint32_t dm_domid,
> +  uint32_t domid,
> +  const char *format,  ...);
>  
>  /* Check how executes hotplug script currently */
>  int libxl__hotplug_settings(libxl__gc *gc, xs_transaction_t t);





Re: [Xen-devel] [PATCH 1/2] xen: prepare p2m list for memory hotplug

2015-03-19 Thread Juergen Gross

On 03/19/2015 04:53 PM, Boris Ostrovsky wrote:

On 03/19/2015 10:31 AM, Juergen Gross wrote:

Commit 054954eb051f35e74b75a566a96fe756015352c8 ("xen: switch to linear
virtual mapped sparse p2m list") introduced a regression regarding
memory hotplug for a pv-domain: as the virtual space for the p2m list
is allocated only for the expected memory size of the domain,
hotplugged memory above that size will not be usable by the domain.

Correct this by using a configurable size for the p2m list when
memory hotplug is enabled (default supported memory size is 512 GB for
64 bit domains and 4 GB for 32 bit domains).

Signed-off-by: Juergen Gross 
---
  arch/x86/xen/p2m.c  | 13 -
  drivers/xen/Kconfig | 13 +
  2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 9f93af5..30e84ae 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -91,6 +91,17 @@ EXPORT_SYMBOL_GPL(xen_p2m_size);
  unsigned long xen_max_p2m_pfn __read_mostly;
  EXPORT_SYMBOL_GPL(xen_max_p2m_pfn);
+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
+#ifdef CONFIG_X86_32
+BUILD_BUG_ON_MSG(CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT > 64)


> Will this work in Kconfig instead?
>
>  range 0 64 if X86_32


I'll give it a try.

Juergen



-boris


+#endif
+#define P2M_LIMIT max(xen_max_p2m_pfn,\
+((unsigned long)((u64)CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT *\
+1024 * 1024 * 1024 / PAGE_SIZE)))
+#else
+#define P2M_LIMIT xen_max_p2m_pfn
+#endif
+
  static DEFINE_SPINLOCK(p2m_update_lock);
  static unsigned long *p2m_mid_missing_mfn;
@@ -387,7 +398,7 @@ void __init xen_vmalloc_p2m_tree(void)
  static struct vm_struct vm;
  vm.flags = VM_ALLOC;
-vm.size = ALIGN(sizeof(unsigned long) * xen_max_p2m_pfn,
+vm.size = ALIGN(sizeof(unsigned long) * P2M_LIMIT,
  PMD_SIZE * PMDS_PER_MID_PAGE);
  vm_area_register_early(&vm, PMD_SIZE * PMDS_PER_MID_PAGE);
  pr_notice("p2m virtual area at %p, size is %lx\n", vm.addr,
vm.size);
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index b812462..0a61ddf 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -55,6 +55,19 @@ config XEN_BALLOON_MEMORY_HOTPLUG
In that case step 3 should be omitted.
+config XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
+int
+default 512 if X86_64
+default 4 if X86_32
+depends on XEN_HAVE_PVMMU
+depends on XEN_BALLOON_MEMORY_HOTPLUG
+help
+  Upper limit in GBs a pv domain can be expanded to using memory
+  hotplug.
+
+  This value is used to allocate enough space in internal tables
needed
+  for physical memory administration.
+
  config XEN_SCRUB_PAGES
  bool "Scrub pages before returning them to system"
  depends on XEN_BALLOON


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/







Re: [Xen-devel] [PATCH v2 11/13] libxc: Check xc_sharing_* for proper return values.

2015-03-19 Thread Ian Campbell
On Wed, 2015-03-18 at 20:24 -0400, Konrad Rzeszutek Wilk wrote:
> If there is a negative return value - check for that and
> also use errno for the proper error value.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 





Re: [Xen-devel] xc: error: EPT not supported for this guest: Internal error

2015-03-19 Thread HANNAS YAYA Issa

On Thu, 19 Mar 2015 16:35:43 +, Ian Campbell wrote:

On Thu, 2015-03-19 at 17:04 +0100, HANNAS YAYA Issa wrote:

Please ask user configuration questions on the xen-users list, this list
is for development topics.

Perhaps also check your favourite search engine before doing so.

Ian.


Hi
when I run xen-access test (tools/tests/xen-access) I got the following
error
xc: error: EPT not supported for this guest: Internal error
Please can you explain me how to enable EPT in guest
Thanks
Hannas



OK. sorry



Re: [Xen-devel] Network blocked after sending several packets larger than 128 bytes when using Driver Domain

2015-03-19 Thread Zoltan Kiss



On 19/03/15 03:40, openlui wrote:

Hi, all:

I am trying to use a HVM with PCI pass-through NIC as network driver domain. 
However, when I send packets whose size are larger than 128 bytes from DomU 
using pkt-gen tools, after several seconds, the network between driver domain 
and destination host will be blocked.

The networking structure when testing is shown below:
Pkt-gen (in DomU) <--> Virtual Eth (in DomU) <---> VIF (in Driver Domain) <--> OVS (in 
Driver Domain) <--> pNIC (passthrough nic in Driver Domain) <---> Another Host

The summarized results are as follows:
1. When we just ping from DomU to another host, the network seems ok.
2. When sending 64 or 128 bytes UDP packets from DomU, the network will not be 
blocked
3. When sending 256, 1024 or 1400 bytes UDP packets from DomU, and if the 
scatter-gather feature of passthrough NIC in driver domain is on, the network 
will be blocked
4. When sending 256, 1024 or 1400 bytes UDP packets from DomU, and only if the 
scatter-gather feature of passthrough NIC in driver domain is off, the network 
will not be blocked

As shown in the detailed syslog below, when the network is blocked, it seems that the
passthrough NIC's driver enters an exception state and the tx queue is hung.
As far as I know, when sending 64 or 128 byte packets, the skb generated by
netback only has the linearized data, and the data is stored in a page
allocated from the driver domain's memory. But for packets larger
than 128 bytes, the skb will also have a frag page which is grant mapped from
DomU's memory. And if we disable the scatter-gather feature of the NIC, the skb
sent from netback will be linearized first, so that the skb's data
is stored in a page allocated from the driver domain rather than in DomU's
memory.

Yes, you are correct: the first slot (at most 128 bytes from it) is
grant copied to a locally allocated skb, whilst the rest is grant mapped
from the guest's memory in this case.
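
As a simplified illustration of that split (not the actual netback code), the sketch below treats a packet as one contiguous run of bytes and reports how much would be grant copied and how much would stay in grant mapped guest pages. The 128-byte limit comes from the description above; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* From the discussion above: at most this many bytes are grant copied. */
#define SKETCH_COPY_LIMIT 128

struct pkt_split {
    size_t copied;  /* bytes grant copied into the locally allocated skb */
    size_t mapped;  /* bytes left in grant mapped guest pages (frags) */
};

/* Hypothetical helper: split a packet of 'len' bytes into the two parts. */
struct pkt_split split_packet(size_t len)
{
    struct pkt_split s;

    s.copied = len < SKETCH_COPY_LIMIT ? len : SKETCH_COPY_LIMIT;
    s.mapped = len - s.copied;
    return s;
}
```

Under this model, 64 and 128 byte packets have no mapped part at all, while 256, 1024 and 1400 byte packets do, which lines up with the sizes at which the hang was reported.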


I am wondering whether the problem is caused by PCI passthrough and DMA
operations, or whether something is misconfigured in our environment. How can
I continue to debug this problem? I am looking forward to your reply and
advice. Thanks.

The environment we used are as follows:
a. Dom0: SUSE 12 (kernel: 3.12.28)
b. XEN: 4.4.1_0602.2 (provided by SUSE 12)
c. DomU: kernel 3.17.4
d. Driver Domain: kernel 3.17.8

I would try out an upstream kernel; there were some grant mapping
changes recently, maybe that solves your issue.

Also, have you set the kernel's loglevel to DEBUG?
ixgbe also has a module parameter to enable further logging.

e. OVS: 2.1.2
f. Host: Huawei RH2288, CPU Intel Xeon E5645@2.40GHz, Hyper-Threading disabled,
VT-d enabled
g. pNIC: we tried Intel 82599 10GE NIC (ixgbe v3.23.2), Intel 82576 1GE NIC
(igb) and Broadcom NetXtreme II BCM 5709 1GE NIC (bnx2 v2.2.5)
h. para-virtualization driver: netfront/netback
i. MTU: 1500

The detailed Logs in Driver Domain after the network is blocked are as follows:
1. When using the 82599 10GE NIC, syslog and dmesg include the messages below. The log
shows that a Tx Unit Hang is detected and the driver tries to reset the
adapter repeatedly; however, the network is still blocked.


ixgbe: :00:04.0 eth10: Detected Tx Unit Hang
Tx Queue <0>
TDH, TDT <1fd>, <5a>
next_to_use  <5a>
next_to_clean<1fc>
ixgbe: :00:04.0 eth0: tx hang 11 detected on queue 0, resetting adapter
ixgbe: :00:04.0 eth10: Reset adapter
ixgbe: :00:04.0 eth10: PCIe transaction pending bit also did not clear
ixgbe: :00:04.0 master disable timed out
ixgbe: :00:04.0 eth10: detected SFP+: 3
ixgbe: :00:04.0 eth10: NIC Link is Up 10 Gbps, Flow Control: RX/TX
...


I have tried to remove the "reset adapter" call in the ixgbe driver's ndo_tx_timeout
function, and the logs are shown below. The log shows that when the network is blocked, the
NIC's "TDH" cannot be incremented any more.


ixgbe :00:04.0 eth3: Detected Tx Unit Hang
Tx Queue <0>
TDH, TDT <1fd>, <5a>
next_to_use  <5a>
next_to_clean<1fc>
ixgbe :00:04.0 eth3: tx_buffer_info[next_to_clean]
time_stamp   <1075b74ca>
jiffies  <1075b791c>
ixgbe :00:04.0 eth3: Fake Tx hang detected with timeout of 5 seconds
ixgbe :00:04.0 eth3: Detected Tx Unit Hang
Tx Queue <0>
TDH, TDT <1fd>, <5a>
next_to_use  <5a>
next_to_clean<1fc>
ixgbe :00:04.0 eth3: tx_buffer_info[next_to_clean]
time_stamp   <1075b74ca>
jiffies  <1075b7b11>
...


I have also compared the NIC's PCI status before and after the network hangs, and found that
the "DevSta" field changed from "TransPend-" to "TransPend+" after the network
is blocked:


DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend+


The network can only be recovered after we reload the ixgbe module in driver 
domain.

2. When using BCM5709 NIC, t

Re: [Xen-devel] [PATCH v2 08/13] libxc: Check xc_domain_maximum_gpfn for negative return values

2015-03-19 Thread Ian Campbell
On Wed, 2015-03-18 at 20:24 -0400, Konrad Rzeszutek Wilk wrote:
> Instead of assuming everything is always OK. We stash
> the gpfns value as a parameter.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  tools/libxc/xc_core_arm.c| 17 ++---
>  tools/libxc/xc_core_x86.c| 24 
>  tools/libxc/xc_domain_save.c |  8 +++-
>  3 files changed, 41 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/libxc/xc_core_arm.c b/tools/libxc/xc_core_arm.c
> index 16508e7..26cec04 100644
> --- a/tools/libxc/xc_core_arm.c
> +++ b/tools/libxc/xc_core_arm.c
> @@ -31,9 +31,16 @@ xc_core_arch_gpfn_may_present(struct xc_core_arch_context 
> *arch_ctxt,
>  }
>  
> 
> -static int nr_gpfns(xc_interface *xch, domid_t domid)
> +static int nr_gpfns(xc_interface *xch, domid_t domid, unsigned long *gpfns)

You didn't fancy merging the two versions of this then ;-)
> diff --git a/tools/libxc/xc_core_x86.c b/tools/libxc/xc_core_x86.c
> index d8846f1..02377e8 100644
> --- a/tools/libxc/xc_core_x86.c
> +++ b/tools/libxc/xc_core_x86.c

> @@ -88,7 +99,12 @@ xc_core_arch_map_p2m_rw(xc_interface *xch, struct 
> domain_info_context *dinfo, xc
>  int err;
>  int i;
>  
> -dinfo->p2m_size = nr_gpfns(xch, info->domid);
> +err = nr_gpfns(xch, info->domid, &dinfo->p2m_size);

Please could you avoid reusing err here? Its sole
use now is to save errno over the cleanup path, whereas here it looks
like it is going to be used for something but it isn't.

 if ( nr_gpfns(...)  < 0 )

is ok per the Xen coding style if you don't actually need the return
code.

Or

ret = nr_gpfns()
if ( ret < 0 )
error, goto out

ret = -1;
.. the rest

would be ok too I guess. (coding style here allows
if ( (ret = nr_gpfns(...)) < 0 )
too FWIW).
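
The second variant above, spelled out as a small self-contained sketch. Both functions are hypothetical stand-ins (the real nr_gpfns() takes an xc_interface and a domid and issues a hypercall); here the "hypercall result" is simply passed in so the control flow can be tried in isolation.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for nr_gpfns(): returns 0 and stores the count
 * through 'gpfns', or returns -1 with errno set. 'max_gpfn' plays the
 * role of the hypercall result.
 */
int sketch_nr_gpfns(long max_gpfn, unsigned long *gpfns)
{
    if (max_gpfn < 0) {
        errno = EINVAL;
        return -1;
    }
    *gpfns = (unsigned long)max_gpfn + 1;
    return 0;
}

/* Caller in the "test the call, goto out on failure" style. */
int sketch_map_p2m(long max_gpfn, unsigned long *p2m_size)
{
    int ret = -1;

    /* The return code is only tested, never stored in a reused variable. */
    if (sketch_nr_gpfns(max_gpfn, p2m_size) < 0)
        goto out;

    /* ... the rest of the mapping work would go here ... */
    ret = 0;

out:
    return ret;
}
```

Keeping the error variable out of the call site leaves it free for its one remaining job, preserving errno across the cleanup path.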

> +if ( err < 0 )
> +{
> +ERROR("nr_gpfns returns errno: %d.", errno);
> +goto out;
> +}
>  if ( dinfo->p2m_size < info->nr_pages  )
>  {
>  ERROR("p2m_size < nr_pages -1 (%lx < %lx", dinfo->p2m_size, 
> info->nr_pages - 1);
> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> index 254fdb3..6346c12 100644
> --- a/tools/libxc/xc_domain_save.c
> +++ b/tools/libxc/xc_domain_save.c
> @@ -939,7 +939,13 @@ int xc_domain_save(xc_interface *xch, int io_fd, 
> uint32_t dom, uint32_t max_iter
>  }
>  
>  /* Get the size of the P2M table */
> -dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;
> +rc = xc_domain_maximum_gpfn(xch, dom);
> +if ( rc < 0 )
> +{
> +ERROR("Could not get maximum GPFN!");
> +goto out;
> +}
> +dinfo->p2m_size = rc + 1;

Shame this can't use the same helper as the others.

Ian.




Re: [Xen-devel] [OSSTEST Nested PATCH 2/6] Add and expose some testsupport APIs

2015-03-19 Thread Ian Campbell
On Tue, 2015-03-17 at 14:16 -0400, longtao.pang wrote:
> From: "longtao.pang" 
> 
> 1. Set the vif model to 'e1000'; otherwise, with the default device model,
> the L1 eth0 interface disappears, hence xenbridge cannot work.
> Maybe this limitation can be removed later once it is fixed. For now, we
> have to accommodate it.

You have done this unconditionally, which means it affects all guests.
You need to make this configurable by the caller, probably by plumbing
it through in $xopts (a hash of extra options).

I see now you were told this last time around by Ian J. Please don't
just resend such things without change: either fix them, make an argument
for doing it your way, or ask for clarification if you don't understand
the requested change.

> 2. Since rebooting the L1 guest VM takes more time, we increase the
> time multiplier for reboot-confirm-booted when testing a nested job, and the multiplier
> is stored as a runvar in the 'ts-nested-setup' script. Added another function
> 'guest_editconfig_cd' and exposed it; this function basically changes the guest
> boot device sequence, alters its on_reboot behavior to restart and enables the
> nestedhvm feature.

This looks like two items run together?

The multi_reboot_time thing sounds ok, but it should be called
reboot_time_factor or something like that. In fact I see that Ian
suggested previously that it should have the host ident in it, that
makes sense to me.

The editconfig_cd thing -- yet another thing which Ian questioned and
which it was agreed you would change but you haven't.

I think perhaps you have accidentally resent an older version of the
series. If not then please go back and ensure you have addressed all of
the feedback given on the last iteration before sending another version.

Ian.





Re: [Xen-devel] [PATCH v2 07/13] libxc: Fix xc_tmem_control to return proper error.

2015-03-19 Thread Ian Campbell
On Wed, 2015-03-18 at 20:24 -0400, Konrad Rzeszutek Wilk wrote:
> The API returns now negative values on error and stashes
> the error in errno. Fix the user of this API.
> 
> The 'xc_hypercall_bounce_pre' can fail - and if so it will
> stash its errno values - no need to over-write it.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 

I'm still a little concerned about xenstat.c's handling of
errno != -ENOSYS, but not enough to nack.





Re: [Xen-devel] xc: error: EPT not supported for this guest: Internal error

2015-03-19 Thread Ian Campbell
On Thu, 2015-03-19 at 17:04 +0100, HANNAS YAYA Issa wrote:

Please ask user configuration questions on the xen-users list, this list
is for development topics.

Perhaps also check your favourite search engine before doing so.

Ian.

> Hi
> when I run xen-access test (tools/tests/xen-access) I got the following 
> error
> xc: error: EPT not supported for this guest: Internal error
> Please can you explain me how to enable EPT in guest
> Thanks
> Hannas
> 
> 





Re: [Xen-devel] [OSSTEST Nested PATCH 0/6] Introduction of netsted HVM test job

2015-03-19 Thread Ian Campbell
On Tue, 2015-03-17 at 14:16 -0400, longtao.pang wrote:
> This patch set adds nested HVM test case for osstest.

I've now looked at the first two patches in this series and I've found
that in both patches you have consistently not reacted to the review
comments made the first time around, so I'm not going to read the rest
of the series now since it would appear to be a waste of my time if I'm
just going to end up repeating things which were said last time.

Please resend a 3rd version when you have made sure that you have
addressed the previous feedback.

Please also:

  * Use "git send-email --reroll-count=N" (or with older git
--subject-prefix including vN) to indicate which revision of the
patch series this is (i.e. here you should have used
--reroll-count=2 and next time 3)
  * Add a miniature changelog to each patch indicating what has
changed in this iteration; this should go after the commit
message, S-o-b and a "---" marker.

See the "[PATCH v2] foobar: Add a new trondle calls" example in
http://wiki.xen.org/wiki/Submitting_Xen_Patches#Review.2C_Rinse_.26_Repeat for
an example of both of these.

In fact please read wiki.xen.org/wiki/Submitting_Xen_Patches for lots of
hints on all of this stuff.

Ian.





Re: [Xen-devel] [PATCH 2/2] xen: before ballooning hotplugged memory, set frames to invalid

2015-03-19 Thread Daniel Kiper
On Thu, Mar 19, 2015 at 03:31:02PM +0100, Juergen Gross wrote:
> Commit 25b884a83d487fd62c3de7ac1ab5549979188482 ("x86/xen: set
> regions above the end of RAM as 1:1") introduced a regression.
>
> To be able to add memory pages which were added via memory hotplug to
> a pv domain, the pages must be "invalid" instead of "identity" in the
> p2m list before they can be added.
>
> Suggested-by: David Vrabel 
> Signed-off-by: Juergen Gross 
> ---
>  drivers/xen/balloon.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 0b52d92..52e331f 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -221,15 +221,24 @@ static bool balloon_is_inflated(void)
>
>  static enum bp_state reserve_additional_memory(long credit)
>  {
> - int nid, rc;
> + int nid, rc = 0;
>   u64 hotplug_start_paddr;
>   unsigned long balloon_hotplug = credit;
> + unsigned long pfn;
>
>   hotplug_start_paddr = PFN_PHYS(SECTION_ALIGN_UP(max_pfn));
>   balloon_hotplug = round_up(balloon_hotplug, PAGES_PER_SECTION);
>   nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
>
> - rc = add_memory(nid, hotplug_start_paddr, balloon_hotplug << 
> PAGE_SHIFT);
> + for (pfn = PFN_DOWN(hotplug_start_paddr);
> +  !rc && pfn < PFN_DOWN(hotplug_start_paddr) + balloon_hotplug;
> +  pfn++)
> + if (!set_phys_to_machine(pfn, INVALID_P2M_ENTRY))

rc = set_phys_to_machine(pfn, INVALID_P2M_ENTRY)?

> + rc = 1;

I do not think that this stuff is needed for HVM or PVH guests.

> + if (!rc)
> + rc = add_memory(nid, hotplug_start_paddr,
> + balloon_hotplug << PAGE_SHIFT);
>
>   if (rc) {
>   pr_warn("Cannot add additional memory (%i)\n", rc);

It would be nice to know which part of the infrastructure failed.
Could you create a separate pr_warn() message for set_phys_to_machine()?
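
A minimal userspace sketch of what separate messages per failing step could
look like. This is not the kernel code: stub_set_phys_to_machine(),
stub_add_memory(), the hotplug_stage enum and the message wording are all
hypothetical stand-ins, and fprintf() stands in for pr_warn().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Which step failed last; hypothetical, only here to make the sketch testable. */
enum hotplug_stage { STAGE_NONE, STAGE_P2M, STAGE_ADD_MEMORY };
static enum hotplug_stage last_failed_stage = STAGE_NONE;

/* Hypothetical stub for set_phys_to_machine(): fails at one chosen pfn. */
static unsigned long fail_at_pfn = ~0UL;
static bool stub_set_phys_to_machine(unsigned long pfn)
{
    return pfn != fail_at_pfn;
}

/* Hypothetical stub for add_memory(): returns a preset error code. */
static int add_memory_result = 0;
static int stub_add_memory(void)
{
    return add_memory_result;
}

static int reserve_sketch(unsigned long start_pfn, unsigned long nr_pages)
{
    unsigned long pfn;
    int rc;

    assert(start_pfn + nr_pages >= start_pfn); /* range must not wrap */
    last_failed_stage = STAGE_NONE;

    for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
        if (!stub_set_phys_to_machine(pfn)) {
            /* First message: the p2m update failed. */
            fprintf(stderr, "Cannot mark pfn %lu invalid in p2m\n", pfn);
            last_failed_stage = STAGE_P2M;
            return -ENOMEM;
        }
    }

    rc = stub_add_memory();
    if (rc) {
        /* Second message: the memory hotplug step failed. */
        fprintf(stderr, "Cannot add additional memory (%i)\n", rc);
        last_failed_stage = STAGE_ADD_MEMORY;
    }
    return rc;
}
```

With the two steps reported separately, a log reader can tell a p2m failure
from an add_memory() failure without looking at the error code.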

Daniel



Re: [Xen-devel] [PATCH 2/2] xen: before ballooning hotplugged memory, set frames to invalid

2015-03-19 Thread David Vrabel
On 19/03/15 14:31, Juergen Gross wrote:
> Commit 25b884a83d487fd62c3de7ac1ab5549979188482 ("x86/xen: set
> regions above the end of RAM as 1:1") introduced a regression.
> 
> To be able to add memory pages which were added via memory hotplug to
> a pv domain, the pages must be "invalid" instead of "identity" in the
> p2m list before they can be added.
[...]
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -221,15 +221,24 @@ static bool balloon_is_inflated(void)
>  
>  static enum bp_state reserve_additional_memory(long credit)
>  {
> - int nid, rc;
> + int nid, rc = 0;
>   u64 hotplug_start_paddr;
>   unsigned long balloon_hotplug = credit;
> + unsigned long pfn;
>  
>   hotplug_start_paddr = PFN_PHYS(SECTION_ALIGN_UP(max_pfn));
>   balloon_hotplug = round_up(balloon_hotplug, PAGES_PER_SECTION);
>   nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
>  
> - rc = add_memory(nid, hotplug_start_paddr, balloon_hotplug << PAGE_SHIFT);
> + for (pfn = PFN_DOWN(hotplug_start_paddr);
> +  !rc && pfn < PFN_DOWN(hotplug_start_paddr) + balloon_hotplug;
> +  pfn++)
> + if (!set_phys_to_machine(pfn, INVALID_P2M_ENTRY))
> + rc = 1;

rc = -ENOMEM;
break;

> +
> + if (!rc)
> + rc = add_memory(nid, hotplug_start_paddr,
> + balloon_hotplug << PAGE_SHIFT);

Use else here.

>   if (rc) {
>   pr_warn("Cannot add additional memory (%i)\n", rc);
> 
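
For illustration, a small userspace sketch of the pfn range the quoted loop
walks, with the suggested -ENOMEM and break on the first failure. The section
size (32768 pages, i.e. 128 MiB sections with 4 KiB pages) is an assumption,
and every sketch_* name is a hypothetical stand-in for the kernel helper it
imitates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed for illustration: 128 MiB memory sections with 4 KiB pages. */
#define SKETCH_PAGES_PER_SECTION 32768UL

/* Round x up to a multiple of align; align must be a power of two. */
static unsigned long sketch_round_up(unsigned long x, unsigned long align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical stand-in for set_phys_to_machine(pfn, INVALID_P2M_ENTRY). */
static unsigned long sketch_fail_pfn = ~0UL;
static unsigned long sketch_calls;

static bool sketch_set_invalid(unsigned long pfn)
{
    sketch_calls++;
    return pfn != sketch_fail_pfn;
}

/* Mark the hotplug range above max_pfn invalid, stopping at the first error. */
static int sketch_invalidate_range(unsigned long max_pfn, unsigned long credit)
{
    /* The range starts at the next section boundary above max_pfn ... */
    unsigned long start = sketch_round_up(max_pfn, SKETCH_PAGES_PER_SECTION);
    /* ... and covers the credit rounded up to whole sections. */
    unsigned long count = sketch_round_up(credit, SKETCH_PAGES_PER_SECTION);
    unsigned long pfn;
    int rc = 0;

    assert(start + count >= start); /* range must not wrap */
    sketch_calls = 0;

    for (pfn = start; pfn < start + count; pfn++) {
        if (!sketch_set_invalid(pfn)) {
            rc = -ENOMEM; /* a real errno value instead of 1 */
            break;        /* explicit early exit, no !rc in the loop condition */
        }
    }
    return rc;
}
```

With these assumed values, a credit of a single page above max_pfn = 100000
still touches a whole section of 32768 entries starting at pfn 131072.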




Re: [Xen-devel] [PATCH 1/2] xen: prepare p2m list for memory hotplug

2015-03-19 Thread David Vrabel
On 19/03/15 14:31, Juergen Gross wrote:
> Commit 054954eb051f35e74b75a566a96fe756015352c8 ("xen: switch to linear
> virtual mapped sparse p2m list") introduced a regression regarding
> memory hotplug for a pv-domain: as the virtual space for the p2m list
> is allocated for the to be expected memory size of the domain only,
> hotplugged memory above that size will not be usable by the domain.
> 
> Correct this by using a configurable size for the p2m list in case of
> memory hotplug enabled (default supported memory size is 512 GB for
> 64 bit domains and 4 GB for 32 bit domains).
[...]
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -91,6 +91,17 @@ EXPORT_SYMBOL_GPL(xen_p2m_size);
>  unsigned long xen_max_p2m_pfn __read_mostly;
>  EXPORT_SYMBOL_GPL(xen_max_p2m_pfn);
>  
> +#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT
> +#ifdef CONFIG_X86_32
> +BUILD_BUG_ON_MSG(CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT > 64)
> +#endif
> +#define P2M_LIMIT max(xen_max_p2m_pfn, \
> + ((unsigned long)((u64)CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT * \
> + 1024 * 1024 * 1024 / PAGE_SIZE)))
> +#else
> +#define P2M_LIMIT xen_max_p2m_pfn
> +#endif

Can you arrange the #ifdef's to set xen_max_p2m_pfn to the right value
instead of introducing P2M_LIMIT?
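
To make the arithmetic in the quoted macro concrete, here is a small
userspace sketch of the GiB-to-page-frame conversion and the max() it feeds.
The 4 KiB page size is an assumption and the sketch_* names are hypothetical;
this only mirrors the calculation, not how the kernel should structure it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Convert a hotplug limit given in GiB into a number of page frames. */
static uint64_t sketch_limit_pfns(uint64_t limit_gib)
{
    assert(limit_gib <= (UINT64_MAX >> 30)); /* avoid overflow in the multiply */
    return limit_gib * 1024 * 1024 * 1024 / SKETCH_PAGE_SIZE;
}

/* The p2m list must cover the larger of the current size and the limit. */
static uint64_t sketch_p2m_limit(uint64_t max_p2m_pfn, uint64_t limit_gib)
{
    uint64_t limit = sketch_limit_pfns(limit_gib);

    return max_p2m_pfn > limit ? max_p2m_pfn : limit;
}
```

With 4 KiB pages, the 512 GB default corresponds to 134217728 page frames and
the 4 GB default to 1048576.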

David


