Re: [Xen-devel] [PATCH v6 17/31] xen/arm: ITS: Store LPIs allocated and IRQ ID bits per domain

2015-09-07 Thread Vijay Kilari
On Thu, Sep 3, 2015 at 9:55 PM, Julien Grall  wrote:
> Hi Vijay,
>
> On 31/08/15 12:06, vijay.kil...@gmail.com wrote:
>> From: Vijaya Kumar K 
>>
>> Store number of lpis and number of id bits
>> in vgic structure
>>
>> Signed-off-by: Vijaya Kumar K 
>> ---
>>  xen/arch/arm/irq.c   |9 +
>>  xen/arch/arm/vgic-v3-its.c   |2 ++
>>  xen/arch/arm/vgic.c  |   12 
>>  xen/include/asm-arm/domain.h |3 +++
>>  xen/include/asm-arm/irq.h|3 +++
>>  5 files changed, 29 insertions(+)
>>
>> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
>> index 24c4f24..93e9411 100644
>> --- a/xen/arch/arm/irq.c
>> +++ b/xen/arch/arm/irq.c
>> @@ -31,6 +31,15 @@
>>  static unsigned int local_irqs_type[NR_LOCAL_IRQS];
>>  static DEFINE_SPINLOCK(local_irqs_type_lock);
>>
>> +/* Number of LPI supported in XEN */
>> +/*
>
> Number of LPI supported by Xen.
>
> Also, it's not necessary to have a separate comment for this. It could
> be included in the next one. I.e:
>
> /*
>  * Number of LPI supported by Xen
>  *
>  * LPI 
>
>> + * LPI number start from 8192. Minimum number of bits
>> + * required to represent 8192 is 13 bits. So to Support LPIs minimum
>> + * 14 bits are required which can represent maximum LPI 16384.
>> + * 16384 - 8192 = 8192. Minimum number of LPIs supported is 8192
>> + */
>
> The explanation is hard to understand. I would rewrite like:
>
> "The LPI identifier starts from 8192. Given that the hardware is
> providing the number of identifier bits, supporting LPIs requires at
> least 14 bits. This will represent 16384 interrupt ID of which 8192
> LPIs. So the minimum of LPIs supported when the hardware supports LPIs
> is 8192 ".
>
>> +unsigned int nr_lpis = 8192;
>> +
>
> I still don't think that it makes sense to introduce the number of LPIs
> variable in a patch for vITS. As you describe it, it should be part of
> an ITS/GICv3 patch.
>
> Although, I think you should use the field nr_irq_ids in the gic
> structure (see patch #10) to avoid problem you currently have with this
> solution.
>
> For instance, gic_is_lpi is relying on the number of ID bits exposed by
> the hardware but you allocate the IRQ desc for LPIs based on nr_lpis.
>

You mean to restrict nr_irq_ids to FIRST_GIC_LPI + 8192 (nr_lpis)
instead of HW supported value?
and let gic_is_lpi() always check for Xen supported lpi number instead of hw
supported value?

> It's fine to restrict the value in the GIC compare to hardware.
>
> Furthermore this value should be 0 on platform where LPIs is not
> supported in order to make confusion and introduce possible problem with
> the code later. I could add that, AFAICT, this new variable (nr_lpis) is
> not that often within the code.

 nr_lpis is global variable that we define to tell number of LPIs supported
by Xen. If platform does not support LPIs, then vgic.nr_lpis is set to 0.

>
> Lastly, on a previous version (don't remember which one) we said that
> the user should be able to change the value on the command line. What's
> your plan for that?

Yes, it was part of our discussion on chat. I will fix it.

>
>>  /* Describe an IRQ assigned to a guest */
>>  struct irq_guest
>>  {
>> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
>> index fabbad0..cef6139 100644
>> --- a/xen/arch/arm/vgic-v3-its.c
>> +++ b/xen/arch/arm/vgic-v3-its.c
>> @@ -547,6 +547,8 @@ int vits_domain_init(struct domain *d)
>>
>>  ASSERT(is_hardware_domain(d));
>>
>> +d->arch.vgic.nr_lpis = nr_lpis;
>> +
>
> With  my suggestion, this would turn into gic_nr_irq_ids() - 8192;

As suggested above, if we restrict nr_irq_ids to Xen supported
value (FIRST_GIC_LPI + 8192(nr_lpis) then you suggestion is right.

Regards
Vijay

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Linux 4.1 reports wrong number of pages to toolstack

2015-09-07 Thread Jan Beulich
>>> On 04.09.15 at 10:53,  wrote:
> I have set the adhoc bisector working on the ~200 commits between rc3 and
> rc4. It's running in the Citrix instance (which is quieter) so the interim
> results are only visible within our network at http://osstest.xs.citrite.ne 
> t/~osstest/testlogs/results-adhoc/bisect/xen-unstable/test-amd64-i386
> -xl..html.
> 
> So far it has confirmed the basis fail and it is now rechecking the basis
> pass.
> 
> Slightly strange though is:
> $ git log --oneline v3.19-rc3..v3.19-rc4 -- drivers/xen/ arch/x86/xen/ 
> include/xen/
> $
> 
> i.e. there are no relevant seeming xen commits in that range. Maybe the
> last one of this is more relevant?
> 
> $ git log --grep=[xX][eE][nN] --oneline v3.19-rc3..v3.19-rc4 -- 
> bdec419 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> 07ff890 xen-netback: fixing the propagation of the transmit shaper timeout
> 132978b x86: Fix step size adjustment during initial memory mapping
> $

So if I'm interpreting the graph right it was indeed the last of these
which got fingered, which is mine. Yet having looked at it in close
detail just now again I can't see it to be wrong, or even have an
effect on post-boot state: All it does is adjust the block sizes in
which the 1:1 mapping gets established. The final result ought to
still be the same (with - obviously - the exception of which pages
may get used for page tables). No change to any global variables
afaics.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 22:05,  wrote:
> The original implementation of populate_pfns didn't consider the same
> pfn can be present multiple times in the array. The mechanism to prevent
> populating the same pfn multiple times only worked if the recurring pfn
> appeared in different batches.
> 
> This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
> which has several ptes pointing to same pfn, which results in an array
> containing recurring pfn.

Since you must have debugged this, and since the bisector appears
to have fingered a patch of mine on the Linux side which triggered
this, would you mind explaining this a little more? In particular I'm
worried that this may point out some other bug in Linux, as in the
context of the change there - dealing with the 1:1 mapping - I can't
see a legitimate reason for multiple PTEs to reference the same PFN.

Thanks, Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [linux-3.14 test] 61421: trouble: blocked/broken

2015-09-07 Thread osstest service owner
flight 61421 linux-3.14 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/61421/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-xsm   3 host-install(3) broken REGR. vs. 60666
 build-armhf   3 host-install(3) broken REGR. vs. 60666
 build-armhf-pvops 3 host-install(3) broken REGR. vs. 60666
 build-i386-xsm3 host-install(3) broken REGR. vs. 60666
 build-i3863 host-install(3) broken REGR. vs. 60666
 build-i386-pvops  3 host-install(3) broken REGR. vs. 60666
 build-amd64   3 host-install(3) broken REGR. vs. 60666
 build-amd64-pvops 3 host-install(3) broken REGR. vs. 60666
 build-amd64-xsm   3 host-install(3) broken REGR. vs. 60666

Tests which did not succeed, but are not blocking:
 build-amd64-rumpuserxen   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 build-i386-rumpuserxen1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-raw   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-qcow2 1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm  1 build-check(1)blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-i386-xl-vhd1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvh-intel  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-amd64-i386-xl1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked 
n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-qemut-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-pvh-amd   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-raw1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-winxpsp3  1 build-check(1)   blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 bui

[Xen-devel] BUG: failed to save x86 HVM guest with 1TB ram

2015-09-07 Thread wangxin (U)
Hi,

I'm tring to hibernate an x86 HVM guest with 1TB ram,
  [1.VM config]
  builder = "hvm"
  name = "suse12_sp3"
  memory = 1048576
  vcpus = 16
  boot = "c"
  disk = [ '/mnt/sda10/vm/SELS_ide_disk.img,raw,xvda,rw' ]
  device_model_version = "qemu-xen"
  vnc = 1
  vnclisten = '9.51.3.174'
  vncdisplay = 0

but I get the error messages(see below) from XC:
  [2.VM saving] xl save -p suse12_sp3 suse12_sp3.save
  Saving to suse12_sp3.save new xl format (info 0x1/0x0/1309)
  xc: error: Cannot save this big a guest: Internal error
  libxl: error: libxl_dom.c:1875:libxl__xc_domain_save_done: saving domain: \
  domain did not respond to suspend request: Argument list too long
  libxl: error: libxl_dom.c:2032:remus_teardown_done: Remus: failed to \
  teardown device for guest with domid 3, rc -8
  Failed to save domain, resuming domain
  xc: error: Dom 3 not suspended: (shutdown 0, reason 255): Internal error
  libxl: error: libxl.c:508:libxl__domain_resume: xc_domain_resume failed \
  for domain 3: Invalid argument

The error in function xc_domain_save in xc_domain_save.c, 
/* Get the size of the P2M table */
dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;

if ( dinfo->p2m_size > ~XEN_DOMCTL_PFINFO_LTAB_MASK )
{
errno = E2BIG;
ERROR("Cannot save this big a guest");
goto out;
}

it may be 1TB ram plus pci-hole space make the MFN wider than limit size.

If I want to save a VM with 1TB ram or larger, what shoud I do? Did anyone
have tried this before and have some configuration I can refer to?

Thank you very much!

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [adhoc xen-unstable bisection] complete test-amd64-i386-xl

2015-09-07 Thread Ian Campbell
This is the result of the bisection of the migration issue with 32-bit on
Linux 4.1 onwards.

This is just for info as I believe the issue is already under discussion
etc elsewhere.

I've copied the final graph to 
http://xenbits.xen.org/people/ianc/tmp/adhoc/test-amd64-i386-xl..html

Ian.


On Sun, 2015-09-06 at 04:44 +0100, Platform Team regression test user wrote:
> branch xen-unstable
> xen branch xen-unstable
> job test-amd64-i386-xl
> test guest-saverestore
> 
> Tree: linux git://xenbits.xen.org/linux-pvops.git
> Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
> Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
> Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
> Tree: xen git://xenbits.xen.org/xen.git

[...snip first pass blaming a merge changeset which I aborted but was still 
reported here...]

> *** Found and reproduced problem changeset ***
> 
>   Bug is in tree:  linux git://xenbits.xen.org/linux-pvops.git
>   Bug introduced:  132978b94e66f8ad7d20790f8332f0e9c1426029
>   Bug not present: fbe1bf140671619508dfa575d74a185ae53c5dbb
> 
> 
>   commit 132978b94e66f8ad7d20790f8332f0e9c1426029
>   Author: Jan Beulich 
>   Date:   Fri Dec 19 16:10:54 2014 +
>   
>   x86: Fix step size adjustment during initial memory mapping
>   
>   The old scheme can lead to failure in certain cases - the
>   problem is that after bumping step_size the next (non-final)
>   iteration is only guaranteed to make available a memory block
>   the size of what step_size was before. E.g. for a memory block
>   [0,300460) we'd have:
>   
>iter>  > start>>   > end>  >   > step> >   
> > amount
>1> > 300440>   > 30045f>   >  2M>  >   >   2M
>2> > 300400>   > 30043f>   > 64M>  >   >   4M
>3> > 30>   > 3003ff>   >  2G>  >   >  64M
>4> > 20>   > 2f>   > 64G>  >   >  64G
>   
>   Yet to map 64G with 4k pages (as happens e.g. under PV Xen) we
>   need slightly over 128M, but the first three iterations made
>   only about 70M available.
>   
>   The condition (new_mapped_ram_size > mapped_ram_size) for
>   bumping step_size is just not suitable. Instead we want to bump
>   it when we know we have enough memory available to cover a block
>   of the new step_size. And rather than making that condition more
>   complicated than needed, simply adjust step_size by the largest
>   possible factor we know we can cover at that point - which is
>   shifting it left by one less than the difference between page
>   table level shifts. (Interestingly the original STEP_SIZE_SHIFT
>   definition had a comment hinting at that having been the
>   intention, just that it should have been PUD_SHIFT-PMD_SHIFT-1
>   instead of (PUD_SHIFT-PMD_SHIFT)/2, and of course for non-PAE
>   32-bit we can't really use these two constants as they're equal
>   there.)
>   
>   Furthermore the comment in get_new_step_size() didn't get
>   updated when the bottom-down mapping logic got added. Yet while
>   an overflow (flushing step_size to zero) of the shift doesn't
>   matter for the top-down method, it does for bottom-up because
>   round_up(x, 0) = 0, and an upper range boundary of zero can't
>   really work well.
>   
>   Signed-off-by: Jan Beulich 
>   Acked-by: Yinghai Lu 
>   Link: 
> http://lkml.kernel.org/r/54945c1e027800051...@mail.emea.novell.com
>   Signed-off-by: Ingo Molnar 
> 
> 
> For bisection revision-tuple graph see:
>
> http://osstest.xs.citrite.net/~osstest/testlogs/results-adhoc/bisect/xen-unstable/test-amd64-i386-xl..html
> Revision IDs in each graph node refer, respectively, to the Trees above.
> 
> 
> Searching for failure / basis pass:
>  37864 fail [host=rice-weevil] / 37863 ok.
> Failure / basis pass flights: 37864 / 37863
> (tree with no url: ovmf)
> (tree with no url: seabios)
> Tree: linux git://xenbits.xen.org/linux-pvops.git
> Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
> Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
> Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
> Tree: xen git://xenbits.xen.org/xen.git
> Latest eaa27f34e91a14cdceed26ed6c6793ec1d186115 
> c530a75c1e6a472b0eb9558310b518f0dfcd8860 
> 7f057440b31da38196e3398fd1b618fc36ad97d6 
> bcf35eec0b621c46dbf0aeb40c6bc06b5d3981aa 
> 145a8004a7d659668d5a3b0ad9868d7678b24822
> Basis pass 97bf6af1f928216fd6c5a66e8a57bfa95a659672 
> c530a75c1e6a472b0eb9558310b518f0dfcd8860 
> 7f057440b31da38196e3398fd1b618fc36ad97d6 
> bcf35eec0b621c46dbf0aeb40c6bc06b5d3981aa 
> 145a8004a7d659668d5a3b0ad9868d7678b24822
> Generating revisions with ./adhoc-revtuple-generator  
> git://xenbits.xen.org/linux-pvops.git#97bf6af1f928216fd6c5a66e8a

Re: [Xen-devel] What is the difference between XenAPI and XenAPI-Extensions

2015-09-07 Thread Ian Campbell
On Fri, 2015-09-04 at 14:49 -0400, D'Mita Levy wrote:
> Hello,

Hi.

xapi/XenAPI has it's own development list at xen-...@lists.xen.org where
you will likely find more people with the expertise to answer your
questions.

Ian.

> 
> I am a student at FIU working on a school project so I am a not very 
> experienced. Right now I am trying to use the Xen API code I found at 
> https://github.com/xenserver/xenadmin to Snapshot a VM. The project I 
> have so far uses the XenAPI.VM functions and I have been successful in 
> starting and stopping a VM. As an example I perform those actions like 
> this:
> 
>  XenRef vmRef = VM.get_by_uuid(_session, vmUUID);
>  VM.start(_session, vmRef.opaque_ref, false, true);
> 
> My mentors would like me to try and clone a VM. I have copied the Run() 
> function from VMSnapshotCreateAction.cs and I tried to make a snapshot 
> with the following code:
> 
>XenRef vmRef = VM.get_by_uuid(_session, GetUUIDByName(vmName));
>SnapshotType m_Type = SnapshotType.DISK; //hardcoded snapshot type
>string snapshot_name = "Testing_Snapshot";
> 
>XenAPI.VM.async_snapshot(_session, vmRef.opaque_ref, snapshot_name);
> 
> I have two VM's. A Windows XP SP3 VM (No actual OS installed, just the 
> base VM) and an Ubuntu 14.04 Trusty Tahr VM (installed OS and running). 
> The Windows XP SP3 Snapshot gets created successfully but the Ubuntu 
> snapshot does not.
> 
> Upon furthure inspection I notice that XenAPI.VM works for start/stop but 
> XenAPI-Extensions.VM is needed to make snapshots - based on 
> VMSnapshotCreateAction.cs 
> 
> Why is it that I need XenAPI-Extensions.VM to createa  vm reference to 
> make snapshots and cannot use XenAPI.VM? What is the difference between 
> XenAPI and Extensions? Finally, is it possible just to pull out the code 
> so that I can have the core functionality; it seems adding it to my 
> simple WPF project requires a ton of dependencies and references
> 
> Thanks,
> D'Mita
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 2/5] libxc: migration v2 prefix Memory -> Frames

2015-09-07 Thread Andrew Cooper

On 06/09/15 21:05, Wei Liu wrote:

The prefix "Memory" is confusing because the numbers shown after that
are referring to frames.

Change a bunch of prefixes from "Memory" to "Frames". Also rename
send_memory_verify to verify_frames.

Signed-off-by: Wei Liu 


Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Draft C] Boot ABI for HVM guests without a device-model

2015-09-07 Thread Roger Pau Monné
Hello,

El 04/09/15 a les 18.12, Ian Campbell ha escrit:
> On Fri, 2015-09-04 at 17:47 +0200, Roger Pau Monné wrote:
>> VCPUOP_initialize was never available to HVM guests, so I don't think
>> changing the argument is a problem. However, I understand that for the
>> sake of clarity overloading an hypercall this way is not the best
>> practice. What about naming it VCPUOP_hvm_initialise?
> 
> If the new interface could support both PV (vcpu_guest_context) and the new
> thing (i.e. with a type field and a union perhaps), or if the new interface
> can work for PV some other way then it's not unheard of to rename the
> existing number with _compat and take over the name with a new number.
> 
> It just needs some compat __XEN_INTERFACE_VERSION__ stuff in the headers,
> like with e.g. __HYPERVISOR_sched_op vs __HYPERVISOR_sched_op_compat.
> 
> (I've not looked at this interface and I don't really remember what the old
> one looks like, so maybe this is an insane idea in this case)

So AFAICS we have 3 options:

1. Overload VCPUOP_initialise like it's done in the current series (v6).
For PV guests the hypercall parameter is of type vcpu_guest_context,
while for HVM guests the parameter is of type vcpu_hvm_context.

2. Create a new hypercall (VCPUOP_hvm_initialise) only available to HVM
guests, that only allows vcpu_hvm_context as a parameter.

3. Deprecate current VCPUOP_initialise, introduce a new
VCPUOP_initialise, that takes the following parameter:

union vcpu_context {
struct vcpu_guest_context pv_ctx;
struct vcpu_hvm_context hvm_ctx;
};

TBH, I don't have an opinion between 2 and 3, but I would like to get a
consensus before I start implementing any of those.

Roger.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
> >>> On 06.09.15 at 22:05,  wrote:
> > The original implementation of populate_pfns didn't consider the same
> > pfn can be present multiple times in the array. The mechanism to prevent
> > populating the same pfn multiple times only worked if the recurring pfn
> > appeared in different batches.
> > 
> > This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
> > which has several ptes pointing to same pfn, which results in an array
> > containing recurring pfn.
> 
> Since you must have debugged this, and since the bisector appears
> to have fingered a patch of mine on the Linux side which triggered
> this, would you mind explaining this a little more? In particular I'm
> worried that this may point out some other bug in Linux, as in the
> context of the change there - dealing with the 1:1 mapping - I can't
> see a legitimate reason for multiple PTEs to reference the same PFN.
> 

Sure. I can try to explain this as clear as possible. Note that I didn't
even look at Linux side changes because at that point I was sure there
was a bug in migration v2.

So there is a step called normalise_page in migration v2. It's nop for
HVM guest. For PV guest, it only cares about page table frames. To
normalise a page table frame, the core idea is to replace all MFNs in
page tables to PFNs inside the guest.

When restoring, there is a step called localise_page, which again is a
nop for HVM guest. For PV guest, it does the reverse of normalise_page.
It goes through all page table frames, extract all PFNs pointed to by
PTEs in such frames, populate them, then reconstruct page tables.

What I discovered is that PTEs inside one page table frame contained the
same PFN (something like fd42). The original implementation of toolstack
populate_pfns didn't consider such scenario. As for what that PFN
referred to, I wasn't sure and I didn't really care about that.

Let me know if you need more information.

Wei.

> Thanks, Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 06:19,  wrote:
> On 9/6/2015 11:19 AM, Tamas K Lengyel wrote:
>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>> @@ -2310,12 +2310,16 @@ static int intel_iommu_assign_device(
>>>   PCI_DEVFN2(bdf) == devfn &&
>>>   rmrr->scope.devices_cnt > 1 )
>>>  {
>>> +u32 relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED;
>>> +
>>>  printk(XENLOG_G_ERR VTDPREFIX
>>> -   " cannot assign %04x:%02x:%02x.%u"
>>> +   " Currently its %s to assign %04x:%02x:%02x.%u"
>>> " with shared RMRR at %"PRIx64" for Dom%d.\n",
>>> +   relaxed ? "disallowed" : "risky",
>>>
>>
>> This debug message is backwards?
> 
> Yeah. Its indeed like this, relaxed ? "risky" : "disallowed"
> 
> But lets wait Jan's comment to step next.

What kind of comment are you expecting from me?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] BUG: failed to save x86 HVM guest with 1TB ram

2015-09-07 Thread Andrew Cooper

On 07/09/15 09:09, wangxin (U) wrote:

Hi,

I'm tring to hibernate an x86 HVM guest with 1TB ram,
   [1.VM config]
   builder = "hvm"
   name = "suse12_sp3"
   memory = 1048576
   vcpus = 16
   boot = "c"
   disk = [ '/mnt/sda10/vm/SELS_ide_disk.img,raw,xvda,rw' ]
   device_model_version = "qemu-xen"
   vnc = 1
   vnclisten = '9.51.3.174'
   vncdisplay = 0

but I get the error messages(see below) from XC:
   [2.VM saving] xl save -p suse12_sp3 suse12_sp3.save
   Saving to suse12_sp3.save new xl format (info 0x1/0x0/1309)
   xc: error: Cannot save this big a guest: Internal error
   libxl: error: libxl_dom.c:1875:libxl__xc_domain_save_done: saving domain: \
   domain did not respond to suspend request: Argument list too long
   libxl: error: libxl_dom.c:2032:remus_teardown_done: Remus: failed to \
   teardown device for guest with domid 3, rc -8
   Failed to save domain, resuming domain
   xc: error: Dom 3 not suspended: (shutdown 0, reason 255): Internal error
   libxl: error: libxl.c:508:libxl__domain_resume: xc_domain_resume failed \
   for domain 3: Invalid argument

The error in function xc_domain_save in xc_domain_save.c,
 /* Get the size of the P2M table */
 dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;

 if ( dinfo->p2m_size > ~XEN_DOMCTL_PFINFO_LTAB_MASK )
 {
 errno = E2BIG;
 ERROR("Cannot save this big a guest");
 goto out;
 }

it may be 1TB ram plus pci-hole space make the MFN wider than limit size.

If I want to save a VM with 1TB ram or larger, what shoud I do? Did anyone
have tried this before and have some configuration I can refer to?


This is clearly not from Xen 4.6, but the same issue will be present.

The check serves a dual purpose.  In the legacy case, it is to avoid 
clobbering the upper bits of pfn information with pfn type information 
for 32bit toolstacks;  any PFN above 2^28 would have type information 
clobbering the upper bits.  This has been mitigated somewhat in 
migration v2, as pfns are strictly 64bit values, still using the upper 4 
bits for type information, allowing 60 bits for the PFN itself.


The second purpose is just as a limit on toolstack resources. Migration 
requires allocating structures which scale linearly with the size of the 
VM; the biggest of which would be ~1GB for the p2m. Added to this is 
>1GB for the m2p, and suddenly a 32bit toolstack process is looking 
scarce on RAM.


During the development of migration v2, I didn't spend any time 
considering if or how much it was sensible to lift the restriction by, 
so the check was imported wholesale from the legacy code.


For now, I am going to say that it simply doesn't work.  Simply upping 
the limit is only a stopgap measure; an HVM guest can still mess this up 
by playing physmap games and mapping a page of ram at a really high 
(guest) physical address.  Longterm, we need hypervisor support for 
getting a compressed view of guest physical address space, so toolstack 
side resources are proportional to the amount of RAM given to the guest, 
not to how big a guest decides to make its physmap.


~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Found memory leak in libxenstore

2015-09-07 Thread Sunguodong
Hi,
I found this memory leak on xen-4.5.0:
linux-byXjTX:~ # virsh version
Compiled against library: libvirt 1.2.17
Using library: libvirt 1.2.17
Using API: Xen 1.2.17
Running hypervisor: Xen 4.5.0

Steps to produce this leak:

1.   Start a vm, win7_64_2U_vhd, for example.

2.   Stop libvirtd and start it with valgrind, using the following command:

valgrind --tool=memcheck --log-file=xs.log --leak-check=full 
--show-reachable=yes --track-origins=yes --trace-children=yes --verbose 
libvirtd -d -l

3.   Wait until libvirtd started, then reboot the vm:

virsh reboot win7_64_2U_vhd

4.   After vm restarted, kill valgind with signal 3:

kill -3 27483 (27483 is the pid of valgrind)

5.   Open xs.log, search 'definitely', we will always find this leak:

==28989== 40 bytes in 1 blocks are definitely lost in loss record 327 of 585

==28989==at 0x4C27B9B: malloc (vg_replace_malloc.c:263)

==28989==by 0x7EBD618: read_message (xs.c:1146)

==28989==by 0x7EBE8C6: read_thread (xs.c:1222)

==28989==by 0x74F8805: start_thread (in /lib64/libpthread-2.11.3.so)

==28989==by 0x77EC66C: clone (in /lib64/libc-2.11.3.so)

When we reboot vm twice, this leak would happen twice either. Could anyone 
verify this and fix it?

VM XML config attached here, just for reference:

   win7_64_2U_vhd
   2097152
   2097152
   2
   
   destroy
   restart
   destroy
   1
   3600
   
   en-us
  
   hvm
   /usr/lib/xen/boot/hvmloader
   
  
  
/usr/lib/xen/bin/qemu-system-i386 

  
  
  


 
 
 
 



  



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 11:36,  wrote:
> On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
>> >>> On 06.09.15 at 22:05,  wrote:
>> > The original implementation of populate_pfns didn't consider the same
>> > pfn can be present multiple times in the array. The mechanism to prevent
>> > populating the same pfn multiple times only worked if the recurring pfn
>> > appeared in different batches.
>> > 
>> > This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
>> > which has several ptes pointing to same pfn, which results in an array
>> > containing recurring pfn.
>> 
>> Since you must have debugged this, and since the bisector appears
>> to have fingered a patch of mine on the Linux side which triggered
>> this, would you mind explaining this a little more? In particular I'm
>> worried that this may point out some other bug in Linux, as in the
>> context of the change there - dealing with the 1:1 mapping - I can't
>> see a legitimate reason for multiple PTEs to reference the same PFN.
>> 
> 
> Sure. I can try to explain this as clear as possible. Note that I didn't
> even look at Linux side changes because at that point I was sure there
> was a bug in migration v2.
> 
> So there is a step called normalise_page in migration v2. It's nop for
> HVM guest. For PV guest, it only cares about page table frames. To
> normalise a page table frame, the core idea is to replace all MFNs in
> page tables to PFNs inside the guest.
> 
> When restoring, there is a step called localise_page, which again is a
> nop for HVM guest. For PV guest, it does the reverse of normalise_page.
> It goes through all page table frames, extract all PFNs pointed to by
> PTEs in such frames, populate them, then reconstruct page tables.
> 
> What I discovered is that PTEs inside one page table frame contained the
> same PFN (something like fd42). The original implementation of toolstack
> populate_pfns didn't consider such scenario. As for what that PFN
> referred to, I wasn't sure and I didn't really care about that.

That's unfortunate, as that's precisely the information I was after,
since - as said - taking the repetition of the same PFN together with
what the triggering Linux change is about, it smells like there's
something wrong on the Linux side too. Do you at least recall how
many times that same PFN got repeated?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 03:53:57AM -0600, Jan Beulich wrote:
> >>> On 07.09.15 at 11:36,  wrote:
> > On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
> >> >>> On 06.09.15 at 22:05,  wrote:
> >> > The original implementation of populate_pfns didn't consider the same
> >> > pfn can be present multiple times in the array. The mechanism to prevent
> >> > populating the same pfn multiple times only worked if the recurring pfn
> >> > appeared in different batches.
> >> > 
> >> > This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
> >> > which has several ptes pointing to same pfn, which results in an array
> >> > containing recurring pfn.
> >> 
> >> Since you must have debugged this, and since the bisector appears
> >> to have fingered a patch of mine on the Linux side which triggered
> >> this, would you mind explaining this a little more? In particular I'm
> >> worried that this may point out some other bug in Linux, as in the
> >> context of the change there - dealing with the 1:1 mapping - I can't
> >> see a legitimate reason for multiple PTEs to reference the same PFN.
> >> 
> > 
> > Sure. I can try to explain this as clear as possible. Note that I didn't
> > even look at Linux side changes because at that point I was sure there
> > was a bug in migration v2.
> > 
> > So there is a step called normalise_page in migration v2. It's nop for
> > HVM guest. For PV guest, it only cares about page table frames. To
> > normalise a page table frame, the core idea is to replace all MFNs in
> > page tables to PFNs inside the guest.
> > 
> > When restoring, there is a step called localise_page, which again is a
> > nop for HVM guest. For PV guest, it does the reverse of normalise_page.
> > It goes through all page table frames, extract all PFNs pointed to by
> > PTEs in such frames, populate them, then reconstruct page tables.
> > 
> > What I discovered is that PTEs inside one page table frame contained the
> > same PFN (something like fd42). The original implementation of toolstack
> > populate_pfns didn't consider such scenario. As for what that PFN
> > referred to, I wasn't sure and I didn't really care about that.
> 
> That's unfortunate, as that's precisely the information I was after,
> since - as said - taking the repetition of the same PFN together with
> what the triggering Linux change is about, it smells like there's
> something wrong on the Linux side too. Do you at least recall how
> many times that same PFN got repeated?
> 

Thousands of times.

Wei.

> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns

2015-09-07 Thread David Vrabel
On 07/09/15 10:36, Wei Liu wrote:
> On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
> On 06.09.15 at 22:05,  wrote:
>>> The original implementation of populate_pfns didn't consider the same
>>> pfn can be present multiple times in the array. The mechanism to prevent
>>> populating the same pfn multiple times only worked if the recurring pfn
>>> appeared in different batches.
>>>
>>> This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
>>> which has several ptes pointing to same pfn, which results in an array
>>> containing recurring pfn.
>>
>> Since you must have debugged this, and since the bisector appears
>> to have fingered a patch of mine on the Linux side which triggered
>> this, would you mind explaining this a little more? In particular I'm
>> worried that this may point out some other bug in Linux, as in the
>> context of the change there - dealing with the 1:1 mapping - I can't
>> see a legitimate reason for multiple PTEs to reference the same PFN.
>>
> 
> Sure. I can try to explain this as clear as possible. Note that I didn't
> even look at Linux side changes because at that point I was sure there
> was a bug in migration v2.
> 
> So there is a step called normalise_page in migration v2. It's nop for
> HVM guest. For PV guest, it only cares about page table frames. To
> normalise a page table frame, the core idea is to replace all MFNs in
> page tables to PFNs inside the guest.
> 
> When restoring, there is a step called localise_page, which again is a
> nop for HVM guest. For PV guest, it does the reverse of normalise_page.
> It goes through all page table frames, extract all PFNs pointed to by
> PTEs in such frames, populate them, then reconstruct page tables.
> 
> What I discovered is that PTEs inside one page table frame contained the
> same PFN (something like fd42). The original implementation of toolstack
> populate_pfns didn't consider such scenario. As for what that PFN
> referred to, I wasn't sure and I didn't really care about that.
> 
> Let me know if you need more information.

I am somewhat amazed that this worked at all since finding multiple PTEs
with the same PFN in a match of 1024 pages would be really common.

But this bug would be avoided if the page table pages were more often
located towards the end of RAM such that the PFNs were already populated
as part of the normal page transfer.

I suppose it is possible that the Linux commit found by the bisector
results in changes to where page table pages are allocated.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Commit moratorium

2015-09-07 Thread Wei Liu
On Thu, Sep 03, 2015 at 05:25:17PM +0100, Wei Liu wrote:
> Committers,
> 
> Xen tree is going to branch at 4.6 RC3. I don't want to branch when
> master != staging, so please avoid committing new patches to staging now
> to let master catch up with staging. Another announcement will be made
> when the moratorium is lifted.
> 

Unfortunately we didn't get a push over the weekend. There is also a
critical bug discovered for migration v2.

I think we should commit the series to fix that migration v2 bug today,
as well as any other pending critical bug fixes.

Wei.

> Thanks
> Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 0/5] Migration v2 fix

2015-09-07 Thread Andrew Cooper

On 06/09/15 21:05, Wei Liu wrote:

Wei Liu (5):
   libxc: clearer migration v2 debug message
   libxc: migration v2 prefix Memory -> Frames
   libxc: fix indentation
   libxc: don't populate same pfn more than once in populate_pfns
   libxc: add assertion to avoid setting same bit more than once

  tools/libxc/xc_sr_restore.c|  7 ---
  tools/libxc/xc_sr_restore_x86_pv.c |  4 ++--
  tools/libxc/xc_sr_save.c   | 10 +-
  3 files changed, 11 insertions(+), 10 deletions(-)



Please test against Xen 4.5 as well.  I have looked at the legacy code, 
but can't convince myself either way as to whether the bug is present or 
not.


~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Draft C] Boot ABI for HVM guests without a device-model

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 11:34,  wrote:
> So AFAICS we have 3 options:
> 
> 1. Overload VCPUOP_initialise like it's done in the current series (v6).
> For PV guests the hypercall parameter is of type vcpu_guest_context,
> while for HVM guests the parameter is of type vcpu_hvm_context.
> 
> 2. Create a new hypercall (VCPUOP_hvm_initialise) only available to HVM
> guests, that only allows vcpu_hvm_context as a parameter.
> 
> 3. Deprecate current VCPUOP_initialise, introduce a new
> VCPUOP_initialise, that takes the following parameter:
> 
> union vcpu_context {
>   struct vcpu_guest_context pv_ctx;
>   struct vcpu_hvm_context hvm_ctx;
> };
> 
> TBH, I don't have an opinion between 2 and 3, but I would like to get a
> consensus before I start implementing any of those.

Let me first take a look at how your implementation of 1 looks like.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns

2015-09-07 Thread Juergen Gross

On 09/07/2015 11:53 AM, Jan Beulich wrote:

On 07.09.15 at 11:36,  wrote:

On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:

On 06.09.15 at 22:05,  wrote:

The original implementation of populate_pfns didn't consider the same
pfn can be present multiple times in the array. The mechanism to prevent
populating the same pfn multiple times only worked if the recurring pfn
appeared in different batches.

This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
which has several ptes pointing to same pfn, which results in an array
containing recurring pfn.


Since you must have debugged this, and since the bisector appears
to have fingered a patch of mine on the Linux side which triggered
this, would you mind explaining this a little more? In particular I'm
worried that this may point out some other bug in Linux, as in the
context of the change there - dealing with the 1:1 mapping - I can't
see a legitimate reason for multiple PTEs to reference the same PFN.



Sure. I can try to explain this as clear as possible. Note that I didn't
even look at Linux side changes because at that point I was sure there
was a bug in migration v2.

So there is a step called normalise_page in migration v2. It's nop for
HVM guest. For PV guest, it only cares about page table frames. To
normalise a page table frame, the core idea is to replace all MFNs in
page tables to PFNs inside the guest.

When restoring, there is a step called localise_page, which again is a
nop for HVM guest. For PV guest, it does the reverse of normalise_page.
It goes through all page table frames, extract all PFNs pointed to by
PTEs in such frames, populate them, then reconstruct page tables.

What I discovered is that PTEs inside one page table frame contained the
same PFN (something like fd42). The original implementation of toolstack
populate_pfns didn't consider such scenario. As for what that PFN
referred to, I wasn't sure and I didn't really care about that.


That's unfortunate, as that's precisely the information I was after,
since - as said - taking the repetition of the same PFN together with
what the triggering Linux change is about, it smells like there's
something wrong on the Linux side too. Do you at least recall how
many times that same PFN got repeated?


The linear p2m list support introduced this behaviour. Instead of having
multiple copies of identical p2m pages (e.g. for all entries of the page
being ~0UL) only one such page is existing which is mapped multiple
times in the linear p2m list. This will happen for large regions (2 MB
aligned) of either identity mapped or invalid pfns.

In domUs we see such a scenario rather rarely as it would require either
large memory holes or large identity regions. You might have introduced
the latter.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 11:59,  wrote:
> But this bug would be avoided if the page table pages were more often
> located towards the end of RAM such that the PFNs were already populated
> as part of the normal page transfer.

Ah, good point.

> I suppose it is possible that the Linux commit found by the bisector
> results in changes to where page table pages are allocated.

It certainly does.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 0/5] Migration v2 fix

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 11:04:25AM +0100, Andrew Cooper wrote:
> On 06/09/15 21:05, Wei Liu wrote:
> >Wei Liu (5):
> >   libxc: clearer migration v2 debug message
> >   libxc: migration v2 prefix Memory -> Frames
> >   libxc: fix indentation
> >   libxc: don't populate same pfn more than once in populate_pfns
> >   libxc: add assertion to avoid setting same bit more than once
> >
> >  tools/libxc/xc_sr_restore.c|  7 ---
> >  tools/libxc/xc_sr_restore_x86_pv.c |  4 ++--
> >  tools/libxc/xc_sr_save.c   | 10 +-
> >  3 files changed, 11 insertions(+), 10 deletions(-)
> >
> 
> Please test against Xen 4.5 as well.  I have looked at the legacy code, but
> can't convince myself either way as to whether the bug is present or not.
> 

DYM cross version migration from 4.5 to 4.6 or 32bit Linux 4.1 save /
restore with Xen 4.5?

> ~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 0/5] Migration v2 fix

2015-09-07 Thread Andrew Cooper



On 07/09/15 11:07, Wei Liu wrote:

On Mon, Sep 07, 2015 at 11:04:25AM +0100, Andrew Cooper wrote:

On 06/09/15 21:05, Wei Liu wrote:

Wei Liu (5):
   libxc: clearer migration v2 debug message
   libxc: migration v2 prefix Memory -> Frames
   libxc: fix indentation
   libxc: don't populate same pfn more than once in populate_pfns
   libxc: add assertion to avoid setting same bit more than once

  tools/libxc/xc_sr_restore.c|  7 ---
  tools/libxc/xc_sr_restore_x86_pv.c |  4 ++--
  tools/libxc/xc_sr_save.c   | 10 +-
  3 files changed, 11 insertions(+), 10 deletions(-)


Please test against Xen 4.5 as well.  I have looked at the legacy code, but
can't convince myself either way as to whether the bug is present or not.


DYM cross version migration from 4.5 to 4.6 or 32bit Linux 4.1 save /
restore with Xen 4.5?


Just plain legacy migration with Linux 4.1, to see whether legacy 
suffers from the same issue.  If it does, we should probably backport a 
legacy fix.


~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Question about the status of vNUMA in Xen

2015-09-07 Thread Wei Liu
On Sun, Sep 06, 2015 at 03:03:13PM +0800, 甘清甜 wrote:
> I'm a master student from Huazhong University of Science & technology,
> China. I'm now researching on Xen optimization under NUMA. I'm puzzled
> that if  vNUMA on HVM is enbaded in Xen or not. If yes, which version of
> xen support it?
> 

Upcoming Xen 4.6 supports vNUMA for HVM guest, with certain limitations.

> So far, I have read the Xen NUMA Roadmap page and watched the
> video about vNUMA in Xen on Youtube. I have known that some work
> has been done for vNUMA in Xen. Since my benchmark result show a
> great improvement in Xen-4.5.1 when compared with Xen-4.0.1. I'm
> puzzled if vNUMA has contributed to that improvement.

No. That's mostly due to other improvements.

Wei.

> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 02/18] Add cmpxchg16b support for x86-64

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 08:07,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, September 04, 2015 10:23 PM
>> >>> On 25.08.15 at 03:57,  wrote:
>> > ---
>> > v6:
>> > - Fix a typo
>> >
>> > v5:
>> > - Change back the parameters of __cmpxchg16b() to __uint128_t *
>> > - Remove pointless cast for 'ptr'
>> > - Remove pointless parentheses
>> > - Use A constraint for the output
>> 
>> There are a few comments I had on v4 which neither have been
>> addressed verbally nor have got incorporated into the patch. See
>> http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg04694.html 
> 
> Do you mean I missed the following part in the above link?
> 
> " But I think now I recall that it may have been me asking for using
> void pointers here: That way one can pass pointers to other types.
> But then you would better cast to void * here, and have the inline
> function have properly typed pointers. And it should go without
> saying that if you cast (or implicitly convert to void), you should
> validate that what your caller handed you is at least the size of a
> __uint128_t (and for "ptr" perhaps also, as I think Andrew already
> pointed out, that it is suitably aligned). "
> 
> If that is the case, what about the changes below?
> 
> #define cmpxchg16b(ptr,o,n)  \
> ASSERT((unsigned long)ptr & 0xF == 0);  \
> ASSERT(sizeof(*o) == sizeof(__uint128_t))\
> ASSERT(sizeof(*n) == sizeof(__uint128_t))\
> __cmpxchg16b((ptr), (void *)(o), (void *)(n))

Along those lines, but with BUILD_BUG_ON() and properly
parenthesized please.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Commit moratorium

2015-09-07 Thread Ian Campbell
On Mon, 2015-09-07 at 11:01 +0100, Wei Liu wrote:
> On Thu, Sep 03, 2015 at 05:25:17PM +0100, Wei Liu wrote:
> > Committers,
> > 
> > Xen tree is going to branch at 4.6 RC3. I don't want to branch when
> > master != staging, so please avoid committing new patches to staging 
> > now
> > to let master catch up with staging. Another announcement will be made
> > when the moratorium is lifted.
> > 
> 
> Unfortunately we didn't get a push over the weekend. There is also a
> critical bug discovered for migration v2.
> 
> I think we should commit the series to fix that migration v2 bug today,

OK, will look into it shortly.

> as well as any other pending critical bug fixes.

If there are any then someone will need to point me at a specific set of
patches to commit if they want me to do it.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 02/18] Add cmpxchg16b support for x86-64

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 08:32,  wrote:
>> From: Wu, Feng
>> Sent: Sunday, September 06, 2015 2:07 PM
>> If that is the case, what about the changes below?
>> 
>> #define cmpxchg16b(ptr,o,n)
>> \
>> ASSERT((unsigned long)ptr & 0xF == 0);
>> \
>> ASSERT(sizeof(*o) == sizeof(__uint128_t))
>> \
>> ASSERT(sizeof(*n) == sizeof(__uint128_t))
>> \
>> __cmpxchg16b((ptr), (void *)(o), (void *)(n))
> 
> Seems there is a build error with this change, we cannot
> add stuff before __cmpxchg() since it needs return some
> value to the caller. Any suggestion here?

You of course first need to make this proper C (missing semicolons)
and convert it to a construct that is just a single statement (if
necessary using gcc's compound expression extension).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Found memory leak in libxenstore

2015-09-07 Thread Wei Liu
I only had a brief look at this.

On Mon, Sep 07, 2015 at 09:52:08AM +, Sunguodong wrote:
> Hi,
> I found this memory leak on xen-4.5.0:
> linux-byXjTX:~ # virsh version
> Compiled against library: libvirt 1.2.17
> Using library: libvirt 1.2.17
> Using API: Xen 1.2.17
> Running hypervisor: Xen 4.5.0
> 
> Steps to produce this leak:
> 
> 1.   Start a vm, win7_64_2U_vhd, for example.
> 
> 2.   Stop libvirtd and start it with valgrind, using the following 
> command:
> 
> valgrind --tool=memcheck --log-file=xs.log --leak-check=full 
> --show-reachable=yes --track-origins=yes --trace-children=yes --verbose 
> libvirtd -d -l
> 
> 3.   Wait until libvirtd started, then reboot the vm:
> 
> virsh reboot win7_64_2U_vhd
> 
> 4.   After vm restarted, kill valgind with signal 3:
> 
> kill -3 27483 (27483 is the pid of valgrind)
> 
> 5.   Open xs.log, search 'definitely', we will always find this leak:
> 
> ==28989== 40 bytes in 1 blocks are definitely lost in loss record 327 of 585
> 
> ==28989==at 0x4C27B9B: malloc (vg_replace_malloc.c:263)
> 
> ==28989==by 0x7EBD618: read_message (xs.c:1146)
> 
> ==28989==by 0x7EBE8C6: read_thread (xs.c:1222)
> 
> ==28989==by 0x74F8805: start_thread (in /lib64/libpthread-2.11.3.so)
> 
> ==28989==by 0x77EC66C: clone (in /lib64/libc-2.11.3.so)
> 
> When we reboot vm twice, this leak would happen twice either. Could anyone 
> verify this and fix it?

Those messages are freed after the connection is closed. Are you sure
they are still there after you close the connection?

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 02/18] Add cmpxchg16b support for x86-64

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 08:07,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, September 04, 2015 10:23 PM
>> >>> On 25.08.15 at 03:57,  wrote:
>> > ---
>> > v6:
>> > - Fix a typo
>> >
>> > v5:
>> > - Change back the parameters of __cmpxchg16b() to __uint128_t *
>> > - Remove pointless cast for 'ptr'
>> > - Remove pointless parentheses
>> > - Use A constraint for the output
>> 
>> There are a few comments I had on v4 which neither have been
>> addressed verbally nor have got incorporated into the patch. See
>> http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg04694.html 
> 
> Do you mean I missed the following part in the above link?
> 
> " But I think now I recall that it may have been me asking for using
> void pointers here: That way one can pass pointers to other types.
> But then you would better cast to void * here, and have the inline
> function have properly typed pointers. And it should go without
> saying that if you cast (or implicitly convert to void), you should
> validate that what your caller handed you is at least the size of a
> __uint128_t (and for "ptr" perhaps also, as I think Andrew already
> pointed out, that it is suitably aligned). "

And for the record - I was referring to more than just that (in
particular the need for a memory clobber). Please, prior to asking
back, check what parts of the comments you actually did take care
of, and deal with _everything_ you didn't.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Found memory leak in libxenstore

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 11:37:48AM +0100, Wei Liu wrote:
> I only had a brief look at this.
> 
> On Mon, Sep 07, 2015 at 09:52:08AM +, Sunguodong wrote:
> > Hi,
> > I found this memory leak on xen-4.5.0:
> > linux-byXjTX:~ # virsh version
> > Compiled against library: libvirt 1.2.17
> > Using library: libvirt 1.2.17
> > Using API: Xen 1.2.17
> > Running hypervisor: Xen 4.5.0
> > 
> > Steps to produce this leak:
> > 
> > 1.   Start a vm, win7_64_2U_vhd, for example.
> > 
> > 2.   Stop libvirtd and start it with valgrind, using the following 
> > command:
> > 
> > valgrind --tool=memcheck --log-file=xs.log --leak-check=full 
> > --show-reachable=yes --track-origins=yes --trace-children=yes --verbose 
> > libvirtd -d -l
> > 
> > 3.   Wait until libvirtd started, then reboot the vm:
> > 
> > virsh reboot win7_64_2U_vhd
> > 
> > 4.   After vm restarted, kill valgind with signal 3:
> > 
> > kill -3 27483 (27483 is the pid of valgrind)
> > 
> > 5.   Open xs.log, search 'definitely', we will always find this leak:
> > 
> > ==28989== 40 bytes in 1 blocks are definitely lost in loss record 327 of 585
> > 
> > ==28989==at 0x4C27B9B: malloc (vg_replace_malloc.c:263)
> > 
> > ==28989==by 0x7EBD618: read_message (xs.c:1146)
> > 
> > ==28989==by 0x7EBE8C6: read_thread (xs.c:1222)
> > 
> > ==28989==by 0x74F8805: start_thread (in /lib64/libpthread-2.11.3.so)
> > 
> > ==28989==by 0x77EC66C: clone (in /lib64/libc-2.11.3.so)
> > 
> > When we reboot vm twice, this leak would happen twice either. Could anyone 
> > verify this and fix it?
> 
> Those messages are freed after the connection is closed. Are you sure
> they are still there after you close the connection?
> 

Another theory is that libvirt never closes the connection. I do think
this is a bit undesirable (that, messages are not freed until connection
is closed), but unfortunately I don't have time for this until 4.6 is
out.

Wei.

> Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 04/18] vt-d: VT-d Posted-Interrupts feature detection

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 03:49,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, September 04, 2015 10:31 PM
>> >>> On 25.08.15 at 03:57,  wrote:
>> > --- a/xen/drivers/passthrough/vtd/iommu.c
>> > +++ b/xen/drivers/passthrough/vtd/iommu.c
>> > @@ -2079,6 +2079,9 @@ static int init_vtd_hw(void)
>> >  disable_intremap(drhd->iommu);
>> >  }
>> >
>> > +if ( !iommu_intremap )
>> > +iommu_intpost = 0;
>> 
>> Why is this being repeated here? iommu_setup() is already taking
>> care of this thanks to the earlier patch.
> 
> Here is my original thought, we have two path to get here as below:
> 
> iommu_setup() -> ... -> int_vtd_hw()
> vtd_resume() -> ... -> int_vtd_hw()
> 
> I added the logic here to cover the 'vtd_resume()' case, however,
> after thinking more, seems I don't need to do this, because the
> logic has been done before resume, it should be there after back,
> right?

Obviously any assignment to variables like this one should be benign
(as in not change the variable's value) during resume, or else it'll be
very likely for other stuff to get out of sync and hence broken.

>> > @@ -2176,6 +2179,14 @@ int __init intel_vtd_setup(void)
>> >  if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
>> >  iommu_intremap = 0;
>> >
>> > +/*
>> > + * We cannot use posted interrupt if X86_FEATURE_CX16 is
>> > + * not supported, since we count on this feature to
>> > + * atomically update 16-byte IRTE in posted format.
>> > + */
>> > +if ( !iommu_intremap || !cap_intr_post(iommu->cap)
>> || !cpu_has_cx16 )
>> > +iommu_intpost = 0;
>> 
>> You should check iommu_intpost instead of iommu_intremap to
>> match the other checks above here.
> 
> Here I followed your advice:
> http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg01261.html 
> 
> I think the following code should be ok here though, if you like it, I can 
> change back.
> 
> +if ( !iommu_intpost || !cap_intr_post(iommu->cap)
> +|| !cpu_has_cx16 )
> +iommu_intpost = 0;

The first part is neither in line with the earlier adjustments (as _still_
see in the patch context above) not does it make much sense to set
a variable to zero when it already is zero.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Found memory leak in libxenstore

2015-09-07 Thread Ian Campbell
On Mon, 2015-09-07 at 11:37 +0100, Wei Liu wrote:
> I only had a brief look at this.
> 
> On Mon, Sep 07, 2015 at 09:52:08AM +, Sunguodong wrote:
> > Hi,
> > I found this memory leak on xen-4.5.0:
> > linux-byXjTX:~ # virsh version
> > Compiled against library: libvirt 1.2.17
> > Using library: libvirt 1.2.17
> > Using API: Xen 1.2.17
> > Running hypervisor: Xen 4.5.0
> > 
> > Steps to produce this leak:
> > 
> > 1.   Start a vm, win7_64_2U_vhd, for example.
> > 
> > 2.   Stop libvirtd and start it with valgrind, using the following 
> > command:
> > 
> > valgrind --tool=memcheck --log-file=xs.log --leak-check=full --show
> > -reachable=yes --track-origins=yes --trace-children=yes --verbose 
> > libvirtd -d -l
> > 
> > 3.   Wait until libvirtd started, then reboot the vm:
> > 
> > virsh reboot win7_64_2U_vhd
> > 
> > 4.   After vm restarted, kill valgind with signal 3:
> > 
> > kill -3 27483 (27483 is the pid of valgrind)
> > 
> > 5.   Open xs.log, search 'definitely', we will always find this 
> > leak:
> > 
> > ==28989== 40 bytes in 1 blocks are definitely lost in loss record 327 
> > of 585
> > 
> > ==28989==at 0x4C27B9B: malloc (vg_replace_malloc.c:263)
> > 
> > ==28989==by 0x7EBD618: read_message (xs.c:1146)
> > 
> > ==28989==by 0x7EBE8C6: read_thread (xs.c:1222)
> > 
> > ==28989==by 0x74F8805: start_thread (in /lib64/libpthread
> > -2.11.3.so)
> > 
> > ==28989==by 0x77EC66C: clone (in /lib64/libc-2.11.3.so)
> > 
> > When we reboot vm twice, this leak would happen twice either. Could 
> > anyone verify this and fix it?
> 
> Those messages are freed after the connection is closed. Are you sure
> they are still there after you close the connection?

It seems like libvirtd would be killed ungraciously by killing valgrind in
step 4 above, so it would never close down these connections and cleanup
the memory used by them.

A better test would be to send libvirtd whatever signal causes it to exit
gracefully.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 06/18] vmx: Add some helper functions for Posted-Interrupts

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 04:05,  wrote:

> 
>> -Original Message-
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, September 04, 2015 10:40 PM
>> To: Wu, Feng
>> Cc: Andrew Cooper; Tian, Kevin; xen-devel@lists.xen.org; Keir Fraser
>> Subject: Re: [PATCH v6 06/18] vmx: Add some helper functions for
>> Posted-Interrupts
>> 
>> >>> On 25.08.15 at 03:57,  wrote:
>> > @@ -121,11 +122,31 @@ static inline int pi_test_and_clear_on(struct
>> pi_desc *pi_desc)
>> >  return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
>> >  }
>> >
>> > +static inline int pi_test_on(struct pi_desc *pi_desc)
>> > +{
>> > +return test_bit(POSTED_INTR_ON, &pi_desc->control);
>> > +}
>> 
>> For this and ...
>> 
>> > +static inline int pi_test_sn(struct pi_desc *pi_desc)
>> > +{
>> > +return test_bit(POSTED_INTR_SN, &pi_desc->control);
>> > +}
>> 
>> ... this I wonder whether using the bitfield you defined in the
>> previous patch wouldn't allow the compiler more freedom in
>> how to carry this out.
> 
> I am sorry, I don't quite understand it. Do you mean: the bitfield
> defined in previous patch is pointless, or using the bitfield here?

Use it here would seem preferable. (But please recall that I
questioned this two fold access model - partly using bitfields,
partly using bitops - earlier on, and ideally _all_ accesses to
a certain kind of data structure would follow a single, uniform
model.)

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 07/18] vmx: Initialize VT-d Posted-Interrupts Descriptor

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 04:22,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, September 04, 2015 10:47 PM
>> >>> On 25.08.15 at 03:57,  wrote:
>> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> > @@ -39,6 +39,7 @@
>> >  #include 
>> >  #include 
>> >  #include 
>> > +#include 
>> 
>> I can't seem to spot what this is needed for.
> 
> It is for ' x2apic_enabled '.

Well, not nice to include the header here, but what do we do...

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 5

2015-09-07 Thread Stefano Stabellini
On Mon, 7 Sep 2015, Shannon Zhao wrote:
> On 2015/9/2 20:18, Ian Campbell wrote:
> > On Fri, 2015-08-28 at 17:45 +0800, Shannon Zhao wrote:
> >>
> >> 1. Create minimal DT to pass required information to Dom0
> >> --
> >> The UEFI stub is a feature that extends the Image/zImage into a valid
> >> UEFI PE/COFF executable, including a loader application that makes it
> >> possible to load the kernel directly from the UEFI shell, boot menu, or
> >> one of the lightweight bootloaders like Gummiboot or rEFInd.
> >> The kernel image built with stub support remains a valid kernel image
> >> for booting in non-UEFI environments and the UEFI stub will be jumped
> >> over for non-UEFI environments.
> >>
> >> When booting in UEFI mode, the UEFI stub will create a minimal DT in
> >> order to pass the command line and other informations (such as the EFI
> >> memory table) to the kernel. And when booting with ACPI, kernel will get
> >> command line, ACPI root table address and memory map information from
> >> the minimal DT. Also, it will check if the DT contains only the /chosen
> >> node to know whether it boots with DT or ACPI.
> >>
> >> In addition, the current names of these properties with a "linux,"
> >> prefix in the minimal DT are Linux specified. It needs to standardize
> >> them so that other OS(such as FreeBSD) could reuse them in the future.
> > 
> > I mentioned this just now in a reply to an older revision while I was
> > catching up on my mail backlog but I think it is important enough to
> > reiterate here on the currently latest version:
> > 
> > We need to discuss this possible standardisation of (some derivative of)
> > this Linux internal interfaces in the appropriate forums ASAP and come to a
> > wider agreement that it is acceptable than just here amongst us Xen people.
> > 
> > A large part of this design is predicated on this and we don't want to get
> > too far down this path only to discover the rest of the world says "No,
> > thanks".
> > 
> > See my earlier reply at 
> > http://lists.xen.org/archives/html/xen-devel/2015-09/msg00189.html for some
> > thoughts as to who we should be talking to.
> > 
> 
> Can we start with the patch for dropping the prefix "linux," and send it
> to Linux kernel ML, Linux arm kernel ML, devicetree-spec and BSD ML?
> Explain why it needs to do the change and standardization. Base on this,
> we could discuss it with other people from the related fields.

It looks like a good way forward to me

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v2 0/4] HVM x86 deprivileged mode operations

2015-09-07 Thread Ben Catterall



On 03/09/15 17:15, David Vrabel wrote:

On 03/09/15 17:01, Ben Catterall wrote:


Intel Intel 2.2GHz Xeon E5-2407 0 processor:

1.55e-06 seconds was the average time for performing the write without the
  deprivileged code running.

5.75e-06 seconds was the average time for performing the write with the
  deprivileged code running.

So approximately 351% overhead

AMD Opteron 2376:
-
1.74e-06 seconds was the average time for performing the write without the
  deprivileged code running.
3.10e-06 seconds was the average time for performing the write with an entry and
  exit from deprvileged mode.

So approximately 178% overhead.


How does this compare to the overhead of passing the I/O through to qemu?

So, passing this portio op through to QEMU takes roughly 20e-6 seconds. 
However, I don't know if the emulator would have gone and prodded a 
physical port as port of that which could skew the result. I'll look 
into this and get back if I find anything to clarify this.


Ben

David



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 08/18] vmx: Suppress posting interrupts when 'SN' is set

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 04:33,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, September 04, 2015 10:53 PM
>> >>> On 25.08.15 at 03:57,  wrote:
>> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > @@ -1701,8 +1701,36 @@ static void vmx_deliver_posted_intr(struct vcpu
>> *v, u8 vector)
>> >   */
>> >  pi_set_on(&v->arch.hvm_vmx.pi_desc);
>> >  }
>> > -else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
>> > +else
>> >  {
>> > +struct pi_desc old, new, prev;
>> > +
>> > +/* To skip over first check in the loop below. */
>> > +prev.control = 0;
>> 
>> Why don't you just read the field instead of adding the comment?
> 
> What do you mean by "read the field"? Could you please elaborate it?

prev.control = v->arch.hvm_vmx.pi_desc.control;

Jan



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 11/18] vt-d: Add API to update IRTE when VT-d PI is used

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 07:24,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, September 04, 2015 11:11 PM
>> >>> On 25.08.15 at 03:57,  wrote:
>> > --- a/xen/drivers/passthrough/vtd/intremap.c
>> > +++ b/xen/drivers/passthrough/vtd/intremap.c
>> > @@ -899,3 +899,110 @@ void iommu_disable_x2apic_IR(void)
>> >  for_each_drhd_unit ( drhd )
>> >  disable_qinval(drhd->iommu);
>> >  }
>> > +
>> > +static void setup_posted_irte(struct iremap_entry *new_ire,
>> > +const struct pi_desc *pi_desc, const uint8_t gvec)
>> > +{
>> > +new_ire->post.urg = 0;
>> > +new_ire->post.vector = gvec;
>> > +new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
>> > +new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
>> > +
>> > +new_ire->post.res_1 = 0;
>> > +new_ire->post.res_2 = 0;
>> > +new_ire->post.res_3 = 0;
>> > +new_ire->post.res_4 = 0;
>> > +new_ire->post.res_5 = 0;
>> > +
>> > +new_ire->post.im = 1;
>> > +}
>> 
>> I think it would be better to just clear out the structure first. This
>> may then also allow for no longer naming all of the bitfield res_*
>> member, making it even more obvious that they're reserved. Or
>> wait - looking at the use below you seem to be updating the
>> descriptor. Why would you need to zero out reserved fields in
>> that case?
> 
> Here we first get the IRTE from hardware, which may be in
> remapped format, we still need some of the information in it
> after updating it to posted format, such as, fpd, sid, etc. So
> I cannot clear it here. Besides that, there are some fields which
> are needed in remapped format are defined as reserved in posted
> format, I need to clear the reserved field one by one if we get a
> remapped format from the hardware, here I just simply clear it
> for all the cases and it is not a frequent operations, I think it is
> not a big deal.

While perhaps indeed not a big deal performance wise, I would
suppose you at least partly agree that the code above looks
ugly. I'd much rather see made explicit which fields you re-use
than which (reserved) fields you clear. I.e. start out from a
zeroed structure, copy over all fields from the original which you
want to retain, and fill in any non-reserved fields you have to
give new values.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 17/31] xen/arm: ITS: Store LPIs allocated and IRQ ID bits per domain

2015-09-07 Thread Julien Grall
Hi Vijay,

On 07/09/15 07:59, Vijay Kilari wrote:
> On Thu, Sep 3, 2015 at 9:55 PM, Julien Grall  wrote:
>> I still don't think that it makes sense to introduce the number of LPIs
>> variable in a patch for vITS. As you describe it, it should be part of
>> an ITS/GICv3 patch.
>>
>> Although, I think you should use the field nr_irq_ids in the gic
>> structure (see patch #10) to avoid problem you currently have with this
>> solution.
>>
>> For instance, gic_is_lpi is relying on the number of ID bits exposed by
>> the hardware but you allocate the IRQ desc for LPIs based on nr_lpis.
>>
> 
> You mean to restrict nr_irq_ids to FIRST_GIC_LPI + 8192 (nr_lpis)
> instead of HW supported value?
> and let gic_is_lpi() always check for Xen supported lpi number instead of hw
> supported value?

Yes for both.

>> It's fine to restrict the value in the GIC compare to hardware.
>>
>> Furthermore this value should be 0 on platform where LPIs is not
>> supported in order to make confusion and introduce possible problem with
>> the code later. I could add that, AFAICT, this new variable (nr_lpis) is
>> not that often within the code.
> 
>  nr_lpis is global variable that we define to tell number of LPIs supported
> by Xen. If platform does not support LPIs, then vgic.nr_lpis is set to 0.

While you set vgic.nr_lpis to 0, you don't clear nr_lpis. So
technically, someone could decide to use it later but it won't report
the correct value

It was difficult to find the correct place to comment about this given
that you've introduced the variable in an odd place

> 
>>
>> Lastly, on a previous version (don't remember which one) we said that
>> the user should be able to change the value on the command line. What's
>> your plan for that?
> 
> Yes, it was part of our discussion on chat. I will fix it.

A summarize would have been nice... like a TODO in the code and/or in
the cover letter. Depends when you plan to handle it.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 13/18] Update IRTE according to guest interrupt config changes

2015-09-07 Thread Jan Beulich
>>> On 06.09.15 at 06:54,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, September 04, 2015 11:59 PM
>> >>> On 25.08.15 at 03:57,  wrote:
>> 
>> First of all - an empty Cc list on a patch is suspicious.
> 
> I did Cc you for this patch. Why do you say "an empty Cc list"?

Oops, sorry - this really belongs to patch 12.

>> > +static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t
>> dest_id,
>> > +  bool_t dest_mode, uint8_t
>> delivery_mode,
>> > +  uint8_t gvec)
>> > +{
>> > +unsigned int dest_vcpus = 0;
>> > +struct vcpu *v, *dest = NULL;
>> > +
>> > +if ( delivery_mode == dest_LowestPrio )
>> > +return vector_hashing_dest(d, dest_id, dest_mode, gvec);
>> 
>> So at this point delivery_mode == dest_Fixed, right?
> 
> It won't be dest_LowestPrio here, so it might be proper to add
> else if (delivery_mode == dest_Fixed) for the code below. So the
> question is: Can deliver modes other than lowest priority and fixed,
> such as NMI, SMI, etc. hit here? Any ideas?

That's for you to validate. If it can't, add an ASSERT(). If it can, use
an if() as you suggest.

>> > +for_each_vcpu ( d, v )
>> > +{
>> > +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL,
>> APIC_DEST_NOSHORT,
>> > +dest_id, dest_mode) )
>> > +continue;
>> > +
>> > +dest_vcpus++;
>> > +dest = v;
>> > +}
>> > +
>> > +/*
>> > + * For fixed destination, we only handle single-destination
>> > + * interrupts.
>> > + */
>> > +if ( dest_vcpus == 1 )
>> > +return dest;
>> 
>> Is it thus even possible for the if() condition to be false? 
> 
> It can be false if the interrupts has multiple destination with Fixed
> deliver mode.
> 
>> If it isn't,
>> returning early from the loop would seem the better option. And
>> even if it is, I would question whether delivering the interrupt to
>> the first match isn't going to be better than dropping it.
> 
> Here, if we deliver all the interrupts to the first match, only this
> vCPU will receive all the interrupts, even though the irq affinity
> shows it has multiple destination. I don't think this is correct.
> Furthermore, is there any performance issues if the interrupt frequency
> is high and the matched vCPU cannot handle them in time? So here, I
> just leave these kind of interrupts to legacy interrupt remapping
> mechanism.

Oh, okay, if they're not getting lost, that's fine then.

>> >  dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>> >  pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>> >  spin_unlock(&d->event_lock);
>> >  if ( dest_vcpu_id >= 0 )
>> >  hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
>> > +
>> > +/* Use interrupt posting if it is supported. */
>> > +if ( iommu_intpost )
>> > +{
>> > +const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest,
>> dest_mode,
>> > +  delivery_mode,
>> pirq_dpci->gmsi.gvec);
>> > +
>> > +if ( vcpu )
>> > +{
>> > +rc = pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
>> > +if ( unlikely(rc) )
>> > +dprintk(XENLOG_G_INFO,
>> > +"%pv: failed to update PI IRTE,
>> gvec:%02x\n",
>> > +vcpu, pirq_dpci->gmsi.gvec);
>> 
>> Even if only a debug build printk() - aren't you afraid that if this
>> ever triggers, it will trigger a lot? And hence be pretty useless?
> 
> I think it reaches this debug printk rarely, but a least, when it is really 
> failed, it
> can give people some hints about why we are using interrupt remapping 
> instead
> of interrupt posing for the external interrupts.

I understand your motivation, but you don't really answer my
question. (And btw., if you really mean "rarely", then there's a bug
somewhere that you need to fix. It should _never_ trigger if
everything is working correctly.)

>> > +}
>> 
>> (Not only) depending on the answer, I'd consider adding an "else
>> printk()" here.
> 
> So do you mean something like this:
> 
> if ( vcpu )
> {
> rc = pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
> if ( unlikely(rc) )
> dprintk(XENLOG_G_INFO,
> "%pv: failed to update PI IRTE, gvec:%02x, use 
> interrupt remapping\n",
> vcpu, pirq_dpci->gmsi.gvec);
> else
> dprintk(XENLOG_G_INFO,
> "%pv: Succeed to update PI IRTE, gvec:%02x, use 
> interrupt posting\n",
> vcpu, pirq_dpci->gmsi.gvec);
> }

No. I intentionally placed my question _after_ the closing brace, i.e.
I suggested an "else" to the outer "if".

Jan

___

Re: [Xen-devel] [PATCH for 4.6 v3 0/5] Migration v2 fix

2015-09-07 Thread Ian Campbell
On Sun, 2015-09-06 at 21:05 +0100, Wei Liu wrote:
> Wei Liu (5):
>   libxc: clearer migration v2 debug message
>   libxc: migration v2 prefix Memory -> Frames
>   libxc: fix indentation
>   libxc: don't populate same pfn more than once in populate_pfns
>   libxc: add assertion to avoid setting same bit more than once

All applied.

> 
>  tools/libxc/xc_sr_restore.c|  7 ---
>  tools/libxc/xc_sr_restore_x86_pv.c |  4 ++--
>  tools/libxc/xc_sr_save.c   | 10 +-
>  3 files changed, 11 insertions(+), 10 deletions(-)
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen-blkback: free requests on disconnection

2015-09-07 Thread Julien Grall
On 07/09/15 07:07, Bob Liu wrote:
> Hi Julien,

Hi Bob,

> On 09/04/2015 09:51 PM, Julien Grall wrote:
>> Hi Roger,
>>
>> On 04/09/15 11:08, Roger Pau Monne wrote:
>>> Request allocation has been moved to connect_ring, which is called every
>>> time blkback connects to the frontend (this can happen multiple times during
>>> a blkback instance life cycle). On the other hand, request freeing has not
>>> been moved, so it's only called when destroying the backend instance. Due to
>>> this mismatch, blkback can allocate the request pool multiple times, without
>>> freeing it.
>>>
>>> In order to fix it, move the freeing of requests to xen_blkif_disconnect to
>>> restore the symmetry between request allocation and freeing.
>>>
>>> Reported-by: Julien Grall 
>>> Signed-off-by: Roger Pau Monné 
>>> Cc: Julien Grall 
>>> Cc: Konrad Rzeszutek Wilk 
>>> Cc: Boris Ostrovsky 
>>> Cc: David Vrabel 
>>> Cc: xen-de...@lists.xenproject.org
>>
>> The patch is fixing my problem when using UEFI in the guest. Thank you!
>>
> 
> Could you please explain the problem you met a bit more?
> So that I can know back port this patch if met similar issue.

This is related to commit 86839c56dee28c315a4c19b7bfee450ccd84cd25
"xen/block: add multi-page ring support" (Roger, it may be worth to
indicate the offending commit in you commit message).

When starting a guest using UEFI. After the domain is destroyed I get
the following warning from blkback:


[ cut here ]
WARNING: CPU: 2 PID: 95 at
/home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
xen_blkif_deferred_free+0x1f4/0x1f8()
Modules linked in:
CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: GW   4.2.0 #85
Hardware name: APM X-Gene Mustang board (DT)
Workqueue: events xen_blkif_deferred_free
Call trace:
[] dump_backtrace+0x0/0x124
[] show_stack+0x10/0x1c
[] dump_stack+0x78/0x98
[] warn_slowpath_common+0x9c/0xd4
[] warn_slowpath_null+0x14/0x20
[] xen_blkif_deferred_free+0x1f0/0x1f8
[] process_one_work+0x160/0x3b4
[] worker_thread+0x140/0x494
[] kthread+0xd8/0xf0
---[ end trace 6f859b7883c88cdd ]---

This is because the allocation of the requests are done during the
connection but the free is done when the domain is destroyed. Therefore
if the domain is re-initializing the connection (because UEFI or PV Grub
is used), the request won't be free and kept until the end.

So I think this should be backported in Linux 4.2 where the patch has
been introduced.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6 v3 0/5] Migration v2 fix

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 11:10:24AM +0100, Andrew Cooper wrote:
> 
> 
> On 07/09/15 11:07, Wei Liu wrote:
> >On Mon, Sep 07, 2015 at 11:04:25AM +0100, Andrew Cooper wrote:
> >>On 06/09/15 21:05, Wei Liu wrote:
> >>>Wei Liu (5):
> >>>   libxc: clearer migration v2 debug message
> >>>   libxc: migration v2 prefix Memory -> Frames
> >>>   libxc: fix indentation
> >>>   libxc: don't populate same pfn more than once in populate_pfns
> >>>   libxc: add assertion to avoid setting same bit more than once
> >>>
> >>>  tools/libxc/xc_sr_restore.c|  7 ---
> >>>  tools/libxc/xc_sr_restore_x86_pv.c |  4 ++--
> >>>  tools/libxc/xc_sr_save.c   | 10 +-
> >>>  3 files changed, 11 insertions(+), 10 deletions(-)
> >>>
> >>Please test against Xen 4.5 as well.  I have looked at the legacy code, but
> >>can't convince myself either way as to whether the bug is present or not.
> >>
> >DYM cross version migration from 4.5 to 4.6 or 32bit Linux 4.1 save /
> >restore with Xen 4.5?
> 
> Just plain legacy migration with Linux 4.1, to see whether legacy suffers
> from the same issue.  If it does, we should probably backport a legacy fix.
> 

Xen 4.5.1 works fine when migrating 32 bit Linux 4.1 guest.

Wei.

> ~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen-blkback: free requests on disconnection

2015-09-07 Thread Bob Liu

On 09/07/2015 07:10 PM, Julien Grall wrote:
> On 07/09/15 07:07, Bob Liu wrote:
>> Hi Julien,
> 
> Hi Bob,
> 
>> On 09/04/2015 09:51 PM, Julien Grall wrote:
>>> Hi Roger,
>>>
>>> On 04/09/15 11:08, Roger Pau Monne wrote:
 Request allocation has been moved to connect_ring, which is called every
 time blkback connects to the frontend (this can happen multiple times 
 during
 a blkback instance life cycle). On the other hand, request freeing has not
 been moved, so it's only called when destroying the backend instance. Due 
 to
 this mismatch, blkback can allocate the request pool multiple times, 
 without
 freeing it.

 In order to fix it, move the freeing of requests to xen_blkif_disconnect to
 restore the symmetry between request allocation and freeing.

 Reported-by: Julien Grall 
 Signed-off-by: Roger Pau Monné 
 Cc: Julien Grall 
 Cc: Konrad Rzeszutek Wilk 
 Cc: Boris Ostrovsky 
 Cc: David Vrabel 
 Cc: xen-de...@lists.xenproject.org
>>>
>>> The patch is fixing my problem when using UEFI in the guest. Thank you!
>>>
>>
>> Could you please explain the problem you met a bit more?
>> So that I can know back port this patch if met similar issue.
> 
> This is related to commit 86839c56dee28c315a4c19b7bfee450ccd84cd25
> "xen/block: add multi-page ring support" (Roger, it may be worth to
> indicate the offending commit in you commit message).
> 
> When starting a guest using UEFI. After the domain is destroyed I get
> the following warning from blkback:
> 
> 
> [ cut here ]
> WARNING: CPU: 2 PID: 95 at
> /home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
> xen_blkif_deferred_free+0x1f4/0x1f8()
> Modules linked in:
> CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: GW   4.2.0 #85
> Hardware name: APM X-Gene Mustang board (DT)
> Workqueue: events xen_blkif_deferred_free
> Call trace:
> [] dump_backtrace+0x0/0x124
> [] show_stack+0x10/0x1c
> [] dump_stack+0x78/0x98
> [] warn_slowpath_common+0x9c/0xd4
> [] warn_slowpath_null+0x14/0x20
> [] xen_blkif_deferred_free+0x1f0/0x1f8
> [] process_one_work+0x160/0x3b4
> [] worker_thread+0x140/0x494
> [] kthread+0xd8/0xf0
> ---[ end trace 6f859b7883c88cdd ]---
> 
> This is because the allocation of the requests are done during the
> connection but the free is done when the domain is destroyed. Therefore
> if the domain is re-initializing the connection (because UEFI or PV Grub
> is used), the request won't be free and kept until the end.
> 

Thank you!
Roger, I think it's better to have this information in your commit message too.

-- 
Regards,
-Bob

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked

2015-09-07 Thread Jan Beulich
>>> On 25.08.15 at 03:57,  wrote:
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -661,6 +661,9 @@ int vmx_cpu_up(void)
>  if ( cpu_has_vmx_vpid )
>  vpid_sync_all();
>  
> +INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));

This initialization appears to be the sole reason why the respective
variables can't be static (in vmx.c). I'd really like to ask for this to
be adjusted, even if this means adding another call out of
vmx_cpu_up() into vmx.c.

> @@ -106,6 +114,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  
>  spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>  
> +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);

Are these really needed (these aren't list heads afaict)?

> @@ -1976,6 +1987,54 @@ static struct hvm_function_table __initdata 
> vmx_function_table = {
>  .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
>  };
>  
> +/*
> + * Handle VT-d posted-interrupt when VCPU is blocked.
> + */

This is (and hence should be formatted as) a single line comment.

> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +struct arch_vmx_struct *vmx, *tmp;
> +struct vcpu *v;
> +spinlock_t *lock = &this_cpu(pi_blocked_vcpu_lock);
> +struct list_head *blocked_vcpus = &this_cpu(pi_blocked_vcpu);

At least on possibly performance relevant paths please avoid
multiple back-to-back uses of this_cpu() (use per_cpu() instead).

> +LIST_HEAD(list);
> +
> +spin_lock(lock);
> +
> +/*
> + * XXX: The length of the list depends on how many vCPU is current
> + * blocked on this specific pCPU. This may hurt the interrupt latency
> + * if the list grows to too many entries.
> + */
> +list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> +{
> +if ( pi_test_on(&vmx->pi_desc) )
> +{
> +list_del_init(&vmx->pi_blocked_vcpu_list);

Why not just list_del() (also further down in the second loop)?

> +
> +/*
> + * We cannot call vcpu_unblock here, since it also needs
> + * 'pi_blocked_vcpu_lock', we store the vCPUs with ON
> + * set in another list and unblock them after we release
> + * 'pi_blocked_vcpu_lock'.
> + */

At least at this point in time this doesn't appear to be correct: I
cannot see how, after this patch, vcpu_unblock() would end up
acquiring the lock. And without that (plus without anything ever
adding to pi_blocked_vcpu), this moving between two lists seems
rather odd; I think the splitting of the series into patches needs
to be re-thought unless (preferably) you can find a (reasonably
clean) way without using two lists in the first place.

> +list_add_tail(&vmx->pi_vcpu_on_set_list, &list);
> +}
> +}
> +
> +spin_unlock(lock);
> +
> +ack_APIC_irq();

So why here and not further up or further down? If this is "as
early as possible", a comment should say why it can't be moved
further up.

> +list_for_each_entry_safe(vmx, tmp, &list, pi_vcpu_on_set_list)
> +{
> +v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +list_del_init(&vmx->pi_vcpu_on_set_list);
> +vcpu_unblock(v);
> +}
> +
> +this_cpu(irq_count)++;

This would probably better pair with ack_APIC_irq().

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] osstest breakage, now believed fixed (Was: Re: [xen-unstable test] 61379: trouble: blocked/broken)

2015-09-07 Thread Ian Campbell
On Sun, 2015-09-06 at 16:49 +, osstest service owner wrote:
> flight 61379 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/61379/
> 
> Failures and problems with tests :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  build-armhf   3 host-install(3) broken REGR. vs. 
> 61059
>  build-armhf-xsm   3 host-install(3) broken REGR. vs. 
> 61059
>  build-amd64-oldkern   3 host-install(3) broken REGR. vs. 
> 61059
>  build-amd64-prev  3 host-install(3) broken REGR. vs. 
> 61059
>  build-i386-oldkern3 host-install(3) broken REGR. vs. 
> 61059
>  build-amd64-xsm   3 host-install(3) broken REGR. vs. 
> 61059
>  build-amd64-pvops 3 host-install(3) broken REGR. vs. 
> 61059
>  build-i386-prev   3 host-install(3) broken REGR. vs. 
> 61059
>  build-i386-pvops  3 host-install(3) broken REGR. vs. 
> 61059
>  build-i3863 host-install(3) broken REGR. vs. 
> 61059
>  build-i386-xsm3 host-install(3) broken REGR. vs. 
> 61059
>  build-amd64   3 host-install(3) broken REGR. vs. 
> 61059
>  build-armhf-pvops 3 host-install(3) broken REGR. vs. 
> 61059

The cache/proxy had gotten itself into some weird state and was giving 404
for URLs which did in fact exist and which were insisted upon by the Debian
Installer.

Ian and I investigated a bit and the cache process has now been restarted
and seems to be running fine now. Flights will restart in the normal way.

Ian.

> 
> Tests which did not succeed, but are not blocking:
>  build-amd64-rumpuserxen   1 build-check(1)   blocked 
>  n/a
>  build-amd64-libvirt   1 build-check(1)   blocked 
>  n/a
>  build-i386-libvirt1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) 
> blocked n/a
>  test-amd64-amd64-xl-pvh-amd   1 build-check(1)   blocked 
>  n/a
>  test-amd64-amd64-xl-pvh-intel  1 build-check(1)   blocked 
>  n/a
>  test-armhf-armhf-xl-raw   1 build-check(1)   blocked 
>  n/a
>  test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked 
>  n/a
>  build-i386-rumpuserxen1 build-check(1)   blocked 
>  n/a
>  test-armhf-armhf-xl-qcow2 1 build-check(1)   blocked 
>  n/a
>  build-armhf-libvirt   1 build-check(1)   blocked 
>  n/a
>  test-armhf-armhf-xl-vhd   1 build-check(1)   blocked 
>  n/a
>  test-armhf-armhf-libvirt-vhd  1 build-check(1)   blocked 
>  n/a
>  test-armhf-armhf-libvirt  1 build-check(1)   blocked 
>  n/a
>  test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked 
>  n/a
>  test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)  
>  blocked n/a
>  test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked 
>  n/a
>  test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-xl1 build-check(1)   blocked 
>  n/a
>  test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-libvirt   1 build-check(1)   blocked 
>  n/a
>  test-amd64-amd64-xl-xsm   1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-libvirt-pair  1 build-check(1)   blocked 
>  n/a
>  test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-xl-xsm1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked
>   n/a
>  test-armhf-armhf-xl-rtds  1 build-check(1)   blocked 
>  n/a
>  test-amd64-amd64-xl   1 build-check(1)   blocked 
>  n/a
>  test-armhf-armhf-xl   1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1)
>  blocked n/a
>  test-amd64-amd64-libvirt  1 build-check(1)   blocked 
>  n/a
>  test-armhf-armhf-xl-arndale   1 build-check(1)   blocked 
>  n/a
>  test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) 
> blocked n/a
>  test-amd64-amd64-pair 1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1)
>  blocked n/a
>  test-amd64-i386-libvirt-raw   1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-qemut-rhel6hvm-amd  1 build-check(1)  
>  blocked n/a
>  test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked 
>  n/a
>  test-amd64-i386-migrupgrade   1 build-check(1)   

Re: [Xen-devel] [PATCH v6 15/18] vmx: Properly handle notification event when vCPU is running

2015-09-07 Thread Jan Beulich
>>> On 25.08.15 at 03:57,  wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2035,6 +2035,52 @@ static void pi_wakeup_interrupt(struct cpu_user_regs 
> *regs)
>  this_cpu(irq_count)++;
>  }
>  
> +/* Handle VT-d posted-interrupt when VCPU is running. */
> +static void pi_notification_interrupt(struct cpu_user_regs *regs)
> +{
> +ack_APIC_irq();
> +
> +/*
> + * We get here when a vCPU is running in root-mode (such as via 
> hypercall,
> + * or any other reasons which can result in VM-Exit), and before vCPU is
> + * back to non-root, external interrupts from an assigned device happen
> + * and a notification event is delivered to this logical CPU.
> + *
> + * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
> + * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR 
> will
> + * be synced to vIRR before VM-Exit in time.
> + *
> + * Please refer to the following code fragments from
> + * xen/arch/x86/hvm/vmx/entry.S:
> + *
> + * .Lvmx_do_vmentry
> + *
> + *  ..
> + *  point 1
> + *
> + *  cmp  %ecx,(%rdx,%rax,1)

I think retaining the "cli" right above this is quite critical for
understanding why code past this check isn't susceptible to the
issue anymore.

> + *  jnz  .Lvmx_process_softirqs
> + *
> + *  ..
> + *
> + *  je   .Lvmx_launch
> + *
> + *  ..
> + *
> + * .Lvmx_process_softirqs:
> + *  sti
> + *  call do_softirq
> + *  jmp  .Lvmx_do_vmentry
> + *
> + * If VT-d engine issues a notification event at point 1 above, it cannot
> + * be delivered to the guest during this VM-entry without raising the
> + * softirq in this notification handler.
> + */
> +raise_softirq(VCPU_KICK_SOFTIRQ);
> +
> +this_cpu(irq_count)++;
> +}
> +
>  const struct hvm_function_table * __init start_vmx(void)
>  {
>  set_in_cr4(X86_CR4_VMXE);
> @@ -2073,7 +2119,7 @@ const struct hvm_function_table * __init start_vmx(void)
>  
>  if ( cpu_has_vmx_posted_intr_processing )
>  {
> -alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
> +alloc_direct_apic_vector(&posted_intr_vector, 
> pi_notification_interrupt);
>  
>  if ( iommu_intpost )
>  alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);

Considering that you do this setup independent of iommu_intpost,
won't this (namely, but not only) for the !iommu_inpost case result
in a whole lot of useless softirqs to be raised? IOW - shouldn't this
setup be conditional, and shouldn't the handler also only
conditionally raise the softirq (to as much as possible limit their
amount)?

Yang, in this context: Why does __vmx_deliver_posted_interrupt()
not use cpu_raise_softirq(), instead kind of open coding it (see your
d7dafa375b ["VMX: Add posted interrupt supporting"])?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] osstest breakage, now believed fixed (Was: Re: [xen-unstable test] 61379: trouble: blocked/broken)

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 12:55:25PM +0100, Ian Campbell wrote:
> On Sun, 2015-09-06 at 16:49 +, osstest service owner wrote:
> > flight 61379 xen-unstable real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/61379/
> > 
> > Failures and problems with tests :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  build-armhf   3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-armhf-xsm   3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-amd64-oldkern   3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-amd64-prev  3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-i386-oldkern3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-amd64-xsm   3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-amd64-pvops 3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-i386-prev   3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-i386-pvops  3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-i3863 host-install(3) broken REGR. vs. 
> > 61059
> >  build-i386-xsm3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-amd64   3 host-install(3) broken REGR. vs. 
> > 61059
> >  build-armhf-pvops 3 host-install(3) broken REGR. vs. 
> > 61059
> 
> The cache/proxy had gotten itself into some weird state and was giving 404
> for URLs which did in fact exist and which were insisted upon by the Debian
> Installer.
> 
> Ian and I investigated a bit and the cache process has now been restarted
> and seems to be running fine now. Flights will restart in the normal way.
> 

I believe it's going to pick up the latest xen-unstable staging?

Wei.

> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-09-07 Thread Jan Beulich
>>> On 25.08.15 at 03:57,  wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1573,6 +1573,22 @@ static void __context_switch(void)
>  per_cpu(curr_vcpu, cpu) = n;
>  }
>  
> +static inline void pi_ctxt_switch_from(struct vcpu *prev)
> +{
> +/*
> + * When switching from non-idle to idle, we only do a lazy context 
> switch.
> + * However, in order for posted interrupt (if available and enabled) to
> + * work properly, we at least need to update the descriptors.
> + */
> +if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
> +prev->arch.pi_ctxt_switch_from(prev);
> +}
> +
> +static inline void pi_ctxt_switch_to(struct vcpu *next)
> +{
> +if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
> +next->arch.pi_ctxt_switch_to(next);
> +}
>  
>  void context_switch(struct vcpu *prev, struct vcpu *next)
>  {
> @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct vcpu 
> *next)
>  
>  set_current(next);
>  
> +pi_ctxt_switch_from(prev);
> +
>  if ( (per_cpu(curr_vcpu, cpu) == next) ||
>   (is_idle_domain(nextd) && cpu_online(cpu)) )
>  {
> +pi_ctxt_switch_to(next);
>  local_irq_enable();

This placement, if really intended that way, needs explanation (in a
comment) and perhaps even renaming of the involved symbols, as
looking at it from a general perspective it seems wrong (with
pi_ctxt_switch_to() excluding idle vCPU-s it effectively means you
want this only when switching back to what got switched out lazily
before, i.e. this would be not something to take place on an arbitrary
context switch). As to possible alternative names - maybe make the
hooks ctxt_switch_prepare() and ctxt_switch_cancel()?

> @@ -117,10 +119,20 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>  INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
>  
> +v->arch.hvm_vmx.pi_block_cpu = -1;
> +
> +spin_lock_init(&v->arch.hvm_vmx.pi_lock);
> +
>  v->arch.schedule_tail= vmx_do_resume;
>  v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>  v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>  
> +if ( iommu_intpost && is_hvm_vcpu(v) )
> +{
> +v->arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi;
> +v->arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi;
> +}

Why conditional upon is_hvm_vcpu()?

> @@ -718,6 +730,140 @@ static void vmx_fpu_leave(struct vcpu *v)
>  }
>  }
>  
> +void arch_vcpu_wake_prepare(struct vcpu *v)

A non-static function with this name should not live in vmx.c.

> +{
> +unsigned long gflags;

Just "flags" please.

> +if ( !iommu_intpost || !is_hvm_vcpu(v) || !has_arch_pdevs(v->domain) )

DYM !has_hvm_container_vcpu()?

> +return;
> +
> +spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, gflags);
> +
> +if ( likely(vcpu_runnable(v)) ||
> + !test_bit(_VPF_blocked, &v->pause_flags) )

_VPF_blocked set implies !vcpu_runnable(), i.e. afaict just the
latter check would suffice if this is really what you mean.

> +{
> +struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +unsigned long flags;

You don't need this - you already know IRQs are off for the lock
acquire below.

> +/*
> + * We don't need to send notification event to a non-running
> + * vcpu, the interrupt information will be delivered to it before
> + * VM-ENTRY when the vcpu is scheduled to run next time.
> + */
> +pi_set_sn(pi_desc);
> +
> +/*
> + * Set 'NV' field back to posted_intr_vector, so the
> + * Posted-Interrupts can be delivered to the vCPU by
> + * VT-d HW after it is scheduled to run.
> + */
> +write_atomic((uint8_t*)&pi_desc->nv, posted_intr_vector);

Bogus cast.

> +
> +/*
> + * Delete the vCPU from the related block list
> + * if we are resuming from blocked state.
> + */
> +if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
> +{
> +spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +  v->arch.hvm_vmx.pi_block_cpu), flags);
> +list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);

Shouldn't you set .pi_block_cpu back to -1 along with removing
the vCPU from the list? Which then ought to allow using just
list_del() here?

> +static void vmx_pre_ctx_switch_pi(struct vcpu *v)
> +{
> +struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +struct pi_desc old, new;

Please limit the scope of these two variables.

> +unsigned long flags, gflags;

See above.

> +if ( !has_arch_pdevs(v->domain) )
> +return;
> +
> +spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, gflags);
> +
> +if ( vcpu_runnable(v) || !test_bit(_VPF_blocked, &v->pause_flags) )

See above.

> +{
> +/*
> + * The vCPU has been preempted or went to sleep. We don't need to

[Xen-devel] [PATCH OSSTEST] ts-xen-install: Rewrite /etc/hosts to comment out 127.0.1.1 entry

2015-09-07 Thread Ian Campbell
Debian creates an entry such as:
127.0.1.1 lace-bug.xs.citrite.net lace-bug

This causes local lookups of the FQDN to get 127.0.1.1, which is
unhelpful if you are looking for an address to bind to and were hoping
to get the public IP address, as libvirt does on the target host for
migration.

Here we remove (actually, comment) any 127.0.1.1 line in /etc/hosts.
This means that lookups of a hosts own name (fqdn or just dn) now rely
on DNS, which may not be ideal. However for a host which uses DHCP I'm
not aware of a way to keep /etc/hosts up to date with the actual IP
address the machine has.  In our infra the test host IP addresses are
all static, but I don't think we want to rely on at any more that we
already do.

Signed-off-by: Ian Campbell 
---
 ts-xen-install | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/ts-xen-install b/ts-xen-install
index e039173..7c80a13 100755
--- a/ts-xen-install
+++ b/ts-xen-install
@@ -229,6 +229,24 @@ END
 target_cmd_root($ho, $cmd);
 }
 
+sub hosts () {
+# Debian installer generates a line like:
+#
+# 127.0.1.1 lace-bug.xs.citrite.net lace-bug
+#
+# Where xs.citrite.net is $c{TestHostDomain} and lace-bug is the
+# hostname, this causes lookups of the hosts FQDN to result in
+# 127.0.1.1 which is not useful if the reason for the lookup was
+# to be able to bind to the externally accessible IP address.
+target_editfile_root($ho, "/etc/hosts", "etc-hosts", sub {
+my $hn = $ho->{Name};
+while () {
+   s|^127.0.1.1|#$&|;
+   print EO;
+}
+});
+}
+
 sub nodhcp () {
 target_editfile_root($ho, "/etc/network/interfaces",
  "etc-network-interfaces", sub {
@@ -350,4 +368,5 @@ if ($checkmode) {
 setupboot();
 setupinitd();
 nodhcp();
+hosts();
 }
-- 
2.5.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] osstest breakage, now believed fixed (Was: Re: [xen-unstable test] 61379: trouble: blocked/broken)

2015-09-07 Thread Ian Campbell
On Mon, 2015-09-07 at 13:43 +0100, Wei Liu wrote:
> On Mon, Sep 07, 2015 at 12:55:25PM +0100, Ian Campbell wrote:
> > On Sun, 2015-09-06 at 16:49 +, osstest service owner wrote:
> > > flight 61379 xen-unstable real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/61379/
> > > 
> > > Failures and problems with tests :-(
> > > 
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >  build-armhf   3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-armhf-xsm   3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-amd64-oldkern   3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-amd64-prev  3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-i386-oldkern3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-amd64-xsm   3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-amd64-pvops 3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-i386-prev   3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-i386-pvops  3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-i3863 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-i386-xsm3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-amd64   3 host-install(3) broken REGR. 
> > > vs. 61059
> > >  build-armhf-pvops 3 host-install(3) broken REGR. 
> > > vs. 61059
> > 
> > The cache/proxy had gotten itself into some weird state and was giving 
> > 404
> > for URLs which did in fact exist and which were insisted upon by the 
> > Debian
> > Installer.
> > 
> > Ian and I investigated a bit and the cache process has now been 
> > restarted
> > and seems to be running fine now. Flights will restart in the normal 
> > way.
> > 
> 
> I believe it's going to pick up the latest xen-unstable staging?

Flight 61521 seems to have picked up on a7b39c8bd6cb, yes.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 15/18] vmx: Properly handle notification event when vCPU is running

2015-09-07 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-09-07:
>> + *  jnz  .Lvmx_process_softirqs
>> + *
>> + *  ..
>> + *
>> + *  je   .Lvmx_launch
>> + *
>> + *  ..
>> + *
>> + * .Lvmx_process_softirqs:
>> + *  sti
>> + *  call do_softirq
>> + *  jmp  .Lvmx_do_vmentry
>> + *
>> + * If VT-d engine issues a notification event at point 1 above, it 
>> cannot
>> + * be delivered to the guest during this VM-entry without raising the
>> + * softirq in this notification handler.
>> + */
>> +raise_softirq(VCPU_KICK_SOFTIRQ);
>> +
>> +this_cpu(irq_count)++;
>> +}
>> +
>>  const struct hvm_function_table * __init start_vmx(void)
>>  {
>>  set_in_cr4(X86_CR4_VMXE);
>> @@ -2073,7 +2119,7 @@ const struct hvm_function_table * __init
>> start_vmx(void)
>> 
>>  if ( cpu_has_vmx_posted_intr_processing )
>>  {
>> -alloc_direct_apic_vector(&posted_intr_vector,
>> event_check_interrupt); +   
>> alloc_direct_apic_vector(&posted_intr_vector,
>> pi_notification_interrupt);
>> 
>>  if ( iommu_intpost )
>>  alloc_direct_apic_vector(&pi_wakeup_vector,
> pi_wakeup_interrupt);
> 
> Considering that you do this setup independent of iommu_intpost, won't
> this (namely, but not only) for the !iommu_inpost case result in a whole
> lot of useless softirqs to be raised? IOW - shouldn't this setup be
> conditional, and shouldn't the handler also only conditionally raise the
> softirq (to as much as possible limit their amount)?
> 
> Yang, in this context: Why does __vmx_deliver_posted_interrupt()
> not use cpu_raise_softirq(), instead kind of open coding it (see your
> d7dafa375b ["VMX: Add posted interrupt supporting"])?

Sorry, I am not in the context. What do you mean of using cpu_raise_softirq() 
in __vmx_deliver_posted_interrupt()?

Best regards,
Yang


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] osstest breakage, now believed fixed (Was: Re: [xen-unstable test] 61379: trouble: blocked/broken)

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 01:49:16PM +0100, Ian Campbell wrote:
> On Mon, 2015-09-07 at 13:43 +0100, Wei Liu wrote:
> > On Mon, Sep 07, 2015 at 12:55:25PM +0100, Ian Campbell wrote:
> > > On Sun, 2015-09-06 at 16:49 +, osstest service owner wrote:
> > > > flight 61379 xen-unstable real [real]
> > > > http://logs.test-lab.xenproject.org/osstest/logs/61379/
> > > > 
> > > > Failures and problems with tests :-(
> > > > 
> > > > Tests which did not succeed and are blocking,
> > > > including tests which could not be run:
> > > >  build-armhf   3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-armhf-xsm   3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-amd64-oldkern   3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-amd64-prev  3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-i386-oldkern3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-amd64-xsm   3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-amd64-pvops 3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-i386-prev   3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-i386-pvops  3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-i3863 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-i386-xsm3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-amd64   3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > >  build-armhf-pvops 3 host-install(3) broken REGR. 
> > > > vs. 61059
> > > 
> > > The cache/proxy had gotten itself into some weird state and was giving 
> > > 404
> > > for URLs which did in fact exist and which were insisted upon by the 
> > > Debian
> > > Installer.
> > > 
> > > Ian and I investigated a bit and the cache process has now been 
> > > restarted
> > > and seems to be running fine now. Flights will restart in the normal 
> > > way.
> > > 
> > 
> > I believe it's going to pick up the latest xen-unstable staging?
> 
> Flight 61521 seems to have picked up on a7b39c8bd6cb, yes.
> 

Thanks for confirming!

Wei.

> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 17/18] VT-d: Dump the posted format IRTE

2015-09-07 Thread Jan Beulich
>>> On 25.08.15 at 03:57,  wrote:
> @@ -220,7 +224,7 @@ static void dump_iommu_info(unsigned char key)
>  struct iremap_entry *p;
>  if ( i % (1 << IREMAP_ENTRY_ORDER) == 0 )
>  {
> -/* This entry across page boundry */
> +/* This entry across page boundary. */

Either leave the comment alone, or fix its content along with its format.

> @@ -230,7 +234,7 @@ static void dump_iommu_info(unsigned char key)
>  else
>  p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
>  
> -if ( !p->remap.p )
> +if ( !p->remap.p || p->remap.im )
>  continue;
>  printk("  %04x:  %x   %x  %04x %08x %02x%x   %x  %x  %x  
> %x"
>  "   %x %x\n", i,
> @@ -239,6 +243,41 @@ static void dump_iommu_info(unsigned char key)
>  p->remap.rh, p->remap.dm, p->remap.fpd, p->remap.p);
>  print_cnt++;
>  }
> +
> +if ( iremap_entries )
> +unmap_vtd_domain_page(iremap_entries);
> +
> +iremap_entries = NULL;
> +printk ("\nEntries for posted format:\n");
> +printk("   SVT  SQ   SID  PDA  V  URG AVL FPD 
> P\n");
> +for ( i = 0; i < nr_entry; i++ )
> +{
> +struct iremap_entry *p;
> +if ( i % (1 << IREMAP_ENTRY_ORDER) == 0 )
> +{
> +/* This entry across page boundry */
> +if ( iremap_entries )
> +unmap_vtd_domain_page(iremap_entries);
> +
> +GET_IREMAP_ENTRY(iremap_maddr, i,
> + iremap_entries, p);
> +}
> +else
> +p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
> +
> +if ( !p->post.p || !p->post.im )
> +continue;
> +
> +printk("  %04x:  %x   %x  %04x %16lx %02x%x   %x  %x  
> %x\n",
> +i,
> +p->post.svt, p->post.sq, p->post.sid,
> +((u64)p->post.pda_h << 32) | (p->post.pda_l << 6),
> +p->post.vector, p->post.urg, p->post.avail, p->post.fpd,
> +p->post.p);
> +
> +print_cnt++;
> +}

Hmm, this two stage approach seems to imply more overhead in
terms of execution time and in terms of analysis (of the output)
time. Can you see to make this a single loop with the output
formatted in a way such that
- one can immediately recognize the kind that each line represents,
- mixed output is still sufficiently aligned to retain overall readability?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 18/18] Add a command line parameter for VT-d posted-interrupts

2015-09-07 Thread Jan Beulich
>>> On 25.08.15 at 03:57,  wrote:
> Enable VT-d Posted-Interrupts and add a command line
> parameter for it.
> 
> Signed-off-by: Feng Wu 
> Reviewed-by: Kevin Tian 

Acked-by: Jan Beulich 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 15/18] vmx: Properly handle notification event when vCPU is running

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 15:00,  wrote:
> Jan Beulich wrote on 2015-09-07:
>> Yang, in this context: Why does __vmx_deliver_posted_interrupt()
>> not use cpu_raise_softirq(), instead kind of open coding it (see your
>> d7dafa375b ["VMX: Add posted interrupt supporting"])?
> 
> Sorry, I am not in the context. What do you mean of using 
> cpu_raise_softirq() in __vmx_deliver_posted_interrupt()?

Why is the function not using that ready to use helper? Looking at
it ...

>static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>{
>bool_t running = v->is_running;
>
>vcpu_unblock(v);
>if ( running && (in_irq() || (v != current)) )
>{
>unsigned int cpu = v->processor;
>
>if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))

... this line as well as ...

> && (cpu != smp_processor_id()) )
>send_IPI_mask(cpumask_of(cpu), posted_intr_vector);

... this one ...

>}
>}

... pretty certainly don't belong into vmx.c, or the apparent open
coding of cpu_raise_softirq() would require a justifying comment.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 21/31] xen/arm: ITS: Add GITS registers emulation

2015-09-07 Thread Julien Grall
Hi Vijay,

On 31/08/15 12:06, vijay.kil...@gmail.com wrote:
> From: Vijaya Kumar K 
> 
> Emulate GITS* registers
> 
> Signed-off-by: Vijaya Kumar K 
> ---
> v6: - Removed unrelated code of this patch
> - Used vgic_regN_{read,write}
> v4: - Removed GICR register emulation
> ---
>  xen/arch/arm/vgic-v3-its.c|  337 
> -
>  xen/include/asm-arm/gic-its.h |   16 ++
>  2 files changed, 352 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index 53f2a27..c384ecf 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -35,6 +35,14 @@
>  
>  //#define DEBUG_ITS
>  
> +/* GITS_PIDRn register values for ARM implementations */
> +#define GITS_PIDR0_VAL   (0x94)
> +#define GITS_PIDR1_VAL   (0xb4)
> +#define GITS_PIDR2_VAL   (0x3b)
> +#define GITS_PIDR3_VAL   (0x00)
> +#define GITS_PIDR4_VAL   (0x04)
> +#define GITS_BASER_INIT_VAL  ((1UL << GITS_BASER_TYPE_SHIFT) | \
> + (0x7UL << GITS_BASER_ENTRY_SIZE_SHIFT))

This value is only for BASER0 so GITS_BASER0_INIT_VAL

You also need to explain what means the different values. And please
don't hardcode the size of the Device Entry but the proper sizeof.

>  #ifdef DEBUG_ITS
>  # define DPRINTK(fmt, args...) dprintk(XENLOG_DEBUG, fmt, ##args)
>  #else
> @@ -535,7 +543,7 @@ static int vits_read_virt_cmd(struct vcpu *v, struct 
> vgic_its *vits,
>  return 0;
>  }
>  
> -int vits_process_cmd(struct vcpu *v, struct vgic_its *vits)
> +static int vits_process_cmd(struct vcpu *v, struct vgic_its *vits)

You would have been able to keep the static when the patch has been
introduced if you have plumbed vgic-v3 for vITS support latter (i.e part
of #19 moved at the end).

>  {
>  its_cmd_block virt_cmd;


[...]

> +static int vgic_v3_gits_mmio_read(struct vcpu *v, mmio_info_t *info)
> +{
> +struct vgic_its *vits = v->domain->arch.vgic.vits;
> +struct hsr_dabt dabt = info->dabt;
> +struct cpu_user_regs *regs = guest_cpu_user_regs();
> +register_t *r = select_user_reg(regs, dabt.reg);
> +uint64_t val = 0;
> +uint32_t gits_reg;
> +
> +gits_reg = info->gpa - vits->gits_base;
> +DPRINTK("%pv: vITS: GITS_MMIO_READ offset 0x%"PRIx32"\n", v, gits_reg);
> +
> +switch ( gits_reg )
> +{
> +case GITS_CTLR:
> +if ( dabt.size != DABT_WORD ) goto bad_width;
> +vits_spin_lock(vits);
> +/*
> + * vITS is always quiescent, has no translations in progress and
> + * completed all operations. So set quescent by default.

s/quescent/quiescent/

> + */
> +*r = vgic_reg32_read((vits->ctrl | GITS_CTLR_QUIESCENT), info);

IHMO the Quiescent bit should be set at write time not read time. So we
can use it in the emulation if necessary.

> +vits_spin_unlock(vits);
> +return 1;
> +case GITS_IIDR:
> +if ( dabt.size != DABT_WORD ) goto bad_width;
> +*r = vgic_reg32_read(GICV3_GICD_IIDR_VAL, info);
> +return 1;
> +case GITS_TYPER:
> +case GITS_TYPER + 4:
> +/*
> + * GITS_TYPER.HCC = max_vcpus + 1 (max collection supported)
> + * GITS_TYPER.Devbits = HW supported Devbits size
> + * GITS_TYPER.IDbits = HW supported IDbits size
> + * GITS_TYPER.PTA = 0 (Target addresses are linear processor numbers)
> + * GITS_TYPER.ITTSize = Size of struct vitt
> + * GITS_TYPER.Physical = 1
> + */
> +if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
> +val = ((vits_get_max_collections(v->domain) << GITS_TYPER_HCC_SHIFT 
> ) |
> +   ((vits_hw.dev_bits - 1) << GITS_TYPER_DEVBITS_SHIFT)  
>  |
> +   ((vits_hw.eventid_bits - 1) << GITS_TYPER_IDBITS_SHIFT)   
>  |
> +   ((sizeof(struct vitt) - 1) << GITS_TYPER_ITT_SIZE_SHIFT)  
>  |
> + GITS_TYPER_PHYSICAL_LPIS);
> +if ( dabt.size == DABT_DOUBLE_WORD )
> +*r = vgic_reg64_read(val, info);
> +else
> +*r = vgic_reg32_read(val, info);

Can you explain why this is necessary? The goal of vgic_reg64_read is to
handle all access size to 64bit register.
The vgic_reg32_read will only handle access on 32bit and therefore it
won't be possible to access to the most significant word.

> +return 1;
> +case 0x0010 ... 0x007c:
> +case 0xc000 ... 0xffcc:
> +/* Implementation defined -- read ignored */
> +goto read_as_zero;
> +case GITS_CBASER:
> +case GITS_CBASER + 4:
> +/* Read supports only 32/64-bit access */

This comment is not necessary and is confusing as you call the helper now

> +if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
> +vits_spin_lock(vits);
> +if ( dabt.size == DABT_DOUBLE_WORD )
> +*r = vgic_reg64_read(vits->cmd_base, info);
> +

Re: [Xen-devel] Xen-unstable: Keyboard doesn't seem to work when using VNC to HVM with qemu-xen device model.

2015-09-07 Thread Wei Liu
On Sun, Sep 06, 2015 at 12:46:01PM +0200, Sander Eikelenboom wrote:
> Hi All,
> 
> Today i noticed that keyboard doesn't work when using VNC on a HVM guest
> which runs under qemu-xen device-model.
> Mouse still works with usbdevice='tablet', but keypress show no response
> what so ever.
> 
> Unfortunately i'm on a tight time budget the next few weeks, so i just
> switched to device_model_version='qemu-xen-traditional (with the same
> config) and there all is fine.
> 
> This is on a xen-unstable of about 2 weeks ago, since i don't have time to
> reply, and xen is already in the RC's hopefully someone else can re-test and
> have a look.
> 

I just tried staging branch, keyboard worked. My guest is a my usual
test guest which has Debian install. Note that I don't run desktop
environment in that guest.

I fail to see what changes in recent weeks could affect QEMU VNC
keyboard.

Wei.

> Thanks,
> 
> Sander
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 0/2] xen/arm: vgic: Support 32-bit access for 64-bit register

2015-09-07 Thread Julien Grall
On 04/08/15 12:59, Julien Grall wrote:
> Hi all,

Hi,

Any more comments on this series before I send a new version?

Regards,

> This series aims to fix the 32-bit access on 64-bit register. Some guest
> OS such as FreeBSD and Linux (only in the ITS) use those access and will
> crash when starting on Xen.
> 
> The first patch introduces generic helpers to read/write/clear/set a register.
> While the second is the main purpose of this series.
> 
> I'd like to go a bit further in the clean up, hence the RFC. But I wanted them
> out in order to help Vijay supporting any access quickly for his ITS series.
> 
> TODO:
> - use the new helpers in vGICv2
> - support signed extension generically
> - see what assembly is generated on 32-bit with the uint64_t cast.
> It may be possible to optimize it a bit by avoid uint64_t. Although I'm
> not sure if it's worth it.
> 
> Sincerely yours,
> 
> Julien Grall (2):
>   xen/arm: vgic: Introduce helpers to read/write/clear/set vGIC register
> ...
>   xen/arm: vgic-v3: Support 32-bit access for 64-bit registers
> 
>  xen/arch/arm/vgic-v3.c | 126 
> +
>  xen/include/asm-arm/vgic.h | 104 +
>  2 files changed, 187 insertions(+), 43 deletions(-)
> 


-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 22/31] xen/arm: ITS: Add virtual ITS availability check helper

2015-09-07 Thread Julien Grall
Hi Vijay,

On 31/08/15 12:06, vijay.kil...@gmail.com wrote:
> diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
> index 659d919..1c88300 100644
> --- a/xen/include/asm-arm/vgic.h
> +++ b/xen/include/asm-arm/vgic.h
> @@ -110,7 +110,13 @@ static inline void sgi_target_init(struct sgi_target 
> *sgi_target)
>  sgi_target->list = 0;
>  }
>  
> +struct vgic_info {
> +bool_t its_enabled;
> +};
> +
>  struct vgic_ops {
> +/* Hold vGIC information */
> +const struct vgic_info *info;

The vgic_ops are shared between all the domains so you will end up to
allow ITS on every domains.

Any domain specific value should go in the domain structure and not in
the vgic_ops.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.6] efi: introduce efi_arch_flush_dcache_area

2015-09-07 Thread Stefano Stabellini
Objects loaded by FileHandle->Read need to be flushed to dcache,
otherwise copy_from_paddr will read stale data when copying the kernel,
causing a failure to boot.

Introduce efi_arch_flush_dcache_area and call it from read_file.

This commit introduces no functional changes on x86.

Reported-by: Mark Rutland 
Signed-off-by: Stefano Stabellini 
CC: ian.campb...@citrix.com
CC: jbeul...@suse.com
CC: wei.l...@citrix.com
---
 xen/arch/arm/efi/efi-boot.h |7 +++
 xen/arch/x86/efi/efi-boot.h |2 ++
 xen/common/efi/boot.c   |3 ++-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
index 6a12d91..fd79658 100644
--- a/xen/arch/arm/efi/efi-boot.h
+++ b/xen/arch/arm/efi/efi-boot.h
@@ -9,6 +9,7 @@
 #include 
 
 void noreturn efi_xen_start(void *fdt_ptr, uint32_t fdt_size);
+void __flush_dcache_area(const void * vaddr, unsigned long size);
 
 #define DEVICE_TREE_GUID \
 {0xb1b621d5, 0xf19c, 0x41a5, {0x83, 0x0b, 0xd9, 0x15, 0x2c, 0x69, 0xaa, 0xe0}}
@@ -571,6 +572,12 @@ static void __init 
efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
EFI_GRAPHICS_OUTPUT_MODE_INFORMATION 
*mode_info)
 {
 }
+
+static void efi_arch_flush_dcache_area(const void *vaddr, UINTN size)
+{
+__flush_dcache_area(vaddr, size);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 2dd69f6..4c7f383 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -640,6 +640,8 @@ static bool_t __init 
efi_arch_use_config_file(EFI_SYSTEM_TABLE *SystemTable)
 return 1; /* x86 always uses a config file */
 }
 
+static void efi_arch_flush_dcache_area(const void *vaddr, UINTN size) { }
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 75a939f..f0f26c1 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -526,7 +526,8 @@ static bool_t __init read_file(EFI_FILE_HANDLE dir_handle, 
CHAR16 *name,
 PrintErr(what);
 PrintErr(L" failed for ");
 PrintErrMesg(name, ret);
-}
+} else
+efi_arch_flush_dcache_area(file->ptr, file->size);
 
 return 1;
 }
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 24/31] xen/arm: ITS: Add GICR register emulation

2015-09-07 Thread Julien Grall
Hi Vijay,

On 31/08/15 12:06, vijay.kil...@gmail.com wrote:
> +static int vgic_v3_gits_lpi_mmio_read(struct vcpu *v, mmio_info_t *info)

[...]

> +/* Neglect if not LPI. */
> +if ( offset < FIRST_GIC_LPI )
> +{
> +*r = 0;
> +return 1;
> +}

[...]

> +static int vgic_v3_gits_lpi_mmio_write(struct vcpu *v, mmio_info_t *info)
> +{

[...]

> +/* Neglect if not LPI. */
> +if ( offset < FIRST_GIC_LPI )
> +return 1;

Based on the spec, those 2 checks are wrong and make impossible to use
LPIs. Please test this patch series before sending it on the ML.

Regards,


-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] efi: introduce efi_arch_flush_dcache_area

2015-09-07 Thread Ian Campbell
On Mon, 2015-09-07 at 15:13 +0100, Stefano Stabellini wrote:
> Objects loaded by FileHandle->Read need to be flushed to dcache,
> otherwise copy_from_paddr will read stale data when copying the kernel,
> causing a failure to boot.
> 
> Introduce efi_arch_flush_dcache_area and call it from read_file.
> 
> This commit introduces no functional changes on x86.
> 
> Reported-by: Mark Rutland 
> Signed-off-by: Stefano Stabellini 
> CC: ian.campb...@citrix.com
> CC: jbeul...@suse.com
> CC: wei.l...@citrix.com

Acked-by: Ian Campbell 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 0/2] support gzipped kernels on arm

2015-09-07 Thread Stefano Stabellini
Hi all,

this patch series introduces support for gzipped kernels, such as the
standard Image.gz format used by Linux on arm64 by default, in Xen on
arm. Without it, Xen cannot load the default kernel shipped by distros,
such as CentOS 7.


Stefano Stabellini (2):
  xen: move perform_gunzip to common
  xen/arm: support gzip compressed kernels

 xen/arch/arm/kernel.c   |   75 +++
 xen/arch/arm/setup.c|2 +-
 xen/arch/x86/bzimage.c  |  134 +-
 xen/common/Makefile |1 +
 xen/common/gunzip.c |  137 +++
 xen/include/asm-arm/setup.h |2 +
 xen/include/xen/gunzip.h|7 +++
 7 files changed, 224 insertions(+), 134 deletions(-)
 create mode 100644 xen/common/gunzip.c
 create mode 100644 xen/include/xen/gunzip.h

Cheers,

Stefano

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-07 Thread Liuqiming (John)
I believe this also has something to do with a windows guest boot hang 
issue.


It randomly occured, when boot a guest has windows 2008 os and pv-driver 
installed.

The boot process hangs when wait xenstored replay event signal.

It can be reproduced after hundreds reboot using the xen staging branch.
But after I changed this code the hang issue can not reproduce.

Any Ideas?

On 2015/9/7 22:46, - wrote:

From: liuqiming 00178499 

This set operation doesn't make any sense, and it will block later
interrupt injected into vm (by vcpu_kick or deliver_posted_intr),
which will cause a performance issue on CPU supporting posted-interrupt.

Signed-off-by: Qiming Liu 
Cc: Yang Zhang 
---
  xen/arch/x86/hvm/vmx/vmx.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2582cdd..9480b44 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1681,7 +1681,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
  {
  unsigned int cpu = v->processor;
  
-if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))

+if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
   && (cpu != smp_processor_id()) )
  send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
  }




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 2/2] xen/arm: support gzip compressed kernels

2015-09-07 Thread Stefano Stabellini
Free the memory used for the compressed kernel and update the relative
mod->start and mod->size parameters with the uncompressed ones.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Julien Grall 
CC: ian.campb...@citrix.com

---

Changes in v5:
- code style

Changes in v4:
- return uint32_t from output_length
- __init kernel_decompress
- code style
- add comment
- if kernel_decompress fails with error, return

Changes in v3:
- better error checks in kernel_decompress
- free unneeded pages between output_size and kernel_order_out
- alloc pages from domheap

Changes in v2:
- use gzip_check
- avoid useless casts
- free original kernel image and update the mod with the new address and
size
- remove changes to common Makefile
- remove CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
---
 xen/arch/arm/kernel.c   |   75 +++
 xen/arch/arm/setup.c|2 +-
 xen/include/asm-arm/setup.h |2 ++
 3 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index f641b12..baa5717 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "kernel.h"
 
@@ -257,6 +259,63 @@ static int kernel_uimage_probe(struct kernel_info *info,
 return 0;
 }
 
+static __init uint32_t output_length(char *image, unsigned long image_len)
+{
+return *(uint32_t *)&image[image_len - 4];
+}
+
+static __init int kernel_decompress(struct kernel_info *info,
+ paddr_t *addr, paddr_t *size)
+{
+char *output, *input, *end;
+char magic[2];
+int rc;
+unsigned kernel_order_out;
+paddr_t output_size;
+struct page_info *pages;
+
+if ( *size < 2 )
+return -EINVAL;
+
+copy_from_paddr(magic, *addr, sizeof(magic));
+
+/* only gzip is supported */
+if ( !gzip_check(magic, *size) )
+return -EINVAL;
+
+input = ioremap_cache(*addr, *size);
+if ( input == NULL )
+return -EFAULT;
+
+output_size = output_length(input, *size);
+kernel_order_out = get_order_from_bytes(output_size);
+pages = alloc_domheap_pages(NULL, kernel_order_out, 0);
+if ( pages == NULL )
+{
+iounmap(input);
+return -ENOMEM;
+}
+output = page_to_virt(pages);
+
+rc = perform_gunzip(output, input, *size);
+clean_dcache_va_range(output, output_size);
+iounmap(input);
+
+*addr = virt_to_maddr(output);
+*size = output_size;
+
+end = output + (1 << (kernel_order_out + PAGE_SHIFT));
+/* 
+ * Need to free pages after output_size here because they won't be
+ * freed by discard_initial_modules
+ */
+output += (output_size + PAGE_SIZE - 1) & PAGE_MASK;
+for ( ; output < end; output += PAGE_SIZE )
+free_domheap_page(virt_to_page(output));
+
+return 0;
+}
+
 #ifdef CONFIG_ARM_64
 /*
  * Check if the image is a 64-bit Image.
@@ -463,6 +522,22 @@ int kernel_probe(struct kernel_info *info)
 printk("Loading ramdisk from boot module @ %"PRIpaddr"\n",
info->initrd_bootmodule->start);
 
+/* if it is a gzip'ed image, 32bit or 64bit, uncompress it */
+rc = kernel_decompress(info, &start, &size);
+if (rc < 0 && rc != -EINVAL)
+return rc;
+else if (!rc)
+{
+/* 
+ * Free the original kernel, update the pointers to the
+ * decompressed kernel
+ */
+dt_unreserved_regions(mod->start, mod->start + mod->size,
+  init_domheap_pages, 0);
+mod->start = start;
+mod->size = size;
+}
+
 #ifdef CONFIG_ARM_64
 rc = kernel_zimage64_probe(info, start, size);
 if (rc < 0)
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 6626eba..109c71c 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -165,7 +165,7 @@ static void __init processor_id(void)
 processor_setup();
 }
 
-static void dt_unreserved_regions(paddr_t s, paddr_t e,
+void dt_unreserved_regions(paddr_t s, paddr_t e,
   void (*cb)(paddr_t, paddr_t), int first)
 {
 int i, nr = fdt_num_mem_rsv(device_tree_flattened);
diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
index 81bb3da..30ac53b 100644
--- a/xen/include/asm-arm/setup.h
+++ b/xen/include/asm-arm/setup.h
@@ -54,6 +54,8 @@ void copy_from_paddr(void *dst, paddr_t paddr, unsigned long 
len);
 int construct_dom0(struct domain *d);
 
 void discard_initial_modules(void);
+void dt_unreserved_regions(paddr_t s, paddr_t e,
+   void (*cb)(paddr_t, paddr_t), int first);
 
 size_t __init boot_fdt_info(const void *fdt, paddr_t paddr);
 const char __init *boot_fdt_cmdline(const void *fdt);
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 1/2] xen: move perform_gunzip to common

2015-09-07 Thread Stefano Stabellini
The current gunzip code to decompress the Dom0 kernel is implemented in
inflate.c which is included by bzimage.c.

I am looking to doing the same on ARM64 but there is quite a bit of
boilerplate definitions that I would need to import in order for
inflate.c to work correctly.

Instead of copying/pasting the code from x86/bzimage.c, move those
definitions to a new common file, gunzip.c. Export only perform_gunzip
and gzip_check. Leave output_length where it is.

Signed-off-by: Stefano Stabellini 
Acked-by: Jan Beulich 
CC: andrew.coop...@citrix.com

---
Changes in v4:
- move gunzip.init.o to its alphabetically correct place

Changes in v3:
- build gunzip.c as gunzip.init.o
- remove #include 
- remove __init from declarations

Changes in v2:
- the patch has been reworked from scratch
---
 xen/arch/x86/bzimage.c   |  134 +
 xen/common/Makefile  |1 +
 xen/common/gunzip.c  |  137 ++
 xen/include/xen/gunzip.h |7 +++
 4 files changed, 146 insertions(+), 133 deletions(-)
 create mode 100644 xen/common/gunzip.c
 create mode 100644 xen/include/xen/gunzip.h

diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
index c86c39e..50ebb84 100644
--- a/xen/arch/x86/bzimage.c
+++ b/xen/arch/x86/bzimage.c
@@ -4,148 +4,16 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
-#define HEAPORDER 3
-
-static unsigned char *__initdata window;
-#define memptr long
-static memptr __initdata free_mem_ptr;
-static memptr __initdata free_mem_end_ptr;
-
-#define WSIZE   0x8000
-
-static unsigned char *__initdata inbuf;
-static unsigned __initdata insize;
-
-/* Index of next byte to be processed in inbuf: */
-static unsigned __initdata inptr;
-
-/* Bytes in output buffer: */
-static unsigned __initdata outcnt;
-
-#define OF(args)args
-#define STATIC  static
-
-#define memzero(s, n)   memset((s), 0, (n))
-
-typedef unsigned char   uch;
-typedef unsigned short  ush;
-typedef unsigned long   ulg;
-
-#define INIT__init
-#define INITDATA__initdata
-
-#define get_byte()  (inptr < insize ? inbuf[inptr++] : fill_inbuf())
-
-/* Diagnostic functions */
-#ifdef DEBUG
-#  define Assert(cond, msg) do { if (!(cond)) error(msg); } while (0)
-#  define Trace(x)  do { fprintf x; } while (0)
-#  define Tracev(x) do { if (verbose) fprintf x ; } while (0)
-#  define Tracevv(x)do { if (verbose > 1) fprintf x ; } while (0)
-#  define Tracec(c, x)  do { if (verbose && (c)) fprintf x ; } while (0)
-#  define Tracecv(c, x) do { if (verbose > 1 && (c)) fprintf x ; } while (0)
-#else
-#  define Assert(cond, msg)
-#  define Trace(x)
-#  define Tracev(x)
-#  define Tracevv(x)
-#  define Tracec(c, x)
-#  define Tracecv(c, x)
-#endif
-
-static long __initdata bytes_out;
-static void flush_window(void);
-
-static __init void error(char *x)
-{
-panic("%s", x);
-}
-
-static __init int fill_inbuf(void)
-{
-error("ran out of input data");
-return 0;
-}
-
-
-#include "../../common/inflate.c"
-
-static __init void flush_window(void)
-{
-/*
- * The window is equal to the output buffer therefore only need to
- * compute the crc.
- */
-unsigned long c = crc;
-unsigned n;
-unsigned char *in, ch;
-
-in = window;
-for ( n = 0; n < outcnt; n++ )
-{
-ch = *in++;
-c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
-}
-crc = c;
-
-bytes_out += (unsigned long)outcnt;
-outcnt = 0;
-}
-
 static __init unsigned long output_length(char *image, unsigned long image_len)
 {
 return *(uint32_t *)&image[image_len - 4];
 }
 
-static __init int gzip_check(char *image, unsigned long image_len)
-{
-unsigned char magic0, magic1;
-
-if ( image_len < 2 )
-return 0;
-
-magic0 = (unsigned char)image[0];
-magic1 = (unsigned char)image[1];
-
-return (magic0 == 0x1f) && ((magic1 == 0x8b) || (magic1 == 0x9e));
-}
-
-static __init int perform_gunzip(char *output, char *image, unsigned long 
image_len)
-{
-int rc;
-
-if ( !gzip_check(image, image_len) )
-return 1;
-
-window = (unsigned char *)output;
-
-free_mem_ptr = (unsigned long)alloc_xenheap_pages(HEAPORDER, 0);
-free_mem_end_ptr = free_mem_ptr + (PAGE_SIZE << HEAPORDER);
-
-inbuf = (unsigned char *)image;
-insize = image_len;
-inptr = 0;
-
-makecrc();
-
-if ( gunzip() < 0 )
-{
-rc = -EINVAL;
-}
-else
-{
-rc = 0;
-}
-
-free_xenheap_pages((void *)free_mem_ptr, HEAPORDER);
-
-return rc;
-}
-
 struct __packed setup_header {
 uint8_t _pad0[0x1f1];   /* skip uninteresting stuff */
 uint8_t setup_sects;
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3fdf931..e681aaa 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -10,6 +10,7 @@ obj-y += event_channel.o
 obj-y += event_fifo.o
 obj-y +

Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 16:24,  wrote:
> I believe this also has something to do with a windows guest boot hang 
> issue.
> 
> It randomly occured, when boot a guest has windows 2008 os and pv-driver 
> installed.
> The boot process hangs when wait xenstored replay event signal.
> 
> It can be reproduced after hundreds reboot using the xen staging branch.
> But after I changed this code the hang issue can not reproduce.

The change below (which I don't think was ever posted to xen-devel)
does not make any sense, as it prohibits timely delivery of guest
interrupts. If there is an issue, I think you'd need to start with clearly
describing the issue you see and what you think it is being caused by.

Jan

> On 2015/9/7 22:46, - wrote:
>> From: liuqiming 00178499 
>>
>> This set operation doesn't make any sense, and it will block later
>> interrupt injected into vm (by vcpu_kick or deliver_posted_intr),
>> which will cause a performance issue on CPU supporting posted-interrupt.
>>
>> Signed-off-by: Qiming Liu 
>> Cc: Yang Zhang 
>> ---
>>   xen/arch/x86/hvm/vmx/vmx.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index 2582cdd..9480b44 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1681,7 +1681,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu 
> *v)
>>   {
>>   unsigned int cpu = v->processor;
>>   
>> -if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> +if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>&& (cpu != smp_processor_id()) )
>>   send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>   }
> 
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org 
> http://lists.xen.org/xen-devel 




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread Wei Liu
On Fri, Sep 04, 2015 at 03:27:42AM -0600, Jan Beulich wrote:
> Ian, Wei,
> 
> I seem to be seeing two issues in the grant copy handling of netback,
> solely from code inspection:
> 
> 1) Shouldn't MAX_GRANT_COPY_OPS, to take care of the copying
> the header may require, be
> ((MAX_SKB_FRAGS + 1) * XEN_NETIF_RX_RING_SIZE)?
> 

Not sure what you mean by "header". I take it you mean SKB itself? We do
need to use grant copy data inside SKB (not in frag list) to frontend.

It sounds plausible that there is some sort of miscounting, just that no
SKB is seen to be so broken to trigger that.

With that in mind, even MAX_SKB_FRAGS + 1 is not enough. It would be
MAX_SKB_FRAGS + 64K / PAGE_SIZE, i.e. we count the most extreme
situation that we have 63K data in SKB, and 1 byte in each frag.

> 2) xenvif_gop_frag_copy() may set up any number of copy_gop-s
> per meta slot, yet xenvif_check_gop() checks only one.
> 

That's indeed something that need to be fixed.

> Apart from that I find it puzzling that the comment ahead of
> xenvif_gop_frag_copy() talks about flipping interfaces, while iirc
> all flipping code has long been deleted from the upstream driver.
> 

That should be removed IMO.

Wei.

> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [raisin][PATCH] Gracefully handle an unsupported distro.

2015-09-07 Thread Stefano Stabellini
On Tue, 18 Aug 2015, Doug Goldstein wrote:
> When your distro is not supported, handle the case gracefully and let
> the user know instead of spitting out bash errors.
> 
> Signed-off-by: Doug Goldstein 

Thanks for the patch!

Reviewed-by: Stefano Stabellini 


>  lib/common-functions.sh | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/lib/common-functions.sh b/lib/common-functions.sh
> index efde84c..c52174a 100644
> --- a/lib/common-functions.sh
> +++ b/lib/common-functions.sh
> @@ -176,6 +176,7 @@ function get_distro() {
>  ;;
>  *)
>  DISTRO=$os_VENDOR
> +PKGTYPE="unknown"
>  ;;
>  esac
>  
> @@ -220,6 +221,16 @@ function _install-package-rpm() {
>  $SUDO yum install -y $* > /dev/null
>  }
>  
> +function _check-package-unknown() {
> +error_echo "I don't know distro $DISTRO. It might be missing packages."
> +return 1
> +}
> +
> +function _install-package-unknown() {
> +error_echo "I don't know distro $DISTRO. Cannot install packages."
> +return 1
> +}
> +
>  # Modifies inherited variable "missing"
>  function check-package() {
>  for p in $*

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] efi: introduce efi_arch_flush_dcache_area

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 16:13,  wrote:
> Objects loaded by FileHandle->Read need to be flushed to dcache,
> otherwise copy_from_paddr will read stale data when copying the kernel,
> causing a failure to boot.

I have to admit that I'm a little puzzled by this description: If this
was a general requirement (which is how it reads to me), why does
->Read() not do the necessary flushing? Or if it's not a general
requirement, perhaps the description could be changed to make
clear what exact dependency exists that does not exist for all
users of ->Read()?

> --- a/xen/arch/arm/efi/efi-boot.h
> +++ b/xen/arch/arm/efi/efi-boot.h
> @@ -9,6 +9,7 @@
>  #include 
>  
>  void noreturn efi_xen_start(void *fdt_ptr, uint32_t fdt_size);
> +void __flush_dcache_area(const void * vaddr, unsigned long size);

While I see Ian ack-ed this, I find it wrong to replicate a declaration
here that now can get out of sync with the canonical one at any
point in time, without anyone noticing at build time. Or wait - this
looks to be a boot time only interface that so far didn't even have a
C declaration. That's fine then except for the minor coding style issue
(stray blank between "*" and "vaddr").

> --- a/xen/common/efi/boot.c
> +++ b/xen/common/efi/boot.c
> @@ -526,7 +526,8 @@ static bool_t __init read_file(EFI_FILE_HANDLE 
> dir_handle, CHAR16 *name,
>  PrintErr(what);
>  PrintErr(L" failed for ");
>  PrintErrMesg(name, ret);
> -}
> +} else
> +efi_arch_flush_dcache_area(file->ptr, file->size);

No need for the (misplaced) "else" - PrintErrMesg() does not return.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] XEN_NETIF_NR_SLOTS_MIN

2015-09-07 Thread Wei Liu
On Fri, Sep 04, 2015 at 06:50:45AM -0600, Jan Beulich wrote:
> Wei,
> 
> coming back to commit 723a375f4e introducing this constant along
> with some commentary: First of all - why 18 when in old Linux kernels
> MAX_SKB_FRAGS is 18 and the head usually requires another slot?

That indeed didn't count the head slot, which is handled separately. It
also doesn't include meta slots.

Maybe the comment needs to be updated to "slots that contain actual
data"?

> And then - why mostly/only for the (guest) TX path? Don't we have
> two compatibility issues:
> 1) old netfront (max 18 frags) sending via new netback (max 17
> frags)

That's handled by netback by consolidating frags.

> 2) old netback (max 18 frags) handing received packets to new
> frontend (max 17 frags)

That's fine because 18 > 17.

> I.e. shouldn't modern netback be capable to (guest) send at least
> 19 slots despite there only being 17 frags, and shouldn't similarly

It does -- as said above, the head and extra slots are not counted
against that limit.

> modern netfront be able to receive at least 19 slots? In the latter
> case - if not, how should netback guess at what number of slots it
> can safely forward? (Yes, I am aware of David's 1650d5455b

Note that slot != frag. I admit netif.h conflates the two concepts in a
bad way. I think that's due to netback has evolved a lot overtime.

I'm pretty sure there is check in netback to not overrun the ring by
using too many slots.

If you're talking about number of frags, it's not on protocol level.
Linux netfront is able to consolidate several slots into reasonable
number of frags and make sure the number of frags won't exceed limit.
See xennet_fill_frags.

> relying on hypervisor side improvements available only in 4.6, i.e.
> while in general considering this a reasonable approach, I'm not
> sure this should be done unconditionally, and hence I'm trying to
> figure reasonable criteria of when to do this on older hypervisors.)
> 

I think with my above explanation it should be safe for you to not
include 1650d5455b.

Wei.

> Thanks, Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 16:47,  wrote:
> On Fri, Sep 04, 2015 at 03:27:42AM -0600, Jan Beulich wrote:
>> Ian, Wei,
>> 
>> I seem to be seeing two issues in the grant copy handling of netback,
>> solely from code inspection:
>> 
>> 1) Shouldn't MAX_GRANT_COPY_OPS, to take care of the copying
>> the header may require, be
>> ((MAX_SKB_FRAGS + 1) * XEN_NETIF_RX_RING_SIZE)?
>> 
> 
> Not sure what you mean by "header". I take it you mean SKB itself? We do
> need to use grant copy data inside SKB (not in frag list) to frontend.

Yes - with "header" is mean everything up to skb_headlen(skb).

> It sounds plausible that there is some sort of miscounting, just that no
> SKB is seen to be so broken to trigger that.

Why "broken"?

> With that in mind, even MAX_SKB_FRAGS + 1 is not enough. It would be
> MAX_SKB_FRAGS + 64K / PAGE_SIZE, i.e. we count the most extreme
> situation that we have 63K data in SKB, and 1 byte in each frag.

No, every component (head + every frag) can contribute
exactly one copy operation to a RX ring slot (larger heads, just
like larger frags) would get copied into multiple RX ring slots.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-unstable: Keyboard doesn't seem to work when using VNC to HVM with qemu-xen device model.

2015-09-07 Thread Sander Eikelenboom

Monday, September 7, 2015, 3:21:45 PM, you wrote:

> On Sun, Sep 06, 2015 at 12:46:01PM +0200, Sander Eikelenboom wrote:
>> Hi All,
>> 
>> Today i noticed that keyboard doesn't work when using VNC on a HVM guest
>> which runs under qemu-xen device-model.
>> Mouse still works with usbdevice='tablet', but keypress show no response
>> what so ever.
>> 
>> Unfortunately i'm on a tight time budget the next few weeks, so i just
>> switched to device_model_version='qemu-xen-traditional (with the same
>> config) and there all is fine.
>> 
>> This is on a xen-unstable of about 2 weeks ago, since i don't have time to
>> reply, and xen is already in the RC's hopefully someone else can re-test and
>> have a look.
>> 

> I just tried staging branch, keyboard worked. My guest is a my usual
> test guest which has Debian install. Note that I don't run desktop
> environment in that guest.

> I fail to see what changes in recent weeks could affect QEMU VNC
> keyboard.

> Wei.

>> Thanks,
>> 
>> Sander
>> 
>> ___
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel

Ok perhaps it's something special on my install then guest was linux console 
(clonezilla). Unfortunatly no time to retest any time soon, but since it works 
for you it will probably work for most installs :) thanks for testing.

--
Sander


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread Wei Liu
On Mon, Sep 07, 2015 at 09:03:12AM -0600, Jan Beulich wrote:
> >>> On 07.09.15 at 16:47,  wrote:
> > On Fri, Sep 04, 2015 at 03:27:42AM -0600, Jan Beulich wrote:
> >> Ian, Wei,
> >> 
> >> I seem to be seeing two issues in the grant copy handling of netback,
> >> solely from code inspection:
> >> 
> >> 1) Shouldn't MAX_GRANT_COPY_OPS, to take care of the copying
> >> the header may require, be
> >> ((MAX_SKB_FRAGS + 1) * XEN_NETIF_RX_RING_SIZE)?
> >> 
> > 
> > Not sure what you mean by "header". I take it you mean SKB itself? We do
> > need to use grant copy data inside SKB (not in frag list) to frontend.
> 
> Yes - with "header" is mean everything up to skb_headlen(skb).
> 
> > It sounds plausible that there is some sort of miscounting, just that no
> > SKB is seen to be so broken to trigger that.
> 
> Why "broken"?
> 

Broken because I think that's going to introduce some performance
regression. But I see why "broken" is a bad description. Let's say, a
bit "extreme".

> > With that in mind, even MAX_SKB_FRAGS + 1 is not enough. It would be
> > MAX_SKB_FRAGS + 64K / PAGE_SIZE, i.e. we count the most extreme
> > situation that we have 63K data in SKB, and 1 byte in each frag.
> 
> No, every component (head + every frag) can contribute
> exactly one copy operation to a RX ring slot (larger heads, just
> like larger frags) would get copied into multiple RX ring slots.
> 

I think you misunderstand copy_op slot vs RX ring slot.  The limitation
rs for copy_op slots, not RX ring slots. Note that multiplication of
XEN_NETIF_RX_RING_SIZE.

The skb header data len is not limited to one 4K page (4K because that's
what xen grant table interface supports), netback will break skb header
data into multiple copy_op slots.

Wei.

> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] efi: introduce efi_arch_flush_dcache_area

2015-09-07 Thread Stefano Stabellini
On Mon, 7 Sep 2015, Jan Beulich wrote:
> >>> On 07.09.15 at 16:13,  wrote:
> > Objects loaded by FileHandle->Read need to be flushed to dcache,
> > otherwise copy_from_paddr will read stale data when copying the kernel,
> > causing a failure to boot.
> 
> I have to admit that I'm a little puzzled by this description: If this
> was a general requirement (which is how it reads to me)

Yes, it is


> why does > ->Read() not do the necessary flushing? Or if it's not a
> general requirement, perhaps the description could be changed to make
> clear what exact dependency exists that does not exist for all users
> of ->Read()?

It is a general requirement: anything that could be accessed without a
cacheable mapping needs to be flushed. Unfortunately the UEFI spec is
not clear enough on who should do the flushing on what, and in fact the
Forum is working on improving it


> > --- a/xen/arch/arm/efi/efi-boot.h
> > +++ b/xen/arch/arm/efi/efi-boot.h
> > @@ -9,6 +9,7 @@
> >  #include 
> >  
> >  void noreturn efi_xen_start(void *fdt_ptr, uint32_t fdt_size);
> > +void __flush_dcache_area(const void * vaddr, unsigned long size);
> 
> While I see Ian ack-ed this, I find it wrong to replicate a declaration
> here that now can get out of sync with the canonical one at any
> point in time, without anyone noticing at build time. Or wait - this
> looks to be a boot time only interface that so far didn't even have a
> C declaration.

That's right


> That's fine then except for the minor coding style issue
> (stray blank between "*" and "vaddr").

Thanks for spotting that


> > --- a/xen/common/efi/boot.c
> > +++ b/xen/common/efi/boot.c
> > @@ -526,7 +526,8 @@ static bool_t __init read_file(EFI_FILE_HANDLE 
> > dir_handle, CHAR16 *name,
> >  PrintErr(what);
> >  PrintErr(L" failed for ");
> >  PrintErrMesg(name, ret);
> > -}
> > +} else
> > +efi_arch_flush_dcache_area(file->ptr, file->size);
> 
> No need for the (misplaced) "else" - PrintErrMesg() does not return.

Ah! Thanks!

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread David Vrabel
On 04/09/15 10:27, Jan Beulich wrote:
> Ian, Wei,
> 
> I seem to be seeing two issues in the grant copy handling of netback,
> solely from code inspection:
> 
> 1) Shouldn't MAX_GRANT_COPY_OPS, to take care of the copying
> the header may require, be
> ((MAX_SKB_FRAGS + 1) * XEN_NETIF_RX_RING_SIZE)?

We really should have the copy batches be independent of ring size and
skb format.

e.g., Call xenvif_gnt_copy_add(ref, ptr, len) for each copy op needed.
This will issue the hypercall when the batch is full.  There would need
to be axenvif_gnt_copy_flush() at the end for last hypercall for the
remaining ops.

David


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] XEN_NETIF_NR_SLOTS_MIN

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 16:56,  wrote:
> On Fri, Sep 04, 2015 at 06:50:45AM -0600, Jan Beulich wrote:
>> Wei,
>> 
>> coming back to commit 723a375f4e introducing this constant along
>> with some commentary: First of all - why 18 when in old Linux kernels
>> MAX_SKB_FRAGS is 18 and the head usually requires another slot?
> 
> That indeed didn't count the head slot, which is handled separately. It
> also doesn't include meta slots.
> 
> Maybe the comment needs to be updated to "slots that contain actual
> data"?

That would take care of the meta slots, but the head to me also is
"actual data".

>> And then - why mostly/only for the (guest) TX path? Don't we have
>> two compatibility issues:
>> 1) old netfront (max 18 frags) sending via new netback (max 17
>> frags)
> 
> That's handled by netback by consolidating frags.

Right - that's the TX path that the description in netif.h refers to.

>> 2) old netback (max 18 frags) handing received packets to new
>> frontend (max 17 frags)
> 
> That's fine because 18 > 17.

Exactly because of 18 > 17 this is not fine: How would netfront fit
18 slots handed to it by netback into 17 frags without special
adjustments?

>> I.e. shouldn't modern netback be capable to (guest) send at least
>> 19 slots despite there only being 17 frags, and shouldn't similarly
> 
> It does -- as said above, the head and extra slots are not counted
> against that limit.
> 
>> modern netfront be able to receive at least 19 slots? In the latter
>> case - if not, how should netback guess at what number of slots it
>> can safely forward? (Yes, I am aware of David's 1650d5455b
> 
> Note that slot != frag. I admit netif.h conflates the two concepts in a
> bad way. I think that's due to netback has evolved a lot overtime.
> 
> I'm pretty sure there is check in netback to not overrun the ring by
> using too many slots.
> 
> If you're talking about number of frags, it's not on protocol level.
> Linux netfront is able to consolidate several slots into reasonable
> number of frags and make sure the number of frags won't exceed limit.
> See xennet_fill_frags.

That's too late - xennet_get_responses() would have rejected such
a packet already afaict (error message "Too many slots").

>> relying on hypervisor side improvements available only in 4.6, i.e.
>> while in general considering this a reasonable approach, I'm not
>> sure this should be done unconditionally, and hence I'm trying to
>> figure reasonable criteria of when to do this on older hypervisors.)
> 
> I think with my above explanation it should be safe for you to not
> include 1650d5455b.

Without an equivalent of David's change I don't see how netfront
would cope with what _old_ netback may hand it.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] netback grant copying issues

2015-09-07 Thread Jan Beulich
>>> On 07.09.15 at 17:10,  wrote:
> On Mon, Sep 07, 2015 at 09:03:12AM -0600, Jan Beulich wrote:
>> >>> On 07.09.15 at 16:47,  wrote:
>> > With that in mind, even MAX_SKB_FRAGS + 1 is not enough. It would be
>> > MAX_SKB_FRAGS + 64K / PAGE_SIZE, i.e. we count the most extreme
>> > situation that we have 63K data in SKB, and 1 byte in each frag.
>> 
>> No, every component (head + every frag) can contribute
>> exactly one copy operation to a RX ring slot (larger heads, just
>> like larger frags) would get copied into multiple RX ring slots.
>> 
> 
> I think you misunderstand copy_op slot vs RX ring slot.  The limitation
> rs for copy_op slots, not RX ring slots. Note that multiplication of
> XEN_NETIF_RX_RING_SIZE.
> 
> The skb header data len is not limited to one 4K page (4K because that's
> what xen grant table interface supports), netback will break skb header
> data into multiple copy_op slots.

But in that case it will not only use multiple copy_op slots, but also
copy into multiple RX slots. I.e. still - per RX slot there can only be
MAX_SKB_FRAGS+1 copy_op slots.

Two examples may help

head1 byte
frag1   1 byte
...
fragN   1 byte

will need N+1 copy slots and one RX slot.

head8k  2 copy slots, 2 RX slots
frag1   1 byte  3rd copy slot, 3rd RX slot
...
fragN   1 byte  N+2nd copy slot, 3rd RX slot

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 24/31] xen/arm: ITS: Add GICR register emulation

2015-09-07 Thread Vijay Kilari
On Mon, Sep 7, 2015 at 7:50 PM, Julien Grall  wrote:
> Hi Vijay,
>
> On 31/08/15 12:06, vijay.kil...@gmail.com wrote:
>> +static int vgic_v3_gits_lpi_mmio_read(struct vcpu *v, mmio_info_t *info)
>
> [...]
>
>> +/* Neglect if not LPI. */
>> +if ( offset < FIRST_GIC_LPI )
>> +{
>> +*r = 0;
>> +return 1;
>> +}
>
> [...]
>
>> +static int vgic_v3_gits_lpi_mmio_write(struct vcpu *v, mmio_info_t *info)
>> +{
>
> [...]
>
>> +/* Neglect if not LPI. */
>> +if ( offset < FIRST_GIC_LPI )
>> +return 1;
>
> Based on the spec, those 2 checks are wrong and make impossible to use
> LPIs. Please test this patch series before sending it on the ML.

Why do you think so?.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Patch V1 1/3] xen: introduce dummy system device

2015-09-07 Thread Stefano Stabellini
On Thu, 3 Sep 2015, Juergen Gross wrote:
> Introduce a new dummy system device serving as parent for virtual
> buses. This will enable new pv backends to introduce virtual buses
> which are removable again opposed to system buses which are meant
> to stay once added.
> 
> Signed-off-by: Juergen Gross 
> ---
>  hw/xenpv/xen_machine_pv.c| 39 +++
>  include/hw/xen/xen_backend.h |  1 +
>  2 files changed, 40 insertions(+)
> 
> diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c
> index 2e545d2..57bc071 100644
> --- a/hw/xenpv/xen_machine_pv.c
> +++ b/hw/xenpv/xen_machine_pv.c
> @@ -24,10 +24,15 @@
>  
>  #include "hw/hw.h"
>  #include "hw/boards.h"
> +#include "hw/sysbus.h"
>  #include "hw/xen/xen_backend.h"
>  #include "xen_domainbuild.h"
>  #include "sysemu/block-backend.h"
>  
> +#define TYPE_XENSYSDEV "xensysdev"
> +
> +DeviceState *xen_sysdev;
> +
>  static void xen_init_pv(MachineState *machine)
>  {
>  const char *kernel_filename = machine->kernel_filename;
> @@ -59,6 +64,9 @@ static void xen_init_pv(MachineState *machine)
>  break;
>  }
>  
> +xen_sysdev = qdev_create(NULL, TYPE_XENSYSDEV);
> +qdev_init_nofail(xen_sysdev);
> +
>  xen_be_register("console", &xen_console_ops);
>  xen_be_register("vkbd", &xen_kbdmouse_ops);
>  xen_be_register("vfb", &xen_framebuffer_ops);
> @@ -93,6 +101,31 @@ static void xen_init_pv(MachineState *machine)
>  xen_init_display(xen_domid);
>  }
>  
> +static int xen_sysdev_init(SysBusDevice *dev)
> +{
> +return 0;
> +}
> +
> +static Property xen_sysdev_properties[] = {
> +{/* end of property list */},
> +};
> +
> +static void xen_sysdev_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
> +
> +k->init = xen_sysdev_init;
> +dc->props = xen_sysdev_properties;
> +}
> +
> +static const TypeInfo xensysdev_info = {
> +.name  = TYPE_XENSYSDEV,
> +.parent= TYPE_SYS_BUS_DEVICE,
> +.instance_size = sizeof(SysBusDevice),
> +.class_init= xen_sysdev_class_init,
> +};
> +
>  static QEMUMachine xenpv_machine = {
>  .name = "xenpv",
>  .desc = "Xen Para-virtualized PC",
> @@ -101,9 +134,15 @@ static QEMUMachine xenpv_machine = {
>  .default_machine_opts = "accel=xen",
>  };
>  
> +static void xenpv_register_types(void)
> +{
> +type_register_static(&xensysdev_info);
> +}

Given that you need this just for usbback, I wonder if you could
move xen_sysdev and its initalization to usbback_init.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 03/20] xen: Add Xen specific page definition

2015-09-07 Thread Julien Grall
The Xen hypercall interface is always using 4K page granularity on ARM
and x86 architecture.

With the incoming support of 64K page granularity for ARM64 guest, it
won't be possible to re-use the Linux page definition in Xen drivers.

Introduce Xen page definition helpers based on the Linux page
definition. They have exactly the same name but prefixed with
XEN_/xen_ prefix.

Also modify xen_page_to_gfn to use new Xen page definition.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Changes in v4:
- Typoes
- Rename xen_page_to_pfn to page_to_xen_pfn

Changes in v3:
- Fix errors reported by checkpatch.pl
- Rename pfn to xen_pfn in xen_pfn_to_page
- Add a comment that we assume PAGE_SIZE to be a multiple of
XEN_PAGE_SIZE
- s/MFN/GFN/ according to new naming
- Add Stefano's reviewed-by

Changes in v2:
- Add XEN_PFN_UP
- Add a comment describing the behavior of page_to_pfn
---
 include/xen/page.h | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/xen/page.h b/include/xen/page.h
index 1daae48..96294ac 100644
--- a/include/xen/page.h
+++ b/include/xen/page.h
@@ -1,11 +1,36 @@
 #ifndef _XEN_PAGE_H
 #define _XEN_PAGE_H
 
+#include 
+
+/* The hypercall interface supports only 4KB page */
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE  (_AC(1, UL) << XEN_PAGE_SHIFT)
+#define XEN_PAGE_MASK  (~(XEN_PAGE_SIZE-1))
+#define xen_offset_in_page(p)  ((unsigned long)(p) & ~XEN_PAGE_MASK)
+
+/*
+ * We assume that PAGE_SIZE is a multiple of XEN_PAGE_SIZE
+ * XXX: Add a BUILD_BUG_ON?
+ */
+
+#define xen_pfn_to_page(xen_pfn)   \
+   ((pfn_to_page(((unsigned long)(xen_pfn) << XEN_PAGE_SHIFT) >> 
PAGE_SHIFT)))
+#define page_to_xen_pfn(page)  \
+   (((page_to_pfn(page)) << PAGE_SHIFT) >> XEN_PAGE_SHIFT)
+
+#define XEN_PFN_PER_PAGE   (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define XEN_PFN_DOWN(x)((x) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_UP(x)  (((x) + XEN_PAGE_SIZE-1) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_PHYS(x)((phys_addr_t)(x) << XEN_PAGE_SHIFT)
+
 #include 
 
+/* Return the GFN associated to the first 4KB of the page */
 static inline unsigned long xen_page_to_gfn(struct page *page)
 {
-   return pfn_to_gfn(page_to_pfn(page));
+   return pfn_to_gfn(page_to_xen_pfn(page));
 }
 
 struct xen_memory_region {
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 06/20] block/xen-blkfront: Split blkif_queue_request in 2

2015-09-07 Thread Julien Grall
Currently, blkif_queue_request has 2 distinct execution path:
- Send a discard request
- Send a read/write request

The function is also allocating grants to use for generating the
request. Although, this is only used for read/write request.

Rather than having a function with 2 distinct execution path, separate
the function in 2. This will also remove one level of tabulation.

Signed-off-by: Julien Grall 
Reviewed-by: Roger Pau Monné 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Roger, if you really want if can drop the else clause in
blkif_queue_request, IHMO it's more clear here. Although I've kept
your Reviewed-by. Let me know if it's not fine.

Changes in v3:
- Fix errors reported by checkpatch.pl
- Add Roger's Reviewed-by

Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 277 ---
 1 file changed, 153 insertions(+), 124 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 432e105..b11f084 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -395,13 +395,35 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t 
mode,
return 0;
 }
 
-/*
- * Generate a Xen blkfront IO request from a blk layer request.  Reads
- * and writes are handled as expected.
- *
- * @req: a request struct
- */
-static int blkif_queue_request(struct request *req)
+static int blkif_queue_discard_req(struct request *req)
+{
+   struct blkfront_info *info = req->rq_disk->private_data;
+   struct blkif_request *ring_req;
+   unsigned long id;
+
+   /* Fill out a communications ring structure. */
+   ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+   id = get_id_from_freelist(info);
+   info->shadow[id].request = req;
+
+   ring_req->operation = BLKIF_OP_DISCARD;
+   ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
+   ring_req->u.discard.id = id;
+   ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
+   if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
+   ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
+   else
+   ring_req->u.discard.flag = 0;
+
+   info->ring.req_prod_pvt++;
+
+   /* Keep a private copy so we can reissue requests when recovering. */
+   info->shadow[id].req = *ring_req;
+
+   return 0;
+}
+
+static int blkif_queue_rw_req(struct request *req)
 {
struct blkfront_info *info = req->rq_disk->private_data;
struct blkif_request *ring_req;
@@ -421,9 +443,6 @@ static int blkif_queue_request(struct request *req)
struct scatterlist *sg;
int nseg, max_grefs;
 
-   if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
-   return 1;
-
max_grefs = req->nr_phys_segments;
if (max_grefs > BLKIF_MAX_SEGMENTS_PER_REQUEST)
/*
@@ -453,139 +472,131 @@ static int blkif_queue_request(struct request *req)
id = get_id_from_freelist(info);
info->shadow[id].request = req;
 
-   if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE))) {
-   ring_req->operation = BLKIF_OP_DISCARD;
-   ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
-   ring_req->u.discard.id = id;
-   ring_req->u.discard.sector_number = 
(blkif_sector_t)blk_rq_pos(req);
-   if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
-   ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
-   else
-   ring_req->u.discard.flag = 0;
+   BUG_ON(info->max_indirect_segments == 0 &&
+  req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+   BUG_ON(info->max_indirect_segments &&
+  req->nr_phys_segments > info->max_indirect_segments);
+   nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+   ring_req->u.rw.id = id;
+   if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+   /*
+* The indirect operation can only be a BLKIF_OP_READ or
+* BLKIF_OP_WRITE
+*/
+   BUG_ON(req->cmd_flags & (REQ_FLUSH | REQ_FUA));
+   ring_req->operation = BLKIF_OP_INDIRECT;
+   ring_req->u.indirect.indirect_op = rq_data_dir(req) ?
+   BLKIF_OP_WRITE : BLKIF_OP_READ;
+   ring_req->u.indirect.sector_number = 
(blkif_sector_t)blk_rq_pos(req);
+   ring_req->u.indirect.handle = info->handle;
+   ring_req->u.indirect.nr_segments = nseg;
} else {
-   BUG_ON(info->max_indirect_segments == 0 &&
-  req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
-   BUG_ON(info->max_indirect_segments &&
-  req->nr_phys_segments > info->max_indirect_segments);
-   nseg = blk_r

[Xen-devel] [PATCH v4 05/20] xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one

2015-09-07 Thread Julien Grall
Many PV drivers contain the idiom:

pfn = page_to_gfn(...) /* Or similar */
gnttab_grant_foreign_access_ref

Replace it by a new helper. Note that when Linux is using a different
page granularity than Xen, the helper only gives access to the first 4KB
grant.

This is useful where drivers are allocating a full Linux page for each
grant.

Also include xen/interface/grant_table.h rather than xen/grant_table.h in
asm/page.h for x86 to fix a compilation issue [1]. Only the former is
useful in order to get the structure definition.

[1] Interdependency between asm/page.h and xen/grant_table.h which result
to page_mfn not being defined when necessary.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Changes in v3:
- Rename gnttab_page_grant_foreign_access_ref into
gnttab_page_grant_foreign_access_ref_one
- Fix typo in the commit message
- s/mfn/gfn based on the new naming
- Add David and Stefano's reviewed-by

Changes in v2:
- Patch added
---
 include/xen/grant_table.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 05b5b08..e17a4b3 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -131,6 +131,15 @@ void gnttab_cancel_free_callback(struct 
gnttab_free_callback *callback);
 void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
 unsigned long frame, int readonly);
 
+/* Give access to the first 4K of the page */
+static inline void gnttab_page_grant_foreign_access_ref_one(
+   grant_ref_t ref, domid_t domid,
+   struct page *page, int readonly)
+{
+   gnttab_grant_foreign_access_ref(ref, domid, xen_page_to_gfn(page),
+   readonly);
+}
+
 void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
   unsigned long pfn);
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 00/20] xen/arm64: Add support for 64KB page in Linux

2015-09-07 Thread Julien Grall
Hi all,

ARM64 Linux is supporting both 4KB and 64KB page granularity. Although, Xen
hypercall interface and PV protocol are always based on 4KB page granularity.

Any attempt to boot a Linux guest with 64KB pages enabled will result to a
guest crash.

This series is a first attempt to allow those Linux running with the current
hypercall interface and PV protocol.

This solution has been chosen because we want to run Linux 64KB in released
Xen ARM version or/and platform using an old version of Linux DOM0.

There is room for improvement, such as support of 64KB grant, modification
of PV protocol to support different page size... They will be explored in a
separate patch series later.

TODO list:
- Convert swiotlb to 64KB
- Convert xenfb to 64KB
- Support for multiple page ring support
- Support for 64KB in gnttdev
- Support of non-indirect grant with 64KB frontend
- It may be possible to move some common define between
netback/netfront and blkfront/blkback in an header

I've got most of the patches for the TODO items. I'm planning to send them as
a follow-up as it's not a requirement for a basic guests.

All patches has been built tested for ARM32, ARM64, x86. But I haven't tested
to run it on x86 as I don't have a box with Xen x86 running. I would be
happy if someone give a try and see possible regression for x86.

I know that Konrad as a test-suite for x86. Konrand, would it be possible to
give a run to for this series?

A branch based on the latest xentip/for-linus-4.3 can be found here:

git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v4

Comments, suggestions are welcomed.

Sincerely yours,

Cc: david.vra...@citrix.com
Cc: konrad.w...@oracle.com
Cc: boris.ostrov...@oracle.com
Cc: wei.l...@citrix.com
Cc: roger@citrix.com

Status of each patch:

A: Reviewed-by - Acked-by
M: Patch modified in this series
m: Minor changes in this series (i.e renaming due to previous patches, typoes)
L: Missing Acked-by from a Linux maintainers (Boris, David, Konrad)
N: Missing Acked-by from a Netback maintainers (Ian or Wei)

Julien Grall (20):
A   net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
A   arm/xen: Drop pte_mfn and mfn_pte
A M L   xen: Add Xen specific page definition
A M xen/grant: Introduce helpers to split a page into grant
A   xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one
A   block/xen-blkfront: Split blkif_queue_request in 2
A m block/xen-blkfront: Store a page rather a pfn in the grant structure
A   block/xen-blkfront: split get_grant in 2
A m L   xen/biomerge: Don't allow biovec's to be merged when Linux is not
  using 4KB pages
A   xen/xenbus: Use Xen page definition
A m L   tty/hvc: xen: Use xen page definition
  M L   xen/balloon: Don't rely on the page granularity is the same for Xen
  and Linux
A   xen/events: fifo: Make it running on 64KB granularity
A   xen/grant-table: Make it running on 64KB granularity
A m block/xen-blkfront: Make it running on 64KB page granularity
A   block/xen-blkback: Make it running on 64KB page granularity
A m net/xen-netfront: Make it running on 64KB page granularity
  m  N  net/xen-netback: Make it running on 64KB page granularity
A m xen/privcmd: Add support for Linux 64KB page granularity
A   arm/xen: Add support for 64KB page granularity

 arch/arm/include/asm/xen/page.h |  18 +-
 arch/arm/xen/enlighten.c|   6 +-
 arch/arm/xen/p2m.c  |   6 +-
 arch/x86/include/asm/xen/page.h |   2 +-
 drivers/block/xen-blkback/blkback.c |   5 +-
 drivers/block/xen-blkback/common.h  |  17 +-
 drivers/block/xen-blkback/xenbus.c  |   9 +-
 drivers/block/xen-blkfront.c| 552 +++-
 drivers/net/xen-netback/common.h|  18 +-
 drivers/net/xen-netback/netback.c   | 163 +++
 drivers/net/xen-netfront.c  | 122 +---
 drivers/tty/hvc/hvc_xen.c   |   4 +-
 drivers/xen/balloon.c   |  59 +++-
 drivers/xen/biomerge.c  |   8 +
 drivers/xen/events/events_base.c|   2 +-
 drivers/xen/events/events_fifo.c|   2 +-
 drivers/xen/grant-table.c   |  32 ++-
 drivers/xen/privcmd.c   |   8 +-
 drivers/xen/xenbus/xenbus_client.c  |   6 +-
 drivers/xen/xenbus/xenbus_probe.c   |   3 +-
 drivers/xen/xlate_mmu.c | 124 +---
 include/xen/grant_table.h   |  51 
 include/xen/page.h  |  27 +-
 23 files changed, 855 insertions(+), 389 deletions(-)

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 01/20] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop

2015-09-07 Thread Julien Grall
The skb doesn't change within the function. Therefore it's only
necessary to check if we need GSO once at the beginning.

Signed-off-by: Julien Grall 
Acked-by: Wei Liu 

---
Cc: Ian Campbell 
Cc: net...@vger.kernel.org

Changes in v4:
- Add Wei's acked

Changes in v2:
- Patch added
---
 drivers/net/xen-netback/netback.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index 7c64c74..d4c1bc7 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -277,6 +277,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
*queue, struct sk_buff *skb
unsigned long bytes;
int gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
+   if (skb_is_gso(skb)) {
+   if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+   else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+   }
+
/* Data must not cross a page boundary. */
BUG_ON(size + offset > PAGE_SIZEgso_type & SKB_GSO_TCPV6)
-   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
-   }
-
if (*head && ((1 << gso_type) & queue->vif->gso_mask))
queue->rx.req_cons++;
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 04/20] xen/grant: Introduce helpers to split a page into grant

2015-09-07 Thread Julien Grall
Currently, a grant is always based on the Xen page granularity (i.e
4KB). When Linux is using a different page granularity, a single page
will be split between multiple grants.

The new helpers will be in charge of splitting the Linux page into grants
and call a function given by the caller on each grant.

Also provide an helper to count the number of grants within a given
contiguous region.

Note that the x86/include/asm/xen/page.h is now including
xen/interface/grant_table.h rather than xen/grant_table.h. It's
necessary because xen/grant_table.h depends on asm/xen/page.h and will
break the compilation. Furthermore, only definition in
interface/grant_table.h is required.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org

Changes in v4:
- Typoes
- Rename gnttab_one_grant into gnttab_for_one_grant
- Add Stefano and David's reviewed-by
- s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming

Changes in v3:
- Fix error reported by checkpatch.pl
- Typoes
- s/pfn/xen_pfn/ in gnttab_foreach_grant
- Drop the possibility to use less data. The complexity is moved
in netback which is the only user
- Rename gnttab_foreach_grant into gnttab_foreach_grant_in_range
- s/offset/start/ in gnttab_count_grant and update the
description of the parameter
- s/mfn/gfn base on the new terminologies
- Add EXPORT_SYMBOL_GPL for gnttab_foreach_grant_in_range
- Use xen_offset_in_page and XEN_PFN_DOWN whenever it's possible
- Fix compilation on x86.

Changes in v2:
- Patch added
---
 arch/x86/include/asm/xen/page.h |  2 +-
 drivers/xen/grant-table.c   | 26 +
 include/xen/grant_table.h   | 42 +
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 0b762f6..501479e 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -12,7 +12,7 @@
 #include 
 
 #include 
-#include 
+#include 
 #include 
 
 /* Xen machine address */
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 62f591f..7b4e1cf 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -776,6 +776,32 @@ void gnttab_batch_copy(struct gnttab_copy *batch, unsigned 
count)
 }
 EXPORT_SYMBOL_GPL(gnttab_batch_copy);
 
+void gnttab_foreach_grant_in_range(struct page *page,
+  unsigned int offset,
+  unsigned int len,
+  xen_grant_fn_t fn,
+  void *data)
+{
+   unsigned int goffset;
+   unsigned int glen;
+   unsigned long xen_pfn;
+
+   len = min_t(unsigned int, PAGE_SIZE - offset, len);
+   goffset = xen_offset_in_page(offset);
+
+   xen_pfn = page_to_xen_pfn(page) + XEN_PFN_DOWN(offset);
+
+   while (len) {
+   glen = min_t(unsigned int, XEN_PAGE_SIZE - goffset, len);
+   fn(pfn_to_gfn(xen_pfn), goffset, glen, data);
+
+   goffset = 0;
+   xen_pfn++;
+   len -= glen;
+   }
+}
+EXPORT_SYMBOL_GPL(gnttab_foreach_grant_in_range);
+
 int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count)
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 4478f4b..05b5b08 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -45,8 +45,10 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
+#include 
 
 #define GNTTAB_RESERVED_XENSTORE 1
 
@@ -224,4 +226,44 @@ static inline struct xen_page_foreign 
*xen_page_foreign(struct page *page)
 #endif
 }
 
+/* Split Linux page in chunk of the size of the grant and call fn
+ *
+ * Parameters of fn:
+ * gfn: guest frame number
+ * offset: offset in the grant
+ * len: length of the data in the grant.
+ * data: internal information
+ */
+typedef void (*xen_grant_fn_t)(unsigned long gfn, unsigned int offset,
+  unsigned int len, void *data);
+
+void gnttab_foreach_grant_in_range(struct page *page,
+  unsigned int offset,
+  unsigned int len,
+  xen_grant_fn_t fn,
+  void *data);
+
+/* Helper to get to call fn only on the first "grant chunk" */
+static inline void gnttab_for_one_grant(struct page *page, unsigned int offset,
+   unsigned len, xen_grant_fn_t fn,
+   void *data)
+{
+   /* The first request is 

[Xen-devel] [PATCH v4 08/20] block/xen-blkfront: split get_grant in 2

2015-09-07 Thread Julien Grall
Prepare the code to support 64KB page granularity. The first
implementation will use a full Linux page per indirect and persistent
grant. When non-persistent grant is used, each page of a bio request
may be split in multiple grant.

Furthermore, the field page of the grant structure is only used to copy
data from persistent grant or indirect grant. Avoid to set it for other
use case as it will have no meaning given the page will be split in
multiple grant.

Provide 2 functions, to setup indirect grant, the other for bio page.

Signed-off-by: Julien Grall 
Acked-by: Roger Pau Monné 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Changes in v4:
- Add Roger's acked-by

Changes in v3:
- Fix errors reported by checkpatch.pl
- gnttab_page_grant_foreign_access_ref has been renamed to
gnttab_page_grant_foreign_access_ref_one
- Fix compilation by using get_indirect_grant rather than
get_grant (the changes was in a later patch...).
- Make grant_foreign_access static inline
- s/mfn/gfn/ based on the new naming

Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 88 +---
 1 file changed, 59 insertions(+), 29 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 556475d..4232cbd 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -245,34 +245,77 @@ out_of_memory:
return -ENOMEM;
 }
 
-static struct grant *get_grant(grant_ref_t *gref_head,
-  struct page *page,
-  struct blkfront_info *info)
+static struct grant *get_free_grant(struct blkfront_info *info)
 {
struct grant *gnt_list_entry;
-   unsigned long buffer_gfn;
 
BUG_ON(list_empty(&info->grants));
gnt_list_entry = list_first_entry(&info->grants, struct grant,
- node);
+ node);
list_del(&gnt_list_entry->node);
 
-   if (gnt_list_entry->gref != GRANT_INVALID_REF) {
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
info->persistent_gnts_c--;
+
+   return gnt_list_entry;
+}
+
+static inline void grant_foreign_access(const struct grant *gnt_list_entry,
+   const struct blkfront_info *info)
+{
+   gnttab_page_grant_foreign_access_ref_one(gnt_list_entry->gref,
+info->xbdev->otherend_id,
+gnt_list_entry->page,
+0);
+}
+
+static struct grant *get_grant(grant_ref_t *gref_head,
+  unsigned long gfn,
+  struct blkfront_info *info)
+{
+   struct grant *gnt_list_entry = get_free_grant(info);
+
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
return gnt_list_entry;
+
+   /* Assign a gref to this page */
+   gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
+   BUG_ON(gnt_list_entry->gref == -ENOSPC);
+   if (info->feature_persistent)
+   grant_foreign_access(gnt_list_entry, info);
+   else {
+   /* Grant access to the GFN passed by the caller */
+   gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
+   info->xbdev->otherend_id,
+   gfn, 0);
}
 
+   return gnt_list_entry;
+}
+
+static struct grant *get_indirect_grant(grant_ref_t *gref_head,
+   struct blkfront_info *info)
+{
+   struct grant *gnt_list_entry = get_free_grant(info);
+
+   if (gnt_list_entry->gref != GRANT_INVALID_REF)
+   return gnt_list_entry;
+
/* Assign a gref to this page */
gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
-   BUG_ON(!page);
-   gnt_list_entry->page = page;
+   struct page *indirect_page;
+
+   /* Fetch a pre-allocated page to use for indirect grefs */
+   BUG_ON(list_empty(&info->indirect_pages));
+   indirect_page = list_first_entry(&info->indirect_pages,
+struct page, lru);
+   list_del(&indirect_page->lru);
+   gnt_list_entry->page = indirect_page;
}
-   buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
-   gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
-   info->xbdev->otherend_id,
-   buffer_gfn, 0);
+   grant_foreign_access(gnt_list_entry, info);
+
return gnt_list_entry;
 }
 
@@ -525,32 +568,19 @@ static 

[Xen-devel] [PATCH v4 09/20] xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB pages

2015-09-07 Thread Julien Grall
On ARM all dma-capable devices on a same platform may not be protected
by an IOMMU. The DMA requests have to use the BFN (i.e MFN on ARM) in
order to use correctly the device.

While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
mapping will screw this contiguous mapping.

When Linux is using 64KB page granularitary, the page may be split
accross multiple non-contiguous MFN (Xen is using 4KB page
granularity). Therefore a DMA request will likely fail.

Checking that a 64KB page is using contiguous MFN is tedious. For
now, always says that biovec are not mergeable.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

There is some ideas to check whether two biovec could be merged
(see [1]) but it's not critical and can be consider as a performance
improvement.

Changes in v4:
- Fix typoes in the subject
- Add Stefano's reviewed-by

Changes in v3:
- Update commit message
- s/mfn/bfn/ base on the new renaming
- Update TODO

Changes in v2:
- Remove the workaround and check if the Linux page granularity
is the same as Xen or not

[1] https://lkml.org/lkml/2015/7/17/418
---
 drivers/xen/biomerge.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
index 8ae2fc90..4da69db 100644
--- a/drivers/xen/biomerge.c
+++ b/drivers/xen/biomerge.c
@@ -6,10 +6,18 @@
 bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
   const struct bio_vec *vec2)
 {
+#if XEN_PAGE_SIZE == PAGE_SIZE
unsigned long bfn1 = pfn_to_bfn(page_to_pfn(vec1->bv_page));
unsigned long bfn2 = pfn_to_bfn(page_to_pfn(vec2->bv_page));
 
return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
((bfn1 == bfn2) || ((bfn1+1) == bfn2));
+#else
+   /*
+* XXX: Add support for merging bio_vec when using different page
+* size in Xen and Linux.
+*/
+   return 0;
+#endif
 }
 EXPORT_SYMBOL(xen_biovec_phys_mergeable);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 07/20] block/xen-blkfront: Store a page rather a pfn in the grant structure

2015-09-07 Thread Julien Grall
All the usage of the field pfn are done using the same idiom:

pfn_to_page(grant->pfn)

This will  return always the same page. Store directly the page in the
grant to clean up the code.

Signed-off-by: Julien Grall 
Acked-by: Roger Pau Monné 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Roger, Stefano, I kept your Acked-by/Reviewed-by because the rebase was
minor. Let me know if you disagree.

Changes in v4:
- rebase after 7adf12b87f45a77d364464018fb8e9e1ac875152
"xen-blkfront: don't add indirect pages to list when
!feature_persistent"

Changes in v3:
- Use the correct indentation in get_grant. The current
indentation (i.e without this patch) was wrong because it was
using space rather than tabulation.
- Add Roger's acked and Stefano's reviewed
- s/mfn/gfn based on the new naming

Changes in v2:
- Patch added
---
 drivers/block/xen-blkfront.c | 39 +++
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b11f084..556475d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -68,7 +68,7 @@ enum blkif_state {
 
 struct grant {
grant_ref_t gref;
-   unsigned long pfn;
+   struct page *page;
struct list_head node;
 };
 
@@ -222,7 +222,7 @@ static int fill_grant_buffer(struct blkfront_info *info, 
int num)
kfree(gnt_list_entry);
goto out_of_memory;
}
-   gnt_list_entry->pfn = page_to_pfn(granted_page);
+   gnt_list_entry->page = granted_page;
}
 
gnt_list_entry->gref = GRANT_INVALID_REF;
@@ -237,7 +237,7 @@ out_of_memory:
 &info->grants, node) {
list_del(&gnt_list_entry->node);
if (info->feature_persistent)
-   __free_page(pfn_to_page(gnt_list_entry->pfn));
+   __free_page(gnt_list_entry->page);
kfree(gnt_list_entry);
i--;
}
@@ -246,8 +246,8 @@ out_of_memory:
 }
 
 static struct grant *get_grant(grant_ref_t *gref_head,
-   unsigned long pfn,
-   struct blkfront_info *info)
+  struct page *page,
+  struct blkfront_info *info)
 {
struct grant *gnt_list_entry;
unsigned long buffer_gfn;
@@ -266,10 +266,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
-   BUG_ON(!pfn);
-   gnt_list_entry->pfn = pfn;
+   BUG_ON(!page);
+   gnt_list_entry->page = page;
}
-   buffer_gfn = pfn_to_gfn(gnt_list_entry->pfn);
+   buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
info->xbdev->otherend_id,
buffer_gfn, 0);
@@ -525,7 +525,7 @@ static int blkif_queue_rw_req(struct request *req)
 
if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
(i % SEGS_PER_INDIRECT_FRAME == 0)) {
-   unsigned long uninitialized_var(pfn);
+   struct page *uninitialized_var(page);
 
if (segments)
kunmap_atomic(segments);
@@ -542,15 +542,15 @@ static int blkif_queue_rw_req(struct request *req)
indirect_page = 
list_first_entry(&info->indirect_pages,
 struct page, 
lru);
list_del(&indirect_page->lru);
-   pfn = page_to_pfn(indirect_page);
+   page = indirect_page;
}
-   gnt_list_entry = get_grant(&gref_head, pfn, info);
+   gnt_list_entry = get_grant(&gref_head, page, info);
info->shadow[id].indirect_grants[n] = gnt_list_entry;
-   segments = 
kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+   segments = kmap_atomic(gnt_list_entry->page);
ring_req->u.indirect.indirect_grefs[n] = 
gnt_list_entry->gref;
}
 
-   gnt_list_entry = get_grant(&gref_head, 
page_to_pfn(sg_page(sg)), info);
+   gnt_list_entry = get_grant(&gref_head, sg_page(sg), info);
ref = gnt_list_entry->gref;
 
info->shadow[id].grants_used[i] = gnt_list_entry;
@@ -561,7 +5

[Xen-devel] [PATCH v4 02/20] arm/xen: Drop pte_mfn and mfn_pte

2015-09-07 Thread Julien Grall
They are not used in common code expect in one place in balloon.c which is
only compiled when Linux is using PV MMU. It's not the case on ARM.

Rather than worrying how to handle the 64KB case, drop them.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Russell King 

Changes in v4:
- Add Stefano's reviewed

Changes in v3:
- Patch added
---
 arch/arm/include/asm/xen/page.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 1279563..98c9fc3 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -13,9 +13,6 @@
 
 #define phys_to_machine_mapping_valid(pfn) (1)
 
-#define pte_mfnpte_pfn
-#define mfn_ptepfn_pte
-
 /* Xen machine address */
 typedef struct xmaddr {
phys_addr_t maddr;
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 15/20] block/xen-blkfront: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV block protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity using block
device on a non-modified Xen.

The block API is using segment which should at least be the size of a
Linux page. Therefore, the driver will have to break the page in chunk
of 4K before giving the page to the backend.

When breaking a 64KB segment in 4KB chunks, it is possible that some
chunks are empty. As the PV protocol always require to have data in the
chunk, we have to count the number of Xen page which will be in use and
avoid sending empty chunks.

Note that, a pre-defined number of grants are reserved before preparing
the request. This pre-defined number is based on the number and the
maximum size of the segments. If each segment contains a very small
amount of data, the driver may reserve too many grants (16 grants is
reserved per segment with 64KB page granularity).

Furthermore, in the case of persistent grants we allocate one Linux page
per grant although only the first 4KB of the page will be effectively
in use. This could be improved by sharing the page with multiple grants.

Signed-off-by: Julien Grall 
Acked-by: Roger Pau Monné 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

Improvement such as support 64KB grant is not taken into consideration in
this patch because we have the requirement to run a Linux using 64KB page
on a non-modified Xen.

Changes in v4:
- Rebase after d50babbe300eedf33ea5b00a12c5df3a05bd96c7 "
xen-blkfront: introduce blkfront_gather_backend_features()"
- Fix typoes
- Add Roger's acked-by

Changes in v3:
- Use DIV_ROUND_UP in INDIRECT_GREFS
- Split lines over 80 characters whenever it's possible
- s/mfn/gfn/ based on the new naming
- The grant callback doesn't allow anymore to change the len
(wasn't used here).
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- Use gnttab_count_grant to get the number of grants in a sg
- Do some renaming to use the correct variable every time

Changes in v2:
- Use gnttab_foreach_grant to split a Linux page into grant
---
 drivers/block/xen-blkfront.c | 324 ---
 1 file changed, 213 insertions(+), 111 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 4232cbd..f2cdc73 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -78,6 +78,7 @@ struct blk_shadow {
struct grant **grants_used;
struct grant **indirect_grants;
struct scatterlist *sg;
+   unsigned int num_sg;
 };
 
 struct split_bio {
@@ -107,8 +108,12 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, 
S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for 
the shared ring");
 
-#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * 
(info)->nr_ring_pages)
-#define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * 
XENBUS_MAX_RING_PAGES)
+#define BLK_RING_SIZE(info)\
+   __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * (info)->nr_ring_pages)
+
+#define BLK_MAX_RING_SIZE  \
+   __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
  * characters are enough. Define to 20 to keep consist with backend.
@@ -147,6 +152,7 @@ struct blkfront_info
unsigned int discard_granularity;
unsigned int discard_alignment;
unsigned int feature_persistent:1;
+   /* Number of 4KB segments handled */
unsigned int max_indirect_segments;
int is_ready;
struct blk_mq_tag_set tag_set;
@@ -175,10 +181,23 @@ static DEFINE_SPINLOCK(minor_lock);
 
 #define DEV_NAME   "xvd"   /* name in /dev */
 
-#define SEGS_PER_INDIRECT_FRAME \
-   (PAGE_SIZE/sizeof(struct blkif_request_segment))
-#define INDIRECT_GREFS(_segs) \
-   ((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+/*
+ * Grants are always the same size as a Xen page (i.e 4KB).
+ * A physical segment is always the same size as a Linux page.
+ * Number of grants per physical segment
+ */
+#define GRANTS_PER_PSEG(PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define GRANTS_PER_INDIRECT_FRAME \
+   (XEN_PAGE_SIZE / sizeof(struct blkif_request_segment))
+
+#define PSEGS_PER_INDIRECT_FRAME   \
+   (GRANTS_INDIRECT_FRAME / GRANTS_PSEGS)
+
+#define INDIRECT_GREFS(_grants)\
+   DIV_ROUND_UP(_grants, GRANTS_PER_INDIRECT_FRAME)
+
+#define GREFS(_psegs)  ((_psegs) * GRANTS_PER_PSEG)
 
 static int blkfront_setup_indirect(struct blkfront_info *info);
 static int blkfront_gather_backend_features(struct blkfront_info *info);
@@ -466,14 +485,100 @@ static int blkif_queue_discard_req(struct request *req)
return 0;
 }
 
+struct setu

[Xen-devel] [PATCH v4 19/20] xen/privcmd: Add support for Linux 64KB page granularity

2015-09-07 Thread Julien Grall
The hypercall interface (as well as the toolstack) is always using 4KB
page granularity. When the toolstack is asking for mapping a series of
guest PFN in a batch, it expects to have the page map contiguously in
its virtual memory.

When Linux is using 64KB page granularity, the privcmd driver will have
to map multiple Xen PFN in a single Linux page.

Note that this solution works on page granularity which is a multiple of
4KB.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

I kept the hypercall arguments in remap_data to avoid allocating them on
the stack every time that remap_pte_fn is called.
I will keep like that unless someone is strongly disagree.

Changes in v4:
- s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
- Add David's reviewed-by

Changes in v3:
- The function to split a Linux page in mutiple Xen page has
been moved internally. It was the only use (not used anymore in
the balloon) and it's not quite clear what should be the common
interface. Differ the question until someone need to use it.
- s/nr_pfn/numgfns/ to make clear that we are dealing with GFN
- Use DIV_ROUND_UP rather round_up and fix the usage in
xen_xlate_unmap_gfn_range

Changes in v2:
- Use xen_apply_to_page
---
 drivers/xen/privcmd.c   |   8 ++--
 drivers/xen/xlate_mmu.c | 124 
 2 files changed, 89 insertions(+), 43 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index c6deb87..c8798ee 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -446,7 +446,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
int version)
return -EINVAL;
}
 
-   nr_pages = m.num;
+   nr_pages = DIV_ROUND_UP(m.num, XEN_PFN_PER_PAGE);
if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
return -EINVAL;
 
@@ -494,7 +494,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
int version)
goto out_unlock;
}
if (xen_feature(XENFEAT_auto_translated_physmap)) {
-   ret = alloc_empty_pages(vma, m.num);
+   ret = alloc_empty_pages(vma, nr_pages);
if (ret < 0)
goto out_unlock;
} else
@@ -518,6 +518,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
int version)
state.global_error  = 0;
state.version   = version;
 
+   BUILD_BUG_ON(((PAGE_SIZE / sizeof(xen_pfn_t)) % XEN_PFN_PER_PAGE) != 0);
/* mmap_batch_fn guarantees ret == 0 */
BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
&pagelist, mmap_batch_fn, &state));
@@ -582,12 +583,13 @@ static void privcmd_close(struct vm_area_struct *vma)
 {
struct page **pages = vma->vm_private_data;
int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+   int numgfns = (vma->vm_end - vma->vm_start) >> XEN_PAGE_SHIFT;
int rc;
 
if (!xen_feature(XENFEAT_auto_translated_physmap) || !numpgs || !pages)
return;
 
-   rc = xen_unmap_domain_gfn_range(vma, numpgs, pages);
+   rc = xen_unmap_domain_gfn_range(vma, numgfns, pages);
if (rc == 0)
free_xenballooned_pages(numpgs, pages);
else
diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
index cff2387..5063c5e 100644
--- a/drivers/xen/xlate_mmu.c
+++ b/drivers/xen/xlate_mmu.c
@@ -38,31 +38,28 @@
 #include 
 #include 
 
-/* map fgfn of domid to lpfn in the current domain */
-static int map_foreign_page(unsigned long lpfn, unsigned long fgfn,
-   unsigned int domid)
-{
-   int rc;
-   struct xen_add_to_physmap_range xatp = {
-   .domid = DOMID_SELF,
-   .foreign_domid = domid,
-   .size = 1,
-   .space = XENMAPSPACE_gmfn_foreign,
-   };
-   xen_ulong_t idx = fgfn;
-   xen_pfn_t gpfn = lpfn;
-   int err = 0;
+typedef void (*xen_gfn_fn_t)(unsigned long gfn, void *data);
 
-   set_xen_guest_handle(xatp.idxs, &idx);
-   set_xen_guest_handle(xatp.gpfns, &gpfn);
-   set_xen_guest_handle(xatp.errs, &err);
+/* Break down the pages in 4KB chunk and call fn for each gfn */
+static void xen_for_each_gfn(struct page **pages, unsigned nr_gfn,
+xen_gfn_fn_t fn, void *data)
+{
+   unsigned long xen_pfn = 0;
+   struct page *page;
+   int i;
 
-   rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
-   return rc < 0 ? rc : err;
+   for (i = 0; i < nr_gfn; i++) {
+   if ((i % XEN_PFN_PER_PAGE) == 0) {
+   page = pages[i / XEN_PFN_PER_PAGE];
+   xen_pfn = page_to_xen_pfn(page);
+

[Xen-devel] [PATCH v4 13/20] xen/events: fifo: Make it running on 64KB granularity

2015-09-07 Thread Julien Grall
Only use the first 4KB of the page to store the events channel info. It
means that we will waste 60KB every time we allocate page for:
 * control block: a page is allocating per CPU
 * event array: a page is allocating everytime we need to expand it

I think we can reduce the memory waste for the 2 areas by:

* control block: sharing between multiple vCPUs. Although it will
require some bookkeeping in order to not free the page when the CPU
goes offline and the other CPUs sharing the page still there

* event array: always extend the array event by 64K (i.e 16 4K
chunk). That would require more care when we fail to expand the
event channel.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Note I haven't updated the suggestion to reduce the memory waste
after David's email [1]. I can do it if necessary.

Changes in v3:
- Add David and Stefano's reviewed-by

[1] http://lists.xen.org/archives/html/xen-devel/2015-07/msg04596.html
---
 drivers/xen/events/events_base.c | 2 +-
 drivers/xen/events/events_fifo.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c49bb7a..00dd923 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -40,11 +40,11 @@
 #include 
 #include 
 #include 
-#include 
 #endif
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 1d4baf5..e3e9e3d 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -54,7 +54,7 @@
 
 #include "events_internal.h"
 
-#define EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t))
+#define EVENT_WORDS_PER_PAGE (XEN_PAGE_SIZE / sizeof(event_word_t))
 #define MAX_EVENT_ARRAY_PAGES (EVTCHN_FIFO_NR_CHANNELS / EVENT_WORDS_PER_PAGE)
 
 struct evtchn_fifo_queue {
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 14/20] xen/grant-table: Make it running on 64KB granularity

2015-09-07 Thread Julien Grall
The Xen interface is using 4KB page granularity. This means that each
grant is 4KB.

The current implementation allocates a Linux page per grant. On Linux
using 64KB page granularity, only the first 4KB of the page will be
used.

We could decrease the memory wasted by sharing the page with multiple
grant. It will require some care with the {Set,Clear}ForeignPage macro.

Note that no changes has been made in the x86 code because both Linux
and Xen will only use 4KB page granularity.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 
Reviewed-by: Stefano Stabellini 

---
Cc: Stefano Stabellini 
Cc: Russell King 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 

Changes in v3:
- Add Stefano's reviewed-by

Changes in v2
- Add David's reviewed-by
---
 arch/arm/xen/p2m.c| 6 +++---
 drivers/xen/grant-table.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/xen/p2m.c b/arch/arm/xen/p2m.c
index 887596c..0ed01f2 100644
--- a/arch/arm/xen/p2m.c
+++ b/arch/arm/xen/p2m.c
@@ -93,8 +93,8 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref 
*map_ops,
for (i = 0; i < count; i++) {
if (map_ops[i].status)
continue;
-   set_phys_to_machine(map_ops[i].host_addr >> PAGE_SHIFT,
-   map_ops[i].dev_bus_addr >> PAGE_SHIFT);
+   set_phys_to_machine(map_ops[i].host_addr >> XEN_PAGE_SHIFT,
+   map_ops[i].dev_bus_addr >> XEN_PAGE_SHIFT);
}
 
return 0;
@@ -108,7 +108,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref 
*unmap_ops,
int i;
 
for (i = 0; i < count; i++) {
-   set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
+   set_phys_to_machine(unmap_ops[i].host_addr >> XEN_PAGE_SHIFT,
INVALID_P2M_ENTRY);
}
 
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 7b4e1cf..99ed9c2 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -642,7 +642,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
if (xen_auto_xlat_grant_frames.count)
return -EINVAL;
 
-   vaddr = xen_remap(addr, PAGE_SIZE * max_nr_gframes);
+   vaddr = xen_remap(addr, XEN_PAGE_SIZE * max_nr_gframes);
if (vaddr == NULL) {
pr_warn("Failed to ioremap gnttab share frames (addr=%pa)!\n",
&addr);
@@ -654,7 +654,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
return -ENOMEM;
}
for (i = 0; i < max_nr_gframes; i++)
-   pfn[i] = PFN_DOWN(addr) + i;
+   pfn[i] = XEN_PFN_DOWN(addr) + i;
 
xen_auto_xlat_grant_frames.vaddr = vaddr;
xen_auto_xlat_grant_frames.pfn = pfn;
@@ -1004,7 +1004,7 @@ static void gnttab_request_version(void)
 {
/* Only version 1 is used, which will always be available. */
grant_table_version = 1;
-   grefs_per_grant_frame = PAGE_SIZE / sizeof(struct grant_entry_v1);
+   grefs_per_grant_frame = XEN_PAGE_SIZE / sizeof(struct grant_entry_v1);
gnttab_interface = &gnttab_v1_ops;
 
pr_info("Grant tables using version %d layout\n", grant_table_version);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 11/20] tty/hvc: xen: Use xen page definition

2015-09-07 Thread Julien Grall
The console ring is always based on the page granularity of Xen.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: David Vrabel 
Cc: Boris Ostrovsky 
Cc: linuxppc-...@lists.ozlabs.org

Changes in v4:
- The ring is always 4K (i.e XEN_PAGE_SIZE), so no need to
map with PAGE_SIZE. This was correctly done in v2 but lost with
the rebase to the "s/mfn/gfn/" series

Changes in v3:
- Some changes has been moved in the series "Use correctly the
Xen memory terminologies in Linux".
- Add Stefano's reviewed-by
---
 drivers/tty/hvc/hvc_xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 10beb15..fa816b7 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -230,7 +230,7 @@ static int xen_hvm_console_init(void)
if (r < 0 || v == 0)
goto err;
gfn = v;
-   info->intf = xen_remap(gfn << PAGE_SHIFT, PAGE_SIZE);
+   info->intf = xen_remap(gfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE);
if (info->intf == NULL)
goto err;
info->vtermno = HVC_COOKIE;
@@ -472,7 +472,7 @@ static int xencons_resume(struct xenbus_device *dev)
struct xencons_info *info = dev_get_drvdata(&dev->dev);
 
xencons_disconnect_backend(info);
-   memset(info->intf, 0, PAGE_SIZE);
+   memset(info->intf, 0, XEN_PAGE_SIZE);
return xencons_connect_backend(dev, info);
 }
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 20/20] arm/xen: Add support for 64KB page granularity

2015-09-07 Thread Julien Grall
The hypercall interface is always using 4KB page granularity. This is
requiring to use xen page definition macro when we deal with hypercall.

Note that pfn_to_gfn is working with a Xen pfn (i.e 4KB). We may want to
rename pfn_gfn to make this explicit.

We also allocate a 64KB page for the shared page even though only the
first 4KB is used. I don't think this is really important for now as it
helps to have the pointer 4KB aligned (XENMEM_add_to_physmap is taking a
Xen PFN).

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Cc: Russell King 

Stefano, I've dropped your reviewed-by given I've updated the doc and do
changes to avoid usage of XEN_PAGE_SHIFT

Changes in v4:
- Add Stefano's Reviewed-by

Changes in v3:
- s/MFN/GFN/ base on the new naming
- Use virt_to_gfn to avoid use XEN_PAGE_SHIFT
- Drop Stefano's reviewed-by
- Add some docs in arch/arm/asm/xen/page.h

Changes in v2
- Add Stefano's reviewed-by
---
 arch/arm/include/asm/xen/page.h | 15 +--
 arch/arm/xen/enlighten.c|  6 +++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 98c9fc3..e3d94cf 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -28,6 +28,17 @@ typedef struct xpaddr {
 
 #define INVALID_P2M_ENTRY  (~0UL)
 
+/*
+ * The pseudo-physical frame (pfn) used in all the helpers is always based
+ * on Xen page granularity (i.e 4KB).
+ *
+ * A Linux page may be split across multiple non-contiguous Xen page so we
+ * have to keep track with frame based on 4KB page granularity.
+ *
+ * PV drivers should never make a direct usage of those helpers (particularly
+ * pfn_to_gfn and gfn_to_pfn).
+ */
+
 unsigned long __pfn_to_mfn(unsigned long pfn);
 extern struct rb_root phys_to_mach;
 
@@ -64,8 +75,8 @@ static inline unsigned long bfn_to_pfn(unsigned long bfn)
 #define bfn_to_local_pfn(bfn)  bfn_to_pfn(bfn)
 
 /* VIRT <-> GUEST conversion */
-#define virt_to_gfn(v) (pfn_to_gfn(virt_to_pfn(v)))
-#define gfn_to_virt(m) (__va(gfn_to_pfn(m) << PAGE_SHIFT))
+#define virt_to_gfn(v) (pfn_to_gfn(virt_to_phys(v) >> XEN_PAGE_SHIFT))
+#define gfn_to_virt(m) (__va(gfn_to_pfn(m) << XEN_PAGE_SHIFT))
 
 /* Only used in PV code. But ARM guests are always HVM. */
 static inline xmaddr_t arbitrary_virt_to_machine(void *vaddr)
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index eeeab07..50b4769 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -89,8 +89,8 @@ static void xen_percpu_init(void)
pr_info("Xen: initializing cpu%d\n", cpu);
vcpup = per_cpu_ptr(xen_vcpu_info, cpu);
 
-   info.mfn = __pa(vcpup) >> PAGE_SHIFT;
-   info.offset = offset_in_page(vcpup);
+   info.mfn = virt_to_gfn(vcpup);
+   info.offset = xen_offset_in_page(vcpup);
 
err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
BUG_ON(err);
@@ -213,7 +213,7 @@ static int __init xen_guest_init(void)
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
-   xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+   xatp.gpfn = virt_to_gfn(shared_info_page);
if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
BUG();
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 17/20] net/xen-netfront: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV network protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity using network
device on a non-modified Xen.

It's only necessary to adapt the ring size and break skb data in small
chunk of 4KB. The rest of the code is relying on the grant table code.

Note that we allocate a Linux page for each rx skb but only the first
4KB is used. We may improve the memory usage by extending the size of
the rx skb.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: net...@vger.kernel.org

Improvement such as support of 64KB grant is not taken into
consideration in this patch because we have the requirement to run a Linux
using 64KB pages on a non-modified Xen.

Tested with workload such as ping, ssh, wget, git... I would happy if
someone give details how to test all the path.

Changes in v4:
- s/gnttab_one_grant/gnttab_for_one_grant/ based on the new naming
- Add David's reviewed-by

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ base on the new naming
- xennet_tx_setup_grant was calling itself resulting an
guest stall when using iperf.
- The grant callback doesn't allow anymore to change the len
(wasn't used here)
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- gnttab_page_grant_foreign_ref has been renamed to
gnttab_foreach_grant_foreign_ref_one

Changes in v2:
- Use gnttab_foreach_grant to split a Linux page in grant
- Fix count slots
---
 drivers/net/xen-netfront.c | 122 -
 1 file changed, 86 insertions(+), 36 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 47f791e..17b1013 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -74,8 +74,8 @@ struct netfront_cb {
 
 #define GRANT_INVALID_REF  0
 
-#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 /* Minimum number of Rx slots (includes slot for GSO metadata). */
 #define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1)
@@ -291,7 +291,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue 
*queue)
struct sk_buff *skb;
unsigned short id;
grant_ref_t ref;
-   unsigned long gfn;
+   struct page *page;
struct xen_netif_rx_request *req;
 
skb = xennet_alloc_one_rx_buffer(queue);
@@ -307,14 +307,13 @@ static void xennet_alloc_rx_buffers(struct netfront_queue 
*queue)
BUG_ON((signed short)ref < 0);
queue->grant_rx_ref[id] = ref;
 
-   gfn = 
xen_page_to_gfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+   page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
 
req = RING_GET_REQUEST(&queue->rx, req_prod);
-   gnttab_grant_foreign_access_ref(ref,
-   queue->info->xbdev->otherend_id,
-   gfn,
-   0);
-
+   gnttab_page_grant_foreign_access_ref_one(ref,
+
queue->info->xbdev->otherend_id,
+page,
+0);
req->id = id;
req->gref = ref;
}
@@ -415,25 +414,33 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
xennet_maybe_wake_tx(queue);
 }
 
-static struct xen_netif_tx_request *xennet_make_one_txreq(
-   struct netfront_queue *queue, struct sk_buff *skb,
-   struct page *page, unsigned int offset, unsigned int len)
+struct xennet_gnttab_make_txreq {
+   struct netfront_queue *queue;
+   struct sk_buff *skb;
+   struct page *page;
+   struct xen_netif_tx_request *tx; /* Last request */
+   unsigned int size;
+};
+
+static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
+ unsigned int len, void *data)
 {
+   struct xennet_gnttab_make_txreq *info = data;
unsigned int id;
struct xen_netif_tx_request *tx;
grant_ref_t ref;
-
-   len = min_t(unsigned int, PAGE_SIZE - offset, len);
+   /* convenient aliases */
+   struct page *page = info->page;
+   struct netfront_queue *queue = info->queue;
+   struct sk_buff *skb = info->skb;
 
id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
ref = gnttab_claim_

[Xen-devel] [PATCH v4 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux

2015-09-07 Thread Julien Grall
For ARM64 guests, Linux is able to support either 64K or 4K page
granularity. Although, the hypercall interface is always based on 4K
page granularity.

With 64K page granularity, a single page will be spread over multiple
Xen frame.

To avoid splitting the page into 4K frame, take advantage of the
extent_order field to directly allocate/free chunk of the Linux page
size.

Note that PVMMU is only used for PV guest (which is x86) and the page
granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
that because the code has not been modified.

Signed-off-by: Julien Grall 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
Cc: Wei Liu 

Note that two BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE) in code built
for the PV MMU code is kept in order to have at least one even if we
ever decide to drop of code section.

Changes in v4:
- s/xen_page_to_pfn/page_to_xen_pfn/ based on the new naming
- Use the field lru in the page to get a list of pages when
decreasing the memory reservation. It avoids to use a static
array to store the pages (see v3).
- Update comment for EXTENT_ORDER.

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- Rather than splitting the page into 4KB chunk, use the
extent_order field to allocate directly a Linux page size. This
is avoid lots of code for no benefits.

Changes in v2:
- Use xen_apply_to_page to split a page in 4K chunk
- It's not necessary to have a smaller frame list. Re-use
PAGE_SIZE
- Convert reserve_additional_memory to use XEN_... macro
---
 drivers/xen/balloon.c | 59 ++-
 1 file changed, 44 insertions(+), 15 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index c79329f..3babf13 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -70,6 +70,11 @@
 #include 
 #include 
 
+/* Use one extent per PAGE_SIZE to avoid to break down the page into
+ * multiple frame.
+ */
+#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
+
 /*
  * balloon_process() state:
  *
@@ -230,6 +235,11 @@ static enum bp_state reserve_additional_memory(long credit)
nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
 
 #ifdef CONFIG_XEN_HAVE_PVMMU
+   /* We don't support PV MMU when Linux and Xen is using
+* different page granularity.
+*/
+   BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
 /*
  * add_memory() will build page tables for the new memory so
  * the p2m must contain invalid entries so the correct
@@ -326,11 +336,11 @@ static enum bp_state reserve_additional_memory(long 
credit)
 static enum bp_state increase_reservation(unsigned long nr_pages)
 {
int rc;
-   unsigned long  pfn, i;
+   unsigned long i;
struct page   *page;
struct xen_memory_reservation reservation = {
.address_bits = 0,
-   .extent_order = 0,
+   .extent_order = EXTENT_ORDER,
.domid= DOMID_SELF
};
 
@@ -352,7 +362,11 @@ static enum bp_state increase_reservation(unsigned long 
nr_pages)
nr_pages = i;
break;
}
-   frame_list[i] = page_to_pfn(page);
+
+   /* XENMEM_populate_physmap requires a PFN based on Xen
+* granularity.
+*/
+   frame_list[i] = page_to_xen_pfn(page);
page = balloon_next_page(page);
}
 
@@ -366,10 +380,15 @@ static enum bp_state increase_reservation(unsigned long 
nr_pages)
page = balloon_retrieve(false);
BUG_ON(page == NULL);
 
-   pfn = page_to_pfn(page);
-
 #ifdef CONFIG_XEN_HAVE_PVMMU
+   /* We don't support PV MMU when Linux and Xen is using
+* different page granularity.
+*/
+   BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   unsigned long pfn = page_to_pfn(page);
+
set_phys_to_machine(pfn, frame_list[i]);
 
/* Link back into the page tables if not highmem. */
@@ -396,14 +415,15 @@ static enum bp_state increase_reservation(unsigned long 
nr_pages)
 static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 {
enum bp_state state = BP_DONE;
-   unsigned long  pfn, i;
-   struct page   *page;
+   unsigned long i;
+   struct page *page, *tmp;
int ret;
struct xen_memory_reservation reservation = {
.address_bits = 0,
-   .extent_order = 0,
+   .extent_order = EXTENT_ORDER,
.domid= DOMID_SELF
};
+   LIST_HEAD(pages);
 
 #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
 

[Xen-devel] [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall
The PV network protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity working as a
network backend on a non-modified Xen.

It's only necessary to adapt the ring size and break skb data in small
chunk of 4KB. The rest of the code is relying on the grant table code.

Signed-off-by: Julien Grall 

---
Cc: Ian Campbell 
Cc: Wei Liu 
Cc: net...@vger.kernel.org

Improvement such as support of 64KB grant is not taken into
consideration in this patch because we have the requirement to run a
Linux using 64KB pages on a non-modified Xen.

Note that I haven't add a comment why the offset is 0 after the first
iteration. See [1] for more details.

[1] https://lkml.org/lkml/2015/8/10/456

Changes in v4:
- Add a comment to explain how we compute MAX_XEN_SKB_FRAGS

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- gnttab_foreach_grant has been renamed to gnttab_forach_grant_in_range
- The grant callback doesn't allow anymore to use less data. An
helpers has been added in netback to handle this.

Changes in v2:
- Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX
- Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce
fragment into a new skb
- Use gnntab_foreach_grant to split a Linux page into grant
---
 drivers/net/xen-netback/common.h  |  18 +++--
 drivers/net/xen-netback/netback.c | 153 --
 2 files changed, 110 insertions(+), 61 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..24cb365 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 typedef unsigned int pending_ring_idx_t;
@@ -64,8 +65,8 @@ struct pending_tx_info {
struct ubuf_info callback_struct;
 };
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 struct xenvif_rx_meta {
int id;
@@ -80,16 +81,21 @@ struct xenvif_rx_meta {
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0x
 
-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
 
 #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
 
+/* The maximum number of frags is derived from the size of a grant (same
+ * as a Xen page size for now).
+ */
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
 /* It's possible for an skb to have a maximal number of frags
  * but still be less than MAX_BUFFER_OFFSET in size. Thus the
- * worst-case number of copy operations is MAX_SKB_FRAGS per
+ * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
  * ring slot.
  */
-#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
+#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
 #define NETBACK_INVALID_HANDLE -1
 
@@ -203,7 +209,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 /* Maximum number of Rx slots a to-guest packet may use, including the
  * slot needed for GSO meta-data.
  */
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
+#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
 
 enum state_bit_shift {
/* This bit marks that the vif is connected */
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index d4c1bc7..b1649aa 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct 
xenvif_queue *queue,
return meta;
 }
 
+struct gop_frag_copy {
+   struct xenvif_queue *queue;
+   struct netrx_pending_operations *npo;
+   struct xenvif_rx_meta *meta;
+   int head;
+   int gso_type;
+
+   struct page *page;
+};
+
+static void xenvif_setup_copy_gop(unsigned long gfn,
+ unsigned int offset,
+ unsigned int *len,
+ struct gop_frag_copy *info)
+{
+   struct gnttab_copy *copy_gop;
+   struct xen_page_foreign *foreign;
+   /* Convenient aliases */
+   struct xenvif_queue *queue = info->queue;
+   struct netrx_pending_operations *npo = info->npo;
+   struct page *page = info->page;
+
+   BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
+
+   if (npo->copy_off == MAX_BUFFER_OFFSET)
+   info->meta = get_next_rx_buffer(queue, npo);
+
+   if (npo->copy_off + *len > MAX_BUFFER_OFFSET)
+   *len = MAX_BUFFER_OFFSET - npo->copy_off;
+
+   copy_gop = npo->copy + npo->copy_prod++;
+   copy_gop->flags = GNTCOPY_de

  1   2   >