Re: [Xen-devel] Xen Project and Xvisor.

2016-12-09 Thread Jason Long
Please look at
"http://xhypervisor.org/pdf/Embedded_Hypervisor_Xvisor_A_comparative_analysis.pdf".



On Friday, December 9, 2016 11:30 AM, Konrad Rzeszutek Wilk wrote:
On Fri, Dec 09, 2016 at 06:50:03PM +, Jason Long wrote:

> Hello.
> I would like to hear Xen developers' ideas and concerns about the "Xvisor"
> hypervisor. Any experiences or comparisons?

Um (https://github.com/xvisor/xvisor/blob/master/HOSTS), this:

M:x86_64 Generic
A:x86 64-bit
C:x86_64
V:Intel (http://www.intel.com)
E:QEMU (http://qemu.org/)
G:Work-In-Progress.
S:Work-In-Progress.
D:docs/x86/x86_64_generic.txt

As it looks it is geared towards ARM (and only runs on ARM).

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel



Re: [Xen-devel] Xen Project and Xvisor.

2016-12-09 Thread Jason Long
I see "http://www.xvisor-x86.org/wiki/Main_Page", and they want to move it to x86.



On Friday, December 9, 2016 11:30 AM, Konrad Rzeszutek Wilk wrote:
On Fri, Dec 09, 2016 at 06:50:03PM +, Jason Long wrote:

> Hello.
> I would like to hear Xen developers' ideas and concerns about the "Xvisor"
> hypervisor. Any experiences or comparisons?

Um (https://github.com/xvisor/xvisor/blob/master/HOSTS), this:

M:x86_64 Generic
A:x86 64-bit
C:x86_64
V:Intel (http://www.intel.com)
E:QEMU (http://qemu.org/)
G:Work-In-Progress.
S:Work-In-Progress.
D:docs/x86/x86_64_generic.txt

As it looks it is geared towards ARM (and only runs on ARM).



[Xen-devel] [xen-4.6-testing baseline-only test] 68188: trouble: blocked/broken

2016-12-09 Thread Platform Team regression test user
This run is configured for baseline tests only.

flight 68188 xen-4.6-testing real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/68188/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-pvops   2 hosts-allocate  broken REGR. vs. 68153
 build-amd64-pvops   2 hosts-allocate  broken REGR. vs. 68153
 build-i386          2 hosts-allocate  broken REGR. vs. 68153
 build-amd64         2 hosts-allocate  broken REGR. vs. 68153
 build-amd64-xsm     2 hosts-allocate  broken REGR. vs. 68153
 build-amd64-xtf     2 hosts-allocate  broken REGR. vs. 68153
 build-i386-xsm      2 hosts-allocate  broken REGR. vs. 68153
 build-amd64-prev    2 hosts-allocate  broken REGR. vs. 68153
 build-i386-pvops    2 hosts-allocate  broken REGR. vs. 68153
 build-armhf-xsm     2 hosts-allocate  broken REGR. vs. 68153
 build-i386-prev     2 hosts-allocate  broken REGR. vs. 68153
 build-armhf         2 hosts-allocate  broken REGR. vs. 68153

Regressions which are regarded as allowable (not blocking):
 build-amd64         3 capture-logs    broken blocked in 68153
 build-amd64-prev    3 capture-logs    broken blocked in 68153
 build-amd64-pvops   3 capture-logs    broken blocked in 68153
 build-amd64-xsm     3 capture-logs    broken blocked in 68153
 build-amd64-xtf     3 capture-logs    broken blocked in 68153
 build-i386-xsm      3 capture-logs    broken blocked in 68153
 build-i386-prev     3 capture-logs    broken blocked in 68153
 build-i386-pvops    3 capture-logs    broken blocked in 68153
 build-i386          3 capture-logs    broken blocked in 68153

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64             1 build-check(1)  blocked  n/a
 test-amd64-i386-freebsd10-i386                        1 build-check(1)  blocked  n/a
 test-amd64-amd64-qemuu-nested-intel                   1 build-check(1)  blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm     1 build-check(1)  blocked  n/a
 test-xtf-amd64-amd64-1                                1 build-check(1)  blocked  n/a
 test-armhf-armhf-xl-midway                            1 build-check(1)  blocked  n/a
 test-armhf-armhf-libvirt                              1 build-check(1)  blocked  n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm          1 build-check(1)  blocked  n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm          1 build-check(1)  blocked  n/a
 test-amd64-amd64-migrupgrade                          1 build-check(1)  blocked  n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1)  blocked  n/a
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1              1 build-check(1)  blocked  n/a
 test-armhf-armhf-libvirt-raw                          1 build-check(1)  blocked  n/a
 test-amd64-i386-libvirt-xsm                           1 build-check(1)  blocked  n/a
 test-amd64-amd64-xl-multivcpu                         1 build-check(1)  blocked  n/a
 test-amd64-i386-xl-qemuu-winxpsp3                     1 build-check(1)  blocked  n/a
 test-amd64-amd64-libvirt                              1 build-check(1)  blocked  n/a
 test-amd64-i386-xl-qemut-winxpsp3                     1 build-check(1)  blocked  n/a
 test-amd64-amd64-xl-pvh-amd                           1 build-check(1)  blocked  n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64              1 build-check(1)  blocked  n/a
 test-amd64-i386-qemut-rhel6hvm-intel                  1 build-check(1)  blocked  n/a
 test-amd64-i386-freebsd10-amd64                       1 build-check(1)  blocked  n/a
 test-amd64-amd64-pair                                 1 build-check(1)  blocked  n/a
 test-armhf-armhf-xl-credit2                           1 build-check(1)  blocked  n/a
 build-i386-rumprun                                    1 build-check(1)  blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64                  1 build-check(1)  blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm    1 build-check(1)  blocked  n/a
 test-amd64-amd64-pygrub                               1 build-check(1)  blocked  n/a
 test-amd64-amd64-xl-qemuu-winxpsp3                    1 build-check(1)  blocked  n/a
 test-amd64-amd64-xl-qcow2                             1 build-check(1)  blocked  n/a
 test-amd64-amd64-amd64-pvgrub                         1 build-check(1)  blocked  n/a
 test-xtf-amd64-amd64-2                                1 build-check(1)  blocked  n/a
 test-amd64-amd64-xl-qemut-win7-amd64                  1 build-check(1)  blocked  n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64              1 build-check(1)  blocked  n/a
 test-armhf-armhf-libvirt-xsm                          1 build-check(1)  blocked  n/a
 build-amd64-rumprun                                   1 build-check(1)  blocked  n/a
 test-amd64-i386-xl                                    1 build-check(1)  blocked  n/a
 build-i386-libvirt                                    1 build-check(1)  b

Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-09 Thread Boris Ostrovsky



On 12/09/2016 06:00 PM, Thomas Gleixner wrote:
> On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
>> On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
>>> On Thu, 8 Dec 2016, Thomas Gleixner wrote:
>>>
>>> Boris, can you please verify if that makes the
>>> topology_update_package_map() call which you placed into the Xen cpu
>>> starting code obsolete ?
>>
>> Will do. I did test your patch but without removing
>> topology_update_package_map() call. It complained about package IDs
>> being wrong, but that's expected until I fix Xen part.
>
> That should no longer be the case as I changed the approach to that
> management thing.



I didn't notice this email before I sent the earlier message.

Is there anything else besides this patch that I should use? I applied 
it to Linus tree and it didn't apply cleanly (there was some fuzz and 
such) so I wonder whether I am missing something.


-boris



Re: [Xen-devel] [PATCH] x86/smpboot: Make logical package management more robust

2016-12-09 Thread Boris Ostrovsky



On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
>> On Thu, 8 Dec 2016, Thomas Gleixner wrote:
>>
>> Boris, can you please verify if that makes the
>> topology_update_package_map() call which you placed into the Xen cpu
>> starting code obsolete ?
>
> Will do. I did test your patch but without removing
> topology_update_package_map() call. It complained about package IDs
> being wrong, but that's expected until I fix Xen part.


Ignore my statement about earlier testing --- it was all on single-node 
machines.


Something is broken with multi-node on Intel, but failure modes are 
different. Prior to this patch build_sched_domain() reports an error and 
pretty soon we crash in scheduler (don't remember off the top of my 
head). With the patch applied I crash much later, when one of the drivers 
does kmalloc_node(.., cpu_to_node(cpu)) and cpu_to_node() returns 1, 
which should never happen ("x86: Booted up 1 node, 32 CPUs" is reported, 
for example).


2-node AMD box doesn't have these problems.

I haven't upgraded the Intel machine for about a month but this all must 
have happened in 4.9 timeframe.


So I can't answer your question since we clearly have other problems on 
Xen. I will be looking into this.


-boris



[Xen-devel] [PATCH] remove dead code from arm/decode.c

2016-12-09 Thread Stefano Stabellini
The rt variable can only be 0 or 7, no need to check if it's 15.

Coverity-ID: 1381835

Signed-off-by: Stefano Stabellini 

diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
index c6f49a5..decd9dd 100644
--- a/xen/arch/arm/decode.c
+++ b/xen/arch/arm/decode.c
@@ -58,11 +58,6 @@ static int decode_thumb2(register_t pc, struct hsr_dabt *dabt, uint16_t hw1)
 /* Undefined opcodes */
 goto bad_thumb2;
 
-/* Store/Load single data item */
-if ( rt == 15 )
-/* XXX: Rt == 15 is only invalid for store instruction */
-goto bad_thumb2;
-
 if ( !load && sign )
 /* Store instruction doesn't support sign extension */
 goto bad_thumb2;
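
The reasoning in the commit message can be checked with a small sketch. It assumes rt is extracted with a 3-bit mask, which is what the "no need to check if it's 15" statement implies; the helper name extract_rt, the shift, and the mask are illustrative assumptions rather than a quote from decode.c.

```c
#include <stdint.h>

/*
 * Hypothetical helper: extract a register number using a 3-bit mask.
 * The shift (12) and mask (0x7) are assumptions made for illustration.
 */
static uint16_t extract_rt(uint16_t hw2)
{
    return (hw2 >> 12) & 0x7;
}

/* Returns 1 if any 16-bit halfword can yield rt == 15, otherwise 0. */
static int rt_can_be_15(void)
{
    uint32_t hw2;

    /* Exhaustively try every possible halfword value. */
    for ( hw2 = 0; hw2 <= 0xffff; hw2++ )
        if ( extract_rt((uint16_t)hw2) == 15 )
            return 1;

    return 0;
}
```

Under that assumption the masked value never exceeds 7, so a comparison against 15 can never be true, which is the kind of dead code Coverity reports.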



Re: [Xen-devel] [PATCH v2] fix potential int overflow in efi/boot

2016-12-09 Thread Stefano Stabellini
Forgot to CC Jan again.

On Fri, 9 Dec 2016, Stefano Stabellini wrote:
> HorizontalResolution and VerticalResolution are 32bit, while size is
> 64bit. As it stands multiplications are evaluated with 32bit arithmetic,
> which could overflow. Cast HorizontalResolution to 64bit to avoid that.
> 
> Coverity-ID: 1381858
> 
> Signed-off-by: Stefano Stabellini 
> 
> ---
> Changes in v2:
> - remove stray space
> - fix other multiplication
> 
> diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
> index 56544dc..3e5e4ab 100644
> --- a/xen/common/efi/boot.c
> +++ b/xen/common/efi/boot.c
> @@ -684,10 +684,10 @@ static UINTN __init efi_find_gop_mode(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
>  break;
>  }
>  if ( !cols && !rows &&
> - mode_info->HorizontalResolution *
> + (UINTN)mode_info->HorizontalResolution *
>   mode_info->VerticalResolution > size )
>  {
> -size = mode_info->HorizontalResolution *
> +size = (UINTN)mode_info->HorizontalResolution *
> mode_info->VerticalResolution;
>  gop_mode = i;
>  }
> 
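
The overflow described in the patch can be reproduced outside of Xen with a minimal sketch. The struct below is a hypothetical stand-in for the two 32-bit resolution fields, and uint64_t stands in for a 64-bit UINTN; neither is taken from the EFI headers.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit fields of the GOP mode info. */
struct mode_info {
    uint32_t HorizontalResolution;
    uint32_t VerticalResolution;
};

/* Product evaluated in 32-bit arithmetic: wraps modulo 2^32, then widens. */
static uint64_t area_32bit(const struct mode_info *m)
{
    return m->HorizontalResolution * m->VerticalResolution;
}

/* One operand cast first: the multiplication itself is done in 64 bits. */
static uint64_t area_64bit(const struct mode_info *m)
{
    return (uint64_t)m->HorizontalResolution * m->VerticalResolution;
}
```

With both fields set to 65536, area_32bit() yields 0 while area_64bit() yields 4294967296. Assigning the result to a 64-bit variable does not help on its own, because the widening happens only after the 32-bit multiplication has already wrapped.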



Re: [Xen-devel] Question about porting IPMMU-VMSA Linux driver to XEN

2016-12-09 Thread Stefano Stabellini
On Thu, 8 Dec 2016, Oleksandr Tyshchenko wrote:
> On Thu, Dec 8, 2016 at 9:39 PM, Julien Grall  wrote:
> >
> >
> > On 08/12/16 17:06, Oleksandr Tyshchenko wrote:
> >>
> >> Hi Julien,
> >
> >
> > Hi Oleksandr,
> Hi Julien,
> 
> thank you for sharing your opinion.
> 
> >
> > As discussed on IRC, I CCed xen-devel and Stefano.
> >
> >> We would like to hear your opinion about the proper way of porting
> >> kernel driver to XEN.
> >> There is a Linux iommu driver "IPMMU VMSA" for supporting
> >> VMSA-compatible IPMMUs that are integrated in the newest Renesas SoCs.
> >> Mainline:
> >> http://lxr.free-electrons.com/source/drivers/iommu/ipmmu-vmsa.c
> >> But we would prefer to rely on code that hasn't reached upstream yet but
> >> is shipped with the BSP for this platform and seems to be more complete:
> >>
> >> https://git.kernel.org/cgit/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.6/rcar-3.3.x
> >>
> >> For passthrough and future Remote processor (coproc) use-cases to work
> >> on R-Car Gen3 based platforms we need this driver to be ported to XEN.
> >> But I am in doubt how to do this in a right way.
> >>
> >> So, from my point of view there are two possible ways:
> >> 1. Try to keep this driver as close as possible to Linux, like you
> >> did for the arm-smmu driver, even keeping the Linux style. I understand
> >> the main goal despite the overhead and so on.
> >> 2. Another way is to try to be as similar as possible to the arm-smmu
> >> driver in XEN. In that case many things common to both drivers might
> >> be moved to a common part.
> >
> >
> > I don't think you would be able to share a lot of code between arm-smmu and
> > ipmmu-vmsa. The former is using stage-2 page table whilst ipmmu-vmsa is
> > using stage-1 page table. So the page table would have to be unshared in
> > your case. This is something we don't yet support on Xen ARM.
> >
> > Furthermore, I still want to keep arm-smmu as close as possible to Linux
> > mainline. When I first implemented the driver I chose not to stick to Linux,
> > but we decided to re-sync a few months later because it was too hard to
> > maintain. One of the main advantages of keeping close to Linux is that we can
> > backport bug fixes more easily.
> I agree.
> This is especially important when the driver is "new".
> I mean when the driver is intensively developed. This leads to a bunch of
> fixes and features that should be taken. So, we will likely move in this
> direction too.
> 
> >
> > I would prefer to see ipmmu-vmsa as close as possible to Linux (BSP or
> > mainline), but this is not a strong requirement.
> I got it.
> 
> > Do you know why the changes you are
> > looking for are not yet upstreamed? What are the differences?
> At the moment I don't know why these changes haven't been upstreamed yet.
> They look like features and bug fixes in most cases (multi-context
> support, etc.).
> So, I think it would be better to rely on the BSP at first and then
> re-sync with mainline when these changes reach upstream.

I am not completely against importing a non-upstream driver, but be
careful because not being upstream means that it hasn't been reviewed as
much as the upstream counterpart. It might have bugs. Once it reaches
upstream, it might be different and hard to sync.



[Xen-devel] [xen-4.8-testing test] 103103: regressions - FAIL

2016-12-09 Thread osstest service owner
flight 103103 xen-4.8-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/103103/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-debianhvm-amd64 14 guest-saverestore.2       fail REGR. vs. 103036
 test-armhf-armhf-xl-credit2               15 guest-start/debian.repeat fail REGR. vs. 103036

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds             15 guest-start/debian.repeat fail REGR. vs. 102998
 test-amd64-amd64-xl-rtds              9 debian-install            fail REGR. vs. 103036
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop                fail like 103036

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt                           12 migrate-support-check     fail  never pass
 test-amd64-amd64-xl-pvh-intel                      11 guest-start               fail  never pass
 test-amd64-amd64-libvirt-xsm                       12 migrate-support-check     fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check     fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  10 migrate-support-check     fail  never pass
 test-amd64-i386-libvirt-xsm                        12 migrate-support-check     fail  never pass
 test-amd64-i386-libvirt                            12 migrate-support-check     fail  never pass
 test-amd64-amd64-libvirt-vhd                       11 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-xsm                            12 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-xsm                            13 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-credit2                        12 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-credit2                        13 saverestore-support-check fail  never pass
 test-amd64-amd64-xl-pvh-amd                        11 guest-start               fail  never pass
 test-armhf-armhf-libvirt                           12 migrate-support-check     fail  never pass
 test-armhf-armhf-libvirt                           13 saverestore-support-check fail  never pass
 test-armhf-armhf-xl                                12 migrate-support-check     fail  never pass
 test-armhf-armhf-xl                                13 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu                      12 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-multivcpu                      13 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-cubietruck                     12 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-cubietruck                     13 saverestore-support-check fail  never pass
 test-armhf-armhf-libvirt-xsm                       12 migrate-support-check     fail  never pass
 test-armhf-armhf-libvirt-xsm                       13 saverestore-support-check fail  never pass
 test-armhf-armhf-libvirt-qcow2                     11 migrate-support-check     fail  never pass
 test-armhf-armhf-libvirt-qcow2                     12 saverestore-support-check fail  never pass
 test-armhf-armhf-libvirt-raw                       11 migrate-support-check     fail  never pass
 test-armhf-armhf-libvirt-raw                       12 saverestore-support-check fail  never pass
 test-amd64-amd64-qemuu-nested-amd                  16 debian-hvm-install/l1/l2  fail  never pass
 test-armhf-armhf-xl-vhd                            11 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-vhd                            12 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-arndale                        12 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-arndale                        13 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-rtds                           12 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-rtds                           13 saverestore-support-check fail  never pass

version targeted for testing:
 xen  7967dafe6acce66193a8a81fa88ac4d3eb7b48aa
baseline version:
 xen  1f4ea1603570a91c486f2cd26c819d076f260f30

Last test of basis   103036  2016-12-07 16:58:05 Z    2 days
Testing same since   103103  2016-12-08 19:26:08 Z    1 days    1 attempts


People who touched revisions under test:
  Ian Jackson 

jobs:
 build-amd64-xsm      pass
 build-armhf-xsm      pass
 build-i386-xsm       pass
 build-amd64-xtf      pass
 build-amd64          pass
 build-armhf          pass
 build-i386           pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-prev     pass
 build-i386-prev      pass
 build-amd64-pvops    pass
 build-armhf-pvops    pass
 build-i386-pvops     pass
 build-amd64-rumprun 

Re: [Xen-devel] [RFC PATCH 21/24] ARM: vITS: handle INVALL command

2016-12-09 Thread Stefano Stabellini
On Fri, 9 Dec 2016, Andre Przywara wrote:
> >> I've been spending some time thinking about this, and I think we can in
> >> fact get away without ever propagating commands from domains to the host.
> >>
> >> I made a list of all commands that possibly require host ITS command
> >> propagation. There are two groups:
> >> 1: enabling/disabling LPIs: INV, INVALL
> >> 2: mapping/unmapping devices/events: DISCARD, MAPD, MAPTI.
> >>
> >> The second group can be handled by mapping all required devices up
> >> front, I will elaborate on that in a different email.
> >>
> >> For the first group, read below ...
> >>
> >> On 01/12/16 01:19, Stefano Stabellini wrote:
> >>> On Fri, 25 Nov 2016, Julien Grall wrote:
> >>>> Hi,
> >>>>
> >>>> On 18/11/16 18:39, Stefano Stabellini wrote:
> >>>>> On Fri, 11 Nov 2016, Stefano Stabellini wrote:
> >>>>>> On Fri, 11 Nov 2016, Julien Grall wrote:
> >>>>>>> On 10/11/16 20:42, Stefano Stabellini wrote:
> >>>>>>> That's why in the approach we had on the previous series was "host
> >>>>>>> ITS command should be limited when emulating guest ITS command".
> >>>>>>> From my recall, in that series the host and guest LPIs was fully
> >>>>>>> separated (enabling a guest LPIs was not enabling host LPIs).
> >>>>>>
> >>>>>> I am interested in reading what Ian suggested to do when the physical
> >>>>>> ITS queue is full, but I cannot find anything specific about it in the
> >>>>>> doc.
> >>>>>>
> >>>>>> Do you have a suggestion for this?
> >>>>>>
> >>>>>> The only things that come to mind right now are:
> >>>>>>
> >>>>>> 1) check if the ITS queue is full and busy loop until it is not
> >>>>>> (spin_lock style)
> >>>>>> 2) check if the ITS queue is full and sleep until it is not (mutex
> >>>>>> style)
> >>>>>
> >>>>> Another, probably better idea, is to map all pLPIs of a device when the
> >>>>> device is assigned to a guest (including Dom0). This is what was written
> >>>>> in Ian's design doc. The advantage of this approach is that Xen doesn't
> >>>>> need to take any actions on the physical ITS command queue when the
> >>>>> guest issues virtual ITS commands, therefore completely solving this
> >>>>> problem at the root. (Although I am not sure about enable/disable
> >>>>> commands: could we avoid issuing enable/disable on pLPIs?)
> >>>>
> >>>> In the previous design document (see [1]), the pLPIs are enabled when the
> >>>> device is assigned to the guest. This means that it is not necessary to
> >>>> send command there. This also means we may receive a pLPI before the
> >>>> associated vLPI has been configured.
> >>>>
> >>>> That said, given that LPIs are edge-triggered, there is no deactivate
> >>>> state (see 4.1 in ARM IHI 0069C). So as soon as the priority drop is
> >>>> done, the same LPIs could potentially be raised again. This could
> >>>> generate a storm.
> >>>
> >>> Thank you for raising this important point. You are correct.
> >>>
> >>>> The priority drop is necessary if we don't want to block the reception of
> >>>> interrupt for the current physical CPU.
> >>>>
> >>>> What I am more concerned about is this problem can also happen in normal
> >>>> running (i.e the pLPI is bound to an vLPI and the vLPI is enabled). For
> >>>> edge-triggered interrupt, there is no way to prevent them to fire again.
> >>>> Maybe it is time to introduce rate-limit interrupt for ARM. Any opinions?
> >>>
> >>> Yes. It could be as simple as disabling the pLPI when Xen receives a
> >>> second pLPI before the guest EOIs the first corresponding vLPI, which
> >>> shouldn't happen in normal circumstances.
> >>>
> >>> We need a simple per-LPI inflight counter, incremented when a pLPI is
> >>> received, decremented when the corresponding vLPI is EOIed (the LR is
> >>> cleared).
> >>>
> >>> When the counter > 1, we disable the pLPI and request a maintenance
> >>> interrupt for the corresponding vLPI.
> >>
> >> So why do we need a _counter_? This is about edge triggered interrupts,
> >> I think we can just accumulate all of them into one.
> > 
> > The counter is not to re-inject the same amount of interrupts into the
> > guest, but to detect interrupt storms.
> 
> I was wondering if an interrupt "storm" could already be defined by
> "receiving an LPI while there is already one pending (in the guest's
> virtual pending table) and it being disabled by the guest". I admit that
> declaring two interrupts as a storm is a bit of a stretch, but in fact
> the guest had probably a reason for disabling it even though it
> fires, so Xen should just follow suit.
> The only difference is that we don't do it _immediately_ when the guest
> tells us (via INV), but only if needed (LPI actually fires).

Either way should work OK, I think.


> >>  - If the VLPI is enabled, we EOI it on the host and inject it.
> >>  - If the VLPI is disabled, we set the pending bit in the VCPU's
> >>pending table and EOI on the host - to allow other IRQs.
> >
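
The per-LPI inflight counter proposed earlier in this thread could look roughly like the sketch below. The struct, field names, and function names are hypothetical and only model the bookkeeping; no host ITS or GIC access is shown, and the policy for re-enabling a disabled pLPI is not specified in the thread, so it is left out.

```c
#include <stdbool.h>

/* Hypothetical per-LPI bookkeeping; names are illustrative only. */
struct lpi_state {
    unsigned int inflight;  /* pLPIs received and not yet EOIed by the guest */
    bool host_enabled;      /* models whether the pLPI is enabled on the host */
};

/*
 * Called when a physical LPI arrives.
 * Returns true if the corresponding vLPI should be injected.
 */
static bool lpi_received(struct lpi_state *s)
{
    s->inflight++;

    if ( s->inflight > 1 )
    {
        /* A second pLPI arrived before the guest EOIed the first one. */
        s->host_enabled = false;
        return false;
    }

    return true;
}

/* Called when the guest EOIs the vLPI (the LR is cleared). */
static void lpi_eoi(struct lpi_state *s)
{
    if ( s->inflight > 0 )
        s->inflight--;
}
```

As noted in the thread, the counter exists only to detect a storm, not to replay the same number of interrupts into the guest.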

Re: [Xen-devel] [RFC XEN PATCH 05/16] xen/x86: release pmem pages at domain destroy

2016-12-09 Thread Konrad Rzeszutek Wilk
On Mon, Oct 10, 2016 at 08:32:24AM +0800, Haozhong Zhang wrote:
> The host pmem pages mapped to a domain are unassigned at domain destroy
> so as to be used by other domains in future.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
>  xen/arch/x86/domain.c  |  5 +
>  xen/arch/x86/pmem.c| 41 +
>  xen/include/xen/pmem.h |  1 +
>  3 files changed, 47 insertions(+)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 1bd5eb6..05ab389 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -61,6 +61,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -2512,6 +2513,10 @@ int domain_relinquish_resources(struct domain *d)
>  if ( ret )
>  return ret;
>  
> +ret = pmem_teardown(d);
> +if ( ret )
> +return ret;

Good, so if ret == -ERESTART it preempts, but..
> +
>  /* Tear down paging-assistance stuff. */
>  ret = paging_teardown(d);
>  if ( ret )
> diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
> index e4dc685..50e496b 100644
> --- a/xen/arch/x86/pmem.c
> +++ b/xen/arch/x86/pmem.c
> @@ -282,3 +282,44 @@ int pmem_populate(struct xen_pmemmap_args *args)
>  args->nr_done = i;
>  return rc;
>  }
> +
> +static int pmem_teardown_preemptible(struct domain *d, int *preempted)
> +{
> +struct page_info *pg, *next;
> +int rc = 0;
> +
> +spin_lock(&d->pmem_lock);
> +
> +page_list_for_each_safe (pg, next, &d->pmem_page_list )
> +{
> +BUG_ON(page_get_owner(pg) != d);
> +BUG_ON(page_state_is(pg, free));
> +
> +page_list_del(pg, &d->pmem_page_list);
> +page_set_owner(pg, NULL);
> +pg->count_info = (pg->count_info & ~PGC_count_mask) | PGC_state_free;
> +
> +if ( preempted && hypercall_preempt_check() )
> +{
> +*preempted = 1;

.. you don't set rc = -ERESTART ?

> +goto out;
> +}
> +}
> +
> + out:
> +spin_unlock(&d->pmem_lock);
> +return rc;
> +}
> +
> +int pmem_teardown(struct domain *d)
> +{
> +int preempted = 0;
> +
> +ASSERT(d->is_dying);
> +ASSERT(d != current->domain);
> +
> +if ( !has_hvm_container_domain(d) || !paging_mode_translate(d) )
> +return -EINVAL;
> +
> +return pmem_teardown_preemptible(d, &preempted);

Not exactly sure what the 'preempted' is for? You don't seem to be
using it here?

Perhaps you meant to do:

  rc = pmem_teardown_preemptible(d, &preempted);
  if ( preempted )
return -ERESTART;
  return rc;
?
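
For illustration, the suggested control flow can be tried outside of Xen with the sketch below. The work structure and the budget-based preemption test are hypothetical stand-ins for the domain's pmem page list and hypercall_preempt_check(); only the way the preempted flag is turned into -ERESTART mirrors the suggestion above.

```c
#include <errno.h>

#ifndef ERESTART
#define ERESTART 85 /* fallback value for illustration only */
#endif

/* Hypothetical stand-in for the per-domain list of pmem pages. */
struct work {
    unsigned int remaining; /* pages still to be released */
    unsigned int budget;    /* pages handled before preemption is requested */
};

static int teardown_preemptible(struct work *w, int *preempted)
{
    unsigned int done = 0;

    while ( w->remaining > 0 )
    {
        w->remaining--; /* models releasing one page */
        done++;

        /* Models hypercall_preempt_check(); skipped once the list is empty. */
        if ( preempted && w->remaining > 0 && done >= w->budget )
        {
            *preempted = 1;
            break;
        }
    }

    return 0;
}

static int teardown(struct work *w)
{
    int preempted = 0;
    int rc = teardown_preemptible(w, &preempted);

    /* Tell the caller to invoke us again if we stopped early. */
    if ( preempted )
        return -ERESTART;

    return rc;
}
```

A caller keeps invoking teardown() while it returns -ERESTART; with five pages and a budget of two, that takes three calls.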

> +}
> diff --git a/xen/include/xen/pmem.h b/xen/include/xen/pmem.h
> index 60adf56..ffbef1c 100644
> --- a/xen/include/xen/pmem.h
> +++ b/xen/include/xen/pmem.h
> @@ -37,5 +37,6 @@ int pmem_add(unsigned long spfn, unsigned long epfn,
>   unsigned long rsv_spfn, unsigned long rsv_epfn,
>   unsigned long data_spfn, unsigned long data_epfn);
>  int pmem_populate(struct xen_pmemmap_args *args);
> +int pmem_teardown(struct domain *d);
>  
>  #endif /* __XEN_PMEM_H__ */
> -- 
> 2.10.1
> 
> 


Re: [Xen-devel] [RFC XEN PATCH 04/16] xen/x86: add XENMEM_populate_pmemmap to map host pmem pages to guest

2016-12-09 Thread Konrad Rzeszutek Wilk
On Mon, Oct 10, 2016 at 08:32:23AM +0800, Haozhong Zhang wrote:
> XENMEM_populate_pmemmap is used by toolstack to map given host pmem pages
> to given guest pages. Only pages in the data area of a pmem region are
> allowed to be mapped to guest.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> Cc: Ian Jackson 
> Cc: Wei Liu 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
>  tools/libxc/include/xenctrl.h |   8 +++
>  tools/libxc/xc_domain.c   |  14 +
>  xen/arch/x86/pmem.c   | 123 ++
>  xen/common/domain.c   |   3 ++
>  xen/common/memory.c   |  31 +++
>  xen/include/public/memory.h   |  14 -
>  xen/include/xen/pmem.h|  10 
>  xen/include/xen/sched.h   |   3 ++
>  8 files changed, 205 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 2c83544..46c71fc 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2710,6 +2710,14 @@ int xc_livepatch_revert(xc_interface *xch, char *name, uint32_t timeout);
>  int xc_livepatch_unload(xc_interface *xch, char *name, uint32_t timeout);
>  int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
>  
> +/**
> + * Map host pmem pages at PFNs @mfn ~ (@mfn + @nr_mfns - 1) to
> + * guest physical pages at guest PFNs @gpfn ~ (@gpfn + @nr_mfns - 1)
> + */
> +int xc_domain_populate_pmemmap(xc_interface *xch, uint32_t domid,
> +   xen_pfn_t mfn, xen_pfn_t gpfn,
> +   unsigned int nr_mfns);
> +
>  /* Compat shims */
>  #include "xenctrl_compat.h"
>  
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 296b852..81a90a1 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -2520,6 +2520,20 @@ int xc_domain_soft_reset(xc_interface *xch,
>  domctl.domain = (domid_t)domid;
>  return do_domctl(xch, &domctl);
>  }
> +
> +int xc_domain_populate_pmemmap(xc_interface *xch, uint32_t domid,
> +   xen_pfn_t mfn, xen_pfn_t gpfn,
> +   unsigned int nr_mfns)
> +{
> +struct xen_pmemmap pmemmap = {
> +.domid   = domid,
> +.mfn = mfn,
> +.gpfn= gpfn,
> +.nr_mfns = nr_mfns,
> +};
> +return do_memory_op(xch, XENMEM_populate_pmemmap, &pmemmap, sizeof(pmemmap));
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
> index 70358ed..e4dc685 100644
> --- a/xen/arch/x86/pmem.c
> +++ b/xen/arch/x86/pmem.c
> @@ -24,6 +24,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include 
>  
>  /*
> @@ -63,6 +66,48 @@ static int check_reserved_size(unsigned long rsv_mfns, unsigned long total_mfns)
>  ((sizeof(*machine_to_phys_mapping) * total_mfns) >> PAGE_SHIFT);
>  }
>  
> +static int is_data_mfn(unsigned long mfn)

bool
> +{
> +struct list_head *cur;
> +int data = 0;
> +
> +ASSERT(spin_is_locked(&pmem_list_lock));
> +
> +list_for_each(cur, &pmem_list)
> +{
> +struct pmem *pmem = list_entry(cur, struct pmem, link);
> +
> +if ( pmem->data_spfn <= mfn && mfn < pmem->data_epfn )

You may want to change the first conditional to have 'mfn' on the left
side. And perhaps change 'mfn' to 'pfn' as that is what your structure
is called?

But ... maybe the #3 patch that introduces XENPF_pmem_add should
use 'data_smfn', 'data_emfn' and so on?
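
A sketch of how the check might read with both suggestions applied (bool return type, 'mfn' on the left of each comparison, and mfn-based field names). The region struct and the array walk are hypothetical simplifications; the patch itself walks a Xen list under pmem_list_lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified region descriptor; only the data area is modelled. */
struct pmem_region {
    unsigned long data_smfn; /* first MFN of the data area (inclusive) */
    unsigned long data_emfn; /* end MFN of the data area (exclusive) */
};

/* Returns true if mfn falls inside the data area of any region. */
static bool is_data_mfn(const struct pmem_region *regions, size_t nr,
                        unsigned long mfn)
{
    size_t i;

    for ( i = 0; i < nr; i++ )
        if ( mfn >= regions[i].data_smfn && mfn < regions[i].data_emfn )
            return true;

    return false;
}
```

Returning from inside the loop also removes the need for the separate 'data' variable and the break.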

> +{
> +data = 1;
> +break;
> +}
> +}
> +
> +return data;
> +}
> +
> +static int pmem_page_valid(struct page_info *page, struct domain *d)

bool
> +{
> +/* only data area can be mapped to guest */
> +if ( !is_data_mfn(page_to_mfn(page)) )
> +{
> +dprintk(XENLOG_DEBUG, "pmem: mfn 0x%lx is not a pmem data page\n",
> +page_to_mfn(page));
> +return 0;
> +}
> +
> +/* inuse/offlined/offlining pmem page cannot be mapped to guest */
> +if ( !page_state_is(page, free) )
> +{
> +dprintk(XENLOG_DEBUG, "pmem: invalid page state of mfn 0x%lx: 0x%lx\n",
> +page_to_mfn(page), page->count_info & PGC_state);
> +return 0;
> +}
> +
> +return 1;
> +}
> +
>  static int pmem_add_check(unsigned long spfn, unsigned long epfn,
>unsigned long rsv_spfn, unsigned long rsv_epfn,
>unsigned long data_spfn, unsigned long data_epfn)
> @@ -159,3 +204,81 @@ int pmem_add(unsigned long spfn, unsigned long epfn,
>   out:
>  return ret;
>  }
> +
> +static int pmem_assign_pages(struct domain *d,
> + struct page_info *pg, unsigned int order)
> +{
> +int rc = 0;
> +unsigned long i;
> +
> +spin_lock(&d->pmem_lock);
> +
> +if ( unlikely(d->is_dying) )
> +{
> +rc = -EINVAL;
> +goto out;
> +}
> +

Re: [Xen-devel] [RFC XEN PATCH 03/16] xen/x86: add a hypercall XENPF_pmem_add to report host pmem regions

2016-12-09 Thread Konrad Rzeszutek Wilk
On Mon, Oct 10, 2016 at 08:32:22AM +0800, Haozhong Zhang wrote:
> Xen hypervisor does not include a pmem driver. Instead, it relies on the
> pmem driver in Dom0 to report the PFN ranges of the entire pmem region,
> its reserved area and data area via XENPF_pmem_add. The reserved area is
> used by Xen hypervisor to place the frame table and M2P table, and is
> disallowed to be accessed from Dom0 once it's reported.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Daniel De Graaf 
> ---
>  xen/arch/x86/Makefile |   1 +
>  xen/arch/x86/platform_hypercall.c |   7 ++
>  xen/arch/x86/pmem.c   | 161 ++
>  xen/arch/x86/x86_64/mm.c  |  54 +
>  xen/include/asm-x86/mm.h  |   4 +
>  xen/include/public/platform.h |  14 
>  xen/include/xen/pmem.h|  31 
>  xen/xsm/flask/hooks.c |   1 +
>  8 files changed, 273 insertions(+)
>  create mode 100644 xen/arch/x86/pmem.c
>  create mode 100644 xen/include/xen/pmem.h
> 
> diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
> index 931917d..9cf2da1 100644
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -67,6 +67,7 @@ obj-$(CONFIG_TBOOT) += tboot.o
>  obj-y += hpet.o
>  obj-y += vm_event.o
>  obj-y += xstate.o
> +obj-y += pmem.o

If possible please keep this alphabetical. Also I wonder if it makes
sense to have CONFIG_PMEM or such?

>  
>  x86_emulate.o: x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h
>  
> diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
> index 0879e19..c47eea4 100644
> --- a/xen/arch/x86/platform_hypercall.c
> +++ b/xen/arch/x86/platform_hypercall.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -822,6 +823,12 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>  }
>  break;
>  
> +case XENPF_pmem_add:

Missing a call to ret = xsm_resource_plug_core(XSM_HOOK);
or something similar.

> +ret = pmem_add(op->u.pmem_add.spfn, op->u.pmem_add.epfn,
> +   op->u.pmem_add.rsv_spfn, op->u.pmem_add.rsv_epfn,
> +   op->u.pmem_add.data_spfn, op->u.pmem_add.data_epfn);
> +break;
> +
>  default:
>  ret = -ENOSYS;
>  break;
> diff --git a/xen/arch/x86/pmem.c b/xen/arch/x86/pmem.c
> new file mode 100644
> index 000..70358ed
> --- /dev/null
> +++ b/xen/arch/x86/pmem.c
> @@ -0,0 +1,161 @@
> +/**
> + * arch/x86/pmem.c
> + *
> + * Copyright (c) 2016, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.

Hm, please consult Intel's lawyers about the '(at your option)' clause and
which later versions they are comfortable with.

> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see .
> + *
> + * Author: Haozhong Zhang 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 

Since this is a new file could I ask you sort these alphabetically?

> +#include 
> +#include 
> +
> +/*
> + * All pmem regions reported from Dom0 are linked in pmem_list, which
> + * is proected by pmem_list_lock. Its entries are of type struct pmem

protected
> + * and sorted incrementally by field spa.
> + */
> +static DEFINE_SPINLOCK(pmem_list_lock);
> +static LIST_HEAD(pmem_list);
> +
> +struct pmem {
> +struct list_head link;   /* link to pmem_list */
> +unsigned long spfn;  /* start PFN of the whole pmem region */
> +unsigned long epfn;  /* end PFN of the whole pmem region */
> +unsigned long rsv_spfn;  /* start PFN of the reserved area */
> +unsigned long rsv_epfn;  /* end PFN of the reserved area */
> +unsigned long data_spfn; /* start PFN of the data area */
> +unsigned long data_epfn; /* end PFN of the data area */

Why not just:
struct pmem {
struct list_head link;
struct xenpf_pmem_add pmem;
}

or such?
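
To illustrate the suggestion, here is a minimal sketch. The field names come
from the op->u.pmem_add accesses above; the uint64_t field types and the
list_head stand-in are assumptions made only so the example is self-contained,
and pmem_init() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Xen's struct list_head (assumption, for self-containment). */
struct list_head {
    struct list_head *next, *prev;
};

/* Assumed layout of the hypercall argument; field types are a guess. */
struct xenpf_pmem_add {
    uint64_t spfn, epfn;           /* whole pmem region */
    uint64_t rsv_spfn, rsv_epfn;   /* reserved area */
    uint64_t data_spfn, data_epfn; /* data area */
};

/* List entry that embeds the hypercall argument instead of copying fields. */
struct pmem {
    struct list_head link;
    struct xenpf_pmem_add pmem;
};

/* Hypothetical helper: one struct copy replaces six field assignments. */
static void pmem_init(struct pmem *p, const struct xenpf_pmem_add *op)
{
    assert(p != NULL && op != NULL);
    p->link.next = p->link.prev = &p->link;
    p->pmem = *op;
}
```

The trade-off is that the internal structure then follows the public ABI
layout, which may or may not be desirable.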

> +};
> +
> +static int is_included(unsigned long s1, unsigned long e1,

bool?
> +   unsigned long s2, unsigned long e2)
> +{
> +return s1 <= s2 && s2 < e2 && e2 <= e1;

Is the s2 < e2 necessary?

> +}
> +
> +static int is_overlaped(unsigned long s1, unsigned long e1,

overlapped and perhaps bool?
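
For reference, a sketch of what bool versions could look like. is_included()
keeps the expression from the patch; the body of is_overlapped() is not
visible in the quoted text, so the half-open interval test below is an
assumption, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* True if [s2, e2) is a non-empty range lying entirely within [s1, e1). */
static bool is_included(unsigned long s1, unsigned long e1,
                        unsigned long s2, unsigned long e2)
{
    /* The s2 < e2 term rejects empty or inverted inner ranges. */
    return s1 <= s2 && s2 < e2 && e2 <= e1;
}

/*
 * True if the half-open ranges [s1, e1) and [s2, e2) share at least one PFN.
 * Assumes both ranges are non-empty.
 */
static bool is_overlapped(unsigned long s1, unsigned long e1,
                          unsigned long s2, unsigned long e2)
{
    return s1 < e2 && s2 < e1;
}
```

Without the s2 < e2 term, an inverted inner range such as (7, 3) would be
reported as included in [0, 10), so the term only matters if callers do not
validate the ranges beforehand.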

> +unsigned long s2, unsigned lo

[Xen-devel] [linux-4.1 test] 103089: regressions - FAIL

2016-12-09 Thread osstest service owner
flight 103089 linux-4.1 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/103089/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-winxpsp3  6 xen-boot  fail REGR. vs. 101737
 test-amd64-amd64-xl  6 xen-boot  fail REGR. vs. 101737
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  6 xen-boot  fail REGR. vs. 101737
 test-amd64-i386-xl-qemut-win7-amd64  6 xen-boot  fail REGR. vs. 101737
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm  6 xen-boot  fail REGR. vs. 101737
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  6 xen-boot  fail REGR. vs. 101737
 test-amd64-i386-pair  9 xen-boot/src_host  fail REGR. vs. 101737
 test-amd64-i386-pair  10 xen-boot/dst_host  fail REGR. vs. 101737
 test-amd64-amd64-qemuu-nested-intel  6 xen-boot  fail REGR. vs. 101737
 test-amd64-amd64-xl-pvh-intel  6 xen-boot  fail REGR. vs. 101737
 test-amd64-amd64-xl-multivcpu  6 xen-boot  fail REGR. vs. 101737
 test-amd64-i386-freebsd10-amd64  6 xen-boot  fail REGR. vs. 101737
 build-armhf-pvops  5 kernel-build  fail in 102733 REGR. vs. 101737

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  3 host-install(3)  broken in 103011 pass in 103089
 test-amd64-amd64-xl-qemut-win7-amd64  3 host-install(3)  broken in 103011 pass in 103089
 test-amd64-i386-xl  3 host-install(3)  broken in 103011 pass in 103089
 test-amd64-amd64-xl-qemuu-win7-amd64  3 host-install(3)  broken in 103011 pass in 103089
 test-amd64-i386-xl-qemut-winxpsp3  3 host-install(3)  broken in 103011 pass in 103089
 test-amd64-i386-xl-qemuu-ovmf-amd64  3 host-install(3)  broken in 103011 pass in 103089
 test-amd64-amd64-libvirt-vhd  9 debian-di-install  fail in 102733 pass in 102886
 test-amd64-amd64-xl-xsm  19 guest-start/debian.repeat  fail in 102755 pass in 103089
 test-armhf-armhf-libvirt-xsm  14 guest-stop  fail in 102755 pass in 103089
 test-armhf-armhf-xl-multivcpu  11 guest-start  fail in 102829 pass in 103089
 test-amd64-i386-xl-xsm  6 xen-boot  fail in 102886 pass in 103089
 test-amd64-amd64-qemuu-nested-amd  9 debian-hvm-install  fail in 102886 pass in 103089
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  9 windows-install  fail in 102886 pass in 103089
 test-amd64-i386-xl-qemut-debianhvm-amd64  9 debian-hvm-install  fail in 103011 pass in 103089
 test-armhf-armhf-libvirt-xsm  11 guest-start  fail in 103011 pass in 103089
 test-amd64-amd64-libvirt  6 xen-boot  fail pass in 102733
 test-amd64-i386-qemuu-rhel6hvm-intel  6 xen-boot  fail pass in 102733
 test-amd64-i386-xl-raw  6 xen-boot  fail pass in 102733
 test-amd64-i386-libvirt-xsm  6 xen-boot  fail pass in 102755
 test-amd64-amd64-xl-qemut-winxpsp3  6 xen-boot  fail pass in 102755
 test-amd64-i386-qemut-rhel6hvm-intel  6 xen-boot  fail pass in 102829
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  6 xen-boot  fail pass in 102886
 test-amd64-amd64-libvirt-vhd  6 xen-boot  fail pass in 102886
 test-amd64-i386-freebsd10-i386  6 xen-boot  fail pass in 103011
 test-armhf-armhf-xl-arndale  15 guest-start/debian.repeat  fail pass in 103011

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-credit2  11 guest-start  fail in 102829 like 101737
 test-armhf-armhf-xl-xsm  11 guest-start  fail in 102829 like 101737
 test-armhf-armhf-xl-rtds  16 guest-start.2  fail in 102886 blocked in 101737
 test-amd64-i386-rumprun-i386  16 rumprun-demo-xenstorels/xenstorels.repeat  fail in 102886 blocked in 101737
 test-armhf-armhf-xl-credit2  15 guest-start/debian.repeat  fail in 102886 like 101687
 test-amd64-amd64-xl-qemut-win7-amd64  16 guest-stop  fail in 102886 like 101737
 test-amd64-amd64-xl-qemuu-win7-amd64  16 guest-stop  fail in 102886 like 101737
 test-armhf-armhf-xl  15 guest-start/debian.repeat  fail in 102886 like 101737
 test-armhf-armhf-libvirt  15 guest-start/debian.repeat  fail like 101672
 test-armhf-armhf-xl  11 guest-start  fail like 101672
 test-armhf-armhf-xl-xsm  15 guest-start/debian.repeat  fail like 101715
 test-armhf-armhf-xl-cubietruck  15 guest-start/debian.repeat  fail like 101715
 test-armhf-armhf-xl-rtds  15 guest-start/debian.repeat  fail like 101715
 test-armhf-armhf-libvirt  13 saverestore-support-check  fail like 101737
 test-armhf-armhf-libvirt-xsm  15 guest-start/debian.repeat  fail like 101737
 test-armhf-armhf-xl-multivcpu  15 guest-start/debian.repeat  fail like 101737
 test-amd64-i386-xl-qemuu-win7-amd64  16 guest-stop  fail like 101737
 test-armhf-armhf-libvirt-xsm  13 saverestore-support-check  fail like 101737
 test-armhf-armhf-xl-vhd  9 debian-di-install  fail like 101737

Re: [Xen-devel] [RFC XEN PATCH 02/16] x86_64/mm: explicitly specify the location to place the M2P table

2016-12-09 Thread Konrad Rzeszutek Wilk
On Mon, Oct 10, 2016 at 08:32:21AM +0800, Haozhong Zhang wrote:
> A reserved area on each pmem region is used to place the M2P table.
> However, it's not at the beginning of the pmem region, so we need to
> specify the location explicitly when creating the M2P table.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
>  xen/arch/x86/x86_64/mm.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 33f226a..5c0f527 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -317,7 +317,8 @@ void destroy_m2p_mapping(struct mem_hotadd_info *info)
>   * spfn/epfn: the pfn ranges to be setup
>   * free_s/free_e: the pfn ranges that is free still
>   */
> -static int setup_compat_m2p_table(struct mem_hotadd_info *info)
> +static int setup_compat_m2p_table(struct mem_hotadd_info *info,
> +  struct mem_hotadd_info *alloc_info)
>  {
>  unsigned long i, va, smap, emap, rwva, epfn = info->epfn, mfn;
>  unsigned int n;
> @@ -371,7 +372,7 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
>  if ( n == CNT )
>  continue;
>  
> -mfn = alloc_hotadd_mfn(info);
> +mfn = alloc_hotadd_mfn(alloc_info);
>  err = map_pages_to_xen(rwva, mfn, 1UL << PAGETABLE_ORDER,
> PAGE_HYPERVISOR);
>  if ( err )
> @@ -391,7 +392,8 @@ static int setup_compat_m2p_table(struct mem_hotadd_info *info)
>   * Allocate and map the machine-to-phys table.
>   * The L3 for RO/RWRW MPT and the L2 for compatible MPT should be setup already
>   */
> -static int setup_m2p_table(struct mem_hotadd_info *info)
> +static int setup_m2p_table(struct mem_hotadd_info *info,
> +   struct mem_hotadd_info *alloc_info)
>  {
>  unsigned long i, va, smap, emap;
>  unsigned int n;
> @@ -440,7 +442,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
>  break;
>  if ( n < CNT )
>  {
> -unsigned long mfn = alloc_hotadd_mfn(info);
> +unsigned long mfn = alloc_hotadd_mfn(alloc_info);
>  
>  ret = map_pages_to_xen(
>  RDWR_MPT_VIRT_START + i * sizeof(unsigned long),
> @@ -485,7 +487,7 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
>  #undef CNT
>  #undef MFN
>  
> -ret = setup_compat_m2p_table(info);
> +ret = setup_compat_m2p_table(info, alloc_info);
>  error:
>  return ret;
>  }
> @@ -1427,7 +1429,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>  total_pages += epfn - spfn;
>  
>  set_pdx_range(spfn, epfn);
> -ret = setup_m2p_table(&info);
> +ret = setup_m2p_table(&info, &info);

I am not sure I follow this logic. You are passing the same contents, it
is just that 'alloc_info' and 'info' are aliased together?
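
To make the question concrete, here is a small model of how I read it. The
struct and allocator below are simplified stand-ins rather than the Xen code,
PAGETABLE_ORDER = 9 is an assumption, and setup_one_chunk() is a hypothetical
caller invented for the example.

```c
#include <assert.h>

#define PAGETABLE_ORDER 9 /* assumption: 512 frames per allocation */

/* Simplified stand-in for Xen's struct mem_hotadd_info. */
struct mem_hotadd_info {
    unsigned long spfn; /* first PFN of the range */
    unsigned long epfn; /* one past the last PFN */
    unsigned long cur;  /* next PFN to hand out */
};

/* Stand-in bump allocator: hands out 2^PAGETABLE_ORDER frames from info. */
static unsigned long alloc_hotadd_mfn(struct mem_hotadd_info *info)
{
    unsigned long mfn = info->cur;

    assert(info->cur >= info->spfn &&
           info->cur + (1UL << PAGETABLE_ORDER) <= info->epfn);
    info->cur += 1UL << PAGETABLE_ORDER;
    return mfn;
}

/* Hypothetical caller: describes 'info' but allocates from 'alloc_info'. */
static unsigned long setup_one_chunk(const struct mem_hotadd_info *info,
                                     struct mem_hotadd_info *alloc_info)
{
    (void)info; /* only read in the real code; unused in this sketch */
    return alloc_hotadd_mfn(alloc_info);
}
```

With (&info, &info) the allocation comes out of the range being added, exactly
as before the patch. The extra parameter would only make a difference for a
caller that passes a different range, presumably the pmem reserved area
mentioned in the commit message.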




Re: [Xen-devel] [RFC XEN PATCH 01/16] x86_64/mm: explicitly specify the location to place the frame table

2016-12-09 Thread Konrad Rzeszutek Wilk
On Mon, Oct 10, 2016 at 08:32:20AM +0800, Haozhong Zhang wrote:
> A reserved area on each pmem region is used to place the frame table.
> However, it's not at the beginning of the pmem region, so we need to
> specify the location explicitly when extending the frame table.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
>  xen/arch/x86/x86_64/mm.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index b8b6b70..33f226a 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -792,7 +792,8 @@ static int setup_frametable_chunk(void *start, void *end,
>  return 0;
>  }
>  
> -static int extend_frame_table(struct mem_hotadd_info *info)
> +static int extend_frame_table(struct mem_hotadd_info *info,

This looks like it could be 'const struct mem_hotadd_info *info' ?

> +  struct mem_hotadd_info *alloc_info)
>  {
>  unsigned long cidx, nidx, eidx, spfn, epfn;
>  
> @@ -818,9 +819,9 @@ static int extend_frame_table(struct mem_hotadd_info *info)
>  nidx = find_next_bit(pdx_group_valid, eidx, cidx);
>  if ( nidx >= eidx )
>  nidx = eidx;
> -err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT ),
> +err = setup_frametable_chunk(pdx_to_page(cidx * PDX_GROUP_COUNT),
>   pdx_to_page(nidx * PDX_GROUP_COUNT),
> - info);
> + alloc_info);

Granted this one modifies the 'alloc_info' in
alloc_hotadd_mfn, and 'alloc_info'
>  if ( err )
>  return err;
>  
> @@ -1413,7 +1414,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>  info.epfn = epfn;
>  info.cur = spfn;
>  
> -ret = extend_frame_table(&info);
> +ret = extend_frame_table(&info, &info);

is equivalent to 'info', so I am not sure I understand the purpose
behind this patch?

Thanks.



Re: [Xen-devel] [PATCH] xen/setup: Don't relocate p2m/initrd over existing one

2016-12-09 Thread Boris Ostrovsky
On 12/09/2016 10:50 AM, Ross Lagerwall wrote:
> When relocating the p2m/initrd, take special care not to relocate it so
> that it overlaps with the current location of the p2m/initrd. This is
> needed since the full extent of the current location is not marked as a
> reserved region in the e820 (and it shouldn't be since it is about to be
> moved).
>
> This was seen to happen to a dom0 with a large initial p2m and a small
> reserved region in the middle of the initial p2m.
>


>  
>  /*
> - * Find a free area in physical memory not yet reserved and compliant with
> - * E820 map.
> + * Find a free area in physical memory not yet reserved, compliant with the
> + * E820 map and not overlapping with the pre-allocated area.
>   * Used to relocate pre-allocated areas like initrd or p2m list which are in
>   * conflict with the to be used E820 map.
>   * In case no area is found, return 0. Otherwise return the physical address
>   * of the area which is already reserved for convenience.
>   */
> -phys_addr_t __init xen_find_free_area(phys_addr_t size)
> +phys_addr_t __init xen_find_free_area(phys_addr_t size, phys_addr_t cur_start,
> +   phys_addr_t cur_size)
>  {
>   unsigned mapcnt;
>   phys_addr_t addr, start;
> @@ -652,7 +653,8 @@ phys_addr_t __init xen_find_free_area(phys_addr_t size)
>   continue;
>   start = entry->addr;
>   for (addr = start; addr < start + size; addr += PAGE_SIZE) {
> - if (!memblock_is_reserved(addr))
> + if (!memblock_is_reserved(addr) &&
> + (addr < cur_start || addr >= cur_start + cur_size))
>   continue;
>   start = addr + PAGE_SIZE;
>   if (start + size > entry->addr + entry->size)


Wouldn't this in fact make it even more likely to return pointer to the
currently allocated chunk?

If we are in [cur_start, cur_start+cur_size) then we assign this address
to 'start' and may well return it, no?
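
One way to check this is to model the patched loop for a single RAM entry. The
sketch below is a stand-alone approximation, not the kernel code:
memblock_is_reserved() is a stub that reports nothing as reserved, and the
entry bounds are passed in as plain parameters (both are assumptions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

typedef uint64_t phys_addr_t;

/* Stub for memblock_is_reserved(): nothing is reserved in this model. */
static bool memblock_is_reserved(phys_addr_t addr)
{
    (void)addr;
    return false;
}

/*
 * Model of the patched search within one RAM entry
 * [entry_addr, entry_addr + entry_size). Returns 0 if no area fits.
 */
static phys_addr_t find_free_area(phys_addr_t entry_addr,
                                  phys_addr_t entry_size, phys_addr_t size,
                                  phys_addr_t cur_start, phys_addr_t cur_size)
{
    phys_addr_t addr, start = entry_addr;

    assert(size > 0);
    for (addr = start; addr < start + size; addr += PAGE_SIZE) {
        /* Page is usable: not reserved and outside the current area. */
        if (!memblock_is_reserved(addr) &&
            (addr < cur_start || addr >= cur_start + cur_size))
            continue;
        /* Page is unusable: restart the candidate just past it. */
        start = addr + PAGE_SIZE;
        if (start + size > entry_addr + entry_size)
            break;
    }
    return addr >= start + size ? start : 0;
}
```

In this model 'start' is set to addr + PAGE_SIZE, not to addr, so a page inside
[cur_start, cur_start + cur_size) is skipped the same way a reserved page is.
Whether the real function behaves the same depends on the surrounding code
that is not quoted here.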

-boris





[Xen-devel] [xen-unstable test] 103102: regressions - FAIL

2016-12-09 Thread osstest service owner
flight 103102 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/103102/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-arndale   6 xen-boot fail REGR. vs. 102942

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt  13 saverestore-support-check  fail like 102942
 test-amd64-i386-xl-qemuu-win7-amd64  16 guest-stop  fail like 102942
 test-armhf-armhf-libvirt-xsm  13 saverestore-support-check  fail like 102942
 test-amd64-amd64-xl-qemut-win7-amd64  16 guest-stop  fail like 102942
 test-armhf-armhf-libvirt-raw  12 saverestore-support-check  fail like 102942
 test-armhf-armhf-libvirt-qcow2  12 saverestore-support-check  fail like 102942
 test-amd64-amd64-xl-qemuu-win7-amd64  16 guest-stop  fail like 102942
 test-amd64-i386-xl-qemut-win7-amd64  16 guest-stop  fail like 102942
 test-amd64-amd64-xl-rtds  9 debian-install  fail like 102942

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt-xsm  12 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-xsm  12 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  10 migrate-support-check  fail never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail never pass
 test-amd64-amd64-libvirt  12 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd  11 migrate-support-check  fail never pass
 test-amd64-amd64-qemuu-nested-amd  16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-multivcpu  12 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu  13 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  10 migrate-support-check  fail never pass
 test-amd64-amd64-xl-pvh-intel  11 guest-start  fail never pass
 test-amd64-i386-libvirt  12 migrate-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck  12 migrate-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck  13 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt  12 migrate-support-check  fail never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-check  fail never pass
 test-armhf-armhf-xl  12 migrate-support-check  fail never pass
 test-armhf-armhf-xl  13 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-xsm  12 migrate-support-check  fail never pass
 test-armhf-armhf-libvirt-raw  11 migrate-support-check  fail never pass
 test-armhf-armhf-libvirt-qcow2  11 migrate-support-check  fail never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-check  fail never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-rtds  12 migrate-support-check  fail never pass
 test-armhf-armhf-xl-rtds  13 saverestore-support-check  fail never pass

version targeted for testing:
 xen  3f916247afc3ed8f1ba4c1bc248eaaa7afe962d8
baseline version:
 xen  8e4b2676685f50bc26f03b5f62d8b7aea8e69dbf

Last test of basis   102942  2016-12-05 12:53:37 Z    4 days
Failing since        102997  2016-12-06 11:29:43 Z    3 days    3 attempts
Testing same since   103102  2016-12-08 18:55:56 Z    1 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Andrii Anisov 
  Artem Mygaiev 
  Cedric Bosdonnat 
  Cédric Bosdonnat 
  Daniel De Graaf 
  Daniel Kiper 
  Ian Jackson 
  Jan Beulich 
  Juergen Gross 
  Julien Grall 
  Jun Sun 
  Oleksandr Tyshchenko 
  Oleksandr Tyshchenko 
  Peng Fan 
  Roger Pau Monné 
  Sameer Goel 
  Samuel Thibault 
  Stefano Stabellini 
  Steve Capper 
  Tamas K Lengyel 
  Tim Deegan 
  Wei Liu 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-ol

Re: [Xen-devel] [RFC PATCH 21/24] ARM: vITS: handle INVALL command

2016-12-09 Thread Stefano Stabellini
On Fri, 9 Dec 2016, Andre Przywara wrote:
> On 07/12/16 20:20, Stefano Stabellini wrote:
> > On Tue, 6 Dec 2016, Julien Grall wrote:
> >> On 06/12/2016 22:01, Stefano Stabellini wrote:
> >>> On Tue, 6 Dec 2016, Stefano Stabellini wrote:
> >>>> moving a vCPU with interrupts assigned to it is slower than moving a
> >>>> vCPU without interrupts assigned to it. You could say that the
> >>>> slowness is directly proportional to the number of interrupts assigned
> >>>> to the vCPU.
> >>>
> >>> To be pedantic, by "assigned" I mean that a physical interrupt is routed
> >>> to a given pCPU and is set to be forwarded to a guest vCPU running on it
> >>> by the _IRQ_GUEST flag. The guest could be dom0. Upon receiving one of
> >>> these physical interrupts, a corresponding virtual interrupt (could be a
> >>> different irq) will be injected into the guest vCPU.
> >>>
> >>> When the vCPU is migrated to a new pCPU, the physical interrupts that
> >>> are configured to be injected as virtual interrupts into the vCPU, are
> >>> migrated with it. The physical interrupt migration has a cost. However,
> >>> receiving physical interrupts on the wrong pCPU has a higher cost.
> >>
> >> I don't understand why it is a problem for you to receive the first
> >> interrupt on the wrong pCPU and move it if necessary.
> >>
> >> While this may have a higher cost (I don't believe so) on the first
> >> received interrupt, migrating thousands of interrupts at the same time is
> >> very expensive and will likely get Xen stuck for a while (think about ITS
> >> with a single command queue).
> >>
> >> Furthermore, the current approach will move every single interrupt routed
> >> to the vCPU, even those disabled. That's pointless and a waste of
> >> resources. You may argue that we can skip the disabled ones, but in that
> >> case what would be the benefit of migrating the IRQs while migrating the
> >> vCPUs?
> >>
> >> So I would suggest spreading it over time. This also means less headache
> >> for the scheduler developers.
> > 
> > The most important aspect of interrupts handling in Xen is latency,
> > measured as the time between Xen receiving a physical interrupt and the
> > guest receiving it. This latency should be both small and deterministic.
> > 
> > We all agree so far, right?
> > 
> > 
> > The issue with spreading interrupts migrations over time is that it makes
> > interrupt latency less deterministic. It is OK, in the uncommon case of
> > vCPU migration with interrupts, to take a hit for a short time. This
> > "hit" can be measured. It can be known. If your workload cannot tolerate
> > it, vCPUs can be pinned. It should be a rare event anyway. On the other
> > hand, by spreading interrupts migrations, we make it harder to predict
> > latency. Aside from determinism, another problem with this approach is
> > that it ensures that every interrupt assigned to a vCPU will first hit
> > the wrong pCPU, then it will be moved. It guarantees the worst-case
> > scenario for interrupt latency for the vCPU that has been moved. If we
> > migrated all interrupts as soon as possible, we would minimize the
> > amount of interrupts delivered to the wrong pCPU. Most interrupts would
> > be delivered to the new pCPU right away, reducing interrupt latency.
> 
> So if this is such a crucial issue, why don't we use the ITS for good
> this time? The ITS hardware probably supports 16 bits worth of
> collection IDs, so what about we assign each VCPU (in every guest) a
> unique collection ID on the host and do a MAPC & MOVALL on a VCPU
> migration to let it point to the right physical redistributor.
> I see that this does not cover all use cases (> 65536 VCPUs, for
> instance), and it also depends very much on many implementation details:

This is certainly an idea worth exploring. We don't need to assign a
collection ID to every vCPU, just the ones that have LPIs assigned to
them, which should be considerably fewer.


> - How costly is a MOVALL? It needs to scan the pending table and
> transfer set bits to the other redistributor, which may take a while.

This is a hardware operation; even if it is not fast, I'd prefer to
rely on that rather than implementing something complex in software.
Usually hardware gets better over time at this sort of thing.


> - Is there an impact if we exceed the number of hardware backed
> collections (GITS_TYPER[HCC])? If the ITS is forced to access system
> memory for every table lookup, this may slow down everyday operations.

We'll have to fall back to manually moving them one by one.


> - How likely are those misdirected interrupts in the first place? How
> often do we migrate a VCPU compared to the interrupt frequency?

This is where the scheduler work comes in.


> There are more, subtle parameters to consider, so I guess we just need
> to try and measure.

That's right. This is why I have been saying that we need numbers. This
is difficult hardware, difficult code and difficult scenarios. Intuition
onl

Re: [Xen-devel] [RFC PATCH 21/24] ARM: vITS: handle INVALL command

2016-12-09 Thread Stefano Stabellini
On Fri, 9 Dec 2016, Julien Grall wrote:
> Hi Stefano,
> 
> On 07/12/16 20:20, Stefano Stabellini wrote:
> > On Tue, 6 Dec 2016, Julien Grall wrote:
> > > On 06/12/2016 22:01, Stefano Stabellini wrote:
> > > > On Tue, 6 Dec 2016, Stefano Stabellini wrote:
> > > > > moving a vCPU with interrupts assigned to it is slower than moving a
> > > > > vCPU without interrupts assigned to it. You could say that the
> > > > > slowness is directly proportional to the number of interrupts assigned
> > > > > to the vCPU.
> > > > 
> > > > To be pedantic, by "assigned" I mean that a physical interrupt is routed
> > > > to a given pCPU and is set to be forwarded to a guest vCPU running on it
> > > > by the _IRQ_GUEST flag. The guest could be dom0. Upon receiving one of
> > > > these physical interrupts, a corresponding virtual interrupt (could be a
> > > > different irq) will be injected into the guest vCPU.
> > > > 
> > > > When the vCPU is migrated to a new pCPU, the physical interrupts that
> > > > are configured to be injected as virtual interrupts into the vCPU, are
> > > > migrated with it. The physical interrupt migration has a cost. However,
> > > > receiving physical interrupts on the wrong pCPU has a higher cost.
> > > 
> > > I don't understand why it is a problem for you to receive the first
> > > interrupt on the wrong pCPU and move it if necessary.
> > >
> > > While this may have a higher cost (I don't believe so) on the first
> > > received interrupt, migrating thousands of interrupts at the same time is
> > > very expensive and will likely get Xen stuck for a while (think about ITS
> > > with a single command queue).
> > >
> > > Furthermore, the current approach will move every single interrupt routed
> > > to the vCPU, even those disabled. That's pointless and a waste of
> > > resources. You may argue that we can skip the disabled ones, but in that
> > > case what would be the benefit of migrating the IRQs while migrating the
> > > vCPUs?
> > >
> > > So I would suggest spreading it over time. This also means less headache
> > > for the scheduler developers.
> > 
> > The most important aspect of interrupts handling in Xen is latency,
> > measured as the time between Xen receiving a physical interrupt and the
> > guest receiving it. This latency should be both small and deterministic.
> > 
> > We all agree so far, right?
> > 
> > 
> > The issue with spreading interrupts migrations over time is that it makes
> > interrupt latency less deterministic. It is OK, in the uncommon case of
> > vCPU migration with interrupts, to take a hit for a short time.  This
> > "hit" can be measured. It can be known. If your workload cannot tolerate
> > it, vCPUs can be pinned. It should be a rare event anyway.  On the other
> > hand, by spreading interrupts migrations, we make it harder to predict
> > latency. Aside from determinism, another problem with this approach is
> > that it ensures that every interrupt assigned to a vCPU will first hit
> > the wrong pCPU, then it will be moved.  It guarantees the worst-case
> > scenario for interrupt latency for the vCPU that has been moved. If we
> > migrated all interrupts as soon as possible, we would minimize the
> > amount of interrupts delivered to the wrong pCPU. Most interrupts would
> > be delivered to the new pCPU right away, reducing interrupt latency.
> 
> Migrating all the interrupts can be really expensive because in the current
> state we have to go through every single interrupt and check whether the
> interrupt has been routed to this vCPU. We will also route disabled
> interrupts, which seems really pointless. This may need some optimization.

Indeed, that should be fixed.


> With ITS, we may have thousands of interrupts routed to a vCPU. This means that
> for every interrupt we have to issue a command in the host ITS queue. You will
> likely fill up the command queue and add much more latency.
> 
> Even if you consider vCPU migration to be a rare case, you could still get
> the pCPU stuck for tens of milliseconds, the time to migrate everything. And I
> don't think this is acceptable.
[...]
> If the number increases, you may end up with the scheduler deciding not to
> migrate the vCPU because it would be too expensive. But you may have a
> situation where migrating a vCPU with many interrupts is the only possible
> choice and you will slow down the platform.

A vCPU with thousands of interrupts routed to it is the case where I
would push back to the scheduler. It should know that moving the vCPU
would be very costly.

Regardless, we need to figure out a way to move the interrupts without
"blocking" the platform for long. In practice, we might find a
threshold: a number of active interrupts above which we cannot move them
all at once anymore. Something like: we move the first 500 active
interrupts immediately, we delay the rest. We can find this threshold
only with practical measurements.
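
As a rough sketch of that idea (all names here are hypothetical, this is not
Xen code, and the threshold value itself would have to come from
measurements):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interrupt state; not a Xen structure. */
struct virq {
    bool enabled;      /* disabled interrupts are never moved eagerly */
    bool deferred;     /* set when the move is postponed */
    unsigned int pcpu; /* pCPU the physical interrupt is routed to */
};

/*
 * Move up to 'threshold' enabled interrupts to new_pcpu right away and mark
 * the remaining enabled ones as deferred. Returns the number deferred.
 */
static size_t migrate_irqs(struct virq *irqs, size_t n,
                           unsigned int new_pcpu, size_t threshold)
{
    size_t moved = 0, deferred = 0;

    assert(irqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!irqs[i].enabled)
            continue;
        if (moved < threshold) {
            irqs[i].pcpu = new_pcpu;
            irqs[i].deferred = false;
            moved++;
        } else {
            irqs[i].deferred = true;
            deferred++;
        }
    }
    return deferred;
}
```

A deferred interrupt would then be moved lazily, for example the first time it
fires on the old pCPU, which is close to what Julien suggested for the
remainder.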


> Anyway, I would li

[Xen-devel] [PATCH v2 2/2] p2m: split mem_access into separate files

2016-12-09 Thread Tamas K Lengyel
This patch relocates mem_access components that are currently mixed with p2m
code into separate files. This better aligns the code with similar subsystems,
such as mem_sharing and mem_paging, which are already in separate files. There
are no code-changes introduced, the patch is mechanical code movement.

On ARM we also relocate the static inline gfn_next_boundary function to p2m.h
as it is a function the mem_access code needs access to.

Signed-off-by: Tamas K Lengyel 
Acked-by: Razvan Cojocaru 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: George Dunlap 

v2: Don't move ARM radix tree functions
Include asm/mem_access.h in xen/mem_access.h
---
 MAINTAINERS  |   2 +
 xen/arch/arm/Makefile|   1 +
 xen/arch/arm/mem_access.c| 431 
 xen/arch/arm/p2m.c   | 414 +--
 xen/arch/arm/traps.c |   1 +
 xen/arch/x86/mm/Makefile |   1 +
 xen/arch/x86/mm/mem_access.c | 462 +++
 xen/arch/x86/mm/p2m.c| 421 ---
 xen/arch/x86/vm_event.c  |   3 +-
 xen/common/mem_access.c  |   2 +-
 xen/include/asm-arm/mem_access.h |  53 +
 xen/include/asm-arm/p2m.h|  31 ++-
 xen/include/asm-x86/mem_access.h |  61 ++
 xen/include/asm-x86/p2m.h|  24 +-
 xen/include/xen/mem_access.h |  67 +-
 xen/include/xen/p2m-common.h |  52 -
 16 files changed, 1089 insertions(+), 937 deletions(-)
 create mode 100644 xen/arch/arm/mem_access.c
 create mode 100644 xen/arch/x86/mm/mem_access.c
 create mode 100644 xen/include/asm-arm/mem_access.h
 create mode 100644 xen/include/asm-x86/mem_access.h

diff --git a/MAINTAINERS b/MAINTAINERS
index f0d0202..fb26be3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -402,6 +402,8 @@ S:  Supported
 F: tools/tests/xen-access
 F: xen/arch/*/monitor.c
 F: xen/arch/*/vm_event.c
+F: xen/arch/arm/mem_access.c
+F: xen/arch/x86/mm/mem_access.c
 F: xen/arch/x86/hvm/monitor.c
 F: xen/common/mem_access.c
 F: xen/common/monitor.c
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index da39d39..b095e8a 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -24,6 +24,7 @@ obj-y += io.o
 obj-y += irq.o
 obj-y += kernel.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
+obj-y += mem_access.o
 obj-y += mm.o
 obj-y += monitor.o
 obj-y += p2m.o
diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
new file mode 100644
index 000..a6e5bcd
--- /dev/null
+++ b/xen/arch/arm/mem_access.c
@@ -0,0 +1,431 @@
+/*
+ * arch/arm/mem_access.c
+ *
+ * Architecture-specific mem_access handling routines
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
+xenmem_access_t *access)
+{
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+void *i;
+unsigned int index;
+
+static const xenmem_access_t memaccess[] = {
+#define ACCESS(ac) [p2m_access_##ac] = XENMEM_access_##ac
+ACCESS(n),
+ACCESS(r),
+ACCESS(w),
+ACCESS(rw),
+ACCESS(x),
+ACCESS(rx),
+ACCESS(wx),
+ACCESS(rwx),
+ACCESS(rx2rw),
+ACCESS(n2rwx),
+#undef ACCESS
+};
+
+ASSERT(p2m_is_locked(p2m));
+
+/* If no setting was ever set, just return rwx. */
+if ( !p2m->mem_access_enabled )
+{
+*access = XENMEM_access_rwx;
+return 0;
+}
+
+/* If request to get default access. */
+if ( gfn_eq(gfn, INVALID_GFN) )
+{
+*access = memaccess[p2m->default_access];
+return 0;
+}
+
+i = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn));
+
+if ( !i )
+{
+/*
+ * No setting was found in the Radix tree. Check if the
+ * entry exists in the page-tables.
+ */
+mfn_t mfn = p2m_get_entry(p2m, gfn, NULL, NULL, NULL);
+
+if ( mfn_eq(mfn, INVALID_MFN) )
+return -ESRCH;
+
+/* If entry exists then its rwx. */
+*access = XENMEM_access_rwx;
+}
+else
+{
+/* Setting was found in the Radix tree. */
+index = radix_tree_ptr_to_int(i);
+if 

[Xen-devel] [PATCH v2 1/2] arm/mem_access: adjust check_and_get_page to not rely on current

2016-12-09 Thread Tamas K Lengyel
The only caller of this function is get_page_from_gva which already takes
a vcpu pointer as input. Pass this along to make the function in-line with
its intended use-case.

Signed-off-by: Tamas K Lengyel 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
 xen/arch/arm/p2m.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index cc5634b..837be1d 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1461,7 +1461,8 @@ mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
  * we indeed found a conflicting mem_access setting.
  */
 static struct page_info*
-p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag)
+p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
+  const struct vcpu *v)
 {
 long rc;
 paddr_t ipa;
@@ -1470,7 +1471,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned 
long flag)
 xenmem_access_t xma;
 p2m_type_t t;
 struct page_info *page = NULL;
-struct p2m_domain *p2m = &current->domain->arch.p2m;
+struct p2m_domain *p2m = &v->domain->arch.p2m;
 
 rc = gva_to_ipa(gva, &ipa, flag);
 if ( rc < 0 )
@@ -1482,7 +1483,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned 
long flag)
  * We do this first as this is faster in the default case when no
  * permission is set on the page.
  */
-rc = __p2m_get_mem_access(current->domain, gfn, &xma);
+rc = __p2m_get_mem_access(v->domain, gfn, &xma);
 if ( rc < 0 )
 goto err;
 
@@ -1546,7 +1547,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned 
long flag)
 
 page = mfn_to_page(mfn_x(mfn));
 
-if ( unlikely(!get_page(page, current->domain)) )
+if ( unlikely(!get_page(page, v->domain)) )
 page = NULL;
 
 err:
@@ -1587,7 +1588,7 @@ struct page_info *get_page_from_gva(struct vcpu *v, 
vaddr_t va,
 
 err:
 if ( !page && p2m->mem_access_enabled )
-page = p2m_mem_access_check_and_get_page(va, flags);
+page = p2m_mem_access_check_and_get_page(va, flags, v);
 
 p2m_read_unlock(p2m);
 
-- 
2.10.2
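
Editorial sketch: the patch above replaces an implicit dependency on the global "current" vCPU with an explicit parameter. The stand-alone example below illustrates why that matters. The types and the current_vcpu variable are simplified stand-ins for illustration, not Xen's real definitions.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration only; not Xen's real types. */
struct domain { int domain_id; };
struct vcpu { struct domain *domain; };

/* Hypothetical stand-in for Xen's per-CPU "current" vCPU. */
static struct vcpu *current_vcpu;

/* Before: silently operates on whichever vCPU happens to be running. */
static int domain_id_implicit(void)
{
    return current_vcpu->domain->domain_id;
}

/* After: the caller states which vCPU it means. */
static int domain_id_explicit(const struct vcpu *v)
{
    return v->domain->domain_id;
}

/* Returns 1 when the two variants disagree for a non-running vCPU. */
static int variants_differ(void)
{
    struct domain d1 = { 1 }, d2 = { 2 };
    struct vcpu running = { &d1 }, other = { &d2 };
    int differ;

    current_vcpu = &running;
    differ = domain_id_implicit() != domain_id_explicit(&other);
    current_vcpu = NULL;   /* do not leave a pointer to a dead local */

    return differ;
}
```

With the explicit form, a caller such as get_page_from_gva, which already receives a vcpu pointer, gets an answer for the vCPU it asked about rather than for whichever one is running.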


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1] arm/irq: Reorder check in route_irq_to_guest() to avoid 4 layers of "if"

2016-12-09 Thread Stefano Stabellini
On Fri, 9 Dec 2016, Julien Grall wrote:
> Hi Oleksandr,
> 
> Thank you for the patch.
> 
> On 06/12/16 17:53, Oleksandr Tyshchenko wrote:
> > From: Oleksandr Tyshchenko 
> > 
> > Remove one layer of "if" by reordering the check
> > in route_irq_to_guest() to make code more clearer.
> > 
> > Signed-off-by: Oleksandr Tyshchenko 
> > CC: Julien Grall 
> > ---
> >  xen/arch/arm/irq.c | 18 +++---
> >  1 file changed, 7 insertions(+), 11 deletions(-)
> > 
> > diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> > index 508028b..6d7e44e 100644
> > --- a/xen/arch/arm/irq.c
> > +++ b/xen/arch/arm/irq.c
> > @@ -481,21 +481,17 @@ int route_irq_to_guest(struct domain *d, unsigned int
> > virq,
> >  {
> >  struct domain *ad = irq_get_domain(desc);
> > 
> > -if ( d == ad )
> > -{
> > -if ( irq_get_guest_info(desc)->virq != virq )
> > -{
> > -printk(XENLOG_G_ERR
> > -   "d%u: IRQ %u is already assigned to vIRQ %u\n",
> > -   d->domain_id, irq,
> > irq_get_guest_info(desc)->virq);
> > -retval = -EBUSY;
> > -}
> > -}
> > -else
> > +if ( d != ad )
> >  {
> >  printk(XENLOG_G_ERR "IRQ %u is already used by domain
> > %u\n",
> > irq, ad->domain_id);
> >  retval = -EBUSY;
> > +} else if ( irq_get_guest_info(desc)->virq != virq )
> 
> In Xen coding style the } and else if should be in separate line. E.g
> 
> }
> else if ( ... )
> 
> With that fixed:
> 
> Reviewed-by: Julien Grall 
> 
> Stefano, would you be happy to fix this minor coding style issue while
> committing?

done
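
Editorial sketch of the flattened control flow with the closing brace and "else if" on separate lines, as requested in the review. The function and parameter names are hypothetical stand-ins (owner for irq_get_domain(desc), assigned_virq for irq_get_guest_info(desc)->virq), and the logging is omitted.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Illustrative only: 'owner' and 'assigned_virq' stand in for values
 * that the real code reads from the IRQ descriptor.
 */
static int check_route(const void *d, const void *owner,
                       unsigned int assigned_virq, unsigned int virq)
{
    int retval = 0;

    if ( d != owner )
    {
        /* The IRQ already belongs to a different domain. */
        retval = -EBUSY;
    }
    else if ( assigned_virq != virq )
    {
        /* Same domain, but the IRQ is bound to a different vIRQ. */
        retval = -EBUSY;
    }

    return retval;
}
```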



Re: [Xen-devel] AMD VMMCALL and VM86 mode

2016-12-09 Thread Andrew Cooper
On 09/12/16 19:55, Boris Ostrovsky wrote:
> On 12/09/2016 02:01 PM, Andrew Cooper wrote:
>> Hello,
>>
>> While working on XSA-192, I found a curious thing.  On AMD hardware, the
>> VMMCALL instruction appears to behave like a nop if executed in VM86
>> mode.  All other processor modes work fine.
>>
>> The documentation suggests it should be valid in any situation, but I
>> never get a #VMEXIT from it. 
> And I assume GENERAL2_INTERCEPT_VMMCALL is set (which is what we have in
> Xen by default)?

Yes, because I have already used hypercalls to get text to the console
before entering vm86 mode.

> What happens if you don't set it?

Let me do some hacking and see.

~Andrew



Re: [Xen-devel] AMD VMMCALL and VM86 mode

2016-12-09 Thread Boris Ostrovsky
On 12/09/2016 02:01 PM, Andrew Cooper wrote:
> Hello,
>
> While working on XSA-192, I found a curious thing.  On AMD hardware, the
> VMMCALL instruction appears to behave like a nop if executed in VM86
> mode.  All other processor modes work fine.
>
> The documentation suggests it should be valid in any situation, but I
> never get a #VMEXIT from it. 

And I assume GENERAL2_INTERCEPT_VMMCALL is set (which is what we have in
Xen by default)?

What happens if you don't set it?

-boris


>  Thus, I would have thought it would fall
> into the un-intercepted category and raise a #UD fault, but I don't get
> that either.
>
> Is this behaviour expected?  The documentation would certainly seem to
> indicate not.






Re: [Xen-devel] Future support of 5-level paging in Xen:wq

2016-12-09 Thread Andrew Cooper
On 09/12/16 19:31, Stefano Stabellini wrote:
> On Fri, 9 Dec 2016, Juergen Gross wrote:
>> On 09/12/16 00:50, Stefano Stabellini wrote:
>>> On Thu, 8 Dec 2016, Andrew Cooper wrote:
>>>> On 08/12/2016 19:18, Stefano Stabellini wrote:
>>>>> On Thu, 8 Dec 2016, Andrew Cooper wrote:
>>>>>> On 08/12/16 16:46, Juergen Gross wrote:
>>>>>>> The first round of (very preliminary) patches for supporting the new
>>>>>>> 5-level paging of future Intel x86 processors [1] has been posted to
>>>>>>> lkml:
>>>>>>>
>>>>>>> https://lkml.org/lkml/2016/12/8/378
>>>>>>>
>>>>>>> An explicit note has been added: "CONFIG_XEN is broken." and
>>>>>>> "I would appreciate help with the code."
>>>>>>>
>>>>>>> I think we should start a discussion what we want to do in future:
>>>>>>>
>>>>>>> - are we going to support 5-level paging for PV guests?
>>>>>>> - do we limit 5-level paging to PVH and HVM?
>>>>>> The 64bit PV ABI has 16TB of virtual address space just above the upper
>>>>>> 48-canonical boundary.
>>>>>>
>>>>>> Were Xen to support 5-level PV guests, we'd either leave the PV guest
>>>>>> kernel with exactly the same amount of higher half space as it currently
>>>>>> has, or we'd have to recompile Xen as properly position-independent and
>>>>>> use a different virtual range in different paging mode.
>>>>>>
>>>>>> Another pain point is the quantity of virtual address space handed away
>>>>>> in the ABI.  We currently hand 97% of the virtual address space away to
>>>>>> 64bit PV guests, and frankly this is too much.  This is the wrong way
>>>>>> around when Xen has more management to do than the guest.  If we were to
>>>>>> go along the 5-level PV guests route, I will insist that there is a
>>>>>> rather more even split of virtual address space baked into the ABI.
>>>>>>
>>>>>> However, a big question is whether any of this effort is worth doing, in
>>>>>> the light of PVH.
>>>>> With my Aporeto hat on, I'll say that given the overwhelming amount of
>>>>> hardware available out there without virtualization support, this work
>>>>> is worth doing. I am referring to all the public cloud virtual machines,
>>>>> which can support Xen PV guests but cannot support PVH guests.
>>>> Why is Xen supporting 5-level guests useful for running in a PV cloud
>>>> VM?  Xen doesn't run PV.
>>>>
>>>> I am not suggesting that we avoid adding 5-level support to Xen.  We
>>>> should absolutely do that.  The question is only whether we extend the
>>>> PV ABI to support 5-level PV guests.  Conceptually, it's very easy to
>>>> have a 5-level Xen but only supporting 4-level PV guests.
>>>>
>>>> VT-x and SVM date from 2005/2006 and are now 10 years old.  I would be
>>>> surprised if you would find much hardware of this age in any cloud; you
>>>> can't buy anything that old, and support contracts have probably run out
>>>> if you have owned that hardware for 10 years.
>>> I am thinking that in a couple of years, we might already find VMs so
>>> large that to use all the memory in a nested virt scenario, we need
>>> 5-level PV abi support.
>>>
>> No, I don't think so. I believe there will be no hardware capable of
>> 5-level paging but without VMX/SVM support. Support of PVH/HVM for such
>> large guests should be enough. We don't need to extend PV which we want
>> to get rid of in Linux anyway, no?
> I am talking about nested virtualization when the L1 virtual machine
> does not support nested VMX or SVM. No Amazon AWS virtual machines
> support nested VMX, but it is possible to run Xen PV virtual machines on
> top of any Amazon HVM instance.
>
> When 5-level pagetable hardware will become available on Amazon AWS, it
> might be possible to get virtual machines so large that in order to use
> all the memory, you need to use 5-level pagetables in L1 Xen. In this
> scenario, if we want to create a L2 virtual machine as large as possible
> we will need support for 5-level page tables in the PV ABI.
>
> Please correct me if I am wrong.

That is a valid scenario, although I don't think it's very likely to happen.

Intel currently tops out at 46 bits physical (64TB), according to the
whitepaper, while a lot of AMD hardware has 48 bits physical (256TB). I
dread to think how much AWS would charge you for that much RAM, or how
much Amazon would be charged to buy such a server in the first place.
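
Editorial note: the sizes quoted above follow directly from the address widths. A small sketch of the arithmetic (the helper name is made up for illustration):

```c
#include <stdint.h>

/*
 * Size in TiB of an address space that is 'bits' wide.
 * Valid for 40 <= bits < 104 in principle; here only widths below 64
 * minus 0 are meaningful, since the result must fit in 64 bits.
 * 1 TiB is 2^40 bytes, so the answer is 2^(bits - 40).
 */
static uint64_t addr_space_tib(unsigned int bits)
{
    return (uint64_t)1 << (bits - 40);
}
```

For the widths mentioned in the message, addr_space_tib(46) gives 64 and addr_space_tib(48) gives 256, matching the 64TB and 256TB figures.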

This is more RAM than Xen can currently handle, and isn't going to
change without breaking the current ABI.

Also, given the rise of virtualisation-based security solutions even by
Microsoft themselves in Windows 10, I think the chances are good that
you will be able to get VMs with nested virt before being able to get
VMs large enough to need 5-level paging.

~Andrew



[Xen-devel] [PATCH v2] fix potential int overflow in efi/boot

2016-12-09 Thread Stefano Stabellini
HorizontalResolution and VerticalResolution are 32bit, while size is
64bit. As it stands multiplications are evaluated with 32bit arithmetic,
which could overflow. Cast HorizontalResolution to 64bit to avoid that.

Coverity-ID: 1381858

Signed-off-by: Stefano Stabellini 

---
Changes in v2:
- remove stray space
- fix other multiplication

diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 56544dc..3e5e4ab 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -684,10 +684,10 @@ static UINTN __init 
efi_find_gop_mode(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
 break;
 }
 if ( !cols && !rows &&
- mode_info->HorizontalResolution *
+ (UINTN)mode_info->HorizontalResolution *
  mode_info->VerticalResolution > size )
 {
-size = mode_info->HorizontalResolution *
+size = (UINTN)mode_info->HorizontalResolution *
mode_info->VerticalResolution;
 gop_mode = i;
 }
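
Editorial sketch of the problem the patch addresses: when both operands are 32-bit, the product is computed in 32 bits and only then widened. The example uses uint64_t as a stand-in for UINTN on a 64-bit build and assumes a platform where int is 32 bits wide; the function names are illustrative.

```c
#include <stdint.h>

/* Both operands are 32-bit: the product wraps modulo 2^32 before widening. */
static uint64_t area_wrapping(uint32_t w, uint32_t h)
{
    return w * h;
}

/* Widen one operand first so the multiplication is done in 64 bits. */
static uint64_t area_widened(uint32_t w, uint32_t h)
{
    return (uint64_t)w * h;
}
```

With w = h = 65536 the first version yields 0, while the second yields 4294967296. Casting a single operand is enough, because the other one is then converted to the wider type as well.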



[Xen-devel] [PATCH v2] fix potential pa_range_info out of bound access

2016-12-09 Thread Stefano Stabellini
pa_range_info has only 8 elements and is accessed using pa_range as
index. pa_range is initialized to 16, potentially causing out of bound
access errors. Fix the issue by checking that pa_range is not greater
than the size of the array.

Coverity-ID: 1381865

Signed-off-by: Stefano Stabellini 

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index e4991df..eb791db 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1639,7 +1639,8 @@ void __init setup_virt_paging(void)
 }
 
 /* pa_range is 4 bits, but the defined encodings are only 3 bits */
-if ( pa_range&0x8 || !pa_range_info[pa_range].pabits )
+if ( pa_range >= ARRAY_SIZE(pa_range_info) ||
+ pa_range&0x8 || !pa_range_info[pa_range].pabits )
 panic("Unknown encoding of ID_AA64MMFR0_EL1.PARange %x\n", pa_range);
 
 val |= VTCR_PS(pa_range);
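
Editorial sketch of the check pattern used in the patch, in stand-alone form. The table below is an illustrative 8-entry stand-in for pa_range_info, and the function names are made up. The first helper shows why testing bit 3 alone does not reject the initial value 0x10.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative 8-entry table; a zero pabits marks an undefined encoding. */
static const struct { unsigned int pabits; } range_info[8] = {
    { 32 }, { 36 }, { 40 }, { 42 }, { 44 }, { 48 }, { 0 }, { 0 },
};

/* The old test: only looks at bit 3, so 0x10 (bit 4) slips through. */
static int old_bit_test_rejects(unsigned int idx)
{
    return (idx & 0x8) != 0;
}

/* Returns 1 if 'idx' names a defined entry, 0 otherwise. */
static int range_is_valid(unsigned int idx)
{
    /* The bounds test comes first: || evaluates left to right and stops. */
    if ( idx >= ARRAY_SIZE(range_info) || !range_info[idx].pabits )
        return 0;

    return 1;
}
```

Because the bounds comparison is the leftmost operand, the array is never indexed with an out-of-range value.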



Re: [Xen-devel] [PATCH] fix potential pa_range_info out of bound access

2016-12-09 Thread Stefano Stabellini
On Fri, 9 Dec 2016, Julien Grall wrote:
> Hi Stefano,
> 
> On 09/12/16 01:40, Stefano Stabellini wrote:
> > On Thu, 8 Dec 2016, Stefano Stabellini wrote:
> > > pa_range_info has only 8 elements and is accessed using pa_range as
> > > index. pa_range is initialized to 16, potentially causing out of bound
> > > access errors. Fix the issue by initializing pa_range to the effective
> > > number of pa_range_info elements.
> > > 
> > > CID 1381865
> > > 
> > > Signed-off-by: Stefano Stabellini 
> > > 
> > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> > > index e4991df..245fcd1 100644
> > > --- a/xen/arch/arm/p2m.c
> > > +++ b/xen/arch/arm/p2m.c
> > > @@ -1629,7 +1629,7 @@ void __init setup_virt_paging(void)
> > >  };
> > > 
> > >  unsigned int cpu;
> > > -unsigned int pa_range = 0x10; /* Larger than any possible value */
> > > +unsigned int pa_range = sizeof(pa_range_info) /
> > > sizeof(pa_range_info[0]);
> > > 
> > >  for_each_online_cpu ( cpu )
> > >  {
> > 
> > this is wrong, it should be sizeof(pa_range_info) / sizeof(pa_range_info[0])
> > - 1:
> > 
> > ---
> > pa_range_info has only 8 elements and is accessed using pa_range as
> > index. pa_range is initialized to 16, potentially causing out of bound
> > access errors. Fix the issue by initializing pa_range to the effective
> > number of pa_range_info elements minus 1.
> > 
> > Coverity-ID: 1381865
> > 
> > Signed-off-by: Stefano Stabellini 
> > 
> > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> > index e4991df..14901b0 100644
> > --- a/xen/arch/arm/p2m.c
> > +++ b/xen/arch/arm/p2m.c
> > @@ -1629,7 +1629,7 @@ void __init setup_virt_paging(void)
> >  };
> > 
> >  unsigned int cpu;
> > -unsigned int pa_range = 0x10; /* Larger than any possible value */
> > +unsigned int pa_range = ARRAY_SIZE(pa_range_info) - 1;
> 
> The previous value was confusing and I think this one is even more.
> 
> But this is not really the problem, it is because the boundary check later
> on is wrong:
> 
> if ( pa_range&0x8 || !pa_range_info[pa_range].pabits )
> 
> It will only check whether bit 3 is not set. But we want to check that
> pa_range is in the range of the array. I.e
> 
> pa_range < ARRAY_SIZE(pa_range_info)

You are right, that is better and I don't think it requires changing the
initial value. Andrew suggested something similar on IRC too.


> If you still want to change the pa_range initial value, then I would prefer to
> see the boot CPU one (i.e boot_cpu_data.mm64.pa_range).
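
Editorial sketch of the two ways to seed the value discussed above. It assumes the loop over online CPUs computes a minimum, which the "Larger than any possible value" comment suggests but the quoted hunks do not show. The function names are illustrative.

```c
#include <stddef.h>

/* Minimum seeded with a sentinel; returns the sentinel itself if n == 0. */
static unsigned int min_with_sentinel(const unsigned int *v, size_t n,
                                      unsigned int sentinel)
{
    unsigned int m = sentinel;
    size_t i;

    for ( i = 0; i < n; i++ )
        if ( v[i] < m )
            m = v[i];

    return m;
}

/* Minimum seeded from the first element (the "boot CPU"); needs n >= 1. */
static unsigned int min_from_first(const unsigned int *v, size_t n)
{
    unsigned int m = v[0];
    size_t i;

    for ( i = 1; i < n; i++ )
        if ( v[i] < m )
            m = v[i];

    return m;
}
```

Seeding from a real element means the result is always a value some CPU reported, whereas the sentinel can survive the loop and must be caught by a later range check.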




Re: [Xen-devel] [PATCH] missing vgic_unlock_rank in gic_remove_irq_from_guest

2016-12-09 Thread Stefano Stabellini
On Fri, 9 Dec 2016, Artem Mygaiev wrote:
> Hi Stefano
> 
> On 09.12.16 02:59, Stefano Stabellini wrote:
> > Add missing vgic_unlock_rank on the error path in
> > gic_remove_irq_from_guest.
> >
> > CID: 1381843
> >
> > Signed-off-by: Stefano Stabellini 
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 63c744a..a5348f2 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -205,7 +205,10 @@ int gic_remove_irq_from_guest(struct domain *d, 
> > unsigned int virq,
> >   */
> >  if ( test_bit(_IRQ_INPROGRESS, &desc->status) ||
> >   !test_bit(_IRQ_DISABLED, &desc->status) )
> > +{
> > +vgic_unlock_rank(v_target, rank, flags);
> >  return -EBUSY;
> > +}
> >  }
> >  
> >  clear_bit(_IRQ_GUEST, &desc->status);
> >
> >
> would it be better to do it in the same way it is done in 
> gic_route_irq_to_guest() just above the patched function?

Yes, that is the preferred way of handling error paths, especially when
there are multiple. In this case, there is just one, so it doesn't make
a difference.
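
Editorial sketch of the single-exit pattern being discussed, reduced to a stand-alone example. The toy lock just counts acquisitions so that a leaked lock is observable; none of the names below are Xen's.

```c
#include <errno.h>

/* Toy lock: counts outstanding acquisitions so leaks are observable. */
static int lock_depth;

static void toy_lock(void)
{
    lock_depth++;
}

static void toy_unlock(void)
{
    lock_depth--;
}

/* Returns 0 on success, -EBUSY if the resource is still in use. */
static int remove_resource(int busy)
{
    int res = -EBUSY;

    toy_lock();

    if ( busy )
        goto out;   /* the error path still flows through the unlock */

    /* ... the actual teardown work would go here ... */
    res = 0;

 out:
    toy_unlock();

    return res;
}
```

With one exit label, adding a second error path later cannot forget the unlock, which is the advantage over unlocking by hand before each early return.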


> Something like:
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 63c744a..a9bf5d9 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -181,6 +181,7 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned 
> int virq,
>  struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq);
>  struct pending_irq *p = irq_to_pending(v_target, virq);
>  unsigned long flags;
> +int res = -EBUSY;
>  
>  ASSERT(spin_is_locked(&desc->lock));
>  ASSERT(test_bit(_IRQ_GUEST, &desc->status));
> @@ -205,17 +206,19 @@ int gic_remove_irq_from_guest(struct domain *d, 
> unsigned int virq,
>   */
>  if ( test_bit(_IRQ_INPROGRESS, &desc->status) ||
>   !test_bit(_IRQ_DISABLED, &desc->status) )
> -return -EBUSY;
> +goto out;
>  }
>  
>  clear_bit(_IRQ_GUEST, &desc->status);
>  desc->handler = &no_irq_type;
>  
>  p->desc = NULL;
> +res = 0;
>  
> +out:
>  vgic_unlock_rank(v_target, rank, flags);
>  
> -return 0;
> +return res;
>  }
> 
> 



Re: [Xen-devel] Future support of 5-level paging in Xen:wq

2016-12-09 Thread Stefano Stabellini
On Fri, 9 Dec 2016, Juergen Gross wrote:
> On 09/12/16 00:50, Stefano Stabellini wrote:
> > On Thu, 8 Dec 2016, Andrew Cooper wrote:
> >> On 08/12/2016 19:18, Stefano Stabellini wrote:
> >>> On Thu, 8 Dec 2016, Andrew Cooper wrote:
> >>>> On 08/12/16 16:46, Juergen Gross wrote:
> >>>>> The first round of (very preliminary) patches for supporting the new
> >>>>> 5-level paging of future Intel x86 processors [1] has been posted to
> >>>>> lkml:
> >>>>>
> >>>>> https://lkml.org/lkml/2016/12/8/378
> >>>>>
> >>>>> An explicit note has been added: "CONFIG_XEN is broken." and
> >>>>> "I would appreciate help with the code."
> >>>>>
> >>>>> I think we should start a discussion what we want to do in future:
> >>>>>
> >>>>> - are we going to support 5-level paging for PV guests?
> >>>>> - do we limit 5-level paging to PVH and HVM?
> >>>> The 64bit PV ABI has 16TB of virtual address space just above the upper
> >>>> 48-canonical boundary.
> >>>>
> >>>> Were Xen to support 5-level PV guests, we'd either leave the PV guest
> >>>> kernel with exactly the same amount of higher half space as it currently
> >>>> has, or we'd have to recompile Xen as properly position-independent and
> >>>> use a different virtual range in different paging mode.
> >>>>
> >>>> Another pain point is the quantity of virtual address space handed away
> >>>> in the ABI.  We currently hand 97% of the virtual address space away to
> >>>> 64bit PV guests, and frankly this is too much.  This is the wrong way
> >>>> around when Xen has more management to do than the guest.  If we were to
> >>>> go along the 5-level PV guests route, I will insist that there is a
> >>>> rather more even split of virtual address space baked into the ABI.
> >>>>
> >>>> However, a big question is whether any of this effort is worth doing, in
> >>>> the light of PVH.
> >>> With my Aporeto hat on, I'll say that given the overwhelming amount of
> >>> hardware available out there without virtualization support, this work
> >>> is worth doing. I am referring to all the public cloud virtual machines,
> >>> which can support Xen PV guests but cannot support PVH guests.
> >>
> >> Why is Xen supporting 5-level guests useful for running in a PV cloud
> >> VM?  Xen doesn't run PV.
> >>
> >> I am not suggesting that we avoid adding 5-level support to Xen.  We
> >> should absolutely do that.  The question is only whether we extend the
> >> PV ABI to support 5-level PV guests.  Conceptually, it's very easy to
> >> have a 5-level Xen but only supporting 4-level PV guests.
> >>
> >> VT-x and SVM date from 2005/2006 and are now 10 years old.  I would be
> >> surprised if you would find much hardware of this age in any cloud; you
> >> can't buy anything that old, and support contracts have probably run out
> >> if you have owned that hardware for 10 years.
> > 
> > I am thinking that in a couple of years, we might already find VMs so
> > large that to use all the memory in a nested virt scenario, we need
> > 5-level PV abi support.
> > 
> 
> No, I don't think so. I believe there will be no hardware capable of
> 5-level paging but without VMX/SVM support. Support of PVH/HVM for such
> large guests should be enough. We don't need to extend PV which we want
> to get rid of in Linux anyway, no?

I am talking about nested virtualization when the L1 virtual machine
does not support nested VMX or SVM. No Amazon AWS virtual machines
support nested VMX, but it is possible to run Xen PV virtual machines on
top of any Amazon HVM instance.

When 5-level pagetable hardware will become available on Amazon AWS, it
might be possible to get virtual machines so large that in order to use
all the memory, you need to use 5-level pagetables in L1 Xen. In this
scenario, if we want to create a L2 virtual machine as large as possible
we will need support for 5-level page tables in the PV ABI.

Please correct me if I am wrong.


P.S.
I used the following terminology:

L0 Xen is the one running on the hardware
L1 virtual machine, is a VM created by L0 Xen. In this context this is
   an Amazon AWS HVM instance without nested VMX support.
L1 Xen is the one installed inside an L1 virtual machine
L2 virtual machine, is a VM created by L1 Xen



Re: [Xen-devel] Issue while bringing up libvirtd.service

2016-12-09 Thread Praveen Kumar
On Fri, 2016-12-09 at 14:07 +0100, Dario Faggioli wrote:
> > 
Thanks Dario and Wei Liu.

I am able to create VMs using the XL toolstack after removing the old Xen and
libvirt packages. I think you both were right: I had a double-environment
problem.

Thanks once again for your suggestions.

> On Thu, 2016-12-08 at 01:11 +0530, Praveen Kumar wrote:
> > 
> > Hi,
> > 
> Hey,
> 
> > I am new in Xen environment and trying to get VMs running with the
> > build xen code base.
> > 
> > But, am facing issue bringing up the libvirt daemon. I have
> > installed
> > latest unstable xen ( 4.9 ).
> > There seems to be a conflict in library version and because of
> > which
> > I am getting below error :
> > ...
> > Dec 08 00:44:43 kpraveen.labs.blr.novell.com libvirtd[14603]:
> > Unable
> > to configure libxl's memory management parameters
> > Dec 08 00:44:43 kpraveen.labs.blr.novell.com libvirtd[14603]:
> > Initialization of LIBXL state driver failed: no error
> > Dec 08 00:44:43 kpraveen.labs.blr.novell.com libvirtd[14603]:
> > Driver
> > state initialization failed
> > ...
> > 
> What distribution is this? In any case, if you are installing Xen
> (staging / 4.9) from the sources, and then installing libvirt's
> packages from the distro, that would not work.
> 
> In fact, most likely, libvirt packages will bring in as a dependency
> the version of Xen hypervisor and toolstack that is also packaged by
> the distro. At which point, you have two Xen environments installed
> in
> the same host, which is asking for trouble.
> 
> > 
> > I found that we have xen-libs of different version which is 4.7,
> > but
> > don't know if that is the root cause and how to upgrade that. As
> > suggested, I have also rebuild xen tree and tried reinstalling, but
> > the issue persists.
> > 
> Yep, here it is the version mismatch and double environment issue I
> was
> talking about.
> 
> > 
> > Any guidance will be very much helpful to resolve this problem .
> > Also, wanted to know, what others follow as best practices when
> > there
> > is a version change or if hit the similar problem.
> > Just FYI, I am following https://wiki.xenproject.org/wiki/Compiling
> > _X
> > en_From_Source link for compiling and installing xen.
> > 
> So, if you are able to build and install Xen from sources, you
> should,
> as a first step, verify that everything is working by creating a VM
> using the XL toolstack (`man xl', look at examples, search XL on our
> Wiki, etc).
> 
> If you want to use libvirt, I don't think you have much alternative
> than building libvirt from sources as well. If I want to test or do
> some libvirt development on upstream Xen, that is what I do, FWIW.
> 
> 

Regards,

~Praveen.



Re: [Xen-devel] Xen Project and Xvisor.

2016-12-09 Thread Konrad Rzeszutek Wilk
On Fri, Dec 09, 2016 at 06:50:03PM +, Jason Long wrote:
> Hello.
> I like to see Xen developer ideas and concerns about "Xvisor" hypervisor. Any 
> experiences and compares?

Um (https://github.com/xvisor/xvisor/blob/master/HOSTS), this:

M:  x86_64 Generic
A:  x86 64-bit
C:  x86_64
V:  Intel (http://www.intel.com)
E:  QEMU (http://qemu.org/)
G:  Work-In-Progress.
S:  Work-In-Progress.
D:  docs/x86/x86_64_generic.txt

As it looks it is geared towards ARM (and only runs on ARM).



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-09 Thread Stefano Stabellini
On Fri, 9 Dec 2016, Jan Beulich wrote:
> >>> On 09.12.16 at 09:29,  wrote:
> > On Fri, 2016-12-09 at 01:13 -0700, Jan Beulich wrote:
> >> > > > On 08.12.16 at 22:54,  wrote:
> >> > Yeah, that was what was puzzling me too. Keeping them ordered has
> >> > the
> >> > nice property that if a user says the following in a config file:
> >> > 
> >> >  vcpuclass=["0-3:class0", "4-7:class1"]
> >> > 
> >> > (assuming that class0 and class1 are the always available Xen
> >> > names) it
> >> 
> >> This, btw, is another aspect I think has a basic problem: class0 and
> >> class1 say nothing about the properties of a class, and hence are
> >> tied to one particular host.
> >>
> > The other way round, I'd say. I mean, since they say nothing, they're
> > _not_ host specific?
> 
> No, not really. Or perhaps we mean different things. The name
> itself of course can be anything, but what is relevant here is
> what it stands for. And "class0" may mean one thing on host 1
> and a completely different thing on host2. Yet we need a certain
> name to always mean the same thing (or else we'd need
> translation when moving VMs between hosts).
> 
> >>  I think class names need to be descriptive
> >> and uniform across hosts. That would allow migration of such VMs as
> >> well as prevent starting them on a host not having suitable hardware.
> >> 
> > ...what George suggested (but please, George, when back, correct me if
> > I'm misrepresenting your ideas :-)) that:
> >  - something generic, such as class0, class1 will always exist (well, 
> >at least class0). They would basically constitute the Xen interface;
> >  - toolstack will accept more specific names, such as 'big' and 
> >'little', and also 'A57' and 'A43' (I'm making up the names), etc.
> >  - a VM with vCPUs in class0 and class1 will always be created and run 
> >on any 2 classes system;
> 
> How can that work, if you don't know what class1 represents?
> 
> > a VM with big and little vCPUs will only 
> >run on an ARM big.LITTLE incarnation; a VM with A57 and A43 vCPUs 
> >will only run on an host that has at least one A57 and one A43 
> >pCPUs.
> > 
> > What's not clear to me is how to establish:
> >  - the ordering among classes;
> 
> As said before - there's at best some partial ordering going to be
> possible.
> 
> >  - the mapping between Xen's neuter names and the toolstack's (arch) 
> >specific ones.
> 
> Perhaps it needs re-consideration whether class names make
> sense in the first place? What about, for example, making class
> names something entirely local to the domain config file, and
> besides specifying
> 
> vcpuclass=["0-3:class0", "4-7:class1"]
> 
> requiring for it to also specify the properties of the classes it
> uses:
> 
> class0=["..."]
> class1=["..."]
> 
> The specifiers then would be architecture specific, e.g.
> 
> class0=["arm64"]
> class1=["arm64.big"]
> 
> or on x86
> 
> class0=["x86-64"]
> class1=["x86.avx", "x86.avx2"]
> class2=["x86.XeonPhi"]
> 
> Of course this goes quite a bit in the direction of CPUID handling,
> so Andrew may have a word to say here.

This is good, but given that we are not likely to support cross-arch
migration (i.e. ARM to x86), the xl parser can be smart enough to
accept the following syntax too, as an alias to the one you suggested:

vcpuclass=["0-3:arm64.big", "4-7:arm64.LITTLE"]

or even

vcpuclass=["0-3:big", "4-7:LITTLE"]

if the receiving end is not a big.LITTLE machine, it will be easy for it
to map "big" and "LITTLE" to two arbitrary classes, such as class0 and
class1.
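
Editorial sketch of how one entry of the proposed vcpuclass list, such as "0-3:big", could be split into a vCPU range and a class name. This is not xl's parser and the syntax is still only a proposal in this thread; the struct and function names are hypothetical.

```c
#include <stdio.h>

/* Parsed form of one "first-last:name" entry, e.g. "0-3:big". */
struct vcpu_class_spec {
    unsigned int first;
    unsigned int last;
    char name[32];
};

/* Returns 0 on success, -1 if 'entry' is malformed or the range is reversed. */
static int parse_vcpu_class(const char *entry, struct vcpu_class_spec *out)
{
    char tail;

    /*
     * Expect exactly three conversions. The trailing %c only succeeds
     * if something follows the name after whitespace, which is rejected.
     */
    if ( sscanf(entry, "%u-%u:%31s%c", &out->first, &out->last,
                out->name, &tail) != 3 )
        return -1;

    if ( out->first > out->last )
        return -1;

    return 0;
}
```

A toolstack could then map the parsed name ("big", "arm64.LITTLE", "class0", ...) to whatever classes the receiving host exposes, as suggested above.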



Re: [Xen-devel] Xen 4.9 Development Update

2016-12-09 Thread Andrew Cooper
On 09/12/16 19:01, Stefano Stabellini wrote:
> On Fri, 9 Dec 2016, Oleksandr Andrushchenko wrote:
>> On 12/09/2016 03:57 PM, Pasi Kärkkäinen wrote:
>>> On Fri, Dec 09, 2016 at 02:57:04PM +0200, Oleksandr Andrushchenko wrote:
>>>>>> Should we have a section on new PV drivers? If so, I suggest to add:
>>>>>> - Xen transport for 9pfs
>>>>>> - PV Calls
>>>>> Good idea. We could also include DRM and PV Sound (CC Oleksandr).
>>>>>
>>>> This is a great idea. Let me explain what we have and what the direction
>>>> is:
>>>> 1. Frontends which we already have, working, but need to refactor/cleanup:
>>>> 1.1. PV sound
>>>> 1.2. PV DRM
>>>> 1.3. DISPL protocol, I will push v1 for review right after sndif done
>>>> 1.3. PV DRM mapper (Dom0 generic DRM driver to implement DRM zero copy
>>>> via DRM Prime buffer sharing)
>>>> 1.4. PV events not done, but we are considering [1]. If it fits and
>>>> is maintained,
>>>> then we'll probably stick to it, otherwise new PV will be created
>>>>
>>>> 2. Backends, for the above frontends already implemented:
>>>> 2.1. A unified library for Xen backends (libxenbe)
>>>> 2.2. DRM + Wayland
>>>> 2.3. ALSA
>>>> 2.4. Events not implemented yet
>>>>
>>>> All the above sources are available on *public* Github repos
>>>> (I can provide links on request) and the intention is to
>>>> upstream.
>>>>
>>> Please do post the links..
>> Please note these are all WIP:
>> 1. Frontends
>> https://github.com/andr2000?tab=repositories
>> 2. Backends
>> https://github.com/al1img?tab=repositories
> Now, I don't want to sound pessimistic, but I thought I was being
> audacious when I wrote both PV Calls and 9pfs for 4.9 - do you really
> think it is feasible to complete upstreaming of PV sound, PV DRM, DISPL,
> PV DRM frontends and backends, all by April? I would probably reduce the
> list a bit.

I think it would be good to maintain two lists.  One of "the stuff people
are working on overall", and "the subset of it intended/expected for the
forthcoming release".

Stuff will invariably slip, but even if the work isn't intended for the
forthcoming release, it is still useful to see if anyone in the
community is working on a related topic.

~Andrew



Re: [Xen-devel] Xen 4.9 Development Update

2016-12-09 Thread Stefano Stabellini
On Fri, 9 Dec 2016, Oleksandr Andrushchenko wrote:
> On 12/09/2016 03:57 PM, Pasi Kärkkäinen wrote:
> > On Fri, Dec 09, 2016 at 02:57:04PM +0200, Oleksandr Andrushchenko wrote:
> > > > > Should we have a section on new PV drivers? If so, I suggest to add:
> > > > > - Xen transport for 9pfs
> > > > > - PV Calls
> > > > Good idea. We could also include DRM and PV Sound (CC Oleksandr).
> > > > 
> > > This is a great idea. Let me explain what we have and what the direction
> > > is:
> > > 1. Frontends which we already have, working, but need to refactor/cleanup:
> > > 1.1. PV sound
> > > 1.2. PV DRM
> > > 1.3. DISPL protocol, I will push v1 for review right after sndif done
> > > 1.3. PV DRM mapper (Dom0 generic DRM driver to implement DRM zero copy
> > > via DRM Prime buffer sharing)
> > > 1.4. PV events not done, but we are considering [1]. If it fits and
> > > is maintained,
> > > then we'll probably stick to it, otherwise new PV will be created
> > > 
> > > 2. Backends, for the above frontends already implemented:
> > > 2.1. A unified library for Xen backends (libxenbe)
> > > 2.2. DRM + Wayland
> > > 2.3. ALSA
> > > 2.4. Events not implemented yet
> > > 
> > > All the above sources are available on *public* Github repos
> > > (I can provide links on request) and the intention is to
> > > upstream.
> > > 
> > Please do post the links..
> Please note these are all WIP:
> 1. Frontends
> https://github.com/andr2000?tab=repositories
> 2. Backends
> https://github.com/al1img?tab=repositories

Now, I don't want to sound pessimistic, but I thought I was being
audacious when I wrote both PV Calls and 9pfs for 4.9 - do you really
think it is feasible to complete upstreaming of PV sound, PV DRM, DISPL,
PV DRM frontends and backends, all by April? I would probably reduce the
list a bit.


[Xen-devel] AMD VMMCALL and VM86 mode

2016-12-09 Thread Andrew Cooper
Hello,

While working on XSA-192, I found a curious thing.  On AMD hardware, the
VMMCALL instruction appears to behave like a nop if executed in VM86
mode.  All other processor modes work fine.

The documentation suggests it should be valid in any situation, but I
never get a #VMEXIT from it.  Thus, I would have thought it would fall
into the un-intercepted category and raise a #UD fault, but I don't get
that either.

Is this behaviour expected?  The documentation would certainly seem to
indicate not.

~Andrew



Re: [Xen-devel] [RFC PATCH 21/24] ARM: vITS: handle INVALL command

2016-12-09 Thread Andre Przywara
On 03/12/16 00:46, Stefano Stabellini wrote:
> On Fri, 2 Dec 2016, Andre Przywara wrote:
>> Hi,

Hi Stefano,

I started to answer this email some days ago, but then spent some time
on actually implementing what I suggested, hence the delay ...

>>
>> sorry for chiming in late 
>>
>> I've been spending some time thinking about this, and I think we can in
>> fact get away without ever propagating commands from domains to the host.
>>
>> I made a list of all commands that possibly require host ITS command
>> propagation. There are two groups:
>> 1: enabling/disabling LPIs: INV, INVALL
>> 2: mapping/unmapping devices/events: DISCARD, MAPD, MAPTI.
>>
>> The second group can be handled by mapping all required devices up
>> front, I will elaborate on that in a different email.
>>
>> For the first group, read below ...
>>
>> On 01/12/16 01:19, Stefano Stabellini wrote:
>>> On Fri, 25 Nov 2016, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 18/11/16 18:39, Stefano Stabellini wrote:
>>>>> On Fri, 11 Nov 2016, Stefano Stabellini wrote:
>>>>>> On Fri, 11 Nov 2016, Julien Grall wrote:
>>>>>>> On 10/11/16 20:42, Stefano Stabellini wrote:
>>>>>>> That's why in the approach we had on the previous series was "host
>>>>>>> ITS command should be limited when emulating guest ITS command". From
>>>>>>> my recall, in that series the host and guest LPIs was fully separated
>>>>>>> (enabling a guest LPIs was not enabling host LPIs).
>>>>>>
>>>>>> I am interested in reading what Ian suggested to do when the physical
>>>>>> ITS queue is full, but I cannot find anything specific about it in the
>>>>>> doc.
>>>>>>
>>>>>> Do you have a suggestion for this?
>>>>>>
>>>>>> The only things that come to mind right now are:
>>>>>>
>>>>>> 1) check if the ITS queue is full and busy loop until it is not
>>>>>>    (spin_lock style)
>>>>>> 2) check if the ITS queue is full and sleep until it is not (mutex style)
>>>>>
>>>>> Another, probably better idea, is to map all pLPIs of a device when the
>>>>> device is assigned to a guest (including Dom0). This is what was written
>>>>> in Ian's design doc. The advantage of this approach is that Xen doesn't
>>>>> need to take any actions on the physical ITS command queue when the
>>>>> guest issues virtual ITS commands, therefore completely solving this
>>>>> problem at the root. (Although I am not sure about enable/disable
>>>>> commands: could we avoid issuing enable/disable on pLPIs?)
>>>>
>>>> In the previous design document (see [1]), the pLPIs are enabled when the
>>>> device is assigned to the guest. This means that it is not necessary to
>>>> send command there. This is also means we may receive a pLPI before the
>>>> associated vLPI has been configured.
>>>>
>>>> That said, given that LPIs are edge-triggered, there is no deactivate state
>>>> (see 4.1 in ARM IHI 0069C). So as soon as the priority drop is done, the
>>>> same LPIs could potentially be raised again. This could generate a storm.
>>>
>>> Thank you for raising this important point. You are correct.
>>>
>>>> The priority drop is necessary if we don't want to block the reception of
>>>> interrupt for the current physical CPU.
>>>>
>>>> What I am more concerned about is this problem can also happen in normal
>>>> running (i.e the pLPI is bound to an vLPI and the vLPI is enabled). For
>>>> edge-triggered interrupt, there is no way to prevent them to fire again.
>>>> Maybe it is time to introduce rate-limit interrupt for ARM. Any opinions?
>>>
>>> Yes. It could be as simple as disabling the pLPI when Xen receives a
>>> second pLPI before the guest EOIs the first corresponding vLPI, which
>>> shouldn't happen in normal circumstances.
>>>
>>> We need a simple per-LPI inflight counter, incremented when a pLPI is
>>> received, decremented when the corresponding vLPI is EOIed (the LR is
>>> cleared).
>>>
>>> When the counter > 1, we disable the pLPI and request a maintenance
>>> interrupt for the corresponding vLPI.
>>
>> So why do we need a _counter_? This is about edge triggered interrupts,
>> I think we can just accumulate all of them into one.
> 
> The counter is not to re-inject the same amount of interrupts into the
> guest, but to detect interrupt storms.

I was wondering if an interrupt "storm" could already be defined by
"receiving an LPI while there is already one pending (in the guest's
virtual pending table) and it being disabled by the guest". I admit that
declaring two interrupts as a storm is a bit of a stretch, but in fact
the guest had probably a reason for disabling it even though it
fires, so Xen should just follow suit.
The only difference is that we don't do it _immediately_ when the guest
tells us (via INV), but only if needed (LPI actually fires).
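
To make the counter idea quoted above concrete, here is a minimal C
sketch of the per-LPI inflight counter as Stefano described it: increment
on pLPI receipt, decrement on vLPI EOI, disable the pLPI once the count
exceeds one. The struct, the function names and the choice to re-enable
the pLPI on EOI are assumptions for illustration; this is not Xen code.

```c
#include <stdbool.h>

/* Hypothetical per-LPI bookkeeping; not Xen's actual data structure. */
struct lpi_state {
    unsigned int inflight;   /* pLPIs received but not yet EOIed by the guest */
    bool host_enabled;       /* whether the pLPI is currently enabled on the host */
};

/* Called when a physical LPI arrives. Returns true if it should be injected. */
static bool lpi_received(struct lpi_state *l)
{
    l->inflight++;
    if (l->inflight > 1) {
        /* A second pLPI arrived before the first vLPI was EOIed: treat
         * this as a storm and disable the pLPI on the host. */
        l->host_enabled = false;
        return false;
    }
    return true;
}

/* Called when the guest EOIs the vLPI (the LR is cleared). */
static void lpi_eoi(struct lpi_state *l)
{
    if (l->inflight > 0)
        l->inflight--;
    /* Assumption: the maintenance interrupt is the point at which the
     * pLPI is re-enabled. */
    l->host_enabled = true;
}
```

The variant I describe above would not need the counter at all: a single
"pending and disabled by the guest" check at the time the LPI fires would
play the role of the "inflight > 1" test.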

>> So here is what I think:
>> - We use the guest provided pending table to hold a pending bit for each
>> VLPI. We can unmap the memory from the guest, since software is not
>> supposed to access this table as per the spec.
>>

[Xen-devel] Xen Project and Xvisor.

2016-12-09 Thread Jason Long
Hello.
I would like to hear Xen developers' ideas and concerns about the "Xvisor"
hypervisor. Any experiences or comparisons?

Thank you.



Re: [Xen-devel] [PATCH 2/3] x86/HVM: support (emulate) UMIP

2016-12-09 Thread Andrew Cooper
On 08/12/16 12:20, Jan Beulich wrote:
>
>> However, it would also require only enabling the SVM GP intercept in the
>> hvm_update_guest_vendor() path (which should be renamed to something
>> slightly more generic like hvm_cpuid_policy_updated()).
> Why that? We always need it intercepted as long as the guest
> wants UMIP, but the hardware doesn't offer it. The feature isn't
> tied to the vendor being Intel or some such.

The hvm_update_guest_vendor() path is the post-domain_create() way of
signalling "cpuid has changed - you might want to reconfigure intercepts".

It is currently used only to alter the #UD intercept based on the set
CPUID vendor (hence its name), but the name now looks rather short-sighted.

With the proposal of having emulated-UMIP as explicitly opt-in, the
required intercepts shouldn't be enabled at domain_create() time, and
should be enabled later after the toolstack has set a policy.

~Andrew



Re: [Xen-devel] [PATCH] p2m: split mem_access into separate files

2016-12-09 Thread Tamas K Lengyel
On Fri, Dec 9, 2016 at 2:27 AM, Jan Beulich  wrote:
>>>> On 08.12.16 at 23:57,  wrote:
>> --- a/xen/arch/x86/mm/Makefile
>> +++ b/xen/arch/x86/mm/Makefile
>> @@ -9,6 +9,7 @@ obj-y += guest_walk_3.o
>>  obj-y += guest_walk_4.o
>>  obj-y += mem_paging.o
>>  obj-y += mem_sharing.o
>> +obj-y += mem_access.o
>
> Please honor prior (mostly?) alphabetical ordering.

I don't think there is any alphabetical ordering here. The list begins
with paging.o then goes to altp2m.o and then to guest_walk_2.o.. IMHO
sorting the list is something that should be done in a separate patch.

>
>> --- a/xen/common/mem_access.c
>> +++ b/xen/common/mem_access.c
>> @@ -24,8 +24,9 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>> -#include 
>> +#include 
>>  #include 
>
> Normally asm/ includes xen/ of the same name or the other way
> around, depending on how they relate to one another; you
> shouldn't ever need both includes, and I'd be surprised if the
> two headers really are (even conceptionally) completely
> independent of each other.

Sure, xen/mem_access.h can include the asm specific one.

>
> Otherwise this all looks like pure code motion (except for the
> adjustments described), but it would be nice if you could
> clarify that's indeed (intended to be) the case.

I do say in the commit message this is mechanical code motion: "There
are no code-changes introduced, the patch is mechanical code
movement."

>
> Jan
>

Thanks,
Tamas



[Xen-devel] [OSSTEST PATCH 1/1] PostgreSQL db: Retry transactions on constraint failures

2016-12-09 Thread Ian Jackson
This is unfortunate but appears to be necessary.

Signed-off-by: Ian Jackson 
CC: pgsql-hack...@postgresql.org
---
 Osstest/JobDB/Executive.pm | 45 -
 tcl/JobDB-Executive.tcl|  6 --
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/Osstest/JobDB/Executive.pm b/Osstest/JobDB/Executive.pm
index 610549a..dc6d3c2 100644
--- a/Osstest/JobDB/Executive.pm
+++ b/Osstest/JobDB/Executive.pm
@@ -62,8 +62,51 @@ sub need_retry ($$$) {
 my ($jd, $dbh,$committing) = @_;
 return
($dbh_tests->err() // 0)==7 &&
-   ($dbh_tests->state =~ m/^(?:40P01|40001)/);
+   ($dbh_tests->state =~ m/^(?:40P01|40001|23|40002)/);
 # DEADLOCK DETECTED or SERIALIZATION FAILURE
+# or any Integrity Constraint Violation including
+# TRANSACTION_INTEGRITY_CONSTRAINT_VIOLATION.
+#
+# An Integrity Constraint Violation ought not to occur with
+# serialisable transactions, so it is always a bug.  These bugs
+# should not be retried.  However, there is a longstanding bug in
+# PostgreSQL: SERIALIZABLE's guarantee of transaction
+# serialisability only applies to successful transactions.
+# Concurrent SERIALIZABLE transactions may generate "impossible"
+# errors.  For example, doing a SELECT to ensure that a row does
+# not exist, and then inserting it, may produce a unique
+# constraint violation.
+#
+# I have not been able to find out clearly which error codes may
+# be spuriously generated.  At the very least "23505
+# UNIQUE_VIOLATION" is, but I'm not sure about others.  I am
+# making the (hopefully not unwarranted) assumption that this is
+# the only class of spurious errors.  (We don't have triggers.)
+#
+# The undesirable side effect is that a buggy transaction would be
+# retried at intervals until the retry count is reached.  But
+# there seems no way to avoid this.
+#
+# This bug may have been fixed in very recent PostgreSQL (although
+# a better promise still seems absent from the documentation, at
+# the time of writing in December 2016).  But we need to work with
+# PostgreSQL back to at least 9.1.  Perhaps in the future we can
+# make this behaviour conditional on the pgsql bug being fixed.
+#
+# References:
+#
+# "WIP: Detecting SSI conflicts before reporting constraint violations"
+# January 2016 - April 2016 on pgsql-hackers
+# https://www.postgresql.org/message-id/flat/CAEepm%3D2_9PxSqnjp%3D8uo1XthkDVyOU9SO3%2BOLAgo6LASpAd5Bw%40mail.gmail.com
+# (includes patch for PostgreSQL and its documentation)
+#
+# BUG #9301: INSERT WHERE NOT EXISTS on table with UNIQUE constraint in concurrent SERIALIZABLE transactions
+# 2014, pgsql-bugs
+# https://www.postgresql.org/message-id/flat/3F697CF1-2BB7-40D4-9D20-919D1A5D6D93%40apple.com
+#
+# "Working around spurious unique constraint errors due to SERIALIZABLE bug"
+# 2009, pgsql-general
+# https://www.postgresql.org/message-id/flat/D960CB61B694CF459DCFB4B0128514C203937E44%40exadv11.host.magwien.gv.at
 }
 
 sub current_flight ($) { #method
diff --git a/tcl/JobDB-Executive.tcl b/tcl/JobDB-Executive.tcl
index 62c63af..6b9bcb0 100644
--- a/tcl/JobDB-Executive.tcl
+++ b/tcl/JobDB-Executive.tcl
@@ -365,8 +365,10 @@ proc transaction {tables script {autoreconnect 0}} {
if {$rc} {
switch -glob $errorCode {
{OSSTEST-PSQL * 40P01} -
-   {OSSTEST-PSQL * 40001} {
-   # DEADLOCK DETECTED or SERIALIZATION FAILURE
+   {OSSTEST-PSQL * 40001} -
+   {OSSTEST-PSQL * 23*}   -
+   {OSSTEST-PSQL * 40002} {
+   # See Osstest/JobDB/Executive.pm:need_retry
logputs stdout \
  "transaction serialisation failure ($errorCode) ($result) retrying ..."
if {$dbopen} { db-execute ROLLBACK }
-- 
2.1.4




[Xen-devel] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation

2016-12-09 Thread Ian Jackson
Hi.  This message is going to xen-devel (because that's where the
osstest project is) and to pgsql-hackers (because I hope they may be
able to advise about the scope of the PostgreSQL SERIALIZABLE
constraint problem).

In summary: PostgreSQL only provides transaction serialisability for
successful transactions.  Even with SERIALIZABLE, transactions may
fail due to spurious and "impossible" constraint violations.

As a result, I need to make osstest retry transactions not only on
explicitly reported serialisation failures and deadlocks, but also on
integrity violations.
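
To make the retry rule concrete, here is a minimal C sketch of the
SQLSTATE classification that patch 1/1 implements in Perl and Tcl. The
function name is hypothetical; the codes are the ones matched by the
patch (40P01, 40001, 40002 and anything in class 23). Unlike the Perl
prefix match, this sketch insists on a five-character SQLSTATE.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if a transaction that failed with this SQLSTATE should be
 * retried. Hypothetical helper mirroring need_retry() in the patch. */
static bool sqlstate_needs_retry(const char *sqlstate)
{
    if (sqlstate == NULL || strlen(sqlstate) != 5)
        return false;

    if (strcmp(sqlstate, "40P01") == 0)      /* deadlock detected */
        return true;
    if (strcmp(sqlstate, "40001") == 0)      /* serialization failure */
        return true;
    if (strcmp(sqlstate, "40002") == 0)      /* transaction integrity constraint violation */
        return true;
    if (strncmp(sqlstate, "23", 2) == 0)     /* class 23: integrity constraint violation */
        return true;

    return false;
}
```

Narrowing the scope, as I would like to, would mean replacing the class
23 test with an explicit list such as just "23505" (unique violation).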

It is not clear to me from the thread
  WIP: Detecting SSI conflicts before reporting constraint violations
which was on pgsql-hackers earlier this year, whether it is only
unique constraint violations which may spuriously occur.

Can anyone from the PostgreSQL hacker community advise ?  The existing
documentation patch just talks about "successful" transactions, and
uses unique constraints as an example.  In principle this leaves open
the possibility that the transaction might fail with bizarre and
unpredictable error codes, although I hope this isn't possible.

It would be good to narrow the scope of the retries in my system as
much as possible.

I'm hoping to get an authoritative answer here but it seems that this
is a common problem to which there is still not yet a definitive
solution.  I would like there to be a definitive solution.

If I get a clear answer I'll submit a further docs patch to pgsql :-).

Thanks,
Ian.



Re: [Xen-devel] [PATCH v2 1/4] x86/vpmu: Move vpmu_do_cpuid() handling into {pv, hvm}_cpuid()

2016-12-09 Thread Boris Ostrovsky
On 12/09/2016 01:15 PM, Andrew Cooper wrote:
> On 06/12/16 11:11, Andrew Cooper wrote:
>> On 05/12/16 20:59, Boris Ostrovsky wrote:
>>> On 12/05/2016 01:24 PM, Andrew Cooper wrote:
>>>> @@ -3516,6 +3516,17 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>>  if ( !(hvm_pae_enabled(v) || hvm_long_mode_enabled(v)) )
>>>>  *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
>>>>  }
>>>> +
>>>> +if ( vpmu_enabled(v) &&
>>>> + vpmu_is_set(vcpu_vpmu(v), VPMU_CPU_HAS_DS) )
>>>> +{
>>>> +*edx |= cpufeat_mask(X86_FEATURE_DS);
>>>> +if ( cpu_has(&current_cpu_data, X86_FEATURE_DTES64) )
>>>> +*ecx |= cpufeat_mask(X86_FEATURE_DTES64);
>>>> +if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
>>>> +*ecx |= cpufeat_mask(X86_FEATURE_DSCPL);
>>>> +}
>>>> +
>>>>  break;
>>>>
>>>>  case 0x7:
>>>> @@ -3646,6 +3657,18 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>>  }
>>>>  break;
>>>>
>>>> +case 0x000a: /* Architectural Performance Monitor Features (Intel) */
>>>> +if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL || !vpmu_enabled(v) )
>>>> +{
>>>> +*eax = *ebx = *ecx = *edx = 0;
>>>> +break;
>>>> +}
>>>> +
>>>> +/* Report at most version 3 since that's all we currently emulate */
>>>> +if ( (*eax & 0xff) > 3 )
>>>> +*eax = (*eax & ~0xff) | 3;
>>>> +break;
>>> Both this and Debug Store checks are the same for both HVM and PV. Can
>>> they be factored out?
>> The purpose of this patch series is to untangle the current call tree to
>> make it easier to finally merge the PV and HVM paths into a single
>> guest_cpuid().
>>
>> Yes, this does add a bit of duplication in the short term, but allows
>> for easier movement to the longterm goal.
>>
>>> (and then perhaps version update can gain back PMU_VERSION_MASK macro)
>> That involves moving a whole load of Intel internals with generic names
>> into vpmu.h, which is why I chose not to do it.  The end result in
>> guest_cpuid() won't use it.
> Boris: Any further comment, or is my explanation ok?  Strictly speaking
> the patch isn't blocked on your review, but I'd prefer not to use that
> technicality if you are unhappy with it.


Since in the end both of these concerns will be addressed by the
guest_cpuid() implementation I think it's all good.

Thanks.
-boris
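
For reference, the version clamp from the quoted hunk can be written as a
standalone helper. This is only a sketch: PMU_VERSION_MASK is the macro
name mentioned above, while PMU_MAX_VERSION and the function name are
hypothetical. The version ID lives in bits 7:0 of CPUID leaf 0xa EAX.

```c
#include <stdint.h>

#define PMU_VERSION_MASK 0xffu  /* CPUID.0xA:EAX[7:0], the version ID field */
#define PMU_MAX_VERSION  3u     /* hypothetical name: highest version emulated */

/* Report at most version 3, leaving the other EAX fields untouched. */
static uint32_t clamp_pmu_version(uint32_t eax)
{
    if ((eax & PMU_VERSION_MASK) > PMU_MAX_VERSION)
        eax = (eax & ~PMU_VERSION_MASK) | PMU_MAX_VERSION;
    return eax;
}
```

For example, an EAX of 0x07300404 (version 4) becomes 0x07300403, while
0x07300402 (version 2) is passed through unchanged.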




[Xen-devel] [PATCH v4 4/4] x86/asm: Rewrite sync_core() to use IRET-to-self

2016-12-09 Thread Andy Lutomirski
Aside from being excessively slow, CPUID is problematic: Linux runs
on a handful of CPUs that don't have CPUID.  Use IRET-to-self
instead.  IRET-to-self works everywhere, so it makes testing easy.

For reference, on my laptop, IRET-to-self is ~110ns,
CPUID(eax=1, ecx=0) is ~83ns on native and very very slow under KVM,
and MOV-to-CR2 is ~42ns.

While we're at it: sync_core() serves a very specific purpose.
Document it.

Cc: "H. Peter Anvin" 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/processor.h | 80 +---
 1 file changed, 58 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 64fbc937d586..ceb1f4d3f3fa 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -590,33 +590,69 @@ static __always_inline void cpu_relax(void)
 
 #define cpu_relax_lowlatency() cpu_relax()
 
-/* Stop speculative execution and prefetching of modified code. */
+/*
+ * This function forces the icache and prefetched instruction stream to
+ * catch up with reality in two very specific cases:
+ *
+ *  a) Text was modified using one virtual address and is about to be executed
+ * from the same physical page at a different virtual address.
+ *
+ *  b) Text was modified on a different CPU, may subsequently be
+ * executed on this CPU, and you want to make sure the new version
+ * gets executed.  This generally means you're calling this in a IPI.
+ *
+ * If you're calling this for a different reason, you're probably doing
+ * it wrong.
+ */
 static inline void sync_core(void)
 {
-   int tmp;
-
-#ifdef CONFIG_X86_32
/*
-* Do a CPUID if available, otherwise do a jump.  The jump
-* can conveniently enough be the jump around CPUID.
+* There are quite a few ways to do this.  IRET-to-self is nice
+* because it works on every CPU, at any CPL (so it's compatible
+* with paravirtualization), and it never exits to a hypervisor.
+* The only down sides are that it's a bit slow (it seems to be
+* a bit more than 2x slower than the fastest options) and that
+* it unmasks NMIs.  The "push %cs" is needed because, in
+* paravirtual environments, __KERNEL_CS may not be a valid CS
+* value when we do IRET directly.
+*
+* In case NMI unmasking or performance ever becomes a problem,
+* the next best option appears to be MOV-to-CR2 and an
+* unconditional jump.  That sequence also works on all CPUs,
+* but it will fault at CPL3 (i.e. Xen PV and lguest).
+*
+* CPUID is the conventional way, but it's nasty: it doesn't
+* exist on some 486-like CPUs, and it usually exits to a
+* hypervisor.
+*
+* Like all of Linux's memory ordering operations, this is a
+* compiler barrier as well.
 */
-   asm volatile("cmpl %2,%1\n\t"
-"jl 1f\n\t"
-"cpuid\n"
-"1:"
-: "=a" (tmp)
-: "rm" (boot_cpu_data.cpuid_level), "ri" (0), "0" (1)
-: "ebx", "ecx", "edx", "memory");
+   register void *__sp asm(_ASM_SP);
+
+#ifdef CONFIG_X86_32
+   asm volatile (
+   "pushfl\n\t"
+   "pushl %%cs\n\t"
+   "pushl $1f\n\t"
+   "iret\n\t"
+   "1:"
+   : "+r" (__sp) : : "memory");
 #else
-   /*
-* CPUID is a barrier to speculative execution.
-* Prefetched instructions are automatically
-* invalidated when modified.
-*/
-   asm volatile("cpuid"
-: "=a" (tmp)
-: "0" (1)
-: "ebx", "ecx", "edx", "memory");
+   unsigned int tmp;
+
+   asm volatile (
+   "mov %%ss, %0\n\t"
+   "pushq %q0\n\t"
+   "pushq %%rsp\n\t"
+   "addq $8, (%%rsp)\n\t"
+   "pushfq\n\t"
+   "mov %%cs, %0\n\t"
+   "pushq %q0\n\t"
+   "pushq $1f\n\t"
+   "iretq\n\t"
+   "1:"
+   : "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
 #endif
 }
 
-- 
2.9.3




[Xen-devel] [PATCH v4 2/4] Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"

2016-12-09 Thread Andy Lutomirski
This reverts commit ed68d7e9b9cfb64f3045ffbcb108df03c09a0f98.

The patch wasn't quite correct -- there are non-Intel (and hence
non-486) CPUs that we support that don't have CPUID.  Since we no
longer require CPUID for sync_core(), just revert the patch.

I think the relevant CPUs are Geode and Elan, but I'm not sure.

In principle, we should try to do better at identifying CPUID-less
CPUs in early boot, but that's more complicated.

Reported-by: One Thousand Gnomes 
Cc: Matthew Whitehead 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/boot/cpu.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/boot/cpu.c b/arch/x86/boot/cpu.c
index 4224ede43b4e..26240dde081e 100644
--- a/arch/x86/boot/cpu.c
+++ b/arch/x86/boot/cpu.c
@@ -87,12 +87,6 @@ int validate_cpu(void)
return -1;
}
 
-   if (CONFIG_X86_MINIMUM_CPU_FAMILY <= 4 && !IS_ENABLED(CONFIG_M486) &&
-   !has_eflag(X86_EFLAGS_ID)) {
-   printf("This kernel requires a CPU with the CPUID instruction.  Build with CONFIG_M486=y to run on this CPU.\n");
-   return -1;
-   }
-
if (err_flags) {
puts("This kernel requires the following features "
 "not present on the CPU:\n");
-- 
2.9.3




[Xen-devel] [PATCH v4 3/4] x86/microcode/intel: Replace sync_core() with native_cpuid()

2016-12-09 Thread Andy Lutomirski
The Intel microcode driver is using sync_core() to mean "do CPUID
with EAX=1".  I want to rework sync_core(), but first the Intel
microcode driver needs to stop depending on its current behavior.

Reported-by: Henrique de Moraes Holschuh 
Acked-by: Borislav Petkov 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/cpu/microcode/intel.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index cdc0deab00c9..e0981bb2a351 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -356,6 +356,26 @@ get_matching_model_microcode(unsigned long start, void *data, size_t size,
return state;
 }
 
+static void cpuid_1(void)
+{
+   /*
+* According to the Intel SDM, Volume 3, 9.11.7:
+*
+*   CPUID returns a value in a model specific register in
+*   addition to its usual register return values. The
+*   semantics of CPUID cause it to deposit an update ID value
+*   in the 64-bit model-specific register at address 08BH
+*   (IA32_BIOS_SIGN_ID). If no update is present in the
+*   processor, the value in the MSR remains unmodified.
+*
+* Use native_cpuid -- this code runs very early and we don't
+* want to mess with paravirt.
+*/
+   unsigned int eax = 1, ebx, ecx = 0, edx;
+
+   native_cpuid(&eax, &ebx, &ecx, &edx);
+}
+
 static int collect_cpu_info_early(struct ucode_cpu_info *uci)
 {
unsigned int val[2];
@@ -385,7 +405,7 @@ static int collect_cpu_info_early(struct ucode_cpu_info *uci)
native_wrmsrl(MSR_IA32_UCODE_REV, 0);
 
/* As documented in the SDM: Do a CPUID 1 here */
-   sync_core();
+   cpuid_1();
 
/* get the current revision from MSR 0x8B */
native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
@@ -627,7 +647,7 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
native_wrmsrl(MSR_IA32_UCODE_REV, 0);
 
/* As documented in the SDM: Do a CPUID 1 here */
-   sync_core();
+   cpuid_1();
 
/* get the current revision from MSR 0x8B */
native_rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
@@ -927,7 +947,7 @@ static int apply_microcode_intel(int cpu)
wrmsrl(MSR_IA32_UCODE_REV, 0);
 
/* As documented in the SDM: Do a CPUID 1 here */
-   sync_core();
+   cpuid_1();
 
/* get the current revision from MSR 0x8B */
rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
-- 
2.9.3




[Xen-devel] [PATCH v4 1/4] x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels

2016-12-09 Thread Andy Lutomirski
We support various non-Intel CPUs that don't have the CPUID
instruction, so the M486 test was wrong.  For now, fix it with a big
hammer: handle missing CPUID on all 32-bit CPUs.

Reported-by: One Thousand Gnomes 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf17f6a..64fbc937d586 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -595,7 +595,7 @@ static inline void sync_core(void)
 {
int tmp;
 
-#ifdef CONFIG_M486
+#ifdef CONFIG_X86_32
/*
 * Do a CPUID if available, otherwise do a jump.  The jump
 * can conveniently enough be the jump around CPUID.
-- 
2.9.3




[Xen-devel] [PATCH v4 0/4] CPUID-less CPU/sync_core fixes and improvements

2016-12-09 Thread Andy Lutomirski
*** PATCHES 1 and 2 MAY BE 4.9 MATERIAL ***

Alan Cox pointed out that the 486 isn't the only supported CPU that
doesn't have CPUID.  Let's clean up the mess and make everything
faster while we're at it.

Patch 1 is intended to be an easy fix: it makes sync_core() work
without CPUID on all 32-bit kernels.  It should be quite safe.  This
will have a negligible performance cost during boot on kernels built
for newer CPUs.  With this in place, patch 2 reverts the buggy 486
check I added.

Patches 3-4 are meant to improve the situation.  Patch 3 cleans up
the Intel microcode loader and patch 4 (which depends on patch 3
to work correctly) stops using CPUID in sync_core() altogether.

Changes from v3:
 - Improve sync_core() comments.
 - Tidy up sync_core() asm.

Changes from v2:
 - Switch to IRET-to-self and get rid of all the paravirt code.
 - Further improve the sync_core() comment.

Changes from v1:
 - Fix Xen
 - Add timing info to the changelog (hint: 2x speedup)
 - Document patch 1 a bit better.

Andy Lutomirski (4):
  x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit
kernels
  Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"
  x86/microcode/intel: Replace sync_core() with native_cpuid()
  x86/asm: Rewrite sync_core() to use IRET-to-self

 arch/x86/boot/cpu.c   |  6 ---
 arch/x86/include/asm/processor.h  | 80 +--
 arch/x86/kernel/cpu/microcode/intel.c | 26 ++--
 3 files changed, 81 insertions(+), 31 deletions(-)

-- 
2.9.3




Re: [Xen-devel] [PATCH] x86/hvm: don't create a default ioreq server...

2016-12-09 Thread Andrew Cooper
On 09/12/16 17:55, Paul Durrant wrote:
> ...if the domain is not under construction.
>
> If upstream QEMU is in use then it will explicitly create an ioreq server
> rather than implicitly creating the default ioreq server, which is a
> side-effect of reading HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN,
> or HVM_PARAM_BUFIOREQ_EVTCHN (as is done by legacy QEMUs).
>
> However, if the domain is subsequently saved/migrated then those parameters
> are read and hence the default server will be unnecessarily instantiated.
>
> This patch adds an extra check of the 'creation_finished' flag when those
> HVM params are read and will only instantiate the server if the domain is
> under construction, which will always be the case when QEMU is invoked.
>
> Signed-off-by: Paul Durrant 

Reviewed-by: Andrew Cooper 

CC'ing the COLO guys.  Please can you test with this patch?

> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
>  xen/arch/x86/hvm/hvm.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index e0f936b..c531f37 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -5337,7 +5337,16 @@ static int hvmop_get_param(
>  {
>  domid_t domid;
>  
> -/* May need to create server. */
> +/*
> + * It may be necessary to create a default ioreq server here,
> + * because legacy versions of QEMU are not aware of the new API
> + * for explicit ioreq server creation. However, if the domain
> + * is not under construction then it will not be QEMU querying
> + * the parameters and thus the query should not have that side-effect.
> + */
> +if ( d->creation_finished )
> +break;
> +
>  domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
>  rc = hvm_create_ioreq_server(d, domid, 1,
>   HVM_IOREQSRV_BUFIOREQ_LEGACY, NULL);




[Xen-devel] [PATCH] x86/hvm: don't create a default ioreq server...

2016-12-09 Thread Paul Durrant
...if the domain is not under construction.

If upstream QEMU is in use then it will explicitly create an ioreq server
rather than implicitly creating the default ioreq server, which is a
side-effect of reading HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN,
or HVM_PARAM_BUFIOREQ_EVTCHN (as is done by legacy QEMUs).

However, if the domain is subsequently saved/migrated then those parameters
are read and hence the default server will be unnecessarily instantiated.

This patch adds an extra check of the 'creation_finished' flag when those
HVM params are read and will only instantiate the server if the domain is
under construction, which will always be the case when QEMU is invoked.

Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/hvm/hvm.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e0f936b..c531f37 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5337,7 +5337,16 @@ static int hvmop_get_param(
 {
 domid_t domid;
 
-/* May need to create server. */
+/*
+ * It may be necessary to create a default ioreq server here,
+ * because legacy versions of QEMU are not aware of the new API
+ * for explicit ioreq server creation. However, if the domain
+ * is not under construction then it will not be QEMU querying
+ * the parameters and thus the query should not have that side-effect.
+ */
+if ( d->creation_finished )
+break;
+
 domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
 rc = hvm_create_ioreq_server(d, domid, 1,
  HVM_IOREQSRV_BUFIOREQ_LEGACY, NULL);
-- 
2.1.4




Re: [Xen-devel] [PATCH v2] xen/arm: Add support for 16 bit VMIDs

2016-12-09 Thread Julien Grall

On 09/12/16 10:49, Bhupinder Thakur wrote:

Hi Julien,


Hi Bhupinder,


On 6 December 2016 at 21:14, Julien Grall  wrote:







--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -789,7 +789,8 @@ void __init start_xen(unsigned long
boot_phys_offset,

 gic_init();

-p2m_vmid_allocator_init();
+if ( p2m_vmid_allocator_init() != 0 )
+panic("Could not allocate VMID bitmap space");




I am not sure why we have to initialize the VMID allocator far before
setting up the stage-2 translation (see call setup_virt_paging).

Overall, VMID are part of stage-2 subsystem. So I think it would be
better to move this call in setup_virt_paging.

With that you could take advantage of the for_each_online loop in
setup_virt_paging and avoid to have go through again all CPUs.

So what I would like to see is:
   - Patch #1: move p2m_vmid_allocator_init in setup_virt_paging
   - Patch #2: Add support for 16 bit VMIDs



I believe the 2nd patch should be based on the first patch. So they
should be applied in that order only. Should I send these two patches
as a series like [patch 1/2] and [patch 2/2]?


That's right.

Regards,

--
Julien Grall



Re: [Xen-devel] [PATCH v2 1/4] x86/vpmu: Move vpmu_do_cpuid() handling into {pv, hvm}_cpuid()

2016-12-09 Thread Andrew Cooper
On 06/12/16 11:11, Andrew Cooper wrote:
> On 05/12/16 20:59, Boris Ostrovsky wrote:
>> On 12/05/2016 01:24 PM, Andrew Cooper wrote:
>>> @@ -3516,6 +3516,17 @@ void hvm_cpuid(unsigned int input, unsigned int 
>>> *eax, unsigned int *ebx,
>>>  if ( !(hvm_pae_enabled(v) || hvm_long_mode_enabled(v)) )
>>>  *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
>>>  }
>>> +
>>> +if ( vpmu_enabled(v) &&
>>> + vpmu_is_set(vcpu_vpmu(v), VPMU_CPU_HAS_DS) )
>>> +{
>>> +*edx |= cpufeat_mask(X86_FEATURE_DS);
>>> +if ( cpu_has(&current_cpu_data, X86_FEATURE_DTES64) )
>>> +*ecx |= cpufeat_mask(X86_FEATURE_DTES64);
>>> +if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
>>> +*ecx |= cpufeat_mask(X86_FEATURE_DSCPL);
>>> +}
>>> +
>>>  break;
>>>  
>>>  case 0x7:
>>> @@ -3646,6 +3657,18 @@ void hvm_cpuid(unsigned int input, unsigned int 
>>> *eax, unsigned int *ebx,
>>>  }
>>>  break;
>>>  
>>> +case 0x000a: /* Architectural Performance Monitor Features (Intel) 
>>> */
>>> +if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL || 
>>> !vpmu_enabled(v) )
>>> +{
>>> +*eax = *ebx = *ecx = *edx = 0;
>>> +break;
>>> +}
>>> +
>>> +/* Report at most version 3 since that's all we currently emulate 
>>> */
>>> +if ( (*eax & 0xff) > 3 )
>>> +*eax = (*eax & ~0xff) | 3;
>>> +break;
>> Both this and Debug Store checks are the same for both HVM and PV. Can
>> they be factored out?
> The purpose of this patch series is to untangle the current call tree to
> make it easier to finally merge the PV and HVM paths into a single
> guest_cpuid().
>
> Yes, this does add a bit of duplication in the short term, but allows
> for easier movement to the longterm goal.
>
>> (and then perhaps version update can gain back PMU_VERSION_MASK macro)
> That involves moving a whole load of Intel internals with generic names
> into vpmu.h, which is why I chose not to do it.  The end result in
> guest_cpuid() won't use it.
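
The version clamp in the quoted hunk can be read in isolation as the sketch below. This is only an illustration: the helper name clamp_pmu_version() and the PMU_MAX_VERSION constant are made up here, and PMU_VERSION_MASK is defined locally rather than taken from any Xen header. The patch itself open-codes the same expression in hvm_cpuid().

```c
#include <stdint.h>

/* Local, illustrative definitions; not taken from Xen's headers. */
#define PMU_VERSION_MASK 0xffu  /* CPUID leaf 0xa, EAX[7:0]: version ID */
#define PMU_MAX_VERSION  3u     /* highest version the patch reports */

/* Hypothetical helper: cap the version field, keep all other EAX bits. */
static uint32_t clamp_pmu_version(uint32_t eax)
{
    if ( (eax & PMU_VERSION_MASK) > PMU_MAX_VERSION )
        eax = (eax & ~PMU_VERSION_MASK) | PMU_MAX_VERSION;

    return eax;
}
```

Only the low byte is rewritten, so the remaining fields of EAX are passed through untouched.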

Boris: Any further comment, or is my explanation ok?  Strictly speaking
the patch isn't blocked on your review, but I'd prefer not to use that
technicality if you are unhappy with it.

~Andrew



Re: [Xen-devel] [PATCH] xen/balloon: Only mark a page as managed when it is released

2016-12-09 Thread Boris Ostrovsky
On 12/09/2016 12:10 PM, Ross Lagerwall wrote:
> Only mark a page as managed when it is released back to the allocator.
> This ensures that the managed page count does not get falsely increased
> when a VM is running. Correspondingly change it so that pages are
> marked as unmanaged after getting them from the allocator.
>
> Signed-off-by: Ross Lagerwall 
>


Reviewed-by: Boris Ostrovsky 



Re: [Xen-devel] [RFC PATCH 21/24] ARM: vITS: handle INVALL command

2016-12-09 Thread Andre Przywara
Hi,

On 07/12/16 20:20, Stefano Stabellini wrote:
> On Tue, 6 Dec 2016, Julien Grall wrote:
>> On 06/12/2016 22:01, Stefano Stabellini wrote:
>>> On Tue, 6 Dec 2016, Stefano Stabellini wrote:
>>>> moving a vCPU with interrupts assigned to it is slower than moving a
>>>> vCPU without interrupts assigned to it. You could say that the
>>>> slowness is directly proportional to the number of interrupts assigned
>>>> to the vCPU.
>>>
>>> To be pedantic, by "assigned" I mean that a physical interrupt is routed
>>> to a given pCPU and is set to be forwarded to a guest vCPU running on it
>>> by the _IRQ_GUEST flag. The guest could be dom0. Upon receiving one of
>>> these physical interrupts, a corresponding virtual interrupt (could be a
>>> different irq) will be injected into the guest vCPU.
>>>
>>> When the vCPU is migrated to a new pCPU, the physical interrupts that
>>> are configured to be injected as virtual interrupts into the vCPU, are
>>> migrated with it. The physical interrupt migration has a cost. However,
>>> receiving physical interrupts on the wrong pCPU has an higher cost.
>>
>> I don't understand why it is a problem for you to receive the first interrupt
>> to the wrong pCPU and moving it if necessary.
>>
>> While this may have an higher cost (I don't believe so) on the first received
>> interrupt, migrating thousands of interrupts at the same time is very
>> expensive and will likely get Xen stuck for a while (think about ITS with a
>> single command queue).
>>
>> Furthermore, the current approach will move every single interrupt routed a
>> the vCPU, even those disabled. That's pointless and a waste of resource. You
>> may argue that we can skip the ones disabled, but in that case what would be
>> the benefits to migrate the IRQs while migrate the vCPUs?
>>
>> So I would suggest to spread it over the time. This also means less headache
>> for the scheduler developers.
> 
> The most important aspect of interrupts handling in Xen is latency,
> measured as the time between Xen receiving a physical interrupt and the
> guest receiving it. This latency should be both small and deterministic.
> 
> We all agree so far, right?
> 
> 
> The issue with spreading interrupts migrations over time is that it makes
> interrupt latency less deterministic. It is OK, in the uncommon case of
> vCPU migration with interrupts, to take a hit for a short time. This
> "hit" can be measured. It can be known. If your workload cannot tolerate
> it, vCPUs can be pinned. It should be a rare event anyway. On the other
> hand, by spreading interrupts migrations, we make it harder to predict
> latency. Aside from determinism, another problem with this approach is
> that it ensures that every interrupt assigned to a vCPU will first hit
> the wrong pCPU, then it will be moved. It guarantees the worst-case
> scenario for interrupt latency for the vCPU that has been moved. If we
> migrated all interrupts as soon as possible, we would minimize the
> amount of interrupts delivered to the wrong pCPU. Most interrupts would
> be delivered to the new pCPU right away, reducing interrupt latency.

So if this is such a crucial issue, why don't we use the ITS for good
this time? The ITS hardware probably supports 16 bits worth of
collection IDs, so what about we assign each VCPU (in every guest) a
unique collection ID on the host and do a MAPC & MOVALL on a VCPU
migration to let it point to the right physical redistributor.
I see that this does not cover all use cases (> 65536 VCPUs, for
instance), and it also depends heavily on many implementation details:
- How costly is a MOVALL? It needs to scan the pending table and
transfer set bits to the other redistributor, which may take a while.
- Is there an impact if we exceed the number of hardware backed
collections (GITS_TYPER[HCC])? If the ITS is forced to access system
memory for every table lookup, this may slow down everyday operations.
- How likely are those misdirected interrupts in the first place? How
often do we migrate VCPUs compared to the interrupt frequency?

There are more, subtle parameters to consider, so I guess we just need
to try and measure.

> Regardless of how we implement interrupts migrations on ARM, I think it
> still makes sense for the scheduler to know about it. I realize that
> this is a separate point. Even if we spread interrupts migrations over
> time, it still has a cost, in terms of latency as I wrote above, but also
> in terms of interactions with interrupt controllers and ITSes. A vCPU
> with no interrupts assigned to it poses no such problems. The scheduler
> should be aware of the difference. If the scheduler knew, I bet that
> vCPU migration would be a rare event for vCPUs that have many interrupts
> assigned to them. For example, Dom0 vCPU0 would never be moved, and
> dom0_pin_vcpus would be superfluous.

That's a good point, so indeed the "interrupt load" should be a
scheduler parameter. But as you said: that's a different story.

Cheers,
Andre.


Re: [Xen-devel] [PATCH 1/3] make tlbflush_filter()'s first parameter a pointer

2016-12-09 Thread Julien Grall

Hi Jan,

On 08/12/16 16:00, Jan Beulich wrote:

This brings it in line with most other functions dealing with CPU
masks. Convert both implementations to inline functions at once.

Signed-off-by: Jan Beulich 


Acked-by: Julien Grall 

Regards,

--
Julien Grall



Re: [Xen-devel] [PATCH v11 06/13] efi: create new early memory allocator

2016-12-09 Thread Julien Grall

Hi Daniel,

On 05/12/16 22:25, Daniel Kiper wrote:

There is a problem with place_string() which is used as early memory
allocator. It gets memory chunks starting from start symbol and goes
down. Sadly this does not work when Xen is loaded using multiboot2
protocol because then the start lives on 1 MiB address and we should
not allocate a memory from below of it. So, I tried to use mem_lower
address calculated by GRUB2. However, this solution works only on some
machines. There are machines in the wild (e.g. Dell PowerEdge R820)
which uses first ~640 KiB for boot services code or data... :-(((
Hence, we need new memory allocator for Xen EFI boot code which is
quite simple and generic and could be used by place_string() and
efi_arch_allocate_mmap_buffer(). I think about following solutions:

1) We could use native EFI allocation functions (e.g. AllocatePool()
   or AllocatePages()) to get memory chunk. However, later (somewhere
   in __start_xen()) we must copy its contents to safe place or reserve
   it in e820 memory map and map it in Xen virtual address space. This
   means that the code referring to Xen command line, loaded modules and
   EFI memory map, mostly in __start_xen(), will be further complicated
   and diverge from legacy BIOS cases. Additionally, both former things
   have to be placed below 4 GiB because their addresses are stored in
   multiboot_info_t structure which has 32-bit relevant members.

2) We may allocate memory area statically somewhere in Xen code which
   could be used as memory pool for early dynamic allocations. Looks
   quite simple. Additionally, it would not depend on EFI at all and
   could be used on legacy BIOS platforms if we need it. However, we
   must carefully choose size of this pool. We do not want increase Xen
   binary size too much and waste too much memory but also we must fit
   at least memory map on x86 EFI platforms. As I saw on small machine,
   e.g. IBM System x3550 M2 with 8 GiB RAM, memory map may contain more
   than 200 entries. Every entry on x86-64 platform is 40 bytes in size.
   So, it means that we need more than 8 KiB for EFI memory map only.
   Additionally, if we use this memory pool for Xen and modules command
   line storage (it would be used when xen.efi is executed as EFI application)
   then we should add, I think, about 1 KiB. In this case, to be on safe
   side, we should assume at least 64 KiB pool for early memory allocations.
   Which is about 4 times of our earlier calculations. However, during
   discussion on Xen-devel Jan Beulich suggested that just in case we should
   use 1 MiB memory pool like it is in original place_string() implementation.
   So, let's use 1 MiB as it was proposed. If we think that we should not
   waste unallocated memory in the pool on running system then we can mark
   this region as __initdata and move all required data to dynamically
   allocated places somewhere in __start_xen().

2a) We could put memory pool into .bss.page_aligned section. Then allocate
    memory chunks starting from the lowest address. After init phase we can
    free unused portion of the memory pool as in case of .init.text or
    .init.data sections. This way we do not need to allocate any space in
    image file and freeing of unused area in the memory pool is very simple.

Now #2a solution is implemented because it is quite simple and requires
limited number of changes, especially in __start_xen().
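
As a rough illustration of option #2a, a bump allocator over a static, zero-initialised pool might look like the sketch below. The names (pool, early_pool_alloc(), early_pool_unused()) and the 8-byte alignment are assumptions made for this example only; they are not the identifiers or parameters used by the actual ebmalloc code in the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE  (1024u * 1024u)  /* 1 MiB, the size settled on above */
#define POOL_ALIGN 8u               /* assumed allocation granularity */

static uint8_t pool[POOL_SIZE];     /* zero-initialised, so it lives in .bss */
static size_t pool_used;            /* bytes handed out so far */

/* Hand out chunks from the lowest address upwards; NULL when exhausted. */
static void *early_pool_alloc(size_t size)
{
    size_t aligned = (size + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);
    void *p;

    /* Reject rounding overflow and requests that do not fit any more. */
    if ( aligned < size || aligned > POOL_SIZE - pool_used )
        return NULL;

    p = &pool[pool_used];
    pool_used += aligned;

    return p;
}

/* Size of the untouched tail that could be released after init. */
static size_t early_pool_unused(void)
{
    return POOL_SIZE - pool_used;
}
```

With the figures from the description, a memory map of 200 entries at 40 bytes each is a single 8000-byte request, which leaves nearly the whole pool as a tail that can be freed once the init phase is over.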

New allocator is quite generic and can be used on ARM platforms too.
Though it is not enabled on ARM yet due to lack of some prereq.
List of them is placed before ebmalloc code.

Signed-off-by: Daniel Kiper 


FWIW,

Acked-by: Julien Grall 

--
Julien Grall



Re: [Xen-devel] [RFC PATCH 21/24] ARM: vITS: handle INVALL command

2016-12-09 Thread Julien Grall

Hi Stefano,

On 07/12/16 20:20, Stefano Stabellini wrote:

> On Tue, 6 Dec 2016, Julien Grall wrote:
>> On 06/12/2016 22:01, Stefano Stabellini wrote:
>>> On Tue, 6 Dec 2016, Stefano Stabellini wrote:
>>>> moving a vCPU with interrupts assigned to it is slower than moving a
>>>> vCPU without interrupts assigned to it. You could say that the
>>>> slowness is directly proportional to the number of interrupts assigned
>>>> to the vCPU.
>>>
>>> To be pedantic, by "assigned" I mean that a physical interrupt is routed
>>> to a given pCPU and is set to be forwarded to a guest vCPU running on it
>>> by the _IRQ_GUEST flag. The guest could be dom0. Upon receiving one of
>>> these physical interrupts, a corresponding virtual interrupt (could be a
>>> different irq) will be injected into the guest vCPU.
>>>
>>> When the vCPU is migrated to a new pCPU, the physical interrupts that
>>> are configured to be injected as virtual interrupts into the vCPU, are
>>> migrated with it. The physical interrupt migration has a cost. However,
>>> receiving physical interrupts on the wrong pCPU has an higher cost.
>>
>> I don't understand why it is a problem for you to receive the first interrupt
>> to the wrong pCPU and moving it if necessary.
>>
>> While this may have an higher cost (I don't believe so) on the first received
>> interrupt, migrating thousands of interrupts at the same time is very
>> expensive and will likely get Xen stuck for a while (think about ITS with a
>> single command queue).
>>
>> Furthermore, the current approach will move every single interrupt routed a
>> the vCPU, even those disabled. That's pointless and a waste of resource. You
>> may argue that we can skip the ones disabled, but in that case what would be
>> the benefits to migrate the IRQs while migrate the vCPUs?
>>
>> So I would suggest to spread it over the time. This also means less headache
>> for the scheduler developers.
>
> The most important aspect of interrupts handling in Xen is latency,
> measured as the time between Xen receiving a physical interrupt and the
> guest receiving it. This latency should be both small and deterministic.
>
> We all agree so far, right?
>
>
> The issue with spreading interrupts migrations over time is that it makes
> interrupt latency less deterministic. It is OK, in the uncommon case of
> vCPU migration with interrupts, to take a hit for a short time.  This
> "hit" can be measured. It can be known. If your workload cannot tolerate
> it, vCPUs can be pinned. It should be a rare event anyway.  On the other
> hand, by spreading interrupts migrations, we make it harder to predict
> latency. Aside from determinism, another problem with this approach is
> that it ensures that every interrupt assigned to a vCPU will first hit
> the wrong pCPU, then it will be moved.  It guarantees the worst-case
> scenario for interrupt latency for the vCPU that has been moved. If we
> migrated all interrupts as soon as possible, we would minimize the
> amount of interrupts delivered to the wrong pCPU. Most interrupts would
> be delivered to the new pCPU right away, reducing interrupt latency.


Migrating all the interrupts can be really expensive because in the 
current state we have to go through every single interrupt and check 
whether the interrupt has been routed to this vCPU. We will also route 
disabled interrupt. And this seems really pointless. This may need some 
optimization here.


With ITS, we may have thousands of interrupts routed to a vCPU. This 
means that for every interrupt we have to issue a command in the host 
ITS queue. You will likely fill up the command queue and add much more 
latency.


Even if you consider the vCPU migration to be a rare case, you could 
still get the pCPU stuck for tens of milliseconds, the time to migrate 
everything. And I don't think this is acceptable.


Anyway, I would like to see measurements in both situations before 
deciding when LPIs will be migrated.



> Regardless of how we implement interrupts migrations on ARM, I think it
> still makes sense for the scheduler to know about it. I realize that
> this is a separate point. Even if we spread interrupts migrations over
> time, it still has a cost, in terms of latency as I wrote above, but also
> in terms of interactions with interrupt controllers and ITSes. A vCPU
> with no interrupts assigned to it poses no such problems. The scheduler
> should be aware of the difference. If the scheduler knew, I bet that
> vCPU migration would be a rare event for vCPUs that have many interrupts
> assigned to them. For example, Dom0 vCPU0 would never be moved, and
> dom0_pin_vcpus would be superfluous.


The number of interrupts routed to a vCPU will vary over time; this 
will depend on what the guest decides to do, so you need the scheduler to 
adapt. And in the end, you give the guest a chance to "control" the 
scheduler depending on how the interrupts are spread between vCPUs.


If the number increases, you may end up to have the scheduler to decide 
to not migrate the vCPU because it will be too expensive. But you may 
have a situation where migrating a vCPU with many interrupts is the only 
possible choice an

Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server

2016-12-09 Thread Paul Durrant
> -Original Message-
> From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
[snip]
> > >
> > >   This bug is caused by the read side effects of
> > > HVM_PARAM_IOREQ_PFN. The migration code needs a way of being
> able to
> > > query whether a default ioreq server exists, without creating one.
> > >
> > >   Can you remember what the justification for the read side effects
> > > were? ISTR that it was only for qemu compatibility until the ioreq server
> work
> > > got in upstream. If that was the case, can we drop the read side effects
> now
> > > and mandate that all qemus explicitly create their ioreq servers (even if
> this
> > > involves creating a default ioreq server for qemu-trad)?
> > >
> >
> > The read side effects are indeed because of the need to support the old
> qemu interface. If trad were patched then we could at least deprecate the
> default ioreq server but I'm not sure how long we'd need to leave it in place
> after that before it was removed. Perhaps it ought to be under a KCONFIG
> option, since it's also a bit of a security hole.
> >
> 
> So.. what can be done about to make COLO work?
> 

Andrew tells me there is a new boolean in Xen which can be used to determine 
whether the domain is under construction or not. QEMU should always be kicked 
off when the domain is under construction so we can limit the read side effect 
using that Boolean. Thus, when domain-comes along later and needs to query the 
magic page pfns, we don't magically get a default ioreq server created when we 
didn't want one. I'll send a patch... should be a one-liner.
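
A toy model of the idea described above, purely for illustration: struct toy_domain, its has_default_ioreq_server and ioreq_pfn fields, and toy_get_ioreq_pfn() are invented for this sketch and are not Xen structures or functions. Only the creation_finished flag name is taken from the discussion.

```c
#include <stdbool.h>

/* Toy stand-in for a domain; not Xen's struct domain. */
struct toy_domain {
    bool creation_finished;          /* set once construction is complete */
    bool has_default_ioreq_server;   /* hypothetical bookkeeping field */
    unsigned long ioreq_pfn;         /* hypothetical parameter value */
};

/*
 * Reading the parameter creates the default server as a side effect only
 * while the domain is still under construction (when a legacy QEMU would
 * be the caller). Later readers, such as save/migrate, just get the value.
 */
static unsigned long toy_get_ioreq_pfn(struct toy_domain *d)
{
    if ( !d->creation_finished )
        d->has_default_ioreq_server = true;

    return d->ioreq_pfn;
}
```

The point of the gate is that the read still returns the same value in both cases; only the implicit creation is suppressed once construction has finished.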

  Paul



[Xen-devel] [xen-unstable-smoke test] 103144: tolerable all pass - PUSHED

2016-12-09 Thread osstest service owner
flight 103144 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/103144/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl      12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl      13 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  79e996a89f695db0d8af745a04121b778100be99
baseline version:
 xen  739d05ebb9ccadd4b6f93f8bf288d8c8c6b04c02

Last test of basis   103137  2016-12-09 12:01:46 Z    0 days
Testing same since   103144  2016-12-09 15:25:14 Z    0 days    1 attempts


People who touched revisions under test:
  Jan Beulich 

jobs:
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=79e996a89f695db0d8af745a04121b778100be99
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 
79e996a89f695db0d8af745a04121b778100be99
+ branch=xen-unstable-smoke
+ revision=79e996a89f695db0d8af745a04121b778100be99
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.8-testing
+ '[' x79e996a89f695db0d8af745a04121b778100be99 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : https://github.com/tianocore/edk2.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/linux-firmware.git
++ : osst...@xenbits.xen.org:/home/osstest/ext/linux-firmware.git
++ : git://git.kernel.org/pub/scm/linux/kernel/git/firmware/

Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server

2016-12-09 Thread Konrad Rzeszutek Wilk
On Fri, Dec 09, 2016 at 04:43:58PM +, Paul Durrant wrote:
> > -Original Message-
> > From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> > Konrad Rzeszutek Wilk
> > Sent: 09 December 2016 16:14
> > To: Zhang Chen ; Paul Durrant
> > 
> > Cc: Changlong Xie ; Wei Liu
> > ; Eddie Dong ; Andrew
> > Cooper ; Ian Jackson
> > ; Wen Congyang ; Paul
> > Durrant ; Yang Hongyang
> > ; Xen devel 
> > Subject: Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server
> > 
> > .snip..
> > > > If you can be more specific about what is broken in COLO we might be
> > > > able to devise a fix for you.
> > >
> > > My workmate have reported this BUG last year:
> > > https://lists.xenproject.org/archives/html/xen-devel/2015-
> > 12/msg02850.html
> > 
> > Paul, Andrew was asking about:
> > 
> > This bug is caused by the read side effects of
> > HVM_PARAM_IOREQ_PFN. The migration code needs a way of being able to
> > query whether a default ioreq server exists, without creating one.
> > 
> > Can you remember what the justification for the read side effects
> > were? ISTR that it was only for qemu compatibility until the ioreq server 
> > work
> > got in upstream. If that was the case, can we drop the read side effects now
> > and mandate that all qemus explicitly create their ioreq servers (even if 
> > this
> > involves creating a default ioreq server for qemu-trad)?
> > 
> 
> The read side effects are indeed because of the need to support the old qemu 
> interface. If trad were patched then we could at least deprecate the default 
> ioreq server but I'm not sure how long we'd need to leave it in place after 
> that before it was removed. Perhaps it ought to be under a KCONFIG option, 
> since it's also a bit of a security hole.
> 

So.. what can be done about to make COLO work?

>   Paul
> 
> > 
> > ?
> > 
> > Full thread below:
> > 
> > [Top] [All Lists]
> > [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread
> > Index]
> > Re: [Xen-devel] question about migration
> > 
> > To: Wen Congyang 
> > From: Andrew Cooper 
> > Date: Tue, 29 Dec 2015 11:24:14 +
> > Cc: Paul Durrant , xen devel  > devel@x>
> > Delivery-date: Tue, 29 Dec 2015 11:24:33 +
> > List-id: Xen developer discussion 
> > On 25/12/2015 01:45, Wen Congyang wrote:
> > On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> > On 24/12/15 02:29, Wen Congyang wrote:
> > Hi Andrew Cooper:
> > 
> > I rebase the COLO codes to the newest upstream xen, and test it. I found
> > a problem in the test, and I can reproduce this problem via the migration.
> > 
> > How to reproduce:
> > 1. xl cr -p hvm_nopv
> > 2. xl migrate hvm_nopv 192.168.3.1
> > You are the very first person to try a usecase like this.
> > 
> > It works as much as it does because of your changes to the uncooperative
> > HVM
> > domain logic.  I have said repeatedly during review, this is not 
> > necessarily a
> > safe change to make without an in-depth analysis of the knock-on effects; it
> > looks as if you have found the first knock-on effect.
> > 
> > The migration successes, but the vm doesn't run in the target machine.
> > You can get the reason from 'xl dmesg':
> > (XEN) HVM2 restore: VMCE_VCPU 1
> > (XEN) HVM2 restore: TSC_ADJUST 0
> > (XEN) HVM2 restore: TSC_ADJUST 1
> > (d2) HVM Loader
> > (d2) Detected Xen v4.7-unstable
> > (d2) Get guest memory maps[128] failed. (-38)
> > (d2) *** HVMLoader bug at e820.c:39
> > (d2) *** HVMLoader crashed.
> > 
> > The reason is that:
> > We don't call xc_domain_set_memory_map() in the target machine.
> > When we create a hvm domain:
> > libxl__domain_build()
> >   libxl__build_hvm()
> >   libxl__arch_domain_construct_memmap()
> >   xc_domain_set_memory_map()
> > 
> > Should we migrate the guest memory from source machine to target
> > machine?
> > This bug specifically is because HVMLoader is expected to have run and
> > turned
> > the hypercall information in an E820 table in the guest before a migration
> > occurs.
> > 
> > Unfortunately, the current codebase is riddled with such assumption and
> > expectations (e.g. the HVM save code assumed that FPU context is valid
> > when it
> > is saving register state) which is a direct side effect of how it was 
> > developed.
> > 
> > 
> > Having said all of the above, I agree that your example is a usecase which
> > should work.  It is the ultimate test of whether the migration stream 
> > contains
> > enough information to faithfully reproduce the domain on the far side.
> > Clearly
> > at the moment, this is not the case.
> > 
> > I have an upcoming project to work on the domain memory layout logic,
> > because
> > it is unsuitable for a number of XenServer usecases. Part of that will 
> > require
> > moving it in the migration stream.
> > I found another migration problem in the test:
> > If the migration fails, we will resume it in the source side.
> > But the hvm guest doesn't response any more.
> > 
> > In my test envir

Re: [Xen-devel] [PATCH 2/2] x86emul: consolidate string insn address increments

2016-12-09 Thread Andrew Cooper
On 09/12/16 15:22, Jan Beulich wrote:
> Move the looking at EFLAGS.DF into the macro, rendering all call sites
> more readable.
>
> Signed-off-by: Jan Beulich 

The net change is ok; it is certainly cleaner to read in the body of
x86_emulate().

However, the naming of register_address_increment() was previously ok,
as it was obvious at the calling point that a negative increment was
possible.  This subtlety is now hidden.

How about reg_addr_adjust() or reg_addr_adjust_dir() as an alternative
name?  This retains the property that it is obvious that the direction
flag is followed in the calculation.
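
To illustrate the naming point, here is a minimal standalone sketch of such
a helper.  The function name, the EFLAGS_DF constant and the flat 64-bit
register are assumptions made for illustration only; the real macro in
x86_emulate() also has to handle address-size truncation, which is left out
here.

```c
#include <stdint.h>

#define EFLAGS_DF (1u << 10)  /* direction flag, bit 10 of EFLAGS */

/*
 * Hypothetical helper: move a string-instruction index register by
 * 'bytes' in the direction selected by EFLAGS.DF (DF set => decrement).
 * Address-size truncation is deliberately omitted.
 */
static inline uint64_t reg_addr_adjust_dir(uint64_t reg, uint32_t eflags,
                                           unsigned int bytes)
{
    return (eflags & EFLAGS_DF) ? reg - bytes : reg + bytes;
}
```

With a name like this, a call site reads as "adjust by 4 in the current
direction" rather than "increment by 4".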

~Andrew



Re: [Xen-devel] [PATCH 1/2] x86emul: use SrcEax/DstEax where suitable for string insns

2016-12-09 Thread Jan Beulich
>>> On 09.12.16 at 18:06,  wrote:
> On 09/12/16 15:21, Jan Beulich wrote:
>> LODS, SCAS, and STOS all use the accumulator as one of their operands.
>> This avoids come open coding of things, but requires switching around
> 
> Do you mean "some open coding" ?

Indeed I do - not sure what my fingers did...

>> operands of SCAS.
>>
>> Signed-off-by: Jan Beulich 
> 
> Reviewed-by: Andrew Cooper 

Thanks, Jan




[Xen-devel] [PATCH] xen/balloon: Only mark a page as managed when it is released

2016-12-09 Thread Ross Lagerwall
Only mark a page as managed when it is released back to the allocator.
This ensures that the managed page count does not get falsely increased
when a VM is running. Correspondingly change it so that pages are
marked as unmanaged after getting them from the allocator.

Signed-off-by: Ross Lagerwall 
---
 drivers/xen/balloon.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index e4db19e..db107fa 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -180,7 +180,6 @@ static void __balloon_append(struct page *page)
 static void balloon_append(struct page *page)
 {
__balloon_append(page);
-   adjust_managed_page_count(page, -1);
 }
 
 /* balloon_retrieve: rescue a page from the balloon, if it is not empty. */
@@ -201,8 +200,6 @@ static struct page *balloon_retrieve(bool require_lowmem)
else
balloon_stats.balloon_low--;
 
-   adjust_managed_page_count(page, 1);
-
return page;
 }
 
@@ -478,7 +475,7 @@ static enum bp_state increase_reservation(unsigned long 
nr_pages)
 #endif
 
/* Relinquish the page back to the allocator. */
-   __free_reserved_page(page);
+   free_reserved_page(page);
}
 
balloon_stats.current_pages += rc;
@@ -509,6 +506,7 @@ static enum bp_state decrease_reservation(unsigned long 
nr_pages, gfp_t gfp)
state = BP_EAGAIN;
break;
}
+   adjust_managed_page_count(page, -1);
scrub_page(page);
list_add(&page->lru, &pages);
}
-- 
2.7.4




Re: [Xen-devel] [PATCH v4 12/14] x86/PVHv2: fix dom0_max_vcpus so it's capped to 128 for PVHv2 Dom0

2016-12-09 Thread Jan Beulich
>>> On 30.11.16 at 17:49,  wrote:
> @@ -176,6 +177,8 @@ unsigned int __init dom0_max_vcpus(void)
>  max_vcpus = opt_dom0_max_vcpus_max;
>  if ( max_vcpus > MAX_VIRT_CPUS )
>  max_vcpus = MAX_VIRT_CPUS;
> +if ( dom0_hvm )
> +max_vcpus = min_t(typeof(max_vcpus), max_vcpus, HVM_MAX_VCPUS);

I don't see the need for min_t() here - just follow the code right
before your addition:

if ( dom0_hvm && max_vcpus > HVM_MAX_VCPUS )
max_vcpus = HVM_MAX_VCPUS;

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -193,7 +193,7 @@ static void __init parse_acpi_param(char *s)
>   *  - hvm   Create a PVHv2 Dom0.
>   *  - shadowUse shadow paging for Dom0.
>   */
> -static bool __initdata dom0_hvm;
> +bool __initdata dom0_hvm;
>  static void __init parse_dom0_param(char *s)
>  {
>  char *ss;
> --- a/xen/include/asm-x86/setup.h
> +++ b/xen/include/asm-x86/setup.h
> @@ -63,4 +63,6 @@ extern bool opt_dom0_shadow;
>  #define opt_dom0_shadow 0
>  #endif
>  
> +extern bool dom0_hvm;

One more argument to move the command line option parsing to
domain_build.c.

Jan




Re: [Xen-devel] [PATCH 1/2] x86emul: use SrcEax/DstEax where suitable for string insns

2016-12-09 Thread Andrew Cooper
On 09/12/16 15:21, Jan Beulich wrote:
> LODS, SCAS, and STOS all use the accumulator as one of their operands.
> This avoids come open coding of things, but requires switching around

Do you mean "some open coding" ?

> operands of SCAS.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 



Re: [Xen-devel] [PATCH v4 11/14] xen/x86: parse Dom0 kernel for PVHv2

2016-12-09 Thread Jan Beulich
>>> On 30.11.16 at 17:49,  wrote:
> @@ -1930,12 +1931,148 @@ static int __init hvm_setup_p2m(struct domain *d)
>  #undef MB1_PAGES
>  }
>  
> +static int __init hvm_copy_to_phys(struct domain *d, paddr_t paddr, void 
> *buf,
> +   int size)

I guess you made size plain int because hvm_copy_to_guest_phys()
has it that way, but please let's not spread such bogus things - sizes
can't possibly be negative.

> +{
> +struct vcpu *saved_current;
> +int rc;
> +
> +saved_current = current;
> +set_current(d->vcpu[0]);
> +rc = hvm_copy_to_guest_phys(paddr, buf, size);
> +set_current(saved_current);

I continue to be uncertain about the behavior of this if something
inside hvm_copy_to_guest_phys() goes wrong: Did you either
statically analyze the code or try out in practice whether
playing with current makes understanding the crash output any
harder?

While there's going to be some work involved with it, I do think
that the use here might be a reason for the whole hvm_copy()
machinery to gain a struct vcpu* parameter.

> +static int __init hvm_load_kernel(struct domain *d, const module_t *image,
> +  unsigned long image_headroom,
> +  module_t *initrd, char *image_base,
> +  char *cmdline, paddr_t *entry,
> +  paddr_t *start_info_addr)
> +{
> +char *image_start = image_base + image_headroom;
> +unsigned long image_len = image->mod_end;
> +struct elf_binary elf;
> +struct elf_dom_parms parms;
> +paddr_t last_addr;
> +struct hvm_start_info start_info;
> +struct hvm_modlist_entry mod;
> +struct vcpu *saved_current, *v = d->vcpu[0];
> +int rc;
> +
> +if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
> +{
> +printk("Error trying to detect bz compressed kernel\n");
> +return rc;
> +}
> +
> +if ( (rc = elf_init(&elf, image_start, image_len)) != 0 )
> +{
> +printk("Unable to init ELF\n");
> +return rc;
> +}
> +#ifdef VERBOSE
> +elf_set_verbose(&elf);
> +#endif
> +elf_parse_binary(&elf);
> +if ( (rc = elf_xen_parse(&elf, &parms)) != 0 )
> +{
> +printk("Unable to parse kernel for ELFNOTES\n");
> +return rc;
> +}
> +
> +if ( parms.phys_entry == UNSET_ADDR32 ) {
> +printk("Unable to find XEN_ELFNOTE_PHYS32_ENTRY address\n");
> +return -EINVAL;
> +}
> +
> +printk("OS: %s version: %s loader: %s bitness: %s\n", parms.guest_os,
> +   parms.guest_ver, parms.loader,
> +   elf_64bit(&elf) ? "64-bit" : "32-bit");
> +
> +/* Copy the OS image and free temporary buffer. */
> +elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
> +elf.dest_size = parms.virt_kend - parms.virt_kstart;
> +
> +saved_current = current;
> +set_current(v);
> +rc = elf_load_binary(&elf);
> +set_current(saved_current);

Same reservations as above.

> +if ( rc < 0 )
> +{
> +printk("Failed to load kernel: %d\n", rc);
> +printk("Xen dom0 kernel broken ELF: %s\n", elf_check_broken(&elf));
> +return rc;
> +}
> +
> +last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
> +
> +if ( initrd != NULL )
> +{
> +rc = hvm_copy_to_phys(d, last_addr, mfn_to_virt(initrd->mod_start),
> +  initrd->mod_end);
> +if ( rc )
> +{
> +printk("Unable to copy initrd to guest\n");
> +return rc;
> +}
> +
> +mod.paddr = last_addr;
> +mod.size = initrd->mod_end;
> +last_addr += ROUNDUP(initrd->mod_end, PAGE_SIZE);
> +}

mod is left uninitialized in the else case afaict - I don't think all
compilers we support (plus Coverity) can spot the common
dependency on initrd != NULL.
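
One way to address this, sketched standalone below, is to zero-initialise
the entry up front so the "no initrd" path never leaves indeterminate
fields behind.  struct modlist_entry and describe_initrd() are hypothetical
stand-ins for struct hvm_modlist_entry and the relevant part of
hvm_load_kernel(); only the initialisation pattern is the point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct hvm_modlist_entry. */
struct modlist_entry {
    uint64_t paddr;
    uint64_t size;
};

/*
 * Describe an optional initrd.  Because 'mod' starts out zeroed, the
 * caller gets well-defined contents whether or not an initrd exists.
 */
static struct modlist_entry describe_initrd(const void *initrd,
                                            uint64_t load_addr,
                                            uint64_t len)
{
    struct modlist_entry mod = { 0 };

    if ( initrd != NULL )
    {
        mod.paddr = load_addr;
        mod.size = len;
    }

    return mod;
}
```

This also keeps the later use of mod independent of the compiler proving
that both accesses are guarded by the same initrd != NULL condition.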

Jan




Re: [Xen-devel] [PATCH] docs: turn links to docs/* into absolute path

2016-12-09 Thread Wei Liu
On Fri, Dec 09, 2016 at 01:02:10PM +0100, Cedric Bosdonnat wrote:
> On Fri, 2016-12-09 at 11:25 +, Ian Jackson wrote:
> > Hi, Cedric et al.  I like the idea of tidying this up.  Thanks for the
> > patch, which (with some small changes) will be a good idea.
> > 
> > Cedric Bosdonnat writes ("Re: [Xen-devel] [PATCH] docs: turn links to 
> > docs/* into absolute path"):
> > > On Thu, 2016-12-08 at 17:59 +, Andrew Cooper wrote:
> > > > However, this change will cause
> > > > http://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html to point at a
> > > > local file rather than something which is reasonably accessable from the
> > > > webroot.
> > > 
> > > Oh, I didn't think about this one. You're right!
> > 
> > This would be solved by making the path configurable at build time.
> > We would have the docs generator use an appropriate (perhaps empty)
> > prefix.
> > 
> > Can I suggest that your first patch should replace each instance of
> > docs/misc/ too ?  I mean, that you should introduce XEN_DOCMISC_DIR or
> > something, which subsumes docs/misc/.
> > 
> > That way when the docs are installed by a packager in
> > /usr/share/doc/xen/ there doesn't have to be this weird docs/misc/
> > path component.
> 
> Good idea!
> 
> > > > Another issue to consider is that some packagers only package the
> > > > manpages, not the other misc text content.  (I would argue that none of
> > > > the manpages should refer to misc text content in the first place).
> > 
> > I think those packagers are Doing It Wrong.
> 
> Hehe ;) let's not care about that case then.
> 
> > > So wouldn't the best thing to do rather be converting the misc text 
> > > content
> > > into proper man pages so that everyone gets it? And we could also easily
> > > jump from one man page to the other using tools like the Vim plugin
> > > (I'm sure other editors have the same sort of tool).
> > 
> > Well, that would be nice.  It would certainly be nice for more of the
> > docs to be in a more sophisticated format.
> 
> I'm currently moving a few of the misc docs into man pages. Already done
>  * xl-disk-configuration.txt (plus reformatting in POD)
>  * vbd-interface.txt (reformatted too)
>  * xl-disk-configuration.markdown
> 
> I can continue with these for sure: it's rather easy to do and wouldn't take
> too long to get finished.
> 
> > But do we want to insist on all new documentation being written in POD
> > or markdown ?  That might reduce the amount of documentation we get.
> 

Markdown should be fine. POD not so much...

Wei.



Re: [Xen-devel] tap device name for emulated NIC too long

2016-12-09 Thread Konrad Rzeszutek Wilk
On Wed, Nov 30, 2016 at 06:38:45PM -0700, Jim Fehlig wrote:
> Hi All,

.. crickets..
> 
> During the last Wg-openstack meetup we briefly discussed a long-standing bug
> when using Xen+libvirt+OpenStack with Neutron networking
> 
> https://bugs.launchpad.net/nova/+bug/1450465
> 
> The bug was also discussed on this list with no resolution
> 
> https://lists.xenproject.org/archives/html/xen-devel/2015-06/msg04116.html
> 
> To summarize: the tap device name for an emulated NIC is too long after
> libxl appends '-emu' to the name provided by Neutron. Some proposed fixes
> include
> 
> 1. Shorten '-emu' to just '-e', avoiding IFNAMSIZ limit. But users are free
> to provide a name that already occupies the full IFNAMSIZ. Also, the
> user-provided name may be used in rules, filters, etc. elsewhere in the
> network, so modifying it at all seems questionable.

+1
> 
> 2. Change OpenStack to not exceed IFNAMSIZ-4 when specifying Xen vif name.
> This could be proposed to the Neutron devs, but IMO adding such Xen-specific
> hacks in OpenStack is undesirable.
> 
> 3. Change the Xen default vif type from 'ioemu' to 'vif' (see
> docs/misc/xl-network-configuration.markdown), which avoids creating an
> emulated device. (Note: such a change could be made in Xen or libvirt.) But
> I think this is a no-go. I'd suspect it would result in a lot of broken
> configurations. E.g. a guest may not have PV drivers and is relying on the
> emulated device. Or the guest may be configured to network boot, in which
> case the emulated device would be needed for PXE [0].
> 
> We (the Wg-openstack folks) would like to hear your opinions on these
> proposals, or alternatives for fixing this bug.
> 
> Regards,
> Jim
> 
> [0] iPXE claims support for Xen netfront devices, but I've not yet got it to
> work: http://lists.ipxe.org/pipermail/ipxe-devel/2014-July/003674.html
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
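
For reference, the length problem described above can be sketched as
follows.  This is not libxl code: make_emu_tap_name() and IFNAME_MAX are
hypothetical names, and IFNAME_MAX is assumed to match Linux's IFNAMSIZ of
16 bytes (15 characters plus the terminating NUL).

```c
#include <stdio.h>
#include <string.h>

#define IFNAME_MAX 16  /* assumed equal to Linux IFNAMSIZ, including NUL */

/*
 * Build "<vifname>-emu" into out.  Returns 0 on success, or -1 if the
 * result would not fit into an interface name.
 */
static int make_emu_tap_name(char out[IFNAME_MAX], const char *vifname)
{
    static const char suffix[] = "-emu";

    if ( strlen(vifname) + strlen(suffix) >= IFNAME_MAX )
        return -1;

    snprintf(out, IFNAME_MAX, "%s%s", vifname, suffix);
    return 0;
}
```

Under these assumptions any user-provided name longer than 11 characters
fails once '-emu' is appended, and with '-e' the limit only moves to 13, so
option 1 narrows the problem rather than removing it.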



Re: [Xen-devel] [PATCH v1] arm/irq: Reorder check in route_irq_to_guest() to avoid 4 layers of "if"

2016-12-09 Thread Julien Grall

Hi Oleksandr,

Thank you for the patch.

On 06/12/16 17:53, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

Remove one layer of "if" by reordering the check
in route_irq_to_guest() to make the code clearer.

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 
---
 xen/arch/arm/irq.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 508028b..6d7e44e 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -481,21 +481,17 @@ int route_irq_to_guest(struct domain *d, unsigned int 
virq,
 {
 struct domain *ad = irq_get_domain(desc);

-if ( d == ad )
-{
-if ( irq_get_guest_info(desc)->virq != virq )
-{
-printk(XENLOG_G_ERR
-   "d%u: IRQ %u is already assigned to vIRQ %u\n",
-   d->domain_id, irq, irq_get_guest_info(desc)->virq);
-retval = -EBUSY;
-}
-}
-else
+if ( d != ad )
 {
 printk(XENLOG_G_ERR "IRQ %u is already used by domain %u\n",
irq, ad->domain_id);
 retval = -EBUSY;
+} else if ( irq_get_guest_info(desc)->virq != virq )


In Xen coding style the } and else if should be on separate lines. E.g.

}
else if ( ... )

With that fixed:

Reviewed-by: Julien Grall 

Stefano, would you be happy to fix this minor coding style issue while 
committing?


Cheers,

--
Julien Grall



Re: [Xen-devel] [PATCH] fix potential pa_range_info out of bound access

2016-12-09 Thread Julien Grall

Hi Stefano,

On 09/12/16 01:40, Stefano Stabellini wrote:

On Thu, 8 Dec 2016, Stefano Stabellini wrote:

pa_range_info has only 8 elements and is accessed using pa_range as
index. pa_range is initialized to 16, potentially causing out of bound
access errors. Fix the issue by initializing pa_range to the effective
number of pa_range_info elements.

CID 1381865

Signed-off-by: Stefano Stabellini 

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index e4991df..245fcd1 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1629,7 +1629,7 @@ void __init setup_virt_paging(void)
 };

 unsigned int cpu;
-unsigned int pa_range = 0x10; /* Larger than any possible value */
+unsigned int pa_range = sizeof(pa_range_info) / sizeof(pa_range_info[0]);

 for_each_online_cpu ( cpu )
 {


this is wrong, it should be sizeof(pa_range_info) / sizeof(pa_range_info[0]) - 
1:

---
pa_range_info has only 8 elements and is accessed using pa_range as
index. pa_range is initialized to 16, potentially causing out of bound
access errors. Fix the issue by initializing pa_range to the effective
number of pa_range_info elements minus 1.

Coverity-ID: 1381865

Signed-off-by: Stefano Stabellini 

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index e4991df..14901b0 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1629,7 +1629,7 @@ void __init setup_virt_paging(void)
 };

 unsigned int cpu;
-unsigned int pa_range = 0x10; /* Larger than any possible value */
+unsigned int pa_range = ARRAY_SIZE(pa_range_info) - 1;


The previous value was confusing and I think this one is even more so.

But this is not really the problem; the real issue is that the boundary
check later on is wrong:


if ( pa_range&0x8 || !pa_range_info[pa_range].pabits )

It will only check whether bit 3 is not set. But we want to check that 
pa_range is within the range of the array, i.e.


pa_range < ARRAY_SIZE(pa_range_info)

If you still want to change the pa_range initial value, then I would 
prefer to see the boot CPU one (i.e boot_cpu_data.mm64.pa_range).
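
A standalone sketch of the bounds check I have in mind is below.  The table
contents and the pa_range_valid() helper are made up for illustration (the
real pa_range_info[] carries more fields); only the comparison against
ARRAY_SIZE() is the point.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical 8-entry table; pabits == 0 marks an unused encoding. */
static const struct { unsigned int pabits; } pa_range_info[] = {
    { 32 }, { 36 }, { 40 }, { 42 }, { 44 }, { 48 }, { 0 }, { 0 },
};

/* True if pa_range indexes a populated entry of pa_range_info[]. */
static bool pa_range_valid(unsigned int pa_range)
{
    return pa_range < ARRAY_SIZE(pa_range_info) &&
           pa_range_info[pa_range].pabits != 0;
}
```

Note that a value such as 0x10 has bit 3 clear, so it gets past a
"pa_range & 0x8" test and then indexes beyond the array, whereas the
comparison above rejects it before the array is touched.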


Cheers,

--
Julien Grall



Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server

2016-12-09 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> Konrad Rzeszutek Wilk
> Sent: 09 December 2016 16:14
> To: Zhang Chen ; Paul Durrant
> 
> Cc: Changlong Xie ; Wei Liu
> ; Eddie Dong ; Andrew
> Cooper ; Ian Jackson
> ; Wen Congyang ; Paul
> Durrant ; Yang Hongyang
> ; Xen devel 
> Subject: Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server
> 
> .snip..
> > > If you can be more specific about what is broken in COLO we might be
> > > able to devise a fix for you.
> >
> > My workmate reported this bug last year:
> > https://lists.xenproject.org/archives/html/xen-devel/2015-
> 12/msg02850.html
> 
> Paul, Andrew was asking about:
> 
>   This bug is caused by the read side effects of
> HVM_PARAM_IOREQ_PFN. The migration code needs a way of being able to
> query whether a default ioreq server exists, without creating one.
> 
>   Can you remember what the justification for the read side effects
> were? ISTR that it was only for qemu compatibility until the ioreq server work
> got in upstream. If that was the case, can we drop the read side effects now
> and mandate that all qemus explicitly create their ioreq servers (even if this
> involves creating a default ioreq server for qemu-trad)?
> 

The read side effects are indeed because of the need to support the old qemu 
interface. If trad were patched then we could at least deprecate the default 
ioreq server but I'm not sure how long we'd need to leave it in place after 
that before it was removed. Perhaps it ought to be under a KCONFIG option, 
since it's also a bit of a security hole.

  Paul

> 
> ?
> 
> Full thread below:
> 
> Re: [Xen-devel] question about migration
> 
> To: Wen Congyang 
> From: Andrew Cooper 
> Date: Tue, 29 Dec 2015 11:24:14 +
> Cc: Paul Durrant , xen devel  devel@x>
> Delivery-date: Tue, 29 Dec 2015 11:24:33 +
> List-id: Xen developer discussion 
> On 25/12/2015 01:45, Wen Congyang wrote:
> On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> On 24/12/15 02:29, Wen Congyang wrote:
> Hi Andrew Cooper:
> 
> I rebase the COLO codes to the newest upstream xen, and test it. I found
> a problem in the test, and I can reproduce this problem via the migration.
> 
> How to reproduce:
> 1. xl cr -p hvm_nopv
> 2. xl migrate hvm_nopv 192.168.3.1
> You are the very first person to try a usecase like this.
> 
> It works as much as it does because of your changes to the uncooperative
> HVM
> domain logic.  I have said repeatedly during review, this is not necessarily a
> safe change to make without an in-depth analysis of the knock-on effects; it
> looks as if you have found the first knock-on effect.
> 
> The migration succeeds, but the vm doesn't run in the target machine.
> You can get the reason from 'xl dmesg':
> (XEN) HVM2 restore: VMCE_VCPU 1
> (XEN) HVM2 restore: TSC_ADJUST 0
> (XEN) HVM2 restore: TSC_ADJUST 1
> (d2) HVM Loader
> (d2) Detected Xen v4.7-unstable
> (d2) Get guest memory maps[128] failed. (-38)
> (d2) *** HVMLoader bug at e820.c:39
> (d2) *** HVMLoader crashed.
> 
> The reason is that:
> We don't call xc_domain_set_memory_map() in the target machine.
> When we create a hvm domain:
> libxl__domain_build()
>   libxl__build_hvm()
>   libxl__arch_domain_construct_memmap()
>   xc_domain_set_memory_map()
> 
> Should we migrate the guest memory from source machine to target
> machine?
> This bug specifically is because HVMLoader is expected to have run and
> turned the hypercall information into an E820 table in the guest before a
> migration occurs.
> 
> Unfortunately, the current codebase is riddled with such assumptions and
> expectations (e.g. the HVM save code assumes that FPU context is valid
> when it is saving register state), which is a direct side effect of how it
> was developed.
> 
> 
> Having said all of the above, I agree that your example is a usecase which
> should work.  It is the ultimate test of whether the migration stream contains
> enough information to faithfully reproduce the domain on the far side.
> Clearly
> at the moment, this is not the case.
> 
> I have an upcoming project to work on the domain memory layout logic,
> because
> it is unsuitable for a number of XenServer usecases. Part of that will require
> moving it in the migration stream.
> I found another migration problem in the test:
> If the migration fails, we will resume it in the source side.
> But the hvm guest doesn't respond any more.
> 
> In my test envirionment, the migration always successses, so I
> 
> "succeeds"
> 
> use a hack way to reproduce it:
> 1. modify the target xen tools:
> 
> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> index 258dec4..da95606 100644
> --- a/tools/libxl/libxl_stream_read.c
> +++ b/tools/libxl/libxl_stream_read.c
> @@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done

Re: [Xen-devel] [PATCH v4 10/14] xen/x86: populate PVHv2 Dom0 physical memory map

2016-12-09 Thread Jan Beulich
>>> On 30.11.16 at 17:49,  wrote:
> @@ -302,7 +307,8 @@ static unsigned long __init compute_dom0_nr_pages(
>  avail -= max_pdx >> s;
>  }
>  
> -need_paging = opt_dom0_shadow || (is_pvh_domain(d) && 
> !iommu_hap_pt_share);
> +need_paging = opt_dom0_shadow || (has_hvm_container_domain(d) &&
> +  (!iommu_hap_pt_share || !paging_mode_hap(d)));

Indentation.

> @@ -545,11 +552,12 @@ static __init void pvh_map_all_iomem(struct domain *d, 
> unsigned long nr_pages)
>  ASSERT(nr_holes == 0);
>  }
>  
> -static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
> +static __init void hvm_setup_e820(struct domain *d, unsigned long nr_pages)

Why?

> @@ -577,8 +585,19 @@ static __init void pvh_setup_e820(struct domain *d, 
> unsigned long nr_pages)
>  continue;
>  }
>  
> -*entry_guest = *entry;
> -pages = PFN_UP(entry_guest->size);
> +/*
> + * Make sure the start and length are aligned to PAGE_SIZE, because
> + * that's the minimum granularity of the 2nd stage translation.
> + */
> +start = ROUNDUP(entry->addr, PAGE_SIZE);
> +end = (entry->addr + entry->size) & PAGE_MASK;

Taking the comment into consideration, I wonder whether you
wouldn't better use PAGE_ORDER_4K here, as that's what the
p2m code uses.
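
For reference, the rounding done in the hunk above amounts to the following
standalone sketch.  It assumes 4 KiB pages; shrink_to_pages() and the local
PAGE_* definitions are illustrative and not the Xen ones.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assumed 4 KiB pages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/*
 * Shrink [addr, addr + size) to the largest page-aligned range inside it.
 * Stores the aligned start in *start and returns the number of whole
 * pages, which is 0 if no complete page fits.
 */
static uint64_t shrink_to_pages(uint64_t addr, uint64_t size, uint64_t *start)
{
    uint64_t s = (addr + PAGE_SIZE - 1) & PAGE_MASK;  /* round start up */
    uint64_t e = (addr + size) & PAGE_MASK;           /* round end down */

    *start = s;
    return e > s ? (e - s) >> PAGE_SHIFT : 0;
}
```

Rounding the start up and the end down means partial pages at either edge
are dropped rather than handed to the guest.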

> @@ -1010,8 +1029,6 @@ static int __init construct_dom0_pv(
>  BUG_ON(d->vcpu[0] == NULL);
>  BUG_ON(v->is_initialised);
>  
> -process_pending_softirqs();

Wouldn't this adjustment better fit into the previous patch, together
with its companion below?

> +static int __init hvm_steal_ram(struct domain *d, unsigned long size,
> +paddr_t limit, paddr_t *addr)
> +{
> +unsigned int i = d->arch.nr_e820;
> +
> +while ( i-- )
> +{
> +struct e820entry *entry = &d->arch.e820[i];
> +
> +if ( entry->type != E820_RAM || entry->size < size )
> +continue;
> +
> +/* Subtract from the beginning. */
> +if ( entry->addr + size < limit && entry->addr >= MB(1) )

<= I think (for the left comparison)?

> +static void __init hvm_steal_low_ram(struct domain *d, unsigned long start,
> + unsigned long nr_pages)
> +{
> +unsigned long mfn;
> +
> +ASSERT(start + nr_pages < PFN_DOWN(MB(1)));

<= again I think.

> +static int __init hvm_setup_p2m(struct domain *d)
> +{
> +struct vcpu *v = d->vcpu[0];
> +unsigned long nr_pages;
> +unsigned int i;
> +int rc;
> +bool preempted;
> +#define MB1_PAGES PFN_DOWN(MB(1))
> +
> +nr_pages = compute_dom0_nr_pages(d, NULL, 0);
> +
> +hvm_setup_e820(d, nr_pages);
> +do {
> +preempted = false;
> +paging_set_allocation(d, dom0_paging_pages(d, nr_pages),
> +  &preempted);
> +process_pending_softirqs();
> +} while ( preempted );
> +
> +/*
> + * Memory below 1MB is identity mapped.
> + * NB: this only makes sense when booted from legacy BIOS.
> + */
> +rc = modify_identity_mmio(d, 0, PFN_DOWN(MB(1)), true);
> +if ( rc )
> +{
> +printk("Failed to identity map low 1MB: %d\n", rc);
> +return rc;
> +}
> +
> +/* Populate memory map. */
> +for ( i = 0; i < d->arch.nr_e820; i++ )
> +{
> +unsigned long addr, size;
> +
> +if ( d->arch.e820[i].type != E820_RAM )
> +continue;
> +
> +addr = PFN_DOWN(d->arch.e820[i].addr);
> +size = PFN_DOWN(d->arch.e820[i].size);
> +
> +if ( addr >= MB1_PAGES )
> +rc = hvm_populate_memory_range(d, addr, size);
> +else if ( addr + size > MB1_PAGES )
> +{
> +hvm_steal_low_ram(d, addr, MB1_PAGES - addr);
> +rc = hvm_populate_memory_range(d, MB1_PAGES,
> +   size - (MB1_PAGES - addr));

Is this case possible at all? All x86 systems have some form of
BIOS right below the 1Mb boundary, and the E820 map for
Dom0 is being derived from the host one.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -475,6 +475,43 @@ void share_xen_page_with_guest(
>  spin_unlock(&d->page_alloc_lock);
>  }
>  
> +int unshare_xen_page_with_guest(struct page_info *page, struct domain *d)

__init

And once it's __init, it may be possible to simplify it, as you don't need
to fear races anymore. E.g. you wouldn't need a loop over cmpxchg().

> +{
> +unsigned long y, x;
> +bool drop_dom_ref;
> +
> +if ( page_get_owner(page) != d || !(page->count_info & PGC_xen_heap) )

Please don't open code is_xen_heap_page().

> +return -EINVAL;
> +
> +spin_lock(&d->page_alloc_lock);
> +
> +/* Drop the page reference so we can change the owner. */
> +y = page->count_info;
> +do {
> +x = y;
> +if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
> +{
> +spin_unlock(&d->pa

[Xen-devel] nocera1 Re: [xen-unstable test] 103010: regressions - trouble: broken/fail/pass

2016-12-09 Thread Ian Jackson
osstest service owner writes ("[xen-unstable test] 103010: regressions - 
trouble: broken/fail/pass"):
> flight 103010 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/103010/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-xtf-amd64-amd64-43 host-install(3) broken REGR. vs. 102942

nocera1 had forgotten its boot order (and afaict randomly permuted the
PCI IRQ assignments) in the BIOS.

I have put it back.

Ian.



[Xen-devel] [PATCH 04/11] docs: convert xl-disk-configuration into a man page

2016-12-09 Thread Cédric Bosdonnat
Convert xl-disk-configuration.txt from plain text file to a POD file
to get it as a man page. The references to it in the other man pages
are also updated.

Signed-off-by: Cédric Bosdonnat 
---
 docs/INDEX   |   1 -
 docs/man/xl-disk-configuration.pod.5 | 529 +++
 docs/man/xl.cfg.pod.5.in |   4 +-
 docs/man/xl.pod.1.in |   7 +-
 docs/misc/xl-disk-configuration.txt  | 359 
 5 files changed, 534 insertions(+), 366 deletions(-)
 create mode 100644 docs/man/xl-disk-configuration.pod.5
 delete mode 100644 docs/misc/xl-disk-configuration.txt

diff --git a/docs/INDEX b/docs/INDEX
index 2cfeef90a9..3a8b9472d8 100644
--- a/docs/INDEX
+++ b/docs/INDEX
@@ -16,7 +16,6 @@ misc/tscmode  TSC Mode HOWTO
 misc/vbd-interface Xen Guest Disk (VBD) Interface
 misc/xenstore  Xenstore protocol specification
 misc/xenstore-pathsXenstore path documentation
-misc/xl-disk-configuration XL Disk Configuration
 misc/distro_mappingDistro Directory Layouts
 misc/dump-core-format  Xen Core Dump Format
 misc/vtd   VT-d HOWTO
diff --git a/docs/man/xl-disk-configuration.pod.5 
b/docs/man/xl-disk-configuration.pod.5
new file mode 100644
index 00..6510536e02
--- /dev/null
+++ b/docs/man/xl-disk-configuration.pod.5
@@ -0,0 +1,529 @@
+=head1 NAME
+
+xl-disk-configuration - XL Disk Configuration Syntax
+
+=head1 SYNTAX
+
+This document specifies the xl config file format disk configuration
+option.  It has the following form:
+
+   disk = [ 'DISKSPEC', 'DISKSPEC', ... ]
+
+where each C<DISKSPEC> is in this form:
+
+   [<key>=<value>|<flag>,]*,
+     [<target>, [<format>, [<vdev>, [<access>]]]],
+     [<key>=<value>|<flag>,]*
+     [target=<target>]
+
+For example, these strings are equivalent:
+
+/dev/vg/guest-volume,,hda
+/dev/vg/guest-volume,raw,hda,rw
+format=raw, vdev=hda, access=rw, target=/dev/vg/guest-volume
+raw:/dev/vg/guest-volume,hda,w  (deprecated, see below)
+
+As are these:
+
+/root/image.iso,,hdc,cdrom
+/root/image.iso,,hdc,,cdrom
+/root/image.iso,raw,hdc,devtype=cdrom
+format=raw, vdev=hdc, access=ro, devtype=cdrom, target=/root/image.iso
+raw:/root/image.iso,hdc:cdrom,ro   (deprecated, see below)
+
+These might be specified in the domain config file like this:
+
+disk = [ '/dev/vg/guest-volume,,hda', '/root/image.iso,,hdc,cdrom' ]
+
+
+More formally, the string is a series of comma-separated keyword/value
+pairs, flags and positional parameters.  Parameters which are not bare
+keywords and which do not contain "=" symbols are assigned to the
+so-far-unspecified positional parameters, in the order below.  The
+positional parameters may also be specified explicitly by name.
+
+Each parameter may be specified at most once, either as a positional
+parameter or a named parameter.  Default values apply if the parameter
+is not specified, or if it is specified with an empty value (whether
+positionally or explicitly).
+
+Whitespace may appear before each parameter and will be ignored.
+
+=head1 Positional Parameters
+
+=over 4
+
+=item B<target>
+
+=over 4
+
+=item Description
+
+Block device or image file path.  When this is used as a path, F</dev>
+will be prepended if the path doesn't start with a '/'.
+
+=item Supported values
+
+N/A
+
+=item Deprecated values
+
+N/A
+
+=item Default value
+
+None.  While a path is provided in most cases there is an exception:
+for a cdrom device, lack of this attribute would imply an empty cdrom
+drive.
+
+=item Special syntax
+
+When this parameter is specified by name, ie with the C
+syntax in the configuration file, it consumes the whole rest of the
+C<DISKSPEC> including trailing whitespaces.  Therefore in that case
+it must come last.  This is permissible even if an empty value for
+the target was already specified as a positional parameter.  This
+is the only way to specify a target string containing metacharacters
+such as commas and (in some cases) colons, which would otherwise be
+misinterpreted.
+
+Future parameter and flag names will start with an ascii letter and
+contain only ascii alphanumerics, hyphens and underscores, and will
+not be legal as vdevs.  Targets which might match that syntax
+should not be specified as positional parameters.
+
+=back
+
+=item B<format>
+
+=over 4
+
+=item Description
+
+Specifies the format of image file.
+
+=item Supported values
+
+raw, qcow, qcow2, vhd, qed
+
+=item Deprecated values
+
+None
+
+=item Default value
+
+raw
+
+=back
+
+=item B<vdev>
+
+=over 4
+
+=item Description
+
+Virtual device as seen by the guest (also referred to as guest drive
+designation in some specifications).  L
+
+=item Supported values
+
+hd[x], xvd[x], sd[x] etc.  Please refer to the above specification for
+further details.
+
+=item Deprecated values
+
+None
+
+=item Default Value
+
+None, this parameter is mandatory.
+
+=back
+
+=item B<access>
+
+=over 4
+
+=item Description
+
+Specified access control information.  Whether or not the block de

[Xen-devel] [PATCH 11/11] docs: convert tscmode.txt into man page

2016-12-09 Thread Cédric Bosdonnat
tscmode.txt is referenced in xl.cfg(5). Convert it into a pod
formatted man page.

Signed-off-by: Cédric Bosdonnat 
---
 docs/INDEX   |   1 -
 docs/{misc/tscmode.txt => man/tscmode.pod.7} | 109 ++-
 docs/man/xl.cfg.pod.5.in |   4 +-
 3 files changed, 76 insertions(+), 38 deletions(-)
 rename docs/{misc/tscmode.txt => man/tscmode.pod.7} (89%)

diff --git a/docs/INDEX b/docs/INDEX
index 66cc82b78c..868ab1fc1d 100644
--- a/docs/INDEX
+++ b/docs/INDEX
@@ -12,7 +12,6 @@ misc/xen-command-line Xen Hypervisor Command Line Options
 misc/crashdb   Xen crash debugger notes
 misc/grant-tables  A Rough Introduction to Using Grant Tables
 misc/kexec_and_kdump   Kexec and Kdump for Xen
-misc/tscmode   TSC Mode HOWTO
 misc/xenstore  Xenstore protocol specification
 misc/xenstore-pathsXenstore path documentation
 misc/distro_mappingDistro Directory Layouts
diff --git a/docs/misc/tscmode.txt b/docs/man/tscmode.pod.7
similarity index 89%
rename from docs/misc/tscmode.txt
rename to docs/man/tscmode.pod.7
index 01ee060598..0da57e5327 100644
--- a/docs/misc/tscmode.txt
+++ b/docs/man/tscmode.pod.7
@@ -1,7 +1,4 @@
-TSC_MODE HOW-TO
-by: Dan Magenheimer 
-
-OVERVIEW
+=head1 OVERVIEW
 
 As of Xen 4.0, a new config option called tsc_mode may be specified
 for each domain.  The default for tsc_mode handles the vast majority
@@ -16,16 +13,29 @@ equally to both the OS and ALL apps that are running on this
 domain, now or in the future.
 
 Key questions to be answered for the OS and/or each application are:
-- Does the OS/app use the rdtsc instruction at all?  (We will explain below
-  how to determine this.)
-- At what frequency is the rdtsc instruction executed by either the OS
-  or any running apps?  If the sum exceeds about 10,000 rdtsc instructions
-  per second per processor, we call this a "high-TSC-frequency"
-  OS/app/environment.  (This is relatively rare, and developers of OS's
-  and apps that are high-TSC-frequency are usually aware of it.)
-- If the OS/app does use rdtsc, will it behave incorrectly if "time goes
-  backwards" or if the frequency of the TSC suddenly changes?  If so,
-  we call this a "TSC-sensitive" app or OS; otherwise it is "TSC-resilient".
+
+=over 4
+
+=item *
+
+Does the OS/app use the rdtsc instruction at all?
+(We will explain below how to determine this.)
+
+=item *
+
+At what frequency is the rdtsc instruction executed by either the OS
+or any running apps?  If the sum exceeds about 10,000 rdtsc instructions
+per second per processor, we call this a "high-TSC-frequency"
+OS/app/environment.  (This is relatively rare, and developers of OS's
+and apps that are high-TSC-frequency are usually aware of it.)
+
+=item *
+
+If the OS/app does use rdtsc, will it behave incorrectly if "time goes
+backwards" or if the frequency of the TSC suddenly changes?  If so,
+we call this a "TSC-sensitive" app or OS; otherwise it is "TSC-resilient".
+
+=back
 
 This last is the US$64,000 question as it may be very difficult
 (or, for legacy apps, even impossible) to predict all possible
@@ -46,38 +56,63 @@ an intelligent default but allows system administrator's to adjust
 how rdtsc instructions are executed differently for different domains.
 
 The non-default choices for tsc_mode are:
-- tsc_mode=1 (always emulate). All rdtsc instructions are emulated;
-   this is the best choice when TSC-sensitive apps are running and
-   it is necessary to understand worst-case performance degradation
-   for a specific hardware environment.
-- tsc_mode=2 (never emulate).  This is the same as prior to Xen 4.0
-   and is the best choice if it is certain that all apps running in
-   this VM are TSC-resilient and highest performance is required.
-- tsc_mode=3 (PVRDTSCP).  High-TSC-frequency apps may be paravirtualized
-   (modified) to obtain both correctness and highest performance; any
-   unmodified apps must be TSC-resilient.
-
-If tsc_mode is left unspecified (or set to tsc_mode=0), a hybrid
+
+=over 4
+
+=item * B (always emulate).
+
+All rdtsc instructions are emulated; this is the best choice when
+TSC-sensitive apps are running and it is necessary to understand
+worst-case performance degradation for a specific hardware environment.
+
+=item * B (never emulate).
+
+This is the same as prior to Xen 4.0 and is the best choice if it
+is certain that all apps running in this VM are TSC-resilient and
+highest performance is required.
+
+=item * B (PVRDTSCP).
+
+High-TSC-frequency apps may be paravirtualized (modified) to
+obtain both correctness and highest performance; any unmodified
+apps must be TSC-resilient.
+
+=back
+
+If tsc_mode is left unspecified (or set to B), a hybrid
 algorithm is utilized to ensure correctness while providing the
 best performance possible given:
-- the requirement of correctness,
-- the underlying hardware, and
-- whether or

[Xen-devel] [PATCH 08/11] docs: convert vtpmmgr into a pod man page

2016-12-09 Thread Cédric Bosdonnat
vtpmmgr.txt is referenced in a man page, convert it to a man page.

Signed-off-by: Cédric Bosdonnat 
---
 docs/{misc/vtpmmgr.txt => man/vtpmmgr.pod.7} | 351 ---
 1 file changed, 206 insertions(+), 145 deletions(-)
 rename docs/{misc/vtpmmgr.txt => man/vtpmmgr.pod.7} (54%)

diff --git a/docs/misc/vtpmmgr.txt b/docs/man/vtpmmgr.pod.7
similarity index 54%
rename from docs/misc/vtpmmgr.txt
rename to docs/man/vtpmmgr.pod.7
index d4f756c9d1..f2b2ca038e 100644
--- a/docs/misc/vtpmmgr.txt
+++ b/docs/man/vtpmmgr.pod.7
@@ -1,22 +1,30 @@
-
-Authors:
-Daniel De Graaf 
-Quan Xu 
-
+=head1 Authors
+
+=over 4
+
+=item Daniel De Graaf 
+
+=item Quan Xu 
+
+=back
 
 This document describes the operation and command line interface of
-vtpmmgr-stubdom. See docs/misc/vtpm.txt for details on the vTPM subsystem as a
+vtpmmgr-stubdom. See L for details on the vTPM subsystem as a
 whole.
 
-
-Overview
-
+=head1 Overview
 
 The TPM Manager has three primary functions:
 
-1. Securely store the encryption keys for vTPMs
-2. Provide a single controlled path of access to the physical TPM
-3. Provide evidence (via TPM Quotes) of the current configuration
+=over 4
+
+=item 1. Securely store the encryption keys for vTPMs
+
+=item 2. Provide a single controlled path of access to the physical TPM
+
+=item 3. Provide evidence (via TPM Quotes) of the current configuration
+
+=back
 
 When combined with a platform that provides a trusted method for creating
 domains, the TPM Manager provides assurance that the private keys in a vTPM are
@@ -26,9 +34,7 @@ The manager accepts commands from the vtpm-stubdom domains via the mini-os TPM
 backend driver. The vTPM manager communicates directly with hardware TPM using
 the mini-os tpm_tis driver.
 
-
-Boot Configurations and TPM Groups
-
+=head1 Boot Configurations and TPM Groups
 
 The TPM Manager's data is secured by using the physical TPM's seal operation,
 which allows data to be bound to specific PCRs. These PCRs are populated in the
@@ -52,9 +58,7 @@ has its own AIK in the physical TPM for quotes of the hardware TPM state; when
 used with a conforming Privacy CA, this allows each group on the system to form
 the basis of a distinct identity.
 
-
-Initial Provisioning
-
+=head1 Initial Provisioning
 
 When the TPM Manager first boots up, it will create a stub vTPM group along with
 entries for any vTPMs that communicate with it. This stub group must be
@@ -73,51 +77,87 @@ group is created, a signed list of boot measurements can be installed. The
 initial group controls the ability to boot the system as a whole, and cannot be
 deleted once provisioned.
 
-
-Command Line Arguments
-
+=head1 Command Line Arguments
 
 Command line arguments are passed to the domain via the 'extra' parameter in the
 VM config file. Each parameter is separated by white space. For example:
 
-extra="foo=bar baz"
+extra="foo=bar baz"
 
 Valid arguments:
 
-owner_auth=
-srk_auth=
-   Set the owner and SRK authdata for the TPM. If not specified, the
-   default is 160 zero bits (the well-known auth value). Valid values of
-are:
-   well-known   Use the well known auth (default)
-   hash:  Use the given 40-character ASCII hex string
-   text:   Use sha1 hash of .
-
-tpmdriver=
-   Choose the driver used for communication with the hardware TPM. Values
-   other than tpm_tis should only be used for testing.
-
-   The possible values of  are:
-   tpm_tisDirect communication with a hardware TPM 1.2.  The
-   domain must have access to TPM IO memory. (default)
-   tpmfront   Use the Xen tpmfront interface to talk to another
-   domain which provides access to the TPM.
+=over 4
+
+=item owner_auth=
+
+=item srk_auth=
+
+Set the owner and SRK authdata for the TPM. If not specified, the
+default is 160 zero bits (the well-known auth value). Valid values of
+ are:
+
+=over 4
+
+=item well-known
+
+Use the well known auth (default)
+
+=item hash:
+
+Use the given 40-character ASCII hex string
+
+=item text:
+
+Use sha1 hash of .
+
+=back
+
+=item tpmdriver=
+
+Choose the
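
[Editor's sketch, not part of the patch] The argument format described in the hunk above can be illustrated with a short Python sketch. It only mirrors what the text states: arguments in 'extra' are separated by white space, the default auth value is 160 zero bits, "hash:" takes a 40-character hex string, and "text:" takes the sha1 hash of the given text. The function names and the UTF-8 encoding of the text are assumptions made for illustration; this is not the vtpmmgr-stubdom parser.

```python
import hashlib


def parse_extra(extra):
    """Split an 'extra' string into a dict; bare words map to None."""
    args = {}
    for token in extra.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else None
    return args


def parse_auth(spec="well-known"):
    """Return the 20-byte authdata for an owner_auth/srk_auth value."""
    if spec == "well-known":
        return bytes(20)  # 160 zero bits, the well-known auth value
    if spec.startswith("hash:"):
        hexstr = spec[len("hash:"):]
        if len(hexstr) != 40:
            raise ValueError("hash: expects a 40-character hex string")
        return bytes.fromhex(hexstr)
    if spec.startswith("text:"):
        # Assumption: the text is hashed as UTF-8 bytes.
        return hashlib.sha1(spec[len("text:"):].encode("utf-8")).digest()
    raise ValueError("unknown auth specification: %r" % spec)


example_args = parse_extra("foo=bar baz")
example_auth = parse_auth("text:abc")
```

Both hash: and text: end up as 20 bytes, the same size as the well-known value, which is why a 40-character hex string is required.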

[Xen-devel] [PATCH 01/11] docs: allow writing man pages in markdown

2016-12-09 Thread Cédric Bosdonnat
Some of the docs/misc documents are written in markdown.
As part of an effort to clean up the man pages, these documents will be
converted into man pages. To avoid further format conversion, add rules to
docs/Makefile to generate man pages from markdown files as well as pod ones.

However, pandoc doesn't know how to convert links to man pages. Thus the
man links in markdown pages won't work.

Signed-off-by: Cédric Bosdonnat 
---
 docs/Makefile | 48 ++--
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/docs/Makefile b/docs/Makefile
index e2537e8755..d3f5eb607c 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -3,13 +3,14 @@ include $(XEN_ROOT)/Config.mk
 -include $(XEN_ROOT)/config/Docs.mk
 
 VERSION:= $(shell $(MAKE) -C $(XEN_ROOT)/xen --no-print-directory xenversion)
+DATE   := $(shell date +%Y-%m-%d)
 
 DOC_ARCHES  := arm x86_32 x86_64
 
 # Documentation sources to build
-MAN1SRC-y := $(sort $(shell find man/ -name '*.pod.1' -print))
-MAN5SRC-y := $(sort $(shell find man/ -name '*.pod.5' -print))
-MAN8SRC-y := $(sort $(shell find man/ -name '*.pod.8' -print))
+MAN1SRC-y := $(sort $(shell find man/ -regex '.*\.\(pod\|markdown\)\.1' -print))
+MAN5SRC-y := $(sort $(shell find man/ -regex '.*\.\(pod\|markdown\)\.5' -print))
+MAN8SRC-y := $(sort $(shell find man/ -regex '.*\.\(pod\|markdown\)\.8' -print))
 
 MARKDOWNSRC-y := $(sort $(shell find misc -name '*.markdown' -print))
 
@@ -18,11 +19,17 @@ TXTSRC-y := $(sort $(shell find misc -name '*.txt' -print))
 PANDOCSRC-y := $(sort $(shell find features/ misc/ specs/ -name '*.pandoc' -print))
 
 # Documentation targets
-DOC_MAN1 := $(patsubst man/%.pod.1,man1/%.1,$(MAN1SRC-y))
-DOC_MAN5 := $(patsubst man/%.pod.5,man5/%.5,$(MAN5SRC-y))
-DOC_MAN8 := $(patsubst man/%.pod.8,man8/%.8,$(MAN8SRC-y))
+DOC_MAN1 := $(patsubst man/%.pod.1,man1/%.1,$(MAN1SRC-y)) \
+   $(patsubst man/%.markdown.1,man1/%.1,$(MAN1SRC-y))
+DOC_MAN5 := $(patsubst man/%.pod.5,man5/%.5,$(MAN5SRC-y)) \
+   $(patsubst man/%.markdown.5,man5/%.5,$(MAN5SRC-y))
+DOC_MAN8 := $(patsubst man/%.pod.8,man8/%.8,$(MAN8SRC-y)) \
+   $(patsubst man/%.markdown.8,man8/%.8,$(MAN8SRC-y))
 DOC_HTML := $(patsubst %.markdown,html/%.html,$(MARKDOWNSRC-y)) \
 $(patsubst %.pandoc,html/%.html,$(PANDOCSRC-y)) \
+$(patsubst man/%.markdown.1,html/man/%.1.html,$(MAN1SRC-y)) \
+$(patsubst man/%.markdown.5,html/man/%.5.html,$(MAN5SRC-y)) \
+$(patsubst man/%.markdown.8,html/man/%.8.html,$(MAN8SRC-y)) \
 $(patsubst man/%.pod.1,html/man/%.1.html,$(MAN1SRC-y)) \
 $(patsubst man/%.pod.5,html/man/%.5.html,$(MAN5SRC-y)) \
 $(patsubst man/%.pod.8,html/man/%.8.html,$(MAN8SRC-y)) \
@@ -31,6 +38,9 @@ DOC_HTML := $(patsubst %.markdown,html/%.html,$(MARKDOWNSRC-y)) \
 DOC_TXT  := $(patsubst %.txt,txt/%.txt,$(TXTSRC-y)) \
 $(patsubst %.markdown,txt/%.txt,$(MARKDOWNSRC-y)) \
 $(patsubst %.pandoc,txt/%.txt,$(PANDOCSRC-y)) \
+$(patsubst man/%.markdown.1,txt/man/%.1.txt,$(MAN1SRC-y)) \
+$(patsubst man/%.markdown.5,txt/man/%.5.txt,$(MAN5SRC-y)) \
+$(patsubst man/%.markdown.8,txt/man/%.8.txt,$(MAN8SRC-y)) \
 $(patsubst man/%.pod.1,txt/man/%.1.txt,$(MAN1SRC-y)) \
 $(patsubst man/%.pod.5,txt/man/%.5.txt,$(MAN5SRC-y)) \
 $(patsubst man/%.pod.8,txt/man/%.8.txt,$(MAN8SRC-y))
@@ -89,6 +99,16 @@ else
@echo "pod2man not installed; skipping $$@"
 endif
 
+man$(1)/%.$(1): man/%.markdown.$(1) Makefile
+ifneq ($(PANDOC),)
+   @$(INSTALL_DIR) $$(@D)
+   $(PANDOC) --standalone -V title=$$* -V section=$(1) \
+ -V date="$(DATE)" -V footer="$(VERSION)" \
+ -V header=Xen $$< -t man --output $$@
+else
+   @echo "pandoc not installed; skipping $$@"
+endif
+
 # HTML manpages
 html/man/%.$(1).html: man/%.pod.$(1) Makefile
 ifneq ($(POD2HTML),)
@@ -98,6 +118,14 @@ else
@echo "pod2html not installed; skipping $$@"
 endif
 
+html/man/%.$(1).html: man/%.markdown.$(1) Makefile
+ifneq ($(PANDOC),)
+   @$(INSTALL_DIR) $$(@D)
+   $(PANDOC) --standalone $$< -t html --toc --output $$@
+else
+   @echo "pandoc not installed; skipping $$@"
+endif
+
 # Text manpages
 txt/man/%.$(1).txt: man/%.pod.$(1) Makefile
 ifneq ($(POD2TEXT),)
@@ -107,6 +135,14 @@ else
@echo "pod2text not installed; skipping $$@"
 endif
 
+txt/man/%.$(1).txt: man/%.markdown.$(1) Makefile
+ifneq ($(PANDOC),)
+   @$(INSTALL_DIR) $$(@D)
+   $(PANDOC) --standalone $$< -t plain --output $$@
+else
+   @echo "pandoc not installed; skipping $$@"
+endif
+
 # Build
 .PHONY: man$(1)-pages
 man$(1)-pages: $$(DOC_MAN$(1))
-- 
2.11.0
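
[Editor's sketch, not part of the patch] The DOC_MAN1/5/8 definitions above apply two patsubst calls to the same source list. GNU make's patsubst leaves words that do not match the pattern unchanged, so it may be worth checking what each combined list would contain. The Python below approximates patsubst for single-'%' patterns and shows one possible alternative that filters by pattern first. The file names are illustrative and the helper names are hypothetical.

```python
def matches(pattern, word):
    """True if word matches a make-style pattern with a single '%'."""
    prefix, _, suffix = pattern.partition("%")
    return (len(word) >= len(prefix) + len(suffix)
            and word.startswith(prefix) and word.endswith(suffix))


def patsubst(pattern, replacement, words):
    """Approximate $(patsubst pattern,replacement,words)."""
    prefix, _, suffix = pattern.partition("%")
    out = []
    for word in words:
        if matches(pattern, word):
            stem = word[len(prefix):len(word) - len(suffix)]
            out.append(replacement.replace("%", stem, 1))
        else:
            out.append(word)  # non-matching words pass through unchanged
    return out


def filter_words(pattern, words):
    """Approximate $(filter pattern,words)."""
    return [w for w in words if matches(pattern, w)]


# Illustrative source list: one pod page and one (hypothetical) markdown page.
sources = ["man/xl.pod.1", "man/example.markdown.1"]

# Shape of the definition in the patch: both patsubst calls see every source.
doc_man1 = (patsubst("man/%.pod.1", "man1/%.1", sources)
            + patsubst("man/%.markdown.1", "man1/%.1", sources))

# Possible alternative: restrict each call to the words it can convert.
doc_man1_filtered = (
    patsubst("man/%.pod.1", "man1/%.1",
             filter_words("man/%.pod.1", sources))
    + patsubst("man/%.markdown.1", "man1/%.1",
               filter_words("man/%.markdown.1", sources)))

print(doc_man1)
print(doc_man1_filtered)
```

In the first list the unconverted source names appear next to the man1/ targets; the filtered variant contains only targets. Whether this matters in practice depends on how the DOC_MAN variables are consumed elsewhere in docs/Makefile.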


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 05/11] docs: move vbd-interface from misc to man

2016-12-09 Thread Cédric Bosdonnat
Make vbd-interface a man page in section 7, as this document is
referenced in other man pages (xl-disk-configuration).

Signed-off-by: Cédric Bosdonnat 
---
 docs/INDEX| 1 -
 docs/{misc/vbd-interface.txt => man/vbd-interface.markdown.7} | 0
 2 files changed, 1 deletion(-)
 rename docs/{misc/vbd-interface.txt => man/vbd-interface.markdown.7} (100%)

diff --git a/docs/INDEX b/docs/INDEX
index 3a8b9472d8..101d43c7aa 100644
--- a/docs/INDEX
+++ b/docs/INDEX
@@ -13,7 +13,6 @@ misc/crashdb  Xen crash debugger notes
 misc/grant-tables  A Rough Introduction to Using Grant Tables
 misc/kexec_and_kdump   Kexec and Kdump for Xen
 misc/tscmode   TSC Mode HOWTO
-misc/vbd-interface Xen Guest Disk (VBD) Interface
 misc/xenstore  Xenstore protocol specification
 misc/xenstore-pathsXenstore path documentation
 misc/distro_mappingDistro Directory Layouts
diff --git a/docs/misc/vbd-interface.txt b/docs/man/vbd-interface.markdown.7
similarity index 100%
rename from docs/misc/vbd-interface.txt
rename to docs/man/vbd-interface.markdown.7
-- 
2.11.0




[Xen-devel] [PATCH 00/11] Convert a few docs/misc pages into man pages

2016-12-09 Thread Cédric Bosdonnat
As pointed out in Ian's and Andrew's emails on my recent docs improvement
attempt, turning the docs/misc documents that are referenced in man pages
into man pages themselves would make them more visible to users. This
would also not break the docs html INDEX.

This series adds the ability to write markdown man pages. Be aware that
markdown man pages can't link to other man pages.

I also added some pages to the man 7 (Misc) section.

Cédric Bosdonnat (11):
  docs: allow writing man pages in markdown
  docs: add rules for man 7 section
  docs: xl-network-configuration turns into a man
  docs: convert xl-disk-configuration into a man page
  docs: move vbd-interface from misc to man
  docs: move xl-numa-placement.markdown to man7
  docs: move vtpm from misc to man
  docs: convert vtpmmgr into a pod man page
  docs: convert misc/channel.txt into xen-pv-channel man page
  docs: move pci-device-reservations from misc to man
  docs: convert tscmode.txt into man page

 .gitignore |   1 +
 docs/INDEX |   5 -
 docs/Makefile  |  57 ++-
 docs/{misc/tscmode.txt => man/tscmode.pod.7}   | 109 +++--
 .../vbd-interface.markdown.7}  |   0
 docs/{misc/vtpm.txt => man/vtpm.pod.7} | 364 +++---
 docs/{misc/vtpmmgr.txt => man/vtpmmgr.pod.7}   | 351 --
 docs/man/xen-pci-device-reservations.pod.7 |  84 
 .../channel.txt => man/xen-pv-channel.markdown.7}  |  20 +-
 docs/man/xl-disk-configuration.pod.5   | 529 +
 .../xl-network-configuration.markdown.5}   |   0
 .../xl-numa-placement.markdown.7}  |   0
 docs/man/xl.cfg.pod.5.in   |  21 +-
 docs/man/xl.pod.1.in   |  11 +-
 docs/misc/pci-device-reservations.txt  |  58 ---
 docs/misc/xl-disk-configuration.txt| 359 --
 16 files changed, 1162 insertions(+), 807 deletions(-)
 rename docs/{misc/tscmode.txt => man/tscmode.pod.7} (89%)
 rename docs/{misc/vbd-interface.txt => man/vbd-interface.markdown.7} (100%)
 rename docs/{misc/vtpm.txt => man/vtpm.pod.7} (57%)
 rename docs/{misc/vtpmmgr.txt => man/vtpmmgr.pod.7} (54%)
 create mode 100644 docs/man/xen-pci-device-reservations.pod.7
 rename docs/{misc/channel.txt => man/xen-pv-channel.markdown.7} (92%)
 create mode 100644 docs/man/xl-disk-configuration.pod.5
 rename docs/{misc/xl-network-configuration.markdown => man/xl-network-configuration.markdown.5} (100%)
 rename docs/{misc/xl-numa-placement.markdown => man/xl-numa-placement.markdown.7} (100%)
 delete mode 100644 docs/misc/pci-device-reservations.txt
 delete mode 100644 docs/misc/xl-disk-configuration.txt

-- 
2.11.0




[Xen-devel] [PATCH 06/11] docs: move xl-numa-placement.markdown to man7

2016-12-09 Thread Cédric Bosdonnat
docs/misc/xl-numa-placement.markdown is referenced by xl.cfg.5 man page,
move it to a man page, section 7.

Signed-off-by: Cédric Bosdonnat 
---
 .../xl-numa-placement.markdown => man/xl-numa-placement.markdown.7} | 0
 docs/man/xl.cfg.pod.5.in| 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename docs/{misc/xl-numa-placement.markdown => man/xl-numa-placement.markdown.7} (100%)

diff --git a/docs/misc/xl-numa-placement.markdown b/docs/man/xl-numa-placement.markdown.7
similarity index 100%
rename from docs/misc/xl-numa-placement.markdown
rename to docs/man/xl-numa-placement.markdown.7
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 7fc8f55970..fc2faac12b 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -185,7 +185,7 @@ host cpus and memory. In that case, the soft affinity of all the vcpus
 of the domain will be set to the pcpus belonging to the NUMA nodes
 chosen during placement.
 
-For more details, see F.
+For more details, see L.
 
 =back
 
-- 
2.11.0




[Xen-devel] [PATCH 10/11] docs: move pci-device-reservations from misc to man

2016-12-09 Thread Cédric Bosdonnat
pci-device-reservations is referenced in xl.cfg(5); convert it into a man
page in pod format. The name is now prefixed with 'xen-' to avoid
possible name conflicts.

Signed-off-by: Cédric Bosdonnat 
---
 docs/man/xen-pci-device-reservations.pod.7 | 84 ++
 docs/man/xl.cfg.pod.5.in   |  2 +-
 docs/misc/pci-device-reservations.txt  | 58 -
 3 files changed, 85 insertions(+), 59 deletions(-)
 create mode 100644 docs/man/xen-pci-device-reservations.pod.7
 delete mode 100644 docs/misc/pci-device-reservations.txt

diff --git a/docs/man/xen-pci-device-reservations.pod.7 b/docs/man/xen-pci-device-reservations.pod.7
new file mode 100644
index 00..dac92764fc
--- /dev/null
+++ b/docs/man/xen-pci-device-reservations.pod.7
@@ -0,0 +1,84 @@
+=head1 Description
+
+PCI vendor ID 0x5853 has been reserved for use by Xen systems in order to
+advertise certain virtual hardware to guest virtual machines. The primary
+use of this is with device ID 0x0001 to advertise the Xen Platform PCI
+device - the presence of this virtual device enables a guest Operating
+System (subject to the availability of suitable drivers) to make use of
+paravirtualisation features such as disk and network devices etc.
+
+Some Xen vendors wish to provide alternative and/or additional guest drivers
+that can bind to virtual devices[1]. This may be done using the Xen PCI
+vendor ID of 0x5853 and Xen-vendor/device specific PCI device IDs. This file
+records reservations made within the device ID range in order to avoid
+multiple Xen vendors using conflicting IDs.
+
+=head1 Guidelines
+
+=over 4
+
+=item 1. A vendor may request a range of device IDs by submitting a patch to
+ this file.
+
+=item 2. Vendor allocations should be in the range 0xc000-0xfffe to reduce the
+ possibility of clashes with community IDs assigned from the bottom up.
+
+=item 3. The vendor is responsible for allocations within the range and should
+ try to record specific device IDs in PCI ID databases such as
+ http://pciids.sourceforge.net and http://www.pcidatabase.com
+
+=back
+
+=head1 Reservations
+
+    range     | vendor/product
+--------------+--------------------------------------------------------------
+0x0001        | (Xen Platform PCI device)
+0x0002        | Citrix XenServer (grandfathered allocation for XenServer 6.1)
+0xc000-0xc0ff | Citrix XenServer
+0xc100-0xc1ff | Citrix XenClient
+
+=head1 Notes
+
+=over 4
+
+=item 1.
+
+Upstream QEMU provides a parameterized device called xen-pvdevice that
+can be used to host guest drivers. Execute:
+
+qemu-system-i386 -device xen-pvdevice,help
+
+for a list of all parameters. The following parameters are relevant to
+driver binding:
+
+=over 4
+
+=item  vendor-id (default 0x5853)
+
+The PCI vendor ID and subsystem vendor ID of the device.
+
+=item  device-id (must be specified)
+
+The PCI device ID and subsystem device ID of the device.
+
+=item  revision (default 0x01)
+
+The PCI revision of the device
+
+=back
+
+Also the size parameter (default 0x40) can be used to specify the
+size of the single MMIO BAR that the device exposes. This area may be
+used by drivers for mapping grant tables, etc.
+
+Note that the presence of the Xen Platform PCI device is generally a
+pre-requisite for an additional xen-pvdevice as it is the platform
+device that provides the IO ports necessary for unplugging emulated
+devices. See hvm-emulated-unplug.markdown for details of the IO ports
+and unplug protocol.
+
+libxl provides support for creation of a single additional xen-pvdevice.
+See the vendor_device parameter in xl.cfg(5).
+
+=back
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 94eada381f..0dac6f1d9a 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1894,7 +1894,7 @@ specified, enabling the use of XenServer PV drivers in the guest.
 =back
 
 This parameter only takes effect when device_model_version=qemu-xen.
-See F for more information.
+See L for more information.
 
 =back
 
diff --git a/docs/misc/pci-device-reservations.txt b/docs/misc/pci-device-reservations.txt
deleted file mode 100644
index 9d6d780ace..00
--- a/docs/misc/pci-device-reservations.txt
+++ /dev/null
@@ -1,58 +0,0 @@
-PCI vendor ID 0x5853 has been reserved for use by Xen systems in order to
-advertise certain virtual hardware to guest virtual machines. The primary
-use of this is with device ID 0x0001 to advertise the Xen Platform PCI
-device - the presence of this virtual device enables a guest Operating
-System (subject to the availability of suitable drivers) to make use of
-paravirtualisation features such as disk and network devices etc.
-
-Some Xen vendors wish to provide alternative and/or additional guest drivers
-that can bind to virtual devices[1]. This may be done using the Xen PCI
-vendor ID of 0x5853 and Xen-vendor/device specific PCI device IDs. This file
-record
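
[Editor's sketch, not part of the patch] The reservation table and guideline 2 in the new pod file can be read as a simple range lookup. The Python below encodes only the values quoted in the patch (vendor ID 0x5853, the four listed reservations, and the 0xc000-0xfffe vendor range). The function names are hypothetical and nothing here is taken from Xen or QEMU code.

```python
XEN_PCI_VENDOR_ID = 0x5853

# (first device ID, last device ID, vendor/product), as listed in the table.
RESERVATIONS = [
    (0x0001, 0x0001, "(Xen Platform PCI device)"),
    (0x0002, 0x0002,
     "Citrix XenServer (grandfathered allocation for XenServer 6.1)"),
    (0xC000, 0xC0FF, "Citrix XenServer"),
    (0xC100, 0xC1FF, "Citrix XenClient"),
]

# Guideline 2: vendor allocations should fall in this range.
VENDOR_RANGE = (0xC000, 0xFFFE)


def lookup_reservation(device_id):
    """Return the recorded vendor/product for a device ID, or None."""
    for first, last, owner in RESERVATIONS:
        if first <= device_id <= last:
            return owner
    return None


def in_vendor_range(device_id):
    """True if the ID lies in the range suggested for vendor allocations."""
    return VENDOR_RANGE[0] <= device_id <= VENDOR_RANGE[1]


print(lookup_reservation(0xC001))
print(lookup_reservation(0xC200), in_vendor_range(0xC200))
```

An ID such as 0xc200 is inside the suggested vendor range but has no recorded owner, which is the situation a new reservation request would start from.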

[Xen-devel] [PATCH 02/11] docs: add rules for man 7 section

2016-12-09 Thread Cédric Bosdonnat
Some of the docs/misc documents will need to go in man section 7;
prepare docs/Makefile for it.

Signed-off-by: Cédric Bosdonnat 
---
 .gitignore| 1 +
 docs/Makefile | 9 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index a2f34a14cb..7584956383 100644
--- a/.gitignore
+++ b/.gitignore
@@ -45,6 +45,7 @@ docs/man/xl.cfg.pod.5
 docs/man/xl.pod.1
 docs/man1/
 docs/man5/
+docs/man7/
 docs/man8/
 docs/pdf/
 docs/txt/
diff --git a/docs/Makefile b/docs/Makefile
index d3f5eb607c..e064de0b77 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -10,6 +10,7 @@ DOC_ARCHES  := arm x86_32 x86_64
 # Documentation sources to build
 MAN1SRC-y := $(sort $(shell find man/ -regex '.*\.\(pod\|markdown\)\.1' -print))
 MAN5SRC-y := $(sort $(shell find man/ -regex '.*\.\(pod\|markdown\)\.5' -print))
+MAN7SRC-y := $(sort $(shell find man/ -regex '.*\.\(pod\|markdown\)\.7' -print))
 MAN8SRC-y := $(sort $(shell find man/ -regex '.*\.\(pod\|markdown\)\.8' -print))
 
 MARKDOWNSRC-y := $(sort $(shell find misc -name '*.markdown' -print))
@@ -23,15 +24,19 @@ DOC_MAN1 := $(patsubst man/%.pod.1,man1/%.1,$(MAN1SRC-y)) \
$(patsubst man/%.markdown.1,man1/%.1,$(MAN1SRC-y))
 DOC_MAN5 := $(patsubst man/%.pod.5,man5/%.5,$(MAN5SRC-y)) \
$(patsubst man/%.markdown.5,man5/%.5,$(MAN5SRC-y))
+DOC_MAN7 := $(patsubst man/%.pod.7,man7/%.7,$(MAN7SRC-y)) \
+   $(patsubst man/%.markdown.7,man7/%.7,$(MAN7SRC-y))
 DOC_MAN8 := $(patsubst man/%.pod.8,man8/%.8,$(MAN8SRC-y)) \
$(patsubst man/%.markdown.8,man8/%.8,$(MAN8SRC-y))
 DOC_HTML := $(patsubst %.markdown,html/%.html,$(MARKDOWNSRC-y)) \
 $(patsubst %.pandoc,html/%.html,$(PANDOCSRC-y)) \
 $(patsubst man/%.markdown.1,html/man/%.1.html,$(MAN1SRC-y)) \
 $(patsubst man/%.markdown.5,html/man/%.5.html,$(MAN5SRC-y)) \
+$(patsubst man/%.markdown.7,html/man/%.7.html,$(MAN7SRC-y)) \
 $(patsubst man/%.markdown.8,html/man/%.8.html,$(MAN8SRC-y)) \
 $(patsubst man/%.pod.1,html/man/%.1.html,$(MAN1SRC-y)) \
 $(patsubst man/%.pod.5,html/man/%.5.html,$(MAN5SRC-y)) \
+$(patsubst man/%.pod.7,html/man/%.7.html,$(MAN7SRC-y)) \
 $(patsubst man/%.pod.8,html/man/%.8.html,$(MAN8SRC-y)) \
 $(patsubst %.txt,html/%.txt,$(TXTSRC-y)) \
 $(patsubst %,html/hypercall/%/index.html,$(DOC_ARCHES))
@@ -40,9 +45,11 @@ DOC_TXT  := $(patsubst %.txt,txt/%.txt,$(TXTSRC-y)) \
 $(patsubst %.pandoc,txt/%.txt,$(PANDOCSRC-y)) \
 $(patsubst man/%.markdown.1,txt/man/%.1.txt,$(MAN1SRC-y)) \
 $(patsubst man/%.markdown.5,txt/man/%.5.txt,$(MAN5SRC-y)) \
+$(patsubst man/%.markdown.7,txt/man/%.7.txt,$(MAN7SRC-y)) \
 $(patsubst man/%.markdown.8,txt/man/%.8.txt,$(MAN8SRC-y)) \
 $(patsubst man/%.pod.1,txt/man/%.1.txt,$(MAN1SRC-y)) \
 $(patsubst man/%.pod.5,txt/man/%.5.txt,$(MAN5SRC-y)) \
+$(patsubst man/%.pod.7,txt/man/%.7.txt,$(MAN7SRC-y)) \
 $(patsubst man/%.pod.8,txt/man/%.8.txt,$(MAN8SRC-y))
 DOC_PDF  := $(patsubst %.markdown,pdf/%.pdf,$(MARKDOWNSRC-y)) \
 $(patsubst %.pandoc,pdf/%.pdf,$(PANDOCSRC-y))
@@ -166,7 +173,7 @@ clean-man-pages: clean-man$(1)-pages
 endef
 
 # Generate manpage rules for each section
-$(foreach i,1 5 8,$(eval $(call GENERATE_MANPAGE_RULES,$(i))))
+$(foreach i,1 5 7 8,$(eval $(call GENERATE_MANPAGE_RULES,$(i))))
 
 .PHONY: install-html
 install-html: html txt figs
-- 
2.11.0
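
[Editor's sketch, not part of the patch] The MAN*SRC-y variables above select sources with find -regex, which matches against the whole path and, with GNU find's default regex syntax, writes grouping and alternation as \( \) and \|. The Python below is an approximate equivalent using Python's own regex syntax, together with the source-to-target naming used by the DOC_MAN* variables. The path list is illustrative and the helper names are hypothetical.

```python
import re

SECTIONS = (1, 5, 7, 8)


def is_man_source(path, section):
    """Mimic find -regex '.*\\.\\(pod\\|markdown\\)\\.N' (whole-path match)."""
    return re.fullmatch(r".*\.(pod|markdown)\." + str(section), path) is not None


def target_for(path):
    """Map man/NAME.(pod|markdown).N to manN/NAME.N, or None if no match."""
    m = re.fullmatch(r"man/(.*)\.(pod|markdown)\.(\d)", path)
    if m is None:
        return None
    stem, _fmt, section = m.groups()
    return "man%s/%s.%s" % (section, stem, section)


# Illustrative paths; the .in template must not be picked up as a source.
paths = [
    "man/tscmode.pod.7",
    "man/vbd-interface.markdown.7",
    "man/xl.cfg.pod.5.in",
    "man/xl.pod.1",
]

by_section = {
    s: sorted(p for p in paths if is_man_source(p, s)) for s in SECTIONS
}
targets = [target_for(p) for p in paths]

print(by_section)
print(targets)
```

Because the match is anchored to the whole path, a template such as xl.cfg.pod.5.in is skipped until the generated xl.cfg.pod.5 exists.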




[Xen-devel] [PATCH 07/11] docs: move vtpm from misc to man

2016-12-09 Thread Cédric Bosdonnat
vtpm.txt is referenced in xl.cfg man page. Convert it to pod,
move it to the man folder and update the reference.

Signed-off-by: Cédric Bosdonnat 
---
 docs/INDEX |   1 -
 docs/{misc/vtpm.txt => man/vtpm.pod.7} | 364 +
 docs/man/xl.cfg.pod.5.in   |   3 +-
 3 files changed, 194 insertions(+), 174 deletions(-)
 rename docs/{misc/vtpm.txt => man/vtpm.pod.7} (57%)

diff --git a/docs/INDEX b/docs/INDEX
index 101d43c7aa..66cc82b78c 100644
--- a/docs/INDEX
+++ b/docs/INDEX
@@ -18,7 +18,6 @@ misc/xenstore-paths   Xenstore path documentation
 misc/distro_mappingDistro Directory Layouts
 misc/dump-core-format  Xen Core Dump Format
 misc/vtd   VT-d HOWTO
-misc/vtpm  Virtual TPM
 misc/xen-error-handlingXen Error Handling
 misc/xenpaging Xen Paging
 misc/xsm-flask XSM/FLASK Configuration
diff --git a/docs/misc/vtpm.txt b/docs/man/vtpm.pod.7
similarity index 57%
rename from docs/misc/vtpm.txt
rename to docs/man/vtpm.pod.7
index 1887d40d25..03bde1d4eb 100644
--- a/docs/misc/vtpm.txt
+++ b/docs/man/vtpm.pod.7
@@ -7,9 +7,8 @@ This document describes the virtual Trusted Platform Module (vTPM) subsystem
 for Xen. The reader is assumed to have familiarity with building and installing
 Xen, Linux, and a basic understanding of the TPM and vTPM concepts.
 
---
-INTRODUCTION
---
+=head1 INTRODUCTION
+
 The goal of this work is to provide a TPM functionality to a virtual guest
 operating system (a DomU).  This allows programs to interact with a TPM in a
 virtual system the same way they interact with a TPM on the physical system.
@@ -25,99 +24,114 @@ mini-os to reduce memory and processor overhead.
 This mini-os vTPM subsystem was built on top of the previous vTPM work done by
 IBM and Intel corporation.
  
---
-DESIGN OVERVIEW
---
+=head1 DESIGN OVERVIEW
 
 The architecture of vTPM is described below:
 
-+------------------+
-|    Linux DomU    | ...
-|       |  ^       |
-|       v  |       |
-|   xen-tpmfront   |
-+------------------+
-        |  ^
-        v  |
-+------------------+
-| mini-os/tpmback  |
-|       |  ^       |
-|       v  |       |
-|  vtpm-stubdom    | ...
-|       |  ^       |
-|       v  |       |
-| mini-os/tpmfront |
-+------------------+
-        |  ^
-        v  |
-+------------------+
-| mini-os/tpmback  |
-|       |  ^       |
-|       v  |       |
-| vtpmmgr-stubdom  |
-|       |  ^       |
-|       v  |       |
-| mini-os/tpm_tis  |
-+------------------+
-        |  ^
-        v  |
-+------------------+
-|   Hardware TPM   |
-+------------------+
- * Linux DomU: The Linux based guest that wants to use a vTPM. There many be
-   more than one of these.
-
- * xen-tpmfront.ko: Linux kernel virtual TPM frontend driver. This driver
-provides vTPM access to a para-virtualized Linux based DomU.
-
- * mini-os/tpmback: Mini-os TPM backend driver. The Linux frontend driver
-connects to this backend driver to facilitate
-communications between the Linux DomU and its vTPM. This
-driver is also used by vtpmmgr-stubdom to communicate with
-vtpm-stubdom.
-
- * vtpm-stubdom: A mini-os stub domain that implements a vTPM. There is a
- one to one mapping between running vtpm-stubdom instances and
- logical vtpms on the system. The vTPM Platform Configuration
- Registers (PCRs) are all initialized to zero.
-
- * mini-os/tpmfront: Mini-os TPM frontend driver. The vTPM mini-os domain
- vtpm-stubdom uses this driver to communicate with
- vtpmmgr-stubdom. This driver could also be used separately to
- implement a mini-os domain that wishes to use a vTPM of
- its own.
-
- * vtpmmgr-stubdom: A mini-os domain that implements the vTPM manager.
-   There is only one vTPM manager and it should be running during
-   the entire lifetime of the machine.  This domain regulates
-   access to the physical TPM on the system and secures the
-   persistent state of each vTPM.
-
- * mini-os/tpm_tis: Mini-os TPM version 1.2 TPM Interface Specification (TIS)
-driver. This driver used by vtpmmgr-stubdom to talk directly to
-the hardware TPM. Communication is facilitated by mapping
-hardware memory pages into vtpmmgr-stubdom.
-
- * Hardware TPM: The physical TPM that is soldered onto the motherboard.
-
---
-INSTALLATION
---
-
-Prerequisites:
---
++--+
+|Linux DomU| ...
+|   |  ^   |
+| 

[Xen-devel] [PATCH 03/11] docs: xl-network-configuration turns into a man

2016-12-09 Thread Cédric Bosdonnat
Move docs/misc/xl-network-configuration.markdown to docs/man and
update the references to it in the other man pages.

Signed-off-by: Cédric Bosdonnat 
---
 docs/INDEX| 1 -
 .../xl-network-configuration.markdown.5}  | 0
 docs/man/xl.cfg.pod.5.in  | 4 ++--
 docs/man/xl.pod.1.in  | 4 ++--
 4 files changed, 4 insertions(+), 5 deletions(-)
 rename docs/{misc/xl-network-configuration.markdown => man/xl-network-configuration.markdown.5} (100%)

diff --git a/docs/INDEX b/docs/INDEX
index 7d26cf85d4..2cfeef90a9 100644
--- a/docs/INDEX
+++ b/docs/INDEX
@@ -17,7 +17,6 @@ misc/vbd-interfaceXen Guest Disk (VBD) Interface
 misc/xenstore  Xenstore protocol specification
 misc/xenstore-pathsXenstore path documentation
 misc/xl-disk-configuration XL Disk Configuration
-misc/xl-network-configuration  XL Network Configuration
 misc/distro_mappingDistro Directory Layouts
 misc/dump-core-format  Xen Core Dump Format
 misc/vtd   VT-d HOWTO
diff --git a/docs/misc/xl-network-configuration.markdown b/docs/man/xl-network-configuration.markdown.5
similarity index 100%
rename from docs/misc/xl-network-configuration.markdown
rename to docs/man/xl-network-configuration.markdown.5
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 21b58bc21e..517c7f9910 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -486,7 +486,7 @@ the host they should map to.  See 
F.
 
 Specifies the networking provision (both emulated network adapters,
 and Xen virtual interfaces) to provided to the guest.  See
-F.
+L.
 
 =item B
 
@@ -2032,7 +2032,7 @@ natively or via hardware backwards compatibility support.
 
 =item F
 
-=item F
+=item L
 
 =item F
 
diff --git a/docs/man/xl.pod.1.in b/docs/man/xl.pod.1.in
index 8e2aa5b5af..2937f33003 100644
--- a/docs/man/xl.pod.1.in
+++ b/docs/man/xl.pod.1.in
@@ -1383,7 +1383,7 @@ How the device should be presented to the guest domain; 
for example "hdc".
 Creates a new network device in the domain specified by I.
 I describes the device to attach, using the same format as the
 B string in the domain config file. See L and
-L
+L
 for more informations.
 
 Note that only attaching PV network interface is supported.
@@ -1797,10 +1797,10 @@ Transcendent Memory.
 The following man pages:
 
 L(5), L(5), B(1)
+L
 
 And the following documents on the xen.org website:
 
-L
 L
 L
 L
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 09/11] docs: convert misc/channel.txt into xen-pv-channel man page

2016-12-09 Thread Cédric Bosdonnat
channel.txt is referenced in xl.cfg(5). Move it to man pages, section 7

Signed-off-by: Cédric Bosdonnat 
---
 .../channel.txt => man/xen-pv-channel.markdown.7}| 20 ++--
 docs/man/xl.cfg.pod.5.in |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)
 rename docs/{misc/channel.txt => man/xen-pv-channel.markdown.7} (92%)

diff --git a/docs/misc/channel.txt b/docs/man/xen-pv-channel.markdown.7
similarity index 92%
rename from docs/misc/channel.txt
rename to docs/man/xen-pv-channel.markdown.7
index 9fc701a64a..1c6149dae0 100644
--- a/docs/misc/channel.txt
+++ b/docs/man/xen-pv-channel.markdown.7
@@ -94,13 +94,13 @@ Channel name registry
 It is important that channel names are globally unique. To help ensure
 that no-one's name clashes with yours, please add yours to this list.
 
-Key:
-N: Name
-C: Contact
-D: Short description of use, possibly including a URL to your software
-   or API
-
-N: org.xenproject.guest.clipboard.0.1
-C: David Scott 
-D: Share clipboard data via an in-guest agent. See:
-   http://wiki.xenproject.org/wiki/Clipboard_sharing_protocol
+Key:
+N: Name
+C: Contact
+D: Short description of use, possibly including a URL to your software
+   or API
+
+N: org.xenproject.guest.clipboard.0.1
+C: David Scott 
+D: Share clipboard data via an in-guest agent. See:
+   http://wiki.xenproject.org/wiki/Clipboard_sharing_protocol
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 33fae038f7..94eada381f 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -603,7 +603,7 @@ Specifies the virtual channels to be provided to the guest. 
A
 channel is a low-bandwidth, bidirectional byte stream, which resembles
 a serial link. Typical uses for channels include transmitting VM
 configuration after boot and signalling to in-guest agents. Please see
-F for more details.
+L for more details.
 
 Each B is a comma-separated list of C
 settings. Leading and trailing whitespace is ignored in both KEY and
-- 
2.11.0




Re: [Xen-devel] Xen/cdrom: Ubuntu 16.04 VM read the content from CD-ROM abnormally

2016-12-09 Thread Konrad Rzeszutek Wilk
On Fri, Dec 09, 2016 at 04:21:02PM +0800, Ken wrote:
> Hi all,
> 
> I run the Ubuntu 16.04 server (2 vcpu/2G, Linux 4.4.0) on the Xen-4.1.2, and
> installed gcc through the CDROM used by 16.04 iso file, when I installed gcc
> that depends deb packages to decompress failed. But uploaded the ISO files
> into the VM are mounted by loop or used as CDROM for other VM (Fedora 24)
> transport these abnormal files to 16.04 are available. So the 16.04 ISO file
> should be correct.
> 
> Then I went to try to fix this problem, the steps are as follows:
> First of all, I was worried because the Hypervisor version is too old to
> cause this problems, so I upgraded to the Xen upstream and found that there
> are the problem still.
> 
> Then, I went to attempted to upgrade the VM kernel, by dichotomy to find the
> smallest available kernel, but found that the kernel patch has nothing to do
> the cdrom or isofs driver, Linux kernel committed history has nothing to do
> these, combined with the Fedora within Linux 4.4.0 have no problem, so I
> inferred the kernel had no this defective.
> 
> The above comparison can not troubleshoot the problem, so I analyzed the
> Linux CD-ROM device loading process to confirm that the problem is
> encountered before the mount. and I found the following three strange
> phenomenon:
> 1. Using Ubuntu 14.04.5 udevadm, re-packaged the init-ramfs of 16.04, reboot
> the VM, read the content of CD-ROM successfully, but compile the system-204
> from the 14.04.5 and installed on 16.04, the problem still can not be
> resolved, some abnormal logs as blow:
>random: udevadm: uninitialized urandom read (16 bytes read, 28 bits of
> entropy available)
> 
> 2. found that the VM registers dmi failed at startup, so I removed the dmi
> driver, reboot the VM, read the content of CD-ROM successfully, and from the
> kernel dmi_scan_machine code, it should not affect to use the CD-ROM, The
> failed logs as blow:
>ioremap error for 0xfc001000-0xfc002000, requested 0x2, got 0x0
>dmi: Firmware registration failed.
> 
> 3. Deploy the VM within 8 vcpus, 8G memory, read the content of CD-ROM
> successfully.
> 
> Therefore, I inferred this anomaly that has several problems, but can not
> focus it, there are other people have encountered this same anomaly? Do
> anyone have any suggests to
> debug it next step?

I think you are the first one to see this. I have run small guests before
(1GB, 1VCPU) with an ISO and did not encounter these issues. Are there
errors in dmesg when the file is copied?
> 
> Thanks.
> 
> 
> 



Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server

2016-12-09 Thread Konrad Rzeszutek Wilk
.snip..
> > If you can be more specific about what is broken in COLO we might be
> > able to devise a fix for you.
> 
> My colleague reported this bug last year:
> https://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02850.html

Paul, Andrew was asking about:

This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN. The 
migration code needs a way of being able to query whether a default ioreq 
server exists, without creating one.

Can you remember what the justification for the read side effects was? 
ISTR that it was only for qemu compatibility until the ioreq server work got in 
upstream. If that was the case, can we drop the read side effects now and 
mandate that all qemus explicitly create their ioreq servers (even if this 
involves creating a default ioreq server for qemu-trad)?


?

Full thread below:

Re: [Xen-devel] question about migration

To: Wen Congyang 
From: Andrew Cooper 
Date: Tue, 29 Dec 2015 11:24:14 +
Cc: Paul Durrant , xen devel 
Delivery-date: Tue, 29 Dec 2015 11:24:33 +
List-id: Xen developer discussion 
On 25/12/2015 01:45, Wen Congyang wrote:
On 12/24/2015 08:36 PM, Andrew Cooper wrote:
On 24/12/15 02:29, Wen Congyang wrote:
Hi Andrew Cooper:

I rebase the COLO codes to the newest upstream xen, and test it. I found
a problem in the test, and I can reproduce this problem via the migration.

How to reproduce:
1. xl cr -p hvm_nopv
2. xl migrate hvm_nopv 192.168.3.1
You are the very first person to try a usecase like this.

It works as much as it does because of your changes to the uncooperative HVM 
domain logic.  I have said repeatedly during review, this is not necessarily a 
safe change to make without an in-depth analysis of the knock-on effects; it 
looks as if you have found the first knock-on effect.

The migration succeeds, but the VM doesn't run on the target machine.
You can get the reason from 'xl dmesg':
(XEN) HVM2 restore: VMCE_VCPU 1
(XEN) HVM2 restore: TSC_ADJUST 0
(XEN) HVM2 restore: TSC_ADJUST 1
(d2) HVM Loader
(d2) Detected Xen v4.7-unstable
(d2) Get guest memory maps[128] failed. (-38)
(d2) *** HVMLoader bug at e820.c:39
(d2) *** HVMLoader crashed.

The reason is that:
We don't call xc_domain_set_memory_map() in the target machine.
When we create a hvm domain:
libxl__domain_build()
  libxl__build_hvm()
  libxl__arch_domain_construct_memmap()
  xc_domain_set_memory_map()

Should we migrate the guest memory from source machine to target machine?
This bug specifically is because HVMLoader is expected to have run and turned 
the hypercall information into an E820 table in the guest before a migration 
occurs.

Unfortunately, the current codebase is riddled with such assumptions and 
expectations (e.g. the HVM save code assumed that FPU context is valid when it 
is saving register state), which is a direct side effect of how it was developed.


Having said all of the above, I agree that your example is a usecase which 
should work.  It is the ultimate test of whether the migration stream contains 
enough information to faithfully reproduce the domain on the far side.  Clearly 
at the moment, this is not the case.

I have an upcoming project to work on the domain memory layout logic, because 
it is unsuitable for a number of XenServer usecases. Part of that will require 
moving it in the migration stream.
I found another migration problem in the test:
If the migration fails, we will resume it in the source side.
But the hvm guest doesn't response any more.

In my test envirionment, the migration always successses, so I

"succeeds"

use a hack way to reproduce it:
1. modify the target xen tools:

diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..da95606 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void 
*dcs_void,
  goto err;
  }
+ rc = ERROR_FAIL;
+
   err:
  check_all_finished(egc, stream, rc);
2. xl cr hvm_nopv, and wait some time(You can login to the guest)
3. xl migrate hvm_nopv 192.168.3.1

The reason is that:
We create a default ioreq server when we get the hvm param HVM_PARAM_IOREQ_PFN.
This means the problem occurs only when the migration fails after we get
the hvm param HVM_PARAM_IOREQ_PFN.

In the function hvm_select_ioreq_server():
If the I/O will be handled by a non-default ioreq server, we return that
non-default ioreq server. In this case, it is handled by qemu.
If the I/O will not be handled by a non-default ioreq server, we return
the default ioreq server. Before migration, we return NULL, and after migration
it is not NULL.
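
The selection rule described above can be modeled with a small standalone sketch. This is a toy model, not the Xen implementation: the types `ioreq_server` and `domain_model` and the function `select_server` are hypothetical names chosen for illustration, and each server is assumed to claim a single inclusive address range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an ioreq server. */
struct ioreq_server {
    uint64_t start, end;   /* inclusive range claimed by this server */
};

/* Hypothetical per-domain state: explicit servers plus an optional default. */
struct domain_model {
    const struct ioreq_server *servers;
    size_t n_servers;
    const struct ioreq_server *default_server;  /* NULL until one is created */
};

/*
 * Return the non-default server that claims addr, if any.
 * Otherwise fall back to the default server, which may be NULL.
 */
static const struct ioreq_server *
select_server(const struct domain_model *d, uint64_t addr)
{
    size_t i;

    assert(d != NULL);
    for (i = 0; i < d->n_servers; i++) {
        if (addr >= d->servers[i].start && addr <= d->servers[i].end)
            return &d->servers[i];
    }
    return d->default_server;
}
```

In this model, an access that no server claims selects NULL while no default server exists, and selects the default server once one has been created. That is the behavioural change the report attributes to reading HVM_PARAM_IOREQ_PFN.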
See the caller is hvmemul_do_io():
 case X86EMUL_UNHANDLEABLE:
 {
 struct hvm_ioreq_server *s =
 hvm_select_ioreq_server(curr->domain, &p);

 /* If there is no sui

Re: [Xen-devel] [PATCH v4 09/14] xen/x86: split Dom0 build into PV and PVHv2

2016-12-09 Thread Jan Beulich
>>> On 30.11.16 at 17:49,  wrote:
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -656,6 +656,23 @@ affinities to prefer but be not limited to the specified 
> node(s).
>  
>  Pin dom0 vcpus to their respective pcpus
>  
> +### dom0
> +> `= List of [ hvm | shadow ]`
> +
> +> Sub-options:
> +
> +> `hvm`
> +
> +> Default: `false`
> +
> +Flag that makes a dom0 boot in PVHv2 mode.
> +
> +> `shadow`
> +
> +> Default: `false`
> +
> +Flag that makes a dom0 use shadow paging.

Would you mind marking dom0_shadow deprecated at once? In fact
I wouldn't mind if it was removed from the documentation altogether,
all the more so since it still has no description at all.

> @@ -1655,6 +1653,28 @@ out:
>  return rc;
>  }
>  
> +static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
> + unsigned long image_headroom,
> + module_t *initrd,
> + void *(*bootstrap_map)(const module_t 
> *),
> + char *cmdline)
> +{
> +
> +printk("** Building a PVH Dom0 **\n");

Why again is it that you call the function "hvm" but mean "pvh"?

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -187,6 +187,35 @@ static void __init parse_acpi_param(char *s)
>  }
>  }
>  
> +/*
> + * List of parameters that affect Dom0 creation:
> + *
> + *  - hvm   Create a PVHv2 Dom0.
> + *  - shadowUse shadow paging for Dom0.
> + */
> +static bool __initdata dom0_hvm;
> +static void __init parse_dom0_param(char *s)
> +{
> +char *ss;
> +
> +do {
> +
> +ss = strchr(s, ',');
> +if ( ss )
> +*ss = '\0';
> +
> +if ( !strcmp(s, "hvm") )
> +dom0_hvm = true;
> +#ifdef CONFIG_SHADOW_PAGING
> +else if ( !strcmp(s, "shadow") )
> +opt_dom0_shadow = true;
> +#endif
> +
> +s = ss + 1;
> +} while ( ss );
> +}
> +custom_param("dom0", parse_dom0_param);

I continue to think that this should live in domain_build.c, and
dom0_hvm be the one off variable which needs to be global. After
all we intend to extend the "dom0=" quite a bit (presumably to
subsume everything which the various "dom0..." options now do),
and all that stuff lives there anyway.
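
For readers following the discussion, the comma-separated sub-option parsing in question can be sketched standalone roughly as below. This is only an illustration under assumptions: `struct dom0_opts` and `parse_dom0_opts` are hypothetical names, and the real code uses Xen's `custom_param()` machinery and `__init` annotations, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the two flags discussed in the patch. */
struct dom0_opts {
    bool hvm;
    bool shadow;
};

/*
 * Parse a comma-separated list such as "hvm,shadow".
 * The string is modified in place: each comma is overwritten with NUL.
 * Unknown tokens are silently ignored in this sketch.
 */
static void parse_dom0_opts(char *s, struct dom0_opts *o)
{
    char *next;

    assert(s != NULL && o != NULL);
    do {
        next = strchr(s, ',');
        if (next)
            *next++ = '\0';   /* terminate this token, step past the comma */

        if (!strcmp(s, "hvm"))
            o->hvm = true;
        else if (!strcmp(s, "shadow"))
            o->shadow = true;

        s = next;             /* NULL once the last token has been handled */
    } while (s);
}
```

Advancing via `next` rather than computing `ss + 1` keeps the loop from doing arithmetic on a NULL pointer after the final token.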

> --- a/xen/include/asm-x86/setup.h
> +++ b/xen/include/asm-x86/setup.h
> @@ -57,4 +57,10 @@ extern uint8_t kbd_shift_flags;
>  extern unsigned long highmem_start;
>  #endif
>  
> +#ifdef CONFIG_SHADOW_PAGING
> +extern bool opt_dom0_shadow;
> +#else
> +#define opt_dom0_shadow 0

"false" please, to match up with "bool".

Jan




[Xen-devel] [PATCH 5/8] libelf: loop safety: Replace all calls to strcmp

2016-12-09 Thread Ian Jackson
strcmp can do significant work, and is found in some inner loops where
we search for the meaning of things we find in the image.  We need to
avoid doing too much work.

So replace all calls to strcmp with elf_strcmp_safe.

Signed-off-by: Ian Jackson 
---
 xen/common/libelf/libelf-dominfo.c | 37 +++--
 xen/common/libelf/libelf-private.h |  7 ---
 xen/common/libelf/libelf-tools.c   |  4 ++--
 3 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/xen/common/libelf/libelf-dominfo.c 
b/xen/common/libelf/libelf-dominfo.c
index 87a47d9..b037d10 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -70,7 +70,8 @@ elf_errorstatus elf_xen_parse_features(struct elf_binary *elf,
 if ( feature[0] == '!' )
 {
 /* required */
-if ( !strcmp(feature + 1, elf_xen_feature_names[i]) )
+if ( !elf_strcmp_safe(elf, feature + 1,
+  elf_xen_feature_names[i]) )
 {
 elf_xen_feature_set(i, supported);
 if ( required )
@@ -81,7 +82,7 @@ elf_errorstatus elf_xen_parse_features(struct elf_binary *elf,
 else
 {
 /* supported */
-if ( !strcmp(feature, elf_xen_feature_names[i]) )
+if ( !elf_strcmp_safe(elf, feature, elf_xen_feature_names[i]) )
 {
 elf_xen_feature_set(i, supported);
 break;
@@ -173,13 +174,13 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
 safe_strcpy(parms->xen_ver, str);
 break;
 case XEN_ELFNOTE_PAE_MODE:
-if ( !strcmp(str, "yes") )
+if ( !elf_strcmp_safe(elf, str, "yes") )
 parms->pae = XEN_PAE_EXTCR3;
 if ( strstr(str, "bimodal") )
 parms->pae = XEN_PAE_BIMODAL;
 break;
 case XEN_ELFNOTE_BSD_SYMTAB:
-if ( !strcmp(str, "yes") )
+if ( !elf_strcmp_safe(elf, str, "yes") )
 parms->bsd_symtab = 1;
 break;
 
@@ -255,7 +256,7 @@ static unsigned elf_xen_parse_notes(struct elf_binary *elf,
 note_name = elf_note_name(elf, note);
 if ( note_name == NULL )
 continue;
-if ( strcmp(note_name, "Xen") )
+if ( elf_strcmp_safe(elf, note_name, "Xen") )
 continue;
 if ( elf_xen_parse_note(elf, parms, note) )
 return ELF_NOTE_INVALID;
@@ -315,38 +316,38 @@ elf_errorstatus elf_xen_parse_guest_info(struct 
elf_binary *elf,
 elf_msg(elf, "ELF: %s=\"%s\"\n", name, value);
 
 /* strings */
-if ( !strcmp(name, "LOADER") )
+if ( !elf_strcmp_safe(elf, name, "LOADER") )
 safe_strcpy(parms->loader, value);
-if ( !strcmp(name, "GUEST_OS") )
+if ( !elf_strcmp_safe(elf, name, "GUEST_OS") )
 safe_strcpy(parms->guest_os, value);
-if ( !strcmp(name, "GUEST_VER") )
+if ( !elf_strcmp_safe(elf, name, "GUEST_VER") )
 safe_strcpy(parms->guest_ver, value);
-if ( !strcmp(name, "XEN_VER") )
+if ( !elf_strcmp_safe(elf, name, "XEN_VER") )
 safe_strcpy(parms->xen_ver, value);
-if ( !strcmp(name, "PAE") )
+if ( !elf_strcmp_safe(elf, name, "PAE") )
 {
-if ( !strcmp(value, "yes[extended-cr3]") )
+if ( !elf_strcmp_safe(elf, value, "yes[extended-cr3]") )
 parms->pae = XEN_PAE_EXTCR3;
 else if ( !strncmp(value, "yes", 3) )
 parms->pae = XEN_PAE_YES;
 }
-if ( !strcmp(name, "BSD_SYMTAB") )
+if ( !elf_strcmp_safe(elf, name, "BSD_SYMTAB") )
 parms->bsd_symtab = 1;
 
 /* longs */
-if ( !strcmp(name, "VIRT_BASE") )
+if ( !elf_strcmp_safe(elf, name, "VIRT_BASE") )
 parms->virt_base = strtoull(value, NULL, 0);
-if ( !strcmp(name, "VIRT_ENTRY") )
+if ( !elf_strcmp_safe(elf, name, "VIRT_ENTRY") )
 parms->virt_entry = strtoull(value, NULL, 0);
-if ( !strcmp(name, "ELF_PADDR_OFFSET") )
+if ( !elf_strcmp_safe(elf, name, "ELF_PADDR_OFFSET") )
 parms->elf_paddr_offset = strtoull(value, NULL, 0);
-if ( !strcmp(name, "HYPERCALL_PAGE") )
+if ( !elf_strcmp_safe(elf, name, "HYPERCALL_PAGE") )
 parms->virt_hypercall = (strtoull(value, NULL, 0) << 12) +
 parms->virt_base;
 
 /* other */
-if ( !strcmp(name, "FEATURES") )
-if ( elf_xen_parse_features(value, parms->f_supported,
+if ( !elf_strcmp_safe(elf, name, "FEATURES") )
+if ( elf_xen_parse_features(elf, value, parms->f_supported,
 parms->f_required) )
 return -1;
 }
diff --git a/xen/common/libelf/libelf-private.h 
b/xen/common/libelf/libelf-private.h
index 388c3da..082c572 100644

[Xen-devel] [PATCH 2/8] libelf: loop safety: Pass `elf' to elf_xen_parse_features

2016-12-09 Thread Ian Jackson
Not used yet, so no functional change.  We will need this in a moment.

Signed-off-by: Ian Jackson 
---
 xen/common/libelf/libelf-dominfo.c | 5 +++--
 xen/include/xen/libelf.h   | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/xen/common/libelf/libelf-dominfo.c 
b/xen/common/libelf/libelf-dominfo.c
index a52900c..7f4a6a0 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -32,7 +32,8 @@ static const char *const elf_xen_feature_names[] = {
 static const unsigned elf_xen_features =
 sizeof(elf_xen_feature_names) / sizeof(elf_xen_feature_names[0]);
 
-elf_errorstatus elf_xen_parse_features(const char *features,
+elf_errorstatus elf_xen_parse_features(struct elf_binary *elf,
+   const char *features,
uint32_t *supported,
uint32_t *required)
 {
@@ -202,7 +203,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
 break;
 
 case XEN_ELFNOTE_FEATURES:
-if ( elf_xen_parse_features(str, parms->f_supported,
+if ( elf_xen_parse_features(elf, str, parms->f_supported,
 parms->f_required) )
 return -1;
 break;
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index 294231a..6436bd7 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -452,7 +452,8 @@ static inline int elf_xen_feature_get(int nr, uint32_t * 
addr)
 return !!(addr[nr >> 5] & (1 << (nr & 31)));
 }
 
-int elf_xen_parse_features(const char *features,
+int elf_xen_parse_features(struct elf_binary *elf,
+   const char *features,
uint32_t *supported,
uint32_t *required);
 int elf_xen_parse_note(struct elf_binary *elf,
-- 
2.1.4




Re: [Xen-devel] Intel board support for xen

2016-12-09 Thread Konrad Rzeszutek Wilk
On Fri, Dec 09, 2016 at 06:40:07PM +0530, George John wrote:
> Hi all,
> 
> I am a newbie in xen. I wish to know which all intel platforms support xen
> hypervisor?.

All the ones that can do 64-bit mode.



[Xen-devel] [PATCH] xen/setup: Don't relocate p2m/initrd over existing one

2016-12-09 Thread Ross Lagerwall
When relocating the p2m/initrd, take special care not to relocate it so
that it overlaps with the current location of the p2m/initrd. This is
needed since the full extent of the current location is not marked as a
reserved region in the e820 (and it shouldn't be since it is about to be
moved).

This was seen to happen to a dom0 with a large initial p2m and a small
reserved region in the middle of the initial p2m.
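
The idea can be illustrated outside the kernel with a small sketch. It is not the kernel code: `struct region`, `find_free_area` and the explicit `reserved` array are hypothetical stand-ins (the kernel consults the E820 map and `memblock_is_reserved()`), and unlike the real function this sketch does not reserve the area it returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ UINT64_C(4096)

struct region {
    uint64_t start;
    uint64_t size;
};

static bool in_region(uint64_t addr, const struct region *r)
{
    return addr >= r->start && addr - r->start < r->size;
}

static bool in_any(uint64_t addr, const struct region *r, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (in_region(addr, &r[i]))
            return true;
    return false;
}

/*
 * Find `size` bytes of page-granular space inside one of the usable `map`
 * regions such that no page is reserved and no page lies in `cur`, the
 * area currently occupied by the object being moved.
 * Returns the start address, or 0 if no suitable area exists.
 */
static uint64_t find_free_area(const struct region *map, size_t n_map,
                               const struct region *reserved, size_t n_res,
                               uint64_t size, struct region cur)
{
    size_t i;

    assert(map != NULL || n_map == 0);
    for (i = 0; i < n_map; i++) {
        uint64_t end = map[i].start + map[i].size;
        uint64_t start = map[i].start;

        while (start <= end && size <= end - start) {
            uint64_t addr;
            bool clash = false;

            for (addr = start; addr < start + size; addr += PAGE_SZ) {
                if (in_any(addr, reserved, n_res) || in_region(addr, &cur)) {
                    start = addr + PAGE_SZ;  /* restart after the bad page */
                    clash = true;
                    break;
                }
            }
            if (!clash)
                return start;
        }
    }
    return 0;
}
```

Passing an empty `cur` (size 0) gives the old behaviour in this model, where a candidate area may overlap the pages still holding the p2m or initrd.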

Signed-off-by: Ross Lagerwall 
---
 arch/x86/xen/mmu.c |  4 ++--
 arch/x86/xen/setup.c   | 16 ++--
 arch/x86/xen/xen-ops.h |  5 +++--
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 7d5afdb..bc40325 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2074,7 +2074,7 @@ static phys_addr_t __init xen_early_virt_to_phys(unsigned 
long vaddr)
  * Find a new area for the hypervisor supplied p2m list and relocate the p2m to
  * this area.
  */
-void __init xen_relocate_p2m(void)
+void __init xen_relocate_p2m(phys_addr_t cur_start, phys_addr_t cur_size)
 {
phys_addr_t size, new_area, pt_phys, pmd_phys, pud_phys;
unsigned long p2m_pfn, p2m_pfn_end, n_frames, pfn, pfn_end;
@@ -2092,7 +2092,7 @@ void __init xen_relocate_p2m(void)
n_pud = roundup(size, PGDIR_SIZE) >> PGDIR_SHIFT;
n_frames = n_pte + n_pt + n_pmd + n_pud;
 
-   new_area = xen_find_free_area(PFN_PHYS(n_frames));
+   new_area = xen_find_free_area(PFN_PHYS(n_frames), cur_start, cur_size);
if (!new_area) {
xen_raw_console_write("Can't find new memory area for p2m 
needed due to E820 map conflict\n");
BUG();
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index f8960fc..513c48b 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -634,14 +634,15 @@ bool __init xen_is_e820_reserved(phys_addr_t start, 
phys_addr_t size)
 }
 
 /*
- * Find a free area in physical memory not yet reserved and compliant with
- * E820 map.
+ * Find a free area in physical memory not yet reserved, compliant with the
+ * E820 map and not overlapping with the pre-allocated area.
  * Used to relocate pre-allocated areas like initrd or p2m list which are in
  * conflict with the to be used E820 map.
  * In case no area is found, return 0. Otherwise return the physical address
  * of the area which is already reserved for convenience.
  */
-phys_addr_t __init xen_find_free_area(phys_addr_t size)
+phys_addr_t __init xen_find_free_area(phys_addr_t size, phys_addr_t cur_start,
+ phys_addr_t cur_size)
 {
unsigned mapcnt;
phys_addr_t addr, start;
@@ -652,7 +653,8 @@ phys_addr_t __init xen_find_free_area(phys_addr_t size)
continue;
start = entry->addr;
for (addr = start; addr < start + size; addr += PAGE_SIZE) {
-   if (!memblock_is_reserved(addr))
+   if (!memblock_is_reserved(addr) &&
+   (addr < cur_start || addr >= cur_start + cur_size))
continue;
start = addr + PAGE_SIZE;
if (start + size > entry->addr + entry->size)
@@ -726,7 +728,7 @@ static void __init xen_reserve_xen_mfnlist(void)
xen_raw_console_write("Xen hypervisor allocated p2m list conflicts with 
E820 map\n");
BUG();
 #else
-   xen_relocate_p2m();
+   xen_relocate_p2m(start, size);
 #endif
 }
 
@@ -887,7 +889,9 @@ char * __init xen_memory_setup(void)
 boot_params.hdr.ramdisk_size)) {
phys_addr_t new_area, start, size;
 
-   new_area = xen_find_free_area(boot_params.hdr.ramdisk_size);
+   new_area = xen_find_free_area(boot_params.hdr.ramdisk_size,
+ boot_params.hdr.ramdisk_image,
+ boot_params.hdr.ramdisk_size);
if (!new_area) {
xen_raw_console_write("Can't find new memory area for 
initrd needed due to E820 map conflict\n");
BUG();
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 3cbce3b..d3342b8 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -41,14 +41,15 @@ void __init xen_pt_check_e820(void);
 void xen_mm_pin_all(void);
 void xen_mm_unpin_all(void);
 #ifdef CONFIG_X86_64
-void __init xen_relocate_p2m(void);
+void __init xen_relocate_p2m(phys_addr_t cur_start, phys_addr_t cur_size);
 #endif
 
 bool __init xen_is_e820_reserved(phys_addr_t start, phys_addr_t size);
 unsigned long __ref xen_chk_extra_mem(unsigned long pfn);
 void __init xen_inv_extra_mem(void);
 void __init xen_remap_memory(void);
-phys_addr_t __init xen_find_free_area(phys_addr_t size);
+phys_addr_t __init xen_find_free_area(phys_addr_t size, phys_addr_t cur_start,
+ phys_addr_t cur_size);
 char * __init

[Xen-devel] [PATCH 1/8] libelf: loop safety: Introduce elf_iter_ok and elf_strcmp_safe

2016-12-09 Thread Ian Jackson
This will allow us to keep track of the total amount of work we are
doing.  When it becomes excessive, we mark the ELF broken, and stop
processing.

This is a more robust way of preventing DoS problems by bad images
than attempting to prove, for each of the (sometimes rather deeply
nested) loops, that the total work is "reasonable".  We bound the
notional work by 4x the image size (plus 1M).

Also introduce elf_strcmp_safe, which unconditionally does the work,
but increments the count so any outer loop may be aborted if
necessary.

Currently there are no callers, so no functional change.
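
As a rough standalone illustration of the accounting described above (a sketch, not the patch itself): `struct work_budget`, `budget_init`, `budget_charge` and `budget_strcmp` are hypothetical names, and clamping a zero cost to 1 follows the comment in the header rather than anything shown in the loader code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORK_FACTOR 4u                    /* mirrors ELF_MAX_ITERATION_FACTOR */
#define WORK_SLACK  (1024u * 1024u)       /* the extra 1M of allowance */

struct work_budget {
    uint64_t remaining;   /* notional work still allowed */
    bool broken;          /* set once the budget has been exceeded */
};

/* Allow 4x the image size plus 1M; clamp the size so the product cannot overflow. */
static void budget_init(struct work_budget *b, uint64_t image_size)
{
    const uint64_t max_size = (UINT64_C(1) << 63) / WORK_FACTOR;

    assert(b != NULL);
    b->remaining = WORK_SLACK +
        (image_size > max_size ? max_size : image_size) * WORK_FACTOR;
    b->broken = false;
}

/* Charge `cost` units of work. Returns false once the budget is exceeded. */
static bool budget_charge(struct work_budget *b, uint64_t cost)
{
    if (cost == 0)
        cost = 1;                 /* a zero-length item still counts as a step */
    if (cost > b->remaining)
        b->broken = true;
    if (b->broken)
        return false;
    b->remaining -= cost;
    return true;
}

/* Always performs the comparison, but records its cost so that an outer
 * loop checking the budget can stop. */
static int budget_strcmp(struct work_budget *b, const char *a, const char *fixed)
{
    budget_charge(b, strlen(fixed));
    return strcmp(a, fixed);
}
```

The point of the design is that callers never need to prove a bound for each nested loop; they only need to charge the budget and stop when a charge fails.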

Signed-off-by: Ian Jackson 
---
 xen/common/libelf/libelf-loader.c | 14 ++
 xen/include/xen/libelf.h  | 21 +
 2 files changed, 35 insertions(+)

diff --git a/xen/common/libelf/libelf-loader.c 
b/xen/common/libelf/libelf-loader.c
index a72cd8a..00479af 100644
--- a/xen/common/libelf/libelf-loader.c
+++ b/xen/common/libelf/libelf-loader.c
@@ -38,6 +38,7 @@ elf_errorstatus elf_init(struct elf_binary *elf, const char 
*image_input, size_t
 ELF_HANDLE_DECL(elf_shdr) shdr;
 unsigned i, count, section, link;
 uint64_t offset;
+const uint64_t max_size_for_deacc = (1UL << 63)/ELF_MAX_ITERATION_FACTOR;
 
 if ( !elf_is_elfbinary(image_input, size) )
 {
@@ -52,6 +53,10 @@ elf_errorstatus elf_init(struct elf_binary *elf, const char 
*image_input, size_t
 elf->class = elf_uval_3264(elf, elf->ehdr, e32.e_ident[EI_CLASS]);
 elf->data = elf_uval_3264(elf, elf->ehdr, e32.e_ident[EI_DATA]);
 
+elf->iteration_deaccumulator = 1024*1024 +
+(size > max_size_for_deacc ? max_size_for_deacc : size)
+* ELF_MAX_ITERATION_FACTOR;
+
 /* Sanity check phdr. */
 offset = elf_uval(elf, elf->ehdr, e_phoff) +
 elf_uval(elf, elf->ehdr, e_phentsize) * elf_phdr_count(elf);
@@ -546,6 +551,15 @@ uint64_t elf_lookup_addr(struct elf_binary * elf, const 
char *symbol)
 return value;
 }
 
+bool elf_iter_ok_counted(struct elf_binary *elf, uint64_t maxcopysz) {
+if (maxcopysz > elf->iteration_deaccumulator)
+elf_mark_broken(elf, "excessive iteration - too much work to parse");
+if (elf->broken)
+return false;
+elf->iteration_deaccumulator -= maxcopysz;
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index 1b763f3..294231a 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -56,6 +56,8 @@ typedef void elf_log_callback(struct elf_binary*, void 
*caller_data,
 #define ELF_MAX_STRING_LENGTH 4096
 #define ELF_MAX_TOTAL_NOTE_COUNT 65536
 
+#define ELF_MAX_ITERATION_FACTOR 4
+
 /*  */
 
 /* Macros for accessing the input image and output area. */
@@ -201,6 +203,9 @@ struct elf_binary {
 uint64_t bsd_symtab_pstart;
 uint64_t bsd_symtab_pend;
 
+/* private */
+uint64_t iteration_deaccumulator;
+
 /*
  * caller's other acceptable destination.
  * Set by elf_set_xdest.  Do not set these directly.
@@ -264,6 +269,14 @@ uint64_t elf_access_unsigned(struct elf_binary *elf, 
elf_ptrval ptr,
 
 uint64_t elf_round_up(struct elf_binary *elf, uint64_t addr);
 
+bool elf_iter_ok_counted(struct elf_binary *elf, uint64_t count);
+  /* It is OK for count to be out by a smallish constant factor.
+   * It is OK for count to be 0, as we clamp it to 1, so we
+   * can use lengths or sizes from the image. */
+
+static inline bool elf_iter_ok(struct elf_binary *elf)
+{ return elf_iter_ok_counted(elf,1); }
+
 const char *elf_strval(struct elf_binary *elf, elf_ptrval start);
   /* may return NULL if the string is out of range etc. */
 
@@ -463,6 +476,14 @@ static inline void *elf_memset_unchecked(void *s, int c, 
size_t n)
* memcpy, memset and memmove to undefined MISTAKE things.
*/
 
+static inline int elf_strcmp_safe(struct elf_binary *elf,
+  const char *a, const char *b) {
+elf_iter_ok_counted(elf, strlen(b));
+return strcmp(a,b);
+}
+  /* Unlike other *_safe functions, elf_strcmp_safe is called on
+   * values already extracted from the image (eg by elf_strval),
+   * and fixed constant strings (typically, the latter is "b"). */
 
 /* Advances past amount bytes of the current destination area. */
 static inline void ELF_ADVANCE_DEST(struct elf_binary *elf, uint64_t amount)
-- 
2.1.4




[Xen-devel] [PATCH 6/8] libelf: loop safety cleanup: Remove obsolete check in elf_shdr_count

2016-12-09 Thread Ian Jackson
All the loops which might go out of control, due to excessive shdrs,
have been decorated with elf_iter_ok.  So there is no need for this
explicit (and rather crude) check.

(Anyway, the count was a 16-bit field, so the check was redundant.)

Signed-off-by: Ian Jackson 
---
 xen/common/libelf/libelf-tools.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/xen/common/libelf/libelf-tools.c b/xen/common/libelf/libelf-tools.c
index 7fa5963..b799b56 100644
--- a/xen/common/libelf/libelf-tools.c
+++ b/xen/common/libelf/libelf-tools.c
@@ -131,17 +131,7 @@ uint64_t elf_round_up(struct elf_binary *elf, uint64_t 
addr)
 
 unsigned elf_shdr_count(struct elf_binary *elf)
 {
-unsigned count = elf_uval(elf, elf->ehdr, e_shnum);
-uint64_t max = elf->size / sizeof(Elf32_Shdr);
-
-if ( max > UINT_MAX )
-max = UINT_MAX;
-if ( count > max )
-{
-elf_mark_broken(elf, "far too many section headers");
-count = max;
-}
-return count;
+return elf_uval(elf, elf->ehdr, e_shnum);
 }
 
 unsigned elf_phdr_count(struct elf_binary *elf)
-- 
2.1.4




[Xen-devel] [PATCH 4/8] libelf: loop safety: Call elf_iter_ok_counted at every *mem*_unsafe

2016-12-09 Thread Ian Jackson
When we use elf_mem*_unsafe, we need to check that we are not doing
too much work.

Ensure that a call to elf_iter_ok_counted is near every call to
elf_mem*_unsafe.

(At one call site, just have a comment instead.)
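
A standalone sketch of the pattern, under assumptions: `struct image`, `range_ok`, `charge` and `copy_checked` are hypothetical names, and source and destination are modeled as offsets into one buffer rather than as libelf pointer values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image descriptor carrying its own work budget. */
struct image {
    unsigned char *base;
    size_t size;
    uint64_t work_remaining;
    bool broken;
};

/* True if [off, off + len) lies entirely inside the image. */
static bool range_ok(const struct image *img, size_t off, size_t len)
{
    return off <= img->size && len <= img->size - off;
}

/* Charge `cost` units; marks the image broken when the budget runs out. */
static bool charge(struct image *img, uint64_t cost)
{
    if (cost > img->work_remaining)
        img->broken = true;
    if (img->broken)
        return false;
    img->work_remaining -= cost;
    return true;
}

/*
 * Copy len bytes within the image only if both ranges are valid and the
 * work budget permits it. The budget is charged only after the range
 * checks pass, matching the order of the && chain in the patch.
 */
static bool copy_checked(struct image *img, size_t dst, size_t src, size_t len)
{
    assert(img != NULL);
    if (!range_ok(img, dst, len) || !range_ok(img, src, len) ||
        !charge(img, len))
        return false;

    /* memmove, because the checks above do not rule out overlap */
    memmove(img->base + dst, img->base + src, len);
    return true;
}
```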

Signed-off-by: Ian Jackson 
---
 xen/common/libelf/libelf-dominfo.c | 1 +
 xen/common/libelf/libelf-loader.c  | 2 +-
 xen/common/libelf/libelf-tools.c   | 6 --
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/xen/common/libelf/libelf-dominfo.c 
b/xen/common/libelf/libelf-dominfo.c
index b139e32..87a47d9 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -498,6 +498,7 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
 unsigned total_note_count = 0;
 
 elf_memset_unchecked(parms, 0, sizeof(*parms));
+elf_iter_ok_counted(elf, sizeof(*parms));
 parms->virt_base = UNSET_ADDR;
 parms->virt_entry = UNSET_ADDR;
 parms->virt_hypercall = UNSET_ADDR;
diff --git a/xen/common/libelf/libelf-loader.c 
b/xen/common/libelf/libelf-loader.c
index 68c9021..d5e51d3 100644
--- a/xen/common/libelf/libelf-loader.c
+++ b/xen/common/libelf/libelf-loader.c
@@ -46,7 +46,7 @@ elf_errorstatus elf_init(struct elf_binary *elf, const char 
*image_input, size_t
 return -1;
 }
 
-elf_memset_unchecked(elf, 0, sizeof(*elf));
+elf_memset_unchecked(elf, 0, sizeof(*elf)); /* loop safety: singleton */
 elf->image_base = image_input;
 elf->size = size;
 elf->ehdr = ELF_MAKE_HANDLE(elf_ehdr, (elf_ptrval)image_input);
diff --git a/xen/common/libelf/libelf-tools.c b/xen/common/libelf/libelf-tools.c
index 56dab63..ab83150 100644
--- a/xen/common/libelf/libelf-tools.c
+++ b/xen/common/libelf/libelf-tools.c
@@ -69,7 +69,8 @@ void elf_memcpy_safe(struct elf_binary *elf, elf_ptrval dst,
  elf_ptrval src, size_t size)
 {
 if ( elf_access_ok(elf, dst, size) &&
- elf_access_ok(elf, src, size) )
+ elf_access_ok(elf, src, size) &&
+ elf_iter_ok_counted(elf, size) )
 {
 /* use memmove because these checks do not prove that the
  * regions don't overlap and overlapping regions grant
@@ -80,7 +81,8 @@ void elf_memcpy_safe(struct elf_binary *elf, elf_ptrval dst,
 
 void elf_memset_safe(struct elf_binary *elf, elf_ptrval dst, int c, size_t size)
 {
-if ( elf_access_ok(elf, dst, size) )
+if ( elf_access_ok(elf, dst, size) &&
+ elf_iter_ok_counted(elf, size))
 {
 elf_memset_unchecked(ELF_UNSAFE_PTR(dst), c, size);
 }
-- 
2.1.4




[Xen-devel] [PATCH 3/8] libelf: loop safety: Call elf_iter_ok[_counted] in every loop

2016-12-09 Thread Ian Jackson
In every `for' or `while' loop, either call elf_iter_ok, or explain
why it's not necessary.  This is part of comprehensive defence against
out of control loops.

Signed-off-by: Ian Jackson 
---
 xen/common/libelf/libelf-dominfo.c | 22 +++++++++++++---------
 xen/common/libelf/libelf-loader.c  |  8 ++++----
 xen/common/libelf/libelf-tools.c   |  6 +++---
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index 7f4a6a0..b139e32 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -43,10 +43,13 @@ elf_errorstatus elf_xen_parse_features(struct elf_binary *elf,
 if ( features == NULL )
 return 0;
 
-for ( pos = 0; features[pos] != '\0'; pos += len )
+for ( pos = 0;
+  elf_iter_ok_counted(elf, sizeof(feature)) &&
+  features[pos] != '\0';
+  pos += len )
 {
 elf_memset_unchecked(feature, 0, sizeof(feature));
-for ( len = 0;; len++ )
+for ( len = 0;; len++ ) /* can't do more than sizeof(feature) */
 {
 if ( len >= sizeof(feature)-1 )
 break;
@@ -60,7 +63,7 @@ elf_errorstatus elf_xen_parse_features(struct elf_binary *elf,
 feature[len] = features[pos + len];
 }
 
-for ( i = 0; i < elf_xen_features; i++ )
+for ( i = 0; elf_iter_ok(elf) && i < elf_xen_features; i++ )
 {
 if ( !elf_xen_feature_names[i] )
 continue;
@@ -236,7 +239,7 @@ static unsigned elf_xen_parse_notes(struct elf_binary *elf,
 parms->elf_note_start = start;
 parms->elf_note_end   = end;
 for ( note = ELF_MAKE_HANDLE(elf_note, parms->elf_note_start);
-  ELF_HANDLE_PTRVAL(note) < parms->elf_note_end;
+  elf_iter_ok(elf) && ELF_HANDLE_PTRVAL(note) < parms->elf_note_end;
   note = elf_note_next(elf, note) )
 {
 #ifdef __XEN__
@@ -273,11 +276,12 @@ elf_errorstatus elf_xen_parse_guest_info(struct elf_binary *elf,
 
 h = parms->guest_info;
 #define STAR(h) (elf_access_unsigned(elf, (h), 0, 1))
-while ( STAR(h) )
+while ( elf_iter_ok_counted(elf, sizeof(name) + sizeof(value)) &&
+STAR(h) )
 {
 elf_memset_unchecked(name, 0, sizeof(name));
 elf_memset_unchecked(value, 0, sizeof(value));
-for ( len = 0;; len++, h++ )
+for ( len = 0;; len++, h++ ) /* covered by iter_ok_counted above */
 {
 if ( len >= sizeof(name)-1 )
 break;
@@ -291,7 +295,7 @@ elf_errorstatus elf_xen_parse_guest_info(struct elf_binary *elf,
 if ( STAR(h) == '=' )
 {
 h++;
-for ( len = 0;; len++, h++ )
+for ( len = 0;; len++, h++ ) /* covered by iter_ok_counted */
 {
 if ( len >= sizeof(value)-1 )
 break;
@@ -504,7 +508,7 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
 
 /* Find and parse elf notes. */
 count = elf_phdr_count(elf);
-for ( i = 0; i < count; i++ )
+for ( i = 0; elf_iter_ok(elf) && i < count; i++ )
 {
 phdr = elf_phdr_by_index(elf, i);
 if ( !elf_access_ok(elf, ELF_HANDLE_PTRVAL(phdr), 1) )
@@ -537,7 +541,7 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
 if ( xen_elfnotes == 0 )
 {
 count = elf_shdr_count(elf);
-for ( i = 1; i < count; i++ )
+for ( i = 1; elf_iter_ok(elf) && i < count; i++ )
 {
 shdr = elf_shdr_by_index(elf, i);
 if ( !elf_access_ok(elf, ELF_HANDLE_PTRVAL(shdr), 1) )
diff --git a/xen/common/libelf/libelf-loader.c b/xen/common/libelf/libelf-loader.c
index 00479af..68c9021 100644
--- a/xen/common/libelf/libelf-loader.c
+++ b/xen/common/libelf/libelf-loader.c
@@ -85,7 +85,7 @@ elf_errorstatus elf_init(struct elf_binary *elf, const char *image_input, size_t
 
 /* Find symbol table and symbol string table. */
 count = elf_shdr_count(elf);
-for ( i = 1; i < count; i++ )
+for ( i = 1; elf_iter_ok(elf) && i < count; i++ )
 {
 shdr = elf_shdr_by_index(elf, i);
 if ( !elf_access_ok(elf, ELF_HANDLE_PTRVAL(shdr), 1) )
@@ -425,7 +425,7 @@ do { \
  * NB: this _must_ be done one by one, and taking the bitness into account,
  * so that the guest can treat this as an array of type Elf{32/64}_Shdr.
  */
-for ( i = 0; i < ELF_BSDSYM_SECTIONS; i++ )
+for ( i = 0; elf_iter_ok(elf) && i < ELF_BSDSYM_SECTIONS; i++ )
 {
 rc = elf_load_image(elf, header_base + header_size + shdr_size * i,
 ELF_REALPTR2PTRVAL(&header.elf_header.section[i]),
@@ -453,7 +453,7 @@ void elf_parse_binary(struct elf_binary *elf)
 unsigned i, count;
 
 count = elf_phdr_count(elf);
-for ( i = 0; i < count; i++ )
+for ( i = 0; elf_iter_ok(elf) && i < c

[Xen-devel] [PATCH 8/8] libelf: safety: Document safety principles in header file

2016-12-09 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 xen/include/xen/libelf.h | 92 
 1 file changed, 92 insertions(+)

diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index 6436bd7..8b75242 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -60,6 +60,96 @@ typedef void elf_log_callback(struct elf_binary*, void *caller_data,
 
 /*  */
 
+/*
+ * DESIGN PRINCIPLES FOR THE SAFETY OF LIBELF
+ *
+ * libelf is a complex piece of code on a security boundary: when
+ * built as part of the tools, it parses guest kernels and loads them
+ * into guest memory.  Bugs in libelf can become privilege escalation
+ * or denial of service bugs in the toolstack.
+ *
+ * We try to reduce the risk of such bugs by writing the actual format
+ * parsing in a mostly-safe subset of C.  To avoid nonlocal exits or
+ * the need for explicit error-checking code, we make all references
+ * into the input image, or into guest memory, via an inherently safe
+ * wrapper system.
+ *
+ * This means that it is safe to simply honour the instructions from
+ * the image, even if they are nonsense.  If the image implies wild
+ * pointer accesses, these will be harmlessly defused; a note will be
+ * made that things are broken; and processing can safely continue
+ * despite some of the operations having not been done.  Eventually
+ * the error will be reported.
+ *
+ *
+ * To preserve these safety properties, there are some rules that
+ * programmers editing libelf need to follow:
+ *
+ *  - Any loop needs to be accompanied by calls to elf_iter_ok (or
+ *elf_iter_ok_counted).
+ *
+ *Rationale: the image must not be able to cause libelf to do
+ *unbounded work (ie, get stuck in a loop).
+ *
+ *  - The input image and output area must be accessed only via the
+ *safe pointer access system.  Real pointers into the input or
+ *output may not be even *calculated*.
+ *
+ *Rationale: calculating wild pointers is undefined behaviour;
+ *if the compiler sees that you might be calculating wild
+ *pointers, it may remove important checks!
+ *
+ *  - Stack local buffer variables containing information derived from
+ *the image (including structs, or byte buffers) must be
+ *completely zeroed, using elf_memset_unchecked (and an
+ *accompanying elf_iter_ok_counted) on entry to the function (or
+ *somewhere very obviously near there).
+ *
+ *Rationale: This avoids uninitialised stack data being used
+ *as input to any of the loader.
+ *
+ *  - All integer variables should be unsigned.
+ *
+ *Rationale: this avoids signed integer overflow (which has
+ *undefined behaviour in C, and if spotted by the compiler can
+ *cause it to generate bad code).
+ *
+ *  - Arithmetic operations other than + - * should be avoided; in
+ *particular, division (/ or %) by non-constant values should be
+ *avoided.  If it cannot be avoided, the divisor must be checked
+ *for zero.
+ *
+ *Rationale: We must avoid division-by-zero (or other overflow
+ *traps).
+ *
+ *  - If it is desirable to breach these rules, there should be a
+ *comment explaining why this is OK.
+ *
+ * Even so, this is a fairly high-risk environment, so:
+ *
+ *  - Do not add code which is not necessary for libelf to function
+ *with correct input, or to avoid being vulnerable to incorrect
+ *input.  Do not add additional functionally-unnecessary checks
+ *for diagnosing problems with the image, or validating sanity of
+ *the input ELF.
+ *
+ *Rationale: Redundant checks have almost zero benefit because
+ *1. we do not expect ELF-generating tools to generate invalid
+ *ELFs so these checks' failure paths will very likely never
+ *be executed anywhere, and 2. anyone debugging such a
+ *hypothetical bad ELF will have a variety of tools available
+ *which will do a much better job of analysing it.
+ *
+ *  - However, it is OK to have checking code which provides duplicate
+ *defence against certain hostile images, if it is not otherwise
+ *obvious how libelf would be defended against such images.
+ *
+ *Rationale: Redundant checks where the situation would
+ *otherwise not be quite clear mean that the safety of the
+ *code is easy to see throughout; so that any unsafe code
+ *would be more obvious.
+ */
+
 /* Macros for accessing the input image and output area. */
 
 /*
@@ -475,6 +565,8 @@ static inline void *elf_memset_unchecked(void *s, int c, size_t n)
* pointers.  These are just like the real functions.
* We provide these so that in libelf-private.h we can #define
* memcpy, memset and memmove to undefined MISTAKE things.
+   *
+   * Remember to call elf_iter_ok_counted nearby.
*/
 
 static inline int elf_strcmp_safe(struct e

[Xen-devel] [PATCH 0/8] libelf: safety enhancements

2016-12-09 Thread Ian Jackson
We recently discovered two near-misses in libelf:

* The intended method for limiting the phdr loop iteration count was
  not effective.  But happily this turned out not to be important
  because the count field is only 16 bits.

* A recent commit accidentally introduced a division by zero
  vulnerability.

Subsequent discussion revealed that the design principles underlying
libelf's safety were not widely understood - because they were not
documented.

Initially I tried dealing with the loop safety problem by auditing the
code and adding a suitable comment next to each loop, stating a proof
sketch of the loop's safety.  I found that this quickly became
unworkable, because there are nested loops.  These nested loops did
not have badly unreasonable upper bounds but the complexity of the
analysis was unsuitable for security-critical review.

An upper bound on the work done in loops in libelf is necessary
because libelf may be called by the toolstack in a context where it
would block other work.  Specifically, libelf is called by libxl, and
libxl does all of its work within a single per-ctx lock.  libxl's
callers are not supposed to be required to invoke libxl on multiple
ctxs or with multiple processes simultaneously, and in any case
we don't want to generate and leak stuck toolstack processes.

So, in this series, I propose:

 * A new scheme for limiting the work done by libelf.  We track it
   explicitly, and check it on each iteration of every loop.  (This
   replaces a similar ad-hoc scheme used for copying image data.)

 * Documentation which states the safety design principles for libelf,
   and the coding rules which follow from those design principles.
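
For readers who have not seen the patches, the work-limiting scheme
can be sketched roughly as follows.  This is an illustration only and
not the code from the series: `struct work_budget`, `budget_iter_ok`,
`budget_iter_ok_counted` and `demo_runaway_loop` are hypothetical
stand-ins (the real functions are elf_iter_ok and elf_iter_ok_counted,
operating on struct elf_binary), and the limit is simply whatever the
caller chooses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant parts of struct elf_binary. */
struct work_budget {
    uint64_t done;    /* units of work charged so far (always <= limit) */
    uint64_t limit;   /* maximum units of work allowed */
    bool broken;      /* set once the limit is hit; never cleared */
};

/* Charge `count` units of work; return false once the budget is spent. */
static bool budget_iter_ok_counted(struct work_budget *b, uint64_t count)
{
    if ( b->broken )
        return false;
    if ( count > b->limit - b->done )   /* cannot wrap: done <= limit */
    {
        b->broken = true;
        return false;
    }
    b->done += count;
    return true;
}

/* Charge a single unit of work, e.g. one loop iteration. */
static bool budget_iter_ok(struct work_budget *b)
{
    return budget_iter_ok_counted(b, 1);
}

/*
 * A loop with no exit condition of its own: only the budget check
 * stops it.  Returns the number of iterations that were allowed.
 */
static unsigned demo_runaway_loop(uint64_t limit)
{
    struct work_budget b = { .done = 0, .limit = limit, .broken = false };
    unsigned iterations = 0;

    while ( budget_iter_ok(&b) )
        iterations++;

    return iterations;
}
```

The point of the sketch is that the check lives in the loop condition
itself, so the bound on total work does not depend on any per-loop
argument about the image contents.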

After this series is done there are a few redundant loop safety
checks, from the previous approach:

 * There are a number of ad-hoc limits on string sizes, certain table
   sizes, etc.

 * There are calls to elf_access_ok which were intended to limit loop
   iteration counts (but are ineffective at doing so since the stride
   is controlled by the input image and might be zero).

I have chosen to retain these.  Removing them seems like an
unnecessary risk.  In particular, searching for and removing
certain elf_access_ok calls seems unwise.
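
The zero-stride problem with bounds checks as loop limits can be
illustrated with a small sketch.  All names here are hypothetical
(`access_ok` and `walk_records` are not libelf functions), and the
`cap` parameter exists only so that the demonstration itself always
terminates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounds check: is [off, off+len) inside `size` bytes? */
static bool access_ok(size_t size, size_t off, size_t len)
{
    return off <= size && len <= size - off;
}

/*
 * Walk records from offset 0, advancing by an image-controlled stride.
 * Returns the number of iterations performed, giving up after `cap`.
 */
static unsigned walk_records(size_t size, size_t stride, unsigned cap)
{
    size_t off = 0;
    unsigned n = 0;

    while ( n < cap && access_ok(size, off, 1) )
    {
        n++;
        off += stride;   /* stride == 0: off never moves */
    }
    return n;
}
```

With a 64-byte image and a stride of 16, the bounds check ends the
walk after four records.  With a stride of zero every access stays in
bounds, so only the separate cap stops the loop; that role is what an
explicit work counter is meant to fill.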

Thanks,
Ian.



[Xen-devel] [PATCH 7/8] libelf: loop safety cleanup: Remove superseded image size copy check

2016-12-09 Thread Ian Jackson
Now, elf_load_image eventually calls elf_memcpy_safe, which calls
elf_iter_ok_counted.

So there is a work limit of 4x the image size.  This is larger than
the previous limit of 2x the image size, but it includes a lot of
other processing too.  And the purpose is to reject bad images without
a significant risk of rejecting sane ones.  A 4x limit is tight
enough.

So this ad-hoc remain_allow_copy check has been entirely superseded
and can be removed.

Signed-off-by: Ian Jackson 
---
 xen/common/libelf/libelf-loader.c | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/xen/common/libelf/libelf-loader.c b/xen/common/libelf/libelf-loader.c
index d5e51d3..5e4671b 100644
--- a/xen/common/libelf/libelf-loader.c
+++ b/xen/common/libelf/libelf-loader.c
@@ -482,12 +482,6 @@ elf_errorstatus elf_load_binary(struct elf_binary *elf)
 uint64_t paddr, offset, filesz, memsz;
 unsigned i, count;
 elf_ptrval dest;
-/*
- * Let bizarre ELFs write the output image up to twice; this
- * calculation is just to ensure our copying loop is no worse than
- * O(domain_size).
- */
-uint64_t remain_allow_copy = (uint64_t)elf->dest_size * 2;
 
 count = elf_phdr_count(elf);
 for ( i = 0; elf_iter_ok(elf) && i < count; i++ )
@@ -504,19 +498,6 @@ elf_errorstatus elf_load_binary(struct elf_binary *elf)
 memsz = elf_uval(elf, phdr, p_memsz);
 dest = elf_get_ptr(elf, paddr);
 
-/*
- * We need to check that the input image doesn't have us copy
- * the whole image zillions of times, as that could lead to
- * O(n^2) time behaviour and possible DoS by a malicous ELF.
- */
-if ( remain_allow_copy < memsz )
-{
-elf_mark_broken(elf, "program segments total to more"
-" than the input image size");
-break;
-}
-remain_allow_copy -= memsz;
-
 elf_msg(elf,
 "ELF: phdr %u at %#"ELF_PRPTRVAL" -> %#"ELF_PRPTRVAL"\n",
 i, dest, (elf_ptrval)(dest + filesz));
-- 
2.1.4



