date:20210111

[libvirt test] 158374: regressions - FAIL

2021-01-11 Thread osstest service owner

flight 158374 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158374/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-libvirt   6 libvirt-build   fail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-build   fail REGR. vs. 151777
 build-armhf-libvirt   6 libvirt-build   fail REGR. vs. 151777
 build-i386-libvirt    6 libvirt-build   fail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  33ecb95afd2993a3f28a984219b6e0e771901e99
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  186 days
Failing since        151818  2020-07-11 04:18:52 Z  185 days  180 attempts
Testing same since   158374  2021-01-12 04:19:31 Z    0 days    1 attempts


People who touched revisions under test:
  Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Andika Triwidada 
  Andrea Bolognani 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastien Orivel 
  Bihong Yu 
  Binfeng Wu 
  Boris Fiuczynski 
  Brian Turek 
  Christian Ehrhardt 
  Christian Schoenebeck 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Eiichi Tsukata 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Ian Wienand 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  Jean-Baptiste Holcroft 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jiri Denemark 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Laine Stump 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Neal Gompa 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Orion Poplawski 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Ricky Tigg 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  Shalini Chellathurai Saroja 
  Shaojun Yang 
  Shi Lei 
  Simon Gaiser 
  Stefan Bader 
  Stefan Berger 
  Szymon Scholz 
  Thomas Huth 
  Tim Wiederhake 
  Tomáš Golembiovský 
  Tomáš Janoušek 
  Tuguoyi 
  Wang Xin 
  Weblate 
  Yang Hang 
  Yanqiu Zhang 
  Yi Li 
  Yi Wang 
  Yuri Chornoivan 
  Zheng Chuan 
  zhenwei pi 
  Zhenyu Zheng 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-arm64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvops  pass
 build-arm64-pvops

Re: [RFC PATCH v3 0/6] Restricted DMA

2021-01-11 Thread Claire Chang

On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli  wrote:
>
> On 1/7/21 9:42 AM, Claire Chang wrote:
>
> >> Can you explain how ATF gets involved and to what extent it does help,
> >> besides enforcing a secure region from the ARM CPU's perspective? Does
> >> the PCIe root complex not have an IOMMU but can somehow be denied access
> >> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> >> still some sort of basic protection that the HW enforces, right?
> >
> > We need the ATF support for memory MPU (memory protection unit).
> > Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> > region is for PCIe DMA only, but we still need the MPU to lock down PCIe
> > access to that specific region.
>
> OK so you do have a protection unit of some sort to enforce which region
> in DRAM the PCIE bridge is allowed to access, that makes sense,
> otherwise the restricted DMA region would only be a hint but nothing you
> can really enforce. This is almost entirely analogous to our systems then.

Here is the example of setting the MPU:
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

>
> There may be some value in standardizing on an ARM SMCCC call then since
> you already support two different SoC vendors.
>
> >
> >>
> >> On Broadcom STB SoCs we have had something similar for a while however
> >> and while we don't have an IOMMU for the PCIe bridge, we do have a
> >> basic protection mechanism whereby we can configure a region in DRAM to
> >> be PCIe read/write and CPU read/write which then gets used as the PCIe
> >> inbound region for the PCIe EP. By default the PCIe bridge is not
> >> allowed access to DRAM so we must call into a security agent to allow
> >> the PCIe bridge to access the designated DRAM region.
> >>
> >> We have done this using a private CMA area region assigned via Device
> >> Tree, assigned with a and requiring the PCIe EP driver to use
> >> dma_alloc_from_contiguous() in order to allocate from this device
> >> private CMA area. The only drawback with that approach is that it
> >> requires knowing how much memory you need up front for buffers and DMA
> >> descriptors that the PCIe EP will need to process. The problem is that
> >> it requires driver modifications and that does not scale over the number
> >> of PCIe EP drivers, some we absolutely do not control, but there is no
> >> need to bounce buffer. Your approach scales better across PCIe EP
> >> drivers however it does require bounce buffering which could be a
> >> performance hit.
> >
> > Only the streaming DMA (map/unmap) needs bounce buffering.
>
> True, and typically only on transmit since you don't really control
> where the sk_buff are allocated from, right? On RX since you need to
> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
> them from a pool that already falls within the restricted DMA region, right?
>

Right, but applying bounce buffering to RX will make it more secure.
The device won't be able to modify the content after unmap. Just like what
iommu_unmap does.
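
To make the point about RX concrete, here is a minimal user-space sketch of
bounce buffering. It is a simplified model only, not the swiotlb or DMA API:
the names bounce_pool, bounce_map and bounce_unmap are hypothetical, and the
"device" is simulated by writing to the pool directly. The driver's buffer is
filled exactly once, at unmap time, so a write the device makes to the pool
afterwards never reaches it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of a restricted DMA pool. */
#define POOL_SIZE 4096

enum dma_dir { DMA_TO_DEV, DMA_FROM_DEV };

struct bounce_pool {
    uint8_t mem[POOL_SIZE];   /* the only memory the device may touch */
    size_t  used;             /* bump-allocator offset into mem */
};

/* Map: reserve a slot in the pool; copy data in for device-bound transfers. */
static uint8_t *bounce_map(struct bounce_pool *p, const void *buf, size_t len,
                           enum dma_dir dir)
{
    uint8_t *slot;

    if (len > POOL_SIZE - p->used)
        return NULL;          /* pool exhausted */
    slot = p->mem + p->used;
    p->used += len;
    if (dir == DMA_TO_DEV)
        memcpy(slot, buf, len);
    return slot;
}

/* Unmap: copy device-written data out once. Later device writes to the
 * slot stay in the pool and cannot alter buf. */
static void bounce_unmap(void *buf, const uint8_t *slot, size_t len,
                         enum dma_dir dir)
{
    if (dir == DMA_FROM_DEV)
        memcpy(buf, slot, len);
}
```

The cost is one extra copy per streaming transfer, which is the overhead
measured in the throughput numbers quoted below.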

> > I also added alloc/free support in this series
> > (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
> > try to allocate memory from the predefined memory region.
> >
> > As for the performance hit, it should be similar to the default swiotlb.
> > Here are my experiment results. Both SoCs lack IOMMU for PCIe.
> >
> > PCIe wifi vht80 throughput -
> >
> >   MTK SoC              tcp_tx   tcp_rx   udp_tx   udp_rx
> >   w/o Restricted DMA   244.1    134.66   312.56   350.79
> >   w/ Restricted DMA    246.95   136.59   363.21   351.99
> >
> >   Rockchip SoC         tcp_tx   tcp_rx   udp_tx   udp_rx
> >   w/o Restricted DMA   237.87   133.86   288.28   361.88
> >   w/ Restricted DMA    256.01   130.95   292.28   353.19
>
> How come you get better throughput with restricted DMA? Is it because
> doing DMA to/from a contiguous region allows for better grouping of
> transactions from the DRAM controller's perspective somehow?

I'm not sure, but actually, enabling the default swiotlb for wifi also helps the
throughput a little bit for me.

>
> >
> > The CPU usage doesn't increase too much either.
> > Although I didn't measure the CPU usage very precisely, it's ~3% with a single
> > big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).
> >
> > Thanks!
> >
> >>
> >> Thanks!
> >> --
> >> Florian
>
>
> --
> Florian

Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

2021-01-11 Thread Claire Chang

On Fri, Jan 8, 2021 at 2:15 AM Florian Fainelli  wrote:
>
> On 1/7/21 10:00 AM, Konrad Rzeszutek Wilk wrote:
> >>>
> >>>
> >>>  - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> >>>    That is if an errant device screwed up the length or DMA address, the
> >>>    SWIOTLB would gladly do what the device told it to do?
> >>
> >> So the system needs to provide a way to lock down the memory access, e.g. MPU.
> >
> > OK! Would it be prudent to have this in the description above perhaps?
>
> Yes this is something that must be documented as a requirement for the
> restricted DMA pool users, otherwise attempting to do restricted DMA
> pool is no different than say, using a device private CMA region.
> Without the enforcement, this is just a best effort.

Will add in the next version.

> --
> Florian

Re: [PATCH] xen/privcmd: allow fetching resource sizes

2021-01-11 Thread Jürgen Groß


On 12.01.21 06:50, Jürgen Groß wrote:

On 11.01.21 23:39, Andrew Cooper wrote:

On 11/01/2021 22:09, boris.ostrov...@oracle.com wrote:

On 1/11/21 10:29 AM, Roger Pau Monne wrote:

+    xdata.domid = kdata.dom;
+    xdata.type = kdata.type;
+    xdata.id = kdata.id;
+
+    if (!kdata.addr && !kdata.num) {


I think we should not allow only one of them to be zero. If it's only
kdata.num then we will end up with pfns array set to ZERO_SIZE_PTR
(which is 0x10). We seem to be OK in that we are not dereferencing pfns
(either in kernel or in hypervisor) if number of frames is zero but
IMO we shouldn't be tempting fate.

(And if it's only kdata.addr then we will get a vma but I am not sure
it will do what we want.)


Passing addr == 0 without num being 0 is already an error in Xen, and
passing num == 0 without addr being 0 is bogus and will be an error by
the time I'm finished fixing this.

FWIW, the common usecase for non-trivial examples will be:

xenforeignmem_resource_size(domid, type, id, &size);
xenforeignmem_map_resource(domid, type, id, NULL, size, ...);

which translates into:

ioctl(MAP_RESOURCE, NULL, 0) => size
mmap(NULL, size, ...) => ptr
ioctl(MAP_RESOURCE, ptr, size)

from the kernel's point of view, and two hypercalls from Xen's point of
view.  The NULLs above are expected to be the common case for letting
the kernel choose the vma, but ought to be filled in by the time the
second ioctl() occurs.

See
https://lore.kernel.org/xen-devel/20200922182444.12350-1-andrew.coop...@citrix.com/T/#u
for all the gory details.


I don't think the kernel should rely on the hypervisor to return
an error in case addr != 0 and num == 0.

The driver should return -EINVAL in that case IMO.


And additionally I think the kernel should check that num is not too
large (in the interface it is u64, while intermediate values are
stored in unsigned int); limiting it to something below INT_MAX
seems sensible.
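
A minimal sketch of the argument checks discussed in this thread, written as
a stand-alone helper. This is not the privcmd code: the function name is
hypothetical, and treating "below INT_MAX" as "reject num >= INT_MAX" is one
reading of the suggestion, not a settled limit.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 0 when the (addr, num) pair is acceptable
 * and -EINVAL otherwise.
 */
static int check_mmap_resource_args(uint64_t addr, uint64_t num)
{
    /* Both zero: size query, no VMA is needed. */
    if (addr == 0 && num == 0)
        return 0;

    /* Exactly one of them zero: reject here instead of relying on Xen. */
    if (addr == 0 || num == 0)
        return -EINVAL;

    /* num is u64 in the interface but held in unsigned int internally. */
    if (num >= INT_MAX)
        return -EINVAL;

    return 0;
}
```

With a check of this shape the driver fails early for addr != 0 with
num == 0, whatever the hypervisor does with that combination.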


Juergen



Re: [PATCH] xen/privcmd: allow fetching resource sizes

2021-01-11 Thread Jürgen Groß


On 11.01.21 16:29, Roger Pau Monne wrote:

Allow issuing an IOCTL_PRIVCMD_MMAP_RESOURCE ioctl with num = 0 and
addr = 0 in order to fetch the size of a specific resource.

Add a shortcut to the default map resource path, since fetching the
size requires no address to be passed in, and thus no VMA to set up.

Fixes: 3ad0876554caf ('xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE')


I don't think this addition is a reason to add a "Fixes:" tag. This is
clearly new functionality.


Juergen



Re: [PATCH] xen/privcmd: allow fetching resource sizes

2021-01-11 Thread Jürgen Groß


On 11.01.21 23:39, Andrew Cooper wrote:

On 11/01/2021 22:09, boris.ostrov...@oracle.com wrote:

On 1/11/21 10:29 AM, Roger Pau Monne wrote:
  
+    xdata.domid = kdata.dom;
+    xdata.type = kdata.type;
+    xdata.id = kdata.id;
+
+    if (!kdata.addr && !kdata.num) {


I think we should not allow only one of them to be zero. If it's only
kdata.num then we will end up with pfns array set to ZERO_SIZE_PTR
(which is 0x10). We seem to be OK in that we are not dereferencing pfns
(either in kernel or in hypervisor) if number of frames is zero but
IMO we shouldn't be tempting fate.

(And if it's only kdata.addr then we will get a vma but I am not sure
it will do what we want.)


Passing addr == 0 without num being 0 is already an error in Xen, and
passing num == 0 without addr being 0 is bogus and will be an error by
the time I'm finished fixing this.

FWIW, the common usecase for non-trivial examples will be:

xenforeignmem_resource_size(domid, type, id, &size);
xenforeignmem_map_resource(domid, type, id, NULL, size, ...);

which translates into:

ioctl(MAP_RESOURCE, NULL, 0) => size
mmap(NULL, size, ...) => ptr
ioctl(MAP_RESOURCE, ptr, size)

from the kernel's point of view, and two hypercalls from Xen's point of
view.  The NULLs above are expected to be the common case for letting
the kernel choose the vma, but ought to be filled in by the time the
second ioctl() occurs.

See
https://lore.kernel.org/xen-devel/20200922182444.12350-1-andrew.coop...@citrix.com/T/#u
for all the gory details.


I don't think the kernel should rely on the hypervisor to return
an error in case addr != 0 and num == 0.

The driver should return -EINVAL in that case IMO.


Juergen



[PATCH v2 1/2] viridian: remove implicit limit of 64 VPs per partition

2021-01-11 Thread Igor Druzhinin

TLFS 7.8.1 stipulates that "a virtual processor index must be less than
the maximum number of virtual processors per partition" that "can be obtained
through CPUID leaf 0x40000005". Furthermore, "Requirements for Implementing
the Microsoft Hypervisor Interface" defines that starting from Windows Server
2012, which allowed more than 64 CPUs to be brought up, this leaf can now
contain a value -1 basically assuming the hypervisor has no restriction while
0 (that we currently expose) means the default restriction is still present.

Along with the previous changes exposing ExProcessorMasks this allows a recent
Windows VM with Viridian extension enabled to have more than 64 vCPUs without
going into BSOD in some cases.

Since we didn't expose the leaf before and to keep CPUID data consistent for
incoming streams from previous Xen versions - let's keep it behind an option.
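
As a rough illustration of the behaviour described above, the value reported
for the limits leaf can be sketched as a function of the domain's Viridian
feature mask. This is a stand-alone sketch, not the patch itself: the helper
name is hypothetical, and the bit position of HVMPV_no_vp_limit follows the
enumeration value 11 added to libxl_types.idl in this patch.

```c
#include <stdint.h>

/* Bit position assumed from the enumeration value 11 used in this patch. */
#define HVMPV_no_vp_limit (1u << 11)

/*
 * Hypothetical helper: maximum-VP value reported in leaf 5 of the
 * Viridian CPUID range ("case 5" in the viridian.c hunk).
 */
static uint32_t viridian_leaf5_max_vps(uint64_t feature_mask)
{
    if (feature_mask & HVMPV_no_vp_limit)
        return UINT32_MAX;    /* -1: no hypervisor-imposed limit */

    return 0;                 /* default restriction still applies */
}
```

Keeping the default at 0 when the option is off preserves the CPUID data
seen by guests migrated from older Xen versions.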

Signed-off-by: Igor Druzhinin 
---
Changes in v2:
- expose the option in libxl
---
 docs/man/xl.cfg.5.pod.in             |  9 -
 tools/include/libxl.h                |  6 ++
 tools/libs/light/libxl_types.idl     |  1 +
 tools/libs/light/libxl_x86.c         |  4 
 xen/arch/x86/hvm/viridian/viridian.c | 23 +++
 xen/include/public/hvm/params.h      |  7 ++-
 6 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index c8e017f..3467eae 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2260,11 +2260,18 @@ mask. Hence this enlightenment must be specified for guests with more
 than 64 vCPUs if B and/or B are also
 specified.
 
+=item B
+
+This group when set indicates to a guest that the hypervisor does not
+explicitly have any limits on the number of Virtual processors a guest
+is allowed to bring up. It is strongly recommended to keep this enabled
+for guests with more than 64 vCPUs.
+
 =item B
 
 This is a special value that enables the default set of groups, which
 is currently the B, B, B, B,
-B and B groups.
+B, B and B groups.
 
 =item B
 
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 3433c95..be1e288 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -452,6 +452,12 @@
 #define LIBXL_HAVE_VIRIDIAN_EX_PROCESSOR_MASKS 1
 
 /*
+ * LIBXL_HAVE_VIRIDIAN_NO_VP_LIMIT indicates that the 'no_vp_limit' value
+ * is present in the viridian enlightenment enumeration.
+ */
+#define LIBXL_HAVE_VIRIDIAN_NO_VP_LIMIT 1
+
+/*
  * LIBXL_HAVE_DEVICE_PCI_LIST_FREE indicates that the
  * libxl_device_pci_list_free() function is defined.
  */
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 0532473..8502b29 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -239,6 +239,7 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [
 (8, "stimer"),
 (9, "hcall_ipi"),
 (10, "ex_processor_masks"),
+(11, "no_vp_limit"),
 ])
 
 libxl_hdtype = Enumeration("hdtype", [
diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index 86d2729..5c4c194 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -309,6 +309,7 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
 libxl_bitmap_set(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_TIME_REF_COUNT);
 libxl_bitmap_set(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_APIC_ASSIST);
 libxl_bitmap_set(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_CRASH_CTL);
+libxl_bitmap_set(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_NO_VP_LIMIT);
 }
 
 libxl_for_each_set_bit(v, info->u.hvm.viridian_enable) {
@@ -369,6 +370,9 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
 if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_EX_PROCESSOR_MASKS))
 mask |= HVMPV_ex_processor_masks;
 
+if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_NO_VP_LIMIT))
+mask |= HVMPV_no_vp_limit;
+
 if (mask != 0 &&
 xc_hvm_param_set(CTX->xch,
  domid,
diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index ed97804..ae1ea86 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -209,6 +209,29 @@ void cpuid_viridian_leaves(const struct vcpu *v, uint32_t leaf,
 res->b = viridian_spinlock_retry_count;
 break;
 
+case 5:
+/*
+ * From "Requirements for Implementing the Microsoft Hypervisor
+ *  Interface":
+ *
+ * "On Windows operating systems versions through Windows Server
+ * 2008 R2, reporting the HV#1 hypervisor interface limits
+ * the Windows virtual machine to a maximum of 64 VPs, regardless of
+ * what is reported via CPUID.40000005.EAX.
+ *
+ * Starting with Windows Server 2012 and Windows 8, if
+ *

[PATCH v2 2/2] viridian: allow vCPU hotplug for Windows VMs

2021-01-11 Thread Igor Druzhinin

If Viridian extensions are enabled, Windows wouldn't currently allow
a hotplugged vCPU to be brought up dynamically. We need to expose a special
bit to let the guest know we allow it. Hide it behind an option to stay
on the safe side regarding compatibility with existing guests but
nevertheless set the option on by default.
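
The viridian.c hunk in this patch also changes the CRASH_MSRS assignment
from '=' to '|=', which is needed once more than one bit can be set in EDX.
A stand-alone sketch of that accumulation follows. The function name is
hypothetical; the CPUID3D_* values and the cpu_hotplug bit come from this
patch, while the positions used for HVMPV_crash_ctl and HVMPV_synic are
assumptions made only for the example.

```c
#include <stdint.h>

/* CPUID leaf 3 EDX bits, as defined in the viridian.c hunk. */
#define CPUID3D_CPU_DYNAMIC_PARTITIONING (1u << 3)
#define CPUID3D_CRASH_MSRS               (1u << 10)
#define CPUID3D_SINT_POLLING             (1u << 17)

/* Feature-mask bits: cpu_hotplug is from this patch; the other two
 * positions are assumed for illustration. */
#define HVMPV_crash_ctl   (1u << 6)
#define HVMPV_synic       (1u << 7)
#define HVMPV_cpu_hotplug (1u << 12)

/* Hypothetical helper: build leaf 3 EDX from the feature mask. */
static uint32_t viridian_leaf3_edx(uint64_t feature_mask)
{
    uint32_t d = 0;

    if (feature_mask & HVMPV_cpu_hotplug)
        d |= CPUID3D_CPU_DYNAMIC_PARTITIONING;
    if (feature_mask & HVMPV_crash_ctl)
        d |= CPUID3D_CRASH_MSRS;      /* '|=' keeps the hotplug bit */
    if (feature_mask & HVMPV_synic)
        d |= CPUID3D_SINT_POLLING;

    return d;
}
```

Had the crash_ctl branch kept a plain assignment, enabling both options
would silently drop the dynamic partitioning bit.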

Signed-off-by: Igor Druzhinin 
---
Changes on v2:
- hide the bit under an option and expose it in libxl
---
 docs/man/xl.cfg.5.pod.in             | 7 ++-
 tools/include/libxl.h                | 6 ++
 tools/libs/light/libxl_types.idl     | 1 +
 tools/libs/light/libxl_x86.c         | 4 
 xen/arch/x86/hvm/viridian/viridian.c | 5 -
 xen/include/public/hvm/params.h      | 7 ++-
 6 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 3467eae..7cdb859 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2267,11 +2267,16 @@ explicitly have any limits on the number of Virtual processors a guest
 is allowed to bring up. It is strongly recommended to keep this enabled
 for guests with more than 64 vCPUs.
 
+=item B
+
+This set enables dynamic changes to Virtual processor states in Windows
+guests effectively allowing vCPU hotplug.
+
 =item B
 
 This is a special value that enables the default set of groups, which
 is currently the B, B, B, B,
-B, B and B groups.
+B, B, B and B groups.
 
 =item B
 
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index be1e288..7c7c541 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -458,6 +458,12 @@
 #define LIBXL_HAVE_VIRIDIAN_NO_VP_LIMIT 1
 
 /*
+ * LIBXL_HAVE_VIRIDIAN_CPU_HOTPLUG indicates that the 'cpu_hotplug' value
+ * is present in the viridian enlightenment enumeration.
+ */
+#define LIBXL_HAVE_VIRIDIAN_CPU_HOTPLUG 1
+
+/*
  * LIBXL_HAVE_DEVICE_PCI_LIST_FREE indicates that the
  * libxl_device_pci_list_free() function is defined.
  */
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 8502b29..00a8e68 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -240,6 +240,7 @@ libxl_viridian_enlightenment = Enumeration("viridian_enlightenment", [
 (9, "hcall_ipi"),
 (10, "ex_processor_masks"),
 (11, "no_vp_limit"),
+(12, "cpu_hotplug"),
 ])
 
 libxl_hdtype = Enumeration("hdtype", [
diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index 5c4c194..91a9fc7 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -310,6 +310,7 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
 libxl_bitmap_set(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_APIC_ASSIST);
 libxl_bitmap_set(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_CRASH_CTL);
 libxl_bitmap_set(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_NO_VP_LIMIT);
+libxl_bitmap_set(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_CPU_HOTPLUG);
 }
 
 libxl_for_each_set_bit(v, info->u.hvm.viridian_enable) {
@@ -373,6 +374,9 @@ static int hvm_set_viridian_features(libxl__gc *gc, uint32_t domid,
 if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_NO_VP_LIMIT))
 mask |= HVMPV_no_vp_limit;
 
+if (libxl_bitmap_test(&enlightenments, LIBXL_VIRIDIAN_ENLIGHTENMENT_CPU_HOTPLUG))
+mask |= HVMPV_cpu_hotplug;
+
 if (mask != 0 &&
 xc_hvm_param_set(CTX->xch,
  domid,
diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
index ae1ea86..b906f7b 100644
--- a/xen/arch/x86/hvm/viridian/viridian.c
+++ b/xen/arch/x86/hvm/viridian/viridian.c
@@ -76,6 +76,7 @@ typedef union _HV_CRASH_CTL_REG_CONTENTS
 } HV_CRASH_CTL_REG_CONTENTS;
 
 /* Viridian CPUID leaf 3, Hypervisor Feature Indication */
+#define CPUID3D_CPU_DYNAMIC_PARTITIONING (1 << 3)
 #define CPUID3D_CRASH_MSRS (1 << 10)
 #define CPUID3D_SINT_POLLING (1 << 17)
 
@@ -179,8 +180,10 @@ void cpuid_viridian_leaves(const struct vcpu *v, uint32_t leaf,
 res->a = u.lo;
 res->b = u.hi;
 
+if ( viridian_feature_mask(d) & HVMPV_cpu_hotplug )
+   res->d = CPUID3D_CPU_DYNAMIC_PARTITIONING;
 if ( viridian_feature_mask(d) & HVMPV_crash_ctl )
-res->d = CPUID3D_CRASH_MSRS;
+res->d |= CPUID3D_CRASH_MSRS;
 if ( viridian_feature_mask(d) & HVMPV_synic )
 res->d |= CPUID3D_SINT_POLLING;
 
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 805f4ca..c9d6e70 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -172,6 +172,10 @@
 #define _HVMPV_no_vp_limit 11
 #define HVMPV_no_vp_limit (1 << _HVMPV_no_vp_limit)
 
+/* Enable vCPU hotplug */
+#define _HVMPV_cpu_hotplug 12
+#define HVMPV_cpu_hotplug (1 << _HVMPV_cpu_hotplug)
+
 #define HVMPV_feature_mask \
 (HVMPV_base_freq | \
  HVMPV_no_freq | \
@@ -184,7 +188,8

[linux-linus test] 158368: regressions - FAIL

2021-01-11 Thread osstest service owner

flight 158368 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158368/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-install  fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-raw  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-shadow  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair  10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair  11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-amd64-xl-multivcpu  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-pvshim  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-credit2  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-pvhv2-intel  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-shadow  14 guest-start  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2  10 host-ping-check-xen  fail REGR. vs. 152332
 test-amd64-i386-examine  6 xen-install  fail REGR. vs. 152332
 test-amd64-amd64-dom0pvh-xl-amd  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-pvhv2-amd  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-libvirt-xsm  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-libvirt  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-credit1  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-xsm  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-dom0pvh-xl-intel  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-libvirt-pair  25 guest-start/debian  fail REGR. vs. 152332
 test-amd64-coresched-amd64-xl  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-pair  25 guest-start/debian  fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub  20 guest-stop  fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub  20 guest-stop  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-pair  10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-pair  11 xen-install/dst_host  fail REGR. vs. 152332
 test-arm64-arm64-xl-seattle  10 host-ping-check-xen  fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm  8 xen-boot  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1  10 host-ping-check-xen  fail in 158346 REGR. vs. 152332
 test-arm64-arm64-xl-xsm  12 debian-install  fail in 158346 REGR. vs. 152332
 test-arm64-arm64-examine  13 examine-iommu  fail in 158346 REGR. vs. 152332

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-credit2   8 xen-boot fail in 158346 pass in 158368
 test-arm64-arm64-xl-seattle   8 xen-boot fail in 158346 pass in 158368
 test-arm64-arm64-xl   10 host-ping-check-xen fail in 158

[xen-unstable-smoke test] 158371: tolerable all pass - PUSHED

2021-01-11 Thread osstest service owner

flight 158371 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158371/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check  fail never pass

version targeted for testing:
 xen  edad4c760a1b28abb15836ac4325912203c44905
baseline version:
 xen  faa0ab2a1df0381e00d85312247024b32d60a7b9

Last test of basis   158362  2021-01-11 14:00:27 Z    0 days
Testing same since   158371  2021-01-12 01:00:27 Z    0 days    1 attempts


People who touched revisions under test:
  Julien Grall 
  Oleksandr Tyshchenko 
  Stefano Stabellini 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   faa0ab2a1d..edad4c760a  edad4c760a1b28abb15836ac4325912203c44905 -> smoke

[qemu-mainline test] 158367: regressions - FAIL

2021-01-11 Thread osstest service owner

flight 158367 qemu-mainline real [real]
flight 158370 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/158367/
http://logs.test-lab.xenproject.org/osstest/logs/158370/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 152631
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 152631
 test-armhf-armhf-xl-vhd 17 guest-start/debian.repeat fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 152631
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 152631
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 152631
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 152631
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152631
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 152631
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass

version targeted for testing:
 qemuu                7b09f127738ae3d0e71716cea086fc8f847a5686
baseline version:
 qemuu                1d806cef0e38b5db8347a8e12f214d543204a314

Last test of basis   152631  2020-08-20 09:07:46 Z  144 days
Failing since        152659  2020-08-21 14:07:39 Z  143 days  297 attempts
Testing same since   158291  2021-01-09 02:23:06 Z    2 days    6 attempts


336 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm

Re: [RESEND] [RFC PATCH] xen/arm: domain_build: Ignore empty memory bank

2021-01-11 Thread Stefano Stabellini

On Mon, 4 Jan 2021, Elliott Mitchell wrote:
> On Mon, Dec 21, 2020 at 06:28:35PM +, Julien Grall wrote:
> > On 21/12/2020 17:30, Elliott Mitchell wrote:
> > > I doubt this is the only bug exposed by
> > > 5a37207df52066efefe419c677b089a654d37afc.
> > 
> > Are you saying that with my patch dropped, Xen will boot but with it 
> > will not?
> 
> I thought that was the cause.  Yet after a bunch of builds trying to
> ensure I can cause it to reproduce or not, I wasn't able to.  As such I
> now think this is a misattribution.  :-(
> 
> Other candidate on my radar is this showed up near the time I started
> trying other kernel sources.  I now wonder if this is due to the
> device-trees being produced with recent RPF kernels versus those being
> produced with pure mainline.  Presently I'm using a 5.10 RPF kernel and
> device-tree.
> 
> 
> > So I think we first need to figure out what the offending node is and
> > why dt_device_get_address() is returning an error for it.
> > 
> > That said, I agree that we possibly want a check size == 0 (action TBD) 
> > in map_range_to_domain() as the code would do the wrong thing for 0.
> 
> Already stated "/scb/pcie@7d50/pci@1,0/usb@1,0".

Can you please post the full node for usb@1,0? I would like to check the
corresponding device tree binding to see the expected format.


> Perhaps the code should be ignoring nodes for which
> dt_device_get_address() fails, or perhaps this should only be done
> for Domain 0 (where it results in panic).

That seems reasonable
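
For readers of the archive, the "check size == 0" idea discussed above can be sketched in isolation. This is only an illustration under assumptions: the struct, the counter and the function name are made up here and are not the Xen map_range_to_domain() code, and treating an empty range as success is just one of the possible actions (the thread leaves the action TBD).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range as read from a device-tree "reg" entry. */
struct example_range {
    uint64_t addr;
    uint64_t size;
};

/* Counts how many ranges were actually "mapped" (illustration only). */
static unsigned int example_mapped;

/*
 * Stand-in for a range-mapping callback.  A zero-size range is skipped
 * and reported as success, so an empty bank cannot reach the mapping
 * logic, which would do the wrong thing for size 0.
 */
static int example_map_range(const struct example_range *r)
{
    if (r == NULL)
        return -1;
    if (r->size == 0)
        return 0;           /* nothing to map, ignore the empty range */
    example_mapped++;       /* a real implementation would map here */
    return 0;
}
```

Whether an empty range should be ignored silently, logged, or rejected only for Domain 0 is exactly the open question in the messages above.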

Re: [PATCH] iommu/arm: ipmmu-vmsa: Use 1U << 31 rather than 1 << 31

2021-01-11 Thread Stefano Stabellini

On Mon, 11 Jan 2021, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko 
> 
> Replace all uses of 1 << 31 with 1U << 31 to prevent undefined
> behavior in the IPMMU-VMSA driver.
> 
> Signed-off-by: Oleksandr Tyshchenko 

Reviewed-by: Stefano Stabellini 


> ---
> This is a follow-up to
> https://patchew.org/Xen/20201224152419.22453-1-jul...@xen.org/
> ---
>  xen/drivers/passthrough/arm/ipmmu-vmsa.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> index 346165c..aef358d 100644
> --- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> +++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
> @@ -187,7 +187,7 @@ static DEFINE_SPINLOCK(ipmmu_devices_lock);
>  #define IMCAAR                       0x0004
>  
>  #define IMTTBCR                      0x0008
> -#define IMTTBCR_EAE                  (1 << 31)
> +#define IMTTBCR_EAE                  (1U << 31)
>  #define IMTTBCR_PMB                  (1 << 30)
>  #define IMTTBCR_SH1_NON_SHAREABLE    (0 << 28)
>  #define IMTTBCR_SH1_OUTER_SHAREABLE  (2 << 28)
> @@ -251,7 +251,7 @@ static DEFINE_SPINLOCK(ipmmu_devices_lock);
>  #define IMUCTR(n)                    ((n) < 32 ? IMUCTR0(n) : IMUCTR32(n))
>  #define IMUCTR0(n)                   (0x0300 + ((n) * 16))
>  #define IMUCTR32(n)                  (0x0600 + (((n) - 32) * 16))
> -#define IMUCTR_FIXADDEN              (1 << 31)
> +#define IMUCTR_FIXADDEN              (1U << 31)
>  #define IMUCTR_FIXADD_MASK           (0xff << 16)
>  #define IMUCTR_FIXADD_SHIFT          16
>  #define IMUCTR_TTSEL_MMU(n)          ((n) << 4)
> -- 
> 2.7.4
>
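
As a side note for readers of the archive, the motivation for the change can be sketched on its own. With a 32-bit int, 1 << 31 shifts a one into the sign bit, which C leaves undefined for signed operands, while 1U << 31 is an ordinary unsigned value. The macro and function names below are illustrative and are not taken from the IPMMU-VMSA driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register bit following the pattern used in the patch. */
#define EXAMPLE_BIT31  (1U << 31)   /* unsigned: well defined, 0x80000000 */

/* Set bit 31 in a 32-bit register image without signed overflow. */
static uint32_t example_set_bit31(uint32_t reg)
{
    return reg | EXAMPLE_BIT31;
}

/* Report whether bit 31 is set in a 32-bit register image. */
static int example_bit31_is_set(uint32_t reg)
{
    return (reg & EXAMPLE_BIT31) != 0;
}
```

The same reasoning applies to any single-bit mask that reaches the top bit of a 32-bit register.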

Re: [PATCH 2/2] sysemu: Let VMChangeStateHandler take boolean 'running' argument

2021-01-11 Thread David Gibson

On Mon, Jan 11, 2021 at 04:20:20PM +0100, Philippe Mathieu-Daudé wrote:
> The 'running' argument from VMChangeStateHandler does not require
> other value than 0 / 1. Make it a plain boolean.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

ppc parts
Acked-by: David Gibson 

> ---
>  include/sysemu/runstate.h   | 10 --
>  target/arm/kvm_arm.h        |  2 +-
>  target/ppc/cpu-qom.h        |  2 +-
>  accel/xen/xen-all.c         |  2 +-
>  audio/audio.c               |  2 +-
>  block/block-backend.c       |  2 +-
>  gdbstub.c                   |  2 +-
>  hw/block/pflash_cfi01.c     |  2 +-
>  hw/block/virtio-blk.c       |  2 +-
>  hw/display/qxl.c            |  2 +-
>  hw/i386/kvm/clock.c         |  2 +-
>  hw/i386/kvm/i8254.c         |  2 +-
>  hw/i386/kvmvapic.c          |  2 +-
>  hw/i386/xen/xen-hvm.c       |  2 +-
>  hw/ide/core.c               |  2 +-
>  hw/intc/arm_gicv3_its_kvm.c |  2 +-
>  hw/intc/arm_gicv3_kvm.c     |  2 +-
>  hw/intc/spapr_xive_kvm.c    |  2 +-
>  hw/misc/mac_via.c           |  2 +-
>  hw/net/e1000e_core.c        |  2 +-
>  hw/nvram/spapr_nvram.c      |  2 +-
>  hw/ppc/ppc.c                |  2 +-
>  hw/ppc/ppc_booke.c          |  2 +-
>  hw/s390x/tod-kvm.c          |  2 +-
>  hw/scsi/scsi-bus.c          |  2 +-
>  hw/usb/hcd-ehci.c           |  2 +-
>  hw/usb/host-libusb.c        |  2 +-
>  hw/usb/redirect.c           |  2 +-
>  hw/vfio/migration.c         |  2 +-
>  hw/virtio/virtio-rng.c      |  2 +-
>  hw/virtio/virtio.c          |  2 +-
>  net/net.c                   |  2 +-
>  softmmu/memory.c            |  2 +-
>  softmmu/runstate.c          |  2 +-
>  target/arm/kvm.c            |  2 +-
>  target/i386/kvm/kvm.c       |  2 +-
>  target/i386/sev.c           |  2 +-
>  target/i386/whpx/whpx-all.c |  2 +-
>  target/mips/kvm.c           |  4 ++--
>  ui/gtk.c                    |  2 +-
>  ui/spice-core.c             |  2 +-
>  41 files changed, 49 insertions(+), 43 deletions(-)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index 3ab35a039a0..a5356915734 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -10,7 +10,7 @@ bool runstate_is_running(void);
>  bool runstate_needs_reset(void);
>  bool runstate_store(char *str, size_t size);
>  
> -typedef void VMChangeStateHandler(void *opaque, int running, RunState state);
> +typedef void VMChangeStateHandler(void *opaque, bool running, RunState state);
>  
>  VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
>                                                       void *opaque);
> @@ -20,7 +20,13 @@ VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
>                                                       VMChangeStateHandler *cb,
>                                                       void *opaque);
>  void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
> -void vm_state_notify(int running, RunState state);
> +/**
> + * vm_state_notify: Notify the state of the VM
> + *
> + * @running: whether the VM is running or not.
> + * @state: the #RunState of the VM.
> + */
> +void vm_state_notify(bool running, RunState state);
>  
>  static inline bool shutdown_caused_by_guest(ShutdownCause cause)
>  {
> diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
> index eb81b7059eb..68ec970c4f4 100644
> --- a/target/arm/kvm_arm.h
> +++ b/target/arm/kvm_arm.h
> @@ -352,7 +352,7 @@ void kvm_arm_get_virtual_time(CPUState *cs);
>   */
>  void kvm_arm_put_virtual_time(CPUState *cs);
>  
> -void kvm_arm_vm_state_change(void *opaque, int running, RunState state);
> +void kvm_arm_vm_state_change(void *opaque, bool running, RunState state);
>  
>  int kvm_arm_vgic_probe(void);
>  
> diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> index 63b9e8632ca..118baf8d41f 100644
> --- a/target/ppc/cpu-qom.h
> +++ b/target/ppc/cpu-qom.h
> @@ -218,7 +218,7 @@ extern const VMStateDescription vmstate_ppc_timebase;
>  .offset = vmstate_offset_value(_state, _field, PPCTimebase),  \
>  }
>  
> -void cpu_ppc_clock_vm_state_change(void *opaque, int running,
> +void cpu_ppc_clock_vm_state_change(void *opaque, bool running,
> RunState state);
>  #endif
>  
> diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
> index 878a4089d97..3756aca27be 100644
> --- a/accel/xen/xen-all.c
> +++ b/accel/xen/xen-all.c
> @@ -122,7 +122,7 @@ static void xenstore_record_dm_state(struct xs_handle *xs, const char *state)
>  }
>  
>  
> -static void xen_change_state_handler(void *opaque, int running,
> +static void xen_change_state_handler(void *opaque, bool running,
>   RunState state)
>  {
>  if (running) {
> diff --git a/audio/audio.c b/audio/audio.c
> index b48471bb3f6..f2d56e7e57d 100644
> --- a/audio/audio.c
> +++ b/audio/audio.c
> @@ -1549,7 +1549,7 @@ static int audio_driver_init(AudioState *s, struct audio_driver *drv,
>  }
>  }
>  
> -static void audio_vm_ch
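
For readers of the archive, the shape of the callback after this change can be sketched without QEMU. Everything below is a stand-in under stated assumptions: ExampleRunState, the handler typedef, the device struct and the notifier are hypothetical names that only mirror the patched signature, where 'running' is a plain bool instead of an int restricted to 0 / 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for QEMU's RunState enum; two values are enough here. */
typedef enum { EXAMPLE_STATE_PAUSED, EXAMPLE_STATE_RUNNING } ExampleRunState;

/* Same shape as the patched typedef: 'running' is a plain bool. */
typedef void ExampleChangeStateHandler(void *opaque, bool running,
                                       ExampleRunState state);

/* Hypothetical device state tracked by the handler. */
struct example_dev {
    bool clock_enabled;
    int transitions;
};

/* Handler: mirror the VM run state into the device. */
static void example_dev_state_change(void *opaque, bool running,
                                     ExampleRunState state)
{
    struct example_dev *dev = opaque;

    (void)state;                    /* unused in this sketch */
    dev->clock_enabled = running;
    dev->transitions++;
}

/* Minimal notifier: invokes a single registered handler, if any. */
static void example_state_notify(ExampleChangeStateHandler *cb, void *opaque,
                                 bool running, ExampleRunState state)
{
    if (cb != NULL)
        cb(opaque, running, state);
}
```

With a bool parameter, callers can no longer pass values other than true or false by accident, which is the point made in the commit message.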

[PATCH] xen/arm: don't read aarch32 regs when aarch32 isn't available

2021-01-11 Thread Stefano Stabellini

Don't read aarch32 system registers at boot time when the aarch32 state
is not available. They are UNKNOWN, so it is not useful to read them.
Moreover, on Cavium ThunderX reading ID_PFR2_EL1 causes a Xen crash.
Instead, only read them when aarch32 is available.

Leave the corresponding fields in struct cpuinfo_arm so that they
are read-as-zero from a guest.

Since we are editing identify_cpu, also fix the indentation: 4 spaces
instead of 8.

Fixes: 9cfdb489af81 ("xen/arm: Add ID registers and complete cpuinfo")
Link: https://marc.info/?l=xen-devel&m=161035501118086
Link: http://logs.test-lab.xenproject.org/osstest/logs/158293/test-arm64-arm64-xl-xsm/info.html
Suggested-by: Julien Grall 
Signed-off-by: Stefano Stabellini 
---
 xen/arch/arm/cpufeature.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
index 698bfa0201..b1c82ade49 100644
--- a/xen/arch/arm/cpufeature.c
+++ b/xen/arch/arm/cpufeature.c
@@ -101,29 +101,35 @@ int enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
 
 void identify_cpu(struct cpuinfo_arm *c)
 {
-        c->midr.bits = READ_SYSREG(MIDR_EL1);
-        c->mpidr.bits = READ_SYSREG(MPIDR_EL1);
+    bool aarch32 = true;
+
+    c->midr.bits = READ_SYSREG(MIDR_EL1);
+    c->mpidr.bits = READ_SYSREG(MPIDR_EL1);
 
 #ifdef CONFIG_ARM_64
-        c->pfr64.bits[0] = READ_SYSREG(ID_AA64PFR0_EL1);
-        c->pfr64.bits[1] = READ_SYSREG(ID_AA64PFR1_EL1);
+    c->pfr64.bits[0] = READ_SYSREG(ID_AA64PFR0_EL1);
+    c->pfr64.bits[1] = READ_SYSREG(ID_AA64PFR1_EL1);
+
+    c->dbg64.bits[0] = READ_SYSREG(ID_AA64DFR0_EL1);
+    c->dbg64.bits[1] = READ_SYSREG(ID_AA64DFR1_EL1);
 
-        c->dbg64.bits[0] = READ_SYSREG(ID_AA64DFR0_EL1);
-        c->dbg64.bits[1] = READ_SYSREG(ID_AA64DFR1_EL1);
+    c->aux64.bits[0] = READ_SYSREG(ID_AA64AFR0_EL1);
+    c->aux64.bits[1] = READ_SYSREG(ID_AA64AFR1_EL1);
 
-        c->aux64.bits[0] = READ_SYSREG(ID_AA64AFR0_EL1);
-        c->aux64.bits[1] = READ_SYSREG(ID_AA64AFR1_EL1);
+    c->mm64.bits[0]  = READ_SYSREG(ID_AA64MMFR0_EL1);
+    c->mm64.bits[1]  = READ_SYSREG(ID_AA64MMFR1_EL1);
+    c->mm64.bits[2]  = READ_SYSREG(ID_AA64MMFR2_EL1);
 
-        c->mm64.bits[0]  = READ_SYSREG(ID_AA64MMFR0_EL1);
-        c->mm64.bits[1]  = READ_SYSREG(ID_AA64MMFR1_EL1);
-        c->mm64.bits[2]  = READ_SYSREG(ID_AA64MMFR2_EL1);
+    c->isa64.bits[0] = READ_SYSREG(ID_AA64ISAR0_EL1);
+    c->isa64.bits[1] = READ_SYSREG(ID_AA64ISAR1_EL1);
 
-        c->isa64.bits[0] = READ_SYSREG(ID_AA64ISAR0_EL1);
-        c->isa64.bits[1] = READ_SYSREG(ID_AA64ISAR1_EL1);
+    c->zfr64.bits[0] = READ_SYSREG(ID_AA64ZFR0_EL1);
 
-        c->zfr64.bits[0] = READ_SYSREG(ID_AA64ZFR0_EL1);
+    aarch32 = c->pfr64.el1 == 2;
 #endif
 
+    if ( aarch32 )
+    {
         c->pfr32.bits[0] = READ_SYSREG(ID_PFR0_EL1);
         c->pfr32.bits[1] = READ_SYSREG(ID_PFR1_EL1);
         c->pfr32.bits[2] = READ_SYSREG(ID_PFR2_EL1);
@@ -153,6 +159,7 @@ void identify_cpu(struct cpuinfo_arm *c)
 #ifndef MVFR2_MAYBE_UNDEFINED
         c->mvfr.bits[2] = READ_SYSREG(MVFR2_EL1);
 #endif
+    }
 }
 
 /*
 
 /*
-- 
2.17.1
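
For readers of the archive, the check "c->pfr64.el1 == 2" in the patch above can be illustrated stand-alone. The sketch assumes the architectural layout of ID_AA64PFR0_EL1, where the EL1 field sits in bits [7:4] and the value 2 means EL1 supports both AArch64 and AArch32; the helper names are hypothetical and the register value is passed in as a plain integer rather than read with READ_SYSREG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64PFR0_EL1.EL1 occupies bits [7:4]. */
#define EXAMPLE_PFR0_EL1_SHIFT  4
#define EXAMPLE_PFR0_EL1_MASK   0xfULL

/* EL1 field: 1 = AArch64 only, 2 = AArch64 and AArch32. */
static bool example_el1_has_aarch32(uint64_t id_aa64pfr0)
{
    return ((id_aa64pfr0 >> EXAMPLE_PFR0_EL1_SHIFT) &
            EXAMPLE_PFR0_EL1_MASK) == 2;
}

/*
 * Value to store for an AArch32 ID register: the hardware value when
 * AArch32 is available, otherwise zero so that the field stays
 * read-as-zero for a guest.
 */
static uint32_t example_aarch32_id_value(uint64_t id_aa64pfr0,
                                         uint32_t hw_value)
{
    return example_el1_has_aarch32(id_aa64pfr0) ? hw_value : 0;
}
```

In the real patch the register is simply not read when AArch32 is absent; the second helper only models the resulting stored value.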

Re: [PATCH] xen/privcmd: allow fetching resource sizes

2021-01-11 Thread Andrew Cooper

On 11/01/2021 22:09, boris.ostrov...@oracle.com wrote:
> On 1/11/21 10:29 AM, Roger Pau Monne wrote:
>>  
>> +xdata.domid = kdata.dom;
>> +xdata.type = kdata.type;
>> +xdata.id = kdata.id;
>> +
>> +if (!kdata.addr && !kdata.num) {
>
> I think we should not allow only one of them to be zero. If it's only 
> kdata.num then we will end up with pfns array set to ZERO_SIZE_PTR (which is 
> 0x10). We seem to be OK in that we are not dereferencing pfns (either in kernel 
> or in hypervisor) if number of frames is zero but IMO we shouldn't be 
> tempting fate.
>
>
> (And if it's only kdata.addr then we will get a vma but I am not sure it will 
> do what we want.)

Passing addr == 0 without num being 0 is already an error in Xen, and
passing num == 0 without addr being 0 is bogus and will be an error by
the time I'm finished fixing this.

FWIW, the common usecase for non-trivial examples will be:

xenforeignmem_resource_size(domid, type, id, &size);
xenforeignmem_map_resource(domid, type, id, NULL, size, ...);

which translates into:

ioctl(MAP_RESOURCE, NULL, 0) => size
mmap(NULL, size, ...) => ptr
ioctl(MAP_RESOURCE, ptr, size)

from the kernel's point of view, and two hypercalls from Xen's point of
view.  The NULLs above are expected to be the common case for letting
the kernel choose the vma, but ought to be filled in by the time the
second ioctl() occurs.

See
https://lore.kernel.org/xen-devel/20200922182444.12350-1-andrew.coop...@citrix.com/T/#u
for all the gory details.

~Andrew
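
For readers of the archive, the argument rules discussed in this thread can be modelled with a small mock. This is not the privcmd ioctl or the Xen hypercall: the function, the fixed resource size and the error choices are assumptions made for illustration. It only encodes the three cases named above: both arguments zero is a size query, exactly one zero is rejected, and both non-zero is a mapping request.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical resource with a fixed size, in frames. */
#define EXAMPLE_RESOURCE_FRAMES 4

/*
 * Mock of the semantics under discussion:
 *  - addr == NULL and num == 0: size query, returns the frame count
 *  - exactly one of them zero:  rejected with -EINVAL
 *  - both non-zero:             "maps" and returns 0, or -E2BIG if the
 *                               request exceeds the resource size
 */
static long example_map_resource(void *addr, size_t num)
{
    if (addr == NULL && num == 0)
        return EXAMPLE_RESOURCE_FRAMES;
    if (addr == NULL || num == 0)
        return -EINVAL;
    if (num > EXAMPLE_RESOURCE_FRAMES)
        return -E2BIG;
    return 0;
}
```

A caller following the sequence above would issue the size query first, reserve an address range of that size, and then call again with both arguments filled in.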

Re: [PATCH v2 09/11] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource

2021-01-11 Thread Andrew Cooper

On 11/01/2021 20:05, Andrew Cooper wrote:
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -1027,17 +1027,31 @@ static unsigned int resource_max_frames(struct domain *d,
>>>  }
>>>  }
>>>  
>>> +/*
>>> + * Returns -errno on error, or positive in the range [1, nr_frames] on
>>> + * success.  Returning less than nr_frames constitutes a request for a
>>> + * continuation.
>>> + */
>>> +static int _acquire_resource(
>>> +    struct domain *d, unsigned int type, unsigned int id, unsigned long frame,
>>> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
>> As per the comment the return type may again want to be "long" here.
>> Albeit I realize the restriction to (UINT_MAX >> MEMOP_EXTENT_SHIFT)
>> makes this (and the other place above) only a latent issue for now,
>> so it may well be fine to be left as is.
> Hmm yes - it should be long, because per the ABI we still should be able
> to return 0x to a caller in the success case.
>
> I'll update.

Actually, no.  Wrong half of the hypercall.

For _acquire_resource(), the return value is bound by nr_frames which is
a maximum of 32, and is unlikely to grow substantially from this.

~Andrew

[xen-unstable test] 158357: regressions - FAIL

2021-01-11 Thread osstest service owner

flight 158357 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158357/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-xl-thunderx  8 xen-boot fail REGR. vs. 158290
 test-arm64-arm64-examine  8 reboot   fail REGR. vs. 158290
 test-arm64-arm64-xl-xsm   8 xen-boot fail REGR. vs. 158290
 test-arm64-arm64-xl-credit1   8 xen-boot fail REGR. vs. 158290

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 20 guest-start/debianhvm.repeat fail in 158296 pass in 158357
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 17 depriv-audit-qemu/create fail in 158303 pass in 158357
 test-arm64-arm64-xl   8 xen-boot   fail pass in 158296
 test-arm64-arm64-libvirt-xsm  8 xen-boot   fail pass in 158303

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl 15 migrate-support-check fail in 158296 never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail in 158296 never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail in 158296 never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail in 158296 never pass
 test-amd64-amd64-examine 4 memdisk-try-append fail like 158231
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat fail like 158269
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 158290
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 158290
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 158290
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 158290
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 158290
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 158290
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 158290
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 158290
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 158290
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 158290
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 158290
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass

version targeted for testing:
 xen  ce59e3dda5f99afbe7257e1e9a22dffd5c4d033c
baseline version:
 xen  7ba2ab495be54f608cb47440e1497b2795bd301a

Last test of basis

Re: [PATCH] xen/privcmd: allow fetching resource sizes

2021-01-11 Thread boris . ostrovsky

On 1/11/21 10:29 AM, Roger Pau Monne wrote:
>  
> + xdata.domid = kdata.dom;
> + xdata.type = kdata.type;
> + xdata.id = kdata.id;
> +
> + if (!kdata.addr && !kdata.num) {

I think we should not allow only one of them to be zero. If it's only kdata.num 
then we will end up with pfns array set to ZERO_SIZE_PTR (which is 0x10). We 
seem to be OK in that we are not dereferencing pfns (either in kernel or in 
hypervisor) if number of frames is zero but IMO we shouldn't be tempting 
fate.

(And if it's only kdata.addr then we will get a vma but I am not sure it will 
do what we want.)

-boris

Re: [PATCH v2 02/11] xen/gnttab: Rework resource acquisition

2021-01-11 Thread Andrew Cooper

On 24/09/2020 10:51, Paul Durrant wrote:
>> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
>> index a5d3ed8bda..912f07be47 100644
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -4013,6 +4013,81 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>>  return 0;
>>  }
>>
>> +int gnttab_acquire_resource(
>> +struct domain *d, unsigned int id, unsigned long frame,
>> +unsigned int nr_frames, xen_pfn_t mfn_list[])
>> +{
>> +struct grant_table *gt = d->grant_table;
>> +unsigned int i = nr_frames, tot_frames;
>> +mfn_t tmp;
>> +void **vaddrs = NULL;
>> +int rc;
>> +
>> +/* Input sanity. */
> Nit: inconsistency with full stops on single line comments.

The whole point of relaxing the style was because feedback over minutiae
such as this was deemed detrimental to the community.

If I ever see feedback like this, I will commit the patch there
and then.  This is the only way upstream Xen is going to turn into a
less toxic environment for contributors.

>> +rc = -EINVAL;
>> +break;
>> +}
>> +rc = gnttab_get_status_frame_mfn(d, tot_frames - 1, &tmp);
>> +break;
>> +}
>> +
> I think you could drop the write lock here...
>
>> +/* Any errors from growing the table? */
>> +if ( rc )
>> +goto out;
>> +
> ...and acquire it read here, since we know the table cannot shrink. You'd 
> need to re-check the gt_version for safety though.

And you've correctly identified why I didn't.  If we had a
relax-write-to-read lock operation, that would also be fine, but we don't.

Fundamentally, this is an operation made once during VM construction, to
map one single frame.  It is not a hotpath in need of micro-optimising its
locking pattern, and absolutely not something worth introducing a safety
hazard for.

~Andrew

Re: [PATCH v2 02/11] xen/gnttab: Rework resource acquisition

2021-01-11 Thread Andrew Cooper

On 25/09/2020 14:17, Jan Beulich wrote:
> On 22.09.2020 20:24, Andrew Cooper wrote:
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -4013,6 +4013,81 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>>  return 0;
>>  }
>>  
>> +int gnttab_acquire_resource(
>> +struct domain *d, unsigned int id, unsigned long frame,
>> +unsigned int nr_frames, xen_pfn_t mfn_list[])
>> +{
>> +struct grant_table *gt = d->grant_table;
>> +unsigned int i = nr_frames, tot_frames;
>> +mfn_t tmp;
>> +void **vaddrs = NULL;
>> +int rc;
>> +
>> +/* Input sanity. */
>> +if ( !nr_frames )
>> +return -EINVAL;
> I continue to object to this becoming an error.

It's not a path any legitimate caller will ever exercise.  POSIX defines
any mmap() of zero length to be an error, and I completely agree.

The problem isn't, per se, with accepting bogus arguments.  It is the
quantity of additional complexity in the hypervisor required to support
accepting the bogus input cleanly.

There are exactly 2 cases where 0 might be found here.  Either the
caller passed it in directly (and bypassed the POSIX check which would
reject the attempt), or some part of multi-layer continuation handling
went wrong on the previous iteration.

For this hypercall (by the end of the series), _acquire_resource()
returning 0 is specifically treated as an error so we don't livelock in
the 32-chunking loop until some other preemption kicks in.

In this case, the check isn't actually necessary because it is (will be)
guarded higher up the call chain in a more general way, but I'm not
interested in adding unnecessary extra complexity (to an area I've had to
rewrite from scratch to remove the bugs) simply to support a
non-existent usecase.

~Andrew
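
For readers of the archive, the point about treating a zero return as an error can be sketched with a toy chunking loop. This is an illustration under assumptions and not the Xen implementation: the backend, the function names and the choice of -EINVAL are made up, and only the chunk size of 32 and the "[1, nr_frames] on success" convention are taken from the thread.

```c
#include <assert.h>
#include <errno.h>

#define EXAMPLE_CHUNK 32u

/* Hypothetical backend: completes at most 'limit' frames per call. */
static int example_acquire(unsigned int nr_frames, unsigned int limit)
{
    if (nr_frames == 0)
        return -EINVAL;
    return (int)(nr_frames < limit ? nr_frames : limit);
}

/*
 * Process 'total' frames in chunks of at most EXAMPLE_CHUNK.
 * Returns 0 on success or -errno.  A backend result of 0 is treated
 * as an error, because looping on it would never make progress.
 */
static int example_acquire_all(unsigned int total, unsigned int limit,
                               unsigned int *calls)
{
    unsigned int done = 0;

    *calls = 0;
    if (total == 0)
        return -EINVAL;

    while (done < total) {
        unsigned int todo = total - done;
        int rc;

        if (todo > EXAMPLE_CHUNK)
            todo = EXAMPLE_CHUNK;

        rc = example_acquire(todo, limit);
        (*calls)++;

        if (rc < 0)
            return rc;
        if (rc == 0)
            return -EINVAL;     /* no progress: bail out, don't spin */

        done += (unsigned int)rc;
    }
    return 0;
}
```

A backend that returns fewer frames than requested simply causes another trip around the loop, which stands in for the continuation described in the patch comment.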

Re: [PATCH v2 09/11] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource

2021-01-11 Thread Andrew Cooper

On 28/09/2020 10:37, Jan Beulich wrote:
> On 22.09.2020 20:24, Andrew Cooper wrote:
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -4632,7 +4632,6 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>>  if ( id != (unsigned int)ioservid )
>>  break;
>>  
>> -rc = 0;
>>  for ( i = 0; i < nr_frames; i++ )
>>  {
>>  mfn_t mfn;
>> @@ -4643,6 +4642,9 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>>  
>>  mfn_list[i] = mfn_x(mfn);
>>  }
>> +if ( i == nr_frames )
>> +/* Success.  Passed nr_frames back to the caller. */
>> +rc = nr_frames;
> With this, shouldn't the return type of the function be changed to
> "long"? I realize that's not an issue with XENMEM_resource_ioreq_server
> specifically, but I mean the general case.

That would require going back in time and making a more sane ABI for
struct xen_mem_acquire_resource.

We really do have a 32bit nr_frames field, and a 64bit "where to
continue from" field.

>> --- a/xen/common/compat/memory.c
>> +++ b/xen/common/compat/memory.c
>> @@ -636,15 +662,45 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>>  compat_frame_list[i] = frame;
>>  }
>>  
>> -if ( __copy_to_compat_offset(cmp.mar.frame_list, 0,
>> - compat_frame_list,
>> - cmp.mar.nr_frames) )
>> +if ( __copy_to_compat_offset(
>> + cmp.mar.frame_list, start_extent,
>> + compat_frame_list, done) )
>>  return -EFAULT;
>>  }
>> -break;
>> +
>> +start_extent += done;
>> +
>> +/* Completely done. */
>> +if ( start_extent == cmp.mar.nr_frames )
>> +break;
>> +
>> +/*
>> + * Done a "full" batch, but we were limited by space in the xlat
>> + * area.  Go around the loop again without necessarily returning
>> + * to guest context.
>> + */
>> +if ( done == nat.mar->nr_frames )
>> +{
>> +split = 1;
>> +break;
>> +}
>> +
>> +/* Explicit continuation request from a higher level. */
>> +if ( done < nat.mar->nr_frames )
>> +return hypercall_create_continuation(
>> +__HYPERVISOR_memory_op, "ih",
>> +op | (start_extent << MEMOP_EXTENT_SHIFT), compat);
>> +
>> +/*
>> + * Well... Something's gone wrong with the two levels of chunking.
>> + * My condolences to whomever next has to debug this mess.
>> + */
> Any suggestion how to overcome this "mess"?

The double level of array handling is what makes it so complicated. 
There are enough cases in compat_memory_op() alone which can't

We've got two cases in practice.  A singleton object needing conversion,
or a large array of them.  I'm quite certain we'd have less code and
less complexity by having copy_$OBJECT_{to,from}_guest() helpers which
dealt with compat internally as appropriate.

We don't care about the performance of 32bit hypercalls, but not doing
batch conversions of 1020/etc objects in the compat layer will probably
result in better performance overall, as we don't throw away the work as
we batch things at smaller increments higher up the stack.

>
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -4105,6 +4105,9 @@ int gnttab_acquire_resource(
>>  for ( i = 0; i < nr_frames; ++i )
>>  mfn_list[i] = virt_to_mfn(vaddrs[frame + i]);
>>  
>> +/* Success.  Passed nr_frames back to the caller. */
> Nit: "Pass"?

We have already passed them back to the caller.  "Pass" is the wrong
tense to use.

>
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -1027,17 +1027,31 @@ static unsigned int resource_max_frames(struct domain *d,
>>  }
>>  }
>>  
>> +/*
>> + * Returns -errno on error, or positive in the range [1, nr_frames] on
>> + * success.  Returning less than nr_frames constitutes a request for a
>> + * continuation.
>> + */
>> +static int _acquire_resource(
>> +    struct domain *d, unsigned int type, unsigned int id, unsigned long frame,
>> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
> As per the comment the return type may again want to be "long" here.
> Albeit I realize the restriction to (UINT_MAX >> MEMOP_EXTENT_SHIFT)
> makes this (and the other place above) only a latent issue for now,
> so it may well be fine to be left as is.

Hmm yes - it should be long, because per the ABI we still should be able
to return 0x to a caller in the success case.

I'll update.

~Andrew

Re: [PATCH v2] xen/arm: do not read MVFR2 when is not defined

2021-01-11 Thread Bertrand Marquis

Hi,

> On 11 Jan 2021, at 19:07, Julien Grall  wrote:
> 
> 
> 
> On 11/01/2021 19:02, Bertrand Marquis wrote:
>> Hi Julien,
> 
> Hi Bertrand,
> 
>>> On 11 Jan 2021, at 18:50, Julien Grall  wrote:
>>> 
>>> On 11/01/2021 18:21, Bertrand Marquis wrote:
 Hi Julien,
>>> 
>>> Hi Bertrand,
>>> 
 Sorry for the delay but I was on holiday until today.
>>> 
>>> Welcome back! No worries.
>>> 
>>>>> On 11 Jan 2021, at 10:25, Julien Grall  wrote:
>>>>>
>>>>> Hi Jan,
>>>>>
>>>>> On 11/01/2021 08:49, Jan Beulich wrote:
>>>>>> On 08.01.2021 20:22, Stefano Stabellini wrote:
>>>>>>> MVFR2 is not available on ARMv7. It is available on ARMv8 aarch32 and
>>>>>>> aarch64. If Xen reads MVFR2 on ARMv7 it could crash.
>>>>>>>
>>>>>>> Avoid the issue by doing the following:
>>>>>>>
>>>>>>> - define MVFR2_MAYBE_UNDEFINED on arm32
>>>>>>> - if MVFR2_MAYBE_UNDEFINED, do not attempt to read MVFR2 in Xen
>>>>>>> - keep the 3rd register_t in struct cpuinfo_arm.mvfr on arm32 so that a
>>>>>>>   guest read to the register returns '0' instead of crashing the guest.
>>>>>>>
>>>>>>> '0' is an appropriate value to return to the guest because it is defined
>>>>>>> as "no support for miscellaneous features".
>>>>>>>
>>>>>>> Aarch64 Xen is not affected by this patch.
>>>>>> But it looks to also be affected by ...
>>>>>
>>>>> AFAICT, the smoke test passed on Laxton0 (AMD Seattle) [1] over the
>>>>> week-end.
>>>>>
>>>>>>> Fixes: 9cfdb489af81 ("xen/arm: Add ID registers and complete cpuinfo")
>>>>>> ... this, faulting (according to osstest logs) early during boot on
>>>>>
>>>>> The xen-unstable flight [2] ran on Rochester0 (Cavium Thunder-X). So this
>>>>> has something to do with the platform.
>>>>>
>>>>> The main difference is AMD Seattle supports AArch32 while Cavium
>>>>> Thunder-X doesn't.
>>>>>
>>>>>> 0025D314 mrs x1, id_pfr2_el1
>>>>> This register contains information for the AArch32 state.
>>>>>
>>>>> AFAICT, the Arm Arm back to at least ARM DDI 0487A.j (published in 2016)
>>>>> described the encoding as Read-Only. So I am not sure why we receive an
>>>>> UNDEF here, the more it looks like ID_PFR{0, 1}_EL1 were correctly
>>>>> accessed.
>>>>>
>>>>> Andre, Bertrand, do you have any clue?
>>>> I will double check this but my understanding when I checked this was that
>>>> it would be possible to read with an unknown value but should not generate
>>>> an UNDEF.
>>>>>
>>>>> However, most of the AArch32 ID registers are UNKNOWN on platform not
>>>>> implementing AArch32. So we may want to conditionally skip the access to
>>>>> AArch32 state.
>>>> We could skip aarch32 registers on platforms not supporting aarch32 but we
>>>> will still have to provide values to a guest trying to access them so
>>>> might be better to return what is returned by the hardware.
>>> 
>>> Per the Arm Arm, the value of the registers may change at any time. IOW,
>>> two reads of the system registers may return different values.
>>> 
>>> IIRC, the original intent of the series was to provide sanitized value of 
>>> the ID registers. So I think it would be unwise to let the guest using the 
>>> values.
>>> 
>>> Instead, I would suggest to implement them as RAZ.
>> Works for me.
>>> 
>>>> Now if some platforms are generating an UNDEF we need to understand in
>>>> what cases and behave the same way for the guest.
>>> 
>>> I am not entirely sure what you mean by platforms here.
>>> 
>>> If you mean any platform conforming with the Arm Arm, then I agree with 
>>> your statement.
>>> 
>>> However, if you refer to platform that may not follow the Arm Arm, then I 
>>> disagree. We should try to expose a sane interface to the guest whenever it 
>>> is possible.
>>> 
>>> In this case, I would bet the hardware would not even allow us to trap the 
>>> ID_PFR2. Although, I haven't tried it.
>>> 
>>>> Do i understand it right that on Cavium which has no aarch32 support the
>>>> access is generating an UNDEF ?
>>> 
>>> Yes. The UNDEF will happen when trying to read ID_PFR2_EL1. Interestingly, 
>>> it doesn't happen when reading ID_PFR{0, 1}_EL1. So this smells like a 
>>> silicon bug.
>> Sounds like the ifdef ARM64 should be something like if (!cavium)
> 
> Hmmm Cavium may not be the only platform where AArch32 is not supported.
> So as the values are actually UNKNOWN (or UNDEF on Cavium), then there is no
> point to read them.
> 
> Therefore the following pseudo-code should be enough:
> 
> if ( aarch32 supported )
>  read AArch32 ID registers
> 
> This will nicely solve the UNDEF on Cavium without adding more workaround in 
> the code :).

Works for me.

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall

Re: [PATCH v2] xen/arm: do not read MVFR2 when is not defined

2021-01-11 Thread Julien Grall





On 11/01/2021 19:02, Bertrand Marquis wrote:
> Hi Julien,

Hi Bertrand,

>> On 11 Jan 2021, at 18:50, Julien Grall  wrote:
>>
>> On 11/01/2021 18:21, Bertrand Marquis wrote:
>>> Hi Julien,
>>
>> Hi Bertrand,
>>
>>> Sorry for the delay but I was on holiday until today.
>>
>> Welcome back! No worries.
>>
>>>> On 11 Jan 2021, at 10:25, Julien Grall  wrote:
>>>>
>>>> Hi Jan,
>>>>
>>>> On 11/01/2021 08:49, Jan Beulich wrote:
>>>>> On 08.01.2021 20:22, Stefano Stabellini wrote:
>>>>>> MVFR2 is not available on ARMv7. It is available on ARMv8 aarch32 and
>>>>>> aarch64. If Xen reads MVFR2 on ARMv7 it could crash.
>>>>>>
>>>>>> Avoid the issue by doing the following:
>>>>>>
>>>>>> - define MVFR2_MAYBE_UNDEFINED on arm32
>>>>>> - if MVFR2_MAYBE_UNDEFINED, do not attempt to read MVFR2 in Xen
>>>>>> - keep the 3rd register_t in struct cpuinfo_arm.mvfr on arm32 so that a
>>>>>>   guest read to the register returns '0' instead of crashing the guest.
>>>>>>
>>>>>> '0' is an appropriate value to return to the guest because it is defined
>>>>>> as "no support for miscellaneous features".
>>>>>>
>>>>>> Aarch64 Xen is not affected by this patch.
>>>>> But it looks to also be affected by ...
>>>>
>>>> AFAICT, the smoke test passed on Laxton0 (AMD Seattle) [1] over the week-end.
>>>>
>>>>>> Fixes: 9cfdb489af81 ("xen/arm: Add ID registers and complete cpuinfo")
>>>>> ... this, faulting (according to osstest logs) early during boot on
>>>>
>>>> The xen-unstable flight [2] ran on Rochester0 (Cavium Thunder-X). So this has
>>>> something to do with the platform.
>>>>
>>>> The main difference is AMD Seattle supports AArch32 while Cavium Thunder-X
>>>> doesn't.
>>>>
>>>>> 0025D314 mrs x1, id_pfr2_el1
>>>> This register contains information for the AArch32 state.
>>>>
>>>> AFAICT, the Arm Arm back to at least ARM DDI 0487A.j (published in 2016)
>>>> described the encoding as Read-Only. So I am not sure why we receive an UNDEF
>>>> here, the more it looks like ID_PFR{0, 1}_EL1 were correctly accessed.
>>>>
>>>> Andre, Bertrand, do you have any clue?
>>> I will double check this but my understanding when I checked this was that it
>>> would be possible to read with an unknown value but should not generate an
>>> UNDEF.
>>>>
>>>> However, most of the AArch32 ID registers are UNKNOWN on platform not
>>>> implementing AArch32. So we may want to conditionally skip the access to
>>>> AArch32 state.
>>> We could skip aarch32 registers on platforms not supporting aarch32 but we will
>>> still have to provide values to a guest trying to access them so might be
>>> better to return what is returned by the hardware.
>>
>> Per the Arm Arm, the value of the registers may change at any time. IOW, two
>> reads of the system registers may return different values.
>>
>> IIRC, the original intent of the series was to provide sanitized value of the
>> ID registers. So I think it would be unwise to let the guest using the values.
>>
>> Instead, I would suggest to implement them as RAZ.
> Works for me.
>>
>>> Now if some platforms are generating an UNDEF we need to understand in what
>>> cases and behave the same way for the guest.
>>
>> I am not entirely sure what you mean by platforms here.
>>
>> If you mean any platform conforming with the Arm Arm, then I agree with your
>> statement.
>>
>> However, if you refer to platform that may not follow the Arm Arm, then I
>> disagree. We should try to expose a sane interface to the guest whenever it is
>> possible.
>>
>> In this case, I would bet the hardware would not even allow us to trap the
>> ID_PFR2. Although, I haven't tried it.
>>
>>> Do i understand it right that on Cavium which has no aarch32 support the access
>>> is generating an UNDEF ?
>>
>> Yes. The UNDEF will happen when trying to read ID_PFR2_EL1. Interestingly, it
>> doesn't happen when reading ID_PFR{0, 1}_EL1. So this smells like a silicon bug.
> Sounds like the ifdef ARM64 should be something like if (!cavium)

Hmmm Cavium may not be the only platform where AArch32 is not supported.
So as the values are actually UNKNOWN (or UNDEF on Cavium), then there is
no point to read them.

Therefore the following pseudo-code should be enough:

if ( aarch32 supported )
  read AArch32 ID registers

This will nicely solve the UNDEF on Cavium without adding more
workaround in the code :).
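
A minimal C sketch of the pseudo-code above, under stated assumptions: it
treats "aarch32 supported" as ID_AA64PFR0_EL1.EL0 == 0b0010 (AArch64 and
AArch32 at EL0), and it defaults the values to zero so a guest read behaves
as RAZ. The struct, the helper names and the stub reader are hypothetical
and are not taken from the Xen tree; real code would use the MRS accessors
instead of the stub.

```c
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64PFR0_EL1.EL0 is bits [3:0]: 0b0001 = AArch64 only,
 * 0b0010 = AArch64 and AArch32. */
#define ID_AA64PFR0_EL0_MASK   0xfULL
#define ID_AA64PFR0_EL0_64_32  0x2ULL

#define NR_ID_PFR 3 /* ID_PFR0_EL1 .. ID_PFR2_EL1 */

/* Hypothetical container, not Xen's struct cpuinfo_arm. */
struct aarch32_id_regs {
    uint64_t pfr[NR_ID_PFR];
};

static bool aarch32_el0_supported(uint64_t id_aa64pfr0)
{
    return (id_aa64pfr0 & ID_AA64PFR0_EL0_MASK) == ID_AA64PFR0_EL0_64_32;
}

/* Stand-in for the real MRS accessors so the sketch builds anywhere. */
static uint64_t stub_read_id_pfr(unsigned int idx)
{
    return 0x100u + idx;
}

static void fill_aarch32_id_regs(struct aarch32_id_regs *regs,
                                 uint64_t id_aa64pfr0,
                                 uint64_t (*read_reg)(unsigned int))
{
    unsigned int i;

    /* Default to zero so that a guest read behaves as RAZ. */
    for ( i = 0; i < NR_ID_PFR; i++ )
        regs->pfr[i] = 0;

    /* Never touch the AArch32 ID registers without AArch32 support. */
    if ( !aarch32_el0_supported(id_aa64pfr0) )
        return;

    for ( i = 0; i < NR_ID_PFR; i++ )
        regs->pfr[i] = read_reg(i);
}
```

With this shape the read of ID_PFR2_EL1 is simply never issued on a core
that reports AArch64-only EL0, so no per-vendor check is needed.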


Cheers,

--
Julien Grall

Re: [PATCH v2] xen/arm: do not read MVFR2 when is not defined

2021-01-11 Thread Bertrand Marquis

Hi Julien,

> On 11 Jan 2021, at 18:50, Julien Grall  wrote:
> 
> On 11/01/2021 18:21, Bertrand Marquis wrote:
>> Hi Julien,
> 
> Hi Bertrand,
> 
>> Sorry for the delay but I was on holiday until today.
> 
> Welcome back! No worries.
> 
>>> On 11 Jan 2021, at 10:25, Julien Grall  wrote:
>>> 
>>> Hi Jan,
>>> 
>>> On 11/01/2021 08:49, Jan Beulich wrote:
>>>> On 08.01.2021 20:22, Stefano Stabellini wrote:
>>>>> MVFR2 is not available on ARMv7. It is available on ARMv8 aarch32 and
>>>>> aarch64. If Xen reads MVFR2 on ARMv7 it could crash.
>>>>>
>>>>> Avoid the issue by doing the following:
>>>>>
>>>>> - define MVFR2_MAYBE_UNDEFINED on arm32
>>>>> - if MVFR2_MAYBE_UNDEFINED, do not attempt to read MVFR2 in Xen
>>>>> - keep the 3rd register_t in struct cpuinfo_arm.mvfr on arm32 so that a
>>>>>   guest read to the register returns '0' instead of crashing the guest.
>>>>>
>>>>> '0' is an appropriate value to return to the guest because it is defined
>>>>> as "no support for miscellaneous features".
>>>>>
>>>>> Aarch64 Xen is not affected by this patch.
>>>> But it looks to also be affected by ...
>>> 
>>> AFAICT, the smoke test passed on Laxton0 (AMD Seattle) [1] over the 
>>> week-end.
>>> 
>>>>> Fixes: 9cfdb489af81 ("xen/arm: Add ID registers and complete cpuinfo")
>>>> ... this, faulting (according to osstest logs) early during boot on
>>> 
>>> The xen-unstable flight [2] ran on Rochester0 (Cavium Thunder-X). So this 
>>> has something to do with the platform.
>>> 
>>> The main difference is AMD Seattle supports AArch32 while Cavium Thunder-X 
>>> doesn't.
>>> 
>>>> 0025D314   mrs x1, id_pfr2_el1
>>> This register contains information for the AArch32 state.
>>> 
>>> AFAICT, the Arm Arm back to at least ARM DDI 0487A.j (published in 2016) 
>>> described the encoding as Read-Only. So I am not sure why we receive an 
>>> UNDEF here, the more it looks like ID_PFR{0, 1}_EL1 were correctly accessed.
>>> 
>>> Andre, Bertrand, do you have any clue?
>> I will double check this but my understanding when I checked this was that 
>> it would be possible to read with an unknown value but should not generate 
>> an UNDEF.
>>> 
>>> However, most of the AArch32 ID registers are UNKNOWN on platform not 
>>> implementing AArch32. So we may want to conditionally skip the access to 
>>> AArch32 state.
>> We could skip aarch32 registers on platforms not supporting aarch32 but we 
>> will still have to provide values to a guest trying to access them so might 
>> be better to return what is returned by the hardware.
> 
> Per the Arm Arm, the value of the registers may change at any time. IOW, two
> reads of the system registers may return different values.
> 
> IIRC, the original intent of the series was to provide sanitized value of the 
> ID registers. So I think it would be unwise to let the guest using the values.
> 
> Instead, I would suggest to implement them as RAZ.

Works for me.

> 
>> Now if some platforms are generating an UNDEF we need to understand in what 
>> cases and behave the same way for the guest.
> 
> I am not entirely sure what you mean by platforms here.
> 
> If you mean any platform conforming with the Arm Arm, then I agree with your 
> statement.
> 
> However, if you refer to platform that may not follow the Arm Arm, then I 
> disagree. We should try to expose a sane interface to the guest whenever it 
> is possible.
> 
> In this case, I would bet the hardware would not even allow us to trap the 
> ID_PFR2. Although, I haven't tried it.
> 
>> Do i understand it right that on Cavium which has no aarch32 support the 
>> access is generating an UNDEF ?
> 
> Yes. The UNDEF will happen when trying to read ID_PFR2_EL1. Interestingly, it 
> doesn't happen when reading ID_PFR{0, 1}_EL1. So this smells like a silicon 
> bug.

Sounds like the ifdef ARM64 should be something like if (!cavium)

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall

Re: [PATCH v2] xen/arm: do not read MVFR2 when is not defined

2021-01-11 Thread Julien Grall


On 11/01/2021 18:21, Bertrand Marquis wrote:
> Hi Julien,

Hi Bertrand,

> Sorry for the delay but I was on holiday until today.

Welcome back! No worries.

>> On 11 Jan 2021, at 10:25, Julien Grall  wrote:
>>
>> Hi Jan,
>>
>> On 11/01/2021 08:49, Jan Beulich wrote:
>>> On 08.01.2021 20:22, Stefano Stabellini wrote:
>>>> MVFR2 is not available on ARMv7. It is available on ARMv8 aarch32 and
>>>> aarch64. If Xen reads MVFR2 on ARMv7 it could crash.
>>>>
>>>> Avoid the issue by doing the following:
>>>>
>>>> - define MVFR2_MAYBE_UNDEFINED on arm32
>>>> - if MVFR2_MAYBE_UNDEFINED, do not attempt to read MVFR2 in Xen
>>>> - keep the 3rd register_t in struct cpuinfo_arm.mvfr on arm32 so that a
>>>>   guest read to the register returns '0' instead of crashing the guest.
>>>>
>>>> '0' is an appropriate value to return to the guest because it is defined
>>>> as "no support for miscellaneous features".
>>>>
>>>> Aarch64 Xen is not affected by this patch.
>>> But it looks to also be affected by ...
>>
>> AFAICT, the smoke test passed on Laxton0 (AMD Seattle) [1] over the week-end.
>>
>>>> Fixes: 9cfdb489af81 ("xen/arm: Add ID registers and complete cpuinfo")
>>> ... this, faulting (according to osstest logs) early during boot on
>>
>> The xen-unstable flight [2] ran on Rochester0 (Cavium Thunder-X). So this has
>> something to do with the platform.
>>
>> The main difference is AMD Seattle supports AArch32 while Cavium Thunder-X
>> doesn't.
>>
>>> 0025D314 mrs x1, id_pfr2_el1
>> This register contains information for the AArch32 state.
>>
>> AFAICT, the Arm Arm back to at least ARM DDI 0487A.j (published in 2016)
>> described the encoding as Read-Only. So I am not sure why we receive an UNDEF
>> here, the more it looks like ID_PFR{0, 1}_EL1 were correctly accessed.
>>
>> Andre, Bertrand, do you have any clue?
>
> I will double check this but my understanding when I checked this was that it
> would be possible to read with an unknown value but should not generate an
> UNDEF.
>
>>
>> However, most of the AArch32 ID registers are UNKNOWN on platform not
>> implementing AArch32. So we may want to conditionally skip the access to
>> AArch32 state.
>
> We could skip aarch32 registers on platforms not supporting aarch32 but we will
> still have to provide values to a guest trying to access them so might be
> better to return what is returned by the hardware.

Per the Arm Arm, the value of the registers may change at any time.
IOW, two reads of the system registers may return different values.

IIRC, the original intent of the series was to provide sanitized value
of the ID registers. So I think it would be unwise to let the guest
using the values.

Instead, I would suggest to implement them as RAZ.

> Now if some platforms are generating an UNDEF we need to understand in what
> cases and behave the same way for the guest.

I am not entirely sure what you mean by platforms here.

If you mean any platform conforming with the Arm Arm, then I agree with
your statement.

However, if you refer to platform that may not follow the Arm Arm, then
I disagree. We should try to expose a sane interface to the guest
whenever it is possible.

In this case, I would bet the hardware would not even allow us to trap
the ID_PFR2. Although, I haven't tried it.

> Do i understand it right that on Cavium which has no aarch32 support the access
> is generating an UNDEF ?

Yes. The UNDEF will happen when trying to read ID_PFR2_EL1.
Interestingly, it doesn't happen when reading ID_PFR{0, 1}_EL1. So this
smells like a silicon bug.


Cheers,

--
Julien Grall

Re: [PATCH v2] xen/arm: do not read MVFR2 when is not defined

2021-01-11 Thread Bertrand Marquis

Hi Julien,

Sorry for the delay but I was on holiday until today.

> On 11 Jan 2021, at 10:25, Julien Grall  wrote:
> 
> Hi Jan,
> 
> On 11/01/2021 08:49, Jan Beulich wrote:
>> On 08.01.2021 20:22, Stefano Stabellini wrote:
>>> MVFR2 is not available on ARMv7. It is available on ARMv8 aarch32 and
>>> aarch64. If Xen reads MVFR2 on ARMv7 it could crash.
>>> 
>>> Avoid the issue by doing the following:
>>> 
>>> - define MVFR2_MAYBE_UNDEFINED on arm32
>>> - if MVFR2_MAYBE_UNDEFINED, do not attempt to read MVFR2 in Xen
>>> - keep the 3rd register_t in struct cpuinfo_arm.mvfr on arm32 so that a
>>>   guest read to the register returns '0' instead of crashing the guest.
>>> 
>>> '0' is an appropriate value to return to the guest because it is defined
>>> as "no support for miscellaneous features".
>>> 
>>> Aarch64 Xen is not affected by this patch.
>> But it looks to also be affected by ...
> 
> AFAICT, the smoke test passed on Laxton0 (AMD Seattle) [1] over the week-end.
> 
>>> Fixes: 9cfdb489af81 ("xen/arm: Add ID registers and complete cpuinfo")
>> ... this, faulting (according to osstest logs) early during boot on
> 
> The xen-unstable flight [2] ran on Rochester0 (Cavium Thunder-X). So this has 
> something to do with the platform.
> 
> The main difference is AMD Seattle supports AArch32 while Cavium Thunder-X 
> doesn't.
> 
>> 0025D314 mrs x1, id_pfr2_el1
> This register contains information for the AArch32 state.
> 
> AFAICT, the Arm Arm back to at least ARM DDI 0487A.j (published in 2016) 
> described the encoding as Read-Only. So I am not sure why we receive an UNDEF 
> here, the more it looks like ID_PFR{0, 1}_EL1 were correctly accessed.
> 
> Andre, Bertrand, do you have any clue?

I will double check this but my understanding when I checked this was that it 
would be possible to read with an unknown value but should not generate an 
UNDEF.

> 
> However, most of the AArch32 ID registers are UNKNOWN on platform not 
> implementing AArch32. So we may want to conditionally skip the access to 
> AArch32 state.

We could skip aarch32 registers on platforms not supporting aarch32 but we will 
still have to provide values to a guest trying to access them so might be 
better to return what is returned by the hardware.
Now if some platforms are generating an UNDEF we need to understand in what 
cases and behave the same way for the guest.

Do i understand it right that on Cavium which has no aarch32 support the access 
is generating an UNDEF ?

Cheers
Bertrand

> 
> Cheers,
> 
> [1] 
> http://logs.test-lab.xenproject.org/osstest/logs/158293/test-arm64-arm64-xl-xsm/info.html
> 
>> Jan
> 
> [1]
> 
> 
> -- 
> Julien Grall

Re: [PATCH v20210111 01/39] stubdom: fix tpm_version

2021-01-11 Thread Samuel Thibault

Olaf Hering, le lun. 11 janv. 2021 18:41:46 +0100, a ecrit:
> It is just a declaration, not a variable.
> 
> ld: 
> /home/abuild/rpmbuild/BUILD/xen-4.14.20200616T103126.3625b04991/non-dbg/stubdom/vtpmmgr/vtpmmgr.a(vtpm_cmd_handler.o):(.bss+0x0):
>  multiple definition of `tpm_version'; 
> /home/abuild/rpmbuild/BUILD/xen-4.14.20200616T103126.3625b04991/non-dbg/stubdom/vtpmmgr/vtpmmgr.a(vtpmmgr.o):(.bss+0x0):
>  first defined here
> 
> Signed-off-by: Olaf Hering 

Reviewed-by: Samuel Thibault 

> ---
>  stubdom/vtpmmgr/vtpmmgr.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/stubdom/vtpmmgr/vtpmmgr.h b/stubdom/vtpmmgr/vtpmmgr.h
> index 2e6f8de9e4..f40ca9fd67 100644
> --- a/stubdom/vtpmmgr/vtpmmgr.h
> +++ b/stubdom/vtpmmgr/vtpmmgr.h
> @@ -53,7 +53,7 @@
>  enum {
>  TPM1_HARDWARE = 1,
>  TPM2_HARDWARE,
> -} tpm_version;
> +};
>  
>  struct tpm_hardware_version {
>  int hw_version;
>

[PATCH v20210111 01/39] stubdom: fix tpm_version

2021-01-11 Thread Olaf Hering

It is just a declaration, not a variable.

ld: 
/home/abuild/rpmbuild/BUILD/xen-4.14.20200616T103126.3625b04991/non-dbg/stubdom/vtpmmgr/vtpmmgr.a(vtpm_cmd_handler.o):(.bss+0x0):
 multiple definition of `tpm_version'; 
/home/abuild/rpmbuild/BUILD/xen-4.14.20200616T103126.3625b04991/non-dbg/stubdom/vtpmmgr/vtpmmgr.a(vtpmmgr.o):(.bss+0x0):
 first defined here

Signed-off-by: Olaf Hering 
---
 stubdom/vtpmmgr/vtpmmgr.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/stubdom/vtpmmgr/vtpmmgr.h b/stubdom/vtpmmgr/vtpmmgr.h
index 2e6f8de9e4..f40ca9fd67 100644
--- a/stubdom/vtpmmgr/vtpmmgr.h
+++ b/stubdom/vtpmmgr/vtpmmgr.h
@@ -53,7 +53,7 @@
 enum {
 TPM1_HARDWARE = 1,
 TPM2_HARDWARE,
-} tpm_version;
+};
 
 struct tpm_hardware_version {
 int hw_version;
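
For context, a small C sketch of why the one-line change is enough. At file
scope, `enum { ... } tpm_version;` both declares the enumerators and
tentatively defines an object named tpm_version in every translation unit
that includes the header; toolchains that default to -fno-common (GCC 10 and
later) then report the multiple definition shown above. Dropping the
declarator leaves only the enumerators. The helper tpm_version_name() below
is hypothetical and exists only to show that the constants remain usable.

```c
/* Declares the two enumerators only; no object is defined, so the header
 * can be included from any number of translation units. */
enum {
    TPM1_HARDWARE = 1,
    TPM2_HARDWARE,
};

struct tpm_hardware_version {
    int hw_version;
};

/* Hypothetical helper, not part of vtpmmgr. */
static const char *tpm_version_name(const struct tpm_hardware_version *v)
{
    switch (v->hw_version) {
    case TPM1_HARDWARE:
        return "TPM 1.2";
    case TPM2_HARDWARE:
        return "TPM 2.0";
    default:
        return "unknown";
    }
}
```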

[PATCH v20210111 28/39] tools/guest: restore: move pfns array in populate_pfns

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move populate_pfns' pfns array into 
preallocated space.
Use some prefix to avoid conflict with an array used in handle_page_data.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  |  1 +
 tools/libs/guest/xg_sr_restore.c | 11 +--
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 96a77b5969..3fe665b91d 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -232,6 +232,7 @@ struct xc_sr_restore_arrays {
 int map_errs[MAX_BATCH_SIZE];
 /* populate_pfns */
 xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
+xen_pfn_t pp_pfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 85a32aaed2..71b39612ee 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -139,17 +139,10 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int 
count,
 {
 xc_interface *xch = ctx->xch;
 xen_pfn_t *mfns = ctx->restore.m->pp_mfns,
-*pfns = malloc(count * sizeof(*pfns));
+*pfns = ctx->restore.m->pp_pfns;
 unsigned int i, nr_pfns = 0;
 int rc = -1;
 
-if ( !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
 for ( i = 0; i < count; ++i )
 {
 if ( (!types ||
@@ -190,8 +183,6 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int 
count,
 rc = 0;
 
  err:
-free(pfns);
-
 return rc;
 }
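
The pattern behind this patch can be sketched in isolation as follows. This
is an illustration, not the libxenguest code: the type and function names
are hypothetical, and the value of MAX_BATCH_SIZE is assumed. The scratch
arrays are allocated once with the context, and the hot-path function only
borrows them, which is why the malloc() failure path and the free() at the
exit label can go away.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_BATCH_SIZE 1024 /* assumed value, for illustration only */

typedef uint64_t pfn_t; /* stand-in for xen_pfn_t */

/* Scratch space shared by every batch, allocated once per restore. */
struct restore_arrays {
    pfn_t pp_mfns[MAX_BATCH_SIZE];
    pfn_t pp_pfns[MAX_BATCH_SIZE];
};

struct restore_ctx {
    struct restore_arrays *m;
};

static int restore_ctx_init(struct restore_ctx *ctx)
{
    ctx->m = malloc(sizeof(*ctx->m));
    return ctx->m ? 0 : -1;
}

static void restore_ctx_free(struct restore_ctx *ctx)
{
    free(ctx->m);
    ctx->m = NULL;
}

/*
 * Hot path: gather the pfns that still need populating into the
 * preallocated array. Returns the number gathered, or -1 if the batch
 * does not fit into the preallocated space.
 */
static int collect_unpopulated(struct restore_ctx *ctx, const pfn_t *pfns,
                               const unsigned char *populated,
                               unsigned int count)
{
    pfn_t *scratch = ctx->m->pp_pfns;
    unsigned int i, nr = 0;

    if ( count > MAX_BATCH_SIZE )
        return -1;

    for ( i = 0; i < count; ++i )
        if ( !populated[i] )
            scratch[nr++] = pfns[i];

    return (int)nr;
}
```

The trade-off is that the batch size must be bounded up front; the sketch
makes that explicit with the count check.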

[PATCH v20210111 34/39] tools: adjust libxl_domain_suspend to receive a struct props

2021-01-11 Thread Olaf Hering

Upcoming changes will pass more knobs down to xc_domain_save.
Adjust the libxl_domain_suspend API to allow easy adding of additional knobs.

No change in behavior intended.

Signed-off-by: Olaf Hering 
---
 tools/include/libxl.h| 26 +++---
 tools/libs/light/libxl_domain.c  |  7 ---
 tools/ocaml/libs/xl/xenlight_stubs.c |  3 ++-
 tools/xl/xl_migrate.c|  9 ++---
 tools/xl/xl_saverestore.c|  3 ++-
 5 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 6546dcd819..94b8f1095f 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1667,12 +1667,32 @@ static inline int 
libxl_retrieve_domain_configuration_0x041200(
 libxl_retrieve_domain_configuration_0x041200
 #endif
 
+/*
+ * LIBXL_HAVE_DOMAIN_SUSPEND_PROPS indicates that the
+ * libxl_domain_suspend_props() function takes a props struct.
+ */
+#define LIBXL_HAVE_DOMAIN_SUSPEND_PROPS 1
+
+typedef struct {
+uint32_t flags; /* LIBXL_SUSPEND_* */
+} libxl_domain_suspend_props;
+#define LIBXL_SUSPEND_DEBUG 1
+#define LIBXL_SUSPEND_LIVE 2
+
 int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
- int flags, /* LIBXL_SUSPEND_* */
+ libxl_domain_suspend_props *props,
  const libxl_asyncop_how *ao_how)
  LIBXL_EXTERNAL_CALLERS_ONLY;
-#define LIBXL_SUSPEND_DEBUG 1
-#define LIBXL_SUSPEND_LIVE 2
+#if defined(LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x041500
+static inline int libxl_domain_suspend_0x041400(libxl_ctx *ctx, uint32_t domid,
+ int fd, int flags, /* LIBXL_SUSPEND_* */
+ const libxl_asyncop_how *ao_how)
+{
+libxl_domain_suspend_props props = { .flags = flags, };
+return libxl_domain_suspend(ctx, domid, fd, &props, ao_how);
+}
+#define libxl_domain_suspend libxl_domain_suspend_0x041400
+#endif
 
 /*
  * Only suspend domain, do not save its state to file, do not destroy it.
diff --git a/tools/libs/light/libxl_domain.c b/tools/libs/light/libxl_domain.c
index 5d4ec90711..45e0c57c3a 100644
--- a/tools/libs/light/libxl_domain.c
+++ b/tools/libs/light/libxl_domain.c
@@ -505,7 +505,8 @@ static void domain_suspend_cb(libxl__egc *egc,
 
 }
 
-int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
+int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
+ libxl_domain_suspend_props *props,
  const libxl_asyncop_how *ao_how)
 {
 AO_CREATE(ctx, domid, ao_how);
@@ -526,8 +527,8 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, 
int fd, int flags,
 dss->domid = domid;
 dss->fd = fd;
 dss->type = type;
-dss->live = flags & LIBXL_SUSPEND_LIVE;
-dss->debug = flags & LIBXL_SUSPEND_DEBUG;
+dss->live = props->flags & LIBXL_SUSPEND_LIVE;
+dss->debug = props->flags & LIBXL_SUSPEND_DEBUG;
 dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
 
 rc = libxl__fd_flags_modify_save(gc, dss->fd,
diff --git a/tools/ocaml/libs/xl/xenlight_stubs.c 
b/tools/ocaml/libs/xl/xenlight_stubs.c
index 352a00134d..eaf7bce35a 100644
--- a/tools/ocaml/libs/xl/xenlight_stubs.c
+++ b/tools/ocaml/libs/xl/xenlight_stubs.c
@@ -614,10 +614,11 @@ value stub_libxl_domain_suspend(value ctx, value domid, 
value fd, value async, v
int ret;
uint32_t c_domid = Int_val(domid);
int c_fd = Int_val(fd);
+libxl_domain_suspend_props props = {};
libxl_asyncop_how *ao_how = aohow_val(async);
 
caml_enter_blocking_section();
-   ret = libxl_domain_suspend(CTX, c_domid, c_fd, 0, ao_how);
+   ret = libxl_domain_suspend(CTX, c_domid, c_fd, &props, ao_how);
caml_leave_blocking_section();
 
free(ao_how);
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index 856a6e2be1..fc9f69bf06 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -188,7 +188,10 @@ static void migrate_domain(uint32_t domid, int 
preserve_domid,
 char *away_domname;
 char rc_buf;
 uint8_t *config_data;
-int config_len, flags = LIBXL_SUSPEND_LIVE;
+int config_len;
+libxl_domain_suspend_props props = {
+.flags = LIBXL_SUSPEND_LIVE,
+};
 unsigned xtl_flags = XTL_STDIOSTREAM_HIDE_PROGRESS;
 
 save_domain_core_begin(domid, preserve_domid, override_config_file,
@@ -210,8 +213,8 @@ static void migrate_domain(uint32_t domid, int 
preserve_domid,
 xtl_stdiostream_adjust_flags(logger, xtl_flags, 0);
 
 if (debug)
-flags |= LIBXL_SUSPEND_DEBUG;
-rc = libxl_domain_suspend(ctx, domid, send_fd, flags, NULL);
+props.flags |= LIBXL_SUSPEND_DEBUG;
+rc = libxl_domain_suspend(ctx, domid, send_fd, &props, NULL);
 if (rc) {
 fprintf(stderr, "migration sender: libxl_domain_suspend failed"
 " (rc=%d)\n", rc);
diff --git a/tools/xl/xl_saverestor
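
The API shape introduced by this patch, a props struct plus an inline
wrapper that keeps the old signature for older callers, can be sketched
independently of libxl. All names below are hypothetical stand-ins chosen so
the example is self-contained; only the flag values and the overall pattern
follow the patch.

```c
#include <stdint.h>

#define SUSPEND_DEBUG 1u
#define SUSPEND_LIVE  2u

/* New knobs can be appended here without touching the prototype again. */
typedef struct {
    uint32_t flags; /* SUSPEND_* */
} suspend_props;

/* Hypothetical stand-in for the internal suspend state. */
struct suspend_state {
    int live;
    int debug;
};

/* New-style entry point: every knob arrives via the props struct. */
static int domain_suspend(struct suspend_state *out,
                          const suspend_props *props)
{
    if (!props)
        return -1;

    out->live = !!(props->flags & SUSPEND_LIVE);
    out->debug = !!(props->flags & SUSPEND_DEBUG);
    return 0;
}

/* Compatibility shim: the old "int flags" signature on top of the new one. */
static inline int domain_suspend_compat(struct suspend_state *out, int flags)
{
    suspend_props props = { .flags = (uint32_t)flags };

    return domain_suspend(out, &props);
}
```

A caller built against the old interface keeps passing a plain flags
integer through the shim, while new callers fill in the struct and pick up
future fields with their zero default.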

[PATCH v20210111 26/39] tools/guest: restore: move map_errs array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move map_errs array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  |  1 +
 tools/libs/guest/xg_sr_restore.c | 12 +---
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 5731a5c186..eba3a49877 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -229,6 +229,7 @@ struct xc_sr_restore_arrays {
 uint32_t types[MAX_BATCH_SIZE];
 /* process_page_data */
 xen_pfn_t mfns[MAX_BATCH_SIZE];
+int map_errs[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 3ba089f862..94c329032f 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -206,21 +206,13 @@ static int process_page_data(struct xc_sr_context *ctx, 
unsigned int count,
 {
 xc_interface *xch = ctx->xch;
 xen_pfn_t *mfns = ctx->restore.m->mfns;
-int *map_errs = malloc(count * sizeof(*map_errs));
+int *map_errs = ctx->restore.m->map_errs;
 int rc;
 void *mapping = NULL, *guest_page = NULL;
 unsigned int i, /* i indexes the pfns from the record. */
 j,  /* j indexes the subset of pfns we decide to map. */
 nr_pages = 0;
 
-if ( !map_errs )
-{
-rc = -1;
-ERROR("Failed to allocate %zu bytes to process page data",
-  count * (sizeof(*mfns) + sizeof(*map_errs)));
-goto err;
-}
-
 rc = populate_pfns(ctx, count, pfns, types);
 if ( rc )
 {
@@ -298,8 +290,6 @@ static int process_page_data(struct xc_sr_context *ctx, 
unsigned int count,
 if ( mapping )
 xenforeignmemory_unmap(xch->fmem, mapping, nr_pages);
 
-free(map_errs);
-
 return rc;
 }

[PATCH v20210111 32/39] tools: remove tabs from code produced by libxl_save_msgs_gen.pl

2021-01-11 Thread Olaf Hering

Signed-off-by: Olaf Hering 
---
 tools/libs/light/libxl_save_msgs_gen.pl | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/libs/light/libxl_save_msgs_gen.pl 
b/tools/libs/light/libxl_save_msgs_gen.pl
index 5bfbd4fd10..9d425b1dee 100755
--- a/tools/libs/light/libxl_save_msgs_gen.pl
+++ b/tools/libs/light/libxl_save_msgs_gen.pl
@@ -120,8 +120,8 @@ sub typeid ($) { my ($t) = @_; $t =~ s/\W/_/; return $t; };
 
 $out_body{'callout'} .= <($c_recv);
 $c_decl .= "void *user)";

[PATCH v20210111 22/39] tools/guest: save: move local_pages array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move local_pages array into preallocated space.

Adjust the code to use the src page as is in case of HVM.
In case of PV the page may need to be normalised, use a private memory
area for this purpose.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h   | 22 ++-
 tools/libs/guest/xg_sr_save.c | 25 +++--
 tools/libs/guest/xg_sr_save_x86_hvm.c |  5 +++--
 tools/libs/guest/xg_sr_save_x86_pv.c  | 31 ++-
 4 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 33e66678c6..2a020fef5c 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -33,16 +33,12 @@ struct xc_sr_save_ops
  * Optionally transform the contents of a page from being specific to the
  * sending environment, to being generic for the stream.
  *
- * The page of data at the end of 'page' may be a read-only mapping of a
- * running guest; it must not be modified.  If no transformation is
- * required, the callee should leave '*pages' untouched.
+ * The page of data '*src' may be a read-only mapping of a running guest;
+ * it must not be modified. If no transformation is required, the callee
+ * should leave '*src' untouched, and return it via '**ptr'.
  *
- * If a transformation is required, the callee should allocate themselves
- * a local page using malloc() and return it via '*page'.
- *
- * The caller shall free() '*page' in all cases.  In the case that the
- * callee encounters an error, it should *NOT* free() the memory it
- * allocated for '*page'.
+ * If a transformation is required, the callee should provide the
+ * transformed page in a private buffer and return it via '**ptr'.
  *
  * It is valid to fail with EAGAIN if the transformation is not able to be
  * completed at this point.  The page shall be retried later.
@@ -50,7 +46,7 @@ struct xc_sr_save_ops
  * @returns 0 for success, -1 for failure, with errno appropriately set.
  */
 int (*normalise_page)(struct xc_sr_context *ctx, xen_pfn_t type,
-  void **page);
+  void *src, unsigned int idx, void **ptr);
 
 /**
  * Set up local environment to save a domain. (Typically querying
@@ -371,6 +367,12 @@ struct xc_sr_context
 
 union
 {
+struct
+{
+/* Used by write_batch for modified pages. */
+void *normalised_pages;
+} save;
+
 struct
 {
 /* State machine for the order of received records. */
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index a571246894..e724ba9cb8 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -91,11 +91,10 @@ static int write_batch(struct xc_sr_context *ctx)
 xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types;
 void *guest_mapping = NULL;
 void **guest_data = ctx->save.m->guest_data;
-void **local_pages = NULL;
 int *errors = ctx->save.m->errors, rc = -1;
 unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
 unsigned int nr_pfns = ctx->save.nr_batch_pfns;
-void *page, *orig_page;
+void *src;
 uint64_t *rec_pfns = ctx->save.m->rec_pfns;
 struct iovec *iov = ctx->save.m->iov; int iovcnt = 0;
 struct xc_sr_rec_page_data_header hdr = { 0 };
@@ -105,16 +104,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
 assert(nr_pfns != 0);
 
-/* Pointers to locally allocated pages.  Need freeing. */
-local_pages = calloc(nr_pfns, sizeof(*local_pages));
-
-if ( !local_pages )
-{
-ERROR("Unable to allocate arrays for a batch of %u pages",
-  nr_pfns);
-goto err;
-}
-
 for ( i = 0; i < nr_pfns; ++i )
 {
 types[i] = mfns[i] = ctx->save.ops.pfn_to_gfn(ctx,
@@ -176,11 +165,8 @@ static int write_batch(struct xc_sr_context *ctx)
 goto err;
 }
 
-orig_page = page = guest_mapping + (p * PAGE_SIZE);
-rc = ctx->save.ops.normalise_page(ctx, types[i], &page);
-
-if ( orig_page != page )
-local_pages[i] = page;
+src = guest_mapping + (p * PAGE_SIZE);
+rc = ctx->save.ops.normalise_page(ctx, types[i], src, i, 
&guest_data[i]);
 
 if ( rc )
 {
@@ -195,8 +181,6 @@ static int write_batch(struct xc_sr_context *ctx)
 else
 goto err;
 }
-else
-guest_data[i] = page;
 
 rc = -1;
 ++p;
@@ -255,9 +239,6 @@ static int write_batch(struct xc_sr_context *ctx)
  err:
 if ( guest_mapping )
 xenforeignmemory_un
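
The callback contract described in the updated comment can be sketched as
follows. This is an illustration under assumptions, not the code from the
patch: the context struct, the function names and the placeholder
"transformation" are hypothetical. One callback hands the source page back
untouched, the other copies it into slot idx of a private buffer that was
allocated once and returns that copy, so the caller never has to free
anything per page.

```c
#include <stddef.h>
#include <string.h>

#define SR_PAGE_SIZE 4096 /* assumed page size for the sketch */

/* Hypothetical save context holding the private, preallocated buffer. */
struct save_ctx {
    unsigned char *normalised_pages; /* nr_slots * SR_PAGE_SIZE bytes */
    size_t nr_slots;
};

/* No transformation needed: return the source page via *ptr. */
static int normalise_page_passthrough(struct save_ctx *ctx, void *src,
                                      unsigned int idx, void **ptr)
{
    (void)ctx;
    (void)idx;
    *ptr = src;
    return 0;
}

/*
 * Transformation needed: work on a copy in slot idx of the private
 * buffer and leave the (possibly read-only) source page untouched.
 */
static int normalise_page_copy(struct save_ctx *ctx, void *src,
                               unsigned int idx, void **ptr)
{
    unsigned char *dst;

    if (idx >= ctx->nr_slots)
        return -1;

    dst = ctx->normalised_pages + (size_t)idx * SR_PAGE_SIZE;
    memcpy(dst, src, SR_PAGE_SIZE);
    dst[0] = 0; /* placeholder for the real normalisation step */
    *ptr = dst;
    return 0;
}
```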

[PATCH v20210111 33/39] tools: recognize LIBXL_API_VERSION for 4.15

2021-01-11 Thread Olaf Hering

This is required by upcoming API changes.

Signed-off-by: Olaf Hering 
---
 tools/include/libxl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 3433c950f9..6546dcd819 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -690,7 +690,8 @@ typedef struct libxl__ctx libxl_ctx;
 #if LIBXL_API_VERSION != 0x040200 && LIBXL_API_VERSION != 0x040300 && \
 LIBXL_API_VERSION != 0x040400 && LIBXL_API_VERSION != 0x040500 && \
 LIBXL_API_VERSION != 0x040700 && LIBXL_API_VERSION != 0x040800 && \
-LIBXL_API_VERSION != 0x041300 && LIBXL_API_VERSION != 0x041400
+LIBXL_API_VERSION != 0x041300 && LIBXL_API_VERSION != 0x041400 && \
+LIBXL_API_VERSION != 0x041500
 #error Unknown LIBXL_API_VERSION
 #endif
 #endif

[PATCH v20210111 39/39] tools: add --abort_if_busy to libxl_domain_suspend

2021-01-11 Thread Olaf Hering

Provide a knob to the host admin to abort the live migration of a
running domU if the downtime during final transit will be too long
for the workload within domU.

Adjust error reporting. Add ERROR_MIGRATION_ABORTED to allow callers of
libxl_domain_suspend to distinguish between errors and the requested
constraint.

Adjust precopy_policy to simplify reporting of remaining dirty pages.
The loop in send_memory_live populates ->dirty_count in a different
place than ->iteration. Let it proceed one more time to provide the
desired information before leaving the loop.

This patch adjusts xl(1) and the libxl API.
External users check LIBXL_HAVE_DOMAIN_SUSPEND_PROPS for the availability
of the new .abort_if_busy property.
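
The intended decision can be sketched as a small standalone function.
This is only an illustration under assumptions: the enum, the struct
knobs and busy_aware_policy() are hypothetical names, not libxl API, and
the sketch follows the behaviour described in the man page text (abort
only when the iteration budget is used up while the guest is still too
busy) rather than the exact control flow of the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the XGS_POLICY_* return values. */
enum policy { POLICY_CONTINUE_PRECOPY, POLICY_STOP_AND_COPY, POLICY_ABORT };

/* Hypothetical container for the three admin knobs of this series. */
struct knobs {
    uint32_t max_iters;      /* iteration budget */
    uint32_t min_remaining;  /* dirty page threshold for the final move */
    bool abort_if_busy;      /* abort instead of a long final suspend */
};

enum policy busy_aware_policy(const struct knobs *k,
                              unsigned int iteration, long dirty_count)
{
    assert(k != NULL);

    /* A negative dirty_count means the value is not known yet. */
    if (dirty_count < 0)
        return POLICY_CONTINUE_PRECOPY;

    /* Few enough dirty pages left: the final suspend will be short. */
    if (dirty_count < (long)k->min_remaining)
        return POLICY_STOP_AND_COPY;

    /* Budget used up while the guest is still busy. */
    if (iteration >= k->max_iters)
        return k->abort_if_busy ? POLICY_ABORT : POLICY_STOP_AND_COPY;

    return POLICY_CONTINUE_PRECOPY;
}
```

With max_iters 5 and min_remaining 50, a guest that still has 1000 dirty
pages after iteration 5 yields POLICY_ABORT when abort_if_busy is set and
POLICY_STOP_AND_COPY otherwise.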

Signed-off-by: Olaf Hering 
---
 docs/man/xl.1.pod.in  |  8 +++
 tools/include/libxl.h |  1 +
 tools/libs/light/libxl_dom_save.c |  7 ++-
 tools/libs/light/libxl_domain.c   |  1 +
 tools/libs/light/libxl_internal.h |  2 ++
 tools/libs/light/libxl_stream_write.c |  9 +++-
 tools/libs/light/libxl_types.idl  |  1 +
 tools/xl/xl_cmdtable.c|  6 +-
 tools/xl/xl_migrate.c | 30 ---
 9 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 930270fe23..8064acb226 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -508,6 +508,14 @@ low, the guest is suspended and the domU will finally be 
moved to I.
 This allows the host admin to control for how long the domU will likely
 be suspended during transit.
 
+=item B<--abort_if_busy>
+
+Abort migration instead of doing final suspend/move/resume if the
+guest produced more than I dirty pages during the number
+of I iterations.
+This avoids long periods of time where the guest is suspended, which
+may confuse the workload within domU.
+
 =back
 
 =item B [I] I I
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index d45d3a4460..ad660e9c9f 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1680,6 +1680,7 @@ typedef struct {
 } libxl_domain_suspend_props;
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
+#define LIBXL_SUSPEND_ABORT_IF_BUSY 4
 
 int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
  libxl_domain_suspend_props *props,
diff --git a/tools/libs/light/libxl_dom_save.c 
b/tools/libs/light/libxl_dom_save.c
index ad5df89b2c..1999a8997f 100644
--- a/tools/libs/light/libxl_dom_save.c
+++ b/tools/libs/light/libxl_dom_save.c
@@ -383,11 +383,16 @@ static int 
libxl__domain_save_precopy_policy(precopy_stats_t stats, void *user)
  stats.iteration, stats.dirty_count, stats.total_written);
 if (stats.dirty_count >= 0 && stats.dirty_count < dss->min_remaining)
 goto stop_copy;
-if (stats.iteration >= dss->max_iters)
+if (stats.dirty_count >= 0 && stats.iteration >= dss->max_iters)
 goto stop_copy;
 return XGS_POLICY_CONTINUE_PRECOPY;
 
 stop_copy:
+if (dss->abort_if_busy)
+{
+dss->remaining_dirty_pages = stats.dirty_count;
+return XGS_POLICY_ABORT;
+}
 return XGS_POLICY_STOP_AND_COPY;
 }
 
diff --git a/tools/libs/light/libxl_domain.c b/tools/libs/light/libxl_domain.c
index ae4dc9ad01..913653bd76 100644
--- a/tools/libs/light/libxl_domain.c
+++ b/tools/libs/light/libxl_domain.c
@@ -529,6 +529,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, 
int fd,
 dss->type = type;
 dss->max_iters = props->max_iters ?: LIBXL_XGS_POLICY_MAX_ITERATIONS;
 dss->min_remaining = props->min_remaining ?: 
LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT;
+dss->abort_if_busy = props->flags & LIBXL_SUSPEND_ABORT_IF_BUSY;
 dss->live = props->flags & LIBXL_SUSPEND_LIVE;
 dss->debug = props->flags & LIBXL_SUSPEND_DEBUG;
 dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
diff --git a/tools/libs/light/libxl_internal.h 
b/tools/libs/light/libxl_internal.h
index d7631022a0..c08918b37b 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -3639,9 +3639,11 @@ struct libxl__domain_save_state {
 libxl_domain_type type;
 int live;
 int debug;
+int abort_if_busy;
 int checkpointed_stream;
 uint32_t max_iters;
 uint32_t min_remaining;
+long remaining_dirty_pages;
 const libxl_domain_remus_info *remus;
 /* private */
 int rc;
diff --git a/tools/libs/light/libxl_stream_write.c 
b/tools/libs/light/libxl_stream_write.c
index 634f3240d1..1ab3943f3e 100644
--- a/tools/libs/light/libxl_stream_write.c
+++ b/tools/libs/light/libxl_stream_write.c
@@ -344,11 +344,18 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void 
*dss_void,
 goto err;
 
 if (retval) {
+if (dss->remaining_dirty_pages) {
+LOGD(NOTICE, dss->domid, "saving domain: aborted,"
+ " %ld remaining dirty pages.", dss->remaining_dirty_pages);
+} else

[PATCH v20210111 37/39] tools: add --max_iters to libxl_domain_suspend

2021-01-11 Thread Olaf Hering

Migrating a large, and potentially busy, domU will take more
time than necessary due to an excessive number of copying iterations.

Allow the host admin to control the number of iterations which
copy accumulated domU dirty pages to the target host.

The default remains 5, which means one initial iteration to copy the
entire domU memory, and up to 4 additional iterations to copy dirty
memory from the still running domU. After the given number of iterations
the domU is suspended, remaining dirty memory is copied and the domU is
finally moved to the target host.

This patch adjusts xl(1) and the libxl API.
External users check LIBXL_HAVE_DOMAIN_SUSPEND_PROPS for the availability
of the new .max_iters property.
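
A caller that leaves the new field at zero keeps the old behaviour. The
diff expresses this with the GNU shorthand
"props->max_iters ?: LIBXL_XGS_POLICY_MAX_ITERATIONS"; the sketch below
shows the same fallback in portable C. The struct suspend_props, the
DEFAULT_MAX_ITERS macro and effective_max_iters() are hypothetical
stand-ins for illustration, not the libxl definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default, mirroring the value 5 named in the commit message. */
#define DEFAULT_MAX_ITERS 5u

/* Hypothetical mirror of the props struct after this patch. */
struct suspend_props {
    uint32_t flags;
    uint32_t max_iters;   /* 0 means "use the default" */
};

uint32_t effective_max_iters(const struct suspend_props *props)
{
    assert(props != NULL);
    /* Portable spelling of "props->max_iters ?: DEFAULT_MAX_ITERS". */
    return props->max_iters ? props->max_iters : DEFAULT_MAX_ITERS;
}
```

One consequence of this design is that a caller cannot request zero
iterations, because zero is reserved for "not set".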

Signed-off-by: Olaf Hering 
---
 docs/man/xl.1.pod.in  |  4 
 tools/include/libxl.h |  1 +
 tools/libs/light/libxl_dom_save.c |  2 +-
 tools/libs/light/libxl_domain.c   |  1 +
 tools/libs/light/libxl_internal.h |  1 +
 tools/xl/xl_cmdtable.c|  3 ++-
 tools/xl/xl_migrate.c | 10 +-
 7 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index e6e4e8e83a..8339bd8e6f 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -496,6 +496,10 @@ such that it will be identical on the destination host, 
unless that
 configuration is overridden using the B<-C> option. Note that it is not
 possible to use this option for a 'localhost' migration.
 
+=item B<--max_iters> I
+
+Number of copy iterations before final suspend+move (default: 5)
+
 =back
 
 =item B [I] I I
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 94b8f1095f..646cef28aa 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1675,6 +1675,7 @@ static inline int 
libxl_retrieve_domain_configuration_0x041200(
 
 typedef struct {
 uint32_t flags; /* LIBXL_SUSPEND_* */
+uint32_t max_iters;
 } libxl_domain_suspend_props;
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
diff --git a/tools/libs/light/libxl_dom_save.c 
b/tools/libs/light/libxl_dom_save.c
index 3f3cff0342..938c0127f3 100644
--- a/tools/libs/light/libxl_dom_save.c
+++ b/tools/libs/light/libxl_dom_save.c
@@ -383,7 +383,7 @@ static int 
libxl__domain_save_precopy_policy(precopy_stats_t stats, void *user)
  stats.iteration, stats.dirty_count, stats.total_written);
 if (stats.dirty_count >= 0 && stats.dirty_count < 
LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT)
 goto stop_copy;
-if (stats.iteration >= LIBXL_XGS_POLICY_MAX_ITERATIONS)
+if (stats.iteration >= dss->max_iters)
 goto stop_copy;
 return XGS_POLICY_CONTINUE_PRECOPY;
 
diff --git a/tools/libs/light/libxl_domain.c b/tools/libs/light/libxl_domain.c
index 45e0c57c3a..612d3dc4ea 100644
--- a/tools/libs/light/libxl_domain.c
+++ b/tools/libs/light/libxl_domain.c
@@ -527,6 +527,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, 
int fd,
 dss->domid = domid;
 dss->fd = fd;
 dss->type = type;
+dss->max_iters = props->max_iters ?: LIBXL_XGS_POLICY_MAX_ITERATIONS;
 dss->live = props->flags & LIBXL_SUSPEND_LIVE;
 dss->debug = props->flags & LIBXL_SUSPEND_DEBUG;
 dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
diff --git a/tools/libs/light/libxl_internal.h 
b/tools/libs/light/libxl_internal.h
index d4cc694c01..80a31c3a88 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -3640,6 +3640,7 @@ struct libxl__domain_save_state {
 int live;
 int debug;
 int checkpointed_stream;
+uint32_t max_iters;
 const libxl_domain_remus_info *remus;
 /* private */
 int rc;
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index a0567169bf..75e2e4d54b 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -170,7 +170,8 @@ struct cmd_spec cmd_table[] = {
   "-T  Show timestamps during the migration process.\n"
   "--debug Verify transferred domU page data.\n"
   "-p  Do not unpause domain after migrating it.\n"
-  "-D  Preserve the domain id"
+  "-D  Preserve the domain id\n"
+  "--max_iters N   Number of copy iterations before final stop+move"
 },
 { "restore",
   &main_restore, 0, 1,
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index fc9f69bf06..a724bc21ea 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -180,6 +180,7 @@ static void migrate_do_preamble(int send_fd, int recv_fd, 
pid_t child,
 
 static void migrate_domain(uint32_t domid, int preserve_domid,
const char *rune, int debug,
+   uint32_t max_iters,
const char *override_config_file)
 {
 pid_t child = -1;
@@ -191,6 +192,7 @@ static void migrate_domain(uint32_t domid, int 
preserve_domid,
 int config_len;
 libxl_domain_suspend_props props = {
 .flags = LIBXL_SUSPEND_L

[PATCH v20210111 06/39] Use XEN_SCRIPT_DIR to refer to /etc/xen/scripts

2021-01-11 Thread Olaf Hering

Replace all hardcoded paths to use XEN_SCRIPT_DIR to expand the actual location.

Update .gitignore.

Signed-off-by: Olaf Hering 
---
 .gitignore  | 3 +++
 docs/configure.ac   | 3 +++
 ...k-configuration.5.pod => xl-disk-configuration.5.pod.in} | 2 +-
 ...onfiguration.5.pod => xl-network-configuration.5.pod.in} | 4 ++--
 docs/man/xl.1.pod.in| 2 +-
 docs/man/{xl.conf.5.pod => xl.conf.5.pod.in}| 6 +++---
 docs/misc/block-scripts.txt | 2 +-
 tools/xl/xl_cmdtable.c  | 2 +-
 8 files changed, 15 insertions(+), 9 deletions(-)
 rename docs/man/{xl-disk-configuration.5.pod => 
xl-disk-configuration.5.pod.in} (99%)
 rename docs/man/{xl-network-configuration.5.pod => 
xl-network-configuration.5.pod.in} (98%)
 rename docs/man/{xl.conf.5.pod => xl.conf.5.pod.in} (97%)

diff --git a/.gitignore b/.gitignore
index b169d78ed7..76c13f3189 100644
--- a/.gitignore
+++ b/.gitignore
@@ -48,7 +48,10 @@ dist/*
 docs/tmp.*
 docs/html/
 docs/man/xl.cfg.5.pod
+docs/man/xl-disk-configuration.5.pod
+docs/man/xl-network-configuration.5.pod
 docs/man/xl.1.pod
+docs/man/xl.conf.5.pod
 docs/man1/
 docs/man5/
 docs/man7/
diff --git a/docs/configure.ac b/docs/configure.ac
index cb5a6eaa4c..c2e5edd3b3 100644
--- a/docs/configure.ac
+++ b/docs/configure.ac
@@ -9,6 +9,9 @@ AC_CONFIG_FILES([
 ../config/Docs.mk
 man/xl.cfg.5.pod
 man/xl.1.pod
+man/xl-disk-configuration.5.pod
+man/xl-network-configuration.5.pod
+man/xl.conf.5.pod
 ])
 AC_CONFIG_AUX_DIR([../])
 
diff --git a/docs/man/xl-disk-configuration.5.pod 
b/docs/man/xl-disk-configuration.5.pod.in
similarity index 99%
rename from docs/man/xl-disk-configuration.5.pod
rename to docs/man/xl-disk-configuration.5.pod.in
index 46feedb95e..71d0e86e3d 100644
--- a/docs/man/xl-disk-configuration.5.pod
+++ b/docs/man/xl-disk-configuration.5.pod.in
@@ -257,7 +257,7 @@ automatically determine the most suitable backend.
 
 Specifies that B is not a normal host path, but rather
 information to be interpreted by the executable program I

[PATCH v20210111 31/39] tools/guest: restore: write data directly into guest

2021-01-11 Thread Olaf Hering

Read incoming migration stream directly into the guest memory.
This avoids the memory allocation and copying, and the resulting
performance penalty.
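
The idea can be illustrated with plain POSIX readv(): instead of reading
into one bounce buffer and copying page by page, each destination page
gets its own iovec entry. The following is a minimal sketch, not the code
from this series; readv_all() and demo_two_buffers() are hypothetical
helpers, the 4 byte "pages" are only for the demo, and limits such as
IOV_MAX are ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read until every iovec entry is filled. The vector is modified.
 * Returns 0 on success, -1 on error or premature end of stream.
 */
int readv_all(int fd, struct iovec *iov, int iovcnt)
{
    assert(iov != NULL && iovcnt >= 0);

    while (iovcnt > 0) {
        ssize_t n;

        if (iov->iov_len == 0) {        /* entry already filled */
            iov++;
            iovcnt--;
            continue;
        }

        n = readv(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)                     /* stream ended too early */
            return -1;

        /* Account for a short read by advancing through the vector. */
        while (n > 0) {
            if ((size_t)n >= iov->iov_len) {
                n -= (ssize_t)iov->iov_len;
                iov++;
                iovcnt--;
            } else {
                iov->iov_base = (char *)iov->iov_base + n;
                iov->iov_len -= (size_t)n;
                n = 0;
            }
        }
    }
    return 0;
}

/* Demo: two separate buffers are filled from one stream, no extra copy. */
int demo_two_buffers(void)
{
    char page_a[4], page_b[4];
    struct iovec iov[2] = {
        { .iov_base = page_a, .iov_len = sizeof(page_a) },
        { .iov_base = page_b, .iov_len = sizeof(page_b) },
    };
    int fds[2], rc = -1;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "ABCDEFGH", 8) == 8 &&
        readv_all(fds[0], iov, 2) == 0 &&
        memcmp(page_a, "ABCD", 4) == 0 &&
        memcmp(page_b, "EFGH", 4) == 0)
        rc = 0;
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```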

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  |   1 +
 tools/libs/guest/xg_sr_restore.c | 132 ++-
 2 files changed, 129 insertions(+), 4 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 7ec8867b88..f76af23bcc 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -231,6 +231,7 @@ struct xc_sr_restore_arrays {
 xen_pfn_t mfns[MAX_BATCH_SIZE];
 int map_errs[MAX_BATCH_SIZE];
 void *guest_data[MAX_BATCH_SIZE];
+struct iovec iov[MAX_BATCH_SIZE];
 
 /* populate_pfns */
 xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 060f3d1f4e..2f575d7dd9 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -392,6 +392,122 @@ err:
 return rc;
 }
 
+/*
+ * Handle PAGE_DATA record from the stream.
+ * Given a list of pfns, their types, and a block of page data from the
+ * stream, populate and record their types, map the relevant subset and copy
+ * the data into the guest.
+ */
+static int handle_incoming_page_data(struct xc_sr_context *ctx,
+ struct xc_sr_rhdr *rhdr)
+{
+xc_interface *xch = ctx->xch;
+struct xc_sr_restore_arrays *m = ctx->restore.m;
+struct xc_sr_rec_page_data_header *pages = &m->pages;
+uint64_t *pfn_nums = m->pages.pfn;
+uint32_t i;
+int rc, iov_idx;
+
+rc = handle_static_data_end_v2(ctx);
+if ( rc )
+goto err;
+
+/* First read and verify the header */
+rc = read_exact(ctx->fd, pages, sizeof(*pages));
+if ( rc )
+{
+PERROR("Could not read rec_pfn header");
+goto err;
+}
+
+if ( verify_rec_page_hdr(ctx, rhdr->length, pages) == false )
+{
+rc = -1;
+goto err;
+}
+
+/* Then read and verify the incoming pfn numbers */
+rc = read_exact(ctx->fd, pfn_nums, sizeof(*pfn_nums) * pages->count);
+if ( rc )
+{
+PERROR("Could not read rec_pfn data");
+goto err;
+}
+
+if ( verify_rec_page_pfns(ctx, rhdr->length, pages) == false )
+{
+rc = -1;
+goto err;
+}
+
+/* Finally read and verify the incoming pfn data */
+rc = map_guest_pages(ctx, pages);
+if ( rc )
+goto err;
+
+/* Prepare read buffers, either guest or throw away memory */
+for ( i = 0, iov_idx = 0; i < pages->count; i++ )
+{
+if ( !m->guest_data[i] )
+continue;
+
+m->iov[iov_idx].iov_len = PAGE_SIZE;
+if ( ctx->restore.verify )
+m->iov[iov_idx].iov_base = ctx->restore.verify_buf + i * PAGE_SIZE;
+else
+m->iov[iov_idx].iov_base = m->guest_data[i];
+iov_idx++;
+}
+
+if ( !iov_idx )
+goto done;
+
+rc = readv_exact(ctx->fd, m->iov, iov_idx);
+if ( rc )
+{
+PERROR("read of %d pages failed", iov_idx);
+goto err;
+}
+
+/* Post-processing of pfn data */
+for ( i = 0, iov_idx = 0; i < pages->count; i++ )
+{
+if ( !m->guest_data[i] )
+continue;
+
+rc = ctx->restore.ops.localise_page(ctx, m->types[i], 
m->iov[iov_idx].iov_base);
+if ( rc )
+{
+ERROR("Failed to localise pfn %#"PRIpfn" (type %#"PRIx32")",
+  m->pfns[i], m->types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+goto err;
+
+}
+
+if ( ctx->restore.verify )
+{
+if ( memcmp(m->guest_data[i], m->iov[iov_idx].iov_base, PAGE_SIZE) 
)
+{
+ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")",
+  m->pfns[i], m->types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+}
+}
+
+iov_idx++;
+}
+
+done:
+rc = 0;
+
+err:
+if ( ctx->restore.guest_mapping )
+{
+xenforeignmemory_unmap(xch->fmem, ctx->restore.guest_mapping, 
ctx->restore.nr_mapped_pages);
+ctx->restore.guest_mapping = NULL;
+}
+return rc;
+}
+
 /*
  * Handle PAGE_DATA record from an existing buffer
  * Given a list of pfns, their types, and a block of page data from the
@@ -773,11 +889,19 @@ static int process_incoming_record_header(struct 
xc_sr_context *ctx, struct xc_s
 struct xc_sr_record rec;
 int rc;
 
-rc = read_record_data(ctx, ctx->fd, rhdr, &rec);
-if ( rc )
-return rc;
+switch ( rhdr->type )
+{
+case REC_TYPE_PAGE_DATA:
+rc = handle_incoming_page_data(ctx, rhdr);
+break;
+default:
+rc = read_record_data(ctx, ctx->fd, rhdr, &rec);
+if ( rc == 0 )
+rc = process_buffered_record(ctx, &rec);
+break;
+}
 
-return process_buffered_record(ctx, &rec);
+return rc;
 }

[PATCH v20210111 00/39] leftover from 2020

2021-01-11 Thread Olaf Hering

Various unreviewed changes.

Olaf Hering (39):
  stubdom: fix tpm_version
  xl: use proper name for bash_completion file
  docs: remove stale create example from xl.1
  docs: substitute XEN_CONFIG_DIR in xl.conf.5
  tools: add with-xen-scriptdir configure option
  Use XEN_SCRIPT_DIR to refer to /etc/xen/scripts
  xl: optionally print timestamps during xl migrate
  xl: fix description of migrate --debug
  tools: add readv_exact to libxenctrl
  tools: add xc_is_known_page_type to libxenctrl
  tools: use xc_is_known_page_type
  tools: unify type checking for data pfns in migration stream
  tools: show migration transfer rate in send_dirty_pages
  tools/guest: prepare to allocate arrays once
  tools/guest: save: move batch_pfns
  tools/guest: save: move mfns array
  tools/guest: save: move types array
  tools/guest: save: move errors array
  tools/guest: save: move iov array
  tools/guest: save: move rec_pfns array
  tools/guest: save: move guest_data array
  tools/guest: save: move local_pages array
  tools/guest: restore: move pfns array
  tools/guest: restore: move types array
  tools/guest: restore: move mfns array
  tools/guest: restore: move map_errs array
  tools/guest: restore: move mfns array in populate_pfns
  tools/guest: restore: move pfns array in populate_pfns
  tools/guest: restore: split record processing
  tools/guest: restore: split handle_page_data
  tools/guest: restore: write data directly into guest
  tools: remove tabs from code produced by libxl_save_msgs_gen.pl
  tools: recognize LIBXL_API_VERSION for 4.15
  tools: adjust libxl_domain_suspend to receive a struct props
  tools: change struct precopy_stats to precopy_stats_t
  tools: add callback to libxl for precopy_policy and precopy_stats_t
  tools: add --max_iters to libxl_domain_suspend
  tools: add --min_remaining to libxl_domain_suspend
  tools: add --abort_if_busy to libxl_domain_suspend

 .gitignore|   3 +
 docs/configure.ac |   3 +
 ...n.5.pod => xl-disk-configuration.5.pod.in} |   2 +-
 pod => xl-network-configuration.5.pod.in} |   4 +-
 docs/man/xl.1.pod.in  |  39 +-
 docs/man/{xl.conf.5.pod => xl.conf.5.pod.in}  |   8 +-
 docs/misc/block-scripts.txt   |   2 +-
 m4/paths.m4   |   8 +-
 stubdom/vtpmmgr/vtpmmgr.h |   2 +-
 tools/include/libxl.h |  32 +-
 tools/include/xenguest.h  |   7 +-
 tools/libs/ctrl/xc_private.c  |  55 +-
 tools/libs/ctrl/xc_private.h  |  34 ++
 tools/libs/guest/xg_sr_common.c   |  33 +-
 tools/libs/guest/xg_sr_common.h   |  88 ++-
 tools/libs/guest/xg_sr_restore.c  | 562 --
 tools/libs/guest/xg_sr_save.c | 164 ++---
 tools/libs/guest/xg_sr_save_x86_hvm.c |   5 +-
 tools/libs/guest/xg_sr_save_x86_pv.c  |  31 +-
 tools/libs/light/libxl_dom_save.c |  24 +
 tools/libs/light/libxl_domain.c   |  10 +-
 tools/libs/light/libxl_internal.h |   6 +
 tools/libs/light/libxl_save_msgs_gen.pl   |  23 +-
 tools/libs/light/libxl_stream_write.c |   9 +-
 tools/libs/light/libxl_types.idl  |   1 +
 tools/ocaml/libs/xl/xenlight_stubs.c  |   3 +-
 tools/xl/Makefile |   4 +-
 tools/xl/bash-completion  |   2 +-
 tools/xl/xl_cmdtable.c|  29 +-
 tools/xl/xl_migrate.c |  79 ++-
 tools/xl/xl_saverestore.c |   3 +-
 31 files changed, 901 insertions(+), 374 deletions(-)
 rename docs/man/{xl-disk-configuration.5.pod => 
xl-disk-configuration.5.pod.in} (99%)
 rename docs/man/{xl-network-configuration.5.pod => 
xl-network-configuration.5.pod.in} (98%)
 rename docs/man/{xl.conf.5.pod => xl.conf.5.pod.in} (96%)

[PATCH v20210111 19/39] tools/guest: save: move iov array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move iov array into preallocated space.
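
The pattern is: allocate one block of arrays when the save context is set
up, and let the per-batch function borrow from it. A rough standalone
sketch follows; struct save_arrays, struct save_ctx and the three
functions are hypothetical simplifications, and the MAX_BATCH_SIZE value
is an assumption made only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/uio.h>

#define MAX_BATCH_SIZE 1024   /* assumed value, for illustration only */

/* Hypothetical reduced version of the preallocated arrays. */
struct save_arrays {
    /* Same "+ 4" headroom the removed malloc() call reserved. */
    struct iovec iov[MAX_BATCH_SIZE + 4];
};

struct save_ctx {
    struct save_arrays *m;
};

/* Called once per migration, outside the hot path. */
int save_setup(struct save_ctx *ctx)
{
    ctx->m = calloc(1, sizeof(*ctx->m));
    return ctx->m ? 0 : -1;
}

void save_cleanup(struct save_ctx *ctx)
{
    free(ctx->m);
    ctx->m = NULL;
}

/* Called for every batch: no allocation, no failure path, nothing to free. */
struct iovec *batch_iov(struct save_ctx *ctx, unsigned int nr_pfns)
{
    assert(ctx->m != NULL && nr_pfns <= MAX_BATCH_SIZE);
    return ctx->m->iov;
}
```

Besides saving the malloc/free pair per batch, this removes one error
path from the batch function.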

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h | 2 ++
 tools/libs/guest/xg_sr_save.c   | 7 ++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 71b676c0e0..f315b4f526 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -219,6 +219,8 @@ struct xc_sr_save_arrays {
 xen_pfn_t types[MAX_BATCH_SIZE];
 /* write_batch: Errors from attempting to map the gfns. */
 int errors[MAX_BATCH_SIZE];
+/* write_batch: iovec[] for writev(). */
+struct iovec iov[MAX_BATCH_SIZE + 4];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index a1bddd5dcb..1d04bd0a44 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -97,7 +97,7 @@ static int write_batch(struct xc_sr_context *ctx)
 unsigned int nr_pfns = ctx->save.nr_batch_pfns;
 void *page, *orig_page;
 uint64_t *rec_pfns = NULL;
-struct iovec *iov = NULL; int iovcnt = 0;
+struct iovec *iov = ctx->save.m->iov; int iovcnt = 0;
 struct xc_sr_rec_page_data_header hdr = { 0 };
 struct xc_sr_record rec = {
 .type = REC_TYPE_PAGE_DATA,
@@ -109,10 +109,8 @@ static int write_batch(struct xc_sr_context *ctx)
 guest_data = calloc(nr_pfns, sizeof(*guest_data));
 /* Pointers to locally allocated pages.  Need freeing. */
 local_pages = calloc(nr_pfns, sizeof(*local_pages));
-/* iovec[] for writev(). */
-iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-if ( !guest_data || !local_pages || !iov )
+if ( !guest_data || !local_pages )
 {
 ERROR("Unable to allocate arrays for a batch of %u pages",
   nr_pfns);
@@ -266,7 +264,6 @@ static int write_batch(struct xc_sr_context *ctx)
 xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped);
 for ( i = 0; local_pages && i < nr_pfns; ++i )
 free(local_pages[i]);
-free(iov);
 free(local_pages);
 free(guest_data);

[PATCH v20210111 36/39] tools: add callback to libxl for precopy_policy and precopy_stats_t

2021-01-11 Thread Olaf Hering

This duplicates simple_precopy_policy. To recap its purpose:
- do up to 5 iterations of copying dirty domU memory to target,
  including the initial copying of all domU memory, excluding
  the final copying while the domU is suspended
- do fewer iterations in case the domU dirtied less than 50 pages
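
The two rules above can be written down as a small standalone function.
This is a sketch with hypothetical names (enum policy, struct stats,
simple_policy), not the libxl callback itself; the constants mirror the
values 5 and 50 from the recap.

```c
#include <assert.h>

/* Hypothetical stand-ins for XGS_POLICY_* and precopy_stats_t. */
enum policy { POLICY_CONTINUE_PRECOPY, POLICY_STOP_AND_COPY };

struct stats {
    unsigned int iteration;  /* copy iterations done so far */
    long dirty_count;        /* -1 while the count is not known */
};

#define MAX_ITERATIONS 5
#define TARGET_DIRTY_COUNT 50

enum policy simple_policy(struct stats s)
{
    assert(s.dirty_count >= -1);

    /* Rule 2: the domU dirtied fewer than 50 pages, stop early. */
    if (s.dirty_count >= 0 && s.dirty_count < TARGET_DIRTY_COUNT)
        return POLICY_STOP_AND_COPY;

    /* Rule 1: at most 5 iterations before the final copy. */
    if (s.iteration >= MAX_ITERATIONS)
        return POLICY_STOP_AND_COPY;

    return POLICY_CONTINUE_PRECOPY;
}
```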

Take the opportunity to also move xen_pfn_t into qw().

Signed-off-by: Olaf Hering 
---
 tools/libs/light/libxl_dom_save.c   | 19 +++
 tools/libs/light/libxl_internal.h   |  2 ++
 tools/libs/light/libxl_save_msgs_gen.pl |  3 ++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/libs/light/libxl_dom_save.c 
b/tools/libs/light/libxl_dom_save.c
index 32e3cb5a13..3f3cff0342 100644
--- a/tools/libs/light/libxl_dom_save.c
+++ b/tools/libs/light/libxl_dom_save.c
@@ -373,6 +373,24 @@ int 
libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
 return rc;
 }
 
+static int libxl__domain_save_precopy_policy(precopy_stats_t stats, void *user)
+{
+libxl__save_helper_state *shs = user;
+libxl__domain_save_state *dss = shs->caller_state;
+STATE_AO_GC(dss->ao);
+
+LOGD(DEBUG, shs->domid, "iteration %u dirty_count %ld total_written %lu",
+ stats.iteration, stats.dirty_count, stats.total_written);
+if (stats.dirty_count >= 0 && stats.dirty_count < 
LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT)
+goto stop_copy;
+if (stats.iteration >= LIBXL_XGS_POLICY_MAX_ITERATIONS)
+goto stop_copy;
+return XGS_POLICY_CONTINUE_PRECOPY;
+
+stop_copy:
+return XGS_POLICY_STOP_AND_COPY;
+}
+
 /*- main code for saving, in order of execution -*/
 
 void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
@@ -430,6 +448,7 @@ void libxl__domain_save(libxl__egc *egc, 
libxl__domain_save_state *dss)
 callbacks->suspend = libxl__domain_suspend_callback;
 
 callbacks->switch_qemu_logdirty = 
libxl__domain_suspend_common_switch_qemu_logdirty;
+callbacks->precopy_policy = libxl__domain_save_precopy_policy;
 
 dss->sws.ao  = dss->ao;
 dss->sws.dss = dss;
diff --git a/tools/libs/light/libxl_internal.h 
b/tools/libs/light/libxl_internal.h
index c79523ba92..d4cc694c01 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -124,6 +124,8 @@
 #define DOMID_XS_PATH "domid"
 #define PVSHIM_BASENAME "xen-shim"
 #define PVSHIM_CMDLINE "pv-shim console=xen,pv"
+#define LIBXL_XGS_POLICY_MAX_ITERATIONS 5
+#define LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT 50
 
 /* Size macros. */
 #define __AC(X,Y)   (X##Y)
diff --git a/tools/libs/light/libxl_save_msgs_gen.pl 
b/tools/libs/light/libxl_save_msgs_gen.pl
index 9d425b1dee..7481818361 100755
--- a/tools/libs/light/libxl_save_msgs_gen.pl
+++ b/tools/libs/light/libxl_save_msgs_gen.pl
@@ -23,6 +23,7 @@ our @msgs = (
  STRING doing_what),
 'unsigned long', 'done',
 'unsigned long', 'total'] ],
+[ 'scxW',   "precopy_policy", ['precopy_stats_t', 'stats'] ],
 [ 'srcxA',  "suspend", [] ],
 [ 'srcxA',  "postcopy", [] ],
 [ 'srcxA',  "checkpoint", [] ],
@@ -142,7 +143,7 @@ static void bytes_put(unsigned char *const buf, int *len,
 
 END
 
-foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long', 
'xen_pfn_t') {
+foreach my $simpletype (qw(int uint16_t uint32_t unsigned precopy_stats_t 
xen_pfn_t), 'unsigned long') {
 my $typeid = typeid($simpletype);
 $out_body{'callout'} .= <

[PATCH v20210111 38/39] tools: add --min_remaining to libxl_domain_suspend

2021-01-11 Thread Olaf Hering

The decision to stop+move a domU to the new host must be based on two factors:
- the available network bandwidth for the migration stream
- the maximum time a workload within a domU can be safely suspended

Both values define how many dirty pages a workload may produce prior to the
final stop+move.

The default value of 50 pages is much too low with today's network bandwidths.
On an idle 1 Gbit/s link these 200K will be transferred within ~2ms.

Give the admin a knob to adjust the point when the final stop+move will
be done, so he can base this decision on his own needs.

This patch adjusts xl(1) and the libxl API.
External users check LIBXL_HAVE_DOMAIN_SUSPEND_PROPS for the availability
of the new .min_remaining property.
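
To pick a value, an admin can estimate how long the final transfer of the
remaining dirty pages would take. The helper below is a back-of-the-
envelope sketch, assuming 4 KiB pages, a link speed given in bits per
second and no protocol overhead; transfer_ms() and PAGE_BYTES are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed page size */

/* Estimated time in milliseconds to send "pages" over an idle link. */
double transfer_ms(uint32_t pages, double link_bits_per_sec)
{
    assert(link_bits_per_sec > 0.0);
    return (double)pages * PAGE_BYTES * 8.0 * 1000.0 / link_bits_per_sec;
}
```

For the default of 50 pages and a link speed of 1e9 bits per second this
gives roughly 1.6 ms, which matches the estimate in the commit message.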

Signed-off-by: Olaf Hering 
---
 docs/man/xl.1.pod.in  |  8 
 tools/include/libxl.h |  1 +
 tools/libs/light/libxl_dom_save.c |  2 +-
 tools/libs/light/libxl_domain.c   |  1 +
 tools/libs/light/libxl_internal.h |  1 +
 tools/xl/xl_cmdtable.c| 25 +
 tools/xl/xl_migrate.c |  9 -
 7 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 8339bd8e6f..930270fe23 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -500,6 +500,14 @@ possible to use this option for a 'localhost' migration.
 
 Number of copy iterations before final suspend+move (default: 5)
 
+=item B<--min_remaining> I
+
+Number of remaining dirty pages. If the number of dirty pages drops that
+low, the guest is suspended and the domU will finally be moved to I.
+
+This allows the host admin to control for how long the domU will likely
+be suspended during transit.
+
 =back
 
 =item B [I] I I
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 646cef28aa..d45d3a4460 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1676,6 +1676,7 @@ static inline int 
libxl_retrieve_domain_configuration_0x041200(
 typedef struct {
 uint32_t flags; /* LIBXL_SUSPEND_* */
 uint32_t max_iters;
+uint32_t min_remaining;
 } libxl_domain_suspend_props;
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
diff --git a/tools/libs/light/libxl_dom_save.c 
b/tools/libs/light/libxl_dom_save.c
index 938c0127f3..ad5df89b2c 100644
--- a/tools/libs/light/libxl_dom_save.c
+++ b/tools/libs/light/libxl_dom_save.c
@@ -381,7 +381,7 @@ static int 
libxl__domain_save_precopy_policy(precopy_stats_t stats, void *user)
 
 LOGD(DEBUG, shs->domid, "iteration %u dirty_count %ld total_written %lu",
  stats.iteration, stats.dirty_count, stats.total_written);
-if (stats.dirty_count >= 0 && stats.dirty_count < 
LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT)
+if (stats.dirty_count >= 0 && stats.dirty_count < dss->min_remaining)
 goto stop_copy;
 if (stats.iteration >= dss->max_iters)
 goto stop_copy;
diff --git a/tools/libs/light/libxl_domain.c b/tools/libs/light/libxl_domain.c
index 612d3dc4ea..ae4dc9ad01 100644
--- a/tools/libs/light/libxl_domain.c
+++ b/tools/libs/light/libxl_domain.c
@@ -528,6 +528,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, 
int fd,
 dss->fd = fd;
 dss->type = type;
 dss->max_iters = props->max_iters ?: LIBXL_XGS_POLICY_MAX_ITERATIONS;
+dss->min_remaining = props->min_remaining ?: 
LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT;
 dss->live = props->flags & LIBXL_SUSPEND_LIVE;
 dss->debug = props->flags & LIBXL_SUSPEND_DEBUG;
 dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
diff --git a/tools/libs/light/libxl_internal.h 
b/tools/libs/light/libxl_internal.h
index 80a31c3a88..d7631022a0 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -3641,6 +3641,7 @@ struct libxl__domain_save_state {
 int debug;
 int checkpointed_stream;
 uint32_t max_iters;
+uint32_t min_remaining;
 const libxl_domain_remus_info *remus;
 /* private */
 int rc;
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 75e2e4d54b..2c1acaf536 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -160,18 +160,19 @@ struct cmd_spec cmd_table[] = {
   &main_migrate, 0, 1,
   "Migrate a domain to another host",
   "[options]  ",
-  "-h  Print this help.\n"
-  "-C  Send  instead of config file from creation.\n"
-  "-s  Use  instead of ssh.  String will be 
passed\n"
-  "to sh. If empty, run  instead of ssh  xl\n"
-  "migrate-receive [-d -e]\n"
-  "-e  Do not wait in the background (on ) for the 
death\n"
-  "of the domain.\n"
-  "-T  Show timestamps during the migration process.\n"
-  "--debug Verify transferred domU page data.\n"
-  "-p  Do not unpause domain after migrating it.\n"
-  "-D  Preserve the domain id\n"
-  "--max_iters N   Number of copy iterations be

[PATCH v20210111 30/39] tools/guest: restore: split handle_page_data

2021-01-11 Thread Olaf Hering

handle_page_data must be able to read directly into mapped guest memory.
This will avoid unnecessary memcpy calls for data that can be consumed verbatim.

Split the various steps of record processing:
- move processing to handle_buffered_page_data
- adjust xenforeignmemory_map to set errno in case of failure
- adjust verify mode to set errno in case of failure

This change is preparation for future changes in handle_page_data,
no change in behavior is intended.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  |   9 +
 tools/libs/guest/xg_sr_restore.c | 343 ---
 2 files changed, 231 insertions(+), 121 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 66d1b0dfe6..7ec8867b88 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -230,9 +230,14 @@ struct xc_sr_restore_arrays {
 /* process_page_data */
 xen_pfn_t mfns[MAX_BATCH_SIZE];
 int map_errs[MAX_BATCH_SIZE];
+void *guest_data[MAX_BATCH_SIZE];
+
 /* populate_pfns */
 xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
 xen_pfn_t pp_pfns[MAX_BATCH_SIZE];
+
+/* Must be the last member */
+struct xc_sr_rec_page_data_header pages;
 };
 
 struct xc_sr_context
@@ -323,7 +328,11 @@ struct xc_sr_context
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
+void *verify_buf;
+
 struct xc_sr_restore_arrays *m;
+void *guest_mapping;
+uint32_t nr_mapped_pages;
 } restore;
 };
 
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 93f69b9ba8..060f3d1f4e 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -186,123 +186,18 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned 
int count,
 return rc;
 }
 
-/*
- * Given a list of pfns, their types, and a block of page data from the
- * stream, populate and record their types, map the relevant subset and copy
- * the data into the guest.
- */
-static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
- xen_pfn_t *pfns, uint32_t *types, void *page_data)
+static int handle_static_data_end_v2(struct xc_sr_context *ctx)
 {
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = ctx->restore.m->mfns;
-int *map_errs = ctx->restore.m->map_errs;
-int rc;
-void *mapping = NULL, *guest_page = NULL;
-unsigned int i, /* i indexes the pfns from the record. */
-j,  /* j indexes the subset of pfns we decide to map. */
-nr_pages = 0;
-
-rc = populate_pfns(ctx, count, pfns, types);
-if ( rc )
-{
-ERROR("Failed to populate pfns for batch of %u pages", count);
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
-
-if ( page_type_has_stream_data(types[i]) == true )
-mfns[nr_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]);
-}
-
-/* Nothing to do? */
-if ( nr_pages == 0 )
-goto done;
-
-mapping = guest_page = xenforeignmemory_map(
-xch->fmem, ctx->domid, PROT_READ | PROT_WRITE,
-nr_pages, mfns, map_errs);
-if ( !mapping )
-{
-rc = -1;
-PERROR("Unable to map %u mfns for %u pages of data",
-   nr_pages, count);
-goto err;
-}
-
-for ( i = 0, j = 0; i < count; ++i )
-{
-if ( page_type_has_stream_data(types[i]) == false )
-continue;
-
-if ( map_errs[j] )
-{
-rc = -1;
-ERROR("Mapping pfn %#"PRIpfn" (mfn %#"PRIpfn", type %#"PRIx32") failed with %d",
-  pfns[i], mfns[j], types[i], map_errs[j]);
-goto err;
-}
-
-/* Undo page normalisation done by the saver. */
-rc = ctx->restore.ops.localise_page(ctx, types[i], page_data);
-if ( rc )
-{
-ERROR("Failed to localise pfn %#"PRIpfn" (type %#"PRIx32")",
-  pfns[i], types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
-goto err;
-}
-
-if ( ctx->restore.verify )
-{
-/* Verify mode - compare incoming data to what we already have. */
-if ( memcmp(guest_page, page_data, PAGE_SIZE) )
-ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")",
-  pfns[i], types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
-}
-else
-{
-/* Regular mode - copy incoming data into place. */
-memcpy(guest_page, page_data, PAGE_SIZE);
-}
-
-++j;
-guest_page += PAGE_SIZE;
-page_data += PAGE_SIZE;
-}
-
- done:
-rc = 0;
-
- err:
-if ( mapping )
-xenforeignmemory_unmap(xch->fmem, mapping, nr_pages);
-
-return rc;
-}
+int rc = 0;
 
-/*
- * Validate a PAGE_DATA record from the stream, and p

[PATCH v20210111 24/39] tools/guest: restore: move types array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move types array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  |  1 +
 tools/libs/guest/xg_sr_restore.c | 12 +---
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 516d9b9fb5..0ceecb289d 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -226,6 +226,7 @@ struct xc_sr_save_arrays {
 struct xc_sr_restore_arrays {
 /* handle_page_data */
 xen_pfn_t pfns[MAX_BATCH_SIZE];
+uint32_t types[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 7d1447e86c..7729071e41 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -316,7 +316,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 int rc = -1;
 
 xen_pfn_t *pfns = ctx->restore.m->pfns, pfn;
-uint32_t *types = NULL, type;
+uint32_t *types = ctx->restore.m->types, type;
 
 /*
  * v2 compatibility only exists for x86 streams.  This is a bit of a
@@ -363,14 +363,6 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 goto err;
 }
 
-types = malloc(pages->count * sizeof(*types));
-if ( !types )
-{
-ERROR("Unable to allocate enough memory for %u pfns",
-  pages->count);
-goto err;
-}
-
 for ( i = 0; i < pages->count; ++i )
 {
 pfn = pages->pfn[i] & PAGE_DATA_PFN_MASK;
@@ -410,8 +402,6 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 rc = process_page_data(ctx, pages->count, pfns, types,
&pages->pfn[pages->count]);
  err:
-free(types);
-
 return rc;
 }

[PATCH v20210111 35/39] tools: change struct precopy_stats to precopy_stats_t

2021-01-11 Thread Olaf Hering

This will help libxl_save_msgs_gen.pl to copy the struct as a region of memory.

No change in behavior intended.
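
As a rough illustration of why a plain value type is convenient here, the sketch below copies such a struct through a byte buffer and hands it to a policy callback by value. It is not the code generated by libxl_save_msgs_gen.pl. All sketch_* names, the thresholds and the return values are hypothetical, and the byte copy is only valid when both sides share the same ABI.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for precopy_stats_t; same three fields as the patch. */
typedef struct {
    unsigned int iteration;
    unsigned long total_written;
    long dirty_count; /* -1 if unknown */
} sketch_stats_t;

/* Copy the struct as an opaque region of memory (same ABI on both ends assumed). */
static void sketch_pack(const sketch_stats_t *s, unsigned char *buf)
{
    memcpy(buf, s, sizeof(*s));
}

static sketch_stats_t sketch_unpack(const unsigned char *buf)
{
    sketch_stats_t s;

    memcpy(&s, buf, sizeof(s));
    return s;
}

/*
 * Hypothetical policy taking the stats by value: returns 1 to stop
 * precopying, 0 to continue. The limits are illustrative only.
 */
static int sketch_policy(sketch_stats_t stats, void *user)
{
    (void)user;

    return (stats.dirty_count >= 0 && stats.dirty_count < 50) ||
           stats.iteration >= 5;
}
```

Because the callback receives a copy, it never dereferences a pointer into the sender's address space.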

Signed-off-by: Olaf Hering 
---
 tools/include/xenguest.h| 7 +++
 tools/libs/guest/xg_sr_common.h | 2 +-
 tools/libs/guest/xg_sr_save.c   | 6 +++---
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
index 775cf34c04..b567d7e0ec 100644
--- a/tools/include/xenguest.h
+++ b/tools/include/xenguest.h
@@ -435,18 +435,17 @@ static inline xen_pfn_t xc_dom_p2m(struct xc_dom_image *dom, xen_pfn_t pfn)
 struct xenevtchn_handle;
 
 /* For save's precopy_policy(). */
-struct precopy_stats
-{
+typedef struct {
 unsigned int iteration;
 unsigned long total_written;
 long dirty_count; /* -1 if unknown */
-};
+} precopy_stats_t;
 
 /*
  * A precopy_policy callback may not be running in the same address
  * space as libxc an so precopy_stats is passed by value.
  */
-typedef int (*precopy_policy_t)(struct precopy_stats, void *);
+typedef int (*precopy_policy_t)(precopy_stats_t, void *);
 
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index f76af23bcc..ba2f7e72b1 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -271,7 +271,7 @@ struct xc_sr_context
 size_t pages_sent;
 size_t overhead_sent;
 
-struct precopy_stats stats;
+precopy_stats_t stats;
 
 unsigned int nr_batch_pfns;
 unsigned long *deferred_pages;
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index d766384ed6..c86180730f 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -489,7 +489,7 @@ static int update_progress_string(struct xc_sr_context *ctx, char **str)
 #define SPP_MAX_ITERATIONS  5
 #define SPP_TARGET_DIRTY_COUNT 50
 
-static int simple_precopy_policy(struct precopy_stats stats, void *user)
+static int simple_precopy_policy(precopy_stats_t stats, void *user)
 {
 return ((stats.dirty_count >= 0 &&
  stats.dirty_count < SPP_TARGET_DIRTY_COUNT) ||
@@ -516,13 +516,13 @@ static int send_memory_live(struct xc_sr_context *ctx)
 precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
 void *data = ctx->save.callbacks->data;
 
-struct precopy_stats *policy_stats;
+precopy_stats_t *policy_stats;
 
 rc = update_progress_string(ctx, &progress_str);
 if ( rc )
 goto out;
 
-ctx->save.stats = (struct precopy_stats){
+ctx->save.stats = (precopy_stats_t){
 .dirty_count = ctx->save.p2m_size,
 };
 policy_stats = &ctx->save.stats;

[PATCH v20210111 29/39] tools/guest: restore: split record processing

2021-01-11 Thread Olaf Hering

handle_page_data must be able to read directly into mapped guest memory.
This will avoid unnecessary memcpy calls for data which can be consumed verbatim.

Rearrange the code to allow decisions based on the incoming record.

This change is preparation for future changes in handle_page_data,
no change in behavior is intended.
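
The split into a header step and a data step can be sketched on an in-memory stream as below. This is only an illustration under assumptions: the sketch_* names are hypothetical, the 8-byte padding is an assumed alignment, and the length check against a maximum is omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ALIGN 8u /* assumed record padding, not taken from the patch */

struct sketch_rhdr { uint32_t type; uint32_t length; };
struct sketch_record { uint32_t type; uint32_t length; void *data; };
struct sketch_stream { const unsigned char *buf; size_t len, pos; };

/* Read exactly n bytes from the in-memory stream or fail. */
static int sketch_read_exact(struct sketch_stream *s, void *dst, size_t n)
{
    if ( s->len - s->pos < n )
        return -1;
    memcpy(dst, s->buf + s->pos, n);
    s->pos += n;
    return 0;
}

/* Step 1: only the header, so the caller can decide how to consume the data. */
static int sketch_read_header(struct sketch_stream *s, struct sketch_rhdr *rhdr)
{
    return sketch_read_exact(s, rhdr, sizeof(*rhdr));
}

/* Step 2: the generic path, which buffers the padded payload. */
static int sketch_read_data(struct sketch_stream *s,
                            const struct sketch_rhdr *rhdr,
                            struct sketch_record *rec)
{
    size_t datasz = ((size_t)rhdr->length + SKETCH_ALIGN - 1) &
                    ~(size_t)(SKETCH_ALIGN - 1);

    rec->data = NULL;
    if ( datasz )
    {
        rec->data = malloc(datasz);
        if ( !rec->data )
            return -1;
        if ( sketch_read_exact(s, rec->data, datasz) )
        {
            free(rec->data);
            rec->data = NULL;
            return -1;
        }
    }

    rec->type = rhdr->type;
    rec->length = rhdr->length;
    return 0;
}
```

With the header available first, a caller could choose a different consumer for selected record types instead of always taking the buffering path.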

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.c  | 33 -
 tools/libs/guest/xg_sr_common.h  |  4 ++-
 tools/libs/guest/xg_sr_restore.c | 49 ++--
 tools/libs/guest/xg_sr_save.c|  7 -
 4 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.c b/tools/libs/guest/xg_sr_common.c
index 17567ab133..cabde4ef74 100644
--- a/tools/libs/guest/xg_sr_common.c
+++ b/tools/libs/guest/xg_sr_common.c
@@ -91,26 +91,33 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
 return -1;
 }
 
-int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
+int read_record_header(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr)
 {
 xc_interface *xch = ctx->xch;
-struct xc_sr_rhdr rhdr;
-size_t datasz;
 
-if ( read_exact(fd, &rhdr, sizeof(rhdr)) )
+if ( read_exact(fd, rhdr, sizeof(*rhdr)) )
 {
 PERROR("Failed to read Record Header from stream");
 return -1;
 }
 
-if ( rhdr.length > REC_LENGTH_MAX )
+if ( rhdr->length > REC_LENGTH_MAX )
 {
-ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
-  rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
+ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr->type,
+  rec_type_to_str(rhdr->type), rhdr->length, REC_LENGTH_MAX);
 return -1;
 }
 
-datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
+return 0;
+}
+
+int read_record_data(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr,
+ struct xc_sr_record *rec)
+{
+xc_interface *xch = ctx->xch;
+size_t datasz;
+
+datasz = ROUNDUP(rhdr->length, REC_ALIGN_ORDER);
 
 if ( datasz )
 {
@@ -119,7 +126,7 @@ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
 if ( !rec->data )
 {
 ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
-  datasz, rhdr.type, rec_type_to_str(rhdr.type));
+  datasz, rhdr->type, rec_type_to_str(rhdr->type));
 return -1;
 }
 
@@ -128,18 +135,18 @@ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
 free(rec->data);
 rec->data = NULL;
 PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
-   datasz, rhdr.type, rec_type_to_str(rhdr.type));
+   datasz, rhdr->type, rec_type_to_str(rhdr->type));
 return -1;
 }
 }
 else
 rec->data = NULL;
 
-rec->type   = rhdr.type;
-rec->length = rhdr.length;
+rec->type   = rhdr->type;
+rec->length = rhdr->length;
 
 return 0;
-};
+}
 
 static void __attribute__((unused)) build_assertions(void)
 {
diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 3fe665b91d..66d1b0dfe6 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -475,7 +475,9 @@ static inline int write_record(struct xc_sr_context *ctx,
  *
  * On failure, the contents of the record structure are undefined.
  */
-int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
+int read_record_header(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr);
+int read_record_data(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr,
+ struct xc_sr_record *rec);
 
 /*
  * This would ideally be private in restore.c, but is needed by
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 71b39612ee..93f69b9ba8 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -471,7 +471,7 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
 return rc;
 }
 
-static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
+static int process_buffered_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
 static int handle_checkpoint(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx->xch;
@@ -510,7 +510,7 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
 
 for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
 {
-rc = process_record(ctx, &ctx->restore.buffered_records[i]);
+rc = process_buffered_record(ctx, &ctx->restore.buffered_records[i]);
 if ( rc )
 goto err;
 }
@@ -571,10 +571,11 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
 return rc;
 }
 
-static int buffer_

[PATCH v20210111 23/39] tools/guest: restore: move pfns array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move pfns array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  | 2 ++
 tools/libs/guest/xg_sr_restore.c | 6 ++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 2a020fef5c..516d9b9fb5 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -224,6 +224,8 @@ struct xc_sr_save_arrays {
 };
 
 struct xc_sr_restore_arrays {
+/* handle_page_data */
+xen_pfn_t pfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 4a9ece9681..7d1447e86c 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -315,7 +315,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 unsigned int i, pages_of_data = 0;
 int rc = -1;
 
-xen_pfn_t *pfns = NULL, pfn;
+xen_pfn_t *pfns = ctx->restore.m->pfns, pfn;
 uint32_t *types = NULL, type;
 
 /*
@@ -363,9 +363,8 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 goto err;
 }
 
-pfns = malloc(pages->count * sizeof(*pfns));
 types = malloc(pages->count * sizeof(*types));
-if ( !pfns || !types )
+if ( !types )
 {
 ERROR("Unable to allocate enough memory for %u pfns",
   pages->count);
@@ -412,7 +411,6 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
&pages->pfn[pages->count]);
  err:
 free(types);
-free(pfns);
 
 return rc;
 }

[PATCH v20210111 27/39] tools/guest: restore: move mfns array in populate_pfns

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move populate_pfns mfns array into preallocated space.
Use the pp_ prefix to avoid a conflict with an array used in handle_page_data.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  | 2 ++
 tools/libs/guest/xg_sr_restore.c | 5 ++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index eba3a49877..96a77b5969 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -230,6 +230,8 @@ struct xc_sr_restore_arrays {
 /* process_page_data */
 xen_pfn_t mfns[MAX_BATCH_SIZE];
 int map_errs[MAX_BATCH_SIZE];
+/* populate_pfns */
+xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 94c329032f..85a32aaed2 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -138,12 +138,12 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
   const xen_pfn_t *original_pfns, const uint32_t *types)
 {
 xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
+xen_pfn_t *mfns = ctx->restore.m->pp_mfns,
 *pfns = malloc(count * sizeof(*pfns));
 unsigned int i, nr_pfns = 0;
 int rc = -1;
 
-if ( !mfns || !pfns )
+if ( !pfns )
 {
 ERROR("Failed to allocate %zu bytes for populating the physmap",
   2 * count * sizeof(*mfns));
@@ -191,7 +191,6 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
 
  err:
 free(pfns);
-free(mfns);
 
 return rc;
 }

[PATCH v20210111 25/39] tools/guest: restore: move mfns array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move mfns array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  | 2 ++
 tools/libs/guest/xg_sr_restore.c | 5 ++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 0ceecb289d..5731a5c186 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -227,6 +227,8 @@ struct xc_sr_restore_arrays {
 /* handle_page_data */
 xen_pfn_t pfns[MAX_BATCH_SIZE];
 uint32_t types[MAX_BATCH_SIZE];
+/* process_page_data */
+xen_pfn_t mfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 7729071e41..3ba089f862 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -205,7 +205,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
  xen_pfn_t *pfns, uint32_t *types, void *page_data)
 {
 xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns));
+xen_pfn_t *mfns = ctx->restore.m->mfns;
 int *map_errs = malloc(count * sizeof(*map_errs));
 int rc;
 void *mapping = NULL, *guest_page = NULL;
@@ -213,7 +213,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
 j,  /* j indexes the subset of pfns we decide to map. */
 nr_pages = 0;
 
-if ( !mfns || !map_errs )
+if ( !map_errs )
 {
 rc = -1;
 ERROR("Failed to allocate %zu bytes to process page data",
@@ -299,7 +299,6 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
 xenforeignmemory_unmap(xch->fmem, mapping, nr_pages);
 
 free(map_errs);
-free(mfns);
 
 return rc;
 }

Re: [PATCH 2/2] sysemu: Let VMChangeStateHandler take boolean 'running' argument

2021-01-11 Thread Alex Bennée



Philippe Mathieu-Daudé  writes:

> The 'running' argument from VMChangeStateHandler does not require
> other value than 0 / 1. Make it a plain boolean.
>
> Signed-off-by: Philippe Mathieu-Daudé 

Seems reasonable

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [PATCH 1/2] sysemu/runstate: Let runstate_is_running() return bool

2021-01-11 Thread Alex Bennée



Philippe Mathieu-Daudé  writes:

> runstate_check() returns a boolean. runstate_is_running()
> returns what runstate_check() returns, also a boolean.
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

[PATCH v20210111 21/39] tools/guest: save: move guest_data array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move guest_data array into preallocated space.

Because this array was previously allocated with calloc, adjust the loop to
clear unused entries as needed.
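
The reason for the explicit clearing can be shown with a small sketch. It is an illustration with hypothetical sketch_* names, not the write_batch code: a pointer array that is allocated once keeps its contents between batches, so slots that carry no data must be reset by hand where calloc used to guarantee NULL.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BATCH_SIZE 4 /* stand-in for MAX_BATCH_SIZE */

/* Allocated once; contents survive from one batch to the next. */
static void *sketch_guest_data[SKETCH_BATCH_SIZE];

/*
 * Point slot i at the next mapped page when has_data[i] is set, and clear
 * the slot otherwise. Returns the number of mapped pages consumed.
 */
static unsigned int sketch_fill_batch(const int *has_data, char *mapping,
                                      size_t page_size, unsigned int count)
{
    unsigned int i, p = 0;

    for ( i = 0; i < count; ++i )
    {
        if ( !has_data[i] )
        {
            /* Without this, a pointer from the previous batch would remain. */
            sketch_guest_data[i] = NULL;
            continue;
        }

        sketch_guest_data[i] = mapping + (size_t)p * page_size;
        ++p;
    }

    return p;
}
```

Running two batches in a row, the second with fewer data pages, leaves no stale pointer behind in the skipped slots.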

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h |  2 ++
 tools/libs/guest/xg_sr_save.c   | 11 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 81158a4f4d..33e66678c6 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -223,6 +223,8 @@ struct xc_sr_save_arrays {
 struct iovec iov[MAX_BATCH_SIZE + 4];
 /* write_batch */
 uint64_t rec_pfns[MAX_BATCH_SIZE];
+/* write_batch: Pointers to page data to send. Mapped gfns or local allocations. */
+void *guest_data[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 0e34c4b051..a571246894 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -90,7 +90,7 @@ static int write_batch(struct xc_sr_context *ctx)
 xc_interface *xch = ctx->xch;
 xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types;
 void *guest_mapping = NULL;
-void **guest_data = NULL;
+void **guest_data = ctx->save.m->guest_data;
 void **local_pages = NULL;
 int *errors = ctx->save.m->errors, rc = -1;
 unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
@@ -105,12 +105,10 @@ static int write_batch(struct xc_sr_context *ctx)
 
 assert(nr_pfns != 0);
 
-/* Pointers to page data to send.  Mapped gfns or local allocations. */
-guest_data = calloc(nr_pfns, sizeof(*guest_data));
 /* Pointers to locally allocated pages.  Need freeing. */
 local_pages = calloc(nr_pfns, sizeof(*local_pages));
 
-if ( !guest_data || !local_pages )
+if ( !local_pages )
 {
 ERROR("Unable to allocate arrays for a batch of %u pages",
   nr_pfns);
@@ -166,7 +164,10 @@ static int write_batch(struct xc_sr_context *ctx)
 for ( i = 0, p = 0; i < nr_pfns; ++i )
 {
 if ( page_type_has_stream_data(types[i]) == false )
+{
+guest_data[i] = NULL;
 continue;
+}
 
 if ( errors[p] )
 {
@@ -183,6 +184,7 @@ static int write_batch(struct xc_sr_context *ctx)
 
 if ( rc )
 {
+guest_data[i] = NULL;
 if ( rc == -1 && errno == EAGAIN )
 {
 set_bit(ctx->save.m->batch_pfns[i], ctx->save.deferred_pages);
@@ -256,7 +258,6 @@ static int write_batch(struct xc_sr_context *ctx)
 for ( i = 0; local_pages && i < nr_pfns; ++i )
 free(local_pages[i]);
 free(local_pages);
-free(guest_data);
 
 return rc;
 }

[PATCH v20210111 20/39] tools/guest: save: move rec_pfns array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move rec_pfns array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h |  2 ++
 tools/libs/guest/xg_sr_save.c   | 11 +--
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index f315b4f526..81158a4f4d 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -221,6 +221,8 @@ struct xc_sr_save_arrays {
 int errors[MAX_BATCH_SIZE];
 /* write_batch: iovec[] for writev(). */
 struct iovec iov[MAX_BATCH_SIZE + 4];
+/* write_batch */
+uint64_t rec_pfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 1d04bd0a44..0e34c4b051 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -96,7 +96,7 @@ static int write_batch(struct xc_sr_context *ctx)
 unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
 unsigned int nr_pfns = ctx->save.nr_batch_pfns;
 void *page, *orig_page;
-uint64_t *rec_pfns = NULL;
+uint64_t *rec_pfns = ctx->save.m->rec_pfns;
 struct iovec *iov = ctx->save.m->iov; int iovcnt = 0;
 struct xc_sr_rec_page_data_header hdr = { 0 };
 struct xc_sr_record rec = {
@@ -201,14 +201,6 @@ static int write_batch(struct xc_sr_context *ctx)
 }
 }
 
-rec_pfns = malloc(nr_pfns * sizeof(*rec_pfns));
-if ( !rec_pfns )
-{
-ERROR("Unable to allocate %zu bytes of memory for page data pfn list",
-  nr_pfns * sizeof(*rec_pfns));
-goto err;
-}
-
 hdr.count = nr_pfns;
 
 rec.length = sizeof(hdr);
@@ -259,7 +251,6 @@ static int write_batch(struct xc_sr_context *ctx)
 rc = ctx->save.nr_batch_pfns = 0;
 
  err:
-free(rec_pfns);
 if ( guest_mapping )
 xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped);
 for ( i = 0; local_pages && i < nr_pfns; ++i )

[PATCH v20210111 16/39] tools/guest: save: move mfns array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move mfns array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h | 2 ++
 tools/libs/guest/xg_sr_save.c   | 7 ++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index c78a07b8f8..0c2bef8f78 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -213,6 +213,8 @@ static inline int update_blob(struct xc_sr_blob *blob,
 
 struct xc_sr_save_arrays {
 xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
+/* write_batch: Mfns of the batch pfns. */
+xen_pfn_t mfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 700344b6b6..fd6437afc0 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -88,7 +88,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
 static int write_batch(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = NULL, *types = NULL;
+xen_pfn_t *mfns = ctx->save.m->mfns, *types = NULL;
 void *guest_mapping = NULL;
 void **guest_data = NULL;
 void **local_pages = NULL;
@@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
 assert(nr_pfns != 0);
 
-/* Mfns of the batch pfns. */
-mfns = malloc(nr_pfns * sizeof(*mfns));
 /* Types of the batch pfns. */
 types = malloc(nr_pfns * sizeof(*types));
 /* Errors from attempting to map the gfns. */
@@ -118,7 +116,7 @@ static int write_batch(struct xc_sr_context *ctx)
 /* iovec[] for writev(). */
 iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-if ( !mfns || !types || !errors || !guest_data || !local_pages || !iov )
+if ( !types || !errors || !guest_data || !local_pages || !iov )
 {
 ERROR("Unable to allocate arrays for a batch of %u pages",
   nr_pfns);
@@ -277,7 +275,6 @@ static int write_batch(struct xc_sr_context *ctx)
 free(guest_data);
 free(errors);
 free(types);
-free(mfns);
 
 return rc;
 }

[PATCH v20210111 18/39] tools/guest: save: move errors array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move errors array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h | 2 ++
 tools/libs/guest/xg_sr_save.c   | 7 ++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 3cbadb607b..71b676c0e0 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -217,6 +217,8 @@ struct xc_sr_save_arrays {
 xen_pfn_t mfns[MAX_BATCH_SIZE];
 /* write_batch: Types of the batch pfns. */
 xen_pfn_t types[MAX_BATCH_SIZE];
+/* write_batch: Errors from attempting to map the gfns. */
+int errors[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index ff70f62b1e..a1bddd5dcb 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -92,7 +92,7 @@ static int write_batch(struct xc_sr_context *ctx)
 void *guest_mapping = NULL;
 void **guest_data = NULL;
 void **local_pages = NULL;
-int *errors = NULL, rc = -1;
+int *errors = ctx->save.m->errors, rc = -1;
 unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
 unsigned int nr_pfns = ctx->save.nr_batch_pfns;
 void *page, *orig_page;
@@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
 assert(nr_pfns != 0);
 
-/* Errors from attempting to map the gfns. */
-errors = malloc(nr_pfns * sizeof(*errors));
 /* Pointers to page data to send.  Mapped gfns or local allocations. */
 guest_data = calloc(nr_pfns, sizeof(*guest_data));
 /* Pointers to locally allocated pages.  Need freeing. */
@@ -114,7 +112,7 @@ static int write_batch(struct xc_sr_context *ctx)
 /* iovec[] for writev(). */
 iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-if ( !errors || !guest_data || !local_pages || !iov )
+if ( !guest_data || !local_pages || !iov )
 {
 ERROR("Unable to allocate arrays for a batch of %u pages",
   nr_pfns);
@@ -271,7 +269,6 @@ static int write_batch(struct xc_sr_context *ctx)
 free(iov);
 free(local_pages);
 free(guest_data);
-free(errors);
 
 return rc;
 }

[PATCH v20210111 14/39] tools/guest: prepare to allocate arrays once

2021-01-11 Thread Olaf Hering

The hotpath 'send_dirty_pages' is supposed to do just one thing: sending.
The other end 'handle_page_data' is supposed to do just receiving.

Instead, both do other costly work such as allocating memory and moving data.
Do the allocations once; the array sizes are compile-time constants.
Avoid unneeded copying of data by receiving data directly into mapped guest 
memory.
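
The allocate-once idea can be sketched as follows. This is a minimal illustration under assumptions, not the code of this series: the sketch_* names and the batch size of 1024 are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BATCH_SIZE 1024 /* hypothetical stand-in for MAX_BATCH_SIZE */

/* All per-batch scratch arrays in one struct with compile-time sizes. */
struct sketch_arrays {
    uint64_t pfns[SKETCH_BATCH_SIZE];
    uint32_t types[SKETCH_BATCH_SIZE];
};

struct sketch_ctx {
    struct sketch_arrays *m;
};

/* Setup: the only allocation for the lifetime of the context. */
static int sketch_setup(struct sketch_ctx *ctx)
{
    ctx->m = malloc(sizeof(*ctx->m));
    return ctx->m ? 0 : -1;
}

/* Hot path: borrows the preallocated arrays, no malloc or free per batch. */
static int sketch_handle_batch(struct sketch_ctx *ctx, unsigned int count)
{
    uint64_t *pfns = ctx->m->pfns;
    uint32_t *types = ctx->m->types;
    unsigned int i;

    if ( count > SKETCH_BATCH_SIZE )
        return -1;

    for ( i = 0; i < count; ++i )
    {
        pfns[i] = i;  /* placeholder for values parsed from the stream */
        types[i] = 0;
    }

    return 0;
}

/* Cleanup: the matching single free. */
static void sketch_cleanup(struct sketch_ctx *ctx)
{
    free(ctx->m);
    ctx->m = NULL;
}
```

The per-batch function also loses its allocation-failure path, which is why the later patches in the series mostly delete error handling.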

This patch is just preparation; subsequent changes will populate the arrays.

Once all changes are applied, migration of a busy HVM domU changes as follows:

Without this series, from sr650 to sr950 (xen-4.15.20201027T173911.16a20963b3 xen_testing):
2020-10-29 10:23:10.711+: xc: show_transfer_rate: 23663128 bytes + 2879563 pages in 55.324905335 sec, 203 MiB/sec: Internal error
2020-10-29 10:23:35.115+: xc: show_transfer_rate: 16829632 bytes + 2097552 pages in 24.401179720 sec, 335 MiB/sec: Internal error
2020-10-29 10:23:59.436+: xc: show_transfer_rate: 16829032 bytes + 2097478 pages in 24.319025928 sec, 336 MiB/sec: Internal error
2020-10-29 10:24:23.844+: xc: show_transfer_rate: 16829024 bytes + 2097477 pages in 24.406992500 sec, 335 MiB/sec: Internal error
2020-10-29 10:24:48.292+: xc: show_transfer_rate: 16828912 bytes + 2097463 pages in 24.446489027 sec, 335 MiB/sec: Internal error
2020-10-29 10:25:01.816+: xc: show_transfer_rate: 16836080 bytes + 2098356 pages in 13.447091818 sec, 609 MiB/sec: Internal error

With this series, from sr650 to sr950 (xen-4.15.20201027T173911.16a20963b3 xen_unstable):
2020-10-28 21:26:05.074+: xc: show_transfer_rate: 23663128 bytes + 2879563 pages in 52.564054368 sec, 213 MiB/sec: Internal error
2020-10-28 21:26:23.527+: xc: show_transfer_rate: 16830040 bytes + 2097603 pages in 18.450592015 sec, 444 MiB/sec: Internal error
2020-10-28 21:26:41.926+: xc: show_transfer_rate: 16830944 bytes + 2097717 pages in 18.397862306 sec, 445 MiB/sec: Internal error
2020-10-28 21:27:00.339+: xc: show_transfer_rate: 16829176 bytes + 2097498 pages in 18.411973339 sec, 445 MiB/sec: Internal error
2020-10-28 21:27:18.643+: xc: show_transfer_rate: 16828592 bytes + 2097425 pages in 18.303326695 sec, 447 MiB/sec: Internal error
2020-10-28 21:27:26.289+: xc: show_transfer_rate: 16835952 bytes + 2098342 pages in 7.579846749 sec, 1081 MiB/sec: Internal error

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  | 8 
 tools/libs/guest/xg_sr_restore.c | 8 
 tools/libs/guest/xg_sr_save.c| 4 +++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index f3a7a29298..62bc87b5f4 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -211,6 +211,12 @@ static inline int update_blob(struct xc_sr_blob *blob,
 return 0;
 }
 
+struct xc_sr_save_arrays {
+};
+
+struct xc_sr_restore_arrays {
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -248,6 +254,7 @@ struct xc_sr_context
 unsigned long *deferred_pages;
 unsigned long nr_deferred_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
+struct xc_sr_save_arrays *m;
 } save;
 
 struct /* Restore data. */
@@ -299,6 +306,7 @@ struct xc_sr_context
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
+struct xc_sr_restore_arrays *m;
 } restore;
 };
 
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 0332ae9f32..4a9ece9681 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -739,6 +739,13 @@ static int setup(struct xc_sr_context *ctx)
 }
 ctx->restore.allocated_rec_num = DEFAULT_BUF_RECORDS;
 
+ctx->restore.m = malloc(sizeof(*ctx->restore.m));
+if ( !ctx->restore.m ) {
+ERROR("Unable to allocate memory for arrays");
+rc = -1;
+goto err;
+}
+
  err:
 return rc;
 }
@@ -757,6 +764,7 @@ static void cleanup(struct xc_sr_context *ctx)
 xc_hypercall_buffer_free_pages(
 xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->restore.p2m_size)));
 
+free(ctx->restore.m);
 free(ctx->restore.buffered_records);
 free(ctx->restore.populated_pfns);
 
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index da031fcfce..baaeb12762 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -853,8 +853,9 @@ static int setup(struct xc_sr_context *ctx)
 ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
   sizeof(*ctx->save.batch_pfns));
 ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
+ctx->save.m = malloc(sizeof(*ctx->save.m));
 
-if ( !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages )
+if ( !ctx->save.m || !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages )
 {
 ERROR("Unable to allocate m

[PATCH v20210111 17/39] tools/guest: save: move types array

2021-01-11 Thread Olaf Hering

Remove allocation from hotpath, move types array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h | 2 ++
 tools/libs/guest/xg_sr_save.c   | 7 ++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 0c2bef8f78..3cbadb607b 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -215,6 +215,8 @@ struct xc_sr_save_arrays {
 xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
 /* write_batch: Mfns of the batch pfns. */
 xen_pfn_t mfns[MAX_BATCH_SIZE];
+/* write_batch: Types of the batch pfns. */
+xen_pfn_t types[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index fd6437afc0..ff70f62b1e 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -88,7 +88,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
 static int write_batch(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = ctx->save.m->mfns, *types = NULL;
+xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types;
 void *guest_mapping = NULL;
 void **guest_data = NULL;
 void **local_pages = NULL;
@@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
 assert(nr_pfns != 0);
 
-/* Types of the batch pfns. */
-types = malloc(nr_pfns * sizeof(*types));
 /* Errors from attempting to map the gfns. */
 errors = malloc(nr_pfns * sizeof(*errors));
 /* Pointers to page data to send.  Mapped gfns or local allocations. */
@@ -116,7 +114,7 @@ static int write_batch(struct xc_sr_context *ctx)
 /* iovec[] for writev(). */
 iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-if ( !types || !errors || !guest_data || !local_pages || !iov )
+if ( !errors || !guest_data || !local_pages || !iov )
 {
 ERROR("Unable to allocate arrays for a batch of %u pages",
   nr_pfns);
@@ -274,7 +272,6 @@ static int write_batch(struct xc_sr_context *ctx)
 free(local_pages);
 free(guest_data);
 free(errors);
-free(types);
 
 return rc;
 }
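
The idea of this patch, allocating the per-batch arrays once instead of on every write_batch() call, can be sketched outside of libxenguest. The following is only an illustration under stated assumptions: demo_ctx, demo_arrays, DEMO_BATCH_SIZE and all demo_* functions are hypothetical names, and the per-entry "work" is a placeholder, not what the real code computes.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BATCH_SIZE 1024u   /* hypothetical batch limit */

/* All per-batch arrays live in one block that is allocated once. */
struct demo_arrays {
    uint64_t batch_pfns[DEMO_BATCH_SIZE];
    uint64_t mfns[DEMO_BATCH_SIZE];
    uint64_t types[DEMO_BATCH_SIZE];
};

struct demo_ctx {
    struct demo_arrays *m;
    unsigned int nr_batch_pfns;
};

/* One allocation at setup time; 0 on success, -1 on failure. */
static int demo_setup(struct demo_ctx *ctx)
{
    ctx->nr_batch_pfns = 0;
    ctx->m = malloc(sizeof(*ctx->m));
    return ctx->m ? 0 : -1;
}

/* Hot path: reuses the preallocated arrays, no malloc()/free() per batch. */
static void demo_write_batch(struct demo_ctx *ctx)
{
    uint64_t *types = ctx->m->types;
    unsigned int i;

    for ( i = 0; i < ctx->nr_batch_pfns; ++i )
        types[i] = ctx->m->batch_pfns[i] & 0xf;   /* placeholder work */

    ctx->nr_batch_pfns = 0;
}

/* Queue one pfn, flushing first if the batch is already full. */
static void demo_add_to_batch(struct demo_ctx *ctx, uint64_t pfn)
{
    if ( ctx->nr_batch_pfns == DEMO_BATCH_SIZE )
        demo_write_batch(ctx);

    ctx->m->batch_pfns[ctx->nr_batch_pfns++] = pfn;
}

/* Single free at teardown. */
static void demo_cleanup(struct demo_ctx *ctx)
{
    free(ctx->m);
    ctx->m = NULL;
}
```

The trade-off is a fixed memory footprint for the lifetime of the context in exchange for no allocator calls and no allocation-failure path inside the batch loop.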

[PATCH v20210111 13/39] tools: show migration transfer rate in send_dirty_pages

2021-01-11 Thread Olaf Hering

Show how fast domU pages are transferred in each iteration.

The relevant data is how fast the pfns travel, not so much how much
protocol overhead exists. So the reported MiB/sec is just for pfns.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h |  2 ++
 tools/libs/guest/xg_sr_save.c   | 47 +
 2 files changed, 49 insertions(+)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 70e328e951..f3a7a29298 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -238,6 +238,8 @@ struct xc_sr_context
 bool debug;
 
 unsigned long p2m_size;
+size_t pages_sent;
+size_t overhead_sent;
 
 struct precopy_stats stats;
 
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 0546d3d9e6..da031fcfce 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -1,5 +1,6 @@
 #include 
 #include 
+#include 
 
 #include "xg_sr_common.h"
 
@@ -238,6 +239,8 @@ static int write_batch(struct xc_sr_context *ctx)
 iov[3].iov_len = nr_pfns * sizeof(*rec_pfns);
 
 iovcnt = 4;
+ctx->save.pages_sent += nr_pages;
+ctx->save.overhead_sent += sizeof(rec) + sizeof(hdr) + nr_pfns * sizeof(*rec_pfns);
 
 if ( nr_pages )
 {
@@ -357,6 +360,43 @@ static int suspend_domain(struct xc_sr_context *ctx)
 return 0;
 }
 
+static void show_transfer_rate(struct xc_sr_context *ctx, struct timespec *start)
+{
+xc_interface *xch = ctx->xch;
+struct timespec end = {}, diff = {};
+size_t ms, MiB_sec = ctx->save.pages_sent * PAGE_SIZE;
+
+if (!MiB_sec)
+return;
+
+if ( clock_gettime(CLOCK_MONOTONIC, &end) )
+PERROR("clock_gettime");
+
+if ( (end.tv_nsec - start->tv_nsec) < 0 )
+{
+diff.tv_sec = end.tv_sec - start->tv_sec - 1;
+diff.tv_nsec = end.tv_nsec - start->tv_nsec + (1000U*1000U*1000U);
+}
+else
+{
+diff.tv_sec = end.tv_sec - start->tv_sec;
+diff.tv_nsec = end.tv_nsec - start->tv_nsec;
+}
+
+ms = (diff.tv_nsec / (1000U*1000U));
+if (!ms)
+ms = 1;
+ms += (diff.tv_sec * 1000U);
+
+MiB_sec *= 1000U;
+MiB_sec /= ms;
+MiB_sec /= 1024U*1024U;
+
+errno = 0;
+IPRINTF("%s: %zu bytes + %zu pages in %ld.%09ld sec, %zu MiB/sec", __func__,
+ctx->save.overhead_sent, ctx->save.pages_sent, diff.tv_sec, diff.tv_nsec, MiB_sec);
+}
+
 /*
  * Send a subset of pages in the guests p2m, according to the dirty bitmap.
  * Used for each subsequent iteration of the live migration loop.
@@ -370,9 +410,15 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
 xen_pfn_t p;
 unsigned long written;
 int rc;
+struct timespec start = {};
 DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
 &ctx->save.dirty_bitmap_hbuf);
 
+ctx->save.pages_sent = 0;
+ctx->save.overhead_sent = 0;
+if ( clock_gettime(CLOCK_MONOTONIC, &start) )
+PERROR("clock_gettime");
+
 for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
 {
 if ( !test_bit(p, dirty_bitmap) )
@@ -396,6 +442,7 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
 if ( written > entries )
 DPRINTF("Bitmap contained more entries than expected...");
 
+show_transfer_rate(ctx, &start);
 xc_report_progress_step(xch, entries, entries);
 
 return ctx->save.ops.check_vm_state(ctx);
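
The rate calculation in show_transfer_rate() above can be tried standalone. This is a minimal sketch, not the patch itself: DEMO_PAGE_SIZE (4 KiB) and the demo_* function names are assumptions, and unlike the patch it clamps the whole elapsed time to at least 1 ms rather than only the sub-second part.

```c
#include <stdint.h>
#include <time.h>

#define DEMO_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Elapsed time between two monotonic timestamps in ms, never less than 1. */
static uint64_t demo_elapsed_ms(const struct timespec *start,
                                const struct timespec *end)
{
    int64_t sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
    int64_t nsec = (int64_t)end->tv_nsec - (int64_t)start->tv_nsec;
    int64_t ms;

    /* Borrow one second if the nanosecond part went negative. */
    if ( nsec < 0 )
    {
        sec -= 1;
        nsec += 1000000000LL;
    }

    ms = sec * 1000 + nsec / 1000000;

    /* Avoid a division by zero for very short intervals. */
    return ms > 0 ? (uint64_t)ms : 1;
}

/* Page payload throughput in MiB/sec; protocol overhead is not counted. */
static uint64_t demo_mib_per_sec(uint64_t pages, uint64_t ms)
{
    uint64_t bytes = pages * DEMO_PAGE_SIZE;

    return bytes * 1000u / ms / (1024u * 1024u);
}
```

For example, 262144 pages (1 GiB) sent between t=1.9s and t=3.1s gives 1200 ms elapsed and a truncated rate of 853 MiB/sec.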

[PATCH v20210111 15/39] tools/guest: save: move batch_pfns

2021-01-11 Thread Olaf Hering

The batch_pfns array is already allocated in advance.
Move it into the preallocated area.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h |  2 +-
 tools/libs/guest/xg_sr_save.c   | 25 +++--
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 62bc87b5f4..c78a07b8f8 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -212,6 +212,7 @@ static inline int update_blob(struct xc_sr_blob *blob,
 }
 
 struct xc_sr_save_arrays {
+xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
@@ -249,7 +250,6 @@ struct xc_sr_context
 
 struct precopy_stats stats;
 
-xen_pfn_t *batch_pfns;
 unsigned int nr_batch_pfns;
 unsigned long *deferred_pages;
 unsigned long nr_deferred_pages;
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index baaeb12762..700344b6b6 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -77,7 +77,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
 
 /*
  * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
- * is constructed in ctx->save.batch_pfns.
+ * is constructed in ctx->save.m->batch_pfns.
  *
  * This function:
  * - gets the types for each pfn in the batch.
@@ -128,12 +128,12 @@ static int write_batch(struct xc_sr_context *ctx)
 for ( i = 0; i < nr_pfns; ++i )
 {
 types[i] = mfns[i] = ctx->save.ops.pfn_to_gfn(ctx,
-  ctx->save.batch_pfns[i]);
+  ctx->save.m->batch_pfns[i]);
 
 /* Likely a ballooned page. */
 if ( mfns[i] == INVALID_MFN )
 {
-set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
+set_bit(ctx->save.m->batch_pfns[i], ctx->save.deferred_pages);
 ++ctx->save.nr_deferred_pages;
 }
 }
@@ -179,7 +179,7 @@ static int write_batch(struct xc_sr_context *ctx)
 if ( errors[p] )
 {
 ERROR("Mapping of pfn %#"PRIpfn" (mfn %#"PRIpfn") failed %d",
-  ctx->save.batch_pfns[i], mfns[p], errors[p]);
+  ctx->save.m->batch_pfns[i], mfns[p], errors[p]);
 goto err;
 }
 
@@ -193,7 +193,7 @@ static int write_batch(struct xc_sr_context *ctx)
 {
 if ( rc == -1 && errno == EAGAIN )
 {
-set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
+set_bit(ctx->save.m->batch_pfns[i], ctx->save.deferred_pages);
 ++ctx->save.nr_deferred_pages;
 types[i] = XEN_DOMCTL_PFINFO_XTAB;
 --nr_pages;
@@ -224,7 +224,7 @@ static int write_batch(struct xc_sr_context *ctx)
 rec.length += nr_pages * PAGE_SIZE;
 
 for ( i = 0; i < nr_pfns; ++i )
-rec_pfns[i] = ((uint64_t)(types[i]) << 32) | ctx->save.batch_pfns[i];
+rec_pfns[i] = ((uint64_t)(types[i]) << 32) | ctx->save.m->batch_pfns[i];
 
 iov[0].iov_base = &rec.type;
 iov[0].iov_len = sizeof(rec.type);
@@ -296,9 +296,9 @@ static int flush_batch(struct xc_sr_context *ctx)
 
 if ( !rc )
 {
-VALGRIND_MAKE_MEM_UNDEFINED(ctx->save.batch_pfns,
+VALGRIND_MAKE_MEM_UNDEFINED(ctx->save.m->batch_pfns,
 MAX_BATCH_SIZE *
-sizeof(*ctx->save.batch_pfns));
+sizeof(*ctx->save.m->batch_pfns));
 }
 
 return rc;
@@ -315,7 +315,7 @@ static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
 rc = flush_batch(ctx);
 
 if ( rc == 0 )
-ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
+ctx->save.m->batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
 
 return rc;
 }
@@ -850,14 +850,12 @@ static int setup(struct xc_sr_context *ctx)
 
 dirty_bitmap = xc_hypercall_buffer_alloc_pages(
 xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
-ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
-  sizeof(*ctx->save.batch_pfns));
 ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
 ctx->save.m = malloc(sizeof(*ctx->save.m));
 
-if ( !ctx->save.m || !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages )
+if ( !ctx->save.m || !dirty_bitmap || !ctx->save.deferred_pages )
 {
-ERROR("Unable to allocate memory for dirty bitmaps, batch pfns and"
+ERROR("Unable to allocate memory for dirty bitmaps and"
   " deferred pages");
 rc = -1;
 errno = ENOMEM;
@@ -886,7 +884,6 @@ static void cleanup(struct xc_sr_context *ctx)
 xc_hypercall_buffer_free_pages(xch, dirty_bitmap,

[PATCH v20210111 07/39] xl: optionally print timestamps during xl migrate

2021-01-11 Thread Olaf Hering

During 'xl -v.. migrate domU host' a large amount of debug output is
generated. It is difficult to map each line to the sending or receiving
side. Also, the time spent on migration is not reported.

With 'xl migrate -T domU host' both sides will print timestamps and
also the pid of the invoked xl process to make it more obvious which
side produced a given log line.

Signed-off-by: Olaf Hering 
---
 docs/man/xl.1.pod.in   |  4 
 tools/xl/xl_cmdtable.c |  1 +
 tools/xl/xl_migrate.c  | 25 +
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index df98adc9e4..494a84ee13 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -475,6 +475,10 @@ domain. See the corresponding option of the I subcommand.
 Send the specified  file instead of the file used on creation of the
 domain.
 
+=item B<-T>
+
+Include timestamps in output.
+
 =item B<--debug>
 
 Display huge (!) amount of debug information during the migration process.
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 37710880d3..da0473ddfb 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -167,6 +167,7 @@ struct cmd_spec cmd_table[] = {
  "migrate-receive [-d -e]\n"
  "-e  Do not wait in the background (on ) for the death\n"
  "of the domain.\n"
+  "-T  Show timestamps during the migration process.\n"
  "--debug Print huge (!) amount of debug during the migration process.\n"
   "-p  Do not unpause domain after migrating it.\n"
   "-D  Preserve the domain id"
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index 0813beb801..856a6e2be1 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -32,6 +32,8 @@
 
 #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
 
+static bool timestamps;
+
 static pid_t create_migration_child(const char *rune, int *send_fd,
 int *recv_fd)
 {
@@ -187,6 +189,7 @@ static void migrate_domain(uint32_t domid, int preserve_domid,
 char rc_buf;
 uint8_t *config_data;
 int config_len, flags = LIBXL_SUSPEND_LIVE;
+unsigned xtl_flags = XTL_STDIOSTREAM_HIDE_PROGRESS;
 
 save_domain_core_begin(domid, preserve_domid, override_config_file,
&config_data, &config_len);
@@ -202,7 +205,9 @@ static void migrate_domain(uint32_t domid, int preserve_domid,
 migrate_do_preamble(send_fd, recv_fd, child, config_data, config_len,
 rune);
 
-xtl_stdiostream_adjust_flags(logger, XTL_STDIOSTREAM_HIDE_PROGRESS, 0);
+if (timestamps)
+xtl_flags |= XTL_STDIOSTREAM_SHOW_DATE | XTL_STDIOSTREAM_SHOW_PID;
+xtl_stdiostream_adjust_flags(logger, xtl_flags, 0);
 
 if (debug)
 flags |= LIBXL_SUSPEND_DEBUG;
@@ -328,6 +333,11 @@ static void migrate_receive(int debug, int daemonize, int monitor,
 char rc_buf;
 char *migration_domname;
 struct domain_create dom_info;
+unsigned xtl_flags = 0;
+
+if (timestamps)
+xtl_flags |= XTL_STDIOSTREAM_SHOW_DATE | XTL_STDIOSTREAM_SHOW_PID;
+xtl_stdiostream_adjust_flags(logger, xtl_flags, 0);
 
 signal(SIGPIPE, SIG_IGN);
 /* if we get SIGPIPE we'd rather just have it as an error */
@@ -491,7 +501,7 @@ int main_migrate_receive(int argc, char **argv)
 COMMON_LONG_OPTS
 };
 
-SWITCH_FOREACH_OPT(opt, "Fedrp", opts, "migrate-receive", 0) {
+SWITCH_FOREACH_OPT(opt, "FedrpT", opts, "migrate-receive", 0) {
 case 'F':
 daemonize = 0;
 break;
@@ -517,6 +527,9 @@ int main_migrate_receive(int argc, char **argv)
 case 'p':
 pause_after_migration = 1;
 break;
+case 'T':
+timestamps = 1;
+break;
 }
 
 if (argc-optind != 0) {
@@ -545,7 +558,7 @@ int main_migrate(int argc, char **argv)
 COMMON_LONG_OPTS
 };
 
-SWITCH_FOREACH_OPT(opt, "FC:s:epD", opts, "migrate", 2) {
+SWITCH_FOREACH_OPT(opt, "FC:s:eTpD", opts, "migrate", 2) {
 case 'C':
 config_filename = optarg;
 break;
@@ -559,6 +572,9 @@ int main_migrate(int argc, char **argv)
 daemonize = 0;
 monitor = 0;
 break;
+case 'T':
+timestamps = 1;
+break;
 case 'p':
 pause_after_migration = 1;
 break;
@@ -592,11 +608,12 @@ int main_migrate(int argc, char **argv)
 } else {
 verbose_len = (minmsglevel_default - minmsglevel) + 2;
 }
-xasprintf(&rune, "exec %s %s xl%s%.*s migrate-receive%s%s%s",
+xasprintf(&rune, "exec %s %s xl%s%.*s migrate-receive%s%s%s%s",
   ssh_command, host,
   pass_tty_arg ? " -t" : "",
   verbose_len, verbose_buf,
   daemonize ? "" : " -e",
+  timestamps ? " -T" : "",
   debug ? " -d" : "",
   pause_after_migrat

[PATCH v20210111 12/39] tools: unify type checking for data pfns in migration stream

2021-01-11 Thread Olaf Hering

Introduce a helper which decides if a given pfn type has data
for the migration stream.

No change in behavior intended.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  | 17 
 tools/libs/guest/xg_sr_restore.c | 34 +---
 tools/libs/guest/xg_sr_save.c| 14 ++---
 3 files changed, 24 insertions(+), 41 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index cc3ad1c394..70e328e951 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -455,6 +455,23 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
 /* Handle a STATIC_DATA_END record. */
 int handle_static_data_end(struct xc_sr_context *ctx);
 
+static inline bool page_type_has_stream_data(uint32_t type)
+{
+bool ret;
+
+switch (type)
+{
+case XEN_DOMCTL_PFINFO_XTAB:
+case XEN_DOMCTL_PFINFO_XALLOC:
+case XEN_DOMCTL_PFINFO_BROKEN:
+ret = false;
+break;
+default:
+ret = true;
+break;
+}
+return ret;
+}
 #endif
 /*
  * Local variables:
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index f1c3169229..0332ae9f32 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -152,9 +152,8 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
 
 for ( i = 0; i < count; ++i )
 {
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
+if ( (!types ||
+  (types && page_type_has_stream_data(types[i]) == true)) &&
  !pfn_is_populated(ctx, original_pfns[i]) )
 {
 rc = pfn_set_populated(ctx, original_pfns[i]);
@@ -233,25 +232,8 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
 {
 ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
 
-switch ( types[i] )
-{
-case XEN_DOMCTL_PFINFO_NOTAB:
-
-case XEN_DOMCTL_PFINFO_L1TAB:
-case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-case XEN_DOMCTL_PFINFO_L2TAB:
-case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-case XEN_DOMCTL_PFINFO_L3TAB:
-case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-case XEN_DOMCTL_PFINFO_L4TAB:
-case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
+if ( page_type_has_stream_data(types[i]) == true )
 mfns[nr_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]);
-break;
-}
 }
 
 /* Nothing to do? */
@@ -271,14 +253,8 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
 
 for ( i = 0, j = 0; i < count; ++i )
 {
-switch ( types[i] )
-{
-case XEN_DOMCTL_PFINFO_XTAB:
-case XEN_DOMCTL_PFINFO_BROKEN:
-case XEN_DOMCTL_PFINFO_XALLOC:
-/* No page data to deal with. */
+if ( page_type_has_stream_data(types[i]) == false )
 continue;
-}
 
 if ( map_errs[j] )
 {
@@ -413,7 +389,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 goto err;
 }
 
-if ( type < XEN_DOMCTL_PFINFO_BROKEN )
+if ( page_type_has_stream_data(type) == true )
 /* NOTAB and all L1 through L4 tables (including pinned) should
  * have a page worth of data in the record. */
 pages_of_data++;
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 044d0ae3aa..0546d3d9e6 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -153,13 +153,8 @@ static int write_batch(struct xc_sr_context *ctx)
 goto err;
 }
 
-switch ( types[i] )
-{
-case XEN_DOMCTL_PFINFO_BROKEN:
-case XEN_DOMCTL_PFINFO_XALLOC:
-case XEN_DOMCTL_PFINFO_XTAB:
+if ( page_type_has_stream_data(types[i]) == false )
 continue;
-}
 
 mfns[nr_pages++] = mfns[i];
 }
@@ -177,13 +172,8 @@ static int write_batch(struct xc_sr_context *ctx)
 
 for ( i = 0, p = 0; i < nr_pfns; ++i )
 {
-switch ( types[i] )
-{
-case XEN_DOMCTL_PFINFO_BROKEN:
-case XEN_DOMCTL_PFINFO_XALLOC:
-case XEN_DOMCTL_PFINFO_XTAB:
+if ( page_type_has_stream_data(types[i]) == false )
 continue;
-}
 
 if ( errors[p] )
 {

[PATCH v20210111 08/39] xl: fix description of migrate --debug

2021-01-11 Thread Olaf Hering

xl migrate --debug used to track every pfn in every batch of pages.
But these times are gone. Adjust the help text to tell what --debug
is supposed to do today.

Signed-off-by: Olaf Hering 
---
 docs/man/xl.1.pod.in   | 4 +++-
 tools/xl/xl_cmdtable.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 494a84ee13..e6e4e8e83a 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -481,7 +481,9 @@ Include timestamps in output.
 
 =item B<--debug>
 
-Display huge (!) amount of debug information during the migration process.
+Verify transferred domU page data. All memory will be transferred one more
+time to the destination host while the domU is paused, and compared with
+the result of the initial transfer while the domU was still running.
 
 =item B<-p>
 
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index da0473ddfb..a0567169bf 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -168,7 +168,7 @@ struct cmd_spec cmd_table[] = {
  "-e  Do not wait in the background (on ) for the death\n"
  "of the domain.\n"
  "-T  Show timestamps during the migration process.\n"
-  "--debug Print huge (!) amount of debug during the migration process.\n"
+  "--debug Verify transferred domU page data.\n"
   "-p  Do not unpause domain after migrating it.\n"
   "-D  Preserve the domain id"
 },

[PATCH v20210111 11/39] tools: use xc_is_known_page_type

2021-01-11 Thread Olaf Hering

Verify pfn type on sending side, also verify incoming batch of pfns.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_restore.c | 3 +--
 tools/libs/guest/xg_sr_save.c| 6 ++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index b57a787519..f1c3169229 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -406,8 +406,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 }
 
 type = (pages->pfn[i] & PAGE_DATA_TYPE_MASK) >> 32;
-if ( ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) >= 5) &&
- ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) <= 8) )
+if ( xc_is_known_page_type(type) == false )
 {
 ERROR("Invalid type %#"PRIx32" for pfn %#"PRIpfn" (index %u)",
   type, pfn, i);
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 2ba7c3200c..044d0ae3aa 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -147,6 +147,12 @@ static int write_batch(struct xc_sr_context *ctx)
 
 for ( i = 0; i < nr_pfns; ++i )
 {
+if ( xc_is_known_page_type(types[i]) == false )
+{
+ERROR("Wrong type %#"PRIpfn" for pfn %#"PRIpfn, types[i], mfns[i]);
+goto err;
+}
+
 switch ( types[i] )
 {
 case XEN_DOMCTL_PFINFO_BROKEN:
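
Both sides of this check operate on 64-bit stream entries that carry the page type in the upper half and the pfn in the lower half, as in the rec_pfns[] packing on the save side and the PAGE_DATA_TYPE_MASK extraction on the restore side. A rough sketch of that round trip follows. The mask values and the demo_* names are assumptions made for illustration and are not taken from this series; the sketch also masks the pfn while packing, which the save code shown here does not do.

```c
#include <stdint.h>

/* Assumed layout for illustration: type in bits 60-63, pfn in bits 0-51. */
#define DEMO_TYPE_MASK 0xf000000000000000ULL
#define DEMO_PFN_MASK  0x000fffffffffffffULL

/* Combine a 32-bit page type and a pfn into one 64-bit stream entry. */
static uint64_t demo_pack(uint32_t type, uint64_t pfn)
{
    return ((uint64_t)type << 32) | (pfn & DEMO_PFN_MASK);
}

/* Recover the type; only the bits covered by DEMO_TYPE_MASK survive. */
static uint32_t demo_unpack_type(uint64_t entry)
{
    return (uint32_t)((entry & DEMO_TYPE_MASK) >> 32);
}

/* Recover the pfn from the low bits of the entry. */
static uint64_t demo_unpack_pfn(uint64_t entry)
{
    return entry & DEMO_PFN_MASK;
}
```

A receiver would run its type validation on the value returned by demo_unpack_type() before trusting the entry.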

[PATCH v20210111 10/39] tools: add xc_is_known_page_type to libxenctrl

2021-01-11 Thread Olaf Hering

Users of xc_get_pfn_type_batch may want to sanity check the data
returned by Xen. Add a simple helper for this purpose.

Signed-off-by: Olaf Hering 
---
 tools/libs/ctrl/xc_private.h | 33 +
 1 file changed, 33 insertions(+)

diff --git a/tools/libs/ctrl/xc_private.h b/tools/libs/ctrl/xc_private.h
index 5d2c7274fb..afb08aafe1 100644
--- a/tools/libs/ctrl/xc_private.h
+++ b/tools/libs/ctrl/xc_private.h
@@ -421,6 +421,39 @@ void *xc_map_foreign_ranges(xc_interface *xch, uint32_t dom,
 int xc_get_pfn_type_batch(xc_interface *xch, uint32_t dom,
   unsigned int num, xen_pfn_t *);
 
+/* Sanity check for types returned by Xen */
+static inline bool xc_is_known_page_type(xen_pfn_t type)
+{
+bool ret;
+
+switch (type)
+{
+case XEN_DOMCTL_PFINFO_NOTAB:
+
+case XEN_DOMCTL_PFINFO_L1TAB:
+case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+case XEN_DOMCTL_PFINFO_L2TAB:
+case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+case XEN_DOMCTL_PFINFO_L3TAB:
+case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+case XEN_DOMCTL_PFINFO_L4TAB:
+case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+case XEN_DOMCTL_PFINFO_XTAB:
+case XEN_DOMCTL_PFINFO_XALLOC:
+case XEN_DOMCTL_PFINFO_BROKEN:
+ret = true;
+break;
+default:
+ret = false;
+break;
+}
+return ret;
+}
+
 void bitmap_64_to_byte(uint8_t *bp, const uint64_t *lp, int nbits);
 void bitmap_byte_to_64(uint64_t *lp, const uint8_t *bp, int nbits);

[PATCH v20210111 09/39] tools: add readv_exact to libxenctrl

2021-01-11 Thread Olaf Hering

Read a batch of iovec's.

In the common case of short reads, finish individual iov's with read_exact.

Signed-off-by: Olaf Hering 
---
 tools/libs/ctrl/xc_private.c | 55 +++-
 tools/libs/ctrl/xc_private.h |  1 +
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/tools/libs/ctrl/xc_private.c b/tools/libs/ctrl/xc_private.c
index d94f846686..ea420b9ba8 100644
--- a/tools/libs/ctrl/xc_private.c
+++ b/tools/libs/ctrl/xc_private.c
@@ -659,8 +659,23 @@ int write_exact(int fd, const void *data, size_t size)
 
 #if defined(__MINIOS__)
 /*
- * MiniOS's libc doesn't know about writev(). Implement it as multiple write()s.
+ * MiniOS's libc doesn't know about readv/writev().
+ * Implement it as multiple read/write()s.
  */
+int readv_exact(int fd, const struct iovec *iov, int iovcnt)
+{
+int rc, i;
+
+for ( i = 0; i < iovcnt; ++i )
+{
+rc = read_exact(fd, iov[i].iov_base, iov[i].iov_len);
+if ( rc )
+return rc;
+}
+
+return 0;
+}
+
 int writev_exact(int fd, const struct iovec *iov, int iovcnt)
 {
 int rc, i;
@@ -675,6 +690,44 @@ int writev_exact(int fd, const struct iovec *iov, int iovcnt)
 return 0;
 }
 #else
+int readv_exact(int fd, const struct iovec *iov, int iovcnt)
+{
+int rc = 0, idx = 0;
+ssize_t len;
+
+while ( idx < iovcnt )
+{
+len = readv(fd, &iov[idx], min(iovcnt - idx, IOV_MAX));
+if ( len == -1 && errno == EINTR )
+continue;
+if ( len <= 0 )
+{
+rc = -1;
+goto out;
+}
+while ( len > 0 && idx < iovcnt )
+{
+if ( len >= iov[idx].iov_len )
+{
+len -= iov[idx].iov_len;
+}
+else
+{
+void *p = iov[idx].iov_base + len;
+size_t l = iov[idx].iov_len - len;
+
+rc = read_exact(fd, p, l);
+if ( rc )
+goto out;
+len = 0;
+}
+idx++;
+}
+}
+out:
+return rc;
+}
+
 int writev_exact(int fd, const struct iovec *iov, int iovcnt)
 {
 struct iovec *local_iov = NULL;
diff --git a/tools/libs/ctrl/xc_private.h b/tools/libs/ctrl/xc_private.h
index f0b5f83ac8..5d2c7274fb 100644
--- a/tools/libs/ctrl/xc_private.h
+++ b/tools/libs/ctrl/xc_private.h
@@ -441,6 +441,7 @@ int xc_flush_mmu_updates(xc_interface *xch, struct xc_mmu *mmu);
 
 /* Return 0 on success; -1 on error setting errno. */
 int read_exact(int fd, void *data, size_t size); /* EOF => -1, errno=0 */
+int readv_exact(int fd, const struct iovec *iov, int iovcnt);
 int write_exact(int fd, const void *data, size_t size);
 int writev_exact(int fd, const struct iovec *iov, int iovcnt);
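
The short-read handling described in this patch can be exercised outside of libxenctrl. What follows is a sketch under assumptions, not the library code: demo_read_full() is a hypothetical stand-in for read_exact(), demo_readv_full() mirrors the non-MiniOS loop above, and the IOV_MAX cap on the number of iovecs per call is left out for brevity.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for read_exact(): read exactly size bytes or fail. */
static int demo_read_full(int fd, void *data, size_t size)
{
    char *p = data;

    while ( size > 0 )
    {
        ssize_t len = read(fd, p, size);

        if ( len == -1 && errno == EINTR )
            continue;
        if ( len <= 0 )
            return -1;              /* error or premature EOF */

        p += len;
        size -= (size_t)len;
    }

    return 0;
}

/* Fill every buffer in iov[] completely; 0 on success, -1 on failure. */
static int demo_readv_full(int fd, const struct iovec *iov, int iovcnt)
{
    int idx = 0;

    while ( idx < iovcnt )
    {
        /* Assumption: iovcnt - idx never exceeds the system's IOV_MAX. */
        ssize_t len = readv(fd, &iov[idx], iovcnt - idx);

        if ( len == -1 && errno == EINTR )
            continue;
        if ( len <= 0 )
            return -1;

        /* Walk the buffers that this readv() call touched. */
        while ( len > 0 && idx < iovcnt )
        {
            size_t got = (size_t)len;

            if ( got >= iov[idx].iov_len )
            {
                /* This buffer was filled completely. */
                len -= (ssize_t)iov[idx].iov_len;
            }
            else
            {
                /* The read stopped inside this buffer: finish it. */
                if ( demo_read_full(fd, (char *)iov[idx].iov_base + got,
                                    iov[idx].iov_len - got) )
                    return -1;
                len = 0;
            }
            idx++;
        }
    }

    return 0;
}
```

Finishing a partially filled buffer with a plain read loop keeps the caller's iovec array const; the alternative of adjusting iov_base and iov_len in place would require a private copy of the array.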

[PATCH v20210111 05/39] tools: add with-xen-scriptdir configure option

2021-01-11 Thread Olaf Hering

In the near future all fresh installations will have an empty /etc.
The content of this directory will not be controlled by the package
manager anymore. One of the reasons for this move is to make snapshots
more robust.

As a first step in this direction, add a knob to configure to allow
storing the hotplug scripts in libexec because they are not exactly
configuration. The current default is unchanged, which is
/etc/xen/scripts.

Signed-off-by: Olaf Hering 
---
 m4/paths.m4 | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/m4/paths.m4 b/m4/paths.m4
index 89d3bb8312..0cec2bb190 100644
--- a/m4/paths.m4
+++ b/m4/paths.m4
@@ -70,6 +70,12 @@ AC_ARG_WITH([libexec-leaf-dir],
 [libexec_subdir=$withval],
 [libexec_subdir=$PACKAGE_TARNAME])
 
+AC_ARG_WITH([xen-scriptdir],
+AS_HELP_STRING([--with-xen-scriptdir=DIR],
+[Path to directory for dom0 hotplug scripts. [SYSCONFDIR/xen/scripts]]),
+[xen_scriptdir_path=$withval],
+[xen_scriptdir_path=$sysconfdir/xen/scripts])
+
 AC_ARG_WITH([xen-dumpdir],
 AS_HELP_STRING([--with-xen-dumpdir=DIR],
 [Path to directory for domU crash dumps. [LOCALSTATEDIR/lib/xen/dump]]),
@@ -137,7 +143,7 @@ AC_SUBST(INITD_DIR)
 XEN_CONFIG_DIR=$CONFIG_DIR/xen
 AC_SUBST(XEN_CONFIG_DIR)
 
-XEN_SCRIPT_DIR=$XEN_CONFIG_DIR/scripts
+XEN_SCRIPT_DIR=$xen_scriptdir_path
 AC_SUBST(XEN_SCRIPT_DIR)
 
 case "$host_os" in

[PATCH v20210111 02/39] xl: use proper name for bash_completion file

2021-01-11 Thread Olaf Hering

Files in the bash-completion dirs should be named like the commands,
without suffix. Without this change 'xl' will not be recognized as a
command with completion support if BASH_COMPLETION_DIR is set to
/usr/share/bash-completion/completions.

Fixes commit 9136a919b19929ecb242ef327053d55d824397df

Signed-off-by: Olaf Hering 
---
 tools/xl/Makefile| 4 ++--
 tools/xl/bash-completion | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/xl/Makefile b/tools/xl/Makefile
index bdf67c8464..656b21c7da 100644
--- a/tools/xl/Makefile
+++ b/tools/xl/Makefile
@@ -45,11 +45,11 @@ install: all
$(INSTALL_DIR) $(DESTDIR)$(sbindir)
$(INSTALL_DIR) $(DESTDIR)$(BASH_COMPLETION_DIR)
$(INSTALL_PROG) xl $(DESTDIR)$(sbindir)
-   $(INSTALL_DATA) bash-completion $(DESTDIR)$(BASH_COMPLETION_DIR)/xl.sh
+   $(INSTALL_DATA) bash-completion $(DESTDIR)$(BASH_COMPLETION_DIR)/xl
 
 .PHONY: uninstall
 uninstall:
-   rm -f $(DESTDIR)$(BASH_COMPLETION_DIR)/xl.sh
+   rm -f $(DESTDIR)$(BASH_COMPLETION_DIR)/xl
rm -f $(DESTDIR)$(sbindir)/xl
 
 .PHONY: clean
diff --git a/tools/xl/bash-completion b/tools/xl/bash-completion
index b7cd6b3992..7c6ed32f88 100644
--- a/tools/xl/bash-completion
+++ b/tools/xl/bash-completion
@@ -1,4 +1,4 @@
-# Copy this file to /etc/bash_completion.d/xl.sh
+# Copy this file to /etc/bash_completion.d/xl
 
 _xl()
 {

[PATCH v20210111 04/39] docs: substitute XEN_CONFIG_DIR in xl.conf.5

2021-01-11 Thread Olaf Hering

xl(1) opens xl.conf in XEN_CONFIG_DIR.
Substitute this variable also in the man page.

Signed-off-by: Olaf Hering 
Reviewed-by: Anthony PERARD 
---
 docs/man/xl.1.pod.in   | 2 +-
 docs/man/xl.conf.5.pod | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index c7b2fcc927..765c169ed2 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -50,7 +50,7 @@ setup the bridge.
 
 If you specify the amount of memory dom0 has, passing B to
 Xen, it is highly recommended to disable B. Edit
-B and set it to 0.
+B<@XEN_CONFIG_DIR@/xl.conf> and set it to 0.
 
 =item run xl as B
 
diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod
index 41ee428744..dfea9d64ba 100644
--- a/docs/man/xl.conf.5.pod
+++ b/docs/man/xl.conf.5.pod
@@ -1,6 +1,6 @@
 =head1 NAME
 
-/etc/xen/xl.conf - XL Global/Host Configuration 
+@XEN_CONFIG_DIR@/xl.conf - XL Global/Host Configuration
 
 =head1 DESCRIPTION

[PATCH v20210111 03/39] docs: remove stale create example from xl.1

2021-01-11 Thread Olaf Hering

Maybe xm create had a feature to create a domU based on a configuration
file. xl create requires the '-f' option to refer to a file.
There is no code to look into XEN_CONFIG_DIR, so remove the example.

Signed-off-by: Olaf Hering 
---
 docs/man/xl.1.pod.in | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index af31d2b572..c7b2fcc927 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -171,13 +171,6 @@ B
 
 =over 4
 
-=item I
-
-  xl create DebianLenny
-
-This creates a domain with the file /etc/xen/DebianLenny, and returns as
-soon as it is run.
-
 =item I
 
   xl create hvm.cfg 'cpus="0-3"; pci=["01:05.1","01:05.2"]'

Re: [PATCH 10/24] Make libs/evtchn build on NetBSD

2021-01-11 Thread Roger Pau Monné

On Sun, Jan 10, 2021 at 01:22:50PM +0100, Manuel Bouyer wrote:
> On Mon, Jan 04, 2021 at 06:15:24PM +0100, Roger Pau Monné wrote:
> > On Mon, Jan 04, 2021 at 11:26:45AM +0100, Manuel Bouyer wrote:
> > > On Tue, Dec 29, 2020 at 12:52:43PM +0100, Roger Pau Monné wrote:
> > > > On Mon, Dec 14, 2020 at 05:36:09PM +0100, Manuel Bouyer wrote:
> > > > > ---
> > > > >  tools/libs/evtchn/netbsd.c | 8 
> > > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/tools/libs/evtchn/netbsd.c b/tools/libs/evtchn/netbsd.c
> > > > > index 8b8545d2f9..6d4ce28011 100644
> > > > > --- a/tools/libs/evtchn/netbsd.c
> > > > > +++ b/tools/libs/evtchn/netbsd.c
> > > > > @@ -25,10 +25,10 @@
> > > > >  
> > > > >  #include 
> > > > >  
> > > > > -#include 
> > > > > -
> > > > >  #include "private.h"
> > > > >  
> > > > > +#include 
> > > > > +
> > > > >  #define EVTCHN_DEV_NAME  "/dev/xenevt"
> > > > >  
> > > > >  int osdep_evtchn_open(xenevtchn_handle *xce)
> > > > > @@ -131,7 +131,7 @@ xenevtchn_port_or_error_t 
> > > > > xenevtchn_pending(xenevtchn_handle *xce)
> > > > >  int fd = xce->fd;
> > > > >  evtchn_port_t port;
> > > > >  
> > > > > -if ( read_exact(fd, (char *)&port, sizeof(port)) == -1 )
> > > > > +if ( read(fd, (char *)&port, sizeof(port)) == -1 )
> > > > >  return -1;
> > > > >  
> > > > >  return port;
> > > > > @@ -140,7 +140,7 @@ xenevtchn_port_or_error_t 
> > > > > xenevtchn_pending(xenevtchn_handle *xce)
> > > > >  int xenevtchn_unmask(xenevtchn_handle *xce, evtchn_port_t port)
> > > > >  {
> > > > >  int fd = xce->fd;
> > > > > -return write_exact(fd, (char *)&port, sizeof(port));
> > > > > +return write(fd, (char *)&port, sizeof(port));
> > > > 
> > > > I'm afraid we will need some context as to why {read/write}_exact
> > > > doesn't work here.
> > > 
> > > It just doesn't exists on NetBSD
> > 
> > But those are not part of libc or any external library, they are
> > implemented in tools/libs/ctrl/xc_private.c and should be available to
> > the NetBSD build AFAICT.
> > 
> > They are just helpers build on top of the standard read/write calls.
> 
> Yes, I misremembered (I have this patch for a long time, since 4.11 at last,
> maybe even older).
> Anyway the build fails with:
> netbsd.c: In function 'xenevtchn_pending':
> netbsd.c:134:10: error: implicit declaration of function 'read_exact'; did 
> you mean 'readlinkat'? [-Werror=implicit-function-declaration]
> 
> The only header where I see this function defined is
> tools/libs/ctrl/xc_private.h, so I would need something like
> #include "../../ctrl/xc_private.h"
> but this doesn't look right.
> 
> I didn't find where other OSes are getting the prototype from (or maybe
> they just have this -Werror turned off ?)
> 
> Anyway I think NetBSD doesn't need this read_exact/write_exact thing,
> the underlying pseudo-device won't to partial read/write.

The usage of {read/write}_exact there is indeed a mistake: when the
evtchn library was split from libxc, no one realized that the
{read/write}_exact helpers were no longer available to that code.

Could you please add:

Fixes: b7f76a699dc ('tools: Refactor /dev/xen/evtchn wrappers into libxenevtchn.')

To the commit message?

And also:

Reviewed-by: Roger Pau Monné 

Thanks, Roger.

[linux-linus test] 158346: regressions - FAIL

2021-01-11 Thread osstest service owner

flight 158346 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158346/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-xsm                                7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel                  7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict  7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow       7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel                  7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64              7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm           7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64              7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm     7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm                           7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd                    7 xen-install          fail REGR. vs. 152332
 test-amd64-coresched-i386-xl                          7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-libvirt                               7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64                   7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl                                    7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64                       7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-raw                                7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim                             7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm           7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386                        7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-shadow                             7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64                   7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64                   7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm  7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair                         10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair                         11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64                   7 xen-install          fail REGR. vs. 152332
 test-amd64-amd64-xl-multivcpu                        14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-xl                                  14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-xl-pvshim                           14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-xl-credit2                          14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-xl-pvhv2-intel                      14 guest-start          fail REGR. vs. 152332
 test-amd64-i386-examine                               6 xen-install          fail REGR. vs. 152332
 test-amd64-amd64-xl-shadow                           14 guest-start          fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1                          10 host-ping-check-xen  fail REGR. vs. 152332
 test-amd64-amd64-dom0pvh-xl-amd                      14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-xl-pvhv2-amd                        14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-libvirt-xsm                         14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-libvirt                             14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-xl-credit1                          14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-xl-xsm                              14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-dom0pvh-xl-intel                    14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-libvirt-pair                        25 guest-start/debian   fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm                              12 debian-install       fail REGR. vs. 152332
 test-amd64-coresched-amd64-xl                        14 guest-start          fail REGR. vs. 152332
 test-amd64-amd64-pair                                25 guest-start/debian   fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2                           8 xen-boot             fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub                        20 guest-stop           fail REGR. vs. 152332
 test-arm64-arm64-examine                             13 examine-iommu        fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub                         20 guest-stop           fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64                   7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd                    7 xen-install          fail REGR. vs. 152332
 test-amd64-i386-pair                                 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-pair                                 11 xen-install/dst_host fail REGR. vs. 152332
 test-arm64-arm64-xl-seattle                           8 xen-boot             fail REGR. vs. 152332
 test-arm64-arm64-xl                                  10 host-ping-check-xen  fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm                          8 xen-boot             fail REGR. vs. 152332

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 14 guest-start  fail REGR. vs. 152332

Tests which did not succeed, but are not blocking:
 test-amd

Re: [PATCH 12/24] Implement gnttab on NetBSD

2021-01-11 Thread Roger Pau Monné

On Sun, Jan 10, 2021 at 01:40:50PM +0100, Manuel Bouyer wrote:
> On Mon, Jan 04, 2021 at 06:24:11PM +0100, Roger Pau Monné wrote:
> > On Mon, Jan 04, 2021 at 11:29:51AM +0100, Manuel Bouyer wrote:
> > > On Tue, Dec 29, 2020 at 12:16:01PM +0100, Roger Pau Monné wrote:
> > > > Might need some kind of log message, and will also required your
> > > > Signed-off-by (or from the original author of the patch).
> > > > 
> > > > On Mon, Dec 14, 2020 at 05:36:11PM +0100, Manuel Bouyer wrote:
> > > > > ---
> > > > >  tools/libs/gnttab/Makefile |   2 +-
> > > > >  tools/libs/gnttab/netbsd.c | 267 
> > > > > +
> > > > >  2 files changed, 268 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 tools/libs/gnttab/netbsd.c
> > > > > 
> > > > > diff --git a/tools/libs/gnttab/Makefile b/tools/libs/gnttab/Makefile
> > > > > index d86c49d243..ae390ce60f 100644
> > > > > --- a/tools/libs/gnttab/Makefile
> > > > > +++ b/tools/libs/gnttab/Makefile
> > > > > @@ -10,7 +10,7 @@ SRCS-GNTSHR+= gntshr_core.c
> > > > >  SRCS-$(CONFIG_Linux)   += $(SRCS-GNTTAB) $(SRCS-GNTSHR) linux.c
> > > > >  SRCS-$(CONFIG_MiniOS)  += $(SRCS-GNTTAB) gntshr_unimp.c minios.c
> > > > >  SRCS-$(CONFIG_FreeBSD) += $(SRCS-GNTTAB) $(SRCS-GNTSHR) freebsd.c
> > > > > +SRCS-$(CONFIG_NetBSD)  += $(SRCS-GNTTAB) $(SRCS-GNTSHR) netbsd.c
> > > > >  SRCS-$(CONFIG_SunOS)   += gnttab_unimp.c gntshr_unimp.c
> > > > > -SRCS-$(CONFIG_NetBSD)  += gnttab_unimp.c gntshr_unimp.c
> > > > >  
> > > > >  include $(XEN_ROOT)/tools/libs/libs.mk
> > > > > diff --git a/tools/libs/gnttab/netbsd.c b/tools/libs/gnttab/netbsd.c
> > > > > new file mode 100644
> > > > > index 00..2df7058cd7
> > > > > --- /dev/null
> > > > > +++ b/tools/libs/gnttab/netbsd.c
> > > > 
> > > > I think this is mostly (if not equal) to the FreeBSD version, in which
> > > > case we could rename freebsd.c to plain bsd.c and use it for
> > > > both FreeBSD and NetBSD?
> > > 
> > > I can't see why they won't diverge in the future ...
> > 
> > True, but then let's diverge when we have to cross that bridge I would
> > say.
> > 
> > There's IMO no point in having two verbatim copies of the same code in
> > different places, it's just more churn to maintain and to remember to
> > apply duplicate fixes.
> 
> Actually I just checked, the files are quite different, because the
> GNTTAB ioctls are not the same, and it seems they don't work the same way
> either. FreeBSD does mmap against the gnttab device; this is not supported
> on NetBSD. Merging the two would cause an #ifdef maze.

Ack, my bad. Then it's fine. The copyright message led me to think they
were the same.

Thanks, Roger.

Re: [PATCH] x86/acpi: remove dead code

2021-01-11 Thread Roger Pau Monné

On Mon, Jan 11, 2021 at 10:33:28AM +0100, Jan Beulich wrote:
> On 11.01.2021 10:26, Roger Pau Monne wrote:
> > After the recent changes to acpi_fadt_parse_sleep_info the bad label
> > can never be called with facs mapped, and hence the unmap can be
> > removed.
> > 
> > Additionally remove the whole label, since it was used by a
> > single caller. Move the relevant code from the label.
> > 
> > No functional change intended.
> > 
> > CID: 1471722
> > Fixes: 16ca5b3f873 ('x86/ACPI: don't invalidate S5 data when S3 wakeup 
> > vector cannot be determined')
> 
> I kind of consider a "Fixes:" tag contrary to "No functional change
> intended", but I guess Coverity considering this an issue warrants
> the tag at least in a way.

I've just added the tag so that if the original patch is backported,
this one is backported as well, to prevent Coverity from complaining again.

Thanks, Roger.

[xen-unstable bisection] complete test-arm64-arm64-xl-credit1

2021-01-11 Thread osstest service owner

branch xen-unstable
xenbranch xen-unstable
job test-arm64-arm64-xl-credit1
testid xen-boot

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  xen git://xenbits.xen.org/xen.git
  Bug introduced:  9cfdb489af810f71acb7dcdb87075dc7b3b313a0
  Bug not present: a9f1f03b2710f5ce84f69c1c4516349531053fac
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/158366/


  commit 9cfdb489af810f71acb7dcdb87075dc7b3b313a0
  Author: Bertrand Marquis 
  Date:   Thu Dec 17 15:38:02 2020 +
  
  xen/arm: Add ID registers and complete cpuinfo
  
  Add definition and entries in cpuinfo for ID registers introduced in
  newer Arm Architecture reference manual:
  - ID_PFR2: processor feature register 2
  - ID_DFR1: debug feature register 1
  - ID_MMFR4 and ID_MMFR5: Memory model feature registers 4 and 5
  - ID_ISA6: ISA Feature register 6
  Add more bitfield definitions in PFR fields of cpuinfo.
  Add MVFR2 register definition for aarch32.
  Add MVFRx_EL1 defines for aarch32.
  Add mvfr values in cpuinfo.
  Add some registers definition for arm64 in sysregs as some are not
  always know by compilers.
  Initialize the new values added in cpuinfo in identify_cpu during init.
  
  Signed-off-by: Bertrand Marquis 
  Reviewed-by: Stefano Stabellini 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-unstable/test-arm64-arm64-xl-credit1.xen-boot.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/xen-unstable/test-arm64-arm64-xl-credit1.xen-boot
 --summary-out=tmp/158366.bisection-summary --basis-template=158290 
--blessings=real,real-bisect,real-retry xen-unstable 
test-arm64-arm64-xl-credit1 xen-boot
Searching for failure / basis pass:
 158322 fail [host=rochester1] / 158290 [host=rochester0] 158269 [host=laxton1] 
158231 ok.
Failure / basis pass flights: 158322 / 158231
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7ea428895af2840d85c524f0bd11a38aac308308 
ce59e3dda5f99afbe7257e1e9a22dffd5c4d033c
Basis pass a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7ea428895af2840d85c524f0bd11a38aac308308 
7ba2ab495be54f608cb47440e1497b2795bd301a
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9-a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/qemu-xen.git#7ea428895af2840d85c524f0bd11a38aac308308-7ea428895af2840d85c524f0bd11a38aac308308
 git://xenbits.xen.org/xen.git#7ba2ab495be54f608cb47440e1497b2795bd301a-ce59e3d\
 da5f99afbe7257e1e9a22dffd5c4d033c
From git://cache:9419/git://xenbits.xen.org/xen
   ce59e3dda5..faa0ab2a1d  smoke  -> origin/smoke
Loaded 5001 nodes in revision graph
Searching for test results:
 158132 [host=laxton1]
 158146 [host=rochester0]
 158183 [host=laxton0]
 158231 pass a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7ea428895af2840d85c524f0bd11a38aac308308 
7ba2ab495be54f608cb47440e1497b2795bd301a
 158269 [host=laxton1]
 158290 [host=rochester0]
 158296 fail a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7ea428895af2840d85c524f0bd11a38aac308308 
ce59e3dda5f99afbe7257e1e9a22dffd5c4d033c
 158303 fail a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7ea428895af2840d85c524f0bd11a38aac308308 
ce59e3dda5f99afbe7257e1e9a22dffd5c4d033c
 158322 fail a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7ea428895af2840d85c524f0bd11a38aac308308 
ce59e3dda5f99afbe7257e1e9a22dffd5c4d033c
 158348 pass a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7ea428895af2840d85c524f0bd11a38aac308308 
7ba2ab495be54f608cb47440e1497b2795bd301a
 158349 fail a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7ea428895af2840d85c524f0bd11a38aac308308 
ce59e3dda5f99afbe7257e1e9a22dffd5c4d033c
 158351 fail a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
7ea428895af2840d85c524f0bd11a38aac308308 
c7115531ea8ede5c6ab27f972c1be6ecad388f55
 158354 fail a6c5dd1dbaffe4cc398d8454546ba9246b9a95c9 
c530a75c1e6a472b0

[xen-unstable-smoke test] 158362: tolerable all pass - PUSHED

2021-01-11 Thread osstest service owner

flight 158362 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158362/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  faa0ab2a1df0381e00d85312247024b32d60a7b9
baseline version:
 xen  ce59e3dda5f99afbe7257e1e9a22dffd5c4d033c

Last test of basis   158293  2021-01-09 04:01:58 Z    2 days
Testing same since   158362  2021-01-11 14:00:27 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Julien Grall 
  Roger Pau Monné 

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt                     pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   ce59e3dda5..faa0ab2a1d  faa0ab2a1df0381e00d85312247024b32d60a7b9 -> smoke

[PATCH v2] tools: create libxensaverestore

2021-01-11 Thread Olaf Hering

Move all save/restore related code from libxenguest.so into a separate
library libxensaverestore.so. The only consumer is libxl-save-helper.
There is no need to have the moved code mapped all the time in binaries
where libxenguest.so is used.

According to size(1) the change is:
   text    data     bss     dec     hex filename
 187183    4304      48  191535   2ec2f guest/libxenguest.so.4.15.0

 124106    3376      48  127530   1f22a guest/libxenguest.so.4.15.0
  67841    1872       8   69721   11059 saverestore/libxensaverestore.so.4.15.0

Signed-off-by: Olaf Hering 
---
v2:
- copy also license header
- move xg_nomigrate.c
- add size(1) output to commit msg
- remove change from libxl_create.c
---
 .gitignore|   2 +
 tools/include/xenguest.h  | 186 
 tools/include/xensaverestore.h| 208 ++
 tools/libs/Makefile   |   1 +
 tools/libs/guest/Makefile |  11 -
 tools/libs/guest/xg_offline_page.c|   1 -
 tools/libs/light/Makefile |   4 +-
 tools/libs/light/libxl_internal.h |   1 +
 tools/libs/light/libxl_save_helper.c  |   1 +
 tools/libs/light/libxl_save_msgs_gen.pl   |   2 +-
 tools/libs/saverestore/Makefile   |  38 
 .../{guest => saverestore}/xg_nomigrate.c |   0
 .../{guest => saverestore}/xg_save_restore.h  |   2 -
 .../{guest => saverestore}/xg_sr_common.c |   0
 .../{guest => saverestore}/xg_sr_common.h |  12 +
 .../{guest => saverestore}/xg_sr_common_x86.c |   0
 .../{guest => saverestore}/xg_sr_common_x86.h |   0
 .../xg_sr_common_x86_pv.c |   0
 .../xg_sr_common_x86_pv.h |   0
 .../{guest => saverestore}/xg_sr_restore.c|   0
 .../xg_sr_restore_x86_hvm.c   |   0
 .../xg_sr_restore_x86_pv.c|   0
 .../libs/{guest => saverestore}/xg_sr_save.c  |   0
 .../xg_sr_save_x86_hvm.c  |   0
 .../xg_sr_save_x86_pv.c   |   0
 .../xg_sr_stream_format.h |   0
 tools/libs/uselibs.mk |   4 +-
 27 files changed, 269 insertions(+), 204 deletions(-)
 create mode 100644 tools/include/xensaverestore.h
 create mode 100644 tools/libs/saverestore/Makefile
 rename tools/libs/{guest => saverestore}/xg_nomigrate.c (100%)
 rename tools/libs/{guest => saverestore}/xg_save_restore.h (98%)
 rename tools/libs/{guest => saverestore}/xg_sr_common.c (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_common.h (98%)
 rename tools/libs/{guest => saverestore}/xg_sr_common_x86.c (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_common_x86.h (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_common_x86_pv.c (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_common_x86_pv.h (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_restore.c (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_restore_x86_hvm.c (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_restore_x86_pv.c (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_save.c (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_save_x86_hvm.c (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_save_x86_pv.c (100%)
 rename tools/libs/{guest => saverestore}/xg_sr_stream_format.h (100%)

diff --git a/.gitignore b/.gitignore
index b169d78ed7..5c23d28f6b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -144,6 +144,8 @@ tools/libs/light/test_timedereg
 tools/libs/light/test_fdderegrace
 tools/libs/light/tmp.*
 tools/libs/light/xenlight.pc
+tools/libs/saverestore/libxensaverestore.map
+tools/libs/saverestore/xensaverestore.pc
 tools/libs/stat/_paths.h
 tools/libs/stat/headers.chk
 tools/libs/stat/libxenstat.map
diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
index 775cf34c04..23a407c56f 100644
--- a/tools/include/xenguest.h
+++ b/tools/include/xenguest.h
@@ -24,9 +24,6 @@
 
 #define XC_NUMA_NO_NODE   (~0U)
 
-#define XCFLAGS_LIVE  (1 << 0)
-#define XCFLAGS_DEBUG (1 << 1)
-
 #define X86_64_B_SIZE   64 
 #define X86_32_B_SIZE   32
 
@@ -434,189 +431,6 @@ static inline xen_pfn_t xc_dom_p2m(struct xc_dom_image *dom, xen_pfn_t pfn)
  */
 struct xenevtchn_handle;
 
-/* For save's precopy_policy(). */
-struct precopy_stats
-{
-unsigned int iteration;
-unsigned long total_written;
-long dirty_count; /* -1 if unknown */
-};
-
-/*
- * A precopy_policy callback may not be running in the same address
- * space as libxc an so precopy_stats is passed by value.
- */
-typedef int (*precopy_policy_t)(struct precopy_stats, void *);
-
-/* callbacks provided by xc_domain_save */
-struct save_callbacks {
-/*
- * Called after expiration of checkpoint interval,
- * to suspend the guest.
- */
-int (*suspend)(void *data);
-
-/*
- * Called before and after every batch of page data sent during
- * the precopy phase of a live migration to ask

Re: [PATCH v4 11/11] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-11 Thread Oleksandr




Hi Rahul



-
  static int arm_smmu_device_probe(struct platform_device *pdev)
  {
  int irq, ret;
-    struct resource *res;
-    resource_size_t ioaddr;
+    paddr_t ioaddr, iosize;
  struct arm_smmu_device *smmu;
-    struct device *dev = &pdev->dev;
-    bool bypass;
  -    smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
+    smmu = xzalloc(struct arm_smmu_device);
  if (!smmu) {
-    dev_err(dev, "failed to allocate arm_smmu_device\n");
+    dev_err(pdev, "failed to allocate arm_smmu_device\n");
  return -ENOMEM;
  }
-    smmu->dev = dev;
+    smmu->dev = pdev;
  -    if (dev->of_node) {
+    if (pdev->of_node) {
  ret = arm_smmu_device_dt_probe(pdev, smmu);
+    if (ret)
+    return -EINVAL;
  } else {
  ret = arm_smmu_device_acpi_probe(pdev, smmu);
  if (ret == -ENODEV)
  return ret;
  }
  -    /* Set bypass mode according to firmware probing result */
-    bypass = !!ret;
-
  /* Base address */
-    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-    if (resource_size(res) < arm_smmu_resource_size(smmu)) {
-    dev_err(dev, "MMIO region too small (%pr)\n", res);
+    ret = dt_device_get_address(dev_to_dt(pdev), 0, &ioaddr, &iosize);
+    if (ret)
+    return -ENODEV;
+
+    if (iosize < arm_smmu_resource_size(smmu)) {
+    dev_err(pdev, "MMIO region too small (%lx)\n", iosize);
  return -EINVAL;
  }
-    ioaddr = res->start;
    /*
   * Don't map the IMPLEMENTATION DEFINED regions, since they may 
contain

- * the PMCG registers which are reserved by the PMU driver.
+ * the PMCG registers which are optional and currently not 
supported.

   */
-    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
+    smmu->base = ioremap_nocache(ioaddr, ARM_SMMU_REG_SZ);
  if (IS_ERR(smmu->base))
  return PTR_ERR(smmu->base);
  -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
-    smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
+    if (iosize > SZ_64K) {
+    smmu->page1 = ioremap_nocache(ioaddr + SZ_64K,
 ARM_SMMU_REG_SZ);
  if (IS_ERR(smmu->page1))
  return PTR_ERR(smmu->page1);
@@ -2765,14 +3101,262 @@ static int arm_smmu_device_probe(struct 
platform_device *pdev)

  return ret;
    /* Reset the device */
-    ret = arm_smmu_device_reset(smmu, bypass);
+    ret = arm_smmu_device_reset(smmu);
  if (ret)
  return ret;
  +    /*
+ * Keep a list of all probed devices. This will be used to query
+ * the smmu devices based on the fwnode.
+ */
+    INIT_LIST_HEAD(&smmu->devices);
+
+    spin_lock(&arm_smmu_devices_lock);
+    list_add(&smmu->devices, &arm_smmu_devices);
+    spin_unlock(&arm_smmu_devices_lock);


It looks like we need some kind of manual roll-back logic here in case
of an error during probe (there is no real devm_*): iounmap, xfree, etc.
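
To illustrate the kind of roll-back meant here, below is a minimal,
self-contained sketch of the usual goto-based unwind. Everything in it is
hypothetical: sketch_ioremap()/sketch_iounmap() and calloc()/free() merely
stand in for ioremap_nocache()/iounmap() and xzalloc()/xfree(), and the
fail_step parameter exists only to exercise each error path. It is not the
driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parts of the device state that hold resources. */
struct sketch_smmu {
    void *base;
    void *page1;
};

int sketch_live_mappings;   /* mappings currently held, to observe leaks */

static void *sketch_ioremap(int fail)
{
    void *p = fail ? NULL : malloc(1);

    if (p)
        sketch_live_mappings++;
    return p;
}

static void sketch_iounmap(void *p)
{
    free(p);
    sketch_live_mappings--;
}

/*
 * fail_step selects which stage fails: 0 = none, 1 = base mapping,
 * 2 = page1 mapping, 3 = device reset.
 */
int sketch_probe(int fail_step, struct sketch_smmu **out)
{
    struct sketch_smmu *smmu;
    int ret;

    assert(out != NULL);

    smmu = calloc(1, sizeof(*smmu));
    if (!smmu)
        return -ENOMEM;

    smmu->base = sketch_ioremap(fail_step == 1);
    if (!smmu->base) {
        ret = -ENOMEM;
        goto err_free;
    }

    smmu->page1 = sketch_ioremap(fail_step == 2);
    if (!smmu->page1) {
        ret = -ENOMEM;
        goto err_unmap_base;
    }

    if (fail_step == 3) {       /* stands in for a failing device reset */
        ret = -EIO;
        goto err_unmap_page1;
    }

    *out = smmu;
    return 0;

    /* Unwind in reverse order of acquisition. */
err_unmap_page1:
    sketch_iounmap(smmu->page1);
err_unmap_base:
    sketch_iounmap(smmu->base);
err_free:
    free(smmu);
    return ret;
}
```

Each label releases exactly what was acquired before the failing step, so
a later error path falls through the earlier ones.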




+
  return 0;
  }



--
Regards,

Oleksandr Tyshchenko

Re: [PATCH 3/4] x86: Allow non-faulting accesses to non-emulated MSRs if policy permits this

2021-01-11 Thread boris . ostrovsky



On 1/11/21 2:48 AM, Jan Beulich wrote:
> On 08.01.2021 21:39, boris.ostrov...@oracle.com wrote:
>> On 1/8/21 10:18 AM, Jan Beulich wrote:
>>
>>>
>>> Just to re-raise the question raised by Andrew already earlier
>>> on: Has Solaris been fixed in the meantime, or is this at least
>>> firmly planned to happen?
>> I was told they will open a bug.
> "Will", not "did"?


I can't say for sure, I don't have access to their bugDB, they typically keep 
bugs private (or so I am told). All I can say is that they are aware of this 
issue.


>
>>> @@ -3319,10 +3319,8 @@ static int vmx_msr_write_intercept(unsigned int msr, 
>>> uint64_t msr_content)
>>>   is_last_branch_msr(msr) )
>>>  break;
>>>  
>>> -gdprintk(XENLOG_WARNING,
>>> - "WRMSR 0x%08x val 0x%016"PRIx64" unimplemented\n",
>>> - msr, msr_content);
>>> -goto gp_fault;
>>> +if ( guest_unhandled_msr(v, msr, &msr_content, true) )
>>> +goto gp_fault;
>>>  }
>>>  
>>>  return X86EMUL_OKAY;
>>> These functions also get used from the insn emulator, when it
>>> needs to fetch an MSR value (not necessarily in the context of
>>> emulating RDMSR or WRMSR). I wonder whether applying this
>>> behavior in that case is actually correct. It would only be if
>>> we would settle on it being a requirement that any such MSRs
>>> have to have emulation present in one of the handlers.
>>
>> Hmm.. Yes, I did not consider this. I am not convinced this will
>> always result in correct behavior for the emulator so I will
>> need to pass down a parameter. Unless there is a way to figure
>> out whether we are running in the emulator (which I don't
>> immediately see)
> Passing a parameter for this is sort of ugly, but I presume
> unavoidable. The more that what you need to know is not "running
> in emulator", but "guest RDMSR/WRMSR" - this can also happen
> through the emulator.


Right, that's what I meant.


-boris

Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

2021-01-11 Thread Laszlo Ersek

On 01/11/21 17:31, Igor Druzhinin wrote:
> On 11/01/2021 15:35, Laszlo Ersek wrote:
>> On 01/11/21 16:26, Igor Druzhinin wrote:
>>> On 11/01/2021 15:21, Jan Beulich wrote:
>>>> On 11.01.2021 15:49, Laszlo Ersek wrote:
>>>>> On 01/11/21 15:00, Igor Druzhinin wrote:
>>>>>> On 11/01/2021 09:27, Jan Beulich wrote:
>>>>>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>>>>>> We faced a problem with passing through a PCI device with 64GB BAR to
>>>>>>>> UEFI guest. The BAR is expectedly programmed into 64-bit PCI aperture at
>>>>>>>> 64G address which pushes physical address space to 37 bits. OVMF uses
>>>>>>>> address width early in PEI phase to make DXE identity pages covering
>>>>>>>> the whole addressable space so it needs to know the last address it needs
>>>>>>>> to cover but at the same time not overdo the mappings.
>>>>>>>>
>>>>>>>> As there is seemingly no other way to pass or get this information in
>>>>>>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet enumerated,
>>>>>>>> xenstore is not yet initialized) - extend the info structure with a new
>>>>>>>> table. Since the structure was initially created to be extendable -
>>>>>>>> the change is backward compatible.
>>>>>>>
>>>>>>> How does UEFI handle the same situation on baremetal? I'd guess it is
>>>>>>> in even more trouble there, as it couldn't even read addresses from
>>>>>>> BARs, but would first need to assign them (or at least calculate
>>>>>>> their intended positions).
>>>>>>
>>>>>> Maybe Laszlo or Anthony could answer this question quickly while I'm investigating?
>>>>>
>>>>> On the bare metal, the phys address width of the processor is known.
>>>>
>>>> From CPUID I suppose.
>>>>
>>>>> OVMF does the whole calculation in reverse because there's no way for it
>>>>> to know the physical address width of the physical (= host) CPU.
>>>>> "Overdoing" the mappings doesn't only waste resources, it breaks hard
>>>>> with EPT -- access to a GPA that is inexpressible with the phys address
>>>>> width of the host CPU (= not mappable successfully with the nested page
>>>>> tables) will behave super bad. I don't recall the exact symptoms, but it
>>>>> prevents booting the guest OS.
>>>>>
>>>>> This is why the most conservative 36-bit width is assumed by default.
>>>>
>>>> IOW you don't trust virtualized CPUID output?
>>>
>>> I'm discussing this with Andrew and it appears we're certainly more lax in
>>> wiring physical address width into the guest from hardware directly rather
>>> than KVM.
>>>
>>> Another problem that I faced while experimenting is that creating page
>>> tables for 46-bits (that CPUID returned in my case) of address space takes
>>> about a minute on a modern CPU.
>>
>> Even if you enable 1GiB pages?
>>
>> (In the libvirt domain XML, it's expressed as
>>
>> 
>> )
>>
>> ... I'm not doubtful, just curious. I guess that, when the physical
>> address width is so large, a physical UEFI platform firmware will limit
>> itself to a lesser width -- it could even offer some knobs in the setup TUI.
> 
> So it wasn't the feature bit that we expose by default in Xen but the OVMF 
> configuration
> with 1G pages disabled for that use. I enabled it and got booting even with 
> 46-bits
> in reasonable time now.
> 
> Given we're not that sensitive in Xen to physical address being different and 
> prefer to
> control that on different level I'd like to abandon that ABI change approach 
> (does anyone
> have any objections?) and instead take physical address width directly from 
> CPUID which
> we do in hvmloader already. The change would be local to Xen platform.

Yes, as long as you limit the approach to "OvmfPkg/XenPlatformPei" (or,
more generally, to the "OvmfPkg/OvmfXen.dsc" platform), it makes perfect
sense.

Thanks!
Laszlo

Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

2021-01-11 Thread Igor Druzhinin

On 11/01/2021 15:35, Laszlo Ersek wrote:
> On 01/11/21 16:26, Igor Druzhinin wrote:
>> On 11/01/2021 15:21, Jan Beulich wrote:
>>> On 11.01.2021 15:49, Laszlo Ersek wrote:
>>>> On 01/11/21 15:00, Igor Druzhinin wrote:
>>>>> On 11/01/2021 09:27, Jan Beulich wrote:
>>>>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>>>>> We faced a problem with passing through a PCI device with 64GB BAR to
>>>>>>> UEFI guest. The BAR is expectedly programmed into 64-bit PCI aperture at
>>>>>>> 64G address which pushes physical address space to 37 bits. OVMF uses
>>>>>>> address width early in PEI phase to make DXE identity pages covering
>>>>>>> the whole addressable space so it needs to know the last address it needs
>>>>>>> to cover but at the same time not overdo the mappings.
>>>>>>>
>>>>>>> As there is seemingly no other way to pass or get this information in
>>>>>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet enumerated,
>>>>>>> xenstore is not yet initialized) - extend the info structure with a new
>>>>>>> table. Since the structure was initially created to be extendable -
>>>>>>> the change is backward compatible.
>>>>>>
>>>>>> How does UEFI handle the same situation on baremetal? I'd guess it is
>>>>>> in even more trouble there, as it couldn't even read addresses from
>>>>>> BARs, but would first need to assign them (or at least calculate
>>>>>> their intended positions).
>>>>>
>>>>> Maybe Laszlo or Anthony could answer this question quickly while I'm investigating?
>>>>
>>>> On the bare metal, the phys address width of the processor is known.
>>>
>>> From CPUID I suppose.
>>>
>>>> OVMF does the whole calculation in reverse because there's no way for it
>>>> to know the physical address width of the physical (= host) CPU.
>>>> "Overdoing" the mappings doesn't only waste resources, it breaks hard
>>>> with EPT -- access to a GPA that is inexpressible with the phys address
>>>> width of the host CPU (= not mappable successfully with the nested page
>>>> tables) will behave super bad. I don't recall the exact symptoms, but it
>>>> prevents booting the guest OS.
>>>>
>>>> This is why the most conservative 36-bit width is assumed by default.
>>>
>>> IOW you don't trust virtualized CPUID output?
>>
>> I'm discussing this with Andrew and it appears we're certainly more lax in
>> wiring physical address width into the guest from hardware directly rather
>> than KVM.
>>
>> Another problem that I faced while experimenting is that creating page
>> tables for 46-bits (that CPUID returned in my case) of address space takes
>> about a minute on a modern CPU.
> 
> Even if you enable 1GiB pages?
> 
> (In the libvirt domain XML, it's expressed as
> 
> 
> )
> 
> ... I'm not doubtful, just curious. I guess that, when the physical
> address width is so large, a physical UEFI platform firmware will limit
> itself to a lesser width -- it could even offer some knobs in the setup TUI.

So the problem was not the feature bit, which we expose by default in Xen,
but the OVMF configuration, which had 1G pages disabled. I enabled them and
the guest now boots in reasonable time even with 46 bits.
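
A rough count shows why 1GiB pages make such a difference. The sketch below
counts only the leaf entries of an identity map and assumes 2MiB leaf pages
when 1GiB pages are disabled (an assumption about the OVMF configuration,
not something measured here); the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of leaf page-table entries needed to identity-map 2^addr_bits
 * bytes with pages of 2^page_shift bytes.
 */
uint64_t sketch_leaf_entries(unsigned int addr_bits, unsigned int page_shift)
{
    assert(addr_bits >= page_shift && addr_bits < 64);
    return UINT64_C(1) << (addr_bits - page_shift);
}

/* Bytes occupied by those leaf entries, at 8 bytes per x86-64 entry. */
uint64_t sketch_leaf_table_bytes(unsigned int addr_bits, unsigned int page_shift)
{
    return sketch_leaf_entries(addr_bits, page_shift) * 8;
}
```

For a 46-bit address space this gives 2^25 entries (256MiB of tables) with
2MiB pages (page_shift 21), against 2^16 entries (512KiB) with 1GiB pages
(page_shift 30).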

Given that in Xen we're not that sensitive to the physical address width being
different and prefer to control that at a different level, I'd like to abandon
the ABI change approach (does anyone have any objections?) and instead take the
physical address width directly from CPUID, which we already do in hvmloader.
The change would be local to the Xen platform.
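
For reference, the width is reported in CPUID leaf 0x80000008, EAX bits 7:0.
A minimal sketch of reading it, assuming a GCC or Clang toolchain that
provides <cpuid.h>; the sketch_* names and the 36-bit fallback are
illustrative choices and not the hvmloader or OVMF code:

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Conservative width mentioned earlier in the thread, used as a fallback. */
#define SKETCH_DEFAULT_PHYS_BITS 36u

/* CPUID.80000008h:EAX[7:0] holds the physical address width in bits. */
unsigned int sketch_phys_bits_from_eax(uint32_t eax)
{
    return eax & 0xffu;
}

/* Query the running CPU; fall back when the leaf is unavailable. */
unsigned int sketch_phys_bits(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return sketch_phys_bits_from_eax(eax);
#endif
    return SKETCH_DEFAULT_PHYS_BITS;
}
```

Keeping the bit extraction separate from the CPUID call makes the decoding
easy to check against known EAX values.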

Igor
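
For readers following the thread, reading the physical address width from CPUID can be sketched as below. This is only an illustration, not the hvmloader code: phys_addr_bits(), max_phys_addr() and FALLBACK_PHYS_BITS are hypothetical names, and the 36-bit fallback simply mirrors the conservative default mentioned earlier in the thread. It assumes a GCC- or Clang-style <cpuid.h>.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Assumption: fall back to the conservative 36-bit width from the thread. */
#define FALLBACK_PHYS_BITS 36u

/* Hypothetical helper: physical address width in bits, as seen by this CPU. */
static unsigned int phys_addr_bits(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the requested leaf is not supported. */
    if (__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return eax & 0xffu;    /* EAX[7:0]: physical address bits */
#endif
    return FALLBACK_PHYS_BITS;
}

/* Highest byte address expressible with the given width. */
static uint64_t max_phys_addr(unsigned int bits)
{
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}
```

Inside a guest the value is whatever the hypervisor chooses to expose, which is exactly the trust question debated in this thread.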

Re: [PATCH v4 11/11] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-11 Thread Oleksandr




On 08.01.21 16:46, Rahul Singh wrote:

Hi Rahul


Add support for ARM architected SMMUv3 implementation. It is based on
the Linux SMMUv3 driver.

Driver is currently supported as Tech Preview.

Major differences with regard to Linux driver are as follows:
2. Only Stage-2 translation is supported as compared to the Linux driver
that supports both Stage-1 and Stage-2 translations.
3. Use P2M  page table instead of creating one as SMMUv3 has the
capability to share the page tables with the CPU.
4. Tasklets are used in place of threaded IRQ's in Linux for event queue
and priority queue IRQ handling.
5. Latest version of the Linux SMMUv3 code implements the commands queue
access functions based on atomic operations implemented in Linux.
Atomic functions used by the commands queue access functions are not
implemented in XEN therefore we decided to port the earlier version
of the code. Atomic operations are introduced to fix the bottleneck
of the SMMU command queue insertion operation. A new algorithm for
inserting commands into the queue is introduced, which is lock-free
on the fast-path.
Consequence of reverting the patch is that the command queue
insertion will be slow for large systems as spinlock will be used to
serialize accesses from all CPUs to the single queue supported by
the hardware. Once the proper atomic operations are available in
XEN the driver can be updated.
6. Spin lock is used in place of mutex when attaching a device to the
SMMU, as there is no blocking locks implementation available in XEN.
This might introduce latency in XEN. Need to investigate before
driver is out for tech preview.
7. PCI ATS functionality is not supported, as there is no support
available in XEN to test the functionality. Code is not tested and
compiled. Code is guarded by the flag CONFIG_PCI_ATS.
8. MSI interrupts are not supported as there is no support available in
XEN to request MSI interrupts. Code is not tested and compiled. Code
is guarded by the flag CONFIG_MSI.

Signed-off-by: Rahul Singh 
---
Changes in V3:
- added return statement for readx_poll_timeout function.
- remove iommu_get_dma_cookie and iommu_put_dma_cookie.
- remove struct arm_smmu_xen_device as not required.
- move dt_property_match_string to device_tree.c file.
- replace arm_smmu_*_thread to arm_smmu_*_tasklet to avoid confusion.
- use ARM_SMMU_REG_SZ as size when map memory to XEN.
- remove bypass keyword to make sure when device-tree probe is failed we
   are reporting error and not continuing to configure SMMU in bypass
   mode.
- fixed minor comments.
Changes in V4:
- Fixed typo for CONFIG_MSI
- Added back the mutex code
- Rebase the patch on top of newly added WARN_ON().
- Remove the direct read of register VTCR_EL2.
- Fixed minor comments.
---
  MAINTAINERS   |   6 +
  SUPPORT.md|   1 +
  xen/drivers/passthrough/Kconfig   |  11 +
  xen/drivers/passthrough/arm/Makefile  |   1 +
  xen/drivers/passthrough/arm/smmu-v3.c | 808 ++
  5 files changed, 715 insertions(+), 112 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6dbd99aff4..d832e8fd65 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -249,6 +249,12 @@ F: xen/include/asm-arm/
  F:xen/include/public/arch-arm/
  F:xen/include/public/arch-arm.h
  
+ARM SMMUv3

+M: Bertrand Marquis 
+M: Rahul Singh 
+S: Supported
+F: xen/drivers/passthrough/arm/smmu-v3.c
+
  Change Log
  M:Paul Durrant 
  R:Community Manager 
diff --git a/SUPPORT.md b/SUPPORT.md
index ab02aca5f4..5ee3c8651a 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -67,6 +67,7 @@ For the Cortex A57 r0p0 - r1p1, see Errata 832075.
  Status, Intel VT-d: Supported
  Status, ARM SMMUv1: Supported, not security supported
  Status, ARM SMMUv2: Supported, not security supported
+Status, ARM SMMUv3: Tech Preview
  Status, Renesas IPMMU-VMSA: Supported, not security supported
  
  ### ARM/GICv3 ITS

diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig
index 0036007ec4..341ba92b30 100644
--- a/xen/drivers/passthrough/Kconfig
+++ b/xen/drivers/passthrough/Kconfig
@@ -13,6 +13,17 @@ config ARM_SMMU
  Say Y here if your SoC includes an IOMMU device implementing the
  ARM SMMU architecture.
  
+config ARM_SMMU_V3

+   bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
+   depends on ARM_64
+   ---help---
+Support for implementations of the ARM System MMU architecture
+version 3. Driver is in experimental stage and should not be used in
+production.
+
+Say Y here if your system includes an IOMMU device implementing
+the ARM SMMUv3 architecture.
+
  config IPMMU_VMSA
bool "Renesas IPMMU-VMSA found in R-Car Gen3 SoCs"
depends on ARM_64
diff --git a/xen/drivers/passthrough/arm/Makefile 
b/xen/drivers/passthrou

Re: [PATCH 1/2] sysemu/runstate: Let runstate_is_running() return bool

2021-01-11 Thread David Hildenbrand

On 11.01.21 16:20, Philippe Mathieu-Daudé wrote:
> runstate_check() returns a boolean. runstate_is_running()
> returns what runstate_check() returns, also a boolean.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/sysemu/runstate.h | 2 +-
>  softmmu/runstate.c| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index e557f470d42..3ab35a039a0 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -6,7 +6,7 @@
>  
>  bool runstate_check(RunState state);
>  void runstate_set(RunState new_state);
> -int runstate_is_running(void);
> +bool runstate_is_running(void);
>  bool runstate_needs_reset(void);
>  bool runstate_store(char *str, size_t size);
>  
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index 636aab0addb..c7a67147d17 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -217,7 +217,7 @@ void runstate_set(RunState new_state)
>  current_run_state = new_state;
>  }
>  
> -int runstate_is_running(void)
> +bool runstate_is_running(void)
>  {
>  return runstate_check(RUN_STATE_RUNNING);
>  }
> 

Reviewed-by: David Hildenbrand 

-- 
Thanks,

David / dhildenb
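
A reduced sketch of the pattern the patch touches, for readers without the QEMU tree at hand. The RunState enum here is a two-value stand-in (the real one has many more states) and the bodies are simplified; only the function names and the bool return types come from the quoted diff.

```c
#include <stdbool.h>

/* Assumption: a reduced stand-in for QEMU's RunState enum. */
typedef enum {
    RUN_STATE_PAUSED,
    RUN_STATE_RUNNING,
} RunState;

static RunState current_run_state = RUN_STATE_PAUSED;

/* Returns true when the VM is in the given state. */
static bool runstate_check(RunState state)
{
    return current_run_state == state;
}

static void runstate_set(RunState new_state)
{
    current_run_state = new_state;
}

/* With the patch, the wrapper returns the same type as the function it wraps. */
static bool runstate_is_running(void)
{
    return runstate_check(RUN_STATE_RUNNING);
}
```

Callers that only test the result in a condition are unaffected by the change from int to bool.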

[qemu-mainline test] 158341: regressions - FAIL

2021-01-11 Thread osstest service owner

flight 158341 qemu-mainline real [real]
flight 158363 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/158341/
http://logs.test-lab.xenproject.org/osstest/logs/158363/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 152631
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 152631
 test-armhf-armhf-xl-vhd 17 guest-start/debian.repeat fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop  fail  like 152631
 test-amd64-i386-xl-qemuu-win7-amd64  19 guest-stop  fail  like 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop  fail  like 152631
 test-armhf-armhf-libvirt-raw  15 saverestore-support-check  fail  like 152631
 test-amd64-i386-xl-qemuu-ws16-amd64  19 guest-stop  fail  like 152631
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2  fail  like 152631
 test-armhf-armhf-libvirt  16 saverestore-support-check  fail  like 152631
 test-amd64-i386-xl-pvshim  14 guest-start  fail  never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm  15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx  16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl  16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  15 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd  14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw  14 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt  15 migrate-support-check  fail  never pass

version targeted for testing:
 qemuu                7b09f127738ae3d0e71716cea086fc8f847a5686
baseline version:
 qemuu                1d806cef0e38b5db8347a8e12f214d543204a314

Last test of basis   152631  2020-08-20 09:07:46 Z  144 days
Failing since        152659  2020-08-21 14:07:39 Z  143 days  296 attempts
Testing same since   158291  2021-01-09 02:23:06 Z    2 days    5 attempts


336 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm

Re: [PATCH 5/5] x86/PV32: avoid TLB flushing after mod_l3_entry()

2021-01-11 Thread Roger Pau Monné

On Mon, Jan 11, 2021 at 03:28:23PM +0100, Jan Beulich wrote:
> On 11.01.2021 15:23, Roger Pau Monné wrote:
> > On Tue, Nov 03, 2020 at 11:58:16AM +0100, Jan Beulich wrote:
> >> 32-bit guests may not depend upon the side effect of using ordinary
> >> 4-level paging when running on a 64-bit hypervisor. For L3 entry updates
> >> to take effect, they have to use a CR3 reload. Therefore there's no need
> >> to issue a paging structure invalidating TLB flush in this case.
> > 
> > I assume it's fine for the Xen linear page tables to be likely out of
> > sync during the windows between the entry update and the CR3 reload?
> 
> Yes, because ...
> 
> > I wonder, won't something similar also apply to 64bit and L4 entries?
> 
> ... unlike 64-bit paging, PAE paging special cases the treatment
> of the 4 top level table entries. On bare metal they get loaded
> by the CPU upon CR3 load, not when walking page tables.

I wouldn't mind having this added to the commit message. In any case:

Acked-by: Roger Pau Monné 

Thanks, Roger.

Re: [PATCH v3 05/11] tools/foreignmem: Support querying the size of a resource

2021-01-11 Thread Roger Pau Monné

On Mon, Jan 11, 2021 at 03:26:57PM +, Andrew Cooper wrote:
> With the Xen side of this interface fixed to return real sizes, userspace
> needs to be able to make the query.
> 
> Introduce xenforeignmemory_resource_size() for the purpose, bumping the
> library minor version.
> 
> Update both Linux and FreeBSD's osdep_xenforeignmemory_map_resource() to
> understand size requests, skip the mmap() operation, and copy back the
> nr_frames field.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.

Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

2021-01-11 Thread Laszlo Ersek

On 01/11/21 16:26, Igor Druzhinin wrote:
> On 11/01/2021 15:21, Jan Beulich wrote:
>> On 11.01.2021 15:49, Laszlo Ersek wrote:
>>> On 01/11/21 15:00, Igor Druzhinin wrote:
>>>> On 11/01/2021 09:27, Jan Beulich wrote:
>>>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>>>> We faced a problem with passing through a PCI device with 64GB BAR to
>>>>>> UEFI guest. The BAR is expectedly programmed into 64-bit PCI aperture at
>>>>>> 64G address which pushes physical address space to 37 bits. OVMF uses
>>>>>> address width early in PEI phase to make DXE identity pages covering
>>>>>> the whole addressable space so it needs to know the last address it needs
>>>>>> to cover but at the same time not overdo the mappings.
>>>>>>
>>>>>> As there is seemingly no other way to pass or get this information in
>>>>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet
>>>>>> enumerated,
>>>>>> xenstore is not yet initialized) - extend the info structure with a new
>>>>>> table. Since the structure was initially created to be extendable -
>>>>>> the change is backward compatible.
>>>>>
>>>>> How does UEFI handle the same situation on baremetal? I'd guess it is
>>>>> in even more trouble there, as it couldn't even read addresses from
>>>>> BARs, but would first need to assign them (or at least calculate
>>>>> their intended positions).
>>>>
>>>> Maybe Laszlo or Anthony could answer this question quickly while I'm
>>>> investigating?
>>>
>>> On the bare metal, the phys address width of the processor is known.
>>
>> From CPUID I suppose.
>>
>>> OVMF does the whole calculation in reverse because there's no way for it
>>> to know the physical address width of the physical (= host) CPU.
>>> "Overdoing" the mappings doesn't only waste resources, it breaks hard
>>> with EPT -- access to a GPA that is inexpressible with the phys address
>>> width of the host CPU (= not mappable successfully with the nested page
>>> tables) will behave super bad. I don't recall the exact symptoms, but it
>>> prevents booting the guest OS.
>>>
>>> This is why the most conservative 36-bit width is assumed by default.
>>
>> IOW you don't trust virtualized CPUID output?
> 
> I'm discussing this with Andrew and it appears we're certainly more lax in
> wiring physical address width into the guest from hardware directly rather
> than KVM.
> 
> Another problem that I faced while experimenting is that creating page
> tables for 46-bits (that CPUID returned in my case) of address space takes
> about a minute on a modern CPU.

Even if you enable 1GiB pages?

(In the libvirt domain XML, it's expressed as


)

... I'm not doubtful, just curious. I guess that, when the physical
address width is so large, a physical UEFI platform firmware will limit
itself to a lesser width -- it could even offer some knobs in the setup TUI.

Thanks,
Laszlo


Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

2021-01-11 Thread Jan Beulich

On 11.01.2021 16:26, Igor Druzhinin wrote:
> Another problem that I faced while experimenting is that creating page
> tables for 46-bits (that CPUID returned in my case) of address space takes
> about a minute on a modern CPU.

Which probably isn't fundamentally different from bare metal?

Jan
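
As a rough illustration of why the leaf page size matters so much here, the number of 4 KiB page-table pages needed to identity-map 2^46 bytes can be counted as below. This is a back-of-the-envelope sketch assuming 4-level paging with 512 entries per table; identity_map_table_pages() and tables_needed() are hypothetical names, not OVMF functions.

```c
#include <stdint.h>

/* Number of tables at a level where one table covers 2^table_shift bytes. */
static uint64_t tables_needed(unsigned int addr_bits, unsigned int table_shift)
{
    if (addr_bits <= table_shift)
        return 1;
    return UINT64_C(1) << (addr_bits - table_shift);
}

/*
 * Total page-table pages for an identity map of 2^addr_bits bytes.
 * leaf_shift is 21 for 2 MiB leaf pages and 30 for 1 GiB leaf pages.
 * Assumes 4-level paging (addr_bits <= 48) and 9 address bits per level.
 */
static uint64_t identity_map_table_pages(unsigned int addr_bits,
                                         unsigned int leaf_shift)
{
    uint64_t total = 0;
    unsigned int shift;

    /* A table of leaf entries covers 2^(leaf_shift + 9) bytes; walk up to PML4. */
    for (shift = leaf_shift + 9; shift <= 48; shift += 9)
        total += tables_needed(addr_bits, shift);

    return total;
}
```

For 46 bits this gives 65665 table pages (roughly 256 MiB to fill in) with 2 MiB leaves, against 129 pages (about 516 KiB) with 1 GiB leaves.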

Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

2021-01-11 Thread Laszlo Ersek

On 01/11/21 16:21, Jan Beulich wrote:
> On 11.01.2021 15:49, Laszlo Ersek wrote:
>> On 01/11/21 15:00, Igor Druzhinin wrote:
>>> On 11/01/2021 09:27, Jan Beulich wrote:
>>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>>> We faced a problem with passing through a PCI device with 64GB BAR to
>>>>> UEFI guest. The BAR is expectedly programmed into 64-bit PCI aperture at
>>>>> 64G address which pushes physical address space to 37 bits. OVMF uses
>>>>> address width early in PEI phase to make DXE identity pages covering
>>>>> the whole addressable space so it needs to know the last address it needs
>>>>> to cover but at the same time not overdo the mappings.
>>>>>
>>>>> As there is seemingly no other way to pass or get this information in
>>>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet
>>>>> enumerated,
>>>>> xenstore is not yet initialized) - extend the info structure with a new
>>>>> table. Since the structure was initially created to be extendable -
>>>>> the change is backward compatible.
>>>>
>>>> How does UEFI handle the same situation on baremetal? I'd guess it is
>>>> in even more trouble there, as it couldn't even read addresses from
>>>> BARs, but would first need to assign them (or at least calculate
>>>> their intended positions).
>>>
>>> Maybe Laszlo or Anthony could answer this question quickly while I'm 
>>> investigating?
>>
>> On the bare metal, the phys address width of the processor is known.
> 
> From CPUID I suppose.
> 
>> OVMF does the whole calculation in reverse because there's no way for it
>> to know the physical address width of the physical (= host) CPU.
>> "Overdoing" the mappings doesn't only waste resources, it breaks hard
>> with EPT -- access to a GPA that is inexpressible with the phys address
>> width of the host CPU (= not mappable successfully with the nested page
>> tables) will behave super bad. I don't recall the exact symptoms, but it
>> prevents booting the guest OS.
>>
>> This is why the most conservative 36-bit width is assumed by default.
> 
> IOW you don't trust virtualized CPUID output?

That's correct; it's not trustworthy / reliable.

One of the discussions (of the many) is here:

https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg04716.html

Thanks
Laszlo

[PATCH] xen/privcmd: allow fetching resource sizes

2021-01-11 Thread Roger Pau Monne

Allow issuing an IOCTL_PRIVCMD_MMAP_RESOURCE ioctl with num = 0 and
addr = 0 in order to fetch the size of a specific resource.

Add a shortcut to the default map resource path, since fetching the
size requires no address to be passed in, and thus no VMA to set up.

Fixes: 3ad0876554caf ('xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE')
Signed-off-by: Roger Pau Monné 
---
NB: fetching the size of a resource shouldn't trigger a hypercall
preemption, and hence I've dropped the preempt indications.
---
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Paul Durrant 
Cc: xen-devel@lists.xenproject.org
---
 drivers/xen/privcmd.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index b0c73c58f987..a6e7e6e4286f 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -717,14 +717,15 @@ static long privcmd_ioctl_restrict(struct file *file, void __user *udata)
return 0;
 }
 
-static long privcmd_ioctl_mmap_resource(struct file *file, void __user *udata)
+static long privcmd_ioctl_mmap_resource(struct file *file,
+   struct privcmd_mmap_resource __user *udata)
 {
struct privcmd_data *data = file->private_data;
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
struct privcmd_mmap_resource kdata;
xen_pfn_t *pfns = NULL;
-   struct xen_mem_acquire_resource xdata;
+   struct xen_mem_acquire_resource xdata = { };
int rc;
 
if (copy_from_user(&kdata, udata, sizeof(kdata)))
@@ -734,6 +735,18 @@ static long privcmd_ioctl_mmap_resource(struct file *file, void __user *udata)
if (data->domid != DOMID_INVALID && data->domid != kdata.dom)
return -EPERM;
 
+   xdata.domid = kdata.dom;
+   xdata.type = kdata.type;
+   xdata.id = kdata.id;
+
+   if (!kdata.addr && !kdata.num) {
+   /* Query the size of the resource. */
+   rc = HYPERVISOR_memory_op(XENMEM_acquire_resource, &xdata);
+   if (rc)
+   return rc;
+   return __put_user(xdata.nr_frames, &udata->num);
+   }
+
mmap_write_lock(mm);
 
vma = find_vma(mm, kdata.addr);
@@ -768,10 +781,6 @@ static long privcmd_ioctl_mmap_resource(struct file *file, void __user *udata)
} else
vma->vm_private_data = PRIV_VMA_LOCKED;
 
-   memset(&xdata, 0, sizeof(xdata));
-   xdata.domid = kdata.dom;
-   xdata.type = kdata.type;
-   xdata.id = kdata.id;
xdata.frame = kdata.idx;
xdata.nr_frames = kdata.num;
set_xen_guest_handle(xdata.frame_list, pfns);
-- 
2.29.2
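
The calling convention introduced by the patch (addr = 0 and num = 0 means "tell me the size") can be modelled in plain C as below. This is a userspace model for illustration only: struct resource_req is a stand-in for struct privcmd_mmap_resource, and fake_acquire_resource() and FAKE_NR_FRAMES are hypothetical placeholders for the XENMEM_acquire_resource hypercall and its result.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: the frame count a resource reports in this model. */
#define FAKE_NR_FRAMES 4u

/* Stand-in for the ioctl argument; not the real kernel structure. */
struct resource_req {
    uint16_t dom;
    uint32_t type;
    uint32_t id;
    uint32_t idx;
    uint64_t num;   /* in: frames to map; out on a size query: frames available */
    uint64_t addr;  /* in: address of the VMA to fill, 0 for a size query */
};

/* Hypothetical stub standing in for the hypercall. */
static int fake_acquire_resource(const struct resource_req *req,
                                 uint64_t *nr_frames)
{
    (void)req;
    *nr_frames = FAKE_NR_FRAMES;
    return 0;
}

/* Models the shortcut: a size query never touches any mapping state. */
static int handle_resource_req(struct resource_req *req)
{
    if (!req->addr && !req->num) {
        uint64_t nr_frames;
        int rc = fake_acquire_resource(req, &nr_frames);

        if (rc)
            return rc;
        req->num = nr_frames;   /* copy the size back to the caller */
        return 0;
    }

    /* The mapping path is out of scope for this sketch. */
    return -ENOSYS;
}
```

A caller would zero both fields, issue the request, and then read the frame count back from num before sizing its mapping.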

[PATCH v3 05/11] tools/foreignmem: Support querying the size of a resource

2021-01-11 Thread Andrew Cooper

With the Xen side of this interface fixed to return real sizes, userspace
needs to be able to make the query.

Introduce xenforeignmemory_resource_size() for the purpose, bumping the
library minor version.

Update both Linux and FreeBSD's osdep_xenforeignmemory_map_resource() to
understand size requests, skip the mmap() operation, and copy back the
nr_frames field.

Signed-off-by: Andrew Cooper 
---
CC: Wei Liu 
CC: Paul Durrant 
CC: Roger Pau Monné 
CC: Ian Jackson 
CC: Michał Leszczyński 
CC: Hubert Jasudowicz 
CC: Tamas K Lengyel 

This depends on a bugfix to the Linux IOCTL to understand size requests and
pass them on to Xen.

v3:
 * Rewrite from scratch, to avoid breaking restricted domid situations.  In
   particular, we cannot open a xencall interface and issue blind hypercalls.
---
 tools/include/xenforeignmemory.h | 15 +++
 tools/libs/foreignmemory/Makefile|  2 +-
 tools/libs/foreignmemory/core.c  | 18 ++
 tools/libs/foreignmemory/freebsd.c   | 18 +++---
 tools/libs/foreignmemory/libxenforeignmemory.map |  4 
 tools/libs/foreignmemory/linux.c | 18 +++---
 6 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/tools/include/xenforeignmemory.h b/tools/include/xenforeignmemory.h
index d594be8df0..1ba2f5316b 100644
--- a/tools/include/xenforeignmemory.h
+++ b/tools/include/xenforeignmemory.h
@@ -179,6 +179,21 @@ xenforeignmemory_resource_handle 
*xenforeignmemory_map_resource(
 int xenforeignmemory_unmap_resource(
 xenforeignmemory_handle *fmem, xenforeignmemory_resource_handle *fres);
 
+/**
+ * Determine the maximum size of a specific resource.
+ *
+ * @parm fmem handle to the open foreignmemory interface
+ * @parm domid the domain id
+ * @parm type the resource type
+ * @parm id the type-specific resource identifier
+ *
+ * Return 0 on success and fills in *nr_frames.  Sets errno and return -1 on
+ * error.
+ */
+int xenforeignmemory_resource_size(
+xenforeignmemory_handle *fmem, domid_t domid, unsigned int type,
+unsigned int id, unsigned long *nr_frames);
+
 #endif
 
 /*
diff --git a/tools/libs/foreignmemory/Makefile 
b/tools/libs/foreignmemory/Makefile
index 13850f7988..90d80a49ae 100644
--- a/tools/libs/foreignmemory/Makefile
+++ b/tools/libs/foreignmemory/Makefile
@@ -2,7 +2,7 @@ XEN_ROOT = $(CURDIR)/../../..
 include $(XEN_ROOT)/tools/Rules.mk
 
 MAJOR= 1
-MINOR= 3
+MINOR= 4
 
 SRCS-y += core.c
 SRCS-$(CONFIG_Linux)   += linux.c
diff --git a/tools/libs/foreignmemory/core.c b/tools/libs/foreignmemory/core.c
index 63f12e2450..1e92c567e1 100644
--- a/tools/libs/foreignmemory/core.c
+++ b/tools/libs/foreignmemory/core.c
@@ -188,6 +188,24 @@ int xenforeignmemory_unmap_resource(
 return rc;
 }
 
+int xenforeignmemory_resource_size(
+xenforeignmemory_handle *fmem, domid_t domid, unsigned int type,
+unsigned int id, unsigned long *nr_frames)
+{
+xenforeignmemory_resource_handle fres = {
+.domid = domid,
+.type  = type,
+.id= id,
+};
+int rc = osdep_xenforeignmemory_map_resource(fmem, &fres);
+
+if ( rc )
+return rc;
+
+*nr_frames = fres.nr_frames;
+return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/foreignmemory/freebsd.c 
b/tools/libs/foreignmemory/freebsd.c
index 3d403a7cd0..9a2796f0b7 100644
--- a/tools/libs/foreignmemory/freebsd.c
+++ b/tools/libs/foreignmemory/freebsd.c
@@ -119,6 +119,10 @@ int 
osdep_xenforeignmemory_map_resource(xenforeignmemory_handle *fmem,
 };
 int rc;
 
+if ( !fres->addr && !fres->nr_frames )
+/* Request for resource size.  Skip mmap(). */
+goto skip_mmap;
+
 fres->addr = mmap(fres->addr, fres->nr_frames << PAGE_SHIFT,
   fres->prot, fres->flags | MAP_SHARED, fmem->fd, 0);
 if ( fres->addr == MAP_FAILED )
@@ -126,6 +130,7 @@ int 
osdep_xenforeignmemory_map_resource(xenforeignmemory_handle *fmem,
 
 mr.addr = (uintptr_t)fres->addr;
 
+ skip_mmap:
 rc = ioctl(fmem->fd, IOCTL_PRIVCMD_MMAP_RESOURCE, &mr);
 if ( rc )
 {
@@ -136,13 +141,20 @@ int 
osdep_xenforeignmemory_map_resource(xenforeignmemory_handle *fmem,
 else
 errno = EOPNOTSUPP;
 
-saved_errno = errno;
-osdep_xenforeignmemory_unmap_resource(fmem, fres);
-errno = saved_errno;
+if ( fres->addr )
+{
+saved_errno = errno;
+osdep_xenforeignmemory_unmap_resource(fmem, fres);
+errno = saved_errno;
+}
 
 return -1;
 }
 
+/* If requesting size, copy back. */
+if ( !fres->addr )
+fres->nr_frames = mr.num;
+
 return 0;
 }
 
diff --git a/tools/libs/foreignmemory/libxenforeignmemory.map 
b/tools/libs/foreignmemory/libxenforeignmemory.map
index d5323c87d9..8aca341b99 100644
--- a/tools/libs/foreignmemory/libxenforeignmemory.map
+++ b/tools/libs/foreignmem

Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

2021-01-11 Thread Igor Druzhinin

On 11/01/2021 15:21, Jan Beulich wrote:
> On 11.01.2021 15:49, Laszlo Ersek wrote:
>> On 01/11/21 15:00, Igor Druzhinin wrote:
>>> On 11/01/2021 09:27, Jan Beulich wrote:
>>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>>> We faced a problem with passing through a PCI device with 64GB BAR to
>>>>> UEFI guest. The BAR is expectedly programmed into 64-bit PCI aperture at
>>>>> 64G address which pushes physical address space to 37 bits. OVMF uses
>>>>> address width early in PEI phase to make DXE identity pages covering
>>>>> the whole addressable space so it needs to know the last address it needs
>>>>> to cover but at the same time not overdo the mappings.
>>>>>
>>>>> As there is seemingly no other way to pass or get this information in
>>>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet
>>>>> enumerated,
>>>>> xenstore is not yet initialized) - extend the info structure with a new
>>>>> table. Since the structure was initially created to be extendable -
>>>>> the change is backward compatible.
>>>>
>>>> How does UEFI handle the same situation on baremetal? I'd guess it is
>>>> in even more trouble there, as it couldn't even read addresses from
>>>> BARs, but would first need to assign them (or at least calculate
>>>> their intended positions).
>>>
>>> Maybe Laszlo or Anthony could answer this question quickly while I'm 
>>> investigating?
>>
>> On the bare metal, the phys address width of the processor is known.
> 
> From CPUID I suppose.
> 
>> OVMF does the whole calculation in reverse because there's no way for it
>> to know the physical address width of the physical (= host) CPU.
>> "Overdoing" the mappings doesn't only waste resources, it breaks hard
>> with EPT -- access to a GPA that is inexpressible with the phys address
>> width of the host CPU (= not mappable successfully with the nested page
>> tables) will behave super bad. I don't recall the exact symptoms, but it
>> prevents booting the guest OS.
>>
>> This is why the most conservative 36-bit width is assumed by default.
> 
> IOW you don't trust virtualized CPUID output?

I'm discussing this with Andrew and it appears we're certainly more lax in
wiring physical address width into the guest from hardware directly rather
than KVM.

Another problem that I faced while experimenting is that creating page
tables for 46-bits (that CPUID returned in my case) of address space takes
about a minute on a modern CPU.

Igor

Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

2021-01-11 Thread Jan Beulich

On 11.01.2021 15:49, Laszlo Ersek wrote:
> On 01/11/21 15:00, Igor Druzhinin wrote:
>> On 11/01/2021 09:27, Jan Beulich wrote:
>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>> We faced a problem with passing through a PCI device with 64GB BAR to
>>>> UEFI guest. The BAR is expectedly programmed into 64-bit PCI aperture at
>>>> 64G address which pushes physical address space to 37 bits. OVMF uses
>>>> address width early in PEI phase to make DXE identity pages covering
>>>> the whole addressable space so it needs to know the last address it needs
>>>> to cover but at the same time not overdo the mappings.
>>>>
>>>> As there is seemingly no other way to pass or get this information in
>>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet
>>>> enumerated,
>>>> xenstore is not yet initialized) - extend the info structure with a new
>>>> table. Since the structure was initially created to be extendable -
>>>> the change is backward compatible.
>>>
>>> How does UEFI handle the same situation on baremetal? I'd guess it is
>>> in even more trouble there, as it couldn't even read addresses from
>>> BARs, but would first need to assign them (or at least calculate
>>> their intended positions).
>>
>> Maybe Laszlo or Anthony could answer this question quickly while I'm 
>> investigating?
> 
> On the bare metal, the phys address width of the processor is known.

From CPUID I suppose.

> OVMF does the whole calculation in reverse because there's no way for it
> to know the physical address width of the physical (= host) CPU.
> "Overdoing" the mappings doesn't only waste resources, it breaks hard
> with EPT -- access to a GPA that is inexpressible with the phys address
> width of the host CPU (= not mappable successfully with the nested page
> tables) will behave super bad. I don't recall the exact symptoms, but it
> prevents booting the guest OS.
> 
> This is why the most conservative 36-bit width is assumed by default.

IOW you don't trust virtualized CPUID output?

Jan
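
The "37 bits" figure in the original report follows from simple arithmetic: a 64 GiB BAR placed at the 64 GiB mark ends at 128 GiB = 2^37. A small sketch of that calculation, with addr_width_for_end() as a hypothetical helper name:

```c
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

/*
 * Number of address bits needed to reach every byte below 'end'
 * (an exclusive upper bound). Returns 0 for an empty range.
 */
static unsigned int addr_width_for_end(uint64_t end)
{
    uint64_t last;
    unsigned int bits = 0;

    if (end == 0)
        return 0;

    /* Count the significant bits of the last byte address in the range. */
    for (last = end - 1; last != 0; last >>= 1)
        bits++;

    return bits;
}
```

With base = 64 * GiB and size = 64 * GiB the result is 37, one bit more than the 36-bit default discussed above; anything ending at or below 64 GiB still fits in 36 bits.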

[PATCH 2/2] sysemu: Let VMChangeStateHandler take boolean 'running' argument

2021-01-11 Thread Philippe Mathieu-Daudé

The 'running' argument from VMChangeStateHandler does not require
any value other than 0 / 1. Make it a plain boolean.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/sysemu/runstate.h   | 10 --
 target/arm/kvm_arm.h|  2 +-
 target/ppc/cpu-qom.h|  2 +-
 accel/xen/xen-all.c |  2 +-
 audio/audio.c   |  2 +-
 block/block-backend.c   |  2 +-
 gdbstub.c   |  2 +-
 hw/block/pflash_cfi01.c |  2 +-
 hw/block/virtio-blk.c   |  2 +-
 hw/display/qxl.c|  2 +-
 hw/i386/kvm/clock.c |  2 +-
 hw/i386/kvm/i8254.c |  2 +-
 hw/i386/kvmvapic.c  |  2 +-
 hw/i386/xen/xen-hvm.c   |  2 +-
 hw/ide/core.c   |  2 +-
 hw/intc/arm_gicv3_its_kvm.c |  2 +-
 hw/intc/arm_gicv3_kvm.c |  2 +-
 hw/intc/spapr_xive_kvm.c|  2 +-
 hw/misc/mac_via.c   |  2 +-
 hw/net/e1000e_core.c|  2 +-
 hw/nvram/spapr_nvram.c  |  2 +-
 hw/ppc/ppc.c|  2 +-
 hw/ppc/ppc_booke.c  |  2 +-
 hw/s390x/tod-kvm.c  |  2 +-
 hw/scsi/scsi-bus.c  |  2 +-
 hw/usb/hcd-ehci.c   |  2 +-
 hw/usb/host-libusb.c|  2 +-
 hw/usb/redirect.c   |  2 +-
 hw/vfio/migration.c |  2 +-
 hw/virtio/virtio-rng.c  |  2 +-
 hw/virtio/virtio.c  |  2 +-
 net/net.c   |  2 +-
 softmmu/memory.c|  2 +-
 softmmu/runstate.c  |  2 +-
 target/arm/kvm.c|  2 +-
 target/i386/kvm/kvm.c   |  2 +-
 target/i386/sev.c   |  2 +-
 target/i386/whpx/whpx-all.c |  2 +-
 target/mips/kvm.c   |  4 ++--
 ui/gtk.c|  2 +-
 ui/spice-core.c |  2 +-
 41 files changed, 49 insertions(+), 43 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 3ab35a039a0..a5356915734 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -10,7 +10,7 @@ bool runstate_is_running(void);
 bool runstate_needs_reset(void);
 bool runstate_store(char *str, size_t size);
 
-typedef void VMChangeStateHandler(void *opaque, int running, RunState state);
+typedef void VMChangeStateHandler(void *opaque, bool running, RunState state);
 
 VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
  void *opaque);
@@ -20,7 +20,13 @@ VMChangeStateEntry 
*qdev_add_vm_change_state_handler(DeviceState *dev,
  VMChangeStateHandler *cb,
  void *opaque);
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
-void vm_state_notify(int running, RunState state);
+/**
+ * vm_state_notify: Notify the state of the VM
+ *
+ * @running: whether the VM is running or not.
+ * @state: the #RunState of the VM.
+ */
+void vm_state_notify(bool running, RunState state);
 
 static inline bool shutdown_caused_by_guest(ShutdownCause cause)
 {
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index eb81b7059eb..68ec970c4f4 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -352,7 +352,7 @@ void kvm_arm_get_virtual_time(CPUState *cs);
  */
 void kvm_arm_put_virtual_time(CPUState *cs);
 
-void kvm_arm_vm_state_change(void *opaque, int running, RunState state);
+void kvm_arm_vm_state_change(void *opaque, bool running, RunState state);
 
 int kvm_arm_vgic_probe(void);
 
diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
index 63b9e8632ca..118baf8d41f 100644
--- a/target/ppc/cpu-qom.h
+++ b/target/ppc/cpu-qom.h
@@ -218,7 +218,7 @@ extern const VMStateDescription vmstate_ppc_timebase;
 .offset = vmstate_offset_value(_state, _field, PPCTimebase),  \
 }
 
-void cpu_ppc_clock_vm_state_change(void *opaque, int running,
+void cpu_ppc_clock_vm_state_change(void *opaque, bool running,
RunState state);
 #endif
 
diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
index 878a4089d97..3756aca27be 100644
--- a/accel/xen/xen-all.c
+++ b/accel/xen/xen-all.c
@@ -122,7 +122,7 @@ static void xenstore_record_dm_state(struct xs_handle *xs, const char *state)
 }
 
 
-static void xen_change_state_handler(void *opaque, int running,
+static void xen_change_state_handler(void *opaque, bool running,
  RunState state)
 {
 if (running) {
diff --git a/audio/audio.c b/audio/audio.c
index b48471bb3f6..f2d56e7e57d 100644
--- a/audio/audio.c
+++ b/audio/audio.c
@@ -1549,7 +1549,7 @@ static int audio_driver_init(AudioState *s, struct audio_driver *drv,
 }
 }
 
-static void audio_vm_change_state_handler (void *opaque, int running,
+static void audio_vm_change_state_handler (void *opaque, bool running,
RunState state)
 {
 AudioState *s = opaque;
diff --git a/block/block-backend.c b/block/block-backend.c
index ce78d30794a..9175eb237a2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -163

[PATCH 1/2] sysemu/runstate: Let runstate_is_running() return bool

2021-01-11 Thread Philippe Mathieu-Daudé

runstate_check() returns a boolean. runstate_is_running()
returns what runstate_check() returns, also a boolean.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/sysemu/runstate.h | 2 +-
 softmmu/runstate.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index e557f470d42..3ab35a039a0 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -6,7 +6,7 @@
 
 bool runstate_check(RunState state);
 void runstate_set(RunState new_state);
-int runstate_is_running(void);
+bool runstate_is_running(void);
 bool runstate_needs_reset(void);
 bool runstate_store(char *str, size_t size);
 
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 636aab0addb..c7a67147d17 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -217,7 +217,7 @@ void runstate_set(RunState new_state)
 current_run_state = new_state;
 }
 
-int runstate_is_running(void)
+bool runstate_is_running(void)
 {
 return runstate_check(RUN_STATE_RUNNING);
 }
-- 
2.26.2

[PATCH 0/2] sysemu: Let VMChangeStateHandler take boolean 'running' argument

2021-01-11 Thread Philippe Mathieu-Daudé

Trivial prototype change to clarify the use of the 'running'
argument of VMChangeStateHandler.

Green CI:
https://gitlab.com/philmd/qemu/-/pipelines/239497352

Philippe Mathieu-Daudé (2):
  sysemu/runstate: Let runstate_is_running() return bool
  sysemu: Let VMChangeStateHandler take boolean 'running' argument

 include/sysemu/runstate.h   | 12 +---
 target/arm/kvm_arm.h|  2 +-
 target/ppc/cpu-qom.h|  2 +-
 accel/xen/xen-all.c |  2 +-
 audio/audio.c   |  2 +-
 block/block-backend.c   |  2 +-
 gdbstub.c   |  2 +-
 hw/block/pflash_cfi01.c |  2 +-
 hw/block/virtio-blk.c   |  2 +-
 hw/display/qxl.c|  2 +-
 hw/i386/kvm/clock.c |  2 +-
 hw/i386/kvm/i8254.c |  2 +-
 hw/i386/kvmvapic.c  |  2 +-
 hw/i386/xen/xen-hvm.c   |  2 +-
 hw/ide/core.c   |  2 +-
 hw/intc/arm_gicv3_its_kvm.c |  2 +-
 hw/intc/arm_gicv3_kvm.c |  2 +-
 hw/intc/spapr_xive_kvm.c|  2 +-
 hw/misc/mac_via.c   |  2 +-
 hw/net/e1000e_core.c|  2 +-
 hw/nvram/spapr_nvram.c  |  2 +-
 hw/ppc/ppc.c|  2 +-
 hw/ppc/ppc_booke.c  |  2 +-
 hw/s390x/tod-kvm.c  |  2 +-
 hw/scsi/scsi-bus.c  |  2 +-
 hw/usb/hcd-ehci.c   |  2 +-
 hw/usb/host-libusb.c|  2 +-
 hw/usb/redirect.c   |  2 +-
 hw/vfio/migration.c |  2 +-
 hw/virtio/virtio-rng.c  |  2 +-
 hw/virtio/virtio.c  |  2 +-
 net/net.c   |  2 +-
 softmmu/memory.c|  2 +-
 softmmu/runstate.c  |  4 ++--
 target/arm/kvm.c|  2 +-
 target/i386/kvm/kvm.c   |  2 +-
 target/i386/sev.c   |  2 +-
 target/i386/whpx/whpx-all.c |  2 +-
 target/mips/kvm.c   |  4 ++--
 ui/gtk.c|  2 +-
 ui/spice-core.c |  2 +-
 41 files changed, 51 insertions(+), 45 deletions(-)

-- 
2.26.2
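
As a rough illustration of what the series means for callback authors, the sketch below shows a handler written against the new prototype. The RunState enum here is a cut-down stand-in, and DemoDevice and demo_state_change are hypothetical names, not code from the series.

```c
#include <stdbool.h>

/* Cut-down stand-in for QEMU's RunState; the real enum has more members. */
typedef enum { RUN_STATE_PAUSED, RUN_STATE_RUNNING } RunState;

/* Prototype as changed by patch 2/2: 'running' is now a bool. */
typedef void VMChangeStateHandler(void *opaque, bool running, RunState state);

/* Hypothetical device state, used only for this example. */
typedef struct DemoDevice {
    bool clock_enabled;
    unsigned transitions;
} DemoDevice;

/* Hypothetical handler: start or stop the device clock together with the VM. */
static void demo_state_change(void *opaque, bool running, RunState state)
{
    DemoDevice *dev = opaque;

    (void)state;                  /* not needed in this sketch */
    dev->clock_enabled = running; /* no int-to-bool conversion required */
    dev->transitions++;
}
```

Because the function matches VMChangeStateHandler exactly, it can be assigned to a VMChangeStateHandler pointer without a cast, which is the property the prototype change relies on across the 41 touched files.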

Re: [PATCH] tools/libxenstat: ensure strnlen() declaration is visible

2021-01-11 Thread Ian Jackson

Jan Beulich writes ("[PATCH] tools/libxenstat: ensure strnlen() declaration is 
visible"):
> Its guard was updated such that it is visible by default when POSIX 2008
> was adopted by glibc. It's not visible by default on older glibc.
> 
> Fixes: 40fe714ca424 ("tools/libs/stat: use memcpy instead of strncpy in 
> getBridge")
> Signed-off-by: Jan Beulich 

Reviewed-by: Ian Jackson

Re: [PATCH v2 05/11] tools/foreignmem: Support querying the size of a resource

2021-01-11 Thread Andrew Cooper

On 11/01/2021 10:50, Roger Pau Monné wrote:
> On Fri, Jan 08, 2021 at 05:52:36PM +, Andrew Cooper wrote:
>> On 22/09/2020 19:24, Andrew Cooper wrote:
>>> diff --git a/tools/libs/foreignmemory/linux.c 
>>> b/tools/libs/foreignmemory/linux.c
>>> index fe73d5ab72..eec089e232 100644
>>> --- a/tools/libs/foreignmemory/linux.c
>>> +++ b/tools/libs/foreignmemory/linux.c
>>> @@ -339,6 +342,39 @@ int osdep_xenforeignmemory_map_resource(
>>>  return 0;
>>>  }
>>>  
>>> +int osdep_xenforeignmemory_resource_size(
>>> +xenforeignmemory_handle *fmem, domid_t domid, unsigned int type,
>>> +unsigned int id, unsigned long *nr_frames)
>>> +{
>>> +int rc;
>>> +struct xen_mem_acquire_resource *xmar =
>>> +xencall_alloc_buffer(fmem->xcall, sizeof(*xmar));
>>> +
>>> +if ( !xmar )
>>> +{
>>> +PERROR("Could not bounce memory for acquire_resource hypercall");
>>> +return -1;
>>> +}
>>> +
>>> +*xmar = (struct xen_mem_acquire_resource){
>>> +.domid = domid,
>>> +.type = type,
>>> +.id = id,
>>> +};
>>> +
>>> +rc = xencall2(fmem->xcall, __HYPERVISOR_memory_op,
>>> +  XENMEM_acquire_resource, (uintptr_t)xmar);
>>> +if ( rc )
>>> +goto out;
>>> +
>>> +*nr_frames = xmar->nr_frames;
>>> +
>>> + out:
>>> +xencall_free_buffer(fmem->xcall, xmar);
>>> +
>>> +return rc;
>>> +}
>> Having talked this through with Roger, it's broken.
>>
>> In the meantime, foreignmem has gained acquire_resource on FreeBSD.
>> Nothing in this osdep function is linux-specific, so it oughtn't to be
>> osdep.
>>
>> However, it's also not permitted to make hypercalls like this in
>> restricted mode, and that isn't something we should be breaking. 
>> Amongst other things, it will prevent us from supporting >128 cpus, as
>> Qemu needs updating to use this interface in due course.
>>
>> The only solution (which keeps restricted mode working) is to fix
>> Linux's ioctl() to be able to understand size requests.  This also
>> avoids foreignmem needing to open a xencall handle which was fugly in
>> the first place.
> I think the following patch should allow you to fetch the resource
> size from Linux privcmd driver by doing an ioctl with addr = 0 and num
> = 0. I've just build tested it, but I haven't tried exercising the
> code.
>
> Roger.
> ---8<---
> From 5d717c7b9ad3561ed0b17e7c5cf76b7c9fb536db Mon Sep 17 00:00:00 2001
> From: Roger Pau Monne 
> Date: Mon, 11 Jan 2021 10:38:59 +0100
> Subject: [PATCH] xen/privcmd: allow fetching resource sizes
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Allow issuing an IOCTL_PRIVCMD_MMAP_RESOURCE ioctl with num = 0 and
> addr = 0 in order to fetch the size of a specific resource.
>
> Add a shortcut to the default map resource path, since fetching the
> size requires no address to be passed in, and thus no VMA to setup.
>
> Signed-off-by: Roger Pau Monné 

Tested-by: Andrew Cooper 

> ---
> NB: fetching the size of a resource shouldn't trigger a hypercall
> preemption, and hence I've dropped the preempt indications.

Yeah - that's fine.  Querying the size isn't ever going to turn into a
long running operation from the guest's point of view.

I'll submit the matching patch for libxenforeignmem.

~Andrew
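
For readers following along, the size-query convention described above (addr = 0 and num = 0) might look roughly like this from the caller's side. This is only a sketch: struct resource_req is a hypothetical mirror of the ioctl argument, the ioctl itself is replaced by a function pointer so the example builds without Xen or Linux headers, and reporting the frame count back through num is an assumption that the thread does not state.

```c
#include <stdint.h>

/* Hypothetical mirror of the IOCTL_PRIVCMD_MMAP_RESOURCE argument;
 * the real definition lives in the Linux privcmd uapi header. */
struct resource_req {
    uint16_t dom;
    uint32_t type;
    uint32_t id;
    uint32_t idx;
    uint64_t num;
    uint64_t addr;
};

/* Stand-in for ioctl(fd, IOCTL_PRIVCMD_MMAP_RESOURCE, req). */
typedef int (*resource_ioctl_fn)(int fd, struct resource_req *req);

/* Ask for the size of a resource by passing num = 0 and addr = 0. */
static int query_resource_size(resource_ioctl_fn do_ioctl, int fd,
                               uint16_t dom, uint32_t type, uint32_t id,
                               uint64_t *nr_frames)
{
    struct resource_req req = {
        .dom = dom, .type = type, .id = id,
        .num = 0, .addr = 0,          /* the "size request" form */
    };
    int rc = do_ioctl(fd, &req);

    if (rc)
        return rc;
    *nr_frames = req.num;  /* ASSUMPTION: count is returned through num */
    return 0;
}

/* Fake backend for demonstration: every resource is 4 frames. */
static int fake_resource_ioctl(int fd, struct resource_req *req)
{
    (void)fd;
    if (req->num != 0 || req->addr != 0)
        return -1;         /* only the size-query form is modelled */
    req->num = 4;
    return 0;
}
```

Keeping the query on the existing ioctl, rather than issuing a hypercall through a separate xencall handle, is what lets the restricted-mode case keep working as discussed above.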

Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

2021-01-11 Thread Laszlo Ersek

On 01/11/21 15:00, Igor Druzhinin wrote:
> On 11/01/2021 09:27, Jan Beulich wrote:
>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>> We faced a problem with passing through a PCI device with 64GB BAR to
>>> UEFI guest. The BAR is expectedly programmed into 64-bit PCI aperture at
>>> 64G address which pushes physical address space to 37 bits. OVMF uses
>>> address width early in PEI phase to make DXE identity pages covering
>>> the whole addressable space so it needs to know the last address it needs
>>> to cover but at the same time not overdo the mappings.
>>>
>>> As there is seemingly no other way to pass or get this information in
>>> OVMF at this early phase (ACPI is not yet available, PCI is not yet 
>>> enumerated,
>>> xenstore is not yet initialized) - extend the info structure with a new
>>> table. Since the structure was initially created to be extendable -
>>> the change is backward compatible.
>>
>> How does UEFI handle the same situation on baremetal? I'd guess it is
>> in even more trouble there, as it couldn't even read addresses from
>> BARs, but would first need to assign them (or at least calculate
>> their intended positions).
> 
> Maybe Laszlo or Anthony could answer this question quickly while I'm 
> investigating?

On the bare metal, the phys address width of the processor is known.

OVMF does the whole calculation in reverse because there's no way for it
to know the physical address width of the physical (= host) CPU.
"Overdoing" the mappings doesn't only waste resources, it breaks hard
with EPT -- access to a GPA that is inexpressible with the phys address
width of the host CPU (= not mappable successfully with the nested page
tables) will behave super bad. I don't recall the exact symptoms, but it
prevents booting the guest OS.

This is why the most conservative 36-bit width is assumed by default.
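
To make the arithmetic concrete: a 64GB BAR placed at 64G occupies [2^36, 2^37), so 37 address bits are needed, one more than the default. The sketch below computes that from the end of the high aperture. The function names and the DEFAULT_PHYS_BITS macro are hypothetical and this is not OVMF's actual code.

```c
#include <stdint.h>

/* Conservative width assumed when nothing larger is required. */
#define DEFAULT_PHYS_BITS 36u

/* Smallest number of address bits that can express every byte below
 * 'end' (exclusive). Returns 0 for an empty range. */
static unsigned bits_to_cover(uint64_t end)
{
    unsigned bits = 0;
    uint64_t last;

    if (end == 0)
        return 0;
    for (last = end - 1; last != 0; last >>= 1)
        bits++;
    return bits;
}

/* Width to use given the end of the 64-bit PCI aperture: never below
 * the default, and never wider than the aperture actually requires. */
static unsigned phys_bits_for(uint64_t hi_end)
{
    unsigned need = bits_to_cover(hi_end);

    return need > DEFAULT_PHYS_BITS ? need : DEFAULT_PHYS_BITS;
}
```

With hi_end = 1ULL << 37 this yields 37, which covers the BAR without mapping addresses beyond what the aperture requires.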

> 
>>> --- a/tools/firmware/hvmloader/ovmf.c
>>> +++ b/tools/firmware/hvmloader/ovmf.c
>>> @@ -61,6 +61,14 @@ struct ovmf_info {
>>>  uint32_t e820_nr;
>>>  } __attribute__ ((packed));
>>>  
>>> +#define OVMF_INFO_PCI_TABLE 0
>>> +struct ovmf_pci_info {
>>> +uint64_t low_start;
>>> +uint64_t low_end;
>>> +uint64_t hi_start;
>>> +uint64_t hi_end;
>>> +} __attribute__ ((packed));
>>
>> Forming part of ABI, I believe this belongs in a public header,
>> which consumers could at least in principle use verbatim if
>> they wanted to.

(In OVMF I strongly prefer hand-coded structures, due to the particular
coding style edk2 employs. Although Xen headers have been imported and
fixed up in the past, and so further importing would not be without
precedent for Xen in OVMF, those imported headers continue to stick out
like a sore thumb, due to their different coding style. That's not to
say the Xen coding style is "wrong" or anything; just that esp. when
those structs are *used* in code, they look quite out of place.)

Thanks,
Laszlo

> 
> It probably does, but if we'd want to move all of hand-over structures
> wholesale that would include seabios as well. I'd stick with the current
> approach to avoid code churn in various repos. Besides the structures
> are not the only bits of ABI that are implicitly shared with BIOS images.
> 
>>> @@ -74,9 +82,21 @@ static void ovmf_setup_bios_info(void)
>>>  static void ovmf_finish_bios_info(void)
>>>  {
>>>  struct ovmf_info *info = (void *)OVMF_INFO_PHYSICAL_ADDRESS;
>>> +struct ovmf_pci_info *pci_info;
>>> +uint64_t *tables = 
>>> scratch_alloc(sizeof(uint64_t)*OVMF_INFO_MAX_TABLES, 0);
>>
>> I wasn't able to locate OVMF_INFO_MAX_TABLES in either
>> xen/include/public/ or tools/firmware/. Where does it get
>> defined?
> 
> I expect it to be unlimited from OVMF side. It just expects an array of 
> tables_nr elements.
> 
>> Also (nit) missing blanks around * .
>>
>>>  uint32_t i;
>>>  uint8_t checksum;
>>>  
>>> +pci_info = scratch_alloc(sizeof(struct ovmf_pci_info), 0);
>>
>> Is "scratch" correct here and above? I guess intended usage /
>> scope will want spelling out somewhere.
> 
> Again, scratch_alloc is used universally for handing over info between 
> hvmloader
> and BIOS images. Where would you want it to be spelled out?
> 
>>> +pci_info->low_start = pci_mem_start;
>>> +pci_info->low_end = pci_mem_end;
>>> +pci_info->hi_start = pci_hi_mem_start;
>>> +pci_info->hi_end = pci_hi_mem_end;
>>> +
>>> +tables[OVMF_INFO_PCI_TABLE] = (uint32_t)pci_info;
>>> +info->tables = (uint32_t)tables;
>>> +info->tables_nr = 1;
>>
>> In how far is this problem (and hence solution / workaround) OVMF
>> specific? IOW don't we need a more generic approach here?
> 
> I believe it's very OVMF specific given only OVMF constructs identity page
> tables for the whole address space - that's how it was designed. Seabios to
> the best of my knowledge only has access to lower 4G.
> 
> Igor
>

Re: [PATCH] hvmloader: pass PCI MMIO layout to OVMF as an info table

2021-01-11 Thread Igor Druzhinin

On 11/01/2021 14:14, Jan Beulich wrote:
> On 11.01.2021 15:00, Igor Druzhinin wrote:
>> On 11/01/2021 09:27, Jan Beulich wrote:
>>> On 11.01.2021 05:53, Igor Druzhinin wrote:
>>>> --- a/tools/firmware/hvmloader/ovmf.c
>>>> +++ b/tools/firmware/hvmloader/ovmf.c
>>>> @@ -61,6 +61,14 @@ struct ovmf_info {
>>>>  uint32_t e820_nr;
>>>>  } __attribute__ ((packed));
>>>>  
>>>> +#define OVMF_INFO_PCI_TABLE 0
>>>> +struct ovmf_pci_info {
>>>> +uint64_t low_start;
>>>> +uint64_t low_end;
>>>> +uint64_t hi_start;
>>>> +uint64_t hi_end;
>>>> +} __attribute__ ((packed));
>>>
>>> Forming part of ABI, I believe this belongs in a public header,
>>> which consumers could at least in principle use verbatim if
>>> they wanted to.
>>
>> It probably does, but if we'd want to move all of hand-over structures
>> wholesale that would include seabios as well. I'd stick with the current
>> approach to avoid code churn in various repos. Besides the structures
>> are not the only bits of ABI that are implicitly shared with BIOS images.
> 
> Well, so be it then for the time being. I'm going to be
> hesitant though ack-ing such, no matter that there are (bad)
> precedents. What I'd like to ask for as a minimum is to have
> a comment here clarifying this struct can't be changed
> arbitrarily because of being part of an ABI.

Ok, I will improve information in comments in an additional commit.
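
One possible shape for such a comment, paired with compile-time checks so that accidental layout changes fail the build, is sketched here. The comment wording and the assertions are illustrative only and are not taken from the eventual follow-up commit.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Part of the hvmloader -> OVMF hand-over ABI. The layout is consumed
 * by firmware built from a different tree, so fields must not be
 * reordered, resized or removed.
 */
struct ovmf_pci_info {
    uint64_t low_start;
    uint64_t low_end;
    uint64_t hi_start;
    uint64_t hi_end;
} __attribute__ ((packed));

/* Illustrative guards: break the build if the layout drifts. */
_Static_assert(sizeof(struct ovmf_pci_info) == 32,
               "ovmf_pci_info is ABI: size must stay 32 bytes");
_Static_assert(offsetof(struct ovmf_pci_info, hi_end) == 24,
               "ovmf_pci_info is ABI: hi_end must stay at offset 24");
```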

>>>> @@ -74,9 +82,21 @@ static void ovmf_setup_bios_info(void)
>>>>  static void ovmf_finish_bios_info(void)
>>>>  {
>>>>  struct ovmf_info *info = (void *)OVMF_INFO_PHYSICAL_ADDRESS;
>>>> +struct ovmf_pci_info *pci_info;
>>>> +uint64_t *tables = 
>>>> scratch_alloc(sizeof(uint64_t)*OVMF_INFO_MAX_TABLES, 0);
>>>
>>> I wasn't able to locate OVMF_INFO_MAX_TABLES in either
>>> xen/include/public/ or tools/firmware/. Where does it get
>>> defined?
>>
>> I expect it to be unlimited from OVMF side. It just expects an array of 
>> tables_nr elements.
> 
> That wasn't the (primary) question. Me not being able to locate
> the place where this constant gets #define-d means I wonder how
> this code builds.

It's right up there in the same file.

>>> Also (nit) missing blanks around * .
>>>
>>>>  uint32_t i;
>>>>  uint8_t checksum;
>>>>  
>>>> +pci_info = scratch_alloc(sizeof(struct ovmf_pci_info), 0);
>>>
>>> Is "scratch" correct here and above? I guess intended usage /
>>> scope will want spelling out somewhere.
>>
>> Again, scratch_alloc is used universally for handing over info between 
>> hvmloader
>> and BIOS images. Where would you want it to be spelled out?
> 
> Next to where all the involved structures get declared.
> Consumers need to be aware they may need to take precautions to
> avoid clobbering the contents before consuming it. But as per
> above there doesn't look to be such a central place (yet).

I will duplicate the comments for now in all places involved.
The struct checksum I believe serves exactly the purpose you described -
to catch that sort of bugs early.
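
As a sketch of how a checksum catches a clobbered hand-over area, the example below assumes the common convention that all bytes of the structure, checksum byte included, sum to zero modulo 256. The thread does not spell out the algorithm, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Value to store in the checksum byte so that all bytes sum to zero.
 * The checksum byte itself must be 0 while this is computed. */
static uint8_t checksum_fixup(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return (uint8_t)-sum;
}

/* Consumer-side check: non-zero if the bytes still sum to zero. */
static int checksum_ok(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum == 0;
}
```

A consumer that runs the check before reading the structure detects any single corrupted byte. It covers only the bytes it is computed over, so separately allocated tables would need their own protection.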

>>>> +pci_info->low_start = pci_mem_start;
>>>> +pci_info->low_end = pci_mem_end;
>>>> +pci_info->hi_start = pci_hi_mem_start;
>>>> +pci_info->hi_end = pci_hi_mem_end;
>>>> +
>>>> +tables[OVMF_INFO_PCI_TABLE] = (uint32_t)pci_info;
>>>> +info->tables = (uint32_t)tables;
>>>> +info->tables_nr = 1;
>>>
>>> In how far is this problem (and hence solution / workaround) OVMF
>>> specific? IOW don't we need a more generic approach here?
>>
>> I believe it's very OVMF specific given only OVMF constructs identity page
>> tables for the whole address space - that's how it was designed. Seabios to
>> the best of my knowledge only has access to lower 4G.
> 
> Quite likely, yet how would SeaBIOS access such a huge frame
> buffer then? They can't possibly place it below 4G. Do systems
> with such video cards get penalized by e.g. not surfacing VESA
> mode changing functionality?

Yes, VESA FB pointer is 32 bit only.
In my experience the framebuffer itself is located in a separate, smaller BAR
on real cards. That usually makes it land below 4G, which masks the problem
in most scenarios.

Igor
