date:20200505

Re: Troubles running Xen on Raspberry Pi 4 with 5.6.1 DomU

2020-05-05 Thread Jürgen Groß


On 06.05.20 00:34, Stefano Stabellini wrote:

+ Boris, Jürgen

See the crash Roman is seeing booting dom0 on the Raspberry Pi. It is
related to the recent xen dma_ops changes. Possibly the same thing
reported by Peng here:

https://marc.info/?l=linux-kernel=158805976230485=2

An in-depth analysis below.


On Mon, 4 May 2020, Roman Shaposhnik wrote:

[2.534292] Unable to handle kernel paging request at virtual
address 0026c340
[2.542373] Mem abort info:
[2.545257]   ESR = 0x9604
[2.548421]   EC = 0x25: DABT (current EL), IL = 32 bits
[2.553877]   SET = 0, FnV = 0
[2.557023]   EA = 0, S1PTW = 0
[2.560297] Data abort info:
[2.563258]   ISV = 0, ISS = 0x0004
[2.567208]   CM = 0, WnR = 0
[2.570294] [0026c340] user address but active_mm is swapper
[2.576783] Internal error: Oops: 9604 [#1] SMP
[2.581784] Modules linked in:
[2.584950] CPU: 3 PID: 135 Comm: kworker/3:1 Not tainted 5.6.1-default #9
[2.591970] Hardware name: Raspberry Pi 4 Model B (DT)
[2.597256] Workqueue: events deferred_probe_work_func
[2.602509] pstate: 6005 (nZCv daif -PAN -UAO)
[2.607431] pc : xen_swiotlb_free_coherent+0x198/0x1d8
[2.612696] lr : dma_free_attrs+0x98/0xd0
[2.616827] sp : 800011db3970
[2.620242] x29: 800011db3970 x28: 000f7b00
[2.625695] x27: 1000 x26: 37b68410
[2.631129] x25: 0001 x24: f7b0
[2.636583] x23:  x22: 
[2.642017] x21: 800011b0d000 x20: 80001179b548
[2.647461] x19: 37b68410 x18: 0010
[2.652905] x17: 35d66a00 x16: deadbeef
[2.658348] x15:  x14: 80001179b548
[2.663792] x13: 800091db37b7 x12: 800011db37bf
[2.669236] x11: 8000117c7000 x10: 800011db3740
[2.674680] x9 : ffd0 x8 : 80001071e980
[2.680124] x7 : 0132 x6 : 80001197a6ab
[2.685568] x5 :  x4 : 
[2.691012] x3 : f7b0 x2 : fde0
[2.696465] x1 : 0026c340 x0 : 0246c340
[2.701899] Call trace:
[2.704452]  xen_swiotlb_free_coherent+0x198/0x1d8
[2.709367]  dma_free_attrs+0x98/0xd0
[2.713143]  rpi_firmware_property_list+0x1e4/0x240
[2.718146]  rpi_firmware_property+0x6c/0xb0
[2.722535]  rpi_firmware_probe+0xf0/0x1e0
[2.726760]  platform_drv_probe+0x50/0xa0
[2.730879]  really_probe+0xd8/0x438
[2.734567]  driver_probe_device+0xdc/0x130
[2.738870]  __device_attach_driver+0x88/0x108
[2.743434]  bus_for_each_drv+0x78/0xc8
[2.747386]  __device_attach+0xd4/0x158
[2.751337]  device_initial_probe+0x10/0x18
[2.755649]  bus_probe_device+0x90/0x98
[2.759590]  deferred_probe_work_func+0x88/0xd8
[2.764244]  process_one_work+0x1f0/0x3c0
[2.768369]  worker_thread+0x138/0x570
[2.772234]  kthread+0x118/0x120
[2.775571]  ret_from_fork+0x10/0x18
[2.779263] Code: d34cfc00 f2dfbfe2 d37ae400 8b020001 (f8626800)
[2.785492] ---[ end trace 4c435212e349f45f ]---
[2.793340] usb 1-1: New USB device found, idVendor=2109,
idProduct=3431, bcdDevice= 4.20
[2.801038] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[2.808297] usb 1-1: Product: USB2.0 Hub
[2.813710] hub 1-1:1.0: USB hub found
[2.817117] hub 1-1:1.0: 4 ports detected

This is bailing out right here:
  
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/firmware/raspberrypi.c?h=v5.6.1#n125

FYIW (since I modified the source to actually print what was returned
right before it bails) we get:
buf[1] == 0x80004
buf[2] == 0x0001

Status 0x80004 is of course suspicious since it is not even listed here:
 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/soc/bcm2835/raspberrypi-firmware.h#n14

So it appears that this DMA request path is somehow busted and it
would be really nice to figure out why.


You have actually discovered a genuine bug in the recent xen dma rework
in Linux. Congrats :-)


Nice! ;-)


I am doing some guesswork here, but from what I read in the thread and
the information in this email I think this patch might fix the issue.
If it doesn't fix the issue please add a few printks in
drivers/xen/swiotlb-xen.c:xen_swiotlb_free_coherent and please let me
know where exactly it crashes.


diff --git a/include/xen/arm/page-coherent.h b/include/xen/arm/page-coherent.h
index b9cc11e887ed..ff4677ed9788 100644
--- a/include/xen/arm/page-coherent.h
+++ b/include/xen/arm/page-coherent.h
@@ -8,12 +8,17 @@
  static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t 
size,
 dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
  {
+   void *cpu_addr;
+   if (dma_alloc_from_dev_coherent(hwdev, size, dma_handle, _addr))
+   return cpu_addr;
 return

Re: [PATCH] xenbus: avoid stack overflow warning

2020-05-05 Thread Jürgen Groß


On 05.05.20 22:57, Arnd Bergmann wrote:

On Tue, May 5, 2020 at 6:02 PM Jürgen Groß  wrote:

On 05.05.20 17:01, Arnd Bergmann wrote:

On Tue, May 5, 2020 at 4:34 PM Jürgen Groß  wrote:

On 05.05.20 16:15, Arnd Bergmann wrote:


I considered that as well, and don't really mind either way. I think it does
get a bit ugly whatever we do. If you prefer the union, I can respin the
patch that way.


Hmm, thinking more about it I think the real clean solution would be to
extend struct map_ring_valloc_hvm to cover the pv case, too, to add the
map and unmap arrays (possibly as a union) to it and to allocate it
dynamically instead of having it on the stack.

Would you be fine doing this?


This is a little more complex than I'd want to do without doing any testing
(and no, I don't want to do the testing either) ;-)

It does sound like a better approach though.


I take this as you are fine with me writing the patch and adding you as
"Reported-by:"?


Juergen

Re: Troubles running Xen on Raspberry Pi 4 with 5.6.1 DomU

2020-05-05 Thread Roman Shaposhnik

On Tue, May 5, 2020 at 3:34 PM Stefano Stabellini
 wrote:
>
> + Boris, Jürgen
>
> See the crash Roman is seeing booting dom0 on the Raspberry Pi. It is
> related to the recent xen dma_ops changes. Possibly the same thing
> reported by Peng here:
>
> https://marc.info/?l=linux-kernel=158805976230485=2
>
> An in-depth analysis below.
>
>
> On Mon, 4 May 2020, Roman Shaposhnik wrote:
> > > > [2.534292] Unable to handle kernel paging request at virtual
> > > > address 0026c340
> > > > [2.542373] Mem abort info:
> > > > [2.545257]   ESR = 0x9604
> > > > [2.548421]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > [2.553877]   SET = 0, FnV = 0
> > > > [2.557023]   EA = 0, S1PTW = 0
> > > > [2.560297] Data abort info:
> > > > [2.563258]   ISV = 0, ISS = 0x0004
> > > > [2.567208]   CM = 0, WnR = 0
> > > > [2.570294] [0026c340] user address but active_mm is swapper
> > > > [2.576783] Internal error: Oops: 9604 [#1] SMP
> > > > [2.581784] Modules linked in:
> > > > [2.584950] CPU: 3 PID: 135 Comm: kworker/3:1 Not tainted 
> > > > 5.6.1-default #9
> > > > [2.591970] Hardware name: Raspberry Pi 4 Model B (DT)
> > > > [2.597256] Workqueue: events deferred_probe_work_func
> > > > [2.602509] pstate: 6005 (nZCv daif -PAN -UAO)
> > > > [2.607431] pc : xen_swiotlb_free_coherent+0x198/0x1d8
> > > > [2.612696] lr : dma_free_attrs+0x98/0xd0
> > > > [2.616827] sp : 800011db3970
> > > > [2.620242] x29: 800011db3970 x28: 000f7b00
> > > > [2.625695] x27: 1000 x26: 37b68410
> > > > [2.631129] x25: 0001 x24: f7b0
> > > > [2.636583] x23:  x22: 
> > > > [2.642017] x21: 800011b0d000 x20: 80001179b548
> > > > [2.647461] x19: 37b68410 x18: 0010
> > > > [2.652905] x17: 35d66a00 x16: deadbeef
> > > > [2.658348] x15:  x14: 80001179b548
> > > > [2.663792] x13: 800091db37b7 x12: 800011db37bf
> > > > [2.669236] x11: 8000117c7000 x10: 800011db3740
> > > > [2.674680] x9 : ffd0 x8 : 80001071e980
> > > > [2.680124] x7 : 0132 x6 : 80001197a6ab
> > > > [2.685568] x5 :  x4 : 
> > > > [2.691012] x3 : f7b0 x2 : fde0
> > > > [2.696465] x1 : 0026c340 x0 : 0246c340
> > > > [2.701899] Call trace:
> > > > [2.704452]  xen_swiotlb_free_coherent+0x198/0x1d8
> > > > [2.709367]  dma_free_attrs+0x98/0xd0
> > > > [2.713143]  rpi_firmware_property_list+0x1e4/0x240
> > > > [2.718146]  rpi_firmware_property+0x6c/0xb0
> > > > [2.722535]  rpi_firmware_probe+0xf0/0x1e0
> > > > [2.726760]  platform_drv_probe+0x50/0xa0
> > > > [2.730879]  really_probe+0xd8/0x438
> > > > [2.734567]  driver_probe_device+0xdc/0x130
> > > > [2.738870]  __device_attach_driver+0x88/0x108
> > > > [2.743434]  bus_for_each_drv+0x78/0xc8
> > > > [2.747386]  __device_attach+0xd4/0x158
> > > > [2.751337]  device_initial_probe+0x10/0x18
> > > > [2.755649]  bus_probe_device+0x90/0x98
> > > > [2.759590]  deferred_probe_work_func+0x88/0xd8
> > > > [2.764244]  process_one_work+0x1f0/0x3c0
> > > > [2.768369]  worker_thread+0x138/0x570
> > > > [2.772234]  kthread+0x118/0x120
> > > > [2.775571]  ret_from_fork+0x10/0x18
> > > > [2.779263] Code: d34cfc00 f2dfbfe2 d37ae400 8b020001 (f8626800)
> > > > [2.785492] ---[ end trace 4c435212e349f45f ]---
> > > > [2.793340] usb 1-1: New USB device found, idVendor=2109,
> > > > idProduct=3431, bcdDevice= 4.20
> > > > [2.801038] usb 1-1: New USB device strings: Mfr=0, Product=1, 
> > > > SerialNumber=0
> > > > [2.808297] usb 1-1: Product: USB2.0 Hub
> > > > [2.813710] hub 1-1:1.0: USB hub found
> > > > [2.817117] hub 1-1:1.0: 4 ports detected
> > > >
> > > > This is bailing out right here:
> > > >  
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/firmware/raspberrypi.c?h=v5.6.1#n125
> > > >
> > > > FYIW (since I modified the source to actually print what was returned
> > > > right before it bails) we get:
> > > >buf[1] == 0x80004
> > > >buf[2] == 0x0001
> > > >
> > > > Status 0x80004 is of course suspicious since it is not even listed 
> > > > here:
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/soc/bcm2835/raspberrypi-firmware.h#n14
> > > >
> > > > So it appears that this DMA request path is somehow busted and it
> > > > would be really nice to figure out why.
> > >
> > > You have actually discovered a genuine bug in the recent xen dma rework
> > > in Linux. Congrats :-)
> >
> > Nice! ;-)
> >
> > > I am doing some guesswork here, but from what I read in the thread and
> > > the information in this email I think this

[RFC PATCH] docs/designs: domB design document

2020-05-05 Thread Christopher Clark

Adds a design document for DomB.

Signed-off-by: Christopher Clark 
Signed-off by: Daniel P. Smith 
---


This is a Request for Comments on this draft design document which
describes the motivation and design for the funded development of domB.
We invite discussion of this on this month’s Xen Community Call on the
7th of May.



 docs/designs/disaggregated-launch.md | 297 +++
 1 file changed, 297 insertions(+)
 create mode 100644 docs/designs/disaggregated-launch.md

diff --git a/docs/designs/disaggregated-launch.md 
b/docs/designs/disaggregated-launch.md
new file mode 100644
index 00..de4db1baed
--- /dev/null
+++ b/docs/designs/disaggregated-launch.md
@@ -0,0 +1,297 @@
+# Introduction
+
+Born out of improving support for Dynamic Root of Trust for Measurement (DRTM),
+the DomB project is focused on restructuring the system launch of Xen. It
+provides a security architecture that builds on the principles of Least
+Privilege and Strong Isolation, achieving this through the disaggregation of
+system functions. DomB enables this with the introduction of a boot domain that
+works in conjunction with the hypervisor to provide the ability to launch
+multiple domains as part of host boot while maintaining a least privilege
+implementation.
+
+While the DomB project inception was and continues to be driven by a focus on
+security through disaggregation, there are multiple use cases with a
+non-security focus that would benefit from or are dependent upon the ability to
+launch multiple domains at host boot. This has been proven with the need that
+drove the implementation of the dom0less capability in the ARM branch of Xen. 
As
+such the design for DomB has been developed to allow for a flexible solution
+that is reusable across multiple use cases. In doing so this should ensure that
+DomB is capable, widely exercised, comprehensively tested, and well understood
+by the Xen community.
+
+When looking across those that have expressed interest or discussed a need for
+launching multiple domains at host boot, a majority are able to be binned in 
one
+of the four following use cases. 
+
+  * Deprivileging Dom0: disaggregating and/or eliminating one or more of 
Dom0’s roles
+
+  * System partitioning: dividing a systems resources and/or functions 
performed to a set of domains
+
+  * Low-latency boot and reboot: fast launch of the initial and possible final 
set of domains
+
+  * Isolated multiplexing of singular system resources: enabling dedicated 
domains to manage and multiplex a system resource for which only one exist but 
all domains must believe have exclusive access
+
+It is with this understanding as presented that the DomB project used as the
+basis for the development of its multiple domain boot capability for Xen. 
Within
+the remainder of this document is a detailed explanation of the multiple domain
+boot, the objectives it strives to achieve, the structure behind the approach,
+the sequence events in a boot, a contrasting with ARM’s dom0less, and closing
+with some exemplar implementations.
+
+## Terminology
+
+To help ensure clarity in reading this document, the following is the 
definition
+of terminology used within this document.
+
+* __Host Boot__: the system startup of Xen using the configuration provided by 
the bootloader
+* __Classic Boot__: the existing host boot that ends with the launch of a 
single domain (Dom0)
+* __Multiple Domain Boot__: a host boot that ends with the launch of one or 
more domains
+* __Boot Domain__: a domain with limited privileges launched by the hypervisor 
during a Multiple Domain Boot that is responsible for assisting with higher 
level operations of the domain setup process
+* __Initial Domain__: a domain other than the boot domain that is started as 
part of a multiple domain boot
+* __Crash Domain__: a fallback domain that the hypervisor may launch in the 
event of a detectable error during the multiple domain boot process
+* __Basic Configuration__: the minimal information Xen requires to instantiate 
a domain instance
+* __Extended Configuration__: any configuration options for a domain beyond 
its Basic Configuration
+
+
+# Approach
+
+At the outset of design of DomB a core set of objectives were established. 
These
+objectives were viewed as the bar that should be strived towards inorder to
+minimize the impact for existing Xen environments.
+
+## Objectives
+
+* In general strive to maintain compatibility with existing Xen behavior
+ 
+* A default build of Xen should be capable of booting both style of launch: 
Dom0 or DomB
+- Preferred that it be managed via two KCONFIG options to govern inclusion 
of support for each style
+* The selection between classic boot and multiple domain boot should be 
automatic
+- Preferred that it not require a kernel command line parameter for 
selection
+* It should not require modification to boot loaders
+* It should provide a user friendly interface for its configuration and

[xen-4.9-testing test] 150038: regressions - trouble: fail/pass/starved

2020-05-05 Thread osstest service owner

flight 150038 xen-4.9-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/150038/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop   fail REGR. vs. 149649

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop  fail blocked in 149649
 test-amd64-amd64-xl-qemuu-ws16-amd64 16 guest-localmigrate/x10 fail blocked in 
149649
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail like 149649
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 149649
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 149649
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 149649
 test-amd64-i386-xl-qemut-ws16-amd64 16 guest-localmigrate/x10 fail like 149649
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 149649
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx  2 hosts-allocate   starved  n/a

version targeted for testing:
 xen  93cc305d1f3e7c6949a8f4116446624fa2dbfdf4
baseline version:
 xen  45c90737d5f0c8bf479adcd8cb88450f1998e55c

Last test of basis   149649  2020-04-14 13:35:25 Z   21 days
Testing same since   150038  2020-05-05 16:06:01 Z0 days1 attempts


People who touched revisions under test:
  Juergen Gross 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf

[xen-4.10-testing test] 150039: tolerable trouble: fail/pass/starved - PUSHED

2020-05-05 Thread osstest service owner

flight 150039 xen-4.10-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/150039/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail like 149676
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail like 149676
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install 
fail never pass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install 
fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-arm64-arm64-xl-thunderx  2 hosts-allocate   starved  n/a

version targeted for testing:
 xen  b922c4431df33ed5b724e53c3f3348e470ddd349
baseline version:
 xen  24d62e126296b3f67dabdd49887d41d8ed69487f

Last test of basis   149676  2020-04-15 13:44:41 Z   20 days
Testing same since   150039  2020-05-05 16:06:01 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Anthony PERARD 
  Ian Jackson 
  Jan Beulich 
  Juergen Gross 
  Wei Liu 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm

[PATCH] optee: immediately free buffers that are released by OP-TEE

2020-05-05 Thread Volodymyr Babchuk

Normal World can share buffer with OP-TEE for two reasons:
1. Some client application wants to exchange data with TA
2. OP-TEE asks for shared buffer for internal needs

The second case was handle more strictly than necessary:

1. In RPC request OP-TEE asks for buffer
2. NW allocates buffer and provides it via RPC response
3. Xen pins pages and translates data
4. Xen provides buffer to OP-TEE
5. OP-TEE uses it
6. OP-TEE sends request to free the buffer
7. NW frees the buffer and sends the RPC response
8. Xen unpins pages and forgets about the buffer

The problem is that Xen should forget about buffer in between stages 6
and 7. I.e. the right flow should be like this:

6. OP-TEE sends request to free the buffer
7. Xen unpins pages and forgets about the buffer
8. NW frees the buffer and sends the RPC response

This is because OP-TEE internally frees the buffer before sending the
"free SHM buffer" request. So we have no reason to hold reference for
this buffer anymore. Moreover, in multiprocessor systems NW have time
to reuse buffer cookie for another buffer. Xen complained about this
and denied the new buffer registration. I have seen this issue while
running tests on iMX SoC.

So, this patch basically corrects that behavior by freeing the buffer
earlier, when handling RPC return from OP-TEE.

Signed-off-by: Volodymyr Babchuk 
---
 xen/arch/arm/tee/optee.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 6a035355db..af19fc31f8 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct optee_domain *ctx,
 if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC )
 call->rpc_buffer_type = shm_rpc->xen_arg->params[0].u.value.a;
 
+/*
+ * OP-TEE signals that it frees the buffer that it requested
+ * before. This is the right for us to do the same.
+ */
+if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE )
+{
+uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
+
+free_optee_shm_buf(ctx, cookie);
+
+/*
+ * This should never happen. We have a bug either in the
+ * OP-TEE or in the mediator.
+ */
+if ( call->rpc_data_cookie && call->rpc_data_cookie != cookie )
+gprintk(XENLOG_ERR,
+"Saved RPC cookie does not corresponds to OP-TEE's 
(%"PRIx64" != %"PRIx64")\n",
+call->rpc_data_cookie, cookie);
+call->rpc_data_cookie = 0;
+}
 unmap_domain_page(shm_rpc->xen_arg);
 }
 
@@ -1464,10 +1484,6 @@ static void handle_rpc_cmd(struct optee_domain *ctx, 
struct cpu_user_regs *regs,
 }
 break;
 case OPTEE_RPC_CMD_SHM_FREE:
-free_optee_shm_buf(ctx, shm_rpc->xen_arg->params[0].u.value.b);
-if ( call->rpc_data_cookie ==
- shm_rpc->xen_arg->params[0].u.value.b )
-call->rpc_data_cookie = 0;
 break;
 default:
 break;
-- 
2.25.0

Re: Troubles running Xen on Raspberry Pi 4 with 5.6.1 DomU

2020-05-05 Thread Boris Ostrovsky



On 5/5/20 6:34 PM, Stefano Stabellini wrote:
>
>  
> The crash happens here:
>
>   if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
>range_straddles_page_boundary(phys, size)) &&
>   TestClearPageXenRemapped(virt_to_page(vaddr)))
>   xen_destroy_contiguous_region(phys, order);
>
> I don't know exactly what is causing the crash. Is it the WARN_ON somehow?
> Is it TestClearPageXenRemapped? Neither should cause a crash in theory.


This doesn't look like warning to me. Would it be possible to see what
xen_swiotlb_free_coherent+0x198 corresponds to in code? I don't know if
./scripts/faddr2line works for ARM but could you try


./scripts/faddr2line vmlinux xen_swiotlb_free_coherent+0x198


-boris

Re: Troubles running Xen on Raspberry Pi 4 with 5.6.1 DomU

2020-05-05 Thread Stefano Stabellini

+ Boris, Jürgen

See the crash Roman is seeing booting dom0 on the Raspberry Pi. It is
related to the recent xen dma_ops changes. Possibly the same thing
reported by Peng here:

https://marc.info/?l=linux-kernel=158805976230485=2

An in-depth analysis below.


On Mon, 4 May 2020, Roman Shaposhnik wrote:
> > > [2.534292] Unable to handle kernel paging request at virtual
> > > address 0026c340
> > > [2.542373] Mem abort info:
> > > [2.545257]   ESR = 0x9604
> > > [2.548421]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > [2.553877]   SET = 0, FnV = 0
> > > [2.557023]   EA = 0, S1PTW = 0
> > > [2.560297] Data abort info:
> > > [2.563258]   ISV = 0, ISS = 0x0004
> > > [2.567208]   CM = 0, WnR = 0
> > > [2.570294] [0026c340] user address but active_mm is swapper
> > > [2.576783] Internal error: Oops: 9604 [#1] SMP
> > > [2.581784] Modules linked in:
> > > [2.584950] CPU: 3 PID: 135 Comm: kworker/3:1 Not tainted 
> > > 5.6.1-default #9
> > > [2.591970] Hardware name: Raspberry Pi 4 Model B (DT)
> > > [2.597256] Workqueue: events deferred_probe_work_func
> > > [2.602509] pstate: 6005 (nZCv daif -PAN -UAO)
> > > [2.607431] pc : xen_swiotlb_free_coherent+0x198/0x1d8
> > > [2.612696] lr : dma_free_attrs+0x98/0xd0
> > > [2.616827] sp : 800011db3970
> > > [2.620242] x29: 800011db3970 x28: 000f7b00
> > > [2.625695] x27: 1000 x26: 37b68410
> > > [2.631129] x25: 0001 x24: f7b0
> > > [2.636583] x23:  x22: 
> > > [2.642017] x21: 800011b0d000 x20: 80001179b548
> > > [2.647461] x19: 37b68410 x18: 0010
> > > [2.652905] x17: 35d66a00 x16: deadbeef
> > > [2.658348] x15:  x14: 80001179b548
> > > [2.663792] x13: 800091db37b7 x12: 800011db37bf
> > > [2.669236] x11: 8000117c7000 x10: 800011db3740
> > > [2.674680] x9 : ffd0 x8 : 80001071e980
> > > [2.680124] x7 : 0132 x6 : 80001197a6ab
> > > [2.685568] x5 :  x4 : 
> > > [2.691012] x3 : f7b0 x2 : fde0
> > > [2.696465] x1 : 0026c340 x0 : 0246c340
> > > [2.701899] Call trace:
> > > [2.704452]  xen_swiotlb_free_coherent+0x198/0x1d8
> > > [2.709367]  dma_free_attrs+0x98/0xd0
> > > [2.713143]  rpi_firmware_property_list+0x1e4/0x240
> > > [2.718146]  rpi_firmware_property+0x6c/0xb0
> > > [2.722535]  rpi_firmware_probe+0xf0/0x1e0
> > > [2.726760]  platform_drv_probe+0x50/0xa0
> > > [2.730879]  really_probe+0xd8/0x438
> > > [2.734567]  driver_probe_device+0xdc/0x130
> > > [2.738870]  __device_attach_driver+0x88/0x108
> > > [2.743434]  bus_for_each_drv+0x78/0xc8
> > > [2.747386]  __device_attach+0xd4/0x158
> > > [2.751337]  device_initial_probe+0x10/0x18
> > > [2.755649]  bus_probe_device+0x90/0x98
> > > [2.759590]  deferred_probe_work_func+0x88/0xd8
> > > [2.764244]  process_one_work+0x1f0/0x3c0
> > > [2.768369]  worker_thread+0x138/0x570
> > > [2.772234]  kthread+0x118/0x120
> > > [2.775571]  ret_from_fork+0x10/0x18
> > > [2.779263] Code: d34cfc00 f2dfbfe2 d37ae400 8b020001 (f8626800)
> > > [2.785492] ---[ end trace 4c435212e349f45f ]---
> > > [2.793340] usb 1-1: New USB device found, idVendor=2109,
> > > idProduct=3431, bcdDevice= 4.20
> > > [2.801038] usb 1-1: New USB device strings: Mfr=0, Product=1, 
> > > SerialNumber=0
> > > [2.808297] usb 1-1: Product: USB2.0 Hub
> > > [2.813710] hub 1-1:1.0: USB hub found
> > > [2.817117] hub 1-1:1.0: 4 ports detected
> > >
> > > This is bailing out right here:
> > >  
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/firmware/raspberrypi.c?h=v5.6.1#n125
> > >
> > > FYIW (since I modified the source to actually print what was returned
> > > right before it bails) we get:
> > >buf[1] == 0x80004
> > >buf[2] == 0x0001
> > >
> > > Status 0x80004 is of course suspicious since it is not even listed 
> > > here:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/soc/bcm2835/raspberrypi-firmware.h#n14
> > >
> > > So it appears that this DMA request path is somehow busted and it
> > > would be really nice to figure out why.
> >
> > You have actually discovered a genuine bug in the recent xen dma rework
> > in Linux. Congrats :-)
> 
> Nice! ;-)
> 
> > I am doing some guesswork here, but from what I read in the thread and
> > the information in this email I think this patch might fix the issue.
> > If it doesn't fix the issue please add a few printks in
> > drivers/xen/swiotlb-xen.c:xen_swiotlb_free_coherent and please let me
> > know where exactly it crashes.
> >
> >
> > diff --git a/include/xen/arm/page-coherent.h 
>

[xen-unstable-smoke test] 150049: tolerable all pass - PUSHED

2020-05-05 Thread osstest service owner

flight 150049 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/150049/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  779efdbb502b38c66b774b124fa0ceed254875bd
baseline version:
 xen  2e3d87cc734a895ef5b486926274a178836b67a9

Last test of basis   150037  2020-05-05 16:00:51 Z0 days
Testing same since   150049  2020-05-05 20:00:32 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Ashok Raj 
  Borislav Petkov 
  Ian Jackson 
  Jan Beulich 
  Thomas Gleixner 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   2e3d87cc73..779efdbb50  779efdbb502b38c66b774b124fa0ceed254875bd -> smoke

Re: [RFC] UEFI Secure Boot on Xen Hosts

2020-05-05 Thread Bobby Eshleman

On Thu, Apr 30, 2020 at 01:41:12PM +0200, Ard Biesheuvel wrote:
> On Thu, 30 Apr 2020 at 13:15, Daniel Kiper  wrote:
> > Anyway, the advantage of this new protocol is that it uses UEFI API to
> > load and execute PE kernel and its modules. This means that it will be
> > architecture independent. However, IIRC, if we want to add new modules
> > types to the Xen then we have to teach all bootloaders in the wild about
> > new GUIDs. Ard, am I correct?
> >
> 
> Well, if you are passing multiple files that are not interchangeable,
> each bootloader will need to do something, right? It could be another
> GUID, or we could extend the initrd media device path to take
> discriminators.
> 
> So today, we have
> 
> VendorMedia(5568e427-68fc-4f3d-ac74-ca555231cc68)
> 
> which identifies /the/ initrd on Linux. As long as this keeps working
> as intended, I have no objections extending this to
> 
> VendorMedia(5568e427-68fc-4f3d-ac74-ca555231cc68)/File(...)
> 
> etc, if we can agree that omitting the File() part means the unnamed
> initrd, and that L"initrd" is reserved as a file name. That way, you
> can choose any abstract file name you want, but please note that the
> loader still needs to provide some kind of mapping of how these
> abstract file names relate to actual files selected by the user.

This seems reasonable to me and I can't think of any good reason right
now, if we go down this path, to not just extend the initrd media device
path (as opposed to creating one that is Xen-specific).  It definitely
beats having a GUID per boot module since the number of modules changes
per Xen deployment, so there would need to be some range of GUIDs
representing modules specifically for Xen, and some rules as to how they
are mapped to real files.

It does seem simpler to ask the loader for "dom0's kernel" or "dom1's
initrd" and to have the loader use these references to grab real files.

> One thing to keep in mind, though, is that this protocol goes away
> after ExitBootServices().
> 

Roger that.


Thanks,

-Bobby

Re: [PATCH] xenbus: avoid stack overflow warning

2020-05-05 Thread Arnd Bergmann

On Tue, May 5, 2020 at 6:02 PM Jürgen Groß  wrote:
> On 05.05.20 17:01, Arnd Bergmann wrote:
> > On Tue, May 5, 2020 at 4:34 PM Jürgen Groß  wrote:
> >> On 05.05.20 16:15, Arnd Bergmann wrote:
> >
> > I considered that as well, and don't really mind either way. I think it does
> > get a bit ugly whatever we do. If you prefer the union, I can respin the
> > patch that way.
>
> Hmm, thinking more about it I think the real clean solution would be to
> extend struct map_ring_valloc_hvm to cover the pv case, too, to add the
> map and unmap arrays (possibly as a union) to it and to allocate it
> dynamically instead of having it on the stack.
>
> Would you be fine doing this?

This is a little more complex than I'd want to do without doing any testing
(and no, I don't want to do the testing either) ;-)

It does sound like a better approach though.

  Arnd

[ovmf test] 150045: tolerable trouble: pass/starved - PUSHED

2020-05-05 Thread osstest service owner

flight 150045 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/150045/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-ovmf-amd64  2 hosts-allocate starved n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  2 hosts-allocate  starved n/a

version targeted for testing:
 ovmf de15e7c2651ada46cc649c5b3c8c0c145354ac04
baseline version:
 ovmf e54310451f1ac2ce4ccb90a110f45bb9b4f3ccd6

Last test of basis   149891  2020-04-30 16:10:01 Z5 days
Testing same since   150045  2020-05-05 16:09:42 Z0 days1 attempts


People who touched revisions under test:
  Ard Biesheuvel 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 starved 
 test-amd64-i386-xl-qemuu-ovmf-amd64  starved 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   e54310451f..de15e7c265  de15e7c2651ada46cc649c5b3c8c0c145354ac04 -> 
xen-tested-master

[xen-unstable-smoke test] 150037: tolerable all pass - PUSHED

2020-05-05 Thread osstest service owner

flight 150037 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/150037/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  2e3d87cc734a895ef5b486926274a178836b67a9
baseline version:
 xen  0135be8bd8cd60090298f02310691b688d95c3a8

Last test of basis   149888  2020-04-30 09:00:53 Z5 days
Failing since149986  2020-05-05 01:24:46 Z0 days3 attempts
Testing same since   150037  2020-05-05 16:00:51 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Hongyan Xia 
  Jan Beulich 
  Roger Pau Monné 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   0135be8bd8..2e3d87cc73  2e3d87cc734a895ef5b486926274a178836b67a9 -> smoke

[PATCH] x86/svm: Use flush-by-asid when available

2020-05-05 Thread Andrew Cooper

AMD Fam15h processors introduced the flush-by-asid feature, for more fine
grain flushing purposes.

Flushing everything including ASID 0 (i.e. Xen context) is an an unnecesserily
large hammer, and never necessary in the context of guest TLBs needing
invalidating.

When available, use TLB_CTRL_FLUSH_ASID in preference to TLB_CTRL_FLUSH_ALL.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 

The APM currently describes tlb_control encoding 1 as "Flush entire
TLB (Should be used only by legacy hypervisors.)".  AMD have agreed that this
is misleading and should say "legacy hardware" instead.

This change does come with a minor observed perf improvement on Fam17h
hardware, of ~0.6s over ~22s for my XTF pagewalk test.  This test will spot
TLB flushing issues, but isn't optimal for spotting the perf increase from
better flushing.  There were no observed differences for Fam15h, but this
could simply mean that the measured code footprint was larger than the TLB on
this CPU.
---
 xen/arch/x86/hvm/svm/asid.c   | 9 ++---
 xen/include/asm-x86/hvm/svm/svm.h | 1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/asid.c b/xen/arch/x86/hvm/svm/asid.c
index 9be90058c7..ab06dd3f3a 100644
--- a/xen/arch/x86/hvm/svm/asid.c
+++ b/xen/arch/x86/hvm/svm/asid.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void svm_asid_init(const struct cpuinfo_x86 *c)
 {
@@ -47,15 +48,17 @@ void svm_asid_handle_vmrun(void)
 if ( p_asid->asid == 0 )
 {
 vmcb_set_guest_asid(vmcb, 1);
-/* TODO: investigate using TLB_CTRL_FLUSH_ASID here instead. */
-vmcb->tlb_control = TLB_CTRL_FLUSH_ALL;
+vmcb->tlb_control =
+cpu_has_svm_flushbyasid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL;
 return;
 }
 
 if ( vmcb_get_guest_asid(vmcb) != p_asid->asid )
 vmcb_set_guest_asid(vmcb, p_asid->asid);
 
-vmcb->tlb_control = need_flush ? TLB_CTRL_FLUSH_ALL : TLB_CTRL_NO_FLUSH;
+vmcb->tlb_control =
+!need_flush ? TLB_CTRL_NO_FLUSH :
+cpu_has_svm_flushbyasid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL;
 }
 
 /*
diff --git a/xen/include/asm-x86/hvm/svm/svm.h 
b/xen/include/asm-x86/hvm/svm/svm.h
index 16a994ec74..cd71402cbb 100644
--- a/xen/include/asm-x86/hvm/svm/svm.h
+++ b/xen/include/asm-x86/hvm/svm/svm.h
@@ -79,6 +79,7 @@ extern u32 svm_feature_flags;
 #define cpu_has_svm_svml  cpu_has_svm_feature(SVM_FEATURE_SVML)
 #define cpu_has_svm_nrips cpu_has_svm_feature(SVM_FEATURE_NRIPS)
 #define cpu_has_svm_cleanbits cpu_has_svm_feature(SVM_FEATURE_VMCBCLEAN)
+#define cpu_has_svm_flushbyasid cpu_has_svm_feature(SVM_FEATURE_FLUSHBYASID)
 #define cpu_has_svm_decodecpu_has_svm_feature(SVM_FEATURE_DECODEASSISTS)
 #define cpu_has_svm_vgif  cpu_has_svm_feature(SVM_FEATURE_VGIF)
 #define cpu_has_pause_filter  cpu_has_svm_feature(SVM_FEATURE_PAUSEFILTER)
-- 
2.11.0

[PATCH] x86/svm: Clean up vmcbcleanbits_t handling

2020-05-05 Thread Andrew Cooper

Rework the vmcbcleanbits_t definitons to use bool, drop 'fields' from the
namespace, position the comments in an unambiguous position, and include the
bit position.

In svm_vmexit_handler(), don't bother conditionally writing ~0 or 0 based on
hardware support.  The field was entirely unused and ignored on older
hardware (and we're already setting reserved cleanbits anyway).

In nsvm_vmcb_prepare4vmrun(), simplify the logic massivly by dropping the
vcleanbit_set() macro using a vmcbcleanbits_t local variable which only gets
filled in the case that clean bits were valid previously.  Fix up the style on
impacted lines.

No practical change in behaviour.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
---
 xen/arch/x86/hvm/svm/nestedsvm.c   | 45 +++---
 xen/arch/x86/hvm/svm/svm.c | 10 -
 xen/arch/x86/hvm/svm/svmdebug.c|  2 +-
 xen/include/asm-x86/hvm/svm/vmcb.h | 45 ++
 4 files changed, 44 insertions(+), 58 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index bbd06e342e..998790af1b 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -343,7 +343,7 @@ static int nsvm_vcpu_hostrestore(struct vcpu *v, struct 
cpu_user_regs *regs)
 n1vmcb->exit_int_info.raw = 0;
 
 /* Cleanbits */
-n1vmcb->cleanbits.bytes = 0;
+n1vmcb->cleanbits.raw = 0;
 
 return 0;
 }
@@ -423,7 +423,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct 
cpu_user_regs *regs)
 struct nestedvcpu *nv = _nestedhvm(v);
 struct nestedsvm *svm = _nestedsvm(v);
 struct vmcb_struct *ns_vmcb, *n1vmcb, *n2vmcb;
-bool_t vcleanbits_valid;
+vmcbcleanbits_t clean = {};
 int rc;
 uint64_t cr0;
 
@@ -435,17 +435,13 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct 
cpu_user_regs *regs)
 ASSERT(n2vmcb != NULL);
 
 /* Check if virtual VMCB cleanbits are valid */
-vcleanbits_valid = 1;
-if ( svm->ns_ovvmcb_pa == INVALID_PADDR )
-vcleanbits_valid = 0;
-if (svm->ns_ovvmcb_pa != nv->nv_vvmcxaddr)
-vcleanbits_valid = 0;
-
-#define vcleanbit_set(_name)   \
-(vcleanbits_valid && ns_vmcb->cleanbits.fields._name)
+if ( svm->ns_ovvmcb_pa != INVALID_PADDR &&
+ svm->ns_ovvmcb_pa != nv->nv_vvmcxaddr )
+clean = ns_vmcb->cleanbits;
 
 /* Enable l2 guest intercepts */
-if (!vcleanbit_set(intercepts)) {
+if ( !clean.intercepts )
+{
 svm->ns_cr_intercepts = ns_vmcb->_cr_intercepts;
 svm->ns_dr_intercepts = ns_vmcb->_dr_intercepts;
 svm->ns_exception_intercepts = ns_vmcb->_exception_intercepts;
@@ -492,7 +488,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct 
cpu_user_regs *regs)
 n2vmcb->_tsc_offset = n1vmcb->_tsc_offset + ns_vmcb->_tsc_offset;
 
 /* Nested IO permission bitmaps */
-rc = nsvm_vmrun_permissionmap(v, vcleanbit_set(iopm));
+rc = nsvm_vmrun_permissionmap(v, clean.iopm);
 if (rc)
 return rc;
 
@@ -502,7 +498,8 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct 
cpu_user_regs *regs)
 n2vmcb->tlb_control = ns_vmcb->tlb_control;
 
 /* Virtual Interrupts */
-if (!vcleanbit_set(tpr)) {
+if ( !clean.tpr )
+{
 n2vmcb->_vintr = ns_vmcb->_vintr;
 n2vmcb->_vintr.fields.intr_masking = 1;
 }
@@ -520,9 +517,9 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct 
cpu_user_regs *regs)
 n2vmcb->event_inj = ns_vmcb->event_inj;
 
 /* LBR and other virtualization */
-if (!vcleanbit_set(lbr)) {
+if ( !clean.lbr )
 svm->ns_virt_ext = ns_vmcb->virt_ext;
-}
+
 n2vmcb->virt_ext.bytes =
 n1vmcb->virt_ext.bytes | ns_vmcb->virt_ext.bytes;
 
@@ -533,7 +530,8 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct 
cpu_user_regs *regs)
  */
 
 /* Segments */
-if (!vcleanbit_set(seg)) {
+if ( !clean.seg )
+{
 n2vmcb->es = ns_vmcb->es;
 n2vmcb->cs = ns_vmcb->cs;
 n2vmcb->ss = ns_vmcb->ss;
@@ -541,7 +539,8 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct 
cpu_user_regs *regs)
 /* CPL */
 n2vmcb->_cpl = ns_vmcb->_cpl;
 }
-if (!vcleanbit_set(dt)) {
+if ( !clean.dt )
+{
 n2vmcb->gdtr = ns_vmcb->gdtr;
 n2vmcb->idtr = ns_vmcb->idtr;
 }
@@ -614,7 +613,8 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct 
cpu_user_regs *regs)
 }
 
 /* DRn */
-if (!vcleanbit_set(dr)) {
+if ( !clean.dr )
+{
 n2vmcb->_dr7 = ns_vmcb->_dr7;
 n2vmcb->_dr6 = ns_vmcb->_dr6;
 }
@@ -637,11 +637,11 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct 
cpu_user_regs *regs)
  */
 
 /* PAT */
-if (!vcleanbit_set(np)) {
+if ( !clean.np )
 n2vmcb->_g_pat = ns_vmcb->_g_pat;
-}
 
-if (!vcleanbit_set(lbr)) {
+if ( !clean.lbr )
+{

Re: [PATCH v3] tools/xenstore: don't store domU's mfn of ring page in xenstored

2020-05-05 Thread Julien Grall


Hi Juergen,

On 30/04/2020 06:38, Juergen Gross wrote:

The XS_INTRODUCE command has two parameters: the mfn (or better: gfn)
of the domain's xenstore ring page and the event channel of the
domain for communicating with Xenstore.

The gfn is not really needed. It is stored in the per-domain struct
in xenstored and in case of another XS_INTRODUCE for the domain it
is tested to match the original value. If it doesn't match the
command is aborted via EINVAL, otherwise the event channel to the
domain is recreated.

As XS_INTRODUCE is limited to dom0 and there is no real downside of
recreating the event channel just omit the test for the gfn to
match and don't return EINVAL for multiple XS_INTRODUCE calls.

Signed-off-by: Juergen Gross 


Reviewed-by: Julien Grall 

Cheers,

--
Julien Grall

Re: [PATCH] xenbus: avoid stack overflow warning

2020-05-05 Thread Jürgen Groß


On 05.05.20 18:12, Boris Ostrovsky wrote:


On 5/5/20 12:02 PM, Jürgen Groß wrote:

On 05.05.20 17:01, Arnd Bergmann wrote:

On Tue, May 5, 2020 at 4:34 PM Jürgen Groß  wrote:

On 05.05.20 16:15, Arnd Bergmann wrote:

The __xenbus_map_ring() function has two large arrays, 'map' and
'unmap' on its stack. When clang decides to inline it into its caller,
xenbus_map_ring_valloc_hvm(), the total stack usage exceeds the
warning
limit for stack size on 32-bit architectures.

drivers/xen/xenbus/xenbus_client.c:592:12: error: stack frame size
of 1104 bytes in function 'xenbus_map_ring_valloc_hvm'
[-Werror,-Wframe-larger-than=]

As far as I can tell, other compilers don't inline it here, so we get
no warning, but the stack usage is actually the same. It is possible
for both arrays to use the same location on the stack, but the
compiler
cannot prove that this is safe because they get passed to external
functions that may end up using them until they go out of scope.

Move the two arrays into separate basic blocks to limit the scope
and force them to occupy less stack in total, regardless of the
inlining decision.


Why don't you put both arrays into a union?


I considered that as well, and don't really mind either way. I think
it does
get a bit ugly whatever we do. If you prefer the union, I can respin the
patch that way.


Hmm, thinking more about it I think the real clean solution would be to
extend struct map_ring_valloc_hvm to cover the pv case, too, to add the
map and unmap arrays (possibly as a union) to it and to allocate it
dynamically instead of having it on the stack.

Would you be fine doing this?




Another option might be to factor out/modify code from
xenbus_unmap_ring() and call the resulting code from
__xenbus_map_ring()'s fail path.


This will still allocate large arrays on the stack. If we ever increase
the max ring page order it will explode.


Juergen

Re: [PATCH] xenbus: avoid stack overflow warning

2020-05-05 Thread Boris Ostrovsky



On 5/5/20 12:02 PM, Jürgen Groß wrote:
> On 05.05.20 17:01, Arnd Bergmann wrote:
>> On Tue, May 5, 2020 at 4:34 PM Jürgen Groß  wrote:
>>> On 05.05.20 16:15, Arnd Bergmann wrote:
 The __xenbus_map_ring() function has two large arrays, 'map' and
 'unmap' on its stack. When clang decides to inline it into its caller,
 xenbus_map_ring_valloc_hvm(), the total stack usage exceeds the
 warning
 limit for stack size on 32-bit architectures.

 drivers/xen/xenbus/xenbus_client.c:592:12: error: stack frame size
 of 1104 bytes in function 'xenbus_map_ring_valloc_hvm'
 [-Werror,-Wframe-larger-than=]

 As far as I can tell, other compilers don't inline it here, so we get
 no warning, but the stack usage is actually the same. It is possible
 for both arrays to use the same location on the stack, but the
 compiler
 cannot prove that this is safe because they get passed to external
 functions that may end up using them until they go out of scope.

 Move the two arrays into separate basic blocks to limit the scope
 and force them to occupy less stack in total, regardless of the
 inlining decision.
>>>
>>> Why don't you put both arrays into a union?
>>
>> I considered that as well, and don't really mind either way. I think
>> it does
>> get a bit ugly whatever we do. If you prefer the union, I can respin the
>> patch that way.
>
> Hmm, thinking more about it I think the real clean solution would be to
> extend struct map_ring_valloc_hvm to cover the pv case, too, to add the
> map and unmap arrays (possibly as a union) to it and to allocate it
> dynamically instead of having it on the stack.
>
> Would you be fine doing this?



Another option might be to factor out/modify code from 
xenbus_unmap_ring() and call the resulting code from
__xenbus_map_ring()'s fail path.


-boris

Re: [PATCH] xenbus: avoid stack overflow warning

2020-05-05 Thread Jürgen Groß


On 05.05.20 17:01, Arnd Bergmann wrote:

On Tue, May 5, 2020 at 4:34 PM Jürgen Groß  wrote:

On 05.05.20 16:15, Arnd Bergmann wrote:

The __xenbus_map_ring() function has two large arrays, 'map' and
'unmap' on its stack. When clang decides to inline it into its caller,
xenbus_map_ring_valloc_hvm(), the total stack usage exceeds the warning
limit for stack size on 32-bit architectures.

drivers/xen/xenbus/xenbus_client.c:592:12: error: stack frame size of 1104 
bytes in function 'xenbus_map_ring_valloc_hvm' [-Werror,-Wframe-larger-than=]

As far as I can tell, other compilers don't inline it here, so we get
no warning, but the stack usage is actually the same. It is possible
for both arrays to use the same location on the stack, but the compiler
cannot prove that this is safe because they get passed to external
functions that may end up using them until they go out of scope.

Move the two arrays into separate basic blocks to limit the scope
and force them to occupy less stack in total, regardless of the
inlining decision.


Why don't you put both arrays into a union?


I considered that as well, and don't really mind either way. I think it does
get a bit ugly whatever we do. If you prefer the union, I can respin the
patch that way.


Hmm, thinking more about it I think the real clean solution would be to
extend struct map_ring_valloc_hvm to cover the pv case, too, to add the
map and unmap arrays (possibly as a union) to it and to allocate it
dynamically instead of having it on the stack.

Would you be fine doing this?


Juergen

Re: [PATCH v5] docs/designs: re-work the xenstore migration document...

2020-05-05 Thread Jürgen Groß


On 05.05.20 17:15, Edwin Torok wrote:

On Tue, 2020-05-05 at 14:13 +0100, Jürgen Groß wrote:

On 05.05.20 15:01, Julien Grall wrote:

Hi Paul,

On 28/04/2020 16:06, Paul Durrant wrote:

From: Paul Durrant 

... to specify a separate migration stream that will also be
suitable for
live update.

The original scope of the document was to support non-
cooperative
migration
of guests [1] but, since then, live update of xenstored has been
brought into
scope. Thus it makes more sense to define a separate image format
for
serializing xenstore state that is suitable for both purposes.

The document has been limited to specifying a new image format.
The
mechanism
for acquiring the image for live update or migration is not
covered as
that
is more appropriately dealt with by a patch to
docs/misc/xenstore.txt.
It is
also expected that, when the first implementation of live update
or
migration
making use of this specification is committed, that the document
is
moved from
docs/designs into docs/specs.

NOTE: It will only be necessary to save and restore state for
active
xenstore
connections, but the documentation for 'RESUME' in
xenstore.txt
implies
otherwise. That command is unused so this patch deletes it
from
the
specification.


Could someone from Citrix please verify that XAPI isn't using
XS_RESUME?


The implementation of XS_RESUME in oxenstored doesn't do much: it seems
to be just a way for Dom0 to check whether a domain exists or not, and
for a domain to check whether they are Dom0 or not.
If the domain exists, then the resume implementation just returns `()`,
i.e. does nothing.

I can't find any references to Xs.resume in xenopsd (or the other XAPI
repos that I got checked out), so I think it can be safely removed from
the spec and client libraries (I'd keep it in the actual oxenstored
implementation just in case some guest does call it).


Thanks for the confirmation.


Juergen

Re: [PATCH] x86/pv: Fix Clang build with !CONFIG_PV32

2020-05-05 Thread Jan Beulich

On 05.05.2020 17:05, Andrew Cooper wrote:
> On 05/05/2020 15:52, Jan Beulich wrote:
>> [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments 
>> unless you have verified the sender and know the content is safe.
>>
>> On 05.05.2020 16:28, Andrew Cooper wrote:
>>> @@ -753,8 +751,9 @@ void load_system_tables(void)
>>> _set_tssldt_desc(gdt + TSS_ENTRY, (unsigned long)tss,
>>>  sizeof(*tss) - 1, SYS_DESC_tss_avail);
>>> if ( IS_ENABLED(CONFIG_PV32) )
>>> -   _set_tssldt_desc(compat_gdt + TSS_ENTRY, (unsigned long)tss,
>>> -sizeof(*tss) - 1, SYS_DESC_tss_busy);
>>> +   _set_tssldt_desc(
>>> +   this_cpu(compat_gdt) - FIRST_RESERVED_GDT_ENTRY + 
>>> TSS_ENTRY,
>>> +   (unsigned long)tss, sizeof(*tss) - 1, 
>>> SYS_DESC_tss_busy);
>> Isn't indentation here off by 4 compared to what we
>> normally do with extremely large argument expressions?
> 
> No.  This is Linux style (therefore 8-space tabs), not Xen style (4 spaces).

Oh, right - din't pay attention at all to this being tabs, sorry.

Jan

Re: [PATCH v5] docs/designs: re-work the xenstore migration document...

2020-05-05 Thread Edwin Torok

On Tue, 2020-05-05 at 14:13 +0100, Jürgen Groß wrote:
> On 05.05.20 15:01, Julien Grall wrote:
> > Hi Paul,
> > 
> > On 28/04/2020 16:06, Paul Durrant wrote:
> > > From: Paul Durrant 
> > > 
> > > ... to specify a separate migration stream that will also be
> > > suitable for
> > > live update.
> > > 
> > > The original scope of the document was to support non-
> > > cooperative 
> > > migration
> > > of guests [1] but, since then, live update of xenstored has been 
> > > brought into
> > > scope. Thus it makes more sense to define a separate image format
> > > for
> > > serializing xenstore state that is suitable for both purposes.
> > > 
> > > The document has been limited to specifying a new image format.
> > > The 
> > > mechanism
> > > for acquiring the image for live update or migration is not
> > > covered as 
> > > that
> > > is more appropriately dealt with by a patch to
> > > docs/misc/xenstore.txt. 
> > > It is
> > > also expected that, when the first implementation of live update
> > > or 
> > > migration
> > > making use of this specification is committed, that the document
> > > is 
> > > moved from
> > > docs/designs into docs/specs.
> > > 
> > > NOTE: It will only be necessary to save and restore state for
> > > active 
> > > xenstore
> > >connections, but the documentation for 'RESUME' in
> > > xenstore.txt 
> > > implies
> > >otherwise. That command is unused so this patch deletes it
> > > from 
> > > the
> > >specification.
> 
> Could someone from Citrix please verify that XAPI isn't using
> XS_RESUME?

The implementation of XS_RESUME in oxenstored doesn't do much: it seems
to be just a way for Dom0 to check whether a domain exists or not, and
for a domain to check whether they are Dom0 or not.
If the domain exists, then the resume implementation just returns `()`,
i.e. does nothing.

I can't find any references to Xs.resume in xenopsd (or the other XAPI
repos that I got checked out), so I think it can be safely removed from
the spec and client libraries (I'd keep it in the actual oxenstored
implementation just in case some guest does call it).

Best regards,
--Edwin

> 
> 
> Juergen
>

Re: [PATCH] x86/pv: Fix Clang build with !CONFIG_PV32

2020-05-05 Thread Andrew Cooper

On 05/05/2020 15:52, Jan Beulich wrote:
> [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments 
> unless you have verified the sender and know the content is safe.
>
> On 05.05.2020 16:28, Andrew Cooper wrote:
>> @@ -753,8 +751,9 @@ void load_system_tables(void)
>>  _set_tssldt_desc(gdt + TSS_ENTRY, (unsigned long)tss,
>>   sizeof(*tss) - 1, SYS_DESC_tss_avail);
>>  if ( IS_ENABLED(CONFIG_PV32) )
>> -_set_tssldt_desc(compat_gdt + TSS_ENTRY, (unsigned long)tss,
>> - sizeof(*tss) - 1, SYS_DESC_tss_busy);
>> +_set_tssldt_desc(
>> +this_cpu(compat_gdt) - FIRST_RESERVED_GDT_ENTRY + 
>> TSS_ENTRY,
>> +(unsigned long)tss, sizeof(*tss) - 1, 
>> SYS_DESC_tss_busy);
> Isn't indentation here off by 4 compared to what we
> normally do with extremely large argument expressions?

No.  This is Linux style (therefore 8-space tabs), not Xen style (4 spaces).

~Andrew

Re: [PATCH v2] x86/traps: fix an off-by-one error

2020-05-05 Thread Jan Beulich

On 05.05.2020 16:17, Hongyan Xia wrote:
> From: Hongyan Xia 
> 
> stack++ can go into the next page and unmap_domain_page() will unmap the
> wrong one, causing mapcache and memory corruption. Fix.
> 
> Signed-off-by: Hongyan Xia 

Reviewed-by: Jan Beulich

Re: [PATCH] xenbus: avoid stack overflow warning

2020-05-05 Thread Arnd Bergmann

On Tue, May 5, 2020 at 4:34 PM Jürgen Groß  wrote:
> On 05.05.20 16:15, Arnd Bergmann wrote:
> > The __xenbus_map_ring() function has two large arrays, 'map' and
> > 'unmap' on its stack. When clang decides to inline it into its caller,
> > xenbus_map_ring_valloc_hvm(), the total stack usage exceeds the warning
> > limit for stack size on 32-bit architectures.
> >
> > drivers/xen/xenbus/xenbus_client.c:592:12: error: stack frame size of 1104 
> > bytes in function 'xenbus_map_ring_valloc_hvm' 
> > [-Werror,-Wframe-larger-than=]
> >
> > As far as I can tell, other compilers don't inline it here, so we get
> > no warning, but the stack usage is actually the same. It is possible
> > for both arrays to use the same location on the stack, but the compiler
> > cannot prove that this is safe because they get passed to external
> > functions that may end up using them until they go out of scope.
> >
> > Move the two arrays into separate basic blocks to limit the scope
> > and force them to occupy less stack in total, regardless of the
> > inlining decision.
>
> Why don't you put both arrays into a union?

I considered that as well, and don't really mind either way. I think it does
get a bit ugly whatever we do. If you prefer the union, I can respin the
patch that way.

  Arnd

Re: [PATCH 1/3] x86/mm: do not attempt to convert _PAGE_GNTTAB to a boolean

2020-05-05 Thread Jan Beulich

On 05.05.2020 16:11, Roger Pau Monné wrote:
> On Tue, May 05, 2020 at 03:47:43PM +0200, Jan Beulich wrote:
>> On 05.05.2020 11:24, Roger Pau Monne wrote:
>>> Remove the conversion of _PAGE_GNTTAB to a boolean, since the and
>>> operation performed afterwards will already return false if the value
>>> of the macro is 0.
>>
>> I'm sorry, but no. The expression was put there on purpose by
>> 0932210ac095 ("x86: Address "Bitwise-and with zero
>> CONSTANT_EXPRESSION_RESULT" Coverity issues"), and the
>> description there is clearly telling us that this wants to stay
>> unless Coverity changed in the meantime. Otherwise I'm afraid
>> a more elaborate solution will be needed to please both.
> 
> Clang is fine with changing this to _PAGE_GNTTAB != 0. Would you be
> OK with this approach?

I'd be okay with it, but then I guess I'd prefer ...

>> Or a
>> more simplistic one, like using "#if _PAGE_GNTTAB" around the
>> construct.
> 
> Yes, that's the other solution I had in mind.

 this one. Let's see if Andrew has a clear opinion either
way - it was him to address the original Coverity issue after
all.

Jan

[PATCH RFC] automation: split logs into separate files

2020-05-05 Thread Roger Pau Monne

The aim of this patch is to make it easier to digest the output of the
gitlab jobs, as the current is IMO too long and that makes it hard to
spot what went wrong. So this patch does the following:
 - Rename build.log into run.log.
 - Split each build log into a logs folder, using the
   build{-kconfig}.log file. Note the default kconfig (either random
   or not) will use the name build.log.
 - The output from kconfig is also saved as kconfig{-kconfig}.log.
 - The build and configure output is no longer part of the default
   gitlab output.

Signed-off-by: Roger Pau Monné 
---
I find the output from the gitlab tests hard to consume, as it's
thousands of lines long. With this change such output is split into
several smaller files, IMO easier to consume, and the default output
should make it also easier to identify exactly which step went wrong.
---
 automation/gitlab-ci/build.yaml |  3 ++-
 automation/scripts/build| 18 +++---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index 1e61d30c85..697fe7cc55 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -2,9 +2,10 @@
   stage: build
   image: registry.gitlab.com/xen-project/xen/${CONTAINER}
   script:
-- ./automation/scripts/build 2>&1 | tee build.log
+- ./automation/scripts/build 2>&1 | tee run.log
   artifacts:
 paths:
+  - logs/
   - binaries/
   - xen-config
   - '*.log'
diff --git a/automation/scripts/build b/automation/scripts/build
index 0cd0f3971d..704b428f86 100755
--- a/automation/scripts/build
+++ b/automation/scripts/build
@@ -8,11 +8,14 @@ cc-ver()
 $CC -dumpversion | awk -F. '{ printf "0x%02x%02x%02x", $1, $2, $3 }'
 }
 
+mkdir logs
+
 # random config or default config
 if [[ "${RANDCONFIG}" == "y" ]]; then
-make -C xen KCONFIG_ALLCONFIG=tools/kconfig/allrandom.config randconfig
+make -C xen KCONFIG_ALLCONFIG=tools/kconfig/allrandom.config randconfig \
+ > logs/kconfig.log 2>&1
 else
-make -C xen defconfig
+make -C xen defconfig > logs/kconfig.log 2>&1
 fi
 
 # build up our configure options
@@ -38,9 +41,9 @@ if [[ "${CC}" == "gcc" && `cc-ver` -lt 0x040600 ]]; then
 cfgargs+=("--with-system-seabios=/bin/false")
 fi
 
-./configure "${cfgargs[@]}"
+./configure "${cfgargs[@]}" > logs/configure.log 2>&1
 
-make -j$(nproc) dist
+make -j$(nproc) dist > logs/build.log 2>&1
 
 # Extract artifacts to avoid getting rewritten by customised builds
 cp xen/.config xen-config
@@ -58,8 +61,9 @@ esac
 cfg_dir="automation/configs/${arch}"
 for cfg in `ls ${cfg_dir}`; do
 echo "Building $cfg"
-make -j$(nproc) -C xen clean
+make -j$(nproc) -C xen clean > /dev/null 2>&1
 rm -f xen/.config
-make -C xen KBUILD_DEFCONFIG=../../../../${cfg_dir}/${cfg} 
XEN_CONFIG_EXPERT=y defconfig
-make -j$(nproc) -C xen XEN_CONFIG_EXPERT=y
+make -C xen KBUILD_DEFCONFIG=../../../../${cfg_dir}/${cfg} \
+ XEN_CONFIG_EXPERT=y defconfig > logs/kconfig-${cfg}.log 2>&1
+make -j$(nproc) -C xen XEN_CONFIG_EXPERT=y > logs/build-${cfg}.log 2>&1
 done
-- 
2.26.2

Re: [PATCH] x86/traps: fix an off-by-one error

2020-05-05 Thread Jan Beulich

On 05.05.2020 16:01, Hongyan Xia wrote:
> On Tue, 2020-05-05 at 15:38 +0200, Jan Beulich wrote:
>> On 05.05.2020 13:06, Hongyan Xia wrote:
>>> @@ -358,7 +359,7 @@ static void show_guest_stack(struct vcpu *v,
>>> const struct cpu_user_regs *regs)
>>>  if ( mask == PAGE_SIZE )
>>>  {
>>>  BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
>>> -unmap_domain_page(stack);
>>> +unmap_domain_page(stack_page);
>>>  }
>>
>> With this I think you want to change the whole construct here to
>>
>> if ( stack_page )
>> unmap_domain_page(stack_page);
>>
>> i.e. with the then no longer relevant BUILD_BUG_ON() also dropped.
> 
> I wonder if such a construct is better with UNMAP_DOMAIN_PAGE(), since
> it deals with NULL and will nullify it to prevent misuse.

In the case here I think I agree. For the future may I ask that
you wait with sending a new version until the discussion on the
previous one has settled?

Jan

Re: Backports to 4.13

2020-05-05 Thread Jan Beulich

On 05.05.2020 16:51, Ian Jackson wrote:
> Andrew Cooper writes ("Backports to 4.13"):
>> On the tools side of things, f50a4f6e244c aafae0e800e9 2a62c22715bf
>> d79cc6bc2bac 0729830cc425 are all bugs in CPUID and/or migration.  "Fix
>> HVM_PARAM_PAE_ENABLED handling" is only for 4.13, whereas all the others
>> are back to 4.7 (technically speaking).
> 
> Done.
> 
> There seem to have been quite few requests, but there were a lot of
> fixes tagged with Backport or Fixes.  I have now applied these (not
> just to 4.13 but to all the supported trees so back to 4.10, as
> applicable; plus one non-security bugfix I considered important enough
> to go to the security-supported 4.9).

Why back to 4.10? Only 4.12 and 4.13 are fully supported at this point,
afaict. Older trees should only get security fixes imo, with extremely
few exceptions.

Jan

Re: [PATCH] x86/pv: Fix Clang build with !CONFIG_PV32

2020-05-05 Thread Jan Beulich

On 05.05.2020 16:28, Andrew Cooper wrote:
> @@ -753,8 +751,9 @@ void load_system_tables(void)
>   _set_tssldt_desc(gdt + TSS_ENTRY, (unsigned long)tss,
>sizeof(*tss) - 1, SYS_DESC_tss_avail);
>   if ( IS_ENABLED(CONFIG_PV32) )
> - _set_tssldt_desc(compat_gdt + TSS_ENTRY, (unsigned long)tss,
> -  sizeof(*tss) - 1, SYS_DESC_tss_busy);
> + _set_tssldt_desc(
> + this_cpu(compat_gdt) - FIRST_RESERVED_GDT_ENTRY + 
> TSS_ENTRY,
> + (unsigned long)tss, sizeof(*tss) - 1, 
> SYS_DESC_tss_busy);

Isn't indentation here off by 4 compared to what we
normally do with extremely large argument expressions?
Other than this lgtm.

Jan

Re: Backports to 4.13

2020-05-05 Thread Ian Jackson

Andrew Cooper writes ("Backports to 4.13"):
> On the tools side of things, f50a4f6e244c aafae0e800e9 2a62c22715bf
> d79cc6bc2bac 0729830cc425 are all bugs in CPUID and/or migration.  "Fix
> HVM_PARAM_PAE_ENABLED handling" is only for 4.13, whereas all the others
> are back to 4.7 (technically speaking).

Done.

There seem to have been quite few requests, but there were a lot of
fixes tagged with Backport or Fixes.  I have now applied these (not
just to 4.13 but to all the supported trees so back to 4.10, as
applicable; plus one non-security bugfix I considered important enough
to go to the security-supported 4.9).

Ian.

Re: [PATCH 09/16] x86/cpu: Adjust enable_nmis() to be shadow stack compatible

2020-05-05 Thread Jan Beulich

On 02.05.2020 00:58, Andrew Cooper wrote:
> When executing an IRET-to-self, the shadow stack must agree with the regular
> stack.  We can't manipulate SSP directly, so have to fake a shadow IRET frame
> by executing 3 CALLs, then editing the result to look correct.
> 
> This is not a fastpath, is called on the BSP long before CET can be set up,
> and may be called on the crash path after CET is disabled.  Use the fact that
> INCSSP is allocated from the hint nop space to construct a test for CET being
> active which is safe on all processors.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 
albeit with a question which may make a further change necessary:

> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -544,17 +544,40 @@ static inline void enable_nmis(void)
>  {
>  unsigned long tmp;
>  
> -asm volatile ( "mov %%rsp, %[tmp] \n\t"
> -   "push %[ss]\n\t"
> -   "push %[tmp]   \n\t"
> -   "pushf \n\t"
> -   "push %[cs]\n\t"
> -   "lea 1f(%%rip), %[tmp] \n\t"
> -   "push %[tmp]   \n\t"
> -   "iretq; 1: \n\t"
> -   : [tmp] "=" (tmp)
> +asm volatile ( "mov %%rsp, %[rsp]\n\t"
> +   "lea.Ldone(%%rip), %[rip] \n\t"
> +#ifdef CONFIG_XEN_SHSTK
> +   /* Check for CET-SS being active. */
> +   "mov$1, %k[ssp]   \n\t"
> +   "rdsspq %[ssp]\n\t"
> +   "cmp$1, %k[ssp]   \n\t"
> +   "je .Lshstk_done  \n\t"
> +
> +   /* Push 3 words on the shadow stack */
> +   ".rept 3  \n\t"
> +   "call 1f; nop; 1: \n\t"
> +   ".endr\n\t"
> +
> +   /* Fixup to be an IRET shadow stack frame */
> +   "wrssq  %q[cs], -1*8(%[ssp])  \n\t"
> +   "wrssq  %[rip], -2*8(%[ssp])  \n\t"
> +   "wrssq  %[ssp], -3*8(%[ssp])  \n\t"
> +
> +   ".Lshstk_done:"
> +#endif
> +   /* Write an IRET regular frame */
> +   "push   %[ss] \n\t"
> +   "push   %[rsp]\n\t"
> +   "pushf\n\t"
> +   "push   %q[cs]\n\t"
> +   "push   %[rip]\n\t"
> +   "iretq\n\t"
> +   ".Ldone:  \n\t"
> +   : [rip] "=" (tmp),
> + [rsp] "=" (tmp),
> + [ssp] "=" (tmp)

Even after an hour of reading and searching through the gcc docs
I can't convince myself that this utilizes defined behavior. We
do tie multiple outputs to the same C variable elsewhere, yes,
but only in cases where we don't really care about the register,
or where the register is a fixed one anyway. What I can't find
is a clear statement that gcc wouldn't ever, now or in the
future, use the same register for all three outputs. I think one
can deduce this in certain ways, and experimentation also seems
to confirm it, but still.

Jan

Re: [PATCH] x86/pv: Fix Clang build with !CONFIG_PV32

2020-05-05 Thread Roger Pau Monné

On Tue, May 05, 2020 at 03:28:10PM +0100, Andrew Cooper wrote:
> Clang 3.5 doesn't do enough dead-code-elimination to drop the compat_gdt
> reference, resulting in a linker failure:
> 
>   hidden symbol `per_cpu__compat_gdt' isn't defined
> 
> Drop the local variable, and move evaluation of this_cpu(compat_gdt) to within
> the guarded region.
> 
> Reported-by: Roger Pau Monné 
> Signed-off-by: Andrew Cooper 

Tested-and-reviewed-by: Roger Pau Monné 

Thanks!

Re: [PATCH] xenbus: avoid stack overflow warning

2020-05-05 Thread Jürgen Groß


On 05.05.20 16:15, Arnd Bergmann wrote:

The __xenbus_map_ring() function has two large arrays, 'map' and
'unmap' on its stack. When clang decides to inline it into its caller,
xenbus_map_ring_valloc_hvm(), the total stack usage exceeds the warning
limit for stack size on 32-bit architectures.

drivers/xen/xenbus/xenbus_client.c:592:12: error: stack frame size of 1104 
bytes in function 'xenbus_map_ring_valloc_hvm' [-Werror,-Wframe-larger-than=]

As far as I can tell, other compilers don't inline it here, so we get
no warning, but the stack usage is actually the same. It is possible
for both arrays to use the same location on the stack, but the compiler
cannot prove that this is safe because they get passed to external
functions that may end up using them until they go out of scope.

Move the two arrays into separate basic blocks to limit the scope
and force them to occupy less stack in total, regardless of the
inlining decision.


Why don't you put both arrays into a union?


Juergen



Signed-off-by: Arnd Bergmann 
---
  drivers/xen/xenbus/xenbus_client.c | 74 +-
  1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c 
b/drivers/xen/xenbus/xenbus_client.c
index 040d2a43e8e3..23ca70378e36 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -470,54 +470,62 @@ static int __xenbus_map_ring(struct xenbus_device *dev,
 unsigned int flags,
 bool *leaked)
  {
-   struct gnttab_map_grant_ref map[XENBUS_MAX_RING_GRANTS];
-   struct gnttab_unmap_grant_ref unmap[XENBUS_MAX_RING_GRANTS];
int i, j;
int err = GNTST_okay;
  
-	if (nr_grefs > XENBUS_MAX_RING_GRANTS)

-   return -EINVAL;
+   {
+   struct gnttab_map_grant_ref map[XENBUS_MAX_RING_GRANTS];
  
-	for (i = 0; i < nr_grefs; i++) {

-   memset([i], 0, sizeof(map[i]));
-   gnttab_set_map_op([i], addrs[i], flags, gnt_refs[i],
- dev->otherend_id);
-   handles[i] = INVALID_GRANT_HANDLE;
-   }
+   if (nr_grefs > XENBUS_MAX_RING_GRANTS)
+   return -EINVAL;
  
-	gnttab_batch_map(map, i);

+   for (i = 0; i < nr_grefs; i++) {
+   memset([i], 0, sizeof(map[i]));
+   gnttab_set_map_op([i], addrs[i], flags,
+ gnt_refs[i], dev->otherend_id);
+   handles[i] = INVALID_GRANT_HANDLE;
+   }
+
+   gnttab_batch_map(map, i);
  
-	for (i = 0; i < nr_grefs; i++) {

-   if (map[i].status != GNTST_okay) {
-   err = map[i].status;
-   xenbus_dev_fatal(dev, map[i].status,
+   for (i = 0; i < nr_grefs; i++) {
+   if (map[i].status != GNTST_okay) {
+   err = map[i].status;
+   xenbus_dev_fatal(dev, map[i].status,
 "mapping in shared page %d from domain 
%d",
 gnt_refs[i], dev->otherend_id);
-   goto fail;
-   } else
-   handles[i] = map[i].handle;
+   goto fail;
+   } else
+   handles[i] = map[i].handle;
+   }
}
-
return GNTST_okay;
  
   fail:

-   for (i = j = 0; i < nr_grefs; i++) {
-   if (handles[i] != INVALID_GRANT_HANDLE) {
-   memset([j], 0, sizeof(unmap[j]));
-   gnttab_set_unmap_op([j], (phys_addr_t)addrs[i],
-   GNTMAP_host_map, handles[i]);
-   j++;
+   {
+   struct gnttab_unmap_grant_ref unmap[XENBUS_MAX_RING_GRANTS];
+
+   for (i = j = 0; i < nr_grefs; i++) {
+   if (handles[i] != INVALID_GRANT_HANDLE) {
+   memset([j], 0, sizeof(unmap[j]));
+   gnttab_set_unmap_op([j],
+   (phys_addr_t)addrs[i],
+   GNTMAP_host_map,
+   handles[i]);
+   j++;
+   }
}
-   }
  
-	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap, j))

-   BUG();
+   if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+ unmap, j))
+   BUG();
  
-	*leaked = false;

-   for (i = 0; i < j; i++) {
-   if (unmap[i].status != GNTST_okay) {
-   *leaked = true;
-   break;
+   *leaked = false;
+

[PATCH] x86/pv: Fix Clang build with !CONFIG_PV32

2020-05-05 Thread Andrew Cooper

Clang 3.5 doesn't do enough dead-code-elimination to drop the compat_gdt
reference, resulting in a linker failure:

  hidden symbol `per_cpu__compat_gdt' isn't defined

Drop the local variable, and move evaluation of this_cpu(compat_gdt) to within
the guarded region.

Reported-by: Roger Pau Monné 
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
---
 xen/arch/x86/cpu/common.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 131ff03fcf..63f3893c7a 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -711,8 +711,6 @@ void load_system_tables(void)
struct tss64 *tss = _cpu(tss_page).tss;
seg_desc_t *gdt =
this_cpu(gdt) - FIRST_RESERVED_GDT_ENTRY;
-   seg_desc_t *compat_gdt =
-   this_cpu(compat_gdt) - FIRST_RESERVED_GDT_ENTRY;
 
const struct desc_ptr gdtr = {
.base = (unsigned long)gdt,
@@ -753,8 +751,9 @@ void load_system_tables(void)
_set_tssldt_desc(gdt + TSS_ENTRY, (unsigned long)tss,
 sizeof(*tss) - 1, SYS_DESC_tss_avail);
if ( IS_ENABLED(CONFIG_PV32) )
-   _set_tssldt_desc(compat_gdt + TSS_ENTRY, (unsigned long)tss,
-sizeof(*tss) - 1, SYS_DESC_tss_busy);
+   _set_tssldt_desc(
+   this_cpu(compat_gdt) - FIRST_RESERVED_GDT_ENTRY + 
TSS_ENTRY,
+   (unsigned long)tss, sizeof(*tss) - 1, 
SYS_DESC_tss_busy);
 
per_cpu(full_gdt_loaded, cpu) = false;
lgdt();
-- 
2.11.0

RE: [PATCH v8 06/12] x86/HVM: make hvmemul_blk() capable of handling r/o operations

2020-05-05 Thread Paul Durrant

> -Original Message-
> From: Jan Beulich 
> Sent: 05 May 2020 09:15
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper ; Wei Liu ; Roger 
> Pau Monne
> ; Paul Durrant 
> Subject: [PATCH v8 06/12] x86/HVM: make hvmemul_blk() capable of handling r/o 
> operations
> 
> In preparation for handling e.g. FLDENV or {F,FX,X}RSTOR here as well.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Paul Durrant

[PATCH v2] x86/traps: fix an off-by-one error

2020-05-05 Thread Hongyan Xia

From: Hongyan Xia 

stack++ can go into the next page and unmap_domain_page() will unmap the
wrong one, causing mapcache and memory corruption. Fix.

Signed-off-by: Hongyan Xia 

---
Changed in v2:
- tweak how the unmap is handled.
- fix the bug in compat as well.
- remove part of the commit message which was not accurate.
---
 xen/arch/x86/traps.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 33e5d21ece..73c6218660 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -236,6 +236,7 @@ static void compat_show_guest_stack(struct vcpu *v,
 int debug_stack_lines)
 {
 unsigned int i, *stack, addr, mask = STACK_SIZE;
+void *stack_page = NULL;
 
 stack = (unsigned int *)(unsigned long)regs->esp;
 printk("Guest stack trace from esp=%08lx:\n ", (unsigned long)stack);
@@ -258,7 +259,7 @@ static void compat_show_guest_stack(struct vcpu *v,
 break;
 if ( !vcpu )
 {
-stack = do_page_walk(v, (unsigned long)stack);
+stack_page = stack = do_page_walk(v, (unsigned long)stack);
 if ( (unsigned long)stack < PAGE_SIZE )
 {
 printk("Inaccessible guest memory.\n");
@@ -285,11 +286,9 @@ static void compat_show_guest_stack(struct vcpu *v,
 printk(" %08x", addr);
 stack++;
 }
-if ( mask == PAGE_SIZE )
-{
-BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
-unmap_domain_page(stack);
-}
+
+UNMAP_DOMAIN_PAGE(stack_page);
+
 if ( i == 0 )
 printk("Stack empty.");
 printk("\n");
@@ -300,6 +299,7 @@ static void show_guest_stack(struct vcpu *v, const struct 
cpu_user_regs *regs)
 int i;
 unsigned long *stack, addr;
 unsigned long mask = STACK_SIZE;
+void *stack_page = NULL;
 
 /* Avoid HVM as we don't know what the stack looks like. */
 if ( is_hvm_vcpu(v) )
@@ -328,7 +328,7 @@ static void show_guest_stack(struct vcpu *v, const struct 
cpu_user_regs *regs)
 vcpu = maddr_get_owner(read_cr3()) == v->domain ? v : NULL;
 if ( !vcpu )
 {
-stack = do_page_walk(v, (unsigned long)stack);
+stack_page = stack = do_page_walk(v, (unsigned long)stack);
 if ( (unsigned long)stack < PAGE_SIZE )
 {
 printk("Inaccessible guest memory.\n");
@@ -355,11 +355,9 @@ static void show_guest_stack(struct vcpu *v, const struct 
cpu_user_regs *regs)
 printk(" %p", _p(addr));
 stack++;
 }
-if ( mask == PAGE_SIZE )
-{
-BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
-unmap_domain_page(stack);
-}
+
+UNMAP_DOMAIN_PAGE(stack_page);
+
 if ( i == 0 )
 printk("Stack empty.");
 printk("\n");
-- 
2.17.1

[PATCH] xenbus: avoid stack overflow warning

2020-05-05 Thread Arnd Bergmann

The __xenbus_map_ring() function has two large arrays, 'map' and
'unmap' on its stack. When clang decides to inline it into its caller,
xenbus_map_ring_valloc_hvm(), the total stack usage exceeds the warning
limit for stack size on 32-bit architectures.

drivers/xen/xenbus/xenbus_client.c:592:12: error: stack frame size of 1104 
bytes in function 'xenbus_map_ring_valloc_hvm' [-Werror,-Wframe-larger-than=]

As far as I can tell, other compilers don't inline it here, so we get
no warning, but the stack usage is actually the same. It is possible
for both arrays to use the same location on the stack, but the compiler
cannot prove that this is safe because they get passed to external
functions that may end up using them until they go out of scope.

Move the two arrays into separate basic blocks to limit the scope
and force them to occupy less stack in total, regardless of the
inlining decision.

Signed-off-by: Arnd Bergmann 
---
 drivers/xen/xenbus/xenbus_client.c | 74 +-
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c 
b/drivers/xen/xenbus/xenbus_client.c
index 040d2a43e8e3..23ca70378e36 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -470,54 +470,62 @@ static int __xenbus_map_ring(struct xenbus_device *dev,
 unsigned int flags,
 bool *leaked)
 {
-   struct gnttab_map_grant_ref map[XENBUS_MAX_RING_GRANTS];
-   struct gnttab_unmap_grant_ref unmap[XENBUS_MAX_RING_GRANTS];
int i, j;
int err = GNTST_okay;
 
-   if (nr_grefs > XENBUS_MAX_RING_GRANTS)
-   return -EINVAL;
+   {
+   struct gnttab_map_grant_ref map[XENBUS_MAX_RING_GRANTS];
 
-   for (i = 0; i < nr_grefs; i++) {
-   memset([i], 0, sizeof(map[i]));
-   gnttab_set_map_op([i], addrs[i], flags, gnt_refs[i],
- dev->otherend_id);
-   handles[i] = INVALID_GRANT_HANDLE;
-   }
+   if (nr_grefs > XENBUS_MAX_RING_GRANTS)
+   return -EINVAL;
 
-   gnttab_batch_map(map, i);
+   for (i = 0; i < nr_grefs; i++) {
+   memset([i], 0, sizeof(map[i]));
+   gnttab_set_map_op([i], addrs[i], flags,
+ gnt_refs[i], dev->otherend_id);
+   handles[i] = INVALID_GRANT_HANDLE;
+   }
+
+   gnttab_batch_map(map, i);
 
-   for (i = 0; i < nr_grefs; i++) {
-   if (map[i].status != GNTST_okay) {
-   err = map[i].status;
-   xenbus_dev_fatal(dev, map[i].status,
+   for (i = 0; i < nr_grefs; i++) {
+   if (map[i].status != GNTST_okay) {
+   err = map[i].status;
+   xenbus_dev_fatal(dev, map[i].status,
 "mapping in shared page %d from domain 
%d",
 gnt_refs[i], dev->otherend_id);
-   goto fail;
-   } else
-   handles[i] = map[i].handle;
+   goto fail;
+   } else
+   handles[i] = map[i].handle;
+   }
}
-
return GNTST_okay;
 
  fail:
-   for (i = j = 0; i < nr_grefs; i++) {
-   if (handles[i] != INVALID_GRANT_HANDLE) {
-   memset([j], 0, sizeof(unmap[j]));
-   gnttab_set_unmap_op([j], (phys_addr_t)addrs[i],
-   GNTMAP_host_map, handles[i]);
-   j++;
+   {
+   struct gnttab_unmap_grant_ref unmap[XENBUS_MAX_RING_GRANTS];
+
+   for (i = j = 0; i < nr_grefs; i++) {
+   if (handles[i] != INVALID_GRANT_HANDLE) {
+   memset([j], 0, sizeof(unmap[j]));
+   gnttab_set_unmap_op([j],
+   (phys_addr_t)addrs[i],
+   GNTMAP_host_map,
+   handles[i]);
+   j++;
+   }
}
-   }
 
-   if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap, j))
-   BUG();
+   if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+ unmap, j))
+   BUG();
 
-   *leaked = false;
-   for (i = 0; i < j; i++) {
-   if (unmap[i].status != GNTST_okay) {
-   *leaked = true;
-   break;
+   *leaked = false;
+   for (i = 0; i < j; i++) {
+   if (unmap[i].status !=

Re: [PATCH 1/3] x86/mm: do not attempt to convert _PAGE_GNTTAB to a boolean

2020-05-05 Thread Roger Pau Monné

On Tue, May 05, 2020 at 03:47:43PM +0200, Jan Beulich wrote:
> [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments 
> unless you have verified the sender and know the content is safe.
> 
> On 05.05.2020 11:24, Roger Pau Monne wrote:
> > Clang 10 complains with:
> > 
> > mm.c:1239:10: error: converting the result of '<<' to a boolean always 
> > evaluates to true
> >   [-Werror,-Wtautological-constant-compare]
> > if ( _PAGE_GNTTAB && (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
> >  ^
> > xen/include/asm/x86_64/page.h:161:25: note: expanded from macro 
> > '_PAGE_GNTTAB'
> > #define _PAGE_GNTTAB (1U<<22)
> > ^
> 
> This is a rather odd warning. Do they also warn for "if ( 0 )"
> or "do { } while ( 0 )", as we use in various places? There's
> no difference to me between a plain number and a constant
> composed via an expression.

Using plain 0 is fine, they just seem to dislike using << for some
reason that escapes me. Seems like it might be useful to catch bugs
where || is wrongly used instead of | when setting flags, ie:

https://github.com/haproxy/haproxy/issues/588

> 
> > Remove the conversion of _PAGE_GNTTAB to a boolean, since the and
> > operation performed afterwards will already return false if the value
> > of the macro is 0.
> 
> I'm sorry, but no. The expression was put there on purpose by
> 0932210ac095 ("x86: Address "Bitwise-and with zero
> CONSTANT_EXPRESSION_RESULT" Coverity issues"), and the
> description there is clearly telling us that this wants to stay
> unless Coverity changed in the meantime. Otherwise I'm afraid
> a more elaborate solution will be needed to please both.

Clang is fine with changing this to _PAGE_GNTTAB != 0. Would you be
OK with this approach?

> Or a
> more simplistic one, like using "#if _PAGE_GNTTAB" around the
> construct.

Yes, that's the other solution I had in mind.

Thanks, Roger.

Re: xen-4.13 tools/xentop.c backport request

2020-05-05 Thread Ian Jackson

Sander Eikelenboom writes ("xen-4.13 tools/xentop.c backport request"):
> If I'm not mistaken you do the tools backports.
> 
> I just noticed that the problem that is fixed by commit: 
> 4b5b431edd984b26f43b3efc7de465f3560a949e tools/xentop: Fix calculation of 
> used memory
> 
> is already present in the xen-4.13 branch (older releases are uneffected).
> Unfortunately I didn't check before, so I didn't include a "backport tag".
> 
> If it wasn't already on you backport list, please consider to backport / 
> apply this one.

It wasn't.  I have done this.

Ian.

Re: [PATCH] x86/traps: fix an off-by-one error

2020-05-05 Thread Hongyan Xia

On Tue, 2020-05-05 at 15:38 +0200, Jan Beulich wrote:
> On 05.05.2020 13:06, Hongyan Xia wrote:
> > --- a/xen/arch/x86/traps.c
> > +++ b/xen/arch/x86/traps.c
> > @@ -300,6 +300,7 @@ static void show_guest_stack(struct vcpu *v,
> > const struct cpu_user_regs *regs)
> >  int i;
> >  unsigned long *stack, addr;
> >  unsigned long mask = STACK_SIZE;
> > +void *stack_page = NULL;
> >  
> >  /* Avoid HVM as we don't know what the stack looks like. */
> >  if ( is_hvm_vcpu(v) )
> > @@ -328,7 +329,7 @@ static void show_guest_stack(struct vcpu *v,
> > const struct cpu_user_regs *regs)
> >  vcpu = maddr_get_owner(read_cr3()) == v->domain ? v :
> > NULL;
> >  if ( !vcpu )
> >  {
> > -stack = do_page_walk(v, (unsigned long)stack);
> > +stack_page = stack = do_page_walk(v, (unsigned
> > long)stack);
> >  if ( (unsigned long)stack < PAGE_SIZE )
> >  {
> >  printk("Inaccessible guest memory.\n");
> > @@ -358,7 +359,7 @@ static void show_guest_stack(struct vcpu *v,
> > const struct cpu_user_regs *regs)
> >  if ( mask == PAGE_SIZE )
> >  {
> >  BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
> > -unmap_domain_page(stack);
> > +unmap_domain_page(stack_page);
> >  }
> 
> With this I think you want to change the whole construct here to
> 
> if ( stack_page )
> unmap_domain_page(stack_page);
> 
> i.e. with the then no longer relevant BUILD_BUG_ON() also dropped.

I wonder if such a construct is better with UNMAP_DOMAIN_PAGE(), since
it deals with NULL and will nullify it to prevent misuse.

> What's more important though - please also fix the same issue in
> compat_show_guest_stack(). Unless I'm mistaken of course, in which
> case it would be nice if the description could mention why the
> other similar function isn't affected.

Compat suffers from the same problem. Thanks for pointing that out.

Hongyan

RE: [PATCH v2 02/10] xen: Fix and improve handling of device_add usb-host errors

2020-05-05 Thread Paul Durrant

> -Original Message-
> From: Markus Armbruster 
> Sent: 05 May 2020 11:19
> To: qemu-de...@nongnu.org
> Cc: Stefano Stabellini ; Anthony Perard 
> ; Paul
> Durrant ; Gerd Hoffmann ; 
> xen-devel@lists.xenproject.org
> Subject: [PATCH v2 02/10] xen: Fix and improve handling of device_add 
> usb-host errors
> 
> usbback_portid_add() leaks the error when qdev_device_add() fails.
> Fix that.  While there, use the error to improve the error message.
> 
> The qemu_opts_from_qdict() similarly leaks on failure.  But any
> failure there is a programming error.  Pass _abort.
> 
> Fixes: 816ac92ef769f9ffc534e49a1bb6177bddce7aa2
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Paul Durrant 
> Cc: Gerd Hoffmann 
> Cc: xen-devel@lists.xenproject.org
> Signed-off-by: Markus Armbruster 

Acked-by: Paul Durrant

[xen-unstable-smoke test] 150024: trouble: blocked/broken

2020-05-05 Thread osstest service owner

flight 150024 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/150024/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64  broken
 build-arm64-xsm  broken
 build-armhf  broken
 build-arm64-xsm   4 host-install(4)broken REGR. vs. 149888
 build-amd64   4 host-install(4)broken REGR. vs. 149888
 build-armhf   4 host-install(4)broken REGR. vs. 149888

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a

version targeted for testing:
 xen  fe36a173d110fd792f5e337e208a5ed714df1536
baseline version:
 xen  0135be8bd8cd60090298f02310691b688d95c3a8

Last test of basis   149888  2020-04-30 09:00:53 Z5 days
Testing same since   149986  2020-05-05 01:24:46 Z0 days2 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Roger Pau Monné 
  Wei Liu 

jobs:
 build-arm64-xsm  broken  
 build-amd64  broken  
 build-armhf  broken  
 build-amd64-libvirt  blocked 
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  blocked 
 test-amd64-amd64-xl-qemuu-debianhvm-amd64blocked 
 test-amd64-amd64-libvirt blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job build-amd64 broken
broken-job build-arm64-xsm broken
broken-job build-armhf broken
broken-step build-arm64-xsm host-install(4)
broken-step build-amd64 host-install(4)
broken-step build-armhf host-install(4)

Not pushing.


commit fe36a173d110fd792f5e337e208a5ed714df1536
Author: Andrew Cooper 
Date:   Thu Apr 30 10:47:14 2020 +0100

x86/amd: Initial support for Fam19h processors

Fam19h is very similar to Fam17h in these regards.

Signed-off-by: Andrew Cooper 
Reviewed-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 

commit 0c9751b53c2ee135fd484a03fd47f3bb5fbe63b8
Author: Jan Beulich 
Date:   Mon May 4 11:54:35 2020 +0200

x86/HyperV: correct hv_hcall_page for xen.efi build

Along the lines of what the not reverted part of 3c4b2eef4941 ("x86:
refine link time stub area related assertion") did, we need to transform
the absolute HV_HCALL_PAGE into the image base relative hv_hcall_page
(or else there'd be no need for two distinct symbols). Otherwise
mkreloc, as used for generating the base relocations of xen.efi, will
spit out warnings like "Difference at .text:0009b74f is 0xc000
(expected 0x4000)". As long as the offending relocations are PC
relative ones, the generated binary is correct afaict, but if there ever
was the absolute address stored, xen.efi would miss a fixup for it.

Reported-by: Andrew Cooper 
Signed-off-by: Jan Beulich 
Acked-by: Wei Liu 

commit b0f666c569b8af6a51ab8aeec3664d6acd1abee9
Author: Jan Beulich 
Date:   Mon May 4 11:53:42 2020 +0200

x86/EFI: correct section offsets in mkreloc diagnostics

These are more helpful if they point at the address where the relocated
value starts, rather than at the specific byte of the difference.

Signed-off-by: Jan Beulich 
Acked-by: Andrew Cooper 

commit 17b997aa1edb9eb8d9bd1958457ff50927f46832
Author: Roger Pau Monné 
Date:   Mon May 4 11:53:01 2020 +0200

x86/hap: be more selective with assisted TLB flush

When doing an assisted flush on HAP the purpose of the
on_selected_cpus is just to trigger a vmexit on remote CPUs that are
in guest context, and hence just using is_vcpu_dirty_cpu is too lax,

Re: [PATCH] x86/pv: Compile out emul-gate-op in !CONFIG_PV32 builds

2020-05-05 Thread Jan Beulich

On 05.05.2020 13:35, Andrew Cooper wrote:
> The caller is already guarded by is_pv_32bit_vcpu().
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich

Re: [PATCH] x86/pv: Prune include lists

2020-05-05 Thread Jan Beulich

On 05.05.2020 13:42, Andrew Cooper wrote:
> Several of these in particular haven't been pruned since the logic was all
> part of arch/x86/traps.c
> 
> Some adjustments to header files are required to avoid compile errors:
>  * emulate.h needs xen/sched.h because gdt_ldt_desc_ptr() uses v->vcpu_id.
>  * mmconfig.h needs to forward declare acpi_table_header.
>  * shadow.h and trace.h need to have uint*_t in scope before including the Xen
>public headers.  For shadow.h, reorder the includes.  For trace.h, include
>types.h
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Jan Beulich

Re: [PATCH 1/3] x86/mm: do not attempt to convert _PAGE_GNTTAB to a boolean

2020-05-05 Thread Jan Beulich

On 05.05.2020 11:24, Roger Pau Monne wrote:
> Clang 10 complains with:
> 
> mm.c:1239:10: error: converting the result of '<<' to a boolean always 
> evaluates to true
>   [-Werror,-Wtautological-constant-compare]
> if ( _PAGE_GNTTAB && (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
>  ^
> xen/include/asm/x86_64/page.h:161:25: note: expanded from macro '_PAGE_GNTTAB'
> #define _PAGE_GNTTAB (1U<<22)
> ^

This is a rather odd warning. Do they also warn for "if ( 0 )"
or "do { } while ( 0 )", as we use in various places? There's
no difference to me between a plain number and a constant
composed via an expression.

> Remove the conversion of _PAGE_GNTTAB to a boolean, since the and
> operation performed afterwards will already return false if the value
> of the macro is 0.

I'm sorry, but no. The expression was put there on purpose by
0932210ac095 ("x86: Address "Bitwise-and with zero
CONSTANT_EXPRESSION_RESULT" Coverity issues"), and the
description there is clearly telling us that this wants to stay
unless Coverity changed in the meantime. Otherwise I'm afraid
a more elaborate solution will be needed to please both. Or a
more simplistic one, like using "#if _PAGE_GNTTAB" around the
construct.

Jan

Re: [PATCH] x86/traps: fix an off-by-one error

2020-05-05 Thread Jan Beulich

On 05.05.2020 13:06, Hongyan Xia wrote:
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -300,6 +300,7 @@ static void show_guest_stack(struct vcpu *v, const struct 
> cpu_user_regs *regs)
>  int i;
>  unsigned long *stack, addr;
>  unsigned long mask = STACK_SIZE;
> +void *stack_page = NULL;
>  
>  /* Avoid HVM as we don't know what the stack looks like. */
>  if ( is_hvm_vcpu(v) )
> @@ -328,7 +329,7 @@ static void show_guest_stack(struct vcpu *v, const struct 
> cpu_user_regs *regs)
>  vcpu = maddr_get_owner(read_cr3()) == v->domain ? v : NULL;
>  if ( !vcpu )
>  {
> -stack = do_page_walk(v, (unsigned long)stack);
> +stack_page = stack = do_page_walk(v, (unsigned long)stack);
>  if ( (unsigned long)stack < PAGE_SIZE )
>  {
>  printk("Inaccessible guest memory.\n");
> @@ -358,7 +359,7 @@ static void show_guest_stack(struct vcpu *v, const struct 
> cpu_user_regs *regs)
>  if ( mask == PAGE_SIZE )
>  {
>  BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
> -unmap_domain_page(stack);
> +unmap_domain_page(stack_page);
>  }

With this I think you want to change the whole construct here to

if ( stack_page )
unmap_domain_page(stack_page);

i.e. with the then no longer relevant BUILD_BUG_ON() also dropped.

What's more important though - please also fix the same issue in
compat_show_guest_stack(). Unless I'm mistaken of course, in which
case it would be nice if the description could mention why the
other similar function isn't affected.

Jan

Re: [PATCH] x86/pv: Compile out emul-gate-op in !CONFIG_PV32 builds

2020-05-05 Thread Roger Pau Monné

On Tue, May 05, 2020 at 12:35:37PM +0100, Andrew Cooper wrote:
> The caller is already guarded by is_pv_32bit_vcpu().
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.

Re: [PATCH v5] docs/designs: re-work the xenstore migration document...

2020-05-05 Thread Jürgen Groß


On 05.05.20 15:01, Julien Grall wrote:

Hi Paul,

On 28/04/2020 16:06, Paul Durrant wrote:

From: Paul Durrant 

... to specify a separate migration stream that will also be suitable for
live update.

The original scope of the document was to support non-cooperative 
migration
of guests [1] but, since then, live update of xenstored has been 
brought into

scope. Thus it makes more sense to define a separate image format for
serializing xenstore state that is suitable for both purposes.

The document has been limited to specifying a new image format. The 
mechanism
for acquiring the image for live update or migration is not covered as 
that
is more appropriately dealt with by a patch to docs/misc/xenstore.txt. 
It is
also expected that, when the first implementation of live update or 
migration
making use of this specification is committed, that the document is 
moved from

docs/designs into docs/specs.

NOTE: It will only be necessary to save and restore state for active 
xenstore
   connections, but the documentation for 'RESUME' in xenstore.txt 
implies
   otherwise. That command is unused so this patch deletes it from 
the

   specification.


Could someone from Citrix please verify that XAPI isn't using XS_RESUME?


Juergen

Re: [PATCH v5] docs/designs: re-work the xenstore migration document...

2020-05-05 Thread Julien Grall


Hi Paul,

On 28/04/2020 16:06, Paul Durrant wrote:

From: Paul Durrant 

... to specify a separate migration stream that will also be suitable for
live update.

The original scope of the document was to support non-cooperative migration
of guests [1] but, since then, live update of xenstored has been brought into
scope. Thus it makes more sense to define a separate image format for
serializing xenstore state that is suitable for both purposes.

The document has been limited to specifying a new image format. The mechanism
for acquiring the image for live update or migration is not covered as that
is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
also expected that, when the first implementation of live update or migration
making use of this specification is committed, that the document is moved from
docs/designs into docs/specs.

NOTE: It will only be necessary to save and restore state for active xenstore
   connections, but the documentation for 'RESUME' in xenstore.txt implies
   otherwise. That command is unused so this patch deletes it from the
   specification.

[1] See 
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md

Signed-off-by: Paul Durrant 
Reviewed-by: Juergen Gross 


Acked-by: Julien Grall 

I will wait until tomorrow afternoon (BST) to give an opportunity to 
other to comment.


Cheers,

--
Julien Grall

Re: [PATCH v8 07/12] x86emul: support FNSTENV and FNSAVE

2020-05-05 Thread Jan Beulich

On 05.05.2020 10:15, Jan Beulich wrote:
> @@ -11542,6 +11611,12 @@ int x86_emul_blk(
>  switch ( state->blk )
>  {
>  bool zf;
> +struct {
> +struct x87_env32 env;
> +struct {
> +   uint8_t bytes[10];
> +} freg[8];
> +} fpstate;

This also needs #ifndef X86EMUL_NO_FPU around it for !HVM builds
to work.

Jan

[PATCH] x86/pv: Prune include lists

2020-05-05 Thread Andrew Cooper

Several of these in particular haven't been pruned since the logic was all
part of arch/x86/traps.c

Some adjustments to header files are required to avoid compile errors:
 * emulate.h needs xen/sched.h because gdt_ldt_desc_ptr() uses v->vcpu_id.
 * mmconfig.h needs to forward declare acpi_table_header.
 * shadow.h and trace.h need to have uint*_t in scope before including the Xen
   public headers.  For shadow.h, reorder the includes.  For trace.h, include
   types.h

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
---
 xen/arch/x86/pv/callback.c  |  5 -
 xen/arch/x86/pv/emul-gate-op.c  | 19 ---
 xen/arch/x86/pv/emul-inv-op.c   | 18 --
 xen/arch/x86/pv/emul-priv-op.c  |  9 +
 xen/arch/x86/pv/emulate.h   |  2 ++
 xen/arch/x86/pv/ro-page-fault.c |  8 
 xen/arch/x86/pv/shim.c  |  3 ---
 xen/arch/x86/x86_64/mmconfig.h  |  1 +
 xen/include/asm-x86/shadow.h|  3 ++-
 xen/include/xen/trace.h |  1 +
 10 files changed, 7 insertions(+), 62 deletions(-)

diff --git a/xen/arch/x86/pv/callback.c b/xen/arch/x86/pv/callback.c
index 106c16ed01..97242cd3d4 100644
--- a/xen/arch/x86/pv/callback.c
+++ b/xen/arch/x86/pv/callback.c
@@ -19,15 +19,10 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 
-#include 
-#include 
 #include 
-#include 
 
 #include 
 
diff --git a/xen/arch/x86/pv/emul-gate-op.c b/xen/arch/x86/pv/emul-gate-op.c
index 3c7f6d70bc..61e65ce521 100644
--- a/xen/arch/x86/pv/emul-gate-op.c
+++ b/xen/arch/x86/pv/emul-gate-op.c
@@ -20,25 +20,6 @@
  */
 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
 
 #include "emulate.h"
 
diff --git a/xen/arch/x86/pv/emul-inv-op.c b/xen/arch/x86/pv/emul-inv-op.c
index 91d05790c2..59e3edc8c4 100644
--- a/xen/arch/x86/pv/emul-inv-op.c
+++ b/xen/arch/x86/pv/emul-inv-op.c
@@ -19,26 +19,8 @@
  * along with this program; If not, see .
  */
 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
 #include "emulate.h"
 
 static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index e24b84f46a..3b705299cf 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -19,25 +19,18 @@
  * along with this program; If not, see .
  */
 
-#include 
+#include 
 #include 
 #include 
 #include 
-#include 
-#include 
 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 #include 
-#include 
-#include 
 
 #include 
 
diff --git a/xen/arch/x86/pv/emulate.h b/xen/arch/x86/pv/emulate.h
index fd2aa0a484..4b845b08e3 100644
--- a/xen/arch/x86/pv/emulate.h
+++ b/xen/arch/x86/pv/emulate.h
@@ -1,6 +1,8 @@
 #ifndef __PV_EMULATE_H__
 #define __PV_EMULATE_H__
 
+#include 
+
 #include 
 #include 
 
diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
index a920fb5e15..0eedb70002 100644
--- a/xen/arch/x86/pv/ro-page-fault.c
+++ b/xen/arch/x86/pv/ro-page-fault.c
@@ -20,15 +20,7 @@
  * along with this program; If not, see .
  */
 
-#include 
-#include 
-#include 
 #include 
-
-#include 
-#include 
-#include 
-#include 
 #include 
 
 #include "emulate.h"
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 31264582cc..3a0525c209 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -25,14 +25,11 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-#include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
diff --git a/xen/arch/x86/x86_64/mmconfig.h b/xen/arch/x86/x86_64/mmconfig.h
index 2e836848ad..4d3b9fcbdd 100644
--- a/xen/arch/x86/x86_64/mmconfig.h
+++ b/xen/arch/x86/x86_64/mmconfig.h
@@ -75,6 +75,7 @@ static inline void mmio_config_writel(void __iomem *pos, u32 
val)
 }
 
 /* function prototypes */
+struct acpi_table_header;
 int acpi_parse_mcfg(struct acpi_table_header *header);
 int pci_mmcfg_reserved(uint64_t address, unsigned int segment,
unsigned int start_bus, unsigned int end_bus,
diff --git a/xen/include/asm-x86/shadow.h b/xen/include/asm-x86/shadow.h
index 907c71f497..8335862c87 100644
--- a/xen/include/asm-x86/shadow.h
+++ b/xen/include/asm-x86/shadow.h
@@ -22,7 +22,6 @@
 #ifndef _XEN_SHADOW_H
 #define _XEN_SHADOW_H
 
-#include 
 #include 
 #include 
 #include 
@@ -31,6 +30,8 @@
 #include 
 #include 
 
+#include 
+
 /*
  * Macros to tell which shadow paging mode a domain is in*/
 
diff --git a/xen/include/xen/trace.h b/xen/include/xen/trace.h

[PATCH] x86/pv: Compile out emul-gate-op in !CONFIG_PV32 builds

2020-05-05 Thread Andrew Cooper

The caller is already guarded by is_pv_32bit_vcpu().

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
---
 xen/arch/x86/pv/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index cf28434ba9..75b01b0062 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -2,7 +2,7 @@ obj-y += callback.o
 obj-y += descriptor-tables.o
 obj-y += domain.o
 obj-y += emulate.o
-obj-y += emul-gate-op.o
+obj-$(CONFIG_PV32) += emul-gate-op.o
 obj-y += emul-inv-op.o
 obj-y += emul-priv-op.o
 obj-$(CONFIG_GRANT_TABLE) += grant_table.o
-- 
2.11.0

[PATCH] x86/traps: fix an off-by-one error

2020-05-05 Thread Hongyan Xia

From: Hongyan Xia 

stack++ can go into the next page and unmap_domain_page() will unmap the
wrong one, causing mapcache and memory corruption. Fix.

This is found with direct map removal. For now, the idle domain does not
have a mapcache and uses the direct map, so no errors will occur.

Signed-off-by: Hongyan Xia 
---
 xen/arch/x86/traps.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 33e5d21ece..f033a804a3 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -300,6 +300,7 @@ static void show_guest_stack(struct vcpu *v, const struct 
cpu_user_regs *regs)
 int i;
 unsigned long *stack, addr;
 unsigned long mask = STACK_SIZE;
+void *stack_page = NULL;
 
 /* Avoid HVM as we don't know what the stack looks like. */
 if ( is_hvm_vcpu(v) )
@@ -328,7 +329,7 @@ static void show_guest_stack(struct vcpu *v, const struct 
cpu_user_regs *regs)
 vcpu = maddr_get_owner(read_cr3()) == v->domain ? v : NULL;
 if ( !vcpu )
 {
-stack = do_page_walk(v, (unsigned long)stack);
+stack_page = stack = do_page_walk(v, (unsigned long)stack);
 if ( (unsigned long)stack < PAGE_SIZE )
 {
 printk("Inaccessible guest memory.\n");
@@ -358,7 +359,7 @@ static void show_guest_stack(struct vcpu *v, const struct 
cpu_user_regs *regs)
 if ( mask == PAGE_SIZE )
 {
 BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
-unmap_domain_page(stack);
+unmap_domain_page(stack_page);
 }
 if ( i == 0 )
 printk("Stack empty.");
-- 
2.17.1

[PATCH v2 02/10] xen: Fix and improve handling of device_add usb-host errors

2020-05-05 Thread Markus Armbruster

usbback_portid_add() leaks the error when qdev_device_add() fails.
Fix that.  While there, use the error to improve the error message.

The qemu_opts_from_qdict() similarly leaks on failure.  But any
failure there is a programming error.  Pass _abort.

Fixes: 816ac92ef769f9ffc534e49a1bb6177bddce7aa2
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: Gerd Hoffmann 
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Markus Armbruster 
---
 hw/usb/xen-usb.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
index 961190d0f7..4d266d7bb4 100644
--- a/hw/usb/xen-usb.c
+++ b/hw/usb/xen-usb.c
@@ -30,6 +30,7 @@
 #include "hw/usb.h"
 #include "hw/xen/xen-legacy-backend.h"
 #include "monitor/qdev.h"
+#include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
 
@@ -755,13 +756,16 @@ static void usbback_portid_add(struct usbback_info 
*usbif, unsigned port,
 qdict_put_int(qdict, "port", port);
 qdict_put_int(qdict, "hostbus", atoi(busid));
 qdict_put_str(qdict, "hostport", portname);
-opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, _err);
-if (local_err) {
-goto err;
-}
+opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict,
+_abort);
 usbif->ports[port - 1].dev = USB_DEVICE(qdev_device_add(opts, _err));
 if (!usbif->ports[port - 1].dev) {
-goto err;
+qobject_unref(qdict);
+xen_pv_printf(>xendev, 0,
+  "device %s could not be opened: %s\n",
+  busid, error_get_pretty(local_err));
+error_free(local_err);
+return;
 }
 qobject_unref(qdict);
 speed = usbif->ports[port - 1].dev->speed;
@@ -793,11 +797,6 @@ static void usbback_portid_add(struct usbback_info *usbif, 
unsigned port,
 usbback_hotplug_enq(usbif, port);
 
 TR_BUS(>xendev, "port %d attached\n", port);
-return;
-
-err:
-qobject_unref(qdict);
-xen_pv_printf(>xendev, 0, "device %s could not be opened\n", busid);
 }
 
 static void usbback_process_port(struct usbback_info *usbif, unsigned port)
-- 
2.21.1

[PATCH 0/3] build: fixes for clang 10

2020-05-05 Thread Roger Pau Monne

Hello,

Patches 1 and 3 are fixes in order to build Xen with Clang 10. Patch 2
is a fix for a configure bug I've found while attempting to fix Clang
10.

Thanks, Roger.

Roger Pau Monne (3):
  x86/mm: do not attempt to convert _PAGE_GNTTAB to a boolean
  configure: also add EXTRA_PREFIX to {CPP/LD}FLAGS
  tools/libxl: disable clang indentation check for the disk parser

 m4/set_cflags_ldflags.m4|  4 
 tools/libxl/libxlu_disk_l.l | 11 +++
 xen/arch/x86/mm.c   |  2 +-
 3 files changed, 16 insertions(+), 1 deletion(-)

-- 
2.26.2

[PATCH 1/3] x86/mm: do not attempt to convert _PAGE_GNTTAB to a boolean

2020-05-05 Thread Roger Pau Monne

Clang 10 complains with:

mm.c:1239:10: error: converting the result of '<<' to a boolean always 
evaluates to true
  [-Werror,-Wtautological-constant-compare]
if ( _PAGE_GNTTAB && (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
 ^
xen/include/asm/x86_64/page.h:161:25: note: expanded from macro '_PAGE_GNTTAB'
#define _PAGE_GNTTAB (1U<<22)
^

Remove the conversion of _PAGE_GNTTAB to a boolean, since the and
operation performed afterwards will already return false if the value
of the macro is 0.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/mm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 355c50ff91..27069d2451 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1236,7 +1236,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain 
*l1e_owner)
  * (Note that the undestroyable active grants are not a security hole in
  * Xen. All active grants can safely be cleaned up when the domain dies.)
  */
-if ( _PAGE_GNTTAB && (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
+if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) &&
  !l1e_owner->is_shutting_down && !l1e_owner->is_dying )
 {
 gdprintk(XENLOG_WARNING,
-- 
2.26.2

[PATCH 2/3] configure: also add EXTRA_PREFIX to {CPP/LD}FLAGS

2020-05-05 Thread Roger Pau Monne

The path provided by EXTRA_PREFIX should be added to the search path
of the configure script, like it's done in Config.mk. Not doing so
makes the search path for configure differ from the search path used
by the build.

Signed-off-by: Roger Pau Monné 
---
Please re-run autoconf.sh after applying.
---
 m4/set_cflags_ldflags.m4 | 4 
 1 file changed, 4 insertions(+)

diff --git a/m4/set_cflags_ldflags.m4 b/m4/set_cflags_ldflags.m4
index cbad3c10b0..08f5c983cc 100644
--- a/m4/set_cflags_ldflags.m4
+++ b/m4/set_cflags_ldflags.m4
@@ -15,6 +15,10 @@ for ldflag in $APPEND_LIB
 do
 APPEND_LDFLAGS="$APPEND_LDFLAGS -L$ldflag"
 done
+if [ ! -z $EXTRA_PREFIX ]; then
+CPPFLAGS="$CPPFLAGS -I$EXTRA_PREFIX/include"
+LDFLAGS="$LDFLAGS -L$EXTRA_PREFIX/lib"
+fi
 CPPFLAGS="$PREPEND_CPPFLAGS $CPPFLAGS $APPEND_CPPFLAGS"
 LDFLAGS="$PREPEND_LDFLAGS $LDFLAGS $APPEND_LDFLAGS"])
 
-- 
2.26.2

[PATCH 3/3] tools/libxl: disable clang indentation check for the disk parser

2020-05-05 Thread Roger Pau Monne

Clang 10 complains with:

13: error: misleading indentation; statement is not part of the previous 'if'
  [-Werror,-Wmisleading-indentation]
if ( ! yyg->yy_state_buf )
^
libxlu_disk_l.c:1259:9: note: previous statement is here
if ( ! yyg->yy_state_buf )
^

Due to the missing braces in single line statements and the wrong
indentation. Fix this by disabling the warning for that specific file.
I haven't found a way to force flex to add braces around single line
statements in conditional blocks.

Signed-off-by: Roger Pau Monné 
---
Please re-generate libxlu_disk_l.c before committing.
---
 tools/libxl/libxlu_disk_l.l | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tools/libxl/libxlu_disk_l.l b/tools/libxl/libxlu_disk_l.l
index 97039a2800..7a46f4a30c 100644
--- a/tools/libxl/libxlu_disk_l.l
+++ b/tools/libxl/libxlu_disk_l.l
@@ -36,6 +36,17 @@
 
 #define YY_NO_INPUT
 
+/* The code generated by flex is missing braces in single line expressions and
+ * is not properly indented, which triggers the clang misleading-indentation
+ * check that has been made part of -Wall since clang 10. In order to safely
+ * disable it on clang versions that don't have the diagnostic implemented
+ * also disable the unknown option and pragma warning. */
+#ifdef __clang__
+# pragma clang diagnostic ignored "-Wunknown-pragmas"
+# pragma clang diagnostic ignored "-Wunknown-warning-option"
+# pragma clang diagnostic ignored "-Wmisleading-indentation"
+#endif
+
 /* Some versions of flex have a bug (Fedora bugzilla 612465) which causes
  * it to fail to declare these functions, which it defines.  So declare
  * them ourselves.  Hopefully we won't have to simultaneously support
-- 
2.26.2

[PATCH v3 16/25] xen: gntdev: fix common struct sg_table related issues

2020-05-05 Thread Marek Szyprowski

The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of the entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. A common mistake was to ignore a result
of the dma_map_sg function and don't use the sg_table->orig_nents at all.

To avoid such issues, lets use common dma-mapping wrappers operating
directly on the struct sg_table objects and adjust references to the
nents and orig_nents respectively.

Signed-off-by: Marek Szyprowski 
---
For more information, see '[PATCH v3 00/25] DRM: fix struct sg_table nents
vs. orig_nents misuse' thread: https://lkml.org/lkml/2020/5/5/187
---
 drivers/xen/gntdev-dmabuf.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/gntdev-dmabuf.c b/drivers/xen/gntdev-dmabuf.c
index 75d3bb9..4b22785 100644
--- a/drivers/xen/gntdev-dmabuf.c
+++ b/drivers/xen/gntdev-dmabuf.c
@@ -247,8 +247,7 @@ static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
 
if (sgt) {
if (gntdev_dmabuf_attach->dir != DMA_NONE)
-   dma_unmap_sg_attrs(attach->dev, sgt->sgl,
-  sgt->nents,
+   dma_unmap_sgtable_attrs(attach->dev, sgt,
   gntdev_dmabuf_attach->dir,
   DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sgt);
@@ -288,7 +287,7 @@ static void dmabuf_exp_ops_detach(struct dma_buf *dma_buf,
sgt = dmabuf_pages_to_sgt(gntdev_dmabuf->pages,
  gntdev_dmabuf->nr_pages);
if (!IS_ERR(sgt)) {
-   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
+   if (dma_map_sgtable_attrs(attach->dev, sgt, dir,
  DMA_ATTR_SKIP_CPU_SYNC)) {
sg_free_table(sgt);
kfree(sgt);
@@ -625,7 +624,7 @@ static struct gntdev_dmabuf *dmabuf_imp_alloc_storage(int 
count)
 
/* Now convert sgt to array of pages and check for page validity. */
i = 0;
-   for_each_sg_page(sgt->sgl, _iter, sgt->nents, 0) {
+   for_each_sg_page(sgt->sgl, _iter, sgt->orig_nents, 0) {
struct page *page = sg_page_iter_page(_iter);
/*
 * Check if page is valid: this can happen if we are given
-- 
1.9.1

[PATCH v8 12/12] x86/HVM: don't needlessly intercept APERF/MPERF/TSC MSR reads

2020-05-05 Thread Jan Beulich

If the hardware can handle accesses, we should allow it to do so. This
way we can expose EFRO to HVM guests, and "all" that's left for exposing
APERF/MPERF is to figure out how to handle writes to these MSRs. (Note
that the leaf 6 guest CPUID checks will evaluate to false for now, as
recalculate_misc() zaps the entire leaf.)

For TSC the intercepts are made mirror the RDTSC ones.

Signed-off-by: Jan Beulich 
Reviewed-by: Kevin Tian 
---
v4: Make TSC intercepts mirror RDTSC ones. Re-base.
v3: New.

--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -595,6 +595,7 @@ static void svm_cpuid_policy_changed(str
 struct vmcb_struct *vmcb = svm->vmcb;
 const struct cpuid_policy *cp = v->domain->arch.cpuid;
 u32 bitmap = vmcb_get_exception_intercepts(vmcb);
+unsigned int mode;
 
 if ( opt_hvm_fep ||
  (v->domain->arch.cpuid->x86_vendor != boot_cpu_data.x86_vendor) )
@@ -607,6 +608,17 @@ static void svm_cpuid_policy_changed(str
 /* Give access to MSR_PRED_CMD if the guest has been told about it. */
 svm_intercept_msr(v, MSR_PRED_CMD,
   cp->extd.ibpb ? MSR_INTERCEPT_NONE : MSR_INTERCEPT_RW);
+
+/* Allow direct reads from APERF/MPERF if permitted by the policy. */
+mode = cp->basic.raw[6].c & CPUID6_ECX_APERFMPERF_CAPABILITY
+   ? MSR_INTERCEPT_WRITE : MSR_INTERCEPT_RW;
+svm_intercept_msr(v, MSR_IA32_APERF, mode);
+svm_intercept_msr(v, MSR_IA32_MPERF, mode);
+
+/* Allow direct access to their r/o counterparts if permitted. */
+mode = cp->extd.efro ? MSR_INTERCEPT_NONE : MSR_INTERCEPT_RW;
+svm_intercept_msr(v, MSR_APERF_RD_ONLY, mode);
+svm_intercept_msr(v, MSR_MPERF_RD_ONLY, mode);
 }
 
 void svm_sync_vmcb(struct vcpu *v, enum vmcb_sync_state new_state)
@@ -860,7 +872,10 @@ static void svm_set_rdtsc_exiting(struct
 {
 general1_intercepts |= GENERAL1_INTERCEPT_RDTSC;
 general2_intercepts |= GENERAL2_INTERCEPT_RDTSCP;
+svm_enable_intercept_for_msr(v, MSR_IA32_TSC);
 }
+else
+svm_intercept_msr(v, MSR_IA32_TSC, MSR_INTERCEPT_WRITE);
 
 vmcb_set_general1_intercepts(vmcb, general1_intercepts);
 vmcb_set_general2_intercepts(vmcb, general2_intercepts);
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -108,6 +108,7 @@ static int construct_vmcb(struct vcpu *v
 {
 vmcb->_general1_intercepts |= GENERAL1_INTERCEPT_RDTSC;
 vmcb->_general2_intercepts |= GENERAL2_INTERCEPT_RDTSCP;
+svm_intercept_msr(v, MSR_IA32_TSC, MSR_INTERCEPT_WRITE);
 }
 
 /* Guest segment limits. */
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1141,8 +1141,13 @@ static int construct_vmcs(struct vcpu *v
 vmx_clear_msr_intercept(v, MSR_IA32_SYSENTER_CS, VMX_MSR_RW);
 vmx_clear_msr_intercept(v, MSR_IA32_SYSENTER_ESP, VMX_MSR_RW);
 vmx_clear_msr_intercept(v, MSR_IA32_SYSENTER_EIP, VMX_MSR_RW);
+
+if ( !(v->arch.hvm.vmx.exec_control & CPU_BASED_RDTSC_EXITING) )
+vmx_clear_msr_intercept(v, MSR_IA32_TSC, VMX_MSR_R);
+
 if ( paging_mode_hap(d) && (!is_iommu_enabled(d) || iommu_snoop) )
 vmx_clear_msr_intercept(v, MSR_IA32_CR_PAT, VMX_MSR_RW);
+
 if ( (vmexit_ctl & VM_EXIT_CLEAR_BNDCFGS) &&
  (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS) )
 vmx_clear_msr_intercept(v, MSR_IA32_BNDCFGS, VMX_MSR_RW);
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -589,6 +589,18 @@ static void vmx_cpuid_policy_changed(str
 vmx_clear_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
 else
 vmx_set_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
+
+/* Allow direct reads from APERF/MPERF if permitted by the policy. */
+if ( cp->basic.raw[6].c & CPUID6_ECX_APERFMPERF_CAPABILITY )
+{
+vmx_clear_msr_intercept(v, MSR_IA32_APERF, VMX_MSR_R);
+vmx_clear_msr_intercept(v, MSR_IA32_MPERF, VMX_MSR_R);
+}
+else
+{
+vmx_set_msr_intercept(v, MSR_IA32_APERF, VMX_MSR_R);
+vmx_set_msr_intercept(v, MSR_IA32_MPERF, VMX_MSR_R);
+}
 }
 
 int vmx_guest_x86_mode(struct vcpu *v)
@@ -1254,7 +1266,12 @@ static void vmx_set_rdtsc_exiting(struct
 vmx_vmcs_enter(v);
 v->arch.hvm.vmx.exec_control &= ~CPU_BASED_RDTSC_EXITING;
 if ( enable )
+{
 v->arch.hvm.vmx.exec_control |= CPU_BASED_RDTSC_EXITING;
+vmx_set_msr_intercept(v, MSR_IA32_TSC, VMX_MSR_R);
+}
+else
+vmx_clear_msr_intercept(v, MSR_IA32_TSC, VMX_MSR_R);
 vmx_update_cpu_exec_control(v);
 vmx_vmcs_exit(v);
 }
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -245,7 +245,7 @@ XEN_CPUFEATURE(ENQCMD,6*32+29) /
 
 /* AMD-defined CPU features, CPUID level 0x8007.edx, word 7 */
 XEN_CPUFEATURE(ITSC,  7*32+ 8) /*   Invariant TSC */
-XEN_CPUFEATURE(EFRO,  7*32+10) /*   APERF/MPERF Read Only interface */

[PATCH v8 11/12] x86emul: support RDPRU

2020-05-05 Thread Jan Beulich

While the PM doesn't say so, this assumes that the MPERF value read this
way gets scaled similarly to its reading through RDMSR.

Also introduce the SVM related constants at this occasion.

Signed-off-by: Jan Beulich 
---
v6: Re-base.
v5: The CPUID field used is just 8 bits wide.
v4: Add GENERAL2_INTERCEPT_RDPRU and VMEXIT_RDPRU enumerators. Fold
handling of out of bounds indexes into switch(). Avoid
recalculate_misc() clobbering what recalculate_cpu_policy() has
done. Re-base.
v3: New.
---
RFC: Andrew promised to take care of the CPUID side of this; re-base
 over his work once available.

--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -264,6 +264,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
 
 {"clzero",   0x8008, NA, CPUID_REG_EBX,  0,  1},
 {"rstr-fp-err-ptrs", 0x8008, NA, CPUID_REG_EBX, 2, 1},
+{"rdpru",0x8008, NA, CPUID_REG_EBX,  4,  1},
 {"wbnoinvd", 0x8008, NA, CPUID_REG_EBX,  9,  1},
 {"ibpb", 0x8008, NA, CPUID_REG_EBX, 12,  1},
 {"ppin", 0x8008, NA, CPUID_REG_EBX, 23,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -148,6 +148,8 @@ static const char *const str_e8b[32] =
 [ 0] = "clzero",
 [ 2] = "rstr-fp-err-ptrs",
 
+[ 4] = "rdpru",
+
 /* [ 8] */[ 9] = "wbnoinvd",
 
 [12] = "ibpb",
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -683,6 +683,13 @@ static int read_msr(
 {
 switch ( reg )
 {
+case 0x00e8: /* APERF */
+case 0xc0e8: /* APERF_RD_ONLY */
+#define APERF_LO_VALUE 0xAEAEAEAE
+#define APERF_HI_VALUE 0xEAEAEAEA
+*val = ((uint64_t)APERF_HI_VALUE << 32) | APERF_LO_VALUE;
+return X86EMUL_OKAY;
+
 case 0xc080: /* EFER */
 *val = ctxt->addr_size > 32 ? 0x500 /* LME|LMA */ : 0;
 return X86EMUL_OKAY;
@@ -2421,6 +2428,30 @@ int main(int argc, char **argv)
 else
 printf("skipped\n");
 
+printf("%-40s", "Testing rdpru...");
+instr[0] = 0x0f; instr[1] = 0x01; instr[2] = 0xfd;
+regs.eip = (unsigned long)[0];
+regs.ecx = 1;
+regs.eflags = EFLAGS_ALWAYS_SET;
+rc = x86_emulate(, );
+if ( (rc != X86EMUL_OKAY) ||
+ (regs.eax != APERF_LO_VALUE) || (regs.edx != APERF_HI_VALUE) ||
+ !(regs.eflags & X86_EFLAGS_CF) ||
+ (regs.eip != (unsigned long)[3]) )
+goto fail;
+if ( ctxt.cpuid->extd.rdpru_max < 0x )
+{
+regs.eip = (unsigned long)[0];
+regs.ecx = ctxt.cpuid->extd.rdpru_max + 1;
+regs.eflags = EFLAGS_ALWAYS_SET | X86_EFLAGS_CF;
+rc = x86_emulate(, );
+if ( (rc != X86EMUL_OKAY) || regs.eax || regs.edx ||
+ (regs.eflags & X86_EFLAGS_CF) ||
+ (regs.eip != (unsigned long)[3]) )
+goto fail;
+}
+printf("okay\n");
+
 printf("%-40s", "Testing fnstenv 4(%ecx)...");
 if ( stack_exec && cpu_has_fpu )
 {
--- a/tools/tests/x86_emulator/x86-emulate.c
+++ b/tools/tests/x86_emulator/x86-emulate.c
@@ -84,6 +84,8 @@ bool emul_test_init(void)
 cp.feat.avx512pf = cp.feat.avx512f;
 cp.feat.rdpid = true;
 cp.extd.clzero = true;
+cp.extd.rdpru = true;
+cp.extd.rdpru_max = 1;
 
 if ( cpu_has_xsave )
 {
@@ -156,11 +158,11 @@ int emul_test_cpuid(
 }
 
 /*
- * The emulator doesn't itself use CLZERO, so we can always run the
+ * The emulator doesn't itself use CLZERO/RDPRU, so we can always run the
  * respective test(s).
  */
 if ( leaf == 0x8008 )
-res->b |= 1U << 0;
+res->b |= (1U << 0) | (1U << 4);
 
 return X86EMUL_OKAY;
 }
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -243,8 +243,6 @@ static void recalculate_misc(struct cpui
 /* Most of Power/RAS hidden from guests. */
 p->extd.raw[0x7].a = p->extd.raw[0x7].b = p->extd.raw[0x7].c = 0;
 
-p->extd.raw[0x8].d = 0;
-
 switch ( p->x86_vendor )
 {
 case X86_VENDOR_INTEL:
@@ -263,6 +261,7 @@ static void recalculate_misc(struct cpui
 
 p->extd.raw[0x8].a &= 0x;
 p->extd.raw[0x8].c = 0;
+p->extd.raw[0x8].d = 0;
 break;
 
 case X86_VENDOR_AMD:
@@ -281,6 +280,7 @@ static void recalculate_misc(struct cpui
 
 p->extd.raw[0x8].a &= 0x; /* GuestMaxPhysAddr hidden. */
 p->extd.raw[0x8].c &= 0x0003f0ff;
+p->extd.raw[0x8].d &= 0x;
 
 p->extd.raw[0x9] = EMPTY_LEAF;
 
@@ -643,6 +643,11 @@ void recalculate_cpuid_policy(struct dom
 
 p->extd.maxlinaddr = p->extd.lm ? 48 : 32;
 
+if ( p->extd.rdpru )
+p->extd.rdpru_max = min(p->extd.rdpru_max, max->extd.rdpru_max);
+else
+p->extd.rdpru_max = 0;
+
 recalculate_xstate(p);
 recalculate_misc(p);
 
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1967,6 +1967,7 @@ amd_like(const struct

Re: [PATCH v8 09/12] x86/HVM: scale MPERF values reported to guests (on AMD)

2020-05-05 Thread Jan Beulich

On 05.05.2020 10:17, Jan Beulich wrote:
> AMD's PM specifies that MPERF (and its r/o counterpart) reads are
> affected by the TSC ratio. Hence when processing such reads in software
> we too should scale the values. While we don't currently (yet) expose
> the underlying feature flags, besides us allowing the MSRs to be read
> nevertheless, RDPRU is going to expose the values even to user space.
> 
> Furthermore, due to the not exposed feature flags, this change has the
> effect of making properly inaccessible (for reads) the two MSRs.
> 
> Note that writes to MPERF (and APERF) continue to be unsupported.
> 
> Signed-off-by: Jan Beulich 

Sorry, just re-sent this one with correct numbering.

Jan

[PATCH v8 10/12] x86/HVM: scale MPERF values reported to guests (on AMD)

2020-05-05 Thread Jan Beulich

AMD's PM specifies that MPERF (and its r/o counterpart) reads are
affected by the TSC ratio. Hence when processing such reads in software
we too should scale the values. While we don't currently (yet) expose
the underlying feature flags, besides us allowing the MSRs to be read
nevertheless, RDPRU is going to expose the values even to user space.

Furthermore, due to the not exposed feature flags, this change has the
effect of making properly inaccessible (for reads) the two MSRs.

Note that writes to MPERF (and APERF) continue to be unsupported.

Signed-off-by: Jan Beulich 
---
v3: New.
---
I did consider whether to put the code in guest_rdmsr() instead, but
decided that it's better to have it next to TSC handling.

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3478,6 +3478,22 @@ int hvm_msr_read_intercept(unsigned int
 *msr_content = v->arch.hvm.msr_tsc_adjust;
 break;
 
+case MSR_MPERF_RD_ONLY:
+if ( !d->arch.cpuid->extd.efro )
+{
+goto gp_fault;
+
+case MSR_IA32_MPERF:
+if ( !(d->arch.cpuid->basic.raw[6].c &
+   CPUID6_ECX_APERFMPERF_CAPABILITY) )
+goto gp_fault;
+}
+if ( rdmsr_safe(msr, *msr_content) )
+goto gp_fault;
+if ( d->arch.cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
+*msr_content = hvm_get_guest_tsc_fixed(v, *msr_content);
+break;
+
 case MSR_APIC_BASE:
 *msr_content = vcpu_vlapic(v)->hw.apic_base_msr;
 break;
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -405,6 +405,9 @@
 #define MSR_IA32_MPERF 0x00e7
 #define MSR_IA32_APERF 0x00e8
 
+#define MSR_MPERF_RD_ONLY  0xc0e7
+#define MSR_APERF_RD_ONLY  0xc0e8
+
 #define MSR_IA32_THERM_CONTROL 0x019a
 #define MSR_IA32_THERM_INTERRUPT   0x019b
 #define MSR_IA32_THERM_STATUS  0x019c

[PATCH v8 09/12] x86emul: support FXSAVE/FXRSTOR

2020-05-05 Thread Jan Beulich

Note that FPU selector handling as well as MXCSR mask saving for now
does not honor differences between host and guest visible featuresets.

While for Intel operation of the insns with CR4.OSFXSR=0 is
implementation dependent, use the easiest solution there: Simply don't
look at the bit in the first place. For AMD and alike the behavior is
well defined, so it gets handled together with FFXSR.

Signed-off-by: Jan Beulich 
---
v8: Respect EFER.FFXSE and CR4.OSFXSR. Correct wrong X86EMUL_NO_*
dependencies. Reduce #ifdef-ary.
v7: New.

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -767,6 +767,12 @@ static void zap_fpsel(unsigned int *env,
 }
 }
 
+static void zap_xfpsel(unsigned int *env)
+{
+env[3] &= ~0x;
+env[5] &= ~0x;
+}
+
 #ifdef __x86_64__
 # define STKVAL_DISP 64
 static const struct {
@@ -2517,6 +2523,91 @@ int main(int argc, char **argv)
 else
 printf("skipped\n");
 
+printf("%-40s", "Testing fxsave 4(%ecx)...");
+if ( stack_exec && cpu_has_fxsr )
+{
+const uint16_t nine = 9;
+
+memset(res + 0x80, 0xcc, 0x400);
+if ( cpu_has_sse2 )
+asm volatile ( "pcmpeqd %xmm7, %xmm7\n\t"
+   "pxor %xmm6, %xmm6\n\t"
+   "psubw %xmm7, %xmm6" );
+asm volatile ( "fninit\n\t"
+   "fld1\n\t"
+   "fidivs %1\n\t"
+   "fxsave %0"
+   : "=m" (res[0x100]) : "m" (nine) : "memory" );
+zap_xfpsel([0x100]);
+instr[0] = 0x0f; instr[1] = 0xae; instr[2] = 0x41; instr[3] = 0x04;
+regs.eip = (unsigned long)[0];
+regs.ecx = (unsigned long)(res + 0x7f);
+memset(res + 0x100 + 0x74, 0x33, 0x30);
+memset(res + 0x80 + 0x74, 0x33, 0x30);
+rc = x86_emulate(, );
+zap_xfpsel([0x80]);
+if ( (rc != X86EMUL_OKAY) ||
+ memcmp(res + 0x80, res + 0x100, 0x200) ||
+ (regs.eip != (unsigned long)[4]) )
+goto fail;
+printf("okay\n");
+}
+else
+printf("skipped\n");
+
+printf("%-40s", "Testing fxrstor -4(%ecx)...");
+if ( stack_exec && cpu_has_fxsr )
+{
+const uint16_t eleven = 11;
+
+memset(res + 0x80, 0xcc, 0x400);
+asm volatile ( "fxsave %0" : "=m" (res[0x80]) :: "memory" );
+zap_xfpsel([0x80]);
+if ( cpu_has_sse2 )
+asm volatile ( "pxor %xmm7, %xmm6\n\t"
+   "pxor %xmm7, %xmm3\n\t"
+   "pxor %xmm7, %xmm0\n\t"
+   "pxor %xmm7, %xmm7" );
+asm volatile ( "fninit\n\t"
+   "fld1\n\t"
+   "fidivs %0\n\t"
+   :: "m" (eleven) );
+instr[0] = 0x0f; instr[1] = 0xae; instr[2] = 0x49; instr[3] = 0xfc;
+regs.eip = (unsigned long)[0];
+regs.ecx = (unsigned long)(res + 0x81);
+rc = x86_emulate(, );
+asm volatile ( "fxsave %0" : "=m" (res[0x100]) :: "memory" );
+if ( (rc != X86EMUL_OKAY) ||
+ memcmp(res + 0x100, res + 0x80, 0x200) ||
+ (regs.eip != (unsigned long)[4]) )
+goto fail;
+printf("okay\n");
+}
+else
+printf("skipped\n");
+
+#ifdef __x86_64__
+printf("%-40s", "Testing fxsaveq 8(%edx)...");
+if ( stack_exec && cpu_has_fxsr )
+{
+memset(res + 0x80, 0xcc, 0x400);
+asm volatile ( "fxsaveq %0" : "=m" (res[0x100]) :: "memory" );
+instr[0] = 0x48; instr[1] = 0x0f; instr[2] = 0xae; instr[3] = 0x42; 
instr[4] = 0x08;
+regs.eip = (unsigned long)[0];
+regs.edx = (unsigned long)(res + 0x7e);
+memset(res + 0x100 + 0x74, 0x33, 0x30);
+memset(res + 0x80 + 0x74, 0x33, 0x30);
+rc = x86_emulate(, );
+if ( (rc != X86EMUL_OKAY) ||
+ memcmp(res + 0x80, res + 0x100, 0x200) ||
+ (regs.eip != (unsigned long)[5]) )
+goto fail;
+printf("okay\n");
+}
+else
+printf("skipped\n");
+#endif
+
 printf("%-40s", "Testing movq %mm3,(%ecx)...");
 if ( stack_exec && cpu_has_mmx )
 {
--- a/tools/tests/x86_emulator/x86-emulate.c
+++ b/tools/tests/x86_emulator/x86-emulate.c
@@ -30,6 +30,13 @@ struct cpuid_policy cp;
 static char fpu_save_area[4096] __attribute__((__aligned__((64;
 static bool use_xsave;
 
+/*
+ * Re-use the area above also as scratch space for the emulator itself.
+ * (When debugging the emulator, care needs to be taken when inserting
+ * printf() or alike function calls into regions using this.)
+ */
+#define FXSAVE_AREA ((struct x86_fxsr *)fpu_save_area)
+
 void emul_save_fpu_state(void)
 {
 if ( use_xsave )
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -860,6 +860,11 @@ struct x86_emulate_state {
 blk_fld, /* FLDENV, FRSTOR */
 blk_fst, /* FNSTENV,

[PATCH v8 09/12] x86/HVM: scale MPERF values reported to guests (on AMD)

2020-05-05 Thread Jan Beulich

AMD's PM specifies that MPERF (and its r/o counterpart) reads are
affected by the TSC ratio. Hence when processing such reads in software
we too should scale the values. While we don't currently (yet) expose
the underlying feature flags, besides us allowing the MSRs to be read
nevertheless, RDPRU is going to expose the values even to user space.

Furthermore, due to the not exposed feature flags, this change has the
effect of making properly inaccessible (for reads) the two MSRs.

Note that writes to MPERF (and APERF) continue to be unsupported.

Signed-off-by: Jan Beulich 
---
v3: New.
---
I did consider whether to put the code in guest_rdmsr() instead, but
decided that it's better to have it next to TSC handling.

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3478,6 +3478,22 @@ int hvm_msr_read_intercept(unsigned int
 *msr_content = v->arch.hvm.msr_tsc_adjust;
 break;
 
+case MSR_MPERF_RD_ONLY:
+if ( !d->arch.cpuid->extd.efro )
+{
+goto gp_fault;
+
+case MSR_IA32_MPERF:
+if ( !(d->arch.cpuid->basic.raw[6].c &
+   CPUID6_ECX_APERFMPERF_CAPABILITY) )
+goto gp_fault;
+}
+if ( rdmsr_safe(msr, *msr_content) )
+goto gp_fault;
+if ( d->arch.cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
+*msr_content = hvm_get_guest_tsc_fixed(v, *msr_content);
+break;
+
 case MSR_APIC_BASE:
 *msr_content = vcpu_vlapic(v)->hw.apic_base_msr;
 break;
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -405,6 +405,9 @@
 #define MSR_IA32_MPERF 0x00e7
 #define MSR_IA32_APERF 0x00e8
 
+#define MSR_MPERF_RD_ONLY  0xc0e7
+#define MSR_APERF_RD_ONLY  0xc0e8
+
 #define MSR_IA32_THERM_CONTROL 0x019a
 #define MSR_IA32_THERM_INTERRUPT   0x019b
 #define MSR_IA32_THERM_STATUS  0x019c

[PATCH v8 08/12] x86emul: support FLDENV and FRSTOR

2020-05-05 Thread Jan Beulich

While the Intel SDM claims that FRSTOR itself may raise #MF upon
completion, this was confirmed by Intel to be a doc error which will be
corrected in due course; behavior is like FLDENV, and like old hard copy
manuals describe it. Otherwise we'd have to emulate the insn by filling
st(N) in suitable order, followed by FLDENV.

Signed-off-by: Jan Beulich 
---
v7: New.

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -2442,6 +2442,27 @@ int main(int argc, char **argv)
 else
 printf("skipped\n");
 
+printf("%-40s", "Testing fldenv 8(%edx)...");
+if ( stack_exec && cpu_has_fpu )
+{
+asm volatile ( "fnstenv %0\n\t"
+   "fninit"
+   : "=m" (res[2]) :: "memory" );
+zap_fpsel([2], true);
+instr[0] = 0xd9; instr[1] = 0x62; instr[2] = 0x08;
+regs.eip = (unsigned long)[0];
+regs.edx = (unsigned long)res;
+rc = x86_emulate(, );
+asm volatile ( "fnstenv %0" : "=m" (res[9]) :: "memory" );
+if ( (rc != X86EMUL_OKAY) ||
+ memcmp(res + 2, res + 9, 28) ||
+ (regs.eip != (unsigned long)[3]) )
+goto fail;
+printf("okay\n");
+}
+else
+printf("skipped\n");
+
 printf("%-40s", "Testing 16-bit fnsave (%ecx)...");
 if ( stack_exec && cpu_has_fpu )
 {
@@ -2468,6 +2489,31 @@ int main(int argc, char **argv)
 goto fail;
 printf("okay\n");
 }
+else
+printf("skipped\n");
+
+printf("%-40s", "Testing frstor (%edx)...");
+if ( stack_exec && cpu_has_fpu )
+{
+const uint16_t seven = 7;
+
+asm volatile ( "fninit\n\t"
+   "fld1\n\t"
+   "fidivs %1\n\t"
+   "fnsave %0\n\t"
+   : "=" (res[0]) : "m" (seven) : "memory" );
+zap_fpsel([0], true);
+instr[0] = 0xdd; instr[1] = 0x22;
+regs.eip = (unsigned long)[0];
+regs.edx = (unsigned long)res;
+rc = x86_emulate(, );
+asm volatile ( "fnsave %0" : "=m" (res[27]) :: "memory" );
+if ( (rc != X86EMUL_OKAY) ||
+ memcmp(res, res + 27, 108) ||
+ (regs.eip != (unsigned long)[2]) )
+goto fail;
+printf("okay\n");
+}
 else
 printf("skipped\n");
 
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -857,6 +857,7 @@ struct x86_emulate_state {
 blk_NONE,
 blk_enqcmd,
 #ifndef X86EMUL_NO_FPU
+blk_fld, /* FLDENV, FRSTOR */
 blk_fst, /* FNSTENV, FNSAVE */
 #endif
 blk_movdir,
@@ -4948,21 +4949,14 @@ x86_emulate(
 dst.bytes = 4;
 emulate_fpu_insn_memdst(b, modrm_reg & 7, dst.val);
 break;
-case 4: /* fldenv - TODO */
-state->fpu_ctrl = true;
-goto unimplemented_insn;
-case 5: /* fldcw m2byte */
-state->fpu_ctrl = true;
-fpu_memsrc16:
-if ( (rc = ops->read(ea.mem.seg, ea.mem.off, ,
- 2, ctxt)) != X86EMUL_OKAY )
-goto done;
-emulate_fpu_insn_memsrc(b, modrm_reg & 7, src.val);
-break;
+case 4: /* fldenv */
+/* Raise #MF now if there are pending unmasked exceptions. */
+emulate_fpu_insn_stub(0xd9, 0xd0 /* fnop */);
+/* fall through */
 case 6: /* fnstenv */
 fail_if(!ops->blk);
-state->blk = blk_fst;
-/* REX is meaningless for this insn by this point. */
+state->blk = modrm_reg & 2 ? blk_fst : blk_fld;
+/* REX is meaningless for these insns by this point. */
 rex_prefix = in_protmode(ctxt, ops);
 if ( (rc = ops->blk(ea.mem.seg, ea.mem.off, NULL,
 op_bytes > 2 ? sizeof(struct x87_env32)
@@ -4972,6 +4966,14 @@ x86_emulate(
 goto done;
 state->fpu_ctrl = true;
 break;
+case 5: /* fldcw m2byte */
+state->fpu_ctrl = true;
+fpu_memsrc16:
+if ( (rc = ops->read(ea.mem.seg, ea.mem.off, ,
+ 2, ctxt)) != X86EMUL_OKAY )
+goto done;
+emulate_fpu_insn_memsrc(b, modrm_reg & 7, src.val);
+break;
 case 7: /* fnstcw m2byte */
 state->fpu_ctrl = true;
 fpu_memdst16:
@@ -5124,13 +5126,14 @@ x86_emulate(
 dst.bytes = 8;
 emulate_fpu_insn_memdst(b, modrm_reg & 7, dst.val);
 break;
-case 4: /* frstor - TODO */
-state->fpu_ctrl = true;
-goto unimplemented_insn;
+case 4: /* frstor

[PATCH v8 07/12] x86emul: support FNSTENV and FNSAVE

2020-05-05 Thread Jan Beulich

To avoid introducing another boolean into emulator state, the
rex_prefix field gets (ab)used to convey the real/VM86 vs protected mode
info (affecting structure layout, albeit not size) to x86_emul_blk().

Signed-off-by: Jan Beulich 
---
TBD: The full 16-bit padding fields in the 32-bit structures get filled
 with all ones by modern CPUs (i.e. other than the comment says for
 FIP and FDP). We may want to mirror this as well (for the real mode
 variant), even if those fields' contents are unspecified.
---
v7: New.

--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -120,6 +120,7 @@ static inline bool xcr0_mask(uint64_t ma
 }
 
 #define cache_line_size() (cp.basic.clflush_size * 8)
+#define cpu_has_fpucp.basic.fpu
 #define cpu_has_mmxcp.basic.mmx
 #define cpu_has_fxsr   cp.basic.fxsr
 #define cpu_has_ssecp.basic.sse
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -748,6 +748,25 @@ static struct x86_emulate_ops emulops =
 
 #define MMAP_ADDR 0x10
 
+/*
+ * 64-bit OSes may not (be able to) properly restore the two selectors in
+ * the FPU environment. Zap them so that memcmp() on two saved images will
+ * work regardless of whether a context switch occurred in the middle.
+ */
+static void zap_fpsel(unsigned int *env, bool is_32bit)
+{
+if ( is_32bit )
+{
+env[4] &= ~0x;
+env[6] &= ~0x;
+}
+else
+{
+env[2] &= ~0x;
+env[3] &= ~0x;
+}
+}
+
 #ifdef __x86_64__
 # define STKVAL_DISP 64
 static const struct {
@@ -2394,6 +2413,62 @@ int main(int argc, char **argv)
 printf("okay\n");
 }
 else
+printf("skipped\n");
+
+printf("%-40s", "Testing fnstenv 4(%ecx)...");
+if ( stack_exec && cpu_has_fpu )
+{
+const uint16_t three = 3;
+
+asm volatile ( "fninit\n\t"
+   "fld1\n\t"
+   "fidivs %1\n\t"
+   "fstenv %0"
+   : "=m" (res[9]) : "m" (three) : "memory" );
+zap_fpsel([9], true);
+instr[0] = 0xd9; instr[1] = 0x71; instr[2] = 0x04;
+regs.eip = (unsigned long)[0];
+regs.ecx = (unsigned long)res;
+res[8] = 0xaa55aa55;
+rc = x86_emulate(, );
+zap_fpsel([1], true);
+if ( (rc != X86EMUL_OKAY) ||
+ memcmp(res + 1, res + 9, 28) ||
+ res[8] != 0xaa55aa55 ||
+ (regs.eip != (unsigned long)[3]) )
+goto fail;
+printf("okay\n");
+}
+else
+printf("skipped\n");
+
+printf("%-40s", "Testing 16-bit fnsave (%ecx)...");
+if ( stack_exec && cpu_has_fpu )
+{
+const uint16_t five = 5;
+
+asm volatile ( "fninit\n\t"
+   "fld1\n\t"
+   "fidivs %1\n\t"
+   "fsaves %0"
+   : "=m" (res[25]) : "m" (five) : "memory" );
+zap_fpsel([25], false);
+asm volatile ( "frstors %0" :: "m" (res[25]) : "memory" );
+instr[0] = 0x66; instr[1] = 0xdd; instr[2] = 0x31;
+regs.eip = (unsigned long)[0];
+regs.ecx = (unsigned long)res;
+res[23] = 0xaa55aa55;
+res[24] = 0xaa55aa55;
+rc = x86_emulate(, );
+if ( (rc != X86EMUL_OKAY) ||
+ memcmp(res, res + 25, 94) ||
+ (res[23] >> 16) != 0xaa55 ||
+ res[24] != 0xaa55aa55 ||
+ (regs.eip != (unsigned long)[3]) )
+goto fail;
+printf("okay\n");
+}
+else
 printf("skipped\n");
 
 printf("%-40s", "Testing movq %mm3,(%ecx)...");
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -856,6 +856,9 @@ struct x86_emulate_state {
 enum {
 blk_NONE,
 blk_enqcmd,
+#ifndef X86EMUL_NO_FPU
+blk_fst, /* FNSTENV, FNSAVE */
+#endif
 blk_movdir,
 } blk;
 uint8_t modrm, modrm_mod, modrm_reg, modrm_rm;
@@ -897,6 +900,50 @@ struct x86_emulate_state {
 #define PTR_POISON NULL /* 32-bit builds are for user-space, so NULL is OK. */
 #endif
 
+#ifndef X86EMUL_NO_FPU
+struct x87_env16 {
+uint16_t fcw;
+uint16_t fsw;
+uint16_t ftw;
+union {
+struct {
+uint16_t fip_lo;
+uint16_t fop:11, :1, fip_hi:4;
+uint16_t fdp_lo;
+uint16_t :12, fdp_hi:4;
+} real;
+struct {
+uint16_t fip;
+uint16_t fcs;
+uint16_t fdp;
+uint16_t fds;
+} prot;
+} mode;
+};
+
+struct x87_env32 {
+uint32_t fcw:16, :16;
+uint32_t fsw:16, :16;
+uint32_t ftw:16, :16;
+union {
+struct {
+/* some CPUs/FPUs also store the full FIP here */
+uint32_t fip_lo:16, :16;
+uint32_t fop:11, :1, fip_hi:16, :4;
+/* some CPUs/FPUs also store the full FDP here */
+

[PATCH v8 05/12] x86emul: support X{SUS,RES}LDTRK

2020-05-05 Thread Jan Beulich

There's nothing to be done by the emulator, as we unconditionally abort
any XBEGIN.

Signed-off-by: Jan Beulich 
---
v6: New.

--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -208,6 +208,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
 {"avx512-vnni",  0x0007,  0, CPUID_REG_ECX, 11,  1},
 {"avx512-bitalg",0x0007,  0, CPUID_REG_ECX, 12,  1},
 {"avx512-vpopcntdq",0x0007,0,CPUID_REG_ECX, 14,  1},
+{"tsxldtrk", 0x0007,  0, CPUID_REG_ECX, 16,  1},
 {"rdpid",0x0007,  0, CPUID_REG_ECX, 22,  1},
 {"cldemote", 0x0007,  0, CPUID_REG_ECX, 25,  1},
 
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -128,6 +128,7 @@ static const char *const str_7c0[32] =
 [10] = "vpclmulqdq",   [11] = "avx512_vnni",
 [12] = "avx512_bitalg",
 [14] = "avx512_vpopcntdq",
+[16] = "tsxldtrk",
 
 [22] = "rdpid",
 /* 24 */   [25] = "cldemote",
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1921,6 +1921,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_avx512_vnni() (ctxt->cpuid->feat.avx512_vnni)
 #define vcpu_has_avx512_bitalg() (ctxt->cpuid->feat.avx512_bitalg)
 #define vcpu_has_avx512_vpopcntdq() (ctxt->cpuid->feat.avx512_vpopcntdq)
+#define vcpu_has_tsxldtrk()(ctxt->cpuid->feat.tsxldtrk)
 #define vcpu_has_rdpid()   (ctxt->cpuid->feat.rdpid)
 #define vcpu_has_movdiri() (ctxt->cpuid->feat.movdiri)
 #define vcpu_has_movdir64b()   (ctxt->cpuid->feat.movdir64b)
@@ -5668,6 +5669,20 @@ x86_emulate(
 host_and_vcpu_must_have(serialize);
 asm volatile ( ".byte 0x0f, 0x01, 0xe8" );
 break;
+case vex_f2: /* xsusldtrk */
+vcpu_must_have(tsxldtrk);
+break;
+default:
+goto unimplemented_insn;
+}
+break;
+
+case 0xe9:
+switch ( vex.pfx )
+{
+case vex_f2: /* xresldtrk */
+vcpu_must_have(tsxldtrk);
+break;
 default:
 goto unimplemented_insn;
 }
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -236,6 +236,7 @@ XEN_CPUFEATURE(VPCLMULQDQ,6*32+10) /
 XEN_CPUFEATURE(AVX512_VNNI,   6*32+11) /*A  Vector Neural Network Instrs */
 XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) /*A  Support for VPOPCNT[B,W] and 
VPSHUFBITQMB */
 XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A  POPCNT for vectors of DW/QW */
+XEN_CPUFEATURE(TSXLDTRK,  6*32+16) /*A  TSX load tracking suspend/resume 
insns */
 XEN_CPUFEATURE(RDPID, 6*32+22) /*A  RDPID instruction */
 XEN_CPUFEATURE(CLDEMOTE,  6*32+25) /*A  CLDEMOTE instruction */
 XEN_CPUFEATURE(MOVDIRI,   6*32+27) /*A  MOVDIRI instruction */
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -284,6 +284,9 @@ def crunch_numbers(state):
 # as dependent features simplifies Xen's logic, and prevents the guest
 # from seeing implausible configurations.
 IBRSB: [STIBP, SSBD],
+
+# In principle the TSXLDTRK insns could also be considered independent.
+RTM: [TSXLDTRK],
 }
 
 deep_features = tuple(sorted(deps.keys()))

[PATCH v8 06/12] x86/HVM: make hvmemul_blk() capable of handling r/o operations

2020-05-05 Thread Jan Beulich

In preparation for handling e.g. FLDENV or {F,FX,X}RSTOR here as well.

Signed-off-by: Jan Beulich 
---
v8: New (could be folded into "x86emul: support MOVDIR{I,64B} insns",
but would invalidate Paul's R-b there).

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1453,7 +1453,7 @@ static int hvmemul_blk(
 struct hvm_emulate_ctxt *hvmemul_ctxt =
 container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
 unsigned long addr;
-uint32_t pfec = PFEC_page_present | PFEC_write_access;
+uint32_t pfec = PFEC_page_present;
 int rc;
 void *mapping = NULL;
 
@@ -1462,6 +1462,9 @@ static int hvmemul_blk(
 if ( rc != X86EMUL_OKAY || !bytes )
 return rc;
 
+if ( x86_insn_is_mem_write(state, ctxt) )
+pfec |= PFEC_write_access;
+
 if ( is_x86_system_segment(seg) )
 pfec |= PFEC_implicit;
 else if ( hvmemul_ctxt->seg_reg[x86_seg_ss].dpl == 3 )

[PATCH v8 02/12] x86emul: support MOVDIR{I,64B} insns

2020-05-05 Thread Jan Beulich

Introduce a new blk() hook, paralleling the rmw() one in a certain way,
but being intended for larger data sizes, and hence its HVM intermediate
handling function doesn't fall back to splitting the operation if the
requested virtual address can't be mapped.

Note that SDM revision 071 doesn't specify exception behavior for
ModRM.mod == 0b11; assuming #UD here.

Signed-off-by: Jan Beulich 
Reviewed-by: Paul Durrant 
---
v7: Add blk_NONE. Move  harness'es setting of .blk. Correct indentation.
Re-base.
v6: Fold MOVDIRI and MOVDIR64B changes again. Use blk() for both. All
tags dropped.
v5: Introduce/use ->blk() hook. Correct asm() operands.
v4: Split MOVDIRI and MOVDIR64B and move this one ahead. Re-base.
v3: Update description.
---
(SDE: -tnt)

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -652,6 +652,18 @@ static int cmpxchg(
 return X86EMUL_OKAY;
 }
 
+static int blk(
+enum x86_segment seg,
+unsigned long offset,
+void *p_data,
+unsigned int bytes,
+uint32_t *eflags,
+struct x86_emulate_state *state,
+struct x86_emulate_ctxt *ctxt)
+{
+return x86_emul_blk((void *)offset, p_data, bytes, eflags, state, ctxt);
+}
+
 static int read_segment(
 enum x86_segment seg,
 struct segment_register *reg,
@@ -721,6 +733,7 @@ static struct x86_emulate_ops emulops =
 .insn_fetch = fetch,
 .write  = write,
 .cmpxchg= cmpxchg,
+.blk= blk,
 .read_segment = read_segment,
 .cpuid  = emul_test_cpuid,
 .read_cr= emul_test_read_cr,
@@ -2339,6 +2352,50 @@ int main(int argc, char **argv)
 goto fail;
 printf("okay\n");
 
+printf("%-40s", "Testing movdiri %edx,(%ecx)...");
+if ( stack_exec && cpu_has_movdiri )
+{
+instr[0] = 0x0f; instr[1] = 0x38; instr[2] = 0xf9; instr[3] = 0x11;
+
+regs.eip = (unsigned long)[0];
+regs.ecx = (unsigned long)memset(res, -1, 16);
+regs.edx = 0x44332211;
+
+rc = x86_emulate(, );
+if ( (rc != X86EMUL_OKAY) ||
+ (regs.eip != (unsigned long)[4]) ||
+ res[0] != 0x44332211 || ~res[1] )
+goto fail;
+printf("okay\n");
+}
+else
+printf("skipped\n");
+
+printf("%-40s", "Testing movdir64b 144(%edx),%ecx...");
+if ( stack_exec && cpu_has_movdir64b )
+{
+instr[0] = 0x66; instr[1] = 0x0f; instr[2] = 0x38; instr[3] = 0xf8;
+instr[4] = 0x8a; instr[5] = 0x90; instr[8] = instr[7] = instr[6] = 0;
+
+regs.eip = (unsigned long)[0];
+for ( i = 0; i < 64; ++i )
+res[i] = i - 20;
+regs.edx = (unsigned long)res;
+regs.ecx = (unsigned long)(res + 16);
+
+rc = x86_emulate(, );
+if ( (rc != X86EMUL_OKAY) ||
+ (regs.eip != (unsigned long)[9]) ||
+ res[15] != -5 || res[32] != 12 )
+goto fail;
+for ( i = 16; i < 32; ++i )
+if ( res[i] != i )
+goto fail;
+printf("okay\n");
+}
+else
+printf("skipped\n");
+
 printf("%-40s", "Testing movq %mm3,(%ecx)...");
 if ( stack_exec && cpu_has_mmx )
 {
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -154,6 +154,8 @@ static inline bool xcr0_mask(uint64_t ma
 #define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6))
 #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6))
 #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6))
+#define cpu_has_movdiricp.feat.movdiri
+#define cpu_has_movdir64b  cp.feat.movdir64b
 #define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6))
 #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6))
 #define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6))
--- a/xen/arch/x86/arch.mk
+++ b/xen/arch/x86/arch.mk
@@ -47,6 +47,7 @@ $(call as-option-add,CFLAGS,CC,"rdseed %
 $(call as-option-add,CFLAGS,CC,"clwb (%rax)",-DHAVE_AS_CLWB)
 $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1",-DHAVE_AS_QUOTED_SYM)
 $(call as-option-add,CFLAGS,CC,"invpcid (%rax)$$(comma)%rax",-DHAVE_AS_INVPCID)
+$(call as-option-add,CFLAGS,CC,"movdiri %rax$$(comma)(%rax)",-DHAVE_AS_MOVDIR)
 
 # GAS's idea of true is -1.  Clang's idea is 1
 $(call as-option-add,CFLAGS,CC,\
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1441,6 +1441,44 @@ static int hvmemul_rmw(
 return rc;
 }
 
+static int hvmemul_blk(
+enum x86_segment seg,
+unsigned long offset,
+void *p_data,
+unsigned int bytes,
+uint32_t *eflags,
+struct x86_emulate_state *state,
+struct x86_emulate_ctxt *ctxt)
+{
+struct hvm_emulate_ctxt *hvmemul_ctxt =
+container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+unsigned long addr;
+uint32_t pfec = PFEC_page_present | PFEC_write_access;
+int rc;
+void *mapping = NULL;
+
+rc =

[PATCH v8 03/12] x86emul: support ENQCMD insns

2020-05-05 Thread Jan Beulich

Note that the ISA extensions document revision 038 doesn't specify
exception behavior for ModRM.mod == 0b11; assuming #UD here.

No tests are being added to the harness - this would be quite hard,
we can't just issue the insns against RAM. Their similarity with
MOVDIR64B should have the test case there be god enough to cover any
fundamental flaws.

Signed-off-by: Jan Beulich 
---
TBD: This doesn't (can't) consult PASID translation tables yet, as we
 have no VMX code for this so far. I guess for this we will want to
 replace the direct ->read_msr(MSR_IA32_PASID, ...) with a new
 ->read_pasid() hook.
---
v7: Re-base.
v6: Re-base.
v5: New.

--- a/xen/arch/x86/arch.mk
+++ b/xen/arch/x86/arch.mk
@@ -48,6 +48,7 @@ $(call as-option-add,CFLAGS,CC,"clwb (%r
 $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1",-DHAVE_AS_QUOTED_SYM)
 $(call as-option-add,CFLAGS,CC,"invpcid (%rax)$$(comma)%rax",-DHAVE_AS_INVPCID)
 $(call as-option-add,CFLAGS,CC,"movdiri %rax$$(comma)(%rax)",-DHAVE_AS_MOVDIR)
+$(call as-option-add,CFLAGS,CC,"enqcmd (%rax)$$(comma)%rax",-DHAVE_AS_ENQCMD)
 
 # GAS's idea of true is -1.  Clang's idea is 1
 $(call as-option-add,CFLAGS,CC,\
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -855,6 +855,7 @@ struct x86_emulate_state {
 } rmw;
 enum {
 blk_NONE,
+blk_enqcmd,
 blk_movdir,
 } blk;
 uint8_t modrm, modrm_mod, modrm_reg, modrm_rm;
@@ -901,6 +902,7 @@ typedef union {
 uint64_t __attribute__ ((aligned(16))) xmm[2];
 uint64_t __attribute__ ((aligned(32))) ymm[4];
 uint64_t __attribute__ ((aligned(64))) zmm[8];
+uint32_t data32[16];
 } mmval_t;
 
 /*
@@ -1922,6 +1924,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_rdpid()   (ctxt->cpuid->feat.rdpid)
 #define vcpu_has_movdiri() (ctxt->cpuid->feat.movdiri)
 #define vcpu_has_movdir64b()   (ctxt->cpuid->feat.movdir64b)
+#define vcpu_has_enqcmd()  (ctxt->cpuid->feat.enqcmd)
 #define vcpu_has_avx512_4vnniw() (ctxt->cpuid->feat.avx512_4vnniw)
 #define vcpu_has_avx512_4fmaps() (ctxt->cpuid->feat.avx512_4fmaps)
 #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16)
@@ -10200,6 +10203,36 @@ x86_emulate(
 state->simd_size = simd_none;
 break;
 
+case X86EMUL_OPC_F2(0x0f38, 0xf8): /* enqcmd r,m512 */
+case X86EMUL_OPC_F3(0x0f38, 0xf8): /* enqcmds r,m512 */
+host_and_vcpu_must_have(enqcmd);
+generate_exception_if(ea.type != OP_MEM, EXC_UD);
+generate_exception_if(vex.pfx != vex_f2 && !mode_ring0(), EXC_GP, 0);
+src.val = truncate_ea(*dst.reg);
+generate_exception_if(!is_aligned(x86_seg_es, src.val, 64, ctxt, ops),
+  EXC_GP, 0);
+fail_if(!ops->blk);
+BUILD_BUG_ON(sizeof(*mmvalp) < 64);
+if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp, 64,
+ ctxt)) != X86EMUL_OKAY )
+goto done;
+if ( vex.pfx == vex_f2 ) /* enqcmd */
+{
+fail_if(!ops->read_msr);
+if ( (rc = ops->read_msr(MSR_IA32_PASID,
+ _val, ctxt)) != X86EMUL_OKAY )
+goto done;
+generate_exception_if(!(msr_val & PASID_VALID), EXC_GP, 0);
+mmvalp->data32[0] = MASK_EXTR(msr_val, PASID_PASID_MASK);
+}
+mmvalp->data32[0] &= ~0x7ff0;
+state->blk = blk_enqcmd;
+if ( (rc = ops->blk(x86_seg_es, src.val, mmvalp, 64, &_regs.eflags,
+state, ctxt)) != X86EMUL_OKAY )
+goto done;
+state->simd_size = simd_none;
+break;
+
 case X86EMUL_OPC(0x0f38, 0xf9): /* movdiri mem,r */
 host_and_vcpu_must_have(movdiri);
 generate_exception_if(dst.type != OP_MEM, EXC_UD);
@@ -11480,11 +11513,36 @@ int x86_emul_blk(
 {
 switch ( state->blk )
 {
+bool zf;
+
 /*
  * Throughout this switch(), memory clobbers are used to compensate
  * that other operands may not properly express the (full) memory
  * ranges covered.
  */
+case blk_enqcmd:
+ASSERT(bytes == 64);
+if ( ((unsigned long)ptr & 0x3f) )
+{
+ASSERT_UNREACHABLE();
+return X86EMUL_UNHANDLEABLE;
+}
+*eflags &= ~EFLAGS_MASK;
+#ifdef HAVE_AS_ENQCMD
+asm ( "enqcmds (%[src]), %[dst]" ASM_FLAG_OUT(, "; setz %0")
+  : [zf] ASM_FLAG_OUT("=@ccz", "=qm") (zf)
+  : [src] "r" (data), [dst] "r" (ptr) : "memory" );
+#else
+/* enqcmds (%rsi), %rdi */
+asm ( ".byte 0xf3, 0x0f, 0x38, 0xf8, 0x3e"
+  ASM_FLAG_OUT(, "; setz %[zf]")
+  : [zf] ASM_FLAG_OUT("=@ccz", "=qm") (zf)
+  : "S" (data), "D" (ptr) : "memory" );
+#endif
+if ( zf )
+*eflags |= X86_EFLAGS_ZF;
+break;
+
 case blk_movdir:
 switch ( bytes )
 {
---

[PATCH v8 04/12] x86emul: support SERIALIZE

2020-05-05 Thread Jan Beulich

... enabling its use by all guest kinds at the same time.

Signed-off-by: Jan Beulich 
---
v7: Re-base.
v6: New.

--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -214,6 +214,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
 {"avx512-4vnniw",0x0007,  0, CPUID_REG_EDX,  2,  1},
 {"avx512-4fmaps",0x0007,  0, CPUID_REG_EDX,  3,  1},
 {"md-clear", 0x0007,  0, CPUID_REG_EDX, 10,  1},
+{"serialize",0x0007,  0, CPUID_REG_EDX, 14,  1},
 {"cet-ibt",  0x0007,  0, CPUID_REG_EDX, 20,  1},
 {"ibrsb",0x0007,  0, CPUID_REG_EDX, 26,  1},
 {"stibp",0x0007,  0, CPUID_REG_EDX, 27,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -161,6 +161,7 @@ static const char *const str_7d0[32] =
 
 [10] = "md-clear",
 /* 12 */[13] = "tsx-force-abort",
+[14] = "serialize",
 
 [18] = "pconfig",
 [20] = "cet-ibt",
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -158,6 +158,7 @@ static inline bool xcr0_mask(uint64_t ma
 #define cpu_has_movdir64b  cp.feat.movdir64b
 #define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6))
 #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6))
+#define cpu_has_serialize  cp.feat.serialize
 #define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6))
 
 #define cpu_has_xgetbv1   (cpu_has_xsave && cp.xstate.xgetbv1)
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1927,6 +1927,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_enqcmd()  (ctxt->cpuid->feat.enqcmd)
 #define vcpu_has_avx512_4vnniw() (ctxt->cpuid->feat.avx512_4vnniw)
 #define vcpu_has_avx512_4fmaps() (ctxt->cpuid->feat.avx512_4fmaps)
+#define vcpu_has_serialize()   (ctxt->cpuid->feat.serialize)
 #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16)
 
 #define vcpu_must_have(feat) \
@@ -5660,6 +5661,18 @@ x86_emulate(
 goto done;
 break;
 
+case 0xe8:
+switch ( vex.pfx )
+{
+case vex_none: /* serialize */
+host_and_vcpu_must_have(serialize);
+asm volatile ( ".byte 0x0f, 0x01, 0xe8" );
+break;
+default:
+goto unimplemented_insn;
+}
+break;
+
 case 0xf8: /* swapgs */
 generate_exception_if(!mode_64bit(), EXC_UD);
 generate_exception_if(!mode_ring0(), EXC_GP, 0);
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -129,6 +129,7 @@
 #define cpu_has_avx512_4vnniw   boot_cpu_has(X86_FEATURE_AVX512_4VNNIW)
 #define cpu_has_avx512_4fmaps   boot_cpu_has(X86_FEATURE_AVX512_4FMAPS)
 #define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
+#define cpu_has_serialize   boot_cpu_has(X86_FEATURE_SERIALIZE)
 
 /* CPUID level 0x0007:1.eax */
 #define cpu_has_avx512_bf16 boot_cpu_has(X86_FEATURE_AVX512_BF16)
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -258,6 +258,7 @@ XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /
 XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A  AVX512 Multiply Accumulation 
Single Precision */
 XEN_CPUFEATURE(MD_CLEAR,  9*32+10) /*A  VERW clears microarchitectural 
buffers */
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
+XEN_CPUFEATURE(SERIALIZE, 9*32+14) /*A  SERIALIZE insn */
 XEN_CPUFEATURE(CET_IBT,   9*32+20) /*   CET - Indirect Branch Tracking */
 XEN_CPUFEATURE(IBRSB, 9*32+26) /*A  IBRS and IBPB support (used by 
Intel) */
 XEN_CPUFEATURE(STIBP, 9*32+27) /*A  STIBP */

[PATCH v8 01/12] x86emul: disable FPU/MMX/SIMD insn emulation when !HVM

2020-05-05 Thread Jan Beulich

In a pure PV environment (the PV shim in particular) we don't really
need emulation of all these. To limit #ifdef-ary utilize some of the
CASE_*() macros we have, by providing variants expanding to
(effectively) nothing (really a label, which in turn requires passing
-Wno-unused-label to the compiler when build such configurations).

Due to the mixture of macro and #ifdef use, the placement of some of
the #ifdef-s is a little arbitrary.

The resulting object file's .text is less than half the size of the
original, and looks to also be compiling a little more quickly.

This is meant as a first step; more parts can likely be disabled down
the road.

Suggested-by: Andrew Cooper 
Signed-off-by: Jan Beulich 
---
v7: Integrate into this series. Re-base.
---
I'll be happy to take suggestions allowing to avoid -Wno-unused-label.

--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -73,6 +73,9 @@ obj-y += vm_event.o
 obj-y += xstate.o
 extra-y += asm-macros.i
 
+ifneq ($(CONFIG_HVM),y)
+x86_emulate.o: CFLAGS-y += -Wno-unused-label
+endif
 x86_emulate.o: x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h
 
 efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
--- a/xen/arch/x86/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate.c
@@ -42,6 +42,12 @@
 }  \
 })
 
+#ifndef CONFIG_HVM
+# define X86EMUL_NO_FPU
+# define X86EMUL_NO_MMX
+# define X86EMUL_NO_SIMD
+#endif
+
 #include "x86_emulate/x86_emulate.c"
 
 int x86emul_read_xcr(unsigned int reg, uint64_t *val,
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3492,6 +3492,7 @@ x86_decode(
 op_bytes = 4;
 break;
 
+#ifndef X86EMUL_NO_SIMD
 case simd_packed_int:
 switch ( vex.pfx )
 {
@@ -3557,6 +3558,7 @@ x86_decode(
 case simd_256:
 op_bytes = 32;
 break;
+#endif /* !X86EMUL_NO_SIMD */
 
 default:
 op_bytes = 0;
@@ -3711,6 +3713,7 @@ x86_emulate(
 break;
 }
 
+#ifndef X86EMUL_NO_SIMD
 /* With a memory operand, fetch the mask register in use (if any). */
 if ( ea.type == OP_MEM && evex.opmsk &&
  _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY )
@@ -3741,6 +3744,7 @@ x86_emulate(
 put_fpu(X86EMUL_FPU_opmask, false, state, ctxt, ops);
 fpu_type = X86EMUL_FPU_none;
 }
+#endif /* !X86EMUL_NO_SIMD */
 
 /* Decode (but don't fetch) the destination operand: register or memory. */
 switch ( d & DstMask )
@@ -4386,11 +4390,13 @@ x86_emulate(
 singlestep = _regs.eflags & X86_EFLAGS_TF;
 break;
 
+#ifndef X86EMUL_NO_FPU
 case 0x9b:  /* wait/fwait */
 host_and_vcpu_must_have(fpu);
 get_fpu(X86EMUL_FPU_wait);
 emulate_fpu_insn_stub(b);
 break;
+#endif
 
 case 0x9c: /* pushf */
 if ( (_regs.eflags & X86_EFLAGS_VM) &&
@@ -4800,6 +4806,7 @@ x86_emulate(
 break;
 }
 
+#ifndef X86EMUL_NO_FPU
 case 0xd8: /* FPU 0xd8 */
 host_and_vcpu_must_have(fpu);
 get_fpu(X86EMUL_FPU_fpu);
@@ -5134,6 +5141,7 @@ x86_emulate(
 }
 }
 break;
+#endif /* !X86EMUL_NO_FPU */
 
 case 0xe0 ... 0xe2: /* loop{,z,nz} */ {
 unsigned long count = get_loop_count(&_regs, ad_bytes);
@@ -6079,6 +6087,8 @@ x86_emulate(
 case X86EMUL_OPC(0x0f, 0x19) ... X86EMUL_OPC(0x0f, 0x1f): /* nop */
 break;
 
+#ifndef X86EMUL_NO_MMX
+
 case X86EMUL_OPC(0x0f, 0x0e): /* femms */
 host_and_vcpu_must_have(3dnow);
 asm volatile ( "femms" );
@@ -6099,39 +6109,71 @@ x86_emulate(
 state->simd_size = simd_other;
 goto simd_0f_imm8;
 
-#define CASE_SIMD_PACKED_INT(pfx, opc)   \
+#endif /* !X86EMUL_NO_MMX */
+
+#if !defined(X86EMUL_NO_SIMD) && !defined(X86EMUL_NO_MMX)
+# define CASE_SIMD_PACKED_INT(pfx, opc)  \
 case X86EMUL_OPC(pfx, opc):  \
 case X86EMUL_OPC_66(pfx, opc)
-#define CASE_SIMD_PACKED_INT_VEX(pfx, opc)   \
+#elif !defined(X86EMUL_NO_SIMD)
+# define CASE_SIMD_PACKED_INT(pfx, opc)  \
+case X86EMUL_OPC_66(pfx, opc)
+#elif !defined(X86EMUL_NO_MMX)
+# define CASE_SIMD_PACKED_INT(pfx, opc)  \
+case X86EMUL_OPC(pfx, opc)
+#else
+# define CASE_SIMD_PACKED_INT(pfx, opc) C##pfx##_##opc
+#endif
+
+#ifndef X86EMUL_NO_SIMD
+
+# define CASE_SIMD_PACKED_INT_VEX(pfx, opc)  \
 CASE_SIMD_PACKED_INT(pfx, opc):  \
 case X86EMUL_OPC_VEX_66(pfx, opc)
 
-#define CASE_SIMD_ALL_FP(kind, pfx, opc) \
+# define CASE_SIMD_ALL_FP(kind, pfx, opc)\
 CASE_SIMD_PACKED_FP(kind, pfx, opc): \
 CASE_SIMD_SCALAR_FP(kind, pfx, opc)
-#define CASE_SIMD_PACKED_FP(kind, pfx, opc)  \
+# define CASE_SIMD_PACKED_FP(kind, pfx, opc) \
 case X86EMUL_OPC##kind(pfx, opc):\
 case X86EMUL_OPC##kind##_66(pfx, opc)
-#define CASE_SIMD_SCALAR_FP(kind, pfx, opc)  \
+# define CASE_SIMD_SCALAR_FP(kind, pfx, opc) \
 case X86EMUL_OPC##kind##_F3(pfx, opc):

[PATCH v8 00/12] x86emul: further work

2020-05-05 Thread Jan Beulich

At least the RDPRU patch is still at least partly RFC. I'd
appreciate though if at least some of the series could go in rather
sooner than later. Note in particular that there's no strong
ordering throughout the entire series, i.e. certain later parts
could be arranged for to go in earlier. This is also specifically
the case for what is now the last patch.

 1: x86emul: disable FPU/MMX/SIMD insn emulation when !HVM
 2: x86emul: support MOVDIR{I,64B} insns
 3: x86emul: support ENQCMD insn
 4: x86emul: support SERIALIZE
 5: x86emul: support X{SUS,RES}LDTRK
 6: x86/HVM: make hvmemul_blk() capable of handling r/o operations
 7: x86emul: support FNSTENV and FNSAVE
 8: x86emul: support FLDENV and FRSTOR
 9: x86emul: support FXSAVE/FXRSTOR
10: x86/HVM: scale MPERF values reported to guests (on AMD)
11: x86emul: support RDPRU
12: x86/HVM: don't needlessly intercept APERF/MPERF/TSC MSR reads

Main changes from v7 are the new patch 6 and quiite a bit of re-work
of what is now patch 9. See individual patches for revision details.

Jan

Re: [PATCH v2 1/4] x86/mm: no-one passes a NULL domain to init_xen_l4_slots()

2020-05-05 Thread Jan Beulich

Andrew,

On 21.04.2020 18:40, Roger Pau Monné wrote:
> On Tue, Apr 21, 2020 at 11:11:03AM +0200, Jan Beulich wrote:
>> Drop the NULL checks - they've been introduced by commit 8d7b633ada
>> ("x86/mm: Consolidate all Xen L4 slot writing into
>> init_xen_l4_slots()") for no apparent reason.
>>
>> Signed-off-by: Jan Beulich 
> 
> Reviewed-by: Roger Pau Monné 

you weren't entirely happy with the change because of the
possible (or, as you state, necessary) need to undo this. I
still think in the current shape the NULL checks are
pointless and hence would better go away. Re-introducing them
(adjusted to whatever shape the function may be in by that
time) is not that big of a problem. May I ask that you
explicitly clarify whether you actively NAK the patch, accept
it going in with Roger's R-b, or would be willing to ack it?

Thanks, Jan

Re: [PATCH v2 4/4] x86: adjustments to guest handle treatment

2020-05-05 Thread Jan Beulich

On 22.04.2020 10:26, Roger Pau Monné wrote:
> On Tue, Apr 21, 2020 at 11:13:23AM +0200, Jan Beulich wrote:
>> First of all avoid excessive conversions. copy_{from,to}_guest(), for
>> example, work fine with all of XEN_GUEST_HANDLE{,_64,_PARAM}().
>>
>> Further
>> - do_physdev_op_compat() didn't use the param form for its parameter,
>> - {hap,shadow}_track_dirty_vram() wrongly used the param form,
>> - compat processor Px logic failed to check compatibility of native and
>>   compat structures not further converted.
>>
>> As this eliminates all users of guest_handle_from_param() and as there's
>> no real need to allow for conversions in both directions, drop the
>> macros as well.
>>
>> Signed-off-by: Jan Beulich 
>>[...]
>> --- a/xen/drivers/acpi/pmstat.c
>> +++ b/xen/drivers/acpi/pmstat.c
>> @@ -492,7 +492,7 @@ int do_pm_op(struct xen_sysctl_pm_op *op
>>  return ret;
>>  }
>>  
>> -int acpi_set_pdc_bits(u32 acpi_id, XEN_GUEST_HANDLE_PARAM(uint32) pdc)
>> +int acpi_set_pdc_bits(u32 acpi_id, XEN_GUEST_HANDLE(uint32) pdc)
> 
> Nit: switch to uint32_t while there?
> 
> Reviewed-by: Roger Pau Monné 

Unless I hear objections, I'm intending to commit this then in a
day or two with the suggested change made and the R-b given. Of
course a formally required ack for the Arm side dropping of
guest_handle_from_param() would still be nice ...

Jan

[PATCH v3] x86/PV: remove unnecessary toggle_guest_pt() overhead

2020-05-05 Thread Jan Beulich

While the mere updating of ->pv_cr3 and ->root_pgt_changed aren't overly
expensive (but still needed only for the toggle_guest_mode() path), the
effect of the latter on the exit-to-guest path is not insignificant.
Move the logic into toggle_guest_mode(), on the basis that
toggle_guest_pt() will always be invoked in pairs, yet we can't safely
undo the setting of root_pgt_changed during the second of these
invocations.

While at it, add a comment ahead of toggle_guest_pt() to clarify its
intended usage.

Signed-off-by: Jan Beulich 
---
v3: Add comment ahead of toggle_guest_pt().
v2: Extend description.

--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -393,18 +393,10 @@ bool __init xpti_pcid_enabled(void)
 
 static void _toggle_guest_pt(struct vcpu *v)
 {
-const struct domain *d = v->domain;
-struct cpu_info *cpu_info = get_cpu_info();
 unsigned long cr3;
 
 v->arch.flags ^= TF_kernel_mode;
 update_cr3(v);
-if ( d->arch.pv.xpti )
-{
-cpu_info->root_pgt_changed = true;
-cpu_info->pv_cr3 = __pa(this_cpu(root_pgt)) |
-   (d->arch.pv.pcid ? get_pcid_bits(v, true) : 0);
-}
 
 /*
  * Don't flush user global mappings from the TLB. Don't tick TLB clock.
@@ -412,15 +404,11 @@ static void _toggle_guest_pt(struct vcpu
  * In shadow mode, though, update_cr3() may need to be accompanied by a
  * TLB flush (for just the incoming PCID), as the top level page table may
  * have changed behind our backs. To be on the safe side, suppress the
- * no-flush unconditionally in this case. The XPTI CR3 write, if enabled,
- * will then need to be a flushing one too.
+ * no-flush unconditionally in this case.
  */
 cr3 = v->arch.cr3;
-if ( shadow_mode_enabled(d) )
-{
+if ( shadow_mode_enabled(v->domain) )
 cr3 &= ~X86_CR3_NOFLUSH;
-cpu_info->pv_cr3 &= ~X86_CR3_NOFLUSH;
-}
 write_cr3(cr3);
 
 if ( !(v->arch.flags & TF_kernel_mode) )
@@ -436,6 +424,8 @@ static void _toggle_guest_pt(struct vcpu
 
 void toggle_guest_mode(struct vcpu *v)
 {
+const struct domain *d = v->domain;
+
 ASSERT(!is_pv_32bit_vcpu(v));
 
 /* %fs/%gs bases can only be stale if WR{FS,GS}BASE are usable. */
@@ -449,8 +439,27 @@ void toggle_guest_mode(struct vcpu *v)
 asm volatile ( "swapgs" );
 
 _toggle_guest_pt(v);
+
+if ( d->arch.pv.xpti )
+{
+struct cpu_info *cpu_info = get_cpu_info();
+
+cpu_info->root_pgt_changed = true;
+cpu_info->pv_cr3 = __pa(this_cpu(root_pgt)) |
+   (d->arch.pv.pcid ? get_pcid_bits(v, true) : 0);
+/*
+ * As in _toggle_guest_pt() the XPTI CR3 write needs to be a TLB-
+ * flushing one too for shadow mode guests.
+ */
+if ( shadow_mode_enabled(d) )
+cpu_info->pv_cr3 &= ~X86_CR3_NOFLUSH;
+}
 }
 
+/*
+ * Must be called in matching pairs without returning to guest context
+ * inbetween.
+ */
 void toggle_guest_pt(struct vcpu *v)
 {
 if ( !is_pv_32bit_vcpu(v) )

79 matches

Mail list logo