[linux-linus test] 168086: trouble: blocked/broken/fail/pass

2022-02-11 Thread osstest service owner
flight 168086 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168086/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64                   broken
 build-arm64-pvops             broken
 build-arm64-xsm               broken
 build-arm64-xsm       4 host-install(4)   broken REGR. vs. 168080
 build-arm64           4 host-install(4)   broken REGR. vs. 168080
 build-arm64-pvops     4 host-install(4)   broken REGR. vs. 168080

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop    fail like 168080
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168080
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop    fail like 168080
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop    fail like 168080
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop    fail like 168080
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail like 168080
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail like 168080
 test-armhf-armhf-libvirt 16 saverestore-support-check        fail like 168080
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail like 168080
 test-amd64-amd64-libvirt 15 migrate-support-check            fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check        fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check       fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check   fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check        fail never pass
 test-armhf-armhf-xl  15 migrate-support-check                fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check            fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check            fail never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check        fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check      fail never pass

version targeted for testing:
 linux                1d41d2e82623b40ee27811fe9ea38bafe2e722e9
baseline version:
 linux                f1baf68e1383f6ed93eb9cff2866d46562607a43

 Last test of basis   168080  2022-02-11 00:09:22 Z    1 days
 Testing same since   168086  2022-02-11 20:11:19 Z    0 days    1 attempts


People who touched revisions under test:
  Aaron Liu 
  Alex Deucher 
  Alexandre Ghiti 
  Alviro Iskandar Setiawan 
  Ammar Faizi 
  Andreas Gruenbacher 
  Andrzej Pietrasiewicz 
  Andy Shevchenko 
  Aurelien Jarno 
  Bartosz Golaszewski 
  Bean Huo 
  Bob Peterson 
  Brian Norris 
  Catalin Marinas 
  Changbin Du 
  Christoph Hellwig 
  Christoph Niedermaier 
  Damien Le Moal 
  Daniel Bristot de Oliveira 
  Daniel Stone 
  Daniel Vetter 
  Daniel 

Development Issue of Concern

2022-02-11 Thread Elliott Mitchell
The tradition has been for the active development branch in Git to be
named "master".  Quite a number of people object to the name due to
its history.

In light of such concerns, perhaps the Xen Project should join with other
similar projects and move to have the active development branch renamed
"main"?


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





Re: [PATCH RFC 1/3] xen/efi: Always query the console information and get GOP

2022-02-11 Thread Elliott Mitchell
On Mon, Feb 07, 2022 at 06:52:57PM +, Julien Grall wrote:
> On 07/02/2022 08:46, Jan Beulich wrote:
> > On 06.02.2022 20:28, Julien Grall wrote:
> >>
> >> It is not entirely clear to me why the GOP was only fetched when
> >> the configuration file is used.
> >>
> >> I have tested this on RPI4 and it seems to work. Any chance this
> >> was done to workaround an x86 platform?
> > 
> > This was done so in the context of making the code work for Arm. See
> > commit c38cf865ec82 ("EFI: ignore EFI commandline, skip console setup
> > when booted from GRUB"), the description of which explicitly says
> > 
> > "Don't do EFI console or video configuration when booted by GRUB.  The EFI 
> > boot
> >   code does some console and video initialization to support native EFI 
> > boot from
> >   the EFI boot manager or EFI shell.  This initlization should not be done 
> > when
> >   booted using GRUB."
> 
> I read that and still couldn't figure out why this was done like that.

The most likely motivation was simply "Eww!  ACPI/UEFI use gobs of
memory!  Purge the abomination!"

Unfortunately ACPI/UEFI are large and complex due to trying to solve a
large and complex problem.  ACPI/UEFI attempt to provide an OS agnostic
presentation of the hardware layout.  Whereas device-trees are a common
*format* for presenting hardware to *an* OS (similar to how JSON is a
common format).

Due to the size and complexity, most developers have preferred the
simpler device-tree format even though that severely limits OS choice.
As such, nuking ACPI/UEFI's presence is common in the ARM world.  Versus
the x86 world where Intel dragged everyone onto ACPI/UEFI.

One can see this in patches like Roman Shaposhnik's "Making full 2G of
memory available to Xen on HiKey" which simply tosses EFI into the
garbage bin as useless overhead.

Yet the ARM world is now large enough to justify OS-agnostic solutions
such as ACPI/UEFI.  The standards behind device-trees might be heading in
this direction, but they're way behind.




You stated your patch was for 5.17-rc2.  How much backporting would you
expect this patch to be viable for?  (I'm unsure how much churn is
occurring in the relevant portions of Linux.)  The long-term branches of
Linux include 5.4.179, 5.10.100 and 5.15.23.  `patch` indicated it could
apply to 5.10.92 source with fuzz (hmm).  This suggests 5.15 is likely
viable, but 5.10 is risky and 5.4 is a very long shot.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver

2022-02-11 Thread Stefano Stabellini
On Fri, 11 Feb 2022, Oleksii Moisieiev wrote:
> On Fri, Feb 11, 2022 at 11:18:47AM +, Bertrand Marquis wrote:
> > Hi Oleksii,
> > 
> > 
> > > On 11 Feb 2022, at 10:44, Oleksii Moisieiev  
> > > wrote:
> > > 
> > > Hi Bertrand,
> > > 
> > > On Fri, Feb 11, 2022 at 08:46:05AM +, Bertrand Marquis wrote:
> > >> Hi Oleksii,
> > >> 
> > >> 
> > >>> On 8 Feb 2022, at 18:00, Oleksii Moisieiev  
> > >>> wrote:
> > >>> 
> > >>> This is the implementation of SCI interface, called SCMI-SMC driver,
> > >>> which works as the mediator between XEN Domains and Firmware (SCP, ATF 
> > >>> etc).
> > >>> This allows devices from the Domains to work with clocks, resets and
> > >>> power-domains without access to CPG.
> > >>> 
> > >>> Originally, cpg should be passed to the domain so it can work with
> > >>> power-domains/clocks/resets etc. Considering that cpg can't be split 
> > >>> between
> > >>> the Domains, we get the limitation that the devices, which are using
> > >>> power-domains/clocks/resets etc, couldn't be split between the domains.
> > >>> The solution is to move the power-domain/clock/resets etc to the
> > >>> Firmware (such as SCP firmware or ATF) and provide interface for the
> > >>> Domains. XEN should have an entity, called SCI-Mediator, which is
> > >>> responsible for message redirection between Domains and Firmware and
> > >>> for permission handling.
> > >>> 
> > >>> The following features are implemented:
> > >>> - request SCMI channels from ATF and pass channels to Domains;
> > >>> - set device permissions for Domains based on the Domain partial
> > >>> device-tree. Devices with permissions are able to work with clocks,
> > >>> resets and power-domains via SCMI;
> > >>> - redirect scmi messages from Domains to ATF.
> > >> 
> > >> Before going more deeply in the code I would like to discuss the general
> > >> design here and ask some questions to prevent to rework the code before
> > >> we all agree that this is the right solution and that we want this in 
> > >> Xen.
> > >> 
> > >> First I want to point out that clock/reset/power virtualization is a 
> > >> problem
> > >> on most applications using device pass-through and I am very glad that
> > >> someone is looking into it.
> > >> Also SCMI is the current standard existing for this so relying on it is 
> > >> a very
> > >> good idea.
> > >> 
> > >> Latest version SCMI standard (DEN0056D v3.1) is defining some means
> > >> to use SCMI on a virtualised system. In chapter 4.2.1, the standard
> > >> recommends to set permissions per agent in the hypervisor so that a VM
> > >> could later use the discovery protocol to detect the resources and use 
> > >> them.
> > >> Using this kind of scenario the mediator in Xen would just configure the
> > >> Permissions in the SCMI and would then rely on it to limit what is 
> > >> possible
> > >> by who just by just assigning a channel to a VM.
> > > 
> > >> 
> > >> In your current design (please correct me if I am wrong) you seem to 
> > >> fully
> > >> rely on Xen and the FDT for discovery and permission.
> > > 
> > > In the current implementation Xen is the trusted agent, and it's responsible
> > > for setting permissions. During initialization it discovers the agent and
> > > sets permissions for Dom0 by using BASE_SET_DEVICE_PERMISSIONS. When a
> > > new domain is created, Xen assigns an agent id for this domain and requests
> > > the resources that are passed through to this Domain.
> > 
> > Ok
> > 
> > > 
> > > I'm getting the following information from FDT:
> > > 1) Shared memory addresses, which should be used for agents. During
> > > initialization I send BASE_DISCOVER_AGENT to each of these addresses and
> > > receive agent_id. Xen is responsible for assigning agent_id for the
> > > Domain. Then Xen intercepts smc calls from the domain, sets agent_id and
> > > redirects them to the Firmware.
> > 
> > So Xen is setting the agent ID, no way for a guest to get access to 
> > something it
> > should with more check, am I right ?
> > 
> 
> Yes. Xen is the only entity which is trusted, so it's responsible for
> setting permissions and assigning agent_id. A guest gets access only
> to the devices it's allowed to use.
> 
> > > 
> > > 2) Devices that are using SCMI. Those devices have clocks/power/resets
> > > etc. related to the scmi protocol (as it is done in the Linux kernel),
> > > and scmi_devid should be set. I'm currently preparing to send a patch
> > > updating the kernel bindings with this parameter to the Linux kernel.
> > > The scmi_devid value should match the device id set in the Firmware.
> > > dt example:
> > >  {
> > >scmi_devid = <1>; // usb0 device id
> > >clocks = <_clock 1> // relies on clock with id 1
> > > }
> > > 
> > > Xen requests permission for the device when device is attached to the
> > > Domain during creation.
> > 
> > Without this, how is (if it is) the linux kernel using SCMI for power 
> > management ?
> 
> Here is how it should be described in FDT: 
> /
> {
> firmware {
> scmi {
> 

Re: Metadata and signalling channels for Zephyr virtio-backends on Xen

2022-02-11 Thread Stefano Stabellini
On Fri, 11 Feb 2022, Alex Bennée wrote:
> > FYI, a good and promising approach to handle both SCMI and SCPI is the
> > series recently submitted by EPAM to mediate SCMI and SCPI requests in
> > Xen: https://marc.info/?l=xen-devel&m=163947444032590
> >
> > (Another "special" virtio backend is virtio-iommu for similar reasons:
> > the guest p2m address mappings and also the IOMMU drivers are in Xen.
> > It is not immediately clear whether a virtio-iommu backend would need to
> > be in Xen or run as a process in dom0/domU.)
> >
> > On the other hand, for all the other "normal" protocols (e.g.
> > virtio-net, virtio-block, etc.) the backend would naturally run as a
> > process in dom0 or domU (e.g. QEMU in Dom0) as one would expect.
> 
> Can domUs not be given particular access to HW they might want to
> tweak? I assume at some point a block device backend needs to actually
> talk to real HW to store the blocks (even if in most cases it would be a
> kernel doing the HW access on its behalf).

Yes, it would. Block and network are subsystems with limited visibility,
access, and harmful capabilities (assuming IOMMU).

If the block device goes down or is misused, block might not work but
everything else is expected to work. Block only requires visibility of
the block device for it to work. The same is true for network, GPU, USB,
etc.

SCMI is different. If SCMI is misused the whole platform is affected.
SCMI implies visibility of everything in the system. It is not much
about emulating SCMI but more about mediating SCMI calls.

In other words, SCMI is not a device, it is a core interface. In a Xen
model, Xen virtualizes CPU and memory and other core features/interfaces
(timers, interrupt controller, IOMMU, etc). The PCI root complex is
handled by Xen too. Individual (PCI and non-PCI) devices are assigned to
guests.

These are the reasons why I think the best way to enable SCMI in
upstream Xen is with a mediator in the hypervisor as it is currently in
development. Any chance you could combine your efforts with EPAM's
outstanding series? You might be able to spot gaps if any, and might
even have already code to fill those gaps. It would be fantastic to have
your reviews and/or contributions on xen-devel.

Otherwise, if you have to run the virtio-scmi backend in userspace, why
not try to get it to work on Xen :-) It might not be the ideal solution,
but it could be a good learning experience and pave the way for the
other virtio backends which definitely will be in userspace
(virtio-block, virtio-gpu, etc).


> >> Currently the demo setup
> >> is intermediated by a double-ended vhost-user daemon running on the
> >> devbox acting as a go between a number of QEMU instances representing
> >> the front and back-ends. You can view the architecture with Vincents
> >> diagram here:
> >> 
> >>   
> >> https://docs.google.com/drawings/d/1YSuJUSjEdTi2oEUq4oG4A9pBKSEJTAp6hhcHKKhmYHs/edit?usp=sharing
> >> 
> >> The key virtq handling is done over the special carve outs of shared
> >> memory between the front end and guest. However the signalling is
> >> currently over a virtio device on the backend. This is useful for the
> >> PoC but obviously in a real system we don't have a hidden POSIX system
> >> acting as a go between not to mention the additional latency it causes
> >> with all those context switches.
> >> 
> >> I was hoping we could get some more of the Xen experts to the next
> >> Stratos sync (17th Feb) to go over approaches for a properly hosted on
> >> Xen approach. From my recollection (Vincent please correct me if I'm
> >> wrong) of last week the issues that need solving are:
> >
> > Unfortunately I have a regular conflict which prevents me from being
> > able to join the Stratos calls. However, I can certainly make myself
> > available for one call (unless something unexpected comes up).
> >
> >
> >>  * How to handle configuration steps as FE guests come up
> >> 
> >> The SCMI server will be a long running persistent backend because it is
> >> managing real HW resources. However the guests may be ephemeral (or just
> >> restarted) so we can't just hard-code everything in a DTB. While the
> >> virtio-negotiation in the config space covers most things we still need
> >> information like where in the guests address space the shared memory
> >> lives and at what offset into that the queues are created. As far as I'm
> >> aware the canonical source of domain information is XenStore
> >> (https://wiki.xenproject.org/wiki/XenStore) but this relies on a Dom0
> >> type approach. Is there an alternative for dom0less systems or do we
> >> need a dom0-light approach, for example using STR-21 (Ensure Zephyr can
> >> run cleanly as a Dom0 guest) providing just enough services for FE's to
> >> register metadata and BE's to read it?
> >
> > I'll try to answer the question for a generic virtio frontend and
> > backend instead (not SCMI because SCMI is unique due to the reasons
> > above.)
> >
> > Yes, xenstore is the easiest way 

[linux-5.4 test] 168085: trouble: blocked/broken/fail/pass

2022-02-11 Thread osstest service owner
flight 168085 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168085/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64                   broken
 build-arm64-pvops             broken
 build-arm64-xsm               broken
 build-arm64-pvops     4 host-install(4)   broken REGR. vs. 168060
 build-arm64-xsm       4 host-install(4)   broken REGR. vs. 168060
 build-arm64           4 host-install(4)   broken REGR. vs. 168060

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 168084
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat  fail pass in 168084

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop    fail like 168060
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop     fail like 168060
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop    fail like 168060
 test-armhf-armhf-libvirt 16 saverestore-support-check        fail like 168060
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168060
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop    fail like 168060
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop    fail like 168060
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop     fail like 168060
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop     fail like 168060
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail like 168060
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail like 168060
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop     fail like 168060
 test-amd64-i386-xl-pvshim    14 guest-start           fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check            fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail never pass
 test-amd64-i386-libvirt  15 migrate-support-check            fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check        fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check        fail never pass
 test-armhf-armhf-xl  15 migrate-support-check                fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check            fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check       fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check   fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check            fail never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check        fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail never pass

version targeted for testing:
 linux

Re: Metadata and signalling channels for Zephyr virtio-backends on Xen

2022-02-11 Thread Alex Bennée


Stefano Stabellini  writes:

> On Mon, 7 Feb 2022, Alex Bennée wrote:
>> Hi Stefano,
>> 
>> Vincent gave an update on his virtio-scmi work at the last Stratos sync
>> call and the discussion moved onto next steps.
>
> Hi Alex,
>
> I don't know the specifics of virtio-scmi, but if it is about power,
> clocks, reset, etc. like the original SCMI protocol, then virtio-scmi is
> likely going to be very different from all the other virtio frontends
> and backends. That's because SCMI requires a full view of the system,
> which is different from something like virtio-net that is limited to the
> emulation of 1 device. For this reason, it is likely that the
> virtio-scmi backend would be a better fit in Xen itself, rather than run
> in userspace inside a VM.

That may be a good solution for Xen but I still think it's worthwhile
being able to package SCMI in a VM for other hypervisors. We are just
happening to use Xen as a nice type-1 example.

Vincent's SCMI server code is portable anyway and can reside in a Zephyr
app, firmware blob or a userspace vhost-user client.

> FYI, a good and promising approach to handle both SCMI and SCPI is the
> series recently submitted by EPAM to mediate SCMI and SCPI requests in
> Xen: https://marc.info/?l=xen-devel&m=163947444032590
>
> (Another "special" virtio backend is virtio-iommu for similar reasons:
> the guest p2m address mappings and also the IOMMU drivers are in Xen.
> It is not immediately clear whether a virtio-iommu backend would need to
> be in Xen or run as a process in dom0/domU.)
>
> On the other hand, for all the other "normal" protocols (e.g.
> virtio-net, virtio-block, etc.) the backend would naturally run as a
> process in dom0 or domU (e.g. QEMU in Dom0) as one would expect.

Can domUs not be given particular access to HW they might want to
tweak? I assume at some point a block device backend needs to actually
talk to real HW to store the blocks (even if in most cases it would be a
kernel doing the HW access on its behalf).

>> Currently the demo setup
>> is intermediated by a double-ended vhost-user daemon running on the
>> devbox acting as a go between a number of QEMU instances representing
>> the front and back-ends. You can view the architecture with Vincents
>> diagram here:
>> 
>>   
>> https://docs.google.com/drawings/d/1YSuJUSjEdTi2oEUq4oG4A9pBKSEJTAp6hhcHKKhmYHs/edit?usp=sharing
>> 
>> The key virtq handling is done over the special carve outs of shared
>> memory between the front end and guest. However the signalling is
>> currently over a virtio device on the backend. This is useful for the
>> PoC but obviously in a real system we don't have a hidden POSIX system
>> acting as a go between not to mention the additional latency it causes
>> with all those context switches.
>> 
>> I was hoping we could get some more of the Xen experts to the next
>> Stratos sync (17th Feb) to go over approaches for a properly hosted on
>> Xen approach. From my recollection (Vincent please correct me if I'm
>> wrong) of last week the issues that need solving are:
>
> Unfortunately I have a regular conflict which prevents me from being
> able to join the Stratos calls. However, I can certainly make myself
> available for one call (unless something unexpected comes up).
>
>
>>  * How to handle configuration steps as FE guests come up
>> 
>> The SCMI server will be a long running persistent backend because it is
>> managing real HW resources. However the guests may be ephemeral (or just
>> restarted) so we can't just hard-code everything in a DTB. While the
>> virtio-negotiation in the config space covers most things we still need
>> information like where in the guests address space the shared memory
>> lives and at what offset into that the queues are created. As far as I'm
>> aware the canonical source of domain information is XenStore
>> (https://wiki.xenproject.org/wiki/XenStore) but this relies on a Dom0
>> type approach. Is there an alternative for dom0less systems or do we
>> need a dom0-light approach, for example using STR-21 (Ensure Zephyr can
>> run cleanly as a Dom0 guest) providing just enough services for FE's to
>> register metadata and BE's to read it?
>
> I'll try to answer the question for a generic virtio frontend and
> backend instead (not SCMI because SCMI is unique due to the reasons
> above.)
>
> Yes, xenstore is the easiest way to exchange configuration information
> between domains. I think EPAM used xenstore to exchange the
> configuration information in their virtio-block demo. There is a way to
> use xenstore even between dom0less VMs:
> https://marc.info/?l=xen-devel&m=164340547602391 Not just xenstore but
> full PV drivers too. However, in the dom0less case xenstore is going to
> become available some time after boot, not immediately at startup time.
> That's because you need to wait until xenstored is up and running.
>
> There are other ways to send data from one VM to another which are
> available immediately at boot, such as Argo and 
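
As a rough sketch of the xenstore hand-off described above: a backend
process in dom0 could pull per-guest configuration out of xenstore with
the libxenstore C API. Only xs_open()/xs_read()/xs_close() are real
library calls here; the "data/virtio/..." path layout and the helper
name are invented for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <xenstore.h>

/* Read a (hypothetical) shared-memory base address published for a guest. */
static int read_guest_virtio_config(unsigned int domid)
{
    struct xs_handle *xsh = xs_open(0);
    char path[64];
    unsigned int len;
    char *val;
    int rc = -1;

    if ( !xsh )
        return rc;

    snprintf(path, sizeof(path),
             "/local/domain/%u/data/virtio/shm-base", domid);

    /* xs_read() returns a malloc'd, NUL-terminated value, or NULL. */
    val = xs_read(xsh, XBT_NULL, path, &len);
    if ( val )
    {
        printf("dom%u shared memory base: %s\n", domid, val);
        free(val);
        rc = 0;
    }

    xs_close(xsh);
    return rc;
}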

Re: [PATCH] PCI/MSI: msix_setup_msi_descs: Restore logic for msi_attrib.can_mask

2022-02-11 Thread Bjorn Helgaas
On Fri, Feb 11, 2022 at 01:10:22AM +0100, Josef Johansson wrote:
> On 2/11/22 00:55, Bjorn Helgaas wrote:
> > On Sat, Jan 22, 2022 at 02:10:01AM +0100, Josef Johansson wrote:
> >> From: Josef Johansson 
> >>
> >> PCI/MSI: msix_setup_msi_descs: Restore logic for msi_attrib.can_mask
> > Please match the form and style of previous subject lines (in
> > particular, omit "msix_setup_msi_descs:").
> Would this subject suffice?
> PCI/MSI: Correct use of can_mask in msi_add_msi_desc()

Looks good to me!

Bjorn



Re: [RFC PATCH] arm/vgic-v3: provide custom callbacks for pend_lpi_tree radix tree

2022-02-11 Thread Julien Grall




On 11/02/2022 15:45, Luca Fancellu wrote:




On 11 Feb 2022, at 15:26, Julien Grall  wrote:

Hi Luca,

On 11/02/2022 15:00, Luca Fancellu wrote:

pend_lpi_tree is a radix tree used to store pending irqs, the tree is
protected by a lock for read/write operations.
Currently the radix tree default function to free items uses the
RCU mechanism, calling call_rcu and deferring the operation.
However every access to the structure is protected by the lock so we
can avoid using the default free function that, by using RCU,
increases memory usage and impacts the predictability of the system.


I understand the goal, but looking at the implementation of vgic_v3_lpi_to_pending()
(copied below for convenience), we would release the lock as soon as the
look-up finishes, yet the element is returned.

static struct pending_irq *vgic_v3_lpi_to_pending(struct domain *d,
  unsigned int lpi)
{
struct pending_irq *pirq;

read_lock(&d->arch.vgic.pend_lpi_tree_lock);
pirq = radix_tree_lookup(&d->arch.vgic.pend_lpi_tree, lpi);
read_unlock(&d->arch.vgic.pend_lpi_tree_lock);

return pirq;
}

So the lock will not protect us against removal. If you want to drop the RCU, 
you will need to ensure the structure pending_irq is suitably protected. I 
haven't checked whether there are other locks that may suit us here.



Hi Julien,

Yes you are right! I missed that, sorry for the noise.


Actually,... I think I am wrong :/.

I thought the lock pend_lpi_tree_lock would protect pending_irq, but it 
only protects the radix tree element (not the value).


The use in its_discard_event() seems to confirm that because the
pending_irq is re-initialized as soon as it gets destroyed.

I would like a second opinion though.

Cheers,

--
Julien Grall



Re: [PATCH V2 5/13] hid: use time_is_after_jiffies() instead of jiffies judgment

2022-02-11 Thread srinivas pandruvada
On Thu, 2022-02-10 at 18:30 -0800, Qing Wang wrote:
> From: Wang Qing 
> 
> It is better to use time_xxx() directly instead of jiffies judgment
> for understanding.
> 
> Signed-off-by: Wang Qing 
Acked-by: Srinivas Pandruvada 

> ---
>  drivers/hid/intel-ish-hid/ipc/ipc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hid/intel-ish-hid/ipc/ipc.c b/drivers/hid/intel-
> ish-hid/ipc/ipc.c
> index 8ccb246..15e1423
> --- a/drivers/hid/intel-ish-hid/ipc/ipc.c
> +++ b/drivers/hid/intel-ish-hid/ipc/ipc.c
> @@ -578,7 +578,7 @@ static void _ish_sync_fw_clock(struct
> ishtp_device *dev)
> static unsigned long   prev_sync;
> uint64_t   usec;
>  
> -   if (prev_sync && jiffies - prev_sync < 20 * HZ)
> +   if (prev_sync && time_is_after_jiffies(prev_sync + 20 * HZ))
> return;
>  
> prev_sync = jiffies;
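
For readers unfamiliar with the helpers involved: the change above swaps
an open-coded jiffies subtraction for the standard comparison macros.
Their rough shape (paraphrased from include/linux/jiffies.h; the kernel
versions add type checking on top) is:

#define time_after(a, b)          ((long)((b) - (a)) < 0)   /* wrap-safe */
#define time_before(a, b)         time_after(b, a)
#define time_is_after_jiffies(a)  time_before(jiffies, a)

/*
 * So time_is_after_jiffies(prev_sync + 20 * HZ) reads as "prev_sync +
 * 20 s is still in the future", i.e. less than 20 seconds have elapsed
 * since prev_sync -- the same condition as the old
 * "jiffies - prev_sync < 20 * HZ", expressed with the wrap-safe helpers.
 */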




Re: [RFC PATCH] arm/vgic-v3: provide custom callbacks for pend_lpi_tree radix tree

2022-02-11 Thread Luca Fancellu



> On 11 Feb 2022, at 15:26, Julien Grall  wrote:
> 
> Hi Luca,
> 
> On 11/02/2022 15:00, Luca Fancellu wrote:
>> pend_lpi_tree is a radix tree used to store pending irqs, the tree is
>> protected by a lock for read/write operations.
>> Currently the radix tree default function to free items uses the
>> RCU mechanism, calling call_rcu and deferring the operation.
>> However every access to the structure is protected by the lock so we
>> can avoid using the default free function that, by using RCU,
>> increases memory usage and impacts the predictability of the system.
> 
> I understand the goal, but looking at the implementation of
> vgic_v3_lpi_to_pending() (copied below for convenience), we would release the
> lock as soon as the look-up finishes, yet the element is returned.
> 
> static struct pending_irq *vgic_v3_lpi_to_pending(struct domain *d,
>  unsigned int lpi)
> {
>struct pending_irq *pirq;
> 
>    read_lock(&d->arch.vgic.pend_lpi_tree_lock);
>    pirq = radix_tree_lookup(&d->arch.vgic.pend_lpi_tree, lpi);
>    read_unlock(&d->arch.vgic.pend_lpi_tree_lock);
> 
>return pirq;
> }
> 
> So the lock will not protect us against removal. If you want to drop the RCU, 
> you will need to ensure the structure pending_irq is suitably protected. I 
> haven't checked whether there are other locks that may suit us here.
> 

Hi Julien,

Yes you are right! I missed that, sorry for the noise.

Cheers,
Luca

> Cheers,
> 
> -- 
> Julien Grall




Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-11 Thread Roger Pau Monné
On Fri, Feb 11, 2022 at 12:13:38PM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 11.02.22 13:40, Roger Pau Monné wrote:
> > On Fri, Feb 11, 2022 at 07:27:39AM +, Oleksandr Andrushchenko wrote:
> >> Hi, Roger!
> >>
> >> On 10.02.22 18:16, Roger Pau Monné wrote:
> >>> On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
>  From: Oleksandr Andrushchenko 
> 
>  Introduce a per-domain read/write lock to check whether vpci is present,
>  so we are sure there are no accesses to the contents of the vpci struct
>  if not. This lock can be used (and in a few cases is used right away)
>  so that vpci removal can be performed while holding the lock in write
>  mode. Previously such removal could race with vpci_read for example.
> >>> Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
> >>> pci_remove_device, and likely when vPCI gets also used in
> >>> {de}assign_device I think.
> >> Yes, this is indeed an issue, but I was not trying to solve it in
>> context of vPCI locking yet. I think we should discuss how
>> we approach pdev locking, so I can create a patch for that.
>> That being said, I would like not to solve pdev in this patch yet
> >>
> >> ...I do understand we do want to avoid that, but at the moment
> >> a single reliable way for making sure pdev is alive seems to
> >> be pcidevs_lock
> > I think we will need to make pcidevs_lock a rwlock and take it in read
> > mode for pci_get_pdev_by_domain.
> >
> > We didn't have this scenario before where PCI emulation is done in the
> > hypervisor, and hence the locking around those data structures has not
> > been designed for those use-cases.
> Yes, I do understand that.
I hope the pcidevs lock move to an rwlock can be done as a separate
patch. While this is not done, do you think we can proceed with the
vPCI series, with the pcidevs locking re-work being done after?

Ideally we would like to sort out the locking once and for all. I
would like to be sure that what we introduce now doesn't turn out to
interact badly when we decide to look at the pcidevs locking issue.

Thanks, Roger.
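
A minimal sketch of the pcidevs_lock direction suggested above, i.e.
turning it into an rwlock so that lookups such as pci_get_pdev_by_domain
only take it in read mode. This is purely illustrative, not a posted
patch; DEFINE_RWLOCK()/read_lock()/write_lock() are the existing Xen
primitives, the wrapper names are hypothetical.

#include <xen/rwlock.h>

static DEFINE_RWLOCK(pcidevs_rwlock);

void pcidevs_read_lock(void)    { read_lock(&pcidevs_rwlock); }
void pcidevs_read_unlock(void)  { read_unlock(&pcidevs_rwlock); }
void pcidevs_write_lock(void)   { write_lock(&pcidevs_rwlock); }
void pcidevs_write_unlock(void) { write_unlock(&pcidevs_rwlock); }

/*
 * A caller that only needs the pdev to stay alive would then do:
 *
 *     pcidevs_read_lock();
 *     pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
 *     ... use pdev ...
 *     pcidevs_read_unlock();
 *
 * while code that adds or removes devices keeps taking the write side.
 */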



Re: [PATCH v6 13/13] xen/arm: account IO handlers for emulated PCI MSI-X

2022-02-11 Thread Julien Grall

Hi Oleksandr,

On 04/02/2022 06:34, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So while adding IO trap handlers for the emulated
MSI-X registers we need to explicitly tell that we have additional IO
handlers, so those are accounted.

Signed-off-by: Oleksandr Andrushchenko 


Acked-by: Julien Grall 

Cheers,

--
Julien Grall



[PATCH CPU v2] cpuid: initialize cpuinfo with boot_cpu_data

2022-02-11 Thread Norbert Manthey
When re-identifying CPU data, we might use uninitialized data when
checking for the cache line property to adapt the cache
alignment. The data that depends on this uninitialized read is
currently not forwarded.

To avoid problems in the future, initialize the cpuinfo data
structure before re-identifying the CPU.

The trace to hit the uninitialized read reported by Coverity is:

bool recheck_cpu_features(unsigned int cpu)
...
struct cpuinfo_x86 c;
...
identify_cpu(&c);

void identify_cpu(struct cpuinfo_x86 *c)
...
generic_identify(c)

static void generic_identify(struct cpuinfo_x86 *c)
...
if (this_cpu->c_early_init)
this_cpu->c_early_init(c); // which is early_init_intel

static void early_init_intel(struct cpuinfo_x86 *c)
...
if (c->x86 == 15 && c->x86_cache_alignment == 64)
c->x86_cache_alignment = 128;

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Signed-off-by: Norbert Manthey 

---
 xen/arch/x86/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -609,7 +609,7 @@ void __init init_guest_cpuid(void)
 bool recheck_cpu_features(unsigned int cpu)
 {
 bool okay = true;
-struct cpuinfo_x86 c;
+struct cpuinfo_x86 c = {0};
 const struct cpuinfo_x86 *bsp = &boot_cpu_data;
 unsigned int i;
 
-- 
2.17.1
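
For context, a stripped-down illustration of the class of bug the
Coverity trace describes; the struct and helpers below are simplified
stand-ins, not the real Xen definitions. A field of an on-stack
structure is read before anything has written it, so the branch acts on
indeterminate data unless the caller zero-initializes the struct, as the
patch does.

struct fake_cpuinfo {
    unsigned int x86;
    unsigned int x86_cache_alignment;
};

static void early_init_example(struct fake_cpuinfo *c)
{
    /* Reads both fields; acts on garbage if they were never set. */
    if ( c->x86 == 15 && c->x86_cache_alignment == 64 )
        c->x86_cache_alignment = 128;
}

static void caller_example(void)
{
    struct fake_cpuinfo bad;          /* indeterminate contents */
    struct fake_cpuinfo good = { 0 }; /* fully zeroed, as in the fix above */

    early_init_example(&bad);   /* may take the branch based on garbage */
    early_init_example(&good);  /* deterministic: branch not taken */
}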




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






Re: [RFC PATCH] arm/vgic-v3: provide custom callbacks for pend_lpi_tree radix tree

2022-02-11 Thread Julien Grall

Hi Luca,

On 11/02/2022 15:00, Luca Fancellu wrote:

pend_lpi_tree is a radix tree used to store pending irqs, the tree is
protected by a lock for read/write operations.

Currently the radix tree default function to free items uses the
RCU mechanism, calling call_rcu and deferring the operation.

However every access to the structure is protected by the lock so we
can avoid using the default free function that, by using RCU,
increases memory usage and impacts the predictability of the system.


I understand the goal, but looking at the implementation of
vgic_v3_lpi_to_pending() (copied below for convenience), we would
release the lock as soon as the look-up finishes, yet the element is returned.


static struct pending_irq *vgic_v3_lpi_to_pending(struct domain *d,
  unsigned int lpi)
{
struct pending_irq *pirq;

read_lock(&d->arch.vgic.pend_lpi_tree_lock);
pirq = radix_tree_lookup(&d->arch.vgic.pend_lpi_tree, lpi);
read_unlock(&d->arch.vgic.pend_lpi_tree_lock);

return pirq;
}

So the lock will not protect us against removal. If you want to drop the 
RCU, you will need to ensure the structure pending_irq is suitably 
protected. I haven't checked whether there are other locks that may suit 
us here.


Cheers,

--
Julien Grall



[linux-5.4 test] 168084: trouble: blocked/broken/fail/pass

2022-02-11 Thread osstest service owner
flight 168084 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168084/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64                   broken
 build-arm64-pvops             broken
 build-arm64-xsm               broken
 build-arm64-pvops     4 host-install(4)   broken REGR. vs. 168060
 build-arm64-xsm       4 host-install(4)   broken REGR. vs. 168060
 build-arm64           4 host-install(4)   broken REGR. vs. 168060

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop    fail like 168060
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop     fail like 168060
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop    fail like 168060
 test-armhf-armhf-libvirt 16 saverestore-support-check        fail like 168060
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168060
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop    fail like 168060
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop    fail like 168060
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop     fail like 168060
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop     fail like 168060
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail like 168060
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail like 168060
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop     fail like 168060
 test-amd64-i386-xl-pvshim    14 guest-start           fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check            fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail never pass
 test-amd64-i386-libvirt  15 migrate-support-check            fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check        fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail never pass
 test-armhf-armhf-xl  15 migrate-support-check                fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check            fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check       fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check   fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check            fail never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check        fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail never pass

version targeted for testing:
 linux                52871671099d1bb3fca5ed076029e4b937bfc053
baseline version:
 linux                76fd334f07cc11e047c2237a19b2cf8b1f653ba2

 Last test of basis   168060  2022-02-08 17:43:16 Z    2 days
Testing same since   168084  

[RFC PATCH] arm/vgic-v3: provide custom callbacks for pend_lpi_tree radix tree

2022-02-11 Thread Luca Fancellu
pend_lpi_tree is a radix tree used to store pending irqs, the tree is
protected by a lock for read/write operations.

Currently the radix tree default function to free items uses the
RCU mechanism, calling call_rcu and deferring the operation.

However every access to the structure is protected by the lock so we
can avoid using the default free function that, by using RCU,
increases memory usage and impacts the predictability of the system.

This commit provides custom callbacks to alloc/free items of the radix
tree and the free function doesn't use the RCU mechanism.

Signed-off-by: Luca Fancellu 
---
 xen/arch/arm/vgic-v3.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 65bb7991a69b..970747a72012 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1650,6 +1650,18 @@ static inline unsigned int vgic_v3_max_rdist_count(struct domain *d)
GUEST_GICV3_RDIST_REGIONS;
 }
 
+static struct radix_tree_node *vgic_v3_radix_tree_node_alloc(void *arg)
+{
+struct radix_tree_node *node = xmalloc(struct radix_tree_node);
+
+return node ? node : NULL;
+}
+
+static void vgic_v3_radix_tree_node_free(struct radix_tree_node *elem, void *arg)
+{
+xfree(elem);
+}
+
 static int vgic_v3_domain_init(struct domain *d)
 {
 struct vgic_rdist_region *rdist_regions;
@@ -1668,6 +1680,14 @@ static int vgic_v3_domain_init(struct domain *d)
 rwlock_init(&d->arch.vgic.pend_lpi_tree_lock);
 radix_tree_init(&d->arch.vgic.pend_lpi_tree);
 
+/*
+ * pend_lpi_tree is protected by rwlock, so don't use lockless RCU default
+ * management for it and provide callbacks to alloc/free elements.
+ */
+radix_tree_set_alloc_callbacks(&d->arch.vgic.pend_lpi_tree,
+   vgic_v3_radix_tree_node_alloc,
+   vgic_v3_radix_tree_node_free, NULL);
+
 /*
  * Domain 0 gets the hardware address.
  * Guests get the virtual platform layout.
-- 
2.17.1




Re: [RFC v2 1/8] xen/hypfs: support fo nested dynamic hypfs nodes

2022-02-11 Thread Oleksii Moisieiev
On Fri, Feb 11, 2022 at 02:28:49PM +0100, Juergen Gross wrote:
> On 11.02.22 09:16, Oleksii Moisieiev wrote:
> > Hi Juergen,
> > 
> > On Thu, Feb 10, 2022 at 08:34:08AM +0100, Juergen Gross wrote:
> > > On 08.02.22 19:00, Oleksii Moisieiev wrote:
> > > 
> > 
> > > > Add new api:
> > > > - hypfs_read_dyndir_entry
> > > > - hypfs_gen_dyndir_entry
> > > > which are the extension of the dynamic hypfs nodes support, presented in
> > > > 0b3b53be8cf226d947a79c2535a9efbb2dd7bc38.
> > > > This allows nested dynamic nodes to be added. Also input parameter is
> > > > hypfs_entry, so properties can also be generated dynamically.
> > > > 
> > > > Generating mixed list of dirs and properties is also supported.
> > > > Same as to the dynamic hypfs nodes, this is anchored in percpu pointer,
> > > > which can be retrieved on any level of the dynamic entries.
> > > > This handle should be allocated on enter() callback and released on
> > > > exit() callback. When using nested dynamic dirs and properties handle
> > > > should be allocated on the first enter() call and released on the last
> > > > exit() call.
> > > > 
> > > > Signed-off-by: Oleksii Moisieiev 
> 
> ...
> 
> > > > diff --git a/xen/include/xen/hypfs.h b/xen/include/xen/hypfs.h
> > > > index e9d4c2555b..5d2728b963 100644
> > > > --- a/xen/include/xen/hypfs.h
> > > > +++ b/xen/include/xen/hypfs.h
> > > > @@ -79,8 +79,8 @@ struct hypfs_entry_dir {
> > > >struct hypfs_dyndir_id {
> > > 
> > > Please rename to struct hypfs_dyndir.
> > 
> > Ok, thanks.
> > 
> > > 
> > > >struct hypfs_entry_dir dir; /* Modified copy of 
> > > > template. */
> > > >struct hypfs_funcs funcs;   /* Dynamic functions. */
> > > > -const struct hypfs_entry_dir *template; /* Template used. */
> > > > -#define HYPFS_DYNDIR_ID_NAMELEN 12
> > > > +const struct hypfs_entry *template; /* Template used. */
> > > > +#define HYPFS_DYNDIR_ID_NAMELEN 32
> > > >char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of hypfs entry. 
> > > > */
> > > >unsigned int id;/* Numerical id. */
> > > 
> > > What about the following change instead:
> > > 
> > > -const struct hypfs_entry_dir *template; /* Template used. */
> > > -#define HYPFS_DYNDIR_ID_NAMELEN 12
> > > -char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of hypfs entry. */
> > > -
> > > -unsigned int id;/* Numerical id. */
> > > -void *data; /* Data associated with id. 
> > > */
> > > +const struct hypfs_entry *template;  /* Template used. */
> > > +union {
> > > +#define HYPFS_DYNDIR_NAMELEN32
> > > +char name[HYPFS_DYNDIR_NAMELEN]; /* Name of hypfs entry. */
> > > +struct {
> > > +#define HYPFS_DYNDIR_ID_NAMELEN 12
> > > +char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of id entry. */
> > > +unsigned int id;/* Numerical id. */
> > > +} id;
> > > +};
> > > +void*data;  /* Data associated with entry. */
> > > 
> > 
> > I'm not sure I see the benefit from this union. The only one I see is
> > that struct hypfs_dyndir will be smaller by sizeof(unsigned int).
> > Could you explain please?
> 
> My main concern is that it is not obvious to a user that the
> numerical id is needed only for a special case. Putting it into
> the union makes this much more clear.
> 

This makes sense. I'll make this union. Thanks.

> > 
> > Also what do you think about the following change:
> > -char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of hypfs entry. */
> > -
> > -unsigned int id;/* Numerical id. */
> > -void *data; /* Data associated with id. */
> > +char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of hypfs entry. */
> > +
> > +unsigned int id;/* Numerical id. */
> > +union {
> > +   const void *content;
> > +   void *data; /* Data associated with id. 
> > */
> > +}
> > This change is similar to the hypfs_entry_leaf. In this case we can
> > store const pointer for read-only entries and use data when write access
> > is needed?
> 
> Sure, if you need that.

Thanks I will do this as well.

Best regards,
Oleksii
> 
> > 
> > PS: I will address all your comments in v3.
> 
> Thanks,
> 
> 
> Juergen
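
To make the intent of the suggested union concrete, here is a hedged
sketch of how the two variants would be filled in. The layout and
name-length constants follow the suggestion quoted above; the struct
name, helper names and constant names are invented for the example, and
strlcpy()/snprintf() are the usual Xen string helpers.

struct hypfs_dyndir_sketch {
    union {
#define SKETCH_DYNDIR_NAMELEN    32
        char name[SKETCH_DYNDIR_NAMELEN];        /* generic dynamic entry */
        struct {
#define SKETCH_DYNDIR_ID_NAMELEN 12
            char name[SKETCH_DYNDIR_ID_NAMELEN]; /* printable form of id */
            unsigned int id;                     /* numerical id (special case) */
        } id;
    };
    void *data;                                  /* data associated with entry */
};

/* Generic dynamic entry: only the wide name is used. */
static void fill_named(struct hypfs_dyndir_sketch *dyn, const char *name)
{
    strlcpy(dyn->name, name, sizeof(dyn->name));
}

/* Existing per-id case: short printable name plus the numerical id. */
static void fill_per_id(struct hypfs_dyndir_sketch *dyn, unsigned int id)
{
    snprintf(dyn->id.name, sizeof(dyn->id.name), "%u", id);
    dyn->id.id = id;
}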







Re: [RFC v2 1/8] xen/hypfs: support fo nested dynamic hypfs nodes

2022-02-11 Thread Juergen Gross

On 11.02.22 09:16, Oleksii Moisieiev wrote:

Hi Juergen,

On Thu, Feb 10, 2022 at 08:34:08AM +0100, Juergen Gross wrote:

On 08.02.22 19:00, Oleksii Moisieiev wrote:




Add new api:
- hypfs_read_dyndir_entry
- hypfs_gen_dyndir_entry
which are the extension of the dynamic hypfs nodes support, presented in
0b3b53be8cf226d947a79c2535a9efbb2dd7bc38.
This allows nested dynamic nodes to be added. Also input parameter is
hypfs_entry, so properties can also be generated dynamically.

Generating mixed list of dirs and properties is also supported.
Same as to the dynamic hypfs nodes, this is anchored in percpu pointer,
which can be retrieved on any level of the dynamic entries.
This handle should be allocated on enter() callback and released on
exit() callback. When using nested dynamic dirs and properties handle
should be allocated on the first enter() call and released on the last
exit() call.

Signed-off-by: Oleksii Moisieiev 


...


diff --git a/xen/include/xen/hypfs.h b/xen/include/xen/hypfs.h
index e9d4c2555b..5d2728b963 100644
--- a/xen/include/xen/hypfs.h
+++ b/xen/include/xen/hypfs.h
@@ -79,8 +79,8 @@ struct hypfs_entry_dir {
   struct hypfs_dyndir_id {


Please rename to struct hypfs_dyndir.


Ok, thanks.




   struct hypfs_entry_dir dir; /* Modified copy of template. */
   struct hypfs_funcs funcs;   /* Dynamic functions. */
-const struct hypfs_entry_dir *template; /* Template used. */
-#define HYPFS_DYNDIR_ID_NAMELEN 12
+const struct hypfs_entry *template; /* Template used. */
+#define HYPFS_DYNDIR_ID_NAMELEN 32
   char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of hypfs entry. */
   unsigned int id;/* Numerical id. */


What about the following change instead:

-const struct hypfs_entry_dir *template; /* Template used. */
-#define HYPFS_DYNDIR_ID_NAMELEN 12
-char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of hypfs entry. */
-
-unsigned int id;/* Numerical id. */
-void *data; /* Data associated with id. */
+const struct hypfs_entry *template;  /* Template used. */
+union {
+#define HYPFS_DYNDIR_NAMELEN32
+char name[HYPFS_DYNDIR_NAMELEN]; /* Name of hypfs entry. */
+struct {
+#define HYPFS_DYNDIR_ID_NAMELEN 12
+char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of id entry. */
+unsigned int id;/* Numerical id. */
+} id;
+};
+void*data;  /* Data associated with entry. */



I'm not sure I see the benefit from this union. The only one I see is
that struct hypfs_dyndir will be smaller by sizeof(unsigned int).
Could you explain please?


My main concern is that it is not obvious to a user that the
numerical id is needed only for a special case. Putting it into
the union makes this much more clear.



Also what do you think about the following change:
-char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of hypfs entry. */
-
-unsigned int id;/* Numerical id. */
-void *data; /* Data associated with id. */
+char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of hypfs entry. */
+
+unsigned int id;/* Numerical id. */
+union {
+   const void *content;
+   void *data; /* Data associated with id. */
+}
This change is similar to the hypfs_entry_leaf. In this case we can
store const pointer for read-only entries and use data when write access
is needed?


Sure, if you need that.



PS: I will address all your comments in v3.


Thanks,


Juergen




Re: [PATCH] x86emul: fix VPBLENDMW with mask and memory operand

2022-02-11 Thread Andrew Cooper
On 11/02/2022 11:09, Jan Beulich wrote:
> Element size for this opcode depends on EVEX.W, not the low opcode bit.
> Make use of AVX512BW being a prereq to AVX512_BITALG and move the case
> label there, adding an AVX512BW feature check.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 


Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-11 Thread Oleksandr Andrushchenko


On 11.02.22 13:51, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 08:46:59AM +, Oleksandr Andrushchenko wrote:
>>
>> On 10.02.22 18:16, Roger Pau Monné wrote:
>>> On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
 From: Oleksandr Andrushchenko 

 Introduce a per-domain read/write lock to check whether vpci is present,
 so we are sure there are no accesses to the contents of the vpci struct
 if not. This lock can be used (and in a few cases is used right away)
 so that vpci removal can be performed while holding the lock in write
 mode. Previously such removal could race with vpci_read for example.
>>> Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
>>> pci_remove_device, and likely when vPCI gets also used in
>>> {de}assign_device I think.
>>>
>> How about the below? It seems to guarantee that we can access pdev
>> without issues and without requiring pcidevs_lock to be used?
> Hm, I'm unsure this is correct.
Yes, we need pcidevs_lock as an rwlock in order to solve this reliably...
>   It's in general a bad idea to use a
> per-domain lock approach to protect the consistency of elements moving
> between domains.
>
> In order for this to be safe you will likely need to hold both the
> source and the destination per-domain locks, and then you could also
> get into ABBA lock issues unless you always take the lock in the same
> order.
>
> I think it's safer to use a global lock in this case (pcidevs_lock).
>
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index e8b09d77d880..fd464a58b3b3 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -937,8 +937,14 @@ static int deassign_device(struct domain *d, uint16_t 
>> seg, uint8_t bus,
>>    }
>>
>>    devfn = pdev->devfn;
>> +#ifdef CONFIG_HAS_VPCI
>> +    write_lock(&d->vpci_rwlock);
>> +#endif
>>    ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
>>     pci_to_dev(pdev));
>> +#ifdef CONFIG_HAS_VPCI
>> +    write_unlock(&d->vpci_rwlock);
>> +#endif
>>    if ( ret )
>>    goto out;
>>
>> @@ -1474,6 +1480,9 @@ static int assign_device(struct domain *d, u16 seg, u8 
>> bus, u8 devfn, u32 flag)
>>    const struct domain_iommu *hd = dom_iommu(d);
>>    struct pci_dev *pdev;
>>    int rc = 0;
>> +#ifdef CONFIG_HAS_VPCI
>> +    struct domain *old_d;
>> +#endif
>>
>>    if ( !is_iommu_enabled(d) )
>>    return 0;
>> @@ -1487,15 +1496,34 @@ static int assign_device(struct domain *d, u16 seg, 
>> u8 bus, u8 devfn, u32 flag)
>>    ASSERT(pdev && (pdev->domain == hardware_domain ||
>>    pdev->domain == dom_io));
>>
>> +#ifdef CONFIG_HAS_VPCI
>> +    /* pdev->domain is either hwdom or dom_io. We do not want the latter. */
>> +    old_d = pdev->domain == hardware_domain ? pdev->domain : NULL;
>> +    if ( old_d )
>> +    write_lock(&old_d->vpci_rwlock);
>> +#endif
>> +
>>    rc = pdev_msix_assign(d, pdev);
> I don't think you need the vpci lock for this operation.
>
>>    if ( rc )
>> +    {
>> +#ifdef CONFIG_HAS_VPCI
>> +    if ( old_d )
>> +    write_unlock(&old_d->vpci_rwlock);
>> +#endif
>>    goto done;
>> +    }
>>
>>    pdev->fault.count = 0;
>>
>>    if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
>>      pci_to_dev(pdev), flag)) )
>> +    {
>> +#ifdef CONFIG_HAS_VPCI
>> +    if ( old_d )
>> +    write_unlock(&old_d->vpci_rwlock);
>> +#endif
> Like I've mentioned above, I'm unsure this is correct. You are holding
> the lock of the previous domain, but at some point the device will be
> assigned to a new domain, so that change won't be protected by the
> lock of the new domain.
>
> Thanks, Roger.
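
For reference, the usual way around the ABBA concern raised above is to
impose a fixed acquisition order whenever two per-domain locks must be
held at once, e.g. by domain_id. This is only a sketch of that generic
pattern; the helper is hypothetical and vpci_rwlock is the field added
by the patch under discussion.

#include <xen/rwlock.h>
#include <xen/sched.h>

/* Take the vPCI locks of two domains in a fixed (domain_id) order. */
static void vpci_lock_two_domains(struct domain *a, struct domain *b)
{
    if ( a == b )
    {
        write_lock(&a->vpci_rwlock);
        return;
    }

    if ( a->domain_id < b->domain_id )
    {
        write_lock(&a->vpci_rwlock);
        write_lock(&b->vpci_rwlock);
    }
    else
    {
        write_lock(&b->vpci_rwlock);
        write_lock(&a->vpci_rwlock);
    }
}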


Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-11 Thread Oleksandr Andrushchenko


On 11.02.22 13:40, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 07:27:39AM +, Oleksandr Andrushchenko wrote:
>> Hi, Roger!
>>
>> On 10.02.22 18:16, Roger Pau Monné wrote:
>>> On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
 From: Oleksandr Andrushchenko 

 Introduce a per-domain read/write lock to check whether vpci is present,
 so we are sure there are no accesses to the contents of the vpci struct
 if not. This lock can be used (and in a few cases is used right away)
 so that vpci removal can be performed while holding the lock in write
 mode. Previously such removal could race with vpci_read for example.
>>> Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
>>> pci_remove_device, and likely when vPCI gets also used in
>>> {de}assign_device I think.
>> Yes, this is indeed an issue, but I was not trying to solve it in the
>> context of vPCI locking yet. I think we should discuss how we
>> approach pdev locking, so I can create a patch for that.
>> That being said, I would like not to solve pdev locking in this patch yet.
>>
>> ...I do understand we do want to avoid that, but at the moment
>> the only reliable way of making sure pdev is alive seems to
>> be pcidevs_lock.
> I think we will need to make pcidevs_lock a rwlock and take it in read
> mode for pci_get_pdev_by_domain.
>
> We didn't have this scenario before where PCI emulation is done in the
> hypervisor, and hence the locking around those data structures has not
> been designed for those use-cases.
Yes, I do understand that.
I hope the pcidevs lock conversion to an rwlock can be done as a separate
patch. While this is not done, do you think we can proceed with the
vPCI series, with the pcidevs locking re-work being done afterwards?

>
 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
 from being removed.

 2. Writing the command register and ROM BAR register may trigger
 modify_bars to run, which in turn may access multiple pdevs while
 checking for the existing BAR's overlap. The overlapping check, if done
 under the read lock, requires vpci->lock to be acquired on both devices
 being compared, which may produce a deadlock. It is not possible to
 upgrade read lock to write lock in such a case. So, in order to prevent
 the deadlock, check which registers are going to be written and acquire
 the lock in the appropriate mode from the beginning.

 All other code, which doesn't lead to pdev->vpci destruction and does not
 access multiple pdevs at the same time, can still use a combination of the
 read lock and pdev->vpci->lock.

 3. Optimize the detection of whether a ROM BAR write requires the write lock
 by caching the offset of the ROM BAR register in vpci->header->rom_reg, which
 depends on the header's type.

 4. Reduce locked region in vpci_remove_device as it is now possible
 to set pdev->vpci to NULL early right after the write lock is acquired.

 5. Reduce locked region in vpci_add_handlers as it is possible to
 initialize many more fields of the struct vpci before assigning it to
 pdev->vpci.

 6. vpci_{add|remove}_register are required to be called with the write lock
 held, but it is not feasible to add an assert there as it requires
 struct domain to be passed for that. So, add a comment about this 
 requirement
 to these and other functions with the equivalent constraints.

 7. Drop const qualifier where the new rwlock is used and this is 
 appropriate.

 8. This is based on the discussion at [1].

 [1] https://lore.kernel.org/all/20220204063459.680961-4-andr2...@gmail.com/

 Suggested-by: Roger Pau Monné 
 Suggested-by: Jan Beulich 
 Signed-off-by: Oleksandr Andrushchenko 

 ---
 This was checked on x86: with and without PVH Dom0.
 ---
xen/arch/x86/hvm/vmsi.c   |   2 +
xen/common/domain.c   |   3 +
xen/drivers/vpci/header.c |   8 +++
xen/drivers/vpci/msi.c|   8 ++-
xen/drivers/vpci/msix.c   |  40 +++--
xen/drivers/vpci/vpci.c   | 114 --
xen/include/xen/sched.h   |   3 +
xen/include/xen/vpci.h|   2 +
8 files changed, 146 insertions(+), 34 deletions(-)

 diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
 index 13e2a190b439..351cb968a423 100644
 --- a/xen/arch/x86/hvm/vmsi.c
 +++ b/xen/arch/x86/hvm/vmsi.c
 @@ -893,6 +893,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
{
unsigned int i;

 +ASSERT(!!rw_is_locked(&msix->pdev->domain->vpci_rwlock));
>>> ^ no need for the double negation.
>> Ok, will update all asserts which use !!
>>> Also this 

Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data

2022-02-11 Thread Roger Pau Monné
On Fri, Feb 11, 2022 at 12:42:11PM +0100, Jan Beulich wrote:
> On 11.02.2022 12:19, Roger Pau Monné wrote:
> > On Fri, Feb 11, 2022 at 11:50:46AM +0100, Jan Beulich wrote:
> >> On 11.02.2022 11:47, Roger Pau Monné wrote:
> >>> On Fri, Feb 11, 2022 at 11:32:45AM +0100, Jan Beulich wrote:
>  On 11.02.2022 10:02, Roger Pau Monné wrote:
> > On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote:
> >> When re-identifying CPU data, we might use uninitialized data when
> >> checking for the cache line property to adapt the cache
> >> alignment. The data that depends on this uninitialized read is
> >> currently not forwarded.
> >>
> >> To avoid problems in the future, initialize the data cpuinfo
> >> structure before re-identifying the CPU again.
> >>
> >> The trace to hit the uninitialized read reported by Coverity is:
> >>
> >> bool recheck_cpu_features(unsigned int cpu)
> >> ...
> >> struct cpuinfo_x86 c;
> >> ...
> >> identify_cpu(&c);
> >>
> >> void identify_cpu(struct cpuinfo_x86 *c)
> >> ...
> >> generic_identify(c)
> >>
> >> static void generic_identify(struct cpuinfo_x86 *c)
> >> ...
> >
> > Would it be more appropriate for generic_identify to also set
> > x86_cache_alignment like it's done in early_cpu_init?
> >
> > generic_identify already re-fetches a bunch of stuff that's also
> > set by early_cpu_init for the BSP.
> 
>  This would be an option, but how sure are you that there isn't
>  (going to be) another field with similar properties? We better
>  wouldn't require _everything_ to be re-filled in generic_identify().
> >>>
> >>> So you think generic_identify should call into early_cpu_init, or even
> >>> split the cpuinfo_x86 filling done in early_cpu_init into a non-init
> >>> function that could be called by both generic_identify and
> >>> early_cpu_init?
> >>
> >> No, I think it is quite fine for this to be a two-step process.
> > 
> > But it's not a two step process for all CPUs. It's a two step process
> > for the BSP, that will get it's cpuinfo filled by early_cpu_init
> > first, and then by identify_cpu. OTHO APs will only get the
> > information filled by identify_cpu.
> > 
> > Maybe APs don't care about having x86_cache_alignment correctly set?
> 
> They do care, and the field also isn't left uninitialized. See
> initialize_cpu_data(). Like in many other places we assume full
> symmetry among processors here.

Thanks, I did miss that part. Then I think it's fine to initialize the
struct in recheck_cpu_features to zero. That's likely to make it more
obvious if we somehow miss updating certain featuresets? (as they
would be empty instead of inheriting from boot CPU data).

Thanks, Roger.



Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data

2022-02-11 Thread Norbert Manthey
On 2/11/22 11:34, Jan Beulich wrote:
> On 11.02.2022 08:23, Norbert Manthey wrote:
>> --- a/xen/arch/x86/cpuid.c
>> +++ b/xen/arch/x86/cpuid.c
>> @@ -609,7 +609,7 @@ void __init init_guest_cpuid(void)
>>  bool recheck_cpu_features(unsigned int cpu)
>>  {
>>  bool okay = true;
>> -struct cpuinfo_x86 c;
>> +struct cpuinfo_x86 c = boot_cpu_data;
>>  const struct cpuinfo_x86 *bsp = &boot_cpu_data;
>>  unsigned int i;
> While I agree with the need to initialize the local variable, I
> don't think it should be pre-seeded with previous identification
> results: This could end up hiding bugs. Instead I'd see it simply
> be zero-filled.

That works for me as well, I'll send a rev-2 accordingly.
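
(For reference, the zero-filled variant being asked for would amount to
something like the below sketch relative to v1 -- not the actual v2 patch:)

-    struct cpuinfo_x86 c = boot_cpu_data;
+    struct cpuinfo_x86 c = {}; /* zero-filled, not seeded from boot_cpu_data */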

Norbert




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




[xen-unstable test] 168081: trouble: broken/fail/pass

2022-02-11 Thread osstest service owner
flight 168081 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168081/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-xl-credit1  broken
 test-arm64-arm64-xl-credit2  broken
 test-arm64-arm64-xl-seattle  broken
 test-arm64-arm64-xl-xsm  broken

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-seattle   5 host-install(5)  broken pass in 168075
 test-arm64-arm64-xl-credit2   5 host-install(5)  broken pass in 168075
 test-arm64-arm64-xl-xsm   5 host-install(5)  broken pass in 168075
 test-arm64-arm64-xl-credit1   5 host-install(5)  broken pass in 168075
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat fail in 168075 pass in 
168081
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail in 
168075 pass in 168081

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail in 168075 never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail in 168075 never 
pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail in 168075 never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail in 168075 never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail in 168075 never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail in 168075 never 
pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail in 168075 never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail in 168075 never 
pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 168075
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168075
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168075
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168075
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168075
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168075
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168075
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168075
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 168075
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168075
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168075
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168075
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check 

Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver

2022-02-11 Thread Oleksii Moisieiev
On Fri, Feb 11, 2022 at 11:18:47AM +, Bertrand Marquis wrote:
> Hi Oleksii,
> 
> 
> > On 11 Feb 2022, at 10:44, Oleksii Moisieiev  
> > wrote:
> > 
> > Hi Bertrand,
> > 
> > On Fri, Feb 11, 2022 at 08:46:05AM +, Bertrand Marquis wrote:
> >> Hi Oleksii,
> >> 
> >> 
> >>> On 8 Feb 2022, at 18:00, Oleksii Moisieiev  
> >>> wrote:
> >>> 
> >>> This is the implementation of SCI interface, called SCMI-SMC driver,
> >>> which works as the mediator between XEN Domains and Firmware (SCP, ATF 
> >>> etc).
> >>> This allows devices from the Domains to work with clocks, resets and
> >>> power-domains without access to CPG.
> >>> 
> >>> Originally, cpg should be passed to the domain so it can work with
> >>> power-domains/clocks/resets etc. Considering that cpg can't be split 
> >>> between
> >>> the Domains, we get the limitation that the devices, which are using
> >>> power-domains/clocks/resets etc, couldn't be split between the domains.
> >>> The solution is to move the power-domain/clock/resets etc to the
> >>> Firmware (such as SCP firmware or ATF) and provide interface for the
> >>> Domains. XEN should have an entity, called SCI-Mediator, which is
> >>> responsible for messages redirection between Domains and Firmware and
> >>> for permission handling.
> >>> 
> >>> The following features are implemented:
> >>> - request SCMI channels from ATF and pass channels to Domains;
> >>> - set device permissions for Domains based on the Domain partial
> >>> device-tree. Devices with permissions are able to work with clocks,
> >>> resets and power-domains via SCMI;
> >>> - redirect scmi messages from Domains to ATF.
> >> 
> >> Before going more deeply in the code I would like to discuss the general
>> design here and ask some questions to avoid having to rework the code before
> >> we all agree that this is the right solution and that we want this in Xen.
> >> 
> >> First I want to point out that clock/reset/power virtualization is a 
> >> problem
> >> on most applications using device pass-through and I am very glad that
> >> someone is looking into it.
> >> Also SCMI is the current standard existing for this so relying on it is a 
> >> very
> >> good idea.
> >> 
> >> Latest version SCMI standard (DEN0056D v3.1) is defining some means
> >> to use SCMI on a virtualised system. In chapter 4.2.1, the standard
> >> recommends to set permissions per agent in the hypervisor so that a VM
> >> could later use the discovery protocol to detect the resources and use 
> >> them.
> >> Using this kind of scenario the mediator in Xen would just configure the
> >> Permissions in the SCMI and would then rely on it to limit what is possible
>> by whom, just by assigning a channel to a VM.
> > 
> >> 
> >> In your current design (please correct me if I am wrong) you seem to fully
> >> rely on Xen and the FDT for discovery and permission.
> > 
> > In the current implementation Xen is the trusted agent, and it's responsible
> > for setting permissions. During initialization it discovers the agents and
> > sets permissions by using BASE_SET_DEVICE_PERMISSIONS for Dom0. When a
> > new domain is created, Xen assigns an agent id for this domain and requests
> > the resources that are passed through to this Domain.
> 
> Ok
> 
> > 
> > I'm getting the following information from the FDT:
> > 1) Shared memory addresses, which should be used for the agents. During
> > initialization I send BASE_DISCOVER_AGENT to each of these addresses and
> > receive the agent_id. Xen is responsible for assigning the agent_id for the
> > Domain. Then Xen intercepts SMC calls from the domain, sets the agent_id and
> > redirects them to the Firmware.
> 
> So Xen is setting the agent ID, so there is no way for a guest to get access
> to something it shouldn't without more checks, am I right?
> 

Yes. Xen is the only entity which is trusted, so it's responsible for
setting permissions and assigning the agent_id. A guest gets access only
to the devices it's allowed to use.

> > 
> > 2) Devices that are using SCMI. Those devices have clocks/power/resets
> > etc. related to the SCMI protocol (as it is done in the Linux kernel),
> > and scmi_devid should be set. I'm currently preparing to send a patch
> > updating the kernel bindings with this parameter to the Linux kernel.
> > The scmi_devid value should match the device id set in the Firmware.
> > dt example:
> > &usb0 {
> >     scmi_devid = <1>; // usb0 device id
> >     clocks = <&scmi_clock 1>; // relies on clock with id 1
> > };
> > 
> > Xen requests permission for the device when the device is attached to the
> > Domain during creation.
> 
> Without this, how is the Linux kernel (if it is) using SCMI for power
> management?

Here is how it should be described in the FDT:
/ {
    firmware {
        scmi {
            arm,smc-id = <0x8202>;
            scmi_power: protocol@11 {
                reg = <0x11>;
                #power-domain-cells = <1>;
            };
            ...
            scmi_clock: protocol@14 {
            ...
            scmi_reset: protocol@16 {
            ...
            };
        };
    };

 {
   

Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-11 Thread Roger Pau Monné
On Fri, Feb 11, 2022 at 08:46:59AM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 10.02.22 18:16, Roger Pau Monné wrote:
> > On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
> >> From: Oleksandr Andrushchenko 
> >>
> >> Introduce a per-domain read/write lock to check whether vpci is present,
> >> so we are sure there are no accesses to the contents of the vpci struct
> >> if not. This lock can be used (and in a few cases is used right away)
> >> so that vpci removal can be performed while holding the lock in write
> >> mode. Previously such removal could race with vpci_read for example.
> > Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
> > pci_remove_device, and likely when vPCI gets also used in
> > {de}assign_device I think.
> >
> How about the below? It seems to guarantee that we can access pdev
> without issues and without requiring pcidevs_lock to be used?

Hm, I'm unsure this is correct. It's in general a bad idea to use a
per-domain lock approach to protect the consistency of elements moving
between domains.

In order for this to be safe you will likely need to hold both the
source and the destination per-domain locks, and then you could also
get into ABBA lock issues unless you always take the lock in the same
order.

I think it's safer to use a global lock in this case (pcidevs_lock).
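
(Purely to illustrate the ABBA point: if both per-domain locks were to be
taken, the usual way out is to impose a fixed acquisition order, e.g. by
domain_id. A sketch only, assuming the proposed vpci_rwlock field; not
something this patch does:)

static void vpci_write_lock_pair(struct domain *a, struct domain *b)
{
    if ( a == b )
    {
        write_lock(&a->vpci_rwlock);
        return;
    }

    /* Always take the lock of the lower domain_id first. */
    if ( a->domain_id < b->domain_id )
    {
        write_lock(&a->vpci_rwlock);
        write_lock(&b->vpci_rwlock);
    }
    else
    {
        write_lock(&b->vpci_rwlock);
        write_lock(&a->vpci_rwlock);
    }
}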

> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index e8b09d77d880..fd464a58b3b3 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -937,8 +937,14 @@ static int deassign_device(struct domain *d, uint16_t 
> seg, uint8_t bus,
>   }
> 
>   devfn = pdev->devfn;
> +#ifdef CONFIG_HAS_VPCI
> +    write_lock(&d->vpci_rwlock);
> +#endif
>   ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
>    pci_to_dev(pdev));
> +#ifdef CONFIG_HAS_VPCI
> +    write_unlock(&d->vpci_rwlock);
> +#endif
>   if ( ret )
>   goto out;
> 
> @@ -1474,6 +1480,9 @@ static int assign_device(struct domain *d, u16 seg, u8 
> bus, u8 devfn, u32 flag)
>   const struct domain_iommu *hd = dom_iommu(d);
>   struct pci_dev *pdev;
>   int rc = 0;
> +#ifdef CONFIG_HAS_VPCI
> +    struct domain *old_d;
> +#endif
> 
>   if ( !is_iommu_enabled(d) )
>   return 0;
> @@ -1487,15 +1496,34 @@ static int assign_device(struct domain *d, u16 seg, 
> u8 bus, u8 devfn, u32 flag)
>   ASSERT(pdev && (pdev->domain == hardware_domain ||
>   pdev->domain == dom_io));
> 
> +#ifdef CONFIG_HAS_VPCI
> +    /* pdev->domain is either hwdom or dom_io. We do not want the latter. */
> +    old_d = pdev->domain == hardware_domain ? pdev->domain : NULL;
> +    if ( old_d )
> +    write_lock(&old_d->vpci_rwlock);
> +#endif
> +
>   rc = pdev_msix_assign(d, pdev);

I don't think you need the vpci lock for this operation.

>   if ( rc )
> +    {
> +#ifdef CONFIG_HAS_VPCI
> +    if ( old_d )
> +    write_unlock(&old_d->vpci_rwlock);
> +#endif
>   goto done;
> +    }
> 
>   pdev->fault.count = 0;
> 
>   if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
>     pci_to_dev(pdev), flag)) )
> +    {
> +#ifdef CONFIG_HAS_VPCI
> +    if ( old_d )
> +    write_unlock(&old_d->vpci_rwlock);
> +#endif

Like I've mentioned above, I'm unsure this is correct. You are holding
the lock of the previous domain, but at some point the device will be
assigned to a new domain, so that change won't be protected by the
lock of the new domain.

Thanks, Roger.



Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-11 Thread Jan Beulich
On 11.02.2022 12:29, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote:
>> On 10/02/2022 10:03, Roger Pau Monné wrote:
>>> On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote:
 diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
 index 7ab15e07a0..4060aef1bd 100644
 --- a/xen/arch/x86/hvm/vmx/vmcs.c
 +++ b/xen/arch/x86/hvm/vmx/vmcs.c
 @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp)
   MSR_IA32_VMX_PROCBASED_CTLS2, );
   }
   
 +/* Check whether hardware supports accelerated xapic and x2apic. */
 +if ( bsp )
 +{
 +assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses;
 +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
 + cpu_has_vmx_virtual_intr_delivery) &&
 +cpu_has_vmx_virtualize_x2apic_mode;
>>>
>>> I've been think about this, and it seems kind of asymmetric that for
>>> xAPIC mode we report hw assisted support only with
>>> virtualize_apic_accesses available, while for x2APIC we require
>>> virtualize_x2apic_mode plus either apic_reg_virt or
>>> virtual_intr_delivery.
>>>
>>> I think we likely need to be more consistent here, and report hw
>>> assisted x2APIC support as long as virtualize_x2apic_mode is
>>> available.
>>>
>>> This will likely have some effect on patch 2 also, as you will have to
>>> adjust vmx_vlapic_msr_changed.
>>>
>>> Thanks, Roger.
>>
>> Any other thoughts on this? On one hand it is asymmetric, but on the other
>> there isn't much assistance with only virtualize_x2apic_mode set since, in
>> this case, a VM exit will be avoided only when trying to access the TPR
>> register.
> 
> I've been thinking about this, and reporting hardware assisted
> x{2}APIC virtualization with just
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or
> SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While
> those provide some assistance to the VMM in order to handle APIC
> accesses, it will still require a trap into the hypervisor to handle
> most of the accesses.
> 
> So maybe we should only report hardware assisted support when the
> mentioned features are present together with
> SECONDARY_EXEC_APIC_REGISTER_VIRT?

Not sure - "some assistance" seems still a little better than none at all.
Which route to go depends on what exactly we intend the bit to be used for.

Jan




Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data

2022-02-11 Thread Jan Beulich
On 11.02.2022 12:19, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 11:50:46AM +0100, Jan Beulich wrote:
>> On 11.02.2022 11:47, Roger Pau Monné wrote:
>>> On Fri, Feb 11, 2022 at 11:32:45AM +0100, Jan Beulich wrote:
 On 11.02.2022 10:02, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote:
>> When re-identifying CPU data, we might use uninitialized data when
>> checking for the cache line property to adapt the cache
>> alignment. The data that depends on this uninitialized read is
>> currently not forwarded.
>>
>> To avoid problems in the future, initialize the data cpuinfo
>> structure before re-identifying the CPU again.
>>
>> The trace to hit the uninitialized read reported by Coverity is:
>>
>> bool recheck_cpu_features(unsigned int cpu)
>> ...
>> struct cpuinfo_x86 c;
>> ...
> >> identify_cpu(&c);
>>
>> void identify_cpu(struct cpuinfo_x86 *c)
>> ...
>> generic_identify(c)
>>
>> static void generic_identify(struct cpuinfo_x86 *c)
>> ...
>
> Would it be more appropriate for generic_identify to also set
> x86_cache_alignment like it's done in early_cpu_init?
>
> generic_identify already re-fetches a bunch of stuff that's also
> set by early_cpu_init for the BSP.

 This would be an option, but how sure are you that there isn't
 (going to be) another field with similar properties? We better
 wouldn't require _everything_ to be re-filled in generic_identify().
>>>
>>> So you think generic_identify should call into early_cpu_init, or even
>>> split the cpuinfo_x86 filling done in early_cpu_init into a non-init
>>> function that could be called by both generic_identify and
>>> early_cpu_init?
>>
>> No, I think it is quite fine for this to be a two-step process.
> 
> But it's not a two-step process for all CPUs. It's a two-step process
> for the BSP, which will get its cpuinfo filled by early_cpu_init
> first, and then by identify_cpu. OTOH APs will only get the
> information filled by identify_cpu.
> 
> Maybe APs don't care about having x86_cache_alignment correctly set?

They do care, and the field also isn't left uninitialized. See
initialize_cpu_data(). Like in many other places we assume full
symmetry among processors here.

Jan




Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-11 Thread Roger Pau Monné
On Fri, Feb 11, 2022 at 07:27:39AM +, Oleksandr Andrushchenko wrote:
> Hi, Roger!
> 
> On 10.02.22 18:16, Roger Pau Monné wrote:
> > On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
> >> From: Oleksandr Andrushchenko 
> >>
> >> Introduce a per-domain read/write lock to check whether vpci is present,
> >> so we are sure there are no accesses to the contents of the vpci struct
> >> if not. This lock can be used (and in a few cases is used right away)
> >> so that vpci removal can be performed while holding the lock in write
> >> mode. Previously such removal could race with vpci_read for example.
> > Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
> > pci_remove_device, and likely when vPCI gets also used in
> > {de}assign_device I think.
> Yes, this is indeed an issue, but I was not trying to solve it in the
> context of vPCI locking yet. I think we should discuss how we
> approach pdev locking, so I can create a patch for that.
> That being said, I would like not to solve pdev locking in this patch yet.
> 
> ...I do understand we do want to avoid that, but at the moment
> the only reliable way of making sure pdev is alive seems to
> be pcidevs_lock.

I think we will need to make pcidevs_lock a rwlock and take it in read
mode for pci_get_pdev_by_domain.

We didn't have this scenario before where PCI emulation is done in the
hypervisor, and hence the locking around those data structures has not
been designed for those use-cases.
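
(To illustrate the idea -- a hypothetical sketch only, assuming pcidevs_lock
were turned into an rwlock named pcidevs_rwlock; none of these names exist
today:)

static DEFINE_RWLOCK(pcidevs_rwlock);

static uint16_t probe_vendor(struct domain *d, pci_sbdf_t sbdf)
{
    const struct pci_dev *pdev;
    uint16_t vendor = 0xffff;

    /*
     * Lookups take the lock in read mode; pci_remove_device would take it
     * in write mode, so pdev cannot go away while the read lock is held.
     */
    read_lock(&pcidevs_rwlock);
    pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
    if ( pdev )
        vendor = pci_conf_read16(sbdf, PCI_VENDOR_ID);
    read_unlock(&pcidevs_rwlock);

    return vendor;
}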

> >
> >> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
> >> from being removed.
> >>
> >> 2. Writing the command register and ROM BAR register may trigger
> >> modify_bars to run, which in turn may access multiple pdevs while
> >> checking for the existing BAR's overlap. The overlapping check, if done
> >> under the read lock, requires vpci->lock to be acquired on both devices
> >> being compared, which may produce a deadlock. It is not possible to
> >> upgrade read lock to write lock in such a case. So, in order to prevent
> >> the deadlock, check which registers are going to be written and acquire
> >> the lock in the appropriate mode from the beginning.
> >>
> >> All other code, which doesn't lead to pdev->vpci destruction and does not
> >> access multiple pdevs at the same time, can still use a combination of the
> >> read lock and pdev->vpci->lock.
> >>
> >> 3. Optimize the detection of whether a ROM BAR write requires the write
> >> lock by caching the offset of the ROM BAR register in vpci->header->rom_reg,
> >> which depends on the header's type.
> >>
> >> 4. Reduce locked region in vpci_remove_device as it is now possible
> >> to set pdev->vpci to NULL early right after the write lock is acquired.
> >>
> >> 5. Reduce locked region in vpci_add_handlers as it is possible to
> >> initialize many more fields of the struct vpci before assigning it to
> >> pdev->vpci.
> >>
> >> 6. vpci_{add|remove}_register are required to be called with the write lock
> >> held, but it is not feasible to add an assert there as it requires
> >> struct domain to be passed for that. So, add a comment about this 
> >> requirement
> >> to these and other functions with the equivalent constraints.
> >>
> >> 7. Drop const qualifier where the new rwlock is used and this is 
> >> appropriate.
> >>
> >> 8. This is based on the discussion at [1].
> >>
> >> [1] https://lore.kernel.org/all/20220204063459.680961-4-andr2...@gmail.com/
> >>
> >> Suggested-by: Roger Pau Monné 
> >> Suggested-by: Jan Beulich 
> >> Signed-off-by: Oleksandr Andrushchenko 
> >>
> >> ---
> >> This was checked on x86: with and without PVH Dom0.
> >> ---
> >>   xen/arch/x86/hvm/vmsi.c   |   2 +
> >>   xen/common/domain.c   |   3 +
> >>   xen/drivers/vpci/header.c |   8 +++
> >>   xen/drivers/vpci/msi.c|   8 ++-
> >>   xen/drivers/vpci/msix.c   |  40 +++--
> >>   xen/drivers/vpci/vpci.c   | 114 --
> >>   xen/include/xen/sched.h   |   3 +
> >>   xen/include/xen/vpci.h|   2 +
> >>   8 files changed, 146 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> >> index 13e2a190b439..351cb968a423 100644
> >> --- a/xen/arch/x86/hvm/vmsi.c
> >> +++ b/xen/arch/x86/hvm/vmsi.c
> >> @@ -893,6 +893,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
> >>   {
> >>   unsigned int i;
> >>   
> >> +ASSERT(!!rw_is_locked(&msix->pdev->domain->vpci_rwlock));
> >^ no need for the double negation.
> Ok, will update all asserts which use !!
> >
> > Also this asserts that the lock is taken, but could be by a different
> > pCPU.  I guess it's better than nothing.
> Fair enough. Do you still want the asserts or should I remove them?

Likely fine to leave them.
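
For reference, without the double negation the assert above simply becomes:

    ASSERT(rw_is_locked(&msix->pdev->domain->vpci_rwlock));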

> >
> >> +
> >>   for ( i = 0; i < msix->max_entries; i++ )
> >>   {
> >>   

Re: [PATCH] x86emul: fix SIMD test overriding of VBROADCASTS{S,D}

2022-02-11 Thread Andrew Cooper
On 11/02/2022 11:11, Jan Beulich wrote:
> Despite their suffixes these aren't scalar instructions, and hence the
> 128- and 256-bit EVEX forms may not be used without AVX512VL. Gcc11 ends
> up generating such instances for simd-sg.c.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 


Re: [PATCH] x86emul: work around gcc11 bug in SIMD tests

2022-02-11 Thread Andrew Cooper
On 11/02/2022 11:01, Jan Beulich wrote:
> Gcc11 looks to have trouble with conditional expressions used with
> vector operands: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104497.
> Replace two instances causing SEGV there in certain cases.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 


Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-11 Thread Roger Pau Monné
On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote:
> On 10/02/2022 10:03, Roger Pau Monné wrote:
> > On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote:
> >> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> >> index 7ab15e07a0..4060aef1bd 100644
> >> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> >> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> >> @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp)
> >>   MSR_IA32_VMX_PROCBASED_CTLS2, );
> >>   }
> >>   
> >> +/* Check whether hardware supports accelerated xapic and x2apic. */
> >> +if ( bsp )
> >> +{
> >> +assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses;
> >> +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
> >> + cpu_has_vmx_virtual_intr_delivery) &&
> >> +cpu_has_vmx_virtualize_x2apic_mode;
> > 
> > I've been think about this, and it seems kind of asymmetric that for
> > xAPIC mode we report hw assisted support only with
> > virtualize_apic_accesses available, while for x2APIC we require
> > virtualize_x2apic_mode plus either apic_reg_virt or
> > virtual_intr_delivery.
> > 
> > I think we likely need to be more consistent here, and report hw
> > assisted x2APIC support as long as virtualize_x2apic_mode is
> > available.
> > 
> > This will likely have some effect on patch 2 also, as you will have to
> > adjust vmx_vlapic_msr_changed.
> > 
> > Thanks, Roger.
> 
> Any other thoughts on this? On one hand it is asymmetric, but on the other
> there isn't much assistance with only virtualize_x2apic_mode set since, in
> this case, a VM exit will be avoided only when trying to access the TPR
> register.

I've been thinking about this, and reporting hardware assisted
x{2}APIC virtualization with just
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or
SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While
those provide some assistance to the VMM in order to handle APIC
accesses, it will still require a trap into the hypervisor to handle
most of the accesses.

So maybe we should only report hardware assisted support when the
mentioned features are present together with
SECONDARY_EXEC_APIC_REGISTER_VIRT?
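
(For illustration, the stricter reporting being suggested could look roughly
like the below in vmx_init_vmcs_config() -- a sketch of the idea, not a
tested change:)

    if ( bsp )
    {
        /*
         * Only report hw assistance when APIC register virtualization is
         * also there, so that most APIC accesses really avoid a VM exit.
         */
        assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses &&
                                   cpu_has_vmx_apic_reg_virt;
        assisted_x2apic_available = cpu_has_vmx_virtualize_x2apic_mode &&
                                    cpu_has_vmx_apic_reg_virt;
    }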

Thanks, Roger.



Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data

2022-02-11 Thread Roger Pau Monné
On Fri, Feb 11, 2022 at 11:50:46AM +0100, Jan Beulich wrote:
> On 11.02.2022 11:47, Roger Pau Monné wrote:
> > On Fri, Feb 11, 2022 at 11:32:45AM +0100, Jan Beulich wrote:
> >> On 11.02.2022 10:02, Roger Pau Monné wrote:
> >>> On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote:
>  When re-identifying CPU data, we might use uninitialized data when
>  checking for the cache line property to adapt the cache
>  alignment. The data that depends on this uninitialized read is
>  currently not forwarded.
> 
>  To avoid problems in the future, initialize the data cpuinfo
>  structure before re-identifying the CPU again.
> 
>  The trace to hit the uninitialized read reported by Coverity is:
> 
>  bool recheck_cpu_features(unsigned int cpu)
>  ...
>  struct cpuinfo_x86 c;
>  ...
>  identify_cpu(&c);
> 
>  void identify_cpu(struct cpuinfo_x86 *c)
>  ...
>  generic_identify(c)
> 
>  static void generic_identify(struct cpuinfo_x86 *c)
>  ...
> >>>
> >>> Would it be more appropriate for generic_identify to also set
> >>> x86_cache_alignment like it's done in early_cpu_init?
> >>>
> >>> generic_identify already re-fetches a bunch of stuff that's also
> >>> set by early_cpu_init for the BSP.
> >>
> >> This would be an option, but how sure are you that there isn't
> >> (going to be) another field with similar properties? We better
> >> wouldn't require _everything_ to be re-filled in generic_identify().
> > 
> > So you think generic_identify should call into early_cpu_init, or even
> > split the cpuinfo_x86 filling done in early_cpu_init into a non-init
> > function that could be called by both generic_identify and
> > early_cpu_init?
> 
> No, I think it is quite fine for this to be a two-step process.

But it's not a two-step process for all CPUs. It's a two-step process
for the BSP, which will get its cpuinfo filled by early_cpu_init
first, and then by identify_cpu. OTOH APs will only get the
information filled by identify_cpu.

Maybe APs don't care about having x86_cache_alignment correctly set?

Thanks, Roger.



Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver

2022-02-11 Thread Bertrand Marquis
Hi Oleksii,


> On 11 Feb 2022, at 10:44, Oleksii Moisieiev  
> wrote:
> 
> Hi Bertrand,
> 
> On Fri, Feb 11, 2022 at 08:46:05AM +, Bertrand Marquis wrote:
>> Hi Oleksii,
>> 
>> 
>>> On 8 Feb 2022, at 18:00, Oleksii Moisieiev  
>>> wrote:
>>> 
>>> This is the implementation of SCI interface, called SCMI-SMC driver,
>>> which works as the mediator between XEN Domains and Firmware (SCP, ATF etc).
>>> This allows devices from the Domains to work with clocks, resets and
>>> power-domains without access to CPG.
>>> 
>>> Originally, cpg should be passed to the domain so it can work with
>>> power-domains/clocks/resets etc. Considering that cpg can't be split between
>>> the Domains, we get the limitation that the devices, which are using
>>> power-domains/clocks/resets etc, couldn't be split between the domains.
>>> The solution is to move the power-domain/clock/resets etc to the
>>> Firmware (such as SCP firmware or ATF) and provide interface for the
>>> Domains. XEN should have an entity, called SCI-Mediator, which is
>>> responsible for messages redirection between Domains and Firmware and
>>> for permission handling.
>>> 
>>> The following features are implemented:
>>> - request SCMI channels from ATF and pass channels to Domains;
>>> - set device permissions for Domains based on the Domain partial
>>> device-tree. Devices with permissions are able to work with clocks,
>>> resets and power-domains via SCMI;
>>> - redirect scmi messages from Domains to ATF.
>> 
>> Before going more deeply in the code I would like to discuss the general
>> design here and ask some questions to avoid having to rework the code before
>> we all agree that this is the right solution and that we want this in Xen.
>> 
>> First I want to point out that clock/reset/power virtualization is a problem
>> on most applications using device pass-through and I am very glad that
>> someone is looking into it.
>> Also SCMI is the current standard existing for this so relying on it is a 
>> very
>> good idea.
>> 
>> Latest version SCMI standard (DEN0056D v3.1) is defining some means
>> to use SCMI on a virtualised system. In chapter 4.2.1, the standard
>> recommends to set permissions per agent in the hypervisor so that a VM
>> could later use the discovery protocol to detect the resources and use them.
>> Using this kind of scenario the mediator in Xen would just configure the
>> Permissions in the SCMI and would then rely on it to limit what is possible
>> by whom, just by assigning a channel to a VM.
> 
>> 
>> In your current design (please correct me if I am wrong) you seem to fully
>> rely on Xen and the FDT for discovery and permission.
> 
> In the current implementation Xen is the trusted agent, and it's responsible
> for setting permissions. During initialization it discovers the agents and
> sets permissions by using BASE_SET_DEVICE_PERMISSIONS for Dom0. When a
> new domain is created, Xen assigns an agent id for this domain and requests
> the resources that are passed through to this Domain.

Ok

> 
> I'm getting the following information from the FDT:
> 1) Shared memory addresses, which should be used for the agents. During
> initialization I send BASE_DISCOVER_AGENT to each of these addresses and
> receive the agent_id. Xen is responsible for assigning the agent_id for the
> Domain. Then Xen intercepts SMC calls from the domain, sets the agent_id and
> redirects them to the Firmware.

So Xen is setting the agent ID, so there is no way for a guest to get access to
something it shouldn't without more checks, am I right?

> 
> 2) Devices that are using SCMI. Those devices have clocks/power/resets
> etc. related to the SCMI protocol (as it is done in the Linux kernel),
> and scmi_devid should be set. I'm currently preparing to send a patch
> updating the kernel bindings with this parameter to the Linux kernel.
> The scmi_devid value should match the device id set in the Firmware.
> dt example:
> &usb0 {
>     scmi_devid = <1>; // usb0 device id
>     clocks = <&scmi_clock 1>; // relies on clock with id 1
> };
> 
> Xen requests permission for the device when the device is attached to the
> Domain during creation.

Without this, how is the Linux kernel (if it is) using SCMI for power
management?

> 
>> Wouldn’t it be a better idea to use the protocol fully ?
> 
> Hm, I was thinking I am using the protocol fully. Did I miss something?

Sorry you seem to be, my understanding of your design was not right.

> 
>> Could we get rid of some of the FDT dependencies by using the discovery
>> system of SCMI ?
> 
> I'm using FDT to get shmem regions for the channels. Then I send
> BASE_DISCOVER_AGENT to each region and get the agent data. Did I use the
> discovery system wrong?

After more digging it seems you are. The link between scmi resource and device
is not possible to get automatically.

> 
>> How is Linux doing this currently ? Is it relying on device tree to discover
>> the SCMI resources ?
> 
> Yes. Linux kernel has 2 nodes in the device-tree: arm,scmi-shmem, which
> includes memory region for the communication and arm,scmi-smc node,
> 

[PATCH] x86emul: fix SIMD test overriding of VBROADCASTS{S,D}

2022-02-11 Thread Jan Beulich
Despite their suffixes these aren't scalar instructions, and hence the
128- and 256-bit EVEX forms may not be used without AVX512VL. Gcc11 ends
up generating such instances for simd-sg.c.

Signed-off-by: Jan Beulich 

--- a/tools/tests/x86_emulator/simd.h
+++ b/tools/tests/x86_emulator/simd.h
@@ -250,7 +250,9 @@ asm ( ".macro override insn\n\t"
 # define OVR_INT(n) OVR_BW(n); OVR_DQ(n)
 
 OVR_INT(broadcast);
+# ifdef __AVX512VL__
 OVR_SFP(broadcast);
+# endif
 OVR_SFP(comi);
 OVR_VFP(cvtdq2);
 OVR_INT(abs);




[PATCH] x86emul: fix VPBLENDMW with mask and memory operand

2022-02-11 Thread Jan Beulich
Element size for this opcode depends on EVEX.W, not the low opcode bit.
Make use of AVX512BW being a prereq to AVX512_BITALG and move the case
label there, adding an AVX512BW feature check.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -7401,7 +7401,6 @@ x86_emulate(
 case X86EMUL_OPC_EVEX_66(0x0f38, 0x0b): /* vpmulhrsw 
[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
 case X86EMUL_OPC_EVEX_66(0x0f38, 0x1c): /* vpabsb [xyz]mm/mem,[xyz]mm{k} */
 case X86EMUL_OPC_EVEX_66(0x0f38, 0x1d): /* vpabsw [xyz]mm/mem,[xyz]mm{k} */
-case X86EMUL_OPC_EVEX_66(0x0f38, 0x66): /* vpblendm{b,w} 
[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
 host_and_vcpu_must_have(avx512bw);
 generate_exception_if(evex.brs, EXC_UD);
 elem_bytes = 1 << (b & 1);
@@ -9558,6 +9557,9 @@ x86_emulate(
 /* fall through */
 case X86EMUL_OPC_EVEX_66(0x0f38, 0x54): /* vpopcnt{b,w} 
[xyz]mm/mem,[xyz]mm{k} */
 host_and_vcpu_must_have(avx512_bitalg);
+/* fall through */
+case X86EMUL_OPC_EVEX_66(0x0f38, 0x66): /* vpblendm{b,w} 
[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
+host_and_vcpu_must_have(avx512bw);
 generate_exception_if(evex.brs, EXC_UD);
 elem_bytes = 1 << evex.w;
 goto avx512f_no_sae;




Re: [PATCH 2/3] x86/Intel: skip CORE_THREAD_COUNT read on family 0xf

2022-02-11 Thread Jan Beulich
On 11.02.2022 11:58, Roger Pau Monné wrote:
> On Thu, Feb 10, 2022 at 03:56:12PM +0100, Jan Beulich wrote:
>> This avoids an unnecessary (and always somewhat scary) log message for
>> the recovered from #GP(0).
>>
>> Signed-off-by: Jan Beulich 
> 
> Acked-by: Roger Pau Monné 

Thanks.

>> ---
>> Perhaps even use "== 6" in the family check?
> 
> I think it's best as is. Even on family 6 this seems to be supported
> only on model 3f? (Haswell Xeon E5 v3 and E7 v3?)

Well, ...

> The comment also seems to note this is mostly undocumented.

... this same comment says "Nehalem and later". And this also
matches my observations (and, given he has written the comment, quite
clearly Andrew's as well), no matter what the (notoriously incomplete)
SDM Vol 4 may say.

Jan




[PATCH] x86emul: work around gcc11 bug in SIMD tests

2022-02-11 Thread Jan Beulich
Gcc11 looks to have trouble with conditional expressions used with
vector operands: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104497.
Replace two instances causing SEGV there in certain cases.

Signed-off-by: Jan Beulich 

--- a/tools/tests/x86_emulator/simd.c
+++ b/tools/tests/x86_emulator/simd.c
@@ -1727,8 +1727,8 @@ int simd_test(void)
 if ( !eq(x - src, (alt + 1) / 2) ) return __LINE__;
 #endif
 
-for ( i = 0; i < ELEM_COUNT; ++i )
-y[i] = (i & 1 ? inv : src)[i];
+for ( y = src, i = 1; i < ELEM_COUNT; i += 2 )
+y[i] = inv[i];
 
 #ifdef select
 # ifdef UINT_SIZE
--- a/tools/tests/x86_emulator/simd-sg.c
+++ b/tools/tests/x86_emulator/simd-sg.c
@@ -282,7 +282,7 @@ int sg_test(void)
 # if ELEM_SIZE == IDX_SIZE
 y = gather(x, array, idx, (idx & inv) != 0, ELEM_SIZE);
 for ( i = 0; i < ITEM_COUNT; ++i )
-if ( y[i] != ((i + 1) & (ITEM_COUNT - i) ? idx : inv)[i] + 1 )
+if ( y[i] != ((i + 1) & (ITEM_COUNT - i) ? idx[i] : inv[i]) + 1 )
 return __LINE__;
 for ( ; i < ELEM_COUNT; ++i )
 if ( y[i] )




Re: [PATCH 2/3] x86/Intel: skip CORE_THREAD_COUNT read on family 0xf

2022-02-11 Thread Roger Pau Monné
On Thu, Feb 10, 2022 at 03:56:12PM +0100, Jan Beulich wrote:
> This avoids an unnecessary (and always somewhat scary) log message for
> the recovered from #GP(0).
> 
> Signed-off-by: Jan Beulich 

Acked-by: Roger Pau Monné 

> ---
> Perhaps even use "== 6" in the family check?

I think it's best as is. Even on family 6 this seems to be supported
only on model 3f? (Haswell Xeon E5 v3 and E7 v3?)

The comment also seems to note this is mostly undocumented.

Thanks, Roger.



Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data

2022-02-11 Thread Jan Beulich
On 11.02.2022 11:47, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 11:32:45AM +0100, Jan Beulich wrote:
>> On 11.02.2022 10:02, Roger Pau Monné wrote:
>>> On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote:
 When re-identifying CPU data, we might use uninitialized data when
 checking for the cache line property to adapt the cache
 alignment. The data that depends on this uninitialized read is
 currently not forwarded.

 To avoid problems in the future, initialize the data cpuinfo
 structure before re-identifying the CPU again.

 The trace to hit the uninitialized read reported by Coverity is:

 bool recheck_cpu_features(unsigned int cpu)
 ...
 struct cpuinfo_x86 c;
 ...
 identify_cpu(&c);

 void identify_cpu(struct cpuinfo_x86 *c)
 ...
 generic_identify(c)

 static void generic_identify(struct cpuinfo_x86 *c)
 ...
>>>
>>> Would it be more appropriate for generic_identify to also set
>>> x86_cache_alignment like it's done in early_cpu_init?
>>>
>>> generic_identify already re-fetches a bunch of stuff that's also
>>> set by early_cpu_init for the BSP.
>>
>> This would be an option, but how sure are you that there isn't
>> (going to be) another field with similar properties? We better
>> wouldn't require _everything_ to be re-filled in generic_identify().
> 
> So you think generic_identify should call into early_cpu_init, or even
> split the cpuinfo_x86 filling done in early_cpu_init into a non-init
> function that could be called by both generic_identify and
> early_cpu_init?

No, I think it is quite fine for this to be a two-step process. In
fact I was hoping to eliminate some of the remaining redundancy
where possible. recheck_cpu_features(), after all, cares about just
the feature flags, so doesn't require things like cache line
alignment to be filled in.
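
(Roughly speaking, the feature-flag comparison the function cares about boils
down to something like this -- simplified sketch:)

    for ( i = 0; i < NCAPINTS; ++i )
    {
        if ( !(~c.x86_capability[i] & bsp->x86_capability[i]) )
            continue;
        printk(XENLOG_ERR "CPU%u: cap[%2u] is %08x (expected %08x)\n",
               cpu, i, c.x86_capability[i], bsp->x86_capability[i]);
        okay = false;
    }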

Jan




Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data

2022-02-11 Thread Roger Pau Monné
On Fri, Feb 11, 2022 at 11:32:45AM +0100, Jan Beulich wrote:
> On 11.02.2022 10:02, Roger Pau Monné wrote:
> > On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote:
> >> When re-identifying CPU data, we might use uninitialized data when
> >> checking for the cache line property to adapt the cache
> >> alignment. The data that depends on this uninitialized read is
> >> currently not forwarded.
> >>
> >> To avoid problems in the future, initialize the data cpuinfo
> >> structure before re-identifying the CPU again.
> >>
> >> The trace to hit the uninitialized read reported by Coverity is:
> >>
> >> bool recheck_cpu_features(unsigned int cpu)
> >> ...
> >> struct cpuinfo_x86 c;
> >> ...
> >> identify_cpu(&c);
> >>
> >> void identify_cpu(struct cpuinfo_x86 *c)
> >> ...
> >> generic_identify(c)
> >>
> >> static void generic_identify(struct cpuinfo_x86 *c)
> >> ...
> > 
> > Would it be more appropriate for generic_identify to also set
> > x86_cache_alignment like it's done in early_cpu_init?
> > 
> > generic_identify already re-fetches a bunch of stuff that's also
> > set by early_cpu_init for the BSP.
> 
> This would be an option, but how sure are you that there isn't
> (going to be) another field with similar properties? We better
> wouldn't require _everything_ to be re-filled in generic_identify().

So you think generic_identify should call into early_cpu_init, or even
split the cpuinfo_x86 filling done in early_cpu_init into a non-init
function that could be called by both generic_identify and
early_cpu_init?

Thanks, Roger.



Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver

2022-02-11 Thread Oleksii Moisieiev
Hi Bertrand,

On Fri, Feb 11, 2022 at 08:46:05AM +, Bertrand Marquis wrote:
> Hi Oleksii,
> 
> 
> > On 8 Feb 2022, at 18:00, Oleksii Moisieiev  
> > wrote:
> > 
> > This is the implementation of SCI interface, called SCMI-SMC driver,
> > which works as the mediator between XEN Domains and Firmware (SCP, ATF etc).
> > This allows devices from the Domains to work with clocks, resets and
> > power-domains without access to CPG.
> > 
> > Originally, cpg should be passed to the domain so it can work with
> > power-domains/clocks/resets etc. Considering that cpg can't be split between
> > the Domains, we get the limitation that the devices, which are using
> > power-domains/clocks/resets etc, couldn't be split between the domains.
> > The solution is to move the power-domain/clock/resets etc to the
> > Firmware (such as SCP firmware or ATF) and provide interface for the
> > Domains. XEN should have an entity, called SCI-Mediator, which is
> > responsible for messages redirection between Domains and Firmware and
> > for permission handling.
> > 
> > The following features are implemented:
> > - request SCMI channels from ATF and pass channels to Domains;
> > - set device permissions for Domains based on the Domain partial
> > device-tree. Devices with permissions are able to work with clocks,
> > resets and power-domains via SCMI;
> > - redirect scmi messages from Domains to ATF.
> 
> Before going more deeply in the code I would like to discuss the general
> design here and ask some questions to avoid having to rework the code before
> we all agree that this is the right solution and that we want this in Xen.
> 
> First I want to point out that clock/reset/power virtualization is a problem
> on most applications using device pass-through and I am very glad that
> someone is looking into it.
> Also SCMI is the current standard existing for this so relying on it is a very
> good idea.
> 
> Latest version SCMI standard (DEN0056D v3.1) is defining some means
> to use SCMI on a virtualised system. In chapter 4.2.1, the standard
> recommends to set permissions per agent in the hypervisor so that a VM
> could later use the discovery protocol to detect the resources and use them.
> Using this kind of scenario the mediator in Xen would just configure the
> Permissions in the SCMI and would then rely on it to limit what is possible
> by whom, just by assigning a channel to a VM.

> 
> In your current design (please correct me if I am wrong) you seem to fully
> rely on Xen and the FDT for discovery and permission.

In the current implementation Xen is the trusted agent, and it's responsible
for setting permissions. During initialization it discovers the agents and
sets permissions by using BASE_SET_DEVICE_PERMISSIONS for Dom0. When a
new domain is created, Xen assigns an agent id for this domain and requests
the resources that are passed through to this Domain.

I'm getting the following information from the FDT:
1) Shared memory addresses, which should be used for the agents. During
initialization I send BASE_DISCOVER_AGENT to each of these addresses and
receive the agent_id. Xen is responsible for assigning the agent_id for the
Domain. Then Xen intercepts SMC calls from the domain, sets the agent_id and
redirects them to the Firmware.

2) Devices that are using SCMI. Those devices have clocks/power/resets
etc. related to the SCMI protocol (as it is done in the Linux kernel),
and scmi_devid should be set. I'm currently preparing to send a patch
updating the kernel bindings with this parameter to the Linux kernel.
The scmi_devid value should match the device id set in the Firmware.
dt example:
&usb0 {
    scmi_devid = <1>; // usb0 device id
    clocks = <&scmi_clock 1>; // relies on clock with id 1
};

Xen requests permission for the device when the device is attached to the
Domain during creation.
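
(For reference, that permission request is a small message to the firmware;
a rough sketch of its payload, as I read the SCMI base protocol -- the field
names are illustrative only:)

struct scmi_base_set_device_permissions_tx {
    uint32_t agent_id;  /* agent (channel) assigned to the domain */
    uint32_t device_id; /* must match the scmi_devid from the device tree */
    uint32_t flags;     /* bit 0 set: grant access; clear: revoke */
};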

> Wouldn’t it be a better idea to use the protocol fully ?

Hm, I was thinking I am using the protocol fully. Did I miss something?

> Could we get rid of some of the FDT dependencies by using the discovery
> system of SCMI ?

I'm using FDT to get shmem regions for the channels. Then I send
BASE_DISCOVER_AGENT to each region and get the agent data. Did I use the
discovery system wrong?
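
(In other words, the per-channel discovery boils down to something like this
hypothetical sketch -- the helpers and data structures below are made up:)

#define MAX_SCMI_CHANNELS 16

static unsigned int nr_scmi_shmem;                /* filled from the FDT */
static paddr_t scmi_shmem[MAX_SCMI_CHANNELS];     /* shmem region addresses */
static uint32_t scmi_channel_agent[MAX_SCMI_CHANNELS];

/* Hypothetical helper: send BASE_DISCOVER_AGENT over the given channel. */
int scmi_base_discover_agent(paddr_t shmem, uint32_t *agent_id);

static int __init scmi_probe_channels(void)
{
    unsigned int i;

    for ( i = 0; i < nr_scmi_shmem; i++ )
    {
        uint32_t agent_id;
        int rc = scmi_base_discover_agent(scmi_shmem[i], &agent_id);

        if ( rc )
            return rc;

        scmi_channel_agent[i] = agent_id; /* later handed out to a domain */
    }

    return 0;
}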

> How is Linux doing this currently ? Is it relying on device tree to discover
>  the SCMI resources ?

Yes. The Linux kernel has 2 nodes in the device tree: an arm,scmi-shmem node,
which includes the memory region for the communication, and an arm,scmi-smc
node, which describes all data related to SCMI (func_id, protocols etc.).
Then the device nodes refer to the protocols by setting
clocks/resets/power-domains etc. Please see the example above.
BASE_DISCOVER_AGENT is not used in the Linux kernel.
The main idea was that the SCMI-related changes to the device tree are
common for virtualized and non-virtualized systems, so the same FDT
configuration should work with or without Xen.

> 
> Also I understand that you rely on some entries to be declared in the device
> tree and also some support to be implemented in ATF or SCP. I checked in
> The boards I have 

Re: [PATCH 1/3] x86/Intel: skip PLATFORM_INFO reads on family 0xf

2022-02-11 Thread Roger Pau Monné
On Fri, Feb 11, 2022 at 10:59:10AM +0100, Jan Beulich wrote:
> On 11.02.2022 10:40, Roger Pau Monné wrote:
> > On Thu, Feb 10, 2022 at 03:55:52PM +0100, Jan Beulich wrote:
> >> This avoids unnecessary (and always somewhat scary) log messages for the
> >> recovered from #GP(0).
> > 
> > Could we maybe get rid of the #GP messages for cases like this where we
> > are explicitly probing for MSR presence? (ie: it's expected that we
> > can get a #GP)
> 
> This would mean some form of annotation of such RDMSR attempts (for
> the recovery code to recognize in order to skip the printk()). Not
> all rdmsr_safe() uses are, strictly speaking, probes, so I wouldn't
> want to put such in rdmsr_safe() itself.
> 
> In any event - quite a bit more work. Plus I'm not convinced it's a
> good idea to suppress any such log messages.
> 
> >> Signed-off-by: Jan Beulich 

Acked-by: Roger Pau Monné 

> >> ---
> >> Perhaps even use "!= 6" in at least the CPUID-faulting family check?
> > 
> > Likely, or else you would also need to check for family 11 (Knights
> > Corner?) which doesn't seem to support PLATFORM_INFO either.
> 
> I don't think Xen is able to run on these (likewise for IA64, which
> iirc were surfacing as x86 model 7)? These are the co-processor ones,
> aren't they?

Right, Knights Corner uses a socket mount but it's still a
co-processor. Knights Landing was the first one that could be used
as a host processor.

> My question was more towards whether we would (wrongly)
> exclude future processors when using != 6, if Intel decided to ever
> make new CPUs with a family other than 6.

In the case here I think we should only avoid the probe for family
0xf. Newer families (or even models on family 6 not supporting
PLATFORM_INFO) will just get a #GP message which is OK I think, we
could fix that in due time.

It's better to get a #GP message for probing than to just skip
detection of CPUID faulting on unknown newer families IMO.

Thanks, Roger.



Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data

2022-02-11 Thread Jan Beulich
On 11.02.2022 08:23, Norbert Manthey wrote:
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -609,7 +609,7 @@ void __init init_guest_cpuid(void)
>  bool recheck_cpu_features(unsigned int cpu)
>  {
>  bool okay = true;
> -struct cpuinfo_x86 c;
> +struct cpuinfo_x86 c = boot_cpu_data;
>  const struct cpuinfo_x86 *bsp = &boot_cpu_data;
>  unsigned int i;

While I agree with the need to initialize the local variable, I
don't think it should be pre-seeded with previous identification
results: This could end up hiding bugs. Instead I'd see it simply
be zero-filled.

Jan
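
A tiny standalone illustration of the difference (the structure and helper
below are stand-ins, not Xen's real cpuinfo_x86 or identify_cpu()): with
zero-fill, a field that identification forgets to re-fill shows up as an
obviously wrong 0, while pre-seeding from boot_cpu_data would silently keep
a plausible stale value.

#include <stdio.h>

struct cpuinfo { unsigned int family; unsigned int cache_alignment; };

static const struct cpuinfo boot_cpu_data = { .family = 6, .cache_alignment = 64 };

/* Pretend identification that forgets to (re-)fill cache_alignment. */
static void identify(struct cpuinfo *c) { c->family = 6; }

int main(void)
{
    struct cpuinfo seeded = boot_cpu_data;  /* pre-seeded: omission stays hidden */
    struct cpuinfo zeroed = { 0 };          /* zero-filled: omission shows up as 0 */

    identify(&seeded);
    identify(&zeroed);
    printf("seeded alignment %u, zeroed alignment %u\n",
           seeded.cache_alignment, zeroed.cache_alignment);
    return 0;
}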




Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data

2022-02-11 Thread Jan Beulich
On 11.02.2022 10:02, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote:
>> When re-identifying CPU data, we might use uninitialized data when
>> checking for the cache line property to adapt the cache
>> alignment. The data that depends on this uninitialized read is
>> currently not forwarded.
>>
>> To avoid problems in the future, initialize the data cpuinfo
>> structure before re-identifying the CPU again.
>>
>> The trace to hit the uninitialized read reported by Coverity is:
>>
>> bool recheck_cpu_features(unsigned int cpu)
>> ...
>> struct cpuinfo_x86 c;
>> ...
>> identify_cpu();
>>
>> void identify_cpu(struct cpuinfo_x86 *c)
>> ...
>> generic_identify(c)
>>
>> static void generic_identify(struct cpuinfo_x86 *c)
>> ...
> 
> Would it be more appropriate for generic_identify to also set
> x86_cache_alignment like it's done in early_cpu_init?
> 
> generic_identify already re-fetches a bunch of stuff that's also
> set by early_cpu_init for the BSP.

This would be an option, but how sure are you that there isn't
(going to be) another field with similar properties? We'd better not
require _everything_ to be re-filled in generic_identify().

Jan




Re: [PATCH 3/3] x86/Intel: also display CPU freq for family 0xf

2022-02-11 Thread Roger Pau Monné
On Thu, Feb 10, 2022 at 03:56:48PM +0100, Jan Beulich wrote:
> Actually we can do better than simply bailing for there not being any
> PLATFORM_INFO MSR on these. The "max" part of the information is
> available in another MSR, alongside the scaling factor (which is
> encoded in similar ways to Core/Core2, and hence the decoding table can
> be shared).
> 
> Signed-off-by: Jan Beulich 

Acked-by: Roger Pau Monné 

> ---
> The inner switch() is left indented one level too much (and with an
> extra pair of braces) to limit the diff. I'd prefer to make a follow-up
> patch reducing the indentation, unless I'm told to do so right here.

I'm fine with a followup patch.

> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -412,9 +412,9 @@ static int num_cpu_cores(struct cpuinfo_
>  
>  static void intel_log_freq(const struct cpuinfo_x86 *c)
>  {
> -unsigned int eax, ebx, ecx, edx;
> +unsigned int eax, ebx, ecx, edx, factor;
>  uint64_t msrval;
> -uint8_t max_ratio;
> +uint8_t max_ratio, min_ratio;
>  
>  if ( c->cpuid_level >= 0x15 )
>  {
> @@ -455,21 +455,22 @@ static void intel_log_freq(const struct
>  }
>  }
>  
> -if ( c->x86 == 0xf || rdmsr_safe(MSR_INTEL_PLATFORM_INFO, msrval) )
> -return;
> -max_ratio = msrval >> 8;
> -
> -if ( max_ratio )
> +switch ( c->x86 )
>  {
> -unsigned int factor = 1;
> -uint8_t min_ratio = msrval >> 40;
> +static const unsigned short core_factors[] =

This no longer applies to Core models only, so I wouldn't be opposed
to renaming to scaling_factors or similar.

Thanks, Roger.
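
For reference, the ratio decoding visible in the quoted hunk (max_ratio =
msrval >> 8, i.e. bits 15:8, and min_ratio = msrval >> 40, i.e. bits 47:40)
can be exercised standalone; the sample MSR value and the flat 100 MHz per
ratio step below are assumptions for illustration, not what the patch
hard-codes.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t msrval = 0x0000080000002800ULL;   /* made-up sample value */
    uint8_t max_ratio = msrval >> 8;           /* bits 15:8 */
    uint8_t min_ratio = msrval >> 40;          /* bits 47:40 */
    unsigned int factor = 100;                 /* MHz per ratio step (assumed) */

    printf("max %u MHz, min %u MHz\n", max_ratio * factor, min_ratio * factor);
    return 0;
}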



Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-11 Thread Jane Malalane
On 10/02/2022 10:03, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote:
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index 7ab15e07a0..4060aef1bd 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp)
>>   MSR_IA32_VMX_PROCBASED_CTLS2, &mismatch);
>>   }
>>   
>> +/* Check whether hardware supports accelerated xapic and x2apic. */
>> +if ( bsp )
>> +{
>> +assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses;
>> +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
>> + cpu_has_vmx_virtual_intr_delivery) &&
>> +cpu_has_vmx_virtualize_x2apic_mode;
> 
> I've been thinking about this, and it seems kind of asymmetric that for
> xAPIC mode we report hw assisted support only with
> virtualize_apic_accesses available, while for x2APIC we require
> virtualize_x2apic_mode plus either apic_reg_virt or
> virtual_intr_delivery.
> 
> I think we likely need to be more consistent here, and report hw
> assisted x2APIC support as long as virtualize_x2apic_mode is
> available.
> 
> This will likely have some effect on patch 2 also, as you will have to
> adjust vmx_vlapic_msr_changed.
> 
> Thanks, Roger.

Any other thoughts on this? On one hand it is asymmetric, but on the other
there isn't much assistance with only virtualize_x2apic_mode set, as in
that case a VM exit will be avoided only when trying to access the TPR
register.

Thanks,

Jane.
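
To make the two policies being compared concrete, here is a small standalone
sketch of the current condition versus the suggested one (the flag names are
shorthand for the cpu_has_vmx_* checks quoted above, and the sample values
are arbitrary assumptions):

#include <stdbool.h>
#include <stdio.h>

/* Shorthand for the cpu_has_vmx_* checks discussed in the thread. */
static bool virt_apic_accesses = true;
static bool virt_x2apic_mode = true;
static bool apic_reg_virt = false;
static bool virt_intr_delivery = false;

int main(void)
{
    bool xapic_assisted = virt_apic_accesses;

    /* Current proposal: x2APIC assistance needs more than the mode itself. */
    bool x2apic_current = virt_x2apic_mode &&
                          (apic_reg_virt || virt_intr_delivery);

    /* Suggested alternative: report assistance with the mode alone, even
     * though only TPR accesses then avoid a VM exit. */
    bool x2apic_suggested = virt_x2apic_mode;

    printf("xAPIC %d, x2APIC current %d, x2APIC suggested %d\n",
           xapic_assisted, x2apic_current, x2apic_suggested);
    return 0;
}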

Re: [PATCH 1/3] x86/Intel: skip PLATFORM_INFO reads on family 0xf

2022-02-11 Thread Jan Beulich
On 11.02.2022 10:40, Roger Pau Monné wrote:
> On Thu, Feb 10, 2022 at 03:55:52PM +0100, Jan Beulich wrote:
>> This avoids unnecessary (and always somewhat scary) log messages for the
>> recovered from #GP(0).
> 
> Could we maybe get rid of the #GP messages for cases like this where we
> are explicitly probing for MSR presence? (ie: it's expected that we
> can get a #GP)

This would mean some form of annotation of such RDMSR attempts (for
the recovery code to recognize in order to skip the printk()). Not
all rdmsr_safe() uses are, strictly speaking, probes, so I wouldn't
want to put such in rdmsr_safe() itself.

In any event - quite a bit more work. Plus I'm not convinced it's a
good idea to suppress any such log messages.

>> Signed-off-by: Jan Beulich 
>> ---
>> Perhaps even use "!= 6" in at least the CPUID-faulting family check?
> 
> Likely, or else you would also need to check for family 11 (Knights
> Corner?) which doesn't seem to support PLATFORM_INFO either.

I don't think Xen is able to run on these (likewise for IA64, which
iirc were surfacing as x86 model 7)? These are the co-processor ones,
aren't they? My question was more towards whether we would (wrongly)
exclude future processors when using != 6, if Intel decided to ever
make new CPUs with a family other than 6.

Jan




Re: [PATCH 1/3] x86/Intel: skip PLATFORM_INFO reads on family 0xf

2022-02-11 Thread Roger Pau Monné
On Thu, Feb 10, 2022 at 03:55:52PM +0100, Jan Beulich wrote:
> This avoids unnecessary (and always somewhat scary) log messages for the
> recovered from #GP(0).

Could we maybe get rid of the #GP messages for cases like this where we
are explicitly probing for MSR presence? (ie: it's expected that we
can get a #GP)

> Signed-off-by: Jan Beulich 
> ---
> Perhaps even use "!= 6" in at least the CPUID-faulting family check?

Likely, or else you would also need to check for family 11 (Knights
Corner?) which doesn't seem to support PLATFORM_INFO either.

Thanks, Roger.



Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data

2022-02-11 Thread Roger Pau Monné
On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote:
> When re-identifying CPU data, we might use uninitialized data when
> checking for the cache line property to adapt the cache
> alignment. The data that depends on this uninitialized read is
> currently not forwarded.
> 
> To avoid problems in the future, initialize the data cpuinfo
> structure before re-identifying the CPU again.
> 
> The trace to hit the uninitialized read reported by Coverity is:
> 
> bool recheck_cpu_features(unsigned int cpu)
> ...
> struct cpuinfo_x86 c;
> ...
> identify_cpu();
> 
> void identify_cpu(struct cpuinfo_x86 *c)
> ...
> generic_identify(c)
> 
> static void generic_identify(struct cpuinfo_x86 *c)
> ...

Would it be more appropriate for generic_identify to also set
x86_cache_alignment like it's done in early_cpu_init?

generic_identify already re-fetches a bunch of stuff that's also
set by early_cpu_init for the BSP.

Thanks, Roger.



Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure

2022-02-11 Thread Oleksandr Andrushchenko


On 10.02.22 18:16, Roger Pau Monné wrote:
> On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko 
>>
>> Introduce a per-domain read/write lock to check whether vpci is present,
>> so we are sure there are no accesses to the contents of the vpci struct
>> if not. This lock can be used (and in a few cases is used right away)
>> so that vpci removal can be performed while holding the lock in write
>> mode. Previously such removal could race with vpci_read for example.
> Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
> pci_remove_device, and likely when vPCI gets also used in
> {de}assign_device I think.
>
How about the below? It seems to guarantee that we can access pdev
without issues and without requiring pcidevs_lock to be used?

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e8b09d77d880..fd464a58b3b3 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -937,8 +937,14 @@ static int deassign_device(struct domain *d, uint16_t seg, 
uint8_t bus,
  }

  devfn = pdev->devfn;
+#ifdef CONFIG_HAS_VPCI
+    write_lock(&d->vpci_rwlock);
+#endif
  ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
   pci_to_dev(pdev));
+#ifdef CONFIG_HAS_VPCI
+    write_unlock(&d->vpci_rwlock);
+#endif
  if ( ret )
  goto out;

@@ -1474,6 +1480,9 @@ static int assign_device(struct domain *d, u16 seg, u8 
bus, u8 devfn, u32 flag)
  const struct domain_iommu *hd = dom_iommu(d);
  struct pci_dev *pdev;
  int rc = 0;
+#ifdef CONFIG_HAS_VPCI
+    struct domain *old_d;
+#endif

  if ( !is_iommu_enabled(d) )
  return 0;
@@ -1487,15 +1496,34 @@ static int assign_device(struct domain *d, u16 seg, u8 
bus, u8 devfn, u32 flag)
  ASSERT(pdev && (pdev->domain == hardware_domain ||
  pdev->domain == dom_io));

+#ifdef CONFIG_HAS_VPCI
+    /* pdev->domain is either hwdom or dom_io. We do not want the latter. */
+    old_d = pdev->domain == hardware_domain ? pdev->domain : NULL;
+    if ( old_d )
+    write_lock(&old_d->vpci_rwlock);
+#endif
+
  rc = pdev_msix_assign(d, pdev);
  if ( rc )
+    {
+#ifdef CONFIG_HAS_VPCI
+    if ( old_d )
+    write_unlock(&old_d->vpci_rwlock);
+#endif
  goto done;
+    }

  pdev->fault.count = 0;

  if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
    pci_to_dev(pdev), flag)) )
+    {
+#ifdef CONFIG_HAS_VPCI
+    if ( old_d )
+    write_unlock(&old_d->vpci_rwlock);
+#endif
  goto done;
+    }

  for ( ; pdev->phantom_stride; rc = 0 )
  {

I think we don't care about pci_remove_device because:

int pci_remove_device(u16 seg, u8 bus, u8 devfn)
{
[snip]
     pcidevs_lock();
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
     if ( pdev->bus == bus && pdev->devfn == devfn )
     {
     vpci_remove_device(pdev);

as vpci_remove_device will take care that there are no readers and
will safely remove vpci.

Thank you,
Oleksandr
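
The locking pattern being relied on here can be shown with a small standalone
pthreads sketch (illustration only, not Xen code: pthread_rwlock_t stands in
for the proposed per-domain vpci_rwlock, and the heap pointer stands in for
pdev->vpci). Readers take the lock in read mode around the presence check and
access; removal takes it in write mode, so it cannot race them.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_rwlock_t vpci_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static int *vpci;   /* stands in for pdev->vpci */

static void *reader(void *arg)
{
    (void)arg;
    pthread_rwlock_rdlock(&vpci_rwlock);
    if ( vpci )                            /* presence check under the lock */
        printf("read %d\n", *vpci);
    pthread_rwlock_unlock(&vpci_rwlock);
    return NULL;
}

static void remove_device(void)
{
    pthread_rwlock_wrlock(&vpci_rwlock);   /* excludes all readers */
    free(vpci);
    vpci = NULL;
    pthread_rwlock_unlock(&vpci_rwlock);
}

int main(void)
{
    pthread_t t;

    vpci = malloc(sizeof(*vpci));
    *vpci = 42;
    pthread_create(&t, NULL, reader, NULL);
    remove_device();
    pthread_join(t, NULL);
    return 0;
}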

Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver

2022-02-11 Thread Bertrand Marquis
Hi Oleksii,


> On 8 Feb 2022, at 18:00, Oleksii Moisieiev  wrote:
> 
> This is the implementation of the SCI interface, called the SCMI-SMC driver,
> which works as the mediator between XEN Domains and Firmware (SCP, ATF etc).
> This allows devices from the Domains to work with clocks, resets and
> power-domains without access to CPG.
> 
> Originally, cpg should be passed to the domain so it can work with
> power-domains/clocks/resets etc. Considering that cpg can't be split between
> the Domains, we get the limitation that the devices, which are using
> power-domains/clocks/resets etc, couldn't be split between the domains.
> The solution is to move the power-domain/clock/resets etc to the
> Firmware (such as SCP firmware or ATF) and provide an interface for the
> Domains. XEN should have an entity, called SCI-Mediator, which is
> responsible for message redirection between Domains and Firmware and
> for permission handling.
> 
> The following features are implemented:
> - request SCMI channels from ATF and pass channels to Domains;
> - set device permissions for Domains based on the Domain partial
> device-tree. Devices with permissions are able to work with clocks,
> resets and power-domains via SCMI;
> - redirect scmi messages from Domains to ATF.

Before going more deeply into the code I would like to discuss the general
design here and ask some questions, to avoid having to rework the code before
we all agree that this is the right solution and that we want this in Xen.

First I want to point out that clock/reset/power virtualization is a problem
for most applications using device pass-through, and I am very glad that
someone is looking into it.
Also, SCMI is the current standard for this, so relying on it is a very
good idea.

The latest version of the SCMI standard (DEN0056D v3.1) defines some means
to use SCMI on a virtualised system. In chapter 4.2.1, the standard
recommends setting permissions per agent in the hypervisor so that a VM
could later use the discovery protocol to detect the resources and use them.
In this kind of scenario the mediator in Xen would just configure the
permissions in the SCMI and would then rely on it to limit what is possible
by whom, simply by assigning a channel to a VM.

In your current design (please correct me if I am wrong) you seem to fully
rely on Xen and the FDT for discovery and permission.
Wouldn’t it be a better idea to use the protocol fully ?
Could we get rid of some of the FDT dependencies by using the discovery
system of SCMI ?
How is Linux doing this currently ? Is it relying on device tree to discover
 the SCMI resources ?

Also I understand that you rely on some entries to be declared in the device
tree and also some support to be implemented in ATF or SCP. I checked on
the boards I have access to and in their device trees, but none of this seems
to be supported there. Could you tell which board/configuration/ATF you are
using so that the implementation could be tested/validated?


Regards
Bertrand


> 
> Signed-off-by: Oleksii Moisieiev 
> ---
> xen/arch/arm/Kconfig|   2 +
> xen/arch/arm/sci/Kconfig|  10 +
> xen/arch/arm/sci/scmi_smc.c | 959 
> 3 files changed, 971 insertions(+)
> create mode 100644 xen/arch/arm/sci/Kconfig
> create mode 100644 xen/arch/arm/sci/scmi_smc.c
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index ab07833582..3b0dfc57b6 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -123,6 +123,8 @@ config ARM_SCI
> support. It allows guests to control system resources via one of
> ARM_SCI mediators implemented in XEN.
> 
> + source "arch/arm/sci/Kconfig"
> +
> endmenu
> 
> menu "ARM errata workaround via the alternative framework"
> diff --git a/xen/arch/arm/sci/Kconfig b/xen/arch/arm/sci/Kconfig
> new file mode 100644
> index 00..10b634d2ed
> --- /dev/null
> +++ b/xen/arch/arm/sci/Kconfig
> @@ -0,0 +1,10 @@
> +config SCMI_SMC
> + bool "Enable SCMI-SMC mediator driver"
> + default n
> + depends on ARM_SCI && HOST_DTB_EXPORT
> + ---help---
> +
> + Enables mediator in XEN to pass SCMI requests from Domains to ATF.
> + This feature allows drivers from Domains to work with System
> + Controllers (such as power, resets, clocks etc.). SCP is used as transport
> + for communication.
> diff --git a/xen/arch/arm/sci/scmi_smc.c b/xen/arch/arm/sci/scmi_smc.c
> new file mode 100644
> index 00..103529dfab
> --- /dev/null
> +++ b/xen/arch/arm/sci/scmi_smc.c
> @@ -0,0 +1,959 @@
> +/*
> + * xen/arch/arm/sci/scmi_smc.c
> + *
> + * SCMI mediator driver, using SCP as transport.
> + *
> + * Oleksii Moisieiev 
> + * Copyright (C) 2021, EPAM Systems.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * 

[libvirt test] 168082: regressions - FAIL

2022-02-11 Thread osstest service owner
flight 168082 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168082/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  2ac78307af099a2af934d5499f262458f3ce0ea4
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  581 days
Failing since151818  2020-07-11 04:18:52 Z  580 days  562 attempts
Testing same since   168082  2022-02-11 04:20:14 Z0 days1 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Andika Triwidada 
  Andrea Bolognani 
  Ani Sinha 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brad Laue 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Borntraeger 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Christophe Fergeau 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  dinglimin 
  Divya Garg 
  Dmitrii Shcherbakov 
  Dmytro Linkin 
  Eiichi Tsukata 
  Emilio Herrera 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Franck Ridel 
  Gavi Teitz 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Hela Basa 
  Helmut Grohne 
  Hiroki Narukawa 
  Hyman Huang(黄勇) 
  Ian Wienand 
  Ioanna Alifieraki 
  Ivan Teterevkov 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  jason lee 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jing Qi 
  Jinsheng Zhang 
  Jiri Denemark 
  Joachim Falk 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Koichi Murase 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Lei Yang 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Lubomir Rintel 
  Luke Yue 
  Luyao Zhong 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Chevsky 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nicolas Lécureuil 
  Nicolas Lécureuil 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Or Ozeri 
  Orion Poplawski 
  Pany 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Praveen K Paladugu 
  Richard W.M. Jones 
  Ricky Tigg 
  Robin Lee 
  Rohit Kumar 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  SeongHyun Jo 
  Shalini Chellathurai Saroja 
  Shaojun Yang 

Re: [PATCH] MAINTAINERS: make Bertrand ARM maintainer

2022-02-11 Thread Bertrand Marquis
Hi,


> On 11 Feb 2022, at 08:22, Bertrand Marquis  wrote:
> 
> Hi Stefano,
> 
> Thanks a lot :-)
> 
>> On 10 Feb 2022, at 19:08, Stefano Stabellini  wrote:
>> 
>> Signed-off-by: Stefano Stabellini 


Fixing typo introduced by spell checker:

> 
> Asked-by: Bertrand Marquis 

Acked-by : Bertrand Marquis 

Bertrand

> 
> Cheers
> Bertrand
> 
>> 
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 64934cc070..d41572128b 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -230,8 +230,8 @@ F:   tools/libs/ctrl/xc_arinc653.c
>> ARM (W/ VIRTUALISATION EXTENSIONS) ARCHITECTURE
>> M:   Stefano Stabellini 
>> M:   Julien Grall 
>> +M:  Bertrand Marquis 
>> R:   Volodymyr Babchuk 
>> -R:  Bertrand Marquis 
>> S:   Supported
>> L:   xen-devel@lists.xenproject.org
>> F:   docs/misc/arm/
> 
> 




Re: [RFC] Avoid dom0/HVM performance penalty from MSR access tightening

2022-02-11 Thread Roger Pau Monné
On Thu, Feb 10, 2022 at 11:27:15AM -0600, Alex Olson wrote:
> I'm seeing strange performance issues under Xen on a Supermicro server with a 
> Xeon D-1541 CPU caused by an MSR-related commit.
> 
> Commit 322ec7c89f6640ee2a99d1040b6f786cf04872cf 'x86/pv: disallow access to 
> unknown MSRs'
> surprisingly introduces a severe performance penalty where dom0 has about
> 1/8th the normal CPU performance. Even when 'xenpm' is used to select the
> performance governor and operate the CPU at maximum frequency, actual CPU
> performance is still 1/2 of normal (as well as using 
> "cpufreq=xen,performance").
> 
> The patch below fixes it but I don't fully understand why.
> 
> Basically, when *reads* of MSR_IA32_THERM_CONTROL are blocked, dom0 and
> guests (pinned to other CPUs) see the performance issues.

You only mention MSR_IA32_THERM_CONTROL here...

> For benchmarking purposes, I built a small C program that runs a "for
> loop" 4 billion iterations and timed its execution. In dom0, the
> performance issues also cause HVM guest startup time to go from 9-10
> seconds to almost 80 seconds.
> 
> I assumed Xen was managing CPU frequency and thus blocking related MSR
> access by dom0 (or any other domain). However,  clearly something else
> is happening and I don't understand why.
> 
> I initially attempted to copy the same logic as the write MSR case. This
> was effective at fixing the dom0 performance issue, but still left other
> domains running at 1/2 speed. Hence, the change below has no access control.
> 
> 
> If anyone has any insight as to what is really happening, I would be all ears
> as I am unsure if the change below is a proper solution.
> 
> Thanks
> 
> -Alex
> 
> ---
> ---
>  xen/arch/x86/pv/emul-priv-op.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> index 7f4279a051..f254479bda 100644
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -970,6 +970,18 @@ static int read_msr(unsigned int reg, uint64_t *val,
>  *val = 0;
>  return X86EMUL_OKAY;
>  
> +/* being unable to read MSR_IA32_THERM_CONTROL seems to significantly 
> affect
> + * dom0 and thus HVM guest startup performance, as well as PVH VMs.
> + */
> +case MSR_IA32_THERM_CONTROL:
> +case MSR_IA32_ENERGY_PERF_BIAS:

...yet in the patch you also allow access to
MSR_IA32_ENERGY_PERF_BIAS, which makes me wonder whether
MSR_IA32_THERM_CONTROL is the only required one.

It could help to post full logs Xen + Linux dmesgs.

Is this reproducible with different Linux versions?

Thanks, Roger.
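
The benchmark program mentioned above is not included in the thread; a
minimal sketch of that kind of timing loop could look like the following
(the loop body, iteration count and use of CLOCK_MONOTONIC are assumptions,
not the original program):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;
    volatile uint64_t sink = 0;   /* volatile so the loop is not optimised away */
    uint64_t i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for ( i = 0; i < 4000000000ULL; i++ )
        sink += i;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("4e9 iterations in %.2f s (sink=%llu)\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9,
           (unsigned long long)sink);
    return 0;
}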



Re: [PATCH] MAINTAINERS: make Bertrand ARM maintainer

2022-02-11 Thread Bertrand Marquis
Hi Stefano,

Thanks a lot :-)

> On 10 Feb 2022, at 19:08, Stefano Stabellini  wrote:
> 
> Signed-off-by: Stefano Stabellini 

Asked-by: Bertrand Marquis 

Cheers
Bertrand

> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 64934cc070..d41572128b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -230,8 +230,8 @@ F:tools/libs/ctrl/xc_arinc653.c
> ARM (W/ VIRTUALISATION EXTENSIONS) ARCHITECTURE
> M:Stefano Stabellini 
> M:Julien Grall 
> +M:   Bertrand Marquis 
> R:Volodymyr Babchuk 
> -R:   Bertrand Marquis 
> S:Supported
> L:xen-devel@lists.xenproject.org
> F:docs/misc/arm/




Re: [RFC v2 1/8] xen/hypfs: support fo nested dynamic hypfs nodes

2022-02-11 Thread Oleksii Moisieiev
Hi Juergen,

On Thu, Feb 10, 2022 at 08:34:08AM +0100, Juergen Gross wrote:
> On 08.02.22 19:00, Oleksii Moisieiev wrote:
> 

> > Add new api:
> > - hypfs_read_dyndir_entry
> > - hypfs_gen_dyndir_entry
> > which are the extension of the dynamic hypfs nodes support, presented in
> > 0b3b53be8cf226d947a79c2535a9efbb2dd7bc38.
> > This allows nested dynamic nodes to be added. Also input parameter is
> > hypfs_entry, so properties can also be generated dynamically.
> > 
> > Generating mixed list of dirs and properties is also supported.
> > Same as for the dynamic hypfs nodes, this is anchored in a percpu pointer,
> > which can be retrieved on any level of the dynamic entries.
> > This handle should be allocated on enter() callback and released on
> > exit() callback. When using nested dynamic dirs and properties handle
> > should be allocated on the first enter() call and released on the last
> > exit() call.
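
As a standalone sketch of the allocate-on-first-enter(), release-on-last-exit()
rule described above (illustrative only; the depth counter and malloc'd buffer
are stand-ins, not the hypfs dyndata API):

#include <stdio.h>
#include <stdlib.h>

static void *dyndata;           /* stands in for the percpu dyndata handle */
static unsigned int depth;

static void *dyndir_enter(void)
{
    if ( depth++ == 0 )         /* outermost enter(): allocate the handle */
        dyndata = malloc(64);
    return dyndata;
}

static void dyndir_exit(void)
{
    if ( --depth == 0 )         /* outermost exit(): release the handle */
    {
        free(dyndata);
        dyndata = NULL;
    }
}

int main(void)
{
    dyndir_enter();             /* enter the dynamic directory */
    dyndir_enter();             /* nested dynamic entry reuses the handle */
    dyndir_exit();
    dyndir_exit();              /* handle released here */
    printf("dyndata %p, depth %u\n", dyndata, depth);
    return 0;
}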
> > 
> > Signed-off-by: Oleksii Moisieiev 
> > ---
> >   xen/common/hypfs.c  | 83 +
> >   xen/include/xen/hypfs.h | 14 ++-
> >   2 files changed, 79 insertions(+), 18 deletions(-)
> > 
> > diff --git a/xen/common/hypfs.c b/xen/common/hypfs.c
> > index e71f7df479..6901f5e311 100644
> > --- a/xen/common/hypfs.c
> > +++ b/xen/common/hypfs.c
> > @@ -367,28 +367,27 @@ unsigned int hypfs_getsize(const struct hypfs_entry 
> > *entry)
> >   /*
> >* Fill the direntry for a dynamically generated entry. Especially the
> > - * generated name needs to be kept in sync with 
> > hypfs_gen_dyndir_id_entry().
> > + * generated name needs to be kept in sync with hypfs_gen_dyndir_entry().
> >*/
> > -int hypfs_read_dyndir_id_entry(const struct hypfs_entry_dir *template,
> > -   unsigned int id, bool is_last,
> > +int hypfs_read_dyndir_entry(const struct hypfs_entry *template,
> > +   const char *name, unsigned int namelen,
> > +   bool is_last,
> >  XEN_GUEST_HANDLE_PARAM(void) *uaddr)
> 
> Please fix the indentation of the parameters.
> 
> >   {
> >   struct xen_hypfs_dirlistentry direntry;
> > -char name[HYPFS_DYNDIR_ID_NAMELEN];
> > -unsigned int e_namelen, e_len;
> > +unsigned int e_len;
> > -e_namelen = snprintf(name, sizeof(name), template->e.name, id);
> > -e_len = DIRENTRY_SIZE(e_namelen);
> > +e_len = DIRENTRY_SIZE(namelen);
> >   direntry.e.pad = 0;
> > -direntry.e.type = template->e.type;
> > -direntry.e.encoding = template->e.encoding;
> > -direntry.e.content_len = template->e.funcs->getsize(&template->e);
> > -direntry.e.max_write_len = template->e.max_size;
> > +direntry.e.type = template->type;
> > +direntry.e.encoding = template->encoding;
> > +direntry.e.content_len = template->funcs->getsize(template);
> > +direntry.e.max_write_len = template->max_size;
> >   direntry.off_next = is_last ? 0 : e_len;
> >   if ( copy_to_guest(*uaddr, &direntry, 1) )
> >   return -EFAULT;
> >   if ( copy_to_guest_offset(*uaddr, DIRENTRY_NAME_OFF, name,
> > -  e_namelen + 1) )
> > +  namelen + 1) )
> >   return -EFAULT;
> >   guest_handle_add_offset(*uaddr, e_len);
> > @@ -396,6 +395,22 @@ int hypfs_read_dyndir_id_entry(const struct 
> > hypfs_entry_dir *template,
> >   return 0;
> >   }
> > +/*
> > + * Fill the direntry for a dynamically generated entry. Especially the
> > + * generated name needs to be kept in sync with 
> > hypfs_gen_dyndir_id_entry().
> > + */
> > +int hypfs_read_dyndir_id_entry(const struct hypfs_entry_dir *template,
> > +   unsigned int id, bool is_last,
> > +   XEN_GUEST_HANDLE_PARAM(void) *uaddr)
> > +{
> > +char name[HYPFS_DYNDIR_ID_NAMELEN];
> > +unsigned int e_namelen;
> > +
> > +e_namelen = snprintf(name, sizeof(name), template->e.name, id);
> > +return hypfs_read_dyndir_entry(&template->e, name, e_namelen, is_last, 
> > uaddr);
> > +}
> > +
> > +
> >   static const struct hypfs_entry *hypfs_dyndir_enter(
> >   const struct hypfs_entry *entry)
> >   {
> > @@ -404,7 +419,7 @@ static const struct hypfs_entry *hypfs_dyndir_enter(
> >   data = hypfs_get_dyndata();
> >   /* Use template with original enter function. */
> > -return data->template->e.funcs->enter(&data->template->e);
> > +return data->template->funcs->enter(data->template);
> >   }
> >   static struct hypfs_entry *hypfs_dyndir_findentry(
> > @@ -415,7 +430,7 @@ static struct hypfs_entry *hypfs_dyndir_findentry(
> >   data = hypfs_get_dyndata();
> >   /* Use template with original findentry function. */
> > -return data->template->e.funcs->findentry(data->template, name, 
> > name_len);
> > +return data->template->funcs->findentry(&data->dir, name, name_len);
> >   }
> >   static int hypfs_read_dyndir(const struct hypfs_entry *entry,
> > @@ -426,7 +441,36