[linux-linus test] 168086: trouble: blocked/broken/fail/pass
flight 168086 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168086/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64                   broken
 build-arm64-pvops             broken
 build-arm64-xsm               broken
 build-arm64-xsm       4 host-install(4)   broken  REGR. vs. 168080
 build-arm64           4 host-install(4)   broken  REGR. vs. 168080
 build-arm64-pvops     4 host-install(4)   broken  REGR. vs. 168080

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine      1 build-check(1)   blocked  n/a
 build-arm64-libvirt           1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl           1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd       1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm       1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-win7-amd64         19 guest-stop                fail like 168080
 test-amd64-amd64-qemuu-nested-amd            20 debian-hvm-install/l1/l2  fail like 168080
 test-amd64-amd64-xl-qemuu-ws16-amd64         19 guest-stop                fail like 168080
 test-amd64-amd64-xl-qemut-ws16-amd64         19 guest-stop                fail like 168080
 test-amd64-amd64-xl-qemuu-win7-amd64         19 guest-stop                fail like 168080
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install        fail like 168080
 test-armhf-armhf-libvirt-raw                 15 saverestore-support-check fail like 168080
 test-armhf-armhf-libvirt                     16 saverestore-support-check fail like 168080
 test-armhf-armhf-libvirt-qcow2               15 saverestore-support-check fail like 168080
 test-amd64-amd64-libvirt                     15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-xsm                 15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale    15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale    16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-raw   14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds       15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds       16 saverestore-support-check fail never pass
 test-armhf-armhf-xl            15 migrate-support-check     fail never pass
 test-armhf-armhf-xl            16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1    15 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-raw   14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit1    16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2    15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2    16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt       15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd        14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd        15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check     fail never pass

version targeted for testing:
 linux                1d41d2e82623b40ee27811fe9ea38bafe2e722e9
baseline version:
 linux                f1baf68e1383f6ed93eb9cff2866d46562607a43

Last test of basis   168080  2022-02-11 00:09:22 Z  1 days
Testing same since   168086  2022-02-11 20:11:19 Z  0 days  1 attempts

People who touched revisions under test:
  Aaron Liu
  Alex Deucher
  Alexandre Ghiti
  Alviro Iskandar Setiawan
  Ammar Faizi
  Andreas Gruenbacher
  Andrzej Pietrasiewicz
  Andy Shevchenko
  Aurelien Jarno
  Bartosz Golaszewski
  Bean Huo
  Bob Peterson
  Brian Norris
  Catalin Marinas
  Changbin Du
  Christoph Hellwig
  Christoph Niedermaier
  Damien Le Moal
  Daniel Bristot de Oliveira
  Daniel Stone
  Daniel Vetter
  Daniel
Development Issue of Concern
The tradition has been for the active development branch in git to be named "master". Quite a number of people object to the name due to its history. In light of such concerns, perhaps the Xen Project should join with other similar projects and have the active development branch renamed "main"?

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: [PATCH RFC 1/3] xen/efi: Always query the console information and get GOP
On Mon, Feb 07, 2022 at 06:52:57PM +, Julien Grall wrote:
> On 07/02/2022 08:46, Jan Beulich wrote:
> > On 06.02.2022 20:28, Julien Grall wrote:
> >>
> >> It is not entirely clear to me why the GOP was only fetched when
> >> the configuration file is used.
> >>
> >> I have tested this on RPI4 and it seems to work. Any chance this
> >> was done to workaround an x86 platform?
> >
> > This was done so in the context of making the code work for Arm. See
> > commit c38cf865ec82 ("EFI: ignore EFI commandline, skip console setup
> > when booted from GRUB"), the description of which explicitly says
> >
> > "Don't do EFI console or video configuration when booted by GRUB. The EFI boot
> > code does some console and video initialization to support native EFI boot from
> > the EFI boot manager or EFI shell. This initlization should not be done when
> > booted using GRUB."
>
> I read that and still couldn't figure out why this was done like that.

The most likely motivation was simply "Eww! ACPI/UEFI use gobs of memory! Purge the abomination!"

Unfortunately ACPI/UEFI are large and complex due to trying to solve a large and complex problem. ACPI/UEFI attempt to provide an OS-agnostic presentation of the hardware layout, whereas device-trees are a common *format* for presenting hardware to *an* OS (similar to how JSON is a common format). Due to the size and complexity, most developers have preferred the simpler device-tree format, even though that severely limits OS choice. As such, nuking ACPI/UEFI's presence is common in the ARM world, versus the x86 world where Intel dragged everyone onto ACPI/UEFI.

One can see this in patches like Roman Shaposhnik's "Making full 2G of memory available to Xen on HiKey", which simply tosses EFI into the garbage bin as useless overhead. Yet the ARM world is now large enough to justify OS-agnostic solutions such as ACPI/UEFI. The standards behind device-trees might be heading in this direction, but they're way behind.
You stated your patch was for 5.17-rc2. How much backporting would you expect this patch to be viable for? (I'm unsure how much churn is occurring in the relevant portions of Linux.)

The long-term branches of Linux include 5.4.179, 5.10.100 and 5.15.23. `patch` indicated it could apply to 5.10.92 source with fuzz (hmm). This suggests 5.15 is likely viable, but 5.10 is risky and 5.4 is a very long shot.
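For anyone who wants to reproduce this kind of applicability check against a stable tree, `patch --dry-run` is the quick way: it reports whether hunks apply (possibly with fuzz or offsets) without modifying anything. A self-contained sketch — the file name, directory and patch content below are stand-ins, not the actual series:

```shell
set -e
mkdir -p /tmp/backport-demo && cd /tmp/backport-demo

# Stand-in for a source file from an older stable tree.
printf 'line one\nline two\nline three\n' > efi-boot.c

# Stand-in for the patch whose backportability is being assessed.
cat > fix.patch <<'EOF'
--- a/efi-boot.c
+++ b/efi-boot.c
@@ -1,3 +1,3 @@
 line one
-line two
+line two, patched
 line three
EOF

# --dry-run checks the hunks against the tree without touching it;
# a non-zero exit (or "FAILED" hunks in the output) means the backport
# needs manual work.
patch -p1 --dry-run < fix.patch
```

In a real run you would repeat this once per stable branch checkout (5.4.y, 5.10.y, 5.15.y) and compare the fuzz/offset warnings.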
Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver
On Fri, 11 Feb 2022, Oleksii Moisieiev wrote:
> On Fri, Feb 11, 2022 at 11:18:47AM +, Bertrand Marquis wrote:
> > Hi Oleksii,
> >
> > > On 11 Feb 2022, at 10:44, Oleksii Moisieiev wrote:
> > >
> > > Hi Bertrand,
> > >
> > > On Fri, Feb 11, 2022 at 08:46:05AM +, Bertrand Marquis wrote:
> > >> Hi Oleksii,
> > >>
> > >>> On 8 Feb 2022, at 18:00, Oleksii Moisieiev wrote:
> > >>>
> > >>> This is the implementation of the SCI interface, called the SCMI-SMC
> > >>> driver, which works as the mediator between Xen domains and firmware
> > >>> (SCP, ATF etc). This allows devices from the domains to work with
> > >>> clocks, resets and power-domains without access to the CPG.
> > >>>
> > >>> Originally, the CPG should be passed to the domain so it can work with
> > >>> power-domains/clocks/resets etc. Considering that the CPG can't be split
> > >>> between the domains, we get the limitation that devices which use
> > >>> power-domains/clocks/resets etc. couldn't be split between the domains.
> > >>> The solution is to move the power-domains/clocks/resets etc. to the
> > >>> firmware (such as SCP firmware or ATF) and provide an interface for the
> > >>> domains. Xen should have an entity, called the SCI mediator, which is
> > >>> responsible for redirecting messages between domains and firmware and
> > >>> for permission handling.
> > >>>
> > >>> The following features are implemented:
> > >>> - request SCMI channels from ATF and pass channels to domains;
> > >>> - set device permissions for domains based on the domain partial
> > >>>   device-tree. Devices with permissions are able to work with clocks,
> > >>>   resets and power-domains via SCMI;
> > >>> - redirect SCMI messages from domains to ATF.
> > >>
> > >> Before going more deeply into the code I would like to discuss the
> > >> general design here and ask some questions, to prevent reworking the
> > >> code before we all agree that this is the right solution and that we
> > >> want this in Xen.
> > >> First I want to point out that clock/reset/power virtualization is a
> > >> problem on most applications using device pass-through, and I am very
> > >> glad that someone is looking into it.
> > >> Also, SCMI is the current standard existing for this, so relying on it
> > >> is a very good idea.
> > >>
> > >> The latest version of the SCMI standard (DEN0056D v3.1) defines some
> > >> means to use SCMI on a virtualised system. In chapter 4.2.1, the
> > >> standard recommends to set permissions per agent in the hypervisor so
> > >> that a VM could later use the discovery protocol to detect the resources
> > >> and use them. Using this kind of scenario, the mediator in Xen would
> > >> just configure the permissions in the SCMI and would then rely on it to
> > >> limit what is possible by whom, just by assigning a channel to a VM.
> > >>
> > >> In your current design (please correct me if I am wrong) you seem to
> > >> fully rely on Xen and the FDT for discovery and permission.
> > >
> > > In the current implementation Xen is the trusted agent, and it's
> > > responsible for setting permissions. During initialization it discovers
> > > the agent and sets permissions by using BASE_SET_DEVICE_PERMISSIONS to
> > > Dom0. When a new domain is created, Xen assigns an agent id for this
> > > domain and requests the resources that are passed-through to this domain.
> >
> > Ok
> >
> > > I'm getting the following information from the FDT:
> > > 1) Shared memory addresses, which should be used for the agents. During
> > > initialization I send BASE_DISCOVER_AGENT to each of these addresses and
> > > receive an agent_id. Xen is responsible for assigning the agent_id for
> > > the domain. Then Xen intercepts SMC calls from the domain, sets the
> > > agent_id and redirects them to the firmware.
> >
> > So Xen is setting the agent ID, with no way for a guest to get access to
> > something it shouldn't, am I right?
>
> Yes.
> Xen is the only entity which is trusted, so it's responsible for
> setting permissions and assigning the agent_id. A guest gets access only
> to the devices it's allowed to.
>
> > > 2) Devices that are using SCMI. Those devices have clocks/power/resets
> > > etc. related to the SCMI protocol (as it is done in the Linux kernel),
> > > and scmi_devid should be set. I'm currently preparing to send a patch
> > > updating the kernel bindings with this parameter to the Linux kernel.
> > > The scmi_devid value should match the device id set in the firmware.
> > > dt example:
> > > {
> > >     scmi_devid = <1>;          // usb0 device id
> > >     clocks = <&scmi_clock 1>;  // relies on clock with id 1
> > > }
> > >
> > > Xen requests permission for the device when the device is attached to
> > > the domain during creation.
> >
> > Without this, how is (if it is) the Linux kernel using SCMI for power
> > management?
>
> Here is how it should be described in the FDT:
> / {
>     firmware {
>         scmi {
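Filling out the idea behind the truncated fragment above, a fuller device-tree sketch of the proposed binding might look like the following. Treat everything except `scmi_devid` and the SCMI clock reference as illustrative assumptions — the labels, addresses, compatibles and SMC function ID below are made up for the example, though `protocol@14` follows the existing arm,scmi convention of addressing the clock protocol by its SCMI protocol id (0x14):

```dts
/ {
    firmware {
        scmi: scmi {
            compatible = "arm,scmi-smc";      /* assumed transport */
            arm,smc-id = <0x82000002>;        /* assumed SMC function ID */
            #address-cells = <1>;
            #size-cells = <0>;

            scmi_clock: protocol@14 {         /* SCMI clock protocol */
                reg = <0x14>;
                #clock-cells = <1>;
            };
        };
    };

    usb0: usb@e6590000 {                      /* placeholder consumer */
        compatible = "vendor,usb";            /* placeholder compatible */
        scmi_devid = <1>;                     /* matches device id in firmware */
        clocks = <&scmi_clock 1>;             /* clock id 1 via SCMI */
    };
};
```

The point of the sketch is the pairing: the consumer node carries both the proposed `scmi_devid` (for Xen's permission handling) and an ordinary `clocks` phandle into the SCMI provider (for the guest kernel's clock framework).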
Re: Metadata and signalling channels for Zephyr virtio-backends on Xen
On Fri, 11 Feb 2022, Alex Bennée wrote: > > FYI, a good and promising approach to handle both SCMI and SCPI is the > > series recently submitted by EPAM to mediate SCMI and SCPI requests in > > Xen: https://marc.info/?l=xen-devel=163947444032590 > > > > (Another "special" virtio backend is virtio-iommu for similar reasons: > > the guest p2m address mappings and also the IOMMU drivers are in Xen. > > It is not immediately clear whether a virtio-iommu backend would need to > > be in Xen or run as a process in dom0/domU.) > > > > On the other hand, for all the other "normal" protocols (e.g. > > virtio-net, virtio-block, etc.) the backend would naturally run as a > > process in dom0 or domU (e.g. QEMU in Dom0) as one would expect. > > Can domU's not be given particular access to HW they might want to > tweak? I assume at some point a block device backend needs to actually > talk to real HW to store the blocks (even if in most cases it would be a > kernel doing the HW access on it's behalf). Yes, it would. Block and network are subsystems with limited visibility, access, and harmful capabilities (assuming IOMMU). If the block device goes down or is misused, block might not work but everything else is expected to work. Block only requires visibility of the block device for it to work. The same is true for network, GPU, USB, etc. SCMI is different. If SCMI is misused the whole platform is affected. SCMI implies visibility of everything in the system. It is not much about emulating SCMI but more about mediating SCMI calls. In other words, SCMI is not a device, it is a core interface. In a Xen model, Xen virtualizes CPU and memory and other core features/interfaces (timers, interrupt controller, IOMMU, etc). The PCI root complex is handled by Xen too. Individual (PCI and non-PCI) devices are assigned to guests. These are the reasons why I think the best way to enable SCMI in upstream Xen is with a mediator in the hypervisor as it is currently in development. 
Any chances you could combine your efforts with EPAM's outstanding series? You might be able to spot gaps if any, and might even have already code to fill those gaps. It would be fantastic to have your reviews and/or contributions on xen-devel. Otherwise, if you have to run the virtio-scmi backend in userspace, why not try to get it to work on Xen :-) It might not be the ideal solution, but it could be a good learning experience and pave the way for the other virtio backends which definitely will be in userspace (virtio-block, virtio-gpu, etc). > >> Currently the demo setup > >> is intermediated by a double-ended vhost-user daemon running on the > >> devbox acting as a go between a number of QEMU instances representing > >> the front and back-ends. You can view the architecture with Vincents > >> diagram here: > >> > >> > >> https://docs.google.com/drawings/d/1YSuJUSjEdTi2oEUq4oG4A9pBKSEJTAp6hhcHKKhmYHs/edit?usp=sharing > >> > >> The key virtq handling is done over the special carve outs of shared > >> memory between the front end and guest. However the signalling is > >> currently over a virtio device on the backend. This is useful for the > >> PoC but obviously in a real system we don't have a hidden POSIX system > >> acting as a go between not to mention the additional latency it causes > >> with all those context switches. > >> > >> I was hoping we could get some more of the Xen experts to the next > >> Stratos sync (17th Feb) to go over approaches for a properly hosted on > >> Xen approach. From my recollection (Vincent please correct me if I'm > >> wrong) of last week the issues that need solving are: > > > > Unfortunately I have a regular conflict which prevents me from being > > able to join the Stratos calls. However, I can certainly make myself > > available for one call (unless something unexpected comes up). 
> > > > > >> * How to handle configuration steps as FE guests come up > >> > >> The SCMI server will be a long running persistent backend because it is > >> managing real HW resources. However the guests may be ephemeral (or just > >> restarted) so we can't just hard-code everything in a DTB. While the > >> virtio-negotiation in the config space covers most things we still need > >> information like where in the guests address space the shared memory > >> lives and at what offset into that the queues are created. As far as I'm > >> aware the canonical source of domain information is XenStore > >> (https://wiki.xenproject.org/wiki/XenStore) but this relies on a Dom0 > >> type approach. Is there an alternative for dom0less systems or do we > >> need a dom0-light approach, for example using STR-21 (Ensure Zephyr can > >> run cleanly as a Dom0 guest) providing just enough services for FE's to > >> register metadata and BE's to read it? > > > > I'll try to answer the question for a generic virtio frontend and > > backend instead (not SCMI because SCMI is unique due to the reasons > > above.) > > > > Yes, xenstore is the easiest way
[linux-5.4 test] 168085: trouble: blocked/broken/fail/pass
flight 168085 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168085/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64                   broken
 build-arm64-pvops             broken
 build-arm64-xsm               broken
 build-arm64-pvops     4 host-install(4)   broken  REGR. vs. 168060
 build-arm64-xsm       4 host-install(4)   broken  REGR. vs. 168060
 build-arm64           4 host-install(4)   broken  REGR. vs. 168060

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install        fail pass in 168084
 test-armhf-armhf-xl-rtds                     18 guest-start/debian.repeat fail pass in 168084

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine      1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl           1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd       1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm       1 build-check(1)   blocked  n/a
 build-arm64-libvirt           1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64         19 guest-stop                fail like 168060
 test-amd64-i386-xl-qemut-win7-amd64          19 guest-stop                fail like 168060
 test-amd64-amd64-xl-qemut-win7-amd64         19 guest-stop                fail like 168060
 test-armhf-armhf-libvirt                     16 saverestore-support-check fail like 168060
 test-amd64-amd64-qemuu-nested-amd            20 debian-hvm-install/l1/l2  fail like 168060
 test-amd64-amd64-xl-qemuu-ws16-amd64         19 guest-stop                fail like 168060
 test-amd64-amd64-xl-qemut-ws16-amd64         19 guest-stop                fail like 168060
 test-amd64-i386-xl-qemuu-win7-amd64          19 guest-stop                fail like 168060
 test-amd64-i386-xl-qemut-ws16-amd64          19 guest-stop                fail like 168060
 test-armhf-armhf-libvirt-qcow2               15 saverestore-support-check fail like 168060
 test-armhf-armhf-libvirt-raw                 15 saverestore-support-check fail like 168060
 test-amd64-i386-xl-qemuu-ws16-amd64          19 guest-stop                fail like 168060
 test-amd64-i386-xl-pvshim                    14 guest-start               fail never pass
 test-amd64-amd64-libvirt-xsm                 15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt                     15 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-xsm                  15 migrate-support-check     fail never pass
 test-amd64-i386-libvirt                      15 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale                  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale                  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw                  14 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-vhd                 14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit1                  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit1                  16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt                     15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds                     15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds                     16 saverestore-support-check fail never pass
 test-armhf-armhf-xl                          15 migrate-support-check     fail never pass
 test-armhf-armhf-xl                          16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck               15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck               16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2                  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2                  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu                15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu                16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2               14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd                      14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd                      15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw                 14 migrate-support-check     fail never pass

version targeted for testing:
 linux
Re: Metadata and signalling channels for Zephyr virtio-backends on Xen
Stefano Stabellini writes: > On Mon, 7 Feb 2022, Alex Bennée wrote: >> Hi Stefano, >> >> Vincent gave an update on his virtio-scmi work at the last Stratos sync >> call and the discussion moved onto next steps. > > Hi Alex, > > I don't know the specifics of virtio-scmi, but if it is about power, > clocks, reset, etc. like the original SCMI protocol, then virtio-scmi is > likely going to be very different from all the other virtio frontends > and backends. That's because SCMI requires a full view of the system, > which is different from something like virtio-net that is limited to the > emulation of 1 device. For this reason, it is likely that the > virtio-scmi backend would be a better fit in Xen itself, rather than run > in userspace inside a VM. That may be a good solution for Xen but I still think it's worthwhile being able to package SCMI in a VM for other hypervisors. We are just happening to use Xen as a nice type-1 example. Vincents SCMI server code is portable anyway and can reside in a Zephyr app, firmware blob or a userspace vhost-user client. > FYI, a good and promising approach to handle both SCMI and SCPI is the > series recently submitted by EPAM to mediate SCMI and SCPI requests in > Xen: https://marc.info/?l=xen-devel=163947444032590 > > (Another "special" virtio backend is virtio-iommu for similar reasons: > the guest p2m address mappings and also the IOMMU drivers are in Xen. > It is not immediately clear whether a virtio-iommu backend would need to > be in Xen or run as a process in dom0/domU.) > > On the other hand, for all the other "normal" protocols (e.g. > virtio-net, virtio-block, etc.) the backend would naturally run as a > process in dom0 or domU (e.g. QEMU in Dom0) as one would expect. Can domU's not be given particular access to HW they might want to tweak? I assume at some point a block device backend needs to actually talk to real HW to store the blocks (even if in most cases it would be a kernel doing the HW access on it's behalf). 
>> Currently the demo setup >> is intermediated by a double-ended vhost-user daemon running on the >> devbox acting as a go between a number of QEMU instances representing >> the front and back-ends. You can view the architecture with Vincents >> diagram here: >> >> >> https://docs.google.com/drawings/d/1YSuJUSjEdTi2oEUq4oG4A9pBKSEJTAp6hhcHKKhmYHs/edit?usp=sharing >> >> The key virtq handling is done over the special carve outs of shared >> memory between the front end and guest. However the signalling is >> currently over a virtio device on the backend. This is useful for the >> PoC but obviously in a real system we don't have a hidden POSIX system >> acting as a go between not to mention the additional latency it causes >> with all those context switches. >> >> I was hoping we could get some more of the Xen experts to the next >> Stratos sync (17th Feb) to go over approaches for a properly hosted on >> Xen approach. From my recollection (Vincent please correct me if I'm >> wrong) of last week the issues that need solving are: > > Unfortunately I have a regular conflict which prevents me from being > able to join the Stratos calls. However, I can certainly make myself > available for one call (unless something unexpected comes up). > > >> * How to handle configuration steps as FE guests come up >> >> The SCMI server will be a long running persistent backend because it is >> managing real HW resources. However the guests may be ephemeral (or just >> restarted) so we can't just hard-code everything in a DTB. While the >> virtio-negotiation in the config space covers most things we still need >> information like where in the guests address space the shared memory >> lives and at what offset into that the queues are created. As far as I'm >> aware the canonical source of domain information is XenStore >> (https://wiki.xenproject.org/wiki/XenStore) but this relies on a Dom0 >> type approach. 
Is there an alternative for dom0less systems or do we >> need a dom0-light approach, for example using STR-21 (Ensure Zephyr can >> run cleanly as a Dom0 guest) providing just enough services for FE's to >> register metadata and BE's to read it? > > I'll try to answer the question for a generic virtio frontend and > backend instead (not SCMI because SCMI is unique due to the reasons > above.) > > Yes, xenstore is the easiest way to exchange configuration information > between domains. I think EPAM used xenstore to exchange the > configuration information in their virtio-block demo. There is a way to > use xenstore even between dom0less VMs: > https://marc.info/?l=xen-devel=164340547602391 Not just xenstore but > full PV drivers too. However, in the dom0less case xenstore is going to > become available some time after boot, not immediately at startup time. > That's because you need to wait until xenstored is up and running. > > There are other ways to send data from one VM to another which are > available immediately at boot, such as Argo and
Re: [PATCH] PCI/MSI: msix_setup_msi_descs: Restore logic for msi_attrib.can_mask
On Fri, Feb 11, 2022 at 01:10:22AM +0100, Josef Johansson wrote: > On 2/11/22 00:55, Bjorn Helgaas wrote: > > On Sat, Jan 22, 2022 at 02:10:01AM +0100, Josef Johansson wrote: > >> From: Josef Johansson > >> > >> PCI/MSI: msix_setup_msi_descs: Restore logic for msi_attrib.can_mask > > Please match the form and style of previous subject lines (in > > particular, omit "msix_setup_msi_descs:"). > Would this subject suffice? > PCI/MSI: Correct use of can_mask in msi_add_msi_desc() Looks good to me! Bjorn
Re: [RFC PATCH] arm/vgic-v3: provide custom callbacks for pend_lpi_tree radix tree
On 11/02/2022 15:45, Luca Fancellu wrote:
> > On 11 Feb 2022, at 15:26, Julien Grall wrote:
> >
> > Hi Luca,
> >
> > On 11/02/2022 15:00, Luca Fancellu wrote:
> >> pend_lpi_tree is a radix tree used to store pending irqs, the tree is
> >> protected by a lock for read/write operations.
> >> Currently the radix tree default function to free items uses the
> >> RCU mechanism, calling call_rcu and deferring the operation.
> >> However every access to the structure is protected by the lock so we
> >> can avoid using the default free function that, by using RCU,
> >> increases memory usage and impacts the predictability of the system.
> >
> > I understand the goal but looking at the implementation of
> > vgic_v3_lpi_to_pending() (copied below for convenience), we would
> > release the lock as soon as the look-up finishes, yet the element is
> > returned.
> >
> > static struct pending_irq *vgic_v3_lpi_to_pending(struct domain *d,
> >                                                   unsigned int lpi)
> > {
> >     struct pending_irq *pirq;
> >
> >     read_lock(&d->arch.vgic.pend_lpi_tree_lock);
> >     pirq = radix_tree_lookup(&d->arch.vgic.pend_lpi_tree, lpi);
> >     read_unlock(&d->arch.vgic.pend_lpi_tree_lock);
> >
> >     return pirq;
> > }
> >
> > So the lock will not protect us against removal. If you want to drop the
> > RCU, you will need to ensure the structure pending_irq is suitably
> > protected. I haven't checked whether there are other locks that may suit
> > us here.
>
> Hi Julien,
>
> Yes you are right! I missed that, sorry for the noise.

Actually,... I think I am wrong :/. I thought the lock pend_lpi_tree_lock would protect pending_irq, but it only protects the radix tree element (not the value). The use in its_discard_event() seems to confirm that because the pending_irq is re-initialized as soon as it gets destroyed.

I would like a second opinion though.

Cheers,

-- 
Julien Grall
Re: [PATCH V2 5/13] hid: use time_is_after_jiffies() instead of jiffies judgment
On Thu, 2022-02-10 at 18:30 -0800, Qing Wang wrote:
> From: Wang Qing
>
> It is better to use time_xxx() directly instead of jiffies judgment
> for understanding.
>
> Signed-off-by: Wang Qing

Acked-by: Srinivas Pandruvada

> ---
>  drivers/hid/intel-ish-hid/ipc/ipc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hid/intel-ish-hid/ipc/ipc.c b/drivers/hid/intel-ish-hid/ipc/ipc.c
> index 8ccb246..15e1423
> --- a/drivers/hid/intel-ish-hid/ipc/ipc.c
> +++ b/drivers/hid/intel-ish-hid/ipc/ipc.c
> @@ -578,7 +578,7 @@ static void _ish_sync_fw_clock(struct ishtp_device *dev)
>  	static unsigned long	prev_sync;
>  	uint64_t	usec;
>
> -	if (prev_sync && jiffies - prev_sync < 20 * HZ)
> +	if (prev_sync && time_is_after_jiffies(prev_sync + 20 * HZ))
>  		return;
>
>  	prev_sync = jiffies;
Re: [RFC PATCH] arm/vgic-v3: provide custom callbacks for pend_lpi_tree radix tree
> On 11 Feb 2022, at 15:26, Julien Grall wrote:
>
> Hi Luca,
>
> On 11/02/2022 15:00, Luca Fancellu wrote:
>> pend_lpi_tree is a radix tree used to store pending irqs, the tree is
>> protected by a lock for read/write operations.
>> Currently the radix tree default function to free items uses the
>> RCU mechanism, calling call_rcu and deferring the operation.
>> However every access to the structure is protected by the lock so we
>> can avoid using the default free function that, by using RCU,
>> increases memory usage and impacts the predictability of the system.
>
> I understand the goal but looking at the implementation of
> vgic_v3_lpi_to_pending() (copied below for convenience), we would release
> the lock as soon as the look-up finishes, yet the element is returned.
>
> static struct pending_irq *vgic_v3_lpi_to_pending(struct domain *d,
>                                                   unsigned int lpi)
> {
>     struct pending_irq *pirq;
>
>     read_lock(&d->arch.vgic.pend_lpi_tree_lock);
>     pirq = radix_tree_lookup(&d->arch.vgic.pend_lpi_tree, lpi);
>     read_unlock(&d->arch.vgic.pend_lpi_tree_lock);
>
>     return pirq;
> }
>
> So the lock will not protect us against removal. If you want to drop the
> RCU, you will need to ensure the structure pending_irq is suitably
> protected. I haven't checked whether there are other locks that may suit
> us here.
>
Hi Julien,

Yes you are right! I missed that, sorry for the noise.

Cheers,
Luca

> Cheers,
>
> --
> Julien Grall
Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On Fri, Feb 11, 2022 at 12:13:38PM +, Oleksandr Andrushchenko wrote:
> On 11.02.22 13:40, Roger Pau Monné wrote:
> > On Fri, Feb 11, 2022 at 07:27:39AM +, Oleksandr Andrushchenko wrote:
> > > Hi, Roger!
> > >
> > > On 10.02.22 18:16, Roger Pau Monné wrote:
> > > > On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
> > > > > From: Oleksandr Andrushchenko
> > > > >
> > > > > Introduce a per-domain read/write lock to check whether vpci is
> > > > > present, so we are sure there are no accesses to the contents of
> > > > > the vpci struct if not. This lock can be used (and in a few cases
> > > > > is used right away) so that vpci removal can be performed while
> > > > > holding the lock in write mode. Previously such removal could
> > > > > race with vpci_read for example.
> > > > Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
> > > > pci_remove_device, and likely when vPCI gets also used in
> > > > {de}assign_device I think.
> > > Yes, this is indeed an issue, but I was not trying to solve it in
> > > the context of vPCI locking yet. I think we should discuss how we
> > > approach pdev locking, so I can create a patch for that.
> > > That being said, I would like not to solve pdev in this patch yet.
> > >
> > > ...I do understand we do want to avoid that, but at the moment
> > > a single reliable way for making sure pdev is alive seems to
> > > be pcidevs_lock
> > I think we will need to make pcidevs_lock a rwlock and take it in read
> > mode for pci_get_pdev_by_domain.
> >
> > We didn't have this scenario before where PCI emulation is done in the
> > hypervisor, and hence the locking around those data structures has not
> > been designed for those use-cases.
> Yes, I do understand that.
> I hope the pcidevs lock move to rwlock can be done as a separate
> patch. While this is not done, do you think we can proceed with the
> vPCI series, with the pcidevs locking re-work being done after?

Ideally we would like to sort out the locking once and for all. I would
like to be sure that what we introduce now doesn't turn out to interact
badly when we decide to look at the pcidevs locking issue.

Thanks, Roger.
Re: [PATCH v6 13/13] xen/arm: account IO handlers for emulated PCI MSI-X
Hi Oleksandr,

On 04/02/2022 06:34, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko
>
> At the moment, we always allocate an extra 16 slots for IO handlers
> (see MAX_IO_HANDLER). So while adding IO trap handlers for the
> emulated MSI-X registers we need to explicitly tell that we have
> additional IO handlers, so those are accounted for.
>
> Signed-off-by: Oleksandr Andrushchenko

Acked-by: Julien Grall

Cheers,

--
Julien Grall
[PATCH CPU v2] cpuid: initialize cpuinfo with boot_cpu_data
When re-identifying CPU data, we might use uninitialized data when
checking for the cache line property to adapt the cache alignment. The
data that depends on this uninitialized read is currently not forwarded.

To avoid problems in the future, initialize the cpuinfo structure before
re-identifying the CPU again.

The trace to hit the uninitialized read reported by Coverity is:

  bool recheck_cpu_features(unsigned int cpu)
      ...
      struct cpuinfo_x86 c;
      ...
      identify_cpu(&c);

  void identify_cpu(struct cpuinfo_x86 *c)
      ...
      generic_identify(c);

  static void generic_identify(struct cpuinfo_x86 *c)
      ...
      if (this_cpu->c_early_init)
          this_cpu->c_early_init(c);   /* which is early_init_intel */

  static void early_init_intel(struct cpuinfo_x86 *c)
      ...
      if (c->x86 == 15 && c->x86_cache_alignment == 64)
          c->x86_cache_alignment = 128;

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Signed-off-by: Norbert Manthey
---
 xen/arch/x86/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -609,7 +609,7 @@ void __init init_guest_cpuid(void)
 bool recheck_cpu_features(unsigned int cpu)
 {
     bool okay = true;
-    struct cpuinfo_x86 c;
+    struct cpuinfo_x86 c = {0};
     const struct cpuinfo_x86 *bsp = &boot_cpu_data;
     unsigned int i;
--
2.17.1

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
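The pattern the patch addresses can be reproduced in isolation: a stack-allocated struct carries indeterminate values until something writes to each field, and a `= {0}` initializer closes that window. A minimal, self-contained illustration follows; the struct here is a cut-down stand-in, not Xen's real `cpuinfo_x86`:

```c
#include <assert.h>

/* Cut-down stand-in for struct cpuinfo_x86. */
struct cpuinfo {
    unsigned int x86;                 /* CPU family */
    unsigned int x86_cache_alignment;
};

/* Mimics early_init_intel(): reads x86_cache_alignment before any code
 * is guaranteed to have written it. This is the uninitialized read
 * Coverity flagged when the caller declared the struct bare. */
static void early_init(struct cpuinfo *c)
{
    if (c->x86 == 15 && c->x86_cache_alignment == 64)
        c->x86_cache_alignment = 128;
}

/* The fixed pattern: zero-initialize before re-identification, so the
 * check above reads a defined value. */
static unsigned int recheck(void)
{
    struct cpuinfo c = {0};   /* was: struct cpuinfo c; */

    early_init(&c);
    return c.x86_cache_alignment;
}
```

With the zero initializer, `recheck()` deterministically returns 0; with a bare declaration the result would depend on whatever happened to be on the stack.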
Re: [RFC PATCH] arm/vgic-v3: provide custom callbacks for pend_lpi_tree radix tree
Hi Luca,

On 11/02/2022 15:00, Luca Fancellu wrote:
> pend_lpi_tree is a radix tree used to store pending irqs; the tree is
> protected by a lock for read/write operations.
> Currently the radix tree's default function to free items uses the RCU
> mechanism, calling call_rcu and deferring the operation. However,
> every access to the structure is protected by the lock, so we can
> avoid using the default free function which, by using RCU, increases
> memory usage and impacts the predictability of the system.

I understand the goal, but look at the implementation of
vgic_v3_lpi_to_pending() (copied below for convenience). We would
release the lock as soon as the look-up finishes, yet the element is
returned:

static struct pending_irq *vgic_v3_lpi_to_pending(struct domain *d,
                                                  unsigned int lpi)
{
    struct pending_irq *pirq;

    read_lock(&d->arch.vgic.pend_lpi_tree_lock);
    pirq = radix_tree_lookup(&d->arch.vgic.pend_lpi_tree, lpi);
    read_unlock(&d->arch.vgic.pend_lpi_tree_lock);

    return pirq;
}

So the lock will not protect us against removal. If you want to drop
the RCU, you will need to ensure the structure pending_irq is suitably
protected. I haven't checked whether there are other locks that may
suit us here.

Cheers,

--
Julien Grall
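One standard way to make the returned element safe without RCU (Julien's message only asks that pending_irq be "suitably protected", it does not prescribe this) is to pin the element with a reference count taken while the lock is still held, and defer the free until the last user drops its reference. A generic sketch follows; the names and the one-slot "tree" are invented for illustration and are not Xen's pending_irq API:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Toy element; names are invented, not Xen's struct pending_irq. */
struct item {
    int key;
    unsigned int refcnt;   /* protected by table_lock */
    int unlinked;          /* removal happened while references were held */
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct item *slot;  /* a one-slot "tree" keeps the sketch small */

static void insert_item(int key)
{
    struct item *it = calloc(1, sizeof(*it));

    it->key = key;
    pthread_mutex_lock(&table_lock);
    slot = it;
    pthread_mutex_unlock(&table_lock);
}

/* Lookup: take a reference BEFORE dropping the lock, so a concurrent
 * remover cannot free the element out from under the caller. */
static struct item *lookup_get(int key)
{
    struct item *it = NULL;

    pthread_mutex_lock(&table_lock);
    if (slot && slot->key == key) {
        it = slot;
        it->refcnt++;
    }
    pthread_mutex_unlock(&table_lock);

    return it;
}

/* Drop a reference; free a removed element once the last user is done. */
static void item_put(struct item *it)
{
    int do_free;

    pthread_mutex_lock(&table_lock);
    do_free = (--it->refcnt == 0) && it->unlinked;
    pthread_mutex_unlock(&table_lock);
    if (do_free)
        free(it);
}

/* Removal: unlink under the lock; defer the free while readers hold refs. */
static void remove_item(void)
{
    struct item *it;

    pthread_mutex_lock(&table_lock);
    it = slot;
    slot = NULL;
    if (it && it->refcnt > 0) {
        it->unlinked = 1;
        it = NULL;             /* the last item_put() will free it */
    }
    pthread_mutex_unlock(&table_lock);
    free(it);                  /* free(NULL) is a no-op */
}
```

The key property is the one Julien points out is missing from the patch: the element stays valid after the lookup's critical section ends, because removal only marks it and the actual free is tied to the reference count, not to the lock.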
[linux-5.4 test] 168084: trouble: blocked/broken/fail/pass
flight 168084 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168084/

Failures and problems with tests :-(

Tests which did not succeed and are blocking, including tests which could not be run:

 build-arm64                   broken
 build-arm64-pvops             broken
 build-arm64-xsm               broken
 build-arm64-pvops             4 host-install(4)       broken  REGR. vs. 168060
 build-arm64-xsm               4 host-install(4)       broken  REGR. vs. 168060
 build-arm64                   4 host-install(4)       broken  REGR. vs. 168060

Tests which did not succeed, but are not blocking:

 test-arm64-arm64-examine      1 build-check(1)        blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)        blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)        blocked  n/a
 test-arm64-arm64-xl           1 build-check(1)        blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)        blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)        blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)        blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)        blocked  n/a
 test-arm64-arm64-xl-vhd       1 build-check(1)        blocked  n/a
 test-arm64-arm64-xl-xsm       1 build-check(1)        blocked  n/a
 build-arm64-libvirt           1 build-check(1)        blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop   fail like 168060
 test-amd64-i386-xl-qemut-win7-amd64   19 guest-stop   fail like 168060
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop   fail like 168060
 test-armhf-armhf-libvirt      16 saverestore-support-check  fail like 168060
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2  fail like 168060
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop   fail like 168060
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop   fail like 168060
 test-amd64-i386-xl-qemuu-win7-amd64   19 guest-stop   fail like 168060
 test-amd64-i386-xl-qemut-ws16-amd64   19 guest-stop   fail like 168060
 test-armhf-armhf-libvirt-qcow2  15 saverestore-support-check  fail like 168060
 test-armhf-armhf-libvirt-raw  15 saverestore-support-check  fail like 168060
 test-amd64-i386-xl-qemuu-ws16-amd64   19 guest-stop   fail like 168060
 test-amd64-i386-xl-pvshim     14 guest-start           fail never pass
 test-amd64-amd64-libvirt-xsm  15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt      15 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-xsm   15 migrate-support-check  fail never pass
 test-amd64-i386-libvirt       15 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale   15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale   16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-raw   14 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd  14 migrate-support-check  fail never pass
 test-armhf-armhf-xl           15 migrate-support-check  fail never pass
 test-armhf-armhf-xl           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1   15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit1   16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt      15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-rtds      15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-rtds      16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2   15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit2   16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-qcow2  14 migrate-support-check  fail never pass
 test-armhf-armhf-xl-vhd       14 migrate-support-check  fail never pass
 test-armhf-armhf-xl-vhd       15 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-raw  14 migrate-support-check  fail never pass

version targeted for testing:
 linux                52871671099d1bb3fca5ed076029e4b937bfc053
baseline version:
 linux                76fd334f07cc11e047c2237a19b2cf8b1f653ba2

Last test of basis   168060  2022-02-08 17:43:16 Z    2 days
Testing same since   168084
[RFC PATCH] arm/vgic-v3: provide custom callbacks for pend_lpi_tree radix tree
pend_lpi_tree is a radix tree used to store pending irqs; the tree is
protected by a lock for read/write operations.
Currently the radix tree's default function to free items uses the RCU
mechanism, calling call_rcu and deferring the operation. However, every
access to the structure is protected by the lock, so we can avoid using
the default free function which, by using RCU, increases memory usage
and impacts the predictability of the system.

This commit provides custom callbacks to alloc/free items of the radix
tree; the free function doesn't use the RCU mechanism.

Signed-off-by: Luca Fancellu
---
 xen/arch/arm/vgic-v3.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 65bb7991a69b..970747a72012 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1650,6 +1650,18 @@ static inline unsigned int vgic_v3_max_rdist_count(struct domain *d)
            GUEST_GICV3_RDIST_REGIONS;
 }
 
+static struct radix_tree_node *vgic_v3_radix_tree_node_alloc(void *arg)
+{
+    struct radix_tree_node *node = xmalloc(struct radix_tree_node);
+
+    return node ? node : NULL;
+}
+
+static void vgic_v3_radix_tree_node_free(struct radix_tree_node *elem, void *arg)
+{
+    xfree(elem);
+}
+
 static int vgic_v3_domain_init(struct domain *d)
 {
     struct vgic_rdist_region *rdist_regions;
@@ -1668,6 +1680,14 @@ static int vgic_v3_domain_init(struct domain *d)
 
     rwlock_init(&d->arch.vgic.pend_lpi_tree_lock);
     radix_tree_init(&d->arch.vgic.pend_lpi_tree);
 
+    /*
+     * pend_lpi_tree is protected by a rwlock, so don't use the lockless RCU
+     * default management for it and provide callbacks to alloc/free elements.
+     */
+    radix_tree_set_alloc_callbacks(&d->arch.vgic.pend_lpi_tree,
+                                   vgic_v3_radix_tree_node_alloc,
+                                   vgic_v3_radix_tree_node_free, NULL);
+
     /*
      * Domain 0 gets the hardware address.
      * Guests get the virtual platform layout.
--
2.17.1
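The mechanism the patch relies on, letting a container's user substitute its own node allocator and deallocator, is the classic function-pointer hook pattern. A reduced sketch of the idea follows in generic C; this mirrors `radix_tree_set_alloc_callbacks()` in spirit only and is not Xen's actual radix-tree interface:

```c
#include <assert.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

/* A container that delegates node alloc/free to caller-supplied hooks. */
struct list {
    struct node *head;
    struct node *(*node_alloc)(void *arg);
    void (*node_free)(struct node *n, void *arg);
    void *cb_arg;
};

static struct node *default_alloc(void *arg) { (void)arg; return malloc(sizeof(struct node)); }
static void default_free(struct node *n, void *arg) { (void)arg; free(n); }

static void list_init(struct list *l)
{
    l->head = NULL;
    l->node_alloc = default_alloc;   /* analogous to the RCU-based default */
    l->node_free = default_free;
    l->cb_arg = NULL;
}

/* The analogue of radix_tree_set_alloc_callbacks(): swap in custom hooks. */
static void list_set_callbacks(struct list *l,
                               struct node *(*a)(void *),
                               void (*f)(struct node *, void *), void *arg)
{
    l->node_alloc = a;
    l->node_free = f;
    l->cb_arg = arg;
}

static int list_push(struct list *l, int v)
{
    struct node *n = l->node_alloc(l->cb_arg);

    if (!n)
        return -1;
    n->value = v;
    n->next = l->head;
    l->head = n;
    return 0;
}

static int list_pop(struct list *l)
{
    struct node *n = l->head;
    int v = n->value;

    l->head = n->next;
    l->node_free(n, l->cb_arg);   /* immediate free, no deferral */
    return v;
}

/* Custom hooks that count operations, e.g. to verify frees are immediate. */
static int allocs, frees;
static struct node *counting_alloc(void *arg) { (void)arg; allocs++; return malloc(sizeof(struct node)); }
static void counting_free(struct node *n, void *arg) { (void)arg; frees++; free(n); }
```

The point of the pattern, as in the patch, is that the container's structure management stays generic while the caller decides the allocation policy (here: immediate free instead of a deferred RCU free).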
Re: [RFC v2 1/8] xen/hypfs: support for nested dynamic hypfs nodes
On Fri, Feb 11, 2022 at 02:28:49PM +0100, Juergen Gross wrote:
> On 11.02.22 09:16, Oleksii Moisieiev wrote:
>> Hi Juergen,
>>
>> On Thu, Feb 10, 2022 at 08:34:08AM +0100, Juergen Gross wrote:
>>> On 08.02.22 19:00, Oleksii Moisieiev wrote:
>>>> Add new api:
>>>> - hypfs_read_dyndir_entry
>>>> - hypfs_gen_dyndir_entry
>>>> which are the extension of the dynamic hypfs nodes support, presented
>>>> in 0b3b53be8cf226d947a79c2535a9efbb2dd7bc38.
>>>> This allows nested dynamic nodes to be added. Also the input parameter
>>>> is a hypfs_entry, so properties can also be generated dynamically.
>>>>
>>>> Generating a mixed list of dirs and properties is also supported.
>>>> As with the dynamic hypfs nodes, this is anchored in a percpu pointer,
>>>> which can be retrieved on any level of the dynamic entries.
>>>> This handle should be allocated on the enter() callback and released
>>>> on the exit() callback. When using nested dynamic dirs and properties
>>>> the handle should be allocated on the first enter() call and released
>>>> on the last exit() call.
>>>>
>>>> Signed-off-by: Oleksii Moisieiev

...

>>>> diff --git a/xen/include/xen/hypfs.h b/xen/include/xen/hypfs.h
>>>> index e9d4c2555b..5d2728b963 100644
>>>> --- a/xen/include/xen/hypfs.h
>>>> +++ b/xen/include/xen/hypfs.h
>>>> @@ -79,8 +79,8 @@ struct hypfs_entry_dir {
>>>>  struct hypfs_dyndir_id {
>>> Please rename to struct hypfs_dyndir.
>> Ok, thanks.
>>
>>>>      struct hypfs_entry_dir dir;             /* Modified copy of template. */
>>>>      struct hypfs_funcs funcs;               /* Dynamic functions. */
>>>> -    const struct hypfs_entry_dir *template; /* Template used. */
>>>> -#define HYPFS_DYNDIR_ID_NAMELEN 12
>>>> +    const struct hypfs_entry *template;     /* Template used. */
>>>> +#define HYPFS_DYNDIR_ID_NAMELEN 32
>>>>      char name[HYPFS_DYNDIR_ID_NAMELEN];     /* Name of hypfs entry. */
>>>>      unsigned int id;                        /* Numerical id. */
>>> What about the following change instead:
>>>
>>> -    const struct hypfs_entry_dir *template; /* Template used. */
>>> -#define HYPFS_DYNDIR_ID_NAMELEN 12
>>> -    char name[HYPFS_DYNDIR_ID_NAMELEN];     /* Name of hypfs entry. */
>>> -
>>> -    unsigned int id;                        /* Numerical id. */
>>> -    void *data;                             /* Data associated with id. */
>>> +    const struct hypfs_entry *template;     /* Template used. */
>>> +    union {
>>> +#define HYPFS_DYNDIR_NAMELEN 32
>>> +        char name[HYPFS_DYNDIR_NAMELEN];    /* Name of hypfs entry. */
>>> +        struct {
>>> +#define HYPFS_DYNDIR_ID_NAMELEN 12
>>> +            char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of id entry. */
>>> +            unsigned int id;                /* Numerical id. */
>>> +        } id;
>>> +    };
>>> +    void *data;                             /* Data associated with entry. */
>> I'm not sure I see the benefit from this union. The only one I see is
>> that struct hypfs_dyndir will be smaller by sizeof(unsigned int).
>> Could you explain please?
> My main concern is that it is not obvious to a user that the
> numerical id is needed only for a special case. Putting it into
> the union makes this much more clear.

This makes sense. I'll make this union. Thanks.

>> Also what do you think about the following change:
>> -    char name[HYPFS_DYNDIR_ID_NAMELEN];     /* Name of hypfs entry. */
>> -
>> -    unsigned int id;                        /* Numerical id. */
>> -    void *data;                             /* Data associated with id. */
>> +    char name[HYPFS_DYNDIR_ID_NAMELEN];     /* Name of hypfs entry. */
>> +
>> +    unsigned int id;                        /* Numerical id. */
>> +    union {
>> +        const void *content;
>> +        void *data;                         /* Data associated with id. */
>> +    };
>> This change is similar to the hypfs_entry_leaf. In this case we can
>> store a const pointer for read-only entries and use data when write
>> access is needed?
> Sure, if you need that.

Thanks, I will do this as well.

Best regards,
Oleksii

>> PS: I will address all your comments in v3.
> Thanks,
>
> Juergen
Re: [RFC v2 1/8] xen/hypfs: support for nested dynamic hypfs nodes
On 11.02.22 09:16, Oleksii Moisieiev wrote:
> Hi Juergen,
>
> On Thu, Feb 10, 2022 at 08:34:08AM +0100, Juergen Gross wrote:
>> On 08.02.22 19:00, Oleksii Moisieiev wrote:
>>> Add new api:
>>> - hypfs_read_dyndir_entry
>>> - hypfs_gen_dyndir_entry
>>> which are the extension of the dynamic hypfs nodes support, presented
>>> in 0b3b53be8cf226d947a79c2535a9efbb2dd7bc38.
>>> This allows nested dynamic nodes to be added. Also the input parameter
>>> is a hypfs_entry, so properties can also be generated dynamically.
>>>
>>> Generating a mixed list of dirs and properties is also supported.
>>> As with the dynamic hypfs nodes, this is anchored in a percpu pointer,
>>> which can be retrieved on any level of the dynamic entries.
>>> This handle should be allocated on the enter() callback and released
>>> on the exit() callback. When using nested dynamic dirs and properties
>>> the handle should be allocated on the first enter() call and released
>>> on the last exit() call.
>>>
>>> Signed-off-by: Oleksii Moisieiev
>>>
>>> diff --git a/xen/include/xen/hypfs.h b/xen/include/xen/hypfs.h
>>> index e9d4c2555b..5d2728b963 100644
>>> --- a/xen/include/xen/hypfs.h
>>> +++ b/xen/include/xen/hypfs.h
>>> @@ -79,8 +79,8 @@ struct hypfs_entry_dir {
>>>  struct hypfs_dyndir_id {
>> Please rename to struct hypfs_dyndir.
> Ok, thanks.
>
>>>      struct hypfs_entry_dir dir;             /* Modified copy of template. */
>>>      struct hypfs_funcs funcs;               /* Dynamic functions. */
>>> -    const struct hypfs_entry_dir *template; /* Template used. */
>>> -#define HYPFS_DYNDIR_ID_NAMELEN 12
>>> +    const struct hypfs_entry *template;     /* Template used. */
>>> +#define HYPFS_DYNDIR_ID_NAMELEN 32
>>>      char name[HYPFS_DYNDIR_ID_NAMELEN];     /* Name of hypfs entry. */
>>>      unsigned int id;                        /* Numerical id. */
>> What about the following change instead:
>>
>> -    const struct hypfs_entry_dir *template; /* Template used. */
>> -#define HYPFS_DYNDIR_ID_NAMELEN 12
>> -    char name[HYPFS_DYNDIR_ID_NAMELEN];     /* Name of hypfs entry. */
>> -
>> -    unsigned int id;                        /* Numerical id. */
>> -    void *data;                             /* Data associated with id. */
>> +    const struct hypfs_entry *template;     /* Template used. */
>> +    union {
>> +#define HYPFS_DYNDIR_NAMELEN 32
>> +        char name[HYPFS_DYNDIR_NAMELEN];    /* Name of hypfs entry. */
>> +        struct {
>> +#define HYPFS_DYNDIR_ID_NAMELEN 12
>> +            char name[HYPFS_DYNDIR_ID_NAMELEN]; /* Name of id entry. */
>> +            unsigned int id;                /* Numerical id. */
>> +        } id;
>> +    };
>> +    void *data;                             /* Data associated with entry. */
> I'm not sure I see the benefit from this union. The only one I see is
> that struct hypfs_dyndir will be smaller by sizeof(unsigned int).
> Could you explain please?

My main concern is that it is not obvious to a user that the
numerical id is needed only for a special case. Putting it into
the union makes this much more clear.

> Also what do you think about the following change:
> -    char name[HYPFS_DYNDIR_ID_NAMELEN];     /* Name of hypfs entry. */
> -
> -    unsigned int id;                        /* Numerical id. */
> -    void *data;                             /* Data associated with id. */
> +    char name[HYPFS_DYNDIR_ID_NAMELEN];     /* Name of hypfs entry. */
> +
> +    unsigned int id;                        /* Numerical id. */
> +    union {
> +        const void *content;
> +        void *data;                         /* Data associated with id. */
> +    };
> This change is similar to the hypfs_entry_leaf. In this case we can
> store a const pointer for read-only entries and use data when write
> access is needed?

Sure, if you need that.

> PS: I will address all your comments in v3.

Thanks,

Juergen
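Juergen's union proposal can be checked mechanically: the long name and the (short name + id) pair overlap in storage, so the numerical id is only visible in the variant that needs it and the struct does not grow for the common case. Below is a stand-alone rendering of the proposed layout; the field sizes are copied from the diff, everything else (the `template` type, surrounding members) is simplified:

```c
#include <assert.h>
#include <string.h>

#define HYPFS_DYNDIR_NAMELEN    32
#define HYPFS_DYNDIR_ID_NAMELEN 12

/* Simplified version of the proposed struct hypfs_dyndir layout
 * (anonymous unions require C11). */
struct hypfs_dyndir {
    const void *template;                       /* stand-in for the real type */
    union {
        char name[HYPFS_DYNDIR_NAMELEN];        /* plain named entry */
        struct {
            char name[HYPFS_DYNDIR_ID_NAMELEN]; /* id-based entry */
            unsigned int id;
        } id;
    };
    void *data;
};
```

The union makes the special case self-documenting: code that fills an id-based entry writes `e.id.name` and `e.id.id`, while plain entries use the full 32-byte `e.name`, and both share the same bytes.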
Re: [PATCH] x86emul: fix VPBLENDMW with mask and memory operand
On 11/02/2022 11:09, Jan Beulich wrote:
> Element size for this opcode depends on EVEX.W, not the low opcode bit.
> Make use of AVX512BW being a prereq to AVX512_BITALG and move the case
> label there, adding an AVX512BW feature check.
>
> Signed-off-by: Jan Beulich

Reviewed-by: Andrew Cooper
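The fix encodes a simple decode rule: for the VPBLENDMB/VPBLENDMW opcode pair, element width is selected by EVEX.W rather than by bit 0 of the opcode byte, as it is for many other instruction pairs. A toy decoder fragment contrasting the two criteria (illustrative only, not the emulator's actual code):

```c
#include <assert.h>

/* Element size in bytes for the VPBLENDM{B,W} opcode (0x66 in the 0f38 map):
 * EVEX.W = 0 selects byte elements (VPBLENDMB),
 * EVEX.W = 1 selects word elements (VPBLENDMW). */
static unsigned int vpblendm_bw_elem_size(unsigned int evex_w)
{
    return evex_w ? 2 : 1;
}

/* The buggy criterion the patch removes: keying off opcode bit 0.
 * Opcode 0x66 has its low bit clear, so this would always claim
 * byte-sized elements, wrong whenever EVEX.W selects words. */
static unsigned int buggy_elem_size(unsigned int opcode)
{
    return (opcode & 1) ? 2 : 1;
}
```

For this opcode the two functions disagree exactly in the EVEX.W = 1 case, which is the VPBLENDMW-with-mask-and-memory-operand situation the subject line describes.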
Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On 11.02.22 13:51, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 08:46:59AM +, Oleksandr Andrushchenko wrote:
>> On 10.02.22 18:16, Roger Pau Monné wrote:
>>> On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko
>>>>
>>>> Introduce a per-domain read/write lock to check whether vpci is
>>>> present, so we are sure there are no accesses to the contents of the
>>>> vpci struct if not. This lock can be used (and in a few cases is used
>>>> right away) so that vpci removal can be performed while holding the
>>>> lock in write mode. Previously such removal could race with vpci_read
>>>> for example.
>>> Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
>>> pci_remove_device, and likely when vPCI gets also used in
>>> {de}assign_device I think.
>>
>> How about the below? It seems to guarantee that we can access pdev
>> without issues and without requiring pcidevs_lock to be used?
> Hm, I'm unsure this is correct.

Yes, we need pcidevs as a rwlock in order to solve this reliably...

> It's in general a bad idea to use a
> per-domain lock approach to protect the consistency of elements moving
> between domains.
>
> In order for this to be safe you will likely need to hold both the
> source and the destination per-domain locks, and then you could also
> get into ABBA lock issues unless you always take the lock in the same
> order.
>
> I think it's safer to use a global lock in this case (pcidevs_lock).
>
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index e8b09d77d880..fd464a58b3b3 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -937,8 +937,14 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>      }
>>
>>      devfn = pdev->devfn;
>> +#ifdef CONFIG_HAS_VPCI
>> +    write_lock(&d->vpci_rwlock);
>> +#endif
>>      ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
>>                       pci_to_dev(pdev));
>> +#ifdef CONFIG_HAS_VPCI
>> +    write_unlock(&d->vpci_rwlock);
>> +#endif
>>      if ( ret )
>>          goto out;
>>
>> @@ -1474,6 +1480,9 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>      const struct domain_iommu *hd = dom_iommu(d);
>>      struct pci_dev *pdev;
>>      int rc = 0;
>> +#ifdef CONFIG_HAS_VPCI
>> +    struct domain *old_d;
>> +#endif
>>
>>      if ( !is_iommu_enabled(d) )
>>          return 0;
>> @@ -1487,15 +1496,34 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>      ASSERT(pdev && (pdev->domain == hardware_domain ||
>>              pdev->domain == dom_io));
>>
>> +#ifdef CONFIG_HAS_VPCI
>> +    /* pdev->domain is either hwdom or dom_io. We do not want the latter. */
>> +    old_d = pdev->domain == hardware_domain ? pdev->domain : NULL;
>> +    if ( old_d )
>> +        write_lock(&old_d->vpci_rwlock);
>> +#endif
>> +
>>      rc = pdev_msix_assign(d, pdev);
> I don't think you need the vpci lock for this operation.
>
>>      if ( rc )
>> +    {
>> +#ifdef CONFIG_HAS_VPCI
>> +        if ( old_d )
>> +            write_unlock(&old_d->vpci_rwlock);
>> +#endif
>>          goto done;
>> +    }
>>
>>      pdev->fault.count = 0;
>>
>>      if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
>>                            pci_to_dev(pdev), flag)) )
>> +    {
>> +#ifdef CONFIG_HAS_VPCI
>> +        if ( old_d )
>> +            write_unlock(&old_d->vpci_rwlock);
>> +#endif
> Like I've mentioned above, I'm unsure this is correct. You are holding
> the lock of the previous domain, but at some point the device will be
> assigned to a new domain, so that change won't be protected by the
> lock of the new domain.
>
> Thanks, Roger.
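The ABBA hazard Roger describes (source and destination per-domain locks taken in whichever order a caller happens to use) has a standard mitigation when a global lock is undesirable: always acquire the two locks in a fixed total order, for example by address. A generic sketch, with pthread mutexes standing in for the per-domain rwlocks; this is not Xen code, and the thread's own conclusion is that a global pcidevs lock is the safer option:

```c
#include <assert.h>
#include <pthread.h>

struct domain {
    pthread_mutex_t vpci_lock;
    int ndevs;
};

/* Lock two domains in address order, so any pair of concurrent
 * reassignments agrees on the order and cannot deadlock ABBA-style. */
static void lock_both(struct domain *a, struct domain *b)
{
    if (a > b) { struct domain *t = a; a = b; b = t; }
    pthread_mutex_lock(&a->vpci_lock);
    if (a != b)
        pthread_mutex_lock(&b->vpci_lock);
}

static void unlock_both(struct domain *a, struct domain *b)
{
    pthread_mutex_unlock(&a->vpci_lock);
    if (a != b)
        pthread_mutex_unlock(&b->vpci_lock);
}

/* Move one device between domains with both locks held, so neither
 * domain's vPCI state is observed mid-transition. */
static void reassign_device(struct domain *src, struct domain *dst)
{
    lock_both(src, dst);
    src->ndevs--;
    dst->ndevs++;
    unlock_both(src, dst);
}
```

Note that two threads calling `reassign_device(&d0, &d1)` and `reassign_device(&d1, &d0)` still take the mutexes in the same internal order, which is exactly what the raw "lock source, then destination" pattern fails to guarantee.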
Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On 11.02.22 13:40, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 07:27:39AM +, Oleksandr Andrushchenko wrote:
>> Hi, Roger!
>>
>> On 10.02.22 18:16, Roger Pau Monné wrote:
>>> On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko
>>>>
>>>> Introduce a per-domain read/write lock to check whether vpci is
>>>> present, so we are sure there are no accesses to the contents of the
>>>> vpci struct if not. This lock can be used (and in a few cases is used
>>>> right away) so that vpci removal can be performed while holding the
>>>> lock in write mode. Previously such removal could race with vpci_read
>>>> for example.
>>> Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
>>> pci_remove_device, and likely when vPCI gets also used in
>>> {de}assign_device I think.
>> Yes, this is indeed an issue, but I was not trying to solve it in
>> the context of vPCI locking yet. I think we should discuss how we
>> approach pdev locking, so I can create a patch for that.
>> That being said, I would like not to solve pdev in this patch yet.
>>
>> ...I do understand we do want to avoid that, but at the moment
>> a single reliable way for making sure pdev is alive seems to
>> be pcidevs_lock
> I think we will need to make pcidevs_lock a rwlock and take it in read
> mode for pci_get_pdev_by_domain.
>
> We didn't have this scenario before where PCI emulation is done in the
> hypervisor, and hence the locking around those data structures has not
> been designed for those use-cases.

Yes, I do understand that. I hope the pcidevs lock move to rwlock can be
done as a separate patch. While this is not done, do you think we can
proceed with the vPCI series, with the pcidevs locking re-work being
done after?

1. Per-domain's vpci_rwlock is used to protect the pdev->vpci structure
from being removed.

2. Writing the command register and ROM BAR register may trigger
modify_bars to run, which in turn may access multiple pdevs while
checking for the existing BARs' overlap.
The overlapping check, if done under the read lock, requires vpci->lock
to be acquired on both devices being compared, which may produce a
deadlock. It is not possible to upgrade a read lock to a write lock in
such a case. So, in order to prevent the deadlock, check which registers
are going to be written and acquire the lock in the appropriate mode
from the beginning.

All other code, which doesn't lead to pdev->vpci destruction and does
not access multiple pdevs at the same time, can still use a combination
of the read lock and pdev->vpci->lock.

3. Optimize the detection of whether a ROM BAR write requires the write
lock by caching the offset of the ROM BAR register in
vpci->header->rom_reg, which depends on the header's type.

4. Reduce the locked region in vpci_remove_device, as it is now possible
to set pdev->vpci to NULL early, right after the write lock is acquired.

5. Reduce the locked region in vpci_add_handlers, as it is possible to
initialize many more fields of the struct vpci before assigning it to
pdev->vpci.

6. vpci_{add|remove}_register are required to be called with the write
lock held, but it is not feasible to add an assert there as it requires
struct domain to be passed for that. So, add a comment about this
requirement to these and other functions with the equivalent
constraints.

7. Drop the const qualifier where the new rwlock is used and this is
appropriate.

8. This is based on the discussion at [1].

[1] https://lore.kernel.org/all/20220204063459.680961-4-andr2...@gmail.com/

Suggested-by: Roger Pau Monné
Suggested-by: Jan Beulich
Signed-off-by: Oleksandr Andrushchenko
---
This was checked on x86: with and without PVH Dom0.
---
 xen/arch/x86/hvm/vmsi.c   |   2 +
 xen/common/domain.c       |   3 +
 xen/drivers/vpci/header.c |   8 +++
 xen/drivers/vpci/msi.c    |   8 ++-
 xen/drivers/vpci/msix.c   |  40 +++--
 xen/drivers/vpci/vpci.c   | 114 --
 xen/include/xen/sched.h   |   3 +
 xen/include/xen/vpci.h    |   2 +
 8 files changed, 146 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b439..351cb968a423 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -893,6 +893,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
 {
     unsigned int i;

+    ASSERT(!!rw_is_locked(&msix->pdev->domain->vpci_rwlock));

>>> ^ no need for the double negation.
>> Ok, will update all asserts which use !!
>>> Also this
Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data
On Fri, Feb 11, 2022 at 12:42:11PM +0100, Jan Beulich wrote: > On 11.02.2022 12:19, Roger Pau Monné wrote: > > On Fri, Feb 11, 2022 at 11:50:46AM +0100, Jan Beulich wrote: > >> On 11.02.2022 11:47, Roger Pau Monné wrote: > >>> On Fri, Feb 11, 2022 at 11:32:45AM +0100, Jan Beulich wrote: > On 11.02.2022 10:02, Roger Pau Monné wrote: > > On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote: > >> When re-identifying CPU data, we might use uninitialized data when > >> checking for the cache line property to adapt the cache > >> alignment. The data that depends on this uninitialized read is > >> currently not forwarded. > >> > >> To avoid problems in the future, initialize the data cpuinfo > >> structure before re-identifying the CPU again. > >> > >> The trace to hit the uninitialized read reported by Coverity is: > >> > >> bool recheck_cpu_features(unsigned int cpu) > >> ... > >> struct cpuinfo_x86 c; > >> ... > >> identify_cpu(); > >> > >> void identify_cpu(struct cpuinfo_x86 *c) > >> ... > >> generic_identify(c) > >> > >> static void generic_identify(struct cpuinfo_x86 *c) > >> ... > > > > Would it be more appropriate for generic_identify to also set > > x86_cache_alignment like it's done in early_cpu_init? > > > > generic_identify already re-fetches a bunch of stuff that's also > > set by early_cpu_init for the BSP. > > This would be an option, but how sure are you that there isn't > (going to be) another field with similar properties? We better > wouldn't require _everything_ to be re-filled in generic_identify(). > >>> > >>> So you think generic_identify should call into early_cpu_init, or even > >>> split the cpuinfo_x86 filling done in early_cpu_init into a non-init > >>> function that could be called by both generic_identify and > >>> early_cpu_init? > >> > >> No, I think it is quite fine for this to be a two-step process. > > > > But it's not a two step process for all CPUs. 
It's a two-step process
> > for the BSP, which will get its cpuinfo filled by early_cpu_init
> > first, and then by identify_cpu. OTOH, APs will only get the
> > information filled by identify_cpu.
> >
> > Maybe APs don't care about having x86_cache_alignment correctly set?
>
> They do care, and the field also isn't left uninitialized. See
> initialize_cpu_data(). Like in many other places we assume full
> symmetry among processors here.

Thanks, I did miss that part. Then I think it's fine to initialize the
struct in recheck_cpu_features to zero. That's likely to make it more
obvious if we somehow miss updating certain featuresets (as they would
be empty instead of inheriting from boot CPU data).

Thanks, Roger.
Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data
On 2/11/22 11:34, Jan Beulich wrote:
> On 11.02.2022 08:23, Norbert Manthey wrote:
>> --- a/xen/arch/x86/cpuid.c
>> +++ b/xen/arch/x86/cpuid.c
>> @@ -609,7 +609,7 @@ void __init init_guest_cpuid(void)
>>  bool recheck_cpu_features(unsigned int cpu)
>>  {
>>      bool okay = true;
>> -    struct cpuinfo_x86 c;
>> +    struct cpuinfo_x86 c = boot_cpu_data;
>>      const struct cpuinfo_x86 *bsp = &boot_cpu_data;
>>      unsigned int i;
> While I agree with the need to initialize the local variable, I
> don't think it should be pre-seeded with previous identification
> results: this could end up hiding bugs. Instead I'd see it simply
> be zero-filled.

That works for me as well, I'll send a rev-2 accordingly.

Norbert
[xen-unstable test] 168081: trouble: broken/fail/pass
flight 168081 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168081/

Failures and problems with tests :-(

Tests which did not succeed and are blocking, including tests which could not be run:

 test-arm64-arm64-xl-credit1   broken
 test-arm64-arm64-xl-credit2   broken
 test-arm64-arm64-xl-seattle   broken
 test-arm64-arm64-xl-xsm       broken

Tests which are failing intermittently (not blocking):

 test-arm64-arm64-xl-seattle   5 host-install(5)  broken  pass in 168075
 test-arm64-arm64-xl-credit2   5 host-install(5)  broken  pass in 168075
 test-arm64-arm64-xl-xsm       5 host-install(5)  broken  pass in 168075
 test-arm64-arm64-xl-credit1   5 host-install(5)  broken  pass in 168075
 test-armhf-armhf-xl-rtds      18 guest-start/debian.repeat  fail in 168075 pass in 168081
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  12 debian-hvm-install  fail in 168075 pass in 168081

Tests which did not succeed, but are not blocking:

 test-arm64-arm64-xl-seattle   15 migrate-support-check      fail in 168075 never pass
 test-arm64-arm64-xl-seattle   16 saverestore-support-check  fail in 168075 never pass
 test-arm64-arm64-xl-xsm       15 migrate-support-check      fail in 168075 never pass
 test-arm64-arm64-xl-credit2   15 migrate-support-check      fail in 168075 never pass
 test-arm64-arm64-xl-credit1   15 migrate-support-check      fail in 168075 never pass
 test-arm64-arm64-xl-credit2   16 saverestore-support-check  fail in 168075 never pass
 test-arm64-arm64-xl-xsm       16 saverestore-support-check  fail in 168075 never pass
 test-arm64-arm64-xl-credit1   16 saverestore-support-check  fail in 168075 never pass
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop   fail like 168075
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2  fail like 168075
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop   fail like 168075
 test-amd64-i386-xl-qemut-ws16-amd64   19 guest-stop   fail like 168075
 test-amd64-i386-xl-qemut-win7-amd64   19 guest-stop   fail like 168075
 test-armhf-armhf-libvirt      16 saverestore-support-check  fail like 168075
 test-armhf-armhf-libvirt-raw  15 saverestore-support-check  fail like 168075
 test-amd64-i386-xl-qemuu-win7-amd64   19 guest-stop   fail like 168075
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop   fail like 168075
 test-armhf-armhf-libvirt-qcow2  15 saverestore-support-check  fail like 168075
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop   fail like 168075
 test-amd64-i386-xl-qemuu-ws16-amd64   19 guest-stop   fail like 168075
 test-amd64-amd64-libvirt      15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-xsm  15 migrate-support-check  fail never pass
 test-amd64-i386-libvirt       15 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-xsm   15 migrate-support-check  fail never pass
 test-amd64-i386-xl-pvshim     14 guest-start            fail never pass
 test-arm64-arm64-xl           15 migrate-support-check  fail never pass
 test-arm64-arm64-xl           16 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm  15 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-thunderx  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-thunderx  16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   13 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-raw   14 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd  14 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-raw  14 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-raw  15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-rtds      15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-rtds      16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2   15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit2   16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt      15 migrate-support-check  fail never pass
 test-armhf-armhf-xl           15 migrate-support-check  fail never pass
 test-armhf-armhf-xl           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1   15 migrate-support-check
Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver
On Fri, Feb 11, 2022 at 11:18:47AM +, Bertrand Marquis wrote: > Hi Oleksii, > > > > On 11 Feb 2022, at 10:44, Oleksii Moisieiev > > wrote: > > > > Hi Bertrand, > > > > On Fri, Feb 11, 2022 at 08:46:05AM +, Bertrand Marquis wrote: > >> Hi Oleksii, > >> > >> > >>> On 8 Feb 2022, at 18:00, Oleksii Moisieiev > >>> wrote: > >>> > >>> This is the implementation of SCI interface, called SCMI-SMC driver, > >>> which works as the mediator between XEN Domains and Firmware (SCP, ATF > >>> etc). > >>> This allows devices from the Domains to work with clocks, resets and > >>> power-domains without access to CPG. > >>> > >>> Originally, cpg should be passed to the domain so it can work with > >>> power-domains/clocks/resets etc. Considering that cpg can't be split > >>> between > >>> the Domains, we get the limitation that the devices, which are using > >>> power-domains/clocks/resets etc, couldn't be split between the domains. > >>> The solution is to move the power-domain/clock/resets etc to the > >>> Firmware (such as SCP firmware or ATF) and provide interface for the > >>> Domains. XEN should have an entity, called SCI-Mediator, which is > >>> responsible for messages redirection between Domains and Firmware and > >>> for permission handling. > >>> > >>> The following features are implemented: > >>> - request SCMI channels from ATF and pass channels to Domains; > >>> - set device permissions for Domains based on the Domain partial > >>> device-tree. Devices with permissions are able to work with clocks, > >>> resets and power-domains via SCMI; > >>> - redirect scmi messages from Domains to ATF. > >> > >> Before going more deeply in the code I would like to discuss the general > >> design here and ask some questions to prevent to rework the code before > >> we all agree that this is the right solution and that we want this in Xen. 
> >> > >> First I want to point out that clock/reset/power virtualization is a > >> problem > >> on most applications using device pass-through and I am very glad that > >> someone is looking into it. > >> Also SCMI is the current standard existing for this so relying on it is a > >> very > >> good idea. > >> > >> Latest version SCMI standard (DEN0056D v3.1) is defining some means > >> to use SCMI on a virtualised system. In chapter 4.2.1, the standard > >> recommends to set permissions per agent in the hypervisor so that a VM > >> could later use the discovery protocol to detect the resources and use > >> them. > >> Using this kind of scenario the mediator in Xen would just configure the > >> Permissions in the SCMI and would then rely on it to limit what is possible > >> by who just by just assigning a channel to a VM. > > > >> > >> In your current design (please correct me if I am wrong) you seem to fully > >> rely on Xen and the FDT for discovery and permission. > > > > In current implementation Xen is the trusted agent. And it's responsible > > for permissions setting. During initialization it discovers agent and > > set permissions by using BASE_SET_DEVICE_PERMISSIONS to the Dom0. When > > new domain is created, Xen assigns agent id for this domain and request > > resources, that are passed-through to this Domain. > > Ok > > > > > I'm getting the following information from FDT: > > 1) Shared memory addresses, which should be used for agents. During > > initialization I send BASE_DISCOVER_AGENT to each of these addresses and > > receive agent_id. Xen is responsible for assigning agent_id for the > > Domain. Then Xen intercept smc calls from the domain, set agent_id and > > redirects it to the Firmware. > > So Xen is setting the agent ID, no way for a guest to get access to something > it > should with more check, am I right ? > Yes. Xen is the only entity, which is trusted. So it's responsible for setting permissions and assigning agent_id. 
Guest gets access only to the devices it's allowed to. > > > > 2) Devices, that are using SCMI. Those devices have clock/power/resets > > etc related to scmi protocol (as it is done in Linux kernel) > > and scmi_devid should be set. I'm currently preparing to send patch, > > updating kernel bindings with this parameter to Linux kernel. > > scmi_devid value should match device id, set in the Firmware. > > dt example: > > { > > scmi_devid = <1>; // usb0 device id > > clocks = <&scmi_clock 1> // relies on clock with id 1 > > } > > > > Xen requests permission for the device when device is attached to the > > Domain during creation. > > Without this, how is (if it is) the linux kernel using SCMI for power > management ? Here is how it should be described in FDT: / { firmware { scmi { arm,smc-id = <0x8202>; scmi_power: protocol@11 { reg = <0x11>; #power-domain-cells = <1>; }; ... scmi_clock: protocol@14 { ... scmi_reset: protocol@16 { ... }; }; }; {
Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On Fri, Feb 11, 2022 at 08:46:59AM +, Oleksandr Andrushchenko wrote: > > > On 10.02.22 18:16, Roger Pau Monné wrote: > > On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote: > >> From: Oleksandr Andrushchenko > >> > >> Introduce a per-domain read/write lock to check whether vpci is present, > >> so we are sure there are no accesses to the contents of the vpci struct > >> if not. This lock can be used (and in a few cases is used right away) > >> so that vpci removal can be performed while holding the lock in write > >> mode. Previously such removal could race with vpci_read for example. > > Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt > > pci_remove_device, and likely when vPCI gets also used in > > {de}assign_device I think. > > > How about the below? It seems to guarantee that we can access pdev > without issues and without requiring pcidevs_lock to be used? Hm, I'm unsure this is correct. It's in general a bad idea to use a per-domain lock approach to protect the consistency of elements moving between domains. In order for this to be safe you will likely need to hold both the source and the destination per-domain locks, and then you could also get into ABBA lock issues unless you always take the lock in the same order. I think it's safer to use a global lock in this case (pcidevs_lock). 
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > index e8b09d77d880..fd464a58b3b3 100644 > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -937,8 +937,14 @@ static int deassign_device(struct domain *d, uint16_t > seg, uint8_t bus, > } > > devfn = pdev->devfn; > +#ifdef CONFIG_HAS_VPCI > + write_lock(&d->vpci_rwlock); > +#endif > ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn, > pci_to_dev(pdev)); > +#ifdef CONFIG_HAS_VPCI > + write_unlock(&d->vpci_rwlock); > +#endif > if ( ret ) > goto out; > > @@ -1474,6 +1480,9 @@ static int assign_device(struct domain *d, u16 seg, u8 > bus, u8 devfn, u32 flag) > const struct domain_iommu *hd = dom_iommu(d); > struct pci_dev *pdev; > int rc = 0; > +#ifdef CONFIG_HAS_VPCI > + struct domain *old_d; > +#endif > > if ( !is_iommu_enabled(d) ) > return 0; > @@ -1487,15 +1496,34 @@ static int assign_device(struct domain *d, u16 seg, > u8 bus, u8 devfn, u32 flag) > ASSERT(pdev && (pdev->domain == hardware_domain || > pdev->domain == dom_io)); > > +#ifdef CONFIG_HAS_VPCI > + /* pdev->domain is either hwdom or dom_io. We do not want the latter. */ > + old_d = pdev->domain == hardware_domain ? pdev->domain : NULL; > + if ( old_d ) > + write_lock(&old_d->vpci_rwlock); > +#endif > + > rc = pdev_msix_assign(d, pdev); I don't think you need the vpci lock for this operation. > if ( rc ) > + { > +#ifdef CONFIG_HAS_VPCI > + if ( old_d ) > + write_unlock(&old_d->vpci_rwlock); > +#endif > goto done; > + } > > pdev->fault.count = 0; > > if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn, > pci_to_dev(pdev), flag)) ) > + { > +#ifdef CONFIG_HAS_VPCI > + if ( old_d ) > + write_unlock(&old_d->vpci_rwlock); > +#endif Like I've mentioned above, I'm unsure this is correct. You are holding the lock of the previous domain, but at some point the device will be assigned to a new domain, so that change won't be protected by the lock of the new domain. Thanks, Roger.
Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86
On 11.02.2022 12:29, Roger Pau Monné wrote: > On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote: >> On 10/02/2022 10:03, Roger Pau Monné wrote: >>> On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote: diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 7ab15e07a0..4060aef1bd 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp) MSR_IA32_VMX_PROCBASED_CTLS2, ); } +/* Check whether hardware supports accelerated xapic and x2apic. */ +if ( bsp ) +{ +assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses; +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt || + cpu_has_vmx_virtual_intr_delivery) && +cpu_has_vmx_virtualize_x2apic_mode; >>> >>> I've been think about this, and it seems kind of asymmetric that for >>> xAPIC mode we report hw assisted support only with >>> virtualize_apic_accesses available, while for x2APIC we require >>> virtualize_x2apic_mode plus either apic_reg_virt or >>> virtual_intr_delivery. >>> >>> I think we likely need to be more consistent here, and report hw >>> assisted x2APIC support as long as virtualize_x2apic_mode is >>> available. >>> >>> This will likely have some effect on patch 2 also, as you will have to >>> adjust vmx_vlapic_msr_changed. >>> >>> Thanks, Roger. >> >> Any other thoughts on this? As on one hand it is asymmetric but also >> there isn't much assistance with only virtualize_x2apic_mode set as, in >> this case, a VM exit will be avoided only when trying to access the TPR >> register. > > I've been thinking about this, and reporting hardware assisted > x{2}APIC virtualization with just > SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or > SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While > those provide some assistance to the VMM in order to handle APIC > accesses, it will still require a trap into the hypervisor to handle > most of the accesses. 
> > So maybe we should only report hardware assisted support when the > mentioned features are present together with > SECONDARY_EXEC_APIC_REGISTER_VIRT? Not sure - "some assistance" seems still a little better than none at all. Which route to go depends on what exactly we intend the bit to be used for. Jan
Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data
On 11.02.2022 12:19, Roger Pau Monné wrote: > On Fri, Feb 11, 2022 at 11:50:46AM +0100, Jan Beulich wrote: >> On 11.02.2022 11:47, Roger Pau Monné wrote: >>> On Fri, Feb 11, 2022 at 11:32:45AM +0100, Jan Beulich wrote: On 11.02.2022 10:02, Roger Pau Monné wrote: > On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote: >> When re-identifying CPU data, we might use uninitialized data when >> checking for the cache line property to adapt the cache >> alignment. The data that depends on this uninitialized read is >> currently not forwarded. >> >> To avoid problems in the future, initialize the data cpuinfo >> structure before re-identifying the CPU again. >> >> The trace to hit the uninitialized read reported by Coverity is: >> >> bool recheck_cpu_features(unsigned int cpu) >> ... >> struct cpuinfo_x86 c; >> ... >> identify_cpu(); >> >> void identify_cpu(struct cpuinfo_x86 *c) >> ... >> generic_identify(c) >> >> static void generic_identify(struct cpuinfo_x86 *c) >> ... > > Would it be more appropriate for generic_identify to also set > x86_cache_alignment like it's done in early_cpu_init? > > generic_identify already re-fetches a bunch of stuff that's also > set by early_cpu_init for the BSP. This would be an option, but how sure are you that there isn't (going to be) another field with similar properties? We better wouldn't require _everything_ to be re-filled in generic_identify(). >>> >>> So you think generic_identify should call into early_cpu_init, or even >>> split the cpuinfo_x86 filling done in early_cpu_init into a non-init >>> function that could be called by both generic_identify and >>> early_cpu_init? >> >> No, I think it is quite fine for this to be a two-step process. > > But it's not a two step process for all CPUs. It's a two step process > for the BSP, that will get it's cpuinfo filled by early_cpu_init > first, and then by identify_cpu. OTHO APs will only get the > information filled by identify_cpu. 
> > Maybe APs don't care about having x86_cache_alignment correctly set? They do care, and the field also isn't left uninitialized. See initialize_cpu_data(). Like in many other places we assume full symmetry among processors here. Jan
Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On Fri, Feb 11, 2022 at 07:27:39AM +, Oleksandr Andrushchenko wrote: > Hi, Roger! > > On 10.02.22 18:16, Roger Pau Monné wrote: > > On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote: > >> From: Oleksandr Andrushchenko > >> > >> Introduce a per-domain read/write lock to check whether vpci is present, > >> so we are sure there are no accesses to the contents of the vpci struct > >> if not. This lock can be used (and in a few cases is used right away) > >> so that vpci removal can be performed while holding the lock in write > >> mode. Previously such removal could race with vpci_read for example. > > Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt > > pci_remove_device, and likely when vPCI gets also used in > > {de}assign_device I think. > Yes, this is indeed an issue, but I was not trying to solve it in > context of vPCI locking yet. I think we should discuss how do > we approach pdev locking, so I can create a patch for that. > that being said, I would like not to solve pdev in this patch yet > > ...I do understand we do want to avoid that, but at the moment > a single reliable way for making sure pdev is alive seems to > be pcidevs_lock I think we will need to make pcidevs_lock a rwlock and take it in read mode for pci_get_pdev_by_domain. We didn't have this scenario before where PCI emulation is done in the hypervisor, and hence the locking around those data structures has not been designed for those use-cases. > > > >> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure > >> from being removed. > >> > >> 2. Writing the command register and ROM BAR register may trigger > >> modify_bars to run, which in turn may access multiple pdevs while > >> checking for the existing BAR's overlap. The overlapping check, if done > >> under the read lock, requires vpci->lock to be acquired on both devices > >> being compared, which may produce a deadlock. 
It is not possible to > >> upgrade read lock to write lock in such a case. So, in order to prevent > >> the deadlock, check which registers are going to be written and acquire > >> the lock in the appropriate mode from the beginning. > >> > >> All other code, which doesn't lead to pdev->vpci destruction and does not > >> access multiple pdevs at the same time, can still use a combination of the > >> read lock and pdev->vpci->lock. > >> > >> 3. Optimize if ROM BAR write lock required detection by caching offset > >> of the ROM BAR register in vpci->header->rom_reg which depends on > >> header's type. > >> > >> 4. Reduce locked region in vpci_remove_device as it is now possible > >> to set pdev->vpci to NULL early right after the write lock is acquired. > >> > >> 5. Reduce locked region in vpci_add_handlers as it is possible to > >> initialize many more fields of the struct vpci before assigning it to > >> pdev->vpci. > >> > >> 6. vpci_{add|remove}_register are required to be called with the write lock > >> held, but it is not feasible to add an assert there as it requires > >> struct domain to be passed for that. So, add a comment about this > >> requirement > >> to these and other functions with the equivalent constraints. > >> > >> 7. Drop const qualifier where the new rwlock is used and this is > >> appropriate. > >> > >> 8. This is based on the discussion at [1]. > >> > >> [1] > >> https://urldefense.com/v3/__https://lore.kernel.org/all/20220204063459.680961-4-andr2...@gmail.com/__;!!GF_29dbcQIUBPA!gObSySzN7s6zSKrcpSEi6vw18fRPls157cuRoqq4KDd7Ic_Nvh_cFlyVXPRpEWBkI38pgsvvfg$ > >> [lore[.]kernel[.]org] > >> > >> Suggested-by: Roger Pau Monné > >> Suggested-by: Jan Beulich > >> Signed-off-by: Oleksandr Andrushchenko > >> > >> --- > >> This was checked on x86: with and without PVH Dom0. 
> >> --- > >> xen/arch/x86/hvm/vmsi.c | 2 + > >> xen/common/domain.c | 3 + > >> xen/drivers/vpci/header.c | 8 +++ > >> xen/drivers/vpci/msi.c| 8 ++- > >> xen/drivers/vpci/msix.c | 40 +++-- > >> xen/drivers/vpci/vpci.c | 114 -- > >> xen/include/xen/sched.h | 3 + > >> xen/include/xen/vpci.h| 2 + > >> 8 files changed, 146 insertions(+), 34 deletions(-) > >> > >> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > >> index 13e2a190b439..351cb968a423 100644 > >> --- a/xen/arch/x86/hvm/vmsi.c > >> +++ b/xen/arch/x86/hvm/vmsi.c > >> @@ -893,6 +893,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix) > >> { > >> unsigned int i; > >> > >> +ASSERT(!!rw_is_locked(&msix->pdev->domain->vpci_rwlock)); > >^ no need for the double negation. > Ok, will update all asserts which use !! > > > > Also this asserts that the lock is taken, but could be by a different > > pCPU. I guess it's better than nothing. > Fair enough. Do you still want the asserts or should I remove them? Likely fine to leave them. > > > >> + > >> for ( i = 0; i < msix->max_entries; i++ ) > >> { > >>
Re: [PATCH] x86emul: fix SIMD test overriding of VBROADCASTS{S,D}
On 11/02/2022 11:11, Jan Beulich wrote: > Despite their suffixes these aren't scalar instructions, and hence the > 128- and 256-bit EVEX forms may not be used without AVX512VL. Gcc11 ends > up generating such instances for simd-sg.c. > > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH] x86emul: work around gcc11 bug in SIMD tests
On 11/02/2022 11:01, Jan Beulich wrote: > Gcc11 looks to have trouble with conditional expressions used with > vector operands: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104497. > Replace two instances causing SEGV there in certain cases. > > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86
On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote: > On 10/02/2022 10:03, Roger Pau Monné wrote: > > On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote: > >> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c > >> index 7ab15e07a0..4060aef1bd 100644 > >> --- a/xen/arch/x86/hvm/vmx/vmcs.c > >> +++ b/xen/arch/x86/hvm/vmx/vmcs.c > >> @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp) > >> MSR_IA32_VMX_PROCBASED_CTLS2, ); > >> } > >> > >> +/* Check whether hardware supports accelerated xapic and x2apic. */ > >> +if ( bsp ) > >> +{ > >> +assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses; > >> +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt || > >> + cpu_has_vmx_virtual_intr_delivery) && > >> +cpu_has_vmx_virtualize_x2apic_mode; > > > > I've been thinking about this, and it seems kind of asymmetric that for > > xAPIC mode we report hw assisted support only with > > virtualize_apic_accesses available, while for x2APIC we require > > virtualize_x2apic_mode plus either apic_reg_virt or > > virtual_intr_delivery. > > > > I think we likely need to be more consistent here, and report hw > > assisted x2APIC support as long as virtualize_x2apic_mode is > > available. > > > > This will likely have some effect on patch 2 also, as you will have to > > adjust vmx_vlapic_msr_changed. > > > > Thanks, Roger. > > Any other thoughts on this? As on one hand it is asymmetric but also > there isn't much assistance with only virtualize_x2apic_mode set as, in > this case, a VM exit will be avoided only when trying to access the TPR > register. I've been thinking about this, and reporting hardware assisted x{2}APIC virtualization with just SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While those provide some assistance to the VMM in order to handle APIC accesses, it will still require a trap into the hypervisor to handle most of the accesses. 
So maybe we should only report hardware assisted support when the mentioned features are present together with SECONDARY_EXEC_APIC_REGISTER_VIRT? Thanks, Roger.
Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data
On Fri, Feb 11, 2022 at 11:50:46AM +0100, Jan Beulich wrote: > On 11.02.2022 11:47, Roger Pau Monné wrote: > > On Fri, Feb 11, 2022 at 11:32:45AM +0100, Jan Beulich wrote: > >> On 11.02.2022 10:02, Roger Pau Monné wrote: > >>> On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote: > When re-identifying CPU data, we might use uninitialized data when > checking for the cache line property to adapt the cache > alignment. The data that depends on this uninitialized read is > currently not forwarded. > > To avoid problems in the future, initialize the data cpuinfo > structure before re-identifying the CPU again. > > The trace to hit the uninitialized read reported by Coverity is: > > bool recheck_cpu_features(unsigned int cpu) > ... > struct cpuinfo_x86 c; > ... > identify_cpu(); > > void identify_cpu(struct cpuinfo_x86 *c) > ... > generic_identify(c) > > static void generic_identify(struct cpuinfo_x86 *c) > ... > >>> > >>> Would it be more appropriate for generic_identify to also set > >>> x86_cache_alignment like it's done in early_cpu_init? > >>> > >>> generic_identify already re-fetches a bunch of stuff that's also > >>> set by early_cpu_init for the BSP. > >> > >> This would be an option, but how sure are you that there isn't > >> (going to be) another field with similar properties? We better > >> wouldn't require _everything_ to be re-filled in generic_identify(). > > > > So you think generic_identify should call into early_cpu_init, or even > > split the cpuinfo_x86 filling done in early_cpu_init into a non-init > > function that could be called by both generic_identify and > > early_cpu_init? > > No, I think it is quite fine for this to be a two-step process. But it's not a two step process for all CPUs. It's a two step process for the BSP, that will get its cpuinfo filled by early_cpu_init first, and then by identify_cpu. OTOH APs will only get the information filled by identify_cpu. 
Maybe APs don't care about having x86_cache_alignment correctly set? Thanks, Roger.
Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver
Hi Oleksii, > On 11 Feb 2022, at 10:44, Oleksii Moisieiev > wrote: > > Hi Bertrand, > > On Fri, Feb 11, 2022 at 08:46:05AM +, Bertrand Marquis wrote: >> Hi Oleksii, >> >> >>> On 8 Feb 2022, at 18:00, Oleksii Moisieiev >>> wrote: >>> >>> This is the implementation of SCI interface, called SCMI-SMC driver, >>> which works as the mediator between XEN Domains and Firmware (SCP, ATF etc). >>> This allows devices from the Domains to work with clocks, resets and >>> power-domains without access to CPG. >>> >>> Originally, cpg should be passed to the domain so it can work with >>> power-domains/clocks/resets etc. Considering that cpg can't be split between >>> the Domains, we get the limitation that the devices, which are using >>> power-domains/clocks/resets etc, couldn't be split between the domains. >>> The solution is to move the power-domain/clock/resets etc to the >>> Firmware (such as SCP firmware or ATF) and provide interface for the >>> Domains. XEN should have an entity, called SCI-Mediator, which is >>> responsible for messages redirection between Domains and Firmware and >>> for permission handling. >>> >>> The following features are implemented: >>> - request SCMI channels from ATF and pass channels to Domains; >>> - set device permissions for Domains based on the Domain partial >>> device-tree. Devices with permissions are able to work with clocks, >>> resets and power-domains via SCMI; >>> - redirect scmi messages from Domains to ATF. >> >> Before going more deeply in the code I would like to discuss the general >> design here and ask some questions to prevent to rework the code before >> we all agree that this is the right solution and that we want this in Xen. >> >> First I want to point out that clock/reset/power virtualization is a problem >> on most applications using device pass-through and I am very glad that >> someone is looking into it. >> Also SCMI is the current standard existing for this so relying on it is a >> very >> good idea. 
>> >> Latest version SCMI standard (DEN0056D v3.1) is defining some means >> to use SCMI on a virtualised system. In chapter 4.2.1, the standard >> recommends to set permissions per agent in the hypervisor so that a VM >> could later use the discovery protocol to detect the resources and use them. >> Using this kind of scenario the mediator in Xen would just configure the >> Permissions in the SCMI and would then rely on it to limit what is possible >> by who just by just assigning a channel to a VM. > >> >> In your current design (please correct me if I am wrong) you seem to fully >> rely on Xen and the FDT for discovery and permission. > > In current implementation Xen is the trusted agent. And it's responsible > for permissions setting. During initialization it discovers agent and > set permissions by using BASE_SET_DEVICE_PERMISSIONS to the Dom0. When > new domain is created, Xen assigns agent id for this domain and request > resources, that are passed-through to this Domain. Ok > > I'm getting the following information from FDT: > 1) Shared memory addresses, which should be used for agents. During > initialization I send BASE_DISCOVER_AGENT to each of these addresses and > receive agent_id. Xen is responsible for assigning agent_id for the > Domain. Then Xen intercept smc calls from the domain, set agent_id and > redirects it to the Firmware. So Xen is setting the agent ID, no way for a guest to get access to something it should with more check, am I right ? > > 2) Devices, that are using SCMI. Those devices have clock/power/resets > etc related to scmi protocol (as it is done in Linux kernel) > and scmi_devid should be set. I'm currently preparing to send patch, > updating kernel bindings with this parameter to Linux kernel. > scmi_devid value should match device id, set in the Firmware. 
> dt example: > { > scmi_devid = <1>; // usb0 device id > clocks = <&scmi_clock 1> // relies on clock with id 1 > } > > Xen requests permission for the device when device is attached to the > Domain during creation. Without this, how is (if it is) the linux kernel using SCMI for power management ? > >> Wouldn’t it be a better idea to use the protocol fully ? > > Hm, I was thinking I am using the protocol fully. Did I miss something? Sorry you seem to be, my understanding of your design was not right. > >> Could we get rid of some of the FDT dependencies by using the discovery >> system of SCMI ? > > I'm using FDT to get shmem regions for the channels. Then I send > BASE_DISCOVER_AGENT to each region and getting agent data. Did I use the > discovery system wrong? After more digging it seems you are. The link between scmi resource and device is not possible to get automatically. > >> How is Linux doing this currently ? Is it relying on device tree to discover >> the SCMI resources ? > > Yes. Linux kernel has 2 nodes in the device-tree: arm,scmi-shmem, which > includes memory region for the communication and arm,scmi-smc node, >
[PATCH] x86emul: fix SIMD test overriding of VBROADCASTS{S,D}
Despite their suffixes these aren't scalar instructions, and hence the 128- and 256-bit EVEX forms may not be used without AVX512VL. Gcc11 ends up generating such instances for simd-sg.c.

Signed-off-by: Jan Beulich

--- a/tools/tests/x86_emulator/simd.h
+++ b/tools/tests/x86_emulator/simd.h
@@ -250,7 +250,9 @@ asm ( ".macro override insn\n\t"
 # define OVR_INT(n) OVR_BW(n); OVR_DQ(n)

 OVR_INT(broadcast);
+# ifdef __AVX512VL__
 OVR_SFP(broadcast);
+# endif
 OVR_SFP(comi);
 OVR_VFP(cvtdq2);
 OVR_INT(abs);
[PATCH] x86emul: fix VPBLENDMW with mask and memory operand
Element size for this opcode depends on EVEX.W, not the low opcode bit. Make use of AVX512BW being a prereq to AVX512_BITALG and move the case label there, adding an AVX512BW feature check.

Signed-off-by: Jan Beulich

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -7401,7 +7401,6 @@ x86_emulate(
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x0b): /* vpmulhrsw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x1c): /* vpabsb [xyz]mm/mem,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x1d): /* vpabsw [xyz]mm/mem,[xyz]mm{k} */
-    case X86EMUL_OPC_EVEX_66(0x0f38, 0x66): /* vpblendm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
         host_and_vcpu_must_have(avx512bw);
         generate_exception_if(evex.brs, EXC_UD);
         elem_bytes = 1 << (b & 1);
@@ -9558,6 +9557,9 @@ x86_emulate(
         /* fall through */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x54): /* vpopcnt{b,w} [xyz]mm/mem,[xyz]mm{k} */
         host_and_vcpu_must_have(avx512_bitalg);
+        /* fall through */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x66): /* vpblendm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
+        host_and_vcpu_must_have(avx512bw);
         generate_exception_if(evex.brs, EXC_UD);
         elem_bytes = 1 << evex.w;
         goto avx512f_no_sae;
Re: [PATCH 2/3] x86/Intel: skip CORE_THREAD_COUNT read on family 0xf
On 11.02.2022 11:58, Roger Pau Monné wrote: > On Thu, Feb 10, 2022 at 03:56:12PM +0100, Jan Beulich wrote: >> This avoids an unnecessary (and always somewhat scary) log message for >> the recovered from #GP(0). >> >> Signed-off-by: Jan Beulich > > Acked-by: Roger Pau Monné Thanks. >> --- >> Perhaps even use "== 6" in the family check? > > I think it's best as is. Even on family 6 this seems to be supported > only on model 3f? (Haswell Xeon E5 v3 and E7 v3?) Well, ... > The comment also seems to note this is mostly undocumented. ... this same comment says "Nehalem and later". And this also matches my observations (and, given he has written the comment, quite clearly Andrew's as well), no matter what the (notoriously incomplete) SDM Vol 4 may say. Jan
[PATCH] x86emul: work around gcc11 bug in SIMD tests
Gcc11 looks to have trouble with conditional expressions used with vector operands: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104497. Replace two instances causing SEGV there in certain cases.

Signed-off-by: Jan Beulich

--- a/tools/tests/x86_emulator/simd.c
+++ b/tools/tests/x86_emulator/simd.c
@@ -1727,8 +1727,8 @@ int simd_test(void)
     if ( !eq(x - src, (alt + 1) / 2) ) return __LINE__;
 #endif

-    for ( i = 0; i < ELEM_COUNT; ++i )
-        y[i] = (i & 1 ? inv : src)[i];
+    for ( y = src, i = 1; i < ELEM_COUNT; i += 2 )
+        y[i] = inv[i];

 #ifdef select
 # ifdef UINT_SIZE
--- a/tools/tests/x86_emulator/simd-sg.c
+++ b/tools/tests/x86_emulator/simd-sg.c
@@ -282,7 +282,7 @@ int sg_test(void)
 # if ELEM_SIZE == IDX_SIZE
     y = gather(x, array, idx, (idx & inv) != 0, ELEM_SIZE);
     for ( i = 0; i < ITEM_COUNT; ++i )
-        if ( y[i] != ((i + 1) & (ITEM_COUNT - i) ? idx : inv)[i] + 1 )
+        if ( y[i] != ((i + 1) & (ITEM_COUNT - i) ? idx[i] : inv[i]) + 1 )
             return __LINE__;
     for ( ; i < ELEM_COUNT; ++i )
         if ( y[i] )
Re: [PATCH 2/3] x86/Intel: skip CORE_THREAD_COUNT read on family 0xf
On Thu, Feb 10, 2022 at 03:56:12PM +0100, Jan Beulich wrote: > This avoids an unnecessary (and always somewhat scary) log message for > the recovered from #GP(0). > > Signed-off-by: Jan Beulich Acked-by: Roger Pau Monné > --- > Perhaps even use "== 6" in the family check? I think it's best as is. Even on family 6 this seems to be supported only on model 3f? (Haswell Xeon E5 v3 and E7 v3?) The comment also seems to note this is mostly undocumented. Thanks, Roger.
Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data
On 11.02.2022 11:47, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 11:32:45AM +0100, Jan Beulich wrote:
>> On 11.02.2022 10:02, Roger Pau Monné wrote:
>>> On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote:
>>>> When re-identifying CPU data, we might use uninitialized data when
>>>> checking for the cache line property to adapt the cache alignment.
>>>> The data that depends on this uninitialized read is currently not
>>>> forwarded.
>>>>
>>>> To avoid problems in the future, initialize the data cpuinfo
>>>> structure before re-identifying the CPU again.
>>>>
>>>> The trace to hit the uninitialized read reported by Coverity is:
>>>>
>>>> bool recheck_cpu_features(unsigned int cpu)
>>>>   ...
>>>>   struct cpuinfo_x86 c;
>>>>   ...
>>>>   identify_cpu();
>>>>
>>>> void identify_cpu(struct cpuinfo_x86 *c)
>>>>   ...
>>>>   generic_identify(c)
>>>>
>>>> static void generic_identify(struct cpuinfo_x86 *c)
>>>>   ...
>>>
>>> Would it be more appropriate for generic_identify to also set
>>> x86_cache_alignment like it's done in early_cpu_init?
>>>
>>> generic_identify already re-fetches a bunch of stuff that's also
>>> set by early_cpu_init for the BSP.
>>
>> This would be an option, but how sure are you that there isn't
>> (going to be) another field with similar properties? We better
>> wouldn't require _everything_ to be re-filled in generic_identify().
>
> So you think generic_identify should call into early_cpu_init, or even
> split the cpuinfo_x86 filling done in early_cpu_init into a non-init
> function that could be called by both generic_identify and
> early_cpu_init?

No, I think it is quite fine for this to be a two-step process. In fact
I was hoping to eliminate some of the remaining redundancy where
possible. recheck_cpu_features(), after all, cares about just the
feature flags, so doesn't require things like cache line alignment to
be filled in.

Jan
Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data
On Fri, Feb 11, 2022 at 11:32:45AM +0100, Jan Beulich wrote: > On 11.02.2022 10:02, Roger Pau Monné wrote: > > On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote: > >> When re-identifying CPU data, we might use uninitialized data when > >> checking for the cache line property to adapt the cache > >> alignment. The data that depends on this uninitialized read is > >> currently not forwarded. > >> > >> To avoid problems in the future, initialize the data cpuinfo > >> structure before re-identifying the CPU again. > >> > >> The trace to hit the uninitialized read reported by Coverity is: > >> > >> bool recheck_cpu_features(unsigned int cpu) > >> ... > >> struct cpuinfo_x86 c; > >> ... > >> identify_cpu(); > >> > >> void identify_cpu(struct cpuinfo_x86 *c) > >> ... > >> generic_identify(c) > >> > >> static void generic_identify(struct cpuinfo_x86 *c) > >> ... > > > > Would it be more appropriate for generic_identify to also set > > x86_cache_alignment like it's done in early_cpu_init? > > > > generic_identify already re-fetches a bunch of stuff that's also > > set by early_cpu_init for the BSP. > > This would be an option, but how sure are you that there isn't > (going to be) another field with similar properties? We better > wouldn't require _everything_ to be re-filled in generic_identify(). So you think generic_identify should call into early_cpu_init, or even split the cpuinfo_x86 filling done in early_cpu_init into a non-init function that could be called by both generic_identify and early_cpu_init? Thanks, Roger.
Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver
Hi Bertrand, On Fri, Feb 11, 2022 at 08:46:05AM +, Bertrand Marquis wrote: > Hi Oleksii, > > > > On 8 Feb 2022, at 18:00, Oleksii Moisieiev > > wrote: > > > > This is the implementation of SCI interface, called SCMI-SMC driver, > > which works as the mediator between XEN Domains and Firmware (SCP, ATF etc). > > This allows devices from the Domains to work with clocks, resets and > > power-domains without access to CPG. > > > > Originally, cpg should be passed to the domain so it can work with > > power-domains/clocks/resets etc. Considering that cpg can't be split between > > the Domains, we get the limitation that the devices, which are using > > power-domains/clocks/resets etc, couldn't be split between the domains. > > The solution is to move the power-domain/clock/resets etc to the > > Firmware (such as SCP firmware or ATF) and provide interface for the > > Domains. XEN should have an entity, caled SCI-Mediator, which is > > responsible for messages redirection between Domains and Firmware and > > for permission handling. > > > > The following features are implemented: > > - request SCMI channels from ATF and pass channels to Domains; > > - set device permissions for Domains based on the Domain partial > > device-tree. Devices with permissions are able to work with clocks, > > resets and power-domains via SCMI; > > - redirect scmi messages from Domains to ATF. > > Before going more deeply in the code I would like to discuss the general > design here and ask some questions to prevent to rework the code before > we all agree that this is the right solution and that we want this in Xen. > > First I want to point out that clock/reset/power virtualization is a problem > on most applications using device pass-through and I am very glad that > someone is looking into it. > Also SCMI is the current standard existing for this so relying on it is a very > good idea. 
> Latest version SCMI standard (DEN0056D v3.1) is defining some means
> to use SCMI on a virtualised system. In chapter 4.2.1, the standard
> recommends to set permissions per agent in the hypervisor so that a VM
> could later use the discovery protocol to detect the resources and use
> them. Using this kind of scenario the mediator in Xen would just
> configure the permissions in the SCMI and would then rely on it to
> limit what is possible by who, just by assigning a channel to a VM.
>
> In your current design (please correct me if I am wrong) you seem to
> fully rely on Xen and the FDT for discovery and permission.

In the current implementation Xen is the trusted agent, and it's
responsible for setting permissions. During initialization it discovers
agents and sets permissions by sending BASE_SET_DEVICE_PERMISSIONS to
Dom0. When a new domain is created, Xen assigns an agent id for this
domain and requests the resources that are passed through to this
domain.

I'm getting the following information from the FDT:

1) Shared memory addresses, which should be used for agents. During
initialization I send BASE_DISCOVER_AGENT to each of these addresses
and receive an agent_id. Xen is responsible for assigning the agent_id
for the domain. Then Xen intercepts smc calls from the domain, sets the
agent_id and redirects the call to the Firmware.

2) Devices that are using SCMI. Those devices have clocks/power/resets
etc. related to the scmi protocol (as is done in the Linux kernel) and
scmi_devid should be set. I'm currently preparing to send a patch
updating the kernel bindings with this parameter to the Linux kernel.
The scmi_devid value should match the device id set in the Firmware.

dt example:

{
    scmi_devid = <1>;          // usb0 device id
    clocks = <&scmi_clock 1>;  // relies on clock with id 1
}

Xen requests permission for the device when the device is attached to
the domain during creation.

> Wouldn’t it be a better idea to use the protocol fully ?

Hm, I was thinking I am using the protocol fully. Did I miss something?
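For readers unfamiliar with the bindings being discussed, the dt example above can be expanded into a fuller sketch. Everything here is hypothetical and illustrative only: node names, addresses, the smc function id and the device id are made up, and the scmi_devid property is the one proposed in this thread, not an accepted upstream binding. The arm,scmi-shmem / arm,scmi-smc compatibles and the protocol@14 (clock protocol) child node follow the Linux-style layout the reply describes:

```dts
/* Hypothetical sketch only; values are illustrative. */
sram@47ff0000 {
    compatible = "mmio-sram";

    scmi_shmem: scmi-sram@0 {
        compatible = "arm,scmi-shmem";
        reg = <0x0 0x80>;          /* shared-memory channel for one agent */
    };
};

firmware {
    scmi: scmi {
        compatible = "arm,scmi-smc";
        arm,smc-id = <0x82000002>; /* func_id: illustrative */
        shmem = <&scmi_shmem>;
        #address-cells = <1>;
        #size-cells = <0>;

        scmi_clock: protocol@14 {  /* SCMI clock protocol */
            reg = <0x14>;
            #clock-cells = <1>;
        };
    };
};

&usb0 {
    scmi_devid = <1>;              /* proposed property: firmware device id */
    clocks = <&scmi_clock 1>;
};
```

The point of the design described above is that this same fragment would work unchanged with or without Xen: natively Linux talks to the firmware directly, while under Xen the mediator intercepts the smc call and stamps in the domain's agent id.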
> Could we get rid of some of the FDT dependencies by using the discovery
> system of SCMI ?

I'm using the FDT to get the shmem regions for the channels. Then I
send BASE_DISCOVER_AGENT to each region and get the agent data. Did I
use the discovery system wrong?

> How is Linux doing this currently ? Is it relying on device tree to
> discover the SCMI resources ?

Yes. The Linux kernel has 2 nodes in the device-tree: arm,scmi-shmem,
which includes the memory region for the communication, and an
arm,scmi-smc node, which describes all data related to scmi (func_id,
protocols etc). Then the device nodes refer to the protocols by setting
clocks/resets/power-domains etc. Please see the example above.
BASE_DISCOVER_AGENT is not used in the Linux kernel.

The main idea was that the scmi related changes to the device-tree are
common for virtualized and non-virtualized systems, so the same FDT
configuration should work with or without Xen.

> Also I understand that you rely on some entries to be declared in the
> device tree and also some support to be implemented in ATF or SCP. I
> checked in the boards I have
Re: [PATCH 1/3] x86/Intel: skip PLATFORM_INFO reads on family 0xf
On Fri, Feb 11, 2022 at 10:59:10AM +0100, Jan Beulich wrote:
> On 11.02.2022 10:40, Roger Pau Monné wrote:
> > On Thu, Feb 10, 2022 at 03:55:52PM +0100, Jan Beulich wrote:
> >> This avoids unnecessary (and always somewhat scary) log messages for the
> >> recovered from #GP(0).
> >
> > Could we maybe get rid of the #GP messages for cases like this where we
> > are explicitly probing for MSR presence? (ie: it's expected that we
> > can get a #GP)
>
> This would mean some form of annotation of such RDMSR attempts (for
> the recovery code to recognize in order to skip the printk()). Not
> all rdmsr_safe() uses are, strictly speaking, probes, so I wouldn't
> want to put such in rdmsr_safe() itself.
>
> In any event - quite a bit more work. Plus I'm not convinced it's a
> good idea to suppress any such log messages.
>
> >> Signed-off-by: Jan Beulich

Acked-by: Roger Pau Monné

> >> ---
> >> Perhaps even use "!= 6" in at least the CPUID-faulting family check?
> >
> > Likely, or else you would also need to check for family 11 (Knights
> > Corner?) which doesn't seem to support PLATFORM_INFO either.
>
> I don't think Xen is able to run on these (likewise for IA64, which
> iirc were surfacing as x86 model 7)? These are the co-processor ones,
> aren't they?

Right, Knights Corner uses a socket mount but it's still a co-processor.
It was Knights Landing the first one that could be used as a host
processor.

> My question was more towards whether we would (wrongly)
> exclude future processors when using != 6, if Intel decided to ever
> make new CPUs with a family other than 6.

In the case here I think we should only avoid the probe for family 0xf.
Newer families (or even models on family 6 not supporting PLATFORM_INFO)
will just get a #GP message which is OK I think, we could fix that in
due time. It's better to get a #GP message for probing than to just skip
detection of CPUID faulting on unknown newer families IMO.

Thanks, Roger.
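The "annotation" idea Jan describes above (let deliberate probes be marked so the fault-recovery path can skip the warning) can be sketched in isolation. This is not Xen code; all names are hypothetical stand-ins, with a counter playing the role of the #GP handler and another the role of the printk():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int msr_faults;       /* stand-in for #GP occurrences */
static int warnings_logged;  /* stand-in for the scary printk() */

/*
 * Hypothetical "annotated" MSR read: when `probe` is set, a fault is
 * still recovered from, but no warning is logged, because the caller
 * declared the fault to be an expected outcome.
 */
static bool hw_msr_read(uint32_t idx, uint64_t *val, bool probe)
{
    /* Pretend only MSR 0xce (PLATFORM_INFO) exists on this "CPU". */
    if ( idx != 0xce )
    {
        msr_faults++;
        if ( !probe )            /* only unexpected faults get logged */
            warnings_logged++;
        return false;
    }

    *val = 0x1100;               /* arbitrary illustrative value */
    return true;
}
```

This also makes Jan's objection concrete: every call site would need the extra argument (or a separate wrapper), and as he notes, not every `rdmsr_safe()` use is genuinely a probe, so the flag cannot simply be folded into the existing helper.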
Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data
On 11.02.2022 08:23, Norbert Manthey wrote:
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -609,7 +609,7 @@ void __init init_guest_cpuid(void)
>  bool recheck_cpu_features(unsigned int cpu)
>  {
>      bool okay = true;
> -    struct cpuinfo_x86 c;
> +    struct cpuinfo_x86 c = boot_cpu_data;
>      const struct cpuinfo_x86 *bsp = &boot_cpu_data;
>      unsigned int i;

While I agree with the need to initialize the local variable, I don't
think it should be pre-seeded with previous identification results:
This could end up hiding bugs. Instead I'd see it simply be
zero-filled.

Jan
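The difference between the two initializations Jan contrasts above can be shown with a stand-in structure (the real cpuinfo_x86 has many more fields; everything here is illustrative). Zero-filling guarantees that any field the re-identification path fails to re-fill reads as 0 rather than stack garbage, without inheriting possibly-stale BSP results that could mask a missing-refill bug:

```c
#include <assert.h>

/* Hypothetical stand-in for struct cpuinfo_x86. */
struct fake_cpuinfo {
    unsigned int family;
    unsigned int cache_alignment;
};

/* Pretend re-identification only refills some of the fields. */
static void identify(struct fake_cpuinfo *c)
{
    c->family = 6;
}

static unsigned int recheck(void)
{
    struct fake_cpuinfo c = {};  /* zero-filled, NOT seeded from BSP data */

    identify(&c);
    return c.cache_alignment;    /* well-defined: 0, never garbage */
}
```

Had `c` been seeded from a boot-time copy instead, `cache_alignment` would silently carry the BSP's value, and a path that forgot to refill it would never be noticed, which is exactly the bug-hiding concern raised above.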
Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data
On 11.02.2022 10:02, Roger Pau Monné wrote: > On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote: >> When re-identifying CPU data, we might use uninitialized data when >> checking for the cache line property to adapt the cache >> alignment. The data that depends on this uninitialized read is >> currently not forwarded. >> >> To avoid problems in the future, initialize the data cpuinfo >> structure before re-identifying the CPU again. >> >> The trace to hit the uninitialized read reported by Coverity is: >> >> bool recheck_cpu_features(unsigned int cpu) >> ... >> struct cpuinfo_x86 c; >> ... >> identify_cpu(); >> >> void identify_cpu(struct cpuinfo_x86 *c) >> ... >> generic_identify(c) >> >> static void generic_identify(struct cpuinfo_x86 *c) >> ... > > Would it be more appropriate for generic_identify to also set > x86_cache_alignment like it's done in early_cpu_init? > > generic_identify already re-fetches a bunch of stuff that's also > set by early_cpu_init for the BSP. This would be an option, but how sure are you that there isn't (going to be) another field with similar properties? We better wouldn't require _everything_ to be re-filled in generic_identify(). Jan
Re: [PATCH 3/3] x86/Intel: also display CPU freq for family 0xf
On Thu, Feb 10, 2022 at 03:56:48PM +0100, Jan Beulich wrote:
> Actually we can do better than simply bailing for there not being any
> PLATFORM_INFO MSR on these. The "max" part of the information is
> available in another MSR, alongside the scaling factor (which is
> encoded in similar ways to Core/Core2, and hence the decoding table can
> be shared).
>
> Signed-off-by: Jan Beulich

Acked-by: Roger Pau Monné

> ---
> The inner switch() is left indented one level too much (and with an
> extra pair of braces) to limit the diff. I'd prefer to make a follow-up
> patch reducing the indentation, unless I'm told to do so right here.

I'm fine with a followup patch.

> --- a/xen/arch/x86/cpu/intel.c
> +++ b/xen/arch/x86/cpu/intel.c
> @@ -412,9 +412,9 @@ static int num_cpu_cores(struct cpuinfo_
>
>  static void intel_log_freq(const struct cpuinfo_x86 *c)
>  {
> -    unsigned int eax, ebx, ecx, edx;
> +    unsigned int eax, ebx, ecx, edx, factor;
>      uint64_t msrval;
> -    uint8_t max_ratio;
> +    uint8_t max_ratio, min_ratio;
>
>      if ( c->cpuid_level >= 0x15 )
>      {
> @@ -455,21 +455,22 @@ static void intel_log_freq(const struct
>          }
>      }
>
> -    if ( c->x86 == 0xf || rdmsr_safe(MSR_INTEL_PLATFORM_INFO, msrval) )
> -        return;
> -    max_ratio = msrval >> 8;
> -
> -    if ( max_ratio )
> +    switch ( c->x86 )
>      {
> -        unsigned int factor = 1;
> -        uint8_t min_ratio = msrval >> 40;
> +        static const unsigned short core_factors[] =

This no longer applies to Core models only, so I wouldn't be opposed
to renaming to scaling_factors or similar.

Thanks, Roger.
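The ratio-times-scaling-factor decode the patch extends to family 0xf can be sketched in isolation. The factor table below is hypothetical (bus clocks in kHz, chosen to resemble the Core/Core2-era encodings such as 100/133/166/200/266/333 MHz); it is not a claim about Xen's actual core_factors[] contents:

```c
#include <assert.h>

/* Illustrative bus-clock table, indexed by a hypothetical FSB code. */
static const unsigned int factors_khz[] = {
    266667, 133333, 200000, 166667, 333333, 100000, 400000,
};

/*
 * Frequency = core ratio (from an MSR) * bus clock.  Returned in MHz,
 * with 0 for an out-of-range code (analogous to bailing on an
 * unrecognized encoding).
 */
static unsigned int ratio_to_mhz(unsigned int ratio, unsigned int fsb_code)
{
    if ( fsb_code >= sizeof(factors_khz) / sizeof(factors_khz[0]) )
        return 0;

    return ratio * factors_khz[fsb_code] / 1000;  /* kHz -> MHz */
}
```

With this shape it is easy to see why the table can be shared across families, and why Roger's rename suggestion makes sense: nothing in the decode is Core-specific, only the table contents are.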
Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86
On 10/02/2022 10:03, Roger Pau Monné wrote: > On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote: >> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c >> index 7ab15e07a0..4060aef1bd 100644 >> --- a/xen/arch/x86/hvm/vmx/vmcs.c >> +++ b/xen/arch/x86/hvm/vmx/vmcs.c >> @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp) >> MSR_IA32_VMX_PROCBASED_CTLS2, ); >> } >> >> +/* Check whether hardware supports accelerated xapic and x2apic. */ >> +if ( bsp ) >> +{ >> +assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses; >> +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt || >> + cpu_has_vmx_virtual_intr_delivery) && >> +cpu_has_vmx_virtualize_x2apic_mode; > > I've been think about this, and it seems kind of asymmetric that for > xAPIC mode we report hw assisted support only with > virtualize_apic_accesses available, while for x2APIC we require > virtualize_x2apic_mode plus either apic_reg_virt or > virtual_intr_delivery. > > I think we likely need to be more consistent here, and report hw > assisted x2APIC support as long as virtualize_x2apic_mode is > available. > > This will likely have some effect on patch 2 also, as you will have to > adjust vmx_vlapic_msr_changed. > > Thanks, Roger. Any other thoughts on this? As on one hand it is asymmetric but also there isn't much assistance with only virtualize_x2apic_mode set as, in this case, a VM exit will be avoided only when trying to access the TPR register. Thanks, Jane.
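The asymmetry under discussion can be stated as plain predicates. The sketch below is illustrative only (booleans stand in for the VMX feature tests, names are hypothetical): "strict" is what the quoted patch currently reports for x2APIC, "relaxed" is Roger's suggestion of reporting assistance whenever virtualize_x2apic_mode is present, even though, as Jane notes, that alone only avoids exits on TPR accesses:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the VMX secondary-execution-control feature bits. */
struct vmx_feats {
    bool virtualize_apic_accesses;
    bool virtualize_x2apic_mode;
    bool apic_reg_virt;
    bool virtual_intr_delivery;
};

static bool assisted_xapic(const struct vmx_feats *f)
{
    return f->virtualize_apic_accesses;
}

/* As in the quoted patch: x2APIC mode plus at least one accelerator. */
static bool assisted_x2apic_strict(const struct vmx_feats *f)
{
    return f->virtualize_x2apic_mode &&
           (f->apic_reg_virt || f->virtual_intr_delivery);
}

/* Roger's suggestion: x2APIC mode alone already counts as assistance. */
static bool assisted_x2apic_relaxed(const struct vmx_feats *f)
{
    return f->virtualize_x2apic_mode;
}
```

Writing it this way makes the trade-off concrete: on hardware with only virtualize_x2apic_mode, the two policies disagree, and that is precisely the configuration Jane's question is about.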
Re: [PATCH 1/3] x86/Intel: skip PLATFORM_INFO reads on family 0xf
On 11.02.2022 10:40, Roger Pau Monné wrote:
> On Thu, Feb 10, 2022 at 03:55:52PM +0100, Jan Beulich wrote:
>> This avoids unnecessary (and always somewhat scary) log messages for the
>> recovered from #GP(0).
>
> Could we maybe get rid of the #GP messages for cases like this where we
> are explicitly probing for MSR presence? (ie: it's expected that we
> can get a #GP)

This would mean some form of annotation of such RDMSR attempts (for
the recovery code to recognize in order to skip the printk()). Not
all rdmsr_safe() uses are, strictly speaking, probes, so I wouldn't
want to put such in rdmsr_safe() itself.

In any event - quite a bit more work. Plus I'm not convinced it's a
good idea to suppress any such log messages.

>> Signed-off-by: Jan Beulich
>> ---
>> Perhaps even use "!= 6" in at least the CPUID-faulting family check?
>
> Likely, or else you would also need to check for family 11 (Knights
> Corner?) which doesn't seem to support PLATFORM_INFO either.

I don't think Xen is able to run on these (likewise for IA64, which
iirc were surfacing as x86 model 7)? These are the co-processor ones,
aren't they? My question was more towards whether we would (wrongly)
exclude future processors when using != 6, if Intel decided to ever
make new CPUs with a family other than 6.

Jan
Re: [PATCH 1/3] x86/Intel: skip PLATFORM_INFO reads on family 0xf
On Thu, Feb 10, 2022 at 03:55:52PM +0100, Jan Beulich wrote: > This avoids unnecessary (and always somewhat scary) log messages for the > recovered from #GP(0). Could we maybe get rid of the #GP messages for cases like this where we are explicitly probing for MSR presence? (ie: it's expected that we can get a #GP) > Signed-off-by: Jan Beulich > --- > Perhaps even use "!= 6" in at least the CPUID-faulting family check? Likely, or else you would also need to check for family 11 (Knights Corner?) which doesn't seem to support PLATFORM_INFO either. Thanks, Roger.
Re: [PATCH CPU v1] cpuid: initialize cpuinfo with boot_cpu_data
On Fri, Feb 11, 2022 at 08:23:27AM +0100, Norbert Manthey wrote: > When re-identifying CPU data, we might use uninitialized data when > checking for the cache line property to adapt the cache > alignment. The data that depends on this uninitialized read is > currently not forwarded. > > To avoid problems in the future, initialize the data cpuinfo > structure before re-identifying the CPU again. > > The trace to hit the uninitialized read reported by Coverity is: > > bool recheck_cpu_features(unsigned int cpu) > ... > struct cpuinfo_x86 c; > ... > identify_cpu(); > > void identify_cpu(struct cpuinfo_x86 *c) > ... > generic_identify(c) > > static void generic_identify(struct cpuinfo_x86 *c) > ... Would it be more appropriate for generic_identify to also set x86_cache_alignment like it's done in early_cpu_init? generic_identify already re-fetches a bunch of stuff that's also set by early_cpu_init for the BSP. Thanks, Roger.
Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On 10.02.22 18:16, Roger Pau Monné wrote:
> On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko
>>
>> Introduce a per-domain read/write lock to check whether vpci is present,
>> so we are sure there are no accesses to the contents of the vpci struct
>> if not. This lock can be used (and in a few cases is used right away)
>> so that vpci removal can be performed while holding the lock in write
>> mode. Previously such removal could race with vpci_read for example.
>
> Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
> pci_remove_device, and likely when vPCI gets also used in
> {de}assign_device I think.

How about the below? It seems to guarantee that we can access pdev
without issues and without requiring pcidevs_lock to be used?

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e8b09d77d880..fd464a58b3b3 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -937,8 +937,14 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
     }

     devfn = pdev->devfn;
+#ifdef CONFIG_HAS_VPCI
+    write_lock(&d->vpci_rwlock);
+#endif
     ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
                      pci_to_dev(pdev));
+#ifdef CONFIG_HAS_VPCI
+    write_unlock(&d->vpci_rwlock);
+#endif
     if ( ret )
         goto out;
@@ -1474,6 +1480,9 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     const struct domain_iommu *hd = dom_iommu(d);
     struct pci_dev *pdev;
     int rc = 0;
+#ifdef CONFIG_HAS_VPCI
+    struct domain *old_d;
+#endif

     if ( !is_iommu_enabled(d) )
         return 0;
@@ -1487,15 +1496,34 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     ASSERT(pdev && (pdev->domain == hardware_domain ||
                     pdev->domain == dom_io));
+#ifdef CONFIG_HAS_VPCI
+    /* pdev->domain is either hwdom or dom_io. We do not want the latter. */
+    old_d = pdev->domain == hardware_domain ? pdev->domain : NULL;
+    if ( old_d )
+        write_lock(&old_d->vpci_rwlock);
+#endif
+
     rc = pdev_msix_assign(d, pdev);
     if ( rc )
+    {
+#ifdef CONFIG_HAS_VPCI
+        if ( old_d )
+            write_unlock(&old_d->vpci_rwlock);
+#endif
         goto done;
+    }

     pdev->fault.count = 0;

     if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
                           pci_to_dev(pdev), flag)) )
+    {
+#ifdef CONFIG_HAS_VPCI
+        if ( old_d )
+            write_unlock(&old_d->vpci_rwlock);
+#endif
         goto done;
+    }

     for ( ; pdev->phantom_stride; rc = 0 )
     {

I think we don't care about pci_remove_device because:

int pci_remove_device(u16 seg, u8 bus, u8 devfn)
{
    [snip]
    pcidevs_lock();
    list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
        if ( pdev->bus == bus && pdev->devfn == devfn )
        {
            vpci_remove_device(pdev);

as vpci_remove_device will take care there are no readers and will
safely remove vpci.

Thank you, Oleksandr
Re: [RFC v2 5/8] xen/arm: introduce SCMI-SMC mediator driver
Hi Oleksii, > On 8 Feb 2022, at 18:00, Oleksii Moisieiev wrote: > > This is the implementation of SCI interface, called SCMI-SMC driver, > which works as the mediator between XEN Domains and Firmware (SCP, ATF etc). > This allows devices from the Domains to work with clocks, resets and > power-domains without access to CPG. > > Originally, cpg should be passed to the domain so it can work with > power-domains/clocks/resets etc. Considering that cpg can't be split between > the Domains, we get the limitation that the devices, which are using > power-domains/clocks/resets etc, couldn't be split between the domains. > The solution is to move the power-domain/clock/resets etc to the > Firmware (such as SCP firmware or ATF) and provide interface for the > Domains. XEN should have an entity, caled SCI-Mediator, which is > responsible for messages redirection between Domains and Firmware and > for permission handling. > > The following features are implemented: > - request SCMI channels from ATF and pass channels to Domains; > - set device permissions for Domains based on the Domain partial > device-tree. Devices with permissions are able to work with clocks, > resets and power-domains via SCMI; > - redirect scmi messages from Domains to ATF. Before going more deeply in the code I would like to discuss the general design here and ask some questions to prevent to rework the code before we all agree that this is the right solution and that we want this in Xen. First I want to point out that clock/reset/power virtualization is a problem on most applications using device pass-through and I am very glad that someone is looking into it. Also SCMI is the current standard existing for this so relying on it is a very good idea. Latest version SCMI standard (DEN0056D v3.1) is defining some means to use SCMI on a virtualised system. 
In chapter 4.2.1, the standard recommends to set permissions per agent in the hypervisor so that a VM could later use the discovery protocol to detect the resources and use them. Using this kind of scenario the mediator in Xen would just configure the Permissions in the SCMI and would then rely on it to limit what is possible by who just by just assigning a channel to a VM. In your current design (please correct me if I am wrong) you seem to fully rely on Xen and the FDT for discovery and permission. Wouldn’t it be a better idea to use the protocol fully ? Could we get rid of some of the FDT dependencies by using the discovery system of SCMI ? How is Linux doing this currently ? Is it relying on device tree to discover the SCMI resources ? Also I understand that you rely on some entries to be declared in the device tree and also some support to be implemented in ATF or SCP. I checked in The boards I have access to and the device trees but none of this seem to be supported there. Could you tell which board/configuration/ATF you are using so that the implementation could be tested/validated ? Regards Bertrand > > Signed-off-by: Oleksii Moisieiev > --- > xen/arch/arm/Kconfig| 2 + > xen/arch/arm/sci/Kconfig| 10 + > xen/arch/arm/sci/scmi_smc.c | 959 > 3 files changed, 971 insertions(+) > create mode 100644 xen/arch/arm/sci/Kconfig > create mode 100644 xen/arch/arm/sci/scmi_smc.c > > diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig > index ab07833582..3b0dfc57b6 100644 > --- a/xen/arch/arm/Kconfig > +++ b/xen/arch/arm/Kconfig > @@ -123,6 +123,8 @@ config ARM_SCI > support. It allows guests to control system resourcess via one of > ARM_SCI mediators implemented in XEN. 
> > + source "arch/arm/sci/Kconfig" > + > endmenu > > menu "ARM errata workaround via the alternative framework" > diff --git a/xen/arch/arm/sci/Kconfig b/xen/arch/arm/sci/Kconfig > new file mode 100644 > index 00..10b634d2ed > --- /dev/null > +++ b/xen/arch/arm/sci/Kconfig > @@ -0,0 +1,10 @@ > +config SCMI_SMC > + bool "Enable SCMI-SMC mediator driver" > + default n > + depends on ARM_SCI && HOST_DTB_EXPORT > + ---help--- > + > + Enables mediator in XEN to pass SCMI requests from Domains to ATF. > + This feature allows drivers from Domains to work with System > + Controllers (such as power,resets,clock etc.). SCP is used as transport > + for communication. > diff --git a/xen/arch/arm/sci/scmi_smc.c b/xen/arch/arm/sci/scmi_smc.c > new file mode 100644 > index 00..103529dfab > --- /dev/null > +++ b/xen/arch/arm/sci/scmi_smc.c > @@ -0,0 +1,959 @@ > +/* > + * xen/arch/arm/sci/scmi_smc.c > + * > + * SCMI mediator driver, using SCP as transport. > + * > + * Oleksii Moisieiev > + * Copyright (C) 2021, EPAM Systems. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + *
[libvirt test] 168082: regressions - FAIL
flight 168082 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/168082/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 151777 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-raw 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-raw 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) blocked n/a version targeted for testing: libvirt 2ac78307af099a2af934d5499f262458f3ce0ea4 baseline version: libvirt 2c846fa6bcc11929c9fb857a22430fb9945654ad Last test of basis 151777 2020-07-10 04:19:19 Z 581 days Failing since151818 2020-07-11 04:18:52 Z 580 days 562 attempts Testing same since 168082 2022-02-11 04:20:14 Z0 days1 attempts People who touched revisions under test: Adolfo Jayme Barrientos Aleksandr Alekseev Aleksei Zakharov Andika Triwidada Andrea Bolognani Ani Sinha Balázs Meskó Barrett Schonefeld 
Bastian Germann Bastien Orivel BiaoXiang Ye Bihong Yu Binfeng Wu Bjoern Walk Boris Fiuczynski Brad Laue Brian Turek Bruno Haible Chris Mayo Christian Borntraeger Christian Ehrhardt Christian Kirbach Christian Schoenebeck Christophe Fergeau Cole Robinson Collin Walling Cornelia Huck Cédric Bosdonnat Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Didik Supriadi dinglimin Divya Garg Dmitrii Shcherbakov Dmytro Linkin Eiichi Tsukata Emilio Herrera Eric Farman Erik Skultety Fabian Affolter Fabian Freyer Fabiano Fidêncio Fangge Jin Farhan Ali Fedora Weblate Translation Franck Ridel Gavi Teitz gongwei Guoyi Tu Göran Uddeborg Halil Pasic Han Han Hao Wang Hela Basa Helmut Grohne Hiroki Narukawa Hyman Huang(黄勇) Ian Wienand Ioanna Alifieraki Ivan Teterevkov Jakob Meng Jamie Strandboge Jamie Strandboge Jan Kuparinen jason lee Jean-Baptiste Holcroft Jia Zhou Jianan Gao Jim Fehlig Jin Yan Jing Qi Jinsheng Zhang Jiri Denemark Joachim Falk John Ferlan Jonathan Watt Jonathon Jongsma Julio Faracco Justin Gatzen Ján Tomko Kashyap Chamarthy Kevin Locke Koichi Murase Kristina Hanicova Laine Stump Laszlo Ersek Lee Yarwood Lei Yang Liao Pingfang Lin Ma Lin Ma Lin Ma Liu Yiding Lubomir Rintel Luke Yue Luyao Zhong Marc Hartmayer Marc-André Lureau Marek Marczykowski-Górecki Markus Schade Martin Kletzander Masayoshi Mizuma Matej Cepl Matt Coleman Matt Coleman Mauro Matteo Cascella Meina Li Michal Privoznik Michał Smyk Milo Casagrande Moshe Levi Muha Aliss Nathan Neal Gompa Nick Chevsky Nick Shyrokovskiy Nickys Music Group Nico Pache Nicolas Lécureuil Nicolas Lécureuil Nikolay Shirokovskiy Olaf Hering Olesya Gerasimenko Or Ozeri Orion Poplawski Pany Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peng Liang Peter Krempa Pino Toscano Pino Toscano Piotr Drąg Prathamesh Chavan Praveen K Paladugu Richard W.M. 
Jones Ricky Tigg Robin Lee Rohit Kumar Roman Bogorodskiy Roman Bolshakov Ryan Gahagan Ryan Schmidt Sam Hartman Scott Shambarger Sebastian Mitterle SeongHyun Jo Shalini Chellathurai Saroja Shaojun Yang
Re: [PATCH] MAINTAINERS: make Bertrand ARM maintainer
Hi,

> On 11 Feb 2022, at 08:22, Bertrand Marquis wrote:
>
> Hi Stefano,
>
> Thanks a lot :-)
>
>> On 10 Feb 2022, at 19:08, Stefano Stabellini wrote:
>>
>> Signed-off-by: Stefano Stabellini

Fixing typo introduced by spell checker:

> Asked-by: Bertrand Marquis

Acked-by: Bertrand Marquis

Bertrand

> Cheers
> Bertrand
>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 64934cc070..d41572128b 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -230,8 +230,8 @@ F: tools/libs/ctrl/xc_arinc653.c
>>  ARM (W/ VIRTUALISATION EXTENSIONS) ARCHITECTURE
>>  M: Stefano Stabellini
>>  M: Julien Grall
>> +M: Bertrand Marquis
>>  R: Volodymyr Babchuk
>> -R: Bertrand Marquis
>>  S: Supported
>>  L: xen-devel@lists.xenproject.org
>>  F: docs/misc/arm/
Re: [RFC] Avoid dom0/HVM performance penalty from MSR access tightening
On Thu, Feb 10, 2022 at 11:27:15AM -0600, Alex Olson wrote: > I'm seeing strange performance issues under Xen on a Supermicro server with a > Xeon D-1541 CPU caused by an MSR-related commit. > > Commit 322ec7c89f6640ee2a99d1040b6f786cf04872cf 'x86/pv: disallow access to > unknown MSRs' > surprisingly introduces a severe performance penality where dom0 has about > 1/8th > the normal CPU performance. Even even when 'xenpm' is used to select the > performance governor and operate the CPU at maximum frequency, actual CPU > performance is still 1/2 of normal (as well as using > "cpufreq=xen,performance"). > > The patch below fixes it but I don't fully understand why. > > Basically, when *reads* of MSR_IA32_THERM_CONTROL are blocked, dom0 and > guests (pinned to other CPUs) see the performance issues. You only mention MSR_IA32_THERM_CONTROL here... > For benchmarking purposes, I built a small C program that runs a "for > loop" > 4Billion iterations and timed its execution. In dom0, the > performance issues > also cause HVM guest startup time to go from 9-10 > seconds to almost 80 seconds. > > I assumed Xen was managing CPU frequency and thus blocking related MSR > access by dom0 (or any other domain). However, clearly something else > is happening and I don't understand why. > > I initially attempted to copy the same logic as the write MSR case. This > was effective at fixing the dom0 performance issue, but still left other > domains running at 1/2 speed. Hence, the change below has no access control. > > > If anyone has any insight as to what is really happening, I would be all ears > as I am unsure if the change below is a proper solution. 
> 
> Thanks
> 
> -Alex
> 
> ---
> ---
>  xen/arch/x86/pv/emul-priv-op.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> index 7f4279a051..f254479bda 100644
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -970,6 +970,18 @@ static int read_msr(unsigned int reg, uint64_t *val,
>          *val = 0;
>          return X86EMUL_OKAY;
> 
> +    /* being unable to read MSR_IA32_THERM_CONTROL seems to significantly
> +     * affect dom0 and thus HVM guest startup performance, as well as PVH
> +     * VMs.
> +     */
> +    case MSR_IA32_THERM_CONTROL:
> +    case MSR_IA32_ENERGY_PERF_BIAS:

...yet in the patch you also allow access to MSR_IA32_ENERGY_PERF_BIAS, which
makes me wonder whether MSR_IA32_THERM_CONTROL is the only required one.

It could help to post the full logs: Xen + Linux dmesgs.

Is this reproducible with different Linux versions?

Thanks, Roger.
Re: [PATCH] MAINTAINERS: make Bertrand ARM maintainer
Hi Stefano,

Thanks a lot :-)

> On 10 Feb 2022, at 19:08, Stefano Stabellini wrote:
> 
> Signed-off-by: Stefano Stabellini

Asked-by: Bertrand Marquis

Cheers
Bertrand

> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 64934cc070..d41572128b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -230,8 +230,8 @@ F: tools/libs/ctrl/xc_arinc653.c
>  ARM (W/ VIRTUALISATION EXTENSIONS) ARCHITECTURE
>  M: Stefano Stabellini
>  M: Julien Grall
> +M: Bertrand Marquis
>  R: Volodymyr Babchuk
> -R: Bertrand Marquis
>  S: Supported
>  L: xen-devel@lists.xenproject.org
>  F: docs/misc/arm/
Re: [RFC v2 1/8] xen/hypfs: support for nested dynamic hypfs nodes
Hi Juergen,

On Thu, Feb 10, 2022 at 08:34:08AM +0100, Juergen Gross wrote:
> On 08.02.22 19:00, Oleksii Moisieiev wrote:
> > Add new API:
> > - hypfs_read_dyndir_entry
> > - hypfs_gen_dyndir_entry
> > which are an extension of the dynamic hypfs nodes support presented in
> > 0b3b53be8cf226d947a79c2535a9efbb2dd7bc38.
> > This allows nested dynamic nodes to be added. Also the input parameter is
> > a hypfs_entry, so properties can also be generated dynamically.
> > 
> > Generating a mixed list of dirs and properties is also supported.
> > Same as the dynamic hypfs nodes, this is anchored in a percpu pointer,
> > which can be retrieved on any level of the dynamic entries.
> > This handle should be allocated in the enter() callback and released in
> > the exit() callback. When using nested dynamic dirs and properties the
> > handle should be allocated on the first enter() call and released on the
> > last exit() call.
> > 
> > Signed-off-by: Oleksii Moisieiev
> > ---
> >  xen/common/hypfs.c      | 83 +
> >  xen/include/xen/hypfs.h | 14 ++-
> >  2 files changed, 79 insertions(+), 18 deletions(-)
> > 
> > diff --git a/xen/common/hypfs.c b/xen/common/hypfs.c
> > index e71f7df479..6901f5e311 100644
> > --- a/xen/common/hypfs.c
> > +++ b/xen/common/hypfs.c
> > @@ -367,28 +367,27 @@ unsigned int hypfs_getsize(const struct hypfs_entry *entry)
> >  /*
> >   * Fill the direntry for a dynamically generated entry. Especially the
> > - * generated name needs to be kept in sync with hypfs_gen_dyndir_id_entry().
> > + * generated name needs to be kept in sync with hypfs_gen_dyndir_entry().
> >   */
> > -int hypfs_read_dyndir_id_entry(const struct hypfs_entry_dir *template,
> > -                               unsigned int id, bool is_last,
> > +int hypfs_read_dyndir_entry(const struct hypfs_entry *template,
> > +                            const char *name, unsigned int namelen,
> > +                            bool is_last,
> >                                 XEN_GUEST_HANDLE_PARAM(void) *uaddr)

Please fix the indentation of the parameters.
> >  {
> >      struct xen_hypfs_dirlistentry direntry;
> > -    char name[HYPFS_DYNDIR_ID_NAMELEN];
> > -    unsigned int e_namelen, e_len;
> > +    unsigned int e_len;
> > 
> > -    e_namelen = snprintf(name, sizeof(name), template->e.name, id);
> > -    e_len = DIRENTRY_SIZE(e_namelen);
> > +    e_len = DIRENTRY_SIZE(namelen);
> > 
> >      direntry.e.pad = 0;
> > -    direntry.e.type = template->e.type;
> > -    direntry.e.encoding = template->e.encoding;
> > -    direntry.e.content_len = template->e.funcs->getsize(&template->e);
> > -    direntry.e.max_write_len = template->e.max_size;
> > +    direntry.e.type = template->type;
> > +    direntry.e.encoding = template->encoding;
> > +    direntry.e.content_len = template->funcs->getsize(template);
> > +    direntry.e.max_write_len = template->max_size;
> >      direntry.off_next = is_last ? 0 : e_len;
> >      if ( copy_to_guest(*uaddr, &direntry, 1) )
> >          return -EFAULT;
> >      if ( copy_to_guest_offset(*uaddr, DIRENTRY_NAME_OFF, name,
> > -                              e_namelen + 1) )
> > +                              namelen + 1) )
> >          return -EFAULT;
> > 
> >      guest_handle_add_offset(*uaddr, e_len);
> > @@ -396,6 +395,22 @@ int hypfs_read_dyndir_id_entry(const struct hypfs_entry_dir *template,
> >      return 0;
> >  }
> > 
> > +/*
> > + * Fill the direntry for a dynamically generated entry. Especially the
> > + * generated name needs to be kept in sync with hypfs_gen_dyndir_id_entry().
> > + */
> > +int hypfs_read_dyndir_id_entry(const struct hypfs_entry_dir *template,
> > +                               unsigned int id, bool is_last,
> > +                               XEN_GUEST_HANDLE_PARAM(void) *uaddr)
> > +{
> > +    char name[HYPFS_DYNDIR_ID_NAMELEN];
> > +    unsigned int e_namelen;
> > +
> > +    e_namelen = snprintf(name, sizeof(name), template->e.name, id);
> > +    return hypfs_read_dyndir_entry(&template->e, name, e_namelen, is_last, uaddr);
> > +}
> > +
> > +
> >  static const struct hypfs_entry *hypfs_dyndir_enter(
> >      const struct hypfs_entry *entry)
> >  {
> > @@ -404,7 +419,7 @@ static const struct hypfs_entry *hypfs_dyndir_enter(
> >      data = hypfs_get_dyndata();
> > 
> >      /* Use template with original enter function.
> >       */
> > -    return data->template->e.funcs->enter(&data->template->e);
> > +    return data->template->funcs->enter(data->template);
> >  }
> > 
> >  static struct hypfs_entry *hypfs_dyndir_findentry(
> > @@ -415,7 +430,7 @@ static struct hypfs_entry *hypfs_dyndir_findentry(
> >      data = hypfs_get_dyndata();
> > 
> >      /* Use template with original findentry function. */
> > -    return data->template->e.funcs->findentry(data->template, name, name_len);
> > +    return data->template->funcs->findentry(&data->dir, name, name_len);
> >  }
> > 
> >  static int hypfs_read_dyndir(const struct hypfs_entry *entry,
> > @@ -426,7 +441,36
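The allocation rule in the commit message — allocate the handle on the first enter() call and release it on the last exit() call — amounts to depth counting. An illustrative plain-C sketch of that rule (this is not Xen code; all names here are hypothetical and stand in for the percpu dyndata handling):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch of the nesting rule described in the commit message: with
 * nested dynamic dirs/properties, the handle is allocated on the first
 * (outermost) enter() and released on the last (outermost) exit().
 */
struct dyn_handle {
    unsigned int depth;   /* nesting level of enter() calls */
    void *data;           /* whatever the dynamic entries need */
};

struct dyn_handle *handle;   /* stand-in for the percpu pointer */

int dyn_enter(void)
{
    if ( !handle )
    {
        /* First enter(): allocate the shared handle. */
        handle = calloc(1, sizeof(*handle));
        if ( !handle )
            return -1;       /* would be -ENOMEM in the hypervisor */
    }
    handle->depth++;
    return 0;
}

void dyn_exit(void)
{
    assert(handle && handle->depth);
    if ( --handle->depth == 0 )
    {
        /* Last exit(): release the handle. */
        free(handle);
        handle = NULL;
    }
}
```

A nested traversal then pairs each enter() with an exit(); only the outermost pair actually allocates and frees, so intermediate levels can retrieve the same handle.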