[ovmf test] 186358: all pass - PUSHED
flight 186358 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186358/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 cf323e2839ce260fde43487baae205527dee1b2f
baseline version:
 ovmf                 5e776299a2604b336a947e68593012ab2cc16eb4

Last test of basis   186352  2024-06-14 13:42:52 Z    0 days
Testing same since   186358  2024-06-15 04:11:23 Z    0 days    1 attempts

People who touched revisions under test:
  Ard Biesheuvel
  Leif Lindholm
  Pierre Gondois

jobs:
 build-amd64-xsm                                      pass
 build-i386-xsm                                       pass
 build-amd64                                          pass
 build-i386                                           pass
 build-amd64-libvirt                                  pass
 build-i386-libvirt                                   pass
 build-amd64-pvops                                    pass
 build-i386-pvops                                     pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                 pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   5e776299a2..cf323e2839  cf323e2839ce260fde43487baae205527dee1b2f -> xen-tested-master
[linux-linus test] 186354: tolerable FAIL - PUSHED
flight 186354 linux-linus real [real]
flight 186357 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/186354/
http://logs.test-lab.xenproject.org/osstest/logs/186357/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-qemuu-freebsd12-amd64  21 guest-start/freebsd.repeat  fail pass in 186357-retest
 test-armhf-armhf-examine                 8 reboot                      fail pass in 186357-retest
 test-armhf-armhf-xl-multivcpu            8 xen-boot                    fail pass in 186357-retest
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  12 debian-hvm-install    fail pass in 186357-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check      fail in 186357 never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check  fail in 186357 never pass
 test-armhf-armhf-xl-raw         8 xen-boot                   fail  like 186314
 test-armhf-armhf-libvirt       16 saverestore-support-check  fail  like 186314
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop           fail  like 186314
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop           fail  like 186314
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop           fail  like 186314
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop           fail  like 186314
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2 fail  like 186314
 test-amd64-amd64-libvirt       15 migrate-support-check      fail   never pass
 test-amd64-amd64-libvirt-xsm   15 migrate-support-check      fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check fail never pass
 test-arm64-arm64-xl            15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl            16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-thunderx   15 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-xsm   15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-thunderx   16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-credit1    15 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-xsm   16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-credit1    16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-xsm        15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-xsm        16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-credit2    15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-credit2    16 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl            15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl            16 saverestore-support-check  fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check      fail   never pass
 test-amd64-amd64-libvirt-raw   14 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-credit2    15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-rtds       15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-rtds       16 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-credit2    16 saverestore-support-check  fail   never pass
 test-armhf-armhf-libvirt       15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-credit1    15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-credit1    16 saverestore-support-check  fail   never pass
 test-amd64-amd64-libvirt-vhd   14 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-raw   14 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-raw   15 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-qcow2      14 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-qcow2      15 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-vhd        14 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-vhd        15 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-arndale    15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-arndale    16 saverestore-support-check  fail   never pass
 test-armhf-armhf-libvirt-vhd   14 migrate-support-check      fail   never pass
 test-armhf-armhf-libvirt-vhd   15 saverestore-support-check  fail   never pass

version targeted for testing:
 linux                0cac73eb3875f6ecb6105e533218dba1868d04c9
baseline version:
 linux                2ef5971ff345d3c000873725db555085e0131961

Last test of basis   186314  2024-06-12 00:10:33 Z    3 days
Failing since        186324  2024-06-12 17:12:12 Z    2 days    5 attempts
Testing same since   186354  2024-06-14 18:42:32 Z    0 days    1 attempts

People who touched revisions under test:
  "Csókás, Bence"
  Aleksandr Mishin
  Andrei Vagin
  Andy Shevchenko
  Ard Biesheuvel
  Borislav Petkov
[xen-unstable test] 186353: tolerable FAIL - PUSHED
flight 186353 xen-unstable real [real]
flight 186355 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/186353/
http://logs.test-lab.xenproject.org/osstest/logs/186355/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-arndale  10 host-ping-check-xen  fail pass in 186355-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale    15 migrate-support-check      fail in 186355 never pass
 test-armhf-armhf-xl-arndale    16 saverestore-support-check  fail in 186355 never pass
 test-armhf-armhf-libvirt       16 saverestore-support-check  fail  like 186344
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop           fail  like 186344
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop           fail  like 186344
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop           fail  like 186344
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop           fail  like 186344
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2 fail like 186344
 test-amd64-amd64-libvirt-xsm   15 migrate-support-check      fail   never pass
 test-amd64-amd64-libvirt       15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl            15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl            16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-credit1    15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-credit1    16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-thunderx   15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-xsm        15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-thunderx   16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-xsm        16 saverestore-support-check  fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2    15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-credit2    16 saverestore-support-check  fail   never pass
 test-arm64-arm64-libvirt-xsm   15 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-xsm   16 saverestore-support-check  fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check      fail   never pass
 test-amd64-amd64-libvirt-raw   14 migrate-support-check      fail   never pass
 test-armhf-armhf-libvirt       15 migrate-support-check      fail   never pass
 test-amd64-amd64-libvirt-vhd   14 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-raw   14 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-raw   15 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-vhd        14 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-vhd        15 saverestore-support-check  fail   never pass
 test-armhf-armhf-libvirt-vhd   14 migrate-support-check      fail   never pass
 test-armhf-armhf-libvirt-vhd   15 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl            15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-credit1    15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-credit1    16 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl            16 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-rtds       15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-rtds       16 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-credit2    15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-credit2    16 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-qcow2      14 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-qcow2      15 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-raw        14 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-raw        15 saverestore-support-check  fail   never pass

version targeted for testing:
 xen                  8b4243a9b560c89bb259db5a27832c253d4bebc7
baseline version:
 xen                  4fdd8d75566fdad06667a79ec0ce6f43cc466c54

Last test of basis   186344  2024-06-14 05:48:54 Z    0 days
Testing same since   186353  2024-06-14 15:10:38 Z    0 days    1 attempts

People who touched revisions under test:
  Luca Fancellu
  Penny Zheng

jobs:
 build-amd64-xsm                                      pass
 build-arm64-xsm                                      pass
 build-i386-xsm                                       pass
 build-amd64-xtf                                      pass
 build-amd64
Re: Design session notes: GPU acceleration in Xen
On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> > GPU acceleration requires that pageable host memory be able to be mapped
> > into a guest.
>
> I'm sure it was explained in the session, which sadly I couldn't attend.
> I've been asking Ray and Xenia the same before, but I'm afraid it still
> hasn't become clear to me why this is a _requirement_. After all that's
> against what we're doing elsewhere (i.e. so far it has always been
> guest memory that's mapped in the host). I can appreciate that it might
> be more difficult to implement, but avoiding to violate this fundamental
> (kind of) rule might be worth the price (and would avoid other
> complexities, of which there may be lurking more than what you enumerate
> below).

The GPU driver knows how to allocate buffers that are usable by the GPU.
On a discrete GPU, these buffers will generally be in VRAM rather than in
system RAM, because access to system RAM requires going through the PCI
bus (slow). However, VRAM is a limited resource, so the driver will
migrate pages between VRAM and system RAM as needed. During the
migration, a guest that tries to access the pages must block until the
migration is complete.

Some GPU drivers support accessing externally provided memory. This is
called "userptr", and is supported by i915 and amdgpu. However, it
appears that some other drivers (such as MSM) do not support it, and
since GPUs with VRAM need to be supported anyway, Xen still needs to
support GPU driver-allocated memory.

I also CCd dri-de...@lists.freedesktop.org and the general GPU driver
maintainers in Linux in case they can give a better answer, as well as
Rob Clark, who invented native contexts.

> > This requires changes to all of the Xen hypervisor, Linux
> > kernel, and userspace device model.
> >
> > ### Goals
> >
> > - Allow any userspace pages to be mapped into a guest.
> > - Support deprivileged operation: this API must not be usable for
> >   privilege escalation.
> > - Use MMU notifiers to ensure safety with respect to use-after-free.
> >
> > ### Hypervisor changes
> >
> > There are at least two Xen changes required:
> >
> > 1. Add a new flag to IOREQ that means "retry this instruction".
> >
> >    An IOREQ server can set this flag after having successfully handled a
> >    page fault. It is expected that the IOREQ server has successfully
> >    mapped a page into the guest at the location of the fault.
> >    Otherwise, the same fault will likely happen again.
>
> Were there any thoughts on how to prevent this becoming an infinite loop?
> I.e. how to (a) guarantee forward progress in the guest and (b) deal with
> misbehaving IOREQ servers?

Guaranteeing forward progress is up to the IOREQ server. If the IOREQ
server misbehaves, an infinite loop is possible, but the CPU time used by
it should be charged to the IOREQ server, so this isn't a vulnerability.

> > 2. Add support for `XEN_DOMCTL_memory_mapping` to use system RAM, not
> >    just IOMEM. Mappings made with `XEN_DOMCTL_memory_mapping` are
> >    guaranteed to be able to be successfully revoked with
> >    `XEN_DOMCTL_memory_mapping`, so all operations that would create
> >    extra references to the mapped memory must be forbidden. These
> >    include, but may not be limited to:
> >
> >    1. Granting the pages to the same or other domains.
> >    2. Mapping into another domain using `XEN_DOMCTL_memory_mapping`.
> >    3. Another domain accessing the pages using the foreign memory APIs,
> >       unless it is privileged over the domain that owns the pages.
>
> All of which may call for actually converting the memory to kind-of-MMIO,
> with a means to later convert it back.

Would this support the case where the mapping domain is not fully
privileged, and where it might be a PV guest?
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
[linux-linus test] 186345: regressions - FAIL
flight 186345 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186345/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf                     6 xen-build        fail in 186342 REGR. vs. 186314

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  12 debian-hvm-install  fail pass in 186342

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2     1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-credit1     1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-arndale     1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-raw         1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-libvirt        1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-qcow2       1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl             1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-rtds        1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-examine        1 build-check(1)   blocked in 186342 n/a
 build-armhf-libvirt             1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-libvirt-vhd    1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-multivcpu   1 build-check(1)   blocked in 186342 n/a
 test-armhf-armhf-xl-rtds        8 xen-boot                   fail  like 186314
 test-armhf-armhf-libvirt       16 saverestore-support-check  fail  like 186314
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop           fail  like 186314
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop           fail  like 186314
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop           fail  like 186314
 test-armhf-armhf-xl-arndale     8 xen-boot                   fail  like 186314
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop           fail  like 186314
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2 fail like 186314
 test-amd64-amd64-libvirt       15 migrate-support-check      fail   never pass
 test-amd64-amd64-libvirt-xsm   15 migrate-support-check      fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check fail never pass
 test-arm64-arm64-xl            15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl            16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-thunderx   15 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-xsm   15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-thunderx   16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-credit1    15 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-xsm   16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-credit1    16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-xsm        15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-xsm        16 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-credit2    15 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-credit2    16 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl            15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl            16 saverestore-support-check  fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check      fail   never pass
 test-amd64-amd64-libvirt-raw   14 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-credit2    15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-credit2    16 saverestore-support-check  fail   never pass
 test-armhf-armhf-libvirt       15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-credit1    15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-credit1    16 saverestore-support-check  fail   never pass
 test-amd64-amd64-libvirt-vhd   14 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-raw   14 migrate-support-check      fail   never pass
 test-arm64-arm64-libvirt-raw   15 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-qcow2      14 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-qcow2      15 saverestore-support-check  fail   never pass
 test-arm64-arm64-xl-vhd        14 migrate-support-check      fail   never pass
 test-arm64-arm64-xl-vhd        15 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-raw        14 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-raw        15 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check  fail   never pass
 test-armhf-armhf-libvirt-vhd   14 migrate-support-check      fail   never pass
Re: [PATCH 3/7] x86/boot: Collect the Raw CPU Policy earlier on boot
On 23/05/2024 4:44 pm, Jan Beulich wrote:
> On 23.05.2024 13:16, Andrew Cooper wrote:
>> This is a tangle, but it's a small step in the right direction.
>>
>> xstate_init() is shortly going to want data from the Raw policy.
>> calculate_raw_cpu_policy() is sufficiently separate from the other
>> policies to be safe to do.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper
> Would you mind taking a look at
> https://lists.xen.org/archives/html/xen-devel/2021-04/msg01335.html
> to make clear (to me at least) in how far we can perhaps find common
> grounds on what wants doing when? (Of course the local version I have has
> been constantly re-based, so some of the function names would have
> changed from what's visible there.)

That's been covered several times, at least in part.

I want to eventually move the host policy too, but I'm not willing to
compound the mess we've currently got just to do it earlier. It's just
creating even more obstacles to doing it nicely.

Nothing in this series needs (or indeed should) use the host policy. The
same is true of your AMX series. You're (correctly) breaking the uniform
allocation size and (when policy selection is ordered WRT vCPU creation,
as discussed) it becomes solely dependent on the guest policy.

xsave.c really has no legitimate use for the host policy once the
uniform allocation size aspect has gone away.

>> --- a/xen/arch/x86/cpu-policy.c
>> +++ b/xen/arch/x86/cpu-policy.c
>> @@ -845,7 +845,6 @@ static void __init calculate_hvm_def_policy(void)
>>
>>  void __init init_guest_cpu_policies(void)
>>  {
>> -    calculate_raw_cpu_policy();
>>      calculate_host_policy();
>>
>>      if ( IS_ENABLED(CONFIG_PV) )
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -1888,7 +1888,9 @@ void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
>>
>>      tsx_init(); /* Needs microcode.  May change HLE/RTM feature bits. */
>>
>> -    identify_cpu(&boot_cpu_data);
>> +    calculate_raw_cpu_policy(); /* Needs microcode.  No other dependencies. */
>> +
>> +    identify_cpu(&boot_cpu_data); /* Needs microcode and raw policy. */
> You don't introduce any dependency on raw policy here, and there cannot
> possibly have been such a dependency before (unless there was a bug
> somewhere). Therefore I consider this latter comment misleading at this
> point.

It's made true by the next patch, and I'm not inclined to split the
comment across two patches which are going to be committed together in a
unit.

~Andrew
Re: [PATCH] MAINTAINERS: add me as scheduler maintainer
On Mon, 2024-06-10 at 13:35 +0200, Jan Beulich wrote:
> George, Dario,
>
> On 06.06.2024 07:47, Juergen Gross wrote:
> > I've been active in the scheduling code for many years now. Add
> > me as a maintainer.
> >
> > Signed-off-by: Juergen Gross
> > ---
> >  MAINTAINERS | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 6ba7d2765f..cc40c0be9d 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -490,6 +490,7 @@ F: xen/common/sched/rt.c
> >  SCHEDULING
> >  M: George Dunlap
> >  M: Dario Faggioli
> > +M: Juergen Gross
> >  S: Supported
> >  F: xen/common/sched/
>
> no matter what get-maintainers.pl may say for changes here, I think
> it's largely on the two of you to approve this addition.

Well, I for sure could not approve more, and I'm super happy to provide
my:

Acked-by: Dario Faggioli

And thanks a lot again, Juergen, for offering help!

Regards,
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)
Re: Design session notes: GPU acceleration in Xen
On Fri, Jun 14, 2024 at 09:21:56AM +0200, Roger Pau Monné wrote:
> On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> > On 13.06.2024 20:43, Demi Marie Obenour wrote:
> > > GPU acceleration requires that pageable host memory be able to be
> > > mapped into a guest.
> >
> > I'm sure it was explained in the session, which sadly I couldn't attend.
> > I've been asking Ray and Xenia the same before, but I'm afraid it still
> > hasn't become clear to me why this is a _requirement_. After all that's
> > against what we're doing elsewhere (i.e. so far it has always been
> > guest memory that's mapped in the host). I can appreciate that it might
> > be more difficult to implement, but avoiding to violate this fundamental
> > (kind of) rule might be worth the price (and would avoid other
> > complexities, of which there may be lurking more than what you
> > enumerate below).
>
> My limited understanding (please someone correct me if wrong) is that
> the GPU buffer (or context I think it's also called?) is always
> allocated from dom0 (the owner of the GPU).

A GPU context is a GPU address space. It's the GPU equivalent of a CPU
process. I don't believe that the same context can be used by more than
one userspace process (though I could be wrong), but the same userspace
process can create and use as many contexts as it wants.

> The underlying memory
> addresses of such buffer need to be mapped into the guest. The
> buffer backing memory might be GPU MMIO from the device BAR(s) or
> system RAM, and such buffer can be paged by the dom0 kernel at any
> time (iow: changing the backing memory from MMIO to RAM or vice
> versa). Also, the buffer must be contiguous in physical address
> space.
>
> I'm not sure it's possible to ensure that when using system RAM such
> memory comes from the guest rather than the host, as it would likely
> require some very intrusive hooks into the kernel logic, and
> negotiation with the guest to allocate the requested amount of
> memory and hand it over to dom0. If the maximum size of the buffer is
> known in advance maybe dom0 can negotiate with the guest to allocate
> such a region and grant it access to dom0 at driver attachment time.

I don't think there is a useful maximum size known. There may be a
limit, but it would be around 4GiB or more, which is far too high to
reserve physical memory for up front.

> One aspect that I'm lacking clarity on is a better understanding of how
> the process of allocating and assigning a GPU buffer to a guest is
> performed (I think this is the key to how GPU VirtIO native contexts
> work?).

The buffer is allocated by the GPU driver in response to an ioctl() made
by the userspace server process. If the buffer needs to be accessed by
the guest CPU (not all do), it is mapped into part of an emulated PCI
BAR for access by the guest. This mailing list thread is about making
that possible.

> Another question I have: are guests expected to have a single GPU
> buffer, or can they have multiple GPU buffers simultaneously
> allocated?

I believe there is only one emulated BAR, but this is very large (GiBs)
and sparsely populated. There can be many GPU buffers mapped into the
BAR.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
for_each_set_bit() clean-up (API RFC)
More fallout from looking at the code generation...

for_each_set_bit() forces its bitmap parameter out into memory. For an
arbitrary-sized bitmap, this is fine - and likely preferable, as it's in
memory to begin with.

However, more than half the current users of for_each_set_bit() are
operating over a single int/long, and this too is spilled to the stack.
Worse, x86 seems to be the only architecture which tries (but not very
well) to optimise find_{first,next}_bit() for GPR-sized quantities,
meaning that for_each_set_bit() hides 2 backing function calls. The ARM
(v)GIC code in particular suffers horribly because of this.

We also have several interesting opencoded forms:

 * evtchn_check_pollers() is a (preprocessor identical) opencoding.
 * hvm_emulate_writeback() is equivalent.
 * for_each_vp() exists just to hardcode a constant and swap the other
   two parameters.

and several other forms which I think could be expressed more cleanly as
for_each_set_bit(). We also have the while()/ffs() forms which are
"just" for_each_set_bit(), and some even manage to not spill their main
variable to memory.

I want to get to a position where there is one clear API to use, and one
that the compiler will handle nicely. Xen's code generation will
definitely improve as a consequence.

Sadly, transforming the ideal while()/ffs() form into a for() loop is a
bit tricky. This works:

    for ( unsigned int v = (val), (bit); v; v &= v - 1 )
        if ( 1 )
        {
            (bit) = ffs(v) - 1;
            goto body;
        }
        else
        body:

which is a C metaprogramming trick borrowed from PuTTY to make:

    for_each_BLAH ( bit, val )
    {
        // nice loop body
    }

work, while having the ffs() calculated logically within the loop body.

The first issue I expect people to have with the above is the raw 'body'
label, although with a macro that can be fixed using body_ ## __COUNTER__.

A full example is https://godbolt.org/z/oMGfah696 although a real
example in Xen is going to have to be variadic for at least ffs() and
ffsl().

Now, from an API point of view, it would be lovely if we could make a
single for_each_set_bit() which covers both cases, and while I can
distinguish the two forms by whether there are 2 or 3 args, I expect
MISRA is going to have a fit at that. Also, there's a difference based
on the scope of 'bit', and also whether modifications to 'val' in the
loop body take effect on the loop condition (they don't, because a copy
is taken). So I expect everyone is going to want a new API to use here.

But what to call it? More than half of the callers in Xen really want
the GPR form, so we could introduce a new bitmap_for_each_set_bit(),
move all the callers over, then introduce a "new" for_each_set_bit()
which is only of the GPR form.

Or does anyone want to suggest an alternative name?

Thoughts?

~Andrew
Re: Design session notes: GPU acceleration in Xen
On Fri, Jun 14, 2024 at 10:12:40AM +0200, Jan Beulich wrote:
> On 14.06.2024 09:21, Roger Pau Monné wrote:
> > On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> >> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> >>> GPU acceleration requires that pageable host memory be able to be
> >>> mapped into a guest.
> >>
> >> I'm sure it was explained in the session, which sadly I couldn't attend.
> >> I've been asking Ray and Xenia the same before, but I'm afraid it still
> >> hasn't become clear to me why this is a _requirement_. After all that's
> >> against what we're doing elsewhere (i.e. so far it has always been
> >> guest memory that's mapped in the host). I can appreciate that it might
> >> be more difficult to implement, but avoiding to violate this fundamental
> >> (kind of) rule might be worth the price (and would avoid other
> >> complexities, of which there may be lurking more than what you
> >> enumerate below).
> >
> > My limited understanding (please someone correct me if wrong) is that
> > the GPU buffer (or context I think it's also called?) is always
> > allocated from dom0 (the owner of the GPU). The underling memory
> > addresses of such buffer needs to be mapped into the guest. The
> > buffer backing memory might be GPU MMIO from the device BAR(s) or
> > system RAM, and such buffer can be paged by the dom0 kernel at any
> > time (iow: changing the backing memory from MMIO to RAM or vice
> > versa). Also, the buffer must be contiguous in physical address
> > space.
>
> This last one in particular would of course be a severe restriction.
> Yet: There's an IOMMU involved, isn't there?

On x86 there is. On Arm there might or might not be. There are
non-embedded systems (such as Apple silicon) where the GPU is not behind
an IOMMU, for performance reasons IIUC.

> > I'm not sure it's possible to ensure that when using system RAM such
> > memory comes from the guest rather than the host, as it would likely
> > require some very intrusive hooks into the kernel logic, and
> > negotiation with the guest to allocate the requested amount of
> > memory and hand it over to dom0. If the maximum size of the buffer is
> > known in advance maybe dom0 can negotiate with the guest to allocate
> > such a region and grant it access to dom0 at driver attachment time.
>
> Besides the thought of transiently converting RAM to kind-of-MMIO, this
> makes me think of another possible option: Could Dom0 transfer ownership
> of the RAM that wants mapping in the guest (remotely resembling
> grant-transfer)? Would require the guest to have ballooned down enough
> first, of course. (In both cases it would certainly need working out how
> the conversion / transfer back could be made work safely and reasonably
> cleanly.)
>
> Jan

The kernel driver needs to be able to reclaim the memory at any time. My
understanding is that this is used to migrate memory between VRAM and
system RAM. It might also be used for other purposes.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
Re: convert the SCSI ULDs to the atomic queue limits API v2
On Fri, 31 May 2024 09:47:55 +0200, Christoph Hellwig wrote:
> this series converts the SCSI upper level drivers to the atomic queue
> limits API.
>
> The first patch is a bug fix for ubd that later patches depend on and
> might be worth picking up for 6.10.
>
> The second patch changes the max_sectors calculation to take the optimal
> I/O size into account so that sd, nbd and rbd don't have to mess with
> the user max_sector value. I'd love to see a careful review from the
> nbd and rbd maintainers for this one!
>
> [...]

Applied, thanks!

[01/14] ubd: refactor the interrupt handler
        commit: 5db755fbb1a0de4a4cfd5d5edfaa19853b9c56e6
[02/14] ubd: untagle discard vs write zeroes not support handling
        commit: 31ade7d4fdcf382beb8cb229a1f5d77e0f239672
[03/14] rbd: increase io_opt again
        commit: a00d4bfce7c6d7fa4712b8133ec195c9bd142ae6
[04/14] block: take io_opt and io_min into account for max_sectors
        commit: a23634644afc2f7c1bac98776440a1f3b161819e
[05/14] sd: simplify the ZBC case in provisioning_mode_store
        commit: b3491b0db165c0cbe25874da66d97652c03db654
[06/14] sd: add a sd_disable_discard helper
        commit: b0dadb86a90bd5a7b723f9d3a6cf69da9b596496
[07/14] sd: add a sd_disable_write_same helper
        commit: 9972b8ce0d4ba373901bdd1e15e4de58fcd7f662
[08/14] sd: simplify the disable case in sd_config_discard
        commit: d15b9bd42cd3b2077812d4bf32f532a9bd5c4914
[09/14] sd: factor out a sd_discard_mode helper
        commit: f1e8185fc12c699c3abf4f39b1ff5d7793da3a95
[10/14] sd: cleanup zoned queue limits initialization
        commit: 9c1d339a1bf45f4d3a2e77bbf24b0ec51f02551c
[11/14] sd: convert to the atomic queue limits API
        commit: 804e498e0496d889090f32f812b5ce1a4f2aa63e
[12/14] sr: convert to the atomic queue limits API
        commit: 969f17e10f5b732c05186ee0126c8a08166d0cda
[13/14] block: remove unused queue limits API
        commit: 1652b0bafeaa8281ca9a805d81e13d7647bd2f44
[14/14] block: add special APIs for run-time disabling of discard and friends
        commit: 73e3715ed14844067c5c598e72777641004a7f60

Best regards,
--
Jens Axboe
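[Editorial note: the effect of taking the optimal I/O size into account for the max_sectors default, as described for patch 04/14 above, can be sketched roughly as follows. This is an illustrative model only — the function and parameter names are invented and this is not the actual block-layer code.]

```c
#include <assert.h>

/*
 * Rough sketch of the idea: if the device reports an optimal I/O size,
 * round the max_sectors cap down to a multiple of it so that large
 * requests stay aligned to the device's preferred granularity.
 * Illustrative only; names are invented.
 */
unsigned int cap_max_sectors(unsigned int max_hw_sectors,
                             unsigned int io_opt_sectors)
{
    unsigned int max = max_hw_sectors;

    if (io_opt_sectors && io_opt_sectors <= max)
        max -= max % io_opt_sectors;  /* round down to an io_opt boundary */

    return max;
}
```

With this shape, a driver such as sd or rbd no longer has to adjust the user-visible max_sectors value itself; the cap is derived once from the reported limits.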
[PATCH 1/2] x86/mm address violations of MISRA C:2012 Rule 5.3
This addresses violations of MISRA C:2012 Rule 5.3 which states as
following: An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope.

No functional change.

Signed-off-by: Alessandro Zucchelli
---
 xen/arch/x86/mm.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5471b6b1f2..720d56e103 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4703,7 +4703,7 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
         struct xen_foreign_memory_map fmap;
         struct domain *d;
-        struct e820entry *map;
+        struct e820entry *e;

         if ( copy_from_guest(&fmap, arg, 1) )
             return -EFAULT;
@@ -4722,23 +4722,23 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }

-        map = xmalloc_array(e820entry_t, fmap.map.nr_entries);
-        if ( map == NULL )
+        e = xmalloc_array(e820entry_t, fmap.map.nr_entries);
+        if ( e == NULL )
         {
             rcu_unlock_domain(d);
             return -ENOMEM;
         }

-        if ( copy_from_guest(map, fmap.map.buffer, fmap.map.nr_entries) )
+        if ( copy_from_guest(e, fmap.map.buffer, fmap.map.nr_entries) )
         {
-            xfree(map);
+            xfree(e);
             rcu_unlock_domain(d);
             return -EFAULT;
         }

         spin_lock(&d->arch.e820_lock);
         xfree(d->arch.e820);
-        d->arch.e820 = map;
+        d->arch.e820 = e;
         d->arch.nr_e820 = fmap.map.nr_entries;
         spin_unlock(&d->arch.e820_lock);
--
2.34.1
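[Editorial note: the class of issue Rule 5.3 targets can be shown with a minimal standalone example — all names below are invented for illustration. An inner-scope identifier hides a file-scope one, and the fix is simply to rename the inner identifier, just as the patch renames `map` to `e`.]

```c
#include <assert.h>

int count = 0;               /* file-scope identifier */

/* Violates MISRA C:2012 Rule 5.3: the parameter 'count' hides the
 * file-scope 'count', so the body cannot reach the global and a
 * reader can easily confuse the two. */
int add_bad(int count)
{
    return count + 1;        /* which 'count'? Only the parameter. */
}

/* Compliant: the inner identifier is renamed, nothing is hidden. */
int add_good(int n)
{
    count += n;              /* unambiguously the file-scope variable */
    return count;
}
```

The rule exists precisely because both functions compile cleanly; only the rename makes the intent unambiguous to a reviewer.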
[PATCH 2/2] x86/e820 address violations of MISRA C:2012 Rule 5.3
This addresses violations of MISRA C:2012 Rule 5.3 which states as
following: An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope.

No functional change.

Signed-off-by: Alessandro Zucchelli
---
 xen/arch/x86/e820.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index 6a3ce7e0a0..3726823e88 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -593,79 +593,79 @@ int __init e820_add_range(uint64_t s, uint64_t e, uint32_t type)
 }

 int __init e820_change_range_type(
-    struct e820map *e820, uint64_t s, uint64_t e,
+    struct e820map *map, uint64_t s, uint64_t e,
     uint32_t orig_type, uint32_t new_type)
 {
     uint64_t rs = 0, re = 0;
     unsigned int i;

-    for ( i = 0; i < e820->nr_map; i++ )
+    for ( i = 0; i < map->nr_map; i++ )
     {
         /* Have we found the e820 region that includes the specified range? */
-        rs = e820->map[i].addr;
-        re = rs + e820->map[i].size;
+        rs = map->map[i].addr;
+        re = rs + map->map[i].size;
         if ( (s >= rs) && (e <= re) )
             break;
     }

-    if ( (i == e820->nr_map) || (e820->map[i].type != orig_type) )
+    if ( (i == map->nr_map) || (map->map[i].type != orig_type) )
         return 0;

     if ( (s == rs) && (e == re) )
     {
-        e820->map[i].type = new_type;
+        map->map[i].type = new_type;
     }
     else if ( (s == rs) || (e == re) )
     {
-        if ( (e820->nr_map + 1) > ARRAY_SIZE(e820->map) )
+        if ( (map->nr_map + 1) > ARRAY_SIZE(map->map) )
             goto overflow;

-        memmove(&e820->map[i+1], &e820->map[i],
-                (e820->nr_map-i) * sizeof(e820->map[0]));
-        e820->nr_map++;
+        memmove(&map->map[i+1], &map->map[i],
+                (map->nr_map-i) * sizeof(map->map[0]));
+        map->nr_map++;

         if ( s == rs )
         {
-            e820->map[i].size = e - s;
-            e820->map[i].type = new_type;
-            e820->map[i+1].addr = e;
-            e820->map[i+1].size = re - e;
+            map->map[i].size = e - s;
+            map->map[i].type = new_type;
+            map->map[i+1].addr = e;
+            map->map[i+1].size = re - e;
         }
         else
         {
-            e820->map[i].size = s - rs;
-            e820->map[i+1].addr = s;
-            e820->map[i+1].size = e - s;
-            e820->map[i+1].type = new_type;
+            map->map[i].size = s - rs;
+            map->map[i+1].addr = s;
+            map->map[i+1].size = e - s;
+            map->map[i+1].type = new_type;
         }
     }
     else
     {
-        if ( (e820->nr_map + 2) > ARRAY_SIZE(e820->map) )
+        if ( (map->nr_map + 2) > ARRAY_SIZE(map->map) )
             goto overflow;

-        memmove(&e820->map[i+2], &e820->map[i],
-                (e820->nr_map-i) * sizeof(e820->map[0]));
-        e820->nr_map += 2;
+        memmove(&map->map[i+2], &map->map[i],
+                (map->nr_map-i) * sizeof(map->map[0]));
+        map->nr_map += 2;

-        e820->map[i].size = s - rs;
-        e820->map[i+1].addr = s;
-        e820->map[i+1].size = e - s;
-        e820->map[i+1].type = new_type;
-        e820->map[i+2].addr = e;
-        e820->map[i+2].size = re - e;
+        map->map[i].size = s - rs;
+        map->map[i+1].addr = s;
+        map->map[i+1].size = e - s;
+        map->map[i+1].type = new_type;
+        map->map[i+2].addr = e;
+        map->map[i+2].size = re - e;
     }

     /* Finally, look for any opportunities to merge adjacent e820 entries. */
-    for ( i = 0; i < (e820->nr_map - 1); i++ )
+    for ( i = 0; i < (map->nr_map - 1); i++ )
     {
-        if ( (e820->map[i].type != e820->map[i+1].type) ||
-             ((e820->map[i].addr + e820->map[i].size) != e820->map[i+1].addr) )
+        if ( (map->map[i].type != map->map[i+1].type) ||
+             ((map->map[i].addr + map->map[i].size) != map->map[i+1].addr) )
             continue;

-        e820->map[i].size += e820->map[i+1].size;
-        memmove(&e820->map[i+1], &e820->map[i+2],
-                (e820->nr_map-i-2) * sizeof(e820->map[0]));
-        e820->nr_map--;
+        map->map[i].size += map->map[i+1].size;
+        memmove(&map->map[i+1], &map->map[i+2],
+                (map->nr_map-i-2) * sizeof(map->map[0]));
+        map->nr_map--;
         i--;
     }

@@ -678,9 +678,9 @@ int __init e820_change_range_type(
 }

 /* Set E820_RAM area (@s,@e) as RESERVED in specified e820 map. */
-int __init reserve_e820_ram(struct e820map *e820, uint64_t s, uint64_t e)
+int __init reserve_e820_ram(struct e820map *map, uint64_t s, uint64_t e)
 {
-    return e820_change_range_type(e820, s, e, E820_RAM, E820_RESERVED);
+    return e820_change_range_type(map, s, e, E820_RAM, E820_RESERVED);
 }

 unsigned long __init init_e820(const char *str, struct e820map *raw)
--
2.34.1
[PATCH 0/2] Address violations of MISRA C:2012 Rule 5.3
This addresses violations of MISRA C:2012 Rule 5.3 which states as
following: An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope.

This series modifies the files x86/mm.c and x86/e820.c, which contained
instances of variable names shadowing a global variable; the patches
remove those occurrences, leading to partial compliance with MISRA
C:2012 Rule 5.3.

No functional change.

Alessandro Zucchelli (2):
  x86/mm address violations of MISRA C:2012 Rule 5.3
  x86/e820 address violations of MISRA C:2012 Rule 5.3

 xen/arch/x86/e820.c | 74 ++---
 xen/arch/x86/mm.c   | 12
 2 files changed, 43 insertions(+), 43 deletions(-)

--
2.34.1
[ovmf test] 186352: all pass - PUSHED
flight 186352 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/186352/ Perfect :-) All tests in this flight passed as required version targeted for testing: ovmf 5e776299a2604b336a947e68593012ab2cc16eb4 baseline version: ovmf ce91687a1b2d4e03b01abb474b4665629776f588 Last test of basis 186346 2024-06-14 07:13:40 Z0 days Testing same since 186352 2024-06-14 13:42:52 Z0 days1 attempts People who touched revisions under test: Gerd Hoffmann jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/osstest/ovmf.git ce91687a1b..5e776299a2 5e776299a2604b336a947e68593012ab2cc16eb4 -> xen-tested-master
Re: [PATCH for-4.19] xen/arch: Centralise __read_mostly and __ro_after_init
On Fri, 2024-06-14 at 13:49 +0100, Andrew Cooper wrote:
> These being in cache.h is inherited from Linux, but is an inappropriate
> location to live.
>
> __read_mostly is an optimisation related to data placement in order to
> avoid having shared data in cachelines that are likely to be written to,
> but it really is just a section of the linked image separating data by
> usage patterns; it has nothing to do with cache sizes or flushing logic.
>
> Worse, __ro_after_init was only in xen/cache.h because __read_mostly was
> in arch/cache.h, and has literally nothing whatsoever to do with caches.
>
> Move the definitions into xen/sections.h, which in particular means that
> RISC-V doesn't need to repeat the problematic pattern. Take the
> opportunity to provide short descriptions of what these are used for.
>
> For now, leave TODO comments next to the other identical definitions.
> It turns out that unpicking cache.h is more complicated than it appears
> because a number of files use it for transitive dependencies.
>
> Signed-off-by: Andrew Cooper

Release-Acked-By: Oleksii Kurochko

> ---
> CC: Jan Beulich
> CC: Roger Pau Monné
> CC: Stefano Stabellini
> CC: Julien Grall
> CC: Volodymyr Babchuk
> CC: Bertrand Marquis
> CC: Michal Orzel
> CC: Oleksii Kurochko
> CC: Shawn Anastasio
>
> For 4.19. This is to help the RISC-V "full build of Xen" effort without
> introducing a pattern that's going to require extra effort to undo after
> the fact.
> ---
>  xen/arch/arm/include/asm/cache.h | 1 +
>  xen/arch/ppc/include/asm/cache.h | 1 +
>  xen/arch/x86/include/asm/cache.h | 1 +
>  xen/include/xen/cache.h          | 1 +
>  xen/include/xen/sections.h       | 21 +
>  5 files changed, 25 insertions(+)
>
> diff --git a/xen/arch/arm/include/asm/cache.h b/xen/arch/arm/include/asm/cache.h
> index 240b6ae0eaa3..029b2896fb3e 100644
> --- a/xen/arch/arm/include/asm/cache.h
> +++ b/xen/arch/arm/include/asm/cache.h
> @@ -6,6 +6,7 @@
>  #define L1_CACHE_SHIFT (CONFIG_ARM_L1_CACHE_SHIFT)
>  #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
>
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>
>  #endif
> diff --git a/xen/arch/ppc/include/asm/cache.h b/xen/arch/ppc/include/asm/cache.h
> index 0d7323d7892f..13c0bf3242c8 100644
> --- a/xen/arch/ppc/include/asm/cache.h
> +++ b/xen/arch/ppc/include/asm/cache.h
> @@ -3,6 +3,7 @@
>  #ifndef _ASM_PPC_CACHE_H
>  #define _ASM_PPC_CACHE_H
>
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>
>  #endif /* _ASM_PPC_CACHE_H */
> diff --git a/xen/arch/x86/include/asm/cache.h b/xen/arch/x86/include/asm/cache.h
> index e4770efb22b9..956c05493e23 100644
> --- a/xen/arch/x86/include/asm/cache.h
> +++ b/xen/arch/x86/include/asm/cache.h
> @@ -9,6 +9,7 @@
>  #define L1_CACHE_SHIFT (CONFIG_X86_L1_CACHE_SHIFT)
>  #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
>
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>
>  #ifndef __ASSEMBLY__
> diff --git a/xen/include/xen/cache.h b/xen/include/xen/cache.h
> index f52a0aedf768..55456823c543 100644
> --- a/xen/include/xen/cache.h
> +++ b/xen/include/xen/cache.h
> @@ -15,6 +15,7 @@
>  #define __cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
>  #endif
>
> +/* TODO: Phase out the use of this via cache.h */
>  #define __ro_after_init __section(".data.ro_after_init")
>
>  #endif /* __LINUX_CACHE_H */
> diff --git a/xen/include/xen/sections.h b/xen/include/xen/sections.h
> index b6cb5604c285..6d4db2b38f0f 100644
> --- a/xen/include/xen/sections.h
> +++ b/xen/include/xen/sections.h
> @@ -3,9 +3,30 @@
>  #ifndef __XEN_SECTIONS_H__
>  #define __XEN_SECTIONS_H__
>
> +#include
> +
>  /* SAF-0-safe */
>  extern char __init_begin[], __init_end[];
>
> +/*
> + * Some data is expected to be written very rarely (if at all).
> + *
> + * For performance reasons it is helpful to group such data in the
> + * build, to avoid the linker placing it adjacent to often-written data.
> + */
> +#define __read_mostly __section(".data.read_mostly")
> +
> +/*
> + * Some data should be chosen during boot and be immutable thereafter.
> + *
> + * Variables annotated with __ro_after_init will become read-only after
> + * boot and suffer a runtime access fault if modified.
> + *
> + * For architectures/platforms which haven't implemented support, these
> + * variables will be treated as regular mutable data.
> + */
> +#define __ro_after_init __section(".data.ro_after_init")
> +
>  #endif /* !__XEN_SECTIONS_H__ */
>  /*
>   * Local variables:

> base-commit: 8b4243a9b560c89bb259db5a27832c253d4bebc7
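[Editorial note: the two annotations moved by this patch can be exercised outside Xen as well. In a plain userspace ELF build, the section attributes simply place the objects into the named sections; the read-only-after-boot enforcement is a separate linker/boot-time mechanism that a userspace sketch cannot demonstrate. A minimal illustration follows — the variable and function names are invented.]

```c
#include <assert.h>

/* Same definitions as the patch's xen/sections.h. */
#define __read_mostly   __attribute__((__section__(".data.read_mostly")))
#define __ro_after_init __attribute__((__section__(".data.ro_after_init")))

/* Read on hot paths, written essentially never after setup. */
static unsigned int nr_cpus __read_mostly = 4;

/* Chosen once "during boot"; Xen would fault on later writes where the
 * architecture implements the protection. */
static unsigned int boot_mode __ro_after_init;

unsigned int get_nr_cpus(void)
{
    return nr_cpus;
}

void boot_setup(unsigned int mode)
{
    boot_mode = mode;   /* legitimate: still "boot" time */
}

unsigned int get_boot_mode(void)
{
    return boot_mode;
}
```

Grouping such data is purely a link-time placement decision, which is exactly why the patch argues the macros belong in sections.h rather than cache.h.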
Re: [PATCH] x86: pci: xen: Remove unnecessary ‘0’ values from ret
On Wed, 12 Jun 2024, Li zeming wrote:
> ret is assigned first, so it does not need to initialize the assignment.

While the patch seems fine, this description and the shortlog are
confusing.

--
i.

> Signed-off-by: Li zeming
> ---
>  arch/x86/pci/xen.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 652cd53e77f6..67cb9dc9b2e7 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -267,7 +267,7 @@ static bool __read_mostly pci_seg_supported = true;
>
>  static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>  {
> -	int ret = 0;
> +	int ret;
>  	struct msi_desc *msidesc;
>
>  	msi_for_each_desc(msidesc, &dev->dev, MSI_DESC_NOTASSOCIATED) {
> @@ -353,7 +353,7 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>
>  bool xen_initdom_restore_msi(struct pci_dev *dev)
>  {
> -	int ret = 0;
> +	int ret;
>
>  	if (!xen_initial_domain())
>  		return true;
>
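[Editorial note: the pattern being cleaned up above can be shown with a tiny standalone function, invented for illustration. When every path assigns `ret` before reading it, an `= 0` initializer is a dead store — and it can also mask genuine "used uninitialized" compiler warnings if a later change adds a path that forgets the assignment.]

```c
#include <assert.h>
#include <stddef.h>

/* 'ret' is written on every path before it is read, so "int ret = 0;"
 * would be a dead store. Leaving it uninitialized also lets the
 * compiler diagnose any future path that forgets to assign it. */
int first_error(const int *codes, size_t n)
{
    int ret;                 /* no initializer needed */

    for (size_t i = 0; i < n; i++) {
        ret = codes[i];
        if (ret < 0)
            return ret;      /* report the first failure */
    }

    return 0;                /* everything succeeded */
}
```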
[xen-unstable-smoke test] 186349: tolerable all pass - PUSHED
flight 186349 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/186349/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass version targeted for testing: xen 8b4243a9b560c89bb259db5a27832c253d4bebc7 baseline version: xen 4fdd8d75566fdad06667a79ec0ce6f43cc466c54 Last test of basis 186337 2024-06-13 16:02:07 Z0 days Testing same since 186349 2024-06-14 10:03:57 Z0 days1 attempts People who touched revisions under test: Luca Fancellu Penny Zheng jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 4fdd8d7556..8b4243a9b5 8b4243a9b560c89bb259db5a27832c253d4bebc7 -> smoke
Re: [PATCH 6/7] x86/cpuid: Fix handling of XSAVE dynamic leaves
On 23/05/2024 5:16 pm, Jan Beulich wrote:
> On 23.05.2024 13:16, Andrew Cooper wrote:
>> First, if XSAVE is available in hardware but not visible to the guest,
>> the dynamic leaves shouldn't be filled in.
>>
>> Second, the comment concerning XSS state is wrong. VT-x doesn't manage
>> host/guest state automatically, but there is provision for "host only"
>> bits to be set, so the implications are still accurate.
>>
>> Introduce xstate_compressed_size() to mirror the uncompressed one.
>> Cross check it at boot.
>>
>> Signed-off-by: Andrew Cooper
> Reviewed-by: Jan Beulich

Thanks.

> Irrespective ...
>
>> v3:
>>  * Adjust commit message about !XSAVE guests
>>  * Rebase over boot time cross check
>>  * Use raw policy
> ... it should probably have occurred to me earlier on to ask: Why raw
> policy? Isn't the host one the more appropriate one to use for any kind
> of internal decisions?

State information is identical in all policies. It's the ABI of the
X{SAVE,RSTOR}* instructions.

Beyond that, consistency. xstate_uncompressed_size() does strictly need
to be the raw policy, because it is used by recalculate_xstate() to
calculate the host policy. xstate_compressed_size() doesn't have the
same restriction, but should use the same source of data.

Finally, xstate_{un,}compressed_size() aren't really tied to a choice of
features in the first place. They shouldn't be limited to the
host_policy's subset of active features.

~Andrew
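[Editorial note: the compressed-vs-uncompressed distinction underlying this exchange can be modelled in a few lines. In the compressed (XSAVES) format, enabled components are packed in index order, with some components 64-byte aligned; in the uncompressed format each component lives at a fixed architectural offset. The sketch below uses made-up component sizes and offsets, not real hardware values.]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define XSTATE_AREA_MIN_SIZE 576u /* 512-byte legacy region + 64-byte XSAVE header */

struct comp {
    uint32_t size;        /* component size in bytes (illustrative) */
    uint32_t uncomp_off;  /* fixed offset in the uncompressed layout */
    int      align64;     /* needs 64-byte alignment when compressed */
};

/* Uncompressed: ends at the highest offset+size of any enabled component. */
uint32_t uncompressed_size(const struct comp *c, size_t n)
{
    uint32_t end = XSTATE_AREA_MIN_SIZE;

    for (size_t i = 0; i < n; i++)
        if (c[i].uncomp_off + c[i].size > end)
            end = c[i].uncomp_off + c[i].size;

    return end;
}

/* Compressed: components packed in index order, aligning where required. */
uint32_t compressed_size(const struct comp *c, size_t n)
{
    uint32_t off = XSTATE_AREA_MIN_SIZE;

    for (size_t i = 0; i < n; i++) {
        if (c[i].align64)
            off = (off + 63) & ~63u;  /* round up to a 64-byte boundary */
        off += c[i].size;
    }

    return off;
}
```

This also shows why XSS-only states cannot appear in an uncompressed image: they have no architectural offset there, which is the situation the ASSERT() in the real code guards against.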
[xen-unstable test] 186344: tolerable FAIL
flight 186344 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/186344/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 186341 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186341 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186341 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186341 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186341 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186341 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass 
test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-qcow214 migrate-support-checkfail never pass test-armhf-armhf-xl-qcow215 saverestore-support-checkfail never pass test-armhf-armhf-xl-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-raw 15 saverestore-support-checkfail never pass version targeted for testing: xen 4fdd8d75566fdad06667a79ec0ce6f43cc466c54 baseline version: xen 4fdd8d75566fdad06667a79ec0ce6f43cc466c54 Last test of basis 186344 2024-06-14 05:48:54 Z0 days Testing same since (not found) 0 attempts jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64-xtf pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-arm64-libvirt pass
Re: [PATCH for-4.19] xen/arch: Centralise __read_mostly and __ro_after_init
On Fri, Jun 14, 2024 at 01:49:50PM +0100, Andrew Cooper wrote:
> These being in cache.h is inherited from Linux, but is an inappropriate
> location to live.
>
> __read_mostly is an optimisation related to data placement in order to
> avoid having shared data in cachelines that are likely to be written to,
> but it really is just a section of the linked image separating data by
> usage patterns; it has nothing to do with cache sizes or flushing logic.
>
> Worse, __ro_after_init was only in xen/cache.h because __read_mostly was
> in arch/cache.h, and has literally nothing whatsoever to do with caches.
>
> Move the definitions into xen/sections.h, which in particular means that
> RISC-V doesn't need to repeat the problematic pattern. Take the
> opportunity to provide short descriptions of what these are used for.
>
> For now, leave TODO comments next to the other identical definitions.
> It turns out that unpicking cache.h is more complicated than it appears
> because a number of files use it for transitive dependencies.

I assume that including sections.h from cache.h (in the meantime) creates
a circular header dependency?

> Signed-off-by: Andrew Cooper
> ---
> CC: Jan Beulich
> CC: Roger Pau Monné
> CC: Stefano Stabellini
> CC: Julien Grall
> CC: Volodymyr Babchuk
> CC: Bertrand Marquis
> CC: Michal Orzel
> CC: Oleksii Kurochko
> CC: Shawn Anastasio
>
> For 4.19. This is to help the RISC-V "full build of Xen" effort without
> introducing a pattern that's going to require extra effort to undo after
> the fact.
> ---
>  xen/arch/arm/include/asm/cache.h | 1 +
>  xen/arch/ppc/include/asm/cache.h | 1 +
>  xen/arch/x86/include/asm/cache.h | 1 +
>  xen/include/xen/cache.h          | 1 +
>  xen/include/xen/sections.h       | 21 +
>  5 files changed, 25 insertions(+)
>
> diff --git a/xen/arch/arm/include/asm/cache.h b/xen/arch/arm/include/asm/cache.h
> index 240b6ae0eaa3..029b2896fb3e 100644
> --- a/xen/arch/arm/include/asm/cache.h
> +++ b/xen/arch/arm/include/asm/cache.h
> @@ -6,6 +6,7 @@
>  #define L1_CACHE_SHIFT (CONFIG_ARM_L1_CACHE_SHIFT)
>  #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
>
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>
> #endif
> diff --git a/xen/arch/ppc/include/asm/cache.h b/xen/arch/ppc/include/asm/cache.h
> index 0d7323d7892f..13c0bf3242c8 100644
> --- a/xen/arch/ppc/include/asm/cache.h
> +++ b/xen/arch/ppc/include/asm/cache.h
> @@ -3,6 +3,7 @@
>  #ifndef _ASM_PPC_CACHE_H
>  #define _ASM_PPC_CACHE_H
>
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>
>  #endif /* _ASM_PPC_CACHE_H */
> diff --git a/xen/arch/x86/include/asm/cache.h b/xen/arch/x86/include/asm/cache.h
> index e4770efb22b9..956c05493e23 100644
> --- a/xen/arch/x86/include/asm/cache.h
> +++ b/xen/arch/x86/include/asm/cache.h
> @@ -9,6 +9,7 @@
>  #define L1_CACHE_SHIFT (CONFIG_X86_L1_CACHE_SHIFT)
>  #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
>
> +/* TODO: Phase out the use of this via cache.h */
>  #define __read_mostly __section(".data.read_mostly")
>
>  #ifndef __ASSEMBLY__
> diff --git a/xen/include/xen/cache.h b/xen/include/xen/cache.h
> index f52a0aedf768..55456823c543 100644
> --- a/xen/include/xen/cache.h
> +++ b/xen/include/xen/cache.h
> @@ -15,6 +15,7 @@
>  #define __cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
>  #endif
>
> +/* TODO: Phase out the use of this via cache.h */
>  #define __ro_after_init __section(".data.ro_after_init")
>
>  #endif /* __LINUX_CACHE_H */
> diff --git a/xen/include/xen/sections.h b/xen/include/xen/sections.h
> index b6cb5604c285..6d4db2b38f0f 100644
> --- a/xen/include/xen/sections.h
> +++ b/xen/include/xen/sections.h
> @@ -3,9 +3,30 @@
>  #ifndef __XEN_SECTIONS_H__
>  #define __XEN_SECTIONS_H__
>
> +#include
> +
>  /* SAF-0-safe */
>  extern char __init_begin[], __init_end[];
>
> +/*
> + * Some data is expected to be written very rarely (if at all).
> + *
> + * For performance reasons it is helpful to group such data in the
> + * build, to avoid the linker placing it adjacent to often-written data.
> + */
> +#define __read_mostly __section(".data.read_mostly")
> +
> +/*
> + * Some data should be chosen during boot and be immutable thereafter.
> + *
> + * Variables annotated with __ro_after_init will become read-only after
> + * boot and suffer a runtime access fault if modified.
> + *
> + * For architectures/platforms which haven't implemented support, these
> + * variables will be treated as regular mutable data.
> + */
> +#define __ro_after_init __section(".data.ro_after_init")

Is it fine for MISRA to have possibly two identical defines in the same
translation unit?

Thanks, Roger.
Re: [PATCH 4/7] x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()
On 23/05/2024 5:09 pm, Jan Beulich wrote:
> On 23.05.2024 13:16, Andrew Cooper wrote:
>> @@ -611,6 +587,40 @@ static bool valid_xcr0(uint64_t xcr0)
>>      return true;
>>  }
>>
>> +unsigned int xstate_uncompressed_size(uint64_t xcr0)
>> +{
>> +    unsigned int size = XSTATE_AREA_MIN_SIZE, i;
>> +
>> +    ASSERT((xcr0 & ~X86_XCR0_STATES) == 0);
> I'm puzzled by the combination of this assertion and ...
>
>> +    if ( xcr0 == xfeature_mask )
>> +        return xsave_cntxt_size;
> ... this conditional return. Yes, right now we don't support/use any XSS
> components, but without any comment the assertion looks overly
> restrictive to me.

The ASSERT() is to cover a bug I found during testing. It is a hard
error to pass in non-XCR0 states. XSS states do not exist in an
uncompressed image, and have a base of 0, which ends up hitting a later
assertion.

This snippet with xfeature_mask is just code motion from
xstate_ctxt_size(), expressed as an addition because of diff. It was to
save a double XCR0 write in a perceived common case.

But, your AMX series makes both xfeature_mask and xsave_cntxt_size bogus
by there no longer being a uniform size of the save area, so I can
probably get away with simply dropping it here.

~Andrew
Re: [PATCH 2/7] x86/xstate: Cross-check dynamic XSTATE sizes at boot
On 23/05/2024 4:34 pm, Jan Beulich wrote:
> On 23.05.2024 13:16, Andrew Cooper wrote:
>> Right now, xstate_ctxt_size() performs a cross-check of size with CPUID
>> for every call. This is expensive, being used for domain
>> create/migrate, as well as to service certain guest CPUID instructions.
>>
>> Instead, arrange to check the sizes once at boot. See the code comments
>> for details. Right now, it just checks hardware against the algorithm
>> expectations. Later patches will add further cross-checking.
>>
>> Introduce the missing X86_XCR0_* and X86_XSS_* constants, and a couple
>> of missing CPUID bits. This is to maximise coverage in the sanity
>> check, even if we don't expect to use/virtualise some of these features
>> any time soon. Leave HDC and HWP alone for now. We don't have CPUID
>> bits from them stored nicely.
> Since you say "the missing", ...
>
>> --- a/xen/arch/x86/include/asm/x86-defns.h
>> +++ b/xen/arch/x86/include/asm/x86-defns.h
>> @@ -77,7 +77,7 @@
>>  #define X86_CR4_PKS        0x0100 /* Protection Key Supervisor */
>>
>>  /*
>> - * XSTATE component flags in XCR0
>> + * XSTATE component flags in XCR0 | MSR_XSS
>>   */
>>  #define X86_XCR0_FP_POS           0
>>  #define X86_XCR0_FP               (1ULL << X86_XCR0_FP_POS)
>> @@ -95,11 +95,34 @@
>>  #define X86_XCR0_ZMM              (1ULL << X86_XCR0_ZMM_POS)
>>  #define X86_XCR0_HI_ZMM_POS       7
>>  #define X86_XCR0_HI_ZMM           (1ULL << X86_XCR0_HI_ZMM_POS)
>> +#define X86_XSS_PROC_TRACE        (_AC(1, ULL) <<  8)
>>  #define X86_XCR0_PKRU_POS         9
>>  #define X86_XCR0_PKRU             (1ULL << X86_XCR0_PKRU_POS)
>> +#define X86_XSS_PASID             (_AC(1, ULL) << 10)
>> +#define X86_XSS_CET_U             (_AC(1, ULL) << 11)
>> +#define X86_XSS_CET_S             (_AC(1, ULL) << 12)
>> +#define X86_XSS_HDC               (_AC(1, ULL) << 13)
>> +#define X86_XSS_UINTR             (_AC(1, ULL) << 14)
>> +#define X86_XSS_LBR               (_AC(1, ULL) << 15)
>> +#define X86_XSS_HWP               (_AC(1, ULL) << 16)
>> +#define X86_XCR0_TILE_CFG         (_AC(1, ULL) << 17)
>> +#define X86_XCR0_TILE_DATA        (_AC(1, ULL) << 18)
> ... I'm wondering if you deliberately left out APX (bit 19).

It was deliberate. APX isn't in the SDM yet, so in principle is still
subject to change.

I've tweaked the commit message to avoid using the word 'missing'.

> Since you're re-doing some of what I have long had in patches already,
> I'd also like to ask whether the last underscores each in the two AMX
> names really are useful in your opinion. While rebasing isn't going to
> be difficult either way, it would be yet simpler with X86_XCR0_TILECFG
> and X86_XCR0_TILEDATA, as I've had it in my patches for over 3 years.

I'm torn here. I don't want to deliberately make things harder for you,
but I would really prefer that we use the more legible form...

>> --- a/xen/arch/x86/xstate.c
>> +++ b/xen/arch/x86/xstate.c
>> @@ -604,9 +604,156 @@ static bool valid_xcr0(uint64_t xcr0)
>>      if ( !(xcr0 & X86_XCR0_BNDREGS) != !(xcr0 & X86_XCR0_BNDCSR) )
>>          return false;
>>
>> +    /* TILE_CFG and TILE_DATA must be the same. */
>> +    if ( !(xcr0 & X86_XCR0_TILE_CFG) != !(xcr0 & X86_XCR0_TILE_DATA) )
>> +        return false;
>> +
>>      return true;
>>  }
>>
>> +struct xcheck_state {
>> +    uint64_t states;
>> +    uint32_t uncomp_size;
>> +    uint32_t comp_size;
>> +};
>> +
>> +static void __init check_new_xstate(struct xcheck_state *s, uint64_t new)
>> +{
>> +    uint32_t hw_size;
>> +
>> +    BUILD_BUG_ON(X86_XCR0_STATES & X86_XSS_STATES);
>> +
>> +    BUG_ON(s->states & new); /* States only increase. */
>> +    BUG_ON(!valid_xcr0(s->states | new)); /* Xen thinks it's a good value. */
>> +    BUG_ON(new & ~(X86_XCR0_STATES | X86_XSS_STATES)); /* Known state. */
>> +    BUG_ON((new & X86_XCR0_STATES) &&
>> +           (new & X86_XSS_STATES)); /* User or supervisor, not both. */
>> +
>> +    s->states |= new;
>> +    if ( new & X86_XCR0_STATES )
>> +    {
>> +        if ( !set_xcr0(s->states & X86_XCR0_STATES) )
>> +            BUG();
>> +    }
>> +    else
>> +        set_msr_xss(s->states & X86_XSS_STATES);
>> +
>> +    /*
>> +     * Check the uncompressed size. Some XSTATEs are out-of-order and
>> +     * fill in prior holes in the state area, so we check that the size
>> +     * doesn't decrease.
>> +     */
>> +    hw_size = cpuid_count_ebx(0xd, 0);
>> +
>> +    if ( hw_size < s->uncomp_size )
>> +        panic("XSTATE 0x%016"PRIx64", new bits {%63pbl}, uncompressed hw size %#x < prev size %#x\n",
>> +              s->states, &s->states, hw_size, s->uncomp_size);
>> +
>> +    s->uncomp_size = hw_size;
>> +
>> +    /*
>> +     * Check the compressed size, if available. All components strictly
>> +     * appear in index order. In principle there are no holes, but some
>> +     * components have their base address 64-byte aligned for efficiency
>> +     * reasons (e.g. AMX-TILE) and there are other
Re: [PATCH for-4.19] xen/arch: Centralise __read_mostly and __ro_after_init
On 14.06.2024 14:49, Andrew Cooper wrote:
> These being in cache.h is inherited from Linux, but is an inappropriate
> location to live.
>
> __read_mostly is an optimisation related to data placement in order to
> avoid having shared data in cachelines that are likely to be written to,
> but it really is just a section of the linked image separating data by
> usage patterns; it has nothing to do with cache sizes or flushing logic.
>
> Worse, __ro_after_init was only in xen/cache.h because __read_mostly was
> in arch/cache.h, and has literally nothing whatsoever to do with caches.
>
> Move the definitions into xen/sections.h, which in particular means that
> RISC-V doesn't need to repeat the problematic pattern. Take the
> opportunity to provide short descriptions of what these are used for.
>
> For now, leave TODO comments next to the other identical definitions.
> It turns out that unpicking cache.h is more complicated than it appears
> because a number of files use it for transitive dependencies.

I don't particularly mind this approach, so ...

> Signed-off-by: Andrew Cooper

Acked-by: Jan Beulich

Yet this (or variants thereof) is precisely what I wouldn't have wanted
to do myself, for leaving garbled state and, if I'm not mistaken,
introducing new MISRA violations because of the redundant #define-s.

Jan
[PATCH for-4.19] xen/arch: Centralise __read_mostly and __ro_after_init
These being in cache.h is inherited from Linux, but is an inappropriate location to live.

__read_mostly is an optimisation related to data placement in order to avoid having shared data in cachelines that are likely to be written to, but it really is just a section of the linked image separating data by usage patterns; it has nothing to do with cache sizes or flushing logic.

Worse, __ro_after_init was only in xen/cache.h because __read_mostly was in arch/cache.h, and has literally nothing whatsoever to do with caches.

Move the definitions into xen/sections.h, which in particular means that RISC-V doesn't need to repeat the problematic pattern.  Take the opportunity to provide short descriptions of what these are used for.

For now, leave TODO comments next to the other identical definitions.  It turns out that unpicking cache.h is more complicated than it appears because a number of files use it for transitive dependencies.

Signed-off-by: Andrew Cooper
---
CC: Jan Beulich
CC: Roger Pau Monné
CC: Stefano Stabellini
CC: Julien Grall
CC: Volodymyr Babchuk
CC: Bertrand Marquis
CC: Michal Orzel
CC: Oleksii Kurochko
CC: Shawn Anastasio

For 4.19.  This is to help the RISC-V "full build of Xen" effort without introducing a pattern that's going to require extra effort to undo after the fact.
---
 xen/arch/arm/include/asm/cache.h |  1 +
 xen/arch/ppc/include/asm/cache.h |  1 +
 xen/arch/x86/include/asm/cache.h |  1 +
 xen/include/xen/cache.h          |  1 +
 xen/include/xen/sections.h       | 21 +
 5 files changed, 25 insertions(+)

diff --git a/xen/arch/arm/include/asm/cache.h b/xen/arch/arm/include/asm/cache.h
index 240b6ae0eaa3..029b2896fb3e 100644
--- a/xen/arch/arm/include/asm/cache.h
+++ b/xen/arch/arm/include/asm/cache.h
@@ -6,6 +6,7 @@
 #define L1_CACHE_SHIFT  (CONFIG_ARM_L1_CACHE_SHIFT)
 #define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)
 
+/* TODO: Phase out the use of this via cache.h */
 #define __read_mostly __section(".data.read_mostly")
 
 #endif
diff --git a/xen/arch/ppc/include/asm/cache.h b/xen/arch/ppc/include/asm/cache.h
index 0d7323d7892f..13c0bf3242c8 100644
--- a/xen/arch/ppc/include/asm/cache.h
+++ b/xen/arch/ppc/include/asm/cache.h
@@ -3,6 +3,7 @@
 #ifndef _ASM_PPC_CACHE_H
 #define _ASM_PPC_CACHE_H
 
+/* TODO: Phase out the use of this via cache.h */
 #define __read_mostly __section(".data.read_mostly")
 
 #endif /* _ASM_PPC_CACHE_H */
diff --git a/xen/arch/x86/include/asm/cache.h b/xen/arch/x86/include/asm/cache.h
index e4770efb22b9..956c05493e23 100644
--- a/xen/arch/x86/include/asm/cache.h
+++ b/xen/arch/x86/include/asm/cache.h
@@ -9,6 +9,7 @@
 #define L1_CACHE_SHIFT  (CONFIG_X86_L1_CACHE_SHIFT)
 #define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)
 
+/* TODO: Phase out the use of this via cache.h */
 #define __read_mostly __section(".data.read_mostly")
 
 #ifndef __ASSEMBLY__
diff --git a/xen/include/xen/cache.h b/xen/include/xen/cache.h
index f52a0aedf768..55456823c543 100644
--- a/xen/include/xen/cache.h
+++ b/xen/include/xen/cache.h
@@ -15,6 +15,7 @@
 #define __cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
 #endif
 
+/* TODO: Phase out the use of this via cache.h */
 #define __ro_after_init __section(".data.ro_after_init")
 
 #endif /* __LINUX_CACHE_H */
diff --git a/xen/include/xen/sections.h b/xen/include/xen/sections.h
index b6cb5604c285..6d4db2b38f0f 100644
--- a/xen/include/xen/sections.h
+++ b/xen/include/xen/sections.h
@@ -3,9 +3,30 @@
 #ifndef __XEN_SECTIONS_H__
 #define __XEN_SECTIONS_H__
 
+#include
+
 /* SAF-0-safe */
 extern char __init_begin[], __init_end[];
 
+/*
+ * Some data is expected to be written very rarely (if at all).
+ *
+ * For performance reasons it is helpful to group such data in the build, to
+ * avoid the linker placing it adjacent to often-written data.
+ */
+#define __read_mostly __section(".data.read_mostly")
+
+/*
+ * Some data should be chosen during boot and be immutable thereafter.
+ *
+ * Variables annotated with __ro_after_init will become read-only after boot
+ * and suffer a runtime access fault if modified.
+ *
+ * For architectures/platforms which haven't implemented support, these
+ * variables will be treated as regular mutable data.
+ */
+#define __ro_after_init __section(".data.ro_after_init")
+
 #endif /* !__XEN_SECTIONS_H__ */
 /*
  * Local variables:

base-commit: 8b4243a9b560c89bb259db5a27832c253d4bebc7
-- 
2.39.2
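Outside of Xen, the same placement trick can be demonstrated with the underlying GCC/Clang `section` attribute in plain user-space C. This is only an illustrative sketch: the variable names are invented, and the read-only enforcement of `__ro_after_init` that the patch description mentions is done by the architecture remapping the section after boot, not by the attribute itself.

```c
#include <stdio.h>

/* Mimic the Xen annotations with the raw GCC/Clang attribute. */
#define __read_mostly   __attribute__((__section__(".data.read_mostly")))
#define __ro_after_init __attribute__((__section__(".data.ro_after_init")))

/* Written rarely after setup: grouped away from often-written data. */
int opt_threads __read_mostly = 4;

/* Chosen during boot, then immutable (where the platform enforces it). */
int boot_cpu_id __ro_after_init = 0;

/* Reads are completely ordinary; only the link-time placement differs. */
int describe(void)
{
    return opt_threads + boot_cpu_id;
}
```

On an ELF target the two variables land in their own output sections, which a linker script can then place (and, for `.data.ro_after_init`, mark read-only once initialisation is done).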
Re: [PATCH v3 0/3] x86/irq: fixes for CPU hot{,un}plug
On Fri, Jun 14, 2024 at 01:52:59PM +0200, Oleksii K. wrote:
> On Fri, 2024-06-14 at 09:28 +0200, Roger Pau Monné wrote:
> > Sorry, forgot to add the for-4.19 tag and Cc Oleksii.
> >
> > Since we have taken the start of the series, we might as well take the
> > remaining patches (if other x86 maintainers agree) and attempt to
> > hopefully fix all the interrupt issues with CPU hotplug/unplug.
> >
> > FTR: there are further issues when doing CPU hotplug/unplug from a PVH
> > dom0, but those are out of the scope for 4.19, as I haven't even
> > started to diagnose what's going on.
>
> And were these issues present before the current patch series was introduced?

Sure, the issues with PVH dom0 cpu hotplug/unplug are additional to the ones fixed here.

Thanks, Roger.
Xen Summit talks live on YouTube
Hi everyone, We had a great few days filled with discussions and talks during the Xen Summit in Lisbon. These are now available for you to watch on YouTube! https://www.youtube.com/playlist?list=PLQMQQsKgvLntZiKoELFs22Mtk-tBNNOMJ Many thanks, Kelly Choi Community Manager Xen Project
Re: [PATCH v3 0/3] x86/irq: fixes for CPU hot{,un}plug
On Fri, 2024-06-14 at 09:28 +0200, Roger Pau Monné wrote:
> Sorry, forgot to add the for-4.19 tag and Cc Oleksii.
>
> Since we have taken the start of the series, we might as well take the
> remaining patches (if other x86 maintainers agree) and attempt to
> hopefully fix all the interrupt issues with CPU hotplug/unplug.
>
> FTR: there are further issues when doing CPU hotplug/unplug from a PVH
> dom0, but those are out of the scope for 4.19, as I haven't even
> started to diagnose what's going on.

And were these issues present before the current patch series was introduced?

~ Oleksii

> Thanks, Roger.
>
> On Thu, Jun 13, 2024 at 06:56:14PM +0200, Roger Pau Monne wrote:
> > Hello,
> >
> > The following series aims to fix interrupt handling when doing CPU
> > plug/unplug operations.  Without this series, running:
> >
> >     cpus=`xl info max_cpu_id`
> >     while [ 1 ]; do
> >         for i in `seq 1 $cpus`; do
> >             xen-hptool cpu-offline $i;
> >             xen-hptool cpu-online $i;
> >         done
> >     done
> >
> > quite quickly results in interrupts getting lost and "No irq handler for
> > vector" messages on the Xen console.  Drivers in dom0 also start getting
> > interrupt timeouts and the system becomes unusable.
> >
> > After applying the series, running the loop overnight still results in a
> > fully usable system: no "No irq handler for vector" messages at all, and
> > no interrupt losses reported by dom0.  Tested with x2apic-mode={mixed,cluster}.
> >
> > I've attempted to document all the code as well as I could; interrupt
> > handling has some unexpected corner cases that are hard to diagnose and
> > reason about.
> >
> > Some XenRT testing is undergoing to ensure no breakages.
> >
> > Thanks, Roger.
> >
> > Roger Pau Monne (3):
> >   x86/irq: deal with old_cpu_mask for interrupts in movement in fixup_irqs()
> >   x86/irq: handle moving interrupts in _assign_irq_vector()
> >   x86/irq: forward pending interrupts to new destination in fixup_irqs()
> >
> >  xen/arch/x86/include/asm/apic.h |   5 +
> >  xen/arch/x86/irq.c              | 163 +---
> >  2 files changed, 132 insertions(+), 36 deletions(-)
> >
> > --
> > 2.45.2
Re: [PATCH for 4.19?] x86/Intel: unlock CPUID earlier for the BSP
On 14.06.2024 12:14, Andrew Cooper wrote: > On 14/06/2024 7:27 am, Jan Beulich wrote: >> On 13.06.2024 18:17, Andrew Cooper wrote: >>> On 13/06/2024 9:19 am, Jan Beulich wrote: Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If this bit is set by the BIOS then CPUID evaluation does not work when data from any leaf greater than two is needed; early_cpu_init() in particular wants to collect leaf 7 data. Cure this by unlocking CPUID right before evaluating anything which depends on the maximum CPUID leaf being greater than two. Inspired by (and description cloned from) Linux commit 0c2f6d04619e ("x86/topology/intel: Unlock CPUID before evaluating anything"). Signed-off-by: Jan Beulich --- While I couldn't spot anything, it kind of feels as if I'm overlooking further places where we might be inspecting in particular leaf 7 yet earlier. No Fixes: tag(s), as imo it would be too many that would want enumerating. >>> I also saw that go by, but concluded that Xen doesn't need it, hence why >>> I left it alone. >>> >>> The truth is that only the BSP needs it. APs sort it out in the >>> trampoline via trampoline_misc_enable_off, because they need to clear >>> XD_DISABLE in prior to enabling paging, so we should be taking it out of >>> early_init_intel(). >> Except for the (odd) case also mentioned to Roger, where the BSP might have >> the bit clear but some (or all) AP(s) have it set. > > Fine I suppose. It's a single MSR adjustment once per CPU. > >> >>> But, we don't have an early BSP-only early hook, and I'm not overwhelmed >>> at the idea of exporting it from intel.c >>> >>> I was intending to leave it alone until I can burn this whole >>> infrastructure to the ground and make it work nicely with policies, but >>> that's not a job for this point in the release... >> This last part reads like the rest of your reply isn't an objection to me >> putting this in with Roger's R-b, but it would be nice if you could >> confirm this understanding of mine. 
>> Without this last part it, especially the 2nd from last paragraph,
>> certainly reads a little like an objection.
>
> I'm -1 to this generally.  It's churn without fixing anything AFAICT,

How that?  We clearly do the adjustment too late right now for the BSP.
All the leaf-7 stuff added to early_cpu_init() (iirc in part in the
course of speculation work) is useless on a system where firmware set
that flag.

Jan

> and is moving in a direction which will need undoing in the future.
>
> But, because it doesn't fix anything, I don't think it's appropriate to
> be tagged as 4.19 even if you and roger feel strongly enough to put it
> into 4.20.
>
> ~Andrew
Re: [PATCH for 4.19?] x86/Intel: unlock CPUID earlier for the BSP
On 14/06/2024 7:27 am, Jan Beulich wrote: > On 13.06.2024 18:17, Andrew Cooper wrote: >> On 13/06/2024 9:19 am, Jan Beulich wrote: >>> Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If >>> this bit is set by the BIOS then CPUID evaluation does not work when >>> data from any leaf greater than two is needed; early_cpu_init() in >>> particular wants to collect leaf 7 data. >>> >>> Cure this by unlocking CPUID right before evaluating anything which >>> depends on the maximum CPUID leaf being greater than two. >>> >>> Inspired by (and description cloned from) Linux commit 0c2f6d04619e >>> ("x86/topology/intel: Unlock CPUID before evaluating anything"). >>> >>> Signed-off-by: Jan Beulich >>> --- >>> While I couldn't spot anything, it kind of feels as if I'm overlooking >>> further places where we might be inspecting in particular leaf 7 yet >>> earlier. >>> >>> No Fixes: tag(s), as imo it would be too many that would want >>> enumerating. >> I also saw that go by, but concluded that Xen doesn't need it, hence why >> I left it alone. >> >> The truth is that only the BSP needs it. APs sort it out in the >> trampoline via trampoline_misc_enable_off, because they need to clear >> XD_DISABLE in prior to enabling paging, so we should be taking it out of >> early_init_intel(). > Except for the (odd) case also mentioned to Roger, where the BSP might have > the bit clear but some (or all) AP(s) have it set. Fine I suppose. It's a single MSR adjustment once per CPU. > >> But, we don't have an early BSP-only early hook, and I'm not overwhelmed >> at the idea of exporting it from intel.c >> >> I was intending to leave it alone until I can burn this whole >> infrastructure to the ground and make it work nicely with policies, but >> that's not a job for this point in the release... > This last part reads like the rest of your reply isn't an objection to me > putting this in with Roger's R-b, but it would be nice if you could > confirm this understanding of mine. 
> Without this last part it, especially the 2nd from last paragraph,
> certainly reads a little like an objection.

I'm -1 to this generally.  It's churn without fixing anything AFAICT, and
is moving in a direction which will need undoing in the future.

But, because it doesn't fix anything, I don't think it's appropriate to be
tagged as 4.19 even if you and roger feel strongly enough to put it into
4.20.

~Andrew
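For readers following the thread: the "lock" being discussed is a bit in IA32_MISC_ENABLE (MSR 0x1A0). Per the Intel SDM, bit 22 ("Limit CPUID Maxval") caps CPUID.0.EAX at 2 until cleared, which is why leaf-7 data is invisible while it is set. The actual rdmsr/wrmsr must run in ring 0 (in Xen, early Intel setup code); the sketch below models only the bit manipulation, with hypothetical helper names:

```c
#include <stdint.h>

#define MSR_IA32_MISC_ENABLE        0x000001a0
/* Bit 22: "Limit CPUID Maxval" - while set, CPUID.0.EAX reports at most 2. */
#define MISC_ENABLE_LIMIT_CPUID     (1ULL << 22)

/* Does the current MISC_ENABLE value lock CPUID to leaf <= 2? */
int cpuid_is_locked(uint64_t misc_enable)
{
    return (misc_enable & MISC_ENABLE_LIMIT_CPUID) != 0;
}

/* Value to write back (via wrmsr, in ring 0) to unlock full enumeration. */
uint64_t misc_enable_unlock_cpuid(uint64_t misc_enable)
{
    return misc_enable & ~MISC_ENABLE_LIMIT_CPUID;
}
```

The thread's point about APs is that they already clear this via `trampoline_misc_enable_off` before paging is enabled, so only the BSP (or the odd firmware that sets the bit on some APs but not the BSP) needs the early clearing being debated.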
Re: [PATCH v12 5/8] xen/riscv: add minimal stuff to mm.h to build full Xen
On 11/06/2024 7:23 pm, Oleksii K. wrote:
> On Tue, 2024-06-11 at 16:53 +0100, Andrew Cooper wrote:
>> On 30/05/2024 7:22 pm, Oleksii K. wrote:
>>> On Thu, 2024-05-30 at 18:23 +0100, Andrew Cooper wrote:
>>>> On 29/05/2024 8:55 pm, Oleksii Kurochko wrote:
>>>>> Signed-off-by: Oleksii Kurochko
>>>>> Acked-by: Jan Beulich
>>>>
>>>> This patch looks like it can go in independently?  Or does it depend
>>>> on having bitops.h working in practice?
>>>>
>>>> However, one very strong suggestion...
>>>>
>>>>> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
>>>>> index 07c7a0abba..cc4a07a71c 100644
>>>>> --- a/xen/arch/riscv/include/asm/mm.h
>>>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>>>> @@ -3,11 +3,246 @@
>>>>>
>>>>> +/* PDX of the first page in the frame table. */
>>>>> +extern unsigned long frametable_base_pdx;
>>>>> +
>>>>> +/* Convert between machine frame numbers and page-info structures. */
>>>>> +#define mfn_to_page(mfn)                                            \
>>>>> +    (frame_table + (mfn_to_pdx(mfn) - frametable_base_pdx))
>>>>> +#define page_to_mfn(pg)                                             \
>>>>> +    pdx_to_mfn((unsigned long)((pg) - frame_table) + frametable_base_pdx)
>>>>
>>>> Do yourself a favour and not introduce frametable_base_pdx to begin
>>>> with.  Every RISC-V board I can find has things starting from 0 in
>>>> physical address space, with RAM starting immediately after.
>>> I checked Linux kernel and grep there:
>>> [ok@fedora linux-aia]$ grep -Rni "memory@" arch/riscv/boot/dts/ --exclude "*.tmp" -I
>>> arch/riscv/boot/dts/starfive/jh7110-starfive-visionfive-2.dtsi:33: memory@4000 {
>>> arch/riscv/boot/dts/starfive/jh7100-common.dtsi:28: memory@8000 {
>>> arch/riscv/boot/dts/microchip/mpfs-sev-kit.dts:49: ddrc_cache: memory@10 {
>>> arch/riscv/boot/dts/microchip/mpfs-m100pfsevp.dts:33: ddrc_cache_lo: memory@8000 {
>>> arch/riscv/boot/dts/microchip/mpfs-m100pfsevp.dts:37: ddrc_cache_hi: memory@104000 {
>>> arch/riscv/boot/dts/microchip/mpfs-tysom-m.dts:34: ddrc_cache_lo: memory@8000 {
>>> arch/riscv/boot/dts/microchip/mpfs-tysom-m.dts:40: ddrc_cache_hi: memory@10 {
>>> arch/riscv/boot/dts/microchip/mpfs-polarberry.dts:22: ddrc_cache_lo: memory@8000 {
>>> arch/riscv/boot/dts/microchip/mpfs-polarberry.dts:27: ddrc_cache_hi: memory@10 {
>>> arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts:57: ddrc_cache_lo: memory@8000 {
>>> arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts:63: ddrc_cache_hi: memory@104000 {
>>> arch/riscv/boot/dts/thead/th1520-beaglev-ahead.dts:32: memory@0 {
>>> arch/riscv/boot/dts/thead/th1520-lichee-module-4a.dtsi:14: memory@0 {
>>> arch/riscv/boot/dts/sophgo/cv1800b-milkv-duo.dts:26: memory@8000 {
>>> arch/riscv/boot/dts/sophgo/cv1812h.dtsi:12: memory@8000 {
>>> arch/riscv/boot/dts/sifive/hifive-unmatched-a00.dts:26: memory@8000 {
>>> arch/riscv/boot/dts/sifive/hifive-unleashed-a00.dts:25: memory@8000 {
>>> arch/riscv/boot/dts/canaan/k210.dtsi:82: sram: memory@8000 {
>>>
>>> And based on that, the majority of the boards supported by the Linux
>>> kernel have RAM starting not from 0 in the physical address space.
>>> Am I confusing something?
>>> Taking the microchip board as an example, RAM actually starts at 0x800, >>> Today we had conversation with the guy from SiFive in xen-devel >>> channel >>> and he mentioned that they are using "starfive visionfive2 and >>> sifive >>> unleashed platforms." which based on the grep above has RAM not at >>> 0 >>> address. >>> >>> Also, QEMU uses 0x800. >>> which means that having frametable_base_pdx and assuming it does get set to 0x8000 (which isn't even a certainty, given that I think you'll need struct pages covering the PLICs), then what you are trading off is: * Saving 32k of virtual address space only (no need to even allocate memory for this range of the framtable), by * Having an extra memory load and add/sub in every page <-> mfn conversion, which is a screaming hotpath all over Xen. It's a terribly short-sighted tradeoff. 32k of VA space might be worth saving in a 32bit build (I personally wouldn't - especially as there's no need to share Xen's VA space with guests, given no PV guests on ARM/RISC-V), but it's absolutely not at all in an a 64bit build with TB of VA space available. Even if we do find a board with the first interesting thing in the frametable starting sufficiently away from 0 that it might be
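The tradeoff being argued can be made concrete with a little arithmetic. This is an illustrative-only model with hypothetical names: Xen's real `mfn_to_page()`/`page_to_mfn()` additionally go through pdx compression, which is omitted here.

```c
#include <stdint.h>

/*
 * Variant A (the patch): every conversion subtracts/adds the base pdx.
 * In real code frametable_base_pdx is a global, so this costs an extra
 * memory load plus an add/sub on a very hot path.
 */
uint64_t mfn_to_idx_offset(uint64_t mfn, uint64_t frametable_base_pdx)
{
    return mfn - frametable_base_pdx;
}

uint64_t idx_to_mfn_offset(uint64_t idx, uint64_t frametable_base_pdx)
{
    return idx + frametable_base_pdx;
}

/*
 * Variant B (the suggestion): base the frame table at pdx 0 and simply
 * never allocate backing memory for the hole below the first RAM page,
 * trading a little virtual address space for a plain array index.
 */
uint64_t mfn_to_idx_flat(uint64_t mfn)
{
    return mfn;
}
```

With RAM at 0x80000000 and 4 KiB pages, the base pdx is 0x80000, so variant A saves only the frame-table entries covering [0, 0x80000) — the "32k of VA space" mentioned above — at a per-conversion runtime cost; variant B pays nothing at runtime.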
Re: [PATCH v4 0/7] Static shared memory followup v2 - pt2
On 14/06/2024 08:54, Luca Fancellu wrote:

Hi Julien,

Hi Luca,

On 13 Jun 2024, at 12:31, Julien Grall wrote:

Hi,

On 11/06/2024 13:42, Michal Orzel wrote:

We would like this series to be in Xen 4.19; there was a misunderstanding on our side, because we thought that since the series was sent before the last posting date it could have been a candidate for merging in the new release. Instead, after speaking with Julien and Oleksii, we are now aware that we need to provide a justification for its presence.

Pros of this series: we are closing the circle for static shared memory, allowing it to use memory from the host or from Xen. It is also a feature that is not enabled by default, so it should not cause too much disruption in case of any bugs that escaped the review; however, we've tested many configurations with/without enabling the feature, if that can be of additional value.

Cons: we are touching some common code related to p2m, but also there the impact should be minimal because the new code is subject to l2 foreign mapping (to be confirmed maybe by a p2m expert like Julien).

The comments on patch 3 of this series are addressed by this patch: https://patchwork.kernel.org/project/xen-devel/patch/20240528125603.2467640-1-luca.fance...@arm.com/

And the series is fully reviewed. So our request is to allow this series in 4.19. Oleksii, ARM maintainers, do you agree on that?

As a main reviewer of this series I'm ok to have this series in. It is nicely encapsulated and the feature itself is still in unsupported state. I don't foresee any issues with it.

There are changes in the p2m code and the memory allocation for boot domains. So is it really encapsulated? For me there are two risks:

* p2m (already mentioned by Luca): We modify the code to put foreign mappings. The worst that can happen is that we don't release a foreign mapping, which would mean the page will not be freed. AFAIK, we don't exercise this path in the CI.
* domain allocation: This mainly looks like refactoring, and the path is exercised in the CI. So I am not concerned with the domain allocation one.

@Luca, would it be possible to detail how you tested that the foreign pages were properly removed?

So at first we tested the code, with/without the static shared memory feature enabled, creating/destroying guests from Dom0 and checking that everything was ok.

Just to detail a bit more for the others: in a normal setup, guests would be using PV block/network, which are based on the grant table. To exercise the foreign mapping path, you would need a backend that maps using the GFN directly. This is the case for virtio MMIO. Other users are, IIRC, XenFB and IOREQ (if you emulate a device).

After a chat on Matrix with Julien, he suggested that using a virtio-mmio disk was better to stress the foreign mappings looking for regressions. Luckily I've found this slide deck from @Oleksandr: https://static.linaro.org/connect/lvc21/presentations/lvc21-314.pdf

So I did a setup using fvp-base, having a disk with two partitions containing the Dom0 rootfs and the DomU rootfs; Dom0 sees this disk using VirtIO block.

Thanks for the testing! I am satisfied with the result. I have committed the series.

Cheers,

-- Julien Grall
[ovmf test] 186346: all pass - PUSHED
flight 186346 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186346/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 ce91687a1b2d4e03b01abb474b4665629776f588
baseline version:
 ovmf                 712797cf19acd292bf203522a79e40e7e13d268b

Last test of basis   186338  2024-06-13 16:11:15 Z    0 days
Testing same since   186346  2024-06-14 07:13:40 Z    0 days    1 attempts

People who touched revisions under test:
  Jiaxin Wu

jobs:
 build-amd64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   712797cf19..ce91687a1b  ce91687a1b2d4e03b01abb474b4665629776f588 -> xen-tested-master
[libvirt test] 186343: tolerable all pass - PUSHED
flight 186343 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186343/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt       16 saverestore-support-check    fail  like 186330
 test-amd64-amd64-libvirt       15 migrate-support-check        fail  never pass
 test-amd64-amd64-libvirt-xsm   15 migrate-support-check        fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm   15 migrate-support-check        fail  never pass
 test-arm64-arm64-libvirt-xsm   16 saverestore-support-check    fail  never pass
 test-arm64-arm64-libvirt       15 migrate-support-check        fail  never pass
 test-arm64-arm64-libvirt       16 saverestore-support-check    fail  never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check        fail  never pass
 test-armhf-armhf-libvirt       15 migrate-support-check        fail  never pass
 test-amd64-amd64-libvirt-raw   14 migrate-support-check        fail  never pass
 test-amd64-amd64-libvirt-vhd   14 migrate-support-check        fail  never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-check        fail  never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-check    fail  never pass
 test-arm64-arm64-libvirt-raw   14 migrate-support-check        fail  never pass
 test-arm64-arm64-libvirt-raw   15 saverestore-support-check    fail  never pass
 test-armhf-armhf-libvirt-vhd   14 migrate-support-check        fail  never pass
 test-armhf-armhf-libvirt-vhd   15 saverestore-support-check    fail  never pass

version targeted for testing:
 libvirt              991c324fae2cfca8d592437ffc386171d343c836
baseline version:
 libvirt              230d81fc3a1398e66f482e404abe9759afd9bb59

Last test of basis   186330  2024-06-13 04:18:43 Z    1 days
Testing same since   186343  2024-06-14 04:22:22 Z    0 days    1 attempts

People who touched revisions under test:
  Daniel P. Berrangé

jobs:
 build-amd64-xsm                                              pass
 build-arm64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-arm64                                                  pass
 build-armhf                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-arm64-libvirt                                          pass
 build-armhf-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-arm64-pvops                                            pass
 build-armhf-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           pass
 test-amd64-amd64-libvirt-xsm                                 pass
 test-arm64-arm64-libvirt-xsm                                 pass
 test-amd64-amd64-libvirt                                     pass
 test-arm64-arm64-libvirt                                     pass
 test-armhf-armhf-libvirt                                     pass
 test-amd64-amd64-libvirt-pair                                pass
 test-amd64-amd64-libvirt-qcow2                               pass
 test-arm64-arm64-libvirt-qcow2                               pass
 test-amd64-amd64-libvirt-raw                                 pass
 test-arm64-arm64-libvirt-raw                                 pass
 test-amd64-amd64-libvirt-vhd                                 pass
 test-armhf-armhf-libvirt-vhd                                 pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/libvirt.git
   230d81fc3a..991c324fae  991c324fae2cfca8d592437ffc386171d343c836 -> xen-tested-master
[XEN PATCH v3] automation/eclair: extend existing deviations of MISRA C Rule 16.3
Update ECLAIR configuration to deviate more cases where an unintentional fallthrough cannot happen. Add Rule 16.3 to the monitored set and tag it as clean for arm.

Signed-off-by: Federico Serafini
---
Changes from v2:
- fixed grammar;
- rephrased deviations regarding do-while-false and ASSERT_UNREACHABLE().
---
 .../eclair_analysis/ECLAIR/deviations.ecl     | 31 ++-
 .../eclair_analysis/ECLAIR/monitored.ecl      |  1 +
 automation/eclair_analysis/ECLAIR/tagging.ecl |  2 +-
 docs/misra/deviations.rst                     | 28 +++--
 4 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl b/automation/eclair_analysis/ECLAIR/deviations.ecl
index 447c1e6661..3bdfc3a84d 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -364,14 +364,30 @@ therefore it is deemed better to leave such files as is."
 -config=MC3R1.R16.2,reports+={deliberate, "any_area(any_loc(file(x86_emulate||x86_svm_emulate)))"}
 -doc_end
 
--doc_begin="Switch clauses ending with continue, goto, return statements are
-safe."
--config=MC3R1.R16.3,terminals+={safe, "node(continue_stmt||goto_stmt||return_stmt)"}
+-doc_begin="Statements that change the control flow (i.e., break, continue, goto, return) and calls to functions that do not return the control back are \"allowed terminal statements\"."
+-stmt_selector+={r16_3_allowed_terminal, "node(break_stmt||continue_stmt||goto_stmt||return_stmt)||call(property(noreturn))"}
+-config=MC3R1.R16.3,terminals+={safe, "r16_3_allowed_terminal"}
+-doc_end
+
+-doc_begin="An if-else statement having both branches ending with an allowed terminal statement is itself an allowed terminal statement."
+-stmt_selector+={r16_3_if, "node(if_stmt)&&(child(then,r16_3_allowed_terminal)||child(then,any_stmt(stmt,-1,r16_3_allowed_terminal)))"}
+-stmt_selector+={r16_3_else, "node(if_stmt)&&(child(else,r16_3_allowed_terminal)||child(else,any_stmt(stmt,-1,r16_3_allowed_terminal)))"}
+-stmt_selector+={r16_3_if_else, "r16_3_if&&r16_3_else"}
+-config=MC3R1.R16.3,terminals+={safe, "r16_3_if_else"}
+-doc_end
+
+-doc_begin="An if-else statement having an always true condition and the true branch ending with an allowed terminal statement is itself an allowed terminal statement."
+-stmt_selector+={r16_3_if_true, "r16_3_if&(cond,definitely_in(1..))"}
+-config=MC3R1.R16.3,terminals+={safe, "r16_3_if_true"}
+-doc_end
+
+-doc_begin="A switch clause ending with a statement expression which, in turn, ends with an allowed terminal statement is safe."
+-config=MC3R1.R16.3,terminals+={safe, "node(stmt_expr)&(stmt,node(compound_stmt)&_stmt(stmt,-1,r16_3_allowed_terminal||r16_3_if_else||r16_3_if_true))"}
 -doc_end
 
--doc_begin="Switch clauses ending with a call to a function that does not give
-the control back (i.e., a function with attribute noreturn) are safe."
--config=MC3R1.R16.3,terminals+={safe, "call(property(noreturn))"}
+-doc_begin="A switch clause ending with a do-while-false the body of which, in turn, ends with an allowed terminal statement is safe.
+An exception to that is the macro ASSERT_UNREACHABLE() which is effective in debug build only: a switch clause ending with ASSERT_UNREACHABLE() is not considered safe."
+-config=MC3R1.R16.3,terminals+={safe, "!macro(name(ASSERT_UNREACHABLE))&(do_stmt)&(cond,definitely_in(0))&(body,any_stmt(stmt,-1,r16_3_allowed_terminal||r16_3_if_else||r16_3_if_true))"}
 -doc_end
 
 -doc_begin="Switch clauses ending with pseudo-keyword \"fallthrough\" are
@@ -383,8 +399,7 @@ safe."
 -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(/BUG\\(\\);/"}
 -doc_end
 
--doc_begin="Switch clauses not ending with the break statement are safe if an
-explicit comment indicating the fallthrough intention is present."
+-doc_begin="Switch clauses ending with an explicit comment indicating the fallthrough intention are safe."
 -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(^(?s).*/\\* [fF]all ?through.? \\*/.*$,0..1"}
 -doc_end
 
diff --git a/automation/eclair_analysis/ECLAIR/monitored.ecl b/automation/eclair_analysis/ECLAIR/monitored.ecl
index 4daecb0c83..45a60074f9 100644
--- a/automation/eclair_analysis/ECLAIR/monitored.ecl
+++ b/automation/eclair_analysis/ECLAIR/monitored.ecl
@@ -22,6 +22,7 @@
 -enable=MC3R1.R14.1
 -enable=MC3R1.R14.4
 -enable=MC3R1.R16.2
+-enable=MC3R1.R16.3
 -enable=MC3R1.R16.6
 -enable=MC3R1.R16.7
 -enable=MC3R1.R17.1
diff --git a/automation/eclair_analysis/ECLAIR/tagging.ecl b/automation/eclair_analysis/ECLAIR/tagging.ecl
index a354ff322e..07de2e7b65 100644
--- a/automation/eclair_analysis/ECLAIR/tagging.ecl
+++ b/automation/eclair_analysis/ECLAIR/tagging.ecl
@@ -105,7 +105,7 @@ if(string_equal(target,"x86_64"),
 )
 
 if(string_equal(target,"arm64"),
-
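The "allowed terminal statement" classes that the configuration describes can be illustrated with a small compliant switch. This is an illustrative sketch (function names hypothetical), assuming the GCC-style `noreturn` attribute that the deviation's `call(property(noreturn))` selector matches:

```c
#include <stdlib.h>

/* A noreturn function: calls to it are "allowed terminal statements". */
__attribute__((noreturn)) static void fail(void)
{
    exit(1);
}

int classify(int op, int arg)
{
    int ret = 0;

    switch (op) {
    case 0:
        return arg;          /* return: allowed terminal statement      */

    case 1:
        if (arg < 0)
            return -arg;     /* both branches end in a terminal, so the */
        else
            return arg;      /* if-else is itself a terminal statement  */

    case 2:
        ret = 1;
        /* fallthrough */    /* explicit comment: deviated as safe      */
    case 3:
        ret += arg;
        break;               /* break: allowed terminal statement       */

    default:
        fail();              /* noreturn call terminates the clause     */
    }

    return ret;
}
```

A clause ending in none of these (e.g. plain `ret = 1;` followed by the next `case` label with no comment) is what Rule 16.3 would still report.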
[XEN PATCH v3] automation/eclair: add deviation for MISRA C Rule 17.7
Update ECLAIR configuration to deviate some cases where not using the return value of a function is not dangerous.

Signed-off-by: Federico Serafini
---
Changes in v3:
- removed unwanted underscores;
- grammar fixed;
- do not constrain to the first actual argument.
Changes in v2:
- do not deviate strlcpy and strlcat.
---
 automation/eclair_analysis/ECLAIR/deviations.ecl | 4 ++++
 docs/misra/deviations.rst                        | 9 +++++++++
 2 files changed, 13 insertions(+)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl b/automation/eclair_analysis/ECLAIR/deviations.ecl
index 447c1e6661..97281082a8 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -413,6 +413,10 @@ explicit comment indicating the fallthrough intention is present."
 -config=MC3R1.R17.1,macros+={hide , "^va_(arg|start|copy|end)$"}
 -doc_end
 
+-doc_begin="Not using the return value of a function does not endanger safety if it coincides with an actual argument."
+-config=MC3R1.R17.7,calls+={safe, "any()", "decl(name(__builtin_memcpy||__builtin_memmove||__builtin_memset||cpumask_check))"}
+-doc_end
+
 #
 # Series 18.
 #
diff --git a/docs/misra/deviations.rst b/docs/misra/deviations.rst
index 36959aa44a..f3abe31eb5 100644
--- a/docs/misra/deviations.rst
+++ b/docs/misra/deviations.rst
@@ -364,6 +364,15 @@ Deviations related to MISRA C:2012 Rules:
        by `stdarg.h`.
      - Tagged as `deliberate` for ECLAIR.
 
+   * - R17.7
+     - Not using the return value of a function does not endanger safety if it
+       coincides with an actual argument.
+     - Tagged as `safe` for ECLAIR. Such functions are:
+         - __builtin_memcpy()
+         - __builtin_memmove()
+         - __builtin_memset()
+         - cpumask_check()
+
    * - R20.4
      - The override of the keyword \"inline\" in xen/compiler.h is present so
       that section contents checks pass when the compiler chooses not to
-- 
2.34.1
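The rationale of the deviation is that for these functions the return value merely echoes an argument the caller already holds: `memcpy`/`memmove`/`memset` return their destination operand, so discarding the result loses no information. A short illustrative sketch (function names hypothetical, standard C library calls):

```c
#include <string.h>

/* The memcpy return value (== dst) is deliberately unused: the caller
 * already has the pointer, so nothing is lost - the pattern R17.7 would
 * otherwise flag, and which the deviation tags as safe. */
void fill_ids(int *dst, const int *src, size_t n)
{
    memcpy(dst, src, n * sizeof(*dst));
}

int zero_and_read(int *buf, size_t n)
{
    memset(buf, 0, n * sizeof(*buf));   /* return value (== buf) unused */
    return buf[0];
}
```

By contrast, functions like `strlcpy()`/`strlcat()` were kept out of the deviation in v2 precisely because their return value (the would-be length) is new information that a caller may need for truncation checks.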
Re: [PATCH 01/14] ubd: refactor the interrupt handler
On 31/05/2024 08:47, Christoph Hellwig wrote:

Instead of a separate handler function that leaves no work in the interrupt handler itself, split out a per-request end I/O helper and clean up the coding style and variable naming while we're at it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Martin K. Petersen
---
 arch/um/drivers/ubd_kern.c | 49 ++
 1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index ef805eaa9e013d..0c9542d58c01b7 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -447,43 +447,30 @@ static int bulk_req_safe_read(
 	return n;
 }
 
-/* Called without dev->lock held, and only in interrupt context. */
-static void ubd_handler(void)
+static void ubd_end_request(struct io_thread_req *io_req)
 {
-	int n;
-	int count;
-
-	while(1){
-		n = bulk_req_safe_read(
-			thread_fd,
-			irq_req_buffer,
-			_remainder,
-			_remainder_size,
-			UBD_REQ_BUFFER_SIZE
-		);
-		if (n < 0) {
-			if(n == -EAGAIN)
-				break;
-			printk(KERN_ERR "spurious interrupt in ubd_handler, "
-			       "err = %d\n", -n);
-			return;
-		}
-		for (count = 0; count < n/sizeof(struct io_thread_req *); count++) {
-			struct io_thread_req *io_req = (*irq_req_buffer)[count];
-
-			if ((io_req->error == BLK_STS_NOTSUPP) && (req_op(io_req->req) == REQ_OP_DISCARD)) {
-				blk_queue_max_discard_sectors(io_req->req->q, 0);
-				blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
-			}
-			blk_mq_end_request(io_req->req, io_req->error);
-			kfree(io_req);
-		}
+	if (io_req->error == BLK_STS_NOTSUPP &&
+	    req_op(io_req->req) == REQ_OP_DISCARD) {
+		blk_queue_max_discard_sectors(io_req->req->q, 0);
+		blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
 	}
+	blk_mq_end_request(io_req->req, io_req->error);
+	kfree(io_req);
 }
 
 static irqreturn_t ubd_intr(int irq, void *dev)
 {
-	ubd_handler();
+	int len, i;
+
+	while ((len = bulk_req_safe_read(thread_fd, irq_req_buffer,
+			_remainder, _remainder_size,
+			UBD_REQ_BUFFER_SIZE)) >= 0) {
+		for (i = 0; i < len / sizeof(struct io_thread_req *); i++)
+			ubd_end_request((*irq_req_buffer)[i]);
+	}
+
+	if (len < 0 && len != -EAGAIN)
+		pr_err("spurious interrupt in %s, err = %d\n", __func__, len);
 	return IRQ_HANDLED;
 }

Acked-By: Anton Ivanov

-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
Re: [PATCH 02/14] ubd: untangle discard vs write zeroes not-supported handling
On 31/05/2024 08:47, Christoph Hellwig wrote:

Discard and Write Zeroes are different operations, implemented by
different fallocate opcodes for ubd. If one fails the other one can
work and vice versa. Split the code to disable the operations in
ubd_handler to only disable the operation that actually failed.

Fixes: 50109b5a03b4 ("um: Add support for DISCARD in the UBD Driver")
Signed-off-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Reviewed-by: Damien Le Moal
Reviewed-by: Martin K. Petersen
---
 arch/um/drivers/ubd_kern.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 0c9542d58c01b7..093c87879d08ba 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -449,10 +449,11 @@ static int bulk_req_safe_read(
 
 static void ubd_end_request(struct io_thread_req *io_req)
 {
-	if (io_req->error == BLK_STS_NOTSUPP &&
-	    req_op(io_req->req) == REQ_OP_DISCARD) {
-		blk_queue_max_discard_sectors(io_req->req->q, 0);
-		blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
+	if (io_req->error == BLK_STS_NOTSUPP) {
+		if (req_op(io_req->req) == REQ_OP_DISCARD)
+			blk_queue_max_discard_sectors(io_req->req->q, 0);
+		else if (req_op(io_req->req) == REQ_OP_WRITE_ZEROES)
+			blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
 	}
 	blk_mq_end_request(io_req->req, io_req->error);
 	kfree(io_req);

Acked-By: Anton Ivanov
-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
Re: Design session notes: GPU acceleration in Xen
On Fri, Jun 14, 2024 at 10:12:40AM +0200, Jan Beulich wrote:
> On 14.06.2024 09:21, Roger Pau Monné wrote:
> > On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> >> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> >>> GPU acceleration requires that pageable host memory be able to be mapped
> >>> into a guest.
> >>
> >> I'm sure it was explained in the session, which sadly I couldn't attend.
> >> I've been asking Ray and Xenia the same before, but I'm afraid it still
> >> hasn't become clear to me why this is a _requirement_. After all that's
> >> against what we're doing elsewhere (i.e. so far it has always been
> >> guest memory that's mapped in the host). I can appreciate that it might
> >> be more difficult to implement, but avoiding to violate this fundamental
> >> (kind of) rule might be worth the price (and would avoid other
> >> complexities, of which there may be lurking more than what you enumerate
> >> below).
> >
> > My limited understanding (please someone correct me if wrong) is that
> > the GPU buffer (or context I think it's also called?) is always
> > allocated from dom0 (the owner of the GPU). The underlying memory
> > addresses of such a buffer need to be mapped into the guest. The
> > buffer backing memory might be GPU MMIO from the device BAR(s) or
> > system RAM, and such a buffer can be paged by the dom0 kernel at any
> > time (iow: changing the backing memory from MMIO to RAM or vice
> > versa). Also, the buffer must be contiguous in physical address
> > space.
>
> This last one in particular would of course be a severe restriction.
> Yet: There's an IOMMU involved, isn't there?

Yup, IIRC that's why Ray said it was much easier for them to support
VirtIO GPUs from a PVH dom0 rather than a classic PV one.

It might be easier to implement from a classic PV dom0 if there's
pv-iommu support, so that dom0 can create its own contiguous memory
buffers from the device PoV.
> > I'm not sure it's possible to ensure that when using system RAM such
> > memory comes from the guest rather than the host, as it would likely
> > require some very intrusive hooks into the kernel logic, and
> > negotiation with the guest to allocate the requested amount of
> > memory and hand it over to dom0. If the maximum size of the buffer is
> > known in advance maybe dom0 can negotiate with the guest to allocate
> > such a region and grant it access to dom0 at driver attachment time.
>
> Besides the thought of transiently converting RAM to kind-of-MMIO, this

As a note here, changing the type to MMIO would likely involve modifying
the EPT/NPT tables to propagate the new type. On a PVH dom0 this would
likely involve shattering superpages in order to set the correct memory
types. Depending on how often and how random those system RAM changes
are necessary this could also create contention on the p2m lock.

> makes me think of another possible option: Could Dom0 transfer ownership
> of the RAM that wants mapping in the guest (remotely resembling
> grant-transfer)? Would require the guest to have ballooned down enough
> first, of course. (In both cases it would certainly need working out how
> the conversion / transfer back could be made work safely and reasonably
> cleanly.)

Maybe. The fact the guest needs to balloon down that amount of memory
seems weird to me, as from the guest PoV that mapped memory is MMIO-like
and not system RAM.

Thanks, Roger.
Re: Design session notes: GPU acceleration in Xen
On 14.06.2024 09:21, Roger Pau Monné wrote:
> On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
>> On 13.06.2024 20:43, Demi Marie Obenour wrote:
>>> GPU acceleration requires that pageable host memory be able to be mapped
>>> into a guest.
>>
>> I'm sure it was explained in the session, which sadly I couldn't attend.
>> I've been asking Ray and Xenia the same before, but I'm afraid it still
>> hasn't become clear to me why this is a _requirement_. After all that's
>> against what we're doing elsewhere (i.e. so far it has always been
>> guest memory that's mapped in the host). I can appreciate that it might
>> be more difficult to implement, but avoiding to violate this fundamental
>> (kind of) rule might be worth the price (and would avoid other
>> complexities, of which there may be lurking more than what you enumerate
>> below).
>
> My limited understanding (please someone correct me if wrong) is that
> the GPU buffer (or context I think it's also called?) is always
> allocated from dom0 (the owner of the GPU). The underling memory
> addresses of such buffer needs to be mapped into the guest. The
> buffer backing memory might be GPU MMIO from the device BAR(s) or
> system RAM, and such buffer can be paged by the dom0 kernel at any
> time (iow: changing the backing memory from MMIO to RAM or vice
> versa). Also, the buffer must be contiguous in physical address
> space.

This last one in particular would of course be a severe restriction.
Yet: There's an IOMMU involved, isn't there?

> I'm not sure it's possible to ensure that when using system RAM such
> memory comes from the guest rather than the host, as it would likely
> require some very intrusive hooks into the kernel logic, and
> negotiation with the guest to allocate the requested amount of
> memory and hand it over to dom0. If the maximum size of the buffer is
> known in advance maybe dom0 can negotiate with the guest to allocate
> such a region and grant it access to dom0 at driver attachment time.

Besides the thought of transiently converting RAM to kind-of-MMIO, this
makes me think of another possible option: Could Dom0 transfer ownership
of the RAM that wants mapping in the guest (remotely resembling
grant-transfer)? Would require the guest to have ballooned down enough
first, of course. (In both cases it would certainly need working out how
the conversion / transfer back could be made work safely and reasonably
cleanly.)

Jan

> One aspect that I'm lacking clarity is better understanding of how the
> process of allocating and assigning a GPU buffer to a guest is
> performed (I think this is the key to how GPU VirtIO native contexts
> work?).
>
> Another question I have, are guest expected to have a single GPU
> buffer, or they can have multiple GPU buffers simultaneously
> allocated?
>
> Regards, Roger.
Re: [PATCH v4 0/7] Static shared memory followup v2 - pt2
Hi Julien,

> On 13 Jun 2024, at 12:31, Julien Grall wrote:
>
> Hi,
>
> On 11/06/2024 13:42, Michal Orzel wrote:
>>> We would like this series to be in Xen 4.19. There was a misunderstanding
>>> on our side because we thought that, since the series was sent before the
>>> last posting date, it could have been a candidate for merging in the new
>>> release; instead, after speaking with Julien and Oleksii, we are now aware
>>> that we need to provide a justification for its presence.
>>>
>>> Pros of this series: we are closing the circle for static shared memory,
>>> allowing it to use memory from the host or from Xen. It is also a feature
>>> that is not enabled by default, so it should not cause too much disruption
>>> in case of any bugs that escaped the review; we have also tested many
>>> configurations with and without the feature enabled, if that can be an
>>> additional value.
>>>
>>> Cons: we are touching some common code related to p2m, but also there the
>>> impact should be minimal because the new code is subject to l2 foreign
>>> mapping (to be confirmed maybe by a p2m expert like Julien).
>>>
>>> The comments on patch 3 of this series are addressed by this patch:
>>> https://patchwork.kernel.org/project/xen-devel/patch/20240528125603.2467640-1-luca.fance...@arm.com/
>>> And the series is fully reviewed.
>>>
>>> So our request is to allow this series in 4.19. Oleksii, ARM maintainers,
>>> do you agree on that?
>> As the main reviewer of this series I'm ok to have it in. It is nicely
>> encapsulated and the feature itself is still in unsupported state. I don't
>> foresee any issues with it.
>
> There are changes in the p2m code and the memory allocation for boot domains.
> So is it really encapsulated?
>
> For me there are two risks:
> * p2m (already mentioned by Luca): We modify the code to put foreign
>   mappings in place. The worst that can happen is that we don't release a
>   foreign mapping. This would mean the page will not be freed.
> AFAIK, we don't exercise this path in the CI.
> * domain allocation: This mainly looks like refactoring. And the path is
>   exercised in the CI.
>
> So I am not concerned with the domain allocation one. @Luca, would it be
> possible to detail how you tested that the foreign pages were properly
> removed?

So at first we tested the code, with/without the static shared memory
feature enabled, creating/destroying guests from Dom0 and checking that
everything was ok.

After a chat on Matrix with Julien, he suggested that using a virtio-mmio
disk was better for stressing the foreign mappings while looking for
regressions. Luckily I found this slide deck from @Oleksandr:
https://static.linaro.org/connect/lvc21/presentations/lvc21-314.pdf

So I did a setup using fvp-base, having a disk with two partitions
containing the Dom0 rootfs and the DomU rootfs; Dom0 sees this disk using
VirtIO block. The Dom0 rootfs contains the virtio-disk backend:
https://github.com/xen-troops/virtio-disk

And the DomU XL configuration is using these parameters:

cmdline="console=hvc0 root=/dev/vda rw"
disk = ['/dev/vda2,raw,xvda,w,specification=virtio']

Running the setup and creating/destroying the guest a couple of times
shows no regressions. Here is an example of the output:

root@fvp-base:/opt/xtp/guests/linux-guests# xl create -c linux-ext-arm64-stresstests-rootfs.cfg
Parsing config from linux-ext-arm64-stresstests-rootfs.cfg
main: read frontend domid 2
Info: connected to dom2
demu_seq_next: >XENSTORE_ATTACHED
demu_seq_next: domid = 2
demu_seq_next: devid = 51712
demu_seq_next: filename[0] = /dev/vda2
demu_seq_next: readonly[0] = 0
demu_seq_next: base[0] = 0x200
demu_seq_next: irq[0] = 33
demu_seq_next: >XENEVTCHN_OPEN
demu_seq_next: >XENFOREIGNMEMORY_OPEN
demu_seq_next: >XENDEVICEMODEL_OPEN
demu_seq_next: >XENGNTTAB_OPEN
demu_initialize: 1 vCPU(s)
demu_seq_next: >SERVER_REGISTERED
demu_seq_next: ioservid = 0
demu_seq_next: >RESOURCE_MAPPED
demu_seq_next: shared_iopage = 0x7f80c58000
demu_seq_next: >SERVER_ENABLED
demu_seq_next: >PORT_ARRAY_ALLOCATED demu_seq_next: >EVTCHN_PORTS_BOUND demu_seq_next: VCPU0: 3 -> 6 demu_register_memory_space: 200 - 20001ff Info: (virtio/mmio.c) virtio_mmio_init:165: virtio-mmio.devices=0x200@0x200:33 demu_seq_next: >DEVICE_INITIALIZED demu_seq_next: >INITIALIZED IO request not ready (XEN) d2v0 Unhandled SMC/HVC: 0x8450 (XEN) d2v0 Unhandled SMC/HVC: 0x8600ff01 (XEN) d2v0: vGICD: RAZ on reserved register offset 0x0c (XEN) d2v0: vGICD: unhandled word write 0x00 to ICACTIVER4 (XEN) d2v0: vGICR: SGI: unhandled word write 0x00 to ICACTIVER0 [0.00] Booting Linux on physical CPU 0x00 [0x410fd0f0] [0.00] Linux version 6.1.25 (lucfan01@e125770) (aarch64-poky-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.40.20230119) #4 SMP PREEMPT Thu Jun 13 21:55:06 UTC 2024 [0.00] Machine model: XENVM-4.19 [0.00] Xen 4.19 support found [
Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
On Thu, Jun 13, 2024 at 04:05:08PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 12, 2024 at 05:56:15PM +0200, Roger Pau Monné wrote:
> > Right. AFAICT advertising "feature-barrier" and/or
> > "feature-flush-cache" could be done based on whether blkback
> > understand those commands, not on whether the underlying storage
> > supports the equivalent of them.
> >
> > Worst case we can print a warning message once about the underlying
> > storage failing to complete flush/barrier requests, and that data
> > integrity might not be guaranteed going forward, and not propagate the
> > error to the upper layer?
> >
> > What would be the consequence of propagating a flush error to the
> > upper layers?
>
> If you propagate the error to the upper layer you will generate an
> I/O error there, which usually leads to a file system shutdown.
>
> > Given the description of the feature in the blkif header, I'm afraid
> > we cannot guarantee that seeing the feature exposed implies barrier or
> > flush support, since the request could fail at any time (or even from
> > the start of the disk attachment) and it would still sadly be a correct
> > implementation given the description of the options.
>
> Well, then we could do something like the patch below, which keeps
> the existing behavior, but isolates the block layer from it and
> removes the only user of blk_queue_write_cache from interrupt
> context:

LGTM, I'm not sure there's much else we can do.

> ---
> From e6e82c769ab209a77302994c3829cf6ff7a595b8 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig
> Date: Thu, 30 May 2024 08:58:52 +0200
> Subject: xen-blkfront: don't disable cache flushes when they fail
>
> blkfront always had a robust negotiation protocol for detecting a write
> cache. Stop simply disabling cache flushes in the block layer as the
> flags handling is moving to the atomic queue limits API that needs
> user context to freeze the queue for that.
> Instead handle the case
> of the feature flags cleared inside of blkfront. This removes old
> debug code to check for such a mismatch which was previously impossible
> to hit, including the check for passthrough requests that blkfront
> never used to start with.
>
> Signed-off-by: Christoph Hellwig
> ---
>  drivers/block/xen-blkfront.c | 44 +++++++++++++++++++++----------------
>  1 file changed, 23 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 9b4ec3e4908cce..e2c92d5095ff17 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -788,6 +788,14 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>  	 * A barrier request a superset of FUA, so we can
>  	 * implement it the same way. (It's also a FLUSH+FUA,
>  	 * since it is guaranteed ordered WRT previous writes.)
> + *
> + * Note that can end up here with a FUA write and the
> + * flags cleared. This happens when the flag was
> + * run-time disabled and raced with I/O submission in
> + * the block layer. We submit it as a normal write

Since blkfront no longer signals that FUA is no longer available for the
device, getting a request with FUA is not actually a race I think?

> + * here. A pure flush should never end up here with
> + * the flags cleared as they are completed earlier for
> + * the !feature_flush case.
>  */
>  			if (info->feature_flush && info->feature_fua)
>  				ring_req->operation =
> @@ -795,8 +803,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>  			else if (info->feature_flush)
>  				ring_req->operation =
>  					BLKIF_OP_FLUSH_DISKCACHE;
> -			else
> -				ring_req->operation = 0;
>  		}
>  		ring_req->u.rw.nr_segments = num_grant;
>  		if (unlikely(require_extra_req)) {
> @@ -887,16 +893,6 @@ static inline void flush_requests(struct blkfront_ring_info *rinfo)
>  		notify_remote_via_irq(rinfo->irq);
>  }
>
> -static inline bool blkif_request_flush_invalid(struct request *req,
> -					       struct blkfront_info *info)
> -{
> -	return (blk_rq_is_passthrough(req) ||
> -		((req_op(req) == REQ_OP_FLUSH) &&
> -		 !info->feature_flush) ||
> -		((req->cmd_flags & REQ_FUA) &&
> -		 !info->feature_fua));
> -}
> -
>  static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
>  		const struct blk_mq_queue_data *qd)
>  {
> @@ -908,23 +904,30 @@ static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx >
Re: [PATCH v3 0/3] x86/irq: fixes for CPU hot{,un}plug
Sorry, forgot to add the for-4.19 tag and Cc Oleksii.

Since we have taken the start of the series, we might as well take the
remaining patches (if other x86 maintainers agree) and attempt to
hopefully fix all the interrupt issues with CPU hotplug/unplug.

FTR: there are further issues when doing CPU hotplug/unplug from a PVH
dom0, but those are out of the scope for 4.19, as I haven't even started
to diagnose what's going on.

Thanks, Roger.

On Thu, Jun 13, 2024 at 06:56:14PM +0200, Roger Pau Monne wrote:
> Hello,
>
> The following series aims to fix interrupt handling when doing CPU
> plug/unplug operations. Without this series, running:
>
> cpus=`xl info max_cpu_id`
> while [ 1 ]; do
>     for i in `seq 1 $cpus`; do
>         xen-hptool cpu-offline $i;
>         xen-hptool cpu-online $i;
>     done
> done
>
> quite quickly results in interrupts getting lost and "No irq handler for
> vector" messages on the Xen console. Drivers in dom0 also start getting
> interrupt timeouts and the system becomes unusable.
>
> After applying the series, running the loop overnight still results in a
> fully usable system: no "No irq handler for vector" messages at all, and
> no interrupt losses reported by dom0. Tested with
> x2apic-mode={mixed,cluster}.
>
> I've attempted to document all the code as well as I could; interrupt
> handling has some unexpected corner cases that are hard to diagnose and
> reason about.
>
> Some XenRT testing is ongoing to ensure no breakages.
>
> Thanks, Roger.
>
> Roger Pau Monne (3):
>   x86/irq: deal with old_cpu_mask for interrupts in movement in
>     fixup_irqs()
>   x86/irq: handle moving interrupts in _assign_irq_vector()
>   x86/irq: forward pending interrupts to new destination in fixup_irqs()
>
>  xen/arch/x86/include/asm/apic.h |   5 +
>  xen/arch/x86/irq.c              | 163 ++++++++++++++++++++++++--------
>  2 files changed, 132 insertions(+), 36 deletions(-)
>
> --
> 2.45.2
Re: Design session notes: GPU acceleration in Xen
On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> > GPU acceleration requires that pageable host memory be able to be mapped
> > into a guest.
>
> I'm sure it was explained in the session, which sadly I couldn't attend.
> I've been asking Ray and Xenia the same before, but I'm afraid it still
> hasn't become clear to me why this is a _requirement_. After all that's
> against what we're doing elsewhere (i.e. so far it has always been
> guest memory that's mapped in the host). I can appreciate that it might
> be more difficult to implement, but avoiding to violate this fundamental
> (kind of) rule might be worth the price (and would avoid other
> complexities, of which there may be lurking more than what you enumerate
> below).

My limited understanding (please someone correct me if wrong) is that
the GPU buffer (or context I think it's also called?) is always
allocated from dom0 (the owner of the GPU). The underlying memory
addresses of such a buffer need to be mapped into the guest. The
buffer backing memory might be GPU MMIO from the device BAR(s) or
system RAM, and such a buffer can be paged by the dom0 kernel at any
time (iow: changing the backing memory from MMIO to RAM or vice
versa). Also, the buffer must be contiguous in physical address
space.

I'm not sure it's possible to ensure that when using system RAM such
memory comes from the guest rather than the host, as it would likely
require some very intrusive hooks into the kernel logic, and
negotiation with the guest to allocate the requested amount of
memory and hand it over to dom0. If the maximum size of the buffer is
known in advance maybe dom0 can negotiate with the guest to allocate
such a region and grant it access to dom0 at driver attachment time.

One aspect where I'm still lacking clarity is how the process of
allocating and assigning a GPU buffer to a guest is performed (I think
this is the key to how GPU VirtIO native contexts work?).

Another question I have: are guests expected to have a single GPU
buffer, or can they have multiple GPU buffers simultaneously
allocated?

Regards, Roger.
Re: [RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
On 2024/6/14 14:41, Jan Beulich wrote: > On 14.06.2024 05:11, Chen, Jiqian wrote: >> On 2024/6/13 20:51, Anthony PERARD wrote: >>> On Wed, Jun 12, 2024 at 10:55:14AM +, Chen, Jiqian wrote: On 2024/6/12 18:34, Jan Beulich wrote: > On 12.06.2024 12:12, Chen, Jiqian wrote: >> On 2024/6/11 22:39, Jan Beulich wrote: >>> On 07.06.2024 10:11, Jiqian Chen wrote: +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map); >>> >>> Looking at the hypervisor side, this will fail for PV Dom0. In which >>> case imo >>> you better would avoid making the call in the first place. >> Yes, for PV dom0, the errno is EOPNOTSUPP, then it will do below >> xc_domain_irq_permission. > > Hence why call xc_domain_gsi_permission() at all on a PV Dom0? Is there a function to distinguish that current dom0 is PV or PVH dom0 in tools/libs? >>> >>> That might have never been needed before, so probably not. There's >>> libxl__domain_type() but if that works with dom0 it might return "HVM" >>> for PVH dom0. So if xc_domain_getinfo_single() works and give the right >>> info about dom0, libxl__domain_type() could be extended to deal with >>> dom0 I guess. I don't know if there's a good way to find out which >>> flavor of dom0 is running. >> Thanks Anthony! >> I think here we really need to check is that whether current domain has PIRQ >> flag(X86_EMU_USE_PIRQ) or not. >> And it seems xc_domain_gsi_permission already return the information. > > By way of failing, if I'm not mistaken? As indicated before, I don't > think you should invoke the function when it's clear it's going to fail. Sorry, I wrote wrong here, it should be " And it seems xc_domain_getinfo_single already return the information." 
And next version will be like:

    xc_domaininfo_t xcinfo;

    xc_domain_getinfo_single(xc_handle, domid, &xcinfo);
    if ( xcinfo.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ )
        xc_domain_irq_permission
    else
        xc_domain_gsi_permission

> Jan
>
>> If current domain has no PIRQs, then I should use xc_domain_gsi_permission
>> to grant permission, otherwise I should
>> keep the original function xc_domain_irq_permission.
>>
>>> Cheers,

-- 
Best regards,
Jiqian Chen.
Re: [XEN PATCH v2] automation/eclair: add deviation for MISRA C Rule 17.7
On 14.06.2024 08:31, Federico Serafini wrote: > --- a/automation/eclair_analysis/ECLAIR/deviations.ecl > +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl > @@ -413,6 +413,10 @@ explicit comment indicating the fallthrough intention is > present." > -config=MC3R1.R17.1,macros+={hide , "^va_(arg|start|copy|end)$"} > -doc_end > > +-doc_begin="Not using the return value of a function do not endanger safety > if it coincides with the first actual argument." > +-config=MC3R1.R17.7,calls+={safe, "any()", > "decl(name(__builtin_memcpy||__builtin_memmove||__builtin_memset||cpumask_check))"} While correct here, ... > --- a/docs/misra/deviations.rst > +++ b/docs/misra/deviations.rst > @@ -364,6 +364,15 @@ Deviations related to MISRA C:2012 Rules: > by `stdarg.h`. > - Tagged as `deliberate` for ECLAIR. > > + * - R17.7 > + - Not using the return value of a function do not endanger safety if it > + coincides with the first actual argument. > + - Tagged as `safe` for ECLAIR. Such functions are: > + - __builtin_memcpy() > + - __builtin_memmove() > + - __builtin_memset() > + - __cpumask_check() ... there are stray leading underscores on the last one here. With that adjustment (and perhaps "s/ do / does /") the deviations.rst change would then look okay to me, but I don't feel competent to ack deviations.ecl changes. Still, as another question: Is it really relevant here that the argument in question is specifically the 1st one? Jan
Re: [RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
On 14.06.2024 05:11, Chen, Jiqian wrote: > On 2024/6/13 20:51, Anthony PERARD wrote: >> On Wed, Jun 12, 2024 at 10:55:14AM +, Chen, Jiqian wrote: >>> On 2024/6/12 18:34, Jan Beulich wrote: On 12.06.2024 12:12, Chen, Jiqian wrote: > On 2024/6/11 22:39, Jan Beulich wrote: >> On 07.06.2024 10:11, Jiqian Chen wrote: >>> +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map); >> >> Looking at the hypervisor side, this will fail for PV Dom0. In which >> case imo >> you better would avoid making the call in the first place. > Yes, for PV dom0, the errno is EOPNOTSUPP, then it will do below > xc_domain_irq_permission. Hence why call xc_domain_gsi_permission() at all on a PV Dom0? >>> Is there a function to distinguish that current dom0 is PV or PVH dom0 in >>> tools/libs? >> >> That might have never been needed before, so probably not. There's >> libxl__domain_type() but if that works with dom0 it might return "HVM" >> for PVH dom0. So if xc_domain_getinfo_single() works and give the right >> info about dom0, libxl__domain_type() could be extended to deal with >> dom0 I guess. I don't know if there's a good way to find out which >> flavor of dom0 is running. > Thanks Anthony! > I think here we really need to check is that whether current domain has PIRQ > flag(X86_EMU_USE_PIRQ) or not. > And it seems xc_domain_gsi_permission already return the information. By way of failing, if I'm not mistaken? As indicated before, I don't think you should invoke the function when it's clear it's going to fail. Jan > If current domain has no PIRQs, then I should use xc_domain_gsi_permission to > grant permission, otherwise I should > keep the original function xc_domain_irq_permission. > >> >> Cheers, >> >
Re: Design session notes: GPU acceleration in Xen
On 13.06.2024 20:43, Demi Marie Obenour wrote:
> GPU acceleration requires that pageable host memory be able to be mapped
> into a guest.

I'm sure it was explained in the session, which sadly I couldn't attend.
I've been asking Ray and Xenia the same before, but I'm afraid it still
hasn't become clear to me why this is a _requirement_. After all that's
against what we're doing elsewhere (i.e. so far it has always been
guest memory that's mapped in the host). I can appreciate that it might
be more difficult to implement, but avoiding to violate this fundamental
(kind of) rule might be worth the price (and would avoid other
complexities, of which there may be lurking more than what you enumerate
below).

> This requires changes to all of the Xen hypervisor, Linux
> kernel, and userspace device model.
>
> ### Goals
>
> - Allow any userspace pages to be mapped into a guest.
> - Support deprivileged operation: this API must not be usable for privilege
>   escalation.
> - Use MMU notifiers to ensure safety with respect to use-after-free.
>
> ### Hypervisor changes
>
> There are at least two Xen changes required:
>
> 1. Add a new flag to IOREQ that means "retry this instruction".
>
>    An IOREQ server can set this flag after having successfully handled a
>    page fault. It is expected that the IOREQ server has successfully
>    mapped a page into the guest at the location of the fault.
>    Otherwise, the same fault will likely happen again.

Were there any thoughts on how to prevent this becoming an infinite
loop? I.e. how to (a) guarantee forward progress in the guest and
(b) deal with misbehaving IOREQ servers?

> 2. Add support for `XEN_DOMCTL_memory_mapping` to use system RAM, not
>    just IOMEM. Mappings made with `XEN_DOMCTL_memory_mapping` are
>    guaranteed to be able to be successfully revoked with
>    `XEN_DOMCTL_memory_mapping`, so all operations that would create
>    extra references to the mapped memory must be forbidden. These
>    include, but may not be limited to:
>
>    1. Granting the pages to the same or other domains.
>    2. Mapping into another domain using `XEN_DOMCTL_memory_mapping`.
>    3. Another domain accessing the pages using the foreign memory APIs,
>       unless it is privileged over the domain that owns the pages.

All of which may call for actually converting the memory to kind-of-MMIO,
with a means to later convert it back.

Jan

>    Open question: what if the other domain goes away? Ideally,
>    unmapping would (vacuously) succeed in this case. Qubes OS doesn't
>    care about domid reuse but others might.
>
> ### Kernel changes
>
> Linux will add support for mapping userspace memory into an emulated PCI
> BAR. This requires Linux to automatically revoke access when needed.
>
> There will be an IOREQ server that handles page faults. The discussion
> assumed that this handling will happen in kernel mode, but if handling
> in user mode is simpler that is also an option.
>
> There is no async #PF in Xen (yet), so the entire vCPU will be blocked
> while the fault is handled. This is not great for performance, but
> correctness comes first.
>
> There will be a new kernel ioctl to perform the mapping. A possible C
> prototype (presented at design session, but not discussed there):
>
>     struct xen_linux_register_memory {
>         uint64_t pointer;
>         uint64_t size;
>         uint64_t gpa;
>         uint32_t id;
>         uint32_t guest_domid;
>     };
[XEN PATCH v2] automation/eclair: add deviation for MISRA C Rule 17.7
Update ECLAIR configuration to deviate some cases where not using the
return value of a function is not dangerous.

Signed-off-by: Federico Serafini
---
Changes in v2:
- do not deviate strlcpy and strlcat.
---
 automation/eclair_analysis/ECLAIR/deviations.ecl | 4 ++++
 docs/misra/deviations.rst                        | 9 +++++++++
 2 files changed, 13 insertions(+)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl b/automation/eclair_analysis/ECLAIR/deviations.ecl
index 447c1e6661..97281082a8 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -413,6 +413,10 @@ explicit comment indicating the fallthrough intention is present."
 -config=MC3R1.R17.1,macros+={hide , "^va_(arg|start|copy|end)$"}
 -doc_end
 
+-doc_begin="Not using the return value of a function do not endanger safety if it coincides with the first actual argument."
+-config=MC3R1.R17.7,calls+={safe, "any()", "decl(name(__builtin_memcpy||__builtin_memmove||__builtin_memset||cpumask_check))"}
+-doc_end
+
 #
 # Series 18.
 #
diff --git a/docs/misra/deviations.rst b/docs/misra/deviations.rst
index 36959aa44a..2a10de5a66 100644
--- a/docs/misra/deviations.rst
+++ b/docs/misra/deviations.rst
@@ -364,6 +364,15 @@ Deviations related to MISRA C:2012 Rules:
        by `stdarg.h`.
      - Tagged as `deliberate` for ECLAIR.
 
+   * - R17.7
+     - Not using the return value of a function do not endanger safety if it
+       coincides with the first actual argument.
+     - Tagged as `safe` for ECLAIR. Such functions are:
+       - __builtin_memcpy()
+       - __builtin_memmove()
+       - __builtin_memset()
+       - __cpumask_check()
+
    * - R20.4
      - The override of the keyword \"inline\" in xen/compiler.h is present
        so that section contents checks pass when the compiler chooses not to
-- 
2.34.1
Re: [PATCH for 4.19?] x86/Intel: unlock CPUID earlier for the BSP
On 13.06.2024 18:17, Andrew Cooper wrote:
> On 13/06/2024 9:19 am, Jan Beulich wrote:
>> Intel CPUs have an MSR bit to limit CPUID enumeration to leaf two. If
>> this bit is set by the BIOS, then CPUID evaluation does not work when
>> data from any leaf greater than two is needed; early_cpu_init() in
>> particular wants to collect leaf 7 data.
>>
>> Cure this by unlocking CPUID right before evaluating anything which
>> depends on the maximum CPUID leaf being greater than two.
>>
>> Inspired by (and description cloned from) Linux commit 0c2f6d04619e
>> ("x86/topology/intel: Unlock CPUID before evaluating anything").
>>
>> Signed-off-by: Jan Beulich
>> ---
>> While I couldn't spot anything, it kind of feels as if I'm overlooking
>> further places where we might be inspecting in particular leaf 7 yet
>> earlier.
>>
>> No Fixes: tag(s), as imo it would be too many that would want
>> enumerating.
>
> I also saw that go by, but concluded that Xen doesn't need it, hence why
> I left it alone.
>
> The truth is that only the BSP needs it. APs sort it out in the
> trampoline via trampoline_misc_enable_off, because they need to clear
> XD_DISABLE prior to enabling paging, so we should be taking it out of
> early_init_intel().

Except for the (odd) case also mentioned to Roger, where the BSP might
have the bit clear but some (or all) AP(s) have it set.

> But, we don't have an early BSP-only hook, and I'm not overwhelmed
> at the idea of exporting it from intel.c
>
> I was intending to leave it alone until I can burn this whole
> infrastructure to the ground and make it work nicely with policies, but
> that's not a job for this point in the release...

This last part reads like the rest of your reply isn't an objection to
me putting this in with Roger's R-b, but it would be nice if you could
confirm this understanding of mine. Without this last part, especially
the second-to-last paragraph, it certainly reads a little like an
objection.

Jan
Re: [PATCH V3 (resend) 01/19] x86: Create per-domain mapping of guest_root_pt
On 13.06.2024 18:31, Elias El Yandouzi wrote:
> On 16/05/2024 08:17, Jan Beulich wrote:
>> On 15.05.2024 20:25, Elias El Yandouzi wrote:
>>> However, I noticed quite a weird bug while doing some testing. I may
>>> need your expertise to find the root cause.
>>
>> Looks like you've overflowed the dom0 kernel stack, most likely because
>> of recurring nested exceptions.
>>
>>> In the case where I have more vCPUs than pCPUs (and let's consider we
>>> have one pCPU for two vCPUs), I noticed that I would always get a page
>>> fault in the dom0 kernel (5.10.0-13-amd64) at the exact same location.
>>> I did a bit of investigation but I couldn't come to a clear conclusion.
>>> Looking at the stack trace [1], I have the feeling the crash occurs in
>>> a loop or a recursive call.
>>>
>>> I tried to identify where the crash occurred using addr2line:
>>>
>>>     > addr2line -e vmlinux-5.10.0-29-amd64 0x810218a0
>>>     debian/build/build_amd64_none_amd64/arch/x86/xen/mmu_pv.c:880
>>>
>>> It turns out to point at the closing bracket of the function
>>> xen_mm_unpin_all() [2].
>>>
>>> I thought the crash could happen while returning from the function in
>>> the assembly epilogue, but the output of objdump doesn't even show the
>>> address.
>>>
>>> The only theory I could think of was that because we only have one
>>> pCPU, we may never execute one of the two vCPUs, and never set up the
>>> mapping to the guest_root_pt in write_ptbase(), hence the page fault.
>>> This is just a random theory; I couldn't find any hint suggesting it
>>> would be the case though. Any idea how I could debug this?
>>
>> I guess you want to instrument Xen enough to catch the top level fault
>> (or the 2nd from top, depending on where the nesting actually starts)
>> to see why that happens. Quite likely some guest mapping isn't set up
>> properly.
>
> Julien helped me with this one and I believe we have identified the
> problem.
>
> As you've suggested, I wrote the mapping of the guest root PT in our
> per-domain section, root_pt_l1tab, within the write_ptbase() function,
> as we'd always be in the case v == current, plus switch_cr3_cr4() would
> always flush the local TLB.
>
> However, there exists a path, in toggle_guest_mode(), where we could
> call update_cr3()/make_cr3() without calling write_ptbase() and hence
> not maintain the mappings properly. Instead, toggle_guest_mode() has a
> partly open-coded version of write_ptbase().
>
> Would you rather see the mappings written in make_cr3() or in
> toggle_guest_mode(), within the pseudo open-coded version of
> write_ptbase()?

Likely the latter, but that's hard to tell without seeing the resulting
code.

Jan