Re: [PATCH v5 23/23] xen/README: add compiler and binutils versions for RISC-V64
On 26.02.2024 18:39, Oleksii Kurochko wrote:
> This patch doesn't represent a strict lower bound for GCC and
> GNU Binutils; rather, these versions are specifically employed by
> the Xen RISC-V container and are anticipated to undergo continuous
> testing.

Up and until that container would be updated to a newer gcc. I'm afraid
I view this as too weak a criterion, but I'm also not meaning to stand in
the way if somebody else wants to ack this patch in this form; my bare
minimum requirement is now met.

> --- a/README
> +++ b/README
> @@ -48,6 +48,15 @@ provided by your OS distributor:
>        - For ARM 64-bit:
>          - GCC 5.1 or later
>          - GNU Binutils 2.24 or later
> +      - For RISC-V 64-bit:
> +        - GCC 12.2 or later
> +        - GNU Binutils 2.39 or later
> +          This doesn't represent a strict lower bound for GCC and GNU Binutils;
> +          rather, these versions are specifically employed by the Xen RISC-V
> +          container and are anticipated to undergo continuous testing.

As per above, I think here it really needs saying "at the time of writing"
or recording a concrete date. Furthermore I expect "these versions"
relates to the specifically named versions and particularly _not_ to
"or later": With the criterion you apply, using later versions (or in
fact any version other than the very specific ones used in the container)
would be similarly untested. Much like x86 and Arm don't have the full
range of permitted tool chain versions continuously tested. Plus don't
forget that distros may apply their own selection of patches on top of
what they take from upstream (and they may also take random snapshots
rather than released versions). IOW it is hard for me to see why RISC-V
needs stronger restrictions here than other architectures. It ought to
be possible to determine a baseline version. Even if taking the desire
to have "pause" available as a requirement, gas (and presumably gld)
2.36.1 would already suffice.

Jan
Re: [PATCH v5 03/23] xen/riscv: introduce nospec.h
On 26.02.2024 18:38, Oleksii Kurochko wrote:
> From the unprivileged doc:
>   No standard hints are presently defined.
>   We anticipate standard hints to eventually include memory-system spatial
>   and temporal locality hints, branch prediction hints, thread-scheduling
>   hints, security tags, and instrumentation flags for simulation/emulation.
>
> Also, there are no speculation execution barriers.
>
> Therefore, functions evaluate_nospec() and block_speculation() should
> remain empty until a specific platform has an extension to deal with
> speculation execution.

What about array_index_mask_nospec(), though? No custom implementation,
meaning the generic one will be used there? If that's the intention,
then ...

> Signed-off-by: Oleksii Kurochko

Acked-by: Jan Beulich

Jan
Re: [PATCH v5 02/23] xen/riscv: use some asm-generic headers
On 26.02.2024 18:38, Oleksii Kurochko wrote:
> The following headers end up the same as asm-generic's version:
> * altp2m.h
> * device.h
> * div64.h
> * hardirq.h
> * hypercall.h
> * iocap.h
> * paging.h
> * percpu.h
> * random.h
> * softirq.h
> * vm_event.h
>
> RISC-V should utilize the asm-generic's version of the mentioned
> headers instead of introducing them in the arch-specific folder.
>
> Signed-off-by: Oleksii Kurochko
> Acked-by: Jan Beulich
> ---
> Changes in V5:
>  - Nothing changed. Only rebase.
>  - update the commit message.
>  - drop the message above revision log as there is no dependency for this patch
>    from other patch series.

Please can you make sure you submit patches against a sufficiently
up-to-date version of the staging branch? I committed v4 of this patch a
couple of days ago already, upon your own confirmation that it was okay
to go in ahead of others.

Jan
Re: [XEN PATCH 2/2] xen/cpu: address MISRA C Rule 17.7
On 27.02.2024 01:26, Stefano Stabellini wrote:
> On Mon, 26 Feb 2024, Jan Beulich wrote:
>> On 23.02.2024 10:35, Nicola Vetrini wrote:
>>> Refactor cpu_notifier_call_chain into two functions:
>>> - the variant that is allowed to fail loses the nofail flag
>>> - the variant that shouldn't fail is encapsulated in a call
>>>   to the failing variant, with an additional check.
>>>
>>> This prevents uses of the function that are not supposed to
>>> fail from ignoring the return value, thus violating Rule 17.7:
>>> "The value returned by a function having non-void return type shall
>>> be used".
>>>
>>> No functional change.
>>>
>>> Signed-off-by: Nicola Vetrini
>>
>> I'm afraid I disagree with this kind of bifurcation. No matter what
>> Misra thinks or says, it is normal for return values of functions to
>> not always be relevant to check.
>
> Hi Jan, I disagree.
>
> Regardless of MISRA, I really think return values need to be checked.
> Moreover, we decided as a group to honor MISRA Rule 17.7, which requires
> return values to be checked. This patch is a good step forward.

Yet splitting functions isn't the only way to deal with Misra's
requirements, I suppose. After all there are functions where the return
value is purely courtesy for perhaps just one of its callers. Splitting
simply doesn't scale very well, imo.

>> To deal with the Misra rule imo requires to first have an abstract
>> plan of how to handle such globally in the code base. Imo such a plan
>> can't be to introduce perhaps dozens of new wrapper functions like is
>> done here.
>
> This patch is following the right pattern, one we already follow with
> the _locked suffix.

Right, and - just to mention it - one which I similarly dislike, albeit
to a lesser degree.

Jan
Re: [XEN PATCH 3/4] xen/include: add pure and const attributes
On 27.02.2024 00:48, Stefano Stabellini wrote:
> On Mon, 26 Feb 2024, Jan Beulich wrote:
>> On 23.02.2024 23:36, Stefano Stabellini wrote:
>>> On Fri, 23 Feb 2024, Jan Beulich wrote:
>>>> On 23.02.2024 02:32, Stefano Stabellini wrote:
>>>>> On Tue, 24 Oct 2023, Jan Beulich wrote:
>>>>>> On 24.10.2023 00:05, Stefano Stabellini wrote:
>>>>>>> On Mon, 23 Oct 2023, Jan Beulich wrote:
>>>>>>>> On 23.10.2023 17:23, Simone Ballarin wrote:
>>>>>>>>> On 23/10/23 15:34, Jan Beulich wrote:
>>>>>>>>>> On 18.10.2023 16:18, Simone Ballarin wrote:
>>>>>>>>>>> --- a/xen/include/xen/pdx.h
>>>>>>>>>>> +++ b/xen/include/xen/pdx.h
>>>>>>>>>>> @@ -164,7 +164,7 @@ static inline unsigned long pfn_to_pdx(unsigned long pfn)
>>>>>>>>>>>   * @param pdx Page index
>>>>>>>>>>>   * @return Obtained pfn after decompressing the pdx
>>>>>>>>>>>   */
>>>>>>>>>>> -static inline unsigned long pdx_to_pfn(unsigned long pdx)
>>>>>>>>>>> +static inline __attribute_pure__ unsigned long pdx_to_pfn(unsigned long pdx)
>>>>>>>>>>>  {
>>>>>>>>>>>      return (pdx & pfn_pdx_bottom_mask) |
>>>>>>>>>>>             ((pdx << pfn_pdx_hole_shift) & pfn_top_mask);
>>>>>>>>>>
>>>>>>>>>> Taking this as an example for what I've said above: The compiler can't
>>>>>>>>>> know that the globals used by the functions won't change value. Even
>>>>>>>>>> within Xen it is only by convention that these variables are assigned
>>>>>>>>>> their values during boot, and then aren't changed anymore. Which makes
>>>>>>>>>> me wonder: Did you check carefully that around the time the variables
>>>>>>>>>> have their values established, no calls to the functions exist (which
>>>>>>>>>> might then be subject to folding)?
>>>>>>>>>
>>>>>>>>> There is no need to check that, the GCC documentation explicitly says:
>>>>>>>>>
>>>>>>>>> However, functions declared with the pure attribute *can safely read any
>>>>>>>>> non-volatile objects*, and modify the value of objects in a way that
>>>>>>>>> does not affect their return value or the observable state of the
>>>>>>>>> program.
>>>>>>>>
>>>>>>>> I did quote this same text in response to what Andrew has said, but I
>>>>>>>> also did note there that this needs to be taken with a grain of salt:
>>>>>>>> The compiler generally assumes a single-threaded environment, i.e.
>>>>>>>> no changes to globals behind the back of the code it is processing.
>>>
>>> Let's start from the beginning. The reason for Simone to add
>>> __attribute_pure__ to pdx_to_pfn and other functions is for
>>> documentation purposes. It is OK if it doesn't serve any purpose other
>>> than documentation.
>>>
>>> Andrew, for sure we do not want to lie to the compiler and introduce
>>> undefined behavior. If we think there is a risk of it, we should not do
>>> it.
>>>
>>> So, what do we want to document? We want to document that the function
>>> does not have side effects according to MISRA's definition of it, which
>>> might subtly differ from GCC's definition.
>>>
>>> Looking at GCC's definition of __attribute_pure__, with the
>>> clarification statement copy/pasted above by both Simone and Jan, it
>>> seems that __attribute_pure__ matches MISRA's definition of a function
>>> without side effects. It also seems that pdx_to_pfn abides to that
>>> definition.
>>>
>>> Jan has a point that GCC might be making other assumptions
>>> (single-thread execution) that might not hold true in our case. Given
>>> the way the GCC statement is written I think this is low risk. But maybe
>>> not all GCC versions we want to support in the project might have the
>>> same definition of __attribute_pure__. So we could end up using
>>> __attribute_pure__ correctly for the GCC version used for safety (GCC
>>> 12.1, see docs/misra/C-language-toolchain.rst) but it might actually
>>> break an older GCC version.
>>>
>>>
>>> So Option#1 is to use __attribute_pure__ taking the risk that a GCC or
>>> Clang version might interpret __attribute_pure__ differently and
>>> potentially misbehave.
>>>
>>> Option#2 is to avoid this risk, by not using __attribute_pure__.
>>> Instead, we can use SAF-xx-safe or deviations.rst to document that
>>> pdx_to_pfn and other functions like it are without side effects
>>> according to MISRA's definition.
>>>
>>>
>>> Both options have pros and cons. To me the most important factor is how
>>> many GCC versions come with the statement "pure attribute can safely
>>> read any non-volatile objects, and modify the value of objects in a way
>>> that does not affect their return value or the observable state of the
>>> program".
>>>
>>> I checked and these are the results:
>>> - gcc 4.0.2: no statement
>>> - gcc 5.1.0: no statement
>>> - gcc 6.1.0: no statement
>>> - gcc 7.1.0: no statement
>>
Re: [XEN PATCH 1/2] xen/console: drop return value from consoled_guest_rx/tx
On 26.02.2024 23:49, Stefano Stabellini wrote:
> On Mon, 26 Feb 2024, Jan Beulich wrote:
>> On 26.02.2024 09:23, Nicola Vetrini wrote:
>>> On 2024-02-26 09:00, Jan Beulich wrote:
>>>> On 23.02.2024 23:56, Stefano Stabellini wrote:
>>>>> On Fri, 23 Feb 2024, Nicola Vetrini wrote:
>>>>>> These functions never saw a usage of their return value since
>>>>>> they were introduced, so it can be dropped since their usages
>>>>>> violate MISRA C Rule 17.7:
>>>>>> "The value returned by a function having non-void return type shall
>>>>>> be used".
>>>>>>
>>>>>> No functional change.
>>>>>>
>>>>>> Signed-off-by: Nicola Vetrini
>>>>>
>>>>> Reviewed-by: Stefano Stabellini
>>>>
>>>> The cleanup is certainly okay, but would one of you mind clarifying
>>>> in how far this code is relevant for certification? I don't expect
>>>> there are plans to run shim Xen in any projected production uses for
>>>> which certification is relevant? (The subject prefix is also
>>>> unnecessarily wide here, when it's only daemon code which is
>>>> affected, not console code in general.)
>>>
>>> I agree on the subject prefix being too wide. The configuration that
>>> uses consoled_guest_tx is #ifdef-ed for x86, so even in configurations
>>> that may never reach this condition this is relevant, unless its #ifdef
>>> is restricted to cases where the call may actually be reachable.
>>
>> Hmm, I see. There are contradicting goals here then: It being just X86 is
>> to reduce the risk of someone overlooking a build breakage they may
>> introduce. Whereas for certification it's quite the other way around: We'd
>> like to "hide" as much code as possible.
>>
>> Really I would have been inclined to suggest to drop the #ifdef, if
>> possible even without replacing by IS_ENABLED(), but instead leveraging
>> that pv_shim ought to be compile-time false whenever CONFIG_PV_SHIM=n.
>
> This is OK
>
>
>> After all that's a pattern we've been trying to follow. But with your
>> observation it becomes questionable whether extending use of IS_ENABLED()
>> is actually going to be helpful. Stefano - perhaps something to discuss
>> on one of the next meetings?
>
> Yes. I checked with the safety manager and his opinion is that
> IS_ENABLED() is OK to use as a way to disable code from a safety
> perspective.

Yet unlike when #ifdef is used, such code would remain visible to e.g.
Eclair even after the preprocessing step. Note the context in which I'm
bringing this up - if IS_ENABLED() was properly used here (and as tightly
as possible), the tool would still have complained, aiui.

Jan
[ovmf test] 184777: all pass - PUSHED
flight 184777 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184777/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 33c81c25bbd55141ad395af89160b4ed4b703cdf
baseline version:
 ovmf                 d25421d0d8cd2493b30215ef80d2424ecb19c870

Last test of basis   184775  2024-02-26 22:43:02 Z    0 days
Testing same since   184777  2024-02-27 01:11:23 Z    0 days    1 attempts

People who touched revisions under test:
  Michael Kubacki

jobs:
 build-amd64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
   http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
   http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
   http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
   http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   d25421d0d8..33c81c25bb  33c81c25bbd55141ad395af89160b4ed4b703cdf -> xen-tested-master
[linux-linus test] 184771: tolerable FAIL - PUSHED
flight 184771 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184771/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop             fail like 184762
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop             fail like 184762
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop             fail like 184762
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check      fail like 184762
 test-armhf-armhf-libvirt 16 saverestore-support-check          fail like 184762
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop             fail like 184762
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check    fail like 184762
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail like 184762
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check          fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check               fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check           fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check       fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check           fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check       fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check      fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check           fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check       fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check        fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check               fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check           fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check               fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check           fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check          fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check           fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check       fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check         fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check     fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check              fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check          fail never pass
 test-armhf-armhf-xl 15 migrate-support-check                   fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check               fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check           fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check       fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check              fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check        fail never pass

version targeted for testing:
 linux                45ec2f5f6ed3ec3a79ba1329ad585497cdcbe663
baseline version:
 linux                d206a76d7d2726f3b096037f2079ce0bd3ba329b

Last test of basis   184762  2024-02-26 01:13:54 Z    1 days
Testing same since   184771  2024-02-26 19:42:20 Z    0 days    1 attempts

People who touched revisions under test:
  Chris Clayton
  David Sterba
  Elad Nachman
  Filipe Manana
  Han Xu
  Johannes Thumshirn
  Linus Torvalds
  Mark O'Donovan
  Miquel Raynal

jobs:
 build-amd64-xsm                                              pass
 build-arm64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-arm64                                                  pass
 build-armhf                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt
[ovmf test] 184775: all pass - PUSHED
flight 184775 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184775/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 d25421d0d8cd2493b30215ef80d2424ecb19c870
baseline version:
 ovmf                 68238d4f948069fc2c6b9cc13863bdced52a84d0

Last test of basis   184772  2024-02-26 19:43:08 Z    0 days
Testing same since   184775  2024-02-26 22:43:02 Z    0 days    1 attempts

People who touched revisions under test:
  Gerd Hoffmann
  Michael Kubacki

jobs:
 build-amd64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
   http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
   http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
   http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
   http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   68238d4f94..d25421d0d8  d25421d0d8cd2493b30215ef80d2424ecb19c870 -> xen-tested-master
RE: question about virtio-vsock on xen
> Subject: Re: question about virtio-vsock on xen
>
> On 26.02.24 05:09, Peng Fan wrote:
> > Hi Oleksandr,
>
> Hello Peng
>
> [snip]
>
> >> ... Peng, we have vhost-vsock (and vhost-net) Xen PoC. Although
> >> it is non-upstreamable in its current shape (based on old Linux
> >> version, requires some rework and proper integration, most likely
> >> requires involving Qemu and protocol changes to pass an additional
> >> info to vhost), it works with Linux v5.10 + patched Qemu v7.0, so
> >> you can refer to the Yocto meta layer which contains kernel patches
> >> for the details [1].
> >
> > Thanks for the pointer, I am reading the code.
> >
> >> In a nutshell, before accessing the guest data the host module needs
> >> to map descriptors in virtio rings which contain either guest grant
> >> based DMA addresses (by using Xen grant mappings) or guest
> >> pseudo-physical addresses (by using Xen foreign mappings). After
> >> accessing the guest data the host module needs to unmap them.
> >
> > Ok, I thought the current xen virtio code already map every ready.
>
> It does, as you said the virtio-blk-pci worked in your environment. But
> vhost(-vsock) is a special case: unlike virtio-blk-pci, where the whole
> backend resides in Qemu, here we have a split model. As I understand it,
> Qemu performs only initial setup/configuration, then offloads the I/O
> processing to a separate entity, which is the Linux module in that
> particular case.

Thanks for sharing the information. I need to learn more stuff :)

Thanks,
Peng.
[xen-unstable-smoke test] 184774: tolerable all pass - PUSHED
flight 184774 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184774/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check          fail never pass
 test-armhf-armhf-xl 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check           fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check       fail never pass

version targeted for testing:
 xen                  37541208f119a9c552c6c6c3246ea61be0d44035
baseline version:
 xen                  03fb5f503cb5faa0556b23d594496719ced6a11b

Last test of basis   184769  2024-02-26 16:00:26 Z    0 days
Testing same since   184774  2024-02-26 22:00:26 Z    0 days    1 attempts

People who touched revisions under test:
  Andrew Cooper
  Jan Beulich

jobs:
 build-arm64-xsm                                              pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-amd64-libvirt                                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
   http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
   http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
   http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
   http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   03fb5f503c..37541208f1  37541208f119a9c552c6c6c3246ea61be0d44035 -> smoke
RE: question about virtio-vsock on xen
> Subject: RE: question about virtio-vsock on xen
>
> On Mon, 26 Feb 2024, Peng Fan wrote:
> > Hi Stefano,
> >
> > > Subject: Re: question about virtio-vsock on xen
> > >
> > > Hi Peng,
> > >
> > > We haven't tried to setup virtio-vsock yet.
> > >
> > > In general, I am very supportive of using QEMU for virtio backends.
> > > We use QEMU to provide virtio-net, virtio-block, virtio-console and more.
> >
> > Would you mind share how to setup virtio-console using qemu + xen?
>
> Vikram (CCed) has been working on it and should be able to share more
> details
>
> > > However, typically virtio-vsock comes into play for VM-to-VM
> > > communication, which is different. Going via QEMU in Dom0 just to
> > > have 1 VM communicate with another VM is not an ideal design: it
> > > adds latency and uses resources in Dom0 when actually we could do
> > > without it.
> > >
> > > A better model for VM-to-VM communication would be to have the VMs
> > > talk to each other directly via grant table or pre-shared memory
> > > (see the static shared memory feature) or via Xen hypercalls (see
> > > Argo.)
> >
> > The goal is to make android trout VM run with XEN + i.MX95, so need vsock.
>
> I am not familiar with the details of Android Trout... Where is vsock used?
> Just asking for my own understanding.

Here is the link:
https://source.android.com/docs/automotive/virtualization/reference_platform

vsock is used for the audio control HAL, dumpstate HAL, and vehicle HAL.

Regards,
Peng.

> > > For a good Xen design, I think the virtio-vsock backend would need
> > > to be in Xen itself (the hypervisor).
> > >
> > > Of course that is more work and it doesn't help you with the
> > > specific question you had below :-)
> > >
> > > For that, I don't have a pointer to help you but maybe others in CC have.
> > >
> > > Cheers,
> > >
> > > Stefano
> > >
> > >
> > > On Fri, 23 Feb 2024, Peng Fan wrote:
> > > > Hi All,
> > > >
> > > > Has anyone made virtio-vsock on xen work? My dm args as below:
> > > >
> > > > virtio = [
> > > > 'backend=0,type=virtio,device,transport=pci,bdf=05:00.0,backend_type=qemu,grant_usage=true'
> > > > ]
> > > > device_model_args = [
> > > > '-D', '/home/root/qemu_log.txt',
> > > > '-d',
> > > > 'trace:*vsock*,trace:*vhost*,trace:*virtio*,trace:*pci_update*,trace:*pci_route*,trace:*handle_ioreq*,trace:*xen*',
> > > > '-device',
> > > > 'vhost-vsock-pci,iommu_platform=false,id=vhost-vsock-pci0,bus=pcie.0,addr=5.0,guest-cid=3']
> > > >
> > > > During my test, it always returns failure in dom0 kernel in below code:
> > > >
> > > > vhost_transport_do_send_pkt {
> > > > ...
> > > >     nbytes = copy_to_iter(hdr, sizeof(*hdr), &iov_iter);
> > > >     if (nbytes != sizeof(*hdr)) {
> > > >         vq_err(vq, "Faulted on copying pkt hdr %x %x %x %px\n",
> > > >                nbytes, sizeof(*hdr),
> > > >                __builtin_object_size(hdr, 0), &iov_iter);
> > > >         kfree_skb(skb);
> > > >         break;
> > > >     }
> > > > }
> > > >
> > > > I checked copy_to_iter: it copies data to a __user addr, but it
> > > > never passes; the copy to the __user addr always returns 0 bytes
> > > > copied.
> > > >
> > > > The asm code "sttr x7, [x6]" will trigger a data abort, and the
> > > > kernel will run into do_page_fault, but lock_mm_and_find_vma
> > > > reports it is VM_FAULT_BADMAP, which means the __user addr is not
> > > > mapped; no vma has this addr.
> > > >
> > > > I am not sure what may cause this. Appreciate any comments.
> > > >
> > > > BTW: I tested blk pci, it works, so virtio pci should work on my
> > > > setup.
> > > >
> > > > Thanks,
> > > > Peng.
Re: [XEN PATCH 2/2] xen/cpu: address MISRA C Rule 17.7
On Mon, 26 Feb 2024, Jan Beulich wrote:
> On 23.02.2024 10:35, Nicola Vetrini wrote:
> > Refactor cpu_notifier_call_chain into two functions:
> > - the variant that is allowed to fail loses the nofail flag
> > - the variant that shouldn't fail is encapsulated in a call
> >   to the failing variant, with an additional check.
> >
> > This prevents uses of the function that are not supposed to
> > fail from ignoring the return value, thus violating Rule 17.7:
> > "The value returned by a function having non-void return type shall
> > be used".
> >
> > No functional change.
> >
> > Signed-off-by: Nicola Vetrini
>
> I'm afraid I disagree with this kind of bifurcation. No matter what
> Misra thinks or says, it is normal for return values of functions to
> not always be relevant to check.

Hi Jan, I disagree.

Regardless of MISRA, I really think return values need to be checked.
Moreover, we decided as a group to honor MISRA Rule 17.7, which requires
return values to be checked. This patch is a good step forward.

> To deal with the Misra rule imo requires to first have an abstract
> plan of how to handle such globally in the code base. Imo such a plan
> can't be to introduce perhaps dozens of new wrapper functions like is
> done here.

This patch is following the right pattern, one we already follow with
the _locked suffix.
Re: [XEN PATCH] xen: cache clearing and invalidation helpers refactoring
On Sat, 24 Feb 2024, Nicola Vetrini wrote:
> Hi Stefano,
>
> On 2024-02-24 00:05, Stefano Stabellini wrote:
> > On Fri, 23 Feb 2024, Nicola Vetrini wrote:
> > > On 2024-02-19 16:14, Nicola Vetrini wrote:
> > > > The cache clearing and invalidation helpers in x86 and Arm didn't
> > > > comply with MISRA C Rule 17.7: "The value returned by a function
> > > > having non-void return type shall be used". On Arm they
> > > > were always returning 0, while some in x86 returned -EOPNOTSUPP
> > > > and in common/grant_table the return value is saved.
> > > >
> > > > As a consequence, a common helper arch_grant_cache_flush that returns
> > > > an integer is introduced, so that each architecture can choose whether to
> > > > return an error value on certain conditions, and the helpers have either
> > > > been changed to return void (on Arm) or deleted entirely (on x86).
> > > >
> > > > Signed-off-by: Julien Grall
> > > > Signed-off-by: Nicola Vetrini
> > > > ---
> > > > The original refactor idea came from Julien Grall in [1]; I edited
> > > > that proposal to fix build errors.
> > > >
> > > > I did introduce a cast to void for the call to flush_area_local on
> > > > x86, because even before this patch the return value of that function
> > > > wasn't checked in all but one use in x86/smp.c, and in this context
> > > > the helper (perhaps incidentally) ignored the return value of
> > > > flush_area_local.
> > > >
> > > > [1]
> > > > https://lore.kernel.org/xen-devel/09589e8f-77b6-47f7-b5bd-cf485e4b6...@xen.org/
> > > > ---
> > > >  xen/arch/arm/include/asm/page.h     | 33 ++---
> > > >  xen/arch/x86/include/asm/flushtlb.h | 23 ++--
> > > >  xen/common/grant_table.c            |  9 +---
> > > >  3 files changed, 34 insertions(+), 31 deletions(-)
> > > >
> > >
> > > I'll put this patch in the backlog at the moment: too many intricacies
> > > while trying to untangle xen/flushtlb from xen/mm.h, and there are
> > > easier cases that can be done faster. If someone is interested I can
> > > post the partial work I've done so far, even though it doesn't build
> > > on x86.
> >
> > I understand that the blocker is:
> >
> > diff --git a/xen/arch/arm/include/asm/page.h
> > b/xen/arch/arm/include/asm/page.h
> > index 69f817d1e6..e90c9de361 100644
> > --- a/xen/arch/arm/include/asm/page.h
> > +++ b/xen/arch/arm/include/asm/page.h
> > @@ -123,6 +123,7 @@
> >
> >  #ifndef __ASSEMBLY__
> >
> > +#include
> >  #include
> >  #include
> >  #include
> >
> > And the headers disentangling required to solve it, right?
> >
> > Let me ask a silly question. public/grant_table.h seems needed by
> > arch_grant_cache_flush. Can we move arch_grant_cache_flush somewhere
> > else? It is not like page.h is a perfect fit for it anyway.
> >
> > For instance, can we move it to
> >
> > xen/arch/arm/include/asm/grant_table.h
> >
> > ?
>
> Yes, this is what was suggested and what I was trying to accomplish.
> Basically my plan is:
>
> 1. move the arch_grant_cache_flush helper to asm/grant_table.h for both
>    architectures
> 2. pull out of xen/mm.h this hunk (note the inclusion of asm/flushtlb in
>    the middle of the file) because there is a build error on
>    tlbflush_current_time() induced in some .c file (don't remember which)
>    by the earlier movement

It looks like it would be easier to resolve the build error on
tlbflush_current_time() in another way. What's the build error exactly?
Looking at the implementation of tlbflush_current_time on the various
arches I couldn't find any potential issues.

I just moved the implementation of arch_grant_cache_flush to arch
specific headers and it seemed to have worked for me. See below. Maybe I
am building with a different kconfig.

diff --git a/xen/arch/arm/include/asm/grant_table.h
b/xen/arch/arm/include/asm/grant_table.h
index d3c518a926..2c5c07e061 100644
--- a/xen/arch/arm/include/asm/grant_table.h
+++ b/xen/arch/arm/include/asm/grant_table.h
@@ -76,6 +76,20 @@ int replace_grant_host_mapping(uint64_t gpaddr, mfn_t frame,
 #define gnttab_need_iommu_mapping(d)                    \
     (is_domain_direct_mapped(d) && is_iommu_enabled(d))

+static inline int arch_grant_cache_flush(unsigned int op, const void *p,
+                                         unsigned long size)
+{
+    if ( (op & GNTTAB_CACHE_INVAL) && (op & GNTTAB_CACHE_CLEAN) )
+        clean_and_invalidate_dcache_va_range(p, size);
+    else if ( op & GNTTAB_CACHE_INVAL )
+        invalidate_dcache_va_range(p, size);
+    else if ( op & GNTTAB_CACHE_CLEAN )
+        clean_dcache_va_range(p, size);
+
+    /* ARM callers assume that dcache_* functions cannot fail. */
+    return 0;
+}
+
 #endif /* __ASM_GRANT_TABLE_H__ */
 /*
  * Local variables:
diff --git a/xen/arch/arm/include/asm/page.h b/xen/arch/arm/include/asm/page.h
index 69f817d1e6..aea692a24d 100644
--- a/xen/arch/arm/include/asm/page.h
+++ b/xen/arch/arm/include
Re: [XEN PATCH 3/4] xen/include: add pure and const attributes
On Mon, 26 Feb 2024, Jan Beulich wrote: > On 23.02.2024 23:36, Stefano Stabellini wrote: > > On Fri, 23 Feb 2024, Jan Beulich wrote: > >> On 23.02.2024 02:32, Stefano Stabellini wrote: > >>> On Tue, 24 Oct 2023, Jan Beulich wrote: > On 24.10.2023 00:05, Stefano Stabellini wrote: > > On Mon, 23 Oct 2023, Jan Beulich wrote: > >> On 23.10.2023 17:23, Simone Ballarin wrote: > >>> On 23/10/23 15:34, Jan Beulich wrote: > On 18.10.2023 16:18, Simone Ballarin wrote: > > --- a/xen/include/xen/pdx.h > > +++ b/xen/include/xen/pdx.h > > @@ -164,7 +164,7 @@ static inline unsigned long pfn_to_pdx(unsigned > > long pfn) > >* @param pdx Page index > >* @return Obtained pfn after decompressing the pdx > >*/ > > -static inline unsigned long pdx_to_pfn(unsigned long pdx) > > +static inline __attribute_pure__ unsigned long pdx_to_pfn(unsigned > > long pdx) > > { > > return (pdx & pfn_pdx_bottom_mask) | > > ((pdx << pfn_pdx_hole_shift) & pfn_top_mask); > > Taking this as an example for what I've said above: The compiler > can't > know that the globals used by the functions won't change value. Even > within Xen it is only by convention that these variables are assigned > their values during boot, and then aren't changed anymore. Which > makes > me wonder: Did you check carefully that around the time the variables > have their values established, no calls to the functions exist (which > might then be subject to folding)? > >>> > >>> There is no need to check that, the GCC documentation explicitly says: > >>> > >>> However, functions declared with the pure attribute *can safely read > >>> any > >>> non-volatile objects*, and modify the value of objects in a way that > >>> does not affect their return value or the observable state of the > >>> program. > >> > >> I did quote this same text in response to what Andrew has said, but I > >> also > >> did note there that this needs to be taken with a grain of salt: The > >> compiler generally assumes a single-threaded environment, i.e. 
no > >> changes > >> to globals behind the back of the code it is processing. > > > > Let's start from the beginning. The reason for Simone to add > > __attribute_pure__ to pdx_to_pfn and other functions is for > > documentation purposes. It is OK if it doesn't serve any purpose other > > than documentation. > > > > Andrew, for sure we do not want to lie to the compiler and introduce > > undefined behavior. If we think there is a risk of it, we should not do > > it. > > > > So, what do we want to document? We want to document that the function > > does not have side effects according to MISRA's definition of it, which > > might subtly differ from GCC's definition. > > > > Looking at GCC's definition of __attribute_pure__, with the > > clarification statement copy/pasted above by both Simone and Jan, it > > seems that __attribute_pure__ matches MISRA's definition of a function > > without side effects. It also seems that pdx_to_pfn abides to that > > definition. > > > > Jan has a point that GCC might be making other assumptions > > (single-thread execution) that might not hold true in our case. Given > > the way the GCC statement is written I think this is low risk. But maybe > > not all GCC versions we want to support in the project might have the > > same definition of __attribute_pure__. So we could end up using > > __attribute_pure__ correctly for the GCC version used for safety (GCC > > 12.1, see docs/misra/C-language-toolchain.rst) but it might actually > > break an older GCC version. > > > > > > So Option#1 is to use __attribute_pure__ taking the risk that a GCC or > > Clang version might interpret __attribute_pure__ differently and > > potentially misbehave. > > > > Option#2 is to avoid this risk, by not using __attribute_pure__. > > Instead, we can use SAF-xx-safe or deviations.rst to document that > > pdx_to_pfn and other functions like it are without side effects > > according to MISRA's definition. > > > > > > Both options have pros and cons. 
To me the most important factor is how > > many GCC versions come with the statement "pure attribute can safely > > read any non-volatile objects, and modify the value of objects in a way > > that does not affect their return value or the observable state of the > > program". > > > > I checked and these are the results: > > - gcc 4.0.2: no statement > > - gcc 5.1.0: no statement > > - gcc 6.1.0: no statement > > - gcc 7.1.0: no statement > > - gcc 8.1.0: alternative statement "The pure
Re: [PATCH v1] automation: remove bin86/dev86 from tumbleweed image
On Sat, 24 Feb 2024, Olaf Hering wrote: > Fri, 23 Feb 2024 15:22:38 -0800 (PST) Stefano Stabellini > : > > > Do you have a successful gitlab pipeline with this patch applied that > > you can give me as proof of testing and success? > > Yes, all of them since the patch went out. Great thanks! Acked-by: Stefano Stabellini
[xen-unstable test] 184767: tolerable FAIL - PUSHED
flight 184767 xen-unstable real [real] flight 184773 xen-unstable real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/184767/ http://logs.test-lab.xenproject.org/osstest/logs/184773/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install fail pass in 184773-retest Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail in 184773 like 184763 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 184763 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184763 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184763 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184763 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184763 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 184763 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184763 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 184763 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184763 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184763 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184763 test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass 
test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 
migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass version targeted for testing: xen 8de3afc0b402bc17f65093a53e5870862707a8c7 baseline version: xen 92babc88f67ed0ef3dc575a8b9534040274678ee Last test of basis 184763 2024-02-26 01:52:19 Z
Re: [XEN PATCH 1/2] xen/console: drop return value from consoled_guest_rx/tx
On Mon, 26 Feb 2024, Jan Beulich wrote: > On 26.02.2024 09:23, Nicola Vetrini wrote: > > On 2024-02-26 09:00, Jan Beulich wrote: > >> On 23.02.2024 23:56, Stefano Stabellini wrote: > >>> On Fri, 23 Feb 2024, Nicola Vetrini wrote: > These functions never saw a usage of their return value since > they were introduced, so it can be dropped since their usages > violate MISRA C Rule 17.7: > "The value returned by a function having non-void return type shall > be used". > > No functional change. > > Signed-off-by: Nicola Vetrini > >>> > >>> Reviewed-by: Stefano Stabellini > >> > >> The cleanup is certainly okay, but would one of you mind clarifying in > >> how > >> far this code is relevant for certification? I don't expect there are > >> plans > >> to run shim Xen in any projected production uses for which > >> certification is > >> relevant? (The subject prefix is also unnecessarily wide here, when > >> it's > >> only daemon code which is affected, not console code in general.) > >> > > > > I agree on the subject prefix being too wide. The configuration that > > uses consoled_guest_tx is #ifdef-ed for x86, so even in configurations > > that may never reach this condition this is relevant, unless its #ifdef > > is restricted to cases where the call may actually be reachable. > > Hmm, I see. There are contradicting goals here then: It being just X86 is > to reduce the risk of someone overlooking a build breakage they may > introduce. Whereas for certification it's quite the other way around: We'd > like to "hide" as much code as possible. > > Really I would have been inclined to suggest to drop the #ifdef, if > possible even without replacing by IS_ENABLED(), but instead leveraging > that pv_shim ought to be compile-time false whenever CONFIG_PV_SHIM=n. This is OK > After all that's a pattern we've been trying to follow. But with your > observation is becomes questionable whether extending use of IS_ENABLED() > is actually going to be helpful. 
Stefano - perhaps something to discuss > on one of the next meetings? Yes. I checked with the safety manager and his opinion is that IS_ENABLED() is OK to use as a way to disable code from a safety perspective.
[ovmf test] 184772: all pass - PUSHED
flight 184772 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/184772/ Perfect :-) All tests in this flight passed as required version targeted for testing: ovmf 68238d4f948069fc2c6b9cc13863bdced52a84d0 baseline version: ovmf 44fdc4f3983be77dda709d7a0c3fb8905fbf920b Last test of basis 184768 2024-02-26 15:43:07 Z0 days Testing same since 184772 2024-02-26 19:43:08 Z0 days1 attempts People who touched revisions under test: Junfeng Guan jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/osstest/ovmf.git 44fdc4f398..68238d4f94 68238d4f948069fc2c6b9cc13863bdced52a84d0 -> xen-tested-master
RE: question about virtio-vsock on xen
On Mon, 26 Feb 2024, Peng Fan wrote: > Hi Stefano, > > > Subject: Re: question about virtio-vsock on xen > > > > Hi Peng, > > > > We haven't tried to setup virtio-vsock yet. > > > > In general, I am very supportive of using QEMU for virtio backends. We use > > QEMU to provide virtio-net, virtio-block, virtio-console and more. > > Would you mind share how to setup virtio-console using qemu + xen? Vikram (CCed) has been working on it and should be able to share more details > > However, typically virtio-vsock comes into play for VM-to-VM > > communication, which is different. Going via QEMU in Dom0 just to have 1 > > VM communicate with another VM is not an ideal design: it adds latency and > > uses resources in Dom0 when actually we could do without it. > > > > A better model for VM-to-VM communication would be to have the VM talk > > to each other directly via grant table or pre-shared memory (see the static > > shared memory feature) or via Xen hypercalls (see Argo.) > > The goal is to make android trout VM run with XEN + i.MX95, so need vsock. I am not familiar with the details of Android Trout... Where is vsock used? Just asking for my own understanding. > > For a good Xen design, I think the virtio-vsock backend would need to be in > > Xen itself (the hypervisor). > > > > Of course that is more work and it doesn't help you with the specific > > question > > you had below :-) > > > > For that, I don't have a pointer to help you but maybe others in CC have. > > > > Cheers, > > > > Stefano > > > > > > On Fri, 23 Feb 2024, Peng Fan wrote: > > > Hi All, > > > > > > Has anyone make virtio-vsock on xen work? 
My dm args as below: > > > > > > virtio = [ > > > > > 'backend=0,type=virtio,device,transport=pci,bdf=05:00.0,backend_type=qem > > u,grant_usage=true' > > > ] > > > device_model_args = [ > > > '-D', '/home/root/qemu_log.txt', > > > '-d', > > > 'trace:*vsock*,trace:*vhost*,trace:*virtio*,trace:*pci_update*,trace:* > > > pci_route*,trace:*handle_ioreq*,trace:*xen*', > > > '-device', > > > 'vhost-vsock-pci,iommu_platform=false,id=vhost-vsock-pci0,bus=pcie.0,a > > > ddr=5.0,guest-cid=3'] > > > > > > During my test, it always return failure in dom0 kernel in below code: > > > > > > vhost_transport_do_send_pkt { > > > ... > > >nbytes = copy_to_iter(hdr, sizeof(*hdr), &iov_iter); > > > if (nbytes != sizeof(*hdr)) { > > > vq_err(vq, "Faulted on copying pkt hdr %x %x %x > > > %px\n", > > nbytes, sizeof(*hdr), > > > __builtin_object_size(hdr, 0), &iov_iter); > > > kfree_skb(skb); > > > break; > > > } > > > } > > > > > > I checked copy_to_iter, it is copy data to __user addr, but it never > > > pass, the copy to __user addr always return 0 bytes copied. > > > > > > The asm code "sttr x7, [x6]" will trigger data abort, the kernel will > > > run into do_page_fault, but lock_mm_and_find_vma report it is > > > VM_FAULT_BADMAP, that means the __user addr is not mapped, no vma > > has this addr. > > > > > > I am not sure what may cause this. Appreciate if any comments. > > > > > > BTW: I tested blk pci, it works, so the virtio pci should work on my > > > setup. > > > > > > Thanks, > > > Peng. > > > >
Re: question about virtio-vsock on xen
On 26.02.24 05:09, Peng Fan wrote: > Hi Oleksandr, Hello Peng [snip] >> >> ... Peng, we have vhost-vsock (and vhost-net) Xen PoC. Although it is >> non- >> upstreamable in its current shape (based on old Linux version, requires some >> rework and proper integration, most likely requires involving Qemu and >> protocol changes to pass an additional info to vhost), it works with Linux >> v5.10 + patched Qemu v7.0, so you can refer to the Yocto meta layer which >> contains kernel patches for the details [1]. > > Thanks for the pointer, I am reading the code. > >> >> In a nutshell, before accessing the guest data the host module needs to map >> descriptors in virtio rings which contain either guest grant based DMA >> addresses (by using Xen grant mappings) or guest pseudo-physical addresses >> (by using Xen foreign mappings). After accessing the guest data the host >> module needs to unmap them. > > Ok, I thought the current xen virtio code already map every ready. > It does, as you said the virtio-blk-pci worked in your environment. But vhost(-vsock) is a special case, unlike for virtio-blk-pci where the whole backend resides in Qemu, here we have a split model. As I understand the Qemu performs only initial setup/configuration then offloads the I/O processing to a separate entity which is the Linux module in that particular case.
Re: [PATCH v6 1/3] xen: introduce Kconfig function alignment option
Hi Roger, On 2/7/24 8:55 AM, Roger Pau Monne wrote: > And use it to replace CODE_ALIGN in assembly. This allows to generalize the > way the code alignment gets set across all architectures. > > No functional change intended. > > Signed-off-by: Roger Pau Monné Acked-by: Shawn Anastasio Thanks, Shawn
Re: [PATCH v2 2/6] xen/ppc: address violations of MISRA C:2012 Rule 11.8.
Hi Stefano, Sorry for the delay on this. Acked-by: Shawn Anastasio Best, Shawn On 2/23/24 5:19 PM, Stefano Stabellini wrote: > Shawn, > > I am thinking of committing this, if you disagree please say something > in the next couple of days > > > On Tue, 19 Dec 2023, Simone Ballarin wrote: >> From: Maria Celeste Cesario >> >> The xen sources contain violations of MISRA C:2012 Rule 11.8 whose >> headline states: >> "A conversion shall not remove any const, volatile or _Atomic qualification >> from the type pointed to by a pointer". >> >> Fix violation by adding missing const qualifier in cast. >> >> Signed-off-by: Maria Celeste Cesario >> Signed-off-by: Simone Ballarin >> Reviewed-by: Stefano Stabellini >> --- >> Adaptation requested by the community to make the code more consistent. >> --- >> xen/arch/ppc/include/asm/atomic.h | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/xen/arch/ppc/include/asm/atomic.h >> b/xen/arch/ppc/include/asm/atomic.h >> index 64168aa3f1..fe778579fb 100644 >> --- a/xen/arch/ppc/include/asm/atomic.h >> +++ b/xen/arch/ppc/include/asm/atomic.h >> @@ -16,7 +16,7 @@ >> >> static inline int atomic_read(const atomic_t *v) >> { >> -return *(volatile int *)&v->counter; >> +return *(const volatile int *)&v->counter; >> } >> >> static inline int _atomic_read(atomic_t v) >> -- >> 2.40.0 >>
[xen-unstable-smoke test] 184769: tolerable all pass - PUSHED
flight 184769 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/184769/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass version targeted for testing: xen 03fb5f503cb5faa0556b23d594496719ced6a11b baseline version: xen 8de3afc0b402bc17f65093a53e5870862707a8c7 Last test of basis 184764 2024-02-26 10:02:04 Z0 days Testing same since 184769 2024-02-26 16:00:26 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Anthony PERARD George Dunlap Jan Beulich Roger Pau Monné Stefano Stabellini jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 8de3afc0b4..03fb5f503c 03fb5f503cb5faa0556b23d594496719ced6a11b -> smoke
[PATCH v5 22/23] xen/riscv: enable full Xen build
Signed-off-by: Oleksii Kurochko Reviewed-by: Jan Beulich --- Changes in V5: - Nothing changed. Only rebase. --- Changes in V4: - drop stubs for irq_actor_none() and irq_actor_none() as common/irq.c is compiled now. - drop definition of max_page in stubs.c as common/page_alloc.c is compiled now. - drop printk() related changes in riscv/early_printk.c as common version will be used. --- Changes in V3: - Reviewed-by: Jan Beulich - unrelated change dropped in tiny64_defconfig --- Changes in V2: - Nothing changed. Only rebase. --- xen/arch/riscv/Makefile | 16 +++- xen/arch/riscv/arch.mk| 4 - xen/arch/riscv/early_printk.c | 168 -- xen/arch/riscv/stubs.c| 23 - 4 files changed, 15 insertions(+), 196 deletions(-) diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile index 60afbc0ad9..81b77b13d6 100644 --- a/xen/arch/riscv/Makefile +++ b/xen/arch/riscv/Makefile @@ -12,10 +12,24 @@ $(TARGET): $(TARGET)-syms $(OBJCOPY) -O binary -S $< $@ $(TARGET)-syms: $(objtree)/prelink.o $(obj)/xen.lds - $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) -o $@ + $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< \ + $(objtree)/common/symbols-dummy.o -o $(dot-target).0 + $(NM) -pa --format=sysv $(dot-target).0 \ + | $(objtree)/tools/symbols $(all_symbols) --sysv --sort \ + > $(dot-target).0.S + $(MAKE) $(build)=$(@D) $(dot-target).0.o + $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< \ + $(dot-target).0.o -o $(dot-target).1 + $(NM) -pa --format=sysv $(dot-target).1 \ + | $(objtree)/tools/symbols $(all_symbols) --sysv --sort \ + > $(dot-target).1.S + $(MAKE) $(build)=$(@D) $(dot-target).1.o + $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) \ + $(dot-target).1.o -o $@ $(NM) -pa --format=sysv $@ \ | $(objtree)/tools/symbols --all-symbols --xensyms --sysv --sort \ > $@.map + rm -f $(@D)/.$(@F).[0-9]* $(obj)/xen.lds: $(src)/xen.lds.S FORCE $(call if_changed_dep,cpp_lds_S) diff --git a/xen/arch/riscv/arch.mk b/xen/arch/riscv/arch.mk index fabe323ec5..197d5e1893 100644 
--- a/xen/arch/riscv/arch.mk +++ b/xen/arch/riscv/arch.mk @@ -19,7 +19,3 @@ riscv-march-$(CONFIG_RISCV_ISA_C) := $(riscv-march-y)c # -mcmodel=medlow would force Xen into the lower half. CFLAGS += -march=$(riscv-march-y)$(has_zihintpause) -mstrict-align -mcmodel=medany - -# TODO: Drop override when more of the build is working -override ALL_OBJS-y = arch/$(SRCARCH)/built_in.o -override ALL_LIBS-y = diff --git a/xen/arch/riscv/early_printk.c b/xen/arch/riscv/early_printk.c index 60742a042d..610c814f54 100644 --- a/xen/arch/riscv/early_printk.c +++ b/xen/arch/riscv/early_printk.c @@ -40,171 +40,3 @@ void early_printk(const char *str) str++; } } - -/* - * The following #if 1 ... #endif should be removed after printk - * and related stuff are ready. - */ -#if 1 - -#include -#include - -/** - * strlen - Find the length of a string - * @s: The string to be sized - */ -size_t (strlen)(const char * s) -{ -const char *sc; - -for (sc = s; *sc != '\0'; ++sc) -/* nothing */; -return sc - s; -} - -/** - * memcpy - Copy one area of memory to another - * @dest: Where to copy to - * @src: Where to copy from - * @count: The size of the area. - * - * You should not use this function to access IO space, use memcpy_toio() - * or memcpy_fromio() instead. - */ -void *(memcpy)(void *dest, const void *src, size_t count) -{ -char *tmp = (char *) dest, *s = (char *) src; - -while (count--) -*tmp++ = *s++; - -return dest; -} - -int vsnprintf(char* str, size_t size, const char* format, va_list args) -{ -size_t i = 0; /* Current position in the output string */ -size_t written = 0; /* Total number of characters written */ -char* dest = str; - -while ( format[i] != '\0' && written < size - 1 ) -{ -if ( format[i] == '%' ) -{ -i++; - -if ( format[i] == '\0' ) -break; - -if ( format[i] == '%' ) -{ -if ( written < size - 1 ) -{ -dest[written] = '%'; -written++; -} -i++; -continue; -} - -/* - * Handle format specifiers. - * For simplicity, only %s and %d are implemented here. 
- */ - -if ( format[i] == 's' ) -{ -char* arg = va_arg(args, char*); -size_t arglen = strlen(arg); - -size_t remaining = size - written - 1; - -if ( arglen > remaining ) -arglen = remaining; - -memcpy(dest + written, arg, arglen); - -written += arglen; -
[PATCH v5 18/23] xen/riscv: add minimal stuff to processor.h to build full Xen
Signed-off-by: Oleksii Kurochko --- Changes in V5: - Code style fixes. - drop introduced TOOLCHAIN_HAS_ZIHINTPAUSE and use as-insn instead. --- Changes in V4: - Change message -> subject in "Changes in V3" - Documentation about system requirement was added. In the future, it can be checked if the extension is supported by system __riscv_isa_extension_available() ( https://gitlab.com/xen-project/people/olkur/xen/-/commit/737998e89ed305eb92059300c374dfa53d2143fa ) - update cpu_relax() function to check if __riscv_zihintpause is supported by a toolchain - add conditional _zihintpause to -march if it is supported by a toolchain Changes in V3: - update the commit subject - rename get_processor_id to smp_processor_id - code style fixes - update the cpu_relax instruction: use pause instruction instead of div %0, %0, zero --- Changes in V2: - Nothing changed. Only rebase. --- docs/misc/riscv/booting.txt| 8 xen/arch/riscv/arch.mk | 8 +++- xen/arch/riscv/include/asm/processor.h | 23 +++ 3 files changed, 38 insertions(+), 1 deletion(-) create mode 100644 docs/misc/riscv/booting.txt diff --git a/docs/misc/riscv/booting.txt b/docs/misc/riscv/booting.txt new file mode 100644 index 00..38fad74956 --- /dev/null +++ b/docs/misc/riscv/booting.txt @@ -0,0 +1,8 @@ +System requirements +=== + +The following extensions are expected to be supported by a system on which +Xen is run: +- Zihintpause: + On a system that doesn't have this extension, cpu_relax() should be + implemented properly. Otherwise, an illegal instruction exception will arise. 
diff --git a/xen/arch/riscv/arch.mk b/xen/arch/riscv/arch.mk index 8403f96b6f..fabe323ec5 100644 --- a/xen/arch/riscv/arch.mk +++ b/xen/arch/riscv/arch.mk @@ -5,6 +5,12 @@ $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS)) CFLAGS-$(CONFIG_RISCV_64) += -mabi=lp64 +ifeq ($(CONFIG_RISCV_64),y) +has_zihintpause = $(call as-insn,$(CC) -mabi=lp64 -march=rv64i_zihintpause, "pause",_zihintpause,) +else +has_zihintpause = $(call as-insn,$(CC) -mabi=ilp32 -march=rv32i_zihintpause, "pause",_zihintpause,) +endif + riscv-march-$(CONFIG_RISCV_ISA_RV64G) := rv64g riscv-march-$(CONFIG_RISCV_ISA_C) := $(riscv-march-y)c @@ -12,7 +18,7 @@ riscv-march-$(CONFIG_RISCV_ISA_C) := $(riscv-march-y)c # into the upper half _or_ the lower half of the address space. # -mcmodel=medlow would force Xen into the lower half. -CFLAGS += -march=$(riscv-march-y) -mstrict-align -mcmodel=medany +CFLAGS += -march=$(riscv-march-y)$(has_zihintpause) -mstrict-align -mcmodel=medany # TODO: Drop override when more of the build is working override ALL_OBJS-y = arch/$(SRCARCH)/built_in.o diff --git a/xen/arch/riscv/include/asm/processor.h b/xen/arch/riscv/include/asm/processor.h index 6db681d805..b96af07660 100644 --- a/xen/arch/riscv/include/asm/processor.h +++ b/xen/arch/riscv/include/asm/processor.h @@ -12,6 +12,9 @@ #ifndef __ASSEMBLY__ +/* TODO: need to be implemeted */ +#define smp_processor_id() 0 + /* On stack VCPU state */ struct cpu_user_regs { @@ -53,6 +56,26 @@ struct cpu_user_regs unsigned long pregs; }; +/* TODO: need to implement */ +#define cpu_to_core(cpu) (0) +#define cpu_to_socket(cpu) (0) + +static inline void cpu_relax(void) +{ +#ifdef __riscv_zihintpause +/* + * Reduce instruction retirement. + * This assumes the PC changes. + */ +__asm__ __volatile__ ( "pause" ); +#else +/* Encoding of the pause instruction */ +__asm__ __volatile__ ( ".insn 0x10F" ); +#endif + +barrier(); +} + static inline void wfi(void) { __asm__ __volatile__ ("wfi"); -- 2.43.0
[PATCH v5 14/23] xen/riscv: introduce monitor.h
Signed-off-by: Oleksii Kurochko --- Waiting for dependency to be merged: [PATCH v6 0/9] Introduce generic headers (https://lore.kernel.org/xen-devel/84568b0c24a5ec96244f3f34537e9a148367facf.1707499278.git.oleksii.kuroc...@gmail.com/) --- Changes in V4/V5: - Nothing changed. Only rebase. --- Changes in V3: - new patch. --- xen/arch/riscv/include/asm/monitor.h | 26 ++ 1 file changed, 26 insertions(+) create mode 100644 xen/arch/riscv/include/asm/monitor.h diff --git a/xen/arch/riscv/include/asm/monitor.h b/xen/arch/riscv/include/asm/monitor.h new file mode 100644 index 00..f4fe2c0690 --- /dev/null +++ b/xen/arch/riscv/include/asm/monitor.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __ASM_RISCV_MONITOR_H__ +#define __ASM_RISCV_MONITOR_H__ + +#include + +#include + +struct domain; + +static inline uint32_t arch_monitor_get_capabilities(struct domain *d) +{ +BUG_ON("unimplemented"); +return 0; +} + +#endif /* __ASM_RISCV_MONITOR_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ -- 2.43.0
[PATCH v5 21/23] xen/riscv: add minimal amount of stubs to build full Xen
Signed-off-by: Oleksii Kurochko --- Changes in V5: - drop unrelated changes - assert_failed("unimplmented...") change to BUG_ON() --- Changes in V4: - added new stubs which are necessary for compilation after rebase: __cpu_up(), __cpu_disable(), __cpu_die() from smpboot.c - back changes related to printk() in early_printk() as they should be removed in the next patch to avoid compilation error. - update definition of cpu_khz: __read_mostly -> __ro_after_init. - drop vm_event_reset_vmtrace(). It is defined in asm-generic/vm_event.h. - move vm_event_*() functions from stubs.c to riscv/vm_event.c. - s/BUG/BUG_ON("unimplemented") in stubs.c - back irq_actor_none() and irq_actor_none() as common/irq.c isn't compiled at this moment, so these functions are needed to avoid compilation error. - defined max_page to avoid compilation error, it will be removed as soon as common/page_alloc.c will be compiled. --- Changes in V3: - code style fixes. - update attribute for frametable_base_pdx and frametable_virt_end to __ro_after_init instead of read_mostly. - use BUG() instead of assert_failed/WARN for newly introduced stubs. - drop "#include " in stubs.c and use forward declaration instead. - drop ack_node() and end_node() as they aren't used now. 
--- Changes in V2: - define udelay stub - remove 'select HAS_PDX' from RISC-V Kconfig because of https://lore.kernel.org/xen-devel/20231006144405.1078260-1-andrew.coop...@citrix.com/ --- xen/arch/riscv/Makefile | 1 + xen/arch/riscv/mm.c | 50 + xen/arch/riscv/setup.c | 8 + xen/arch/riscv/stubs.c | 438 xen/arch/riscv/traps.c | 25 +++ 5 files changed, 522 insertions(+) create mode 100644 xen/arch/riscv/stubs.c diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile index 1ed1a8369b..60afbc0ad9 100644 --- a/xen/arch/riscv/Makefile +++ b/xen/arch/riscv/Makefile @@ -4,6 +4,7 @@ obj-y += mm.o obj-$(CONFIG_RISCV_64) += riscv64/ obj-y += sbi.o obj-y += setup.o +obj-y += stubs.o obj-y += traps.o obj-y += vm_event.o diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c index fe3a43be20..2c3fb7d72e 100644 --- a/xen/arch/riscv/mm.c +++ b/xen/arch/riscv/mm.c @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ +#include #include #include #include @@ -14,6 +15,9 @@ #include #include +unsigned long __ro_after_init frametable_base_pdx; +unsigned long __ro_after_init frametable_virt_end; + struct mmu_desc { unsigned int num_levels; unsigned int pgtbl_count; @@ -294,3 +298,49 @@ unsigned long __init calc_phys_offset(void) phys_offset = load_start - XEN_VIRT_START; return phys_offset; } + +void put_page(struct page_info *page) +{ +BUG_ON("unimplemented"); +} + +unsigned long get_upper_mfn_bound(void) +{ +/* No memory hotplug yet, so current memory limit is the final one. 
*/ +return max_page - 1; +} + +void arch_dump_shared_mem_info(void) +{ +BUG_ON("unimplemented"); +} + +int populate_pt_range(unsigned long virt, unsigned long nr_mfns) +{ +BUG_ON("unimplemented"); +return -1; +} + +int xenmem_add_to_physmap_one(struct domain *d, unsigned int space, + union add_to_physmap_extra extra, + unsigned long idx, gfn_t gfn) +{ +BUG_ON("unimplemented"); + +return 0; +} + +int destroy_xen_mappings(unsigned long s, unsigned long e) +{ +BUG_ON("unimplemented"); +return -1; +} + +int map_pages_to_xen(unsigned long virt, + mfn_t mfn, + unsigned long nr_mfns, + unsigned int flags) +{ +BUG_ON("unimplemented"); +return -1; +} diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c index 98a94c4c48..8bb5bdb2ae 100644 --- a/xen/arch/riscv/setup.c +++ b/xen/arch/riscv/setup.c @@ -1,11 +1,19 @@ /* SPDX-License-Identifier: GPL-2.0-only */ +#include #include #include #include +#include + #include +void arch_get_xen_caps(xen_capabilities_info_t *info) +{ +BUG_ON("unimplemented"); +} + /* Xen stack for bringing up the first CPU. */ unsigned char __initdata cpu0_boot_stack[STACK_SIZE] __aligned(STACK_SIZE); diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c new file mode 100644 index 00..529f1dbe52 --- /dev/null +++ b/xen/arch/riscv/stubs.c @@ -0,0 +1,438 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#include +#include +#include +#include +#include +#include + +#include + +/* smpboot.c */ + +cpumask_t cpu_online_map; +cpumask_t cpu_present_map; +cpumask_t cpu_possible_map; + +/* ID of the PCPU we're running on */ +DEFINE_PER_CPU(unsigned int, cpu_id); +/* XXX these seem awfully x86ish... */ +/* representing HT siblings of each logical CPU */ +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_mask); +/* representing HT and core siblings of each logical CPU */ +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask); +
[ovmf test] 184768: all pass - PUSHED
flight 184768 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/184768/ Perfect :-) All tests in this flight passed as required version targeted for testing: ovmf 44fdc4f3983be77dda709d7a0c3fb8905fbf920b baseline version: ovmf ba96acd963e794e86633ffd5e7f96cbc03db673c Last test of basis 184761 2024-02-25 20:12:59 Z0 days Testing same since 184768 2024-02-26 15:43:07 Z0 days1 attempts People who touched revisions under test: Yi Li jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/osstest/ovmf.git ba96acd963..44fdc4f398 44fdc4f3983be77dda709d7a0c3fb8905fbf920b -> xen-tested-master
[PATCH v5 23/23] xen/README: add compiler and binutils versions for RISC-V64
This patch doesn't represent a strict lower bound for GCC and GNU Binutils; rather, these versions are specifically employed by the Xen RISC-V container and are anticipated to undergo continuous testing. While it is feasible to utilize Clang, it's important to note that, currently, there is no Xen RISC-V CI job in place to verify the seamless functioning of the build with Clang. Signed-off-by: Oleksii Kurochko --- Changes in V5: - update the commit message and README file with additional explanation about the GCC and GNU Binutils versions. Additionally, information about Clang was added. --- Changes in V4: - Update version of GCC (12.2) and GNU Binutils (2.39) to the versions which are in Xen's container for RISC-V --- Changes in V3: - new patch --- README | 9 + 1 file changed, 9 insertions(+) diff --git a/README b/README index c8a108449e..7fd4173743 100644 --- a/README +++ b/README @@ -48,6 +48,15 @@ provided by your OS distributor: - For ARM 64-bit: - GCC 5.1 or later - GNU Binutils 2.24 or later + - For RISC-V 64-bit: +- GCC 12.2 or later +- GNU Binutils 2.39 or later +This doesn't represent a strict lower bound for GCC and GNU Binutils; +rather, these versions are specifically employed by the Xen RISC-V +container and are anticipated to undergo continuous testing. +While it is feasible to utilize Clang, it's important to note that, +currently, there is no Xen RISC-V CI job in place to verify the +seamless functioning of the build with Clang. * POSIX compatible awk * Development install of zlib (e.g., zlib-dev) * Development install of Python 2.7 or later (e.g., python-dev) -- 2.43.0
[PATCH v5 13/23] xen/riscv: introduce atomic.h
Initially the patch was introduced by Bobby, who took the header from the Linux kernel. The following changes were done on top of the Linux kernel header: - atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated to use __*xchg_generic() - drop casts in write_atomic() as they are unnecessary - drop introduction of WRITE_ONCE() and READ_ONCE(). Xen provides ACCESS_ONCE() - remove zero-length array access in read_atomic() - drop defines similar to pattern - #define atomic_add_return_relaxed atomic_add_return_relaxed - move non-RISC-V-specific functions to asm-generic/atomic-ops.h Signed-off-by: Bobby Eshleman Signed-off-by: Oleksii Kurochko --- Changes in V5: - fence.h changes were moved to a separate patch as patches related to io.h and cmpxchg.h, which are dependencies for this patch, also needed changes in fence.h - remove accessing of zero-length array - drop casts in write_atomic() - drop introduction of WRITE_ONCE() and READ_ONCE(). - drop defines similar to pattern #define atomic_add_return_relaxed atomic_add_return_relaxed - Xen code style fixes - move non-RISC-V-specific functions to asm-generic/atomic-ops.h --- Changes in V4: - do changes related to the updates of [PATCH v3 13/34] xen/riscv: introduce cmpxchg.h - drop casts in read_atomic_size(), write_atomic(), add_sized() - tabs -> spaces - drop #ifdef CONFIG_SMP ... #endif in fence.h as it is simpler to handle NR_CPUS=1 the same as NR_CPUS>1 with accepting less than ideal performance. --- Changes in V3: - update the commit message - add SPDX for fence.h - code style fixes - Remove /* TODO: ... */ for add_sized macros. It looks correct to me. - re-order the patch - merge fence.h into this patch --- Changes in V2: - Change the author of the commit. I got this header from Bobby's old repo. 
--- xen/arch/riscv/include/asm/atomic.h | 296 +++ xen/include/asm-generic/atomic-ops.h | 92 + 2 files changed, 388 insertions(+) create mode 100644 xen/arch/riscv/include/asm/atomic.h create mode 100644 xen/include/asm-generic/atomic-ops.h diff --git a/xen/arch/riscv/include/asm/atomic.h b/xen/arch/riscv/include/asm/atomic.h new file mode 100644 index 00..8007ae4c90 --- /dev/null +++ b/xen/arch/riscv/include/asm/atomic.h @@ -0,0 +1,296 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Taken and modified from Linux. + * + * The following changes were done: + * - * atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated + * to use__*xchg_generic() + * - drop casts in write_atomic() as they are unnecessary + * - drop introduction of WRITE_ONCE() and READ_ONCE(). + * Xen provides ACCESS_ONCE() + * - remove zero-length array access in read_atomic() + * - drop defines similar to pattern + * #define atomic_add_return_relaxed atomic_add_return_relaxed + * - move not RISC-V specific functions to asm-generic/atomics-ops.h + * + * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. + * Copyright (C) 2012 Regents of the University of California + * Copyright (C) 2017 SiFive + * Copyright (C) 2024 Vates SAS + */ + +#ifndef _ASM_RISCV_ATOMIC_H +#define _ASM_RISCV_ATOMIC_H + +#include + +#include +#include +#include +#include + +#include + +void __bad_atomic_size(void); + +/* + * Legacy from Linux kernel. For some reason they wanted to have ordered + * read/write access. 
Thereby read*() is used instead of read*_cpu() */ +static always_inline void read_atomic_size(const volatile void *p, + void *res, + unsigned int size) +{ +switch ( size ) +{ +case 1: *(uint8_t *)res = readb(p); break; +case 2: *(uint16_t *)res = readw(p); break; +case 4: *(uint32_t *)res = readl(p); break; +case 8: *(uint64_t *)res = readq(p); break; +default: __bad_atomic_size(); break; +} +} + +#define read_atomic(p) ({ \ +union { typeof(*p) val; char c[sizeof(*p)]; } x_; \ +read_atomic_size(p, x_.c, sizeof(*p)); \ +x_.val; \ +}) + +#define write_atomic(p, x) \ +({ \ +typeof(*p) x__ = (x); \ +switch ( sizeof(*p) ) \ +{ \ +case 1: writeb(x__, p); break; \ +case 2: writew(x__, p); break; \ +case 4: writel(x__, p); break; \ +case 8: writeq(x__, p); break; \ +default: __bad_atomic_size(); break;\ +} \ +x__;\ +}) + +#define add_sized(p, x) \ +({ \ +typeo
[PATCH v5 17/23] xen/riscv: add minimal stuff to page.h to build full Xen
Signed-off-by: Oleksii Kurochko Acked-by: Jan Beulich --- Changes in V5: - Nothing changed. Only rebase. --- Changes in V4: --- - Change message -> subject in "Changes in V3" - s/BUG/BUG_ON("...") - Do proper rebase ( pfn_to_paddr() and paddr_to_pfn() aren't removed ). --- Changes in V3: - update the commit subject - add implemetation of PAGE_HYPERVISOR macros - add Acked-by: Jan Beulich - drop definition of pfn_to_addr, and paddr_to_pfn in --- Changes in V2: - Nothing changed. Only rebase. --- xen/arch/riscv/include/asm/page.h | 19 +++ 1 file changed, 19 insertions(+) diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h index 95074e29b3..c831e16417 100644 --- a/xen/arch/riscv/include/asm/page.h +++ b/xen/arch/riscv/include/asm/page.h @@ -6,6 +6,7 @@ #ifndef __ASSEMBLY__ #include +#include #include #include @@ -32,6 +33,10 @@ #define PTE_LEAF_DEFAULT(PTE_VALID | PTE_READABLE | PTE_WRITABLE) #define PTE_TABLE (PTE_VALID) +#define PAGE_HYPERVISOR_RW (PTE_VALID | PTE_READABLE | PTE_WRITABLE) + +#define PAGE_HYPERVISOR PAGE_HYPERVISOR_RW + /* Calculate the offsets into the pagetables for a given VA */ #define pt_linear_offset(lvl, va) ((va) >> XEN_PT_LEVEL_SHIFT(lvl)) @@ -62,6 +67,20 @@ static inline bool pte_is_valid(pte_t p) return p.pte & PTE_VALID; } +static inline void invalidate_icache(void) +{ +BUG_ON("unimplemented"); +} + +#define clear_page(page) memset((void *)(page), 0, PAGE_SIZE) +#define copy_page(dp, sp) memcpy(dp, sp, PAGE_SIZE) + +/* TODO: Flush the dcache for an entire page. */ +static inline void flush_page_to_ram(unsigned long mfn, bool sync_icache) +{ +BUG_ON("unimplemented"); +} + #endif /* __ASSEMBLY__ */ #endif /* _ASM_RISCV_PAGE_H */ -- 2.43.0
[PATCH v5 20/23] xen/riscv: introduce vm_event_*() functions
Signed-off-by: Oleksii Kurochko --- Changes in V5: - Only rebase was done. --- Changes in V4: - New patch. --- xen/arch/riscv/Makefile | 1 + xen/arch/riscv/vm_event.c | 19 +++ 2 files changed, 20 insertions(+) create mode 100644 xen/arch/riscv/vm_event.c diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile index 2fefe14e7c..1ed1a8369b 100644 --- a/xen/arch/riscv/Makefile +++ b/xen/arch/riscv/Makefile @@ -5,6 +5,7 @@ obj-$(CONFIG_RISCV_64) += riscv64/ obj-y += sbi.o obj-y += setup.o obj-y += traps.o +obj-y += vm_event.o $(TARGET): $(TARGET)-syms $(OBJCOPY) -O binary -S $< $@ diff --git a/xen/arch/riscv/vm_event.c b/xen/arch/riscv/vm_event.c new file mode 100644 index 00..bb1fc73bc1 --- /dev/null +++ b/xen/arch/riscv/vm_event.c @@ -0,0 +1,19 @@ +#include + +struct vm_event_st; +struct vcpu; + +void vm_event_fill_regs(struct vm_event_st *req) +{ +BUG_ON("unimplemented"); +} + +void vm_event_set_registers(struct vcpu *v, struct vm_event_st *rsp) +{ +BUG_ON("unimplemented"); +} + +void vm_event_monitor_next_interrupt(struct vcpu *v) +{ +/* Not supported on RISCV. */ +} -- 2.43.0
[PATCH v5 15/23] xen/riscv: add definition of __read_mostly
The definition of __read_mostly should be removed in: https://lore.kernel.org/xen-devel/f25eb5c9-7c14-6e23-8535-2c66772b3...@suse.com/ The patch introduces it in arch-specific header to not block enabling of full Xen build for RISC-V. Signed-off-by: Oleksii Kurochko --- - [PATCH] move __read_mostly to xen/cache.h [2] Right now, the patch series doesn't have a direct dependency on [2] and it provides __read_mostly in the patch: [PATCH v3 26/34] xen/riscv: add definition of __read_mostly However, it will be dropped as soon as [2] is merged or at least when the final version of the patch [2] is provided. [2] https://lore.kernel.org/xen-devel/f25eb5c9-7c14-6e23-8535-2c66772b3...@suse.com/ --- Changes in V4-V6: - Nothing changed. Only rebase. --- xen/arch/riscv/include/asm/cache.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/arch/riscv/include/asm/cache.h b/xen/arch/riscv/include/asm/cache.h index 69573eb051..94bd94db53 100644 --- a/xen/arch/riscv/include/asm/cache.h +++ b/xen/arch/riscv/include/asm/cache.h @@ -3,4 +3,6 @@ #ifndef _ASM_RISCV_CACHE_H #define _ASM_RISCV_CACHE_H +#define __read_mostly __section(".data.read_mostly") + #endif /* _ASM_RISCV_CACHE_H */ -- 2.43.0
[PATCH v5 19/23] xen/riscv: add minimal stuff to mm.h to build full Xen
Signed-off-by: Oleksii Kurochko --- Changes in V5: - update the comment around "struct domain *domain;" : zero -> NULL - fix ident. for unsigned long val; - put page_to_virt() and virt_to_page() close to each other. - drop unnessary leading underscore - drop a space before the comment: /* Count of uses of this frame as its current type. */ - drop comment about a page 'not as a shadow'. it is not necessary for RISC-V --- Changes in V4: - update an argument name of PFN_ORDERN macros. - drop pad at the end of 'struct page_info'. - Change message -> subject in "Changes in V3" - delete duplicated macros from riscv/mm.h - fix identation in struct page_info - align comment for PGC_ macros - update definitions of domain_set_alloc_bitsize() and domain_clamp_alloc_bitsize() - drop unnessary comments. - s/BUG/BUG_ON("...") - define __virt_to_maddr, __maddr_to_virt as stubs - add inclusion of xen/mm-frame.h for mfn_x and others - include "xen/mm.h" instead of "asm/mm.h" to fix compilation issues: In file included from arch/riscv/setup.c:7: ./arch/riscv/include/asm/mm.h:60:28: error: field 'list' has incomplete type 60 | struct page_list_entry list; |^~~~ ./arch/riscv/include/asm/mm.h:81:43: error: 'MAX_ORDER' undeclared here (not in a function) 81 | unsigned long first_dirty:MAX_ORDER + 1; | ^ ./arch/riscv/include/asm/mm.h:81:31: error: bit-field 'first_dirty' width not an integer constant 81 | unsigned long first_dirty:MAX_ORDER + 1; - Define __virt_to_mfn() and __mfn_to_virt() using maddr_to_mfn() and mfn_to_maddr(). --- Changes in V3: - update the commit title - introduce DIRECTMAP_VIRT_START. - drop changes related pfn_to_paddr() and paddr_to_pfn as they were remvoe in [PATCH v2 32/39] xen/riscv: add minimal stuff to asm/page.h to build full Xen - code style fixes. 
- drop get_page_nr and put_page_nr as they don't need for time being - drop CONFIG_STATIC_MEMORY related things - code style fixes --- Changes in V2: - define stub for arch_get_dma_bitsize(void) --- xen/arch/riscv/include/asm/mm.h | 246 xen/arch/riscv/mm.c | 2 +- xen/arch/riscv/setup.c | 2 +- 3 files changed, 248 insertions(+), 2 deletions(-) diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h index 07c7a0abba..2f13c1c3c2 100644 --- a/xen/arch/riscv/include/asm/mm.h +++ b/xen/arch/riscv/include/asm/mm.h @@ -3,11 +3,252 @@ #ifndef _ASM_RISCV_MM_H #define _ASM_RISCV_MM_H +#include +#include +#include +#include +#include + #include #define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT) #define paddr_to_pfn(pa) ((unsigned long)((pa) >> PAGE_SHIFT)) +#define paddr_to_pdx(pa)mfn_to_pdx(maddr_to_mfn(pa)) +#define gfn_to_gaddr(gfn) pfn_to_paddr(gfn_x(gfn)) +#define gaddr_to_gfn(ga)_gfn(paddr_to_pfn(ga)) +#define mfn_to_maddr(mfn) pfn_to_paddr(mfn_x(mfn)) +#define maddr_to_mfn(ma)_mfn(paddr_to_pfn(ma)) +#define vmap_to_mfn(va) maddr_to_mfn(virt_to_maddr((vaddr_t)va)) +#define vmap_to_page(va)mfn_to_page(vmap_to_mfn(va)) + +static inline unsigned long __virt_to_maddr(unsigned long va) +{ +BUG_ON("unimplemented"); +return 0; +} + +static inline void *__maddr_to_virt(unsigned long ma) +{ +BUG_ON("unimplemented"); +return NULL; +} + +#define virt_to_maddr(va) __virt_to_maddr((unsigned long)(va)) +#define maddr_to_virt(pa) __maddr_to_virt((unsigned long)(pa)) + +/* Convert between Xen-heap virtual addresses and machine frame numbers. */ +#define __virt_to_mfn(va) mfn_x(maddr_to_mfn(virt_to_maddr(va))) +#define __mfn_to_virt(mfn) maddr_to_virt(mfn_to_maddr(_mfn(mfn))) + +/* + * We define non-underscored wrappers for above conversion functions. + * These are overriden in various source files while underscored version + * remain intact. 
+ */ +#define virt_to_mfn(va) __virt_to_mfn(va) +#define mfn_to_virt(mfn)__mfn_to_virt(mfn) + +struct page_info +{ +/* Each frame can be threaded onto a doubly-linked list. */ +struct page_list_entry list; + +/* Reference count and various PGC_xxx flags and fields. */ +unsigned long count_info; + +/* Context-dependent fields follow... */ +union { +/* Page is in use: ((count_info & PGC_count_mask) != 0). */ +struct { +/* Type reference count and various PGT_xxx flags and fields. */ +unsigned long type_info; +} inuse; +/* Page is on a free list: ((count_info & PGC_count_mask) == 0). */ +union { +struct { +/* + * Index of the first *possibly* unscrubbed page in the buddy. + * One more bit than maximum possible order to accommodate + * INVALID_DIRTY_IDX. + */ +#define
[PATCH v5 09/23] xen/riscv: introduce bitops.h
Taken from Linux-6.4.0-rc1 Xen's bitops.h consists of several Linux headers: * linux/arch/include/asm/bitops.h: * The following functions were removed as they aren't used in Xen: * test_and_set_bit_lock * clear_bit_unlock * __clear_bit_unlock * The following functions were renamed to match how they are used by common code: * __test_and_set_bit * __test_and_clear_bit * The declaration and implementation of the following functions were updated to make the Xen build happy: * clear_bit * set_bit * __test_and_clear_bit * __test_and_set_bit * linux/include/asm-generic/bitops/generic-non-atomic.h with the following changes: * Only functions that can be reused in Xen were left; others were removed. * the message inside #ifndef ... #endif was updated. * __always_inline -> always_inline to align with the definition in xen/compiler.h. * update function prototypes from generic___test_and_*(unsigned long nr, volatile unsigned long *addr) to generic___test_and_*(unsigned long nr, volatile void *addr) to be consistent with other related macros/defines. * convert indentations from tabs to spaces. * inside generic__test_and_* use 'bitops_uint_t' instead of 'unsigned long' to be generic. Signed-off-by: Oleksii Kurochko --- Patches 04 - 08 of this patch series are prerequisites for this patch. --- Changes in V5: - Code style fixes - s/__NOP/NOP/g - s/__NOT/NOT/g - update the comments above the functions test_and_set_bit, test_and_clear_bit, set_bit, clear_bit: all of them use an atomic operation and a memory barrier, so the operations cannot be reordered. - s/volatile uint32_t/volatile bitops_uint_t in test_and_set_bit, test_and_clear_bit, set_bit, clear_bit. - update the commit message - split introduction of asm-generic functions into separate patches: Patches 04 - 08 of this patch series are prerequisites for this patch. 
--- Changes in V4: - updated the commit message: dropped the message about what was taken from linux/include/asm-generic/bitops/find.h as related changes are now located in xen/bitops.h. These changes were also removed from riscv/bitops.h - switch tabs to spaces. - update return type of the __ffs function, format __ffs according to Xen code style. Move the function to the respective asm-generic header. - format ffsl() according to Xen code style, update the type of num: int -> unsigned to align with the return type of the function. Move the function to the respective asm-generic header. - add new line for the files: asm-generic/bitops-bits.h asm-generic/ffz.h asm-generic/find-first-bit-set.h asm-generic/fls.h asm-generic/flsl.h asm-generic/test-bit.h - rename asm-generic/find-first-bit-set.h to asm-generic/find-first-set-bit.h to be aligned with the function name implemented inside. - introduce generic___test_and_*() operations for non-atomic bitops. - rename current __test_and_*() -> test_and_*() as their implementations are atomic-aware. - define __test_and_*() to generic___test_and_*(). - introduce test_and_change_bit(). - update asm-generic/bitops/bitops-bits.h to give the architecture the possibility to change the BITOP_*() macros. Also, the bitops_uint_t type was introduced to make generic___test_and_*() generic. - "include asm-generic/bitops/bitops-bits.h" in files which use its definitions. - add a comment why generic ffz is defined as __ffs(). - update the commit message. - switch ffsl() to generic_ffsl(). 
--- Changes in V3: - update the commit message - Introduce the following asm-generic bitops headers: create mode 100644 xen/arch/riscv/include/asm/bitops.h create mode 100644 xen/include/asm-generic/bitops/bitops-bits.h create mode 100644 xen/include/asm-generic/bitops/ffs.h create mode 100644 xen/include/asm-generic/bitops/ffz.h create mode 100644 xen/include/asm-generic/bitops/find-first-bit-set.h create mode 100644 xen/include/asm-generic/bitops/fls.h create mode 100644 xen/include/asm-generic/bitops/flsl.h create mode 100644 xen/include/asm-generic/bitops/hweight.h create mode 100644 xen/include/asm-generic/bitops/test-bit.h - switch some bitops functions to asm-generic's versions. - re-sync some macros with Linux kernel version mentioned in the commit message. - Xen code style fixes. --- Changes in V2: - Nothing changed. Only rebase. --- xen/arch/riscv/include/asm/bitops.h | 152 xen/arch/riscv/include/asm/config.h | 2 + 2 files changed, 154 insertions(+) create mode 100644 xen/arch/riscv/include/asm/bitops.h diff --git a/xen/arch/riscv/include/asm/bitops.h b/xen/arch/riscv/include/asm/bitops.h new file mode 100644 index 00..17b3cf5be5 --- /dev/null +++ b/xen/arch/riscv/include/asm/bitops.h @@ -0,0 +1,152
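The `generic___test_and_*()` helpers the changelog introduces all follow one pattern: locate the word holding bit `nr`, do a plain (non-atomic) read-modify-write, and report the bit's previous state. A minimal sketch of that pattern, using hypothetical `sketch_`/`SK_` names and plain `unsigned long` where the patch parameterizes the word type via `bitops_uint_t`:

```c
#include <assert.h>
#include <limits.h>

#define SK_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SK_BIT_MASK(nr)  (1UL << ((nr) % SK_BITS_PER_WORD))
#define SK_BIT_WORD(nr)  ((nr) / SK_BITS_PER_WORD)

/* Sketch of the generic___test_and_set_bit() shape: non-atomic, so only
 * safe when the caller excludes concurrent writers to the same word. */
static int sketch_test_and_set_bit(unsigned long nr, volatile void *addr)
{
    unsigned long mask = SK_BIT_MASK(nr);
    volatile unsigned long *p =
        (volatile unsigned long *)addr + SK_BIT_WORD(nr);
    unsigned long old = *p;

    *p = old | mask;                  /* plain RMW, no lr/sc or amo */
    return (old & mask) != 0;
}
```

The atomic `test_and_*()` variants differ only in replacing the plain read-modify-write with an AMO instruction plus the required barriers.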
[PATCH v5 11/23] xen/riscv: introduce cmpxchg.h
The header was taken from Linux kernel 6.4.0-rc1. Additionally, the following were updated: * add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic access. * replace tabs with spaces * replace __* variables with *__ * introduce generic versions of xchg_* and cmpxchg_*. Implementation of the 4- and 8-byte cases was left as it is done in the Linux kernel since, according to the RISC-V spec: ``` Table A.5 ( only part of the table was copied here ) Linux Construct RVWMO Mapping atomic relaxed amo.{w|d} atomic acquire amo.{w|d}.aq atomic release amo.{w|d}.rl atomic amo.{w|d}.aqrl Linux Construct RVWMO LR/SC Mapping atomic relaxed loop: lr.{w|d}; ; sc.{w|d}; bnez loop atomic acquire loop: lr.{w|d}.aq; ; sc.{w|d}; bnez loop atomic release loop: lr.{w|d}; ; sc.{w|d}.aqrl∗ ; bnez loop OR fence.tso; loop: lr.{w|d}; ; sc.{w|d}∗ ; bnez loop atomic loop: lr.{w|d}.aq; ; sc.{w|d}.aqrl; bnez loop The Linux mappings for release operations may seem stronger than necessary, but these mappings are needed to cover some cases in which Linux requires stronger orderings than the more intuitive mappings would provide. In particular, as of the time this text is being written, Linux is actively debating whether to require load-load, load-store, and store-store orderings between accesses in one critical section and accesses in a subsequent critical section in the same hart and protected by the same synchronization object. Not all combinations of FENCE RW,W/FENCE R,RW mappings with aq/rl mappings combine to provide such orderings. There are a few ways around this problem, including: 1. Always use FENCE RW,W/FENCE R,RW, and never use aq/rl. This suffices but is undesirable, as it defeats the purpose of the aq/rl modifiers. 2. Always use aq/rl, and never use FENCE RW,W/FENCE R,RW. This does not currently work due to the lack of load and store opcodes with aq and rl modifiers. 3. 
Strengthen the mappings of release operations such that they would enforce sufficient orderings in the presence of either type of acquire mapping. This is the currently-recommended solution, and the one shown in Table A.5. ``` But in Linux kenrel atomics were strengthen with fences: ``` Atomics present the same issue with locking: release and acquire variants need to be strengthened to meet the constraints defined by the Linux-kernel memory consistency model [1]. Atomics present a further issue: implementations of atomics such as atomic_cmpxchg() and atomic_add_unless() rely on LR/SC pairs, which do not give full-ordering with .aqrl; for example, current implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test below to end up with the state indicated in the "exists" clause. In order to "synchronize" LKMM and RISC-V's implementation, this commit strengthens the implementations of the atomics operations by replacing .rl and .aq with the use of ("lightweigth") fences, and by replacing .aqrl LR/SC pairs in sequences such as: 0: lr.w.aqrl %0, %addr bne%0, %old, 1f ... sc.w.aqrl %1, %new, %addr bnez %1, 0b 1: with sequences of the form: 0: lr.w %0, %addr bne%0, %old, 1f ... sc.w.rl%1, %new, %addr /* SC-release */ bnez %1, 0b fence rw, rw/* "full" fence */ 1: following Daniel's suggestion. These modifications were validated with simulation of the RISC-V memory consistency model. 
C lr-sc-aqrl-pair-vs-full-barrier {} P0(int *x, int *y, atomic_t *u) { int r0; int r1; WRITE_ONCE(*x, 1); r0 = atomic_cmpxchg(u, 0, 1); r1 = READ_ONCE(*y); } P1(int *x, int *y, atomic_t *v) { int r0; int r1; WRITE_ONCE(*y, 1); r0 = atomic_cmpxchg(v, 0, 1); r1 = READ_ONCE(*x); } exists (u=1 /\ v=1 /\ 0:r1=0 /\ 1:r1=0) [1] https://marc.info/?l=linux-kernel&m=151930201102853&w=2 https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/hKywNHBkAXM https://marc.info/?l=linux-kernel&m=151633436614259&w=2 ``` Signed-off-by: Oleksii Kurochko --- Changes in V5: - update the commit message. - drop ALIGN_DOWN(). - update the definition of emulate_xchg_1_2(): - lr.d -> lr.w, sc.d -> sc.w. - drop ret argument. - code style fixes around asm volatile. - update prototype. - use asm named operands. - rename local variables. - add comment above the macros - update the definition of __xchg_generic: - drop local ptr__ variable. - code style fixes around switch() - update prototype. - introduce RISCV_FULL_BARRIES. - redefine cmpxchg() - update emulate_cmpxchg_1_2(): - update prototype - update local variables names and usage of them - use name asm operands. - add comment above the macros --- Changes in V4: - Code style fixes. - enforce in __xchg_*() has the same type for new and *ptr, also "\n" was removed at the en
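The 1- and 2-byte emulation mentioned in the commit message works by operating atomically on the aligned 32-bit word that contains the byte or halfword, splicing the new value in with a shift and mask. A hedged, portable sketch of that scheme (hypothetical names; GCC `__atomic` builtins stand in for the actual `lr.w`/`sc.w` sequence, and the byte-offset computation assumes little-endian, which is all the current RISC-V port targets):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of the emulate_xchg_1_2() idea for the 1-byte case:
 * CAS-loop on the containing aligned 32-bit word, preserving the other
 * three bytes. Not Xen's actual implementation. */
static uint8_t sketch_xchg8(uint8_t *p, uint8_t new)
{
    uintptr_t a = (uintptr_t)p;
    uint32_t *word = (uint32_t *)(a & ~(uintptr_t)3); /* containing word */
    unsigned int shift = (a & 3) * 8;                 /* LE byte offset */
    uint32_t mask = 0xffU << shift;
    uint32_t old = *word, val;

    do {
        /* Keep the other three bytes, splice in the new one. */
        val = (old & ~mask) | ((uint32_t)new << shift);
    } while ( !__atomic_compare_exchange_n(word, &old, val, 0,
                                           __ATOMIC_SEQ_CST,
                                           __ATOMIC_SEQ_CST) );

    return (uint8_t)((old & mask) >> shift);
}
```

The 2-byte case is identical except for a 16-bit mask and a halfword-aligned offset; the real code replaces the CAS loop with an `lr.w`/`sc.w` retry loop and the fences discussed above.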
[PATCH v5 03/23] xen/riscv: introduce nospec.h
From the unprivileged doc: No standard hints are presently defined. We anticipate standard hints to eventually include memory-system spatial and temporal locality hints, branch prediction hints, thread-scheduling hints, security tags, and instrumentation flags for simulation/emulation. Also, there are no speculative execution barriers. Therefore, the functions evaluate_nospec() and block_speculation() should remain empty until a specific platform has an extension to deal with speculative execution. Signed-off-by: Oleksii Kurochko --- Changes in V5: - new patch --- xen/arch/riscv/include/asm/nospec.h | 25 + 1 file changed, 25 insertions(+) create mode 100644 xen/arch/riscv/include/asm/nospec.h diff --git a/xen/arch/riscv/include/asm/nospec.h b/xen/arch/riscv/include/asm/nospec.h new file mode 100644 index 00..4fb404a0a2 --- /dev/null +++ b/xen/arch/riscv/include/asm/nospec.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2024 Vates */ + +#ifndef _ASM_GENERIC_NOSPEC_H +#define _ASM_GENERIC_NOSPEC_H + +static inline bool evaluate_nospec(bool condition) +{ +return condition; +} + +static inline void block_speculation(void) +{ +} + +#endif /* _ASM_GENERIC_NOSPEC_H */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ -- 2.43.0
[PATCH v5 16/23] xen/riscv: add required things to current.h
Add minimal required things to be able to build full Xen. Signed-off-by: Oleksii Kurochko Acked-by: Jan Beulich --- Changes in V5: - Nothing changed. Only rebase. --- Changes in V4: - BUG() was changed to BUG_ON("unimplemented"); - Change "xen/bug.h" to "xen/lib.h" as BUG_ON is defined in xen/lib.h. - Add Acked-by: Jan Beulich --- Changes in V3: - add SPDX - drop a forward declaration of struct vcpu; - update guest_cpu_user_regs() macros - replace get_processor_id with smp_processor_id - update the commit message - code style fixes --- Changes in V2: - Nothing changed. Only rebase. --- xen/arch/riscv/include/asm/current.h | 19 +++ 1 file changed, 19 insertions(+) diff --git a/xen/arch/riscv/include/asm/current.h b/xen/arch/riscv/include/asm/current.h index d84f15dc50..aedb6dc732 100644 --- a/xen/arch/riscv/include/asm/current.h +++ b/xen/arch/riscv/include/asm/current.h @@ -3,6 +3,21 @@ #ifndef __ASM_CURRENT_H #define __ASM_CURRENT_H +#include +#include +#include + +#ifndef __ASSEMBLY__ + +/* Which VCPU is "current" on this PCPU. */ +DECLARE_PER_CPU(struct vcpu *, curr_vcpu); + +#define current this_cpu(curr_vcpu) +#define set_current(vcpu) do { current = (vcpu); } while (0) +#define get_cpu_current(cpu) per_cpu(curr_vcpu, cpu) + +#define guest_cpu_user_regs() ({ BUG_ON("unimplemented"); NULL; }) + #define switch_stack_and_jump(stack, fn) do { \ asm volatile ( \ "mv sp, %0\n" \ @@ -10,4 +25,8 @@ unreachable(); \ } while ( false ) +#define get_per_cpu_offset() __per_cpu_offset[smp_processor_id()] + +#endif /* __ASSEMBLY__ */ + +#endif /* __ASM_CURRENT_H */ -- 2.43.0
[PATCH v5 07/23] xen/asm-generic: introduce generic hweight64()
The generic hweight() function can be useful for architectures that don't have corresponding arch-specific instructions. Signed-off-by: Oleksii Kurochko --- Changes in V5: - new patch --- xen/include/asm-generic/bitops/hweight.h | 13 + 1 file changed, 13 insertions(+) create mode 100644 xen/include/asm-generic/bitops/hweight.h diff --git a/xen/include/asm-generic/bitops/hweight.h b/xen/include/asm-generic/bitops/hweight.h new file mode 100644 index 00..0d7577054e --- /dev/null +++ b/xen/include/asm-generic/bitops/hweight.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_GENERIC_BITOPS_HWEIGHT_H_ +#define _ASM_GENERIC_BITOPS_HWEIGHT_H_ + +/* + * hweightN - returns the hamming weight of a N-bit word + * @x: the word to weigh + * + * The Hamming Weight of a number is the total number of bits set in it. + */ +#define hweight64(x) generic_hweight64(x) + +#endif /* _ASM_GENERIC_BITOPS_HWEIGHT_H_ */ -- 2.43.0
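The usual generic population count behind a `generic_hweight64()` is a parallel bit summation (SWAR). A sketch of that technique, under the assumption that the hedged `sketch_` name stands in for Xen's actual implementation in `xen/bitops.h`, which may differ in detail:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a generic 64-bit Hamming weight: sum adjacent bit pairs, then
 * nibbles, then bytes, and finally gather all byte sums into the top byte
 * via a multiply. No arch-specific popcount instruction is needed. */
static unsigned int sketch_hweight64(uint64_t w)
{
    w -= (w >> 1) & 0x5555555555555555ULL;                 /* 2-bit sums */
    w = (w & 0x3333333333333333ULL) +
        ((w >> 2) & 0x3333333333333333ULL);                /* 4-bit sums */
    w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;            /* 8-bit sums */
    return (unsigned int)((w * 0x0101010101010101ULL) >> 56);
}
```

Each step halves the number of partial sums, so the whole count costs a handful of ALU operations regardless of how many bits are set.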
[PATCH v5 12/23] xen/riscv: introduce io.h
The header was taken from Linux 6.4.0-rc1 and is based on arch/riscv/include/asm/mmio.h with the following changes: - drop forcing of endianness for the read*(), write*() functions: no matter what the CPU endianness is, the endianness a particular device (and hence its MMIO region(s)) uses is entirely independent. Hence conversion, where necessary, needs to occur a layer up. One more reason to drop the endianness conversion here is: https://patchwork.kernel.org/project/linux-riscv/patch/2019045623.5749-3-...@lst.de/ One of the answers of the author of the commit: And we don't know if Linux will be around if that ever changes. The point is: a) the current RISC-V spec is LE only b) the current linux port is LE only except for this little bit There is no point in leaving just this bitrotting code around. It just confuses developers, (very very slightly) slows down compiles and will bitrot. It also won't be any significant help to a future developer down the road doing a hypothetical BE RISC-V Linux port. - drop unused argument of __io_ar() macros. - drop "#define _raw_{read,write}{b,w,l,d,q} _raw_{read,write}{b,w,l,d,q}" as they are unnecessary. - Adopt the Xen code style for this header, considering that significant changes are not anticipated in the future. In the event of any issues, adapting them to Xen style should be easily manageable. - drop unnecessary __r variables in macros read*_cpu() Additionally, definitions of ioremap_*() were added to the header. Signed-off-by: Oleksii Kurochko --- Changes in V5: - Xen code style related fixes - drop #define _raw_{read,write}{b,w,l,d,q} _raw_{read,write}{b,w,l,d,q} - drop cpu_to_le16() - remove unused argument in _io_ar() - update the commit message - drop unnecessary __r variables in macros read*_cpu() - update the comments at the top of the header. --- Changes in V4: - delete inner parentheses in macros. - s/u/uint. --- Changes in V3: - re-sync with linux kernel - update the commit message --- Changes in V2: - Nothing changed. 
Only rebase. --- xen/arch/riscv/include/asm/io.h | 157 1 file changed, 157 insertions(+) create mode 100644 xen/arch/riscv/include/asm/io.h diff --git a/xen/arch/riscv/include/asm/io.h b/xen/arch/riscv/include/asm/io.h new file mode 100644 index 00..95a459432c --- /dev/null +++ b/xen/arch/riscv/include/asm/io.h @@ -0,0 +1,157 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * The header is taken from Linux 6.4.0-rc1 and is based on + * arch/riscv/include/asm/mmio.h with the following changes: + * - drop forcing of endianness for read*(), write*() functions as + * no matter what CPU endianness, what endianness a particular device + * (and hence its MMIO region(s)) is using is entirely independent. + * Hence conversion, where necessary, needs to occur at a layer up. + * Another reason to drop endianness conversion is: + * https://patchwork.kernel.org/project/linux-riscv/patch/2019045623.5749-3-...@lst.de/ + * One of the answers of the author of the commit: + * And we don't know if Linux will be around if that ever changes. + * The point is: + *a) the current RISC-V spec is LE only + *b) the current linux port is LE only except for this little bit + * There is no point in leaving just this bitrotting code around. It + * just confuses developers, (very very slightly) slows down compiles + * and will bitrot. It also won't be any significant help to a future + * developer down the road doing a hypothetical BE RISC-V Linux port. + * - drop unused argument of __io_ar() macros. + * - drop "#define _raw_{read,write}{b,w,l,d,q} _raw_{read,write}{b,w,l,d,q}" + * as they are unnecessary. + * - Adopt the Xen code style for this header, considering that significant changes + * are not anticipated in the future. + * In the event of any issues, adapting them to Xen style should be easily + * manageable. + * - drop unnecessary __r variables in macros read*_cpu() + * + * Copyright (C) 1996-2000 Russell King + * Copyright (C) 2012 ARM Ltd. 
+ * Copyright (C) 2014 Regents of the University of California + * Copyright (C) 2024 Vates + */ + +#ifndef _ASM_RISCV_IO_H +#define _ASM_RISCV_IO_H + +#include + +/* + * The RISC-V ISA doesn't yet specify how to query or modify PMAs, so we can't + * change the properties of memory regions. This should be fixed by the + * upcoming platform spec. + */ +#define ioremap_nocache(addr, size) ioremap(addr, size) +#define ioremap_wc(addr, size) ioremap(addr, size) +#define ioremap_wt(addr, size) ioremap(addr, size) + +/* Generic IO read/write. These perform native-endian accesses. */ +static inline void __raw_writeb(uint8_t val, volatile void __iomem *addr) +{ +asm volatile ( "sb %0, 0(%1)" : : "r" (val), "r" (addr) ); +} + +static inline void __raw_writew(uint16_t val, volatile void __iomem *addr) +{ +as
[PATCH v5 05/23] xen/asm-generic: introduce generic find first set bit functions
These functions can be useful for architectures that don't have corresponding arch-specific instructions. Signed-off-by: Oleksii Kurochko --- Changes in V5: - new patch --- xen/include/asm-generic/bitops/__ffs.h| 47 +++ xen/include/asm-generic/bitops/ffs.h | 9 xen/include/asm-generic/bitops/ffsl.h | 16 +++ .../asm-generic/bitops/find-first-set-bit.h | 17 +++ 4 files changed, 89 insertions(+) create mode 100644 xen/include/asm-generic/bitops/__ffs.h create mode 100644 xen/include/asm-generic/bitops/ffs.h create mode 100644 xen/include/asm-generic/bitops/ffsl.h create mode 100644 xen/include/asm-generic/bitops/find-first-set-bit.h diff --git a/xen/include/asm-generic/bitops/__ffs.h b/xen/include/asm-generic/bitops/__ffs.h new file mode 100644 index 00..fecb4484d9 --- /dev/null +++ b/xen/include/asm-generic/bitops/__ffs.h @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_GENERIC_BITOPS___FFS_H_ +#define _ASM_GENERIC_BITOPS___FFS_H_ + +/** + * __ffs - find first set bit in word. + * @word: The word to search + * + * Returns the 0-indexed position of the first set bit; undefined if no bit + * is set, so callers should check for a non-zero input first. 
 */ +static inline unsigned int __ffs(unsigned long word) +{ +unsigned int num = 0; + +#if BITS_PER_LONG == 64 +if ( (word & 0xffffffff) == 0 ) +{ +num += 32; +word >>= 32; +} +#endif +if ( (word & 0xffff) == 0 ) +{ +num += 16; +word >>= 16; +} +if ( (word & 0xff) == 0 ) +{ +num += 8; +word >>= 8; +} +if ( (word & 0xf) == 0 ) +{ +num += 4; +word >>= 4; +} +if ( (word & 0x3) == 0 ) +{ +num += 2; +word >>= 2; +} +if ( (word & 0x1) == 0 ) +num += 1; +return num; +} + +#endif /* _ASM_GENERIC_BITOPS___FFS_H_ */ diff --git a/xen/include/asm-generic/bitops/ffs.h b/xen/include/asm-generic/bitops/ffs.h new file mode 100644 index 00..3f75fded14 --- /dev/null +++ b/xen/include/asm-generic/bitops/ffs.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_GENERIC_BITOPS_FFS_H_ +#define _ASM_GENERIC_BITOPS_FFS_H_ + +#include + +#define ffs(x) ({ unsigned int t_ = (x); fls(ISOLATE_LSB(t_)); }) + +#endif /* _ASM_GENERIC_BITOPS_FFS_H_ */ diff --git a/xen/include/asm-generic/bitops/ffsl.h b/xen/include/asm-generic/bitops/ffsl.h new file mode 100644 index 00..d0996808f5 --- /dev/null +++ b/xen/include/asm-generic/bitops/ffsl.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_GENERIC_BITOPS_FFSL_H_ +#define _ASM_GENERIC_BITOPS_FFSL_H_ + +/** + * ffsl - find first bit in long. + * @word: The word to search + * + * Returns 0 if no bit exists, otherwise returns 1-indexed bit location. 
+ */ +static inline unsigned int ffsl(unsigned long word) +{ +return generic_ffsl(word); +} + +#endif /* _ASM_GENERIC_BITOPS_FFSL_H_ */ diff --git a/xen/include/asm-generic/bitops/find-first-set-bit.h b/xen/include/asm-generic/bitops/find-first-set-bit.h new file mode 100644 index 00..7d28b8a89b --- /dev/null +++ b/xen/include/asm-generic/bitops/find-first-set-bit.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_GENERIC_BITOPS_FIND_FIRST_SET_BIT_H_ +#define _ASM_GENERIC_BITOPS_FIND_FIRST_SET_BIT_H_ + +/** + * find_first_set_bit - find the first set bit in @word + * @word: the word to search + * + * Returns the bit-number of the first set bit (first bit being 0). + * The input must *not* be zero. + */ +static inline unsigned int find_first_set_bit(unsigned long word) +{ +return ffsl(word) - 1; +} + +#endif /* _ASM_GENERIC_BITOPS_FIND_FIRST_SET_BIT_H_ */ -- 2.43.0
[PATCH v5 01/23] xen/riscv: disable unnecessary configs
This patch disables unnecessary configs for two cases: 1. By utilizing EXTRA_FIXED_RANDCONFIG for randconfig builds (GitLab CI jobs). 2. By using tiny64_defconfig for non-randconfig builds. Signed-off-by: Oleksii Kurochko --- Changes in V5: - Rebase and drop duplicated configs in EXTRA_FIXED_RANDCONFIG list - Update the commit message --- Changes in V4: - Nothing changed. Only rebase --- Changes in V3: - Remove EXTRA_FIXED_RANDCONFIG for non-randconfig jobs. For non-randconfig jobs, it is sufficient to disable configs by using the defconfig. - Remove double blank lines in build.yaml file before archlinux-current-gcc-riscv64-debug --- Changes in V2: - update the commit message. - remove xen/arch/riscv/Kconfig changes. --- automation/gitlab-ci/build.yaml | 24 xen/arch/riscv/configs/tiny64_defconfig | 17 + 2 files changed, 41 insertions(+) diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml index aac29ee13a..3b3d2c47dc 100644 --- a/automation/gitlab-ci/build.yaml +++ b/automation/gitlab-ci/build.yaml @@ -519,6 +519,30 @@ alpine-3.18-gcc-debug-arm64-boot-cpupools: CONFIG_EXPERT=y CONFIG_GRANT_TABLE=n CONFIG_MEM_ACCESS=n + CONFIG_SCHED_CREDIT=n + CONFIG_SCHED_CREDIT2=n + CONFIG_SCHED_RTDS=n + CONFIG_SCHED_NULL=n + CONFIG_SCHED_ARINC653=n + CONFIG_TRACEBUFFER=n + CONFIG_HYPFS=n + CONFIG_SPECULATIVE_HARDEN_ARRAY=n + CONFIG_ARGO=n + CONFIG_HYPFS_CONFIG=n + CONFIG_CORE_PARKING=n + CONFIG_DEBUG_TRACE=n + CONFIG_IOREQ_SERVER=n + CONFIG_CRASH_DEBUG=n + CONFIG_KEXEC=n + CONFIG_LIVEPATCH=n + CONFIG_NUMA=n + CONFIG_PERF_COUNTERS=n + CONFIG_HAS_PMAP=n + CONFIG_XENOPROF=n + CONFIG_COMPAT=n + CONFIG_UBSAN=n + CONFIG_NEEDS_LIBELF=n + CONFIG_XSM=n archlinux-current-gcc-riscv64: extends: .gcc-riscv64-cross-build diff --git a/xen/arch/riscv/configs/tiny64_defconfig b/xen/arch/riscv/configs/tiny64_defconfig index 09defe236b..35915255e6 100644 --- a/xen/arch/riscv/configs/tiny64_defconfig +++ b/xen/arch/riscv/configs/tiny64_defconfig @@ -7,6 +7,23 @@ # 
CONFIG_GRANT_TABLE is not set # CONFIG_SPECULATIVE_HARDEN_ARRAY is not set # CONFIG_MEM_ACCESS is not set +# CONFIG_ARGO is not set +# CONFIG_HYPFS_CONFIG is not set +# CONFIG_CORE_PARKING is not set +# CONFIG_DEBUG_TRACE is not set +# CONFIG_IOREQ_SERVER is not set +# CONFIG_CRASH_DEBUG is not set +# CONFIG_KEXEC is not set +# CONFIG_LIVEPATCH is not set +# CONFIG_NUMA is not set +# CONFIG_PERF_COUNTERS is not set +# CONFIG_HAS_PMAP is not set +# CONFIG_TRACEBUFFER is not set +# CONFIG_XENOPROF is not set +# CONFIG_COMPAT is not set +# CONFIG_COVERAGE is not set +# CONFIG_UBSAN is not set +# CONFIG_NEEDS_LIBELF is not set CONFIG_RISCV_64=y CONFIG_DEBUG=y -- 2.43.0
[PATCH v5 10/23] xen/riscv: introduce acquire, release and full barriers
Signed-off-by: Oleksii Kurochko --- Changes in V5: - new patch --- xen/arch/riscv/include/asm/fence.h | 9 + 1 file changed, 9 insertions(+) create mode 100644 xen/arch/riscv/include/asm/fence.h diff --git a/xen/arch/riscv/include/asm/fence.h b/xen/arch/riscv/include/asm/fence.h new file mode 100644 index 00..27f46fa897 --- /dev/null +++ b/xen/arch/riscv/include/asm/fence.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _ASM_RISCV_FENCE_H +#define _ASM_RISCV_FENCE_H + +#define RISCV_ACQUIRE_BARRIER "\tfence r , rw\n" +#define RISCV_RELEASE_BARRIER "\tfence rw, w\n" +#define RISCV_FULL_BARRIER "\tfence rw, rw\n" + +#endif /* _ASM_RISCV_FENCE_H */ -- 2.43.0
[PATCH v5 08/23] xen/asm-generic: introduce generic non-atomic test_*bit()
The patch introduces the following generic functions: * test_bit * generic___test_and_set_bit * generic___test_and_clear_bit * generic___test_and_change_bit Also, the patch introduces the following generics which are used by the functions mentioned above: * BITOP_BITS_PER_WORD * BITOP_MASK * BITOP_WORD * BITOP_TYPE These functions and macros can be useful for architectures that don't have corresponding arch-specific instructions. Signed-off-by: Oleksii Kurochko --- Changes in V5: - new patch --- xen/include/asm-generic/bitops/bitops-bits.h | 21 + .../asm-generic/bitops/generic-non-atomic.h | 89 +++ xen/include/asm-generic/bitops/test-bit.h | 18 3 files changed, 128 insertions(+) create mode 100644 xen/include/asm-generic/bitops/bitops-bits.h create mode 100644 xen/include/asm-generic/bitops/generic-non-atomic.h create mode 100644 xen/include/asm-generic/bitops/test-bit.h diff --git a/xen/include/asm-generic/bitops/bitops-bits.h b/xen/include/asm-generic/bitops/bitops-bits.h new file mode 100644 index 00..4ece2affd6 --- /dev/null +++ b/xen/include/asm-generic/bitops/bitops-bits.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_GENERIC_BITOPS_BITS_H_ +#define _ASM_GENERIC_BITOPS_BITS_H_ + +#ifndef BITOP_BITS_PER_WORD +#define BITOP_BITS_PER_WORD 32 +#endif + +#ifndef BITOP_MASK +#define BITOP_MASK(nr) (1U << ((nr) % BITOP_BITS_PER_WORD)) +#endif + +#ifndef BITOP_WORD +#define BITOP_WORD(nr) ((nr) / BITOP_BITS_PER_WORD) +#endif + +#ifndef BITOP_TYPE +typedef uint32_t bitops_uint_t; +#endif + +#endif /* _ASM_GENERIC_BITOPS_BITS_H_ */ diff --git a/xen/include/asm-generic/bitops/generic-non-atomic.h b/xen/include/asm-generic/bitops/generic-non-atomic.h new file mode 100644 index 00..42569d0d7c --- /dev/null +++ b/xen/include/asm-generic/bitops/generic-non-atomic.h @@ -0,0 +1,89 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * The file is based on Linux ( 6.4.0 ) header: + * include/asm-generic/bitops/generic-non-atomic.h + * + * Only 
functions that can be reused in Xen were left; others were removed. + * + * Also, the following changes were done: + * - the message inside #ifndef ... #endif was updated. + * - __always_inline -> always_inline to align with definition in + *xen/compiler.h. + * - update function prototypes from + *generic___test_and_*(unsigned long nr, volatile unsigned long *addr) to + *generic___test_and_*(unsigned long nr, volatile void *addr) to be + *consistent with other related macros/defines. + * - convert indentations from tabs to spaces. + * - inside generic__test_and_* use 'bitops_uint_t' instead of 'unsigned long' + *to be generic. + */ + +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H + +#include + +#include + +#ifndef _LINUX_BITOPS_H +#error only can be included directly +#endif + +/* + * Generic definitions for bit operations, should not be used in regular code + * directly. + */ + +/** + * generic___test_and_set_bit - Set a bit and return its old value + * @nr: Bit to set + * @addr: Address to count from + * + * This operation is non-atomic and can be reordered. + * If two examples of this operation race, one can appear to succeed + * but actually fail. You must protect multiple accesses with a lock. + */ +static always_inline bool +generic___test_and_set_bit(unsigned long nr, volatile void *addr) +{ +bitops_uint_t mask = BITOP_MASK(nr); +bitops_uint_t *p = ((bitops_uint_t *)addr) + BITOP_WORD(nr); +bitops_uint_t old = *p; + +*p = old | mask; +return (old & mask) != 0; +} + +/** + * generic___test_and_clear_bit - Clear a bit and return its old value + * @nr: Bit to clear + * @addr: Address to count from + * + * This operation is non-atomic and can be reordered. + * If two examples of this operation race, one can appear to succeed + * but actually fail. You must protect multiple accesses with a lock. 
+ */ +static always_inline bool +generic___test_and_clear_bit(unsigned long nr, volatile void *addr) +{ +bitops_uint_t mask = BITOP_MASK(nr); +bitops_uint_t *p = ((bitops_uint_t *)addr) + BITOP_WORD(nr); +bitops_uint_t old = *p; + +*p = old & ~mask; +return (old & mask) != 0; +} + +/* WARNING: non atomic and it can be reordered! */ +static always_inline bool +generic___test_and_change_bit(unsigned long nr, volatile void *addr) +{ +bitops_uint_t mask = BITOP_MASK(nr); +bitops_uint_t *p = ((bitops_uint_t *)addr) + BITOP_WORD(nr); +bitops_uint_t old = *p; + +*p = old ^ mask; +return (old & mask) != 0; +} + +#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */ diff --git a/xen/include/asm-generic/bitops/test-bit.h b/xen/include/asm-generic/bitops/test-bit.h new file mode 100644 index 00..6fb414d808 --- /dev/null +++ b/xen/include/asm-generic/bitops/test-bit.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0
[PATCH v5 06/23] xen/asm-generic: introduce generic ffz()
The generic ffz() can be useful for architectures that don't have corresponding arch-specific instruction. Signed-off-by: Oleksii Kurochko --- Changes in V5: - new patch --- xen/include/asm-generic/bitops/ffz.h | 18 ++ 1 file changed, 18 insertions(+) create mode 100644 xen/include/asm-generic/bitops/ffz.h diff --git a/xen/include/asm-generic/bitops/ffz.h b/xen/include/asm-generic/bitops/ffz.h new file mode 100644 index 00..5932fe6695 --- /dev/null +++ b/xen/include/asm-generic/bitops/ffz.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_GENERIC_BITOPS_FFZ_H_ +#define _ASM_GENERIC_BITOPS_FFZ_H_ + +/* + * ffz - find first zero in word. + * @word: The word to search + * + * Undefined if no zero exists, so code should check against ~0UL first. + * + * ffz() is defined as __ffs() and not as ffs() as it is defined in such + * a way in Linux kernel (6.4.0 ) from where this header was taken, so this + * header is supposed to be aligned with Linux kernel version. + * Also, most architectures are defined in the same way in Xen. + */ +#define ffz(x) __ffs(~(x)) + +#endif /* _ASM_GENERIC_BITOPS_FFZ_H_ */ -- 2.43.0
[PATCH v5 04/23] xen/asm-generic: introduce generic fls() and flsl() functions
These functions can be useful for architectures that don't have corresponding arch-specific instructions. Signed-off-by: Oleksii Kurochko --- Changes in V5: - new patch --- xen/include/asm-generic/bitops/fls.h | 18 ++ xen/include/asm-generic/bitops/flsl.h | 10 ++ 2 files changed, 28 insertions(+) create mode 100644 xen/include/asm-generic/bitops/fls.h create mode 100644 xen/include/asm-generic/bitops/flsl.h diff --git a/xen/include/asm-generic/bitops/fls.h b/xen/include/asm-generic/bitops/fls.h new file mode 100644 index 00..369a4c790c --- /dev/null +++ b/xen/include/asm-generic/bitops/fls.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_GENERIC_BITOPS_FLS_H_ +#define _ASM_GENERIC_BITOPS_FLS_H_ + +/** + * fls - find last (most-significant) bit set + * @x: the word to search + * + * This is defined the same way as ffs. + * Note fls(0) = 0, fls(1) = 1, fls(0x8000) = 32. + */ + +static inline int fls(unsigned int x) +{ +return generic_fls(x); +} + +#endif /* _ASM_GENERIC_BITOPS_FLS_H_ */ diff --git a/xen/include/asm-generic/bitops/flsl.h b/xen/include/asm-generic/bitops/flsl.h new file mode 100644 index 00..d0a2e9c729 --- /dev/null +++ b/xen/include/asm-generic/bitops/flsl.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_GENERIC_BITOPS_FLSL_H_ +#define _ASM_GENERIC_BITOPS_FLSL_H_ + +static inline int flsl(unsigned long x) +{ +return generic_flsl(x); +} + +#endif /* _ASM_GENERIC_BITOPS_FLSL_H_ */ -- 2.43.0
[PATCH v5 02/23] xen/riscv: use some asm-generic headers
The following headers end up the same as asm-generic's version: * altp2m.h * device.h * div64.h * hardirq.h * hypercall.h * iocap.h * paging.h * percpu.h * random.h * softirq.h * vm_event.h RISC-V should utilize the asm-generic's version of the mentioned headers instead of introducing them in the arch-specific folder. Signed-off-by: Oleksii Kurochko Acked-by: Jan Beulich --- Changes in V5: - Nothing changed. Only rebase. - update the commit message. - drop the message above revision log as there is no depenency for this patch from other patch series. --- Changes in V4: - removed numa.h from asm/include/Makefile because of the patch: [PATCH v2] NUMA: no need for asm/numa.h when !NUMA - updated the commit message --- Changes in V3: - remove monitor.h from the RISC-V asm/Makefile list. - add Acked-by: Jan Beulich --- Changes in V2: - New commit introduced in V2. --- xen/arch/riscv/include/asm/Makefile | 12 1 file changed, 12 insertions(+) create mode 100644 xen/arch/riscv/include/asm/Makefile diff --git a/xen/arch/riscv/include/asm/Makefile b/xen/arch/riscv/include/asm/Makefile new file mode 100644 index 00..ced02e26ed --- /dev/null +++ b/xen/arch/riscv/include/asm/Makefile @@ -0,0 +1,12 @@ +# SPDX-License-Identifier: GPL-2.0-only +generic-y += altp2m.h +generic-y += device.h +generic-y += div64.h +generic-y += hardirq.h +generic-y += hypercall.h +generic-y += iocap.h +generic-y += paging.h +generic-y += percpu.h +generic-y += random.h +generic-y += softirq.h +generic-y += vm_event.h -- 2.43.0
[PATCH v5 00/23] [PATCH v4 00/30] Enable build of full Xen for RISC-V
This patch series performs all of the additions necessary to drop the build overrides for RISCV and enable the full Xen build. Except in cases where compatible implementations already exist (e.g. atomic.h and bitops.h), the newly added definitions are simple. The patch series is based on the following patch series: - [PATCH v6 0/9] Introduce generic headers [1] - [PATCH] move __read_mostly to xen/cache.h [2] - [XEN PATCH v2 1/3] xen: introduce STATIC_ASSERT_UNREACHABLE() [3] - [PATCH] xen/lib: introduce generic find next bit operations [4] Right now, the patch series doesn't have a direct dependency on [2] and it provides __read_mostly in the patch: [PATCH v3 26/34] xen/riscv: add definition of __read_mostly However, it will be dropped as soon as [2] is merged or at least when the final version of the patch [2] is provided. [1] https://lore.kernel.org/xen-devel/cover.1703072575.git.oleksii.kuroc...@gmail.com/ [2] https://lore.kernel.org/xen-devel/f25eb5c9-7c14-6e23-8535-2c66772b3...@suse.com/ [3] https://lore.kernel.org/xen-devel/42fc6ae8d3eb802429d29c774502ff232340dc84.1706259490.git.federico.seraf...@bugseng.com/ [4] https://lore.kernel.org/xen-devel/52730e6314210ba4164a9934a720c4fda201447b.1706266854.git.oleksii.kuroc...@gmail.com/ --- Changes in V5: - Update the cover letter as one of the dependencies was merged to staging. 
- Introduced asm-generic for atomic ops and separate patches for asm-generic bit ops - Moved fence.h to a separate patch to deal with some patches' dependencies on fence.h - Patches were dropped as they were merged to staging: * [PATCH v4 03/30] xen: add support in public/hvm/save.h for PPC and RISC-V * [PATCH v4 04/30] xen/riscv: introduce cpufeature.h * [PATCH v4 05/30] xen/riscv: introduce guest_atomics.h * [PATCH v4 06/30] xen: avoid generation of empty asm/iommu.h * [PATCH v4 08/30] xen/riscv: introduce setup.h * [PATCH v4 10/30] xen/riscv: introduce flushtlb.h * [PATCH v4 11/30] xen/riscv: introduce smp.h * [PATCH v4 15/30] xen/riscv: introduce irq.h * [PATCH v4 16/30] xen/riscv: introduce p2m.h * [PATCH v4 17/30] xen/riscv: introduce regs.h * [PATCH v4 18/30] xen/riscv: introduce time.h * [PATCH v4 19/30] xen/riscv: introduce event.h * [PATCH v4 22/30] xen/riscv: define an address of frame table - Other changes are specific to individual patches; please look at each patch --- Changes in V4: - Update the cover letter message: new patch series dependencies. - Some patches were merged to staging, so they were dropped in this patch series: [PATCH v3 09/34] xen/riscv: introduce system.h [PATCH v3 18/34] xen/riscv: introduce domain.h [PATCH v3 19/34] xen/riscv: introduce guest_access.h - Was sent out of this patch series: [PATCH v3 16/34] xen/lib: introduce generic find next bit operations - [PATCH v3 17/34] xen/riscv: add compilation of generic find-next-bit.c was dropped as CONFIG_GENERIC_FIND_NEXT_BIT was dropped. - All other changes are specific to a specific patch. 
--- Changes in V3: - Update the cover letter message - The following patches were dropped as they were merged to staging: [PATCH v2 03/39] xen/riscv: introduce asm/byteorder.h [PATCH v2 04/39] xen/riscv: add public arch-riscv.h [PATCH v2 05/39] xen/riscv: introduce spinlock.h [PATCH v2 20/39] xen/riscv: define bug frame tables in xen.lds.S [PATCH v2 34/39] xen: add RISCV support for pmu.h [PATCH v2 35/39] xen: add necessary headers to common to build full Xen for RISC-V - Instead of the following patches, new ones were introduced: [PATCH v2 10/39] xen/riscv: introduce asm/iommu.h [PATCH v2 11/39] xen/riscv: introduce asm/nospec.h - remove "asm/" for commit messages which start with "xen/riscv:" - code style updates. - add emulation of {cmp}xchg_* for 1- and 2-byte types. - code style fixes. - add SPDX and footer for the newly added headers. - introduce generic find-next-bit.c. - some other minor changes (details can be found in each patch) --- Changes in V2: - Drop the following patches as they are the part of [2]: [PATCH v1 06/57] xen/riscv: introduce paging.h [PATCH v1 08/57] xen/riscv: introduce asm/device.h [PATCH v1 10/57] xen/riscv: introduce asm/grant_table.h [PATCH v1 12/57] xen/riscv: introduce asm/hypercall.h [PATCH v1 13/57] xen/riscv: introduce asm/iocap.h [PATCH v1 15/57] xen/riscv: introduce asm/mem_access.h [PATCH v1 18/57] xen/riscv: introduce asm/random.h [PATCH v1 21/57] xen/riscv: introduce asm/xenoprof.h [PATCH v1 24/57] xen/riscv: introduce asm/percpu.h [PATCH v1 29/57] xen/riscv: introduce asm/hardirq.h [PATCH v1 33/57] xen/riscv: introduce asm/altp2m.h [PATCH v1 38/57] xen/riscv: introduce asm/monitor.h [PATCH v1 39/57] xen/riscv: introduce asm/numa.h [PATCH v1 42/57] xen/riscv: introduce asm/softirq.h - xen/lib.h in most of the cases were changed to xen/bug.h as mostly
Re: [PATCH] docs/sphinx: Start an FAQ, and add Kconfig/CET details
On 26/02/2024 4:52 pm, Jan Beulich wrote: > On 26.02.2024 17:25, Andrew Cooper wrote: >> This is long overdue, and we need to start somewhere. >> >> Signed-off-by: Andrew Cooper > Acked-by: Jan Beulich Thanks. > perhaps (nit) with ... > >> --- /dev/null >> +++ b/docs/faq.rst >> @@ -0,0 +1,71 @@ >> +.. SPDX-License-Identifier: CC-BY-4.0 >> + >> +Frequently Asked Questions >> +== >> + >> +How do I... >> +--- >> + >> +... check whether a Kconfig option is active? >> +^ >> + >> + Kconfig is a build time configuration system, combining inherent >> knowledge, >> + the capabilities of the toolchain, and explicit user choice to form a >> + configuration of a build of Xen. >> + >> + A file, by default ``.config``, is produced by the build identifying the >> + configuration used. Kconfig symbols all start with ``CONFIG_``, and come >> in >> + a variety of types including strings, integers and booleans. Booleans are >> + the most common, and when active are expressed with ``...=y``. e.g.:: >> + >> +xen.git/xen$ grep CONFIG_FOO .config >> +CONFIG_FOO_BOOLEAN=y >> +CONFIG_FOO_STRING="lorem ipsum" >> +CONFIG_FOO_INTEGER=42 >> + >> + Symbols which are either absent, or expressed as ``... is not set`` are >> + disabled. e.g.:: >> + >> +xen.git/xen$ grep CONFIG_BAR .config >> +# CONFIG_BAR is not set >> + >> + Builds of Xen configured with ``CONFIG_HYPFS_CONFIG=y`` embed their own >> + ``.config`` at build time, and can provide it to the :term:`control >> domain` >> + upon requested. e.g.:: >> + >> +[root@host ~]# xenhypfs cat /buildinfo/config | grep -e FOO -e BAR >> +CONFIG_FOO=y >> +# CONFIG_BAR is not set >> + >> + >> +... tell if CET is active? >> +^^ >> + >> + Control-flow Enforcement Technology support was added to Xen 4.14. It is >> + build time conditional, dependent on both having a new-enough toolchain >> and >> + an explicit Kconfig option, and also requires capable hardware. See >> + :term:`CET`. 
>> + >> + For CET-SS, Shadow Stacks, the minimum toolchain requirements are >> ``binutils >> + >= 2.29`` or ``LLVM >= 6``. No specific compiler support is required. >> + Check for ``CONFIG_XEN_SHSTK`` being active. >> + >> + For CET-IBT, Indirect Branch Tracking, the minimum toolchain requirements >> + are ``GCC >= 9`` and ``binutils >= 2.29``. Xen relies on a compiler >> feature >> + which is specific to GCC at the time of writing. Check for >> + ``CONFIG_XEN_IBT`` being active. >> + >> + If a capable Xen with booted on capable hardware, and CET is not disabled >> by > ... s/with/is/ (or "was"). Oops yes. Will fix. ~Andrew
Re: [PATCH] docs/sphinx: Start an FAQ, and add Kconfig/CET details
On 26.02.2024 17:25, Andrew Cooper wrote: > This is long overdue, and we need to start somewhere. > > Signed-off-by: Andrew Cooper Acked-by: Jan Beulich perhaps (nit) with ... > --- /dev/null > +++ b/docs/faq.rst > @@ -0,0 +1,71 @@ > +.. SPDX-License-Identifier: CC-BY-4.0 > + > +Frequently Asked Questions > +== > + > +How do I... > +--- > + > +... check whether a Kconfig option is active? > +^ > + > + Kconfig is a build time configuration system, combining inherent knowledge, > + the capabilities of the toolchain, and explicit user choice to form a > + configuration of a build of Xen. > + > + A file, by default ``.config``, is produced by the build identifying the > + configuration used. Kconfig symbols all start with ``CONFIG_``, and come > in > + a variety of types including strings, integers and booleans. Booleans are > + the most common, and when active are expressed with ``...=y``. e.g.:: > + > +xen.git/xen$ grep CONFIG_FOO .config > +CONFIG_FOO_BOOLEAN=y > +CONFIG_FOO_STRING="lorem ipsum" > +CONFIG_FOO_INTEGER=42 > + > + Symbols which are either absent, or expressed as ``... is not set`` are > + disabled. e.g.:: > + > +xen.git/xen$ grep CONFIG_BAR .config > +# CONFIG_BAR is not set > + > + Builds of Xen configured with ``CONFIG_HYPFS_CONFIG=y`` embed their own > + ``.config`` at build time, and can provide it to the :term:`control domain` > + upon requested. e.g.:: > + > +[root@host ~]# xenhypfs cat /buildinfo/config | grep -e FOO -e BAR > +CONFIG_FOO=y > +# CONFIG_BAR is not set > + > + > +... tell if CET is active? > +^^ > + > + Control-flow Enforcement Technology support was added to Xen 4.14. It is > + build time conditional, dependent on both having a new-enough toolchain and > + an explicit Kconfig option, and also requires capable hardware. See > + :term:`CET`. > + > + For CET-SS, Shadow Stacks, the minimum toolchain requirements are > ``binutils > + >= 2.29`` or ``LLVM >= 6``. No specific compiler support is required. 
> + Check for ``CONFIG_XEN_SHSTK`` being active. > + > + For CET-IBT, Indirect Branch Tracking, the minimum toolchain requirements > + are ``GCC >= 9`` and ``binutils >= 2.29``. Xen relies on a compiler > feature > + which is specific to GCC at the time of writing. Check for > + ``CONFIG_XEN_IBT`` being active. > + > + If a capable Xen with booted on capable hardware, and CET is not disabled > by ... s/with/is/ (or "was"). Jan
[PATCH v3 10/12] VMX: convert vmx_vmentry_control
... to a field in the capability/controls struct. Signed-off-by: Jan Beulich --- v2: New. --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -162,7 +162,6 @@ static int cf_check parse_ept_param_runt /* Dynamic (run-time adjusted) execution control flags. */ struct vmx_caps __ro_after_init vmx_caps; -u32 vmx_vmentry_control __read_mostly; u64 vmx_ept_vpid_cap __read_mostly; static uint64_t __read_mostly vmx_vmfunc; @@ -259,7 +258,6 @@ static int vmx_init_vmcs_config(bool bsp struct vmx_caps caps = {}; u64 _vmx_ept_vpid_cap = 0; u64 _vmx_misc_cap = 0; -u32 _vmx_vmentry_control; u64 _vmx_vmfunc = 0; bool mismatch = false; @@ -480,7 +478,7 @@ static int vmx_init_vmcs_config(bool bsp min = 0; opt = (VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_GUEST_EFER | VM_ENTRY_LOAD_BNDCFGS); -_vmx_vmentry_control = adjust_vmx_controls( +caps.vmentry_control = adjust_vmx_controls( "VMEntry Control", min, opt, MSR_IA32_VMX_ENTRY_CTLS, &mismatch); if ( mismatch ) @@ -491,7 +489,6 @@ static int vmx_init_vmcs_config(bool bsp /* First time through. */ vmx_caps = caps; vmx_ept_vpid_cap = _vmx_ept_vpid_cap; -vmx_vmentry_control= _vmx_vmentry_control; vmx_caps.basic_msr = ((uint64_t)vmx_basic_msr_high << 32) | vmx_basic_msr_low; vmx_vmfunc = _vmx_vmfunc; @@ -531,7 +528,7 @@ static int vmx_init_vmcs_config(bool bsp vmx_caps.vmexit_control, caps.vmexit_control); mismatch |= cap_check( "VMEntry Control", -vmx_vmentry_control, _vmx_vmentry_control); +vmx_caps.vmentry_control, caps.vmentry_control); mismatch |= cap_check( "EPT and VPID Capability", vmx_ept_vpid_cap, _vmx_ept_vpid_cap); @@ -1091,7 +1088,7 @@ static int construct_vmcs(struct vcpu *v { struct domain *d = v->domain; uint32_t vmexit_ctl = vmx_caps.vmexit_control; -u32 vmentry_ctl = vmx_vmentry_control; +u32 vmentry_ctl = vmx_caps.vmentry_control; int rc = 0; vmx_vmcs_enter(v); @@ -2202,7 +2199,6 @@ int __init vmx_vmcs_init(void) * Make sure all dependent features are off as well. 
*/ memset(&vmx_caps, 0, sizeof(vmx_caps)); -vmx_vmentry_control= 0; vmx_ept_vpid_cap = 0; vmx_vmfunc = 0; } --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -235,7 +235,6 @@ void vmx_vmcs_reload(struct vcpu *v); #define VM_ENTRY_LOAD_GUEST_PAT 0x4000 #define VM_ENTRY_LOAD_GUEST_EFER0x8000 #define VM_ENTRY_LOAD_BNDCFGS 0x0001 -extern u32 vmx_vmentry_control; #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x0001U #define SECONDARY_EXEC_ENABLE_EPT 0x0002U @@ -297,6 +296,7 @@ struct vmx_caps { uint32_t secondary_exec_control; uint64_t tertiary_exec_control; uint32_t vmexit_control; +uint32_t vmentry_control; }; extern struct vmx_caps vmx_caps; @@ -325,9 +325,9 @@ extern struct vmx_caps vmx_caps; #define cpu_has_monitor_trap_flag \ (vmx_caps.cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG) #define cpu_has_vmx_pat \ -(vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT) +(vmx_caps.vmentry_control & VM_ENTRY_LOAD_GUEST_PAT) #define cpu_has_vmx_efer \ -(vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER) +(vmx_caps.vmentry_control & VM_ENTRY_LOAD_GUEST_EFER) #define cpu_has_vmx_unrestricted_guest \ (vmx_caps.secondary_exec_control & SECONDARY_EXEC_UNRESTRICTED_GUEST) #define vmx_unrestricted_guest(v) \ @@ -355,7 +355,7 @@ extern struct vmx_caps vmx_caps; (vmx_caps.secondary_exec_control & SECONDARY_EXEC_ENABLE_PML) #define cpu_has_vmx_mpx \ ((vmx_caps.vmexit_control & VM_EXIT_CLEAR_BNDCFGS) && \ - (vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS)) + (vmx_caps.vmentry_control & VM_ENTRY_LOAD_BNDCFGS)) #define cpu_has_vmx_xsaves \ (vmx_caps.secondary_exec_control & SECONDARY_EXEC_XSAVES) #define cpu_has_vmx_tsc_scaling \
Re: [PATCH 5/6] nestedsvm: Remove bogus debug message from nestedsvm_check_intercepts
On 06/02/2024 1:20 am, George Dunlap wrote: > Changeset ef3e8db8068 ("x86/hvm: Corrections and improvements to > unhandled vmexit logging") introduced a printk to the default path of > the switch statement in nestedsvm_check_intercepts(), complaining of > an unknown exit reason. > > Unfortunately, the "core" switch statement which is meant to handle > all vmexit reasons is in nsvm_vmcb_guest_intercepts_exitcode(); the > switch statement in nestedsvm_check_intercepts() is only meant to > superimpose on top of that some special-casing for how to handle the interaction > between L1 and L0 vmexits. > > Remove the printk, and add a comment to prevent future confusion. > > Signed-off-by: George Dunlap Erm... The addition of this printk was very deliberate, to point out where security fixes are needed. It's not bogus in the slightest. It is an error for exit reasons to not be inspected for safety in this path. Please revert this patch. ~Andrew
[PATCH v3 12/12] VMX: convert vmx_vmfunc
... to a field in the capability/controls struct. Signed-off-by: Jan Beulich --- v2: New. --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -162,7 +162,6 @@ static int cf_check parse_ept_param_runt /* Dynamic (run-time adjusted) execution control flags. */ struct vmx_caps __ro_after_init vmx_caps; -static uint64_t __read_mostly vmx_vmfunc; static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, vmxon_region); static DEFINE_PER_CPU(paddr_t, current_vmcs); @@ -256,7 +255,6 @@ static int vmx_init_vmcs_config(bool bsp u32 vmx_basic_msr_low, vmx_basic_msr_high, min, opt; struct vmx_caps caps = {}; u64 _vmx_misc_cap = 0; -u64 _vmx_vmfunc = 0; bool mismatch = false; rdmsr(MSR_IA32_VMX_BASIC, vmx_basic_msr_low, vmx_basic_msr_high); @@ -458,14 +456,14 @@ static int vmx_init_vmcs_config(bool bsp /* The IA32_VMX_VMFUNC MSR exists only when VMFUNC is available */ if ( caps.secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS ) { -rdmsrl(MSR_IA32_VMX_VMFUNC, _vmx_vmfunc); +rdmsrl(MSR_IA32_VMX_VMFUNC, caps.vmfunc); /* * VMFUNC leaf 0 (EPTP switching) must be supported. * * Or we just don't use VMFUNC. */ -if ( !(_vmx_vmfunc & VMX_VMFUNC_EPTP_SWITCHING) ) +if ( !(caps.vmfunc & VMX_VMFUNC_EPTP_SWITCHING) ) caps.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VM_FUNCTIONS; } @@ -488,7 +486,6 @@ static int vmx_init_vmcs_config(bool bsp vmx_caps = caps; vmx_caps.basic_msr = ((uint64_t)vmx_basic_msr_high << 32) | vmx_basic_msr_low; -vmx_vmfunc = _vmx_vmfunc; vmx_display_features(); @@ -530,7 +527,7 @@ static int vmx_init_vmcs_config(bool bsp mismatch |= cap_check("VPID Capability", vmx_caps.vpid, caps.vpid); mismatch |= cap_check( "VMFUNC Capability", -vmx_vmfunc, _vmx_vmfunc); +vmx_caps.vmfunc, caps.vmfunc); if ( cpu_has_vmx_ins_outs_instr_info != !!(vmx_basic_msr_high & (VMX_BASIC_INS_OUT_INFO >> 32)) ) { @@ -2195,7 +2192,6 @@ int __init vmx_vmcs_init(void) * Make sure all dependent features are off as well. 
*/ memset(&vmx_caps, 0, sizeof(vmx_caps)); -vmx_vmfunc = 0; } return ret; --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -298,6 +298,7 @@ struct vmx_caps { uint32_t vmentry_control; uint32_t ept; uint32_t vpid; +uint64_t vmfunc; }; extern struct vmx_caps vmx_caps;
[PATCH v3 11/12] VMX: convert vmx_ept_vpid_cap
... to fields in the capability/controls struct: Take the opportunity and split the two halves into separate EPT and VPID fields. Signed-off-by: Jan Beulich --- v3: Re-base. v2: New. --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -162,7 +162,6 @@ static int cf_check parse_ept_param_runt /* Dynamic (run-time adjusted) execution control flags. */ struct vmx_caps __ro_after_init vmx_caps; -u64 vmx_ept_vpid_cap __read_mostly; static uint64_t __read_mostly vmx_vmfunc; static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, vmxon_region); @@ -256,7 +255,6 @@ static int vmx_init_vmcs_config(bool bsp { u32 vmx_basic_msr_low, vmx_basic_msr_high, min, opt; struct vmx_caps caps = {}; -u64 _vmx_ept_vpid_cap = 0; u64 _vmx_misc_cap = 0; u64 _vmx_vmfunc = 0; bool mismatch = false; @@ -365,10 +363,10 @@ static int vmx_init_vmcs_config(bool bsp if ( caps.secondary_exec_control & (SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_VPID) ) { -rdmsrl(MSR_IA32_VMX_EPT_VPID_CAP, _vmx_ept_vpid_cap); +rdmsr(MSR_IA32_VMX_EPT_VPID_CAP, caps.ept, caps.vpid); if ( !opt_ept_ad ) -_vmx_ept_vpid_cap &= ~VMX_EPT_AD_BIT; +caps.ept &= ~VMX_EPT_AD_BIT; /* * Additional sanity checking before using EPT: @@ -381,9 +379,9 @@ static int vmx_init_vmcs_config(bool bsp * * Or we just don't use EPT. */ -if ( !(_vmx_ept_vpid_cap & VMX_EPT_MEMORY_TYPE_WB) || - !(_vmx_ept_vpid_cap & VMX_EPT_WALK_LENGTH_4_SUPPORTED) || - !(_vmx_ept_vpid_cap & VMX_EPT_INVEPT_ALL_CONTEXT) ) +if ( !(caps.ept & VMX_EPT_MEMORY_TYPE_WB) || + !(caps.ept & VMX_EPT_WALK_LENGTH_4_SUPPORTED) || + !(caps.ept & VMX_EPT_INVEPT_ALL_CONTEXT) ) caps.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT; /* @@ -392,11 +390,11 @@ static int vmx_init_vmcs_config(bool bsp * * Or we just don't use VPID. 
*/ -if ( !(_vmx_ept_vpid_cap & VMX_VPID_INVVPID_ALL_CONTEXT) ) +if ( !(caps.vpid & VMX_VPID_INVVPID_ALL_CONTEXT) ) caps.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID; /* EPT A/D bits is required for PML */ -if ( !(_vmx_ept_vpid_cap & VMX_EPT_AD_BIT) ) +if ( !(caps.ept & VMX_EPT_AD_BIT) ) caps.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML; } @@ -488,7 +486,6 @@ static int vmx_init_vmcs_config(bool bsp { /* First time through. */ vmx_caps = caps; -vmx_ept_vpid_cap = _vmx_ept_vpid_cap; vmx_caps.basic_msr = ((uint64_t)vmx_basic_msr_high << 32) | vmx_basic_msr_low; vmx_vmfunc = _vmx_vmfunc; @@ -529,9 +526,8 @@ static int vmx_init_vmcs_config(bool bsp mismatch |= cap_check( "VMEntry Control", vmx_caps.vmentry_control, caps.vmentry_control); -mismatch |= cap_check( -"EPT and VPID Capability", -vmx_ept_vpid_cap, _vmx_ept_vpid_cap); +mismatch |= cap_check("EPT Capability", vmx_caps.ept, caps.ept); +mismatch |= cap_check("VPID Capability", vmx_caps.vpid, caps.vpid); mismatch |= cap_check( "VMFUNC Capability", vmx_vmfunc, _vmx_vmfunc); @@ -2199,7 +2195,6 @@ int __init vmx_vmcs_init(void) * Make sure all dependent features are off as well. 
*/ memset(&vmx_caps, 0, sizeof(vmx_caps)); -vmx_ept_vpid_cap = 0; vmx_vmfunc = 0; } --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -274,12 +274,11 @@ void vmx_vmcs_reload(struct vcpu *v); #define VMX_EPT_AD_BIT 0x0020 #define VMX_EPT_INVEPT_SINGLE_CONTEXT 0x0200 #define VMX_EPT_INVEPT_ALL_CONTEXT 0x0400 -#define VMX_VPID_INVVPID_INSTRUCTION 0x001ULL -#define VMX_VPID_INVVPID_INDIVIDUAL_ADDR 0x100ULL -#define VMX_VPID_INVVPID_SINGLE_CONTEXT 0x200ULL -#define VMX_VPID_INVVPID_ALL_CONTEXT 0x400ULL -#define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x800ULL -extern u64 vmx_ept_vpid_cap; +#define VMX_VPID_INVVPID_INSTRUCTION0x0001 +#define VMX_VPID_INVVPID_INDIVIDUAL_ADDR0x0100 +#define VMX_VPID_INVVPID_SINGLE_CONTEXT 0x0200 +#define VMX_VPID_INVVPID_ALL_CONTEXT0x0400 +#define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL0x0800 #define VMX_MISC_ACTIVITY_MASK 0x01c0 #define VMX_MISC_PROC_TRACE 0x4000 @@ -297,6 +296,8 @@ struct vmx_caps { uint64_t
[PATCH v3 09/12] VMX: convert vmx_vmexit_control
... to a field in the capability/controls struct. Signed-off-by: Jan Beulich --- v2: New. --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -162,7 +162,6 @@ static int cf_check parse_ept_param_runt /* Dynamic (run-time adjusted) execution control flags. */ struct vmx_caps __ro_after_init vmx_caps; -u32 vmx_vmexit_control __read_mostly; u32 vmx_vmentry_control __read_mostly; u64 vmx_ept_vpid_cap __read_mostly; static uint64_t __read_mostly vmx_vmfunc; @@ -260,7 +259,6 @@ static int vmx_init_vmcs_config(bool bsp struct vmx_caps caps = {}; u64 _vmx_ept_vpid_cap = 0; u64 _vmx_misc_cap = 0; -u32 _vmx_vmexit_control; u32 _vmx_vmentry_control; u64 _vmx_vmfunc = 0; bool mismatch = false; @@ -442,7 +440,7 @@ static int vmx_init_vmcs_config(bool bsp opt = (VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT | VM_EXIT_LOAD_HOST_EFER | VM_EXIT_CLEAR_BNDCFGS); min |= VM_EXIT_IA32E_MODE; -_vmx_vmexit_control = adjust_vmx_controls( +caps.vmexit_control = adjust_vmx_controls( "VMExit Control", min, opt, MSR_IA32_VMX_EXIT_CTLS, &mismatch); /* @@ -493,7 +491,6 @@ static int vmx_init_vmcs_config(bool bsp /* First time through. 
*/ vmx_caps = caps; vmx_ept_vpid_cap = _vmx_ept_vpid_cap; -vmx_vmexit_control = _vmx_vmexit_control; vmx_vmentry_control= _vmx_vmentry_control; vmx_caps.basic_msr = ((uint64_t)vmx_basic_msr_high << 32) | vmx_basic_msr_low; @@ -531,7 +528,7 @@ static int vmx_init_vmcs_config(bool bsp vmx_caps.tertiary_exec_control, caps.tertiary_exec_control); mismatch |= cap_check( "VMExit Control", -vmx_vmexit_control, _vmx_vmexit_control); +vmx_caps.vmexit_control, caps.vmexit_control); mismatch |= cap_check( "VMEntry Control", vmx_vmentry_control, _vmx_vmentry_control); @@ -1093,7 +1090,7 @@ void nocall vmx_asm_vmexit_handler(void) static int construct_vmcs(struct vcpu *v) { struct domain *d = v->domain; -u32 vmexit_ctl = vmx_vmexit_control; +uint32_t vmexit_ctl = vmx_caps.vmexit_control; u32 vmentry_ctl = vmx_vmentry_control; int rc = 0; @@ -2205,7 +2202,6 @@ int __init vmx_vmcs_init(void) * Make sure all dependent features are off as well. */ memset(&vmx_caps, 0, sizeof(vmx_caps)); -vmx_vmexit_control = 0; vmx_vmentry_control= 0; vmx_ept_vpid_cap = 0; vmx_vmfunc = 0; --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -1060,7 +1060,7 @@ static void load_shadow_control(struct v nvmx_update_pin_control(v, vmx_caps.pin_based_exec_control); vmx_update_cpu_exec_control(v); vmx_update_secondary_exec_control(v); -nvmx_update_exit_control(v, vmx_vmexit_control); +nvmx_update_exit_control(v, vmx_caps.vmexit_control); nvmx_update_entry_control(v); vmx_update_exception_bitmap(v); nvmx_update_apic_access_address(v); --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -227,7 +227,6 @@ void vmx_vmcs_reload(struct vcpu *v); #define VM_EXIT_LOAD_HOST_EFER 0x0020 #define VM_EXIT_SAVE_PREEMPT_TIMER 0x0040 #define VM_EXIT_CLEAR_BNDCFGS 0x0080 -extern u32 vmx_vmexit_control; #define VM_ENTRY_IA32E_MODE 0x0200 #define VM_ENTRY_SMM0x0400 @@ -297,6 +296,7 @@ struct vmx_caps { uint32_t cpu_based_exec_control; uint32_t 
secondary_exec_control; uint64_t tertiary_exec_control; +uint32_t vmexit_control; }; extern struct vmx_caps vmx_caps; @@ -354,7 +354,7 @@ extern struct vmx_caps vmx_caps; #define cpu_has_vmx_pml \ (vmx_caps.secondary_exec_control & SECONDARY_EXEC_ENABLE_PML) #define cpu_has_vmx_mpx \ -((vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) && \ +((vmx_caps.vmexit_control & VM_EXIT_CLEAR_BNDCFGS) && \ (vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS)) #define cpu_has_vmx_xsaves \ (vmx_caps.secondary_exec_control & SECONDARY_EXEC_XSAVES)
[PATCH v3 08/12] VMX: convert vmx_tertiary_exec_control
... to a field in the capability/controls struct. Signed-off-by: Jan Beulich --- v3: New. --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -162,7 +162,6 @@ static int cf_check parse_ept_param_runt /* Dynamic (run-time adjusted) execution control flags. */ struct vmx_caps __ro_after_init vmx_caps; -uint64_t vmx_tertiary_exec_control __read_mostly; u32 vmx_vmexit_control __read_mostly; u32 vmx_vmentry_control __read_mostly; u64 vmx_ept_vpid_cap __read_mostly; @@ -259,7 +258,6 @@ static int vmx_init_vmcs_config(bool bsp { u32 vmx_basic_msr_low, vmx_basic_msr_high, min, opt; struct vmx_caps caps = {}; -uint64_t _vmx_tertiary_exec_control = 0; u64 _vmx_ept_vpid_cap = 0; u64 _vmx_misc_cap = 0; u32 _vmx_vmexit_control; @@ -362,7 +360,7 @@ static int vmx_init_vmcs_config(bool bsp { uint64_t opt = 0; -_vmx_tertiary_exec_control = adjust_vmx_controls2( +caps.tertiary_exec_control = adjust_vmx_controls2( "Tertiary Exec Control", 0, opt, MSR_IA32_VMX_PROCBASED_CTLS3, &mismatch); } @@ -494,7 +492,6 @@ static int vmx_init_vmcs_config(bool bsp { /* First time through. 
*/ vmx_caps = caps; -vmx_tertiary_exec_control = _vmx_tertiary_exec_control; vmx_ept_vpid_cap = _vmx_ept_vpid_cap; vmx_vmexit_control = _vmx_vmexit_control; vmx_vmentry_control= _vmx_vmentry_control; @@ -531,7 +528,7 @@ static int vmx_init_vmcs_config(bool bsp vmx_caps.secondary_exec_control, caps.secondary_exec_control); mismatch |= cap_check( "Tertiary Exec Control", -vmx_tertiary_exec_control, _vmx_tertiary_exec_control); +vmx_caps.tertiary_exec_control, caps.tertiary_exec_control); mismatch |= cap_check( "VMExit Control", vmx_vmexit_control, _vmx_vmexit_control); @@ -1110,7 +1107,7 @@ static int construct_vmcs(struct vcpu *v v->arch.hvm.vmx.exec_control |= CPU_BASED_RDTSC_EXITING; v->arch.hvm.vmx.secondary_exec_control = vmx_caps.secondary_exec_control; -v->arch.hvm.vmx.tertiary_exec_control = vmx_tertiary_exec_control; +v->arch.hvm.vmx.tertiary_exec_control = vmx_caps.tertiary_exec_control; /* * Disable features which we don't want active by default: @@ -2208,7 +2205,6 @@ int __init vmx_vmcs_init(void) * Make sure all dependent features are off as well. */ memset(&vmx_caps, 0, sizeof(vmx_caps)); -vmx_tertiary_exec_control = 0; vmx_vmexit_control = 0; vmx_vmentry_control= 0; vmx_ept_vpid_cap = 0; --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -265,7 +265,6 @@ extern u32 vmx_vmentry_control; #define TERTIARY_EXEC_GUEST_PAGING_VERIFY BIT(3, UL) #define TERTIARY_EXEC_IPI_VIRT BIT(4, UL) #define TERTIARY_EXEC_VIRT_SPEC_CTRLBIT(7, UL) -extern uint64_t vmx_tertiary_exec_control; #define VMX_EPT_EXEC_ONLY_SUPPORTED 0x0001 #define VMX_EPT_WALK_LENGTH_4_SUPPORTED 0x0040 @@ -297,6 +296,7 @@ struct vmx_caps { uint32_t pin_based_exec_control; uint32_t cpu_based_exec_control; uint32_t secondary_exec_control; +uint64_t tertiary_exec_control; }; extern struct vmx_caps vmx_caps;
[PATCH v3 07/12] VMX: convert vmx_secondary_exec_control
... to a field in the capability/controls struct. Signed-off-by: Jan Beulich --- v2: New. --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -162,7 +162,6 @@ static int cf_check parse_ept_param_runt /* Dynamic (run-time adjusted) execution control flags. */ struct vmx_caps __ro_after_init vmx_caps; -u32 vmx_secondary_exec_control __read_mostly; uint64_t vmx_tertiary_exec_control __read_mostly; u32 vmx_vmexit_control __read_mostly; u32 vmx_vmentry_control __read_mostly; @@ -260,7 +259,6 @@ static int vmx_init_vmcs_config(bool bsp { u32 vmx_basic_msr_low, vmx_basic_msr_high, min, opt; struct vmx_caps caps = {}; -u32 _vmx_secondary_exec_control = 0; uint64_t _vmx_tertiary_exec_control = 0; u64 _vmx_ept_vpid_cap = 0; u64 _vmx_misc_cap = 0; @@ -355,7 +353,7 @@ static int vmx_init_vmcs_config(bool bsp SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY | SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE; -_vmx_secondary_exec_control = adjust_vmx_controls( +caps.secondary_exec_control = adjust_vmx_controls( "Secondary Exec Control", min, opt, MSR_IA32_VMX_PROCBASED_CTLS2, &mismatch); } @@ -370,7 +368,7 @@ static int vmx_init_vmcs_config(bool bsp } /* The IA32_VMX_EPT_VPID_CAP MSR exists only when EPT or VPID available */ -if ( _vmx_secondary_exec_control & (SECONDARY_EXEC_ENABLE_EPT | +if ( caps.secondary_exec_control & (SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_ENABLE_VPID) ) { rdmsrl(MSR_IA32_VMX_EPT_VPID_CAP, _vmx_ept_vpid_cap); @@ -392,7 +390,7 @@ static int vmx_init_vmcs_config(bool bsp if ( !(_vmx_ept_vpid_cap & VMX_EPT_MEMORY_TYPE_WB) || !(_vmx_ept_vpid_cap & VMX_EPT_WALK_LENGTH_4_SUPPORTED) || !(_vmx_ept_vpid_cap & VMX_EPT_INVEPT_ALL_CONTEXT) ) -_vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT; +caps.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT; /* * the CPU must support INVVPID all context invalidation, because we @@ -401,14 +399,14 @@ static int vmx_init_vmcs_config(bool bsp * Or we just don't use VPID. 
*/ if ( !(_vmx_ept_vpid_cap & VMX_VPID_INVVPID_ALL_CONTEXT) ) -_vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID; +caps.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VPID; /* EPT A/D bits is required for PML */ if ( !(_vmx_ept_vpid_cap & VMX_EPT_AD_BIT) ) -_vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML; +caps.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML; } -if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT ) +if ( caps.secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT ) { /* * To use EPT we expect to be able to clear certain intercepts. @@ -421,25 +419,25 @@ static int vmx_init_vmcs_config(bool bsp if ( must_be_one & (CPU_BASED_INVLPG_EXITING | CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING) ) -_vmx_secondary_exec_control &= +caps.secondary_exec_control &= ~(SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_UNRESTRICTED_GUEST); } /* PML cannot be supported if EPT is not used */ -if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) ) -_vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML; +if ( !(caps.secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) ) +caps.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML; /* Turn off opt_ept_pml if PML feature is not present. */ -if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML) ) +if ( !(caps.secondary_exec_control & SECONDARY_EXEC_ENABLE_PML) ) opt_ept_pml = false; -if ( (_vmx_secondary_exec_control & SECONDARY_EXEC_PAUSE_LOOP_EXITING) && +if ( (caps.secondary_exec_control & SECONDARY_EXEC_PAUSE_LOOP_EXITING) && ple_gap == 0 ) { if ( !vmx_caps.pin_based_exec_control ) printk(XENLOG_INFO "Disable Pause-Loop Exiting.\n"); -_vmx_secondary_exec_control &= ~ SECONDARY_EXEC_PAUSE_LOOP_EXITING; +caps.secondary_exec_control &= ~ SECONDARY_EXEC_PAUSE_LOOP_EXITING; } min = VM_EXIT_ACK_INTR_ON_EXIT; @@ -454,7 +452,7 @@ static int vmx_init_vmcs_config(bool bsp * delivery" and "acknowledge interrupt on exit" is set. 
For the latter * is a minimal requirement, only check the former, which is optional. */ -if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) ) +if ( !(caps.secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) ) caps.pin_based_exec_control &= ~PIN_BASED_POSTED_INTERRUPT; if ( iommu_intpost && @@ -466,7 +464,7 @@ static int vmx_init_
[PATCH v3 06/12] VMX: convert vmx_cpu_based_exec_control
... to a field in the capability/controls struct. Signed-off-by: Jan Beulich --- v2: New. --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -162,7 +162,6 @@ static int cf_check parse_ept_param_runt /* Dynamic (run-time adjusted) execution control flags. */ struct vmx_caps __ro_after_init vmx_caps; -u32 vmx_cpu_based_exec_control __read_mostly; u32 vmx_secondary_exec_control __read_mostly; uint64_t vmx_tertiary_exec_control __read_mostly; u32 vmx_vmexit_control __read_mostly; @@ -261,7 +260,6 @@ static int vmx_init_vmcs_config(bool bsp { u32 vmx_basic_msr_low, vmx_basic_msr_high, min, opt; struct vmx_caps caps = {}; -u32 _vmx_cpu_based_exec_control; u32 _vmx_secondary_exec_control = 0; uint64_t _vmx_tertiary_exec_control = 0; u64 _vmx_ept_vpid_cap = 0; @@ -299,12 +297,12 @@ static int vmx_init_vmcs_config(bool bsp CPU_BASED_MONITOR_TRAP_FLAG | CPU_BASED_ACTIVATE_SECONDARY_CONTROLS | CPU_BASED_ACTIVATE_TERTIARY_CONTROLS); -_vmx_cpu_based_exec_control = adjust_vmx_controls( +caps.cpu_based_exec_control = adjust_vmx_controls( "CPU-Based Exec Control", min, opt, MSR_IA32_VMX_PROCBASED_CTLS, &mismatch); -_vmx_cpu_based_exec_control &= ~CPU_BASED_RDTSC_EXITING; -if ( _vmx_cpu_based_exec_control & CPU_BASED_TPR_SHADOW ) -_vmx_cpu_based_exec_control &= +caps.cpu_based_exec_control &= ~CPU_BASED_RDTSC_EXITING; +if ( caps.cpu_based_exec_control & CPU_BASED_TPR_SHADOW ) +caps.cpu_based_exec_control &= ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING); rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap); @@ -321,7 +319,7 @@ static int vmx_init_vmcs_config(bool bsp return -EINVAL; } -if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS ) +if ( caps.cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS ) { min = 0; opt = (SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | @@ -351,7 +349,7 @@ static int vmx_init_vmcs_config(bool bsp * "APIC Register Virtualization" and "Virtual Interrupt Delivery" * can be set only when "use TPR shadow" is 
set */ -if ( (_vmx_cpu_based_exec_control & CPU_BASED_TPR_SHADOW) && +if ( (caps.cpu_based_exec_control & CPU_BASED_TPR_SHADOW) && opt_apicv_enabled ) opt |= SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY | @@ -362,7 +360,7 @@ static int vmx_init_vmcs_config(bool bsp MSR_IA32_VMX_PROCBASED_CTLS2, &mismatch); } -if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS ) +if ( caps.cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS ) { uint64_t opt = 0; @@ -498,7 +496,6 @@ static int vmx_init_vmcs_config(bool bsp { /* First time through. */ vmx_caps = caps; -vmx_cpu_based_exec_control = _vmx_cpu_based_exec_control; vmx_secondary_exec_control = _vmx_secondary_exec_control; vmx_tertiary_exec_control = _vmx_tertiary_exec_control; vmx_ept_vpid_cap = _vmx_ept_vpid_cap; @@ -531,7 +528,7 @@ static int vmx_init_vmcs_config(bool bsp vmx_caps.pin_based_exec_control, caps.pin_based_exec_control); mismatch |= cap_check( "CPU-Based Exec Control", -vmx_cpu_based_exec_control, _vmx_cpu_based_exec_control); +vmx_caps.cpu_based_exec_control, caps.cpu_based_exec_control); mismatch |= cap_check( "Secondary Exec Control", vmx_secondary_exec_control, _vmx_secondary_exec_control); @@ -,7 +1108,7 @@ static int construct_vmcs(struct vcpu *v /* VMCS controls. */ __vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_caps.pin_based_exec_control); -v->arch.hvm.vmx.exec_control = vmx_cpu_based_exec_control; +v->arch.hvm.vmx.exec_control = vmx_caps.cpu_based_exec_control; if ( d->arch.vtsc && !cpu_has_vmx_tsc_scaling ) v->arch.hvm.vmx.exec_control |= CPU_BASED_RDTSC_EXITING; @@ -2214,7 +2211,6 @@ int __init vmx_vmcs_init(void) * Make sure all dependent features are off as well. 
*/ memset(&vmx_caps, 0, sizeof(vmx_caps)); -vmx_cpu_based_exec_control = 0; vmx_secondary_exec_control = 0; vmx_tertiary_exec_control = 0; vmx_vmexit_control = 0; --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -210,7 +210,6 @@ void vmx_vmcs_reload(struct vcpu *v); #define CPU_BASED_MONITOR_EXITING 0x2000U #define CPU_BASED_PAUSE_EXITING 0x4000U #define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS 0x8000U -extern u32 vmx_cpu_based_exec_control; #define PIN_BASED_EXT_INTR_MASK 0x0001 #define PIN_BASED_NMI_EXITING 0x0008 @@ -297,6 +296,7 @@ extern u64 vmx_ep
[PATCH v3 05/12] VMX: convert vmx_pin_based_exec_control
... to a field in the capability/controls struct. Use an instance of that struct also in vmx_init_vmcs_config(). Signed-off-by: Jan Beulich --- v3: Add initializer to vmx_init_vmcs_config() new local var. v2: New. --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -162,7 +162,6 @@ static int cf_check parse_ept_param_runt /* Dynamic (run-time adjusted) execution control flags. */ struct vmx_caps __ro_after_init vmx_caps; -u32 vmx_pin_based_exec_control __read_mostly; u32 vmx_cpu_based_exec_control __read_mostly; u32 vmx_secondary_exec_control __read_mostly; uint64_t vmx_tertiary_exec_control __read_mostly; @@ -261,7 +260,7 @@ static bool cap_check( static int vmx_init_vmcs_config(bool bsp) { u32 vmx_basic_msr_low, vmx_basic_msr_high, min, opt; -u32 _vmx_pin_based_exec_control; +struct vmx_caps caps = {}; u32 _vmx_cpu_based_exec_control; u32 _vmx_secondary_exec_control = 0; uint64_t _vmx_tertiary_exec_control = 0; @@ -278,7 +277,7 @@ static int vmx_init_vmcs_config(bool bsp PIN_BASED_NMI_EXITING); opt = (PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTERRUPT); -_vmx_pin_based_exec_control = adjust_vmx_controls( +caps.pin_based_exec_control = adjust_vmx_controls( "Pin-Based Exec Control", min, opt, MSR_IA32_VMX_PINBASED_CTLS, &mismatch); @@ -440,7 +439,7 @@ static int vmx_init_vmcs_config(bool bsp if ( (_vmx_secondary_exec_control & SECONDARY_EXEC_PAUSE_LOOP_EXITING) && ple_gap == 0 ) { -if ( !vmx_pin_based_exec_control ) +if ( !vmx_caps.pin_based_exec_control ) printk(XENLOG_INFO "Disable Pause-Loop Exiting.\n"); _vmx_secondary_exec_control &= ~ SECONDARY_EXEC_PAUSE_LOOP_EXITING; } @@ -458,10 +457,10 @@ static int vmx_init_vmcs_config(bool bsp * is a minimal requirement, only check the former, which is optional. 
*/ if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) ) -_vmx_pin_based_exec_control &= ~PIN_BASED_POSTED_INTERRUPT; +caps.pin_based_exec_control &= ~PIN_BASED_POSTED_INTERRUPT; if ( iommu_intpost && - !(_vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT) ) + !(caps.pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT) ) { printk("Intel VT-d Posted Interrupt is disabled for CPU-side Posted " "Interrupt is not enabled\n"); @@ -495,10 +494,10 @@ static int vmx_init_vmcs_config(bool bsp if ( mismatch ) return -EINVAL; -if ( !vmx_pin_based_exec_control ) +if ( !vmx_caps.pin_based_exec_control ) { /* First time through. */ -vmx_pin_based_exec_control = _vmx_pin_based_exec_control; +vmx_caps = caps; vmx_cpu_based_exec_control = _vmx_cpu_based_exec_control; vmx_secondary_exec_control = _vmx_secondary_exec_control; vmx_tertiary_exec_control = _vmx_tertiary_exec_control; @@ -529,7 +528,7 @@ static int vmx_init_vmcs_config(bool bsp vmcs_revision_id, vmx_basic_msr_low & VMX_BASIC_REVISION_MASK); mismatch |= cap_check( "Pin-Based Exec Control", -vmx_pin_based_exec_control, _vmx_pin_based_exec_control); +vmx_caps.pin_based_exec_control, caps.pin_based_exec_control); mismatch |= cap_check( "CPU-Based Exec Control", vmx_cpu_based_exec_control, _vmx_cpu_based_exec_control); @@ -1110,7 +1109,7 @@ static int construct_vmcs(struct vcpu *v vmx_vmcs_enter(v); /* VMCS controls. 
*/ -__vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_control); +__vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_caps.pin_based_exec_control); v->arch.hvm.vmx.exec_control = vmx_cpu_based_exec_control; if ( d->arch.vtsc && !cpu_has_vmx_tsc_scaling ) @@ -2136,7 +2135,7 @@ void vmcs_dump_vcpu(struct vcpu *v) printk("TSC Offset = 0x%016lx TSC Multiplier = 0x%016lx\n", vmr(TSC_OFFSET), vmr(TSC_MULTIPLIER)); if ( (v->arch.hvm.vmx.exec_control & CPU_BASED_TPR_SHADOW) || - (vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT) ) + (vmx_caps.pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT) ) printk("TPR Threshold = 0x%02x PostedIntrVec = 0x%02x\n", vmr32(TPR_THRESHOLD), vmr16(POSTED_INTR_NOTIFICATION_VECTOR)); if ( (v->arch.hvm.vmx.secondary_exec_control & @@ -2215,7 +2214,6 @@ int __init vmx_vmcs_init(void) * Make sure all dependent features are off as well. */ memset(&vmx_caps, 0, sizeof(vmx_caps)); -vmx_pin_based_exec_control = 0; vmx_cpu_based_exec_control = 0; vmx_secondary_exec_control = 0; vmx_tertiary_exec_control = 0; --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -1057,7 +1057,7 @@ static void load_shadow_control(struct v * and EXCEPTION * Enforce the removed features */ -
[PATCH v3 04/12] VMX: convert vmx_basic_msr
... to a struct field, which is then going to be accompanied by other capability/control data presently living in individual variables. As this structure isn't supposed to be altered post-boot, put it in .data.ro_after_init right away. Suggested-by: Roger Pau Monné Signed-off-by: Jan Beulich --- v2: New. --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -161,6 +161,7 @@ static int cf_check parse_ept_param_runt #endif /* Dynamic (run-time adjusted) execution control flags. */ +struct vmx_caps __ro_after_init vmx_caps; u32 vmx_pin_based_exec_control __read_mostly; u32 vmx_cpu_based_exec_control __read_mostly; u32 vmx_secondary_exec_control __read_mostly; @@ -175,8 +176,7 @@ static DEFINE_PER_CPU(paddr_t, current_v static DEFINE_PER_CPU(struct list_head, active_vmcs_list); DEFINE_PER_CPU(bool, vmxon); -#define vmcs_revision_id (vmx_basic_msr & VMX_BASIC_REVISION_MASK) -u64 __read_mostly vmx_basic_msr; +#define vmcs_revision_id (vmx_caps.basic_msr & VMX_BASIC_REVISION_MASK) static void __init vmx_display_features(void) { @@ -505,8 +505,8 @@ static int vmx_init_vmcs_config(bool bsp vmx_ept_vpid_cap = _vmx_ept_vpid_cap; vmx_vmexit_control = _vmx_vmexit_control; vmx_vmentry_control= _vmx_vmentry_control; -vmx_basic_msr = ((u64)vmx_basic_msr_high << 32) | - vmx_basic_msr_low; +vmx_caps.basic_msr = ((uint64_t)vmx_basic_msr_high << 32) | + vmx_basic_msr_low; vmx_vmfunc = _vmx_vmfunc; vmx_display_features(); @@ -560,7 +560,7 @@ static int vmx_init_vmcs_config(bool bsp mismatch = 1; } if ( (vmx_basic_msr_high & (VMX_BASIC_VMCS_SIZE_MASK >> 32)) != - ((vmx_basic_msr & VMX_BASIC_VMCS_SIZE_MASK) >> 32) ) + ((vmx_caps.basic_msr & VMX_BASIC_VMCS_SIZE_MASK) >> 32) ) { printk("VMX: CPU%d unexpected VMCS size %Lu\n", smp_processor_id(), @@ -2214,7 +2214,7 @@ int __init vmx_vmcs_init(void) * _vmx_vcpu_up() may have made it past feature identification. * Make sure all dependent features are off as well. 
*/ -vmx_basic_msr = 0; +memset(&vmx_caps, 0, sizeof(vmx_caps)); vmx_pin_based_exec_control = 0; vmx_cpu_based_exec_control = 0; vmx_secondary_exec_control = 0; --- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h +++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h @@ -294,6 +294,12 @@ extern u64 vmx_ept_vpid_cap; #define VMX_TSC_MULTIPLIER_MAX 0xULL +/* Capabilities and dynamic (run-time adjusted) execution control flags. */ +struct vmx_caps { +uint64_t basic_msr; +}; +extern struct vmx_caps vmx_caps; + #define cpu_has_wbinvd_exiting \ (vmx_secondary_exec_control & SECONDARY_EXEC_WBINVD_EXITING) #define cpu_has_vmx_virtualize_apic_accesses \ @@ -379,9 +385,8 @@ extern u64 vmx_ept_vpid_cap; */ #define VMX_BASIC_DEFAULT1_ZERO(1ULL << 55) -extern u64 vmx_basic_msr; #define cpu_has_vmx_ins_outs_instr_info \ -(!!(vmx_basic_msr & VMX_BASIC_INS_OUT_INFO)) +(!!(vmx_caps.basic_msr & VMX_BASIC_INS_OUT_INFO)) /* Guest interrupt status */ #define VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK 0x0FF --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -1561,7 +1561,7 @@ static int nvmx_handle_vmxon(struct cpu_ rc = hvm_copy_from_guest_phys(&nvmcs_revid, gpa, sizeof(nvmcs_revid)); if ( rc != HVMTRANS_okay || (nvmcs_revid & ~VMX_BASIC_REVISION_MASK) || - ((nvmcs_revid ^ vmx_basic_msr) & VMX_BASIC_REVISION_MASK) ) + ((nvmcs_revid ^ vmx_caps.basic_msr) & VMX_BASIC_REVISION_MASK) ) { vmfail_invalid(regs); return X86EMUL_OKAY; @@ -1799,7 +1799,7 @@ static int nvmx_handle_vmptrld(struct cp { struct vmcs_struct *vvmcs = vvmcx; -if ( ((vvmcs->revision_id ^ vmx_basic_msr) & +if ( ((vvmcs->revision_id ^ vmx_caps.basic_msr) & VMX_BASIC_REVISION_MASK) || (!cpu_has_vmx_vmcs_shadowing && (vvmcs->revision_id & ~VMX_BASIC_REVISION_MASK)) ) @@ -2192,7 +2192,7 @@ int nvmx_msr_read_intercept(unsigned int case MSR_IA32_VMX_TRUE_PROCBASED_CTLS: case MSR_IA32_VMX_TRUE_EXIT_CTLS: case MSR_IA32_VMX_TRUE_ENTRY_CTLS: -if ( !(vmx_basic_msr & VMX_BASIC_DEFAULT1_ZERO) ) +if ( !(vmx_caps.basic_msr & 
VMX_BASIC_DEFAULT1_ZERO) ) return 0; break;
[PATCH v3 03/12] VMX: drop vmcs_revision_id
It's effectively redundant with vmx_basic_msr. For the #define
replacement to work, struct vmcs_struct's respective field name also
needs to change: Drop the not really meaningful "vmcs_" prefix from it.

Signed-off-by: Jan Beulich
---
v2: New.

--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -175,7 +175,7 @@ static DEFINE_PER_CPU(paddr_t, current_v
 static DEFINE_PER_CPU(struct list_head, active_vmcs_list);
 DEFINE_PER_CPU(bool, vmxon);
 
-static u32 vmcs_revision_id __read_mostly;
+#define vmcs_revision_id (vmx_basic_msr & VMX_BASIC_REVISION_MASK)
 u64 __read_mostly vmx_basic_msr;
 
 static void __init vmx_display_features(void)
@@ -498,7 +498,6 @@ static int vmx_init_vmcs_config(bool bsp
     if ( !vmx_pin_based_exec_control )
     {
         /* First time through. */
-        vmcs_revision_id = vmx_basic_msr_low & VMX_BASIC_REVISION_MASK;
         vmx_pin_based_exec_control = _vmx_pin_based_exec_control;
         vmx_cpu_based_exec_control = _vmx_cpu_based_exec_control;
         vmx_secondary_exec_control = _vmx_secondary_exec_control;
@@ -610,7 +609,7 @@ static paddr_t vmx_alloc_vmcs(void)
 
     vmcs = __map_domain_page(pg);
     clear_page(vmcs);
-    vmcs->vmcs_revision_id = vmcs_revision_id;
+    vmcs->revision_id = vmcs_revision_id;
     unmap_domain_page(vmcs);
 
     return page_to_maddr(pg);
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1166,7 +1166,7 @@ static void nvmx_set_vmcs_pointer(struct
     paddr_t vvmcs_maddr = v->arch.hvm.vmx.vmcs_shadow_maddr;
 
     __vmpclear(vvmcs_maddr);
-    vvmcs->vmcs_revision_id |= VMCS_RID_TYPE_MASK;
+    vvmcs->revision_id |= VMCS_RID_TYPE_MASK;
     v->arch.hvm.vmx.secondary_exec_control |=
         SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
     __vmwrite(SECONDARY_VM_EXEC_CONTROL,
@@ -1181,7 +1181,7 @@ static void nvmx_clear_vmcs_pointer(stru
     paddr_t vvmcs_maddr = v->arch.hvm.vmx.vmcs_shadow_maddr;
 
     __vmpclear(vvmcs_maddr);
-    vvmcs->vmcs_revision_id &= ~VMCS_RID_TYPE_MASK;
+    vvmcs->revision_id &= ~VMCS_RID_TYPE_MASK;
     v->arch.hvm.vmx.secondary_exec_control &=
         ~SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
     __vmwrite(SECONDARY_VM_EXEC_CONTROL,
@@ -1799,10 +1799,10 @@ static int nvmx_handle_vmptrld(struct cp
         {
             struct vmcs_struct *vvmcs = vvmcx;
 
-            if ( ((vvmcs->vmcs_revision_id ^ vmx_basic_msr) &
-                  VMX_BASIC_REVISION_MASK) ||
+            if ( ((vvmcs->revision_id ^ vmx_basic_msr) &
+                  VMX_BASIC_REVISION_MASK) ||
                  (!cpu_has_vmx_vmcs_shadowing &&
-                  (vvmcs->vmcs_revision_id & ~VMX_BASIC_REVISION_MASK)) )
+                  (vvmcs->revision_id & ~VMX_BASIC_REVISION_MASK)) )
             {
                 hvm_unmap_guest_frame(vvmcx, 1);
                 vmfail(regs, VMX_INSN_VMPTRLD_INCORRECT_VMCS_ID);
@@ -2214,7 +2214,7 @@ int nvmx_msr_read_intercept(unsigned int
             map_domain_page(_mfn(PFN_DOWN(v->arch.hvm.vmx.vmcs_pa)));
 
         data = (host_data & (~0ul << 32)) |
-               (vmcs->vmcs_revision_id & 0x7fff);
+               (vmcs->revision_id & 0x7fff);
         unmap_domain_page(vmcs);
 
         if ( !cpu_has_vmx_vmcs_shadowing )
--- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
+++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
@@ -17,7 +17,7 @@ int cf_check vmx_cpu_up(void);
 void cf_check vmx_cpu_down(void);
 
 struct vmcs_struct {
-    u32 vmcs_revision_id;
+    uint32_t revision_id;
     unsigned char data [0]; /* vmcs size is read from MSR */
 };
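The core trick of the patch above — replacing a boot-time cached value with one recomputed on each use from the already-stored MSR value — can be sketched in plain C. This is illustrative only: the mask value follows the SDM's bits 30:0 layout for IA32_VMX_BASIC, and the function name is an invention for the sketch, not Xen code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch (not Xen code): instead of caching the VMCS
 * revision identifier in a separate __read_mostly variable at boot,
 * derive it from the stored IA32_VMX_BASIC value whenever needed.
 * The mask mirrors VMX_BASIC_REVISION_MASK (bits 30:0); treat the
 * exact value as an assumption of this sketch.
 */
#define VMX_BASIC_REVISION_MASK 0x7fffffffULL

/* The patch expresses this as:  #define vmcs_revision_id (msr & MASK) */
static inline uint32_t revision_from_basic(uint64_t basic_msr)
{
    return (uint32_t)(basic_msr & VMX_BASIC_REVISION_MASK);
}
```

The win is that there is no second copy of the information to keep in sync, at the cost of one AND per use — negligible on any path that touches a VMCS anyway.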
[PATCH v3 02/12] x86/HVM: improve CET-IBT pruning of ENDBR
__init{const,data}_cf_clobber can have an effect only for pointers
actually populated in the respective tables. While not the case for SVM
right now, VMX installs a number of pointers only under certain
conditions. Hence the respective functions would have their ENDBR
purged only when those conditions are met. Invoke "pruning" functions
after having copied the respective tables, for them to install any
"missing" pointers.

Signed-off-by: Jan Beulich
---
This is largely cosmetic for present hardware, which when supporting
CET-IBT likely also supports all of the advanced VMX features for which
hook pointers are installed conditionally. The only case this would
make a difference there is when use of respective features was
suppressed via command line option (where available). For future hooks
it may end up relevant even by default, and it also would be if AMD
started supporting CET-IBT; right now it matters only for
.pi_update_irte, as iommu_intpost continues to default to off.

Originally I had meant to put the SVM and VMX functions in presmp-
initcalls, but hvm/{svm,vmx}/built_in.o are linked into hvm/built_in.o
before hvm/hvm.o. And I don't think I want to fiddle with link order
here.
---
v3: Re-base.
v2: Use cpu_has_xen_ibt in prune_{svm,vmx}().

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -161,10 +161,15 @@ static int __init cf_check hvm_enable(vo
     else if ( cpu_has_svm )
         fns = start_svm();
 
+    if ( fns )
+        hvm_funcs = *fns;
+
+    prune_vmx();
+    prune_svm();
+
     if ( fns == NULL )
         return 0;
 
-    hvm_funcs = *fns;
     hvm_enabled = 1;
 
     printk("HVM: %s enabled\n", fns->name);
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2587,6 +2587,19 @@ const struct hvm_function_table * __init
     return &svm_function_table;
 }
 
+void __init prune_svm(void)
+{
+    /*
+     * Now that svm_function_table was copied, populate all function pointers
+     * which may have been left at NULL, for __initdata_cf_clobber to have as
+     * much of an effect as possible.
+     */
+    if ( !cpu_has_xen_ibt )
+        return;
+
+    /* Nothing at present. */
+}
+
 void asmlinkage svm_vmexit_handler(void)
 {
     struct cpu_user_regs *regs = guest_cpu_user_regs();
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3042,6 +3042,30 @@ const struct hvm_function_table * __init
     return &vmx_function_table;
 }
 
+void __init prune_vmx(void)
+{
+    /*
+     * Now that vmx_function_table was copied, populate all function pointers
+     * which may have been left at NULL, for __initdata_cf_clobber to have as
+     * much of an effect as possible.
+     */
+    if ( !cpu_has_xen_ibt )
+        return;
+
+    vmx_function_table.set_descriptor_access_exiting =
+        vmx_set_descriptor_access_exiting;
+
+    vmx_function_table.update_eoi_exit_bitmap = vmx_update_eoi_exit_bitmap;
+    vmx_function_table.process_isr            = vmx_process_isr;
+    vmx_function_table.handle_eoi             = vmx_handle_eoi;
+
+    vmx_function_table.pi_update_irte = vmx_pi_update_irte;
+
+    vmx_function_table.deliver_posted_intr = vmx_deliver_posted_intr;
+    vmx_function_table.sync_pir_to_irr     = vmx_sync_pir_to_irr;
+    vmx_function_table.test_pir            = vmx_test_pir;
+}
+
 /*
  * Not all cases receive valid value in the VM-exit instruction length field.
  * Callers must know what they're doing!
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -250,6 +250,9 @@ extern s8 hvm_port80_allowed;
 extern const struct hvm_function_table *start_svm(void);
 extern const struct hvm_function_table *start_vmx(void);
 
+void prune_svm(void);
+void prune_vmx(void);
+
 int hvm_domain_initialise(struct domain *d,
                           const struct xen_domctl_createdomain *config);
 void hvm_domain_relinquish_resources(struct domain *d);
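The pruning pattern the patch introduces — backfilling hook pointers that were left NULL, so that a later pass can clobber the ENDBR of every referenced function — can be sketched generically. The struct, flag, and function names here are illustrative stand-ins, not Xen's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a hook table with one optional member. */
struct hooks {
    void (*optional_op)(void);
};

static void real_optional_op(void) { }

/* Stand-in for cpu_has_xen_ibt in this sketch. */
static bool ibt_enabled = true;

/*
 * After the table has been copied to its final location, install any
 * hook that conditional setup left at NULL.  Once every such pointer
 * is populated, the clobbering pass can safely overwrite the ENDBR of
 * every function the table references.
 */
static void prune(struct hooks *t)
{
    if ( !ibt_enabled )
        return;

    if ( !t->optional_op )
        t->optional_op = real_optional_op;
}
```

Note that in the real patch the assignments are unconditional (overwriting an already-set pointer with the same value is harmless); the NULL check above just makes the intent explicit in the sketch.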
[PATCH v3 01/12] VMX: don't run with CR4.VMXE set when VMX could not be enabled
While generally benign, doing so is still at best misleading.

Signed-off-by: Jan Beulich
---
Using set_in_cr4() seems favorable over updating mmu_cr4_features
despite the resulting redundant CR4 update. But I certainly could be
talked into going the alternative route.
---
v2: Actually clear CR4.VMXE for the BSP on the error path.

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2961,14 +2961,18 @@ static bool __init has_if_pschange_mc(vo
 
 const struct hvm_function_table * __init start_vmx(void)
 {
-    set_in_cr4(X86_CR4_VMXE);
+    write_cr4(read_cr4() | X86_CR4_VMXE);
 
     if ( vmx_vmcs_init() )
     {
+        write_cr4(read_cr4() & ~X86_CR4_VMXE);
         printk("VMX: failed to initialise.\n");
         return NULL;
     }
 
+    /* Arrange for APs to have CR4.VMXE set early on. */
+    set_in_cr4(X86_CR4_VMXE);
+
     vmx_function_table.singlestep_supported = cpu_has_monitor_trap_flag;
 
     if ( cpu_has_vmx_dt_exiting )
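The control flow of the fix — set the enable bit tentatively, roll it back on failure, and commit it (for APs, via set_in_cr4() in the real code) only once initialisation succeeded — can be sketched with a simulated register. This is a stand-in model only; it does not touch real CR4, and the bit position is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated control register; stands in for CR4 in this sketch. */
static uint64_t fake_cr4;
#define FAKE_CR4_VMXE (1ULL << 13)

/* Toggle to model vmx_vmcs_init() failing or succeeding. */
static bool init_fails;

static int subsystem_init(void)
{
    return init_fails ? -1 : 0;
}

/*
 * Pattern from the patch: enable the feature bit only for the duration
 * of the probe, clear it again on the error path, and leave it set
 * only when initialisation succeeded.
 */
static int start(void)
{
    fake_cr4 |= FAKE_CR4_VMXE;

    if ( subsystem_init() )
    {
        fake_cr4 &= ~FAKE_CR4_VMXE;
        return -1;
    }

    /* In the real code: set_in_cr4(X86_CR4_VMXE) to commit for APs. */
    return 0;
}
```

The invariant this buys is the one the commit message asks for: the bit is never left set in a state where the feature could not actually be enabled.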
[PATCH v3 00/12] x86/HVM: misc tidying
01: VMX: don't run with CR4.VMXE set when VMX could not be enabled
02: x86/HVM: improve CET-IBT pruning of ENDBR
03: VMX: drop vmcs_revision_id
04: VMX: convert vmx_basic_msr
05: VMX: convert vmx_pin_based_exec_control
06: VMX: convert vmx_cpu_based_exec_control
07: VMX: convert vmx_secondary_exec_control
08: VMX: convert vmx_tertiary_exec_control
09: VMX: convert vmx_vmexit_control
10: VMX: convert vmx_vmentry_control
11: VMX: convert vmx_ept_vpid_cap
12: VMX: convert vmx_vmfunc

Jan
[PATCH] docs/sphinx: Start an FAQ, and add Kconfig/CET details
This is long overdue, and we need to start somewhere.

Signed-off-by: Andrew Cooper
---
CC: George Dunlap
CC: Jan Beulich
CC: Stefano Stabellini
CC: Wei Liu
CC: Julien Grall
---
 docs/faq.rst      | 71 +++++++++++++++++++++++++++++++++++++++++++++++
 docs/glossary.rst | 15 ++++++++++
 docs/index.rst    |  1 +
 3 files changed, 87 insertions(+)
 create mode 100644 docs/faq.rst

diff --git a/docs/faq.rst b/docs/faq.rst
new file mode 100644
index 000000000000..75cd70328a5c
--- /dev/null
+++ b/docs/faq.rst
@@ -0,0 +1,71 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Frequently Asked Questions
+==========================
+
+How do I...
+-----------
+
+... check whether a Kconfig option is active?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+  Kconfig is a build time configuration system, combining inherent knowledge,
+  the capabilities of the toolchain, and explicit user choice to form a
+  configuration of a build of Xen.
+
+  A file, by default ``.config``, is produced by the build identifying the
+  configuration used.  Kconfig symbols all start with ``CONFIG_``, and come
+  in a variety of types including strings, integers and booleans.  Booleans
+  are the most common, and when active are expressed with ``...=y``.  e.g.::
+
+    xen.git/xen$ grep CONFIG_FOO .config
+    CONFIG_FOO_BOOLEAN=y
+    CONFIG_FOO_STRING="lorem ipsum"
+    CONFIG_FOO_INTEGER=42
+
+  Symbols which are either absent, or expressed as ``... is not set`` are
+  disabled.  e.g.::
+
+    xen.git/xen$ grep CONFIG_BAR .config
+    # CONFIG_BAR is not set
+
+  Builds of Xen configured with ``CONFIG_HYPFS_CONFIG=y`` embed their own
+  ``.config`` at build time, and can provide it to the :term:`control domain`
+  upon request.  e.g.::
+
+    [root@host ~]# xenhypfs cat /buildinfo/config | grep -e FOO -e BAR
+    CONFIG_FOO=y
+    # CONFIG_BAR is not set
+
+
+... tell if CET is active?
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+  Control-flow Enforcement Technology support was added to Xen 4.14.  It is
+  build time conditional, dependent on both having a new-enough toolchain
+  and an explicit Kconfig option, and also requires capable hardware.  See
+  :term:`CET`.
+
+  For CET-SS, Shadow Stacks, the minimum toolchain requirements are
+  ``binutils >= 2.29`` or ``LLVM >= 6``.  No specific compiler support is
+  required.  Check for ``CONFIG_XEN_SHSTK`` being active.
+
+  For CET-IBT, Indirect Branch Tracking, the minimum toolchain requirements
+  are ``GCC >= 9`` and ``binutils >= 2.29``.  Xen relies on a compiler
+  feature which is specific to GCC at the time of writing.  Check for
+  ``CONFIG_XEN_IBT`` being active.
+
+  If a capable Xen is booted on capable hardware, and CET is not disabled by
+  command line option or errata, Xen will print some details early on boot
+  about which CET facilities have been turned on::
+
+    ...
+    (XEN) CPU Vendor: Intel, Family 6 (0x6), Model 143 (0x8f), Stepping 8 (raw 000806f8)
+    (XEN) Enabling Supervisor Shadow Stacks
+    (XEN) Enabling Indirect Branch Tracking
+    (XEN)   - IBT disabled in UEFI Runtime Services
+    (XEN) EFI RAM map:
+    ...
+
+  This can be obtained from the control domain with ``xl dmesg``, but
+  remember to confirm that the console ring hasn't wrapped.
diff --git a/docs/glossary.rst b/docs/glossary.rst
index 8ddbdab160a1..6adeec77e14c 100644
--- a/docs/glossary.rst
+++ b/docs/glossary.rst
@@ -28,6 +28,21 @@ Glossary
      single instance of Xen, used as the identifier in various APIs, and
      is typically allocated sequentially from 0.
 
+   CET
+     Control-flow Enforcement Technology is a facility in x86 CPUs for
+     defending against memory safety vulnerabilities.  It is formed of two
+     independent features:
+
+     * CET-SS, Shadow Stacks, are designed to protect against Return
+       Oriented Programming (ROP) attacks.
+
+     * CET-IBT, Indirect Branch Tracking, is designed to protect against
+       Call or Jump Oriented Programming (COP/JOP) attacks.
+
+     Intel support CET-SS and CET-IBT from the Tiger Lake (Client, 2020)
+     and Sapphire Rapids (Server, 2023) CPUs.  AMD support only CET-SS,
+     starting with Zen3 (Both client and server, 2020) CPUs.
+
    guest
      The term 'guest' has two different meanings, depending on context, and
      should not be confused with :term:`domain`.
diff --git a/docs/index.rst b/docs/index.rst
index 22fdde80590c..ab051a0f3833 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -72,4 +72,5 @@ Miscellanea
 
 .. toctree::
 
+   faq
    glossary

base-commit: 60e00f77a5cc671d30c5ef3318f5b8e9b74e4aa3
-- 
2.30.2
[REGRESSION] Re: [XEN PATCH v4 4/4] eclair: move function and macro properties outside ECLAIR
On 02/02/2024 3:16 pm, Simone Ballarin wrote:
> From: Maria Celeste Cesario
>
> Function and macro properties contained in ECLAIR/call_properties.ecl are of
> general interest: this patch moves these annotations in a generaric JSON file
> in docs. In this way, they can be exploited for other purposes (i.e.
> documentation, other tools).
>
> Add rst file containing explanation on how to update
> function_macro_properties.json.
> Add script to convert the JSON file in ECL configurations.
> Remove ECLAIR/call_properties.ecl: the file is now automatically generated
> from the JSON file.
>
> Signed-off-by: Maria Celeste Cesario
> Signed-off-by: Simone Ballarin
>
> ---
> Changes in v4:
> - add missing script for converting the JSON file in ECL configurations;
> - improve commit message;
> - remove call_properties.ecs.
> ---
>  .../eclair_analysis/ECLAIR/analysis.ecl       |   1 +
>  .../ECLAIR/call_properties.ecl                | 128 ---
>  automation/eclair_analysis/prepare.sh         |   2 +
>  automation/eclair_analysis/propertyparser.py  |  37 +
>  docs/function_macro_properties.json           | 841 ++++++++++++++++++
>  docs/function_macro_properties.rst            |  58 ++
>  6 files changed, 939 insertions(+), 128 deletions(-)
>  delete mode 100644 automation/eclair_analysis/ECLAIR/call_properties.ecl
>  create mode 100644 automation/eclair_analysis/propertyparser.py
>  create mode 100644 docs/function_macro_properties.json
>  create mode 100644 docs/function_macro_properties.rst

This breaks the Sphinx build.

  checking consistency... /local/xen.git/docs/function_macro_properties.rst:
  WARNING: document isn't included in any toctree

Also, the top level of docs really isn't an appropriate place to put it.
Everything else is in docs/misra/.

When you've moved the files, you'll need to edit docs/misra/index.rst to
fix the build.

~Andrew
Re: [XEN PATCH] automation: Rework "build-each-commit-gcc" test
On 26.02.2024 15:49, Anthony PERARD wrote:
> On Mon, Feb 26, 2024 at 10:23:37AM +0100, Jan Beulich wrote:
>> On 20.02.2024 15:07, Anthony PERARD wrote:
>>> --- a/automation/scripts/build-test.sh
>>> +++ b/automation/scripts/build-test.sh
>>> @@ -9,6 +9,37 @@
>>>  # Set NON_SYMBOLIC_REF=1 if you want to use this script in detached HEAD state.
>>>  # This is currently used by automated test system.
>>>
>>> +# Colors with ANSI escape sequences
>>> +txt_info='[32m'
>>> +txt_err='[31m'
>>> +txt_clr='[0m'
>>> +
>>> +# $GITLAB_CI should be "true" or "false".
>>> +if [ "$GITLAB_CI" != true ]; then
>>> +    GITLAB_CI=false
>>> +fi
>>> +
>>> +gitlab_log_section() {
>>> +    if $GITLAB_CI; then
>>> +        echo -n "[0Ksection_$1:$(date +%s):$2
>>> [0K"
>>
>> ... there was either corruption on transmit here, or there's an embedded
>> newline that I don't know how to deal with.
>
> No corruption here, there is a \r in this string. There's also \e a few
> times. Most tools can deal with these characters just fine, so I didn't
> even think there could be an issue.

Okay, in an entirely different 2nd attempt I got it committed fine (I
hope). It's probably too old fashioned of me / my scripts that I demand
for every patch to pass a "patch --dry-run" first.

Jan

> If that byte is really an issue, I could rewrite the patch to use
> printf, and I think it would read as:
>
>     printf "\e[0Ksection_$1:$(date +%s):$2\r\e[0K"
>
> Thanks,
>
Re: [PATCH] tools/xentop: Add VBD3 support to xentop
On 26/02/2024 2:22 pm, Jan Beulich wrote:
> On 26.02.2024 15:12, Fouad Hilly wrote:
>> From: Pritha Srivastava
>>
>> xl now knows how to drive tapdisk, so modified libxenstat to
>> understand vbd3 statistics.
>>
>> Signed-off-by: Jorge Martin
>> Signed-off-by: Pritha Srivastava
> Just a formal question (I'm not really qualified to review this change):
> With the above two S-o-b and the earlier From: - who is the original
> author of this patch? The general expectation is for the 1st S-o-b to
> be the author's.

This patch has been sat in the XenServer patchqueue for a decade.
Neither Pritha nor Jorge are with us any more. Sadly the review system
we used back then is also no longer with us.

From ticketing, I think it was co-developed at the same time.

This is the form the patch has existed in the patchqueue for that time,
so I'm tempted to say we reorder the SoB chain to make it match.

That's the best I can figure out.

~Andrew
Re: [PATCH 3/3] x86/entry: Introduce EFRAME_* constants
On 26/02/2024 2:32 pm, Jan Beulich wrote:
> On 26.02.2024 13:55, Andrew Cooper wrote:
>> restore_all_guest() does a lot of manipulation of the stack after popping the
>> GPRs, and uses raw %rsp displacements to do so.  Also, almost all entrypaths
>> use raw %rsp displacements prior to pushing GPRs.
>>
>> Provide better mnemonics, to aid readability and reduce the chance of errors
>> when editing.
>>
>> No functional change.  The resulting binary is identical.
>>
>> Signed-off-by: Andrew Cooper
> Reviewed-by: Jan Beulich

Thanks.

> with one small request:
>
>> --- a/xen/arch/x86/x86_64/asm-offsets.c
>> +++ b/xen/arch/x86/x86_64/asm-offsets.c
>> @@ -51,6 +51,23 @@ void __dummy__(void)
>>      OFFSET(UREGS_kernel_sizeof, struct cpu_user_regs, es);
>>      BLANK();
>>
>> +    /*
>> +     * EFRAME_* is for the entry/exit logic where %rsp is pointing at
>> +     * UREGS_error_code and GPRs are still guest values.
>> +     */
> "still/already" or some such to match "entry/exit"?

Ok.

~Andrew
Re: [XEN PATCH] automation: Rework "build-each-commit-gcc" test
On Mon, Feb 26, 2024 at 10:23:37AM +0100, Jan Beulich wrote:
> On 20.02.2024 15:07, Anthony PERARD wrote:
> > --- a/automation/scripts/build-test.sh
> > +++ b/automation/scripts/build-test.sh
> > @@ -9,6 +9,37 @@
> >  # Set NON_SYMBOLIC_REF=1 if you want to use this script in detached HEAD state.
> >  # This is currently used by automated test system.
> >
> > +# Colors with ANSI escape sequences
> > +txt_info='[32m'
> > +txt_err='[31m'
> > +txt_clr='[0m'
> > +
> > +# $GITLAB_CI should be "true" or "false".
> > +if [ "$GITLAB_CI" != true ]; then
> > +    GITLAB_CI=false
> > +fi
> > +
> > +gitlab_log_section() {
> > +    if $GITLAB_CI; then
> > +        echo -n "[0Ksection_$1:$(date +%s):$2
> > [0K"
>
> ... there was either corruption on transmit here, or there's an embedded
> newline that I don't know how to deal with.

No corruption here, there is a \r in this string. There's also \e a few
times. Most tools can deal with these characters just fine, so I didn't
even think there could be an issue.

If that byte is really an issue, I could rewrite the patch to use
printf, and I think it would read as:

    printf "\e[0Ksection_$1:$(date +%s):$2\r\e[0K"

Thanks,

-- 
Anthony PERARD
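For reference, the bytes under discussion are GitLab's collapsible-section markers: an ESC (0x1b) and a CR (0x0d), which survive poorly when a patch travels by email. Spelling them as escape sequences keeps the source 7-bit clean, which is the point of Anthony's printf suggestion. A small C sketch of building the same marker string (the section name and timestamp here are placeholders):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a GitLab "section_start" marker.  The framing bytes are
 * ESC [ 0 K ... CR ESC [ 0 K; written as \x1b and \r they are plain
 * ASCII in the source, so nothing gets mangled in transit.
 */
static int format_section_start(char *buf, size_t n,
                                const char *name, long stamp)
{
    return snprintf(buf, n, "\x1b[0Ksection_start:%ld:%s\r\x1b[0K",
                    stamp, name);
}
```

The shell equivalent is exactly the `printf "\e[0K...\r\e[0K"` form quoted above; `\e` is a GNU extension in shell printf, while C spells the same byte `\x1b` (or `\033`).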
Re: [PATCH v13.3 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 21.02.2024 03:45, Stewart Hildebrand wrote:
> From: Oleksandr Andrushchenko
>
> Use the per-domain PCI read/write lock to protect the presence of the
> pci device vpci field. This lock can be used (and in a few cases is used
> right away) so that vpci removal can be performed while holding the lock
> in write mode. Previously such removal could race with vpci_read for
> example.
>
> When taking both d->pci_lock and pdev->vpci->lock, they should be
> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
> possible deadlock situations.
>
> 1. Per-domain's pci_lock is used to protect pdev->vpci structure
> from being removed.
>
> 2. Writing the command register and ROM BAR register may trigger
> modify_bars to run, which in turn may access multiple pdevs while
> checking for the existing BAR's overlap. The overlapping check, if
> done under the read lock, requires vpci->lock to be acquired on both
> devices being compared, which may produce a deadlock. It is not
> possible to upgrade read lock to write lock in such a case. So, in
> order to prevent the deadlock, use d->pci_lock in write mode instead.
>
> All other code, which doesn't lead to pdev->vpci destruction and does
> not access multiple pdevs at the same time, can still use a
> combination of the read lock and pdev->vpci->lock.
>
> 3. Drop const qualifier where the new rwlock is used and this is
> appropriate.
>
> 4. Do not call process_pending_softirqs with any locks held. For that
> unlock prior the call and re-acquire the locks after. After
> re-acquiring the lock there is no need to check if pdev->vpci exists:
>  - in apply_map because of the context it is called (no race condition
>    possible)
>  - for MSI/MSI-X debug code because it is called at the end of
>    pdev->vpci access and no further access to pdev->vpci is made
>
> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev()
> while accessing pdevs in vpci code.
>
> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs
> do not go away. The vPCI functions call several MSI-related functions
> which already have existing non-vPCI callers. Change those MSI-related
> functions to allow using either pcidevs_lock() or d->pci_lock for
> ensuring pdevs do not go away. Holding d->pci_lock in read mode is
> sufficient. Note that this pdev protection mechanism does not protect
> other state or critical sections. These MSI-related functions already
> have other race condition and state protection mechanims (e.g.
> d->event_lock and msixtbl RCU), so we deduce that the use of the global
> pcidevs_lock() is to ensure that pdevs do not go away.
>
> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
> that pdevs do not go away. The purpose of this wrapper is to aid
> readability and document the intent of the pdev protection mechanism.
>
> 8. When possible, the existing non-vPCI callers of these MSI-related
> functions haven't been switched to use the newly introduced per-domain
> pci_lock, and will continue to use the global pcidevs_lock(). This is
> done to reduce the risk of the new locking scheme introducing
> regressions. Those users will be adjusted in due time. One exception
> is where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to
> the caller, physdev_map_pirq(): this instance is switched to
> read_lock(&d->pci_lock) right away.
>
> Suggested-by: Roger Pau Monné
> Suggested-by: Jan Beulich
> Signed-off-by: Oleksandr Andrushchenko
> Signed-off-by: Volodymyr Babchuk
> Signed-off-by: Stewart Hildebrand

Acked-by: Jan Beulich
with two small remaining remarks (below) and on the assumption that an
R-b from Roger in particular for the vPCI code is going to turn up
eventually.

> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>  {
>      unsigned int i;
>
> +    /*
> +     * Assert that pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
> +     * is not suitable here because it may allow either pcidevs_lock() or
> +     * pci_lock to be held, but here we rely on pci_lock being held, not
> +     * pcidevs_lock().
> +     */
> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));

As to the comment, I think it's not really "may". I also think referral
to ...

> @@ -913,13 +918,23 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>              struct pci_dev *pdev = msix->pdev;
>
>              spin_unlock(&msix->pdev->vpci->lock);
> +            read_unlock(&pdev->domain->pci_lock);
>              process_pending_softirqs();
> +
> +            if ( !read_trylock(&pdev->domain->pci_lock) )
> +                return -EBUSY;
> +
>              /* NB: we assume that pdev cannot go away for an alive domain. */
>              if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
> +            {
> +                read_unlock(&pdev->domain->pci_lock);
>                  return -EBUSY;
> +            }
> +
[RFC PATCH 66/73] x86/pvm: Use new cpu feature to describe XENPV and PVM
From: Hou Wenlong Some PVOPS are patched as the native version directly if the guest is not a XENPV guest. However, this approach will not work after introducing a PVM guest. To address this, use a new CPU feature to describe XENPV and PVM, and ensure that those PVOPS are patched only when it is not a paravirtual guest. Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/entry/entry_64.S | 5 ++--- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/paravirt.h| 14 +++--- arch/x86/kernel/pvm.c | 1 + arch/x86/xen/enlighten_pv.c| 1 + 5 files changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index fe12605b3c05..6b41a1837698 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -127,9 +127,8 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL) * In the PVM guest case we must use eretu synthetic instruction. */ - ALTERNATIVE_2 "testb %al, %al; jz swapgs_restore_regs_and_return_to_usermode", \ - "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV, \ - "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_KVM_PVM_GUEST + ALTERNATIVE "testb %al, %al; jz swapgs_restore_regs_and_return_to_usermode", \ + "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_PV_GUEST /* * We win! 
This label is here just for ease of understanding diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index e17e72f13423..72ef58a2db19 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -238,6 +238,7 @@ #define X86_FEATURE_VCPUPREEMPT( 8*32+21) /* "" PV vcpu_is_preempted function */ #define X86_FEATURE_TDX_GUEST ( 8*32+22) /* Intel Trust Domain Extensions Guest */ #define X86_FEATURE_KVM_PVM_GUEST ( 8*32+23) /* KVM Pagetable-based Virtual Machine guest */ +#define X86_FEATURE_PV_GUEST ( 8*32+24) /* "" Paravirtual guest */ /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */ #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/ diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index deaee9ec575e..a864ee481ca2 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -143,7 +143,7 @@ static __always_inline unsigned long read_cr2(void) { return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2, "mov %%cr2, %%rax;", - ALT_NOT(X86_FEATURE_XENPV)); + ALT_NOT(X86_FEATURE_PV_GUEST)); } static __always_inline void write_cr2(unsigned long x) @@ -154,13 +154,13 @@ static __always_inline void write_cr2(unsigned long x) static inline unsigned long __read_cr3(void) { return PVOP_ALT_CALL0(unsigned long, mmu.read_cr3, - "mov %%cr3, %%rax;", ALT_NOT(X86_FEATURE_XENPV)); + "mov %%cr3, %%rax;", ALT_NOT(X86_FEATURE_PV_GUEST)); } static inline void write_cr3(unsigned long x) { PVOP_ALT_VCALL1(mmu.write_cr3, x, - "mov %%rdi, %%cr3", ALT_NOT(X86_FEATURE_XENPV)); + "mov %%rdi, %%cr3", ALT_NOT(X86_FEATURE_PV_GUEST)); } static inline void __write_cr4(unsigned long x) @@ -694,17 +694,17 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu); static __always_inline unsigned long arch_local_save_flags(void) { return PVOP_ALT_CALLEE0(unsigned long, irq.save_fl, "pushf; pop %%rax;", - 
ALT_NOT(X86_FEATURE_XENPV)); + ALT_NOT(X86_FEATURE_PV_GUEST)); } static __always_inline void arch_local_irq_disable(void) { - PVOP_ALT_VCALLEE0(irq.irq_disable, "cli;", ALT_NOT(X86_FEATURE_XENPV)); + PVOP_ALT_VCALLEE0(irq.irq_disable, "cli;", ALT_NOT(X86_FEATURE_PV_GUEST)); } static __always_inline void arch_local_irq_enable(void) { - PVOP_ALT_VCALLEE0(irq.irq_enable, "sti;", ALT_NOT(X86_FEATURE_XENPV)); + PVOP_ALT_VCALLEE0(irq.irq_enable, "sti;", ALT_NOT(X86_FEATURE_PV_GUEST)); } static __always_inline unsigned long arch_local_irq_save(void) @@ -776,7 +776,7 @@ void native_pv_lock_init(void) __init; .endm #define SAVE_FLAGS ALTERNATIVE "PARA_IRQ_save_fl;", "pushf; pop %rax;", \ - ALT_NOT(X86_FEATURE_XENPV) + ALT_NOT(X86_FEATURE_PV_GUEST) #endif #endif /* CONFIG_PARAVIRT_XXL */ #endif /* CONFIG_X86_64 */ diff --git a/arch/x86/kernel/pvm.c b/arch/x86/kernel/pvm.c index c38e46a96ad3..d39550a8159f 100644 --- a/arch/x86/kernel/pvm.c +++ b/arch/x86/kernel/pvm.c @@ -300,6 +300,7 @@ void __init pvm_early_setup(void)
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On Mon, 2024-02-26 at 15:20 +0100, Jan Beulich wrote: > On 26.02.2024 13:58, Oleksii wrote: > > On Mon, 2024-02-26 at 12:28 +0100, Jan Beulich wrote: > > > On 26.02.2024 12:18, Oleksii wrote: > > > > On Mon, 2024-02-26 at 10:45 +0100, Jan Beulich wrote: > > > > > On 23.02.2024 13:23, Oleksii wrote: > > > > > > > > > > > > > > > > > As 1- and 2-byte cases are emulated I decided that > > > > > > > > > > is > > > > > > > > > > not > > > > > > > > > > to > > > > > > > > > > provide > > > > > > > > > > sfx argument for emulation macros as it will not > > > > > > > > > > have > > > > > > > > > > to > > > > > > > > > > much > > > > > > > > > > affect on > > > > > > > > > > emulated types and just consume more performance on > > > > > > > > > > acquire > > > > > > > > > > and > > > > > > > > > > release > > > > > > > > > > version of sc/ld instructions. > > > > > > > > > > > > > > > > > > Question is whether the common case (4- and 8-byte > > > > > > > > > accesses) > > > > > > > > > shouldn't > > > > > > > > > be valued higher, with 1- and 2-byte emulation being > > > > > > > > > there > > > > > > > > > just > > > > > > > > > to > > > > > > > > > allow things to not break altogether. > > > > > > > > If I understand you correctly, it would make sense to > > > > > > > > add > > > > > > > > the > > > > > > > > 'sfx' > > > > > > > > argument for the 1/2-byte access case, ensuring that > > > > > > > > all > > > > > > > > options > > > > > > > > are > > > > > > > > available for 1/2-byte access case as well. > > > > > > > > > > > > > > That's one of the possibilities. As said, I'm not overly > > > > > > > worried > > > > > > > about > > > > > > > the emulated cases. For the initial implementation I'd > > > > > > > recommend > > > > > > > going > > > > > > > with what is easiest there, yielding the best possible > > > > > > > result > > > > > > > for > > > > > > > the > > > > > > > 4- and 8-byte cases. 
If later it turns out repeated > > > > > > > acquire/release > > > > > > > accesses are a problem in the emulation loop, things can > > > > > > > be > > > > > > > changed > > > > > > > to explicit barriers, without touching the 4- and 8-byte > > > > > > > cases. > > > > > > I am confused then a little bit if emulated case is not an > > > > > > issue. > > > > > > > > > > > > For 4- and 8-byte cases for xchg .aqrl is used, for relaxed > > > > > > and > > > > > > aqcuire > > > > > > version of xchg barries are used. > > > > > > > > > > > > The similar is done for cmpxchg. > > > > > > > > > > > > If something will be needed to change in emulation loop it > > > > > > won't > > > > > > require to change 4- and 8-byte cases. > > > > > > > > > > I'm afraid I don't understand your reply. > > > > IIUC, emulated cases it is implemented correctly in terms of > > > > usage > > > > barriers. And it also OK not to use sfx for lr/sc instructions > > > > and > > > > use > > > > only barriers. > > > > > > > > For 4- and 8-byte cases are used sfx + barrier depending on the > > > > specific case ( relaxed, acquire, release, generic xchg/cmpxchg > > > > ). > > > > What also looks to me correct. But you suggested to provide the > > > > best > > > > possible result for 4- and 8-byte cases. > > > > > > > > So I don't understand what the best possible result is as the > > > > current > > > > one usage of __{cmp}xchg_generic for each specific case ( > > > > relaxed, > > > > acquire, release, generic xchg/cmpxchg ) looks correct to me: > > > > xchg -> (..., ".aqrl", "", "") just suffix .aqrl suffix without > > > > barriers. 
> > > > xchg_release -> (..., "", RISCV_RELEASE_BARRIER, "" ) use only > > > > release > > > > barrier > > > > xchg_acquire -> (..., "", "", RISCV_ACQUIRE_BARRIER ), only > > > > acquire > > > > barrier > > > > xchg_relaxed -> (..., "", "", "") - no barries, no sfx > > > > > > So first: While explicit barriers are technically okay, I don't > > > follow why > > > you insist on using them when you can achieve the same by > > > suitably > > > tagging > > > the actual insn doing the exchange. Then second: It's somewhat > > > hard > > > for me > > > to see the final effect on the emulation paths without you > > > actually > > > having > > > done the switch. Maybe no special handling is necessary there > > > anymore > > > then. And as said, it may actually be acceptable for the > > > emulation > > > paths > > > to "only" be correct, but not be ideal in terms of performance. > > > After > > > all, > > > if you use the normal 4-byte primitive in there, more (non- > > > explicit) > > > barriers than needed would occur if the involved loop has to take > > > more > > > than one iteration. Which could (but imo doesn't need to be) > > > avoided > > > by > > > using a more relaxed 4-byte primitive there and an explicit > > > barrier > > > outside of the loop. > > > > According to the spec: > > Table A.5 ( part of the table only I copied here ) > > > > Linux Construct RVWMO Mapping > > atomic relaxed amo.{w|d
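For comparison, the four flavours under discussion (relaxed, acquire, release, fully ordered) map directly onto the memory-order arguments of the compiler's `__atomic` builtins; on RISC-V, GCC emits AMOs with the corresponding `.aq`/`.rl` suffixes for these where it can, rather than separate fence instructions. A GCC/Clang-specific sketch:

```c
#include <assert.h>

/*
 * Portable spelling of the xchg variants: the ordering is an argument
 * rather than an instruction suffix.  On RISC-V a suitably recent GCC
 * lowers these to amoswap.w / amoswap.w.aq / amoswap.w.rl /
 * amoswap.w.aqrl respectively (for 4-byte operands).
 */
static int make_xchg(int *p, int v, int order)
{
    return __atomic_exchange_n(p, v, order);
}
```

The 1- and 2-byte cases have no such single-instruction form before Zabha, which is why the thread discusses emulating them with an lr/sc loop on the containing word plus explicit barriers.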
[RFC PATCH 56/73] x86/pvm: Relocate kernel image early in PVH entry
From: Hou Wenlong For a PIE kernel, it runs in a high virtual address in the PVH entry, so it needs to relocate the kernel image early in the PVH entry for the PVM guest. Signed-off-by: Hou Wenlong Signed-off-by: Lai Jiangshan --- arch/x86/include/asm/init.h | 5 + arch/x86/kernel/head64_identity.c | 5 - arch/x86/platform/pvh/enlighten.c | 22 ++ arch/x86/platform/pvh/head.S | 4 4 files changed, 31 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h index cc9ccf61b6bd..f78edef60253 100644 --- a/arch/x86/include/asm/init.h +++ b/arch/x86/include/asm/init.h @@ -4,6 +4,11 @@ #define __head __section(".head.text") +#define SYM_ABS_VA(sym) ({ \ + unsigned long __v; \ + asm("movabsq $" __stringify(sym) ", %0":"=r"(__v)); \ + __v; }) + struct x86_mapping_info { void *(*alloc_pgt_page)(void *); /* allocate buf for page table */ void *context; /* context for alloc_pgt_page */ diff --git a/arch/x86/kernel/head64_identity.c b/arch/x86/kernel/head64_identity.c index 4e6a073d9e6c..f69f9904003c 100644 --- a/arch/x86/kernel/head64_identity.c +++ b/arch/x86/kernel/head64_identity.c @@ -82,11 +82,6 @@ static void __head set_kernel_map_base(unsigned long text_base) } #endif -#define SYM_ABS_VA(sym) ({ \ - unsigned long __v; \ - asm("movabsq $" __stringify(sym) ", %0":"=r"(__v)); \ - __v; }) - static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd) { unsigned long vaddr, vaddr_end; diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 00a92cb2c814..8c64c31c971b 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -1,8 +1,10 @@ // SPDX-License-Identifier: GPL-2.0 #include +#include #include +#include #include #include #include @@ -113,6 +115,26 @@ static void __init hypervisor_specific_init(bool xen_guest) xen_pvh_init(&pvh_bootparams); } +#ifdef CONFIG_PVM_GUEST +void pvm_relocate_kernel(unsigned long physbase); + +void __init 
pvm_update_pgtable(unsigned long physbase) +{ + pgdval_t *pgd; + pudval_t *pud; + unsigned long base; + + pvm_relocate_kernel(physbase); + + pgd = (pgdval_t *)init_top_pgt; + base = SYM_ABS_VA(_text); + pgd[pgd_index(base)] = pgd[0]; + pgd[pgd_index(page_offset_base)] = pgd[0]; + pud = (pudval_t *)level3_ident_pgt; + pud[pud_index(base)] = (unsigned long)level2_ident_pgt + _KERNPG_TABLE_NOENC; +} +#endif + /* * This routine (and those that it might call) should not use * anything that lives in .bss since that segment will be cleared later. diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S index baaa3fe34a00..127f297f7257 100644 --- a/arch/x86/platform/pvh/head.S +++ b/arch/x86/platform/pvh/head.S @@ -109,6 +109,10 @@ SYM_CODE_START_LOCAL(pvh_start_xen) wrmsr #ifdef CONFIG_X86_PIE +#ifdef CONFIG_PVM_GUEST + leaq _text(%rip), %rdi + call pvm_update_pgtable +#endif movabs $2f, %rax ANNOTATE_RETPOLINE_SAFE jmp *%rax -- 2.19.1.6.gb485710b
Re: [PATCH 3/3] x86/entry: Introduce EFRAME_* constants
On 26.02.2024 13:55, Andrew Cooper wrote: > restore_all_guest() does a lot of manipulation of the stack after popping the > GPRs, and uses raw %rsp displacements to do so. Also, almost all entrypaths > use raw %rsp displacements prior to pushing GPRs. > > Provide better mnemonics, to aid readability and reduce the chance of errors > when editing. > > No functional change. The resulting binary is identical. > > Signed-off-by: Andrew Cooper Reviewed-by: Jan Beulich with one small request: > --- a/xen/arch/x86/x86_64/asm-offsets.c > +++ b/xen/arch/x86/x86_64/asm-offsets.c > @@ -51,6 +51,23 @@ void __dummy__(void) > OFFSET(UREGS_kernel_sizeof, struct cpu_user_regs, es); > BLANK(); > > +/* > + * EFRAME_* is for the entry/exit logic where %rsp is pointing at > + * UREGS_error_code and GPRs are still guest values. > + */ "still/already" or some such to match "entry/exit"? Jan
Re: [PATCH 2/3] x86/entry: Simplify expressions in compat_restore_all_guest()
On 26.02.2024 13:55, Andrew Cooper wrote: > compat_restore_all_guest() already has SPEC_CTRL_EXIT_TO_PV with a documented > requirement for %rsp to be both regs and cpuinfo. > > Use the now-normal annotations and simplify the expressions which happen to be > a subtraction of 0. > > Signed-off-by: Andrew Cooper Reviewed-by: Jan Beulich
Re: [PATCH 1/3] x86/entry: Adjustments to "reduce assembly code size of entry points"
On 26.02.2024 13:54, Andrew Cooper wrote: > Some retroactive review, for if I'd got to the patch in time. > > * The new ASM-friendly BUILD_BUG_ON() should be in a header file. > * entry_int82() wants the movl->movb treatment too. > > Fixes: c144b9e32427 ("x86: Reduce assembly code size of entry points") > Signed-off-by: Andrew Cooper Reviewed-by: Jan Beulich > Fixes just to link the two patches. Hmm, I see. > --- a/xen/arch/x86/include/asm/asm_defns.h > +++ b/xen/arch/x86/include/asm/asm_defns.h > @@ -56,6 +56,18 @@ register unsigned long current_stack_pointer asm("rsp"); > #define ASSERT_INTERRUPTS_DISABLED \ > ASSERT_INTERRUPT_STATUS(z, "INTERRUPTS DISABLED") > > +#ifdef __ASSEMBLY__ > + > +.macro BUILD_BUG_ON condstr cond:vararg > +.if \cond > +.error "Condition \"\condstr\" not satisfied" > +.endif > +.endm > +/* preprocessor macro to make error message more user friendly */ > +#define BUILD_BUG_ON(cond) BUILD_BUG_ON #cond cond > + > +#endif /* __ASSEMBLY__ */ > + > #ifdef __ASSEMBLY__ Minor remark: Seeing these back to back is slightly odd. A little higher up there's an "#ifndef __ASSEMBLY__" (producing e.g. ASM_CALL_CONSTRAINT) in an #else of which this would imo be a pretty good fit. Further down where SUBSECTION_LBL() is defined would also look to be more suitable than adding yet another #ifdef. Jan
Re: [PATCH] tools/xentop: Add VBD3 support to xentop
On Mon, Feb 26, 2024 at 2:12 PM Fouad Hilly wrote: > > From: Pritha Srivastava > > xl now knows how to drive tapdisk, so modified libxenstat to > understand vbd3 statistics. > > Signed-off-by: Jorge Martin > Signed-off-by: Pritha Srivastava > Signed-off-by: Fouad Hilly > --- > CC: Wei Liu > CC: Anthony PERARD > CC: Juergen Gross > --- > tools/libs/stat/xenstat_linux.c | 69 - > tools/libs/stat/xenstat_priv.h | 16 > tools/xentop/xentop.c | 1 + > 3 files changed, 85 insertions(+), 1 deletion(-) > > diff --git a/tools/libs/stat/xenstat_linux.c b/tools/libs/stat/xenstat_linux.c > index cbba54aa83ee..5a4a03634182 100644 > --- a/tools/libs/stat/xenstat_linux.c > +++ b/tools/libs/stat/xenstat_linux.c > @@ -390,6 +390,39 @@ void xenstat_uninit_networks(xenstat_handle * handle) > fclose(priv->procnetdev); > } > > +static int read_attributes_vbd3(char *vbd3_path, xenstat_vbd *vbd) > +{ > + FILE *fp; > + struct vbd3_stats vbd3_stats; > + > + fp = fopen(vbd3_path, "rb"); > + > + if (fp == NULL) > + { The syntax of this if statement is different from the ones below. 
> + return -1; > + } > + > + if (fread(&vbd3_stats, sizeof(struct vbd3_stats), 1, fp) != 1) { > + fclose(fp); > + return -1; > + } > + > + if (vbd3_stats.version != 1) { > + fclose(fp); > + return -1; > + } > + > + vbd->oo_reqs = vbd3_stats.oo_reqs; > + vbd->rd_reqs = vbd3_stats.read_reqs_submitted; > + vbd->rd_sects = vbd3_stats.read_sectors; > + vbd->wr_reqs = vbd3_stats.write_reqs_submitted; > + vbd->wr_sects = vbd3_stats.write_sectors; > + > + fclose(fp); > + > + return 0; > +} > + > static int read_attributes_vbd(const char *vbd_directory, const char *what, > char *ret, int cap) > { > static char file_name[80]; > @@ -438,7 +471,7 @@ int xenstat_collect_vbds(xenstat_node * node) > int ret; > char buf[256]; > > - ret = sscanf(dp->d_name, "%3s-%u-%u", buf, &domid, &vbd.dev); > + ret = sscanf(dp->d_name, "%255[^-]-%u-%u", buf, &domid, > &vbd.dev); > if (ret != 3) > continue; > if (!(strstr(buf, "vbd")) && !(strstr(buf, "tap"))) > @@ -448,6 +481,8 @@ int xenstat_collect_vbds(xenstat_node * node) > vbd.back_type = 1; > else if (strcmp(buf,"tap") == 0) > vbd.back_type = 2; > + else if (strcmp(buf,"vbd3") == 0) > + vbd.back_type = 3; > else > vbd.back_type = 0; > > @@ -479,6 +514,38 @@ int xenstat_collect_vbds(xenstat_node * node) > vbd.error = 1; > } > } > + else if (vbd.back_type == 3) > + { > + > + char *td3_pid; > + char *path; > + > + vbd.back_type = 3; > + vbd.error = 0; > + > + if (asprintf(&path, > "/local/domain/0/backend/vbd3/%u/%u/kthread-pid", domid, vbd.dev) < 0) > + continue; > + > + td3_pid = xs_read(node->handle->xshandle, XBT_NULL, > path, NULL); > + > + if (td3_pid == NULL) { > + free(path); > + continue; > + } > + > + free(path); Why not freeing "path" unconditionally? We free it in all cases. 
> + > + if (asprintf(&path, "/dev/shm/td3-%s/vbd-%u-%u", > td3_pid, domid, vbd.dev) < 0) { > + free(td3_pid); > + continue; > + } > + > + if (read_attributes_vbd3(path, &vbd) < 0) > + vbd.error = 1; > + > + free(td3_pid); > + free(path); > + } > else > { > vbd.error = 1; > diff --git a/tools/libs/stat/xenstat_priv.h b/tools/libs/stat/xenstat_priv.h > index 4eb44a8ebb84..c3a9635240e9 100644 > --- a/tools/libs/stat/xenstat_priv.h > +++ b/tools/libs/stat/xenstat_priv.h > @@ -98,6 +98,22 @@ struct xenstat_vbd { > unsigned long long wr_sects; > }; > > +struct vbd3_stats { > + uint32_t version; > + uint32_t __pad; > + uint64_t oo_reqs; > + uint64_t read_reqs_submitted; > + uint64_t read_reqs_completed; > + uint64_t read_sectors; > + uint64_t read_total_ticks; > + uint64_t write_reqs_submitted; > + uint64_t write_reqs_completed; > + uint64_t write_sectors; > + uint64_t write_total_ticks; > + uint64_t io_errors; > +
Re: [PATCH] tools/xentop: Add VBD3 support to xentop
On 26.02.2024 15:12, Fouad Hilly wrote: > From: Pritha Srivastava > > xl now knows how to drive tapdisk, so modified libxenstat to > understand vbd3 statistics. > > Signed-off-by: Jorge Martin > Signed-off-by: Pritha Srivastava Just a formal question (I'm not really qualified to review this change): With the above two S-o-b and the earlier From: - who is the original author of this patch? The general expectation is for the 1st S-o-b to be the author's. Jan > Signed-off-by: Fouad Hilly > --- > CC: Wei Liu > CC: Anthony PERARD > CC: Juergen Gross > --- > tools/libs/stat/xenstat_linux.c | 69 - > tools/libs/stat/xenstat_priv.h | 16 > tools/xentop/xentop.c | 1 + > 3 files changed, 85 insertions(+), 1 deletion(-) > > diff --git a/tools/libs/stat/xenstat_linux.c b/tools/libs/stat/xenstat_linux.c > index cbba54aa83ee..5a4a03634182 100644 > --- a/tools/libs/stat/xenstat_linux.c > +++ b/tools/libs/stat/xenstat_linux.c > @@ -390,6 +390,39 @@ void xenstat_uninit_networks(xenstat_handle * handle) > fclose(priv->procnetdev); > } > > +static int read_attributes_vbd3(char *vbd3_path, xenstat_vbd *vbd) > +{ > + FILE *fp; > + struct vbd3_stats vbd3_stats; > + > + fp = fopen(vbd3_path, "rb"); > + > + if (fp == NULL) > + { > + return -1; > + } > + > + if (fread(&vbd3_stats, sizeof(struct vbd3_stats), 1, fp) != 1) { > + fclose(fp); > + return -1; > + } > + > + if (vbd3_stats.version != 1) { > + fclose(fp); > + return -1; > + } > + > + vbd->oo_reqs = vbd3_stats.oo_reqs; > + vbd->rd_reqs = vbd3_stats.read_reqs_submitted; > + vbd->rd_sects = vbd3_stats.read_sectors; > + vbd->wr_reqs = vbd3_stats.write_reqs_submitted; > + vbd->wr_sects = vbd3_stats.write_sectors; > + > + fclose(fp); > + > + return 0; > +} > + > static int read_attributes_vbd(const char *vbd_directory, const char *what, > char *ret, int cap) > { > static char file_name[80]; > @@ -438,7 +471,7 @@ int xenstat_collect_vbds(xenstat_node * node) > int ret; > char buf[256]; > > - ret = sscanf(dp->d_name, "%3s-%u-%u", buf, &domid, 
&vbd.dev); > + ret = sscanf(dp->d_name, "%255[^-]-%u-%u", buf, &domid, > &vbd.dev); > if (ret != 3) > continue; > if (!(strstr(buf, "vbd")) && !(strstr(buf, "tap"))) > @@ -448,6 +481,8 @@ int xenstat_collect_vbds(xenstat_node * node) > vbd.back_type = 1; > else if (strcmp(buf,"tap") == 0) > vbd.back_type = 2; > + else if (strcmp(buf,"vbd3") == 0) > + vbd.back_type = 3; > else > vbd.back_type = 0; > > @@ -479,6 +514,38 @@ int xenstat_collect_vbds(xenstat_node * node) > vbd.error = 1; > } > } > + else if (vbd.back_type == 3) > + { > + > + char *td3_pid; > + char *path; > + > + vbd.back_type = 3; > + vbd.error = 0; > + > + if (asprintf(&path, > "/local/domain/0/backend/vbd3/%u/%u/kthread-pid", domid, vbd.dev) < 0) > + continue; > + > + td3_pid = xs_read(node->handle->xshandle, XBT_NULL, > path, NULL); > + > + if (td3_pid == NULL) { > + free(path); > + continue; > + } > + > + free(path); > + > + if (asprintf(&path, "/dev/shm/td3-%s/vbd-%u-%u", > td3_pid, domid, vbd.dev) < 0) { > + free(td3_pid); > + continue; > + } > + > + if (read_attributes_vbd3(path, &vbd) < 0) > + vbd.error = 1; > + > + free(td3_pid); > + free(path); > + } > else > { > vbd.error = 1; > diff --git a/tools/libs/stat/xenstat_priv.h b/tools/libs/stat/xenstat_priv.h > index 4eb44a8ebb84..c3a9635240e9 100644 > --- a/tools/libs/stat/xenstat_priv.h > +++ b/tools/libs/stat/xenstat_priv.h > @@ -98,6 +98,22 @@ struct xenstat_vbd { > unsigned long long wr_sects; > }; > > +struct vbd3_stats { > + uint32_t version; > + uint32_t __pad; > + uint64_t oo_reqs; > + uint64_t read_reqs_submitted; > + uint64_t read_reqs_completed; > + uint64_t read_sectors; > + uint64_t read_total_ticks; > + uint64_t write_reqs_submitted; > + uint64_t write_reqs_completed; > + uint64_t write_sectors; > + uint64_t write_total_ticks; > + uint64_t io_errors; > + uint64_t flags; > +}; > + > extern int xenstat_collect_net
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 26.02.2024 13:58, Oleksii wrote: > On Mon, 2024-02-26 at 12:28 +0100, Jan Beulich wrote: >> On 26.02.2024 12:18, Oleksii wrote: >>> On Mon, 2024-02-26 at 10:45 +0100, Jan Beulich wrote: On 23.02.2024 13:23, Oleksii wrote: >> > As 1- and 2-byte cases are emulated I decided that is > not > to > provide > sfx argument for emulation macros as it will not have > to > much > affect on > emulated types and just consume more performance on > acquire > and > release > version of sc/ld instructions. Question is whether the common case (4- and 8-byte accesses) shouldn't be valued higher, with 1- and 2-byte emulation being there just to allow things to not break altogether. >>> If I understand you correctly, it would make sense to add >>> the >>> 'sfx' >>> argument for the 1/2-byte access case, ensuring that all >>> options >>> are >>> available for 1/2-byte access case as well. >> >> That's one of the possibilities. As said, I'm not overly >> worried >> about >> the emulated cases. For the initial implementation I'd >> recommend >> going >> with what is easiest there, yielding the best possible result >> for >> the >> 4- and 8-byte cases. If later it turns out repeated >> acquire/release >> accesses are a problem in the emulation loop, things can be >> changed >> to explicit barriers, without touching the 4- and 8-byte >> cases. > I am confused then a little bit if emulated case is not an > issue. > > For 4- and 8-byte cases for xchg .aqrl is used, for relaxed and > aqcuire > version of xchg barries are used. > > The similar is done for cmpxchg. > > If something will be needed to change in emulation loop it > won't > require to change 4- and 8-byte cases. I'm afraid I don't understand your reply. >>> IIUC, emulated cases it is implemented correctly in terms of usage >>> barriers. And it also OK not to use sfx for lr/sc instructions and >>> use >>> only barriers. 
>>> >>> For 4- and 8-byte cases are used sfx + barrier depending on the >>> specific case ( relaxed, acquire, release, generic xchg/cmpxchg ). >>> What also looks to me correct. But you suggested to provide the >>> best >>> possible result for 4- and 8-byte cases. >>> >>> So I don't understand what the best possible result is as the >>> current >>> one usage of __{cmp}xchg_generic for each specific case ( relaxed, >>> acquire, release, generic xchg/cmpxchg ) looks correct to me: >>> xchg -> (..., ".aqrl", "", "") just suffix .aqrl suffix without >>> barriers. >>> xchg_release -> (..., "", RISCV_RELEASE_BARRIER, "" ) use only >>> release >>> barrier >>> xchg_acquire -> (..., "", "", RISCV_ACQUIRE_BARRIER ), only acquire >>> barrier >>> xchg_relaxed -> (..., "", "", "") - no barries, no sfx >> >> So first: While explicit barriers are technically okay, I don't >> follow why >> you insist on using them when you can achieve the same by suitably >> tagging >> the actual insn doing the exchange. Then second: It's somewhat hard >> for me >> to see the final effect on the emulation paths without you actually >> having >> done the switch. Maybe no special handling is necessary there anymore >> then. And as said, it may actually be acceptable for the emulation >> paths >> to "only" be correct, but not be ideal in terms of performance. After >> all, >> if you use the normal 4-byte primitive in there, more (non-explicit) >> barriers than needed would occur if the involved loop has to take >> more >> than one iteration. Which could (but imo doesn't need to be) avoided >> by >> using a more relaxed 4-byte primitive there and an explicit barrier >> outside of the loop. 
> > According to the spec: > Table A.5 ( part of the table only I copied here ) > > Linux Construct RVWMO Mapping > atomic relaxed amo.{w|d} > atomic acquire amo.{w|d}.aq > atomic release amo.{w|d}.rl > atomic amo.{w|d}.aqrl > > Linux Construct RVWMO LR/SC Mapping > atomic relaxed loop: lr.{w|d}; ; sc.{w|d}; bnez loop > atomic acquire loop: lr.{w|d}.aq; ; sc.{w|d}; bnez loop > atomic release loop: lr.{w|d}; ; sc.{w|d}.aqrl∗ ; bnez > loop OR > fence.tso; loop: lr.{w|d}; ; sc.{w|d}∗ ; > bnez loop > atomic loop: lr.{w|d}.aq; ; sc.{w|d}.aqrl; bnez > loop In your consideration what to implement you will want to limit things to constructs we actually use. I can't find any use of the relaxed, acquire, or release forms of atomics as mentioned above. > The Linux mappings for release operations may seem stronger than > necessary, but these mappings > are needed to cover some cases in which Linux requires stronger > orderings than the more in
[PATCH] tools/xentop: Add VBD3 support to xentop
From: Pritha Srivastava xl now knows how to drive tapdisk, so modified libxenstat to understand vbd3 statistics. Signed-off-by: Jorge Martin Signed-off-by: Pritha Srivastava Signed-off-by: Fouad Hilly --- CC: Wei Liu CC: Anthony PERARD CC: Juergen Gross --- tools/libs/stat/xenstat_linux.c | 69 - tools/libs/stat/xenstat_priv.h | 16 tools/xentop/xentop.c | 1 + 3 files changed, 85 insertions(+), 1 deletion(-) diff --git a/tools/libs/stat/xenstat_linux.c b/tools/libs/stat/xenstat_linux.c index cbba54aa83ee..5a4a03634182 100644 --- a/tools/libs/stat/xenstat_linux.c +++ b/tools/libs/stat/xenstat_linux.c @@ -390,6 +390,39 @@ void xenstat_uninit_networks(xenstat_handle * handle) fclose(priv->procnetdev); } +static int read_attributes_vbd3(char *vbd3_path, xenstat_vbd *vbd) +{ + FILE *fp; + struct vbd3_stats vbd3_stats; + + fp = fopen(vbd3_path, "rb"); + + if (fp == NULL) + { + return -1; + } + + if (fread(&vbd3_stats, sizeof(struct vbd3_stats), 1, fp) != 1) { + fclose(fp); + return -1; + } + + if (vbd3_stats.version != 1) { + fclose(fp); + return -1; + } + + vbd->oo_reqs = vbd3_stats.oo_reqs; + vbd->rd_reqs = vbd3_stats.read_reqs_submitted; + vbd->rd_sects = vbd3_stats.read_sectors; + vbd->wr_reqs = vbd3_stats.write_reqs_submitted; + vbd->wr_sects = vbd3_stats.write_sectors; + + fclose(fp); + + return 0; +} + static int read_attributes_vbd(const char *vbd_directory, const char *what, char *ret, int cap) { static char file_name[80]; @@ -438,7 +471,7 @@ int xenstat_collect_vbds(xenstat_node * node) int ret; char buf[256]; - ret = sscanf(dp->d_name, "%3s-%u-%u", buf, &domid, &vbd.dev); + ret = sscanf(dp->d_name, "%255[^-]-%u-%u", buf, &domid, &vbd.dev); if (ret != 3) continue; if (!(strstr(buf, "vbd")) && !(strstr(buf, "tap"))) @@ -448,6 +481,8 @@ int xenstat_collect_vbds(xenstat_node * node) vbd.back_type = 1; else if (strcmp(buf,"tap") == 0) vbd.back_type = 2; + else if (strcmp(buf,"vbd3") == 0) + vbd.back_type = 3; else vbd.back_type = 0; @@ -479,6 +514,38 @@ int 
xenstat_collect_vbds(xenstat_node * node) vbd.error = 1; } } + else if (vbd.back_type == 3) + { + + char *td3_pid; + char *path; + + vbd.back_type = 3; + vbd.error = 0; + + if (asprintf(&path, "/local/domain/0/backend/vbd3/%u/%u/kthread-pid", domid, vbd.dev) < 0) + continue; + + td3_pid = xs_read(node->handle->xshandle, XBT_NULL, path, NULL); + + if (td3_pid == NULL) { + free(path); + continue; + } + + free(path); + + if (asprintf(&path, "/dev/shm/td3-%s/vbd-%u-%u", td3_pid, domid, vbd.dev) < 0) { + free(td3_pid); + continue; + } + + if (read_attributes_vbd3(path, &vbd) < 0) + vbd.error = 1; + + free(td3_pid); + free(path); + } else { vbd.error = 1; diff --git a/tools/libs/stat/xenstat_priv.h b/tools/libs/stat/xenstat_priv.h index 4eb44a8ebb84..c3a9635240e9 100644 --- a/tools/libs/stat/xenstat_priv.h +++ b/tools/libs/stat/xenstat_priv.h @@ -98,6 +98,22 @@ struct xenstat_vbd { unsigned long long wr_sects; }; +struct vbd3_stats { + uint32_t version; + uint32_t __pad; + uint64_t oo_reqs; + uint64_t read_reqs_submitted; + uint64_t read_reqs_completed; + uint64_t read_sectors; + uint64_t read_total_ticks; + uint64_t write_reqs_submitted; + uint64_t write_reqs_completed; + uint64_t write_sectors; + uint64_t write_total_ticks; + uint64_t io_errors; + uint64_t flags; +}; + extern int xenstat_collect_networks(xenstat_node * node); extern void xenstat_uninit_networks(xenstat_handle * handle); extern int xenstat_collect_vbds(xenstat_node * node); diff --git a/tools/xentop/xentop.c b/tools/xentop/xentop.c index 0a2fab7f15a3..f5a456fd4dfd 100644 --- a/tools/xentop/xentop.c +++ b/tools/xentop/xentop.c @@ -1124,6 +1124,7 @@ void do_vbd(xenstat_domain *domain) "Unidentified", /* number 0
Re: [PATCH v2] x86/HVM: limit upcall vector related verbosity
On 18/12/2023 7:26 am, Jan Beulich wrote: > Avoid logging all-identical messages for every vCPU, but make sure to > log unusual events like the vector differing from vCPU 0's (note that > the respective condition also makes sure vCPU 0 itself will have the > vector setting logged), or it changing after it was once set. (Arguably > a downside is that some vCPU not having its vector set would no longer > be recognizable from the logs. But I think that's tolerable as > sufficiently unlikely outside of people actively fiddling with related > code.) > > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH v2] x86: Dom0 "broken ELF" reporting adjustments
On 22/01/2024 1:41 pm, Jan Beulich wrote: > elf_load_binary() isn't the primary source of brokenness being > indicated. Therefore make the respective PVH log message there > conditional (much like PV has it), and add another instance when > elf_xen_parse() failed (again matching behavior in the PV case). > > Make the PV side match the (new) use of %pd here. > > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On Mon, 2024-02-26 at 12:28 +0100, Jan Beulich wrote: > On 26.02.2024 12:18, Oleksii wrote: > > On Mon, 2024-02-26 at 10:45 +0100, Jan Beulich wrote: > > > On 23.02.2024 13:23, Oleksii wrote: > > > > > > > > > > > > > As 1- and 2-byte cases are emulated I decided that is > > > > > > > > not > > > > > > > > to > > > > > > > > provide > > > > > > > > sfx argument for emulation macros as it will not have > > > > > > > > to > > > > > > > > much > > > > > > > > affect on > > > > > > > > emulated types and just consume more performance on > > > > > > > > acquire > > > > > > > > and > > > > > > > > release > > > > > > > > version of sc/ld instructions. > > > > > > > > > > > > > > Question is whether the common case (4- and 8-byte > > > > > > > accesses) > > > > > > > shouldn't > > > > > > > be valued higher, with 1- and 2-byte emulation being > > > > > > > there > > > > > > > just > > > > > > > to > > > > > > > allow things to not break altogether. > > > > > > If I understand you correctly, it would make sense to add > > > > > > the > > > > > > 'sfx' > > > > > > argument for the 1/2-byte access case, ensuring that all > > > > > > options > > > > > > are > > > > > > available for 1/2-byte access case as well. > > > > > > > > > > That's one of the possibilities. As said, I'm not overly > > > > > worried > > > > > about > > > > > the emulated cases. For the initial implementation I'd > > > > > recommend > > > > > going > > > > > with what is easiest there, yielding the best possible result > > > > > for > > > > > the > > > > > 4- and 8-byte cases. If later it turns out repeated > > > > > acquire/release > > > > > accesses are a problem in the emulation loop, things can be > > > > > changed > > > > > to explicit barriers, without touching the 4- and 8-byte > > > > > cases. > > > > I am confused then a little bit if emulated case is not an > > > > issue. 
> > > > > > > > For 4- and 8-byte cases for xchg .aqrl is used, for relaxed and > > > > aqcuire > > > > version of xchg barries are used. > > > > > > > > The similar is done for cmpxchg. > > > > > > > > If something will be needed to change in emulation loop it > > > > won't > > > > require to change 4- and 8-byte cases. > > > > > > I'm afraid I don't understand your reply. > > IIUC, emulated cases it is implemented correctly in terms of usage > > barriers. And it also OK not to use sfx for lr/sc instructions and > > use > > only barriers. > > > > For 4- and 8-byte cases are used sfx + barrier depending on the > > specific case ( relaxed, acquire, release, generic xchg/cmpxchg ). > > What also looks to me correct. But you suggested to provide the > > best > > possible result for 4- and 8-byte cases. > > > > So I don't understand what the best possible result is as the > > current > > one usage of __{cmp}xchg_generic for each specific case ( relaxed, > > acquire, release, generic xchg/cmpxchg ) looks correct to me: > > xchg -> (..., ".aqrl", "", "") just suffix .aqrl suffix without > > barriers. > > xchg_release -> (..., "", RISCV_RELEASE_BARRIER, "" ) use only > > release > > barrier > > xchg_acquire -> (..., "", "", RISCV_ACQUIRE_BARRIER ), only acquire > > barrier > > xchg_relaxed -> (..., "", "", "") - no barries, no sfx > > So first: While explicit barriers are technically okay, I don't > follow why > you insist on using them when you can achieve the same by suitably > tagging > the actual insn doing the exchange. Then second: It's somewhat hard > for me > to see the final effect on the emulation paths without you actually > having > done the switch. Maybe no special handling is necessary there anymore > then. And as said, it may actually be acceptable for the emulation > paths > to "only" be correct, but not be ideal in terms of performance. 
After > all, > if you use the normal 4-byte primitive in there, more (non-explicit) > barriers than needed would occur if the involved loop has to take > more > than one iteration. Which could (but imo doesn't need to be) avoided > by > using a more relaxed 4-byte primitive there and an explicit barrier > outside of the loop. According to the spec: Table A.5 ( part of the table only I copied here ) Linux Construct RVWMO Mapping atomic relaxed amo.{w|d} atomic acquire amo.{w|d}.aq atomic release amo.{w|d}.rl atomic amo.{w|d}.aqrl Linux Construct RVWMO LR/SC Mapping atomic relaxed loop: lr.{w|d}; ; sc.{w|d}; bnez loop atomic acquire loop: lr.{w|d}.aq; ; sc.{w|d}; bnez loop atomic release loop: lr.{w|d}; ; sc.{w|d}.aqrl∗ ; bnez loop OR fence.tso; loop: lr.{w|d}; ; sc.{w|d}∗ ; bnez loop atomic loop: lr.{w|d}.aq; ; sc.{w|d}.aqrl; bnez loop The Linux mappings for release operations may seem stronger than necessary, but these mappings are needed to cover some cases in which Linux requires stronger orderings than the more intuitive mappings would provide. In particular, as of the time t
[xen-unstable test] 184763: tolerable FAIL
flight 184763 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/184763/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail in 184756 pass in 184763 test-armhf-armhf-xl-credit2 8 xen-boot fail pass in 184756 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-credit2 15 migrate-support-check fail in 184756 never pass test-armhf-armhf-xl-credit2 16 saverestore-support-check fail in 184756 never pass test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 184756 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184756 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184756 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184756 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184756 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 184756 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184756 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 184756 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184756 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184756 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184756 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 184756 test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass 
test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass 
test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass version targeted for testing: xen 92babc88f67ed0ef3dc575a8b9534040274678ee baseline version: xen 92babc88f67ed0ef3dc575a8b9534040274678ee Last test of basis 184763 2024-02-26 01:52:19 Z
[PATCH 1/3] x86/entry: Adjustments to "reduce assembly code size of entry points"
Some retroactive review, for if I'd got to the patch in time. * The new ASM-friendly BUILD_BUG_ON() should be in a header file. * entry_int82() wants the movl->movb treatment too. Fixes: c144b9e32427 ("x86: Reduce assembly code size of entry points") Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Wei Liu Fixes just to link the two patches. --- xen/arch/x86/include/asm/asm_defns.h | 12 xen/arch/x86/x86_64/compat/entry.S | 2 +- xen/arch/x86/x86_64/entry.S | 8 3 files changed, 13 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/include/asm/asm_defns.h b/xen/arch/x86/include/asm/asm_defns.h index a9a6c21c76cd..f18a11b36198 100644 --- a/xen/arch/x86/include/asm/asm_defns.h +++ b/xen/arch/x86/include/asm/asm_defns.h @@ -56,6 +56,18 @@ register unsigned long current_stack_pointer asm("rsp"); #define ASSERT_INTERRUPTS_DISABLED \ ASSERT_INTERRUPT_STATUS(z, "INTERRUPTS DISABLED") +#ifdef __ASSEMBLY__ + +.macro BUILD_BUG_ON condstr cond:vararg +.if \cond +.error "Condition \"\condstr\" not satisfied" +.endif +.endm +/* preprocessor macro to make error message more user friendly */ +#define BUILD_BUG_ON(cond) BUILD_BUG_ON #cond cond + +#endif /* __ASSEMBLY__ */ + #ifdef __ASSEMBLY__ # define _ASM_EX(p) p-. #else diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S index d4f0e4804090..93fbbeb4ae18 100644 --- a/xen/arch/x86/x86_64/compat/entry.S +++ b/xen/arch/x86/x86_64/compat/entry.S @@ -15,7 +15,7 @@ FUNC(entry_int82) ENDBR64 ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP pushq $0 -movl $HYPERCALL_VECTOR, 4(%rsp) +movb $HYPERCALL_VECTOR, 4(%rsp) SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. 
*/ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */ diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S index a7bd8f0ca5b1..f8938b0b42fd 100644 --- a/xen/arch/x86/x86_64/entry.S +++ b/xen/arch/x86/x86_64/entry.S @@ -22,14 +22,6 @@ #endif .endm -.macro BUILD_BUG_ON condstr cond:vararg -.if \cond -.error "Condition \"\condstr\" not satisfied" -.endif -.endm -/* preprocessor macro to make error message more user friendly */ -#define BUILD_BUG_ON(cond) BUILD_BUG_ON #cond cond - #ifdef CONFIG_PV /* %rbx: struct vcpu */ FUNC_LOCAL(switch_to_kernel) -- 2.30.2
[PATCH 2/3] x86/entry: Simplify expressions in compat_restore_all_guest()
compat_restore_all_guest() already has SPEC_CTRL_EXIT_TO_PV with a documented requirement for %rsp to be both regs and cpuinfo. Use the now-normal annotations and simplify the expressions which happen to be a subtraction of 0. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Wei Liu --- xen/arch/x86/x86_64/compat/entry.S | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S index 93fbbeb4ae18..727ab65290de 100644 --- a/xen/arch/x86/x86_64/compat/entry.S +++ b/xen/arch/x86/x86_64/compat/entry.S @@ -117,19 +117,19 @@ compat_process_trap: jmp compat_test_all_events END(compat_test_all_events) -/* %rbx: struct vcpu, interrupts disabled */ +/* %rbx: struct vcpu, interrupts disabled, %rsp=regs/cpuinfo */ FUNC(compat_restore_all_guest) ASSERT_INTERRUPTS_DISABLED mov $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), %r11d and UREGS_eflags(%rsp),%r11d -.macro alt_cr4_pv32 +.macro alt_cr4_pv32 /* %rsp=regs/cpuinfo */ testb $3,UREGS_cs(%rsp) jpe 2f -mov CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax +mov CPUINFO_cr4(%rsp), %rax and $~XEN_CR4_PV32_BITS, %rax 1: -mov %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp) +mov %rax, CPUINFO_cr4(%rsp) mov %rax, %cr4 /* * An NMI or MCE may have occurred between the previous two @@ -141,7 +141,7 @@ FUNC(compat_restore_all_guest) * loop to cause a live lock: If NMIs/MCEs occurred at that * high a rate, we'd be live locked anyway. */ -cmp %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp) +cmp %rax, CPUINFO_cr4(%rsp) jne 1b 2: .endm -- 2.30.2
[PATCH 3/3] x86/entry: Introduce EFRAME_* constants
restore_all_guest() does a lot of manipulation of the stack after popping the GPRs, and uses raw %rsp displacements to do so. Also, almost all entrypaths use raw %rsp displacements prior to pushing GPRs. Provide better mnemonics, to aid readability and reduce the chance of errors when editing. No functional change. The resulting binary is identical. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Wei Liu --- xen/arch/x86/x86_64/asm-offsets.c | 17 xen/arch/x86/x86_64/compat/entry.S | 2 +- xen/arch/x86/x86_64/entry.S| 68 +++--- 3 files changed, 52 insertions(+), 35 deletions(-) diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c index fee0edc61abb..4cc23cd032c1 100644 --- a/xen/arch/x86/x86_64/asm-offsets.c +++ b/xen/arch/x86/x86_64/asm-offsets.c @@ -51,6 +51,23 @@ void __dummy__(void) OFFSET(UREGS_kernel_sizeof, struct cpu_user_regs, es); BLANK(); +/* + * EFRAME_* is for the entry/exit logic where %rsp is pointing at + * UREGS_error_code and GPRs are still guest values. + */ +#define OFFSET_EF(sym, mem) \ +DEFINE(sym, offsetof(struct cpu_user_regs, mem) - \ +offsetof(struct cpu_user_regs, error_code)) + +OFFSET_EF(EFRAME_entry_vector,entry_vector); +OFFSET_EF(EFRAME_rip, rip); +OFFSET_EF(EFRAME_cs, cs); +OFFSET_EF(EFRAME_eflags, eflags); +OFFSET_EF(EFRAME_rsp, rsp); +BLANK(); + +#undef OFFSET_EF + OFFSET(VCPU_processor, struct vcpu, processor); OFFSET(VCPU_domain, struct vcpu, domain); OFFSET(VCPU_vcpu_info, struct vcpu, vcpu_info_area.map); diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S index 727ab65290de..2f8fe5ebfbe4 100644 --- a/xen/arch/x86/x86_64/compat/entry.S +++ b/xen/arch/x86/x86_64/compat/entry.S @@ -15,7 +15,7 @@ FUNC(entry_int82) ENDBR64 ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP pushq $0 -movb $HYPERCALL_VECTOR, 4(%rsp) +movb $HYPERCALL_VECTOR, EFRAME_entry_vector(%rsp) SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. 
*/ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */ diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S index f8938b0b42fd..1b846f3aaff0 100644 --- a/xen/arch/x86/x86_64/entry.S +++ b/xen/arch/x86/x86_64/entry.S @@ -188,15 +188,15 @@ FUNC_LOCAL(restore_all_guest) RESTORE_ALL BUILD_BUG_ON(TRAP_syscall & 0xff) -testb $TRAP_syscall >> 8, 4+1(%rsp) +testb $TRAP_syscall >> 8, EFRAME_entry_vector + 1(%rsp) jziret_exit_to_guest -movq 24(%rsp),%r11 # RFLAGS +mov EFRAME_eflags(%rsp), %r11 andq $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), %r11 orq $X86_EFLAGS_IF,%r11 /* Don't use SYSRET path if the return address is not canonical. */ -movq 8(%rsp),%rcx +mov EFRAME_rip(%rsp), %rcx sarq $47,%rcx incl %ecx cmpl $1,%ecx @@ -211,19 +211,19 @@ FUNC_LOCAL(restore_all_guest) ALTERNATIVE "", rag_clrssbsy, X86_FEATURE_XEN_SHSTK #endif -movq 8(%rsp), %rcx # RIP -cmpw $FLAT_USER_CS32,16(%rsp)# CS -movq 32(%rsp),%rsp # RSP +mov EFRAME_rip(%rsp), %rcx +cmpw $FLAT_USER_CS32, EFRAME_cs(%rsp) +mov EFRAME_rsp(%rsp), %rsp je1f sysretq 1: sysretl LABEL_LOCAL(.Lrestore_rcx_iret_exit_to_guest) -movq 8(%rsp), %rcx # RIP +mov EFRAME_rip(%rsp), %rcx /* No special register assumptions. 
*/ iret_exit_to_guest: -andl $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), 24(%rsp) -orl $X86_EFLAGS_IF,24(%rsp) +andl $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), EFRAME_eflags(%rsp) +orl $X86_EFLAGS_IF, EFRAME_eflags(%rsp) addq $8,%rsp .Lft0: iretq _ASM_PRE_EXTABLE(.Lft0, handle_exception) @@ -256,7 +256,7 @@ FUNC(lstar_enter) pushq %rcx pushq $0 BUILD_BUG_ON(TRAP_syscall & 0xff) -movb $TRAP_syscall >> 8, 4+1(%rsp) +movb $TRAP_syscall >> 8, EFRAME_entry_vector + 1(%rsp) SAVE_ALL SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */ @@ -295,7 +295,7 @@ FUNC(cstar_enter) pushq %rcx pushq $0 BUILD_BUG_ON(TRAP_syscall & 0xff) -movb $TRAP_syscall >> 8, 4+1(%rsp) +movb $TRAP_syscall >> 8, EFRAME_entry_vector + 1(%rsp) SAVE_ALL SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */ @@ -338,7 +338,7 @@ LABEL(sysenter_eflags_saved, 0) pushq $0 /* null rip */ pushq $0 BUILD_BUG_ON(TRAP_syscall & 0xff) -movb $TRAP_syscall >> 8, 4+1(%rsp)
[PATCH 0/3] x86/entry: More cleanup
Misc adjustments and cleanup. No functional change. Andrew Cooper (3): x86/entry: Adjustments to "reduce assembly code size of entry points" x86/entry: Simplify expressions in compat_restore_all_guest() x86/entry: Introduce EFRAME_* constants xen/arch/x86/include/asm/asm_defns.h | 12 + xen/arch/x86/x86_64/asm-offsets.c| 17 +++ xen/arch/x86/x86_64/compat/entry.S | 12 ++--- xen/arch/x86/x86_64/entry.S | 76 +--- 4 files changed, 69 insertions(+), 48 deletions(-) base-commit: 8de3afc0b402bc17f65093a53e5870862707a8c7 -- 2.30.2
Re: [PATCH v3 4/4] x86/spec: do not print thunk option selection if not built-in
On 26.02.2024 12:07, Roger Pau Monne wrote: > Now that the thunk built-in enable is printed as part of the "Compiled-in > support:" line, avoid printing anything in "Xen settings:" if the thunk is > disabled at build time. Why "Now that ..."? It's other logging the earlier patch adds there. > Note the BTI-Thunk option printing is also adjusted to print a colon in the > same way the other options on the line do. > > Requested-by: Jan Beulich > Signed-off-by: Roger Pau Monné With either a clarification of what's meant or e.g. s/Now that/Since/ Reviewed-by: Jan Beulich Jan
Re: [PATCH v3 3/4] x86/spec: fix INDIRECT_THUNK option to only be set when build-enabled
On 26.02.2024 12:07, Roger Pau Monne wrote: > Attempt to provide a more helpful error message when the user attempts to set > spec-ctrl=bti-thunk option but the support is build-time disabled. > > While there also adjust the command line documentation to mention > CONFIG_INDIRECT_THUNK instead of INDIRECT_THUNK. > > Reported-by: Andrew Cooper > Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich with one minor remark: > --- a/xen/arch/x86/spec_ctrl.c > +++ b/xen/arch/x86/spec_ctrl.c > @@ -241,7 +241,12 @@ static int __init cf_check parse_spec_ctrl(const char *s) > { > s += 10; > > -if ( !cmdline_strcmp(s, "retpoline") ) > +if ( !IS_ENABLED(CONFIG_INDIRECT_THUNK) ) > +{ > +no_config_param("INDIRECT_THUNK", "spec-ctrl=bti-thunk", s, > ss); > +rc = -EINVAL; > +} > +else if ( !cmdline_strcmp(s, "retpoline") ) > opt_thunk = THUNK_RETPOLINE; > else if ( !cmdline_strcmp(s, "lfence") ) > opt_thunk = THUNK_LFENCE; How about if ( !IS_ENABLED(CONFIG_INDIRECT_THUNK) ) { no_config_param("INDIRECT_THUNK", "spec-ctrl", s - 10, ss); rc = -EINVAL; } else if ( !cmdline_strcmp(s, "retpoline") ) or (likely less liked by you and Andrew) "s += 10;" dropped and then if ( !IS_ENABLED(CONFIG_INDIRECT_THUNK) ) { no_config_param("INDIRECT_THUNK", "spec-ctrl", s, ss); rc = -EINVAL; } else if ( !cmdline_strcmp(s += 10, "retpoline") ) conserving a little on string literal space (sadly, despite the function being __init, string literals remain post-init due to living in .rodata)? Jan
Re: [PATCH v3 2/4] x86/spec: fix BRANCH_HARDEN option to only be set when build-enabled
On 26.02.2024 12:07, Roger Pau Monne wrote: > The current logic to handle the BRANCH_HARDEN option will report it as enabled > even when build-time disabled. Fix this by only allowing the option to be set > when support for it is built into Xen. > > Fixes: 2d6f36daa086 ('x86/nospec: Introduce CONFIG_SPECULATIVE_HARDEN_BRANCH') > Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich
Re: [PATCH v3 1/4] x86/spec: print the built-in SPECULATIVE_HARDEN_* options
On 26.02.2024 12:07, Roger Pau Monne wrote: > Just like it's done for INDIRECT_THUNK and SHADOW_PAGING. > > Reported-by: Jan Beulich > Signed-off-by: Roger Pau Monné In principle Reviewed-by: Jan Beulich but ... > --- a/xen/arch/x86/spec_ctrl.c > +++ b/xen/arch/x86/spec_ctrl.c > @@ -466,13 +466,25 @@ static void __init print_details(enum ind_thunk thunk) > (e21a & cpufeat_mask(X86_FEATURE_SBPB)) ? " SBPB" > : ""); > > /* Compiled-in support which pertains to mitigations. */ > -if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || > IS_ENABLED(CONFIG_SHADOW_PAGING) ) > +if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || > IS_ENABLED(CONFIG_SHADOW_PAGING) || > + IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_ARRAY) || > + IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) || > + IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) ) > printk(" Compiled-in support:" > #ifdef CONFIG_INDIRECT_THUNK > " INDIRECT_THUNK" > #endif > #ifdef CONFIG_SHADOW_PAGING > " SHADOW_PAGING" > +#endif > +#ifdef CONFIG_SPECULATIVE_HARDEN_ARRAY > + " SPECULATIVE_HARDEN_ARRAY" > +#endif > +#ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH > + " SPECULATIVE_HARDEN_BRANCH" > +#endif > +#ifdef CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS > + " SPECULATIVE_HARDEN_GUEST_ACCESS" > #endif ... I'd like to suggest to drop the SPECULATIVE_ from the string literals. They're relevant in the Kconfig identifiers, but they're imo redundant in the context of these log messages. (Happy to adjust while committing, if need be.) Jan
Re: [PATCH v6 3/3] xen/livepatch: align functions to ensure minimal distance between entry points
On 26.02.2024 12:32, Roger Pau Monné wrote: > On Tue, Feb 13, 2024 at 04:58:38PM +0100, Jan Beulich wrote: >> On 07.02.2024 15:55, Roger Pau Monne wrote: >>> The minimal function size requirements for an x86 livepatch are either 5 >>> bytes >>> (for jmp) or 9 bytes (for endbr + jmp), and always 4 bytes on Arm. Ensure >>> that >>> distance between functions entry points is always at least of the minimal >>> required size for livepatch instruction replacement to be successful. >>> >>> Add an additional align directive to the linker scripts, in order to ensure >>> that >>> the next section placed after the .text.* (per-function sections) is also >>> aligned to the required boundary, so that the distance of the last function >>> entry point with the next symbol is also of minimal size. >> >> Perhaps "... minimal required size"? > > Yes. > >>> --- a/xen/common/Kconfig >>> +++ b/xen/common/Kconfig >>> @@ -395,8 +395,11 @@ config CRYPTO >>> config LIVEPATCH >>> bool "Live patching support" >>> default X86 >>> - depends on "$(XEN_HAS_BUILD_ID)" = "y" >>> + depends on "$(XEN_HAS_BUILD_ID)" = "y" && CC_HAS_FUNCTION_ALIGNMENT >>> select CC_SPLIT_SECTIONS >>> + select FUNCTION_ALIGNMENT_16B if XEN_IBT >>> + select FUNCTION_ALIGNMENT_8B if X86 >>> + select FUNCTION_ALIGNMENT_4B if ARM >> >> This isn't strictly needed, is it? Would be nice to avoid re-selection >> of what the default for an arch is anyway, as otherwise this will start >> looking clumsy when a couple more architectures are added. > > My worry was that the default per-arch could change, ie: for example > x86 moving from 16 to 8 and then it would hamper livepatch support if > IBT is also enabled. I however think it's very unlikely to reduce the > default alignment, and in any case we would hit a build time assert if > that ever happens. > > So yes, I'm fine with dropping those. Oh, no - not "those", only "that", i.e. only the last (Arm) one. Jan
[xen-unstable-smoke test] 184764: tolerable all pass - PUSHED
flight 184764 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184764/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm     16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  8de3afc0b402bc17f65093a53e5870862707a8c7
baseline version:
 xen                  92babc88f67ed0ef3dc575a8b9534040274678ee

Last test of basis   184745  2024-02-23 23:03:51 Z    2 days
Testing same since   184764  2024-02-26 10:02:04 Z    0 days    1 attempts

People who touched revisions under test:
  Daniel P. Smith
  Jan Beulich
  Julien Grall
  Nicola Vetrini
  Oleksii Kurochko
  Roger Pau Monné

jobs:
 build-arm64-xsm                                              pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-amd64-libvirt                                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   92babc88f6..8de3afc0b4  8de3afc0b402bc17f65093a53e5870862707a8c7 -> smoke
Re: [PATCH v6 1/3] xen: introduce Kconfig function alignment option
On 26.02.2024 12:33, Roger Pau Monné wrote: > On Tue, Feb 13, 2024 at 04:51:13PM +0100, Jan Beulich wrote: >> On 07.02.2024 15:55, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/Kconfig >>> +++ b/xen/arch/x86/Kconfig >>> @@ -29,6 +29,7 @@ config X86 >>> select HAS_UBSAN >>> select HAS_VPCI if HVM >>> select NEEDS_LIBELF >>> + select FUNCTION_ALIGNMENT_16B >> >> With the insertion here as well as for Arm and PPC obeying alphabetic >> sorting: >> Reviewed-by: Jan Beulich > > Would you like me to resend with that adjusted? I guess it can be taken care of while committing; I've taken note of this. Sadly there is still at least one missing ack. Jan
Re: [PATCH v6 1/3] xen: introduce Kconfig function alignment option
On Tue, Feb 13, 2024 at 04:51:13PM +0100, Jan Beulich wrote: > On 07.02.2024 15:55, Roger Pau Monne wrote: > > --- a/xen/arch/x86/Kconfig > > +++ b/xen/arch/x86/Kconfig > > @@ -29,6 +29,7 @@ config X86 > > select HAS_UBSAN > > select HAS_VPCI if HVM > > select NEEDS_LIBELF > > + select FUNCTION_ALIGNMENT_16B > > With the insertion here as well as for Arm and PPC obeying alphabetic > sorting: > Reviewed-by: Jan Beulich Would you like me to resend with that adjusted? Thanks, Roger.
Re: [PATCH v6 3/3] xen/livepatch: align functions to ensure minimal distance between entry points
On Tue, Feb 13, 2024 at 04:58:38PM +0100, Jan Beulich wrote: > On 07.02.2024 15:55, Roger Pau Monne wrote: > > The minimal function size requirements for an x86 livepatch are either 5 > > bytes > > (for jmp) or 9 bytes (for endbr + jmp), and always 4 bytes on Arm. Ensure > > that > > distance between functions entry points is always at least of the minimal > > required size for livepatch instruction replacement to be successful. > > > > Add an additional align directive to the linker scripts, in order to ensure > > that > > the next section placed after the .text.* (per-function sections) is also > > aligned to the required boundary, so that the distance of the last function > > entry point with the next symbol is also of minimal size. > > Perhaps "... minimal required size"? Yes. > > --- a/xen/common/Kconfig > > +++ b/xen/common/Kconfig > > @@ -395,8 +395,11 @@ config CRYPTO > > config LIVEPATCH > > bool "Live patching support" > > default X86 > > - depends on "$(XEN_HAS_BUILD_ID)" = "y" > > + depends on "$(XEN_HAS_BUILD_ID)" = "y" && CC_HAS_FUNCTION_ALIGNMENT > > select CC_SPLIT_SECTIONS > > + select FUNCTION_ALIGNMENT_16B if XEN_IBT > > + select FUNCTION_ALIGNMENT_8B if X86 > > + select FUNCTION_ALIGNMENT_4B if ARM > > This isn't strictly needed, is it? Would be nice to avoid re-selection > of what the default for an arch is anyway, as otherwise this will start > looking clumsy when a couple more architectures are added. My worry was that the default per-arch could change, ie: for example x86 moving from 16 to 8 and then it would hamper livepatch support if IBT is also enabled. I however think it's very unlikely to reduce the default alignment, and in any case we would hit a build time assert if that ever happens. So yes, I'm fine with dropping those. > Preferably > with that dropped (or it being clarified why it's still desirable to > have): > Reviewed-by: Jan Beulich Thanks, Roger.
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 26.02.2024 12:18, Oleksii wrote: > On Mon, 2024-02-26 at 10:45 +0100, Jan Beulich wrote: >> On 23.02.2024 13:23, Oleksii wrote: >>> As 1- and 2-byte cases are emulated I decided that is not >>> to >>> provide >>> sfx argument for emulation macros as it will not have to >>> much >>> affect on >>> emulated types and just consume more performance on acquire >>> and >>> release >>> version of sc/ld instructions. >> >> Question is whether the common case (4- and 8-byte accesses) >> shouldn't >> be valued higher, with 1- and 2-byte emulation being there >> just >> to >> allow things to not break altogether. > If I understand you correctly, it would make sense to add the > 'sfx' > argument for the 1/2-byte access case, ensuring that all > options > are > available for 1/2-byte access case as well. That's one of the possibilities. As said, I'm not overly worried about the emulated cases. For the initial implementation I'd recommend going with what is easiest there, yielding the best possible result for the 4- and 8-byte cases. If later it turns out repeated acquire/release accesses are a problem in the emulation loop, things can be changed to explicit barriers, without touching the 4- and 8-byte cases. >>> I am confused then a little bit if emulated case is not an issue. >>> >>> For 4- and 8-byte cases for xchg .aqrl is used, for relaxed and >>> aqcuire >>> version of xchg barries are used. >>> >>> The similar is done for cmpxchg. >>> >>> If something will be needed to change in emulation loop it won't >>> require to change 4- and 8-byte cases. >> >> I'm afraid I don't understand your reply. > IIUC, emulated cases it is implemented correctly in terms of usage > barriers. And it also OK not to use sfx for lr/sc instructions and use > only barriers. > > For 4- and 8-byte cases are used sfx + barrier depending on the > specific case ( relaxed, acquire, release, generic xchg/cmpxchg ). > What also looks to me correct. 
But you suggested to provide the best > possible result for 4- and 8-byte cases. > > So I don't understand what the best possible result is as the current > one usage of __{cmp}xchg_generic for each specific case ( relaxed, > acquire, release, generic xchg/cmpxchg ) looks correct to me: > xchg -> (..., ".aqrl", "", "") just suffix .aqrl suffix without > barriers. > xchg_release -> (..., "", RISCV_RELEASE_BARRIER, "" ) use only release > barrier > xchg_acquire -> (..., "", "", RISCV_ACQUIRE_BARRIER ), only acquire > barrier > xchg_relaxed -> (..., "", "", "") - no barries, no sfx So first: While explicit barriers are technically okay, I don't follow why you insist on using them when you can achieve the same by suitably tagging the actual insn doing the exchange. Then second: It's somewhat hard for me to see the final effect on the emulation paths without you actually having done the switch. Maybe no special handling is necessary there anymore then. And as said, it may actually be acceptable for the emulation paths to "only" be correct, but not be ideal in terms of performance. After all, if you use the normal 4-byte primitive in there, more (non-explicit) barriers than needed would occur if the involved loop has to take more than one iteration. Which could (but imo doesn't need to be) avoided by using a more relaxed 4-byte primitive there and an explicit barrier outside of the loop. Jan