[Xen-devel] [linux-linus test] 148949: regressions - trouble: fail/pass/starved
flight 148949 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148949/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 133580

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail REGR. vs. 133580

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 133580
 test-armhf-armhf-libvirt 14 saverestore-support-check fail like 133580
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 133580
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail like 133580
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 133580
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 133580
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail like 133580
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 133580
 test-amd64-i386-xl-pvshim 12 guest-start fail never pass
 test-amd64-amd64-dom0pvh-xl-intel 15 guest-saverestore fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail never pass
 test-armhf-armhf-xl 13 migrate-support-check fail never pass
 test-armhf-armhf-xl 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 12 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 13 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass
 test-amd64-amd64-dom0pvh-xl-amd 2 hosts-allocate starved n/a

version targeted for testing:
 linux 979e52ca0469fb38646bc51d26a0263a740c9f03
baseline version:
 linux 736706bee3298208343a76096370e4f6a5c55915

Last test of basis 133580 2019-03-04 19:53:09 Z 386 days
Failing since
[Xen-devel] [seabios test] 148952: regressions - FAIL
flight 148952 seabios real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148952/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 148666

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 148666
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail like 148666
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 148666
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail starved in 148666

version targeted for testing:
 seabios de88a9628426e82f1cee4b61b06e67e6787301b1
baseline version:
 seabios 066a9956097b54530888b88ab9aa1ea02e42af5a

Last test of basis 148666 2020-03-17 13:39:45 Z 7 days
Failing since 148690 2020-03-18 06:43:59 Z 6 days 9 attempts
Testing same since 148794 2020-03-20 23:39:57 Z 4 days 5 attempts

People who touched revisions under test:
 Gerd Hoffmann
 Matt DeVillier
 Paul Menzel

jobs:
 build-amd64-xsm pass
 build-i386-xsm pass
 build-amd64 pass
 build-i386 pass
 build-amd64-libvirt pass
 build-i386-libvirt pass
 build-amd64-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm pass
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm pass
 test-amd64-amd64-qemuu-nested-amd fail
 test-amd64-i386-qemuu-rhel6hvm-amd pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64 fail
 test-amd64-amd64-xl-qemuu-ws16-amd64 fail
 test-amd64-i386-xl-qemuu-ws16-amd64 fail
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict pass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict pass
 test-amd64-amd64-qemuu-nested-intel fail
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

commit de88a9628426e82f1cee4b61b06e67e6787301b1
Author: Paul Menzel
Date: Wed Mar 4 14:51:27 2020 +0100

    std/tcg: Replace zero-length array with flexible-array member

    GCC 10 gives the warnings below:

    In file included from out/ccode32flat.o.tmp.c:54:
    ./src/tcgbios.c: In function 'tpm20_write_EfiSpecIdEventStruct':
    ./src/tcgbios.c:290:30: warning: array subscript '() + 4294967295' is
    outside the bounds of an interior zero-length array
    'struct TCG_EfiSpecIdEventAlgorithmSize[0]' [-Wzero-length-bounds]
      290 | event.hdr.digestSizes[count].algorithmId = be16_to_cpu(sel->hashAlg);
          | ~^~~
    In file included from ./src/tcgbios.c:22, from out/ccode32flat.o.tmp.c:54:
    ./src/std/tcg.h:527:7: note: while referencing 'digestSizes'
      527 | } digestSizes[0];
          | ^~~
    In file included from out/ccode32flat.o.tmp.c:54:
    ./src/tcgbios.c:291:3
[Xen-devel] [qemu-mainline test] 148937: regressions - trouble: fail/pass/starved
flight 148937 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148937/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-win7-amd64 10 windows-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-amd 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-freebsd10-i386 11 guest-start fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-intel 10 redhat-install fail REGR. vs. 144861
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ws16-amd64 10 windows-install fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-amd 10 redhat-install fail REGR. vs. 144861
 test-amd64-i386-libvirt-pair 21 guest-start/debian fail REGR. vs. 144861
 test-amd64-amd64-libvirt 12 guest-start fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-win7-amd64 10 windows-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt-xsm 12 guest-start fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-intel 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-freebsd10-amd64 11 guest-start fail REGR. vs. 144861
 test-amd64-amd64-libvirt-pair 21 guest-start/debian fail REGR. vs. 144861
 test-amd64-i386-libvirt-xsm 12 guest-start fail REGR. vs. 144861
 test-amd64-i386-libvirt 12 guest-start fail REGR. vs. 144861
 test-arm64-arm64-libvirt-xsm 12 guest-start fail REGR. vs. 144861
 test-armhf-armhf-libvirt 12 guest-start fail REGR. vs. 144861
 test-amd64-amd64-libvirt-vhd 10 debian-di-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qcow2 10 debian-di-install fail REGR. vs. 144861
 test-armhf-armhf-xl-vhd 10 debian-di-install fail REGR. vs. 144861
 test-armhf-armhf-libvirt-raw 10 debian-di-install fail REGR. vs. 144861

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-dom0pvh-xl-intel 17 guest-saverestore.2 fail baseline untested
 test-amd64-i386-xl-pvshim 12 guest-start fail never pass
 test-arm64-arm64-xl-seattle 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl 13 migrate-support-check fail never pass
 test-arm64-arm64-xl 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl 13 migrate-support-check fail never pass
 test-armhf-armhf-xl 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 14 saverestor
[Xen-devel] [xen-unstable-smoke test] 148983: tolerable all pass - PUSHED
flight 148983 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148983/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl 13 migrate-support-check fail never pass
 test-armhf-armhf-xl 14 saverestore-support-check fail never pass

version targeted for testing:
 xen 0537d246f8db3ac0a1df2ce653b07e85cd887962
baseline version:
 xen 3ec1296ad3a823609eec479cb6c7ee493f6a888b

Last test of basis 148966 2020-03-24 10:00:45 Z 0 days
Testing same since 148983 2020-03-24 17:00:40 Z 0 days 1 attempts

People who touched revisions under test:
 Juergen Gross
 Julien Grall
 Paul Durrant
 Tamas K Lengyel

jobs:
 build-arm64-xsm pass
 build-amd64 pass
 build-armhf pass
 build-amd64-libvirt pass
 test-armhf-armhf-xl pass
 test-arm64-arm64-xl-xsm pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-libvirt pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision:

To xenbits.xen.org:/home/xen/git/xen.git
 3ec1296ad3..0537d246f8 0537d246f8db3ac0a1df2ce653b07e85cd887962 -> smoke
Re: [Xen-devel] Moving Forward on XenSummit
On Mar 24, 2020, at 14:03, George Dunlap wrote:
>
> I wanted to let everyone know that the XenProject is moving forward with
> plans to hold XenSummit this year, one way or another.
>
> There are two basic approaches the Advisory Board has been considering:
> postponing the event until later in the year, or holding a virtual event
> during the same timeframe. Additionally, if we hold a virtual event during
> the same timeframe, the Board wants to keep the option open of having a
> smaller, in-person event later in the year, if circumstances permit.

Given the variation in the scope and timing of geographic and company restrictions on travel, could some speakers present remotely at the in-person event? Could the Xen Summit CFP be re-opened for those who can present virtually, who may not have submitted due to travel restrictions?

Rich
Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"
On 24/03/2020 15:21, Hongyan Xia wrote:

From: Hongyan Xia

Unfortunately, even though that commit dropped the domctl lock and
allowed other domctls to continue, it created severe lock contention
within domain destruction itself. Multiple domain destructions in
parallel now spin for the global heap lock when freeing memory and can
spend a long time before the next hypercall continuation. In contrast,
after dropping that commit, parallel domain destructions simply fail to
take the domctl lock, create a hypercall continuation and back off
immediately, allowing the thread that holds the lock to destroy a
domain much more quickly and allowing the backed-off threads to process
events and IRQs.

On a 144-core server with 4TiB of memory, destroying 32 guests (each
with 4 vcpus and 122GiB memory) simultaneously takes:

 before the revert: 29 minutes
 after the revert: 6 minutes

This is timed from when the first page until the very last page of all
32 guests is released back to the heap.

This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.

Signed-off-by: Hongyan Xia
Reviewed-by: Julien Grall

---
 xen/common/domain.c | 11 +--
 xen/common/domctl.c |  5 +
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b4eb476a9c..7b02f5ead7 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -698,20 +698,11 @@ int domain_kill(struct domain *d)
     if ( d == current->domain )
         return -EINVAL;
 
-    /* Protected by d->domain_lock. */
+    /* Protected by domctl_lock. */
     switch ( d->is_dying )
     {
     case DOMDYING_alive:
-        domain_unlock(d);
         domain_pause(d);
-        domain_lock(d);
-        /*
-         * With the domain lock dropped, d->is_dying may have changed. Call
-         * ourselves recursively if so, which is safe as then we won't come
-         * back here.
-         */
-        if ( d->is_dying != DOMDYING_alive )
-            return domain_kill(d);
         d->is_dying = DOMDYING_dying;
         argo_destroy(d);
         evtchn_destroy(d);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index a69b3b59a8..e010079203 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -571,14 +571,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         break;
 
     case XEN_DOMCTL_destroydomain:
-        domctl_lock_release();
-        domain_lock(d);
         ret = domain_kill(d);
-        domain_unlock(d);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(
                 __HYPERVISOR_domctl, "h", u_domctl);
-        goto domctl_out_unlock_domonly;
+        break;
 
     case XEN_DOMCTL_setnodeaffinity:
     {

-- 
Julien Grall
Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"
On 24/03/2020 16:13, Jan Beulich wrote:
> On 24.03.2020 16:21, Hongyan Xia wrote:
>> From: Hongyan Xia
>>
>> In contrast, after dropping that commit, parallel domain destructions
>> will just fail to take the domctl lock, creating a hypercall
>> continuation and backing off immediately, allowing the thread that
>> holds the lock to destroy a domain much more quickly and allowing
>> backed-off threads to process events and irqs.
>>
>> On a 144-core server with 4TiB of memory, destroying 32 guests (each
>> with 4 vcpus and 122GiB memory) simultaneously takes:
>>
>> before the revert: 29 minutes
>> after the revert: 6 minutes
>
> This wants comparing against numbers demonstrating the bad effects of
> the global domctl lock. Iirc they were quite a bit higher than 6 min,
> perhaps depending on guest properties.

Your original commit message doesn't contain any clue in which cases the
domctl lock was an issue. So please provide information on the setups you
think this will make worse.

Cheers,

-- 
Julien Grall
Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PGC_state_uninitialised
Hi David,

On 24/03/2020 17:55, David Woodhouse wrote:
> On Tue, 2020-03-24 at 10:08 +, Julien Grall wrote:
>> Hi David,
>>
>> On 23/03/2020 10:55, David Woodhouse wrote:
>>> On Mon, 2020-03-23 at 09:34 +, Julien Grall wrote:
>>>> For liveupdate, we will need a way to initialize a page but mark it
>>>> as already in use (i.e. in the same state as it would be if
>>>> allocated normally).
>>>
>>> I am unconvinced of the veracity of this claim.
>>>
>>> We don't want to turn specific details of the current Xen buddy
>>> allocator into part of the implicit ABI of live update. That goes for
>>> the power-of-two zone boundaries, amongst other things.
>>
>> Why would you want to do that? Marking the page as already used is no
>> different to "PGC_state_uninitialised" except that the "struct
>> page_info" and the internals of the buddy allocator would be properly
>> set up from the start rather than at free.
>
> The internals of the buddy allocator *cannot* be set up properly for a
> page which it would not actually give out — like the zero page.
>
> We *could* do some tricks to allocate the zone structures for zones
> which *do* exist but contain only "pre-allocated" pages so the buddy
> allocator has never seen those zones... yet.
>
>> I think using "PGC_state_uninitialised" for preserved pages is an
>> abuse. I understand this exists in other parts of Xen (particularly on
>> x86), but I would rather not try to add more.
>
> I am perfectly happy to try avoiding PGC_state_uninitialised for pages
> which are "in use" at boot time due to live update. All I insist on is
> that you explicitly describe the ABI constraints that it imposes, if
> any.

Agreed.

> For example, that hack which stops the buddy allocator from giving out
> page zero: Could we have live updated from a Xen without that hack, to
> a Xen which has it? With page zero already given out to a domain?

The buddy allocator could never have given out page 0 on x86 because it
is part of the first MB of RAM (see arch_init_memory() in
arch/x86/mm.c). AFAIK, the first MB cannot be freed.

The change in the buddy allocator was to address the Arm side and also
to make clear that this is a weakness of the allocator.

> What's yours? How would we cope with a situation like that, over LU?

When do you expect the pages to be carved out from the buddy allocator?
I can see only two situations:
 1) Working around a bug in the allocator.
 2) A CPU erratum requiring some memory not to be used.

In case 1), it is still safe for a domain to run with that page.
However, we don't want to give it back to the page allocator. A
solution is to mark such pages as "offlining/broken". Alternatively,
you could swap the page (see more below).

In case 2), it is not safe for a domain to run with that page, so it is
probably best to swap the pages with new ones. For an HVM domain
(including the P2M), it should be easy. For a PV domain, I am not
entirely sure whether that's feasible.

Cheers,

-- 
Julien Grall
[Xen-devel] Moving Forward on XenSummit
I wanted to let everyone know that the XenProject is moving forward with plans to hold XenSummit this year, one way or another.

There are two basic approaches the Advisory Board has been considering: postponing the event until later in the year, or holding a virtual event during the same timeframe. Additionally, if we hold a virtual event during the same timeframe, the Board wants to keep the option open of having a smaller, in-person event later in the year, if circumstances permit.

Because the University of Bucharest has been very flexible, there is no rush to make a decision. As a result, the Advisory Board has recommended that we spend time looking into the options in detail, and make a final decision around mid-April, 6 weeks before the originally scheduled event. (As a side effect, the event webpage will have dates and places for the schedule as though we were still holding the event in Bucharest. These will be updated when we know what we're planning to do instead.)

# Physical and Virtual

The XenSummit is an important event for our community. Some visible things that happen include:

* Allowing members of the community to communicate to everyone else what they've been working on in the previous year, and what they plan to work on in the future
* Allowing people to hash out technical issues in design sessions

Just as critical, the XenSummit allows innumerable small "hallway-track" conversations, as well as plain social interaction -- filling out email addresses with faces and personalities, allowing the community to run much more smoothly during the rest of the year.

It is very clear that holding a virtual event will not be nearly as effective at those things as an in-person event. However, given the current uncertainty, it's not clear that the world will be ready for travel later in the Fall either. And even if it were, there's a risk that many such postponed events will collide with other postponed events, reducing attendance.

Additionally, we would need to either coordinate with the University's term times, or find another venue, which could be much more expensive. Having a virtual Summit is much better than having no XenSummit at all. So the decision to be made will be to weigh the lower effectiveness of a virtual Summit against the risk that a postponed event turns out not to be possible.

In the meantime, we are brainstorming ways to get as much of the benefit of an in-person summit as possible. If you have any thoughts or concrete suggestions along these lines -- in particular, things that have been tried and worked well or poorly in other virtual events that you've participated in -- then please let us know.

To be clear, there is no thought of continuing to hold virtual events after the current pandemic has passed. We fully expect to have an in-person event in 2021.

As always, if you have any thoughts or suggestions, please feel free to share them with me. Stay safe everyone, and I look forward to seeing you all in person when things have returned to normal.

 -George Dunlap
Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PGC_state_uninitialised
On Tue, 2020-03-24 at 10:08 +, Julien Grall wrote:
> Hi David,
>
> On 23/03/2020 10:55, David Woodhouse wrote:
>> On Mon, 2020-03-23 at 09:34 +, Julien Grall wrote:
>>> For liveupdate, we will need a way to initialize a page but mark it
>>> as already in use (i.e. in the same state as it would be if
>>> allocated normally).
>>
>> I am unconvinced of the veracity of this claim.
>>
>> We don't want to turn specific details of the current Xen buddy
>> allocator into part of the implicit ABI of live update. That goes for
>> the power-of-two zone boundaries, amongst other things.
>
> Why would you want to do that? Marking the page as already used is no
> different to "PGC_state_uninitialised" except that the "struct
> page_info" and the internals of the buddy allocator would be properly
> set up from the start rather than at free.

The internals of the buddy allocator *cannot* be set up properly for a
page which it would not actually give out — like the zero page.

We *could* do some tricks to allocate the zone structures for zones
which *do* exist but contain only "pre-allocated" pages so the buddy
allocator has never seen those zones... yet.

> I think using "PGC_state_uninitialised" for preserved pages is an
> abuse. I understand this exists in other parts of Xen (particularly on
> x86), but I would rather not try to add more.

I am perfectly happy to try avoiding PGC_state_uninitialised for pages
which are "in use" at boot time due to live update. All I insist on is
that you explicitly describe the ABI constraints that it imposes, if
any.

For example, that hack which stops the buddy allocator from giving out
page zero: Could we have live updated from a Xen without that hack, to
a Xen which has it? With page zero already given out to a domain?

With PGC_state_initialised and passing the page through
init_heap_pages() if/when it ever gets freed, my answer would be 'yes'.
What's yours? How would we cope with a situation like that, over LU?
Re: [Xen-devel] [XEN PATCH v3 15/23] xen/build: have the root Makefile generates the CFLAGS
On Mon, Mar 23, 2020 at 04:11:53PM +0100, Roger Pau Monné wrote:
> On Thu, Mar 19, 2020 at 04:24:12PM +, Anthony PERARD wrote:
>> So, testing for the -Wa,--strip-local-absolute flag turns out to be
>> more complicated than I thought it would be.
>> - cc-option-add doesn't work because it doesn't test with the current
>>   list of CFLAGS. And if I add the CFLAGS, clang says the option is
>>   unused; it doesn't matter whether -no-integrated-as is present or
>>   not.
>
> Oh, that seems like completely bogus clang behavior. It's my
> understanding (from reading the manual) that whatever gets appended to
> -Wa, is just passed to the assembler, in which case the compiler has
> no idea whether it's used by it or not.
>
> Which version of clang did you use to test it?

Probably 9.0.1, I don't think I've upgraded since my tests.

-- 
Anthony PERARD
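The kind of probe being discussed above can be sketched as a small shell test: compile an empty program with the current CFLAGS plus the candidate flag, under -Werror so that clang's "argument unused during compilation" diagnostic (the behaviour Anthony describes) also counts as rejection. This is an illustrative sketch of a cc-option-style check, not Xen's actual build machinery.

```shell
#!/bin/sh
# Probe whether the toolchain accepts a flag together with the
# current CFLAGS. -Werror turns "unused argument" warnings into
# failures, so a flag the integrated assembler ignores reports "no".
CC=${CC:-cc}
CFLAGS=${CFLAGS:-}

probe_flag() {
    if echo 'int main(void){return 0;}' \
        | $CC $CFLAGS -Werror "$1" -x c -c -o /dev/null - 2>/dev/null
    then
        echo yes
    else
        echo no
    fi
}

probe_flag -Wa,--strip-local-absolute
```

Because the probe runs with the full CFLAGS list, it avoids the cc-option-add limitation mentioned in the quoted mail: a flag that is only valid (or only rejected) in combination with other flags is tested in its real context.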
[Xen-devel] [PATCH 7/7] x86emul: support SYSRET
This is to augment SYSCALL, which has been supported for quite some time.

Signed-off-by: Jan Beulich

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -5975,6 +5975,60 @@ x86_emulate(
             goto done;
         break;
 
+    case X86EMUL_OPC(0x0f, 0x07): /* sysret */
+        vcpu_must_have(syscall);
+        /* Inject #UD if syscall/sysret are disabled. */
+        fail_if(!ops->read_msr);
+        if ( (rc = ops->read_msr(MSR_EFER, &msr_val, ctxt)) != X86EMUL_OKAY )
+            goto done;
+        generate_exception_if((msr_val & EFER_SCE) == 0, EXC_UD);
+        generate_exception_if(!amd_like(ctxt) && !mode_64bit(), EXC_UD);
+        generate_exception_if(!mode_ring0(), EXC_GP, 0);
+        generate_exception_if(!in_protmode(ctxt, ops), EXC_GP, 0);
+
+        if ( (rc = ops->read_msr(MSR_STAR, &msr_val, ctxt)) != X86EMUL_OKAY )
+            goto done;
+
+        sreg.sel = ((msr_val >> 48) + 8) | 3; /* SELECTOR_RPL_MASK */
+        cs.sel = op_bytes == 8 ? sreg.sel + 8 : sreg.sel - 8;
+
+        cs.base = sreg.base = 0; /* flat segment */
+        cs.limit = sreg.limit = ~0u; /* 4GB limit */
+        cs.attr = 0xcfb; /* G+DB+P+DPL3+S+Code */
+        sreg.attr = 0xcf3; /* G+DB+P+DPL3+S+Data */
+
+#ifdef __x86_64__
+        if ( mode_64bit() )
+        {
+            if ( op_bytes == 8 )
+            {
+                cs.attr = 0xafb; /* L+DB+P+DPL3+S+Code */
+                generate_exception_if(!is_canonical_address(_regs.rcx) &&
+                                      !amd_like(ctxt), EXC_GP, 0);
+                _regs.rip = _regs.rcx;
+            }
+            else
+                _regs.rip = _regs.ecx;
+
+            _regs.eflags = _regs.r11 & ~(X86_EFLAGS_RF | X86_EFLAGS_VM);
+        }
+        else
+#endif
+        {
+            _regs.r(ip) = _regs.ecx;
+            _regs.eflags |= X86_EFLAGS_IF;
+        }
+
+        fail_if(!ops->write_segment);
+        if ( (rc = ops->write_segment(x86_seg_cs, &cs, ctxt)) != X86EMUL_OKAY ||
+             (!amd_like(ctxt) &&
+              (rc = ops->write_segment(x86_seg_ss, &sreg,
+                                       ctxt)) != X86EMUL_OKAY) )
+            goto done;
+
+        singlestep = _regs.eflags & X86_EFLAGS_TF;
+        break;
+
     case X86EMUL_OPC(0x0f, 0x08): /* invd */
     case X86EMUL_OPC(0x0f, 0x09): /* wbinvd / wbnoinvd */
         generate_exception_if(!mode_ring0(), EXC_GP, 0);
[Xen-devel] [PATCH 6/7] x86emul: vendor specific SYSCALL behavior
AMD CPUs permit the insn everywhere (even outside of protected mode), while Intel ones restrict it to 64-bit mode. While at it also add the so far missing CPUID bit check.

Signed-off-by: Jan Beulich

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1870,6 +1870,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_f16c()        (ctxt->cpuid->basic.f16c)
 #define vcpu_has_rdrand()      (ctxt->cpuid->basic.rdrand)
 
+#define vcpu_has_syscall()     (ctxt->cpuid->extd.syscall)
 #define vcpu_has_mmxext()      (ctxt->cpuid->extd.mmxext || vcpu_has_sse())
 #define vcpu_has_3dnow_ext()   (ctxt->cpuid->extd._3dnowext)
 #define vcpu_has_3dnow()       (ctxt->cpuid->extd._3dnow)
@@ -5897,13 +5898,13 @@ x86_emulate(
         break;
 
     case X86EMUL_OPC(0x0f, 0x05): /* syscall */
-        generate_exception_if(!in_protmode(ctxt, ops), EXC_UD);
-
+        vcpu_must_have(syscall);
         /* Inject #UD if syscall/sysret are disabled. */
         fail_if(ops->read_msr == NULL);
         if ( (rc = ops->read_msr(MSR_EFER, &msr_val, ctxt)) != X86EMUL_OKAY )
             goto done;
         generate_exception_if((msr_val & EFER_SCE) == 0, EXC_UD);
+        generate_exception_if(!amd_like(ctxt) && !mode_64bit(), EXC_UD);
 
         if ( (rc = ops->read_msr(MSR_STAR, &msr_val, ctxt)) != X86EMUL_OKAY )
             goto done;
[Xen-devel] [PATCH 5/7] x86emul: vendor specific SYSENTER/SYSEXIT behavior in long mode
Intel CPUs permit both insns there while AMD ones don't. While at it also
- drop the ring 0 check from SYSENTER handling - neither Intel's nor AMD's insn pages have any indication of #GP(0) getting raised when executed from ring 0, and trying it out in practice also confirms the check shouldn't be there,
- move SYSENTER segment register writing until after the (in principle able to fail) MSR reads.

Signed-off-by: Jan Beulich

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -6588,7 +6588,7 @@ x86_emulate(
 
     case X86EMUL_OPC(0x0f, 0x34): /* sysenter */
         vcpu_must_have(sep);
-        generate_exception_if(mode_ring0(), EXC_GP, 0);
+        generate_exception_if(amd_like(ctxt) && ctxt->lma, EXC_UD);
         generate_exception_if(!in_protmode(ctxt, ops), EXC_GP, 0);
 
         fail_if(ops->read_msr == NULL);
@@ -6611,11 +6611,6 @@ x86_emulate(
         sreg.limit = ~0u; /* 4GB limit */
         sreg.attr = 0xc93; /* G+DB+P+S+Data */
 
-        fail_if(ops->write_segment == NULL);
-        if ( (rc = ops->write_segment(x86_seg_cs, &cs, ctxt)) != 0 ||
-             (rc = ops->write_segment(x86_seg_ss, &sreg, ctxt)) != 0 )
-            goto done;
-
         if ( (rc = ops->read_msr(MSR_IA32_SYSENTER_EIP,
                                  &msr_val, ctxt)) != X86EMUL_OKAY )
             goto done;
@@ -6626,11 +6621,19 @@ x86_emulate(
             goto done;
         _regs.r(sp) = ctxt->lma ? msr_val : (uint32_t)msr_val;
 
+        fail_if(!ops->write_segment);
+        if ( (rc = ops->write_segment(x86_seg_cs, &cs,
+                                      ctxt)) != X86EMUL_OKAY ||
+             (rc = ops->write_segment(x86_seg_ss, &sreg,
+                                      ctxt)) != X86EMUL_OKAY )
+            goto done;
+
         singlestep = _regs.eflags & X86_EFLAGS_TF;
         break;
 
     case X86EMUL_OPC(0x0f, 0x35): /* sysexit */
         vcpu_must_have(sep);
+        generate_exception_if(amd_like(ctxt) && ctxt->lma, EXC_UD);
         generate_exception_if(!mode_ring0(), EXC_GP, 0);
         generate_exception_if(!in_protmode(ctxt, ops), EXC_GP, 0);
[Xen-devel] [PATCH 4/7] x86emul: vendor specific near indirect branch behavior in 64-bit mode
Intel CPUs ignore operand size overrides here, while AMD ones don't.

Signed-off-by: Jan Beulich

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -813,6 +813,17 @@ static const struct {
     .opcode = { 0x66, 0x67, 0xe3, 0x10 },
     .opc_len = { 4, 4 },
     .disp = { 4 + 16 - MMAP_ADDR, 4 + 16 },
+}, {
+    .descr = "jmpw *(%rsp)",
+    .opcode = { 0x66, 0xff, 0x24, 0x24 },
+    .opc_len = { 4, 4 },
+    .disp = { STKVAL_DISP - MMAP_ADDR, STKVAL_DISP },
+}, {
+    .descr = "callw *(%rsp)",
+    .opcode = { 0x66, 0xff, 0x14, 0x24 },
+    .opc_len = { 4, 4 },
+    .stkoff = { -2, -8 },
+    .disp = { STKVAL_DISP - MMAP_ADDR, STKVAL_DISP },
 },
 };
 #endif
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -2524,8 +2524,7 @@ x86_decode_onebyte(
         {
         case 2: /* call (near) */
         case 4: /* jmp (near) */
-        case 6: /* push */
-            if ( mode_64bit() && op_bytes == 4 )
+            if ( mode_64bit() && (op_bytes == 4 || !amd_like(ctxt)) )
                 op_bytes = 8;
             state->desc = DstNone | SrcMem | Mov;
             break;
@@ -2537,6 +2536,12 @@ x86_decode_onebyte(
                 op_bytes = 4;
             state->desc = DstNone | SrcMem | Mov;
             break;
+
+        case 6: /* push */
+            if ( mode_64bit() && op_bytes == 4 )
+                op_bytes = 8;
+            state->desc = DstNone | SrcMem | Mov;
+            break;
         }
         break;
     }
[Xen-devel] [PATCH 3/7] x86emul: vendor specific direct branch behavior in 64-bit mode
Intel CPUs ignore operand size overrides here, while AMD ones don't.

Signed-off-by: Jan Beulich

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -757,6 +757,62 @@ static const struct {
     .opc_len = { 4, 4 },
     .stkoff = { 2 + 16, 8 + 16 },
     .disp = { STKVAL_DISP - MMAP_ADDR, STKVAL_DISP },
+}, {
+    .descr = "jmpw .+16",
+    .opcode = { 0x66, 0xeb, 0x10 },
+    .opc_len = { 3, 3 },
+    .disp = { 3 + 16 - MMAP_ADDR, 3 + 16 },
+}, {
+    .descr = "jmpw .+128",
+    .opcode = { 0x66, 0xe9, 0x80, 0x00, 0x00, 0x00 },
+    .opc_len = { 4, 6 },
+    .disp = { 4 + 128 - MMAP_ADDR, 6 + 128 },
+}, {
+    .descr = "callw .+16",
+    .opcode = { 0x66, 0xe8, 0x10, 0x00, 0x00, 0x00 },
+    .opc_len = { 4, 6 },
+    .stkoff = { -2, -8 },
+    .disp = { 4 + 16 - MMAP_ADDR, 6 + 16 },
+}, {
+    .descr = "jzw .+16",
+    .opcode = { 0x66, 0x74, 0x10 },
+    .opc_len = { 3, 3 },
+    .disp = { 3, 3 },
+}, {
+    .descr = "jzw .+128",
+    .opcode = { 0x66, 0x0f, 0x84, 0x80, 0x00, 0x00, 0x00 },
+    .opc_len = { 5, 7 },
+    .disp = { 5, 7 },
+}, {
+    .descr = "jnzw .+16",
+    .opcode = { 0x66, 0x75, 0x10 },
+    .opc_len = { 3, 3 },
+    .disp = { 3 + 16 - MMAP_ADDR, 3 + 16 },
+}, {
+    .descr = "jnzw .+128",
+    .opcode = { 0x66, 0x0f, 0x85, 0x80, 0x00, 0x00, 0x00 },
+    .opc_len = { 5, 7 },
+    .disp = { 5 + 128 - MMAP_ADDR, 7 + 128 },
+}, {
+    .descr = "loopqw .+16 (RCX>1)",
+    .opcode = { 0x66, 0xe0, 0x10 },
+    .opc_len = { 3, 3 },
+    .disp = { 3 + 16 - MMAP_ADDR, 3 + 16 },
+}, {
+    .descr = "looplw .+16 (ECX=1)",
+    .opcode = { 0x66, 0x67, 0xe0, 0x10 },
+    .opc_len = { 4, 4 },
+    .disp = { 4, 4 },
+}, {
+    .descr = "jrcxzw .+16 (RCX>0)",
+    .opcode = { 0x66, 0xe3, 0x10 },
+    .opc_len = { 3, 3 },
+    .disp = { 3, 3 },
+}, {
+    .descr = "jecxzw .+16 (ECX=0)",
+    .opcode = { 0x66, 0x67, 0xe3, 0x10 },
+    .opc_len = { 4, 4 },
+    .disp = { 4 + 16 - MMAP_ADDR, 4 + 16 },
 },
 };
 #endif
@@ -1361,6 +1417,7 @@ int main(int argc, char **argv)
         const char *vendor = cp.x86_vendor == X86_VENDOR_INTEL ? "Intel" : "AMD";
         uint64_t *stk = (void *)res + MMAP_SZ - 16;

+        regs.rcx = 2;
         for ( i = 0; i < ARRAY_SIZE(vendor_tests); ++i )
         {
             printf("%-*s",
@@ -1370,6 +1427,7 @@ int main(int argc, char **argv)
             regs.eflags = EFLAGS_ALWAYS_SET;
             regs.rip    = (unsigned long)instr;
             regs.rsp    = (unsigned long)stk;
+            regs.rcx |= 0x87654321UL;
             stk[0] = regs.rip + STKVAL_DISP;
             rc = x86_emulate(&ctxt, &emulops);
             if ( (rc != X86EMUL_OKAY) ||
@@ -1379,6 +1437,16 @@ int main(int argc, char **argv)
                       ?: vendor_tests[i].opc_len[v])) ||
                  (regs.rsp != (unsigned long)stk + vendor_tests[i].stkoff[v]) )
                 goto fail;
+            /* For now only call insns push something onto the stack. */
+            if ( regs.rsp < (unsigned long)stk )
+            {
+                unsigned long opc_end = (unsigned long)instr +
+                                        vendor_tests[i].opc_len[v];
+
+                if ( memcmp(&opc_end, (void *)regs.rsp,
+                            min((unsigned long)stk - regs.rsp, 8UL)) )
+                    goto fail;
+            }
             printf("okay\n");
         }
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1273,7 +1273,7 @@ do {
 #define jmp_rel(rel)                                            \
 do {                                                            \
     unsigned long ip = _regs.r(ip) + (int)(rel);                \
-    if ( op_bytes == 2 )                                        \
+    if ( op_bytes == 2 && (amd_like(ctxt) || !mode_64bit()) )   \
         ip = (uint16_t)ip;                                      \
     else if ( !mode_64bit() )                                   \
         ip = (uint32_t)ip;                                      \
@@ -3392,7 +3392,13 @@ x86_decode(
     case SrcImm:
         if ( !(d & ByteOp) )
+        {
+            if ( mode_64bit() && !amd_like(ctxt) &&
+                 ((ext == ext_none && (b | 1) == 0xe9) /* call / jmp */ ||
+                  (ext == ext_0f && (b | 0xf) == 0x8f) /* jcc */ ) )
+                op_bytes = 4;
             bytes = op_bytes != 8 ? op_bytes : 4;
+        }
         else
         {
     case SrcImmByte:
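The adjusted jmp_rel() truncation rule above can be restated as a standalone sketch; the function name and parameters here are hypothetical, assuming the semantics the patch describes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model of post-patch jmp_rel() truncation: with a 16-bit
 * operand size, AMD-like CPUs truncate the new IP to 16 bits even in
 * 64-bit mode, while Intel-like CPUs ignore the override there.
 */
static unsigned long next_ip(unsigned long ip, int rel,
                             unsigned int op_bytes,
                             bool mode_64bit, bool amd_like)
{
    unsigned long nip = ip + rel;

    if ( op_bytes == 2 && (amd_like || !mode_64bit) )
        nip = (uint16_t)nip;       /* AMD honours 66h in 64-bit mode */
    else if ( !mode_64bit )
        nip = (uint32_t)nip;
    return nip;
}
```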
[Xen-devel] [PATCH 1/7] x86emul: add wrappers to check for AMD-like behavior
These are to aid readability at their use sites, in particular because
we're going to gain more of them.

Suggested-by: Andrew Cooper
Signed-off-by: Jan Beulich

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1836,6 +1836,18 @@ in_protmode(
     return !(in_realmode(ctxt, ops) || (ctxt->regs->eflags & X86_EFLAGS_VM));
 }

+static bool
+_amd_like(const struct cpuid_policy *cp)
+{
+    return cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON);
+}
+
+static bool
+amd_like(const struct x86_emulate_ctxt *ctxt)
+{
+    return _amd_like(ctxt->cpuid);
+}
+
 #define vcpu_has_fpu()         (ctxt->cpuid->basic.fpu)
 #define vcpu_has_sep()         (ctxt->cpuid->basic.sep)
 #define vcpu_has_cx8()         (ctxt->cpuid->basic.cx8)
@@ -1995,7 +2007,7 @@ protmode_load_seg(
     case x86_seg_tr:
         goto raise_exn;
     }
-    if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ||
+    if ( !_amd_like(cp) ||
          !ops->read_segment ||
          ops->read_segment(seg, sreg, ctxt) != X86EMUL_OKAY )
         memset(sreg, 0, sizeof(*sreg));
@@ -2122,9 +2134,7 @@ protmode_load_seg(
      * - all 16 bytes read with the high 8 bytes ignored on AMD.
      */
     bool wide = desc.b & 0x1000
-                ? false : (desc.b & 0xf00) != 0xc00 &&
-                          !(cp->x86_vendor &
-                            (X86_VENDOR_AMD | X86_VENDOR_HYGON))
+                ? false : (desc.b & 0xf00) != 0xc00 && !_amd_like(cp)
                           ? mode_64bit() : ctxt->lma;

     if ( wide )
@@ -2142,9 +2152,7 @@ protmode_load_seg(
         default:
             return rc;
         }
-        if ( !mode_64bit() &&
-             (cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) &&
-             (desc.b & 0xf00) != 0xc00 )
+        if ( !mode_64bit() && _amd_like(cp) && (desc.b & 0xf00) != 0xc00 )
             desc_hi.b = desc_hi.a = 0;
         if ( (desc_hi.b & 0x1f00) ||
              (seg != x86_seg_none &&
@@ -2525,9 +2533,7 @@ x86_decode_onebyte(
         case 3: /* call (far, absolute indirect) */
         case 5: /* jmp (far, absolute indirect) */
             /* REX.W ignored on a vendor-dependent basis. */
-            if ( op_bytes == 8 &&
-                 (ctxt->cpuid->x86_vendor &
-                  (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+            if ( op_bytes == 8 && amd_like(ctxt) )
                 op_bytes = 4;
             state->desc = DstNone | SrcMem | Mov;
             break;
@@ -2651,8 +2657,7 @@ x86_decode_twobyte(
     case 0xb4: /* lfs */
     case 0xb5: /* lgs */
         /* REX.W ignored on a vendor-dependent basis. */
-        if ( op_bytes == 8 &&
-             (ctxt->cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+        if ( op_bytes == 8 && amd_like(ctxt) )
             op_bytes = 4;
         break;
@@ -4068,9 +4073,7 @@ x86_emulate(
         if ( ea.type == OP_REG )
             src.val = *ea.reg;
         else if ( (rc = read_ulong(ea.mem.seg, ea.mem.off, &src.val,
-                                   (op_bytes == 2 &&
-                                    !(ctxt->cpuid->x86_vendor &
-                                      (X86_VENDOR_AMD | X86_VENDOR_HYGON))
+                                   (op_bytes == 2 && !amd_like(ctxt)
                                     ? 2 : 4), ctxt, ops)) )
             goto done;
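The wrapper pattern above can be tried out in isolation. The X86_VENDOR_* values below are illustrative bit masks, assumed here for the sketch only (in Xen they come from its own vendor headers); the point is just that the check is a mask test covering both AMD and Hygon:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed illustrative vendor masks -- not Xen's actual definitions. */
#define X86_VENDOR_INTEL (1u << 0)
#define X86_VENDOR_AMD   (1u << 1)
#define X86_VENDOR_HYGON (1u << 4)

/* Hygon CPUs are derived from AMD designs, so they share AMD behavior. */
static bool amd_like_vendor(unsigned int x86_vendor)
{
    return x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON);
}
```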
[Xen-devel] [PATCH 2/7] x86emul: vendor specific near RET behavior in 64-bit mode
Intel CPUs ignore operand size overrides here, while AMD ones don't.

Signed-off-by: Jan Beulich

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -733,6 +733,34 @@ static struct x86_emulate_ops emulops =
 #define EFLAGS_ALWAYS_SET (X86_EFLAGS_IF | X86_EFLAGS_MBS)
 #define EFLAGS_MASK (X86_EFLAGS_ARITH_MASK | EFLAGS_ALWAYS_SET)

+#define MMAP_ADDR 0x10
+
+#ifdef __x86_64__
+# define STKVAL_DISP 64
+static const struct {
+    const char *descr;
+    uint8_t opcode[8];
+    /* Index 0: AMD, index 1: Intel. */
+    uint8_t opc_len[2];
+    int8_t stkoff[2];
+    int32_t disp[2];
+} vendor_tests[] = {
+    {
+        .descr = "retw",
+        .opcode = { 0x66, 0xc3 },
+        .opc_len = { 2, 2 },
+        .stkoff = { 2, 8 },
+        .disp = { STKVAL_DISP - MMAP_ADDR, STKVAL_DISP },
+    }, {
+        .descr = "retw $16",
+        .opcode = { 0x66, 0xc2, 0x10, 0x00 },
+        .opc_len = { 4, 4 },
+        .stkoff = { 2 + 16, 8 + 16 },
+        .disp = { STKVAL_DISP - MMAP_ADDR, STKVAL_DISP },
+    },
+};
+#endif
+
 int main(int argc, char **argv)
 {
     struct x86_emulate_ctxt ctxt;
@@ -741,7 +769,9 @@ int main(int argc, char **argv)
     unsigned int *res, i, j;
     bool stack_exec;
     int rc;
-#ifndef __x86_64__
+#ifdef __x86_64__
+    unsigned int vendor_native;
+#else
     unsigned int bcdres_native, bcdres_emul;
 #endif
@@ -755,7 +785,7 @@ int main(int argc, char **argv)
     ctxt.addr_size = 8 * sizeof(void *);
     ctxt.sp_size   = 8 * sizeof(void *);

-    res = mmap((void *)0x10, MMAP_SZ, PROT_READ|PROT_WRITE|PROT_EXEC,
+    res = mmap((void *)MMAP_ADDR, MMAP_SZ, PROT_READ|PROT_WRITE|PROT_EXEC,
                MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
     if ( res == MAP_FAILED )
     {
@@ -1323,7 +1353,41 @@ int main(int argc, char **argv)
          (regs.eip != (unsigned long)&instr[3]) )
         goto fail;
     printf("okay\n");
-#endif
+
+    vendor_native = cp.x86_vendor;
+    for ( cp.x86_vendor = X86_VENDOR_AMD; ; )
+    {
+        unsigned int v = cp.x86_vendor == X86_VENDOR_INTEL;
+        const char *vendor = cp.x86_vendor == X86_VENDOR_INTEL ? "Intel" : "AMD";
+        uint64_t *stk = (void *)res + MMAP_SZ - 16;
+
+        for ( i = 0; i < ARRAY_SIZE(vendor_tests); ++i )
+        {
+            printf("%-*s",
+                   40 - printf("Testing %s [%s]", vendor_tests[i].descr, vendor),
+                   "...");
+            memcpy(instr, vendor_tests[i].opcode, vendor_tests[i].opc_len[v]);
+            regs.eflags = EFLAGS_ALWAYS_SET;
+            regs.rip    = (unsigned long)instr;
+            regs.rsp    = (unsigned long)stk;
+            stk[0] = regs.rip + STKVAL_DISP;
+            rc = x86_emulate(&ctxt, &emulops);
+            if ( (rc != X86EMUL_OKAY) ||
+                 (regs.eflags != EFLAGS_ALWAYS_SET) ||
+                 (regs.rip != (unsigned long)instr +
+                              (vendor_tests[i].disp[v]
+                               ?: vendor_tests[i].opc_len[v])) ||
+                 (regs.rsp != (unsigned long)stk + vendor_tests[i].stkoff[v]) )
+                goto fail;
+            printf("okay\n");
+        }
+
+        if ( cp.x86_vendor == X86_VENDOR_INTEL )
+            break;
+        cp.x86_vendor = X86_VENDOR_INTEL;
+    }
+    cp.x86_vendor = vendor_native;
+#endif /* x86-64 */

     printf("%-40s", "Testing shld $1,%ecx,(%edx)...");
     res[0] = 0x12345678;
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -4611,7 +4611,8 @@ x86_emulate(
     case 0xc2: /* ret imm16 (near) */
     case 0xc3: /* ret (near) */
-        op_bytes = ((op_bytes == 4) && mode_64bit()) ? 8 : op_bytes;
+        op_bytes = (op_bytes == 4 || !amd_like(ctxt)) && mode_64bit()
+                   ? 8 : op_bytes;
         if ( (rc = read_ulong(x86_seg_ss, sp_post_inc(op_bytes + src.val),
                               &dst.val, op_bytes, ctxt, ops)) != 0 ||
              (rc = ops->insn_fetch(x86_seg_cs, dst.val, NULL, 0, ctxt)) )
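The effective operand-size rule for near RET that the hunk above implements can be sketched on its own; the helper below is hypothetical, assuming the behavior described in the commit message:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch of the near-RET operand size after this patch:
 * in 64-bit mode Intel-like CPUs always use a 64-bit return address
 * (the 66h override is ignored), while AMD-like CPUs honour a 16-bit
 * operand size and pop only 2 bytes.
 */
static unsigned int ret_op_bytes(unsigned int op_bytes,
                                 bool mode_64bit, bool amd_like)
{
    if ( (op_bytes == 4 || !amd_like) && mode_64bit )
        return 8;
    return op_bytes;
}
```

This matches the test-harness expectations above: .stkoff = { 2, 8 } for plain "retw", i.e. 2 bytes popped on AMD versus 8 on Intel.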
[Xen-devel] [PATCH 0/7] x86emul: (mainly) vendor specific behavior adjustments
There are quite a few more vendor differences than we currently support,
in particular in 64-bit mode. Now that I've made some progress on the
binutils side, I felt more confident in making the changes here as well.

1: add wrappers to check for AMD-like behavior
2: vendor specific near RET behavior in 64-bit mode
3: vendor specific direct branch behavior in 64-bit mode
4: vendor specific near indirect branch behavior in 64-bit mode
5: vendor specific SYSENTER/SYSEXIT behavior in long mode
6: vendor specific SYSCALL behavior
7: support SYSRET

Jan
[Xen-devel] [xen-unstable test] 148925: tolerable trouble: fail/pass/starved
flight 148925 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/148925/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-dom0pvh-xl-intel 17 guest-saverestore.2 fail pass in 148873 test-arm64-arm64-xl 12 guest-startfail pass in 148873 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail in 148873 like 148826 test-arm64-arm64-xl 13 migrate-support-check fail in 148873 never pass test-arm64-arm64-xl 14 saverestore-support-check fail in 148873 never pass test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail like 148873 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 148873 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 148873 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 148873 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 148873 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 148873 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 148873 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 148873 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 148873 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 148873 test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail 
never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail 
never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-dom0pvh-xl-amd 2 hosts-allocate starved n/a version targeted for testing: xen 60d6ba1916dce0622a53b00dbae3c01d0761057e baseline version: xen 60d6ba1916dce0622a53b00dbae3c01d0761057e Last test of basis 148925 2020-03-23 17:36:41 Z0 days Testing same
Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"
On 24.03.2020 16:21, Hongyan Xia wrote:
> From: Hongyan Xia
>
> Unfortunately, even though that commit dropped the domctl lock and
> allowed other domctl to continue, it created severe lock contention
> within domain destructions themselves. Multiple domain destructions in
> parallel now spin for the global heap lock when freeing memory and could
> spend a long time before the next hypercall continuation.

I'm not at all happy to see this reverted; instead I was hoping that we
could drop the domctl lock in further cases. If a lack of continuations
is the problem, did you try forcing them to occur more frequently?

> In contrast,
> after dropping that commit, parallel domain destructions will just fail
> to take the domctl lock, creating a hypercall continuation and backing
> off immediately, allowing the thread that holds the lock to destroy a
> domain much more quickly and allowing backed-off threads to process
> events and irqs.
>
> On a 144-core server with 4TiB of memory, destroying 32 guests (each
> with 4 vcpus and 122GiB memory) simultaneously takes:
>
> before the revert: 29 minutes
> after the revert: 6 minutes

This wants comparing against numbers demonstrating the bad effects of
the global domctl lock. Iirc they were quite a bit higher than 6 min,
perhaps depending on guest properties.

Jan
[Xen-devel] [libvirt test] 148954: regressions - FAIL
flight 148954 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/148954/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 146182 build-i386-libvirt6 libvirt-buildfail REGR. vs. 146182 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 146182 build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 146182 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a version targeted for testing: libvirt 75c386985e02ef0d430d968a5c7093549507efba baseline version: libvirt a1cd25b919509be2645dbe6f952d5263e0d4e4e5 Last test of basis 146182 2020-01-17 06:00:23 Z 67 days Failing since146211 2020-01-18 04:18:52 Z 66 days 63 attempts Testing same since 148954 2020-03-24 04:19:05 Z0 days1 attempts People who touched revisions under test: Andrea Bolognani Arnaud Patard Boris Fiuczynski Christian Ehrhardt Collin Walling Daniel Henrique Barboza Daniel P. 
Berrangé Daniel Veillard Dario Faggioli Erik Skultety Gaurav Agrawal Han Han Jim Fehlig Jiri Denemark Jonathon Jongsma Julio Faracco Ján Tomko Laine Stump Lin Ma Marek Marczykowski-Górecki Michal Privoznik Nikolay Shirokovskiy Pavel Hrdina Pavel Mores Peter Krempa Pino Toscano Rafael Fonseca Richard W.M. Jones Rikard Falkeborn Ryan Moeller Sahid Orentino Ferdjaoui Sebastian Mitterle Seeteena Thoufeek Stefan Berger Stefan Berger Stefan Hajnoczi Thomas Huth Wu Qingliang Your Name Zhang Bo zhenwei pi Zhimin Feng jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt fail build-arm64-libvirt fail build-armhf-libvirt fail build-i386-libvirt fail build-amd64-pvopspass build-arm64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm blocked test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked test-amd64-amd64-libvirt-xsm blocked test-arm64-arm64-libvirt-xsm blocked test-amd64-i386-libvirt-xsm blocked test-amd64-amd64-libvirt blocked test-arm64-arm64-libvirt blocked test-armhf-armhf-libvirt blocked test-amd64-i386-libvirt blocked test-amd64-amd64-libvirt-pairblocked test-amd64-i386-libvirt-pair blocked test-arm64-arm64-libvirt-qcow2 blocked test-armhf-armhf-libvirt-raw blocked test-amd64-amd64-libvirt-vhd blocked -
[Xen-devel] [ovmf test] 148946: all pass - PUSHED
flight 148946 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148946/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 2f524a745e23e1b4c73ea22b058492bfe4af84c2
baseline version:
 ovmf                 0c8ea9fe1adbbee230ee0c68f28b68ca2b0534bc

Last test of basis   148761  2020-03-19 17:39:22 Z    4 days
Testing same since   148946  2020-03-24 02:46:56 Z    0 days    1 attempts

People who touched revisions under test:
  Fan, ZhijuX
  Zhiju.Fan

jobs:
 build-amd64-xsm                                      pass
 build-i386-xsm                                       pass
 build-amd64                                          pass
 build-i386                                           pass
 build-amd64-libvirt                                  pass
 build-i386-libvirt                                   pass
 build-amd64-pvops                                    pass
 build-i386-pvops                                     pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                  pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision:

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   0c8ea9fe1a..2f524a745e  2f524a745e23e1b4c73ea22b058492bfe4af84c2 -> xen-tested-master
[Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"
From: Hongyan Xia

Unfortunately, even though that commit dropped the domctl lock and
allowed other domctl to continue, it created severe lock contention
within domain destructions themselves. Multiple domain destructions in
parallel now spin for the global heap lock when freeing memory and could
spend a long time before the next hypercall continuation. In contrast,
after dropping that commit, parallel domain destructions will just fail
to take the domctl lock, creating a hypercall continuation and backing
off immediately, allowing the thread that holds the lock to destroy a
domain much more quickly and allowing backed-off threads to process
events and irqs.

On a 144-core server with 4TiB of memory, destroying 32 guests (each
with 4 vcpus and 122GiB memory) simultaneously takes:

before the revert: 29 minutes
after the revert: 6 minutes

This is timed from when the first page until when the very last page of
all 32 guests is released back to the heap.

This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.

Signed-off-by: Hongyan Xia
---
 xen/common/domain.c | 11 +--
 xen/common/domctl.c |  5 +
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b4eb476a9c..7b02f5ead7 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -698,20 +698,11 @@ int domain_kill(struct domain *d)
     if ( d == current->domain )
         return -EINVAL;

-    /* Protected by d->domain_lock. */
+    /* Protected by domctl_lock. */
     switch ( d->is_dying )
     {
     case DOMDYING_alive:
-        domain_unlock(d);
         domain_pause(d);
-        domain_lock(d);
-        /*
-         * With the domain lock dropped, d->is_dying may have changed. Call
-         * ourselves recursively if so, which is safe as then we won't come
-         * back here.
-         */
-        if ( d->is_dying != DOMDYING_alive )
-            return domain_kill(d);
         d->is_dying = DOMDYING_dying;
         argo_destroy(d);
         evtchn_destroy(d);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index a69b3b59a8..e010079203 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -571,14 +571,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         break;

     case XEN_DOMCTL_destroydomain:
-        domctl_lock_release();
-        domain_lock(d);
         ret = domain_kill(d);
-        domain_unlock(d);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(
                 __HYPERVISOR_domctl, "h", u_domctl);
-        goto domctl_out_unlock_domonly;
+        break;

     case XEN_DOMCTL_setnodeaffinity:
     {
-- 
2.17.1
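The back-off behaviour the revert restores can be modelled in a few lines. This is a hypothetical, simplified model (not Xen code): a destroy attempt that cannot take the domctl lock returns -ERESTART immediately, i.e. a hypercall continuation, instead of contending on the heap lock inside domain_kill():

```c
#include <assert.h>
#include <stdbool.h>

enum { OK = 0, ERESTART = -1 };

/*
 * Simplified model: with the revert in place, only the caller holding
 * the (here: boolean) domctl lock proceeds into domain_kill(); every
 * other destroy attempt backs off with ERESTART, which the hypercall
 * layer turns into a continuation the guest re-issues later.
 */
static int do_destroydomain(bool *domctl_locked)
{
    if ( *domctl_locked )
        return ERESTART;      /* back off; continuation created */

    *domctl_locked = true;
    /* ... domain_kill(): relinquish pages under the global heap lock ... */
    *domctl_locked = false;
    return OK;
}
```

The contended case does no work at all before returning, which is why the backed-off threads stay free to process events and IRQs.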
Re: [Xen-devel] [PATCH 1/2] xen: expand BALLOON_MEMORY_HOTPLUG description
On 24.03.20 16:18, Roger Pau Monné wrote:
> On Tue, Mar 24, 2020 at 04:13:48PM +0100, Jürgen Groß wrote:
>> On 24.03.20 16:00, Roger Pau Monne wrote:
>>> To mention it's also useful for PVH or HVM domains that require
>>> mapping foreign memory or grants.
>>>
>>> Signed-off-by: Roger Pau Monné
>>> ---
>>> Cc: Boris Ostrovsky
>>> Cc: Juergen Gross
>>> Cc: Stefano Stabellini
>>> Cc: xen-devel@lists.xenproject.org
>>> ---
>>>  drivers/xen/Kconfig | 4
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>>> index 61212fc7f0c7..57ddd6f4b729 100644
>>> --- a/drivers/xen/Kconfig
>>> +++ b/drivers/xen/Kconfig
>>> @@ -19,6 +19,10 @@ config XEN_BALLOON_MEMORY_HOTPLUG
>>>       It is very useful on critical systems which require long
>>>       run without rebooting.
>>>
>>> +     It's also very useful for translated domains (PVH or HVM) to obtain
>>
>> I'd rather say "(non PV)" or "(PVH, HVM or Arm)".
>
> I'm fine with any of the variants. Would you mind adjusting when
> picking it up or would you like me to resend?

No need to resend. I'll use the "non PV" variant.

Juergen
Re: [Xen-devel] [PATCH 1/2] xen: expand BALLOON_MEMORY_HOTPLUG description
On Tue, Mar 24, 2020 at 04:13:48PM +0100, Jürgen Groß wrote:
> On 24.03.20 16:00, Roger Pau Monne wrote:
> > To mention it's also useful for PVH or HVM domains that require
> > mapping foreign memory or grants.
> >
> > Signed-off-by: Roger Pau Monné
> > ---
> > Cc: Boris Ostrovsky
> > Cc: Juergen Gross
> > Cc: Stefano Stabellini
> > Cc: xen-devel@lists.xenproject.org
> > ---
> >  drivers/xen/Kconfig | 4
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> > index 61212fc7f0c7..57ddd6f4b729 100644
> > --- a/drivers/xen/Kconfig
> > +++ b/drivers/xen/Kconfig
> > @@ -19,6 +19,10 @@ config XEN_BALLOON_MEMORY_HOTPLUG
> >       It is very useful on critical systems which require long
> >       run without rebooting.
> >
> > +     It's also very useful for translated domains (PVH or HVM) to obtain
> 
> I'd rather say "(non PV)" or "(PVH, HVM or Arm)".

I'm fine with any of the variants. Would you mind adjusting when picking
it up or would you like me to resend?

> > +     unpopulated physical memory ranges to use in order to map foreign
> > +     memory or grants.
> > +
> >       Memory could be hotplugged in following steps:
> >       1) target domain: ensure that memory auto online policy is in
> 
> With that:
> 
> Reviewed-by: Juergen Gross

Thanks!
Re: [Xen-devel] [PATCH 2/2] xen: enable BALLOON_MEMORY_HOTPLUG by default
On Tue, Mar 24, 2020 at 04:09:35PM +0100, Jürgen Groß wrote:
> On 24.03.20 16:00, Roger Pau Monne wrote:
> > Without it a PVH dom0 is mostly useless, as it would balloon down huge
> > amounts of RAM in order to get physical address space to map foreign
> > memory and grants, ultimately leading to an out of memory situation.
> >
> > Such option is also needed for HVM or PVH driver domains, since they
> > also require mapping grants into physical memory regions.
> >
> > Suggested-by: Ian Jackson
> > Signed-off-by: Roger Pau Monné
> > ---
> > Cc: Boris Ostrovsky
> > Cc: Juergen Gross
> > Cc: Stefano Stabellini
> > Cc: xen-devel@lists.xenproject.org
> > ---
> >  drivers/xen/Kconfig | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> > index 57ddd6f4b729..c344bcffd89d 100644
> > --- a/drivers/xen/Kconfig
> > +++ b/drivers/xen/Kconfig
> > @@ -13,6 +13,7 @@ config XEN_BALLOON
> >  config XEN_BALLOON_MEMORY_HOTPLUG
> >       bool "Memory hotplug support for Xen balloon driver"
> >       depends on XEN_BALLOON && MEMORY_HOTPLUG
> > +     default y
> >       help
> >         Memory hotplug support for Xen balloon driver allows expanding memory
> >         available for the system above limit declared at system startup.
> 
> Another variant would be to set: default XEN_BACKEND
> 
> This would match the reasoning for switching it on.

I would rather have it always on if possible, as gntdev or privcmd
(when used to map foreign pages from user-space) will also require it,
and they are not gated on XEN_BACKEND AFAICT.

> Either way would be fine with me, so you can add
> 
> Reviewed-by: Juergen Gross

Thanks!

Roger.
Re: [Xen-devel] [PATCH 1/2] xen: expand BALLOON_MEMORY_HOTPLUG description
On 24.03.20 16:00, Roger Pau Monne wrote:
> To mention it's also useful for PVH or HVM domains that require
> mapping foreign memory or grants.
>
> Signed-off-by: Roger Pau Monné
> ---
> Cc: Boris Ostrovsky
> Cc: Juergen Gross
> Cc: Stefano Stabellini
> Cc: xen-devel@lists.xenproject.org
> ---
>  drivers/xen/Kconfig | 4
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 61212fc7f0c7..57ddd6f4b729 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -19,6 +19,10 @@ config XEN_BALLOON_MEMORY_HOTPLUG
>       It is very useful on critical systems which require long
>       run without rebooting.
>
> +     It's also very useful for translated domains (PVH or HVM) to obtain

I'd rather say "(non PV)" or "(PVH, HVM or Arm)".

> +     unpopulated physical memory ranges to use in order to map foreign
> +     memory or grants.
> +
>       Memory could be hotplugged in following steps:
>       1) target domain: ensure that memory auto online policy is in

With that:

Reviewed-by: Juergen Gross

Juergen
Re: [Xen-devel] [PATCH v2] xen/sched: fix onlining cpu with core scheduling active
On Tue, 2020-03-10 at 10:06 +0100, Juergen Gross wrote:
> When onlining a cpu cpupool_cpu_add() checks whether all siblings of
> the new cpu are free in order to decide whether to add it to cpupool0.
> In case the added cpu is not the last sibling to be onlined this test
> is wrong as it only checks for all online siblings to be free. The
> test should include the check for the number of siblings having
> reached the scheduling granularity of cpupool0, too.
>
> Signed-off-by: Juergen Gross
>
Reviewed-by: Dario Faggioli

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)
Re: [Xen-devel] [PATCH 2/2] xen: enable BALLOON_MEMORY_HOTPLUG by default
On 24.03.20 16:00, Roger Pau Monne wrote:
> Without it a PVH dom0 is mostly useless, as it would balloon down huge
> amounts of RAM in order to get physical address space to map foreign
> memory and grants, ultimately leading to an out of memory situation.
>
> Such option is also needed for HVM or PVH driver domains, since they
> also require mapping grants into physical memory regions.
>
> Suggested-by: Ian Jackson
> Signed-off-by: Roger Pau Monné
> ---
> Cc: Boris Ostrovsky
> Cc: Juergen Gross
> Cc: Stefano Stabellini
> Cc: xen-devel@lists.xenproject.org
> ---
>  drivers/xen/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 57ddd6f4b729..c344bcffd89d 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -13,6 +13,7 @@ config XEN_BALLOON
>  config XEN_BALLOON_MEMORY_HOTPLUG
>       bool "Memory hotplug support for Xen balloon driver"
>       depends on XEN_BALLOON && MEMORY_HOTPLUG
> +     default y
>       help
>         Memory hotplug support for Xen balloon driver allows expanding memory
>         available for the system above limit declared at system startup.

Another variant would be to set: default XEN_BACKEND

This would match the reasoning for switching it on.

Either way would be fine with me, so you can add

Reviewed-by: Juergen Gross

Juergen
[Xen-devel] [PATCH 2/2] xen: enable BALLOON_MEMORY_HOTPLUG by default
Without it a PVH dom0 is mostly useless, as it would balloon down huge
amounts of RAM in order to get physical address space to map foreign
memory and grants, ultimately leading to an out of memory situation.

Such option is also needed for HVM or PVH driver domains, since they
also require mapping grants into physical memory regions.

Suggested-by: Ian Jackson
Signed-off-by: Roger Pau Monné
---
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Stefano Stabellini
Cc: xen-devel@lists.xenproject.org
---
 drivers/xen/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 57ddd6f4b729..c344bcffd89d 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -13,6 +13,7 @@ config XEN_BALLOON
 config XEN_BALLOON_MEMORY_HOTPLUG
 	bool "Memory hotplug support for Xen balloon driver"
 	depends on XEN_BALLOON && MEMORY_HOTPLUG
+	default y
 	help
 	  Memory hotplug support for Xen balloon driver allows expanding memory
 	  available for the system above limit declared at system startup.
-- 
2.25.0
[Xen-devel] [PATCH 1/2] xen: expand BALLOON_MEMORY_HOTPLUG description
To mention it's also useful for PVH or HVM domains that require mapping
foreign memory or grants.

Signed-off-by: Roger Pau Monné
---
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Stefano Stabellini
Cc: xen-devel@lists.xenproject.org
---
 drivers/xen/Kconfig | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 61212fc7f0c7..57ddd6f4b729 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -19,6 +19,10 @@ config XEN_BALLOON_MEMORY_HOTPLUG
 	  It is very useful on critical systems which require long
 	  run without rebooting.
 
+	  It's also very useful for translated domains (PVH or HVM) to obtain
+	  unpopulated physical memory ranges to use in order to map foreign
+	  memory or grants.
+
 	  Memory could be hotplugged in following steps:
 
 	    1) target domain: ensure that memory auto online policy is in
-- 
2.25.0
Re: [Xen-devel] xl vcpu-pin peculiarities in core scheduling mode
On Tue, 2020-03-24 at 15:22 +0100, Jürgen Groß wrote:
> On 24.03.20 14:34, Sergey Dyasli wrote:
> > I did some experiments and noticed the following inconsistencies:
> >
> > 1. xl vcpu-pin 5 0 0
> >    Windows 10 (64-bit) (1)  5  0  0  -b-  1644.0  0 / all
> >    Windows 10 (64-bit) (1)  5  1  1  -b-  1650.1  0 / all
> >
> >    CPU 1 doesn't match reported hard-affinity of 0. Should this
> >    command set hard-affinity of vCPU 1 to 1? Or should it be 0-1
> >    for both vCPUs instead?
>
> I think this is fine.

For improving how this is reported back to users, I'd go for the
solution nr 3 proposed by Juergen (below).

> > 2. xl vcpu-pin 5 0 1
> >    libxl: error: libxl_sched.c:62:libxl__set_vcpuaffinity: Domain
> >    5:Setting vcpu affinity: Invalid argument
> >
> >    This is expected but perhaps needs documenting somewhere?
>
> Not against more clear error reporting.

It would mean that libxl must have a way to tell that pinning failed
because pinning was not being done to a "master CPU". I guess it's
doable, but perhaps it's not the top priority, assuming we have (and we
put in place, if we still don't) good documentation on how pinning
works in this operational mode.

That would make a good article/blog post, I think.

> > 3. xl vcpu-pin 5 0 1-2
> >    Windows 10 (64-bit) (1)  5  0  2  -b-  1646.7  1-2 / all
> >    Windows 10 (64-bit) (1)  5  1  3  -b-  1651.6  1-2 / all
> >
> >    Here is a CPU / affinity mismatch again, but the more interesting
> >    fact is that setting 1-2 is allowed at all, I'd expect CPU would
> >    never be set to 1 with such settings.
>
> This is the situation I'm most concerned of.

Mostly, because I think a user might be surprised to see the command
(1) not failing and (2) having the effect that it has.

I think that, in this case, we should either fail, or adjust the
affinity to 2-3. If we do the latter, we should inform the user about
that.
There's something similar in libxl already (related to soft and hard
affinity, where we set a mask, then we check what's been actually setup
by Xen and act accordingly).

Thoughts?

I'd go for a mix of 1 and 3, i.e., I'd do:

> 1. As today, documenting the output.
>    Not very nice IMO, but the least effort.
>
This, i.e., we definitely need more documentation and we need to make
sure it's visible enough.

> 2. Just print one line for each virtual cpu/core/socket, like:
>    Windows 10 (64-bit) (1)  5  0-1  0-1  -b-  1646.7  0-1 / all
>    This has the disadvantage of dropping the per-vcpu time in favor
>    of per-vcore time, OTOH this is reflecting reality.
>
> 3. Print the effective pinnings:
>    Windows 10 (64-bit) (1)  5  0  0  -b-  1646.7  0 / all
>    Windows 10 (64-bit) (1)  5  1  1  -b-  1646.7  1 / all
>    Should be rather easy to do.
>
And this: i.e., I'd always report the effective mapping. I actually
would go as far as changing the mapping we've been given and store the
effective one(s) in `cpu_hard_affinity`, etc, in Xen. Of course, as
said above, we'd need to inform the user that this has happened.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
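[Editor's note: the core-granularity pinning behaviour discussed in this thread — a request is valid only if it contains a "master CPU" (sibling 0 of a core), and each selected master pins the whole core — can be sketched as below. This is an illustration, not libxl or Xen code, and it assumes two sibling CPUs per core:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Siblings per virtual core (sched-gran=core on SMT-2 hardware). */
#define GRAN 2

/*
 * Compute the effective hard-affinity mask from a requested one, both
 * expressed as CPU bitmasks.  A request is accepted only if it contains
 * at least one "master" CPU (sibling 0 of some core); each selected
 * master then pins all siblings of its core.  Returns false for the
 * -EINVAL case (no master CPU in the requested set).
 */
static bool effective_affinity(uint64_t request, uint64_t *effective)
{
    uint64_t eff = 0;

    for ( unsigned int cpu = 0; cpu < 64; cpu += GRAN )
        if ( request & (1ULL << cpu) )           /* master CPU selected */
            eff |= ((1ULL << GRAN) - 1) << cpu;  /* pin the whole core  */

    if ( !eff )
        return false;                            /* no master in the set */

    *effective = eff;
    return true;
}
```

Under this model the three scenarios above come out as observed: requesting "0" yields effective 0-1 (vCPU 0 on CPU 0, vCPU 1 on CPU 1), requesting "1" is rejected, and requesting "1-2" yields effective 2-3.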
Re: [Xen-devel] [XEN PATCH v3 2/2] xen/arm: Configure early printk via Kconfig
On Tue, Mar 17, 2020 at 10:51:34AM -0700, Stefano Stabellini wrote:
> On Tue, 17 Mar 2020, Julien Grall wrote:
> > I noticed below you added "depends on ARM_64" on the Xilinx SoC. In
> > general, platform specific options are tied to either arm32 or
> > arm64, even if the UART "driver" is arch agnostic.
> >
> > You could technically boot Xen on Arm 32-bit on Armv8 HW provided
> > they support 32-bit at the hypervisor level, but we never supported
> > this case. So I am wondering whether we should add depends on each
> > earlyprintk. Stefano, any opinions?
>
> Well spotted.
>
> Xilinx doesn't support 32-bit Xen on their boards, "support" as in
> test, run or validate. So it would not be a problem from Xilinx point
> of view to add a "depends on ARM_64".
>
> I take that you are suggesting adding "depends on ARM_64/32" under the
> legacy platform earlyprintk options, from EARLY_PRINTK_BRCM to
> EARLY_PRINTK_ZYNQMP, right? If so, I am fine with it, and it seems
> like a good idea.

I don't have useful information on which Xen bitness each platform can
boot or support, so I can't really add those "depends on". But that
could be done in a follow-up.

> The other new generic earlyprintk options, the ones that only depend
> on the uart driver, from EARLY_UART_CHOICE_8250 to
> EARLY_UART_CHOICE_SCIF, it feels more natural to leave them without a
> specific arch dependency.

That would mean adding drivers for both arm32 and arm64. For example,
debug-cadence.inc is only available in arm64/. So if someone selects
arm32 and the cadence early uart driver, there's going to be a compile
error. That's the only reason why I've added "depends on" on all
EARLY_UART_CHOICE_*.

Thanks,

-- 
Anthony PERARD
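[Editor's note: the constraint Anthony describes — an early-uart choice whose debug-*.inc stub only exists under arm64/ must not be selectable on a 32-bit build — would be expressed in Kconfig roughly as follows. This is a hypothetical sketch of the pattern under discussion, not a hunk from the actual patch:]

```kconfig
# Sketch only: gate a choice whose driver stub lives under arm64/
# behind the 64-bit build, so an arm32 build cannot select it.
config EARLY_UART_CHOICE_CADENCE
	bool "Early printk via Cadence UART"
	depends on ARM_64
```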
Re: [Xen-devel] [PATCH v6 5/5] domain: use PGC_extra domheap page for shared_info
> -----Original Message-----
[snip]
> > Currently shared_info is a shared xenheap page but shared xenheap
> > pages complicate future plans for live-update of Xen so it is
> > desirable to, where possible, not use them [1]. This patch therefore
> > converts shared_info into a PGC_extra domheap page. This does entail
> > freeing shared_info during domain_relinquish_resources() rather than
> > domain_destroy() so care is needed to avoid de-referencing a NULL
> > shared_info pointer hence some extra checks of 'is_dying' are needed.
> >
> > NOTE: For Arm, the call to free_shared_info() in arch_domain_destroy()
> >       is left in place since it is idempotent and called in the error
> >       path for arch_domain_create().
>
> The approach looks good to me. I have one comment below.

Thanks.

> [...]
>
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index 4ef0d3b21e..4f3266454f 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -1651,24 +1651,44 @@ int continue_hypercall_on_cpu(
> >
> >  int alloc_shared_info(struct domain *d, unsigned int memflags)
> >  {
> > -    if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
> > +    struct page_info *pg;
> > +
> > +    pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
> > +    if ( !pg )
> >          return -ENOMEM;
> >
> > -    d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
> > +    if ( !get_page_and_type(pg, d, PGT_writable_page) )
> > +    {
> > +        /*
> > +         * The domain should not be running at this point so there is
> > +         * no way we should reach this error path.
> > +         */
> > +        ASSERT_UNREACHABLE();
> > +        return -ENODATA;
> > +    }
> > +
> > +    d->shared_info.mfn = page_to_mfn(pg);
> > +    d->shared_info.virt = __map_domain_page_global(pg);
>
> __map_domain_page_global() is not guaranteed to succeed. For instance,
> on Arm32 this will be a call to vmap().
>
> So you want to check the return.

Ok, I'll add a check.

As Jan discovered, I messed up the version numbering so there is
already a v7 series where I dropped this patch (until I can group it
with conversion of other shared xenheap pages).

  Paul

> Cheers,
>
> --
> Julien Grall
Re: [Xen-devel] [PATCH v4 1/3] mm: keep PGC_extra pages on a separate list
On 18/03/2020 17:32, Paul Durrant wrote:
This patch adds a new page_list_head into struct domain to hold
PGC_extra pages. This avoids them getting confused with 'normal'
domheap pages where the domain's page_list is walked.

A new dump loop is also added to dump_pageframe_info() to
unconditionally dump the 'extra page list'.

Signed-off-by: Paul Durrant
Reviewed-by: Jan Beulich
Acked-by: Julien Grall
---
Cc: Andrew Cooper
Cc: George Dunlap
Cc: Ian Jackson
Cc: Julien Grall
Cc: Stefano Stabellini
Cc: Wei Liu
Cc: "Roger Pau Monné"

v7:
 - Cosmetic changes

v6:
 - New in v6
---
 xen/arch/x86/domain.c    | 9 +
 xen/common/domain.c      | 1 +
 xen/common/page_alloc.c  | 2 +-
 xen/include/asm-x86/mm.h | 6 ++
 xen/include/xen/mm.h     | 5 ++---
 xen/include/xen/sched.h  | 13 +
 6 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index caf2ecad7e..683bc619aa 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -251,12 +251,21 @@ void dump_pageframe_info(struct domain *d)
         p2m_pod_dump_data(d);
 
     spin_lock(&d->page_alloc_lock);
+
     page_list_for_each ( page, &d->xenpage_list )
     {
         printk("XenPage %p: caf=%08lx, taf=%" PRtype_info "\n",
                _p(mfn_x(page_to_mfn(page))),
                page->count_info, page->u.inuse.type_info);
     }
+
+    page_list_for_each ( page, &d->extra_page_list )
+    {
+        printk("ExtraPage %p: caf=%08lx, taf=%" PRtype_info "\n",
+               _p(mfn_x(page_to_mfn(page))),
+               page->count_info, page->u.inuse.type_info);
+    }
+
     spin_unlock(&d->page_alloc_lock);
 }
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index b4eb476a9c..3dcd73f67c 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -403,6 +403,7 @@ struct domain *domain_create(domid_t domid,
     spin_lock_init_prof(d, page_alloc_lock);
     spin_lock_init(&d->hypercall_deadlock_mutex);
     INIT_PAGE_LIST_HEAD(&d->page_list);
+    INIT_PAGE_LIST_HEAD(&d->extra_page_list);
     INIT_PAGE_LIST_HEAD(&d->xenpage_list);
 
     spin_lock_init(&d->node_affinity_lock);
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 76d37226df..10b7aeca48 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2314,7 +2314,7 @@ int assign_pages(
         smp_wmb(); /* Domain pointer must be visible before updating refcnt. */
         pg[i].count_info =
             (pg[i].count_info & PGC_extra) | PGC_allocated | 1;
-        page_list_add_tail(&pg[i], &d->page_list);
+        page_list_add_tail(&pg[i], page_to_list(d, &pg[i]));
     }
 
  out:
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index a06b2fb81f..1fa334b306 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -629,10 +629,8 @@ typedef struct mm_rwlock {
     const char        *locker_function; /* func that took it */
 } mm_rwlock_t;
 
-#define arch_free_heap_page(d, pg) \
-    page_list_del2(pg, is_xen_heap_page(pg) ? \
-                       &(d)->xenpage_list : &(d)->page_list, \
-                   &(d)->arch.relmem_list)
+#define arch_free_heap_page(d, pg) \
+    page_list_del2(pg, page_to_list(d, pg), &(d)->arch.relmem_list)
 
 extern const char zero_page[];
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index d0d095d9c7..a163c201e2 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -583,9 +583,8 @@ static inline unsigned int get_order_from_pages(unsigned long nr_pages)
 void scrub_one_page(struct page_info *);
 
 #ifndef arch_free_heap_page
-#define arch_free_heap_page(d, pg) \
-    page_list_del(pg, is_xen_heap_page(pg) ? \
-                      &(d)->xenpage_list : &(d)->page_list)
+#define arch_free_heap_page(d, pg) \
+    page_list_del(pg, page_to_list(d, pg))
 #endif
 
 int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e6813288ab..4b78291d51 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -329,6 +329,7 @@ struct domain
 
     spinlock_t       page_alloc_lock; /* protects all the following fields  */
     struct page_list_head page_list;  /* linked list */
+    struct page_list_head extra_page_list; /* linked list (size extra_pages) */
     struct page_list_head xenpage_list; /* linked list (size xenheap_pages) */
 
     /*
@@ -512,6 +513,18 @@ struct domain
 #endif
 };
 
+static inline struct page_list_head *page_to_list(
+    struct domain *d, const struct page_info *pg)
+{
+    if ( is_xen_heap_page(pg) )
+        return &d->xenpage_list;
+
+    if ( pg->count_info & PGC_extra )
+        return &d->extra_page_list;
+
+    return &d->page_list;
+}
+
+/* Return number of pages currently posessed by
Re: [Xen-devel] [PATCH v4 3/3] mm: add 'is_special_page' inline function...
On 18/03/2020 17:32, Paul Durrant wrote:
From: Paul Durrant

... to cover xenheap and PGC_extra pages.

PGC_extra pages are intended to hold data structures that are
associated with a domain and may be mapped by that domain. They should
not be treated as 'normal' guest pages (i.e. RAM or page tables).
Hence, in many cases where code currently tests is_xen_heap_page() it
should also check for the PGC_extra bit in 'count_info'.

This patch therefore defines is_special_page() to cover both cases and
converts tests of is_xen_heap_page() (or open coded tests of
PGC_xen_heap) to is_special_page() where the page is assigned to a
domain.

Signed-off-by: Paul Durrant
Acked-by: Tamas K Lengyel
Acked-by: Julien Grall
---
Cc: Jan Beulich
Cc: Andrew Cooper
Cc: Wei Liu
Cc: "Roger Pau Monné"
Cc: George Dunlap
Cc: Ian Jackson
Cc: Julien Grall
Cc: Konrad Rzeszutek Wilk
Cc: Stefano Stabellini
Cc: Tim Deegan

v7:
 - Fixed some uses of is_xen_heap_mfn() that I'd missed
 - Updated commit comment to point out that only tests on assigned
   xenheap pages are candidates for conversion

v6:
 - Convert open-coded checks of PGC_xen_heap to use is_special_page()
   where appropriate

v4:
 - Use inline function instead of macro
 - Add missing conversions from is_xen_heap_page()

v3:
 - Delete obsolete comment.

v2:
 - New in v2
---
 xen/arch/x86/domctl.c           |  2 +-
 xen/arch/x86/mm.c               | 13 ++---
 xen/arch/x86/mm/altp2m.c        |  2 +-
 xen/arch/x86/mm/mem_sharing.c   |  3 +--
 xen/arch/x86/mm/p2m-pod.c       | 12 +++-
 xen/arch/x86/mm/p2m.c           |  4 ++--
 xen/arch/x86/mm/shadow/common.c | 13 -
 xen/arch/x86/mm/shadow/multi.c  |  3 ++-
 xen/arch/x86/tboot.c            |  4 ++--
 xen/include/xen/mm.h            |  5 +
 10 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index ed86762fa6..add70126b9 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -394,7 +394,7 @@ long arch_do_domctl(
                 page = get_page_from_gfn(d, gfn, &t, P2M_ALLOC);
 
                 if ( unlikely(!page) ||
-                     unlikely(is_xen_heap_page(page)) )
+                     unlikely(is_special_page(page)) )
                 {
                     if ( unlikely(p2m_is_broken(t)) )
                         type = XEN_DOMCTL_PFINFO_BROKEN;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 62507ca651..2fac67ad57 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1014,7 +1014,7 @@ get_page_from_l1e(
         unsigned long cacheattr = pte_flags_to_cacheattr(l1f);
         int err;
 
-        if ( is_xen_heap_page(page) )
+        if ( is_special_page(page) )
         {
             if ( write )
                 put_page_type(page);
@@ -2447,7 +2447,7 @@ static int cleanup_page_mappings(struct page_info *page)
     {
         page->count_info &= ~PGC_cacheattr_mask;
 
-        BUG_ON(is_xen_heap_page(page));
+        BUG_ON(is_special_page(page));
 
         rc = update_xen_mappings(mfn, 0);
     }
@@ -2477,7 +2477,7 @@ static int cleanup_page_mappings(struct page_info *page)
                     rc = rc2;
             }
 
-            if ( likely(!is_xen_heap_page(page)) )
+            if ( likely(!is_special_page(page)) )
             {
                 ASSERT((page->u.inuse.type_info &
                         (PGT_type_mask | PGT_count_mask)) == PGT_writable_page);
@@ -4216,8 +4216,7 @@ int steal_page(
     if ( !(owner = page_get_owner_and_reference(page)) )
         goto fail;
 
-    if ( owner != d || is_xen_heap_page(page) ||
-         (page->count_info & PGC_extra) )
+    if ( owner != d || is_special_page(page) )
         goto fail_put;
 
     /*
@@ -4580,8 +4579,8 @@ int xenmem_add_to_physmap_one(
     prev_mfn = get_gfn(d, gfn_x(gpfn), &p2mt);
     if ( mfn_valid(prev_mfn) )
     {
-        if ( is_xen_heap_mfn(prev_mfn) )
-            /* Xen heap frames are simply unhooked from this phys slot. */
+        if ( is_special_page(mfn_to_page(prev_mfn)) )
+            /* Special pages are simply unhooked from this phys slot. */
             rc = guest_physmap_remove_page(d, gpfn, prev_mfn, PAGE_ORDER_4K);
         else
             /* Normal domain memory is freed, to avoid leaking memory. */
diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c
index 50768f2547..c091b03ea3 100644
--- a/xen/arch/x86/mm/altp2m.c
+++ b/xen/arch/x86/mm/altp2m.c
@@ -77,7 +77,7 @@ int altp2m_vcpu_enable_ve(struct vcpu *v, gfn_t gfn)
      * pageable() predicate for this, due to it having the same properties
      * that we want.
      */
-    if ( !p2m_is_pageable(p2mt) || is_xen_heap_page(pg) )
+    if ( !p2m_is_pageable(p2mt) || is_special_page(pg) )
     {
         rc = -EINVAL;
         goto err;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 3835bc928f..f49f27a3ef 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem
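[Editor's note: the message above is truncated before the xen/mm.h hunk that actually defines is_special_page(). Based on the commit message ("to cover xenheap and PGC_extra pages"), it presumably amounts to the predicate sketched below; the flag values and the mock struct page_info here are stand-ins for Xen's real ones, for illustration only:]

```c
#include <assert.h>
#include <stdbool.h>

/* Mock page flags and page_info, standing in for Xen's real definitions. */
#define PGC_xen_heap (1UL << 0)
#define PGC_extra    (1UL << 1)

struct page_info {
    unsigned long count_info;
};

static bool is_xen_heap_page(const struct page_info *page)
{
    return page->count_info & PGC_xen_heap;
}

/*
 * A "special" page holds a Xen-side data structure rather than normal
 * guest RAM or page tables: either a shared xenheap page or a PGC_extra
 * domheap page.
 */
static bool is_special_page(const struct page_info *page)
{
    return is_xen_heap_page(page) || (page->count_info & PGC_extra);
}
```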
Re: [Xen-devel] [PATCH v6 5/5] domain: use PGC_extra domheap page for shared_info
Hi Paul,

On 10/03/2020 17:49, p...@xen.org wrote:
From: Paul Durrant

Currently shared_info is a shared xenheap page but shared xenheap pages
complicate future plans for live-update of Xen so it is desirable to,
where possible, not use them [1]. This patch therefore converts
shared_info into a PGC_extra domheap page. This does entail freeing
shared_info during domain_relinquish_resources() rather than
domain_destroy() so care is needed to avoid de-referencing a NULL
shared_info pointer hence some extra checks of 'is_dying' are needed.

NOTE: For Arm, the call to free_shared_info() in arch_domain_destroy()
      is left in place since it is idempotent and called in the error
      path for arch_domain_create().

The approach looks good to me. I have one comment below.

[...]

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 4ef0d3b21e..4f3266454f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1651,24 +1651,44 @@ int continue_hypercall_on_cpu(
 
 int alloc_shared_info(struct domain *d, unsigned int memflags)
 {
-    if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
+    struct page_info *pg;
+
+    pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
+    if ( !pg )
         return -ENOMEM;
 
-    d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
+    if ( !get_page_and_type(pg, d, PGT_writable_page) )
+    {
+        /*
+         * The domain should not be running at this point so there is
+         * no way we should reach this error path.
+         */
+        ASSERT_UNREACHABLE();
+        return -ENODATA;
+    }
+
+    d->shared_info.mfn = page_to_mfn(pg);
+    d->shared_info.virt = __map_domain_page_global(pg);

__map_domain_page_global() is not guaranteed to succeed. For instance,
on Arm32 this will be a call to vmap().

So you want to check the return.

Cheers,

-- 
Julien Grall
Re: [Xen-devel] xl vcpu-pin peculiarities in core scheduling mode
On 24.03.20 14:34, Sergey Dyasli wrote:
Hi Juergen,

I've noticed there is no documentation about how vcpu-pin is supposed
to work with core scheduling enabled. I did some experiments and
noticed the following inconsistencies:

1. xl vcpu-pin 5 0 0
   Windows 10 (64-bit) (1)  5  0  0  -b-  1644.0  0 / all
   Windows 10 (64-bit) (1)  5  1  1  -b-  1650.1  0 / all

   CPU 1 doesn't match reported hard-affinity of 0. Should this command
   set hard-affinity of vCPU 1 to 1? Or should it be 0-1 for both vCPUs
   instead?

2. xl vcpu-pin 5 0 1
   libxl: error: libxl_sched.c:62:libxl__set_vcpuaffinity: Domain
   5:Setting vcpu affinity: Invalid argument

   This is expected but perhaps needs documenting somewhere?

3. xl vcpu-pin 5 0 1-2
   Windows 10 (64-bit) (1)  5  0  2  -b-  1646.7  1-2 / all
   Windows 10 (64-bit) (1)  5  1  3  -b-  1651.6  1-2 / all

   Here is a CPU / affinity mismatch again, but the more interesting
   fact is that setting 1-2 is allowed at all, I'd expect CPU would
   never be set to 1 with such settings.

Please let me know what you think about the above cases.

I think all of the effects can be explained by the way how pinning with
core scheduling is implemented. This does not mean that the information
presented to the user shouldn't be adapted.

Basically pinning of any vcpu will just affect the "master"-vcpu of a
virtual core (sibling 0). It will happily accept any setting as long as
any "master"-cpu of a core is in the resulting set of cpus. All vcpus
of a virtual core share the same pinnings.

I think this explains all of the above scenarios.

IMO there are the following possibilities for reporting those pinnings
to the user:

1. As today, documenting the output.
   Not very nice IMO, but the least effort.

2. Just print one line for each virtual cpu/core/socket, like:
   Windows 10 (64-bit) (1)  5  0-1  0-1  -b-  1646.7  0-1 / all
   This has the disadvantage of dropping the per-vcpu time in favor of
   per-vcore time, OTOH this is reflecting reality.

3. Print the effective pinnings:
   Windows 10 (64-bit) (1)  5  0  0  -b-  1646.7  0 / all
   Windows 10 (64-bit) (1)  5  1  1  -b-  1646.7  1 / all
   Should be rather easy to do.

Thoughts?

Juergen
[Xen-devel] xl vcpu-pin peculiarities in core scheduling mode
Hi Juergen,

I've noticed there is no documentation about how vcpu-pin is supposed
to work with core scheduling enabled. I did some experiments and
noticed the following inconsistencies:

1. xl vcpu-pin 5 0 0
   Windows 10 (64-bit) (1)  5  0  0  -b-  1644.0  0 / all
   Windows 10 (64-bit) (1)  5  1  1  -b-  1650.1  0 / all

   CPU 1 doesn't match reported hard-affinity of 0. Should this command
   set hard-affinity of vCPU 1 to 1? Or should it be 0-1 for both vCPUs
   instead?

2. xl vcpu-pin 5 0 1
   libxl: error: libxl_sched.c:62:libxl__set_vcpuaffinity: Domain
   5:Setting vcpu affinity: Invalid argument

   This is expected but perhaps needs documenting somewhere?

3. xl vcpu-pin 5 0 1-2
   Windows 10 (64-bit) (1)  5  0  2  -b-  1646.7  1-2 / all
   Windows 10 (64-bit) (1)  5  1  3  -b-  1651.6  1-2 / all

   Here is a CPU / affinity mismatch again, but the more interesting
   fact is that setting 1-2 is allowed at all, I'd expect CPU would
   never be set to 1 with such settings.

Please let me know what you think about the above cases.

-- 
Thanks,
Sergey
Re: [Xen-devel] [PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG
On 24.03.20 13:00, Roger Pau Monné wrote:
Adding Juergen and Boris for feedback.

On Tue, Mar 24, 2020 at 11:57:48AM +, Ian Jackson wrote:
Ian Jackson writes ("Re: [PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG"):
Roger Pau Monne writes ("[PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG"):
This allows a PVH/HVM domain to use unpopulated memory ranges to map
foreign memory or grants, and is required for a PVH dom0 to function
properly.

Signed-off-by: Roger Pau Monné

Acked-by: Ian Jackson

I will push this to pretest immediately.

Now done. Would you consider whether the default should be changed in
Linux and prepare a patch to do so if appropriate?

DYK if there's any reason why this is not on by default?

AFAIK the default has been off since introduction in 2011. I don't know
the reason for that.

Juergen
Re: [Xen-devel] [PATCH v5 00/10] x86emul: further work
Paul,

On 24.03.2020 13:26, Jan Beulich wrote:
> Some of the later patches are still at least partly RFC, for
> varying reasons (see there). I'd appreciate though if at least
> some of the earlier ones could go in rather sooner than later.
>
> Patch 1 functionally (for the test harness) depends on
> "libx86/CPUID: fix (not just) leaf 7 processing", while at
> least patch 2 contextually depends on "x86emul: disable
> FPU/MMX/SIMD insn emulation when !HVM".
>
> 1: x86emul: support AVX512_BF16 insns

I should note that I also have a VP2INTERSECT patch ready, but the just
released SDE segfaults when trying to test it. I'll be holding this
back for some more time, I guess.

> 2: x86emul: support MOVDIRI insn
> 3: x86: determine HAVE_AS_* just once
> 4: x86: move back clang no integrated assembler tests
> 5: x86emul: support MOVDIR64B insn
> 6: x86emul: support ENQCMD insn
> 7: x86/HVM: scale MPERF values reported to guests (on AMD)
> 8: x86emul: support RDPRU
> 9: x86/HVM: don't needlessly intercept APERF/MPERF/TSC MSR reads
> 10: x86emul: support MCOMMIT

Paul, I should also note that I mistakenly Cc-ed your old Citrix
address. I'd like to avoid re-posting the series - do you perhaps
nevertheless get the xen-devel copies?

Jan
[Xen-devel] [PATCH v5 10/10] x86emul: support MCOMMIT
The dependency on a new EFER bit implies that we need to set that bit
ourselves in order to be able to successfully invoke the insn. Also
once again introduce the SVM related constants at this occasion.

Signed-off-by: Jan Beulich
---
RFC: The exact meaning of the PM stating "any errors encountered by
     those stores have been signaled to associated error logging
     resources" is unclear. Depending on what this entails, blindly
     enabling EFER.MCOMMIT may not be a good idea. Hence the RFC.
---
v5: Re-base.
v4: New.

--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -261,6 +261,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
         {"clzero",       0x80000008, NA, CPUID_REG_EBX,  0,  1},
         {"rstr-fp-err-ptrs", 0x80000008, NA, CPUID_REG_EBX, 2, 1},
         {"rdpru",        0x80000008, NA, CPUID_REG_EBX,  4,  1},
+        {"mcommit",      0x80000008, NA, CPUID_REG_EBX,  8,  1},
         {"wbnoinvd",     0x80000008, NA, CPUID_REG_EBX,  9,  1},
         {"ibpb",         0x80000008, NA, CPUID_REG_EBX, 12,  1},
         {"ppin",         0x80000008, NA, CPUID_REG_EBX, 23,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -149,7 +149,7 @@ static const char *const str_e8b[32] =
 
     [ 4] = "rdpru",
 
-    /* [ 8] */            [ 9] = "wbnoinvd",
+    [ 8] = "mcommit",     [ 9] = "wbnoinvd",
 
     [12] = "ibpb",
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -798,6 +798,9 @@ static void init_amd(struct cpuinfo_x86
 		wrmsr(MSR_K7_HWCR, l, h);
 	}
 
+	if (cpu_has(c, X86_FEATURE_MCOMMIT))
+		write_efer(read_efer() | EFER_MCOMMIT);
+
 	/* Prevent TSC drift in non single-processor, single-core platforms. */
 	if ((smp_processor_id() == 1) && !cpu_has(c, X86_FEATURE_ITSC))
 		disable_c1_ramping();
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1877,6 +1877,7 @@ in_protmode(
 #define vcpu_has_fma4()        (ctxt->cpuid->extd.fma4)
 #define vcpu_has_tbm()         (ctxt->cpuid->extd.tbm)
 #define vcpu_has_clzero()      (ctxt->cpuid->extd.clzero)
+#define vcpu_has_mcommit()     (ctxt->cpuid->extd.mcommit)
 #define vcpu_has_rdpru()       (ctxt->cpuid->extd.rdpru)
 #define vcpu_has_wbnoinvd()    (ctxt->cpuid->extd.wbnoinvd)
 
@@ -5676,6 +5677,28 @@ x86_emulate(
         _regs.r(cx) = (uint32_t)msr_val;
         goto rdtsc;
 
+    case 0xfa: /* monitorx / mcommit */
+        if ( vex.pfx == vex_f3 )
+        {
+            bool cf;
+
+            host_and_vcpu_must_have(mcommit);
+            if ( !ops->read_msr ||
+                 ops->read_msr(MSR_EFER, &msr_val, ctxt) != X86EMUL_OKAY )
+                msr_val = 0;
+            generate_exception_if(!(msr_val & EFER_MCOMMIT), EXC_UD);
+            memcpy(get_stub(stub),
+                   ((uint8_t[]){ 0xf3, 0x0f, 0x01, 0xfa, 0xc3 }), 5);
+            _regs.eflags &= ~EFLAGS_MASK;
+            invoke_stub("", ASM_FLAG_OUT(, "setc %[cf]"),
+                        [cf] ASM_FLAG_OUT("=@ccc", "=qm") (cf) : "i" (0));
+            if ( cf )
+                _regs.eflags |= X86_EFLAGS_CF;
+            put_stub(stub);
+            goto done;
+        }
+        goto unrecognized_insn;
+
     case 0xfc: /* clzero */
     {
         unsigned long zero = 0;
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -131,6 +131,9 @@
 #define cpu_has_avx512_4fmaps   boot_cpu_has(X86_FEATURE_AVX512_4FMAPS)
 #define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
 
+/* CPUID level 0x80000008.ebx */
+#define cpu_has_mcommit         boot_cpu_has(X86_FEATURE_MCOMMIT)
+
 /* CPUID level 0x00000007:1.eax */
 #define cpu_has_avx512_bf16     boot_cpu_has(X86_FEATURE_AVX512_BF16)
 
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -78,6 +78,11 @@ enum GenericIntercept2bits
     GENERAL2_INTERCEPT_RDPRU   = 1 << 14,
 };
 
+/* general 3 intercepts */
+enum GenericIntercept3bits
+{
+    GENERAL3_INTERCEPT_MCOMMIT = 1 << 3,
+};
 
 /* control register intercepts */
 enum CRInterceptBits
@@ -300,6 +305,7 @@ enum VMEXIT_EXITCODE
     VMEXIT_MWAIT_CONDITIONAL= 140, /* 0x8c */
     VMEXIT_XSETBV           = 141, /* 0x8d */
     VMEXIT_RDPRU            = 142, /* 0x8e */
+    VMEXIT_MCOMMIT          = 163, /* 0xa3 */
     VMEXIT_NPF              = 1024, /* 0x400, nested paging fault */
     VMEXIT_INVALID          = -1
 };
@@ -406,7 +412,8 @@ struct vmcb_struct {
     u32 _exception_intercepts;  /* offset 0x08 - cleanbit 0 */
     u32 _general1_intercepts;   /* offset 0x0C - cleanbit 0 */
     u32 _general2_intercepts;   /* offset 0x10 - cleanbit 0 */
-    u32 res01[10];
+    u32 _general3_intercepts;   /* offset 0x14 - cleanbit 0 */
+    u32 res01[9];
     u16 _pause_filter_thresh;   /* offset 0x3C - cleanbit 0 */
     u16 _pause_filter_count;    /* off
[Xen-devel] [PATCH v5 09/10] x86/HVM: don't needlessly intercept APERF/MPERF/TSC MSR reads
If the hardware can handle accesses, we should allow it to do so. This
way we can expose EFRO to HVM guests, and "all" that's left for
exposing APERF/MPERF is to figure out how to handle writes to these
MSRs. (Note that the leaf 6 guest CPUID checks will evaluate to false
for now, as recalculate_misc() zaps the entire leaf for now.)

For TSC the intercepts are made to mirror the RDTSC ones.

Signed-off-by: Jan Beulich
---
v4: Make TSC intercepts mirror RDTSC ones. Re-base.
v3: New.

--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -595,6 +595,7 @@ static void svm_cpuid_policy_changed(str
     struct vmcb_struct *vmcb = svm->vmcb;
     const struct cpuid_policy *cp = v->domain->arch.cpuid;
     u32 bitmap = vmcb_get_exception_intercepts(vmcb);
+    unsigned int mode;
 
     if ( opt_hvm_fep ||
          (v->domain->arch.cpuid->x86_vendor != boot_cpu_data.x86_vendor) )
@@ -607,6 +608,17 @@ static void svm_cpuid_policy_changed(str
     /* Give access to MSR_PRED_CMD if the guest has been told about it. */
     svm_intercept_msr(v, MSR_PRED_CMD,
                       cp->extd.ibpb ? MSR_INTERCEPT_NONE : MSR_INTERCEPT_RW);
+
+    /* Allow direct reads from APERF/MPERF if permitted by the policy. */
+    mode = cp->basic.raw[6].c & CPUID6_ECX_APERFMPERF_CAPABILITY
+           ? MSR_INTERCEPT_WRITE : MSR_INTERCEPT_RW;
+    svm_intercept_msr(v, MSR_IA32_APERF, mode);
+    svm_intercept_msr(v, MSR_IA32_MPERF, mode);
+
+    /* Allow direct access to their r/o counterparts if permitted. */
+    mode = cp->extd.efro ? MSR_INTERCEPT_NONE : MSR_INTERCEPT_RW;
+    svm_intercept_msr(v, MSR_APERF_RD_ONLY, mode);
+    svm_intercept_msr(v, MSR_MPERF_RD_ONLY, mode);
 }
 
 void svm_sync_vmcb(struct vcpu *v, enum vmcb_sync_state new_state)
@@ -860,7 +872,10 @@ static void svm_set_rdtsc_exiting(struct
     {
         general1_intercepts |= GENERAL1_INTERCEPT_RDTSC;
         general2_intercepts |= GENERAL2_INTERCEPT_RDTSCP;
+        svm_enable_intercept_for_msr(v, MSR_IA32_TSC);
     }
+    else
+        svm_intercept_msr(v, MSR_IA32_TSC, MSR_INTERCEPT_WRITE);
 
     vmcb_set_general1_intercepts(vmcb, general1_intercepts);
     vmcb_set_general2_intercepts(vmcb, general2_intercepts);
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -108,6 +108,7 @@ static int construct_vmcb(struct vcpu *v
     {
         vmcb->_general1_intercepts |= GENERAL1_INTERCEPT_RDTSC;
         vmcb->_general2_intercepts |= GENERAL2_INTERCEPT_RDTSCP;
+        svm_intercept_msr(v, MSR_IA32_TSC, MSR_INTERCEPT_WRITE);
     }
 
     /* Guest segment limits. */
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1141,8 +1141,13 @@ static int construct_vmcs(struct vcpu *v
         vmx_clear_msr_intercept(v, MSR_IA32_SYSENTER_CS, VMX_MSR_RW);
         vmx_clear_msr_intercept(v, MSR_IA32_SYSENTER_ESP, VMX_MSR_RW);
         vmx_clear_msr_intercept(v, MSR_IA32_SYSENTER_EIP, VMX_MSR_RW);
+
+        if ( !(v->arch.hvm.vmx.exec_control & CPU_BASED_RDTSC_EXITING) )
+            vmx_clear_msr_intercept(v, MSR_IA32_TSC, VMX_MSR_R);
+
         if ( paging_mode_hap(d) && (!is_iommu_enabled(d) || iommu_snoop) )
             vmx_clear_msr_intercept(v, MSR_IA32_CR_PAT, VMX_MSR_RW);
+
         if ( (vmexit_ctl & VM_EXIT_CLEAR_BNDCFGS) &&
              (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS) )
             vmx_clear_msr_intercept(v, MSR_IA32_BNDCFGS, VMX_MSR_RW);
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -585,6 +585,18 @@ static void vmx_cpuid_policy_changed(str
         vmx_clear_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
     else
         vmx_set_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
+
+    /* Allow direct reads from APERF/MPERF if permitted by the policy. */
+    if ( cp->basic.raw[6].c & CPUID6_ECX_APERFMPERF_CAPABILITY )
+    {
+        vmx_clear_msr_intercept(v, MSR_IA32_APERF, VMX_MSR_R);
+        vmx_clear_msr_intercept(v, MSR_IA32_MPERF, VMX_MSR_R);
+    }
+    else
+    {
+        vmx_set_msr_intercept(v, MSR_IA32_APERF, VMX_MSR_R);
+        vmx_set_msr_intercept(v, MSR_IA32_MPERF, VMX_MSR_R);
+    }
 }
 
 int vmx_guest_x86_mode(struct vcpu *v)
@@ -1250,7 +1262,12 @@ static void vmx_set_rdtsc_exiting(struct
     vmx_vmcs_enter(v);
     v->arch.hvm.vmx.exec_control &= ~CPU_BASED_RDTSC_EXITING;
     if ( enable )
+    {
         v->arch.hvm.vmx.exec_control |= CPU_BASED_RDTSC_EXITING;
+        vmx_set_msr_intercept(v, MSR_IA32_TSC, VMX_MSR_R);
+    }
+    else
+        vmx_clear_msr_intercept(v, MSR_IA32_TSC, VMX_MSR_R);
     vmx_update_cpu_exec_control(v);
     vmx_vmcs_exit(v);
 }
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -243,7 +243,7 @@ XEN_CPUFEATURE(ENQCMD,        6*32+29) /
 
 /* AMD-defined CPU features, CPUID level 0x80000007.edx, word 7 */
 XEN_CPUFEATURE(ITSC,          7*32+ 8) /* Invariant TSC */
-XEN_CPUFEATURE(EFRO,          7*32+10) /* APERF/MPERF Read Only interface */
+XEN_CPUFEATURE(EFRO,
[Xen-devel] [xen-unstable-smoke test] 148966: tolerable all pass - PUSHED
flight 148966 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/148966/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 3ec1296ad3a823609eec479cb6c7ee493f6a888b baseline version: xen 60d6ba1916dce0622a53b00dbae3c01d0761057e Last test of basis 148813 2020-03-21 17:00:59 Z2 days Testing same since 148966 2020-03-24 10:00:45 Z0 days1 attempts People who touched revisions under test: Andrew Cooper David Woodhouse Hongyan Xia Ian Jackson Pu Wen Yan Yankovskyi jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 60d6ba1916..3ec1296ad3 3ec1296ad3a823609eec479cb6c7ee493f6a888b -> smoke
[Xen-devel] [PATCH v5 08/10] x86emul: support RDPRU
While the PM doesn't say so, this assumes that the MPERF value read this way gets scaled similarly to its reading through RDMSR. Also introduce the SVM related constants at this occasion. Signed-off-by: Jan Beulich --- v5: The CPUID field used is just 8 bits wide. v4: Add GENERAL2_INTERCEPT_RDPRU and VMEXIT_RDPRU enumerators. Fold handling of out of bounds indexes into switch(). Avoid recalculate_misc() clobbering what recalculate_cpu_policy() has done. Re-base. v3: New. --- RFC: Andrew promised to take care of the CPUID side of this; re-base over his work once available. --- a/tools/libxl/libxl_cpuid.c +++ b/tools/libxl/libxl_cpuid.c @@ -260,6 +260,7 @@ int libxl_cpuid_parse_config(libxl_cpuid {"clzero", 0x8008, NA, CPUID_REG_EBX, 0, 1}, {"rstr-fp-err-ptrs", 0x8008, NA, CPUID_REG_EBX, 2, 1}, +{"rdpru",0x8008, NA, CPUID_REG_EBX, 4, 1}, {"wbnoinvd", 0x8008, NA, CPUID_REG_EBX, 9, 1}, {"ibpb", 0x8008, NA, CPUID_REG_EBX, 12, 1}, {"ppin", 0x8008, NA, CPUID_REG_EBX, 23, 1}, --- a/tools/misc/xen-cpuid.c +++ b/tools/misc/xen-cpuid.c @@ -147,6 +147,8 @@ static const char *const str_e8b[32] = [ 0] = "clzero", [ 2] = "rstr-fp-err-ptrs", +[ 4] = "rdpru", + /* [ 8] */[ 9] = "wbnoinvd", [12] = "ibpb", --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -683,6 +683,13 @@ static int read_msr( { switch ( reg ) { +case 0x00e8: /* APERF */ +case 0xc0e8: /* APERF_RD_ONLY */ +#define APERF_LO_VALUE 0xAEAEAEAE +#define APERF_HI_VALUE 0xEAEAEAEA +*val = ((uint64_t)APERF_HI_VALUE << 32) | APERF_LO_VALUE; +return X86EMUL_OKAY; + case 0xc080: /* EFER */ *val = ctxt->addr_size > 32 ? 
0x500 /* LME|LMA */ : 0; return X86EMUL_OKAY; @@ -2249,6 +2256,30 @@ int main(int argc, char **argv) else printf("skipped\n"); +printf("%-40s", "Testing rdpru..."); +instr[0] = 0x0f; instr[1] = 0x01; instr[2] = 0xfd; +regs.eip = (unsigned long)&instr[0]; +regs.ecx = 1; +regs.eflags = EFLAGS_ALWAYS_SET; +rc = x86_emulate(&ctxt, &emulops); +if ( (rc != X86EMUL_OKAY) || + (regs.eax != APERF_LO_VALUE) || (regs.edx != APERF_HI_VALUE) || + !(regs.eflags & X86_EFLAGS_CF) || + (regs.eip != (unsigned long)&instr[3]) ) +goto fail; +if ( ctxt.cpuid->extd.rdpru_max < 0x ) +{ +regs.eip = (unsigned long)&instr[0]; +regs.ecx = ctxt.cpuid->extd.rdpru_max + 1; +regs.eflags = EFLAGS_ALWAYS_SET | X86_EFLAGS_CF; +rc = x86_emulate(&ctxt, &emulops); +if ( (rc != X86EMUL_OKAY) || regs.eax || regs.edx || + (regs.eflags & X86_EFLAGS_CF) || + (regs.eip != (unsigned long)&instr[3]) ) +goto fail; +} +printf("okay\n"); + printf("%-40s", "Testing movq %mm3,(%ecx)..."); if ( stack_exec && cpu_has_mmx ) { --- a/tools/tests/x86_emulator/x86-emulate.c +++ b/tools/tests/x86_emulator/x86-emulate.c @@ -78,6 +78,8 @@ bool emul_test_init(void) cp.feat.rdpid = true; cp.feat.movdiri = true; cp.extd.clzero = true; +cp.extd.rdpru = true; +cp.extd.rdpru_max = 1; if ( cpu_has_xsave ) { @@ -150,11 +152,11 @@ int emul_test_cpuid( } /* - * The emulator doesn't itself use CLZERO, so we can always run the + * The emulator doesn't itself use CLZERO/RDPRU, so we can always run the * respective test(s). */ if ( leaf == 0x8008 ) -res->b |= 1U << 0; +res->b |= (1U << 0) | (1U << 4); return X86EMUL_OKAY; } --- a/xen/arch/x86/cpuid.c +++ b/xen/arch/x86/cpuid.c @@ -243,8 +243,6 @@ static void recalculate_misc(struct cpui /* Most of Power/RAS hidden from guests. 
*/ p->extd.raw[0x7].a = p->extd.raw[0x7].b = p->extd.raw[0x7].c = 0; -p->extd.raw[0x8].d = 0; - switch ( p->x86_vendor ) { case X86_VENDOR_INTEL: @@ -263,6 +261,7 @@ static void recalculate_misc(struct cpui p->extd.raw[0x8].a &= 0x; p->extd.raw[0x8].c = 0; +p->extd.raw[0x8].d = 0; break; case X86_VENDOR_AMD: @@ -281,6 +280,7 @@ static void recalculate_misc(struct cpui p->extd.raw[0x8].a &= 0x; /* GuestMaxPhysAddr hidden. */ p->extd.raw[0x8].c &= 0x0003f0ff; +p->extd.raw[0x8].d &= 0x; p->extd.raw[0x9] = EMPTY_LEAF; @@ -643,6 +643,11 @@ void recalculate_cpuid_policy(struct dom p->extd.maxlinaddr = p->extd.lm ? 48 : 32; +if ( p->extd.rdpru ) +p->extd.rdpru_max = min(p->extd.rdpru_max, max->extd.rdpru_max); +else +p->extd.rdpru_max = 0; + recalculate_xstate(p); recalculate_misc(p); --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -1877,6 +1877,7 @@ in_
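The architectural behaviour the emulator test above exercises can be modelled in a few lines. Per the AMD APM, RDPRU takes a register index in ECX (0 = MPERF, 1 = APERF), returns the value in EDX:EAX with CF clear, and yields zero with CF set for an out-of-range index. The struct and the stand-in register values below are purely illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

struct rdpru_result {
    uint32_t eax, edx;
    bool cf;
};

/* Toy model of RDPRU: ecx selects an entry in regs[], bounded by the
 * CPUID-reported maximum index (extd.rdpru_max in the patch). */
static struct rdpru_result rdpru(uint32_t ecx, uint16_t rdpru_max,
                                 const uint64_t *regs)
{
    struct rdpru_result r = { 0, 0, false };

    if ( ecx > rdpru_max )
        r.cf = true;                   /* out of range: EDX:EAX = 0, CF = 1 */
    else
    {
        r.eax = (uint32_t)regs[ecx];
        r.edx = (uint32_t)(regs[ecx] >> 32);
    }
    return r;
}
```

With the harness's APERF test pattern (0xEAEAEAEA:0xAEAEAEAE) at index 1, a valid read splits the 64-bit value across EDX:EAX, while index 2 with rdpru_max = 1 sets CF.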
[Xen-devel] [PATCH v5 07/10] x86/HVM: scale MPERF values reported to guests (on AMD)
AMD's PM specifies that MPERF (and its r/o counterpart) reads are affected by the TSC ratio. Hence when processing such reads in software we too should scale the values. While we don't currently (yet) expose the underlying feature flags, besides us allowing the MSRs to be read nevertheless, RDPRU is going to expose the values even to user space. Furthermore, due to the not exposed feature flags, this change has the effect of making properly inaccessible (for reads) the two MSRs. Note that writes to MPERF (and APERF) continue to be unsupported. Signed-off-by: Jan Beulich --- v3: New. --- I did consider whether to put the code in guest_rdmsr() instead, but decided that it's better to have it next to TSC handling. --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3456,6 +3456,22 @@ int hvm_msr_read_intercept(unsigned int *msr_content = v->arch.hvm.msr_tsc_adjust; break; +case MSR_MPERF_RD_ONLY: +if ( !d->arch.cpuid->extd.efro ) +{ +goto gp_fault; + +case MSR_IA32_MPERF: +if ( !(d->arch.cpuid->basic.raw[6].c & + CPUID6_ECX_APERFMPERF_CAPABILITY) ) +goto gp_fault; +} +if ( rdmsr_safe(msr, *msr_content) ) +goto gp_fault; +if ( d->arch.cpuid->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) ) +*msr_content = hvm_get_guest_tsc_fixed(v, *msr_content); +break; + case MSR_APIC_BASE: *msr_content = vcpu_vlapic(v)->hw.apic_base_msr; break; --- a/xen/include/asm-x86/msr-index.h +++ b/xen/include/asm-x86/msr-index.h @@ -397,6 +397,9 @@ #define MSR_IA32_MPERF 0x00e7 #define MSR_IA32_APERF 0x00e8 +#define MSR_MPERF_RD_ONLY 0xc0e7 +#define MSR_APERF_RD_ONLY 0xc0e8 + #define MSR_IA32_THERM_CONTROL 0x019a #define MSR_IA32_THERM_INTERRUPT 0x019b #define MSR_IA32_THERM_STATUS 0x019c
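The scaling applied in hvm_get_guest_tsc_fixed() on AMD follows the TSC ratio MSR format: an 8.32 fixed-point multiplier (integer part in bits 39:32, fraction in bits 31:0). A minimal sketch of that arithmetic, assuming a 128-bit intermediate to avoid overflow (the helper name is illustrative):

```c
#include <stdint.h>

/* Scale a raw MPERF/TSC reading by an AMD-style 8.32 fixed-point
 * ratio: result = value * ratio / 2^32. */
static uint64_t scale_by_tsc_ratio(uint64_t value, uint64_t ratio)
{
    return (uint64_t)(((unsigned __int128)value * ratio) >> 32);
}
```

A ratio of 1ull << 32 encodes 1.0 (identity), while 3ull << 31 encodes 1.5.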
[Xen-devel] [PATCH v5 06/10] x86emul: support ENQCMD insn
Introduce a new blk() hook, paralleling the rmw() one in a certain way, but being intended for larger data sizes, and hence its HVM intermediate handling function doesn't fall back to splitting the operation if the requested virtual address can't be mapped. Note that the ISA extensions document revision 037 doesn't specify exception behavior for ModRM.mod == 0b11; assuming #UD here. No tests are being added to the harness - this would be quite hard, we can't just issue the insns against RAM. Their similarity with MOVDIR64B should have the test case there be good enough to cover any fundamental flaws. Signed-off-by: Jan Beulich --- TBD: This doesn't (can't) consult PASID translation tables yet, as we have no VMX code for this so far. I guess for this we will want to replace the direct ->read_msr(MSR_IA32_PASID, ...) with a new ->read_pasid() hook. --- v5: New. --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -250,13 +250,14 @@ $(BASEDIR)/include/asm-x86/asm-macros.h: # sure we pick up changes when the compiler used has changed.)
ifeq ($(MAKECMDGOALS),asm-offsets.s) -as-ISA-list := CLWB EPT FSGSBASE INVPCID MOVDIR64B RDRAND RDSEED SSE4_2 VMX XSAVEOPT +as-ISA-list := CLWB ENQCMD EPT FSGSBASE INVPCID MOVDIR64B RDRAND RDSEED SSE4_2 VMX XSAVEOPT CLWB-insn := clwb (%rax) EPT-insn := invept (%rax),%rax FSGSBASE-insn := rdfsbase %rax INVPCID-insn := invpcid (%rax),%rax MOVDIR64B-insn := movdir64b (%rax),%rax +ENQCMD-insn:= enqcmd (%rax),%rax RDRAND-insn:= rdrand %eax RDSEED-insn:= rdseed %eax SSE4_2-insn:= crc32 %eax,%eax --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -854,6 +854,7 @@ struct x86_emulate_state { rmw_xor, } rmw; enum { +blk_enqcmd, blk_movdir, } blk; uint8_t modrm, modrm_mod, modrm_reg, modrm_rm; @@ -900,6 +901,7 @@ typedef union { uint64_t __attribute__ ((aligned(16))) xmm[2]; uint64_t __attribute__ ((aligned(32))) ymm[4]; uint64_t __attribute__ ((aligned(64))) zmm[8]; +uint32_t data32[16]; } mmval_t; /* @@ -1909,6 +1911,7 @@ in_protmode( #define vcpu_has_rdpid() (ctxt->cpuid->feat.rdpid) #define vcpu_has_movdiri() (ctxt->cpuid->feat.movdiri) #define vcpu_has_movdir64b() (ctxt->cpuid->feat.movdir64b) +#define vcpu_has_enqcmd() (ctxt->cpuid->feat.enqcmd) #define vcpu_has_avx512_4vnniw() (ctxt->cpuid->feat.avx512_4vnniw) #define vcpu_has_avx512_4fmaps() (ctxt->cpuid->feat.avx512_4fmaps) #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16) @@ -10101,6 +10104,36 @@ x86_emulate( state->simd_size = simd_none; break; +case X86EMUL_OPC_F2(0x0f38, 0xf8): /* enqcmd r,m512 */ +case X86EMUL_OPC_F3(0x0f38, 0xf8): /* enqcmds r,m512 */ +host_and_vcpu_must_have(enqcmd); +generate_exception_if(ea.type != OP_MEM, EXC_UD); +generate_exception_if(vex.pfx != vex_f2 && !mode_ring0(), EXC_GP, 0); +src.val = truncate_ea(*dst.reg); +generate_exception_if(!is_aligned(x86_seg_es, src.val, 64, ctxt, ops), + EXC_GP, 0); +fail_if(!ops->blk); +BUILD_BUG_ON(sizeof(*mmvalp) < 64); +if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp, 64, + ctxt)) != 
X86EMUL_OKAY ) +goto done; +if ( vex.pfx == vex_f2 ) /* enqcmd */ +{ +fail_if(!ops->read_msr); +if ( (rc = ops->read_msr(MSR_IA32_PASID, + &msr_val, ctxt)) != X86EMUL_OKAY ) +goto done; +generate_exception_if(!(msr_val & PASID_VALID), EXC_GP, 0); +mmvalp->data32[0] = MASK_EXTR(msr_val, PASID_PASID_MASK); +} +mmvalp->data32[0] &= ~0x7ff0; +state->blk = blk_enqcmd; +if ( (rc = ops->blk(x86_seg_es, src.val, mmvalp, 64, &_regs.eflags, +state, ctxt)) != X86EMUL_OKAY ) +goto done; +state->simd_size = simd_none; +break; + case X86EMUL_OPC(0x0f38, 0xf9): /* movdiri mem,r */ vcpu_must_have(movdiri); generate_exception_if(dst.type != OP_MEM, EXC_UD); @@ -11378,11 +11411,36 @@ int x86_emul_blk( { switch ( state->blk ) { +bool zf; + /* * Throughout this switch(), memory clobbers are used to compensate * that other operands may not properly express the (full) memory * ranges covered. */ +case blk_enqcmd: +ASSERT(bytes == 64); +if ( ((unsigned long)ptr & 0x3f) ) +{ +ASSERT_UNREACHABLE(); +return X86EMUL_UNHANDLEABLE; +} +*eflags &= ~EFLAGS_MASK; +#ifdef HAVE_AS_ENQCMD +asm ( "enqcmds (%[src]), %[dst]" ASM_FLAG_OUT(, "; setz %0") + : [zf] ASM_FLAG_OUT("=@ccz", "=qm") (zf) + : [src] "r" (data), [dst] "r" (ptr) : "memory" ); +#else +
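The payload formation done by the emulation code above can be summarised as: read the 64-byte command, and for ENQCMD substitute the first dword with the PASID taken from the IA32_PASID MSR (valid bit 31, PASID in bits 19:0), clearing the reserved bits 30:20. The sketch below models just that data flow under those assumptions; constants and the error convention are illustrative, and a real implementation would also raise #GP and perform the actual enqueue store.

```c
#include <stdint.h>
#include <string.h>

#define PASID_VALID      (1ull << 31)
#define PASID_PASID_MASK 0xfffffull

/* Returns 0 on success, -1 if IA32_PASID is not valid (#GP in hardware). */
static int enqcmd_payload(uint8_t dst[64], const uint8_t src[64],
                          uint64_t ia32_pasid)
{
    uint32_t dw0;

    if ( !(ia32_pasid & PASID_VALID) )
        return -1;

    memcpy(dst, src, 64);
    dw0 = (uint32_t)(ia32_pasid & PASID_PASID_MASK); /* PASID into 19:0 */
    dw0 &= ~0x7ff00000u;  /* clear reserved 30:20 (no-op here, but the
                           * emulator applies this mask for ENQCMDS too) */
    memcpy(dst, &dw0, sizeof(dw0));
    return 0;
}
```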
[Xen-devel] [PATCH v5 05/10] x86emul: support MOVDIR64B insn
Introduce a new blk() hook, paralleling the rmw() one in a certain way, but being intended for larger data sizes, and hence its HVM intermediate handling function doesn't fall back to splitting the operation if the requested virtual address can't be mapped. Note that SDM revision 071 doesn't specify exception behavior for ModRM.mod == 0b11; assuming #UD here. Signed-off-by: Jan Beulich --- TBD: If we want to avoid depending on correct MTRR settings, hvmemul_map_linear_addr() may need to gain a parameter to allow controlling cachability of the produced mapping(s). Of course the function will also need to be made capable of mapping at least p2m_mmio_direct pages for this and the two ENQCMD insns to be actually useful. --- v5: Introduce/use ->blk() hook. Correct asm() operands. v4: Split MOVDIRI and MOVDIR64B. Switch to using ->rmw(). Re-base. v3: Update description. --- (SDE: -tnt) --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -652,6 +652,18 @@ static int cmpxchg( return X86EMUL_OKAY; } +static int blk( +enum x86_segment seg, +unsigned long offset, +void *p_data, +unsigned int bytes, +uint32_t *eflags, +struct x86_emulate_state *state, +struct x86_emulate_ctxt *ctxt) +{ +return x86_emul_blk((void *)offset, p_data, bytes, eflags, state, ctxt); +} + static int read_segment( enum x86_segment seg, struct segment_register *reg, @@ -2208,6 +2220,35 @@ int main(int argc, char **argv) goto fail; printf("okay\n"); +printf("%-40s", "Testing movdir64b 144(%edx),%ecx..."); +if ( stack_exec && cpu_has_movdir64b ) +{ +emulops.blk = blk; + +instr[0] = 0x66; instr[1] = 0x0f; instr[2] = 0x38; instr[3] = 0xf8; +instr[4] = 0x8a; instr[5] = 0x90; instr[8] = instr[7] = instr[6] = 0; + +regs.eip = (unsigned long)&instr[0]; +for ( i = 0; i < 64; ++i ) +res[i] = i - 20; +regs.edx = (unsigned long)res; +regs.ecx = (unsigned long)(res + 16); + +rc = x86_emulate(&ctxt, &emulops); +if ( (rc != X86EMUL_OKAY) || + (regs.eip != (unsigned
long)&instr[9]) || + res[15] != -5 || res[32] != 12 ) +goto fail; +for ( i = 16; i < 32; ++i ) +if ( res[i] != i ) +goto fail; +printf("okay\n"); + +emulops.blk = NULL; +} +else +printf("skipped\n"); + printf("%-40s", "Testing movq %mm3,(%ecx)..."); if ( stack_exec && cpu_has_mmx ) { --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -154,6 +154,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512_vnni (cp.feat.avx512_vnni && xcr0_mask(0xe6)) #define cpu_has_avx512_bitalg (cp.feat.avx512_bitalg && xcr0_mask(0xe6)) #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) +#define cpu_has_movdir64b cp.feat.movdir64b #define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6)) #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6)) #define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6)) --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -250,12 +250,13 @@ $(BASEDIR)/include/asm-x86/asm-macros.h: # sure we pick up changes when the compiler used has changed.) 
ifeq ($(MAKECMDGOALS),asm-offsets.s) -as-ISA-list := CLWB EPT FSGSBASE INVPCID RDRAND RDSEED SSE4_2 VMX XSAVEOPT +as-ISA-list := CLWB EPT FSGSBASE INVPCID MOVDIR64B RDRAND RDSEED SSE4_2 VMX XSAVEOPT CLWB-insn := clwb (%rax) EPT-insn := invept (%rax),%rax FSGSBASE-insn := rdfsbase %rax INVPCID-insn := invpcid (%rax),%rax +MOVDIR64B-insn := movdir64b (%rax),%rax RDRAND-insn:= rdrand %eax RDSEED-insn:= rdseed %eax SSE4_2-insn:= crc32 %eax,%eax --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -1409,6 +1409,44 @@ static int hvmemul_rmw( return rc; } +static int hvmemul_blk( +enum x86_segment seg, +unsigned long offset, +void *p_data, +unsigned int bytes, +uint32_t *eflags, +struct x86_emulate_state *state, +struct x86_emulate_ctxt *ctxt) +{ +struct hvm_emulate_ctxt *hvmemul_ctxt = +container_of(ctxt, struct hvm_emulate_ctxt, ctxt); +unsigned long addr; +uint32_t pfec = PFEC_page_present | PFEC_write_access; +int rc; +void *mapping = NULL; + +rc = hvmemul_virtual_to_linear( +seg, offset, bytes, NULL, hvm_access_write, hvmemul_ctxt, &addr); +if ( rc != X86EMUL_OKAY || !bytes ) +return rc; + +if ( is_x86_system_segment(seg) ) +pfec |= PFEC_implicit; +else if ( hvmemul_ctxt->seg_reg[x86_seg_ss].dpl == 3 ) +pfec |= PFEC_user_mode; + +mapping = hvmemul_map_linear_addr(addr, bytes, pfec, hvmemul_ctxt); +if ( IS_ERR(mapping) ) +return ~PTR_ERR(mapping); +if ( !mapping ) +return
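Data-movement-wise, MOVDIR64B reads 64 bytes from an arbitrarily aligned source and performs a single 64-byte direct store to a destination that must be 64-byte aligned (#GP otherwise). The toy model below captures only those semantics; real hardware additionally guarantees the write is not torn, for which memcpy() is merely a stand-in.

```c
#include <stdint.h>
#include <string.h>

/* Returns 0 on success, -1 for a misaligned destination (#GP in
 * hardware).  The 64-byte copy models the single untorn store. */
static int movdir64b(void *dst, const void *src)
{
    if ( (uintptr_t)dst & 0x3f )
        return -1;
    memcpy(dst, src, 64);
    return 0;
}
```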
[Xen-devel] [PATCH v5 04/10] x86: move back clang no integrated assembler tests
This largely reverts f19af2f1138e ("x86: re-order clang no integrated assembler tests"): Other CFLAGS setup would better happen first, in case any of it affects the behavior of the integrated assembler. The comment addition of course doesn't get undone. The only remaining as-option-add invocation gets moved down in addition. Signed-off-by: Jan Beulich Reviewed-by: Roger Pau Monné --- v5: Re-base. v4: New. --- a/xen/arch/x86/Rules.mk +++ b/xen/arch/x86/Rules.mk @@ -12,35 +12,8 @@ CFLAGS += '-D__OBJECT_LABEL__=$(subst /, # Prevent floating-point variables from creeping into Xen. CFLAGS += -msoft-float -ifeq ($(CONFIG_CC_IS_CLANG),y) -# Note: Any test which adds -no-integrated-as will cause subsequent tests to -# succeed, and not trigger further additions. -# -# The tests to select whether the integrated assembler is usable need to happen -# before testing any assembler features, or else the result of the tests would -# be stale if the integrated assembler is not used. - -# Older clang's built-in assembler doesn't understand .skip with labels: -# https://bugs.llvm.org/show_bug.cgi?id=27369 -$(call as-option-add,CFLAGS,CC,".L0: .L1: .skip (.L1 - .L0)",,\ - -no-integrated-as) - -# Check whether clang asm()-s support .include. 
-$(call as-option-add,CFLAGS,CC,".include \"asm-x86/indirect_thunk_asm.h\"",,\ - -no-integrated-as) - -# Check whether clang keeps .macro-s between asm()-s: -# https://bugs.llvm.org/show_bug.cgi?id=36110 -$(call as-option-add,CFLAGS,CC,\ - ".macro FOO;.endm"$$(close); asm volatile $$(open)".macro FOO;.endm",\ - -no-integrated-as) -endif - $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS)) $(call cc-option-add,CFLAGS,CC,-Wnested-externs) -$(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \ - -U__OBJECT_LABEL__ -DHAVE_AS_QUOTED_SYM \ - '-D__OBJECT_LABEL__=$(subst $(BASEDIR)/,,$(CURDIR))/$$@') CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables @@ -70,3 +43,30 @@ endif # Set up the assembler include path properly for older toolchains. CFLAGS += -Wa,-I$(BASEDIR)/include +ifeq ($(CONFIG_CC_IS_CLANG),y) +# Note: Any test which adds -no-integrated-as will cause subsequent tests to +# succeed, and not trigger further additions. +# +# The tests to select whether the integrated assembler is usable need to happen +# before testing any assembler features, or else the result of the tests would +# be stale if the integrated assembler is not used. + +# Older clang's built-in assembler doesn't understand .skip with labels: +# https://bugs.llvm.org/show_bug.cgi?id=27369 +$(call as-option-add,CFLAGS,CC,".L0: .L1: .skip (.L1 - .L0)",,\ + -no-integrated-as) + +# Check whether clang asm()-s support .include. +$(call as-option-add,CFLAGS,CC,".include \"asm-x86/indirect_thunk_asm.h\"",,\ + -no-integrated-as) + +# Check whether clang keeps .macro-s between asm()-s: +# https://bugs.llvm.org/show_bug.cgi?id=36110 +$(call as-option-add,CFLAGS,CC,\ + ".macro FOO;.endm"$$(close); asm volatile $$(open)".macro FOO;.endm",\ + -no-integrated-as) +endif + +$(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \ + -U__OBJECT_LABEL__ -DHAVE_AS_QUOTED_SYM \ + '-D__OBJECT_LABEL__=$(subst $(BASEDIR)/,,$(CURDIR))/$$@')
[Xen-devel] [PATCH v5 03/10] x86: determine HAVE_AS_* just once
With the exception of HAVE_AS_QUOTED_SYM, populate the results into a generated header instead of (at least once per [sub]directory) into CFLAGS. This results in proper rebuilds (via make dependencies) in case the compiler used changes between builds. It additionally eases inspection of which assembler features were actually found usable. Some trickery is needed to avoid header generation itself to try to include the to-be/not-yet-generated header. Since the definitions in generated/config.h, previously having been command line options, might even affect xen/config.h or its descendants, move adding of the -include option for the latter after inclusion of the per-arch Rules.mk. Use the occasion to also move the most general -I option to the common Rules.mk. Signed-off-by: Jan Beulich --- v5: Re-base. v4: New. --- An alternative to the $(MAKECMDGOALS) trickery would be to make generation of generated/config.h part of the asm-offsets.s rule, instead of adding it as a dependency there. Not sure whether either is truly better than the other. --- a/xen/Rules.mk +++ b/xen/Rules.mk @@ -55,7 +55,7 @@ endif CFLAGS += -nostdinc -fno-builtin -fno-common CFLAGS += -Werror -Wredundant-decls -Wno-pointer-arith $(call cc-option-add,CFLAGS,CC,-Wvla) -CFLAGS += -pipe -D__XEN__ -include $(BASEDIR)/include/xen/config.h +CFLAGS += -pipe -D__XEN__ -I$(BASEDIR)/include CFLAGS-$(CONFIG_DEBUG_INFO) += -g CFLAGS += '-D__OBJECT_FILE__="$@"' @@ -95,6 +95,9 @@ SPECIAL_DATA_SECTIONS := rodata $(foreac include $(BASEDIR)/arch/$(TARGET_ARCH)/Rules.mk +# Allow the arch to use -include ahead of this one. +CFLAGS += -include xen/config.h + include Makefile define gendep --- a/xen/arch/arm/Rules.mk +++ b/xen/arch/arm/Rules.mk @@ -6,8 +6,6 @@ # 'make clean' before rebuilding. 
# -CFLAGS += -I$(BASEDIR)/include - $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS)) $(call cc-option-add,CFLAGS,CC,-Wnested-externs) --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -225,7 +225,8 @@ endif efi/boot.init.o efi/runtime.o efi/compat.o efi/buildid.o efi/relocs-dummy.o: $(BASEDIR)/arch/x86/efi/built_in.o efi/boot.init.o efi/runtime.o efi/compat.o efi/buildid.o efi/relocs-dummy.o: ; -asm-offsets.s: $(TARGET_SUBARCH)/asm-offsets.c $(BASEDIR)/include/asm-x86/asm-macros.h +asm-offsets.s: $(TARGET_SUBARCH)/asm-offsets.c $(BASEDIR)/include/asm-x86/asm-macros.h \ + $(BASEDIR)/include/generated/config.h $(CC) $(filter-out -Wa$(comma)% -flto,$(CFLAGS)) -S -o $@ $< asm-macros.i: CFLAGS += -D__ASSEMBLY__ -P @@ -241,6 +242,45 @@ $(BASEDIR)/include/asm-x86/asm-macros.h: echo '#endif' >>$@.new $(call move-if-changed,$@.new,$@) +# There are multiple invocations of this Makefile, one each for asm-offset.s, +# $(TARGET), built_in.o, and several more from the rules building $(TARGET) +# and $(TARGET).efi. The 2nd and 3rd may race with one another, and we don't +# want to re-generate config.h in that case anyway, so guard the logic +# accordingly. (We do want to have the FORCE dependency on the rule, to be +# sure we pick up changes when the compiler used has changed.) +ifeq ($(MAKECMDGOALS),asm-offsets.s) + +as-ISA-list := CLWB EPT FSGSBASE INVPCID RDRAND RDSEED SSE4_2 VMX XSAVEOPT + +CLWB-insn := clwb (%rax) +EPT-insn := invept (%rax),%rax +FSGSBASE-insn := rdfsbase %rax +INVPCID-insn := invpcid (%rax),%rax +RDRAND-insn:= rdrand %eax +RDSEED-insn:= rdseed %eax +SSE4_2-insn:= crc32 %eax,%eax +VMX-insn := vmcall +XSAVEOPT-insn := xsaveopt (%rax) + +# GAS's idea of true is -1. Clang's idea is 1. +NEGATIVE_TRUE-insn := .if ((1 > 0) > 0); .error \"\"; .endif + +# Check to see whether the assembler supports the .nops directive. 
+NOPS_DIRECTIVE-insn := .L1: .L2: .nops (.L2 - .L1),9 + +as-features-list := $(as-ISA-list) NEGATIVE_TRUE NOPS_DIRECTIVE + +$(BASEDIR)/include/generated/config.h: FORCE + echo '/* Generated header, do not edit. */' >$@.new + $(foreach f,$(as-features-list), \ + $(if $($(f)-insn),,$(error $(f)-insn is empty)) \ + echo '#$(call as-insn,$(CC) $(CFLAGS),"$($(f)-insn)", \ + define, \ + undef) HAVE_AS_$(f) /* $($(f)-insn) */' >>$@.new;) + $(call move-if-changed,$@.new,$@) + +endif + efi.lds: AFLAGS += -DEFI xen.lds efi.lds: xen.lds.S $(CC) -P -E -Ui386 $(filter-out -Wa$(comma)%,$(AFLAGS)) -o $@ $< --- a/xen/arch/x86/Rules.mk +++ b/xen/arch/x86/Rules.mk @@ -3,7 +3,7 @@ XEN_IMG_OFFSET := 0x20 -CFLAGS += -I$(BASEDIR)/include +CFLAGS += $(if $(filter asm-macros.% %/generated/config.h,$@),,-include generated/config.h) CFLAGS += -I$(BASEDIR)/include/asm-x86/mach-generic CFLAGS += -I$(BASEDIR)/include/asm-x86/mach-default CFLAGS += -DXEN_IMG_OFFSET=$(XEN_IMG_OFFSET) @@ -38,26 +38,9 @@ endif $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS)) $(call cc-option-add,CFLAGS,CC,-Wnested-externs) -$(call as-option-add,CFLAGS,CC,"vmcall",-DHAVE_AS_VMX) -$(ca
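For illustration, on a toolchain whose assembler lacks, say, EPT and .nops support, the rule above would emit a generated/config.h roughly like the following (which feature lines appear as define vs. undef of course depends on the actual toolchain):

```c
/* Generated header, do not edit. */
#define HAVE_AS_CLWB /* clwb (%rax) */
#undef HAVE_AS_EPT /* invept (%rax),%rax */
#define HAVE_AS_FSGSBASE /* rdfsbase %rax */
#define HAVE_AS_RDRAND /* rdrand %eax */
#define HAVE_AS_NEGATIVE_TRUE /* .if ((1 > 0) > 0); .error ""; .endif */
#undef HAVE_AS_NOPS_DIRECTIVE /* .L1: .L2: .nops (.L2 - .L1),9 */
```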
[Xen-devel] [PATCH v5 01/10] x86emul: support AVX512_BF16 insns
Signed-off-by: Jan Beulich --- v5: New. --- (SDE: -cpx) --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -550,6 +550,12 @@ static const struct test avx512_4vnniw_5 INSN(p4dpwssds, f2, 0f38, 53, el_4, d, vl), }; +static const struct test avx512_bf16_all[] = { +INSN(vcvtne2ps2bf16, f2, 0f38, 72, vl, d, vl), +INSN(vcvtneps2bf16, f3, 0f38, 72, vl, d, vl), +INSN(vdpbf16ps, f3, 0f38, 52, vl, d, vl), +}; + static const struct test avx512_bitalg_all[] = { INSN(popcnt, 66, 0f38, 54, vl, bw, vl), INSN(pshufbitqmb, 66, 0f38, 8f, vl, b, vl), @@ -984,6 +990,7 @@ void evex_disp8_test(void *instr, struct RUN(avx512pf, 512); RUN(avx512_4fmaps, 512); RUN(avx512_4vnniw, 512); +RUN(avx512_bf16, all); RUN(avx512_bitalg, all); RUN(avx512_ifma, all); RUN(avx512_vbmi, all); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -4516,6 +4516,80 @@ int main(int argc, char **argv) else printf("skipped\n"); +if ( stack_exec && cpu_has_avx512_bf16 ) +{ +decl_insn(vcvtne2ps2bf16); +decl_insn(vcvtneps2bf16); +decl_insn(vdpbf16ps); +static const struct { +float f[16]; +} in1 = {{ +1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 +}}, in2 = {{ +1, -2, 3, -4, 5, -6, 7, -8, 9, -10, 11, -12, 13, -14, 15, -16 +}}, out = {{ +1 * 1 + 2 * 2, 3 * 3 + 4 * 4, +5 * 5 + 6 * 6, 7 * 7 + 8 * 8, +9 * 9 + 10 * 10, 11 * 11 + 12 * 12, +13 * 13 + 14 * 14, 15 * 15 + 16 * 16, +1 * 1 - 2 * 2, 3 * 3 - 4 * 4, +5 * 5 - 6 * 6, 7 * 7 - 8 * 8, +9 * 9 - 10 * 10, 11 * 11 - 12 * 12, +13 * 13 - 14 * 14, 15 * 15 - 16 * 16 +}}; + +printf("%-40s", "Testing vcvtne2ps2bf16 64(%ecx),%zmm1,%zmm2..."); +asm volatile ( "vmovups %1, %%zmm1\n" + put_insn(vcvtne2ps2bf16, +/* vcvtne2ps2bf16 64(%0), %%zmm1, %%zmm2 */ +".byte 0x62, 0xf2, 0x77, 0x48, 0x72, 0x51, 0x01") + :: "c" (NULL), "m" (in2) ); +set_insn(vcvtne2ps2bf16); +regs.ecx = (unsigned long)&in1 - 64; +rc = x86_emulate(&ctxt, &emulops); +if ( rc != X86EMUL_OKAY || 
!check_eip(vcvtne2ps2bf16) ) +goto fail; +printf("pending\n"); + +printf("%-40s", "Testing vcvtneps2bf16 64(%ecx),%ymm3..."); +asm volatile ( put_insn(vcvtneps2bf16, +/* vcvtneps2bf16 64(%0), %%ymm3 */ +".byte 0x62, 0xf2, 0x7e, 0x48, 0x72, 0x59, 0x01") + :: "c" (NULL) ); +set_insn(vcvtneps2bf16); +rc = x86_emulate(&ctxt, &emulops); +if ( rc != X86EMUL_OKAY || !check_eip(vcvtneps2bf16) ) +goto fail; +asm ( "vmovdqa %%ymm2, %%ymm5\n\t" + "vpcmpeqd %%zmm3, %%zmm5, %%k0\n\t" + "kmovw %%k0, %0" + : "=g" (rc) : "m" (out) ); +if ( rc != 0x ) +goto fail; +printf("pending\n"); + +printf("%-40s", "Testing vdpbf16ps 128(%ecx),%zmm2,%zmm4..."); +asm volatile ( "vmovdqa %%ymm3, %0\n\t" + "vmovdqa %%ymm3, %1\n" + put_insn(vdpbf16ps, +/* vdpbf16ps 128(%2), %%zmm2, %%zmm4 */ +".byte 0x62, 0xf2, 0x6e, 0x48, 0x52, 0x61, 0x02") + : "=&m" (res[0]), "=&m" (res[8]) + : "c" (NULL) + : "memory" ); +set_insn(vdpbf16ps); +regs.ecx = (unsigned long)res - 128; +rc = x86_emulate(&ctxt, &emulops); +if ( rc != X86EMUL_OKAY || !check_eip(vdpbf16ps) ) +goto fail; +asm ( "vcmpeqps %1, %%zmm4, %%k0\n\t" + "kmovw %%k0, %0" + : "=g" (rc) : "m" (out) ); +if ( rc != 0x ) +goto fail; +printf("okay\n"); +} + printf("%-40s", "Testing invpcid 16(%ecx),%%edx..."); if ( stack_exec ) { --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -156,6 +156,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) #define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6)) #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6)) +#define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -1904,6 +1904,7 @@ in_protmode( #define vcpu_has_rdpid() (ctxt->cpuid->feat.rdpid) #define vcpu_has_avx512_4vnniw() (ctxt->cpuid->
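Per element, the vcvtneps2bf16 conversion exercised above truncates an IEEE binary32 value to bfloat16 with round-to-nearest-even on the discarded 16 bits (the "ne" in the mnemonic). A scalar model, with NaN handling simplified to forcing a quiet-NaN bit:

```c
#include <stdint.h>
#include <string.h>

static uint16_t fp32_to_bf16(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof(bits));

    if ( (bits & 0x7fffffff) > 0x7f800000 )      /* NaN: quieten it */
        return (uint16_t)((bits >> 16) | 0x0040);

    /* Round to nearest even: add 0x7fff plus the LSB of the would-be
     * result, then truncate to the top 16 bits. */
    bits += 0x7fff + ((bits >> 16) & 1);
    return (uint16_t)(bits >> 16);
}
```

E.g. 1.0f (bits 0x3f800000) becomes 0x3f80, and -2.0f becomes 0xc000.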
[Xen-devel] [PATCH v5 02/10] x86emul: support MOVDIRI insn
Note that SDM revision 070 doesn't specify exception behavior for ModRM.mod == 0b11; assuming #UD here. Signed-off-by: Jan Beulich --- v4: Split MOVDIRI and MOVDIR64B and move this one ahead. Re-base. v3: Update description. --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -2196,6 +2196,18 @@ int main(int argc, char **argv) goto fail; printf("okay\n"); +printf("%-40s", "Testing movdiri %edx,(%ecx)..."); +instr[0] = 0x0f; instr[1] = 0x38; instr[2] = 0xf9; instr[3] = 0x11; +regs.eip = (unsigned long)&instr[0]; +regs.ecx = (unsigned long)memset(res, -1, 16); +regs.edx = 0x44332211; +rc = x86_emulate(&ctxt, &emulops); +if ( (rc != X86EMUL_OKAY) || + (regs.eip != (unsigned long)&instr[4]) || + res[0] != 0x44332211 || ~res[1] ) +goto fail; +printf("okay\n"); + printf("%-40s", "Testing movq %mm3,(%ecx)..."); if ( stack_exec && cpu_has_mmx ) { --- a/tools/tests/x86_emulator/x86-emulate.c +++ b/tools/tests/x86_emulator/x86-emulate.c @@ -76,6 +76,7 @@ bool emul_test_init(void) cp.feat.adx = true; cp.feat.avx512pf = cp.feat.avx512f; cp.feat.rdpid = true; +cp.feat.movdiri = true; cp.extd.clzero = true; if ( cpu_has_xsave ) @@ -137,15 +138,15 @@ int emul_test_cpuid( res->c |= 1U << 22; /* - * The emulator doesn't itself use ADCX/ADOX/RDPID nor the S/G prefetch - * insns, so we can always run the respective tests. + * The emulator doesn't itself use ADCX/ADOX/RDPID/MOVDIRI nor the S/G + * prefetch insns, so we can always run the respective tests. */ if ( leaf == 7 && subleaf == 0 ) { res->b |= (1U << 10) | (1U << 19); if ( res->b & (1U << 16) ) res->b |= 1U << 26; -res->c |= 1U << 22; +res->c |= (1U << 22) | (1U << 27); } /* --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -548,6 +548,7 @@ static const struct ext0f38_table { [0xf1] = { .to_mem = 1, .two_op = 1 }, [0xf2 ... 0xf3] = {}, [0xf5 ... 
0xf7] = {}, +[0xf9] = { .to_mem = 1 }, }; /* Shift values between src and dst sizes of pmov{s,z}x{b,w,d}{w,d,q}. */ @@ -1902,6 +1903,7 @@ in_protmode( #define vcpu_has_avx512_bitalg() (ctxt->cpuid->feat.avx512_bitalg) #define vcpu_has_avx512_vpopcntdq() (ctxt->cpuid->feat.avx512_vpopcntdq) #define vcpu_has_rdpid() (ctxt->cpuid->feat.rdpid) +#define vcpu_has_movdiri() (ctxt->cpuid->feat.movdiri) #define vcpu_has_avx512_4vnniw() (ctxt->cpuid->feat.avx512_4vnniw) #define vcpu_has_avx512_4fmaps() (ctxt->cpuid->feat.avx512_4fmaps) #define vcpu_has_avx512_bf16() (ctxt->cpuid->feat.avx512_bf16) @@ -2713,10 +2715,12 @@ x86_decode_0f38( { case 0x00 ... 0xef: case 0xf2 ... 0xf5: -case 0xf7 ... 0xff: +case 0xf7 ... 0xf8: +case 0xfa ... 0xff: op_bytes = 0; /* fall through */ case 0xf6: /* adcx / adox */ +case 0xf9: /* movdiri */ ctxt->opcode |= MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK); break; @@ -10075,6 +10079,14 @@ x86_emulate( : "0" ((uint32_t)src.val), "rm" (_regs.edx) ); break; +case X86EMUL_OPC(0x0f38, 0xf9): /* movdiri mem,r */ +vcpu_must_have(movdiri); +generate_exception_if(dst.type != OP_MEM, EXC_UD); +/* Ignore the non-temporal behavior for now. */ +dst.val = src.val; +sfence = true; +break; + #ifndef X86EMUL_NO_SIMD case X86EMUL_OPC_VEX_66(0x0f3a, 0x00): /* vpermq $imm8,ymm/m256,ymm */ --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -237,6 +237,7 @@ XEN_CPUFEATURE(AVX512_BITALG, 6*32+12) / XEN_CPUFEATURE(AVX512_VPOPCNTDQ, 6*32+14) /*A POPCNT for vectors of DW/QW */ XEN_CPUFEATURE(RDPID, 6*32+22) /*A RDPID instruction */ XEN_CPUFEATURE(CLDEMOTE, 6*32+25) /*A CLDEMOTE instruction */ +XEN_CPUFEATURE(MOVDIRI, 6*32+27) /*A MOVDIRI instruction */ /* AMD-defined CPU features, CPUID level 0x8007.edx, word 7 */ XEN_CPUFEATURE(ITSC, 7*32+ 8) /* Invariant TSC */
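The emulation above reduces MOVDIRI to a plain store plus a store fence (the patch sets sfence = true and, per its comment, ignores the non-temporal hint for now). A sketch of that store-then-fence pattern for the 32-bit case, using a compiler builtin as a stand-in for sfence:

```c
#include <stdint.h>

/* MOVDIRI writes the register operand directly to memory; the write is
 * atomic when the destination is naturally aligned. */
static void movdiri32(uint32_t *dst, uint32_t val)
{
    *dst = val;
    __atomic_thread_fence(__ATOMIC_RELEASE);  /* stand-in for sfence */
}
```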
Re: [Xen-devel] [PATCH v2] SVM: Add union intstat_t for offset 68h in vmcb struct
On 24/03/2020 10:37, Pu Wen wrote:
> According to chapter "Appendix B Layout of VMCB" in the new version
> (v3.32) AMD64 APM[1], bit 1 of the VMCB offset 68h is defined as
> GUEST_INTERRUPT_MASK.
>
> In the current Xen code, the whole u64 interrupt_shadow is used to set
> up the interrupt shadow, which misuses other bits in VMCB offset 68h
> as part of interrupt_shadow.
>
> Add union intstat_t for VMCB offset 68h and fix the code to only use
> bit 0 as intr_shadow according to the new APM description.
>
> Reference:
> [1] https://www.amd.com/system/files/TechDocs/24593.pdf
>
> Signed-off-by: Pu Wen

Hmm - this field doesn't appear to be part of AVIC, which makes me
wonder what we're doing without it.

It appears to be a shadow copy of EFLAGS.IF which is only written on
vmexit, and never consumed, but this is based on Appendix B, which is
the only reference I can find to the field at all. Neither the VMRUN
nor the #VMEXIT description discusses it at all.

Given its position next to the (ambiguous) INTERRUPT_SHADOW, it just
might actually distinguish the STI shadow from the MovSS shadow, but it
could only do that by not behaving as described, and being asymmetric
with EFLAGS.

I don't have time to investigate this right now. We need the field
described in Xen to set it appropriately for virtual vmexit, but I
think that is the extent of what we need to do.

~Andrew
[Xen-devel] [PATCH v5 02/10] x86emul: support AVX512_BF16 insns
Signed-off-by: Jan Beulich --- v5: New. --- (SDE: -cpx) --- a/tools/tests/x86_emulator/evex-disp8.c +++ b/tools/tests/x86_emulator/evex-disp8.c @@ -550,6 +550,12 @@ static const struct test avx512_4vnniw_5 INSN(p4dpwssds, f2, 0f38, 53, el_4, d, vl), }; +static const struct test avx512_bf16_all[] = { +INSN(vcvtne2ps2bf16, f2, 0f38, 72, vl, d, vl), +INSN(vcvtneps2bf16, f3, 0f38, 72, vl, d, vl), +INSN(vdpbf16ps, f3, 0f38, 52, vl, d, vl), +}; + static const struct test avx512_bitalg_all[] = { INSN(popcnt, 66, 0f38, 54, vl, bw, vl), INSN(pshufbitqmb, 66, 0f38, 8f, vl, b, vl), @@ -984,6 +990,7 @@ void evex_disp8_test(void *instr, struct RUN(avx512pf, 512); RUN(avx512_4fmaps, 512); RUN(avx512_4vnniw, 512); +RUN(avx512_bf16, all); RUN(avx512_bitalg, all); RUN(avx512_ifma, all); RUN(avx512_vbmi, all); --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -4516,6 +4516,80 @@ int main(int argc, char **argv) else printf("skipped\n"); +if ( stack_exec && cpu_has_avx512_bf16 ) +{ +decl_insn(vcvtne2ps2bf16); +decl_insn(vcvtneps2bf16); +decl_insn(vdpbf16ps); +static const struct { +float f[16]; +} in1 = {{ +1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 +}}, in2 = {{ +1, -2, 3, -4, 5, -6, 7, -8, 9, -10, 11, -12, 13, -14, 15, -16 +}}, out = {{ +1 * 1 + 2 * 2, 3 * 3 + 4 * 4, +5 * 5 + 6 * 6, 7 * 7 + 8 * 8, +9 * 9 + 10 * 10, 11 * 11 + 12 * 12, +13 * 13 + 14 * 14, 15 * 15 + 16 * 16, +1 * 1 - 2 * 2, 3 * 3 - 4 * 4, +5 * 5 - 6 * 6, 7 * 7 - 8 * 8, +9 * 9 - 10 * 10, 11 * 11 - 12 * 12, +13 * 13 - 14 * 14, 15 * 15 - 16 * 16 +}}; + +printf("%-40s", "Testing vcvtne2ps2bf16 64(%ecx),%zmm1,%zmm2..."); +asm volatile ( "vmovups %1, %%zmm1\n" + put_insn(vcvtne2ps2bf16, +/* vcvtne2ps2bf16 64(%0), %%zmm1, %%zmm2 */ +".byte 0x62, 0xf2, 0x77, 0x48, 0x72, 0x51, 0x01") + :: "c" (NULL), "m" (in2) ); +set_insn(vcvtne2ps2bf16); +regs.ecx = (unsigned long)&in1 - 64; +rc = x86_emulate(&ctxt, &emulops); +if ( rc != X86EMUL_OKAY || 
!check_eip(vcvtne2ps2bf16) ) +goto fail; +printf("pending\n"); + +printf("%-40s", "Testing vcvtneps2bf16 64(%ecx),%ymm3..."); +asm volatile ( put_insn(vcvtneps2bf16, +/* vcvtneps2bf16 64(%0), %%ymm3 */ +".byte 0x62, 0xf2, 0x7e, 0x48, 0x72, 0x59, 0x01") + :: "c" (NULL) ); +set_insn(vcvtneps2bf16); +rc = x86_emulate(&ctxt, &emulops); +if ( rc != X86EMUL_OKAY || !check_eip(vcvtneps2bf16) ) +goto fail; +asm ( "vmovdqa %%ymm2, %%ymm5\n\t" + "vpcmpeqd %%zmm3, %%zmm5, %%k0\n\t" + "kmovw %%k0, %0" + : "=g" (rc) : "m" (out) ); +if ( rc != 0x ) +goto fail; +printf("pending\n"); + +printf("%-40s", "Testing vdpbf16ps 128(%ecx),%zmm2,%zmm4..."); +asm volatile ( "vmovdqa %%ymm3, %0\n\t" + "vmovdqa %%ymm3, %1\n" + put_insn(vdpbf16ps, +/* vdpbf16ps 128(%2), %%zmm2, %%zmm4 */ +".byte 0x62, 0xf2, 0x6e, 0x48, 0x52, 0x61, 0x02") + : "=&m" (res[0]), "=&m" (res[8]) + : "c" (NULL) + : "memory" ); +set_insn(vdpbf16ps); +regs.ecx = (unsigned long)res - 128; +rc = x86_emulate(&ctxt, &emulops); +if ( rc != X86EMUL_OKAY || !check_eip(vdpbf16ps) ) +goto fail; +asm ( "vcmpeqps %1, %%zmm4, %%k0\n\t" + "kmovw %%k0, %0" + : "=g" (rc) : "m" (out) ); +if ( rc != 0x ) +goto fail; +printf("okay\n"); +} + printf("%-40s", "Testing invpcid 16(%ecx),%%edx..."); if ( stack_exec ) { --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -156,6 +156,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512_vpopcntdq (cp.feat.avx512_vpopcntdq && xcr0_mask(0xe6)) #define cpu_has_avx512_4vnniw (cp.feat.avx512_4vnniw && xcr0_mask(0xe6)) #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6)) +#define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -1904,6 +1904,7 @@ in_protmode( #define vcpu_has_rdpid() (ctxt->cpuid->feat.rdpid) #define vcpu_has_avx512_4vnniw() (ctxt->cpuid->
[Xen-devel] [PATCH v5 00/10] x86emul: further work
Some of the later patches are still at least partly RFC, for varying reasons (see there). I'd appreciate though if at least some of the earlier ones could go in rather sooner than later. Patch 1 functionally (for the test harness) depends on "libx86/CPUID: fix (not just) leaf 7 processing", while at least patch 2 contextually depends on "x86emul: disable FPU/MMX/SIMD insn emulation when !HVM". 1: x86emul: support AVX512_BF16 insns 2: x86emul: support MOVDIRI insn 3: x86: determine HAVE_AS_* just once 4: x86: move back clang no integrated assembler tests 5: x86emul: support MOVDIR64B insn 6: x86emul: support ENQCMD insn 7: x86/HVM: scale MPERF values reported to guests (on AMD) 8: x86emul: support RDPRU 9: x86/HVM: don't needlessly intercept APERF/MPERF/TSC MSR reads 10: x86emul: support MCOMMIT See individual patches for changes from v4 (which was mistakenly sent out with a v3 tag). Jan
Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv
On Tue, Mar 24, 2020 at 11:33:00AM +, Tian, Kevin wrote: > > From: Roger Pau Monné > > Sent: Tuesday, March 24, 2020 7:23 PM > > > > On Tue, Mar 24, 2020 at 10:11:15AM +, Tian, Kevin wrote: > > > > From: Roger Pau Monné > > > > Sent: Tuesday, March 24, 2020 5:51 PM > > > > > > > > On Tue, Mar 24, 2020 at 06:03:28AM +, Tian, Kevin wrote: > > > > > > From: Roger Pau Monne > > > > > > Sent: Saturday, March 21, 2020 3:08 AM > > > > > > > > > > > > The current usage of nvmx_update_apicv is not clear: it is deeply > > > > > > intertwined with the Ack interrupt on exit VMEXIT control. > > > > > > > > > > > > The code in nvmx_update_apicv should update the SVI (in service > > > > interrupt) > > > > > > field of the guest interrupt status only when the Ack interrupt on > > > > > > exit is set, as it is used to record that the interrupt being > > > > > > serviced is signaled in a vmcs field, and hence hasn't been injected > > > > > > as on native. It's important to record the current in service > > > > > > interrupt on the guest interrupt status field, or else further > > > > > > interrupts won't respect the priority of the in service one. > > > > > > > > > > > > While clarifying the usage make sure that the SVI is only updated > > when > > > > > > Ack on exit is set and the nested vmcs interrupt info field is > > > > > > valid. Or > > > > > > else a guest not using the Ack on exit feature would loose > > > > > > interrupts > > as > > > > > > they would be signaled as being in service on the guest interrupt > > > > > > status field but won't actually be recorded on the interrupt info > > > > > > vmcs > > > > > > field, neither injected in any other way. > > > > > > > > > > It is insufficient. You also need to update RVI to enable virtual > > > > > injection > > > > > when Ack on exit is cleared. 
> > > > But RVI should be updated in vmx_intr_assist in that case, since
> > > > nvmx_intr_intercept shouldn't intercept the interrupt, as it should
> > > > be handled normally.
> > >
> > > As we discussed before, vmx_intr_assist is invoked before
> > > nvmx_switch_guest.
> > >
> > > It is incorrect to update RVI at that time since it might still be
> > > vmcs02 being active (if no pending softirq to make it invoked again).
> > >
> > > Also nvmx_intr_intercept does intercept the Ack-on-exit=0 case:
> > >
> > >     if ( intack.source == hvm_intsrc_pic ||
> > >          intack.source == hvm_intsrc_lapic )

I've realized that nvmx_intr_intercept will return 1 for interrupts
originating from the lapic or the pic, while nvmx_update_apicv will only
update GUEST_INTR_STATUS for interrupts originating from the lapic. Is
this correct? Shouldn't both be in sync and handle the same interrupt
sources?

Thanks, Roger.
[Xen-devel] Ping: [PATCH 5/5] x86emul: disable FPU/MMX/SIMD insn emulation when !HVM
On 20.12.2019 14:41, Jan Beulich wrote: > In a pure PV environment (the PV shim in particular) we don't really > need emulation of all these. To limit #ifdef-ary utilize some of the > CASE_*() macros we have, by providing variants expanding to > (effectively) nothing (really a label, which in turn requires passing > -Wno-unused-label to the compiler when build such configurations). > > Due to the mixture of macro and #ifdef use, the placement of some of > the #ifdef-s is a little arbitrary. > > The resulting object file's .text is less than half the size of the > original, and looks to also be compiling a little more quickly. > > This is meant as a first step; more parts can likely be disabled down > the road. > > Suggested-by: Andrew Cooper > Signed-off-by: Jan Beulich Ping? > --- > I'll be happy to take suggestions allowing to avoid -Wno-unused-label. > > --- a/xen/arch/x86/Makefile > +++ b/xen/arch/x86/Makefile > @@ -79,6 +79,9 @@ obj-y += hpet.o > obj-y += vm_event.o > obj-y += xstate.o > > +ifneq ($(CONFIG_HVM),y) > +x86_emulate.o: CFLAGS += -Wno-unused-label > +endif > x86_emulate.o: x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h > > efi-y := $(shell if [ ! 
-r $(BASEDIR)/include/xen/compile.h -o \ > --- a/xen/arch/x86/x86_emulate.c > +++ b/xen/arch/x86/x86_emulate.c > @@ -42,6 +42,12 @@ > } \ > }) > > +#ifndef CONFIG_HVM > +# define X86EMUL_NO_FPU > +# define X86EMUL_NO_MMX > +# define X86EMUL_NO_SIMD > +#endif > + > #include "x86_emulate/x86_emulate.c" > > int x86emul_read_xcr(unsigned int reg, uint64_t *val, > --- a/xen/arch/x86/x86_emulate/x86_emulate.c > +++ b/xen/arch/x86/x86_emulate/x86_emulate.c > @@ -3476,6 +3476,7 @@ x86_decode( > op_bytes = 4; > break; > > +#ifndef X86EMUL_NO_SIMD > case simd_packed_int: > switch ( vex.pfx ) > { > @@ -3541,6 +3542,7 @@ x86_decode( > case simd_256: > op_bytes = 32; > break; > +#endif /* !X86EMUL_NO_SIMD */ > > default: > op_bytes = 0; > @@ -3695,6 +3697,7 @@ x86_emulate( > break; > } > > +#ifndef X86EMUL_NO_SIMD > /* With a memory operand, fetch the mask register in use (if any). */ > if ( ea.type == OP_MEM && evex.opmsk && > _get_fpu(fpu_type = X86EMUL_FPU_opmask, ctxt, ops) == X86EMUL_OKAY ) > @@ -3725,6 +3728,7 @@ x86_emulate( > put_fpu(X86EMUL_FPU_opmask, false, state, ctxt, ops); > fpu_type = X86EMUL_FPU_none; > } > +#endif /* !X86EMUL_NO_SIMD */ > > /* Decode (but don't fetch) the destination operand: register or memory. > */ > switch ( d & DstMask ) > @@ -4372,11 +4376,13 @@ x86_emulate( > singlestep = _regs.eflags & X86_EFLAGS_TF; > break; > > +#ifndef X86EMUL_NO_FPU > case 0x9b: /* wait/fwait */ > host_and_vcpu_must_have(fpu); > get_fpu(X86EMUL_FPU_wait); > emulate_fpu_insn_stub(b); > break; > +#endif > > case 0x9c: /* pushf */ > if ( (_regs.eflags & X86_EFLAGS_VM) && > @@ -4785,6 +4791,7 @@ x86_emulate( > break; > } > > +#ifndef X86EMUL_NO_FPU > case 0xd8: /* FPU 0xd8 */ > host_and_vcpu_must_have(fpu); > get_fpu(X86EMUL_FPU_fpu); > @@ -5119,6 +5126,7 @@ x86_emulate( > } > } > break; > +#endif /* !X86EMUL_NO_FPU */ > > case 0xe0 ... 
0xe2: /* loop{,z,nz} */ { > unsigned long count = get_loop_count(&_regs, ad_bytes); > @@ -5983,6 +5991,8 @@ x86_emulate( > case X86EMUL_OPC(0x0f, 0x19) ... X86EMUL_OPC(0x0f, 0x1f): /* nop */ > break; > > +#ifndef X86EMUL_NO_MMX > + > case X86EMUL_OPC(0x0f, 0x0e): /* femms */ > host_and_vcpu_must_have(3dnow); > asm volatile ( "femms" ); > @@ -6003,39 +6013,71 @@ x86_emulate( > state->simd_size = simd_other; > goto simd_0f_imm8; > > -#define CASE_SIMD_PACKED_INT(pfx, opc) \ > +#endif /* !X86EMUL_NO_MMX */ > + > +#if !defined(X86EMUL_NO_SIMD) && !defined(X86EMUL_NO_MMX) > +# define CASE_SIMD_PACKED_INT(pfx, opc) \ > case X86EMUL_OPC(pfx, opc): \ > case X86EMUL_OPC_66(pfx, opc) > -#define CASE_SIMD_PACKED_INT_VEX(pfx, opc) \ > +#elif !defined(X86EMUL_NO_SIMD) > +# define CASE_SIMD_PACKED_INT(pfx, opc) \ > +case X86EMUL_OPC_66(pfx, opc) > +#elif !defined(X86EMUL_NO_MMX) > +# define CASE_SIMD_PACKED_INT(pfx, opc) \ > +case X86EMUL_OPC(pfx, opc) > +#else > +# define CASE_SIMD_PACKED_INT(pfx, opc) C##pfx##_##opc > +#endif > + > +#ifndef X86EMUL_NO_SIMD > + > +# define CASE_SIMD_PACKED_INT_VEX(pfx, opc) \ > CASE_SIMD_PACKED_INT(pfx, opc): \ > case X86EMUL_OPC_VEX_66(pfx, opc) > > -#define CASE_SIMD_ALL_FP(kind, pfx, opc) \ > +# define CASE_SIMD_ALL_FP(kind, pfx, opc)\ > CASE_SIMD_PACKED_FP(kind, pfx, opc): \ > CASE_SIMD_SCALAR_FP(kind, pfx, opc) > -#define CASE_SIMD_PACKED_F
Re: [Xen-devel] [PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG
Adding Juergen and Boris for feedback.

On Tue, Mar 24, 2020 at 11:57:48AM +, Ian Jackson wrote:
> Ian Jackson writes ("Re: [PATCH OSSTEST] kernel-build: enable
> XEN_BALLOON_MEMORY_HOTPLUG"):
> > Roger Pau Monne writes ("[PATCH OSSTEST] kernel-build: enable
> > XEN_BALLOON_MEMORY_HOTPLUG"):
> > > This allows a PVH/HVM domain to use unpopulated memory ranges to map
> > > foreign memory or grants, and is required for a PVH dom0 to function
> > > properly.
> > >
> > > Signed-off-by: Roger Pau Monné
> >
> > Acked-by: Ian Jackson
> >
> > I will push this to pretest immediately.
>
> Now done. Would you consider whether the default should be changed
> in Linux and prepare a patch to do so if appropriate?

Do you know if there's any reason why this is not on by default?

Thanks, Roger.
Re: [Xen-devel] [PATCH v2] SVM: Add union intstat_t for offset 68h in vmcb struct
On 24.03.2020 11:37, Pu Wen wrote:
> According to chapter "Appendix B Layout of VMCB" in the new version
> (v3.32) AMD64 APM[1], bit 1 of the VMCB offset 68h is defined as
> GUEST_INTERRUPT_MASK.
>
> In the current Xen code, the whole u64 interrupt_shadow is used to set
> up the interrupt shadow, which misuses other bits in VMCB offset 68h
> as part of interrupt_shadow.
>
> Add union intstat_t for VMCB offset 68h and fix the code to only use
> bit 0 as intr_shadow according to the new APM description.
>
> Reference:
> [1] https://www.amd.com/system/files/TechDocs/24593.pdf
>
> Signed-off-by: Pu Wen

Looks okay now to me (with one further nit, see below), but ...

> v1->v2:
>  - Copy the whole int_stat in nsvm_vmcb_prepare4vmrun() and
>    nsvm_vmcb_prepare4vmexit().

... in particular this part I'd prefer to wait a little on, to see
whether Andrew or anyone else has a specific opinion one way or the
other.

> --- a/xen/include/asm-x86/hvm/svm/vmcb.h
> +++ b/xen/include/asm-x86/hvm/svm/vmcb.h
> @@ -316,6 +316,17 @@ typedef union
>  uint64_t raw;
>  } intinfo_t;
>
> +typedef union
> +{
> +struct
> +{

Nit: The brace should be on the same line as "struct"; can be taken
care of while committing.

Jan
Re: [Xen-devel] [PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG
Roger Pau Monne writes ("[PATCH OSSTEST] kernel-build: enable XEN_BALLOON_MEMORY_HOTPLUG"):
> This allows a PVH/HVM domain to use unpopulated memory ranges to map
> foreign memory or grants, and is required for a PVH dom0 to function
> properly.
>
> Signed-off-by: Roger Pau Monné

Acked-by: Ian Jackson

I will push this to pretest immediately.

Thanks, Ian.
[Xen-devel] [qemu-mainline bisection] complete test-amd64-amd64-xl-qemuu-ovmf-amd64
branch xen-unstable xenbranch xen-unstable job test-amd64-amd64-xl-qemuu-ovmf-amd64 testid debian-hvm-install Tree: linux git://xenbits.xen.org/linux-pvops.git Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git Tree: qemuu git://git.qemu.org/qemu.git Tree: seabios git://xenbits.xen.org/osstest/seabios.git Tree: xen git://xenbits.xen.org/xen.git *** Found and reproduced problem changeset *** Bug is in tree: qemuu git://git.qemu.org/qemu.git Bug introduced: ca6155c0f2bd39b4b4162533be401c98bd960820 Bug not present: c220cdec4845f305034330f80ce297f1f997f2d3 Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/148968/ (Revision log too long, omitted.) For bisection revision-tuple graph see: http://logs.test-lab.xenproject.org/osstest/results/bisect/qemu-mainline/test-amd64-amd64-xl-qemuu-ovmf-amd64.debian-hvm-install.html Revision IDs in each graph node refer, respectively, to the Trees above. Running cs-bisection-step --graph-out=/home/logs/results/bisect/qemu-mainline/test-amd64-amd64-xl-qemuu-ovmf-amd64.debian-hvm-install --summary-out=tmp/148968.bisection-summary --basis-template=144861 --blessings=real,real-bisect qemu-mainline test-amd64-amd64-xl-qemuu-ovmf-amd64 debian-hvm-install Searching for failure / basis pass: 148879 fail [host=albana1] / 147546 ok. 
Failure / basis pass flights: 148879 / 147546 (tree with no url: minios) Tree: linux git://xenbits.xen.org/linux-pvops.git Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git Tree: qemuu git://git.qemu.org/qemu.git Tree: seabios git://xenbits.xen.org/osstest/seabios.git Tree: xen git://xenbits.xen.org/xen.git Latest c3038e718a19fc596f7b1baba0f83d5146dc7784 c530a75c1e6a472b0eb9558310b518f0dfcd8860 0c8ea9fe1adbbee230ee0c68f28b68ca2b0534bc d0d8ad39ecb51cd7497cd524484fe09f50876798 9b26a610936deaf436af9b7e39e4b7f0a35e4409 066a9956097b54530888b88ab9aa1ea02e42af5a d094e95fb7c61c5f46d8e446b4bdc028438dea1c Basis pass c3038e718a19fc596f7b1baba0f83d5146dc7784 c530a75c1e6a472b0eb9558310b518f0dfcd8860 70911f1f4aee0366b6122f2b90d367ec0f066beb d0d8ad39ecb51cd7497cd524484fe09f50876798 c1e667d2598b9b3ce62b8e89ed22dd38dfe9f57f 76551856b28d227cb0386a1ab0e774329b941f7d c47984aabead53918e5ba6d43cdb3f1467452739 Generating revisions with ./adhoc-revtuple-generator git://xenbits.xen.org/linux-pvops.git#c3038e718a19fc596f7b1baba0f83d5146dc7784-c3038e718a19fc596f7b1baba0f83d5146dc7784 git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860 git://xenbits.xen.org/osstest/ovmf.git#70911f1f4aee0366b6122f2b90d367ec0f066beb-0c8ea9fe1adbbee230ee0c68f28b68ca2b0534bc git://xenbits.xen.org/qemu-xen-traditional.git#d0d8ad39ecb51cd7497cd524484\ fe09f50876798-d0d8ad39ecb51cd7497cd524484fe09f50876798 git://git.qemu.org/qemu.git#c1e667d2598b9b3ce62b8e89ed22dd38dfe9f57f-9b26a610936deaf436af9b7e39e4b7f0a35e4409 git://xenbits.xen.org/osstest/seabios.git#76551856b28d227cb0386a1ab0e774329b941f7d-066a9956097b54530888b88ab9aa1ea02e42af5a git://xenbits.xen.org/xen.git#c47984aabead53918e5ba6d43cdb3f1467452739-d094e95fb7c61c5f46d8e446b4bdc028438dea1c >From git://cache:9419/git://git.qemu.org/qemu f1e748d279..09a98dd988 
master -> origin/master Use of uninitialized value $parents in array dereference at ./adhoc-revtuple-generator line 465. Use of uninitialized value in concatenation (.) or string at ./adhoc-revtuple-generator line 465. Use of uninitialized value $parents in array dereference at ./adhoc-revtuple-generator line 465. Use of uninitialized value in concatenation (.) or string at ./adhoc-revtuple-generator line 465. Use of uninitialized value $parents in array dereference at ./adhoc-revtuple-generator line 465. Use of uninitialized value in concatenation (.) or string at ./adhoc-revtuple-generator line 465. Loaded 30209 nodes in revision graph Searching for test results: 147546 pass c3038e718a19fc596f7b1baba0f83d5146dc7784 c530a75c1e6a472b0eb9558310b518f0dfcd8860 70911f1f4aee0366b6122f2b90d367ec0f066beb d0d8ad39ecb51cd7497cd524484fe09f50876798 c1e667d2598b9b3ce62b8e89ed22dd38dfe9f57f 76551856b28d227cb0386a1ab0e774329b941f7d c47984aabead53918e5ba6d43cdb3f1467452739 147641 fail irrelevant 147710 fail irrelevant 147758 fail irrelevant 147821 fail irrelevant 148010 fail irrelevant 148184 fail irrelevant 148120 fail irrelevant 148261 fail irrelevant 148421 fail irrelevant 148340 fail irrelevant 148483 fail irrelevant 148545 fail irrelevant 148578 fail c3038e718a19fc596f7b1baba0f83d5146dc7784 c530a75c1e6a472b0eb9558310b518f0dfcd8860 799d88c1bae7978da23727df94b16f37bd1521f4 d0d8ad39ecb51cd7497cd524484fe09f50876798 61c265f0660ee476985808c8a
Re: [Xen-devel] [PATCH v2] xen/sched: fix onlining cpu with core scheduling active
Ping?

On 10.03.20 10:06, Juergen Gross wrote:
When onlining a cpu, cpupool_cpu_add() checks whether all siblings of
the new cpu are free in order to decide whether to add it to cpupool0.
In case the added cpu is not the last sibling to be onlined, this test
is wrong, as it only checks that all currently online siblings are
free. The test should also check that the number of siblings has
reached the scheduling granularity of cpupool0.

Signed-off-by: Juergen Gross
---
V2:
- modify condition from >= to == (Jan Beulich)
---
xen/common/sched/cpupool.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index 9f70c7ec17..d40345b585 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -616,7 +616,8 @@ static int cpupool_cpu_add(unsigned int cpu)
 get_sched_res(cpu)->cpupool = NULL;

 cpus = sched_get_opt_cpumask(cpupool0->gran, cpu);
-if ( cpumask_subset(cpus, &cpupool_free_cpus) )
+if ( cpumask_subset(cpus, &cpupool_free_cpus) &&
+     cpumask_weight(cpus) == cpupool_get_granularity(cpupool0) )
 ret = cpupool_assign_cpu_locked(cpupool0, cpu);

 rcu_read_unlock(&sched_res_rculock);
Re: [Xen-devel] [PATCH v3] xen/sched: fix cpu offlining with core scheduling
Ping? On 10.03.20 09:09, Juergen Gross wrote: Offlining a cpu with core scheduling active can result in a hanging system. Reason is the scheduling resource and unit of the to be removed cpus needs to be split in order to remove the cpu from its cpupool and move it to the idle scheduler. In case one of the involved cpus happens to have received a sched slave event due to a vcpu former having been running on that cpu being woken up again, it can happen that this cpu will enter sched_wait_rendezvous_in() while its scheduling resource is just about to be split. It might wait for ever for the other sibling to join, which will never happen due to the resources already being modified. This can easily be avoided by: - resetting the rendezvous counters of the idle unit which is kept - checking for a new scheduling resource in sched_wait_rendezvous_in() after reacquiring the scheduling lock and resetting the counters in that case without scheduling another vcpu - moving schedule resource modifications (in schedule_cpu_rm()) and retrieving (schedule(), sched_slave() is fine already, others are not critical) into locked regions Reported-by: Igor Druzhinin Signed-off-by: Juergen Gross --- V2: - fix unlocking, add some related comments V3: - small comment corrections (Jan Beulich) --- xen/common/sched/core.c | 39 --- 1 file changed, 32 insertions(+), 7 deletions(-) diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c index 7e8e7d2c39..626861a3fe 100644 --- a/xen/common/sched/core.c +++ b/xen/common/sched/core.c @@ -2299,6 +2299,10 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext) rcu_read_unlock(&sched_res_rculock); } +/* + * Switch to a new context or keep the current one running. + * On x86 it won't return, so it needs to drop the still held sched_res_rculock. 
+ */ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, bool reset_idle_unit, s_time_t now) { @@ -2408,6 +2412,9 @@ static struct vcpu *sched_force_context_switch(struct vcpu *vprev, * zero do_schedule() is called and the rendezvous counter for leaving * context_switch() is set. All other members will wait until the counter is * becoming zero, dropping the schedule lock in between. + * Either returns the new unit to run, or NULL if no context switch is + * required or (on Arm) has already been performed. If NULL is returned + * sched_res_rculock has been dropped. */ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, spinlock_t **lock, int cpu, @@ -2415,7 +2422,8 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, { struct sched_unit *next; struct vcpu *v; -unsigned int gran = get_sched_res(cpu)->granularity; +struct sched_resource *sr = get_sched_res(cpu); +unsigned int gran = sr->granularity; if ( !--prev->rendezvous_in_cnt ) { @@ -2482,6 +2490,21 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, atomic_set(&prev->next_task->rendezvous_out_cnt, 0); prev->rendezvous_in_cnt = 0; } + +/* + * Check for scheduling resource switched. This happens when we are + * moved away from our cpupool and cpus are subject of the idle + * scheduler now. 
+ */ +if ( unlikely(sr != get_sched_res(cpu)) ) +{ +ASSERT(is_idle_unit(prev)); +atomic_set(&prev->next_task->rendezvous_out_cnt, 0); +prev->rendezvous_in_cnt = 0; +pcpu_schedule_unlock_irq(*lock, cpu); +rcu_read_unlock(&sched_res_rculock); +return NULL; +} } return prev->next_task; @@ -2567,11 +2590,11 @@ static void schedule(void) rcu_read_lock(&sched_res_rculock); +lock = pcpu_schedule_lock_irq(cpu); + sr = get_sched_res(cpu); gran = sr->granularity; -lock = pcpu_schedule_lock_irq(cpu); - if ( prev->rendezvous_in_cnt ) { /* @@ -3151,7 +3174,10 @@ int schedule_cpu_rm(unsigned int cpu) per_cpu(sched_res_idx, cpu_iter) = 0; if ( cpu_iter == cpu ) { -idle_vcpu[cpu_iter]->sched_unit->priv = NULL; +unit = idle_vcpu[cpu_iter]->sched_unit; +unit->priv = NULL; +atomic_set(&unit->next_task->rendezvous_out_cnt, 0); +unit->rendezvous_in_cnt = 0; } else { @@ -3182,6 +3208,8 @@ int schedule_cpu_rm(unsigned int cpu) } sr->scheduler = &sched_idle_ops; sr->sched_priv = NULL; +sr->granularity = 1; +sr->cpupool = NULL; smp_mb(); sr->schedule_lock = &sched_free_cpu_lock; @@ -3194,9 +3222,6 @@ int schedule_cpu_rm(unsigned int cpu) sched_free_udata(old_ops, vpriv_old); sched_free_p
Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv
> From: Roger Pau Monné > Sent: Tuesday, March 24, 2020 7:23 PM > > On Tue, Mar 24, 2020 at 10:11:15AM +, Tian, Kevin wrote: > > > From: Roger Pau Monné > > > Sent: Tuesday, March 24, 2020 5:51 PM > > > > > > On Tue, Mar 24, 2020 at 06:03:28AM +, Tian, Kevin wrote: > > > > > From: Roger Pau Monne > > > > > Sent: Saturday, March 21, 2020 3:08 AM > > > > > > > > > > The current usage of nvmx_update_apicv is not clear: it is deeply > > > > > intertwined with the Ack interrupt on exit VMEXIT control. > > > > > > > > > > The code in nvmx_update_apicv should update the SVI (in service > > > interrupt) > > > > > field of the guest interrupt status only when the Ack interrupt on > > > > > exit is set, as it is used to record that the interrupt being > > > > > serviced is signaled in a vmcs field, and hence hasn't been injected > > > > > as on native. It's important to record the current in service > > > > > interrupt on the guest interrupt status field, or else further > > > > > interrupts won't respect the priority of the in service one. > > > > > > > > > > While clarifying the usage make sure that the SVI is only updated > when > > > > > Ack on exit is set and the nested vmcs interrupt info field is valid. > > > > > Or > > > > > else a guest not using the Ack on exit feature would loose interrupts > as > > > > > they would be signaled as being in service on the guest interrupt > > > > > status field but won't actually be recorded on the interrupt info vmcs > > > > > field, neither injected in any other way. > > > > > > > > It is insufficient. You also need to update RVI to enable virtual > > > > injection > > > > when Ack on exit is cleared. > > > > > > But RVI should be updated in vmx_intr_assist in that case, since > > > nvmx_intr_intercept shouldn't intercept the interrupt, as it should be > > > handled normally. > > > > As we discussed before, vmx_intr_assist is invoked before > nvmx_switch_guest. 
> > > > It is incorrectly to update RVI at that time since it might be still vmcs02 > being > > active (if no pending softirq to make it invoked again). > > > > Also nvmx_intr_intercept does intercept Ack-on-exit=0 case: > > > > if ( intack.source == hvm_intsrc_pic || > > intack.source == hvm_intsrc_lapic ) > > { > > vmx_inject_extint(intack.vector, intack.source); > > > > ctrl = get_vvmcs(v, VM_EXIT_CONTROLS); > > if ( ctrl & VM_EXIT_ACK_INTR_ON_EXIT ) > > { > > /* for now, duplicate the ack path in vmx_intr_assist */ > > hvm_vcpu_ack_pending_irq(v, intack); > > pt_intr_post(v, intack); > > > > intack = hvm_vcpu_has_pending_irq(v); > > if ( unlikely(intack.source != hvm_intsrc_none) ) > > vmx_enable_intr_window(v, intack); > > } > > else if ( !cpu_has_vmx_virtual_intr_delivery ) > > vmx_enable_intr_window(v, intack); > > > > return 1; > > Right, I always get confused by the switch happening in the vmentry > path. > > That only happens when the vcpu is in nested mode > (nestedhvm_vcpu_in_guestmode == true). That would be the case before a > vmexit, at least for the first call to vmx_intr_assist if there are > pending softirqs. > > Note that if there are pending softirqs the second time > nvmx_intr_intercept will return 0. Exactly. 
> > > } > > > > > > > > > > > > > > > Signed-off-by: Roger Pau Monné > > > > > --- > > > > > xen/arch/x86/hvm/vmx/vvmx.c | 11 ++- > > > > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c > > > b/xen/arch/x86/hvm/vmx/vvmx.c > > > > > index 1b8461ba30..180d01e385 100644 > > > > > --- a/xen/arch/x86/hvm/vmx/vvmx.c > > > > > +++ b/xen/arch/x86/hvm/vmx/vvmx.c > > > > > @@ -1383,7 +1383,7 @@ static void (struct vcpu *v) > > > > > { > > > > > struct nestedvmx *nvmx = &vcpu_2_nvmx(v); > > > > > unsigned long reason = get_vvmcs(v, VM_EXIT_REASON); > > > > > -uint32_t intr_info = nvmx->intr.intr_info; > > > > > +unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO); > > > > > > > > well, above reminds me an interesting question. Combining last and this > > > > patch, we'd see essentially that it goes back to the state before > f96e1469 > > > > (at least when Ack on exit is true). > > > > > > Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469 > > > gets us to that state. > > > > you are right. I just wanted to point out that this patch alone doesn't > > do any real fix for ack-on-exit=1 case. It just makes sure that > > ack-on-exit=0 > > is skipped from that function. So it won't be the one fixing your previous > > problem. 😊 > > Yes, that's correct. > > > > > > > This patch is an attempt to clarify that nvmx_update_apicv is closely > > > related to the Ack on exit feature, as it modifies SVI in order to > > > signal the
Re: [Xen-devel] [PATCH 3/3] x86/nvmx: update exit bitmap on vmexit
On Tue, Mar 24, 2020 at 10:16:52AM +, Tian, Kevin wrote: > > From: Roger Pau Monné > > Sent: Tuesday, March 24, 2020 5:59 PM > > > > On Tue, Mar 24, 2020 at 06:22:43AM +, Tian, Kevin wrote: > > > > From: Roger Pau Monne > > > > Sent: Saturday, March 21, 2020 3:08 AM > > > > > > > > Current code in nvmx_update_apicv set the guest interrupt status field > > > > but doesn't update the exit bitmap, which can cause issues of lost > > > > interrupts on the L1 hypervisor if vmx_intr_assist gets > > > > short-circuited by nvmx_intr_intercept returning true. > > > > > > Above is not accurate. Currently Xen didn't choose to update the EOI > > > exit bitmap every time when there is a change. Instead, it chose to > > > batch the update before resuming to the guest. sort of optimization. > > > So it is not related to whether SVI is changed. We should always do the > > > bitmap update in nvmx_update_apicv, regardless of the setting of > > > Ack-on-exit ... > > > > But if Ack on exit is not set the GUEST_INTR_STATUS won't be changed > > by nvmx_intr_intercept, and hence there's no need to update the EOI > > exit map? > > > > If OTOH the GUEST_INTR_STATUS field is changed by vmx_intr_assist the > > bitmap will already be updated there. > > > > If you agree with my comment in patch 2/3 about setting RVI for > ack-on-exit=0, then EOI bitmap update should be done there too. Yes, I agree, see my reply to your comment. I (again) misremembered the sequence and assumed the switch will happen prior to calling vmx_intr_assist which is not the case. Thanks, Roger.
Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv
On Tue, Mar 24, 2020 at 10:11:15AM +, Tian, Kevin wrote: > > From: Roger Pau Monné > > Sent: Tuesday, March 24, 2020 5:51 PM > > > > On Tue, Mar 24, 2020 at 06:03:28AM +, Tian, Kevin wrote: > > > > From: Roger Pau Monne > > > > Sent: Saturday, March 21, 2020 3:08 AM > > > > > > > > The current usage of nvmx_update_apicv is not clear: it is deeply > > > > intertwined with the Ack interrupt on exit VMEXIT control. > > > > > > > > The code in nvmx_update_apicv should update the SVI (in service > > interrupt) > > > > field of the guest interrupt status only when the Ack interrupt on > > > > exit is set, as it is used to record that the interrupt being > > > > serviced is signaled in a vmcs field, and hence hasn't been injected > > > > as on native. It's important to record the current in service > > > > interrupt on the guest interrupt status field, or else further > > > > interrupts won't respect the priority of the in service one. > > > > > > > > While clarifying the usage make sure that the SVI is only updated when > > > > Ack on exit is set and the nested vmcs interrupt info field is valid. Or > > > > else a guest not using the Ack on exit feature would loose interrupts as > > > > they would be signaled as being in service on the guest interrupt > > > > status field but won't actually be recorded on the interrupt info vmcs > > > > field, neither injected in any other way. > > > > > > It is insufficient. You also need to update RVI to enable virtual > > > injection > > > when Ack on exit is cleared. > > > > But RVI should be updated in vmx_intr_assist in that case, since > > nvmx_intr_intercept shouldn't intercept the interrupt, as it should be > > handled normally. > > As we discussed before, vmx_intr_assist is invoked before nvmx_switch_guest. > > It is incorrectly to update RVI at that time since it might be still vmcs02 > being > active (if no pending softirq to make it invoked again). 
> > Also nvmx_intr_intercept does intercept Ack-on-exit=0 case: > > if ( intack.source == hvm_intsrc_pic || > intack.source == hvm_intsrc_lapic ) > { > vmx_inject_extint(intack.vector, intack.source); > > ctrl = get_vvmcs(v, VM_EXIT_CONTROLS); > if ( ctrl & VM_EXIT_ACK_INTR_ON_EXIT ) > { > /* for now, duplicate the ack path in vmx_intr_assist */ > hvm_vcpu_ack_pending_irq(v, intack); > pt_intr_post(v, intack); > > intack = hvm_vcpu_has_pending_irq(v); > if ( unlikely(intack.source != hvm_intsrc_none) ) > vmx_enable_intr_window(v, intack); > } > else if ( !cpu_has_vmx_virtual_intr_delivery ) > vmx_enable_intr_window(v, intack); > > return 1; Right, I always get confused by the switch happening in the vmentry path. That only happens when the vcpu is in nested mode (nestedhvm_vcpu_in_guestmode == true). That would be the case before a vmexit, at least for the first call to vmx_intr_assist if there are pending softirqs. Note that if there are pending softirqs the second time nvmx_intr_intercept will return 0. > } > > > > > > > > > > > Signed-off-by: Roger Pau Monné > > > > --- > > > > xen/arch/x86/hvm/vmx/vvmx.c | 11 ++- > > > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c > > b/xen/arch/x86/hvm/vmx/vvmx.c > > > > index 1b8461ba30..180d01e385 100644 > > > > --- a/xen/arch/x86/hvm/vmx/vvmx.c > > > > +++ b/xen/arch/x86/hvm/vmx/vvmx.c > > > > @@ -1383,7 +1383,7 @@ static void (struct vcpu *v) > > > > { > > > > struct nestedvmx *nvmx = &vcpu_2_nvmx(v); > > > > unsigned long reason = get_vvmcs(v, VM_EXIT_REASON); > > > > -uint32_t intr_info = nvmx->intr.intr_info; > > > > +unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO); > > > > > > well, above reminds me an interesting question. Combining last and this > > > patch, we'd see essentially that it goes back to the state before f96e1469 > > > (at least when Ack on exit is true). 
> > > > Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469 > > gets us to that state. > > you are right. I just wanted to point out that this patch alone doesn't > do any real fix for ack-on-exit=1 case. It just makes sure that ack-on-exit=0 > is skipped from that function. So it won't be the one fixing your previous > problem. 😊 Yes, that's correct. > > > > This patch is an attempt to clarify that nvmx_update_apicv is closely > > related to the Ack on exit feature, as it modifies SVI in order to > > signal the vector currently being serviced by the EXIT_INTR_INFO vmcs > > field. This was not obvious to me, as at first sight I assumed > > nvmx_update_apicv was actually injecting that vector into the guest. > > > > > iirc, that commit was introduced to enable > > > nested x2apic with apicv, and your very first version
Re: [Xen-devel] [PATCH 1/3] Revert "x86/vvmx: fix virtual interrupt injection when Ack on exit control is used"
> From: Roger Pau Monné > Sent: Tuesday, March 24, 2020 6:47 PM > > On Tue, Mar 24, 2020 at 08:49:27AM +, Tian, Kevin wrote: > > > From: Jan Beulich > > > Sent: Tuesday, March 24, 2020 4:10 PM > > > > > > On 24.03.2020 06:41, Tian, Kevin wrote: > > > >> From: Roger Pau Monné > > > >> Sent: Monday, March 23, 2020 10:49 PM > > > >> > > > >> On Mon, Mar 23, 2020 at 09:09:59AM +0100, Jan Beulich wrote: > > > >>> On 20.03.2020 20:07, Roger Pau Monne wrote: > > > This reverts commit f96e1469ad06b61796c60193daaeb9f8a96d7458. > > > > > > The commit is wrong, as the whole point of nvmx_update_apicv is > to > > > update the guest interrupt status field when the Ack on exit VMEXIT > > > control feature is enabled. > > > > > > Signed-off-by: Roger Pau Monné > > > >>> > > > >>> Before anyone gets to look at the other two patches, should this > > > >>> be thrown in right away? > > > >> > > > >> I would like if possible get a confirmation from Kevin (or anyone > > > >> else) that my understanding is correct. I find the nested code very > > > >> confusing, and I've already made a mistake while trying to fix it. > > > >> That being said, this was spotted by osstest as introducing a > > > >> regression, so I guess it's safe to just toss it in now. > > > >> > > > >> FWIW patch 2/3 attempts to provide a description of my > understanding > > > >> of how nvmx_update_apicv works. > > > >> > > > > > > > > I feel it is not good to take this patch alone, as it was introduced to > > > > fix > > > > another problem. W/o understanding whether the whole series can > > > > fix both old and new problems, we may risk putting nested interrupt > > > > logic in an even worse state... > > > > > > Well, okay, I'll wait then, but it would seem to me that reverting > > > wouldn't put us in a worse state than we were in before that change > > > was put in. > > > > Roger needs to make the call, i.e. which problem is more severe, old or > > new one. 
> > AFAICT those problems affect different kinds of hardware, so it doesn't > make much difference. IMO f96e1469ad06b6 should be reverted because > it's plain wrong, and although it seemed to fix the issue in one of my > boxes it was just luck. > > I don't think it's going to make matters worse, as osstest is already > blocked, but reverting it alone without the rest of the series is not > going to unblock osstest either. If you agree it's wrong I think it > could go in now, even if it's just to avoid having to repost it with > newer versions of the series. > yes, I agree it's wrong. Reviewed-by: Kevin Tian
Re: [Xen-devel] [PATCH 1/3] Revert "x86/vvmx: fix virtual interrupt injection when Ack on exit control is used"
On Tue, Mar 24, 2020 at 08:49:27AM +, Tian, Kevin wrote: > > From: Jan Beulich > > Sent: Tuesday, March 24, 2020 4:10 PM > > > > On 24.03.2020 06:41, Tian, Kevin wrote: > > >> From: Roger Pau Monné > > >> Sent: Monday, March 23, 2020 10:49 PM > > >> > > >> On Mon, Mar 23, 2020 at 09:09:59AM +0100, Jan Beulich wrote: > > >>> On 20.03.2020 20:07, Roger Pau Monne wrote: > > This reverts commit f96e1469ad06b61796c60193daaeb9f8a96d7458. > > > > The commit is wrong, as the whole point of nvmx_update_apicv is to > > update the guest interrupt status field when the Ack on exit VMEXIT > > control feature is enabled. > > > > Signed-off-by: Roger Pau Monné > > >>> > > >>> Before anyone gets to look at the other two patches, should this > > >>> be thrown in right away? > > >> > > >> I would like if possible get a confirmation from Kevin (or anyone > > >> else) that my understanding is correct. I find the nested code very > > >> confusing, and I've already made a mistake while trying to fix it. > > >> That being said, this was spotted by osstest as introducing a > > >> regression, so I guess it's safe to just toss it in now. > > >> > > >> FWIW patch 2/3 attempts to provide a description of my understanding > > >> of how nvmx_update_apicv works. > > >> > > > > > > I feel it is not good to take this patch alone, as it was introduced to > > > fix > > > another problem. W/o understanding whether the whole series can > > > fix both old and new problems, we may risk putting nested interrupt > > > logic in an even worse state... > > > > Well, okay, I'll wait then, but it would seem to me that reverting > > wouldn't put us in a worse state than we were in before that change > > was put in. > > Roger needs to make the call, i.e. which problem is more severe, old or > new one. AFAICT those problems affect different kind of hardware, so it doesn't make much difference. 
IMO f96e1469ad06b6 should be reverted because it's plain wrong, and although it seemed to fix the issue in one of my boxes it was just luck. I don't think it's going to make matters worse, as osstest is already blocked, but reverting it alone without the rest of the series is not going to unblock osstest either. If you agree it's wrong I think it could go in now, even if it's just to avoid having to repost it with newer versions of the series. Thanks, Roger.
Re: [Xen-devel] [PATCH V6] x86/altp2m: Hypercall to set altp2m view visibility
Hi Kevin and sorry for the long reply time, On 10.03.2020 04:04, Tian, Kevin wrote: From: Alexandru Stefan ISAILA Sent: Tuesday, March 3, 2020 8:23 PM At this moment a guest can call vmfunc to change the altp2m view. This should be limited in order to avoid any unwanted view switch. I look forward to more elaboration of the motivation, especially for one who doesn't track altp2m closely like me. For example, do_altp2m_op mentions three modes: external, internal, coordinated. Then is this patch trying to limit the view switch in all three modes or just one of them? from the definition clearly external disallows guest to change any view (then why do we want per-view visibility control) while the latter two both allow the guest to switch the view. later you noted some exception with mixed (internal) mode. then is this restriction pushed just for limited (coordinated) mode? As you stated, there are some exceptions with mixed (internal) mode. This restriction is clearly used for coordinated mode but it also restricts view switching in the external mode as well. I had a good example to start with: let's say we have one external agent in dom0 that uses view1 and view2 and the logic requires switching between the views. At this point VMFUNC is available to the guest, so with a simple asm snippet it can switch to view 0. At this time the external agent is not aware that the view has switched, and furthermore view0 was not supposed to be in the main logic, so it crashes. This example can be extended to any number of views. I hope it paints a clearer picture of what this patch is trying to achieve. btw I'm not sure why altp2m invents two names per mode, and their mapping looks a bit weird. e.g. doesn't 'coordinated' mode sound more like 'mixed' mode? Yes that is true, it is a bit weird. The new xc_altp2m_set_visibility() solves this by making views invisible to vmfunc.
if one doesn't want to make a view visible to vmfunc, why can't he just avoid registering the view in the first place? Are you aiming for a scenario that dom0 may register 10 views, with 5 views visible to vmfunc with the other 5 views switched by dom0 itself? That is one scenario; another can be that dom0 has a number of views created and at some point it wants to be sure that only some of the views can be switched, saving the rest and making them visible when the time is right. Sure, the example given above is another one. Regards, Alex
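The scenario debated above can be distilled into a toy model (standalone C, all names invented for illustration — this is not the Xen implementation) of the per-view visibility check the hypercall puts in front of guest VMFUNC switches:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ALTP2M 10

/* Bitmap of the views dom0 has left visible to guest VMFUNC. */
static uint16_t visible_mask;

/* Rough analogue of what xc_altp2m_set_visibility() asks Xen to track. */
static void set_view_visibility(unsigned int view, bool visible)
{
    if (visible)
        visible_mask |= 1u << view;
    else
        visible_mask &= (uint16_t)~(1u << view);
}

/* A guest VMFUNC view switch is only honoured for visible views. */
static bool vmfunc_switch_allowed(unsigned int view)
{
    return view < MAX_ALTP2M && (visible_mask & (1u << view));
}
```

In the crash scenario described above, dom0 would leave view0 invisible while its logic works with view1 and view2, so a guest's inline-asm VMFUNC back to view0 is simply refused.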
Re: [Xen-devel] [PATCH] SVM: Add union intstat_t for offset 68h in vmcb struct
On 2020/3/24 17:08, Jan Beulich wrote: > On 24.03.2020 05:52, Pu Wen wrote: >> --- a/xen/arch/x86/hvm/svm/nestedsvm.c >> +++ b/xen/arch/x86/hvm/svm/nestedsvm.c >> @@ -508,7 +508,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, >> struct cpu_user_regs *regs) >> } >> >> /* Shadow Mode */ >> -n2vmcb->interrupt_shadow = ns_vmcb->interrupt_shadow; >> +n2vmcb->int_stat.intr_shadow = ns_vmcb->int_stat.intr_shadow; > > While bit 1 is irrelevant to VMRUN, I still wonder whether you > shouldn't copy "raw" here. Ok, will copy the whole "raw" as suggested, thanks. >> @@ -1058,7 +1058,7 @@ nsvm_vmcb_prepare4vmexit(struct vcpu *v, struct >> cpu_user_regs *regs) >> ns_vmcb->_vintr.fields.intr_masking = 0; >> >> /* Shadow mode */ >> -ns_vmcb->interrupt_shadow = n2vmcb->interrupt_shadow; >> +ns_vmcb->int_stat.intr_shadow = n2vmcb->int_stat.intr_shadow; > > Same here, or at the very least you want to also copy bit 1 here. Ok, will change to: /* Interrupt state */ ns_vmcb->int_stat = n2vmcb->int_stat; >> --- a/xen/arch/x86/hvm/svm/svmdebug.c >> +++ b/xen/arch/x86/hvm/svm/svmdebug.c >> @@ -51,9 +51,9 @@ void svm_vmcb_dump(const char *from, const struct >> vmcb_struct *vmcb) >> printk("iopm_base_pa = %#"PRIx64" msrpm_base_pa = %#"PRIx64" >> tsc_offset = %#"PRIx64"\n", >> vmcb_get_iopm_base_pa(vmcb), vmcb_get_msrpm_base_pa(vmcb), >> vmcb_get_tsc_offset(vmcb)); >> -printk("tlb_control = %#x vintr = %#"PRIx64" interrupt_shadow = >> %#"PRIx64"\n", >> +printk("tlb_control = %#x vintr = %#"PRIx64" interrupt_shadow = %#x\n", >> vmcb->tlb_control, vmcb_get_vintr(vmcb).bytes, >> - vmcb->interrupt_shadow); >> + vmcb->int_stat.intr_shadow); OK, will dump like this: printk("tlb_control = %#x vintr = %#"PRIx64" int_stat = %#"PRIx64"\n", vmcb->tlb_control, vmcb_get_vintr(vmcb).bytes, vmcb->int_stat.raw); Thx. -- Regards, Pu Wen
[Xen-devel] [PATCH v2] SVM: Add union intstat_t for offset 68h in vmcb struct
According to chapter "Appendix B Layout of VMCB" in the new version (v3.32) AMD64 APM[1], bit 1 of the VMCB offset 68h is defined as GUEST_INTERRUPT_MASK. In the current Xen code, the whole u64 interrupt_shadow is used to set up the interrupt shadow, which misuses other bits in VMCB offset 68h as part of interrupt_shadow. Add union intstat_t for VMCB offset 68h and fix the code to only use bit 0 as intr_shadow according to the new APM description. Reference: [1] https://www.amd.com/system/files/TechDocs/24593.pdf Signed-off-by: Pu Wen --- v1->v2: - Copy the whole int_stat in nsvm_vmcb_prepare4vmrun() and nsvm_vmcb_prepare4vmexit(). - Dump all 64 bits of int_stat in svm_vmcb_dump(). xen/arch/x86/hvm/svm/nestedsvm.c | 8 xen/arch/x86/hvm/svm/svm.c | 8 xen/arch/x86/hvm/svm/svmdebug.c| 4 ++-- xen/include/asm-x86/hvm/svm/vmcb.h | 13 - 4 files changed, 22 insertions(+), 11 deletions(-) diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c index 3bd2a119d3..bbd06e342e 100644 --- a/xen/arch/x86/hvm/svm/nestedsvm.c +++ b/xen/arch/x86/hvm/svm/nestedsvm.c @@ -507,8 +507,8 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct cpu_user_regs *regs) n2vmcb->_vintr.fields.intr_masking = 1; } -/* Shadow Mode */ -n2vmcb->interrupt_shadow = ns_vmcb->interrupt_shadow; +/* Interrupt state */ +n2vmcb->int_stat = ns_vmcb->int_stat; /* Exit codes */ n2vmcb->exitcode = ns_vmcb->exitcode; @@ -1057,8 +1057,8 @@ nsvm_vmcb_prepare4vmexit(struct vcpu *v, struct cpu_user_regs *regs) if (!(svm->ns_hostflags.fields.vintrmask)) ns_vmcb->_vintr.fields.intr_masking = 0; -/* Shadow mode */ -ns_vmcb->interrupt_shadow = n2vmcb->interrupt_shadow; +/* Interrupt state */ +ns_vmcb->int_stat = n2vmcb->int_stat; /* Exit codes */ ns_vmcb->exitcode = n2vmcb->exitcode; diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index 32d8d847f2..888f504a94 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -116,7 +116,7 @@ void __update_guest_eip(struct
cpu_user_regs *regs, unsigned int inst_len) regs->rip += inst_len; regs->eflags &= ~X86_EFLAGS_RF; -curr->arch.hvm.svm.vmcb->interrupt_shadow = 0; +curr->arch.hvm.svm.vmcb->int_stat.intr_shadow = 0; if ( regs->eflags & X86_EFLAGS_TF ) hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC); @@ -432,7 +432,7 @@ static unsigned int svm_get_interrupt_shadow(struct vcpu *v) struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb; unsigned int intr_shadow = 0; -if ( vmcb->interrupt_shadow ) +if ( vmcb->int_stat.intr_shadow ) intr_shadow |= HVM_INTR_SHADOW_MOV_SS | HVM_INTR_SHADOW_STI; if ( vmcb_get_general1_intercepts(vmcb) & GENERAL1_INTERCEPT_IRET ) @@ -446,7 +446,7 @@ static void svm_set_interrupt_shadow(struct vcpu *v, unsigned int intr_shadow) struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb; u32 general1_intercepts = vmcb_get_general1_intercepts(vmcb); -vmcb->interrupt_shadow = +vmcb->int_stat.intr_shadow = !!(intr_shadow & (HVM_INTR_SHADOW_MOV_SS|HVM_INTR_SHADOW_STI)); general1_intercepts &= ~GENERAL1_INTERCEPT_IRET; @@ -2945,7 +2945,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs) * retired. 
*/ general1_intercepts &= ~GENERAL1_INTERCEPT_IRET; -vmcb->interrupt_shadow = 1; +vmcb->int_stat.intr_shadow = 1; vmcb_set_general1_intercepts(vmcb, general1_intercepts); break; diff --git a/xen/arch/x86/hvm/svm/svmdebug.c b/xen/arch/x86/hvm/svm/svmdebug.c index 366a003f21..5aa9d410ba 100644 --- a/xen/arch/x86/hvm/svm/svmdebug.c +++ b/xen/arch/x86/hvm/svm/svmdebug.c @@ -51,9 +51,9 @@ void svm_vmcb_dump(const char *from, const struct vmcb_struct *vmcb) printk("iopm_base_pa = %#"PRIx64" msrpm_base_pa = %#"PRIx64" tsc_offset = %#"PRIx64"\n", vmcb_get_iopm_base_pa(vmcb), vmcb_get_msrpm_base_pa(vmcb), vmcb_get_tsc_offset(vmcb)); -printk("tlb_control = %#x vintr = %#"PRIx64" interrupt_shadow = %#"PRIx64"\n", +printk("tlb_control = %#x vintr = %#"PRIx64" int_stat = %#"PRIx64"\n", vmcb->tlb_control, vmcb_get_vintr(vmcb).bytes, - vmcb->interrupt_shadow); + vmcb->int_stat.raw); printk("event_inj %016"PRIx64", valid? %d, ec? %d, type %u, vector %#x\n", vmcb->event_inj.raw, vmcb->event_inj.v, vmcb->event_inj.ev, vmcb->event_inj.type, diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h index b9e389d481..d8a3285752 100644 --- a/xen/include/asm-x86/hvm/svm/vmcb.h +++ b/xen/include/asm-x86/hvm/svm/vmcb.h @@ -316,6 +316,17 @@ typedef union uint64_t raw; } intinfo_t; +typedef union +{ +struct +{ +u64 intr_shadow:1; +u64 guest_intr_mask:1; +u64 resvd: 62; +}; +uint6
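The behavioural difference the patch is after can be shown with a standalone userspace sketch of the offset-68h layout (a model of the union from the patch, not the Xen header itself; the bitfield layout assumes GCC-style little-endian ordering, so bit 0 is declared first):

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the VMCB offset 68h layout described in the patch. */
typedef union {
    struct {
        uint64_t intr_shadow:1;      /* bit 0: interrupt shadow */
        uint64_t guest_intr_mask:1;  /* bit 1: GUEST_INTERRUPT_MASK */
        uint64_t resvd:62;
    };
    uint64_t raw;
} intstat_t;

/* What the pre-patch code effectively did: treat the whole 64-bit
 * field as the interrupt shadow, clobbering bit 1 as a side effect. */
static void set_shadow_old(intstat_t *s, uint64_t shadow)
{
    s->raw = shadow;
}

/* What the patched code does: touch only bit 0. */
static void set_shadow_new(intstat_t *s, int shadow)
{
    s->intr_shadow = !!shadow;
}
```

This is also why the v1 review asked for the whole `raw` value to be copied across nested VMRUN/VMEXIT: copying only `intr_shadow` would silently drop `guest_intr_mask`.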
Re: [Xen-devel] [PATCH v4 0/3] make sure PGC_extra pages are dealt with properly
> -Original Message- > From: Jan Beulich > Sent: 24 March 2020 09:39 > To: Paul Durrant > Cc: xen-devel@lists.xenproject.org; Andrew Cooper > ; George Dunlap > ; Ian Jackson ; Julien > Grall ; > Konrad Rzeszutek Wilk ; Roger Pau Monné > ; Stefano > Stabellini ; Tim Deegan ; Wei Liu > > Subject: Re: [PATCH v4 0/3] make sure PGC_extra pages are dealt with properly > > On 18.03.2020 18:32, Paul Durrant wrote: > > This series was formerly called "remove one more shared xenheap page: > > shared_info" but I have dropped the patches actually changing shared_info > > and just left the PGC_extra clean-up that was previously intertwined. > > > > Paul Durrant (3): > > mm: keep PGC_extra pages on a separate list > > x86 / ioreq: use a MEMF_no_refcount allocation for server pages... > > mm: add 'is_special_page' inline function... > > So I'm confused - I had just replied twice to v6 patch 5/5. This > series calls itself v4 and consists of the middle three patches > of what v6 was. What's the deal? Is this really v7, and the two > patches have been dropped / postponed? Sorry, I clearly got confused with numbering against one of my other series. Yes, this should be v7. I wanted to send the patches that clear up use of PGC_extra, separating from the change to shared_info since I'm pressed for time to complete all the other conversions from xenheap pages such that I can send them as a single series. Paul > > Jan
Re: [Xen-devel] [PATCH v4 1/2] xen: Use evtchn_type_t as a type for event channels
On Mon, Mar 23, 2020 at 05:55:39PM -0400, Boris Ostrovsky wrote: > > On 3/23/20 12:15 PM, Yan Yankovskyi wrote: > > Make event channel functions pass event channel port using > > evtchn_port_t type. It eliminates signed <-> unsigned conversion. > > > > Signed-off-by: Yan Yankovskyi > > > If the difference is only the whitespace fix There were two more fixes: missing include in gntdev-common.h and a variable initialization in bind_virq_to_irq (eliminates warning).
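The class of bug a consistent unsigned port type eliminates is the usual C mixed signed/unsigned trap; a minimal standalone illustration (not kernel code, names invented for the sketch):

```c
#include <assert.h>
#include <limits.h>

typedef unsigned int evtchn_port_t;   /* the kind of type the patch standardises on */

/* A -1 "no port" sentinel stored into an unsigned port variable
 * silently becomes UINT_MAX. */
static evtchn_port_t store_port(int rc)
{
    return (evtchn_port_t)rc;
}

/* In a mixed comparison the signed operand is converted to unsigned,
 * so the intuitive reading "-1 is less than any bound" is wrong. */
static int below_bound(int port, evtchn_port_t bound)
{
    return port < bound;   /* port is implicitly converted to unsigned here */
}
```

Using `evtchn_port_t` end to end keeps both operands unsigned, so no such implicit conversion (and no compiler warning about it) occurs.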
Re: [Xen-devel] [PATCH 3/3] x86/nvmx: update exit bitmap on vmexit
> From: Roger Pau Monné > Sent: Tuesday, March 24, 2020 5:59 PM > > On Tue, Mar 24, 2020 at 06:22:43AM +, Tian, Kevin wrote: > > > From: Roger Pau Monne > > > Sent: Saturday, March 21, 2020 3:08 AM > > > > > > Current code in nvmx_update_apicv set the guest interrupt status field > > > but doesn't update the exit bitmap, which can cause issues of lost > > > interrupts on the L1 hypervisor if vmx_intr_assist gets > > > short-circuited by nvmx_intr_intercept returning true. > > > > Above is not accurate. Currently Xen didn't choose to update the EOI > > exit bitmap every time when there is a change. Instead, it chose to > > batch the update before resuming to the guest. sort of optimization. > > So it is not related to whether SVI is changed. We should always do the > > bitmap update in nvmx_update_apicv, regardless of the setting of > > Ack-on-exit ... > > But if Ack on exit is not set the GUEST_INTR_STATUS won't be changed > by nvmx_intr_intercept, and hence there's no need to update the EOI > exit map? > > If OTOH the GUEST_INTR_STATUS field is changed by vmx_intr_assist the > bitmap will already be updated there. > If you agree with my comment in patch 2/3 about setting RVI for ack-on-exit=0, then EOI bitmap update should be done there too. Thanks Kevin
[Xen-devel] [PATCH] tools/xenstore: fix a use after free problem in xenstored
Commit 562a1c0f7ef3fb ("tools/xenstore: dont unlink connection object twice") introduced a potential use after free problem in domain_cleanup(): after calling talloc_unlink() for domain->conn domain->conn is set to NULL. The problem is that domain is registered as talloc child of domain->conn, so it might be freed by the talloc_unlink() call. Fixes: 562a1c0f7ef3fb ("tools/xenstore: dont unlink connection object twice") Signed-off-by: Juergen Gross --- tools/xenstore/xenstored_domain.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c index baddaba5df..5858185211 100644 --- a/tools/xenstore/xenstored_domain.c +++ b/tools/xenstore/xenstored_domain.c @@ -214,6 +214,7 @@ static void domain_cleanup(void) { xc_dominfo_t dominfo; struct domain *domain; + struct connection *conn; int notify = 0; again: @@ -230,8 +231,10 @@ static void domain_cleanup(void) continue; } if (domain->conn) { - talloc_unlink(talloc_autofree_context(), domain->conn); + /* domain is a talloc child of domain->conn. */ + conn = domain->conn; domain->conn = NULL; + talloc_unlink(talloc_autofree_context(), conn); notify = 0; /* destroy_domain() fires the watch */ goto again; } -- 2.16.4
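The hazard and the fix can be modelled outside of talloc with plain malloc/free, where freeing a "parent" also frees its child, just as talloc frees children with their parent (all names here are invented for the sketch):

```c
#include <assert.h>
#include <stdlib.h>

struct connection;

struct domain {
    struct connection *conn;
    unsigned int domid;
};

struct connection {
    struct domain *child;   /* freed together with the connection */
};

/* Mimics talloc_unlink() on the connection: the connection and all of
 * its talloc children (here: the domain) are freed together. */
static void conn_free(struct connection *conn)
{
    free(conn->child);
    free(conn);
}

/* The fixed ordering from the patch: take a copy of the pointer and
 * clear the field *before* the free, because 'd' itself dies along
 * with d->conn and must not be touched afterwards. */
static int domain_cleanup_one(struct domain *d)
{
    struct connection *conn = d->conn;

    d->conn = NULL;    /* last touch of 'd' */
    conn_free(conn);   /* 'd' is gone from here on */
    return 0;
}
```

The buggy ordering (free first, then `d->conn = NULL`) writes to freed memory; under an address sanitizer the sketch above would flag that variant immediately.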
Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv
> From: Roger Pau Monné > Sent: Tuesday, March 24, 2020 5:51 PM > > On Tue, Mar 24, 2020 at 06:03:28AM +, Tian, Kevin wrote: > > > From: Roger Pau Monne > > > Sent: Saturday, March 21, 2020 3:08 AM > > > > > > The current usage of nvmx_update_apicv is not clear: it is deeply > > > intertwined with the Ack interrupt on exit VMEXIT control. > > > > > > The code in nvmx_update_apicv should update the SVI (in service > interrupt) > > > field of the guest interrupt status only when the Ack interrupt on > > > exit is set, as it is used to record that the interrupt being > > > serviced is signaled in a vmcs field, and hence hasn't been injected > > > as on native. It's important to record the current in service > > > interrupt on the guest interrupt status field, or else further > > > interrupts won't respect the priority of the in service one. > > > > > > While clarifying the usage make sure that the SVI is only updated when > > > Ack on exit is set and the nested vmcs interrupt info field is valid. Or > > > else a guest not using the Ack on exit feature would lose interrupts as > > > they would be signaled as being in service on the guest interrupt > > > status field but won't actually be recorded on the interrupt info vmcs > > > field, nor injected in any other way. > > > > It is insufficient. You also need to update RVI to enable virtual injection > > when Ack on exit is cleared. > > But RVI should be updated in vmx_intr_assist in that case, since > nvmx_intr_intercept shouldn't intercept the interrupt, as it should be > handled normally. As we discussed before, vmx_intr_assist is invoked before nvmx_switch_guest. It is incorrect to update RVI at that time since it might still be vmcs02 that is active (if no pending softirq to make it invoked again). 
Also nvmx_intr_intercept does intercept Ack-on-exit=0 case: if ( intack.source == hvm_intsrc_pic || intack.source == hvm_intsrc_lapic ) { vmx_inject_extint(intack.vector, intack.source); ctrl = get_vvmcs(v, VM_EXIT_CONTROLS); if ( ctrl & VM_EXIT_ACK_INTR_ON_EXIT ) { /* for now, duplicate the ack path in vmx_intr_assist */ hvm_vcpu_ack_pending_irq(v, intack); pt_intr_post(v, intack); intack = hvm_vcpu_has_pending_irq(v); if ( unlikely(intack.source != hvm_intsrc_none) ) vmx_enable_intr_window(v, intack); } else if ( !cpu_has_vmx_virtual_intr_delivery ) vmx_enable_intr_window(v, intack); return 1; } > > > > > > > Signed-off-by: Roger Pau Monné > > > --- > > > xen/arch/x86/hvm/vmx/vvmx.c | 11 ++- > > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > > > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c > b/xen/arch/x86/hvm/vmx/vvmx.c > > > index 1b8461ba30..180d01e385 100644 > > > --- a/xen/arch/x86/hvm/vmx/vvmx.c > > > +++ b/xen/arch/x86/hvm/vmx/vvmx.c > > > @@ -1383,7 +1383,7 @@ static void (struct vcpu *v) > > > { > > > struct nestedvmx *nvmx = &vcpu_2_nvmx(v); > > > unsigned long reason = get_vvmcs(v, VM_EXIT_REASON); > > > -uint32_t intr_info = nvmx->intr.intr_info; > > > +unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO); > > > > well, above reminds me an interesting question. Combining last and this > > patch, we'd see essentially that it goes back to the state before f96e1469 > > (at least when Ack on exit is true). > > Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469 > gets us to that state. you are right. I just wanted to point out that this patch alone doesn't do any real fix for ack-on-exit=1 case. It just makes sure that ack-on-exit=0 is skipped from that function. So it won't be the one fixing your previous problem. 
😊 > > This patch is an attempt to clarify that nvmx_update_apicv is closely > related to the Ack on exit feature, as it modifies SVI in order to > signal the vector currently being serviced by the EXIT_INTR_INFO vmcs > field. This was not obvious to me, as at first sight I assumed > nvmx_update_apicv was actually injecting that vector into the guest. > > > iirc, that commit was introduced to enable > > nested x2apic with apicv, and your very first version even just removed > > the whole nvmx_update_apicv. Then now with the new reverted logic, > > are you still suffering the x2apic problem? If not, does it imply the real fix > > is actually coming from patch 3/3 for the eoi bitmap update? > > Yes, patch 3/3 is the one that fixed the issue. Note however that > strangely enough removing the call to nvmx_update_apicv (as my first > attempt to solve the issue) did fix it on one of my boxes. It depends > a lot on the available vmx features AFAICT. Did you confirm that 3/3 alone can fix that issue? Just want to be sure of the real gain of each patch, so we can reflect it in the commit msg in the updated version.
Re: [Xen-devel] [PATCH 2/2] xen/mm: Introduce PGC_state_uninitialised
Hi David, On 23/03/2020 10:55, David Woodhouse wrote: On Mon, 2020-03-23 at 09:34 +, Julien Grall wrote: For liveupdate, we will need a way to initialize a page but mark it as already inuse (i.e in the same state as they would be if allocated normally). I am unconvinced of the veracity of this claim. We don't want to turn specific details of the current Xen buddy allocator into part of the implicit ABI of live update. That goes for the power-of-two zone boundaries, amongst other things. Why would you do that? Marking the page as already used is no different to "PGC_state_uninitialised" except that the "struct page_info" and the internals of the buddy allocator would be properly set up from the start rather than at free time. What if Xen receives LU state in which *all* pages in a given zone are marked as already in use? That's one of the cases in which we *really* want to pass through init_heap_pages() instead of just free_heap_pages(), in order to allocate the zone data structures for the first pages that get freed into that zone. What if Xen starts to exclude more pages, like the exclusion at zero? What if new Xen wants to exclude an additional page due to a hardware erratum? It can't take it away from existing domains (especially if there are assigned PCI devices) but it could be part of the vetting in init_heap_pages(), for example. I don't think it would be safe to continue to run a guest using pages that were excluded for an HW erratum. It would be safer to not restart the domain (or replace the page) in the target Xen if that's happening. My intent for PGC_state_uninitialised was to mark pages that haven't been through init_heap_pages(), whatever init_heap_pages() does in the current version of Xen. The pages which are "already in use" because they're inherited through LU state should be in PGC_state_uninitialised, shouldn't they? I think using "PGC_state_uninitialised" for preserved pages is an abuse. 
I understand this already exists in other parts of Xen (particularly on x86), but I would rather not add more. The PGC_state_uninitialised approach may work for the current allocator because most of the fields are not going to be used after allocation. But it may not hold for any new allocator (I know the embedded folks are working on a new one). Perhaps if there's a need for a helper, it could be a companion function to init_heap_pages() which would return a boolean saying, "nah, I didn't want to do anything to this page anyway", which could short-circuit it into the PGC_state_inuse state. But I'm not sure I see the need for such an optimisation. I don't view it as an optimisation but as a way to avoid spreading the current misbehavior. Cheers, -- Julien Grall
Re: [Xen-devel] [PATCH 3/3] x86/nvmx: update exit bitmap on vmexit
On Tue, Mar 24, 2020 at 06:22:43AM +, Tian, Kevin wrote: > > From: Roger Pau Monne > > Sent: Saturday, March 21, 2020 3:08 AM > > > > Current code in nvmx_update_apicv set the guest interrupt status field > > but doesn't update the exit bitmap, which can cause issues of lost > > interrupts on the L1 hypervisor if vmx_intr_assist gets > > short-circuited by nvmx_intr_intercept returning true. > > Above is not accurate. Currently Xen didn't choose to update the EOI > exit bitmap every time when there is a change. Instead, it chose to > batch the update before resuming to the guest. sort of optimization. > So it is not related to whether SVI is changed. We should always do the > bitmap update in nvmx_update_apicv, regardless of the setting of > Ack-on-exit ... But if Ack on exit is not set the GUEST_INTR_STATUS won't be changed by nvmx_intr_intercept, and hence there's no need to update the EOI exit map? If OTOH the GUEST_INTR_STATUS field is changed by vmx_intr_assist the bitmap will already be updated there. Thanks, Roger.
Re: [Xen-devel] [PATCH 2/3] x86/nvmx: clarify and fix usage of nvmx_update_apicv
On Tue, Mar 24, 2020 at 06:03:28AM +, Tian, Kevin wrote:
> > From: Roger Pau Monne
> > Sent: Saturday, March 21, 2020 3:08 AM
> >
> > The current usage of nvmx_update_apicv is not clear: it is deeply
> > intertwined with the Ack interrupt on exit VMEXIT control.
> >
> > The code in nvmx_update_apicv should update the SVI (in service
> > interrupt) field of the guest interrupt status only when Ack
> > interrupt on exit is set, as it is used to record that the interrupt
> > being serviced is signaled in a vmcs field, and hence hasn't been
> > injected as on native. It's important to record the current in
> > service interrupt on the guest interrupt status field, or else
> > further interrupts won't respect the priority of the in service one.
> >
> > While clarifying the usage, make sure that the SVI is only updated
> > when Ack on exit is set and the nested vmcs interrupt info field is
> > valid. Otherwise a guest not using the Ack on exit feature would
> > lose interrupts, as they would be signaled as being in service on
> > the guest interrupt status field but would not actually be recorded
> > on the interrupt info vmcs field, nor injected in any other way.
>
> It is insufficient. You also need to update RVI to enable virtual
> injection when Ack on exit is cleared.

But RVI should be updated in vmx_intr_assist in that case, since
nvmx_intr_intercept shouldn't intercept the interrupt, as it should be
handled normally.
> > Signed-off-by: Roger Pau Monné
> > ---
> >  xen/arch/x86/hvm/vmx/vvmx.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> > index 1b8461ba30..180d01e385 100644
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -1383,7 +1383,7 @@ static void nvmx_update_apicv(struct vcpu *v)
> >  {
> >      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> >      unsigned long reason = get_vvmcs(v, VM_EXIT_REASON);
> > -    uint32_t intr_info = nvmx->intr.intr_info;
> > +    unsigned long intr_info = get_vvmcs(v, VM_EXIT_INTR_INFO);
>
> Well, the above reminds me of an interesting question. Combining the
> last patch and this one, we'd see essentially that it goes back to the
> state before f96e1469 (at least when Ack on exit is true).

Well, patch 1/3 is a revert of f96e1469, so just reverting f96e1469
gets us to that state. This patch is an attempt to clarify that
nvmx_update_apicv is closely related to the Ack on exit feature, as it
modifies SVI in order to signal the vector currently being serviced by
the EXIT_INTR_INFO vmcs field. This was not obvious to me, as at first
sight I assumed nvmx_update_apicv was actually injecting that vector
into the guest.

> iirc, that commit was introduced to enable nested x2apic with apicv,
> and your very first version even just removed the whole
> nvmx_update_apicv. Then now with the new reverted logic, are you still
> suffering the x2apic problem? If not, does it imply the real fix
> actually comes from patch 3/3 for the eoi bitmap update?

Yes, patch 3/3 is the one that fixed the issue. Note however that,
strangely enough, removing the call to nvmx_update_apicv (as my first
attempt to solve the issue) did fix it on one of my boxes. It depends a
lot on the available vmx features AFAICT.
> >      if ( reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
> >           nvmx->intr.source == hvm_intsrc_lapic &&
> > @@ -1399,6 +1399,15 @@ static void nvmx_update_apicv(struct vcpu *v)
> >      ppr = vlapic_set_ppr(vlapic);
> >      WARN_ON((ppr & 0xf0) != (vector & 0xf0));
> >
> > +    /*
> > +     * SVI must be updated when the interrupt has been signaled using the
> > +     * Ack on exit feature, or else the currently in-service interrupt
> > +     * won't be respected.
> > +     *
> > +     * Note that this is specific to the fact that when doing a VMEXIT an
> > +     * interrupt might get delivered using the interrupt info vmcs field
> > +     * instead of being injected normally.
> > +     */
>
> I'm not sure mentioning SVI specifically here is necessary, since all
> steps here are required: updating iir, isr, rvi, svi, ppr, etc. It is
> just an emulation of updating virtual APIC state as if a virtual
> interrupt delivery happens.

Hm, it was hard for me to figure out that SVI is modified here in order
to signal that the Ack on exit vector is currently in service, as
opposed to being injected using the virtual interrupt delivery
mechanism. I just wanted to clarify that the purpose of this function
is not to inject the vector in intr_info, but rather to signal that
such a vector has already been injected using a different mechanism.

I'm happy to reword this, but IMO it should be clear that the purpose
of the function is not so much to inject an interrupt as to update the
virtual interrupt delivery fields in order to signal that an interrupt
has been injected using a different mechanism.

Thanks, Roger.
[Xen-devel] Ping: [PATCH] x86/HVM: fix AMD ECS handling for Fam 10
On 16.03.2020 14:41, Andrew Cooper wrote:
> On 16/03/2020 11:00, Jan Beulich wrote:
>> The involved comparison was, very likely inadvertently, converted
>> from >= to > when making changes unrelated to the actual family
>> range.
>>
>> Fixes: 9841eb71ea87 ("x86/cpuid: Drop a guests cached x86 family and model information")
>> Signed-off-by: Jan Beulich
>
> Reviewed-by: Andrew Cooper

Paul?

>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -1284,7 +1284,7 @@ struct hvm_ioreq_server *hvm_select_iore
>>      if ( CF8_ADDR_HI(cf8) &&
>>           d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
>>           (x86_fam = get_cpu_family(
>> -             d->arch.cpuid->basic.raw_fms, NULL, NULL)) > 0x10 &&
>> +             d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 &&
>>           x86_fam < 0x17 )
>>      {
>>          uint64_t msr_val;
Re: [Xen-devel] [PATCH v4 0/3] make sure PGC_extra pages are dealt with properly
On 18.03.2020 18:32, Paul Durrant wrote:
> This series was formerly called "remove one more shared xenheap page:
> shared_info" but I have dropped the patches actually changing
> shared_info and just left the PGC_extra clean-up that was previously
> intertwined.
>
> Paul Durrant (3):
>   mm: keep PGC_extra pages on a separate list
>   x86 / ioreq: use a MEMF_no_refcount allocation for server pages...
>   mm: add 'is_special_page' inline function...

So I'm confused - I had just replied twice to v6 patch 5/5. This series
calls itself v4 and consists of the middle three patches of what v6
was. What's the deal? Is this really v7, with the two other patches
dropped / postponed?

Jan
Re: [Xen-devel] [PATCH v6 5/5] domain: use PGC_extra domheap page for shared_info
On 24.03.2020 10:26, Jan Beulich wrote: > On 10.03.2020 18:49, p...@xen.org wrote: >> From: Paul Durrant >> >> Currently shared_info is a shared xenheap page but shared xenheap pages >> complicate future plans for live-update of Xen so it is desirable to, >> where possible, not use them [1]. This patch therefore converts shared_info >> into a PGC_extra domheap page. This does entail freeing shared_info during >> domain_relinquish_resources() rather than domain_destroy() so care is >> needed to avoid de-referencing a NULL shared_info pointer hence some >> extra checks of 'is_dying' are needed. >> >> NOTE: For Arm, the call to free_shared_info() in arch_domain_destroy() is >> left in place since it is idempotent and called in the error path for >> arch_domain_create(). >> >> [1] See >> https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg02018.html >> >> Signed-off-by: Paul Durrant >> --- >> Cc: Stefano Stabellini >> Cc: Julien Grall >> Cc: Volodymyr Babchuk >> Cc: Andrew Cooper >> Cc: George Dunlap >> Cc: Ian Jackson >> Cc: Jan Beulich >> Cc: Konrad Rzeszutek Wilk >> Cc: Wei Liu >> >> v6: >> - Drop dump_shared_info() but tag the shared info in the 'ExtraPage' >>dump >> >> v5: >> - Incorporate new dump_shared_info() function >> >> v2: >> - Addressed comments from Julien >> - Expanded the commit comment to explain why this patch is wanted >> --- >> xen/arch/arm/domain.c | 2 ++ > > Julien, Stefano? (I'd prefer to commit the entire series in one go, > rather than leaving out just this last patch.) Actually - never mind, I've just realized that there are still some pending items on the last two patches of this series. Jan
Re: [Xen-devel] [PATCH v6 5/5] domain: use PGC_extra domheap page for shared_info
On 10.03.2020 18:49, p...@xen.org wrote: > From: Paul Durrant > > Currently shared_info is a shared xenheap page but shared xenheap pages > complicate future plans for live-update of Xen so it is desirable to, > where possible, not use them [1]. This patch therefore converts shared_info > into a PGC_extra domheap page. This does entail freeing shared_info during > domain_relinquish_resources() rather than domain_destroy() so care is > needed to avoid de-referencing a NULL shared_info pointer hence some > extra checks of 'is_dying' are needed. > > NOTE: For Arm, the call to free_shared_info() in arch_domain_destroy() is > left in place since it is idempotent and called in the error path for > arch_domain_create(). > > [1] See > https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg02018.html > > Signed-off-by: Paul Durrant > --- > Cc: Stefano Stabellini > Cc: Julien Grall > Cc: Volodymyr Babchuk > Cc: Andrew Cooper > Cc: George Dunlap > Cc: Ian Jackson > Cc: Jan Beulich > Cc: Konrad Rzeszutek Wilk > Cc: Wei Liu > > v6: > - Drop dump_shared_info() but tag the shared info in the 'ExtraPage' >dump > > v5: > - Incorporate new dump_shared_info() function > > v2: > - Addressed comments from Julien > - Expanded the commit comment to explain why this patch is wanted > --- > xen/arch/arm/domain.c | 2 ++ Julien, Stefano? (I'd prefer to commit the entire series in one go, rather than leaving out just this last patch.) Jan
Re: [Xen-devel] [PATCH] SVM: Add union intstat_t for offset 68h in vmcb struct
On 24.03.2020 05:52, Pu Wen wrote: > --- a/xen/arch/x86/hvm/svm/nestedsvm.c > +++ b/xen/arch/x86/hvm/svm/nestedsvm.c > @@ -508,7 +508,7 @@ static int nsvm_vmcb_prepare4vmrun(struct vcpu *v, struct > cpu_user_regs *regs) > } > > /* Shadow Mode */ > -n2vmcb->interrupt_shadow = ns_vmcb->interrupt_shadow; > +n2vmcb->int_stat.intr_shadow = ns_vmcb->int_stat.intr_shadow; While bit 1 is irrelevant to VMRUN, I still wonder whether you shouldn't copy "raw" here. > @@ -1058,7 +1058,7 @@ nsvm_vmcb_prepare4vmexit(struct vcpu *v, struct > cpu_user_regs *regs) > ns_vmcb->_vintr.fields.intr_masking = 0; > > /* Shadow mode */ > -ns_vmcb->interrupt_shadow = n2vmcb->interrupt_shadow; > +ns_vmcb->int_stat.intr_shadow = n2vmcb->int_stat.intr_shadow; Same here, or at the very least you want to also copy bit 1 here. > --- a/xen/arch/x86/hvm/svm/svmdebug.c > +++ b/xen/arch/x86/hvm/svm/svmdebug.c > @@ -51,9 +51,9 @@ void svm_vmcb_dump(const char *from, const struct > vmcb_struct *vmcb) > printk("iopm_base_pa = %#"PRIx64" msrpm_base_pa = %#"PRIx64" tsc_offset > = %#"PRIx64"\n", > vmcb_get_iopm_base_pa(vmcb), vmcb_get_msrpm_base_pa(vmcb), > vmcb_get_tsc_offset(vmcb)); > -printk("tlb_control = %#x vintr = %#"PRIx64" interrupt_shadow = > %#"PRIx64"\n", > +printk("tlb_control = %#x vintr = %#"PRIx64" interrupt_shadow = %#x\n", > vmcb->tlb_control, vmcb_get_vintr(vmcb).bytes, > - vmcb->interrupt_shadow); > + vmcb->int_stat.intr_shadow); Please dump all 64 bits here. Jan
Re: [Xen-devel] [PATCH] x86/mce: Correct the machine check vendor for Hygon
On 24.03.2020 05:51, Pu Wen wrote: > Currently the xl dmesg output on Hygon platforms will be > "(XEN) CPU0: AMD Fam18h machine check reporting enabled", > which is misleading as AMD does not have family 18h (Hygon > negotiated with AMD to confirm that only Hygon has family 18h). > > To correct this, add Hygon machine check type and vendor string. > > Signed-off-by: Pu Wen Reviewed-by: Jan Beulich
Re: [Xen-devel] [PATCH 1/3] Revert "x86/vvmx: fix virtual interrupt injection when Ack on exit control is used"
> From: Jan Beulich > Sent: Tuesday, March 24, 2020 4:10 PM > > On 24.03.2020 06:41, Tian, Kevin wrote: > >> From: Roger Pau Monné > >> Sent: Monday, March 23, 2020 10:49 PM > >> > >> On Mon, Mar 23, 2020 at 09:09:59AM +0100, Jan Beulich wrote: > >>> On 20.03.2020 20:07, Roger Pau Monne wrote: > This reverts commit f96e1469ad06b61796c60193daaeb9f8a96d7458. > > The commit is wrong, as the whole point of nvmx_update_apicv is to > update the guest interrupt status field when the Ack on exit VMEXIT > control feature is enabled. > > Signed-off-by: Roger Pau Monné > >>> > >>> Before anyone gets to look at the other two patches, should this > >>> be thrown in right away? > >> > >> I would like if possible get a confirmation from Kevin (or anyone > >> else) that my understanding is correct. I find the nested code very > >> confusing, and I've already made a mistake while trying to fix it. > >> That being said, this was spotted by osstest as introducing a > >> regression, so I guess it's safe to just toss it in now. > >> > >> FWIW patch 2/3 attempts to provide a description of my understanding > >> of how nvmx_update_apicv works. > >> > > > > I feel it is not good to take this patch alone, as it was introduced to fix > > another problem. W/o understanding whether the whole series can > > fix both old and new problems, we may risk putting nested interrupt > > logic in an even worse state... > > Well, okay, I'll wait then, but it would seem to me that reverting > wouldn't put us in a worse state than we were in before that change > was put in. Roger needs to make the call, i.e. which problem is more severe, old or new one. Thanks Kevin
[Xen-devel] [linux-5.4 test] 148903: regressions - trouble: fail/pass/starved
flight 148903 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/148903/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 146121

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-rtds      18 guest-localmigrate/x10     fail in 148814 pass in 148903
 test-armhf-armhf-xl-rtds      16 guest-start/debian.repeat  fail pass in 148814

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-dom0pvh-xl-intel 15 guest-saverestore      fail baseline untested
 test-amd64-i386-xl-pvshim     12 guest-start                fail never pass
 test-arm64-arm64-xl-seattle   13 migrate-support-check      fail never pass
 test-arm64-arm64-xl-seattle   14 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt      13 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-xsm  13 migrate-support-check      fail never pass
 test-amd64-i386-libvirt       13 migrate-support-check      fail never pass
 test-amd64-i386-libvirt-xsm   13 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
 test-arm64-arm64-xl-xsm       13 migrate-support-check      fail never pass
 test-arm64-arm64-xl-xsm       14 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2   13 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2   14 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm  13 migrate-support-check      fail never pass
 test-arm64-arm64-xl           13 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm  14 saverestore-support-check  fail never pass
 test-arm64-arm64-xl           14 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1   13 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit1   14 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale   13 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale   14 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd  12 migrate-support-check      fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop          fail never pass
 test-arm64-arm64-xl-thunderx  13 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx  14 saverestore-support-check  fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop           fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop          fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop          fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds      13 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds      14 saverestore-support-check  fail never pass
 test-armhf-armhf-xl           13 migrate-support-check      fail never pass
 test-armhf-armhf-xl           14 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1   13 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1   14 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt      13 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt      14 saverestore-support-check  fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop           fail never pass
 test-armhf-armhf-libvirt-raw  12 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt-raw  13 saverestore-support-check  fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop           fail never pass
 test-armhf-armhf-xl-vhd       12 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd       13 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2   13 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2   14 saverestore-support-check  fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop           fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop          fail never pass
 test-amd64-amd64-dom0pvh-xl-amd 2 hosts-allocate            starved n/a

version targeted for testing:
 linux    585e0cc080690239f0689973c119459ff69db473
baseline version:
 linux    122179cb7d648a6f36b20dd6bf34f953cb384c30
Re: [Xen-devel] [PATCH 1/3] Revert "x86/vvmx: fix virtual interrupt injection when Ack on exit control is used"
On 24.03.2020 06:41, Tian, Kevin wrote: >> From: Roger Pau Monné >> Sent: Monday, March 23, 2020 10:49 PM >> >> On Mon, Mar 23, 2020 at 09:09:59AM +0100, Jan Beulich wrote: >>> On 20.03.2020 20:07, Roger Pau Monne wrote: This reverts commit f96e1469ad06b61796c60193daaeb9f8a96d7458. The commit is wrong, as the whole point of nvmx_update_apicv is to update the guest interrupt status field when the Ack on exit VMEXIT control feature is enabled. Signed-off-by: Roger Pau Monné >>> >>> Before anyone gets to look at the other two patches, should this >>> be thrown in right away? >> >> I would like if possible get a confirmation from Kevin (or anyone >> else) that my understanding is correct. I find the nested code very >> confusing, and I've already made a mistake while trying to fix it. >> That being said, this was spotted by osstest as introducing a >> regression, so I guess it's safe to just toss it in now. >> >> FWIW patch 2/3 attempts to provide a description of my understanding >> of how nvmx_update_apicv works. >> > > I feel it is not good to take this patch alone, as it was introduced to fix > another problem. W/o understanding whether the whole series can > fix both old and new problems, we may risk putting nested interrupt > logic in an even worse state... Well, okay, I'll wait then, but it would seem to me that reverting wouldn't put us in a worse state than we were in before that change was put in. Jan