[qemu-mainline test] 180637: tolerable FAIL - PUSHED
flight 180637 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180637/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                 fail like 180610
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                 fail like 180610
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check          fail like 180610
 test-armhf-armhf-libvirt     16 saverestore-support-check          fail like 180610
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check        fail like 180610
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop                  fail like 180610
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2      fail like 180610
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop                  fail like 180610
 test-amd64-i386-xl-pvshim    14 guest-start                        fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check              fail never pass
 test-amd64-amd64-libvirt     15 migrate-support-check              fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-amd64-i386-libvirt      15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl          15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl          16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check          fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check              fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check          fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check             fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check         fail never pass
 test-armhf-armhf-xl          15 migrate-support-check              fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check          fail never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check              fail never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check          fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check              fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check              fail never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check          fail never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check              fail never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check          fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check              fail never pass
 test-armhf-armhf-libvirt     15 migrate-support-check              fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check              fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check          fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check            fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check              fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check          fail never pass

version targeted for testing:
 qemuu                278238505d28d292927bff7683f39fb4fbca7fd1
baseline version:
 qemuu                d530697ca20e19f7a626f4c1c8b26fccd0dc4470

Last test of basis   180610  2023-05-10 23:38:37 Z    2 days
Testing same since   180621  2023-05-11 14:27:55 Z    1 days    3 attempts

People who touched revisions under test:
  Dr. David Alan Gilbert
  Jamie Iles
  Juan Quintela
  Laurent Vivier
  Lukas Straub
  Peter Maydell
  Richard Henderson
  Thomas Huth
  Vladimir Sementsov-Ogievskiy

jobs:
 build-amd64-xsm                                              pass
 build-arm64-xsm                                              pass
 build-i386-xsm
[PATCH 2/4] automation: enable earlyprintk=xen for both dom0 and domU in hw tests
Make debugging early boot failures easier based on just CI logs.

Signed-off-by: Marek Marczykowski-Górecki
---
 automation/scripts/qubes-x86-64.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index ae766395d184..bd09451d7d28 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -80,7 +80,7 @@ type = "'${test_variant#pci-}'"
 name = "domU"
 kernel = "/boot/vmlinuz"
 ramdisk = "/boot/initrd-domU"
-extra = "root=/dev/ram0 console=hvc0"
+extra = "root=/dev/ram0 console=hvc0 earlyprintk=xen"
 memory = 512
 vif = [ ]
 disk = [ ]
@@ -186,7 +186,7 @@ CONTROLLER=control@thor.testnet
 echo "
 multiboot2 (http)/gitlab-ci/xen $CONSOLE_OPTS loglvl=all guest_loglvl=all dom0_mem=4G
-module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
+module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0 earlyprintk=xen
 module2 (http)/gitlab-ci/initrd-dom0
 " > $TFTP/grub.cfg
-- 
git-series 0.9.1
[PATCH 3/4] automation: add x86_64 tests on an AMD Zen3+ runner
This adds another physical runner to Gitlab-CI, running a similar set of
jobs to the Alder Lake one. The machine specifically is a MinisForum
UM773 Lite with an AMD Ryzen 7 7735HS.

The PV passthrough test is skipped, as it currently fails on this system
with:

    (d1) Can't find new memory area for initrd needed due to E820 map conflict

The S3 test is skipped as it currently fails too - the system seems to
suspend properly (the power LED blinks), but when woken up the power LED
goes back to solid on and the fan spins at top speed, but otherwise there
are no signs of life from the system (no output on the console, HDMI or
anything else).

Signed-off-by: Marek Marczykowski-Górecki
---
 automation/gitlab-ci/test.yaml | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index cb7fd5c272e9..81d027532cca 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -108,6 +108,16 @@
   tags:
     - qubes-hw2
 
+.zen3p-x86-64:
+  # it's really similar to the above
+  extends: .adl-x86-64
+  variables:
+    PCIDEV: "01:00.0"
+    PCIDEV_INTR: "MSI-X"
+    CONSOLE_OPTS: "console=com1 com1=115200,8n1,pci,msi"
+  tags:
+    - qubes-hw11
+
 # Test jobs
 build-each-commit-gcc:
   extends: .test-jobs-common
@@ -176,6 +186,22 @@ adl-pci-hvm-x86-64-gcc-debug:
     - *x86-64-test-needs
     - alpine-3.12-gcc-debug
 
+zen3p-smoke-x86-64-gcc-debug:
+  extends: .zen3p-x86-64
+  script:
+    - ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
+  needs:
+    - *x86-64-test-needs
+    - alpine-3.12-gcc-debug
+
+zen3p-pci-hvm-x86-64-gcc-debug:
+  extends: .zen3p-x86-64
+  script:
+    - ./automation/scripts/qubes-x86-64.sh pci-hvm 2>&1 | tee ${LOGFILE}
+  needs:
+    - *x86-64-test-needs
+    - alpine-3.12-gcc-debug
+
 qemu-smoke-dom0-arm64-gcc:
   extends: .qemu-arm64
   script:
-- 
git-series 0.9.1
[PATCH 0/4] automation: add AMD hw runner
This series adds another hardware runner to the CI. See individual patch
descriptions for details.

Marek Marczykowski-Górecki (4):
  automation: make console options configurable via variables
  automation: enable earlyprintk=xen for both dom0 and domU in hw tests
  automation: add x86_64 tests on an AMD Zen3+ runner
  automation: add PV passthrough tests on an AMD Zen3+ runner

 automation/gitlab-ci/test.yaml     | 36 ++++++++++++++++++++++++++++++++++++
 automation/scripts/qubes-x86-64.sh | 10 +++++-----
 2 files changed, 41 insertions(+), 5 deletions(-)

base-commit: 31c65549746179e16cf3f82b694b4b1e0b7545ca
-- 
git-series 0.9.1
[PATCH 1/4] automation: make console options configurable via variables
This makes the test script easier to reuse on different runners, where
the console may be connected differently. Include both the console=
option and the configuration for the specific chosen console (like com1=
here) in the 'CONSOLE_OPTS' variable.

Signed-off-by: Marek Marczykowski-Górecki
---
This will conflict with Stefano's patch, as both modify the multiboot2
line, but it shouldn't be too hard to resolve the conflict manually (both
replace console opts with a variable, and add extra opts at the end).
---
 automation/gitlab-ci/test.yaml     | 1 +
 automation/scripts/qubes-x86-64.sh | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 55ca0c27dc49..cb7fd5c272e9 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -96,6 +96,7 @@
     LOGFILE: smoke-test.log
     PCIDEV: "03:00.0"
     PCIDEV_INTR: "MSI-X"
+    CONSOLE_OPTS: "console=com1 com1=115200,8n1"
   artifacts:
     paths:
       - smoke.serial
diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index 056faf9e6de8..ae766395d184 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -184,11 +184,11 @@ cd ..
 TFTP=/scratch/gitlab-runner/tftp
 CONTROLLER=control@thor.testnet
 
-echo '
-multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G
+echo "
+multiboot2 (http)/gitlab-ci/xen $CONSOLE_OPTS loglvl=all guest_loglvl=all dom0_mem=4G
 module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
 module2 (http)/gitlab-ci/initrd-dom0
-' > $TFTP/grub.cfg
+" > $TFTP/grub.cfg
 
 cp -f binaries/xen $TFTP/xen
 cp -f binaries/bzImage $TFTP/vmlinuz
-- 
git-series 0.9.1
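[Editor's note: the functional core of this patch is the switch from `echo '...'` to `echo "..."`, because only double quotes let the shell expand `$CONSOLE_OPTS` into the generated grub.cfg. A minimal sketch of that quoting behavior, with the variable value taken from the patch and `literal`/`expanded` as illustrative names:]

```shell
CONSOLE_OPTS="console=com1 com1=115200,8n1"

# Single quotes: $CONSOLE_OPTS stays literal -- what the pre-patch script wrote.
literal='multiboot2 (http)/gitlab-ci/xen $CONSOLE_OPTS loglvl=all'

# Double quotes: the shell substitutes the value, as the patched script needs.
expanded="multiboot2 (http)/gitlab-ci/xen $CONSOLE_OPTS loglvl=all"

echo "$literal"
echo "$expanded"
```

The same rule applies to the heredoc-style `echo "..." > $TFTP/grub.cfg` in the patch: changing the outer quotes is what makes the per-runner variable take effect.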
[PATCH 4/4] automation: add PV passthrough tests on an AMD Zen3+ runner
The PV passthrough test currently fails on this system with:

    (d1) Can't find new memory area for initrd needed due to E820 map conflict

Setting e820_host=1 does not help. So, add this test with
"allow_failure: true" until the problem is fixed.

Signed-off-by: Marek Marczykowski-Górecki
---
I'm unsure if this should be included. On one hand, the test case will
help verify a potential fix. But on the other hand, until the problem is
fixed, the job will just waste CI time.
---
 automation/gitlab-ci/test.yaml | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 81d027532cca..7becb7a6b782 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -194,6 +194,15 @@ zen3p-smoke-x86-64-gcc-debug:
     - *x86-64-test-needs
     - alpine-3.12-gcc-debug
 
+zen3p-pci-pv-x86-64-gcc-debug:
+  extends: .zen3p-x86-64
+  allow_failure: true
+  script:
+    - ./automation/scripts/qubes-x86-64.sh pci-pv 2>&1 | tee ${LOGFILE}
+  needs:
+    - *x86-64-test-needs
+    - alpine-3.12-gcc-debug
+
 zen3p-pci-hvm-x86-64-gcc-debug:
   extends: .zen3p-x86-64
   script:
-- 
git-series 0.9.1
Re: [PATCH] automation: add a Dom0 PVH test based on Qubes' runner
On Fri, May 12, 2023 at 06:28:33PM -0700, Stefano Stabellini wrote:
> From: Stefano Stabellini
> 
> Straightforward Dom0 PVH test based on the existing basic Smoke test for
> the Qubes runner.
> 
> Signed-off-by: Stefano Stabellini
> ---
>  automation/gitlab-ci/test.yaml     |  8 ++++++++
>  automation/scripts/qubes-x86-64.sh | 14 +++++++++-----
>  2 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> index 55ca0c27dc..9c0e08d456 100644
> --- a/automation/gitlab-ci/test.yaml
> +++ b/automation/gitlab-ci/test.yaml
> @@ -149,6 +149,14 @@ adl-smoke-x86-64-gcc-debug:
>      - *x86-64-test-needs
>      - alpine-3.12-gcc-debug
>  
> +adl-smoke-x86-64-dom0pvh-gcc-debug:
> +  extends: .adl-x86-64
> +  script:
> +    - ./automation/scripts/qubes-x86-64.sh dom0pvh 2>&1 | tee ${LOGFILE}
> +  needs:
> +    - *x86-64-test-needs
> +    - alpine-3.12-gcc-debug
> +
>  adl-suspend-x86-64-gcc-debug:
>    extends: .adl-x86-64
>    script:
> diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
> index 056faf9e6d..35b9386e5d 100755
> --- a/automation/scripts/qubes-x86-64.sh
> +++ b/automation/scripts/qubes-x86-64.sh
> @@ -5,6 +5,7 @@ set -ex
>  test_variant=$1
>  
>  ### defaults
> +dom0pvh=

Maybe better to name it extra_xen_opts=? I can see it being useful in
other tests in the future too.

>  wait_and_wakeup=
>  timeout=120
>  domU_config='
> @@ -18,8 +19,8 @@ vif = [ "bridge=xenbr0", ]
>  disk = [ ]
>  '
>  
> -### test: smoke test
> -if [ -z "${test_variant}" ]; then
> +### test: smoke test & smoke test PVH
> +if [ -z "${test_variant}" ] || [ "${test_variant}" = "dom0pvh" ]; then
>      passed="ping test passed"
>      domU_check="
>  ifconfig eth0 192.168.0.2
> @@ -36,6 +37,9 @@ done
>  tail -n 100 /var/log/xen/console/guest-domU.log
>  echo \"${passed}\"
>  "
> +    if [ "${test_variant}" = "dom0pvh" ]; then
> +        dom0pvh="dom0=pvh"
> +    fi
>  
>  ### test: S3
>  elif [ "${test_variant}" = "s3" ]; then
> @@ -184,11 +188,11 @@ cd ..
>  TFTP=/scratch/gitlab-runner/tftp
>  CONTROLLER=control@thor.testnet
>  
> -echo '
> -multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G
> +echo "
> +multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G $dom0pvh
>  module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
>  module2 (http)/gitlab-ci/initrd-dom0
> -' > $TFTP/grub.cfg
> +" > $TFTP/grub.cfg
>  
>  cp -f binaries/xen $TFTP/xen
>  cp -f binaries/bzImage $TFTP/vmlinuz
> -- 
> 2.25.1

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
[linux-linus test] 180633: regressions - FAIL
flight 180633 linux-linus real [real]
flight 180640 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/180633/
http://logs.test-lab.xenproject.org/osstest/logs/180640/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could
not be run:
 test-armhf-armhf-xl-credit1   8 xen-boot                fail REGR. vs. 180278

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt      8 xen-boot                     fail like 180278
 test-armhf-armhf-xl-multivcpu 8 xen-boot                     fail like 180278
 test-armhf-armhf-libvirt-raw  8 xen-boot                     fail like 180278
 test-armhf-armhf-xl-rtds      8 xen-boot                     fail like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop           fail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop           fail like 180278
 test-armhf-armhf-xl           8 xen-boot                     fail like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop           fail like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-armhf-armhf-examine      8 reboot                       fail like 180278
 test-armhf-armhf-libvirt-qcow2 8 xen-boot                    fail like 180278
 test-armhf-armhf-xl-credit2   8 xen-boot                     fail like 180278
 test-armhf-armhf-xl-vhd       8 xen-boot                     fail like 180278
 test-armhf-armhf-xl-arndale   8 xen-boot                     fail like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop           fail like 180278
 test-amd64-amd64-libvirt     15 migrate-support-check        fail never pass
 test-arm64-arm64-xl          15 migrate-support-check        fail never pass
 test-arm64-arm64-xl          16 saverestore-support-check    fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check        fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check        fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check    fail never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check        fail never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check    fail never pass

version targeted for testing:
 linux                cc3c44c9fda264c6d401be04e95449a57c1231c6
baseline version:
 linux                6c538e1adbfc696ac4747fb10d63e704344f763d

Last test of basis   180278  2023-04-16 19:41:46 Z   26 days
Failing since        180281  2023-04-17 06:24:36 Z   25 days   47 attempts
Testing same since   180625  2023-05-11 22:11:49 Z    1 days    2 attempts

2380 people touched revisions under test, not listing them all

jobs:
 build-amd64-xsm                                              pass
 build-arm64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-arm64                                                  pass
 build-armhf                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-arm64-libvirt                                          pass
 build-armhf-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-arm64-pvops                                            pass
 build-armhf-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl                                          pass
[PATCH] automation: add a Dom0 PVH test based on Qubes' runner
From: Stefano Stabellini

Straightforward Dom0 PVH test based on the existing basic Smoke test for
the Qubes runner.

Signed-off-by: Stefano Stabellini
---
 automation/gitlab-ci/test.yaml     |  8 ++++++++
 automation/scripts/qubes-x86-64.sh | 14 +++++++++-----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 55ca0c27dc..9c0e08d456 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -149,6 +149,14 @@ adl-smoke-x86-64-gcc-debug:
     - *x86-64-test-needs
     - alpine-3.12-gcc-debug
 
+adl-smoke-x86-64-dom0pvh-gcc-debug:
+  extends: .adl-x86-64
+  script:
+    - ./automation/scripts/qubes-x86-64.sh dom0pvh 2>&1 | tee ${LOGFILE}
+  needs:
+    - *x86-64-test-needs
+    - alpine-3.12-gcc-debug
+
 adl-suspend-x86-64-gcc-debug:
   extends: .adl-x86-64
   script:
diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index 056faf9e6d..35b9386e5d 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -5,6 +5,7 @@ set -ex
 test_variant=$1
 
 ### defaults
+dom0pvh=
 wait_and_wakeup=
 timeout=120
 domU_config='
@@ -18,8 +19,8 @@ vif = [ "bridge=xenbr0", ]
 disk = [ ]
 '
 
-### test: smoke test
-if [ -z "${test_variant}" ]; then
+### test: smoke test & smoke test PVH
+if [ -z "${test_variant}" ] || [ "${test_variant}" = "dom0pvh" ]; then
     passed="ping test passed"
     domU_check="
 ifconfig eth0 192.168.0.2
@@ -36,6 +37,9 @@ done
 tail -n 100 /var/log/xen/console/guest-domU.log
 echo \"${passed}\"
 "
+    if [ "${test_variant}" = "dom0pvh" ]; then
+        dom0pvh="dom0=pvh"
+    fi
 
 ### test: S3
 elif [ "${test_variant}" = "s3" ]; then
@@ -184,11 +188,11 @@ cd ..
 TFTP=/scratch/gitlab-runner/tftp
 CONTROLLER=control@thor.testnet
 
-echo '
-multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G
+echo "
+multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G $dom0pvh
 module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
 module2 (http)/gitlab-ci/initrd-dom0
-' > $TFTP/grub.cfg
+" > $TFTP/grub.cfg
 
 cp -f binaries/xen $TFTP/xen
 cp -f binaries/bzImage $TFTP/vmlinuz
-- 
2.25.1
[PATCH 2/2] xen/x86/pvh: copy ACPI tables to Dom0 instead of mapping
From: Stefano Stabellini

Mapping the ACPI tables to Dom0 PVH 1:1 leads to memory corruptions of
the tables in the guest. Instead, copy the tables to Dom0. This is a
workaround.

Signed-off-by: Stefano Stabellini
---
As mentioned in the cover letter, this is an RFC workaround as I don't
know the cause of the underlying problem. I do know that this patch
solves what would otherwise be a hang at boot when Dom0 PVH attempts to
parse ACPI tables.
---
 xen/arch/x86/hvm/dom0_build.c | 107 +-
 1 file changed, 27 insertions(+), 80 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 5fde769863..a6037fc6ed 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -73,32 +73,6 @@ static void __init print_order_stats(const struct domain *d)
         printk("order %2u allocations: %u\n", i, order_stats[i]);
 }
 
-static int __init modify_identity_mmio(struct domain *d, unsigned long pfn,
-                                       unsigned long nr_pages, const bool map)
-{
-    int rc;
-
-    for ( ; ; )
-    {
-        rc = map ? map_mmio_regions(d, _gfn(pfn), nr_pages, _mfn(pfn))
-                 : unmap_mmio_regions(d, _gfn(pfn), nr_pages, _mfn(pfn));
-        if ( rc == 0 )
-            break;
-        if ( rc < 0 )
-        {
-            printk(XENLOG_WARNING
-                   "Failed to identity %smap [%#lx,%#lx) for d%d: %d\n",
-                   map ? "" : "un", pfn, pfn + nr_pages, d->domain_id, rc);
-            break;
-        }
-        nr_pages -= rc;
-        pfn += rc;
-        process_pending_softirqs();
-    }
-
-    return rc;
-}
-
 /* Populate a HVM memory range using the biggest possible order. */
 static int __init pvh_populate_memory_range(struct domain *d,
                                             unsigned long start,
@@ -967,6 +941,8 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
     unsigned long size = sizeof(*xsdt);
     unsigned int i, j, num_tables = 0;
     int rc;
+    struct acpi_table_fadt fadt;
+    unsigned long fadt_addr = 0, dsdt_addr = 0, facs_addr = 0, fadt_size = 0;
     struct acpi_table_header header = {
         .signature    = "XSDT",
         .length       = sizeof(struct acpi_table_header),
@@ -1013,10 +989,33 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
     /* Copy the addresses of the rest of the allowed tables. */
     for( i = 0, j = 1; i < acpi_gbl_root_table_list.count; i++ )
     {
+        void *table;
+
+        pvh_steal_ram(d, tables[i].length, 0, GB(4), addr);
+        table = acpi_os_map_memory(tables[i].address, tables[i].length);
+        hvm_copy_to_guest_phys(*addr, table, tables[i].length, d->vcpu[0]);
+        pvh_add_mem_range(d, *addr, *addr + tables[i].length, E820_ACPI);
+
+        if ( !strncmp(tables[i].signature.ascii, ACPI_SIG_FADT, ACPI_NAME_SIZE) )
+        {
+            memcpy(&fadt, table, tables[i].length);
+            fadt_addr = *addr;
+            fadt_size = tables[i].length;
+        }
+        else if ( !strncmp(tables[i].signature.ascii, ACPI_SIG_DSDT, ACPI_NAME_SIZE) )
+            dsdt_addr = *addr;
+        else if ( !strncmp(tables[i].signature.ascii, ACPI_SIG_FACS, ACPI_NAME_SIZE) )
+            facs_addr = *addr;
+
         if ( pvh_acpi_xsdt_table_allowed(tables[i].signature.ascii,
-                                         tables[i].address, tables[i].length) )
-            xsdt->table_offset_entry[j++] = tables[i].address;
+                                         tables[i].address, tables[i].length) )
+            xsdt->table_offset_entry[j++] = *addr;
+
+        acpi_os_unmap_memory(table, tables[i].length);
     }
+
+    fadt.dsdt = dsdt_addr;
+    fadt.facs = facs_addr;
+    hvm_copy_to_guest_phys(fadt_addr, &fadt, fadt_size, d->vcpu[0]);
 
     xsdt->header.revision = 1;
     xsdt->header.length = size;
@@ -1055,9 +1054,7 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
 
 static int __init pvh_setup_acpi(struct domain *d, paddr_t start_info)
 {
-    unsigned long pfn, nr_pages;
     paddr_t madt_paddr, xsdt_paddr, rsdp_paddr;
-    unsigned int i;
     int rc;
     struct acpi_table_rsdp *native_rsdp, rsdp = {
         .signature = ACPI_SIG_RSDP,
@@ -1065,56 +1062,6 @@ static int __init pvh_setup_acpi(struct domain *d, paddr_t start_info)
         .length = sizeof(rsdp),
     };
 
-    /* Scan top-level tables and add their regions to the guest memory map. */
-    for( i = 0; i < acpi_gbl_root_table_list.count; i++ )
-    {
-        const char *sig = acpi_gbl_root_table_list.tables[i].signature.ascii;
-        unsigned long addr = acpi_gbl_root_table_list.tables[i].address;
-        unsigned long size = acpi_gbl_root_table_list.tables[i].length;
-
-        /*
-         * Make sure the original MADT is also mapped, so that Dom0 can
-         * properly access the data returned by _MAT methods in case it's
-         * re-using MADT
[PATCH 1/2] xen/x86/pvh: use preset XSDT header for XSDT generation
From: Stefano Stabellini

Xen always generates an XSDT table even if the firmware provided an RSDT
table. Instead of copying the XSDT header from the firmware table (that
might be missing), generate the XSDT header from a preset.

Signed-off-by: Stefano Stabellini
---
 xen/arch/x86/hvm/dom0_build.c | 32 +++++++++-----------------------
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 307edc6a8c..5fde769863 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -963,13 +963,18 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
                                       paddr_t *addr)
 {
     struct acpi_table_xsdt *xsdt;
-    struct acpi_table_header *table;
-    struct acpi_table_rsdp *rsdp;
     const struct acpi_table_desc *tables = acpi_gbl_root_table_list.tables;
     unsigned long size = sizeof(*xsdt);
     unsigned int i, j, num_tables = 0;
-    paddr_t xsdt_paddr;
     int rc;
+    struct acpi_table_header header = {
+        .signature    = "XSDT",
+        .length       = sizeof(struct acpi_table_header),
+        .revision     = 0x1,
+        .oem_id       = "Xen",
+        .oem_table_id = "HVM",
+        .oem_revision = 0,
+    };
 
     /*
      * Restore original DMAR table signature, we are going to filter it from
@@ -1001,26 +1006,7 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
         goto out;
     }
 
-    /* Copy the native XSDT table header. */
-    rsdp = acpi_os_map_memory(acpi_os_get_root_pointer(), sizeof(*rsdp));
-    if ( !rsdp )
-    {
-        printk("Unable to map RSDP\n");
-        rc = -EINVAL;
-        goto out;
-    }
-    xsdt_paddr = rsdp->xsdt_physical_address;
-    acpi_os_unmap_memory(rsdp, sizeof(*rsdp));
-    table = acpi_os_map_memory(xsdt_paddr, sizeof(*table));
-    if ( !table )
-    {
-        printk("Unable to map XSDT\n");
-        rc = -EINVAL;
-        goto out;
-    }
-    xsdt->header = *table;
-    acpi_os_unmap_memory(table, sizeof(*table));
-
+    xsdt->header = header;
     /* Add the custom MADT. */
     xsdt->table_offset_entry[0] = madt_addr;
 
-- 
2.25.1
[PATCH 0/2] PVH Dom0 on QEMU
Hi all,

These two patches are necessary to boot Xen and Dom0 PVH on QEMU (with or
without KVM acceleration).

The first one is a genuine fix. The second one is a workaround: I don't
know the underlying root cause of the problem.

Cheers,

Stefano
PVH feature omission
From tools/libs/light/libxl_x86.c: libxl__arch_passthrough_mode_setdefault()

    "passthrough not yet supported for x86 PVH guests\n"

So PVH is recommended for most situations, but it is /still/ impossible
to pass hardware devices to PVH domains? It seems odd this has never been
addressed, given how long PVH has been around.

The other tools omission I noticed: it appears `xl` has no support for
pvSCSI? It might be infrequently used, but seems similarly valuable for
domains focused on handling storage.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
[patch V4 31/37] x86/smpboot: Enable split CPU startup
From: Thomas Gleixner

The x86 CPU bringup state currently does AP wake-up, wait for AP to
respond and then release it for full bringup.

It is safe to be split into a wake-up and a separate wait+release state.

Provide the required functions and enable the split CPU bringup, which
prepares for parallel bringup, where the bringup of the non-boot CPUs
takes two iterations: one to prepare and wake all APs and the second to
wait and release them. Depending on timing, this can eliminate the wait
time completely.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/x86/Kconfig           |    2 +-
 arch/x86/include/asm/smp.h |    9 ++-------
 arch/x86/kernel/smp.c      |    2 +-
 arch/x86/kernel/smpboot.c  |    8 ++++----
 arch/x86/xen/smp_pv.c      |    4 ++--
 5 files changed, 10 insertions(+), 15 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -274,8 +274,8 @@ config X86
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_GENERIC_VDSO
-	select HOTPLUG_CORE_SYNC_FULL		if SMP
 	select HOTPLUG_SMT			if SMP
+	select HOTPLUG_SPLIT_STARTUP		if SMP
 	select IRQ_FORCED_THREADING
 	select NEED_PER_CPU_EMBED_FIRST_CHUNK
 	select NEED_PER_CPU_PAGE_FIRST_CHUNK
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -40,7 +40,7 @@ struct smp_ops {
 	void (*cleanup_dead_cpu)(unsigned cpu);
 	void (*poll_sync_state)(void);
-	int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
+	int (*kick_ap_alive)(unsigned cpu, struct task_struct *tidle);
 	int (*cpu_disable)(void);
 	void (*cpu_die)(unsigned int cpu);
 	void (*play_dead)(void);
@@ -80,11 +80,6 @@ static inline void smp_cpus_done(unsigne
 	smp_ops.smp_cpus_done(max_cpus);
 }
 
-static inline int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-{
-	return smp_ops.cpu_up(cpu, tidle);
-}
-
 static inline int __cpu_disable(void)
 {
 	return smp_ops.cpu_disable();
@@ -124,7 +119,7 @@ void native_smp_prepare_cpus(unsigned in
 void calculate_max_logical_packages(void);
 void native_smp_cpus_done(unsigned int max_cpus);
 int common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
-int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
+int native_kick_ap(unsigned int cpu, struct task_struct *tidle);
 int native_cpu_disable(void);
 void __noreturn hlt_play_dead(void);
 void native_play_dead(void);
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -268,7 +268,7 @@ struct smp_ops smp_ops = {
 #endif
 	.smp_send_reschedule	= native_smp_send_reschedule,
 
-	.cpu_up			= native_cpu_up,
+	.kick_ap_alive		= native_kick_ap,
 	.cpu_disable		= native_cpu_disable,
 	.play_dead		= native_play_dead,
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1052,7 +1052,7 @@ static int do_boot_cpu(int apicid, int c
 	return ret;
 }
 
-static int native_kick_ap(unsigned int cpu, struct task_struct *tidle)
+int native_kick_ap(unsigned int cpu, struct task_struct *tidle)
 {
 	int apicid = apic->cpu_present_to_apicid(cpu);
 	int err;
@@ -1088,15 +1088,15 @@ static int native_kick_ap(unsigned int c
 	return err;
 }
 
-int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
+int arch_cpuhp_kick_ap_alive(unsigned int cpu, struct task_struct *tidle)
 {
-	return native_kick_ap(cpu, tidle);
+	return smp_ops.kick_ap_alive(cpu, tidle);
 }
 
 void arch_cpuhp_cleanup_kick_cpu(unsigned int cpu)
 {
 	/* Cleanup possible dangling ends... */
-	if (smp_ops.cpu_up == native_cpu_up && x86_platform.legacy.warm_reset)
+	if (smp_ops.kick_ap_alive == native_kick_ap && x86_platform.legacy.warm_reset)
 		smpboot_restore_warm_reset_vector();
 }
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -314,7 +314,7 @@ cpu_initialize_context(unsigned int cpu,
 	return 0;
 }
 
-static int xen_pv_cpu_up(unsigned int cpu, struct task_struct *idle)
+static int xen_pv_kick_ap(unsigned int cpu, struct task_struct *idle)
 {
 	int rc;
 
@@ -438,7 +438,7 @@ static const struct smp_ops xen_smp_ops
 	.smp_prepare_cpus = xen_pv_smp_prepare_cpus,
 	.smp_cpus_done = xen_smp_cpus_done,
 
-	.cpu_up = xen_pv_cpu_up,
+	.kick_ap_alive = xen_pv_kick_ap,
 	.cpu_die = xen_pv_cpu_die,
 	.cleanup_dead_cpu = xen_pv_cleanup_dead_cpu,
 	.poll_sync_state = xen_pv_poll_sync_state,
[patch V4 34/37] x86/apic: Save the APIC virtual base address
From: Thomas Gleixner

For parallel CPU bringup it's required to read the APIC ID in the low
level startup code. The virtual APIC base address is a constant because
it's a fix-mapped address. Exposing that constant, which is composed via
macros, to assembly code is non-trivial due to header inclusion hell.

Aside from that, it's constant only because of the vsyscall ABI
requirement. Once vsyscall is out of the picture, the fixmap can be
placed at runtime.

Avoid header hell, stay flexible, and store the address in a variable
which can be exposed to the low level startup code.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
V4: Fixed changelog typo - Sergey
---
 arch/x86/include/asm/smp.h  |    1 +
 arch/x86/kernel/apic/apic.c |    4 ++++
 2 files changed, 5 insertions(+)

--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -196,6 +196,7 @@ extern void nmi_selftest(void);
 #endif
 
 extern unsigned int smpboot_control;
+extern unsigned long apic_mmio_base;
 
 #endif /* !__ASSEMBLY__ */
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -101,6 +101,9 @@ static int apic_extnmi __ro_after_init =
  */
 static bool virt_ext_dest_id __ro_after_init;
 
+/* For parallel bootup. */
+unsigned long apic_mmio_base __ro_after_init;
+
 /*
  * Map cpu index to physical APIC ID
  */
@@ -2163,6 +2166,7 @@ void __init register_lapic_address(unsig
 
 	if (!x2apic_mode) {
 		set_fixmap_nocache(FIX_APIC_BASE, address);
+		apic_mmio_base = APIC_BASE;
 		apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n",
 			    APIC_BASE, address);
 	}
[patch V4 22/37] ARM: smp: Switch to hotplug core state synchronization
From: Thomas Gleixner

Switch to the CPU hotplug core state tracking and synchronization
mechanism. No functional change intended.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/arm/Kconfig           |  1 +
 arch/arm/include/asm/smp.h |  2 +-
 arch/arm/kernel/smp.c      | 18 +++---
 3 files changed, 9 insertions(+), 12 deletions(-)

--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -124,6 +124,7 @@ config ARM
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UID16
 	select HAVE_VIRT_CPU_ACCOUNTING_GEN
+	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
 	select IRQ_FORCED_THREADING
 	select MODULES_USE_ELF_REL
 	select NEED_DMA_MAP_STATE
--- a/arch/arm/include/asm/smp.h
+++ b/arch/arm/include/asm/smp.h
@@ -64,7 +64,7 @@ extern void secondary_startup_arm(void);
 
 extern int __cpu_disable(void);
-extern void __cpu_die(unsigned int cpu);
+static inline void __cpu_die(unsigned int cpu) { }
 
 extern void arch_send_call_function_single_ipi(int cpu);
 extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -288,15 +288,11 @@ int __cpu_disable(void)
 }
 
 /*
- * called on the thread which is asking for a CPU to be shutdown -
- * waits until shutdown has completed, or it is timed out.
+ * called on the thread which is asking for a CPU to be shutdown after the
+ * shutdown completed.
  */
-void __cpu_die(unsigned int cpu)
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
-	if (!cpu_wait_death(cpu, 5)) {
-		pr_err("CPU%u: cpu didn't die\n", cpu);
-		return;
-	}
 	pr_debug("CPU%u: shutdown\n", cpu);
 
 	clear_tasks_mm_cpumask(cpu);
@@ -336,11 +332,11 @@ void __noreturn arch_cpu_idle_dead(void)
 	flush_cache_louis();
 
 	/*
-	 * Tell __cpu_die() that this CPU is now safe to dispose of. Once
-	 * this returns, power and/or clocks can be removed at any point
-	 * from this CPU and its cache by platform_cpu_kill().
+	 * Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose
+	 * of. Once this returns, power and/or clocks can be removed at
+	 * any point from this CPU and its cache by platform_cpu_kill().
 	 */
-	(void)cpu_report_death();
+	cpuhp_ap_report_dead();
 
 	/*
 	 * Ensure that the cache lines associated with that completion are
[patch V4 33/37] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
From: Thomas Gleixner

There is often significant latency in the early stages of CPU bringup, and
time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and
then waiting for it to respond before moving on to the next.

Allow a platform to enable parallel setup which brings all to be onlined
CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the
control CPU (BP) is single-threaded the important part is the last state
CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up.

This allows the CPUs to run up to the first synchronization point
cpuhp_ap_sync_alive() where they wait for the control CPU to release them
one by one for the full onlining procedure.

This parallelism depends on the CPU hotplug core sync mechanism which
ensures that the parallel brought up CPUs wait for release before touching
any state which would make the CPU visible to anything outside the hotplug
control mechanism.

To handle the SMT constraints of X86 correctly the bringup happens in two
iterations when CONFIG_HOTPLUG_SMT is enabled. The control CPU brings up
the primary SMT threads of each core first, which can load the microcode
without the need to rendezvous with the thread siblings. Once that's
completed it brings up the secondary SMT threads.

Co-developed-by: David Woodhouse
Signed-off-by: David Woodhouse
Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 Documentation/admin-guide/kernel-parameters.txt |   6 +
 arch/Kconfig                                    |   4 +
 include/linux/cpuhotplug.h                      |   1 +
 kernel/cpu.c                                    | 103 ++--
 4 files changed, 109 insertions(+), 5 deletions(-)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -838,6 +838,12 @@
 			on every CPU online, such as boot, and resume from
 			suspend. Default: 1
 
+	cpuhp.parallel=
+			[SMP] Enable/disable parallel bringup of secondary CPUs
+			Format: <bool>
+			Default is enabled if CONFIG_HOTPLUG_PARALLEL=y. Otherwise
+			the parameter has no effect.
+
 	crash_kexec_post_notifiers
 			Run kdump after running panic-notifiers and dumping
 			kmsg. This only for the users who doubt kdump always
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -53,6 +53,10 @@ config HOTPLUG_SPLIT_STARTUP
 	bool
 	select HOTPLUG_CORE_SYNC_FULL
 
+config HOTPLUG_PARALLEL
+	bool
+	select HOTPLUG_SPLIT_STARTUP
+
 config GENERIC_ENTRY
 	bool
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -524,6 +524,7 @@ void cpuhp_ap_sync_alive(void);
 void arch_cpuhp_sync_state_poll(void);
 void arch_cpuhp_cleanup_kick_cpu(unsigned int cpu);
 int arch_cpuhp_kick_ap_alive(unsigned int cpu, struct task_struct *tidle);
+bool arch_cpuhp_init_parallel_bringup(void);
 
 #ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
 void cpuhp_ap_report_dead(void);
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -649,8 +649,23 @@ bool cpu_smt_possible(void)
 		cpu_smt_control != CPU_SMT_NOT_SUPPORTED;
 }
 EXPORT_SYMBOL_GPL(cpu_smt_possible);
+
+static inline bool cpuhp_smt_aware(void)
+{
+	return topology_smt_supported();
+}
+
+static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
+{
+	return cpu_primary_thread_mask;
+}
 #else
 static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
+static inline bool cpuhp_smt_aware(void) { return false; }
+static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
+{
+	return cpu_present_mask;
+}
 #endif
 
 static inline enum cpuhp_state
@@ -1747,16 +1762,94 @@ int bringup_hibernate_cpu(unsigned int s
 	return 0;
 }
 
-void __init bringup_nonboot_cpus(unsigned int setup_max_cpus)
+static void __init cpuhp_bringup_mask(const struct cpumask *mask, unsigned int ncpus,
+				      enum cpuhp_state target)
 {
 	unsigned int cpu;
 
-	for_each_present_cpu(cpu) {
-		if (num_online_cpus() >= setup_max_cpus)
+	for_each_cpu(cpu, mask) {
+		struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+
+		if (!--ncpus)
 			break;
-		if (!cpu_online(cpu))
-			cpu_up(cpu, CPUHP_ONLINE);
+
+		if (cpu_up(cpu, target) && can_rollback_cpu(st)) {
+			/*
+			 * If this failed then cpu_up() might have only
+			 * rolled back to CPUHP_BP_KICK_AP for the final
+			 * online. Clean it up. NOOP if already rolled back.
+			 */
+			WARN_ON(cpuhp_invoke_callback_range(false, cpu, st, CPUHP_OFFLINE));
+		}
+	}
+}
+
+#ifdef CONFIG_HOTPLUG_PARALLEL
+static bool
[patch V4 29/37] cpu/hotplug: Reset task stack state in _cpu_up()
From: David Woodhouse Commit dce1ca0525bf ("sched/scs: Reset task stack state in bringup_cpu()") ensured that the shadow call stack and KASAN poisoning were removed from a CPU's stack each time that CPU is brought up, not just once. This is not incorrect. However, with parallel bringup the idle thread setup will happen at a different step. As a consequence the cleanup in bringup_cpu() would be too late. Move the SCS/KASAN cleanup to the generic _cpu_up() function instead, which already ensures that the new CPU's stack is available, purely to allow for early failure. This occurs when the CPU to be brought up is in the CPUHP_OFFLINE state, which should correctly do the cleanup any time the CPU has been taken down to the point where such is needed. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Tested-by: Mark Rutland Tested-by: Michael Kelley Reviewed-by: Mark Rutland --- kernel/cpu.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -771,12 +771,6 @@ static int bringup_cpu(unsigned int cpu) return -EAGAIN; /* -* Reset stale stack state from the last time this CPU was online. -*/ - scs_task_reset(idle); - kasan_unpoison_task_stack(idle); - - /* * Some architectures have to walk the irq descriptors to * setup the vector space for the cpu which comes online. * @@ -1587,6 +1581,12 @@ static int _cpu_up(unsigned int cpu, int ret = PTR_ERR(idle); goto out; } + + /* +* Reset stale stack state from the last time this CPU was online. +*/ + scs_task_reset(idle); + kasan_unpoison_task_stack(idle); } cpuhp_tasks_frozen = tasks_frozen;
[patch V4 26/37] parisc: Switch to hotplug core state synchronization
From: Thomas Gleixner

Switch to the CPU hotplug core state tracking and synchronization
mechanism. No functional change intended.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/parisc/Kconfig          | 1 +
 arch/parisc/kernel/process.c | 4 ++--
 arch/parisc/kernel/smp.c     | 7 +++
 3 files changed, 6 insertions(+), 6 deletions(-)

--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -57,6 +57,7 @@ config PARISC
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_REGS_AND_STACK_ACCESS_API
+	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
 	select GENERIC_SCHED_CLOCK
 	select GENERIC_IRQ_MIGRATION if SMP
 	select HAVE_UNSTABLE_SCHED_CLOCK if SMP
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -166,8 +166,8 @@ void __noreturn arch_cpu_idle_dead(void)
 	local_irq_disable();
 
-	/* Tell __cpu_die() that this CPU is now safe to dispose of. */
-	(void)cpu_report_death();
+	/* Tell the core that this CPU is now safe to dispose of. */
+	cpuhp_ap_report_dead();
 
 	/* Ensure that the cache lines are written out. */
 	flush_cache_all_local();
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -500,11 +500,10 @@ int __cpu_disable(void)
 void __cpu_die(unsigned int cpu)
 {
 	pdc_cpu_rendezvous_lock();
+}
 
-	if (!cpu_wait_death(cpu, 5)) {
-		pr_crit("CPU%u: cpu didn't die\n", cpu);
-		return;
-	}
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
+{
 	pr_info("CPU%u: is shutting down\n", cpu);
 
 	/* set task's state to interruptible sleep */
[patch V4 28/37] cpu/hotplug: Remove unused state functions
From: Thomas Gleixner All users converted to the hotplug core mechanism. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- include/linux/cpu.h |2 - kernel/smpboot.c| 75 2 files changed, 77 deletions(-) --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -193,8 +193,6 @@ static inline void play_idle(unsigned lo } #ifdef CONFIG_HOTPLUG_CPU -bool cpu_wait_death(unsigned int cpu, int seconds); -bool cpu_report_death(void); void cpuhp_report_idle_dead(void); #else static inline void cpuhp_report_idle_dead(void) { } --- a/kernel/smpboot.c +++ b/kernel/smpboot.c @@ -325,78 +325,3 @@ void smpboot_unregister_percpu_thread(st cpus_read_unlock(); } EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread); - -#ifndef CONFIG_HOTPLUG_CORE_SYNC -static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD); - -#ifdef CONFIG_HOTPLUG_CPU -/* - * Wait for the specified CPU to exit the idle loop and die. - */ -bool cpu_wait_death(unsigned int cpu, int seconds) -{ - int jf_left = seconds * HZ; - int oldstate; - bool ret = true; - int sleep_jf = 1; - - might_sleep(); - - /* The outgoing CPU will normally get done quite quickly. */ - if (atomic_read(_cpu(cpu_hotplug_state, cpu)) == CPU_DEAD) - goto update_state_early; - udelay(5); - - /* But if the outgoing CPU dawdles, wait increasingly long times. */ - while (atomic_read(_cpu(cpu_hotplug_state, cpu)) != CPU_DEAD) { - schedule_timeout_uninterruptible(sleep_jf); - jf_left -= sleep_jf; - if (jf_left <= 0) - break; - sleep_jf = DIV_ROUND_UP(sleep_jf * 11, 10); - } -update_state_early: - oldstate = atomic_read(_cpu(cpu_hotplug_state, cpu)); -update_state: - if (oldstate == CPU_DEAD) { - /* Outgoing CPU died normally, update state. */ - smp_mb(); /* atomic_read() before update. */ - atomic_set(_cpu(cpu_hotplug_state, cpu), CPU_POST_DEAD); - } else { - /* Outgoing CPU still hasn't died, set state accordingly. 
*/ - if (!atomic_try_cmpxchg(_cpu(cpu_hotplug_state, cpu), - , CPU_BROKEN)) - goto update_state; - ret = false; - } - return ret; -} - -/* - * Called by the outgoing CPU to report its successful death. Return - * false if this report follows the surviving CPU's timing out. - * - * A separate "CPU_DEAD_FROZEN" is used when the surviving CPU - * timed out. This approach allows architectures to omit calls to - * cpu_check_up_prepare() and cpu_set_state_online() without defeating - * the next cpu_wait_death()'s polling loop. - */ -bool cpu_report_death(void) -{ - int oldstate; - int newstate; - int cpu = smp_processor_id(); - - oldstate = atomic_read(_cpu(cpu_hotplug_state, cpu)); - do { - if (oldstate != CPU_BROKEN) - newstate = CPU_DEAD; - else - newstate = CPU_DEAD_FROZEN; - } while (!atomic_try_cmpxchg(_cpu(cpu_hotplug_state, cpu), -, newstate)); - return newstate == CPU_DEAD; -} - -#endif /* #ifdef CONFIG_HOTPLUG_CPU */ -#endif /* !CONFIG_HOTPLUG_CORE_SYNC */
[patch V4 32/37] x86/apic: Provide cpu_primary_thread mask
From: Thomas Gleixner Make the primary thread tracking CPU mask based in preparation for simpler handling of parallel bootup. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- arch/x86/include/asm/apic.h |2 -- arch/x86/include/asm/topology.h | 19 +++ arch/x86/kernel/apic/apic.c | 20 +--- arch/x86/kernel/smpboot.c | 12 +++- 4 files changed, 27 insertions(+), 26 deletions(-) --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -506,10 +506,8 @@ extern int default_check_phys_apicid_pre #endif /* CONFIG_X86_LOCAL_APIC */ #ifdef CONFIG_SMP -bool apic_id_is_primary_thread(unsigned int id); void apic_smt_update(void); #else -static inline bool apic_id_is_primary_thread(unsigned int id) { return false; } static inline void apic_smt_update(void) { } #endif --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -31,9 +31,9 @@ * CONFIG_NUMA. */ #include +#include #ifdef CONFIG_NUMA -#include #include #include @@ -139,9 +139,20 @@ static inline int topology_max_smt_threa int topology_update_package_map(unsigned int apicid, unsigned int cpu); int topology_update_die_map(unsigned int dieid, unsigned int cpu); int topology_phys_to_logical_pkg(unsigned int pkg); -bool topology_is_primary_thread(unsigned int cpu); bool topology_smt_supported(void); -#else + +extern struct cpumask __cpu_primary_thread_mask; +#define cpu_primary_thread_mask ((const struct cpumask *)&__cpu_primary_thread_mask) + +/** + * topology_is_primary_thread - Check whether CPU is the primary SMT thread + * @cpu: CPU to check + */ +static inline bool topology_is_primary_thread(unsigned int cpu) +{ + return cpumask_test_cpu(cpu, cpu_primary_thread_mask); +} +#else /* CONFIG_SMP */ #define topology_max_packages()(1) static inline int topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; } @@ -152,7 +163,7 @@ static inline int topology_max_die_per_p static inline int topology_max_smt_threads(void) { return 1; } static inline bool 
topology_is_primary_thread(unsigned int cpu) { return true; } static inline bool topology_smt_supported(void) { return false; } -#endif +#endif /* !CONFIG_SMP */ static inline void arch_fix_phys_package_id(int num, u32 slot) { --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2386,20 +2386,16 @@ bool arch_match_cpu_phys_id(int cpu, u64 } #ifdef CONFIG_SMP -/** - * apic_id_is_primary_thread - Check whether APIC ID belongs to a primary thread - * @apicid: APIC ID to check - */ -bool apic_id_is_primary_thread(unsigned int apicid) +static void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { - u32 mask; - - if (smp_num_siblings == 1) - return true; /* Isolate the SMT bit(s) in the APICID and check for 0 */ - mask = (1U << (fls(smp_num_siblings) - 1)) - 1; - return !(apicid & mask); + u32 mask = (1U << (fls(smp_num_siblings) - 1)) - 1; + + if (smp_num_siblings == 1 || !(apicid & mask)) + cpumask_set_cpu(cpu, &__cpu_primary_thread_mask); } +#else +static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { } #endif /* @@ -2544,6 +2540,8 @@ int generic_processor_info(int apicid, i set_cpu_present(cpu, true); num_processors++; + cpu_mark_primary_thread(cpu, apicid); + return cpu; } --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -102,6 +102,9 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map); DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info); EXPORT_PER_CPU_SYMBOL(cpu_info); +/* CPUs which are the primary SMT threads */ +struct cpumask __cpu_primary_thread_mask __read_mostly; + /* Representing CPUs for which sibling maps can be computed */ static cpumask_var_t cpu_sibling_setup_mask; @@ -277,15 +280,6 @@ static void notrace start_secondary(void } /** - * topology_is_primary_thread - Check whether CPU is the primary SMT thread - * @cpu: CPU to check - */ -bool topology_is_primary_thread(unsigned int cpu) -{ - return apic_id_is_primary_thread(per_cpu(x86_cpu_to_apicid, cpu)); -} - -/** * 
topology_smt_supported - Check whether SMT is supported by the CPUs */ bool topology_smt_supported(void)
[patch V4 21/37] cpu/hotplug: Remove cpu_report_state() and related unused cruft
From: Thomas Gleixner No more users. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- include/linux/cpu.h |2 - kernel/smpboot.c| 90 2 files changed, 92 deletions(-) --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -184,8 +184,6 @@ void arch_cpu_idle_enter(void); void arch_cpu_idle_exit(void); void __noreturn arch_cpu_idle_dead(void); -int cpu_report_state(int cpu); -int cpu_check_up_prepare(int cpu); void cpu_set_state_online(int cpu); void play_idle_precise(u64 duration_ns, u64 latency_ns); --- a/kernel/smpboot.c +++ b/kernel/smpboot.c @@ -329,97 +329,7 @@ EXPORT_SYMBOL_GPL(smpboot_unregister_per #ifndef CONFIG_HOTPLUG_CORE_SYNC static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD); -/* - * Called to poll specified CPU's state, for example, when waiting for - * a CPU to come online. - */ -int cpu_report_state(int cpu) -{ - return atomic_read(_cpu(cpu_hotplug_state, cpu)); -} - -/* - * If CPU has died properly, set its state to CPU_UP_PREPARE and - * return success. Otherwise, return -EBUSY if the CPU died after - * cpu_wait_death() timed out. And yet otherwise again, return -EAGAIN - * if cpu_wait_death() timed out and the CPU still hasn't gotten around - * to dying. In the latter two cases, the CPU might not be set up - * properly, but it is up to the arch-specific code to decide. - * Finally, -EIO indicates an unanticipated problem. - * - * Note that it is permissible to omit this call entirely, as is - * done in architectures that do no CPU-hotplug error checking. - */ -int cpu_check_up_prepare(int cpu) -{ - if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) { - atomic_set(_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE); - return 0; - } - - switch (atomic_read(_cpu(cpu_hotplug_state, cpu))) { - - case CPU_POST_DEAD: - - /* The CPU died properly, so just start it up again. */ - atomic_set(_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE); - return 0; - - case CPU_DEAD_FROZEN: - - /* -* Timeout during CPU death, so let caller know. 
-* The outgoing CPU completed its processing, but after -* cpu_wait_death() timed out and reported the error. The -* caller is free to proceed, in which case the state -* will be reset properly by cpu_set_state_online(). -* Proceeding despite this -EBUSY return makes sense -* for systems where the outgoing CPUs take themselves -* offline, with no post-death manipulation required from -* a surviving CPU. -*/ - return -EBUSY; - - case CPU_BROKEN: - - /* -* The most likely reason we got here is that there was -* a timeout during CPU death, and the outgoing CPU never -* did complete its processing. This could happen on -* a virtualized system if the outgoing VCPU gets preempted -* for more than five seconds, and the user attempts to -* immediately online that same CPU. Trying again later -* might return -EBUSY above, hence -EAGAIN. -*/ - return -EAGAIN; - - case CPU_UP_PREPARE: - /* -* Timeout while waiting for the CPU to show up. Allow to try -* again later. -*/ - return 0; - - default: - - /* Should not happen. Famous last words. */ - return -EIO; - } -} - -/* - * Mark the specified CPU online. - * - * Note that it is permissible to omit this call entirely, as is - * done in architectures that do no CPU-hotplug error checking. - */ -void cpu_set_state_online(int cpu) -{ - (void)atomic_xchg(_cpu(cpu_hotplug_state, cpu), CPU_ONLINE); -} - #ifdef CONFIG_HOTPLUG_CPU - /* * Wait for the specified CPU to exit the idle loop and die. */
[patch V4 25/37] MIPS: SMP_CPS: Switch to hotplug core state synchronization
From: Thomas Gleixner

Switch to the CPU hotplug core state tracking and synchronization
mechanism. This unfortunately requires adding dead reporting to the non
CPS platforms as CPS is the only user, but it allows an overall
consolidation of this functionality.

No functional change intended.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/mips/Kconfig               |  1 +
 arch/mips/cavium-octeon/smp.c   |  1 +
 arch/mips/include/asm/smp-ops.h |  1 +
 arch/mips/kernel/smp-bmips.c    |  1 +
 arch/mips/kernel/smp-cps.c      | 14 +-
 arch/mips/kernel/smp.c          |  8 ++++++++
 arch/mips/loongson64/smp.c      |  1 +
 7 files changed, 18 insertions(+), 9 deletions(-)

--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2285,6 +2285,7 @@ config MIPS_CPS
 	select MIPS_CM
 	select MIPS_CPS_PM if HOTPLUG_CPU
 	select SMP
+	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
 	select SYNC_R4K if (CEVT_R4K || CSRC_R4K)
 	select SYS_SUPPORTS_HOTPLUG_CPU
 	select SYS_SUPPORTS_SCHED_SMT if CPU_MIPSR6
--- a/arch/mips/cavium-octeon/smp.c
+++ b/arch/mips/cavium-octeon/smp.c
@@ -345,6 +345,7 @@ void play_dead(void)
 	int cpu = cpu_number_map(cvmx_get_core_num());
 
 	idle_task_exit();
+	cpuhp_ap_report_dead();
 	octeon_processor_boot = 0xff;
 	per_cpu(cpu_state, cpu) = CPU_DEAD;
--- a/arch/mips/include/asm/smp-ops.h
+++ b/arch/mips/include/asm/smp-ops.h
@@ -33,6 +33,7 @@ struct plat_smp_ops {
 #ifdef CONFIG_HOTPLUG_CPU
 	int (*cpu_disable)(void);
 	void (*cpu_die)(unsigned int cpu);
+	void (*cleanup_dead_cpu)(unsigned cpu);
 #endif
 #ifdef CONFIG_KEXEC
 	void (*kexec_nonboot_cpu)(void);
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -392,6 +392,7 @@ static void bmips_cpu_die(unsigned int c
 void __ref play_dead(void)
 {
 	idle_task_exit();
+	cpuhp_ap_report_dead();
 
 	/* flush data cache */
 	_dma_cache_wback_inv(0, ~0);
--- a/arch/mips/kernel/smp-cps.c
+++ b/arch/mips/kernel/smp-cps.c
@@ -503,8 +503,7 @@ void play_dead(void)
 		}
 	}
 
-	/* This CPU has chosen its way out */
-	(void)cpu_report_death();
+	cpuhp_ap_report_dead();
cps_shutdown_this_cpu(cpu_death); @@ -527,7 +526,9 @@ static void wait_for_sibling_halt(void * } while (!(halted & TCHALT_H)); } -static void cps_cpu_die(unsigned int cpu) +static void cps_cpu_die(unsigned int cpu) { } + +static void cps_cleanup_dead_cpu(unsigned cpu) { unsigned core = cpu_core(_data[cpu]); unsigned int vpe_id = cpu_vpe_id(_data[cpu]); @@ -535,12 +536,6 @@ static void cps_cpu_die(unsigned int cpu unsigned stat; int err; - /* Wait for the cpu to choose its way out */ - if (!cpu_wait_death(cpu, 5)) { - pr_err("CPU%u: didn't offline\n", cpu); - return; - } - /* * Now wait for the CPU to actually offline. Without doing this that * offlining may race with one or more of: @@ -624,6 +619,7 @@ static const struct plat_smp_ops cps_smp #ifdef CONFIG_HOTPLUG_CPU .cpu_disable= cps_cpu_disable, .cpu_die= cps_cpu_die, + .cleanup_dead_cpu = cps_cleanup_dead_cpu, #endif #ifdef CONFIG_KEXEC .kexec_nonboot_cpu = cps_kexec_nonboot_cpu, --- a/arch/mips/kernel/smp.c +++ b/arch/mips/kernel/smp.c @@ -690,6 +690,14 @@ void flush_tlb_one(unsigned long vaddr) EXPORT_SYMBOL(flush_tlb_page); EXPORT_SYMBOL(flush_tlb_one); +#ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD +void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu) +{ + if (mp_ops->cleanup_dead_cpu) + mp_ops->cleanup_dead_cpu(cpu); +} +#endif + #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST static void tick_broadcast_callee(void *info) --- a/arch/mips/loongson64/smp.c +++ b/arch/mips/loongson64/smp.c @@ -775,6 +775,7 @@ void play_dead(void) void (*play_dead_at_ckseg1)(int *); idle_task_exit(); + cpuhp_ap_report_dead(); prid_imp = read_c0_prid() & PRID_IMP_MASK; prid_rev = read_c0_prid() & PRID_REV_MASK;
[patch V4 37/37] x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
From: Thomas Gleixner

Implement the validation function which tells the core code whether
parallel bringup is possible.

The only condition for now is that the kernel does not run in an encrypted
guest as these will trap the RDMSR via #VC, which cannot be handled at that
point in early startup.

There was an earlier variant for AMD-SEV which used the GHCB protocol for
retrieving the APIC ID via CPUID, but there is no guarantee that the
initial APIC ID in CPUID is the same as the real APIC ID. There is no
enforcement from the secure firmware and the hypervisor can assign APIC IDs
as it sees fit as long as the ACPI/MADT table is consistent with that
assignment.

Unfortunately there is no RDMSR GHCB protocol at the moment, so enabling
AMD-SEV guests for parallel startup needs some more thought.

Intel-TDX provides a secure RDMSR hypercall, but supporting that is outside
the scope of this change.

Fixup announce_cpu() as e.g. on Hyper-V CPU1 is the secondary sibling of
CPU0, which makes the @cpu == 1 logic in announce_cpu() fall apart.
[ mikelley: Reported the announce_cpu() fallout ] Originally-by: David Woodhouse Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- V2: Fixup announce_cpu() - Michael Kelley V3: Fixup announce_cpu() for real - Michael Kelley --- arch/x86/Kconfig |3 - arch/x86/kernel/cpu/common.c |6 -- arch/x86/kernel/smpboot.c | 87 +++ 3 files changed, 75 insertions(+), 21 deletions(-) --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -274,8 +274,9 @@ config X86 select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER select HAVE_GENERIC_VDSO + select HOTPLUG_PARALLEL if SMP && X86_64 select HOTPLUG_SMT if SMP - select HOTPLUG_SPLIT_STARTUP if SMP + select HOTPLUG_SPLIT_STARTUP if SMP && X86_32 select IRQ_FORCED_THREADING select NEED_PER_CPU_EMBED_FIRST_CHUNK select NEED_PER_CPU_PAGE_FIRST_CHUNK --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -2128,11 +2128,7 @@ static inline void setup_getcpu(int cpu) } #ifdef CONFIG_X86_64 -static inline void ucode_cpu_init(int cpu) -{ - if (cpu) - load_ucode_ap(); -} +static inline void ucode_cpu_init(int cpu) { } static inline void tss_setup_ist(struct tss_struct *tss) { --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -58,6 +58,7 @@ #include #include #include +#include #include #include @@ -75,7 +76,7 @@ #include #include #include -#include +#include #include #include #include @@ -128,7 +129,6 @@ int arch_update_cpu_topology(void) return retval; } - static unsigned int smpboot_warm_reset_vector_count; static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip) @@ -226,16 +226,43 @@ static void notrace start_secondary(void */ cr4_init(); -#ifdef CONFIG_X86_32 - /* switch away from the initial page table */ - load_cr3(swapper_pg_dir); - __flush_tlb_all(); -#endif + /* +* 32-bit specific. 64-bit reaches this code with the correct page +* table established. Yet another historical divergence. 
+*/ + if (IS_ENABLED(CONFIG_X86_32)) { + /* switch away from the initial page table */ + load_cr3(swapper_pg_dir); + __flush_tlb_all(); + } + cpu_init_exception_handling(); /* -* Synchronization point with the hotplug core. Sets the -* synchronization state to ALIVE and waits for the control CPU to +* 32-bit systems load the microcode from the ASM startup code for +* historical reasons. +* +* On 64-bit systems load it before reaching the AP alive +* synchronization point below so it is not part of the full per +* CPU serialized bringup part when "parallel" bringup is enabled. +* +* That's even safe when hyperthreading is enabled in the CPU as +* the core code starts the primary threads first and leaves the +* secondary threads waiting for SIPI. Loading microcode on +* physical cores concurrently is a safe operation. +* +* This covers both the Intel specific issue that concurrent +* microcode loading on SMT siblings must be prohibited and the +* vendor independent issue that microcode loading which changes +* CPUID, MSRs etc. must be strictly serialized to maintain +* software state correctness. +*/ + if (IS_ENABLED(CONFIG_X86_64)) + load_ucode_ap(); + + /* +* Synchronization point with the hotplug core. Sets this CPUs +* synchronization state to ALIVE and spin-waits for the control CPU to * release this CPU for further bringup. */ cpuhp_ap_sync_alive(); @@ -918,9
[patch V4 24/37] csky/smp: Switch to hotplug core state synchronization
From: Thomas Gleixner

Switch to the CPU hotplug core state tracking and synchronization
mechanism. No functional change intended.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/csky/Kconfig           | 1 +
 arch/csky/include/asm/smp.h | 2 +-
 arch/csky/kernel/smp.c      | 8 ++--
 3 files changed, 4 insertions(+), 7 deletions(-)

--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -96,6 +96,7 @@ config CSKY
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
+	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
 	select MAY_HAVE_SPARSE_IRQ
 	select MODULES_USE_ELF_RELA if MODULES
 	select OF
--- a/arch/csky/include/asm/smp.h
+++ b/arch/csky/include/asm/smp.h
@@ -23,7 +23,7 @@ void __init set_send_ipi(void (*func)(co
 
 int __cpu_disable(void);
 
-void __cpu_die(unsigned int cpu);
+static inline void __cpu_die(unsigned int cpu) { }
 
 #endif /* CONFIG_SMP */
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -291,12 +291,8 @@ int __cpu_disable(void)
 	return 0;
 }
 
-void __cpu_die(unsigned int cpu)
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
-	if (!cpu_wait_death(cpu, 5)) {
-		pr_crit("CPU%u: shutdown failed\n", cpu);
-		return;
-	}
 	pr_notice("CPU%u: shutdown\n", cpu);
 }
 
@@ -304,7 +300,7 @@ void __noreturn arch_cpu_idle_dead(void)
 {
 	idle_task_exit();
 
-	cpu_report_death();
+	cpuhp_ap_report_dead();
 
 	while (!secondary_stack)
 		arch_cpu_idle();
[patch V4 36/37] x86/smpboot: Support parallel startup of secondary CPUs
From: David Woodhouse

In parallel startup mode the APs are kicked alive by the control CPU
quickly after each other and run through the early startup code in
parallel. The real-mode startup code is already serialized with a
bit-spinlock to protect the real-mode stack.

In parallel startup mode the smpboot_control variable obviously cannot
contain the Linux CPU number so the APs have to determine their Linux CPU
number on their own. This is required to find the CPU's per-CPU offset in
order to find the idle task stack and other per-CPU data.

To achieve this, export the cpuid_to_apicid[] array so that each AP can
find its own CPU number by searching therein based on its APIC ID.

Introduce a flag in the top bits of smpboot_control which indicates that
the AP should find its CPU number by reading the APIC ID from the APIC.

This is required because CPUID based APIC ID retrieval can only provide the
initial APIC ID, which might have been overruled by the firmware. Some AMD
APUs come up with APIC ID = initial APIC ID + 0x10, so the APIC ID to CPU
number lookup would fail miserably if based on CPUID. Also virtualization
can make its own APIC ID assignments. The only requirement is that the
APIC IDs are consistent with the ACPI/MADT table.

For the boot CPU or in case parallel bringup is disabled the control bits
are empty and the CPU number is directly available in bits 0-23 of
smpboot_control.
[ tglx: Initial proof of concept patch with bitlock and APIC ID lookup ] [ dwmw2: Rework and testing, commit message, CPUID 0x1 and CPU0 support ] [ seanc: Fix stray override of initial_gs in common_cpu_up() ] [ Oleksandr Natalenko: reported suspend/resume issue fixed in x86_acpi_suspend_lowlevel ] [ tglx: Make it read the APIC ID from the APIC instead of using CPUID, split the bitlock part out ] Co-developed-by: Thomas Gleixner Co-developed-by: Brian Gerst Signed-off-by: Thomas Gleixner Signed-off-by: Brian Gerst Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- V4: Remove the lock prefix in the error path - Peter Z. --- arch/x86/include/asm/apic.h|2 + arch/x86/include/asm/apicdef.h |5 ++- arch/x86/include/asm/smp.h |6 arch/x86/kernel/acpi/sleep.c |9 +- arch/x86/kernel/apic/apic.c|2 - arch/x86/kernel/head_64.S | 61 + arch/x86/kernel/smpboot.c |2 - 7 files changed, 83 insertions(+), 4 deletions(-) --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -55,6 +55,8 @@ extern int local_apic_timer_c2_ok; extern int disable_apic; extern unsigned int lapic_timer_period; +extern int cpuid_to_apicid[]; + extern enum apic_intr_mode_id apic_intr_mode; enum apic_intr_mode_id { APIC_PIC, --- a/arch/x86/include/asm/apicdef.h +++ b/arch/x86/include/asm/apicdef.h @@ -138,7 +138,8 @@ #defineAPIC_EILVT_MASKED (1 << 16) #define APIC_BASE (fix_to_virt(FIX_APIC_BASE)) -#define APIC_BASE_MSR 0x800 +#define APIC_BASE_MSR 0x800 +#define APIC_X2APIC_ID_MSR 0x802 #define XAPIC_ENABLE (1UL << 11) #define X2APIC_ENABLE (1UL << 10) @@ -162,6 +163,7 @@ #define APIC_CPUID(apicid) ((apicid) & XAPIC_DEST_CPUS_MASK) #define NUM_APIC_CLUSTERS ((BAD_APICID + 1) >> XAPIC_DEST_CPUS_SHIFT) +#ifndef __ASSEMBLY__ /* * the local APIC register structure, memory mapped. 
Not terribly well * tested, but we might eventually use this one in the future - the @@ -435,4 +437,5 @@ enum apic_delivery_modes { APIC_DELIVERY_MODE_EXTINT = 7, }; +#endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_APICDEF_H */ --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -200,4 +200,10 @@ extern unsigned long apic_mmio_base; #endif /* !__ASSEMBLY__ */ +/* Control bits for startup_64 */ +#define STARTUP_READ_APICID0x8000 + +/* Top 8 bits are reserved for control */ +#define STARTUP_PARALLEL_MASK 0xFF00 + #endif /* _ASM_X86_SMP_H */ --- a/arch/x86/kernel/acpi/sleep.c +++ b/arch/x86/kernel/acpi/sleep.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include "../../realmode/rm/wakeup.h" @@ -127,7 +128,13 @@ int x86_acpi_suspend_lowlevel(void) * value is in the actual %rsp register. */ current->thread.sp = (unsigned long)temp_stack + sizeof(temp_stack); - smpboot_control = smp_processor_id(); + /* +* Ensure the CPU knows which one it is when it comes back, if +* it isn't in parallel mode and expected to work that out for +* itself. +*/ + if (!(smpboot_control & STARTUP_PARALLEL_MASK)) + smpboot_control = smp_processor_id(); #endif initial_code = (unsigned long)wakeup_long64; saved_magic = 0x123456789abcdef0L; --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2380,7 +2380,7 @@ static int nr_logical_cpuids = 1; /* * Used to
[patch V4 30/37] cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
From: Thomas Gleixner

The bringup logic of a to-be-onlined CPU consists of several parts, which are currently treated as a single hotplug state:

1) The control CPU issues the wake-up.

2) The to-be-onlined CPU starts up, does the minimal initialization, reports itself alive and waits for release into the complete bringup.

3) The control CPU waits for the alive report and releases the upcoming CPU for the complete bringup.

Allow splitting this into two states:

1) The control CPU issues the wake-up.

After that the to-be-onlined CPU starts up, does the minimal initialization, reports itself alive and waits for release into the full bringup. As this can run after the control CPU has dropped the hotplug locks, the code executed on the AP before it reports alive has to be carefully audited not to violate any of the hotplug constraints, especially not to modify any of the various cpumasks.

This is really only meant to avoid waiting for the AP to react to the wake-up. Of course, an architecture can, with care, move strictly CPU-related setup functionality, e.g. microcode loading, before the synchronization point to save further pointless waiting time.

2) The control CPU waits for the alive report and releases the upcoming CPU for the complete bringup.

This allows running all to-be-onlined CPUs up to state #1 on the control CPU and then, at a later point, running state #2 for each of them. This spares some of the latency of the fully serialized per-CPU bringup by avoiding the per-CPU wakeup/wait serialization. The assumption is that the first AP is already waiting when the last AP has been woken up. This obviously depends on the hardware latencies, and depending on the timings this might still not completely eliminate all wait scenarios.

This split is just a preparatory step for enabling parallel bringup later. The boot-time bringup is still fully serialized.
It has a separate config switch so that architectures which want to support parallel bringup can test the split of the CPUHP_BRINGUP step separately. To enable this, the architecture must support the CPU hotplug core sync mechanism and has to be audited to ensure that there are no implicit hotplug state dependencies which require a fully serialized bringup. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- arch/Kconfig |4 ++ include/linux/cpuhotplug.h |4 ++ kernel/cpu.c | 70 +++-- 3 files changed, 76 insertions(+), 2 deletions(-) --- a/arch/Kconfig +++ b/arch/Kconfig @@ -49,6 +49,10 @@ config HOTPLUG_CORE_SYNC_FULL select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU select HOTPLUG_CORE_SYNC +config HOTPLUG_SPLIT_STARTUP + bool + select HOTPLUG_CORE_SYNC_FULL + config GENERIC_ENTRY bool --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -133,6 +133,7 @@ enum cpuhp_state { CPUHP_MIPS_SOC_PREPARE, CPUHP_BP_PREPARE_DYN, CPUHP_BP_PREPARE_DYN_END= CPUHP_BP_PREPARE_DYN + 20, + CPUHP_BP_KICK_AP, CPUHP_BRINGUP_CPU, /* @@ -517,9 +518,12 @@ void cpuhp_online_idle(enum cpuhp_state static inline void cpuhp_online_idle(enum cpuhp_state state) { } #endif +struct task_struct; + void cpuhp_ap_sync_alive(void); void arch_cpuhp_sync_state_poll(void); void arch_cpuhp_cleanup_kick_cpu(unsigned int cpu); +int arch_cpuhp_kick_ap_alive(unsigned int cpu, struct task_struct *tidle); #ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD void cpuhp_ap_report_dead(void); --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -761,6 +761,47 @@ static int bringup_wait_for_ap_online(un return 0; } +#ifdef CONFIG_HOTPLUG_SPLIT_STARTUP +static int cpuhp_kick_ap_alive(unsigned int cpu) +{ + if (!cpuhp_can_boot_ap(cpu)) + return -EAGAIN; + + return arch_cpuhp_kick_ap_alive(cpu, idle_thread_get(cpu)); +} + +static int cpuhp_bringup_ap(unsigned int cpu) +{ + struct cpuhp_cpu_state *st = per_cpu_ptr(_state, cpu); + int ret; + + /* +* Some architectures have to walk the irq descriptors to +* setup the vector space for the 
cpu which comes online. +* Prevent irq alloc/free across the bringup. +*/ + irq_lock_sparse(); + + ret = cpuhp_bp_sync_alive(cpu); + if (ret) + goto out_unlock; + + ret = bringup_wait_for_ap_online(cpu); + if (ret) + goto out_unlock; + + irq_unlock_sparse(); + + if (st->target <= CPUHP_AP_ONLINE_IDLE) + return 0; + + return cpuhp_kick_ap(cpu, st, st->target); + +out_unlock: + irq_unlock_sparse(); + return ret; +} +#else static int bringup_cpu(unsigned int cpu) { struct cpuhp_cpu_state *st = per_cpu_ptr(_state, cpu); @@ -781,7 +822,6 @@ static int bringup_cpu(unsigned int cpu) */ irq_lock_sparse(); - /*
[patch V4 35/37] x86/smpboot: Implement a bit spinlock to protect the realmode stack
From: Thomas Gleixner Parallel AP bringup requires that the APs can run fully parallel through the early startup code including the real mode trampoline. To prepare for this, implement a bit-spinlock to serialize access to the real mode stack so that parallel upcoming APs are not going to corrupt each other's stack while going through the real mode startup code. Co-developed-by: David Woodhouse Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- V4: Simplify the lock implementation - Peter Z. --- arch/x86/include/asm/realmode.h |3 +++ arch/x86/kernel/head_64.S| 12 arch/x86/realmode/init.c |3 +++ arch/x86/realmode/rm/trampoline_64.S | 23 ++- 4 files changed, 36 insertions(+), 5 deletions(-) --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -52,6 +52,7 @@ struct trampoline_header { u64 efer; u32 cr4; u32 flags; + u32 lock; #endif }; @@ -64,6 +65,8 @@ extern unsigned long initial_stack; extern unsigned long initial_vc_handler; #endif +extern u32 *trampoline_lock; + extern unsigned char real_mode_blob[]; extern unsigned char real_mode_relocs[]; --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -252,6 +252,16 @@ SYM_INNER_LABEL(secondary_startup_64_no_ movqTASK_threadsp(%rax), %rsp /* +* Now that this CPU is running on its own stack, drop the realmode +* protection. For the boot CPU the pointer is NULL! +*/ + movqtrampoline_lock(%rip), %rax + testq %rax, %rax + jz .Lsetup_gdt + movl$0, (%rax) + +.Lsetup_gdt: + /* * We must switch to a new descriptor in kernel space for the GDT * because soon the kernel won't have access anymore to the userspace * addresses where we're currently running on. 
We have to do that here @@ -433,6 +443,8 @@ SYM_DATA(initial_code, .quad x86_64_star #ifdef CONFIG_AMD_MEM_ENCRYPT SYM_DATA(initial_vc_handler, .quad handle_vc_boot_ghcb) #endif + +SYM_DATA(trampoline_lock, .quad 0); __FINITDATA __INIT --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -154,6 +154,9 @@ static void __init setup_real_mode(void) trampoline_header->flags = 0; + trampoline_lock = _header->lock; + *trampoline_lock = 0; + trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd); /* Map the real mode stub as virtual == physical */ --- a/arch/x86/realmode/rm/trampoline_64.S +++ b/arch/x86/realmode/rm/trampoline_64.S @@ -37,6 +37,20 @@ .text .code16 +.macro LOAD_REALMODE_ESP + /* +* Make sure only one CPU fiddles with the realmode stack + */ +.Llock_rm\@: +lock btsl $0, tr_lock +jnc 2f +pause +jmp .Llock_rm\@ +2: + # Setup stack + movl$rm_stack_end, %esp +.endm + .balign PAGE_SIZE SYM_CODE_START(trampoline_start) cli # We should be safe anyway @@ -49,8 +63,7 @@ SYM_CODE_START(trampoline_start) mov %ax, %es mov %ax, %ss - # Setup stack - movl$rm_stack_end, %esp + LOAD_REALMODE_ESP callverify_cpu # Verify the cpu supports long mode testl %eax, %eax # Check for return code @@ -93,8 +106,7 @@ SYM_CODE_START(sev_es_trampoline_start) mov %ax, %es mov %ax, %ss - # Setup stack - movl$rm_stack_end, %esp + LOAD_REALMODE_ESP jmp .Lswitch_to_protected SYM_CODE_END(sev_es_trampoline_start) @@ -177,7 +189,7 @@ SYM_CODE_START(pa_trampoline_compat) * In compatibility mode. Prep ESP and DX for startup_32, then disable * paging and complete the switch to legacy 32-bit mode. */ - movl$rm_stack_end, %esp + LOAD_REALMODE_ESP movw$__KERNEL_DS, %dx movl$(CR0_STATE & ~X86_CR0_PG), %eax @@ -241,6 +253,7 @@ SYM_DATA_START(trampoline_header) SYM_DATA(tr_efer, .space 8) SYM_DATA(tr_cr4,.space 4) SYM_DATA(tr_flags, .space 4) + SYM_DATA(tr_lock, .space 4) SYM_DATA_END(trampoline_header) #include "trampoline_common.S"
[patch V4 27/37] riscv: Switch to hotplug core state synchronization
From: Thomas Gleixner Switch to the CPU hotplug core state tracking and synchronization mechanism. No functional change intended. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley Acked-by: Palmer Dabbelt --- arch/riscv/Kconfig |1 + arch/riscv/include/asm/smp.h|2 +- arch/riscv/kernel/cpu-hotplug.c | 14 +++--- 3 files changed, 9 insertions(+), 8 deletions(-) --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -122,6 +122,7 @@ config RISCV select HAVE_RSEQ select HAVE_STACKPROTECTOR select HAVE_SYSCALL_TRACEPOINTS + select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU select IRQ_DOMAIN select IRQ_FORCED_THREADING select KASAN_VMALLOC if KASAN --- a/arch/riscv/include/asm/smp.h +++ b/arch/riscv/include/asm/smp.h @@ -70,7 +70,7 @@ asmlinkage void smp_callin(void); #if defined CONFIG_HOTPLUG_CPU int __cpu_disable(void); -void __cpu_die(unsigned int cpu); +static inline void __cpu_die(unsigned int cpu) { } #endif /* CONFIG_HOTPLUG_CPU */ #else --- a/arch/riscv/kernel/cpu-hotplug.c +++ b/arch/riscv/kernel/cpu-hotplug.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -49,17 +50,15 @@ int __cpu_disable(void) return ret; } +#ifdef CONFIG_HOTPLUG_CPU /* - * Called on the thread which is asking for a CPU to be shutdown. + * Called on the thread which is asking for a CPU to be shutdown, if the + * CPU reported dead to the hotplug core. */ -void __cpu_die(unsigned int cpu) +void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu) { int ret = 0; - if (!cpu_wait_death(cpu, 5)) { - pr_err("CPU %u: didn't die\n", cpu); - return; - } pr_notice("CPU%u: off\n", cpu); /* Verify from the firmware if the cpu is really stopped*/ @@ -76,9 +75,10 @@ void __noreturn arch_cpu_idle_dead(void) { idle_task_exit(); - (void)cpu_report_death(); + cpuhp_ap_report_dead(); cpu_ops[smp_processor_id()]->cpu_stop(); /* It should never reach here */ BUG(); } +#endif
[patch V4 23/37] arm64: smp: Switch to hotplug core state synchronization
From: Thomas Gleixner Switch to the CPU hotplug core state tracking and synchronization mechanism. No functional change intended. Signed-off-by: Thomas Gleixner Tested-by: Mark Rutland Tested-by: Michael Kelley --- arch/arm64/Kconfig |1 + arch/arm64/include/asm/smp.h |2 +- arch/arm64/kernel/smp.c | 14 +- 3 files changed, 7 insertions(+), 10 deletions(-) --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -222,6 +222,7 @@ config ARM64 select HAVE_KPROBES select HAVE_KRETPROBES select HAVE_GENERIC_VDSO + select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU select IRQ_DOMAIN select IRQ_FORCED_THREADING select KASAN_VMALLOC if KASAN --- a/arch/arm64/include/asm/smp.h +++ b/arch/arm64/include/asm/smp.h @@ -99,7 +99,7 @@ static inline void arch_send_wakeup_ipi_ extern int __cpu_disable(void); -extern void __cpu_die(unsigned int cpu); +static inline void __cpu_die(unsigned int cpu) { } extern void __noreturn cpu_die(void); extern void __noreturn cpu_die_early(void); --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -332,17 +332,13 @@ static int op_cpu_kill(unsigned int cpu) } /* - * called on the thread which is asking for a CPU to be shutdown - - * waits until shutdown has completed, or it is timed out. + * Called on the thread which is asking for a CPU to be shutdown after the + * shutdown completed. */ -void __cpu_die(unsigned int cpu) +void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu) { int err; - if (!cpu_wait_death(cpu, 5)) { - pr_crit("CPU%u: cpu didn't die\n", cpu); - return; - } pr_debug("CPU%u: shutdown\n", cpu); /* @@ -369,8 +365,8 @@ void __noreturn cpu_die(void) local_daif_mask(); - /* Tell __cpu_die() that this CPU is now safe to dispose of */ - (void)cpu_report_death(); + /* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */ + cpuhp_ap_report_dead(); /* * Actually shutdown the CPU. This must never fail. The specific hotplug
Re: [PATCH v2] piix: fix regression during unplug in Xen HVM domUs
On Wed, 10 May 2023, Olaf Hering wrote: > Wed, 10 May 2023 00:58:27 +0200 Olaf Hering : > > > In my debugging (with v8.0.0) it turned out the three pci_set_word > > causes the domU to hang. In fact, it is just the last one: > > > >pci_set_byte(pci_conf + 0x20, 0x01); /* BMIBA: 20-23h */ > > > > It changes the value from 0xc121 to 0x1. > > If I disable just "pci_set_word(pci_conf + PCI_COMMAND, 0x);" it works as > well. > It changes the value from 0x5 to 0. > > In general I feel it is wrong to fiddle with PCI from the host side. > This is most likely not the intention of the Xen unplug protocol. > I'm sure the guest does not expect such changes under the hood. > It happens to work by luck with pvops kernels because their PCI discovery > is done after the unplug. > > So, what do we do here to get this off the table? I don't have a concrete suggestion because I don't understand the root cause of the issue. Looking back at Paolo's reply from 2021 https://marc.info/?l=xen-devel=161669099305992=2 I think he was right. We can either fix the root cause of the issue or avoid calling qdev_reset_all on unplug. I am OK with either one.
[patch V4 16/37] x86/smpboot: Remove wait for cpu_online()
From: Thomas Gleixner Now that the core code drops sparse_irq_lock after the idle thread synchronized, it's pointless to wait for the AP to mark itself online. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- arch/x86/kernel/smpboot.c | 26 ++ 1 file changed, 2 insertions(+), 24 deletions(-) --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -275,7 +275,6 @@ static void notrace start_secondary(void * half valid vector space. */ lock_vector_lock(); - /* Sync point with do_wait_cpu_online() */ set_cpu_online(smp_processor_id(), true); lapic_online(); unlock_vector_lock(); @@ -1104,20 +1103,6 @@ static int wait_cpu_initialized(unsigned return 0; } -/* - * Bringup step three: Wait for the target AP to reach set_cpu_online() in - * start_secondary(). - */ -static void wait_cpu_online(unsigned int cpu) -{ - /* -* Wait for the AP to mark itself online, so the core caller -* can drop sparse_irq_lock. -*/ - while (!cpu_online(cpu)) - schedule(); -} - static int native_kick_ap(unsigned int cpu, struct task_struct *tidle) { int apicid = apic->cpu_present_to_apicid(cpu); @@ -1164,16 +1149,9 @@ int native_cpu_up(unsigned int cpu, stru int ret; ret = native_kick_ap(cpu, tidle); - if (ret) - goto out; - - ret = wait_cpu_initialized(cpu); - if (ret) - goto out; - - wait_cpu_online(cpu); + if (!ret) + ret = wait_cpu_initialized(cpu); -out: /* Cleanup possible dangling ends... */ if (x86_platform.legacy.warm_reset) smpboot_restore_warm_reset_vector();
[patch V4 20/37] x86/smpboot: Switch to hotplug core state synchronization
From: Thomas Gleixner The new AP state tracking and synchronization mechanism in the CPU hotplug core code allows to remove quite some x86 specific code: 1) The AP alive synchronization based on cpumasks 2) The decision whether an AP can be brought up again Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- V2: Use for_each_online_cpu() - Brian --- arch/x86/Kconfig |1 arch/x86/include/asm/smp.h |7 + arch/x86/kernel/smp.c |1 arch/x86/kernel/smpboot.c | 165 +++-- arch/x86/xen/smp_hvm.c | 16 +--- arch/x86/xen/smp_pv.c | 39 ++ 6 files changed, 75 insertions(+), 154 deletions(-) --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -274,6 +274,7 @@ config X86 select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER select HAVE_GENERIC_VDSO + select HOTPLUG_CORE_SYNC_FULL if SMP select HOTPLUG_SMT if SMP select IRQ_FORCED_THREADING select NEED_PER_CPU_EMBED_FIRST_CHUNK --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -38,6 +38,8 @@ struct smp_ops { void (*crash_stop_other_cpus)(void); void (*smp_send_reschedule)(int cpu); + void (*cleanup_dead_cpu)(unsigned cpu); + void (*poll_sync_state)(void); int (*cpu_up)(unsigned cpu, struct task_struct *tidle); int (*cpu_disable)(void); void (*cpu_die)(unsigned int cpu); @@ -90,7 +92,8 @@ static inline int __cpu_disable(void) static inline void __cpu_die(unsigned int cpu) { - smp_ops.cpu_die(cpu); + if (smp_ops.cpu_die) + smp_ops.cpu_die(cpu); } static inline void __noreturn play_dead(void) @@ -123,8 +126,6 @@ void native_smp_cpus_done(unsigned int m int common_cpu_up(unsigned int cpunum, struct task_struct *tidle); int native_cpu_up(unsigned int cpunum, struct task_struct *tidle); int native_cpu_disable(void); -int common_cpu_die(unsigned int cpu); -void native_cpu_die(unsigned int cpu); void __noreturn hlt_play_dead(void); void native_play_dead(void); void play_dead_common(void); --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -269,7 +269,6 @@ struct smp_ops smp_ops = { 
.smp_send_reschedule= native_smp_send_reschedule, .cpu_up = native_cpu_up, - .cpu_die= native_cpu_die, .cpu_disable= native_cpu_disable, .play_dead = native_play_dead, --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -57,6 +57,7 @@ #include #include #include +#include #include #include @@ -101,9 +102,6 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map); DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info); EXPORT_PER_CPU_SYMBOL(cpu_info); -/* All of these masks are initialized in setup_cpu_local_masks() */ -static cpumask_var_t cpu_initialized_mask; -static cpumask_var_t cpu_callout_mask; /* Representing CPUs for which sibling maps can be computed */ static cpumask_var_t cpu_sibling_setup_mask; @@ -166,10 +164,10 @@ static void ap_starting(void) int cpuid = smp_processor_id(); /* -* If woken up by an INIT in an 82489DX configuration -* cpu_callout_mask guarantees the CPU does not reach this point -* before an INIT_deassert IPI reaches the local APIC, so it is now -* safe to touch the local APIC. +* If woken up by an INIT in an 82489DX configuration the alive +* synchronization guarantees that the CPU does not reach this +* point before an INIT_deassert IPI reaches the local APIC, so it +* is now safe to touch the local APIC. * * Set up this CPU, first the APIC, which is probably redundant on * most boards. @@ -213,17 +211,6 @@ static void ap_calibrate_delay(void) cpu_data(smp_processor_id()).loops_per_jiffy = loops_per_jiffy; } -static void wait_for_master_cpu(int cpu) -{ - /* -* Wait for release by control CPU before continuing with AP -* initialization. -*/ - WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)); - while (!cpumask_test_cpu(cpu, cpu_callout_mask)) - cpu_relax(); -} - /* * Activate a secondary processor. */ @@ -244,11 +231,11 @@ static void notrace start_secondary(void cpu_init_exception_handling(); /* -* Sync point with wait_cpu_initialized(). Sets AP in -* cpu_initialized_mask and then waits for the control CPU -* to release it. 
+* Synchronization point with the hotplug core. Sets the +* synchronization state to ALIVE and waits for the control CPU to +* release this CPU for further bringup. */ - wait_for_master_cpu(raw_smp_processor_id()); + cpuhp_ap_sync_alive(); cpu_init(); rcu_cpu_starting(raw_smp_processor_id()); @@ -278,7 +265,6 @@
[patch V4 19/37] cpu/hotplug: Add CPU state tracking and synchronization
From: Thomas Gleixner

The CPU state tracking and synchronization mechanism in smpboot.c is completely independent of the hotplug code, and all logic around it is implemented in architecture-specific code. Except for the state reporting of the AP there is absolutely nothing architecture-specific, so the synchronization and decision functions can be moved into the generic hotplug core code.

Provide an integrated variant and add the core synchronization and decision points. This comes in two flavours:

1) DEAD state synchronization

Updated by the architecture code once the AP reaches the point where it is ready to be torn down by the control CPU, e.g. by removing power or clocks, or torn down via the hypervisor. The control CPU waits for this state to be reached with a timeout. If the state is reached, an architecture-specific cleanup function is invoked.

2) Full state synchronization

This extends #1 with AP alive synchronization. This is new functionality which allows replacing architecture-specific wait mechanisms, e.g. cpumasks, completely. It also prevents an AP which is in a limbo state from being brought up again. This can happen when an AP failed to report dead state during a previous offline operation.

The dead synchronization is what most architectures use. Only x86 makes a bringup decision based on that state at the moment.

Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- V4: Remove the try_cmpxchg() loop in cpuhp_ap_update_sync_state() - Peter Z. 
--- arch/Kconfig | 15 +++ include/linux/cpuhotplug.h | 12 ++ kernel/cpu.c | 193 - kernel/smpboot.c |2 4 files changed, 221 insertions(+), 1 deletion(-) --- a/arch/Kconfig +++ b/arch/Kconfig @@ -34,6 +34,21 @@ config ARCH_HAS_SUBPAGE_FAULTS config HOTPLUG_SMT bool +# Selected by HOTPLUG_CORE_SYNC_DEAD or HOTPLUG_CORE_SYNC_FULL +config HOTPLUG_CORE_SYNC + bool + +# Basic CPU dead synchronization selected by architecture +config HOTPLUG_CORE_SYNC_DEAD + bool + select HOTPLUG_CORE_SYNC + +# Full CPU synchronization with alive state selected by architecture +config HOTPLUG_CORE_SYNC_FULL + bool + select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU + select HOTPLUG_CORE_SYNC + config GENERIC_ENTRY bool --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -517,4 +517,16 @@ void cpuhp_online_idle(enum cpuhp_state static inline void cpuhp_online_idle(enum cpuhp_state state) { } #endif +void cpuhp_ap_sync_alive(void); +void arch_cpuhp_sync_state_poll(void); +void arch_cpuhp_cleanup_kick_cpu(unsigned int cpu); + +#ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD +void cpuhp_ap_report_dead(void); +void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu); +#else +static inline void cpuhp_ap_report_dead(void) { } +static inline void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu) { } +#endif + #endif --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -59,6 +60,7 @@ * @last: For multi-instance rollback, remember how far we got * @cb_state: The state for a single callback (install/uninstall) * @result:Result of the operation + * @ap_sync_state: State for AP synchronization * @done_up: Signal completion to the issuer of the task for cpu-up * @done_down: Signal completion to the issuer of the task for cpu-down */ @@ -76,6 +78,7 @@ struct cpuhp_cpu_state { struct hlist_node *last; enum cpuhp_statecb_state; int result; + atomic_tap_sync_state; struct completion done_up; struct completion done_down; #endif @@ -276,6 +279,182 
@@ static bool cpuhp_is_atomic_state(enum c return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE; } +/* Synchronization state management */ +enum cpuhp_sync_state { + SYNC_STATE_DEAD, + SYNC_STATE_KICKED, + SYNC_STATE_SHOULD_DIE, + SYNC_STATE_ALIVE, + SYNC_STATE_SHOULD_ONLINE, + SYNC_STATE_ONLINE, +}; + +#ifdef CONFIG_HOTPLUG_CORE_SYNC +/** + * cpuhp_ap_update_sync_state - Update synchronization state during bringup/teardown + * @state: The synchronization state to set + * + * No synchronization point. Just update of the synchronization state, but implies + * a full barrier so that the AP changes are visible before the control CPU proceeds. + */ +static inline void cpuhp_ap_update_sync_state(enum cpuhp_sync_state state) +{ + atomic_t *st = this_cpu_ptr(_state.ap_sync_state); + + (void)atomic_xchg(st, state); +} + +void __weak arch_cpuhp_sync_state_poll(void) { cpu_relax(); } + +static bool cpuhp_wait_for_sync_state(unsigned int cpu, enum cpuhp_sync_state state, + enum cpuhp_sync_state next_state)
[patch V4 18/37] x86/xen/hvm: Get rid of DEAD_FROZEN handling
From: Thomas Gleixner No point in this conditional voodoo. Un-initializing the lock mechanism is safe to be called unconditionally even if it was already invoked when the CPU died. Remove the invocation of xen_smp_intr_free() as that has been already cleaned up in xen_cpu_dead_hvm(). Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- arch/x86/xen/enlighten_hvm.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) --- a/arch/x86/xen/enlighten_hvm.c +++ b/arch/x86/xen/enlighten_hvm.c @@ -161,13 +161,12 @@ static int xen_cpu_up_prepare_hvm(unsign int rc = 0; /* -* This can happen if CPU was offlined earlier and -* offlining timed out in common_cpu_die(). +* If a CPU was offlined earlier and offlining timed out then the +* lock mechanism is still initialized. Uninit it unconditionally +* as it's safe to call even if already uninited. Interrupts and +* timer have already been handled in xen_cpu_dead_hvm(). */ - if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) { - xen_smp_intr_free(cpu); - xen_uninit_lock_cpu(cpu); - } + xen_uninit_lock_cpu(cpu); if (cpu_acpi_id(cpu) != U32_MAX) per_cpu(xen_vcpu_id, cpu) = cpu_acpi_id(cpu);
[patch V4 15/37] cpu/hotplug: Rework sparse_irq locking in bringup_cpu()
From: Thomas Gleixner There is no harm to hold sparse_irq lock until the upcoming CPU completes in cpuhp_online_idle(). This allows to remove cpu_online() synchronization from architecture code. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- V4: Amend comment about sparse irq lock - Peter Z. --- kernel/cpu.c | 34 -- 1 file changed, 24 insertions(+), 10 deletions(-) --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -558,7 +558,7 @@ static int cpuhp_kick_ap(int cpu, struct return ret; } -static int bringup_wait_for_ap(unsigned int cpu) +static int bringup_wait_for_ap_online(unsigned int cpu) { struct cpuhp_cpu_state *st = per_cpu_ptr(_state, cpu); @@ -579,15 +579,12 @@ static int bringup_wait_for_ap(unsigned */ if (!cpu_smt_allowed(cpu)) return -ECANCELED; - - if (st->target <= CPUHP_AP_ONLINE_IDLE) - return 0; - - return cpuhp_kick_ap(cpu, st, st->target); + return 0; } static int bringup_cpu(unsigned int cpu) { + struct cpuhp_cpu_state *st = per_cpu_ptr(_state, cpu); struct task_struct *idle = idle_thread_get(cpu); int ret; @@ -600,16 +597,33 @@ static int bringup_cpu(unsigned int cpu) /* * Some architectures have to walk the irq descriptors to * setup the vector space for the cpu which comes online. -* Prevent irq alloc/free across the bringup. +* +* Prevent irq alloc/free across the bringup by acquiring the +* sparse irq lock. Hold it until the upcoming CPU completes the +* startup in cpuhp_online_idle() which allows to avoid +* intermediate synchronization points in the architecture code. */ irq_lock_sparse(); /* Arch-specific enabling code. */ ret = __cpu_up(cpu, idle); - irq_unlock_sparse(); if (ret) - return ret; - return bringup_wait_for_ap(cpu); + goto out_unlock; + + ret = bringup_wait_for_ap_online(cpu); + if (ret) + goto out_unlock; + + irq_unlock_sparse(); + + if (st->target <= CPUHP_AP_ONLINE_IDLE) + return 0; + + return cpuhp_kick_ap(cpu, st, st->target); + +out_unlock: + irq_unlock_sparse(); + return ret; } static int finish_cpu(unsigned int cpu)
[patch V4 14/37] x86/smpboot: Remove cpu_callin_mask
From: Thomas Gleixner Now that TSC synchronization is SMP function call based, there is no reason to wait for the AP to set itself in cpu_callin_mask. The control CPU waits for the AP to set itself in the online mask anyway. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- V4: Rename smp_callin() to ap_starting() - Peter Z. --- arch/x86/kernel/smpboot.c | 74 +- 1 file changed, 15 insertions(+), 59 deletions(-) --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -104,7 +104,6 @@ EXPORT_PER_CPU_SYMBOL(cpu_info); /* All of these masks are initialized in setup_cpu_local_masks() */ static cpumask_var_t cpu_initialized_mask; static cpumask_var_t cpu_callout_mask; -static cpumask_var_t cpu_callin_mask; /* Representing CPUs for which sibling maps can be computed */ static cpumask_var_t cpu_sibling_setup_mask; @@ -161,38 +160,30 @@ static inline void smpboot_restore_warm_ } -/* - * Report back to the Boot Processor during boot time or to the caller processor - * during CPU online. - */ -static void smp_callin(void) +/* Run the next set of setup steps for the upcoming CPU */ +static void ap_starting(void) { - int cpuid; - - /* -* If waken up by an INIT in an 82489DX configuration -* cpu_callout_mask guarantees we don't get here before -* an INIT_deassert IPI reaches our local APIC, so it is -* now safe to touch our local APIC. -*/ - cpuid = smp_processor_id(); + int cpuid = smp_processor_id(); /* -* the boot CPU has finished the init stage and is spinning -* on callin_map until we finish. We are free to set up this -* CPU, first the APIC. (this is probably redundant on most -* boards) +* If woken up by an INIT in an 82489DX configuration +* cpu_callout_mask guarantees the CPU does not reach this point +* before an INIT_deassert IPI reaches the local APIC, so it is now +* safe to touch the local APIC. +* +* Set up this CPU, first the APIC, which is probably redundant on +* most boards. */ apic_ap_setup(); - /* Save our processor parameters. 
*/ + /* Save the processor parameters. */ smp_store_cpu_info(cpuid); /* * The topology information must be up to date before * notify_cpu_starting(). */ - set_cpu_sibling_map(raw_smp_processor_id()); + set_cpu_sibling_map(cpuid); ap_init_aperfmperf(); @@ -205,11 +196,6 @@ static void smp_callin(void) * state CPUHP_ONLINE. */ notify_cpu_starting(cpuid); - - /* -* Allow the master to continue. -*/ - cpumask_set_cpu(cpuid, cpu_callin_mask); } static void ap_calibrate_delay(void) @@ -268,12 +254,7 @@ static void notrace start_secondary(void rcu_cpu_starting(raw_smp_processor_id()); x86_cpuinit.early_percpu_clock_init(); - /* -* Sync point with wait_cpu_callin(). The AP doesn't wait here -* but just sets the bit to let the controlling CPU (BSP) know that -* it's got this far. -*/ - smp_callin(); + ap_starting(); /* Check TSC synchronization with the control CPU. */ check_tsc_sync_target(); @@ -1109,7 +1090,7 @@ static int wait_cpu_cpumask(unsigned int * and thus wait_for_master_cpu(), then set cpu_callout_mask to allow it * to proceed. The AP will then proceed past setting its 'callin' bit * and end up waiting in check_tsc_sync_target() until we reach - * do_wait_cpu_online() to tend to it. + * wait_cpu_online() to tend to it. */ static int wait_cpu_initialized(unsigned int cpu) { @@ -1124,20 +1105,7 @@ static int wait_cpu_initialized(unsigned } /* - * Bringup step three: Wait for the target AP to reach smp_callin(). - * The AP is not waiting for us here so we don't need to parallelise - * this step. Not entirely clear why we care about this, since we just - * proceed directly to TSC synchronization which is the next sync - * point with the AP anyway. - */ -static void wait_cpu_callin(unsigned int cpu) -{ - while (!cpumask_test_cpu(cpu, cpu_callin_mask)) - schedule(); -} - -/* - * Bringup step four: Wait for the target AP to reach set_cpu_online() in + * Bringup step three: Wait for the target AP to reach set_cpu_online() in * start_secondary(). 
*/ static void wait_cpu_online(unsigned int cpu) @@ -1167,14 +1135,6 @@ static int native_kick_ap(unsigned int c } /* -* Already booted CPU? -*/ - if (cpumask_test_cpu(cpu, cpu_callin_mask)) { - pr_debug("do_boot_cpu %d Already started\n", cpu); - return -ENOSYS; - } - - /* * Save current MTRR state in case it was changed since early boot * (e.g. by the ACPI SMI) to initialize new CPUs with MTRRs in sync: */ @@
[patch V4 11/37] x86/cpu/cacheinfo: Remove cpu_callout_mask dependency
From: Thomas Gleixner

cpu_callout_mask is used for the stop machine based MTRR/PAT init.

In preparation of moving the BP/AP synchronization to the core hotplug
code, use a private CPU mask for cacheinfo and manage it in the
starting/dying hotplug state.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/x86/kernel/cpu/cacheinfo.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -39,6 +39,8 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t
 /* Shared L2 cache maps */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
 
+static cpumask_var_t cpu_cacheinfo_mask;
+
 /* Kernel controls MTRR and/or PAT MSRs. */
 unsigned int memory_caching_control __ro_after_init;
 
@@ -1172,8 +1174,10 @@ void cache_bp_restore(void)
 		cache_cpu_init();
 }
 
-static int cache_ap_init(unsigned int cpu)
+static int cache_ap_online(unsigned int cpu)
 {
+	cpumask_set_cpu(cpu, cpu_cacheinfo_mask);
+
 	if (!memory_caching_control || get_cache_aps_delayed_init())
 		return 0;
 
@@ -1191,11 +1195,17 @@ static int cache_ap_init(unsigned int cp
 	 * lock to prevent MTRR entry changes
 	 */
 	stop_machine_from_inactive_cpu(cache_rendezvous_handler, NULL,
-				       cpu_callout_mask);
+				       cpu_cacheinfo_mask);
 
 	return 0;
 }
 
+static int cache_ap_offline(unsigned int cpu)
+{
+	cpumask_clear_cpu(cpu, cpu_cacheinfo_mask);
+	return 0;
+}
+
 /*
  * Delayed cache initialization for all AP's
 */
@@ -1210,9 +1220,12 @@ void cache_aps_init(void)
 
 static int __init cache_ap_register(void)
 {
+	zalloc_cpumask_var(&cpu_cacheinfo_mask, GFP_KERNEL);
+	cpumask_set_cpu(smp_processor_id(), cpu_cacheinfo_mask);
+
 	cpuhp_setup_state_nocalls(CPUHP_AP_CACHECTRL_STARTING,
 				  "x86/cachectrl:starting",
-				  cache_ap_init, NULL);
+				  cache_ap_online, cache_ap_offline);
 	return 0;
 }
-core_initcall(cache_ap_register);
+early_initcall(cache_ap_register);
[patch V4 17/37] x86/xen/smp_pv: Remove wait for CPU online
From: Thomas Gleixner

Now that the core code drops sparse_irq_lock after the idle thread
synchronized, it's pointless to wait for the AP to mark itself online.
Whether the control CPU runs in a wait loop or sleeps in the core code
waiting for the online operation to complete makes no difference.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/x86/xen/smp_pv.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -340,11 +340,11 @@ static int xen_pv_cpu_up(unsigned int cp
 
 	xen_pmu_init(cpu);
 
-	rc = HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL);
-	BUG_ON(rc);
-
-	while (cpu_report_state(cpu) != CPU_ONLINE)
-		HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
+	/*
+	 * Why is this a BUG? If the hypercall fails then everything can be
+	 * rolled back, no?
+	 */
+	BUG_ON(HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL));
 
 	return 0;
 }
[patch V4 12/37] x86/smpboot: Move synchronization masks to SMP boot code
From: Thomas Gleixner

The usage is in smpboot.c and not in the CPU initialization code. The
XEN_PV usage of cpu_callout_mask is obsolete as cpu_init() no longer waits
and cacheinfo has its own CPU mask now, so cpu_callout_mask can be made
static too.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/x86/include/asm/cpumask.h |  5 -
 arch/x86/kernel/cpu/common.c   | 17 -
 arch/x86/kernel/smpboot.c      | 16
 arch/x86/xen/smp_pv.c          |  3 ---
 4 files changed, 16 insertions(+), 25 deletions(-)

--- a/arch/x86/include/asm/cpumask.h
+++ b/arch/x86/include/asm/cpumask.h
@@ -4,11 +4,6 @@
 #ifndef __ASSEMBLY__
 #include <linux/cpumask.h>
 
-extern cpumask_var_t cpu_callin_mask;
-extern cpumask_var_t cpu_callout_mask;
-extern cpumask_var_t cpu_initialized_mask;
-extern cpumask_var_t cpu_sibling_setup_mask;
-
 extern void setup_cpu_local_masks(void);
 
 /*
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -67,14 +67,6 @@
 u32 elf_hwcap2 __read_mostly;
 
-/* all of these masks are initialized in setup_cpu_local_masks() */
-cpumask_var_t cpu_initialized_mask;
-cpumask_var_t cpu_callout_mask;
-cpumask_var_t cpu_callin_mask;
-
-/* representing cpus for which sibling maps can be computed */
-cpumask_var_t cpu_sibling_setup_mask;
-
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
 EXPORT_SYMBOL(smp_num_siblings);
 
@@ -169,15 +161,6 @@ static void ppin_init(struct cpuinfo_x86
 		clear_cpu_cap(c, info->feature);
 }
 
-/* correctly size the local cpu masks */
-void __init setup_cpu_local_masks(void)
-{
-	alloc_bootmem_cpumask_var(&cpu_initialized_mask);
-	alloc_bootmem_cpumask_var(&cpu_callin_mask);
-	alloc_bootmem_cpumask_var(&cpu_callout_mask);
-	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
-}
-
 static void default_init(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_64
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -101,6 +101,13 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map);
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
+/* All of these masks are initialized in setup_cpu_local_masks() */
+static cpumask_var_t cpu_initialized_mask;
+static cpumask_var_t cpu_callout_mask;
+static cpumask_var_t cpu_callin_mask;
+/* Representing CPUs for which sibling maps can be computed */
+static cpumask_var_t cpu_sibling_setup_mask;
+
 /* Logical package management. We might want to allocate that dynamically */
 unsigned int __max_logical_packages __read_mostly;
 EXPORT_SYMBOL(__max_logical_packages);
@@ -1548,6 +1555,15 @@ early_param("possible_cpus", _setup_poss
 		set_cpu_possible(i, true);
 }
 
+/* correctly size the local cpu masks */
+void __init setup_cpu_local_masks(void)
+{
+	alloc_bootmem_cpumask_var(&cpu_initialized_mask);
+	alloc_bootmem_cpumask_var(&cpu_callin_mask);
+	alloc_bootmem_cpumask_var(&cpu_callout_mask);
+	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 
 /* Recompute SMT state for all CPUs on offline */
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -254,15 +254,12 @@ cpu_initialize_context(unsigned int cpu,
 	struct desc_struct *gdt;
 	unsigned long gdt_mfn;
 
-	/* used to tell cpu_init() that it can proceed with initialization */
-	cpumask_set_cpu(cpu, cpu_callout_mask);
 	if (cpumask_test_and_set_cpu(cpu, xen_cpu_initialized_map))
 		return 0;
 
 	ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL);
 	if (ctxt == NULL) {
 		cpumask_clear_cpu(cpu, xen_cpu_initialized_map);
-		cpumask_clear_cpu(cpu, cpu_callout_mask);
 		return -ENOMEM;
 	}
[patch V4 13/37] x86/smpboot: Make TSC synchronization function call based
From: Thomas Gleixner

Spin-waiting on the control CPU until the AP reaches the TSC
synchronization is just a waste especially in the case that there is no
synchronization required.

As the synchronization has to run with interrupts disabled the control CPU
part can just be done from a SMP function call. The upcoming AP issues
that call async only in the case that synchronization is required.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/x86/include/asm/tsc.h |  2 --
 arch/x86/kernel/smpboot.c  | 20 +++-
 arch/x86/kernel/tsc_sync.c | 36 +++-
 3 files changed, 14 insertions(+), 44 deletions(-)

--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -55,12 +55,10 @@ extern bool tsc_async_resets;
 #ifdef CONFIG_X86_TSC
 extern bool tsc_store_and_check_tsc_adjust(bool bootcpu);
 extern void tsc_verify_tsc_adjust(bool resume);
-extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 #else
 static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return false; }
 static inline void tsc_verify_tsc_adjust(bool resume) { }
-static inline void check_tsc_sync_source(int cpu) { }
 static inline void check_tsc_sync_target(void) { }
 #endif
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -275,11 +275,7 @@ static void notrace start_secondary(void
 	 */
 	smp_callin();
 
-	/*
-	 * Check TSC synchronization with the control CPU, which will do
-	 * its part of this from wait_cpu_online(), making it an implicit
-	 * synchronization point.
-	 */
+	/* Check TSC synchronization with the control CPU. */
 	check_tsc_sync_target();
 
 	/*
@@ -1141,21 +1137,11 @@ static void wait_cpu_callin(unsigned int
 }
 
 /*
- * Bringup step four: Synchronize the TSC and wait for the target AP
- * to reach set_cpu_online() in start_secondary().
+ * Bringup step four: Wait for the target AP to reach set_cpu_online() in
+ * start_secondary().
 */
 static void wait_cpu_online(unsigned int cpu)
 {
-	unsigned long flags;
-
-	/*
-	 * Check TSC synchronization with the AP (keep irqs disabled
-	 * while doing so):
-	 */
-	local_irq_save(flags);
-	check_tsc_sync_source(cpu);
-	local_irq_restore(flags);
-
 	/*
 	 * Wait for the AP to mark itself online, so the core caller
 	 * can drop sparse_irq_lock.
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -245,7 +245,6 @@ bool tsc_store_and_check_tsc_adjust(bool
 */
 static atomic_t start_count;
 static atomic_t stop_count;
-static atomic_t skip_test;
 static atomic_t test_runs;
 
 /*
@@ -344,21 +343,14 @@ static inline unsigned int loop_timeout(
 }
 
 /*
- * Source CPU calls into this - it waits for the freshly booted
- * target CPU to arrive and then starts the measurement:
+ * The freshly booted CPU initiates this via an async SMP function call.
 */
-void check_tsc_sync_source(int cpu)
+static void check_tsc_sync_source(void *__cpu)
 {
+	unsigned int cpu = (unsigned long)__cpu;
 	int cpus = 2;
 
 	/*
-	 * No need to check if we already know that the TSC is not
-	 * synchronized or if we have no TSC.
-	 */
-	if (unsynchronized_tsc())
-		return;
-
-	/*
 	 * Set the maximum number of test runs to
 	 *  1 if the CPU does not provide the TSC_ADJUST MSR
 	 *  3 if the MSR is available, so the target can try to adjust
@@ -368,16 +360,9 @@ void check_tsc_sync_source(int cpu)
 	else
 		atomic_set(&test_runs, 3);
 retry:
-	/*
-	 * Wait for the target to start or to skip the test:
-	 */
-	while (atomic_read(&start_count) != cpus - 1) {
-		if (atomic_read(&skip_test) > 0) {
-			atomic_set(&skip_test, 0);
-			return;
-		}
+	/* Wait for the target to start. */
+	while (atomic_read(&start_count) != cpus - 1)
 		cpu_relax();
-	}
 
 	/*
 	 * Trigger the target to continue into the measurement too:
@@ -397,14 +382,14 @@ void check_tsc_sync_source(int cpu)
 
 	if (!nr_warps) {
 		atomic_set(&test_runs, 0);
 
-		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
+		pr_debug("TSC synchronization [CPU#%d -> CPU#%u]: passed\n",
 			smp_processor_id(), cpu);
 
 	} else if (atomic_dec_and_test(&test_runs) || random_warps) {
 		/* Force it to 0 if random warps brought us here */
 		atomic_set(&test_runs, 0);
 
-		pr_warn("TSC synchronization [CPU#%d -> CPU#%d]:\n",
+		pr_warn("TSC synchronization [CPU#%d -> CPU#%u]:\n",
 			smp_processor_id(), cpu);
 		pr_warn("Measured %Ld cycles TSC warp between CPUs, "
 			"turning off TSC clock.\n", max_warp);
@@ -457,11 +442,12
[patch V4 10/37] x86/smpboot: Get rid of cpu_init_secondary()
From: Thomas Gleixner The synchronization of the AP with the control CPU is a SMP boot problem and has nothing to do with cpu_init(). Open code cpu_init_secondary() in start_secondary() and move wait_for_master_cpu() into the SMP boot code. No functional change. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- arch/x86/include/asm/processor.h |1 - arch/x86/kernel/cpu/common.c | 27 --- arch/x86/kernel/smpboot.c| 24 +++- 3 files changed, 19 insertions(+), 33 deletions(-) --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -551,7 +551,6 @@ extern void switch_gdt_and_percpu_base(i extern void load_direct_gdt(int); extern void load_fixmap_gdt(int); extern void cpu_init(void); -extern void cpu_init_secondary(void); extern void cpu_init_exception_handling(void); extern void cr4_init(void); --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -2123,19 +2123,6 @@ static void dbg_restore_debug_regs(void) #define dbg_restore_debug_regs() #endif /* ! CONFIG_KGDB */ -static void wait_for_master_cpu(int cpu) -{ -#ifdef CONFIG_SMP - /* -* wait for ACK from master CPU before continuing -* with AP initialization -*/ - WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)); - while (!cpumask_test_cpu(cpu, cpu_callout_mask)) - cpu_relax(); -#endif -} - static inline void setup_getcpu(int cpu) { unsigned long cpudata = vdso_encode_cpunode(cpu, early_cpu_to_node(cpu)); @@ -2239,8 +2226,6 @@ void cpu_init(void) struct task_struct *cur = current; int cpu = raw_smp_processor_id(); - wait_for_master_cpu(cpu); - ucode_cpu_init(cpu); #ifdef CONFIG_NUMA @@ -2293,18 +2278,6 @@ void cpu_init(void) load_fixmap_gdt(cpu); } -#ifdef CONFIG_SMP -void cpu_init_secondary(void) -{ - /* -* Relies on the BP having set-up the IDT tables, which are loaded -* on this CPU in cpu_init_exception_handling(). 
-*/ - cpu_init_exception_handling(); - cpu_init(); -} -#endif - #ifdef CONFIG_MICROCODE_LATE_LOADING /** * store_cpu_caps() - Store a snapshot of CPU capabilities --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -220,6 +220,17 @@ static void ap_calibrate_delay(void) cpu_data(smp_processor_id()).loops_per_jiffy = loops_per_jiffy; } +static void wait_for_master_cpu(int cpu) +{ + /* +* Wait for release by control CPU before continuing with AP +* initialization. +*/ + WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)); + while (!cpumask_test_cpu(cpu, cpu_callout_mask)) + cpu_relax(); +} + /* * Activate a secondary processor. */ @@ -237,13 +248,16 @@ static void notrace start_secondary(void load_cr3(swapper_pg_dir); __flush_tlb_all(); #endif + cpu_init_exception_handling(); + /* -* Sync point with wait_cpu_initialized(). Before proceeding through -* cpu_init(), the AP will call wait_for_master_cpu() which sets its -* own bit in cpu_initialized_mask and then waits for the BSP to set -* its bit in cpu_callout_mask to release it. +* Sync point with wait_cpu_initialized(). Sets AP in +* cpu_initialized_mask and then waits for the control CPU +* to release it. */ - cpu_init_secondary(); + wait_for_master_cpu(raw_smp_processor_id()); + + cpu_init(); rcu_cpu_starting(raw_smp_processor_id()); x86_cpuinit.early_percpu_clock_init();
[patch V4 07/37] x86/smpboot: Restrict soft_restart_cpu() to SEV
From: Thomas Gleixner Now that the CPU0 hotplug cruft is gone, the only user is AMD SEV. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- arch/x86/kernel/callthunks.c |2 +- arch/x86/kernel/head_32.S| 14 -- arch/x86/kernel/head_64.S|2 +- 3 files changed, 2 insertions(+), 16 deletions(-) --- a/arch/x86/kernel/callthunks.c +++ b/arch/x86/kernel/callthunks.c @@ -133,7 +133,7 @@ static bool skip_addr(void *dest) /* Accounts directly */ if (dest == ret_from_fork) return true; -#ifdef CONFIG_HOTPLUG_CPU +#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_AMD_MEM_ENCRYPT) if (dest == soft_restart_cpu) return true; #endif --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -138,20 +138,6 @@ SYM_CODE_START(startup_32) jmp .Ldefault_entry SYM_CODE_END(startup_32) -#ifdef CONFIG_HOTPLUG_CPU -/* - * Entry point for soft restart of a CPU. Invoked from xxx_play_dead() for - * restarting the boot CPU or for restarting SEV guest CPUs after CPU hot - * unplug. Everything is set up already except the stack. - */ -SYM_FUNC_START(soft_restart_cpu) - movl initial_stack, %ecx - movl %ecx, %esp - call *(initial_code) -1: jmp 1b -SYM_FUNC_END(soft_restart_cpu) -#endif - /* * Non-boot CPU entry point; entered from trampoline.S * We can't lgdt here, because lgdt itself uses a data segment, but --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -375,7 +375,7 @@ SYM_CODE_END(secondary_startup_64) #include "verify_cpu.S" #include "sev_verify_cbit.S" -#ifdef CONFIG_HOTPLUG_CPU +#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_AMD_MEM_ENCRYPT) /* * Entry point for soft restart of a CPU. Invoked from xxx_play_dead() for * restarting the boot CPU or for restarting SEV guest CPUs after CPU hot
[patch V4 08/37] x86/smpboot: Remove unnecessary barrier()
Peter stumbled over the barrier() after the invocation of smp_callin() in
start_secondary():

  "...this barrier() and it's comment seem weird vs smp_callin(). That
   function ends with an atomic bitop (it has to, at the very least it must
   not be weaker than store-release) but also has an explicit wmb() to
   order setup vs CPU_STARTING.

   There is no way the smp_processor_id() referred to in this comment can
   land before cpu_init() even without the barrier()."

The barrier() along with the comment was added in 2003 with commit
d8f19f2cac70 ("[PATCH] x86-64 merge") in the history tree. One of those
well documented combo patches of that time which changes world and some
more.

The context back then was:

	/*
	 * Dont put anything before smp_callin(), SMP
	 * booting is too fragile that we want to limit the
	 * things done here to the most necessary things.
	 */
	cpu_init();
	smp_callin();

+	/* otherwise gcc will move up smp_processor_id before the cpu_init */
+	barrier();

	Dprintk("cpu %d: waiting for commence\n", smp_processor_id());

Even back in 2003 the compiler was not allowed to reorder that
smp_processor_id() invocation before the cpu_init() function call.
Especially not as smp_processor_id() resolved to:

  asm volatile("movl %%gs:%c1,%0":"=r" (ret__):"i"(pda_offset(field)):"memory");

There is no trace of this change in any mailing list archive including the
back then official x86_64 list disc...@x86-64.org, which would explain the
problem this change solved.

The debug prints are gone by now and the only smp_processor_id()
invocation today is farther down in start_secondary() after locking
vector_lock which itself prevents reordering.

Even if the compiler would be allowed to reorder this, the code would
still be correct as GSBASE is set up early in the assembly code and is
valid when the CPU reaches start_secondary(), while the code at the time
when this barrier was added did the GSBASE setup in cpu_init().

As the barrier has zero value, remove it.

Reported-by: Peter Zijlstra
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20230509100421.gu83...@hirez.programming.kicks-ass.net
---
V4: New patch
---
 arch/x86/kernel/smpboot.c | 2 --
 1 file changed, 2 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -238,8 +238,6 @@ static void notrace start_secondary(void
 	x86_cpuinit.early_percpu_clock_init();
 	smp_callin();
 
-	/* otherwise gcc will move up smp_processor_id before the cpu_init */
-	barrier();
 
 	/* Check TSC synchronization with the control CPU: */
 	check_tsc_sync_target();
[patch V4 09/37] x86/smpboot: Split up native_cpu_up() into separate phases and document them
From: David Woodhouse There are four logical parts to what native_cpu_up() does on the BSP (or on the controlling CPU for a later hotplug): 1) Wake the AP by sending the INIT/SIPI/SIPI sequence. 2) Wait for the AP to make it as far as wait_for_master_cpu() which sets that CPU's bit in cpu_initialized_mask, then sets the bit in cpu_callout_mask to let the AP proceed through cpu_init(). 3) Wait for the AP to finish cpu_init() and get as far as the smp_callin() call, which sets that CPU's bit in cpu_callin_mask. 4) Perform the TSC synchronization and wait for the AP to actually mark itself online in cpu_online_mask. In preparation to allow these phases to operate in parallel on multiple APs, split them out into separate functions and document the interactions a little more clearly in both the BP and AP code paths. No functional change intended. Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- arch/x86/kernel/smpboot.c | 184 +- 1 file changed, 119 insertions(+), 65 deletions(-) --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -193,6 +193,10 @@ static void smp_callin(void) wmb(); + /* +* This runs the AP through all the cpuhp states to its target +* state CPUHP_ONLINE. +*/ notify_cpu_starting(cpuid); /* @@ -233,12 +237,28 @@ static void notrace start_secondary(void load_cr3(swapper_pg_dir); __flush_tlb_all(); #endif + /* +* Sync point with wait_cpu_initialized(). Before proceeding through +* cpu_init(), the AP will call wait_for_master_cpu() which sets its +* own bit in cpu_initialized_mask and then waits for the BSP to set +* its bit in cpu_callout_mask to release it. +*/ cpu_init_secondary(); rcu_cpu_starting(raw_smp_processor_id()); x86_cpuinit.early_percpu_clock_init(); + + /* +* Sync point with wait_cpu_callin(). The AP doesn't wait here +* but just sets the bit to let the controlling CPU (BSP) know that +* it's got this far. 
+*/ smp_callin(); - /* Check TSC synchronization with the control CPU: */ + /* +* Check TSC synchronization with the control CPU, which will do +* its part of this from wait_cpu_online(), making it an implicit +* synchronization point. +*/ check_tsc_sync_target(); /* @@ -257,6 +277,7 @@ static void notrace start_secondary(void * half valid vector space. */ lock_vector_lock(); + /* Sync point with do_wait_cpu_online() */ set_cpu_online(smp_processor_id(), true); lapic_online(); unlock_vector_lock(); @@ -979,17 +1000,13 @@ int common_cpu_up(unsigned int cpu, stru /* * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad * (ie clustered apic addressing mode), this is a LOGICAL apic ID. - * Returns zero if CPU booted OK, else error code from + * Returns zero if startup was successfully sent, else error code from * ->wakeup_secondary_cpu. */ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle) { - /* start_ip had better be page-aligned! */ unsigned long start_ip = real_mode_header->trampoline_start; - unsigned long boot_error = 0; - unsigned long timeout; - #ifdef CONFIG_X86_64 /* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */ if (apic->wakeup_secondary_cpu_64) @@ -1046,60 +1063,89 @@ static int do_boot_cpu(int apicid, int c * - Use an INIT boot APIC message */ if (apic->wakeup_secondary_cpu_64) - boot_error = apic->wakeup_secondary_cpu_64(apicid, start_ip); + return apic->wakeup_secondary_cpu_64(apicid, start_ip); else if (apic->wakeup_secondary_cpu) - boot_error = apic->wakeup_secondary_cpu(apicid, start_ip); - else - boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip); + return apic->wakeup_secondary_cpu(apicid, start_ip); - if (!boot_error) { - /* -* Wait 10s total for first sign of life from AP -*/ - boot_error = -1; - timeout = jiffies + 10*HZ; - while (time_before(jiffies, timeout)) { - if (cpumask_test_cpu(cpu, cpu_initialized_mask)) { - /* -* Tell AP to proceed with initialization -*/ - 
cpumask_set_cpu(cpu, cpu_callout_mask); - boot_error = 0; - break; - } - schedule(); - } - } + return wakeup_secondary_cpu_via_init(apicid, start_ip); +} -
[patch V4 05/37] x86/topology: Remove CPU0 hotplug option
From: Thomas Gleixner This was introduced together with commit e1c467e69040 ("x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI") to eventually support physical hotplug of CPU0: "We'll change this code in the future to wake up hard offlined CPU0 if real platform and request are available." 11 years later this has not happened and physical hotplug is not officially supported. Remove the cruft. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- Documentation/admin-guide/kernel-parameters.txt | 14 --- Documentation/core-api/cpu_hotplug.rst | 13 --- arch/x86/Kconfig| 43 -- arch/x86/include/asm/cpu.h |3 arch/x86/kernel/topology.c | 98 arch/x86/power/cpu.c| 37 - 6 files changed, 6 insertions(+), 202 deletions(-) --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -818,20 +818,6 @@ Format: ,,,[,] - cpu0_hotplug[X86] Turn on CPU0 hotplug feature when - CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off. - Some features depend on CPU0. Known dependencies are: - 1. Resume from suspend/hibernate depends on CPU0. - Suspend/hibernate will fail if CPU0 is offline and you - need to online CPU0 before suspend/hibernate. - 2. PIC interrupts also depend on CPU0. CPU0 can't be - removed if a PIC interrupt is detected. - It's said poweroff/reboot may depend on CPU0 on some - machines although I haven't seen such issues so far - after CPU0 is offline on a few tested machines. - If the dependencies are under your control, you can - turn on cpu0_hotplug. - cpuidle.off=1 [CPU_IDLE] disable the cpuidle sub-system --- a/Documentation/core-api/cpu_hotplug.rst +++ b/Documentation/core-api/cpu_hotplug.rst @@ -127,17 +127,8 @@ Once the CPU is shutdown, it will be rem $ echo 1 > /sys/devices/system/cpu/cpu4/online smpboot: Booting Node 0 Processor 4 APIC 0x1 -The CPU is usable again. This should work on all CPUs. CPU0 is often special -and excluded from CPU hotplug. 
On X86 the kernel option -*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to -shutdown CPU0. Alternatively the kernel command option *cpu0_hotplug* can be -used. Some known dependencies of CPU0: - -* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline. -* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected. - -Please let Fenghua Yu know if you find any dependencies -on CPU0. +The CPU is usable again. This should work on all CPUs, but CPU0 is often special +and excluded from CPU hotplug. The CPU hotplug coordination --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2305,49 +2305,6 @@ config HOTPLUG_CPU def_bool y depends on SMP -config BOOTPARAM_HOTPLUG_CPU0 - bool "Set default setting of cpu0_hotpluggable" - depends on HOTPLUG_CPU - help - Set whether default state of cpu0_hotpluggable is on or off. - - Say Y here to enable CPU0 hotplug by default. If this switch - is turned on, there is no need to give cpu0_hotplug kernel - parameter and the CPU0 hotplug feature is enabled by default. - - Please note: there are two known CPU0 dependencies if you want - to enable the CPU0 hotplug feature either by this switch or by - cpu0_hotplug kernel parameter. - - First, resume from hibernate or suspend always starts from CPU0. - So hibernate and suspend are prevented if CPU0 is offline. - - Second dependency is PIC interrupts always go to CPU0. CPU0 can not - offline if any interrupt can not migrate out of CPU0. There may - be other CPU0 dependencies. - - Please make sure the dependencies are under your control before - you enable this feature. - - Say N if you don't want to enable CPU0 hotplug feature by default. - You still can enable the CPU0 hotplug feature at boot by kernel - parameter cpu0_hotplug. 
- -config DEBUG_HOTPLUG_CPU0 - def_bool n - prompt "Debug CPU0 hotplug" - depends on HOTPLUG_CPU - help - Enabling this option offlines CPU0 (if CPU0 can be offlined) as - soon as possible and boots up userspace with CPU0 offlined. User - can online CPU0 back after boot time. - - To debug CPU0 hotplug, you need to enable CPU0 offline/online - feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during - compilation or giving
[patch V4 04/37] x86/smpboot: Rename start_cpu0() to soft_restart_cpu()
From: Thomas Gleixner This is used in the SEV play_dead() implementation to re-online CPUs. But that has nothing to do with CPU0. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- arch/x86/include/asm/cpu.h |2 +- arch/x86/kernel/callthunks.c |2 +- arch/x86/kernel/head_32.S| 10 +- arch/x86/kernel/head_64.S| 10 +- arch/x86/kernel/sev.c|2 +- 5 files changed, 13 insertions(+), 13 deletions(-) --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -30,7 +30,7 @@ struct x86_cpu { #ifdef CONFIG_HOTPLUG_CPU extern int arch_register_cpu(int num); extern void arch_unregister_cpu(int); -extern void start_cpu0(void); +extern void soft_restart_cpu(void); #ifdef CONFIG_DEBUG_HOTPLUG_CPU0 extern int _debug_hotplug_cpu(int cpu, int action); #endif --- a/arch/x86/kernel/callthunks.c +++ b/arch/x86/kernel/callthunks.c @@ -134,7 +134,7 @@ static bool skip_addr(void *dest) if (dest == ret_from_fork) return true; #ifdef CONFIG_HOTPLUG_CPU - if (dest == start_cpu0) + if (dest == soft_restart_cpu) return true; #endif #ifdef CONFIG_FUNCTION_TRACER --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -140,16 +140,16 @@ SYM_CODE_END(startup_32) #ifdef CONFIG_HOTPLUG_CPU /* - * Boot CPU0 entry point. It's called from play_dead(). Everything has been set - * up already except stack. We just set up stack here. Then call - * start_secondary(). + * Entry point for soft restart of a CPU. Invoked from xxx_play_dead() for + * restarting the boot CPU or for restarting SEV guest CPUs after CPU hot + * unplug. Everything is set up already except the stack. */ -SYM_FUNC_START(start_cpu0) +SYM_FUNC_START(soft_restart_cpu) movl initial_stack, %ecx movl %ecx, %esp call *(initial_code) 1: jmp 1b -SYM_FUNC_END(start_cpu0) +SYM_FUNC_END(soft_restart_cpu) #endif /* --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -377,11 +377,11 @@ SYM_CODE_END(secondary_startup_64) #ifdef CONFIG_HOTPLUG_CPU /* - * Boot CPU0 entry point. It's called from play_dead(). 
Everything has been set - * up already except stack. We just set up stack here. Then call - * start_secondary() via .Ljump_to_C_code. + * Entry point for soft restart of a CPU. Invoked from xxx_play_dead() for + * restarting the boot CPU or for restarting SEV guest CPUs after CPU hot + * unplug. Everything is set up already except the stack. */ -SYM_CODE_START(start_cpu0) +SYM_CODE_START(soft_restart_cpu) ANNOTATE_NOENDBR UNWIND_HINT_END_OF_STACK @@ -390,7 +390,7 @@ SYM_CODE_START(start_cpu0) movqTASK_threadsp(%rcx), %rsp jmp .Ljump_to_C_code -SYM_CODE_END(start_cpu0) +SYM_CODE_END(soft_restart_cpu) #endif #ifdef CONFIG_AMD_MEM_ENCRYPT --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -1328,7 +1328,7 @@ static void sev_es_play_dead(void) * If we get here, the VCPU was woken up again. Jump to CPU * startup code to get it back online. */ - start_cpu0(); + soft_restart_cpu(); } #else /* CONFIG_HOTPLUG_CPU */ #define sev_es_play_dead native_play_dead
[patch V4 02/37] cpu/hotplug: Mark arch_disable_smp_support() and bringup_nonboot_cpus() __init
From: Thomas Gleixner No point in keeping them around. Signed-off-by: Thomas Gleixner Tested-by: Michael Kelley --- arch/x86/kernel/smpboot.c |4 ++-- kernel/cpu.c |2 +- kernel/smp.c |2 +- 3 files changed, 4 insertions(+), 4 deletions(-) --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1269,9 +1269,9 @@ int native_cpu_up(unsigned int cpu, stru } /** - * arch_disable_smp_support() - disables SMP support for x86 at runtime + * arch_disable_smp_support() - Disables SMP support for x86 at boottime */ -void arch_disable_smp_support(void) +void __init arch_disable_smp_support(void) { disable_ioapic_support(); } --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1502,7 +1502,7 @@ int bringup_hibernate_cpu(unsigned int s return 0; } -void bringup_nonboot_cpus(unsigned int setup_max_cpus) +void __init bringup_nonboot_cpus(unsigned int setup_max_cpus) { unsigned int cpu; --- a/kernel/smp.c +++ b/kernel/smp.c @@ -892,7 +892,7 @@ EXPORT_SYMBOL(setup_max_cpus); * SMP mode to . */ -void __weak arch_disable_smp_support(void) { } +void __weak __init arch_disable_smp_support(void) { } static int __init nosmp(char *str) {
[patch V4 06/37] x86/smpboot: Remove the CPU0 hotplug kludge
From: Thomas Gleixner

This was introduced with commit e1c467e69040 ("x86, hotplug: Wake up CPU0
via NMI instead of INIT, SIPI, SIPI") to eventually support physical
hotplug of CPU0:

  "We'll change this code in the future to wake up hard offlined CPU0 if
   real platform and request are available."

11 years later this has not happened and physical hotplug is not officially
supported. Remove the cruft.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/x86/include/asm/apic.h   |   1
 arch/x86/include/asm/smp.h    |   1
 arch/x86/kernel/smpboot.c     | 170 +++---
 drivers/acpi/processor_idle.c |   4
 4 files changed, 14 insertions(+), 162 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -377,7 +377,6 @@ extern struct apic *__apicdrivers[], *__
 * APIC functionality to boot other CPUs - only used on SMP:
 */
 #ifdef CONFIG_SMP
-extern int wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip);
 extern int lapic_can_unplug_cpu(void);
 #endif
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -130,7 +130,6 @@ void native_play_dead(void);
 void play_dead_common(void);
 void wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
-void cond_wakeup_cpu0(void);

 void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -216,9 +216,6 @@ static void ap_calibrate_delay(void)
 	cpu_data(smp_processor_id()).loops_per_jiffy = loops_per_jiffy;
 }

-static int cpu0_logical_apicid;
-static int enable_start_cpu0;
-
 /*
  * Activate a secondary processor.
 */
@@ -241,8 +238,6 @@ static void notrace start_secondary(void
 	x86_cpuinit.early_percpu_clock_init();
 	smp_callin();

-	enable_start_cpu0 = 0;
-
 	/* otherwise gcc will move up smp_processor_id before the cpu_init */
 	barrier();
 	/* Check TSC synchronization with the control CPU: */
@@ -410,7 +405,7 @@ void smp_store_cpu_info(int id)
 	c->cpu_index = id;
 	/*
 	 * During boot time, CPU0 has this setup already. Save the info when
-	 * bringing up AP or offlined CPU0.
+	 * bringing up an AP.
 	 */
 	identify_secondary_cpu(c);
 	c->initialized = true;
@@ -807,51 +802,14 @@ static void __init smp_quirk_init_udelay
 }

 /*
- * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal
- * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
- * won't ... remember to clear down the APIC, etc later.
- */
-int
-wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
-{
-	u32 dm = apic->dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
-	unsigned long send_status, accept_status = 0;
-	int maxlvt;
-
-	/* Target chip */
-	/* Boot on the stack */
-	/* Kick the second */
-	apic_icr_write(APIC_DM_NMI | dm, apicid);
-
-	pr_debug("Waiting for send to finish...\n");
-	send_status = safe_apic_wait_icr_idle();
-
-	/*
-	 * Give the other CPU some time to accept the IPI.
-	 */
-	udelay(200);
-	if (APIC_INTEGRATED(boot_cpu_apic_version)) {
-		maxlvt = lapic_get_maxlvt();
-		if (maxlvt > 3)	/* Due to the Pentium erratum 3AP. */
-			apic_write(APIC_ESR, 0);
-		accept_status = (apic_read(APIC_ESR) & 0xEF);
-	}
-	pr_debug("NMI sent\n");
-
-	if (send_status)
-		pr_err("APIC never delivered???\n");
-	if (accept_status)
-		pr_err("APIC delivery error (%lx)\n", accept_status);
-
-	return (send_status | accept_status);
-}
-
-static int
-wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ */
+static int wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
 {
 	unsigned long send_status = 0, accept_status = 0;
 	int maxlvt, num_starts, j;

+	preempt_disable();
 	maxlvt = lapic_get_maxlvt();

 	/*
@@ -957,6 +915,7 @@ wakeup_secondary_cpu_via_init(int phys_a
 	if (accept_status)
 		pr_err("APIC delivery error (%lx)\n", accept_status);

+	preempt_enable();
 	return (send_status | accept_status);
 }

@@ -997,67 +956,6 @@ static void announce_cpu(int cpu, int ap
 		node, cpu, apicid);
 }

-static int wakeup_cpu0_nmi(unsigned int cmd, struct pt_regs *regs)
-{
-	int cpu;
-
-	cpu = smp_processor_id();
-	if (cpu == 0 && !cpu_online(cpu) && enable_start_cpu0)
-		return NMI_HANDLED;
-
-	return NMI_DONE;
-}
-
-/*
- * Wake up AP by INIT, INIT, STARTUP sequence.
- *
- * Instead of waiting for STARTUP after INITs, BSP will execute the BIOS
- * boot-strap code which is not a desired behavior for
[patch V4 03/37] x86/smpboot: Avoid pointless delay calibration if TSC is synchronized
From: Thomas Gleixner

When TSC is synchronized across sockets then there is no reason to
calibrate the delay for the first CPU which comes up on a socket. Just
reuse the existing calibration value.

This removes 100ms pointlessly wasted time from CPU hotplug per socket.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/x86/kernel/smpboot.c | 40 +---
 arch/x86/kernel/tsc.c     | 20
 2 files changed, 41 insertions(+), 19 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -178,28 +178,17 @@ static void smp_callin(void)
 	 */
 	apic_ap_setup();

-	/*
-	 * Save our processor parameters. Note: this information
-	 * is needed for clock calibration.
-	 */
+	/* Save our processor parameters. */
 	smp_store_cpu_info(cpuid);

 	/*
 	 * The topology information must be up to date before
-	 * calibrate_delay() and notify_cpu_starting().
+	 * notify_cpu_starting().
 	 */
 	set_cpu_sibling_map(raw_smp_processor_id());

 	ap_init_aperfmperf();

-	/*
-	 * Get our bogomips.
-	 * Update loops_per_jiffy in cpu_data. Previous call to
-	 * smp_store_cpu_info() stored a value that is close but not as
-	 * accurate as the value just calculated.
-	 */
-	calibrate_delay();
-	cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy;
-
 	pr_debug("Stack at about %p\n", &cpuid);

 	wmb();
@@ -212,8 +201,24 @@ static void smp_callin(void)
 	cpumask_set_cpu(cpuid, cpu_callin_mask);
 }

+static void ap_calibrate_delay(void)
+{
+	/*
+	 * Calibrate the delay loop and update loops_per_jiffy in cpu_data.
+	 * smp_store_cpu_info() stored a value that is close but not as
+	 * accurate as the value just calculated.
+	 *
+	 * As this is invoked after the TSC synchronization check,
+	 * calibrate_delay_is_known() will skip the calibration routine
+	 * when TSC is synchronized across sockets.
+	 */
+	calibrate_delay();
+	cpu_data(smp_processor_id()).loops_per_jiffy = loops_per_jiffy;
+}
+
 static int cpu0_logical_apicid;
 static int enable_start_cpu0;
+
 /*
  * Activate a secondary processor.
 */
@@ -240,10 +245,15 @@ static void notrace start_secondary(void
 	/* otherwise gcc will move up smp_processor_id before the cpu_init */
 	barrier();

+	/* Check TSC synchronization with the control CPU: */
+	check_tsc_sync_target();
+
 	/*
-	 * Check TSC synchronization with the boot CPU:
+	 * Calibrate the delay loop after the TSC synchronization check.
+	 * This allows to skip the calibration when TSC is synchronized
+	 * across sockets.
 	 */
-	check_tsc_sync_target();
+	ap_calibrate_delay();

 	speculative_store_bypass_ht_init();
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1598,10 +1598,7 @@ void __init tsc_init(void)

 #ifdef CONFIG_SMP
 /*
- * If we have a constant TSC and are using the TSC for the delay loop,
- * we can skip clock calibration if another cpu in the same socket has already
- * been calibrated. This assumes that CONSTANT_TSC applies to all
- * cpus in the socket - this should be a safe assumption.
+ * Check whether existing calibration data can be reused.
 */
 unsigned long calibrate_delay_is_known(void)
 {
@@ -1609,6 +1606,21 @@ unsigned long calibrate_delay_is_known(v
 	int constant_tsc = cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC);
 	const struct cpumask *mask = topology_core_cpumask(cpu);

+	/*
+	 * If TSC has constant frequency and TSC is synchronized across
+	 * sockets then reuse CPU0 calibration.
+	 */
+	if (constant_tsc && !tsc_unstable)
+		return cpu_data(0).loops_per_jiffy;
+
+	/*
+	 * If TSC has constant frequency and TSC is not synchronized across
+	 * sockets and this is not the first CPU in the socket, then reuse
+	 * the calibration value of an already online CPU on that socket.
+	 *
+	 * This assumes that CONSTANT_TSC is consistent for all CPUs in a
+	 * socket.
+	 */
 	if (!constant_tsc || !mask)
 		return 0;
[patch V4 01/37] x86/smpboot: Cleanup topology_phys_to_logical_pkg()/die()
From: Thomas Gleixner

Make topology_phys_to_logical_pkg_die() static as it's only used in
smpboot.c and fixup the kernel-doc warnings for both functions.

Signed-off-by: Thomas Gleixner
Tested-by: Michael Kelley
---
 arch/x86/include/asm/topology.h |  3 ---
 arch/x86/kernel/smpboot.c       | 10 ++
 2 files changed, 6 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -139,7 +139,6 @@ static inline int topology_max_smt_threa
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
 int topology_update_die_map(unsigned int dieid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
-int topology_phys_to_logical_die(unsigned int die, unsigned int cpu);
 bool topology_is_primary_thread(unsigned int cpu);
 bool topology_smt_supported(void);
 #else
@@ -149,8 +148,6 @@ topology_update_package_map(unsigned int
 static inline int topology_update_die_map(unsigned int dieid, unsigned int cpu) { return 0; }
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
-static inline int topology_phys_to_logical_die(unsigned int die,
-		unsigned int cpu) { return 0; }
 static inline int topology_max_die_per_package(void) { return 1; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -288,6 +288,7 @@ bool topology_smt_supported(void)

 /**
 * topology_phys_to_logical_pkg - Map a physical package id to a logical
+ * @phys_pkg:	The physical package id to map
 *
 * Returns logical package id or -1 if not found
 */
@@ -304,15 +305,17 @@ int topology_phys_to_logical_pkg(unsigne
 	return -1;
 }
 EXPORT_SYMBOL(topology_phys_to_logical_pkg);
+
 /**
 * topology_phys_to_logical_die - Map a physical die id to logical
+ * @die_id:	The physical die id to map
+ * @cur_cpu:	The CPU for which the mapping is done
 *
 * Returns logical die id or -1 if not found
 */
-int topology_phys_to_logical_die(unsigned int die_id, unsigned int cur_cpu)
+static int topology_phys_to_logical_die(unsigned int die_id, unsigned int cur_cpu)
 {
-	int cpu;
-	int proc_id = cpu_data(cur_cpu).phys_proc_id;
+	int cpu, proc_id = cpu_data(cur_cpu).phys_proc_id;

 	for_each_possible_cpu(cpu) {
 		struct cpuinfo_x86 *c = &cpu_data(cpu);
@@ -323,7 +326,6 @@ int topology_phys_to_logical_die(unsigne
 	}
 	return -1;
 }
-EXPORT_SYMBOL(topology_phys_to_logical_die);

 /**
 * topology_update_package_map - Update the physical to logical package map
[patch V4 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup
Hi!

This is version 4 of the reworked parallel bringup series. Version 3 can be
found here:

   https://lore.kernel.org/lkml/20230508181633.089804...@linutronix.de

This is just a reiteration to address the following details:

  1) Address review feedback (Peter Zijlstra)

  2) Fix a MIPS related build problem (0day)

Other than that there are no changes and the other details are all the same
as in V3 and V2.

It's also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hotplug

Diff to V3 below.

Thanks,

	tglx
---
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index f5e0f4235746..90c71d800b59 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -690,7 +690,7 @@ void flush_tlb_one(unsigned long vaddr)
 EXPORT_SYMBOL(flush_tlb_page);
 EXPORT_SYMBOL(flush_tlb_one);

-#ifdef CONFIG_HOTPLUG_CPU
+#ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
 void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
 	if (mp_ops->cleanup_dead_cpu)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 0438802031c3..9cd77d319555 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -290,8 +290,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	/* APIC ID not found in the table. Drop the trampoline lock and bail. */
 	movq	trampoline_lock(%rip), %rax
-	lock
-	btrl	$0, (%rax)
+	movl	$0, (%rax)

1:	cli
	hlt
@@ -320,8 +319,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
	movq	trampoline_lock(%rip), %rax
	testq	%rax, %rax
	jz	.Lsetup_gdt
-	lock
-	btrl	$0, (%rax)
+	movl	$0, (%rax)
.Lsetup_gdt:
	/*
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5caf4897b507..660709e94823 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -161,31 +161,28 @@ static inline void smpboot_restore_warm_reset_vector(void)
 }

-/*
- * Report back to the Boot Processor during boot time or to the caller processor
- * during CPU online.
- */
-static void smp_callin(void)
+/* Run the next set of setup steps for the upcoming CPU */
+static void ap_starting(void)
 {
 	int cpuid = smp_processor_id();

 	/*
-	 * If waken up by an INIT in an 82489DX configuration the alive
-	 * synchronization guarantees we don't get here before an
-	 * INIT_deassert IPI reaches our local APIC, so it is now safe to
-	 * touch our local APIC.
+	 * If woken up by an INIT in an 82489DX configuration the alive
+	 * synchronization guarantees that the CPU does not reach this
+	 * point before an INIT_deassert IPI reaches the local APIC, so it
+	 * is now safe to touch the local APIC.
 	 *
 	 * Set up this CPU, first the APIC, which is probably redundant on
 	 * most boards.
 	 */
 	apic_ap_setup();

-	/* Save our processor parameters. */
+	/* Save the processor parameters. */
 	smp_store_cpu_info(cpuid);

 	/*
 	 * The topology information must be up to date before
-	 * calibrate_delay() and notify_cpu_starting().
+	 * notify_cpu_starting().
 	 */
 	set_cpu_sibling_map(cpuid);

@@ -197,7 +194,7 @@ static void smp_callin(void)
 	/*
 	 * This runs the AP through all the cpuhp states to its target
-	 * state (CPUHP_ONLINE in the case of serial bringup).
+	 * state CPUHP_ONLINE.
 	 */
 	notify_cpu_starting(cpuid);
 }
@@ -274,10 +271,7 @@ static void notrace start_secondary(void *unused)
 	rcu_cpu_starting(raw_smp_processor_id());
 	x86_cpuinit.early_percpu_clock_init();

-	smp_callin();
-
-	/* Otherwise gcc will move up smp_processor_id() before cpu_init() */
-	barrier();
+	ap_starting();

 	/* Check TSC synchronization with the control CPU. */
 	check_tsc_sync_target();
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index 2dfb1c400167..c6de4deec746 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -40,17 +40,13 @@
 	.macro LOAD_REALMODE_ESP
 	/*
	 * Make sure only one CPU fiddles with the realmode stack
-	*/
+	 */
.Llock_rm\@:
-	btl	$0, tr_lock
-	jnc	2f
-	pause
-	jmp	.Llock_rm\@
+	lock btsl	$0, tr_lock
+	jnc	2f
+	pause
+	jmp	.Llock_rm\@
2:
-	lock
-	btsl	$0, tr_lock
-	jc	.Llock_rm\@
-
	# Setup stack
	movl	$rm_stack_end, %esp
	.endm
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 60b4093fae9e..005f863a3d2b 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -294,14 +294,14 @@ enum cpuhp_sync_state {
 * cpuhp_ap_update_sync_state - Update synchronization state during bringup/teardown
 * @state: The
Re: [PATCH v2 5/8] pci/arm: Use iommu_add_dt_pci_device()
On 5/12/23 03:25, Jan Beulich wrote: > On 11.05.2023 21:16, Stewart Hildebrand wrote: >> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, >> pdev->domain = NULL; >> goto out; >> } >> +#ifdef CONFIG_HAS_DEVICE_TREE >> +ret = iommu_add_dt_pci_device(pdev); >> +if ( ret < 0 ) >> +{ >> +printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret); >> +goto out; >> +} >> +#endif >> ret = iommu_add_device(pdev); > > Hmm, am I misremembering that in the earlier patch you had #else to > invoke the alternative behavior? You are remembering correctly. v1 had an #else, v2 does not. > Now you end up calling both functions; > if that's indeed intended, Yes, this is intentional. > this may still want doing differently. > Looking at the earlier patch introducing the function, I can't infer > though whether that's intended: iommu_add_dt_pci_device() checks that > the add_device hook is present, but then I didn't find any use of this > hook. The revlog there suggests the check might be stale. Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there. > If indeed the function does only preparatory work, I don't see why it > would need naming "iommu_..."; I'd rather consider pci_add_dt_device() > then. The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below). > Plus in such a case #ifdef-ary here probably wants avoiding by > introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then > ... > >> if ( ret ) >> { >> +#ifdef CONFIG_HAS_DEVICE_TREE >> +iommu_fwspec_free(pci_to_dev(pdev)); >> +#endif > > ... 
this (which I understand is doing the corresponding cleanup) then > also wants wrapping in a suitably named tiny helper function. Sure, I'm on board with eliminating/reducing the #ifdef-ary where possible. Will do. > But yet further I'm then no longer convinced this is the right place > for the addition. pci_add_device() is backing physdev hypercalls. It > would seem to me that the function may want invoking yet one layer > further up, or it may even want invoking from a brand new DT-specific > physdev-op. This would then leave at least the x86-only paths (invoking > pci_add_device() from outside of pci_physdev_op()) entirely alone. Let's establish that pci_add_device()/iommu_add_device() are already inherently performing tasks related to setting up a PCI device to work with an IOMMU. The preparatory work in question needs to happen after: pci_add_device() -> alloc_pdev() since we need to know all the possible RIDs (including those for phantom functions), but before the add_device iommu hook: pci_add_device() -> iommu_add_device() -> iommu_call(hd->platform_ops, add_device, ...) The preparatory work (i.e. mapping RID/BDF -> AXI stream ID) is inherently associated with setting up a PCI device to work with an ARM SMMU (but not any particular variant of the SMMU). The SMMU distinguishes what PCI device/function DMA traffic is associated with based on the derived AXI stream ID (sideband data), not the RID/BDF directly. See [1]. Moving the preparatory work one layer up would mean duplicating what alloc_pdev() is already doing to set up pdev->phantom_stride (which we need to figure out all RIDs for that particular device). Moving it down into the individual SMMU drivers (smmu_ops/platform_ops) would mean duplicating special phantom function handling in each SMMU driver, and further deviating from the Linux SMMU driver(s) they are based on. 
It still feels to me like pci_add_device() (or iommu_add_device()) is the
right place to perform the RID/BDF -> AXI stream ID mapping.

Since there's nothing inherently DT specific (or ACPI specific) about
deriving sideband data from RID/BDF, let me propose a new name for the
function (instead of iommu_add_dt_pci_device):

  iommu_derive_pci_device_sideband_IDs()

Now, as far as DT and ACPI co-existing goes, I admit I haven't tested with
CONFIG_ACPI=y yet (there seem to be some issues when both
CONFIG_ARM_SMMU_V3=y and CONFIG_ACPI=y are enabled, even in staging). But I
do recognize that we need a way to support both CONFIG_HAS_DEVICE_TREE=y
and CONFIG_ACPI=y simultaneously. Let me think on that for a bit...

[1] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
[xen-unstable test] 180632: tolerable FAIL
flight 180632 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180632/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail in 180624 pass in 180632
 test-amd64-i386-examine-uefi           6 xen-install              fail pass in 180624
 test-amd64-i386-pair                  10 xen-install/src_host     fail pass in 180624
 test-amd64-i386-freebsd10-i386         7 xen-install              fail pass in 180624

Tests which did not succeed, but are not blocking:
 test-amd64-i386-examine                6 xen-install              fail like 180624
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop               fail like 180624
 test-amd64-i386-xl-qemuu-win7-amd64   19 guest-stop               fail like 180624
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop               fail like 180624
 test-amd64-i386-xl-qemut-ws16-amd64   19 guest-stop               fail like 180624
 test-amd64-i386-xl-qemut-win7-amd64   19 guest-stop               fail like 180624
 test-armhf-armhf-libvirt-raw          15 saverestore-support-check fail like 180624
 test-armhf-armhf-libvirt-qcow2        15 saverestore-support-check fail like 180624
 test-amd64-amd64-qemuu-nested-amd     20 debian-hvm-install/l1/l2 fail like 180624
 test-armhf-armhf-libvirt              16 saverestore-support-check fail like 180624
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop               fail like 180624
 test-amd64-i386-xl-qemuu-ws16-amd64   19 guest-stop               fail like 180624
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop               fail like 180624
 test-amd64-amd64-libvirt-xsm          15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt              15 migrate-support-check    fail never pass
 test-amd64-i386-xl-pvshim             14 guest-start              fail never pass
 test-amd64-i386-libvirt               15 migrate-support-check    fail never pass
 test-amd64-i386-libvirt-xsm           15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-xsm               15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-xsm               16 saverestore-support-check fail never pass
 test-arm64-arm64-xl                   15 migrate-support-check    fail never pass
 test-arm64-arm64-xl                   16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx          15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-thunderx          16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2           15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-credit2           16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm          15 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-xsm          16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu         15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-multivcpu         16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1           15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-credit1           16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds              15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-rtds              16 saverestore-support-check fail never pass
 test-armhf-armhf-xl                   15 migrate-support-check    fail never pass
 test-armhf-armhf-xl                   16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2           15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-credit2           16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw           14 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-vhd          14 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-raw          14 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-raw          15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd               14 migrate-support-check    fail never pass
 test-arm64-arm64-xl-vhd               15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw          14 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2        14 migrate-support-check    fail never pass
 test-armhf-armhf-xl-arndale           15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-arndale           16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1           15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-credit1           16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd               14 migrate-support-check    fail never pass
 test-armhf-armhf-xl-vhd               15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt              15 migrate-support-check    fail never
Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users
On Fri, May 12, 2023 at 01:56:29PM +0300, Andy Shevchenko wrote:
> On Tue, May 09, 2023 at 01:21:22PM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > > Provide two new helper macros to iterate over PCI device resources and
> > > > convert users.
> > >
> > > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> >
> > This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
> > upstream now.
> >
> > Coverity complains about each use,
>
> It needs more clarification here. Use of reduced variant of the
> macro or all of them? If the former one, then I can speculate that
> Coverity (famous for false positives) simply doesn't understand `for
> (type var; var ...)` code.

True, Coverity finds false positives. It flagged every use in drivers/pci
and drivers/pnp. It didn't mention the arch/alpha, arm, mips, powerpc, sh,
or sparc uses, but I think it just didn't look at those.

It flagged both:

  pbus_size_io    pci_dev_for_each_resource(dev, r)
  pbus_size_mem   pci_dev_for_each_resource(dev, r, i)

Here's a spreadsheet with a few more details (unfortunately I don't know
how to make it dump the actual line numbers or analysis like I pasted
below, so "pci_dev_for_each_resource" doesn't appear). These are mostly
in the "Drivers-PCI" component.

https://docs.google.com/spreadsheets/d/1ohOJwxqXXoDUA0gwopgk-z-6ArLvhN7AZn4mIlDkHhQ/edit?usp=sharing

These particular reports are in the "High Impact Outstanding" tab.

> > sample below from
> > drivers/pci/vgaarb.c. I didn't investigate at all, so it might be a
> > false positive; just FYI.
> >
> > 1. Condition screen_info.capabilities & (2U /* 1 << 1 */), taking true branch.
> > 556        if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
> > 557                base |= (u64)screen_info.ext_lfb_base << 32;
> > 558
> > 559        limit = base + size;
> > 560
> > 561        /* Does firmware framebuffer belong to us? */
> > 2. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> > 3. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> > 6. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> > 7. cond_at_most: Checking __b < PCI_NUM_RESOURCES implies that __b may be up to 16 on the true branch.
> > 8. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> > 11. incr: Incrementing __b. The value of __b may now be up to 17.
> > 12. alias: Assigning: r = &pdev->resource[__b]. r may now point to as high as element 17 of pdev->resource (which consists of 17 64-byte elements).
> > 13. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> > 14. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> > 562        pci_dev_for_each_resource(pdev, r) {
> > 4. Condition resource_type(r) != 512, taking true branch.
> > 9. Condition resource_type(r) != 512, taking true branch.
> >
> > CID 1529911 (#1 of 1): Out-of-bounds read (OVERRUN)
> > 15. overrun-local: Overrunning array of 1088 bytes at byte offset 1088 by dereferencing pointer r. [show details]
> > 563                if (resource_type(r) != IORESOURCE_MEM)
> > 5. Continuing loop.
> > 10. Continuing loop.
> > 564                        continue;
>
> --
> With Best Regards,
> Andy Shevchenko
Re: [PATCH 2/2] xen/arm: smmuv3: Advertise coherent table walk if supported
Hi Michal,

On 12/05/2023 15:35, Michal Orzel wrote:

At the moment, even in case of a SMMU being I/O coherent, we clean the
updated PT as a result of not advertising the coherency feature. SMMUv3
coherency feature means that page table walks, accesses to memory
structures and queues are I/O coherent (refer ARM IHI 0070 E.A, 3.15).

Follow the same steps that were done for SMMU v1,v2 driver by the commit:
080dcb781e1bc3bb22f55a9dfdecb830ccbabe88

The same restrictions apply, meaning that in order to advertise coherent
table walk platform feature, all the SMMU devices need to report coherency
feature. This is because the page tables (we are sharing them with CPU)
are populated before any device assignment and in case of a device being
behind non-coherent SMMU, we would have to scan the tables and clean the
cache.

It is to be noted that the SBSA/BSA (refer ARM DEN0094C 1.0C, section D)
requires that all SMMUv3 devices support I/O coherency.

Signed-off-by: Michal Orzel
---
There are very few platforms out there with SMMUv3 but I have never seen
a SMMUv3 that is not I/O coherent.
---
 xen/drivers/passthrough/arm/smmu-v3.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
index bf053cdb6d5c..2adaad0fa038 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -2526,6 +2526,15 @@ static const struct dt_device_match arm_smmu_of_match[] = {
 };

 /* Start of Xen specific code. */
+
+/*
+ * Platform features. It indicates the list of features supported by all
+ * SMMUs. Actually we only care about coherent table walk, which in case of
+ * SMMUv3 is implied by the overall coherency feature (refer ARM IHI 0070 E.A,
+ * section 3.15 and SMMU_IDR0.COHACC bit description).
+ */
+static uint32_t platform_features = ARM_SMMU_FEAT_COHERENCY;
+
 static int __must_check arm_smmu_iotlb_flush_all(struct domain *d)
 {
     struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
@@ -2708,8 +2717,12 @@ static int arm_smmu_iommu_xen_domain_init(struct domain *d)
     INIT_LIST_HEAD(&xen_domain->contexts);

     dom_iommu(d)->arch.priv = xen_domain;
-    return 0;

+    /* Coherent walk can be enabled only when all SMMUs support it. */
+    if (platform_features & ARM_SMMU_FEAT_COHERENCY)
+        iommu_set_feature(d, IOMMU_FEAT_COHERENT_WALK);
+
+    return 0;
 }

All good till here.

 static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
@@ -2738,6 +2751,7 @@ static __init int arm_smmu_dt_init(struct dt_device_node *dev,
                                    const void *data)
 {
     int rc;
+    const struct arm_smmu_device *smmu;

     /*
      * Even if the device can't be initialized, we don't want to
@@ -2751,6 +2765,14 @@ static __init int arm_smmu_dt_init(struct dt_device_node *dev,

     iommu_set_ops(&arm_smmu_iommu_ops);

+    /* Find the just added SMMU and retrieve its features. */
+    smmu = arm_smmu_get_by_dev(dt_to_dev(dev));
+
+    /* It would be a bug not to find the SMMU we just added. */
+    BUG_ON(!smmu);
+
+    platform_features &= smmu->features;
+

Can you explain this change in the commit message?

- Ayan

     return 0;
 }
Re: [PATCH 1/2] xen/arm: smmuv3: Constify arm_smmu_get_by_dev() parameter
On 12/05/2023 15:35, Michal Orzel wrote: This function does not modify its parameter 'dev' and it is not supposed to do it. Therefore, constify it. Signed-off-by: Michal Orzel Reviewed-by: Ayan Kumar Halder
[ovmf test] 180635: all pass - PUSHED
flight 180635 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180635/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 d3225577123767fd09c91201d27e9c91663ae132
baseline version:
 ovmf                 0b37723186ec1525b6caf14b0309fb0ed04084d7

Last test of basis   180629  2023-05-12 06:42:08 Z  0 days
Testing same since   180635  2023-05-12 13:40:43 Z  0 days  1 attempts

People who touched revisions under test:
  Ard Biesheuvel
  Gerd Hoffmann

jobs:
 build-amd64-xsm                                      pass
 build-i386-xsm                                       pass
 build-amd64                                          pass
 build-i386                                           pass
 build-amd64-libvirt                                  pass
 build-i386-libvirt                                   pass
 build-amd64-pvops                                    pass
 build-i386-pvops                                     pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                  pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   0b37723186..d322557712  d3225577123767fd09c91201d27e9c91663ae132 -> xen-tested-master
[libvirt test] 180628: tolerable FAIL - PUSHED
flight 180628 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180628/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt              16 saverestore-support-check fail like 180613
 test-amd64-amd64-libvirt-vhd          19 guest-start/debian.repeat fail like 180613
 test-armhf-armhf-libvirt-raw          15 saverestore-support-check fail like 180613
 test-armhf-armhf-libvirt-qcow2        15 saverestore-support-check fail like 180613
 test-amd64-amd64-libvirt-xsm          15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt              15 migrate-support-check    fail never pass
 test-amd64-i386-libvirt               15 migrate-support-check    fail never pass
 test-amd64-i386-libvirt-xsm           15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt              15 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-xsm          15 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt              16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm          16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt              15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-vhd          14 migrate-support-check    fail never pass
 test-amd64-i386-libvirt-raw           14 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-qcow2        14 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-qcow2        15 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-raw          14 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-raw          15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw          14 migrate-support-check    fail never pass
 test-armhf-armhf-libvirt-qcow2        14 migrate-support-check    fail never pass

version targeted for testing:
 libvirt              517d76466b2b798e492cde18b61e4b6db8bd4ce1
baseline version:
 libvirt              db91bf2ba31766809f901e9b2bd02b4a9c917300

Last test of basis   180613  2023-05-11 04:19:02 Z  1 days
Testing same since   180628  2023-05-12 04:18:49 Z  0 days  1 attempts

People who touched revisions under test:
  Andrea Bolognani

jobs:
 build-amd64-xsm                                      pass
 build-arm64-xsm                                      pass
 build-i386-xsm                                       pass
 build-amd64                                          pass
 build-arm64                                          pass
 build-armhf                                          pass
 build-i386                                           pass
 build-amd64-libvirt                                  pass
 build-arm64-libvirt                                  pass
 build-armhf-libvirt                                  pass
 build-i386-libvirt                                   pass
 build-amd64-pvops                                    pass
 build-arm64-pvops                                    pass
 build-armhf-pvops                                    pass
 build-i386-pvops                                     pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm    pass
 test-amd64-amd64-libvirt-xsm                         pass
 test-arm64-arm64-libvirt-xsm                         pass
 test-amd64-i386-libvirt-xsm                          pass
 test-amd64-amd64-libvirt                             pass
 test-arm64-arm64-libvirt                             pass
 test-armhf-armhf-libvirt                             pass
 test-amd64-i386-libvirt                              pass
 test-amd64-amd64-libvirt-pair                        pass
 test-amd64-i386-libvirt-pair                         pass
 test-arm64-arm64-libvirt-qcow2                       pass
 test-armhf-armhf-libvirt-qcow2                       pass
 test-arm64-arm64-libvirt-raw                         pass
 test-armhf-armhf-libvirt-raw                         pass
 test-amd64-i386-libvirt-raw                          pass
 test-amd64-amd64-libvirt-vhd                         fail

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
[PATCH 0/2] xen/arm: smmuv3: Advertise coherent table walk
Based on the work done for SMMU v1,v2 by commit:
080dcb781e1bc3bb22f55a9dfdecb830ccbabe88

Michal Orzel (2):
  xen/arm: smmuv3: Constify arm_smmu_get_by_dev() parameter
  xen/arm: smmuv3: Advertise coherent table walk if supported

 xen/drivers/passthrough/arm/smmu-v3.c | 28 ---
 1 file changed, 25 insertions(+), 3 deletions(-)

-- 
2.25.1
[PATCH 2/2] xen/arm: smmuv3: Advertise coherent table walk if supported
At the moment, even in case of a SMMU being I/O coherent, we clean the
updated PT as a result of not advertising the coherency feature.
SMMUv3 coherency feature means that page table walks, accesses to memory
structures and queues are I/O coherent (refer ARM IHI 0070 E.A, 3.15).

Follow the same steps that were done for SMMU v1,v2 driver by the
commit: 080dcb781e1bc3bb22f55a9dfdecb830ccbabe88

The same restrictions apply, meaning that in order to advertise coherent
table walk platform feature, all the SMMU devices need to report
coherency feature. This is because the page tables (we are sharing them
with CPU) are populated before any device assignment and in case of a
device being behind non-coherent SMMU, we would have to scan the tables
and clean the cache.

It is to be noted that the SBSA/BSA (refer ARM DEN0094C 1.0C, section D)
requires that all SMMUv3 devices support I/O coherency.

Signed-off-by: Michal Orzel
---
There are very few platforms out there with SMMUv3 but I have never seen
a SMMUv3 that is not I/O coherent.
---
 xen/drivers/passthrough/arm/smmu-v3.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
index bf053cdb6d5c..2adaad0fa038 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -2526,6 +2526,15 @@ static const struct dt_device_match arm_smmu_of_match[] = {
 };
 
 /* Start of Xen specific code. */
+
+/*
+ * Platform features. It indicates the list of features supported by all
+ * SMMUs. Actually we only care about coherent table walk, which in case of
+ * SMMUv3 is implied by the overall coherency feature (refer ARM IHI 0070 E.A,
+ * section 3.15 and SMMU_IDR0.COHACC bit description).
+ */
+static uint32_t platform_features = ARM_SMMU_FEAT_COHERENCY;
+
 static int __must_check arm_smmu_iotlb_flush_all(struct domain *d)
 {
     struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
@@ -2708,8 +2717,12 @@ static int arm_smmu_iommu_xen_domain_init(struct domain *d)
     INIT_LIST_HEAD(&xen_domain->contexts);
 
     dom_iommu(d)->arch.priv = xen_domain;
-    return 0;
 
+    /* Coherent walk can be enabled only when all SMMUs support it. */
+    if (platform_features & ARM_SMMU_FEAT_COHERENCY)
+        iommu_set_feature(d, IOMMU_FEAT_COHERENT_WALK);
+
+    return 0;
 }
 
 static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
@@ -2738,6 +2751,7 @@ static __init int arm_smmu_dt_init(struct dt_device_node *dev,
                                    const void *data)
 {
     int rc;
+    const struct arm_smmu_device *smmu;
 
     /*
      * Even if the device can't be initialized, we don't want to
@@ -2751,6 +2765,14 @@ static __init int arm_smmu_dt_init(struct dt_device_node *dev,
 
     iommu_set_ops(&arm_smmu_iommu_ops);
 
+    /* Find the just added SMMU and retrieve its features. */
+    smmu = arm_smmu_get_by_dev(dt_to_dev(dev));
+
+    /* It would be a bug not to find the SMMU we just added. */
+    BUG_ON(!smmu);
+
+    platform_features &= smmu->features;
+
     return 0;
 }
-- 
2.25.1
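The accumulation logic of patch 2/2 — seed a mask with the coherency bit and AND in each probed SMMU's feature mask, so the platform feature survives only if every SMMU reports it — can be modelled in a few lines. This is an illustrative Python sketch, not the Xen C code; the constant's value is a placeholder:

```python
# Placeholder bit value, standing in for the driver's ARM_SMMU_FEAT_COHERENCY.
ARM_SMMU_FEAT_COHERENCY = 1 << 0

def platform_coherency(smmu_feature_masks):
    """Coherent table walk may be advertised only if *every* SMMU reports
    the coherency feature, since the shared page tables are populated
    before any device assignment."""
    platform_features = ARM_SMMU_FEAT_COHERENCY
    for features in smmu_feature_masks:
        # Mirrors 'platform_features &= smmu->features' in arm_smmu_dt_init().
        platform_features &= features
    return bool(platform_features & ARM_SMMU_FEAT_COHERENCY)
```

With no SMMUs probed, the seed mask is untouched, matching the driver's behaviour of starting from the feature being assumed present and only ever clearing it.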
[PATCH 1/2] xen/arm: smmuv3: Constify arm_smmu_get_by_dev() parameter
This function does not modify its parameter 'dev' and it is not supposed
to do it. Therefore, constify it.

Signed-off-by: Michal Orzel
---
 xen/drivers/passthrough/arm/smmu-v3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
index bfdb62b395ad..bf053cdb6d5c 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -1468,7 +1468,7 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
 	return sid < limit;
 }
 
 /* Forward declaration */
-static struct arm_smmu_device *arm_smmu_get_by_dev(struct device *dev);
+static struct arm_smmu_device *arm_smmu_get_by_dev(const struct device *dev);
 
 static int arm_smmu_add_device(u8 devfn, struct device *dev)
 {
@@ -2556,7 +2556,7 @@ static int __must_check arm_smmu_iotlb_flush(struct domain *d, dfn_t dfn,
 	return arm_smmu_iotlb_flush_all(d);
 }
 
-static struct arm_smmu_device *arm_smmu_get_by_dev(struct device *dev)
+static struct arm_smmu_device *arm_smmu_get_by_dev(const struct device *dev)
 {
 	struct arm_smmu_device *smmu = NULL;
-- 
2.25.1
[qemu-mainline test] 180626: regressions - FAIL
flight 180626 qemu-mainline real [real]
flight 180634 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/180626/
http://logs.test-lab.xenproject.org/osstest/logs/180634/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 180610

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop                 fail like 180610
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop                 fail like 180610
 test-armhf-armhf-libvirt-raw          15 saverestore-support-check  fail like 180610
 test-armhf-armhf-libvirt              16 saverestore-support-check  fail like 180610
 test-armhf-armhf-libvirt-qcow2        15 saverestore-support-check  fail like 180610
 test-amd64-i386-xl-qemuu-ws16-amd64   19 guest-stop                 fail like 180610
 test-amd64-amd64-qemuu-nested-amd     20 debian-hvm-install/l1/l2   fail like 180610
 test-amd64-i386-xl-qemuu-win7-amd64   19 guest-stop                 fail like 180610
 test-amd64-i386-xl-pvshim             14 guest-start                fail never pass
 test-amd64-i386-libvirt-xsm           15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt              15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-xsm          15 migrate-support-check      fail never pass
 test-amd64-i386-libvirt               15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx          15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                   15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx          16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl                   16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm               15 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm          15 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm          16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm               16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1           15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit1           16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2           15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2           16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu         15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu         16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl                   15 migrate-support-check      fail never pass
 test-armhf-armhf-xl                   16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-rtds              15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds              16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-raw           14 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-vhd          14 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-raw          14 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-raw          15 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-vhd               14 migrate-support-check      fail never pass
 test-arm64-arm64-xl-vhd               15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-vhd               14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd               15 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-raw          14 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt              15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale           16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-qcow2        14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2           16 saverestore-support-check  fail never pass

version targeted for testing:
 qemuu                278238505d28d292927bff7683f39fb4fbca7fd1
baseline version:
 qemuu                d530697ca20e19f7a626f4c1c8b26fccd0dc4470

Last test of basis   180610  2023-05-10 23:38:37 Z  1 days
Testing same since   180621  2023-05-11 14:27:55 Z  0 days  2 attempts

People who touched revisions under test:
  Dr. David Alan Gilbert
  Jamie Iles
  Juan Quintela
  Laurent Vivier
  Lukas Straub
  Peter Maydell
  Richard
[PATCH] x86/cpuid: Calculate FEATURESET_NR_ENTRIES more helpfully
When adding new featureset words, it is convenient to split the work
into several patches. However, GCC 12 spotted that the way we prefer to
split the work results in a real (transient) breakage whereby the
policy <-> featureset helpers perform out-of-bounds accesses on the
featureset array.

Fix this by having gen-cpuid.py calculate FEATURESET_NR_ENTRIES from
the comments describing the word blocks, rather than from the
XEN_CPUFEATURE() with the greatest value.

For simplicity, require that the word blocks appear in order. This can
be revisited if we find a good reason to have blocks out of order.

No functional change.

Reported-by: Jan Beulich
Signed-off-by: Andrew Cooper
---
CC: Jan Beulich
CC: Roger Pau Monné
CC: Wei Liu

This supersedes the entire "x86: Fix transient build breakage with
featureset additions" series, but doesn't really feel as if it ought to
be labelled v2
---
 xen/tools/gen-cpuid.py | 42 --
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 0edb7d4a19f8..e0664e0defe1 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -50,13 +50,37 @@ def parse_definitions(state):
         "\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
         "\s+/\*([\w!]*) .*$")
 
+    word_regex = re.compile(
+        r"^/\* .* word (\d*) \*/$")
+    last_word = -1
+
     this = sys.modules[__name__]
 
     for l in state.input.readlines():
-        # Short circuit the regex...
-        if not l.startswith("XEN_CPUFEATURE("):
+
+        # Short circuit the regexes...
+        if not (l.startswith("XEN_CPUFEATURE(") or
+                l.startswith("/* ")):
             continue
 
+        # Handle /* ... word $N */ lines
+        if l.startswith("/* "):
+
+            res = word_regex.match(l)
+            if res is None:
+                continue # Some other comment
+
+            word = int(res.groups()[0])
+
+            if word != last_word + 1:
+                raise Fail("Featureset word %u out of order (last word %u)"
+                           % (word, last_word))
+
+            last_word = word
+            state.nr_entries = word + 1
+            continue
+
+        # Handle XEN_CPUFEATURE( lines
         res = feat_regex.match(l)
 
         if res is None:
@@ -94,6 +118,15 @@ def parse_definitions(state):
     if len(state.names) == 0:
         raise Fail("No features found")
 
+    if state.nr_entries == 0:
+        raise Fail("No featureset word info found")
+
+    max_val = max(state.names.keys())
+    if (max_val >> 5) + 1 > state.nr_entries:
+        max_name = state.names[max_val]
+        raise Fail("Feature %s (%d*32+%d) exceeds FEATURESET_NR_ENTRIES (%d)"
+                   % (max_name, max_val >> 5, max_val & 31, state.nr_entries))
+
 def featureset_to_uint32s(fs, nr):
     """ Represent a featureset as a list of C-compatible uint32_t's """
@@ -122,9 +155,6 @@ def format_uint32s(state, featureset, indent):
 
 def crunch_numbers(state):
 
-    # Size of bitmaps
-    state.nr_entries = nr_entries = (max(state.names.keys()) >> 5) + 1
-
     # Features common between 1d and e1d.
     common_1d = (FPU, VME, DE, PSE, TSC, MSR, PAE, MCE, CX8, APIC,
                  MTRR, PGE, MCA, CMOV, PAT, PSE36, MMX, FXSR)
@@ -328,7 +358,7 @@ def crunch_numbers(state):
     state.nr_deep_deps = len(state.deep_deps.keys())
 
     # Calculate the bitfield name declarations
-    for word in range(nr_entries):
+    for word in range(state.nr_entries):
 
         names = []
         for bit in range(32):
-- 
2.30.2
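The parsing strategy this patch adopts — derive the entry count from the `/* ... word N */` block comments, and insist the words appear in order — can be shown standalone. This is a simplified sketch of the idea, not the exact gen-cpuid.py code:

```python
import re

# Same shape as the patch's word_regex, slightly simplified.
word_regex = re.compile(r"^/\* .* word (\d+) \*/$")

def count_featureset_words(lines):
    """Derive the featureset entry count from '/* ... word N */' comments,
    requiring the word blocks to appear in ascending order."""
    last_word = -1
    nr_entries = 0
    for l in lines:
        res = word_regex.match(l)
        if res is None:
            continue  # Some other line/comment
        word = int(res.group(1))
        if word != last_word + 1:
            raise ValueError("word %u out of order (last word %u)"
                             % (word, last_word))
        last_word = word
        nr_entries = word + 1
    return nr_entries
```

The point of the patch is that this count comes from the block comments rather than from the highest-numbered `XEN_CPUFEATURE()`, so a half-applied series (comment present, features not yet defined) still produces the right `FEATURESET_NR_ENTRIES`.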
[PATCH v2] tools: drop bogus and obsolete ptyfuncs.m4
According to openpty(3) it is required to include to get the prototypes
for openpty() and login_tty(). But this is not what the function
AX_CHECK_PTYFUNCS actually does. It makes no attempt to include the
required header.

The two source files which call openpty() and login_tty() already
contain the conditionals to include the required header.

Remove the bogus m4 file to fix build with clang, which complains about
calls to undeclared functions.

Remove usage of INCLUDE_LIBUTIL_H in libxl_bootloader.c, it is already
covered by inclusion of libxl_osdep.h.

Remove usage of PTYFUNCS_LIBS in libxl/Makefile, it is already covered
by UTIL_LIBS from config/StdGNU.mk.

Signed-off-by: Olaf Hering
---
v2: remove consumers of the macros

 config/Tools.mk.in                  |  2 --
 m4/ptyfuncs.m4                      | 35 -
 tools/configure.ac                  |  2 --
 tools/libs/light/Makefile           |  2 +-
 tools/libs/light/libxl_bootloader.c |  4
 5 files changed, 1 insertion(+), 44 deletions(-)
 delete mode 100644 m4/ptyfuncs.m4

diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index 6abb377564..b7cc2961d8 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -31,8 +31,6 @@ PTHREAD_CFLAGS := @PTHREAD_CFLAGS@
 PTHREAD_LDFLAGS := @PTHREAD_LDFLAGS@
 PTHREAD_LIBS := @PTHREAD_LIBS@
 
-PTYFUNCS_LIBS := @PTYFUNCS_LIBS@
-
 LIBNL3_LIBS := @LIBNL3_LIBS@
 LIBNL3_CFLAGS := @LIBNL3_CFLAGS@
 XEN_TOOLS_RPATH := @rpath@

diff --git a/m4/ptyfuncs.m4 b/m4/ptyfuncs.m4
deleted file mode 100644
index 3e37b5a23c..00
--- a/m4/ptyfuncs.m4
+++ /dev/null
@@ -1,35 +0,0 @@
-AC_DEFUN([AX_CHECK_PTYFUNCS], [
-    dnl This is a workaround for a bug in Debian package
-    dnl libbsd-dev-0.3.0-1. Once we no longer support that
-    dnl package we can remove the addition of -Werror to
-    dnl CPPFLAGS.
-    AX_SAVEVAR_SAVE(CPPFLAGS)
-    CPPFLAGS="$CPPFLAGS -Werror"
-    AC_CHECK_HEADER([libutil.h],[
-        AC_DEFINE([INCLUDE_LIBUTIL_H],[],[libutil header file name])
-    ])
-    AX_SAVEVAR_RESTORE(CPPFLAGS)
-    AC_CACHE_CHECK([for openpty et al], [ax_cv_ptyfuncs_libs], [
-        for ax_cv_ptyfuncs_libs in -lutil "" NOT_FOUND; do
-            if test "x$ax_cv_ptyfuncs_libs" = "xNOT_FOUND"; then
-                AC_MSG_FAILURE([Unable to find library for openpty and login_tty])
-            fi
-            AX_SAVEVAR_SAVE(LIBS)
-            LIBS="$LIBS $ax_cv_ptyfuncs_libs"
-            AC_LINK_IFELSE([AC_LANG_SOURCE([
-#ifdef INCLUDE_LIBUTIL_H
-#include INCLUDE_LIBUTIL_H
-#endif
-int main(void) {
-  openpty(0,0,0,0,0);
-  login_tty(0);
-}
-])],[
-                break
-            ],[])
-            AX_SAVEVAR_RESTORE(LIBS)
-        done
-    ])
-    PTYFUNCS_LIBS="$ax_cv_ptyfuncs_libs"
-    AC_SUBST(PTYFUNCS_LIBS)
-])

diff --git a/tools/configure.ac b/tools/configure.ac
index 9bcf42f233..3cccf41960 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -70,7 +70,6 @@ m4_include([../m4/uuid.m4])
 m4_include([../m4/pkg.m4])
 m4_include([../m4/curses.m4])
 m4_include([../m4/pthread.m4])
-m4_include([../m4/ptyfuncs.m4])
 m4_include([../m4/extfs.m4])
 m4_include([../m4/fetcher.m4])
 m4_include([../m4/ax_compare_version.m4])
@@ -416,7 +415,6 @@ AC_SUBST([ZLIB_CFLAGS])
 AC_SUBST([ZLIB_LIBS])
 AX_CHECK_EXTFS
 AX_CHECK_PTHREAD
-AX_CHECK_PTYFUNCS
 AC_CHECK_LIB([yajl], [yajl_alloc], [], [AC_MSG_ERROR([Could not find yajl])])
 AC_CHECK_LIB([z], [deflateCopy], [], [AC_MSG_ERROR([Could not find zlib])])

diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
index 96daeabc47..5d7ff94b05 100644
--- a/tools/libs/light/Makefile
+++ b/tools/libs/light/Makefile
@@ -158,7 +158,7 @@ NO_HEADERS_CHK := y
 
 include $(XEN_ROOT)/tools/libs/libs.mk
 
-LDLIBS-y += $(PTYFUNCS_LIBS)
+LDLIBS-y += $(UTIL_LIBS)
 LDLIBS-$(CONFIG_LIBNL) += $(LIBNL3_LIBS)
 LDLIBS-$(CONFIG_Linux) += -luuid
 LDLIBS-$(CONFIG_Linux) += -lrt

diff --git a/tools/libs/light/libxl_bootloader.c b/tools/libs/light/libxl_bootloader.c
index 18e9ebd714..1bc6e51827 100644
--- a/tools/libs/light/libxl_bootloader.c
+++ b/tools/libs/light/libxl_bootloader.c
@@ -19,10 +19,6 @@
 #include
 #endif
 
-#ifdef INCLUDE_LIBUTIL_H
-#include INCLUDE_LIBUTIL_H
-#endif
-
 #include "libxl_internal.h"
 
 #define BOOTLOADER_BUF_OUT 65536
Re: [PATCH v1] tools: drop bogus and obsolete ptyfuncs.m4
Tue, 9 May 2023 17:47:33 +0100 Anthony PERARD :

> That change isn't enough. And I'm not convinced that it needs to be
> removed.

You are right, the provided functions must be removed as well.
My build scripts do not run autoreconf, perhaps it is about time to
change that.

> First, AX_CHECK_PTYFUNCS is still called in "tools/configure.ac".

This needs to be removed, yes.

> Then, AX_CHECK_PTYFUNCS define INCLUDE_LIBUTIL_H and PTYFUNCS_LIBS.

This is used inconsistently. Some places include unconditionally.
-lutil is already used unconditionally via UTIL_LIBS

> Also, that that macro isn't just about the header, but also about the
> needed library.

This is already covered.

Olaf
[PATCH v3] Fix install.sh for systemd
On a fedora system, if you run `sudo sh install.sh` you break your
system. The installation clobbers /var/run, a symlink to /run.
A subsequent boot fails when /var/run and /run are different since
accesses through /var/run can't find items that now only exist in /run
and vice-versa.

Skip populating /var/run/xen during make install. The directory is
already created by some scripts. Adjust all remaining scripts to create
XEN_RUN_DIR at runtime. Use the shell variable XEN_RUN_DIR instead of
hardcoded paths. XEN_RUN_STORED is covered by XEN_RUN_DIR because
xenstored is usually started afterwards.

Reported-by: Jason Andryuk
Tested-by: Jason Andryuk
Signed-off-by: Olaf Hering
---
v3: use variables, and quote variables, drop -m0700 from one mkdir call

 tools/Makefile                                     | 2 --
 tools/hotplug/FreeBSD/rc.d/xencommons.in           | 1 +
 tools/hotplug/FreeBSD/rc.d/xendriverdomain.in      | 1 +
 tools/hotplug/Linux/init.d/xendriverdomain.in      | 1 +
 tools/hotplug/Linux/systemd/xenconsoled.service.in | 2 +-
 tools/hotplug/NetBSD/rc.d/xendriverdomain.in       | 2 +-
 6 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/Makefile b/tools/Makefile
index 4906fdbc23..1ff90ddfa0 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -58,9 +58,7 @@ build all: subdirs-all
 install:
 	$(INSTALL_DIR) -m 700 $(DESTDIR)$(XEN_DUMP_DIR)
 	$(INSTALL_DIR) $(DESTDIR)$(XEN_LOG_DIR)
-	$(INSTALL_DIR) $(DESTDIR)$(XEN_RUN_DIR)
 	$(INSTALL_DIR) $(DESTDIR)$(XEN_LIB_DIR)
-	$(INSTALL_DIR) $(DESTDIR)$(XEN_RUN_STORED)
 	$(INSTALL_DIR) $(DESTDIR)$(PKG_INSTALLDIR)
 	$(MAKE) subdirs-install

diff --git a/tools/hotplug/FreeBSD/rc.d/xencommons.in b/tools/hotplug/FreeBSD/rc.d/xencommons.in
index 7f7cda289f..6f429e4b0c 100644
--- a/tools/hotplug/FreeBSD/rc.d/xencommons.in
+++ b/tools/hotplug/FreeBSD/rc.d/xencommons.in
@@ -34,6 +34,7 @@ xen_startcmd()
 	local time=0
 	local timeout=30
 
+	mkdir -p "${XEN_RUN_DIR}"
 	xenstored_pid=$(check_pidfile ${XENSTORED_PIDFILE} ${XENSTORED})
 	if test -z "$xenstored_pid"; then
 		printf "Starting xenservices: xenstored, xenconsoled."

diff --git a/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in b/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
index a032822e33..f487c43468 100644
--- a/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
+++ b/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
@@ -27,6 +27,7 @@ xendriverdomain_start()
 {
 	printf "Starting xenservices: xl devd."
 
+	mkdir -p "${XEN_RUN_DIR}"
 	PATH="${bindir}:${sbindir}:$PATH" ${sbindir}/xl devd --pidfile ${XLDEVD_PIDFILE} ${XLDEVD_ARGS}
 
 	printf "\n"

diff --git a/tools/hotplug/Linux/init.d/xendriverdomain.in b/tools/hotplug/Linux/init.d/xendriverdomain.in
index c63060f62a..17b381c3dc 100644
--- a/tools/hotplug/Linux/init.d/xendriverdomain.in
+++ b/tools/hotplug/Linux/init.d/xendriverdomain.in
@@ -49,6 +49,7 @@ fi
 do_start () {
 	echo Starting xl devd...
+	mkdir -p "${XEN_RUN_DIR}"
 	${sbindir}/xl devd --pidfile=$XLDEVD_PIDFILE $XLDEVD_ARGS
 }
 do_stop () {

diff --git a/tools/hotplug/Linux/systemd/xenconsoled.service.in b/tools/hotplug/Linux/systemd/xenconsoled.service.in
index 1f03de9041..d84c09aa9c 100644
--- a/tools/hotplug/Linux/systemd/xenconsoled.service.in
+++ b/tools/hotplug/Linux/systemd/xenconsoled.service.in
@@ -11,7 +11,7 @@ Environment=XENCONSOLED_TRACE=none
 Environment=XENCONSOLED_LOG_DIR=@XEN_LOG_DIR@/console
 EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons
 ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities
-ExecStartPre=/bin/mkdir -p ${XENCONSOLED_LOG_DIR}
+ExecStartPre=/bin/mkdir -p ${XENCONSOLED_LOG_DIR} @XEN_RUN_DIR@
 ExecStart=@sbindir@/xenconsoled -i --log=${XENCONSOLED_TRACE} --log-dir=${XENCONSOLED_LOG_DIR} $XENCONSOLED_ARGS
 
 [Install]

diff --git a/tools/hotplug/NetBSD/rc.d/xendriverdomain.in b/tools/hotplug/NetBSD/rc.d/xendriverdomain.in
index f47b0b189c..87afc061ac 100644
--- a/tools/hotplug/NetBSD/rc.d/xendriverdomain.in
+++ b/tools/hotplug/NetBSD/rc.d/xendriverdomain.in
@@ -23,7 +23,7 @@ XLDEVD_PIDFILE="@XEN_RUN_DIR@/xldevd.pid"
 
 xendriverdomain_precmd()
 {
-	:
+	mkdir -p "${XEN_RUN_DIR}"
 }
 
 xendriverdomain_startcmd()
Re: [PATCH v2] Fix install.sh for systemd
Fri, 12 May 2023 12:22:08 +0100 Andrew Cooper :

> Does this allow for making any of these files no longer
> preprocessed by ./configure ? (i.e. cease being .in files)

The path to hotplugpath.sh is variable, so each consumer needs to be
a .in file.

Olaf
Re: [PATCH v2] Fix install.sh for systemd
On 12/05/2023 12:18 pm, Olaf Hering wrote:
> Tue, 9 May 2023 13:47:11 +0100 Andrew Cooper :
>
>> Why is this 700, and the others just using regular perms?
>> Also, doesn't it want quoting like the other examples too?
> It is not clear why there is a single mkdir -m 0700 in the tree.
> Most likely it will not give any extra security.

I agree. It's weird and doesn't have a good reason for being different.

> The scripts source hotplug.sh, which defines a variable XEN_RUN_DIR.
> I think it is better to use the shell variable instead of hardcoded paths.

Sounds good. Does this allow for making any of these files no longer
preprocessed by ./configure ? (i.e. cease being .in files)

> Regarding quoting: there are many paths used without quoting.
> For the beauty an additional (huge) change could be done to quote
> everything. Not sure if it is worth the effort...

Perhaps, but variables should always be quoted. At least make sure that
new additions (and edits) leave things quoted.

Thanks,

~Andrew
Re: [PATCH v2] Fix install.sh for systemd
Tue, 9 May 2023 13:47:11 +0100 Andrew Cooper :

> Why is this 700, and the others just using regular perms?
> Also, doesn't it want quoting like the other examples too?

It is not clear why there is a single mkdir -m 0700 in the tree.
Most likely it will not give any extra security.

The scripts source hotplug.sh, which defines a variable XEN_RUN_DIR.
I think it is better to use the shell variable instead of hardcoded
paths.

Regarding quoting: there are many paths used without quoting.
For the beauty an additional (huge) change could be done to quote
everything. Not sure if it is worth the effort...

I will post a v3 with this relative change:

--- a/tools/hotplug/FreeBSD/rc.d/xencommons.in
+++ b/tools/hotplug/FreeBSD/rc.d/xencommons.in
@@ -34,7 +34,7 @@ xen_startcmd()
 	local time=0
 	local timeout=30
 
-	mkdir -p "@XEN_RUN_DIR@"
+	mkdir -p "${XEN_RUN_DIR}"
 	xenstored_pid=$(check_pidfile ${XENSTORED_PIDFILE} ${XENSTORED})
 	if test -z "$xenstored_pid"; then
 		printf "Starting xenservices: xenstored, xenconsoled."

--- a/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
+++ b/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
@@ -27,7 +27,7 @@ xendriverdomain_start()
 {
 	printf "Starting xenservices: xl devd."
 
-	mkdir -p "@XEN_RUN_DIR@"
+	mkdir -p "${XEN_RUN_DIR}"
 	PATH="${bindir}:${sbindir}:$PATH" ${sbindir}/xl devd --pidfile ${XLDEVD_PIDFILE} ${XLDEVD_ARGS}
 
 	printf "\n"

--- a/tools/hotplug/Linux/init.d/xendriverdomain.in
+++ b/tools/hotplug/Linux/init.d/xendriverdomain.in
@@ -49,7 +49,7 @@ fi
 do_start () {
 	echo Starting xl devd...
-	mkdir -m700 -p @XEN_RUN_DIR@
+	mkdir -p "${XEN_RUN_DIR}"
 	${sbindir}/xl devd --pidfile=$XLDEVD_PIDFILE $XLDEVD_ARGS
 }
 do_stop () {

--- a/tools/hotplug/NetBSD/rc.d/xendriverdomain.in
+++ b/tools/hotplug/NetBSD/rc.d/xendriverdomain.in
@@ -23,7 +23,7 @@ XLDEVD_PIDFILE="@XEN_RUN_DIR@/xldevd.pid"
 
 xendriverdomain_precmd()
 {
-	mkdir -p "@XEN_RUN_DIR@"
+	mkdir -p "${XEN_RUN_DIR}"
 }
 
 xendriverdomain_startcmd()
Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users
On Tue, May 09, 2023 at 01:21:22PM -0500, Bjorn Helgaas wrote:
> On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > Provide two new helper macros to iterate over PCI device resources
> > > and convert users.
> > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
> upstream now.
>
> Coverity complains about each use,

It needs more clarification here. Use of reduced variant of the macro
or all of them? If the former one, then I can speculate that Coverity
(famous for false positives) simply doesn't understand
`for (type var; var ...)` code.

> sample below from drivers/pci/vgaarb.c. I didn't investigate at all,
> so it might be a false positive; just FYI.
>
> 1. Condition screen_info.capabilities & (2U /* 1 << 1 */), taking true branch.
> 556	if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
> 557		base |= (u64)screen_info.ext_lfb_base << 32;
> 558
> 559	limit = base + size;
> 560
> 561	/* Does firmware framebuffer belong to us? */
> 2. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> 3. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> 6. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> 7. cond_at_most: Checking __b < PCI_NUM_RESOURCES implies that __b may be up to 16 on the true branch.
> 8. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> 11. incr: Incrementing __b. The value of __b may now be up to 17.
> 12. alias: Assigning: r = &pdev->resource[__b]. r may now point to as high as element 17 of pdev->resource (which consists of 17 64-byte elements).
> 13. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> 14. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), taking true branch.
> 562	pci_dev_for_each_resource(pdev, r) {
> 4. Condition resource_type(r) != 512, taking true branch.
> 9. Condition resource_type(r) != 512, taking true branch.
>
> CID 1529911 (#1 of 1): Out-of-bounds read (OVERRUN)
> 15. overrun-local: Overrunning array of 1088 bytes at byte offset 1088 by dereferencing pointer r. [show details]
> 563		if (resource_type(r) != IORESOURCE_MEM)
> 5. Continuing loop.
> 10. Continuing loop.
> 564			continue;

-- 
With Best Regards,
Andy Shevchenko
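The loop shape the Coverity trace shows — a comma-expression condition `(r = &pdev->resource[__b]), (__b < PCI_NUM_RESOURCES)` — forms the pointer *before* the bound check, so on the final test the pointer addresses one-past-the-end even though the body never dereferences it (forming such a pointer is legal C, which is why this may well be a false positive). A small Python model of the index sequence, purely illustrative:

```python
def comma_condition_loop(n):
    """Model of 'for (b = 0; (r = &res[b]), (b < n); b++)': record every
    index the pointer expression is formed with, including the final
    one-past-the-end index that the bound check then rejects."""
    formed = []
    b = 0
    while True:
        formed.append(b)       # r = &res[b] happens before the check
        if not (b < n):
            break              # the loop body never runs for this index
        b += 1
    return formed
```

For `n == PCI_NUM_RESOURCES == 17` the model forms indices 0..17: index 17 is exactly the "element 17" alias Coverity reports, but it is only formed, never read, unless the body dereferences `r` after the loop exits.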
Re: [PATCH V2 2/2] libxl: arm: Add grant_usage parameter for virtio devices
On Thu, May 11, 2023 at 01:20:43PM +0530, Viresh Kumar wrote:
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index 24ac92718288..0405f6efe62a 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -1619,6 +1619,18 @@ hexadecimal format, without the "0x" prefix and all in lower case, like
>  Specifies the transport mechanism for the Virtio device, only "mmio" is
>  supported for now.
>
> +=item B
> +
> +Specifies the grant usage details for the Virtio device. This can be set to
> +following values:
> +
> +- "default": The default grant setting will be used, enable grants if
> +  backend-domid != 0.

I don't think this "default" setting is useful. We could just describe
what the default is when "grant_usage" setting is missing from the
configuration.

> +- "enabled": The grants are always enabled.
> +
> +- "disabled": The grants are always disabled.

> diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
> index c10292e0d7e3..17228817f9b7 100644
> --- a/tools/libs/light/libxl_types.idl
> +++ b/tools/libs/light/libxl_types.idl
> @@ -283,6 +283,12 @@ libxl_virtio_transport = Enumeration("virtio_transport", [
>      (1, "MMIO"),
>  ])
>
> +libxl_virtio_grant_usage = Enumeration("virtio_grant_usage", [
> +    (0, "DEFAULT"),
> +    (1, "DISABLED"),
> +    (2, "ENABLED"),

libxl already provide this type, it's call "libxl_defbool". It can be
set to "default", "false" or "true".

> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
> index 97c80d7ed0fa..9cd7dbef0237 100644
> --- a/tools/libs/light/libxl_arm.c
> +++ b/tools/libs/light/libxl_arm.c
> @@ -1363,22 +1365,29 @@ static int libxl__prepare_dtb(libxl__gc *gc, libxl_domain_config *d_config,
>              iommu_needed = true;
>
>              FDT( make_virtio_mmio_node(gc, fdt, disk->base, disk->irq,
> -                                       disk->backend_domid) );
> +                                       disk->backend_domid,
> +                                       disk->backend_domid != LIBXL_TOOLSTACK_DOMID) );
>          }
>      }
>
>      for (i = 0; i < d_config->num_virtios; i++) {
>          libxl_device_virtio *virtio = &d_config->virtios[i];
> +        bool use_grant = false;
>
>          if (virtio->transport != LIBXL_VIRTIO_TRANSPORT_MMIO)
>              continue;
>
> -        if (virtio->backend_domid != LIBXL_TOOLSTACK_DOMID)
> +        if ((virtio->grant_usage == LIBXL_VIRTIO_GRANT_USAGE_ENABLED) ||
> +            ((virtio->grant_usage == LIBXL_VIRTIO_GRANT_USAGE_DEFAULT) &&
> +             (virtio->backend_domid != LIBXL_TOOLSTACK_DOMID))) {

I think libxl can select what the default value should be replace with
before we start to setup the guest. There's a *_setdefault() phase were
we set the correct value when a configuration value hasn't been set and
thus a default value is used. I think this can be done in
libxl__device_virtio_setdefault().

After that, virtio->grant_usage will be true or false, and that's the
value that should be given to the virtio backend via xenstore.

> diff --git a/tools/libs/light/libxl_virtio.c b/tools/libs/light/libxl_virtio.c
> index eadcb7124c3f..0a0fae967a0f 100644
> --- a/tools/libs/light/libxl_virtio.c
> +++ b/tools/libs/light/libxl_virtio.c
> @@ -46,11 +46,13 @@ static int libxl__set_xenstore_virtio(libxl__gc *gc, uint32_t domid,
>                                        flexarray_t *ro_front)
>  {
>      const char *transport = libxl_virtio_transport_to_string(virtio->transport);
> +    const char *grant_usage = libxl_virtio_grant_usage_to_string(virtio->grant_usage);
>
>      flexarray_append_pair(back, "irq", GCSPRINTF("%u", virtio->irq));
>      flexarray_append_pair(back, "base", GCSPRINTF("%#"PRIx64, virtio->base));
>      flexarray_append_pair(back, "type", GCSPRINTF("%s", virtio->type));
>      flexarray_append_pair(back, "transport", GCSPRINTF("%s", transport));
> +    flexarray_append_pair(back, "grant_usage", GCSPRINTF("%s", grant_usage));

Currently, this mean that we store "default" in this node. That mean
that both the virtio backend and libxl have to do computation in order
to figure out if "default" mean "true" or "false". And both have to
find the same result. I don't think this is necessary, and libxl can
just tell enabled or disable. This would be done in libxl before we run
this function. See previous comment on this patch.

Thanks,

-- 
Anthony PERARD
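The three-state pattern this review recommends (default/true/false, collapsed once during the `*_setdefault()` phase so xenstore only ever sees a concrete value) can be sketched as follows. This is an illustrative Python model of the control flow, not libxl code; the enable-when-backend-is-not-dom0 rule is taken from the thread:

```python
def resolve_grant_usage(value, backend_domid, toolstack_domid=0):
    """Resolve a 'default'/'true'/'false' setting the way the review
    suggests: during the setdefault phase, 'default' collapses to a
    concrete boolean (here: enable grants when the backend is not the
    toolstack domain), so both libxl and the virtio backend read the
    same concrete 'enabled'/'disabled' value from xenstore."""
    if value == "default":
        return backend_domid != toolstack_domid
    return value == "true"
```

The benefit, as the review notes, is that the resolution rule lives in exactly one place (libxl) instead of being duplicated, and possibly diverging, in every backend that reads the node.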
[xen-unstable-smoke test] 180631: tolerable all pass - PUSHED
flight 180631 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180631/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail never pass

version targeted for testing:
 xen                  4c507d8a6b6e8be90881a335b0a66eb28e0f7737
baseline version:
 xen                  cb781ae2c98de5d5742aa0de6850f371fb25825f

Last test of basis   180622  2023-05-11 16:02:01 Z    0 days
Testing same since   180631  2023-05-12 08:00:26 Z    0 days    1 attempts

People who touched revisions under test:
  Jan Beulich
  Julien Grall
  Roger Pau Monné

jobs:
 build-arm64-xsm                                              pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-amd64-libvirt                                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision:

To xenbits.xen.org:/home/xen/git/xen.git
   cb781ae2c9..4c507d8a6b  4c507d8a6b6e8be90881a335b0a66eb28e0f7737 -> smoke
Re: [PATCH V2 1/2] libxl: virtio: Remove unused frontend nodes
On Thu, May 11, 2023 at 01:20:42PM +0530, Viresh Kumar wrote:
> The libxl_virtio file works for only PV devices and the nodes under the

Well, VirtIO devices are a kind of PV device, yes, just like there are Xen PV devices. So this explanation doesn't really make much sense.

> frontend path are not used by anyone. Remove them.

Instead of describing what isn't used, it might be better to describe what we are trying to achieve. Something like:

    Only the VirtIO backend will watch xenstore to find out when a new
    instance needs to be created for a guest, and read the parameters from
    there. VirtIO frontends are pure virtio, so they will not do anything
    with the xenstore nodes. They can be removed.

> While at it, also add a comment to the file describing what devices this
> file is used for.
>
> Signed-off-by: Viresh Kumar
> ---
> V2: New patch.
>
>  tools/libs/light/libxl_virtio.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/tools/libs/light/libxl_virtio.c b/tools/libs/light/libxl_virtio.c
> index faada49e184e..eadcb7124c3f 100644
> --- a/tools/libs/light/libxl_virtio.c
> +++ b/tools/libs/light/libxl_virtio.c
> @@ -10,6 +10,9 @@
>   * but WITHOUT ANY WARRANTY; without even the implied warranty of
>   * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>   * GNU Lesser General Public License for more details.
> + *
> + * This is used for PV (paravirtualized) devices only and the frontend isn't
> + * aware of the xenstore.

It might be more common to put this kind of comment at the top, that is, just before the "Copyright" line. Also, same remark as for the patch description: VirtIO devices are PV devices, so the comment is unnecessary. What is less obvious is why there is even xenstore interaction with a VirtIO device. Maybe a better description for the source file would be:

    Setup VirtIO backend. This is intended to interact with a VirtIO
    backend that is watching xenstore, and create new VirtIO devices with
    the parameters found in xenstore.

    (VirtIO frontends don't interact with xenstore.)

Thanks,

--
Anthony PERARD
[ovmf test] 180629: all pass - PUSHED
flight 180629 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180629/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 0b37723186ec1525b6caf14b0309fb0ed04084d7
baseline version:
 ovmf                 c08a3a96fd19ac8adc75f00d373b4a1606b26c00

Last test of basis   180627  2023-05-12 04:12:20 Z    0 days
Testing same since   180629  2023-05-12 06:42:08 Z    0 days    1 attempts

People who touched revisions under test:
  Giri Mudusuru

jobs:
 build-amd64-xsm                          pass
 build-i386-xsm                           pass
 build-amd64                              pass
 build-i386                               pass
 build-amd64-libvirt                      pass
 build-i386-libvirt                       pass
 build-amd64-pvops                        pass
 build-i386-pvops                         pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64     pass
 test-amd64-i386-xl-qemuu-ovmf-amd64      pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision:

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   c08a3a96fd..0b37723186  0b37723186ec1525b6caf14b0309fb0ed04084d7 -> xen-tested-master
Re: [PATCH v7 3/5] xen/riscv: align __bss_start
On Fri, 2023-05-12 at 09:45 +0200, Jan Beulich wrote:
> On 11.05.2023 19:09, Oleksii Kurochko wrote:
> > bss clear cycle requires proper alignment of __bss_start.
> >
> > Signed-off-by: Oleksii Kurochko
>
> Reviewed-by: Jan Beulich
> with two remarks, though:
>
> While probably not very important yet for RISC-V (until there is at
> least enough functionality to, say, boot Dom0), you may want to get
> used to adding Fixes: tags in cases like this one.

Got it.

> > --- a/xen/arch/riscv/xen.lds.S
> > +++ b/xen/arch/riscv/xen.lds.S
> > @@ -137,6 +137,7 @@ SECTIONS
> >      __init_end = .;
> >
> >      .bss : {                     /* BSS */
> > +        . = ALIGN(POINTER_ALIGN);
> >          __bss_start = .;
> >          *(.bss.stack_aligned)
> >          . = ALIGN(PAGE_SIZE);
>
> While independent of the change here, this ALIGN() visible in context
> is unnecessary, afaict. ALIGN() generally only makes sense when
> there's a linker-script-defined symbol right afterwards. Taking the
> case here, any contributions to .bss.page_aligned have to specify
> proper alignment themselves anyway (or else they'd be dependent upon
> linking order). Just like there's (correctly) no ALIGN(STACK_SIZE)
> ahead of *(.bss.stack_aligned).

It makes sense.

> The change here might be a good opportunity to drop that ALIGN() at
> the same time. So long as you (and the maintainers) agree, I guess
> the adjustment could easily be made while committing.

I would agree with this.

Thanks.

~ Oleksii
[linux-linus test] 180625: regressions - FAIL
flight 180625 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180625/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 build-arm64                        6 xen-build                fail REGR. vs. 180278
 test-amd64-amd64-dom0pvh-xl-intel 20 guest-localmigrate/x10   fail REGR. vs. 180278
 test-armhf-armhf-xl-credit1        8 xen-boot                 fail REGR. vs. 180278

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-credit2        1 build-check(1)           blocked n/a
 test-arm64-arm64-xl-thunderx       1 build-check(1)           blocked n/a
 test-arm64-arm64-xl-vhd            1 build-check(1)           blocked n/a
 test-arm64-arm64-examine           1 build-check(1)           blocked n/a
 build-arm64-libvirt                1 build-check(1)           blocked n/a
 test-arm64-arm64-libvirt-raw       1 build-check(1)           blocked n/a
 test-arm64-arm64-libvirt-xsm       1 build-check(1)           blocked n/a
 test-arm64-arm64-xl                1 build-check(1)           blocked n/a
 test-arm64-arm64-xl-credit1        1 build-check(1)           blocked n/a
 test-armhf-armhf-libvirt           8 xen-boot                 fail like 180278
 test-armhf-armhf-xl-multivcpu      8 xen-boot                 fail like 180278
 test-armhf-armhf-libvirt-raw       8 xen-boot                 fail like 180278
 test-armhf-armhf-xl-rtds           8 xen-boot                 fail like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop            fail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop            fail like 180278
 test-armhf-armhf-xl                8 xen-boot                 fail like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop            fail like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-armhf-armhf-examine           8 reboot                   fail like 180278
 test-armhf-armhf-libvirt-qcow2     8 xen-boot                 fail like 180278
 test-armhf-armhf-xl-credit2        8 xen-boot                 fail like 180278
 test-armhf-armhf-xl-vhd            8 xen-boot                 fail like 180278
 test-armhf-armhf-xl-arndale        8 xen-boot                 fail like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop            fail like 180278
 test-amd64-amd64-libvirt          15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-xsm           15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-xsm           16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2    14 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-raw      14 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-xsm      15 migrate-support-check    fail never pass

version targeted for testing:
 linux                cc3c44c9fda264c6d401be04e95449a57c1231c6
baseline version:
 linux                6c538e1adbfc696ac4747fb10d63e704344f763d

Last test of basis   180278  2023-04-16 19:41:46 Z   25 days
Failing since        180281  2023-04-17 06:24:36 Z   25 days   46 attempts
Testing same since   180625  2023-05-11 22:11:49 Z    0 days    1 attempts

2380 people touched revisions under test, not listing them all

jobs:
 build-amd64-xsm                                              pass
 build-arm64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-arm64                                                  fail
 build-armhf                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-arm64-libvirt                                          blocked
 build-armhf-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-arm64-pvops                                            pass
 build-armhf-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl                                          pass
 test-amd64-coresched-amd64-xl                                pass
 test-arm64-arm64-xl                                          blocked
 test-armhf-armhf-xl                                          fail
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        pass
[xen-unstable test] 180624: tolerable FAIL - PUSHED
flight 180624 xen-unstable real [real]
flight 180630 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/180624/
http://logs.test-lab.xenproject.org/osstest/logs/180630/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-examine            6 xen-install              fail pass in 180630-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw      15 saverestore-support-check fail like 180609
 test-armhf-armhf-libvirt-qcow2    15 saverestore-support-check fail like 180609
 test-armhf-armhf-libvirt          16 saverestore-support-check fail like 180609
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop            fail like 180618
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop             fail like 180618
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop            fail like 180618
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail like 180618
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop             fail like 180618
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop             fail like 180618
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180618
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop            fail like 180618
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop             fail like 180618
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop            fail like 180618
 test-amd64-amd64-libvirt          15 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-xsm      15 migrate-support-check    fail never pass
 test-amd64-i386-xl-pvshim         14 guest-start              fail never pass
 test-amd64-i386-libvirt           15 migrate-support-check    fail never pass
 test-amd64-i386-libvirt-xsm       15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-xsm           15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-xsm           16 saverestore-support-check fail never pass
 test-arm64-arm64-xl               15 migrate-support-check    fail never pass
 test-arm64-arm64-xl               16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx      15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-thunderx      16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2       15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-credit2       16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm      15 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-xsm      16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu     15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-multivcpu     16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1       15 migrate-support-check    fail never pass
 test-arm64-arm64-xl-credit1       16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds          15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-rtds          16 saverestore-support-check fail never pass
 test-armhf-armhf-xl               15 migrate-support-check    fail never pass
 test-armhf-armhf-xl               16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2       15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-credit2       16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw       14 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-vhd      14 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-raw      14 migrate-support-check    fail never pass
 test-arm64-arm64-libvirt-raw      15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd           14 migrate-support-check    fail never pass
 test-arm64-arm64-xl-vhd           15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw      14 migrate-support-check    fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2    14 migrate-support-check    fail never pass
 test-armhf-armhf-xl-arndale       15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-arndale       16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1       15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-credit1       16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd           14 migrate-support-check    fail never pass
 test-armhf-armhf-xl-vhd           15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt          15 migrate-support-check    fail never pass

version targeted for testing:
 xen                  cb781ae2c98de5d5742aa0de6850f371fb25825f
baseline version:
 xen
Re: [PATCH v7 3/5] xen/riscv: align __bss_start
On 11.05.2023 19:09, Oleksii Kurochko wrote:
> bss clear cycle requires proper alignment of __bss_start.
>
> Signed-off-by: Oleksii Kurochko

Reviewed-by: Jan Beulich
with two remarks, though:

While probably not very important yet for RISC-V (until there is at least enough functionality to, say, boot Dom0), you may want to get used to adding Fixes: tags in cases like this one.

> --- a/xen/arch/riscv/xen.lds.S
> +++ b/xen/arch/riscv/xen.lds.S
> @@ -137,6 +137,7 @@ SECTIONS
>      __init_end = .;
>
>      .bss : {                     /* BSS */
> +        . = ALIGN(POINTER_ALIGN);
>          __bss_start = .;
>          *(.bss.stack_aligned)
>          . = ALIGN(PAGE_SIZE);

While independent of the change here, this ALIGN() visible in context is unnecessary, afaict. ALIGN() generally only makes sense when there's a linker-script-defined symbol right afterwards. Taking the case here, any contributions to .bss.page_aligned have to specify proper alignment themselves anyway (or else they'd be dependent upon linking order). Just like there's (correctly) no ALIGN(STACK_SIZE) ahead of *(.bss.stack_aligned).

The change here might be a good opportunity to drop that ALIGN() at the same time. So long as you (and the maintainers) agree, I guess the adjustment could easily be made while committing.

Jan
Re: [RFC PATCH v2 6/8] pci/arm: don't do iommu call for phantom functions
On 11.05.2023 21:16, Stewart Hildebrand wrote:
> It's not necessary to add/remove/assign/deassign pci phantom functions
> for the ARM SMMU drivers. All associated AXI stream IDs are added during
> the iommu call for the base PCI device/function.
>
> However, the ARM SMMU drivers can cope with the extra/unnecessary calls just
> fine, so this patch is RFC as it's not strictly required.

Tying the skipping to IS_ENABLED(CONFIG_HAS_DEVICE_TREE) goes against one of Julien's earlier comments, towards DT and ACPI wanting to co-exist at some point. So I think keeping the supposedly unnecessary calls is going to be unavoidable, unless you have a runtime property that you could check instead.

Jan
Re: [PATCH v2 5/8] pci/arm: Use iommu_add_dt_pci_device()
On 11.05.2023 21:16, Stewart Hildebrand wrote:
> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>              pdev->domain = NULL;
>              goto out;
>          }
> +#ifdef CONFIG_HAS_DEVICE_TREE
> +        ret = iommu_add_dt_pci_device(pdev);
> +        if ( ret < 0 )
> +        {
> +            printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
> +            goto out;
> +        }
> +#endif
>          ret = iommu_add_device(pdev);

Hmm, am I misremembering that in the earlier patch you had #else to invoke the alternative behavior? Now you end up calling both functions; if that's indeed intended, this may still want doing differently.

Looking at the earlier patch introducing the function, I can't infer though whether that's intended: iommu_add_dt_pci_device() checks that the add_device hook is present, but then I didn't find any use of this hook. The revlog there suggests the check might be stale. If indeed the function does only preparatory work, I don't see why it would need naming "iommu_..."; I'd rather consider pci_add_dt_device() then. Plus in such a case #ifdef-ary here probably wants avoiding by introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then ...

>          if ( ret )
>          {
> +#ifdef CONFIG_HAS_DEVICE_TREE
> +            iommu_fwspec_free(pci_to_dev(pdev));
> +#endif

... this (which I understand is doing the corresponding cleanup) then also wants wrapping in a suitably named tiny helper function.

But yet further I'm then no longer convinced this is the right place for the addition. pci_add_device() is backing physdev hypercalls. It would seem to me that the function may want invoking yet one layer further up, or it may even want invoking from a brand new DT-specific physdev-op. This would then leave at least the x86-only paths (invoking pci_add_device() from outside of pci_physdev_op()) entirely alone.

Jan
Re: [PATCH v2 3/8] iommu/arm: Introduce iommu_add_dt_pci_device API
On 11.05.2023 21:16, Stewart Hildebrand wrote:
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -219,7 +219,8 @@ int iommu_dt_domain_init(struct domain *d);
>  int iommu_release_dt_devices(struct domain *d);
>
>  /*
> - * Helper to add master device to the IOMMU using generic IOMMU DT bindings.
> + * Helpers to add master device to the IOMMU using generic (PCI-)IOMMU
> + * DT bindings.
>   *
>   * Return values:
>   *  0 : device is protected by an IOMMU
> @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d);
>   * (IOMMU is not enabled/present or device is not connected to it).
>   */
>  int iommu_add_dt_device(struct dt_device_node *np);
> +#ifdef CONFIG_HAS_PCI
> +int iommu_add_dt_pci_device(struct pci_dev *pdev);
> +#endif

Is the #ifdef really necessary?

Jan
Re: [PATCH] x86/vPIT: check/bound values loaded from state save record
On 11.05.2023 19:51, Jason Andryuk wrote:
> On Thu, May 11, 2023 at 7:50 AM Jan Beulich wrote:
>> @@ -426,6 +427,33 @@ static int cf_check pit_load(struct doma
>>      }
>>
>>      /*
>> +     * Convert loaded values to be within valid range, for them to represent
>> +     * actually reachable state. Uses of some of the values elsewhere assume
>> +     * this is the case.
>> +     */
>> +    for ( i = 0; i < ARRAY_SIZE(pit->hw.channels); ++i )
>> +    {
>> +        struct hvm_hw_pit_channel *ch = &pit->hw.channels[i];
>> +
>> +        /* pit_load_count() will convert 0 suitably back to 0x10000. */
>> +        ch->count &= 0xffff;
>> +        if ( ch->count_latched >= RW_STATE_NUM )
>> +            ch->count_latched = 0;
>> +        if ( ch->read_state >= RW_STATE_NUM )
>> +            ch->read_state = 0;
>> +        if ( ch->read_state >= RW_STATE_NUM )
>> +            ch->write_state = 0;
>
> Should these both be write_state?

Definitely. Thanks for noticing!

Jan
[ovmf test] 180627: all pass - PUSHED
flight 180627 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180627/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 c08a3a96fd19ac8adc75f00d373b4a1606b26c00
baseline version:
 ovmf                 0a0e60caf20ab621ee9c1fc66b82b739158c05cf

Last test of basis   180617  2023-05-11 09:10:41 Z    0 days
Testing same since   180627  2023-05-12 04:12:20 Z    0 days    1 attempts

People who touched revisions under test:
  Tinh Nguyen

jobs:
 build-amd64-xsm                          pass
 build-i386-xsm                           pass
 build-amd64                              pass
 build-i386                               pass
 build-amd64-libvirt                      pass
 build-i386-libvirt                       pass
 build-amd64-pvops                        pass
 build-i386-pvops                         pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64     pass
 test-amd64-i386-xl-qemuu-ovmf-amd64      pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision:

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   0a0e60caf2..c08a3a96fd  c08a3a96fd19ac8adc75f00d373b4a1606b26c00 -> xen-tested-master
Re: [PATCH v3 07/14 RESEND] cpufreq: Export HWP parameters to userspace
On 11.05.2023 22:22, Jason Andryuk wrote:
> On Thu, May 11, 2023 at 10:10 AM Jan Beulich wrote:
>> On 11.05.2023 15:49, Jason Andryuk wrote:
>>> On Thu, May 11, 2023 at 2:21 AM Jan Beulich wrote:
>>>> On 10.05.2023 19:49, Jason Andryuk wrote:
>>>>> On Mon, May 8, 2023 at 6:26 AM Jan Beulich wrote:
>>>>>> On 01.05.2023 21:30, Jason Andryuk wrote:
>>>>>>> Extend xen_get_cpufreq_para to return hwp parameters. These match the
>>>>>>> hardware rather closely.
>>>>>>>
>>>>>>> We need the features bitmask to indicate fields supported by the actual
>>>>>>> hardware.
>>>>>>>
>>>>>>> The use of uint8_t parameters matches the hardware size. uint32_t
>>>>>>> entries grow the sysctl_t past the build assertion in setup.c. The
>>>>>>> uint8_t ranges are supported across multiple generations, so hopefully
>>>>>>> they won't change.
>>>>>>
>>>>>> Still it feels a little odd for values to be this narrow. Aiui the
>>>>>> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
>>>>>> used by HWP. So you could widen the union in struct
>>>>>> xen_get_cpufreq_para (in a binary but not necessarily source compatible
>>>>>> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
>>>>>> placed scaling_cur_freq could be included as well ...
>>>>>
>>>>> The values are narrow, but they match the hardware. It works for HWP,
>>>>> so there is no need to change at this time AFAICT.
>>>>>
>>>>> Do you want me to make this change?
>>>>
>>>> Well, much depends on what these 8-bit values actually express (I did
>>>> raise this question in one of the replies to your patches, as I wasn't
>>>> able to find anything in the SDM). That'll then hopefully allow making
>>>> some educated prediction on how likely it is that a future variant of
>>>> hwp would want to widen them.
>>>
>>> Sorry for not providing a reference earlier. In the SDM,
>>> HARDWARE-CONTROLLED PERFORMANCE STATES (HWP) section, there is this
>>> second paragraph:
>>> """
>>> In contrast, HWP is an implementation of the ACPI-defined
>>> Collaborative Processor Performance Control (CPPC), which specifies
>>> that the platform enumerates a continuous, abstract unit-less,
>>> performance value scale that is not tied to a specific performance
>>> state / frequency by definition. While the enumerated scale is roughly
>>> linear in terms of a delivered integer workload performance result,
>>> the OS is required to characterize the performance value range to
>>> comprehend the delivered performance for an applied workload.
>>> """
>>>
>>> The numbers are "continuous, abstract unit-less, performance value[s]."
>>> So there isn't much to go on there, but generally, smaller numbers
>>> mean slower and bigger numbers mean faster.
>>>
>>> Cross referencing the ACPI spec here:
>>> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#collaborative-processor-performance-control
>>>
>>> Scrolling down you can find the register entries such as
>>>
>>>     Highest Performance
>>>     Register or DWORD Attribute: Read
>>>     Size: 8-32 bits
>>>
>>> AMD has its own pstate implementation that is similar to HWP. Looking
>>> at the Linux support, the AMD hardware also uses 8 bit values for the
>>> comparable fields:
>>> https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/msr-index.h#L612
>>>
>>> So Intel and AMD are 8 bit for now at least. Something could do 32 bits
>>> according to the ACPI spec.
>>>
>>> 8 bits of granularity for slow to fast seems like plenty to me. I'm
>>> not sure what one would gain from 16 or 32 bits, but I'm not designing
>>> the hardware. From the earlier xenpm output, "highest" was 49, so
>>> there's still a decent amount of room in an 8 bit range.
>>
>> Hmm, thanks for the pointers. I'm still somewhat undecided. I guess I'm
>> okay with you keeping things as you have them. If and when needed we can
>> still rework the structure - it is possible to change it as it's (for
>> the time being at least) still an unstable interface.
>
> With an anonymous union and anonymous struct, struct
> xen_get_cpufreq_para can be re-arranged and compile without any
> changes to other cpufreq code. struct xen_hwp_para becomes 10
> uint32_t's. The old scaling is 3 * uint32_t + 16 bytes
> CPUFREQ_NAME_LEN + 4 * uint32_t for xen_ondemand = 11 uint32_t. So
> int32_t turbo_enabled doesn't move and it's binary compatible.
>
> Anonymous unions and structs aren't allowed in the public header
> though, right?

Correct.

> So that would need to change, though it doesn't seem
> too bad. There isn't too much churn.
>
> I have no plans to tackle AMD pstate. But having glanced at it this
> morning, maybe these hwp sysctls should be renamed cppc? AMD pstate
> and HWP are both implementations of CPPC, so that could be more future
> proof? But, again, I only glanced at the AMD stuff, so there may be
> other changes needed.

I consider this naming change plan plausible. If further adjustments end up
Re: [PATCH v3 04/14 RESEND] cpufreq: Add Hardware P-State (HWP) driver
On 12.05.2023 03:02, Marek Marczykowski-Górecki wrote:
> On Wed, May 10, 2023 at 04:19:57PM +0200, Jan Beulich wrote:
>> On 10.05.2023 15:54, Jason Andryuk wrote:
>>> On Mon, May 8, 2023 at 2:33 AM Jan Beulich wrote:
>>>> On 05.05.2023 17:35, Jason Andryuk wrote:
>>>>> On Fri, May 5, 2023 at 3:01 AM Jan Beulich wrote:
>>>>> The other issue is that if you select "hwp" as the governor, but HWP
>>>>> hardware support is not available, then hwp_available() needs to reset
>>>>> the governor back to the default. This feels like a layering
>>>>> violation.
>>>>
>>>> Layering violation - yes. But why would the governor need resetting in
>>>> this case? If HWP was asked for but isn't available, I don't think any
>>>> other cpufreq handling (and hence governor) should be put in place. And
>>>> turning off cpufreq altogether (if necessary in the first place)
>>>> wouldn't, to me, feel as much like a layering violation.
>>>
>>> My goal was for Xen to use HWP if available and fall back to the acpi
>>> cpufreq driver if not. That to me seems more user-friendly than
>>> disabling cpufreq.
>>>
>>>     if ( hwp_available() )
>>>         ret = hwp_register_driver();
>>>     else
>>>         ret = cpufreq_register_driver(&acpi_cpufreq_driver);
>>
>> That's fine as a (future) default, but for now using hwp requires a
>> command line option, and if that option says "hwp" then it ought to
>> be hwp imo.
>
> As a downstream distribution, I'd strongly prefer to have an option that
> would enable HWP when present and fall back to the other driver otherwise,
> even if that isn't the default upstream. I can't possibly require a large
> group of users (either HWP-having or HWP-not-having) to edit the Xen
> cmdline to get power management working well.
>
> If the meaning for cpufreq=hwp absolutely must include "nothing if HWP
> is not available", then maybe it should be named cpufreq=try-hwp
> instead, or cpufreq=prefer-hwp or something else like this?

Any new sub-option needs to fit the existing ones in its meaning. I could see e.g. "cpufreq=xen" alone to effect what you want (once hwp becomes available for use by default). But (for now at least) I continue to think that a request for "hwp" ought to mean HWP.

Jan