[qemu-mainline test] 180637: tolerable FAIL - PUSHED

2023-05-12 Thread osstest service owner
flight 180637 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180637/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop             fail like 180610
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop             fail like 180610
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check      fail like 180610
 test-armhf-armhf-libvirt 16 saverestore-support-check          fail like 180610
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check    fail like 180610
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop              fail like 180610
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail like 180610
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop              fail like 180610
 test-amd64-i386-xl-pvshim    14 guest-start                    fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check          fail never pass
 test-amd64-amd64-libvirt     15 migrate-support-check          fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check          fail never pass
 test-amd64-i386-libvirt      15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check          fail never pass
 test-arm64-arm64-xl          15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl          16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check      fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check          fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check         fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check     fail never pass
 test-armhf-armhf-xl          15 migrate-support-check          fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check      fail never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check          fail never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check      fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check          fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check          fail never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check      fail never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check          fail never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check      fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check          fail never pass
 test-armhf-armhf-libvirt     15 migrate-support-check          fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check          fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check      fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check        fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check          fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check      fail never pass

version targeted for testing:
 qemuu                278238505d28d292927bff7683f39fb4fbca7fd1
baseline version:
 qemuu                d530697ca20e19f7a626f4c1c8b26fccd0dc4470

 Last test of basis   180610  2023-05-10 23:38:37 Z    2 days
 Testing same since   180621  2023-05-11 14:27:55 Z    1 days    3 attempts


People who touched revisions under test:
  Dr. David Alan Gilbert 
  Jamie Iles 
  Juan Quintela 
  Laurent Vivier 
  Lukas Straub 
  Peter Maydell 
  Richard Henderson 
  Thomas Huth 
  Vladimir Sementsov-Ogievskiy 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm  

[PATCH 2/4] automation: enable earlyprintk=xen for both dom0 and domU in hw tests

2023-05-12 Thread Marek Marczykowski-Górecki
Make debugging early boot failures easier based on just CI logs.

Signed-off-by: Marek Marczykowski-Górecki 
---
 automation/scripts/qubes-x86-64.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index ae766395d184..bd09451d7d28 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -80,7 +80,7 @@ type = "'${test_variant#pci-}'"
 name = "domU"
 kernel = "/boot/vmlinuz"
 ramdisk = "/boot/initrd-domU"
-extra = "root=/dev/ram0 console=hvc0"
+extra = "root=/dev/ram0 console=hvc0 earlyprintk=xen"
 memory = 512
 vif = [ ]
 disk = [ ]
@@ -186,7 +186,7 @@ CONTROLLER=control@thor.testnet
 
 echo "
 multiboot2 (http)/gitlab-ci/xen $CONSOLE_OPTS loglvl=all guest_loglvl=all dom0_mem=4G
-module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
+module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0 earlyprintk=xen
 module2 (http)/gitlab-ci/initrd-dom0
 " > $TFTP/grub.cfg
 
-- 
git-series 0.9.1



[PATCH 3/4] automation: add x86_64 tests on a AMD Zen3+ runner

2023-05-12 Thread Marek Marczykowski-Górecki
This adds another physical runner to Gitlab-CI, running a similar set of
jobs to the Alder Lake one.

The machine specifically is a
MinisForum UM773 Lite with an AMD Ryzen 7 7735HS

The PV passthrough test is skipped as currently it fails on this system
with:
(d1) Can't find new memory area for initrd needed due to E820 map conflict

The S3 test is skipped as it currently fails - the system seems to
suspend properly (power LED blinks), but when woken up the power LED
goes back to solid on and the fan spins at top speed, but otherwise
there are no signs of life from the system (no output on the console,
HDMI or anything else).

Signed-off-by: Marek Marczykowski-Górecki 
---
 automation/gitlab-ci/test.yaml | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index cb7fd5c272e9..81d027532cca 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -108,6 +108,16 @@
   tags:
 - qubes-hw2
 
+.zen3p-x86-64:
+  # it's really similar to the above
+  extends: .adl-x86-64
+  variables:
+    PCIDEV: "01:00.0"
+    PCIDEV_INTR: "MSI-X"
+    CONSOLE_OPTS: "console=com1 com1=115200,8n1,pci,msi"
+  tags:
+    - qubes-hw11
+
 # Test jobs
 build-each-commit-gcc:
   extends: .test-jobs-common
@@ -176,6 +186,22 @@ adl-pci-hvm-x86-64-gcc-debug:
 - *x86-64-test-needs
 - alpine-3.12-gcc-debug
 
+zen3p-smoke-x86-64-gcc-debug:
+  extends: .zen3p-x86-64
+  script:
+    - ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
+  needs:
+    - *x86-64-test-needs
+    - alpine-3.12-gcc-debug
+
+zen3p-pci-hvm-x86-64-gcc-debug:
+  extends: .zen3p-x86-64
+  script:
+    - ./automation/scripts/qubes-x86-64.sh pci-hvm 2>&1 | tee ${LOGFILE}
+  needs:
+    - *x86-64-test-needs
+    - alpine-3.12-gcc-debug
+
 qemu-smoke-dom0-arm64-gcc:
   extends: .qemu-arm64
   script:
-- 
git-series 0.9.1



[PATCH 0/4] automation: add AMD hw runner

2023-05-12 Thread Marek Marczykowski-Górecki
This series adds another hardware runner to the CI.
See individual patch descriptions for details.

Marek Marczykowski-Górecki (4):
  automation: make console options configurable via variables
  automation: enable earlyprintk=xen for both dom0 and domU in hw tests
  automation: add x86_64 tests on a AMD Zen3+ runner
  automation: add PV passthrough tests on a AMD Zen3+ runner

 automation/gitlab-ci/test.yaml | 36 +++-
 automation/scripts/qubes-x86-64.sh | 10 -
 2 files changed, 41 insertions(+), 5 deletions(-)

base-commit: 31c65549746179e16cf3f82b694b4b1e0b7545ca
-- 
git-series 0.9.1



[PATCH 1/4] automation: make console options configurable via variables

2023-05-12 Thread Marek Marczykowski-Górecki
This makes the test script more easily reusable for different runners,
where the console may be connected differently. Include both the console=
option and the configuration for the specific chosen console (like com1=
here) in the 'CONSOLE_OPTS' variable.

Signed-off-by: Marek Marczykowski-Górecki 
---
This will conflict with Stefano's patch, as both modify multiboot2 line,
but it shouldn't be too hard to resolve the conflict manually (both
replace console opts with a variable, and add extra opts at the end).
---
 automation/gitlab-ci/test.yaml | 1 +
 automation/scripts/qubes-x86-64.sh | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 55ca0c27dc49..cb7fd5c272e9 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -96,6 +96,7 @@
 LOGFILE: smoke-test.log
 PCIDEV: "03:00.0"
 PCIDEV_INTR: "MSI-X"
+CONSOLE_OPTS: "console=com1 com1=115200,8n1"
   artifacts:
 paths:
   - smoke.serial
diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index 056faf9e6de8..ae766395d184 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -184,11 +184,11 @@ cd ..
 TFTP=/scratch/gitlab-runner/tftp
 CONTROLLER=control@thor.testnet
 
-echo '
-multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G
+echo "
+multiboot2 (http)/gitlab-ci/xen $CONSOLE_OPTS loglvl=all guest_loglvl=all dom0_mem=4G
 module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
 module2 (http)/gitlab-ci/initrd-dom0
-' > $TFTP/grub.cfg
+" > $TFTP/grub.cfg
 
 cp -f binaries/xen $TFTP/xen
 cp -f binaries/bzImage $TFTP/vmlinuz
-- 
git-series 0.9.1



[PATCH 4/4] automation: add PV passthrough tests on a AMD Zen3+ runner

2023-05-12 Thread Marek Marczykowski-Górecki
The PV passthrough test currently fails on this system
with:
(d1) Can't find new memory area for initrd needed due to E820 map conflict

Setting e820_host=1 does not help. So, add this test with
"allow_failure: true" until the problem is fixed.

Signed-off-by: Marek Marczykowski-Górecki 
---
I'm unsure if this should be included. On one hand, the test case will
help verify a potential fix. But on the other hand, until the problem is
fixed it will be wasting time.
---
 automation/gitlab-ci/test.yaml |  9 +
 1 file changed, 9 insertions(+)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 81d027532cca..7becb7a6b782 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -194,6 +194,15 @@ zen3p-smoke-x86-64-gcc-debug:
 - *x86-64-test-needs
 - alpine-3.12-gcc-debug
 
+zen3p-pci-pv-x86-64-gcc-debug:
+  extends: .zen3p-x86-64
+  allow_failure: true
+  script:
+    - ./automation/scripts/qubes-x86-64.sh pci-pv 2>&1 | tee ${LOGFILE}
+  needs:
+    - *x86-64-test-needs
+    - alpine-3.12-gcc-debug
+
 zen3p-pci-hvm-x86-64-gcc-debug:
   extends: .zen3p-x86-64
   script:
-- 
git-series 0.9.1



Re: [PATCH] automation: add a Dom0 PVH test based on Qubes' runner

2023-05-12 Thread Marek Marczykowski-Górecki
On Fri, May 12, 2023 at 06:28:33PM -0700, Stefano Stabellini wrote:
> From: Stefano Stabellini 
> 
> Straightforward Dom0 PVH test based on the existing basic Smoke test for
> the Qubes runner.
> 
> Signed-off-by: Stefano Stabellini 
> ---
>  automation/gitlab-ci/test.yaml |  8 
>  automation/scripts/qubes-x86-64.sh | 14 +-
>  2 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> index 55ca0c27dc..9c0e08d456 100644
> --- a/automation/gitlab-ci/test.yaml
> +++ b/automation/gitlab-ci/test.yaml
> @@ -149,6 +149,14 @@ adl-smoke-x86-64-gcc-debug:
>  - *x86-64-test-needs
>  - alpine-3.12-gcc-debug
>  
> +adl-smoke-x86-64-dom0pvh-gcc-debug:
> +  extends: .adl-x86-64
> +  script:
> +    - ./automation/scripts/qubes-x86-64.sh dom0pvh 2>&1 | tee ${LOGFILE}
> +  needs:
> +    - *x86-64-test-needs
> +    - alpine-3.12-gcc-debug
> +
>  adl-suspend-x86-64-gcc-debug:
>extends: .adl-x86-64
>script:
> diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
> index 056faf9e6d..35b9386e5d 100755
> --- a/automation/scripts/qubes-x86-64.sh
> +++ b/automation/scripts/qubes-x86-64.sh
> @@ -5,6 +5,7 @@ set -ex
>  test_variant=$1
>  
>  ### defaults
> +dom0pvh=

Maybe better to name it extra_xen_opts=? I can see it being useful in other
tests in the future too.
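For illustration, the rename would amount to roughly the sketch below (a
hypothetical sketch only; the extra_xen_opts name and its exact placement
are assumptions, not part of the posted patch):

### defaults
extra_xen_opts=          # extra Xen command line options, e.g. "dom0=pvh"

# per-variant setup
if [ "${test_variant}" = "dom0pvh" ]; then
    extra_xen_opts="dom0=pvh"
fi

# later, when writing grub.cfg, append the options to the Xen command line
echo "
multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G $extra_xen_opts
module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
module2 (http)/gitlab-ci/initrd-dom0
" > $TFTP/grub.cfg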

>  wait_and_wakeup=
>  timeout=120
>  domU_config='
> @@ -18,8 +19,8 @@ vif = [ "bridge=xenbr0", ]
>  disk = [ ]
>  '
>  
> -### test: smoke test
> -if [ -z "${test_variant}" ]; then
> +### test: smoke test & smoke test PVH
> +if [ -z "${test_variant}" ] || [ "${test_variant}" = "dom0pvh" ]; then
>  passed="ping test passed"
>  domU_check="
>  ifconfig eth0 192.168.0.2
> @@ -36,6 +37,9 @@ done
>  tail -n 100 /var/log/xen/console/guest-domU.log
>  echo \"${passed}\"
>  "
> +if [ "${test_variant}" = "dom0pvh" ]; then
> +dom0pvh="dom0=pvh"
> +fi
>  
>  ### test: S3
>  elif [ "${test_variant}" = "s3" ]; then
> @@ -184,11 +188,11 @@ cd ..
>  TFTP=/scratch/gitlab-runner/tftp
>  CONTROLLER=control@thor.testnet
>  
> -echo '
> -multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G
> +echo "
> +multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G $dom0pvh
>  module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
>  module2 (http)/gitlab-ci/initrd-dom0
> -' > $TFTP/grub.cfg
> +" > $TFTP/grub.cfg
>  
>  cp -f binaries/xen $TFTP/xen
>  cp -f binaries/bzImage $TFTP/vmlinuz
> -- 
> 2.25.1
> 
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab




[linux-linus test] 180633: regressions - FAIL

2023-05-12 Thread osstest service owner
flight 180633 linux-linus real [real]
flight 180640 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/180633/
http://logs.test-lab.xenproject.org/osstest/logs/180640/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 180278

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  8 xen-boot fail  like 180278
 test-armhf-armhf-xl-multivcpu  8 xen-boot fail like 180278
 test-armhf-armhf-libvirt-raw  8 xen-boot fail  like 180278
 test-armhf-armhf-xl-rtds  8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop             fail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop             fail like 180278
 test-armhf-armhf-xl   8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop             fail like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-armhf-armhf-examine  8 reboot   fail  like 180278
 test-armhf-armhf-libvirt-qcow2  8 xen-boot                     fail like 180278
 test-armhf-armhf-xl-credit2   8 xen-boot fail  like 180278
 test-armhf-armhf-xl-vhd   8 xen-boot fail  like 180278
 test-armhf-armhf-xl-arndale   8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop             fail like 180278
 test-amd64-amd64-libvirt     15 migrate-support-check          fail never pass
 test-arm64-arm64-xl          15 migrate-support-check          fail never pass
 test-arm64-arm64-xl          16 saverestore-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check      fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check          fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check      fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check        fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check          fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check          fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check      fail never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check          fail never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check      fail never pass

version targeted for testing:
 linux                cc3c44c9fda264c6d401be04e95449a57c1231c6
baseline version:
 linux                6c538e1adbfc696ac4747fb10d63e704344f763d

 Last test of basis   180278  2023-04-16 19:41:46 Z   26 days
 Failing since        180281  2023-04-17 06:24:36 Z   25 days   47 attempts
 Testing same since   180625  2023-05-11 22:11:49 Z    1 days    2 attempts


2380 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 

[PATCH] automation: add a Dom0 PVH test based on Qubes' runner

2023-05-12 Thread Stefano Stabellini
From: Stefano Stabellini 

Straightforward Dom0 PVH test based on the existing basic Smoke test for
the Qubes runner.

Signed-off-by: Stefano Stabellini 
---
 automation/gitlab-ci/test.yaml |  8 
 automation/scripts/qubes-x86-64.sh | 14 +-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 55ca0c27dc..9c0e08d456 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -149,6 +149,14 @@ adl-smoke-x86-64-gcc-debug:
 - *x86-64-test-needs
 - alpine-3.12-gcc-debug
 
+adl-smoke-x86-64-dom0pvh-gcc-debug:
+  extends: .adl-x86-64
+  script:
+    - ./automation/scripts/qubes-x86-64.sh dom0pvh 2>&1 | tee ${LOGFILE}
+  needs:
+    - *x86-64-test-needs
+    - alpine-3.12-gcc-debug
+
 adl-suspend-x86-64-gcc-debug:
   extends: .adl-x86-64
   script:
diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index 056faf9e6d..35b9386e5d 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -5,6 +5,7 @@ set -ex
 test_variant=$1
 
 ### defaults
+dom0pvh=
 wait_and_wakeup=
 timeout=120
 domU_config='
@@ -18,8 +19,8 @@ vif = [ "bridge=xenbr0", ]
 disk = [ ]
 '
 
-### test: smoke test
-if [ -z "${test_variant}" ]; then
+### test: smoke test & smoke test PVH
+if [ -z "${test_variant}" ] || [ "${test_variant}" = "dom0pvh" ]; then
 passed="ping test passed"
 domU_check="
 ifconfig eth0 192.168.0.2
@@ -36,6 +37,9 @@ done
 tail -n 100 /var/log/xen/console/guest-domU.log
 echo \"${passed}\"
 "
+if [ "${test_variant}" = "dom0pvh" ]; then
+dom0pvh="dom0=pvh"
+fi
 
 ### test: S3
 elif [ "${test_variant}" = "s3" ]; then
@@ -184,11 +188,11 @@ cd ..
 TFTP=/scratch/gitlab-runner/tftp
 CONTROLLER=control@thor.testnet
 
-echo '
> -multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G
> +echo "
> +multiboot2 (http)/gitlab-ci/xen console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all dom0_mem=4G $dom0pvh
 module2 (http)/gitlab-ci/vmlinuz console=hvc0 root=/dev/ram0
 module2 (http)/gitlab-ci/initrd-dom0
-' > $TFTP/grub.cfg
+" > $TFTP/grub.cfg
 
 cp -f binaries/xen $TFTP/xen
 cp -f binaries/bzImage $TFTP/vmlinuz
-- 
2.25.1




[PATCH 2/2] xen/x86/pvh: copy ACPI tables to Dom0 instead of mapping

2023-05-12 Thread Stefano Stabellini
From: Stefano Stabellini 

Mapping the ACPI tables to Dom0 PVH 1:1 leads to memory corruption of
the tables in the guest. Instead, copy the tables to Dom0.

This is a workaround.

Signed-off-by: Stefano Stabellini 
---
As mentioned in the cover letter, this is an RFC workaround as I don't
know the cause of the underlying problem. I do know that this patch
solves what would otherwise be a hang at boot when Dom0 PVH attempts to
parse the ACPI tables.
---
 xen/arch/x86/hvm/dom0_build.c | 107 +-
 1 file changed, 27 insertions(+), 80 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 5fde769863..a6037fc6ed 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -73,32 +73,6 @@ static void __init print_order_stats(const struct domain *d)
 printk("order %2u allocations: %u\n", i, order_stats[i]);
 }
 
-static int __init modify_identity_mmio(struct domain *d, unsigned long pfn,
-   unsigned long nr_pages, const bool map)
-{
-int rc;
-
-for ( ; ; )
-{
-rc = map ?   map_mmio_regions(d, _gfn(pfn), nr_pages, _mfn(pfn))
- : unmap_mmio_regions(d, _gfn(pfn), nr_pages, _mfn(pfn));
-if ( rc == 0 )
-break;
-if ( rc < 0 )
-{
-printk(XENLOG_WARNING
-   "Failed to identity %smap [%#lx,%#lx) for d%d: %d\n",
-   map ? "" : "un", pfn, pfn + nr_pages, d->domain_id, rc);
-break;
-}
-nr_pages -= rc;
-pfn += rc;
-process_pending_softirqs();
-}
-
-return rc;
-}
-
 /* Populate a HVM memory range using the biggest possible order. */
 static int __init pvh_populate_memory_range(struct domain *d,
 unsigned long start,
@@ -967,6 +941,8 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
 unsigned long size = sizeof(*xsdt);
 unsigned int i, j, num_tables = 0;
 int rc;
+struct acpi_table_fadt fadt;
+unsigned long fadt_addr = 0, dsdt_addr = 0, facs_addr = 0, fadt_size = 0;
 struct acpi_table_header header = {
 .signature= "XSDT",
 .length   = sizeof(struct acpi_table_header),
@@ -1013,10 +989,33 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
 /* Copy the addresses of the rest of the allowed tables. */
 for( i = 0, j = 1; i < acpi_gbl_root_table_list.count; i++ )
 {
+void *table;
+
+pvh_steal_ram(d, tables[i].length, 0, GB(4), addr);
+table = acpi_os_map_memory(tables[i].address, tables[i].length);
+hvm_copy_to_guest_phys(*addr, table, tables[i].length, d->vcpu[0]);
+pvh_add_mem_range(d, *addr, *addr + tables[i].length, E820_ACPI);
+
+if ( !strncmp(tables[i].signature.ascii, ACPI_SIG_FADT, ACPI_NAME_SIZE) )
+{
+memcpy(&fadt, table, tables[i].length);
+fadt_addr = *addr;
+fadt_size = tables[i].length;
+}
+else if ( !strncmp(tables[i].signature.ascii, ACPI_SIG_DSDT, ACPI_NAME_SIZE) )
+dsdt_addr = *addr;
+else if ( !strncmp(tables[i].signature.ascii, ACPI_SIG_FACS, ACPI_NAME_SIZE) )
+facs_addr = *addr;
+
 if ( pvh_acpi_xsdt_table_allowed(tables[i].signature.ascii,
- tables[i].address, tables[i].length) )
-xsdt->table_offset_entry[j++] = tables[i].address;
+tables[i].address, tables[i].length) )
+xsdt->table_offset_entry[j++] = *addr;
+
+acpi_os_unmap_memory(table, tables[i].length);
 }
+fadt.dsdt = dsdt_addr;
+fadt.facs = facs_addr;
+hvm_copy_to_guest_phys(fadt_addr, &fadt, fadt_size, d->vcpu[0]);
 
 xsdt->header.revision = 1;
 xsdt->header.length = size;
@@ -1055,9 +1054,7 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
 
 static int __init pvh_setup_acpi(struct domain *d, paddr_t start_info)
 {
-unsigned long pfn, nr_pages;
 paddr_t madt_paddr, xsdt_paddr, rsdp_paddr;
-unsigned int i;
 int rc;
 struct acpi_table_rsdp *native_rsdp, rsdp = {
 .signature = ACPI_SIG_RSDP,
@@ -1065,56 +1062,6 @@ static int __init pvh_setup_acpi(struct domain *d, paddr_t start_info)
 .length = sizeof(rsdp),
 };
 
-
-/* Scan top-level tables and add their regions to the guest memory map. */
-for( i = 0; i < acpi_gbl_root_table_list.count; i++ )
-{
-const char *sig = acpi_gbl_root_table_list.tables[i].signature.ascii;
-unsigned long addr = acpi_gbl_root_table_list.tables[i].address;
-unsigned long size = acpi_gbl_root_table_list.tables[i].length;
-
-/*
- * Make sure the original MADT is also mapped, so that Dom0 can
- * properly access the data returned by _MAT methods in case it's
- * re-using MADT 

[PATCH 1/2] xen/x86/pvh: use preset XSDT header for XSDT generation

2023-05-12 Thread Stefano Stabellini
From: Stefano Stabellini 

Xen always generates an XSDT table even if the firmware provided an RSDT
table. Instead of copying the XSDT header from the firmware table (which
might be missing), generate the XSDT header from a preset.

Signed-off-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/dom0_build.c | 32 +---
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 307edc6a8c..5fde769863 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -963,13 +963,18 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
   paddr_t *addr)
 {
 struct acpi_table_xsdt *xsdt;
-struct acpi_table_header *table;
-struct acpi_table_rsdp *rsdp;
 const struct acpi_table_desc *tables = acpi_gbl_root_table_list.tables;
 unsigned long size = sizeof(*xsdt);
 unsigned int i, j, num_tables = 0;
-paddr_t xsdt_paddr;
 int rc;
+struct acpi_table_header header = {
+.signature= "XSDT",
+.length   = sizeof(struct acpi_table_header),
+.revision = 0x1,
+.oem_id   = "Xen",
+.oem_table_id = "HVM",
+.oem_revision = 0,
+};
 
 /*
  * Restore original DMAR table signature, we are going to filter it from
@@ -1001,26 +1006,7 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
 goto out;
 }
 
-/* Copy the native XSDT table header. */
-rsdp = acpi_os_map_memory(acpi_os_get_root_pointer(), sizeof(*rsdp));
-if ( !rsdp )
-{
-printk("Unable to map RSDP\n");
-rc = -EINVAL;
-goto out;
-}
-xsdt_paddr = rsdp->xsdt_physical_address;
-acpi_os_unmap_memory(rsdp, sizeof(*rsdp));
-table = acpi_os_map_memory(xsdt_paddr, sizeof(*table));
-if ( !table )
-{
-printk("Unable to map XSDT\n");
-rc = -EINVAL;
-goto out;
-}
-xsdt->header = *table;
-acpi_os_unmap_memory(table, sizeof(*table));
-
+xsdt->header = header;
 /* Add the custom MADT. */
 xsdt->table_offset_entry[0] = madt_addr;
 
-- 
2.25.1




[PATCH 0/2] PVH Dom0 on QEMU

2023-05-12 Thread Stefano Stabellini
Hi all,

These 2 patches are necessary to boot Xen and Dom0 PVH on QEMU (with or
without KVM acceleration.)

The first one is a genuine fix. The second one is a workaround: I don't
know the underlying root cause of the problem.

Cheers,

Stefano



PVH feature omission

2023-05-12 Thread Elliott Mitchell
From tools/libs/light/libxl_x86.c: libxl__arch_passthrough_mode_setdefault()

"passthrough not yet supported for x86 PVH guests\n"

So PVH is recommended for most situations, but it is /still/ impossible
to pass hardware devices to PVH domains?  Seems odd this has never been
addressed given how long PVH has been around.

The other tools omission I noticed is it appears `xl` has no support for
pvSCSI?  Might be infrequently used, but seems similarly valuable for
domains focused on handling storage.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





[patch V4 31/37] x86/smpboot: Enable split CPU startup

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

The x86 CPU bringup state currently does AP wake-up, wait for AP to
respond and then release it for full bringup.

It is safe to be split into a wake-up and and a separate wait+release
state.

Provide the required functions and enable the split CPU bringup, which
prepares for parallel bringup, where the bringup of the non-boot CPUs takes
two iterations: One to prepare and wake all APs and the second to wait and
release them. Depending on timing this can eliminate the wait time
completely.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/Kconfig   |2 +-
 arch/x86/include/asm/smp.h |9 ++---
 arch/x86/kernel/smp.c  |2 +-
 arch/x86/kernel/smpboot.c  |8 
 arch/x86/xen/smp_pv.c  |4 ++--
 5 files changed, 10 insertions(+), 15 deletions(-)
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -274,8 +274,8 @@ config X86
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_USER_RETURN_NOTIFIER
select HAVE_GENERIC_VDSO
-   select HOTPLUG_CORE_SYNC_FULL   if SMP
select HOTPLUG_SMT  if SMP
+   select HOTPLUG_SPLIT_STARTUPif SMP
select IRQ_FORCED_THREADING
select NEED_PER_CPU_EMBED_FIRST_CHUNK
select NEED_PER_CPU_PAGE_FIRST_CHUNK
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -40,7 +40,7 @@ struct smp_ops {
 
void (*cleanup_dead_cpu)(unsigned cpu);
void (*poll_sync_state)(void);
-   int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
+   int (*kick_ap_alive)(unsigned cpu, struct task_struct *tidle);
int (*cpu_disable)(void);
void (*cpu_die)(unsigned int cpu);
void (*play_dead)(void);
@@ -80,11 +80,6 @@ static inline void smp_cpus_done(unsigne
smp_ops.smp_cpus_done(max_cpus);
 }
 
-static inline int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-{
-   return smp_ops.cpu_up(cpu, tidle);
-}
-
 static inline int __cpu_disable(void)
 {
return smp_ops.cpu_disable();
@@ -124,7 +119,7 @@ void native_smp_prepare_cpus(unsigned in
 void calculate_max_logical_packages(void);
 void native_smp_cpus_done(unsigned int max_cpus);
 int common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
-int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
+int native_kick_ap(unsigned int cpu, struct task_struct *tidle);
 int native_cpu_disable(void);
 void __noreturn hlt_play_dead(void);
 void native_play_dead(void);
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -268,7 +268,7 @@ struct smp_ops smp_ops = {
 #endif
.smp_send_reschedule= native_smp_send_reschedule,
 
-   .cpu_up = native_cpu_up,
+   .kick_ap_alive  = native_kick_ap,
.cpu_disable= native_cpu_disable,
.play_dead  = native_play_dead,
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1052,7 +1052,7 @@ static int do_boot_cpu(int apicid, int c
return ret;
 }
 
-static int native_kick_ap(unsigned int cpu, struct task_struct *tidle)
+int native_kick_ap(unsigned int cpu, struct task_struct *tidle)
 {
int apicid = apic->cpu_present_to_apicid(cpu);
int err;
@@ -1088,15 +1088,15 @@ static int native_kick_ap(unsigned int c
return err;
 }
 
-int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
+int arch_cpuhp_kick_ap_alive(unsigned int cpu, struct task_struct *tidle)
 {
-   return native_kick_ap(cpu, tidle);
+   return smp_ops.kick_ap_alive(cpu, tidle);
 }
 
 void arch_cpuhp_cleanup_kick_cpu(unsigned int cpu)
 {
/* Cleanup possible dangling ends... */
-   if (smp_ops.cpu_up == native_cpu_up && x86_platform.legacy.warm_reset)
+   if (smp_ops.kick_ap_alive == native_kick_ap && x86_platform.legacy.warm_reset)
smpboot_restore_warm_reset_vector();
 }
 
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -314,7 +314,7 @@ cpu_initialize_context(unsigned int cpu,
return 0;
 }
 
-static int xen_pv_cpu_up(unsigned int cpu, struct task_struct *idle)
+static int xen_pv_kick_ap(unsigned int cpu, struct task_struct *idle)
 {
int rc;
 
@@ -438,7 +438,7 @@ static const struct smp_ops xen_smp_ops
.smp_prepare_cpus = xen_pv_smp_prepare_cpus,
.smp_cpus_done = xen_smp_cpus_done,
 
-   .cpu_up = xen_pv_cpu_up,
+   .kick_ap_alive = xen_pv_kick_ap,
.cpu_die = xen_pv_cpu_die,
.cleanup_dead_cpu = xen_pv_cleanup_dead_cpu,
.poll_sync_state = xen_pv_poll_sync_state,




[patch V4 34/37] x86/apic: Save the APIC virtual base address

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

For parallel CPU bringup it's required to read the APIC ID in the low level
startup code. The virtual APIC base address is a constant because it's a
fix-mapped address. Exposing that constant which is composed via macros to
assembly code is non-trivial due to header inclusion hell.

Aside of that it's constant only because of the vsyscall ABI
requirement. Once vsyscall is out of the picture the fixmap can be placed
at runtime.

Avoid header hell, stay flexible and store the address in a variable which
can be exposed to the low level startup code.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
V4: Fixed changelog typo - Sergey
---
 arch/x86/include/asm/smp.h  |1 +
 arch/x86/kernel/apic/apic.c |4 
 2 files changed, 5 insertions(+)
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -196,6 +196,7 @@ extern void nmi_selftest(void);
 #endif
 
 extern unsigned int smpboot_control;
+extern unsigned long apic_mmio_base;
 
 #endif /* !__ASSEMBLY__ */
 
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -101,6 +101,9 @@ static int apic_extnmi __ro_after_init =
  */
 static bool virt_ext_dest_id __ro_after_init;
 
+/* For parallel bootup. */
+unsigned long apic_mmio_base __ro_after_init;
+
 /*
  * Map cpu index to physical APIC ID
  */
@@ -2163,6 +2166,7 @@ void __init register_lapic_address(unsig
 
if (!x2apic_mode) {
set_fixmap_nocache(FIX_APIC_BASE, address);
+   apic_mmio_base = APIC_BASE;
apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n",
APIC_BASE, address);
}






[patch V4 22/37] ARM: smp: Switch to hotplug core state synchronization

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Switch to the CPU hotplug core state tracking and synchronization
mechanism. No functional change intended.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/arm/Kconfig   |1 +
 arch/arm/include/asm/smp.h |2 +-
 arch/arm/kernel/smp.c  |   18 +++---
 3 files changed, 9 insertions(+), 12 deletions(-)
---

--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -124,6 +124,7 @@ config ARM
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UID16
select HAVE_VIRT_CPU_ACCOUNTING_GEN
+   select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select IRQ_FORCED_THREADING
select MODULES_USE_ELF_REL
select NEED_DMA_MAP_STATE
--- a/arch/arm/include/asm/smp.h
+++ b/arch/arm/include/asm/smp.h
@@ -64,7 +64,7 @@ extern void secondary_startup_arm(void);
 
 extern int __cpu_disable(void);
 
-extern void __cpu_die(unsigned int cpu);
+static inline void __cpu_die(unsigned int cpu) { }
 
 extern void arch_send_call_function_single_ipi(int cpu);
 extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -288,15 +288,11 @@ int __cpu_disable(void)
 }
 
 /*
- * called on the thread which is asking for a CPU to be shutdown -
- * waits until shutdown has completed, or it is timed out.
+ * called on the thread which is asking for a CPU to be shutdown after the
+ * shutdown completed.
  */
-void __cpu_die(unsigned int cpu)
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
-   if (!cpu_wait_death(cpu, 5)) {
-   pr_err("CPU%u: cpu didn't die\n", cpu);
-   return;
-   }
pr_debug("CPU%u: shutdown\n", cpu);
 
clear_tasks_mm_cpumask(cpu);
@@ -336,11 +332,11 @@ void __noreturn arch_cpu_idle_dead(void)
flush_cache_louis();
 
/*
-* Tell __cpu_die() that this CPU is now safe to dispose of.  Once
-* this returns, power and/or clocks can be removed at any point
-* from this CPU and its cache by platform_cpu_kill().
+* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose
+* of. Once this returns, power and/or clocks can be removed at
+* any point from this CPU and its cache by platform_cpu_kill().
 */
-   (void)cpu_report_death();
+   cpuhp_ap_report_dead();
 
/*
 * Ensure that the cache lines associated with that completion are






[patch V4 33/37] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

There is often significant latency in the early stages of CPU bringup, and
time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and
then waiting for it to respond before moving on to the next.

Allow a platform to enable parallel setup which brings all to be onlined
CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the
control CPU (BP) is single-threaded the important part is the last state
CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up.

This allows the CPUs to run up to the first synchronization point
cpuhp_ap_sync_alive() where they wait for the control CPU to release them
one by one for the full onlining procedure.

This parallelism depends on the CPU hotplug core sync mechanism which
ensures that the parallel brought up CPUs wait for release before touching
any state which would make the CPU visible to anything outside the hotplug
control mechanism.

To handle the SMT constraints of X86 correctly the bringup happens in two
iterations when CONFIG_HOTPLUG_SMT is enabled. The control CPU brings up
the primary SMT threads of each core first, which can load the microcode
without the need to rendezvous with the thread siblings. Once that's
completed it brings up the secondary SMT threads.

Co-developed-by: David Woodhouse 
Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 Documentation/admin-guide/kernel-parameters.txt |6 +
 arch/Kconfig|4 
 include/linux/cpuhotplug.h  |1 
 kernel/cpu.c|  103 ++--
 4 files changed, 109 insertions(+), 5 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -838,6 +838,12 @@
 on every CPU online, such as boot, and resume from suspend.
Default: 1
 
+   cpuhp.parallel=
+   [SMP] Enable/disable parallel bringup of secondary CPUs
+   Format: <bool>
+   Default is enabled if CONFIG_HOTPLUG_PARALLEL=y. Otherwise
+   the parameter has no effect.
+
crash_kexec_post_notifiers
Run kdump after running panic-notifiers and dumping
kmsg. This only for the users who doubt kdump always
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -53,6 +53,10 @@ config HOTPLUG_SPLIT_STARTUP
bool
select HOTPLUG_CORE_SYNC_FULL
 
+config HOTPLUG_PARALLEL
+   bool
+   select HOTPLUG_SPLIT_STARTUP
+
 config GENERIC_ENTRY
bool
 
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -524,6 +524,7 @@ void cpuhp_ap_sync_alive(void);
 void arch_cpuhp_sync_state_poll(void);
 void arch_cpuhp_cleanup_kick_cpu(unsigned int cpu);
 int arch_cpuhp_kick_ap_alive(unsigned int cpu, struct task_struct *tidle);
+bool arch_cpuhp_init_parallel_bringup(void);
 
 #ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
 void cpuhp_ap_report_dead(void);
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -649,8 +649,23 @@ bool cpu_smt_possible(void)
cpu_smt_control != CPU_SMT_NOT_SUPPORTED;
 }
 EXPORT_SYMBOL_GPL(cpu_smt_possible);
+
+static inline bool cpuhp_smt_aware(void)
+{
+   return topology_smt_supported();
+}
+
+static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
+{
+   return cpu_primary_thread_mask;
+}
 #else
 static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
+static inline bool cpuhp_smt_aware(void) { return false; }
+static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
+{
+   return cpu_present_mask;
+}
 #endif
 
 static inline enum cpuhp_state
@@ -1747,16 +1762,94 @@ int bringup_hibernate_cpu(unsigned int s
return 0;
 }
 
-void __init bringup_nonboot_cpus(unsigned int setup_max_cpus)
+static void __init cpuhp_bringup_mask(const struct cpumask *mask, unsigned int ncpus,
+ enum cpuhp_state target)
 {
unsigned int cpu;
 
-   for_each_present_cpu(cpu) {
-   if (num_online_cpus() >= setup_max_cpus)
+   for_each_cpu(cpu, mask) {
+   struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+
+   if (!--ncpus)
break;
-   if (!cpu_online(cpu))
-   cpu_up(cpu, CPUHP_ONLINE);
+
+   if (cpu_up(cpu, target) && can_rollback_cpu(st)) {
+   /*
+* If this failed then cpu_up() might have only
+* rolled back to CPUHP_BP_KICK_AP for the final
+* online. Clean it up. NOOP if already rolled back.
+*/
+   WARN_ON(cpuhp_invoke_callback_range(false, cpu, st, CPUHP_OFFLINE));
+   }
+   }
+}
+
+#ifdef CONFIG_HOTPLUG_PARALLEL
+static bool 

[patch V4 29/37] cpu/hotplug: Reset task stack state in _cpu_up()

2023-05-12 Thread Thomas Gleixner
From: David Woodhouse 

Commit dce1ca0525bf ("sched/scs: Reset task stack state in bringup_cpu()")
ensured that the shadow call stack and KASAN poisoning were removed from
a CPU's stack each time that CPU is brought up, not just once.

This is not incorrect. However, with parallel bringup the idle thread setup
will happen at a different step. As a consequence the cleanup in
bringup_cpu() would be too late.

Move the SCS/KASAN cleanup to the generic _cpu_up() function instead,
which already ensures that the new CPU's stack is available, purely to
allow for early failure. This occurs when the CPU to be brought up is
in the CPUHP_OFFLINE state, which should correctly do the cleanup any
time the CPU has been taken down to the point where such is needed.

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Tested-by: Mark Rutland 
Tested-by: Michael Kelley 
Reviewed-by: Mark Rutland 
---
 kernel/cpu.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)


--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -771,12 +771,6 @@ static int bringup_cpu(unsigned int cpu)
return -EAGAIN;
 
/*
-* Reset stale stack state from the last time this CPU was online.
-*/
-   scs_task_reset(idle);
-   kasan_unpoison_task_stack(idle);
-
-   /*
 * Some architectures have to walk the irq descriptors to
 * setup the vector space for the cpu which comes online.
 *
@@ -1587,6 +1581,12 @@ static int _cpu_up(unsigned int cpu, int
ret = PTR_ERR(idle);
goto out;
}
+
+   /*
+* Reset stale stack state from the last time this CPU was 
online.
+*/
+   scs_task_reset(idle);
+   kasan_unpoison_task_stack(idle);
}
 
cpuhp_tasks_frozen = tasks_frozen;




[patch V4 26/37] parisc: Switch to hotplug core state synchronization

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Switch to the CPU hotplug core state tracking and synchronization
mechanism. No functional change intended.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/parisc/Kconfig  |1 +
 arch/parisc/kernel/process.c |4 ++--
 arch/parisc/kernel/smp.c |7 +++
 3 files changed, 6 insertions(+), 6 deletions(-)


--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -57,6 +57,7 @@ config PARISC
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_REGS_AND_STACK_ACCESS_API
+   select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select GENERIC_SCHED_CLOCK
select GENERIC_IRQ_MIGRATION if SMP
select HAVE_UNSTABLE_SCHED_CLOCK if SMP
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -166,8 +166,8 @@ void __noreturn arch_cpu_idle_dead(void)
 
local_irq_disable();
 
-   /* Tell __cpu_die() that this CPU is now safe to dispose of. */
-   (void)cpu_report_death();
+   /* Tell the core that this CPU is now safe to dispose of. */
+   cpuhp_ap_report_dead();
 
/* Ensure that the cache lines are written out. */
flush_cache_all_local();
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -500,11 +500,10 @@ int __cpu_disable(void)
 void __cpu_die(unsigned int cpu)
 {
pdc_cpu_rendezvous_lock();
+}
 
-   if (!cpu_wait_death(cpu, 5)) {
-   pr_crit("CPU%u: cpu didn't die\n", cpu);
-   return;
-   }
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
+{
pr_info("CPU%u: is shutting down\n", cpu);
 
/* set task's state to interruptible sleep */






[patch V4 28/37] cpu/hotplug: Remove unused state functions

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

All users converted to the hotplug core mechanism.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 include/linux/cpu.h |2 -
 kernel/smpboot.c|   75 
 2 files changed, 77 deletions(-)


--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -193,8 +193,6 @@ static inline void play_idle(unsigned lo
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-bool cpu_wait_death(unsigned int cpu, int seconds);
-bool cpu_report_death(void);
 void cpuhp_report_idle_dead(void);
 #else
 static inline void cpuhp_report_idle_dead(void) { }
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -325,78 +325,3 @@ void smpboot_unregister_percpu_thread(st
cpus_read_unlock();
 }
 EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread);
-
-#ifndef CONFIG_HOTPLUG_CORE_SYNC
-static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
-
-#ifdef CONFIG_HOTPLUG_CPU
-/*
- * Wait for the specified CPU to exit the idle loop and die.
- */
-bool cpu_wait_death(unsigned int cpu, int seconds)
-{
-   int jf_left = seconds * HZ;
-   int oldstate;
-   bool ret = true;
-   int sleep_jf = 1;
-
-   might_sleep();
-
-   /* The outgoing CPU will normally get done quite quickly. */
-   if (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) == CPU_DEAD)
-   goto update_state_early;
-   udelay(5);
-
-   /* But if the outgoing CPU dawdles, wait increasingly long times. */
-   while (atomic_read(&per_cpu(cpu_hotplug_state, cpu)) != CPU_DEAD) {
-   schedule_timeout_uninterruptible(sleep_jf);
-   jf_left -= sleep_jf;
-   if (jf_left <= 0)
-   break;
-   sleep_jf = DIV_ROUND_UP(sleep_jf * 11, 10);
-   }
-update_state_early:
-   oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
-update_state:
-   if (oldstate == CPU_DEAD) {
-   /* Outgoing CPU died normally, update state. */
-   smp_mb(); /* atomic_read() before update. */
-   atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_POST_DEAD);
-   } else {
-   /* Outgoing CPU still hasn't died, set state accordingly. */
-   if (!atomic_try_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
-   &oldstate, CPU_BROKEN))
-   goto update_state;
-   ret = false;
-   }
-   return ret;
-}
-
-/*
- * Called by the outgoing CPU to report its successful death.  Return
- * false if this report follows the surviving CPU's timing out.
- *
- * A separate "CPU_DEAD_FROZEN" is used when the surviving CPU
- * timed out.  This approach allows architectures to omit calls to
- * cpu_check_up_prepare() and cpu_set_state_online() without defeating
- * the next cpu_wait_death()'s polling loop.
- */
-bool cpu_report_death(void)
-{
-   int oldstate;
-   int newstate;
-   int cpu = smp_processor_id();
-
-   oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
-   do {
-   if (oldstate != CPU_BROKEN)
-   newstate = CPU_DEAD;
-   else
-   newstate = CPU_DEAD_FROZEN;
-   } while (!atomic_try_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
-    &oldstate, newstate));
-   return newstate == CPU_DEAD;
-}
-
-#endif /* #ifdef CONFIG_HOTPLUG_CPU */
-#endif /* !CONFIG_HOTPLUG_CORE_SYNC */






[patch V4 32/37] x86/apic: Provide cpu_primary_thread mask

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Make the primary thread tracking CPU mask based in preparation for simpler
handling of parallel bootup.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/include/asm/apic.h |2 --
 arch/x86/include/asm/topology.h |   19 +++
 arch/x86/kernel/apic/apic.c |   20 +---
 arch/x86/kernel/smpboot.c   |   12 +++-
 4 files changed, 27 insertions(+), 26 deletions(-)
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -506,10 +506,8 @@ extern int default_check_phys_apicid_pre
 #endif /* CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_SMP
-bool apic_id_is_primary_thread(unsigned int id);
 void apic_smt_update(void);
 #else
-static inline bool apic_id_is_primary_thread(unsigned int id) { return false; }
 static inline void apic_smt_update(void) { }
 #endif
 
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -31,9 +31,9 @@
  * CONFIG_NUMA.
  */
 #include 
+#include 
 
 #ifdef CONFIG_NUMA
-#include 
 
 #include 
 #include 
@@ -139,9 +139,20 @@ static inline int topology_max_smt_threa
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
 int topology_update_die_map(unsigned int dieid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
-bool topology_is_primary_thread(unsigned int cpu);
 bool topology_smt_supported(void);
-#else
+
+extern struct cpumask __cpu_primary_thread_mask;
+#define cpu_primary_thread_mask ((const struct cpumask *)&__cpu_primary_thread_mask)
+
+/**
+ * topology_is_primary_thread - Check whether CPU is the primary SMT thread
+ * @cpu:   CPU to check
+ */
+static inline bool topology_is_primary_thread(unsigned int cpu)
+{
+   return cpumask_test_cpu(cpu, cpu_primary_thread_mask);
+}
+#else /* CONFIG_SMP */
 #define topology_max_packages()(1)
 static inline int
 topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; }
@@ -152,7 +163,7 @@ static inline int topology_max_die_per_p
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
 static inline bool topology_smt_supported(void) { return false; }
-#endif
+#endif /* !CONFIG_SMP */
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
 {
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2386,20 +2386,16 @@ bool arch_match_cpu_phys_id(int cpu, u64
 }
 
 #ifdef CONFIG_SMP
-/**
- * apic_id_is_primary_thread - Check whether APIC ID belongs to a primary thread
- * @apicid: APIC ID to check
- */
-bool apic_id_is_primary_thread(unsigned int apicid)
+static void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid)
 {
-   u32 mask;
-
-   if (smp_num_siblings == 1)
-   return true;
/* Isolate the SMT bit(s) in the APICID and check for 0 */
-   mask = (1U << (fls(smp_num_siblings) - 1)) - 1;
-   return !(apicid & mask);
+   u32 mask = (1U << (fls(smp_num_siblings) - 1)) - 1;
+
+   if (smp_num_siblings == 1 || !(apicid & mask))
+   cpumask_set_cpu(cpu, &__cpu_primary_thread_mask);
 }
+#else
+static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
 #endif
 
 /*
@@ -2544,6 +2540,8 @@ int generic_processor_info(int apicid, i
set_cpu_present(cpu, true);
num_processors++;
 
+   cpu_mark_primary_thread(cpu, apicid);
+
return cpu;
 }
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -102,6 +102,9 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map);
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
+/* CPUs which are the primary SMT threads */
+struct cpumask __cpu_primary_thread_mask __read_mostly;
+
 /* Representing CPUs for which sibling maps can be computed */
 static cpumask_var_t cpu_sibling_setup_mask;
 
@@ -277,15 +280,6 @@ static void notrace start_secondary(void
 }
 
 /**
- * topology_is_primary_thread - Check whether CPU is the primary SMT thread
- * @cpu:   CPU to check
- */
-bool topology_is_primary_thread(unsigned int cpu)
-{
-   return apic_id_is_primary_thread(per_cpu(x86_cpu_to_apicid, cpu));
-}
-
-/**
  * topology_smt_supported - Check whether SMT is supported by the CPUs
  */
 bool topology_smt_supported(void)




[patch V4 21/37] cpu/hotplug: Remove cpu_report_state() and related unused cruft

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

No more users.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 include/linux/cpu.h |2 -
 kernel/smpboot.c|   90 
 2 files changed, 92 deletions(-)


--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -184,8 +184,6 @@ void arch_cpu_idle_enter(void);
 void arch_cpu_idle_exit(void);
 void __noreturn arch_cpu_idle_dead(void);
 
-int cpu_report_state(int cpu);
-int cpu_check_up_prepare(int cpu);
 void cpu_set_state_online(int cpu);
 void play_idle_precise(u64 duration_ns, u64 latency_ns);
 
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -329,97 +329,7 @@ EXPORT_SYMBOL_GPL(smpboot_unregister_per
 #ifndef CONFIG_HOTPLUG_CORE_SYNC
 static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
 
-/*
- * Called to poll specified CPU's state, for example, when waiting for
- * a CPU to come online.
- */
-int cpu_report_state(int cpu)
-{
-   return atomic_read(&per_cpu(cpu_hotplug_state, cpu));
-}
-
-/*
- * If CPU has died properly, set its state to CPU_UP_PREPARE and
- * return success.  Otherwise, return -EBUSY if the CPU died after
- * cpu_wait_death() timed out.  And yet otherwise again, return -EAGAIN
- * if cpu_wait_death() timed out and the CPU still hasn't gotten around
- * to dying.  In the latter two cases, the CPU might not be set up
- * properly, but it is up to the arch-specific code to decide.
- * Finally, -EIO indicates an unanticipated problem.
- *
- * Note that it is permissible to omit this call entirely, as is
- * done in architectures that do no CPU-hotplug error checking.
- */
-int cpu_check_up_prepare(int cpu)
-{
-   if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
-   atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
-   return 0;
-   }
-
-   switch (atomic_read(&per_cpu(cpu_hotplug_state, cpu))) {
-
-   case CPU_POST_DEAD:
-
-   /* The CPU died properly, so just start it up again. */
-   atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
-   return 0;
-
-   case CPU_DEAD_FROZEN:
-
-   /*
-* Timeout during CPU death, so let caller know.
-* The outgoing CPU completed its processing, but after
-* cpu_wait_death() timed out and reported the error. The
-* caller is free to proceed, in which case the state
-* will be reset properly by cpu_set_state_online().
-* Proceeding despite this -EBUSY return makes sense
-* for systems where the outgoing CPUs take themselves
-* offline, with no post-death manipulation required from
-* a surviving CPU.
-*/
-   return -EBUSY;
-
-   case CPU_BROKEN:
-
-   /*
-* The most likely reason we got here is that there was
-* a timeout during CPU death, and the outgoing CPU never
-* did complete its processing.  This could happen on
-* a virtualized system if the outgoing VCPU gets preempted
-* for more than five seconds, and the user attempts to
-* immediately online that same CPU.  Trying again later
-* might return -EBUSY above, hence -EAGAIN.
-*/
-   return -EAGAIN;
-
-   case CPU_UP_PREPARE:
-   /*
-* Timeout while waiting for the CPU to show up. Allow to try
-* again later.
-*/
-   return 0;
-
-   default:
-
-   /* Should not happen.  Famous last words. */
-   return -EIO;
-   }
-}
-
-/*
- * Mark the specified CPU online.
- *
- * Note that it is permissible to omit this call entirely, as is
- * done in architectures that do no CPU-hotplug error checking.
- */
-void cpu_set_state_online(int cpu)
-{
-   (void)atomic_xchg(&per_cpu(cpu_hotplug_state, cpu), CPU_ONLINE);
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
-
 /*
  * Wait for the specified CPU to exit the idle loop and die.
  */






[patch V4 25/37] MIPS: SMP_CPS: Switch to hotplug core state synchronization

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Switch to the CPU hotplug core state tracking and synchronization
mechanism. This unfortunately requires adding dead reporting to the non CPS
platforms as CPS is the only user, but it allows an overall consolidation
of this functionality.

No functional change intended.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/mips/Kconfig   |1 +
 arch/mips/cavium-octeon/smp.c   |1 +
 arch/mips/include/asm/smp-ops.h |1 +
 arch/mips/kernel/smp-bmips.c|1 +
 arch/mips/kernel/smp-cps.c  |   14 +-
 arch/mips/kernel/smp.c  |8 
 arch/mips/loongson64/smp.c  |1 +
 7 files changed, 18 insertions(+), 9 deletions(-)


--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2285,6 +2285,7 @@ config MIPS_CPS
select MIPS_CM
select MIPS_CPS_PM if HOTPLUG_CPU
select SMP
+   select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select SYNC_R4K if (CEVT_R4K || CSRC_R4K)
select SYS_SUPPORTS_HOTPLUG_CPU
select SYS_SUPPORTS_SCHED_SMT if CPU_MIPSR6
--- a/arch/mips/cavium-octeon/smp.c
+++ b/arch/mips/cavium-octeon/smp.c
@@ -345,6 +345,7 @@ void play_dead(void)
int cpu = cpu_number_map(cvmx_get_core_num());
 
idle_task_exit();
+   cpuhp_ap_report_dead();
octeon_processor_boot = 0xff;
per_cpu(cpu_state, cpu) = CPU_DEAD;
 
--- a/arch/mips/include/asm/smp-ops.h
+++ b/arch/mips/include/asm/smp-ops.h
@@ -33,6 +33,7 @@ struct plat_smp_ops {
 #ifdef CONFIG_HOTPLUG_CPU
int (*cpu_disable)(void);
void (*cpu_die)(unsigned int cpu);
+   void (*cleanup_dead_cpu)(unsigned cpu);
 #endif
 #ifdef CONFIG_KEXEC
void (*kexec_nonboot_cpu)(void);
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -392,6 +392,7 @@ static void bmips_cpu_die(unsigned int c
 void __ref play_dead(void)
 {
idle_task_exit();
+   cpuhp_ap_report_dead();
 
/* flush data cache */
_dma_cache_wback_inv(0, ~0);
--- a/arch/mips/kernel/smp-cps.c
+++ b/arch/mips/kernel/smp-cps.c
@@ -503,8 +503,7 @@ void play_dead(void)
}
}
 
-   /* This CPU has chosen its way out */
-   (void)cpu_report_death();
+   cpuhp_ap_report_dead();
 
cps_shutdown_this_cpu(cpu_death);
 
@@ -527,7 +526,9 @@ static void wait_for_sibling_halt(void *
} while (!(halted & TCHALT_H));
 }
 
-static void cps_cpu_die(unsigned int cpu)
+static void cps_cpu_die(unsigned int cpu) { }
+
+static void cps_cleanup_dead_cpu(unsigned cpu)
 {
unsigned core = cpu_core(_data[cpu]);
unsigned int vpe_id = cpu_vpe_id(_data[cpu]);
@@ -535,12 +536,6 @@ static void cps_cpu_die(unsigned int cpu
unsigned stat;
int err;
 
-   /* Wait for the cpu to choose its way out */
-   if (!cpu_wait_death(cpu, 5)) {
-   pr_err("CPU%u: didn't offline\n", cpu);
-   return;
-   }
-
/*
 * Now wait for the CPU to actually offline. Without doing this that
 * offlining may race with one or more of:
@@ -624,6 +619,7 @@ static const struct plat_smp_ops cps_smp
 #ifdef CONFIG_HOTPLUG_CPU
.cpu_disable= cps_cpu_disable,
.cpu_die= cps_cpu_die,
+   .cleanup_dead_cpu   = cps_cleanup_dead_cpu,
 #endif
 #ifdef CONFIG_KEXEC
.kexec_nonboot_cpu  = cps_kexec_nonboot_cpu,
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -690,6 +690,14 @@ void flush_tlb_one(unsigned long vaddr)
 EXPORT_SYMBOL(flush_tlb_page);
 EXPORT_SYMBOL(flush_tlb_one);
 
+#ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
+{
+   if (mp_ops->cleanup_dead_cpu)
+   mp_ops->cleanup_dead_cpu(cpu);
+}
+#endif
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 
 static void tick_broadcast_callee(void *info)
--- a/arch/mips/loongson64/smp.c
+++ b/arch/mips/loongson64/smp.c
@@ -775,6 +775,7 @@ void play_dead(void)
void (*play_dead_at_ckseg1)(int *);
 
idle_task_exit();
+   cpuhp_ap_report_dead();
 
prid_imp = read_c0_prid() & PRID_IMP_MASK;
prid_rev = read_c0_prid() & PRID_REV_MASK;




[patch V4 37/37] x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Implement the validation function which tells the core code whether
parallel bringup is possible.

The only condition for now is that the kernel does not run in an encrypted
guest as these will trap the RDMSR via #VC, which cannot be handled at that
point in early startup.

There was an earlier variant for AMD-SEV which used the GHCB protocol for
retrieving the APIC ID via CPUID, but there is no guarantee that the
initial APIC ID in CPUID is the same as the real APIC ID. There is no
enforcement from the secure firmware and the hypervisor can assign APIC IDs
as it sees fit as long as the ACPI/MADT table is consistent with that
assignment.

Unfortunately there is no RDMSR GHCB protocol at the moment, so enabling
AMD-SEV guests for parallel startup needs some more thought.

Intel-TDX provides a secure RDMSR hypercall, but supporting that is outside
the scope of this change.
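
As an illustration of the gate described above, here is a minimal sketch.
It assumes a cc_platform_has()/CC_ATTR_GUEST_STATE_ENCRYPT style query for
encrypted guest state and reuses the STARTUP_READ_APICID control bit
introduced later in this series; it is not the literal implementation.

/*
 * Illustrative sketch only: refuse parallel bringup when the guest's
 * register state is encrypted, because the early APIC ID RDMSR would
 * trap via #VC before that can be handled.
 */
bool __init arch_cpuhp_init_parallel_bringup(void)
{
	if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
		return false;

	/* Tell the APs to read their APIC ID from the APIC themselves. */
	smpboot_control = STARTUP_READ_APICID;
	return true;
}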

Fixup announce_cpu() as e.g. on Hyper-V CPU1 is the secondary sibling of
CPU0, which makes the @cpu == 1 logic in announce_cpu() fall apart.

[ mikelley: Reported the announce_cpu() fallout ]

Originally-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
V2: Fixup announce_cpu() - Michael Kelley
V3: Fixup announce_cpu() for real - Michael Kelley
---
 arch/x86/Kconfig |3 -
 arch/x86/kernel/cpu/common.c |6 --
 arch/x86/kernel/smpboot.c|   87 +++
 3 files changed, 75 insertions(+), 21 deletions(-)
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -274,8 +274,9 @@ config X86
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_USER_RETURN_NOTIFIER
select HAVE_GENERIC_VDSO
+   select HOTPLUG_PARALLEL if SMP && X86_64
select HOTPLUG_SMT  if SMP
-   select HOTPLUG_SPLIT_STARTUPif SMP
+   select HOTPLUG_SPLIT_STARTUPif SMP && X86_32
select IRQ_FORCED_THREADING
select NEED_PER_CPU_EMBED_FIRST_CHUNK
select NEED_PER_CPU_PAGE_FIRST_CHUNK
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2128,11 +2128,7 @@ static inline void setup_getcpu(int cpu)
 }
 
 #ifdef CONFIG_X86_64
-static inline void ucode_cpu_init(int cpu)
-{
-   if (cpu)
-   load_ucode_ap();
-}
+static inline void ucode_cpu_init(int cpu) { }
 
 static inline void tss_setup_ist(struct tss_struct *tss)
 {
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -58,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -75,7 +76,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -128,7 +129,6 @@ int arch_update_cpu_topology(void)
return retval;
 }
 
-
 static unsigned int smpboot_warm_reset_vector_count;
 
 static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
@@ -226,16 +226,43 @@ static void notrace start_secondary(void
 */
cr4_init();
 
-#ifdef CONFIG_X86_32
-   /* switch away from the initial page table */
-   load_cr3(swapper_pg_dir);
-   __flush_tlb_all();
-#endif
+   /*
+* 32-bit specific. 64-bit reaches this code with the correct page
+* table established. Yet another historical divergence.
+*/
+   if (IS_ENABLED(CONFIG_X86_32)) {
+   /* switch away from the initial page table */
+   load_cr3(swapper_pg_dir);
+   __flush_tlb_all();
+   }
+
cpu_init_exception_handling();
 
/*
-* Synchronization point with the hotplug core. Sets the
-* synchronization state to ALIVE and waits for the control CPU to
+* 32-bit systems load the microcode from the ASM startup code for
+* historical reasons.
+*
+* On 64-bit systems load it before reaching the AP alive
+* synchronization point below so it is not part of the full per
+* CPU serialized bringup part when "parallel" bringup is enabled.
+*
+* That's even safe when hyperthreading is enabled in the CPU as
+* the core code starts the primary threads first and leaves the
+* secondary threads waiting for SIPI. Loading microcode on
+* physical cores concurrently is a safe operation.
+*
+* This covers both the Intel specific issue that concurrent
+* microcode loading on SMT siblings must be prohibited and the
+* vendor independent issue that microcode loading which changes
+* CPUID, MSRs etc. must be strictly serialized to maintain
+* software state correctness.
+*/
+   if (IS_ENABLED(CONFIG_X86_64))
+   load_ucode_ap();
+
+   /*
+* Synchronization point with the hotplug core. Sets this CPUs
+* synchronization state to ALIVE and spin-waits for the control CPU to
 * release this CPU for further bringup.
 */
cpuhp_ap_sync_alive();
@@ -918,9 

[patch V4 24/37] csky/smp: Switch to hotplug core state synchronization

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Switch to the CPU hotplug core state tracking and synchronization
mechanism. No functional change intended.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/csky/Kconfig   |1 +
 arch/csky/include/asm/smp.h |2 +-
 arch/csky/kernel/smp.c  |8 ++--
 3 files changed, 4 insertions(+), 7 deletions(-)


--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -96,6 +96,7 @@ config CSKY
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
+   select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select MAY_HAVE_SPARSE_IRQ
select MODULES_USE_ELF_RELA if MODULES
select OF
--- a/arch/csky/include/asm/smp.h
+++ b/arch/csky/include/asm/smp.h
@@ -23,7 +23,7 @@ void __init set_send_ipi(void (*func)(co
 
 int __cpu_disable(void);
 
-void __cpu_die(unsigned int cpu);
+static inline void __cpu_die(unsigned int cpu) { }
 
 #endif /* CONFIG_SMP */
 
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -291,12 +291,8 @@ int __cpu_disable(void)
return 0;
 }
 
-void __cpu_die(unsigned int cpu)
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
-   if (!cpu_wait_death(cpu, 5)) {
-   pr_crit("CPU%u: shutdown failed\n", cpu);
-   return;
-   }
pr_notice("CPU%u: shutdown\n", cpu);
 }
 
@@ -304,7 +300,7 @@ void __noreturn arch_cpu_idle_dead(void)
 {
idle_task_exit();
 
-   cpu_report_death();
+   cpuhp_ap_report_dead();
 
while (!secondary_stack)
arch_cpu_idle();






[patch V4 36/37] x86/smpboot: Support parallel startup of secondary CPUs

2023-05-12 Thread Thomas Gleixner
From: David Woodhouse 

In parallel startup mode the APs are kicked alive by the control CPU
quickly after each other and run through the early startup code in
parallel. The real-mode startup code is already serialized with a
bit-spinlock to protect the real-mode stack.

In parallel startup mode the smpboot_control variable obviously cannot
contain the Linux CPU number so the APs have to determine their Linux CPU
number on their own. This is required to find the CPU's per-CPU offset in
order to find the idle task stack and other per-CPU data.

To achieve this, export the cpuid_to_apicid[] array so that each AP can
find its own CPU number by searching therein based on its APIC ID.

Introduce a flag in the top bits of smpboot_control which indicates that
the AP should find its CPU number by reading the APIC ID from the APIC.

This is required because CPUID based APIC ID retrieval can only provide the
initial APIC ID, which might have been overruled by the firmware. Some AMD
APUs come up with APIC ID = initial APIC ID + 0x10, so the APIC ID to CPU
number lookup would fail miserably if based on CPUID. Also virtualization
can make its own APIC ID assignments. The only requirement is that the
APIC IDs are consistent with the ACPI/MADT table.

For the boot CPU, or in case parallel bringup is disabled, the control bits
are empty and the CPU number is directly available in bits 0-23 of
smpboot_control.
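
In C terms the lookup the AP has to perform amounts to the sketch below.
The real code is assembly in arch/x86/kernel/head_64.S and runs before any
C environment is available; the helper name here is purely illustrative.

/*
 * Sketch: read the own APIC ID from the APIC and search cpuid_to_apicid[]
 * for it to obtain the Linux CPU number.
 */
static int find_cpu_number(unsigned int my_apicid)
{
	int i;

	for (i = 0; i < nr_cpu_ids; i++) {
		if (cpuid_to_apicid[i] == my_apicid)
			return i;	/* Linux CPU number */
	}
	return -1;			/* unknown CPU: park it */
}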

[ tglx: Initial proof of concept patch with bitlock and APIC ID lookup ]
[ dwmw2: Rework and testing, commit message, CPUID 0x1 and CPU0 support ]
[ seanc: Fix stray override of initial_gs in common_cpu_up() ]
[ Oleksandr Natalenko: reported suspend/resume issue fixed in
  x86_acpi_suspend_lowlevel ]
[ tglx: Make it read the APIC ID from the APIC instead of using CPUID,
split the bitlock part out ]

Co-developed-by: Thomas Gleixner 
Co-developed-by: Brian Gerst 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Brian Gerst 
Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
V4: Remove the lock prefix in the error path - Peter Z.
---
 arch/x86/include/asm/apic.h|2 +
 arch/x86/include/asm/apicdef.h |5 ++-
 arch/x86/include/asm/smp.h |6 
 arch/x86/kernel/acpi/sleep.c   |9 +-
 arch/x86/kernel/apic/apic.c|2 -
 arch/x86/kernel/head_64.S  |   61 +
 arch/x86/kernel/smpboot.c  |2 -
 7 files changed, 83 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -55,6 +55,8 @@ extern int local_apic_timer_c2_ok;
 extern int disable_apic;
 extern unsigned int lapic_timer_period;
 
+extern int cpuid_to_apicid[];
+
 extern enum apic_intr_mode_id apic_intr_mode;
 enum apic_intr_mode_id {
APIC_PIC,
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -138,7 +138,8 @@
 #defineAPIC_EILVT_MASKED   (1 << 16)
 
 #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
-#define APIC_BASE_MSR  0x800
+#define APIC_BASE_MSR  0x800
+#define APIC_X2APIC_ID_MSR 0x802
 #define XAPIC_ENABLE   (1UL << 11)
 #define X2APIC_ENABLE  (1UL << 10)
 
@@ -162,6 +163,7 @@
 #define APIC_CPUID(apicid) ((apicid) & XAPIC_DEST_CPUS_MASK)
 #define NUM_APIC_CLUSTERS  ((BAD_APICID + 1) >> XAPIC_DEST_CPUS_SHIFT)
 
+#ifndef __ASSEMBLY__
 /*
  * the local APIC register structure, memory mapped. Not terribly well
  * tested, but we might eventually use this one in the future - the
@@ -435,4 +437,5 @@ enum apic_delivery_modes {
APIC_DELIVERY_MODE_EXTINT   = 7,
 };
 
+#endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_APICDEF_H */
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -200,4 +200,10 @@ extern unsigned long apic_mmio_base;
 
 #endif /* !__ASSEMBLY__ */
 
+/* Control bits for startup_64 */
+#define STARTUP_READ_APICID0x8000
+
+/* Top 8 bits are reserved for control */
+#define STARTUP_PARALLEL_MASK  0xFF00
+
 #endif /* _ASM_X86_SMP_H */
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "../../realmode/rm/wakeup.h"
@@ -127,7 +128,13 @@ int x86_acpi_suspend_lowlevel(void)
 * value is in the actual %rsp register.
 */
current->thread.sp = (unsigned long)temp_stack + sizeof(temp_stack);
-   smpboot_control = smp_processor_id();
+   /*
+* Ensure the CPU knows which one it is when it comes back, if
+* it isn't in parallel mode and expected to work that out for
+* itself.
+*/
+   if (!(smpboot_control & STARTUP_PARALLEL_MASK))
+   smpboot_control = smp_processor_id();
 #endif
initial_code = (unsigned long)wakeup_long64;
saved_magic = 0x123456789abcdef0L;
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2380,7 +2380,7 @@ static int nr_logical_cpuids = 1;
 /*
  * Used to 

[patch V4 30/37] cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

The bringup logic of a to-be-onlined CPU consists of several parts, which
are considered to be a single hotplug state:

  1) Control CPU issues the wake-up

  2) To be onlined CPU starts up, does the minimal initialization,
 reports to be alive and waits for release into the complete bring-up.

  3) Control CPU waits for the alive report and releases the upcoming CPU
 for the complete bring-up.

Allow splitting this into two states:

  1) Control CPU issues the wake-up

 After that the to be onlined CPU starts up, does the minimal
 initialization, reports to be alive and waits for release into the
 full bring-up. As this can run after the control CPU dropped the
 hotplug locks the code which is executed on the AP before it reports
 alive has to be carefully audited to not violate any of the hotplug
 constraints, especially not modifying any of the various cpumasks.

 This is really only meant to avoid waiting for the AP to react on the
 wake-up. Of course an architecture can move strict CPU related setup
 functionality, e.g. microcode loading, with care before the
 synchronization point to save further pointless waiting time.

  2) Control CPU waits for the alive report and releases the upcoming CPU
 for the complete bring-up.

This allows running all to-be-onlined CPUs up to state #1 on the control
CPU and then, at a later point, running state #2. This spares some of the
latencies of the fully serialized per-CPU bringup by avoiding the per-CPU
wakeup/wait serialization. The assumption is that the first AP is already
waiting when the last AP has been woken up. This obviously depends on the
hardware latencies, and depending on the timings this might still not
completely eliminate all wait scenarios.

This split is just a preparatory step for enabling the parallel bringup
later. The boot time bringup is still fully serialized. It has a separate
config switch so that architectures which want to support parallel bringup
can test the split of the CPUHP_BRINGUP step separately.

To enable this the architecture must support the CPU hotplug core sync
mechanism and must be audited to ensure that there are no implicit hotplug
state dependencies which require a fully serialized bringup.
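
For an architecture opting in, the hook has roughly the following shape.
The body is an assumption modeled on how x86 wires it up later in this
series (native_kick_ap()), not a requirement of the core code:

/* Kconfig: select HOTPLUG_SPLIT_STARTUP (selects HOTPLUG_CORE_SYNC_FULL) */

/*
 * Called from the CPUHP_BP_KICK_AP step: send the wake-up and return
 * without waiting. The later CPUHP_BRINGUP_CPU step waits for the alive
 * report and completes the bringup.
 */
int arch_cpuhp_kick_ap_alive(unsigned int cpu, struct task_struct *tidle)
{
	return native_kick_ap(cpu, tidle);	/* illustrative, arch specific */
}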

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/Kconfig   |4 ++
 include/linux/cpuhotplug.h |4 ++
 kernel/cpu.c   |   70 +++--
 3 files changed, 76 insertions(+), 2 deletions(-)
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -49,6 +49,10 @@ config HOTPLUG_CORE_SYNC_FULL
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select HOTPLUG_CORE_SYNC
 
+config HOTPLUG_SPLIT_STARTUP
+   bool
+   select HOTPLUG_CORE_SYNC_FULL
+
 config GENERIC_ENTRY
bool
 
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -133,6 +133,7 @@ enum cpuhp_state {
CPUHP_MIPS_SOC_PREPARE,
CPUHP_BP_PREPARE_DYN,
CPUHP_BP_PREPARE_DYN_END= CPUHP_BP_PREPARE_DYN + 20,
+   CPUHP_BP_KICK_AP,
CPUHP_BRINGUP_CPU,
 
/*
@@ -517,9 +518,12 @@ void cpuhp_online_idle(enum cpuhp_state
 static inline void cpuhp_online_idle(enum cpuhp_state state) { }
 #endif
 
+struct task_struct;
+
 void cpuhp_ap_sync_alive(void);
 void arch_cpuhp_sync_state_poll(void);
 void arch_cpuhp_cleanup_kick_cpu(unsigned int cpu);
+int arch_cpuhp_kick_ap_alive(unsigned int cpu, struct task_struct *tidle);
 
 #ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
 void cpuhp_ap_report_dead(void);
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -761,6 +761,47 @@ static int bringup_wait_for_ap_online(un
return 0;
 }
 
+#ifdef CONFIG_HOTPLUG_SPLIT_STARTUP
+static int cpuhp_kick_ap_alive(unsigned int cpu)
+{
+   if (!cpuhp_can_boot_ap(cpu))
+   return -EAGAIN;
+
+   return arch_cpuhp_kick_ap_alive(cpu, idle_thread_get(cpu));
+}
+
+static int cpuhp_bringup_ap(unsigned int cpu)
+{
+   struct cpuhp_cpu_state *st = per_cpu_ptr(_state, cpu);
+   int ret;
+
+   /*
+* Some architectures have to walk the irq descriptors to
+* setup the vector space for the cpu which comes online.
+* Prevent irq alloc/free across the bringup.
+*/
+   irq_lock_sparse();
+
+   ret = cpuhp_bp_sync_alive(cpu);
+   if (ret)
+   goto out_unlock;
+
+   ret = bringup_wait_for_ap_online(cpu);
+   if (ret)
+   goto out_unlock;
+
+   irq_unlock_sparse();
+
+   if (st->target <= CPUHP_AP_ONLINE_IDLE)
+   return 0;
+
+   return cpuhp_kick_ap(cpu, st, st->target);
+
+out_unlock:
+   irq_unlock_sparse();
+   return ret;
+}
+#else
 static int bringup_cpu(unsigned int cpu)
 {
struct cpuhp_cpu_state *st = per_cpu_ptr(_state, cpu);
@@ -781,7 +822,6 @@ static int bringup_cpu(unsigned int cpu)
 */
irq_lock_sparse();
 
-   /* 

[patch V4 35/37] x86/smpboot: Implement a bit spinlock to protect the realmode stack

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Parallel AP bringup requires that the APs can run fully parallel through
the early startup code including the real mode trampoline.

To prepare for this implement a bit-spinlock to serialize access to the
real mode stack so that parallel upcoming APs are not going to corrupt each
other's stacks while going through the real mode startup code.
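
A rough C equivalent of the serialization implemented by the trampoline
macro below; the helper name is illustrative and the real code is 16-bit
assembly operating on the new tr_lock field of the trampoline header:

/*
 * Equivalent of "lock btsl $0, tr_lock; jnc taken; pause; retry": spin
 * until this CPU owns bit 0 of the shared lock word, then the single
 * real-mode stack may be used. Released from head_64.S once the AP runs
 * on its own stack.
 */
static void realmode_stack_lock(volatile u32 *lock)
{
	while (__atomic_fetch_or(lock, 1, __ATOMIC_ACQUIRE) & 1)
		cpu_relax();	/* the "pause" in the retry loop */
}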

Co-developed-by: David Woodhouse 
Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
V4: Simplify the lock implementation - Peter Z.
---
 arch/x86/include/asm/realmode.h  |3 +++
 arch/x86/kernel/head_64.S|   12 
 arch/x86/realmode/init.c |3 +++
 arch/x86/realmode/rm/trampoline_64.S |   23 ++-
 4 files changed, 36 insertions(+), 5 deletions(-)
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -52,6 +52,7 @@ struct trampoline_header {
u64 efer;
u32 cr4;
u32 flags;
+   u32 lock;
 #endif
 };
 
@@ -64,6 +65,8 @@ extern unsigned long initial_stack;
 extern unsigned long initial_vc_handler;
 #endif
 
+extern u32 *trampoline_lock;
+
 extern unsigned char real_mode_blob[];
 extern unsigned char real_mode_relocs[];
 
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -252,6 +252,16 @@ SYM_INNER_LABEL(secondary_startup_64_no_
movqTASK_threadsp(%rax), %rsp
 
/*
+* Now that this CPU is running on its own stack, drop the realmode
+* protection. For the boot CPU the pointer is NULL!
+*/
+   movqtrampoline_lock(%rip), %rax
+   testq   %rax, %rax
+   jz  .Lsetup_gdt
+   movl$0, (%rax)
+
+.Lsetup_gdt:
+   /*
 * We must switch to a new descriptor in kernel space for the GDT
 * because soon the kernel won't have access anymore to the userspace
 * addresses where we're currently running on. We have to do that here
@@ -433,6 +443,8 @@ SYM_DATA(initial_code,  .quad x86_64_star
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 SYM_DATA(initial_vc_handler,   .quad handle_vc_boot_ghcb)
 #endif
+
+SYM_DATA(trampoline_lock, .quad 0);
__FINITDATA
 
__INIT
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -154,6 +154,9 @@ static void __init setup_real_mode(void)
 
trampoline_header->flags = 0;
 
+   trampoline_lock = _header->lock;
+   *trampoline_lock = 0;
+
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
 
/* Map the real mode stub as virtual == physical */
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -37,6 +37,20 @@
.text
.code16
 
+.macro LOAD_REALMODE_ESP
+   /*
+* Make sure only one CPU fiddles with the realmode stack
+   */
+.Llock_rm\@:
+lock btsl   $0, tr_lock
+jnc 2f
+pause
+jmp .Llock_rm\@
+2:
+   # Setup stack
+   movl$rm_stack_end, %esp
+.endm
+
.balign PAGE_SIZE
 SYM_CODE_START(trampoline_start)
cli # We should be safe anyway
@@ -49,8 +63,7 @@ SYM_CODE_START(trampoline_start)
mov %ax, %es
mov %ax, %ss
 
-   # Setup stack
-   movl$rm_stack_end, %esp
+   LOAD_REALMODE_ESP
 
callverify_cpu  # Verify the cpu supports long mode
testl   %eax, %eax  # Check for return code
@@ -93,8 +106,7 @@ SYM_CODE_START(sev_es_trampoline_start)
mov %ax, %es
mov %ax, %ss
 
-   # Setup stack
-   movl$rm_stack_end, %esp
+   LOAD_REALMODE_ESP
 
jmp .Lswitch_to_protected
 SYM_CODE_END(sev_es_trampoline_start)
@@ -177,7 +189,7 @@ SYM_CODE_START(pa_trampoline_compat)
 * In compatibility mode.  Prep ESP and DX for startup_32, then disable
 * paging and complete the switch to legacy 32-bit mode.
 */
-   movl$rm_stack_end, %esp
+   LOAD_REALMODE_ESP
movw$__KERNEL_DS, %dx
 
movl$(CR0_STATE & ~X86_CR0_PG), %eax
@@ -241,6 +253,7 @@ SYM_DATA_START(trampoline_header)
SYM_DATA(tr_efer,   .space 8)
SYM_DATA(tr_cr4,.space 4)
SYM_DATA(tr_flags,  .space 4)
+   SYM_DATA(tr_lock,   .space 4)
 SYM_DATA_END(trampoline_header)
 
 #include "trampoline_common.S"




[patch V4 27/37] riscv: Switch to hotplug core state synchronization

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Switch to the CPU hotplug core state tracking and synchronization
mechanism. No functional change intended.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
Acked-by: Palmer Dabbelt 
---
 arch/riscv/Kconfig  |1 +
 arch/riscv/include/asm/smp.h|2 +-
 arch/riscv/kernel/cpu-hotplug.c |   14 +++---
 3 files changed, 9 insertions(+), 8 deletions(-)
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -122,6 +122,7 @@ config RISCV
select HAVE_RSEQ
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
+   select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -70,7 +70,7 @@ asmlinkage void smp_callin(void);
 
 #if defined CONFIG_HOTPLUG_CPU
 int __cpu_disable(void);
-void __cpu_die(unsigned int cpu);
+static inline void __cpu_die(unsigned int cpu) { }
 #endif /* CONFIG_HOTPLUG_CPU */
 
 #else
--- a/arch/riscv/kernel/cpu-hotplug.c
+++ b/arch/riscv/kernel/cpu-hotplug.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -49,17 +50,15 @@ int __cpu_disable(void)
return ret;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
 /*
- * Called on the thread which is asking for a CPU to be shutdown.
+ * Called on the thread which is asking for a CPU to be shutdown, if the
+ * CPU reported dead to the hotplug core.
  */
-void __cpu_die(unsigned int cpu)
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
int ret = 0;
 
-   if (!cpu_wait_death(cpu, 5)) {
-   pr_err("CPU %u: didn't die\n", cpu);
-   return;
-   }
pr_notice("CPU%u: off\n", cpu);
 
/* Verify from the firmware if the cpu is really stopped*/
@@ -76,9 +75,10 @@ void __noreturn arch_cpu_idle_dead(void)
 {
idle_task_exit();
 
-   (void)cpu_report_death();
+   cpuhp_ap_report_dead();
 
cpu_ops[smp_processor_id()]->cpu_stop();
/* It should never reach here */
BUG();
 }
+#endif






[patch V4 23/37] arm64: smp: Switch to hotplug core state synchronization

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Switch to the CPU hotplug core state tracking and synchronization
mechanism. No functional change intended.

Signed-off-by: Thomas Gleixner 
Tested-by: Mark Rutland 
Tested-by: Michael Kelley 
---
 arch/arm64/Kconfig   |1 +
 arch/arm64/include/asm/smp.h |2 +-
 arch/arm64/kernel/smp.c  |   14 +-
 3 files changed, 7 insertions(+), 10 deletions(-)


--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -222,6 +222,7 @@ config ARM64
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_GENERIC_VDSO
+   select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -99,7 +99,7 @@ static inline void arch_send_wakeup_ipi_
 
 extern int __cpu_disable(void);
 
-extern void __cpu_die(unsigned int cpu);
+static inline void __cpu_die(unsigned int cpu) { }
 extern void __noreturn cpu_die(void);
 extern void __noreturn cpu_die_early(void);
 
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -332,17 +332,13 @@ static int op_cpu_kill(unsigned int cpu)
 }
 
 /*
- * called on the thread which is asking for a CPU to be shutdown -
- * waits until shutdown has completed, or it is timed out.
+ * Called on the thread which is asking for a CPU to be shutdown after the
+ * shutdown completed.
  */
-void __cpu_die(unsigned int cpu)
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
int err;
 
-   if (!cpu_wait_death(cpu, 5)) {
-   pr_crit("CPU%u: cpu didn't die\n", cpu);
-   return;
-   }
pr_debug("CPU%u: shutdown\n", cpu);
 
/*
@@ -369,8 +365,8 @@ void __noreturn cpu_die(void)
 
local_daif_mask();
 
-   /* Tell __cpu_die() that this CPU is now safe to dispose of */
-   (void)cpu_report_death();
+   /* Tell cpuhp_bp_sync_dead() that this CPU is now safe to dispose of */
+   cpuhp_ap_report_dead();
 
/*
 * Actually shutdown the CPU. This must never fail. The specific hotplug






Re: [PATCH v2] piix: fix regression during unplug in Xen HVM domUs

2023-05-12 Thread Stefano Stabellini
On Wed, 10 May 2023, Olaf Hering wrote:
> Wed, 10 May 2023 00:58:27 +0200 Olaf Hering :
> 
> > In my debugging (with v8.0.0) it turned out the three pci_set_word
> > causes the domU to hang. In fact, it is just the last one:
> > 
> >pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
> > 
> > It changes the value from 0xc121 to 0x1.
> 
> If I disable just "pci_set_word(pci_conf + PCI_COMMAND, 0x);" it works as 
> well.
> It changes the value from 0x5 to 0.
> 
> In general I feel it is wrong to fiddle with PCI from the host side.
> This is most likely not the intention of the Xen unplug protocol.
> I'm sure the guest does not expect such changes under the hood.
> It happens to work by luck with pvops kernels because their PCI discovery
> is done after the unplug.
> 
> So, what do we do here to get this off the table?

I don't have a concrete suggestion because I don't understand the root
cause of the issue. Looking back at Paolo's reply from 2021

https://marc.info/?l=xen-devel=161669099305992=2

I think he was right. We can either fix the root cause of the issue or
avoid calling qdev_reset_all on unplug. I am OK with either one.



[patch V4 16/37] x86/smpboot: Remove wait for cpu_online()

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Now that the core code drops sparse_irq_lock after the idle thread
synchronized, it's pointless to wait for the AP to mark itself online.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/kernel/smpboot.c |   26 ++
 1 file changed, 2 insertions(+), 24 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -275,7 +275,6 @@ static void notrace start_secondary(void
 * half valid vector space.
 */
lock_vector_lock();
-   /* Sync point with do_wait_cpu_online() */
set_cpu_online(smp_processor_id(), true);
lapic_online();
unlock_vector_lock();
@@ -1104,20 +1103,6 @@ static int wait_cpu_initialized(unsigned
return 0;
 }
 
-/*
- * Bringup step three: Wait for the target AP to reach set_cpu_online() in
- * start_secondary().
- */
-static void wait_cpu_online(unsigned int cpu)
-{
-   /*
-* Wait for the AP to mark itself online, so the core caller
-* can drop sparse_irq_lock.
-*/
-   while (!cpu_online(cpu))
-   schedule();
-}
-
 static int native_kick_ap(unsigned int cpu, struct task_struct *tidle)
 {
int apicid = apic->cpu_present_to_apicid(cpu);
@@ -1164,16 +1149,9 @@ int native_cpu_up(unsigned int cpu, stru
int ret;
 
ret = native_kick_ap(cpu, tidle);
-   if (ret)
-   goto out;
-
-   ret = wait_cpu_initialized(cpu);
-   if (ret)
-   goto out;
-
-   wait_cpu_online(cpu);
+   if (!ret)
+   ret = wait_cpu_initialized(cpu);
 
-out:
/* Cleanup possible dangling ends... */
if (x86_platform.legacy.warm_reset)
smpboot_restore_warm_reset_vector();




[patch V4 20/37] x86/smpboot: Switch to hotplug core state synchronization

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

The new AP state tracking and synchronization mechanism in the CPU hotplug
core code allows removing quite a bit of x86-specific code:

  1) The AP alive synchronization based on cpumasks

  2) The decision whether an AP can be brought up again

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
V2: Use for_each_online_cpu() - Brian
---
 arch/x86/Kconfig   |1 
 arch/x86/include/asm/smp.h |7 +
 arch/x86/kernel/smp.c  |1 
 arch/x86/kernel/smpboot.c  |  165 +++--
 arch/x86/xen/smp_hvm.c |   16 +---
 arch/x86/xen/smp_pv.c  |   39 ++
 6 files changed, 75 insertions(+), 154 deletions(-)


--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -274,6 +274,7 @@ config X86
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_USER_RETURN_NOTIFIER
select HAVE_GENERIC_VDSO
+   select HOTPLUG_CORE_SYNC_FULL   if SMP
select HOTPLUG_SMT  if SMP
select IRQ_FORCED_THREADING
select NEED_PER_CPU_EMBED_FIRST_CHUNK
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -38,6 +38,8 @@ struct smp_ops {
void (*crash_stop_other_cpus)(void);
void (*smp_send_reschedule)(int cpu);
 
+   void (*cleanup_dead_cpu)(unsigned cpu);
+   void (*poll_sync_state)(void);
int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
int (*cpu_disable)(void);
void (*cpu_die)(unsigned int cpu);
@@ -90,7 +92,8 @@ static inline int __cpu_disable(void)
 
 static inline void __cpu_die(unsigned int cpu)
 {
-   smp_ops.cpu_die(cpu);
+   if (smp_ops.cpu_die)
+   smp_ops.cpu_die(cpu);
 }
 
 static inline void __noreturn play_dead(void)
@@ -123,8 +126,6 @@ void native_smp_cpus_done(unsigned int m
 int common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_cpu_disable(void);
-int common_cpu_die(unsigned int cpu);
-void native_cpu_die(unsigned int cpu);
 void __noreturn hlt_play_dead(void);
 void native_play_dead(void);
 void play_dead_common(void);
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -269,7 +269,6 @@ struct smp_ops smp_ops = {
.smp_send_reschedule= native_smp_send_reschedule,
 
.cpu_up = native_cpu_up,
-   .cpu_die= native_cpu_die,
.cpu_disable= native_cpu_disable,
.play_dead  = native_play_dead,
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -101,9 +102,6 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map);
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
-/* All of these masks are initialized in setup_cpu_local_masks() */
-static cpumask_var_t cpu_initialized_mask;
-static cpumask_var_t cpu_callout_mask;
 /* Representing CPUs for which sibling maps can be computed */
 static cpumask_var_t cpu_sibling_setup_mask;
 
@@ -166,10 +164,10 @@ static void ap_starting(void)
int cpuid = smp_processor_id();
 
/*
-* If woken up by an INIT in an 82489DX configuration
-* cpu_callout_mask guarantees the CPU does not reach this point
-* before an INIT_deassert IPI reaches the local APIC, so it is now
-* safe to touch the local APIC.
+* If woken up by an INIT in an 82489DX configuration the alive
+* synchronization guarantees that the CPU does not reach this
+* point before an INIT_deassert IPI reaches the local APIC, so it
+* is now safe to touch the local APIC.
 *
 * Set up this CPU, first the APIC, which is probably redundant on
 * most boards.
@@ -213,17 +211,6 @@ static void ap_calibrate_delay(void)
cpu_data(smp_processor_id()).loops_per_jiffy = loops_per_jiffy;
 }
 
-static void wait_for_master_cpu(int cpu)
-{
-   /*
-* Wait for release by control CPU before continuing with AP
-* initialization.
-*/
-   WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
-   while (!cpumask_test_cpu(cpu, cpu_callout_mask))
-   cpu_relax();
-}
-
 /*
  * Activate a secondary processor.
  */
@@ -244,11 +231,11 @@ static void notrace start_secondary(void
cpu_init_exception_handling();
 
/*
-* Sync point with wait_cpu_initialized(). Sets AP in
-* cpu_initialized_mask and then waits for the control CPU
-* to release it.
+* Synchronization point with the hotplug core. Sets the
+* synchronization state to ALIVE and waits for the control CPU to
+* release this CPU for further bringup.
 */
-   wait_for_master_cpu(raw_smp_processor_id());
+   cpuhp_ap_sync_alive();
 
cpu_init();
rcu_cpu_starting(raw_smp_processor_id());
@@ -278,7 +265,6 @@ 

[patch V4 19/37] cpu/hotplug: Add CPU state tracking and synchronization

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

The CPU state tracking and synchronization mechanism in smpboot.c is
completely independent of the hotplug code and all logic around it is
implemented in architecture specific code.

Except for the state reporting of the AP there is absolutely nothing
architecture specific and the synchronization and decision functions can be
moved into the generic hotplug core code.

Provide an integrated variant and add the core synchronization and decision
points. This comes in two flavours:

  1) DEAD state synchronization

 Updated by the architecture code once the AP reaches the point where
 it is ready to be torn down by the control CPU, e.g. by removing power
 or clocks or tear down via the hypervisor.

 The control CPU waits for this state to be reached with a timeout. If
 the state is reached an architecture specific cleanup function is
 invoked.

  2) Full state synchronization

 This extends #1 with AP alive synchronization. This is new
 functionality, which allows replacing architecture specific wait
 mechanisms, e.g. cpumasks, completely.

 It also prevents an AP which is in a limbo state from being brought
 up again. This can happen when an AP failed to report dead state
 during a previous off-line operation.

The dead synchronization is what most architectures use. Only x86 makes a
bringup decision based on that state at the moment.
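
A simplified sketch of the control CPU side of the DEAD synchronization
described above. Field and helper names follow the patch below; the timeout
and state handling in the real cpuhp_wait_for_sync_state() is more
elaborate:

/*
 * Sketch: poll the per-CPU sync state with a timeout and only run the
 * architecture cleanup when the AP actually reported dead.
 */
static bool wait_for_ap_dead(unsigned int cpu)
{
	atomic_t *st = per_cpu_ptr(&cpuhp_state.ap_sync_state, cpu);
	unsigned long end = jiffies + 10 * HZ;	/* arbitrary timeout for the sketch */

	while (atomic_read(st) != SYNC_STATE_DEAD) {
		if (time_after(jiffies, end))
			return false;		/* AP in limbo: refuse later bringup */
		arch_cpuhp_sync_state_poll();	/* defaults to cpu_relax() */
	}

	arch_cpuhp_cleanup_dead_cpu(cpu);
	return true;
}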

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
V4: Remove the try_cmpxchg() loop in cpuhp_ap_update_sync_state() - Peter Z.
---
 arch/Kconfig   |   15 +++
 include/linux/cpuhotplug.h |   12 ++
 kernel/cpu.c   |  193 -
 kernel/smpboot.c   |2 
 4 files changed, 221 insertions(+), 1 deletion(-)
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -34,6 +34,21 @@ config ARCH_HAS_SUBPAGE_FAULTS
 config HOTPLUG_SMT
bool
 
+# Selected by HOTPLUG_CORE_SYNC_DEAD or HOTPLUG_CORE_SYNC_FULL
+config HOTPLUG_CORE_SYNC
+   bool
+
+# Basic CPU dead synchronization selected by architecture
+config HOTPLUG_CORE_SYNC_DEAD
+   bool
+   select HOTPLUG_CORE_SYNC
+
+# Full CPU synchronization with alive state selected by architecture
+config HOTPLUG_CORE_SYNC_FULL
+   bool
+   select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
+   select HOTPLUG_CORE_SYNC
+
 config GENERIC_ENTRY
bool
 
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -517,4 +517,16 @@ void cpuhp_online_idle(enum cpuhp_state
 static inline void cpuhp_online_idle(enum cpuhp_state state) { }
 #endif
 
+void cpuhp_ap_sync_alive(void);
+void arch_cpuhp_sync_state_poll(void);
+void arch_cpuhp_cleanup_kick_cpu(unsigned int cpu);
+
+#ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
+void cpuhp_ap_report_dead(void);
+void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu);
+#else
+static inline void cpuhp_ap_report_dead(void) { }
+static inline void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu) { }
+#endif
+
 #endif
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -59,6 +60,7 @@
  * @last:  For multi-instance rollback, remember how far we got
  * @cb_state:  The state for a single callback (install/uninstall)
  * @result:Result of the operation
+ * @ap_sync_state: State for AP synchronization
  * @done_up:   Signal completion to the issuer of the task for cpu-up
  * @done_down: Signal completion to the issuer of the task for cpu-down
  */
@@ -76,6 +78,7 @@ struct cpuhp_cpu_state {
struct hlist_node   *last;
enum cpuhp_statecb_state;
int result;
+   atomic_tap_sync_state;
struct completion   done_up;
struct completion   done_down;
 #endif
@@ -276,6 +279,182 @@ static bool cpuhp_is_atomic_state(enum c
return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
 }
 
+/* Synchronization state management */
+enum cpuhp_sync_state {
+   SYNC_STATE_DEAD,
+   SYNC_STATE_KICKED,
+   SYNC_STATE_SHOULD_DIE,
+   SYNC_STATE_ALIVE,
+   SYNC_STATE_SHOULD_ONLINE,
+   SYNC_STATE_ONLINE,
+};
+
+#ifdef CONFIG_HOTPLUG_CORE_SYNC
+/**
+ * cpuhp_ap_update_sync_state - Update synchronization state during 
bringup/teardown
+ * @state: The synchronization state to set
+ *
+ * No synchronization point. Just update of the synchronization state, but 
implies
+ * a full barrier so that the AP changes are visible before the control CPU 
proceeds.
+ */
+static inline void cpuhp_ap_update_sync_state(enum cpuhp_sync_state state)
+{
+   atomic_t *st = this_cpu_ptr(_state.ap_sync_state);
+
+   (void)atomic_xchg(st, state);
+}
+
+void __weak arch_cpuhp_sync_state_poll(void) { cpu_relax(); }
+
+static bool cpuhp_wait_for_sync_state(unsigned int cpu, enum cpuhp_sync_state 
state,
+ enum cpuhp_sync_state next_state)

[patch V4 18/37] x86/xen/hvm: Get rid of DEAD_FROZEN handling

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

No point in this conditional voodoo. Un-initializing the lock mechanism can
safely be done unconditionally, even if it was already done when the CPU
died.

Remove the invocation of xen_smp_intr_free() as that has been already
cleaned up in xen_cpu_dead_hvm().

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/xen/enlighten_hvm.c |   11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -161,13 +161,12 @@ static int xen_cpu_up_prepare_hvm(unsign
int rc = 0;
 
/*
-* This can happen if CPU was offlined earlier and
-* offlining timed out in common_cpu_die().
+* If a CPU was offlined earlier and offlining timed out then the
+* lock mechanism is still initialized. Uninit it unconditionally
+* as it's safe to call even if already uninited. Interrupts and
+* timer have already been handled in xen_cpu_dead_hvm().
 */
-   if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
-   xen_smp_intr_free(cpu);
-   xen_uninit_lock_cpu(cpu);
-   }
+   xen_uninit_lock_cpu(cpu);
 
if (cpu_acpi_id(cpu) != U32_MAX)
per_cpu(xen_vcpu_id, cpu) = cpu_acpi_id(cpu);






[patch V4 15/37] cpu/hotplug: Rework sparse_irq locking in bringup_cpu()

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

There is no harm in holding the sparse_irq lock until the upcoming CPU
completes in cpuhp_online_idle(). This allows removing the cpu_online()
synchronization from architecture code.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
V4: Amend comment about sparse irq lock - Peter Z.
---
 kernel/cpu.c |   34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -558,7 +558,7 @@ static int cpuhp_kick_ap(int cpu, struct
return ret;
 }
 
-static int bringup_wait_for_ap(unsigned int cpu)
+static int bringup_wait_for_ap_online(unsigned int cpu)
 {
struct cpuhp_cpu_state *st = per_cpu_ptr(_state, cpu);
 
@@ -579,15 +579,12 @@ static int bringup_wait_for_ap(unsigned
 */
if (!cpu_smt_allowed(cpu))
return -ECANCELED;
-
-   if (st->target <= CPUHP_AP_ONLINE_IDLE)
-   return 0;
-
-   return cpuhp_kick_ap(cpu, st, st->target);
+   return 0;
 }
 
 static int bringup_cpu(unsigned int cpu)
 {
+   struct cpuhp_cpu_state *st = per_cpu_ptr(_state, cpu);
struct task_struct *idle = idle_thread_get(cpu);
int ret;
 
@@ -600,16 +597,33 @@ static int bringup_cpu(unsigned int cpu)
/*
 * Some architectures have to walk the irq descriptors to
 * setup the vector space for the cpu which comes online.
-* Prevent irq alloc/free across the bringup.
+*
+* Prevent irq alloc/free across the bringup by acquiring the
+* sparse irq lock. Hold it until the upcoming CPU completes the
+* startup in cpuhp_online_idle() which allows to avoid
+* intermediate synchronization points in the architecture code.
 */
irq_lock_sparse();
 
/* Arch-specific enabling code. */
ret = __cpu_up(cpu, idle);
-   irq_unlock_sparse();
if (ret)
-   return ret;
-   return bringup_wait_for_ap(cpu);
+   goto out_unlock;
+
+   ret = bringup_wait_for_ap_online(cpu);
+   if (ret)
+   goto out_unlock;
+
+   irq_unlock_sparse();
+
+   if (st->target <= CPUHP_AP_ONLINE_IDLE)
+   return 0;
+
+   return cpuhp_kick_ap(cpu, st, st->target);
+
+out_unlock:
+   irq_unlock_sparse();
+   return ret;
 }
 
 static int finish_cpu(unsigned int cpu)




[patch V4 14/37] x86/smpboot: Remove cpu_callin_mask

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Now that TSC synchronization is SMP function call based there is no reason
to wait for the AP to set itself in cpu_callin_mask. The control CPU waits
for the AP to set itself in the online mask anyway.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
V4: Rename smp_callin() to ap_starting() - Peter Z.
---
 arch/x86/kernel/smpboot.c |   74 +-
 1 file changed, 15 insertions(+), 59 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -104,7 +104,6 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
 /* All of these masks are initialized in setup_cpu_local_masks() */
 static cpumask_var_t cpu_initialized_mask;
 static cpumask_var_t cpu_callout_mask;
-static cpumask_var_t cpu_callin_mask;
 /* Representing CPUs for which sibling maps can be computed */
 static cpumask_var_t cpu_sibling_setup_mask;
 
@@ -161,38 +160,30 @@ static inline void smpboot_restore_warm_
 
 }
 
-/*
- * Report back to the Boot Processor during boot time or to the caller 
processor
- * during CPU online.
- */
-static void smp_callin(void)
+/* Run the next set of setup steps for the upcoming CPU */
+static void ap_starting(void)
 {
-   int cpuid;
-
-   /*
-* If waken up by an INIT in an 82489DX configuration
-* cpu_callout_mask guarantees we don't get here before
-* an INIT_deassert IPI reaches our local APIC, so it is
-* now safe to touch our local APIC.
-*/
-   cpuid = smp_processor_id();
+   int cpuid = smp_processor_id();
 
/*
-* the boot CPU has finished the init stage and is spinning
-* on callin_map until we finish. We are free to set up this
-* CPU, first the APIC. (this is probably redundant on most
-* boards)
+* If woken up by an INIT in an 82489DX configuration
+* cpu_callout_mask guarantees the CPU does not reach this point
+* before an INIT_deassert IPI reaches the local APIC, so it is now
+* safe to touch the local APIC.
+*
+* Set up this CPU, first the APIC, which is probably redundant on
+* most boards.
 */
apic_ap_setup();
 
-   /* Save our processor parameters. */
+   /* Save the processor parameters. */
smp_store_cpu_info(cpuid);
 
/*
 * The topology information must be up to date before
 * notify_cpu_starting().
 */
-   set_cpu_sibling_map(raw_smp_processor_id());
+   set_cpu_sibling_map(cpuid);
 
ap_init_aperfmperf();
 
@@ -205,11 +196,6 @@ static void smp_callin(void)
 * state CPUHP_ONLINE.
 */
notify_cpu_starting(cpuid);
-
-   /*
-* Allow the master to continue.
-*/
-   cpumask_set_cpu(cpuid, cpu_callin_mask);
 }
 
 static void ap_calibrate_delay(void)
@@ -268,12 +254,7 @@ static void notrace start_secondary(void
rcu_cpu_starting(raw_smp_processor_id());
x86_cpuinit.early_percpu_clock_init();
 
-   /*
-* Sync point with wait_cpu_callin(). The AP doesn't wait here
-* but just sets the bit to let the controlling CPU (BSP) know that
-* it's got this far.
-*/
-   smp_callin();
+   ap_starting();
 
/* Check TSC synchronization with the control CPU. */
check_tsc_sync_target();
@@ -1109,7 +1090,7 @@ static int wait_cpu_cpumask(unsigned int
  * and thus wait_for_master_cpu(), then set cpu_callout_mask to allow it
  * to proceed.  The AP will then proceed past setting its 'callin' bit
  * and end up waiting in check_tsc_sync_target() until we reach
- * do_wait_cpu_online() to tend to it.
+ * wait_cpu_online() to tend to it.
  */
 static int wait_cpu_initialized(unsigned int cpu)
 {
@@ -1124,20 +1105,7 @@ static int wait_cpu_initialized(unsigned
 }
 
 /*
- * Bringup step three: Wait for the target AP to reach smp_callin().
- * The AP is not waiting for us here so we don't need to parallelise
- * this step. Not entirely clear why we care about this, since we just
- * proceed directly to TSC synchronization which is the next sync
- * point with the AP anyway.
- */
-static void wait_cpu_callin(unsigned int cpu)
-{
-   while (!cpumask_test_cpu(cpu, cpu_callin_mask))
-   schedule();
-}
-
-/*
- * Bringup step four: Wait for the target AP to reach set_cpu_online() in
+ * Bringup step three: Wait for the target AP to reach set_cpu_online() in
  * start_secondary().
  */
 static void wait_cpu_online(unsigned int cpu)
@@ -1167,14 +1135,6 @@ static int native_kick_ap(unsigned int c
}
 
/*
-* Already booted CPU?
-*/
-   if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
-   pr_debug("do_boot_cpu %d Already started\n", cpu);
-   return -ENOSYS;
-   }
-
-   /*
 * Save current MTRR state in case it was changed since early boot
 * (e.g. by the ACPI SMI) to initialize new CPUs with MTRRs in sync:
 */
@@ 

[patch V4 11/37] x86/cpu/cacheinfo: Remove cpu_callout_mask dependency

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

cpu_callout_mask is used for the stop machine based MTRR/PAT init.

In preparation of moving the BP/AP synchronization to the core hotplug
code, use a private CPU mask for cacheinfo and manage it in the
starting/dying hotplug state.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/kernel/cpu/cacheinfo.c |   21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -39,6 +39,8 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t
 /* Shared L2 cache maps */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
 
+static cpumask_var_t cpu_cacheinfo_mask;
+
 /* Kernel controls MTRR and/or PAT MSRs. */
 unsigned int memory_caching_control __ro_after_init;
 
@@ -1172,8 +1174,10 @@ void cache_bp_restore(void)
cache_cpu_init();
 }
 
-static int cache_ap_init(unsigned int cpu)
+static int cache_ap_online(unsigned int cpu)
 {
+   cpumask_set_cpu(cpu, cpu_cacheinfo_mask);
+
if (!memory_caching_control || get_cache_aps_delayed_init())
return 0;
 
@@ -1191,11 +1195,17 @@ static int cache_ap_init(unsigned int cp
 *  lock to prevent MTRR entry changes
 */
stop_machine_from_inactive_cpu(cache_rendezvous_handler, NULL,
-  cpu_callout_mask);
+  cpu_cacheinfo_mask);
 
return 0;
 }
 
+static int cache_ap_offline(unsigned int cpu)
+{
+   cpumask_clear_cpu(cpu, cpu_cacheinfo_mask);
+   return 0;
+}
+
 /*
  * Delayed cache initialization for all AP's
  */
@@ -1210,9 +1220,12 @@ void cache_aps_init(void)
 
 static int __init cache_ap_register(void)
 {
+   zalloc_cpumask_var(_cacheinfo_mask, GFP_KERNEL);
+   cpumask_set_cpu(smp_processor_id(), cpu_cacheinfo_mask);
+
cpuhp_setup_state_nocalls(CPUHP_AP_CACHECTRL_STARTING,
  "x86/cachectrl:starting",
- cache_ap_init, NULL);
+ cache_ap_online, cache_ap_offline);
return 0;
 }
-core_initcall(cache_ap_register);
+early_initcall(cache_ap_register);




[patch V4 17/37] x86/xen/smp_pv: Remove wait for CPU online

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Now that the core code drops sparse_irq_lock after the idle thread
synchronized, it's pointless to wait for the AP to mark itself online.

Whether the control CPU runs in a wait loop or sleeps in the core code
waiting for the online operation to complete makes no difference.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/xen/smp_pv.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)


--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -340,11 +340,11 @@ static int xen_pv_cpu_up(unsigned int cp
 
xen_pmu_init(cpu);
 
-   rc = HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL);
-   BUG_ON(rc);
-
-   while (cpu_report_state(cpu) != CPU_ONLINE)
-   HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
+   /*
+* Why is this a BUG? If the hypercall fails then everything can be
+* rolled back, no?
+*/
+   BUG_ON(HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL));
 
return 0;
 }






[patch V4 12/37] x86/smpboot: Move synchronization masks to SMP boot code

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

The usage is in smpboot.c and not in the CPU initialization code.

The XEN_PV usage of cpu_callout_mask is obsolete as cpu_init() no longer
waits and cacheinfo has its own CPU mask now, so cpu_callout_mask can be
made static too.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/include/asm/cpumask.h |5 -
 arch/x86/kernel/cpu/common.c   |   17 -
 arch/x86/kernel/smpboot.c  |   16 
 arch/x86/xen/smp_pv.c  |3 ---
 4 files changed, 16 insertions(+), 25 deletions(-)
--- a/arch/x86/include/asm/cpumask.h
+++ b/arch/x86/include/asm/cpumask.h
@@ -4,11 +4,6 @@
 #ifndef __ASSEMBLY__
 #include 
 
-extern cpumask_var_t cpu_callin_mask;
-extern cpumask_var_t cpu_callout_mask;
-extern cpumask_var_t cpu_initialized_mask;
-extern cpumask_var_t cpu_sibling_setup_mask;
-
 extern void setup_cpu_local_masks(void);
 
 /*
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -67,14 +67,6 @@
 
 u32 elf_hwcap2 __read_mostly;
 
-/* all of these masks are initialized in setup_cpu_local_masks() */
-cpumask_var_t cpu_initialized_mask;
-cpumask_var_t cpu_callout_mask;
-cpumask_var_t cpu_callin_mask;
-
-/* representing cpus for which sibling maps can be computed */
-cpumask_var_t cpu_sibling_setup_mask;
-
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
 EXPORT_SYMBOL(smp_num_siblings);
@@ -169,15 +161,6 @@ static void ppin_init(struct cpuinfo_x86
clear_cpu_cap(c, info->feature);
 }
 
-/* correctly size the local cpu masks */
-void __init setup_cpu_local_masks(void)
-{
-   alloc_bootmem_cpumask_var(_initialized_mask);
-   alloc_bootmem_cpumask_var(_callin_mask);
-   alloc_bootmem_cpumask_var(_callout_mask);
-   alloc_bootmem_cpumask_var(_sibling_setup_mask);
-}
-
 static void default_init(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_64
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -101,6 +101,13 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map);
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
+/* All of these masks are initialized in setup_cpu_local_masks() */
+static cpumask_var_t cpu_initialized_mask;
+static cpumask_var_t cpu_callout_mask;
+static cpumask_var_t cpu_callin_mask;
+/* Representing CPUs for which sibling maps can be computed */
+static cpumask_var_t cpu_sibling_setup_mask;
+
 /* Logical package management. We might want to allocate that dynamically */
 unsigned int __max_logical_packages __read_mostly;
 EXPORT_SYMBOL(__max_logical_packages);
@@ -1548,6 +1555,15 @@ early_param("possible_cpus", _setup_poss
set_cpu_possible(i, true);
 }
 
+/* correctly size the local cpu masks */
+void __init setup_cpu_local_masks(void)
+{
+   alloc_bootmem_cpumask_var(_initialized_mask);
+   alloc_bootmem_cpumask_var(_callin_mask);
+   alloc_bootmem_cpumask_var(_callout_mask);
+   alloc_bootmem_cpumask_var(_sibling_setup_mask);
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 
 /* Recompute SMT state for all CPUs on offline */
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -254,15 +254,12 @@ cpu_initialize_context(unsigned int cpu,
struct desc_struct *gdt;
unsigned long gdt_mfn;
 
-   /* used to tell cpu_init() that it can proceed with initialization */
-   cpumask_set_cpu(cpu, cpu_callout_mask);
if (cpumask_test_and_set_cpu(cpu, xen_cpu_initialized_map))
return 0;
 
ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL);
if (ctxt == NULL) {
cpumask_clear_cpu(cpu, xen_cpu_initialized_map);
-   cpumask_clear_cpu(cpu, cpu_callout_mask);
return -ENOMEM;
}
 




[patch V4 13/37] x86/smpboot: Make TSC synchronization function call based

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Spin-waiting on the control CPU until the AP reaches the TSC
synchronization is just a waste, especially in the case that no
synchronization is required.

As the synchronization has to run with interrupts disabled, the control CPU
part can just be done from an SMP function call. The upcoming AP issues that
call asynchronously, and only in the case that synchronization is required.
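
A hedged sketch of the AP side described above. The choice of the first
online CPU as the control CPU and the tsc_clocksource_reliable shortcut are
assumptions for illustration; the real call site is check_tsc_sync_target():

/*
 * Sketch: the freshly booted AP kicks the control CPU into the
 * measurement with a non-waiting SMP function call, and only when a
 * measurement is required at all.
 */
static void kick_tsc_sync_source(unsigned int this_cpu)
{
	if (unsynchronized_tsc() || tsc_clocksource_reliable)
		return;		/* nothing to measure */

	smp_call_function_single(cpumask_first(cpu_online_mask),
				 check_tsc_sync_source,
				 (void *)(unsigned long)this_cpu, 0 /* !wait */);
}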

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/include/asm/tsc.h |2 --
 arch/x86/kernel/smpboot.c  |   20 +++-
 arch/x86/kernel/tsc_sync.c |   36 +++-
 3 files changed, 14 insertions(+), 44 deletions(-)

--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -55,12 +55,10 @@ extern bool tsc_async_resets;
 #ifdef CONFIG_X86_TSC
 extern bool tsc_store_and_check_tsc_adjust(bool bootcpu);
 extern void tsc_verify_tsc_adjust(bool resume);
-extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 #else
 static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return 
false; }
 static inline void tsc_verify_tsc_adjust(bool resume) { }
-static inline void check_tsc_sync_source(int cpu) { }
 static inline void check_tsc_sync_target(void) { }
 #endif
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -275,11 +275,7 @@ static void notrace start_secondary(void
 */
smp_callin();
 
-   /*
-* Check TSC synchronization with the control CPU, which will do
-* its part of this from wait_cpu_online(), making it an implicit
-* synchronization point.
-*/
+   /* Check TSC synchronization with the control CPU. */
check_tsc_sync_target();
 
/*
@@ -1141,21 +1137,11 @@ static void wait_cpu_callin(unsigned int
 }
 
 /*
- * Bringup step four: Synchronize the TSC and wait for the target AP
- * to reach set_cpu_online() in start_secondary().
+ * Bringup step four: Wait for the target AP to reach set_cpu_online() in
+ * start_secondary().
  */
 static void wait_cpu_online(unsigned int cpu)
 {
-   unsigned long flags;
-
-   /*
-* Check TSC synchronization with the AP (keep irqs disabled
-* while doing so):
-*/
-   local_irq_save(flags);
-   check_tsc_sync_source(cpu);
-   local_irq_restore(flags);
-
/*
 * Wait for the AP to mark itself online, so the core caller
 * can drop sparse_irq_lock.
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -245,7 +245,6 @@ bool tsc_store_and_check_tsc_adjust(bool
  */
 static atomic_t start_count;
 static atomic_t stop_count;
-static atomic_t skip_test;
 static atomic_t test_runs;
 
 /*
@@ -344,21 +343,14 @@ static inline unsigned int loop_timeout(
 }
 
 /*
- * Source CPU calls into this - it waits for the freshly booted
- * target CPU to arrive and then starts the measurement:
+ * The freshly booted CPU initiates this via an async SMP function call.
  */
-void check_tsc_sync_source(int cpu)
+static void check_tsc_sync_source(void *__cpu)
 {
+   unsigned int cpu = (unsigned long)__cpu;
int cpus = 2;
 
/*
-* No need to check if we already know that the TSC is not
-* synchronized or if we have no TSC.
-*/
-   if (unsynchronized_tsc())
-   return;
-
-   /*
 * Set the maximum number of test runs to
 *  1 if the CPU does not provide the TSC_ADJUST MSR
 *  3 if the MSR is available, so the target can try to adjust
@@ -368,16 +360,9 @@ void check_tsc_sync_source(int cpu)
else
atomic_set(_runs, 3);
 retry:
-   /*
-* Wait for the target to start or to skip the test:
-*/
-   while (atomic_read(_count) != cpus - 1) {
-   if (atomic_read(_test) > 0) {
-   atomic_set(_test, 0);
-   return;
-   }
+   /* Wait for the target to start. */
+   while (atomic_read(_count) != cpus - 1)
cpu_relax();
-   }
 
/*
 * Trigger the target to continue into the measurement too:
@@ -397,14 +382,14 @@ void check_tsc_sync_source(int cpu)
if (!nr_warps) {
atomic_set(_runs, 0);
 
-   pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
+   pr_debug("TSC synchronization [CPU#%d -> CPU#%u]: passed\n",
smp_processor_id(), cpu);
 
} else if (atomic_dec_and_test(_runs) || random_warps) {
/* Force it to 0 if random warps brought us here */
atomic_set(_runs, 0);
 
-   pr_warn("TSC synchronization [CPU#%d -> CPU#%d]:\n",
+   pr_warn("TSC synchronization [CPU#%d -> CPU#%u]:\n",
smp_processor_id(), cpu);
pr_warn("Measured %Ld cycles TSC warp between CPUs, "
"turning off TSC clock.\n", max_warp);
@@ -457,11 +442,12 

[patch V4 10/37] x86/smpboot: Get rid of cpu_init_secondary()

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

The synchronization of the AP with the control CPU is an SMP boot problem
and has nothing to do with cpu_init().

Open code cpu_init_secondary() in start_secondary() and move
wait_for_master_cpu() into the SMP boot code.

No functional change.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/include/asm/processor.h |1 -
 arch/x86/kernel/cpu/common.c |   27 ---
 arch/x86/kernel/smpboot.c|   24 +++-
 3 files changed, 19 insertions(+), 33 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -551,7 +551,6 @@ extern void switch_gdt_and_percpu_base(i
 extern void load_direct_gdt(int);
 extern void load_fixmap_gdt(int);
 extern void cpu_init(void);
-extern void cpu_init_secondary(void);
 extern void cpu_init_exception_handling(void);
 extern void cr4_init(void);
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2123,19 +2123,6 @@ static void dbg_restore_debug_regs(void)
 #define dbg_restore_debug_regs()
 #endif /* ! CONFIG_KGDB */
 
-static void wait_for_master_cpu(int cpu)
-{
-#ifdef CONFIG_SMP
-   /*
-* wait for ACK from master CPU before continuing
-* with AP initialization
-*/
-   WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
-   while (!cpumask_test_cpu(cpu, cpu_callout_mask))
-   cpu_relax();
-#endif
-}
-
 static inline void setup_getcpu(int cpu)
 {
unsigned long cpudata = vdso_encode_cpunode(cpu, 
early_cpu_to_node(cpu));
@@ -2239,8 +2226,6 @@ void cpu_init(void)
struct task_struct *cur = current;
int cpu = raw_smp_processor_id();
 
-   wait_for_master_cpu(cpu);
-
ucode_cpu_init(cpu);
 
 #ifdef CONFIG_NUMA
@@ -2293,18 +2278,6 @@ void cpu_init(void)
load_fixmap_gdt(cpu);
 }
 
-#ifdef CONFIG_SMP
-void cpu_init_secondary(void)
-{
-   /*
-* Relies on the BP having set-up the IDT tables, which are loaded
-* on this CPU in cpu_init_exception_handling().
-*/
-   cpu_init_exception_handling();
-   cpu_init();
-}
-#endif
-
 #ifdef CONFIG_MICROCODE_LATE_LOADING
 /**
  * store_cpu_caps() - Store a snapshot of CPU capabilities
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -220,6 +220,17 @@ static void ap_calibrate_delay(void)
cpu_data(smp_processor_id()).loops_per_jiffy = loops_per_jiffy;
 }
 
+static void wait_for_master_cpu(int cpu)
+{
+   /*
+* Wait for release by control CPU before continuing with AP
+* initialization.
+*/
+   WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
+   while (!cpumask_test_cpu(cpu, cpu_callout_mask))
+   cpu_relax();
+}
+
 /*
  * Activate a secondary processor.
  */
@@ -237,13 +248,16 @@ static void notrace start_secondary(void
load_cr3(swapper_pg_dir);
__flush_tlb_all();
 #endif
+   cpu_init_exception_handling();
+
/*
-* Sync point with wait_cpu_initialized(). Before proceeding through
-* cpu_init(), the AP will call wait_for_master_cpu() which sets its
-* own bit in cpu_initialized_mask and then waits for the BSP to set
-* its bit in cpu_callout_mask to release it.
+* Sync point with wait_cpu_initialized(). Sets AP in
+* cpu_initialized_mask and then waits for the control CPU
+* to release it.
 */
-   cpu_init_secondary();
+   wait_for_master_cpu(raw_smp_processor_id());
+
+   cpu_init();
rcu_cpu_starting(raw_smp_processor_id());
x86_cpuinit.early_percpu_clock_init();
 






[patch V4 07/37] x86/smpboot: Restrict soft_restart_cpu() to SEV

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Now that the CPU0 hotplug cruft is gone, the only user is AMD SEV.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/kernel/callthunks.c |2 +-
 arch/x86/kernel/head_32.S|   14 --
 arch/x86/kernel/head_64.S|2 +-
 3 files changed, 2 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -133,7 +133,7 @@ static bool skip_addr(void *dest)
/* Accounts directly */
if (dest == ret_from_fork)
return true;
-#ifdef CONFIG_HOTPLUG_CPU
+#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_AMD_MEM_ENCRYPT)
if (dest == soft_restart_cpu)
return true;
 #endif
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -138,20 +138,6 @@ SYM_CODE_START(startup_32)
jmp .Ldefault_entry
 SYM_CODE_END(startup_32)
 
-#ifdef CONFIG_HOTPLUG_CPU
-/*
- * Entry point for soft restart of a CPU. Invoked from xxx_play_dead() for
- * restarting the boot CPU or for restarting SEV guest CPUs after CPU hot
- * unplug. Everything is set up already except the stack.
- */
-SYM_FUNC_START(soft_restart_cpu)
-   movl initial_stack, %ecx
-   movl %ecx, %esp
-   call *(initial_code)
-1: jmp 1b
-SYM_FUNC_END(soft_restart_cpu)
-#endif
-
 /*
  * Non-boot CPU entry point; entered from trampoline.S
  * We can't lgdt here, because lgdt itself uses a data segment, but
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -375,7 +375,7 @@ SYM_CODE_END(secondary_startup_64)
 #include "verify_cpu.S"
 #include "sev_verify_cbit.S"
 
-#ifdef CONFIG_HOTPLUG_CPU
+#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_AMD_MEM_ENCRYPT)
 /*
  * Entry point for soft restart of a CPU. Invoked from xxx_play_dead() for
  * restarting the boot CPU or for restarting SEV guest CPUs after CPU hot






[patch V4 08/37] x86/smpboot: Remove unnecessary barrier()

2023-05-12 Thread Thomas Gleixner
Peter stumbled over the barrier() after the invocation of smp_callin() in
start_secondary():

  "...this barrier() and it's comment seem weird vs smp_callin(). That
   function ends with an atomic bitop (it has to, at the very least it must
   not be weaker than store-release) but also has an explicit wmb() to order
   setup vs CPU_STARTING.

   There is no way the smp_processor_id() referred to in this comment can land
   before cpu_init() even without the barrier()."

The barrier() along with the comment was added in 2003 with commit
d8f19f2cac70 ("[PATCH] x86-64 merge") in the history tree. One of those
well documented combo patches of that time which changed the world and some
more. The context back then was:

/*
 * Dont put anything before smp_callin(), SMP
 * booting is too fragile that we want to limit the
 * things done here to the most necessary things.
 */
cpu_init();
smp_callin();

+   /* otherwise gcc will move up smp_processor_id before the cpu_init */
+   barrier();

Dprintk("cpu %d: waiting for commence\n", smp_processor_id()); 

Even back in 2003 the compiler was not allowed to reorder that
smp_processor_id() invocation before the cpu_init() function call.
Especially not as smp_processor_id() resolved to:

  asm volatile("movl %%gs:%c1,%0":"=r" (ret__):"i"(pda_offset(field)):"memory");

There is no trace of this change in any mailing list archive, including the
then-official x86_64 list disc...@x86-64.org, which might have explained the
problem this change solved.

The debug prints are gone by now and the only smp_processor_id()
invocation today is farther down in start_secondary() after locking
vector_lock which itself prevents reordering.

Even if the compiler were allowed to reorder this, the code would still
be correct as GSBASE is set up early in the assembly code and is valid when
the CPU reaches start_secondary(), while the code at the time when this
barrier was added did the GSBASE setup in cpu_init().

As the barrier has zero value, remove it.
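
A minimal user-space sketch of the point above (illustrative only, not
kernel code; assumes GNU C inline asm): an asm volatile statement with a
"memory" clobber is itself a full compiler barrier, so an explicit
barrier() next to it cannot add any ordering.

  #include <stdio.h>

  static int setup_done;

  /* Stand-in for the old smp_processor_id(): a read through inline asm
   * with a "memory" clobber, like the %gs based access quoted above. */
  static inline int fake_processor_id(void)
  {
          int ret = 0;

          /* The "memory" clobber forbids the compiler from moving any
           * memory access (such as the store in fake_cpu_init()) across
           * this statement in either direction. */
          asm volatile("" : "+r" (ret) : : "memory");
          return ret;
  }

  static void fake_cpu_init(void)
  {
          setup_done = 1;
  }

  int main(void)
  {
          int cpu;

          fake_cpu_init();
          /* No barrier() needed between the two calls: the clobber in
           * fake_processor_id() already keeps it below fake_cpu_init(). */
          cpu = fake_processor_id();
          printf("cpu %d: setup_done=%d\n", cpu, setup_done);
          return 0;
  }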

Reported-by: Peter Zijlstra 
Signed-off-by: Thomas Gleixner 
Link: 
https://lore.kernel.org/r/20230509100421.gu83...@hirez.programming.kicks-ass.net
---
V4: New patch
---
 arch/x86/kernel/smpboot.c |2 --
 1 file changed, 2 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -238,8 +238,6 @@ static void notrace start_secondary(void
x86_cpuinit.early_percpu_clock_init();
smp_callin();
 
-   /* otherwise gcc will move up smp_processor_id before the cpu_init */
-   barrier();
/* Check TSC synchronization with the control CPU: */
check_tsc_sync_target();
 




[patch V4 09/37] x86/smpboot: Split up native_cpu_up() into separate phases and document them

2023-05-12 Thread Thomas Gleixner
From: David Woodhouse 

There are four logical parts to what native_cpu_up() does on the BSP (or
on the controlling CPU for a later hotplug):

 1) Wake the AP by sending the INIT/SIPI/SIPI sequence.

 2) Wait for the AP to make it as far as wait_for_master_cpu() which
sets that CPU's bit in cpu_initialized_mask, then sets the bit in
cpu_callout_mask to let the AP proceed through cpu_init().

 3) Wait for the AP to finish cpu_init() and get as far as the
smp_callin() call, which sets that CPU's bit in cpu_callin_mask.

 4) Perform the TSC synchronization and wait for the AP to actually
mark itself online in cpu_online_mask.

In preparation to allow these phases to operate in parallel on multiple
APs, split them out into separate functions and document the interactions
a little more clearly in both the BP and AP code paths.

No functional change intended.
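
As a stand-alone illustration of the four phases above (POSIX threads and
C11 atomics standing in for the real cpumasks and the INIT/SIPI wakeup;
a toy model only, not the kernel code):

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>

  static atomic_int cpu_initialized, cpu_callout, cpu_callin, cpu_online;

  static void *ap_start_secondary(void *arg)
  {
          (void)arg;

          /* Phase 2, AP side: mark ourselves initialized, wait for release. */
          atomic_store(&cpu_initialized, 1);
          while (!atomic_load(&cpu_callout))
                  ;
          /* cpu_init() would run here. */

          /* Phase 3, AP side: report that smp_callin() has been reached. */
          atomic_store(&cpu_callin, 1);

          /* Phase 4, AP side: TSC sync would run here, then go online. */
          atomic_store(&cpu_online, 1);
          return NULL;
  }

  static void wait_cpu_initialized(void)
  {
          while (!atomic_load(&cpu_initialized))
                  ;
          atomic_store(&cpu_callout, 1);  /* release the AP into cpu_init() */
  }

  static void wait_cpu_callin(void)
  {
          while (!atomic_load(&cpu_callin))
                  ;
  }

  static void wait_cpu_online(void)
  {
          while (!atomic_load(&cpu_online))
                  ;
  }

  int main(void)
  {
          pthread_t ap;

          pthread_create(&ap, NULL, ap_start_secondary, NULL); /* 1) wakeup */
          wait_cpu_initialized();                               /* 2)        */
          wait_cpu_callin();                                    /* 3)        */
          wait_cpu_online();                                    /* 4)        */
          printf("AP brought through all four phases\n");
          pthread_join(ap, NULL);
          return 0;
  }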

Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/kernel/smpboot.c |  184 +-
 1 file changed, 119 insertions(+), 65 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -193,6 +193,10 @@ static void smp_callin(void)
 
wmb();
 
+   /*
+* This runs the AP through all the cpuhp states to its target
+* state CPUHP_ONLINE.
+*/
notify_cpu_starting(cpuid);
 
/*
@@ -233,12 +237,28 @@ static void notrace start_secondary(void
load_cr3(swapper_pg_dir);
__flush_tlb_all();
 #endif
+   /*
+* Sync point with wait_cpu_initialized(). Before proceeding through
+* cpu_init(), the AP will call wait_for_master_cpu() which sets its
+* own bit in cpu_initialized_mask and then waits for the BSP to set
+* its bit in cpu_callout_mask to release it.
+*/
cpu_init_secondary();
rcu_cpu_starting(raw_smp_processor_id());
x86_cpuinit.early_percpu_clock_init();
+
+   /*
+* Sync point with wait_cpu_callin(). The AP doesn't wait here
+* but just sets the bit to let the controlling CPU (BSP) know that
+* it's got this far.
+*/
smp_callin();
 
-   /* Check TSC synchronization with the control CPU: */
+   /*
+* Check TSC synchronization with the control CPU, which will do
+* its part of this from wait_cpu_online(), making it an implicit
+* synchronization point.
+*/
check_tsc_sync_target();
 
/*
@@ -257,6 +277,7 @@ static void notrace start_secondary(void
 * half valid vector space.
 */
lock_vector_lock();
+   /* Sync point with do_wait_cpu_online() */
set_cpu_online(smp_processor_id(), true);
lapic_online();
unlock_vector_lock();
@@ -979,17 +1000,13 @@ int common_cpu_up(unsigned int cpu, stru
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
- * Returns zero if CPU booted OK, else error code from
+ * Returns zero if startup was successfully sent, else error code from
  * ->wakeup_secondary_cpu.
  */
 static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 {
-   /* start_ip had better be page-aligned! */
unsigned long start_ip = real_mode_header->trampoline_start;
 
-   unsigned long boot_error = 0;
-   unsigned long timeout;
-
 #ifdef CONFIG_X86_64
/* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */
if (apic->wakeup_secondary_cpu_64)
@@ -1046,60 +1063,89 @@ static int do_boot_cpu(int apicid, int c
 * - Use an INIT boot APIC message
 */
if (apic->wakeup_secondary_cpu_64)
-   boot_error = apic->wakeup_secondary_cpu_64(apicid, start_ip);
+   return apic->wakeup_secondary_cpu_64(apicid, start_ip);
else if (apic->wakeup_secondary_cpu)
-   boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
-   else
-   boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
+   return apic->wakeup_secondary_cpu(apicid, start_ip);
 
-   if (!boot_error) {
-   /*
-* Wait 10s total for first sign of life from AP
-*/
-   boot_error = -1;
-   timeout = jiffies + 10*HZ;
-   while (time_before(jiffies, timeout)) {
-   if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
-   /*
-* Tell AP to proceed with initialization
-*/
-   cpumask_set_cpu(cpu, cpu_callout_mask);
-   boot_error = 0;
-   break;
-   }
-   schedule();
-   }
-   }
+   return wakeup_secondary_cpu_via_init(apicid, start_ip);
+}
 
- 

[patch V4 05/37] x86/topology: Remove CPU0 hotplug option

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

This was introduced together with commit e1c467e69040 ("x86, hotplug: Wake
up CPU0 via NMI instead of INIT, SIPI, SIPI") to eventually support
physical hotplug of CPU0:

 "We'll change this code in the future to wake up hard offlined CPU0 if
  real platform and request are available."

11 years later this has not happened and physical hotplug is not officially
supported. Remove the cruft.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 Documentation/admin-guide/kernel-parameters.txt |   14 ---
 Documentation/core-api/cpu_hotplug.rst  |   13 ---
 arch/x86/Kconfig|   43 --
 arch/x86/include/asm/cpu.h  |3 
 arch/x86/kernel/topology.c  |   98 
 arch/x86/power/cpu.c|   37 -
 6 files changed, 6 insertions(+), 202 deletions(-)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -818,20 +818,6 @@
Format:
,,,[,]
 
-   cpu0_hotplug[X86] Turn on CPU0 hotplug feature when
-   CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
-   Some features depend on CPU0. Known dependencies are:
-   1. Resume from suspend/hibernate depends on CPU0.
-   Suspend/hibernate will fail if CPU0 is offline and you
-   need to online CPU0 before suspend/hibernate.
-   2. PIC interrupts also depend on CPU0. CPU0 can't be
-   removed if a PIC interrupt is detected.
-   It's said poweroff/reboot may depend on CPU0 on some
-   machines although I haven't seen such issues so far
-   after CPU0 is offline on a few tested machines.
-   If the dependencies are under your control, you can
-   turn on cpu0_hotplug.
-
cpuidle.off=1   [CPU_IDLE]
disable the cpuidle sub-system
 
--- a/Documentation/core-api/cpu_hotplug.rst
+++ b/Documentation/core-api/cpu_hotplug.rst
@@ -127,17 +127,8 @@ Once the CPU is shutdown, it will be rem
  $ echo 1 > /sys/devices/system/cpu/cpu4/online
  smpboot: Booting Node 0 Processor 4 APIC 0x1
 
-The CPU is usable again. This should work on all CPUs. CPU0 is often special
-and excluded from CPU hotplug. On X86 the kernel option
-*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
-shutdown CPU0. Alternatively the kernel command option *cpu0_hotplug* can be
-used. Some known dependencies of CPU0:
-
-* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is 
offline.
-* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.
-
-Please let Fenghua Yu  know if you find any dependencies
-on CPU0.
+The CPU is usable again. This should work on all CPUs, but CPU0 is often 
special
+and excluded from CPU hotplug.
 
 The CPU hotplug coordination
 
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2305,49 +2305,6 @@ config HOTPLUG_CPU
def_bool y
depends on SMP
 
-config BOOTPARAM_HOTPLUG_CPU0
-   bool "Set default setting of cpu0_hotpluggable"
-   depends on HOTPLUG_CPU
-   help
- Set whether default state of cpu0_hotpluggable is on or off.
-
- Say Y here to enable CPU0 hotplug by default. If this switch
- is turned on, there is no need to give cpu0_hotplug kernel
- parameter and the CPU0 hotplug feature is enabled by default.
-
- Please note: there are two known CPU0 dependencies if you want
- to enable the CPU0 hotplug feature either by this switch or by
- cpu0_hotplug kernel parameter.
-
- First, resume from hibernate or suspend always starts from CPU0.
- So hibernate and suspend are prevented if CPU0 is offline.
-
- Second dependency is PIC interrupts always go to CPU0. CPU0 can not
- offline if any interrupt can not migrate out of CPU0. There may
- be other CPU0 dependencies.
-
- Please make sure the dependencies are under your control before
- you enable this feature.
-
- Say N if you don't want to enable CPU0 hotplug feature by default.
- You still can enable the CPU0 hotplug feature at boot by kernel
- parameter cpu0_hotplug.
-
-config DEBUG_HOTPLUG_CPU0
-   def_bool n
-   prompt "Debug CPU0 hotplug"
-   depends on HOTPLUG_CPU
-   help
- Enabling this option offlines CPU0 (if CPU0 can be offlined) as
- soon as possible and boots up userspace with CPU0 offlined. User
- can online CPU0 back after boot time.
-
- To debug CPU0 hotplug, you need to enable CPU0 offline/online
- feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during
- compilation or giving 

[patch V4 04/37] x86/smpboot: Rename start_cpu0() to soft_restart_cpu()

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

This is used in the SEV play_dead() implementation to re-online CPUs. But
that has nothing to do with CPU0.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/include/asm/cpu.h   |2 +-
 arch/x86/kernel/callthunks.c |2 +-
 arch/x86/kernel/head_32.S|   10 +-
 arch/x86/kernel/head_64.S|   10 +-
 arch/x86/kernel/sev.c|2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -30,7 +30,7 @@ struct x86_cpu {
 #ifdef CONFIG_HOTPLUG_CPU
 extern int arch_register_cpu(int num);
 extern void arch_unregister_cpu(int);
-extern void start_cpu0(void);
+extern void soft_restart_cpu(void);
 #ifdef CONFIG_DEBUG_HOTPLUG_CPU0
 extern int _debug_hotplug_cpu(int cpu, int action);
 #endif
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -134,7 +134,7 @@ static bool skip_addr(void *dest)
if (dest == ret_from_fork)
return true;
 #ifdef CONFIG_HOTPLUG_CPU
-   if (dest == start_cpu0)
+   if (dest == soft_restart_cpu)
return true;
 #endif
 #ifdef CONFIG_FUNCTION_TRACER
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -140,16 +140,16 @@ SYM_CODE_END(startup_32)
 
 #ifdef CONFIG_HOTPLUG_CPU
 /*
- * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
- * up already except stack. We just set up stack here. Then call
- * start_secondary().
+ * Entry point for soft restart of a CPU. Invoked from xxx_play_dead() for
+ * restarting the boot CPU or for restarting SEV guest CPUs after CPU hot
+ * unplug. Everything is set up already except the stack.
  */
-SYM_FUNC_START(start_cpu0)
+SYM_FUNC_START(soft_restart_cpu)
movl initial_stack, %ecx
movl %ecx, %esp
call *(initial_code)
 1: jmp 1b
-SYM_FUNC_END(start_cpu0)
+SYM_FUNC_END(soft_restart_cpu)
 #endif
 
 /*
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -377,11 +377,11 @@ SYM_CODE_END(secondary_startup_64)
 
 #ifdef CONFIG_HOTPLUG_CPU
 /*
- * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
- * up already except stack. We just set up stack here. Then call
- * start_secondary() via .Ljump_to_C_code.
+ * Entry point for soft restart of a CPU. Invoked from xxx_play_dead() for
+ * restarting the boot CPU or for restarting SEV guest CPUs after CPU hot
+ * unplug. Everything is set up already except the stack.
  */
-SYM_CODE_START(start_cpu0)
+SYM_CODE_START(soft_restart_cpu)
ANNOTATE_NOENDBR
UNWIND_HINT_END_OF_STACK
 
@@ -390,7 +390,7 @@ SYM_CODE_START(start_cpu0)
movqTASK_threadsp(%rcx), %rsp
 
jmp .Ljump_to_C_code
-SYM_CODE_END(start_cpu0)
+SYM_CODE_END(soft_restart_cpu)
 #endif
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1328,7 +1328,7 @@ static void sev_es_play_dead(void)
 * If we get here, the VCPU was woken up again. Jump to CPU
 * startup code to get it back online.
 */
-   start_cpu0();
+   soft_restart_cpu();
 }
 #else  /* CONFIG_HOTPLUG_CPU */
 #define sev_es_play_dead   native_play_dead






[patch V4 02/37] cpu/hotplug: Mark arch_disable_smp_support() and bringup_nonboot_cpus() __init

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

No point in keeping them around.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/kernel/smpboot.c |4 ++--
 kernel/cpu.c  |2 +-
 kernel/smp.c  |2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)


--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1269,9 +1269,9 @@ int native_cpu_up(unsigned int cpu, stru
 }
 
 /**
- * arch_disable_smp_support() - disables SMP support for x86 at runtime
+ * arch_disable_smp_support() - Disables SMP support for x86 at boottime
  */
-void arch_disable_smp_support(void)
+void __init arch_disable_smp_support(void)
 {
disable_ioapic_support();
 }
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1502,7 +1502,7 @@ int bringup_hibernate_cpu(unsigned int s
return 0;
 }
 
-void bringup_nonboot_cpus(unsigned int setup_max_cpus)
+void __init bringup_nonboot_cpus(unsigned int setup_max_cpus)
 {
unsigned int cpu;
 
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -892,7 +892,7 @@ EXPORT_SYMBOL(setup_max_cpus);
  * SMP mode to <NUM>.
  */
 
-void __weak arch_disable_smp_support(void) { }
+void __weak __init arch_disable_smp_support(void) { }
 
 static int __init nosmp(char *str)
 {






[patch V4 06/37] x86/smpboot: Remove the CPU0 hotplug kludge

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

This was introduced with commit e1c467e69040 ("x86, hotplug: Wake up CPU0
via NMI instead of INIT, SIPI, SIPI") to eventually support physical
hotplug of CPU0:

 "We'll change this code in the future to wake up hard offlined CPU0 if
  real platform and request are available."

11 years later this has not happened and physical hotplug is not officially
supported. Remove the cruft.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/include/asm/apic.h   |1 
 arch/x86/include/asm/smp.h|1 
 arch/x86/kernel/smpboot.c |  170 +++---
 drivers/acpi/processor_idle.c |4 
 4 files changed, 14 insertions(+), 162 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -377,7 +377,6 @@ extern struct apic *__apicdrivers[], *__
  * APIC functionality to boot other CPUs - only used on SMP:
  */
 #ifdef CONFIG_SMP
-extern int wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip);
 extern int lapic_can_unplug_cpu(void);
 #endif
 
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -130,7 +130,6 @@ void native_play_dead(void);
 void play_dead_common(void);
 void wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
-void cond_wakeup_cpu0(void);
 
 void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -216,9 +216,6 @@ static void ap_calibrate_delay(void)
cpu_data(smp_processor_id()).loops_per_jiffy = loops_per_jiffy;
 }
 
-static int cpu0_logical_apicid;
-static int enable_start_cpu0;
-
 /*
  * Activate a secondary processor.
  */
@@ -241,8 +238,6 @@ static void notrace start_secondary(void
x86_cpuinit.early_percpu_clock_init();
smp_callin();
 
-   enable_start_cpu0 = 0;
-
/* otherwise gcc will move up smp_processor_id before the cpu_init */
barrier();
/* Check TSC synchronization with the control CPU: */
@@ -410,7 +405,7 @@ void smp_store_cpu_info(int id)
c->cpu_index = id;
/*
 * During boot time, CPU0 has this setup already. Save the info when
-* bringing up AP or offlined CPU0.
+* bringing up an AP.
 */
identify_secondary_cpu(c);
c->initialized = true;
@@ -807,51 +802,14 @@ static void __init smp_quirk_init_udelay
 }
 
 /*
- * Poke the other CPU in the eye via NMI to wake it up. Remember that the 
normal
- * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
- * won't ... remember to clear down the APIC, etc later.
- */
-int
-wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
-{
-   u32 dm = apic->dest_mode_logical ? APIC_DEST_LOGICAL : 
APIC_DEST_PHYSICAL;
-   unsigned long send_status, accept_status = 0;
-   int maxlvt;
-
-   /* Target chip */
-   /* Boot on the stack */
-   /* Kick the second */
-   apic_icr_write(APIC_DM_NMI | dm, apicid);
-
-   pr_debug("Waiting for send to finish...\n");
-   send_status = safe_apic_wait_icr_idle();
-
-   /*
-* Give the other CPU some time to accept the IPI.
-*/
-   udelay(200);
-   if (APIC_INTEGRATED(boot_cpu_apic_version)) {
-   maxlvt = lapic_get_maxlvt();
-   if (maxlvt > 3) /* Due to the Pentium erratum 
3AP.  */
-   apic_write(APIC_ESR, 0);
-   accept_status = (apic_read(APIC_ESR) & 0xEF);
-   }
-   pr_debug("NMI sent\n");
-
-   if (send_status)
-   pr_err("APIC never delivered???\n");
-   if (accept_status)
-   pr_err("APIC delivery error (%lx)\n", accept_status);
-
-   return (send_status | accept_status);
-}
-
-static int
-wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ */
+static int wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long 
start_eip)
 {
unsigned long send_status = 0, accept_status = 0;
int maxlvt, num_starts, j;
 
+   preempt_disable();
maxlvt = lapic_get_maxlvt();
 
/*
@@ -957,6 +915,7 @@ wakeup_secondary_cpu_via_init(int phys_a
if (accept_status)
pr_err("APIC delivery error (%lx)\n", accept_status);
 
+   preempt_enable();
return (send_status | accept_status);
 }
 
@@ -997,67 +956,6 @@ static void announce_cpu(int cpu, int ap
node, cpu, apicid);
 }
 
-static int wakeup_cpu0_nmi(unsigned int cmd, struct pt_regs *regs)
-{
-   int cpu;
-
-   cpu = smp_processor_id();
-   if (cpu == 0 && !cpu_online(cpu) && enable_start_cpu0)
-   return NMI_HANDLED;
-
-   return NMI_DONE;
-}
-
-/*
- * Wake up AP by INIT, INIT, STARTUP sequence.
- *
- * Instead of waiting for STARTUP after INITs, BSP will execute the BIOS
- * boot-strap code which is not a desired behavior for 

[patch V4 03/37] x86/smpboot: Avoid pointless delay calibration if TSC is synchronized

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

When TSC is synchronized across sockets then there is no reason to
calibrate the delay for the first CPU which comes up on a socket.

Just reuse the existing calibration value.

This removes 100ms pointlessly wasted time from CPU hotplug per socket.
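
A stand-alone toy of the decision order described above (made-up values;
the real logic is calibrate_delay_is_known() in the diff below):

  #include <stdio.h>
  #include <stdbool.h>

  static unsigned long lpj[4];                /* loops_per_jiffy per CPU */
  static bool constant_tsc = true;
  static bool tsc_unstable;                   /* synchronized if false   */

  static unsigned long calibrate_delay_is_known_toy(int socket_buddy)
  {
          /* Constant and synchronized TSC: reuse the boot CPU's value. */
          if (constant_tsc && !tsc_unstable)
                  return lpj[0];

          /* Constant TSC only: reuse an already calibrated CPU on the
           * same socket, if there is one. */
          if (constant_tsc && socket_buddy >= 0)
                  return lpj[socket_buddy];

          return 0;                           /* 0 means: calibrate for real */
  }

  int main(void)
  {
          lpj[0] = 4000000;                   /* boot CPU already calibrated */
          printf("CPU1 reuses lpj=%lu\n", calibrate_delay_is_known_toy(-1));
          return 0;
  }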

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/kernel/smpboot.c |   40 +---
 arch/x86/kernel/tsc.c |   20 
 2 files changed, 41 insertions(+), 19 deletions(-)


--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -178,28 +178,17 @@ static void smp_callin(void)
 */
apic_ap_setup();
 
-   /*
-* Save our processor parameters. Note: this information
-* is needed for clock calibration.
-*/
+   /* Save our processor parameters. */
smp_store_cpu_info(cpuid);
 
/*
 * The topology information must be up to date before
-* calibrate_delay() and notify_cpu_starting().
+* notify_cpu_starting().
 */
set_cpu_sibling_map(raw_smp_processor_id());
 
ap_init_aperfmperf();
 
-   /*
-* Get our bogomips.
-* Update loops_per_jiffy in cpu_data. Previous call to
-* smp_store_cpu_info() stored a value that is close but not as
-* accurate as the value just calculated.
-*/
-   calibrate_delay();
-   cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy;
pr_debug("Stack at about %p\n", );
 
wmb();
@@ -212,8 +201,24 @@ static void smp_callin(void)
cpumask_set_cpu(cpuid, cpu_callin_mask);
 }
 
+static void ap_calibrate_delay(void)
+{
+   /*
+* Calibrate the delay loop and update loops_per_jiffy in cpu_data.
+* smp_store_cpu_info() stored a value that is close but not as
+* accurate as the value just calculated.
+*
+* As this is invoked after the TSC synchronization check,
+* calibrate_delay_is_known() will skip the calibration routine
+* when TSC is synchronized across sockets.
+*/
+   calibrate_delay();
+   cpu_data(smp_processor_id()).loops_per_jiffy = loops_per_jiffy;
+}
+
 static int cpu0_logical_apicid;
 static int enable_start_cpu0;
+
 /*
  * Activate a secondary processor.
  */
@@ -240,10 +245,15 @@ static void notrace start_secondary(void
 
/* otherwise gcc will move up smp_processor_id before the cpu_init */
barrier();
+   /* Check TSC synchronization with the control CPU: */
+   check_tsc_sync_target();
+
/*
-* Check TSC synchronization with the boot CPU:
+* Calibrate the delay loop after the TSC synchronization check.
+* This allows to skip the calibration when TSC is synchronized
+* across sockets.
 */
-   check_tsc_sync_target();
+   ap_calibrate_delay();
 
speculative_store_bypass_ht_init();
 
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1598,10 +1598,7 @@ void __init tsc_init(void)
 
 #ifdef CONFIG_SMP
 /*
- * If we have a constant TSC and are using the TSC for the delay loop,
- * we can skip clock calibration if another cpu in the same socket has already
- * been calibrated. This assumes that CONSTANT_TSC applies to all
- * cpus in the socket - this should be a safe assumption.
+ * Check whether existing calibration data can be reused.
  */
 unsigned long calibrate_delay_is_known(void)
 {
@@ -1609,6 +1606,21 @@ unsigned long calibrate_delay_is_known(v
	int constant_tsc = cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC);
const struct cpumask *mask = topology_core_cpumask(cpu);
 
+   /*
+* If TSC has constant frequency and TSC is synchronized across
+* sockets then reuse CPU0 calibration.
+*/
+   if (constant_tsc && !tsc_unstable)
+   return cpu_data(0).loops_per_jiffy;
+
+   /*
+* If TSC has constant frequency and TSC is not synchronized across
+* sockets and this is not the first CPU in the socket, then reuse
+* the calibration value of an already online CPU on that socket.
+*
+* This assumes that CONSTANT_TSC is consistent for all CPUs in a
+* socket.
+*/
if (!constant_tsc || !mask)
return 0;
 




[patch V4 01/37] x86/smpboot: Cleanup topology_phys_to_logical_pkg()/die()

2023-05-12 Thread Thomas Gleixner
From: Thomas Gleixner 

Make topology_phys_to_logical_pkg_die() static as it's only used in
smpboot.c and fixup the kernel-doc warnings for both functions.

Signed-off-by: Thomas Gleixner 
Tested-by: Michael Kelley 
---
 arch/x86/include/asm/topology.h |3 ---
 arch/x86/kernel/smpboot.c   |   10 ++
 2 files changed, 6 insertions(+), 7 deletions(-)
---

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -139,7 +139,6 @@ static inline int topology_max_smt_threa
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
 int topology_update_die_map(unsigned int dieid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
-int topology_phys_to_logical_die(unsigned int die, unsigned int cpu);
 bool topology_is_primary_thread(unsigned int cpu);
 bool topology_smt_supported(void);
 #else
@@ -149,8 +148,6 @@ topology_update_package_map(unsigned int
 static inline int
 topology_update_die_map(unsigned int dieid, unsigned int cpu) { return 0; }
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
-static inline int topology_phys_to_logical_die(unsigned int die,
-   unsigned int cpu) { return 0; }
 static inline int topology_max_die_per_package(void) { return 1; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; 
}
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -288,6 +288,7 @@ bool topology_smt_supported(void)
 
 /**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
+ * @phys_pkg:  The physical package id to map
  *
  * Returns logical package id or -1 if not found
  */
@@ -304,15 +305,17 @@ int topology_phys_to_logical_pkg(unsigne
return -1;
 }
 EXPORT_SYMBOL(topology_phys_to_logical_pkg);
+
 /**
  * topology_phys_to_logical_die - Map a physical die id to logical
+ * @die_id:The physical die id to map
+ * @cur_cpu:   The CPU for which the mapping is done
  *
  * Returns logical die id or -1 if not found
  */
-int topology_phys_to_logical_die(unsigned int die_id, unsigned int cur_cpu)
+static int topology_phys_to_logical_die(unsigned int die_id, unsigned int 
cur_cpu)
 {
-   int cpu;
-   int proc_id = cpu_data(cur_cpu).phys_proc_id;
+   int cpu, proc_id = cpu_data(cur_cpu).phys_proc_id;
 
for_each_possible_cpu(cpu) {
		struct cpuinfo_x86 *c = &cpu_data(cpu);
@@ -323,7 +326,6 @@ int topology_phys_to_logical_die(unsigne
}
return -1;
 }
-EXPORT_SYMBOL(topology_phys_to_logical_die);
 
 /**
  * topology_update_package_map - Update the physical to logical package map






[patch V4 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-05-12 Thread Thomas Gleixner
Hi!

This is version 4 of the reworked parallel bringup series. Version 3 can be
found here:

   https://lore.kernel.org/lkml/20230508181633.089804...@linutronix.de

This is just another iteration to address the following details:

  1) Address review feedback (Peter Zijlstra)

  2) Fix a MIPS related build problem (0day)

Other than that there are no changes and the other details are all the same
as in V3 and V2.

It's also available from git:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hotplug

Diff to V3 below.

Thanks,

tglx
---
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index f5e0f4235746..90c71d800b59 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -690,7 +690,7 @@ void flush_tlb_one(unsigned long vaddr)
 EXPORT_SYMBOL(flush_tlb_page);
 EXPORT_SYMBOL(flush_tlb_one);
 
-#ifdef CONFIG_HOTPLUG_CPU
+#ifdef CONFIG_HOTPLUG_CORE_SYNC_DEAD
 void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
 {
if (mp_ops->cleanup_dead_cpu)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 0438802031c3..9cd77d319555 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -290,8 +290,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, 
SYM_L_GLOBAL)
 
/*  APIC ID not found in the table. Drop the trampoline lock and bail. 
*/
movqtrampoline_lock(%rip), %rax
-   lock
-   btrl$0, (%rax)
+   movl$0, (%rax)
 
 1: cli
hlt
@@ -320,8 +319,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, 
SYM_L_GLOBAL)
movqtrampoline_lock(%rip), %rax
testq   %rax, %rax
jz  .Lsetup_gdt
-   lock
-   btrl$0, (%rax)
+   movl$0, (%rax)
 
 .Lsetup_gdt:
/*
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5caf4897b507..660709e94823 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -161,31 +161,28 @@ static inline void smpboot_restore_warm_reset_vector(void)
 
 }
 
-/*
- * Report back to the Boot Processor during boot time or to the caller 
processor
- * during CPU online.
- */
-static void smp_callin(void)
+/* Run the next set of setup steps for the upcoming CPU */
+static void ap_starting(void)
 {
int cpuid = smp_processor_id();
 
/*
-* If waken up by an INIT in an 82489DX configuration the alive
-* synchronization guarantees we don't get here before an
-* INIT_deassert IPI reaches our local APIC, so it is now safe to
-* touch our local APIC.
+* If woken up by an INIT in an 82489DX configuration the alive
+* synchronization guarantees that the CPU does not reach this
+* point before an INIT_deassert IPI reaches the local APIC, so it
+* is now safe to touch the local APIC.
 *
 * Set up this CPU, first the APIC, which is probably redundant on
 * most boards.
 */
apic_ap_setup();
 
-   /* Save our processor parameters. */
+   /* Save the processor parameters. */
smp_store_cpu_info(cpuid);
 
/*
 * The topology information must be up to date before
-* calibrate_delay() and notify_cpu_starting().
+* notify_cpu_starting().
 */
set_cpu_sibling_map(cpuid);
 
@@ -197,7 +194,7 @@ static void smp_callin(void)
 
/*
 * This runs the AP through all the cpuhp states to its target
-* state (CPUHP_ONLINE in the case of serial bringup).
+* state CPUHP_ONLINE.
 */
notify_cpu_starting(cpuid);
 }
@@ -274,10 +271,7 @@ static void notrace start_secondary(void *unused)
rcu_cpu_starting(raw_smp_processor_id());
x86_cpuinit.early_percpu_clock_init();
 
-   smp_callin();
-
-   /* Otherwise gcc will move up smp_processor_id() before cpu_init() */
-   barrier();
+   ap_starting();
 
/* Check TSC synchronization with the control CPU. */
check_tsc_sync_target();
diff --git a/arch/x86/realmode/rm/trampoline_64.S 
b/arch/x86/realmode/rm/trampoline_64.S
index 2dfb1c400167..c6de4deec746 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -40,17 +40,13 @@
 .macro LOAD_REALMODE_ESP
/*
 * Make sure only one CPU fiddles with the realmode stack
-*/
+   */
 .Llock_rm\@:
-   btl $0, tr_lock
-   jnc 2f
-   pause
-   jmp .Llock_rm\@
+lock btsl   $0, tr_lock
+jnc 2f
+pause
+jmp .Llock_rm\@
 2:
-   lock
-   btsl$0, tr_lock
-   jc  .Llock_rm\@
-
# Setup stack
movl$rm_stack_end, %esp
 .endm
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 60b4093fae9e..005f863a3d2b 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -294,14 +294,14 @@ enum cpuhp_sync_state {
  * cpuhp_ap_update_sync_state - Update synchronization state during 
bringup/teardown
  * @state: The 

Re: [PATCH v2 5/8] pci/arm: Use iommu_add_dt_pci_device()

2023-05-12 Thread Stewart Hildebrand
On 5/12/23 03:25, Jan Beulich wrote:
> On 11.05.2023 21:16, Stewart Hildebrand wrote:
>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>  pdev->domain = NULL;
>>  goto out;
>>  }
>> +#ifdef CONFIG_HAS_DEVICE_TREE
>> +ret = iommu_add_dt_pci_device(pdev);
>> +if ( ret < 0 )
>> +{
>> +printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
>> +goto out;
>> +}
>> +#endif
>>  ret = iommu_add_device(pdev);
> 
> Hmm, am I misremembering that in the earlier patch you had #else to
> invoke the alternative behavior?

You are remembering correctly. v1 had an #else, v2 does not.

> Now you end up calling both functions;
> if that's indeed intended,

Yes, this is intentional.

> this may still want doing differently.
> Looking at the earlier patch introducing the function, I can't infer
> though whether that's intended: iommu_add_dt_pci_device() checks that
> the add_device hook is present, but then I didn't find any use of this
> hook. The revlog there suggests the check might be stale.

Ah, right, the ops->add_device check is stale in the other patch. Good catch, 
I'll remove it there.

> If indeed the function does only preparatory work, I don't see why it
> would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
> then.

The function has now been reduced to reading SMMU configuration data from DT 
and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it 
is still invoking another iommu_ops hook function, dt_xlate (which is yet 
another AXI stream ID translation, separate from what is being discussed here). 
Does this justify keeping "iommu_..." in the name? I'm not convinced 
pci_add_dt_device() is a good name for it either (more on this below).

> Plus in such a case #ifdef-ary here probably wants avoiding by
> introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then
> ...
> 
>>  if ( ret )
>>  {
>> +#ifdef CONFIG_HAS_DEVICE_TREE
>> +iommu_fwspec_free(pci_to_dev(pdev));
>> +#endif
> 
> ... this (which I understand is doing the corresponding cleanup) then
> also wants wrapping in a suitably named tiny helper function.

Sure, I'm on board with eliminating/reducing the #ifdef-ary where possible. 
Will do.

> But yet further I'm then no longer convinced this is the right place
> for the addition. pci_add_device() is backing physdev hypercalls. It
> would seem to me that the function may want invoking yet one layer
> further up, or it may even want invoking from a brand new DT-specific
> physdev-op. This would then leave at least the x86-only paths (invoking
> pci_add_device() from outside of pci_physdev_op()) entirely alone.

Let's establish that pci_add_device()/iommu_add_device() are already inherently 
performing tasks related to setting up a PCI device to work with an IOMMU.

The preparatory work in question needs to happen after:

  pci_add_device()
-> alloc_pdev()

since we need to know all the possible RIDs (including those for phantom 
functions), but before the add_device iommu hook:

  pci_add_device()
-> iommu_add_device()
  -> iommu_call(hd->platform_ops, add_device, ...)
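
For illustration only (toy code, not Xen; names and values are made up):
once the phantom stride is known, every RID the device can emit can be
enumerated, and that is the input the sideband-ID mapping needs before
the add_device hook runs.

  #include <stdio.h>
  #include <stdint.h>

  /* Print all RIDs of a device, including its phantom functions. A
   * stride of 0 means the device has no phantom functions. */
  static void enumerate_rids(uint8_t bus, uint8_t slot, unsigned int stride)
  {
          for (unsigned int fn = 0; fn < 8; fn += stride ? stride : 8) {
                  unsigned int rid = (bus << 8) | (slot << 3) | fn;

                  printf("RID %#06x -> derive AXI stream ID here\n", rid);
          }
  }

  int main(void)
  {
          enumerate_rids(0x03, 0x00, 2);  /* bus 3, slot 0, phantom stride 2 */
          return 0;
  }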


The preparatory work (i.e. mapping RID/BDF -> AXI stream ID) is inherently 
associated with setting up a PCI device to work with an ARM SMMU (but not any 
particular variant of the SMMU). The SMMU distinguishes what PCI 
device/function DMA traffic is associated with based on the derived AXI stream 
ID (sideband data), not the RID/BDF directly. See [1].

Moving the preparatory work one layer up would mean duplicating what 
alloc_pdev() is already doing to set up pdev->phantom_stride (which we need to 
figure out all RIDs for that particular device). Moving it down into the 
individual SMMU drivers (smmu_ops/platform_ops) would mean duplicating special 
phantom function handling in each SMMU driver, and further deviating from the 
Linux SMMU driver(s) they are based on.

It still feels to me like pci_add_device() (or iommu_add_device()) is the right 
place to perform the RID/BDF -> AXI stream ID mapping.

Since there's nothing inherently DT specific (or ACPI specific) about deriving 
sideband data from RID/BDF, let me propose a new name for the function (instead 
of iommu_add_dt_pci_device):

  iommu_derive_pci_device_sideband_IDs()


Now, as far as DT and ACPI co-existing goes, I admit I haven't tested with 
CONFIG_ACPI=y yet (there seems to be some issues when both CONFIG_ARM_SMMU_V3=y 
and CONFIG_ACPI=y are enabled, even in staging). But I do recognize that we 
need a way support both CONFIG_HAS_DEVICE_TREE=y and CONFIG_ACPI=y 
simultaneously. Let me think on that for a bit...

[1] 
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt



[xen-unstable test] 180632: tolerable FAIL

2023-05-12 Thread osstest service owner
flight 180632 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180632/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail in 
180624 pass in 180632
 test-amd64-i386-examine-uefi  6 xen-installfail pass in 180624
 test-amd64-i386-pair 10 xen-install/src_host   fail pass in 180624
 test-amd64-i386-freebsd10-i386  7 xen-install  fail pass in 180624

Tests which did not succeed, but are not blocking:
 test-amd64-i386-examine   6 xen-install  fail  like 180624
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 180624
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 180624
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 180624
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 180624
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 180624
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 180624
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 180624
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180624
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 180624
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 180624
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 180624
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 180624
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never 

Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-05-12 Thread Bjorn Helgaas
On Fri, May 12, 2023 at 01:56:29PM +0300, Andy Shevchenko wrote:
> On Tue, May 09, 2023 at 01:21:22PM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > > Provide two new helper macros to iterate over PCI device resources and
> > > > convert users.
> > 
> > > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> > 
> > This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
> > upstream now.
> > 
> > Coverity complains about each use,
> 
> It needs more clarification here. Use of reduced variant of the
> macro or all of them? If the former one, then I can speculate that
> Coverity (famous for false positives) simply doesn't understand `for
> (type var; var ...)` code.

True, Coverity finds false positives.  It flagged every use in
drivers/pci and drivers/pnp.  It didn't mention the arch/alpha, arm,
mips, powerpc, sh, or sparc uses, but I think it just didn't look at
those.

It flagged both:

  pbus_size_io    pci_dev_for_each_resource(dev, r)
  pbus_size_mem   pci_dev_for_each_resource(dev, r, i)

Here's a spreadsheet with a few more details (unfortunately I don't
know how to make it dump the actual line numbers or analysis like I
pasted below, so "pci_dev_for_each_resource" doesn't appear).  These
are mostly in the "Drivers-PCI" component.

https://docs.google.com/spreadsheets/d/1ohOJwxqXXoDUA0gwopgk-z-6ArLvhN7AZn4mIlDkHhQ/edit?usp=sharing

These particular reports are in the "High Impact Outstanding" tab.
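
For reference, a simplified stand-in for the macro (not the exact
upstream definition) reproduces the pattern the analyzer is tracking in
the trace quoted below: the pointer is assigned before the bound check,
so on the terminating evaluation it briefly equals the one-past-the-end
address, but the loop body only runs while the index is still in range.

  #include <stdio.h>

  #define NUM_RESOURCES 17

  struct resource { unsigned long start, end, flags; };
  struct fake_dev { struct resource resource[NUM_RESOURCES]; };

  #define dev_for_each_resource(dev, res)                              \
          for (unsigned int __b = 0;                                   \
               (res = &(dev)->resource[__b]), __b < NUM_RESOURCES;     \
               __b++)

  int main(void)
  {
          static struct fake_dev dev;
          struct resource *r;
          unsigned int visited = 0;

          dev_for_each_resource(&dev, r)
                  visited++;      /* r is in bounds on every executed pass */

          printf("visited %u of %d resources\n", visited, NUM_RESOURCES);
          return 0;
  }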

> > sample below from
> > drivers/pci/vgaarb.c.  I didn't investigate at all, so it might be a
> > false positive; just FYI.
> > 
> >   1. Condition screen_info.capabilities & (2U /* 1 << 1 */), taking 
> > true branch.
> >   556if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
> >   557base |= (u64)screen_info.ext_lfb_base << 32;
> >   558
> >   559limit = base + size;
> >   560
> >   561/* Does firmware framebuffer belong to us? */
> >   2. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> >   3. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), 
> > taking true branch.
> >   6. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> >   7. cond_at_most: Checking __b < PCI_NUM_RESOURCES implies that __b 
> > may be up to 16 on the true branch.
> >   8. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), 
> > taking true branch.
> >   11. incr: Incrementing __b. The value of __b may now be up to 17.
> >   12. alias: Assigning: r = &pdev->resource[__b]. r may now point to as 
> > high as element 17 of pdev->resource (which consists of 17 64-byte 
> > elements).
> >   13. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> >   14. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), 
> > taking true branch.
> >   562pci_dev_for_each_resource(pdev, r) {
> >   4. Condition resource_type(r) != 512, taking true branch.
> >   9. Condition resource_type(r) != 512, taking true branch.
> > 
> >   CID 1529911 (#1 of 1): Out-of-bounds read (OVERRUN)
> >   15. overrun-local: Overrunning array of 1088 bytes at byte offset 1088 by 
> > dereferencing pointer r. [show details]
> >   563if (resource_type(r) != IORESOURCE_MEM)
> >   5. Continuing loop.
> >   10. Continuing loop.
> >   564continue;
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 



Re: [PATCH 2/2] xen/arm: smmuv3: Advertise coherent table walk if supported

2023-05-12 Thread Ayan Kumar Halder

Hi Michal,

On 12/05/2023 15:35, Michal Orzel wrote:

At the moment, even in case of an SMMU being I/O coherent, we clean the
updated PT as a result of not advertising the coherency feature. SMMUv3
coherency feature means that page table walks, accesses to memory
structures and queues are I/O coherent (refer ARM IHI 0070 E.A, 3.15).

Follow the same steps that were done for SMMU v1,v2 driver by the commit:
080dcb781e1bc3bb22f55a9dfdecb830ccbabe88

The same restrictions apply, meaning that in order to advertise coherent
table walk platform feature, all the SMMU devices need to report coherency
feature. This is because the page tables (we are sharing them with CPU)
are populated before any device assignment and in case of a device being
behind non-coherent SMMU, we would have to scan the tables and clean
the cache.

It is to be noted that the SBSA/BSA (refer ARM DEN0094C 1.0C, section D)
requires that all SMMUv3 devices support I/O coherency.

Signed-off-by: Michal Orzel 
---
There are very few platforms out there with SMMUv3 but I have never seen
a SMMUv3 that is not I/O coherent.
---
  xen/drivers/passthrough/arm/smmu-v3.c | 24 +++-
  1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
b/xen/drivers/passthrough/arm/smmu-v3.c
index bf053cdb6d5c..2adaad0fa038 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -2526,6 +2526,15 @@ static const struct dt_device_match arm_smmu_of_match[] 
= {
  };
  
  /* Start of Xen specific code. */

+
+/*
+ * Platform features. It indicates the list of features supported by all
+ * SMMUs. Actually we only care about coherent table walk, which in case of
+ * SMMUv3 is implied by the overall coherency feature (refer ARM IHI 0070 E.A,
+ * section 3.15 and SMMU_IDR0.COHACC bit description).
+ */
+static uint32_t platform_features = ARM_SMMU_FEAT_COHERENCY;
+
  static int __must_check arm_smmu_iotlb_flush_all(struct domain *d)
  {
struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
@@ -2708,8 +2717,12 @@ static int arm_smmu_iommu_xen_domain_init(struct domain 
*d)
	INIT_LIST_HEAD(&xen_domain->contexts);
  
  	dom_iommu(d)->arch.priv = xen_domain;

-   return 0;
  
+	/* Coherent walk can be enabled only when all SMMUs support it. */

+   if (platform_features & ARM_SMMU_FEAT_COHERENCY)
+   iommu_set_feature(d, IOMMU_FEAT_COHERENT_WALK);
+
+   return 0;
  }
  

All good till here.

  static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
@@ -2738,6 +2751,7 @@ static __init int arm_smmu_dt_init(struct dt_device_node 
*dev,
const void *data)
  {
int rc;
+   const struct arm_smmu_device *smmu;
  
  	/*

 * Even if the device can't be initialized, we don't want to
@@ -2751,6 +2765,14 @@ static __init int arm_smmu_dt_init(struct dt_device_node 
*dev,
  
	iommu_set_ops(&arm_smmu_iommu_ops);
  
+	/* Find the just added SMMU and retrieve its features. */

+   smmu = arm_smmu_get_by_dev(dt_to_dev(dev));
+
+   /* It would be a bug not to find the SMMU we just added. */
+   BUG_ON(!smmu);
+
+   platform_features &= smmu->features;
+


Can you explain this change in the commit message?

- Ayan


return 0;
  }
  




Re: [PATCH 1/2] xen/arm: smmuv3: Constify arm_smmu_get_by_dev() parameter

2023-05-12 Thread Ayan Kumar Halder



On 12/05/2023 15:35, Michal Orzel wrote:

This function does not modify its parameter 'dev' and it is not supposed
to do it. Therefore, constify it.

Signed-off-by: Michal Orzel 

Reviewed-by: Ayan Kumar Halder 



[ovmf test] 180635: all pass - PUSHED

2023-05-12 Thread osstest service owner
flight 180635 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180635/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf d3225577123767fd09c91201d27e9c91663ae132
baseline version:
 ovmf 0b37723186ec1525b6caf14b0309fb0ed04084d7

Last test of basis   180629  2023-05-12 06:42:08 Z0 days
Testing same since   180635  2023-05-12 13:40:43 Z0 days1 attempts


People who touched revisions under test:
  Ard Biesheuvel 
  Gerd Hoffmann 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   0b37723186..d322557712  d3225577123767fd09c91201d27e9c91663ae132 -> 
xen-tested-master



[libvirt test] 180628: tolerable FAIL - PUSHED

2023-05-12 Thread osstest service owner
flight 180628 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180628/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 180613
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeatfail  like 180613
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 180613
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 180613
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 libvirt  517d76466b2b798e492cde18b61e4b6db8bd4ce1
baseline version:
 libvirt  db91bf2ba31766809f901e9b2bd02b4a9c917300

Last test of basis   180613  2023-05-11 04:19:02 Z1 days
Testing same since   180628  2023-05-12 04:18:49 Z0 days1 attempts


People who touched revisions under test:
  Andrea Bolognani 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd fail



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at

[PATCH 0/2] xen/arm: smmuv3: Advertise coherent table walk

2023-05-12 Thread Michal Orzel
Based on the work done for SMMU v1,v2 by commit:
080dcb781e1bc3bb22f55a9dfdecb830ccbabe88

Michal Orzel (2):
  xen/arm: smmuv3: Constify arm_smmu_get_by_dev() parameter
  xen/arm: smmuv3: Advertise coherent table walk if supported

 xen/drivers/passthrough/arm/smmu-v3.c | 28 ---
 1 file changed, 25 insertions(+), 3 deletions(-)

-- 
2.25.1




[PATCH 2/2] xen/arm: smmuv3: Advertise coherent table walk if supported

2023-05-12 Thread Michal Orzel
At the moment, even in the case of an I/O coherent SMMU, we clean the
updated PT as a result of not advertising the coherency feature. The SMMUv3
coherency feature means that page table walks, accesses to memory
structures and queues are I/O coherent (refer ARM IHI 0070 E.A, 3.15).

Follow the same steps that were done for SMMU v1,v2 driver by the commit:
080dcb781e1bc3bb22f55a9dfdecb830ccbabe88

The same restrictions apply, meaning that in order to advertise the coherent
table walk platform feature, all the SMMU devices need to report the coherency
feature. This is because the page tables (which we share with the CPU)
are populated before any device assignment, and in the case of a device being
behind a non-coherent SMMU, we would have to scan the tables and clean
the cache.

It is to be noted that the SBSA/BSA (refer ARM DEN0094C 1.0C, section D)
requires that all SMMUv3 devices support I/O coherency.
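
For illustration only (not part of this patch): the consumer side of the
advertised feature looks roughly like the following on Arm, where cache
maintenance for the shared page tables can be skipped once coherent walk is
known to be available. The wrapper name below is hypothetical; only
is_iommu_enabled(), iommu_has_feature() and IOMMU_FEAT_COHERENT_WALK are
existing identifiers.

    /* Sketch, not the actual Xen code: do PTE writes to the tables shared
     * with the IOMMU still need dcache cleaning? */
    static bool p2m_needs_pte_cleaning(struct domain *d)
    {
        return is_iommu_enabled(d) &&
               !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
    }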

Signed-off-by: Michal Orzel 
---
There are very few platforms out there with SMMUv3 but I have never seen
a SMMUv3 that is not I/O coherent.
---
 xen/drivers/passthrough/arm/smmu-v3.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
b/xen/drivers/passthrough/arm/smmu-v3.c
index bf053cdb6d5c..2adaad0fa038 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -2526,6 +2526,15 @@ static const struct dt_device_match arm_smmu_of_match[] 
= {
 };
 
 /* Start of Xen specific code. */
+
+/*
+ * Platform features. It indicates the list of features supported by all
+ * SMMUs. Actually we only care about coherent table walk, which in case of
+ * SMMUv3 is implied by the overall coherency feature (refer ARM IHI 0070 E.A,
+ * section 3.15 and SMMU_IDR0.COHACC bit description).
+ */
+static uint32_t platform_features = ARM_SMMU_FEAT_COHERENCY;
+
 static int __must_check arm_smmu_iotlb_flush_all(struct domain *d)
 {
struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
@@ -2708,8 +2717,12 @@ static int arm_smmu_iommu_xen_domain_init(struct domain 
*d)
INIT_LIST_HEAD(&xen_domain->contexts);
 
dom_iommu(d)->arch.priv = xen_domain;
-   return 0;
 
+   /* Coherent walk can be enabled only when all SMMUs support it. */
+   if (platform_features & ARM_SMMU_FEAT_COHERENCY)
+   iommu_set_feature(d, IOMMU_FEAT_COHERENT_WALK);
+
+   return 0;
 }
 
 static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
@@ -2738,6 +2751,7 @@ static __init int arm_smmu_dt_init(struct dt_device_node 
*dev,
const void *data)
 {
int rc;
+   const struct arm_smmu_device *smmu;
 
/*
 * Even if the device can't be initialized, we don't want to
@@ -2751,6 +2765,14 @@ static __init int arm_smmu_dt_init(struct dt_device_node 
*dev,
 
iommu_set_ops(&arm_smmu_iommu_ops);
 
+   /* Find the just added SMMU and retrieve its features. */
+   smmu = arm_smmu_get_by_dev(dt_to_dev(dev));
+
+   /* It would be a bug not to find the SMMU we just added. */
+   BUG_ON(!smmu);
+
+   platform_features &= smmu->features;
+
return 0;
 }
 
-- 
2.25.1




[PATCH 1/2] xen/arm: smmuv3: Constify arm_smmu_get_by_dev() parameter

2023-05-12 Thread Michal Orzel
This function does not modify its parameter 'dev' and it is not supposed
to do it. Therefore, constify it.

Signed-off-by: Michal Orzel 
---
 xen/drivers/passthrough/arm/smmu-v3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
b/xen/drivers/passthrough/arm/smmu-v3.c
index bfdb62b395ad..bf053cdb6d5c 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -1468,7 +1468,7 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device 
*smmu, u32 sid)
return sid < limit;
 }
 /* Forward declaration */
-static struct arm_smmu_device *arm_smmu_get_by_dev(struct device *dev);
+static struct arm_smmu_device *arm_smmu_get_by_dev(const struct device *dev);
 
 static int arm_smmu_add_device(u8 devfn, struct device *dev)
 {
@@ -2556,7 +2556,7 @@ static int __must_check arm_smmu_iotlb_flush(struct 
domain *d, dfn_t dfn,
return arm_smmu_iotlb_flush_all(d);
 }
 
-static struct arm_smmu_device *arm_smmu_get_by_dev(struct device *dev)
+static struct arm_smmu_device *arm_smmu_get_by_dev(const struct device *dev)
 {
struct arm_smmu_device *smmu = NULL;
 
-- 
2.25.1




[qemu-mainline test] 180626: regressions - FAIL

2023-05-12 Thread osstest service owner
flight 180626 qemu-mainline real [real]
flight 180634 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/180626/
http://logs.test-lab.xenproject.org/osstest/logs/180634/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 180610

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 180610
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 180610
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 180610
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 180610
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 180610
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 180610
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180610
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 180610
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass

version targeted for testing:
 qemuu 278238505d28d292927bff7683f39fb4fbca7fd1
baseline version:
 qemuu d530697ca20e19f7a626f4c1c8b26fccd0dc4470

Last test of basis   180610  2023-05-10 23:38:37 Z1 days
Testing same since   180621  2023-05-11 14:27:55 Z0 days2 attempts


People who touched revisions under test:
  Dr. David Alan Gilbert 
  Jamie Iles 
  Juan Quintela 
  Laurent Vivier 
  Lukas Straub 
  Peter Maydell 
  Richard 

[PATCH] x86/cpuid: Calculate FEATURESET_NR_ENTRIES more helpfully

2023-05-12 Thread Andrew Cooper
When adding new featureset words, it is convenient to split the work into
several patches.  However, GCC 12 spotted that the way we prefer to split the
work results in a real (transient) breakage whereby the policy <-> featureset
helpers perform out-of-bounds accesses on the featureset array.

Fix this by having gen-cpuid.py calculate FEATURESET_NR_ENTRIES from the
comments describing the word blocks, rather than from the XEN_CPUFEATURE()
with the greatest value.

For simplicity, require that the word blocks appear in order.  This can be
revisited if we find a good reason to have blocks out of order.

No functional change.

Reported-by: Jan Beulich 
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

This supersedes the entire "x86: Fix transient build breakage with featureset
additions" series, but doesn't really feel as if it ought to be labelled v2
---
 xen/tools/gen-cpuid.py | 42 --
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
index 0edb7d4a19f8..e0664e0defe1 100755
--- a/xen/tools/gen-cpuid.py
+++ b/xen/tools/gen-cpuid.py
@@ -50,13 +50,37 @@ def parse_definitions(state):
 "\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
 "\s+/\*([\w!]*) .*$")
 
+word_regex = re.compile(
+r"^/\* .* word (\d*) \*/$")
+last_word = -1
+
 this = sys.modules[__name__]
 
 for l in state.input.readlines():
-# Short circuit the regex...
-if not l.startswith("XEN_CPUFEATURE("):
+
+# Short circuit the regexes...
+if not (l.startswith("XEN_CPUFEATURE(") or
+l.startswith("/* ")):
 continue
 
+# Handle /* ... word $N */ lines
+if l.startswith("/* "):
+
+res = word_regex.match(l)
+if res is None:
+continue # Some other comment
+
+word = int(res.groups()[0])
+
+if word != last_word + 1:
+raise Fail("Featureset word %u out of order (last word %u)"
+   % (word, last_word))
+
+last_word = word
+state.nr_entries = word + 1
+continue
+
+# Handle XEN_CPUFEATURE( lines
 res = feat_regex.match(l)
 
 if res is None:
@@ -94,6 +118,15 @@ def parse_definitions(state):
 if len(state.names) == 0:
 raise Fail("No features found")
 
+if state.nr_entries == 0:
+raise Fail("No featureset word info found")
+
+max_val = max(state.names.keys())
+if (max_val >> 5) + 1 > state.nr_entries:
+max_name = state.names[max_val]
+raise Fail("Feature %s (%d*32+%d) exceeds FEATURESET_NR_ENTRIES (%d)"
+   % (max_name, max_val >> 5, max_val & 31, state.nr_entries))
+
 def featureset_to_uint32s(fs, nr):
 """ Represent a featureset as a list of C-compatible uint32_t's """
 
@@ -122,9 +155,6 @@ def format_uint32s(state, featureset, indent):
 
 def crunch_numbers(state):
 
-# Size of bitmaps
-state.nr_entries = nr_entries = (max(state.names.keys()) >> 5) + 1
-
 # Features common between 1d and e1d.
 common_1d = (FPU, VME, DE, PSE, TSC, MSR, PAE, MCE, CX8, APIC,
  MTRR, PGE, MCA, CMOV, PAT, PSE36, MMX, FXSR)
@@ -328,7 +358,7 @@ def crunch_numbers(state):
 state.nr_deep_deps = len(state.deep_deps.keys())
 
 # Calculate the bitfield name declarations
-for word in range(nr_entries):
+for word in range(state.nr_entries):
 
 names = []
 for bit in range(32):
-- 
2.30.2




[PATCH v2] tools: drop bogus and obsolete ptyfuncs.m4

2023-05-12 Thread Olaf Hering
According to openpty(3) it is required to include  to get the
prototypes for openpty() and login_tty(). But this is not what the
function AX_CHECK_PTYFUNCS actually does. It makes no attempt to include
the required header.

The two source files which call openpty() and login_tty() already contain
the conditionals to include the required header.

Remove the bogus m4 file to fix build with clang, which complains about
calls to undeclared functions.

Remove usage of INCLUDE_LIBUTIL_H in libxl_bootloader.c, it is already
covered by inclusion of libxl_osdep.h.

Remove usage of PTYFUNCS_LIBS in libxl/Makefile, it is already covered
by UTIL_LIBS from config/StdGNU.mk.
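
For reference, a sketch of the kind of per-platform conditional the two
source files already carry (the header names below are the usual ones for
each libc/OS and may not match the in-tree conditionals exactly):

    /* Illustrative only: where openpty()/login_tty() typically come from. */
    #if defined(__linux__)
    #include <pty.h>      /* openpty() */
    #include <utmp.h>     /* login_tty() */
    #elif defined(__FreeBSD__)
    #include <libutil.h>  /* openpty() and login_tty() */
    #elif defined(__NetBSD__) || defined(__OpenBSD__)
    #include <util.h>
    #endif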

Signed-off-by: Olaf Hering 
---
v2: remove consumers of the macros

 config/Tools.mk.in  |  2 --
 m4/ptyfuncs.m4  | 35 -
 tools/configure.ac  |  2 --
 tools/libs/light/Makefile   |  2 +-
 tools/libs/light/libxl_bootloader.c |  4 
 5 files changed, 1 insertion(+), 44 deletions(-)
 delete mode 100644 m4/ptyfuncs.m4

diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index 6abb377564..b7cc2961d8 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -31,8 +31,6 @@ PTHREAD_CFLAGS  := @PTHREAD_CFLAGS@
 PTHREAD_LDFLAGS := @PTHREAD_LDFLAGS@
 PTHREAD_LIBS:= @PTHREAD_LIBS@
 
-PTYFUNCS_LIBS   := @PTYFUNCS_LIBS@
-
 LIBNL3_LIBS := @LIBNL3_LIBS@
 LIBNL3_CFLAGS   := @LIBNL3_CFLAGS@
 XEN_TOOLS_RPATH := @rpath@
diff --git a/m4/ptyfuncs.m4 b/m4/ptyfuncs.m4
deleted file mode 100644
index 3e37b5a23c..00
--- a/m4/ptyfuncs.m4
+++ /dev/null
@@ -1,35 +0,0 @@
-AC_DEFUN([AX_CHECK_PTYFUNCS], [
-dnl This is a workaround for a bug in Debian package
-dnl libbsd-dev-0.3.0-1. Once we no longer support that
-dnl package we can remove the addition of -Werror to
-dnl CPPFLAGS.
-AX_SAVEVAR_SAVE(CPPFLAGS)
-CPPFLAGS="$CPPFLAGS -Werror"
-AC_CHECK_HEADER([libutil.h],[
-  AC_DEFINE([INCLUDE_LIBUTIL_H],[],[libutil header file name])
-])
-AX_SAVEVAR_RESTORE(CPPFLAGS)
-AC_CACHE_CHECK([for openpty et al], [ax_cv_ptyfuncs_libs], [
-for ax_cv_ptyfuncs_libs in -lutil "" NOT_FOUND; do
-if test "x$ax_cv_ptyfuncs_libs" = "xNOT_FOUND"; then
-AC_MSG_FAILURE([Unable to find library for openpty and 
login_tty])
-fi
-AX_SAVEVAR_SAVE(LIBS)
-LIBS="$LIBS $ax_cv_ptyfuncs_libs"
-AC_LINK_IFELSE([AC_LANG_SOURCE([
-#ifdef INCLUDE_LIBUTIL_H
-#include INCLUDE_LIBUTIL_H
-#endif
-int main(void) {
-  openpty(0,0,0,0,0);
-  login_tty(0);
-}
-])],[
-break
-],[])
-AX_SAVEVAR_RESTORE(LIBS)
-done
-])
-PTYFUNCS_LIBS="$ax_cv_ptyfuncs_libs"
-AC_SUBST(PTYFUNCS_LIBS)
-])
diff --git a/tools/configure.ac b/tools/configure.ac
index 9bcf42f233..3cccf41960 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -70,7 +70,6 @@ m4_include([../m4/uuid.m4])
 m4_include([../m4/pkg.m4])
 m4_include([../m4/curses.m4])
 m4_include([../m4/pthread.m4])
-m4_include([../m4/ptyfuncs.m4])
 m4_include([../m4/extfs.m4])
 m4_include([../m4/fetcher.m4])
 m4_include([../m4/ax_compare_version.m4])
@@ -416,7 +415,6 @@ AC_SUBST([ZLIB_CFLAGS])
 AC_SUBST([ZLIB_LIBS])
 AX_CHECK_EXTFS
 AX_CHECK_PTHREAD
-AX_CHECK_PTYFUNCS
 AC_CHECK_LIB([yajl], [yajl_alloc], [],
 [AC_MSG_ERROR([Could not find yajl])])
 AC_CHECK_LIB([z], [deflateCopy], [], [AC_MSG_ERROR([Could not find zlib])])
diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
index 96daeabc47..5d7ff94b05 100644
--- a/tools/libs/light/Makefile
+++ b/tools/libs/light/Makefile
@@ -158,7 +158,7 @@ NO_HEADERS_CHK := y
 
 include $(XEN_ROOT)/tools/libs/libs.mk
 
-LDLIBS-y += $(PTYFUNCS_LIBS)
+LDLIBS-y += $(UTIL_LIBS)
 LDLIBS-$(CONFIG_LIBNL) += $(LIBNL3_LIBS)
 LDLIBS-$(CONFIG_Linux) += -luuid
 LDLIBS-$(CONFIG_Linux) += -lrt
diff --git a/tools/libs/light/libxl_bootloader.c 
b/tools/libs/light/libxl_bootloader.c
index 18e9ebd714..1bc6e51827 100644
--- a/tools/libs/light/libxl_bootloader.c
+++ b/tools/libs/light/libxl_bootloader.c
@@ -19,10 +19,6 @@
 #include 
 #endif
 
-#ifdef INCLUDE_LIBUTIL_H
-#include INCLUDE_LIBUTIL_H
-#endif
-
 #include "libxl_internal.h"
 
 #define BOOTLOADER_BUF_OUT 65536



Re: [PATCH v1] tools: drop bogus and obsolete ptyfuncs.m4

2023-05-12 Thread Olaf Hering
Tue, 9 May 2023 17:47:33 +0100 Anthony PERARD :

> That change isn't enough. And I'm not convinced that it needs to be
> removed.

You are right, the provided functions must be removed as well.
My build scripts do not run autoreconf, perhaps it is about time to
change that.

> First, AX_CHECK_PTYFUNCS is still called in "tools/configure.ac".

This needs to be removed, yes.

> Then, AX_CHECK_PTYFUNCS define INCLUDE_LIBUTIL_H and PTYFUNCS_LIBS.

This is used inconsistently. Some places include  unconditionally.
-lutil is already used unconditionally via UTIL_LIBS

> Also, that that macro isn't just about the header, but also about the
> needed library.

This is already covered.

Olaf




[PATCH v3] Fix install.sh for systemd

2023-05-12 Thread Olaf Hering
On a fedora system, if you run `sudo sh install.sh` you break your
system. The installation clobbers /var/run, a symlink to /run.
A subsequent boot fails when /var/run and /run are different since
accesses through /var/run can't find items that now only exist in /run
and vice-versa.

Skip populating /var/run/xen during make install.
The directory is already created by some scripts. Adjust all remaining
scripts to create XEN_RUN_DIR at runtime.

Use the shell variable XEN_RUN_DIR instead of hardcoded paths.

XEN_RUN_STORED is covered by XEN_RUN_DIR because xenstored is usually
started afterwards.

Reported-by: Jason Andryuk 
Tested-by: Jason Andryuk 
Signed-off-by: Olaf Hering 
---
v3: use variables, and quote variables, drop -m0700 from one mkdir call

 tools/Makefile | 2 --
 tools/hotplug/FreeBSD/rc.d/xencommons.in   | 1 +
 tools/hotplug/FreeBSD/rc.d/xendriverdomain.in  | 1 +
 tools/hotplug/Linux/init.d/xendriverdomain.in  | 1 +
 tools/hotplug/Linux/systemd/xenconsoled.service.in | 2 +-
 tools/hotplug/NetBSD/rc.d/xendriverdomain.in   | 2 +-
 6 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/Makefile b/tools/Makefile
index 4906fdbc23..1ff90ddfa0 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -58,9 +58,7 @@ build all: subdirs-all
 install:
$(INSTALL_DIR) -m 700 $(DESTDIR)$(XEN_DUMP_DIR)
$(INSTALL_DIR) $(DESTDIR)$(XEN_LOG_DIR)
-   $(INSTALL_DIR) $(DESTDIR)$(XEN_RUN_DIR)
$(INSTALL_DIR) $(DESTDIR)$(XEN_LIB_DIR)
-   $(INSTALL_DIR) $(DESTDIR)$(XEN_RUN_STORED)
$(INSTALL_DIR) $(DESTDIR)$(PKG_INSTALLDIR)
$(MAKE) subdirs-install
 
diff --git a/tools/hotplug/FreeBSD/rc.d/xencommons.in 
b/tools/hotplug/FreeBSD/rc.d/xencommons.in
index 7f7cda289f..6f429e4b0c 100644
--- a/tools/hotplug/FreeBSD/rc.d/xencommons.in
+++ b/tools/hotplug/FreeBSD/rc.d/xencommons.in
@@ -34,6 +34,7 @@ xen_startcmd()
local time=0
local timeout=30
 
+   mkdir -p "${XEN_RUN_DIR}"
xenstored_pid=$(check_pidfile ${XENSTORED_PIDFILE} ${XENSTORED})
if test -z "$xenstored_pid"; then
printf "Starting xenservices: xenstored, xenconsoled."
diff --git a/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in 
b/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
index a032822e33..f487c43468 100644
--- a/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
+++ b/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
@@ -27,6 +27,7 @@ xendriverdomain_start()
 {
printf "Starting xenservices: xl devd."
 
+   mkdir -p "${XEN_RUN_DIR}"
PATH="${bindir}:${sbindir}:$PATH" ${sbindir}/xl devd --pidfile 
${XLDEVD_PIDFILE} ${XLDEVD_ARGS}
 
printf "\n"
diff --git a/tools/hotplug/Linux/init.d/xendriverdomain.in 
b/tools/hotplug/Linux/init.d/xendriverdomain.in
index c63060f62a..17b381c3dc 100644
--- a/tools/hotplug/Linux/init.d/xendriverdomain.in
+++ b/tools/hotplug/Linux/init.d/xendriverdomain.in
@@ -49,6 +49,7 @@ fi
 
 do_start () {
echo Starting xl devd...
+   mkdir -p "${XEN_RUN_DIR}"
${sbindir}/xl devd --pidfile=$XLDEVD_PIDFILE $XLDEVD_ARGS
 }
 do_stop () {
diff --git a/tools/hotplug/Linux/systemd/xenconsoled.service.in 
b/tools/hotplug/Linux/systemd/xenconsoled.service.in
index 1f03de9041..d84c09aa9c 100644
--- a/tools/hotplug/Linux/systemd/xenconsoled.service.in
+++ b/tools/hotplug/Linux/systemd/xenconsoled.service.in
@@ -11,7 +11,7 @@ Environment=XENCONSOLED_TRACE=none
 Environment=XENCONSOLED_LOG_DIR=@XEN_LOG_DIR@/console
 EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons
 ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities
-ExecStartPre=/bin/mkdir -p ${XENCONSOLED_LOG_DIR}
+ExecStartPre=/bin/mkdir -p ${XENCONSOLED_LOG_DIR} @XEN_RUN_DIR@
 ExecStart=@sbindir@/xenconsoled -i --log=${XENCONSOLED_TRACE} 
--log-dir=${XENCONSOLED_LOG_DIR} $XENCONSOLED_ARGS
 
 [Install]
diff --git a/tools/hotplug/NetBSD/rc.d/xendriverdomain.in 
b/tools/hotplug/NetBSD/rc.d/xendriverdomain.in
index f47b0b189c..87afc061ac 100644
--- a/tools/hotplug/NetBSD/rc.d/xendriverdomain.in
+++ b/tools/hotplug/NetBSD/rc.d/xendriverdomain.in
@@ -23,7 +23,7 @@ XLDEVD_PIDFILE="@XEN_RUN_DIR@/xldevd.pid"
 
 xendriverdomain_precmd()
 {
-   :
+   mkdir -p "${XEN_RUN_DIR}"
 }
 
 xendriverdomain_startcmd()



Re: [PATCH v2] Fix install.sh for systemd

2023-05-12 Thread Olaf Hering
Fri, 12 May 2023 12:22:08 +0100 Andrew Cooper :

> Does this allow for making any of these files no longer
> preprocessed by ./configure ?  (i.e. cease being .in files)

The path to hotplugpath.sh is variable, so each consumer needs to be a .in file.


Olaf




Re: [PATCH v2] Fix install.sh for systemd

2023-05-12 Thread Andrew Cooper
On 12/05/2023 12:18 pm, Olaf Hering wrote:
> Tue, 9 May 2023 13:47:11 +0100 Andrew Cooper :
>
>> Why is this 700, and the others just using regular perms?
>> Also, doesn't it want quoting like the other examples too?
> It is not clear why there is a single mkdir -m 0700 in the tree.
> Most likely it will not give any extra security.

I agree.  It's weird and doesn't have a good reason for being different.

> The scripts source hotplug.sh, which defines a variable XEN_RUN_DIR.
> I think it is better to use the shell variable instead of hardcoded paths.

Sounds good.  Does this allow for making any of these files no longer
preprocessed by ./configure ?  (i.e. cease being .in files)

> Regarding quoting: there are many paths used without quoting.
> For the sake of tidiness an additional (huge) change could be done to quote
> everything. Not sure if it is worth the effort...

Perhaps, but variables should always be quoted.  At least make sure that
new additions (and edits) leave things quoted.

Thanks,

~Andrew



Re: [PATCH v2] Fix install.sh for systemd

2023-05-12 Thread Olaf Hering
Tue, 9 May 2023 13:47:11 +0100 Andrew Cooper :

> Why is this 700, and the others just using regular perms?
> Also, doesn't it want quoting like the other examples too?

It is not clear why there is a single mkdir -m 0700 in the tree.
Most likely it will not give any extra security.

The scripts source hotplug.sh, which defines a variable XEN_RUN_DIR.
I think it is better to use the shell variable instead of hardcoded paths.

Regarding quoting: there are many paths used without quoting.
For the sake of tidiness an additional (huge) change could be done to quote
everything. Not sure if it is worth the effort...

I will post a v3 with this relative change:

--- a/tools/hotplug/FreeBSD/rc.d/xencommons.in
+++ b/tools/hotplug/FreeBSD/rc.d/xencommons.in
@@ -34,7 +34,7 @@ xen_startcmd()
local time=0
local timeout=30
 
-   mkdir -p "@XEN_RUN_DIR@"
+   mkdir -p "${XEN_RUN_DIR}"
xenstored_pid=$(check_pidfile ${XENSTORED_PIDFILE} ${XENSTORED})
if test -z "$xenstored_pid"; then
printf "Starting xenservices: xenstored, xenconsoled."
--- a/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
+++ b/tools/hotplug/FreeBSD/rc.d/xendriverdomain.in
@@ -27,7 +27,7 @@ xendriverdomain_start()
 {
printf "Starting xenservices: xl devd."
 
-   mkdir -p "@XEN_RUN_DIR@"
+   mkdir -p "${XEN_RUN_DIR}"
PATH="${bindir}:${sbindir}:$PATH" ${sbindir}/xl devd --pidfile 
${XLDEVD_PIDFILE} ${XLDEVD_ARGS}
 
printf "\n"
--- a/tools/hotplug/Linux/init.d/xendriverdomain.in
+++ b/tools/hotplug/Linux/init.d/xendriverdomain.in
@@ -49,7 +49,7 @@ fi
 
 do_start () {
echo Starting xl devd...
-   mkdir -m700 -p @XEN_RUN_DIR@
+   mkdir -p "${XEN_RUN_DIR}"
${sbindir}/xl devd --pidfile=$XLDEVD_PIDFILE $XLDEVD_ARGS
 }
 do_stop () {
--- a/tools/hotplug/NetBSD/rc.d/xendriverdomain.in
+++ b/tools/hotplug/NetBSD/rc.d/xendriverdomain.in
@@ -23,7 +23,7 @@ XLDEVD_PIDFILE="@XEN_RUN_DIR@/xldevd.pid"
 
 xendriverdomain_precmd()
 {
-   mkdir -p "@XEN_RUN_DIR@"
+   mkdir -p "${XEN_RUN_DIR}"
 }
 
 xendriverdomain_startcmd()





Re: [PATCH v8 0/7] Add pci_dev_for_each_resource() helper and update users

2023-05-12 Thread Andy Shevchenko
On Tue, May 09, 2023 at 01:21:22PM -0500, Bjorn Helgaas wrote:
> On Tue, Apr 04, 2023 at 11:11:01AM -0500, Bjorn Helgaas wrote:
> > On Thu, Mar 30, 2023 at 07:24:27PM +0300, Andy Shevchenko wrote:
> > > Provide two new helper macros to iterate over PCI device resources and
> > > convert users.
> 
> > Applied 2-7 to pci/resource for v6.4, thanks, I really like this!
> 
> This is 09cc90063240 ("PCI: Introduce pci_dev_for_each_resource()")
> upstream now.
> 
> Coverity complains about each use,

It needs more clarification here. Use of the reduced variant of the macro, or all
of them? If the former, then I can speculate that Coverity (famous for false
positives) simply doesn't understand `for (type var; var ...)` code.
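
For reference, the reduced variant iterates over the resource array in
roughly the following shape (paraphrased from the trace quoted below, not
the exact upstream macro):

    /*
     * Rough paraphrase of the expanded loop in vgaarb.c.  After the final
     * increment __b equals PCI_NUM_RESOURCES and r points one element past
     * the array, but the comma expression re-checks __b before the body
     * runs, so r is never actually dereferenced out of bounds.
     */
    struct resource *r;
    for (unsigned int __b = 0;
         (r = &pdev->resource[__b]), __b < PCI_NUM_RESOURCES;
         __b++) {
            if (resource_type(r) != IORESOURCE_MEM)
                    continue;
            /* ... use r ... */
    }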

>   sample below from
> drivers/pci/vgaarb.c.  I didn't investigate at all, so it might be a
> false positive; just FYI.
> 
> 1. Condition screen_info.capabilities & (2U /* 1 << 1 */), taking 
> true branch.
>   556if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
>   557base |= (u64)screen_info.ext_lfb_base << 32;
>   558
>   559limit = base + size;
>   560
>   561/* Does firmware framebuffer belong to us? */
> 2. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> 3. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), 
> taking true branch.
> 6. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> 7. cond_at_most: Checking __b < PCI_NUM_RESOURCES implies that __b 
> may be up to 16 on the true branch.
> 8. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), 
> taking true branch.
> 11. incr: Incrementing __b. The value of __b may now be up to 17.
> 12. alias: Assigning: r = &pdev->resource[__b]. r may now point to as 
> high as element 17 of pdev->resource (which consists of 17 64-byte elements).
> 13. Condition __b < PCI_NUM_RESOURCES, taking true branch.
> 14. Condition (r = &pdev->resource[__b]) , (__b < PCI_NUM_RESOURCES), 
> taking true branch.
>   562pci_dev_for_each_resource(pdev, r) {
> 4. Condition resource_type(r) != 512, taking true branch.
> 9. Condition resource_type(r) != 512, taking true branch.
> 
>   CID 1529911 (#1 of 1): Out-of-bounds read (OVERRUN)
>   15. overrun-local: Overrunning array of 1088 bytes at byte offset 1088 by 
> dereferencing pointer r. [show details]
>   563if (resource_type(r) != IORESOURCE_MEM)
> 5. Continuing loop.
> 10. Continuing loop.
>   564continue;

-- 
With Best Regards,
Andy Shevchenko





Re: [PATCH V2 2/2] libxl: arm: Add grant_usage parameter for virtio devices

2023-05-12 Thread Anthony PERARD
On Thu, May 11, 2023 at 01:20:43PM +0530, Viresh Kumar wrote:
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index 24ac92718288..0405f6efe62a 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -1619,6 +1619,18 @@ hexadecimal format, without the "0x" prefix and all in 
> lower case, like
>  Specifies the transport mechanism for the Virtio device, only "mmio" is
>  supported for now.
>  
> +=item B
> +
> +Specifies the grant usage details for the Virtio device. This can be set to
> +following values:
> +
> +- "default": The default grant setting will be used, enable grants if
> +  backend-domid != 0.

I don't think this "default" setting is useful. We could just describe
what the default is when "grant_usage" setting is missing from the
configuration.

> +- "enabled": The grants are always enabled.
> +
> +- "disabled": The grants are always disabled.

> diff --git a/tools/libs/light/libxl_types.idl 
> b/tools/libs/light/libxl_types.idl
> index c10292e0d7e3..17228817f9b7 100644
> --- a/tools/libs/light/libxl_types.idl
> +++ b/tools/libs/light/libxl_types.idl
> @@ -283,6 +283,12 @@ libxl_virtio_transport = Enumeration("virtio_transport", 
> [
>  (1, "MMIO"),
>  ])
>  
> +libxl_virtio_grant_usage = Enumeration("virtio_grant_usage", [
> +(0, "DEFAULT"),
> +(1, "DISABLED"),
> +(2, "ENABLED"),

libxl already provide this type, it's call "libxl_defbool". It can be
set to "default", "false" or "true".



> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
> index 97c80d7ed0fa..9cd7dbef0237 100644
> --- a/tools/libs/light/libxl_arm.c
> +++ b/tools/libs/light/libxl_arm.c
> @@ -1363,22 +1365,29 @@ static int libxl__prepare_dtb(libxl__gc *gc, 
> libxl_domain_config *d_config,
>  iommu_needed = true;
>  
>  FDT( make_virtio_mmio_node(gc, fdt, disk->base, disk->irq,
> -   disk->backend_domid) );
> +   disk->backend_domid,
> +   disk->backend_domid != 
> LIBXL_TOOLSTACK_DOMID) );
>  }
>  }
>  
>  for (i = 0; i < d_config->num_virtios; i++) {
> +libxl_device_virtio *virtio = &d_config->virtios[i];
> +bool use_grant = false;
>  
>  if (virtio->transport != LIBXL_VIRTIO_TRANSPORT_MMIO)
>  continue;
>  
> -if (virtio->backend_domid != LIBXL_TOOLSTACK_DOMID)
> +if ((virtio->grant_usage == LIBXL_VIRTIO_GRANT_USAGE_ENABLED) ||
> +((virtio->grant_usage == LIBXL_VIRTIO_GRANT_USAGE_DEFAULT) &&
> + (virtio->backend_domid != LIBXL_TOOLSTACK_DOMID))) {

I think libxl can select what the default value should be replaced with
before we start to set up the guest. There's a *_setdefault() phase where
we set the correct value when a configuration value hasn't been set and
thus a default value is used. I think this can be done in
libxl__device_virtio_setdefault().
After that, virtio->grant_usage will be true or false, and that's the
value that should be given to the virtio backend via xenstore.
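
A minimal sketch of that idea, assuming grant_usage becomes a
libxl_defbool (the setdefault hook's prototype is abridged here):

    /* Sketch only: resolve the default once, so later code only ever sees
     * true or false, never "default". */
    static void libxl__device_virtio_setdefault(libxl__gc *gc,
                                                libxl_device_virtio *virtio)
    {
        /* Default: use grants whenever the backend is not the toolstack domain. */
        libxl_defbool_setdefault(&virtio->grant_usage,
                                 virtio->backend_domid != LIBXL_TOOLSTACK_DOMID);
    }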

> diff --git a/tools/libs/light/libxl_virtio.c b/tools/libs/light/libxl_virtio.c
> index eadcb7124c3f..0a0fae967a0f 100644
> --- a/tools/libs/light/libxl_virtio.c
> +++ b/tools/libs/light/libxl_virtio.c
> @@ -46,11 +46,13 @@ static int libxl__set_xenstore_virtio(libxl__gc *gc, 
> uint32_t domid,
>flexarray_t *ro_front)
>  {
>  const char *transport = 
> libxl_virtio_transport_to_string(virtio->transport);
> +const char *grant_usage = 
> libxl_virtio_grant_usage_to_string(virtio->grant_usage);
>  
>  flexarray_append_pair(back, "irq", GCSPRINTF("%u", virtio->irq));
>  flexarray_append_pair(back, "base", GCSPRINTF("%#"PRIx64, virtio->base));
>  flexarray_append_pair(back, "type", GCSPRINTF("%s", virtio->type));
>  flexarray_append_pair(back, "transport", GCSPRINTF("%s", transport));
> +flexarray_append_pair(back, "grant_usage", GCSPRINTF("%s", grant_usage));

Currently, this means that we store "default" in this node. That means
that both the virtio backend and libxl have to do computation in order
to figure out whether "default" means "true" or "false". And both have to find
the same result. I don't think this is necessary, and libxl can just
say enabled or disabled. This would be done in libxl before we run this
function. See previous comment on this patch.
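
i.e. the xenstore node would then carry the already-resolved value, along
these lines (sketch only):

    flexarray_append_pair(back, "grant_usage",
                          GCSPRINTF("%d", libxl_defbool_val(virtio->grant_usage)));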

Thanks,

-- 
Anthony PERARD



[xen-unstable-smoke test] 180631: tolerable all pass - PUSHED

2023-05-12 Thread osstest service owner
flight 180631 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180631/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  4c507d8a6b6e8be90881a335b0a66eb28e0f7737
baseline version:
 xen  cb781ae2c98de5d5742aa0de6850f371fb25825f

Last test of basis   180622  2023-05-11 16:02:01 Z0 days
Testing same since   180631  2023-05-12 08:00:26 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Julien Grall 
  Roger Pau Monné 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   cb781ae2c9..4c507d8a6b  4c507d8a6b6e8be90881a335b0a66eb28e0f7737 -> smoke



Re: [PATCH V2 1/2] libxl: virtio: Remove unused frontend nodes

2023-05-12 Thread Anthony PERARD
On Thu, May 11, 2023 at 01:20:42PM +0530, Viresh Kumar wrote:
> The libxl_virtio file works for only PV devices and the nodes under the

Well, VirtIO devices are a kind of PV device, yes, just as there are Xen PV
devices. So this explanation doesn't really make much sense.

> frontend path are not used by anyone. Remove them.

Instead of describing what isn't used, it might be better to describe
what we try to achieve. Something like:

Only the VirtIO backend will watch xenstore to find out when a new
instance needs to be created for a guest, and read the parameters
from there. VirtIO frontends are virtio only, so they will not do
anything with the xenstore nodes. They can be removed.


> While at it, also add a comment to the file describing what devices this
> file is used for.
> 
> Signed-off-by: Viresh Kumar 
> ---
> V2: New patch.
> 
>  tools/libs/light/libxl_virtio.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libs/light/libxl_virtio.c b/tools/libs/light/libxl_virtio.c
> index faada49e184e..eadcb7124c3f 100644
> --- a/tools/libs/light/libxl_virtio.c
> +++ b/tools/libs/light/libxl_virtio.c
> @@ -10,6 +10,9 @@
>   * but WITHOUT ANY WARRANTY; without even the implied warranty of
>   * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>   * GNU Lesser General Public License for more details.
> + *
> + * This is used for PV (paravirtualized) devices only and the frontend isn't
> + * aware of the xenstore.

It might be more common to put this kind of comment at the top, that is
just before the "Copyright" line.

Also, same remark as for the patch description: VirtIO devices are PV devices,
so the comment is unnecessary. What is less obvious is why there is
xenstore interaction with a VirtIO device at all.

Maybe a better description for the source file would be:

Set up the VirtIO backend. This is intended to interact with a VirtIO
backend that is watching xenstore, and to create new VirtIO devices
with the parameters found in xenstore. (VirtIO frontends don't
interact with xenstore.)


Thanks,

-- 
Anthony PERARD



[ovmf test] 180629: all pass - PUSHED

2023-05-12 Thread osstest service owner
flight 180629 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180629/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 0b37723186ec1525b6caf14b0309fb0ed04084d7
baseline version:
 ovmf c08a3a96fd19ac8adc75f00d373b4a1606b26c00

Last test of basis   180627  2023-05-12 04:12:20 Z0 days
Testing same since   180629  2023-05-12 06:42:08 Z0 days1 attempts


People who touched revisions under test:
  Giri Mudusuru 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   c08a3a96fd..0b37723186  0b37723186ec1525b6caf14b0309fb0ed04084d7 -> 
xen-tested-master



Re: [PATCH v7 3/5] xen/riscv: align __bss_start

2023-05-12 Thread Oleksii
On Fri, 2023-05-12 at 09:45 +0200, Jan Beulich wrote:
> On 11.05.2023 19:09, Oleksii Kurochko wrote:
> > bss clear cycle requires proper alignment of __bss_start.
> > 
> > Signed-off-by: Oleksii Kurochko 
> 
> Reviewed-by: Jan Beulich 
> with two remarks, though:
> 
> While probably not very important yet for RISC-V (until there is at
> least enough functionality to, say, boot Dom0), you may want to get
> used to adding Fixes: tags in cases like this one.
Got it.

> 
> > --- a/xen/arch/riscv/xen.lds.S
> > +++ b/xen/arch/riscv/xen.lds.S
> > @@ -137,6 +137,7 @@ SECTIONS
> >  __init_end = .;
> >  
> >  .bss : { /* BSS */
> > +    . = ALIGN(POINTER_ALIGN);
> >  __bss_start = .;
> >  *(.bss.stack_aligned)
> >  . = ALIGN(PAGE_SIZE);
> 
> While independent of the change here, this ALIGN() visible in context
> is unnecessary, afaict. ALIGN() generally only makes sense when
> there's a linker-script-defined symbol right afterwards. Taking the
> case here, any contributions to .bss.page_aligned have to specify
> proper alignment themselves anyway (or else they'd be dependent upon
> linking order). Just like there's (correctly) no ALIGN(STACK_SIZE)
> ahead of *(.bss.stack_aligned).
It makes sense.

> 
> The change here might be a good opportunity to drop that ALIGN() at
> the same time. So long as you (and the maintainers) agree, I guess
> the adjustment could easily be made while committing.
I would agree with this. Thanks.

~ Oleksii



[linux-linus test] 180625: regressions - FAIL

2023-05-12 Thread osstest service owner
flight 180625 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180625/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64   6 xen-buildfail REGR. vs. 180278
 test-amd64-amd64-dom0pvh-xl-intel 20 guest-localmigrate/x10 fail REGR. vs. 
180278
 test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 180278

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  8 xen-boot fail  like 180278
 test-armhf-armhf-xl-multivcpu  8 xen-boot fail like 180278
 test-armhf-armhf-libvirt-raw  8 xen-boot fail  like 180278
 test-armhf-armhf-xl-rtds  8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 180278
 test-armhf-armhf-xl   8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-armhf-armhf-examine  8 reboot   fail  like 180278
 test-armhf-armhf-libvirt-qcow2  8 xen-bootfail like 180278
 test-armhf-armhf-xl-credit2   8 xen-boot fail  like 180278
 test-armhf-armhf-xl-vhd   8 xen-boot fail  like 180278
 test-armhf-armhf-xl-arndale   8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass

version targeted for testing:
 linux cc3c44c9fda264c6d401be04e95449a57c1231c6
baseline version:
 linux 6c538e1adbfc696ac4747fb10d63e704344f763d

Last test of basis   180278  2023-04-16 19:41:46 Z   25 days
Failing since180281  2023-04-17 06:24:36 Z   25 days   46 attempts
Testing same since   180625  2023-05-11 22:11:49 Z0 days1 attempts


2380 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  fail
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  blocked 
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-amd64-coresched-amd64-xlpass
 test-arm64-arm64-xl  blocked 
 test-armhf-armhf-xl  fail
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsmpass
 

[xen-unstable test] 180624: tolerable FAIL - PUSHED

2023-05-12 Thread osstest service owner
flight 180624 xen-unstable real [real]
flight 180630 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/180624/
http://logs.test-lab.xenproject.org/osstest/logs/180630/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-examine   6 xen-install fail pass in 180630-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 180609
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 180609
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 180609
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 180618
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 180618
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 180618
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail like 
180618
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 180618
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 180618
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180618
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 180618
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 180618
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 180618
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass

version targeted for testing:
 xen  cb781ae2c98de5d5742aa0de6850f371fb25825f
baseline version:
 xen  

Re: [PATCH v7 3/5] xen/riscv: align __bss_start

2023-05-12 Thread Jan Beulich
On 11.05.2023 19:09, Oleksii Kurochko wrote:
> bss clear cycle requires proper alignment of __bss_start.
> 
> Signed-off-by: Oleksii Kurochko 

Reviewed-by: Jan Beulich 
with two remarks, though:

While probably not very important yet for RISC-V (until there is at
least enough functionality to, say, boot Dom0), you may want to get
used to adding Fixes: tags in cases like this one.

> --- a/xen/arch/riscv/xen.lds.S
> +++ b/xen/arch/riscv/xen.lds.S
> @@ -137,6 +137,7 @@ SECTIONS
>  __init_end = .;
>  
>  .bss : { /* BSS */
> +. = ALIGN(POINTER_ALIGN);
>  __bss_start = .;
>  *(.bss.stack_aligned)
>  . = ALIGN(PAGE_SIZE);

While independent of the change here, this ALIGN() visible in context
is unnecessary, afaict. ALIGN() generally only makes sense when
there's a linker-script-defined symbol right afterwards. Taking the
case here, any contributions to .bss.page_aligned have to specify
proper alignment themselves anyway (or else they'd be dependent upon
linking order). Just like there's (correctly) no ALIGN(STACK_SIZE)
ahead of *(.bss.stack_aligned).

The change here might be a good opportunity to drop that ALIGN() at
the same time. So long as you (and the maintainers) agree, I guess
the adjustment could easily be made while committing.

Jan



Re: [RFC PATCH v2 6/8] pci/arm: don't do iommu call for phantom functions

2023-05-12 Thread Jan Beulich
On 11.05.2023 21:16, Stewart Hildebrand wrote:
> It's not necessary to add/remove/assign/deassign pci phantom functions
> for the ARM SMMU drivers. All associated AXI stream IDs are added during
> the iommu call for the base PCI device/function.
> 
> However, the ARM SMMU drivers can cope with the extra/unnecessary calls just
> fine, so this patch is RFC as it's not strictly required.

Tying the skipping to IS_ENABLED(CONFIG_HAS_DEVICE_TREE) goes against
one of Julien's earlier comments, towards DT and ACPI wanting to
co-exist at some point. So I think keeping the supposedly unnecessary
calls is going to be unavoidable, unless you have a runtime property
that you could check instead.

Jan



Re: [PATCH v2 5/8] pci/arm: Use iommu_add_dt_pci_device()

2023-05-12 Thread Jan Beulich
On 11.05.2023 21:16, Stewart Hildebrand wrote:
> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>  pdev->domain = NULL;
>  goto out;
>  }
> +#ifdef CONFIG_HAS_DEVICE_TREE
> +ret = iommu_add_dt_pci_device(pdev);
> +if ( ret < 0 )
> +{
> +printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
> +goto out;
> +}
> +#endif
>  ret = iommu_add_device(pdev);

Hmm, am I misremembering that in the earlier patch you had #else to
invoke the alternative behavior? Now you end up calling both functions;
if that's indeed intended, this may still want doing differently.
Looking at the earlier patch introducing the function, I can't infer
though whether that's intended: iommu_add_dt_pci_device() checks that
the add_device hook is present, but then I didn't find any use of this
hook. The revlog there suggests the check might be stale.

If indeed the function does only preparatory work, I don't see why it
would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
then. Plus in such a case #ifdef-ary here probably wants avoiding by
introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then
...

>  if ( ret )
>  {
> +#ifdef CONFIG_HAS_DEVICE_TREE
> +iommu_fwspec_free(pci_to_dev(pdev));
> +#endif

... this (which I understand is doing the corresponding cleanup) then
also wants wrapping in a suitably named tiny helper function.
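
A minimal sketch of the shape being suggested (pci_add_dt_device() is the
name floated above; the cleanup helper's name is a placeholder):

    /* Sketch only: declaration plus no-op stubs, so the caller needs no
     * #ifdef CONFIG_HAS_DEVICE_TREE at all. */
    #ifdef CONFIG_HAS_DEVICE_TREE
    int pci_add_dt_device(struct pci_dev *pdev);
    void pci_remove_dt_device(struct pci_dev *pdev);
    #else
    static inline int pci_add_dt_device(struct pci_dev *pdev) { return 0; }
    static inline void pci_remove_dt_device(struct pci_dev *pdev) {}
    #endif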

But yet further I'm then no longer convinced this is the right place
for the addition. pci_add_device() is backing physdev hypercalls. It
would seem to me that the function may want invoking yet one layer
further up, or it may even want invoking from a brand new DT-specific
physdev-op. This would then leave at least the x86-only paths (invoking
pci_add_device() from outside of pci_physdev_op()) entirely alone.

Jan



Re: [PATCH v2 3/8] iommu/arm: Introduce iommu_add_dt_pci_device API

2023-05-12 Thread Jan Beulich
On 11.05.2023 21:16, Stewart Hildebrand wrote:
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -219,7 +219,8 @@ int iommu_dt_domain_init(struct domain *d);
>  int iommu_release_dt_devices(struct domain *d);
>  
>  /*
> - * Helper to add master device to the IOMMU using generic IOMMU DT bindings.
> + * Helpers to add master device to the IOMMU using generic (PCI-)IOMMU
> + * DT bindings.
>   *
>   * Return values:
>   *  0 : device is protected by an IOMMU
> @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d);
>   *  (IOMMU is not enabled/present or device is not connected to it).
>   */
>  int iommu_add_dt_device(struct dt_device_node *np);
> +#ifdef CONFIG_HAS_PCI
> +int iommu_add_dt_pci_device(struct pci_dev *pdev);
> +#endif

Is the #ifdef really necessary?

Jan



Re: [PATCH] x86/vPIT: check/bound values loaded from state save record

2023-05-12 Thread Jan Beulich
On 11.05.2023 19:51, Jason Andryuk wrote:
> On Thu, May 11, 2023 at 7:50 AM Jan Beulich  wrote:
>> @@ -426,6 +427,33 @@ static int cf_check pit_load(struct doma
>>  }
>>
>>  /*
>> + * Convert loaded values to be within valid range, for them to represent
>> + * actually reachable state.  Uses of some of the values elsewhere 
>> assume
>> + * this is the case.
>> + */
>> +for ( i = 0; i < ARRAY_SIZE(pit->hw.channels); ++i )
>> +{
>> +struct hvm_hw_pit_channel *ch = &pit->hw.channels[i];
>> +
>> +/* pit_load_count() will convert 0 suitably back to 0x10000. */
>> +ch->count &= 0xffff;
>> +if ( ch->count_latched >= RW_STATE_NUM )
>> +ch->count_latched = 0;
>> +if ( ch->read_state >= RW_STATE_NUM )
>> +ch->read_state = 0;
>> +if ( ch->read_state >= RW_STATE_NUM )
>> +ch->write_state = 0;
> 
> Should these both be write_state?

Definitely. Thanks for noticing!
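
Presumably the hunk will then read (sketch):

    if ( ch->read_state >= RW_STATE_NUM )
        ch->read_state = 0;
    if ( ch->write_state >= RW_STATE_NUM )
        ch->write_state = 0;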

Jan



[ovmf test] 180627: all pass - PUSHED

2023-05-12 Thread osstest service owner
flight 180627 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180627/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf c08a3a96fd19ac8adc75f00d373b4a1606b26c00
baseline version:
 ovmf 0a0e60caf20ab621ee9c1fc66b82b739158c05cf

Last test of basis   180617  2023-05-11 09:10:41 Z0 days
Testing same since   180627  2023-05-12 04:12:20 Z0 days1 attempts


People who touched revisions under test:
  Tinh Nguyen 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   0a0e60caf2..c08a3a96fd  c08a3a96fd19ac8adc75f00d373b4a1606b26c00 -> 
xen-tested-master



Re: [PATCH v3 07/14 RESEND] cpufreq: Export HWP parameters to userspace

2023-05-12 Thread Jan Beulich
On 11.05.2023 22:22, Jason Andryuk wrote:
> On Thu, May 11, 2023 at 10:10 AM Jan Beulich  wrote:
>>
>> On 11.05.2023 15:49, Jason Andryuk wrote:
>>> On Thu, May 11, 2023 at 2:21 AM Jan Beulich  wrote:

 On 10.05.2023 19:49, Jason Andryuk wrote:
> On Mon, May 8, 2023 at 6:26 AM Jan Beulich  wrote:
>>
>> On 01.05.2023 21:30, Jason Andryuk wrote:
>>> Extend xen_get_cpufreq_para to return hwp parameters.  These match the
>>> hardware rather closely.
>>>
>>> We need the features bitmask to indicated fields supported by the actual
>>> hardware.
>>>
>>> The use of uint8_t parameters matches the hardware size.  uint32_t
>>> entries grows the sysctl_t past the build assertion in setup.c.  The
>>> uint8_t ranges are supported across multiple generations, so hopefully
>>> they won't change.
>>
>> Still it feels a little odd for values to be this narrow. Aiui the
>> scaling_governor[] and scaling_{max,min}_freq fields aren't (really)
>> used by HWP. So you could widen the union in struct
>> xen_get_cpufreq_para (in a binary but not necessarily source compatible
>> manner), gaining you 6 more uint32_t slots. Possibly the somewhat oddly
>> placed scaling_cur_freq could be included as well ...
>
> The values are narrow, but they match the hardware.  It works for HWP,
> so there is no need to change at this time AFAICT.
>
> Do you want me to make this change?

 Well, much depends on what these 8-bit values actually express (I did
 raise this question in one of the replies to your patches, as I wasn't
 able to find anything in the SDM). That'll then hopefully allow to
 make some educated prediction on how likely it is that a future
 variant of hwp would want to widen them.
>>>
>>> Sorry for not providing a reference earlier.  In the SDM,
>>> HARDWARE-CONTROLLED PERFORMANCE STATES (HWP) section, there is this
>>> second paragraph:
>>> """
>>> In contrast, HWP is an implementation of the ACPI-defined
>>> Collaborative Processor Performance Control (CPPC), which specifies
>>> that the platform enumerates a continuous, abstract unit-less,
>>> performance value scale that is not tied to a specific performance
>>> state / frequency by definition. While the enumerated scale is roughly
>>> linear in terms of a delivered integer workload performance result,
>>> the OS is required to characterize the performance value range to
>>> comprehend the delivered performance for an applied workload.
>>> """
>>>
>>> The numbers are "continuous, abstract unit-less, performance value."
>>> So there isn't much to go on there, but generally, smaller numbers
>>> mean slower and bigger numbers mean faster.
>>>
>>> Cross referencing the ACPI spec here:
>>> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#collaborative-processor-performance-control
>>>
>>> Scrolling down you can find the register entries such as
>>>
>>> Highest Performance
>>> Register or DWORD Attribute:  Read
>>> Size: 8-32 bits
>>>
>>> AMD has its own pstate implementation that is similar to HWP.  Looking
>>> at the Linux support, the AMD hardware also use 8 bit values for the
>>> comparable fields:
>>> https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/msr-index.h#L612
>>>
>>> So Intel and AMD are 8bit for now at least.  Something could do 32bits
>>> according to the ACPI spec.
>>>
>>> 8 bits of granularity for slow to fast seems like plenty to me.  I'm
>>> not sure what one would gain from 16 or 32 bits, but I'm not designing
>>> the hardware.  From the earlier xenpm output, "highest" was 49, so
>>> still a decent amount of room in an 8 bit range.
>>
>> Hmm, thanks for the pointers. I'm still somewhat undecided. I guess I'm
>> okay with you keeping things as you have them. If and when needed we can
>> still rework the structure - it is possible to change it as it's (for
>> the time being at least) still an unstable interface.
> 
> With an anonymous union and anonymous struct, struct
> xen_get_cpufreq_para can be re-arranged and compile without any
> changes to other cpufreq code.  struct xen_hwp_para becomes 10
> uint32_t's.  The old scaling is 3 * uint32_t + 16 bytes
> CPUFREQ_NAME_LEN + 4 * uint32_t for xen_ondemand = 11 uint32_t.  So
> int32_t turbo_enabled doesn't move and it's binary compatible.
> 
> Anonymous unions and structs aren't allowed in the public header
> though, right?

Correct.

>  So that would need to change, though it doesn't seem
> too bad.  There isn't too much churn.
> 
> I have no plans to tackle AMD pstate.  But having glanced at it this
> morning, maybe these hwp sysctls should be renamed cppc?  AMD pstate
> and HWP are both implementations of CPPC, so that could be more future
> proof?  But, again, I only glanced at the AMD stuff, so there may be
> other changes needed.

I consider this naming change plan plausible. If further adjustments
end up 
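
To illustrate the kind of rearrangement being discussed, here is a rough sketch
of a binary-compatible layout using a named union (anonymous unions being
disallowed in the public headers). All field and struct names are illustrative,
not the actual xen_get_cpufreq_para or xen_hwp_para definitions; only the size
arithmetic (10 vs 11 uint32_t, turbo_enabled staying put) follows the message
above.

    #include <stdint.h>

    #define CPUFREQ_NAME_LEN 16

    /* Illustrative HWP block sized to 10 uint32_t, matching the figure above. */
    struct hwp_para_sketch {
        uint8_t  lowest, most_efficient, guaranteed, highest; /* 8-bit CPPC values */
        uint32_t pad[9];
    };

    struct get_cpufreq_para_sketch {
        uint32_t cpuinfo_max_freq;       /* leading members unchanged (elided) */
        union {
            struct {                     /* governor-based drivers: 11 uint32_t */
                char     scaling_governor[CPUFREQ_NAME_LEN];
                uint32_t scaling_max_freq;
                uint32_t scaling_min_freq;
                uint32_t scaling_cur_freq;
                uint32_t ondemand[4];
            } s;
            struct hwp_para_sketch hwp;  /* HWP/CPPC drivers: 10 uint32_t */
        } u;
        int32_t turbo_enabled;           /* same offset as before => binary compatible */
    };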

Re: [PATCH v3 04/14 RESEND] cpufreq: Add Hardware P-State (HWP) driver

2023-05-12 Thread Jan Beulich
On 12.05.2023 03:02, Marek Marczykowski-Górecki wrote:
> On Wed, May 10, 2023 at 04:19:57PM +0200, Jan Beulich wrote:
>> On 10.05.2023 15:54, Jason Andryuk wrote:
>>> On Mon, May 8, 2023 at 2:33 AM Jan Beulich  wrote:
 On 05.05.2023 17:35, Jason Andryuk wrote:
> On Fri, May 5, 2023 at 3:01 AM Jan Beulich  wrote:
> The other issue is that if you select "hwp" as the governor, but HWP
> hardware support is not available, then hwp_available() needs to reset
> the governor back to the default.  This feels like a layering
> violation.

 Layering violation - yes. But why would the governor need resetting in
 this case? If HWP was asked for but isn't available, I don't think any
 other cpufreq handling (and hence governor) should be put in place.
 And turning off cpufreq altogether (if necessary in the first place)
 wouldn't, to me, feel as much like a layering violation.
>>>
>>> My goal was for Xen to use HWP if available and fall back to the acpi
>>> cpufreq driver if not.  That to me seems more user-friendly than
>>> disabling cpufreq.
>>>
>>> if ( hwp_available() )
>>> ret = hwp_register_driver();
>>> else
>>> ret = cpufreq_register_driver(&acpi_cpufreq_driver);
>>
>> That's fine as a (future) default, but for now using hwp requires a
>> command line option, and if that option says "hwp" then it ought to
>> be hwp imo.
> 
> As a downstream distribution, I'd strongly prefer to have an option that
> would enable HWP when present and fall back to another driver otherwise,
> even if that isn't the default upstream. I can't possibly require a large
> group of users (either HWP-having or HWP-not-having) to edit the Xen
> cmdline to get power management working well.
> 
> If the meaning for cpufreq=hwp absolutely must include "nothing if HWP
> is not available", then maybe it should be named cpufreq=try-hwp
> instead, or cpufreq=prefer-hwp or something else like this?

Any new sub-option needs to fit the existing ones in its meaning. I
could see e.g. "cpufreq=xen" alone to effect what you want (once hwp
becomes available for use by default). But (for now at least) I
continue to think that a request for "hwp" ought to mean HWP.

Jan
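
One way to reconcile the two positions above, sketched with a hypothetical
"try-hwp" flag along the lines Marek suggests. The option names and their
parsing are illustrative; only hwp_available(), hwp_register_driver() and
cpufreq_register_driver(&acpi_cpufreq_driver) come from the quoted exchange.

    /* Sketch only: honour a hard "cpufreq=hwp" request strictly, but let a
     * hypothetical "cpufreq=try-hwp" fall back to the ACPI P-state driver.
     */
    static bool opt_hwp;      /* set by cpufreq=hwp                    */
    static bool opt_try_hwp;  /* set by cpufreq=try-hwp (hypothetical) */

    static int __init cpufreq_driver_register_sketch(void)
    {
        if ( hwp_available() )
            return hwp_register_driver();

        if ( opt_hwp && !opt_try_hwp )
            return -ENODEV;   /* "hwp" was asked for: don't silently substitute */

        return cpufreq_register_driver(&acpi_cpufreq_driver);
    }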