[xen-unstable-smoke test] 186425: regressions - FAIL

2024-06-19 Thread osstest service owner
flight 186425 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186425/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-buildfail REGR. vs. 186411

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
baseline version:
 xen  efa6e9f15ba943d154e8d7b29384581915b2aacd

Last test of basis   186411  2024-06-19 12:00:22 Z0 days
Testing same since   186412  2024-06-19 15:03:58 Z0 days5 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Julien Grall 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  fail
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
Author: Jan Beulich 
Date:   Wed Jun 19 14:11:07 2024 +0200

xen: avoid UB in guest handle arithmetic

At least XENMEM_memory_exchange can have huge values passed in the
nr_extents and nr_exchanged fields. Adding such values to pointers can
overflow, resulting in UB. Cast respective pointers to "unsigned long"
while at the same time making the necessary multiplication explicit.
Remaining arithmetic is, despite there possibly being mathematical
overflow, okay as per the C99 spec: "A computation involving unsigned
operands can never overflow, because a result that cannot be represented
by the resulting unsigned integer type is reduced modulo the number that
is one greater than the largest value that can be represented by the
resulting type." The overflow that we need to guard against is checked
for in array_access_ok().

Note that in / down from array_access_ok() the address value is only
ever cast to "unsigned long" anyway, which is why in the invocation from
guest_handle_subrange_okay() the value doesn't need casting back to
pointer type.

In compat grant table code change two guest_handle_add_offset() to avoid
passing in negative offsets.

Since {,__}clear_guest_offset() need touching anyway, also deal with
another (latent) issue there: They were losing the handle type, i.e. the
size of the individual objects accessed. Luckily the few users we
presently have all pass char or uint8 handles.

Reported-by: Andrew Cooper 
Signed-off-by: Jan Beulich 
Acked-by: Andrew Cooper 
Tested-by: Andrew Cooper 
Release-Acked-By: Oleksii Kurochko 

commit 267122a24c499d26278ab2dbdfb46ebcaaf38474
Author: Andrew Cooper 
Date:   Fri Apr 30 16:14:36 2021 +0100

x86/defns: Clean up X86_{XCR0,XSS}_* constants

With the exception of one case in read_bndcfgu() which can use ilog2(),
the *_POS defines are unused.  Drop them.

X86_XCR0_X87 is the name used by both the SDM and APM, rather than
X86_XCR0_FP.

No functional change.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
Release-Acked-by: Oleksii Kurochko 

commit 71cacfb035f4a78ee10970dc38a3baa04d387451
Author: Andrew Cooper 
Date:   Fri Apr 30 20:17:55 2021 +0100

x86/cpuid: Fix handling of XSAVE dynamic leaves

First, if XSAVE is available in hardware but not visible to the guest, the
dynamic leaves shouldn't be filled in.

Second, the comment concerning XSS state is wrong.  VT-x doesn't manage
host/gu

[ovmf test] 186422: all pass - PUSHED

2024-06-19 Thread osstest service owner
flight 186422 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186422/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 57a890fd03356350a1b7a2a0064c8118f44e9958
baseline version:
 ovmf 95e220e95d6237e21f7c0b83fca02d56b9327c4a

Last test of basis   186414  2024-06-19 17:42:49 Z0 days
Testing same since   186422  2024-06-20 01:56:09 Z0 days1 attempts


People who touched revisions under test:
  Rebecca Cran 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   95e220e95d..57a890fd03  57a890fd03356350a1b7a2a0064c8118f44e9958 -> 
xen-tested-master



[linux-linus test] 186413: regressions - FAIL

2024-06-19 Thread osstest service owner
flight 186413 linux-linus real [real]
flight 186424 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/186413/
http://logs.test-lab.xenproject.org/osstest/logs/186424/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-raw   8 xen-boot fail REGR. vs. 186406

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  8 xen-boot fail  like 186406
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186406
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186406
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186406
 test-armhf-armhf-examine  8 reboot   fail  like 186406
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186406
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186406
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass

version targeted for testing:
 linuxe5b3efbe1ab1793bb49ae07d56d0973267e65112
baseline version:
 linux92e5605a199efbaee59fb19e15d6cc2103a04ec2

Last test of basis   186406  2024-06-19 02:45:48 Z1 days
Testing same since   186413  2024-06-19 17:42:05 Z0 days1 attempts


People who touched revisions under test:
  Christian Marangi 
  Florian Fainelli 
  Ilpo Järvinen 
  Linus Torvalds 
  Martin Schiller 
  Masami Hiramatsu (Google) 
  Thomas Bogendoerfer 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf

[xen-unstable-smoke test] 186420: regressions - FAIL

2024-06-19 Thread osstest service owner
flight 186420 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186420/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-buildfail REGR. vs. 186411

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
baseline version:
 xen  efa6e9f15ba943d154e8d7b29384581915b2aacd

Last test of basis   186411  2024-06-19 12:00:22 Z0 days
Testing same since   186412  2024-06-19 15:03:58 Z0 days4 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Julien Grall 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  fail
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
Author: Jan Beulich 
Date:   Wed Jun 19 14:11:07 2024 +0200

xen: avoid UB in guest handle arithmetic

At least XENMEM_memory_exchange can have huge values passed in the
nr_extents and nr_exchanged fields. Adding such values to pointers can
overflow, resulting in UB. Cast respective pointers to "unsigned long"
while at the same time making the necessary multiplication explicit.
Remaining arithmetic is, despite there possibly being mathematical
overflow, okay as per the C99 spec: "A computation involving unsigned
operands can never overflow, because a result that cannot be represented
by the resulting unsigned integer type is reduced modulo the number that
is one greater than the largest value that can be represented by the
resulting type." The overflow that we need to guard against is checked
for in array_access_ok().

Note that in / down from array_access_ok() the address value is only
ever cast to "unsigned long" anyway, which is why in the invocation from
guest_handle_subrange_okay() the value doesn't need casting back to
pointer type.

In compat grant table code change two guest_handle_add_offset() to avoid
passing in negative offsets.

Since {,__}clear_guest_offset() need touching anyway, also deal with
another (latent) issue there: They were losing the handle type, i.e. the
size of the individual objects accessed. Luckily the few users we
presently have all pass char or uint8 handles.

Reported-by: Andrew Cooper 
Signed-off-by: Jan Beulich 
Acked-by: Andrew Cooper 
Tested-by: Andrew Cooper 
Release-Acked-By: Oleksii Kurochko 

commit 267122a24c499d26278ab2dbdfb46ebcaaf38474
Author: Andrew Cooper 
Date:   Fri Apr 30 16:14:36 2021 +0100

x86/defns: Clean up X86_{XCR0,XSS}_* constants

With the exception of one case in read_bndcfgu() which can use ilog2(),
the *_POS defines are unused.  Drop them.

X86_XCR0_X87 is the name used by both the SDM and APM, rather than
X86_XCR0_FP.

No functional change.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
Release-Acked-by: Oleksii Kurochko 

commit 71cacfb035f4a78ee10970dc38a3baa04d387451
Author: Andrew Cooper 
Date:   Fri Apr 30 20:17:55 2021 +0100

x86/cpuid: Fix handling of XSAVE dynamic leaves

First, if XSAVE is available in hardware but not visible to the guest, the
dynamic leaves shouldn't be filled in.

Second, the comment concerning XSS state is wrong.  VT-x doesn't manage
host/gu

Re: [XEN PATCH 4/5] automation/eclair_analysis: address remaining violations of MISRA C Rule 20.12

2024-06-19 Thread Stefano Stabellini
On Sat, 1 Jun 2024, Nicola Vetrini wrote:
> The DEFINE macro in asm-offsets.c (for all architectures) still generates
> violations despite the file(s) being excluded from compliance, due to the
> fact that in its expansion it sometimes refers entities in non-excluded files.
> These corner cases are deviated by the configuration.
> 
> No functional change.
> 
> Signed-off-by: Nicola Vetrini 

Acked-by: Stefano Stabellini 



Re: [XEN PATCH v3] automation/eclair: extend existing deviations of MISRA C Rule 16.3

2024-06-19 Thread Stefano Stabellini
On Fri, 14 Jun 2024, Federico Serafini wrote:
> Update ECLAIR configuration to deviate more cases where an
> unintentional fallthrough cannot happen.
> 
> Add Rule 16.3 to the monitored set and tag it as clean for arm.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Stefano Stabellini 



Re: [XEN PATCH v3] automation/eclair: add deviation for MISRA C Rule 17.7

2024-06-19 Thread Stefano Stabellini
On Fri, 14 Jun 2024, Federico Serafini wrote:
> Update ECLAIR configuration to deviate some cases where not using
> the return value of a function is not dangerous.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Stefano Stabellini 


> ---
> Changes in v3:
> - removed unwanted underscores;
> - grammar fixed;
> - do not constraint to the first actual argument.
> Changes in v2:
> - do not deviate strlcpy and strlcat.
> ---
>  automation/eclair_analysis/ECLAIR/deviations.ecl | 4 
>  docs/misra/deviations.rst| 9 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl 
> b/automation/eclair_analysis/ECLAIR/deviations.ecl
> index 447c1e6661..97281082a8 100644
> --- a/automation/eclair_analysis/ECLAIR/deviations.ecl
> +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
> @@ -413,6 +413,10 @@ explicit comment indicating the fallthrough intention is 
> present."
>  -config=MC3R1.R17.1,macros+={hide , "^va_(arg|start|copy|end)$"}
>  -doc_end
>  
> +-doc_begin="Not using the return value of a function does not endanger 
> safety if it coincides with an actual argument."
> +-config=MC3R1.R17.7,calls+={safe, "any()", 
> "decl(name(__builtin_memcpy||__builtin_memmove||__builtin_memset||cpumask_check))"}
> +-doc_end
> +
>  #
>  # Series 18.
>  #
> diff --git a/docs/misra/deviations.rst b/docs/misra/deviations.rst
> index 36959aa44a..f3abe31eb5 100644
> --- a/docs/misra/deviations.rst
> +++ b/docs/misra/deviations.rst
> @@ -364,6 +364,15 @@ Deviations related to MISRA C:2012 Rules:
> by `stdarg.h`.
>   - Tagged as `deliberate` for ECLAIR.
>  
> +   * - R17.7
> + - Not using the return value of a function does not endanger safety if 
> it
> +   coincides with an actual argument.
> + - Tagged as `safe` for ECLAIR. Such functions are:
> + - __builtin_memcpy()
> + - __builtin_memmove()
> + - __builtin_memset()
> + - cpumask_check()
> +
> * - R20.4
>   - The override of the keyword \"inline\" in xen/compiler.h is present so
> that section contents checks pass when the compiler chooses not to
> -- 
> 2.34.1
> 



Re: [PATCH] automation/eclair_analysis: deviate common/unlzo.c for MISRA Rule 7.3

2024-06-19 Thread Stefano Stabellini
On Tue, 18 Jun 2024, Jan Beulich wrote:
> On 18.06.2024 11:54, Alessandro Zucchelli wrote:
> > The file contains violations of Rule 7.3 which states as following: The
> > lowercase character `l' shall not be used in a literal suffix.
> > 
> > This file defines a non-compliant constant used in a macro expanded in a
> > non-excluded file, so this deviation is needed in order to avoid
> > a spurious violation involving both files.
> 
> Imo it would be nice to be specific in such cases: Which constant? And
> which macro expanded in which file?

Hi Alessandro,
if you give me the details, I could add it to the commit message on commit



Re: [PATCH v4 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-06-19 Thread Stefano Stabellini
On Wed, 18 Jun 2024, Julien Grall wrote:
> Hi Stefano,
> 
> On 19/06/2024 01:37, Stefano Stabellini wrote:
> > On Mon, 27 May 2024, Oleksii K. wrote:
> > > > I don't think it is a big problem if this is not merged for the code
> > > > freeze as this is technically a bug fix.
> > > 
> > > Agree, this is not a problem as it is still looks to me as a bug fix.
> > > 
> > > ~ Oleksii
> > 
> > Hi Oleksii, this version of the series was already all acked with minor
> > NITs and you gave the go-ahead for this release as it is a bug fix. Due
> > to 2 weeks of travels I only managed to commit the series now, sorry for
> > the delay.
> 
> Unfortunately this series is breaking gitlab CI [1]. I understand the go ahead
> was given two weeks ago, but as we are now past the code freeze, I feel we
> should have had a pros/cons e-mail to assess whether it was worth the risk to
> merge it.
> 
> Now to the issues, I vaguely recall this series didn't require any change in
> Linux. Did I miss anything? If so, why are we breaking Linux?
> 
> For now, I will revert this series. Once we root cause the issue, we can
> re-assess whether the fix should be apply for 4.19.

Hi Julien,

Thanks for acting quickly on this. Reverting the series was the right
thing to do, and I apologize for not waiting for the gitlab-ci results
before committing it.

Like you, my understanding was also that there were no Linux changes
required. I think this will need a deeper investigation that we can do
after the 4.19 release.

Thanks again for unblocking the tree quickly.

- Stefano



[xen-unstable-smoke test] 186418: regressions - FAIL

2024-06-19 Thread osstest service owner
flight 186418 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186418/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-buildfail REGR. vs. 186411

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
baseline version:
 xen  efa6e9f15ba943d154e8d7b29384581915b2aacd

Last test of basis   186411  2024-06-19 12:00:22 Z0 days
Testing same since   186412  2024-06-19 15:03:58 Z0 days3 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Julien Grall 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  fail
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
Author: Jan Beulich 
Date:   Wed Jun 19 14:11:07 2024 +0200

xen: avoid UB in guest handle arithmetic

At least XENMEM_memory_exchange can have huge values passed in the
nr_extents and nr_exchanged fields. Adding such values to pointers can
overflow, resulting in UB. Cast respective pointers to "unsigned long"
while at the same time making the necessary multiplication explicit.
Remaining arithmetic is, despite there possibly being mathematical
overflow, okay as per the C99 spec: "A computation involving unsigned
operands can never overflow, because a result that cannot be represented
by the resulting unsigned integer type is reduced modulo the number that
is one greater than the largest value that can be represented by the
resulting type." The overflow that we need to guard against is checked
for in array_access_ok().

Note that in / down from array_access_ok() the address value is only
ever cast to "unsigned long" anyway, which is why in the invocation from
guest_handle_subrange_okay() the value doesn't need casting back to
pointer type.

In compat grant table code change two guest_handle_add_offset() to avoid
passing in negative offsets.

Since {,__}clear_guest_offset() need touching anyway, also deal with
another (latent) issue there: They were losing the handle type, i.e. the
size of the individual objects accessed. Luckily the few users we
presently have all pass char or uint8 handles.

Reported-by: Andrew Cooper 
Signed-off-by: Jan Beulich 
Acked-by: Andrew Cooper 
Tested-by: Andrew Cooper 
Release-Acked-By: Oleksii Kurochko 

commit 267122a24c499d26278ab2dbdfb46ebcaaf38474
Author: Andrew Cooper 
Date:   Fri Apr 30 16:14:36 2021 +0100

x86/defns: Clean up X86_{XCR0,XSS}_* constants

With the exception of one case in read_bndcfgu() which can use ilog2(),
the *_POS defines are unused.  Drop them.

X86_XCR0_X87 is the name used by both the SDM and APM, rather than
X86_XCR0_FP.

No functional change.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
Release-Acked-by: Oleksii Kurochko 

commit 71cacfb035f4a78ee10970dc38a3baa04d387451
Author: Andrew Cooper 
Date:   Fri Apr 30 20:17:55 2021 +0100

x86/cpuid: Fix handling of XSAVE dynamic leaves

First, if XSAVE is available in hardware but not visible to the guest, the
dynamic leaves shouldn't be filled in.

Second, the comment concerning XSS state is wrong.  VT-x doesn't manage
host/gu

[xen-unstable-smoke test] 186416: regressions - FAIL

2024-06-19 Thread osstest service owner
flight 186416 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186416/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-buildfail REGR. vs. 186411

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
baseline version:
 xen  efa6e9f15ba943d154e8d7b29384581915b2aacd

Last test of basis   186411  2024-06-19 12:00:22 Z0 days
Testing same since   186412  2024-06-19 15:03:58 Z0 days2 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Julien Grall 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  fail
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
Author: Jan Beulich 
Date:   Wed Jun 19 14:11:07 2024 +0200

xen: avoid UB in guest handle arithmetic

At least XENMEM_memory_exchange can have huge values passed in the
nr_extents and nr_exchanged fields. Adding such values to pointers can
overflow, resulting in UB. Cast respective pointers to "unsigned long"
while at the same time making the necessary multiplication explicit.
Remaining arithmetic is, despite there possibly being mathematical
overflow, okay as per the C99 spec: "A computation involving unsigned
operands can never overflow, because a result that cannot be represented
by the resulting unsigned integer type is reduced modulo the number that
is one greater than the largest value that can be represented by the
resulting type." The overflow that we need to guard against is checked
for in array_access_ok().

Note that in / down from array_access_ok() the address value is only
ever cast to "unsigned long" anyway, which is why in the invocation from
guest_handle_subrange_okay() the value doesn't need casting back to
pointer type.

In compat grant table code change two guest_handle_add_offset() to avoid
passing in negative offsets.

Since {,__}clear_guest_offset() need touching anyway, also deal with
another (latent) issue there: They were losing the handle type, i.e. the
size of the individual objects accessed. Luckily the few users we
presently have all pass char or uint8 handles.

Reported-by: Andrew Cooper 
Signed-off-by: Jan Beulich 
Acked-by: Andrew Cooper 
Tested-by: Andrew Cooper 
Release-Acked-By: Oleksii Kurochko 

commit 267122a24c499d26278ab2dbdfb46ebcaaf38474
Author: Andrew Cooper 
Date:   Fri Apr 30 16:14:36 2021 +0100

x86/defns: Clean up X86_{XCR0,XSS}_* constants

With the exception of one case in read_bndcfgu() which can use ilog2(),
the *_POS defines are unused.  Drop them.

X86_XCR0_X87 is the name used by both the SDM and APM, rather than
X86_XCR0_FP.

No functional change.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
Release-Acked-by: Oleksii Kurochko 

commit 71cacfb035f4a78ee10970dc38a3baa04d387451
Author: Andrew Cooper 
Date:   Fri Apr 30 20:17:55 2021 +0100

x86/cpuid: Fix handling of XSAVE dynamic leaves

First, if XSAVE is available in hardware but not visible to the guest, the
dynamic leaves shouldn't be filled in.

Second, the comment concerning XSS state is wrong.  VT-x doesn't manage
host/gu

[ovmf test] 186414: all pass - PUSHED

2024-06-19 Thread osstest service owner
flight 186414 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186414/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 95e220e95d6237e21f7c0b83fca02d56b9327c4a
baseline version:
 ovmf 4d4f56992460c039d0cfe48c394c2e07aecf1d22

Last test of basis   186408  2024-06-19 05:41:10 Z0 days
Testing same since   186414  2024-06-19 17:42:49 Z0 days1 attempts


People who touched revisions under test:
  Ard Biesheuvel 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   4d4f569924..95e220e95d  95e220e95d6237e21f7c0b83fca02d56b9327c4a -> 
xen-tested-master



[PATCH for-4.19 v2] x86/spec-ctrl: Support for SRSO_US_NO and SRSO_MSR_FIX

2024-06-19 Thread Andrew Cooper
AMD have updated the SRSO whitepaper[1] with further information.

There's a new SRSO_U/S_NO enumeration saying that SRSO attacks can't cross the
user/supervisor boundary.  i.e. Xen don't need to use IBPB-on-entry for PV.

There's also a new SRSO_MSR_FIX identifying that the BP_SPEC_REDUCE bit is
available in MSR_BP_CFG.  When set, SRSO attacks can't cross the host/guest
boundary.  i.e. Xen don't need to use IBPB-on-entry for HVM.

Extend ibpb_calculations() to account for these when calculating
opt_ibpb_entry_{pv,hvm} defaults.  Add a bp-spec-reduce option to control the
use of BP_SPEC_REDUCE, but activate it by default.

Because MSR_BP_CFG is core-scoped with a race condition updating it, repurpose
amd_check_erratum_1485() into amd_check_bp_cfg() and calculate all updates at
once.

Advertise SRSO_U/S_NO to guests whenever possible, as it allows the guest
kernel to skip SRSO protections too.  This is easy for HVM guests, but hard
for PV guests, as both the guest userspace and kernel operate in CPL3.  After
discussing with AMD, it is believed to be safe to advertise SRSO_U/S_NO to PV
guests when BP_SPEC_REDUCE is active.

Fix a typo in the SRSO_NO's comment.

[1] 
https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Oleksii Kurochko 

v2:
 * Add "for HVM guests" to xen-command-line.pandoc
 * Print details on boot
 * Don't advertise SRSO_US_NO to PV guests if BP_SPEC_REDUCE isn't active.

For 4.19.  This should be no functional change on current hardware.  On
forthcoming hardware, it avoids the substantial perf hits which were necessary
to protect against the SRSO speculative vulnerability.
---
 docs/misc/xen-command-line.pandoc   |  9 +++-
 xen/arch/x86/cpu-policy.c   | 19 
 xen/arch/x86/cpu/amd.c  | 29 +---
 xen/arch/x86/include/asm/msr-index.h|  1 +
 xen/arch/x86/include/asm/spec_ctrl.h|  1 +
 xen/arch/x86/spec_ctrl.c| 49 -
 xen/include/public/arch-x86/cpufeatureset.h |  4 +-
 7 files changed, 92 insertions(+), 20 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 1dea7431fab6..88beb64525d5 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2390,7 +2390,7 @@ By default SSBD will be mitigated at runtime (i.e 
`ssbd=runtime`).
 >  {ibrs,ibpb,ssbd,psfd,
 >  eager-fpu,l1d-flush,branch-harden,srb-lock,
 >  unpriv-mmio,gds-mit,div-scrub,lock-harden,
->  bhi-dis-s}= ]`
+>  bhi-dis-s,bp-spec-reduce}= ]`
 
 Controls for speculative execution sidechannel mitigations.  By default, Xen
 will pick the most appropriate mitigations based on compiled in support,
@@ -2539,6 +2539,13 @@ boolean can be used to force or prevent Xen from using 
speculation barriers to
 protect lock critical regions.  This mitigation won't be engaged by default,
 and needs to be explicitly enabled on the command line.
 
+On hardware supporting SRSO_MSR_FIX, the `bp-spec-reduce=` option can be used
+to force or prevent Xen from using MSR_BP_CFG.BP_SPEC_REDUCE to mitigate the
+SRSO (Speculative Return Stack Overflow) vulnerability.  Xen will use
+bp-spec-reduce when available, as it is preferable to using `ibpb-entry=hvm`
+to mitigate SRSO for HVM guests, and because it is a necessary prerequisite in
+order to advertise SRSO_U/S_NO to PV guests.
+
 ### sync_console
 > `= `
 
diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 304dc20cfab8..fd32fe84 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct cpu_policy __read_mostly   raw_cpu_policy;
@@ -605,6 +606,24 @@ static void __init calculate_pv_max_policy(void)
 __clear_bit(X86_FEATURE_IBRS, fs);
 }
 
+/*
+ * SRSO_U/S_NO means that the CPU is not vulnerable to SRSO attacks across
+ * the User (CPL3)/Supervisor (CPL<3) boundary.  However the PV64
+ * user/kernel boundary is CPL3 on both sides, so it won't convey the
+ * meaning that a PV kernel expects.
+ *
+ * PV32 guests are explicitly unsupported WRT speculative safety, so are
+ * ignored to avoid complicating the logic.
+ *
+ * After discussions with AMD, it is believed to be safe to offer
+ * SRSO_US_NO to PV guests when BP_SPEC_REDUCE is active.
+ *
+ * If BP_SPEC_REDUCE isn't active, remove SRSO_U/S_NO from the PV max
+ * policy, which will cause it to filter out of PV default too.
+ */
+if ( !boot_cpu_has(X86_FEATURE_SRSO_MSR_FIX) || !opt_bp_spec_reduce )
+__clear_bit(X86_FEATURE_SRSO_US_NO, fs);
+
 guest_common_max_feature_adjustments(fs);
 guest_common_feature_adjustments(fs);
 
diff --git a/xen/arch/x86/cpu/amd.c b/xen/ar

[xen-unstable test] 186409: tolerable FAIL - PUSHED

2024-06-19 Thread osstest service owner
flight 186409 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186409/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186401
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186401
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186401
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186401
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186401
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186401
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  53c5c99e8744495395c1274595d6ca55947d1d6a
baseline version:
 xen  bd59af99700f075d06a6d47a16f777c9519928e0

Last test of basis   186401  2024-06-18 21:08:51 Z0 days
Testing same since   186409  2024-06-19 05:58:35 Z0 days1 attempts


People who touched revisions under test:
  Anthony PERARD 
  Henry Wang 
  Michal Orzel 
  Stefano Stabellini 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   

[xen-unstable-smoke test] 186412: regressions - FAIL

2024-06-19 Thread osstest service owner
flight 186412 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186412/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-buildfail REGR. vs. 186411

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
baseline version:
 xen  efa6e9f15ba943d154e8d7b29384581915b2aacd

Last test of basis   186411  2024-06-19 12:00:22 Z0 days
Testing same since   186412  2024-06-19 15:03:58 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Julien Grall 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  fail
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
Author: Jan Beulich 
Date:   Wed Jun 19 14:11:07 2024 +0200

xen: avoid UB in guest handle arithmetic

At least XENMEM_memory_exchange can have huge values passed in the
nr_extents and nr_exchanged fields. Adding such values to pointers can
overflow, resulting in UB. Cast respective pointers to "unsigned long"
while at the same time making the necessary multiplication explicit.
Remaining arithmetic is, despite there possibly being mathematical
overflow, okay as per the C99 spec: "A computation involving unsigned
operands can never overflow, because a result that cannot be represented
by the resulting unsigned integer type is reduced modulo the number that
is one greater than the largest value that can be represented by the
resulting type." The overflow that we need to guard against is checked
for in array_access_ok().

Note that in / down from array_access_ok() the address value is only
ever cast to "unsigned long" anyway, which is why in the invocation from
guest_handle_subrange_okay() the value doesn't need casting back to
pointer type.

In compat grant table code change two guest_handle_add_offset() to avoid
passing in negative offsets.

Since {,__}clear_guest_offset() need touching anyway, also deal with
another (latent) issue there: They were losing the handle type, i.e. the
size of the individual objects accessed. Luckily the few users we
presently have all pass char or uint8 handles.

Reported-by: Andrew Cooper 
Signed-off-by: Jan Beulich 
Acked-by: Andrew Cooper 
Tested-by: Andrew Cooper 
Release-Acked-By: Oleksii Kurochko 

commit 267122a24c499d26278ab2dbdfb46ebcaaf38474
Author: Andrew Cooper 
Date:   Fri Apr 30 16:14:36 2021 +0100

x86/defns: Clean up X86_{XCR0,XSS}_* constants

With the exception of one case in read_bndcfgu() which can use ilog2(),
the *_POS defines are unused.  Drop them.

X86_XCR0_X87 is the name used by both the SDM and APM, rather than
X86_XCR0_FP.

No functional change.

Signed-off-by: Andrew Cooper 
Acked-by: Jan Beulich 
Release-Acked-by: Oleksii Kurochko 

commit 71cacfb035f4a78ee10970dc38a3baa04d387451
Author: Andrew Cooper 
Date:   Fri Apr 30 20:17:55 2021 +0100

x86/cpuid: Fix handling of XSAVE dynamic leaves

First, if XSAVE is available in hardware but not visible to the guest, the
dynamic leaves shouldn't be filled in.

Second, the comment concerning XSS state is wrong.  VT-x doesn't manage
host/gu

[PATCH 1/2] automation/eclair_analysis: deviate MISRA C Rule 21.2

2024-06-19 Thread Alessandro Zucchelli
Rule 21.2 reports identifiers reserved for the C and POSIX standard
libraries: all xen's translation units are compiled with option
-nostdinc, this guarantees that these libraries are not used, therefore
a justification is provided for allowing uses of such identifiers in
the project.
Builtins starting with "__builtin_" still remain available.

No functional change.

Signed-off-by: Alessandro Zucchelli 
---
 automation/eclair_analysis/ECLAIR/deviations.ecl | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl 
b/automation/eclair_analysis/ECLAIR/deviations.ecl
index 447c1e6661..9fa9a7f01c 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -487,6 +487,17 @@ leads to a violation of the Rule are deviated."
 # Series 21.
 #
 
+-doc_begin="Rules 21.1 and 21.2 report identifiers reserved for the C and POSIX
+standard libraries: if these libraries are not used there is no reason to 
avoid such
+identifiers. All xen's translation units are compiled with option -nostdinc,
+this guarantees that these libraries are not used. Some compilers could perform
+optimization using built-in functions: this risk is partially addressed by
+using the compilation option -fno-builtin. Builtins starting with 
\"__builtin_\"
+still remain available."
+-config=MC3R1.R21.1,macros={safe , "!^__builtin_$" }
+-config=MC3R1.R21.2,declarations+={safe, "!^__builtin_.*$"}
+-doc_end
+
 -doc_begin="Xen does not use the functions provided by the Standard Library, 
but
 implements a set of functions that share the same names as their Standard 
Library equivalent.
 The implementation of these functions is available in source form, so the 
undefined, unspecified
-- 
2.34.1




[PATCH 2/2] x86/APIC: address violation of MISRA C Rule 21.2

2024-06-19 Thread Alessandro Zucchelli
From: Nicola Vetrini 

The rule disallows the usage of an identifier reserved by the C standard.
All identfiers starting with '__' are reserved for any use, so the label
can be renamed in order to avoid the violation.

No functional change.

Signed-off-by: Nicola Vetrini 
---
 xen/arch/x86/apic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 6567af685a..2a60e6fe26 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -925,7 +925,7 @@ void __init init_apic_mappings(void)
 unsigned long apic_phys;
 
 if ( x2apic_enabled )
-goto __next;
+goto next;
 /*
  * If no local APIC can be found then set up a fake all
  * zeroes page to simulate the local APIC and another
@@ -941,7 +941,7 @@ void __init init_apic_mappings(void)
 apic_printk(APIC_VERBOSE, "mapped APIC to %08Lx (%08lx)\n", APIC_BASE,
 apic_phys);
 
-__next:
+next:
 /*
  * Fetch the APIC ID of the BSP in case we have a
  * default configuration (or the MP table is broken).
-- 
2.34.1




[PATCH 0/2] automation/eclair_analysis: deviate MISRA C Rule 21.2

2024-06-19 Thread Alessandro Zucchelli
This series aims to address several violations of Rule 21.2 which states the
following: A reserved identifier or reserved macro name shall not be declared.
The series contains two patches, one changes x86/APIC which used an identifier
starting with '__', the second deviates all reserved identifiers with the 
exception of those starting with "__builtin_" which still remain available.

Alessandro Zucchelli (1):
  automation/eclair_analysis: deviate MISRA C Rule 21.2

Nicola Vetrini (1):
  x86/APIC: address violation of MISRA C Rule 21.2

 automation/eclair_analysis/ECLAIR/deviations.ecl | 11 +++
 xen/arch/x86/apic.c  |  4 ++--
 2 files changed, 13 insertions(+), 2 deletions(-)

-- 
2.34.1




Re: Design session notes: GPU acceleration in Xen

2024-06-19 Thread Alex Deucher
On Wed, Jun 19, 2024 at 12:27 PM Christian König
 wrote:
>
> Am 18.06.24 um 16:12 schrieb Demi Marie Obenour:
> > On Tue, Jun 18, 2024 at 08:33:38AM +0200, Christian König wrote:
> > > Am 18.06.24 um 02:57 schrieb Demi Marie Obenour:
> > >> On Mon, Jun 17, 2024 at 10:46:13PM +0200, Marek Marczykowski-Górecki
> > >> wrote:
> > >>> On Mon, Jun 17, 2024 at 09:46:29AM +0200, Roger Pau Monné wrote:
> >  On Sun, Jun 16, 2024 at 08:38:19PM -0400, Demi Marie Obenour wrote:
> > > In both cases, the device physical
> > > addresses are identical to dom0’s physical addresses.
> > 
> >  Yes, but a PV dom0 physical address space can be very scattered.
> > 
> >  IIRC there's an hypercall to request physically contiguous memory for
> >  PV, but you don't want to be using that every time you allocate a
> >  buffer (not sure it would support the sizes needed by the GPU
> >  anyway).
> > >>
> > >>> Indeed that isn't going to fly. In older Qubes versions we had PV
> > >>> sys-net with PCI passthrough for a network card. After some uptime it
> > >>> was basically impossible to restart and still have enough contagious
> > >>> memory for a network driver, and there it was about _much_ smaller
> > >>> buffers, like 2M or 4M. At least not without shutting down a lot more
> > >>> things to free some more memory.
> > >>
> > >> Ouch!  That makes me wonder if all GPU drivers actually need physically
> > >> contiguous buffers, or if it is (as I suspect) driver-specific. CCing
> > >> Christian König who has mentioned issues in this area.
> >
> > > Well GPUs don't need physical contiguous memory to function, but if they
> > > only get 4k pages to work with it means a quite large (up to 30%)
> > > performance penalty.
> >
> > The status quo is "no GPU acceleration at all", so 70% of bare metal
> > performance would be amazing right now.
>
> Well AMD uses native context approach in XEN which which delivers over
> 90% of bare metal performance.
>
> Pierre-Eric can tell you more, but we certainly have GPU solutions in
> productions with XEN which would suffer greatly if we see the underlying
> memory fragmented like this.
>
> >   However, the implementation
> > should not preclude eliminating this performance penalty in the future.
> >
> > What size pages do GPUs need for good performance?  Is it the same as
> > CPU huge pages?
>
> 2MiB are usually sufficient.

Larger pages are helpful for both system memory and VRAM, but it's
more important for VRAM.

Alex

>
> Regards,
> Christian.
>
> >   PV dom0 doesn't get huge pages at all, but PVH and HVM
> > guests do, and the goal is to move away from PV guests as they have lots
> > of unrelated problems.
> >
> > > So scattering memory like you described is probably a very bad idea
> > if you
> > > want any halve way decent performance.
> >
> > For an initial prototype a 30% performance penalty is acceptable, but
> > it's good to know that memory fragmentation needs to be avoided.
> >
> > > Regards,
> > > Christian
> >
> > Thanks for the prompt response!
>



[PATCH] xen/guest_access: Fix accessors for 32bit builds of Xen

2024-06-19 Thread Andrew Cooper
Gitlab CI reports an ARM32 randconfig failure as follows:

  In file included from common/livepatch.c:9:
  common/livepatch.c: In function ‘livepatch_list’:
  ./include/xen/guest_access.h:130:25: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
130 | __raw_copy_to_guest((void *)(d_ + (off) * sizeof(*_s)), \
| ^
  common/livepatch.c:1283:18: note: in expansion of macro 
‘__copy_to_guest_offset’
   1283 | if ( __copy_to_guest_offset(list->name, name_offset,
|  ^~
  ./include/xen/guest_access.h:130:25: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
130 | __raw_copy_to_guest((void *)(d_ + (off) * sizeof(*_s)), \
| ^
  common/livepatch.c:1287:17: note: in expansion of macro 
‘__copy_to_guest_offset’
   1287 | __copy_to_guest_offset(list->metadata, 
metadata_offset,
| ^~

This isn't specific to ARM32; it's LIVEPATCH on any 32bit build of Xen.

Both name_offset and metadata_offset are uint64_t, meaning that the
expression:

  (d_ + (off) * sizeof(*(hnd).p)

gets promoted to uint64_t, and is too wide to cast back to a pointer in 32bit
builds.  The expression needs casting through (unsigned long) before it can be
cast to (void *).

Xen has the _p() wrapper to do precisely this.

No functional change.

Link: https://gitlab.com/xen-project/xen/-/jobs/7136417308
Fixes: 43d5c5d5f70b ("xen: avoid UB in guest handle arithmetic")
Signed-off-by: Andrew Cooper 
---
CC: George Dunlap 
CC: Jan Beulich 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Oleksii Kurochko 

For 4.19, as this was found by CI.
---
 xen/include/xen/guest_access.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/include/xen/guest_access.h b/xen/include/xen/guest_access.h
index 96dbef2e0251..969813762c25 100644
--- a/xen/include/xen/guest_access.h
+++ b/xen/include/xen/guest_access.h
@@ -65,7 +65,7 @@
 /* Check that the handle is not for a const type */ \
 void *__maybe_unused _t = (hnd).p;  \
 (void)((hnd).p == _s);  \
-raw_copy_to_guest((void *)(d_ + (off) * sizeof(*_s)), \
+raw_copy_to_guest(_p(d_ + (off) * sizeof(*_s)), \
   _s, (nr) * sizeof(*_s));  \
 })
 
@@ -75,7 +75,7 @@
  */
 #define clear_guest_offset(hnd, off, nr) ({ \
 unsigned long d_ = (unsigned long)(hnd).p;  \
-raw_clear_guest((void *)(d_ + (off) * sizeof(*(hnd).p)), \
+raw_clear_guest(_p(d_ + (off) * sizeof(*(hnd).p)),  \
 (nr) * sizeof(*(hnd).p));   \
 })
 
@@ -87,7 +87,7 @@
 unsigned long s_ = (unsigned long)(hnd).p;  \
 typeof(*(ptr)) *_d = (ptr); \
 raw_copy_from_guest(_d, \
-(const void *)(s_ + (off) * sizeof(*_d)), \
+_p(s_ + (off) * sizeof(*_d)),   \
 (nr) * sizeof(*_d));\
 })
 
@@ -127,13 +127,13 @@
 /* Check that the handle is not for a const type */ \
 void *__maybe_unused _t = (hnd).p;  \
 (void)((hnd).p == _s);  \
-__raw_copy_to_guest((void *)(d_ + (off) * sizeof(*_s)), \
+__raw_copy_to_guest(_p(d_ + (off) * sizeof(*_s)),   \
   _s, (nr) * sizeof(*_s));  \
 })
 
 #define __clear_guest_offset(hnd, off, nr) ({   \
 unsigned long d_ = (unsigned long)(hnd).p;  \
-__raw_clear_guest((void *)(d_ + (off) * sizeof(*(hnd).p)), \
+__raw_clear_guest(_p(d_ + (off) * sizeof(*(hnd).p)),\
   (nr) * sizeof(*(hnd).p)); \
 })
 
@@ -141,7 +141,7 @@
 unsigned long s_ = (unsigned long)(hnd).p;  \
 typeof(*(ptr)) *_d = (ptr); \
 __raw_copy_from_guest(_d,   \
-  (const void *)(s_ + (off) * sizeof(*_d)), \
+  _p(s_ + (off) * sizeof(*_d)), \
   (nr) * sizeof(*_d));  \
 })
 

base-commit: 43d5c5d5f70b3f5419e7ef30399d23adf6ddfa8e
-- 
2.39.2




Re: [RFC PATCH] iommu/xen: Add Xen PV-IOMMU driver

2024-06-19 Thread Jason Gunthorpe
On Thu, Jun 13, 2024 at 01:50:22PM +, Teddy Astie wrote:

> +struct iommu_domain *xen_iommu_domain_alloc(unsigned type)
> +{
> + struct xen_iommu_domain *domain;
> + u16 ctx_no;
> + int ret;
> +
> + if (type & IOMMU_DOMAIN_IDENTITY) {
> + /* use default domain */
> + ctx_no = 0;

Please use the new ops, domain_alloc_paging and the static identity domain.

> +static struct iommu_group *xen_iommu_device_group(struct device *dev)
> +{
> + if (!dev_is_pci(dev))
> + return ERR_PTR(-ENODEV);
> +

device_group is only called after probe_device, since you already
exclude !pci during probe there is no need for this wrapper, just set
the op directly to pci_device_group.

> +static void xen_iommu_release_device(struct device *dev)
> +{
> + int ret;
> + struct pci_dev *pdev;
> + struct pv_iommu_op op = {
> + .subop_id = IOMMUOP_reattach_device,
> + .flags = 0,
> + .ctx_no = 0 /* reattach device back to default context */
> + };

Consider if you can use release_domain for this, I think this is
probably a BLOCKED domain behavior.

> + if (!dev_is_pci(dev))
> + return;

No op is ever called on a non-probed device, remove all these checks.

> +static int xen_iommu_map_pages(struct iommu_domain *domain, unsigned long 
> iova,
> +phys_addr_t paddr, 
> size_t pgsize, size_t pgcount,
> +int prot, gfp_t gfp, 
> size_t *mapped)
> +{
> + size_t xen_pg_count = (pgsize / XEN_PAGE_SIZE) * pgcount;
> + struct xen_iommu_domain *dom = to_xen_iommu_domain(domain);
> + struct pv_iommu_op op = {
> + .subop_id = IOMMUOP_map_pages,
> + .flags = 0,
> + .ctx_no = dom->ctx_no
> + };
> + /* NOTE: paddr is actually bound to pfn, not gfn */
> + uint64_t pfn = addr_to_pfn(paddr);
> + uint64_t dfn = addr_to_pfn(iova);
> + int ret = 0;
> +
> + if (WARN(!dom->ctx_no, "Tried to map page to default context"))
> + return -EINVAL;

A paging domain should be the only domain ops that have a populated
map so this should be made impossible by construction.

Jason



Re: [PATCH] AMD/IOMMU: Improve register_iommu_exclusion_range()

2024-06-19 Thread Andrew Cooper
On 19/06/2024 8:45 am, Jan Beulich wrote:
> On 18.06.2024 20:31, Andrew Cooper wrote:
>>  * Use 64bit accesses instead of 32bit accesses
>>  * Simplify the constant names
>>  * Pull base into a local variable to avoid it being reloaded because of the
>>memory clobber in writeq().
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Jan Beulich 
>> CC: Roger Pau Monné 
>>
>> RFC.  This is my proposed way of cleaning up the whole IOMMU file.  The
>> diffstat speaks for itself.
> Absolutely.
>
>> I've finally found the bit in the AMD IOMMU spec which says 64bit accesses 
>> are
>> permitted:
>>
>>   3.4 IOMMU MMIO Registers:
>>
>>   Software access to IOMMU registers may not be larger than 64 bits. Accesses
>>   must be aligned to the size of the access and the size in bytes must be a
>>   power of two. Software may use accesses as small as one byte.
> I take it that the use of 32-bit writes was because of the past need
> also work in a 32-bit hypervisor, not because of perceived restrictions
> by the spec.

I recall having problems getting writeq() acked in the past, even after
we'd dropped 32bit.

But this is the first time that I've positively found anything in the
spec saying that 64bit accesses are ok.

>
>> --- a/xen/drivers/passthrough/amd/iommu-defs.h
>> +++ b/xen/drivers/passthrough/amd/iommu-defs.h
>> @@ -338,22 +338,10 @@ union amd_iommu_control {
>>  };
>>  
>>  /* Exclusion Register */
>> -#define IOMMU_EXCLUSION_BASE_LOW_OFFSET 0x20
>> -#define IOMMU_EXCLUSION_BASE_HIGH_OFFSET0x24
>> -#define IOMMU_EXCLUSION_LIMIT_LOW_OFFSET0x28
>> -#define IOMMU_EXCLUSION_LIMIT_HIGH_OFFSET   0x2C
>> -#define IOMMU_EXCLUSION_BASE_LOW_MASK   0xF000U
>> -#define IOMMU_EXCLUSION_BASE_LOW_SHIFT  12
>> -#define IOMMU_EXCLUSION_BASE_HIGH_MASK  0xU
>> -#define IOMMU_EXCLUSION_BASE_HIGH_SHIFT 0
>> -#define IOMMU_EXCLUSION_RANGE_ENABLE_MASK   0x0001U
>> -#define IOMMU_EXCLUSION_RANGE_ENABLE_SHIFT  0
>> -#define IOMMU_EXCLUSION_ALLOW_ALL_MASK  0x0002U
>> -#define IOMMU_EXCLUSION_ALLOW_ALL_SHIFT 1
>> -#define IOMMU_EXCLUSION_LIMIT_LOW_MASK  0xF000U
>> -#define IOMMU_EXCLUSION_LIMIT_LOW_SHIFT 12
>> -#define IOMMU_EXCLUSION_LIMIT_HIGH_MASK 0xU
>> -#define IOMMU_EXCLUSION_LIMIT_HIGH_SHIFT0
>> +#define IOMMU_MMIO_EXCLUSION_BASE   0x20
>> +#define   EXCLUSION_RANGE_ENABLE(1 << 0)
>> +#define   EXCLUSION_ALLOW_ALL   (1 << 1)
>> +#define IOMMU_MMIO_EXCLUSION_LIMIT  0x28
> Just one question here: Previously you suggested we switch to bitfields
> for anything like this, and we've already done so with e.g.
> union amd_iommu_control and union amd_iommu_ext_features. IOW I wonder
> if we wouldn't better strive to be consistent in this regard. Or if not,
> what the (written or unwritten) guidelines are when to use which
> approach.

We've got two very different kinds of things here.

The device table/etc are in-memory WB datastructure which we're
interpreting and editing routinely.  It's completely full of bits and
small fields, and letting the compiler do the hard work there is
preferable; certainly in terms of legibility.

This example is an MMIO register (in a bar on the IOMMU PCI device, even
though we find the address in the IVRS).  We set it up once at boot and
don't touch it afterwards.

So while we could make a struct for it, we'd still need to get it into a
form that we can writeq(), and that's more code than the single case
were we need to put two metadata bits into an address.

~Andrew



Re: [PATCH 14/26] block: move the nonrot flag to queue_limits

2024-06-19 Thread Simon Fernandez
Hi folks, how can I unsubscribe from this group.?
Thanks in advance.
S

> On 17 Jun 2024, at 07:04, Christoph Hellwig  wrote:
> 
> Move the nonrot flag into the queue_limits feature field so that it can
> be set atomically with the queue frozen.
> 
> Use the chance to switch to defaulting to non-rotational and require
> the driver to opt into rotational, which matches the polarity of the
> sysfs interface.
> 
> For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new
> rotational flag is not set as they clearly are not rotational despite
> this being a behavior change.  There are some other drivers that
> unconditionally set the rotational flag to keep the existing behavior
> as they arguably can be used on rotational devices even if that is
> probably not their main use today (e.g. virtio_blk and drbd).
> 
> The flag is automatically inherited in blk_stack_limits matching the
> existing behavior in dm and md.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Damien Le Moal 
> ---
> arch/m68k/emu/nfblock.c |  1 +
> arch/um/drivers/ubd_kern.c  |  1 -
> arch/xtensa/platforms/iss/simdisk.c |  5 +++-
> block/blk-mq-debugfs.c  |  1 -
> block/blk-sysfs.c   | 39 ++---
> drivers/block/amiflop.c |  5 +++-
> drivers/block/aoe/aoeblk.c  |  1 +
> drivers/block/ataflop.c |  5 +++-
> drivers/block/brd.c |  2 --
> drivers/block/drbd/drbd_main.c  |  3 ++-
> drivers/block/floppy.c  |  3 ++-
> drivers/block/loop.c|  8 +++---
> drivers/block/mtip32xx/mtip32xx.c   |  1 -
> drivers/block/n64cart.c |  2 --
> drivers/block/nbd.c |  5 
> drivers/block/null_blk/main.c   |  1 -
> drivers/block/pktcdvd.c |  1 +
> drivers/block/ps3disk.c |  3 ++-
> drivers/block/rbd.c |  3 ---
> drivers/block/rnbd/rnbd-clt.c   |  4 ---
> drivers/block/sunvdc.c  |  1 +
> drivers/block/swim.c|  5 +++-
> drivers/block/swim3.c   |  5 +++-
> drivers/block/ublk_drv.c|  9 +++
> drivers/block/virtio_blk.c  |  4 ++-
> drivers/block/xen-blkfront.c|  1 -
> drivers/block/zram/zram_drv.c   |  2 --
> drivers/cdrom/gdrom.c   |  1 +
> drivers/md/bcache/super.c   |  2 --
> drivers/md/dm-table.c   | 12 -
> drivers/md/md.c | 13 --
> drivers/mmc/core/queue.c|  1 -
> drivers/mtd/mtd_blkdevs.c   |  1 -
> drivers/nvdimm/btt.c|  1 -
> drivers/nvdimm/pmem.c   |  1 -
> drivers/nvme/host/core.c|  1 -
> drivers/nvme/host/multipath.c   |  1 -
> drivers/s390/block/dasd_genhd.c |  1 -
> drivers/s390/block/scm_blk.c|  1 -
> drivers/scsi/sd.c   |  4 +--
> include/linux/blkdev.h  | 10 
> 41 files changed, 83 insertions(+), 88 deletions(-)
> 
> diff --git a/arch/m68k/emu/nfblock.c b/arch/m68k/emu/nfblock.c
> index 642fb80c5c4e31..8eea7ef9115146 100644
> --- a/arch/m68k/emu/nfblock.c
> +++ b/arch/m68k/emu/nfblock.c
> @@ -98,6 +98,7 @@ static int __init nfhd_init_one(int id, u32 blocks, u32 
> bsize)
> {
>   struct queue_limits lim = {
>   .logical_block_size = bsize,
> + .features   = BLK_FEAT_ROTATIONAL,
>   };
>   struct nfhd_device *dev;
>   int dev_id = id - NFHD_DEV_OFFSET;
> diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
> index 19e01691ea0ea7..9f1e76ddda5a26 100644
> --- a/arch/um/drivers/ubd_kern.c
> +++ b/arch/um/drivers/ubd_kern.c
> @@ -882,7 +882,6 @@ static int ubd_add(int n, char **error_out)
>   goto out_cleanup_tags;
>   }
> 
> - blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
>   disk->major = UBD_MAJOR;
>   disk->first_minor = n << UBD_SHIFT;
>   disk->minors = 1 << UBD_SHIFT;
> diff --git a/arch/xtensa/platforms/iss/simdisk.c 
> b/arch/xtensa/platforms/iss/simdisk.c
> index defc67909a9c74..d6d2b533a5744d 100644
> --- a/arch/xtensa/platforms/iss/simdisk.c
> +++ b/arch/xtensa/platforms/iss/simdisk.c
> @@ -263,6 +263,9 @@ static const struct proc_ops simdisk_proc_ops = {
> static int __init simdisk_setup(struct simdisk *dev, int which,
>   struct proc_dir_entry *procdir)
> {
> + struct queue_limits lim = {
> + .features   = BLK_FEAT_ROTATIONAL,
> + };
>   char tmp[2] = { '0' + which, 0 };
>   int err;
> 
> @@ -271,7 +274,7 @@ static int __init simdisk_setup(struct simdisk *dev, int 
> which,
>   spin_lock_init(&dev->lock);
>   dev->users = 0;
> 
> - dev->gd = blk_alloc_disk(NULL, NUMA_NO_NODE);
> + dev->gd = blk_alloc_disk(&lim, NUMA_NO_NODE);
>   if (IS_ERR(dev->gd)) {
>   err = PTR_ERR(dev->gd);
>   goto out;
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c

Re: [PATCH for-4.19? v6 6/9] xen: Make the maximum number of altp2m views configurable for x86

2024-06-19 Thread Petr Beneš
On Thu, Jun 13, 2024 at 2:03 PM Jan Beulich  wrote:
> > @@ -510,13 +526,13 @@ int p2m_change_altp2m_gfn(struct domain *d, unsigned 
> > int idx,
> >  mfn_t mfn;
> >  int rc = -EINVAL;
> >
> > -if ( idx >=  min(ARRAY_SIZE(d->arch.altp2m_p2m), MAX_EPTP) ||
> > +if ( idx >= d->nr_altp2m ||
> >   d->arch.altp2m_eptp[array_index_nospec(idx, MAX_EPTP)] ==
>
> This ends up being suspicious: The range check is against a value different
> from what is passed to array_index_nospec(). The two weren't the same
> before either, but there the range check was more strict (which now isn't
> visible anymore, even though I think it would still be true). Imo this
> wants a comment, or an assertion effectively taking the place of a comment.

> Since they're all "is this slot populated" checks, maybe we want
> an is_altp2m_eptp_valid() helper?

Let me see if I understand correctly. You're suggesting the condition
should be replaced with something like this? (Also, I would suggest
altp2m_is_eptp_valid() name, since it's consistent e.g. with
p2m_is_altp2m().)

static inline bool altp2m_is_eptp_valid(const struct domain *d,
unsigned int idx)
{
/*
 * EPTP index is correlated with altp2m index and should not exceed
 * d->nr_altp2m.
 */
assert(idx < d->nr_altp2m);

return idx < MAX_EPTP &&
d->arch.altp2m_eptp[array_index_nospec(idx, MAX_EPTP)] !=
mfn_x(INVALID_MFN);
}

Note that in the codebase there are also very similar checks, but
again without array_index_nospec. For instance, in the
p2m_altp2m_propagate_change() function (which is called fairly
frequently):

int p2m_altp2m_propagate_change(struct domain *d, gfn_t gfn,
mfn_t mfn, unsigned int page_order,
p2m_type_t p2mt, p2m_access_t p2ma)
{
struct p2m_domain *p2m;
unsigned int i;
unsigned int reset_count = 0;
unsigned int last_reset_idx = ~0;
int ret = 0;

if ( !altp2m_active(d) )
return 0;

altp2m_list_lock(d);

for ( i = 0; i < d->nr_altp2m; i++ )
{
p2m_type_t t;
p2m_access_t a;

// XXX this could be replaced with altp2m_is_eptp_valid(), but
based on previous review remarks,
// it would introduce unnecessary perf. hit. So, should these
occurrences left unchanged?
if ( d->arch.altp2m_eptp[i] == mfn_x(INVALID_MFN) )
continue;

   ...

There are more instances of this. Which re-opens again the issue from
previous conversation: should I introduce a function which will be
used in some cases (where _nospec is used) and not used elsewhere?

P.



Re: xen | Failed pipeline for staging | 43d5c5d5

2024-06-19 Thread Andrew Cooper
On 19/06/2024 3:44 pm, GitLab wrote:
> GitLab
> ✖ Pipeline #1338876222 has failed!
> 
>  
> Project   xen-project  / xen
> 
> Branch
>   staging 
> 
> Commit
>   43d5c5d5
> 
> 
> xen: avoid UB in guest handle arithmetic At le...
> Commit Author 
>   Jan Beulich 
> 
>  
> Pipeline #1338876222
>  triggered by
>   Ganis 
> 
> had 1 failed job
> Failed job
> ✖ build
> 
>   debian-bookworm-gcc-arm32-debug-randconfig
> 

This is:

In file included from common/livepatch.c:9:
common/livepatch.c: In function 'livepatch_list':
./include/xen/guest_access.h:130:25: error: cast to pointer from integer
of different size [-Werror=int-to-pointer-cast]
  130 | __raw_copy_to_guest((void *)(d_ + (off) * sizeof(*_s)), \
  | ^
common/livepatch.c:1283:18: note: in expansion of macro
'__copy_to_guest_offset'
 1283 | if ( __copy_to_guest_offset(list->name, name_offset,
  |  ^~
./include/xen/guest_access.h:130:25: error: cast to pointer from integer
of different size [-Werror=int-to-pointer-cast]
  130 | __raw_copy_to_guest((void *)(d_ + (off) * sizeof(*_s)), \
  | ^
common/livepatch.c:1287:17: note: in expansion of macro
'__copy_to_guest_offset'
 1287 | __copy_to_guest_offset(list->metadata,
metadata_offset,
  | ^~


The problem is that (off) is of type uint64_t, so

   (const void *)(s_ + (off) * sizeof(*_d))

ends up being a uint64_t -> uint32_t down-convert in arm32.

This wants to use the _p() macro which takes care of casting through
(unsigned long) on its way to a pointer.  I'll do a patch.

~Andrew



[libvirt test] 186407: tolerable all pass - PUSHED

2024-06-19 Thread osstest service owner
flight 186407 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186407/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186390
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass

version targeted for testing:
 libvirt  2b199ad3f1a69bd6de1b2a1fc7d5bd31817dcb11
baseline version:
 libvirt  3fff8c91b059ec450a1c94c8fced685a7de19d42

Last test of basis   186390  2024-06-18 04:18:45 Z1 days
Testing same since   186407  2024-06-19 04:22:27 Z0 days1 attempts


People who touched revisions under test:
  Adam Julis 
  Boris Fiuczynski 
  Daniel P. Berrangé 
  Marc Hartmayer 
  Michal Privoznik 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-amd64-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-amd64-amd64-libvirt-raw pass
 test-arm64-arm64-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd pass
 test-armhf-armhf-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/libvirt.git
   3fff8c91b0..2b199ad3f1  2b199ad3f1a69bd6de1b2a1fc7d5bd31817dcb11 -> 
xen-tested-master



[xen-unstable-smoke test] 186411: tolerable all pass - PUSHED

2024-06-19 Thread osstest service owner
flight 186411 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186411/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  efa6e9f15ba943d154e8d7b29384581915b2aacd
baseline version:
 xen  53c5c99e8744495395c1274595d6ca55947d1d6a

Last test of basis   186404  2024-06-19 01:00:23 Z0 days
Testing same since   186411  2024-06-19 12:00:22 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 
  Michal Orzel 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   53c5c99e87..efa6e9f15b  efa6e9f15ba943d154e8d7b29384581915b2aacd -> smoke



Re: [PATCH for-4.19] hotplug: Restore block-tap phy compatibility

2024-06-19 Thread Anthony PERARD
On Wed, Jun 19, 2024 at 02:07:04PM +0200, Jan Beulich wrote:
> On 16.05.2024 15:52, Jason Andryuk wrote:
> > On 2024-05-16 03:41, Jan Beulich wrote:
> >> On 16.05.2024 04:22, Jason Andryuk wrote:
> >>> From: Jason Andryuk 
> >>>
> >>> From: Jason Andryuk 
> >>
> >> Two identical From: (also in another patch of yours, while in yet another 
> >> one
> >> you have two _different_ ones, when only one will survive into the eventual
> >> commit anyway)?
> > 
> > Sorry about that.  Since I was sending from my gmail account, I thought 
> > I needed explicit From: lines to ensure the authorship was listed w/ 
> > amd.com.  I generated the patches with `git format-patch --from`, to get 
> > the explicit From: lines, and then sent with `git send-email`.  The 
> > send-email step then inserted the additional lines.  I guess it added 
> >  From amd.com since I had changed to that address in .gitconfig.
> > 
> >>> backendtype=phy using the blktap kernel module needs to use write_dev,
> >>> but tapback can't support that.  tapback should perform better, but make
> >>> the script compatible with the old kernel module again.
> >>>
> >>> Signed-off-by: Jason Andryuk 
> >>
> >> Should there be a Fixes: tag here?
> > 
> > That makes sense.
> > 
> > Fixes: 76a484193d ("hotplug: Update block-tap")
> 
> Surely this wants going into 4.19? Thus - Anthony, Oleksii?

Yes, I think so.

Acked-by: Anthony PERARD 

Thanks,

-- 


Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



Re: [PATCH] loadpolicy: Verifies memory allocation during policy loading

2024-06-19 Thread Daniel P. Smith

On 6/19/24 08:04, Jan Beulich wrote:

On 27.05.2024 14:54, ysk...@gmail.com wrote:

--- a/tools/flask/utils/loadpolicy.c
+++ b/tools/flask/utils/loadpolicy.c
@@ -58,6 +58,11 @@ int main (int argCnt, const char *args[])
  }
  
  polMemCp = malloc(info.st_size);

+if (!polMemCp) {
+fprintf(stderr, "Error occurred allocating %ld bytes\n", info.st_size);
+ret = -ENOMEM;


I don't think -ENOMEM is valid to use here. See neighboring code. Nevertheless
it is correct that a check should be here.

As to %ld - is that portably usable with an off_t value?

In any event, Daniel, really your turn to review / ack. I'm looking at this
merely because I found this and another bugfix still sit in waiting-for-ack
state.


I saw this but was on the fence of whether it really required my ack 
since it was more of a toolstack code fix versus an XSM relevant change.


With that said, and to expand on Jan's comment regarding ENOMEM, the 
utility does not currently differentiate main's return code. Unless the 
tools maintainer wants to start changing this, I would suggest setting 
ret to -1.


As to the '%ld', aligning with Jan's first comment, perhaps you might 
consider just reporting `strerror(errno)` similar to the other error 
handling checks. NB: it is likely errno will be set to -ENOMEM, so by 
doing this you will end up notifying ENOMEM occurred as you were 
attempting to do by providing it with `ret`. Additionally, then you 
won't have to deal with portability concerns over off_t.


V/r,
Daniel P. Smith





Re: move features flags into queue_limits v2

2024-06-19 Thread Christoph Hellwig
On Wed, Jun 19, 2024 at 08:21:14AM -0600, Jens Axboe wrote:
> Please check for-6.11/block, as I pulled in the changes to the main
> block branch and that threw some merge conflicts mostly due to Damien's
> changes in for-6.11/block. While fixing those up, I also came across
> oddities like:
> 
> (limits->features & limits->features & BLK_FEAT_ZONED)) {
> 
> which don't make much sense and hence I changed them to
> 
> (limits->features & BLK_FEAT_ZONED)) {

Yeah.  The above is harmless but of course completely pointless.



Re: move features flags into queue_limits v2

2024-06-19 Thread Jens Axboe
On 6/19/24 8:18 AM, Jens Axboe wrote:
> 
> On Mon, 17 Jun 2024 08:04:27 +0200, Christoph Hellwig wrote:
>> this is the third and last major series to convert settings to
>> queue_limits for this merge window.  After a bunch of prep patches to
>> get various drivers in shape, it moves all the queue_flags that specify
>> driver controlled features into the queue limits so that they can be
>> set atomically and are separated from the blk-mq internal flags.
>>
>> Note that I've only Cc'ed the maintainers for drivers with non-mechanical
>> changes as the Cc list is already huge.
>>
>> [...]
> 
> Applied, thanks!

Please check for-6.11/block, as I pulled in the changes to the main
block branch and that threw some merge conflicts mostly due to Damien's
changes in for-6.11/block. While fixing those up, I also came across
oddities like:

(limits->features & limits->features & BLK_FEAT_ZONED)) {

which don't make much sense and hence I changed them to

(limits->features & BLK_FEAT_ZONED)) {

-- 
Jens Axboe




Re: move features flags into queue_limits v2

2024-06-19 Thread Jens Axboe


On Mon, 17 Jun 2024 08:04:27 +0200, Christoph Hellwig wrote:
> this is the third and last major series to convert settings to
> queue_limits for this merge window.  After a bunch of prep patches to
> get various drivers in shape, it moves all the queue_flags that specify
> driver controlled features into the queue limits so that they can be
> set atomically and are separated from the blk-mq internal flags.
> 
> Note that I've only Cc'ed the maintainers for drivers with non-mechanical
> changes as the Cc list is already huge.
> 
> [...]

Applied, thanks!

[01/26] xen-blkfront: don't disable cache flushes when they fail
commit: dd9300e9eaeeb212f77ffeb72d1d8756107f1f1f
[02/26] sd: remove sd_is_zoned
commit: be60e7700e6df1e16a2f60f45bece08e6140a46d
[03/26] sd: move zone limits setup out of sd_read_block_characteristics
commit: 308ad58af49d6c4c3b7a36b98972cc9db4d7b36a
[04/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd
commit: c9055b44abe60da69aa4ee4fdcb78ee7fe75
[05/26] loop: always update discard settings in loop_reconfigure_limits
commit: ae0d40ff49642651f969883ef9fc79d69c1632d7
[06/26] loop: regularize upgrading the block size for direct I/O
commit: a17ece76bcfe7b86327b19cae1652d7c62068a30
[07/26] loop: also use the default block size from an underlying block device
commit: 4ce37fe0938b02b7b947029c40b72d76a22a3882
[08/26] loop: fold loop_update_rotational into loop_reconfigure_limits
commit: 97dd4a43d69b74a114be466d6887e257971adfe9
[09/26] virtio_blk: remove virtblk_update_cache_mode
commit: bbe5c84122b35c37f2706872fe34da66f0854b56
[10/26] nbd: move setting the cache control flags to __nbd_set_size
commit: 6b377787a306253111404325aee98005b361e59a
[11/26] block: freeze the queue in queue_attr_store
commit: af2814149883e2c1851866ea2afcd8eadc040f79
[12/26] block: remove blk_flush_policy
commit: 70905f8706b62113ae32c8df721384ff6ffb6c6a
[13/26] block: move cache control settings out of queue->flags
commit: 1122c0c1cc71f740fa4d5f14f239194e06a1d5e7
[14/26] block: move the nonrot flag to queue_limits
commit: bd4a633b6f7c3c6b6ebc1a07317643270e751a94
[15/26] block: move the add_random flag to queue_limits
commit: 39a9f1c334f9f27b3b3e6d0005c10ed667268346
[16/26] block: move the io_stat flag setting to queue_limits
commit: cdb2497918cc2929691408bac87b58433b45b6d3
[17/26] block: move the stable_writes flag to queue_limits
commit: 1a02f3a73f8c670eddeb44bf52a75ae7f67cfc11
[18/26] block: move the synchronous flag to queue_limits
commit: aadd5c59c910427c0464c217d5ed588ff14e2502
[19/26] block: move the nowait flag to queue_limits
commit: f76af42f8bf13d2620084f305f01691de9238fc7
[20/26] block: move the dax flag to queue_limits
commit: f467fee48da4500786e145489787b37adae317c3
[21/26] block: move the poll flag to queue_limits
commit: 8023e144f9d6e35f8786937e2f0c2fea0aba6dbc
[22/26] block: move the zoned flag into the features field
commit: b1fc937a55f5735b98d9dceae5bb6ba262501f56
[23/26] block: move the zone_resetall flag to queue_limits
commit: a52758a39768f441e468a41da6c15a59d6d6011a
[24/26] block: move the pci_p2pdma flag to queue_limits
commit: 9c1e42e3c876c66796eda23e79836a4d92613a61
[25/26] block: move the skip_tagset_quiesce flag to queue_limits
commit: 8c8f5c85b20d0a7dc0ab9b2a17318130d69ceb5a
[26/26] block: move the bounce flag into the features field
commit: 339d3948c07b4aa2940aeb874294a7d6782cec16

Best regards,
-- 
Jens Axboe






Re: MISRA C Rule 5.3 violation - shadowing in mctelem.c

2024-06-19 Thread Jan Beulich
On 19.06.2024 15:23, Nicola Vetrini wrote:
> I was looking at the shadowing due to the struct identifier and the 
> local variables "mctctl" in x86/cpu/mcheck/mctelem.c (see [1], the 
> second report). This kind of shadowing seems very intentional, and the 
> initial naive approach I devised was to simply rename the local 
> variables.
> This, however, results in build breakages, as sometimes the shadowed 
> name seems to be used for accessing the global struct (unless I'm 
> missing something), and as a result changing the name of the locals is 
> not possible, at least not without further modifications to this file, 
> which aren't obvious to me.
> 
> It would be really helpful if you could point me to either:
> - avoid the shadowing in some way that does not occur to me at the 
> moment;

Could you please be more specific about the issues you encountered? I
hope you don't expect everyone reading this request of yours to (try to)
redo what you did. The only thing I could vaguely guess is that maybe
you went a little too far with the renaming. Plus, just from looking at
the grep output, did you try to simply move down the file scope variable?
It looks like all shadowing instances are ahead of any uses of the
variable (but I may easily be overlooking an important line contradicting
that pattern).

> - deviate this file, as many similar files in x86/cpu are already 
> deviated.

I question the presence of these in those files. They were apparently all
added when the files were introduced, and said commit - from Simone, acked
by Stefano - came with no justification at all.

Jan



MISRA C Rule 5.3 violation - shadowing in mctelem.c

2024-06-19 Thread Nicola Vetrini

Hi all,

I was looking at the shadowing due to the struct identifier and the 
local variables "mctctl" in x86/cpu/mcheck/mctelem.c (see [1], the 
second report). This kind of shadowing seems very intentional, and the 
initial naive approach I devised was to simply rename the local 
variables.
This, however, results in build breakages, as sometimes the shadowed 
name seems to be used for accessing the global struct (unless I'm 
missing something), and as a result changing the name of the locals is 
not possible, at least not without further modifications to this file, 
which aren't obvious to me.


It would be really helpful if you could point me to either:
- avoid the shadowing in some way that does not occur to me at the 
moment;
- deviate this file, as many similar files in x86/cpu are already 
deviated.


What's your opinion on this?

Thanks,
  Nicola

[1] 
https://saas.eclairit.com:3787/fs/var/local/eclair/XEN.ecdf/ECLAIR_normal/staging/X86_64-BUGSENG/latest/PROJECT.ecd;/by_service/MC3R1.R5.3.html


--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [PATCH] tools/libs/light: Fix nic->vlan memory allocation

2024-06-19 Thread Anthony PERARD
On Mon, May 20, 2024 at 01:08:03PM -0400, Jason Andryuk wrote:
> On 2024-05-20 12:44, Leigh Brown wrote:
> > After the following commit:
> > 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
> > xl list -l aborts with a double free error if a domain has at least
> > one vif defined:
> > 
> >$ sudo xl list -l
> >free(): double free detected in tcache 2
> >Aborted
> > 
> > Orginally, the vlan field was called vid and was defined as an integer.
> > It was appropriate to call libxl__xs_read_checked() with gc passed as
> > the string data was copied to a different variable.  However, the final
> > version uses a string data type and the call should have been changed
> > to use NOGC instead of gc to allow that data to live past the gc
> > controlled lifetime, in line with the other string fields.
> > 
> > This patch makes the change to pass NOGC instead of gc and moves the
> > new code to be next to the other string fields (fixing a couple of
> > errant tabs along the way), as recommended by Jason.
> > 
> > Fixes: 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
> > Signed-off-by: Leigh Brown 
> 
> Reviewed-by: Jason Andryuk 

Acked-by: Anthony PERARD 

Thanks,

-- 


Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



Re: [PATCH for-4.19] xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure

2024-06-19 Thread Julien Grall




On 19/06/2024 13:06, Michal Orzel wrote:

Hi Julien,

On 19/06/2024 13:55, Julien Grall wrote:



Hi Michal,

On 19/06/2024 07:46, Michal Orzel wrote:

Building Xen with CONFIG_STATIC_SHM=y results in a build failure:

arch/arm/static-shmem.c: In function 'process_shm':
arch/arm/static-shmem.c:327:41: error: 'gbase' may be used uninitialized 
[-Werror=maybe-uninitialized]
327 | if ( is_domain_direct_mapped(d) && (pbase != gbase) )
arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
305 | paddr_t gbase, pbase, psize;

This is because the commit cb1ddafdc573 adds a check referencing
gbase/pbase variables which were not yet assigned a value. Fix it.

Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be direct-mapped for 
direct-mapped domains")
Signed-off-by: Michal Orzel 
---
Rationale for 4.19: this patch fixes a build failure reported by CI:
https://gitlab.com/xen-project/xen/-/jobs/7131807878
---
   xen/arch/arm/static-shmem.c | 13 +++--
   1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index c434b96e6204..cd48d2896b7e 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -324,12 +324,6 @@ int __init process_shm(struct domain *d, struct 
kernel_info *kinfo,
   printk("%pd: static shared memory bank not found: '%s'", d, 
shm_id);
   return -ENOENT;
   }
-if ( is_domain_direct_mapped(d) && (pbase != gbase) )
-{
-printk("%pd: physical address 0x%"PRIpaddr" and guest address 
0x%"PRIpaddr" are not direct-mapped.\n",
-   d, pbase, gbase);
-return -EINVAL;
-}

   pbase = boot_shm_bank->start;
   psize = boot_shm_bank->size;
@@ -353,6 +347,13 @@ int __init process_shm(struct domain *d, struct 
kernel_info *kinfo,
   /* guest phys address is after host phys address */
   gbase = dt_read_paddr(cells + addr_cells, addr_cells);

+if ( is_domain_direct_mapped(d) && (pbase != gbase) )
+{
+printk("%pd: physical address 0x%"PRIpaddr" and guest address 
0x%"PRIpaddr" are not direct-mapped.\n",
+   d, pbase, gbase);
+return -EINVAL;
+}
+


Before this patch, the check was globally. I guess the intention was it
covers the two part of the "if". But now, you only have it in when
"paddr" is specified in the DT.

  From a brief look at the code, I can't figure out why we don't need a
similar check on the else path. Is this because it is guarantee that
will be paddr == gaddr?

The reason why I added this check only in the first case is due to what doc 
states.
It says that if a domain is 1:1, the shmem should be also 1:1 i.e. pbase == 
gbase. In the else
case the pbase is omitted and thus a user cannot know and has no guarantee what 
will be the backing physical address.


The property "direct-map" has the following definition:

"- direct-map

Only available when statically allocated memory is used for the domain.
An empty property to request the memory of the domain to be
direct-map (guest physical address == physical address).
"

So I think it would be fair for someone to interpret it as shared memory 
would also be 1:1 mapped.



Thus, reading this doc makes me feel that for 1:1 guests user needs to specify 
pbase == gbase.


See above, I think this is not 100% clear. I am concerned that someone 
may try to use the version where only the guest address is specified.


It would likely be hard to realize that the extended regions would not 
work properly. So something needs to be done.


I don't have any preference on how to address. It could simply be a 
check in Xen to request that both "gaddr" and "paddr" are specified for 
direct mapped domain.


Cheers,

--
Julien Grall



Re: [PATCH for-4.19] hotplug: Restore block-tap phy compatibility

2024-06-19 Thread Jan Beulich
On 16.05.2024 15:52, Jason Andryuk wrote:
> On 2024-05-16 03:41, Jan Beulich wrote:
>> On 16.05.2024 04:22, Jason Andryuk wrote:
>>> From: Jason Andryuk 
>>>
>>> From: Jason Andryuk 
>>
>> Two identical From: (also in another patch of yours, while in yet another one
>> you have two _different_ ones, when only one will survive into the eventual
>> commit anyway)?
> 
> Sorry about that.  Since I was sending from my gmail account, I thought 
> I needed explicit From: lines to ensure the authorship was listed w/ 
> amd.com.  I generated the patches with `git format-patch --from`, to get 
> the explicit From: lines, and then sent with `git send-email`.  The 
> send-email step then inserted the additional lines.  I guess it added 
>  From amd.com since I had changed to that address in .gitconfig.
> 
>>> backendtype=phy using the blktap kernel module needs to use write_dev,
>>> but tapback can't support that.  tapback should perform better, but make
>>> the script compatible with the old kernel module again.
>>>
>>> Signed-off-by: Jason Andryuk 
>>
>> Should there be a Fixes: tag here?
> 
> That makes sense.
> 
> Fixes: 76a484193d ("hotplug: Update block-tap")

Surely this wants going into 4.19? Thus - Anthony, Oleksii?

Jan



Re: [PATCH for-4.19] xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure

2024-06-19 Thread Michal Orzel
Hi Julien,

On 19/06/2024 13:55, Julien Grall wrote:
> 
> 
> Hi Michal,
> 
> On 19/06/2024 07:46, Michal Orzel wrote:
>> Building Xen with CONFIG_STATIC_SHM=y results in a build failure:
>>
>> arch/arm/static-shmem.c: In function 'process_shm':
>> arch/arm/static-shmem.c:327:41: error: 'gbase' may be used uninitialized 
>> [-Werror=maybe-uninitialized]
>>327 | if ( is_domain_direct_mapped(d) && (pbase != gbase) )
>> arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
>>305 | paddr_t gbase, pbase, psize;
>>
>> This is because the commit cb1ddafdc573 adds a check referencing
>> gbase/pbase variables which were not yet assigned a value. Fix it.
>>
>> Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be 
>> direct-mapped for direct-mapped domains")
>> Signed-off-by: Michal Orzel 
>> ---
>> Rationale for 4.19: this patch fixes a build failure reported by CI:
>> https://gitlab.com/xen-project/xen/-/jobs/7131807878
>> ---
>>   xen/arch/arm/static-shmem.c | 13 +++--
>>   1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
>> index c434b96e6204..cd48d2896b7e 100644
>> --- a/xen/arch/arm/static-shmem.c
>> +++ b/xen/arch/arm/static-shmem.c
>> @@ -324,12 +324,6 @@ int __init process_shm(struct domain *d, struct 
>> kernel_info *kinfo,
>>   printk("%pd: static shared memory bank not found: '%s'", d, 
>> shm_id);
>>   return -ENOENT;
>>   }
>> -if ( is_domain_direct_mapped(d) && (pbase != gbase) )
>> -{
>> -printk("%pd: physical address 0x%"PRIpaddr" and guest address 
>> 0x%"PRIpaddr" are not direct-mapped.\n",
>> -   d, pbase, gbase);
>> -return -EINVAL;
>> -}
>>
>>   pbase = boot_shm_bank->start;
>>   psize = boot_shm_bank->size;
>> @@ -353,6 +347,13 @@ int __init process_shm(struct domain *d, struct 
>> kernel_info *kinfo,
>>   /* guest phys address is after host phys address */
>>   gbase = dt_read_paddr(cells + addr_cells, addr_cells);
>>
>> +if ( is_domain_direct_mapped(d) && (pbase != gbase) )
>> +{
>> +printk("%pd: physical address 0x%"PRIpaddr" and guest 
>> address 0x%"PRIpaddr" are not direct-mapped.\n",
>> +   d, pbase, gbase);
>> +return -EINVAL;
>> +}
>> +
> 
> Before this patch, the check was globally. I guess the intention was it
> covers the two part of the "if". But now, you only have it in when
> "paddr" is specified in the DT.
> 
>  From a brief look at the code, I can't figure out why we don't need a
> similar check on the else path. Is this because it is guarantee that
> will be paddr == gaddr?
The reason why I added this check only in the first case is due to what doc 
states.
It says that if a domain is 1:1, the shmem should be also 1:1 i.e. pbase == 
gbase. In the else
case the pbase is omitted and thus a user cannot know and has no guarantee what 
will be the backing physical address.
Thus, reading this doc makes me feel that for 1:1 guests user needs to specify 
pbase == gbase.

~Michal




[linux-linus test] 186406: tolerable FAIL - PUSHED

2024-06-19 Thread osstest service owner
flight 186406 linux-linus real [real]
flight 186410 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/186406/
http://logs.test-lab.xenproject.org/osstest/logs/186410/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-arndale   8 xen-bootfail pass in 186410-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail in 186410 never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail in 186410 never 
pass
 test-armhf-armhf-libvirt  8 xen-boot fail  like 186398
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186398
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186398
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186398
 test-armhf-armhf-examine  8 reboot   fail  like 186398
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail like 
186398
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186398
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186398
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass

version targeted for testing:
 linux92e5605a199efbaee59fb19e15d6cc2103a04ec2
baseline version:
 linux46d1907d1ca422ae814c52065f243caa010a

Last test of basis   186398  2024-06-18 16:13:58 Z0 days
Testing same since   186406  2024-06-19 02:45:48 Z0 days1 attempts


People who touched revisions under test:
  Amer Al Shanawany 
  John Hubbard 
  Linus Torvalds 
  Shuah Khan 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd6

Re: [PATCH] loadpolicy: Verifies memory allocation during policy loading

2024-06-19 Thread Jan Beulich
On 27.05.2024 14:54, ysk...@gmail.com wrote:
> --- a/tools/flask/utils/loadpolicy.c
> +++ b/tools/flask/utils/loadpolicy.c
> @@ -58,6 +58,11 @@ int main (int argCnt, const char *args[])
>  }
>  
>  polMemCp = malloc(info.st_size);
> +if (!polMemCp) {
> +fprintf(stderr, "Error occurred allocating %ld bytes\n", 
> info.st_size);
> +ret = -ENOMEM;

I don't think -ENOMEM is valid to use here. See neighboring code. Nevertheless
it is correct that a check should be here.

As to %ld - is that portably usable with an off_t value?

In any event, Daniel, really your turn to review / ack. I'm looking at this
merely because I found this and another bugfix still sit in waiting-for-ack
state.

Jan



Re: [PATCH for-4.19?] tools/libs/light: Fix nic->vlan memory allocation

2024-06-19 Thread Jan Beulich
On 20.05.2024 19:08, Jason Andryuk wrote:
> On 2024-05-20 12:44, Leigh Brown wrote:
>> After the following commit:
>> 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
>> xl list -l aborts with a double free error if a domain has at least
>> one vif defined:
>>
>>$ sudo xl list -l
>>free(): double free detected in tcache 2
>>Aborted
>>
>> Orginally, the vlan field was called vid and was defined as an integer.
>> It was appropriate to call libxl__xs_read_checked() with gc passed as
>> the string data was copied to a different variable.  However, the final
>> version uses a string data type and the call should have been changed
>> to use NOGC instead of gc to allow that data to live past the gc
>> controlled lifetime, in line with the other string fields.
>>
>> This patch makes the change to pass NOGC instead of gc and moves the
>> new code to be next to the other string fields (fixing a couple of
>> errant tabs along the way), as recommended by Jason.
>>
>> Fixes: 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
>> Signed-off-by: Leigh Brown 
> 
> Reviewed-by: Jason Andryuk 

I notice this wasn't Cc-ed to the maintainer, which likely is the reason
for there not having been an ack yet. Anthony, any thoughts?

Further at this point, bug fix or not, it would likely also need a release
ack. Oleksii, thoughts?

Jan



Re: [PATCH for-4.19] xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure

2024-06-19 Thread Julien Grall

Hi Michal,

On 19/06/2024 07:46, Michal Orzel wrote:

Building Xen with CONFIG_STATIC_SHM=y results in a build failure:

arch/arm/static-shmem.c: In function 'process_shm':
arch/arm/static-shmem.c:327:41: error: 'gbase' may be used uninitialized 
[-Werror=maybe-uninitialized]
   327 | if ( is_domain_direct_mapped(d) && (pbase != gbase) )
arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
   305 | paddr_t gbase, pbase, psize;

This is because the commit cb1ddafdc573 adds a check referencing
gbase/pbase variables which were not yet assigned a value. Fix it.

Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be direct-mapped for 
direct-mapped domains")
Signed-off-by: Michal Orzel 
---
Rationale for 4.19: this patch fixes a build failure reported by CI:
https://gitlab.com/xen-project/xen/-/jobs/7131807878
---
  xen/arch/arm/static-shmem.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index c434b96e6204..cd48d2896b7e 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -324,12 +324,6 @@ int __init process_shm(struct domain *d, struct 
kernel_info *kinfo,
  printk("%pd: static shared memory bank not found: '%s'", d, 
shm_id);
  return -ENOENT;
  }
-if ( is_domain_direct_mapped(d) && (pbase != gbase) )
-{
-printk("%pd: physical address 0x%"PRIpaddr" and guest address 
0x%"PRIpaddr" are not direct-mapped.\n",
-   d, pbase, gbase);
-return -EINVAL;
-}
  
  pbase = boot_shm_bank->start;

  psize = boot_shm_bank->size;
@@ -353,6 +347,13 @@ int __init process_shm(struct domain *d, struct 
kernel_info *kinfo,
  /* guest phys address is after host phys address */
  gbase = dt_read_paddr(cells + addr_cells, addr_cells);
  
+if ( is_domain_direct_mapped(d) && (pbase != gbase) )

+{
+printk("%pd: physical address 0x%"PRIpaddr" and guest address 
0x%"PRIpaddr" are not direct-mapped.\n",
+   d, pbase, gbase);
+return -EINVAL;
+}
+


Before this patch, the check was globally. I guess the intention was it 
covers the two part of the "if". But now, you only have it in when 
"paddr" is specified in the DT.


From a brief look at the code, I can't figure out why we don't need a 
similar check on the else path. Is this because it is guarantee that 
will be paddr == gaddr?


Cheers,

--
Julien Grall



Re: [PATCH for-4.19 v4] x86/irq: forward pending interrupts to new destination in fixup_irqs()

2024-06-19 Thread Jan Beulich
On 19.06.2024 11:58, Roger Pau Monne wrote:
> fixup_irqs() is used to evacuate interrupts from to be offlined CPUs.  Given
> the CPU is to become offline, the normal migration logic used by Xen where the
> vector in the previous target(s) is left configured until the interrupt is
> received on the new destination is not suitable.
> 
> Instead attempt to do as much as possible in order to prevent loosing
> interrupts.  If fixup_irqs() is called from the CPU to be offlined (as is
> currently the case for CPU hot unplug) attempt to forward pending vectors when
> interrupts that target the current CPU are migrated to a different 
> destination.
> 
> Additionally, for interrupts that have already been moved from the current CPU
> prior to the call to fixup_irqs() but that haven't been delivered to the new
> destination (iow: interrupts with move_in_progress set and the current CPU set
> in ->arch.old_cpu_mask) also check whether the previous vector is pending and
> forward it to the new destination.
> 
> This allows us to remove the window with interrupts enabled at the bottom of
> fixup_irqs().  Such window wasn't safe anyway: references to the CPU to become
> offline are removed from interrupts masks, but the per-CPU vector_irq[] array
> is not updated to reflect those changes (as the CPU is going offline anyway).
> 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 





Re: [PATCH for-4.19 v4.5 2/7] x86/xstate: Cross-check dynamic XSTATE sizes at boot

2024-06-19 Thread Jan Beulich
On 19.06.2024 12:46, Andrew Cooper wrote:
> Right now, xstate_ctxt_size() performs a cross-check of size with CPUID in for
> every call.  This is expensive, being used for domain create/migrate, as well
> as to service certain guest CPUID instructions.
> 
> Instead, arrange to check the sizes once at boot.  See the code comments for
> details.  Right now, it just checks hardware against the algorithm
> expectations.  Later patches will cross-check Xen's XSTATE calculations too.
> 
> Introduce more X86_XCR0_* and X86_XSS_* constants CPUID bits.  This is to
> maximise coverage in the sanity check, even if we don't expect to
> use/virtualise some of these features any time soon.  Leave HDC and HWP alone
> for now; we don't have CPUID bits from them stored nicely.
> 
> Only perform the cross-checks when SELF_TESTS are active.  It's only
> developers or new hardware liable to trip these checks, and Xen at least
> tracks "maximum value ever seen in xcr0" for the lifetime of the VM, which we
> don't want to be tickling in the general case.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 





Re: [PATCH v4 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-06-19 Thread Julien Grall

Hi Stefano,

On 19/06/2024 01:37, Stefano Stabellini wrote:

On Mon, 27 May 2024, Oleksii K. wrote:

I don't think it is a big problem if this is not merged for the code
freeze as this is technically a bug fix.


Agree, this is not a problem as it is still looks to me as a bug fix.

~ Oleksii


Hi Oleksii, this version of the series was already all acked with minor
NITs and you gave the go-ahead for this release as it is a bug fix. Due
to 2 weeks of travels I only managed to commit the series now, sorry for
the delay.


Unfortunately this series is breaking gitlab CI [1]. I understand the go 
ahead was given two weeks ago, but as we are now past the code freeze, I 
feel we should have had a pros/cons e-mail to assess whether it was 
worth the risk to merge it.


Now to the issues, I vaguely recall this series didn't require any 
change in Linux. Did I miss anything? If so, why are we breaking Linux?


For now, I will revert this series. Once we root cause the issue, we can 
re-assess whether the fix should be apply for 4.19.


Cheers,

[1] https://gitlab.com/xen-project/xen/-/pipelines/1338067978

--
Julien Grall



Re: [XEN PATCH v2] xen: add explicit comment to identify notifier patterns

2024-06-19 Thread Jan Beulich
On 19.06.2024 13:21, Julien Grall wrote:
> 
> 
> On 19/06/2024 12:17, Julien Grall wrote:
>> Hi Federico,
>>
>> On 19/06/2024 10:29, Federico Serafini wrote:
>>> MISRA C Rule 16.4 states that every `switch' statement shall have a
>>> `default' label" and a statement or a comment prior to the
>>> terminating break statement.
>>>
>>> This patch addresses some violations of the rule related to the
>>> "notifier pattern": a frequently-used pattern whereby only a few values
>>> are handled by the switch statement and nothing should be done for
>>> others (nothing to do in the default case).
>>>
>>> Note that for function mwait_idle_cpu_init() in
>>> xen/arch/x86/cpu/mwait-idle.c the /* Notifier pattern. */ comment is
>>> not added: differently from the other functions covered in this patch,
>>> the default label has a return statement that does not violates Rule 
>>> 16.4.
>>>
>>> No functional change.
>>>
>>> Signed-off-by: Federico Serafini 
>>> ---
>>> Changes in v2:
>>> as Jan pointed out, in the v1 some patterns were not explicitly 
>>> identified
>>> (https://lore.kernel.org/xen-devel/cad05a5c-e2d8-4e5d-af05-30ae6f959...@bugseng.com/).
>>>
>>> This version adds the /* Notifier pattern. */ to all the pattern 
>>> present in
>>> the Xen codebase except for mwait_idle_cpu_init().
>>> ---
>>>   xen/arch/arm/cpuerrata.c | 1 +
>>>   xen/arch/arm/gic-v3-lpi.c    | 4 
>>>   xen/arch/arm/gic.c   | 1 +
>>>   xen/arch/arm/irq.c   | 4 
>>>   xen/arch/arm/mmu/p2m.c   | 1 +
>>>   xen/arch/arm/percpu.c    | 1 +
>>>   xen/arch/arm/smpboot.c   | 1 +
>>>   xen/arch/arm/time.c  | 1 +
>>>   xen/arch/arm/vgic-v3-its.c   | 2 ++
>>>   xen/arch/x86/acpi/cpu_idle.c | 4 
>>>   xen/arch/x86/cpu/mcheck/mce.c    | 4 
>>>   xen/arch/x86/cpu/mcheck/mce_intel.c  | 4 
>>>   xen/arch/x86/genapic/x2apic.c    | 3 +++
>>>   xen/arch/x86/hvm/hvm.c   | 1 +
>>>   xen/arch/x86/nmi.c   | 1 +
>>>   xen/arch/x86/percpu.c    | 3 +++
>>>   xen/arch/x86/psr.c   | 3 +++
>>>   xen/arch/x86/smpboot.c   | 3 +++
>>>   xen/common/kexec.c   | 1 +
>>>   xen/common/rcupdate.c    | 1 +
>>>   xen/common/sched/core.c  | 1 +
>>>   xen/common/sched/cpupool.c   | 1 +
>>>   xen/common/spinlock.c    | 1 +
>>>   xen/common/tasklet.c | 1 +
>>>   xen/common/timer.c   | 1 +
>>>   xen/drivers/cpufreq/cpufreq.c    | 1 +
>>>   xen/drivers/cpufreq/cpufreq_misc_governors.c | 3 +++
>>>   xen/drivers/passthrough/x86/hvm.c    | 3 +++
>>>   xen/drivers/passthrough/x86/iommu.c  | 3 +++
>>>   29 files changed, 59 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
>>> index 2b7101ea25..69c30aecd8 100644
>>> --- a/xen/arch/arm/cpuerrata.c
>>> +++ b/xen/arch/arm/cpuerrata.c
>>> @@ -730,6 +730,7 @@ static int cpu_errata_callback(struct 
>>> notifier_block *nfb,
>>>   rc = enable_nonboot_cpu_caps(arm_errata);
>>>   break;
>>>   default:
>>> +    /* Notifier pattern. */
>> Without looking at the commit message (which may not be trivial when 
>> committed), it is not clear to me what this is supposed to mean. Will 
>> there be a longer explanation in the MISRA doc? Should this be a SAF-* 
>> comment?
> 
> Please ignore this comment. Just found it in the rules.rst.

Except that there it is only an example (and such an example could even
be replaced at any time). Already on the previous version I had asked
that some explanation be added as to what this means and under what
circumstances it is legitimate to add (kind of related to a later part
of the earlier reply of yours).

Jan



Re: [PATCH for-4.19] xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure

2024-06-19 Thread Michal Orzel
Hi Julein,

On 19/06/2024 13:37, Julien Grall wrote:
> 
> 
> Hi,
> 
> On 19/06/2024 12:19, Oleksii K. wrote:
>> Hi,
>> On Wed, 2024-06-19 at 09:02 +, Bertrand Marquis wrote:
>>> Hi,
>>>
>>> Adding Oleksii for Release ack.
>>>
>>> Cheers
>>> Bertrand
>>>
 On 19 Jun 2024, at 08:46, Michal Orzel 
 wrote:

 Building Xen with CONFIG_STATIC_SHM=y results in a build failure:

 arch/arm/static-shmem.c: In function 'process_shm':
 arch/arm/static-shmem.c:327:41: error: 'gbase' may be used
 uninitialized [-Werror=maybe-uninitialized]
   327 | if ( is_domain_direct_mapped(d) && (pbase != gbase)
 )
 arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
   305 | paddr_t gbase, pbase, psize;

 This is because the commit cb1ddafdc573 adds a check referencing
 gbase/pbase variables which were not yet assigned a value. Fix it.

 Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be
 direct-mapped for direct-mapped domains")
 Signed-off-by: Michal Orzel 
>> Release-Acked-by: Oleksii Kurochko 
> 
> I have committed to unblock the CI. But I have some questions on the
> approach. I will ask them separately.
The CI failures as seen in:
https://gitlab.com/xen-project/xen/-/pipelines/1338067978
are due to 2 issues. This patch solves the first one. The other is related to 
Henry's xenstore
series that without a corresponding Linux patch, that has been merged into 
mainline, causes a regression.
And thus all the dom0less PV tests fail. We will need to revert the xenstore 
patches for now.

~Michal



Re: [PATCH for-4.19] xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure

2024-06-19 Thread Julien Grall

Hi,

On 19/06/2024 12:19, Oleksii K. wrote:

Hi,
On Wed, 2024-06-19 at 09:02 +, Bertrand Marquis wrote:

Hi,

Adding Oleksii for Release ack.

Cheers
Bertrand


On 19 Jun 2024, at 08:46, Michal Orzel 
wrote:

Building Xen with CONFIG_STATIC_SHM=y results in a build failure:

arch/arm/static-shmem.c: In function 'process_shm':
arch/arm/static-shmem.c:327:41: error: 'gbase' may be used
uninitialized [-Werror=maybe-uninitialized]
  327 | if ( is_domain_direct_mapped(d) && (pbase != gbase)
)
arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
  305 | paddr_t gbase, pbase, psize;

This is because the commit cb1ddafdc573 adds a check referencing
gbase/pbase variables which were not yet assigned a value. Fix it.

Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be
direct-mapped for direct-mapped domains")
Signed-off-by: Michal Orzel 

Release-Acked-by: Oleksii Kurochko 


I have committed to unblock the CI. But I have some questions on the 
approach. I will ask them separately.


Cheers,

--
Julien Grall



Re: [PATCH for-4.19 v4 0/7] x86/xstate: Fixes to size calculations

2024-06-19 Thread Oleksii K.
On Mon, 2024-06-17 at 18:39 +0100, Andrew Cooper wrote:
> Only minor changes in v4 vs v3.  See patches for details.
> 
> The end result has been tested across the entire XenServer hardware
> lab.  This
> found several false assupmtion about how the dynamic sizes behave.
Release-Acked-by: Oleksii Kurochko 

~ Oleksii
> 
> Patches 1 and 6 the main bugfixes from this series.  There's still
> lots more
> work to do in order to get AMX and/or CET working, but this is at
> least a 4-yo
> series finally off my plate.
> 
> Andrew Cooper (7):
>   x86/xstate: Fix initialisation of XSS cache
>   x86/xstate: Cross-check dynamic XSTATE sizes at boot
>   x86/boot: Collect the Raw CPU Policy earlier on boot
>   x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()
>   x86/cpu-policy: Simplify recalculate_xstate()
>   x86/cpuid: Fix handling of XSAVE dynamic leaves
>   x86/defns: Clean up X86_{XCR0,XSS}_* constants
> 
>  xen/arch/x86/cpu-policy.c   |  56 ++--
>  xen/arch/x86/cpuid.c    |  24 +-
>  xen/arch/x86/domctl.c   |   2 +-
>  xen/arch/x86/hvm/hvm.c  |   2 +-
>  xen/arch/x86/i387.c |   2 +-
>  xen/arch/x86/include/asm/x86-defns.h    |  55 ++--
>  xen/arch/x86/include/asm/xstate.h   |   8 +-
>  xen/arch/x86/setup.c    |   4 +-
>  xen/arch/x86/xstate.c   | 294 +-
> --
>  xen/include/public/arch-x86/cpufeatureset.h |   3 +
>  xen/include/xen/lib/x86/cpu-policy.h    |   2 +-
>  11 files changed, 330 insertions(+), 122 deletions(-)
> 



Re: [XEN PATCH v2] xen: add explicit comment to identify notifier patterns

2024-06-19 Thread Julien Grall




On 19/06/2024 12:17, Julien Grall wrote:

Hi Federico,

On 19/06/2024 10:29, Federico Serafini wrote:

MISRA C Rule 16.4 states that every `switch' statement shall have a
`default' label" and a statement or a comment prior to the
terminating break statement.

This patch addresses some violations of the rule related to the
"notifier pattern": a frequently-used pattern whereby only a few values
are handled by the switch statement and nothing should be done for
others (nothing to do in the default case).

Note that for function mwait_idle_cpu_init() in
xen/arch/x86/cpu/mwait-idle.c the /* Notifier pattern. */ comment is
not added: differently from the other functions covered in this patch,
the default label has a return statement that does not violates Rule 
16.4.


No functional change.

Signed-off-by: Federico Serafini 
---
Changes in v2:
as Jan pointed out, in the v1 some patterns were not explicitly 
identified

(https://lore.kernel.org/xen-devel/cad05a5c-e2d8-4e5d-af05-30ae6f959...@bugseng.com/).

This version adds the /* Notifier pattern. */ to all the pattern 
present in

the Xen codebase except for mwait_idle_cpu_init().
---
  xen/arch/arm/cpuerrata.c | 1 +
  xen/arch/arm/gic-v3-lpi.c    | 4 
  xen/arch/arm/gic.c   | 1 +
  xen/arch/arm/irq.c   | 4 
  xen/arch/arm/mmu/p2m.c   | 1 +
  xen/arch/arm/percpu.c    | 1 +
  xen/arch/arm/smpboot.c   | 1 +
  xen/arch/arm/time.c  | 1 +
  xen/arch/arm/vgic-v3-its.c   | 2 ++
  xen/arch/x86/acpi/cpu_idle.c | 4 
  xen/arch/x86/cpu/mcheck/mce.c    | 4 
  xen/arch/x86/cpu/mcheck/mce_intel.c  | 4 
  xen/arch/x86/genapic/x2apic.c    | 3 +++
  xen/arch/x86/hvm/hvm.c   | 1 +
  xen/arch/x86/nmi.c   | 1 +
  xen/arch/x86/percpu.c    | 3 +++
  xen/arch/x86/psr.c   | 3 +++
  xen/arch/x86/smpboot.c   | 3 +++
  xen/common/kexec.c   | 1 +
  xen/common/rcupdate.c    | 1 +
  xen/common/sched/core.c  | 1 +
  xen/common/sched/cpupool.c   | 1 +
  xen/common/spinlock.c    | 1 +
  xen/common/tasklet.c | 1 +
  xen/common/timer.c   | 1 +
  xen/drivers/cpufreq/cpufreq.c    | 1 +
  xen/drivers/cpufreq/cpufreq_misc_governors.c | 3 +++
  xen/drivers/passthrough/x86/hvm.c    | 3 +++
  xen/drivers/passthrough/x86/iommu.c  | 3 +++
  29 files changed, 59 insertions(+)

diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
index 2b7101ea25..69c30aecd8 100644
--- a/xen/arch/arm/cpuerrata.c
+++ b/xen/arch/arm/cpuerrata.c
@@ -730,6 +730,7 @@ static int cpu_errata_callback(struct 
notifier_block *nfb,

  rc = enable_nonboot_cpu_caps(arm_errata);
  break;
  default:
+    /* Notifier pattern. */
Without looking at the commit message (which may not be trivial when 
committed), it is not clear to me what this is supposed to mean. Will 
there be a longer explanation in the MISRA doc? Should this be a SAF-* 
comment?


Please ignore this comment. Just found it in the rules.rst.




  break;
  }
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index eb0a5535e4..4c2bd35403 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -389,6 +389,10 @@ static int cpu_callback(struct notifier_block 
*nfb, unsigned long action,
  printk(XENLOG_ERR "Unable to allocate the pendtable for 
CPU%lu\n",

 cpu);
  break;
+
+    default:
+    /* Notifier pattern. */
+    break;


Skimming through v1, it was pointed out that gic-v3-lpi may miss some 
cases.


Let me start with that I understand this patch is technically not 
changing anything. However, it gives us an opportunity to check the 
notifier pattern.


Has anyone done any proper investigation? If so, what was the outcome? 
If not, have we identified someone to do it?


The same question will apply for place where you add "default".

Cheers,



--
Julien Grall



Re: [PATCH for-4.19] xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure

2024-06-19 Thread Oleksii K.
Hi, 
On Wed, 2024-06-19 at 09:02 +, Bertrand Marquis wrote:
> Hi,
> 
> Adding Oleksii for Release ack.
> 
> Cheers
> Bertrand
> 
> > On 19 Jun 2024, at 08:46, Michal Orzel 
> > wrote:
> > 
> > Building Xen with CONFIG_STATIC_SHM=y results in a build failure:
> > 
> > arch/arm/static-shmem.c: In function 'process_shm':
> > arch/arm/static-shmem.c:327:41: error: 'gbase' may be used
> > uninitialized [-Werror=maybe-uninitialized]
> >  327 | if ( is_domain_direct_mapped(d) && (pbase != gbase)
> > )
> > arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
> >  305 | paddr_t gbase, pbase, psize;
> > 
> > This is because the commit cb1ddafdc573 adds a check referencing
> > gbase/pbase variables which were not yet assigned a value. Fix it.
> > 
> > Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be
> > direct-mapped for direct-mapped domains")
> > Signed-off-by: Michal Orzel 
Release-Acked-by: Oleksii Kurochko 

~ Oleksii
> > ---
> > Rationale for 4.19: this patch fixes a build failure reported by
> > CI:
> > https://gitlab.com/xen-project/xen/-/jobs/7131807878
> > ---
> > xen/arch/arm/static-shmem.c | 13 +++--
> > 1 file changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-
> > shmem.c
> > index c434b96e6204..cd48d2896b7e 100644
> > --- a/xen/arch/arm/static-shmem.c
> > +++ b/xen/arch/arm/static-shmem.c
> > @@ -324,12 +324,6 @@ int __init process_shm(struct domain *d,
> > struct kernel_info *kinfo,
> >     printk("%pd: static shared memory bank not found:
> > '%s'", d, shm_id);
> >     return -ENOENT;
> >     }
> > -    if ( is_domain_direct_mapped(d) && (pbase != gbase) )
> > -    {
> > -    printk("%pd: physical address 0x%"PRIpaddr" and guest
> > address 0x%"PRIpaddr" are not direct-mapped.\n",
> > -   d, pbase, gbase);
> > -    return -EINVAL;
> > -    }
> > 
> >     pbase = boot_shm_bank->start;
> >     psize = boot_shm_bank->size;
> > @@ -353,6 +347,13 @@ int __init process_shm(struct domain *d,
> > struct kernel_info *kinfo,
> >     /* guest phys address is after host phys address */
> >     gbase = dt_read_paddr(cells + addr_cells, addr_cells);
> > 
> > +    if ( is_domain_direct_mapped(d) && (pbase != gbase) )
> > +    {
> > +    printk("%pd: physical address 0x%"PRIpaddr" and
> > guest address 0x%"PRIpaddr" are not direct-mapped.\n",
> > +   d, pbase, gbase);
> > +    return -EINVAL;
> > +    }
> > +
> >     for ( i = 0; i < PFN_DOWN(psize); i++ )
> >     if ( !mfn_valid(mfn_add(maddr_to_mfn(pbase), i)) )
> >     {
> > -- 
> > 2.25.1
> > 
> 



Re: [XEN PATCH v2] xen: add explicit comment to identify notifier patterns

2024-06-19 Thread Julien Grall

Hi Federico,

On 19/06/2024 10:29, Federico Serafini wrote:

MISRA C Rule 16.4 states that every `switch' statement shall have a
`default' label" and a statement or a comment prior to the
terminating break statement.

This patch addresses some violations of the rule related to the
"notifier pattern": a frequently-used pattern whereby only a few values
are handled by the switch statement and nothing should be done for
others (nothing to do in the default case).

Note that for function mwait_idle_cpu_init() in
xen/arch/x86/cpu/mwait-idle.c the /* Notifier pattern. */ comment is
not added: differently from the other functions covered in this patch,
the default label has a return statement that does not violates Rule 16.4.

No functional change.

Signed-off-by: Federico Serafini 
---
Changes in v2:
as Jan pointed out, in the v1 some patterns were not explicitly identified
(https://lore.kernel.org/xen-devel/cad05a5c-e2d8-4e5d-af05-30ae6f959...@bugseng.com/).

This version adds the /* Notifier pattern. */ to all the pattern present in
the Xen codebase except for mwait_idle_cpu_init().
---
  xen/arch/arm/cpuerrata.c | 1 +
  xen/arch/arm/gic-v3-lpi.c| 4 
  xen/arch/arm/gic.c   | 1 +
  xen/arch/arm/irq.c   | 4 
  xen/arch/arm/mmu/p2m.c   | 1 +
  xen/arch/arm/percpu.c| 1 +
  xen/arch/arm/smpboot.c   | 1 +
  xen/arch/arm/time.c  | 1 +
  xen/arch/arm/vgic-v3-its.c   | 2 ++
  xen/arch/x86/acpi/cpu_idle.c | 4 
  xen/arch/x86/cpu/mcheck/mce.c| 4 
  xen/arch/x86/cpu/mcheck/mce_intel.c  | 4 
  xen/arch/x86/genapic/x2apic.c| 3 +++
  xen/arch/x86/hvm/hvm.c   | 1 +
  xen/arch/x86/nmi.c   | 1 +
  xen/arch/x86/percpu.c| 3 +++
  xen/arch/x86/psr.c   | 3 +++
  xen/arch/x86/smpboot.c   | 3 +++
  xen/common/kexec.c   | 1 +
  xen/common/rcupdate.c| 1 +
  xen/common/sched/core.c  | 1 +
  xen/common/sched/cpupool.c   | 1 +
  xen/common/spinlock.c| 1 +
  xen/common/tasklet.c | 1 +
  xen/common/timer.c   | 1 +
  xen/drivers/cpufreq/cpufreq.c| 1 +
  xen/drivers/cpufreq/cpufreq_misc_governors.c | 3 +++
  xen/drivers/passthrough/x86/hvm.c| 3 +++
  xen/drivers/passthrough/x86/iommu.c  | 3 +++
  29 files changed, 59 insertions(+)

diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
index 2b7101ea25..69c30aecd8 100644
--- a/xen/arch/arm/cpuerrata.c
+++ b/xen/arch/arm/cpuerrata.c
@@ -730,6 +730,7 @@ static int cpu_errata_callback(struct notifier_block *nfb,
  rc = enable_nonboot_cpu_caps(arm_errata);
  break;
  default:
+/* Notifier pattern. */
Without looking at the commit message (which may not be trivial when 
committed), it is not clear to me what this is supposed to mean. Will 
there be a longer explanation in the MISRA doc? Should this be a SAF-* 
comment?



  break;
  }
  
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c

index eb0a5535e4..4c2bd35403 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -389,6 +389,10 @@ static int cpu_callback(struct notifier_block *nfb, 
unsigned long action,
  printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%lu\n",
 cpu);
  break;
+
+default:
+/* Notifier pattern. */
+break;


Skimming through v1, it was pointed out that gic-v3-lpi may miss some cases.

Let me start with that I understand this patch is technically not 
changing anything. However, it gives us an opportunity to check the 
notifier pattern.


Has anyone done any proper investigation? If so, what was the outcome? 
If not, have we identified someone to do it?


The same question will apply for place where you add "default".

Cheers,

--
Julien Grall



[PATCH for-4.19 v4.5 2/7] x86/xstate: Cross-check dynamic XSTATE sizes at boot

2024-06-19 Thread Andrew Cooper
Right now, xstate_ctxt_size() performs a cross-check of size with CPUID in for
every call.  This is expensive, being used for domain create/migrate, as well
as to service certain guest CPUID instructions.

Instead, arrange to check the sizes once at boot.  See the code comments for
details.  Right now, it just checks hardware against the algorithm
expectations.  Later patches will cross-check Xen's XSTATE calculations too.

Introduce more X86_XCR0_* and X86_XSS_* constants CPUID bits.  This is to
maximise coverage in the sanity check, even if we don't expect to
use/virtualise some of these features any time soon.  Leave HDC and HWP alone
for now; we don't have CPUID bits from them stored nicely.

Only perform the cross-checks when SELF_TESTS are active.  It's only
developers or new hardware liable to trip these checks, and Xen at least
tracks "maximum value ever seen in xcr0" for the lifetime of the VM, which we
don't want to be tickling in the general case.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Oleksii Kurochko 

v3:
 * New
v4:
 * Rebase over CONFIG_SELF_TESTS
 * Swap one BUG_ON() for a WARN()

v4.5:
 * Reorder xstate_check_sizes() to strictly increase by index.  In turn this
   strengthens the compressed check to "size always increases".
 * For new superivsor states, check that the uncomressed size doesn't change.

On Sapphire Rapids with the whole series inc diagnostics, we get this pattern:

  (XEN) *** check_new_xstate(, 0x0003)
  (XEN) *** check_new_xstate(, 0x0004)
  (XEN) *** check_new_xstate(, 0x00e0)
  (XEN) *** check_new_xstate(, 0x0100)
  (XEN) *** check_new_xstate(, 0x0200)
  (XEN) *** check_new_xstate(, 0x0400)
  (XEN) *** check_new_xstate(, 0x0800)
  (XEN) *** check_new_xstate(, 0x1000)
  (XEN) *** check_new_xstate(, 0x4000)
  (XEN) *** check_new_xstate(, 0x8000)
  (XEN) *** check_new_xstate(, 0x0006)

and on Genoa, this pattern:

  (Xen) *** check_new_xstate(, 0x0003)
  (Xen) *** check_new_xstate(, 0x0004)
  (Xen) *** check_new_xstate(, 0x00e0)
  (Xen) *** check_new_xstate(, 0x0200)
  (Xen) *** check_new_xstate(, 0x0800)
  (Xen) *** check_new_xstate(, 0x1000)
---
 xen/arch/x86/include/asm/x86-defns.h|  25 ++-
 xen/arch/x86/xstate.c   | 170 
 xen/include/public/arch-x86/cpufeatureset.h |   3 +
 3 files changed, 197 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/include/asm/x86-defns.h 
b/xen/arch/x86/include/asm/x86-defns.h
index 48d7a3b7af45..d7602ab225c4 100644
--- a/xen/arch/x86/include/asm/x86-defns.h
+++ b/xen/arch/x86/include/asm/x86-defns.h
@@ -77,7 +77,7 @@
 #define X86_CR4_PKS0x0100 /* Protection Key Supervisor */
 
 /*
- * XSTATE component flags in XCR0
+ * XSTATE component flags in XCR0 | MSR_XSS
  */
 #define X86_XCR0_FP_POS   0
 #define X86_XCR0_FP   (1ULL << X86_XCR0_FP_POS)
@@ -95,11 +95,34 @@
 #define X86_XCR0_ZMM  (1ULL << X86_XCR0_ZMM_POS)
 #define X86_XCR0_HI_ZMM_POS   7
 #define X86_XCR0_HI_ZMM   (1ULL << X86_XCR0_HI_ZMM_POS)
+#define X86_XSS_PROC_TRACE(_AC(1, ULL) <<  8)
 #define X86_XCR0_PKRU_POS 9
 #define X86_XCR0_PKRU (1ULL << X86_XCR0_PKRU_POS)
+#define X86_XSS_PASID (_AC(1, ULL) << 10)
+#define X86_XSS_CET_U (_AC(1, ULL) << 11)
+#define X86_XSS_CET_S (_AC(1, ULL) << 12)
+#define X86_XSS_HDC   (_AC(1, ULL) << 13)
+#define X86_XSS_UINTR (_AC(1, ULL) << 14)
+#define X86_XSS_LBR   (_AC(1, ULL) << 15)
+#define X86_XSS_HWP   (_AC(1, ULL) << 16)
+#define X86_XCR0_TILE_CFG (_AC(1, ULL) << 17)
+#define X86_XCR0_TILE_DATA(_AC(1, ULL) << 18)
 #define X86_XCR0_LWP_POS  62
 #define X86_XCR0_LWP  (1ULL << X86_XCR0_LWP_POS)
 
+#define X86_XCR0_STATES \
+(X86_XCR0_FP | X86_XCR0_SSE | X86_XCR0_YMM | X86_XCR0_BNDREGS | \
+ X86_XCR0_BNDCSR | X86_XCR0_OPMASK | X86_XCR0_ZMM | \
+ X86_XCR0_HI_ZMM | X86_XCR0_PKRU | X86_XCR0_TILE_CFG |  \
+ X86_XCR0_TILE_DATA |   \
+ X86_XCR0_LWP)
+
+#define X86_XSS_STATES  \
+(X86_XSS_PROC_TRACE | X86_XSS_PASID | X86_XSS_CET_U |   \
+ X86_XSS_CET_S | X86_XSS_HDC | X86_XSS_UINTR | X86_XSS_LBR |\
+ X86_XSS_HWP |  \
+ 0)
+
 /*
  * Debug status flags in DR6.
  *
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 75788147966a..408d9dd10897 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -604,9 +604,176 @@ static bool valid_xcr0(uint64_t xcr0)
 if ( !(xcr0 & X86_XCR0_BNDREGS) != !(xcr0 & X86_XCR0_BNDCSR) )
 return false;
 
+/* TILECFG and TILEDATA must be the same. */
+if (

[PATCH v2] x86/xen/time: Reduce Xen timer tick

2024-06-19 Thread Frediano Ziglio
Current timer tick is causing some deadline to fail.
The current high value constant was probably due to an old
bug in the Xen timer implementation causing errors if the
deadline was in the future.

This was fixed in Xen commit:
19c6cbd90965 xen/vcpu: ignore VCPU_SSHOTTMR_future

Usage of VCPU_SSHOTTMR_future in Linux kernel was removed by:
c06b6d70feb3 xen/x86: don't lose event interrupts

Signed-off-by: Frediano Ziglio 
---
 arch/x86/xen/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Changes since v1:
- Update commit message;
- reduce delay.

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 52fa5609b7f6..96521b1874ac 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -30,7 +30,7 @@
 #include "xen-ops.h"
 
 /* Minimum amount of time until next clock event fires */
-#define TIMER_SLOP 10
+#define TIMER_SLOP 1
 
 static u64 xen_sched_clock_offset __read_mostly;
 
-- 
2.45.1




Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-19 Thread Chen, Jiqian
On 2024/6/19 17:49, Jan Beulich wrote:
> On 19.06.2024 10:51, Chen, Jiqian wrote:
>> On 2024/6/19 16:06, Jan Beulich wrote:
>>> On 19.06.2024 09:53, Chen, Jiqian wrote:
 On 2024/6/18 16:55, Jan Beulich wrote:
> On 18.06.2024 08:57, Chen, Jiqian wrote:
>> On 2024/6/17 22:52, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
 The gsi of a passthrough device must be configured for it to be
 able to be mapped into a hvm domU.
 But When dom0 is PVH, the gsis don't get registered, it causes
 the info of apic, pin and irq not be added into irq_2_pin list,
 and the handler of irq_desc is not set, then when passthrough a
 device, setting ioapic affinity and vector will fail.

 To fix above problem, on Linux kernel side, a new code will
 need to call PHYSDEVOP_setup_gsi for passthrough devices to
 register gsi when dom0 is PVH.

 So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
 purpose.

 Signed-off-by: Jiqian Chen 
 Signed-off-by: Huang Rui 
 Signed-off-by: Jiqian Chen 
 ---
 The code link that will call this hypercall on linux kernel side is as 
 follows:
 https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
>>>
>>> One of my v9 comments was addressed, thanks. Repeating the other, 
>>> unaddressed
>>> one here:
>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>  operation, I think it'll also want/need explaining why what is 
>>> sufficient for
>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>> I have modified the commit message to describe why GSIs are not 
>> registered can cause passthrough not work, according to this v9 comment.
>> " it causes the info of apic, pin and irq not be added into irq_2_pin 
>> list, and the handler of irq_desc is not set, then when passthrough a 
>> device, setting ioapic affinity and vector will fail."
>> What description do you want me to add?
>
> What I'd first like to have clarification on (i.e. before putting it in
> the description one way or another): How come Dom0 alone gets away fine
> without making the call, yet for passthrough-to-DomU it's needed? Is it
> perhaps that it just so happened that for Dom0 things have been working
> on systems where it was tested, but the call should in principle have been
> there in this case, too [1]? That (to me at least) would make quite a
> difference for both this patch's description and us accepting it.
 Oh, I think I know what's your concern now. Thanks.
 First question, why gsi of device can work on PVH dom0:
 Because when probe a driver to a normal device, it will call linux kernel 
 side:pci_device_probe-> request_threaded_irq-> irq_startup-> 
 __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> 
 hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> 
 vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
 Second question, why gsi of passthrough can't work on PVH dom0:
 Because when assign a device to be passthrough, it uses pciback to probe 
 the device, and it calls pcistub_probe, but in all callstack of 
 pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the 
 function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when 
 the gsi is unmasked, so that the gsi can't work for passthrough device.
>>>
>>> And why exactly would the fake IRQ handler not be set up by pciback? Its
>>> setting up ought to lead to those same IO-APIC RTE writes that Xen
>>> intercepts.
>> Because isr_on is not set, when xen_pcibk_control_isr is called, it will 
>> return due to " !dev_data->isr_on". So that fake IRQ handler aren't 
>> installed.
> 
> I'm afraid I don't follow you here. Quoting from the function:
> 
>   enable =  dev_data->enable_intx;
> 
>   /* Asked to disable, but ISR isn't runnig */
>   if (!enable && !dev_data->isr_on)
>   return;
> 
> I.e. we bail if the request was to _disable_ and there is no ISR.
I mean, after debugging the pcistub_probe callstack:
pcistub_seize-> pcistub_init_device-> xen_pcibk_reset_device-> 
xen_pcibk_control_isr(dev, 1 /*reset device*/)
and in xen_pcibk_control_isr code:
if (reset) {
dev_data->enable_intx = 0;
dev_data->ack_intr = 0;
}
enable =  dev_data->enable_intx;

/* Asked to disable, but ISR isn't runnig */
if (!enable && !dev_data->isr_on)
return;
"reset" is 1, so "dev_date->enable_intx" is set 0, then "enable" is 0, and then 
arrive the second "if" check, "enable" is 0 and "dev_date->isr_on" is also 0, 
so it return here.
It can't reach following code to install irq handle.

> 
> I can't 

[PATCH for-4.19 v4] x86/irq: forward pending interrupts to new destination in fixup_irqs()

2024-06-19 Thread Roger Pau Monne
fixup_irqs() is used to evacuate interrupts from to be offlined CPUs.  Given
the CPU is to become offline, the normal migration logic used by Xen where the
vector in the previous target(s) is left configured until the interrupt is
received on the new destination is not suitable.

Instead attempt to do as much as possible in order to prevent loosing
interrupts.  If fixup_irqs() is called from the CPU to be offlined (as is
currently the case for CPU hot unplug) attempt to forward pending vectors when
interrupts that target the current CPU are migrated to a different destination.

Additionally, for interrupts that have already been moved from the current CPU
prior to the call to fixup_irqs() but that haven't been delivered to the new
destination (iow: interrupts with move_in_progress set and the current CPU set
in ->arch.old_cpu_mask) also check whether the previous vector is pending and
forward it to the new destination.

This allows us to remove the window with interrupts enabled at the bottom of
fixup_irqs().  Such window wasn't safe anyway: references to the CPU to become
offline are removed from interrupts masks, but the per-CPU vector_irq[] array
is not updated to reflect those changes (as the CPU is going offline anyway).

Signed-off-by: Roger Pau Monné 
---
Changes since v3:
 - Move the IRR check after the cpumask_copy().

Changes since v2:
 - Remove interrupt enabled window from fixup_irqs().
 - Adjust comments and commit message.

Changes since v1:
 - Rename to apic_irr_read().
---
 xen/arch/x86/include/asm/apic.h |  5 
 xen/arch/x86/irq.c  | 46 -
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/include/asm/apic.h b/xen/arch/x86/include/asm/apic.h
index d1cb001fb4ab..7bd66dc6e151 100644
--- a/xen/arch/x86/include/asm/apic.h
+++ b/xen/arch/x86/include/asm/apic.h
@@ -132,6 +132,11 @@ static inline bool apic_isr_read(uint8_t vector)
 (vector & 0x1f)) & 1;
 }
 
+static inline bool apic_irr_read(unsigned int vector)
+{
+return apic_read(APIC_IRR + (vector / 32 * 0x10)) & (1U << (vector % 32));
+}
+
 static inline u32 get_apic_id(void)
 {
 u32 id = apic_read(APIC_ID);
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index d7f15c38af22..9a611c79e024 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2591,7 +2591,7 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 
 for ( irq = 0; irq < nr_irqs; irq++ )
 {
-bool break_affinity = false, set_affinity = true;
+bool break_affinity = false, set_affinity = true, check_irr = false;
 unsigned int vector, cpu = smp_processor_id();
 cpumask_t *affinity = this_cpu(scratch_cpumask);
 
@@ -2644,6 +2644,25 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
  !cpu_online(cpu) &&
  cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
 {
+/*
+ * This to be offlined CPU was the target of an interrupt that's
+ * been moved, and the new destination target hasn't yet
+ * acknowledged any interrupt from it.
+ *
+ * We know the interrupt is configured to target the new CPU at
+ * this point, so we can check IRR for any pending vectors and
+ * forward them to the new destination.
+ *
+ * Note that for the other case of an interrupt movement being in
+ * progress (move_cleanup_count being non-zero) we know the new
+ * destination has already acked at least one interrupt from this
+ * source, and hence there's no need to forward any stale
+ * interrupts.
+ */
+if ( apic_irr_read(desc->arch.old_vector) )
+send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
+  desc->arch.vector);
+
 /*
  * This CPU is going offline, remove it from ->arch.old_cpu_mask
  * and possibly release the old vector if the old mask becomes
@@ -2684,6 +2703,14 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 if ( desc->handler->disable )
 desc->handler->disable(desc);
 
+/*
+ * If the current CPU is going offline and is (one of) the target(s) of
+ * the interrupt, signal to check whether there are any pending vectors
+ * to be handled in the local APIC after the interrupt has been moved.
+ */
+if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
+check_irr = true;
+
 if ( desc->handler->set_affinity )
 desc->handler->set_affinity(desc, affinity);
 else if ( !(warned++) )
@@ -2694,6 +2721,18 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 
 cpumask_copy(affinity, desc->affinity);
 
+if ( check_irr && apic_irr_read(vector) )
+/*
+ * Forward pending interrupt to the new dest

Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-19 Thread Jan Beulich
On 19.06.2024 10:51, Chen, Jiqian wrote:
> On 2024/6/19 16:06, Jan Beulich wrote:
>> On 19.06.2024 09:53, Chen, Jiqian wrote:
>>> On 2024/6/18 16:55, Jan Beulich wrote:
 On 18.06.2024 08:57, Chen, Jiqian wrote:
> On 2024/6/17 22:52, Jan Beulich wrote:
>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>> The gsi of a passthrough device must be configured for it to be
>>> able to be mapped into a hvm domU.
>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>> and the handler of irq_desc is not set, then when passthrough a
>>> device, setting ioapic affinity and vector will fail.
>>>
>>> To fix above problem, on Linux kernel side, a new code will
>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>> register gsi when dom0 is PVH.
>>>
>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>> purpose.
>>>
>>> Signed-off-by: Jiqian Chen 
>>> Signed-off-by: Huang Rui 
>>> Signed-off-by: Jiqian Chen 
>>> ---
>>> The code link that will call this hypercall on linux kernel side is as 
>>> follows:
>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
>>
>> One of my v9 comments was addressed, thanks. Repeating the other, 
>> unaddressed
>> one here:
>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>  operation, I think it'll also want/need explaining why what is 
>> sufficient for
>>  Dom0 alone isn't sufficient when pass-through comes into play."
> I have modified the commit message to describe why GSIs are not 
> registered can cause passthrough not work, according to this v9 comment.
> " it causes the info of apic, pin and irq not be added into irq_2_pin 
> list, and the handler of irq_desc is not set, then when passthrough a 
> device, setting ioapic affinity and vector will fail."
> What description do you want me to add?

 What I'd first like to have clarification on (i.e. before putting it in
 the description one way or another): How come Dom0 alone gets away fine
 without making the call, yet for passthrough-to-DomU it's needed? Is it
 perhaps that it just so happened that for Dom0 things have been working
 on systems where it was tested, but the call should in principle have been
 there in this case, too [1]? That (to me at least) would make quite a
 difference for both this patch's description and us accepting it.
>>> Oh, I think I know what's your concern now. Thanks.
>>> First question, why gsi of device can work on PVH dom0:
>>> Because when probe a driver to a normal device, it will call linux kernel 
>>> side:pci_device_probe-> request_threaded_irq-> irq_startup-> 
>>> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> 
>>> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> 
>>> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>>> Second question, why gsi of passthrough can't work on PVH dom0:
>>> Because when assign a device to be passthrough, it uses pciback to probe 
>>> the device, and it calls pcistub_probe, but in all callstack of 
>>> pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the 
>>> function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when 
>>> the gsi is unmasked, so that the gsi can't work for passthrough device.
>>
>> And why exactly would the fake IRQ handler not be set up by pciback? Its
>> setting up ought to lead to those same IO-APIC RTE writes that Xen
>> intercepts.
> Because isr_on is not set, when xen_pcibk_control_isr is called, it will 
> return due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.

I'm afraid I don't follow you here. Quoting from the function:

enable =  dev_data->enable_intx;

/* Asked to disable, but ISR isn't runnig */
if (!enable && !dev_data->isr_on)
return;

I.e. we bail if the request was to _disable_ and there is no ISR.

I can't exclude though that command_write()'s logic to set ->enable_intx
is insufficient. But in the common case one would surely expect at least
one of PCI_COMMAND_MEMORY and PCI_COMMAND_IO to be set first by a guest.
IOW at some point I'd expect xen_pcibk_control_isr() to be called with
the second argument 0 and with ->enable_intx set.

> And it seems isr_on is set through driver sysfs " irq_handler_state" for a 
> level device that is to be shared with guest and the IRQ is shared with the 
> initial domain.

The sysfs interface is, according to my reading of the description
of the commit introducing it, merely for debugging / recovery purposes.
(It also looks to me as if this was partly broken: If one would use it,
thus clearing ->isr_on, a subsequent disable request would take exactly
that early bailing path quoted above, with n

[XEN PATCH v2] xen: add explicit comment to identify notifier patterns

2024-06-19 Thread Federico Serafini
MISRA C Rule 16.4 states that every `switch' statement shall have a
`default' label" and a statement or a comment prior to the
terminating break statement.

This patch addresses some violations of the rule related to the
"notifier pattern": a frequently-used pattern whereby only a few values
are handled by the switch statement and nothing should be done for
others (nothing to do in the default case).

Note that for function mwait_idle_cpu_init() in
xen/arch/x86/cpu/mwait-idle.c the /* Notifier pattern. */ comment is
not added: differently from the other functions covered in this patch,
the default label has a return statement that does not violates Rule 16.4.

No functional change.

Signed-off-by: Federico Serafini 
---
Changes in v2:
as Jan pointed out, in the v1 some patterns were not explicitly identified
(https://lore.kernel.org/xen-devel/cad05a5c-e2d8-4e5d-af05-30ae6f959...@bugseng.com/).

This version adds the /* Notifier pattern. */ to all the pattern present in
the Xen codebase except for mwait_idle_cpu_init().
---
 xen/arch/arm/cpuerrata.c | 1 +
 xen/arch/arm/gic-v3-lpi.c| 4 
 xen/arch/arm/gic.c   | 1 +
 xen/arch/arm/irq.c   | 4 
 xen/arch/arm/mmu/p2m.c   | 1 +
 xen/arch/arm/percpu.c| 1 +
 xen/arch/arm/smpboot.c   | 1 +
 xen/arch/arm/time.c  | 1 +
 xen/arch/arm/vgic-v3-its.c   | 2 ++
 xen/arch/x86/acpi/cpu_idle.c | 4 
 xen/arch/x86/cpu/mcheck/mce.c| 4 
 xen/arch/x86/cpu/mcheck/mce_intel.c  | 4 
 xen/arch/x86/genapic/x2apic.c| 3 +++
 xen/arch/x86/hvm/hvm.c   | 1 +
 xen/arch/x86/nmi.c   | 1 +
 xen/arch/x86/percpu.c| 3 +++
 xen/arch/x86/psr.c   | 3 +++
 xen/arch/x86/smpboot.c   | 3 +++
 xen/common/kexec.c   | 1 +
 xen/common/rcupdate.c| 1 +
 xen/common/sched/core.c  | 1 +
 xen/common/sched/cpupool.c   | 1 +
 xen/common/spinlock.c| 1 +
 xen/common/tasklet.c | 1 +
 xen/common/timer.c   | 1 +
 xen/drivers/cpufreq/cpufreq.c| 1 +
 xen/drivers/cpufreq/cpufreq_misc_governors.c | 3 +++
 xen/drivers/passthrough/x86/hvm.c| 3 +++
 xen/drivers/passthrough/x86/iommu.c  | 3 +++
 29 files changed, 59 insertions(+)

diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
index 2b7101ea25..69c30aecd8 100644
--- a/xen/arch/arm/cpuerrata.c
+++ b/xen/arch/arm/cpuerrata.c
@@ -730,6 +730,7 @@ static int cpu_errata_callback(struct notifier_block *nfb,
 rc = enable_nonboot_cpu_caps(arm_errata);
 break;
 default:
+/* Notifier pattern. */
 break;
 }
 
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index eb0a5535e4..4c2bd35403 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -389,6 +389,10 @@ static int cpu_callback(struct notifier_block *nfb, 
unsigned long action,
 printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%lu\n",
cpu);
 break;
+
+default:
+/* Notifier pattern. */
+break;
 }
 
 return notifier_from_errno(rc);
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 3eaf670fd7..dc5408a456 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -463,6 +463,7 @@ static int cpu_gic_callback(struct notifier_block *nfb,
 release_irq(gic_hw_ops->info->maintenance_irq, NULL);
 break;
 default:
+/* Notifier pattern. */
 break;
 }
 
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index c60502444c..61ca6f5b87 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -127,6 +127,10 @@ static int cpu_callback(struct notifier_block *nfb, 
unsigned long action,
 printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%u\n",
cpu);
 break;
+
+default:
+/* Notifier pattern. */
+break;
 }
 
 return notifier_from_errno(rc);
diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
index 1725cca649..bf7c66155d 100644
--- a/xen/arch/arm/mmu/p2m.c
+++ b/xen/arch/arm/mmu/p2m.c
@@ -1839,6 +1839,7 @@ static int cpu_virt_paging_callback(struct notifier_block 
*nfb,
 setup_virt_paging_one(NULL);
 break;
 default:
+/* Notifier pattern. */
 break;
 }
 
diff --git a/xen/arch/arm/percpu.c b/xen/arch/arm/percpu.c
index 87fe960330..81f91f05bb 100644
--- a/xen/arch/arm/percpu.c
+++ b/xen/arch/arm/percpu.c
@@ -66,6 +66,7 @@ static int cpu_percpu_callback(
 free_percpu_area(cpu);
 break;
 default:
+/* Notifier pattern. */
 break;

Re: [PATCH v3 3/3] x86/irq: forward pending interrupts to new destination in fixup_irqs()

2024-06-19 Thread Jan Beulich
On 19.06.2024 10:32, Roger Pau Monné wrote:
> On Wed, Jun 19, 2024 at 09:24:41AM +0200, Jan Beulich wrote:
>> On 19.06.2024 09:05, Roger Pau Monné wrote:
>>> On Tue, Jun 18, 2024 at 06:30:22PM +0200, Jan Beulich wrote:
 On 18.06.2024 16:50, Roger Pau Monné wrote:
> On Tue, Jun 18, 2024 at 04:34:50PM +0200, Jan Beulich wrote:
>> On 18.06.2024 13:30, Roger Pau Monné wrote:
>>> On Mon, Jun 17, 2024 at 03:41:12PM +0200, Jan Beulich wrote:
 On 13.06.2024 18:56, Roger Pau Monne wrote:
> @@ -2686,11 +2705,27 @@ void fixup_irqs(const cpumask_t *mask, bool 
> verbose)
>  if ( desc->handler->disable )
>  desc->handler->disable(desc);
>  
> +/*
> + * If the current CPU is going offline and is (one of) the 
> target(s) of
> + * the interrupt, signal to check whether there are any 
> pending vectors
> + * to be handled in the local APIC after the interrupt has 
> been moved.
> + */
> +if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, 
> desc->arch.cpu_mask) )
> +check_irr = true;
> +
>  if ( desc->handler->set_affinity )
>  desc->handler->set_affinity(desc, affinity);
>  else if ( !(warned++) )
>  set_affinity = false;
>  
> +if ( check_irr && apic_irr_read(vector) )
> +/*
> + * Forward pending interrupt to the new destination, 
> this CPU is
> + * going offline and otherwise the interrupt would be 
> lost.
> + */
> +
> send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
> +  desc->arch.vector);

 Hmm, IRR may become set right after the IRR read (unlike in the other 
 cases,
 where new IRQs ought to be surfacing only at the new destination). 
 Doesn't
 this want moving ...

>  if ( desc->handler->enable )
>  desc->handler->enable(desc);

 ... past the actual affinity change?
>>>
>>> Hm, but the ->enable() hook is just unmasking the interrupt, the
>>> actual affinity change is done in ->set_affinity(), and hence after
>>> the call to ->set_affinity() no further interrupts should be delivered
>>> to the CPU regardless of whether the source is masked?
>>>
>>> Or is it possible for the device/interrupt controller to not switch to
>>> use the new destination until the interrupt is unmasked, and hence
>>> could have pending masked interrupts still using the old destination?
>>> IIRC For MSI-X it's required that the device updates the destination
>>> target once the entry is unmasked.
>>
>> That's all not relevant here, I think. IRR gets set when an interrupt is
>> signaled, no matter whether it's masked.
>
> I'm kind of lost here, what does signaling mean in this context?
>
> I would expect the interrupt vector to not get set in IRR if the MSI-X
> entry is masked, as at that point the state of the address/data fields
> might not be consistent (that's the whole point of masking it right?)
>
>> It's its handling which the
>> masking would prevent, i.e. the "moving" of the set bit from IRR to ISR.
>
> My understanding was that the masking would prevent the message write to
> the APIC from happening, and hence no vector should get set in IRR.

 Hmm, yes, looks like I was confused. The masking is at the source side
 (IO-APIC RTE, MSI-X entry, or - if supported - in the MSI capability).
 So the sole case to worry about is MSI without mask-bit support then.
>>>
>>> Yeah, and for MSI without masking bit support we don't care doing the
>>> IRR check before or after the ->enable() hook, as that's a no-op in
>>> that case.  The write to the MSI address/data fields has already been
>>> done, and hence the issue would be exclusively with draining any
>>> in-flight writes to the APIC doorbell (what you mention below).
>>
>> Except that both here ...
>>
>> Plus we run with IRQs off here anyway if I'm not mistaken, so no
>> interrupt can be delivered to the local CPU. IOW whatever IRR bits it
>> has set (including ones becoming set between the IRR read and the actual
>> vector change), those would never be serviced. Hence the reading of the
>> bit ought to occur after the vector change: It's only then that we know
>> the IRR bit corresponding to the old vector can't become set anymore.
>
> Right, and the vector change happens in ->set_affinity(), not
> ->enable().  See for example set_msi_affinity() and the
> write_msi_msg(), that's where the vector gets changed.
>
>> And even then 

Re: [PATCH for-4.19] xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure

2024-06-19 Thread Bertrand Marquis
Hi,

Adding Oleksii for Release ack.

Cheers
Bertrand

> On 19 Jun 2024, at 08:46, Michal Orzel  wrote:
> 
> Building Xen with CONFIG_STATIC_SHM=y results in a build failure:
> 
> arch/arm/static-shmem.c: In function 'process_shm':
> arch/arm/static-shmem.c:327:41: error: 'gbase' may be used uninitialized 
> [-Werror=maybe-uninitialized]
>  327 | if ( is_domain_direct_mapped(d) && (pbase != gbase) )
> arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
>  305 | paddr_t gbase, pbase, psize;
> 
> This is because the commit cb1ddafdc573 adds a check referencing
> gbase/pbase variables which were not yet assigned a value. Fix it.
> 
> Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be 
> direct-mapped for direct-mapped domains")
> Signed-off-by: Michal Orzel 
> ---
> Rationale for 4.19: this patch fixes a build failure reported by CI:
> https://gitlab.com/xen-project/xen/-/jobs/7131807878
> ---
> xen/arch/arm/static-shmem.c | 13 +++--
> 1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
> index c434b96e6204..cd48d2896b7e 100644
> --- a/xen/arch/arm/static-shmem.c
> +++ b/xen/arch/arm/static-shmem.c
> @@ -324,12 +324,6 @@ int __init process_shm(struct domain *d, struct 
> kernel_info *kinfo,
> printk("%pd: static shared memory bank not found: '%s'", d, 
> shm_id);
> return -ENOENT;
> }
> -if ( is_domain_direct_mapped(d) && (pbase != gbase) )
> -{
> -printk("%pd: physical address 0x%"PRIpaddr" and guest address 
> 0x%"PRIpaddr" are not direct-mapped.\n",
> -   d, pbase, gbase);
> -return -EINVAL;
> -}
> 
> pbase = boot_shm_bank->start;
> psize = boot_shm_bank->size;
> @@ -353,6 +347,13 @@ int __init process_shm(struct domain *d, struct 
> kernel_info *kinfo,
> /* guest phys address is after host phys address */
> gbase = dt_read_paddr(cells + addr_cells, addr_cells);
> 
> +if ( is_domain_direct_mapped(d) && (pbase != gbase) )
> +{
> +printk("%pd: physical address 0x%"PRIpaddr" and guest 
> address 0x%"PRIpaddr" are not direct-mapped.\n",
> +   d, pbase, gbase);
> +return -EINVAL;
> +}
> +
> for ( i = 0; i < PFN_DOWN(psize); i++ )
> if ( !mfn_valid(mfn_add(maddr_to_mfn(pbase), i)) )
> {
> -- 
> 2.25.1
> 




Re: [PATCH for-4.19] xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure

2024-06-19 Thread Bertrand Marquis
Hi Michal,

> On 19 Jun 2024, at 08:46, Michal Orzel  wrote:
> 
> Building Xen with CONFIG_STATIC_SHM=y results in a build failure:
> 
> arch/arm/static-shmem.c: In function 'process_shm':
> arch/arm/static-shmem.c:327:41: error: 'gbase' may be used uninitialized 
> [-Werror=maybe-uninitialized]
>  327 | if ( is_domain_direct_mapped(d) && (pbase != gbase) )
> arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
>  305 | paddr_t gbase, pbase, psize;
> 
> This is because the commit cb1ddafdc573 adds a check referencing
> gbase/pbase variables which were not yet assigned a value. Fix it.
> 
> Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be 
> direct-mapped for direct-mapped domains")
> Signed-off-by: Michal Orzel 

Reviewed-by: Bertrand Marquis 

Cheers
Bertrand

> ---
> Rationale for 4.19: this patch fixes a build failure reported by CI:
> https://gitlab.com/xen-project/xen/-/jobs/7131807878
> ---
> xen/arch/arm/static-shmem.c | 13 +++--
> 1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
> index c434b96e6204..cd48d2896b7e 100644
> --- a/xen/arch/arm/static-shmem.c
> +++ b/xen/arch/arm/static-shmem.c
> @@ -324,12 +324,6 @@ int __init process_shm(struct domain *d, struct 
> kernel_info *kinfo,
> printk("%pd: static shared memory bank not found: '%s'", d, 
> shm_id);
> return -ENOENT;
> }
> -if ( is_domain_direct_mapped(d) && (pbase != gbase) )
> -{
> -printk("%pd: physical address 0x%"PRIpaddr" and guest address 
> 0x%"PRIpaddr" are not direct-mapped.\n",
> -   d, pbase, gbase);
> -return -EINVAL;
> -}
> 
> pbase = boot_shm_bank->start;
> psize = boot_shm_bank->size;
> @@ -353,6 +347,13 @@ int __init process_shm(struct domain *d, struct 
> kernel_info *kinfo,
> /* guest phys address is after host phys address */
> gbase = dt_read_paddr(cells + addr_cells, addr_cells);
> 
> +if ( is_domain_direct_mapped(d) && (pbase != gbase) )
> +{
> +printk("%pd: physical address 0x%"PRIpaddr" and guest 
> address 0x%"PRIpaddr" are not direct-mapped.\n",
> +   d, pbase, gbase);
> +return -EINVAL;
> +}
> +
> for ( i = 0; i < PFN_DOWN(psize); i++ )
> if ( !mfn_valid(mfn_add(maddr_to_mfn(pbase), i)) )
> {
> -- 
> 2.25.1
> 




Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-19 Thread Chen, Jiqian
On 2024/6/19 16:06, Jan Beulich wrote:
> On 19.06.2024 09:53, Chen, Jiqian wrote:
>> On 2024/6/18 16:55, Jan Beulich wrote:
>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
 On 2024/6/17 22:52, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> The gsi of a passthrough device must be configured for it to be
>> able to be mapped into a hvm domU.
>> But When dom0 is PVH, the gsis don't get registered, it causes
>> the info of apic, pin and irq not be added into irq_2_pin list,
>> and the handler of irq_desc is not set, then when passthrough a
>> device, setting ioapic affinity and vector will fail.
>>
>> To fix above problem, on Linux kernel side, a new code will
>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>> register gsi when dom0 is PVH.
>>
>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>> purpose.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>> The code link that will call this hypercall on linux kernel side is as 
>> follows:
>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
>
> One of my v9 comments was addressed, thanks. Repeating the other, 
> unaddressed
> one here:
> "As to GSIs not being registered: If that's not a problem for Dom0's own
>  operation, I think it'll also want/need explaining why what is 
> sufficient for
>  Dom0 alone isn't sufficient when pass-through comes into play."
 I have modified the commit message to describe why GSIs are not registered 
 can cause passthrough not work, according to this v9 comment.
 " it causes the info of apic, pin and irq not be added into irq_2_pin 
 list, and the handler of irq_desc is not set, then when passthrough a 
 device, setting ioapic affinity and vector will fail."
 What description do you want me to add?
>>>
>>> What I'd first like to have clarification on (i.e. before putting it in
>>> the description one way or another): How come Dom0 alone gets away fine
>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>> perhaps that it just so happened that for Dom0 things have been working
>>> on systems where it was tested, but the call should in principle have been
>>> there in this case, too [1]? That (to me at least) would make quite a
>>> difference for both this patch's description and us accepting it.
>> Oh, I think I know what's your concern now. Thanks.
>> First question, why gsi of device can work on PVH dom0:
>> Because when probe a driver to a normal device, it will call linux kernel 
>> side:pci_device_probe-> request_threaded_irq-> irq_startup-> 
>> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> 
>> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> 
>> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>> Second question, why gsi of passthrough can't work on PVH dom0:
>> Because when assign a device to be passthrough, it uses pciback to probe the 
>> device, and it calls pcistub_probe, but in all callstack of pcistub_probe, 
>> it doesn't unmask the gsi, and we can see on Xen side, the function 
>> vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is 
>> unmasked, so that the gsi can't work for passthrough device.
> 
> And why exactly would the fake IRQ handler not be set up by pciback? Its
> setting up ought to lead to those same IO-APIC RTE writes that Xen
> intercepts.
Because isr_on is not set, when xen_pcibk_control_isr is called, it will return 
due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.
And it seems isr_on is set through driver sysfs " irq_handler_state" for a 
level device that is to be shared with guest and the IRQ is shared with the 
initial domain.

> 
> In any event, imo a summary of the above wants to be part of the patch
> description.
OK, will add into the commit message in next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.


Re: [PATCH v3 3/3] x86/irq: forward pending interrupts to new destination in fixup_irqs()

2024-06-19 Thread Roger Pau Monné
On Wed, Jun 19, 2024 at 09:24:41AM +0200, Jan Beulich wrote:
> On 19.06.2024 09:05, Roger Pau Monné wrote:
> > On Tue, Jun 18, 2024 at 06:30:22PM +0200, Jan Beulich wrote:
> >> On 18.06.2024 16:50, Roger Pau Monné wrote:
> >>> On Tue, Jun 18, 2024 at 04:34:50PM +0200, Jan Beulich wrote:
>  On 18.06.2024 13:30, Roger Pau Monné wrote:
> > On Mon, Jun 17, 2024 at 03:41:12PM +0200, Jan Beulich wrote:
> >> On 13.06.2024 18:56, Roger Pau Monne wrote:
> >>> @@ -2686,11 +2705,27 @@ void fixup_irqs(const cpumask_t *mask, bool 
> >>> verbose)
> >>>  if ( desc->handler->disable )
> >>>  desc->handler->disable(desc);
> >>>  
> >>> +/*
> >>> + * If the current CPU is going offline and is (one of) the 
> >>> target(s) of
> >>> + * the interrupt, signal to check whether there are any 
> >>> pending vectors
> >>> + * to be handled in the local APIC after the interrupt has 
> >>> been moved.
> >>> + */
> >>> +if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, 
> >>> desc->arch.cpu_mask) )
> >>> +check_irr = true;
> >>> +
> >>>  if ( desc->handler->set_affinity )
> >>>  desc->handler->set_affinity(desc, affinity);
> >>>  else if ( !(warned++) )
> >>>  set_affinity = false;
> >>>  
> >>> +if ( check_irr && apic_irr_read(vector) )
> >>> +/*
> >>> + * Forward pending interrupt to the new destination, 
> >>> this CPU is
> >>> + * going offline and otherwise the interrupt would be 
> >>> lost.
> >>> + */
> >>> +
> >>> send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
> >>> +  desc->arch.vector);
> >>
> >> Hmm, IRR may become set right after the IRR read (unlike in the other 
> >> cases,
> >> where new IRQs ought to be surfacing only at the new destination). 
> >> Doesn't
> >> this want moving ...
> >>
> >>>  if ( desc->handler->enable )
> >>>  desc->handler->enable(desc);
> >>
> >> ... past the actual affinity change?
> >
> > Hm, but the ->enable() hook is just unmasking the interrupt, the
> > actual affinity change is done in ->set_affinity(), and hence after
> > the call to ->set_affinity() no further interrupts should be delivered
> > to the CPU regardless of whether the source is masked?
> >
> > Or is it possible for the device/interrupt controller to not switch to
> > use the new destination until the interrupt is unmasked, and hence
> > could have pending masked interrupts still using the old destination?
> > IIRC For MSI-X it's required that the device updates the destination
> > target once the entry is unmasked.
> 
>  That's all not relevant here, I think. IRR gets set when an interrupt is
>  signaled, no matter whether it's masked.
> >>>
> >>> I'm kind of lost here, what does signaling mean in this context?
> >>>
> >>> I would expect the interrupt vector to not get set in IRR if the MSI-X
> >>> entry is masked, as at that point the state of the address/data fields
> >>> might not be consistent (that's the whole point of masking it right?)
> >>>
>  It's its handling which the
>  masking would prevent, i.e. the "moving" of the set bit from IRR to ISR.
> >>>
> >>> My understanding was that the masking would prevent the message write to
> >>> the APIC from happening, and hence no vector should get set in IRR.
> >>
> >> Hmm, yes, looks like I was confused. The masking is at the source side
> >> (IO-APIC RTE, MSI-X entry, or - if supported - in the MSI capability).
> >> So the sole case to worry about is MSI without mask-bit support then.
> > 
> > Yeah, and for MSI without masking bit support we don't care doing the
> > IRR check before or after the ->enable() hook, as that's a no-op in
> > that case.  The write to the MSI address/data fields has already been
> > done, and hence the issue would be exclusively with draining any
> > in-flight writes to the APIC doorbell (what you mention below).
> 
> Except that both here ...
> 
>  Plus we run with IRQs off here anyway if I'm not mistaken, so no
>  interrupt can be delivered to the local CPU. IOW whatever IRR bits it
>  has set (including ones becoming set between the IRR read and the actual
>  vector change), those would never be serviced. Hence the reading of the
>  bit ought to occur after the vector change: It's only then that we know
>  the IRR bit corresponding to the old vector can't become set anymore.
> >>>
> >>> Right, and the vector change happens in ->set_affinity(), not
> >>> ->enable().  See for example set_msi_affinity() and the
> >>> write_msi_msg(), that's where the vector gets changed.
> >>>
>  And even then we're assuming that no interrupt signals might 

Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-19 Thread Jan Beulich
On 19.06.2024 09:53, Chen, Jiqian wrote:
> On 2024/6/18 16:55, Jan Beulich wrote:
>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>> On 2024/6/17 22:52, Jan Beulich wrote:
 On 17.06.2024 11:00, Jiqian Chen wrote:
> The gsi of a passthrough device must be configured for it to be
> able to be mapped into a hvm domU.
> But When dom0 is PVH, the gsis don't get registered, it causes
> the info of apic, pin and irq not be added into irq_2_pin list,
> and the handler of irq_desc is not set, then when passthrough a
> device, setting ioapic affinity and vector will fail.
>
> To fix above problem, on Linux kernel side, a new code will
> need to call PHYSDEVOP_setup_gsi for passthrough devices to
> register gsi when dom0 is PVH.
>
> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
> purpose.
>
> Signed-off-by: Jiqian Chen 
> Signed-off-by: Huang Rui 
> Signed-off-by: Jiqian Chen 
> ---
> The code link that will call this hypercall on linux kernel side is as 
> follows:
> https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/

 One of my v9 comments was addressed, thanks. Repeating the other, 
 unaddressed
 one here:
 "As to GSIs not being registered: If that's not a problem for Dom0's own
  operation, I think it'll also want/need explaining why what is sufficient 
 for
  Dom0 alone isn't sufficient when pass-through comes into play."
>>> I have modified the commit message to describe why GSIs are not registered 
>>> can cause passthrough not work, according to this v9 comment.
>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, 
>>> and the handler of irq_desc is not set, then when passthrough a device, 
>>> setting ioapic affinity and vector will fail."
>>> What description do you want me to add?
>>
>> What I'd first like to have clarification on (i.e. before putting it in
>> the description one way or another): How come Dom0 alone gets away fine
>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>> perhaps that it just so happened that for Dom0 things have been working
>> on systems where it was tested, but the call should in principle have been
>> there in this case, too [1]? That (to me at least) would make quite a
>> difference for both this patch's description and us accepting it.
> Oh, I think I know what's your concern now. Thanks.
> First question, why gsi of device can work on PVH dom0:
> Because when probe a driver to a normal device, it will call linux kernel 
> side:pci_device_probe-> request_threaded_irq-> irq_startup-> 
> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> 
> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> 
> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
> Second question, why gsi of passthrough can't work on PVH dom0:
> Because when assign a device to be passthrough, it uses pciback to probe the 
> device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it 
> doesn't unmask the gsi, and we can see on Xen side, the function 
> vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is 
> unmasked, so that the gsi can't work for passthrough device.

And why exactly would the fake IRQ handler not be set up by pciback? Its
setting up ought to lead to those same IO-APIC RTE writes that Xen
intercepts.

In any event, imo a summary of the above wants to be part of the patch
description.

Jan



Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-19 Thread Chen, Jiqian
On 2024/6/18 16:55, Jan Beulich wrote:
> On 18.06.2024 08:57, Chen, Jiqian wrote:
>> On 2024/6/17 22:52, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
 The gsi of a passthrough device must be configured for it to be
 able to be mapped into a hvm domU.
 But When dom0 is PVH, the gsis don't get registered, it causes
 the info of apic, pin and irq not be added into irq_2_pin list,
 and the handler of irq_desc is not set, then when passthrough a
 device, setting ioapic affinity and vector will fail.

 To fix above problem, on Linux kernel side, a new code will
 need to call PHYSDEVOP_setup_gsi for passthrough devices to
 register gsi when dom0 is PVH.

 So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
 purpose.

 Signed-off-by: Jiqian Chen 
 Signed-off-by: Huang Rui 
 Signed-off-by: Jiqian Chen 
 ---
 The code link that will call this hypercall on linux kernel side is as 
 follows:
 https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
>>>
>>> One of my v9 comments was addressed, thanks. Repeating the other, 
>>> unaddressed
>>> one here:
>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>  operation, I think it'll also want/need explaining why what is sufficient 
>>> for
>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>> I have modified the commit message to describe why GSIs are not registered 
>> can cause passthrough not work, according to this v9 comment.
>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, 
>> and the handler of irq_desc is not set, then when passthrough a device, 
>> setting ioapic affinity and vector will fail."
>> What description do you want me to add?
> 
> What I'd first like to have clarification on (i.e. before putting it in
> the description one way or another): How come Dom0 alone gets away fine
> without making the call, yet for passthrough-to-DomU it's needed? Is it
> perhaps that it just so happened that for Dom0 things have been working
> on systems where it was tested, but the call should in principle have been
> there in this case, too [1]? That (to me at least) would make quite a
> difference for both this patch's description and us accepting it.
Oh, I think I know what's your concern now. Thanks.
First question, why gsi of device can work on PVH dom0:
Because when probe a driver to a normal device, it will call linux kernel 
side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> 
io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> 
hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> 
mp_register_gsi. So that the gsi can be registered.
Second question, why gsi of passthrough can't work on PVH dom0:
Because when assign a device to be passthrough, it uses pciback to probe the 
device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it 
doesn't unmask the gsi, and we can see on Xen side, the function 
vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is 
unmasked, so that the gsi can't work for passthrough device.

> 
> Jan
> 
> [1] Alternative e.g. being that because of other actions PVH Dom0 takes,
> like the IO-APIC RTE programming it does for IRQs it wants to use for
> itself, the necessary information is already suitably conveyed to Xen in
> that case. In such a case imo it's relevant to mention in the description.
> Not the least because iirc the pciback driver sets up a fake IRQ handler
> in such cases, which ought to lead to similar IO-APIC RTE programming, at
> which point the question would again arise why the hypercall needs
> exposing.

-- 
Best regards,
Jiqian Chen.


Re: [PATCH] AMD/IOMMU: Improve register_iommu_exclusion_range()

2024-06-19 Thread Jan Beulich
On 19.06.2024 09:45, Jan Beulich wrote:
> On 18.06.2024 20:31, Andrew Cooper wrote:
>> I've finally found the bit in the AMD IOMMU spec which says 64bit accesses 
>> are
>> permitted:
>>
>>   3.4 IOMMU MMIO Registers:
>>
>>   Software access to IOMMU registers may not be larger than 64 bits. Accesses
>>   must be aligned to the size of the access and the size in bytes must be a
>>   power of two. Software may use accesses as small as one byte.
> 
> I take it that the use of 32-bit writes was because of the past need
> also work in a 32-bit hypervisor, not because of perceived restrictions
> by the spec.

In fact it looks like we're already halfway through converting to writeq().

Jan



Re: [PATCH] AMD/IOMMU: Improve register_iommu_exclusion_range()

2024-06-19 Thread Jan Beulich
On 18.06.2024 20:31, Andrew Cooper wrote:
>  * Use 64bit accesses instead of 32bit accesses
>  * Simplify the constant names
>  * Pull base into a local variable to avoid it being reloaded because of the
>memory clobber in writeq().
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> 
> RFC.  This is my proposed way of cleaning up the whole IOMMU file.  The
> diffstat speaks for itself.

Absolutely.

> I've finally found the bit in the AMD IOMMU spec which says 64bit accesses are
> permitted:
> 
>   3.4 IOMMU MMIO Registers:
> 
>   Software access to IOMMU registers may not be larger than 64 bits. Accesses
>   must be aligned to the size of the access and the size in bytes must be a
>   power of two. Software may use accesses as small as one byte.

I take it that the use of 32-bit writes was because of the past need
also work in a 32-bit hypervisor, not because of perceived restrictions
by the spec.

> --- a/xen/drivers/passthrough/amd/iommu-defs.h
> +++ b/xen/drivers/passthrough/amd/iommu-defs.h
> @@ -338,22 +338,10 @@ union amd_iommu_control {
>  };
>  
>  /* Exclusion Register */
> -#define IOMMU_EXCLUSION_BASE_LOW_OFFSET  0x20
> -#define IOMMU_EXCLUSION_BASE_HIGH_OFFSET 0x24
> -#define IOMMU_EXCLUSION_LIMIT_LOW_OFFSET 0x28
> -#define IOMMU_EXCLUSION_LIMIT_HIGH_OFFSET0x2C
> -#define IOMMU_EXCLUSION_BASE_LOW_MASK0xF000U
> -#define IOMMU_EXCLUSION_BASE_LOW_SHIFT   12
> -#define IOMMU_EXCLUSION_BASE_HIGH_MASK   0xU
> -#define IOMMU_EXCLUSION_BASE_HIGH_SHIFT  0
> -#define IOMMU_EXCLUSION_RANGE_ENABLE_MASK0x0001U
> -#define IOMMU_EXCLUSION_RANGE_ENABLE_SHIFT   0
> -#define IOMMU_EXCLUSION_ALLOW_ALL_MASK   0x0002U
> -#define IOMMU_EXCLUSION_ALLOW_ALL_SHIFT  1
> -#define IOMMU_EXCLUSION_LIMIT_LOW_MASK   0xF000U
> -#define IOMMU_EXCLUSION_LIMIT_LOW_SHIFT  12
> -#define IOMMU_EXCLUSION_LIMIT_HIGH_MASK  0xU
> -#define IOMMU_EXCLUSION_LIMIT_HIGH_SHIFT 0
> +#define IOMMU_MMIO_EXCLUSION_BASE   0x20
> +#define   EXCLUSION_RANGE_ENABLE(1 << 0)
> +#define   EXCLUSION_ALLOW_ALL   (1 << 1)
> +#define IOMMU_MMIO_EXCLUSION_LIMIT  0x28

Just one question here: Previously you suggested we switch to bitfields
for anything like this, and we've already done so with e.g.
union amd_iommu_control and union amd_iommu_ext_features. IOW I wonder
if we wouldn't better strive to be consistent in this regard. Or if not,
what the (written or unwritten) guidelines are when to use which
approach.

Jan



[ovmf test] 186408: all pass - PUSHED

2024-06-19 Thread osstest service owner
flight 186408 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186408/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 4d4f56992460c039d0cfe48c394c2e07aecf1d22
baseline version:
 ovmf 26a30abdd0f7fe5a9d2421cba6efe9397185ad98

Last test of basis   186405  2024-06-19 01:11:08 Z0 days
Testing same since   186408  2024-06-19 05:41:10 Z0 days1 attempts


People who touched revisions under test:
  Dhaval 
  Dhaval Sharma 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   26a30abdd0..4d4f569924  4d4f56992460c039d0cfe48c394c2e07aecf1d22 -> 
xen-tested-master



Re: Design session notes: GPU acceleration in Xen

2024-06-19 Thread Christian König

Am 18.06.24 um 16:12 schrieb Demi Marie Obenour:

On Tue, Jun 18, 2024 at 08:33:38AM +0200, Christian König wrote:
> Am 18.06.24 um 02:57 schrieb Demi Marie Obenour:
>> On Mon, Jun 17, 2024 at 10:46:13PM +0200, Marek Marczykowski-Górecki
>> wrote:
>>> On Mon, Jun 17, 2024 at 09:46:29AM +0200, Roger Pau Monné wrote:
 On Sun, Jun 16, 2024 at 08:38:19PM -0400, Demi Marie Obenour wrote:
> In both cases, the device physical
> addresses are identical to dom0’s physical addresses.

 Yes, but a PV dom0 physical address space can be very scattered.

 IIRC there's an hypercall to request physically contiguous memory for
 PV, but you don't want to be using that every time you allocate a
 buffer (not sure it would support the sizes needed by the GPU
 anyway).
>>
>>> Indeed that isn't going to fly. In older Qubes versions we had PV
>>> sys-net with PCI passthrough for a network card. After some uptime it
>>> was basically impossible to restart and still have enough contagious
>>> memory for a network driver, and there it was about _much_ smaller
>>> buffers, like 2M or 4M. At least not without shutting down a lot more
>>> things to free some more memory.
>>
>> Ouch!  That makes me wonder if all GPU drivers actually need physically
>> contiguous buffers, or if it is (as I suspect) driver-specific. CCing
>> Christian König who has mentioned issues in this area.

> Well GPUs don't need physical contiguous memory to function, but if they
> only get 4k pages to work with it means a quite large (up to 30%)
> performance penalty.

The status quo is "no GPU acceleration at all", so 70% of bare metal
performance would be amazing right now.


Well AMD uses native context approach in XEN which which delivers over 
90% of bare metal performance.


Pierre-Eric can tell you more, but we certainly have GPU solutions in 
productions with XEN which would suffer greatly if we see the underlying 
memory fragmented like this.



  However, the implementation
should not preclude eliminating this performance penalty in the future.

What size pages do GPUs need for good performance?  Is it the same as
CPU huge pages?


2MiB are usually sufficient.

Regards,
Christian.


  PV dom0 doesn't get huge pages at all, but PVH and HVM
guests do, and the goal is to move away from PV guests as they have lots
of unrelated problems.

> So scattering memory like you described is probably a very bad idea 
if you

> want any halve way decent performance.

For an initial prototype a 30% performance penalty is acceptable, but
it's good to know that memory fragmentation needs to be avoided.

> Regards,
> Christian

Thanks for the prompt response!





Re: [PATCH v3 3/3] x86/irq: forward pending interrupts to new destination in fixup_irqs()

2024-06-19 Thread Jan Beulich
On 19.06.2024 09:05, Roger Pau Monné wrote:
> On Tue, Jun 18, 2024 at 06:30:22PM +0200, Jan Beulich wrote:
>> On 18.06.2024 16:50, Roger Pau Monné wrote:
>>> On Tue, Jun 18, 2024 at 04:34:50PM +0200, Jan Beulich wrote:
 On 18.06.2024 13:30, Roger Pau Monné wrote:
> On Mon, Jun 17, 2024 at 03:41:12PM +0200, Jan Beulich wrote:
>> On 13.06.2024 18:56, Roger Pau Monne wrote:
>>> @@ -2686,11 +2705,27 @@ void fixup_irqs(const cpumask_t *mask, bool 
>>> verbose)
>>>  if ( desc->handler->disable )
>>>  desc->handler->disable(desc);
>>>  
>>> +/*
>>> + * If the current CPU is going offline and is (one of) the 
>>> target(s) of
>>> + * the interrupt, signal to check whether there are any 
>>> pending vectors
>>> + * to be handled in the local APIC after the interrupt has 
>>> been moved.
>>> + */
>>> +if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, 
>>> desc->arch.cpu_mask) )
>>> +check_irr = true;
>>> +
>>>  if ( desc->handler->set_affinity )
>>>  desc->handler->set_affinity(desc, affinity);
>>>  else if ( !(warned++) )
>>>  set_affinity = false;
>>>  
>>> +if ( check_irr && apic_irr_read(vector) )
>>> +/*
>>> + * Forward pending interrupt to the new destination, this 
>>> CPU is
>>> + * going offline and otherwise the interrupt would be lost.
>>> + */
>>> +send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
>>> +  desc->arch.vector);
>>
>> Hmm, IRR may become set right after the IRR read (unlike in the other 
>> cases,
>> where new IRQs ought to be surfacing only at the new destination). 
>> Doesn't
>> this want moving ...
>>
>>>  if ( desc->handler->enable )
>>>  desc->handler->enable(desc);
>>
>> ... past the actual affinity change?
>
> Hm, but the ->enable() hook is just unmasking the interrupt, the
> actual affinity change is done in ->set_affinity(), and hence after
> the call to ->set_affinity() no further interrupts should be delivered
> to the CPU regardless of whether the source is masked?
>
> Or is it possible for the device/interrupt controller to not switch to
> use the new destination until the interrupt is unmasked, and hence
> could have pending masked interrupts still using the old destination?
> IIRC For MSI-X it's required that the device updates the destination
> target once the entry is unmasked.

 That's all not relevant here, I think. IRR gets set when an interrupt is
 signaled, no matter whether it's masked.
>>>
>>> I'm kind of lost here, what does signaling mean in this context?
>>>
>>> I would expect the interrupt vector to not get set in IRR if the MSI-X
>>> entry is masked, as at that point the state of the address/data fields
>>> might not be consistent (that's the whole point of masking it right?)
>>>
 It's its handling which the
 masking would prevent, i.e. the "moving" of the set bit from IRR to ISR.
>>>
>>> My understanding was that the masking would prevent the message write to
>>> the APIC from happening, and hence no vector should get set in IRR.
>>
>> Hmm, yes, looks like I was confused. The masking is at the source side
>> (IO-APIC RTE, MSI-X entry, or - if supported - in the MSI capability).
>> So the sole case to worry about is MSI without mask-bit support then.
> 
> Yeah, and for MSI without masking bit support we don't care doing the
> IRR check before or after the ->enable() hook, as that's a no-op in
> that case.  The write to the MSI address/data fields has already been
> done, and hence the issue would be exclusively with draining any
> in-flight writes to the APIC doorbell (what you mention below).

Except that both here ...

 Plus we run with IRQs off here anyway if I'm not mistaken, so no
 interrupt can be delivered to the local CPU. IOW whatever IRR bits it
 has set (including ones becoming set between the IRR read and the actual
 vector change), those would never be serviced. Hence the reading of the
 bit ought to occur after the vector change: It's only then that we know
 the IRR bit corresponding to the old vector can't become set anymore.
>>>
>>> Right, and the vector change happens in ->set_affinity(), not
>>> ->enable().  See for example set_msi_affinity() and the
>>> write_msi_msg(), that's where the vector gets changed.
>>>
 And even then we're assuming that no interrupt signals might still be
 "on their way" from the IO-APIC or a posted MSI message write by a
 device to the LAPIC (I have no idea how to properly fence that, or
 whether there are guarantees for this to never occur).
>>>
>>> Yeah, those I expect would be completed in the window betwee

Re: [PATCH for-4.19] xen/irq: Address MISRA Rule 8.3 violation

2024-06-19 Thread Oleksii K.
On Tue, 2024-06-18 at 14:00 +0100, Andrew Cooper wrote:
> When centralising irq_ack_none(), different architectures had
> different names
> for the parameter of irq_ack_none().  As it's type is struct irq_desc
> *, it
> should be named desc.  Make this consistent.
> 
> No functional change.
> 
> Fixes: 8aeda4a241ab ("arch/irq: Make irq_ack_none() mandatory")
> Signed-off-by: Andrew Cooper 
Release-Acked-by: Oleksii Kurochko 

~ Oleksii

> ---
> CC: George Dunlap 
> CC: Jan Beulich 
> CC: Stefano Stabellini 
> CC: Julien Grall 
> CC: Volodymyr Babchuk 
> CC: Bertrand Marquis 
> CC: Michal Orzel 
> CC: Oleksii Kurochko 
> CC: Roberto Bagnara 
> CC: Nicola Vetrini 
> CC: consult...@bugseng.com 
> 
> Request for 4.19.  This was an accidental regression in a recent
> cleanup
> patch, and the fix is just a rename - its no functional change.
> ---
>  xen/arch/arm/irq.c    | 4 ++--
>  xen/include/xen/irq.h | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> index c60502444ccf..6b89f64fd194 100644
> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -31,9 +31,9 @@ struct irq_guest
>  unsigned int virq;
>  };
>  
> -void irq_ack_none(struct irq_desc *irq)
> +void irq_ack_none(struct irq_desc *desc)
>  {
> -    printk("unexpected IRQ trap at irq %02x\n", irq->irq);
> +    printk("unexpected IRQ trap at irq %02x\n", desc->irq);
>  }
>  
>  void irq_end_none(struct irq_desc *irq)
> diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
> index adf33547d25f..580ae37e7428 100644
> --- a/xen/include/xen/irq.h
> +++ b/xen/include/xen/irq.h
> @@ -134,7 +134,7 @@ void cf_check irq_actor_none(struct irq_desc
> *desc);
>   * irq_ack_none() must be provided by the architecture.
>   * irq_end_none() is optional, and opted into using a define.
>   */
> -void cf_check irq_ack_none(struct irq_desc *irq);
> +void cf_check irq_ack_none(struct irq_desc *desc);
>  
>  /*
>   * Per-cpu interrupted context register state - the inner-most
> interrupt frame
> 
> base-commit: 8b4243a9b560c89bb259db5a27832c253d4bebc7




Re: [PATCH for-4.19] avoid UB in guest handle arithmetic

2024-06-19 Thread Oleksii K.
On Tue, 2024-06-18 at 14:24 +0100, Andrew Cooper wrote:
> On 19/03/2024 1:26 pm, Jan Beulich wrote:
> > At least XENMEM_memory_exchange can have huge values passed in the
> > nr_extents and nr_exchanged fields. Adding such values to pointers
> > can
> > overflow, resulting in UB. Cast respective pointers to "unsigned
> > long"
> > while at the same time making the necessary multiplication
> > explicit.
> > Remaining arithmetic is, despite there possibly being mathematical
> > overflow, okay as per the C99 spec: "A computation involving
> > unsigned
> > operands can never overflow, because a result that cannot be
> > represented
> > by the resulting unsigned integer type is reduced modulo the number
> > that
> > is one greater than the largest value that can be represented by
> > the
> > resulting type." The overflow that we need to guard against is
> > checked
> > for in array_access_ok().
> > 
> > Note that in / down from array_access_ok() the address value is
> > only
> > ever cast to "unsigned long" anyway, which is why in the invocation
> > from
> > guest_handle_subrange_okay() the value doesn't need casting back to
> > pointer type.
> > 
> > In compat grant table code change two guest_handle_add_offset() to
> > avoid
> > passing in negative offsets.
> > 
> > Since {,__}clear_guest_offset() need touching anyway, also deal
> > with
> > another (latent) issue there: They were losing the handle type,
> > i.e. the
> > size of the individual objects accessed. Luckily the few users we
> > presently have all pass char or uint8 handles.
> > 
> > Reported-by: Andrew Cooper 
> > Signed-off-by: Jan Beulich 
> 
> There wants to be a xen: prefix in the subject.
> 
> But as for the UB aspect, I've checked that this does resolve the
> failure identified by the XSA-212 XTF test.
> 
> Acked-by: Andrew Cooper 
> Tested-by: Andrew Cooper 
> 
> CC'ing Oleksii as this wants to go into 4.19.
Release-Acked-By: Oleksii Kurochko 

~ Oleksii




Re: [PATCH v3 3/3] x86/irq: forward pending interrupts to new destination in fixup_irqs()

2024-06-19 Thread Roger Pau Monné
On Tue, Jun 18, 2024 at 06:30:22PM +0200, Jan Beulich wrote:
> On 18.06.2024 16:50, Roger Pau Monné wrote:
> > On Tue, Jun 18, 2024 at 04:34:50PM +0200, Jan Beulich wrote:
> >> On 18.06.2024 13:30, Roger Pau Monné wrote:
> >>> On Mon, Jun 17, 2024 at 03:41:12PM +0200, Jan Beulich wrote:
>  On 13.06.2024 18:56, Roger Pau Monne wrote:
> > @@ -2686,11 +2705,27 @@ void fixup_irqs(const cpumask_t *mask, bool 
> > verbose)
> >  if ( desc->handler->disable )
> >  desc->handler->disable(desc);
> >  
> > +/*
> > + * If the current CPU is going offline and is (one of) the 
> > target(s) of
> > + * the interrupt, signal to check whether there are any 
> > pending vectors
> > + * to be handled in the local APIC after the interrupt has 
> > been moved.
> > + */
> > +if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, 
> > desc->arch.cpu_mask) )
> > +check_irr = true;
> > +
> >  if ( desc->handler->set_affinity )
> >  desc->handler->set_affinity(desc, affinity);
> >  else if ( !(warned++) )
> >  set_affinity = false;
> >  
> > +if ( check_irr && apic_irr_read(vector) )
> > +/*
> > + * Forward pending interrupt to the new destination, this 
> > CPU is
> > + * going offline and otherwise the interrupt would be lost.
> > + */
> > +send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
> > +  desc->arch.vector);
> 
>  Hmm, IRR may become set right after the IRR read (unlike in the other 
>  cases,
>  where new IRQs ought to be surfacing only at the new destination). 
>  Doesn't
>  this want moving ...
> 
> >  if ( desc->handler->enable )
> >  desc->handler->enable(desc);
> 
>  ... past the actual affinity change?
> >>>
> >>> Hm, but the ->enable() hook is just unmasking the interrupt, the
> >>> actual affinity change is done in ->set_affinity(), and hence after
> >>> the call to ->set_affinity() no further interrupts should be delivered
> >>> to the CPU regardless of whether the source is masked?
> >>>
> >>> Or is it possible for the device/interrupt controller to not switch to
> >>> use the new destination until the interrupt is unmasked, and hence
> >>> could have pending masked interrupts still using the old destination?
> >>> IIRC For MSI-X it's required that the device updates the destination
> >>> target once the entry is unmasked.
> >>
> >> That's all not relevant here, I think. IRR gets set when an interrupt is
> >> signaled, no matter whether it's masked.
> > 
> > I'm kind of lost here, what does signaling mean in this context?
> > 
> > I would expect the interrupt vector to not get set in IRR if the MSI-X
> > entry is masked, as at that point the state of the address/data fields
> > might not be consistent (that's the whole point of masking it right?)
> > 
> >> It's its handling which the
> >> masking would prevent, i.e. the "moving" of the set bit from IRR to ISR.
> > 
> > My understanding was that the masking would prevent the message write to
> > the APIC from happening, and hence no vector should get set in IRR.
> 
> Hmm, yes, looks like I was confused. The masking is at the source side
> (IO-APIC RTE, MSI-X entry, or - if supported - in the MSI capability).
> So the sole case to worry about is MSI without mask-bit support then.

Yeah, and for MSI without masking bit support we don't care doing the
IRR check before or after the ->enable() hook, as that's a no-op in
that case.  The write to the MSI address/data fields has already been
done, and hence the issue would be exclusively with draining any
in-flight writes to the APIC doorbell (what you mention below).

> >> Plus we run with IRQs off here anyway if I'm not mistaken, so no
> >> interrupt can be delivered to the local CPU. IOW whatever IRR bits it
> >> has set (including ones becoming set between the IRR read and the actual
> >> vector change), those would never be serviced. Hence the reading of the
> >> bit ought to occur after the vector change: It's only then that we know
> >> the IRR bit corresponding to the old vector can't become set anymore.
> > 
> > Right, and the vector change happens in ->set_affinity(), not
> > ->enable().  See for example set_msi_affinity() and the
> > write_msi_msg(), that's where the vector gets changed.
> > 
> >> And even then we're assuming that no interrupt signals might still be
> >> "on their way" from the IO-APIC or a posted MSI message write by a
> >> device to the LAPIC (I have no idea how to properly fence that, or
> >> whether there are guarantees for this to never occur).
> > 
> > Yeah, those I expect would be completed in the window between the
> > write of the new vector/destination and the reading of IRR.
> 
> Except 

Re: [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device

2024-06-19 Thread Jan Beulich
On 19.06.2024 05:39, Chen, Jiqian wrote:
> On 2024/6/18 16:33, Jan Beulich wrote:
>> On 18.06.2024 08:25, Chen, Jiqian wrote:
>>> On 2024/6/17 22:17, Jan Beulich wrote:
 On 17.06.2024 11:00, Jiqian Chen wrote:
> --- a/xen/drivers/pci/physdev.c
> +++ b/xen/drivers/pci/physdev.c
> @@ -2,11 +2,17 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifndef COMPAT
>  typedef long ret_t;
>  #endif
>  
> +static const struct pci_device_state_reset_method
> +pci_device_state_reset_methods[] = {
> +[ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
> +};

 What about the other three DEVICE_RESET_*? In particular ...
>>> I don't know how to implement the other three types of reset.
>>> This is a design form so that corresponding processing functions can be 
>>> added later if necessary. Do I need to set them to NULL pointers in this 
>>> array?
>>
>> No.
>>
>>> Does this form conform to your previous suggestion of using only one 
>>> hypercall to handle all types of resets?
>>
>> Yes, at least in principle. Question here is: To be on the safe side,
>> wouldn't we better reset state for all forms of reset, leaving possible
>> relaxation of that for later? At which point the function wouldn't need
>> calling indirectly, and instead would be passed the reset type as an
>> argument.
> If I understood correctly, next version should be?
> Use macros to represent different reset types.
> Add switch cases in PHYSDEVOP_pci_device_state_reset to handle different 
> reset functions.
> Add reset_type as a function parameter to vpci_reset_device_state for 
> possible future use.
> 
> +case PHYSDEVOP_pci_device_state_reset:
> +{
> +struct pci_device_state_reset dev_reset;
> +struct pci_dev *pdev;
> +pci_sbdf_t sbdf;
> +
> +if ( !is_pci_passthrough_enabled() )
> +return -EOPNOTSUPP;
> +
> +ret = -EFAULT;
> +if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
> +break;
> +
> +sbdf = PCI_SBDF(dev_reset.dev.seg,
> +dev_reset.dev.bus,
> +dev_reset.dev.evfn);
> +
> +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
> +if ( ret )
> +break;
> +
> +pcidevs_lock();
> +pdev = pci_get_pdev(NULL, sbdf);
> +if ( !pdev )
> +{
> +pcidevs_unlock();
> +ret = -ENODEV;
> +break;
> +}
> +
> +write_lock(&pdev->domain->pci_lock);
> +pcidevs_unlock();
> +/* Implement FLR, other reset types may be implemented in future */
> +switch ( dev_reset.reset_type )
> +{
> +case PCI_DEVICE_STATE_RESET_COLD:
> +case PCI_DEVICE_STATE_RESET_WARM:
> +case PCI_DEVICE_STATE_RESET_HOT:
> +case PCI_DEVICE_STATE_RESET_FLR:
> +ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
> +break;
> +}

If you use a switch() here, then there wants to be a default case returning
e.g. -EOPNOTSUPP or -EINVAL. Else the switch wants dropping. I'm not sure
which one's better in this specific case, I'm only slightly tending towards
the former.

In any event the comment (if any) wants to reflect what the actual code does.

Jan