[xen-unstable test] 183360: tolerable FAIL

2023-10-12 Thread osstest service owner
flight 183360 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183360/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-migrupgrade 21 debian-fixup/dst_host fail in 183347 pass in 183360
 test-armhf-armhf-xl   8 xen-boot fail in 183347 pass in 183360
 test-amd64-i386-examine-bios  6 xen-install fail pass in 183347

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 183347
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 183347
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 183347
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 183347
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 183347
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183347
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 183347
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 183347
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 183347
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 183347
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 183347
 test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install fail like 183347
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw  3 hosts-allocate   starved in 183347 n/a
 test-arm64-arm64-xl-credit2   3 hosts-allocate   starved in 183347 n/a
 

Re: [for-4.18][PATCH 2/2] x86/amd: Prevent potentially fetching wrong instruction bytes on Zen4

2023-10-12 Thread Andrew Cooper
On 13/10/2023 1:26 am, Alejandro Vallejo wrote:
> If Zen4 runs with SMT and without STIBP, then it may corrupt its own code
> stream. This is erratum #1485 in the AMD specs.

I'm afraid this description isn't fully accurate, and I can't elaborate
any further at this juncture.

Just say "address AMD erratum #1485".  When the revision guides do get
published, things will become clearer.

> diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
> index 22aa8c0085..2426e4cf15 100644
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -1166,6 +1166,17 @@ static void cf_check init_amd(struct cpuinfo_x86 *c)
>   if (c->x86 == 0x10)
>   __clear_bit(X86_FEATURE_MONITOR, c->x86_capability);
>  
> + /*
> +  * Erratum #1485: Outdated microcode on Zen4 may cause corruption
> +  * in the code stream if SMT is enabled and STIBP is not. The fix
> +  * is cheap, so it's applied irrespective of the loaded microcode.

Again, unfortunately not accurate.  Just link to the erratum here.

> +  */
> + if (!cpu_has_hypervisor && is_zen4_uarch()) {
> + rdmsrl(MSR_AMD64_BP_CFG, value);
> + wrmsrl(MSR_AMD64_BP_CFG,
> +value | AMD64_BP_CFG_SHARED_BTB_FIX);
> + }
> +
>   if (!cpu_has_amd_erratum(c, AMD_ERRATUM_121))
>   opt_allow_unsafe = 1;
>   else if (opt_allow_unsafe < 0)
> diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
> index 5a40bcc2ba..7d18f7c66b 100644
> --- a/xen/arch/x86/include/asm/amd.h
> +++ b/xen/arch/x86/include/asm/amd.h
> @@ -145,12 +145,20 @@
>   * Hygon (Fam18h) but without simple model number rules.  Instead, use STIBP
>   * as a heuristic that distinguishes the two.
>   *
> + * Zen3 and Zen4 are easier to spot by model number, but it's still not a
> + * single range. We use AutoIBRS to discriminate instead, as that's a
> + * Zen4-specific feature.

I'd strongly advise not passing commentary on whether Zen3/4 are easier
or harder to spot.  Just discuss the technical aspect.

> + *
>   * The caller is required to perform the appropriate vendor check first.
>   */
>  #define is_zen1_uarch() ((boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) && \
>   !boot_cpu_has(X86_FEATURE_AMD_STIBP))
>  #define is_zen2_uarch() (boot_cpu_data.x86 == 0x17 && \
>   boot_cpu_has(X86_FEATURE_AMD_STIBP))
> +#define is_zen3_uarch() (boot_cpu_data.x86 == 0x19 && \
> + !boot_cpu_has(X86_FEATURE_AUTO_IBRS))
> +#define is_zen4_uarch() (boot_cpu_data.x86 == 0x19 && \
> + boot_cpu_has(X86_FEATURE_AUTO_IBRS))
>  
>  struct cpuinfo_x86;
>  int cpu_has_amd_erratum(const struct cpuinfo_x86 *, int, ...);
> diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
> index 11ffed543a..4437e8a63e 100644
> --- a/xen/arch/x86/include/asm/msr-index.h
> +++ b/xen/arch/x86/include/asm/msr-index.h
> @@ -403,6 +403,8 @@
>  #define MSR_AMD64_DE_CFG 0xc0011029
>  #define AMD64_DE_CFG_LFENCE_SERIALISE(_AC(1, ULL) << 1)
>  #define MSR_AMD64_EX_CFG 0xc001102c
> +#define MSR_AMD64_BP_CFG 0xc001102e
> +#define AMD64_BP_CFG_SHARED_BTB_FIX  (_AC(1, ULL) << 5)

MSR_AMD64_BP_CFG is fine to have in msr-index.h (it's consistent across
generations), but SHARED_BTB_FIX is not.  It's model specific, so keep
it local to the errata workaround.

LFENCE_SERIALISE was special.  It happened to have always been
consistent, and was retroactively declared to be architectural.

~Andrew



Re: [for-4.18][PATCH 1/2] xen/x86: Add family guards to the is_zen[12]_uarch() macros

2023-10-12 Thread Andrew Cooper
On 13/10/2023 1:26 am, Alejandro Vallejo wrote:
> It slightly simplifies the code that uses them at no real cost because the
> code is very rarely executed. This makes it harder to confuse zen uarches
> due to missing or mistaken family checks.

I'm afraid I disagree.

It's bogus to do a family check without a vendor check.  By making this
change, you're separating (spatially in code, and therefore cognitively)
the two checks that it's important to be able to cross-check.

> diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
> index d862cb7972..5a40bcc2ba 100644
> --- a/xen/arch/x86/include/asm/amd.h
> +++ b/xen/arch/x86/include/asm/amd.h
> @@ -145,11 +145,12 @@
>   * Hygon (Fam18h) but without simple model number rules.  Instead, use STIBP
>   * as a heuristic that distinguishes the two.
>   *
> - * The caller is required to perform the appropriate vendor/family checks
> - * first.
> + * The caller is required to perform the appropriate vendor check first.
>   */
> -#define is_zen1_uarch() (!boot_cpu_has(X86_FEATURE_AMD_STIBP))
> -#define is_zen2_uarch()   boot_cpu_has(X86_FEATURE_AMD_STIBP)
> +#define is_zen1_uarch() ((boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) && \
> + !boot_cpu_has(X86_FEATURE_AMD_STIBP))
> +#define is_zen2_uarch() (boot_cpu_data.x86 == 0x17 && \
> + boot_cpu_has(X86_FEATURE_AMD_STIBP))

What leads you to believe there aren't Hygon Zen2s?

~Andrew



[linux-linus test] 183359: tolerable FAIL - PUSHED

2023-10-12 Thread osstest service owner
flight 183359 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183359/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds  8 xen-boot fail REGR. vs. 183344

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 183344
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 183344
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 183344
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 183344
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 183344
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 183344
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 183344
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183344
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass

version targeted for testing:
 linux                401644852d0b2a278811de38081be23f74b5bb04
baseline version:
 linux                1c8b86a3799f7e5be903c3f49fcdaee29fd385b5

Last test of basis   183344  2023-10-11 09:42:09 Z    1 days
Testing same since   183359  2023-10-12 13:01:48 Z    0 days    1 attempts


People who touched revisions under test:
  Benjamin Tissoires 
  Damien Le Moal 
  David Sterba 
  Dennis Gilmore 
  Douglas Anderson 
  Gustavo A. R. Silva 
  Hans de Goede 
  Jan Kara 
  Johan Hovold 
  John Ogness 
  Linus Torvalds 
  Ondrej Zary 
  Petr Mladek 
  Petr Tesarik 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   

Re: [for-4.18][PATCH 0/2] Fix AMD erratum #1485 on AMD Zen4

2023-10-12 Thread Henry Wang
Hi Alejandro,

> On Oct 13, 2023, at 01:25, Alejandro Vallejo  
> wrote:
> 
> This patch should make it to 4.18, as it prevents random crashes on AMD
> Zen4 running outdated microcode.

Yes I agree, so for this series:

Release-acked-by: Henry Wang 

Kind regards,
Henry


> 
> Under certain conditions Zen4 may corrupt its own code stream when SMT is
> enabled and STIBP is not. The Linux thread in which this was highlighted is
> in patch2's commit message.
> 
> NOTE: Still running in CI as of now, but tested locally. Pipeline here.
>https://gitlab.com/xen-project/people/agvallejo/xen/-/pipelines/1034895039
> 
> Patch 1: is just a minor refactor to ensure we don't get microarchitectures
> of different families mixed up now that we have 3 different
> families involved (Fam17h, Fam18h and Fam19h)
> Patch 2: The actual fix. It involves setting a bit in an MSR if it's a non
> virtualized zen4. It's not a direct copy of the Linux patch, as we
> have started using macros to find out microarchitectures from
> CPUID bits, rather than relying on models.
> 
> Alejandro Vallejo (2):
>  xen/x86: Add family guards to the is_zen[12]_uarch() macros
>  x86/amd: Prevent potentially fetching wrong instruction bytes on Zen4
> 
> xen/arch/x86/cpu/amd.c   | 16 +---
> xen/arch/x86/include/asm/amd.h   | 17 +
> xen/arch/x86/include/asm/msr-index.h |  2 ++
> xen/arch/x86/spec_ctrl.c |  3 ---
> 4 files changed, 28 insertions(+), 10 deletions(-)
> 
> -- 
> 2.34.1
> 




Re: [XEN PATCH][for-4.19 2/2] xen/iommu: use LOWEST_BIT to wrap a violation of Rule 10.1

2023-10-12 Thread Stefano Stabellini
On Thu, 12 Oct 2023, Nicola Vetrini wrote:
> No functional change.
> 
> Signed-off-by: Nicola Vetrini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH][for-4.19 1/2] xen/vmap: use LOWEST_BIT to wrap a violation of Rule 10.1

2023-10-12 Thread Stefano Stabellini
On Thu, 12 Oct 2023, Nicola Vetrini wrote:
> No functional change.
> 
> Signed-off-by: Nicola Vetrini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH][for-next v2 6/8] x86/mce: Move MC_NCLASSES into the enum mctelem_class

2023-10-12 Thread Stefano Stabellini
On Thu, 12 Oct 2023, Nicola Vetrini wrote:
> The definition of MC_NCLASSES contained a violation of MISRA C:2012
> Rule 10.1; moving it into the enum as an enumeration constant resolves
> the violation and makes it more resilient to possible additions to that enum.
> 
> Signed-off-by: Nicola Vetrini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH][for-next v2 4/8] x86_64/mm: express macro CNT using LOWEST_BIT

2023-10-12 Thread Stefano Stabellini
On Thu, 12 Oct 2023, Nicola Vetrini wrote:
> The various definitions of macro CNT (and the related BUILD_BUG_ON)
> can be rewritten using LOWEST_BIT, encapsulating a violation of
> MISRA C:2012 Rule 10.1.
> 
> Signed-off-by: Nicola Vetrini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH][for-4.19 v2 3/8] xen/pdx: amend definition of PDX_GROUP_COUNT

2023-10-12 Thread Stefano Stabellini
On Thu, 12 Oct 2023, Nicola Vetrini wrote:
> The definition of PDX_GROUP_COUNT causes violations of
> MISRA C:2012 Rule 10.1, therefore the problematic part now uses
> the LOWEST_BIT macro, which encapsulates the pattern.
> 
> Signed-off-by: Nicola Vetrini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH][for-4.19 v2 2/8] arm/bitops: encapsulate violation of MISRA C:2012 Rule 10.1

2023-10-12 Thread Stefano Stabellini
On Thu, 12 Oct 2023, Nicola Vetrini wrote:
> The definitions of ffs{l}? violate Rule 10.1, by using the well-known
> pattern (x & -x); its usage is wrapped by the LOWEST_BIT macro.
> 
> No functional change.
> 
> Signed-off-by: Nicola Vetrini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH][for-next][for-4.19 v2 1/8] xen/include: add macro LOWEST_BIT

2023-10-12 Thread Stefano Stabellini
On Thu, 12 Oct 2023, Nicola Vetrini wrote:
> The purpose of this macro is to encapsulate the well-known expression
> 'x & -x', that in 2's complement architectures on unsigned integers will
> give 2^ffs(x), where ffs(x) is the position of the lowest set bit in x.
> 
> A deviation for ECLAIR is also introduced.
> 
> Signed-off-by: Nicola Vetrini 
> ---
> Changes in v2:
> - rename to LOWEST_BIT
> ---
>  automation/eclair_analysis/ECLAIR/deviations.ecl | 6 ++
>  xen/include/xen/macros.h | 6 --
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl b/automation/eclair_analysis/ECLAIR/deviations.ecl
> index d8170106b449..b8e1155ee49d 100644
> --- a/automation/eclair_analysis/ECLAIR/deviations.ecl
> +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
> @@ -274,6 +274,12 @@ still non-negative."
>  -config=MC3R1.R10.1,etypes+={safe, 
> "stmt(operator(logical)||node(conditional_operator||binary_conditional_operator))",
>  "dst_type(ebool||boolean)"}
>  -doc_end
> 
> +-doc_begin="The macro LOWEST_BIT encapsulates a well-known pattern to obtain 
> the value
> +2^ffs(x) for unsigned integers on two's complement architectures
> +(all the architectures supported by Xen satisfy this requirement)."
> +-config=MC3R1.R10.1,reports+={safe, "any_area(any_loc(any_exp(macro(^LOWEST_BIT$))))"}
> +-doc_end

Please also add the same deviation and explanation to
docs/misra/deviations.rst. Other than that, this looks fine.


>  ### Set 3 ###
> 
>  #
> diff --git a/xen/include/xen/macros.h b/xen/include/xen/macros.h
> index d0caae7db298..af47179d1056 100644
> --- a/xen/include/xen/macros.h
> +++ b/xen/include/xen/macros.h
> @@ -8,8 +8,10 @@
>  #define DIV_ROUND(n, d) (((n) + (d) / 2) / (d))
>  #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
> 
> -#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))
> -#define MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m))
> +#define LOWEST_BIT(x) ((x) & -(x))
> +
> +#define MASK_EXTR(v, m) (((v) & (m)) / LOWEST_BIT(m))
> +#define MASK_INSR(v, m) (((v) * LOWEST_BIT(m)) & (m))
> 
>  #define count_args_(dot, a1, a2, a3, a4, a5, a6, a7, a8, x, ...) x
>  #define count_args(args...) \
> --
> 2.34.1
> 



Re: [XEN PATCH][for-4.19 v2 1/1] xen: introduce a deviation for Rule 11.9

2023-10-12 Thread Stefano Stabellini
On Wed, 11 Oct 2023, Nicola Vetrini wrote:
> The constant 0 is used instead of NULL in '__ACCESS_ONCE' as a
> compile-time check to detect non-scalar types; its usage for this
> purpose is deviated.
> 
> Furthermore, the 'access_field' and 'typeof_field' macros are
> introduced as a general way to deal with accesses to structs
> without declaring a struct variable.
> 
> No functional change intended.
> 
> Signed-off-by: Nicola Vetrini 

Reviewed-by: Stefano Stabellini 


> ---
> Changes in v2:
> - added entry in deviations.rst
> ---
>  automation/eclair_analysis/ECLAIR/deviations.ecl | 9 +
>  docs/misra/deviations.rst| 5 +
>  xen/include/xen/compiler.h   | 5 -
>  xen/include/xen/kernel.h | 2 +-
>  4 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl b/automation/eclair_analysis/ECLAIR/deviations.ecl
> index fa56e5c00a27..28d9c37bedb2 100644
> --- a/automation/eclair_analysis/ECLAIR/deviations.ecl
> +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
> @@ -246,6 +246,15 @@ constant expressions are required.\""
>"any()"}
>  -doc_end
>  
> +#
> +# Series 11
> +#
> +
> +-doc_begin="This construct is used to check if the type is scalar, and for 
> this purpose the use of 0 as a null pointer constant is deliberate."
> +-config=MC3R1.R11.9,reports+={deliberate, "any_area(any_loc(any_exp(macro(^__ACCESS_ONCE$))))"}
> +-doc_end
> +
>  #
>  # Series 13
>  #
> diff --git a/docs/misra/deviations.rst b/docs/misra/deviations.rst
> index ee7aed0609d2..1b00e4e3e9b7 100644
> --- a/docs/misra/deviations.rst
> +++ b/docs/misra/deviations.rst
> @@ -199,6 +199,11 @@ Deviations related to MISRA C:2012 Rules:
> See automation/eclair_analysis/deviations.ecl for the full 
> explanation.
>   - Tagged as `safe` for ECLAIR.
>  
> +   * - R11.9
> + - __ACCESS_ONCE uses a 0 as a null pointer constant to check if a type 
> is
> +   scalar, therefore its usage for this purpose is allowed.
> + - Tagged as `deliberate` for ECLAIR.
> +
> * - R13.5
>   - All developers and reviewers can be safely assumed to be well aware of
> the short-circuit evaluation strategy for logical operators.
> diff --git a/xen/include/xen/compiler.h b/xen/include/xen/compiler.h
> index dd99e573083f..15be9a750b23 100644
> --- a/xen/include/xen/compiler.h
> +++ b/xen/include/xen/compiler.h
> @@ -109,13 +109,16 @@
>  
>  #define offsetof(a,b) __builtin_offsetof(a,b)
>  
> +/* Access the field of structure type, without defining a local variable */
> +#define access_field(type, member) (((type *)NULL)->member)
> +#define typeof_field(type, member) typeof(access_field(type, member))
>  /**
>   * sizeof_field(TYPE, MEMBER)
>   *
>   * @TYPE: The structure containing the field of interest
>   * @MEMBER: The field to return the size of
>   */
> -#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
> +#define sizeof_field(TYPE, MEMBER) sizeof(access_field(TYPE, MEMBER))
>  
>  #if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
>  #define alignof __alignof__
> diff --git a/xen/include/xen/kernel.h b/xen/include/xen/kernel.h
> index 46b3c9c02625..2c5ed7736c99 100644
> --- a/xen/include/xen/kernel.h
> +++ b/xen/include/xen/kernel.h
> @@ -51,7 +51,7 @@
>   *
>   */
>  #define container_of(ptr, type, member) ({  \
> -typeof( ((type *)0)->member ) *__mptr = (ptr);  \
> +typeof_field(type, member) *__mptr = (ptr); \
>  (type *)( (char *)__mptr - offsetof(type,member) );})
>  
>  /*
> -- 
> 2.34.1
> 



Re: [XEN PATCH][for-4.19] xen/include: make enum perfcounter anonymous

2023-10-12 Thread Stefano Stabellini
On Wed, 11 Oct 2023, Nicola Vetrini wrote:
> Using enumerators declared in a named enum, such as the one modified,
> as operands to arithmetic operators is not allowed by MISRA C:2012 Rule 10.1.
> The enumerators of an anonymous enum can be used instead.
> 
> Signed-off-by: Nicola Vetrini 

Reviewed-by: Stefano Stabellini 


> ---
> This violation manifeststs itself, for instance, in all uses of macro
> 'perfc_incra' from xen/include/xen/perfc.h, because the expansion
> contains an arithmetic operation on two enum constants from enum perfcounter.
> 
> ( (*nr) <= PERFC_LAST_hypercalls - PERFC_hypercalls ?  [...]
> 
> ---
>  docs/misra/rules.rst| 3 +++
>  xen/include/xen/perfc.h | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
> index 3139ca7ae6dd..26c3ff819948 100644
> --- a/docs/misra/rules.rst
> +++ b/docs/misra/rules.rst
> @@ -341,6 +341,9 @@ maintainers if you want to suggest a change.
> compilers' extensions)
>   - Implicit conversions to boolean for conditionals (?: if while
> for) and logical operators (! || &&)
> + - The essential type model allows the constants defined by anonymous
> +   enums (e.g., enum { A, B, C }) to be used as operands to 
> arithmetic
> +   operators, as they have a signed essential type.
> 
> * - `Rule 10.2 
> `_
>   - Required
> diff --git a/xen/include/xen/perfc.h b/xen/include/xen/perfc.h
> index 7c5ce537bd02..96022c07481e 100644
> --- a/xen/include/xen/perfc.h
> +++ b/xen/include/xen/perfc.h
> @@ -39,7 +39,7 @@
>  #define PERFSTATUS   PERFCOUNTER
>  #define PERFSTATUS_ARRAY PERFCOUNTER_ARRAY
> 
> -enum perfcounter {
> +enum {
>  #include 
>   NUM_PERFCOUNTERS
>  };
> --
> 2.34.1
> 



Re: [XEN PATCH] xen: Add SAF deviations for MISRA C:2012 Rule 7.1

2023-10-12 Thread Stefano Stabellini
On Wed, 11 Oct 2023, Julien Grall wrote:
> On 11/10/2023 11:53, Luca Fancellu wrote:
> > 
> > > > > 
> > > > > Luca answered to a similar, more generic, question a few days ago
> > > > > about
> > > > > Coverity: https://marc.info/?l=xen-devel&m=169657904027210
> > > > Interesting.
> > > > > 
> > > > > So even if we get cppcheck to work like that, we would lose Coverity
> > > > > support.
> > > > I think my suggestion was probably misunderstood. So let me clarify it.
> > > > To cover multi-line, we could write the following in Xen:
> > > > /* cppcheck tag next-3-lines */
> > > > line 1
> > > > line 2
> > > > line 3
> > > > AFAIU Eclair supports multi-line, so the tag would be transformed to
> > > > their internal solution. For CPPCheck, this could be transformed to:
> > > > /* cppcheck tag next-3 lines */
> > > > line 1 /* generated cppcheck tag */
> > > > line 2 /* generated cppcheck tag */
> > > > line 3 /* generated cppcheck tag */
> > > > Now, I understand that coverity doesn't support either of the two formats.
> > > > So the only solution would be to add the comment before each line which
> > > > would impact the line numbers. This is not great, but I think this is
> > > > acceptable as the context would likely help to find where this is coming
> > > > from.
> > > 
> > > Just to clarify, here I meant that for coverity, a script before the scan
> > > could convert to the multi-line version. So the line change only impacts
> > > Coverity.
> > 
> > Hi Julien,
> > 
> > We’ve tried to avoid that because when the tool inserts lines, the resultant
> > report would give wrong line numbers if any violation is reported after the
> > insertion points. So there will be a mismatch between the codebase and the
> > report findings from some point on in the file.
> 
> I know. Stefano already pointed that out. But as I wrote, I don't think this
> is a big problem as it only affects one tool (Coverity) and one would still be
> able to find the exact place based on the context.

Yeah I think we could consider going that way if it only affects 1 tool
out of 3.

We might still have to patch cppcheck to add the functionality but it
might not be that difficult.

Re: [XEN PATCH][for-4.19 v2 2/2] docs/misra: add deviations.rst to document additional deviations.

2023-10-12 Thread Stefano Stabellini
On Wed, 11 Oct 2023, Nicola Vetrini wrote:
> On 11/10/2023 17:00, Nicola Vetrini wrote:
> > > > > > > +
> > > > > > > +   * - R2.1
> > > > > > > + - The compiler implementation guarantees that the
> > > > > > > unreachable code
> > > > > > > is
> > > > > > > +   removed. Constant expressions and unreachable branches of
> > > > > > > if and
> > > > > > > switch
> > > > > > > +   statements are expected.
> > > > > > > + - Tagged as `safe` for ECLAIR.
> > > > > > > +
> > > > > > > +   * - R2.1
> > > > > > > + - Some functions are intended not to be referenced.
> > > > > > > + - Tagged as `deliberate` for ECLAIR.
> > > > > > 
> > > > > > What does it mean "some functions" in this case? Should we list
> > > > > > which
> > > > > > functions?
> > > > > > 
> > > > > 
> > > > > Well, there are a lot, typically resulting from build configurations
> > > > > that do
> > > > > not
> > > > > use them, or because they are used only in asm code. I can mention
> > > > > these
> > > > > reasons in the
> > > > > document, to make it easier to understand.
> > > > 
> > > > Yes, I think we need to clarify further this point, because saying "Some
> > > > functions" doesn't help the reader understand:
> > > > - whether all functions can be not referenced
> > > > - which subset of functions can be not referenced
> > > > 
> > > > How to distinguish between? How do we know whether a certain patch is
> > > > violating the rule or not?
> > > > 
> > > > If there is a clear list of functions that can be not referenced, then
> > > > we should list them here. If there is a methodology we can use to
> > > > distinguish between them (e.g. functions called from asm only) then we
> > > > can write the methodology here. Either way it is fine as long as the
> > > > criteria to know if it is OK if a function is not referenced is clear.
> > > 
> > > Aren't they more or less the one we tagged with SAF-1-safe because
> > > there were no prototype? If so, we could use the same tags.
> > > 
> > > We could introduce an extra tags for the others. An alternative would
> > > be to add an attribute (e.g. asmcall) to mark each function used by
> > > assembly.
> > > 
> > > Cheers,
> > 
> > Both suggestions do have some value. As it is, it's not distinguishable
> > what causes a
> > function to be unreferenced in a certain analysis config. However:
> > 
> > - functions only used by asm code can be specified in the ECLAIR
> > config so that they will
> >   have an extra fake reference as far as the checker is concerned. I
> > can do that on a
> >   separate patch and list them in deviations.rst. An attribute seems a
> > good way to signal the
> >   intention.
> > - Functions that have no reference only in the current analysis should
> > have their declaration
> >   #ifdef-ed out in the configurations where they are not used, in an
> > ideal world.
> > - Truly unreferenced functions should be removed, or justified
> 
> Especially the last two appear somewhat tricky to disentangle, as they do
> require knowledge of
> possible code paths.

First let me premise that if we are unsure on how to proceed on this you
can resend this patch series without this item ("Some functions are
intended not to be referenced"), so at least the rest can go in now.

On this specific point, I think we should only make clear and
unmistakable statements. For instance, I think it is OK to say that all
the functions only used by asm code are exceptions (ideally they would
have a asmcall tag as Julien suggested) because that is deterministic.

Functions that have no references in a specific kconfig configuration
should have their definition #ifdef'ed (not necessarily the
declaration, I think we have already clarified that it is OK to have a
declaration without definition.) 

Truly unreferenced functions should be removed.

In conclusion, I think we should only have "functions only called from
asm code" as a deviation here.



Re: [PATCH v2 0/4] Introduce %rip-relative addressing to PER_CPU_VAR macro

2023-10-12 Thread Uros Bizjak
On Thu, Oct 12, 2023 at 11:22 PM H. Peter Anvin  wrote:
>
> On 10/12/23 14:17, Uros Bizjak wrote:
> >>
> >> Are you PIC-adjusting the percpu variables as well?
> >
> > After this patch (and after fixing percpu_stable_op to use the "a" operand
> > modifier on GCC), only *one* absolute reference to a
> > percpu variable remains, in xen-head.S, where:
> >
> >  movq $INIT_PER_CPU_VAR(fixed_percpu_data), %rax
> >
> > should be changed to use leaq.
> >
> > All others should then be (%rip)-relative.
> >
>
> I mean, the symbols themselves are relative, not absolute?

The reference to the symbol is relative to the segment register, but
absolute with respect to the location of the instruction. If the
executable changes location, the instruction moves and the reference is
no longer valid. A (%rip)-relative reference compensates for the changed
location of the instruction.

Uros.



Re: Xen 4.18 release: Reminder about code freeze

2023-10-12 Thread Stefano Stabellini
On Thu, 12 Oct 2023, George Dunlap wrote:
> > > Stop tinkering in the hope that it hides the problem.  You're only
> > > making it harder to fix properly.
> >
> > Making it harder to fix properly would be a valid reason not to commit
> > the (maybe partial) fix. But looking at the fix again:
> >
> > diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c
> > index a6cd199fdc..9cd6678015 100644
> > --- a/tools/xenstored/domain.c
> > +++ b/tools/xenstored/domain.c
> > @@ -989,6 +989,7 @@ static struct domain *introduce_domain(const void *ctx,
> > talloc_steal(domain->conn, domain);
> >
> > if (!restore) {
> > +   domain_conn_reset(domain);
> > /* Notify the domain that xenstore is available */
> > interface->connection = XENSTORE_CONNECTED;
> > xenevtchn_notify(xce_handle, domain->port);
> > @@ -1031,8 +1032,6 @@ int do_introduce(const void *ctx, struct connection 
> > *conn,
> > if (!domain)
> > return errno;
> >
> > -   domain_conn_reset(domain);
> > -
> > send_ack(conn, XS_INTRODUCE);
> >
> > It is a 1-line movement. Textually small. Easy to understand and to
> > revert. It doesn't seem to be making things harder to fix? We could
> > revert it any time if a better fix is offered.
> >
> > Maybe we could have a XXX note in the commit message or in-code
> > comment?
> 
> It moves a line from one function (do_domain_introduce()) into a
> completely different function (introduce_domain()), nested inside two
> if() statements; with no analysis on how the change will impact
> things.

I am not the original author of the patch, and I am not the maintainer
of the code, so I don't feel I have the qualifications to give you the
answers you are seeking. Julien as author of the patch and xenstore
reviewer might be in a better position to answer. Or Juergen as xenstore
maintainer.

From what I can see, the patch is correct.

We are removing a call to domain_conn_reset in do_introduce.
We are adding a call to domain_conn_reset in introduce_domain, which
do_introduce calls right before that point. Yes, there are 2 if
statements, but domain_conn_reset is added in the right location: the
non-already-introduced, non-restore code path.


> Are there any paths through do_domain_introduce() that now *won't* get
> a domain_conn_reset() call?  Is that OK?

Yes, the already-introduced and the restore code paths. The operations in
the already-introduced or the restore code paths seem simple enough not
to require a domain_conn_reset. Julien and Juergen should confirm.


> Is introduce_domain() called in other places?  Will those places now
> get extra domain_conn_reset() calls they weren't expecting?  Is that
> OK?

introduce_domain is called by dom0_init, but I am guessing that dom0 is
already-introduced so it wouldn't get an extra domain_conn_reset. Julien
and Juergen should confirm.


> I mean, it certainly seems strange to set the state to CONNECTED, send
> off an event channel, and then after that delete all watches /
> transactions / buffered data and so on; but we need at least a basic
> understanding of what's going on to know that this change isn't going
> to break something.
> 
> Not knowing much about the xenstore protocol: In the
> (!domain->introduced) case, will there be anything to actually delete?
>  It seems like it would only be necessary / useful on the
> (domain->introduced) case.

Doesn't it seem weird to you that we set a connection to CONNECTED,
notify the domain that it is ready to go, and only *after* that we reset
the connection to zero?

What happens if a domain starts using the connection as soon as it
receives the event channel notification and before domain_conn_reset is
called?



[PATCH v10 15/17] xen/arm: account IO handlers for emulated PCI MSI-X

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So, while adding IO trap handlers for the emulated
MSI-X registers, we need to explicitly state that we have additional IO
handlers so that they are accounted for.

Signed-off-by: Oleksandr Andrushchenko 
Acked-by: Julien Grall 
---
Cc: Julien Grall 
Cc: Stefano Stabellini 
---
This actually moved here from part 2 of the prep work for PCI
passthrough on Arm, as this seems to be the proper place for it.

Since v5:
- optimize with IS_ENABLED(CONFIG_HAS_PCI_MSI) since VPCI_MAX_VIRT_DEV is
  defined unconditionally
New in v5
---
 xen/arch/arm/vpci.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 58e2a20135..01b50d435e 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -134,6 +134,8 @@ static int vpci_get_num_handlers_cb(struct domain *d,
 
 unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
 {
+unsigned int count;
+
 if ( !has_vpci(d) )
 return 0;
 
@@ -154,7 +156,17 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct 
domain *d)
  * For guests each host bridge requires one region to cover the
  * configuration space. At the moment, we only expose a single host bridge.
  */
-return 1;
+count = 1;
+
+/*
+ * There's a single MSI-X MMIO handler that deals with both PBA
+ * and MSI-X tables per each PCI device being passed through.
+ * Maximum number of emulated virtual devices is VPCI_MAX_VIRT_DEV.
+ */
+if ( IS_ENABLED(CONFIG_HAS_PCI_MSI) )
+count += VPCI_MAX_VIRT_DEV;
+
+return count;
 }
 
 /*
-- 
2.42.0



[PATCH v10 17/17] arm/vpci: honor access size when returning an error

2023-10-12 Thread Volodymyr Babchuk
A guest can try to read config space using different access sizes: 8,
16, 32, 64 bits. We need to take this into account when returning an
error to the MMIO handler, otherwise it is possible to provide more
data than requested: i.e. the guest issues an LDRB instruction to read
one byte, but we write 0xffffffffffffffff to the target
register.

Signed-off-by: Volodymyr Babchuk 
---
 xen/arch/arm/vpci.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 3521d5bc2f..f1e434a5db 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -46,6 +46,8 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 {
 struct pci_host_bridge *bridge = p;
 pci_sbdf_t sbdf;
+const uint8_t access_size = (1 << info->dabt.size) * 8;
+const uint64_t access_mask = GENMASK_ULL(access_size - 1, 0);
 /* data is needed to prevent a pointer cast on 32bit */
 unsigned long data;
 
@@ -53,7 +55,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 
 if ( !vpci_sbdf_from_gpa(v->domain, bridge, info->gpa, ) )
 {
-*r = ~0ul;
+*r = access_mask;
 return 1;
 }
 
@@ -64,7 +66,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 return 1;
 }
 
-*r = ~0ul;
+*r = access_mask;
 
 return 0;
 }
-- 
2.42.0



[PATCH v10 13/17] vpci: add initial support for virtual PCI bus topology

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

Assign SBDF to the PCI devices being passed through with bus 0.
In the resulting topology, PCIe devices reside on bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to the 32 devices which are allowed on
a single PCI bus.

Please note that at the moment only function 0 of a multifunction
device can be passed through.

Signed-off-by: Oleksandr Andrushchenko 
---
In v10:
- Removed ASSERT(pcidevs_locked())
- Removed redundant code (local sbdf variable, clearing sbdf during
device removal, etc)
- Added __maybe_unused attribute to "out:" label
- Introduced HAS_VPCI_GUEST_SUPPORT Kconfig option, as this is the
  first patch where it is used (previously was in "vpci: add hooks for
  PCI device assign/de-assign")
In v9:
- Lock in add_virtual_device() replaced with ASSERT (thanks, Stewart)
In v8:
- Added write lock in add_virtual_device
Since v6:
- re-work wrt new locking scheme
- OT: add ASSERT(pcidevs_write_locked()); to add_virtual_device()
Since v5:
- s/vpci_add_virtual_device/add_virtual_device and make it static
- call add_virtual_device from vpci_assign_device and do not use
  REGISTER_VPCI_INIT machinery
- add pcidevs_locked ASSERT
- use DECLARE_BITMAP for vpci_dev_assigned_map
Since v4:
- moved and re-worked guest sbdf initializers
- s/set_bit/__set_bit
- s/clear_bit/__clear_bit
- minor comment fix s/Virtual/Guest/
- added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
  later for counting the number of MMIO handlers required for a guest
  (Julien)
Since v3:
 - make use of VPCI_INIT
 - moved all new code to vpci.c which belongs to it
 - changed open-coded 31 to PCI_SLOT(~0)
 - added comments and code to reject multifunction devices with
   functions other than 0
 - updated comment about vpci_dev_next and made it unsigned int
 - implement roll back in case of error while assigning/deassigning devices
 - s/dom%pd/%pd
Since v2:
 - remove casts that are (a) malformed and (b) unnecessary
 - add new line for better readability
 - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
functions are now completely gated with this config
 - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/drivers/Kconfig |  4 +++
 xen/drivers/vpci/vpci.c | 63 +
 xen/include/xen/sched.h |  8 ++
 xen/include/xen/vpci.h  | 11 +++
 4 files changed, 86 insertions(+)

diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
index db94393f47..780490cf8e 100644
--- a/xen/drivers/Kconfig
+++ b/xen/drivers/Kconfig
@@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
 config HAS_VPCI
bool
 
+config HAS_VPCI_GUEST_SUPPORT
+   bool
+   depends on HAS_VPCI
+
 endmenu
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 5e34d0092a..7c46a2d3f4 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -36,6 +36,52 @@ extern vpci_register_init_t *const __start_vpci_array[];
 extern vpci_register_init_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+static int add_virtual_device(struct pci_dev *pdev)
+{
+struct domain *d = pdev->domain;
+unsigned long new_dev_number;
+
+if ( is_hardware_domain(d) )
+return 0;
+
+ASSERT(rw_is_write_locked(>domain->pci_lock));
+
+/*
+ * Each PCI bus supports 32 devices/slots at max or up to 256 when
+ * there are multi-function ones which are not yet supported.
+ */
+if ( pdev->info.is_extfn && !pdev->info.is_virtfn )
+{
+gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
+ >sbdf);
+return -EOPNOTSUPP;
+}
+new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
+ VPCI_MAX_VIRT_DEV);
+if ( new_dev_number == VPCI_MAX_VIRT_DEV )
+{
+write_unlock(>domain->pci_lock);
+return -ENOSPC;
+}
+
+__set_bit(new_dev_number, >vpci_dev_assigned_map);
+
+/*
+ * Both segment and bus number are 0:
+ *  - we emulate a single host bridge for the guest, e.g. segment 0
+ *  - with bus 0 the virtual devices are seen as embedded
+ *endpoints behind the root complex
+ *
+ * TODO: add support for multi-function devices.
+ */
+pdev->vpci->guest_sbdf = PCI_SBDF(0, 0, new_dev_number, 0);
+
+return 0;
+}
+
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
 void vpci_deassign_device(struct pci_dev *pdev)
 {
 unsigned int i;
@@ -46,6 +92,13 @@ void vpci_deassign_device(struct pci_dev *pdev)
 return;
 
 spin_lock(>vpci->lock);
+
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+if ( pdev->vpci->guest_sbdf.sbdf != ~0 )
+__clear_bit(pdev->vpci->guest_sbdf.dev,
+>domain->vpci_dev_assigned_map);
+#endif
+
 while ( !list_empty(>vpci->handlers) )
 {
 struct vpci_register *r = 

[PATCH v10 14/17] xen/arm: translate virtual PCI bus topology for guests

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

There are three originators for PCI configuration space access:
1. The domain that owns the physical host bridge: MMIO handlers are
there so we can update vPCI register handlers with the values
written by the hardware domain, i.e. the physical view of the registers
vs the guest's view of the configuration space.
2. Guest access to the passed-through PCI devices: we need to properly
map the virtual bus topology to the physical one, i.e. pass the
configuration space access to the corresponding physical devices.
3. Emulated host PCI bridge access. It doesn't exist in the physical
topology, i.e. it can't be mapped to some physical host bridge.
So, all access to the host bridge itself needs to be trapped and
emulated.

Signed-off-by: Oleksandr Andrushchenko 
---
Since v9:
- Commend about required lock replaced with ASSERT()
- Style fixes
- call to vpci_translate_virtual_device folded into vpci_sbdf_from_gpa
Since v8:
- locks moved out of vpci_translate_virtual_device()
Since v6:
- add pcidevs locking to vpci_translate_virtual_device
- update wrt to the new locking scheme
Since v5:
- add vpci_translate_virtual_device for #ifndef CONFIG_HAS_VPCI_GUEST_SUPPORT
  case to simplify ifdefery
- add ASSERT(!is_hardware_domain(d)); to vpci_translate_virtual_device
- reset output register on failed virtual SBDF translation
Since v4:
- indentation fixes
- constify struct domain
- updated commit message
- updates to the new locking scheme (pdev->vpci_lock)
Since v3:
- revisit locking
- move code to vpci.c
Since v2:
 - pass struct domain instead of struct vcpu
 - constify arguments where possible
 - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/arch/arm/vpci.c | 51 -
 xen/drivers/vpci/vpci.c | 25 +++-
 xen/include/xen/vpci.h  | 10 
 3 files changed, 74 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 3bc4bb5508..58e2a20135 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -7,31 +7,55 @@
 
 #include 
 
-static pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge,
- paddr_t gpa)
+static bool_t vpci_sbdf_from_gpa(struct domain *d,
+ const struct pci_host_bridge *bridge,
+ paddr_t gpa, pci_sbdf_t *sbdf)
 {
-pci_sbdf_t sbdf;
+ASSERT(sbdf);
 
 if ( bridge )
 {
-sbdf.sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
-sbdf.seg = bridge->segment;
-sbdf.bus += bridge->cfg->busn_start;
+sbdf->sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
+sbdf->seg = bridge->segment;
+sbdf->bus += bridge->cfg->busn_start;
 }
 else
-sbdf.sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
-
-return sbdf;
+{
+bool translated;
+
+/*
+ * For the passed through devices we need to map their virtual SBDF
+ * to the physical PCI device being passed through.
+ */
+sbdf->sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+read_lock(>pci_lock);
+translated = vpci_translate_virtual_device(d, sbdf);
+read_unlock(>pci_lock);
+
+if ( !translated )
+{
+return false;
+}
+}
+return true;
 }
 
 static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
   register_t *r, void *p)
 {
 struct pci_host_bridge *bridge = p;
-pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+pci_sbdf_t sbdf;
 /* data is needed to prevent a pointer cast on 32bit */
 unsigned long data;
 
+ASSERT(!bridge == !is_hardware_domain(v->domain));
+
+if ( !vpci_sbdf_from_gpa(v->domain, bridge, info->gpa, ) )
+{
+*r = ~0ul;
+return 1;
+}
+
 if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
 1U << info->dabt.size, ) )
 {
@@ -48,7 +72,12 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
register_t r, void *p)
 {
 struct pci_host_bridge *bridge = p;
-pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+pci_sbdf_t sbdf;
+
+ASSERT(!bridge == !is_hardware_domain(v->domain));
+
+if ( !vpci_sbdf_from_gpa(v->domain, bridge, info->gpa, ) )
+return 1;
 
 return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
1U << info->dabt.size, r);
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 7c46a2d3f4..0dee5118d6 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -80,6 +80,30 @@ static int add_virtual_device(struct pci_dev *pdev)
 return 0;
 }
 
+/*
+ * Find the physical device which is mapped to the virtual device
+ * and translate virtual SBDF to the physical one.
+ */
+bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
+{
+const struct 

[PATCH v10 16/17] xen/arm: vpci: permit access to guest vpci space

2023-10-12 Thread Volodymyr Babchuk
From: Stewart Hildebrand 

Move iomem_caps initialization earlier (before arch_domain_create()).

Signed-off-by: Stewart Hildebrand 
---
Changes in v10:
* fix off-by-one
* also permit access to GUEST_VPCI_PREFETCH_MEM_ADDR

Changes in v9:
* new patch

This is sort of a follow-up to:

  baa6ea700386 ("vpci: add permission checks to map_range()")

I don't believe we need a fixes tag since this depends on the vPCI p2m BAR
patches.
---
 xen/arch/arm/vpci.c | 9 +
 xen/common/domain.c | 4 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 01b50d435e..3521d5bc2f 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -2,6 +2,7 @@
 /*
  * xen/arch/arm/vpci.c
  */
+#include 
 #include 
 #include 
 
@@ -119,8 +120,16 @@ int domain_vpci_init(struct domain *d)
 return ret;
 }
 else
+{
 register_mmio_handler(d, _mmio_handler,
   GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, 
NULL);
+iomem_permit_access(d, paddr_to_pfn(GUEST_VPCI_MEM_ADDR),
+paddr_to_pfn(GUEST_VPCI_MEM_ADDR +
+ GUEST_VPCI_MEM_SIZE - 1));
+iomem_permit_access(d, paddr_to_pfn(GUEST_VPCI_PREFETCH_MEM_ADDR),
+paddr_to_pfn(GUEST_VPCI_PREFETCH_MEM_ADDR +
+ GUEST_VPCI_PREFETCH_MEM_SIZE - 1));
+}
 
 return 0;
 }
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 785c69e48b..bf63fab29b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -695,6 +695,9 @@ struct domain *domain_create(domid_t domid,
 radix_tree_init(>pirq_tree);
 }
 
+if ( !is_idle_domain(d) )
+d->iomem_caps = rangeset_new(d, "I/O Memory", 
RANGESETF_prettyprint_hex);
+
 if ( (err = arch_domain_create(d, config, flags)) != 0 )
 goto fail;
 init_status |= INIT_arch;
@@ -704,7 +707,6 @@ struct domain *domain_create(domid_t domid,
 watchdog_domain_init(d);
 init_status |= INIT_watchdog;
 
-d->iomem_caps = rangeset_new(d, "I/O Memory", 
RANGESETF_prettyprint_hex);
 d->irq_caps   = rangeset_new(d, "Interrupts", 0);
 if ( !d->iomem_caps || !d->irq_caps )
 goto fail;
-- 
2.42.0



[PATCH v10 02/17] pci: introduce per-domain PCI rwlock

2023-10-12 Thread Volodymyr Babchuk
Add a per-domain d->pci_lock that protects access to
d->pdev_list. The purpose of this lock is to guarantee to vPCI code
that the underlying pdev will not disappear from under its feet. This
is a rw-lock, but this patch adds only write_lock()s. There will be
read_lock() users in subsequent patches.

This lock should be taken in write mode every time d->pdev_list is
altered. All write accesses should also be protected by pcidevs_lock().
The idea is that any user that wants read access to the list or
to the devices stored in the list should take either the new
d->pci_lock or the old pcidevs_lock(). Either of these two locks
ensures that the pdev of interest will not disappear and that it stays
assigned to the same domain. Of course, any new users should use
pcidevs_lock() when appropriate (e.g. when accessing any other state
that is protected by said lock). In case both the newly introduced
per-domain rwlock and the pcidevs lock are taken, the latter must be
acquired first.

Suggested-by: Roger Pau Monné 
Suggested-by: Jan Beulich 
Signed-off-by: Volodymyr Babchuk 

---

Changes in v10:
 - pdev->domain is assigned after removing from source domain but
   before adding to target domain in reassign_device() functions.

Changes in v9:
 - returned back "pdev->domain = target;" in AMD IOMMU code
 - used "source" instead of pdev->domain in IOMMU functions
 - added comment about lock ordering in the commit message
 - reduced locked regions
 - minor non-functional changes in various places

Changes in v8:
 - New patch

Changes in v8 vs RFC:
 - Removed all read_locks after discussion with Roger in #xendevel
 - pci_release_devices() now returns the first error code
 - extended commit message
 - added missing lock in pci_remove_device()
 - extended locked region in pci_add_device() to protect list_del() calls
---
 xen/common/domain.c |  1 +
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  9 ++-
 xen/drivers/passthrough/pci.c   | 71 +
 xen/drivers/passthrough/vtd/iommu.c |  9 ++-
 xen/include/xen/sched.h |  1 +
 5 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 8f9ab01c0c..785c69e48b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -651,6 +651,7 @@ struct domain *domain_create(domid_t domid,
 
 #ifdef CONFIG_HAS_PCI
 INIT_LIST_HEAD(>pdev_list);
+rwlock_init(>pci_lock);
 #endif
 
 /* All error paths can depend on the above setup. */
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 836c24b02e..36a617bed4 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -476,8 +476,15 @@ static int cf_check reassign_device(
 
 if ( devfn == pdev->devfn && pdev->domain != target )
 {
-list_move(>domain_list, >pdev_list);
+write_lock(>pci_lock);
+list_del(>domain_list);
+write_unlock(>pci_lock);
+
 pdev->domain = target;
+
+write_lock(>pci_lock);
+list_add(>domain_list, >pdev_list);
+write_unlock(>pci_lock);
 }
 
 /*
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 04d00c7c37..b8ad4fa07c 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -453,7 +453,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
 if ( pdev->domain )
 return;
 pdev->domain = dom_xen;
+write_lock(_xen->pci_lock);
 list_add(>domain_list, _xen->pdev_list);
+write_unlock(_xen->pci_lock);
 }
 
 int __init pci_hide_device(unsigned int seg, unsigned int bus,
@@ -746,7 +748,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 if ( !pdev->domain )
 {
 pdev->domain = hardware_domain;
+write_lock(_domain->pci_lock);
 list_add(>domain_list, _domain->pdev_list);
+write_unlock(_domain->pci_lock);
 
 /*
  * For devices not discovered by Xen during boot, add vPCI handlers
@@ -756,7 +760,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 if ( ret )
 {
 printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
+write_lock(_domain->pci_lock);
 list_del(>domain_list);
+write_unlock(_domain->pci_lock);
 pdev->domain = NULL;
 goto out;
 }
@@ -764,7 +770,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 if ( ret )
 {
 vpci_remove_device(pdev);
+write_lock(_domain->pci_lock);
 list_del(>domain_list);
+write_unlock(_domain->pci_lock);
 pdev->domain = NULL;
 goto out;
 }
@@ -814,7 +822,11 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
 pci_cleanup_msi(pdev);
 ret = iommu_remove_device(pdev);
 if ( 

[PATCH v10 04/17] vpci: restrict unhandled read/write operations for guests

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

A guest would be able to read and write those registers which are not
emulated and have no respective vPCI handlers, making it possible
for it to access the hardware directly.
In order to prevent a guest from reading and writing the unhandled
registers, make sure only the hardware domain can access the hardware
directly, and restrict guests from doing so.

Suggested-by: Roger Pau Monné 
Signed-off-by: Oleksandr Andrushchenko 
Reviewed-by: Roger Pau Monné 

---
Since v9:
- removed stray formatting change
- added Roger's R-b tag
Since v6:
- do not use is_hwdom parameter for vpci_{read|write}_hw and use
  current->domain internally
- update commit message
New in v6
---
 xen/drivers/vpci/vpci.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 112de56fb3..724987e981 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -233,6 +233,10 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int 
reg,
 {
 uint32_t data;
 
+/* Guest domains are not allowed to read real hardware. */
+if ( !is_hardware_domain(current->domain) )
+return ~(uint32_t)0;
+
 switch ( size )
 {
 case 4:
@@ -276,6 +280,10 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int 
reg,
 static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
   uint32_t data)
 {
+/* Guest domains are not allowed to write real hardware. */
+if ( !is_hardware_domain(current->domain) )
+return;
+
 switch ( size )
 {
 case 4:
-- 
2.42.0


[PATCH v10 12/17] vpci/header: emulate PCI_COMMAND register for guests

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. The PCI_COMMAND_SERR bit is a good example: while
the guest's view of this will want to be zero initially, the host having
set it to 1 cannot simply be overwritten with 0, or else we'd effectively
give the guest control of the bit. Thus, the PCI_COMMAND register needs
proper emulation in order to honor the host's settings.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the guest's view
of the command register.

Here is the full list of command register bits with notes about
emulation, along with QEMU behavior in the same situation:

PCI_COMMAND_IO - QEMU does not allow a guest to change value of this bit
in real device. Instead it is always set to 1. A guest can write to this
register, but writes are ignored.

PCI_COMMAND_MEMORY - QEMU behaves exactly as with PCI_COMMAND_IO. In
the Xen case, we handle writes to this bit by mapping/unmapping BAR
regions. For devices assigned to DomUs, memory decoding will be
disabled during initialization.

PCI_COMMAND_MASTER - Allow guest to control it. QEMU passes through
writes to this bit.

PCI_COMMAND_SPECIAL - Guest can generate special cycles only if it has
access to host bridge that supports software generation of special
cycles. In our case guest has no access to host bridges at all. Value
after reset is 0. QEMU passes through writes of this bit, we will do
the same.

PCI_COMMAND_INVALIDATE - Allows "Memory Write and Invalidate" commands
to be generated. It requires additional configuration via Cacheline
Size register. We are not emulating this register right now and we
can't expect guest to properly configure it. QEMU "emulates" access to
Cachline Size register by ignoring all writes to it. QEMU passes through
writes of PCI_COMMAND_INVALIDATE bit, we will do the same.

PCI_COMMAND_VGA_PALETTE - Enable VGA palette snooping. QEMU passes
through writes of this bit, we will do the same.

PCI_COMMAND_PARITY - Controls how device response to parity
errors. QEMU ignores writes to this bit, we will do the same.

PCI_COMMAND_WAIT - Reserved. Should be 0, but QEMU passes
through writes of this bit, so we will do the same.

PCI_COMMAND_SERR - Controls if device can assert SERR. QEMU ignores
writes to this bit, we will do the same.

PCI_COMMAND_FAST_BACK - Optional bit that allows fast back-to-back
transactions. It is configured by firmware, so we don't want guest to
control it. QEMU ignores writes to this bit, we will do the same.

PCI_COMMAND_INTX_DISABLE - Disables INTx signals. If MSI(X) is
enabled, the device is prohibited from asserting INTx as per the
specification. The value after reset is 0. QEMU checks whether INTx
was mapped for a device; if it was not, then the guest can't control
the PCI_COMMAND_INTX_DISABLE bit. In our case, we prohibit a guest from
changing the value of this bit if MSI(X) is enabled.

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
---
In v10:
- Added cf_check attribute to guest_cmd_read
- Removed warning about non-zero cmd
- Updated comment MSI code regarding disabling INTX
- Used ternary operator in vpci_add_register() call
- Disable memory decoding for DomUs in init_bars()
In v9:
- Reworked guest_cmd_read
- Added handling for more bits
Since v6:
- fold guest's logic into cmd_write
- implement cmd_read, so we can report emulated INTx state to guests
- introduce header->guest_cmd to hold the emulated state of the
  PCI_COMMAND register for guests
Since v5:
- add additional check for MSI-X enabled while altering INTX bit
- make sure INTx disabled while guests enable MSI/MSI-X
Since v3:
- gate more code on CONFIG_HAS_MSI
- removed logic for the case when MSI/MSI-X not enabled
---
 xen/drivers/vpci/header.c | 44 +++
 xen/drivers/vpci/msi.c|  6 ++
 xen/drivers/vpci/msix.c   |  4 
 xen/include/xen/vpci.h|  3 +++
 4 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index efce0bc2ae..e8eed6a674 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -501,14 +501,32 @@ static int modify_bars(const struct pci_dev *pdev, 
uint16_t cmd, bool rom_only)
 return 0;
 }
 
+/* TODO: Add proper emulation for all bits of the command register. */
 static void cf_check cmd_write(
 const struct pci_dev *pdev, unsigned int reg, uint32_t cmd, void *data)
 {
 struct vpci_header *header = data;
 
+if ( !is_hardware_domain(pdev->domain) )
+{
+const struct vpci *vpci = pdev->vpci;
+uint16_t excluded = PCI_COMMAND_PARITY | PCI_COMMAND_SERR |
+PCI_COMMAND_FAST_BACK;
+
+header->guest_cmd = cmd;
+
+if ( (vpci->msi && vpci->msi->enabled) ||
+ (vpci->msix && vpci->msix->enabled) )
+

[PATCH v10 05/17] vpci: add hooks for PCI device assign/de-assign

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

When a PCI device gets assigned/de-assigned we need to
initialize/de-initialize vPCI state for the device.

Also, rename vpci_add_handlers() to vpci_assign_device() and
vpci_remove_device() to vpci_deassign_device() to better reflect role
of the functions.

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
---

In v10:
- removed HAS_VPCI_GUEST_SUPPORT checks
- removed the HAS_VPCI_GUEST_SUPPORT config option (in Kconfig) as it is
  not used anywhere
In v9:
- removed previous  vpci_[de]assign_device function and renamed
  existing handlers
- dropped attempts to handle errors in assign_device() function
- do not call vpci_assign_device for dom_io
- use d instead of pdev->domain
- use IS_ENABLED macro
In v8:
- removed vpci_deassign_device
In v6:
- do not pass struct domain to vpci_{assign|deassign}_device as
  pdev->domain can be used
- do not leave the device assigned (pdev->domain == new domain) in case
  vpci_assign_device fails: try to de-assign and if this also fails, then
  crash the domain
In v5:
- do not split code into run_vpci_init
- do not check for is_system_domain in vpci_{de}assign_device
- do not use vpci_remove_device_handlers_locked and re-allocate
  pdev->vpci completely
- make vpci_deassign_device void
In v4:
 - de-assign vPCI from the previous domain on device assignment
 - do not remove handlers in vpci_assign_device as those must not
   exist at that point
In v3:
 - remove toolstack roll-back description from the commit message
   as error are to be handled with proper cleanup in Xen itself
 - remove __must_check
 - remove redundant rc check while assigning devices
 - fix redundant CONFIG_HAS_VPCI check for CONFIG_HAS_VPCI_GUEST_SUPPORT
 - use REGISTER_VPCI_INIT machinery to run required steps on device
   init/assign: add run_vpci_init helper
In v2:
- define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
  for x86
In v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - extended the commit message
---
 xen/drivers/passthrough/pci.c | 20 
 xen/drivers/vpci/header.c |  2 +-
 xen/drivers/vpci/vpci.c   |  6 +++---
 xen/include/xen/vpci.h| 10 +-
 4 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 182da45acb..b7926a291c 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -755,7 +755,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
  * For devices not discovered by Xen during boot, add vPCI handlers
  * when Dom0 first informs Xen about such devices.
  */
-ret = vpci_add_handlers(pdev);
+ret = vpci_assign_device(pdev);
 if ( ret )
 {
 list_del(&pdev->domain_list);
@@ -769,7 +769,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 if ( ret )
 {
 write_lock(&hardware_domain->pci_lock);
-vpci_remove_device(pdev);
+vpci_deassign_device(pdev);
 list_del(&pdev->domain_list);
 write_unlock(&hardware_domain->pci_lock);
 pdev->domain = NULL;
@@ -817,7 +817,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
 list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
 if ( pdev->bus == bus && pdev->devfn == devfn )
 {
-vpci_remove_device(pdev);
+vpci_deassign_device(pdev);
 pci_cleanup_msi(pdev);
 ret = iommu_remove_device(pdev);
 if ( pdev->domain )
@@ -875,6 +875,10 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
 goto out;
 }
 
+write_lock(&d->pci_lock);
+vpci_deassign_device(pdev);
+write_unlock(&d->pci_lock);
+
 devfn = pdev->devfn;
 ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
  pci_to_dev(pdev));
@@ -1146,7 +1150,7 @@ static void __hwdom_init setup_one_hwdom_device(const struct setup_hwdom *ctxt,
   PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
 
 write_lock(&ctxt->d->pci_lock);
-err = vpci_add_handlers(pdev);
+err = vpci_assign_device(pdev);
 write_unlock(&ctxt->d->pci_lock);
 if ( err )
 printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n",
@@ -1476,6 +1480,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 if ( pdev->broken && d != hardware_domain && d != dom_io )
 goto done;
 
+write_lock(&pdev->domain->pci_lock);
+vpci_deassign_device(pdev);
+write_unlock(&pdev->domain->pci_lock);
+
 rc = pdev_msix_assign(d, pdev);
 if ( rc )
 goto done;
@@ -1502,6 +1510,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 pci_to_dev(pdev), flag);
 }
 
+write_lock(&d->pci_lock);
+rc = vpci_assign_device(pdev);
+write_unlock(&d->pci_lock);
+
  done:
 if ( rc )
 printk(XENLOG_G_WARNING 

[PATCH v10 09/17] rangeset: add rangeset_empty() function

2023-10-12 Thread Volodymyr Babchuk
This function can be used when a user wants to remove all rangeset
entries but does not want to destroy the rangeset itself.

Signed-off-by: Volodymyr Babchuk 

---

Changes in v10:

 - New in v10. The function is used in "vpci/header: handle p2m range sets
   per BAR"
---
 xen/common/rangeset.c  | 9 +++--
 xen/include/xen/rangeset.h | 3 ++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 35c3420885..420275669e 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -448,8 +448,7 @@ struct rangeset *rangeset_new(
 return r;
 }
 
-void rangeset_destroy(
-struct rangeset *r)
+void rangeset_empty(struct rangeset *r)
 {
 struct range *x;
 
@@ -465,6 +464,12 @@ void rangeset_destroy(
 
 while ( (x = first_range(r)) != NULL )
 destroy_range(r, x);
+}
+
+void rangeset_destroy(
+struct rangeset *r)
+{
+rangeset_empty(r);
 
 xfree(r);
 }
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index f7c69394d6..5eded7ffc5 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -56,7 +56,7 @@ void rangeset_limit(
 bool_t __must_check rangeset_is_empty(
 const struct rangeset *r);
 
-/* Add/claim/remove/query a numeric range. */
+/* Add/claim/remove/query/empty a numeric range. */
 int __must_check rangeset_add_range(
 struct rangeset *r, unsigned long s, unsigned long e);
 int __must_check rangeset_claim_range(struct rangeset *r, unsigned long size,
@@ -70,6 +70,7 @@ bool_t __must_check rangeset_overlaps_range(
 int rangeset_report_ranges(
 struct rangeset *r, unsigned long s, unsigned long e,
 int (*cb)(unsigned long s, unsigned long e, void *), void *ctxt);
+void rangeset_empty(struct rangeset *r);
 
 /*
  * Note that the consume function can return an error value apart from
-- 
2.42.0



[PATCH v10 01/17] pci: msi: pass pdev to pci_enable_msi() function

2023-10-12 Thread Volodymyr Babchuk
Previously the pci_enable_msi() function obtained the pdev pointer by
itself, but taking into account upcoming changes to PCI locking, it is
better when the caller passes an already-acquired pdev pointer to the
function.

Note that the ns16550 driver does not check the validity of the obtained
pdev pointer because pci_enable_msi() already does this.

Signed-off-by: Volodymyr Babchuk 

---
Changes in v10:

 - New in v10. This is the result of discussion in "vpci: add initial
 support for virtual PCI bus topology"
---
 xen/arch/x86/include/asm/msi.h |  3 ++-
 xen/arch/x86/irq.c |  2 +-
 xen/arch/x86/msi.c | 19 ++-
 xen/drivers/char/ns16550.c |  4 +++-
 4 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index a53ade95c9..836c8cd4ba 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -81,7 +81,8 @@ struct irq_desc;
 struct hw_interrupt_type;
 struct msi_desc;
 /* Helper functions */
-extern int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc);
+extern int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc,
+ struct pci_dev *pdev);
 extern void pci_disable_msi(struct msi_desc *desc);
 extern int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool off);
 extern void pci_cleanup_msi(struct pci_dev *pdev);
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 6abfd81621..68b788c42e 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2167,7 +2167,7 @@ int map_domain_pirq(
 if ( !pdev )
 goto done;
 
-ret = pci_enable_msi(msi, &msi_desc);
+ret = pci_enable_msi(msi, &msi_desc, pdev);
 if ( ret )
 {
 if ( ret > 0 )
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index a78367d7cf..20275260b3 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -983,13 +983,13 @@ static int msix_capability_init(struct pci_dev *dev,
  * irq or non-zero for otherwise.
  **/
 
-static int __pci_enable_msi(struct msi_info *msi, struct msi_desc **desc)
+static int __pci_enable_msi(struct msi_info *msi, struct msi_desc **desc,
+   struct pci_dev *pdev)
 {
-struct pci_dev *pdev;
 struct msi_desc *old_desc;
 
 ASSERT(pcidevs_locked());
-pdev = pci_get_pdev(NULL, msi->sbdf);
+
 if ( !pdev )
 return -ENODEV;
 
@@ -1038,13 +1038,13 @@ static void __pci_disable_msi(struct msi_desc *entry)
  * of irqs available. Driver should use the returned value to re-send
  * its request.
  **/
-static int __pci_enable_msix(struct msi_info *msi, struct msi_desc **desc)
+static int __pci_enable_msix(struct msi_info *msi, struct msi_desc **desc,
+struct pci_dev *pdev)
 {
-struct pci_dev *pdev;
 struct msi_desc *old_desc;
 
 ASSERT(pcidevs_locked());
-pdev = pci_get_pdev(NULL, msi->sbdf);
+
 if ( !pdev || !pdev->msix )
 return -ENODEV;
 
@@ -1151,15 +1151,16 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool off)
  * Notice: only construct the msi_desc
  * no change to irq_desc here, and the interrupt is masked
  */
-int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc)
+int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc,
+  struct pci_dev *pdev)
 {
 ASSERT(pcidevs_locked());
 
 if ( !use_msi )
 return -EPERM;
 
-return msi->table_base ? __pci_enable_msix(msi, desc) :
- __pci_enable_msi(msi, desc);
+return msi->table_base ? __pci_enable_msix(msi, desc, pdev) :
+__pci_enable_msi(msi, desc, pdev);
 }
 
 /*
diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index 28ddedd50d..1856b72e63 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -452,10 +452,12 @@ static void __init cf_check ns16550_init_postirq(struct serial_port *port)
 if ( rc > 0 )
 {
 struct msi_desc *msi_desc = NULL;
+struct pci_dev *pdev;
 
 pcidevs_lock();
 
-rc = pci_enable_msi(&msi, &msi_desc);
+pdev = pci_get_pdev(NULL, msi.sbdf);
+rc = pci_enable_msi(&msi, &msi_desc, pdev);
 if ( !rc )
 {
 struct irq_desc *desc = irq_to_desc(msi.irq);
-- 
2.42.0



[PATCH v10 00/17] PCI devices passthrough on Arm, part 3

2023-10-12 Thread Volodymyr Babchuk
Hello all,

This is the next version of the vPCI rework. The aim of this series is
to prepare the ground for introducing PCI support on the ARM platform.

in v10:

 - Removed patch ("xen/arm: vpci: check guest range"), proper fix
   for the issue is part of ("vpci/header: emulate PCI_COMMAND
   register for guests")
 - Removed patch ("pci/header: reset the command register when adding
   devices")
 - Added patch ("rangeset: add rangeset_empty() function") because
   this function is needed in ("vpci/header: handle p2m range sets
   per BAR")
 - Added ("vpci/header: handle p2m range sets per BAR") which addressed
   an issue discovered by Andrii Chepurnyi during virtio integration
 - Added ("pci: msi: pass pdev to pci_enable_msi() function"), which is
   prereq for ("pci: introduce per-domain PCI rwlock")
 - Fixed "Since v9/v8/... " comments in changelogs to reduce confusion.
   I left "Since" entries for older versions, because they were added
   by original author of the patches.

in v9:

v9 addresses comments from the previous version. It also introduces a
couple of patches from Stewart; these patches are related to vPCI use
on ARM. Patch "vpci/header: rework exit path in init_bars" was
factored out from "vpci/header: handle p2m range sets per BAR".

in v8:

The biggest change from the previous (mistakenly named) v7 series is how
locking is implemented. Instead of d->vpci_rwlock we introduce
d->pci_lock, which has broader scope, as it protects not only the
domain's vpci state, but the domain's list of PCI devices as well.

As we discussed on IRC with Roger, it is not feasible to rework all
the existing code to use the new lock right away. It was agreed that
any write access to d->pdev_list will be protected by **both**
d->pci_lock in write mode and pcidevs_lock(). Read access, on the other
hand, should be protected by either d->pci_lock in read mode or
pcidevs_lock(). It is expected that existing code will use
pcidevs_lock() and new users will use the new rw lock. Of course, this
does not mean that new users shall not use pcidevs_lock() when it is
appropriate.



Changes from previous versions are described in each separate patch.


Oleksandr Andrushchenko (11):
  vpci: use per-domain PCI lock to protect vpci structure
  vpci: restrict unhandled read/write operations for guests
  vpci: add hooks for PCI device assign/de-assign
  vpci/header: implement guest BAR register handlers
  rangeset: add RANGESETF_no_print flag
  vpci/header: handle p2m range sets per BAR
  vpci/header: program p2m with guest BAR view
  vpci/header: emulate PCI_COMMAND register for guests
  vpci: add initial support for virtual PCI bus topology
  xen/arm: translate virtual PCI bus topology for guests
  xen/arm: account IO handlers for emulated PCI MSI-X

Stewart Hildebrand (1):
  xen/arm: vpci: permit access to guest vpci space

Volodymyr Babchuk (5):
  pci: msi: pass pdev to pci_enable_msi() function
  pci: introduce per-domain PCI rwlock
  vpci/header: rework exit path in init_bars
  rangeset: add rangeset_empty() function
  arm/vpci: honor access size when returning an error

 xen/arch/arm/vpci.c |  78 +++-
 xen/arch/x86/hvm/vmsi.c |  26 +-
 xen/arch/x86/hvm/vmx/vmx.c  |   2 -
 xen/arch/x86/include/asm/irq.h  |   3 +-
 xen/arch/x86/include/asm/msi.h  |   3 +-
 xen/arch/x86/irq.c  |  14 +-
 xen/arch/x86/msi.c  |  25 +-
 xen/arch/x86/physdev.c  |   2 +-
 xen/common/domain.c |   5 +-
 xen/common/rangeset.c   |  14 +-
 xen/drivers/Kconfig |   4 +
 xen/drivers/char/ns16550.c  |   4 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   9 +-
 xen/drivers/passthrough/pci.c   |  94 +++-
 xen/drivers/passthrough/vtd/iommu.c |   9 +-
 xen/drivers/vpci/header.c   | 482 +++-
 xen/drivers/vpci/msi.c  |  34 +-
 xen/drivers/vpci/msix.c |  56 ++-
 xen/drivers/vpci/vpci.c | 157 ++-
 xen/include/xen/rangeset.h  |   8 +-
 xen/include/xen/sched.h |   9 +
 xen/include/xen/vpci.h  |  42 +-
 22 files changed, 875 insertions(+), 205 deletions(-)

-- 
2.42.0


[PATCH v10 10/17] vpci/header: handle p2m range sets per BAR

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

Instead of handling a single range set that contains all the memory
regions of all the BARs and ROM, have one range set per BAR.
As the range sets are now created when a PCI device is added and destroyed
when it is removed, make them named and accounted.

Note that rangesets were chosen here despite there being only up to
3 separate ranges in each set (typically just 1). A rangeset per BAR
was chosen for the ease of implementation and existing code re-usability.
This is in preparation of making non-identity mappings in p2m for the MMIOs.

Signed-off-by: Oleksandr Andrushchenko 

---
In v10:
- Added additional checks to vpci_process_pending()
- vpci_process_pending() now clears rangeset in case of failure
- Fixed locks in vpci_process_pending()
- Fixed coding style issues
- Fixed error handling in init_bars
In v9:
- removed d->vpci.map_pending in favor of checking v->vpci.pdev != NULL
- printk -> gprintk
- renamed bar variable to fix shadowing
- fixed bug with iterating on remote device's BARs
- relaxed lock in vpci_process_pending
- removed stale comment
Since v6:
- update according to the new locking scheme
- remove odd fail label in modify_bars
Since v5:
- fix comments
- move rangeset allocation to init_bars and only allocate
  for MAPPABLE BARs
- check for overlap with the already setup BAR ranges
Since v4:
- use named range sets for BARs (Jan)
- changes required by the new locking scheme
- updated commit message (Jan)
Since v3:
- re-work vpci_cancel_pending accordingly to the per-BAR handling
- s/num_mem_ranges/map_pending and s/uint8_t/bool
- ASSERT(bar->mem) in modify_bars
- create and destroy the rangesets on add/remove
---
 xen/drivers/vpci/header.c | 256 ++
 xen/drivers/vpci/vpci.c   |   6 +
 xen/include/xen/vpci.h|   2 +-
 3 files changed, 184 insertions(+), 80 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 40d1a07ada..5c056923ad 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -161,63 +161,106 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 
 bool vpci_process_pending(struct vcpu *v)
 {
-if ( v->vpci.mem )
+struct pci_dev *pdev = v->vpci.pdev;
+struct map_data data = {
+.d = v->domain,
+.map = v->vpci.cmd & PCI_COMMAND_MEMORY,
+};
+struct vpci_header *header = NULL;
+unsigned int i;
+
+if ( !pdev )
+return false;
+
+read_lock(&v->domain->pci_lock);
+
+if ( !pdev->vpci || (v->domain != pdev->domain) )
 {
-struct map_data data = {
-.d = v->domain,
-.map = v->vpci.cmd & PCI_COMMAND_MEMORY,
-};
-int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
+read_unlock(&v->domain->pci_lock);
+return false;
+}
+
+header = &pdev->vpci->header;
+for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+{
+struct vpci_bar *bar = &header->bars[i];
+int rc;
+
+if ( rangeset_is_empty(bar->mem) )
+continue;
+
+rc = rangeset_consume_ranges(bar->mem, map_range, &data);
 
 if ( rc == -ERESTART )
+{
+read_unlock(&v->domain->pci_lock);
 return true;
+}
 
-write_lock(&v->domain->pci_lock);
-spin_lock(&v->vpci.pdev->vpci->lock);
-/* Disable memory decoding unconditionally on failure. */
-modify_decoding(v->vpci.pdev,
-rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-!rc && v->vpci.rom_only);
-spin_unlock(&v->vpci.pdev->vpci->lock);
-
-rangeset_destroy(v->vpci.mem);
-v->vpci.mem = NULL;
 if ( rc )
-/*
- * FIXME: in case of failure remove the device from the domain.
- * Note that there might still be leftover mappings. While this is
- * safe for Dom0, for DomUs the domain will likely need to be
- * killed in order to avoid leaking stale p2m mappings on
- * failure.
- */
-vpci_deassign_device(v->vpci.pdev);
-write_unlock(&v->domain->pci_lock);
+{
+spin_lock(&pdev->vpci->lock);
+/* Disable memory decoding unconditionally on failure. */
+modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY,
+false);
+spin_unlock(&pdev->vpci->lock);
+
+/* Clean all the rangesets */
+for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+if ( !rangeset_is_empty(header->bars[i].mem) )
+ rangeset_empty(header->bars[i].mem);
+
+v->vpci.pdev = NULL;
+
+read_unlock(&v->domain->pci_lock);
+
+if ( !is_hardware_domain(v->domain) )
+domain_crash(v->domain);
+
+return false;
+}
 }
+v->vpci.pdev = NULL;
+
+spin_lock(&pdev->vpci->lock);
+modify_decoding(pdev, v->vpci.cmd, 

[PATCH v10 11/17] vpci/header: program p2m with guest BAR view

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

Take into account guest's BAR view and program its p2m accordingly:
gfn is guest's view of the BAR and mfn is the physical BAR value.
This way hardware domain sees physical BAR values and guest sees
emulated ones.

Hardware domain continues getting the BARs identity mapped, while for
domUs the BARs are mapped at the requested guest address without
modifying the BAR address in the device PCI config space.

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
---
In v10:
- Moved GFN variable definition outside the loop in map_range()
- Updated printk error message in map_range()
- Now the BAR address is always stored in bar->guest_addr, even for the
  HW dom; this removes a bunch of ugly is_hwdom() checks in modify_bars()
- vmsix_table_base() now uses .guest_addr instead of .addr
In v9:
- Extended the commit message
- Use bar->guest_addr in modify_bars
- Extended printk error message in map_range
- Moved map_data initialization so .bar can be initialized during declaration
Since v5:
- remove debug print in map_range callback
- remove "identity" from the debug print
Since v4:
- moved start_{gfn|mfn} calculation into map_range
- pass vpci_bar in the map_data instead of start_{gfn|mfn}
- s/guest_addr/guest_reg
Since v3:
- updated comment (Roger)
- removed gfn_add(map->start_gfn, rc); which is wrong
- use v->domain instead of v->vpci.pdev->domain
- removed odd e.g. in comment
- s/d%d/%pd in altered code
- use gdprintk for map/unmap logs
Since v2:
- improve readability for data.start_gfn and restructure ?: construct
Since v1:
 - s/MSI/MSI-X in comments
---
 xen/drivers/vpci/header.c | 53 ---
 xen/include/xen/vpci.h|  3 ++-
 2 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 5c056923ad..efce0bc2ae 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -33,6 +33,7 @@
 
 struct map_data {
 struct domain *d;
+const struct vpci_bar *bar;
 bool map;
 };
 
@@ -40,11 +41,21 @@ static int cf_check map_range(
 unsigned long s, unsigned long e, void *data, unsigned long *c)
 {
 const struct map_data *map = data;
+/* Start address of the BAR as seen by the guest. */
+unsigned long start_gfn = PFN_DOWN(map->bar->guest_addr);
+/* Physical start address of the BAR. */
+mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
 int rc;
 
 for ( ; ; )
 {
 unsigned long size = e - s + 1;
+/*
+ * Ranges to be mapped don't always start at the BAR start address, as
+ * there can be holes or partially consumed ranges. Account for the
+ * offset of the current address from the BAR start.
+ */
+mfn_t map_mfn = mfn_add(start_mfn, s - start_gfn);
 
 if ( !iomem_access_permitted(map->d, s, e) )
 {
@@ -72,8 +83,8 @@ static int cf_check map_range(
  * - {un}map_mmio_regions doesn't support preemption.
  */
 
-rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
-  : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
+rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, map_mfn)
+  : unmap_mmio_regions(map->d, _gfn(s), size, map_mfn);
 if ( rc == 0 )
 {
 *c += size;
@@ -82,8 +93,9 @@ static int cf_check map_range(
 if ( rc < 0 )
 {
 printk(XENLOG_G_WARNING
-   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
-   map->map ? "" : "un", s, e, map->d->domain_id, rc);
+   "Failed to %smap [%lx %lx] -> [%lx %lx] for %pd: %d\n",
+   map->map ? "" : "un", s, e, mfn_x(map_mfn),
+   mfn_x(map_mfn) + size, map->d, rc);
 break;
 }
 ASSERT(rc < size);
@@ -161,10 +174,6 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 bool vpci_process_pending(struct vcpu *v)
 {
 struct pci_dev *pdev = v->vpci.pdev;
-struct map_data data = {
-.d = v->domain,
-.map = v->vpci.cmd & PCI_COMMAND_MEMORY,
-};
 struct vpci_header *header = NULL;
 unsigned int i;
 
@@ -184,6 +192,11 @@ bool vpci_process_pending(struct vcpu *v)
 for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
 {
 struct vpci_bar *bar = &header->bars[i];
+struct map_data data = {
+.d = v->domain,
+.map = v->vpci.cmd & PCI_COMMAND_MEMORY,
+.bar = bar,
+};
 int rc;
 
 if ( rangeset_is_empty(bar->mem) )
@@ -234,7 +247,6 @@ bool vpci_process_pending(struct vcpu *v)
 static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
 uint16_t cmd)
 {
-struct map_data data = { .d = d, .map = true };
 struct vpci_header *header = &pdev->vpci->header;
 int rc = 0;
 unsigned int i;
@@ -244,6 +256,7 @@ static 

[PATCH v10 03/17] vpci: use per-domain PCI lock to protect vpci structure

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

Use a previously introduced per-domain read/write lock to check
whether vpci is present, so we are sure there are no accesses to the
contents of the vpci struct if not. This lock can be used (and in a
few cases is used right away) so that vpci removal can be performed
while holding the lock in write mode. Previously such removal could
race with vpci_read for example.

When taking both d->pci_lock and pdev->vpci->lock, they should be
taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
possible deadlock situations.

1. The per-domain pci_lock is used to protect the pdev->vpci structure
from being removed.

2. Writing the command register and ROM BAR register may trigger
modify_bars to run, which in turn may access multiple pdevs while
checking for the existing BAR's overlap. The overlapping check, if
done under the read lock, requires vpci->lock to be acquired on both
devices being compared, which may produce a deadlock. It is not
possible to upgrade read lock to write lock in such a case. So, in
order to prevent the deadlock, use d->pci_lock instead. To prevent
deadlock while locking both hwdom->pci_lock and dom_xen->pci_lock,
always lock hwdom first.

All other code, which doesn't lead to pdev->vpci destruction and does
not access multiple pdevs at the same time, can still use a
combination of the read lock and pdev->vpci->lock.

3. Drop const qualifier where the new rwlock is used and this is
appropriate.

4. Do not call process_pending_softirqs with any locks held. For that,
unlock prior to the call and re-acquire the locks after. After
re-acquiring the lock there is no need to check if pdev->vpci exists:
 - in apply_map because of the context it is called (no race condition
   possible)
 - for MSI/MSI-X debug code because it is called at the end of
   pdev->vpci access and no further access to pdev->vpci is made

5. Use d->pci_lock around for_each_pdev and pci_get_pdev_by_domain
while accessing pdevs in vpci code.

6. We are removing multiple ASSERT(pcidevs_locked()) instances because
they are too strict now: they should be corrected to
ASSERT(pcidevs_locked() || rw_is_locked(&d->pci_lock)), but the problem is
that the mentioned instances do not have access to the domain
pointer and it is not feasible to pass a domain pointer to a function
just for debugging purposes.

There is a possible lock inversion in MSI code, as some parts of it
acquire pcidevs_lock() while already holding d->pci_lock.

Suggested-by: Roger Pau Monné 
Suggested-by: Jan Beulich 
Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 

---
Changes in v10:
 - Moved printk past locked area
 - Returned back ASSERTs
 - Added new parameter to allocate_and_map_msi_pirq() so it knows if
 it should take the global pci lock
 - Added comment about possible improvement in vpci_write
 - Changed ASSERT(rw_is_locked()) to rw_is_write_locked() in
   appropriate places
 - Renamed release_domain_locks() to release_domain_write_locks()
 - moved domain_done label in vpci_dump_msi() to correct place
Changes in v9:
 - extended locked region to protect vpci_remove_device and
   vpci_add_handlers() calls
 - vpci_write() takes lock in the write mode to protect
   potential call to modify_bars()
 - renamed lock releasing function
 - removed ASSERT()s from msi code
 - added trylock in vpci_dump_msi

Changes in v8:
 - changed d->vpci_lock to d->pci_lock
 - introducing d->pci_lock in a separate patch
 - extended locked region in vpci_process_pending
 - removed pcidevs_lock in vpci_dump_msi()
 - removed some changes as they are not needed with
   the new locking scheme
 - added handling for hwdom && dom_xen case
---
 xen/arch/x86/hvm/vmsi.c| 26 -
 xen/arch/x86/hvm/vmx/vmx.c |  2 --
 xen/arch/x86/include/asm/irq.h |  3 +-
 xen/arch/x86/irq.c | 12 
 xen/arch/x86/msi.c | 10 ++-
 xen/arch/x86/physdev.c |  2 +-
 xen/drivers/passthrough/pci.c  |  9 +++---
 xen/drivers/vpci/header.c  | 18 
 xen/drivers/vpci/msi.c | 28 --
 xen/drivers/vpci/msix.c| 52 +-
 xen/drivers/vpci/vpci.c| 51 +++--
 11 files changed, 166 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 128f236362..6b33a80120 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -468,7 +468,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
 struct msixtbl_entry *entry, *new_entry;
 int r = -EINVAL;
 
-ASSERT(pcidevs_locked());
+ASSERT(pcidevs_locked() || rw_is_locked(&d->pci_lock));
 ASSERT(rw_is_write_locked(&d->event_lock));
 
 if ( !msixtbl_initialised(d) )
@@ -538,7 +538,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
 struct pci_dev *pdev;
 struct msixtbl_entry *entry;
 
-ASSERT(pcidevs_locked());
+

[PATCH v10 06/17] vpci/header: rework exit path in init_bars

2023-10-12 Thread Volodymyr Babchuk
Introduce "fail" label in init_bars() function to have the centralized
error return path. This is the pre-requirement for the future changes
in this function.

This patch does not introduce functional changes.

Signed-off-by: Volodymyr Babchuk 
Suggested-by: Roger Pau Monné 
Acked-by: Roger Pau Monné 
---
In v10:
- Added Roger's A-b tag.
In v9:
- New in v9
---
 xen/drivers/vpci/header.c | 20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 176fe16b9f..33db58580c 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -581,11 +581,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
 rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
 4, &bars[i]);
 if ( rc )
-{
-pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-return rc;
-}
-
+goto fail;
 continue;
 }
 
@@ -604,10 +600,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
 rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
   (i == num_bars - 1) ? PCI_BAR_LAST : 0);
 if ( rc < 0 )
-{
-pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-return rc;
-}
+goto fail;
 
 if ( size == 0 )
 {
@@ -622,10 +615,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
 rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
 &bars[i]);
 if ( rc )
-{
-pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-return rc;
-}
+goto fail;
 }
 
 /* Check expansion ROM. */
@@ -647,6 +637,10 @@ static int cf_check init_bars(struct pci_dev *pdev)
 }
 
 return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
+
+ fail:
+pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+return rc;
 }
 REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);
 
-- 
2.42.0


[PATCH v10 07/17] vpci/header: implement guest BAR register handlers

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

Add relevant vpci register handlers when assigning PCI device to a domain
and remove those when de-assigning. This allows having different
handlers for different domains, e.g. hwdom and other guests.

Emulate guest BAR register values: this allows creating a guest view
of the registers and emulating the size and properties probe as it is done
during PCI device enumeration by the guest.

All empty, IO and ROM BARs for guests are emulated by returning 0 on
reads and ignoring writes: these BARs are special in this respect, as
their lower bits have special meaning, so returning the default ~0 on read
may confuse the guest OS.

Signed-off-by: Oleksandr Andrushchenko 
---
In v10:
- ull -> ULL to be MISRA-compatible
- Use PAGE_OFFSET() instead of combining with ~PAGE_MASK
- Set type of empty bars to VPCI_BAR_EMPTY
In v9:
- factored-out "fail" label introduction in init_bars()
- replaced #ifdef CONFIG_X86 with IS_ENABLED()
- do not pass bars[i] to empty_bar_read() handler
- store guest's BAR address instead of guests BAR register view
Since v6:
- unify the writing of the PCI_COMMAND register on the
  error path into a label
- do not introduce bar_ignore_access helper and open code
- s/guest_bar_ignore_read/empty_bar_read
- update error message in guest_bar_write
- only setup empty_bar_read for IO if !x86
Since v5:
- make sure that the guest set address has the same page offset
  as the physical address on the host
- remove guest_rom_{read|write} as those just implement the default
  behaviour of the registers not being handled
- adjusted comment for struct vpci.addr field
- add guest handlers for BARs which are not handled and will otherwise
  return ~0 on read and ignore writes. The BARs are special with this
  respect as their lower bits have special meaning, so returning ~0
  doesn't seem to be right
Since v4:
- updated commit message
- s/guest_addr/guest_reg
Since v3:
- squashed two patches: dynamic add/remove handlers and guest BAR
  handler implementation
- fix guest BAR read of the high part of a 64bit BAR (Roger)
- add error handling to vpci_assign_device
- s/dom%pd/%pd
- blank line before return
Since v2:
- remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
  has been eliminated from being built on x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - simplify some code
 - use gdprintk + error code instead of gprintk
 - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
   so these do not get compiled for x86
 - removed unneeded is_system_domain check
 - re-work guest read/write to be much simpler and do more work on write
   than read which is expected to be called more frequently
 - removed one too obvious comment
---
 xen/drivers/vpci/header.c | 137 +-
 xen/include/xen/vpci.h|   3 +
 2 files changed, 123 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 33db58580c..40d1a07ada 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -477,6 +477,74 @@ static void cf_check bar_write(
 pci_conf_write32(pdev->sbdf, reg, val);
 }
 
+static void cf_check guest_bar_write(const struct pci_dev *pdev,
+ unsigned int reg, uint32_t val, void *data)
+{
+struct vpci_bar *bar = data;
+bool hi = false;
+uint64_t guest_addr = bar->guest_addr;
+
+if ( bar->type == VPCI_BAR_MEM64_HI )
+{
+ASSERT(reg > PCI_BASE_ADDRESS_0);
+bar--;
+hi = true;
+}
+else
+{
+val &= PCI_BASE_ADDRESS_MEM_MASK;
+}
+
+guest_addr &= ~(0xULL << (hi ? 32 : 0));
+guest_addr |= (uint64_t)val << (hi ? 32 : 0);
+
+/* Allow guest to size BAR correctly */
+guest_addr &= ~(bar->size - 1);
+
+/*
+ * Make sure that the guest set address has the same page offset
+ * as the physical address on the host or otherwise things won't work as
+ * expected.
+ */
+if ( guest_addr != ~(bar->size -1 )  &&
+ PAGE_OFFSET(guest_addr) != PAGE_OFFSET(bar->addr) )
+{
+gprintk(XENLOG_WARNING,
+"%pp: ignored BAR %zu write attempting to change page 
offset\n",
+>sbdf, bar - pdev->vpci->header.bars + hi);
+return;
+}
+
+bar->guest_addr = guest_addr;
+}
+
+static uint32_t cf_check guest_bar_read(const struct pci_dev *pdev,
+unsigned int reg, void *data)
+{
+const struct vpci_bar *bar = data;
+uint32_t reg_val;
+
+if ( bar->type == VPCI_BAR_MEM64_HI )
+{
+ASSERT(reg > PCI_BASE_ADDRESS_0);
+bar--;
+return bar->guest_addr >> 32;
+}
+
+reg_val = bar->guest_addr;
+reg_val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32 :
+ PCI_BASE_ADDRESS_MEM_TYPE_64;
+reg_val |= bar->prefetchable ? 

[PATCH v10 08/17] rangeset: add RANGESETF_no_print flag

2023-10-12 Thread Volodymyr Babchuk
From: Oleksandr Andrushchenko 

There are range sets which should not be printed, so introduce a flag
which allows marking those as such. Implement relevant logic to skip
such entries while printing.

While at it also simplify the definition of the flags by directly
defining those without helpers.

Suggested-by: Jan Beulich 
Signed-off-by: Oleksandr Andrushchenko 
Reviewed-by: Jan Beulich 
---
Since v5:
- comment indentation (Jan)
Since v1:
- update BUG_ON with new flag
- simplify the definition of the flags
---
 xen/common/rangeset.c  | 5 -
 xen/include/xen/rangeset.h | 5 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index f3baf52ab6..35c3420885 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -433,7 +433,7 @@ struct rangeset *rangeset_new(
     INIT_LIST_HEAD(&r->range_list);
     r->nr_ranges = -1;
 
-    BUG_ON(flags & ~RANGESETF_prettyprint_hex);
+    BUG_ON(flags & ~(RANGESETF_prettyprint_hex | RANGESETF_no_print));
     r->flags = flags;
 
     safe_strcpy(r->name, name ?: "(no name)");
@@ -575,6 +575,9 @@ void rangeset_domain_printk(
 
     list_for_each_entry ( r, &d->rangesets, rangeset_list )
     {
+        if ( r->flags & RANGESETF_no_print )
+            continue;
+
         printk("    ");
         rangeset_printk(r);
         printk("\n");
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 135f33f606..f7c69394d6 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -49,8 +49,9 @@ void rangeset_limit(
 
 /* Flags for passing to rangeset_new(). */
  /* Pretty-print range limits in hexadecimal. */
-#define _RANGESETF_prettyprint_hex 0
-#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
+#define RANGESETF_prettyprint_hex   (1U << 0)
+ /* Do not print entries marked with this flag. */
+#define RANGESETF_no_print  (1U << 1)
 
 bool_t __must_check rangeset_is_empty(
 const struct rangeset *r);
-- 
2.42.0



Re: [PATCH v2 0/4] Introduce %rip-relative addressing to PER_CPU_VAR macro

2023-10-12 Thread H. Peter Anvin

On 10/12/23 14:17, Uros Bizjak wrote:
>> Are you PIC-adjusting the percpu variables as well?
>
> After this patch (and after fixing percpu_stable_op to use "a" operand
> modifier on GCC), only *one* absolute reference to a percpu variable
> remains in xen-head.S, where:
>
>	movq	$INIT_PER_CPU_VAR(fixed_percpu_data), %rax
>
> should be changed to use leaq.
>
> All others should then be (%rip)-relative.

I mean, the symbols themselves are relative, not absolute?

-hpa




Re: [PATCH v2 0/4] Introduce %rip-relative addressing to PER_CPU_VAR macro

2023-10-12 Thread Uros Bizjak
On Thu, Oct 12, 2023 at 11:08 PM H. Peter Anvin  wrote:
>
> On 10/12/23 13:59, Uros Bizjak wrote:
> > On Thu, Oct 12, 2023 at 10:53 PM Dave Hansen  wrote:
> >>
> >> On 10/12/23 13:12, Uros Bizjak wrote:
> >>> The last patch introduces (%rip) suffix and uses it for x86_64 target,
> >>> resulting in a small code size decrease: text data bss dec hex filename
> >>> 25510677 4386685 808388 30705750 1d48856 vmlinux-new.o 25510629 4386685
> >>> 808388 30705702 1d48826 vmlinux-old.o
> >>
> >> I feel like I'm missing some of the motivation here.
> >>
> >> 50 bytes is great and all, but it isn't without the cost of changing
> >> some rules and introducing potential PER_CPU_ARG() vs. PER_CPU_VAR()
> >> confusion.
> >>
> >> Are there some other side benefits?  What else does this enable?
> >
> > These changes are necessary to build the kernel as Position
> > Independent Executable (PIE) on x86_64 [1]. And since I was working in
> > percpu area I thought that it was worth implementing them.
> >
> > [1] 
> > https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong@antgroup.com/
> >
>
> Are you PIC-adjusting the percpu variables as well?

After this patch (and after fixing percpu_stable_op to use "a" operand
modifier on GCC), only *one* absolute reference to a percpu variable
remains in xen-head.S, where:

	movq	$INIT_PER_CPU_VAR(fixed_percpu_data), %rax

should be changed to use leaq.

All others should then be (%rip)-relative.

Uros.



Re: [PATCH v2 0/4] Introduce %rip-relative addressing to PER_CPU_VAR macro

2023-10-12 Thread H. Peter Anvin

On 10/12/23 13:59, Uros Bizjak wrote:
> On Thu, Oct 12, 2023 at 10:53 PM Dave Hansen wrote:
>>
>> On 10/12/23 13:12, Uros Bizjak wrote:
>>> The last patch introduces (%rip) suffix and uses it for x86_64 target,
>>> resulting in a small code size decrease: text data bss dec hex filename
>>> 25510677 4386685 808388 30705750 1d48856 vmlinux-new.o 25510629 4386685
>>> 808388 30705702 1d48826 vmlinux-old.o
>>
>> I feel like I'm missing some of the motivation here.
>>
>> 50 bytes is great and all, but it isn't without the cost of changing
>> some rules and introducing potential PER_CPU_ARG() vs. PER_CPU_VAR()
>> confusion.
>>
>> Are there some other side benefits?  What else does this enable?
>
> These changes are necessary to build the kernel as Position
> Independent Executable (PIE) on x86_64 [1]. And since I was working in
> percpu area I thought that it was worth implementing them.
>
> [1] https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong@antgroup.com/

Are you PIC-adjusting the percpu variables as well?

-hpa



Re: [PATCH 1/4] x86/percpu: Use explicit segment registers in lib/cmpxchg{8,16}b_emu.S

2023-10-12 Thread H. Peter Anvin

On 10/12/23 14:02, H. Peter Anvin wrote:
> %fs??

Nevermind, I forgot that we changed from %gs to %fs on i386 at some
point in the now-distant past.

-hpa



Re: [PATCH 1/4] x86/percpu: Use explicit segment registers in lib/cmpxchg{8,16}b_emu.S

2023-10-12 Thread Uros Bizjak
On Thu, Oct 12, 2023 at 11:02 PM H. Peter Anvin  wrote:
>
> On October 12, 2023 9:10:36 AM PDT, Uros Bizjak  wrote:
> >PER_CPU_VAR macro is intended to be applied to a symbol, it is not
> >intended to be used as a selector between %fs and %gs segment
> >registers for general operands.
> >
> >The address to these emulation functions is passed in a register, so
> >use explicit segment registers to access percpu variable instead.
> >
> >Also add a missing function comment to this_cpu_cmpxchg8b_emu.
> >
> >No functional changes intended.
> >
> >Cc: Thomas Gleixner 
> >Cc: Ingo Molnar 
> >Cc: Borislav Petkov 
> >Cc: Dave Hansen 
> >Cc: "H. Peter Anvin" 
> >Cc: Peter Zijlstra 
> >Signed-off-by: Uros Bizjak 
> >---
> > arch/x86/lib/cmpxchg16b_emu.S | 12 ++--
> > arch/x86/lib/cmpxchg8b_emu.S  | 30 +-
> > 2 files changed, 27 insertions(+), 15 deletions(-)
> >
> >diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
> >index 6962df315793..2bd8b89bce75 100644
> >--- a/arch/x86/lib/cmpxchg16b_emu.S
> >+++ b/arch/x86/lib/cmpxchg16b_emu.S
> >@@ -23,14 +23,14 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
> >   cli
> >
> >   /* if (*ptr == old) */
> >-  cmpqPER_CPU_VAR(0(%rsi)), %rax
> >+  cmpq%gs:(%rsi), %rax
> >   jne .Lnot_same
> >-  cmpqPER_CPU_VAR(8(%rsi)), %rdx
> >+  cmpq%gs:8(%rsi), %rdx
> >   jne .Lnot_same
> >
> >   /* *ptr = new */
> >-  movq%rbx, PER_CPU_VAR(0(%rsi))
> >-  movq%rcx, PER_CPU_VAR(8(%rsi))
> >+  movq%rbx, %gs:(%rsi)
> >+  movq%rcx, %gs:8(%rsi)
> >
> >   /* set ZF in EFLAGS to indicate success */
> >   orl $X86_EFLAGS_ZF, (%rsp)
> >@@ -42,8 +42,8 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
> >   /* *ptr != old */
> >
> >   /* old = *ptr */
> >-  movqPER_CPU_VAR(0(%rsi)), %rax
> >-  movqPER_CPU_VAR(8(%rsi)), %rdx
> >+  movq%gs:(%rsi), %rax
> >+  movq%gs:8(%rsi), %rdx
> >
> >   /* clear ZF in EFLAGS to indicate failure */
> >   andl$(~X86_EFLAGS_ZF), (%rsp)
> >diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
> >index 49805257b125..b7d68d5e2d31 100644
> >--- a/arch/x86/lib/cmpxchg8b_emu.S
> >+++ b/arch/x86/lib/cmpxchg8b_emu.S
> >@@ -24,12 +24,12 @@ SYM_FUNC_START(cmpxchg8b_emu)
> >   pushfl
> >   cli
> >
> >-  cmpl0(%esi), %eax
> >+  cmpl(%esi), %eax
> >   jne .Lnot_same
> >   cmpl4(%esi), %edx
> >   jne .Lnot_same
> >
> >-  movl%ebx, 0(%esi)
> >+  movl%ebx, (%esi)
> >   movl%ecx, 4(%esi)
> >
> >   orl $X86_EFLAGS_ZF, (%esp)
> >@@ -38,7 +38,7 @@ SYM_FUNC_START(cmpxchg8b_emu)
> >   RET
> >
> > .Lnot_same:
> >-  movl0(%esi), %eax
> >+  movl(%esi), %eax
> >   movl4(%esi), %edx
> >
> >   andl$(~X86_EFLAGS_ZF), (%esp)
> >@@ -53,18 +53,30 @@ EXPORT_SYMBOL(cmpxchg8b_emu)
> >
> > #ifndef CONFIG_UML
> >
> >+/*
> >+ * Emulate 'cmpxchg8b %fs:(%esi)'
> >+ *
> >+ * Inputs:
> >+ * %esi : memory location to compare
> >+ * %eax : low 32 bits of old value
> >+ * %edx : high 32 bits of old value
> >+ * %ebx : low 32 bits of new value
> >+ * %ecx : high 32 bits of new value
> >+ *
> >+ * Notably this is not LOCK prefixed and is not safe against NMIs
> >+ */
> > SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
> >
> >   pushfl
> >   cli
> >
> >-  cmplPER_CPU_VAR(0(%esi)), %eax
> >+  cmpl%fs:(%esi), %eax
> >   jne .Lnot_same2
> >-  cmplPER_CPU_VAR(4(%esi)), %edx
> >+  cmpl%fs:4(%esi), %edx
> >   jne .Lnot_same2
> >
> >-  movl%ebx, PER_CPU_VAR(0(%esi))
> >-  movl%ecx, PER_CPU_VAR(4(%esi))
> >+  movl%ebx, %fs:(%esi)
> >+  movl%ecx, %fs:4(%esi)
> >
> >   orl $X86_EFLAGS_ZF, (%esp)
> >
> >@@ -72,8 +84,8 @@ SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
> >   RET
> >
> > .Lnot_same2:
> >-  movlPER_CPU_VAR(0(%esi)), %eax
> >-  movlPER_CPU_VAR(4(%esi)), %edx
> >+  movl%fs:(%esi), %eax
> >+  movl%fs:4(%esi), %edx
> >
> >   andl$(~X86_EFLAGS_ZF), (%esp)
> >
>
> %fs??

Yes, 32-bit targets default to %fs. Please also note a new version
(v2) of the patch that reimplements this part.

Uros.



Re: [PATCH 1/4] x86/percpu: Use explicit segment registers in lib/cmpxchg{8,16}b_emu.S

2023-10-12 Thread H. Peter Anvin
On October 12, 2023 9:10:36 AM PDT, Uros Bizjak  wrote:
>PER_CPU_VAR macro is intended to be applied to a symbol, it is not
>intended to be used as a selector between %fs and %gs segment
>registers for general operands.
>
>The address to these emulation functions is passed in a register, so
>use explicit segment registers to access percpu variable instead.
>
>Also add a missing function comment to this_cpu_cmpxchg8b_emu.
>
>No functional changes intended.
>
>Cc: Thomas Gleixner 
>Cc: Ingo Molnar 
>Cc: Borislav Petkov 
>Cc: Dave Hansen 
>Cc: "H. Peter Anvin" 
>Cc: Peter Zijlstra 
>Signed-off-by: Uros Bizjak 
>---
> arch/x86/lib/cmpxchg16b_emu.S | 12 ++--
> arch/x86/lib/cmpxchg8b_emu.S  | 30 +-
> 2 files changed, 27 insertions(+), 15 deletions(-)
>
>diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
>index 6962df315793..2bd8b89bce75 100644
>--- a/arch/x86/lib/cmpxchg16b_emu.S
>+++ b/arch/x86/lib/cmpxchg16b_emu.S
>@@ -23,14 +23,14 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
>   cli
> 
>   /* if (*ptr == old) */
>-  cmpqPER_CPU_VAR(0(%rsi)), %rax
>+  cmpq%gs:(%rsi), %rax
>   jne .Lnot_same
>-  cmpqPER_CPU_VAR(8(%rsi)), %rdx
>+  cmpq%gs:8(%rsi), %rdx
>   jne .Lnot_same
> 
>   /* *ptr = new */
>-  movq%rbx, PER_CPU_VAR(0(%rsi))
>-  movq%rcx, PER_CPU_VAR(8(%rsi))
>+  movq%rbx, %gs:(%rsi)
>+  movq%rcx, %gs:8(%rsi)
> 
>   /* set ZF in EFLAGS to indicate success */
>   orl $X86_EFLAGS_ZF, (%rsp)
>@@ -42,8 +42,8 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
>   /* *ptr != old */
> 
>   /* old = *ptr */
>-  movqPER_CPU_VAR(0(%rsi)), %rax
>-  movqPER_CPU_VAR(8(%rsi)), %rdx
>+  movq%gs:(%rsi), %rax
>+  movq%gs:8(%rsi), %rdx
> 
>   /* clear ZF in EFLAGS to indicate failure */
>   andl$(~X86_EFLAGS_ZF), (%rsp)
>diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
>index 49805257b125..b7d68d5e2d31 100644
>--- a/arch/x86/lib/cmpxchg8b_emu.S
>+++ b/arch/x86/lib/cmpxchg8b_emu.S
>@@ -24,12 +24,12 @@ SYM_FUNC_START(cmpxchg8b_emu)
>   pushfl
>   cli
> 
>-  cmpl0(%esi), %eax
>+  cmpl(%esi), %eax
>   jne .Lnot_same
>   cmpl4(%esi), %edx
>   jne .Lnot_same
> 
>-  movl%ebx, 0(%esi)
>+  movl%ebx, (%esi)
>   movl%ecx, 4(%esi)
> 
>   orl $X86_EFLAGS_ZF, (%esp)
>@@ -38,7 +38,7 @@ SYM_FUNC_START(cmpxchg8b_emu)
>   RET
> 
> .Lnot_same:
>-  movl0(%esi), %eax
>+  movl(%esi), %eax
>   movl4(%esi), %edx
> 
>   andl$(~X86_EFLAGS_ZF), (%esp)
>@@ -53,18 +53,30 @@ EXPORT_SYMBOL(cmpxchg8b_emu)
> 
> #ifndef CONFIG_UML
> 
>+/*
>+ * Emulate 'cmpxchg8b %fs:(%esi)'
>+ *
>+ * Inputs:
>+ * %esi : memory location to compare
>+ * %eax : low 32 bits of old value
>+ * %edx : high 32 bits of old value
>+ * %ebx : low 32 bits of new value
>+ * %ecx : high 32 bits of new value
>+ *
>+ * Notably this is not LOCK prefixed and is not safe against NMIs
>+ */
> SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
> 
>   pushfl
>   cli
> 
>-  cmplPER_CPU_VAR(0(%esi)), %eax
>+  cmpl%fs:(%esi), %eax
>   jne .Lnot_same2
>-  cmplPER_CPU_VAR(4(%esi)), %edx
>+  cmpl%fs:4(%esi), %edx
>   jne .Lnot_same2
> 
>-  movl%ebx, PER_CPU_VAR(0(%esi))
>-  movl%ecx, PER_CPU_VAR(4(%esi))
>+  movl%ebx, %fs:(%esi)
>+  movl%ecx, %fs:4(%esi)
> 
>   orl $X86_EFLAGS_ZF, (%esp)
> 
>@@ -72,8 +84,8 @@ SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
>   RET
> 
> .Lnot_same2:
>-  movlPER_CPU_VAR(0(%esi)), %eax
>-  movlPER_CPU_VAR(4(%esi)), %edx
>+  movl%fs:(%esi), %eax
>+  movl%fs:4(%esi), %edx
> 
>   andl$(~X86_EFLAGS_ZF), (%esp)
> 

%fs??



Re: [PATCH v2 0/4] Introduce %rip-relative addressing to PER_CPU_VAR macro

2023-10-12 Thread Uros Bizjak
On Thu, Oct 12, 2023 at 10:53 PM Dave Hansen  wrote:
>
> On 10/12/23 13:12, Uros Bizjak wrote:
> > The last patch introduces (%rip) suffix and uses it for x86_64 target,
> > resulting in a small code size decrease: text data bss dec hex filename
> > 25510677 4386685 808388 30705750 1d48856 vmlinux-new.o 25510629 4386685
> > 808388 30705702 1d48826 vmlinux-old.o
>
> I feel like I'm missing some of the motivation here.
>
> 50 bytes is great and all, but it isn't without the cost of changing
> some rules and introducing potential PER_CPU_ARG() vs. PER_CPU_VAR()
> confusion.
>
> Are there some other side benefits?  What else does this enable?

These changes are necessary to build the kernel as Position
Independent Executable (PIE) on x86_64 [1]. And since I was working in
percpu area I thought that it was worth implementing them.

[1] 
https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong@antgroup.com/

Uros.



Re: [PATCH v2 0/4] Introduce %rip-relative addressing to PER_CPU_VAR macro

2023-10-12 Thread Dave Hansen
On 10/12/23 13:12, Uros Bizjak wrote:
> The last patch introduces (%rip) suffix and uses it for x86_64 target,
> resulting in a small code size decrease: text data bss dec hex filename
> 25510677 4386685 808388 30705750 1d48856 vmlinux-new.o 25510629 4386685
> 808388 30705702 1d48826 vmlinux-old.o

I feel like I'm missing some of the motivation here.

50 bytes is great and all, but it isn't without the cost of changing
some rules and introducing potential PER_CPU_ARG() vs. PER_CPU_VAR()
confusion.

Are there some other side benefits?  What else does this enable?



[PATCH v2 4/4] x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR macro

2023-10-12 Thread Uros Bizjak
Introduce x86_64 %rip-relative addressing to PER_CPU_VAR macro.
Instruction with %rip-relative address operand is one byte shorter than
its absolute address counterpart and is also compatible with position
independent executable (-fpie) build.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: Peter Zijlstra 
Signed-off-by: Uros Bizjak 
---
 arch/x86/include/asm/percpu.h | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 83e6a4bcea38..c53c5a7f8e78 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,19 +4,21 @@
 
 #ifdef CONFIG_X86_64
 #define __percpu_seg   gs
+#define __percpu_rel   (%rip)
 #else
 #define __percpu_seg   fs
+#define __percpu_rel
 #endif
 
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_SMP
 #define PER_CPU_ARG(arg)   %__percpu_seg:arg
-#define PER_CPU_VAR(var)   %__percpu_seg:var
+#define PER_CPU_VAR(var)   %__percpu_seg:(var)##__percpu_rel
 #else /* ! SMP */
 #define PER_CPU_ARG(arg)   arg
-#define PER_CPU_VAR(var)   var
-#endif /* SMP */
+#define PER_CPU_VAR(var)   (var)##__percpu_rel
+#endif /* SMP */
 
 #ifdef CONFIG_X86_64_SMP
 #define INIT_PER_CPU_VAR(var)  init_per_cpu__##var
-- 
2.41.0




[PATCH v2 3/4] x86/percpu, xen: Correct PER_CPU_VAR usage to include symbol and its addend

2023-10-12 Thread Uros Bizjak
PER_CPU_VAR macro should be applied to a symbol and its addend.
Inconsistent usage is currently harmless, but needs to be corrected
before %rip-relative addressing is introduced to PER_CPU_VAR macro.

No functional changes intended.

Cc: Juergen Gross 
Cc: Boris Ostrovsky 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Signed-off-by: Uros Bizjak 
---
 arch/x86/xen/xen-asm.S | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 9e5e68008785..448958ddbaf8 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -28,7 +28,7 @@
  * non-zero.
  */
 SYM_FUNC_START(xen_irq_disable_direct)
-   movb $1, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+   movb $1, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
RET
 SYM_FUNC_END(xen_irq_disable_direct)
 
@@ -69,7 +69,7 @@ SYM_FUNC_END(check_events)
 SYM_FUNC_START(xen_irq_enable_direct)
FRAME_BEGIN
/* Unmask events */
-   movb $0, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+   movb $0, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 
/*
 * Preempt here doesn't matter because that will deal with any
@@ -78,7 +78,7 @@ SYM_FUNC_START(xen_irq_enable_direct)
 */
 
/* Test for pending */
-   testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+   testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
jz 1f
 
call check_events
@@ -97,7 +97,7 @@ SYM_FUNC_END(xen_irq_enable_direct)
  * x86 use opposite senses (mask vs enable).
  */
 SYM_FUNC_START(xen_save_fl_direct)
-   testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+   testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
setz %ah
addb %ah, %ah
RET
@@ -113,7 +113,7 @@ SYM_FUNC_END(xen_read_cr2);
 
 SYM_FUNC_START(xen_read_cr2_direct)
FRAME_BEGIN
-   _ASM_MOV PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_arch_cr2, %_ASM_AX
+   _ASM_MOV PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_arch_cr2), %_ASM_AX
FRAME_END
RET
 SYM_FUNC_END(xen_read_cr2_direct);
-- 
2.41.0




[PATCH v2 0/4] Introduce %rip-relative addressing to PER_CPU_VAR macro

2023-10-12 Thread Uros Bizjak
The following patch series introduces %rip-relative addressing to
PER_CPU_VAR macro. Instruction with %rip-relative address operand is
one byte shorter than its absolute address counterpart and is also
compatible with position independent executable (-fpie) build.

The first three patches are cleanups that fix various inconsistencies
throughout the assembly code.

The last patch introduces (%rip) suffix and uses it for x86_64 target,
resulting in a small code size decrease:

       text    data    bss      dec     hex filename
   25510677 4386685 808388 30705750 1d48856 vmlinux-new.o
   25510629 4386685 808388 30705702 1d48826 vmlinux-old.o

Patch series is against current mainline and can be applied independently
of ongoing percpu work.

v2: Introduce PER_CPU_ARG macro to conditionally enable
segment registers in cmpxchg{8,16}b_emu.S for CONFIG_SMP.

Cc: Juergen Gross 
Cc: Boris Ostrovsky 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 

Uros Bizjak (4):
  x86/percpu: Introduce PER_CPU_ARG and use it in cmpxchg{8,16}b_emu.S
  x86/percpu: Correct PER_CPU_VAR usage to include symbol and its addend
  x86/percpu, xen: Correct PER_CPU_VAR usage to include symbol and its
addend
  x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR macro

 arch/x86/entry/calling.h  |  2 +-
 arch/x86/entry/entry_32.S |  2 +-
 arch/x86/entry/entry_64.S |  2 +-
 arch/x86/include/asm/percpu.h | 10 +++---
 arch/x86/kernel/head_64.S |  2 +-
 arch/x86/lib/cmpxchg16b_emu.S | 12 ++--
 arch/x86/lib/cmpxchg8b_emu.S  | 24 ++--
 arch/x86/xen/xen-asm.S| 10 +-
 8 files changed, 40 insertions(+), 24 deletions(-)

-- 
2.41.0
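The one-byte saving the cover letter mentions comes straight from x86-64 instruction encoding; a sketch of the two forms, with a hypothetical symbol name (byte values shown for the mov from %gs-based per-cpu data):

```asm
# Absolute address: 65 (gs) 48 (REX.W) 8b (mov) 04 25 (ModRM + SIB, no
# base/index) + disp32 -> 9 bytes, and needs a relocation of the absolute
# address at link time.
	movq	%gs:my_percpu_var, %rax

# %rip-relative: 65 48 8b 05 (ModRM rm=101) + rel32 -> 8 bytes, one byte
# shorter, and the rel32 stays valid in a position independent (-fpie) image.
	movq	%gs:my_percpu_var(%rip), %rax
```

The per-instruction saving is small, which is why the measurable effect over the whole vmlinux is only 48 bytes of text; the PIE compatibility is the substantive benefit.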




[PATCH v2 1/4] x86/percpu: Introduce PER_CPU_ARG and use it in cmpxchg{8,16}b_emu.S

2023-10-12 Thread Uros Bizjak
PER_CPU_VAR macro is intended to be applied to a symbol and should not
be used with general operands. Introduce new PER_CPU_ARG macro and
use it in cmpxchg{8,16}b_emu.S instead.

PER_CPU_VAR macro will be repurposed for %rip-relative addressing.

Also add a missing function comment to this_cpu_cmpxchg8b_emu.

No functional changes intended.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: Peter Zijlstra 
Signed-off-by: Uros Bizjak 
--
v2: Introduce PER_CPU_ARG macro to conditionally enable
segment registers in cmpxchg{8,16}b_emu.S for CONFIG_SMP.
---
 arch/x86/include/asm/percpu.h |  2 ++
 arch/x86/lib/cmpxchg16b_emu.S | 12 ++--
 arch/x86/lib/cmpxchg8b_emu.S  | 24 ++--
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 34734d730463..83e6a4bcea38 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -11,8 +11,10 @@
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_SMP
+#define PER_CPU_ARG(arg)   %__percpu_seg:arg
 #define PER_CPU_VAR(var)   %__percpu_seg:var
 #else /* ! SMP */
+#define PER_CPU_ARG(arg)   arg
 #define PER_CPU_VAR(var)   var
 #endif /* SMP */
 
diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 6962df315793..b6b942d07a00 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -23,14 +23,14 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
 	cli
 
 	/* if (*ptr == old) */
-	cmpq	PER_CPU_VAR(0(%rsi)), %rax
+	cmpq	PER_CPU_ARG(0(%rsi)), %rax
 	jne	.Lnot_same
-	cmpq	PER_CPU_VAR(8(%rsi)), %rdx
+	cmpq	PER_CPU_ARG(8(%rsi)), %rdx
 	jne	.Lnot_same
 
 	/* *ptr = new */
-	movq	%rbx, PER_CPU_VAR(0(%rsi))
-	movq	%rcx, PER_CPU_VAR(8(%rsi))
+	movq	%rbx, PER_CPU_ARG(0(%rsi))
+	movq	%rcx, PER_CPU_ARG(8(%rsi))
 
 	/* set ZF in EFLAGS to indicate success */
 	orl	$X86_EFLAGS_ZF, (%rsp)
@@ -42,8 +42,8 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
 	/* *ptr != old */
 
 	/* old = *ptr */
-	movq	PER_CPU_VAR(0(%rsi)), %rax
-	movq	PER_CPU_VAR(8(%rsi)), %rdx
+	movq	PER_CPU_ARG(0(%rsi)), %rax
+	movq	PER_CPU_ARG(8(%rsi)), %rdx
 
 	/* clear ZF in EFLAGS to indicate failure */
 	andl	$(~X86_EFLAGS_ZF), (%rsp)
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
index 49805257b125..9a0a7feeaf7c 100644
--- a/arch/x86/lib/cmpxchg8b_emu.S
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -53,18 +53,30 @@ EXPORT_SYMBOL(cmpxchg8b_emu)
 
 #ifndef CONFIG_UML
 
+/*
+ * Emulate 'cmpxchg8b %fs:(%esi)'
+ *
+ * Inputs:
+ * %esi : memory location to compare
+ * %eax : low 32 bits of old value
+ * %edx : high 32 bits of old value
+ * %ebx : low 32 bits of new value
+ * %ecx : high 32 bits of new value
+ *
+ * Notably this is not LOCK prefixed and is not safe against NMIs
+ */
 SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
 
 	pushfl
 	cli
 
-	cmpl	PER_CPU_VAR(0(%esi)), %eax
+	cmpl	PER_CPU_ARG(0(%esi)), %eax
 	jne	.Lnot_same2
-	cmpl	PER_CPU_VAR(4(%esi)), %edx
+	cmpl	PER_CPU_ARG(4(%esi)), %edx
 	jne	.Lnot_same2
 
-	movl	%ebx, PER_CPU_VAR(0(%esi))
-	movl	%ecx, PER_CPU_VAR(4(%esi))
+	movl	%ebx, PER_CPU_ARG(0(%esi))
+	movl	%ecx, PER_CPU_ARG(4(%esi))
 
 	orl	$X86_EFLAGS_ZF, (%esp)
 
@@ -72,8 +84,8 @@ SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
 	RET
 
 .Lnot_same2:
-	movl	PER_CPU_VAR(0(%esi)), %eax
-	movl	PER_CPU_VAR(4(%esi)), %edx
+	movl	PER_CPU_ARG(0(%esi)), %eax
+	movl	PER_CPU_ARG(4(%esi)), %edx
 
 	andl	$(~X86_EFLAGS_ZF), (%esp)
 
-- 
2.41.0




[PATCH v2 2/4] x86/percpu: Correct PER_CPU_VAR usage to include symbol and its addend

2023-10-12 Thread Uros Bizjak
PER_CPU_VAR macro should be applied to a symbol and its addend.
Inconsistent usage is currently harmless, but needs to be corrected
before %rip-relative addressing is introduced to PER_CPU_VAR macro.

No functional changes intended.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: Peter Zijlstra 
Signed-off-by: Uros Bizjak 
---
 arch/x86/entry/calling.h  | 2 +-
 arch/x86/entry/entry_32.S | 2 +-
 arch/x86/entry/entry_64.S | 2 +-
 arch/x86/kernel/head_64.S | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index f6907627172b..47368ab0bda0 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -173,7 +173,7 @@ For 32-bit we have the following conventions - kernel is 
built with
 .endm
 
 #define THIS_CPU_user_pcid_flush_mask   \
-   PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask
+   PER_CPU_VAR(cpu_tlbstate + TLB_STATE_user_pcid_flush_mask)
 
 .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 6e6af42e044a..d4e094b2c877 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -305,7 +305,7 @@
 .macro CHECK_AND_APPLY_ESPFIX
 #ifdef CONFIG_X86_ESPFIX32
 #define GDT_ESPFIX_OFFSET (GDT_ENTRY_ESPFIX_SS * 8)
-#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + GDT_ESPFIX_OFFSET
+#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page + GDT_ESPFIX_OFFSET)
 
ALTERNATIVE "jmp .Lend_\@", "", X86_BUG_ESPFIX
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 43606de22511..3d6770b87b87 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -252,7 +252,7 @@ SYM_FUNC_START(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
 	movq	TASK_stack_canary(%rsi), %rbx
-	movq	%rbx, PER_CPU_VAR(fixed_percpu_data) + FIXED_stack_canary
+	movq	%rbx, PER_CPU_VAR(fixed_percpu_data + FIXED_stack_canary)
 #endif
 
/*
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index ea6995920b7a..bfe5ec2f4f83 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -449,7 +449,7 @@ SYM_CODE_START(soft_restart_cpu)
UNWIND_HINT_END_OF_STACK
 
/* Find the idle task stack */
-	movq	PER_CPU_VAR(pcpu_hot) + X86_current_task, %rcx
+	movq	PER_CPU_VAR(pcpu_hot + X86_current_task), %rcx
 	movq	TASK_threadsp(%rcx), %rsp
 
jmp .Ljump_to_C_code
-- 
2.41.0




[xen-4.16-testing test] 183357: tolerable FAIL - PUSHED

2023-10-12 Thread osstest service owner
flight 183357 xen-4.16-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183357/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-libvirt-pair 28 guest-migrate/dst_host/src_host/debian.repeat fail in 183350 pass in 183357
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install fail pass in 183350

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop   fail in 183350 like 183082
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 183082
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 183082
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 183082
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 183082
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183082
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 183082
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183082
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 183082
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 183082
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 183082
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 183082
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm  3 hosts-allocate   starved in 183350 n/a
 test-arm64-arm64-libvirt-raw  3 hosts-allocate   starved in 183350 n/a
 test-arm64-arm64-xl-thunderx  3 hosts-allocate   

Re: [PATCH 1/4] x86/percpu: Use explicit segment registers in lib/cmpxchg{8,16}b_emu.S

2023-10-12 Thread Uros Bizjak
On Thu, Oct 12, 2023 at 7:54 PM Uros Bizjak  wrote:
> > This will break on !SMP builds, where per-cpu variables are just
> > regular data and not accessed with a segment prefix.
>
> Ugh, indeed. Let me rethink this a bit.

Something like this:

#ifdef CONFIG_SMP
#define PER_CPU_ARG(arg)    %__percpu_seg:arg
#define PER_CPU_VAR(var)    %__percpu_seg:(var)##__percpu_rel
#else /* ! SMP */
#define PER_CPU_ARG(arg)    arg
#define PER_CPU_VAR(var)    (var)##__percpu_rel
#endif /* SMP */

and using the above PER_CPU_ARG in arch/x86/lib/cmpxchg{8,16}b_emu.S will
solve the issue.

I will prepare a v2.

Thanks,
Uros.
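[Editorial note] The SMP/!SMP selection sketched above can be mirrored in a small host-side C sketch. This is purely illustrative: the real macros expand inside assembly headers, and the function name below is a stand-in, not kernel code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Host-side illustration only: renders the operand text that the
 * proposed PER_CPU_ARG() macro would produce in each configuration.
 * On SMP the operand gets the per-cpu segment prefix; on !SMP it is
 * passed through unchanged, which is exactly the distinction Brian
 * Gerst pointed out.
 */
static const char *per_cpu_arg(const char *arg, int smp)
{
    static char buf[64];

    if (smp)
        snprintf(buf, sizeof(buf), "%%__percpu_seg:%s", arg);
    else
        snprintf(buf, sizeof(buf), "%s", arg);
    return buf;
}
```

For example, per_cpu_arg("(%rsi)", 1) renders "%__percpu_seg:(%rsi)", while per_cpu_arg("(%rsi)", 0) leaves "(%rsi)" as plain data access.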



Re: [PATCH 1/4] x86/percpu: Use explicit segment registers in lib/cmpxchg{8,16}b_emu.S

2023-10-12 Thread Uros Bizjak
On Thu, Oct 12, 2023 at 7:45 PM Brian Gerst  wrote:
>
> On Thu, Oct 12, 2023 at 12:13 PM Uros Bizjak  wrote:
> >
> > PER_CPU_VAR macro is intended to be applied to a symbol, it is not
> > intended to be used as a selector between %fs and %gs segment
> > registers for general operands.
> >
> > The address to these emulation functions is passed in a register, so
> > use explicit segment registers to access percpu variable instead.
> >
> > Also add a missing function comment to this_cpu_cmpxchg8b_emu.
> >
> > No functional changes intended.
> >
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: Borislav Petkov 
> > Cc: Dave Hansen 
> > Cc: "H. Peter Anvin" 
> > Cc: Peter Zijlstra 
> > Signed-off-by: Uros Bizjak 
> > ---
> >  arch/x86/lib/cmpxchg16b_emu.S | 12 ++--
> >  arch/x86/lib/cmpxchg8b_emu.S  | 30 +-
> >  2 files changed, 27 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
> > index 6962df315793..2bd8b89bce75 100644
> > --- a/arch/x86/lib/cmpxchg16b_emu.S
> > +++ b/arch/x86/lib/cmpxchg16b_emu.S
> > @@ -23,14 +23,14 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
> > cli
> >
> > /* if (*ptr == old) */
> > -   cmpqPER_CPU_VAR(0(%rsi)), %rax
> > +   cmpq%gs:(%rsi), %rax
> > jne .Lnot_same
> > -   cmpqPER_CPU_VAR(8(%rsi)), %rdx
> > +   cmpq%gs:8(%rsi), %rdx
> > jne .Lnot_same
> >
> > /* *ptr = new */
> > -   movq%rbx, PER_CPU_VAR(0(%rsi))
> > -   movq%rcx, PER_CPU_VAR(8(%rsi))
> > +   movq%rbx, %gs:(%rsi)
> > +   movq%rcx, %gs:8(%rsi)
> >
> > /* set ZF in EFLAGS to indicate success */
> > orl $X86_EFLAGS_ZF, (%rsp)
> > @@ -42,8 +42,8 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
> > /* *ptr != old */
> >
> > /* old = *ptr */
> > -   movqPER_CPU_VAR(0(%rsi)), %rax
> > -   movqPER_CPU_VAR(8(%rsi)), %rdx
> > +   movq%gs:(%rsi), %rax
> > +   movq%gs:8(%rsi), %rdx
> >
> > /* clear ZF in EFLAGS to indicate failure */
> > andl$(~X86_EFLAGS_ZF), (%rsp)
> > diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
> > index 49805257b125..b7d68d5e2d31 100644
> > --- a/arch/x86/lib/cmpxchg8b_emu.S
> > +++ b/arch/x86/lib/cmpxchg8b_emu.S
> > @@ -24,12 +24,12 @@ SYM_FUNC_START(cmpxchg8b_emu)
> > pushfl
> > cli
> >
> > -   cmpl0(%esi), %eax
> > +   cmpl(%esi), %eax
> > jne .Lnot_same
> > cmpl4(%esi), %edx
> > jne .Lnot_same
> >
> > -   movl%ebx, 0(%esi)
> > +   movl%ebx, (%esi)
> > movl%ecx, 4(%esi)
> >
> > orl $X86_EFLAGS_ZF, (%esp)
> > @@ -38,7 +38,7 @@ SYM_FUNC_START(cmpxchg8b_emu)
> > RET
> >
> >  .Lnot_same:
> > -   movl0(%esi), %eax
> > +   movl(%esi), %eax
> > movl4(%esi), %edx
> >
> > andl$(~X86_EFLAGS_ZF), (%esp)
> > @@ -53,18 +53,30 @@ EXPORT_SYMBOL(cmpxchg8b_emu)
> >
> >  #ifndef CONFIG_UML
> >
> > +/*
> > + * Emulate 'cmpxchg8b %fs:(%esi)'
> > + *
> > + * Inputs:
> > + * %esi : memory location to compare
> > + * %eax : low 32 bits of old value
> > + * %edx : high 32 bits of old value
> > + * %ebx : low 32 bits of new value
> > + * %ecx : high 32 bits of new value
> > + *
> > + * Notably this is not LOCK prefixed and is not safe against NMIs
> > + */
> >  SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
> >
> > pushfl
> > cli
> >
> > -   cmplPER_CPU_VAR(0(%esi)), %eax
> > +   cmpl%fs:(%esi), %eax
> > jne .Lnot_same2
> > -   cmplPER_CPU_VAR(4(%esi)), %edx
> > +   cmpl%fs:4(%esi), %edx
> > jne .Lnot_same2
> >
> > -   movl%ebx, PER_CPU_VAR(0(%esi))
> > -   movl%ecx, PER_CPU_VAR(4(%esi))
> > +   movl%ebx, %fs:(%esi)
> > +   movl%ecx, %fs:4(%esi)
> >
> > orl $X86_EFLAGS_ZF, (%esp)
> >
> > @@ -72,8 +84,8 @@ SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
> > RET
> >
> >  .Lnot_same2:
> > -   movlPER_CPU_VAR(0(%esi)), %eax
> > -   movlPER_CPU_VAR(4(%esi)), %edx
> > +   movl%fs:(%esi), %eax
> > +   movl%fs:4(%esi), %edx
> >
> > andl$(~X86_EFLAGS_ZF), (%esp)
> >
> > --
> > 2.41.0
> >
>
> This will break on !SMP builds, where per-cpu variables are just
> regular data and not accessed with a segment prefix.

Ugh, indeed. Let me rethink this a bit.

Thanks,
Uros.



Re: [PATCH 1/4] x86/percpu: Use explicit segment registers in lib/cmpxchg{8,16}b_emu.S

2023-10-12 Thread Brian Gerst
On Thu, Oct 12, 2023 at 12:13 PM Uros Bizjak  wrote:
>
> PER_CPU_VAR macro is intended to be applied to a symbol, it is not
> intended to be used as a selector between %fs and %gs segment
> registers for general operands.
>
> The address to these emulation functions is passed in a register, so
> use explicit segment registers to access percpu variable instead.
>
> Also add a missing function comment to this_cpu_cmpxchg8b_emu.
>
> No functional changes intended.
>
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: "H. Peter Anvin" 
> Cc: Peter Zijlstra 
> Signed-off-by: Uros Bizjak 
> ---
>  arch/x86/lib/cmpxchg16b_emu.S | 12 ++--
>  arch/x86/lib/cmpxchg8b_emu.S  | 30 +-
>  2 files changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
> index 6962df315793..2bd8b89bce75 100644
> --- a/arch/x86/lib/cmpxchg16b_emu.S
> +++ b/arch/x86/lib/cmpxchg16b_emu.S
> @@ -23,14 +23,14 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
> cli
>
> /* if (*ptr == old) */
> -   cmpqPER_CPU_VAR(0(%rsi)), %rax
> +   cmpq%gs:(%rsi), %rax
> jne .Lnot_same
> -   cmpqPER_CPU_VAR(8(%rsi)), %rdx
> +   cmpq%gs:8(%rsi), %rdx
> jne .Lnot_same
>
> /* *ptr = new */
> -   movq%rbx, PER_CPU_VAR(0(%rsi))
> -   movq%rcx, PER_CPU_VAR(8(%rsi))
> +   movq%rbx, %gs:(%rsi)
> +   movq%rcx, %gs:8(%rsi)
>
> /* set ZF in EFLAGS to indicate success */
> orl $X86_EFLAGS_ZF, (%rsp)
> @@ -42,8 +42,8 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
> /* *ptr != old */
>
> /* old = *ptr */
> -   movqPER_CPU_VAR(0(%rsi)), %rax
> -   movqPER_CPU_VAR(8(%rsi)), %rdx
> +   movq%gs:(%rsi), %rax
> +   movq%gs:8(%rsi), %rdx
>
> /* clear ZF in EFLAGS to indicate failure */
> andl$(~X86_EFLAGS_ZF), (%rsp)
> diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
> index 49805257b125..b7d68d5e2d31 100644
> --- a/arch/x86/lib/cmpxchg8b_emu.S
> +++ b/arch/x86/lib/cmpxchg8b_emu.S
> @@ -24,12 +24,12 @@ SYM_FUNC_START(cmpxchg8b_emu)
> pushfl
> cli
>
> -   cmpl0(%esi), %eax
> +   cmpl(%esi), %eax
> jne .Lnot_same
> cmpl4(%esi), %edx
> jne .Lnot_same
>
> -   movl%ebx, 0(%esi)
> +   movl%ebx, (%esi)
> movl%ecx, 4(%esi)
>
> orl $X86_EFLAGS_ZF, (%esp)
> @@ -38,7 +38,7 @@ SYM_FUNC_START(cmpxchg8b_emu)
> RET
>
>  .Lnot_same:
> -   movl0(%esi), %eax
> +   movl(%esi), %eax
> movl4(%esi), %edx
>
> andl$(~X86_EFLAGS_ZF), (%esp)
> @@ -53,18 +53,30 @@ EXPORT_SYMBOL(cmpxchg8b_emu)
>
>  #ifndef CONFIG_UML
>
> +/*
> + * Emulate 'cmpxchg8b %fs:(%esi)'
> + *
> + * Inputs:
> + * %esi : memory location to compare
> + * %eax : low 32 bits of old value
> + * %edx : high 32 bits of old value
> + * %ebx : low 32 bits of new value
> + * %ecx : high 32 bits of new value
> + *
> + * Notably this is not LOCK prefixed and is not safe against NMIs
> + */
>  SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
>
> pushfl
> cli
>
> -   cmplPER_CPU_VAR(0(%esi)), %eax
> +   cmpl%fs:(%esi), %eax
> jne .Lnot_same2
> -   cmplPER_CPU_VAR(4(%esi)), %edx
> +   cmpl%fs:4(%esi), %edx
> jne .Lnot_same2
>
> -   movl%ebx, PER_CPU_VAR(0(%esi))
> -   movl%ecx, PER_CPU_VAR(4(%esi))
> +   movl%ebx, %fs:(%esi)
> +   movl%ecx, %fs:4(%esi)
>
> orl $X86_EFLAGS_ZF, (%esp)
>
> @@ -72,8 +84,8 @@ SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
> RET
>
>  .Lnot_same2:
> -   movlPER_CPU_VAR(0(%esi)), %eax
> -   movlPER_CPU_VAR(4(%esi)), %edx
> +   movl%fs:(%esi), %eax
> +   movl%fs:4(%esi), %edx
>
> andl$(~X86_EFLAGS_ZF), (%esp)
>
> --
> 2.41.0
>

This will break on !SMP builds, where per-cpu variables are just
regular data and not accessed with a segment prefix.

Brian Gerst



[for-4.18][PATCH 1/2] xen/x86: Add family guards to the is_zen[12]_uarch() macros

2023-10-12 Thread Alejandro Vallejo
This slightly simplifies the code that uses them, at no real cost because the
code is very rarely executed, and makes it harder to confuse Zen uarches
due to missing or mistaken family checks.

No functional change intended.

Signed-off-by: Alejandro Vallejo 
---
 xen/arch/x86/cpu/amd.c | 5 ++---
 xen/arch/x86/include/asm/amd.h | 9 +
 xen/arch/x86/spec_ctrl.c   | 3 ---
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 4f27187f92..22aa8c0085 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -1056,8 +1056,7 @@ static void cf_check init_amd(struct cpuinfo_x86 *c)
 
amd_init_ssbd(c);
 
-   if (c->x86 == 0x17)
-   amd_init_spectral_chicken();
+   amd_init_spectral_chicken();
 
/* Probe for NSCB on Zen2 CPUs when not virtualised */
if (!cpu_has_hypervisor && !cpu_has_nscb && c == _cpu_data &&
@@ -1293,7 +1292,7 @@ static int __init cf_check zen2_c6_errata_check(void)
 */
s_time_t delta;
 
-   if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17 || !is_zen2_uarch())
+   if (cpu_has_hypervisor || !is_zen2_uarch())
return 0;
 
/*
diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
index d862cb7972..5a40bcc2ba 100644
--- a/xen/arch/x86/include/asm/amd.h
+++ b/xen/arch/x86/include/asm/amd.h
@@ -145,11 +145,12 @@
  * Hygon (Fam18h) but without simple model number rules.  Instead, use STIBP
  * as a heuristic that distinguishes the two.
  *
- * The caller is required to perform the appropriate vendor/family checks
- * first.
+ * The caller is required to perform the appropriate vendor check first.
  */
-#define is_zen1_uarch() (!boot_cpu_has(X86_FEATURE_AMD_STIBP))
-#define is_zen2_uarch()   boot_cpu_has(X86_FEATURE_AMD_STIBP)
+#define is_zen1_uarch() ((boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) && \
+ !boot_cpu_has(X86_FEATURE_AMD_STIBP))
+#define is_zen2_uarch() (boot_cpu_data.x86 == 0x17 && \
+ boot_cpu_has(X86_FEATURE_AMD_STIBP))
 
 struct cpuinfo_x86;
 int cpu_has_amd_erratum(const struct cpuinfo_x86 *, int, ...);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 6fd7d44ce4..508c454516 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -976,9 +976,6 @@ static bool __init has_div_vuln(void)
(X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
 return false;
 
-if ( boot_cpu_data.x86 != 0x17 && boot_cpu_data.x86 != 0x18 )
-return false;
-
 return is_zen1_uarch();
 }
 
-- 
2.34.1
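[Editorial note] The family-plus-STIBP heuristic that the reworked macros encode can be condensed into plain C. This is a sketch with hypothetical function names; the real macros read boot_cpu_data directly and assume the vendor check has already been done.

```c
#include <stdbool.h>

/*
 * Sketch of the reworked heuristic: Zen1 (Fam17h) and Hygon (Fam18h)
 * lack AMD_STIBP, while Zen2 (also Fam17h) has it. The family checks
 * folded into the macros ensure other families can never match.
 */
static bool is_zen1_uarch_sketch(unsigned int family, bool has_amd_stibp)
{
    return (family == 0x17 || family == 0x18) && !has_amd_stibp;
}

static bool is_zen2_uarch_sketch(unsigned int family, bool has_amd_stibp)
{
    return family == 0x17 && has_amd_stibp;
}
```

With the family guard inside the macro, callers such as zen2_c6_errata_check() and has_div_vuln() can drop their own x86 == 0x17/0x18 checks, which is the cleanup this patch performs.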




[for-4.18][PATCH 2/2] x86/amd: Prevent potentially fetching wrong instruction bytes on Zen4

2023-10-12 Thread Alejandro Vallejo
If Zen4 runs with SMT and without STIBP, then it may corrupt its own code
stream. This is erratum #1485 in the AMD specs.

Fix adapted from Linux's mailing list:
  
https://lore.kernel.org/lkml/d99589f4-bc5d-430b-87b2-72c20370c...@exactcode.com/T/#u

Signed-off-by: Alejandro Vallejo 
---
 xen/arch/x86/cpu/amd.c   | 11 +++
 xen/arch/x86/include/asm/amd.h   |  8 
 xen/arch/x86/include/asm/msr-index.h |  2 ++
 3 files changed, 21 insertions(+)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 22aa8c0085..2426e4cf15 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -1166,6 +1166,17 @@ static void cf_check init_amd(struct cpuinfo_x86 *c)
if (c->x86 == 0x10)
__clear_bit(X86_FEATURE_MONITOR, c->x86_capability);
 
+   /*
+* Erratum #1485: Outdated microcode on Zen4 may cause corruption
+* in the code stream if SMT is enabled and STIBP is not. The fix
+* is cheap, so it's applied irrespective of the loaded microcode.
+*/
+   if (!cpu_has_hypervisor && is_zen4_uarch()) {
+   rdmsrl(MSR_AMD64_BP_CFG, value);
+   wrmsrl(MSR_AMD64_BP_CFG,
+  value | AMD64_BP_CFG_SHARED_BTB_FIX);
+   }
+
if (!cpu_has_amd_erratum(c, AMD_ERRATUM_121))
opt_allow_unsafe = 1;
else if (opt_allow_unsafe < 0)
diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
index 5a40bcc2ba..7d18f7c66b 100644
--- a/xen/arch/x86/include/asm/amd.h
+++ b/xen/arch/x86/include/asm/amd.h
@@ -145,12 +145,20 @@
  * Hygon (Fam18h) but without simple model number rules.  Instead, use STIBP
  * as a heuristic that distinguishes the two.
  *
+ * Zen3 and Zen4 are easier to spot by model number, but it's still not a
 * single range. We use AutoIBRS to discriminate instead, as that's a
+ * Zen4-specific feature.
+ *
  * The caller is required to perform the appropriate vendor check first.
  */
 #define is_zen1_uarch() ((boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) && \
  !boot_cpu_has(X86_FEATURE_AMD_STIBP))
 #define is_zen2_uarch() (boot_cpu_data.x86 == 0x17 && \
  boot_cpu_has(X86_FEATURE_AMD_STIBP))
+#define is_zen3_uarch() (boot_cpu_data.x86 == 0x19 && \
+ !boot_cpu_has(X86_FEATURE_AUTO_IBRS))
+#define is_zen4_uarch() (boot_cpu_data.x86 == 0x19 && \
+ boot_cpu_has(X86_FEATURE_AUTO_IBRS))
 
 struct cpuinfo_x86;
 int cpu_has_amd_erratum(const struct cpuinfo_x86 *, int, ...);
diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
index 11ffed543a..4437e8a63e 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -403,6 +403,8 @@
 #define MSR_AMD64_DE_CFG   0xc0011029
 #define AMD64_DE_CFG_LFENCE_SERIALISE  (_AC(1, ULL) << 1)
 #define MSR_AMD64_EX_CFG   0xc001102c
+#define MSR_AMD64_BP_CFG   0xc001102e
+#define AMD64_BP_CFG_SHARED_BTB_FIX(_AC(1, ULL) << 5)
 #define MSR_AMD64_DE_CFG2  0xc00110e3
 
 #define MSR_AMD64_DR0_ADDRESS_MASK 0xc0011027
-- 
2.34.1
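[Editorial note] The MSR update in this patch is a plain read-modify-write that sets one chicken bit. A minimal C sketch of the bit manipulation follows; the privileged rdmsrl()/wrmsrl() accesses on MSR_AMD64_BP_CFG (0xc001102e) are elided since they cannot run in user space.

```c
#include <stdint.h>

#define AMD64_BP_CFG_SHARED_BTB_FIX  (1ULL << 5)

/*
 * Sketch of the erratum #1485 workaround: OR the shared-BTB-fix bit
 * into the current BP_CFG value. The real code brackets this with
 * rdmsrl()/wrmsrl() and only runs when !cpu_has_hypervisor on Zen4.
 */
static uint64_t apply_btb_fix(uint64_t bp_cfg)
{
    return bp_cfg | AMD64_BP_CFG_SHARED_BTB_FIX;
}
```

Because the operation is an OR, it is idempotent and preserves all other bits, which is why it is safe to apply irrespective of the loaded microcode.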




[for-4.18][PATCH 0/2] Fix AMD erratum #1485 on AMD Zen4

2023-10-12 Thread Alejandro Vallejo
This patch should make it to 4.18, as it prevents random crashes on AMD
Zen4 running outdated microcode.

Under certain conditions Zen4 may corrupt its own code stream when SMT is
enabled and STIBP is not. The Linux thread in which this was highlighted is
in patch2's commit message.

NOTE: Still running in CI as of now, but tested locally. Pipeline here.
https://gitlab.com/xen-project/people/agvallejo/xen/-/pipelines/1034895039

Patch 1: is just a minor refactor to ensure we don't get microarchitectures
 of different families mixed up now that we have 3 different
 families involved (Fam17h, Fam18h and Fam19h)
Patch 2: The actual fix. It involves setting a bit in an MSR if it's a
 non-virtualized Zen4. It's not a direct copy of the Linux patch, as we
 have started using macros to find out microarchitectures from
 CPUID bits, rather than relying on models.

Alejandro Vallejo (2):
  xen/x86: Add family guards to the is_zen[12]_uarch() macros
  x86/amd: Prevent potentially fetching wrong instruction bytes on Zen4

 xen/arch/x86/cpu/amd.c   | 16 +---
 xen/arch/x86/include/asm/amd.h   | 17 +
 xen/arch/x86/include/asm/msr-index.h |  2 ++
 xen/arch/x86/spec_ctrl.c |  3 ---
 4 files changed, 28 insertions(+), 10 deletions(-)

-- 
2.34.1




[libvirt test] 183351: tolerable all pass - PUSHED

2023-10-12 Thread osstest service owner
flight 183351 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183351/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 183338
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183338
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 183338
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  8eb09a2bb984f3550cc87074e57841b86be39601
baseline version:
 libvirt  bb673117d5823030a47dda70712bafbf62fe302a

Last test of basis   183338  2023-10-11 04:20:42 Z1 days
Testing same since   183351  2023-10-12 04:18:45 Z0 days1 attempts


People who touched revisions under test:
  Erik Skultety 
  Sergey Mironov 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest 

[PATCH 0/4] Introduce %rip-relative addressing to PER_CPU_VAR macro

2023-10-12 Thread Uros Bizjak
The following patch series introduces %rip-relative addressing to
PER_CPU_VAR macro. Instruction with %rip-relative address operand is
one byte shorter than its absolute address counterpart and is also
compatible with position independent executable (-fpie) build.

The first three patches are cleanups that fix various inconsistencies
throughout the assembly code.

The last patch introduces (%rip) suffix and uses it for x86_64 target,
resulting in a small code size decrease:

      text    data    bss      dec     hex filename
  25510677 4386685 808388 30705750 1d48856 vmlinux-new.o
  25510629 4386685 808388 30705702 1d48826 vmlinux-old.o

Patch series is against current mainline and can be applied independently
of the ongoing percpu work.

Cc: Juergen Gross 
Cc: Boris Ostrovsky 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 

Uros Bizjak (4):
  x86/percpu: Use explicit segment registers in lib/cmpxchg{8,16}b_emu.S
  x86/percpu: Correct PER_CPU_VAR usage to include symbol and its addend
  x86/percpu, xen: Correct PER_CPU_VAR usage to include symbol and its
addend
  x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR macro

 arch/x86/entry/calling.h  |  2 +-
 arch/x86/entry/entry_32.S |  2 +-
 arch/x86/entry/entry_64.S |  2 +-
 arch/x86/include/asm/percpu.h |  6 --
 arch/x86/kernel/head_64.S |  2 +-
 arch/x86/lib/cmpxchg16b_emu.S | 12 ++--
 arch/x86/lib/cmpxchg8b_emu.S  | 30 +-
 arch/x86/xen/xen-asm.S| 10 +-
 8 files changed, 40 insertions(+), 26 deletions(-)

-- 
2.41.0




[PATCH 3/4] x86/percpu, xen: Correct PER_CPU_VAR usage to include symbol and its addend

2023-10-12 Thread Uros Bizjak
PER_CPU_VAR macro should be applied to a symbol and its addend.
Inconsistent usage is currently harmless, but needs to be corrected
before %rip-relative addressing is introduced to PER_CPU_VAR macro.

No functional changes intended.

Cc: Juergen Gross 
Cc: Boris Ostrovsky 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Signed-off-by: Uros Bizjak 
---
 arch/x86/xen/xen-asm.S | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 9e5e68008785..448958ddbaf8 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -28,7 +28,7 @@
  * non-zero.
  */
 SYM_FUNC_START(xen_irq_disable_direct)
-   movb $1, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+   movb $1, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
RET
 SYM_FUNC_END(xen_irq_disable_direct)
 
@@ -69,7 +69,7 @@ SYM_FUNC_END(check_events)
 SYM_FUNC_START(xen_irq_enable_direct)
FRAME_BEGIN
/* Unmask events */
-   movb $0, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+   movb $0, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 
/*
 * Preempt here doesn't matter because that will deal with any
@@ -78,7 +78,7 @@ SYM_FUNC_START(xen_irq_enable_direct)
 */
 
/* Test for pending */
-   testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+   testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
jz 1f
 
call check_events
@@ -97,7 +97,7 @@ SYM_FUNC_END(xen_irq_enable_direct)
  * x86 use opposite senses (mask vs enable).
  */
 SYM_FUNC_START(xen_save_fl_direct)
-   testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+   testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
setz %ah
addb %ah, %ah
RET
@@ -113,7 +113,7 @@ SYM_FUNC_END(xen_read_cr2);
 
 SYM_FUNC_START(xen_read_cr2_direct)
FRAME_BEGIN
-   _ASM_MOV PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_arch_cr2, %_ASM_AX
+   _ASM_MOV PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_arch_cr2), %_ASM_AX
FRAME_END
RET
 SYM_FUNC_END(xen_read_cr2_direct);
-- 
2.41.0




[PATCH 4/4] x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR macro

2023-10-12 Thread Uros Bizjak
Introduce x86_64 %rip-relative addressing to PER_CPU_VAR macro.
Instruction with %rip-relative address operand is one byte shorter than
its absolute address counterpart and is also compatible with position
independent executable (-fpie) build.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: Peter Zijlstra 
Signed-off-by: Uros Bizjak 
---
 arch/x86/include/asm/percpu.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 34734d730463..9a5f1896c5ef 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,16 +4,18 @@
 
 #ifdef CONFIG_X86_64
 #define __percpu_seg   gs
+#define __percpu_rel   (%rip)
 #else
 #define __percpu_seg   fs
+#define __percpu_rel
 #endif
 
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_SMP
-#define PER_CPU_VAR(var)   %__percpu_seg:var
+#define PER_CPU_VAR(var)   %__percpu_seg:(var)##__percpu_rel
 #else /* ! SMP */
-#define PER_CPU_VAR(var)   var
+#define PER_CPU_VAR(var)   (var)##__percpu_rel
 #endif /* SMP */
 
 #ifdef CONFIG_X86_64_SMP
-- 
2.41.0
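[Editorial note] After this series, PER_CPU_VAR(sym) denotes a segment-prefixed, %rip-relative operand on x86_64 SMP builds. The C sketch below merely renders the operand text for each configuration (the function name is hypothetical and this is not kernel code):

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustration only: renders the operand text PER_CPU_VAR(var)
 * resolves to under each configuration after this series.
 */
static const char *per_cpu_var(const char *var, int x86_64, int smp)
{
    static char buf[64];
    const char *seg = x86_64 ? "%gs" : "%fs";
    const char *rel = x86_64 ? "(%rip)" : "";

    if (smp)
        snprintf(buf, sizeof(buf), "%s:%s%s", seg, var, rel);
    else
        snprintf(buf, sizeof(buf), "%s%s", var, rel);
    return buf;
}
```

For example, per_cpu_var("pcpu_hot", 1, 1) renders "%gs:pcpu_hot(%rip)", the form that is one byte shorter than its absolute-address counterpart and compatible with -fpie builds.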




[PATCH 1/4] x86/percpu: Use explicit segment registers in lib/cmpxchg{8,16}b_emu.S

2023-10-12 Thread Uros Bizjak
PER_CPU_VAR macro is intended to be applied to a symbol, it is not
intended to be used as a selector between %fs and %gs segment
registers for general operands.

The address to these emulation functions is passed in a register, so
use explicit segment registers to access percpu variable instead.

Also add a missing function comment to this_cpu_cmpxchg8b_emu.

No functional changes intended.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: Peter Zijlstra 
Signed-off-by: Uros Bizjak 
---
 arch/x86/lib/cmpxchg16b_emu.S | 12 ++--
 arch/x86/lib/cmpxchg8b_emu.S  | 30 +-
 2 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 6962df315793..2bd8b89bce75 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -23,14 +23,14 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
cli
 
/* if (*ptr == old) */
-   cmpqPER_CPU_VAR(0(%rsi)), %rax
+   cmpq%gs:(%rsi), %rax
jne .Lnot_same
-   cmpqPER_CPU_VAR(8(%rsi)), %rdx
+   cmpq%gs:8(%rsi), %rdx
jne .Lnot_same
 
/* *ptr = new */
-   movq%rbx, PER_CPU_VAR(0(%rsi))
-   movq%rcx, PER_CPU_VAR(8(%rsi))
+   movq%rbx, %gs:(%rsi)
+   movq%rcx, %gs:8(%rsi)
 
/* set ZF in EFLAGS to indicate success */
orl $X86_EFLAGS_ZF, (%rsp)
@@ -42,8 +42,8 @@ SYM_FUNC_START(this_cpu_cmpxchg16b_emu)
/* *ptr != old */
 
/* old = *ptr */
-   movqPER_CPU_VAR(0(%rsi)), %rax
-   movqPER_CPU_VAR(8(%rsi)), %rdx
+   movq%gs:(%rsi), %rax
+   movq%gs:8(%rsi), %rdx
 
/* clear ZF in EFLAGS to indicate failure */
andl$(~X86_EFLAGS_ZF), (%rsp)
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
index 49805257b125..b7d68d5e2d31 100644
--- a/arch/x86/lib/cmpxchg8b_emu.S
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -24,12 +24,12 @@ SYM_FUNC_START(cmpxchg8b_emu)
pushfl
cli
 
-   cmpl0(%esi), %eax
+   cmpl(%esi), %eax
jne .Lnot_same
cmpl4(%esi), %edx
jne .Lnot_same
 
-   movl%ebx, 0(%esi)
+   movl%ebx, (%esi)
movl%ecx, 4(%esi)
 
orl $X86_EFLAGS_ZF, (%esp)
@@ -38,7 +38,7 @@ SYM_FUNC_START(cmpxchg8b_emu)
RET
 
 .Lnot_same:
-   movl0(%esi), %eax
+   movl(%esi), %eax
movl4(%esi), %edx
 
andl$(~X86_EFLAGS_ZF), (%esp)
@@ -53,18 +53,30 @@ EXPORT_SYMBOL(cmpxchg8b_emu)
 
 #ifndef CONFIG_UML
 
+/*
+ * Emulate 'cmpxchg8b %fs:(%esi)'
+ *
+ * Inputs:
+ * %esi : memory location to compare
+ * %eax : low 32 bits of old value
+ * %edx : high 32 bits of old value
+ * %ebx : low 32 bits of new value
+ * %ecx : high 32 bits of new value
+ *
+ * Notably this is not LOCK prefixed and is not safe against NMIs
+ */
 SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
 
pushfl
cli
 
-   cmplPER_CPU_VAR(0(%esi)), %eax
+   cmpl%fs:(%esi), %eax
jne .Lnot_same2
-   cmplPER_CPU_VAR(4(%esi)), %edx
+   cmpl%fs:4(%esi), %edx
jne .Lnot_same2
 
-   movl%ebx, PER_CPU_VAR(0(%esi))
-   movl%ecx, PER_CPU_VAR(4(%esi))
+   movl%ebx, %fs:(%esi)
+   movl%ecx, %fs:4(%esi)
 
orl $X86_EFLAGS_ZF, (%esp)
 
@@ -72,8 +84,8 @@ SYM_FUNC_START(this_cpu_cmpxchg8b_emu)
RET
 
 .Lnot_same2:
-   movlPER_CPU_VAR(0(%esi)), %eax
-   movlPER_CPU_VAR(4(%esi)), %edx
+   movl%fs:(%esi), %eax
+   movl%fs:4(%esi), %edx
 
andl$(~X86_EFLAGS_ZF), (%esp)
 
-- 
2.41.0
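[Editorial note] The compare-and-exchange protocol these stubs emulate (compare a wide value, store on match, report via ZF) can be paraphrased in C for orientation. This sketch deliberately ignores the interrupt-disable bracketing and the %fs/%gs segment overrides that make the assembly correct on real per-cpu data.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * C paraphrase of cmpxchg8b_emu's logic: if the value at ptr matches
 * old, store the new value and report success (the asm sets ZF);
 * otherwise reload old with the current value and report failure
 * (ZF cleared). Not safe against NMIs, matching the asm comment.
 */
static bool cmpxchg8b_sketch(uint64_t *ptr, uint64_t *old, uint64_t new_val)
{
    if (*ptr == *old) {
        *ptr = new_val;
        return true;    /* success: asm sets ZF */
    }
    *old = *ptr;        /* asm reloads %edx:%eax with the current value */
    return false;       /* failure: asm clears ZF */
}

/* Convenience wrapper: runs one exchange and returns the final *ptr. */
static uint64_t cmpxchg8b_demo(uint64_t init, uint64_t old, uint64_t new_val)
{
    uint64_t v = init, o = old;

    cmpxchg8b_sketch(&v, &o, new_val);
    return v;
}
```

The patch's point is that the memory operands above are reached through a register (%esi/%rsi), so the segment override must be written explicitly rather than through PER_CPU_VAR(), which is meant for symbols.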




[PATCH 2/4] x86/percpu: Correct PER_CPU_VAR usage to include symbol and its addend

2023-10-12 Thread Uros Bizjak
PER_CPU_VAR macro should be applied to a symbol and its addend.
Inconsistent usage is currently harmless, but needs to be corrected
before %rip-relative addressing is introduced to PER_CPU_VAR macro.

No functional changes intended.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: Peter Zijlstra 
Signed-off-by: Uros Bizjak 
---
 arch/x86/entry/calling.h  | 2 +-
 arch/x86/entry/entry_32.S | 2 +-
 arch/x86/entry/entry_64.S | 2 +-
 arch/x86/kernel/head_64.S | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index f6907627172b..47368ab0bda0 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -173,7 +173,7 @@ For 32-bit we have the following conventions - kernel is 
built with
 .endm
 
 #define THIS_CPU_user_pcid_flush_mask   \
-   PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask
+   PER_CPU_VAR(cpu_tlbstate + TLB_STATE_user_pcid_flush_mask)
 
 .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 6e6af42e044a..d4e094b2c877 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -305,7 +305,7 @@
 .macro CHECK_AND_APPLY_ESPFIX
 #ifdef CONFIG_X86_ESPFIX32
 #define GDT_ESPFIX_OFFSET (GDT_ENTRY_ESPFIX_SS * 8)
-#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + GDT_ESPFIX_OFFSET
+#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page + GDT_ESPFIX_OFFSET)
 
ALTERNATIVE "jmp .Lend_\@", "", X86_BUG_ESPFIX
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 43606de22511..3d6770b87b87 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -252,7 +252,7 @@ SYM_FUNC_START(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
movqTASK_stack_canary(%rsi), %rbx
-   movq%rbx, PER_CPU_VAR(fixed_percpu_data) + FIXED_stack_canary
+   movq%rbx, PER_CPU_VAR(fixed_percpu_data + FIXED_stack_canary)
 #endif
 
/*
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index ea6995920b7a..bfe5ec2f4f83 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -449,7 +449,7 @@ SYM_CODE_START(soft_restart_cpu)
UNWIND_HINT_END_OF_STACK
 
/* Find the idle task stack */
-   movqPER_CPU_VAR(pcpu_hot) + X86_current_task, %rcx
+   movqPER_CPU_VAR(pcpu_hot + X86_current_task), %rcx
movqTASK_threadsp(%rcx), %rsp
 
jmp .Ljump_to_C_code
-- 
2.41.0




[XEN PATCH][for-4.19 1/2] xen/vmap: use LOWEST_BIT to wrap a violation of Rule 10.1

2023-10-12 Thread Nicola Vetrini
No functional change.

Signed-off-by: Nicola Vetrini 
---
 xen/common/vmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 4fd6b3067ec1..34e99d780c03 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -53,7 +53,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
 if ( !align )
 align = 1;
 else if ( align & (align - 1) )
-align &= -align;
+align = LOWEST_BIT(align);
 
 ASSERT((t >= VMAP_DEFAULT) && (t < VMAP_REGION_NR));
 if ( !vm_base[t] )
-- 
2.34.1
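[Editorial note] For context, LOWEST_BIT comes from a separate series; its presumable shape, and the identity it relies on ((x & -x) isolates the lowest set bit in two's complement), can be checked in isolation. The exact Xen definition may differ, e.g. with casts added for MISRA compliance.

```c
/* Presumed shape of LOWEST_BIT from the series this patch depends on. */
#define LOWEST_BIT(x) ((x) & -(x))

/* Mirror of the vm_alloc() alignment fixup this patch touches. */
static unsigned int fix_align(unsigned int align)
{
    if (!align)
        return 1;
    if (align & (align - 1))        /* not a power of two */
        return LOWEST_BIT(align);   /* was: align &= -align; */
    return align;
}
```

For a non-power-of-two request such as 12, the fixup rounds down to the lowest set bit (4), preserving the existing behaviour while hiding the Rule 10.1 violation inside one macro.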




[XEN PATCH][for-4.19 0/2] use the macro LOWEST_BIT where appropriate

2023-10-12 Thread Nicola Vetrini
This series replaces two instances of the pattern (x & -x) with the
macro LOWEST_BIT, introduced by the series [1]. Therefore, these patches should
be applied on top of patch 1 from that series.

[1] https://marc.info/?l=xen-devel&m=169712438716424&w=2

Nicola Vetrini (2):
  xen/vmap: use LOWEST_BIT to wrap a violation of Rule 10.1
  xen/iommu: use LOWEST_BIT to wrap a violation of Rule 10.1

 xen/common/vmap.c   | 2 +-
 xen/drivers/passthrough/iommu.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--
2.34.1



[XEN PATCH][for-4.19 2/2] xen/iommu: use LOWEST_BIT to wrap a violation of Rule 10.1

2023-10-12 Thread Nicola Vetrini
No functional change.

Signed-off-by: Nicola Vetrini 
---
 xen/drivers/passthrough/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index f9a9f53dbd44..558700788d4a 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -560,7 +560,7 @@ int __init iommu_setup(void)
 rc = iommu_hardware_setup();
 if ( !rc )
 ops = iommu_get_ops();
-if ( ops && (ops->page_sizes & -ops->page_sizes) != PAGE_SIZE )
+if ( ops && (LOWEST_BIT(ops->page_sizes)) != PAGE_SIZE )
 {
 printk(XENLOG_ERR "IOMMU: page size mask %lx unsupported\n",
ops->page_sizes);
-- 
2.34.1




Re: Xen 4.18 release: Reminder about code freeze

2023-10-12 Thread George Dunlap
> > Stop tinkering in the hope that it hides the problem.  You're only
> > making it harder to fix properly.
>
> Making it harder to fix properly would be a valid reason not to commit
> the (maybe partial) fix. But looking at the fix again:
>
> diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c
> index a6cd199fdc..9cd6678015 100644
> --- a/tools/xenstored/domain.c
> +++ b/tools/xenstored/domain.c
> @@ -989,6 +989,7 @@ static struct domain *introduce_domain(const void *ctx,
> talloc_steal(domain->conn, domain);
>
> if (!restore) {
> +   domain_conn_reset(domain);
> /* Notify the domain that xenstore is available */
> interface->connection = XENSTORE_CONNECTED;
> xenevtchn_notify(xce_handle, domain->port);
> @@ -1031,8 +1032,6 @@ int do_introduce(const void *ctx, struct connection 
> *conn,
> if (!domain)
> return errno;
>
> -   domain_conn_reset(domain);
> -
> send_ack(conn, XS_INTRODUCE);
>
> It is a 1-line movement. Textually small. Easy to understand and to
> revert. It doesn't seem to be making things harder to fix? We could
> revert it any time if a better fix is offered.
>
> Maybe we could have a XXX note in the commit message or in-code
> comment?

It moves a line from one function (do_domain_introduce()) into a
completely different function (introduce_domain()), nested inside two
if() statements; with no analysis on how the change will impact
things.

Are there any paths through do_domain_introduce() that now *won't* get
a domain_conn_reset() call?  Is that OK?

Is introduce_domain() called in other places?  Will those places now
get extra domain_conn_reset() calls they weren't expecting?  Is that
OK?

I mean, it certainly seems strange to set the state to CONNECTED, send
off an event channel, and then after that delete all watches /
transactions / buffered data and so on; but we need at least a basic
understanding of what's going on to know that this change isn't going
to break something.

Not knowing much about the xenstore protocol: in the
(!domain->introduced) case, will there be anything to actually delete?
It seems like it would only be necessary / useful in the
(domain->introduced) case.

 -George



[XEN PATCH][for-next v2 6/8] x86/mce: Move MC_NCLASSES into the enum mctelem_class

2023-10-12 Thread Nicola Vetrini
The definition of MC_NCLASSES contained a violation of MISRA C:2012
Rule 10.1; moving it into the enumeration as an enumeration constant
resolves the violation and makes it more resilient to possible
additions to that enum.

Signed-off-by: Nicola Vetrini 
---
Note that the use of an enum constant as operand to [ ] and != is allowed
by the Rule.
---
 xen/arch/x86/cpu/mcheck/mctelem.c | 2 --
 xen/arch/x86/cpu/mcheck/mctelem.h | 5 +++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c 
b/xen/arch/x86/cpu/mcheck/mctelem.c
index 329ac20faf96..77a4d1d5ff48 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -64,8 +64,6 @@ struct mctelem_ent {
 
 #define MC_NENT (MC_URGENT_NENT + MC_NONURGENT_NENT)
 
-#defineMC_NCLASSES (MC_NONURGENT + 1)
-
 #defineCOOKIE2MCTE(c)  ((struct mctelem_ent *)(c))
 #defineMCTE2COOKIE(tep)((mctelem_cookie_t)(tep))
 
diff --git a/xen/arch/x86/cpu/mcheck/mctelem.h 
b/xen/arch/x86/cpu/mcheck/mctelem.h
index d4eba53ae0e5..21b251847bc0 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.h
+++ b/xen/arch/x86/cpu/mcheck/mctelem.h
@@ -55,8 +55,9 @@
 typedef struct mctelem_cookie *mctelem_cookie_t;
 
 typedef enum mctelem_class {
-   MC_URGENT,
-   MC_NONURGENT
+MC_URGENT,
+MC_NONURGENT,
+MC_NCLASSES
 } mctelem_class_t;
 
 extern void mctelem_init(unsigned int);
-- 
2.34.1




[XEN PATCH][for-4.19 v2 3/8] xen/pdx: amend definition of PDX_GROUP_COUNT

2023-10-12 Thread Nicola Vetrini
The definition of PDX_GROUP_COUNT causes violations of
MISRA C:2012 Rule 10.1; the problematic part now uses
the LOWEST_BIT macro, which encapsulates the pattern.

Signed-off-by: Nicola Vetrini 
---
 xen/include/xen/pdx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
index f3fbc4273aa4..36d618a8ba7d 100644
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -72,7 +72,7 @@
 extern unsigned long max_pdx;
 
 #define PDX_GROUP_COUNT ((1 << PDX_GROUP_SHIFT) / \
- (sizeof(*frame_table) & -sizeof(*frame_table)))
+ (LOWEST_BIT(sizeof(*frame_table
 extern unsigned long pdx_group_valid[];
 
 /**
-- 
2.34.1




[XEN PATCH][for-next v2 5/8] x86/io_apic: address violation of MISRA C:2012 Rule 10.1

2023-10-12 Thread Nicola Vetrini
The definition of IO_APIC_BASE contains a sum of an essentially-enum
value (FIX_IO_APIC_BASE_0), which is positive, and an index that is
unsigned in all instances; therefore the former is cast to unsigned so
that both operands have the same essential type.

No functional change.
---
 xen/arch/x86/include/asm/io_apic.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/include/asm/io_apic.h 
b/xen/arch/x86/include/asm/io_apic.h
index a7e4c9e146de..a0fc50d601fe 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -14,9 +14,10 @@
  * Copyright (C) 1997, 1998, 1999, 2000 Ingo Molnar
  */
 
-#define IO_APIC_BASE(idx)   \
-((volatile uint32_t *)(__fix_to_virt(FIX_IO_APIC_BASE_0 + (idx))\
-   + (mp_ioapics[idx].mpc_apicaddr & ~PAGE_MASK)))
+#define IO_APIC_BASE(idx) \
+((volatile uint32_t *)\
+ (__fix_to_virt((unsigned int)FIX_IO_APIC_BASE_0 + (idx)) \
+  + (mp_ioapics[idx].mpc_apicaddr & ~PAGE_MASK)))
 
 #define IO_APIC_ID(idx) (mp_ioapics[idx].mpc_apicid)
 
-- 
2.34.1




[XEN PATCH][for-next][for-4.19 v2 1/8] xen/include: add macro LOWEST_BIT

2023-10-12 Thread Nicola Vetrini
The purpose of this macro is to encapsulate the well-known expression
'x & -x', which, on 2's complement architectures and applied to unsigned
integers, gives 2^ffs(x), where ffs(x) is the position of the lowest set
bit in x.

A deviation for ECLAIR is also introduced.

Signed-off-by: Nicola Vetrini 
---
Changes in v2:
- rename to LOWEST_BIT
---
 automation/eclair_analysis/ECLAIR/deviations.ecl | 6 ++
 xen/include/xen/macros.h | 6 --
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl 
b/automation/eclair_analysis/ECLAIR/deviations.ecl
index d8170106b449..b8e1155ee49d 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -274,6 +274,12 @@ still non-negative."
 -config=MC3R1.R10.1,etypes+={safe, 
"stmt(operator(logical)||node(conditional_operator||binary_conditional_operator))",
 "dst_type(ebool||boolean)"}
 -doc_end

+-doc_begin="The macro LOWEST_BIT encapsulates a well-known pattern to obtain 
the value
+2^ffs(x) for unsigned integers on two's complement architectures
+(all the architectures supported by Xen satisfy this requirement)."
+-config=MC3R1.R10.1,reports+={safe, 
"any_area(any_loc(any_exp(macro(^LOWEST_BIT$"}
+-doc_end
+
 ### Set 3 ###

 #
diff --git a/xen/include/xen/macros.h b/xen/include/xen/macros.h
index d0caae7db298..af47179d1056 100644
--- a/xen/include/xen/macros.h
+++ b/xen/include/xen/macros.h
@@ -8,8 +8,10 @@
 #define DIV_ROUND(n, d) (((n) + (d) / 2) / (d))
 #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

-#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))
-#define MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m))
+#define LOWEST_BIT(x) ((x) & -(x))
+
+#define MASK_EXTR(v, m) (((v) & (m)) / LOWEST_BIT(m))
+#define MASK_INSR(v, m) (((v) * LOWEST_BIT(m)) & (m))

 #define count_args_(dot, a1, a2, a3, a4, a5, a6, a7, a8, x, ...) x
 #define count_args(args...) \
--
2.34.1



[XEN PATCH][for-4.19 v2 7/8] xen/types: address Rule 10.1 for DECLARE_BITMAP use

2023-10-12 Thread Nicola Vetrini
Given its use in the declaration
'DECLARE_BITMAP(features, IOMMU_FEAT_count)', the argument
'bits' has essential type 'enum iommu_feature', which the Rule
does not allow as an operand to the addition operator
in macro 'BITS_TO_LONGS'.

This construct is deviated with a deviation comment.

Signed-off-by: Nicola Vetrini 
---
 docs/misra/safe.json| 8 
 xen/include/xen/iommu.h | 1 +
 xen/include/xen/types.h | 8 
 3 files changed, 17 insertions(+)

diff --git a/docs/misra/safe.json b/docs/misra/safe.json
index 39c5c056c7d4..952324f85cf9 100644
--- a/docs/misra/safe.json
+++ b/docs/misra/safe.json
@@ -20,6 +20,14 @@
 },
 {
 "id": "SAF-2-safe",
+"analyser": {
+"eclair": "MC3R1.R10.1"
+},
+"name": "MC3R1.R10.1: use of an enumeration constant in an 
arithmetic operation",
+"text": "This violation can be fixed with a cast to (int) of the 
enumeration constant, but a deviation was chosen due to code readability (see 
also the comment in BITS_TO_LONGS)."
+},
+{
+"id": "SAF-3-safe",
 "analyser": {},
 "name": "Sentinel",
 "text": "Next ID to be used"
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 0e747b0bbc1c..d5c25770915b 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -360,6 +360,7 @@ struct domain_iommu {
 #endif
 
 /* Features supported by the IOMMU */
+/* SAF-2-safe enum constant in arithmetic operation */
 DECLARE_BITMAP(features, IOMMU_FEAT_count);
 
 /* Does the guest share HAP mapping with the IOMMU? */
diff --git a/xen/include/xen/types.h b/xen/include/xen/types.h
index aea259db1ef2..50171ea1ad28 100644
--- a/xen/include/xen/types.h
+++ b/xen/include/xen/types.h
@@ -22,6 +22,14 @@ typedef signed long ssize_t;
 
 typedef __PTRDIFF_TYPE__ ptrdiff_t;
 
+/*
+ * Users of this macro are expected to pass a positive value.
+ *
+ * Eventually, this should become an unsigned quantity, but this
+ * requires fixing various uses of this macro and BITS_PER_LONG in signed
+ * contexts, such as type-safe 'min' macro uses, which give rise to build 
errors
+ * when the arguments have differing signedness, due to the build flags used.
+ */
 #define BITS_TO_LONGS(bits) \
 (((bits)+BITS_PER_LONG-1)/BITS_PER_LONG)
 #define DECLARE_BITMAP(name,bits) \
-- 
2.34.1




[XEN PATCH][for-next v2 4/8] x86_64/mm: express macro CNT using LOWEST_BIT

2023-10-12 Thread Nicola Vetrini
The various definitions of macro CNT (and the related BUILD_BUG_ON)
can be rewritten using LOWEST_BIT, encapsulating a violation of
MISRA C:2012 Rule 10.1.

Signed-off-by: Nicola Vetrini 
---
 xen/arch/x86/x86_64/mm.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index c3ebb777144a..0eb7b71124f5 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -351,9 +351,9 @@ static int setup_compat_m2p_table(struct mem_hotadd_info 
*info)
 ~((1UL << (L2_PAGETABLE_SHIFT - 2)) - 1) );

 #define MFN(x) (((x) << L2_PAGETABLE_SHIFT) / sizeof(unsigned int))
-#define CNT ((sizeof(*frame_table) & -sizeof(*frame_table)) / \
+#define CNT (LOWEST_BIT(sizeof(*frame_table)) / \
  sizeof(*compat_machine_to_phys_mapping))
-BUILD_BUG_ON((sizeof(*frame_table) & -sizeof(*frame_table)) % \
+BUILD_BUG_ON(LOWEST_BIT(sizeof(*frame_table)) % \
  sizeof(*compat_machine_to_phys_mapping));

 for ( i = smap; i < emap; i += (1UL << (L2_PAGETABLE_SHIFT - 2)) )
@@ -410,10 +410,10 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 va = RO_MPT_VIRT_START + smap * sizeof(*machine_to_phys_mapping);

 #define MFN(x) (((x) << L2_PAGETABLE_SHIFT) / sizeof(unsigned long))
-#define CNT ((sizeof(*frame_table) & -sizeof(*frame_table)) / \
+#define CNT (LOWEST_BIT(sizeof(*frame_table)) / \
  sizeof(*machine_to_phys_mapping))

-BUILD_BUG_ON((sizeof(*frame_table) & -sizeof(*frame_table)) % \
+BUILD_BUG_ON(LOWEST_BIT(sizeof(*frame_table)) % \
  sizeof(*machine_to_phys_mapping));

 i = smap;
@@ -539,7 +539,7 @@ void __init paging_init(void)
 mpt_size  = (max_page * BYTES_PER_LONG) + (1UL << L2_PAGETABLE_SHIFT) - 1;
 mpt_size &= ~((1UL << L2_PAGETABLE_SHIFT) - 1UL);
 #define MFN(x) (((x) << L2_PAGETABLE_SHIFT) / sizeof(unsigned long))
-#define CNT ((sizeof(*frame_table) & -sizeof(*frame_table)) / \
+#define CNT (LOWEST_BIT(sizeof(*frame_table)) / \
  sizeof(*machine_to_phys_mapping))
 BUILD_BUG_ON((sizeof(*frame_table) & ~sizeof(*frame_table)) % \
  sizeof(*machine_to_phys_mapping));
@@ -666,7 +666,7 @@ void __init paging_init(void)
 mpt_size = 0;

 #define MFN(x) (((x) << L2_PAGETABLE_SHIFT) / sizeof(unsigned int))
-#define CNT ((sizeof(*frame_table) & -sizeof(*frame_table)) / \
+#define CNT (LOWEST_BIT(sizeof(*frame_table)) / \
  sizeof(*compat_machine_to_phys_mapping))
 BUILD_BUG_ON((sizeof(*frame_table) & ~sizeof(*frame_table)) % \
  sizeof(*compat_machine_to_phys_mapping));
--
2.34.1



[XEN PATCH v2 0/8] address violations of MISRA C:2012 Rule 10.1

2023-10-12 Thread Nicola Vetrini
The widely-used construct (x & -x), where x is an unsigned integer
quantity represented in 2's complement, does yield the expected result
(the lowest set bit of x). Since all architectures that are targets for
compliance fulfill this requirement, the construct is deemed safe and
deviated.

The use of 'DECLARE_BITMAP(features, IOMMU_FEAT_count);' is deviated, to avoid
harming code readability.

On the contrary, other uses of inappropriate types are changed.

Changes in v2:
- the patch 'x86/cpu-policy' has been dropped, in favour of the patch submitted
  by Andrew Cooper.

Nicola Vetrini (8):
  xen/include: add macro LOWEST_BIT
  arm/bitops: encapsulate violation of MISRA C:2012 Rule 10.1
  xen/pdx: amend definition of PDX_GROUP_COUNT
  x86_64/mm: express macro CNT using LOWEST_BIT
  x86/io_apic: address violation of MISRA C:2012 Rule 10.1
  x86/mce: Move MC_NCLASSES into the enum mctelem_class
  xen/types: address Rule 10.1 for DECLARE_BITMAP use
  xen/compat: use BUILD_BUG_ON in CHECK_SIZE macros

 .../eclair_analysis/ECLAIR/deviations.ecl  |  6 ++
 docs/misra/safe.json   |  8 
 xen/arch/arm/include/asm/bitops.h  |  6 --
 xen/arch/x86/cpu/mcheck/mctelem.c  |  2 --
 xen/arch/x86/cpu/mcheck/mctelem.h  |  5 +++--
 xen/arch/x86/include/asm/io_apic.h |  7 ---
 xen/arch/x86/x86_64/mm.c   | 12 ++--
 xen/include/xen/compat.h   | 18 +-
 xen/include/xen/iommu.h|  1 +
 xen/include/xen/macros.h   |  6 --
 xen/include/xen/pdx.h  |  2 +-
 xen/include/xen/types.h|  8 
 12 files changed, 58 insertions(+), 23 deletions(-)

--
2.34.1



[XEN PATCH][for-4.19 v2 8/8] xen/compat: use BUILD_BUG_ON in CHECK_SIZE macros

2023-10-12 Thread Nicola Vetrini
BUILD_BUG_ON is the preferred way to induce a build error
upon statically determined incorrect conditions.

This also fixes a MISRA C:2012 Rule 10.1 violation in the
previous formulation.

Signed-off-by: Nicola Vetrini 
---
Changes in v2:
- replace the construct with a BUILD_BUG_ON.
---
 xen/include/xen/compat.h | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/xen/include/xen/compat.h b/xen/include/xen/compat.h
index f2ce5bb3580a..4daa04183eac 100644
--- a/xen/include/xen/compat.h
+++ b/xen/include/xen/compat.h
@@ -151,12 +151,20 @@ CHECK_NAME_(k, n, T)(k xen_ ## n *x, \
 return x == c; \
 }
 
-#define CHECK_SIZE(name) \
-typedef int CHECK_NAME(name, S)[1 - (sizeof(xen_ ## name ## _t) != \
- sizeof(compat_ ## name ## _t)) * 2]
+#define CHECK_SIZE(name)  \
+static inline void __maybe_unused CHECK_SIZE_##name(void) \
+{ \
+typedef int CHECK_NAME(name, S)[1];   \
+BUILD_BUG_ON(sizeof(xen_ ## name ## _t) !=\
+ sizeof(compat_ ## name ## _t));  \
+}
 #define CHECK_SIZE_(k, n) \
-typedef int CHECK_NAME_(k, n, S)[1 - (sizeof(k xen_ ## n) != \
-  sizeof(k compat_ ## n)) * 2]
+static inline void __maybe_unused CHECK_SIZE_##k_##n(void) \
+{  \
+typedef int CHECK_NAME_(k, n, S)[1];   \
+BUILD_BUG_ON(sizeof(k xen_ ## n) !=\
+ sizeof(k compat_ ## n));  \
+}
 
 #define CHECK_FIELD_COMMON(name, t, f) \
 static inline int __maybe_unused name(xen_ ## t ## _t *x, compat_ ## t ## _t 
*c) \
-- 
2.34.1




[XEN PATCH][for-4.19 v2 2/8] arm/bitops: encapsulate violation of MISRA C:2012 Rule 10.1

2023-10-12 Thread Nicola Vetrini
The definitions of ffs{l}? violate Rule 10.1 by using the well-known
pattern (x & -x); that usage is now wrapped by the LOWEST_BIT macro.

No functional change.

Signed-off-by: Nicola Vetrini 
---
 xen/arch/arm/include/asm/bitops.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/include/asm/bitops.h 
b/xen/arch/arm/include/asm/bitops.h
index 71ae14cab355..8b5d61545e19 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -9,6 +9,8 @@
 #ifndef _ARM_BITOPS_H
 #define _ARM_BITOPS_H
 
+#include 
+
 #include 
 
 /*
@@ -155,8 +157,8 @@ static inline int fls(unsigned int x)
 }
 
 
-#define ffs(x) ({ unsigned int __t = (x); fls(__t & -__t); })
-#define ffsl(x) ({ unsigned long __t = (x); flsl(__t & -__t); })
+#define ffs(x) ({ unsigned int __t = (x); fls(LOWEST_BIT(__t)); })
+#define ffsl(x) ({ unsigned long __t = (x); flsl(LOWEST_BIT(__t)); })
 
 /**
  * find_first_set_bit - find the first set bit in @word
-- 
2.34.1




Re: [PATCH v6 06/38] mm: Add default definition of set_ptes()

2023-10-12 Thread David Woodhouse
On Thu, 2023-10-12 at 15:05 +0100, Matthew Wilcox wrote:
> On Thu, Oct 12, 2023 at 02:53:05PM +0100, David Woodhouse wrote:
> > > +   arch_enter_lazy_mmu_mode();
> > > +   for (;;) {
> > > +   set_pte(ptep, pte);
> > > +   if (--nr == 0)
> > > +   break;
> > > +   ptep++;
> > > +   pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> > > +   }
> > > +   arch_leave_lazy_mmu_mode();
> > 
> > This breaks the Xen PV guest.
> > 
> > In move_ptes() in mm/mremap.c we arch_enter_lazy_mmu_mode() and then
> > loop calling set_pte_at(). Which now (or at least in a few commits time
> > when you wire it up for x86 in commit a3e1c9372c9b959) ends up in your
> > implementation of set_ptes(), calls arch_enter_lazy_mmu_mode() again,
> > and:
> > 
> > [    0.628700] [ cut here ]
> > [    0.628718] kernel BUG at arch/x86/kernel/paravirt.c:144!
> 
> Easy fix ... don't do that ;-)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index af7639c3b0a3..f3da8836f689 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -231,9 +231,11 @@ static inline pte_t pte_next_pfn(pte_t pte)
>  static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> +   bool multiple = nr > 1;
> page_table_check_ptes_set(mm, ptep, pte, nr);
>  
> -   arch_enter_lazy_mmu_mode();
> +   if (multiple)
> +   arch_enter_lazy_mmu_mode();
> for (;;) {
> set_pte(ptep, pte);
> if (--nr == 0)
> @@ -241,7 +243,8 @@ static inline void set_ptes(struct mm_struct *mm, 
> unsigned long addr,
> ptep++;
> pte = pte_next_pfn(pte);
> }
> -   arch_leave_lazy_mmu_mode();
> +   if (multiple)
> +   arch_leave_lazy_mmu_mode();
>  }
>  #endif
>  #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> 
> I think long-term, we should make lazy_mmu_mode nestable.  But this is
> a reasonable quick fix.

I don't much like doing it implicitly based on (nr==1) but sure, as a
quick fix that works. The 64-bit PV guest now boots again.

Tested-by: David Woodhouse 

Thanks.




[xen-unstable test] 183347: tolerable trouble: fail/pass/starved - PUSHED

2023-10-12 Thread osstest service owner
flight 183347 xen-unstable real [real]
flight 183358 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/183347/
http://logs.test-lab.xenproject.org/osstest/logs/183358/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-migrupgrade 21 debian-fixup/dst_host fail pass in 
183358-retest
 test-armhf-armhf-xl   8 xen-bootfail pass in 183358-retest
 test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install fail pass in 
183358-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail in 183358 like 183337
 test-armhf-armhf-xl 15 migrate-support-check fail in 183358 never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail in 183358 never pass
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 183337
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 183337
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 183337
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183337
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 183337
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 183337
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 183337
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 183337
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183337
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 183337
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 183337
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-raw  3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl   3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-credit2   3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-xsm   3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-thunderx  3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-vhd   3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-credit1   3 hosts-allocate   starved  n/a
 test-arm64-arm64-libvirt-xsm  3 hosts-allocate   starved  n/a

version targeted for testing:
 xen  dc9d9aa62ddeb14abd5672690d30789829f58f7e
baseline version:
 xen  9713423a06225bcf0f22cab15e8d04870200af7a

Last test of basis   183337  2023-10-11 02:32:27 Z1 days
Testing same since   183347  2023-10-11 20:08:43 Z0 days1 attempts


People who touched revisions under test:
  Alejandro Vallejo 
  Andrew Cooper 
  Jan Beulich 
  Roger Pau Monne 
  Roger Pau Monné 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf 

Re: [PATCH v6 06/38] mm: Add default definition of set_ptes()

2023-10-12 Thread Matthew Wilcox
On Thu, Oct 12, 2023 at 02:53:05PM +0100, David Woodhouse wrote:
> > +   arch_enter_lazy_mmu_mode();
> > +   for (;;) {
> > +   set_pte(ptep, pte);
> > +   if (--nr == 0)
> > +   break;
> > +   ptep++;
> > +   pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> > +   }
> > +   arch_leave_lazy_mmu_mode();
> 
> This breaks the Xen PV guest.
> 
> In move_ptes() in mm/mremap.c we arch_enter_lazy_mmu_mode() and then
> loop calling set_pte_at(). Which now (or at least in a few commits time
> when you wire it up for x86 in commit a3e1c9372c9b959) ends up in your
> implementation of set_ptes(), calls arch_enter_lazy_mmu_mode() again,
> and:
> 
> [0.628700] [ cut here ]
> [0.628718] kernel BUG at arch/x86/kernel/paravirt.c:144!

Easy fix ... don't do that ;-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index af7639c3b0a3..f3da8836f689 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -231,9 +231,11 @@ static inline pte_t pte_next_pfn(pte_t pte)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
 {
+   bool multiple = nr > 1;
page_table_check_ptes_set(mm, ptep, pte, nr);
 
-   arch_enter_lazy_mmu_mode();
+   if (multiple)
+   arch_enter_lazy_mmu_mode();
for (;;) {
set_pte(ptep, pte);
if (--nr == 0)
@@ -241,7 +243,8 @@ static inline void set_ptes(struct mm_struct *mm, unsigned 
long addr,
ptep++;
pte = pte_next_pfn(pte);
}
-   arch_leave_lazy_mmu_mode();
+   if (multiple)
+   arch_leave_lazy_mmu_mode();
 }
 #endif
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

I think long-term, we should make lazy_mmu_mode nestable.  But this is
a reasonable quick fix.



Re: [PATCH v6 06/38] mm: Add default definition of set_ptes()

2023-10-12 Thread David Woodhouse
On Wed, 2023-08-02 at 16:13 +0100, Matthew Wilcox (Oracle) wrote:
> Most architectures can just define set_pte() and PFN_PTE_SHIFT to
> use this definition.  It's also a handy spot to document the guarantees
> provided by the MM.
> 
> Suggested-by: Mike Rapoport (IBM) 
> Signed-off-by: Matthew Wilcox (Oracle) 
> Reviewed-by: Mike Rapoport (IBM) 
> ---
>  include/linux/pgtable.h | 81 ++---
>  1 file changed, 60 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index f34e0f2cb4d8..3fde0d5d1c29 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -182,6 +182,66 @@ static inline int pmd_young(pmd_t pmd)
>  }
>  #endif
>  
> +/*
> + * A facility to provide lazy MMU batching.  This allows PTE updates and
> + * page invalidations to be delayed until a call to leave lazy MMU mode
> + * is issued.  Some architectures may benefit from doing this, and it is
> + * beneficial for both shadow and direct mode hypervisors, which may batch
> + * the PTE updates which happen during this window.  Note that using this
> + * interface requires that read hazards be removed from the code.  A read
> + * hazard could result in the direct mode hypervisor case, since the actual
> + * write to the page tables may not yet have taken place, so reads though
> + * a raw PTE pointer after it has been modified are not guaranteed to be
> + * up to date.  This mode can only be entered and left under the protection 
> of
> + * the page table locks for all page tables which may be modified.  In the UP
> + * case, this is required so that preemption is disabled, and in the SMP 
> case,
> + * it must synchronize the delayed page table writes properly on other CPUs.
> + */
> +#ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
> +#define arch_enter_lazy_mmu_mode() do {} while (0)
> +#define arch_leave_lazy_mmu_mode() do {} while (0)
> +#define arch_flush_lazy_mmu_mode() do {} while (0)
> +#endif
> +
> +#ifndef set_ptes
> +#ifdef PFN_PTE_SHIFT
> +/**
> + * set_ptes - Map consecutive pages to a contiguous range of addresses.
> + * @mm: Address space to map the pages into.
> + * @addr: Address to map the first page at.
> + * @ptep: Page table pointer for the first entry.
> + * @pte: Page table entry for the first page.
> + * @nr: Number of pages to map.
> + *
> + * May be overridden by the architecture, or the architecture can define
> + * set_pte() and PFN_PTE_SHIFT.
> + *
> + * Context: The caller holds the page table lock.  The pages all belong
> + * to the same folio.  The PTEs are all in the same PMD.
> + */
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +   pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +   page_table_check_ptes_set(mm, ptep, pte, nr);
> +
> +   arch_enter_lazy_mmu_mode();
> +   for (;;) {
> +   set_pte(ptep, pte);
> +   if (--nr == 0)
> +   break;
> +   ptep++;
> +   pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> +   }
> +   arch_leave_lazy_mmu_mode();
> +}


This breaks the Xen PV guest.

In move_ptes() in mm/mremap.c we arch_enter_lazy_mmu_mode() and then
loop calling set_pte_at(). Which now (or at least in a few commits time
when you wire it up for x86 in commit a3e1c9372c9b959) ends up in your
implementation of set_ptes(), calls arch_enter_lazy_mmu_mode() again,
and:

[0.628700] [ cut here ]
[0.628718] kernel BUG at arch/x86/kernel/paravirt.c:144!
[0.628743] invalid opcode:  [#1] PREEMPT SMP NOPTI
[0.628769] CPU: 0 PID: 1 Comm: init Not tainted 6.5.0-rc4+ #1295
[0.628818] RIP: e030:paravirt_enter_lazy_mmu+0x24/0x30
[0.628839] Code: 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 8b 05 90 
28 f9 7e 85 c0 75 10 65 c7 05 81 28 f9 7e 01 00 00 00 c3 cc cc cc cc <0f> 0b 66 
2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90
[0.628875] RSP: e02b:c9004000ba48 EFLAGS: 00010202
[0.628891] RAX: 0001 RBX: 8880051b7100 RCX: 000ff000
[0.628908] RDX: 8000763ff967 RSI: 8000763ff967 RDI: 8880051b7100
[0.628925] RBP: 8000763ff967 R08: 8880051b6868 R09: 7ffce1a2
[0.628943] R10: deadbeefdeadf00d R11:  R12: 7000
[0.628964] R13: 8880050b7000 R14: 0001 R15: 7fffe000
[0.628988] FS:  () GS:88807b80() 
knlGS:
[0.629007] CS:  e030 DS:  ES:  CR0: 80050033
[0.629024] CR2: c93f5000 CR3: 03904000 CR4: 00050660
[0.629046] Call Trace:
[0.629055]  
[0.629066]  ? die+0x36/0x90
[0.629081]  ? do_trap+0xda/0x100
[0.629093]  ? paravirt_enter_lazy_mmu+0x24/0x30
[0.629112]  ? do_error_trap+0x6a/0x90
[0.629123]  ? paravirt_enter_lazy_mmu+0x24/0x30
[0.629138]  ? exc_invalid_op+0x50/0x70
[0.629155]  ? 

Re: [XEN PATCH v1] tools/python: Add python3 compatibility

2023-10-12 Thread Javi Merino
On Wed, Oct 11, 2023 at 12:34:27AM +0800, Andrew Cooper wrote:
> On 10/10/2023 10:18 pm, Javi Merino wrote:
> > Most of the work for python3 compatibility was done in
> > 1430c5a8cad4 (tools/python: Python 3 compatibility, 2019-12-18).  This
> > just adds a few builtins needed for python3.
> >
> > Resolves: xen-project/xen#156
> >
> > Signed-off-by: Javi Merino 
> > ---
> >
> > I haven't tested it.
> >
> >  README | 1 +
> >  tools/python/scripts/convert-legacy-stream | 3 +++
> >  2 files changed, 4 insertions(+)
> >
> > diff --git a/README b/README
> > index 855db01d36..44ed88c392 100644
> > --- a/README
> > +++ b/README
> > @@ -51,6 +51,7 @@ provided by your OS distributor:
> >  * POSIX compatible awk
> >  * Development install of zlib (e.g., zlib-dev)
> >  * Development install of Python 2.7 or later (e.g., python-dev)
> > +  - If using Python 2.7, you also need the future module.  This is not 
> > needed for Python 3.
> >  * Development install of curses (e.g., libncurses-dev)
> >  * Development install of openssl (e.g., openssl-dev)
> >  * Development install of x11 (e.g. xorg-x11-dev)
> > diff --git a/tools/python/scripts/convert-legacy-stream 
> > b/tools/python/scripts/convert-legacy-stream
> > index 7fe375a668..26a66c50fc 100755
> > --- a/tools/python/scripts/convert-legacy-stream
> > +++ b/tools/python/scripts/convert-legacy-stream
> > @@ -8,6 +8,9 @@ Convert a legacy migration stream to a v2 stream.
> >  from __future__ import print_function
> >  from __future__ import division
> >  
> > +from builtins import zip
> > +from builtins import range
> > +from builtins import object
> 
> It can't be object because most other scripts use it just fine in py2
> and py3.

This just makes the VM() class behave like python3: str(vm) will
return a unicode string.  After a quick glance at the code I don't
think the code currently does this, but at least the behaviour will
now be consistent between the two versions.

> There's only one single use of zip and range in this script, and range
> is clearly fine (although it wants to be xrange() on py2 and we do
> opencode that places), so I'm guessing the problem is with zip(), but
> it's not exactly clear why?

These changes just make the code behave more consistently between
python2 and python3.  As I said under the commit message, I have not tested
the changes.

Cheers,
Javi



Re: [XEN PATCH v1] tools/python: Add python3 compatibility

2023-10-12 Thread Javi Merino
On Tue, Oct 10, 2023 at 05:27:03PM +0200, Marek Marczykowski-Górecki wrote:
> On Tue, Oct 10, 2023 at 03:18:45PM +0100, Javi Merino wrote:
> > Most of the work for python3 compatibility was done in
> > 1430c5a8cad4 (tools/python: Python 3 compatibility, 2019-12-18).  This
> > just adds a few builtins needed for python3.
> > 
> > Resolves: xen-project/xen#156
> > 
> > Signed-off-by: Javi Merino 
> > ---
> > 
> > I haven't tested it.
> > 
> >  README | 1 +
> >  tools/python/scripts/convert-legacy-stream | 3 +++
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/README b/README
> > index 855db01d36..44ed88c392 100644
> > --- a/README
> > +++ b/README
> > @@ -51,6 +51,7 @@ provided by your OS distributor:
> >  * POSIX compatible awk
> >  * Development install of zlib (e.g., zlib-dev)
> >  * Development install of Python 2.7 or later (e.g., python-dev)
> > +  - If using Python 2.7, you also need the future module.  This is not needed for Python 3.
> >  * Development install of curses (e.g., libncurses-dev)
> >  * Development install of openssl (e.g., openssl-dev)
> >  * Development install of x11 (e.g. xorg-x11-dev)
> > diff --git a/tools/python/scripts/convert-legacy-stream 
> > b/tools/python/scripts/convert-legacy-stream
> > index 7fe375a668..26a66c50fc 100755
> > --- a/tools/python/scripts/convert-legacy-stream
> > +++ b/tools/python/scripts/convert-legacy-stream
> > @@ -8,6 +8,9 @@ Convert a legacy migration stream to a v2 stream.
> >  from __future__ import print_function
> >  from __future__ import division
> >  
> > +from builtins import zip
> > +from builtins import range
> > +from builtins import object
> 
> In which python version it's needed? The thing about builtins is they
> are available without explicit import.

In python3, this change is a noop.  In python2, it makes zip,
range and object consistent with their python3 counterparts:

- zip: with the change, zip returns an iterator in python2,
  like it does in python3.  It's equivalent to izip from itertools or
  `from future_builtins import zip`[0]

- range: with the change, range in python2 behaves like xrange() used
  to: it returns a lazy object instead of building the full list.[1]

- object: with the change, in python2 the string representation of
  VM() is in unicode, as it is for python3.

[0] 
https://stackless.readthedocs.io/en/2.7-slp/library/future_builtins.html#future_builtins.zip
[1] https://docs.python.org/3/library/stdtypes.html#ranges
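A short sketch of those semantics (Python 3 shown; with the `future`
module installed, the same imports give Python 2 identical behaviour):

```python
from builtins import zip, range  # no-ops on Python 3

r = range(3)           # a lazy range object, not a materialised list
z = zip([1, 2], "ab")  # an iterator, not a list

assert not isinstance(r, list)          # py2's bare range() returns a list
assert list(z) == [(1, "a"), (2, "b")]  # consuming the iterator
assert next(z, None) is None            # exhausted after a single pass
```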

Cheers,
Javi



[linux-linus test] 183344: tolerable FAIL - PUSHED

2023-10-12 Thread osstest service owner
flight 183344 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183344/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 
183322
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 183322
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 183322
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 183322
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 183322
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 183322
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 183322
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183322
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 linux                1c8b86a3799f7e5be903c3f49fcdaee29fd385b5
baseline version:
 linux                94f6f0550c625fab1f373bb86a6669b45e9748b3

Last test of basis   183322  2023-10-09 01:11:16 Z3 days
Testing same since   183334  2023-10-10 19:11:59 Z1 days2 attempts


People who touched revisions under test:
  Alex Bee 
  Amadeusz Sławiński 
  Antoine Gennart 
  Anup Patel 
  Balamurugan C 
  Bard Liao 
  Ben Wolsieffer 
  Biju Das 
  Christos Skevis 
  Claudiu Beznea 
  Conor Dooley 
  Dmitry Baryshkov 
  Fabio Estevam 
  Geert Uytterhoeven 
  Herbert Xu 
  Juergen Gross 
  Kailang Yang 
  Konrad Dybcio 
  Krzysztof Kozlowski 
  Kuninori Morimoto 
  Lad Prabhakar 
  Linus Torvalds 
  Lorenzo Pieralisi 
  Marc Zyngier 
  Mark Brown 
  Mathias Krause 
  Matthias Reichl 
  Miquel Raynal 
  Neil Armstrong 
  Olaf Hering 
  Pierre-Louis Bossart 
  Saurabh Sengar 
  Shengjiu Wang 
  Shradha Gupta 
  Stefan Binding 
  Sumit Garg 
  Sven Frotscher 
  Takashi Iwai 
  Thomas Gleixner 
  Vijendar Mukunda 
  

Re: [XEN PATCH][for-4.19 v2 1/1] xen: introduce a deviation for Rule 11.9

2023-10-12 Thread Nicola Vetrini

On 11/10/2023 18:56, andrew.coop...@citrix.com wrote:

On 11/10/2023 8:46 pm, Nicola Vetrini wrote:

diff --git a/docs/misra/deviations.rst b/docs/misra/deviations.rst
index ee7aed0609d2..1b00e4e3e9b7 100644
--- a/docs/misra/deviations.rst
+++ b/docs/misra/deviations.rst
@@ -199,6 +199,11 @@ Deviations related to MISRA C:2012 Rules:
See automation/eclair_analysis/deviations.ecl for the full 
explanation.

  - Tagged as `safe` for ECLAIR.

+   * - R11.9
+ - __ACCESS_ONCE uses a 0 as a null pointer constant to check if a type is
+   scalar, therefore its usage for this purpose is allowed.
+ - Tagged as `deliberate` for ECLAIR.


Really?

#define __ACCESS_ONCE(x) \
    (void)(typeof(x))0; /* Scalar typecheck. */

That's not a pointer.

If someone were to pass in an x whose type was pointer-to-object, then
yes, it is technically a NULL pointer constant for long enough for the
build to error out.

What justification is Eclair using to suggest that it is a pointer?

If you really really want to shut things up, it doesn't need to be 0 -
it could be 1 or any other integer, but this honestly feels like a bug
in Eclair to me.

~Andrew


The non-compliant uses found by the checker were due to function
pointers, e.g.

void (*fp)(int i);

violation for rule MC3R1.R11.9: (required) The macro `NULL' shall be the 
only permitted form of integer null pointer constant. (untagged)
p.c:12.3-12.7: (MACRO) Loc #1 [culprit: non-compliant use of null 
pointer constant]

  A(fp) = NULL;
  <~~~>
p.c:6.8-6.19: for #1 [culprit: expanded from macro `_A']
 (void)(typeof(X))0; \
   <~~>
p.c:9.16-9.20: for #1 [culprit: expanded from macro `A']
#define A(X) (*_A(X))
   <~~~>

These uses do not cause a build fail, and we deemed this usage of 0 to 
be correct
(a neutral value that would allow __ACCESS_ONCE to support any type of 
argument).
While perhaps some other value does have the same property (e.g., 1),
we felt that it was okay to let 0 remain there.

--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Rust Xen Guest Agent

2023-10-12 Thread Yann Dirson
A new pre-release of our guest agent prototype written in Rust is
available, numbered 0.2.0 [1].  Identified issues and work to be done
are tracked in the Gitlab issue tracker [3].

More details can be found in this blog post [2].

As always, feedback will be greatly appreciated!


[1] https://gitlab.com/xen-project/xen-guest-agent/-/releases/0.2.0
[2] https://xcp-ng.org/blog/2023/10/12/updates-on-the-rust-guest-tools/
[3] https://gitlab.com/xen-project/xen-guest-agent/-/issues


Yann Dirson | Vates Platform Developer

XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech



[ovmf test] 183355: all pass - PUSHED

2023-10-12 Thread osstest service owner
flight 183355 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183355/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 326b9e1d815c4ae4b0b207fcb0bfa16864af5400
baseline version:
 ovmf eebd446875a4b1e4879e03764f63c6c358fd64f5

Last test of basis   183352  2023-10-12 05:42:31 Z0 days
Testing same since   183355  2023-10-12 09:13:57 Z0 days1 attempts


People who touched revisions under test:
  Yuanhao Xie 
  YuanhaoXie 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   eebd446875..326b9e1d81  326b9e1d815c4ae4b0b207fcb0bfa16864af5400 -> 
xen-tested-master



[xen-4.16-testing test] 183350: regressions - trouble: fail/pass/starved

2023-10-12 Thread osstest service owner
flight 183350 xen-4.16-testing real [real]
flight 183356 xen-4.16-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/183350/
http://logs.test-lab.xenproject.org/osstest/logs/183356/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-libvirt-pair 30 leak-check/check/src_host fail in 183341 REGR. 
vs. 183082

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemut-rhel6hvm-amd 7 xen-install fail in 183341 pass in 183350
 test-amd64-i386-qemuu-rhel6hvm-amd 7 xen-install fail in 183341 pass in 183350
 test-amd64-i386-xl-shadow 20 guest-localmigrate/x10 fail in 183341 pass in 
183350
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 18 
guest-start/debianhvm.repeat fail in 183341 pass in 183350
 test-amd64-i386-libvirt-pair 28 guest-migrate/dst_host/src_host/debian.repeat 
fail pass in 183341

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail in 183341 never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail in 183341 never 
pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail in 183341 never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail in 183341 never 
pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail in 183341 never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail in 183341 never 
pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail in 183341 never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail in 183341 never pass
 test-arm64-arm64-xl 15 migrate-support-check fail in 183341 never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail in 183341 never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail in 183341 never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail in 183341 never 
pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail in 183341 never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail in 183341 never 
pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail in 183341 never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail in 183341 never pass
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 183082
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 183082
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 183082
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 183082
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183082
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 183082
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183082
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 183082
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 183082
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 183082
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 183082
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 183082
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 

[ovmf test] 183352: all pass - PUSHED

2023-10-12 Thread osstest service owner
flight 183352 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183352/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf eebd446875a4b1e4879e03764f63c6c358fd64f5
baseline version:
 ovmf 95c9f470ca6eab23bfd016bd438f932d7263d395

Last test of basis   183339  2023-10-11 05:10:50 Z1 days
Testing same since   183352  2023-10-12 05:42:31 Z0 days1 attempts


People who touched revisions under test:
  Wenxing Hou 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   95c9f470ca..eebd446875  eebd446875a4b1e4879e03764f63c6c358fd64f5 -> 
xen-tested-master



Re: [Xen-devel] [PATCH] x86/sm{e, a}p: do not enable SMEP/SMAP in PV shim by default on AMD

2023-10-12 Thread Andrew Cooper
On 12/10/2023 4:21 pm, David Woodhouse wrote:
> On Thu, 2023-10-12 at 10:13 +0800, andrew.coop...@citrix.com wrote:
>> On 11/10/2023 7:34 pm, David Woodhouse wrote:
>>> But why does the shim even need to turn it off when switching to the
>>> guest context? Its guest isn't running in supervisor mode so surely it
>>> doesn't *matter* whether SMEP is enabled or not? Why not just leave it
>>> on at all times?
>>
>> 32bit PV kernels run in Ring1.  Which is supervisor and not user.
> 
> Ah, thanks.
> 
>> Some older PV kernels do execute on user pages, and don't like getting
>> SMEP faults when they didn't turn it on to begin with.
> 
> PV guests never actually had the option to turn SMEP on, did they? 
> 
> (Otherwise I may have to rethink the approach of just putting
> 'smep=off' onto the shim command line when running under KVM...)


Xen and PV guests share a set of pagetables.  There is no ability to
independently control SMEP/SMAP.

While we could in principle make SMEP a feature that PV kernels can opt
into, SMAP we really can't.  The emulation costs of STAC/CLAC are
obscene from a perf perspective.

One TODO which has yet to be done is to look at the PV kernel's
ELF32/64-ness.  For a shimmed 64bit guest, SMEP/SMAP should be on and
stay on.

TBH, it's probably best to just hide the SMEP/SMAP features, rather than
to play around with the cmdline.

~Andrew



Re: [Xen-devel] [PATCH] x86/sm{e, a}p: do not enable SMEP/SMAP in PV shim by default on AMD

2023-10-12 Thread David Woodhouse
On Thu, 2023-10-12 at 10:13 +0800, andrew.coop...@citrix.com wrote:
> On 11/10/2023 7:34 pm, David Woodhouse wrote:
> > But why does the shim even need to turn it off when switching to the
> > guest context? Its guest isn't running in supervisor mode so surely it
> > doesn't *matter* whether SMEP is enabled or not? Why not just leave it
> > on at all times?
> 
> 32bit PV kernels run in Ring1.  Which is supervisor and not user.

Ah, thanks.

> Some older PV kernels do execute on user pages, and don't like getting
> SMEP faults when they didn't turn it on to begin with.

PV guests never actually had the option to turn SMEP on, did they? 

(Otherwise I may have to rethink the approach of just putting
'smep=off' onto the shim command line when running under KVM...)





Re: [PATCH mm-unstable v9 14/31] s390: Convert various pgalloc functions to use ptdescs

2023-10-12 Thread Heiko Carstens
On Mon, Aug 07, 2023 at 04:04:56PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Acked-by: Mike Rapoport (IBM) 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  arch/s390/include/asm/pgalloc.h |   4 +-
>  arch/s390/include/asm/tlb.h |   4 +-
>  arch/s390/mm/pgalloc.c  | 128 
>  3 files changed, 69 insertions(+), 67 deletions(-)
...
> diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
> index d7374add7820..07fc660a24aa 100644
> --- a/arch/s390/mm/pgalloc.c
> +++ b/arch/s390/mm/pgalloc.c
...
> @@ -488,16 +486,20 @@ static void base_pgt_free(unsigned long *table)
>  static unsigned long *base_crst_alloc(unsigned long val)
>  {
>   unsigned long *table;
> + struct ptdesc *ptdesc;
>  
> - table = (unsigned long *)__get_free_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
> - if (table)
> - crst_table_init(table, val);
> + ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, CRST_ALLOC_ORDER);

I guess I must miss something, but what is the reason to mask out
__GFP_HIGHMEM here? It is not part of GFP_KERNEL, nor does s390 support
HIGHMEM.