Re: [PATCH v5 2/5] xen/ppc: Implement bitops.h

2023-09-14 Thread Jan Beulich
On 14.09.2023 20:15, Shawn Anastasio wrote:
> On 9/13/23 2:29 AM, Jan Beulich wrote:
>> On 12.09.2023 20:35, Shawn Anastasio wrote:
>>> --- a/xen/arch/ppc/include/asm/bitops.h
>>> +++ b/xen/arch/ppc/include/asm/bitops.h
>>> @@ -1,9 +1,335 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>>> +/*
>>> + * Adapted from Linux's arch/powerpc/include/asm/bitops.h.
>>> + *
>>> + * Merged version by David Gibson .
>>> + * Based on ppc64 versions by: Dave Engebretsen, Todd Inglett, Don
>>> + * Reed, Pat McCarthy, Peter Bergner, Anton Blanchard.  They
>>> + * originally took it from the ppc32 code.
>>> + */
>>>  #ifndef _ASM_PPC_BITOPS_H
>>>  #define _ASM_PPC_BITOPS_H
>>>
>>> +#include 
>>> +
>>> +#define __set_bit(n, p) set_bit(n, p)
>>> +#define __clear_bit(n, p)   clear_bit(n, p)
>>> +
>>> +#define BITOP_BITS_PER_WORD 32
>>> +#define BITOP_MASK(nr)  (1U << ((nr) % BITOP_BITS_PER_WORD))
>>> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
>>> +#define BITS_PER_BYTE   8
>>> +
>>>  /* PPC bit number conversion */
>>> -#define PPC_BITLSHIFT(be)  (BITS_PER_LONG - 1 - (be))
>>> -#define PPC_BIT(bit)   (1UL << PPC_BITLSHIFT(bit))
>>> -#define PPC_BITMASK(bs, be)((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
>>> +#define PPC_BITLSHIFT(be)(BITS_PER_LONG - 1 - (be))
>>> +#define PPC_BIT(bit) (1UL << PPC_BITLSHIFT(bit))
>>> +#define PPC_BITMASK(bs, be)  ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
>>> +
>>> +/* Macro for generating the ***_bits() functions */
>>> +#define DEFINE_BITOP(fn, op, prefix)                                   \
>>> +static inline void fn(unsigned int mask,                               \
>>> +                      volatile unsigned int *p_)                       \
>>> +{                                                                      \
>>> +    unsigned int old;                                                  \
>>> +    unsigned int *p = (unsigned int *)p_;                              \
>>
>> What use is this, when ...
>>
>>> +    asm volatile ( prefix                                              \
>>> +                   "1: lwarx %0,0,%3,0\n"                              \
>>> +                   #op "%I2 %0,%0,%2\n"                                \
>>> +                   "stwcx. %0,0,%3\n"                                  \
>>> +                   "bne- 1b\n"                                         \
>>> +                   : "=&r" (old), "+m" (*p)                            \
>>
>> ... the "+m" operand isn't used and ...
>>
>>> +   : "rK" (mask), "r" (p)  
>>> \
>>> +   : "cc", "memory" ); 
>>> \
>>
>> ... there's a memory clobber anyway?
>>
> 
> I see what you're saying, and I'm not sure why it was written this way
> in Linux. That said, this is the kind of thing that I'm hesitant to
> change without knowing the rationale of the original author. If you are
> confident that this can be dropped given that there is already a
> memory clobber, I could drop it. Otherwise I'm inclined to leave it in a
> state that matches Linux.

This being an arch-independent property, I am confident. Yet still you're
the maintainer, so if you want to keep it like this initially, that'll be
okay. If I feel bothered enough, I could always send a patch afterwards.

Jan
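
For reference, the simplification under discussion (dropping the unused "+m"
operand and the now-pointless cast, and relying on the "memory" clobber alone)
would look roughly like this; an illustrative sketch with the asm operands
renumbered accordingly, not the committed code:

/* Sketch only: "memory" clobber already orders the access, so the "+m"
 * output and the de-volatile cast of p_ are dropped. */
#define DEFINE_BITOP(fn, op, prefix)                                   \
static inline void fn(unsigned int mask, volatile unsigned int *p)     \
{                                                                      \
    unsigned int old;                                                  \
                                                                       \
    asm volatile ( prefix                                              \
                   "1: lwarx %0,0,%2,0\n"                              \
                   #op "%I1 %0,%0,%1\n"                                \
                   "stwcx. %0,0,%2\n"                                  \
                   "bne- 1b\n"                                         \
                   : "=&r" (old)                                       \
                   : "rK" (mask), "r" (p)                              \
                   : "cc", "memory" );                                 \
}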



Re: [PATCH v1 16/29] xen/asm-generic: introduce stub header flushtlb.h

2023-09-14 Thread Jiamei Xie

Hi Oleksii

On 2023/9/14 22:56, Oleksii Kurochko wrote:

The patch introduces a header stub necessary for a full Xen build.

Signed-off-by: Oleksii Kurochko 
---
  xen/include/asm-generic/flushtlb.h | 42 ++
  1 file changed, 42 insertions(+)
  create mode 100644 xen/include/asm-generic/flushtlb.h

diff --git a/xen/include/asm-generic/flushtlb.h 
b/xen/include/asm-generic/flushtlb.h
new file mode 100644
index 00..79e4773179
--- /dev/null
+++ b/xen/include/asm-generic/flushtlb.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_FLUSHTLB_H__
+#define __ASM_GENERIC_FLUSHTLB_H__
+
+#include 
+
+/*
+ * Filter the given set of CPUs, removing those that definitely flushed their
+ * TLB since @page_timestamp.
+ */
+/* XXX lazy implementation just doesn't clear anything */
+static inline void tlbflush_filter(cpumask_t *mask, uint32_t page_timestamp) {}
+
+#define tlbflush_current_time() (0)
+
+static inline void page_set_tlbflush_timestamp(struct page_info *page)
+{
+BUG();
+}
+
+/* Flush specified CPUs' TLBs */
+void arch_flush_tlb_mask(const cpumask_t *mask);
+
+#endif /* __ASM_GENERIC_FLUSHTLB_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
+

It's duplicated.

+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */




[linux-linus test] 183004: regressions - FAIL

2023-09-14 Thread osstest service owner
flight 183004 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183004/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 8 xen-boot fail REGR. 
vs. 182531
 test-amd64-coresched-amd64-xl  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-libvirt-pair 12 xen-boot/src_host   fail REGR. vs. 182531
 test-amd64-amd64-libvirt-pair 13 xen-boot/dst_host   fail REGR. vs. 182531
 test-amd64-amd64-xl-xsm   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-raw  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-qcow2  8 xen-boot   fail REGR. vs. 182531
 test-amd64-amd64-xl-vhd   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvhv2-intel  8 xen-boot  fail REGR. vs. 182531
 test-amd64-amd64-xl-pvshim8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-freebsd12-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-ws16-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-debianhvm-amd64  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-libvirt-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-win7-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-win7-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start   fail REGR. vs. 182531
 test-amd64-amd64-xl-shadow8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-xl-credit1   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine-uefi  8 reboot  fail REGR. vs. 182531
 test-amd64-amd64-pair12 xen-boot/src_hostfail REGR. vs. 182531
 test-amd64-amd64-pair13 xen-boot/dst_hostfail REGR. vs. 182531
 test-amd64-amd64-xl-multivcpu  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-qemuu-nested-intel  8 xen-boot  fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-ws16-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvhv2-amd  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-ovmf-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-qemuu-nested-amd  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 182531
 test-amd64-amd64-libvirt  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-freebsd11-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-pygrub   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 8 xen-boot fail REGR. vs. 
182531
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 8 xen-boot fail REGR. 
vs. 182531
 test-amd64-amd64-xl-credit2   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine-bios  8 reboot  fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine  8 reboot   fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 8 xen-boot fail REGR. vs. 
182531

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  8 xen-boot fail REGR. vs. 182531

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 182531
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 182531
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 182531
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-x

[xen-unstable-smoke test] 183006: tolerable all pass - PUSHED

2023-09-14 Thread osstest service owner
flight 183006 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183006/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  b5a601093d1f9d5e96eb74b692f1b6252a2598a2
baseline version:
 xen  6aa25c32180ab59081c73bae4c568367d9133a1f

Last test of basis   182991  2023-09-13 08:00:30 Z1 days
Testing same since   183006  2023-09-15 02:00:28 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   6aa25c3218..b5a601093d  b5a601093d1f9d5e96eb74b692f1b6252a2598a2 -> smoke



Re: [PATCH v10 02/38] x86/opcode: Add the WRMSRNS instruction to the x86 opcode map

2023-09-14 Thread Google
On Wed, 13 Sep 2023 21:47:29 -0700
Xin Li  wrote:

> Add the opcode used by WRMSRNS, which is the non-serializing version of
> WRMSR and may replace it to improve performance, to the x86 opcode map.
> 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 

This looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thanks,

> ---
>  arch/x86/lib/x86-opcode-map.txt   | 2 +-
>  tools/arch/x86/lib/x86-opcode-map.txt | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> index 5168ee0360b2..1efe1d9bf5ce 100644
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -1051,7 +1051,7 @@ GrpTable: Grp6
>  EndTable
>  
>  GrpTable: Grp7
> -0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME 
> (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B)
> +0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME 
> (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | 
> WRMSRNS (110),(11B)
>  1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | 
> STAC (011),(11B) | ENCLS (111),(11B)
>  2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | 
> XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
>  3: LIDT Ms
> diff --git a/tools/arch/x86/lib/x86-opcode-map.txt 
> b/tools/arch/x86/lib/x86-opcode-map.txt
> index 5168ee0360b2..1efe1d9bf5ce 100644
> --- a/tools/arch/x86/lib/x86-opcode-map.txt
> +++ b/tools/arch/x86/lib/x86-opcode-map.txt
> @@ -1051,7 +1051,7 @@ GrpTable: Grp6
>  EndTable
>  
>  GrpTable: Grp7
> -0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME 
> (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B)
> +0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME 
> (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | 
> WRMSRNS (110),(11B)
>  1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | 
> STAC (011),(11B) | ENCLS (111),(11B)
>  2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | 
> XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
>  3: LIDT Ms
> -- 
> 2.34.1
> 


-- 
Masami Hiramatsu (Google) 
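
For readers cross-checking the table entry above: Grp7 covers the 0f 01 opcode,
and "WRMSRNS (110),(11B)" in the reg=0 row means ModRM.rm=110 with mod=11b,
i.e. ModRM byte 0xc6. A reference sketch of the resulting raw encoding (not
part of the patch; it matches the bytes used for WRMSRNS elsewhere in this
digest):

	/* WRMSRNS = 0f 01 c6, spelled out for assemblers without the mnemonic */
	.byte 0x0f, 0x01, 0xc6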



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread Juergen Gross

On 15.09.23 03:16, andrew.coop...@citrix.com wrote:

On 15/09/2023 2:01 am, H. Peter Anvin wrote:

The whole bit with alternatives and pvops being separate is a major
maintainability problem, and honestly it never made any sense in the
first place. Never have two mechanisms to do one job; it makes it
harder to grok their interactions.


This bit is easy.

Juergen has already done the work to delete one of these two patching
mechanisms and replace it with the other.

https://lore.kernel.org/lkml/a32e211f-4add-4fb2-9e5a-480ae9b9b...@suse.com/

Unfortunately, it's only collecting pings and tumbleweeds.


Indeed.

Unfortunately there is probably some objtool support needed for that, which I'm
not sure how to implement.


Juergen




Re: [PATCH v10 08/38] x86/cpufeatures: Add the cpu feature bit for FRED

2023-09-14 Thread Juergen Gross

On 15.09.23 03:07, Thomas Gleixner wrote:

On Thu, Sep 14 2023 at 14:15, andrew wrote:

PV guests are never going to see FRED (or LKGS for that matter) because
it advertises too much stuff which simply traps because the kernel is in
CPL3.

That said, the 64bit PV ABI is a whole lot closer to FRED than it is to
IDT delivery.  (Almost as if we decided 15 years ago that giving the PV
guest kernel a good stack and GSbase was the right thing to do...)


No argument about that.


In some copious free time, I think we ought to provide a
minorly-paravirt FRED to PV guests because there are still some
improvements available as low hanging fruit.

My plan was to have a PV hypervisor leaf advertising paravirt versions
of hardware features, so a guest could see "I don't have architectural
FRED, but I do have paravirt-FRED which is as similar as we can
reasonably make it".  The same goes for a whole bunch of other features.


*GROAN*

I told you before that we want less paravirt nonsense and not more.


I agree.

We will still have to support the PV stuff for non-FRED hypervisors even with
pv-FRED being available on new Xen. So adding pv-FRED would just add more PV
interfaces without the ability to remove old stuff.


Juergen





Re: [PATCH] xen/efi: refactor deprecated strncpy

2023-09-14 Thread Kees Cook
On Mon, Sep 11, 2023 at 06:59:31PM +, Justin Stitt wrote:
> `strncpy` is deprecated for use on NUL-terminated destination strings [1].
> 
> `efi_loader_signature` has space for 4 bytes. We are copying "Xen" (3 bytes)
> plus a NUL-byte which makes 4 total bytes. With that being said, there is
> currently not a bug with the current `strncpy()` implementation in terms of
> buffer overreads but we should favor a more robust string interface
> either way.

Yeah, this will work. Since this is a u32 destination, I do wonder if
strtomem_pad() would be better since we're not really writing a string?
But since this is all hard-coded, it doesn't matter. :)

Reviewed-by: Kees Cook 

-Kees

> 
> A suitable replacement is `strscpy` [2] due to the fact that it guarantees
> NUL-termination on the destination buffer while being functionally the
> same in this case.
> 
> Link: 
> https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
> Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html 
> [2]
> Link: https://github.com/KSPP/linux/issues/90
> Cc: linux-harden...@vger.kernel.org
> Cc: Kees Cook 
> Signed-off-by: Justin Stitt 
> ---
> Note: build-tested
> ---
>  arch/x86/xen/efi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/xen/efi.c b/arch/x86/xen/efi.c
> index 863d0d6b3edc..7250d0e0e1a9 100644
> --- a/arch/x86/xen/efi.c
> +++ b/arch/x86/xen/efi.c
> @@ -138,7 +138,7 @@ void __init xen_efi_init(struct boot_params *boot_params)
>   if (efi_systab_xen == NULL)
>   return;
>  
> - strncpy((char *)&boot_params->efi_info.efi_loader_signature, "Xen",
> + strscpy((char *)&boot_params->efi_info.efi_loader_signature, "Xen",
>   sizeof(boot_params->efi_info.efi_loader_signature));
>   boot_params->efi_info.efi_systab = (__u32)__pa(efi_systab_xen);
>   boot_params->efi_info.efi_systab_hi = (__u32)(__pa(efi_systab_xen) >> 
> 32);
> 
> ---
> base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
> change-id: 20230911-strncpy-arch-x86-xen-efi-c-14292f5a79ee
> 
> Best regards,
> --
> Justin Stitt 
> 

-- 
Kees Cook
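
As background for the swap above, a rough sketch of how the two helpers behave
on a 4-byte destination (illustrative only; "sig" is a made-up local, not code
from the patch):

	char sig[4];

	strncpy(sig, "Xen", sizeof(sig));  /* copies "Xen" plus a NUL here, but would
	                                      leave sig unterminated if the source
	                                      filled the buffer */
	strscpy(sig, "Xen", sizeof(sig));  /* returns 3; always NUL-terminates the
	                                      destination, returns -E2BIG on truncation */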



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread H. Peter Anvin

On 9/14/23 18:46, andrew.coop...@citrix.com wrote:

On 15/09/2023 1:38 am, H. Peter Anvin wrote:

On 9/14/23 17:33, andrew.coop...@citrix.com wrote:


It's an assumption about what "definitely won't" be paravirt in the
future.

XenPV stack handling is almost-FRED-like and has been for the better
part of two decades.

You frequently complain that there's too much black magic holding XenPV
together.  A paravirt-FRED will reduce the differences vs native
substantially.



Call it "paravirtualized exception handling." In that sense, the
refactoring of the exception handling to benefit FRED is definitely
useful for reducing paravirtualization. The FRED-specific code is
largely trivial, and presumably what you would do is to replace the
FRED wrapper with a Xen wrapper and call the common handler routines.


Why do only half the job?

There's no need for any Xen wrappers at all when XenPV can use the
native FRED paths, as long as ERETU, ERETS and the relevant MSRs can be
paravirt (sure - with an interface that sucks less than right now) so
they're not taking the #GP/emulate in Xen path.

And this can work on all hardware with a slightly-future version of Xen
and Linux, because it's just a minor adjustment to how Xen writes the
exception frame on the guests stack as part of event delivery.



It's not about doing "half the job", it's about using the proper 
abstraction mechanism. By all means, if you can join the common code 
flow earlier, do so, but paravirtualizing the entry/exit stubs which is 
the *only* place ERETU and ERETS show up is just crazy.


Similarly, nearly all the MSRs are just configuration setup. The only 
ones which have any kind of performance relevance is the stack setup (RSP0).


-hpa




Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread andrew . cooper3
On 15/09/2023 1:38 am, H. Peter Anvin wrote:
> On 9/14/23 17:33, andrew.coop...@citrix.com wrote:
>>
>> It's an assumption about what "definitely won't" be paravirt in the
>> future.
>>
>> XenPV stack handling is almost-FRED-like and has been for the better
>> part of two decades.
>>
>> You frequently complain that there's too much black magic holding XenPV
>> together.  A paravirt-FRED will reduce the differences vs native
>> substantially.
>>
>
> Call it "paravirtualized exception handling." In that sense, the
> refactoring of the exception handling to benefit FRED is definitely
> useful for reducing paravirtualization. The FRED-specific code is
> largely trivial, and presumably what you would do is to replace the
> FRED wrapper with a Xen wrapper and call the common handler routines.

Why do only half the job?

There's no need for any Xen wrappers at all when XenPV can use the
native FRED paths, as long as ERETU, ERETS and the relevant MSRs can be
paravirt (sure - with an interface that sucks less than right now) so
they're not taking the #GP/emulate in Xen path.

And this can work on all hardware with a slightly-future version of Xen
and Linux, because it's just a minor adjustment to how Xen writes the
exception frame on the guests stack as part of event delivery.

~Andrew



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread andrew . cooper3
On 15/09/2023 2:01 am, H. Peter Anvin wrote:
> The whole bit with alternatives and pvops being separate is a major
> maintainability problem, and honestly it never made any sense in the
> first place. Never have two mechanisms to do one job; it makes it
> harder to grok their interactions.

This bit is easy.

Juergen has already done the work to delete one of these two patching
mechanisms and replace it with the other.

https://lore.kernel.org/lkml/a32e211f-4add-4fb2-9e5a-480ae9b9b...@suse.com/

Unfortunately, it's only collecting pings and tumbleweeds.

~Andrew



Re: [PATCH v10 08/38] x86/cpufeatures: Add the cpu feature bit for FRED

2023-09-14 Thread Thomas Gleixner
On Thu, Sep 14 2023 at 14:15, andrew wrote:
> PV guests are never going to see FRED (or LKGS for that matter) because
> it advertises too much stuff which simply traps because the kernel is in
> CPL3.
>
> That said, the 64bit PV ABI is a whole lot closer to FRED than it is to
> IDT delivery.  (Almost as if we decided 15 years ago that giving the PV
> guest kernel a good stack and GSbase was the right thing to do...)

No argument about that.

> In some copious free time, I think we ought to provide a
> minorly-paravirt FRED to PV guests because there are still some
> improvements available as low hanging fruit.
>
> My plan was to have a PV hypervisor leaf advertising paravirt versions
> of hardware features, so a guest could see "I don't have architectural
> FRED, but I do have paravirt-FRED which is as similar as we can
> reasonably make it".  The same goes for a whole bunch of other features.

*GROAN*

I told you before that we want less paravirt nonsense and not more. I'm
serious about that. XENPV CPL3 virtualization is a dead horse from a
technical POV. No point in wasting brain cycles to enhance the zombie
unless you can get rid of the existing PV nonsense, which you can't for
obvious reasons.

That said, we can debate this once the more fundamental issues of
XEN[PV] have been addressed. I expect that to happen quite some time
after I retired :)

Thanks,

tglx



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread H. Peter Anvin

WRMSR has one complexity that most other PV-ops don't, and that's the
exception table reference for the instruction itself.

In a theoretical future ought to look like:

mov     $msr, %ecx
mov     $lo, %eax
mov     $hi, %edx
1: {call paravirt_blah(%rip) | cs...cs wrmsr | cs...cs wrmsrns }
_ASM_EXTABLE(1b, ...)

In paravirt builds, the CALL needs to be the emitted form, because it
needs to function in very early boot.

But once the paravirt-ness has been chosen and alternatives run, the
as-native paths are fully inline.

The alternative which processes this site wants to conclude that, in the
case it does not alter from the CALL, to clobber the EXTABLE reference. 
CALL instructions can #GP, and you don't want to end up thinking you're
handling a WRMSR #GP when in fact it was a non-canonical function pointer.



On 9/14/23 17:36, andrew.coop...@citrix.com wrote:
> On 15/09/2023 1:07 am, H. Peter Anvin wrote:
>> Is *that* your concern?! Just put a NOP before WRMSR – you need
>> padding NOP bytes anyway – and the extable entry is no longer at the
>> same address. Problem solved.
>>
>> Either that, or use a direct call, which can't #GP in the address
>> range used by the kernel.
>
> For non-paravirt builds, I really hope the inlining DoesTheRightThing.
> If it doesn't, let's fix it to do so.
>
> For paravirt builds, the emitted form must be the indirect call so it
> can function in boot prior to alternatives running [1].

No, it doesn't. You always have the option of a direct call to an
indirect JMP. This is in fact exactly what userspace does in the form of
the PLT.
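
As an aside, the direct-call-to-an-indirect-JMP shape referred to here could
look roughly like the following sketch (the label and symbol names are made
up, not taken from the series):

1:      call    pv_wrmsr_stub            # direct CALL: the target is a fixed
                                         # kernel address, so the CALL itself
                                         # cannot #GP the way an indirect call
                                         # through a bad pointer can
        ...

pv_wrmsr_stub:
        jmp     *pv_ops_write_msr(%rip)  # the indirection lives here, PLT-style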


> So you still need some way of putting the EXTABLE reference at the
> emitted site, not in the .altintr_replacement section where the
> WRMSR{,NS} instruction lives.  This needs to be at build time because
> the EXTABLE references aren't shuffled at runtime.
>
> How else do you propose getting an extable reference to midway through
> an instruction on the "wrong" part of an alternative?
Well, obviously there has to be a magic inline at the patch site. It 
ends up looking something like this:


asm volatile("1:"
 ALTERNATIVE_2("call pv_wrmsr(%%rip)",
"nop; wrmsr", X86_FEATURE_NATIVE_MSR,
"nop; wrmsrns", X86_FEATURE_WRMSRNS)
 "2:"
 _ASM_EXTABLE_TYPE(1b+1, 2b, EX_TYPE_WRMSR)
 : : "c" (msr), "a" (low), "d" (high) : "memory");


[one can argue whether or not WRMSRNS specifically should require 
"memory" or not.]


The whole bit with alternatives and pvops being separate is a major 
maintainability problem, and honestly it never made any sense in the 
first place. Never have two mechanisms to do one job; it makes it harder 
to grok their interactions.


As an alternative to the NOP, the EX_TYPE_*MSR* handlers could simply 
look for a CALL opcode at the origin.


-hpa



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread Thomas Gleixner
On Fri, Sep 15 2023 at 00:46, andrew wrote:
> On 15/09/2023 12:00 am, Thomas Gleixner wrote:
> What I meant was "there should be the two top-level APIs, and under the
> covers they DTRT".  Part of doing the right thing is to wire up paravirt
> for configs where that is specified.
>
> Anything else is going to force people to write logic of the form:
>
>     if (WRMSRNS && !XENPV)
>     wrmsr_ns(...)
>     else
>         wrmsr(...)
>
> which is going to be worse overall.

I already agreed with that for generic code which might be affected by
PV. But code which is explicitely depending on something which never can
be affected by PV _and_ is in a performance sensitive code path really
wants to be able to use the native variant explicitely.

> And there really is one example of this antipattern already in the
> series.

No. There is no antipattern in this series. The only place which uses
wrmsrns() is:

@@ -70,9 +70,13 @@ static inline void update_task_stack(str
 #ifdef CONFIG_X86_32
        this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
 #else
-       /* Xen PV enters the kernel on the thread stack. */
-       if (cpu_feature_enabled(X86_FEATURE_XENPV))
+       if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+               /* WRMSRNS is a baseline feature for FRED. */
+               wrmsrns(MSR_IA32_FRED_RSP0, (unsigned long)task_stack_page(task) + THREAD_SIZE);
+       } else if (cpu_feature_enabled(X86_FEATURE_XENPV)) {
+               /* Xen PV enters the kernel on the thread stack. */
                load_sp0(task_top_of_stack(task));
+       }
 #endif
 }

The XENPV condition exists already today and is required independent of
FRED, no?

I deliberately distinguished #1 and #3 on my proposed todo list exactly
because the above use case really wants #1 without the extra bells and
whistles of a generic PV patchable wrmsr_ns() variant. Why?

  No matter how clever the enhanced PV implementation might be, it is
  guaranteed to generate worse code than the straight forward native
  inline assembly. Simply because it has to prevent the compiler from
  being overly clever on optimizations as it obviously mandates wider
  register restrictions, while the pure native variant (independent of
  the availability of X86_FEATURE_WRMSRNS) only mandates the requirements
  of WRMSR[NS], but not the extra register indirection of the call ABI.

I'm not debating that any other code pattern like you pointed out in
some generic code would be horrible, but I'm not buying your strawman
related to this particular usage site.

Thanks,

tglx



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread H. Peter Anvin

On 9/14/23 17:33, andrew.coop...@citrix.com wrote:


It's an assumption about what "definitely won't" be paravirt in the future.

XenPV stack handling is almost-FRED-like and has been for the better
part of two decades.

You frequently complain that there's too much black magic holding XenPV
together.  A paravirt-FRED will reduce the differences vs native
substantially.



Call it "paravirtualized exception handling." In that sense, the 
refactoring of the exception handling to benefit FRED is definitely 
useful for reducing paravirtualization. The FRED-specific code is 
largely trivial, and presumably what you would do is to replace the FRED 
wrapper with a Xen wrapper and call the common handler routines.


-hpa




Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread andrew . cooper3
On 15/09/2023 1:12 am, Thomas Gleixner wrote:
> On Fri, Sep 15 2023 at 00:46, andrew wrote:
>> On 15/09/2023 12:00 am, Thomas Gleixner wrote:
>>> So no. I'm fundamentally disagreeing with your recommendation. The way
>>> forward is:
>>>
>>>   1) Provide the native variant for wrmsrns(), i.e. rename the proposed
>>>  wrmsrns() to native_wrmsr_ns() and have the X86_FEATURE_WRMSRNS
>>>  safety net as you pointed out.
>>>
>>>  That function can be used in code which is guaranteed to be not
>>>  affected by the PV_XXL madness.
>>>
>>>   2) Come up with a sensible solution for the PV_XXL horrorshow
>>>
>>>   3) Implement a sane general variant of wrmsr_ns() which handles
>>>  both X86_FEATURE_WRMSRNS and X86_MISFEATURE_PV_XXL
>>>
>>>   4) Convert other code which benefits from the non-serializing variant
>>>  to wrmsr_ns()
>> Well - point 1 is mostly work that needs reverting as part of completing
>> point 3, and point 2 clearly needs doing irrespective of anything else.
> No. #1 has a value on its own independent of the general variant in #3.
>
>>>  That function can be used in code which is guaranteed to be not
>>>  affected by the PV_XXL madness.
> That makes a lot of sense because it's guaranteed to generate better
> code than whatever we come up with for the PV_XXL nonsense.

It's an assumption about what "definitely won't" be paravirt in the future.

XenPV stack handling is almost-FRED-like and has been for the better
part of two decades.

You frequently complain that there's too much black magic holding XenPV
together.  A paravirt-FRED will reduce the differences vs native
substantially.

~Andrew



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread Thomas Gleixner
On Fri, Sep 15 2023 at 00:46, andrew wrote:
> On 15/09/2023 12:00 am, Thomas Gleixner wrote:
>> So no. I'm fundamentally disagreeing with your recommendation. The way
>> forward is:
>>
>>   1) Provide the native variant for wrmsrns(), i.e. rename the proposed
>>  wrmsrns() to native_wrmsr_ns() and have the X86_FEATURE_WRMSRNS
>>  safety net as you pointed out.
>>
>>  That function can be used in code which is guaranteed to be not
>>  affected by the PV_XXL madness.
>>
>>   2) Come up with a sensible solution for the PV_XXL horrorshow
>>
>>   3) Implement a sane general variant of wrmsr_ns() which handles
>>  both X86_FEATURE_WRMSRNS and X86_MISFEATURE_PV_XXL
>>
>>   4) Convert other code which benefits from the non-serializing variant
>>  to wrmsr_ns()
>
> Well - point 1 is mostly work that needs reverting as part of completing
> point 3, and point 2 clearly needs doing irrespective of anything else.

No. #1 has a value on its own independent of the general variant in #3.

>>  That function can be used in code which is guaranteed to be not
>>  affected by the PV_XXL madness.

That makes a lot of sense because it's guaranteed to generate better
code than whatever we come up with for the PV_XXL nonsense.

Thanks,

tglx



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread andrew . cooper3
On 15/09/2023 12:00 am, Thomas Gleixner wrote:
>> and despite what Juergen said, I'm going to recommend that you do wire
>> this through the paravirt infrastructure, for the benefit of regular
>> users having a nice API, not because XenPV is expecting to do something
>> wildly different here.
> I fundamentaly hate adding this to the PV infrastructure. We don't want
> more PV ops, quite the contrary.

What I meant was "there should be the two top-level APIs, and under the
covers they DTRT".  Part of doing the right thing is to wire up paravirt
for configs where that is specified.

Anything else is going to force people to write logic of the form:

    if (WRMSRNS && !XENPV)
    wrmsr_ns(...)
    else
        wrmsr(...)

which is going to be worse overall.  And there really is one example of
this antipattern already in the series.


> For the initial use case at hand, there is an explicit FRED dependency
> and the code in question really wants to use WRMSRNS directly and not
> through a PV function call.
>
> I agree with your reasoning for the more generic use case where we can
> gain performance independent of FRED by using WRMSRNS for cases where
> the write has no serialization requirements.
>
> But this made me look into PV ops some more. For actual performance
> relevant code the current PV ops mechanics are a horrorshow when the op
> defaults to the native instruction.
>
> Let's look at wrmsrl():
>
> wrmsrl(msr, val
>  wrmsr(msr, (u32)val, (u32)val >> 32))
>   paravirt_write_msr(msr, low, high)
> PVOP_VCALL3(cpu.write_msr, msr, low, high)
>
> Which results in
>
>   mov $msr, %edi
>   mov $val, %rdx
>   mov %edx, %esi
>   shr $0x20, %rdx
>   call    native_write_msr
>
> and native_write_msr() does at minimum:
>
>   mov     %edi,%ecx
>   mov     %esi,%eax
>   wrmsr
>   ret

Yeah, this is daft.  But it can also be fixed irrespective of WRMSRNS.

WRMSR has one complexity that most other PV-ops don't, and that's the
exception table reference for the instruction itself.

In a theoretical future ought to look like:

    mov    $msr, %ecx
    mov    $lo, %eax
    mov    $hi, %edx
    1: {call paravirt_blah(%rip) | cs...cs wrmsr | cs...cs wrmsrns }
    _ASM_EXTABLE(1b, ...)

In paravirt builds, the CALL needs to be the emitted form, because it
needs to function in very early boot.

But once the paravirt-ness has been chosen and alternatives run, the
as-native paths are fully inline.

The alternative which processes this site wants to conclude that, in the
case it does not alter from the CALL, to clobber the EXTABLE reference. 
CALL instructions can #GP, and you don't want to end up thinking you're
handling a WRMSR #GP when in fact it was a non-canonical function pointer.

> In the worst case 'ret' is going through the return thunk. Not to talk
> about function prologues and whatever.
>
> This becomes even more silly for trivial instructions like STI/CLI or in
> the worst case paravirt_nop().

STI/CLI are already magic.  Are they not inlined?

> The call only makes sense when the native default is an actual
> function, but for the trivial cases it's a blatant engineering
> trainwreck.
>
> I wouldn't care at all if CONFIG_PARAVIRT_XXL would be the esoteric use
> case, but AFAICT it's default enabled on all major distros.
>
> So no. I'm fundamentally disagreeing with your recommendation. The way
> forward is:
>
>   1) Provide the native variant for wrmsrns(), i.e. rename the proposed
>  wrmsrns() to native_wrmsr_ns() and have the X86_FEATURE_WRMSRNS
>  safety net as you pointed out.
>
>  That function can be used in code which is guaranteed to be not
>  affected by the PV_XXL madness.
>
>   2) Come up with a sensible solution for the PV_XXL horrorshow
>
>   3) Implement a sane general variant of wrmsr_ns() which handles
>  both X86_FEATURE_WRMSRNS and X86_MISFEATURE_PV_XXL
>
>   4) Convert other code which benefits from the non-serializing variant
>  to wrmsr_ns()

Well - point 1 is mostly work that needs reverting as part of completing
point 3, and point 2 clearly needs doing irrespective of anything else.

Thanks,

~Andrew



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread H. Peter Anvin
On September 14, 2023 4:00:39 PM PDT, Thomas Gleixner  
wrote:
>Andrew!
>
>On Thu, Sep 14 2023 at 15:05, andrew wrote:
>> On 14/09/2023 5:47 am, Xin Li wrote:
>>> +static __always_inline void wrmsrns(u32 msr, u64 val)
>>> +{
>>> +   __wrmsrns(msr, val, val >> 32);
>>> +}
>>
>> This API works in terms of this series where every WRMSRNS is hidden
>> behind a FRED check, but it's an awkward interface to use anywhere else
>> in the kernel.
>
>Agreed.
>
>> I fully understand that you expect all FRED capable systems to have
>> WRMSRNS, but it is not a hard requirement and you will end up with
>> simpler (and therefore better) logic by deleting the dependency.
>
>According to the CPU folks FRED systems are guaranteed to have WRMSRNS -
>I asked for that :). It's just not yet documented.
>
>But that aside, I agree that we should opt for the safe side with a
>fallback like the one you have in XEN even for the places which are
>strictly FRED dependent.
>
>> As a "normal" user of the WRMSR APIs, the programmer only cares about:
>>
>> 1) wrmsr() -> needs to be serialising
>> 2) wrmsr_ns() -> safe to be non-serialising
>
>Correct.
>
>> In Xen, I added something of the form:
>>
>> /* Non-serialising WRMSR, when available.  Falls back to a serialising
>> WRMSR. */
>> static inline void wrmsr_ns(uint32_t msr, uint32_t lo, uint32_t hi)
>> {
>>     /*
>>  * WRMSR is 2 bytes.  WRMSRNS is 3 bytes.  Pad WRMSR with a redundant CS
>>  * prefix to avoid a trailing NOP.
>>  */
>>     alternative_input(".byte 0x2e; wrmsr",
>>   ".byte 0x0f,0x01,0xc6", X86_FEATURE_WRMSRNS,
>>   "c" (msr), "a" (lo), "d" (hi));
>> }
>>
>> and despite what Juergen said, I'm going to recommend that you do wire
>> this through the paravirt infrastructure, for the benefit of regular
>> users having a nice API, not because XenPV is expecting to do something
>> wildly different here.
>
>I fundamentally hate adding this to the PV infrastructure. We don't want
>more PV ops, quite the contrary.
>
>For the initial use case at hand, there is an explicit FRED dependency
>and the code in question really wants to use WRMSRNS directly and not
>through a PV function call.
>
>I agree with your reasoning for the more generic use case where we can
>gain performance independent of FRED by using WRMSRNS for cases where
>the write has no serialization requirements.
>
>But this made me look into PV ops some more. For actual performance
>relevant code the current PV ops mechanics are a horrorshow when the op
>defaults to the native instruction.
>
>Let's look at wrmsrl():
>
>wrmsrl(msr, val
> wrmsr(msr, (u32)val, (u32)val >> 32))
>  paravirt_write_msr(msr, low, high)
>PVOP_VCALL3(cpu.write_msr, msr, low, high)
>
>Which results in
>
>   mov $msr, %edi
>   mov $val, %rdx
>   mov %edx, %esi
>   shr $0x20, %rdx
>   call    native_write_msr
>
>and native_write_msr() does at minimum:
>
>   mov     %edi,%ecx
>   mov     %esi,%eax
>   wrmsr
>   ret
>
>In the worst case 'ret' is going through the return thunk. Not to talk
>about function prologues and whatever.
>
>This becomes even more silly for trivial instructions like STI/CLI or in
>the worst case paravirt_nop().
>
>The call only makes sense when the native default is an actual
>function, but for the trivial cases it's a blatant engineering
>trainwreck.
>
>I wouldn't care at all if CONFIG_PARAVIRT_XXL would be the esoteric use
>case, but AFAICT it's default enabled on all major distros.
>
>So no. I'm fundamentally disagreeing with your recommendation. The way
>forward is:
>
>  1) Provide the native variant for wrmsrns(), i.e. rename the proposed
> wrmsrns() to native_wrmsr_ns() and have the X86_FEATURE_WRMSRNS
> safety net as you pointed out.
>
> That function can be used in code which is guaranteed to be not
> affected by the PV_XXL madness.
>
>  2) Come up with a sensible solution for the PV_XXL horrorshow
>
>  3) Implement a sane general variant of wrmsr_ns() which handles
> both X86_FEATURE_WRMSRNS and X86_MISFEATURE_PV_XXL
>
>  4) Convert other code which benefits from the non-serializing variant
> to wrmsr_ns()
>
>Thanks,
>
>tglx
>

With regards to (2), the IMO only sensible solution is to have the ABI be the 
one of the native instruction, and to have the PVXXL users take the full brunt 
of the overhead. That means that on a native or HVM machine, the proper code 
gets patched in inline, and the PVXXL code becomes a call to a stub to do the 
parameter marshalling before walking off back into C ABI land. The pvop further 
has to bear the full cost of providing the full native semantics unless 
otherwise agreed with the native maintainers and explicitly documented what the 
modified semantics are (notably, having explicit stubs for certain specific 
MSRs is entirely reasonable.)

In case this sounds familiar, it is the pvops we were promised over 15 years 
ago, and yet somehow never real

Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread Thomas Gleixner
Andrew!

On Thu, Sep 14 2023 at 15:05, andrew wrote:
> On 14/09/2023 5:47 am, Xin Li wrote:
>> +static __always_inline void wrmsrns(u32 msr, u64 val)
>> +{
>> +__wrmsrns(msr, val, val >> 32);
>> +}
>
> This API works in terms of this series where every WRMSRNS is hidden
> behind a FRED check, but it's an awkward interface to use anywhere else
> in the kernel.

Agreed.

> I fully understand that you expect all FRED capable systems to have
> WRMSRNS, but it is not a hard requirement and you will end up with
> simpler (and therefore better) logic by deleting the dependency.

According to the CPU folks FRED systems are guaranteed to have WRMSRNS -
I asked for that :). It's just not yet documented.

But that aside, I agree that we should opt for the safe side with a
fallback like the one you have in XEN even for the places which are
strictly FRED dependent.

> As a "normal" user of the WRMSR APIs, the programmer only cares about:
>
> 1) wrmsr() -> needs to be serialising
> 2) wrmsr_ns() -> safe to be non-serialising

Correct.

> In Xen, I added something of the form:
>
> /* Non-serialising WRMSR, when available.  Falls back to a serialising
> WRMSR. */
> static inline void wrmsr_ns(uint32_t msr, uint32_t lo, uint32_t hi)
> {
>     /*
>  * WRMSR is 2 bytes.  WRMSRNS is 3 bytes.  Pad WRMSR with a redundant CS
>  * prefix to avoid a trailing NOP.
>  */
>     alternative_input(".byte 0x2e; wrmsr",
>   ".byte 0x0f,0x01,0xc6", X86_FEATURE_WRMSRNS,
>   "c" (msr), "a" (lo), "d" (hi));
> }
>
> and despite what Juergen said, I'm going to recommend that you do wire
> this through the paravirt infrastructure, for the benefit of regular
> users having a nice API, not because XenPV is expecting to do something
> wildly different here.

I fundamentally hate adding this to the PV infrastructure. We don't want
more PV ops, quite the contrary.

For the initial use case at hand, there is an explicit FRED dependency
and the code in question really wants to use WRMSRNS directly and not
through a PV function call.

I agree with your reasoning for the more generic use case where we can
gain performance independent of FRED by using WRMSRNS for cases where
the write has no serialization requirements.

But this made me look into PV ops some more. For actual performance
relevant code the current PV ops mechanics are a horrorshow when the op
defaults to the native instruction.

Let's look at wrmsrl():

wrmsrl(msr, val
 wrmsr(msr, (u32)val, (u32)val >> 32))
  paravirt_write_msr(msr, low, high)
PVOP_VCALL3(cpu.write_msr, msr, low, high)

Which results in

mov $msr, %edi
mov $val, %rdx
mov %edx, %esi
shr $0x20, %rdx
call    native_write_msr

and native_write_msr() does at minimum:

mov     %edi,%ecx
mov     %esi,%eax
wrmsr
ret

In the worst case 'ret' is going through the return thunk. Not to talk
about function prologues and whatever.

This becomes even more silly for trivial instructions like STI/CLI or in
the worst case paravirt_nop().

The call only makes sense when the native default is an actual
function, but for the trivial cases it's a blatant engineering
trainwreck.

I wouldn't care at all if CONFIG_PARAVIRT_XXL would be the esoteric use
case, but AFAICT it's default enabled on all major distros.

So no. I'm fundamentally disagreeing with your recommendation. The way
forward is:

  1) Provide the native variant for wrmsrns(), i.e. rename the proposed
 wrmsrns() to native_wrmsr_ns() and have the X86_FEATURE_WRMSRNS
 safety net as you pointed out.

 That function can be used in code which is guaranteed to be not
 affected by the PV_XXL madness.

  2) Come up with a sensible solution for the PV_XXL horrorshow

  3) Implement a sane general variant of wrmsr_ns() which handles
 both X86_FEATURE_WRMSRNS and X86_MISFEATURE_PV_XXL

  4) Convert other code which benefits from the non-serializing variant
 to wrmsr_ns()

Thanks,

tglx
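
To make step 1 above concrete, a rough sketch of what the native-only helper
with the WRMSRNS/WRMSR safety net could look like (the name follows tglx's
proposal; the use of the kernel's ALTERNATIVE() inline-asm macro here is an
illustrative assumption, not code from a posted patch):

/* Sketch only: use WRMSRNS (0f 01 c6) when X86_FEATURE_WRMSRNS is set,
 * otherwise fall back to a plain, serialising WRMSR. */
static __always_inline void native_wrmsr_ns(u32 msr, u32 low, u32 high)
{
	asm volatile(ALTERNATIVE("wrmsr",
				 ".byte 0x0f,0x01,0xc6", X86_FEATURE_WRMSRNS)
		     : : "c" (msr), "a" (low), "d" (high) : "memory");
}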



[linux-linus test] 183002: regressions - FAIL

2023-09-14 Thread osstest service owner
flight 183002 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183002/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 8 xen-boot fail REGR. 
vs. 182531
 test-amd64-coresched-amd64-xl  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-libvirt-pair 12 xen-boot/src_host   fail REGR. vs. 182531
 test-amd64-amd64-libvirt-pair 13 xen-boot/dst_host   fail REGR. vs. 182531
 test-amd64-amd64-xl-xsm   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-raw  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-qcow2  8 xen-boot   fail REGR. vs. 182531
 test-amd64-amd64-xl-vhd   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvhv2-intel  8 xen-boot  fail REGR. vs. 182531
 test-amd64-amd64-xl-pvshim8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-freebsd12-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-ws16-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-debianhvm-amd64  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-xl-credit1   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-win7-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-win7-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start   fail REGR. vs. 182531
 test-amd64-amd64-xl-shadow8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-examine-uefi  8 reboot  fail REGR. vs. 182531
 test-amd64-amd64-pair12 xen-boot/src_hostfail REGR. vs. 182531
 test-amd64-amd64-pair13 xen-boot/dst_hostfail REGR. vs. 182531
 test-amd64-amd64-xl-multivcpu  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-qemuu-nested-intel  8 xen-boot  fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-ws16-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-pvhv2-amd  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-ovmf-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-qemuu-nested-amd  8 xen-bootfail REGR. vs. 182531
 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 182531
 test-amd64-amd64-libvirt  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-freebsd11-amd64  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-pygrub   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 8 xen-boot fail REGR. vs. 
182531
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 8 xen-boot fail REGR. 
vs. 182531
 test-amd64-amd64-xl-credit2   8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine-bios  8 reboot  fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 182531
 test-amd64-amd64-examine  8 reboot   fail REGR. vs. 182531
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 8 xen-boot fail REGR. vs. 
182531

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  8 xen-boot fail REGR. vs. 182531

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 182531
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 182531
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 182531
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-x

Re: [PATCH v4 6/8] tools/binfile: switch to common annotations model

2023-09-14 Thread Julien Grall

Hi Jan,

On 04/08/2023 07:30, Jan Beulich wrote:

Use DATA() / END() and drop the now redundant .global. No change in
generated data; of course the two symbols now properly gain "hidden"
binding.

Signed-off-by: Jan Beulich 


Reviewed-by: Julien Grall 

Cheers,

--
Julien Grall



Re: [PATCH v4 4/8] Arm: annotate entry points with type and size

2023-09-14 Thread Julien Grall

Hi,

On 04/08/2023 07:28, Jan Beulich wrote:

Use the generic framework in xen/linkage.h. No change in generated code
except for the changed padding value (noticable when config.gz isn't a
multiple of 4 in size). Plus of course the converted symbols change to
be hidden ones.

Note that ASM_INT() is switched to DATA(), not DATA_LOCAL(), as the only
use site wants the symbol global anyway.

Signed-off-by: Jan Beulich 


Reviewed-by: Julien Grall 


---
Only one each of the assembly files is being converted for now. More
could be done right here or as follow-on in separate patches.


I don't have a strong preference. Are you planning to do a follow-up? (I
am OK if the answer is no.)


Cheers,

--
Julien Grall



Re: [PATCH v4 1/8] common: assembly entry point type/size annotations

2023-09-14 Thread Julien Grall

Hi Jan,

On 04/08/2023 07:26, Jan Beulich wrote:

Recent gas versions generate minimalistic Dwarf debug info for items
annotated as functions and having their sizes specified [1]. Furthermore
generating live patches wants items properly annotated. "Borrow" Arm's
END() and (remotely) derive other annotation infrastructure from
Linux'es, for all architectures to use.

Signed-off-by: Jan Beulich 

[1] 
https://sourceware.org/git?p=binutils-gdb.git;a=commitdiff;h=591cc9fbbfd6d51131c0f1d4a92e7893edcc7a28
---
v3: New, generalized from earlier x86-only version. LAST() (now
 LASTARG()) moved to macros.h.
---
TBD: What to set CODE_ALIGN to by default? Or should we require arch-es
  to define that in all cases?


The code alignment is very specific to an architecture. So I think it 
would be better if there were no default.


Otherwise, it will be more difficult for a developer to figure out that 
CODE_ALIGN may need an update.




TBD: {CODE,DATA}_ALIGN are byte granular, such that a value of 0 can be
  specified (in case this has some special meaning on an arch;
  conceivably it could mean to use some kind of arch default). We may
  not strictly need that, and hence we could also make these power-of
  -2 values (using .p2align).


I don't have a strong opinion on this one.



Note that we can't use ALIGN() (in place of SYM_ALIGN()) as long as we
still have ALIGN.

Note further that FUNC()'s etc "algn" parameter is intended to allow for
only no or a single argument. If we wanted to also make the fill value
customizable per call site, the constructs would need re-doing to some
degree.

--- /dev/null
+++ b/xen/include/xen/linkage.h
@@ -0,0 +1,56 @@
+#ifndef __LINKAGE_H__
+#define __LINKAGE_H__
+
+#ifdef __ASSEMBLY__
+
+#include 
+
+#ifndef CODE_ALIGN
+# define CODE_ALIGN ??
+#endif
+#ifndef CODE_FILL
+# define CODE_FILL ~0
+#endif


What's the value to allow the architecture to override CODE_FILL and ...


+
+#ifndef DATA_ALIGN
+# define DATA_ALIGN 0
+#endif
+#ifndef DATA_FILL
+# define DATA_FILL ~0
+#endif


... DATA_FILL?


+
+#define SYM_ALIGN(algn...) .balign algn


I find the name 'algn' confusing (not even referring to the missing 
'i'). Why not name it 'args'?



+
+#define SYM_L_GLOBAL(name) .globl name
+#define SYM_L_WEAK(name)   .weak name
+#define SYM_L_LOCAL(name)  /* nothing */
+
+#define SYM_T_FUNC STT_FUNC
+#define SYM_T_DATA STT_OBJECT
+#define SYM_T_NONE STT_NOTYPE


SYM_* will be used only in SYM() below. So why not using STT_* directly?


+
+#define SYM(name, typ, linkage, algn...)  \
+.type name, SYM_T_ ## typ;\
+SYM_L_ ## linkage(name);  \
+SYM_ALIGN(algn);  \
+name:
+
+#define END(name) .size name, . - name
+
+#define FUNC(name, algn...) \
+SYM(name, FUNC, GLOBAL, LASTARG(CODE_ALIGN, ## algn), CODE_FILL)
+#define LABEL(name, algn...) \
+SYM(name, NONE, GLOBAL, LASTARG(CODE_ALIGN, ## algn), CODE_FILL)
+#define DATA(name, algn...) \
+SYM(name, DATA, GLOBAL, LASTARG(DATA_ALIGN, ## algn), DATA_FILL)


I think the alignment should be explicit for DATA. Otherwise, at least 
on Arm, we would default to 0 which could lead to unaligned access if 
not careful.



+
+#define FUNC_LOCAL(name, algn...) \
+SYM(name, FUNC, LOCAL, LASTARG(CODE_ALIGN, ## algn), CODE_FILL)
+#define LABEL_LOCAL(name, algn...) \
+SYM(name, NONE, LOCAL, LASTARG(CODE_ALIGN, ## algn), CODE_FILL)
+#define DATA_LOCAL(name, algn...) \
+SYM(name, DATA, LOCAL, LASTARG(DATA_ALIGN, ## algn), DATA_FILL)


Same here.


+
+#endif /*  __ASSEMBLY__ */
+
+#endif /* __LINKAGE_H__ */
--- a/xen/include/xen/macros.h
+++ b/xen/include/xen/macros.h
@@ -15,6 +15,15 @@
  #define count_args(args...) \
  count_args_(., ## args, 8, 7, 6, 5, 4, 3, 2, 1, 0)
  
+#define ARG1_(x, y...) (x)
+#define ARG2_(x, y...) ARG1_(y)
+#define ARG3_(x, y...) ARG2_(y)
+#define ARG4_(x, y...) ARG3_(y)
+
+#define ARG__(nr) ARG ## nr ## _
+#define ARG_(nr)  ARG__(nr)
+#define LASTARG(x, y...) ARG_(count_args(x, ## y))(x, ## y)
+
  /* Indirect macros required for expanded argument pasting. */
  #define PASTE_(a, b) a ## b
  #define PASTE(a, b) PASTE_(a, b)



Cheers,

--
Julien Grall
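
For readers following the macro plumbing under review, a worked expansion of
the quoted definitions may help (the symbol names are made up, and CODE_ALIGN
is assumed to be an arch-provided value such as 16):

FUNC(handle_exception)
  -> SYM(handle_exception, FUNC, GLOBAL, LASTARG(CODE_ALIGN), CODE_FILL)
  -> .type handle_exception, STT_FUNC; .globl handle_exception;
     .balign 16, ~0; handle_exception:

FUNC_LOCAL(invalid_op, 32)     /* explicit alignment overrides CODE_ALIGN */
  -> .type invalid_op, STT_FUNC; /* LOCAL: no .globl emitted */
     .balign 32, ~0; invalid_op: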



4.18-unstable: building firmware/hvmloader errors in Bookworm with gcc-12.2

2023-09-14 Thread Pry Mar
Hello,

First attempt to build xen-4.18-unstable in deb12 (Bookworm):
/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.asl
  9851: Name ( SLT, 0x0 )
Remark   2173 -  Creation of named objects within a method is
highly inefficient, use globals or method local variables instead ^
 (\_GPE._L03)

/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.asl
  9852: Name ( EVT, 0x0 )
Remark   2173 -  Creation of named objects within a method is
highly inefficient, use globals or method local variables instead ^
 (\_GPE._L03)

ASL Input:
/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.asl
-  397476 bytes   8140 keywords  11139 source lines
AML Output:
 
/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.aml
-   73130 bytes   5541 opcodes2599 named objects
Hex Dump:
 
/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.hex
-  686160 bytes

Compilation successful. 0 Errors, 129 Warnings, 2 Remarks, 2762
Optimizations
sed -e 's/AmlCode/dsdt_anycpu/g' -e 's/_aml_code//g'
/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.hex
> /home/mockbuild/pbdeps/xen-4.18~rc0/debi>
echo "int dsdt_anycpu_len=sizeof(dsdt_anycpu);" >>
/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.c.tmp
mv -f
/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.c.tmp
/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmlo>
rm -f
/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader/dsdt_anycpu.aml
/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmload>
make[9]: Leaving directory
'/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/libacpi'
make[8]: Leaving directory
'/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/hvmloader'
make[7]: ***
[/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/../../tools/Rules.mk:206:
subdir-all-hvmloader] Error 2
make[7]: Leaving directory
'/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware'
make[6]: ***
[/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware/../../tools/Rules.mk:201:
subdirs-all] Error 2
make[6]: Leaving directory
'/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware'
make[5]: *** [Makefile:37: all] Error 2
make[5]: Leaving directory
'/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/firmware'
make[4]: ***
[/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/../tools/Rules.mk:206:
subdir-all-firmware] Error 2
make[4]: Leaving directory
'/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools'
make[3]: ***
[/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools/../tools/Rules.mk:201:
subdirs-all] Error 2
make[3]: Leaving directory
'/home/mockbuild/pbdeps/xen-4.18~rc0/debian/build/build-utils_amd64/tools'
make[2]: *** [debian/rules.real:218: debian/stamps/build-utils_amd64] Error
2
make[2]: Leaving directory '/home/mockbuild/pbdeps/xen-4.18~rc0'
make[1]: *** [debian/rules.gen:57: build-arch_amd64_real] Error 2
make[1]: Leaving directory '/home/mockbuild/pbdeps/xen-4.18~rc0'
make: *** [debian/rules:24: build-arch] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit
status 2

any help is appreciated,
PryMar56


Re: [PATCH 8/8] x86/spec-ctrl: Mitigate the Zen1 DIV leakage

2023-09-14 Thread Andrew Cooper
On 14/09/2023 2:12 pm, Jason Andryuk wrote:
> On Wed, Sep 13, 2023 at 6:09 PM Andrew Cooper  
> wrote:
>> @@ -955,6 +960,40 @@ static void __init srso_calculations(bool 
>> hw_smt_enabled)
>>  setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
>>  }
>>
>> +/*
>> + * Div leakage is specific to the AMD Zen1 microarchitecture.  Use STIBP as a
>> + * heuristic to select between Zen1 and Zen2 uarches.
>> + */
>> +static bool __init has_div_vuln(void)
>> +{
>> +if ( !(boot_cpu_data.x86_vendor &
>> +   (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>> +return false;
>> +
>> +if ( (boot_cpu_data.x86 != 0x17 && boot_cpu_data.x86 != 0x18) ||
>> + !boot_cpu_has(X86_FEATURE_AMD_STIBP) )
>> +return false;
>> +
>> +return true;
>> +}
>> +
>> +static void __init div_calculations(bool hw_smt_enabled)
>> +{
>> +bool cpu_bug_div = has_div_vuln();
>> +
> Would it make sense to add
> if ( !cpu_bug_div )
> return
> ...
>
>> +if ( opt_div_scrub == -1 )
>> +opt_div_scrub = cpu_bug_div;
>> +
>> +if ( opt_div_scrub )
>> +setup_force_cpu_cap(X86_FEATURE_SC_DIV);
> ...so that div-scrub=1 isn't setting X86_FEATURE_SC_DIV on un-affected
> hardware?  Or do you want to leave command line control in place in
> case it might be needed as a future workaround on other hardware?

All options (where possible) allow for paths to be explicitly activated
on un-affected hardware so we can test this giant mess.

The only cases where we ignore a user choice is when the result will
crash from e.g. #GP due to insufficient microcode.

~Andrew



Re: [PATCH 8/8] x86/spec-ctrl: Mitigate the Zen1 DIV leakage

2023-09-14 Thread Andrew Cooper
On 14/09/2023 11:52 am, Jan Beulich wrote:
> On 13.09.2023 22:27, Andrew Cooper wrote:
>> @@ -378,6 +392,8 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
>>  verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
>>  .L\@_verw_skip:
>>  
>> +ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
>> +
>>  .L\@_skip_ist_exit:
>>  .endm
> While we did talk about using alternatives patching here, I'm now in
> doubt again, in particular because the rest of the macro uses
> conditionals anyway, and the code here is bypassed for non-IST exits. If
> you're sure you want to stick to this, then I think some justification
> wants adding to the description.

Patching IST paths is safe - we drop into NMI context, and rewrite
before starting APs.

VERW needs to remain a conditional because of the FB_CLEAR/MMIO paths. 
MSR_SPEC_CTRL is going to turn back into being an alternative when I
make eIBRS work sensibly.

>> @@ -955,6 +960,40 @@ static void __init srso_calculations(bool 
>> hw_smt_enabled)
>>  setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
>>  }
>>  
>> +/*
>> + * Div leakage is specific to the AMD Zen1 microarchitecture.  Use STIBP as a
>> + * heuristic to select between Zen1 and Zen2 uarches.
>> + */
>> +static bool __init has_div_vuln(void)
>> +{
>> +if ( !(boot_cpu_data.x86_vendor &
>> +   (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>> +return false;
>> +
>> +if ( (boot_cpu_data.x86 != 0x17 && boot_cpu_data.x86 != 0x18) ||
>> + !boot_cpu_has(X86_FEATURE_AMD_STIBP) )
>> +return false;
>> +
>> +return true;
>> +}
>> +
>> +static void __init div_calculations(bool hw_smt_enabled)
>> +{
>> +bool cpu_bug_div = has_div_vuln();
>> +
>> +if ( opt_div_scrub == -1 )
>> +opt_div_scrub = cpu_bug_div;
>> +
>> +if ( opt_div_scrub )
>> +setup_force_cpu_cap(X86_FEATURE_SC_DIV);
> Isn't this only lowering performance (even if just slightly) when SMT is
> enabled, without gaining us very much?

It is consistent with how MDS/L1TF/others work, which is important.

And it does protect against a single-threaded attacker, for what that's
worth in practice.

>
>> +if ( opt_smt == -1 && cpu_bug_div && hw_smt_enabled )
>> +warning_add(
>> +"Booted on leaky-DIV hardware with SMT/Hyperthreading\n"
>> +"enabled.  Please assess your configuration and choose an\n"
>> +"explicit 'smt=' setting.  See XSA-439.\n");
>> +}
> What about us running virtualized? The topology we see may not be that
> of the underlying hardware. Maybe check_smt_enabled() should return
> true when cpu_has_hypervisor is true? (In-guest decisions would
> similarly need to assume that they may be running on SMT-enabled
> hardware, despite not themselves seeing this to be the case.)
>
> Since we can't know for sure when running virtualized, that's a case
> where I would consider it useful to enable the workaround nevertheless
> (perhaps accompanied by a warning that whether this helps depends on
> the next level hypervisor).

Honestly, SMT's not relevant.  If you're virtualised, you've lost
irrespective.

There is no FOO_NO bit, so there's no possible way a VM can figure out
if it is currently on, or may subsequently move to, an impacted processor.

~Andrew



Re: [PATCH 7/8] x86/spec-ctrl: Issue VERW during IST exit to Xen

2023-09-14 Thread Andrew Cooper
On 14/09/2023 11:01 am, Jan Beulich wrote:
> On 13.09.2023 22:27, Andrew Cooper wrote:
>> There is a corner case where e.g. an NMI hitting an exit-to-guest path after
>> SPEC_CTRL_EXIT_TO_* would have run the entire NMI handler *after* the VERW
>> flush to scrub potentially sensitive data from uarch buffers.
>>
>> In order to compensate, issue VERW when exiting to Xen from an IST entry.
>>
>> SPEC_CTRL_EXIT_TO_XEN already has two reads of spec_ctrl_flags off the stack,
>> and we're about to add a third.  Load the field into %ebx, and list the
>> register as clobbered.
>>
>> %r12 has been arranged to be the ist_exit signal, so add this as an input
>> dependency and use it to identify when to issue a VERW.
>>
>> Signed-off-by: Andrew Cooper 
> While looking technically okay, I still have a couple of remarks:
>
>> --- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
>> +++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
>> @@ -344,10 +344,12 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
>>   */
>>  .macro SPEC_CTRL_EXIT_TO_XEN
>>  /*
>> - * Requires %r14=stack_end
>> - * Clobbers %rax, %rcx, %rdx
>> + * Requires %r12=ist_exit, %r14=stack_end
>> + * Clobbers %rax, %rbx, %rcx, %rdx
> While I'd generally be a little concerned of the growing dependency and
> clobber lists, there being a single use site makes this pretty okay. The
> macro invocation being immediately followed by RESTORE_ALL effectively
> means we can clobber almost everything here.
>
> As to register usage and my comment on patch 5: There's no real need
> to switch %rbx to %r14 there if I'm not mistaken

As said, it's about consistency of the helpers.


>>   */
>> -testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
>> +movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
>> +
>> +testb $SCF_ist_sc_msr, %bl
>>  jz .L\@_skip_sc_msr
>>  
>>  /*
>> @@ -358,7 +360,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
>>   */
>>  xor %edx, %edx
>>  
>> -testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
>> +testb $SCF_use_shadow, %bl
>>  jz .L\@_skip_sc_msr
>>  
>>  mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%r14), %eax
>> @@ -367,8 +369,16 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
>>  
>>  .L\@_skip_sc_msr:
>>  
>> -/* TODO VERW */
>> +test %r12, %r12
>> +jz .L\@_skip_ist_exit
>> +
>> +/* Logically DO_SPEC_CTRL_COND_VERW but without the %rsp=cpuinfo 
>> dependency */
>> +testb $SCF_verw, %bl
>> +jz .L\@_verw_skip
>> +verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
>> +.L\@_verw_skip:
> Nit: .L\@_skip_verw would be more consistent with the other label names.

So it would.  I'll tweak.

~Andrew



Re: [PATCH 6/8] x86/entry: Track the IST-ness of an entry for the exit paths

2023-09-14 Thread Andrew Cooper
On 14/09/2023 10:32 am, Jan Beulich wrote:
> On 13.09.2023 22:27, Andrew Cooper wrote:
>> --- a/xen/arch/x86/x86_64/compat/entry.S
>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>> @@ -117,8 +117,15 @@ compat_process_trap:
>>  call  compat_create_bounce_frame
>>  jmp   compat_test_all_events
>>  
>> -/* %rbx: struct vcpu, interrupts disabled */
>> +/* %rbx: struct vcpu, %r12: ist_exit, interrupts disabled */
>>  ENTRY(compat_restore_all_guest)
>> +
>> +#ifdef CONFIG_DEBUG
>> +mov   %rsp, %rdi
>> +mov   %r12, %rsi
>> +call  check_ist_exit
>> +#endif
>> +
>>  ASSERT_INTERRUPTS_DISABLED
>>  mov   $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), %r11d
>>  and   UREGS_eflags(%rsp),%r11d
> Without having peeked ahead, is there any use of %r12 going to appear
> on this path? I thought it's only going to be restore_all_xen?

For now, we only need to change behaviour based on ist_exit in
restore_all_xen.

But, we do get here in IST context, and I'm not interested in having to
re-do the analysis to determine if this is safe.  ist_exit is a global
property of exiting Xen, so should be kept correct from the outset.

>
>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -142,10 +142,16 @@ process_trap:
>>  
>>  .section .text.entry, "ax", @progbits
>>  
>> -/* %rbx: struct vcpu, interrupts disabled */
>> +/* %rbx: struct vcpu, %r12: ist_exit, interrupts disabled */
>>  restore_all_guest:
>> -ASSERT_INTERRUPTS_DISABLED
>>  
>> +#ifdef CONFIG_DEBUG
>> +mov   %rsp, %rdi
>> +mov   %r12, %rsi
>> +call  check_ist_exit
>> +#endif
>> +
>> +ASSERT_INTERRUPTS_DISABLED
>>  /* Stash guest SPEC_CTRL value while we can read struct vcpu. */
>>  mov VCPU_arch_msrs(%rbx), %rdx
>>  mov VCPUMSR_spec_ctrl_raw(%rdx), %r15d
> Even here I don't think I see a need for the addition. Plus if the check
> is warranted here, is it really necessary for it to live ahead of the
> interrupts-disabled check?

What makes you think there is a relevance to the order of two assertions
in fully irqs-off code?

The checks are in the same order as the comment stating the invariants.

>  Further, seeing that being marco-ized, would
> it make sense to have a CHECK_IST_EXIT macro in case more than a single
> use site remains?
>
>> @@ -1087,6 +1100,10 @@ handle_ist_exception:
>>  .L_ist_dispatch_done:
>>  mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
>>  mov   %bl, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
>> +
>> +/* This is an IST exit */
>> +mov   $1, %r12
>> +
>>  cmpb  $X86_EXC_NMI, UREGS_entry_vector(%rsp)
>>  jne   ret_from_intr
> Could I talk you into using a less than 7-byte insn here? "or $1, %r12"
> would be 4 bytes, "mov $1, %r12b", "inc %r12", and "not %r12" would be
> just 3. All have certain downsides, yes, but I wonder whether switching
> isn't worth it. Even "mov $1, %r12d" would be at least one byte less,
> without any downsides. And the OR and INC variants would allow the
> remaining 63 bits to be used for another purpose down the road.

This is a 2Hz-at-most path.  The size of one instruction is not
something to care about.

But I did mean to use the %r12d form, so I'll go with that.  Everything
else depends on the behaviour of earlier logic.

~Andrew



Re: [PATCH 5/8] x86/entry: Adjust restore_all_xen to hold stack_end in %r14

2023-09-14 Thread Andrew Cooper
On 14/09/2023 9:51 am, Jan Beulich wrote:
> On 13.09.2023 22:27, Andrew Cooper wrote:
>> All other SPEC_CTRL_{ENTRY,EXIT}_* helpers hold stack_end in %r14.  Adjust it
>> for consistency, freeing up %rbx to be used differently.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 
>
> The choice of %rbx was, iirc, for avoiding unnecessary code size increase
> (due to extra REX prefixes needed in some of the affected insns). I
> understand you view this as not a real concern, but I think saying so
> explicitly in the description would be at least desirable. After all there
> would also be the alternative of further parametrizing the involved macros.

Yeah I'm seriously not concerned about a REX prefix or two.

This is about making all the helpers more consistent, to cut down on the
number of errors involved in modifying this mess.

~Andrew



Re: [PATCH 4/8] x86/spec-ctrl: Extend all SPEC_CTRL_{ENTER,EXIT}_* comments

2023-09-14 Thread Andrew Cooper
On 14/09/2023 8:58 am, Jan Beulich wrote:
> On 13.09.2023 22:27, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
>> +++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
>> @@ -218,7 +218,10 @@
>>  wrmsr
>>  .endm
>>  
>> -/* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
>> +/*
>> + * Used after a synchronous entry from PV context.  SYSCALL, SYSENTER, INT,
>> + * etc.  Will always interrupt a guest speculation context.
>> + */
>>  .macro SPEC_CTRL_ENTRY_FROM_PV
>>  /*
>>   * Requires %rsp=regs/cpuinfo, %rdx=0
> For the entry point comments - not being a native speaker -, the use of
> "{will,may} interrupt" reads odd. You're describing the macros here,
> not the events that led to their invocation. Therefore it would seem
> to me that "{will,may} have interrupted" would be more appropriate.

The salient information is what the speculation state looks like when
we're running the asm in these macros.

Sync and Async perhaps aren't the best terms.  For PV context at least,
it boils down to:

1) CPL>0 => you were in fully-good guest speculation context
2) CPL=0 => you were in fully-good Xen speculation context
3) IST && CPL=0 => Here be dragons.

HVM is more of a challenge.  VT-x behaves like an IST path, while SVM
does allow us to atomically switch to good Xen state.

Really, this needs to be a separate doc, with diagrams...

>> @@ -319,7 +334,14 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
>>  UNLIKELY_END(\@_serialise)
>>  .endm
>>  
>> -/* Use when exiting to Xen in IST context. */
>> +/*
>> + * Use when exiting from any entry context, back to Xen context.  This
>> + * includes returning to other SPEC_CTRL_{ENTRY,EXIT}_* regions with
>> + * unsanitised state.
>> + *
>> + * Because we might have interrupted Xen beyond SPEC_CTRL_EXIT_TO_$GUEST, we
>> + * must treat this as if it were an EXIT_TO_$GUEST case too.
>> + */
>>  .macro SPEC_CTRL_EXIT_TO_XEN
>>  /*
>>   * Requires %rbx=stack_end
> Is it really "must"? At least in theory there are ways to recognize that
> exit is back to Xen context outside of interrupted entry/exit regions
> (simply by evaluating how far below stack top %rsp is).

Yes, it is must - it's about how Xen behaves right now, not about some
theoretical future with different tracking mechanism.

Checking the stack is very fragile and we've had bugs doing this in the
past.  It would break completely if we were to take things such as the
recursive-NMI fix (not that we're liable to at this point with FRED on
the horizon.)

A perhaps less fragile option would be to have a .text.entry.spec_suspect
section and check %rip being in that.

But neither of these are good options.  It's adding complexity (latency)
to a fastpath to avoid a small hit in a rare case, so is a concrete
anti-optimisation.

>> @@ -344,6 +366,9 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
>>  wrmsr
>>  
>>  .L\@_skip_sc_msr:
>> +
>> +/* TODO VERW */
>> +
>>  .endm
> I don't think this comment is strictly necessary to add here, when the
> omission is addressed in a later patch. But I also don't mind its
> addition.

It doesn't especially matter if the series gets committed in one go, but
it does matter if it ends up being split.  This is the patch which
observes that VERW is missing.

~Andrew



[PATCH v6 2/4] xen/ppc: Define minimal stub headers required for full build

2023-09-14 Thread Shawn Anastasio
Additionally, change inclusion of asm/ headers to corresponding xen/ ones
throughout arch/ppc now that they work.

Signed-off-by: Shawn Anastasio 
Acked-by: Jan Beulich 
---
v6: No changes.

v5:
  - Drop vm_event.h in favor of asm-generic variant
  - (numa.h) Fix lingering reference to Arm

v4:
  - (device.h) Fix underscore prefixes in DT_DEVICE_START macro
  - (mm.h) Fix padding blanks in newly added *_to_* macros
  - (mm.h) Drop inaccurate comment for virt_to_mfn/mfn_to_virt. The
implementation of these macros remains unchanged.
  - (mm.h) s/zero/NULL/g in comment of page_info::v::inuse::domain
  - (mm.h) Use uint_t instead of u types in struct page_info
  - (p2m.h) Drop ARM-specific p2m_type_t enumerations
  - (system.h) Fix hard tabs in xchg() macro
  - (system.h) Add missing '\n' in build_xchg inline assembly before
PPC_ATOMIC_EXIT_BARRIER.
  - (system.h) Fix formatting of switch statements.
  - (system.h) Fix formatting of cmpxchg() macro.
  - (system.h) Fix formatting of mb/rmb/wmb macros
  - (system.h) Drop dead !CONFIG_SMP definition of smp memory barriers
  - (system.h) Replace hand-coded asm memory barriers w/ barrier()
  - (time.h) Fix overly-long definition of force_update_vcpu_system_time

v3:
  - Drop procarea.h
  - Add SPDX headers to all headers touched or added by this patch
  - (altp2m.h) Use ASSERT_UNREACHABLE for function that is meant to
stay unimplemented
  - (current.h) Consistently use plain C types in struct cpu_info
  - (current.h) Fix formatting of inline asm in get_cpu_info()
  - (device.h) Drop unnecessary DEVICE_GIC enumeration inherited from
ARM
  - (div64.h) Drop underscore-prefixed macro variables, fix formatting
  - (domain.h) Drop unnecessary space after cast in guest_mode()
  - (guest_access.h) Clean up line wrapping of functions w/ many
parameters
  - (guest_atomics.h) Avoid overly-long stub macros
  - (grant_table.h) Add include guard + SPDX
  - (hypercall.h) Add include guard + SPDX
  - (mem_access.h) Add include guard + SPDX
  - (mm.h) BUG_ON("unimplemented") for mfn_to_gfn/set_gpfn_from_mfn
  - (mm.h) Define PDX_GROUP_SHIFT in terms of XEN_PT_SHIFT_LVL_3
  - (monitor.h) BUG_ON("unimplemented") for arch_monitor_get_capabilities
  - (system.h) Fix formatting of inline assembly + macros
  - (time.h) Fix too-deep indentation

v2:
  - Use BUG_ON("unimplemented") instead of BUG() for unimplemented functions
to make searching easier.
  - (altp2m.h) Drop Intel license in favor of an SPDX header
  - (altp2m.h) Drop  include in favor of  and
forward declarations of struct domain and struct vcpu.
  - (bug.h) Add TODO comments above BUG and BUG_FRAME implementations
  - (desc.h) Drop desc.h
  - (mm.h) Drop  include
  - (mm.h) Drop PGC_static definition
  - (mm.h) Drop max_page definition
  - (mm.h/mm-radix.c) Drop total_pages definition
  - (monitor.h) Drop  and  includes in favor
of just 
  - (page.h) Drop PAGE_ALIGN definition
  - (percpu.h) Drop  include
  - (procarea.h) Drop license text in favor of SPDX header
  - (procarea.h) Drop unnecessary forward declarations and 
include
  - (processor.h) Fix macro parameter naming (drop leading underscore)
  - (processor.h) Add explanation comment to cpu_relax()
  - (regs.h) Drop stray hunk adding  include
  - (system.h) Drop license text in favor of SPDX header
  - (system.h) Drop  include
  - (opal.c) Drop stray hunk changing opal.c includes

 xen/arch/ppc/Kconfig |   1 +
 xen/arch/ppc/include/asm/Makefile|   2 +
 xen/arch/ppc/include/asm/altp2m.h|  25 +++
 xen/arch/ppc/include/asm/bug.h   |   9 +
 xen/arch/ppc/include/asm/cache.h |   2 +
 xen/arch/ppc/include/asm/config.h|  10 +
 xen/arch/ppc/include/asm/cpufeature.h|  10 +
 xen/arch/ppc/include/asm/current.h   |  43 
 xen/arch/ppc/include/asm/delay.h |  12 ++
 xen/arch/ppc/include/asm/device.h|  53 +
 xen/arch/ppc/include/asm/div64.h |  14 ++
 xen/arch/ppc/include/asm/domain.h|  47 +
 xen/arch/ppc/include/asm/event.h |  36 
 xen/arch/ppc/include/asm/flushtlb.h  |  24 +++
 xen/arch/ppc/include/asm/grant_table.h   |   5 +
 xen/arch/ppc/include/asm/guest_access.h  |  68 +++
 xen/arch/ppc/include/asm/guest_atomics.h |  23 +++
 xen/arch/ppc/include/asm/hardirq.h   |  19 ++
 xen/arch/ppc/include/asm/hypercall.h |   5 +
 xen/arch/ppc/include/asm/io.h|  16 ++
 xen/arch/ppc/include/asm/iocap.h |   8 +
 xen/arch/ppc/include/asm/iommu.h |   8 +
 xen/arch/ppc/include/asm/irq.h   |  33 +++
 xen/arch/ppc/include/asm/mem_access.h|   5 +
 xen/arch/ppc/include/asm/mm.h| 243 ++-
 xen/arch/ppc/include/asm/monitor.h   |  43 
 xen/arch/ppc/include/asm/nospec.h|  15 ++
 xen/arch/ppc/include/asm/numa.h  |  26 +++
 xen/arch/ppc/include/asm/p2m.h   |  95 +
 xen/arch/ppc/include/asm/page.h  |  18 ++
 xen/arch/ppc

[PATCH v6 4/4] xen/ppc: Enable full Xen build

2023-09-14 Thread Shawn Anastasio
Bring ppc's Makefile and arch.mk in line with arm and x86 to disable the
build overrides and enable the full Xen build.

Signed-off-by: Shawn Anastasio 
Reviewed-by: Jan Beulich 
---
 xen/arch/ppc/Makefile | 16 +++-
 xen/arch/ppc/arch.mk  |  3 ---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/xen/arch/ppc/Makefile b/xen/arch/ppc/Makefile
index 6d5569ff64..71feb5e2c4 100644
--- a/xen/arch/ppc/Makefile
+++ b/xen/arch/ppc/Makefile
@@ -11,10 +11,24 @@ $(TARGET): $(TARGET)-syms
cp -f $< $@
 
 $(TARGET)-syms: $(objtree)/prelink.o $(obj)/xen.lds
-   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) -o $@
+   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< \
+   $(objtree)/common/symbols-dummy.o -o $(dot-target).0
+   $(NM) -pa --format=sysv $(dot-target).0 \
+   | $(objtree)/tools/symbols $(all_symbols) --sysv --sort \
+   > $(dot-target).0.S
+   $(MAKE) $(build)=$(@D) $(dot-target).0.o
+   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< \
+   $(dot-target).0.o -o $(dot-target).1
+   $(NM) -pa --format=sysv $(dot-target).1 \
+   | $(objtree)/tools/symbols $(all_symbols) --sysv --sort \
+   > $(dot-target).1.S
+   $(MAKE) $(build)=$(@D) $(dot-target).1.o
+   $(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) \
+   $(dot-target).1.o -o $@
$(NM) -pa --format=sysv $@ \
| $(objtree)/tools/symbols --all-symbols --xensyms --sysv 
--sort \
> $@.map
+   rm -f $(@D)/.$(@F).[0-9]*
 
 $(obj)/xen.lds: $(src)/xen.lds.S FORCE
$(call if_changed_dep,cpp_lds_S)
diff --git a/xen/arch/ppc/arch.mk b/xen/arch/ppc/arch.mk
index d05cbf1df5..917ad0e6a8 100644
--- a/xen/arch/ppc/arch.mk
+++ b/xen/arch/ppc/arch.mk
@@ -7,6 +7,3 @@ CFLAGS += -m64 -mlittle-endian -mcpu=$(ppc-march-y)
 CFLAGS += -mstrict-align -mcmodel=medium -mabi=elfv2 -fPIC -mno-altivec 
-mno-vsx -msoft-float
 
 LDFLAGS += -m elf64lppc
-
-# TODO: Drop override when more of the build is working
-override ALL_OBJS-y = arch/$(SRCARCH)/built_in.o common/libfdt/built_in.o 
lib/built_in.o
-- 
2.30.2




[PATCH v6 1/4] xen/ppc: Implement bitops.h

2023-09-14 Thread Shawn Anastasio
Implement bitops.h, based on Linux's implementation as of commit
5321d1b1afb9a17302c6cec79f0cbf823eb0d3fc. Though it is based off of
Linux's implementation, this code diverges significantly in a number of
ways:
  - Bitmap entries changed to 32-bit words to match X86 and Arm on Xen
  - PPC32-specific code paths dropped
  - Formatting completely re-done to more closely line up with Xen.
Including 4 space indentation.
  - Use GCC's __builtin_popcount* for hweight* implementation

Signed-off-by: Shawn Anastasio 
---
v6:
  - Fix spacing of line-continuing backslashes in DEFINE_*OP macros
  - Drop unused prefix parameter from DEFINE_*OP macros
  - Reformat argument list in calls to test_and_{set,clear}_bits.

v5:
  - Switch lingering ulong bitop parameters/return values to uint.

v4:
  - Mention __builtin_popcount implementation of hweight* in message
  - Change type of BITOP_MASK expression from unsigned long to unsigned
int
  - Fix remaining underscore-prefixed macro params/vars
  - Fix line wrapping in test_and_clear_bit{,s}
  - Change type of test_and_clear_bits' pointer parameter from void *
to unsigned int *. This was already done for other functions but
missed here.
  - De-macroize test_and_set_bits.
  - Fix formatting of ffs{,l} macro's unary operators
  - Drop extra blank in ffz() macro definition

v3:
  - Fix style of inline asm blocks.
  - Fix underscore-prefixed macro parameters/variables
  - Use __builtin_popcount for hweight* implementation
  - Update C functions to use proper Xen coding style

v2:
  - Clarify changes from Linux implementation in commit message
  - Use PPC_ATOMIC_{ENTRY,EXIT}_BARRIER macros from  instead
of hardcoded "sync" instructions in inline assembly blocks.
  - Fix macro line-continuing backslash spacing
  - Fix hard-tab usage in find_*_bit C functions.

 xen/arch/ppc/include/asm/bitops.h | 332 +-
 1 file changed, 329 insertions(+), 3 deletions(-)

diff --git a/xen/arch/ppc/include/asm/bitops.h 
b/xen/arch/ppc/include/asm/bitops.h
index 548e97b414..82a097e401 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -1,9 +1,335 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Adapted from Linux's arch/powerpc/include/asm/bitops.h.
+ *
+ * Merged version by David Gibson .
+ * Based on ppc64 versions by: Dave Engebretsen, Todd Inglett, Don
+ * Reed, Pat McCarthy, Peter Bergner, Anton Blanchard.  They
+ * originally took it from the ppc32 code.
+ */
 #ifndef _ASM_PPC_BITOPS_H
 #define _ASM_PPC_BITOPS_H

+#include 
+
+#define __set_bit(n, p) set_bit(n, p)
+#define __clear_bit(n, p)   clear_bit(n, p)
+
+#define BITOP_BITS_PER_WORD 32
+#define BITOP_MASK(nr)  (1U << ((nr) % BITOP_BITS_PER_WORD))
+#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
+#define BITS_PER_BYTE   8
+
 /* PPC bit number conversion */
-#define PPC_BITLSHIFT(be)  (BITS_PER_LONG - 1 - (be))
-#define PPC_BIT(bit)   (1UL << PPC_BITLSHIFT(bit))
-#define PPC_BITMASK(bs, be)((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
+#define PPC_BITLSHIFT(be)(BITS_PER_LONG - 1 - (be))
+#define PPC_BIT(bit) (1UL << PPC_BITLSHIFT(bit))
+#define PPC_BITMASK(bs, be)  ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
+
+/* Macro for generating the ***_bits() functions */
+#define DEFINE_BITOP(fn, op)   
\
+static inline void fn(unsigned int mask,   
\
+  volatile unsigned int *p_)   
\
+{  
\
+unsigned int old;  
\
+unsigned int *p = (unsigned int *)p_;  
\
+asm volatile ( "1: lwarx %0,0,%3,0\n"  
\
+   #op "%I2 %0,%0,%2\n"
\
+   "stwcx. %0,0,%3\n"  
\
+   "bne- 1b\n" 
\
+   : "=&r" (old), "+m" (*p)
\
+   : "rK" (mask), "r" (p)  
\
+   : "cc", "memory" ); 
\
+}
+
+DEFINE_BITOP(set_bits, or)
+DEFINE_BITOP(change_bits, xor)
+
+#define DEFINE_CLROP(fn)   
\
+static inline void fn(unsigned int mask, volatile unsigned int *p_)
\
+{  
\
+unsigned int old;  
\
+unsigned int *p = (unsigned int *)p_;  
\
+asm volatile ( "1: lwarx %0,0,%3,0\n"  
\
+

[PATCH v6 3/4] xen/ppc: Add stub function and symbol definitions

2023-09-14 Thread Shawn Anastasio
Add stub function and symbol definitions required by common code. If the
file that the definition is supposed to be located in doesn't already
exist yet, temporarily place its definition in the new stubs.c

Signed-off-by: Shawn Anastasio 
Acked-by: Jan Beulich 
---
v6: No changes.

v5: No changes.

v4: No changes.

v3:
  - (stubs.c) Drop ack_none hook definition

v2:
  - Use BUG_ON("unimplemented") instead of BUG() for unimplemented functions
to make searching easier.
  - (mm-radix.c) Drop total_pages definition
  - (mm-radix.c) Move __read_mostly from after variable name to before it in
declaration of `frametable_base_pdx`
  - (setup.c) Fix include order
  - (stubs.c) Drop stub .end hw_irq_controller hook

 xen/arch/ppc/Makefile   |   1 +
 xen/arch/ppc/mm-radix.c |  42 +
 xen/arch/ppc/setup.c|   8 +
 xen/arch/ppc/stubs.c| 339 
 4 files changed, 390 insertions(+)
 create mode 100644 xen/arch/ppc/stubs.c

diff --git a/xen/arch/ppc/Makefile b/xen/arch/ppc/Makefile
index b3205b8f7a..6d5569ff64 100644
--- a/xen/arch/ppc/Makefile
+++ b/xen/arch/ppc/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_EARLY_PRINTK) += early_printk.init.o
 obj-y += mm-radix.o
 obj-y += opal.o
 obj-y += setup.o
+obj-y += stubs.o
 obj-y += tlb-radix.o

 $(TARGET): $(TARGET)-syms
diff --git a/xen/arch/ppc/mm-radix.c b/xen/arch/ppc/mm-radix.c
index 06129cef9c..11d0f27b60 100644
--- a/xen/arch/ppc/mm-radix.c
+++ b/xen/arch/ppc/mm-radix.c
@@ -265,3 +265,45 @@ void __init setup_initial_pagetables(void)
 /* Turn on the MMU */
 enable_mmu();
 }
+
+/*
+ * TODO: Implement the functions below
+ */
+unsigned long __read_mostly frametable_base_pdx;
+
+void put_page(struct page_info *p)
+{
+BUG_ON("unimplemented");
+}
+
+void arch_dump_shared_mem_info(void)
+{
+BUG_ON("unimplemented");
+}
+
+int xenmem_add_to_physmap_one(struct domain *d,
+  unsigned int space,
+  union add_to_physmap_extra extra,
+  unsigned long idx,
+  gfn_t gfn)
+{
+BUG_ON("unimplemented");
+}
+
+int destroy_xen_mappings(unsigned long s, unsigned long e)
+{
+BUG_ON("unimplemented");
+}
+
+int map_pages_to_xen(unsigned long virt,
+ mfn_t mfn,
+ unsigned long nr_mfns,
+ unsigned int flags)
+{
+BUG_ON("unimplemented");
+}
+
+int __init populate_pt_range(unsigned long virt, unsigned long nr_mfns)
+{
+BUG_ON("unimplemented");
+}
diff --git a/xen/arch/ppc/setup.c b/xen/arch/ppc/setup.c
index 3fc7705d9b..959c1454a0 100644
--- a/xen/arch/ppc/setup.c
+++ b/xen/arch/ppc/setup.c
@@ -1,5 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -33,3 +36,8 @@ void __init noreturn start_xen(unsigned long r3, unsigned 
long r4,

 unreachable();
 }
+
+void arch_get_xen_caps(xen_capabilities_info_t *info)
+{
+BUG_ON("unimplemented");
+}
diff --git a/xen/arch/ppc/stubs.c b/xen/arch/ppc/stubs.c
new file mode 100644
index 00..4c276b0e39
--- /dev/null
+++ b/xen/arch/ppc/stubs.c
@@ -0,0 +1,339 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/* smpboot.c */
+
+cpumask_t cpu_online_map;
+cpumask_t cpu_present_map;
+cpumask_t cpu_possible_map;
+
+/* ID of the PCPU we're running on */
+DEFINE_PER_CPU(unsigned int, cpu_id);
+/* XXX these seem awfully x86ish... */
+/* representing HT siblings of each logical CPU */
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_mask);
+/* representing HT and core siblings of each logical CPU */
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
+
+nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
+
+/* time.c */
+
+s_time_t get_s_time(void)
+{
+BUG_ON("unimplemented");
+}
+
+int reprogram_timer(s_time_t timeout)
+{
+BUG_ON("unimplemented");
+}
+
+void send_timer_event(struct vcpu *v)
+{
+BUG_ON("unimplemented");
+}
+
+/* traps.c */
+
+void show_execution_state(const struct cpu_user_regs *regs)
+{
+BUG_ON("unimplemented");
+}
+
+void arch_hypercall_tasklet_result(struct vcpu *v, long res)
+{
+BUG_ON("unimplemented");
+}
+
+void vcpu_show_execution_state(struct vcpu *v)
+{
+BUG_ON("unimplemented");
+}
+
+/* shutdown.c */
+
+void machine_restart(unsigned int delay_millisecs)
+{
+BUG_ON("unimplemented");
+}
+
+void machine_halt(void)
+{
+BUG_ON("unimplemented");
+}
+
+/* vm_event.c */
+
+void vm_event_fill_regs(vm_event_request_t *req)
+{
+BUG_ON("unimplemented");
+}
+
+void vm_event_set_registers(struct vcpu *v, vm_event_response_t *rsp)
+{
+BUG_ON("unimplemented");
+}
+
+void vm_event_monitor_next_interrupt(struct vcpu *v)
+{
+/* Not supported on PPC. */
+}
+
+/* domctl.c */
+void arch_get_domain_info(const struct domain *d,
+  struct xen_domctl_

[PATCH v6 0/4] ppc: Enable full Xen build

2023-09-14 Thread Shawn Anastasio
Hello all,

This patch series performs all of the additions necessary to drop the
build overrides for PPC and enable the full Xen build. Except in cases
where compatible implementations already exist (e.g. bitops.h), the
newly added definitions are simple, unimplemented stubs that just call
BUG_ON("unimplemented").

Thanks,
Shawn

Shawn Anastasio (4):
  xen/ppc: Implement bitops.h
  xen/ppc: Define minimal stub headers required for full build
  xen/ppc: Add stub function and symbol definitions
  xen/ppc: Enable full Xen build

 xen/arch/ppc/Kconfig |   1 +
 xen/arch/ppc/Makefile|  17 +-
 xen/arch/ppc/arch.mk |   3 -
 xen/arch/ppc/include/asm/Makefile|   2 +
 xen/arch/ppc/include/asm/altp2m.h|  25 ++
 xen/arch/ppc/include/asm/bitops.h| 332 +-
 xen/arch/ppc/include/asm/bug.h   |   9 +
 xen/arch/ppc/include/asm/cache.h |   2 +
 xen/arch/ppc/include/asm/config.h|  10 +
 xen/arch/ppc/include/asm/cpufeature.h|  10 +
 xen/arch/ppc/include/asm/current.h   |  43 +++
 xen/arch/ppc/include/asm/delay.h |  12 +
 xen/arch/ppc/include/asm/device.h|  53 
 xen/arch/ppc/include/asm/div64.h |  14 +
 xen/arch/ppc/include/asm/domain.h|  47 
 xen/arch/ppc/include/asm/event.h |  36 +++
 xen/arch/ppc/include/asm/flushtlb.h  |  24 ++
 xen/arch/ppc/include/asm/grant_table.h   |   5 +
 xen/arch/ppc/include/asm/guest_access.h  |  68 +
 xen/arch/ppc/include/asm/guest_atomics.h |  23 ++
 xen/arch/ppc/include/asm/hardirq.h   |  19 ++
 xen/arch/ppc/include/asm/hypercall.h |   5 +
 xen/arch/ppc/include/asm/io.h|  16 ++
 xen/arch/ppc/include/asm/iocap.h |   8 +
 xen/arch/ppc/include/asm/iommu.h |   8 +
 xen/arch/ppc/include/asm/irq.h   |  33 +++
 xen/arch/ppc/include/asm/mem_access.h|   5 +
 xen/arch/ppc/include/asm/mm.h| 243 +++-
 xen/arch/ppc/include/asm/monitor.h   |  43 +++
 xen/arch/ppc/include/asm/nospec.h|  15 +
 xen/arch/ppc/include/asm/numa.h  |  26 ++
 xen/arch/ppc/include/asm/p2m.h   |  95 +++
 xen/arch/ppc/include/asm/page.h  |  18 ++
 xen/arch/ppc/include/asm/paging.h|   7 +
 xen/arch/ppc/include/asm/pci.h   |   7 +
 xen/arch/ppc/include/asm/percpu.h|  24 ++
 xen/arch/ppc/include/asm/processor.h |  10 +
 xen/arch/ppc/include/asm/random.h|   9 +
 xen/arch/ppc/include/asm/setup.h |   6 +
 xen/arch/ppc/include/asm/smp.h   |  18 ++
 xen/arch/ppc/include/asm/softirq.h   |   8 +
 xen/arch/ppc/include/asm/spinlock.h  |  15 +
 xen/arch/ppc/include/asm/system.h| 219 ++-
 xen/arch/ppc/include/asm/time.h  |  23 ++
 xen/arch/ppc/include/asm/xenoprof.h  |   0
 xen/arch/ppc/mm-radix.c  |  44 ++-
 xen/arch/ppc/setup.c |   8 +
 xen/arch/ppc/stubs.c | 339 +++
 xen/arch/ppc/tlb-radix.c |   2 +-
 xen/include/public/hvm/save.h|   2 +
 xen/include/public/pmu.h |   2 +
 xen/include/public/xen.h |   2 +
 52 files changed, 2004 insertions(+), 11 deletions(-)
 create mode 100644 xen/arch/ppc/include/asm/Makefile
 create mode 100644 xen/arch/ppc/include/asm/altp2m.h
 create mode 100644 xen/arch/ppc/include/asm/cpufeature.h
 create mode 100644 xen/arch/ppc/include/asm/current.h
 create mode 100644 xen/arch/ppc/include/asm/delay.h
 create mode 100644 xen/arch/ppc/include/asm/device.h
 create mode 100644 xen/arch/ppc/include/asm/div64.h
 create mode 100644 xen/arch/ppc/include/asm/domain.h
 create mode 100644 xen/arch/ppc/include/asm/event.h
 create mode 100644 xen/arch/ppc/include/asm/flushtlb.h
 create mode 100644 xen/arch/ppc/include/asm/grant_table.h
 create mode 100644 xen/arch/ppc/include/asm/guest_access.h
 create mode 100644 xen/arch/ppc/include/asm/guest_atomics.h
 create mode 100644 xen/arch/ppc/include/asm/hardirq.h
 create mode 100644 xen/arch/ppc/include/asm/hypercall.h
 create mode 100644 xen/arch/ppc/include/asm/io.h
 create mode 100644 xen/arch/ppc/include/asm/iocap.h
 create mode 100644 xen/arch/ppc/include/asm/iommu.h
 create mode 100644 xen/arch/ppc/include/asm/irq.h
 create mode 100644 xen/arch/ppc/include/asm/mem_access.h
 create mode 100644 xen/arch/ppc/include/asm/monitor.h
 create mode 100644 xen/arch/ppc/include/asm/nospec.h
 create mode 100644 xen/arch/ppc/include/asm/numa.h
 create mode 100644 xen/arch/ppc/include/asm/p2m.h
 create mode 100644 xen/arch/ppc/include/asm/paging.h
 create mode 100644 xen/arch/ppc/include/asm/pci.h
 create mode 100644 xen/arch/ppc/include/asm/percpu.h
 create mode 100644 xen/arch/ppc/include/asm/random.h
 create mode 100644 xen/arch/ppc/include/asm/setup.h
 create mode 100644 xen/arch/ppc/include/asm/smp.h
 create mode 100644 xen/arch/ppc/include/asm/softirq.h
 create mode 1006

Re: [PATCH 5/5] x86/pv: Rewrite %dr6 handling

2023-09-14 Thread Andrew Cooper
On 14/09/2023 5:06 pm, Jan Beulich wrote:
> On 13.09.2023 01:21, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/domain.h
>> +++ b/xen/arch/x86/include/asm/domain.h
>> @@ -729,6 +729,18 @@ static inline void pv_inject_hw_exception(unsigned int 
>> vector, int errcode)
>>  pv_inject_event(&event);
>>  }
>>  
>> +static inline void pv_inject_DB(unsigned long pending_dbg)
>> +{
>> +struct x86_event event = {
>> +.vector  = X86_EXC_DB,
>> +.type= X86_EVENTTYPE_HW_EXCEPTION,
>> +.error_code  = X86_EVENT_NO_EC,
>> +.pending_dbg = pending_dbg,
> This being a sub-field of an unnamed union, the build will break (also
> in pv_inject_page_fault() then, for cr2 being switched at the same time)
> once again for old enough gcc.

I'm sick and tired of utterly obsolete compiler bugs stopping us writing
good code.

It will break HVM #PF too, and I'll fix it for now as these patches need
backporting, but I've got a very strong mind to intentionally break it
next time this comes up in staging.

>> --- a/xen/arch/x86/pv/emulate.c
>> +++ b/xen/arch/x86/pv/emulate.c
>> @@ -71,11 +71,9 @@ void pv_emul_instruction_done(struct cpu_user_regs *regs, 
>> unsigned long rip)
>>  {
>>  regs->rip = rip;
>>  regs->eflags &= ~X86_EFLAGS_RF;
>> +
>>  if ( regs->eflags & X86_EFLAGS_TF )
>> -{
>> -current->arch.dr6 |= DR_STEP | DR_STATUS_RESERVED_ONE;
>> -pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
>> -}
>> +pv_inject_DB(X86_DR6_BS);
>>  }
> This (not your change, the construct) looks bogus at the first and second
> glance, because of looking at EFLAGS.TF after emulation, when the initial
> state of TF matters. It is only correct (at the third, closer look) because
> the function presently is used only from paths not altering the guest's
> EFLAGS. Do you think it would make sense to add a comment at this occasion?

It is buggy yes, but if you notice, so is SVM's __update_guest_eip() and
VMX's update_guest_eip().

And remember that while for most instructions it's the initial state
that matters, it's the final state that matters for SYSCALL/SYSRET, and
each of LRET/IRET/ERET{U,S} have oddities that aren't currently
expressed in the emulator.

I'll leave a todo for now.  This is a problem that can only reasonably
be fixed by unifying the intercept and emulation paths into a coherent
model of the instruction cycle.

~Andrew



Re: [PATCH 3/5] x86/emul: Add a pending_dbg field to x86_emulate_ctxt.retire

2023-09-14 Thread Andrew Cooper
On 14/09/2023 4:04 pm, Jan Beulich wrote:
> On 13.09.2023 01:21, Andrew Cooper wrote:
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -8379,7 +8379,10 @@ x86_emulate(
>>  if ( !mode_64bit() )
>>  _regs.r(ip) = (uint32_t)_regs.r(ip);
>>  
>> -/* Should a singlestep #DB be raised? */
>> +if ( singlestep )
>> +ctxt->retire.pending_dbg |= X86_DR6_BS;
> We set "singlestep" about first thing in the function. Is it really correct
> to latch that into pending_dbg without regard to rc? (Perhaps yes, seeing
> the comment next to the field declaration.)

Honestly, without seeing some real RTL, I'm not sure.

The thing that is definitely buggy in this retire logic is saying "don't
set singlestep if MovSS is pending".  The correct behaviour is to set
both PENDING_DBG.BS and INTERRUPTIBILITY.MOVSS, the latter of which
causes #DB to be skipped on this instruction boundary.

That's why we get the XSA-260 behaviour - when the SYSCALL/INT/etc
completes and we hit the subsequent instruction boundary, PENDING_DBG
still has values latched in it.

With the DONE path being removed, and with the plan to wire in the VMCS
register for emulations that occur when something is already pending,
then it probably does want to be restricted back to the OKAY case.

e.g. we shouldn't latch BS in the EXCEPTION case, or we'd create more
XSA-260-like behaviour.

I'll adjust in light of this.

~Andrew



Re: [PATCH v5 2/5] xen/ppc: Implement bitops.h

2023-09-14 Thread Shawn Anastasio
On 9/13/23 2:29 AM, Jan Beulich wrote:
> On 12.09.2023 20:35, Shawn Anastasio wrote:
>> Implement bitops.h, based on Linux's implementation as of commit
>> 5321d1b1afb9a17302c6cec79f0cbf823eb0d3fc. Though it is based off of
>> Linux's implementation, this code diverges significantly in a number of
>> ways:
>>   - Bitmap entries changed to 32-bit words to match X86 and Arm on Xen
>>   - PPC32-specific code paths dropped
>>   - Formatting completely re-done to more closely line up with Xen.
>> Including 4 space indentation.
>>   - Use GCC's __builtin_popcount* for hweight* implementation
>>
>> Signed-off-by: Shawn Anastasio 
>> Acked-by: Jan Beulich 
>> ---
>> v5:
>>   - Switch lingering ulong bitop parameters/return values to uint.
>>
>> v4:
>>   - Mention __builtin_popcount implementation of hweight* in message
>>   - Change type of BITOP_MASK expression from unsigned long to unsigned
>> int
>>   - Fix remaining underscore-prefixed macro params/vars
>>   - Fix line wrapping in test_and_clear_bit{,s}
>>   - Change type of test_and_clear_bits' pointer parameter from void *
>> to unsigned int *. This was already done for other functions but
>> missed here.
>>   - De-macroize test_and_set_bits.
>>   - Fix formatting of ffs{,l} macro's unary operators
>>   - Drop extra blank in ffz() macro definition
>>
>> v3:
>>   - Fix style of inline asm blocks.
>>   - Fix underscore-prefixed macro parameters/variables
>>   - Use __builtin_popcount for hweight* implementation
>>   - Update C functions to use proper Xen coding style
>>
>> v2:
>>   - Clarify changes from Linux implementation in commit message
>>   - Use PPC_ATOMIC_{ENTRY,EXIT}_BARRIER macros from  instead
>> of hardcoded "sync" instructions in inline assembly blocks.
>>   - Fix macro line-continuing backslash spacing
>>   - Fix hard-tab usage in find_*_bit C functions.
>>
>>  xen/arch/ppc/include/asm/bitops.h | 332 +-
>>  1 file changed, 329 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/ppc/include/asm/bitops.h 
>> b/xen/arch/ppc/include/asm/bitops.h
>> index 548e97b414..0f75ff3f9d 100644
>> --- a/xen/arch/ppc/include/asm/bitops.h
>> +++ b/xen/arch/ppc/include/asm/bitops.h
>> @@ -1,9 +1,335 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +/*
>> + * Adapted from Linux's arch/powerpc/include/asm/bitops.h.
>> + *
>> + * Merged version by David Gibson .
>> + * Based on ppc64 versions by: Dave Engebretsen, Todd Inglett, Don
>> + * Reed, Pat McCarthy, Peter Bergner, Anton Blanchard.  They
>> + * originally took it from the ppc32 code.
>> + */
>>  #ifndef _ASM_PPC_BITOPS_H
>>  #define _ASM_PPC_BITOPS_H
>>
>> +#include 
>> +
>> +#define __set_bit(n, p) set_bit(n, p)
>> +#define __clear_bit(n, p)   clear_bit(n, p)
>> +
>> +#define BITOP_BITS_PER_WORD 32
>> +#define BITOP_MASK(nr)  (1U << ((nr) % BITOP_BITS_PER_WORD))
>> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
>> +#define BITS_PER_BYTE   8
>> +
>>  /* PPC bit number conversion */
>> -#define PPC_BITLSHIFT(be)   (BITS_PER_LONG - 1 - (be))
>> -#define PPC_BIT(bit)(1UL << PPC_BITLSHIFT(bit))
>> -#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
>> +#define PPC_BITLSHIFT(be)(BITS_PER_LONG - 1 - (be))
>> +#define PPC_BIT(bit) (1UL << PPC_BITLSHIFT(bit))
>> +#define PPC_BITMASK(bs, be)  ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
>> +
>> +/* Macro for generating the ***_bits() functions */
>> +#define DEFINE_BITOP(fn, op, prefix)
>>\
>> +static inline void fn(unsigned int mask,
>>   \
>> +  volatile unsigned int *p_)
>>\
>> +{   
>>\
>> +unsigned int old;   
>>   \
>> +unsigned int *p = (unsigned int *)p_;   
>>\
> 
> What use is this, when ...
> 
>> +asm volatile ( prefix   
>>\
>> +   "1: lwarx %0,0,%3,0\n"   
>>\
>> +   #op "%I2 %0,%0,%2\n" 
>>\
>> +   "stwcx. %0,0,%3\n"   
>>\
>> +   "bne- 1b\n"  
>>\
>> +   : "=&r" (old), "+m" (*p) 
>>\
> 
> ... the "+m" operand isn't used and ...
> 
>> +   : "rK" (mask), "r" (p)   
>>\
>> +   : "cc", "memory" );  
>>\
> 
> ... there's a memory clobber anyway?
>

I see what you're saying, and I'm not sure why it was written this way
in Linux. That said, this is the kind of thing that
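
For reference, the trimmed form Jan is hinting at would look roughly like this
once the unused "+m" output is dropped and the remaining operands renumbered; a
sketch of the suggestion, not the form that was committed:

#define DEFINE_BITOP(fn, op)                                            \
static inline void fn(unsigned int mask, volatile unsigned int *p_)    \
{                                                                       \
    unsigned int old;                                                   \
    unsigned int *p = (unsigned int *)p_;                               \
    /* The "memory" clobber already tells the compiler *p may change. */\
    asm volatile ( "1: lwarx %0,0,%2,0\n"                               \
                   #op "%I1 %0,%0,%1\n"                                 \
                   "stwcx. %0,0,%2\n"                                   \
                   "bne- 1b\n"                                          \
                   : "=&r" (old)                                        \
                   : "rK" (mask), "r" (p)                               \
                   : "cc", "memory" );                                  \
}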

Re: [PATCH 2/5] x86: Introduce x86_merge_dr6()

2023-09-14 Thread Andrew Cooper
On 14/09/2023 3:53 pm, Jan Beulich wrote:
> On 13.09.2023 01:21, Andrew Cooper wrote:
>> The current logic used to update %dr6 when injecting #DB is buggy.  The
>> architectural behaviour is to overwrite B{0..3} and accumulate all other 
>> bits.
> While I consider this behavior plausible, forever since the introduction of
> debug registers in i386 I have been missing a description in the manuals of
> how %dr6 updating works. Can you point me at where the above is actually
> spelled out?

The documentation is very poor.  The comment in the code is based on my
conversations with architects.

APM Vol2 13.1.1.3 Debug-Status Register (DR6) says

"Bits 15:13 of the DR6 register are not cleared by the processor and
must be cleared by software after the contents have been read."

although this is buggy given the addition of BLD in the latest
revision.  I've asked AMD to correct it.


SDM Vol3 18.2.3 Debug Status Register (DR6) says

"Certain debug exceptions may clear bits 0-3. The remaining contents of
the DR6 register are never cleared by the processor."

~Andrew
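
Put together, the merge rule being described comes down to something like the
sketch below (the helper name is made up, and the inverted-polarity bits such
as RTM/BLD are ignored for brevity; this is not the patch's actual
x86_merge_dr6()):

/* Illustrative only; not Xen's implementation. */
static unsigned long merge_dr6_sketch(unsigned long dr6, unsigned long new_dbg)
{
    /* B0-B3 describe only the latest #DB source, so they get overwritten... */
    dr6 &= ~0xfUL;

    /* ...while every other status bit accumulates until software clears it. */
    dr6 |= new_dbg;

    return dr6;
}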



Re: [PATCH] x86/shutdown: change default reboot method preference

2023-09-14 Thread Andrew Cooper
On 14/09/2023 4:21 pm, Roger Pau Monne wrote:
> The current logic to chose the preferred reboot method is based on the mode 
> Xen
> has been booted into, so if the box is booted from UEFI, the preferred reboot
> method will be to use the ResetSystem() run time service call.
>
> However, that call seems to be widely untested once the firmware has exited
> boot services, and quite often leads to a result similar to:

I don't know how true this is.  What Xen does differently to most of the
rest of the world is not calling SetVirtualAddressMap()

>
> Hardware Dom0 shutdown: rebooting machine
> [ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C]
> CPU:0
> RIP:e008:[<0017>] 0017
> RFLAGS: 00010202   CONTEXT: hypervisor
> [...]
> Xen call trace:
>[<0017>] R 0017
>[] S 83207eff7b50
>[] F machine_restart+0x1da/0x261
>[] F apic_wait_icr_idle+0/0x37
>[] F smp_call_function_interrupt+0xc7/0xcb
>[] F call_function_interrupt+0x20/0x34
>[] F do_IRQ+0x150/0x6f3
>[] F common_interrupt+0x132/0x140
>[] F 
> arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
>[] F 
> arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
>[] F arch/x86/domain.c#idle_loop+0xec/0xee
>
> 
> Panic on CPU 0:
> FATAL TRAP: vector = 6 (invalid opcode)
> 
>
> Which in most cases does lead to a reboot, however that's unreliable.
>
> Change the default reboot preference to prefer ACPI over UEFI if available and
> not in reduced hardware mode.
>
> This is in line to what Linux does, so it's unlikely to cause issues on 
> current
> and future hardware, since there's a much higher chance of vendors testing
> hardware with Linux rather than Xen.
>
> I'm not aware of using ACPI reboot causing issues on boxes that do have
> properly implemented ResetSystem() methods.
>
> Signed-off-by: Roger Pau Monné 

Wording aside, Reviewed-by: Andrew Cooper 

This is a giant embarrassment to Xen, and needs fixing.

> ---
> I haven't found anything in the UEFI spec that mandates to use ResetSystem()
> when booted from UEFI, I've only found:
>
> "One method of resetting the platform is through the EFI Runtime Service
> ResetSystem()."
>
> But that's vaguely a recommendation.
>
> I've found a lot of boxes that don't reboot properly using ResetSystem(), and 
> I
> think our default should attempt to make sure Xen reliably reboots on as much
> hardware as possible.

You're supposed to use ResetSystem() so all the value-add can be added
behind your back, but it's a chocolate teapot when it's so broken that
none of the OSes use it...

~Andrew
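
For clarity, the default-selection rule the patch description argues for
amounts to roughly the following; every identifier below is illustrative
rather than Xen's actual shutdown.c code:

/* Sketch only: not Xen's real reboot-method selection. */
enum reboot_method { REBOOT_ACPI, REBOOT_EFI, REBOOT_KBD };

static enum reboot_method choose_default(bool acpi_present, bool acpi_reduced_hw,
                                         bool efi_rs_available)
{
    if ( acpi_present && !acpi_reduced_hw )
        return REBOOT_ACPI;   /* prefer ACPI reset even when booted via UEFI */
    if ( efi_rs_available )
        return REBOOT_EFI;    /* ResetSystem() becomes the fallback */
    return REBOOT_KBD;        /* last resort: legacy keyboard-controller reset */
}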



Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests"

2023-09-14 Thread Elliott Mitchell
On Fri, Sep 08, 2023 at 05:59:11AM +0200, Borislav Petkov wrote:
> On Thu, Sep 07, 2023 at 08:08:00PM -0700, Elliott Mitchell wrote:
> > This reverts commit 767f4b620edadac579c9b8b6660761d4285fa6f9.
> > 
> > There are at least 3 valid reasons why a VM may see MCE events/registers.
> 
> Hmm, so they all read like a bunch of handwaving to me, with those
> probable hypothetical "may" formulations.

Indeed.  At what point is the lack of information and response long
enough to simply commit a revert due to those lacks?

Even with the commit message having been rewritten and the link to:
https://lkml.kernel.org/r/20210628172740.245689-1-smita.koralahallichannabasa...@amd.com
added, this still reads as roughly:

"A hypothetical bug on a hypothetivisor"

I rather suspect a genuine issue was observed, but with absolutely no
detail this is useless.  I can make some guesses, but those guesses'
relation to reality is dubious.


On Wed, Sep 13, 2023 at 03:50:12PM +, Luck, Tony wrote:
> > Also, please note that the EDAC modules don't handle MCE events
> > directly. They act on information passed from the MCE subsystem.
> >
> > Furthermore, there are other EDAC modules that have the same !hypervisor
> > check, so why change only this one?
> 
> The older Intel EDAC drivers translated system physical addresses to DIMM
> addresses by digging around in the CONFIG and MMIO space of the memory
> controller devices. It would seem unwise for a VMM to give access to those
> addresses to a guest (in general ... perhaps OK for a Xen style "DOM0" guest 
> that is
> handling many tasks for the VMM?).

Which seems oddly similar to:
"the Linux kernel may be handling adminstrative duties/hardware
for a hypervisor.  In this case, the events need to be processed and
potentially passed back through the hypervisor."


On Wed, Sep 13, 2023 at 12:21:50PM -0400, Yazen Ghannam wrote:
> The MCE decoder may access some newer MCA registers, or request info
> from the MCE subsystem. But this is for informational error decoding. It
> won't support any actions that a guest could take.
> 
> The AMD64 EDAC module reads system-specific memory controller registers
> through non-architectural interfaces. So also unwise or not useful for a
> guest to access.

This could be emulated.  With it not being officially specified, the
emulation may not be too accurate, but it is possible.  Admittedly VMware
may have abandoned this level of perfect emulation accuracy, but one
could do it.  Which would be "full virtualization of MCE events."


On Wed, Sep 13, 2023 at 10:36:50AM -0400, Yazen Ghannam wrote:
> Furthermore, there are other EDAC modules that have the same !hypervisor
> check, so why change only this one?

Indeed.  Those will also need similar treatment, but that wouldn't be a
revert of 767f4b620eda.  I found 767f4b620eda in the process of looking
for the correct hook point.



There are at least two, and possibly more, points of view with regards
to MCE and virtualization.  I keep noticing most implementers are
strictly thinking of perfect, full virtualization of hardware, and
missing what is actually desired.

Full virtualization is where you are renting an actual physical slice of
actual hardware; proper virtualization of CEs and UEs is desirable.

In reality most clients merely want to rent the processing power the
hardware provides and not deal with actually owning the hardware.  To
them, CEs are an annoyance since they clutter logs and they're not
something they're in a position to deal with.  Instead the owner of the
hardware wants the CEs so they can monitor hardware health.

What you want depends on your SLAs, but the most prominent authors keep
missing that many clients (VM owners) don't actually want to deal with
CEs.  A SLA could also state a single UE means discarding current VM
state and rolling back to the last known good checkpoint.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





Re: [QEMU PATCH v4 10/13] virtio-gpu: Resource UUID

2023-09-14 Thread Akihiko Odaki

On 2023/09/14 17:29, Albert Esteve wrote:
> On Thu, Sep 14, 2023 at 9:17 AM Akihiko Odaki wrote:
>> On 2023/09/13 23:18, Albert Esteve wrote:
>>> On Wed, Sep 13, 2023 at 3:43 PM Akihiko Odaki wrote:
>>>> On 2023/09/13 21:58, Albert Esteve wrote:
>>>>> On Wed, Sep 13, 2023 at 2:22 PM Akihiko Odaki wrote:
>>>>>> On 2023/09/13 20:34, Albert Esteve wrote:
>>>>>>> On Wed, Sep 13, 2023 at 12:34 PM Akihiko Odaki wrote:
>>>>>>>> On 2023/09/13 16:55, Albert Esteve wrote:
>>>>>>>>> Hi Antonio,
>>>>>>>>>
>>>>>>>>> If I'm not mistaken, this patch is related with:
>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2023-09/msg01853.html
>>>>>>>>>
>>>>>>>>> IMHO, ideally, virtio-gpu and vhost-user-gpu both, would use the
>>>>>>>>> infrastructure from the patch I linked to store the virtio objects,
>>>>>>>>> so that they can be later shared with other devices.
>>>>>>>>
>>>>>>>> I don't think such sharing is possible because the resources are
>>>>>>>> identified by IDs that are local to the device. That also complicates

Re: [PATCH 5/5] x86/pv: Rewrite %dr6 handling

2023-09-14 Thread Jan Beulich
On 13.09.2023 01:21, Andrew Cooper wrote:
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -729,6 +729,18 @@ static inline void pv_inject_hw_exception(unsigned int 
> vector, int errcode)
>  pv_inject_event(&event);
>  }
>  
> +static inline void pv_inject_DB(unsigned long pending_dbg)
> +{
> +struct x86_event event = {
> +.vector  = X86_EXC_DB,
> +.type= X86_EVENTTYPE_HW_EXCEPTION,
> +.error_code  = X86_EVENT_NO_EC,
> +.pending_dbg = pending_dbg,

This being a sub-field of an unnamed union, the build will break (also
in pv_inject_page_fault() then, for cr2 being switched at the same time)
once again for old enough gcc.
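
The construct being referred to, reduced to a stand-alone example (the type
below is a stand-in, not the real struct x86_event):

struct event_like {
    int vector;
    union {                      /* unnamed union, as in x86_event */
        unsigned long cr2;
        unsigned long pending_dbg;
    };
};

/* Naming an anonymous-union member in the outer designated initialiser is
 * what old enough gcc rejects, and it is exactly what the hunk above does. */
static struct event_like ev = {
    .vector      = 1,
    .pending_dbg = 0x4000,
};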

> --- a/xen/arch/x86/pv/emulate.c
> +++ b/xen/arch/x86/pv/emulate.c
> @@ -71,11 +71,9 @@ void pv_emul_instruction_done(struct cpu_user_regs *regs, 
> unsigned long rip)
>  {
>  regs->rip = rip;
>  regs->eflags &= ~X86_EFLAGS_RF;
> +
>  if ( regs->eflags & X86_EFLAGS_TF )
> -{
> -current->arch.dr6 |= DR_STEP | DR_STATUS_RESERVED_ONE;
> -pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
> -}
> +pv_inject_DB(X86_DR6_BS);
>  }

This (not your change, the construct) looks bogus at the first and second
glance, because of looking at EFLAGS.TF after emulation, when the initial
state of TF matters. It is only correct (at the third, closer look) because
the function presently is used only from paths not altering the guest's
EFLAGS. Do you think it would make sense to add a comment at this occasion?
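Purely as an illustration, the comment being asked for might read along these
lines (my wording based on the reasoning above, not text from the patch):

    /*
     * Sampling EFLAGS.TF only after emulation is correct here solely because
     * the current callers of pv_emul_instruction_done() never emulate
     * instructions which alter the guest's EFLAGS, so the value seen at this
     * point matches the pre-emulation one.
     */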

Jan



Re: [PATCH v1 01/29] xen/asm-generic: introduce stub header spinlock.h

2023-09-14 Thread Jan Beulich
On 14.09.2023 16:56, Oleksii Kurochko wrote:
> The patch introduces stub header needed for full Xen build.
> 
> Signed-off-by: Oleksii Kurochko 

Hmm, looking here I think I need to take back what I said in reply
to the cover letter, taking this as an example.

> --- /dev/null
> +++ b/xen/include/asm-generic/spinlock.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef __ASM_GENERIC_SPINLOCK_H__
> +#define __ASM_GENERIC_SPINLOCK_H__
> +
> +#define arch_lock_acquire_barrier() smp_mb()
> +#define arch_lock_release_barrier() smp_mb()
> +
> +#define arch_lock_relax() cpu_relax()
> +#define arch_lock_signal() do { \
> +} while(0)

Slightly easier (and without style violation) as ((void)0)?

> +#define arch_lock_signal_wmb() arch_lock_signal()

How's the WMB aspect represented in here? I think you need the x86
variant as the generic fallback.
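For illustration, a variant addressing both remarks could look like the sketch
below; this is only a sketch, and it assumes the x86 fallback being referred to
for the WMB case is a plain smp_wmb():

    #define arch_lock_acquire_barrier() smp_mb()
    #define arch_lock_release_barrier() smp_mb()

    #define arch_lock_relax()           cpu_relax()
    #define arch_lock_signal()          ((void)0)

    /* Unlike the plain signal, the _wmb flavour must order prior stores. */
    #define arch_lock_signal_wmb()      smp_wmb()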




[PATCH v1 27/29] xen/asm-generic: introduce stub header numa.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/numa.h | 35 ++
 1 file changed, 35 insertions(+)
 create mode 100644 xen/include/asm-generic/numa.h

diff --git a/xen/include/asm-generic/numa.h b/xen/include/asm-generic/numa.h
new file mode 100644
index 00..028f7b3638
--- /dev/null
+++ b/xen/include/asm-generic/numa.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ARCH_GENERIC_NUMA_H
+#define __ARCH_GENERIC_NUMA_H
+
+#include 
+
+typedef u8 nodeid_t;
+
+/* Fake one node for now. See also node_online_map. */
+#define cpu_to_node(cpu) 0
+#define node_to_cpumask(node)   (cpu_online_map)
+
+extern mfn_t first_valid_mfn;
+
+#define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
+#define node_start_pfn(nid) (mfn_x(first_valid_mfn))
+#define __node_distance(a, b) (20)
+
+static inline unsigned int arch_get_dma_bitsize(void)
+{
+return 32;
+}
+
+#define arch_want_default_dmazone() (false)
+
+#endif /* __ARCH_GENERIC_NUMA_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 29/29] xen/asm-generic: introduce stub header softirq.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/softirq.h | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 xen/include/asm-generic/softirq.h

diff --git a/xen/include/asm-generic/softirq.h 
b/xen/include/asm-generic/softirq.h
new file mode 100644
index 00..83be855e50
--- /dev/null
+++ b/xen/include/asm-generic/softirq.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_SOFTIRQ_H__
+#define __ASM_GENERIC_SOFTIRQ_H__
+
+#define NR_ARCH_SOFTIRQS   0
+
+#define arch_skip_send_event_check(cpu) 0
+
+#endif /* __ASM_GENERIC_SOFTIRQ_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 19/29] xen/asm-generic: introduce stub header hardirq.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/hardirq.h | 27 +++
 1 file changed, 27 insertions(+)
 create mode 100644 xen/include/asm-generic/hardirq.h

diff --git a/xen/include/asm-generic/hardirq.h 
b/xen/include/asm-generic/hardirq.h
new file mode 100644
index 00..b4b71a7315
--- /dev/null
+++ b/xen/include/asm-generic/hardirq.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_HARDIRQ_H
+#define __ASM_GENERIC_HARDIRQ_H
+
+#include 
+
+typedef struct {
+unsigned long __softirq_pending;
+unsigned int __local_irq_count;
+} __cacheline_aligned irq_cpustat_t;
+
+#include /* Standard mappings for irq_cpustat_t above */
+
+#define in_irq() (local_irq_count(smp_processor_id()) != 0)
+
+#define irq_enter() (local_irq_count(smp_processor_id())++)
+#define irq_exit()  (local_irq_count(smp_processor_id())--)
+
+#endif /* __ASM_GENERIC_HARDIRQ_H */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 23/29] xen/asm-generic: introduce stub header domain.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/domain.h | 53 
 1 file changed, 53 insertions(+)
 create mode 100644 xen/include/asm-generic/domain.h

diff --git a/xen/include/asm-generic/domain.h b/xen/include/asm-generic/domain.h
new file mode 100644
index 00..b2d244d121
--- /dev/null
+++ b/xen/include/asm-generic/domain.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_DOMAIN_H__
+#define __ASM_GENERIC_DOMAIN_H__
+
+#include 
+#include 
+
+struct hvm_domain
+{
+uint64_t  params[HVM_NR_PARAMS];
+};
+
+#define is_domain_direct_mapped(d) ((void)(d), 0)
+
+struct arch_vcpu_io {
+};
+
+struct arch_vcpu {
+};
+
+struct arch_domain {
+struct hvm_domain hvm;
+};
+
+#include 
+
+static inline struct vcpu_guest_context *alloc_vcpu_guest_context(void)
+{
+return xmalloc(struct vcpu_guest_context);
+}
+
+static inline void free_vcpu_guest_context(struct vcpu_guest_context *vgc)
+{
+xfree(vgc);
+}
+
+struct guest_memory_policy {};
+static inline void update_guest_memory_policy(struct vcpu *v,
+  struct guest_memory_policy *gmp)
+{}
+
+static inline void arch_vcpu_block(struct vcpu *v) {}
+
+#endif /* __ASM_GENERIC_DOMAIN_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 25/29] xen/asm-generic: introduce stub header irq.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/irq.h | 44 +++
 1 file changed, 44 insertions(+)
 create mode 100644 xen/include/asm-generic/irq.h

diff --git a/xen/include/asm-generic/irq.h b/xen/include/asm-generic/irq.h
new file mode 100644
index 00..5f68cbd10d
--- /dev/null
+++ b/xen/include/asm-generic/irq.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_IRQ_H__
+#define __ASM_GENERIC_IRQ_H__
+
+#include 
+#include 
+#include 
+
+/* TODO */
+#define nr_irqs 0U
+#define nr_static_irqs 0
+#define arch_hwdom_irqs(domid) 0U
+
+#define domain_pirq_to_irq(d, pirq) (pirq)
+
+#define arch_evtchn_bind_pirq(d, pirq) ((void)((d) + (pirq)))
+
+struct arch_pirq {
+};
+
+struct arch_irq_desc {
+unsigned int type;
+};
+
+static inline void arch_move_irqs(struct vcpu *v)
+{
+BUG();
+}
+
+static inline int platform_get_irq(const struct dt_device_node *device, int 
index)
+{
+BUG();
+}
+
+#endif /* __ASM_GENERIC_IRQ_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 26/29] xen/asm-generic: introduce stub header monitor.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/monitor.h | 64 +++
 1 file changed, 64 insertions(+)
 create mode 100644 xen/include/asm-generic/monitor.h

diff --git a/xen/include/asm-generic/monitor.h 
b/xen/include/asm-generic/monitor.h
new file mode 100644
index 00..c61fe738a8
--- /dev/null
+++ b/xen/include/asm-generic/monitor.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * include/asm-GENERIC/monitor.h
+ *
+ * Arch-specific monitor_op domctl handler.
+ *
+ * Copyright (c) 2015 Tamas K Lengyel (ta...@tklengyel.com)
+ * Copyright (c) 2016, Bitdefender S.R.L.
+ *
+ */
+
+#ifndef __ASM_GENERIC_MONITOR_H__
+#define __ASM_GENERIC_MONITOR_H__
+
+#include 
+#include 
+
+static inline
+void arch_monitor_allow_userspace(struct domain *d, bool allow_userspace)
+{
+}
+
+static inline
+int arch_monitor_domctl_op(struct domain *d, struct xen_domctl_monitor_op *mop)
+{
+/* No arch-specific monitor ops on GENERIC. */
+return -EOPNOTSUPP;
+}
+
+int arch_monitor_domctl_event(struct domain *d,
+  struct xen_domctl_monitor_op *mop);
+
+static inline
+int arch_monitor_init_domain(struct domain *d)
+{
+/* No arch-specific domain initialization on GENERIC. */
+return 0;
+}
+
+static inline
+void arch_monitor_cleanup_domain(struct domain *d)
+{
+/* No arch-specific domain cleanup on GENERIC. */
+}
+
+static inline uint32_t arch_monitor_get_capabilities(struct domain *d)
+{
+uint32_t capabilities = 0;
+
+return capabilities;
+}
+
+#endif /* __ASM_GENERIC_MONITOR_H__ */
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 20/29] xen/asm-generic: introduce stub header div64.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/div64.h | 24 
 1 file changed, 24 insertions(+)
 create mode 100644 xen/include/asm-generic/div64.h

diff --git a/xen/include/asm-generic/div64.h b/xen/include/asm-generic/div64.h
new file mode 100644
index 00..9f9c20878b
--- /dev/null
+++ b/xen/include/asm-generic/div64.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_DIV64
+#define __ASM_GENERIC_DIV64
+
+#include 
+
+# define do_div(n,base) ({  \
+uint32_t __base = (base);   \
+uint32_t __rem; \
+__rem = ((uint64_t)(n)) % __base;   \
+(n) = ((uint64_t)(n)) / __base; \
+__rem;  \
+ })
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH] x86/shutdown: change default reboot method preference

2023-09-14 Thread Roger Pau Monne
The current logic to choose the preferred reboot method is based on the mode Xen
has been booted into, so if the box is booted from UEFI, the preferred reboot
method will be to use the ResetSystem() run time service call.

However, that call seems to be widely untested once the firmware has exited
boot services, and quite often leads to a result similar to:

Hardware Dom0 shutdown: rebooting machine
[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C]
CPU:0
RIP:e008:[<0017>] 0017
RFLAGS: 00010202   CONTEXT: hypervisor
[...]
Xen call trace:
   [<0017>] R 0017
   [] S 83207eff7b50
   [] F machine_restart+0x1da/0x261
   [] F apic_wait_icr_idle+0/0x37
   [] F smp_call_function_interrupt+0xc7/0xcb
   [] F call_function_interrupt+0x20/0x34
   [] F do_IRQ+0x150/0x6f3
   [] F common_interrupt+0x132/0x140
   [] F 
arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
   [] F 
arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
   [] F arch/x86/domain.c#idle_loop+0xec/0xee


Panic on CPU 0:
FATAL TRAP: vector = 6 (invalid opcode)


Which in most cases does lead to a reboot, however that's unreliable.

Change the default reboot preference to prefer ACPI over UEFI if available and
not in reduced hardware mode.

This is in line with what Linux does, so it's unlikely to cause issues on current
and future hardware, since there's a much higher chance of vendors testing
hardware with Linux rather than Xen.

I'm not aware of using ACPI reboot causing issues on boxes that do have
properly implemented ResetSystem() methods.

Signed-off-by: Roger Pau Monné 
---
I haven't found anything in the UEFI spec that mandates to use ResetSystem()
when booted from UEFI, I've only found:

"One method of resetting the platform is through the EFI Runtime Service
ResetSystem()."

But that's vaguely a recommendation.

I've found a lot of boxes that don't reboot properly using ResetSystem(), and I
think our default should attempt to make sure Xen reliably reboots on as much
hardware as possible.
---
 xen/arch/x86/shutdown.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c
index 7619544d14da..81d8eda64a41 100644
--- a/xen/arch/x86/shutdown.c
+++ b/xen/arch/x86/shutdown.c
@@ -150,12 +150,12 @@ static void default_reboot_type(void)
 
 if ( xen_guest )
 reboot_type = BOOT_XEN;
+else if ( !acpi_disabled && !acpi_gbl_reduced_hardware )
+reboot_type = BOOT_ACPI;
 else if ( efi_enabled(EFI_RS) )
 reboot_type = BOOT_EFI;
-else if ( acpi_disabled )
-reboot_type = BOOT_KBD;
 else
-reboot_type = BOOT_ACPI;
+reboot_type = BOOT_KBD;
 }
 
 static int __init cf_check override_reboot(const struct dmi_system_id *d)
-- 
2.42.0




[libvirt test] 183001: tolerable all pass - PUSHED

2023-09-14 Thread osstest service owner
flight 183001 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183001/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 182958
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 182958
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 182958
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  1f85f0967b12521f3d7321cb8469393e72d0ce7a
baseline version:
 libvirt  ca40725a2101c597c03661771a4e546001f629cc

Last test of basis   182958  2023-09-12 04:18:50 Z    2 days
Failing since        182989  2023-09-13 04:18:52 Z    1 days    2 attempts
Testing same since   183001  2023-09-14 04:18:45 Z    0 days    1 attempts


People who touched revisions under test:
  Erik Skultety 
  Jonathon Jongsma 
  Peter Krempa 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
htt

[PATCH v1 28/29] xen/asm-generic: introduce stub header p2m.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/p2m.h | 115 ++
 1 file changed, 115 insertions(+)
 create mode 100644 xen/include/asm-generic/p2m.h

diff --git a/xen/include/asm-generic/p2m.h b/xen/include/asm-generic/p2m.h
new file mode 100644
index 00..554fd46608
--- /dev/null
+++ b/xen/include/asm-generic/p2m.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_P2M_H__
+#define __ASM_GENERIC_P2M_H__
+
+#include 
+
+#define paddr_bits PADDR_BITS
+
+/*
+ * List of possible type for each page in the p2m entry.
+ * The number of available bit per page in the pte for this purpose is 4 bits.
+ * So it's possible to only have 16 fields. If we run out of value in the
+ * future, it's possible to use higher value for pseudo-type and don't store
+ * them in the p2m entry.
+ */
+typedef enum {
+p2m_invalid = 0,/* Nothing mapped here */
+p2m_ram_rw, /* Normal read/write guest RAM */
+p2m_ram_ro, /* Read-only; writes are silently dropped */
+p2m_mmio_direct_dev,/* Read/write mapping of genuine Device MMIO area */
+p2m_mmio_direct_nc, /* Read/write mapping of genuine MMIO area 
non-cacheable */
+p2m_mmio_direct_c,  /* Read/write mapping of genuine MMIO area cacheable */
+p2m_map_foreign_rw, /* Read/write RAM pages from foreign domain */
+p2m_map_foreign_ro, /* Read-only RAM pages from foreign domain */
+p2m_grant_map_rw,   /* Read/write grant mapping */
+p2m_grant_map_ro,   /* Read-only grant mapping */
+/* The types below are only used to decide the page attribute in the P2M */
+p2m_iommu_map_rw,   /* Read/write iommu mapping */
+p2m_iommu_map_ro,   /* Read-only iommu mapping */
+p2m_max_real_type,  /* Types after this won't be store in the p2m */
+} p2m_type_t;
+
+#include 
+
+static inline int get_page_and_type(struct page_info *page,
+struct domain *domain,
+unsigned long type)
+{
+BUG();
+return 1;
+}
+
+/* Look up a GFN and take a reference count on the backing page. */
+typedef unsigned int p2m_query_t;
+#define P2M_ALLOC(1u<<0)   /* Populate PoD and paged-out entries */
+#define P2M_UNSHARE  (1u<<1)   /* Break CoW sharing */
+
+static inline struct page_info *get_page_from_gfn(
+struct domain *d, unsigned long gfn, p2m_type_t *t, p2m_query_t q)
+{
+BUG();
+return NULL;
+}
+
+static inline void memory_type_changed(struct domain *d)
+{
+BUG();
+}
+
+
+static inline int guest_physmap_mark_populate_on_demand(struct domain *d, 
unsigned long gfn,
+unsigned int order)
+{
+BUG();
+return 1;
+}
+
+static inline int guest_physmap_add_entry(struct domain *d,
+gfn_t gfn,
+mfn_t mfn,
+unsigned long page_order,
+p2m_type_t t)
+{
+BUG();
+return 1;
+}
+
+/* Untyped version for RAM only, for compatibility */
+static inline int __must_check
+guest_physmap_add_page(struct domain *d, gfn_t gfn, mfn_t mfn,
+   unsigned int page_order)
+{
+return guest_physmap_add_entry(d, gfn, mfn, page_order, p2m_ram_rw);
+}
+
+static inline mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
+{
+BUG();
+return _mfn(0);
+}
+
+static inline bool arch_acquire_resource_check(struct domain *d)
+{
+/*
+ * The reference counting of foreign entries in set_foreign_p2m_entry()
+ * is supported on GENERIC.
+ */
+return true;
+}
+
+static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
+{
+/* Not supported on GENERIC. */
+}
+
+#endif /* __ASM_GENERIC_P2M_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 24/29] xen/asm-generic: introduce stub header guest_access.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/guest_access.h | 31 ++
 1 file changed, 31 insertions(+)
 create mode 100644 xen/include/asm-generic/guest_access.h

diff --git a/xen/include/asm-generic/guest_access.h 
b/xen/include/asm-generic/guest_access.h
new file mode 100644
index 00..b865d37f4e
--- /dev/null
+++ b/xen/include/asm-generic/guest_access.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_GUEST_ACCESS_H__
+#define __ASM_GENERIC_GUEST_ACCESS_H__
+
+#include 
+
+unsigned long raw_copy_to_guest(void *to, const void *from, unsigned len);
+unsigned long raw_copy_from_guest(void *to, const void *from, unsigned len);
+
+#define __raw_copy_to_guest raw_copy_to_guest
+#define __raw_copy_from_guest raw_copy_from_guest
+
+#define guest_handle_okay(hnd, nr) (1)
+#define guest_handle_subrange_okay(hnd, first, last) (1)
+
+struct domain;
+unsigned long copy_to_guest_phys(struct domain *d,
+ paddr_t gpa,
+ void *buf,
+ unsigned int len);
+
+#endif /* __ASM_GENERIC_GUEST_ACCESS_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




Re: [PATCH 1/5] x86/pv: Fix the determination of whether to inject #DB

2023-09-14 Thread Jan Beulich
On 14.09.2023 16:49, Andrew Cooper wrote:
> On 14/09/2023 3:40 pm, Jan Beulich wrote:
>> On 13.09.2023 01:21, Andrew Cooper wrote:
>>> We long ago fixed the emulator to not inject exceptions behind our back.
>>> Therefore, assert that a PV event (including interrupts, because that
>>> would be buggy too) isn't pending, rather than skipping the #DB injection if
>>> one is.
>>>
>>> On the other hand, the io_emul() stubs which use X86EMUL_DONE rather than
>>> X86EMUL_OKAY may have pending breakpoints to inject after the IO access is
>>> complete, not to mention a pending singlestep.
>> If you look at the uses of X86EMUL_DONE you'll see that this error code is
>> not intended to ever come back from the emulator. It's solely used to
>> communicate between hooks and the core emulator. Therefore I think this
>> part of the description and the added case label are wrong here. With them
>> dropped again ...
> 
> Oh.  I see that now you've pointed it out, but it's far from clear.
> 
> I'd suggest that we extend the debug wrapper for x86_emulate() with
> an assertion to this effect.  It also has a knock-on effect in later
> patches.
> 
> With the DONE part dropped, this probably wants merging into patch 4. 
> Thoughts?

Not sure. Even then the patch here looks to still make sense on its own.
I don't mind the folding, I guess, but for the moment the two R-b are
only on the individual patches.

Jan



Re: [PATCH 4/5] x86/pv: Drop priv_op_ctxt.bpmatch and use pending_dbg instead

2023-09-14 Thread Jan Beulich
On 13.09.2023 01:21, Andrew Cooper wrote:
> With a full pending_dbg field in x86_emulate_ctxt, use it rather than using a
> local bpmatch field.
> 
> This simplifies the OKAY/DONE path as singlestep is already accumulated by
> x86_emulate() when appropriate.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 





[PATCH v1 17/29] xen/asm-generic: introduce stub header percpu.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/percpu.h | 35 
 1 file changed, 35 insertions(+)
 create mode 100644 xen/include/asm-generic/percpu.h

diff --git a/xen/include/asm-generic/percpu.h b/xen/include/asm-generic/percpu.h
new file mode 100644
index 00..d1069adb61
--- /dev/null
+++ b/xen/include/asm-generic/percpu.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_PERCPU_H__
+#define __ASM_GENERIC_PERCPU_H__
+
+#ifndef __ASSEMBLY__
+
+#include 
+
+extern char __per_cpu_start[], __per_cpu_data_end[];
+extern unsigned long __per_cpu_offset[NR_CPUS];
+void percpu_init_areas(void);
+
+#define per_cpu(var, cpu)  \
+(*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
+
+#define this_cpu(var) \
+(*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[get_processor_id()]))
+
+#define per_cpu_ptr(var, cpu)  \
+(*RELOC_HIDE(var, __per_cpu_offset[cpu]))
+#define this_cpu_ptr(var) \
+(*RELOC_HIDE(var, get_processor_id()))
+
+#endif
+
+#endif /* __ASM_GENERIC_PERCPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 22/29] xen/asm-generic: introduce stub header delay.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/delay.h | 21 +
 1 file changed, 21 insertions(+)
 create mode 100644 xen/include/asm-generic/delay.h

diff --git a/xen/include/asm-generic/delay.h b/xen/include/asm-generic/delay.h
new file mode 100644
index 00..1e68c6cacb
--- /dev/null
+++ b/xen/include/asm-generic/delay.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_GENERIC_DELAY_H
+#define _ASM_GENERIC_DELAY_H
+
+#include 
+
+static inline void udelay(unsigned long usecs)
+{
+BUG();
+}
+
+#endif /* _ASM_GENERIC_DELAY_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 16/29] xen/asm-generic: introduce stub header flushtlb.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/flushtlb.h | 42 ++
 1 file changed, 42 insertions(+)
 create mode 100644 xen/include/asm-generic/flushtlb.h

diff --git a/xen/include/asm-generic/flushtlb.h 
b/xen/include/asm-generic/flushtlb.h
new file mode 100644
index 00..79e4773179
--- /dev/null
+++ b/xen/include/asm-generic/flushtlb.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_FLUSHTLB_H__
+#define __ASM_GENERIC_FLUSHTLB_H__
+
+#include 
+
+/*
+ * Filter the given set of CPUs, removing those that definitely flushed their
+ * TLB since @page_timestamp.
+ */
+/* XXX lazy implementation just doesn't clear anything */
+static inline void tlbflush_filter(cpumask_t *mask, uint32_t page_timestamp) {}
+
+#define tlbflush_current_time() (0)
+
+static inline void page_set_tlbflush_timestamp(struct page_info *page)
+{
+BUG();
+}
+
+/* Flush specified CPUs' TLBs */
+void arch_flush_tlb_mask(const cpumask_t *mask);
+
+#endif /* __ASM_GENERIC_FLUSHTLB_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




Re: [PATCH v1 00/29] Introduce stub headers necessary for full Xen build

2023-09-14 Thread Jan Beulich
On 14.09.2023 16:56, Oleksii Kurochko wrote:
> Based on two patch series [1] and [2], the idea of which is to provide minimal
> amount of things for a complete Xen build, a large amount of headers are the 
> same
> or almost the same, so it makes sense to move them to asm-generic.
> 
> Also, providing such stub headers should help future architectures to add
> a full Xen build.
> 
> [1] 
> https://lore.kernel.org/xen-devel/cover.1694543103.git.sanasta...@raptorengineering.com/
> [2] 
> https://lore.kernel.org/xen-devel/cover.1692181079.git.oleksii.kuroc...@gmail.com/
> 
> Oleksii Kurochko (29):
>   xen/asm-generic: introduce stub header spinlock.h

At the example of this, personally I think this goes too far. Headers in
asm-generic should be for the case where an arch elects to not implement
certain functionality. Clearly spinlocks are required uniformly.

Jan



Re: [PATCH 3/5] x86/emul: Add a pending_dbg field to x86_emulate_ctxt.retire

2023-09-14 Thread Jan Beulich
On 13.09.2023 01:21, Andrew Cooper wrote:
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -8379,7 +8379,10 @@ x86_emulate(
>  if ( !mode_64bit() )
>  _regs.r(ip) = (uint32_t)_regs.r(ip);
>  
> -/* Should a singlestep #DB be raised? */
> +if ( singlestep )
> +ctxt->retire.pending_dbg |= X86_DR6_BS;

We set "singlestep" about first thing in the function. Is it really correct
to latch that into pending_dbg without regard to rc? (Perhaps yes, seeing
the comment next to the field declaration.)
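To make the question concrete, the rc-qualified alternative hinted at would be
something like the following sketch (assuming the local return-code variable is
named rc, as referenced above; not a proposed change):

    /* Only latch the pending trap once the insn is known to have retired. */
    if ( rc == X86EMUL_OKAY && singlestep )
        ctxt->retire.pending_dbg |= X86_DR6_BS;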

> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> @@ -588,15 +588,26 @@ struct x86_emulate_ctxt
>  /* Canonical opcode (see below) (valid only on X86EMUL_OKAY). */
>  unsigned int opcode;
>  
> -/* Retirement state, set by the emulator (valid only on X86EMUL_OKAY). */
> +/*
> + * Retirement state, set by the emulator (valid only on 
> X86EMUL_OKAY/DONE).
> + *
> + * TODO: all this state should be input/output from the VMCS PENDING_DBG,
> + * INTERRUPTIBILITY and ACTIVITIY fields.
> + */
>  union {
> -uint8_t raw;
> +unsigned long raw;
>  struct {
> +/*
> + * Accumulated %dr6 trap bits, positive polarity.  Should only be
> + * interpreted in the case of X86EMUL_OKAY/DONE.
> + */
> +unsigned int pending_dbg;
> +
>  bool hlt:1;  /* Instruction HLTed. */
>  bool mov_ss:1;   /* Instruction sets MOV-SS irq shadow. */
>  bool sti:1;  /* Instruction sets STI irq shadow. */
>  bool unblock_nmi:1;  /* Instruction clears NMI blocking. */
> -bool singlestep:1;   /* Singlestepping was active. */
> +bool singlestep:1;   /* Singlestepping was active. (TODO, merge 
> into pending_dbg) */
>  };
>  } retire;
>  

DONE has wrongly made it into here, as pointed out for patch 1.

Jan



[PATCH v1 18/29] xen/asm-generic: introduce stub header smp.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessry for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/smp.h | 30 ++
 1 file changed, 30 insertions(+)
 create mode 100644 xen/include/asm-generic/smp.h

diff --git a/xen/include/asm-generic/smp.h b/xen/include/asm-generic/smp.h
new file mode 100644
index 00..5d6b7185f1
--- /dev/null
+++ b/xen/include/asm-generic/smp.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_SMP_H
+#define __ASM_GENERIC_SMP_H
+
+#ifndef __ASSEMBLY__
+#include 
+#include 
+#endif
+
+DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
+DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
+
+#define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
+
+/*
+ * Do we, for platform reasons, need to actually keep CPUs online when we
+ * would otherwise prefer them to be off?
+ */
+#define park_offline_cpus false
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 14/29] xen/asm-generic: introduce stub header setup.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/setup.h | 16 
 1 file changed, 16 insertions(+)
 create mode 100644 xen/include/asm-generic/setup.h

diff --git a/xen/include/asm-generic/setup.h b/xen/include/asm-generic/setup.h
new file mode 100644
index 00..37feac222f
--- /dev/null
+++ b/xen/include/asm-generic/setup.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_SETUP_H__
+#define __ASM_GENERIC_SETUP_H__
+
+#define max_init_domid (0)
+
+#endif /* __ASM_GENERIC_SETUP_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 13/29] xen/asm-generic: introduce stub header random.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/random.h | 20 
 1 file changed, 20 insertions(+)
 create mode 100644 xen/include/asm-generic/random.h

diff --git a/xen/include/asm-generic/random.h b/xen/include/asm-generic/random.h
new file mode 100644
index 00..cd2307e70b
--- /dev/null
+++ b/xen/include/asm-generic/random.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_RANDOM_H__
+#define __ASM_GENERIC_RANDOM_H__
+
+static inline unsigned int arch_get_random(void)
+{
+return 0;
+}
+
+#endif /* __ASM_GENERIC_RANDOM_H__ */
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 21/29] xen/asm-generic: introduce stub header altp2m.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/altp2m.h | 34 
 1 file changed, 34 insertions(+)
 create mode 100644 xen/include/asm-generic/altp2m.h

diff --git a/xen/include/asm-generic/altp2m.h b/xen/include/asm-generic/altp2m.h
new file mode 100644
index 00..e73cc8a04f
--- /dev/null
+++ b/xen/include/asm-generic/altp2m.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_ALTP2M_H
+#define __ASM_GENERIC_ALTP2M_H
+
+#include 
+
+struct domain;
+struct vcpu;
+
+/* Alternate p2m on/off per domain */
+static inline bool altp2m_active(const struct domain *d)
+{
+/* Not implemented on GENERIC. */
+return false;
+}
+
+/* Alternate p2m VCPU */
+static inline uint16_t altp2m_vcpu_idx(const struct vcpu *v)
+{
+/* Not implemented on GENERIC, should not be reached. */
+BUG();
+return 0;
+}
+
+#endif /* __ASM_GENERIC_ALTP2M_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 12/29] xen/asm-generic: introduce stub header pci.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/pci.h | 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 xen/include/asm-generic/pci.h

diff --git a/xen/include/asm-generic/pci.h b/xen/include/asm-generic/pci.h
new file mode 100644
index 00..b577ee105f
--- /dev/null
+++ b/xen/include/asm-generic/pci.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_PCI_H__
+#define __ASM_GENERIC_PCI_H__
+
+struct arch_pci_dev {
+};
+
+#endif /* __ASM_GENERIC_PCI_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
-- 
2.41.0




[PATCH v1 15/29] xen/asm-generic: introduce stub header xenoprof.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/xenoprof.h | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 xen/include/asm-generic/xenoprof.h

diff --git a/xen/include/asm-generic/xenoprof.h 
b/xen/include/asm-generic/xenoprof.h
new file mode 100644
index 00..8ee3408b77
--- /dev/null
+++ b/xen/include/asm-generic/xenoprof.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_XENOPROF_H__
+#define __ASM_GENERIC_XENOPROF_H__
+
+#endif /* __ASM_GENERIC_XENOPROF_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 09/29] xen/asm-generic: introduce stub header iocap.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/iocap.h | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 xen/include/asm-generic/iocap.h

diff --git a/xen/include/asm-generic/iocap.h b/xen/include/asm-generic/iocap.h
new file mode 100644
index 00..dd7cb45488
--- /dev/null
+++ b/xen/include/asm-generic/iocap.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_IOCAP_H__
+#define __ASM_GENERIC_IOCAP_H__
+
+#define cache_flush_permitted(d)\
+(!rangeset_is_empty((d)->iomem_caps))
+
+#endif /* __ASM_GENERIC_IOCAP_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 10/29] xen/asm-generic: introduce stub header iommu.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces stub header necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/iommu.h | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 xen/include/asm-generic/iommu.h

diff --git a/xen/include/asm-generic/iommu.h b/xen/include/asm-generic/iommu.h
new file mode 100644
index 00..b08550e6b3
--- /dev/null
+++ b/xen/include/asm-generic/iommu.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_IOMMU_H__
+#define __ASM_GENERIC_IOMMU_H__
+
+struct arch_iommu {
+};
+
+#endif /* __ASM_IOMMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 07/29] xen/asm-generic: introduce stub header guest_atomics.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces stub header needed for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/guest_atomics.h | 47 +
 1 file changed, 47 insertions(+)
 create mode 100644 xen/include/asm-generic/guest_atomics.h

diff --git a/xen/include/asm-generic/guest_atomics.h 
b/xen/include/asm-generic/guest_atomics.h
new file mode 100644
index 00..6c4e79350a
--- /dev/null
+++ b/xen/include/asm-generic/guest_atomics.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_GUEST_ATOMICS_H
+#define __ASM_GENERIC_GUEST_ATOMICS_H
+
+#define guest_testop(name)  \
+static inline int guest_##name(struct domain *d, int nr, volatile void *p)  \
+{   \
+(void) d;   \
+(void) nr;  \
+(void) p;   \
+\
+return 0;   \
+}
+
+#define guest_bitop(name)   \
+static inline void guest_##name(struct domain *d, int nr, volatile void *p) \
+{   \
+(void) d;   \
+(void) nr;  \
+(void) p;   \
+}
+
+guest_bitop(set_bit)
+guest_bitop(clear_bit)
+guest_bitop(change_bit)
+
+#undef guest_bitop
+
+guest_testop(test_and_set_bit)
+guest_testop(test_and_clear_bit)
+guest_testop(test_and_change_bit)
+
+#undef guest_testop
+
+
+#define guest_test_bit(d, nr, p) ((void)(d), test_bit(nr, p))
+
+#endif /* __ASM_GENERIC_GUEST_ATOMICS_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 11/29] xen/asm-generic: introduce stub header mem_access.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/mem_access.h | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 xen/include/asm-generic/mem_access.h

diff --git a/xen/include/asm-generic/mem_access.h 
b/xen/include/asm-generic/mem_access.h
new file mode 100644
index 00..d2a0b545a4
--- /dev/null
+++ b/xen/include/asm-generic/mem_access.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_MEM_ACCESS
+#define __ASM_GENERIC_MEM_ACCESS
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 02/29] xen/asm-generic: introduce stub header paging.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces stub header needed for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/paging.h | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 xen/include/asm-generic/paging.h

diff --git a/xen/include/asm-generic/paging.h b/xen/include/asm-generic/paging.h
new file mode 100644
index 00..2aab63b536
--- /dev/null
+++ b/xen/include/asm-generic/paging.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_PAGING_H__
+#define __ASM_GENERIC_PAGING_H__
+
+#define paging_mode_translate(d)   (1)
+#define paging_mode_external(d)(1)
+
+#endif /* __ASM_GENERIC_PAGING_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 00/29] Introduce stub headers necessary for full Xen build

2023-09-14 Thread Oleksii Kurochko
Based on two patch series [1] and [2], whose idea is to provide the minimal
set of pieces needed for a complete Xen build, a large number of headers are
the same or almost the same, so it makes sense to move them to asm-generic.

Also, providing such stub headers should help future architectures to add
a full Xen build.

[1] 
https://lore.kernel.org/xen-devel/cover.1694543103.git.sanasta...@raptorengineering.com/
[2] 
https://lore.kernel.org/xen-devel/cover.1692181079.git.oleksii.kuroc...@gmail.com/

Oleksii Kurochko (29):
  xen/asm-generic: introduce stub header spinlock.h
  xen/asm-generic: introduce stub header paging.h
  xen/asm-generic: introduce stub header cpufeature.h
  xen/asm-generic: introduce stub header device.h
  xen/asm-generic: introduce stub header event.h
  xen/asm-generic: introduce stub header grant_table.h
  xen/asm-generic: introduce stub header guest_atomics.h
  xen/asm-generic: introduce stub hypercall.h
  xen/asm-generic: introduce stub header iocap.h
  xen/asm-generic: introduce stub header iommu.h
  xen/asm-generic: introduce stub header mem_access.h
  xen/asm-generic: introduce stub header pci.h
  xen/asm-generic: introduce stub header random.h
  xen/asm-generic: introduce stub header setup.h
  xen/asm-generic: introduce stub header xenoprof.h
  xen/asm-generic: introduce stub header flushtlb.h
  xen/asm-generic: introduce stub header percpu.h
  xen/asm-generic: introduce stub header smp.h
  xen/asm-generic: introduce stub header hardirq.h
  xen/asm-generic: introduce stub header div64.h
  xen/asm-generic: introduce stub header altp2m.h
  xen/asm-generic: introduce stub header delay.h
  xen/asm-generic: introduce stub header domain.h
  xen/asm-generic: introduce stub header guest_access.h
  xen/asm-generic: introduce stub header irq.h
  xen/asm-generic: introduce stub header monitor.h
  xen/asm-generic: introduce stub header numa.h
  xen/asm-generic: introduce stub header p2m.h
  xen/asm-generic: introduce stub header softirq.h

 xen/include/asm-generic/altp2m.h|  34 +++
 xen/include/asm-generic/cpufeature.h|  23 +
 xen/include/asm-generic/delay.h |  21 +
 xen/include/asm-generic/device.h|  65 ++
 xen/include/asm-generic/div64.h |  24 +
 xen/include/asm-generic/domain.h|  53 +++
 xen/include/asm-generic/event.h |  39 
 xen/include/asm-generic/flushtlb.h  |  42 +
 xen/include/asm-generic/grant_table.h   |  14 +++
 xen/include/asm-generic/guest_access.h  |  31 +++
 xen/include/asm-generic/guest_atomics.h |  47 ++
 xen/include/asm-generic/hardirq.h   |  27 ++
 xen/include/asm-generic/hypercall.h |  14 +++
 xen/include/asm-generic/iocap.h |  17 
 xen/include/asm-generic/iommu.h |  17 
 xen/include/asm-generic/irq.h   |  44 +
 xen/include/asm-generic/mem_access.h|  14 +++
 xen/include/asm-generic/monitor.h   |  64 +
 xen/include/asm-generic/numa.h  |  35 
 xen/include/asm-generic/p2m.h   | 115 
 xen/include/asm-generic/paging.h|  17 
 xen/include/asm-generic/pci.h   |  18 
 xen/include/asm-generic/percpu.h|  35 
 xen/include/asm-generic/random.h|  20 +
 xen/include/asm-generic/setup.h |  16 
 xen/include/asm-generic/smp.h   |  30 +++
 xen/include/asm-generic/softirq.h   |  17 
 xen/include/asm-generic/spinlock.h  |  23 +
 xen/include/asm-generic/xenoprof.h  |  14 +++
 29 files changed, 930 insertions(+)
 create mode 100644 xen/include/asm-generic/altp2m.h
 create mode 100644 xen/include/asm-generic/cpufeature.h
 create mode 100644 xen/include/asm-generic/delay.h
 create mode 100644 xen/include/asm-generic/device.h
 create mode 100644 xen/include/asm-generic/div64.h
 create mode 100644 xen/include/asm-generic/domain.h
 create mode 100644 xen/include/asm-generic/event.h
 create mode 100644 xen/include/asm-generic/flushtlb.h
 create mode 100644 xen/include/asm-generic/grant_table.h
 create mode 100644 xen/include/asm-generic/guest_access.h
 create mode 100644 xen/include/asm-generic/guest_atomics.h
 create mode 100644 xen/include/asm-generic/hardirq.h
 create mode 100644 xen/include/asm-generic/hypercall.h
 create mode 100644 xen/include/asm-generic/iocap.h
 create mode 100644 xen/include/asm-generic/iommu.h
 create mode 100644 xen/include/asm-generic/irq.h
 create mode 100644 xen/include/asm-generic/mem_access.h
 create mode 100644 xen/include/asm-generic/monitor.h
 create mode 100644 xen/include/asm-generic/numa.h
 create mode 100644 xen/include/asm-generic/p2m.h
 create mode 100644 xen/include/asm-generic/paging.h
 create mode 100644 xen/include/asm-generic/pci.h
 create mode 100644 xen/include/asm-generic/percpu.h
 create mode 100644 xen/include/asm-generic/random.h
 create mode 100644 xen/include/asm-generic/setup.h
 create mode 100644 xen/include/asm-

[PATCH v1 08/29] xen/asm-generic: introduce stub hypercall.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces header stub necessary for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/hypercall.h | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 xen/include/asm-generic/hypercall.h

diff --git a/xen/include/asm-generic/hypercall.h 
b/xen/include/asm-generic/hypercall.h
new file mode 100644
index 00..d89196fb3e
--- /dev/null
+++ b/xen/include/asm-generic/hypercall.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_HYPERCALL_H__
+#define __ASM_GENERIC_HYPERCALL_H__
+
+#endif /* __ASM_GENERIC_HYPERCALL_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 01/29] xen/asm-generic: introduce stub header spinlock.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces stub header needed for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/spinlock.h | 23 +++
 1 file changed, 23 insertions(+)
 create mode 100644 xen/include/asm-generic/spinlock.h

diff --git a/xen/include/asm-generic/spinlock.h 
b/xen/include/asm-generic/spinlock.h
new file mode 100644
index 00..22a9ec5222
--- /dev/null
+++ b/xen/include/asm-generic/spinlock.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_SPINLOCK_H__
+#define __ASM_GENERIC_SPINLOCK_H__
+
+#define arch_lock_acquire_barrier() smp_mb()
+#define arch_lock_release_barrier() smp_mb()
+
+#define arch_lock_relax() cpu_relax()
+#define arch_lock_signal() do { \
+} while(0)
+
+#define arch_lock_signal_wmb() arch_lock_signal()
+
+#endif /* __ASM_GENERIC_SPINLOCK_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 06/29] xen/asm-generic: introduce stub header grant_table.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces stub header needed for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/grant_table.h | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 xen/include/asm-generic/grant_table.h

diff --git a/xen/include/asm-generic/grant_table.h 
b/xen/include/asm-generic/grant_table.h
new file mode 100644
index 00..bd8d85f1ff
--- /dev/null
+++ b/xen/include/asm-generic/grant_table.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_GRANTTABLE_H__
+#define __ASM_GENERIC_GRANTTABLE_H__
+
+#endif /* __ASM_GENERIC_GRANTTABLE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 05/29] xen/asm-generic: introduce stub header event.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces stub header needed for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/event.h | 39 +
 1 file changed, 39 insertions(+)
 create mode 100644 xen/include/asm-generic/event.h

diff --git a/xen/include/asm-generic/event.h b/xen/include/asm-generic/event.h
new file mode 100644
index 00..d25ba36aad
--- /dev/null
+++ b/xen/include/asm-generic/event.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_EVENT_H__
+#define __ASM_GENERIC_EVENT_H__
+
+#include 
+
+static inline void vcpu_mark_events_pending(struct vcpu *v)
+{
+}
+
+static inline int vcpu_event_delivery_is_enabled(struct vcpu *v)
+{
+return 0;
+}
+
+static inline int local_events_need_delivery(void)
+{
+return 0;
+}
+
+static inline void local_event_delivery_enable(void)
+{
+}
+
+static inline bool arch_virq_is_global(unsigned int virq)
+{
+return true;
+}
+
+#endif /* __ASM_GENERIC_EVENT_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 04/29] xen/asm-generic: introduce stub header device.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces stub header needed for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/device.h | 65 
 1 file changed, 65 insertions(+)
 create mode 100644 xen/include/asm-generic/device.h

diff --git a/xen/include/asm-generic/device.h b/xen/include/asm-generic/device.h
new file mode 100644
index 00..66e69ecd78
--- /dev/null
+++ b/xen/include/asm-generic/device.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_DEVICE_H__
+#define __ASM_GENERIC_DEVICE_H__
+
+struct dt_device_node;
+
+enum device_type
+{
+DEV_DT,
+DEV_PCI,
+};
+
+struct device {
+enum device_type type;
+#ifdef CONFIG_HAS_DEVICE_TREE
+struct dt_device_node *of_node; /* Used by drivers imported from Linux */
+#endif
+};
+
+enum device_class
+{
+DEVICE_SERIAL,
+DEVICE_IOMMU,
+DEVICE_GIC,
+DEVICE_PCI_HOSTBRIDGE,
+/* Use for error */
+DEVICE_UNKNOWN,
+};
+
+struct device_desc {
+/* Device name */
+const char *name;
+/* Device class */
+enum device_class class;
+/* List of devices supported by this driver */
+const struct dt_device_match *dt_match;
+/*
+ * Device initialization.
+ *
+ * -EAGAIN is used to indicate that device probing is deferred.
+ */
+int (*init)(struct dt_device_node *dev, const void *data);
+};
+
+typedef struct device device_t;
+
+#define DT_DEVICE_START(_name, _namestr, _class)\
+static const struct device_desc __dev_desc_##_name __used   \
+__section(".dev.info") = {  \
+.name = _namestr,   \
+.class = _class,\
+
+#define DT_DEVICE_END   \
+};
+
+#endif /* __ASM_GENERIC_DEVICE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v1 03/29] xen/asm-generic: introduce stub header cpufeature.h

2023-09-14 Thread Oleksii Kurochko
The patch introduces stub header needed for full Xen build.

Signed-off-by: Oleksii Kurochko 
---
 xen/include/asm-generic/cpufeature.h | 23 +++
 1 file changed, 23 insertions(+)
 create mode 100644 xen/include/asm-generic/cpufeature.h

diff --git a/xen/include/asm-generic/cpufeature.h 
b/xen/include/asm-generic/cpufeature.h
new file mode 100644
index 00..86e2a8b455
--- /dev/null
+++ b/xen/include/asm-generic/cpufeature.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_CPUFEATURE_H__
+#define __ASM_GENERIC_CPUFEATURE_H__
+
+#ifndef __ASSEMBLY__
+
+static inline int cpu_nr_siblings(unsigned int cpu)
+{
+return 1;
+}
+
+#endif
+
+#endif /* __ASM_GENERIC_CPUFEATURE_H__  */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.41.0




[PATCH v3] x86/amd: do not expose HWCR.TscFreqSel to guests

2023-09-14 Thread Roger Pau Monne
OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
Invariant, and it will then attempt to also unconditionally access PSTATE0 if
HWCR.TscFreqSel is set (currently the case on Xen).

The motivation for exposing HWCR.TscFreqSel was to avoid warning messages from
Linux.  It has been agreed that Linux should be changed instead to not
complain about missing HWCR.TscFreqSel when running virtualized.

The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written down in
the PPR, but it's natural for OSes to attempt to fetch the P0 frequency if the
TSC is stated to increment at the P0 frequency.

Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable solution
because the PstateEn bit is read-write, and OSes could legitimately attempt to
set PstateEn=1 which Xen couldn't handle.

Furthermore, the TscFreqSel bit is model specific and was never safe to expose
like this in the first place.  At a minimum it should have had a toolstack
adjustment to know not to migrate such a VM.

Therefore, simply remove the bit.  Note the HWCR itself is an architectural
register, and does need to be accessible by the guest.  Since HWCR contains
both architectural and non-architectural bits, going forward care must be taken
to assert the exposed value is correct on newer CPU families.

Reported-by: Solène Rapenne 
Link: https://github.com/QubesOS/qubes-issues/issues/8502
Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests')
Signed-off-by: Roger Pau Monné 
Reviewed-by: Andrew Cooper 
---
Changes since v2:
 - Adjust commit message.
---
 xen/arch/x86/msr.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 3f0450259cdf..c33dc78cd8f6 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -240,8 +240,7 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
 case MSR_K8_HWCR:
 if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
 goto gp_fault;
-*val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10
-   ? K8_HWCR_TSC_FREQ_SEL : 0;
+*val = 0;
 break;
 
 case MSR_VIRT_SPEC_CTRL:
-- 
2.42.0




Re: [PATCH 2/5] x86: Introduce x86_merge_dr6()

2023-09-14 Thread Jan Beulich
On 13.09.2023 01:21, Andrew Cooper wrote:
> The current logic used to update %dr6 when injecting #DB is buggy.  The
> architectural behaviour is to overwrite B{0..3} and accumulate all other bits.

While I consider this behavior plausible, forever since the introduction of
debug registers in i386 I have been missing a description in the manuals of
how %dr6 updating works. Can you point me at where the above is actually
spelled out?
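For reference, the rule described in the commit message amounts to something
like the sketch below (illustration only; it ignores the inverted-polarity
status bits, and the mask name is made up here rather than taken from Xen):

    #define DR6_BP_MATCH_MASK 0xfUL   /* B0..B3 - hypothetical name */

    static unsigned long merge_dr6(unsigned long dr6, unsigned long new_dbg)
    {
        dr6 &= ~DR6_BP_MATCH_MASK;    /* B0..B3 reflect only the latest #DB */
        dr6 |= new_dbg;               /* other positive-polarity bits accumulate */
        return dr6;
    }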

Jan



Re: [PATCH 1/5] x86/pv: Fix the determination of whether to inject #DB

2023-09-14 Thread Andrew Cooper
On 14/09/2023 3:40 pm, Jan Beulich wrote:
> On 13.09.2023 01:21, Andrew Cooper wrote:
>> We long ago fixed the emulator to not inject exceptions behind our back.
>> Therefore, assert that a PV event (including interrupts, because that
>> would be buggy too) isn't pending, rather than skipping the #DB injection if
>> one is.
>>
>> On the other hand, the io_emul() stubs which use X86EMUL_DONE rather than
>> X86EMUL_OKAY may have pending breakpoints to inject after the IO access is
>> complete, not to mention a pending singlestep.
> If you look at the uses of X86EMUL_DONE you'll see that this error code is
> not intended to ever come back from the emulator. It's solely used to
> communicate between hooks and the core emulator. Therefore I think this
> part of the description and the added case label are wrong here. With them
> dropped again ...

Oh.  I see that now you've pointed it out, but it's far from clear.

I'd suggest that we extend the debug wrapper for x86_emulate() with
an assertion to this effect.  It also has a knock-on effect in later
patches.
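For illustration, the assertion in question could be as simple as the sketch
below (the exact shape and placement of the debug wrapper are assumed here):

    rc = x86_emulate(ctxt, ops);

    /* X86EMUL_DONE is purely internal between hooks and the core emulator. */
    ASSERT(rc != X86EMUL_DONE);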

With the DONE part dropped, this probably wants merging into patch 4. 
Thoughts?

~Andrew



Re: [PATCH 1/5] x86/pv: Fix the determination of whether to inject #DB

2023-09-14 Thread Jan Beulich
On 13.09.2023 01:21, Andrew Cooper wrote:
> We long ago fixed the emulator to not inject exceptions behind our back.
> Therefore, assert that a PV event (including interrupts, because that
> would be buggy too) isn't pending, rather than skipping the #DB injection if
> one is.
> 
> On the other hand, the io_emul() stubs which use X86EMUL_DONE rather than
> X86EMUL_OKAY may have pending breakpoints to inject after the IO access is
> complete, not to mention a pending singlestep.

If you look at the uses of X86EMUL_DONE you'll see that this error code is
not intended to ever come back from the emulator. It's solely used to
communicate between hooks and the core emulator. Therefore I think this
part of the description and the added case label are wrong here. With them
dropped again ...

> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 

Jan




Re: [PATCH v2 for-4.18?] x86: support data operand independent timing mode

2023-09-14 Thread Demi Marie Obenour
On Thu, Sep 14, 2023 at 10:04:31AM +0100, Julien Grall wrote:
> Hi Jan,
> 
> On 14/09/2023 07:32, Jan Beulich wrote:
> > On 13.09.2023 19:56, Julien Grall wrote:
> > > On 11/09/2023 16:01, Jan Beulich wrote:
> > > > [1] specifies a long list of instructions which are intended to exhibit
> > > > timing behavior independent of the data they operate on. On certain
> > > > hardware this independence is optional, controlled by a bit in a new
> > > > MSR. Provide a command line option to control the mode Xen and its
> > > > guests are to operate in, with a build time control over the default.
> > > > Longer term we may want to allow guests to control this.
> > > 
> > > If I read correctly the discussion on the previous version. There was
> > > some concern with this version of patch. I can't find anything public
> > > suggesting that the concerned were dismissed. Do you have any pointer?
> > 
> > Well, I can point to the XenServer patch queue, which since then has
> > gained a patch of similar (less flexible) functionality (and seeing
> > the similarity I was puzzled by this patch, which was posted late
> > for 4.17, not having gone in quite some time ago). This to me
> > demonstrates that the original concerns were "downgraded". It would
> > of course still be desirable to have guests control the behavior for
> > themselves, but that's a more intrusive change left for the future.
> > 
> > Beyond that I guess I need to have Andrew speak for himself.
> > 
> > > > Since Arm64 supposedly also has such a control, put command line option
> > > > and Kconfig control in common files.
> > > > 
> > > > [1] 
> > > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html
> > > > 
> > > > Requested-by: Demi Marie Obenour 
> > > > Signed-off-by: Jan Beulich 
> > > > ---
> > > > This may be viewed as a new feature, and hence be too late for 4.18. It
> > > > may, however, also be viewed as security relevant, which is why I'd like
> > > > to propose to at least consider it.
> > > > 
> > > > Slightly RFC, in particular for whether the Kconfig option should
> > > > default to Y or N.
> > > 
> > > I am not entirely sure. I understand that DIT will help in the
> > > cryptographic case but it is not clear to me what is the performance 
> > > impact.
> > 
> > The entire purpose of the bit is to have a performance impact, slowing
> > down execution when fastpaths could be taken, but shouldn't to achieve
> > the named goal.
> 
> I understood that. My question was more related to how much it will degrade
> the performance. This will help to know whether the default should be yes or
> no.

This information is not available.  You could do benchmarks with and
without this patch if you wanted to obtain it.

> > > > --- a/xen/arch/x86/cpu/common.c
> > > > +++ b/xen/arch/x86/cpu/common.c
> > > > @@ -204,6 +204,28 @@ void ctxt_switch_levelling(const struct
> > > > alternative_vcall(ctxt_switch_masking, next);
> > > >}
> > > > +static void setup_doitm(void)
> > > > +{
> > > > +uint64_t msr;
> > > > +
> > > > +if ( !cpu_has_doitm )
> > > 
> > > I am not very familiar with the feature. If it is not present, then can
> > > we guarantee that the instructions will be executed in constant time?
> > 
> > _We_ cannot guarantee anything. It is only hardware vendors who can. As a
> > result I can only again refer you to the referenced documentation. It
> > claims that without the bit being supported in hardware, an extensive
> > list of insns is going to expose data operand independent performance.
> 
> Right. So ...
> 
> > 
> > > If not, I think we should taint Xen and/or print a message if the admin
> > > requested to use DIT. This would make clear that the admin is trying to
> > > do something that doesn't work.
> > 
> > Tainting Xen on all hardware not exposing the bit would seem entirely
> > unreasonable to me. If we learned of specific cases where the vendor
> > promises are broken, we could taint there, sure. I guess that would be
> > individual XSAs / CVEs then for each instance.
> 
> ... we need to somehow let the user know that the HW it is running on may
> not honor DIT. Tainting might be a bit too harsh, but I think printing a
> message with warning_add() is necessary.

Prior to DOITM, the data-operand-independent timing of certain
instructions was an implicit contract between hardware and software.
DOITM made several changes:

1. The implicit contract has been replaced by an architectural
   guarantee.  This guarantee has a specific list of instructions to
   which it applies, as well as a specific list of operands that timing
   will be independent of.  The guarantee has also been made conditional
   on DOITM being enabled — if it is disabled, no guarantees are made.

2. Intel has retroactively extended this guarantee to all previous CPU
   architectures, including ones that do not give software control over

[PATCH v2] timer: fix NR_CPUS=1 build with gcc13

2023-09-14 Thread Jan Beulich
Gcc13 apparently infers from "if ( old_cpu < new_cpu )" that "new_cpu"
is >= 1, and then (on x86) complains about "per_cpu(timers, new_cpu)"
exceeding __per_cpu_offset[]'s bounds (being an array of 1 in such a
configuration). Make the code conditional upon there being at least 2
CPUs configured (otherwise there simply is nothing to migrate [to]).

Signed-off-by: Jan Beulich 
---
v2: Warn if it looks like an actual migration was (bogusly) requested.

--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -357,6 +357,7 @@ bool timer_expires_before(struct timer *
 void migrate_timer(struct timer *timer, unsigned int new_cpu)
 {
 unsigned int old_cpu;
+#if CONFIG_NR_CPUS > 1
 bool_t active;
 unsigned long flags;
 
@@ -404,6 +405,11 @@ void migrate_timer(struct timer *timer,
 
 spin_unlock(&per_cpu(timers, old_cpu).lock);
 spin_unlock_irqrestore(&per_cpu(timers, new_cpu).lock, flags);
+#else /* CONFIG_NR_CPUS == 1 */
+old_cpu = read_atomic(&timer->cpu);
+if ( old_cpu != TIMER_CPU_status_killed )
+WARN_ON(new_cpu != old_cpu);
+#endif /* CONFIG_NR_CPUS */
 }
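
As a standalone illustration of the diagnostic being silenced (the array and
function names are hypothetical, not taken from the tree): gcc13's range
propagation infers b >= 1 from the comparison and may then flag the access to
a one-element array at -O2 with -Warray-bounds:

unsigned long offsets[1];   /* stands in for __per_cpu_offset[] with NR_CPUS=1 */

unsigned long pick(unsigned int a, unsigned int b)
{
    if ( a < b )            /* gcc infers b >= 1 on this path */
        return offsets[b];  /* may be reported as out of bounds */
    return offsets[a];
}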
 
 



[xen-unstable test] 183000: tolerable FAIL

2023-09-14 Thread osstest service owner
flight 183000 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183000/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 182997
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 182997
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 182997
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 182997
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 182997
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 182997
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 182997
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 182997
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 182997
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 182997
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 182997
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 182997
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 xen  6aa25c32180ab59081c73bae4c568367d9133a1f
baseline version:
 xen  6aa25c32180ab59081c73bae4c568367d9133a1f

Last test of basis   183000  2023-09-14 04:07:29 Z0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i3

Re: [PATCH v10 05/38] x86/trapnr: Add event type macros to <asm/trapnr.h>

2023-09-14 Thread andrew . cooper3
On 14/09/2023 5:47 am, Xin Li wrote:
> Intel VT-x classifies events into eight different types, which is
> inherited by FRED for event identification. As such, event type
> becomes a common x86 concept, and should be defined in a common x86
> header.
>
> Add event type macros to <asm/trapnr.h>, and use it in <asm/vmx.h>.
>
> Suggested-by: H. Peter Anvin (Intel) 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
>  arch/x86/include/asm/trapnr.h | 12 
>  arch/x86/include/asm/vmx.h| 17 +
>  2 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/trapnr.h b/arch/x86/include/asm/trapnr.h
> index f5d2325aa0b7..ab7e4c9d666f 100644
> --- a/arch/x86/include/asm/trapnr.h
> +++ b/arch/x86/include/asm/trapnr.h
> @@ -2,6 +2,18 @@
>  #ifndef _ASM_X86_TRAPNR_H
>  #define _ASM_X86_TRAPNR_H
>  
> +/*
> + * Event type codes used by both FRED and Intel VT-x

And AMD SVM.  This enumeration has never been unique to just VT-x.

> + */
> +#define EVENT_TYPE_EXTINT0   // External interrupt
> +#define EVENT_TYPE_RESERVED  1
> +#define EVENT_TYPE_NMI   2   // NMI
> +#define EVENT_TYPE_HWEXC 3   // Hardware originated traps, exceptions
> +#define EVENT_TYPE_SWINT 4   // INT n
> +#define EVENT_TYPE_PRIV_SWEXC5   // INT1
> +#define EVENT_TYPE_SWEXC 6   // INT0, INT3

Typo.  into, not int0  (the difference shows up more clearly in lower case.)

> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 0e73616b82f3..c84acfefcd31 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -374,14 +375,14 @@ enum vmcs_field {
>  #define VECTORING_INFO_DELIVER_CODE_MASK INTR_INFO_DELIVER_CODE_MASK
>  #define VECTORING_INFO_VALID_MASKINTR_INFO_VALID_MASK
>  
> -#define INTR_TYPE_EXT_INTR  (0 << 8) /* external interrupt */
> -#define INTR_TYPE_RESERVED  (1 << 8) /* reserved */
> -#define INTR_TYPE_NMI_INTR   (2 << 8) /* NMI */
> -#define INTR_TYPE_HARD_EXCEPTION (3 << 8) /* processor exception */
> -#define INTR_TYPE_SOFT_INTR (4 << 8) /* software interrupt */
> -#define INTR_TYPE_PRIV_SW_EXCEPTION  (5 << 8) /* ICE breakpoint - 
> undocumented */
> -#define INTR_TYPE_SOFT_EXCEPTION (6 << 8) /* software exception */
> -#define INTR_TYPE_OTHER_EVENT   (7 << 8) /* other event */
> +#define INTR_TYPE_EXT_INTR   (EVENT_TYPE_EXTINT << 8)/* 
> external interrupt */
> +#define INTR_TYPE_RESERVED   (EVENT_TYPE_RESERVED << 8)  /* 
> reserved */
> +#define INTR_TYPE_NMI_INTR   (EVENT_TYPE_NMI << 8)   /* NMI 
> */
> +#define INTR_TYPE_HARD_EXCEPTION (EVENT_TYPE_HWEXC << 8) /* 
> processor exception */
> +#define INTR_TYPE_SOFT_INTR  (EVENT_TYPE_SWINT << 8) /* 
> software interrupt */
> +#define INTR_TYPE_PRIV_SW_EXCEPTION  (EVENT_TYPE_PRIV_SWEXC << 8)/* ICE 
> breakpoint - undocumented */

ICEBP/INT1 is no longer undocumented.

~Andrew
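
As a usage note (illustrative, not part of the series): the event type occupies
bits 10:8 of the VT-x interruption-information fields, which is what the
shifted INTR_TYPE_* encodings above correspond to. A minimal decode helper
could look like:

/* Extract the EVENT_TYPE_* value from an interruption-information field. */
static inline unsigned int intr_info_event_type(unsigned int intr_info)
{
    return (intr_info >> 8) & 0x7;
}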



Re: [PATCH v2] x86/amd: do not expose HWCR.TscFreqSel to guests

2023-09-14 Thread Jan Beulich
On 14.09.2023 15:39, Roger Pau Monné wrote:
> On Thu, Sep 14, 2023 at 03:16:40PM +0200, Jan Beulich wrote:
>> On 14.09.2023 14:57, Roger Pau Monné wrote:
>>> On Thu, Sep 14, 2023 at 02:49:45PM +0200, Jan Beulich wrote:
 On 14.09.2023 14:37, Roger Pau Monné wrote:
> On Thu, Sep 14, 2023 at 07:52:33AM +0200, Jan Beulich wrote:
>> On 13.09.2023 16:52, Roger Pau Monne wrote:
>>> OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
>>> Invariant, and it will then attempt to also unconditionally access 
>>> PSTATE0 if
>>> HWCR.TscFreqSel is set (currently the case on Xen).
>>>
>>> The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written 
>>> down in
>>> the PPR, but it's natural for OSes to attempt to fetch the P0 frequency 
>>> if the
>>> TSC increments at the P0 frequency.
>>>
>>> Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable 
>>> solution
>>> because the PstateEn bit is read-write, and OSes could legitimately 
>>> attempt to
>>> set PstateEn=1 which Xen couldn't handle.
>>>
>>> In order to fix expose an empty HWCR, which is an architectural MSR and 
>>> so must
>>> be accessible.
>>>
>>> Note it was not safe to expose the TscFreqSel bit because it is not
>>> architectural, and could change meaning between models.
>>
>> This imo wants (or even needs) extending to address the aspect of then
>> exposing, on newer families, a r/o bit with the wrong value.
>
> We could always be exposing bits with the wrong values on newer
> (unreleased?) families, I'm not sure why it needs explicit mentioning
> here.

 Hmm, yes, that's one way to look at things. Yet exposing plain zero is
 pretty clearly not within spec here,
>>>
>>> As I understand it, the fact that HWCR.TscFreqSel is read-only doesn't
>>> exclude the possibility of it changing using other means (iow: we
>>> should consider that a write to a different register could have the
>>> side effect of toggling the bit).
>>>
>>> The PPR I'm reading doesn't mention that the bit must be 1, just that
>>> it's 1 on reset and read-only.
>>
>> Sure; the PPR being incomplete doesn't help here. My interpretation, based
>> on the bit having been r/w in earlier families, is that AMD wanted to retain
>> its meaning without allowing it to be configurable anymore. Possibly a sign
>> of it remaining so going forward.
> 
> What about:
> 
> "Note it was not safe to expose the TscFreqSel bit because it is not
> architectural, and could change meaning between models.  Since HWCR
> contains both architectural and non-architectural bits, going forward
> care must be taken to assert the exposed value is correct on newer
> CPU families."

Fine with me.

Jan



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread andrew . cooper3
On 14/09/2023 5:47 am, Xin Li wrote:
> Add an always inline API __wrmsrns() to embed the WRMSRNS instruction
> into the code.
>
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
>  arch/x86/include/asm/msr.h | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 65ec1965cd28..c284ff9ebe67 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -97,6 +97,19 @@ static __always_inline void __wrmsr(unsigned int msr, u32 
> low, u32 high)
>: : "c" (msr), "a"(low), "d" (high) : "memory");
>  }
>  
> +/*
> + * WRMSRNS behaves exactly like WRMSR with the only difference being
> + * that it is not a serializing instruction by default.
> + */
> +static __always_inline void __wrmsrns(u32 msr, u32 low, u32 high)
> +{
> + /* Instruction opcode for WRMSRNS; supported in binutils >= 2.40. */
> + asm volatile("1: .byte 0x0f,0x01,0xc6\n"
> +  "2:\n"
> +  _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
> +  : : "c" (msr), "a"(low), "d" (high));
> +}
> +
>  #define native_rdmsr(msr, val1, val2)\
>  do { \
>   u64 __val = __rdmsr((msr)); \
> @@ -297,6 +310,11 @@ do { 
> \
>  
>  #endif   /* !CONFIG_PARAVIRT_XXL */
>  
> +static __always_inline void wrmsrns(u32 msr, u64 val)
> +{
> + __wrmsrns(msr, val, val >> 32);
> +}

This API works in terms of this series where every WRMSRNS is hidden
behind a FRED check, but it's an awkward interface to use anywhere else
in the kernel.

I fully understand that you expect all FRED capable systems to have
WRMSRNS, but it is not a hard requirement and you will end up with
simpler (and therefore better) logic by deleting the dependency.

As a "normal" user of the WRMSR APIs, the programmer only cares about:

1) wrmsr() -> needs to be serialising
2) wrmsr_ns() -> safe to be non-serialising


In Xen, I added something of the form:

/* Non-serialising WRMSR, when available.  Falls back to a serialising
WRMSR. */
static inline void wrmsr_ns(uint32_t msr, uint32_t lo, uint32_t hi)
{
    /*
     * WRMSR is 2 bytes.  WRMSRNS is 3 bytes.  Pad WRMSR with a redundant CS
     * prefix to avoid a trailing NOP.
     */
    alternative_input(".byte 0x2e; wrmsr",
                      ".byte 0x0f,0x01,0xc6", X86_FEATURE_WRMSRNS,
                      "c" (msr), "a" (lo), "d" (hi));
}

and despite what Juergen said, I'm going to recommend that you do wire
this through the paravirt infrastructure, for the benefit of regular
users having a nice API, not because XenPV is expecting to do something
wildly different here.


I'd actually go as far as suggesting that you break patches 1-3 into
different series and e.g. update the regular context switch path to use
the WRMSRNS-falling-back-to-WRMSR helpers, just to get started.

~Andrew
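
As a hypothetical caller sketch of the API shape suggested above (the MSR and
call site are illustrative, not taken from the series): in a context-switch
style path the serialising property is not required, so the non-serialising
form can be used where the CPU offers it:

static inline void write_kernel_gs_base(unsigned long base)
{
    /* No ordering requirement against subsequent instructions here. */
    wrmsr_ns(MSR_KERNEL_GS_BASE, (uint32_t)base, (uint32_t)(base >> 32));
}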



Re: [PATCH v2] x86/amd: do not expose HWCR.TscFreqSel to guests

2023-09-14 Thread Roger Pau Monné
On Thu, Sep 14, 2023 at 03:16:40PM +0200, Jan Beulich wrote:
> On 14.09.2023 14:57, Roger Pau Monné wrote:
> > On Thu, Sep 14, 2023 at 02:49:45PM +0200, Jan Beulich wrote:
> >> On 14.09.2023 14:37, Roger Pau Monné wrote:
> >>> On Thu, Sep 14, 2023 at 07:52:33AM +0200, Jan Beulich wrote:
>  On 13.09.2023 16:52, Roger Pau Monne wrote:
> > OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
> > Invariant, and it will then attempt to also unconditionally access 
> > PSTATE0 if
> > HWCR.TscFreqSel is set (currently the case on Xen).
> >
> > The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written 
> > down in
> > the PPR, but it's natural for OSes to attempt to fetch the P0 frequency 
> > if the
> > TSC increments at the P0 frequency.
> >
> > Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable 
> > solution
> > because the PstateEn bit is read-write, and OSes could legitimately 
> > attempt to
> > set PstateEn=1 which Xen couldn't handle.
> >
> > In order to fix expose an empty HWCR, which is an architectural MSR and 
> > so must
> > be accessible.
> >
> > Note it was not safe to expose the TscFreqSel bit because it is not
> > architectural, and could change meaning between models.
> 
>  This imo wants (or even needs) extending to address the aspect of then
>  exposing, on newer families, a r/o bit with the wrong value.
> >>>
> >>> We could always be exposing bits with the wrong values on newer
> >>> (unreleased?) families, I'm not sure why it needs explicit mentioning
> >>> here.
> >>
> >> Hmm, yes, that's one way to look at things. Yet exposing plain zero is
> >> pretty clearly not within spec here,
> > 
> > As I understand it, the fact that HWCR.TscFreqSel is read-only doesn't
> > exclude the possibility of it changing using other means (iow: we
> > should consider that a write to a different register could have the
> > side effect of toggling the bit).
> > 
> > The PPR I'm reading doesn't mention that the bit must be 1, just that
> > it's 1 on reset and read-only.
> 
> Sure; the PPR being incomplete doesn't help here. My interpretation, based
> on the bit having been r/w in earlier families, is that AMD wanted to retain
> its meaning without allowing it to be configurable anymore. Possibly a sign
> of it remaining so going forward.

What about:

"Note it was not safe to expose the TscFreqSel bit because it is not
architectural, and could change meaning between models.  Since HWCR
contains both architectural and non-architectural bits, going forward
care must be taken to assert the exposed value is correct on newer
CPU families."

Thanks, Roger.



Re: [PATCH v2] x86/amd: do not expose HWCR.TscFreqSel to guests

2023-09-14 Thread Jan Beulich
On 14.09.2023 14:57, Roger Pau Monné wrote:
> On Thu, Sep 14, 2023 at 02:49:45PM +0200, Jan Beulich wrote:
>> On 14.09.2023 14:37, Roger Pau Monné wrote:
>>> On Thu, Sep 14, 2023 at 07:52:33AM +0200, Jan Beulich wrote:
 On 13.09.2023 16:52, Roger Pau Monne wrote:
> OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
> Invariant, and it will then attempt to also unconditionally access 
> PSTATE0 if
> HWCR.TscFreqSel is set (currently the case on Xen).
>
> The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written 
> down in
> the PPR, but it's natural for OSes to attempt to fetch the P0 frequency 
> if the
> TSC increments at the P0 frequency.
>
> Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable 
> solution
> because the PstateEn bit is read-write, and OSes could legitimately 
> attempt to
> set PstateEn=1 which Xen couldn't handle.
>
> In order to fix expose an empty HWCR, which is an architectural MSR and 
> so must
> be accessible.
>
> Note it was not safe to expose the TscFreqSel bit because it is not
> architectural, and could change meaning between models.

 This imo wants (or even needs) extending to address the aspect of then
 exposing, on newer families, a r/o bit with the wrong value.
>>>
>>> We could always be exposing bits with the wrong values on newer
>>> (unreleased?) families, I'm not sure why it needs explicit mentioning
>>> here.
>>
>> Hmm, yes, that's one way to look at things. Yet exposing plain zero is
>> pretty clearly not within spec here,
> 
> As I understand it, the fact that HWCR.TscFreqSel is read-only doesn't
> exclude the possibility of it changing using other means (iow: we
> should consider that a write to a different register could have the
> side effect of toggling the bit).
> 
> The PPR I'm reading doesn't mention that the bit must be 1, just that
> it's 1 on reset and read-only.

Sure; the PPR being incomplete doesn't help here. My interpretation, based
on the bit having been r/w in earlier families, is that AMD wanted to retain
its meaning without allowing it to be configurable anymore. Possibly a sign
of it remaining so going forward.

Jan



Re: [PATCH v10 08/38] x86/cpufeatures: Add the cpu feature bit for FRED

2023-09-14 Thread andrew . cooper3
On 14/09/2023 7:09 am, Jan Beulich wrote:
> On 14.09.2023 08:03, Juergen Gross wrote:
>> On 14.09.23 06:47, Xin Li wrote:
>>> From: "H. Peter Anvin (Intel)" 
>>>
>>> Any FRED CPU will always have the following features as its baseline:
>>>1) LKGS, load attributes of the GS segment but the base address into
>>>   the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor
>>>   cache.
>>>2) WRMSRNS, non-serializing WRMSR for faster MSR writes.
>>>
>>> Signed-off-by: H. Peter Anvin (Intel) 
>>> Tested-by: Shan Kang 
>>> Signed-off-by: Xin Li 
>> In order to avoid having to add paravirt support for FRED I think
>> xen_init_capabilities() should gain:
>>
>> +setup_clear_cpu_cap(X86_FEATURE_FRED);
> I don't view it as very likely that Xen would expose FRED to PV guests
> (Andrew?), at which point such a precaution may not be necessary.

PV guests are never going to see FRED (or LKGS for that matter) because
it advertises too much stuff which simply traps because the kernel is in
CPL3.

That said, the 64bit PV ABI is a whole lot closer to FRED than it is to
IDT delivery.  (Almost as if we decided 15 years ago that giving the PV
guest kernel a good stack and GSbase was the right thing to do...)

In some copious free time, I think we ought to provide a
minorly-paravirt FRED to PV guests because there are still some
improvements available as low hanging fruit.

My plan was to have a PV hypervisor leaf advertising paravirt versions
of hardware features, so a guest could see "I don't have architectural
FRED, but I do have paravirt-FRED which is as similar as we can
reasonably make it".  The same goes for a whole bunch of other features.

~Andrew



Re: [PATCH 8/8] x86/spec-ctrl: Mitigate the Zen1 DIV leakage

2023-09-14 Thread Jason Andryuk
On Wed, Sep 13, 2023 at 6:09 PM Andrew Cooper  wrote:
> @@ -955,6 +960,40 @@ static void __init srso_calculations(bool hw_smt_enabled)
>  setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
>  }
>
> +/*
> + * Div leakage is specific to the AMD Zen1 microarchitecture.  Use STIBP as a
> + * heuristic to select between Zen1 and Zen2 uarches.
> + */
> +static bool __init has_div_vuln(void)
> +{
> +if ( !(boot_cpu_data.x86_vendor &
> +   (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> +return false;
> +
> +if ( (boot_cpu_data.x86 != 0x17 && boot_cpu_data.x86 != 0x18) ||
> + !boot_cpu_has(X86_FEATURE_AMD_STIBP) )
> +return false;
> +
> +return true;
> +}
> +
> +static void __init div_calculations(bool hw_smt_enabled)
> +{
> +bool cpu_bug_div = has_div_vuln();
> +

Would it make sense to add
if ( !cpu_bug_div )
return
...

> +if ( opt_div_scrub == -1 )
> +opt_div_scrub = cpu_bug_div;
> +
> +if ( opt_div_scrub )
> +setup_force_cpu_cap(X86_FEATURE_SC_DIV);

...so that div-scrub=1 isn't setting X86_FEATURE_SC_DIV on unaffected
hardware?  Or do you want to leave command line control in place in
case it might be needed as a future workaround on other hardware?

Thanks,
Jason
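
For clarity, a sketch of the early-return shape being asked about (illustrative
only, trimmed to the lines quoted above, not a proposed patch):

static void __init div_calculations(bool hw_smt_enabled)
{
    /* Bail out early on unaffected hardware, so "div-scrub=1" cannot
     * force X86_FEATURE_SC_DIV where the erratum does not apply. */
    if ( !has_div_vuln() )
        return;

    if ( opt_div_scrub == -1 )
        opt_div_scrub = 1;

    if ( opt_div_scrub )
        setup_force_cpu_cap(X86_FEATURE_SC_DIV);
}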



Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support

2023-09-14 Thread andrew . cooper3
On 14/09/2023 7:02 am, Juergen Gross wrote:
> On 14.09.23 06:47, Xin Li wrote:
>> Add an always inline API __wrmsrns() to embed the WRMSRNS instruction
>> into the code.
>>
>> Tested-by: Shan Kang 
>> Signed-off-by: Xin Li 
>
> In order to avoid having to add paravirt support for WRMSRNS I think
> xen_init_capabilities() should gain:
>
> +    setup_clear_cpu_cap(X86_FEATURE_WRMSRNS);

Xen PV guests will never ever see WRMSRNS.  Operating in CPL3, they have
no possible way of adjusting an MSR which isn't serialising, because
even the hypercall forms are serialising.

Xen only exposes the bit for HVM guests.

~Andrew



Re: [PATCH v2] x86/amd: do not expose HWCR.TscFreqSel to guests

2023-09-14 Thread Roger Pau Monné
On Thu, Sep 14, 2023 at 02:49:45PM +0200, Jan Beulich wrote:
> On 14.09.2023 14:37, Roger Pau Monné wrote:
> > On Thu, Sep 14, 2023 at 07:52:33AM +0200, Jan Beulich wrote:
> >> On 13.09.2023 16:52, Roger Pau Monne wrote:
> >>> OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
> >>> Invariant, and it will then attempt to also unconditionally access 
> >>> PSTATE0 if
> >>> HWCR.TscFreqSel is set (currently the case on Xen).
> >>>
> >>> The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written 
> >>> down in
> >>> the PPR, but it's natural for OSes to attempt to fetch the P0 frequency 
> >>> if the
> >>> TSC increments at the P0 frequency.
> >>>
> >>> Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable 
> >>> solution
> >>> because the PstateEn bit is read-write, and OSes could legitimately 
> >>> attempt to
> >>> set PstateEn=1 which Xen couldn't handle.
> >>>
> >>> In order to fix expose an empty HWCR, which is an architectural MSR and 
> >>> so must
> >>> be accessible.
> >>>
> >>> Note it was not safe to expose the TscFreqSel bit because it is not
> >>> architectural, and could change meaning between models.
> >>
> >> This imo wants (or even needs) extending to address the aspect of then
> >> exposing, on newer families, a r/o bit with the wrong value.
> > 
> > We could always be exposing bits with the wrong values on newer
> > (unreleased?) families, I'm not sure why it needs explicit mentioning
> > here.
> 
> Hmm, yes, that's one way to look at things. Yet exposing plain zero is
> pretty clearly not within spec here,

As I understand it, the fact that HWCR.TscFreqSel is read-only doesn't
exclude the possibility of it changing using other means (iow: we
should consider that a write to a different register could have the
side effect of toggling the bit).

The PPR I'm reading doesn't mention that the bit must be 1, just that
it's 1 on reset and read-only.

Thanks, Roger.



Re: [PATCH v2] x86/amd: do not expose HWCR.TscFreqSel to guests

2023-09-14 Thread Andrew Cooper
On 13/09/2023 3:52 pm, Roger Pau Monne wrote:
> OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
> Invariant, and it will then attempt to also unconditionally access PSTATE0 if
> HWCR.TscFreqSel is set (currently the case on Xen).
>
> The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written down 
> in
> the PPR, but it's natural for OSes to attempt to fetch the P0 frequency if the
> TSC increments at the P0 frequency.

"TSC is stated to increment at ..."  would be slightly clearer IMO.

>
> Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable solution
> because the PstateEn bit is read-write, and OSes could legitimately attempt to
> set PstateEn=1 which Xen couldn't handle.
>
> In order to fix expose an empty HWCR, which is an architectural MSR and so 
> must
> be accessible.
>
> Note it was not safe to expose the TscFreqSel bit because it is not
> architectural, and could change meaning between models.

I'd suggest rearranging and adjusting these two paragraphs.

---
Furthermore, the TscFreqSel bit is model specific and was never safe to
expose like this in the first place.  At a minimum it should have had a
toolstack adjustment to know not to migrate such a VM.

Therefore, simply remove the bit.  Note the MSR_HWCR itself is an
architectural register, and does need to be accessible by the guest.
---

>
> Reported-by: Solène Rapenne 
> Link: https://github.com/QubesOS/qubes-issues/issues/8502
> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests')
> Signed-off-by: Roger Pau Monné 

For the change itself, Reviewed-by: Andrew Cooper




Re: [PATCH v2] x86/amd: do not expose HWCR.TscFreqSel to guests

2023-09-14 Thread Jan Beulich
On 14.09.2023 14:37, Roger Pau Monné wrote:
> On Thu, Sep 14, 2023 at 07:52:33AM +0200, Jan Beulich wrote:
>> On 13.09.2023 16:52, Roger Pau Monne wrote:
>>> OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
>>> Invariant, and it will then attempt to also unconditionally access PSTATE0 
>>> if
>>> HWCR.TscFreqSel is set (currently the case on Xen).
>>>
>>> The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written 
>>> down in
>>> the PPR, but it's natural for OSes to attempt to fetch the P0 frequency if 
>>> the
>>> TSC increments at the P0 frequency.
>>>
>>> Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable 
>>> solution
>>> because the PstateEn bit is read-write, and OSes could legitimately attempt 
>>> to
>>> set PstateEn=1 which Xen couldn't handle.
>>>
>>> In order to fix expose an empty HWCR, which is an architectural MSR and so 
>>> must
>>> be accessible.
>>>
>>> Note it was not safe to expose the TscFreqSel bit because it is not
>>> architectural, and could change meaning between models.
>>
>> This imo wants (or even needs) extending to address the aspect of then
>> exposing, on newer families, a r/o bit with the wrong value.
> 
> We could always be exposing bits with the wrong values on newer
> (unreleased?) families, I'm not sure why it needs explicit mentioning
> here.

Hmm, yes, that's one way to look at things. Yet exposing plain zero is
pretty clearly not within spec here, and any such quirks we add
generally want justifying imo (as they might bite us again later).

>>> Reported-by: Solène Rapenne 
>>> Link: https://github.com/QubesOS/qubes-issues/issues/8502
>>> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests')
>>> Signed-off-by: Roger Pau Monné 
>>
>> As mentioned before, with this Fixes: tag I think the description
>> absolutely needs to mention the (changed) view on the Linux log message
>> that had triggered the original change. It certainly helps here that
>> Thomas has already signaled agreement to a Linux side change.
> 
> Sure, what about:
> 
> "The motivation for exposing HWCR.TscFreqSel was to avoid warning
> messages from Linux.  It has been agreed that Linux should be changed
> instead to not complain about missing HWCR.TscFreqSel when running
> virtualized."

Reads okay to me, thanks.

Jan



Re: [PATCH v2] x86/amd: do not expose HWCR.TscFreqSel to guests

2023-09-14 Thread Roger Pau Monné
On Thu, Sep 14, 2023 at 07:52:33AM +0200, Jan Beulich wrote:
> On 13.09.2023 16:52, Roger Pau Monne wrote:
> > OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
> > Invariant, and it will then attempt to also unconditionally access PSTATE0 
> > if
> > HWCR.TscFreqSel is set (currently the case on Xen).
> > 
> > The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written 
> > down in
> > the PPR, but it's natural for OSes to attempt to fetch the P0 frequency if 
> > the
> > TSC increments at the P0 frequency.
> > 
> > Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable 
> > solution
> > because the PstateEn bit is read-write, and OSes could legitimately attempt 
> > to
> > set PstateEn=1 which Xen couldn't handle.
> > 
> > In order to fix expose an empty HWCR, which is an architectural MSR and so 
> > must
> > be accessible.
> > 
> > Note it was not safe to expose the TscFreqSel bit because it is not
> > architectural, and could change meaning between models.
> 
> This imo wants (or even needs) extending to address the aspect of then
> exposing, on newer families, a r/o bit with the wrong value.

We could always be exposing bits with the wrong values on newer
(unreleased?) families, I'm not sure why it needs explicit mentioning
here.

> > Reported-by: Solène Rapenne 
> > Link: https://github.com/QubesOS/qubes-issues/issues/8502
> > Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests')
> > Signed-off-by: Roger Pau Monné 
> 
> As mentioned before, with this Fixes: tag I think the description
> absolutely needs to mention the (changed) view on the Linux log message
> that had triggered the original change. It certainly helps here that
> Thomas has already signaled agreement to a Linux side change.

Sure, what about:

"The motivation for exposing HWCR.TscFreqSel was to avoid warning
messages from Linux.  It has been agreed that Linux should be changed
instead to not complain about missing HWCR.TscFreqSel when running
virtualized."

Thanks, Roger.


