Re: arm32 tools/flask build failure

2020-12-17 Thread Jan Beulich
On 17.12.2020 02:54, Stefano Stabellini wrote:
> On Tue, 15 Dec 2020, Stefano Stabellini wrote:
>> Hi all,
>>
>> I am building Xen tools for ARM32 using qemu-user. I am getting the
>> following error building tools/flask. Everything else works fine. It is
>> worth noting that make -j1 works fine; it is only make -j4 that fails.
>>
>> I played with .NOTPARALLEL but couldn't get it to work. Does anyone have
>> any ideas?
>>
>> Cheers,
>>
>> Stefano
>>
>>
>> make[2]: Leaving directory '/build/tools/flask/utils'
>> make[1]: Leaving directory '/build/tools/flask'
>> make[1]: Entering directory '/build/tools/flask'
>> /usr/bin/make -C policy all
>> make[2]: Entering directory '/build/tools/flask/policy'
>> make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
>> /build/tools/flask/policy/Makefile.common:115: *** target pattern contains no '%'.  Stop.
>> make[2]: Leaving directory '/build/tools/flask/policy'
>> make[1]: *** [/build/tools/flask/../../tools/Rules.mk:160: subdir-all-policy] Error 2
>> make[1]: Leaving directory '/build/tools/flask'
>> make: *** [/build/tools/flask/../../tools/Rules.mk:155: subdirs-all] Error 2
> 
> 
> The fix seems to be turning the problematic variable:
> 
> POLICY_FILENAME = $(FLASK_BUILD_DIR)/xenpolicy-$(shell $(MAKE) -C $(XEN_ROOT)/xen xenversion --no-print-directory)
> 
> into a rule.

At first glance this looks like just papering over the issue. When
I had looked at it yesterday after seeing your mail, I didn't even
spot this "interesting" make recursion. What I'd like to understand
first is where the % is coming from - the error message clearly
suggests that there's a % in the filename. Yet

.PHONY: xenversion
xenversion:
	@echo $(XEN_FULLVERSION)

doesn't make clear to me where the % might be coming from. Of course
there's nothing at all precluding e.g. $(XEN_VENDORVERSION) from
containing a % character, but I don't think that's what you're running
into.
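For reference, one way this diagnostic can surface at all is when text containing a colon leaks into the expansion used as the target: the rule line then parses as a static pattern rule whose pattern lacks a '%'. A minimal reproduction, assuming GNU make and nothing about this particular tree (file name and leaked text are hypothetical):

```shell
# Purely illustrative reproduction (hypothetical file, not the tree):
# show how colon-containing text leaking into a make variable that is
# later used in a rule line produces this exact diagnostic.
cat > demo.mk <<'EOF'
# Pretend the recursive $(MAKE) leaked a message into the expansion:
POLICY_FILENAME = xenpolicy-make[3]: Entering directory '/build'
all: $(POLICY_FILENAME)
EOF

# The expanded rule line now holds two colons, so GNU make parses it as
# a static pattern rule and rejects the '%'-less pattern:
make -f demo.mk 2>&1 | grep "target pattern contains no" || true
```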

> --- a/tools/flask/policy/Makefile.common
> +++ b/tools/flask/policy/Makefile.common
> @@ -35,7 +35,6 @@ OUTPUT_POLICY ?= $(BEST_POLICY_VER)
>  #
>  
>  
> -POLICY_FILENAME = $(FLASK_BUILD_DIR)/xenpolicy-$(shell $(MAKE) -C $(XEN_ROOT)/xen xenversion --no-print-directory)
>  POLICY_LOADPATH = /boot
>  
>  # List of policy versions supported by the hypervisor
> @@ -112,17 +111,19 @@ POLICY_SECTIONS += $(USERS)
>  POLICY_SECTIONS += $(ALL_CONSTRAINTS)
>  POLICY_SECTIONS += $(ISID_DEFS) $(DEV_OCONS)
>  
> -all: $(POLICY_FILENAME)
> +policy:

This is a phony target, isn't it? It then also needs marking as such.
However, ...

> -install: $(POLICY_FILENAME)
> +all: policy
> +
> +install: policy
>   $(INSTALL_DIR) $(DESTDIR)/$(POLICY_LOADPATH)
>   $(INSTALL_DATA) $^ $(DESTDIR)/$(POLICY_LOADPATH)
>  
>  uninstall:
>   rm -f $(DESTDIR)/$(POLICY_LOADPATH)/$(POLICY_FILENAME)
>  
> -$(POLICY_FILENAME): $(FLASK_BUILD_DIR)/policy.conf
> - $(CHECKPOLICY) $(CHECKPOLICY_PARAM) $^ -o $@
> +policy: $(FLASK_BUILD_DIR)/policy.conf
> + $(CHECKPOLICY) $(CHECKPOLICY_PARAM) $^ -o xenpolicy-"$$($(MAKE) -C $(XEN_ROOT)/xen xenversion --no-print-directory)"

... wouldn't it make sense to latch the version into an output
file, and use that as the target? Along the lines of

xenversion:
	$(MAKE) -C $(XEN_ROOT)/xen --no-print-directory $@ >$@

but possibly utilizing move-if-changed. This would then result in
more "conventional" make recursion.
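Spelled out, the suggestion could take roughly this shape. The FORCE idiom and the move-if-changed macro are assumed helpers in the spirit of Xen's Config.mk/Rules.mk, not verbatim tree contents:

```shell
# Illustrative sketch of latching the version into a file; written to a
# scratch makefile here only so the intended rule shape is concrete.
# FORCE and move-if-changed are assumptions, not verbatim tree contents.
cat > latch.mk <<'EOF'
xenversion: FORCE
	$(MAKE) -C $(XEN_ROOT)/xen --no-print-directory xenversion >$@.tmp
	$(call move-if-changed,$@.tmp,$@)

FORCE:
EOF
```

The policy rule could then read the latched file at recipe time (e.g. `$$(cat xenversion)`) instead of running a recursive make during variable expansion, which is what produces the unconventional recursion in the current Makefile.common.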

Jan



RE: [xen-unstable-smoke bisection] complete build-amd64-libvirt

2020-12-17 Thread Paul Durrant
> -----Original Message-----
> From: Wei Liu 
> Sent: 16 December 2020 10:44
> To: Andrew Cooper ; Paul Durrant 
> Cc: osstest service owner ; 
> xen-devel@lists.xenproject.org; Paul Durrant
> ; Wei Liu 
> Subject: Re: [xen-unstable-smoke bisection] complete build-amd64-libvirt
> 
> Paul, are you able to cook up a patch today? If not I will revert the
> offending patch(es).
> 

Sorry I was otherwise occupied yesterday. It's not so simple to avoid
the API change the way things are in the series... it will take a
reasonable amount of re-factoring to avoid it. I'll re-base and fix it.

  Paul

> Wei.
> 
> On Wed, Dec 16, 2020 at 10:17:29AM +, Andrew Cooper wrote:
> > On 16/12/2020 02:27, osstest service owner wrote:
> > > branch xen-unstable-smoke
> > > xenbranch xen-unstable-smoke
> > > job build-amd64-libvirt
> > > testid libvirt-build
> > >
> > > Tree: libvirt git://xenbits.xen.org/libvirt.git
> > > Tree: libvirt_keycodemapdb https://gitlab.com/keycodemap/keycodemapdb.git
> > > Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
> > > Tree: qemuu git://xenbits.xen.org/qemu-xen.git
> > > Tree: xen git://xenbits.xen.org/xen.git
> > >
> > > *** Found and reproduced problem changeset ***
> > >
> > >   Bug is in tree:  xen git://xenbits.xen.org/xen.git
> > >   Bug introduced:  929f23114061a0089e6d63d109cf6a1d03d35c71
> > >   Bug not present: 8bc342b043a6838c03cd86039a34e3f8eea1242f
> > >   Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/157589/
> > >
> > >
> > >   commit 929f23114061a0089e6d63d109cf6a1d03d35c71
> > >   Author: Paul Durrant 
> > >   Date:   Tue Dec 8 19:30:26 2020 +
> > >
> > >   libxl: introduce 'libxl_pci_bdf' in the idl...
> > >
> > >   ... and use in 'libxl_device_pci'
> > >
> > >   This patch is preparatory work for restricting the type passed
> > >   to functions that only require BDF information, rather than
> > >   passing a 'libxl_device_pci' structure which is only partially
> > >   filled. In this patch only the minimal mechanical changes
> > >   necessary to deal with the structural changes are made.
> > >   Subsequent patches will adjust the code to make better use of
> > >   the new type.
> > >
> > >   Signed-off-by: Paul Durrant 
> > >   Acked-by: Wei Liu 
> > >   Acked-by: Nick Rosbrook 
> >
> > This breaks the API.  You can't make the following change in the IDL.
> >
> >  libxl_device_pci = Struct("device_pci", [
> > -    ("func",      uint8),
> > -    ("dev",       uint8),
> > -    ("bus",       uint8),
> > -    ("domain",    integer),
> > -    ("vdevfn",    uint32),
> > +    ("bdf",       libxl_pci_bdf),
> > +    ("vdevfn",    uint32),
> >
> > ~Andrew




[ovmf test] 157612: regressions - FAIL

2020-12-17 Thread osstest service owner
flight 157612 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157612/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 157345
 test-amd64-amd64-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 157345

version targeted for testing:
 ovmf e6ae24e1d676bb2bdc0fc715b49b04908f41fc10
baseline version:
 ovmf f95e80d832e923046c92cd6f0b8208cec147138e

Last test of basis   157345  2020-12-09 12:40:46 Z    7 days
Failing since        157348  2020-12-09 15:39:39 Z    7 days   51 attempts
Testing same since   157612  2020-12-16 21:09:14 Z    0 days    1 attempts


People who touched revisions under test:
  Abner Chang 
  Ard Biesheuvel 
  Baraneedharan Anbazhagan 
  Bret Barkelew 
  Chen, Christine 
  Fan Wang 
  James Bottomley 
  Jiaxin Wu 
  Marc Moisson-Franckhauser 
  Michael D Kinney 
  Michael Kubacki 
  Pierre Gondois 
  Ray Ni 
  Rebecca Cran 
  Sami Mujawar 
  Sean Brogan 
  Sheng Wei 
  Siyuan Fu 
  Star Zeng 
  Ting Ye 
  Yuwei Chen 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
 test-amd64-i386-xl-qemuu-ovmf-amd64  fail



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 699 lines long.)



[PATCH v3 01/15] x86/xen: use specific Xen pv interrupt entry for MCE

2020-12-17 Thread Juergen Gross
Xen PV guests don't use IST. For machine check interrupts switch to
the same model as debug interrupts.

Signed-off-by: Juergen Gross 
Acked-by: Peter Zijlstra (Intel) 
Reviewed-by: Thomas Gleixner 
---
 arch/x86/include/asm/idtentry.h |  3 +++
 arch/x86/xen/enlighten_pv.c | 16 +++-
 arch/x86/xen/xen-asm.S  |  2 +-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 247a60a47331..5dd64404715a 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -585,6 +585,9 @@ DECLARE_IDTENTRY_MCE(X86_TRAP_MC,   exc_machine_check);
 #else
 DECLARE_IDTENTRY_RAW(X86_TRAP_MC,  exc_machine_check);
 #endif
+#ifdef CONFIG_XEN_PV
+DECLARE_IDTENTRY_RAW(X86_TRAP_MC,  xenpv_exc_machine_check);
+#endif
 #endif
 
 /* NMI */
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4409306364dc..9f5e44c1f70a 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -583,6 +583,20 @@ DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
exc_debug(regs);
 }
 
+#ifdef CONFIG_X86_MCE
+DEFINE_IDTENTRY_RAW(xenpv_exc_machine_check)
+{
+   /*
+* There's no IST on Xen PV, but we still need to dispatch
+* to the correct handler.
+*/
+   if (user_mode(regs))
+   noist_exc_machine_check(regs);
+   else
+   exc_machine_check(regs);
+}
+#endif
+
 struct trap_array_entry {
void (*orig)(void);
void (*xen)(void);
@@ -603,7 +617,7 @@ static struct trap_array_entry trap_array[] = {
TRAP_ENTRY_REDIR(exc_debug, true  ),
TRAP_ENTRY(exc_double_fault,true  ),
 #ifdef CONFIG_X86_MCE
-   TRAP_ENTRY(exc_machine_check,   true  ),
+   TRAP_ENTRY_REDIR(exc_machine_check, true  ),
 #endif
TRAP_ENTRY_REDIR(exc_nmi,   true  ),
TRAP_ENTRY(exc_int3,false ),
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 1cb0e84b9161..bc2586730a5b 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -172,7 +172,7 @@ xen_pv_trap asm_exc_spurious_interrupt_bug
 xen_pv_trap asm_exc_coprocessor_error
 xen_pv_trap asm_exc_alignment_check
 #ifdef CONFIG_X86_MCE
-xen_pv_trap asm_exc_machine_check
+xen_pv_trap asm_xenpv_exc_machine_check
 #endif /* CONFIG_X86_MCE */
 xen_pv_trap asm_exc_simd_coprocessor_error
 #ifdef CONFIG_IA32_EMULATION
-- 
2.26.2




[PATCH v3 04/15] x86/xen: drop USERGS_SYSRET64 paravirt call

2020-12-17 Thread Juergen Gross
USERGS_SYSRET64 is used to return from a syscall via sysret, but
a Xen PV guest will nevertheless use the iret hypercall, as there
is no sysret PV hypercall defined.

So instead of testing all the prerequisites for doing a sysret and
then mangling the stack for Xen PV again for doing an iret, just use
the iret exit from the beginning.

This can easily be done via an ALTERNATIVE like it is done for the
sysenter compat case already.

It should be noted that this drops the optimization in Xen for not
restoring a few registers when returning to user mode, but it seems
as if the saved instructions in the kernel more than compensate for
this drop (a kernel build in a Xen PV guest was slightly faster with
this patch applied).

While at it remove the stale sysret32 remnants.

Signed-off-by: Juergen Gross 
---
V3:
- simplify ALTERNATIVE (Boris Petkov)
---
 arch/x86/entry/entry_64.S | 16 +++-
 arch/x86/include/asm/irqflags.h   |  6 --
 arch/x86/include/asm/paravirt.h   |  5 -
 arch/x86/include/asm/paravirt_types.h |  8 
 arch/x86/kernel/asm-offsets_64.c  |  2 --
 arch/x86/kernel/paravirt.c|  5 +
 arch/x86/kernel/paravirt_patch.c  |  4 
 arch/x86/xen/enlighten_pv.c   |  1 -
 arch/x86/xen/xen-asm.S| 20 
 arch/x86/xen/xen-ops.h|  2 --
 10 files changed, 8 insertions(+), 61 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a876204a73e0..ce0464d630a2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -46,14 +46,6 @@
 .code64
 .section .entry.text, "ax"
 
-#ifdef CONFIG_PARAVIRT_XXL
-SYM_CODE_START(native_usergs_sysret64)
-   UNWIND_HINT_EMPTY
-   swapgs
-   sysretq
-SYM_CODE_END(native_usergs_sysret64)
-#endif /* CONFIG_PARAVIRT_XXL */
-
 /*
  * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
  *
@@ -123,7 +115,12 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
 * Try to use SYSRET instead of IRET if we're returning to
 * a completely clean 64-bit userspace context.  If we're not,
 * go to the slow exit path.
+* In the Xen PV case we must use iret anyway.
 */
+
+   ALTERNATIVE "", "jmp swapgs_restore_regs_and_return_to_usermode", \
+   X86_FEATURE_XENPV
+
movqRCX(%rsp), %rcx
movqRIP(%rsp), %r11
 
@@ -215,7 +212,8 @@ syscall_return_via_sysret:
 
popq%rdi
popq%rsp
-   USERGS_SYSRET64
+   swapgs
+   sysretq
 SYM_CODE_END(entry_SYSCALL_64)
 
 /*
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 8c86edefa115..e585a4705b8d 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -132,12 +132,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 #endif
 
 #define INTERRUPT_RETURN   jmp native_iret
-#define USERGS_SYSRET64\
-   swapgs; \
-   sysretq;
-#define USERGS_SYSRET32\
-   swapgs; \
-   sysretl
 
 #else
 #define INTERRUPT_RETURN   iret
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f2ebe109a37e..dd43b1100a87 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -776,11 +776,6 @@ extern void default_banner(void);
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT_XXL
-#define USERGS_SYSRET64 \
-   PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64),   \
- ANNOTATE_RETPOLINE_SAFE;  \
- jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);)
-
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)\
PARA_SITE(PARA_PATCH(PV_IRQ_save_fl),   \
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 130f428b0cc8..0169365f1403 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -156,14 +156,6 @@ struct pv_cpu_ops {
 
u64 (*read_pmc)(int counter);
 
-   /*
-* Switch to usermode gs and return to 64-bit usermode using
-* sysret.  Only used in 64-bit kernels to return to 64-bit
-* processes.  Usermode register state, including %rsp, must
-* already be restored.
-*/
-   void (*usergs_sysret64)(void);
-
/* Normal iret.  Jump to this with the standard iret stack
   frame set up. */
void (*iret)(void);
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 1354bc30614d..b14533af7676 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -13,8 +13,6 @@ int main(voi

[PATCH v3 02/15] x86/xen: use specific Xen pv interrupt entry for DF

2020-12-17 Thread Juergen Gross
Xen PV guests don't use IST. For double fault interrupts switch to
the same model as NMI.

Correct a typo in a comment while copying it.

Signed-off-by: Juergen Gross 
Acked-by: Peter Zijlstra (Intel) 
Reviewed-by: Thomas Gleixner 
---
V2:
- fix typo (Andy Lutomirski)
---
 arch/x86/include/asm/idtentry.h |  3 +++
 arch/x86/xen/enlighten_pv.c | 10 --
 arch/x86/xen/xen-asm.S  |  2 +-
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 5dd64404715a..3ac84cb702fc 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -608,6 +608,9 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_DB,   xenpv_exc_debug);
 
 /* #DF */
 DECLARE_IDTENTRY_DF(X86_TRAP_DF,   exc_double_fault);
+#ifdef CONFIG_XEN_PV
+DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF,xenpv_exc_double_fault);
+#endif
 
 /* #VC */
 #ifdef CONFIG_AMD_MEM_ENCRYPT
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 9f5e44c1f70a..76616024129e 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -567,10 +567,16 @@ void noist_exc_debug(struct pt_regs *regs);
 
 DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
 {
-   /* On Xen PV, NMI doesn't use IST.  The C part is the sane as native. */
+   /* On Xen PV, NMI doesn't use IST.  The C part is the same as native. */
exc_nmi(regs);
 }
 
+DEFINE_IDTENTRY_RAW_ERRORCODE(xenpv_exc_double_fault)
+{
+   /* On Xen PV, DF doesn't use IST.  The C part is the same as native. */
+   exc_double_fault(regs, error_code);
+}
+
 DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
 {
/*
@@ -615,7 +621,7 @@ struct trap_array_entry {
 
 static struct trap_array_entry trap_array[] = {
TRAP_ENTRY_REDIR(exc_debug, true  ),
-   TRAP_ENTRY(exc_double_fault,true  ),
+   TRAP_ENTRY_REDIR(exc_double_fault,  true  ),
 #ifdef CONFIG_X86_MCE
TRAP_ENTRY_REDIR(exc_machine_check, true  ),
 #endif
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index bc2586730a5b..1d054c915046 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -161,7 +161,7 @@ xen_pv_trap asm_exc_overflow
 xen_pv_trap asm_exc_bounds
 xen_pv_trap asm_exc_invalid_op
 xen_pv_trap asm_exc_device_not_available
-xen_pv_trap asm_exc_double_fault
+xen_pv_trap asm_xenpv_exc_double_fault
 xen_pv_trap asm_exc_coproc_segment_overrun
 xen_pv_trap asm_exc_invalid_tss
 xen_pv_trap asm_exc_segment_not_present
-- 
2.26.2




[PATCH v3 00/15] x86: major paravirt cleanup

2020-12-17 Thread Juergen Gross
This is a major cleanup of the paravirt infrastructure aiming at
eliminating all custom code patching via paravirt patching.

This is achieved by using ALTERNATIVE instead, leading to the ability
to give objtool access to the patched in instructions.

In order to remove most of the 32-bit special handling from pvops, the
time-related operations are switched to use static_call() instead.

At the end of this series all paravirt patching has to do is to
replace indirect calls with direct ones. In a further step this could
be switched to static_call(), too, but that would require a major
header file disentangling.

Changes in V3:
- added patches 7 and 12
- addressed all comments

Changes in V2:
- added patches 5-12

Juergen Gross (14):
  x86/xen: use specific Xen pv interrupt entry for MCE
  x86/xen: use specific Xen pv interrupt entry for DF
  x86/pv: switch SWAPGS to ALTERNATIVE
  x86/xen: drop USERGS_SYSRET64 paravirt call
  x86: rework arch_local_irq_restore() to not use popf
  x86/paravirt: switch time pvops functions to use static_call()
  x86/alternative: support "not feature" and ALTERNATIVE_TERNARY
  x86: add new features for paravirt patching
  x86/paravirt: remove no longer needed 32-bit pvops cruft
  x86/paravirt: simplify paravirt macros
  x86/paravirt: switch iret pvops to ALTERNATIVE
  x86/paravirt: add new macros PVOP_ALT* supporting pvops in
ALTERNATIVEs
  x86/paravirt: switch functions with custom code to ALTERNATIVE
  x86/paravirt: have only one paravirt patch function

Peter Zijlstra (1):
  objtool: Alternatives vs ORC, the hard way

 arch/x86/Kconfig   |   1 +
 arch/x86/entry/entry_32.S  |   4 +-
 arch/x86/entry/entry_64.S  |  26 ++-
 arch/x86/include/asm/alternative-asm.h |   3 +
 arch/x86/include/asm/alternative.h |   7 +
 arch/x86/include/asm/cpufeatures.h |   2 +
 arch/x86/include/asm/idtentry.h|   6 +
 arch/x86/include/asm/irqflags.h|  51 ++
 arch/x86/include/asm/mshyperv.h|  11 --
 arch/x86/include/asm/paravirt.h| 157 ++
 arch/x86/include/asm/paravirt_time.h   |  38 +
 arch/x86/include/asm/paravirt_types.h  | 220 +
 arch/x86/kernel/Makefile   |   3 +-
 arch/x86/kernel/alternative.c  |  59 ++-
 arch/x86/kernel/asm-offsets.c  |   7 -
 arch/x86/kernel/asm-offsets_64.c   |   3 -
 arch/x86/kernel/cpu/vmware.c   |   5 +-
 arch/x86/kernel/irqflags.S |  11 --
 arch/x86/kernel/kvm.c  |   3 +-
 arch/x86/kernel/kvmclock.c |   3 +-
 arch/x86/kernel/paravirt.c |  83 +++---
 arch/x86/kernel/paravirt_patch.c   | 109 
 arch/x86/kernel/tsc.c  |   3 +-
 arch/x86/xen/enlighten_pv.c|  36 ++--
 arch/x86/xen/irq.c |  23 ---
 arch/x86/xen/time.c|  12 +-
 arch/x86/xen/xen-asm.S |  52 +-
 arch/x86/xen/xen-ops.h |   3 -
 drivers/clocksource/hyperv_timer.c |   5 +-
 drivers/xen/time.c |   3 +-
 kernel/sched/sched.h   |   1 +
 tools/objtool/check.c  | 180 ++--
 tools/objtool/check.h  |   5 +
 tools/objtool/orc_gen.c| 178 +---
 34 files changed, 627 insertions(+), 686 deletions(-)
 create mode 100644 arch/x86/include/asm/paravirt_time.h
 delete mode 100644 arch/x86/kernel/paravirt_patch.c

-- 
2.26.2




[PATCH v3 03/15] x86/pv: switch SWAPGS to ALTERNATIVE

2020-12-17 Thread Juergen Gross
SWAPGS is used only for interrupts coming from user mode or for
returning to user mode. So there is no reason to use the PARAVIRT
framework, as it can easily be replaced by an ALTERNATIVE depending
on X86_FEATURE_XENPV.

There are several instances using the PV-aware SWAPGS macro in paths
which are never executed in a Xen PV guest. Replace those with the
plain swapgs instruction. The same applies to SWAPGS_UNSAFE_STACK.

Signed-off-by: Juergen Gross 
Acked-by: Andy Lutomirski 
Acked-by: Peter Zijlstra (Intel) 
Reviewed-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
---
 arch/x86/entry/entry_64.S | 10 +-
 arch/x86/include/asm/irqflags.h   | 20 
 arch/x86/include/asm/paravirt.h   | 20 
 arch/x86/include/asm/paravirt_types.h |  2 --
 arch/x86/kernel/asm-offsets_64.c  |  1 -
 arch/x86/kernel/paravirt.c|  1 -
 arch/x86/kernel/paravirt_patch.c  |  3 ---
 arch/x86/xen/enlighten_pv.c   |  3 ---
 8 files changed, 13 insertions(+), 47 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index cad08703c4ad..a876204a73e0 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -669,7 +669,7 @@ native_irq_return_ldt:
 */
 
pushq   %rdi/* Stash user RDI */
-   SWAPGS  /* to kernel GS */
+   swapgs  /* to kernel GS */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi   /* to kernel CR3 */
 
movqPER_CPU_VAR(espfix_waddr), %rdi
@@ -699,7 +699,7 @@ native_irq_return_ldt:
orq PER_CPU_VAR(espfix_stack), %rax
 
SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
-   SWAPGS  /* to user GS */
+   swapgs  /* to user GS */
popq%rdi/* Restore user RDI */
 
movq%rax, %rsp
@@ -943,7 +943,7 @@ SYM_CODE_START_LOCAL(paranoid_entry)
ret
 
 .Lparanoid_entry_swapgs:
-   SWAPGS
+   swapgs
 
/*
 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
@@ -1001,7 +1001,7 @@ SYM_CODE_START_LOCAL(paranoid_exit)
jnz restore_regs_and_return_to_kernel
 
/* We are returning to a context with user GSBASE */
-   SWAPGS_UNSAFE_STACK
+   swapgs
jmp restore_regs_and_return_to_kernel
 SYM_CODE_END(paranoid_exit)
 
@@ -1426,7 +1426,7 @@ nmi_no_fsgsbase:
jnz nmi_restore
 
 nmi_swapgs:
-   SWAPGS_UNSAFE_STACK
+   swapgs
 
 nmi_restore:
POP_REGS
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 2dfc8d380dab..8c86edefa115 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -131,18 +131,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 #define SAVE_FLAGS(x)  pushfq; popq %rax
 #endif
 
-#define SWAPGS swapgs
-/*
- * Currently paravirt can't handle swapgs nicely when we
- * don't have a stack we can rely on (such as a user space
- * stack).  So we either find a way around these or just fault
- * and emulate if a guest tries to call swapgs directly.
- *
- * Either way, this is a good way to document that we don't
- * have a reliable stack. x86_64 only.
- */
-#define SWAPGS_UNSAFE_STACKswapgs
-
 #define INTERRUPT_RETURN   jmp native_iret
 #define USERGS_SYSRET64\
swapgs; \
@@ -170,6 +158,14 @@ static __always_inline int arch_irqs_disabled(void)
 
return arch_irqs_disabled_flags(flags);
 }
+#else
+#ifdef CONFIG_X86_64
+#ifdef CONFIG_XEN_PV
+#define SWAPGS ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
+#else
+#define SWAPGS swapgs
+#endif
+#endif
 #endif /* !__ASSEMBLY__ */
 
 #endif
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f8dce11d2bc1..f2ebe109a37e 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -776,26 +776,6 @@ extern void default_banner(void);
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT_XXL
-/*
- * If swapgs is used while the userspace stack is still current,
- * there's no way to call a pvop.  The PV replacement *must* be
- * inlined, or the swapgs instruction must be trapped and emulated.
- */
-#define SWAPGS_UNSAFE_STACK\
-   PARA_SITE(PARA_PATCH(PV_CPU_swapgs), swapgs)
-
-/*
- * Note: swapgs is very special, and in practise is either going to be
- * implemented with a single "swapgs" instruction or something very
- * special.  Either way, we don't need to save any registers for
- * it.
- */
-#define SWAPGS \
-   PARA_SITE(PARA_PATCH(PV_CPU_swapgs),\
- ANNOTATE_RETPOLINE_SAFE;  \
- call

[PATCH v3 10/15] x86/paravirt: simplify paravirt macros

2020-12-17 Thread Juergen Gross
The central pvops call macros PVOP_CALL() and PVOP_VCALL() are
looking very similar now.

The main differences are using PVOP_VCALL_ARGS or PVOP_CALL_ARGS, which
are identical, and the return value handling.

So drop PVOP_VCALL_ARGS and instead of PVOP_VCALL() just use
(void)PVOP_CALL(long, ...).

Note that it isn't easily possible to just redefine PVOP_VCALL()
to use PVOP_CALL() instead, as this would require further hiding of
commas in macro parameters.

Signed-off-by: Juergen Gross 
---
V3:
- new patch
---
 arch/x86/include/asm/paravirt_types.h | 25 -
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 42f9eef84131..a9efd4dad820 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -408,11 +408,9 @@ int paravirt_disable_iospace(void);
  * makes sure the incoming and outgoing types are always correct.
  */
 #ifdef CONFIG_X86_32
-#define PVOP_VCALL_ARGS \
+#define PVOP_CALL_ARGS \
unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
 
-#define PVOP_CALL_ARGS PVOP_VCALL_ARGS
-
 #define PVOP_CALL_ARG1(x)  "a" ((unsigned long)(x))
 #define PVOP_CALL_ARG2(x)  "d" ((unsigned long)(x))
 #define PVOP_CALL_ARG3(x)  "c" ((unsigned long)(x))
@@ -428,12 +426,10 @@ int paravirt_disable_iospace(void);
 #define VEXTRA_CLOBBERS
 #else  /* CONFIG_X86_64 */
 /* [re]ax isn't an arg, but the return val */
-#define PVOP_VCALL_ARGS\
+#define PVOP_CALL_ARGS \
unsigned long __edi = __edi, __esi = __esi, \
__edx = __edx, __ecx = __ecx, __eax = __eax;
 
-#define PVOP_CALL_ARGS PVOP_VCALL_ARGS
-
 #define PVOP_CALL_ARG1(x)  "D" ((unsigned long)(x))
 #define PVOP_CALL_ARG2(x)  "S" ((unsigned long)(x))
 #define PVOP_CALL_ARG3(x)  "d" ((unsigned long)(x))
@@ -492,25 +488,12 @@ int paravirt_disable_iospace(void);
PVOP_CALL(rettype, op.func, CLBR_RET_REG,   \
  PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
 
-
-#define PVOP_VCALL(op, clbr, call_clbr, extra_clbr, ...)   \
-   ({  \
-   PVOP_VCALL_ARGS;\
-   PVOP_TEST_NULL(op); \
-   asm volatile(paravirt_alt(PARAVIRT_CALL)\
-: call_clbr, ASM_CALL_CONSTRAINT   \
-: paravirt_type(op),   \
-  paravirt_clobber(clbr),  \
-  ##__VA_ARGS__\
-: "memory", "cc" extra_clbr);  \
-   })
-
 #define __PVOP_VCALL(op, ...)  \
-   PVOP_VCALL(op, CLBR_ANY, PVOP_VCALL_CLOBBERS,   \
+   (void)PVOP_CALL(long, op, CLBR_ANY, PVOP_VCALL_CLOBBERS,\
   VEXTRA_CLOBBERS, ##__VA_ARGS__)
 
 #define __PVOP_VCALLEESAVE(op, ...)\
-   PVOP_VCALL(op.func, CLBR_RET_REG,   \
+   (void)PVOP_CALL(long, op.func, CLBR_RET_REG,\
  PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
 
 
-- 
2.26.2




[PATCH v3 05/15] x86: rework arch_local_irq_restore() to not use popf

2020-12-17 Thread Juergen Gross
"popf" is a rather expensive operation, so don't use it for restoring
irq flags. Instead test whether interrupts are enabled in the flags
parameter and enable interrupts via "sti" in that case.

As a result, the restore_fl paravirt op is no longer needed.

Suggested-by: Andy Lutomirski 
Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/irqflags.h   | 20 ++-
 arch/x86/include/asm/paravirt.h   |  5 -
 arch/x86/include/asm/paravirt_types.h |  7 ++-
 arch/x86/kernel/irqflags.S| 11 ---
 arch/x86/kernel/paravirt.c|  1 -
 arch/x86/kernel/paravirt_patch.c  |  3 ---
 arch/x86/xen/enlighten_pv.c   |  2 --
 arch/x86/xen/irq.c| 23 --
 arch/x86/xen/xen-asm.S| 28 ---
 arch/x86/xen/xen-ops.h|  1 -
 10 files changed, 8 insertions(+), 93 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index e585a4705b8d..144d70ea4393 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -35,15 +35,6 @@ extern __always_inline unsigned long native_save_fl(void)
return flags;
 }
 
-extern inline void native_restore_fl(unsigned long flags);
-extern inline void native_restore_fl(unsigned long flags)
-{
-   asm volatile("push %0 ; popf"
-: /* no output */
-:"g" (flags)
-:"memory", "cc");
-}
-
 static __always_inline void native_irq_disable(void)
 {
asm volatile("cli": : :"memory");
@@ -79,11 +70,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
return native_save_fl();
 }
 
-static __always_inline void arch_local_irq_restore(unsigned long flags)
-{
-   native_restore_fl(flags);
-}
-
 static __always_inline void arch_local_irq_disable(void)
 {
native_irq_disable();
@@ -152,6 +138,12 @@ static __always_inline int arch_irqs_disabled(void)
 
return arch_irqs_disabled_flags(flags);
 }
+
+static __always_inline void arch_local_irq_restore(unsigned long flags)
+{
+   if (!arch_irqs_disabled_flags(flags))
+   arch_local_irq_enable();
+}
 #else
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index dd43b1100a87..4abf110e2243 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -648,11 +648,6 @@ static inline notrace unsigned long arch_local_save_flags(void)
return PVOP_CALLEE0(unsigned long, irq.save_fl);
 }
 
-static inline notrace void arch_local_irq_restore(unsigned long f)
-{
-   PVOP_VCALLEE1(irq.restore_fl, f);
-}
-
 static inline notrace void arch_local_irq_disable(void)
 {
PVOP_VCALLEE0(irq.irq_disable);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0169365f1403..de87087d3bde 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -168,16 +168,13 @@ struct pv_cpu_ops {
 struct pv_irq_ops {
 #ifdef CONFIG_PARAVIRT_XXL
/*
-* Get/set interrupt state.  save_fl and restore_fl are only
-* expected to use X86_EFLAGS_IF; all other bits
-* returned from save_fl are undefined, and may be ignored by
-* restore_fl.
+* Get/set interrupt state.  save_fl is expected to use X86_EFLAGS_IF;
+* all other bits returned from save_fl are undefined.
 *
 * NOTE: These functions callers expect the callee to preserve
 * more registers than the standard C calling convention.
 */
struct paravirt_callee_save save_fl;
-   struct paravirt_callee_save restore_fl;
struct paravirt_callee_save irq_disable;
struct paravirt_callee_save irq_enable;
 
diff --git a/arch/x86/kernel/irqflags.S b/arch/x86/kernel/irqflags.S
index 0db0375235b4..8ef35063964b 100644
--- a/arch/x86/kernel/irqflags.S
+++ b/arch/x86/kernel/irqflags.S
@@ -13,14 +13,3 @@ SYM_FUNC_START(native_save_fl)
ret
 SYM_FUNC_END(native_save_fl)
 EXPORT_SYMBOL(native_save_fl)
-
-/*
- * void native_restore_fl(unsigned long flags)
- * %eax/%rdi: flags
- */
-SYM_FUNC_START(native_restore_fl)
-   push %_ASM_ARG1
-   popf
-   ret
-SYM_FUNC_END(native_restore_fl)
-EXPORT_SYMBOL(native_restore_fl)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 18560b71e717..c60222ab8ab9 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -320,7 +320,6 @@ struct paravirt_patch_template pv_ops = {
 
/* Irq ops. */
.irq.save_fl= __PV_IS_CALLEE_SAVE(native_save_fl),
-   .irq.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
.irq.irq_disable= __PV_IS_CALLEE_SAVE(native_irq_disable),
.irq.irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
.irq.safe_halt  = native

[PATCH v3 06/15] x86/paravirt: switch time pvops functions to use static_call()

2020-12-17 Thread Juergen Gross
The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.

Switch them to use the static_call() mechanism instead of pvops, as
this allows considerable simplification of the pvops implementation.

Due to include hell this requires splitting the time interfaces out
into a new header file.

Signed-off-by: Juergen Gross 
---
 arch/x86/Kconfig  |  1 +
 arch/x86/include/asm/mshyperv.h   | 11 
 arch/x86/include/asm/paravirt.h   | 14 --
 arch/x86/include/asm/paravirt_time.h  | 38 +++
 arch/x86/include/asm/paravirt_types.h |  6 -
 arch/x86/kernel/cpu/vmware.c  |  5 ++--
 arch/x86/kernel/kvm.c |  3 ++-
 arch/x86/kernel/kvmclock.c|  3 ++-
 arch/x86/kernel/paravirt.c| 16 ---
 arch/x86/kernel/tsc.c |  3 ++-
 arch/x86/xen/time.c   | 12 -
 drivers/clocksource/hyperv_timer.c|  5 ++--
 drivers/xen/time.c|  3 ++-
 kernel/sched/sched.h  |  1 +
 14 files changed, 71 insertions(+), 50 deletions(-)
 create mode 100644 arch/x86/include/asm/paravirt_time.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8bd298e45b1..ebabd8bf4064 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -769,6 +769,7 @@ if HYPERVISOR_GUEST
 
 config PARAVIRT
bool "Enable paravirtualization code"
+   depends on HAVE_STATIC_CALL
help
  This changes the kernel so it can modify itself when it is run
  under a hypervisor, potentially improving performance significantly
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ffc289992d1b..45942d420626 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -56,17 +56,6 @@ typedef int (*hyperv_fill_flush_list_func)(
 #define hv_get_raw_timer() rdtsc_ordered()
 #define hv_get_vector() HYPERVISOR_CALLBACK_VECTOR
 
-/*
- * Reference to pv_ops must be inline so objtool
- * detection of noinstr violations can work correctly.
- */
-static __always_inline void hv_setup_sched_clock(void *sched_clock)
-{
-#ifdef CONFIG_PARAVIRT
-   pv_ops.time.sched_clock = sched_clock;
-#endif
-}
-
 void hyperv_vector_handler(struct pt_regs *regs);
 
 static inline void hv_enable_stimer0_percpu_irq(int irq) {}
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 4abf110e2243..0785a9686e32 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -17,25 +17,11 @@
 #include 
 #include 
 
-static inline unsigned long long paravirt_sched_clock(void)
-{
-   return PVOP_CALL0(unsigned long long, time.sched_clock);
-}
-
-struct static_key;
-extern struct static_key paravirt_steal_enabled;
-extern struct static_key paravirt_steal_rq_enabled;
-
 __visible void __native_queued_spin_unlock(struct qspinlock *lock);
 bool pv_is_native_spin_unlock(void);
 __visible bool __native_vcpu_is_preempted(long cpu);
 bool pv_is_native_vcpu_is_preempted(void);
 
-static inline u64 paravirt_steal_clock(int cpu)
-{
-   return PVOP_CALL1(u64, time.steal_clock, cpu);
-}
-
 /* The paravirtualized I/O functions */
 static inline void slow_down_io(void)
 {
diff --git a/arch/x86/include/asm/paravirt_time.h 
b/arch/x86/include/asm/paravirt_time.h
new file mode 100644
index ..76cf94b7c899
--- /dev/null
+++ b/arch/x86/include/asm/paravirt_time.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PARAVIRT_TIME_H
+#define _ASM_X86_PARAVIRT_TIME_H
+
+/* Time related para-virtualized functions. */
+
+#ifdef CONFIG_PARAVIRT
+
+#include 
+#include 
+#include 
+
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+u64 dummy_sched_clock(void);
+
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);
+
+extern bool paravirt_using_native_sched_clock;
+
+void paravirt_set_sched_clock(u64 (*func)(void));
+
+static inline u64 paravirt_sched_clock(void)
+{
+   return static_call(pv_sched_clock)();
+}
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+#endif /* CONFIG_PARAVIRT */
+
+#endif /* _ASM_X86_PARAVIRT_TIME_H */
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index de87087d3bde..1fff349e4792 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -95,11 +95,6 @@ struct pv_lazy_ops {
 } __no_randomize_layout;
 #endif
 
-struct pv_time_ops {
-   unsigned long long (*sched_clock)(void);
-   unsigned long long (*steal_clock)(int cpu);
-} __no_randomize_layout;
-
 struct pv_cpu_ops {
/* hooks for various privileged instructions */
void (*io_delay)(void);
@@ -291,7 +286,6 @@ struct pv_lock_ops {
  * 

[PATCH v3 09/15] x86/paravirt: remove no longer needed 32-bit pvops cruft

2020-12-17 Thread Juergen Gross
PVOP_VCALL4() is only used for Xen PV, while PVOP_CALL4() isn't used
at all. Keep PVOP_CALL4() for 64 bits due to symmetry reasons.

This allows removing the 32-bit definitions of those macros, leading
to a substantial simplification of the paravirt macros, as those were
the only ones needing non-empty "pre" and "post" parameters.

PVOP_CALLEE2() and PVOP_VCALLEE2() are used nowhere, so remove them.

Another no longer needed case is special handling of return types
larger than unsigned long. Replace that with a BUILD_BUG_ON().

DISABLE_INTERRUPTS() is used in 32-bit code only, so it can just be
replaced by cli.

INTERRUPT_RETURN in 32-bit code can be replaced by iret.

ENABLE_INTERRUPTS is used nowhere, so it can be removed.

Signed-off-by: Juergen Gross 
---
 arch/x86/entry/entry_32.S |   4 +-
 arch/x86/include/asm/irqflags.h   |   5 --
 arch/x86/include/asm/paravirt.h   |  35 +---
 arch/x86/include/asm/paravirt_types.h | 112 --
 arch/x86/kernel/asm-offsets.c |   2 -
 5 files changed, 35 insertions(+), 123 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index df8c017e6161..765487e57d6e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -430,7 +430,7 @@
 * will soon execute iret and the tracer was already set to
 * the irqstate after the IRET:
 */
-   DISABLE_INTERRUPTS(CLBR_ANY)
+   cli
lss (%esp), %esp/* switch to espfix segment */
 .Lend_\@:
 #endif /* CONFIG_X86_ESPFIX32 */
@@ -1077,7 +1077,7 @@ restore_all_switch_stack:
 * when returning from IPI handler and when returning from
 * scheduler to user-space.
 */
-   INTERRUPT_RETURN
+   iret
 
 .section .fixup, "ax"
 SYM_CODE_START(asm_iret_error)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 144d70ea4393..a0efbcd24b86 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -109,9 +109,6 @@ static __always_inline unsigned long 
arch_local_irq_save(void)
 }
 #else
 
-#define ENABLE_INTERRUPTS(x)   sti
-#define DISABLE_INTERRUPTS(x)  cli
-
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(x)  pushfq; popq %rax
@@ -119,8 +116,6 @@ static __always_inline unsigned long 
arch_local_irq_save(void)
 
 #define INTERRUPT_RETURN   jmp native_iret
 
-#else
-#define INTERRUPT_RETURN   iret
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 0785a9686e32..1dd30c95505d 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -692,6 +692,7 @@ extern void default_banner(void);
.if ((~(set)) & mask); pop %reg; .endif
 
 #ifdef CONFIG_X86_64
+#ifdef CONFIG_PARAVIRT_XXL
 
 #define PV_SAVE_REGS(set)  \
COND_PUSH(set, CLBR_RAX, rax);  \
@@ -717,46 +718,12 @@ extern void default_banner(void);
 #define PARA_PATCH(off)((off) / 8)
 #define PARA_SITE(ptype, ops)  _PVSITE(ptype, ops, .quad, 8)
 #define PARA_INDIRECT(addr)*addr(%rip)
-#else
-#define PV_SAVE_REGS(set)  \
-   COND_PUSH(set, CLBR_EAX, eax);  \
-   COND_PUSH(set, CLBR_EDI, edi);  \
-   COND_PUSH(set, CLBR_ECX, ecx);  \
-   COND_PUSH(set, CLBR_EDX, edx)
-#define PV_RESTORE_REGS(set)   \
-   COND_POP(set, CLBR_EDX, edx);   \
-   COND_POP(set, CLBR_ECX, ecx);   \
-   COND_POP(set, CLBR_EDI, edi);   \
-   COND_POP(set, CLBR_EAX, eax)
-
-#define PARA_PATCH(off)((off) / 4)
-#define PARA_SITE(ptype, ops)  _PVSITE(ptype, ops, .long, 4)
-#define PARA_INDIRECT(addr)*%cs:addr
-#endif
 
-#ifdef CONFIG_PARAVIRT_XXL
 #define INTERRUPT_RETURN   \
PARA_SITE(PARA_PATCH(PV_CPU_iret),  \
  ANNOTATE_RETPOLINE_SAFE;  \
  jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
 
-#define DISABLE_INTERRUPTS(clobbers)   \
-   PARA_SITE(PARA_PATCH(PV_IRQ_irq_disable),   \
- PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);\
- ANNOTATE_RETPOLINE_SAFE;  \
- call PARA_INDIRECT(pv_ops+PV_IRQ_irq_disable);\
- PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
-
-#define ENABLE_INTERRUPTS(clobbers)\
-   PARA_SITE(PARA_PATCH(PV_IRQ_irq_enable),\
- PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);\
- ANNOTATE_RETPOLINE_SAFE;  \
- call PARA_INDIRECT(pv_ops+PV_IRQ_irq_enable); \
- PV_RESTORE_REGS(clobbers 

[PATCH v3 07/15] x86/alternative: support "not feature" and ALTERNATIVE_TERNARY

2020-12-17 Thread Juergen Gross
Instead of only supporting modification of instructions when a specific
feature is set, also support doing so when a feature is not set.

As a feature is currently specified using a 16-bit quantity and the
highest feature number in use is around 600, using a negated feature
number to specify the inverted case is appropriate.

  ALTERNATIVE "default_instr", "patched_instr", ~FEATURE_NR

will start with "default_instr" and patch that with "patched_instr" in
case FEATURE_NR is not set.

Using that, add ALTERNATIVE_TERNARY:

  ALTERNATIVE_TERNARY "default_instr", FEATURE_NR,
  "feature_on_instr", "feature_off_instr"

which will start with "default_instr" and at patch time will, depending
on FEATURE_NR being set or not, patch that with either
"feature_on_instr" or "feature_off_instr".

Signed-off-by: Juergen Gross 
---
V3:
- new patch
---
 arch/x86/include/asm/alternative-asm.h |  3 +++
 arch/x86/include/asm/alternative.h |  7 +++
 arch/x86/kernel/alternative.c  | 17 -
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/alternative-asm.h 
b/arch/x86/include/asm/alternative-asm.h
index 464034db299f..b6989995fddf 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -109,6 +109,9 @@
.popsection
 .endm
 
+#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr1, newinstr2)   \
+   ALTERNATIVE_2 oldinstr, newinstr1, feature, newinstr2, ~(feature)
+
 #endif  /*  __ASSEMBLY__  */
 
 #endif /* _ASM_X86_ALTERNATIVE_ASM_H */
diff --git a/arch/x86/include/asm/alternative.h 
b/arch/x86/include/asm/alternative.h
index 13adca37c99a..a0f8f33609aa 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -59,6 +59,7 @@ struct alt_instr {
s32 instr_offset;   /* original instruction */
s32 repl_offset;/* offset to replacement instruction */
u16 cpuid;  /* cpuid bit set for replacement */
+#define ALT_INSTR_CPUID_INV0x8000  /* patch if ~cpuid bit is NOT set */
u8  instrlen;   /* length of original instruction */
u8  replacementlen; /* length of new instruction */
u8  padlen; /* length of build-time padding */
@@ -175,6 +176,9 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
ALTINSTR_REPLACEMENT(newinstr2, feature2, 2)\
".popsection\n"
 
+#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr1, newinstr2)   \
+   ALTERNATIVE_2(oldinstr, newinstr1, feature, newinstr2, ~(feature))
+
 #define ALTERNATIVE_3(oldinsn, newinsn1, feat1, newinsn2, feat2, newinsn3, 
feat3) \
OLDINSTR_3(oldinsn, 1, 2, 3)
\
".pushsection .altinstructions,\"a\"\n" 
\
@@ -206,6 +210,9 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
 #define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
asm_inline volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, 
newinstr2, feature2) ::: "memory")
 
+#define alternative_ternary(oldinstr, feature, newinstr1, newinstr2)   \
+   asm_inline volatile(ALTERNATIVE_TERNARY(oldinstr, feature, newinstr1, 
newinstr2) ::: "memory")
+
 /*
  * Alternative inline assembly with input.
  *
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8d778e46725d..0a904fb2678b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -388,21 +388,28 @@ void __init_or_module noinline apply_alternatives(struct 
alt_instr *start,
 */
for (a = start; a < end; a++) {
int insn_buff_sz = 0;
+   u16 feature;
+   bool not_feature;
 
instr = (u8 *)&a->instr_offset + a->instr_offset;
replacement = (u8 *)&a->repl_offset + a->repl_offset;
+   feature = a->cpuid;
+   not_feature = feature & ALT_INSTR_CPUID_INV;
+   if (not_feature)
+   feature = ~feature;
BUG_ON(a->instrlen > sizeof(insn_buff));
-   BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
-   if (!boot_cpu_has(a->cpuid)) {
+   BUG_ON(feature >= (NCAPINTS + NBUGINTS) * 32);
+   if (!!boot_cpu_has(feature) == not_feature) {
if (a->padlen > 1)
optimize_nops(a, instr);
 
continue;
}
 
-   DPRINTK("feat: %d*32+%d, old: (%pS (%px) len: %d), repl: (%px, 
len: %d), pad: %d",
-   a->cpuid >> 5,
-   a->cpuid & 0x1f,
+   DPRINTK("feat: %s%d*32+%d, old: (%pS (%px) len: %d), repl: 
(%px, len: %d), pad: %d",
+   not_feature ? "~" : "",
+   feature >> 5,
+   

[PATCH v3 14/15] x86/paravirt: switch functions with custom code to ALTERNATIVE

2020-12-17 Thread Juergen Gross
Instead of using paravirt patching for custom code sequences use
ALTERNATIVE for the functions with custom code replacements.

Instead of patching a ud2 instruction for unpopulated vector entries
into the call site, use a simple function just calling BUG() as a
replacement.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/paravirt.h   | 72 ++
 arch/x86/include/asm/paravirt_types.h |  1 -
 arch/x86/kernel/paravirt.c| 16 ++---
 arch/x86/kernel/paravirt_patch.c  | 88 ---
 4 files changed, 53 insertions(+), 124 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 49a823abc0e1..fc551dcc6458 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -108,7 +108,8 @@ static inline void write_cr0(unsigned long x)
 
 static inline unsigned long read_cr2(void)
 {
-   return PVOP_CALLEE0(unsigned long, mmu.read_cr2);
+   return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2,
+   "mov %%cr2, %%rax;", ~X86_FEATURE_XENPV);
 }
 
 static inline void write_cr2(unsigned long x)
@@ -118,12 +119,14 @@ static inline void write_cr2(unsigned long x)
 
 static inline unsigned long __read_cr3(void)
 {
-   return PVOP_CALL0(unsigned long, mmu.read_cr3);
+   return PVOP_ALT_CALL0(unsigned long, mmu.read_cr3,
+ "mov %%cr3, %%rax;", ~X86_FEATURE_XENPV);
 }
 
 static inline void write_cr3(unsigned long x)
 {
-   PVOP_VCALL1(mmu.write_cr3, x);
+   PVOP_ALT_VCALL1(mmu.write_cr3, x,
+   "mov %%rdi, %%cr3", ~X86_FEATURE_XENPV);
 }
 
 static inline void __write_cr4(unsigned long x)
@@ -143,7 +146,7 @@ static inline void halt(void)
 
 static inline void wbinvd(void)
 {
-   PVOP_VCALL0(cpu.wbinvd);
+   PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ~X86_FEATURE_XENPV);
 }
 
 static inline u64 paravirt_read_msr(unsigned msr)
@@ -357,22 +360,28 @@ static inline void paravirt_release_p4d(unsigned long pfn)
 
 static inline pte_t __pte(pteval_t val)
 {
-   return (pte_t) { PVOP_CALLEE1(pteval_t, mmu.make_pte, val) };
+   return (pte_t) { PVOP_ALT_CALLEE1(pteval_t, mmu.make_pte, val,
+ "mov %%rdi, %%rax",
+ ~X86_FEATURE_XENPV) };
 }
 
 static inline pteval_t pte_val(pte_t pte)
 {
-   return PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte);
+   return PVOP_ALT_CALLEE1(pteval_t, mmu.pte_val, pte.pte,
+   "mov %%rdi, %%rax", ~X86_FEATURE_XENPV);
 }
 
 static inline pgd_t __pgd(pgdval_t val)
 {
-   return (pgd_t) { PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val) };
+   return (pgd_t) { PVOP_ALT_CALLEE1(pgdval_t, mmu.make_pgd, val,
+ "mov %%rdi, %%rax",
+ ~X86_FEATURE_XENPV) };
 }
 
 static inline pgdval_t pgd_val(pgd_t pgd)
 {
-   return PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd);
+   return PVOP_ALT_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd,
+   "mov %%rdi, %%rax", ~X86_FEATURE_XENPV);
 }
 
 #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -405,12 +414,15 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 static inline pmd_t __pmd(pmdval_t val)
 {
-   return (pmd_t) { PVOP_CALLEE1(pmdval_t, mmu.make_pmd, val) };
+   return (pmd_t) { PVOP_ALT_CALLEE1(pmdval_t, mmu.make_pmd, val,
+ "mov %%rdi, %%rax",
+ ~X86_FEATURE_XENPV) };
 }
 
 static inline pmdval_t pmd_val(pmd_t pmd)
 {
-   return PVOP_CALLEE1(pmdval_t, mmu.pmd_val, pmd.pmd);
+   return PVOP_ALT_CALLEE1(pmdval_t, mmu.pmd_val, pmd.pmd,
+   "mov %%rdi, %%rax", ~X86_FEATURE_XENPV);
 }
 
 static inline void set_pud(pud_t *pudp, pud_t pud)
@@ -422,14 +434,16 @@ static inline pud_t __pud(pudval_t val)
 {
pudval_t ret;
 
-   ret = PVOP_CALLEE1(pudval_t, mmu.make_pud, val);
+   ret = PVOP_ALT_CALLEE1(pudval_t, mmu.make_pud, val,
+  "mov %%rdi, %%rax", ~X86_FEATURE_XENPV);
 
return (pud_t) { ret };
 }
 
 static inline pudval_t pud_val(pud_t pud)
 {
-   return PVOP_CALLEE1(pudval_t, mmu.pud_val, pud.pud);
+   return PVOP_ALT_CALLEE1(pudval_t, mmu.pud_val, pud.pud,
+   "mov %%rdi, %%rax", ~X86_FEATURE_XENPV);
 }
 
 static inline void pud_clear(pud_t *pudp)
@@ -448,14 +462,16 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 
 static inline p4d_t __p4d(p4dval_t val)
 {
-   p4dval_t ret = PVOP_CALLEE1(p4dval_t, mmu.make_p4d, val);
+   p4dval_t ret = PVOP_ALT_CALLEE1(p4dval_t, mmu.make_p4d, val,
+   "mov %%rdi, %%rax", ~X86_FEATURE_XENPV);
 
return (p4d_t) { ret };
 }
 
 static inline p4dval_t p4d_val(p4d_t p4d)
 {
-   return PVOP_CALLEE1(p4dval_t, mmu.p4d_val

[PATCH v3 08/15] x86: add new features for paravirt patching

2020-12-17 Thread Juergen Gross
To be able to switch paravirt patching from special-cased custom
code sequences to ALTERNATIVE handling, some new X86_FEATURE_* flags
are needed. This makes it possible to have the standard indirect pv
call as the default code and to patch that with the non-Xen custom
code sequence via ALTERNATIVE patching later.

Make sure paravirt patching is performed before alternative patching.

Signed-off-by: Juergen Gross 
---
V3:
- add comment (Boris Petkov)
- no negative features (Boris Petkov)
---
 arch/x86/include/asm/cpufeatures.h |  2 ++
 arch/x86/kernel/alternative.c  | 40 --
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index f5ef2d5b9231..1077b675a008 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -237,6 +237,8 @@
 #define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports 
the VMCALL instruction */
 #define X86_FEATURE_VMW_VMMCALL( 8*32+19) /* "" VMware prefers 
VMMCALL hypercall instruction */
 #define X86_FEATURE_SEV_ES ( 8*32+20) /* AMD Secure Encrypted 
Virtualization - Encrypted State */
+#define X86_FEATURE_PVUNLOCK   ( 8*32+21) /* "" PV unlock function */
+#define X86_FEATURE_VCPUPREEMPT( 8*32+22) /* "" PV 
vcpu_is_preempted function */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE   ( 9*32+ 0) /* RDFSBASE, WRFSBASE, 
RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 0a904fb2678b..abb481808811 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -600,6 +600,15 @@ int alternatives_text_reserved(void *start, void *end)
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_PARAVIRT
+static void __init paravirt_set_cap(void)
+{
+   if (!pv_is_native_spin_unlock())
+   setup_force_cpu_cap(X86_FEATURE_PVUNLOCK);
+
+   if (!pv_is_native_vcpu_is_preempted())
+   setup_force_cpu_cap(X86_FEATURE_VCPUPREEMPT);
+}
+
 void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
 struct paravirt_patch_site *end)
 {
@@ -623,6 +632,8 @@ void __init_or_module apply_paravirt(struct 
paravirt_patch_site *start,
 }
 extern struct paravirt_patch_site __start_parainstructions[],
__stop_parainstructions[];
+#else
+static void __init paravirt_set_cap(void) { }
 #endif /* CONFIG_PARAVIRT */
 
 /*
@@ -730,6 +741,33 @@ void __init alternative_instructions(void)
 * patching.
 */
 
+   /*
+* Paravirt patching and alternative patching can be combined to
+* replace a function call with a short direct code sequence (e.g.
+* by setting a constant return value instead of doing that in an
+* external function).
+* In order to make this work the following sequence is required:
+* 1. set (artificial) features depending on used paravirt
+*functions which can later influence alternative patching
+* 2. apply paravirt patching (generally replacing an indirect
+*function call with a direct one)
+* 3. apply alternative patching (e.g. replacing a direct function
+*call with a custom code sequence)
+* Doing paravirt patching after alternative patching would clobber
+* the optimization of the custom code with a function call again.
+*/
+   paravirt_set_cap();
+
+   /*
+* First patch paravirt functions, such that we overwrite the indirect
+* call with the direct call.
+*/
+   apply_paravirt(__parainstructions, __parainstructions_end);
+
+   /*
+* Then patch alternatives, such that those paravirt calls that are in
+* alternatives can be overwritten by their immediate fragments.
+*/
apply_alternatives(__alt_instructions, __alt_instructions_end);
 
 #ifdef CONFIG_SMP
@@ -748,8 +786,6 @@ void __init alternative_instructions(void)
}
 #endif
 
-   apply_paravirt(__parainstructions, __parainstructions_end);
-
restart_nmi();
alternatives_patched = 1;
 }
-- 
2.26.2




[PATCH v3 11/15] x86/paravirt: switch iret pvops to ALTERNATIVE

2020-12-17 Thread Juergen Gross
The iret paravirt op is rather special as it uses a jmp instead
of a call instruction. Switch it to ALTERNATIVE.

Signed-off-by: Juergen Gross 
---
V3:
- use ALTERNATIVE_TERNARY
---
 arch/x86/include/asm/paravirt.h   |  6 +++---
 arch/x86/include/asm/paravirt_types.h |  5 +
 arch/x86/kernel/asm-offsets.c |  5 -
 arch/x86/kernel/paravirt.c| 26 ++
 arch/x86/xen/enlighten_pv.c   |  3 +--
 5 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 1dd30c95505d..49a823abc0e1 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -720,9 +720,9 @@ extern void default_banner(void);
 #define PARA_INDIRECT(addr)*addr(%rip)
 
 #define INTERRUPT_RETURN   \
-   PARA_SITE(PARA_PATCH(PV_CPU_iret),  \
- ANNOTATE_RETPOLINE_SAFE;  \
- jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
+   ANNOTATE_RETPOLINE_SAFE;\
+   ALTERNATIVE_TERNARY("jmp *paravirt_iret(%rip);",\
+   X86_FEATURE_XENPV, "jmp xen_iret;", "jmp native_iret;")
 
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)\
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index a9efd4dad820..5d6de014d2f6 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -151,10 +151,6 @@ struct pv_cpu_ops {
 
u64 (*read_pmc)(int counter);
 
-   /* Normal iret.  Jump to this with the standard iret stack
-  frame set up. */
-   void (*iret)(void);
-
void (*start_context_switch)(struct task_struct *prev);
void (*end_context_switch)(struct task_struct *next);
 #endif
@@ -294,6 +290,7 @@ struct paravirt_patch_template {
 
 extern struct pv_info pv_info;
 extern struct paravirt_patch_template pv_ops;
+extern void (*paravirt_iret)(void);
 
 #define PARAVIRT_PATCH(x)  \
(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 736508004b30..ecd3fd6993d1 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -61,11 +61,6 @@ static void __used common(void)
OFFSET(IA32_RT_SIGFRAME_sigcontext, rt_sigframe_ia32, uc.uc_mcontext);
 #endif
 
-#ifdef CONFIG_PARAVIRT_XXL
-   BLANK();
-   OFFSET(PV_CPU_iret, paravirt_patch_template, cpu.iret);
-#endif
-
 #ifdef CONFIG_XEN
BLANK();
OFFSET(XEN_vcpu_info_mask, vcpu_info, evtchn_upcall_mask);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 9f8aa18aa378..2ab547dd66c3 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -86,25 +86,6 @@ u64 notrace _paravirt_ident_64(u64 x)
 {
return x;
 }
-
-static unsigned paravirt_patch_jmp(void *insn_buff, const void *target,
-  unsigned long addr, unsigned len)
-{
-   struct branch *b = insn_buff;
-   unsigned long delta = (unsigned long)target - (addr+5);
-
-   if (len < 5) {
-#ifdef CONFIG_RETPOLINE
-   WARN_ONCE(1, "Failing to patch indirect JMP in %ps\n", (void 
*)addr);
-#endif
-   return len; /* call too long for patch site */
-   }
-
-   b->opcode = 0xe9;   /* jmp */
-   b->delta = delta;
-
-   return 5;
-}
 #endif
 
 DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
@@ -136,9 +117,6 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff,
else if (opfunc == _paravirt_ident_64)
ret = paravirt_patch_ident_64(insn_buff, len);
 
-   else if (type == PARAVIRT_PATCH(cpu.iret))
-   /* If operation requires a jmp, then jmp */
-   ret = paravirt_patch_jmp(insn_buff, opfunc, addr, len);
 #endif
else
/* Otherwise call the function. */
@@ -316,8 +294,6 @@ struct paravirt_patch_template pv_ops = {
 
.cpu.load_sp0   = native_load_sp0,
 
-   .cpu.iret   = native_iret,
-
 #ifdef CONFIG_X86_IOPL_IOPERM
.cpu.invalidate_io_bitmap   = native_tss_invalidate_io_bitmap,
.cpu.update_io_bitmap   = native_tss_update_io_bitmap,
@@ -422,6 +398,8 @@ struct paravirt_patch_template pv_ops = {
 NOKPROBE_SYMBOL(native_get_debugreg);
 NOKPROBE_SYMBOL(native_set_debugreg);
 NOKPROBE_SYMBOL(native_load_idt);
+
+void (*paravirt_iret)(void) = native_iret;
 #endif
 
 EXPORT_SYMBOL(pv_ops);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 32b295cc2716..4716383c64a9 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1057,8 +1057,6 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst =

[PATCH v3 13/15] x86/paravirt: add new macros PVOP_ALT* supporting pvops in ALTERNATIVEs

2020-12-17 Thread Juergen Gross
Instead of using paravirt patching for custom code sequences, add
support for using ALTERNATIVE handling combined with paravirt call
patching.

Signed-off-by: Juergen Gross 
---
V3:
- drop PVOP_ALT_VCALL() macro
---
 arch/x86/include/asm/paravirt_types.h | 49 +++
 1 file changed, 49 insertions(+)

diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 5d6de014d2f6..7e0130781b12 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -477,44 +477,93 @@ int paravirt_disable_iospace(void);
(rettype)(__eax & PVOP_RETMASK(rettype));   \
})
 
+#define PVOP_ALT_CALL(rettype, op, alt, cond, clbr, call_clbr, \
+ extra_clbr, ...)  \
+   ({  \
+   PVOP_CALL_ARGS; \
+   PVOP_TEST_NULL(op); \
+   BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));  \
+   asm volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),   \
+alt, cond) \
+: call_clbr, ASM_CALL_CONSTRAINT   \
+: paravirt_type(op),   \
+  paravirt_clobber(clbr),  \
+  ##__VA_ARGS__\
+: "memory", "cc" extra_clbr);  \
+   (rettype)(__eax & PVOP_RETMASK(rettype));   \
+   })
+
 #define __PVOP_CALL(rettype, op, ...)  \
PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS,\
  EXTRA_CLOBBERS, ##__VA_ARGS__)
 
+#define __PVOP_ALT_CALL(rettype, op, alt, cond, ...)   \
+   PVOP_ALT_CALL(rettype, op, alt, cond, CLBR_ANY, \
+ PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS,   \
+ ##__VA_ARGS__)
+
 #define __PVOP_CALLEESAVE(rettype, op, ...)\
PVOP_CALL(rettype, op.func, CLBR_RET_REG,   \
  PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
 
+#define __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond, ...) \
+   PVOP_ALT_CALL(rettype, op.func, alt, cond, CLBR_RET_REG,\
+ PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
+
+
 #define __PVOP_VCALL(op, ...)  \
(void)PVOP_CALL(long, op, CLBR_ANY, PVOP_VCALL_CLOBBERS,\
   VEXTRA_CLOBBERS, ##__VA_ARGS__)
 
+#define __PVOP_ALT_VCALL(op, alt, cond, ...)   \
+   (void)PVOP_ALT_CALL(long, op, alt, cond, CLBR_ANY,  \
+  PVOP_VCALL_CLOBBERS, VEXTRA_CLOBBERS,\
+  ##__VA_ARGS__)
+
 #define __PVOP_VCALLEESAVE(op, ...)\
(void)PVOP_CALL(long, op.func, CLBR_RET_REG,\
  PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
 
+#define __PVOP_ALT_VCALLEESAVE(op, alt, cond, ...) \
+   (void)PVOP_ALT_CALL(long, op.func, alt, cond, CLBR_RET_REG, \
+  PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
+
 
 
 #define PVOP_CALL0(rettype, op)
\
__PVOP_CALL(rettype, op)
 #define PVOP_VCALL0(op)
\
__PVOP_VCALL(op)
+#define PVOP_ALT_CALL0(rettype, op, alt, cond) \
+   __PVOP_ALT_CALL(rettype, op, alt, cond)
+#define PVOP_ALT_VCALL0(op, alt, cond) \
+   __PVOP_ALT_VCALL(op, alt, cond)
 
 #define PVOP_CALLEE0(rettype, op)  \
__PVOP_CALLEESAVE(rettype, op)
 #define PVOP_VCALLEE0(op)  \
__PVOP_VCALLEESAVE(op)
+#define PVOP_ALT_CALLEE0(rettype, op, alt, cond)   \
+   __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond)
+#define PVOP_ALT_VCALLEE0(op, alt, cond)   \
+   __PVOP_ALT_VCALLEESAVE(op, alt, cond)
 
 
 #define PVOP_CALL1(rettype, op, arg1)  \
__PVOP_CALL(rettype, op, PVOP_CALL_ARG1(arg1))
 #define PVOP_VCALL1(op, arg1)  \
__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1))
+#define PVOP_ALT_VCALL1(op, arg1, alt, cond)   \
+   __PVOP_ALT_VCALL(op, alt, cond, PVOP_CALL_ARG1(arg1))
 
 #define PVOP_CALLEE1(rettype, op, arg1)
\
__PVOP_CALLEESAVE(rettype, op, PVOP_CALL_ARG1(ar

[PATCH v3 12/15] objtool: Alternatives vs ORC, the hard way

2020-12-17 Thread Juergen Gross
From: Peter Zijlstra 

Alternatives pose an interesting problem for unwinders because from
the unwinder's PoV we're just executing instructions; it has no idea
the text is modified, nor any way of retrieving what it was modified with.

Therefore the stance has been that alternatives must not change stack
state, as encoded by commit: 7117f16bf460 ("objtool: Fix ORC vs
alternatives"). This obviously guarantees that whatever actual
instructions end up in the text, the unwind information is correct.

However, there is one additional source of text patching that isn't
currently visible to objtool: paravirt immediate patching. And it
turns out one of these violates the rule.

As part of cleaning that up, the unfortunate reality is that objtool
now has to deal with alternatives modifying unwind state and validate
the combination is valid and generate ORC data to match.

The problem is that a single instance of unwind information (ORC) must
capture and correctly unwind all alternatives. Since the trivially
correct mandate is out, implement the straightforward brute-force
approach:

 1) generate CFI information for each alternative

 2) unwind every alternative with the merge-sort of the previously
generated CFI information -- O(n^2)

 3) for any possible conflict: yell.

 4) Generate ORC with merge-sort

Specifically for 3 there are two possible classes of conflicts:

 - the merge-sort itself could find conflicting CFI for the same
   offset.

 - the unwind can fail with the merged CFI.

Specifically, this allows us to deal with:

Alt1Alt2Alt3

 0x00   CALL *pv_ops.save_flCALL xen_save_flPUSHF
 0x01   POP %RAX
 0x02   NOP
 ...
 0x05   NOP
 ...
 0x07   

The unwind information for offset-0x00 is identical for all 3
alternatives. Similarly offset-0x05 and higher also are identical (and
the same as 0x00). However offset-0x01 has deviating CFI, but that is
only relevant for Alt3, neither of the other alternative instruction
streams will ever hit that offset.

Signed-off-by: Peter Zijlstra (Intel) 
---
V3:
- new patch; there is still an ongoing discussion whether this patch
  couldn't be made simpler, but I'm including it here nevertheless, as
  there is some solution required in objtool for the following patches
  of the series.
Signed-off-by: Juergen Gross 
---
 tools/objtool/check.c   | 180 
 tools/objtool/check.h   |   5 ++
 tools/objtool/orc_gen.c | 178 +--
 3 files changed, 289 insertions(+), 74 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index c6ab44543c92..2d70766af857 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1090,6 +1090,32 @@ static int handle_group_alt(struct objtool_file *file,
return -1;
}
 
+   /*
+* Add the filler NOP, required for alternative CFI.
+*/
+   if (special_alt->group && special_alt->new_len < special_alt->orig_len) {
+   struct instruction *nop = malloc(sizeof(*nop));
+   if (!nop) {
+   WARN("malloc failed");
+   return -1;
+   }
+   memset(nop, 0, sizeof(*nop));
+   INIT_LIST_HEAD(&nop->alts);
+   INIT_LIST_HEAD(&nop->stack_ops);
+   init_cfi_state(&nop->cfi);
+
+   nop->sec = last_new_insn->sec;
+   nop->ignore = last_new_insn->ignore;
+   nop->func = last_new_insn->func;
+   nop->alt_group = alt_group;
+   nop->offset = last_new_insn->offset + last_new_insn->len;
+   nop->type = INSN_NOP;
+   nop->len = special_alt->orig_len - special_alt->new_len;
+
+   list_add(&nop->list, &last_new_insn->list);
+   last_new_insn = nop;
+   }
+
if (fake_jump)
list_add(&fake_jump->list, &last_new_insn->list);
 
@@ -2190,18 +2216,12 @@ static int handle_insn_ops(struct instruction *insn, struct insn_state *state)
struct stack_op *op;
 
list_for_each_entry(op, &insn->stack_ops, list) {
-   struct cfi_state old_cfi = state->cfi;
int res;
 
res = update_cfi_state(insn, &state->cfi, op);
if (res)
return res;
 
-   if (insn->alt_group && memcmp(&state->cfi, &old_cfi, sizeof(struct cfi_state))) {
-   WARN_FUNC("alternative modifies stack", insn->sec, insn->offset);
-   return -1;
-   }
-
if (op->dest.type == OP_DEST_PUSHF) {
if (!state->uaccess_stack) {
state->uaccess_stack = 1;
@@ -2399,19 +2419,137 @@ static int validate_return(struct symbol *func, struct instruction *insn, stru

[PATCH v3 15/15] x86/paravirt: have only one paravirt patch function

2020-12-17 Thread Juergen Gross
There is no need any longer to have different paravirt patch functions
for native and Xen. Eliminate native_patch() and rename
paravirt_patch_default() to paravirt_patch().

Signed-off-by: Juergen Gross 
---
V3:
- remove paravirt_patch_insns() (kernel test robot)
---
 arch/x86/include/asm/paravirt_types.h | 19 +--
 arch/x86/kernel/Makefile  |  3 +--
 arch/x86/kernel/alternative.c |  2 +-
 arch/x86/kernel/paravirt.c| 20 ++--
 arch/x86/kernel/paravirt_patch.c  | 11 ---
 arch/x86/xen/enlighten_pv.c   |  1 -
 6 files changed, 5 insertions(+), 51 deletions(-)
 delete mode 100644 arch/x86/kernel/paravirt_patch.c

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index f9e77046b61b..5c728eab9cd1 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -73,19 +73,6 @@ struct pv_info {
const char *name;
 };
 
-struct pv_init_ops {
-   /*
-* Patch may replace one of the defined code sequences with
-* arbitrary code, subject to the same register constraints.
-* This generally means the code is not free to clobber any
-* registers other than EAX.  The patch function should return
-* the number of bytes of code generated, as we nop pad the
-* rest in generic code.
-*/
-   unsigned (*patch)(u8 type, void *insn_buff,
- unsigned long addr, unsigned len);
-} __no_randomize_layout;
-
 #ifdef CONFIG_PARAVIRT_XXL
 struct pv_lazy_ops {
/* Set deferred update mode, used for batching operations. */
@@ -281,7 +268,6 @@ struct pv_lock_ops {
  * number for each function using the offset which we use to indicate
  * what to patch. */
 struct paravirt_patch_template {
-   struct pv_init_ops  init;
struct pv_cpu_ops   cpu;
struct pv_irq_ops   irq;
struct pv_mmu_ops   mmu;
@@ -322,10 +308,7 @@ extern void (*paravirt_iret)(void);
 /* Simple instruction patching code. */
 #define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
 
-unsigned paravirt_patch_default(u8 type, void *insn_buff, unsigned long addr, unsigned len);
-unsigned paravirt_patch_insns(void *insn_buff, unsigned len, const char *start, const char *end);
-
-unsigned native_patch(u8 type, void *insn_buff, unsigned long addr, unsigned len);
+unsigned paravirt_patch(u8 type, void *insn_buff, unsigned long addr, unsigned len);
 
 int paravirt_disable_iospace(void);
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 68608bd892c0..61f52f95670b 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -35,7 +35,6 @@ KASAN_SANITIZE_sev-es.o := n
 KCSAN_SANITIZE := n
 
 OBJECT_FILES_NON_STANDARD_test_nx.o:= y
-OBJECT_FILES_NON_STANDARD_paravirt_patch.o := y
 
 ifdef CONFIG_FRAME_POINTER
 OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o := y
@@ -122,7 +121,7 @@ obj-$(CONFIG_AMD_NB)           += amd_nb.o
 obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
 
 obj-$(CONFIG_KVM_GUEST)        += kvm.o kvmclock.o
-obj-$(CONFIG_PARAVIRT)         += paravirt.o paravirt_patch.o
+obj-$(CONFIG_PARAVIRT)         += paravirt.o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS) += paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)   += pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index abb481808811..1dbd6a934b66 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -621,7 +621,7 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
BUG_ON(p->len > MAX_PATCH_LEN);
/* prep the buffer with the original instructions */
memcpy(insn_buff, p->instr, p->len);
-   used = pv_ops.init.patch(p->type, insn_buff, (unsigned long)p->instr, p->len);
+   used = paravirt_patch(p->type, insn_buff, (unsigned long)p->instr, p->len);
 
BUG_ON(used > p->len);
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index db6ae7f7c14e..b648eaf640f2 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -99,8 +99,8 @@ void __init native_pv_lock_init(void)
static_branch_disable(&virt_spin_lock_key);
 }
 
-unsigned paravirt_patch_default(u8 type, void *insn_buff,
-   unsigned long addr, unsigned len)
+unsigned int paravirt_patch(u8 type, void *insn_buff, unsigned long addr,
+   unsigned int len)
 {
/*
 * Neat trick to map patch type back to the call within the
@@ -121,19 +121,6 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff,
return ret;
 }
 
-unsigned paravirt_patch_insns(void *insn_buff, unsigned len,
-   

Re: [PATCH v2 2/4] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()

2020-12-17 Thread Sergio Lopez
On Wed, Dec 16, 2020 at 07:31:02PM +0100, Kevin Wolf wrote:
> Am 16.12.2020 um 15:55 hat Sergio Lopez geschrieben:
> > On Wed, Dec 16, 2020 at 01:35:14PM +0100, Kevin Wolf wrote:
> > > Am 15.12.2020 um 18:23 hat Sergio Lopez geschrieben:
> > > > On Tue, Dec 15, 2020 at 04:01:19PM +0100, Kevin Wolf wrote:
> > > > > Am 15.12.2020 um 14:15 hat Sergio Lopez geschrieben:
> > > > > > On Tue, Dec 15, 2020 at 01:12:33PM +0100, Kevin Wolf wrote:
> > > > > > > Am 14.12.2020 um 18:05 hat Sergio Lopez geschrieben:
> > > > > > > > While processing the parents of a BDS, one of the parents may 
> > > > > > > > process
> > > > > > > > the child that's doing the tail recursion, which leads to a BDS 
> > > > > > > > being
> > > > > > > > processed twice. This is especially problematic for the 
> > > > > > > > aio_notifiers,
> > > > > > > > as they might attempt to work on both the old and the new AIO
> > > > > > > > contexts.
> > > > > > > > 
> > > > > > > > To avoid this, add the BDS pointer to the ignore list, and 
> > > > > > > > check the
> > > > > > > > child BDS pointer while iterating over the children.
> > > > > > > > 
> > > > > > > > Signed-off-by: Sergio Lopez 
> > > > > > > 
> > > > > > > Ugh, so we get a mixed list of BdrvChild and BlockDriverState? :-/
> > > > > > 
> > > > > > I know, it's effective but quite ugly...
> > > > > > 
> > > > > > > What is the specific scenario where you saw this breaking? Did 
> > > > > > > you have
> > > > > > > multiple BdrvChild connections between two nodes so that we would 
> > > > > > > go to
> > > > > > > the parent node through one and then come back to the child node 
> > > > > > > through
> > > > > > > the other?
> > > > > > 
> > > > > > I don't think this is a corner case. If the graph is walked 
> > > > > > top->down,
> > > > > > there's no problem since children are added to the ignore list 
> > > > > > before
> > > > > > getting processed, and siblings don't process each other. But, if 
> > > > > > the
> > > > > > graph is walked bottom->up, a BDS will start processing its parents
> > > > > > without adding itself to the ignore list, so there's nothing
> > > > > > preventing them from processing it again.
> > > > > 
> > > > > I don't understand. child is added to ignore before calling the parent
> > > > > callback on it, so how can we come back through the same BdrvChild?
> > > > > 
> > > > > QLIST_FOREACH(child, &bs->parents, next_parent) {
> > > > > if (g_slist_find(*ignore, child)) {
> > > > > continue;
> > > > > }
> > > > > assert(child->klass->set_aio_ctx);
> > > > > *ignore = g_slist_prepend(*ignore, child);
> > > > > child->klass->set_aio_ctx(child, new_context, ignore);
> > > > > }
> > > > 
> > > > Perhaps I'm missing something, but the way I understand it, that loop
> > > > is adding the BdrvChild pointer of each of its parents, but not the
> > > > BdrvChild pointer of the BDS that was passed as an argument to
> > > > b_s_a_c_i.
> > > 
> > > Generally, the caller has already done that.
> > > 
> > > In the theoretical case that it was the outermost call in the recursion
> > > and it hasn't (I couldn't find any such case), I think we should still
> > > call the callback for the passed BdrvChild like we currently do.
> > > 
> > > > > You didn't dump the BdrvChild here. I think that would add some
> > > > > information on why we re-entered 0x555ee2fbf660. Maybe you can also 
> > > > > add
> > > > > bs->drv->format_name for each node to make the scenario less abstract?
> > > > 
> > > > I've generated another trace with more data:
> > > > 
> > > > bs=0x565505e48030 (backup-top) enter
> > > > bs=0x565505e48030 (backup-top) processing children
> > > > bs=0x565505e48030 (backup-top) calling bsaci child=0x565505e42090 
> > > > (child->bs=0x565505e5d420)
> > > > bs=0x565505e5d420 (qcow2) enter
> > > > bs=0x565505e5d420 (qcow2) processing children
> > > > bs=0x565505e5d420 (qcow2) calling bsaci child=0x565505e41ea0 
> > > > (child->bs=0x565505e52060)
> > > > bs=0x565505e52060 (file) enter
> > > > bs=0x565505e52060 (file) processing children
> > > > bs=0x565505e52060 (file) processing parents
> > > > bs=0x565505e52060 (file) processing itself
> > > > bs=0x565505e5d420 (qcow2) processing parents
> > > > bs=0x565505e5d420 (qcow2) calling set_aio_ctx child=0x5655066a34d0
> > > > bs=0x565505fbf660 (qcow2) enter
> > > > bs=0x565505fbf660 (qcow2) processing children
> > > > bs=0x565505fbf660 (qcow2) calling bsaci child=0x565505e41d20 
> > > > (child->bs=0x565506bc0c00)
> > > > bs=0x565506bc0c00 (file) enter
> > > > bs=0x565506bc0c00 (file) processing children
> > > > bs=0x565506bc0c00 (file) processing parents
> > > > bs=0x565506bc0c00 (file) processing itself
> > > > bs=0x565505fbf660 (qcow2) processing parents
> > > > bs=0x565505fbf660 (qcow2) calling set_aio_ctx child=0x565505fc7aa0
> > > > bs=0x565505fbf660 (qcow2) calling set_aio_ctx child=0x5655068b8510
> > > > bs=0x565505e48030 (backup-top) enter
> > > > bs=0x56

[linux-5.4 test] 157603: regressions - FAIL

2020-12-17 Thread osstest service owner
flight 157603 linux-5.4 real [real]
flight 157634 linux-5.4 real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/157603/
http://logs.test-lab.xenproject.org/osstest/logs/157634/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 157431

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop           fail like 157431
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop           fail like 157431
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop            fail like 157431
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 157431
 test-armhf-armhf-libvirt 16 saverestore-support-check        fail like 157431
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop           fail like 157431
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop           fail like 157431
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop            fail like 157431
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop            fail like 157431
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail like 157431
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop            fail like 157431
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt      15 migrate-support-check        fail   never pass
 test-amd64-i386-xl-pvshim    14 guest-start                  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl          15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check       fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check   fail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check    fail   never pass

version targeted for testing:
 linux                8a866bdbbac227a99b0b37e03679908642f58aec
baseline version:
 linux                2bff021f53b211386abad8cd661e6bb38d0fd524

Last test of basis   157431  2020-12-11 12:40:36 Z    5 days
Testing same since   157603  2020-12-16 10:11:52 Z    1 days    1 attempts


People who touched revisions under test:
  Andrew Morton 
  Andy Lutomirski 
  Arnd Berg

Re: [PATCH v2 2/4] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()

2020-12-17 Thread Kevin Wolf
Am 17.12.2020 um 10:37 hat Sergio Lopez geschrieben:
> On Wed, Dec 16, 2020 at 07:31:02PM +0100, Kevin Wolf wrote:
> > Am 16.12.2020 um 15:55 hat Sergio Lopez geschrieben:
> > > On Wed, Dec 16, 2020 at 01:35:14PM +0100, Kevin Wolf wrote:
> > > > Am 15.12.2020 um 18:23 hat Sergio Lopez geschrieben:
> > > > > On Tue, Dec 15, 2020 at 04:01:19PM +0100, Kevin Wolf wrote:
> > > > > > Am 15.12.2020 um 14:15 hat Sergio Lopez geschrieben:
> > > > > > > On Tue, Dec 15, 2020 at 01:12:33PM +0100, Kevin Wolf wrote:
> > > > > > > > Am 14.12.2020 um 18:05 hat Sergio Lopez geschrieben:
> > > > > > > > > While processing the parents of a BDS, one of the parents may 
> > > > > > > > > process
> > > > > > > > > the child that's doing the tail recursion, which leads to a 
> > > > > > > > > BDS being
> > > > > > > > > processed twice. This is especially problematic for the 
> > > > > > > > > aio_notifiers,
> > > > > > > > > as they might attempt to work on both the old and the new AIO
> > > > > > > > > contexts.
> > > > > > > > > 
> > > > > > > > > To avoid this, add the BDS pointer to the ignore list, and 
> > > > > > > > > check the
> > > > > > > > > child BDS pointer while iterating over the children.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Sergio Lopez 
> > > > > > > > 
> > > > > > > > Ugh, so we get a mixed list of BdrvChild and BlockDriverState? 
> > > > > > > > :-/
> > > > > > > 
> > > > > > > I know, it's effective but quite ugly...
> > > > > > > 
> > > > > > > > What is the specific scenario where you saw this breaking? Did 
> > > > > > > > you have
> > > > > > > > multiple BdrvChild connections between two nodes so that we 
> > > > > > > > would go to
> > > > > > > > the parent node through one and then come back to the child 
> > > > > > > > node through
> > > > > > > > the other?
> > > > > > > 
> > > > > > > I don't think this is a corner case. If the graph is walked 
> > > > > > > top->down,
> > > > > > > there's no problem since children are added to the ignore list 
> > > > > > > before
> > > > > > > getting processed, and siblings don't process each other. But, if 
> > > > > > > the
> > > > > > > graph is walked bottom->up, a BDS will start processing its 
> > > > > > > parents
> > > > > > > without adding itself to the ignore list, so there's nothing
> > > > > > > preventing them from processing it again.
> > > > > > 
> > > > > > I don't understand. child is added to ignore before calling the 
> > > > > > parent
> > > > > > callback on it, so how can we come back through the same BdrvChild?
> > > > > > 
> > > > > > QLIST_FOREACH(child, &bs->parents, next_parent) {
> > > > > > if (g_slist_find(*ignore, child)) {
> > > > > > continue;
> > > > > > }
> > > > > > assert(child->klass->set_aio_ctx);
> > > > > > *ignore = g_slist_prepend(*ignore, child);
> > > > > > child->klass->set_aio_ctx(child, new_context, ignore);
> > > > > > }
> > > > > 
> > > > > Perhaps I'm missing something, but the way I understand it, that loop
> > > > > is adding the BdrvChild pointer of each of its parents, but not the
> > > > > BdrvChild pointer of the BDS that was passed as an argument to
> > > > > b_s_a_c_i.
> > > > 
> > > > Generally, the caller has already done that.
> > > > 
> > > > In the theoretical case that it was the outermost call in the recursion
> > > > and it hasn't (I couldn't find any such case), I think we should still
> > > > call the callback for the passed BdrvChild like we currently do.
> > > > 
> > > > > > You didn't dump the BdrvChild here. I think that would add some
> > > > > > information on why we re-entered 0x555ee2fbf660. Maybe you can also 
> > > > > > add
> > > > > > bs->drv->format_name for each node to make the scenario less 
> > > > > > abstract?
> > > > > 
> > > > > I've generated another trace with more data:
> > > > > 
> > > > > bs=0x565505e48030 (backup-top) enter
> > > > > bs=0x565505e48030 (backup-top) processing children
> > > > > bs=0x565505e48030 (backup-top) calling bsaci child=0x565505e42090 
> > > > > (child->bs=0x565505e5d420)
> > > > > bs=0x565505e5d420 (qcow2) enter
> > > > > bs=0x565505e5d420 (qcow2) processing children
> > > > > bs=0x565505e5d420 (qcow2) calling bsaci child=0x565505e41ea0 
> > > > > (child->bs=0x565505e52060)
> > > > > bs=0x565505e52060 (file) enter
> > > > > bs=0x565505e52060 (file) processing children
> > > > > bs=0x565505e52060 (file) processing parents
> > > > > bs=0x565505e52060 (file) processing itself
> > > > > bs=0x565505e5d420 (qcow2) processing parents
> > > > > bs=0x565505e5d420 (qcow2) calling set_aio_ctx child=0x5655066a34d0
> > > > > bs=0x565505fbf660 (qcow2) enter
> > > > > bs=0x565505fbf660 (qcow2) processing children
> > > > > bs=0x565505fbf660 (qcow2) calling bsaci child=0x565505e41d20 
> > > > > (child->bs=0x565506bc0c00)
> > > > > bs=0x565506bc0c00 (file) enter
> > > > > bs=0x565506bc0c00 (file) processing children
> > > > > bs=0x565506bc0c00 (file) processing parents
> > > > > b

Re: [PATCH v3 4/8] xen/hypfs: support dynamic hypfs nodes

2020-12-17 Thread Jan Beulich
On 09.12.2020 17:09, Juergen Gross wrote:
> @@ -158,6 +159,30 @@ static void node_exit_all(void)
>  node_exit(*last);
>  }
>  
> +void *hypfs_alloc_dyndata_size(unsigned long size)
> +{
> +unsigned int cpu = smp_processor_id();
> +
> +ASSERT(per_cpu(hypfs_locked, cpu) != hypfs_unlocked);
> +ASSERT(per_cpu(hypfs_dyndata, cpu) == NULL);
> +
> +per_cpu(hypfs_dyndata, cpu) = xzalloc_bytes(size);
> +
> +return per_cpu(hypfs_dyndata, cpu);
> +}
> +
> +void *hypfs_get_dyndata(void)
> +{
> +ASSERT(this_cpu(hypfs_dyndata));
> +
> +return this_cpu(hypfs_dyndata);
> +}
> +
> +void hypfs_free_dyndata(void)
> +{
> +XFREE(this_cpu(hypfs_dyndata));
> +}

In all three cases, would an intermediate local variable perhaps
yield better generated code? (In hypfs_get_dyndata() this may be
less important because the 2nd use is only an ASSERT().)

> @@ -219,6 +244,12 @@ int hypfs_add_dir(struct hypfs_entry_dir *parent,
>  return ret;
>  }
>  
> +void hypfs_add_dyndir(struct hypfs_entry_dir *parent,
> +  struct hypfs_entry_dir *template)
> +{
> +template->e.parent = &parent->e;
> +}

I'm struggling with the direction here: This makes the template
point at the parent, but the parent will still have no
"knowledge" of its new templated children. I suppose that's how
it is meant to be, but maybe this could do with a comment, since
it's the opposite way of hypfs_add_dir()?

Also - does this mean parent may not also have further children,
templated or "normal"?

> @@ -177,6 +182,10 @@ struct hypfs_entry *hypfs_leaf_findentry(const struct hypfs_entry_dir *dir,
>  struct hypfs_entry *hypfs_dir_findentry(const struct hypfs_entry_dir *dir,
>  const char *name,
>  unsigned int name_len);
> +void *hypfs_alloc_dyndata_size(unsigned long size);
> +#define hypfs_alloc_dyndata(type) (type *)hypfs_alloc_dyndata_size(sizeof(type))

This wants an extra pair of parentheses.

As a minor point, I also wonder whether you really want the type
unsafe version to be easily usable. It would be possible to
largely "hide" it by having

void *hypfs_alloc_dyndata(unsigned long size);
#define hypfs_alloc_dyndata(type) ((type *)hypfs_alloc_dyndata(sizeof(type)))

Jan



Re: [PATCH v3 4/8] xen/hypfs: support dynamic hypfs nodes

2020-12-17 Thread Jürgen Groß

On 17.12.20 12:01, Jan Beulich wrote:
> On 09.12.2020 17:09, Juergen Gross wrote:
>> @@ -158,6 +159,30 @@ static void node_exit_all(void)
>>  node_exit(*last);
>>  }
>>  
>> +void *hypfs_alloc_dyndata_size(unsigned long size)
>> +{
>> +unsigned int cpu = smp_processor_id();
>> +
>> +ASSERT(per_cpu(hypfs_locked, cpu) != hypfs_unlocked);
>> +ASSERT(per_cpu(hypfs_dyndata, cpu) == NULL);
>> +
>> +per_cpu(hypfs_dyndata, cpu) = xzalloc_bytes(size);
>> +
>> +return per_cpu(hypfs_dyndata, cpu);
>> +}
>> +
>> +void *hypfs_get_dyndata(void)
>> +{
>> +ASSERT(this_cpu(hypfs_dyndata));
>> +
>> +return this_cpu(hypfs_dyndata);
>> +}
>> +
>> +void hypfs_free_dyndata(void)
>> +{
>> +XFREE(this_cpu(hypfs_dyndata));
>> +}
>
> In all three cases, would an intermediate local variable perhaps
> yield better generated code? (In hypfs_get_dyndata() this may be
> less important because the 2nd use is only an ASSERT().)

Okay.

>> @@ -219,6 +244,12 @@ int hypfs_add_dir(struct hypfs_entry_dir *parent,
>>  return ret;
>>  }
>>  
>> +void hypfs_add_dyndir(struct hypfs_entry_dir *parent,
>> +  struct hypfs_entry_dir *template)
>> +{
>> +template->e.parent = &parent->e;
>> +}
>
> I'm struggling with the direction here: This makes the template
> point at the parent, but the parent will still have no
> "knowledge" of its new templated children. I suppose that's how
> it is meant to be, but maybe this could do with a comment, since
> it's the opposite way of hypfs_add_dir()?

I'll add a comment.

> Also - does this mean parent may not also have further children,
> templated or "normal"?

No, the related read and findentry functions just need to cover that
case, e.g. by calling multiple sub-functions.

>> @@ -177,6 +182,10 @@ struct hypfs_entry *hypfs_leaf_findentry(const struct hypfs_entry_dir *dir,
>>  struct hypfs_entry *hypfs_dir_findentry(const struct hypfs_entry_dir *dir,
>>  const char *name,
>>  unsigned int name_len);
>> +void *hypfs_alloc_dyndata_size(unsigned long size);
>> +#define hypfs_alloc_dyndata(type) (type *)hypfs_alloc_dyndata_size(sizeof(type))
>
> This wants an extra pair of parentheses.

Okay.

> As a minor point, I also wonder whether you really want the type
> unsafe version to be easily usable. It would be possible to
> largely "hide" it by having
>
> void *hypfs_alloc_dyndata(unsigned long size);
> #define hypfs_alloc_dyndata(type) ((type *)hypfs_alloc_dyndata(sizeof(type)))

Yes, will change.


Juergen




Re: [PATCH v3 5/8] xen/hypfs: add support for id-based dynamic directories

2020-12-17 Thread Jan Beulich
On 09.12.2020 17:09, Juergen Gross wrote:
> +static const struct hypfs_entry *hypfs_dyndir_enter(
> +const struct hypfs_entry *entry)
> +{
> +const struct hypfs_dyndir_id *data;
> +
> +data = hypfs_get_dyndata();
> +
> +/* Use template with original enter function. */
> +return data->template->e.funcs->enter(&data->template->e);
> +}

At the example of this (applies to other uses as well): I realize
hypfs_get_dyndata() asserts that the pointer is non-NULL, but
according to the bottom of ./CODING_STYLE this may not be enough
when considering the implications of a NULL deref in the context
of a PV guest. Even this living behind a sysctl doesn't really
help, both because via XSM not fully privileged domains can be
granted access, and because speculation may still occur all the
way into here. (I'll send a patch to address the latter aspect in
a few minutes.) While likely we have numerous existing examples
with similar problems, I guess in new code we'd better be as
defensive as possible.

> +/*
> + * Fill dyndata with a dynamically generated directory based on a template
> + * and a numerical id.
> + * Needs to be kept in sync with hypfs_read_dyndir_id_entry() regarding the
> + * name generated.
> + */
> +struct hypfs_entry *hypfs_gen_dyndir_id_entry(
> +const struct hypfs_entry_dir *template, unsigned int id, void *data)
> +{

s/directory/entry/ in the comment (and, as I realize only now, then
also for hypfs_read_dyndir_id_entry())?

Jan



Re: [PATCH v3 5/8] xen/hypfs: add support for id-based dynamic directories

2020-12-17 Thread Jürgen Groß

On 17.12.20 12:28, Jan Beulich wrote:
> On 09.12.2020 17:09, Juergen Gross wrote:
>> +static const struct hypfs_entry *hypfs_dyndir_enter(
>> +const struct hypfs_entry *entry)
>> +{
>> +const struct hypfs_dyndir_id *data;
>> +
>> +data = hypfs_get_dyndata();
>> +
>> +/* Use template with original enter function. */
>> +return data->template->e.funcs->enter(&data->template->e);
>> +}
>
> At the example of this (applies to other uses as well): I realize
> hypfs_get_dyndata() asserts that the pointer is non-NULL, but
> according to the bottom of ./CODING_STYLE this may not be enough
> when considering the implications of a NULL deref in the context
> of a PV guest. Even this living behind a sysctl doesn't really
> help, both because via XSM not fully privileged domains can be
> granted access, and because speculation may still occur all the
> way into here. (I'll send a patch to address the latter aspect in
> a few minutes.) While likely we have numerous existing examples
> with similar problems, I guess in new code we'd better be as
> defensive as possible.

What do you suggest? BUG_ON()?

You are aware that this is nothing a user can influence, so it would
be a clear coding error in the hypervisor?

>> +/*
>> + * Fill dyndata with a dynamically generated directory based on a template
>> + * and a numerical id.
>> + * Needs to be kept in sync with hypfs_read_dyndir_id_entry() regarding the
>> + * name generated.
>> + */
>> +struct hypfs_entry *hypfs_gen_dyndir_id_entry(
>> +const struct hypfs_entry_dir *template, unsigned int id, void *data)
>> +{
>
> s/directory/entry/ in the comment (and, as I realize only now, then
> also for hypfs_read_dyndir_id_entry())?

Oh, indeed.


Juergen




[PATCH] xsm/dummy: harden against speculative abuse

2020-12-17 Thread Jan Beulich
First of all don't open-code is_control_domain(), which is already
suitably using evaluate_nospec(). Then also apply this construct to the
other paths of xsm_default_action(). Also guard two paths not using this
function.

Signed-off-by: Jan Beulich 
---
While the functions are always_inline I'm not entirely certain we can
get away with doing this inside of them, rather than in the callers. It
will certainly take more to also guard builds with non-dummy XSM.

--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -76,20 +76,20 @@ static always_inline int xsm_default_act
 case XSM_HOOK:
 return 0;
 case XSM_TARGET:
-if ( src == target )
+if ( evaluate_nospec(src == target) )
 {
 return 0;
 case XSM_XS_PRIV:
-if ( is_xenstore_domain(src) )
+if ( evaluate_nospec(is_xenstore_domain(src)) )
 return 0;
 }
 /* fall through */
 case XSM_DM_PRIV:
-if ( target && src->target == target )
+if ( target && evaluate_nospec(src->target == target) )
 return 0;
 /* fall through */
 case XSM_PRIV:
-if ( src->is_privileged )
+if ( is_control_domain(src) )
 return 0;
 return -EPERM;
 default:
@@ -656,7 +656,7 @@ static XSM_INLINE int xsm_mmu_update(XSM
 XSM_ASSERT_ACTION(XSM_TARGET);
 if ( f != dom_io )
 rc = xsm_default_action(action, d, f);
-if ( t && !rc )
+if ( evaluate_nospec(t) && !rc )
 rc = xsm_default_action(action, d, t);
 return rc;
 }
@@ -750,6 +750,7 @@ static XSM_INLINE int xsm_xen_version (X
 case XENVER_platform_parameters:
 case XENVER_get_features:
 /* These sub-ops ignore the permission checks and return data. */
+block_speculation();
 return 0;
 case XENVER_extraversion:
 case XENVER_compile_info:



Re: [PATCH v3 5/8] xen/hypfs: add support for id-based dynamic directories

2020-12-17 Thread Jan Beulich
On 17.12.2020 12:32, Jürgen Groß wrote:
> On 17.12.20 12:28, Jan Beulich wrote:
>> On 09.12.2020 17:09, Juergen Gross wrote:
>>> +static const struct hypfs_entry *hypfs_dyndir_enter(
>>> +const struct hypfs_entry *entry)
>>> +{
>>> +const struct hypfs_dyndir_id *data;
>>> +
>>> +data = hypfs_get_dyndata();
>>> +
>>> +/* Use template with original enter function. */
>>> +return data->template->e.funcs->enter(&data->template->e);
>>> +}
>>
>> At the example of this (applies to other uses as well): I realize
>> hypfs_get_dyndata() asserts that the pointer is non-NULL, but
>> according to the bottom of ./CODING_STYLE this may not be enough
>> when considering the implications of a NULL deref in the context
>> of a PV guest. Even this living behind a sysctl doesn't really
>> help, both because via XSM not fully privileged domains can be
>> granted access, and because speculation may still occur all the
>> way into here. (I'll send a patch to address the latter aspect in
>> a few minutes.) While likely we have numerous existing examples
>> with similar problems, I guess in new code we'd better be as
>> defensive as possible.
> 
> What do you suggest? BUG_ON()?

Well, BUG_ON() would be a step in the right direction, converting
privilege escalation to DoS. The question is if we can't do better
here, gracefully failing in such a case (the usual pair of
ASSERT_UNREACHABLE() plus return/break/goto approach doesn't fit
here, at least not directly).

> You are aware that this is nothing a user can influence, so it would
> be a clear coding error in the hypervisor?

A user (or guest) can't arrange for there to be a NULL pointer,
but if there is one that can be run into here, this would still
require an XSA afaict.

Jan
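One shape the graceful failure under discussion could take, as a hedged sketch. All names here are hypothetical stand-ins, not hypfs' actual API:

```c
#include <stddef.h>

/*
 * Hypothetical sketch: treat a NULL dyndata pointer as a recoverable
 * coding error rather than asserting or dereferencing it.
 */
struct dyndata_sketch { int id; };

static struct dyndata_sketch *get_dyndata_sketch(struct dyndata_sketch *slot)
{
    return slot;  /* stands in for hypfs_get_dyndata() */
}

static int dyndir_enter_sketch(struct dyndata_sketch *slot)
{
    struct dyndata_sketch *data = get_dyndata_sketch(slot);

    if ( !data )      /* coding error: fail gracefully, no NULL deref */
        return -22;   /* -EINVAL-style error code */

    return data->id;
}
```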



Re: [PATCH v2 2/4] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()

2020-12-17 Thread Vladimir Sementsov-Ogievskiy

17.12.2020 13:58, Kevin Wolf wrote:

Am 17.12.2020 um 10:37 hat Sergio Lopez geschrieben:

On Wed, Dec 16, 2020 at 07:31:02PM +0100, Kevin Wolf wrote:

Am 16.12.2020 um 15:55 hat Sergio Lopez geschrieben:

On Wed, Dec 16, 2020 at 01:35:14PM +0100, Kevin Wolf wrote:

Am 15.12.2020 um 18:23 hat Sergio Lopez geschrieben:

On Tue, Dec 15, 2020 at 04:01:19PM +0100, Kevin Wolf wrote:

Am 15.12.2020 um 14:15 hat Sergio Lopez geschrieben:

On Tue, Dec 15, 2020 at 01:12:33PM +0100, Kevin Wolf wrote:

Am 14.12.2020 um 18:05 hat Sergio Lopez geschrieben:

While processing the parents of a BDS, one of the parents may process
the child that's doing the tail recursion, which leads to a BDS being
processed twice. This is especially problematic for the aio_notifiers,
as they might attempt to work on both the old and the new AIO
contexts.

To avoid this, add the BDS pointer to the ignore list, and check the
child BDS pointer while iterating over the children.

Signed-off-by: Sergio Lopez 


Ugh, so we get a mixed list of BdrvChild and BlockDriverState? :-/


I know, it's effective but quite ugly...


What is the specific scenario where you saw this breaking? Did you have
multiple BdrvChild connections between two nodes so that we would go to
the parent node through one and then come back to the child node through
the other?


I don't think this is a corner case. If the graph is walked top->down,
there's no problem since children are added to the ignore list before
getting processed, and siblings don't process each other. But, if the
graph is walked bottom->up, a BDS will start processing its parents
without adding itself to the ignore list, so there's nothing
preventing them from processing it again.


I don't understand. child is added to ignore before calling the parent
callback on it, so how can we come back through the same BdrvChild?

 QLIST_FOREACH(child, &bs->parents, next_parent) {
 if (g_slist_find(*ignore, child)) {
 continue;
 }
 assert(child->klass->set_aio_ctx);
 *ignore = g_slist_prepend(*ignore, child);
 child->klass->set_aio_ctx(child, new_context, ignore);
 }


Perhaps I'm missing something, but the way I understand it, that loop
is adding the BdrvChild pointer of each of its parents, but not the
BdrvChild pointer of the BDS that was passed as an argument to
b_s_a_c_i.


Generally, the caller has already done that.

In the theoretical case that it was the outermost call in the recursion
and it hasn't (I couldn't find any such case), I think we should still
call the callback for the passed BdrvChild like we currently do.


You didn't dump the BdrvChild here. I think that would add some
information on why we re-entered 0x555ee2fbf660. Maybe you can also add
bs->drv->format_name for each node to make the scenario less abstract?


I've generated another trace with more data:

bs=0x565505e48030 (backup-top) enter
bs=0x565505e48030 (backup-top) processing children
bs=0x565505e48030 (backup-top) calling bsaci child=0x565505e42090 
(child->bs=0x565505e5d420)
bs=0x565505e5d420 (qcow2) enter
bs=0x565505e5d420 (qcow2) processing children
bs=0x565505e5d420 (qcow2) calling bsaci child=0x565505e41ea0 
(child->bs=0x565505e52060)
bs=0x565505e52060 (file) enter
bs=0x565505e52060 (file) processing children
bs=0x565505e52060 (file) processing parents
bs=0x565505e52060 (file) processing itself
bs=0x565505e5d420 (qcow2) processing parents
bs=0x565505e5d420 (qcow2) calling set_aio_ctx child=0x5655066a34d0
bs=0x565505fbf660 (qcow2) enter
bs=0x565505fbf660 (qcow2) processing children
bs=0x565505fbf660 (qcow2) calling bsaci child=0x565505e41d20 
(child->bs=0x565506bc0c00)
bs=0x565506bc0c00 (file) enter
bs=0x565506bc0c00 (file) processing children
bs=0x565506bc0c00 (file) processing parents
bs=0x565506bc0c00 (file) processing itself
bs=0x565505fbf660 (qcow2) processing parents
bs=0x565505fbf660 (qcow2) calling set_aio_ctx child=0x565505fc7aa0
bs=0x565505fbf660 (qcow2) calling set_aio_ctx child=0x5655068b8510
bs=0x565505e48030 (backup-top) enter
bs=0x565505e48030 (backup-top) processing children
bs=0x565505e48030 (backup-top) calling bsaci child=0x565505e3c450 
(child->bs=0x565505fbf660)
bs=0x565505fbf660 (qcow2) enter
bs=0x565505fbf660 (qcow2) processing children
bs=0x565505fbf660 (qcow2) processing parents
bs=0x565505fbf660 (qcow2) processing itself
bs=0x565505e48030 (backup-top) processing parents
bs=0x565505e48030 (backup-top) calling set_aio_ctx child=0x565505e402d0
bs=0x565505e48030 (backup-top) processing itself
bs=0x565505fbf660 (qcow2) processing itself


Hm, is this complete? I see no "processing itself" for
bs=0x565505e5d420. Or is this because it crashed before getting there?


Yes, it crashes there. I forgot to mention that, sorry.


Anyway, trying to reconstruct the block graph with BdrvChild pointers
annotated at the edges:

BlockBackend
   |
   v
   backup-top +
   |   |  |
   |   +---

Re: [PATCH v2 2/4] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()

2020-12-17 Thread Kevin Wolf
Am 17.12.2020 um 13:50 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 17.12.2020 13:58, Kevin Wolf wrote:
> > Am 17.12.2020 um 10:37 hat Sergio Lopez geschrieben:
> > > On Wed, Dec 16, 2020 at 07:31:02PM +0100, Kevin Wolf wrote:
> > > > Am 16.12.2020 um 15:55 hat Sergio Lopez geschrieben:
> > > > > On Wed, Dec 16, 2020 at 01:35:14PM +0100, Kevin Wolf wrote:
> > > > > > Anyway, trying to reconstruct the block graph with BdrvChild 
> > > > > > pointers
> > > > > > annotated at the edges:
> > > > > > 
> > > > > > BlockBackend
> > > > > >|
> > > > > >v
> > > > > >backup-top +
> > > > > >|   |  |
> > > > > >|   +---+  |
> > > > > >|0x5655068b8510 |  | 0x565505e3c450
> > > > > >|   |  |
> > > > > >| 0x565505e42090|  |
> > > > > >v   |  |
> > > > > >  qcow2 -+  |  |
> > > > > >||  |  |
> > > > > >| 0x565505e52060 |  |  | ??? [1]
> > > > > >||  |  |  |
> > > > > >v 0x5655066a34d0 |  |  |  | 0x565505fc7aa0
> > > > > >  file   v  v  v  v
> > > > > >   qcow2 (backing)
> > > > > >  |
> > > > > >  | 0x565505e41d20
> > > > > >  v
> > > > > >file
> > > > > > 
> > > > > > [1] This seems to be a BdrvChild with a non-BDS parent. Probably a
> > > > > >  BdrvChild directly owned by the backup job.
> > > > > > 
> > > > > > > So it seems this is happening:
> > > > > > > 
> > > > > > > backup-top (5e48030) <-| (5)
> > > > > > > ||  |
> > > > > > > || (6) > qcow2 (5fbf660)
> > > > > > > |   ^|
> > > > > > > |   (3) || (4)
> > > > > > > |-> (1) qcow2 (5e5d420) -|-> file (6bc0c00)
> > > > > > > |
> > > > > > > |-> (2) file (5e52060)
> > > > > > > 
> > > > > > > backup-top (5e48030), the BDS that was passed as argument in the 
> > > > > > > first
> > > > > > > bdrv_set_aio_context_ignore() call, is re-entered when qcow2 
> > > > > > > (5fbf660)
> > > > > > > is processing its parents, and the latter is also re-entered when 
> > > > > > > the
> > > > > > > first one starts processing its children again.
> > > > > > 
> > > > > > Yes, but look at the BdrvChild pointers, it is through different 
> > > > > > edges
> > > > > > that we come back to the same node. No BdrvChild is used twice.
> > > > > > 
> > > > > > If backup-top had added all of its children to the ignore list 
> > > > > > before
> > > > > > calling into the overlay qcow2, the backing qcow2 wouldn't 
> > > > > > eventually
> > > > > > have called back into backup-top.
> > > > > 
> > > > > I've tested a patch that first adds every child to the ignore list,
> > > > > and then processes those that weren't there before, as you suggested
> > > > > on a previous email. With that, the offending qcow2 is not re-entered,
> > > > > so we avoid the crash, but backup-top is still entered twice:
> > > > 
> > > > I think we also need to add every parent to the ignore list before calling
> > > > callbacks, though it doesn't look like this is the problem you're
> > > > currently seeing.
> > > 
> > > I agree.
> > > 
> > > > > bs=0x560db0e3b030 (backup-top) enter
> > > > > bs=0x560db0e3b030 (backup-top) processing children
> > > > > bs=0x560db0e3b030 (backup-top) calling bsaci child=0x560db0e2f450 
> > > > > (child->bs=0x560db0fb2660)
> > > > > bs=0x560db0fb2660 (qcow2) enter
> > > > > bs=0x560db0fb2660 (qcow2) processing children
> > > > > bs=0x560db0fb2660 (qcow2) calling bsaci child=0x560db0e34d20 
> > > > > (child->bs=0x560db1bb3c00)
> > > > > bs=0x560db1bb3c00 (file) enter
> > > > > bs=0x560db1bb3c00 (file) processing children
> > > > > bs=0x560db1bb3c00 (file) processing parents
> > > > > bs=0x560db1bb3c00 (file) processing itself
> > > > > bs=0x560db0fb2660 (qcow2) calling bsaci child=0x560db16964d0 
> > > > > (child->bs=0x560db0e50420)
> > > > > bs=0x560db0e50420 (qcow2) enter
> > > > > bs=0x560db0e50420 (qcow2) processing children
> > > > > bs=0x560db0e50420 (qcow2) calling bsaci child=0x560db0e34ea0 
> > > > > (child->bs=0x560db0e45060)
> > > > > bs=0x560db0e45060 (file) enter
> > > > > bs=0x560db0e45060 (file) processing children
> > > > > bs=0x560db0e45060 (file) processing parents
> > > > > bs=0x560db0e45060 (file) processing itself
> > > > > bs=0x560db0e50420 (qcow2) processing parents
> > > > > bs=0x560db0e50420 (qcow2) processing itself
> > > > > bs=0x560db0fb2660 (qcow2) processing parents
> > > > > bs=0x560db0fb2660 (qcow2) calling set_aio_ctx child=0x560db1672860
> > > > > bs=0x560db0fb2660 (qcow2) calling set_aio_ctx child

Re: [PATCH v2 2/4] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()

2020-12-17 Thread Sergio Lopez
On Thu, Dec 17, 2020 at 11:58:30AM +0100, Kevin Wolf wrote:
> Am 17.12.2020 um 10:37 hat Sergio Lopez geschrieben:
> > Do you think it's safe to re-enter backup-top, or should we look for a
> > way to avoid this?
> 
> I think it should be avoided, but I don't understand why putting all
> children of backup-top into the ignore list doesn't already avoid it. If
> backup-top is in the parents list of qcow2, then qcow2 should be in the
> children list of backup-top and therefore the BdrvChild should already
> be in the ignore list.
> 
> The only way I can explain this is that backup-top and qcow2 have
> different ideas about which BdrvChild objects exist that connect them.
> Or that the graph changes between both places, but I don't see how that
> could happen in bdrv_set_aio_context_ignore().

I've been digging around with gdb, and found that, at that point, the
backup-top BDS is actually referenced by two different BdrvChild
objects:

(gdb) p *(BdrvChild *) 0x560c40f7e400
$84 = {bs = 0x560c40c4c030, name = 0x560c41ca4960 "root", klass = 
0x560c3eae7c20 , 
  role = 20, opaque = 0x560c41ca4610, perm = 3, shared_perm = 29, 
has_backup_perm = false, 
  backup_perm = 0, backup_shared_perm = 31, frozen = false, 
parent_quiesce_counter = 2, next = {
le_next = 0x0, le_prev = 0x0}, next_parent = {le_next = 0x0, le_prev = 
0x560c40c44338}}

(gdb) p sibling
$72 = (BdrvChild *) 0x560c40981840
(gdb) p *sibling
$73 = {bs = 0x560c40c4c030, name = 0x560c4161be20 "main node", klass = 
0x560c3eae6a40 , 
  role = 0, opaque = 0x560c4161bc00, perm = 0, shared_perm = 31, 
has_backup_perm = false, 
  backup_perm = 0, backup_shared_perm = 0, frozen = false, 
parent_quiesce_counter = 2, next = {
le_next = 0x0, le_prev = 0x0}, next_parent = {le_next = 0x560c40c442d0, 
le_prev = 0x560c40c501c0}}

When the chain of calls to switch AIO contexts is started, backup-top
is the first one to be processed. blk_do_set_aio_context() instructs
bdrv_child_try_set_aio_context() to add blk->root (0x560c40f7e400) as
the first element in ignore list, but the referenced BDS is still
re-entered through the other BdrvChild (0x560c40981840) by one of the
children of the latter.

I can't think of a way of preventing this other than keeping track of
BDS pointers in the ignore list too. Do you think there are any
alternatives?

Thanks,
Sergio.


signature.asc
Description: PGP signature


Re: [PATCH v3 08/15] x86: add new features for paravirt patching

2020-12-17 Thread kernel test robot
Hi Juergen,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.10]
[cannot apply to xen-tip/linux-next tip/x86/core tip/x86/asm next-20201217]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Juergen-Gross/x86-major-paravirt-cleanup/20201217-173646
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
accefff5b547a9a1d959c7e76ad539bf2480e78b
config: i386-randconfig-r021-20201217 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
# 
https://github.com/0day-ci/linux/commit/032ee351da7a8adab17b0306cf5908b02f5728d2
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Juergen-Gross/x86-major-paravirt-cleanup/20201217-173646
git checkout 032ee351da7a8adab17b0306cf5908b02f5728d2
# save the attached .config to linux build tree
make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   ld: arch/x86/kernel/alternative.o: in function `paravirt_set_cap':
>> arch/x86/kernel/alternative.c:605: undefined reference to 
>> `pv_is_native_spin_unlock'
>> ld: arch/x86/kernel/alternative.c:608: undefined reference to 
>> `pv_is_native_vcpu_is_preempted'


vim +605 arch/x86/kernel/alternative.c

   601  
   602  #ifdef CONFIG_PARAVIRT
   603  static void __init paravirt_set_cap(void)
   604  {
 > 605  if (!pv_is_native_spin_unlock())
   606  setup_force_cpu_cap(X86_FEATURE_PVUNLOCK);
   607  
 > 608  if (!pv_is_native_vcpu_is_preempted())
   609  setup_force_cpu_cap(X86_FEATURE_VCPUPREEMPT);
   610  }
   611  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH v2 2/4] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()

2020-12-17 Thread Sergio Lopez
On Thu, Dec 17, 2020 at 02:06:02PM +0100, Kevin Wolf wrote:
> Am 17.12.2020 um 13:50 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > 17.12.2020 13:58, Kevin Wolf wrote:
> > > Am 17.12.2020 um 10:37 hat Sergio Lopez geschrieben:
> > > > On Wed, Dec 16, 2020 at 07:31:02PM +0100, Kevin Wolf wrote:
> > > > > Am 16.12.2020 um 15:55 hat Sergio Lopez geschrieben:
> > > > > > On Wed, Dec 16, 2020 at 01:35:14PM +0100, Kevin Wolf wrote:
> > > > > > > Anyway, trying to reconstruct the block graph with BdrvChild 
> > > > > > > pointers
> > > > > > > annotated at the edges:
> > > > > > > 
> > > > > > > BlockBackend
> > > > > > >|
> > > > > > >v
> > > > > > >backup-top +
> > > > > > >|   |  |
> > > > > > >|   +---+  |
> > > > > > >|0x5655068b8510 |  | 0x565505e3c450
> > > > > > >|   |  |
> > > > > > >| 0x565505e42090|  |
> > > > > > >v   |  |
> > > > > > >  qcow2 -+  |  |
> > > > > > >||  |  |
> > > > > > >| 0x565505e52060 |  |  | ??? [1]
> > > > > > >||  |  |  |
> > > > > > >v 0x5655066a34d0 |  |  |  | 0x565505fc7aa0
> > > > > > >  file   v  v  v  v
> > > > > > >   qcow2 (backing)
> > > > > > >  |
> > > > > > >  | 0x565505e41d20
> > > > > > >  v
> > > > > > >file
> > > > > > > 
> > > > > > > [1] This seems to be a BdrvChild with a non-BDS parent. Probably a
> > > > > > >  BdrvChild directly owned by the backup job.
> > > > > > > 
> > > > > > > > So it seems this is happening:
> > > > > > > > 
> > > > > > > > backup-top (5e48030) <-| (5)
> > > > > > > > ||  |
> > > > > > > > || (6) > qcow2 (5fbf660)
> > > > > > > > |   ^|
> > > > > > > > |   (3) || (4)
> > > > > > > > |-> (1) qcow2 (5e5d420) -|-> file (6bc0c00)
> > > > > > > > |
> > > > > > > > |-> (2) file (5e52060)
> > > > > > > > 
> > > > > > > > backup-top (5e48030), the BDS that was passed as argument in 
> > > > > > > > the first
> > > > > > > > bdrv_set_aio_context_ignore() call, is re-entered when qcow2 
> > > > > > > > (5fbf660)
> > > > > > > > is processing its parents, and the latter is also re-entered 
> > > > > > > > when the
> > > > > > > > first one starts processing its children again.
> > > > > > > 
> > > > > > > Yes, but look at the BdrvChild pointers, it is through different 
> > > > > > > edges
> > > > > > > that we come back to the same node. No BdrvChild is used twice.
> > > > > > > 
> > > > > > > If backup-top had added all of its children to the ignore list 
> > > > > > > before
> > > > > > > calling into the overlay qcow2, the backing qcow2 wouldn't 
> > > > > > > eventually
> > > > > > > have called back into backup-top.
> > > > > > 
> > > > > > I've tested a patch that first adds every child to the ignore list,
> > > > > > and then processes those that weren't there before, as you suggested
> > > > > > on a previous email. With that, the offending qcow2 is not 
> > > > > > re-entered,
> > > > > > so we avoid the crash, but backup-top is still entered twice:
> > > > > 
> > > > > I think we also need to add every parent to the ignore list before calling
> > > > > callbacks, though it doesn't look like this is the problem you're
> > > > > currently seeing.
> > > > 
> > > > I agree.
> > > > 
> > > > > > bs=0x560db0e3b030 (backup-top) enter
> > > > > > bs=0x560db0e3b030 (backup-top) processing children
> > > > > > bs=0x560db0e3b030 (backup-top) calling bsaci child=0x560db0e2f450 
> > > > > > (child->bs=0x560db0fb2660)
> > > > > > bs=0x560db0fb2660 (qcow2) enter
> > > > > > bs=0x560db0fb2660 (qcow2) processing children
> > > > > > bs=0x560db0fb2660 (qcow2) calling bsaci child=0x560db0e34d20 
> > > > > > (child->bs=0x560db1bb3c00)
> > > > > > bs=0x560db1bb3c00 (file) enter
> > > > > > bs=0x560db1bb3c00 (file) processing children
> > > > > > bs=0x560db1bb3c00 (file) processing parents
> > > > > > bs=0x560db1bb3c00 (file) processing itself
> > > > > > bs=0x560db0fb2660 (qcow2) calling bsaci child=0x560db16964d0 
> > > > > > (child->bs=0x560db0e50420)
> > > > > > bs=0x560db0e50420 (qcow2) enter
> > > > > > bs=0x560db0e50420 (qcow2) processing children
> > > > > > bs=0x560db0e50420 (qcow2) calling bsaci child=0x560db0e34ea0 
> > > > > > (child->bs=0x560db0e45060)
> > > > > > bs=0x560db0e45060 (file) enter
> > > > > > bs=0x560db0e45060 (file) processing children
> > > > > > bs=0x560db0e45060 (file) processing parents
> > > > > > bs=0x560db0e45060 (file) processing itself
> > > > > > bs=0x560db0e50420 (qcow

Re: [PATCH v2 2/4] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()

2020-12-17 Thread Vladimir Sementsov-Ogievskiy

17.12.2020 16:06, Kevin Wolf wrote:

Am 17.12.2020 um 13:50 hat Vladimir Sementsov-Ogievskiy geschrieben:

17.12.2020 13:58, Kevin Wolf wrote:

Am 17.12.2020 um 10:37 hat Sergio Lopez geschrieben:

On Wed, Dec 16, 2020 at 07:31:02PM +0100, Kevin Wolf wrote:

Am 16.12.2020 um 15:55 hat Sergio Lopez geschrieben:

On Wed, Dec 16, 2020 at 01:35:14PM +0100, Kevin Wolf wrote:

Anyway, trying to reconstruct the block graph with BdrvChild pointers
annotated at the edges:

BlockBackend
|
v
backup-top +
|   |  |
|   +---+  |
|0x5655068b8510 |  | 0x565505e3c450
|   |  |
| 0x565505e42090|  |
v   |  |
  qcow2 -+  |  |
||  |  |
| 0x565505e52060 |  |  | ??? [1]
||  |  |  |
v 0x5655066a34d0 |  |  |  | 0x565505fc7aa0
  file   v  v  v  v
   qcow2 (backing)
  |
  | 0x565505e41d20
  v
file

[1] This seems to be a BdrvChild with a non-BDS parent. Probably a
  BdrvChild directly owned by the backup job.


So it seems this is happening:

backup-top (5e48030) <-| (5)
 ||  |
 || (6) > qcow2 (5fbf660)
 |   ^|
 |   (3) || (4)
 |-> (1) qcow2 (5e5d420) -|-> file (6bc0c00)
 |
 |-> (2) file (5e52060)

backup-top (5e48030), the BDS that was passed as argument in the first
bdrv_set_aio_context_ignore() call, is re-entered when qcow2 (5fbf660)
is processing its parents, and the latter is also re-entered when the
first one starts processing its children again.


Yes, but look at the BdrvChild pointers, it is through different edges
that we come back to the same node. No BdrvChild is used twice.

If backup-top had added all of its children to the ignore list before
calling into the overlay qcow2, the backing qcow2 wouldn't eventually
have called back into backup-top.


I've tested a patch that first adds every child to the ignore list,
and then processes those that weren't there before, as you suggested
on a previous email. With that, the offending qcow2 is not re-entered,
so we avoid the crash, but backup-top is still entered twice:


I think we also need to add every parent to the ignore list before calling
callbacks, though it doesn't look like this is the problem you're
currently seeing.


I agree.


bs=0x560db0e3b030 (backup-top) enter
bs=0x560db0e3b030 (backup-top) processing children
bs=0x560db0e3b030 (backup-top) calling bsaci child=0x560db0e2f450 
(child->bs=0x560db0fb2660)
bs=0x560db0fb2660 (qcow2) enter
bs=0x560db0fb2660 (qcow2) processing children
bs=0x560db0fb2660 (qcow2) calling bsaci child=0x560db0e34d20 
(child->bs=0x560db1bb3c00)
bs=0x560db1bb3c00 (file) enter
bs=0x560db1bb3c00 (file) processing children
bs=0x560db1bb3c00 (file) processing parents
bs=0x560db1bb3c00 (file) processing itself
bs=0x560db0fb2660 (qcow2) calling bsaci child=0x560db16964d0 
(child->bs=0x560db0e50420)
bs=0x560db0e50420 (qcow2) enter
bs=0x560db0e50420 (qcow2) processing children
bs=0x560db0e50420 (qcow2) calling bsaci child=0x560db0e34ea0 
(child->bs=0x560db0e45060)
bs=0x560db0e45060 (file) enter
bs=0x560db0e45060 (file) processing children
bs=0x560db0e45060 (file) processing parents
bs=0x560db0e45060 (file) processing itself
bs=0x560db0e50420 (qcow2) processing parents
bs=0x560db0e50420 (qcow2) processing itself
bs=0x560db0fb2660 (qcow2) processing parents
bs=0x560db0fb2660 (qcow2) calling set_aio_ctx child=0x560db1672860
bs=0x560db0fb2660 (qcow2) calling set_aio_ctx child=0x560db1b14a20
bs=0x560db0e3b030 (backup-top) enter
bs=0x560db0e3b030 (backup-top) processing children
bs=0x560db0e3b030 (backup-top) processing parents
bs=0x560db0e3b030 (backup-top) calling set_aio_ctx child=0x560db0e332d0
bs=0x560db0e3b030 (backup-top) processing itself
bs=0x560db0fb2660 (qcow2) processing itself
bs=0x560db0e3b030 (backup-top) calling bsaci child=0x560db0e35090 
(child->bs=0x560db0e50420)
bs=0x560db0e50420 (qcow2) enter
bs=0x560db0e3b030 (backup-top) processing parents
bs=0x560db0e3b030 (backup-top) processing itself

I see that "blk_do_set_aio_context()" passes "blk->root" to
"bdrv_child_try_set_aio_context()" so it's already in the ignore list,
so I'm not sure what's happening here. Is backup-top referenced
from two different BdrvChild or is "blk->root" not pointing to
backup-top's BDS?


The second time that backup-top is entered, it is not as the BDS of
blk->root, but as the parent node of the overlay qcow2. Which is
interesting, because last time it was still the backing qcow2, so the
change did have _som

Re: [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen

2020-12-17 Thread Ash Wilding
Hi Julien,

Thanks for taking a look at the patches and providing feedback. I've seen your
other comments and will reply to those separately when I get a chance (maybe at
the weekend or over the Christmas break).

RE the differences in ordering semantics between Xen's and Linux's atomics
helpers, please find my notes below.

Thoughts?

Cheers,
Ash.


The tables below use format AAA/BBB/CCC/DDD/EEE, where:

 - AAA is the memory barrier before the operation
 - BBB is the acquire semantics of the atomic operation
 - CCC is the release semantics of the atomic operation
 - DDD is whether the asm() block clobbers memory
 - EEE is the memory barrier after the operation

For example, ---/---/rel/mem/dmb would mean:

 - No memory barrier before the operation
 - The atomic does *not* have acquire semantics
 - The atomic *does* have release semantics
 - The asm() block clobbers memory
 - There is a DMB memory barrier after the atomic operation


arm64 LL/SC
===

Xen Function        Xen                  Linux                Inconsistent
================    ===================  ===================  ============

atomic_add          ---/---/---/---/---  ---/---/---/---/---  ---
atomic_add_return   ---/---/rel/mem/dmb  ---/---/rel/mem/dmb  --- (1)
atomic_sub          ---/---/---/---/---  ---/---/---/---/---  ---
atomic_sub_return   ---/---/rel/mem/dmb  ---/---/rel/mem/dmb  --- (1)
atomic_and          ---/---/---/---/---  ---/---/---/---/---  ---
atomic_cmpxchg      dmb/---/---/---/dmb  ---/---/rel/mem/---  YES (2)
atomic_xchg         ---/---/rel/mem/dmb  ---/acq/rel/mem/dmb  YES (3)

(1) It's actually interesting to me that Linux does it this way. As with the
LSE atomics below, I'd have expected acq/rel semantics and ditch the DMB.
Unless I'm missing something where there is a concern around taking an IRQ
between the LDAXR and the STLXR, which can't happen in the LSE atomic case
since it's a single instruction. But the exclusive monitor is cleared on
exception return in AArch64 so I'm struggling to see what that potential
issue may be. Regardless, Linux and Xen are consistent so we're OK ;-)

(2) The Linux version uses either STLXR with rel semantics if the comparison
passes, or DMB if the comparison fails. This is weaker than Xen's version,
which is quite blunt in always wrapping the operation between two DMBs. This
may be a holdover from Xen's arm32 versions being ported to arm64, as we
didn't support acq/rel semantics on LDREX and STREX in Armv7-A? Regardless,
this is quite a big discrepancy and I've not yet given it enough thought to
determine whether it would actually cause an issue. My feeling is that the
Linux LL/SC atomic_cmpxchg() should have acq semantics on the LL, but
like you said these helpers are well tested so I'd be surprised if there
is a bug. See (5) below though, where the Linux LSE atomic_cmpxchg() *does*
have acq semantics.

(3) The Linux version just adds acq semantics to the LL, so we're OK here.


arm64 LSE (comparison to Xen's LL/SC)
=

Xen Function        Xen                  Linux                Inconsistent
================    ===================  ===================  ============

atomic_add          ---/---/---/---/---  ---/---/---/---/---  ---
atomic_add_return   ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)
atomic_sub          ---/---/---/---/---  ---/---/---/---/---  ---
atomic_sub_return   ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)
atomic_and          ---/---/---/---/---  ---/---/---/---/---  ---
atomic_cmpxchg      dmb/---/---/---/dmb  ---/acq/rel/mem/---  YES (5)
atomic_xchg         ---/---/rel/mem/dmb  ---/acq/rel/mem/---  YES (4)

(4) As noted in (1), this is how I would have expected Linux's LL/SC atomics to
work too. I don't think this discrepancy will cause any issues.

(5) As with (2) above, this is quite a big discrepancy to Xen. However at least
this version has acq semantics unlike the LL/SC version in (2), so I'm more
confident that there won't be regressions going from Xen LL/SC to Linux LSE
version of atomic_cmpxchg().
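Expressed with C11 atomics instead of raw LDXR/STLXR assembly, the two cmpxchg flavours compared in (2) and (5) roughly correspond to the following. This is a hedged sketch of the ordering shapes only, not either project's code:

```c
#include <stdatomic.h>

/* "DMB on both sides" style: relaxed CAS bracketed by full fences. */
static int cmpxchg_full_barriers(atomic_int *p, int old, int new)
{
    atomic_thread_fence(memory_order_seq_cst);      /* leading DMB */
    atomic_compare_exchange_strong_explicit(p, &old, new,
                                            memory_order_relaxed,
                                            memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);      /* trailing DMB */
    return old;                                     /* previous value */
}

/* Release-on-success style: ordering carried by the CAS itself. */
static int cmpxchg_release(atomic_int *p, int old, int new)
{
    atomic_compare_exchange_strong_explicit(p, &old, new,
                                            memory_order_release,
                                            memory_order_relaxed);
    return old;                                     /* previous value */
}
```

The first form is strictly stronger (and costlier); the open question above is whether any caller actually depends on that extra strength.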


arm32 LL/SC
===

Xen Function        Xen                  Linux                Inconsistent
================    ===================  ===================  ============

atomic_add          ---/---/---/---/---  ---/---/---/---/---  ---
atomic_add_return   dmb/---/---/---/dmb  XXX/XXX/XXX/XXX/XXX  YES (6)
atomic_sub          ---/---/---/---/---  ---/---/---/

[PATCH v4 0/8] xen/arm: Emulate ID registers

2020-12-17 Thread Bertrand Marquis
The goal of this series is to emulate coprocessor ID registers so that
Xen only publishes to guests features that are supported by Xen and can
actually be used by guests.
One practical example where this is required is SVE support, which is
forbidden by Xen as it is not supported: if Linux is compiled with
it, it will crash on boot. Another one is AMU, which is also forbidden
by Xen, but a Linux kernel compiled with it would crash if the platform
supports it.

To be able to emulate the coprocessor registers defining what features
are supported by the hardware, the TID3 bit of HCR must be set, and
Xen must emulate the values of those registers when an exception is
caught because a guest is accessing them.

This series first creates a guest cpuinfo structure which will
contain the values that we want to publish to the guests, and then
provides the proper emulation for those registers when Xen gets
an exception due to an access to any of those registers.

This is a first simple implementation to solve the problem. The way
to define the values that we provide to guests, and which features are
disabled, will be enhanced in a future patchset so that we can decide
per guest what can be used or not and, depending on this, deduce the
bits to activate in HCR and the values that we must publish in the ID
registers.
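The trap-and-emulate idea can be sketched as follows. This is illustrative only: the field layout matches the architecture (the SVE field of ID_AA64PFR0_EL1 occupies bits [35:32]), but the names and masking policy are assumptions, not this series' exact code:

```c
#include <stdint.h>

/*
 * With HCR_EL2.TID3 set, guest reads of ID group 3 registers trap to
 * the hypervisor, which can return a sanitized value.  Here we hide
 * SVE from the guest by zeroing its ID_AA64PFR0_EL1 field.
 */
#define ID_AA64PFR0_SVE_SHIFT 32
#define ID_AA64PFR0_SVE_MASK  (UINT64_C(0xf) << ID_AA64PFR0_SVE_SHIFT)

static uint64_t sanitize_id_aa64pfr0(uint64_t hw_val)
{
    return hw_val & ~ID_AA64PFR0_SVE_MASK;
}
```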

---
Changes in V2:
  Fix First patch to properly handle DFR1 register and increase dbg32
  size. Other patches have just been rebased.

Changes in V3:
  Add handling of reserved registers as RAZ
  Minor fixes described in each patch

Changes in V4:
  Add a patch to switch implementation to use READ_SYSREG instead of the
  32/64 bit version of it.
  Move cases for reserved register handling from macros to the code
  itself.
  Various typos fixes.

Bertrand Marquis (8):
  xen/arm: Use READ_SYSREG instead of 32/64 versions
  xen/arm: Add ID registers and complete cpuinfo
  xen/arm: Add arm64 ID registers definitions
  xen/arm: create a cpuinfo structure for guest
  xen/arm: Add handler for ID registers on arm64
  xen/arm: Add handler for cp15 ID registers
  xen/arm: Add CP10 exception support to handle MVFR
  xen/arm: Activate TID3 in HCR_EL2

 xen/arch/arm/arm64/vsysreg.c|  82 
 xen/arch/arm/cpufeature.c   | 113 ++--
 xen/arch/arm/traps.c|   7 +-
 xen/arch/arm/vcpreg.c   | 102 +
 xen/include/asm-arm/arm64/hsr.h |  37 +
 xen/include/asm-arm/arm64/sysregs.h |  28 +++
 xen/include/asm-arm/cpregs.h|  15 
 xen/include/asm-arm/cpufeature.h|  58 +++---
 xen/include/asm-arm/perfc_defn.h|   1 +
 xen/include/asm-arm/traps.h |   1 +
 10 files changed, 409 insertions(+), 35 deletions(-)

-- 
2.17.1




[qemu-mainline test] 157613: regressions - FAIL

2020-12-17 Thread osstest service owner
flight 157613 qemu-mainline real [real]
flight 157642 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/157613/
http://logs.test-lab.xenproject.org/osstest/logs/157642/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 152631
 test-amd64-i386-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 152631
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 152631
 test-armhf-armhf-xl-vhd 17 guest-start/debian.repeat fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 152631
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 152631
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 152631
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 152631
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 152631
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152631
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 qemuuaf3f37319cb1e1ca0c42842ecdbd1bcfc64a4b6f
baseline version:
 qemuu1d806cef0e38b5db8347a8e12f214d543204a314

Last test of basis   152631  2020-08-20 09:07:46 Z  119 days
Failing since152659  2020-08-21 14:07:39 Z  118 days  246 attempts
Testing same since   157613  2020-12-16 21:42:22 Z0 days1 attempts


318 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xs

[PATCH v4 1/8] xen/arm: Use READ_SYSREG instead of 32/64 versions

2020-12-17 Thread Bertrand Marquis
Modify the identify_cpu function to use READ_SYSREG instead of
READ_SYSREG32 or READ_SYSREG64.
The aarch32 views of the registers are 64-bit on an aarch64 processor,
so it was wrong to access them as 32-bit registers.

Signed-off-by: Bertrand Marquis 
---
Change in V4:
  This patch was introduced in v4.

---
 xen/arch/arm/cpufeature.c | 50 +++
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
index 44126dbf07..115e1b164d 100644
--- a/xen/arch/arm/cpufeature.c
+++ b/xen/arch/arm/cpufeature.c
@@ -99,44 +99,44 @@ int enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
 
 void identify_cpu(struct cpuinfo_arm *c)
 {
-c->midr.bits = READ_SYSREG32(MIDR_EL1);
+c->midr.bits = READ_SYSREG(MIDR_EL1);
 c->mpidr.bits = READ_SYSREG(MPIDR_EL1);
 
 #ifdef CONFIG_ARM_64
-c->pfr64.bits[0] = READ_SYSREG64(ID_AA64PFR0_EL1);
-c->pfr64.bits[1] = READ_SYSREG64(ID_AA64PFR1_EL1);
+c->pfr64.bits[0] = READ_SYSREG(ID_AA64PFR0_EL1);
+c->pfr64.bits[1] = READ_SYSREG(ID_AA64PFR1_EL1);
 
-c->dbg64.bits[0] = READ_SYSREG64(ID_AA64DFR0_EL1);
-c->dbg64.bits[1] = READ_SYSREG64(ID_AA64DFR1_EL1);
+c->dbg64.bits[0] = READ_SYSREG(ID_AA64DFR0_EL1);
+c->dbg64.bits[1] = READ_SYSREG(ID_AA64DFR1_EL1);
 
-c->aux64.bits[0] = READ_SYSREG64(ID_AA64AFR0_EL1);
-c->aux64.bits[1] = READ_SYSREG64(ID_AA64AFR1_EL1);
+c->aux64.bits[0] = READ_SYSREG(ID_AA64AFR0_EL1);
+c->aux64.bits[1] = READ_SYSREG(ID_AA64AFR1_EL1);
 
-c->mm64.bits[0]  = READ_SYSREG64(ID_AA64MMFR0_EL1);
-c->mm64.bits[1]  = READ_SYSREG64(ID_AA64MMFR1_EL1);
+c->mm64.bits[0]  = READ_SYSREG(ID_AA64MMFR0_EL1);
+c->mm64.bits[1]  = READ_SYSREG(ID_AA64MMFR1_EL1);
 
-c->isa64.bits[0] = READ_SYSREG64(ID_AA64ISAR0_EL1);
-c->isa64.bits[1] = READ_SYSREG64(ID_AA64ISAR1_EL1);
+c->isa64.bits[0] = READ_SYSREG(ID_AA64ISAR0_EL1);
+c->isa64.bits[1] = READ_SYSREG(ID_AA64ISAR1_EL1);
 #endif
 
-c->pfr32.bits[0] = READ_SYSREG32(ID_PFR0_EL1);
-c->pfr32.bits[1] = READ_SYSREG32(ID_PFR1_EL1);
+c->pfr32.bits[0] = READ_SYSREG(ID_PFR0_EL1);
+c->pfr32.bits[1] = READ_SYSREG(ID_PFR1_EL1);
 
-c->dbg32.bits[0] = READ_SYSREG32(ID_DFR0_EL1);
+c->dbg32.bits[0] = READ_SYSREG(ID_DFR0_EL1);
 
-c->aux32.bits[0] = READ_SYSREG32(ID_AFR0_EL1);
+c->aux32.bits[0] = READ_SYSREG(ID_AFR0_EL1);
 
-c->mm32.bits[0]  = READ_SYSREG32(ID_MMFR0_EL1);
-c->mm32.bits[1]  = READ_SYSREG32(ID_MMFR1_EL1);
-c->mm32.bits[2]  = READ_SYSREG32(ID_MMFR2_EL1);
-c->mm32.bits[3]  = READ_SYSREG32(ID_MMFR3_EL1);
+c->mm32.bits[0]  = READ_SYSREG(ID_MMFR0_EL1);
+c->mm32.bits[1]  = READ_SYSREG(ID_MMFR1_EL1);
+c->mm32.bits[2]  = READ_SYSREG(ID_MMFR2_EL1);
+c->mm32.bits[3]  = READ_SYSREG(ID_MMFR3_EL1);
 
-c->isa32.bits[0] = READ_SYSREG32(ID_ISAR0_EL1);
-c->isa32.bits[1] = READ_SYSREG32(ID_ISAR1_EL1);
-c->isa32.bits[2] = READ_SYSREG32(ID_ISAR2_EL1);
-c->isa32.bits[3] = READ_SYSREG32(ID_ISAR3_EL1);
-c->isa32.bits[4] = READ_SYSREG32(ID_ISAR4_EL1);
-c->isa32.bits[5] = READ_SYSREG32(ID_ISAR5_EL1);
+c->isa32.bits[0] = READ_SYSREG(ID_ISAR0_EL1);
+c->isa32.bits[1] = READ_SYSREG(ID_ISAR1_EL1);
+c->isa32.bits[2] = READ_SYSREG(ID_ISAR2_EL1);
+c->isa32.bits[3] = READ_SYSREG(ID_ISAR3_EL1);
+c->isa32.bits[4] = READ_SYSREG(ID_ISAR4_EL1);
+c->isa32.bits[5] = READ_SYSREG(ID_ISAR5_EL1);
 }
 
 /*
-- 
2.17.1




[PATCH v4 2/8] xen/arm: Add ID registers and complete cpuinfo

2020-12-17 Thread Bertrand Marquis
Add definitions and entries in cpuinfo for ID registers introduced in
newer versions of the Arm Architecture Reference Manual:
- ID_PFR2: processor feature register 2
- ID_DFR1: debug feature register 1
- ID_MMFR4 and ID_MMFR5: memory model feature registers 4 and 5
- ID_ISAR6: ISA feature register 6
Add more bitfield definitions in the PFR fields of cpuinfo.
Add the MVFR2 register definition for aarch32.
Add MVFRx_EL1 defines for aarch32.
Add mvfr values in cpuinfo.
Add some register definitions for arm64 in sysregs.h, as some are not
always known by compilers.
Initialize the new values added in cpuinfo in identify_cpu during init.

Signed-off-by: Bertrand Marquis 

---
Changes in V2:
  Fix dbg32 table size and add proper initialisation of the second entry
  of the table by reading ID_DFR1 register.
Changes in V3:
  Fix typo in commit title
  Add MVFR2 definition and handling on aarch32 and remove specific case
  for mvfr field in cpuinfo (now the same on arm64 and arm32).
  Add MMFR4 definition if not known by the compiler.
Changes in V4:
  Add MVFRx_EL1 defines for aarch32
  Use READ_SYSREG instead of the 32/64 versions of the function, which
  removed the ifdef case for MVFR access.
  Use register_t type for the mvfr and zfr64 fields of the cpuinfo
  structure.

---
 xen/arch/arm/cpufeature.c   | 12 +++
 xen/include/asm-arm/arm64/sysregs.h | 28 +++
 xen/include/asm-arm/cpregs.h| 15 
 xen/include/asm-arm/cpufeature.h| 56 -
 4 files changed, 102 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
index 115e1b164d..86b99ee960 100644
--- a/xen/arch/arm/cpufeature.c
+++ b/xen/arch/arm/cpufeature.c
@@ -114,15 +114,20 @@ void identify_cpu(struct cpuinfo_arm *c)
 
 c->mm64.bits[0]  = READ_SYSREG(ID_AA64MMFR0_EL1);
 c->mm64.bits[1]  = READ_SYSREG(ID_AA64MMFR1_EL1);
+c->mm64.bits[2]  = READ_SYSREG(ID_AA64MMFR2_EL1);
 
 c->isa64.bits[0] = READ_SYSREG(ID_AA64ISAR0_EL1);
 c->isa64.bits[1] = READ_SYSREG(ID_AA64ISAR1_EL1);
+
+c->zfr64.bits[0] = READ_SYSREG(ID_AA64ZFR0_EL1);
 #endif
 
 c->pfr32.bits[0] = READ_SYSREG(ID_PFR0_EL1);
 c->pfr32.bits[1] = READ_SYSREG(ID_PFR1_EL1);
+c->pfr32.bits[2] = READ_SYSREG(ID_PFR2_EL1);
 
 c->dbg32.bits[0] = READ_SYSREG(ID_DFR0_EL1);
+c->dbg32.bits[1] = READ_SYSREG(ID_DFR1_EL1);
 
 c->aux32.bits[0] = READ_SYSREG(ID_AFR0_EL1);
 
@@ -130,6 +135,8 @@ void identify_cpu(struct cpuinfo_arm *c)
 c->mm32.bits[1]  = READ_SYSREG(ID_MMFR1_EL1);
 c->mm32.bits[2]  = READ_SYSREG(ID_MMFR2_EL1);
 c->mm32.bits[3]  = READ_SYSREG(ID_MMFR3_EL1);
+c->mm32.bits[4]  = READ_SYSREG(ID_MMFR4_EL1);
+c->mm32.bits[5]  = READ_SYSREG(ID_MMFR5_EL1);
 
 c->isa32.bits[0] = READ_SYSREG(ID_ISAR0_EL1);
 c->isa32.bits[1] = READ_SYSREG(ID_ISAR1_EL1);
@@ -137,6 +144,11 @@ void identify_cpu(struct cpuinfo_arm *c)
 c->isa32.bits[3] = READ_SYSREG(ID_ISAR3_EL1);
 c->isa32.bits[4] = READ_SYSREG(ID_ISAR4_EL1);
 c->isa32.bits[5] = READ_SYSREG(ID_ISAR5_EL1);
+c->isa32.bits[6] = READ_SYSREG(ID_ISAR6_EL1);
+
+c->mvfr.bits[0] = READ_SYSREG(MVFR0_EL1);
+c->mvfr.bits[1] = READ_SYSREG(MVFR1_EL1);
+c->mvfr.bits[2] = READ_SYSREG(MVFR2_EL1);
 }
 
 /*
diff --git a/xen/include/asm-arm/arm64/sysregs.h b/xen/include/asm-arm/arm64/sysregs.h
index c60029d38f..077fd95fb7 100644
--- a/xen/include/asm-arm/arm64/sysregs.h
+++ b/xen/include/asm-arm/arm64/sysregs.h
@@ -57,6 +57,34 @@
 #define ICH_AP1R2_EL2 __AP1Rx_EL2(2)
 #define ICH_AP1R3_EL2 __AP1Rx_EL2(3)
 
+/*
+ * Define ID coprocessor registers if they are not
+ * already defined by the compiler.
+ *
+ * Values picked from linux kernel
+ */
+#ifndef ID_AA64MMFR2_EL1
+#define ID_AA64MMFR2_EL1S3_0_C0_C7_2
+#endif
+#ifndef ID_PFR2_EL1
+#define ID_PFR2_EL1 S3_0_C0_C3_4
+#endif
+#ifndef ID_MMFR4_EL1
+#define ID_MMFR4_EL1S3_0_C0_C2_6
+#endif
+#ifndef ID_MMFR5_EL1
+#define ID_MMFR5_EL1S3_0_C0_C3_6
+#endif
+#ifndef ID_ISAR6_EL1
+#define ID_ISAR6_EL1S3_0_C0_C2_7
+#endif
+#ifndef ID_AA64ZFR0_EL1
+#define ID_AA64ZFR0_EL1 S3_0_C0_C4_4
+#endif
+#ifndef ID_DFR1_EL1
+#define ID_DFR1_EL1 S3_0_C0_C3_5
+#endif
+
 /* Access to system registers */
 
 #define READ_SYSREG32(name) ((uint32_t)READ_SYSREG64(name))
diff --git a/xen/include/asm-arm/cpregs.h b/xen/include/asm-arm/cpregs.h
index 8fd344146e..6daf2b1a30 100644
--- a/xen/include/asm-arm/cpregs.h
+++ b/xen/include/asm-arm/cpregs.h
@@ -63,6 +63,8 @@
 #define FPSID   p10,7,c0,c0,0   /* Floating-Point System ID Register */
 #define FPSCR   p10,7,c1,c0,0   /* Floating-Point Status and Control Register */
 #define MVFR0   p10,7,c7,c0,0   /* Media and VFP Feature Register 0 */
+#define MVFR1   p10,7,c6,c0,0   /* Media and VFP Feature

[PATCH v4 4/8] xen/arm: create a cpuinfo structure for guest

2020-12-17 Thread Bertrand Marquis
Create a cpuinfo structure for guests and mask in it the features that
we do not support in Xen or that we do not want to publish to guests.

Modify some values in the guest cpuinfo structure to hide some
processor features:
- SVE, as this is not supported by Xen and guests are not allowed to use
this feature (ZEN is set to 0 in CPTR_EL2).
- AMU, as HCPTR_TAM is set in CPTR_EL2 so AMU cannot be used by guests.
- RAS, as this is not supported by Xen.
All other bits are left untouched.

The code tries to group register modifications for the same feature
together, so that in the long term a feature can easily be enabled or
disabled depending on user parameters, or other register modifications
can be added in the same place (like enabling/disabling HCR bits).

Signed-off-by: Bertrand Marquis 
---
Changes in V2: Rebase
Changes in V3:
  Use current_cpu_data info instead of recalling identify_cpu
Changes in V4:
  Use boot_cpu_data instead of current_cpu_data
  Use "hide XX support" instead of "disable", as this part of the code
  only hides features from guests rather than disabling them (disabling
  is done through the HCR register).
  Modify commit message to be more clear about what is done in
  guest_cpuinfo.

---
 xen/arch/arm/cpufeature.c| 51 
 xen/include/asm-arm/cpufeature.h |  2 ++
 2 files changed, 53 insertions(+)

diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
index 86b99ee960..1f6a85aafe 100644
--- a/xen/arch/arm/cpufeature.c
+++ b/xen/arch/arm/cpufeature.c
@@ -24,6 +24,8 @@
 
 DECLARE_BITMAP(cpu_hwcaps, ARM_NCAPS);
 
+struct cpuinfo_arm __read_mostly guest_cpuinfo;
+
 void update_cpu_capabilities(const struct arm_cpu_capabilities *caps,
  const char *info)
 {
@@ -151,6 +153,55 @@ void identify_cpu(struct cpuinfo_arm *c)
 c->mvfr.bits[2] = READ_SYSREG(MVFR2_EL1);
 }
 
+/*
+ * This function is creating a cpuinfo structure with values modified to mask
+ * all cpu features that should not be published to guest.
+ * The created structure is then used to provide ID registers values to guests.
+ */
+static int __init create_guest_cpuinfo(void)
+{
+/*
+ * TODO: The code is currently using only the features detected on the boot
+ * core. In the long term we should try to compute values containing only
+ * features supported by all cores.
+ */
+guest_cpuinfo = boot_cpu_data;
+
+#ifdef CONFIG_ARM_64
+/* Hide MPAM support as xen does not support it */
+guest_cpuinfo.pfr64.mpam = 0;
+guest_cpuinfo.pfr64.mpam_frac = 0;
+
+/* Hide SVE as Xen does not support it */
+guest_cpuinfo.pfr64.sve = 0;
+guest_cpuinfo.zfr64.bits[0] = 0;
+
+/* Hide MTE support as Xen does not support it */
+guest_cpuinfo.pfr64.mte = 0;
+#endif
+
+/* Hide AMU support */
+#ifdef CONFIG_ARM_64
+guest_cpuinfo.pfr64.amu = 0;
+#endif
+guest_cpuinfo.pfr32.amu = 0;
+
+/* Hide RAS support as Xen does not support it */
+#ifdef CONFIG_ARM_64
+guest_cpuinfo.pfr64.ras = 0;
+guest_cpuinfo.pfr64.ras_frac = 0;
+#endif
+guest_cpuinfo.pfr32.ras = 0;
+guest_cpuinfo.pfr32.ras_frac = 0;
+
+return 0;
+}
+/*
+ * This function needs to be run after all smp are started to have
+ * cpuinfo structures for all cores.
+ */
+__initcall(create_guest_cpuinfo);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index 74139be1cc..6058744c18 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -283,6 +283,8 @@ extern void identify_cpu(struct cpuinfo_arm *);
 extern struct cpuinfo_arm cpu_data[];
 #define current_cpu_data cpu_data[smp_processor_id()]
 
+extern struct cpuinfo_arm guest_cpuinfo;
+
 #endif /* __ASSEMBLY__ */
 
 #endif
-- 
2.17.1




[PATCH v4 3/8] xen/arm: Add arm64 ID registers definitions

2020-12-17 Thread Bertrand Marquis
Add coprocessor register definitions for all ID registers trapped
through the TID3 bit of HCR_EL2.
Those are the ones that will be emulated in Xen to only publish to
guests the features that are supported by Xen and that are accessible
to guests.

Signed-off-by: Bertrand Marquis 
---
Changes in V2: Rebase
Changes in V3:
  Add case definition for reserved registers.
Changes in V4:
  Remove case definition for reserved registers and move it to the code
  directly.

---
 xen/include/asm-arm/arm64/hsr.h | 37 +
 1 file changed, 37 insertions(+)

diff --git a/xen/include/asm-arm/arm64/hsr.h b/xen/include/asm-arm/arm64/hsr.h
index ca931dd2fe..e691d41c17 100644
--- a/xen/include/asm-arm/arm64/hsr.h
+++ b/xen/include/asm-arm/arm64/hsr.h
@@ -110,6 +110,43 @@
 #define HSR_SYSREG_CNTP_CTL_EL0   HSR_SYSREG(3,3,c14,c2,1)
 #define HSR_SYSREG_CNTP_CVAL_EL0  HSR_SYSREG(3,3,c14,c2,2)
 
+/* Those registers are used when HCR_EL2.TID3 is set */
+#define HSR_SYSREG_ID_PFR0_EL1HSR_SYSREG(3,0,c0,c1,0)
+#define HSR_SYSREG_ID_PFR1_EL1HSR_SYSREG(3,0,c0,c1,1)
+#define HSR_SYSREG_ID_PFR2_EL1HSR_SYSREG(3,0,c0,c3,4)
+#define HSR_SYSREG_ID_DFR0_EL1HSR_SYSREG(3,0,c0,c1,2)
+#define HSR_SYSREG_ID_DFR1_EL1HSR_SYSREG(3,0,c0,c3,5)
+#define HSR_SYSREG_ID_AFR0_EL1HSR_SYSREG(3,0,c0,c1,3)
+#define HSR_SYSREG_ID_MMFR0_EL1   HSR_SYSREG(3,0,c0,c1,4)
+#define HSR_SYSREG_ID_MMFR1_EL1   HSR_SYSREG(3,0,c0,c1,5)
+#define HSR_SYSREG_ID_MMFR2_EL1   HSR_SYSREG(3,0,c0,c1,6)
+#define HSR_SYSREG_ID_MMFR3_EL1   HSR_SYSREG(3,0,c0,c1,7)
+#define HSR_SYSREG_ID_MMFR4_EL1   HSR_SYSREG(3,0,c0,c2,6)
+#define HSR_SYSREG_ID_MMFR5_EL1   HSR_SYSREG(3,0,c0,c3,6)
+#define HSR_SYSREG_ID_ISAR0_EL1   HSR_SYSREG(3,0,c0,c2,0)
+#define HSR_SYSREG_ID_ISAR1_EL1   HSR_SYSREG(3,0,c0,c2,1)
+#define HSR_SYSREG_ID_ISAR2_EL1   HSR_SYSREG(3,0,c0,c2,2)
+#define HSR_SYSREG_ID_ISAR3_EL1   HSR_SYSREG(3,0,c0,c2,3)
+#define HSR_SYSREG_ID_ISAR4_EL1   HSR_SYSREG(3,0,c0,c2,4)
+#define HSR_SYSREG_ID_ISAR5_EL1   HSR_SYSREG(3,0,c0,c2,5)
+#define HSR_SYSREG_ID_ISAR6_EL1   HSR_SYSREG(3,0,c0,c2,7)
+#define HSR_SYSREG_MVFR0_EL1  HSR_SYSREG(3,0,c0,c3,0)
+#define HSR_SYSREG_MVFR1_EL1  HSR_SYSREG(3,0,c0,c3,1)
+#define HSR_SYSREG_MVFR2_EL1  HSR_SYSREG(3,0,c0,c3,2)
+
+#define HSR_SYSREG_ID_AA64PFR0_EL1   HSR_SYSREG(3,0,c0,c4,0)
+#define HSR_SYSREG_ID_AA64PFR1_EL1   HSR_SYSREG(3,0,c0,c4,1)
+#define HSR_SYSREG_ID_AA64DFR0_EL1   HSR_SYSREG(3,0,c0,c5,0)
+#define HSR_SYSREG_ID_AA64DFR1_EL1   HSR_SYSREG(3,0,c0,c5,1)
+#define HSR_SYSREG_ID_AA64ISAR0_EL1  HSR_SYSREG(3,0,c0,c6,0)
+#define HSR_SYSREG_ID_AA64ISAR1_EL1  HSR_SYSREG(3,0,c0,c6,1)
+#define HSR_SYSREG_ID_AA64MMFR0_EL1  HSR_SYSREG(3,0,c0,c7,0)
+#define HSR_SYSREG_ID_AA64MMFR1_EL1  HSR_SYSREG(3,0,c0,c7,1)
+#define HSR_SYSREG_ID_AA64MMFR2_EL1  HSR_SYSREG(3,0,c0,c7,2)
+#define HSR_SYSREG_ID_AA64AFR0_EL1   HSR_SYSREG(3,0,c0,c5,4)
+#define HSR_SYSREG_ID_AA64AFR1_EL1   HSR_SYSREG(3,0,c0,c5,5)
+#define HSR_SYSREG_ID_AA64ZFR0_EL1   HSR_SYSREG(3,0,c0,c4,4)
+
 #endif /* __ASM_ARM_ARM64_HSR_H */
 
 /*
-- 
2.17.1




[PATCH v4 5/8] xen/arm: Add handler for ID registers on arm64

2020-12-17 Thread Bertrand Marquis
Add vsysreg emulation for the registers trapped when the TID3 bit is
set in HCR_EL2.
The emulation returns the value stored in the guest_cpuinfo structure
for known registers and handles reserved registers as RAZ.

Signed-off-by: Bertrand Marquis 
---
Changes in V2: Rebase
Changes in V3:
  Fix commit message
  Fix code style for GENERATE_TID3_INFO declaration
  Add handling of reserved registers as RAZ.
Changes in V4:
  Fix indentation in GENERATE_TID3_INFO macro
  Add explicit case code for reserved registers

---
 xen/arch/arm/arm64/vsysreg.c | 82 
 1 file changed, 82 insertions(+)

diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
index 8a85507d9d..41f18612c6 100644
--- a/xen/arch/arm/arm64/vsysreg.c
+++ b/xen/arch/arm/arm64/vsysreg.c
@@ -69,6 +69,14 @@ TVM_REG(CONTEXTIDR_EL1)
 break;  \
 }
 
+/* Macro to generate easily case for ID co-processor emulation */
+#define GENERATE_TID3_INFO(reg, field, offset)  \
+case HSR_SYSREG_##reg:  \
+{   \
+return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr,   \
+  1, guest_cpuinfo.field.bits[offset]); \
+}
+
 void do_sysreg(struct cpu_user_regs *regs,
const union hsr hsr)
 {
@@ -259,6 +267,80 @@ void do_sysreg(struct cpu_user_regs *regs,
  */
 return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
 
+/*
+ * HCR_EL2.TID3
+ *
+ * This is trapping most Identification registers used by a guest
+ * to identify the processor features
+ */
+GENERATE_TID3_INFO(ID_PFR0_EL1, pfr32, 0)
+GENERATE_TID3_INFO(ID_PFR1_EL1, pfr32, 1)
+GENERATE_TID3_INFO(ID_PFR2_EL1, pfr32, 2)
+GENERATE_TID3_INFO(ID_DFR0_EL1, dbg32, 0)
+GENERATE_TID3_INFO(ID_DFR1_EL1, dbg32, 1)
+GENERATE_TID3_INFO(ID_AFR0_EL1, aux32, 0)
+GENERATE_TID3_INFO(ID_MMFR0_EL1, mm32, 0)
+GENERATE_TID3_INFO(ID_MMFR1_EL1, mm32, 1)
+GENERATE_TID3_INFO(ID_MMFR2_EL1, mm32, 2)
+GENERATE_TID3_INFO(ID_MMFR3_EL1, mm32, 3)
+GENERATE_TID3_INFO(ID_MMFR4_EL1, mm32, 4)
+GENERATE_TID3_INFO(ID_MMFR5_EL1, mm32, 5)
+GENERATE_TID3_INFO(ID_ISAR0_EL1, isa32, 0)
+GENERATE_TID3_INFO(ID_ISAR1_EL1, isa32, 1)
+GENERATE_TID3_INFO(ID_ISAR2_EL1, isa32, 2)
+GENERATE_TID3_INFO(ID_ISAR3_EL1, isa32, 3)
+GENERATE_TID3_INFO(ID_ISAR4_EL1, isa32, 4)
+GENERATE_TID3_INFO(ID_ISAR5_EL1, isa32, 5)
+GENERATE_TID3_INFO(ID_ISAR6_EL1, isa32, 6)
+GENERATE_TID3_INFO(MVFR0_EL1, mvfr, 0)
+GENERATE_TID3_INFO(MVFR1_EL1, mvfr, 1)
+GENERATE_TID3_INFO(MVFR2_EL1, mvfr, 2)
+GENERATE_TID3_INFO(ID_AA64PFR0_EL1, pfr64, 0)
+GENERATE_TID3_INFO(ID_AA64PFR1_EL1, pfr64, 1)
+GENERATE_TID3_INFO(ID_AA64DFR0_EL1, dbg64, 0)
+GENERATE_TID3_INFO(ID_AA64DFR1_EL1, dbg64, 1)
+GENERATE_TID3_INFO(ID_AA64ISAR0_EL1, isa64, 0)
+GENERATE_TID3_INFO(ID_AA64ISAR1_EL1, isa64, 1)
+GENERATE_TID3_INFO(ID_AA64MMFR0_EL1, mm64, 0)
+GENERATE_TID3_INFO(ID_AA64MMFR1_EL1, mm64, 1)
+GENERATE_TID3_INFO(ID_AA64MMFR2_EL1, mm64, 2)
+GENERATE_TID3_INFO(ID_AA64AFR0_EL1, aux64, 0)
+GENERATE_TID3_INFO(ID_AA64AFR1_EL1, aux64, 1)
+GENERATE_TID3_INFO(ID_AA64ZFR0_EL1, zfr64, 0)
+
+/*
+ * Those cases are catching all Reserved registers trapped by TID3 which
+ * currently have no assignment.
+ * HCR.TID3 is trapping all registers in the group 3:
+ * Op0 == 3, op1 == 0, CRn == c0,CRm == {c1-c7}, op2 == {0-7}.
+ * Those registers are defined as being RO in the Arm Architecture
+ * Reference manual Armv8 (Chapter D12.3.2 of issue F.c) so handle them
+ * as Read-only read as zero.
+ */
+case HSR_SYSREG(3,0,c0,c3,3):
+case HSR_SYSREG(3,0,c0,c3,7):
+case HSR_SYSREG(3,0,c0,c4,2):
+case HSR_SYSREG(3,0,c0,c4,3):
+case HSR_SYSREG(3,0,c0,c4,5):
+case HSR_SYSREG(3,0,c0,c4,6):
+case HSR_SYSREG(3,0,c0,c4,7):
+case HSR_SYSREG(3,0,c0,c5,2):
+case HSR_SYSREG(3,0,c0,c5,3):
+case HSR_SYSREG(3,0,c0,c5,6):
+case HSR_SYSREG(3,0,c0,c5,7):
+case HSR_SYSREG(3,0,c0,c6,2):
+case HSR_SYSREG(3,0,c0,c6,3):
+case HSR_SYSREG(3,0,c0,c6,4):
+case HSR_SYSREG(3,0,c0,c6,5):
+case HSR_SYSREG(3,0,c0,c6,6):
+case HSR_SYSREG(3,0,c0,c6,7):
+case HSR_SYSREG(3,0,c0,c7,3):
+case HSR_SYSREG(3,0,c0,c7,4):
+case HSR_SYSREG(3,0,c0,c7,5):
+case HSR_SYSREG(3,0,c0,c7,6):
+case HSR_SYSREG(3,0,c0,c7,7):
+return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 1);
+
 /*
  * HCR_EL2.TIDCP
  *
-- 
2.17.1




[PATCH v4 6/8] xen/arm: Add handler for cp15 ID registers

2020-12-17 Thread Bertrand Marquis
Add support for emulation of cp15-based ID registers (on arm32, or when
running a 32-bit guest on arm64).
The handlers return the values stored in the guest_cpuinfo structure
for known registers and RAZ for all reserved registers.
Currently the MVFR registers are not supported.

Signed-off-by: Bertrand Marquis 
---
Changes in V2: Rebase
Changes in V3:
  Add case definition for reserved registers
  Add handling of reserved registers as RAZ.
  Fix code style in GENERATE_TID3_INFO declaration
Changes in V4:
  Fix comment for missing t (no to not)
  Put cases for reserved registers directly in the code instead of using
  a define in the cpregs.h header.

---
 xen/arch/arm/vcpreg.c | 65 +++
 1 file changed, 65 insertions(+)

diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
index cdc91cdf5b..1fe07fe02a 100644
--- a/xen/arch/arm/vcpreg.c
+++ b/xen/arch/arm/vcpreg.c
@@ -155,6 +155,24 @@ TVM_REG32(CONTEXTIDR, CONTEXTIDR_EL1)
 break;  \
 }
 
+/* Macro to generate easily case for ID co-processor emulation */
+#define GENERATE_TID3_INFO(reg, field, offset)  \
+case HSR_CPREG32(reg):  \
+{   \
+return handle_ro_read_val(regs, regidx, cp32.read, hsr, \
+  1, guest_cpuinfo.field.bits[offset]); \
+}
+
+/* helper to define cases for all registers for one CRm value */
+#define HSR_CPREG32_TID3_CASES(REG) case HSR_CPREG32(p15,0,c0,REG,0): \
+case HSR_CPREG32(p15,0,c0,REG,1): \
+case HSR_CPREG32(p15,0,c0,REG,2): \
+case HSR_CPREG32(p15,0,c0,REG,3): \
+case HSR_CPREG32(p15,0,c0,REG,4): \
+case HSR_CPREG32(p15,0,c0,REG,5): \
+case HSR_CPREG32(p15,0,c0,REG,6): \
+case HSR_CPREG32(p15,0,c0,REG,7)
+
 void do_cp15_32(struct cpu_user_regs *regs, const union hsr hsr)
 {
 const struct hsr_cp32 cp32 = hsr.cp32;
@@ -286,6 +304,53 @@ void do_cp15_32(struct cpu_user_regs *regs, const union hsr hsr)
  */
 return handle_raz_wi(regs, regidx, cp32.read, hsr, 1);
 
+/*
+ * HCR_EL2.TID3
+ *
+ * This is trapping most Identification registers used by a guest
+ * to identify the processor features
+ */
+GENERATE_TID3_INFO(ID_PFR0, pfr32, 0)
+GENERATE_TID3_INFO(ID_PFR1, pfr32, 1)
+GENERATE_TID3_INFO(ID_PFR2, pfr32, 2)
+GENERATE_TID3_INFO(ID_DFR0, dbg32, 0)
+GENERATE_TID3_INFO(ID_DFR1, dbg32, 1)
+GENERATE_TID3_INFO(ID_AFR0, aux32, 0)
+GENERATE_TID3_INFO(ID_MMFR0, mm32, 0)
+GENERATE_TID3_INFO(ID_MMFR1, mm32, 1)
+GENERATE_TID3_INFO(ID_MMFR2, mm32, 2)
+GENERATE_TID3_INFO(ID_MMFR3, mm32, 3)
+GENERATE_TID3_INFO(ID_MMFR4, mm32, 4)
+GENERATE_TID3_INFO(ID_MMFR5, mm32, 5)
+GENERATE_TID3_INFO(ID_ISAR0, isa32, 0)
+GENERATE_TID3_INFO(ID_ISAR1, isa32, 1)
+GENERATE_TID3_INFO(ID_ISAR2, isa32, 2)
+GENERATE_TID3_INFO(ID_ISAR3, isa32, 3)
+GENERATE_TID3_INFO(ID_ISAR4, isa32, 4)
+GENERATE_TID3_INFO(ID_ISAR5, isa32, 5)
+GENERATE_TID3_INFO(ID_ISAR6, isa32, 6)
+/* MVFR registers are in cp10 not cp15 */
+
+/*
+ * Those cases are catching all Reserved registers trapped by TID3 which
+ * currently have no assignment.
+ * HCR.TID3 is trapping all registers in the group 3:
+ * coproc == p15, opc1 == 0, CRn == c0, CRm == {c2-c7}, opc2 == {0-7}.
+ * Those registers are defined as being RO in the Arm Architecture
+ * Reference manual Armv8 (Chapter D12.3.2 of issue F.c) so handle them
+ * as Read-only read as zero.
+ */
+case HSR_CPREG32(p15,0,c0,c3,0):
+case HSR_CPREG32(p15,0,c0,c3,1):
+case HSR_CPREG32(p15,0,c0,c3,2):
+case HSR_CPREG32(p15,0,c0,c3,3):
+case HSR_CPREG32(p15,0,c0,c3,7):
+HSR_CPREG32_TID3_CASES(c4):
+HSR_CPREG32_TID3_CASES(c5):
+HSR_CPREG32_TID3_CASES(c6):
+HSR_CPREG32_TID3_CASES(c7):
+return handle_ro_raz(regs, regidx, cp32.read, hsr, 1);
+
 /*
  * HCR_EL2.TIDCP
  *
-- 
2.17.1




[PATCH v4 7/8] xen/arm: Add CP10 exception support to handle MVFR

2020-12-17 Thread Bertrand Marquis
Add support for cp10 exception decoding to be able to emulate the
values of MVFR0, MVFR1 and MVFR2 when the TID3 bit of HCR_EL2 is set.
This is required for aarch32 guests accessing the MVFR registers using
vmrs and vmsr instructions.

Signed-off-by: Bertrand Marquis 
---
Changes in V2: Rebase
Changes in V3:
  Add case for MVFR2, fix typo VMFR <-> MVFR.
Changes in V4:
  Fix typo HSR -> HCR
  Move no to not comment fix to previous patch

---
 xen/arch/arm/traps.c |  5 +
 xen/arch/arm/vcpreg.c| 37 
 xen/include/asm-arm/perfc_defn.h |  1 +
 xen/include/asm-arm/traps.h  |  1 +
 4 files changed, 44 insertions(+)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 22bd1bd4c6..28d9d64558 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2097,6 +2097,11 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)
 perfc_incr(trap_cp14_dbg);
 do_cp14_dbg(regs, hsr);
 break;
+case HSR_EC_CP10:
+GUEST_BUG_ON(!psr_mode_is_32bit(regs));
+perfc_incr(trap_cp10);
+do_cp10(regs, hsr);
+break;
 case HSR_EC_CP:
 GUEST_BUG_ON(!psr_mode_is_32bit(regs));
 perfc_incr(trap_cp);
diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
index 1fe07fe02a..cbad8f25a0 100644
--- a/xen/arch/arm/vcpreg.c
+++ b/xen/arch/arm/vcpreg.c
@@ -664,6 +664,43 @@ void do_cp14_dbg(struct cpu_user_regs *regs, const union hsr hsr)
 inject_undef_exception(regs, hsr);
 }
 
+void do_cp10(struct cpu_user_regs *regs, const union hsr hsr)
+{
+const struct hsr_cp32 cp32 = hsr.cp32;
+int regidx = cp32.reg;
+
+if ( !check_conditional_instr(regs, hsr) )
+{
+advance_pc(regs, hsr);
+return;
+}
+
+switch ( hsr.bits & HSR_CP32_REGS_MASK )
+{
+/*
+ * HCR.TID3 is trapping access to MVFR register used to identify the
+ * VFP/Simd using VMRS/VMSR instructions.
+ * Exception encoding is using MRC/MCR standard with the reg field in Crn
+ * as are declared MVFR0 and MVFR1 in cpregs.h
+ */
+GENERATE_TID3_INFO(MVFR0, mvfr, 0)
+GENERATE_TID3_INFO(MVFR1, mvfr, 1)
+GENERATE_TID3_INFO(MVFR2, mvfr, 2)
+
+default:
+gdprintk(XENLOG_ERR,
+ "%s p10, %d, r%d, cr%d, cr%d, %d @ 0x%"PRIregister"\n",
+ cp32.read ? "mrc" : "mcr",
+ cp32.op1, cp32.reg, cp32.crn, cp32.crm, cp32.op2, regs->pc);
+gdprintk(XENLOG_ERR, "unhandled 32-bit CP10 access %#x\n",
+ hsr.bits & HSR_CP32_REGS_MASK);
+inject_undef_exception(regs, hsr);
+return;
+}
+
+advance_pc(regs, hsr);
+}
+
 void do_cp(struct cpu_user_regs *regs, const union hsr hsr)
 {
 const struct hsr_cp cp = hsr.cp;
diff --git a/xen/include/asm-arm/perfc_defn.h b/xen/include/asm-arm/perfc_defn.h
index 6a83185163..31f071222b 100644
--- a/xen/include/asm-arm/perfc_defn.h
+++ b/xen/include/asm-arm/perfc_defn.h
@@ -11,6 +11,7 @@ PERFCOUNTER(trap_cp15_64,  "trap: cp15 64-bit access")
 PERFCOUNTER(trap_cp14_32,  "trap: cp14 32-bit access")
 PERFCOUNTER(trap_cp14_64,  "trap: cp14 64-bit access")
 PERFCOUNTER(trap_cp14_dbg, "trap: cp14 dbg access")
+PERFCOUNTER(trap_cp10, "trap: cp10 access")
 PERFCOUNTER(trap_cp,   "trap: cp access")
 PERFCOUNTER(trap_smc32,"trap: 32-bit smc")
 PERFCOUNTER(trap_hvc32,"trap: 32-bit hvc")
diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h
index 997c37884e..c4a3d0fb1b 100644
--- a/xen/include/asm-arm/traps.h
+++ b/xen/include/asm-arm/traps.h
void do_cp15_64(struct cpu_user_regs *regs, const union hsr hsr);
 void do_cp14_32(struct cpu_user_regs *regs, const union hsr hsr);
 void do_cp14_64(struct cpu_user_regs *regs, const union hsr hsr);
 void do_cp14_dbg(struct cpu_user_regs *regs, const union hsr hsr);
+void do_cp10(struct cpu_user_regs *regs, const union hsr hsr);
 void do_cp(struct cpu_user_regs *regs, const union hsr hsr);
 
 /* SMCCC handling */
-- 
2.17.1




[PATCH v4 8/8] xen/arm: Activate TID3 in HCR_EL2

2020-12-17 Thread Bertrand Marquis
Set the TID3 bit in the HCR register when starting a guest.
This traps all coprocessor ID registers so that we can give guests
values corresponding to what they can actually use, and mask some
features from guests even though they are supported by the underlying
hardware (like SVE or MPAM).

Signed-off-by: Bertrand Marquis 
---
Changes in V2: Rebase
Changes in V3: Rebase
Changes in V4: Rebase

---
 xen/arch/arm/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 28d9d64558..c1a9ad6056 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -98,7 +98,7 @@ register_t get_default_hcr_flags(void)
 {
 return  (HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
  (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
- HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB|HCR_TSW);
+ HCR_TID3|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB|HCR_TSW);
 }
 
 static enum {
-- 
2.17.1




Re: [PATCH v3 6/8] xen/cpupool: add cpupool directories

2020-12-17 Thread Jan Beulich
On 09.12.2020 17:09, Juergen Gross wrote:
> Add /cpupool/ directories to hypfs. Those are completely
> dynamic, so the related hypfs access functions need to be implemented.
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 




Re: [PATCH v3 7/8] xen/cpupool: add scheduling granularity entry to cpupool entries

2020-12-17 Thread Jan Beulich
On 09.12.2020 17:09, Juergen Gross wrote:
> @@ -1080,6 +1092,56 @@ static struct hypfs_entry *cpupool_dir_findentry(
>  return hypfs_gen_dyndir_id_entry(&cpupool_pooldir, id, cpupool);
>  }
>  
> +static int cpupool_gran_read(const struct hypfs_entry *entry,
> + XEN_GUEST_HANDLE_PARAM(void) uaddr)
> +{
> +const struct hypfs_dyndir_id *data;
> +const struct cpupool *cpupool;
> +const char *gran;
> +
> +data = hypfs_get_dyndata();
> +cpupool = data->data;
> +ASSERT(cpupool);

With this and ...

> +static unsigned int hypfs_gran_getsize(const struct hypfs_entry *entry)
> +{
> +const struct hypfs_dyndir_id *data;
> +const struct cpupool *cpupool;
> +const char *gran;
> +
> +data = hypfs_get_dyndata();
> +cpupool = data->data;
> +ASSERT(cpupool);

... this ASSERT() I'd like to first settle our earlier discussion,
before possibly giving my R-b here. No other remaining remarks from
my side.

Jan



Re: [PATCH v3 6/8] xen/cpupool: add cpupool directories

2020-12-17 Thread Dario Faggioli
On Thu, 2020-12-17 at 16:54 +0100, Jan Beulich wrote:
> On 09.12.2020 17:09, Juergen Gross wrote:
> > Add /cpupool/ directories to hypfs. Those are
> > completely
> > dynamic, so the related hypfs access functions need to be
> > implemented.
> > 
> > Signed-off-by: Juergen Gross 
> 
> Reviewed-by: Jan Beulich 
> 
Not needed, I think, but still (and if this hasn't been committed
already):

Reviewed-by: Dario Faggioli 

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)


signature.asc
Description: This is a digitally signed message part


Re: XSA-351 causing Solaris-11 systems to panic during boot.

2020-12-17 Thread boris . ostrovsky


On 12/17/20 2:40 AM, Jan Beulich wrote:
> On 17.12.2020 02:51, boris.ostrov...@oracle.com wrote:
> I think this is acceptable as a workaround, albeit we may want to
> consider further restricting this (at least on staging), like e.g.
> requiring a guest config setting to enable the workaround. 


Maybe, but then someone migrating from a stable release to 4.15 will have to 
modify guest configuration.


> But
> maybe this will need to be part of the MSR policy for the domain
> instead, down the road. We'll definitely want Andrew's view here.
>
> Speaking of staging - before applying anything to the stable
> branches, I think we want to have this addressed on the main
> branch. I can't see how Solaris would work there.


Indeed it won't. I'll need to do that as well (I misinterpreted the statement 
in the XSA about only 4.14- being vulnerable)



-boris




Re: [PATCH v3 14/15] x86/paravirt: switch functions with custom code to ALTERNATIVE

2020-12-17 Thread kernel test robot
Hi Juergen,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to xen-tip/linux-next tip/x86/core tip/x86/asm v5.10 
next-20201217]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Juergen-Gross/x86-major-paravirt-cleanup/20201217-173646
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
accefff5b547a9a1d959c7e76ad539bf2480e78b
config: x86_64-randconfig-a002-20201217 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
# 
https://github.com/0day-ci/linux/commit/bc3cbe0ff1b123a4b7f48c91b32198d7dfe57797
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Juergen-Gross/x86-major-paravirt-cleanup/20201217-173646
git checkout bc3cbe0ff1b123a4b7f48c91b32198d7dfe57797
# save the attached .config to linux build tree
make W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All error/warnings (new ones prefixed by >>):

   arch/x86/entry/entry_64.S: Assembler messages:
>> arch/x86/entry/entry_64.S:1092: Error: junk at end of line, first 
>> unrecognized character is `('
>> arch/x86/entry/entry_64.S:1092: Error: backward ref to unknown label "771:"
>> arch/x86/entry/entry_64.S:1092: Error: backward ref to unknown label "771:"
>> arch/x86/entry/entry_64.S:1092: Error: junk at end of line, first 
>> unrecognized character is `,'
>> arch/x86/entry/entry_64.S:1092: Warning: missing closing '"'
>> arch/x86/entry/entry_64.S:1092: Error: expecting mnemonic; got nothing


vim +1092 arch/x86/entry/entry_64.S

ddeb8f2149de280 arch/x86/kernel/entry_64.S Alexander van Heukelum 2008-11-24  
1089  
424c7d0a9a396ba arch/x86/entry/entry_64.S  Thomas Gleixner2020-03-26  
1090  SYM_CODE_START_LOCAL(error_return)
424c7d0a9a396ba arch/x86/entry/entry_64.S  Thomas Gleixner2020-03-26  
1091  UNWIND_HINT_REGS
424c7d0a9a396ba arch/x86/entry/entry_64.S  Thomas Gleixner2020-03-26 
@1092  DEBUG_ENTRY_ASSERT_IRQS_OFF
424c7d0a9a396ba arch/x86/entry/entry_64.S  Thomas Gleixner2020-03-26  
1093  testb   $3, CS(%rsp)
424c7d0a9a396ba arch/x86/entry/entry_64.S  Thomas Gleixner2020-03-26  
1094  jz  restore_regs_and_return_to_kernel
424c7d0a9a396ba arch/x86/entry/entry_64.S  Thomas Gleixner2020-03-26  
1095  jmp swapgs_restore_regs_and_return_to_usermode
424c7d0a9a396ba arch/x86/entry/entry_64.S  Thomas Gleixner2020-03-26  
1096  SYM_CODE_END(error_return)
424c7d0a9a396ba arch/x86/entry/entry_64.S  Thomas Gleixner2020-03-26  
1097  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH v3 10/15] x86/paravirt: simplify paravirt macros

2020-12-17 Thread kernel test robot
Hi Juergen,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.10]
[cannot apply to xen-tip/linux-next tip/x86/core tip/x86/asm next-20201217]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Juergen-Gross/x86-major-paravirt-cleanup/20201217-173646
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
accefff5b547a9a1d959c7e76ad539bf2480e78b
config: x86_64-randconfig-a016-20201217 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
cee1e7d14f4628d6174b33640d502bff3b54ae45)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/0d13a33e925f799d8487bcc597e2dc016d1fdd16
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Juergen-Gross/x86-major-paravirt-cleanup/20201217-173646
git checkout 0d13a33e925f799d8487bcc597e2dc016d1fdd16
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from arch/x86/kernel/asm-offsets.c:13:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:22:
   In file included from include/linux/writeback.h:14:
   In file included from include/linux/blk-cgroup.h:23:
   In file included from include/linux/blkdev.h:26:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/x86/include/asm/io.h:244:
>> arch/x86/include/asm/paravirt.h:44:2: warning: variable '__eax' is 
>> uninitialized when used within its own initialization [-Wuninitialized]
   PVOP_VCALL0(mmu.flush_tlb_user);
   ^~~
   arch/x86/include/asm/paravirt_types.h:504:2: note: expanded from macro 
'PVOP_VCALL0'
   __PVOP_VCALL(op)
   ^~~~
   arch/x86/include/asm/paravirt_types.h:492:8: note: expanded from macro 
'__PVOP_VCALL'
   (void)PVOP_CALL(long, op, CLBR_ANY, PVOP_VCALL_CLOBBERS,\
 ^~~
   arch/x86/include/asm/paravirt_types.h:471:3: note: expanded from macro 
'PVOP_CALL'
   PVOP_CALL_ARGS; \
   ^~
   arch/x86/include/asm/paravirt_types.h:431:41: note: expanded from macro 
'PVOP_CALL_ARGS'
   __edx = __edx, __ecx = __ecx, __eax = __eax;
 ~   ^
   In file included from arch/x86/kernel/asm-offsets.c:13:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:22:
   In file included from include/linux/writeback.h:14:
   In file included from include/linux/blk-cgroup.h:23:
   In file included from include/linux/blkdev.h:26:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/x86/include/asm/io.h:244:
   arch/x86/include/asm/paravirt.h:49:2: warning: variable '__eax' is 
uninitialized when used within its own initialization [-Wuninitialized]
   PVOP_VCALL0(mmu.flush_tlb_kernel);
   ^
   arch/x86/include/asm/paravirt_types.h:504:2: note: expanded from macro 
'PVOP_VCALL0'
   __PVOP_VCALL(op)
   ^~~~
   arch/x86/include/asm/paravirt_types.h:492:8: note: expanded from macro 
'__PVOP_VCALL'
   (void)PVOP_CALL(long, op, CLBR_ANY, PVOP_VCALL_CLOBBERS,\
 ^~~
   arch/x86/include/asm/paravirt_types.h:471:3: note: expanded from macro 
'PVOP_CALL'
   PVOP_CALL_ARGS; \
   ^~
   arch/x86/include/asm/paravirt_types.h:431:41: note: expanded from macro 
'PVOP_CALL_ARGS'
   __edx = __edx, __ecx = __ecx, __eax = __eax;
 ~   ^
   In file included from arch/x86/kernel/asm-offsets.c:13:
   In file included from include/linux/suspend.h:5:
 

Re: XSA-351 causing Solaris-11 systems to panic during boot.

2020-12-17 Thread Andrew Cooper
On 17/12/2020 16:25, boris.ostrov...@oracle.com wrote:
> On 12/17/20 2:40 AM, Jan Beulich wrote:
>> On 17.12.2020 02:51, boris.ostrov...@oracle.com wrote:
>> I think this is acceptable as a workaround, albeit we may want to
>> consider further restricting this (at least on staging), like e.g.
>> requiring a guest config setting to enable the workaround. 
>
> Maybe, but then someone migrating from a stable release to 4.15 will have to 
> modify guest configuration.
>
>
>> But
>> maybe this will need to be part of the MSR policy for the domain
>> instead, down the road. We'll definitely want Andrew's view here.
>>
>> Speaking of staging - before applying anything to the stable
>> branches, I think we want to have this addressed on the main
>> branch. I can't see how Solaris would work there.
>
> Indeed it won't. I'll need to do that as well (I misinterpreted the statement 
> in the XSA about only 4.14- being vulnerable)

It's hopefully obvious now why we suddenly finished the "lets turn all
unknown MSRs to #GP" work at the point that we did (after dithering on
the point for several years).

To put it bluntly, default MSR readability was not a clever decision at all.

There is a large risk that there is a similar vulnerability elsewhere,
given how poorly documented the MSRs are (and one contemporary CPU I've
got the manual open for has more than 6000 *documented* MSRs).  We did
debate for a while whether the readability of the PPIN MSRs was a
vulnerability or not, before eventually deciding not.

Irrespective of what we do to fix this in Xen, has anyone fixed Solaris yet?

~Andrew



RE: [PATCH v3 06/15] x86/paravirt: switch time pvops functions to use static_call()

2020-12-17 Thread Michael Kelley
From: Juergen Gross  Sent: Thursday, December 17, 2020 1:31 AM

> The time pvops functions are the only ones left which might be
> used in 32-bit mode and which return a 64-bit value.
> 
> Switch them to use the static_call() mechanism instead of pvops, as
> this allows quite some simplification of the pvops implementation.
> 
> Due to include hell this requires to split out the time interfaces
> into a new header file.
> 
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/Kconfig  |  1 +
>  arch/x86/include/asm/mshyperv.h   | 11 
>  arch/x86/include/asm/paravirt.h   | 14 --
>  arch/x86/include/asm/paravirt_time.h  | 38 +++
>  arch/x86/include/asm/paravirt_types.h |  6 -
>  arch/x86/kernel/cpu/vmware.c  |  5 ++--
>  arch/x86/kernel/kvm.c |  3 ++-
>  arch/x86/kernel/kvmclock.c|  3 ++-
>  arch/x86/kernel/paravirt.c| 16 ---
>  arch/x86/kernel/tsc.c |  3 ++-
>  arch/x86/xen/time.c   | 12 -
>  drivers/clocksource/hyperv_timer.c|  5 ++--
>  drivers/xen/time.c|  3 ++-
>  kernel/sched/sched.h  |  1 +
>  14 files changed, 71 insertions(+), 50 deletions(-)
>  create mode 100644 arch/x86/include/asm/paravirt_time.h
>

[snip]
 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffc289992d1b..45942d420626 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -56,17 +56,6 @@ typedef int (*hyperv_fill_flush_list_func)(
>  #define hv_get_raw_timer() rdtsc_ordered()
>  #define hv_get_vector() HYPERVISOR_CALLBACK_VECTOR
> 
> -/*
> - * Reference to pv_ops must be inline so objtool
> - * detection of noinstr violations can work correctly.
> - */
> -static __always_inline void hv_setup_sched_clock(void *sched_clock)
> -{
> -#ifdef CONFIG_PARAVIRT
> - pv_ops.time.sched_clock = sched_clock;
> -#endif
> -}
> -
>  void hyperv_vector_handler(struct pt_regs *regs);
> 
>  static inline void hv_enable_stimer0_percpu_irq(int irq) {}

[snip]

> diff --git a/drivers/clocksource/hyperv_timer.c 
> b/drivers/clocksource/hyperv_timer.c
> index ba04cb381cd3..1ed79993fc50 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  static struct clock_event_device __percpu *hv_clock_event;
>  static u64 hv_sched_clock_offset __ro_after_init;
> @@ -445,7 +446,7 @@ static bool __init hv_init_tsc_clocksource(void)
>   clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
> 
>   hv_sched_clock_offset = hv_read_reference_counter();
> - hv_setup_sched_clock(read_hv_sched_clock_tsc);
> + paravirt_set_sched_clock(read_hv_sched_clock_tsc);
> 
>   return true;
>  }
> @@ -470,6 +471,6 @@ void __init hv_init_clocksource(void)
>   clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
> 
>   hv_sched_clock_offset = hv_read_reference_counter();
> - hv_setup_sched_clock(read_hv_sched_clock_msr);
> + static_call_update(pv_sched_clock, read_hv_sched_clock_msr);
>  }
>  EXPORT_SYMBOL_GPL(hv_init_clocksource);

These Hyper-V changes are problematic as we want to keep hyperv_timer.c
architecture independent.  While only the code for x86/x64 is currently
accepted upstream, code for ARM64 support is in progress.   So we need
to use hv_setup_sched_clock() in hyperv_timer.c, and have the per-arch
implementation in mshyperv.h.

Michael



Re: kexec not working in xen domU?

2020-12-17 Thread Guilherme G. Piccoli
On Mon, Dec 14, 2020 at 5:25 PM Phillip Susi  wrote:
> The regular xen console should work for this, shouldn't it?  So
> earlyprintk=hvc0 I guess?  I also threw in console=hvc0 and loglevel=7:
>
> [  184.734810] systemd-shutdown[1]: Syncing filesystems and block
> devices.
> [  185.772511] systemd-shutdown[1]: Sending SIGTERM to remaining
> processes...
> [  185.896957] systemd-shutdown[1]: Sending SIGKILL to remaining
> processes...
> [  185.90] systemd-shutdown[1]: Unmounting file systems.
> [  185.902180] [1035]: Remounting '/' read-only in with options
> 'errors=remount-ro'.
> [  185.990634] EXT4-fs (xvda1): re-mounted. Opts: errors=remount-ro
> [  186.002373] systemd-shutdown[1]: All filesystems unmounted.
> [  186.002411] systemd-shutdown[1]: Deactivating swaps.
> [  186.002502] systemd-shutdown[1]: All swaps deactivated.
> [  186.002529] systemd-shutdown[1]: Detaching loop devices.
> [  186.002699] systemd-shutdown[1]: All loop devices detached.
> [  186.002727] systemd-shutdown[1]: Stopping MD devices.
> [  186.002814] systemd-shutdown[1]: All MD devices stopped.
> [  186.002840] systemd-shutdown[1]: Detaching DM devices.
> [  186.002974] systemd-shutdown[1]: All DM devices detached.
> [  186.003017] systemd-shutdown[1]: All filesystems, swaps, loop
> devices, MD devices and DM devices detached.
> [  186.168475] systemd-shutdown[1]: Syncing filesystems and block
> devices.
> [  186.169150] systemd-shutdown[1]: Rebooting with kexec.
> [  186.418653] xenbus_probe_frontend: xenbus_frontend_dev_shutdown:
> device/vbd/5632: Initialising != Connected, skipping
> [  186.427377] kexec_core: Starting new kernel
>

Hmm.. not many prints; either earlyprintk didn't work, or it's a really
early boot issue. It might be worth investigating whether it's a purgatory
issue too - did you try to use the ""new"" kexec syscall, by running
"kexec -s -l" instead of just "kexec -l"?
Also, it's worth trying that with an upstream kernel and kexec-tools - I
assume you're doing that already?

Cheers,


Guilherme



Re: XSA-351 causing Solaris-11 systems to panic during boot.

2020-12-17 Thread boris . ostrovsky


On 12/17/20 11:46 AM, Andrew Cooper wrote:
> On 17/12/2020 16:25, boris.ostrov...@oracle.com wrote:
>> On 12/17/20 2:40 AM, Jan Beulich wrote:
>>> On 17.12.2020 02:51, boris.ostrov...@oracle.com wrote:
>>> I think this is acceptable as a workaround, albeit we may want to
>>> consider further restricting this (at least on staging), like e.g.
>>> requiring a guest config setting to enable the workaround. 
>> Maybe, but then someone migrating from a stable release to 4.15 will have to 
>> modify guest configuration.
>>
>>
>>> But
>>> maybe this will need to be part of the MSR policy for the domain
>>> instead, down the road. We'll definitely want Andrew's view here.
>>>
>>> Speaking of staging - before applying anything to the stable
>>> branches, I think we want to have this addressed on the main
>>> branch. I can't see how Solaris would work there.
>> Indeed it won't. I'll need to do that as well (I misinterpreted the 
>> statement in the XSA about only 4.14- being vulnerable)
> It's hopefully obvious now why we suddenly finished the "lets turn all
> unknown MSRs to #GP" work at the point that we did (after dithering on
> the point for several years).
>
> To put it bluntly, default MSR readability was not a clever decision at all.
>
> There is a large risk that there is a similar vulnerability elsewhere,
> given how poorly documented the MSRs are (and one contemporary CPU I've
> got the manual open for has more than 6000 *documented* MSRs).  We did
> debate for a while whether the readability of the PPIN MSRs was a
> vulnerability or not, before eventually deciding not.

> Irrespective of what we do to fix this in Xen, has anyone fixed Solaris yet?


I am not aware of anyone working on this (not that I would be).


-boris




Re: [PATCH] xen: Rework WARN_ON() to return whether a warning was triggered

2020-12-17 Thread Bertrand Marquis
Hi Julien,

> On 15 Dec 2020, at 13:11, Julien Grall  wrote:
> 
> Hi Juergen,
> 
> On 15/12/2020 11:31, Jürgen Groß wrote:
>> On 15.12.20 12:26, Julien Grall wrote:
>>> From: Julien Grall 
>>> 
>>> So far, our implementation of WARN_ON() cannot be used in the following
>>> situation:
>>> 
>>> if ( WARN_ON() )
>>>  ...
>>> 
>>> This is because the WARN_ON() doesn't return whether a warning. Such
>> ... warning has been triggered.
> 
> I will add it.
> 
>>> construction can be handy to have if you have to print more information
>>> and now the stack track.
>> Sorry, I'm not able to parse that sentence.
> 
> Urgh :/. How about the following commit message:
> 
> "So far, our implementation of WARN_ON() cannot be used in the following 
> situation:
> 
> if ( WARN_ON() )
>  ...
> 
> This is because WARN_ON() doesn't return whether a warning has been 
> triggered. Such construction can be handy if you want to print more 
> information and also dump the stack trace.
> 
> Therefore, rework the WARN_ON() implementation to return whether a warning 
> was triggered. The idea was borrowed from Linux".

With that.

Reviewed-by: Bertrand Marquis 

And thanks a lot for this :-)

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall



Re: [PATCH v2] xen/xenbus: make xs_talkv() interruptible

2020-12-17 Thread Andrew Cooper
On 16/12/2020 08:21, Jürgen Groß wrote:
> On 15.12.20 21:59, Andrew Cooper wrote:
>> On 15/12/2020 11:10, Juergen Gross wrote:
>>> In case a process waits for any Xenstore action in the xenbus driver
>>> it should be interruptible by signals.
>>>
>>> Signed-off-by: Juergen Gross 
>>> ---
>>> V2:
>>> - don't special case SIGKILL as libxenstore is handling -EINTR fine
>>> ---
>>>   drivers/xen/xenbus/xenbus_xs.c | 9 -
>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/xen/xenbus/xenbus_xs.c
>>> b/drivers/xen/xenbus/xenbus_xs.c
>>> index 3a06eb699f33..17c8f8a155fd 100644
>>> --- a/drivers/xen/xenbus/xenbus_xs.c
>>> +++ b/drivers/xen/xenbus/xenbus_xs.c
>>> @@ -205,8 +205,15 @@ static bool test_reply(struct xb_req_data *req)
>>>     static void *read_reply(struct xb_req_data *req)
>>>   {
>>> +    int ret;
>>> +
>>>   do {
>>> -    wait_event(req->wq, test_reply(req));
>>> +    ret = wait_event_interruptible(req->wq, test_reply(req));
>>> +
>>> +    if (ret == -ERESTARTSYS && signal_pending(current)) {
>>> +    req->msg.type = XS_ERROR;
>>> +    return ERR_PTR(-EINTR);
>>> +    }
>>
>> So now I can talk fully about the situations which lead to this, I think
>> there is a bit more complexity.
>>
>> It turns out there are a number of issues related to running a Xen
>> system with no xenstored.
>>
>> 1) If a xenstore-write occurs during startup before init-xenstore-domain
>> runs, the former blocks on /dev/xen/xenbus waiting for xenstored to
>> reply, while the latter blocks on /dev/xen/xenbus_backend when trying to
>> tell the dom0 kernel that xenstored is in dom1.  This effectively
>> deadlocks the system.
>
> This should be easy to solve: any request to /dev/xen/xenbus should
> block upfront in case xenstored isn't up yet (could e.g. wait
> interruptible until xenstored_ready is non-zero).

I'm not sure that that would fix the problem.  The problem is that
setting the ring details via /dev/xen/xenbus_backend blocks, which
prevents us launching the xenstored stubdomain, which prevents the
earlier xenbus write being completed.

So long as /dev/xen/xenbus_backend doesn't block, there's no problem
with other /dev/xen/xenbus activity being pending briefly.


Looking at the current logic, I'm not completely convinced.  Even
finding a filled-in evtchn/gfn doesn't mean that xenstored is actually
ready.

There are 3 possible cases.

1) PV guest, and details in start_info
2) HVM guest, and details in HVM_PARAMs
3) No details (expected for dom0).  Something in userspace must provide
details at a later point.

So the setup phases go from nothing, to having ring details, to finding
the ring working.

I think it would be prudent to try reading a key between having details
and declaring the xenstored_ready.  Any activity, even XS_ERROR,
indicates that the other end of the ring is listening.

>
>> 2) If xenstore-watch is running when xenstored dies, it spins at 100%
>> cpu usage making no system calls at all.  This is caused by bad error
>> handling from xs_watch(), and attempting to debug found:
>
> Can you expand on "bad error handling from xs_watch()", please?

do_watch() has

    for ( ... ) { // defaults to an infinite loop
        vec = xs_read_watch();
        if (vec == NULL)
            continue;
        ...
    }


My next plan was to experiment with break instead of continue, which
I'll get to at some point.

>
>>
>> 3) (this issue).  If anyone starts xenstore-watch with no xenstored
>> running at all, it blocks in D in the kernel.
>
> Should be handled with solution for 1).
>
>>
>> The cause is the special handling for watch/unwatch commands which,
>> instead of just queuing up the data for xenstore, explicitly waits for
>> an OK for registering the watch.  This causes a write() system call to
>> block waiting for a non-existent entity to reply.
>>
>> So while this patch does resolve the major usability issue I found (I
>> can't even SIGINT and get my terminal back), I think there are issues.
>>
>> The reason why XS_WATCH/XS_UNWATCH are special cased is because they do
>> require special handling.  The main kernel thread for processing
>> incoming data from xenstored does need to know how to associate each
>> async XS_WATCH_EVENT to the caller who watched the path.
>>
>> Therefore, depending on when this cancellation hits, we might be in any
>> of the following states:
>>
>> 1) the watch is queued in the kernel, but not even sent to xenstored yet
>> 2) the watch is queued in the xenstored ring, but not acted upon
>> 3) the watch is queued in the xenstored ring, and the xenstored has seen
>> it but not replied yet
>> 4) the watch has been processed, but the XS_WATCH reply hasn't been
>> received yet
>> 5) the watch has been processed, and the XS_WATCH reply received
>>
>> State 5 (and a little bit) is the normal success path when xenstored has
>> acted upon the request, and the internal kernel infrastructure is set up
>> appropriately to hand

[xen-unstable-smoke test] 157649: tolerable all pass - PUSHED

2020-12-17 Thread osstest service owner
flight 157649 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157649/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  641723f78d3a0b1982e1cd2ef37d8d877cfe542d
baseline version:
 xen  d81133d45d81d35a4e7445778bfd1179190cbd31

Last test of basis   157621  2020-12-17 02:00:29 Z0 days
Testing same since   157649  2020-12-17 16:00:26 Z0 days1 attempts


People who touched revisions under test:
  Juergen Gross 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   d81133d45d..641723f78d  641723f78d3a0b1982e1cd2ef37d8d877cfe542d -> smoke



Re: [PATCH 1/6] x86/p2m: tidy p2m_add_foreign() a little

2020-12-17 Thread Andrew Cooper
On 15/12/2020 16:25, Jan Beulich wrote:
> Drop a bogus ASSERT() - we don't typically assert incoming domain
> pointers to be non-NULL, and there's no particular reason to do so here.
>
> Replace the open-coded DOMID_SELF check by use of
> rcu_lock_remote_domain_by_id(), at the same time covering the request
> being made with the current domain's actual ID.
>
> Move the "both domains same" check into just the path where it really
> is meaningful.
>
> Swap the order of the two puts, such that
> - the p2m lock isn't needlessly held across put_page(),
> - a separate put_page() on an error path can be avoided,
> - they're inverse to the order of the respective gets.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

> ---
> The DOMID_SELF check being converted also suggests to me that there's an
> implication of tdom == current->domain, which would in turn appear to
> mean the "both domains same" check could as well be dropped altogether.

I don't see anything conceptually wrong with the toolstack creating a
foreign mapping on behalf of a guest at construction time.  I'd go as
far as to argue that it is an interface shortcoming if this didn't
function correctly.

>
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -2560,9 +2560,6 @@ int p2m_add_foreign(struct domain *tdom,
>  int rc;
>  struct domain *fdom;
>  
> -ASSERT(tdom);
> -if ( foreigndom == DOMID_SELF )
> -return -EINVAL;
>  /*
>   * hvm fixme: until support is added to p2m teardown code to cleanup any
>   * foreign entries, limit this to hardware domain only.
> @@ -2573,13 +2570,15 @@ int p2m_add_foreign(struct domain *tdom,
>  if ( foreigndom == DOMID_XEN )
>  fdom = rcu_lock_domain(dom_xen);
>  else
> -fdom = rcu_lock_domain_by_id(foreigndom);
> -if ( fdom == NULL )
> -return -ESRCH;
> +{
> +rc = rcu_lock_remote_domain_by_id(foreigndom, &fdom);

It occurs to me that rcu_lock_remote_domain_by_id()'s self error path
ought to be -EINVAL rather than -EPERM.  It's never for permissions
reasons that we restrict to remote domains like this - always for
technical ones.

But that is definitely content for a different patch.

~Andrew



Re: [PATCH 2/6] x86/mm: p2m_add_foreign() is HVM-only

2020-12-17 Thread Andrew Cooper
On 15/12/2020 16:26, Jan Beulich wrote:
> This is together with its only caller, xenmem_add_to_physmap_one().

I can't parse this sentence.  Perhaps "... as is its only caller," as a
follow-on from the subject sentence.

>  Move
> the latter next to p2m_add_foreign(), allowing this one to become static
> at the same time.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper , although...

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -2639,7 +2646,114 @@ int p2m_add_foreign(struct domain *tdom,
>  return rc;
>  }
>  
> -#ifdef CONFIG_HVM
> +int xenmem_add_to_physmap_one(
> +struct domain *d,
> +unsigned int space,
> +union add_to_physmap_extra extra,
> +unsigned long idx,
> +gfn_t gpfn)
> +{
> +struct page_info *page = NULL;
> +unsigned long gfn = 0 /* gcc ... */, old_gpfn;
> +mfn_t prev_mfn;
> +int rc = 0;
> +mfn_t mfn = INVALID_MFN;
> +p2m_type_t p2mt;
> +
> +switch ( space )
> +{
> +case XENMAPSPACE_shared_info:
> +if ( idx == 0 )
> +mfn = virt_to_mfn(d->shared_info);
> +break;
> +case XENMAPSPACE_grant_table:
> +rc = gnttab_map_frame(d, idx, gpfn, &mfn);
> +if ( rc )
> +return rc;
> +break;
> +case XENMAPSPACE_gmfn:
> +{
> +p2m_type_t p2mt;
> +
> +gfn = idx;
> +mfn = get_gfn_unshare(d, gfn, &p2mt);
> +/* If the page is still shared, exit early */
> +if ( p2m_is_shared(p2mt) )
> +{
> +put_gfn(d, gfn);
> +return -ENOMEM;
> +}
> +page = get_page_from_mfn(mfn, d);
> +if ( unlikely(!page) )
> +mfn = INVALID_MFN;
> +break;
> +}
> +case XENMAPSPACE_gmfn_foreign:
> +return p2m_add_foreign(d, idx, gfn_x(gpfn), extra.foreign_domid);
> +default:
> +break;

... seeing as the function is moving wholesale, can we at least correct
the indention, to save yet another large churn in the future?  (If it
were me, I'd go as far as deleting the default case as well.)

~Andrew



[libvirt test] 157624: regressions - FAIL

2020-12-17 Thread osstest service owner
flight 157624 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157624/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  4252318bb3e863df52c90cbf9b3c70de11fa1a53
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  160 days
Failing since151818  2020-07-11 04:18:52 Z  159 days  154 attempts
Testing same since   157624  2020-12-17 04:19:17 Z0 days1 attempts


People who touched revisions under test:
  Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Andika Triwidada 
  Andrea Bolognani 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastien Orivel 
  Bihong Yu 
  Binfeng Wu 
  Boris Fiuczynski 
  Brian Turek 
  Christian Ehrhardt 
  Christian Schoenebeck 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Ian Wienand 
  Jamie Strandboge 
  Jamie Strandboge 
  Jean-Baptiste Holcroft 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jiri Denemark 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Laine Stump 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Neal Gompa 
  Nico Pache 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Orion Poplawski 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Ricky Tigg 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  Shalini Chellathurai Saroja 
  Shaojun Yang 
  Shi Lei 
  Simon Gaiser 
  Stefan Bader 
  Stefan Berger 
  Szymon Scholz 
  Thomas Huth 
  Tim Wiederhake 
  Tomáš Golembiovský 
  Tuguoyi 
  Wang Xin 
  Weblate 
  Yang Hang 
  Yanqiu Zhang 
  Yi Li 
  Yi Wang 
  Yuri Chornoivan 
  Zheng Chuan 
  zhenwei pi 
  Zhenyu Zheng 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-arm64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-

Re: [PATCH 3/6] x86/p2m: set_{foreign,mmio}_p2m_entry() are HVM-only

2020-12-17 Thread Andrew Cooper
On 15/12/2020 16:26, Jan Beulich wrote:
> Extend a respective #ifdef from inside set_typed_p2m_entry() to around
> all three functions. Add ASSERT_UNREACHABLE() to the latter one's safety
> check path.
>
> Signed-off-by: Jan Beulich 

As the code currently stands, yes.  However, I'm not sure I agree
conceptually.

The p2m APIs are either a common interface to use, or HVM-specific.

PV guests don't actually have a p2m, but some of the APIs are used from
common code (e.g. copy_to/from_guest()), and some p2m concepts are
special cased as identity for PV (technically paging_mode_translate()),
while other concepts, such as foreign/mmio, which do exist for both PV
and HVM guests, are handled with totally different API sets for PV and HVM.

This is a broken mess of an abstraction.  I suspect some of it has to do
with PV autotranslate mode in the past, but that doesn't alter the fact
that we have a totally undocumented and error prone set of APIs here.

Either P2Ms should (fully) be the common abstraction (despite not being
a real object for PV guests), or a different set of APIs should be the
common abstraction, and P2Ms should move to being exclusively for HVM
guests.

(It's also very obvious from all the CONFIG_X86 ifdef-ery that we've got
arch specifics in our common code, and that is another aspect of the API
mess which needs handling.)

I'm honestly not sure which of these would be better, but I'm fairly
sure that either would be better than what we've currently got.  I
certainly think it would be better to have a plan for improvement, to
guide patches like this.

~Andrew



[xen-unstable test] 157617: tolerable FAIL - PUSHED

2020-12-17 Thread osstest service owner
flight 157617 xen-unstable real [real]
flight 157652 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/157617/
http://logs.test-lab.xenproject.org/osstest/logs/157652/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-libvirt-pair 28 guest-migrate/dst_host/src_host/debian.repeat 
fail pass in 157652-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 157568
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 157568
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 157568
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 157568
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 157568
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 157568
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 157568
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 157568
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 157568
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 157568
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 157568
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 xen  ac6a0af3870ba0f7ffb16af3e41827b0a53f88b0
baseline version:
 xen  904148ecb4a59d4c8375d8e8d38117b8605e10ac

Last test of basis   157568  2020-12-15 16:38:41 Z2 days
Testing same since   157617  2020-12-17 00:26:59 Z0 days1 attempts


People who touched revisions under test:
  Christian Lindig 
  Elliot

Re: kexec not working in xen domU?

2020-12-17 Thread Phillip Susi


Guilherme G. Piccoli writes:

> Hm.. not many prints - either earlyprintk didn't work, or it's a really
> early boot issue. It might be worth investigating whether it's a
> purgatory issue too - did you try the "new" kexec syscall, by running
> "kexec -s -l" instead of just "kexec -l"?
> Also, it's worth trying that with an upstream kernel and kexec-tools - I
> assume you're doing that already?

I tried with -s and it didn't help.  So far I tried it originally on my
Ubuntu 20.04 amazon vps, then on my debian testing ( linux 5.9.0 ) on my
local xen server.  I'll try building the latest upstream kernel and
kexec tomorrow.



Re: [PATCH] xen/x86: Fix memory leak in vcpu_create() error path

2020-12-17 Thread Andrew Cooper
On 29/09/2020 07:18, Jan Beulich wrote:
> On 28.09.2020 17:47, Andrew Cooper wrote:
>> Various paths in vcpu_create() end up calling paging_update_paging_modes(),
>> which eventually allocate a monitor pagetable if one doesn't exist.
>>
>> However, an error in vcpu_create() results in the vcpu being cleaned up
>> locally, and not put onto the domain's vcpu list.  Therefore, the monitor
>> table is not freed by {hap,shadow}_teardown()'s loop.  This is caught by
>> assertions later that we've successfully freed the entire hap/shadow memory
>> pool.
>>
>> The per-vcpu loops in domain teardown logic is conceptually wrong, but exist
>> due to insufficient existing structure in the existing logic.
>>
>> Break paging_vcpu_teardown() out of paging_teardown(), with mirrored 
>> breakouts
>> in the hap/shadow code, and use it from arch_vcpu_create()'s error path.  
>> This
>> fixes the memory leak.
>>
>> The new {hap,shadow}_vcpu_teardown() must be idempotent, and are written to 
>> be
>> as tolerable as possible, with the minimum number of safety checks possible.
>> In particular, drop the mfn_valid() check - if junk is in these fields, then
>> Xen is going to explode anyway.
>>
>> Reported-by: Michał Leszczyński 
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Thanks.  (Wow it really is a long time since needing to drop everything
for security work...)

>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -563,30 +563,37 @@ void hap_final_teardown(struct domain *d)
>>  paging_unlock(d);
>>  }
>>  
>> +void hap_vcpu_teardown(struct vcpu *v)
>> +{
>> +struct domain *d = v->domain;
>> +mfn_t mfn;
>> +
>> +paging_lock(d);
>> +
>> +if ( !paging_mode_hap(d) || !v->arch.paging.mode )
>> +goto out;
> Any particular reason you don't use paging_get_hostmode() (as the
> original code did) here? Any particular reason for the seemingly
> redundant (and hence somewhat in conflict with the description's
> "with the minimum number of safety checks possible")
> paging_mode_hap()?

Yes to both.  As you spotted, I converted the shadow side first, and
made the two consistent.

The paging_mode_{shadow,hap}() check is necessary for idempotency.  These
functions really might get called before paging is set up, for an early
failure in domain_create().

The paging mode has nothing really to do with hostmode/guestmode/etc. 
It is the only way of expressing the logic where it is clear that the
lower pointer dereferences are trivially safe.  (Also, the guestmode
predicate isn't going to survive the nested virt work.  It's
conceptually broken.)

~Andrew



Re: [PATCH v4 0/8] xen/arm: Emulate ID registers

2020-12-17 Thread Stefano Stabellini
Actually it passed. It was just a transient internet issue.


On Thu, 17 Dec 2020, no-re...@patchew.org wrote:
> Hi,
> 
> Patchew automatically ran gitlab-ci pipeline with this patch (series) 
> applied, but the job failed. Maybe there's a bug in the patches?
> 
> You can find the link to the pipeline near the end of the report below:
> 
> Type: series
> Message-id: cover.1608214355.git.bertrand.marq...@arm.com
> Subject: [PATCH v4 0/8] xen/arm: Emulate ID registers
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> sleep 10
> patchew gitlab-pipeline-check -p xen-project/patchew/xen
> === TEST SCRIPT END ===
> 
> warning: redirecting to https://gitlab.com/xen-project/patchew/xen.git/
> From https://gitlab.com/xen-project/patchew/xen
>8e0fe4fe5f..904148ecb4  master -> master
> warning: redirecting to https://gitlab.com/xen-project/patchew/xen.git/
> From https://gitlab.com/xen-project/patchew/xen
>  * [new tag]   
> patchew/cover.1608214355.git.bertrand.marq...@arm.com -> 
> patchew/cover.1608214355.git.bertrand.marq...@arm.com
> Switched to a new branch 'test'
> 4fc8dff44c xen/arm: Activate TID3 in HCR_EL2
> d72e6d1faa xen/arm: Add CP10 exception support to handle MVFR
> 9ef18928a0 xen/arm: Add handler for cp15 ID registers
> 09f61edd55 xen/arm: Add handler for ID registers on arm64
> 0a14368a8f xen/arm: create a cpuinfo structure for guest
> 01fd2fca83 xen/arm: Add arm64 ID registers definitions
> e87a25c913 xen/arm: Add ID registers and complete cpuinfo
> 66f3ee6d1a xen/arm: Use READ_SYSREG instead of 32/64 versions
> 
> === OUTPUT BEGIN ===
> [2020-12-17 16:52:57] Looking up pipeline...
> [2020-12-17 16:52:58] Found pipeline 231473331:
> 
> https://gitlab.com/xen-project/patchew/xen/-/pipelines/231473331
> 
> [2020-12-17 16:52:58] Waiting for pipeline to finish...
> [2020-12-17 17:08:03] Still waiting...
> [2020-12-17 17:23:09] Still waiting...
> [2020-12-17 17:38:13] Still waiting...
> [2020-12-17 17:53:18] Still waiting...
> [2020-12-17 18:08:22] Still waiting...
> [2020-12-17 18:23:27] Still waiting...
> [2020-12-17 18:38:32] Still waiting...
> [2020-12-17 18:53:36] Still waiting...
> [2020-12-17 19:08:42] Still waiting...
> [2020-12-17 19:23:48] Still waiting...
> [2020-12-17 19:38:53] Still waiting...
> [2020-12-17 19:53:58] Still waiting...
> [2020-12-17 20:09:03] Still waiting...
> [2020-12-17 20:11:03] Pipeline failed
> [2020-12-17 20:11:04] Job 'qemu-smoke-x86-64-clang-pvh' in stage 'test' is 
> skipped
> [2020-12-17 20:11:04] Job 'qemu-smoke-x86-64-gcc-pvh' in stage 'test' is 
> skipped
> [2020-12-17 20:11:04] Job 'qemu-smoke-x86-64-clang' in stage 'test' is skipped
> [2020-12-17 20:11:04] Job 'qemu-smoke-x86-64-gcc' in stage 'test' is skipped
> [2020-12-17 20:11:04] Job 'build-each-commit-gcc' in stage 'test' is skipped
> [2020-12-17 20:11:04] Job 'debian-unstable-gcc-debug-arm64' in stage 'build' 
> is failed
> [2020-12-17 20:11:04] Job 'debian-unstable-gcc-arm64' in stage 'build' is 
> failed
> === OUTPUT END ===
> 
> Test command exited with code: 1



Re: [PATCH v4 1/8] xen/arm: Use READ_SYSREG instead of 32/64 versions

2020-12-17 Thread Stefano Stabellini
On Thu, 17 Dec 2020, Bertrand Marquis wrote:
> Modify identify_cpu function to use READ_SYSREG instead of READ_SYSREG32
> or READ_SYSREG64.
> The aarch32 versions of the registers are 64bit on an aarch64 processor
> so it was wrong to access them as 32bit registers.

This sentence is a bit confusing because, as an example, MIDR_EL1 is
also an aarch64 register, not only an aarch32 register. Maybe we should
clarify.

Aside from that:

Reviewed-by: Stefano Stabellini 


> Signed-off-by: Bertrand Marquis 
>
> ---
> Change in V4:
>   This patch was introduced in v4.
> 
> ---
>  xen/arch/arm/cpufeature.c | 50 +++
>  1 file changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
> index 44126dbf07..115e1b164d 100644
> --- a/xen/arch/arm/cpufeature.c
> +++ b/xen/arch/arm/cpufeature.c
> @@ -99,44 +99,44 @@ int enable_nonboot_cpu_caps(const struct 
> arm_cpu_capabilities *caps)
>  
>  void identify_cpu(struct cpuinfo_arm *c)
>  {
> -c->midr.bits = READ_SYSREG32(MIDR_EL1);
> +c->midr.bits = READ_SYSREG(MIDR_EL1);
>  c->mpidr.bits = READ_SYSREG(MPIDR_EL1);
>  
>  #ifdef CONFIG_ARM_64
> -c->pfr64.bits[0] = READ_SYSREG64(ID_AA64PFR0_EL1);
> -c->pfr64.bits[1] = READ_SYSREG64(ID_AA64PFR1_EL1);
> +c->pfr64.bits[0] = READ_SYSREG(ID_AA64PFR0_EL1);
> +c->pfr64.bits[1] = READ_SYSREG(ID_AA64PFR1_EL1);
>  
> -c->dbg64.bits[0] = READ_SYSREG64(ID_AA64DFR0_EL1);
> -c->dbg64.bits[1] = READ_SYSREG64(ID_AA64DFR1_EL1);
> +c->dbg64.bits[0] = READ_SYSREG(ID_AA64DFR0_EL1);
> +c->dbg64.bits[1] = READ_SYSREG(ID_AA64DFR1_EL1);
>  
> -c->aux64.bits[0] = READ_SYSREG64(ID_AA64AFR0_EL1);
> -c->aux64.bits[1] = READ_SYSREG64(ID_AA64AFR1_EL1);
> +c->aux64.bits[0] = READ_SYSREG(ID_AA64AFR0_EL1);
> +c->aux64.bits[1] = READ_SYSREG(ID_AA64AFR1_EL1);
>  
> -c->mm64.bits[0]  = READ_SYSREG64(ID_AA64MMFR0_EL1);
> -c->mm64.bits[1]  = READ_SYSREG64(ID_AA64MMFR1_EL1);
> +c->mm64.bits[0]  = READ_SYSREG(ID_AA64MMFR0_EL1);
> +c->mm64.bits[1]  = READ_SYSREG(ID_AA64MMFR1_EL1);
>  
> -c->isa64.bits[0] = READ_SYSREG64(ID_AA64ISAR0_EL1);
> -c->isa64.bits[1] = READ_SYSREG64(ID_AA64ISAR1_EL1);
> +c->isa64.bits[0] = READ_SYSREG(ID_AA64ISAR0_EL1);
> +c->isa64.bits[1] = READ_SYSREG(ID_AA64ISAR1_EL1);
>  #endif
>  
> -c->pfr32.bits[0] = READ_SYSREG32(ID_PFR0_EL1);
> -c->pfr32.bits[1] = READ_SYSREG32(ID_PFR1_EL1);
> +c->pfr32.bits[0] = READ_SYSREG(ID_PFR0_EL1);
> +c->pfr32.bits[1] = READ_SYSREG(ID_PFR1_EL1);
>  
> -c->dbg32.bits[0] = READ_SYSREG32(ID_DFR0_EL1);
> +c->dbg32.bits[0] = READ_SYSREG(ID_DFR0_EL1);
>  
> -c->aux32.bits[0] = READ_SYSREG32(ID_AFR0_EL1);
> +c->aux32.bits[0] = READ_SYSREG(ID_AFR0_EL1);
>  
> -c->mm32.bits[0]  = READ_SYSREG32(ID_MMFR0_EL1);
> -c->mm32.bits[1]  = READ_SYSREG32(ID_MMFR1_EL1);
> -c->mm32.bits[2]  = READ_SYSREG32(ID_MMFR2_EL1);
> -c->mm32.bits[3]  = READ_SYSREG32(ID_MMFR3_EL1);
> +c->mm32.bits[0]  = READ_SYSREG(ID_MMFR0_EL1);
> +c->mm32.bits[1]  = READ_SYSREG(ID_MMFR1_EL1);
> +c->mm32.bits[2]  = READ_SYSREG(ID_MMFR2_EL1);
> +c->mm32.bits[3]  = READ_SYSREG(ID_MMFR3_EL1);
>  
> -c->isa32.bits[0] = READ_SYSREG32(ID_ISAR0_EL1);
> -c->isa32.bits[1] = READ_SYSREG32(ID_ISAR1_EL1);
> -c->isa32.bits[2] = READ_SYSREG32(ID_ISAR2_EL1);
> -c->isa32.bits[3] = READ_SYSREG32(ID_ISAR3_EL1);
> -c->isa32.bits[4] = READ_SYSREG32(ID_ISAR4_EL1);
> -c->isa32.bits[5] = READ_SYSREG32(ID_ISAR5_EL1);
> +c->isa32.bits[0] = READ_SYSREG(ID_ISAR0_EL1);
> +c->isa32.bits[1] = READ_SYSREG(ID_ISAR1_EL1);
> +c->isa32.bits[2] = READ_SYSREG(ID_ISAR2_EL1);
> +c->isa32.bits[3] = READ_SYSREG(ID_ISAR3_EL1);
> +c->isa32.bits[4] = READ_SYSREG(ID_ISAR4_EL1);
> +c->isa32.bits[5] = READ_SYSREG(ID_ISAR5_EL1);
>  }
>  
>  /*
> -- 
> 2.17.1
> 



Re: [PATCH v4 2/8] xen/arm: Add ID registers and complete cpuinfo

2020-12-17 Thread Stefano Stabellini
On Thu, 17 Dec 2020, Bertrand Marquis wrote:
> Add definition and entries in cpuinfo for ID registers introduced in
> newer Arm Architecture reference manual:
> - ID_PFR2: processor feature register 2
> - ID_DFR1: debug feature register 1
> - ID_MMFR4 and ID_MMFR5: Memory model feature registers 4 and 5
> - ID_ISA6: ISA Feature register 6
> Add more bitfield definitions in PFR fields of cpuinfo.
> Add MVFR2 register definition for aarch32.
> Add MVFRx_EL1 defines for aarch32.
> Add mvfr values in cpuinfo.
> Add some registers definition for arm64 in sysregs as some are not
> always know by compilers.
> Initialize the new values added in cpuinfo in identify_cpu during init.
> 
> Signed-off-by: Bertrand Marquis 

Reviewed-by: Stefano Stabellini 


> ---
> Changes in V2:
>   Fix dbg32 table size and add proper initialisation of the second entry
>   of the table by reading ID_DFR1 register.
> Changes in V3:
>   Fix typo in commit title
>   Add MVFR2 definition and handling on aarch32 and remove specific case
>   for mvfr field in cpuinfo (now the same on arm64 and arm32).
>   Add MMFR4 definition if not known by the compiler.
> Changes in V4:
>   Add MVFRx_EL1 defines for aarch32
>   Use READ_SYSREG instead of 32/64 versions of the function which
>   removed the ifdef case for MVFR access.
>   User register_t type for mvfr and zfr64 fields of cpuinfo structure.
> 
> ---
>  xen/arch/arm/cpufeature.c   | 12 +++
>  xen/include/asm-arm/arm64/sysregs.h | 28 +++
>  xen/include/asm-arm/cpregs.h| 15 
>  xen/include/asm-arm/cpufeature.h| 56 -
>  4 files changed, 102 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
> index 115e1b164d..86b99ee960 100644
> --- a/xen/arch/arm/cpufeature.c
> +++ b/xen/arch/arm/cpufeature.c
> @@ -114,15 +114,20 @@ void identify_cpu(struct cpuinfo_arm *c)
>  
>  c->mm64.bits[0]  = READ_SYSREG(ID_AA64MMFR0_EL1);
>  c->mm64.bits[1]  = READ_SYSREG(ID_AA64MMFR1_EL1);
> +c->mm64.bits[2]  = READ_SYSREG(ID_AA64MMFR2_EL1);
>  
>  c->isa64.bits[0] = READ_SYSREG(ID_AA64ISAR0_EL1);
>  c->isa64.bits[1] = READ_SYSREG(ID_AA64ISAR1_EL1);
> +
> +c->zfr64.bits[0] = READ_SYSREG(ID_AA64ZFR0_EL1);
>  #endif
>  
>  c->pfr32.bits[0] = READ_SYSREG(ID_PFR0_EL1);
>  c->pfr32.bits[1] = READ_SYSREG(ID_PFR1_EL1);
> +c->pfr32.bits[2] = READ_SYSREG(ID_PFR2_EL1);
>  
>  c->dbg32.bits[0] = READ_SYSREG(ID_DFR0_EL1);
> +c->dbg32.bits[1] = READ_SYSREG(ID_DFR1_EL1);
>  
>  c->aux32.bits[0] = READ_SYSREG(ID_AFR0_EL1);
>  
> @@ -130,6 +135,8 @@ void identify_cpu(struct cpuinfo_arm *c)
>  c->mm32.bits[1]  = READ_SYSREG(ID_MMFR1_EL1);
>  c->mm32.bits[2]  = READ_SYSREG(ID_MMFR2_EL1);
>  c->mm32.bits[3]  = READ_SYSREG(ID_MMFR3_EL1);
> +c->mm32.bits[4]  = READ_SYSREG(ID_MMFR4_EL1);
> +c->mm32.bits[5]  = READ_SYSREG(ID_MMFR5_EL1);
>  
>  c->isa32.bits[0] = READ_SYSREG(ID_ISAR0_EL1);
>  c->isa32.bits[1] = READ_SYSREG(ID_ISAR1_EL1);
> @@ -137,6 +144,11 @@ void identify_cpu(struct cpuinfo_arm *c)
>  c->isa32.bits[3] = READ_SYSREG(ID_ISAR3_EL1);
>  c->isa32.bits[4] = READ_SYSREG(ID_ISAR4_EL1);
>  c->isa32.bits[5] = READ_SYSREG(ID_ISAR5_EL1);
> +c->isa32.bits[6] = READ_SYSREG(ID_ISAR6_EL1);
> +
> +c->mvfr.bits[0] = READ_SYSREG(MVFR0_EL1);
> +c->mvfr.bits[1] = READ_SYSREG(MVFR1_EL1);
> +c->mvfr.bits[2] = READ_SYSREG(MVFR2_EL1);
>  }
>  
>  /*
> diff --git a/xen/include/asm-arm/arm64/sysregs.h 
> b/xen/include/asm-arm/arm64/sysregs.h
> index c60029d38f..077fd95fb7 100644
> --- a/xen/include/asm-arm/arm64/sysregs.h
> +++ b/xen/include/asm-arm/arm64/sysregs.h
> @@ -57,6 +57,34 @@
>  #define ICH_AP1R2_EL2 __AP1Rx_EL2(2)
>  #define ICH_AP1R3_EL2 __AP1Rx_EL2(3)
>  
> +/*
> + * Define ID coprocessor registers if they are not
> + * already defined by the compiler.
> + *
> + * Values picked from linux kernel
> + */
> +#ifndef ID_AA64MMFR2_EL1
> +#define ID_AA64MMFR2_EL1S3_0_C0_C7_2
> +#endif
> +#ifndef ID_PFR2_EL1
> +#define ID_PFR2_EL1 S3_0_C0_C3_4
> +#endif
> +#ifndef ID_MMFR4_EL1
> +#define ID_MMFR4_EL1S3_0_C0_C2_6
> +#endif
> +#ifndef ID_MMFR5_EL1
> +#define ID_MMFR5_EL1S3_0_C0_C3_6
> +#endif
> +#ifndef ID_ISAR6_EL1
> +#define ID_ISAR6_EL1S3_0_C0_C2_7
> +#endif
> +#ifndef ID_AA64ZFR0_EL1
> +#define ID_AA64ZFR0_EL1 S3_0_C0_C4_4
> +#endif
> +#ifndef ID_DFR1_EL1
> +#define ID_DFR1_EL1 S3_0_C0_C3_5
> +#endif
> +
>  /* Access to system registers */
>  
>  #define READ_SYSREG32(name) ((uint32_t)READ_SYSREG64(name))
> diff --git a/xen/include/asm-arm/cpregs.h b/xen/include/asm-arm/cpregs.h
> index 8fd344146e..6daf2b1a30 100644
> --- a/xen/include/asm-arm/cpregs.h
> +++ b/xen/include/asm-arm/cpregs.h
>

Re: [PATCH v4 3/8] xen/arm: Add arm64 ID registers definitions

2020-12-17 Thread Stefano Stabellini
On Thu, 17 Dec 2020, Bertrand Marquis wrote:
> Add coprocessor registers definitions for all ID registers trapped
> through the TID3 bit of HSR.
> Those are the one that will be emulated in Xen to only publish to guests
> the features that are supported by Xen and that are accessible to
> guests.
> 
> Signed-off-by: Bertrand Marquis 

Reviewed-by: Stefano Stabellini 


> ---
> Changes in V2: Rebase
> Changes in V3:
>   Add case definition for reserved registers.
> Changes in V4:
>   Remove case definition for reserved registers and move it to the code
>   directly.
> 
> ---
>  xen/include/asm-arm/arm64/hsr.h | 37 +
>  1 file changed, 37 insertions(+)
> 
> diff --git a/xen/include/asm-arm/arm64/hsr.h b/xen/include/asm-arm/arm64/hsr.h
> index ca931dd2fe..e691d41c17 100644
> --- a/xen/include/asm-arm/arm64/hsr.h
> +++ b/xen/include/asm-arm/arm64/hsr.h
> @@ -110,6 +110,43 @@
>  #define HSR_SYSREG_CNTP_CTL_EL0   HSR_SYSREG(3,3,c14,c2,1)
>  #define HSR_SYSREG_CNTP_CVAL_EL0  HSR_SYSREG(3,3,c14,c2,2)
>  
> +/* Those registers are used when HCR_EL2.TID3 is set */
> +#define HSR_SYSREG_ID_PFR0_EL1HSR_SYSREG(3,0,c0,c1,0)
> +#define HSR_SYSREG_ID_PFR1_EL1HSR_SYSREG(3,0,c0,c1,1)
> +#define HSR_SYSREG_ID_PFR2_EL1HSR_SYSREG(3,0,c0,c3,4)
> +#define HSR_SYSREG_ID_DFR0_EL1HSR_SYSREG(3,0,c0,c1,2)
> +#define HSR_SYSREG_ID_DFR1_EL1HSR_SYSREG(3,0,c0,c3,5)
> +#define HSR_SYSREG_ID_AFR0_EL1HSR_SYSREG(3,0,c0,c1,3)
> +#define HSR_SYSREG_ID_MMFR0_EL1   HSR_SYSREG(3,0,c0,c1,4)
> +#define HSR_SYSREG_ID_MMFR1_EL1   HSR_SYSREG(3,0,c0,c1,5)
> +#define HSR_SYSREG_ID_MMFR2_EL1   HSR_SYSREG(3,0,c0,c1,6)
> +#define HSR_SYSREG_ID_MMFR3_EL1   HSR_SYSREG(3,0,c0,c1,7)
> +#define HSR_SYSREG_ID_MMFR4_EL1   HSR_SYSREG(3,0,c0,c2,6)
> +#define HSR_SYSREG_ID_MMFR5_EL1   HSR_SYSREG(3,0,c0,c3,6)
> +#define HSR_SYSREG_ID_ISAR0_EL1   HSR_SYSREG(3,0,c0,c2,0)
> +#define HSR_SYSREG_ID_ISAR1_EL1   HSR_SYSREG(3,0,c0,c2,1)
> +#define HSR_SYSREG_ID_ISAR2_EL1   HSR_SYSREG(3,0,c0,c2,2)
> +#define HSR_SYSREG_ID_ISAR3_EL1   HSR_SYSREG(3,0,c0,c2,3)
> +#define HSR_SYSREG_ID_ISAR4_EL1   HSR_SYSREG(3,0,c0,c2,4)
> +#define HSR_SYSREG_ID_ISAR5_EL1   HSR_SYSREG(3,0,c0,c2,5)
> +#define HSR_SYSREG_ID_ISAR6_EL1   HSR_SYSREG(3,0,c0,c2,7)
> +#define HSR_SYSREG_MVFR0_EL1  HSR_SYSREG(3,0,c0,c3,0)
> +#define HSR_SYSREG_MVFR1_EL1  HSR_SYSREG(3,0,c0,c3,1)
> +#define HSR_SYSREG_MVFR2_EL1  HSR_SYSREG(3,0,c0,c3,2)
> +
> +#define HSR_SYSREG_ID_AA64PFR0_EL1   HSR_SYSREG(3,0,c0,c4,0)
> +#define HSR_SYSREG_ID_AA64PFR1_EL1   HSR_SYSREG(3,0,c0,c4,1)
> +#define HSR_SYSREG_ID_AA64DFR0_EL1   HSR_SYSREG(3,0,c0,c5,0)
> +#define HSR_SYSREG_ID_AA64DFR1_EL1   HSR_SYSREG(3,0,c0,c5,1)
> +#define HSR_SYSREG_ID_AA64ISAR0_EL1  HSR_SYSREG(3,0,c0,c6,0)
> +#define HSR_SYSREG_ID_AA64ISAR1_EL1  HSR_SYSREG(3,0,c0,c6,1)
> +#define HSR_SYSREG_ID_AA64MMFR0_EL1  HSR_SYSREG(3,0,c0,c7,0)
> +#define HSR_SYSREG_ID_AA64MMFR1_EL1  HSR_SYSREG(3,0,c0,c7,1)
> +#define HSR_SYSREG_ID_AA64MMFR2_EL1  HSR_SYSREG(3,0,c0,c7,2)
> +#define HSR_SYSREG_ID_AA64AFR0_EL1   HSR_SYSREG(3,0,c0,c5,4)
> +#define HSR_SYSREG_ID_AA64AFR1_EL1   HSR_SYSREG(3,0,c0,c5,5)
> +#define HSR_SYSREG_ID_AA64ZFR0_EL1   HSR_SYSREG(3,0,c0,c4,4)
> +
>  #endif /* __ASM_ARM_ARM64_HSR_H */
>  
>  /*
> -- 
> 2.17.1
> 



Re: [PATCH v4 4/8] xen/arm: create a cpuinfo structure for guest

2020-12-17 Thread Stefano Stabellini
On Thu, 17 Dec 2020, Bertrand Marquis wrote:
> Create a cpuinfo structure for guest and mask into it the features that
> we do not support in Xen or that we do not want to publish to guests.
> 
> Modify some values in the cpuinfo structure for guests to mask some
> features which we do not want to allow to guests (like AMU) or we do not
> support (like SVE).
> Modify some values in the guest cpuinfo structure to guests to hide some
> processor features:
> - SVE as this is not supported by Xen and guest are not allowed to use
> this features (ZEN is set to 0 in CPTR_EL2).
> - AMU as HCPTR_TAM is set in CPTR_EL2 so AMU cannot be used by guests
> All other bits are left untouched.
> - RAS as this is not supported by Xen.
> 
> The code is trying to group together registers modifications for the
> same feature to be able in the long term to easily enable/disable a
> feature depending on user parameters or add other registers modification
> in the same place (like enabling/disabling HCR bits).
> 
> Signed-off-by: Bertrand Marquis 

Reviewed-by: Stefano Stabellini 


> ---
> Changes in V2: Rebase
> Changes in V3:
>   Use current_cpu_data info instead of recalling identify_cpu
> Changes in V4:
>   Use boot_cpu_data instead of current_cpu_data
>   Use "hide XX support" instead of disable as this part of the code is
>   actually only hidding feature to guests but not disabling them (this
>   is done through the HCR register).
>   Modify commit message to be more clear about what is done in
>   guest_cpuinfo.
> 
> ---
>  xen/arch/arm/cpufeature.c| 51 
>  xen/include/asm-arm/cpufeature.h |  2 ++
>  2 files changed, 53 insertions(+)
> 
> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
> index 86b99ee960..1f6a85aafe 100644
> --- a/xen/arch/arm/cpufeature.c
> +++ b/xen/arch/arm/cpufeature.c
> @@ -24,6 +24,8 @@
>  
>  DECLARE_BITMAP(cpu_hwcaps, ARM_NCAPS);
>  
> +struct cpuinfo_arm __read_mostly guest_cpuinfo;
> +
>  void update_cpu_capabilities(const struct arm_cpu_capabilities *caps,
>   const char *info)
>  {
> @@ -151,6 +153,55 @@ void identify_cpu(struct cpuinfo_arm *c)
>  c->mvfr.bits[2] = READ_SYSREG(MVFR2_EL1);
>  }
>  
> +/*
> + * This function is creating a cpuinfo structure with values modified to mask
> + * all cpu features that should not be published to guest.
> + * The created structure is then used to provide ID registers values to 
> guests.
> + */
> +static int __init create_guest_cpuinfo(void)
> +{
> +/*
> + * TODO: The code is currently using only the features detected on the 
> boot
> + * core. In the long term we should try to compute values containing only
> + * features supported by all cores.
> + */
> +guest_cpuinfo = boot_cpu_data;
> +
> +#ifdef CONFIG_ARM_64
> +/* Hide MPAM support as xen does not support it */
> +guest_cpuinfo.pfr64.mpam = 0;
> +guest_cpuinfo.pfr64.mpam_frac = 0;
> +
> +/* Hide SVE as Xen does not support it */
> +guest_cpuinfo.pfr64.sve = 0;
> +guest_cpuinfo.zfr64.bits[0] = 0;
> +
> +/* Hide MTE support as Xen does not support it */
> +guest_cpuinfo.pfr64.mte = 0;
> +#endif
> +
> +/* Hide AMU support */
> +#ifdef CONFIG_ARM_64
> +guest_cpuinfo.pfr64.amu = 0;
> +#endif
> +guest_cpuinfo.pfr32.amu = 0;
> +
> +/* Hide RAS support as Xen does not support it */
> +#ifdef CONFIG_ARM_64
> +guest_cpuinfo.pfr64.ras = 0;
> +guest_cpuinfo.pfr64.ras_frac = 0;
> +#endif
> +guest_cpuinfo.pfr32.ras = 0;
> +guest_cpuinfo.pfr32.ras_frac = 0;
> +
> +return 0;
> +}
> +/*
> + * This function needs to be run after all smp are started to have
> + * cpuinfo structures for all cores.
> + */
> +__initcall(create_guest_cpuinfo);
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/asm-arm/cpufeature.h 
> b/xen/include/asm-arm/cpufeature.h
> index 74139be1cc..6058744c18 100644
> --- a/xen/include/asm-arm/cpufeature.h
> +++ b/xen/include/asm-arm/cpufeature.h
> @@ -283,6 +283,8 @@ extern void identify_cpu(struct cpuinfo_arm *);
>  extern struct cpuinfo_arm cpu_data[];
>  #define current_cpu_data cpu_data[smp_processor_id()]
>  
> +extern struct cpuinfo_arm guest_cpuinfo;
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif
> -- 
> 2.17.1
> 



Re: [PATCH v4 5/8] xen/arm: Add handler for ID registers on arm64

2020-12-17 Thread Stefano Stabellini
On Thu, 17 Dec 2020, Bertrand Marquis wrote:
> Add vsysreg emulation for registers trapped when the TID3 bit is set
> in HCR_EL2.
> The emulation returns the value stored in the guest_cpuinfo structure
> for known registers and handles reserved registers as RAZ.
> 
> Signed-off-by: Bertrand Marquis 

Reviewed-by: Stefano Stabellini 


> ---
> Changes in V2: Rebase
> Changes in V3:
>   Fix commit message
>   Fix code style for GENERATE_TID3_INFO declaration
>   Add handling of reserved registers as RAZ.
> Changes in V4:
>   Fix indentation in GENERATE_TID3_INFO macro
>   Add explicit case code for reserved registers
> 
> ---
>  xen/arch/arm/arm64/vsysreg.c | 82 
>  1 file changed, 82 insertions(+)
> 
> diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
> index 8a85507d9d..41f18612c6 100644
> --- a/xen/arch/arm/arm64/vsysreg.c
> +++ b/xen/arch/arm/arm64/vsysreg.c
> @@ -69,6 +69,14 @@ TVM_REG(CONTEXTIDR_EL1)
>  break;  \
>  }
>  
> +/* Macro to generate easily case for ID co-processor emulation */
> +#define GENERATE_TID3_INFO(reg, field, offset)  \
> +case HSR_SYSREG_##reg:  \
> +{   \
> +return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr,   \
> +  1, guest_cpuinfo.field.bits[offset]); \
> +}
> +
>  void do_sysreg(struct cpu_user_regs *regs,
> const union hsr hsr)
>  {
> @@ -259,6 +267,80 @@ void do_sysreg(struct cpu_user_regs *regs,
>   */
>  return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
>  
> +/*
> + * HCR_EL2.TID3
> + *
> + * This is trapping most Identification registers used by a guest
> + * to identify the processor features
> + */
> +GENERATE_TID3_INFO(ID_PFR0_EL1, pfr32, 0)
> +GENERATE_TID3_INFO(ID_PFR1_EL1, pfr32, 1)
> +GENERATE_TID3_INFO(ID_PFR2_EL1, pfr32, 2)
> +GENERATE_TID3_INFO(ID_DFR0_EL1, dbg32, 0)
> +GENERATE_TID3_INFO(ID_DFR1_EL1, dbg32, 1)
> +GENERATE_TID3_INFO(ID_AFR0_EL1, aux32, 0)
> +GENERATE_TID3_INFO(ID_MMFR0_EL1, mm32, 0)
> +GENERATE_TID3_INFO(ID_MMFR1_EL1, mm32, 1)
> +GENERATE_TID3_INFO(ID_MMFR2_EL1, mm32, 2)
> +GENERATE_TID3_INFO(ID_MMFR3_EL1, mm32, 3)
> +GENERATE_TID3_INFO(ID_MMFR4_EL1, mm32, 4)
> +GENERATE_TID3_INFO(ID_MMFR5_EL1, mm32, 5)
> +GENERATE_TID3_INFO(ID_ISAR0_EL1, isa32, 0)
> +GENERATE_TID3_INFO(ID_ISAR1_EL1, isa32, 1)
> +GENERATE_TID3_INFO(ID_ISAR2_EL1, isa32, 2)
> +GENERATE_TID3_INFO(ID_ISAR3_EL1, isa32, 3)
> +GENERATE_TID3_INFO(ID_ISAR4_EL1, isa32, 4)
> +GENERATE_TID3_INFO(ID_ISAR5_EL1, isa32, 5)
> +GENERATE_TID3_INFO(ID_ISAR6_EL1, isa32, 6)
> +GENERATE_TID3_INFO(MVFR0_EL1, mvfr, 0)
> +GENERATE_TID3_INFO(MVFR1_EL1, mvfr, 1)
> +GENERATE_TID3_INFO(MVFR2_EL1, mvfr, 2)
> +GENERATE_TID3_INFO(ID_AA64PFR0_EL1, pfr64, 0)
> +GENERATE_TID3_INFO(ID_AA64PFR1_EL1, pfr64, 1)
> +GENERATE_TID3_INFO(ID_AA64DFR0_EL1, dbg64, 0)
> +GENERATE_TID3_INFO(ID_AA64DFR1_EL1, dbg64, 1)
> +GENERATE_TID3_INFO(ID_AA64ISAR0_EL1, isa64, 0)
> +GENERATE_TID3_INFO(ID_AA64ISAR1_EL1, isa64, 1)
> +GENERATE_TID3_INFO(ID_AA64MMFR0_EL1, mm64, 0)
> +GENERATE_TID3_INFO(ID_AA64MMFR1_EL1, mm64, 1)
> +GENERATE_TID3_INFO(ID_AA64MMFR2_EL1, mm64, 2)
> +GENERATE_TID3_INFO(ID_AA64AFR0_EL1, aux64, 0)
> +GENERATE_TID3_INFO(ID_AA64AFR1_EL1, aux64, 1)
> +GENERATE_TID3_INFO(ID_AA64ZFR0_EL1, zfr64, 0)
> +
> +/*
> + * Those cases are catching all Reserved registers trapped by TID3 which
> + * currently have no assignment.
> + * HCR.TID3 is trapping all registers in the group 3:
> + * Op0 == 3, op1 == 0, CRn == c0,CRm == {c1-c7}, op2 == {0-7}.
> + * Those registers are defined as being RO in the Arm Architecture
> + * Reference manual Armv8 (Chapter D12.3.2 of issue F.c) so handle them
> + * as Read-only read as zero.
> + */
> +case HSR_SYSREG(3,0,c0,c3,3):
> +case HSR_SYSREG(3,0,c0,c3,7):
> +case HSR_SYSREG(3,0,c0,c4,2):
> +case HSR_SYSREG(3,0,c0,c4,3):
> +case HSR_SYSREG(3,0,c0,c4,5):
> +case HSR_SYSREG(3,0,c0,c4,6):
> +case HSR_SYSREG(3,0,c0,c4,7):
> +case HSR_SYSREG(3,0,c0,c5,2):
> +case HSR_SYSREG(3,0,c0,c5,3):
> +case HSR_SYSREG(3,0,c0,c5,6):
> +case HSR_SYSREG(3,0,c0,c5,7):
> +case HSR_SYSREG(3,0,c0,c6,2):
> +case HSR_SYSREG(3,0,c0,c6,3):
> +case HSR_SYSREG(3,0,c0,c6,4):
> +case HSR_SYSREG(3,0,c0,c6,5):
> +case HSR_SYSREG(3,0,c0,c6,6):
> +case HSR_SYSREG(3,0,c0,c6,7):
> +case HSR_SYSREG(3,0,c0,c7,3):
> +case HSR_SYSREG(3,0,c0,c7,4):
> +case HSR_SYSREG(3,0,c0,c7,5):
> +case HSR_SYSREG(3,0,c0,c7,6):
> +case HSR_SYSREG(3,0,c0,c7,7):
> +return handle_ro_raz(regs, regidx, hsr.

Re: [PATCH v4 6/8] xen/arm: Add handler for cp15 ID registers

2020-12-17 Thread Stefano Stabellini
On Thu, 17 Dec 2020, Bertrand Marquis wrote:
> Add support for emulation of cp15-based ID registers (on arm32 or when
> running a 32-bit guest on arm64).
> The handlers return the values stored in the guest_cpuinfo
> structure for known registers and RAZ for all reserved registers.
> In the current state the MVFR registers are not supported.
> 
> Signed-off-by: Bertrand Marquis 
> ---
> Changes in V2: Rebase
> Changes in V3:
>   Add case definition for reserved registers
>   Add handling of reserved registers as RAZ.
>   Fix code style in GENERATE_TID3_INFO declaration
> Changes in V4:
>   Fix comment for missing t (no to not)
>   Put cases for reserved registers directly in the code instead of using
>   a define in the cpregs.h header.
> 
> ---
>  xen/arch/arm/vcpreg.c | 65 +++
>  1 file changed, 65 insertions(+)
> 
> diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
> index cdc91cdf5b..1fe07fe02a 100644
> --- a/xen/arch/arm/vcpreg.c
> +++ b/xen/arch/arm/vcpreg.c
> @@ -155,6 +155,24 @@ TVM_REG32(CONTEXTIDR, CONTEXTIDR_EL1)
>  break;  \
>  }
>  
> +/* Macro to generate easily case for ID co-processor emulation */
> +#define GENERATE_TID3_INFO(reg, field, offset)  \
> +case HSR_CPREG32(reg):  \
> +{   \
> +return handle_ro_read_val(regs, regidx, cp32.read, hsr, \
> +  1, guest_cpuinfo.field.bits[offset]); \

This line is misaligned, but it can be adjusted on commit

Reviewed-by: Stefano Stabellini 



> +}
> +
> +/* helper to define cases for all registers for one CRm value */
> +#define HSR_CPREG32_TID3_CASES(REG) case HSR_CPREG32(p15,0,c0,REG,0): \
> +case HSR_CPREG32(p15,0,c0,REG,1): \
> +case HSR_CPREG32(p15,0,c0,REG,2): \
> +case HSR_CPREG32(p15,0,c0,REG,3): \
> +case HSR_CPREG32(p15,0,c0,REG,4): \
> +case HSR_CPREG32(p15,0,c0,REG,5): \
> +case HSR_CPREG32(p15,0,c0,REG,6): \
> +case HSR_CPREG32(p15,0,c0,REG,7)
> +
>  void do_cp15_32(struct cpu_user_regs *regs, const union hsr hsr)
>  {
>  const struct hsr_cp32 cp32 = hsr.cp32;
> @@ -286,6 +304,53 @@ void do_cp15_32(struct cpu_user_regs *regs, const union 
> hsr hsr)
>   */
>  return handle_raz_wi(regs, regidx, cp32.read, hsr, 1);
>  
> +/*
> + * HCR_EL2.TID3
> + *
> + * This is trapping most Identification registers used by a guest
> + * to identify the processor features
> + */
> +GENERATE_TID3_INFO(ID_PFR0, pfr32, 0)
> +GENERATE_TID3_INFO(ID_PFR1, pfr32, 1)
> +GENERATE_TID3_INFO(ID_PFR2, pfr32, 2)
> +GENERATE_TID3_INFO(ID_DFR0, dbg32, 0)
> +GENERATE_TID3_INFO(ID_DFR1, dbg32, 1)
> +GENERATE_TID3_INFO(ID_AFR0, aux32, 0)
> +GENERATE_TID3_INFO(ID_MMFR0, mm32, 0)
> +GENERATE_TID3_INFO(ID_MMFR1, mm32, 1)
> +GENERATE_TID3_INFO(ID_MMFR2, mm32, 2)
> +GENERATE_TID3_INFO(ID_MMFR3, mm32, 3)
> +GENERATE_TID3_INFO(ID_MMFR4, mm32, 4)
> +GENERATE_TID3_INFO(ID_MMFR5, mm32, 5)
> +GENERATE_TID3_INFO(ID_ISAR0, isa32, 0)
> +GENERATE_TID3_INFO(ID_ISAR1, isa32, 1)
> +GENERATE_TID3_INFO(ID_ISAR2, isa32, 2)
> +GENERATE_TID3_INFO(ID_ISAR3, isa32, 3)
> +GENERATE_TID3_INFO(ID_ISAR4, isa32, 4)
> +GENERATE_TID3_INFO(ID_ISAR5, isa32, 5)
> +GENERATE_TID3_INFO(ID_ISAR6, isa32, 6)
> +/* MVFR registers are in cp10 not cp15 */
> +
> +/*
> + * Those cases are catching all Reserved registers trapped by TID3 which
> + * currently have no assignment.
> + * HCR.TID3 is trapping all registers in the group 3:
> + * coproc == p15, opc1 == 0, CRn == c0, CRm == {c2-c7}, opc2 == {0-7}.
> + * Those registers are defined as being RO in the Arm Architecture
> + * Reference manual Armv8 (Chapter D12.3.2 of issue F.c) so handle them
> + * as Read-only read as zero.
> + */
> +case HSR_CPREG32(p15,0,c0,c3,0):
> +case HSR_CPREG32(p15,0,c0,c3,1):
> +case HSR_CPREG32(p15,0,c0,c3,2):
> +case HSR_CPREG32(p15,0,c0,c3,3):
> +case HSR_CPREG32(p15,0,c0,c3,7):
> +HSR_CPREG32_TID3_CASES(c4):
> +HSR_CPREG32_TID3_CASES(c5):
> +HSR_CPREG32_TID3_CASES(c6):
> +HSR_CPREG32_TID3_CASES(c7):
> +return handle_ro_raz(regs, regidx, cp32.read, hsr, 1);
> +
>  /*
>   * HCR_EL2.TIDCP
>   *
> -- 
> 2.17.1
> 



Re: [PATCH v4 7/8] xen/arm: Add CP10 exception support to handle MVFR

2020-12-17 Thread Stefano Stabellini
On Thu, 17 Dec 2020, Bertrand Marquis wrote:
> Add support for cp10 exception decoding to be able to emulate the
> values for MVFR0, MVFR1 and MVFR2 when the TID3 bit of HCR is set.
> This is required for aarch32 guests accessing MVFR registers using
> vmrs and vmsr instructions.
> 
> Signed-off-by: Bertrand Marquis 

Reviewed-by: Stefano Stabellini 


> ---
> Changes in V2: Rebase
> Changes in V3:
>   Add case for MVFR2, fix typo VMFR <-> MVFR.
> Changes in V4:
>   Fix typo HSR -> HCR
>   Move no to not comment fix to previous patch
> 
> ---
>  xen/arch/arm/traps.c |  5 +
>  xen/arch/arm/vcpreg.c| 37 
>  xen/include/asm-arm/perfc_defn.h |  1 +
>  xen/include/asm-arm/traps.h  |  1 +
>  4 files changed, 44 insertions(+)
> 
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 22bd1bd4c6..28d9d64558 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -2097,6 +2097,11 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)
>  perfc_incr(trap_cp14_dbg);
>  do_cp14_dbg(regs, hsr);
>  break;
> +case HSR_EC_CP10:
> +GUEST_BUG_ON(!psr_mode_is_32bit(regs));
> +perfc_incr(trap_cp10);
> +do_cp10(regs, hsr);
> +break;
>  case HSR_EC_CP:
>  GUEST_BUG_ON(!psr_mode_is_32bit(regs));
>  perfc_incr(trap_cp);
> diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
> index 1fe07fe02a..cbad8f25a0 100644
> --- a/xen/arch/arm/vcpreg.c
> +++ b/xen/arch/arm/vcpreg.c
> @@ -664,6 +664,43 @@ void do_cp14_dbg(struct cpu_user_regs *regs, const union 
> hsr hsr)
>  inject_undef_exception(regs, hsr);
>  }
>  
> +void do_cp10(struct cpu_user_regs *regs, const union hsr hsr)
> +{
> +const struct hsr_cp32 cp32 = hsr.cp32;
> +int regidx = cp32.reg;
> +
> +if ( !check_conditional_instr(regs, hsr) )
> +{
> +advance_pc(regs, hsr);
> +return;
> +}
> +
> +switch ( hsr.bits & HSR_CP32_REGS_MASK )
> +{
> +/*
> + * HCR.TID3 is trapping access to MVFR register used to identify the
> + * VFP/Simd using VMRS/VMSR instructions.
> + * Exception encoding is using MRC/MCR standard with the reg field in Crn
> + * as are declared MVFR0 and MVFR1 in cpregs.h
> + */
> +GENERATE_TID3_INFO(MVFR0, mvfr, 0)
> +GENERATE_TID3_INFO(MVFR1, mvfr, 1)
> +GENERATE_TID3_INFO(MVFR2, mvfr, 2)
> +
> +default:
> +gdprintk(XENLOG_ERR,
> + "%s p10, %d, r%d, cr%d, cr%d, %d @ 0x%"PRIregister"\n",
> + cp32.read ? "mrc" : "mcr",
> + cp32.op1, cp32.reg, cp32.crn, cp32.crm, cp32.op2, regs->pc);
> +gdprintk(XENLOG_ERR, "unhandled 32-bit CP10 access %#x\n",
> + hsr.bits & HSR_CP32_REGS_MASK);
> +inject_undef_exception(regs, hsr);
> +return;
> +}
> +
> +advance_pc(regs, hsr);
> +}
> +
>  void do_cp(struct cpu_user_regs *regs, const union hsr hsr)
>  {
>  const struct hsr_cp cp = hsr.cp;
> diff --git a/xen/include/asm-arm/perfc_defn.h 
> b/xen/include/asm-arm/perfc_defn.h
> index 6a83185163..31f071222b 100644
> --- a/xen/include/asm-arm/perfc_defn.h
> +++ b/xen/include/asm-arm/perfc_defn.h
> @@ -11,6 +11,7 @@ PERFCOUNTER(trap_cp15_64,  "trap: cp15 64-bit access")
>  PERFCOUNTER(trap_cp14_32,  "trap: cp14 32-bit access")
>  PERFCOUNTER(trap_cp14_64,  "trap: cp14 64-bit access")
>  PERFCOUNTER(trap_cp14_dbg, "trap: cp14 dbg access")
> +PERFCOUNTER(trap_cp10, "trap: cp10 access")
>  PERFCOUNTER(trap_cp,   "trap: cp access")
>  PERFCOUNTER(trap_smc32,"trap: 32-bit smc")
>  PERFCOUNTER(trap_hvc32,"trap: 32-bit hvc")
> diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h
> index 997c37884e..c4a3d0fb1b 100644
> --- a/xen/include/asm-arm/traps.h
> +++ b/xen/include/asm-arm/traps.h
> @@ -62,6 +62,7 @@ void do_cp15_64(struct cpu_user_regs *regs, const union hsr 
> hsr);
>  void do_cp14_32(struct cpu_user_regs *regs, const union hsr hsr);
>  void do_cp14_64(struct cpu_user_regs *regs, const union hsr hsr);
>  void do_cp14_dbg(struct cpu_user_regs *regs, const union hsr hsr);
> +void do_cp10(struct cpu_user_regs *regs, const union hsr hsr);
>  void do_cp(struct cpu_user_regs *regs, const union hsr hsr);
>  
>  /* SMCCC handling */
> -- 
> 2.17.1
> 



Re: [PATCH v4 8/8] xen/arm: Activate TID3 in HCR_EL2

2020-12-17 Thread Stefano Stabellini
On Thu, 17 Dec 2020, Bertrand Marquis wrote:
> Activate the TID3 bit in the HCR register when starting a guest.
> This will trap all coprocessor ID registers so that we can give guests
> values corresponding to what they can actually use, and mask some
> features from guests even though they would be supported by the
> underlying hardware (like SVE or MPAM).
> 
> Signed-off-by: Bertrand Marquis 

Reviewed-by: Stefano Stabellini 


> ---
> Changes in V2: Rebase
> Changes in V3: Rebase
> Changes in V4: Rebase
> 
> ---
>  xen/arch/arm/traps.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 28d9d64558..c1a9ad6056 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -98,7 +98,7 @@ register_t get_default_hcr_flags(void)
>  {
>  return  (HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
>   (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
> - HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB|HCR_TSW);
> + HCR_TID3|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB|HCR_TSW);
>  }
>  
>  static enum {
> -- 
> 2.17.1
> 



Re: [PATCH v4 0/8] xen/arm: Emulate ID registers

2020-12-17 Thread Stefano Stabellini
On Thu, 17 Dec 2020, Bertrand Marquis wrote:
> The goal of this series is to emulate coprocessor ID registers so that
> Xen only publishes to guests features that are supported by Xen and can
> actually be used by guests.
> One practical example where this is required is SVE support, which is
> forbidden by Xen as it is not supported: if Linux is compiled with
> it, it will crash on boot. Another one is AMU, which is also forbidden
> by Xen, but a Linux kernel compiled with it would crash if the platform
> supports it.
> 
> To be able to emulate the coprocessor registers defining what features
> are supported by the hardware, the TID3 bit of HCR must be set and
> Xen must emulate the values of those registers when an exception is
> caught on a guest access to any of them.
> 
> This series first creates a guest cpuinfo structure which contains
> the values that we want to publish to the guests, and then
> provides the proper emulation for those registers when Xen gets
> an exception due to an access to any of those registers.
> 
> This is a first simple implementation to solve the problem. The way
> we define the values that we provide to guests and which features are
> disabled will be enhanced in a future patchset, so that we can decide
> per guest what can be used or not, and from this deduce the bits
> to set in HCR and the values that we must publish in the ID registers.
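The approach described in the cover letter can be sketched roughly as follows. This is a hypothetical, heavily simplified illustration — not the real Xen structures or register layouts: the `struct cpuinfo`, the field encoding, and the boot value are all made up for the example, with ID_AA64PFR0_EL1 modelled as a bare 64-bit value whose SVE field sits in bits [35:32].

```c
#include <stdint.h>

/* Hypothetical, heavily simplified sketch of the series' idea -- not
 * the real Xen structures or register layouts. */
struct cpuinfo { uint64_t pfr64; };

/* Pretend the boot CPU advertises SVE (field [35:32] == 1). */
static struct cpuinfo boot_cpu_data = { .pfr64 = (1ULL << 32) | 0x11 };
static struct cpuinfo guest_cpuinfo;

/* Run once at init: copy the boot CPU data, then zero every feature
 * field that Xen does not want to publish to guests. */
static void create_guest_cpuinfo(void)
{
    guest_cpuinfo = boot_cpu_data;
    guest_cpuinfo.pfr64 &= ~(0xFULL << 32);   /* hide SVE */
}

/* What a TID3 trap handler would return for a guest read of
 * ID_AA64PFR0_EL1: the masked copy, not the hardware value. */
static uint64_t emulate_id_aa64pfr0(void)
{
    return guest_cpuinfo.pfr64;
}
```

With HCR_EL2.TID3 set, every guest read of a group-3 ID register traps to Xen, which answers from the masked copy instead of letting the guest see the hardware value.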

As per our discussion I think we want to add this to the series.

---

xen/arm: clarify support status for various ARMv8.x CPUs

ARMv8.1+ is not security supported for now, as it would require more
investigation on hardware features that Xen has to hide from the guest.

Signed-off-by: Stefano Stabellini 

diff --git a/SUPPORT.md b/SUPPORT.md
index ab02aca5f4..d95ce3a411 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -37,7 +37,8 @@ supported in this document.
 
 ### ARM v8
 
-Status: Supported
+Status, ARMv8.0: Supported
+Status, ARMv8.1+: Supported, not security supported
 Status, Cortex A57 r0p0-r1p1: Supported, not security supported
 
 For the Cortex A57 r0p0 - r1p1, see Errata 832075.



Re: [PATCH] xen: Rework WARN_ON() to return whether a warning was triggered

2020-12-17 Thread Stefano Stabellini
On Tue, 15 Dec 2020, Jan Beulich wrote:
> On 15.12.2020 14:19, Julien Grall wrote:
> > On 15/12/2020 11:46, Jan Beulich wrote:
> >> On 15.12.2020 12:26, Julien Grall wrote:
> >>> --- a/xen/include/xen/lib.h
> >>> +++ b/xen/include/xen/lib.h
> >>> @@ -23,7 +23,13 @@
> >>>   #include 
> >>>   
> >>>   #define BUG_ON(p)  do { if (unlikely(p)) BUG();  } while (0)
> >>> -#define WARN_ON(p) do { if (unlikely(p)) WARN(); } while (0)
> >>> +#define WARN_ON(p)  ({  \
> >>> +bool __ret_warn_on = (p);   \
> >>
> >> Please can you avoid leading underscores here?
> > 
> > I can.
> > 
> >>
> >>> +\
> >>> +if ( unlikely(__ret_warn_on) )  \
> >>> +WARN(); \
> >>> +unlikely(__ret_warn_on);\
> >>> +})
> >>
> >> Is this latter unlikely() having any effect? So far I thought it
> >> would need to be immediately inside a control construct or be an
> >> operand to && or ||.
> > 
> > The unlikely() is directly taken from the Linux implementation.
> > 
> > My guess is the compiler is still able to use the information for the 
> > branch prediction in the case of:
> > 
> > if ( WARN_ON(...) )
> 
> Maybe. Or maybe not. I don't suppose the Linux commit introducing
> it clarifies this?

I did a bit of digging but it looks like the unlikely has been there
forever. I'd just keep it as is.



Re: [PATCH] xen: Rework WARN_ON() to return whether a warning was triggered

2020-12-17 Thread Julien Grall
On Thu, 17 Dec 2020 at 23:54, Stefano Stabellini  wrote:
>
> On Tue, 15 Dec 2020, Jan Beulich wrote:
> > On 15.12.2020 14:19, Julien Grall wrote:
> > > On 15/12/2020 11:46, Jan Beulich wrote:
> > >> On 15.12.2020 12:26, Julien Grall wrote:
> > >>> --- a/xen/include/xen/lib.h
> > >>> +++ b/xen/include/xen/lib.h
> > >>> @@ -23,7 +23,13 @@
> > >>>   #include 
> > >>>
> > >>>   #define BUG_ON(p)  do { if (unlikely(p)) BUG();  } while (0)
> > >>> -#define WARN_ON(p) do { if (unlikely(p)) WARN(); } while (0)
> > >>> +#define WARN_ON(p)  ({  \
> > >>> +bool __ret_warn_on = (p);   \
> > >>
> > >> Please can you avoid leading underscores here?
> > >
> > > I can.
> > >
> > >>
> > >>> +\
> > >>> +if ( unlikely(__ret_warn_on) )  \
> > >>> +WARN(); \
> > >>> +unlikely(__ret_warn_on);\
> > >>> +})
> > >>
> > >> Is this latter unlikely() having any effect? So far I thought it
> > >> would need to be immediately inside a control construct or be an
> > >> operand to && or ||.
> > >
> > > The unlikely() is directly taken from the Linux implementation.
> > >
> > > My guess is the compiler is still able to use the information for the
> > > branch prediction in the case of:
> > >
> > > if ( WARN_ON(...) )
> >
> > Maybe. Or maybe not. I don't suppose the Linux commit introducing
> > it clarifies this?
>
> I did a bit of digging but it looks like the unlikely has been there
> forever. I'd just keep it as is.

Thanks! I was planning to answer earlier on with some data but got
preempted with some higher priority work.
The Linux commit message is not very helpful. I will do some testing
so I can convince Jan that compilers can be clever and make use of it.
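For reference, a minimal standalone sketch of the macro under discussion — WARN() is stubbed out with a printf here, so this is an illustration of the statement-expression shape, not the Xen implementation (which spells the hint `unlikely()` rather than the raw builtin used below):

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for Xen's WARN(): just report where we were called from. */
#define WARN() printf("Xen WARN at %s:%d\n", __FILE__, __LINE__)

/* GNU C statement expression whose value is the boolean condition, so
 * WARN_ON() can be used both as a plain statement and as an if ( )
 * condition. */
#define WARN_ON(p)  ({                          \
    bool ret_warn_on = (p);                     \
                                                \
    if ( __builtin_expect(ret_warn_on, 0) )     \
        WARN();                                 \
    ret_warn_on;                                \
})
```

With this form, `if ( WARN_ON(rc != 0) )` both emits the warning and branches on the result, which is the Linux-style usage the patch is after.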

Cheers,



[xen-4.13-testing bisection] complete test-amd64-amd64-xl-qemuu-ovmf-amd64

2020-12-17 Thread osstest service owner
branch xen-4.13-testing
xenbranch xen-4.13-testing
job test-amd64-amd64-xl-qemuu-ovmf-amd64
testid debian-hvm-install

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  ovmf git://xenbits.xen.org/osstest/ovmf.git
  Bug introduced:  cee5b0441af39dd6f76cc4e0447a1c7f788cbb00
  Bug not present: 8e4cb8fbceb84b66b3b2fc45b9e93d70f732e970
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/157657/


  commit cee5b0441af39dd6f76cc4e0447a1c7f788cbb00
  Author: Guo Dong 
  Date:   Wed Dec 2 14:18:18 2020 -0700
  
  UefiCpuPkg/CpuDxe: Fix boot error
  
  REF: https://bugzilla.tianocore.org/show_bug.cgi?id=3084
  
  When DXE drivers are dispatched above 4GB memory and
  the system is already in 64bit mode, the address
  setCodeSelectorLongJump in stack will be override
  by parameter. so change to use 64bit address and
  jump to qword address.
  
  Signed-off-by: Guo Dong 
  Reviewed-by: Ray Ni 
  Reviewed-by: Eric Dong 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-4.13-testing/test-amd64-amd64-xl-qemuu-ovmf-amd64.debian-hvm-install.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/xen-4.13-testing/test-amd64-amd64-xl-qemuu-ovmf-amd64.debian-hvm-install
 --summary-out=tmp/157657.bisection-summary --basis-template=157135 
--blessings=real,real-bisect,real-retry xen-4.13-testing 
test-amd64-amd64-xl-qemuu-ovmf-amd64 debian-hvm-install
Searching for failure / basis pass:
 157597 fail [host=rimava1] / 157135 [host=godello1] 156988 [host=godello0] 
156636 [host=godello1] 156593 [host=albana1] 156399 [host=chardonnay0] 156317 
[host=elbling0] 156265 [host=huxelrebe1] 156054 [host=godello1] 156030 
[host=godello0] 155377 [host=fiano1] 155258 ok.
Failure / basis pass flights: 157597 / 155258
(tree with no url: minios)
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git
Latest c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
f95e80d832e923046c92cd6f0b8208cec147138e 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
7269466a5b0c0e89b36dc9a7db0554ae404aa230 
748d619be3282fba35f99446098ac2d0579f6063 
10c7c213bef26274684798deb3e351a6756046d2
Basis pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
d8ab884fe9b4dd148980bf0d8673187f8fb25887 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
730e2b1927e7d911bbd5350714054ddd5912f4ed 
41289b83ed3847dc45e7af3f1b7cb3cec6b6e7a5 
88f5b414ac0f8008c1e2b26f93c3d980120941f7
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#c3038e718a19fc596f7b1baba0f83d5146dc7784-c3038e718a19fc596f7b1baba0f83d5146dc7784
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/osstest/ovmf.git#d8ab884fe9b4dd148980bf0d8673187f8fb25887-f95e80d832e923046c92cd6f0b8208cec147138e
 git://xenbits.xen.org/qemu-xen-traditional.git#d0d8ad39ecb51cd7497cd524484\
 fe09f50876798-d0d8ad39ecb51cd7497cd524484fe09f50876798 
git://xenbits.xen.org/qemu-xen.git#730e2b1927e7d911bbd5350714054ddd5912f4ed-7269466a5b0c0e89b36dc9a7db0554ae404aa230
 
git://xenbits.xen.org/osstest/seabios.git#41289b83ed3847dc45e7af3f1b7cb3cec6b6e7a5-748d619be3282fba35f99446098ac2d0579f6063
 
git://xenbits.xen.org/xen.git#88f5b414ac0f8008c1e2b26f93c3d980120941f7-10c7c213bef26274684798deb3e351a6756046d2
Loaded 7669 nodes in revision graph
Searching for test results:
 155132 [host=albana0]
 155258 pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
d8ab884fe9b4dd148980bf0d8673187f8fb25887 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
730e2b1927e7d911bbd5350714054ddd5912f4ed 
41289b83ed3847dc45e7af3f1b7cb3cec6b6e7a5 
88f5b414ac0f8008c1e2b26f93c3d980120941f7
 155377 [host=fiano1]
 156030 [host=godello0]
 156054 [host=godello1]
 156265 [host=huxelrebe1]
 156317 [host=elbling0]
 156399 [host=chardonnay0]
 156593 [host=albana1]
 156636 [host=godello1]
 156988 [host=godello0]
 157135 [host=godello1]
 157563 fail c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f

[linux-linus test] 157619: regressions - FAIL

2020-12-17 Thread osstest service owner
flight 157619 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157619/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-xsm7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install fail REGR. 
vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install fail REGR. vs. 
152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-libvirt   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 
152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-raw7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-xl-shadow 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install fail REGR. 
vs. 152332
 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start   fail REGR. vs. 152332
 test-amd64-i386-examine   6 xen-install  fail REGR. vs. 152332
 test-amd64-amd64-xl-xsm  14 guest-start  fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm  8 xen-boot fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1   8 xen-boot fail REGR. vs. 152332
 test-amd64-amd64-xl-credit1  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-pvhv2-intel 14 guest-start   fail REGR. vs. 152332
 test-amd64-amd64-libvirt 14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-multivcpu 14 guest-start fail REGR. vs. 152332
 test-amd64-amd64-xl-shadow   14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-libvirt-xsm 14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-credit2  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-pvshim   14 guest-start  fail REGR. vs. 152332
 test-amd64-coresched-amd64-xl 14 guest-start fail REGR. vs. 152332
 test-amd64-amd64-libvirt-pair 25 guest-start/debian  fail REGR. vs. 152332
 test-amd64-amd64-xl-pvhv2-amd 14 guest-start fail REGR. vs. 152332
 test-amd64-amd64-pair25 guest-start/debian   fail REGR. vs. 152332
 test-arm64-arm64-examine 13 examine-iommufail REGR. vs. 152332
 test-amd64-amd64-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 
152332
 test-amd64-amd64-amd64-pvgrub 20 guest-stop  fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub 20 guest-stop   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm   8 xen-boot fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2  10 host-ping-check-xen  fail REGR. vs. 152332
 test-arm64-arm64-xl-seattle   8 xen-boot fail REGR. vs. 152332
 test-arm64-arm64-xl   8 xen-boot fail REGR. vs. 152332

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 14 guest-start  

[xen-unstable-smoke test] 157656: tolerable all pass - PUSHED

2020-12-17 Thread osstest service owner
flight 157656 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157656/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  7a3b691a8f3aa7720eecaab0e7bd090aa392885a
baseline version:
 xen  641723f78d3a0b1982e1cd2ef37d8d877cfe542d

Last test of basis   157649  2020-12-17 16:00:26 Z0 days
Testing same since   157656  2020-12-17 23:02:14 Z0 days1 attempts


People who touched revisions under test:
  Stefano Stabellini 
  Stefano Stabellini 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   641723f78d..7a3b691a8f  7a3b691a8f3aa7720eecaab0e7bd090aa392885a -> smoke



[ovmf test] 157633: regressions - FAIL

2020-12-17 Thread osstest service owner
flight 157633 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157633/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 157345
 test-amd64-amd64-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 
157345

version targeted for testing:
 ovmf e6ae24e1d676bb2bdc0fc715b49b04908f41fc10
baseline version:
 ovmf f95e80d832e923046c92cd6f0b8208cec147138e

Last test of basis   157345  2020-12-09 12:40:46 Z8 days
Failing since157348  2020-12-09 15:39:39 Z8 days   52 attempts
Testing same since   157612  2020-12-16 21:09:14 Z1 days2 attempts


People who touched revisions under test:
  Abner Chang 
  Ard Biesheuvel 
  Baraneedharan Anbazhagan 
  Baraneedharan Anbazhagan 
  Bret Barkelew 
  Chen, Christine 
  Fan Wang 
  James Bottomley 
  Jiaxin Wu 
  Marc Moisson-Franckhauser 
  Michael D Kinney 
  Michael Kubacki 
  Pierre Gondois 
  Ray Ni 
  Rebecca Cran 
  Sami Mujawar 
  Sean Brogan 
  Sheng Wei 
  Siyuan Fu 
  Star Zeng 
  Ting Ye 
  Yuwei Chen 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
 test-amd64-i386-xl-qemuu-ovmf-amd64  fail



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 699 lines long.)



[xen-4.12-testing test] 157627: tolerable FAIL - PUSHED

2020-12-17 Thread osstest service owner
flight 157627 xen-4.12-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157627/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qcow2    19 guest-localmigrate/x10   fail  like 157134
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 157134
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 157134
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 157134
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 157134
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 157134
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 157134
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 157134
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail  like 157134
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 157134
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 157134
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 157134
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass
 test-amd64-i386-xl-pvshim    14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check fail   never pass

version targeted for testing:
 xen  2525a745e18bbf14b4f7b1b18209a0ab9166178d
baseline version:
 xen  8145d38b48009255a32ab87a02e481cd09c811f9

Last test of basis   157134  2020-12-01 15:05:58 Z   16 days
Testing same since   157562  2020-12-15 13:36:12 Z    2 days    3 attempts


People who touched revisions under test:
  Andrew Cooper 
  Christian Lindig 
  Edwin Török 
  Harsha Shamsundara Havanur 
  Jan Beulich 
  Juergen Gross 
  Julien Grall 
  Julien Grall 

jobs:
 build-amd64-xsm  

Re: [PATCH v2] xen/xenbus: make xs_talkv() interruptible

2020-12-17 Thread Jürgen Groß

On 17.12.20 19:25, Andrew Cooper wrote:

On 16/12/2020 08:21, Jürgen Groß wrote:

On 15.12.20 21:59, Andrew Cooper wrote:

On 15/12/2020 11:10, Juergen Gross wrote:

In case a process waits for any Xenstore action in the xenbus driver
it should be interruptible by signals.

Signed-off-by: Juergen Gross 
---
V2:
- don't special case SIGKILL as libxenstore is handling -EINTR fine
---
 drivers/xen/xenbus/xenbus_xs.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 3a06eb699f33..17c8f8a155fd 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -205,8 +205,15 @@ static bool test_reply(struct xb_req_data *req)
 
 static void *read_reply(struct xb_req_data *req)
 {
+    int ret;
+
     do {
-        wait_event(req->wq, test_reply(req));
+        ret = wait_event_interruptible(req->wq, test_reply(req));
+
+        if (ret == -ERESTARTSYS && signal_pending(current)) {
+            req->msg.type = XS_ERROR;
+            return ERR_PTR(-EINTR);
+        }


So now I can talk fully about the situations which lead to this, I think
there is a bit more complexity.

It turns out there are a number of issues related to running a Xen
system with no xenstored.

1) If a xenstore-write occurs during startup before init-xenstore-domain
runs, the former blocks on /dev/xen/xenbus waiting for xenstored to
reply, while the latter blocks on /dev/xen/xenbus_backend when trying to
tell the dom0 kernel that xenstored is in dom1.  This effectively
deadlocks the system.


This should be easy to solve: any request to /dev/xen/xenbus should
block upfront in case xenstored isn't up yet (could e.g. wait
interruptible until xenstored_ready is non-zero).


I'm not sure that that would fix the problem.  The problem is that
setting the ring details via /dev/xen/xenbus_backend blocks, which
prevents us launching the xenstored stubdomain, which prevents the
earlier xenbus write being completed.


But _why_ is it blocking? Digging through the code, I think it blocks
in xs_suspend() due to the normal xenstore request being pending. If
that request doesn't reach the state to cause blocking in xs_suspend(),
all is fine.


So long as /dev/xen/xenbus_backend doesn't block, there's no problem
with other /dev/xen/xenbus activity being pending briefly.


Looking at the current logic, I'm not completely convinced.  Even
finding a filled-in evtchn/gfn doesn't mean that xenstored is actually
ready.


No, but the deadlock is not going to happen anymore (famous last
words).



There are 3 possible cases.

1) PV guest, and details in start_info
2) HVM guest, and details in HVM_PARAMs
3) No details (expected for dom0).  Something in userspace must provide
details at a later point.

So the setup phases go from nothing, to having ring details, to finding
the ring working.

I think it would be prudent to try reading a key between having details
and declaring xenstored_ready.  Any activity, even XS_ERROR,
indicates that the other end of the ring is listening.


Yes. But I really think xs_suspend() is the problematic case. And
this will be called _before_ xenstored_ready is set.






2) If xenstore-watch is running when xenstored dies, it spins at 100%
cpu usage making no system calls at all.  This is caused by bad error
handling from xs_watch(), and attempting to debug found:


Can you expand on "bad error handling from xs_watch()", please?


do_watch() has

     for ( ... ) { // defaults to an infinite loop
         vec = xs_read_watch();
         if (vec == NULL)
             continue;
         ...
     }


My next plan was to experiment with break instead of continue, which
I'll get to at some point.


I'd rather put a sleep() in. Otherwise you might break some use cases.







3) (this issue).  If anyone starts xenstore-watch with no xenstored
running at all, it blocks in D in the kernel.


Should be handled with solution for 1).



The cause is the special handling for watch/unwatch commands which,
instead of just queuing up the data for xenstore, explicitly waits for
an OK for registering the watch.  This causes a write() system call to
block waiting for a non-existent entity to reply.

So while this patch does resolve the major usability issue I found (I
can't even SIGINT and get my terminal back), I think there are issues.

The reason why XS_WATCH/XS_UNWATCH are special cased is because they do
require special handling.  The main kernel thread for processing
incoming data from xenstored does need to know how to associate each
async XS_WATCH_EVENT with the caller who watched the path.

Therefore, depending on when this cancellation hits, we might be in any
of the following states:

1) the watch is queued in the kernel, but not even sent to xenstored yet
2) the watch is queued in the xenstored ring, but not acted upon
3) the watch is queued in the xenstored ring, and the xenstored has seen
it but not replied yet

Re: [xen-4.13-testing bisection] complete test-amd64-amd64-xl-qemuu-ovmf-amd64

2020-12-17 Thread Jan Beulich
On 18.12.2020 01:32, osstest service owner wrote:
> branch xen-4.13-testing
> xenbranch xen-4.13-testing
> job test-amd64-amd64-xl-qemuu-ovmf-amd64
> testid debian-hvm-install
> 
> Tree: linux git://xenbits.xen.org/linux-pvops.git
> Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
> Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
> Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
> Tree: qemuu git://xenbits.xen.org/qemu-xen.git
> Tree: seabios git://xenbits.xen.org/osstest/seabios.git
> Tree: xen git://xenbits.xen.org/xen.git
> 
> *** Found and reproduced problem changeset ***
> 
>   Bug is in tree:  ovmf git://xenbits.xen.org/osstest/ovmf.git
>   Bug introduced:  cee5b0441af39dd6f76cc4e0447a1c7f788cbb00
>   Bug not present: 8e4cb8fbceb84b66b3b2fc45b9e93d70f732e970
>   Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/157657/
> 
> 
>   commit cee5b0441af39dd6f76cc4e0447a1c7f788cbb00
>   Author: Guo Dong 
>   Date:   Wed Dec 2 14:18:18 2020 -0700
>   
>   UefiCpuPkg/CpuDxe: Fix boot error
>   
>   REF: https://bugzilla.tianocore.org/show_bug.cgi?id=3084
>   
>   When DXE drivers are dispatched above 4GB memory and
>   the system is already in 64bit mode, the address
>   setCodeSelectorLongJump in stack will be override
>   by parameter. so change to use 64bit address and
>   jump to qword address.
>   
>   Signed-off-by: Guo Dong 
>   Reviewed-by: Ray Ni 
>   Reviewed-by: Eric Dong 

Is this a result one can consider trustworthy? The ovmf tree used
with 4.13 shouldn't have changed anywhere half recently, I would
assume ...

Jan

> For bisection revision-tuple graph see:
>
> http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-4.13-testing/test-amd64-amd64-xl-qemuu-ovmf-amd64.debian-hvm-install.html
> Revision IDs in each graph node refer, respectively, to the Trees above.
> 
> 
> Running cs-bisection-step 
> --graph-out=/home/logs/results/bisect/xen-4.13-testing/test-amd64-amd64-xl-qemuu-ovmf-amd64.debian-hvm-install
>  --summary-out=tmp/157657.bisection-summary --basis-template=157135 
> --blessings=real,real-bisect,real-retry xen-4.13-testing 
> test-amd64-amd64-xl-qemuu-ovmf-amd64 debian-hvm-install
> Searching for failure / basis pass:
>  157597 fail [host=rimava1] / 157135 [host=godello1] 156988 [host=godello0] 
> 156636 [host=godello1] 156593 [host=albana1] 156399 [host=chardonnay0] 156317 
> [host=elbling0] 156265 [host=huxelrebe1] 156054 [host=godello1] 156030 
> [host=godello0] 155377 [host=fiano1] 155258 ok.
> Failure / basis pass flights: 157597 / 155258
> (tree with no url: minios)
> Tree: linux git://xenbits.xen.org/linux-pvops.git
> Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
> Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
> Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
> Tree: qemuu git://xenbits.xen.org/qemu-xen.git
> Tree: seabios git://xenbits.xen.org/osstest/seabios.git
> Tree: xen git://xenbits.xen.org/xen.git
> Latest c3038e718a19fc596f7b1baba0f83d5146dc7784 
> c530a75c1e6a472b0eb9558310b518f0dfcd8860 
> f95e80d832e923046c92cd6f0b8208cec147138e 
> d0d8ad39ecb51cd7497cd524484fe09f50876798 
> 7269466a5b0c0e89b36dc9a7db0554ae404aa230 
> 748d619be3282fba35f99446098ac2d0579f6063 
> 10c7c213bef26274684798deb3e351a6756046d2
> Basis pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
> c530a75c1e6a472b0eb9558310b518f0dfcd8860 
> d8ab884fe9b4dd148980bf0d8673187f8fb25887 
> d0d8ad39ecb51cd7497cd524484fe09f50876798 
> 730e2b1927e7d911bbd5350714054ddd5912f4ed 
> 41289b83ed3847dc45e7af3f1b7cb3cec6b6e7a5 
> 88f5b414ac0f8008c1e2b26f93c3d980120941f7
> Generating revisions with ./adhoc-revtuple-generator  
> git://xenbits.xen.org/linux-pvops.git#c3038e718a19fc596f7b1baba0f83d5146dc7784-c3038e718a19fc596f7b1baba0f83d5146dc7784
>  
> git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
>  
> git://xenbits.xen.org/osstest/ovmf.git#d8ab884fe9b4dd148980bf0d8673187f8fb25887-f95e80d832e923046c92cd6f0b8208cec147138e
>  git://xenbits.xen.org/qemu-xen-traditional.git#d0d8ad39ecb51cd7497cd524484\
>  fe09f50876798-d0d8ad39ecb51cd7497cd524484fe09f50876798 
> git://xenbits.xen.org/qemu-xen.git#730e2b1927e7d911bbd5350714054ddd5912f4ed-7269466a5b0c0e89b36dc9a7db0554ae404aa230
>  
> git://xenbits.xen.org/osstest/seabios.git#41289b83ed3847dc45e7af3f1b7cb3cec6b6e7a5-748d619be3282fba35f99446098ac2d0579f6063
>  
> git://xenbits.xen.org/xen.git#88f5b414ac0f8008c1e2b26f93c3d980120941f7-10c7c213bef26274684798deb3e351a6756046d2
> Loaded 7669 nodes in revision graph
> Searching for test results:
>  155132 [host=albana0]
>  155258 pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
> c530a75c1e6a472b0eb9558310b518f0dfcd8860 
> d8ab884fe9b4dd148980bf0d8673187f8fb25887 
> d0d8ad39ecb51cd7497cd524484fe09f50876798 
> 730e2b1927e7d911bbd5350714054ddd5912f4ed 
> 41289b83ed3847dc

[xen-4.13-testing test] 157629: regressions - FAIL

2020-12-17 Thread osstest service owner
flight 157629 xen-4.13-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/157629/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 157135

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 157563

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 157135
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 157135
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 157135
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 157135
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 157135
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 157135
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 157135
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail  like 157135
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 157135
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 157135
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 157135
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-i386-xl-pvshim    14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass

version targeted for testing:
 xen  10c7c213bef26274684798deb3e351a6756046d2
baseline version:
 xen  b5302273e2c51940172400486644636f2f4fc64a

Last test of basis   157135  2020-12-01 15:06:11 Z   16 days
Testing same since   157563  2020-12-15 13:36:28 Z    2 days    3 attempts


People who touched revisions under

Re: [PATCH] xen: Rework WARN_ON() to return whether a warning was triggered

2020-12-17 Thread Jan Beulich
On 18.12.2020 00:54, Stefano Stabellini wrote:
> On Tue, 15 Dec 2020, Jan Beulich wrote:
>> On 15.12.2020 14:19, Julien Grall wrote:
>>> On 15/12/2020 11:46, Jan Beulich wrote:
 On 15.12.2020 12:26, Julien Grall wrote:
> --- a/xen/include/xen/lib.h
> +++ b/xen/include/xen/lib.h
> @@ -23,7 +23,13 @@
>   #include 
>   
>   #define BUG_ON(p)  do { if (unlikely(p)) BUG();  } while (0)
> -#define WARN_ON(p) do { if (unlikely(p)) WARN(); } while (0)
> +#define WARN_ON(p)  ({  \
> +bool __ret_warn_on = (p);   \

 Please can you avoid leading underscores here?
>>>
>>> I can.
>>>

> +\
> +if ( unlikely(__ret_warn_on) )  \
> +WARN(); \
> +unlikely(__ret_warn_on);\
> +})

 Is this latter unlikely() having any effect? So far I thought it
 would need to be immediately inside a control construct or be an
 operand to && or ||.
>>>
>>> The unlikely() is directly taken from the Linux implementation.
>>>
>>> My guess is the compiler is still able to use the information for the 
>>> branch prediction in the case of:
>>>
>>> if ( WARN_ON(...) )
>>
>> Maybe. Or maybe not. I don't suppose the Linux commit introducing
>> it clarifies this?
> 
> I did a bit of digging but it looks like the unlikely has been there
> forever. I'd just keep it as is.

I'm afraid I don't view this as a reason to inherit code unchanged.
If it was introduced with a clear indication that compilers can
recognize it despite the somewhat unusual placement, then fine. But
likely() / unlikely() quite often get put in more or less blindly -
see the not uncommon unlikely(a && b) style of uses, which don't
typically have the intended effect and would instead need to be
unlikely(a) && unlikely(b) [assuming each condition alone is indeed
deemed unlikely], unless compilers have learned to guess/infer
what's meant between when I last looked at this and now.

Jan