c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to
alt-call") went too far with dropping NULL function pointer checks.

smp_callin() calls hvm_cpu_up() unconditionally.  When the platform doesn't
support HVM, hvm_enable() exits without filling in hvm_funcs, after which the
altcall pass nukes the (now unconditional) indirect call, causing:

  (XEN) ----[ Xen-4.17.0-10.18-d  x86_64  debug=y  Not tainted ]----
  (XEN) CPU:    1
  (XEN) RIP:    e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7
  (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
  ...
  (XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7):
  (XEN)  ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff 
ff
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7
  (XEN)    [<ffff82d0402000e2>] F __high_start+0x42/0x60

To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally
too, so what happen next is:

  (XEN) ----[ Xen-4.17.0-10.18-d  x86_64  debug=y  Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c
  (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
  ...
  (XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c):
  (XEN)  48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d 
ff
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c
  (XEN)    [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8
  (XEN)    [<ffff82d04034a229>] F machine_restart+0xa2/0x298
  (XEN)    [<ffff82d04034a42a>] F arch/x86/shutdown.c#__machine_restart+0xb/0x11
  (XEN)    [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea
  (XEN)    [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37
  (XEN)    [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5
  (XEN)    [<ffff82d04039482a>] F common_interrupt+0x10a/0x120
  (XEN)    [<ffff82d04031f649>] F __udelay+0x3a/0x51
  (XEN)    [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734
  (XEN)    [<ffff82d040203c2b>] F cpu_up+0x7d/0xde
  (XEN)    [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618
  (XEN)    [<ffff82d0402000ef>] F __high_start+0x4f/0x60

which recurses until hitting a stack overflow.  The #DF handler, which resets
its stack on each invocation, loops indefinitely.

Reinstate the NULL function pointer checks for hvm_cpu_{up,down}().

Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to 
alt-call")
Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
---
CC: Jan Beulich <jbeul...@suse.com>
CC: Roger Pau Monné <roger....@citrix.com>
CC: Wei Liu <w...@xen.org>

RFC.  Not tested yet on the imacted hardware.  It's a Xeon PHI with another
werid thing in need of debugging.  First boot is fine, while second
boot (loading microcode this time) has a problem with vmx.

I wonder if we want to modify the callers to check for HVM being enabled,
rather than leaving the NULL pointer checks in a position where they're liable
to be reaped again.

Also, the #UD handler really should identify 0f 0b 0f ff ff as the
clobbered-altcall sequence, and provide a message to that effect.
---
 xen/arch/x86/include/asm/hvm/hvm.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/include/asm/hvm/hvm.h 
b/xen/arch/x86/include/asm/hvm/hvm.h
index a441cbc22159..2dd08567f730 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -553,12 +553,16 @@ static inline void hvm_invlpg(struct vcpu *v, unsigned 
long linear)
 
 static inline int hvm_cpu_up(void)
 {
-    return alternative_call(hvm_funcs.cpu_up);
+    if ( hvm_funcs.cpu_up )
+        return alternative_call(hvm_funcs.cpu_up);
+
+    return 0;
 }
 
 static inline void hvm_cpu_down(void)
 {
-    alternative_vcall(hvm_funcs.cpu_down);
+    if ( hvm_funcs.cpu_down )
+        alternative_vcall(hvm_funcs.cpu_down);
 }
 
 static inline unsigned int hvm_get_insn_bytes(struct vcpu *v, uint8_t *buf)
-- 
2.11.0


Reply via email to