On 14/10/2020 17:28, Roger Pau Monné wrote: > On Fri, Oct 09, 2020 at 12:53:01PM +0100, Andrew Cooper wrote: >> Despite appearing to be a deliberate design choice of early PV64, the >> resulting behaviour for unregistered SYSCALL callbacks creates an untenable >> testability problem for Xen. Furthermore, the behaviour is undocumented, >> bizarre, and inconsistent with related behaviour in Xen, and very liable >> introduce a security vulnerability into a PV guest if the author hasn't >> studied Xen's assembly code in detail. >> >> There are two different bugs here. >> >> 1) The current logic confuses the registered entrypoints, and may deliver a >> SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit >> entrypoint is registered. >> >> This has been the case ever since 2007 (c/s cd75d47348b) but up until >> 2018 (c/s dba899de14) the wrong selectors would be handed to the guest for >> a 32bit SYSCALL entry, making it appear as if it a 64bit entry all along. >> >> Xen would malfunction under these circumstances, if it were a PV guest. >> Linux would as well, but PVOps has always registered both entrypoints and >> discarded the Xen-provided selectors. NetBSD really does malfunction as a >> consequence (benignly now, but a VM DoS before the 2018 Xen selector fix). >> >> 2) In the case that neither SYSCALL callbacks are registered, the guest will >> be crashed when userspace executes a SYSCALL instruction, which is a >> userspace => kernel DoS. >> >> This has been the case ever since the introduction of 64bit PV support, >> but >> behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which yield >> #GP/#UD in userspace before the callback is registered, and are therefore >> safe by default. >> >> This change does constitute a change in the PV ABI, for corner cases of a PV >> guest kernel registering neither callback, or not registering the 32bit >> callback when running on AMD/Hygon hardware. >> >> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64 >> SYSENTER (safe by default, until explicitly enabled), as well as native >> hardware (always delivered to the single applicable callback). >> >> Most importantly however, and the primary reason for the change, is that it >> lets us sensibly test the fast system call entrypoints under all states a PV >> guest can construct, to prove correct behaviour. >> >> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> > Acked-by: Roger Pau Monné <roger....@citrix.com> > >> --- >> CC: Jan Beulich <jbeul...@suse.com> >> CC: Roger Pau Monné <roger....@citrix.com> >> CC: Wei Liu <w...@xen.org> >> CC: Andy Lutomirski <l...@kernel.org> >> CC: Manuel Bouyer <bou...@antioche.eu.org> >> >> v2: >> * Drop unnecessary instruction suffixes >> * Don't truncate #UD entrypoint to 32 bits >> >> Manuel: This will result in a corner case change for NetBSD. >> >> At the moment on native, 32bit userspace on 64bit NetBSD will get #UD (Intel, >> etc), or an explicit -ENOSYS (AMD, etc) when trying to execute a 32bit >> SYSCALL >> instruction. >> >> After this change, a 64bit PV VM will consistently see #UD (like on Intel, >> etc >> hardware) even when running on AMD/Hygon hardware (as Xsyscall32 isn't >> registered with Xen), rather than following Xsyscall into the proper system >> call path. >> --- >> xen/arch/x86/x86_64/entry.S | 26 +++++++++++++++++++------- >> 1 file changed, 19 insertions(+), 7 deletions(-) >> >> diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S >> index 000eb9722b..aaf8402f93 100644 >> --- a/xen/arch/x86/x86_64/entry.S >> +++ b/xen/arch/x86/x86_64/entry.S >> @@ -26,18 +26,30 @@ >> /* %rbx: struct vcpu */ >> ENTRY(switch_to_kernel) >> leaq VCPU_trap_bounce(%rbx),%rdx >> - /* TB_eip = (32-bit syscall && syscall32_addr) ? >> - * syscall32_addr : syscall_addr */ >> - xor %eax,%eax >> + >> + /* TB_eip = 32-bit syscall ? syscall32_addr : syscall_addr */ >> + mov VCPU_syscall32_addr(%rbx), %ecx > This being an unsigned long field, shouldn't you use %rcx here?
Yes I should. Sorry - thought I'd fixed all of these. I'll ad higher half handlers to the XTF test. ~Andrew