On 10/12/2020 09:51, Manuel Bouyer wrote:
> On Wed, Dec 09, 2020 at 07:08:41PM +0000, Andrew Cooper wrote:
>> Oh of course - we don't follow the exit-to-guest path on the way out
>> here.
>>
>> As a gross hack to check that we've at least diagnosed the issue
>> appropriately, could you modify NetBSD to explicitly load the %ss
>> selector into %es (or any other free segment) before first entering
>> user context?
> If I understood it properly, the user %ss is loaded by Xen from the
> trapframe when the guest switches from kernel to user mode, isn't it ?

Yes.  The kernel invokes HYPERCALL_iret, and Xen copies/audits the
provided trapframe, and uses it to actually enter userspace.

> So you mean setting %es to the same value in the trapframe ?

Yes - specifically I wanted to force the LDT reference to happen in a
context where demand-faulting should work, so all the mappings get set
up properly before we first encounter the LDT reference in Xen's IRET
instruction.
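Concretely, the experiment I had in mind looks something like this,
just before the first entry to user context (a minimal sketch only;
the tf_* field names follow NetBSD's amd64 struct trapframe and "l" is
assumed to be the lwp about to enter user mode - the hook point is
illustrative, not a real patch):

    /*
     * Copy the user %ss selector into a spare segment register in the
     * trapframe, so the LDT descriptor gets demand-faulted from kernel
     * context (where the lazy mapping can be fixed up), rather than
     * from Xen's IRET on the way back to userspace.
     */
    struct trapframe *tf = l->l_md.md_regs;

    tf->tf_es = tf->tf_ss;   /* or any other free segment register */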
And to be clear, there is definitely a bug needing fixing here in Xen
in terms of handling IRET faults caused by guest state.  However, it
looks like this isn't the root of the problem - merely some very weird
collateral damage.

> Actually I used %fs because %es is set equal to %ds.
> Xen 4.13 boots fine with this change, but with 4.15 I get a loop of:
>
> (XEN) *** LDT: gl1e 0000000000000000 not present
> (XEN) *** pv_map_ldt_shadow_page(0x40) failed
> [  12.3586540] Process (pid 1) got sig 11
>
> which means that the dom0 gets the trap, and decides that the fault
> address is not mapped.  Without the change the dom0 doesn't show the
> "Process (pid 1) got sig 11"
>
> I activated the NetBSD trap debug code, and this shows:
> [   6.7165877] kern.module.path=/stand/amd64-xen/9.1/modules
> (XEN) *** LDT: gl1e 0000000000000000 not present
> (XEN) *** pv_map_ldt_shadow_page(0x40) failed
> [   6.9462322] pid 1.1 (init): signal 11 code=1 (trap 0x6) @rip 0x7f7ef0c007d0 addr 0xffffbd800000a040 error=14
> [   7.0647896] trapframe 0xffffbd80381cff00
> [   7.1126288] rip 0x00007f7ef0c007d0  rsp 0x00007f7fff10aa30  rfl 0x0000000000000202
> [   7.2041518] rdi 000000000000000000  rsi 000000000000000000  rdx 000000000000000000
> [   7.2956758] rcx 000000000000000000  r8  000000000000000000  r9  000000000000000000
> [   7.3872013] r10 000000000000000000  r11 000000000000000000  r12 000000000000000000
> [   7.4787216] r13 000000000000000000  r14 000000000000000000  r15 000000000000000000
> [   7.5702439] rbp 000000000000000000  rbx 0x00007f7fff10afe0  rax 000000000000000000
> [   7.6617663] cs 0x47  ds 0x23  es 0x23  fs 0000  gs 0000  ss 0x3f
> [   7.7345663] fsbase 000000000000000000  gsbase 000000000000000000
>
> so it looks like something resets %fs to 0 ...
>
> Anyway the fault address 0xffffbd800000a040 is in the hypervisor's
> range, isn't it ?

No.  It's the kernel's LDT.  From previous debugging:

> (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057

LDT handling in Xen is a bit complicated.  To maintain host safety, we
must map it into Xen's range, and we explicitly support a PV guest
doing on-demand mapping of the LDT.  (This pertains to the experimental
Windows XP PV support, which never made it beyond a prototype.  Windows
can page out the LDT.)  Either way, we lazily map the LDT frames on
first use.

So %cr2 is the real hardware faulting address, and is in the Xen range.
We spot that it is an LDT access, and try to lazily map the frame (at
LDT base), but find that the kernel's virtual address mapping
0xffffbd000000a000 is not present (the gl1e printk).

Therefore, we pass #PF to the guest kernel, adjusting vCR2 to what
would have happened had Xen not mapped the real LDT elsewhere, which is
expected to cause the guest kernel to do whatever demand mapping is
necessary to pull the LDT back in.

I suppose it is worth taking a step back and ascertaining how exactly
NetBSD handles (or should be handling) the LDT.  Do you mind
elaborating on how it is supposed to work?

~Andrew
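P.S. For reference, the lazy-LDT path described above has roughly the
following shape (paraphrased pseudocode, not the literal Xen source:
pv_map_ldt_shadow_page() and pv_inject_page_fault() are real Xen names,
but the surrounding structure and the ldt_base()/LDT_SIZE helpers are
simplifications for illustration):

    /* Rough shape of Xen's #PF handling for a PV guest's LDT access. */
    static void handle_ldt_fault(struct cpu_user_regs *regs,
                                 unsigned long cr2)
    {
        /* Byte offset of the faulting descriptor within the LDT. */
        unsigned int offset = cr2 & (LDT_SIZE - 1);     /* simplified */

        /* Try to lazily map the guest's LDT frame into Xen's range. */
        if ( pv_map_ldt_shadow_page(offset) )
            return;             /* Mapped; retry the faulting access. */

        /*
         * The kernel's own LDT mapping is not present either (the
         * "gl1e ... not present" message).  Forward #PF to the guest,
         * rewriting vCR2 from Xen's alias back to the guest's own LDT
         * virtual address, so its demand-paging logic can pull the LDT
         * frame back in.
         */
        pv_inject_page_fault(regs->error_code, ldt_base(current) + offset);
    }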