On 10/12/2020 17:03, Manuel Bouyer wrote:
> On Thu, Dec 10, 2020 at 03:51:46PM +0000, Andrew Cooper wrote:
>>> [ 7.6617663] cs 0x47 ds 0x23 es 0x23 fs 0000 gs 0000 ss 0x3f
>>> [ 7.7345663] fsbase 000000000000000000 gsbase 000000000000000000
>>>
>>> so it looks like something resets %fs to 0 ...
>>>
>>> Anyway the fault address 0xffffbd800000a040 is in the hypervisor's range,
>>> isn't it ?
>> No.  Its the kernel's LDT.  From previous debugging:
>>> (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057
>> LDT handling in Xen is a bit complicated.  To maintain host safety, we
>> must map it into Xen's range, and we explicitly support a PV guest doing
>> on-demand mapping of the LDT.  (This pertains to the experimental
>> Windows XP PV support which never made it beyond a prototype.  Windows
>> can page out the LDT.)  Either way, we lazily map the LDT frames on
>> first use.
>>
>> So %cr2 is the real hardware faulting address, and is in the Xen range.
>> We spot that it is an LDT access, and try to lazily map the frame (at
>> LDT base), but find that the kernel's virtual address mapping
>> 0xffffbd000000a000 is not present (the gl1e printk).
>>
>> Therefore, we pass #PF to the guest kernel, adjusting vCR2 to what would
>> have happened had Xen not mapped the real LDT elsewhere, which is
>> expected to cause the guest kernel to do whatever demand mapping is
>> necessary to pull the LDT back in.
>>
>>
>> I suppose it is worth taking a step back and ascertaining how exactly
>> NetBSD handles (or, should be handling) the LDT.
>>
>> Do you mind elaborating on how it is supposed to work?
> Note that I'm not familiar with this selector stuff; and I usually get
> it wrong the first time I go back to it.
>
> AFAIK, in the Xen PV case, a page is allocated an mapped in kernel
> space, and registered to Xen with MMUEXT_SET_LDT.
> From what I found, in the common case the LDT is the same for all processes.
> Does it make sense ?

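As a sanity check on the numbers quoted above, the selector and vCR2
arithmetic can be sketched as below.  This is only an illustration, not
Xen's actual code: the helper names are made up, and the start of Xen's
LDT mapping is *assumed* to be the page containing %cr2.  The selector
layout (index in bits 3 and up, table indicator in bit 2, RPL in bits
0-1) and the 8-byte descriptor size are standard x86.

```python
DESCRIPTOR_SIZE = 8  # bytes per x86 segment descriptor


def decode_selector(selector):
    """Split an x86 segment selector into (index, uses_ldt, rpl)."""
    index = selector >> 3              # descriptor index within the table
    uses_ldt = bool(selector & 0x4)    # table indicator: set means LDT
    rpl = selector & 0x3               # requested privilege level
    return index, uses_ldt, rpl


def descriptor_offset(selector):
    """Byte offset of the selector's descriptor within its table."""
    index, _, _ = decode_selector(selector)
    return index * DESCRIPTOR_SIZE


def guest_fault_address(hw_cr2, xen_ldt_start, guest_ldt_base):
    """Hypothetical helper: rebase a fault in Xen's LDT mapping onto the
    guest's own LDT virtual address (the vCR2 adjustment described above)."""
    return guest_ldt_base + (hw_cr2 - xen_ldt_start)


# Values taken from the quoted debug output.
HW_CR2 = 0xffff820000010040
GUEST_LDT_BASE = 0xffffbd000000a000
LDT_LIMIT = 0x57
CS_SELECTOR = 0x47

# ASSUMPTION: Xen's mapping of this LDT starts at the page containing %cr2.
XEN_LDT_START = HW_CR2 & ~0xfff

if __name__ == "__main__":
    index, uses_ldt, rpl = decode_selector(CS_SELECTOR)
    offset = descriptor_offset(CS_SELECTOR)
    print("cs: index", index, "LDT" if uses_ldt else "GDT", "rpl", rpl)
    print("descriptor offset", hex(offset))
    print("within limit", offset + DESCRIPTOR_SIZE - 1 <= LDT_LIMIT)
    print("vCR2", hex(guest_fault_address(HW_CR2, XEN_LDT_START,
                                          GUEST_LDT_BASE)))
```

Under those assumptions, cs 0x47 selects LDT entry 8 at offset 0x40,
which matches the low bits of %cr2, and the address handed to the guest
comes out as 0xffffbd000000a040.
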
The debugging earlier shows that MMUEXT_SET_LDT has indeed been called. 
Presumably 0xffffbd000000a000 is a plausible virtual address for NetBSD
to position the LDT?

However, Xen finds the mapping not-present when trying to demand-map it,
which is why the #PF is forwarded to the kernel.

The way we pull guest virtual addresses was altered by XSA-286 (released
not too long ago despite its apparent age), but that *should* have been
no functional change.  I wonder if we accidentally broke something there.

What exactly are you running, Xen-wise, with the 4.13 version?

Given that this is init failing, presumably the issue would repro with
the net installer version too?

~Andrew