On 10/12/2020 17:03, Manuel Bouyer wrote:
> On Thu, Dec 10, 2020 at 03:51:46PM +0000, Andrew Cooper wrote:
>>> [   7.6617663] cs 0x47  ds 0x23  es 0x23  fs 0000  gs 0000  ss 0x3f
>>> [   7.7345663] fsbase 000000000000000000 gsbase 000000000000000000
>>>
>>> so it looks like something resets %fs to 0 ...
>>>
>>> Anyway the fault address 0xffffbd800000a040 is in the hypervisor's range,
>>> isn't it ?
>> No.  It's the kernel's LDT.  From previous debugging:
>>> (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057
>> LDT handling in Xen is a bit complicated.  To maintain host safety, we
>> must map it into Xen's range, and we explicitly support a PV guest doing
>> on-demand mapping of the LDT.  (This pertains to the experimental
>> Windows XP PV support which never made it beyond a prototype.  Windows
>> can page out the LDT.)  Either way, we lazily map the LDT frames on
>> first use.
>>
>> So %cr2 is the real hardware faulting address, and is in the Xen range. 
>> We spot that it is an LDT access, and try to lazily map the frame (at
>> LDT base), but find that the kernel's virtual address mapping
>> 0xffffbd000000a000 is not present (the gl1e printk).
>>
>> Therefore, we pass #PF to the guest kernel, adjusting vCR2 to what would
>> have happened had Xen not mapped the real LDT elsewhere, which is
>> expected to cause the guest kernel to do whatever demand mapping is
>> necessary to pull the LDT back in.
>>
>>
>> I suppose it is worth taking a step back and ascertaining how exactly
>> NetBSD handles (or, should be handling) the LDT.
>>
>> Do you mind elaborating on how it is supposed to work?
> Note that I'm not familiar with this selector stuff; and I usually get
> it wrong the first time I go back to it.
>
> AFAIK, in the Xen PV case, a page is allocated and mapped in kernel
> space, and registered to Xen with MMUEXT_SET_LDT.
> From what I found, in the common case the LDT is the same for all processes.
> Does it make sense ?

The debugging earlier shows that MMUEXT_SET_LDT has indeed been called. 
Presumably 0xffffbd000000a000 is a plausible virtual address for NetBSD
to position the LDT?

However, Xen finds the mapping not-present when trying to demand-map it,
which is why the #PF is forwarded to the kernel.
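
As an aside, the selector values in the register dump line up with the
faulting offset.  Below is a minimal sketch in Python of that arithmetic.
It assumes the faulting access is the descriptor fetch for the %cs value
in the dump (0x47), and it uses the LDT base and limit from the earlier
(XEN) debug line.  The helper names are made up for illustration; they
are not Xen or NetBSD functions.

```python
# Illustrative only: decode x86 segment selectors and compute where the
# descriptor fetch lands.  Names here are hypothetical helpers.

DESC_SIZE = 8  # bytes per code/data segment descriptor


def decode_selector(sel: int) -> dict:
    """Split a selector into index, table indicator and RPL."""
    index = sel >> 3
    return {
        "index": index,
        "table": "LDT" if sel & 0x4 else "GDT",
        "rpl": sel & 0x3,
        "offset": index * DESC_SIZE,  # byte offset into the table
    }


def within_limit(sel: int, limit: int) -> bool:
    """True if the whole descriptor lies inside the table limit."""
    return decode_selector(sel)["offset"] + DESC_SIZE - 1 <= limit


def guest_fault_address(ldt_base: int, sel: int) -> int:
    """Address the guest would see in vCR2 for an LDT descriptor fetch,
    i.e. the guest's own LDT base plus the descriptor offset."""
    return ldt_base + decode_selector(sel)["offset"]


if __name__ == "__main__":
    ldt_base = 0xFFFFBD000000A000  # from the (XEN) debug line
    ldt_limit = 0x0057             # from the (XEN) debug line
    for name, sel in (("cs", 0x47), ("ds", 0x23), ("ss", 0x3F)):
        print(name, hex(sel), decode_selector(sel))
    print(within_limit(0x47, ldt_limit))
    print(hex(guest_fault_address(ldt_base, 0x47)))
```

Under those assumptions, 0x47 decodes to LDT index 8 at RPL 3, which is
byte offset 0x40 into the LDT.  That matches the low bits of both the
hardware %cr2 and the address reported to the guest.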

The way we pull guest virtual addresses was altered by XSA-286 (released
not too long ago despite its apparent age), but that *should* have been
no functional change.  I wonder if we accidentally broke something there.
What exactly are you running, Xen-wise, with the 4.13 version?

Given that this is init failing, presumably the issue would repro with
the net installer version too?

~Andrew
