On 10/12/2020 09:51, Manuel Bouyer wrote:
> On Wed, Dec 09, 2020 at 07:08:41PM +0000, Andrew Cooper wrote:
>> Oh of course - we don't follow the exit-to-guest path on the way out
>> here.
>>
>> As a gross hack to check that we've at least diagnosed the issue
>> appropriately, could you modify NetBSD to explicitly load the %ss
>> selector into %es (or any other free segment) before first entering
>> user context?
> If I understood it properly, the user %ss is loaded by Xen from the
> trapframe when the guest switches from kernel to user mode, isn't it ?

Yes.  The kernel invokes HYPERCALL_iret, and Xen copies/audits the
provided trapframe, and uses it to actually enter userspace.

> So you mean setting %es to the same value in the trapframe ?

Yes - specifically I wanted to force the LDT reference to happen in a
context where demand-faulting should work, so all the mappings get set
up properly before we first encounter the LDT reference in Xen's IRET
instruction.
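Concretely, the experiment I had in mind looks something like this,
just before the first entry to user context (a minimal sketch only;
the tf_* field names follow NetBSD's amd64 struct trapframe and "l" is
assumed to be the lwp about to enter user mode - the hook point is
illustrative, not a real patch):

    /*
     * Copy the user %ss selector into a spare segment register in the
     * trapframe, so the LDT descriptor gets demand-faulted from kernel
     * context (where the lazy mapping can be fixed up), rather than
     * from Xen's IRET on the way back to userspace.
     */
    struct trapframe *tf = l->l_md.md_regs;

    tf->tf_es = tf->tf_ss;   /* or any other free segment register */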
And to be clear, there is definitely a bug needing fixing here in Xen
in terms of handling IRET faults caused by guest state.  However, it
looks like this isn't the root of the problem - merely some very weird
collateral damage.

> Actually I used %fs because %es is set equal to %ds.
> Xen 4.13 boots fine with this change, but with 4.15 I get a loop of:
>
> (XEN) *** LDT: gl1e 0000000000000000 not present
> (XEN) *** pv_map_ldt_shadow_page(0x40) failed
> [  12.3586540] Process (pid 1) got sig 11
>
> which means that the dom0 gets the trap, and decides that the fault
> address is not mapped.  Without the change the dom0 doesn't show the
> "Process (pid 1) got sig 11"
>
> I activated the NetBSD trap debug code, and this shows:
> [   6.7165877] kern.module.path=/stand/amd64-xen/9.1/modules
> (XEN) *** LDT: gl1e 0000000000000000 not present
> (XEN) *** pv_map_ldt_shadow_page(0x40) failed
> [   6.9462322] pid 1.1 (init): signal 11 code=1 (trap 0x6) @rip 0x7f7ef0c007d0 addr 0xffffbd800000a040 error=14
> [   7.0647896] trapframe 0xffffbd80381cff00
> [   7.1126288] rip 0x00007f7ef0c007d0  rsp 0x00007f7fff10aa30  rfl 0x0000000000000202
> [   7.2041518] rdi 000000000000000000  rsi 000000000000000000  rdx 000000000000000000
> [   7.2956758] rcx 000000000000000000  r8  000000000000000000  r9  000000000000000000
> [   7.3872013] r10 000000000000000000  r11 000000000000000000  r12 000000000000000000
> [   7.4787216] r13 000000000000000000  r14 000000000000000000  r15 000000000000000000
> [   7.5702439] rbp 000000000000000000  rbx 0x00007f7fff10afe0  rax 000000000000000000
> [   7.6617663] cs 0x47  ds 0x23  es 0x23  fs 0000  gs 0000  ss 0x3f
> [   7.7345663] fsbase 000000000000000000  gsbase 000000000000000000
>
> so it looks like something resets %fs to 0 ...
>
> Anyway the fault address 0xffffbd800000a040 is in the hypervisor's
> range, isn't it ?

No.  It's the kernel's LDT.  From previous debugging:

> (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057

LDT handling in Xen is a bit complicated.  To maintain host safety, we
must map it into Xen's range, and we explicitly support a PV guest
doing on-demand mapping of the LDT.  (This pertains to the experimental
Windows XP PV support, which never made it beyond a prototype.  Windows
can page out the LDT.)  Either way, we lazily map the LDT frames on
first use.

So %cr2 is the real hardware faulting address, and is in the Xen range.
We spot that it is an LDT access, and try to lazily map the frame (at
LDT base), but find that the kernel's virtual address mapping
0xffffbd000000a000 is not present (the gl1e printk).

Therefore, we pass #PF to the guest kernel, adjusting vCR2 to what
would have happened had Xen not mapped the real LDT elsewhere, which is
expected to cause the guest kernel to do whatever demand mapping is
necessary to pull the LDT back in.

I suppose it is worth taking a step back and ascertaining how exactly
NetBSD handles (or should be handling) the LDT.  Do you mind
elaborating on how it is supposed to work?

~Andrew
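P.S. For reference, the lazy-LDT path described above has roughly the
following shape (paraphrased pseudocode, not the literal Xen source:
pv_map_ldt_shadow_page() and pv_inject_page_fault() are real Xen names,
but the surrounding structure and the ldt_base()/LDT_SIZE helpers are
simplifications for illustration):

    /* Rough shape of Xen's #PF handling for a PV guest's LDT access. */
    static void handle_ldt_fault(struct cpu_user_regs *regs,
                                 unsigned long cr2)
    {
        /* Byte offset of the faulting descriptor within the LDT. */
        unsigned int offset = cr2 & (LDT_SIZE - 1);     /* simplified */

        /* Try to lazily map the guest's LDT frame into Xen's range. */
        if ( pv_map_ldt_shadow_page(offset) )
            return;             /* Mapped; retry the faulting access. */

        /*
         * The kernel's own LDT mapping is not present either (the
         * "gl1e ... not present" message).  Forward #PF to the guest,
         * rewriting vCR2 from Xen's alias back to the guest's own LDT
         * virtual address, so its demand-paging logic can pull the LDT
         * frame back in.
         */
        pv_inject_page_fault(regs->error_code, ldt_base(current) + offset);
    }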