On 09/12/2020 18:57, Manuel Bouyer wrote:
> On Wed, Dec 09, 2020 at 06:08:53PM +0000, Andrew Cooper wrote:
>> On 09/12/2020 16:30, Manuel Bouyer wrote:
>>> On Wed, Dec 09, 2020 at 04:00:02PM +0000, Andrew Cooper wrote:
>>>> [...]
>>>>>> I wonder if the LDT is set up correctly.
>>>>> I guess it is, otherwise it wouldn't boot with a Xen 4.13 kernel, isn't 
>>>>> it ?
>>>> Well - you said you always saw it once on 4.13, which clearly shows that
>>>> something was wonky, but it managed to unblock itself.
>>>>
>>>>>> How about this incremental delta?
>>>>> Here's the output
>>>>> (XEN) IRET fault: #PF[0000]
>>>>> (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057
>>>>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed
>>>>> (XEN) IRET fault: #PF[0000]
>>>>> (XEN) %cr2 ffff820000010040, LDT base ffffc4800000a000, limit 0057
>>>>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed
>>>>> (XEN) IRET fault: #PF[0000]
>>>> Ok, so the promotion definitely fails, but we don't get as far as
>>>> inspecting the content of the LDT frame.  This probably means it failed
>>>> to change the page type, which probably means there are still
>>>> outstanding writeable references.
>>>>
>>>> I'm expecting the final printk to be the one which triggers.
>>> It's not. 
>>> Here's the output:
>>> (XEN) IRET fault: #PF[0000]
>>> (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057
>>> (XEN) *** LDT: gl1e 0000000000000000 not present
>>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed
>>> (XEN) IRET fault: #PF[0000]
>>> (XEN) %cr2 ffff820000010040, LDT base ffffbd000000a000, limit 0057
>>> (XEN) *** LDT: gl1e 0000000000000000 not present
>>> (XEN) *** pv_map_ldt_shadow_page(0x40) failed
>> Ok.  So the mapping registered for the LDT is not yet present.  Xen
>> should be raising #PF with the guest, and would be in every case other
>> than the weird context on IRET, where we've confused bad guest state
>> with bad hypervisor state.
> Unfortunately it doesn't fix the problem. I'm now getting a loop of
> (XEN) *** LDT: gl1e 0000000000000000 not present
> (XEN) *** pv_map_ldt_shadow_page(0x40) failed

Oh of course - we don't follow the exit-to-guest path on the way out here.

As a gross hack to check that we've at least diagnosed the issue
appropriately, could you modify NetBSD to explicitly load the %ss
selector into %es (or any other free segment) before first entering user
context?

If it is a sequence of LDT demand-faulting issues, that should cause them
to be fully resolved before Xen's IRET becomes the first actual LDT load.
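
For illustration only, the hack could look roughly like the sketch below.
It is untested against NetBSD and assumes GCC/Clang inline asm on x86;
load_es() and read_ss() are made-up helper names, not existing NetBSD or
Xen functions.  In the kernel you would pass the selector that user
context is going to use for %ss (wherever NetBSD keeps it), not the
current one; read_ss() is only there so the sketch can be exercised
standalone.  The non-x86 branch is just a stub so the file still builds.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* Read the selector currently held in %ss (standalone demo only). */
static inline uint16_t read_ss(void)
{
    uint16_t sel;
    __asm__ volatile ("mov %%ss, %0" : "=r" (sel));
    return sel;
}

/*
 * Load 'sel' into %es and read it back.  Loading a segment register makes
 * the CPU fetch the descriptor from the GDT/LDT, so for an LDT selector
 * any demand-faulting of the LDT page happens here, in a context where a
 * fault can be handled normally, rather than on the later IRET.
 */
static inline uint16_t load_es(uint16_t sel)
{
    uint16_t back;
    __asm__ volatile ("mov %1, %%es\n\t"
                      "mov %%es, %0"
                      : "=r" (back)
                      : "r" (sel)
                      : "memory");
    return back;
}
#else
/* Stub for non-x86 builds: no segment registers to touch. */
static inline uint16_t read_ss(void) { return 0; }
static inline uint16_t load_es(uint16_t sel) { return sel; }
#endif
```

Calling load_es() with the user %ss selector once, just before the first
return to user context, is all the experiment needs.  Whether %es has to
be restored afterwards depends on what NetBSD expects there, which I
haven't checked.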

~Andrew
