On 05/29/17 15:11, Andrew Cooper wrote:
> On 29/05/2017 12:46, Razvan Cojocaru wrote:
>> On 05/29/17 14:05, Jan Beulich wrote:
>>>>>> On 29.05.17 at 11:20, <rcojoc...@bitdefender.com> wrote:
>>>> On 05/26/17 18:11, Andrew Cooper wrote:
>>>>> On 26/05/17 15:29, Jan Beulich wrote:
>>>>>>>>> On 25.05.17 at 11:40, <rcojoc...@bitdefender.com> wrote:
>>>>>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>>>>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>>>>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>>>>>> Could you explain what would lead to emulation of UD2?
>>>>>>
>>>>>>> This, in turn, causes a hvm_inject_event() call in the context of
>>>>>>> hvm_do_resume(), which can, if there's already a pending event there,
>>>>>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>>>>>>> input (mouse frozen, keyboard unresponsive).
>>>>>>>
>>>>>>> After much trial and error, I've been able to confirm this by leaving a
>>>>>>> guest on for almost a full day with this change:
>>>>>>>
>>>>>>>      case X86EMUL_EXCEPTION:
>>>>>>> -        hvm_inject_event(&ctx.ctxt.event);
>>>>>>> +        if ( !hvm_event_pending(current) )
>>>>>>> +            hvm_inject_event(&ctx.ctxt.event);
>>>>>>>
>>>>>>> and checking that there's been no BSOD or loss of input.
>>>>>>>
>>>>>>> However, just losing the event here, while fine to prove that this is
>>>>>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>>>>>>> way of fixing this is.
>>>>>> Much depends on what the other event is: If it's an interrupt, I'd
>>>>>> assume there to be an ordering problem (interrupts shouldn't be
>>>>>> injected when there is a pending exception, their delivery instead
>>>>>> should be attempted on the first instruction of the exception
>>>>>> handler [if interrupts remain on] or whenever interrupts get
>>>>>> re-enabled).
>>>>> I suspect it is an ordering issue, and something has processed and
>>>>> interrupt before the emulation occurs as part of the vm_event reply 
>>>>> happens.
>>>>>
>>>>> The interrupt ordering spec indicates that external interrupts take
>>>>> precedent over faults raised from executing an instruction, on the basis
>>>>> that once the interrupt handler returns, the instruction will generate
>>>>> the same fault again.  However, its not obvious how this is intended to
>>>>> interact with interrupt windows and vmexits.  I expect we can get away
>>>>> with ensuring that external interrupts are the final thing considered
>>>>> for injection on the return-to-guest path.
>>>>>
>>>>> It might be an idea to leave an assert in vmx_inject_event() that an
>>>>> event is not already pending, but in the short term, this probably also
>>>>> wants debugging by trying to identify what sequence of actions is
>>>>> leading us to inject two events in this case (if indeed this is what is
>>>>> happening).
>>>> With some patience, I've been able to catch the problem: "(XEN)
>>>> vmx_inject_event(3, 14) but 0, 225 pending".
>>>>
>>>>  63 /*
>>>>  64  * x86 event types. This enumeration is valid for:
>>>>  65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
>>>>  66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
>>>>  67  */
>>>>  68 enum x86_event_type {
>>>>  69     X86_EVENTTYPE_EXT_INTR,         /* External interrupt */
>>>>  70     X86_EVENTTYPE_NMI = 2,          /* NMI */
>>>>  71     X86_EVENTTYPE_HW_EXCEPTION,     /* Hardware exception */
>>>>  72     X86_EVENTTYPE_SW_INTERRUPT,     /* Software interrupt (CD nn) */
>>>>  73     X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
>>>>  74     X86_EVENTTYPE_SW_EXCEPTION,     /* INT3 (CC), INTO (CE) */
>>>>  75 };
>>>>
>>>> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
>>>> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.
>>> So this confirms our suspicion, but doesn't move us closer to a
>>> solution. The question after all is why an external interrupt is
>>> being delivered prior to or while emulating. As Andrew did
>>> explain, proper behavior would be to check for external
>>> interrupts and don't enter emulation if one is pending, or don't
>>> check for external interrupts until the _next_ instruction
>>> boundary. Correct architectural behavior will result either way;
>>> the second variant merely must not continuously defer interrupts
>>> (i.e. there need to be instruction boundaries at which hardware
>>> of software do check for them).
>>>
>>> I'm not that familiar with the sequence of steps when dealing
>>> with emulation requests from an introspection agent, so I would
>>> hope you could go through those code paths to see where
>>> external interrupts are being checked for. Or wait - isn't your
>>> problem that you invoke emulation out of hvm_do_resume() (via
>>> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()),
>>> which happens after all other "normal" processing of a VM exit?
>>> Perhaps emulation should be skipped there if an event is already
>>> pending injection, as emulation not having started means we still
>>> are on an instruction boundary?
>> Indeed, we are emulating after all the vmexit processing has already
>> happened, in the hvm_do_resume() path you've mentioned.
>>
>> Not emulating if an event is pending injection is certainly trivial to
>> test - however, since then we would have to lose the emulation attempt,
>> presumably the EPT fault event would show up again (a "duplicate" event)
>> in this case (since it is no longer safe to assume that we land on the
>> same RIP after the interrupt has been handled, and thus the next
>> hvm_do_resume() may not happen where we wanted to emulate).
>>
>> I'll test this scenario out.
> 
> The problem is in .Lvmx_do_vmentry.  We call vmx_intr_assist() before
> vmx_process_softirqs()
> 
> Therefore, when the EPT violation occurs and we break for introspection,
> we still process pending interrupts before finally getting desheduled to
> wait for the vm_event reply.  On receiving the vm_event reply, we
> emulate the instruction, and find that an interrupt has already been
> queued from before, but not yet delivered.
> 
> Untangling this going to be complicated.

Would it not be acceptable to try to add a bool flag, e.g.
v->arch.hvm_vcpu.disable_interrupts, that would work similarly to how
v->arch.hvm_vcpu.single_step works in vmx_intr_assist()?

We'd set it to true before sending out an EPT violation event, and then
back to false in hvm_vm_event_do_resume().


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Reply via email to