On 05/29/17 15:11, Andrew Cooper wrote: > On 29/05/2017 12:46, Razvan Cojocaru wrote: >> On 05/29/17 14:05, Jan Beulich wrote: >>>>>> On 29.05.17 at 11:20, <rcojoc...@bitdefender.com> wrote: >>>> On 05/26/17 18:11, Andrew Cooper wrote: >>>>> On 26/05/17 15:29, Jan Beulich wrote: >>>>>>>>> On 25.05.17 at 11:40, <rcojoc...@bitdefender.com> wrote: >>>>>>> I've noticed that, with pages marked NX and vm_event emulation, we can >>>>>>> end up emulating an ud2, for which hvm_emulate_one() returns >>>>>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event(). >>>>>> Could you explain what would lead to emulation of UD2? >>>>>> >>>>>>> This, in turn, causes a hvm_inject_event() call in the context of >>>>>>> hvm_do_resume(), which can, if there's already a pending event there, >>>>>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of >>>>>>> input (mouse frozen, keyboard unresponsive). >>>>>>> >>>>>>> After much trial and error, I've been able to confirm this by leaving a >>>>>>> guest on for almost a full day with this change: >>>>>>> >>>>>>> case X86EMUL_EXCEPTION: >>>>>>> - hvm_inject_event(&ctx.ctxt.event); >>>>>>> + if ( !hvm_event_pending(current) ) >>>>>>> + hvm_inject_event(&ctx.ctxt.event); >>>>>>> >>>>>>> and checking that there's been no BSOD or loss of input. >>>>>>> >>>>>>> However, just losing the event here, while fine to prove that this is >>>>>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust >>>>>>> way of fixing this is. >>>>>> Much depends on what the other event is: If it's an interrupt, I'd >>>>>> assume there to be an ordering problem (interrupts shouldn't be >>>>>> injected when there is a pending exception, their delivery instead >>>>>> should be attempted on the first instruction of the exception >>>>>> handler [if interrupts remain on] or whenever interrupts get >>>>>> re-enabled). >>>>> I suspect it is an ordering issue, and something has processed and >>>>> interrupt before the emulation occurs as part of the vm_event reply >>>>> happens. >>>>> >>>>> The interrupt ordering spec indicates that external interrupts take >>>>> precedent over faults raised from executing an instruction, on the basis >>>>> that once the interrupt handler returns, the instruction will generate >>>>> the same fault again. However, its not obvious how this is intended to >>>>> interact with interrupt windows and vmexits. I expect we can get away >>>>> with ensuring that external interrupts are the final thing considered >>>>> for injection on the return-to-guest path. >>>>> >>>>> It might be an idea to leave an assert in vmx_inject_event() that an >>>>> event is not already pending, but in the short term, this probably also >>>>> wants debugging by trying to identify what sequence of actions is >>>>> leading us to inject two events in this case (if indeed this is what is >>>>> happening). >>>> With some patience, I've been able to catch the problem: "(XEN) >>>> vmx_inject_event(3, 14) but 0, 225 pending". >>>> >>>> 63 /* >>>> 64 * x86 event types. This enumeration is valid for: >>>> 65 * Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8] >>>> 66 * AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only) >>>> 67 */ >>>> 68 enum x86_event_type { >>>> 69 X86_EVENTTYPE_EXT_INTR, /* External interrupt */ >>>> 70 X86_EVENTTYPE_NMI = 2, /* NMI */ >>>> 71 X86_EVENTTYPE_HW_EXCEPTION, /* Hardware exception */ >>>> 72 X86_EVENTTYPE_SW_INTERRUPT, /* Software interrupt (CD nn) */ >>>> 73 X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */ >>>> 74 X86_EVENTTYPE_SW_EXCEPTION, /* INT3 (CC), INTO (CE) */ >>>> 75 }; >>>> >>>> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an >>>> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above. >>> So this confirms our suspicion, but doesn't move us closer to a >>> solution. The question after all is why an external interrupt is >>> being delivered prior to or while emulating. As Andrew did >>> explain, proper behavior would be to check for external >>> interrupts and don't enter emulation if one is pending, or don't >>> check for external interrupts until the _next_ instruction >>> boundary. Correct architectural behavior will result either way; >>> the second variant merely must not continuously defer interrupts >>> (i.e. there need to be instruction boundaries at which hardware >>> of software do check for them). >>> >>> I'm not that familiar with the sequence of steps when dealing >>> with emulation requests from an introspection agent, so I would >>> hope you could go through those code paths to see where >>> external interrupts are being checked for. Or wait - isn't your >>> problem that you invoke emulation out of hvm_do_resume() (via >>> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()), >>> which happens after all other "normal" processing of a VM exit? >>> Perhaps emulation should be skipped there if an event is already >>> pending injection, as emulation not having started means we still >>> are on an instruction boundary? >> Indeed, we are emulating after all the vmexit processing has already >> happened, in the hvm_do_resume() path you've mentioned. >> >> Not emulating if an event is pending injection is certainly trivial to >> test - however, since then we would have to lose the emulation attempt, >> presumably the EPT fault event would show up again (a "duplicate" event) >> in this case (since it is no longer safe to assume that we land on the >> same RIP after the interrupt has been handled, and thus the next >> hvm_do_resume() may not happen where we wanted to emulate). >> >> I'll test this scenario out. > > The problem is in .Lvmx_do_vmentry. We call vmx_intr_assist() before > vmx_process_softirqs() > > Therefore, when the EPT violation occurs and we break for introspection, > we still process pending interrupts before finally getting desheduled to > wait for the vm_event reply. On receiving the vm_event reply, we > emulate the instruction, and find that an interrupt has already been > queued from before, but not yet delivered. > > Untangling this going to be complicated.
Would it not be acceptable to try to add a bool flag, e.g. v->arch.hvm_vcpu.disable_interrupts, that would work similarly to how v->arch.hvm_vcpu.single_step works in vmx_intr_assist()? We'd set it to true before sending out an EPT violation event, and then back to false in hvm_vm_event_do_resume(). Thanks, Razvan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel