Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events
On 6/8/20 11:44 PM, Tamas K Lengyel wrote:
> On Mon, Jun 8, 2020 at 2:14 PM Razvan Cojocaru wrote:
>> On 6/8/20 10:54 PM, Tamas K Lengyel wrote:
>>> On Mon, Jun 8, 2020 at 12:58 PM Razvan Cojocaru wrote:
>>>> And last but not least, the proper sequence is: 1. unsubscribe from
>>>> register write events, 2. process all events "still in the chamber"
>>>> (keep checking the ring buffer for a while), 3. detach from the guest
>>>> (disable the vm_event subsystem). Not ideal perhaps (in that it's not
>>>> guaranteed that a VCPU won't resume after a longer period than our
>>>> timeout), but if the sequence is followed there should be no guest hangs
>>>> or crashes (at least none that we or our clients have observed so far).
>>>
>>> Incorrect. That's not enough. You also have to wait for all the vCPUs
>>> to get scheduled before disabling vm_event or otherwise the emulation
>>> is skipped entirely. Please read the patch message. If the user
>>> decides to disable vm_event after getting a CR3 event delivered, the
>>> CR3 never gets updated and results in the guest crashing in
>>> unpredictable ways. Same happens with all the other registers.
>>
>> I did read the patch message. As I've previously stated ("it's not
>> guaranteed that a VCPU won't resume after a longer period than our
>> timeout"), it's not ideal, and I've made no claim that the problem does
>> not exist or that it shouldn't be fixed - but really if you've got a
>> VCPU that will wait more than a couple of seconds to get scheduled then
>> you've got a bigger problem with the VM.
>
> Sorry, missed the part where you say you knew about this issue. We
> didn't and it was absolutely not documented anywhere and certainly not
> mentioned in the original patch that added the feature. It literally
> took us years to figure out what the hell is going on.
> While, as you say, it can be a couple seconds before it's safe to
> disable, it can be a lot longer depending on the system state, how many
> VMs are running and how many vCPUs are active on the VM. There are
> absolutely necessary use-cases where you want to keep the VM paused
> after a CR3 event is received and exit vm_event afterwards. This bug
> totally blocks those use-cases. Not to mention that it's a total mess
> having an interface where the user has to guess when it's safe to do
> something. If this was pointed out when the patch was made I would have
> objected to that being merged.

No, we didn't know about the issue. It's apparently not my most eloquent
evening. I was merely saying that I did understand what the issue was
from your description, and offering an explanation of why we've never
seen this in QA or production. Of course the issue would have been
mentioned at least (but ideally not exist to begin with) had it been
known.

We do take several passes through the ring buffer (and as a side-effect
reduce the probability of a race occurring to almost zero), but we do
that to make sure we've cleaned up all EPT vm_events; the fact that it
has helped with _this_ issue is a coincidence. IIRC Windows, at least,
will be upset if a VCPU is stuck for too long.

As for how the vm_event system behaves:

1. A security application that is unable to _prevent_ things being done
to a system is not doing a very good job, since in that case you can
only collect stats and not veto anything. I would argue that the default
for such a monitoring application should be the current one (all events
should be pre-action).

2. This is further supported by the fact that that's exactly how the EPT
vm_events work: you get an "I want to write to this page" event _before_
the write occurs. If register writes behave differently, you have _two_
different models: one where you get an event post-action, and one where
you get one pre-action.

Hope that's clearer,
Razvan
Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events
On 6/9/20 1:50 AM, Tamas K Lengyel wrote:
> On Mon, Jun 8, 2020 at 3:16 PM Razvan Cojocaru wrote:
>> 1. A security application that is unable to _prevent_ things being done
>> to a system is not doing a very good job, since in that case you can
>> only collect stats and not veto anything. I would argue that the default
>> for such a monitoring application should be the current one (all events
>> should be pre-action).
>
> Not all security applications require this though. Malware analysis
> where stealth is required would absolutely not want this side-effect
> to be visible to the guest, where malware could use it to determine
> that it's being monitored. So I don't buy into this argument.

Fair enough; in that case, having both models supported should be fine.
I'll leave the rest of that conversation to my colleagues.

>> 2. This is further supported by the fact that that's exactly how the EPT
>> vm_events work: you get a "I want to write to this page" event _before_
>> the write occurs. If register writes behave differently, you have _two_
>> different models: one where you get an event post-action, and one where
>> you get one pre-action.
>
> Whether you get an event before or after the effects of the event have
> been applied to the system state shouldn't matter as long as you can
> revert that action. I wouldn't care either way what the default is. But
> having a default that breaks other use-cases is unacceptable.

You keep saying that as if I disagree. :) But we've already established
that the potential for a race condition has been found and needs to be
fixed. My only (minor) objection has been that a patch fixing the
current model would have been preferable to one that switches the
default as a workaround. Still, it's understandable that perhaps there's
no time or motivation for that.

Thanks for the work,
Razvan
Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events
On 6/8/20 6:55 PM, Jan Beulich wrote:
> On 03.06.2020 17:07, Roger Pau Monné wrote:
>> On Wed, Jun 03, 2020 at 06:52:37AM -0600, Tamas K Lengyel wrote:
>>> For the last couple of years we have received numerous reports from
>>> users of monitor vm_events of spurious guest crashes when using
>>> events. In particular, it has been observed that the problem occurs
>>> when vm_events are being disabled. The nature of the guest crash
>>> varied widely and has only occurred occasionally. This made debugging
>>> the issue particularly hard. We had discussions about this issue even
>>> here on the xen-devel mailing list with no luck figuring it out.
>>>
>>> The bug has now been identified as a race condition between register
>>> event handling and disabling the monitor vm_event interface. The
>>> default behavior regarding emulation of register write events is
>>> changed so that they get postponed until the corresponding vm_event
>>> handler decides whether to allow such a write to take place.
>>> Unfortunately this can only be implemented by performing the
>>> deny/allow step when the vCPU gets scheduled.
>>>
>>> Due to that postponed emulation of the event, if the user decides to
>>> pause the VM in the vm_event handler and then disable events, the
>>> entire emulation step is skipped the next time the vCPU is resumed.
>>> Even if the user doesn't pause during the vm_event handling but exits
>>> immediately and disables vm_event, the situation becomes racy, as
>>> disabling vm_event may succeed before the guest's vCPUs get scheduled
>>> with the pending emulation task. This has been particularly the case
>>> with VMs that have several vCPUs, as after the VM is unpaused it may
>>> actually take a long time before all vCPUs get scheduled.
>>>
>>> In this patch we are reverting the default behavior to always perform
>>> emulation of register write events when the event occurs. Postponing
>>> them can be turned on as an option.
>>> In that case the user of the interface still has to take care of only
>>> disabling the interface when it's safe, as it remains buggy.
>>>
>>> Fixes: 96760e2fba10 ('vm_event: deny register writes if refused by
>>> vm_event reply').
>>>
>>> Signed-off-by: Tamas K Lengyel
>>
>> Thanks!
>>
>> Reviewed-by: Roger Pau Monné
>>
>> I would like to get some input from Bitdefender really, and whether
>> they are fine with this approach.

Hello,

Not really my call to make anymore, but I do have a few notes.

First, IIRC the problem stems from the initial choice to have the
vm_event data allocated on demand when first subscribing to events. The
proper solution (since this patch doesn't actually fix the problem),
IMHO, would be for the vm_event data to _always_ exist; then, instead of
relying on the value of its pointer to check whether there are event
subscribers, we could just check the emulation flags individually and
never miss a pending emulated something again. I did try to go that way
in the beginning, but it was reasonably objected that we should cut back
on using hypervisor memory unnecessarily, hence we got to this point.

Secondly, I see no reason why we couldn't adapt to the new default
behaviour, provided that the old behaviour continues to work _exactly_
as before.

And last but not least, the proper sequence is: 1. unsubscribe from
register write events, 2. process all events "still in the chamber"
(keep checking the ring buffer for a while), 3. detach from the guest
(disable the vm_event subsystem). Not ideal perhaps (in that it's not
guaranteed that a VCPU won't resume after a longer period than our
timeout), but if the sequence is followed there should be no guest hangs
or crashes (at least none that we or our clients have observed so far).

So, in short, I think there's a better fix for this: simply not
allocating the vm_event memory on demand anymore, and never having to
deal with lost pending emulations again. It should also decrease code
complexity by a tiny bit. Then again, as stated at the beginning of this
message, that's just a recommendation.

HTH,
Razvan
Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events
On 6/8/20 10:54 PM, Tamas K Lengyel wrote:
> On Mon, Jun 8, 2020 at 12:58 PM Razvan Cojocaru wrote:
>> And last but not least, the proper sequence is: 1. unsubscribe from
>> register write events, 2. process all events "still in the chamber"
>> (keep checking the ring buffer for a while), 3. detach from the guest
>> (disable the vm_event subsystem). Not ideal perhaps (in that it's not
>> guaranteed that a VCPU won't resume after a longer period than our
>> timeout), but if the sequence is followed there should be no guest hangs
>> or crashes (at least none that we or our clients have observed so far).
>
> Incorrect. That's not enough. You also have to wait for all the vCPUs
> to get scheduled before disabling vm_event or otherwise the emulation
> is skipped entirely. Please read the patch message. If the user
> decides to disable vm_event after getting a CR3 event delivered, the
> CR3 never gets updated and results in the guest crashing in
> unpredictable ways. Same happens with all the other registers.

I did read the patch message. As I've previously stated ("it's not
guaranteed that a VCPU won't resume after a longer period than our
timeout"), it's not ideal, and I've made no claim that the problem does
not exist or that it shouldn't be fixed - but really, if you've got a
VCPU that will wait more than a couple of seconds to get scheduled, then
you've got a bigger problem with the VM.

>> So in short, I think there's a better fix for this by simply not
>> allocating the vm_event memory on-demand anymore and never having to
>> deal with lost pending emulations again. It should also decrease code
>> complexity by a tiny bit. Then again, as stated at the beginning of
>> this message, that's just a recommendation.
>
> Since only you guys use this feature I'm going to wait for a fix.
> Until then, the default behavior should be restored so this buggy
> behavior doesn't affect anyone else. You can still turn it on, it's
> just not going to be on for vm_event by default. I don't particularly
> care what fix is there since only you guys use it. If you don't mind
> that there is this buggy behavior because you never disable vm_events
> once you activate it then that's that.

Indeed, our usual scenario is that vm_event is always on on the machine.
It's only rarely disabled while the guest is running, and when it is,
it's always after waiting sufficiently long that we've not seen any
unexplainable hung VMs.

Fair enough; as long as the previous behaviour is optionally available I
see no reason why this patch shouldn't make it in - though, of course,
as also previously stated, I'm just trying to shed as much light as
possible on this from what I remember, and what happens next is not my
call. My colleagues should be able to chime in tomorrow.

Cheers,
Razvan
Re: [Xen-devel] [PATCH] Remove myself as vm_event maintainer
On 12/10/19 11:57 AM, Jan Beulich wrote:
> On 08.12.2019 11:07, Razvan Cojocaru wrote:
>> ---
>>  MAINTAINERS | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 9c827ad759..012c847ebd 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -428,7 +428,6 @@ L: xen-devel@lists.xenproject.org
>>  F: unmodified_drivers/linux-2.6/
>>
>>  VM EVENT, MEM ACCESS and MONITOR
>> -M: Razvan Cojocaru
>>  M: Tamas K Lengyel
>>  R: Alexandru Isaila
>>  R: Petre Pircalabu
>
> No matter the contents, I guess this still needs an S-o-b of yours.

Re-sent, sorry for the inconvenience.

Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH V2] Remove myself as vm_event maintainer
Signed-off-by: Razvan Cojocaru
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9c827ad759..012c847ebd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -428,7 +428,6 @@ L: xen-devel@lists.xenproject.org
 F: unmodified_drivers/linux-2.6/
 
 VM EVENT, MEM ACCESS and MONITOR
-M: Razvan Cojocaru
 M: Tamas K Lengyel
 R: Alexandru Isaila
 R: Petre Pircalabu
-- 
2.17.1
[Xen-devel] [PATCH] Remove myself as vm_event maintainer
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9c827ad759..012c847ebd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -428,7 +428,6 @@ L: xen-devel@lists.xenproject.org
 F: unmodified_drivers/linux-2.6/
 
 VM EVENT, MEM ACCESS and MONITOR
-M: Razvan Cojocaru
 M: Tamas K Lengyel
 R: Alexandru Isaila
 R: Petre Pircalabu
-- 
2.17.1
Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses
On 11/28/19 1:44 PM, Andrew Cooper wrote:
> c/s d0a699a389f1 "x86/monitor: add support for descriptor access events"
> introduced logic looking for what appeared to be exitinfo (not that this
> exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT
> vectoring information. There is never any IDT vectoring involved in
> these intercepts so the value passed is always zero.
>
> In fact, SVM doesn't provide any information, even in exitinfo1 and 2.
> Note the error in the public API and state that this field is always 0,
> and drop the SVM logic in hvm_monitor_descriptor_access().
>
> In the SVM vmexit handler itself, optimise the switch statement by
> observing that there is a linear transformation between the SVM
> exit_reason and VM_EVENT_DESC_* values. (Bloat-o-meter reports
> 6028 => 5877 for a saving of 151 bytes.)
>
> Signed-off-by: Andrew Cooper
> ---
> CC: Jan Beulich
> CC: Wei Liu
> CC: Roger Pau Monné
> CC: Razvan Cojocaru
> CC: Tamas K Lengyel
> CC: Alexandru Isaila
> CC: Petre Pircalabu
> CC: Adrian Pop
>
> Adrian: Do you recall what information you were attempting to forward
> from the VMCB? I can't locate anything which would plausibly be
> interesting.

I think it's safe to go the route you're going (you shouldn't break
anything).

Acked-by: Razvan Cojocaru

(with or without addressing Tamas' comments).

Thanks,
Razvan
Re: [Xen-devel] [PATCH v13] x86/emulate: Send vm_event from emulate
On 9/23/19 3:05 PM, Alexandru Stefan ISAILA wrote:
> A/D bit writes (on page walks) can be considered benign by an
> introspection agent, so receiving vm_events for them is a pessimization.
> We try here to optimize by filtering these events out.
>
> Currently, we are fully emulating the instruction at RIP when the
> hardware sees an EPT fault with npfec.kind != npfec_kind_with_gla. This
> is, however, incorrect, because the instruction at RIP might
> legitimately cause an EPT fault of its own while accessing a _different_
> page from the original one, where A/D were set.
>
> The solution is to perform the whole emulation, while ignoring EPT
> restrictions for the walk part, and taking them into account for the
> "actual" emulating of the instruction at RIP. When we send out a
> vm_event, we don't want the emulation to complete, since in that case we
> won't be able to veto whatever it is doing. That would mean that we
> can't actually prevent any malicious activity; instead we'd only be able
> to report on it. When we see a "send-vm_event" case while emulating, we
> need to first send the event out and then suspend the emulation (return
> X86EMUL_RETRY). After the emulation stops, we'll call
> hvm_vm_event_do_resume() again after the introspection agent treats the
> event and resumes the guest. There, the instruction at RIP will be fully
> emulated (with the EPT ignored) if the introspection application allows
> it, and the guest will continue to run past the instruction.
>
> A common example is if the hardware exits because of an EPT fault caused
> by a page walk, and p2m_mem_access_check() decides if it is going to
> send a vm_event. If the vm_event was sent and treated so that it runs
> the instruction at RIP, that instruction might also hit a protected page
> and provoke a vm_event.
>
> Now, if npfec.kind == npfec_kind_in_gpt and
> d->arch.monitor.inguest_pagefault_disabled is true, then we are in the
> page-walk case and we can do this emulation optimization: emulate the
> page walk while ignoring the EPT, but don't ignore the EPT for the
> emulation of the actual instruction.
>
> In the first case we would have 2 EPT events; in the second case we
> would have 1 EPT event, if the instruction at RIP triggers an EPT event.
>
> We use hvmemul_map_linear_addr() to intercept write access and
> __hvm_copy() to intercept exec, read and write access.
>
> A new return type was added, HVMTRANS_need_retry, in order to have all
> the places that consume HVMTRANS* return X86EMUL_RETRY.
>
> hvm_emulate_send_vm_event() can return false if there was no violation,
> or if there was an error from monitor_traps() or p2m_get_mem_access().
> -ESRCH from p2m_get_mem_access() is treated as restricted access.
>
> NOTE: hvm_emulate_send_vm_event() assumes the caller will enable/disable
> arch.vm_event->send_event.
>
> Signed-off-by: Alexandru Isaila

FWIW,

Acked-by: Razvan Cojocaru
Re: [Xen-devel] [PATCH] SVM: correct CPUID event processing
On 9/19/19 11:07 PM, Boris Ostrovsky wrote:
> On 9/19/19 6:37 AM, Jan Beulich wrote:
>> hvm_monitor_cpuid() expects the input registers, not two of the
>> outputs.
>>
>> However, once having made the necessary adjustment, the SVM and VMX
>> functions are so similar that they should be folded (thus avoiding
>> further similar asymmetries to get introduced). Use the best of both
>> worlds by e.g. using "curr" consistently. This then being the only
>> caller of hvm_check_cpuid_faulting(), fold in that function as well.
>>
>> Signed-off-by: Jan Beulich
>
> Reviewed-by: Boris Ostrovsky

Acked-by: Razvan Cojocaru

Thanks,
Razvan
Re: [Xen-devel] [PATCH v10] x86/emulate: Send vm_event from emulate
On 9/17/19 6:09 PM, Tamas K Lengyel wrote:
> On Tue, Sep 17, 2019 at 8:24 AM Razvan Cojocaru wrote:
>> On 9/17/19 5:11 PM, Alexandru Stefan ISAILA wrote:
>>>>>>> +bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn,
>>>>>>> +                           uint32_t pfec, uint16_t kind)
>>>>>>> +{
>>>>>>> +    xenmem_access_t access;
>>>>>>> +    vm_event_request_t req = {};
>>>>>>> +    paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
>>>>>>> +
>>>>>>> +    ASSERT(current->arch.vm_event->send_event);
>>>>>>> +
>>>>>>> +    current->arch.vm_event->send_event = false;
>>>>>>> +
>>>>>>> +    if ( p2m_get_mem_access(current->domain, gfn, &access,
>>>>>>> +                            altp2m_vcpu_idx(current)) != 0 )
>>>>>>> +        return false;
>>>>>>
>>>>>> ... next to the call here (but the maintainers of the file would
>>>>>> have to judge in the end). That said, I continue to not understand
>>>>>> why a not-found entry means unrestricted access. Isn't it
>>>>>> ->default_access which controls what such a "virtual" entry would
>>>>>> permit?
>>>>>
>>>>> I'm sorry for this misleading comment. The code states that if the
>>>>> entry was not found, the access will be default_access and return 0.
>>>>> So in this case the default_access will be checked.
>>>>>
>>>>>     /* If request to get default access. */
>>>>>     if ( gfn_eq(gfn, INVALID_GFN) )
>>>>>     {
>>>>>         *access = memaccess[p2m->default_access];
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>> If this clears things up I can remove the "NOTE" part of the
>>>>> comment.
>>>>
>>>> I'm afraid it doesn't clear things up: I'm still lost as to why
>>>> "entry not found" implies "full access". And I'm further lost as
>>>> to what the code fragment above (dealing with INVALID_GFN, but
>>>> not really the "entry not found" case, which would be INVALID_MFN
>>>> coming back from a translation) is supposed to tell me.
>>>
>>> It is safe enough to consider an invalid mfn from the hostp2m to be a
>>> violation. There is still a small problem with having the altp2m view
>>> not having the page propagated from the hostp2m. In this case we have
>>> to use altp2m_get_effective_entry().
>>
>> In the absence of clear guidance from the Intel SDM on what the
>> hardware default is for a page not present in the p2m, we should
>> probably follow Jan's advice and check violations against
>> default_access for such pages.
>
> The SDM is very clear that pages that are not present in the EPT are a
> violation:
>
> 28.2.2
> An EPT paging-structure entry is present if any of bits 2:0 is 1;
> otherwise, the entry is not present. The processor ignores bits 62:3
> and uses the entry neither to reference another EPT paging-structure
> entry nor to produce a physical address. A reference using a
> guest-physical address whose translation encounters an EPT
> paging-structure that is not present causes an EPT violation (see
> Section 28.2.3.2).
>
> 28.2.3.2 EPT Violations
> An EPT violation may occur during an access using a guest-physical
> address whose translation does not cause an EPT misconfiguration. An
> EPT violation occurs in any of the following situations:
> - Translation of the guest-physical address encounters an EPT
>   paging-structure entry that is not present (see Section 28.2.2).

Mystery solved. Thanks!
Re: [Xen-devel] [PATCH v10] x86/emulate: Send vm_event from emulate
On 9/17/19 5:11 PM, Alexandru Stefan ISAILA wrote:
>>>>> +bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn,
>>>>> +                           uint32_t pfec, uint16_t kind)
>>>>> +{
>>>>> +    xenmem_access_t access;
>>>>> +    vm_event_request_t req = {};
>>>>> +    paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
>>>>> +
>>>>> +    ASSERT(current->arch.vm_event->send_event);
>>>>> +
>>>>> +    current->arch.vm_event->send_event = false;
>>>>> +
>>>>> +    if ( p2m_get_mem_access(current->domain, gfn, &access,
>>>>> +                            altp2m_vcpu_idx(current)) != 0 )
>>>>> +        return false;
>>>>
>>>> ... next to the call here (but the maintainers of the file would
>>>> have to judge in the end). That said, I continue to not understand
>>>> why a not-found entry means unrestricted access. Isn't it
>>>> ->default_access which controls what such a "virtual" entry would
>>>> permit?
>>>
>>> I'm sorry for this misleading comment. The code states that if the
>>> entry was not found, the access will be default_access and return 0.
>>> So in this case the default_access will be checked.
>>>
>>>     /* If request to get default access. */
>>>     if ( gfn_eq(gfn, INVALID_GFN) )
>>>     {
>>>         *access = memaccess[p2m->default_access];
>>>         return 0;
>>>     }
>>>
>>> If this clears things up I can remove the "NOTE" part of the comment.
>>
>> I'm afraid it doesn't clear things up: I'm still lost as to why
>> "entry not found" implies "full access". And I'm further lost as
>> to what the code fragment above (dealing with INVALID_GFN, but
>> not really the "entry not found" case, which would be INVALID_MFN
>> coming back from a translation) is supposed to tell me.
>
> It is safe enough to consider an invalid mfn from the hostp2m to be a
> violation. There is still a small problem with having the altp2m view
> not having the page propagated from the hostp2m. In this case we have
> to use altp2m_get_effective_entry().

In the absence of clear guidance from the Intel SDM on what the hardware
default is for a page not present in the p2m, we should probably follow
Jan's advice and check violations against default_access for such pages.

There are indeed - as discussed privately - two cases for an altp2m,
though:

1. Page not found in the altp2m, but set in the hostp2m - in which case
I suggest that we take the hostp2m value into account (with or without
propagation to the altp2m; I see no harm in propagating the entry, but
others may see something I'm missing).

2. Page not found in both altp2m and hostp2m - in which case we should
probably check against default_access.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v9] x86/emulate: Send vm_event from emulate
On 9/11/19 2:41 PM, Jan Beulich wrote:
> On 11.09.2019 13:21, Razvan Cojocaru wrote:
>> On 9/11/19 1:39 PM, Alexandru Stefan ISAILA wrote:
>>> On 11.09.2019 12:57, Jan Beulich wrote:
>>>> On 09.09.2019 17:35, Alexandru Stefan ISAILA wrote:
>>>>> +/*
>>>>> + * Send memory access vm_events based on pfec. Returns true if the
>>>>> + * event was sent and false for p2m_get_mem_access() error, no
>>>>> + * violation and event send error. Assumes the caller will check
>>>>> + * arch.vm_event->send_event.
>>>>> + *
>>>>> + * NOTE: p2m_get_mem_access() can fail if the entry was not found
>>>>> + * in the EPT (in which case access to it is unrestricted, so no
>>>>> + * violations can occur). In this cases it is fine to continue the
>>>>> + * emulation.
>>>>> + */
>>>>> +bool hvm_monitor_check_ept(unsigned long gla, gfn_t gfn,
>>>>> +                           uint32_t pfec, uint16_t kind)
>>>>
>>>> Why did you choose to have "ept" in the name and also mention it
>>>> in the commit? Is there anything in here which isn't generic p2m?
>>>
>>> The name was suggested by Razvan Cojocaru. I have no preference in
>>> the name. Maybe Tamas can suggest a good one.
>>
>> I've suggested "ept" in the name because "regular" emulation ignores
>> it, and this function takes it into account, hence the "check_ept"
>> (which I thought would be read together). It's fine to change it to
>> something else.
>
> Then "check_p2m" perhaps?

Sounds good to me.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v9] x86/emulate: Send vm_event from emulate
On 9/11/19 1:39 PM, Alexandru Stefan ISAILA wrote:
> On 11.09.2019 12:57, Jan Beulich wrote:
>> On 09.09.2019 17:35, Alexandru Stefan ISAILA wrote:
>>> +/*
>>> + * Send memory access vm_events based on pfec. Returns true if the
>>> + * event was sent and false for p2m_get_mem_access() error, no
>>> + * violation and event send error. Assumes the caller will check
>>> + * arch.vm_event->send_event.
>>> + *
>>> + * NOTE: p2m_get_mem_access() can fail if the entry was not found
>>> + * in the EPT (in which case access to it is unrestricted, so no
>>> + * violations can occur). In this cases it is fine to continue the
>>> + * emulation.
>>> + */
>>> +bool hvm_monitor_check_ept(unsigned long gla, gfn_t gfn,
>>> +                           uint32_t pfec, uint16_t kind)
>>
>> Why did you choose to have "ept" in the name and also mention it
>> in the commit? Is there anything in here which isn't generic p2m?
>
> The name was suggested by Razvan Cojocaru. I have no preference in the
> name. Maybe Tamas can suggest a good one.

I've suggested "ept" in the name because "regular" emulation ignores it,
and this function takes it into account, hence the "check_ept" (which I
thought would be read together). It's fine to change it to something
else.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v6 08/10] remove late (on-demand) construction of IOMMU page tables
On 8/16/19 8:19 PM, Paul Durrant wrote:
> Now that there is a per-domain IOMMU enable flag, which should be
> enabled if any device is going to be passed through, stop deferring page
> table construction until the assignment is done. Also don't tear down
> the tables again when the last device is de-assigned; defer that task
> until domain destruction.
>
> This allows the has_iommu_pt() helper and iommu_status enumeration to
> be removed. Calls to has_iommu_pt() are simply replaced by calls to
> is_iommu_enabled(). Remaining open-coded tests of iommu_hap_pt_share
> can also be replaced by calls to iommu_use_hap_pt().
> The arch_iommu_populate_page_table() and iommu_construct() functions
> become redundant, as does the 'strict mode' dom0 page_list mapping code
> in iommu_hwdom_init(), and iommu_teardown() can be made static, as its
> only remaining caller, iommu_domain_destroy(), is within the same
> source module.
>
> All in all, about 220 lines of code are removed.
>
> NOTE: This patch will cause a small amount of extra resource to be used
>       to accommodate IOMMU page tables that may never be used, since
>       the per-domain IOMMU enable flag is currently set to the value of
>       the global iommu_enable flag. A subsequent patch will add an
>       option to the toolstack to allow it to be turned off if there is
>       no intention to assign passthrough hardware to the domain.

This has slipped under my radar, sorry.

Acked-by: Razvan Cojocaru

Thanks,
Razvan
Re: [Xen-devel] [PATCH v7] x86/emulate: Send vm_event from emulate
On 7/19/19 4:38 PM, Jan Beulich wrote: On 19.07.2019 15:30, Razvan Cojocaru wrote: On 7/19/19 4:18 PM, Jan Beulich wrote: On 19.07.2019 14:34, Alexandru Stefan ISAILA wrote: On 18.07.2019 15:58, Jan Beulich wrote: On 03.07.2019 12:56, Alexandru Stefan ISAILA wrote: A/D bit writes (on page walks) can be considered benign by an introspection agent, so receiving vm_events for them is a pessimization. We try here to optimize by fitering these events out. But you add the sending of more events - how does "filter out" match the actual implementation? The events are send only if there is a mem access violation therefore we are filtering and only sending the events that are interesting to introspection. Where is it that you prevent any event from being sent? As said, reading the patch I only see new sending sites to get added. If we don't emulate, we would receive the page-walk-generated events _and_ the touching-the-page-the-instruction-is-touching events. Since the patch here alters emulation paths only, how do you know whether to emulate? In order to not receive undue events it would seem to me that you'd first have to intercept the guest on insns of interest ... Overall I think that the patch description, while it has improved, is still lacking sufficient information for a person like me (not knowing much about your monitor tools) to be able to sensibly review this (which includes understanding the precise scenario you want to improve). If the hardware exits because of an EPT fault caused by a page walk, we end up in p2m_mem_access_check(), at which point we need to decide if we want to send out a vm_event or not. If we were to send out this vm_event, and it would then be magically treated so that we get to actually run the instruction at RIP, said instruction might also hit a protected page and provoke a vm_event. 
Now, if npfec.kind != npfec_kind_with_gla, then we're in the page walk case, and so in this case only, and only if d->arch.monitor.inguest_pagefault_disabled is true, we would choose to do this emulation trick: emulate _the_page_walk_ while ignoring the EPT, but don't ignore the EPT for the emulation of the actual instruction. So where in the first case we would have 2 EPT events, in the second we only have one (or if the instruction at RIP does not trigger an EPT event, we would have 1 event in the first case, and none in the second). Hence the filtering mentioned. So to answer your question: "how do you know whether to emulate", we do so only if npfec.kind != npfec_kind_with_gla && d->arch.monitor.inguest_pagefault_disabled. I hope this clears it up somewhat. Thanks, Razvan
Re: [Xen-devel] [PATCH v7] x86/emulate: Send vm_event from emulate
On 7/19/19 4:18 PM, Jan Beulich wrote: On 19.07.2019 14:34, Alexandru Stefan ISAILA wrote: On 18.07.2019 15:58, Jan Beulich wrote: On 03.07.2019 12:56, Alexandru Stefan ISAILA wrote: A/D bit writes (on page walks) can be considered benign by an introspection agent, so receiving vm_events for them is a pessimization. We try here to optimize by filtering these events out. But you add the sending of more events - how does "filter out" match the actual implementation? The events are sent only if there is a mem access violation, therefore we are filtering and only sending the events that are interesting to introspection. Where is it that you prevent any event from being sent? As said, reading the patch I only see new sending sites to get added. If we don't emulate, we would receive the page-walk-generated events _and_ the touching-the-page-the-instruction-is-touching events. In comparison to the "hardware" case, the case where we emulate the instruction with the page walk ignoring the EPT results in fewer events, hence the prevention of some events being sent out. Thanks, Razvan
Re: [Xen-devel] [PATCH v5] x86/emulate: Send vm_event from emulate
On 7/1/19 5:45 PM, Alexandru Stefan ISAILA wrote: On 01.07.2019 16:13, Jan Beulich wrote: On 04.06.19 at 13:49, wrote: This patch aims to have mem access vm events sent from the emulator. This is useful where we want to only emulate a page walk without checking the EPT, but we still want to check the EPT when emulating the instruction that caused the page walk. In this case, the original EPT fault is caused by the walk trying to set the accessed or dirty bits, but executing the instruction itself might also cause an EPT fault if permitted to run, and this second fault should not be lost. I'm afraid I still can't translate this into what exactly is wanted and why. While typically we don't use examples to demonstrate what is wanted in commit messages, I think this is a rather good candidate for actually using such an approach. This may then ... We use hvmemul_map_linear_addr() to intercept r/w access and __hvm_copy() to intercept exec access. First we try to send a vm event and if the event is sent then emulation returns X86EMUL_RETRY in order to stop emulation on instructions that access protected pages. If the event is not sent then the emulation goes on as expected. ... also help understanding this part, which I continue to be confused by, too. Simply put, the introspection agent wants to treat all A/D bit writes as benign. Receiving vm_events about them is a big pessimization, and we want to optimize. We'd like to filter these events out. To this end, we're currently (in the present incarnation of the Xen code) fully emulating the instruction at RIP when the hardware sees an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however, incorrect, because the instruction at RIP might legitimately cause an EPT fault of its own (while accessing a _different_ page from the original one, where A/D were set).
We've tried to solve this by actually writing the A/D bits and returning from p2m_mem_access_check(), however that has led to a slightly philosophical discussion about the proper way to achieve our goals while taking into account speculative setting of these bits. The issues raised have not been satisfactorily answered in an authoritative manner to this day. So the only option we seem to have left at this point is to perform the whole emulation, _while_ ignoring EPT restrictions for the walk part, and taking them into account for the "actual" emulating of the instruction @ RIP. If we're sending out a vm_event, then we don't want the emulation to complete - since in that case we won't be able to veto whatever it is doing. That would mean that we can't actually prevent any malicious activity that happens here, instead we'd only be able to report on it. So when we see a "send-vm_event" case while emulating, we need to do two things: 1. send the event out; 2. _don't_ complete the emulation (return X86EMUL_RETRY). When 2. happens, we'll hit hvm_vm_event_do_resume() again _after_ the introspection agent treats the event and resumes the guest. There, the instruction at RIP will be fully emulated (with the EPT ignored) if the introspection application allows it, and the guest will continue to run past the instruction. That's our plan, anyway. Hopefully we're not going in a wrong direction about it (all help is very much appreciated). I hope this has made things clearer. We're not after anything out of the ordinary or fitting a niche case - we just want to filter out vm_events caused by page walks in a manner that remains safe for the guest OS user. Thanks, Razvan
Re: [Xen-devel] [PATCH RESEND] MAINTAINERS: hand over vm_event maintainership
On 6/21/19 6:15 PM, Jan Beulich wrote: On 13.06.19 at 17:11, wrote: On Thu, Jun 13, 2019 at 8:01 AM Razvan Cojocaru wrote: Remove myself as vm_event maintainer, add Alexandru and Petre as Bitdefender vm_event maintainers. Signed-off-by: Razvan Cojocaru Acked-by: Tamas K Lengyel I'll take the liberty and apply this to the revised (designated reviewer) additions. If you mean removing my email address from the maintainers list, that's fine with me, unless (as previously discussed in private with George and Tamas) my stepping down at this point would put undue stress on the x86 / mm maintainers, in which case I can continue helping through the transition phase (although unfortunately much less promptly than before). Thanks, Razvan
Re: [Xen-devel] [PATCH] MAINTAINERS: Add myself as reviewer for vm_event
On 6/20/19 10:56 AM, Roger Pau Monné wrote: On Thu, Jun 20, 2019 at 10:17:46AM +0300, Petre Pircalabu wrote: Signed-off-by: Petre Pircalabu --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index ab32e7f..0151625 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -412,6 +412,7 @@ F: unmodified_drivers/linux-2.6/ VM EVENT, MEM ACCESS and MONITOR M:Razvan Cojocaru +R: Petre Pircalabu M:Tamas K Lengyel I would place the addition after the list of M(aintainers). Since this and Alexandru's are trivial patches and whatever patch ends up upstream first will possibly require a modification of the other + resend, could this be possibly fixed on commit (i.e. move both reviewers after the maintainers lines)? Thanks, Razvan
Re: [Xen-devel] [PATCH] MAINTAINERS: Add myself as reviewer for vm_event
On 6/20/19 10:17 AM, Petre Pircalabu wrote: Signed-off-by: Petre Pircalabu --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index ab32e7f..0151625 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -412,6 +412,7 @@ F: unmodified_drivers/linux-2.6/ VM EVENT, MEM ACCESS and MONITOR M:Razvan Cojocaru +R: Petre Pircalabu M:Tamas K Lengyel S:Supported F:tools/tests/xen-access Acked-by: Razvan Cojocaru
Re: [Xen-devel] [PATCH] MAINTAINERS: Add myself as a Designated reviewer to vm_event
On 6/19/19 6:02 PM, Alexandru Stefan ISAILA wrote: Signed-off-by: Alexandru Isaila --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index ab32e7f409..78e35012e0 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -412,6 +412,7 @@ F: unmodified_drivers/linux-2.6/ VM EVENT, MEM ACCESS and MONITOR M:Razvan Cojocaru +R: Alexandru Isaila M:Tamas K Lengyel S:Supported F:tools/tests/xen-access Acked-by: Razvan Cojocaru
Re: [Xen-devel] [RFC PATCH 2/2] x86/mm: Add mem access rights to NPT
On 6/13/19 6:19 PM, Tamas Lengyel wrote: On Wed, Sep 26, 2018 at 10:49 AM George Dunlap wrote: From: Isaila Alexandru This patch adds access control for NPT mode. There aren’t enough extra bits to store the access rights in the NPT p2m table, so we add a radix tree to store extra information. For efficiency: - Only allocate this radix tree when we first store "non-default" extra information - Remove entries which match the default extra information rather than continuing to store them - For superpages, only store an entry for the first gfn in the superpage. Use the order of the p2m entry being read to determine the proper place to look in the radix table. Modify p2m_type_to_flags() to accept and interpret an access value, parallel to the ept code. Add a set_default_access() method to the p2m-pt and p2m-ept versions of the p2m rather than setting it directly, to deal with different default permitted access values. Signed-off-by: Alexandru Isaila Signed-off-by: George Dunlap The mem_access/monitor bits are fairly trivial: Acked-by: Tamas K Lengyel --- NB, this is compile-tested only. Are you planning to do some actual testing? I would highly recommend that we see real test results before this is merged to verify functionality. We did do some testing with xen-access at the time, but limited testing with the actual full-blown introspection agent (because not all the needed pieces align yet). Things did appear to work as intended. Thanks, Razvan
[Xen-devel] [PATCH RESEND] MAINTAINERS: hand over vm_event maintainership
Remove myself as vm_event maintainer, add Alexandru and Petre as Bitdefender vm_event maintainers. Signed-off-by: Razvan Cojocaru --- Re-sent to add Tamas', Petre's and Alexandru's addresses. --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 6fbdc2b..d60e3a5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -405,7 +405,8 @@ L: xen-devel@lists.xenproject.org F: unmodified_drivers/linux-2.6/ VM EVENT, MEM ACCESS and MONITOR -M: Razvan Cojocaru +M: Alexandru Isaila +M: Petre Pircalabu M: Tamas K Lengyel S: Supported F: tools/tests/xen-access -- 2.7.4
Re: [Xen-devel] [PATCH] MAINTAINERS: hand over vm_event maintainership
On 6/12/19 7:02 PM, Razvan Cojocaru wrote: Remove myself as vm_event maintainer, add Alexandru and Petre as Bitdefender vm_event maintainers. Signed-off-by: Razvan Cojocaru --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 6fbdc2b..d60e3a5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -405,7 +405,8 @@ L: xen-devel@lists.xenproject.org F:unmodified_drivers/linux-2.6/ VM EVENT, MEM ACCESS and MONITOR -M: Razvan Cojocaru +M: Alexandru Isaila +M: Petre Pircalabu M:Tamas K Lengyel S:Supported F:tools/tests/xen-access I'm not sure why get-maintainers.pl did not add Tamas' email address (added now). I'll still be in Bitdefender, subscribed to xen-devel and following the project, but I'll be quite a bit more involved in other projects and that makes being a maintainer difficult. Thanks, Razvan
[Xen-devel] [PATCH] MAINTAINERS: hand over vm_event maintainership
Remove myself as vm_event maintainer, add Alexandru and Petre as Bitdefender vm_event maintainers. Signed-off-by: Razvan Cojocaru --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 6fbdc2b..d60e3a5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -405,7 +405,8 @@ L: xen-devel@lists.xenproject.org F: unmodified_drivers/linux-2.6/ VM EVENT, MEM ACCESS and MONITOR -M: Razvan Cojocaru +M: Alexandru Isaila +M: Petre Pircalabu M: Tamas K Lengyel S: Supported F: tools/tests/xen-access -- 2.7.4
Re: [Xen-devel] [PATCH v5 4/4] x86/mem_sharing: compile mem_sharing subsystem only when kconfig is enabled
On 6/3/19 7:38 PM, Tamas K Lengyel wrote: On Mon, Jun 3, 2019 at 2:26 AM Jan Beulich wrote: On 16.05.19 at 23:37, wrote: Disable it by default as it is only an experimental subsystem. Signed-off-by: Tamas K Lengyel Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: Roger Pau Monne Cc: George Dunlap Cc: Ian Jackson Cc: Julien Grall Cc: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Tim Deegan Cc: George Dunlap v4: add ASSERT_UNREACHABLE to inlined functions where applicable & other fixups --- xen/arch/x86/Kconfig | 6 +- xen/arch/x86/domain.c | 2 ++ xen/arch/x86/domctl.c | 2 ++ xen/arch/x86/mm/Makefile | 2 +- xen/arch/x86/x86_64/compat/mm.c | 2 ++ xen/arch/x86/x86_64/mm.c | 2 ++ xen/common/Kconfig| 3 --- xen/common/domain.c | 2 +- xen/common/grant_table.c | 2 +- xen/common/memory.c | 2 +- xen/common/vm_event.c | 6 +++--- xen/include/asm-x86/mem_sharing.h | 28 xen/include/asm-x86/mm.h | 3 +++ xen/include/xen/sched.h | 2 +- xen/include/xsm/dummy.h | 2 +- xen/include/xsm/xsm.h | 4 ++-- xen/xsm/dummy.c | 2 +- xen/xsm/flask/hooks.c | 4 ++-- 18 files changed, 58 insertions(+), 18 deletions(-) Daniel, it looks like you weren't Cc-ed here, but your ack is needed. Indeed, I also seem to have missed CC-ing Razvan (fixed now). Acked-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH 4/5] xen/vm-event: Fix interactions with the vcpu list
On 6/3/19 3:25 PM, Andrew Cooper wrote: vm_event_resume() should use domain_vcpu(), rather than opencoding it without its Spectre v1 safety. vm_event_wake_blocked() can't ever be invoked in a case where d->vcpu is NULL, so drop the outer if() and reindent, fixing up style issues. The comment, which is left alone, is false. This algorithm still has starvation issues when there is an asymmetric rate of generated events. However, the existing logic is sufficiently complicated and fragile that I don't think I've followed it fully, and because we're trying to obsolete this interface, the safest course of action is to leave it alone, rather than to end up making things subtly different. Therefore, no practical change that callers would notice. Signed-off-by: Andrew Cooper Acked-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH 3/5] xen/vm-event: Remove unnecessary vm_event_domain indirection
On 6/3/19 3:25 PM, Andrew Cooper wrote: The use of (*ved)-> leads to poor code generation, as the compiler can't assume the pointer hasn't changed, and results in hard-to-follow code. For both vm_event_{en,dis}able(), rename the ved parameter to p_ved, and work primarily with a local ved pointer. This has a key advantage in vm_event_enable(), in that the partially constructed vm_event_domain only becomes globally visible once it is fully constructed. As a consequence, the spinlock doesn't need holding. Furthermore, rearrange the order of operations to be more sensible. Check for repeated enables and a bad HVM_PARAM before allocating memory, and gather the trivial setup into one place, dropping the redundant zeroing. No practical change that callers will notice. Signed-off-by: Andrew Cooper Acked-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH 5/5] xen/vm-event: Misc fixups
On 6/3/19 3:25 PM, Andrew Cooper wrote: * Drop redundant brackets, and inline qualifiers. * Insert newlines and spaces where appropriate. * Drop redundant NDEBUG - gdprintk() is already conditional. Fix the logging level, as gdprintk() already prefixes the guest marker. No functional change. Signed-off-by: Andrew Cooper --- CC: Razvan Cojocaru CC: Tamas K Lengyel CC: Petre Pircalabu --- xen/common/vm_event.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c index 72f42b4..e872680 100644 --- a/xen/common/vm_event.c +++ b/xen/common/vm_event.c @@ -102,6 +102,7 @@ static int vm_event_enable( static unsigned int vm_event_ring_available(struct vm_event_domain *ved) { int avail_req = RING_FREE_REQUESTS(&ved->front_ring); + avail_req -= ved->target_producers; avail_req -= ved->foreign_producers; @@ -168,7 +169,7 @@ static void vm_event_wake_queued(struct domain *d, struct vm_event_domain *ved) */ void vm_event_wake(struct domain *d, struct vm_event_domain *ved) { -if (!list_empty(&ved->wq.list)) +if ( !list_empty(&ved->wq.list) ) vm_event_wake_queued(d, ved); else vm_event_wake_blocked(d, ved); @@ -216,8 +217,8 @@ static int vm_event_disable(struct domain *d, struct vm_event_domain **p_ved) return 0; } -static inline void vm_event_release_slot(struct domain *d, - struct vm_event_domain *ved) +static void vm_event_release_slot(struct domain *d, + struct vm_event_domain *ved) But inline is still asking the compiler to try and generate code that doesn't end up CALLing an actual function, so is it really redundant here? I do realize that for most cases the compiler will have its way with this code anyway - especially since the function is static - but "static" is not guaranteed to also mean "inline", is it? In any case, Acked-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH 2/5] xen/vm-event: Expand vm_event_* spinlock macros and rename the lock
On 6/3/19 3:25 PM, Andrew Cooper wrote: These serve no purpose, but to add to the cognitive load of following the code. Remove the level of indirection. Furthermore, the lock protects all data in vm_event_domain, making ring_lock a poor choice of name. For vm_event_get_response() and vm_event_grab_slot(), fold the exit paths to have a single unlock, as the compiler can't make this optimisation itself. No functional change. Signed-off-by: Andrew Cooper Acked-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH 1/5] xen/vm-event: Drop unused u_domctl parameter from vm_event_domctl()
On 6/3/19 3:25 PM, Andrew Cooper wrote: This parameter isn't used at all. Furthermore, elide the copyback in failing cases, as it is only successful paths which generate data which needs sending back to the caller. Finally, drop a redundant d == NULL check, as that logic is all common at the beginning of do_domctl(). No functional change. Signed-off-by: Andrew Cooper Acked-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH v2] xen/vm_event: fix rc check for uninitialized ring
On 5/24/19 6:23 PM, Juergen Gross wrote: > vm_event_claim_slot() returns -EOPNOTSUPP for an uninitialized ring > since commit 15e4dd5e866b43bbc ("common/vm_event: Initialize vm_event > lists on domain creation"), but the callers test for -ENOSYS. > > Correct the callers. > > Signed-off-by: Juergen Gross > --- > V2: add blank line (Jan Beulich) > replace two further ENOSYS returns with EOPNOTSUPP (Razvan Cojocaru) Acked-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH] xen/vm_event: fix rc check for uninitialized ring
On 5/24/19 4:15 PM, Juergen Gross wrote: vm_event_claim_slot() returns -EOPNOTSUPP for an uninitialized ring since commit 15e4dd5e866b43bbc ("common/vm_event: Initialize vm_event lists on domain creation"), but the callers test for -ENOSYS. Correct the callers. Signed-off-by: Juergen Gross --- xen/arch/x86/mm/p2m.c | 2 +- xen/common/monitor.c | 2 +- xen/common/vm_event.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 57c5eeda91..8a9a1153a8 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -1705,7 +1705,7 @@ void p2m_mem_paging_populate(struct domain *d, unsigned long gfn_l) /* We're paging. There should be a ring */ int rc = vm_event_claim_slot(d, d->vm_event_paging); -if ( rc == -ENOSYS ) +if ( rc == -EOPNOTSUPP ) { gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring " "in place\n", d->domain_id, gfn_l); diff --git a/xen/common/monitor.c b/xen/common/monitor.c index cb5f37fdb2..d5c9ff1cbf 100644 --- a/xen/common/monitor.c +++ b/xen/common/monitor.c @@ -98,7 +98,7 @@ int monitor_traps(struct vcpu *v, bool sync, vm_event_request_t *req) { case 0: break; -case -ENOSYS: +case -EOPNOTSUPP: /* * If there was no ring to handle the event, then * simply continue executing normally. diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c index 6e68be47bc..7b4ebced16 100644 --- a/xen/common/vm_event.c +++ b/xen/common/vm_event.c @@ -513,7 +513,7 @@ bool vm_event_check_ring(struct vm_event_domain *ved) * this function will always return 0 for a guest. For a non-guest, we check * for space and return -EBUSY if the ring is not available. 
* - * Return codes: -ENOSYS: the ring is not yet configured + * Return codes: -EOPNOTSUPP: the ring is not yet configured * -EBUSY: the ring is busy * 0: a spot has been reserved * But vm_event_grab_slot() (which ends up being what vm_event_wait_slot() returns), still returns -ENOSYS: 463 static int vm_event_grab_slot(struct vm_event_domain *ved, int foreign) 464 { 465 unsigned int avail_req; 466 467 if ( !ved->ring_page ) 468 return -ENOSYS; Although we can't get to that part if vm_event_check_ring() returns false, we should probably return -EOPNOTSUPP from there as well. In fact, I wonder whether the remaining -ENOSYS returns in vm_event.c should also be replaced with -EOPNOTSUPP. Anyway, the change does clearly improve the code even without settling the above questions, so: Acked-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH v6] x86/altp2m: Aggregate get entry and populate into common funcs
On 5/14/19 6:42 PM, Jan Beulich wrote: On 23.04.19 at 16:30, wrote: --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -478,6 +478,43 @@ void p2m_unlock_and_tlb_flush(struct p2m_domain *p2m) mm_write_unlock(&p2m->lock); } +int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn, + p2m_type_t *t, p2m_access_t *a, + bool prepopulate) +{ +*mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL); + +/* Check host p2m if no valid entry in alternate */ +if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) ) +{ +struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain); +unsigned int page_order; +int rc; + +*mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a, + P2M_ALLOC | P2M_UNSHARE, &page_order, 0); + +rc = -ESRCH; +if ( !mfn_valid(*mfn) || *t != p2m_ram_rw ) +return rc; + +/* If this is a superpage, copy that first */ +if ( prepopulate && page_order != PAGE_ORDER_4K ) +{ +unsigned long mask = ~((1UL << page_order) - 1); +gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask); +mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask); + +rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, *t, *a, 1); +if ( rc ) +return rc; +} +} + +return 0; +} + + mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l, p2m_type_t *t, p2m_access_t *a, p2m_query_t q, unsigned int *page_order, bool_t locked) May I ask how the placement of this function was chosen? It looks as if all its callers live inside #ifdef CONFIG_HVM sections, just this function doesn't (reportedly causing build issues together with later changes). I believe it's just an oversight. I've sent out a patch that hopefully fixes this in a satisfactory manner. Thanks, Razvan
[Xen-devel] [PATCH] x86/altp2m: move altp2m_get_effective_entry() under CONFIG_HVM
All its callers live inside #ifdef CONFIG_HVM sections. Signed-off-by: Razvan Cojocaru --- xen/arch/x86/mm/p2m.c | 72 +++ xen/include/asm-x86/p2m.h | 2 ++ 2 files changed, 38 insertions(+), 36 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index cc6661e..57c5eed 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -478,42 +478,6 @@ void p2m_unlock_and_tlb_flush(struct p2m_domain *p2m) mm_write_unlock(&p2m->lock); } -int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn, - p2m_type_t *t, p2m_access_t *a, - bool prepopulate) -{ -*mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL); - -/* Check host p2m if no valid entry in alternate */ -if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) ) -{ -struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain); -unsigned int page_order; -int rc; - -*mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a, - P2M_ALLOC | P2M_UNSHARE, &page_order, 0); - -rc = -ESRCH; -if ( !mfn_valid(*mfn) || *t != p2m_ram_rw ) -return rc; - -/* If this is a superpage, copy that first */ -if ( prepopulate && page_order != PAGE_ORDER_4K ) -{ -unsigned long mask = ~((1UL << page_order) - 1); -gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask); -mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask); - -rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, *t, *a, 1); -if ( rc ) -return rc; -} -} - -return 0; -} - mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l, p2m_type_t *t, p2m_access_t *a, p2m_query_t q, unsigned int *page_order, bool_t locked) @@ -2378,6 +2342,42 @@ int unmap_mmio_regions(struct domain *d, #ifdef CONFIG_HVM +int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn, + p2m_type_t *t, p2m_access_t *a, + bool prepopulate) +{ +*mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL); + +/* Check host p2m if no valid entry in alternate */ +if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) ) +{ +struct p2m_domain *hp2m = 
p2m_get_hostp2m(ap2m->domain); +unsigned int page_order; +int rc; + +*mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a, + P2M_ALLOC | P2M_UNSHARE, &page_order, 0); + +rc = -ESRCH; +if ( !mfn_valid(*mfn) || *t != p2m_ram_rw ) +return rc; + +/* If this is a superpage, copy that first */ +if ( prepopulate && page_order != PAGE_ORDER_4K ) +{ +unsigned long mask = ~((1UL << page_order) - 1); +gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask); +mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask); + +rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, *t, *a, 1); +if ( rc ) +return rc; +} +} + +return 0; +} + void p2m_altp2m_check(struct vcpu *v, uint16_t idx) { if ( altp2m_active(v->domain) ) diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index 2d0bda1..7e71bf0 100644 --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -514,6 +514,7 @@ static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn) return mfn_x(mfn); } +#ifdef CONFIG_HVM #define AP2MGET_prepopulate true #define AP2MGET_query false @@ -525,6 +526,7 @@ static inline unsigned long mfn_to_gfn(struct domain *d, mfn_t mfn) int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn, p2m_type_t *t, p2m_access_t *a, bool prepopulate); +#endif /* Deadlock-avoidance scheme when calling get_gfn on different gfn's */ struct two_gfns { -- 2.7.4
Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults
On 5/14/19 5:16 PM, Jan Beulich wrote: On 14.05.19 at 15:47, wrote: Mem event emulation failed (5): d5v0 32bit @ 001b:6d96efff -> c5 f9 f5 05 c0 be ad 6d c5 e1 fe 1d a0 20 af 6d Looking at the source code, the emulator does appear to support vpmaddwd, however only for EVEX: http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/x86_emulate/x86_emulate.c;h=032995ea586aa7dd90a1953b6ded656436652049;hb=refs/heads/staging#l6696 whereas our fail case uses VEX. This may be in the works in the aforementioned series, but is legitimately unsupported in 4.13 staging. Hmm, interesting. The origin of the encoding is at MMX times, which means it's more than just VPMADDWD that's missing, and it's been an omission back in the MMX/SSE2 series then. That's a genuine oversight, and in the light of this I'd like to apologize for my unfriendly initial reaction. I'll see about getting this fixed. (It would have helped if you had shared the encoding right away, since the mnemonic and operands are now often insufficient.) No problem at all. Indeed, sharing the encoding would have cleared things up faster. Thanks, Razvan
Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults
On 5/13/19 5:15 PM, Razvan Cojocaru wrote: On 5/13/19 5:06 PM, Jan Beulich wrote: On 13.05.19 at 15:58, wrote: On 11/27/18 12:49 PM, Razvan Cojocaru wrote: With a sufficiently complete insn emulator, single-stepping should not be needed at all imo. Granted we're not quite there yet with the emulator, but we've made quite a bit of progress. As before, if there are particular instructions you know of that the emulator doesn't handle yet, please keep pointing these out. Last I know were some AVX move instructions, which have long been implemented. True, I haven't seen emulator issues in that respect with staging - the emulator appears lately to be sufficiently complete. Thank you very much for your help and support - we'll definitely point out unsupported instructions if we spot some again. We've come across a new instruction that the emulator can't handle in Xen 4.13-unstable today: vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20 Perhaps there are plans for this to go into the emulator as well? You're kidding? This is already in 4.12.0, and if it weren't I'm sure you're aware there are about 40 more AVX512 patches pending review. Right, I did indeed forget about the pending review part, for some reason I was sure they made it in. I've double-checked and we really are using 4.13-unstable - but we've also made changes to the emulator, working on the send-vm-events-from-the-emulator patch, so we'll revert to a pristine staging and retry, there's a chance this happens because of our changes. We'll find out what's going on exactly. I promised I'd return with more details.
After some debugging, it certainly looks like the emulator returns UNIMPLEMENTED (5): Mem event emulation failed (5): d5v0 32bit @ 001b:6d96efff -> c5 f9 f5 05 c0 be ad 6d c5 e1 fe 1d a0 20 af 6d Looking at the source code, the emulator does appear to support vpmaddwd, however only for EVEX: http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/x86_emulate/x86_emulate.c;h=032995ea586aa7dd90a1953b6ded656436652049;hb=refs/heads/staging#l6696 whereas our fail case uses VEX. This may be in the works in the aforementioned series, but is legitimately unsupported in 4.13 staging. Thanks, Razvan
Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs
On 5/13/19 7:18 PM, Mathieu Tarral wrote:
> On Friday, May 10, 2019 5:21 PM, Andrew Cooper wrote:
>
>> On 10/05/2019 16:17, Mathieu Tarral wrote:
>>
>>> On Thursday, May 9, 2019 6:42 PM, Andrew Cooper andrew.coop...@citrix.com wrote:
>>> Therefore, the conclusion to draw is that it is a logical bug somewhere.
>>> The bug is still here, so we can exclude a microcode issue.
>>
>> Good - that is one further angle excluded. Always make sure you are
>> running with up-to-date microcode, but it looks like we're back to
>> investigating a logical bug in libvmi or Xen.
>
> I played with tools/tests/xen-access this afternoon.
>
> The tool is working, I could intercept breakpoints, cpuid, write and exec mem
> accesses, etc.
>
> However, using altp2m-related intercepts sometimes leads to a guest crash:
>
> Windows 7 x64, 4 VCPUs
> - altp2m_write: crash
> - altp2m_exec: crash
> - altp2m_write_no_gpt: frozen
>
> Windows 7 x64, 1 VCPU
> - altp2m_write: crash
> - altp2m_exec: OK
> - altp2m_write_no_gpt: frozen
>
> "frozen" means that xen-access receives VMI events, but the guest is frozen
> until I decide to stop xen-access.
> I'm wondering what kind of exec events it received because they are not the
> same, so it's not looping over the same RIP over and over. (?)

I think you're simply tripping some OS timer because you're slowing the guest down in the crash case, and simply keep the guest too busy handling events in the "freeze" case. Remember that there's quite a delay in running each offending instruction: one vm_event saying you've got a violation, a reply saying "put this VCPU in single-step mode _and_ switch to the unrestricted EPT view", another vm_event saying "instruction executed", followed by another reply saying "switch back to the restricted EPT view _and_ take the VCPU out of single-step mode". Restricting the whole of the guest's memory (and so doing this dance for _every_ instruction causing a fault) is practically guaranteed to upset the OS.
A little EPT restricting goes a long way. Of course, if this could be improved so that even stress-tests (which is basically what xen-access is) leave the guest running smoothly, that'd be fantastic.

Razvan
Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults
On 5/13/19 5:06 PM, Jan Beulich wrote:
On 13.05.19 at 15:58, wrote:
On 11/27/18 12:49 PM, Razvan Cojocaru wrote:

With a sufficiently complete insn emulator, single-stepping should not be needed at all imo. Granted we're not quite there yet with the emulator, but we've made quite a bit of progress. As before, if there are particular instructions you know of that the emulator doesn't handle yet, please keep pointing these out. Last I know were some AVX move instructions, which have long been implemented.

True, I haven't seen emulator issues in that respect with staging - the emulator appears lately to be sufficiently complete. Thank you very much for your help and support - we'll definitely point out unsupported instructions if we spot some again. We've come across a new instruction that the emulator can't handle in Xen 4.13-unstable today:

vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20

Perhaps there are plans for this to go into the emulator as well?

You're kidding? This is already in 4.12.0, and if it weren't I'm sure you're aware there are about 40 more AVX512 patches pending review.

Right, I did indeed forget about the pending review part, for some reason I was sure they made it in. I've double-checked and we really are using 4.13-unstable - but we've also made changes to the emulator, working on the send-vm-events-from-the-emulator patch, so we'll revert to a pristine staging and retry, there's a chance this happens because of our changes. We'll find out what's going on exactly.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults
On 11/27/18 12:49 PM, Razvan Cojocaru wrote:

With a sufficiently complete insn emulator, single-stepping should not be needed at all imo. Granted we're not quite there yet with the emulator, but we've made quite a bit of progress. As before, if there are particular instructions you know of that the emulator doesn't handle yet, please keep pointing these out. Last I know were some AVX move instructions, which have long been implemented.

True, I haven't seen emulator issues in that respect with staging - the emulator appears lately to be sufficiently complete. Thank you very much for your help and support - we'll definitely point out unsupported instructions if we spot some again. We've come across a new instruction that the emulator can't handle in Xen 4.13-unstable today:

vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20

Perhaps there are plans for this to go into the emulator as well?

Thanks,
Razvan
Re: [Xen-devel] Altp2m use with PML can deadlock Xen
On 5/10/19 5:42 PM, Tamas K Lengyel wrote:
On Thu, May 9, 2019 at 10:19 AM Andrew Cooper wrote:
On 09/05/2019 14:38, Tamas K Lengyel wrote:

Hi all, I'm investigating an issue with altp2m that can easily be reproduced and leads to a hypervisor deadlock when PML is available in hardware. I haven't been able to trace down where the actual deadlock occurs. The problem seems to stem from hvm/vmx/vmcs.c:vmx_vcpu_flush_pml_buffer, which calls p2m_change_type_one on all gfns that were recorded in the PML buffer. The problem occurs when the PML buffer full vmexit happens while the active p2m is an altp2m. Switching p2m_change_type_one to work with the altp2m instead of the hostp2m however results in EPT misconfiguration crashes. Adding to the issue is that it seems to only occur when the altp2m has remapped GFNs. Since PML records entries based on GFN, this leads me to question whether it is safe at all to use PML when altp2m is used with GFN remapping. However, AFAICT the GFNs in the PML buffer are not the remapped GFNs, and my understanding is that it should be safe as long as the GFNs being tracked by PML are never the remapped GFNs. Booting Xen with ept=pml=0 resolves the issue. If anyone has any insight into what might be happening, please let me know.

I could have sworn that George spotted a problem here and fixed it. I shouldn't be surprised if we have more. The problem that PML introduced (and this is mostly my fault, as I suggested the buggy solution) is that the vmexit handler from one vcpu pauses others to drain the PML queue into the dirty bitmap. Overall I wasn't happy with the design and I've got some ideas to improve it, but within the scope of how altp2m was engineered, I proposed domain_pause_except_self(). As it turns out, that is vulnerable to deadlocks when you get two vcpus trying to pause each other and waiting for each other to become de-scheduled.

Makes sense.
I see this has been reused by the altp2m code, but it *should* be safe from deadlocks now that it takes the hypercall_deadlock_mutex.

Is that already in staging or your x86-next branch? I would like to verify that the problem is still present or not with that change. I tested with the Xen 4.12 release and that definitely still deadlocks.

I don't know if Andrew is talking about this patch (probably not, but it looks at least related):

http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=24d5282527f4647907b3572820b5335c15cd0356;hp=29d28b29190ba09d53ae7e475108def84e16e363

Since there's a "Release-acked" tag on it, I think it's in 4.12.

Thanks,
Razvan
Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?
On 5/10/19 2:03 AM, Tamas K Lengyel wrote: >> Either way, everything comes down to what behaviour is wanted to start with. > As I said, I think adding that monitoring capability is fine as long > as its limitation is clearly documented. Right. My thoughts as well.

Thanks,
Razvan
Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?
On 5/10/19 12:31 AM, Andrew Cooper wrote:
> What we'll have to do is end up in a position where we can have some
> real %dr settings given by the VMI agent, and some shadow %dr settings
> which the guest interacts with. Also I should warn you at this point
> that, because of how the registers work, it will not be possible to have
> guest-shadowed %dr functioning at the same time as VMI-provided %dr
> settings.
>
> I guess the main usecase here is simply hiding from the guest kernel
> that debugging activities are in use, and we are ok to break the real
> use of gdb/other inside the guest? Razvan/Tamas: As you're the
> maintainers, it is your call, ultimately.

What worries me here is that in that case it becomes easier for a rogue application inside the guest to figure out that the guest's being monitored, if I understand things correctly. Of course, a dom0 introspection agent may choose to simply not subscribe to DR events, and thus not alter the current flow at all, which makes things better.

Thanks,
Razvan
Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?
On 5/9/19 11:57 PM, Mathieu Tarral wrote:
> Hi,
>
> following a previous conversation, I would like to catch MOV-TO-DRx VMI
> events to prevent the guest from disabling my hardware breakpoints.
>
> @Tamas pointed me to this header:
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/vm_event.h;h=b2bafc0d77f9758e42b0d53c05a7e6bb86c86686;hb=HEAD#l154
>
> And, as far as I can tell, I have to
> - add a new REASON
> #define VM_EVENT_REASON_WRITE_DEBUGREG 15
>
> - add a new struct
> struct vm_event_write_debugreg {
> uint32_t index;
> uint32_t _pad;
> uint64_t new_value;
> uint64_t old_value;
> };
>
> - insert it into the vm_event_st union
>
> Can you give me more pointers and guidance on how to implement this in Xen?

You probably want to change the write_debugreg() macro into a function that does what's currently being done + send out the vm_event. You also probably need to think about whether you want the write to be preemptable or not (I'm guessing you don't, in which case it's all simpler).

> I have never submitted a patch, nor looked into Xen source code.

https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches

> Should we create a single event for MOV-TO-REGx VMI events?
> It would avoid copy-pasting and duplicating code.

I don't understand this question.
Re: [Xen-devel] [PATCH v3] x86/vm_event: add gdtr_base to the vm_event structure
On 5/3/19 11:09 AM, Jan Beulich wrote: On 03.05.19 at 00:54, wrote: Receiving this register is useful for introspecting 32-bit Windows when the event being trapped happened while in ring3. Signed-off-by: Tamas K Lengyel Cc: Razvan Cojocaru Cc: Tamas K Lengyel Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: Roger Pau Monne --- v2: add gdtr limit v3: use uint32_t to fit the 20 bits As per Andrew's response I think v2 is it. Yes, please. This will also allow us to reuse the existing (remaining) pad bits in the future for another limit (for idtr, perhaps).

Thanks,
Razvan
Re: [Xen-devel] [PATCH v2] x86/vm_event: add gdtr_base to the vm_event structure
On 5/3/19 12:42 AM, Tamas K Lengyel wrote: > Receiving this register is useful for introspecting 32-bit Windows when the > event being trapped happened while in ring3. > > Signed-off-by: Tamas K Lengyel > Cc: Razvan Cojocaru > Cc: Tamas K Lengyel > Cc: Jan Beulich > Cc: Andrew Cooper > Cc: Wei Liu > Cc: Roger Pau Monne > --- > v2: add gdtr limit

Acked-by: Razvan Cojocaru

Thanks,
Razvan
Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure
On 5/2/19 4:09 PM, Tamas K Lengyel wrote:
On Thu, May 2, 2019 at 2:57 AM Jan Beulich wrote:
On 02.05.19 at 10:25, wrote:
On 5/2/19 2:57 AM, Tamas K Lengyel wrote:

Receiving this register is useful for introspecting 32-bit Windows when the event being trapped happened while in ring3. Signed-off-by: Tamas K Lengyel Cc: Razvan Cojocaru Cc: Tamas K Lengyel Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: Roger Pau Monne
---
xen/arch/x86/vm_event.c | 5 +
xen/include/public/vm_event.h | 3 ++-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 51c3493b1d..873788e32f 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum x86_segment segment,
         reg->es_sel = seg.sel;
         break;
+    case x86_seg_gdtr:
+        reg->gdtr_base = seg.base;
+        break;

Please also add limit, ar, sel, like the others do.

There's no ar or sel for GDT (and IDT). Instead, because of ...

In addition to making this modification look less strange (since, in contrast to the function name, nothing is packed for gdtr_base), it will also save future work for applications wanting to use gdtr which also require backwards compatibility with previous vm_event versions. As you know, for each such modification we need to have a separate vm_event_vX header in our applications so that we're ready to create a ring buffer using requests and replies of the right size, and also to be able to properly interpret the ring buffer data, so the less frequent the changes to the vm_event struct, the better.

... this I wonder whether the IDT details shouldn't get added at the same time.

The churn of the header is not that big of a deal - I do the same in LibVMI and it's a largely copy-paste job that takes perhaps ~5m to add (see https://github.com/tklengyel/libvmi/commit/7509ce56d0408dbec4e374b8780da69a7bd212e8). So that should not be a factor in deciding whether we add a needed extra piece or not.
Since the vm_event ABI is changing for Xen 4.13, the ABI version will only be bumped once for 4.13. So we should be able to add/remove/shuffle whatever we want while the 4.13 merge window is open.

That said, I don't have a use for idt and gdtr_limit that warrants having to receive it via the vm_event structure - those pieces are always just a hypercall away if needed. So in the vm_event structure I tend to just put the registers needed often enough to warrant avoiding that extra hypercall. But if Razvan says he has a use for them, I can add them here.

We do use them both - idtr and gdtr, both base and limit, but we are indeed getting them via hypercall now. Considering that, since we did add gdtr_base I think it would be great if we could also add gdtr_limit. Adding idtr as well would _theoretically_ be a nice speed optimization (removing the need for a hypercall), but it might also be a speed pessimization generally applicable to increasing the vm_event size (since a VCPU with no more space left in the vm_event ring buffer will be blocked more and go through more locking logic). My point is, at the moment it's fine to skip idtr if it's not required by Jan, but if we do add either then let's please add both _base and _limit. In a hopefully near future we'll be able to use as much memory as necessary for storing vm_event data and just cache everything in the ring buffer, avoiding all the "get context" hypercalls.

Thanks,
Razvan
Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure
On 5/2/19 11:52 AM, Andrew Cooper wrote:
On 02/05/2019 09:25, Razvan Cojocaru wrote:
On 5/2/19 2:57 AM, Tamas K Lengyel wrote:

Receiving this register is useful for introspecting 32-bit Windows when the event being trapped happened while in ring3. Signed-off-by: Tamas K Lengyel Cc: Razvan Cojocaru Cc: Tamas K Lengyel Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: Roger Pau Monne
---
xen/arch/x86/vm_event.c | 5 +
xen/include/public/vm_event.h | 3 ++-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 51c3493b1d..873788e32f 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum x86_segment segment,
     reg->es_sel = seg.sel;
     break;
+ case x86_seg_gdtr:
+     reg->gdtr_base = seg.base;
+     break;

Please also add limit, ar, sel, like the others do.

In Xen, we model GDTR/IDTR just like all other segments in the segment cache for convenience/consistency, including faking up of some default attributes. However, there is no such thing as a selector or access rights for them, and the VMCS lacks the fields, while the VMCB marks the fields as reserved. It is almost certainly not worth wasting the space in the ring.

Right, I got carried away there: I saw gdtr_limit and didn't check whether the rest is available.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU
On 5/2/19 11:45 AM, Andrew Cooper wrote:
On 02/05/2019 07:20, Razvan Cojocaru wrote:
On 5/2/19 2:52 AM, Tamas K Lengyel wrote:

Currently the gs_shadow value is only cached when the vCPU is being scheduled out by Xen. Reporting this (usually) stale value through vm_event is incorrect, since it doesn't represent the actual state of the vCPU at the time the event was recorded. This prevents vm_event subscribers from correctly finding kernel structures in the guest when it is trapped while in ring3. Refresh shadow_gs value when the context being saved is for the current vCPU. Signed-off-by: Tamas K Lengyel Cc: Razvan Cojocaru Cc: Jun Nakajima Cc: Kevin Tian Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: Roger Pau Monne
---
v2: move fix to hvm so vm_event doesn't have to know specifics
---
xen/arch/x86/hvm/vmx/vmx.c | 5 +
1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 283eb7b34d..5154ecc2a8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)

static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
{
+    if ( v == current )
+        vmx_save_guest_msrs(v);

vmx_save_guest_msrs() is simple enough that the if is probably not necessary here (we can just call the function directly, regardless of what v holds).

Why? Doing that would fully corrupt v's state when called in remote context, as you'd clobber v's gs_shadow with whatever the value was on the current vcpu.

Good point, I've missed that.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU
On 5/2/19 11:36 AM, Jan Beulich wrote: On 02.05.19 at 08:20, wrote: On 5/2/19 2:52 AM, Tamas K Lengyel wrote: --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data) static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt) { +if ( v == current ) +vmx_save_guest_msrs(v); vmx_save_guest_msrs() is simple enough that the if is probably not necessary here (we can just call the function directly, regardless of what v holds). Avoiding an MSR access is perhaps worth the conditional. Fair enough.

Thanks,
Razvan
Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure
On 5/2/19 2:57 AM, Tamas K Lengyel wrote:

Receiving this register is useful for introspecting 32-bit Windows when the event being trapped happened while in ring3. Signed-off-by: Tamas K Lengyel Cc: Razvan Cojocaru Cc: Tamas K Lengyel Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: Roger Pau Monne
---
xen/arch/x86/vm_event.c | 5 +
xen/include/public/vm_event.h | 3 ++-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 51c3493b1d..873788e32f 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum x86_segment segment,
     reg->es_sel = seg.sel;
     break;
+case x86_seg_gdtr:
+    reg->gdtr_base = seg.base;
+    break;

Please also add limit, ar, sel, like the others do. In addition to making this modification look less strange (since, in contrast to the function name, nothing is packed for gdtr_base), it will also save future work for applications wanting to use gdtr which also require backwards compatibility with previous vm_event versions. As you know, for each such modification we need to have a separate vm_event_vX header in our applications so that we're ready to create a ring buffer using requests and replies of the right size, and also to be able to properly interpret the ring buffer data, so the less frequent the changes to the vm_event struct, the better.

Petre is currently working on the vm_event changes that will hopefully enable us to just cache everything that the getcontext_partial hypercall retrieves.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU
On 5/2/19 2:52 AM, Tamas K Lengyel wrote:
> Currently the gs_shadow value is only cached when the vCPU is being scheduled
> out by Xen. Reporting this (usually) stale value through vm_event is incorrect,
> since it doesn't represent the actual state of the vCPU at the time the event
> was recorded. This prevents vm_event subscribers from correctly finding kernel
> structures in the guest when it is trapped while in ring3.
>
> Refresh shadow_gs value when the context being saved is for the current vCPU.
>
> Signed-off-by: Tamas K Lengyel
> Cc: Razvan Cojocaru
> Cc: Jun Nakajima
> Cc: Kevin Tian
> Cc: Jan Beulich
> Cc: Andrew Cooper
> Cc: Wei Liu
> Cc: Roger Pau Monne
> ---
> v2: move fix to hvm so vm_event doesn't have to know specifics
> ---
> xen/arch/x86/hvm/vmx/vmx.c | 5 +
> 1 file changed, 5 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 283eb7b34d..5154ecc2a8 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
>
> static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
> {
> +    if ( v == current )
> +        vmx_save_guest_msrs(v);

vmx_save_guest_msrs() is simple enough that the if is probably not necessary here (we can just call the function directly, regardless of what v holds). But that's not technically an issue, so if nobody else minds:

Acked-by: Razvan Cojocaru

Thanks,
Razvan
Re: [Xen-devel] [PATCH] x86/vm_event: correctly gather gs_shadow value
On 5/1/19 6:01 PM, Tamas K Lengyel wrote:
> On Wed, May 1, 2019 at 8:53 AM Tamas K Lengyel wrote:
>>
>> On Wed, May 1, 2019 at 8:20 AM Razvan Cojocaru wrote:
>>>
>>> On 5/1/19 4:58 PM, Tamas K Lengyel wrote:
>>>>>> It might be worth introducing a "sync state from hw" hook which collects
>>>>>> all the data we intend to pass to the introspection agent.
>>>>>
>>>>> You mean adding another hvm hook?
>>>>
>>>> Actually, instead of another hook I think what would make sense is to
>>>> just update vmx_save_vmcs_ctxt to automatically refresh the cached
>>>> register values when it's called with "v == current". Thoughts?
>>>
>>> That's probably the better way to go about it, since otherwise the
>>> xc_hvm_getcontext_partial() hypercall will suffer from the same problem.
>>> (there are two ways of getting guest state: one is via the vm_event
>>> cached values, the other is via the aforementioned hypercall).
>>
>> True, although issuing the hypercall in the vm_event callback is
>> actually fine - that's how I found the issue to begin with, since the
>> vCPU will be scheduled out with the cached registers refreshed and
>> thus be different from what the vm_event itself had. But other callers
>> of the hypercall can run into the problem if the guest/vcpu is not
>> paused.
>
> Actually, doing the "v == current" check wouldn't really do anything
> for the hypercall since it's not the domain issuing the hypercall on
> itself. The only way to ensure that hypercall is returning correct
> values would be to pause the vCPU.

I've discussed this with Andrew in the meantime and he's kindly pointed out that for the hypercall we pause from remote context, which forces a de-schedule. So the hypercall _should_ be fine. But we write data to the vm_event ring from the current context, where state might actually be in hardware. He'll probably chime in with additional suggestions / comments.
Thanks,
Razvan
Re: [Xen-devel] [PATCH] x86/vm_event: correctly gather gs_shadow value
On 5/1/19 4:58 PM, Tamas K Lengyel wrote:
>>> It might be worth introducing a "sync state from hw" hook which collects
>>> all the data we intend to pass to the introspection agent.
>>
>> You mean adding another hvm hook?
>
> Actually, instead of another hook I think what would make sense is to
> just update vmx_save_vmcs_ctxt to automatically refresh the cached
> register values when it's called with "v == current". Thoughts?

That's probably the better way to go about it, since otherwise the xc_hvm_getcontext_partial() hypercall will suffer from the same problem. (there are two ways of getting guest state: one is via the vm_event cached values, the other is via the aforementioned hypercall).

Thanks,
Razvan
Re: [Xen-devel] [PATCH] x86/vm_event: correctly gather gs_shadow value
On 5/1/19 7:22 AM, Tamas K Lengyel wrote:
> Currently the gs_shadow value is only cached when the vCPU is being scheduled
> out by Xen. Reporting this (usually) stale value through vm_event is incorrect,
> since it doesn't represent the actual state of the vCPU at the time the event
> was recorded. This prevents vm_event subscribers from correctly finding kernel
> structures in the guest when it is trapped while in ring3.

Isn't the VCPU always scheduled out (since it is paused) with sync vm_events? Is this an async fix only?

In any case,
Acked-by: Razvan Cojocaru

Thanks,
Razvan
Re: [Xen-devel] Xen 4.11.1: guest uses altp2m to crash host
On 4/25/19 1:04 PM, Jan Beulich wrote: On 15.04.19 at 16:10, wrote: >> On 4/15/19 5:04 PM, Matt Leinhos wrote: >>> You are correct -- cherry picking that commit into RELEASE-4.11.1 >>> addresses the crash. The problem cannot be reproduced on 4.12. >>> >>> I hope this helps. >> >> Thank you for confirming the fix! >> >> To the deciders: would it be reasonable to ask patches like this to be >> backported to the future 4.11 release? > > Well, first of all you can always send a request. And yes, in general > I'd consider this reasonable, but I say so without having looked at > the actual commit(s) in question.

Thank you for the answer. This is the commit in question:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1dddfff4c39d3db17dfa709b1c57f44e3ed352e3;hp=68465944c81fa2fbc40bf29ec6084072af5e48ec

AFAICT there's nothing controversial about it, and it should easily apply to 4.11, so if possible I propose that we consider backporting it to the next 4.11 release.

Thanks,
Razvan
Re: [Xen-devel] VMI: singlestep event not received
On 4/24/19 6:25 PM, Tamas K Lengyel wrote:
On Mon, Apr 22, 2019 at 6:00 PM Mathieu Tarral wrote:
On Monday 22 April 2019 11:22, Razvan Cojocaru wrote:
On 4/22/19 1:26 AM, Mathieu Tarral wrote:

The problem is that RtlUserThreadStart is paged-out, so i'm trying to reach it via singlestepping as a backup solution.

FWIW, you can just use xc_hvm_inject_trap() with a TRAP_page_fault parameter to cause the OS to bring a paged out page back into main memory.

The problem is that i'm fully based on LibVMI, which doesn't provide an API for page injection yet. I think Tamas was waiting for a full support of pagefault injection on Xen before opening an API. Maybe we should open an issue to track the progress about it. Thanks anyway for the suggestion !

Not having a LibVMI wrapper should not prevent you from using a libxc function directly. There are many corner-cases regarding when it is safe to inject a pagefault into the guest to bring back pages and that logic is not trivial enough to be implemented in LibVMI. It's way too specific to the usecase to decide when it's safe to do that. So in this case, if you know you are in ring3 of your target process you can just call the libxc function directly.

That's very very true, however I think we should assume that libraries such as libvmi and libbdvmi are used by client code smart enough to figure out, using said libraries, enough things about what's appropriate (or not) to do, and let them do it rather than second-guess them. Hence, in libbdvmi I've wrapped that, and added a comment basically saying what you're saying above:

https://github.com/bitdefender/libbdvmi/blob/master/src/xendriver.cpp#L542

If we end up having the client code call backend-specific code directly we're not, after all, really abstracting it with our wrapper libraries.

Thanks,
Razvan
Re: [Xen-devel] [PATCH v1] x86/mm: Make change_type_range return error
On 4/24/19 5:46 PM, Roger Pau Monné wrote:
On Wed, Apr 24, 2019 at 02:27:32PM +, Alexandru Stefan ISAILA wrote:

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 9e81a30cc4..27697d5a77 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1028,7 +1028,7 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn_l,
 }

 /* Modify the p2m type of [start, end_exclusive) from ot to nt. */
-static void change_type_range(struct p2m_domain *p2m,
+static int change_type_range(struct p2m_domain *p2m,
                               unsigned long start, unsigned long end_exclusive,
                               p2m_type_t ot, p2m_type_t nt)
 {
@@ -1053,15 +1053,11 @@ static void change_type_range(struct p2m_domain *p2m,
      * This should be revisited later, but for now post a warning.
      */
     if ( unlikely(end > host_max_pfn) )
-    {
-        printk(XENLOG_G_WARNING "Dom%d logdirty rangeset clipped to max_mapped_pfn\n",
-               d->domain_id);
-        end = invalidate_end = host_max_pfn;
-    }
+        return -EINVAL;

     /* If the requested range is out of scope, return doing nothing. */
     if ( start > end )
-        return;
+        return 0;

Since you are already changing the behavior of the function above this should also return EINVAL IMO.

My fault, I suggested 0 there. :)

Thanks,
Razvan
Re: [Xen-devel] VMI: singlestep event not received
On 4/22/19 1:26 AM, Mathieu Tarral wrote: The problem is that RtlUserThreadStart is paged-out, so I'm trying to reach it via singlestepping as a backup solution. FWIW, you can just use xc_hvm_inject_trap() with a TRAP_page_fault parameter to cause the OS to bring a paged-out page back into main memory. HTH, Razvan
Re: [Xen-devel] Xen 4.11.1: guest uses altp2m to crash host
On 4/15/19 5:04 PM, Matt Leinhos wrote: You are correct -- cherry picking that commit into RELEASE-4.11.1 addresses the crash. The problem cannot be reproduced on 4.12. I hope this helps. Thank you for confirming the fix! To the deciders: would it be reasonable to ask patches like this to be backported to the future 4.11 release? Thanks, Razvan
Re: [Xen-devel] Xen 4.11.1: guest uses altp2m to crash host
Hello Matt, Judging by the description of your problem, I believe that it has already been fixed by this commit: http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1dddfff4c39d3db17dfa709b1c57f44e3ed352e3;hp=68465944c81fa2fbc40bf29ec6084072af5e48ec Could you please either try to reproduce the problem with 4.12 or apply this patch to 4.11 and confirm? Thanks, Razvan
Re: [Xen-devel] [PATCH] x86/hvm: Fix altp2m_op hypercall continuations
On 4/8/19 9:34 PM, Andrew Cooper wrote: On 08/04/2019 19:13, Razvan Cojocaru wrote: The changes certainly look correct. I'll run some tests tomorrow as well. Thanks, Continuation appears to work correctly with our tests (64-bit PV domain calling 64-bit Xen / dom0). Since we're only using the external altp2m model, HVMOPs from a 32-bit guest never occur, so the issues this patch fixes are not a problem for our introspection agent, at least. Thanks, Razvan
Re: [Xen-devel] [PATCH] x86/hvm: Fix altp2m_op hypercall continuations
On 4/8/19 8:39 PM, Andrew Cooper wrote: > c/s 9383de210 "x86/altp2m: support for setting restrictions for an array of > pages" introduced this logic, but do_hvm_op() was already capable of handling > -ERESTART correctly. > > More problematic however is a continuation from compat_altp2m_op(). The arg > written back into register state points into the hypercall XLAT area, not at > the original parameter passed by the guest. It may be truncated by the > vmentry, but definitely won't be correct on the next invocation. > > Delete the hypercall_create_continuation() call, and return -ERESTART, which > will cause the compat case to start working correctly. > > Signed-off-by: Andrew Cooper > --- > CC: Jan Beulich > CC: Wei Liu > CC: Roger Pau Monné > CC: Razvan Cojocaru > CC: Petre Pircalabu > --- > xen/arch/x86/hvm/hvm.c | 12 ++-- > 1 file changed, 2 insertions(+), 10 deletions(-) > > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c > index e798b49..cc89ee7 100644 > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -4729,12 +4729,10 @@ static int do_altp2m_op( > if ( rc > 0 ) > { > a.u.set_mem_access_multi.opaque = rc; > +rc = -ERESTART; > if ( __copy_field_to_guest(guest_handle_cast(arg, > xen_hvm_altp2m_op_t), > &a, u.set_mem_access_multi.opaque) ) > rc = -EFAULT; > -else > -rc = hypercall_create_continuation(__HYPERVISOR_hvm_op, "lh", > - HVMOP_altp2m, arg); > } > break; Right, that part was taken from the XENMEM_access_op_set_access_multi code and plopped in. Didn't follow the call chain all the way through and so missed the simpler way of doing this. The changes certainly look correct. I'll run some tests tomorrow as well. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] vm_event: Fix XEN_VM_EVENT_RESUME domctl
On 4/5/19 12:33 PM, Petre Pircalabu wrote: Make XEN_VM_EVENT_RESUME return 0 in case of success, instead of -EINVAL. Remove vm_event_resume from the vm_event.h header and set the function's visibility to static, as it is used only in vm_event.c. Move the vm_event_check_ring test inside vm_event_resume in order to simplify the code. Signed-off-by: Petre Pircalabu Acked-by: Razvan Cojocaru
Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve
On 4/4/19 3:46 PM, Razvan Cojocaru wrote: On 4/3/19 6:30 PM, Jan Beulich wrote: On 03.04.19 at 17:17, wrote: On 4/3/19 5:58 PM, Jan Beulich wrote: On 03.04.19 at 16:29, wrote: --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -3011,8 +3011,16 @@ int p2m_set_suppress_ve(struct domain *d, gfn_t gfn, bool suppress_ve, mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, NULL); if ( !mfn_valid(mfn) ) { - rc = -ESRCH; - goto out; + unsigned int page_order; + + mfn = __get_gfn_type_access(host_p2m, gfn_x(gfn), &t, &a, + P2M_ALLOC | P2M_UNSHARE, &page_order, 0); I'm not entirely certain about P2M_ALLOC, but I'm pretty sure that at least P2M_UNSHARE is too heavy: Why would you want to force un-sharing of a page when all you want to alter is #VE behavior? That logic was taken from p2m_set_altp2m_mem_access(), we thought the two cases are very similar. I see. On the UNSHARE observation, we don't know why the author originally requested the flag. We decided to keep it on the assumption that it _probably_ handles some corner-case that somebody has come accross. We'll prepare a mini-series factoring out the code we've been discussing in separate functions: one for getting things out of the hostp2m if the entry is not present in the altp2m, and one for the special page-order-dependent code (which is duplicated in p2m_set_altp2m_mem_access() and p2m_change_altp2m_gfn()). Before going into that, are we now certain that ALLOC is sufficient? I believe it should be for _our_ use-cases, but we don't want to break anyone's code. Maybe Tamas knows more about this. 
Sorry, I forgot to mention that p2m_change_altp2m_gfn() only uses ALLOC: 2649 /* Check host p2m if no valid entry in alternate */ 2650 if ( !mfn_valid(mfn) ) 2651 { 2652 mfn = __get_gfn_type_access(hp2m, gfn_x(old_gfn), &t, &a, 2653 P2M_ALLOC, &page_order, 0); 2654 2655 if ( !mfn_valid(mfn) || t != p2m_ram_rw ) 2656 goto out; 2657 2658 /* If this is a superpage, copy that first */ 2659 if ( page_order != PAGE_ORDER_4K ) 2660 { 2661 gfn_t gfn; 2662 unsigned long mask; 2663 2664 mask = ~((1UL << page_order) - 1); 2665 gfn = _gfn(gfn_x(old_gfn) & mask); 2666 mfn = _mfn(mfn_x(mfn) & mask); 2667 2668 if ( ap2m->set_entry(ap2m, gfn, mfn, page_order, t, a, 1) ) 2669 goto out; 2670 } 2671 } Confusing... Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve
On 4/3/19 6:30 PM, Jan Beulich wrote: On 03.04.19 at 17:17, wrote: On 4/3/19 5:58 PM, Jan Beulich wrote: On 03.04.19 at 16:29, wrote: --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -3011,8 +3011,16 @@ int p2m_set_suppress_ve(struct domain *d, gfn_t gfn, bool suppress_ve, mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, NULL); if ( !mfn_valid(mfn) ) { -rc = -ESRCH; -goto out; +unsigned int page_order; + +mfn = __get_gfn_type_access(host_p2m, gfn_x(gfn), &t, &a, +P2M_ALLOC | P2M_UNSHARE, &page_order, 0); I'm not entirely certain about P2M_ALLOC, but I'm pretty sure that at least P2M_UNSHARE is too heavy: Why would you want to force un-sharing of a page when all you want to alter is #VE behavior? That logic was taken from p2m_set_altp2m_mem_access(), we thought the two cases are very similar. I see. On the UNSHARE observation, we don't know why the author originally requested the flag. We decided to keep it on the assumption that it _probably_ handles some corner-case that somebody has come across. We'll prepare a mini-series factoring out the code we've been discussing in separate functions: one for getting things out of the hostp2m if the entry is not present in the altp2m, and one for the special page-order-dependent code (which is duplicated in p2m_set_altp2m_mem_access() and p2m_change_altp2m_gfn()). Before going into that, are we now certain that ALLOC is sufficient? I believe it should be for _our_ use-cases, but we don't want to break anyone's code. Maybe Tamas knows more about this. Thanks, Razvan
Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve
On 4/3/19 7:04 PM, Jan Beulich wrote: On 03.04.19 at 17:48, wrote: >> On 4/3/19 6:30 PM, Jan Beulich wrote: >> On 03.04.19 at 17:17, wrote: Changes to the hostp2m (also known as altp2m view 0) propagate to all existing altp2ms, but they do so in a lazy manner, and also that won't happen for altp2ms created after a while. So altp2ms will not necessarily know about a page that the hostp2m knows about, which should not stop us from setting mem access restrictions or the value of the SVE bit. >>> >>> But even if the propagation is lazy, a "get" should then return the >>> propagated value, shouldn't it? Or else callers (like is the case here) >>> can be mislead. I do recognize that there may be callers who >>> intentionally _don't_ want the propagated value, but I'd expect this >>> to be the exception, not the rule. I wonder therefore whether there >>> shouldn't be a wrapper around ->get_entry() to take care of this for >>> callers which want the propagated value. Calling the bare hook would >>> remain as is. >> >> But I believe that some hostp2m entries will never be propagated. Sorry >> that I've not been clear about this. This is what I meant by "won't >> happen for altp2ms created after a while". >> >> Take, for example, the case where there's only the hostp2m for the first >> 30 minutes of a guest's life, and in this time somebody calls >> ept_set_entry(). This, in turn, calls p2m_altp2m_propagate_change() if ( >> entry_written && p2m_is_hostp2m(p2m) ). >> >> But at this point there are no altp2ms, so there's nowhere to propagate >> these changes to. >> >> Now, at minute 31 we create a new altp2m. The changes will never be >> propagated here, so altp2m->get_entry() will always (rightfully) return >> an invalid MFN. > > I guess I'm missing something here: Why "rightfully"? If the guest > runs on this altp2m, it'll die a quick death then, won't it? 
Yet if > this is intended to be demand-populate, then why would there be > automatic propagation for existing altp2m-s (as you explain > further up, unless I'm getting this wrong)? > >> We only want to make an exception for setting the SVE bit or mem access >> restrictions on an entry in the altp2m, but having get_entry() (or a >> wrapper) return the hostp2m values and MFN might not necessarily be what >> current callers expect. > > Then I still don't see why you would want to force propagation > in these cases: If it's generally demand-populate, it can't be right > to _fully_ populate that slot. You ought to be setting the access > type or the "suppress #VE" bit alone, without propagating the > MFN and alike. The later, when the entry actually gets propagated, > the entered values should be used as overrides to what comes > from the host p2m. To try to answer your question to the best of my knowledge, the altp2m propagation appears to be a hybrid between lazy propagation, in hvm_hap_nested_page_fault() (xen/arch/x86/hvm/hvm.c): 1748 ap2m_active = altp2m_active(currd); 1749 1750 /* 1751 * Take a lock on the host p2m speculatively, to avoid potential 1752 * locking order problems later and to handle unshare etc. 1753 */ 1754 hostp2m = p2m_get_hostp2m(currd); 1755 mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma, 1756 P2M_ALLOC | (npfec.write_access ? 
P2M_UNSHARE : 0), 1757 NULL); 1758 1759 if ( ap2m_active ) 1760 { 1761 if ( p2m_altp2m_lazy_copy(curr, gpa, gla, npfec, &p2m) ) 1762 { 1763 /* entry was lazily copied from host -- retry */ 1764 __put_gfn(hostp2m, gfn); 1765 rc = 1; 1766 goto out; 1767 } 1768 1769 mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL); 1770 } 1771 else 1772 p2m = hostp2m; and immediate propagation, in ept_set_entry() (xen/arch/x86/mm/p2m-ept.c): 827 if ( entry_written && p2m_is_hostp2m(p2m) ) 828 { 829 ret = p2m_altp2m_propagate_change(d, _gfn(gfn), mfn, order, p2mt, p2ma); 830 if ( !rc ) 831 rc = ret; 832 } Either way, doing what our patch does should pose no problem as far as I can tell. For the lazy copy case, p2m_altp2m_lazy_copy() will do nothing, as our function has already populated the slot correctly: 2378 bool_t p2m_altp2m_lazy_copy(struct vcpu *v, paddr_t gpa, 2379 unsigned long gla, struct npfec npfec, 2380 struct p2m_domain **ap2m) 2381 { 2382 struct p2m_domain *hp2m = p2m_get_hostp2m(v->domain); 2383 p2m_type_t p2mt; 2384 p2m_access_t p2ma; 2385 unsigned int page_order; 2386 gfn_t gfn = _gfn(paddr_to_pfn(gpa)); 2387 unsigned long mask; 2388 mfn_t mfn; 2389 int rv; 2390 2391 *ap2m = p2m_get_altp2m(v); 2392 2393 mfn = get_gfn_type_access(*ap2m, gfn_x(gfn), &p2mt, &p2ma, 2394 0, &page_order); 2395
Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve
On 4/3/19 7:16 PM, Tamas K Lengyel wrote: > It is right to fully populate a slot when for example we want > different mem_access permissions in different altp2m views. We can't > wait for the domain to on-demand populate the altp2m view because we > don't know when (and if) that will actually happen. So > p2m_set_altp2m_mem_access already copies the entry over from the > hostp2m to the altp2m and then applies the requested mem_access > setting. This is done even if the mem_access setting is the same as it > was in the hostp2m. > > Doing the same behavior for p2m_set_suppress_ve I think is consistent, > albeit you could easily overcome the issue by first simply setting a > mem_access permission on the gfn to trigger the aforementioned copy to > the altp2m before you try to set the ve settings. That's what we're doing now, but it wasn't at all intuitive to figure it out. I'd expect that setting the SVE bit simply works without maneuvering something else in place first. Thanks, Razvan
Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve
On 4/3/19 6:30 PM, Jan Beulich wrote: On 03.04.19 at 17:17, wrote: Changes to the hostp2m (also known as altp2m view 0) propagate to all existing altp2ms, but they do so in a lazy manner, and also that won't happen for altp2ms created after a while. So altp2ms will not necessarily know about a page that the hostp2m knows about, which should not stop us from setting mem access restrictions or the value of the SVE bit. But even if the propagation is lazy, a "get" should then return the propagated value, shouldn't it? Or else callers (like is the case here) can be mislead. I do recognize that there may be callers who intentionally _don't_ want the propagated value, but I'd expect this to be the exception, not the rule. I wonder therefore whether there shouldn't be a wrapper around ->get_entry() to take care of this for callers which want the propagated value. Calling the bare hook would remain as is. But I believe that some hostp2m entries will never be propagated. Sorry that I've not been clear about this. This is what I meant by "won't happen for altp2ms created after a while". Take, for example, the case where there's only the hostp2m for the first 30 minutes of a guest's life, and in this time somebody calls ept_set_entry(). This, in turn, calls p2m_altp2m_propagate_change() if ( entry_written && p2m_is_hostp2m(p2m) ). But at this point there are no altp2ms, so there's nowhere to propagate these changes to. Now, at minute 31 we create a new altp2m. The changes will never be propagated here, so altp2m->get_entry() will always (rightfully) return an invalid MFN. We only want to make an exception for setting the SVE bit or mem access restrictions on an entry in the altp2m, but having get_entry() (or a wrapper) return the hostp2m values and MFN might not necessarily be what current callers expect. Whether the described scenario _should_ work the way it does is debatable, but it appears to do so by design. Hopefully I'm not missing anything. 
Thanks, Razvan
Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve
On 4/3/19 5:58 PM, Jan Beulich wrote: On 03.04.19 at 16:29, wrote: --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -3011,8 +3011,16 @@ int p2m_set_suppress_ve(struct domain *d, gfn_t gfn, bool suppress_ve, mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, NULL); if ( !mfn_valid(mfn) ) { -rc = -ESRCH; -goto out; +unsigned int page_order; + +mfn = __get_gfn_type_access(host_p2m, gfn_x(gfn), &t, &a, +P2M_ALLOC | P2M_UNSHARE, &page_order, 0); I'm not entirely certain about P2M_ALLOC, but I'm pretty sure that at least P2M_UNSHARE is too heavy: Why would you want to force un-sharing of a page when all you want to alter is #VE behavior? That logic was taken from p2m_set_altp2m_mem_access(), we thought the two cases are very similar. 269 mfn = ap2m->get_entry(ap2m, gfn, &t, &old_a, 0, NULL, NULL); 270 271 /* Check host p2m if no valid entry in alternate */ 272 if ( !mfn_valid(mfn) ) 273 { 274 275 mfn = __get_gfn_type_access(hp2m, gfn_l, &t, &old_a, 276 P2M_ALLOC | P2M_UNSHARE, &page_order, 0); 277 278 rc = -ESRCH; 279 if ( !mfn_valid(mfn) || t != p2m_ram_rw ) 280 return rc; 281 282 /* If this is a superpage, copy that first */ 283 if ( page_order != PAGE_ORDER_4K ) 284 { 285 unsigned long mask = ~((1UL << page_order) - 1); 286 gfn_t gfn2 = _gfn(gfn_l & mask); 287 mfn_t mfn2 = _mfn(mfn_x(mfn) & mask); 288 289 rc = ap2m->set_entry(ap2m, gfn2, mfn2, page_order, t, old_a, 1); 290 if ( rc ) 291 return rc; 292 } 293 } 294 295 /* 296 * Inherit the old suppress #VE bit value if it is already set, or set it 297 * to 1 otherwise 298 */ 299 return ap2m->set_entry(ap2m, gfn, mfn, PAGE_ORDER_4K, t, a, -1); 300 } I wonder if we should put the whole logic in the "if ( !mfn_valid(mfn) )" body in its own function and reuse that for both functions - although it doesn't look like the extra superpage logic matters for setting the suppress #VE bit alone (since even the code above only sets it with PAGE_ORDER_4K). 
Additionally, when you add such a lookup as error handling attempt, I think it is important to leave a code comment. But I wonder whether this shouldn't be done before the call to ->get_entry(), or whether in fact there's a bug here in how get_entry() behaves in this case. Changes to the hostp2m (also known as altp2m view 0) propagate to all existing altp2ms, but they do so in a lazy manner, and also that won't happen for altp2ms created after a while. So altp2ms will not necessarily know about a page that the hostp2m knows about, which should not stop us from setting mem access restrictions or the value of the SVE bit. Thanks, Razvan
[Xen-devel] [PATCH] x86/altp2m: treat view 0 as the hostp2m in p2m_get_mem_access()
p2m_set_mem_access() (and other places) treat view 0 as the hostp2m, but p2m_get_mem_access() does not. Correct that inconsistency. Signed-off-by: Razvan Cojocaru --- xen/arch/x86/mm/mem_access.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c index 56c06a4..a144bb0 100644 --- a/xen/arch/x86/mm/mem_access.c +++ b/xen/arch/x86/mm/mem_access.c @@ -505,7 +505,7 @@ int p2m_get_mem_access(struct domain *d, gfn_t gfn, xenmem_access_t *access, if ( altp2m_idx ) return -EINVAL; } -else +else if ( altp2m_idx ) /* altp2m view 0 is treated as the hostp2m */ { if ( altp2m_idx >= MAX_ALTP2M || d->arch.altp2m_eptp[altp2m_idx] == mfn_x(INVALID_MFN) ) -- 2.7.4
Re: [Xen-devel] [PATCH 04/12] xen/arm: memaccess: Initialize correctly *access in __p2m_get_mem_access
On 3/27/19 8:45 PM, Julien Grall wrote: > The commit 8d84e701fd "xen/arm: initialize access" initializes > *access using the wrong enumeration type. This results in a warning > with clang: > > mem_access.c:50:20: error: implicit conversion from enumeration type > 'p2m_access_t' to different enumeration type 'xenmem_access_t' > [-Werror,-Wenum-conversion] > *access = p2m->default_access; > ~ ~^~ > > The correct solution is to use the array memaccess that will do the > conversion between the 2 enums. > > Fixes: 8d84e701fd ("xen/arm: initialize access") > Signed-off-by: Julien Grall Acked-by: Razvan Cojocaru
Re: [Xen-devel] Commit 16cc3362aed39e3093419b9df6ec73269071d063 configure failure
On 3/13/19 5:02 PM, Wei Liu wrote: On Wed, Mar 13, 2019 at 01:31:22PM +0200, Razvan Cojocaru wrote: On 3/13/19 1:28 PM, Wei Liu wrote: On Wed, Mar 13, 2019 at 01:24:06PM +0200, Razvan Cojocaru wrote: Hello, Commit "build/m4: make python_devel.m4 work with both python 2 and 3" makes my configure run fail, at least on my Ubuntu 16.04.6 LTS machine. http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=16cc3362aed39e3093419b9df6ec73269071d063 checking for python-config... /usr/bin/python-config checking Python.h usability... yes checking Python.h presence... yes checking for Python.h... yes checking for PyArg_ParseTuple... no configure: error: Unable to find a suitable python development library configure: error: ./configure failed for tools Checking out the commit below it, configure works. I've been looking at possibly installing some additional Python-related package on that machine but so far nothing I could think of seems to be doing the trick. Of course, if Xen's no longer supposed to build on setups such as mine then sorry for the noise. Yeah I'm aware of the failure. Thanks for reporting. You're supposed to install python-dev now, but that still won't resolve the issue. I think there is a subtle difference between Debian and Ubuntu. I will try to fix it today. Indeed it won't - I do have it installed (2.7.12-1~16.04). Pull from staging -- it should be fixed now. It does build now indeed. Thanks for the quick fix. Thanks, Razvan
Re: [Xen-devel] Commit 16cc3362aed39e3093419b9df6ec73269071d063 configure failure
On 3/13/19 1:28 PM, Wei Liu wrote: On Wed, Mar 13, 2019 at 01:24:06PM +0200, Razvan Cojocaru wrote: Hello, Commit "build/m4: make python_devel.m4 work with both python 2 and 3" makes my configure run fail, at least on my Ubuntu 16.04.6 LTS machine. http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=16cc3362aed39e3093419b9df6ec73269071d063 checking for python-config... /usr/bin/python-config checking Python.h usability... yes checking Python.h presence... yes checking for Python.h... yes checking for PyArg_ParseTuple... no configure: error: Unable to find a suitable python development library configure: error: ./configure failed for tools Checking out the commit below it, configure works. I've been looking at possibly installing some additional Python-related package on that machine but so far nothing I could think of seems to be doing the trick. Of course, if Xen's no longer supposed to build on setups such as mine then sorry for the noise. Yeah I'm aware of the failure. Thanks for reporting. You're supposed to install python-dev now, but that still won't resolve the issue. I think there is a subtle difference between Debian and Ubuntu. I will try to fix it today. Indeed it won't - I do have it installed (2.7.12-1~16.04). Thanks, Razvan
[Xen-devel] Commit 16cc3362aed39e3093419b9df6ec73269071d063 configure failure
Hello, Commit "build/m4: make python_devel.m4 work with both python 2 and 3" makes my configure run fail, at least on my Ubuntu 16.04.6 LTS machine. http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=16cc3362aed39e3093419b9df6ec73269071d063 checking for python-config... /usr/bin/python-config checking Python.h usability... yes checking Python.h presence... yes checking for Python.h... yes checking for PyArg_ParseTuple... no configure: error: Unable to find a suitable python development library configure: error: ./configure failed for tools Checking out the commit below it, configure works. I've been looking at possibly installing some additional Python-related package on that machine but so far nothing I could think of seems to be doing the trick. Of course, if Xen's no longer supposed to build on setups such as mine then sorry for the noise. Thanks, Razvan
Re: [Xen-devel] [PATCH v2 3/4] x86/vmx: Fix security issue when a guest balloons out the #VE info page
On 2/21/19 10:18 PM, Andrew Cooper wrote: > The logic in altp2m_vcpu_{en,dis}able_ve() and vmx_vcpu_update_vmfunc_ve() is > dangerous. After #VE has been set up, the guest can balloon out and free the > nominated GFN, after which the processor may write to it. Also, the unlocked > GFN query means the MFN is stale by the time it is used. Alternatively, a > guest can race two disable calls to cause one VMCS to still reference the > nominated GFN after the tracking information was dropped. > > Rework the logic from scratch to make it safe. > > Hold an extra page reference on the underlying frame, to account for the > VMCS's reference. This means that if the GFN gets ballooned out, it isn't > freed back to Xen until #VE is disabled, and the VMCS no longer refers to the > page. > > A consequence of this is that altp2m_vcpu_disable_ve() needs to be called > during the domain_kill() path, to drop the reference for domains which shut > down with #VE still enabled. > > For domains using altp2m, we expect a single enable call and no disable for > the remaining lifetime of the domain. However, to avoid problems with > concurrent calls, use cmpxchg() to locklessly maintain safety. > > This doesn't have an XSA because altp2m is not yet a security-supported > feature. > > Signed-off-by: Andrew Cooper > Release-acked-by: Juergen Gross Tested-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH 3/4] x86/vmx: Fix security issue when a guest balloons out the #VE info page
On 2/20/19 12:18 AM, Andrew Cooper wrote: The logic in altp2m_vcpu_{en,dis}able_ve() and vmx_vcpu_update_vmfunc_ve() is dangerous. After #VE has been set up, the guest can balloon out and free the nominated GFN, after which the processor may write to it. Also, the unlocked GFN query means the MFN is stale by the time it is used. Alternatively, a guest can race two disable calls to cause one VMCS to still reference the nominated GFN after the tracking information was dropped. Rework the logic from scratch to make it safe. Hold an extra page reference on the underlying frame, to account for the VMCS's reference. This means that if the GFN gets ballooned out, it isn't freed back to Xen until #VE is disabled, and the VMCS no longer refers to the page. A consequence of this is that arch_vcpu_unmap_resources() now needs to call altp2m_vcpu_disable_ve() to drop the reference during domain_kill(), to allow all of the memory to be freed. For domains using altp2m, we expect a single enable call and no disable for the remaining lifetime of the domain. However, to avoid problems with concurrent calls, use cmpxchg() to locklessly maintain safety. This doesn't have an XSA because altp2m is not yet a security-supported feature. Signed-off-by: Andrew Cooper Tested-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH 4/4] x86/vmx: Properly flush the TLB when an altp2m is modified
On 2/20/19 12:18 AM, Andrew Cooper wrote: > Modifications to an altp2m mark the p2m as needing flushing, but this was > never wired up in the return-to-guest path. As a result, stale TLB entries > can remain after resuming the guest. > > In practice, this manifests as a missing EPT_VIOLATION or #VE exception when > the guest subsequently accesses a page which has had its permissions reduced. > > vmx_vmenter_helper() now has 11 p2ms to potentially invalidate, but issuing 11 > INVEPT instructions isn't clever. Instead, count how many contexts need > invalidating, and use INVEPT_ALL_CONTEXT if two or more are in need of > flushing. > > This doesn't have an XSA because altp2m is not yet a security-supported > feature. > > Signed-off-by: Andrew Cooper Reviewed-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH 2/4] x86/altp2m: Rework #VE enable/disable paths
On 2/20/19 12:18 AM, Andrew Cooper wrote: > Split altp2m_vcpu_{enable,disable}_ve() out of the > HVMOP_altp2m_vcpu_{enable,disable}_notify marshalling logic. A future change > is going to need to call altp2m_vcpu_disable_ve() from the domain_kill() path. > > While at it, clean up the logic in altp2m_vcpu_{initialise,destroy}(). > altp2m_vcpu_reset() has no external callers, so fold it into its two > callsites. This in turn allows for altp2m_vcpu_destroy() to reuse > altp2m_vcpu_disable_ve() rather than opencoding it. > > No practical change. > > Signed-off-by: Andrew Cooper Reviewed-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH 1/4] xen/common: Break domain_unmap_resources() out of domain_kill()
On 2/20/19 12:18 AM, Andrew Cooper wrote: > A subsequent change is going to need an x86-specific unmapping step, so take > the opportunity to split the current vcpu unmapping out into a dedicated path. > > No practical change. > > Signed-off-by: Andrew Cooper > --- > CC: Jan Beulich > CC: Wei Liu > CC: Roger Pau Monné > CC: Razvan Cojocaru > CC: Tamas K Lengyel > CC: Juergen Gross > --- > xen/common/domain.c | 16 +--- > xen/include/xen/domain.h | 4 > 2 files changed, 17 insertions(+), 3 deletions(-) > > diff --git a/xen/common/domain.c b/xen/common/domain.c > index 32bca8d..e66f7ea 100644 > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -700,10 +700,21 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, > struct domain **d) > return 0; > } > > +static void domain_unmap_resources(struct domain *d) > +{ > +struct vcpu *v; > + > +for_each_vcpu ( d, v ) > +{ > +unmap_vcpu_info(v); > + > +arch_vcpu_unmap_resources(v); > +} > +} > + > int domain_kill(struct domain *d) > { > int rc = 0; > -struct vcpu *v; > > if ( d == current->domain ) > return -EINVAL; > @@ -732,13 +743,12 @@ int domain_kill(struct domain *d) > d->tmem_client = NULL; > /* fallthrough */ > case DOMDYING_dying: > +domain_unmap_resources(d); > rc = domain_relinquish_resources(d); > if ( rc != 0 ) > break; > if ( cpupool_move_domain(d, cpupool0) ) > return -ERESTART; > -for_each_vcpu ( d, v ) > -unmap_vcpu_info(v); So before this change there was a leak of some sort here? I see that before this patch unmap_vcpu_info() was called only if rc == 0, but it is now called unconditionally _before_ domain_relinquish_resources(). This does appear to change the behaviour of the code in the error case. If that's intended: Reviewed-by: Razvan Cojocaru Thanks, Razvan
Re: [Xen-devel] [PATCH RFC 0/6] Slotted channels for sync vm_events
On 2/12/19 7:01 PM, Tamas K Lengyel wrote: On Thu, Feb 7, 2019 at 9:06 AM Petre Ovidiu PIRCALABU wrote: On Thu, 2019-02-07 at 11:46 +, George Dunlap wrote: On 2/6/19 2:26 PM, Petre Ovidiu PIRCALABU wrote: On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote: This patchset is a rework of the "multi-page ring buffer" for vm_events patch based on Andrew Cooper's comments. For synchronous vm_events the ring waitqueue logic was unnecessary as the vcpu sending the request was blocked until a response was received. To simplify the request/response mechanism, an array of slotted channels was created, one per vcpu. Each vcpu puts the request in the corresponding slot and blocks until the response is received. I'm sending this patch as an RFC because, while I'm still working on a way to measure the overall performance improvement, your feedback would be a great assistance. Is anyone still using asynchronous vm_event requests? (the vcpu is not blocked and no response is expected). If not, I suggest that the feature should be removed as it (significantly) increases the complexity of the current implementation. Could you describe in a bit more detail what the situation is? What's the current state of affairs with vm_events, what you're trying to change, and why async vm_events makes that more difficult? The main reason for the vm_events modification was to improve the overall performance in high-throughput introspection scenarios. For domUs with a higher vcpu count, a vcpu could sleep for a certain period of time while waiting for a ring slot to become available (__vm_event_claim_slot). The first patchset only increased the ring size, and the second iteration, based on Andrew Cooper's comments, proposed a separate path to handle synchronous events (a slotted buffer for each vcpu) in order to have the events handled independently of one another. To handle asynchronous events, a dynamically allocated vm_event ring is used.
While the implementation is not exactly an exercise in simplicity, it preserves all the needed functionality and offers a fallback if the Linux domain running the monitor application doesn't support IOCTL_PRIVCMD_MMAP_RESOURCE. However, the problem got a little bit more complicated when I tried implementing the vm_events using an IOREQ server (based on Paul Durrant's comments). For synchronous vm_events, it simplified things a little, eliminating both the need for a special structure to hold the processing state and the evtchns for each vcpu. The asynchronous events were a little more tricky to handle. The buffered ioreqs were a good candidate, but the only thing usable is the corresponding evtchn in conjunction with an existing ring. In order to use them, a mock buffered ioreq should be created and transmitted, with the only meaningful field being the ioreq type. I certainly think it would be better if you could write the new vm_event interface without having to spend a lot of effort supporting modes that you think nobody uses. On the other hand, getting into the habit of breaking stuff, even for people we don't know about, will be a hindrance to community growth; a commitment to keeping it working will be a benefit to growth. But of course, we haven't declared the vm_event interface 'supported' (it's not even mentioned in the SUPPORT.md document yet). Just for the sake of discussion, would it be possible / reasonable, for instance, to create a new interface, vm_events2, instead? Then you could write it to share the ioreq interface without having legacy baggage you're not using; we could deprecate and eventually remove vm_events1, and if anyone shouts, we can put it back. Thoughts? -George Yes, it's possible and it will GREATLY simplify the implementation. I just have to make sure the interfaces are mutually exclusive. I'm for removing features from the vm_event interface that are no longer in use, especially if they block more advantageous changes like this one.
We don't know what the use-case was for async events, nor have we seen anyone even mention them since I've been working with Xen. Creating a new interface, as mentioned above, would make sense if there was a disagreement with retiring this feature. I don't think that's the case. I certainly would prefer not having to maintain two separate interfaces going forward without a clear justification and documented use-case explaining why we keep the old interface around. AFAICT, the async model is broken conceptually as well, so it makes no sense. It would make sense, IMHO, if it were lossy (i.e. we just write in the ring buffer, and if somebody manages to "catch" an event while it flies by then so be it; if not, it will be overwritten). If it were truly lossy, I'd see a use-case for it, for gathering statistics maybe. However, as the code is now, a VCPU is left unpaused only if there's still space in the ring buffer. If there's no more space in the ring buffer, the VCPU trying to put an event in still gets paused.
Re: [Xen-devel] [PATCH for-4.12 V5 2/2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/18/19 1:19 PM, Jan Beulich wrote: On 15.02.19 at 16:41, wrote: HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). Signed-off-by: Razvan Cojocaru Acked-by: Jan Beulich Also I assume you mean to have the same justification here as you have in patch 1, as to 4.12 inclusion. And for this reason I'm also Cc-ing Jürgen again. Indeed, I do. Thank you very much. Thanks, Razvan
[Xen-devel] [PATCH for-4.12 V5 1/2] altp2m: Prevent deadlocks when a domain performs altp2m operations on itself
domain_pause_except_self() was introduced to allow a domain to pause itself while doing altp2m operations. However, as written, it has a risk of deadlock if two vcpus enter the loop at the same time. Luckily, there's already a solution for this: Attempt to grab the domain's hypercall_deadlock_mutex, and restart the entire hypercall if you fail. Make domain_pause_except_self() attempt to grab this mutex when pausing itself, returning -ERESTART if it fails. Have the callers check for errors and pass the value up. In both cases, the top-level do_hvm_op() should DTRT when -ERESTART is returned. The (necessary) reuse of the hypercall deadlock mutex poses the risk of getting called from a context where the lock was already acquired (e.g. someone may (say) call domctl_lock(), then afterwards call domain_pause_except_self()). However, in the interest of not overcomplicating things, no changes are made here to the mutex. Attempted nesting of this lock isn't a security issue, because all that will happen is that the vcpu will livelock taking continuations. Signed-off-by: George Dunlap Tested-by: Razvan Cojocaru --- Release exception justification: - Fixes a bug in the current code - Lays the foundation of another fix - Only affects altp2m, which isn't a supported feature NB two possible further improvements to put on the list for 4.13: - Replace send-interrupt-and-wait-for-each-one loop with hvmop_flush_tlb_all()'s more efficient send-all-the-interrupts-then-wait loop.
- Modify hvmop_flush_tlb_all() to use domain_pause_except_self() to avoid code duplication --- CC: George Dunlap CC: Jan Beulich CC: Andrew Cooper CC: Wei Liu CC: "Roger Pau Monné" CC: George Dunlap CC: Ian Jackson CC: Julien Grall CC: Konrad Rzeszutek Wilk CC: Stefano Stabellini CC: Tim Deegan --- xen/arch/x86/mm/p2m.c | 10 -- xen/common/domain.c | 8 +++- xen/include/xen/sched.h | 2 +- 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index d14ce57..7232dbf 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2530,8 +2530,11 @@ int p2m_destroy_altp2m_by_id(struct domain *d, unsigned int idx) if ( !idx || idx >= MAX_ALTP2M ) return rc; -domain_pause_except_self(d); +rc = domain_pause_except_self(d); +if ( rc ) +return rc; +rc = -EBUSY; altp2m_list_lock(d); if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) ) @@ -2561,8 +2564,11 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx) if ( idx >= MAX_ALTP2M ) return rc; -domain_pause_except_self(d); +rc = domain_pause_except_self(d); +if ( rc ) +return rc; +rc = -EINVAL; altp2m_list_lock(d); if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) ) diff --git a/xen/common/domain.c b/xen/common/domain.c index 7470cd9..32bca8d 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -1134,18 +1134,24 @@ int domain_unpause_by_systemcontroller(struct domain *d) return 0; } -void domain_pause_except_self(struct domain *d) +int domain_pause_except_self(struct domain *d) { struct vcpu *v, *curr = current; if ( curr->domain == d ) { +/* Avoid racing with other vcpus which may want to be pausing us */ +if ( !spin_trylock(&d->hypercall_deadlock_mutex) ) +return -ERESTART; for_each_vcpu( d, v ) if ( likely(v != curr) ) vcpu_pause(v); +spin_unlock(&d->hypercall_deadlock_mutex); } else domain_pause(d); + +return 0; } void domain_unpause_except_self(struct domain *d) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 
d633e1d..edee52d 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -839,7 +839,7 @@ static inline int domain_pause_by_systemcontroller_nosync(struct domain *d) } /* domain_pause() but safe against trying to pause current. */ -void domain_pause_except_self(struct domain *d); +int __must_check domain_pause_except_self(struct domain *d); void domain_unpause_except_self(struct domain *d); /* -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 V5 2/2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to a in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). Signed-off-by: Razvan Cojocaru --- CC: Jan Beulich CC: Andrew Cooper CC: Wei Liu CC: "Roger Pau Monné" --- xen/arch/x86/hvm/hvm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 410623d..79c7d81 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4538,6 +4538,10 @@ static int do_altp2m_op( break; } +rc = domain_pause_except_self(d); +if ( rc ) +break; + ostate = d->arch.altp2m_active; d->arch.altp2m_active = !!a.u.domain_state.state; @@ -4556,6 +4560,8 @@ static int do_altp2m_op( if ( ostate ) p2m_flush_altp2m(d); } + +domain_unpause_except_self(d); break; } -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/15/19 3:37 PM, George Dunlap wrote: On Feb 15, 2019, at 1:24 PM, Jan Beulich wrote: On 15.02.19 at 13:52, wrote: On Feb 12, 2019, at 11:42 AM, Razvan Cojocaru wrote: HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to a in-guest agent, and a domain pausing itself is not a good idea). Sorry to come in here on v4 and suggest changing everything, but I don’t really like the solution you have here. Not setting altp2m to ‘active’ until after the vcpus are set up makes sense; but passing this true/false value in seems ugly, and still seems a bit racy (i.e., what if p2m_active() is disabled between the check in HVMOP_altp2m_switch_p2m and the time we actually call altp2m_vcpu_update_p2m()?) I certainly don’t think domain_pause() should be our go-to solution for race avoidance, but in this case it really seems like switching the global p2m for every vcpu at once makes the most sense; and trying to safely change this on an unpaused domain is not only overly complicated, but probably not what we wanted anyway. p2m_altp2m_destroy_by_id() and p2m_switch_domain_altp2m_by_id() already use domain_pause_except_self(); so it seems like not using it for altp2m_set_domain_state was probably more of an oversight than an intentional decision. Using that here seems like a more robust solution. Ah, I didn't even recall there was such a function. As this now also allows covering a domain requesting the operation for itself, I don't mind the pausing approach anymore. Yeah, I forgot too until I was grepping for “domain_pause” to figure out what everyone else was doing. :-) The one issue is that domain_pause_except_self() currently is actually a deadlock risk if two different vcpus start it at the same time. I think the attached patch (compile-tested only) should fix this issue; after this patch you should be able to use domain_pause_except_self() in altp2m_set_domain_state instead. 
There's one thing I don't really like here, which is a result of the (necessary) re-use of the hypercall deadlock mutex: This certainly poses the risk of getting called from a context where the lock was already acquired. Therefore I'd like to suggest to use this lock in a recursive way (here and elsewhere). And two cosmetic remarks - there's no need to re-specify __must_check on the function definition, as the function declaration ought to be in scope anyway. And there's a stray blank inside the likely() you add. I don’t see that I added a ‘likely’; there’s one in context, but I don’t see any stray blanks there. The other two points make sense — Razvan, would you be willing to make those changes (and test the result, as I haven’t done more than compile-test it)? Of course, happy to. Just to make sure I understand where we stand: I'll try to leave the mutex alone for now and only switch to a recursive one if anything blows up. Thanks, Razvan
Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On Feb 12, 2019, at 11:42 AM, Razvan Cojocaru wrote: HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea). Sorry to come in here on v4 and suggest changing everything, but I don’t really like the solution you have here. Not setting altp2m to ‘active’ until after the vcpus are set up makes sense; but passing this true/false value in seems ugly, and still seems a bit racy (i.e., what if p2m_active() is disabled between the check in HVMOP_altp2m_switch_p2m and the time we actually call altp2m_vcpu_update_p2m()?) I certainly don’t think domain_pause() should be our go-to solution for race avoidance, but in this case it really seems like switching the global p2m for every vcpu at once makes the most sense; and trying to safely change this on an unpaused domain is not only overly complicated, but probably not what we wanted anyway. p2m_altp2m_destroy_by_id() and p2m_switch_domain_altp2m_by_id() already use domain_pause_except_self(); so it seems like not using it for altp2m_set_domain_state was probably more of an oversight than an intentional decision. Using that here seems like a more robust solution. The one issue is that domain_pause_except_self() currently is actually a deadlock risk if two different vcpus start it at the same time. I think the attached patch (compile-tested only) should fix this issue; after this patch you should be able to use domain_pause_except_self() in altp2m_set_domain_state instead. Does that sound reasonable? Not only does that sound reasonable, I personally prefer this approach. I'll apply this patch and give it a spin. Thanks, Razvan
Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/14/19 8:06 PM, George Dunlap wrote: > On 2/12/19 11:42 AM, Razvan Cojocaru wrote: >> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably >> on purpose (as it was originally supposed to cater to a in-guest >> agent, and a domain pausing itself is not a good idea). >> >> This can lead to domain crashes in the vmx_vmexit_handler() code >> that checks if the guest has the ability to switch EPTP without an >> exit. That code can __vmread() the host p2m's EPT_POINTER >> (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a >> chance to run altp2m_vcpu_initialise(), but after >> d->arch.altp2m_active is set). > > Sorry, where exactly does the crash happen? 3655 /* 3656 * If the guest has the ability to switch EPTP without an exit, 3657 * figure out whether it has done so and update the altp2m data. 3658 */ 3659 if ( altp2m_active(v->domain) && 3660 (v->arch.hvm.vmx.secondary_exec_control & 3661 SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) ) 3662 { 3663 unsigned long idx; 3664 3665 if ( v->arch.hvm.vmx.secondary_exec_control & 3666 SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS ) 3667 __vmread(EPTP_INDEX, &idx); 3668 else 3669 { 3670 unsigned long eptp; 3671 3672 __vmread(EPT_POINTER, &eptp); 3673 3674 if ( (idx = p2m_find_altp2m_by_eptp(v->domain, eptp)) == 3675 INVALID_ALTP2M ) 3676 { 3677 gdprintk(XENLOG_ERR, "EPTP not found in alternate p2m list\n"); 3678 domain_crash(v->domain); Right here (at line 3678 in vmx.c). Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC PATCH 6/6] xc_version: add vm_event interface version
On 2/12/19 8:13 PM, Tamas K Lengyel wrote: > On Tue, Jan 8, 2019 at 9:39 AM Razvan Cojocaru > wrote: >> >> On 1/8/19 6:27 PM, Jan Beulich wrote: >>>>>> On 19.12.18 at 19:52, wrote: >>>> Signed-off-by: Petre Pircalabu >>> >>> An empty description is not helpful. The immediate question is: Why? >>> We don't do this for other interface versions. I'm unconvinced a >>> special purpose piece of information like this one belongs into the >>> rather generic version hypercall. >> >> For an introspection application meant to be deployed on several Xen >> versions without recompiling, it is important to be able to decide at >> runtime what size and layout the vm_event struct has. >> >> Currently this can somewhat be done by associating the current version >> with the vm_event version, but that is not ideal for obvious reasons. > > We do exactly that in LibVMI and it works fine - care to elaborate > what problem you have with doing that? There is a 1:1 match between > any stable Xen version and a vm_event interface version. I don't think > we will ever have a situation where we bump the vm_event interface > version but not the Xen release version. XenServer. Any given version of XenServer may or may not fit the matrix you're talking about (because some patches are backported, some are not, etc.). In which case it all becomes very complicated. Thanks, Razvan
Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/12/19 2:57 PM, Jan Beulich wrote: On 12.02.19 at 12:42, wrote: HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). This patch reorganizes the code so that d->arch.altp2m_active is set to true only after all the init work has been done, and to false before the uninit work begins. This required adding a new bool parameter to altp2m_vcpu_update_p2m(), which relied on d->arch.altp2m_active being set before it's called. While at it, I've changed a couple of bool_t's to bool's. Signed-off-by: Razvan Cojocaru FTR - this looks okay to me now. But I don't feel qualified to give an R-b, so just for the parts where it's applicable Acked-by: Jan Beulich Thanks for the ack and the help! Razvan
[Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set). This patch reorganizes the code so that d->arch.altp2m_active is set to true only after all the init work has been done, and to false before the uninit work begins. This required adding a new bool parameter to altp2m_vcpu_update_p2m(), which relied on d->arch.altp2m_active being set before it's called. While at it, I've changed a couple of bool_t's to bool's. Signed-off-by: Razvan Cojocaru --- Changes since V3: - Reverted p2m_get_altp2m() to its original state. - Added an altp2m_active() check to HVMOP_altp2m_switch_p2m. - Updated patch description.
--- xen/arch/x86/hvm/hvm.c| 20 xen/arch/x86/hvm/vmx/vmx.c| 4 ++-- xen/arch/x86/mm/altp2m.c | 4 ++-- xen/arch/x86/mm/p2m.c | 4 ++-- xen/include/asm-x86/domain.h | 2 +- xen/include/asm-x86/hvm/hvm.h | 6 +++--- 6 files changed, 26 insertions(+), 14 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 21944e9..21ec059 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4525,7 +4525,7 @@ static int do_altp2m_op( case HVMOP_altp2m_set_domain_state: { struct vcpu *v; -bool_t ostate; +bool ostate, nstate; if ( nestedhvm_enabled(d) ) { @@ -4534,12 +4534,15 @@ static int do_altp2m_op( } ostate = d->arch.altp2m_active; -d->arch.altp2m_active = !!a.u.domain_state.state; +nstate = a.u.domain_state.state; /* If the alternate p2m state has changed, handle appropriately */ -if ( d->arch.altp2m_active != ostate && +if ( nstate != ostate && (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) ) { +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */ +d->arch.altp2m_active = false; + for_each_vcpu( d, v ) { if ( !ostate ) @@ -4550,7 +4553,14 @@ static int do_altp2m_op( if ( ostate ) p2m_flush_altp2m(d); + +/* + * Wait until altp2m_vcpu_initialise() is done before marking + * altp2m as being enabled for the domain. 
+ */ +d->arch.altp2m_active = nstate; } + break; } @@ -4624,7 +4634,9 @@ static int do_altp2m_op( break; case HVMOP_altp2m_switch_p2m: -rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view); +rc = -EOPNOTSUPP; +if ( altp2m_active(d) ) +rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view); break; case HVMOP_altp2m_set_suppress_ve: diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 64af8bf..e542568 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) return !!cpu_has_monitor_trap_flag; } -static void vmx_vcpu_update_eptp(struct vcpu *v) +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) { struct domain *d = v->domain; struct p2m_domain *p2m = NULL; struct ept_data *ept; -if ( altp2m_active(d) ) +if ( altp2m_enabled ) p2m = p2m_get_altp2m(v); if ( !p2m ) p2m = p2m_get_hostp2m(d); diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c index 930bdc2..c51303b 100644 --- a/xen/arch/x86/mm/altp2m.c +++ b/xen/arch/x86/mm/altp2m.c @@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v) vcpu_altp2m(v).p2midx = 0; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, true); if ( v != current ) vcpu_unpause(v); @@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v) altp2m_vcpu_reset(v); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, false); altp2m_vcpu_update_vmfunc_ve(v); if ( v != current ) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index d14ce57..6f991f8 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx) atomic_dec(&p2m_get_altp2m(v)->active_vcpus); vcpu_altp2m(v).p2midx = idx; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +
Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/11/19 6:59 PM, Jan Beulich wrote: Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for any HVMOP_altp2m_* at all. One of the actual callers is guarded by altp2m_active(), but the other isn't. Actually I see that both places are guarded by altp2m_active(). In p2m.c we have: 2312 void p2m_altp2m_check(struct vcpu *v, uint16_t idx) 2313 { 2314 if ( altp2m_active(v->domain) ) 2315 p2m_switch_vcpu_altp2m_by_id(v, idx); 2316 } and in vmx.c: 2225 static int vmx_vcpu_emulate_vmfunc(const struct cpu_user_regs *regs) 2226 { 2227 int rc = X86EMUL_EXCEPTION; 2228 struct vcpu *curr = current; 2229 2230 if ( !cpu_has_vmx_vmfunc && altp2m_active(curr->domain) && 2231 regs->eax == 0 && 2232 p2m_switch_vcpu_altp2m_by_id(curr, regs->ecx) ) 2233 rc = X86EMUL_OKAY; 2234 2235 return rc; 2236 } here there's an "&& altp2m_active(curr->domain)" in the if(). So I suppose in our scenario all that's needed is a similar check here: 4636 case HVMOP_altp2m_switch_p2m: 4637 rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view); 4638 break; for the other function, p2m_switch_domain_altp2m_by_id(). Unless I'm missing something. Thanks, Razvan
Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/12/19 9:31 AM, Jan Beulich wrote: On 11.02.19 at 18:21, wrote: On 2/11/19 6:59 PM, Jan Beulich wrote: Thanks for noticing, actually this appears to invalidate the whole purpose of the patch (I should have tested this more before submitting V3, sorry). The whole point of the new boolean is to have p2m assigned an altp2m regardless of altp2m_active() (hence the change) - which now no longer happens. I got carried away with this change. The fact that this is so easy to get wrong is the reason why I've preferred the domain_pause() solution. There appears to be no clean way to fix this otherwise, and if this is so easy to misunderstand it'll break just as easily with further changes. I suppose I could just pass the bool along to p2m_get_altp2m() (and indeed remove the if())... I think the best that can be done here is to check if altp2m_active() early in p2m_switch_vcpu_altp2m_by_id() and p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since these are only called by HVMOP_altp2m_* (and thus serialized by the domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state. I'm confused: Where do you see the domain lock used there? Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for any HVMOP_altp2m_* at all. One of the actual callers is guarded by altp2m_active(), but the other isn't. do_altp2m_op() does d = rcu_lock_domain_by_any_id(a.domain);, and unlocks it before it exits. But that's not the "domain lock". This is just making sure the domain won't disappear behind your back. But you're right, p2m_switch_vcpu_altp2m_by_id() is not called for any HVMOP_altp2m_*, I've misread that. Hence, I believe both callers of p2m_switch_vcpu_altp2m_by_id() may race with HVMOP_altp2m_*.
Would you like me to add the altp2m_active() check in both p2m_switch_domain_altp2m_by_id() and p2m_switch_vcpu_altp2m_by_id(), and make it harder to race (it still won't be impossible, since the bool may become false between the check and the actual function logic for p2m_switch_vcpu_altp2m_by_id(), as you've noticed)? Well, altp2m being Tech Preview, any partial fix _could_ do in principle. But personally I dislike such half hearted approaches, and hence I'd recommend to address the issue in full, i.e. eliminate race potential altogether. In the end you're going to be bitten more than me by not getting this into proper shape. I'll attempt a V4 doing my best to check altp2m_active() then, and see if that's at least satisfactory for sane use-cases. Thanks, Razvan
Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/11/19 6:59 PM, Jan Beulich wrote: >>> Thanks for noticing, actually this appears to invalidate the whole >>> purpose of the patch (I should have tested this more before sumbitting >>> V3, sorry). >>> >>> The whole point of the new boolean is to have p2m assigned an altp2m >>> regardless of altp2m_active() (hence the change) - which now no longer >>> happens. I got carried away with this change. >>> >>> The fact that this is so easy to get wrong is the reason why I've >>> preferred the domain_pause() solution. There appears to be no clean way >>> to fix this otherwise, and if this is so easy to misunderstand it'll >>> break just as easily with further changes. >>> >>> I suppose I could just pass the bool along to p2m_get_altp2m() (and >>> indeed remove the if())... >> >> I think the best that can be done here is to check if altp2m_active() >> early in p2m_switch_vcpu_altp2m_by_id() and >> p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since >> these are only called by HVMOP_altp2m_* (and thus serialized by the >> domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state. > > I'm confused: Where do you see the domain lock used there? > Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for > any HVMOP_altp2m_* at all. One of the actual callers is guarded > by altp2m_active(), but the other isn't. do_altp2m_op() does d = rcu_lock_domain_by_any_id(a.domain);, and unlocks it before it exits. But you're right, p2m_switch_vcpu_altp2m_by_id() is not called for any HVMOP_altp2m_*, I've misread that. Hence, I believe both callers ofp2m_switch_vcpu_altp2m_by_id() may race with HVMOP_altp2m_*. Would you like me to add the altp2m_active() check in both p2m_switch_domain_altp2m_by_id() and p2m_switch_vcpu_altp2m_by_id(), and make it harder to race (it still won't be impossible, since the bool may become false between the check and the actual function logic for p2m_switch_vcpu_altp2m_by_id(), as you've noticed)? 
>> I see no other way out of this (aside from the domain_pause() fix). > > If only that one would have been a complete fix, rather than just a > partial one. Agreed, but that one at least clearly fixes the external case, whereas this doesn't seem to cover all corner cases for any situation. Thanks, Razvan
Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/11/19 12:57 PM, Razvan Cojocaru wrote:
On 2/11/19 12:10 PM, Jan Beulich wrote:
On 11.02.19 at 10:13, wrote:

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
     return !!cpu_has_monitor_trap_flag;
 }

-static void vmx_vcpu_update_eptp(struct vcpu *v)
+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
 {
     struct domain *d = v->domain;
     struct p2m_domain *p2m = NULL;
     struct ept_data *ept;

-    if ( altp2m_active(d) )
+    if ( altp2m_enabled )
         p2m = p2m_get_altp2m(v);
     if ( !p2m )
         p2m = p2m_get_hostp2m(d);

With the change you now make to p2m_get_altp2m(), this looks to be a benign change. Which to me would suggest to either leave the code alone, or to drop the if() (but - again - not its body) altogether. At which point the code could be further streamlined, as then the NULL initializer can go away and the assignment (or then perhaps initializer) could become "p2m = p2m_get_altp2m(v) ?: p2m_get_hostp2m(d)". (Generally I'd recommend to leave out the change here, and do the transformation in a follow-on patch.)

Thanks for noticing, actually this appears to invalidate the whole purpose of the patch (I should have tested this more before submitting V3, sorry).

The whole point of the new boolean is to have p2m assigned an altp2m regardless of altp2m_active() (hence the change) - which now no longer happens. I got carried away with this change.

The fact that this is so easy to get wrong is the reason why I've preferred the domain_pause() solution. There appears to be no clean way to fix this otherwise, and if this is so easy to misunderstand it'll break just as easily with further changes.

I suppose I could just pass the bool along to p2m_get_altp2m() (and indeed remove the if())...

I think the best that can be done here is to check if altp2m_active() early in p2m_switch_vcpu_altp2m_by_id() and p2m_switch_domain_altp2m_by_id(), then bail if it's not active.
Since these are only called by HVMOP_altp2m_* (and thus serialized by the domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state.

This of course means reverting p2m_get_altp2m() to its original non-intuitive state of returning a valid altp2m pointer even when altp2m_active() is false.

I see no other way out of this (aside from the domain_pause() fix).

Thanks,
Razvan
Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/11/19 12:10 PM, Jan Beulich wrote:
On 11.02.19 at 10:13, wrote:

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
     return !!cpu_has_monitor_trap_flag;
 }

-static void vmx_vcpu_update_eptp(struct vcpu *v)
+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
 {
     struct domain *d = v->domain;
     struct p2m_domain *p2m = NULL;
     struct ept_data *ept;

-    if ( altp2m_active(d) )
+    if ( altp2m_enabled )
         p2m = p2m_get_altp2m(v);
     if ( !p2m )
         p2m = p2m_get_hostp2m(d);

With the change you now make to p2m_get_altp2m(), this looks to be a benign change. Which to me would suggest to either leave the code alone, or to drop the if() (but - again - not its body) altogether. At which point the code could be further streamlined, as then the NULL initializer can go away and the assignment (or then perhaps initializer) could become "p2m = p2m_get_altp2m(v) ?: p2m_get_hostp2m(d)". (Generally I'd recommend to leave out the change here, and do the transformation in a follow-on patch.)

Thanks for noticing, actually this appears to invalidate the whole purpose of the patch (I should have tested this more before submitting V3, sorry).

The whole point of the new boolean is to have p2m assigned an altp2m regardless of altp2m_active() (hence the change) - which now no longer happens. I got carried away with this change.

The fact that this is so easy to get wrong is the reason why I've preferred the domain_pause() solution. There appears to be no clean way to fix this otherwise, and if this is so easy to misunderstand it'll break just as easily with further changes.

I suppose I could just pass the bool along to p2m_get_altp2m() (and indeed remove the if())...

Thanks,
Razvan
[Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably on purpose (as it was originally supposed to cater to an in-guest agent, and a domain pausing itself is not a good idea). This can lead to domain crashes in the vmx_vmexit_handler() code that checks if the guest has the ability to switch EPTP without an exit. That code can __vmread() the host p2m's EPT_POINTER (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a chance to run altp2m_vcpu_initialise(), but after d->arch.altp2m_active is set).

This patch reorganizes the code so that d->arch.altp2m_active is set to true only after all the init work has been done, and to false before the uninit work begins. This required adding a new bool parameter to altp2m_vcpu_update_p2m(), which previously relied on d->arch.altp2m_active being set before it's called. p2m_get_altp2m() now returns NULL if !altp2m_active(), to prevent it from returning a "valid" altp2m pointer between the moment where d->arch.altp2m_active = false and the point when v->p2midx actually becomes INVALID_ALTP2M. While at it, I've changed a couple of bool_t's to bool's.

Signed-off-by: Razvan Cojocaru

---
Changes since V2:
 - Removed leftover asm-x86/altp2m.h changes.
 - nstate = !!a.u.domain_state.state; becomes nstate = a.u.domain_state.state;
 - Removed the if() and else in do_altp2m_op() as recommended by Jan.
 - Using true explicitly instead of altp2m_active(d) for altp2m_vcpu_update_p2m() in p2m_switch_vcpu_altp2m_by_id() and p2m_switch_domain_altp2m_by_id().
 - Updated patch description.
 - Modified p2m_get_altp2m() to return NULL if !altp2m_active().
---
 xen/arch/x86/hvm/hvm.c        | 16 +---
 xen/arch/x86/hvm/vmx/vmx.c    |  4 ++--
 xen/arch/x86/mm/altp2m.c      |  4 ++--
 xen/arch/x86/mm/p2m.c         |  4 ++--
 xen/include/asm-x86/domain.h  |  2 +-
 xen/include/asm-x86/hvm/hvm.h |  6 +++---
 xen/include/asm-x86/p2m.h     |  8 +++-
 7 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21944e9..50d896d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4525,7 +4525,7 @@ static int do_altp2m_op(
     case HVMOP_altp2m_set_domain_state:
     {
         struct vcpu *v;
-        bool_t ostate;
+        bool ostate, nstate;

         if ( nestedhvm_enabled(d) )
         {
@@ -4534,12 +4534,15 @@ static int do_altp2m_op(
         }

         ostate = d->arch.altp2m_active;
-        d->arch.altp2m_active = !!a.u.domain_state.state;
+        nstate = a.u.domain_state.state;

         /* If the alternate p2m state has changed, handle appropriately */
-        if ( d->arch.altp2m_active != ostate &&
+        if ( nstate != ostate &&
              (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
         {
+            /* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
+            d->arch.altp2m_active = false;
+
             for_each_vcpu( d, v )
             {
                 if ( !ostate )
@@ -4550,7 +4553,14 @@ static int do_altp2m_op(

             if ( ostate )
                 p2m_flush_altp2m(d);
+
+            /*
+             * Wait until altp2m_vcpu_initialise() is done before marking
+             * altp2m as being enabled for the domain.
+             */
+            d->arch.altp2m_active = nstate;
         }
+
         break;
     }

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 64af8bf..e542568 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
     return !!cpu_has_monitor_trap_flag;
 }

-static void vmx_vcpu_update_eptp(struct vcpu *v)
+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
 {
     struct domain *d = v->domain;
     struct p2m_domain *p2m = NULL;
     struct ept_data *ept;

-    if ( altp2m_active(d) )
+    if ( altp2m_enabled )
         p2m = p2m_get_altp2m(v);
     if ( !p2m )
         p2m = p2m_get_hostp2m(d);

diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c
index 930bdc2..c51303b 100644
--- a/xen/arch/x86/mm/altp2m.c
+++ b/xen/arch/x86/mm/altp2m.c
@@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v)
     vcpu_altp2m(v).p2midx = 0;
     atomic_inc(&p2m_get_altp2m(v)->active_vcpus);

-    altp2m_vcpu_update_p2m(v);
+    altp2m_vcpu_update_p2m(v, true);

     if ( v != current )
         vcpu_unpause(v);
@@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v)

     altp2m_vcpu_reset(v);

-    altp2m_vcpu_update_p2m(v);
+    altp2m_vcpu_update_p2m(v, false);
     altp2m_vcpu_update_vmfunc_ve(v);

     if ( v != current )

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57..6f991f8 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx
Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
Right, I now understand what you meant by removing "if ( ostate )" - you'd like d->arch.altp2m_active to only be set _after_ the for, for the enable, as well as for the disable case.

No, certainly not. Exactly because of ...

However, I wanted to keep both if()s to avoid another problem with this code: [...] So for the disable case, I wanted altp2m_active(v->domain) to return false immediately so that this code won't be triggered. Otherwise, assuming the following scenario: ...

... this. Apparently "and keep what is now the body" was still not clear enough. Taking your original code, I mean

if ( nstate != ostate &&
     (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
{
    /* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
    d->arch.altp2m_active = false;

    for_each_vcpu( d, v )
        [...]

    if ( ostate )
        p2m_flush_altp2m(d);

    /*
     * Wait until altp2m_vcpu_initialise() is done before marking
     * altp2m as being enabled for the domain.
     */
    d->arch.altp2m_active = nstate;
}

Ah! Thanks for the clarification.

Now, you're right that p2m_get_altp2m() may hand over a pointer to an altp2m between the moment where d->arch.altp2m_active is false and the point where v->p2midx actually becomes INVALID_ALTP2M with my code; but I think it can be argued that I should rather fix p2m_get_altp2m(), to return NULL if altp2m_active() == false and keep the if()s (which I've hopefully been more eloquent about now).

Yes, that looks to be an option, provided it doesn't violate assumptions elsewhere.

It doesn't, AFAICT. Additionally, it can be argued that if code relies on p2m_get_altp2m() returning not-NULL when !altp2m_active(), said code is broken.

Thanks,
Razvan
Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/8/19 6:44 PM, Jan Beulich wrote: On 08.02.19 at 17:30, wrote: >> On 2/8/19 5:50 PM, Jan Beulich wrote: >> On 08.02.19 at 15:00, wrote: /* If the alternate p2m state has changed, handle appropriately */ -if ( d->arch.altp2m_active != ostate && +if ( nstate != ostate && (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) ) { +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */ +if ( ostate ) +d->arch.altp2m_active = false; >>> >>> Why the if()? In the opposite case you'd simply write false into >>> what already holds false. >> >> The value written into d->arch.altp2m_active is not the point here. The >> point is that if ( ostate ), then we are disabling altp2m (because the >> if above this one makes sure ostate != nstate). >> >> So in the disable case, first I wanted to set d->arch.altp2m_active to >> false (which immediately causes altp2m_active(d) to return false as >> well), and then actually perform the work. > > Sure - I'm not asking to remove everything above, just the if() > (and keep what is now the body). That is because > - in the enable case the value is false and writing false into it is > benign, > - in the disable case you want to actively change from true to > false. > @@ -4550,7 +4554,14 @@ static int do_altp2m_op( if ( ostate ) p2m_flush_altp2m(d); +else +/* + * Wait until altp2m_vcpu_initialise() is done before marking + * altp2m as being enabled for the domain. + */ +d->arch.altp2m_active = true; >>> >>> Similarly here you could omit the "else" and simply store "nstate" afaict. >> >> As above, in the enable case I wanted to first setup altp2m on all VCPUs >> with altp2m_vcpu_initialise(), and only after that set >> d->arch.altp2m_active = true. >> >> In summary, if ( ostate ) we want to set d->arch.altp2m_active before >> the for (we're disabling altp2m), and if ( !ostate ) (which is the else >> above) we want to set d->arch.altp2m_active after said for. >> >> We can indeed store nstate. 
I just thought things look clearer with >> "true" and "false", but if you prefer there's no problem assigning nstate. > > Well, as always with mm and altp2m code, I'm not the maintainer, so > I don't have the final say. It's just that I think unnecessary conditionals > would better be avoided, even if they're not on a performance critical > path. > --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) return !!cpu_has_monitor_trap_flag; } -static void vmx_vcpu_update_eptp(struct vcpu *v) +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) { struct domain *d = v->domain; struct p2m_domain *p2m = NULL; struct ept_data *ept; -if ( altp2m_active(d) ) +if ( altp2m_enabled ) p2m = p2m_get_altp2m(v); if ( !p2m ) p2m = p2m_get_hostp2m(d); >>> >>> Is this an appropriate transformation? That is, can there not be >>> any domains for which altp2m_active() returns false despite >>> altp2m_enabled being true? (Looking at p2m_get_altp2m() I can't >>> really judge whether index would always be INVALID_ALTP2M in >>> such a case.) >> >> Yes, it should be completely safe (please see below for details). >> --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx) atomic_dec(&p2m_get_altp2m(v)->active_vcpus); vcpu_altp2m(v).p2midx = idx; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, altp2m_active(d)); } rc = 1; } @@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx) atomic_dec(&p2m_get_altp2m(v)->active_vcpus); vcpu_altp2m(v).p2midx = idx; atomic_inc(&p2m_get_altp2m(v)->active_vcpus); -altp2m_vcpu_update_p2m(v); +altp2m_vcpu_update_p2m(v, altp2m_active(d)); } >>> >>> In both cases, isn't altp2m_active() going to return true anyway >>> when we get there (related to the question above)? 
>> >> Yes, it will return true. In order for it to return false, >> altp2m_vcpu_destroy() would have had to run on that VCPU, which (among >> other things) calls altp2m_vcpu_reset(), which does v->p2midx = >> INVALID_ALTP2M. > > Hmm, according to the change further up in this patch, altp2m_active() >
Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race
On 2/8/19 5:50 PM, Jan Beulich wrote: On 08.02.19 at 15:00, wrote: >> --- a/xen/arch/x86/hvm/hvm.c >> +++ b/xen/arch/x86/hvm/hvm.c >> @@ -4525,7 +4525,7 @@ static int do_altp2m_op( >> case HVMOP_altp2m_set_domain_state: >> { >> struct vcpu *v; >> -bool_t ostate; >> +bool ostate, nstate; >> >> if ( nestedhvm_enabled(d) ) >> { >> @@ -4534,12 +4534,16 @@ static int do_altp2m_op( >> } >> >> ostate = d->arch.altp2m_active; >> -d->arch.altp2m_active = !!a.u.domain_state.state; >> +nstate = !!a.u.domain_state.state; > > No need for !! here. I'll remove it. >> /* If the alternate p2m state has changed, handle appropriately */ >> -if ( d->arch.altp2m_active != ostate && >> +if ( nstate != ostate && >> (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) ) >> { >> +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */ >> +if ( ostate ) >> +d->arch.altp2m_active = false; > > Why the if()? In the opposite case you'd simply write false into > what already holds false. The value written into d->arch.altp2m_active is not the point here. The point is that if ( ostate ), then we are disabling altp2m (because the if above this one makes sure ostate != nstate). So in the disable case, first I wanted to set d->arch.altp2m_active to false (which immediately causes altp2m_active(d) to return false as well), and then actually perform the work. >> @@ -4550,7 +4554,14 @@ static int do_altp2m_op( >> >> if ( ostate ) >> p2m_flush_altp2m(d); >> +else >> +/* >> + * Wait until altp2m_vcpu_initialise() is done before >> marking >> + * altp2m as being enabled for the domain. >> + */ >> +d->arch.altp2m_active = true; > > Similarly here you could omit the "else" and simply store "nstate" afaict. As above, in the enable case I wanted to first setup altp2m on all VCPUs with altp2m_vcpu_initialise(), and only after that set d->arch.altp2m_active = true. 
In summary, if ( ostate ) we want to set d->arch.altp2m_active before the for (we're disabling altp2m), and if ( !ostate ) (which is the else above) we want to set d->arch.altp2m_active after said for. We can indeed store nstate. I just thought things look clearer with "true" and "false", but if you prefer there's no problem assigning nstate. >> --- a/xen/arch/x86/hvm/vmx/vmx.c >> +++ b/xen/arch/x86/hvm/vmx/vmx.c >> @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void) >> return !!cpu_has_monitor_trap_flag; >> } >> >> -static void vmx_vcpu_update_eptp(struct vcpu *v) >> +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled) >> { >> struct domain *d = v->domain; >> struct p2m_domain *p2m = NULL; >> struct ept_data *ept; >> >> -if ( altp2m_active(d) ) >> +if ( altp2m_enabled ) >> p2m = p2m_get_altp2m(v); >> if ( !p2m ) >> p2m = p2m_get_hostp2m(d); > > Is this an appropriate transformation? That is, can there not be > any domains for which altp2m_active() returns false despite > altp2m_enabled being true? (Looking at p2m_get_altp2m() I can't > really judge whether index would always be INVALID_ALTP2M in > such a case.) Yes, it should be completely safe (please see below for details). 
>> --- a/xen/arch/x86/mm/p2m.c >> +++ b/xen/arch/x86/mm/p2m.c >> @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, >> unsigned int idx) >> atomic_dec(&p2m_get_altp2m(v)->active_vcpus); >> vcpu_altp2m(v).p2midx = idx; >> atomic_inc(&p2m_get_altp2m(v)->active_vcpus); >> -altp2m_vcpu_update_p2m(v); >> +altp2m_vcpu_update_p2m(v, altp2m_active(d)); >> } >> rc = 1; >> } >> @@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, >> unsigned int idx) >> atomic_dec(&p2m_get_altp2m(v)->active_vcpus); >> vcpu_altp2m(v).p2midx = idx; >> atomic_inc(&p2m_get_altp2m(v)->active_vcpus); >> -altp2m_vcpu_update_p2m(v); >> +altp2m_vcpu_update_p2m(v, altp2m_active(d)); >> } > > In both cases, isn't altp2m_active() going to return true anyway > when we get there (related to the question above)? Yes, it will return true. In order for it to return false, altp2m_vcpu_destroy() would have had to run on that VCPU, which (among other things) calls altp2m_vcpu_reset(), which does v->p2midx = INVALID_ALTP2M. There's an if() above (not shown in your reply) that says "if ( idx != vcpu_altp2m(v).p2midx )", so, indeed, by the time we end up here we can reasonably assume that altp2m_active(d) will return true. I've just put in "altp2m_active(d)" to make sure there's absolutely no functional change here, but AFA