Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events

2020-06-08 Thread Razvan Cojocaru
On 6/8/20 11:44 PM, Tamas K Lengyel wrote:
> On Mon, Jun 8, 2020 at 2:14 PM Razvan Cojocaru
>  wrote:
>>
>> On 6/8/20 10:54 PM, Tamas K Lengyel wrote:
>>> On Mon, Jun 8, 2020 at 12:58 PM Razvan Cojocaru
>>>> And last but not least, the proper sequence is: 1. unsubscribe from
>>>> register write events, 2. process all events "still in the chamber"
>>>> (keep checking the ring buffer for a while), 3. detach from the guest
>>>> (disable the vm_event subsystem). Not ideal perhaps (in that it's not
>>>> guaranteed that a VCPU won't resume after a longer period than our
>>>> timeout), but if the sequence is followed there should be no guest hangs
>>>> or crashes (at least none that we or our clients have observed so far).
>>>
>>> Incorrect. That's not enough. You also have to wait for all the vCPUs
>>> to get scheduled before disabling vm_event or otherwise the emulation
>>> is skipped entirely. Please read the patch message. If the user
>>> decides to disable vm_event after getting a CR3 event delivered, the
>>> CR3 never gets updated and results in the guest crashing in
>>> unpredictable ways. Same happens with all the other registers.
>>
>> I did read the patch message. As I've previously stated ("it's not
>> guaranteed that a VCPU won't resume after a longer period than our
>> timeout"), it's not ideal, and I've made no claim that the problem does
>> not exist or that it shouldn't be fixed - but really if you've got a
>> VCPU that will wait more than a couple of seconds to get scheduled then
>> you've got a bigger problem with the VM.
> 
> Sorry, missed the part where you say you knew about this issue. We
> didn't and it was absolutely not documented anywhere and certainly not
> mentioned in the original patch that added the feature. It literally
> took us years to figure out what the hell is going on. While as you it
> can be a couple seconds before its safe to disable it can be a lot
> longer depending on the system state, how many VMs are running and how
> many vCPUs are active on the VM. There is absolutely necessary
> use-cases where you want to keep the VM paused after a CR3 event is
> received and exit vm_event afterwards. This bug totally blocks those
> use-cases. Not to mention that it's a total mess having an interface
> where the user has to guess when its safe to do something. If this was
> pointed out when the patch was made I would have objected to that
> being merged.

No, we didn't know about the issue. It's apparently not my most eloquent
evening. I was merely saying that I did understand what the issue was
from your description, and offering an explanation on why we've never
seen this in QA or production. Of course the issue would have been
mentioned at least (but ideally not exist to begin with) had it been known.

We do take several passes through the ring buffer (and as a side-effect
reduce the probability of a race occuring to almost zero), but we do
that to make sure we've cleaned up all EPT vm_events; the fact that it
has helped with _this_ issue is a coincidence.

IIRC Windows, at least, will be upset if a VCPU is stuck for too long.

As for how the vm_event system behaves:

1. A security application that is unable to _prevent_ things being done
to a system is not doing a very good job, since in that case you can
only collect stats and not veto anything. I would argue that the default
for such a monitoring application should be the current one (all events
should be pre-action).

2. This is further supported by the fact that that's exactly how the EPT
vm_events work: you get a "I want to write to this page" event _before_
the write occurs. If register writes behave differently, you have _two_
different models: one where you get an event post-action, and one where
you get one pre-action.


Hope that's clearer,
Razvan



Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events

2020-06-08 Thread Razvan Cojocaru
On 6/9/20 1:50 AM, Tamas K Lengyel wrote:
> On Mon, Jun 8, 2020 at 3:16 PM Razvan Cojocaru
>  wrote:
>> 1. A security application that is unable to _prevent_ things being done
>> to a system is not doing a very good job, since in that case you can
>> only collect stats and not veto anything. I would argue that the default
>> for such a monitoring application should be the current one (all events
>> should be pre-action).
> 
> Not all security applications require this though. Malware analysis
> where stealth is required would absolutely not want this side-effect
> to be visible to the guest where malware could use it to determine
> that it's being monitored. So I don't buy into this argument.

Fair enough, in that case having both models supported should be fine.
I'll leave the rest of that conversation to my colleagues.

>> 2. This is further supported by the fact that that's exactly how the EPT
>> vm_events work: you get a "I want to write to this page" event _before_
>> the write occurs. If register writes behave differently, you have _two_
>> different models: one where you get an event post-action, and one where
>> you get one pre-action.
> 
> Whether you get an event before or after the effects of the event have
> been applied to the system state shouldn't matter as long as you can
> revert that action. I wouldn't care either way to be the default. But
> having a default that breaks other use-cases is unacceptable.

You keep saying that as if I disagree. :) But we've already established
that the potential for a race condition has been found and needs to be
fixed.

My only (minor) objection has been that a patch fixing the current model
would have been preferable to one that switches the default as a
workaround. Still, it's understandable that perhaps there's no time or
motivation for that.


Thanks for the work,
Razvan



Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events

2020-06-08 Thread Razvan Cojocaru
On 6/8/20 6:55 PM, Jan Beulich wrote:
> On 03.06.2020 17:07, Roger Pau Monné wrote:
>> On Wed, Jun 03, 2020 at 06:52:37AM -0600, Tamas K Lengyel wrote:
>>> For the last couple years we have received numerous reports from users of
>>> monitor vm_events of spurious guest crashes when using events. In 
>>> particular,
>>> it has observed that the problem occurs when vm_events are being disabled. 
>>> The
>>> nature of the guest crash varied widely and has only occured occasionally. 
>>> This
>>> made debugging the issue particularly hard. We had discussions about this 
>>> issue
>>> even here on the xen-devel mailinglist with no luck figuring it out.
>>>
>>> The bug has now been identified as a race-condition between register event
>>> handling and disabling the monitor vm_event interface. The default behavior
>>> regarding emulation of register write events is changed so that they get
>>> postponed until the corresponding vm_event handler decides whether to allow 
>>> such
>>> write to take place. Unfortunately this can only be implemented by 
>>> performing the
>>> deny/allow step when the vCPU gets scheduled.
>>>
>>> Due to that postponed emulation of the event if the user decides to pause 
>>> the
>>> VM in the vm_event handler and then disable events, the entire emulation 
>>> step
>>> is skipped the next time the vCPU is resumed. Even if the user doesn't pause
>>> during the vm_event handling but exits immediately and disables vm_event, 
>>> the
>>> situation becomes racey as disabling vm_event may succeed before the guest's
>>> vCPUs get scheduled with the pending emulation task. This has been 
>>> particularly
>>> the case with VMS that have several vCPUs as after the VM is unpaused it may
>>> actually take a long time before all vCPUs get scheduled.
>>>
>>> In this patch we are reverting the default behavior to always perform 
>>> emulation
>>> of register write events when the event occurs. To postpone them can be 
>>> turned
>>> on as an option. In that case the user of the interface still has to take 
>>> care
>>> of only disabling the interface when its safe as it remains buggy.
>>>
>>> Fixes: 96760e2fba10 ('vm_event: deny register writes if refused by vm_event
>>> reply').
>>>
>>> Signed-off-by: Tamas K Lengyel 
>>
>> Thanks!
>>
>> Reviewed-by: Roger Pau Monné 
>>
>> I would like to get some input from Bitdefender really, and whether
>> they are fine with this approach.

Hello,

Not really my call to make anymore, but I do have a few notes.

First, IIRC the problem stems from the initial choice to have the
vm_event data allocated on-demand when first subscribing to events. The
proper solution (since this patch doesn't actually fix the problem),
IMHO, would be for the vm_event data to _always_ exist, and instead of
relying on the value of its pointer to check if there are event
subscribers, we could just check the emulation flags individually and
never miss a pending emulated something again. I did try to go that way
in the beginning, but it has reasonably been objected that we should cut
back on using hypervisor memory unnecessarily, hence we got at this point.

Secondly, I see no reason why we couldn't adapt to the new default
behaviour provided that the old behaviour continues to work _exactly_ as
before.

And last but not least, the proper sequence is: 1. unsubscribe from
register write events, 2. process all events "still in the chamber"
(keep checking the ring buffer for a while), 3. detach from the guest
(disable the vm_event subsystem). Not ideal perhaps (in that it's not
guaranteed that a VCPU won't resume after a longer period than our
timeout), but if the sequence is followed there should be no guest hangs
or crashes (at least none that we or our clients have observed so far).

So in short, I think there's a better fix for this by simply not
allocating the vm_event memory on-demand anymore and never having to
deal with lost pending emulations again. It should also decrease code
complexity by a tiny bit. Then again, as stated at the beginning of this
message, that's just a recommendation.


HTH,
Razvan



Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events

2020-06-08 Thread Razvan Cojocaru
On 6/8/20 10:54 PM, Tamas K Lengyel wrote:
> On Mon, Jun 8, 2020 at 12:58 PM Razvan Cojocaru
>> And last but not least, the proper sequence is: 1. unsubscribe from
>> register write events, 2. process all events "still in the chamber"
>> (keep checking the ring buffer for a while), 3. detach from the guest
>> (disable the vm_event subsystem). Not ideal perhaps (in that it's not
>> guaranteed that a VCPU won't resume after a longer period than our
>> timeout), but if the sequence is followed there should be no guest hangs
>> or crashes (at least none that we or our clients have observed so far).
> 
> Incorrect. That's not enough. You also have to wait for all the vCPUs
> to get scheduled before disabling vm_event or otherwise the emulation
> is skipped entirely. Please read the patch message. If the user
> decides to disable vm_event after getting a CR3 event delivered, the
> CR3 never gets updated and results in the guest crashing in
> unpredictable ways. Same happens with all the other registers.

I did read the patch message. As I've previously stated ("it's not
guaranteed that a VCPU won't resume after a longer period than our
timeout"), it's not ideal, and I've made no claim that the problem does
not exist or that it shouldn't be fixed - but really if you've got a
VCPU that will wait more than a couple of seconds to get scheduled then
you've got a bigger problem with the VM.

>> So in short, I think there's a better fix for this by simply not 
>> allocating the vm_event memory on-demand anymore and never having to
>> deal with lost pending emulations again. It should also decrease code
>> complexity by a tiny bit. Then again, as stated at the beginning of this
>> message, that's just a recommendation.is
> 
> Since only you guys use this feature I'm going to wait for a fix.
> Until then, the default behavior should be restored so this buggy
> behavior doesn't affect anyone else. You can still turn it on, its
> just not going to be on for vm_event by default. I don't particularly
> care what fix is there since only you guys use it. If you don't mind
> that there is this buggy behavior because you never disable vm_events
> once you activate it then that's that.

Indeed, our usual scenario is that vm_event is always on on the machine.
It's only rarely being disabled while the guest is running, and when it
is, it's always with waiting sufficiently long that we've not seen any
unexplainable hung VMs.

Fair enough, as long as the previous behaviour is optionally available I
see no reason why this patch shouldn't make it in - though, of course,
as also previously stated, I'm just trying to shed as much light as
possible on this from what I remember, and what happens next is not my
call. My colleagues should be able to chime in tomorrow.


Cheers,
Razvan



Re: [Xen-devel] [PATCH] Remove myself as vm_event maintainer

2019-12-10 Thread Razvan COJOCARU


On 12/10/19 11:57 AM, Jan Beulich wrote:
> On 08.12.2019 11:07, Razvan Cojocaru wrote:
>> ---
>>   MAINTAINERS | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 9c827ad759..012c847ebd 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -428,7 +428,6 @@ L:   xen-devel@lists.xenproject.org
>>   F: unmodified_drivers/linux-2.6/
>>   
>>   VM EVENT, MEM ACCESS and MONITOR
>> -M:  Razvan Cojocaru 
>>   M: Tamas K Lengyel 
>>   R: Alexandru Isaila 
>>   R: Petre Pircalabu 
> 
> No matter the contents, I guess this still needs an S-o-b of yours.

Re-sent, sorry for the inconvenience.


Thanks,
Razvan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH V2] Remove myself as vm_event maintainer

2019-12-10 Thread Razvan Cojocaru
Signed-off-by: Razvan Cojocaru 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9c827ad759..012c847ebd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -428,7 +428,6 @@ L:  xen-devel@lists.xenproject.org
 F: unmodified_drivers/linux-2.6/
 
 VM EVENT, MEM ACCESS and MONITOR
-M: Razvan Cojocaru 
 M: Tamas K Lengyel 
 R: Alexandru Isaila 
 R: Petre Pircalabu 
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] Remove myself as vm_event maintainer

2019-12-08 Thread Razvan Cojocaru
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9c827ad759..012c847ebd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -428,7 +428,6 @@ L:  xen-devel@lists.xenproject.org
 F: unmodified_drivers/linux-2.6/
 
 VM EVENT, MEM ACCESS and MONITOR
-M: Razvan Cojocaru 
 M: Tamas K Lengyel 
 R: Alexandru Isaila 
 R: Petre Pircalabu 
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses

2019-11-28 Thread Razvan COJOCARU
On 11/28/19 1:44 PM, Andrew Cooper wrote:
> c/s d0a699a389f1 "x86/monitor: add support for descriptor access events"
> introduced logic looking for what appeared to be exitinfo (not that this
> exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring
> information.  There is never any IDT vectoring involved in these intercepts so
> the value passed is always zero.
> 
> In fact, SVM doesn't provide any information, even in exitinfo1 and 2.  Note
> the error in the public API and state that this field is always 0, and drop
> the SVM logic in hvm_monitor_descriptor_access().
> 
> In the SVM vmexit handler itself, optimise the switch statement by observing
> that there is a linear transformation between the SVM exit_reason and
> VM_EVENT_DESC_* values.  (Bloat-o-meter reports 6028 => 5877 for a saving of
> 151 bytes).
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Razvan Cojocaru 
> CC: Tamas K Lengyel 
> CC: Alexandru Isaila 
> CC: Petre Pircalabu 
> CC: Adrian Pop 
> 
> Adrian: Do you recall what information you were attempting to forward from the
> VMCB?  I can't locate anything which would plausibly be interesting.

I think it's safe to go the route you're going (you shouldn't break 
anything).

Acked-by: Razvan Cojocaru 

(with or without addressing Tamas' comments).


Thanks,
Razvan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v13] x86/emulate: Send vm_event from emulate

2019-09-23 Thread Razvan COJOCARU
On 9/23/19 3:05 PM, Alexandru Stefan ISAILA wrote:
> A/D bit writes (on page walks) can be considered benign by an introspection
> agent, so receiving vm_events for them is a pessimization. We try here to
> optimize by filtering these events out.
> Currently, we are fully emulating the instruction at RIP when the hardware 
> sees
> an EPT fault with npfec.kind != npfec_kind_with_gla. This is, however,
> incorrect, because the instruction at RIP might legitimately cause an
> EPT fault of its own while accessing a_different_  page from the original one,
> where A/D were set.
> The solution is to perform the whole emulation, while ignoring EPT 
> restrictions
> for the walk part, and taking them into account for the "actual" emulating of
> the instruction at RIP. When we send out a vm_event, we don't want the 
> emulation
> to complete, since in that case we won't be able to veto whatever it is doing.
> That would mean that we can't actually prevent any malicious activity, instead
> we'd only be able to report on it.
> When we see a "send-vm_event" case while emulating, we need to first send the
> event out and then suspend the emulation (return X86EMUL_RETRY).
> After the emulation stops we'll call hvm_vm_event_do_resume() again after the
> introspection agent treats the event and resumes the guest. There, the
> instruction at RIP will be fully emulated (with the EPT ignored) if the
> introspection application allows it, and the guest will continue to run past
> the instruction.
> 
> A common example is if the hardware exits because of an EPT fault caused by a
> page walk, p2m_mem_access_check() decides if it is going to send a vm_event.
> If the vm_event was sent and it would be treated so it runs the instruction
> at RIP, that instruction might also hit a protected page and provoke a 
> vm_event.
> 
> Now if npfec.kind == npfec_kind_in_gpt and 
> d->arch.monitor.inguest_pagefault_disabled
> is true then we are in the page walk case and we can do this emulation 
> optimization
> and emulate the page walk while ignoring the EPT, but don't ignore the EPT 
> for the
> emulation of the actual instruction.
> 
> In the first case we would have 2 EPT events, in the second case we would have
> 1 EPT event if the instruction at the RIP triggers an EPT event.
> 
> We use hvmemul_map_linear_addr() to intercept write access and
> __hvm_copy() to intercept exec, read and write access.
> 
> A new return type was added, HVMTRANS_need_retry, in order to have all
> the places that consume HVMTRANS* return X86EMUL_RETRY.
> 
> hvm_emulate_send_vm_event() can return false if there was no violation,
> if there was an error from monitor_traps() or p2m_get_mem_access().
> -ESRCH from p2m_get_mem_access() is treated as restricted access.
> 
> NOTE: hvm_emulate_send_vm_event() assumes the caller will enable/disable
> arch.vm_event->send_event
> 
> Signed-off-by: Alexandru Isaila

FWIW, Acked-by: Razvan Cojocaru 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] SVM: correct CPUID event processing

2019-09-19 Thread Razvan COJOCARU
On 9/19/19 11:07 PM, Boris Ostrovsky wrote:
> On 9/19/19 6:37 AM, Jan Beulich wrote:
>> hvm_monitor_cpuid() expects the input registers, not two of the outputs.
>>
>> However, once having made the necessary adjustment, the SVM and VMX
>> functions are so similar that they should be folded (thus avoiding
>> further similar asymmetries to get introduced). Use the best of both
>> worlds by e.g. using "curr" consistently. This then being the only
>> caller of hvm_check_cpuid_faulting(), fold in that function as well.
>>
>> Signed-off-by: Jan Beulich 
> 
> Reviewed-by: Boris Ostrovsky 

Acked-by: Razvan Cojocaru 


Thanks,
Razvan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v10] x86/emulate: Send vm_event from emulate

2019-09-17 Thread Razvan COJOCARU
On 9/17/19 6:09 PM, Tamas K Lengyel wrote:
> On Tue, Sep 17, 2019 at 8:24 AM Razvan Cojocaru
>  wrote:
>>
>> On 9/17/19 5:11 PM, Alexandru Stefan ISAILA wrote:
>>>>>>> +bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
>>>>>>> +   uint16_t kind)
>>>>>>> +{
>>>>>>> +xenmem_access_t access;
>>>>>>> +vm_event_request_t req = {};
>>>>>>> +paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
>>>>>>> +
>>>>>>> +ASSERT(current->arch.vm_event->send_event);
>>>>>>> +
>>>>>>> +current->arch.vm_event->send_event = false;
>>>>>>> +
>>>>>>> +if ( p2m_get_mem_access(current->domain, gfn, &access,
>>>>>>> +altp2m_vcpu_idx(current)) != 0 )
>>>>>>> +return false;
>>>>>> ... next to the call here (but the maintainers of the file would
>>>>>> have to judge in the end). That said, I continue to not understand
>>>>>> why a not found entry means unrestricted access. Isn't it
>>>>>> ->default_access which controls what such a "virtual" entry would
>>>>>> permit?
>>>>> I'm sorry for this misleading comment. The code states that if entry was
>>>>> not found the access will be default_access and return 0. So in this
>>>>> case the default_access will be checked.
>>>>>
>>>>> /* If request to get default access. */
>>>>> if ( gfn_eq(gfn, INVALID_GFN) )
>>>>> {
>>>>>*access = memaccess[p2m->default_access];
>>>>>return 0;
>>>>> }
>>>>>
>>>>> If this clears thing up I can remove the "NOTE" part if the comment.
>>>> I'm afraid it doesn't clear things up: I'm still lost as to why
>>>> "entry not found" implies "full access". And I'm further lost as
>>>> to what the code fragment above (dealing with INVALID_GFN, but
>>>> not really the "entry not found" case, which would be INVALID_MFN
>>>> coming back from a translation) is supposed to tell me.
>>>>
>>> It is safe enough to consider a invalid mfn from hostp2 to be a
>>> violation. There is still a small problem with having the altp2m view
>>> not having the page propagated from hostp2m. In this case we have to use
>>> altp2m_get_effective_entry().
>>
>> In the absence of clear guidance from the Intel SDM on what the hardware
>> default is for a page not present in the p2m, we should probably follow
>> Jan's advice and check violations against default_access for such pages.
> 
> The SDM is very clear that pages that are not present in the EPT are a
> violation:
> 
> 28.2.2
> An EPT paging-structure entry is present if any of bits 2:0 is 1;
> otherwise, the entry is not present. The processor
> ignores bits 62:3 and uses the entry neither to reference another EPT
> paging-structure entry nor to produce a
> physical address. A reference using a guest-physical address whose
> translation encounters an EPT paging-struc-
> ture that is not present causes an EPT violation (see Section 28.2.3.2).
> 
> 28.2.3.2
> EPT Violations
> An EPT violation may occur during an access using a guest-physical
> address whose translation does not cause an
> EPT misconfiguration. An EPT violation occurs in any of the following
> situations:
> • Translation of the guest-physical address encounters an EPT
> paging-structure entry that is not present (see
> Section 28.2.2).

Mystery solved. Thanks!
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v10] x86/emulate: Send vm_event from emulate

2019-09-17 Thread Razvan Cojocaru
On 9/17/19 5:11 PM, Alexandru Stefan ISAILA wrote:
> +bool hvm_monitor_check_p2m(unsigned long gla, gfn_t gfn, uint32_t pfec,
> +   uint16_t kind)
> +{
> +xenmem_access_t access;
> +vm_event_request_t req = {};
> +paddr_t gpa = (gfn_to_gaddr(gfn) | (gla & ~PAGE_MASK));
> +
> +ASSERT(current->arch.vm_event->send_event);
> +
> +current->arch.vm_event->send_event = false;
> +
> +if ( p2m_get_mem_access(current->domain, gfn, &access,
> +altp2m_vcpu_idx(current)) != 0 )
> +return false;
 ... next to the call here (but the maintainers of the file would
 have to judge in the end). That said, I continue to not understand
 why a not found entry means unrestricted access. Isn't it
 ->default_access which controls what such a "virtual" entry would
 permit?
>>> I'm sorry for this misleading comment. The code states that if entry was
>>> not found the access will be default_access and return 0. So in this
>>> case the default_access will be checked.
>>>
>>> /* If request to get default access. */
>>> if ( gfn_eq(gfn, INVALID_GFN) )
>>> {
>>>*access = memaccess[p2m->default_access];
>>>return 0;
>>> }
>>>
>>> If this clears thing up I can remove the "NOTE" part if the comment.
>> I'm afraid it doesn't clear things up: I'm still lost as to why
>> "entry not found" implies "full access". And I'm further lost as
>> to what the code fragment above (dealing with INVALID_GFN, but
>> not really the "entry not found" case, which would be INVALID_MFN
>> coming back from a translation) is supposed to tell me.
>>
> It is safe enough to consider a invalid mfn from hostp2 to be a
> violation. There is still a small problem with having the altp2m view
> not having the page propagated from hostp2m. In this case we have to use
> altp2m_get_effective_entry().

In the absence of clear guidance from the Intel SDM on what the hardware 
default is for a page not present in the p2m, we should probably follow 
Jan's advice and check violations against default_access for such pages.

There are indeed - as discussed privately - two cases for an altp2m though:

1. Page not found in the altp2m, but set in the hostp2m - in which case 
I suggest that we take the hostp2m value into account (with or without 
propagation to the altp2m; I see no harm in propagating the entry but 
other may see something I'm missing).

2. Page not found in both altp2m and hostp2m - in which case we should 
probably check against default_access.


Thanks,
Razvan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] x86/emulate: Send vm_event from emulate

2019-09-11 Thread Razvan Cojocaru
On 9/11/19 2:41 PM, Jan Beulich wrote:
> On 11.09.2019 13:21, Razvan Cojocaru wrote:
>> On 9/11/19 1:39 PM, Alexandru Stefan ISAILA wrote:
>>>
>>>
>>> On 11.09.2019 12:57, Jan Beulich wrote:
>>>> On 09.09.2019 17:35, Alexandru Stefan ISAILA wrote:
>>>>> +/*
>>>>> + * Send memory access vm_events based on pfec. Returns true if the event 
>>>>> was
>>>>> + * sent and false for p2m_get_mem_access() error, no violation and event 
>>>>> send
>>>>> + * error. Assumes the caller will check arch.vm_event->send_event.
>>>>> + *
>>>>> + * NOTE: p2m_get_mem_access() can fail if the entry was not found in the 
>>>>> EPT
>>>>> + * (in which case access to it is unrestricted, so no violations can 
>>>>> occur).
>>>>> + * In this cases it is fine to continue the emulation.
>>>>> + */
>>>>> +bool hvm_monitor_check_ept(unsigned long gla, gfn_t gfn, uint32_t pfec,
>>>>> +   uint16_t kind)
>>>>
>>>> Why did you choose to have "ept" in the name and also mention it
>>>> in the commit? Is there anything in here which isn't generic p2m?
>>>
>>> The name was suggested by Razvan Cojocaru. I have no preference in the
>>> name. Maybe Tamas can suggest a good one.
>>
>> I've suggested "ept" in the name because "regular" emulation ignores it,
>> and this function takes it into account, hence the "check_ept" (which I
>> thought would be read together). It's fine to change it to something else.
> 
> Then "check_p2m" perhaps?

Sounds good to me.


Thanks,
Razvan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] x86/emulate: Send vm_event from emulate

2019-09-11 Thread Razvan Cojocaru
On 9/11/19 1:39 PM, Alexandru Stefan ISAILA wrote:
> 
> 
> On 11.09.2019 12:57, Jan Beulich wrote:
>> On 09.09.2019 17:35, Alexandru Stefan ISAILA wrote:
>>> +/*
>>> + * Send memory access vm_events based on pfec. Returns true if the event 
>>> was
>>> + * sent and false for p2m_get_mem_access() error, no violation and event 
>>> send
>>> + * error. Assumes the caller will check arch.vm_event->send_event.
>>> + *
>>> + * NOTE: p2m_get_mem_access() can fail if the entry was not found in the 
>>> EPT
>>> + * (in which case access to it is unrestricted, so no violations can 
>>> occur).
>>> + * In this cases it is fine to continue the emulation.
>>> + */
>>> +bool hvm_monitor_check_ept(unsigned long gla, gfn_t gfn, uint32_t pfec,
>>> +   uint16_t kind)
>>
>> Why did you choose to have "ept" in the name and also mention it
>> in the commit? Is there anything in here which isn't generic p2m?
> 
> The name was suggested by Razvan Cojocaru. I have no preference in the
> name. Maybe Tamas can suggest a good one.

I've suggested "ept" in the name because "regular" emulation ignores it, 
and this function takes it into account, hence the "check_ept" (which I 
thought would be read together). It's fine to change it to something else.


Thanks,
Razvan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 08/10] remove late (on-demand) construction of IOMMU page tables

2019-08-16 Thread Razvan Cojocaru
On 8/16/19 8:19 PM, Paul Durrant wrote:
> Now that there is a per-domain IOMMU enable flag, which should be enabled if
> any device is going to be passed through, stop deferring page table
> construction until the assignment is done. Also don't tear down the tables
> again when the last device is de-assigned; defer that task until domain
> destruction.
> 
> This allows the has_iommu_pt() helper and iommu_status enumeration to be
> removed. Calls to has_iommu_pt() are simply replaced by calls to
> is_iommu_enabled(). Remaining open-code tests of iommu_hap_pt_share can also
> be replaced by calls to iommu_use_hap_pt().
> The arch_iommu_populate_page_table() and iommu_construct() functions become
> redundant, as does the 'strict mode' dom0 page_list mapping code in
> iommu_hwdom_init(), and iommu_teardown() can be made static is its only
> remaining caller, iommu_domain_destroy(), is within the same source
> module.
> 
> All in all, about 220 lines of code are removed.
> 
> NOTE: This patch will cause a small amount of extra resource to be used
>   to accommodate IOMMU page tables that may never be used, since the
>   per-domain IOMMU flag enable flag is currently set to the value
>   of the global iommu_enable flag. A subsequent patch will add an
>   option to the toolstack to allow it to be turned off if there is
>   no intention to assign passthrough hardware to the domain.

This has slipped under my radar, sorry.

Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7] x86/emulate: Send vm_event from emulate

2019-07-19 Thread Razvan Cojocaru

On 7/19/19 4:38 PM, Jan Beulich wrote:

On 19.07.2019 15:30, Razvan Cojocaru wrote:

On 7/19/19 4:18 PM, Jan Beulich wrote:

On 19.07.2019 14:34, Alexandru Stefan ISAILA wrote:

On 18.07.2019 15:58, Jan Beulich wrote:

On 03.07.2019 12:56, Alexandru Stefan ISAILA wrote:

A/D bit writes (on page walks) can be considered benign by an introspection
agent, so receiving vm_events for them is a pessimization. We try here to
optimize by fitering these events out.


But you add the sending of more events - how does "filter out" match
the actual implementation?


The events are send only if there is a mem access violation therefore we
are filtering and only sending the events that are interesting to
introspection.


Where is it that you prevent any event from being sent? As said,
reading the patch I only see new sending sites to get added.


If we don't emulate, we would receive the page-walk-generated events
_and_ the touching-the-page-the-instruction-is-touching events.


Since the patch here alters emulation paths only, how do you know
whether to emulate? In order to not receive undue events it would
seem to me that you'd first have to intercept the guest on insns
of interest ... Overall I think that the patch description, while
it has improved, is still lacking sufficient information for a
person like me (not knowing much about your monitor tools) to be
able to sensibly review this (which includes understanding the
precise scenario you want to improve).


If the hardware exits because of an EPT fault caused by a page walk, we 
end up in p2m_mem_access_check(), at which point we need to decide if we 
want to send out a vm_event or not.


If we were to send out this vm_event, and it would then be magically 
treated so that we get to actually run the instruction at RIP, said 
instruction might also hit a protected page and provoke a vm_event.


Now, if npfec.kind != npfec_kind_with_gla, then we're in the page walk 
case, and so in this case only, and only if 
d->arch.monitor.inguest_pagefault_disabled is true, we would choose to 
do this emulation trick: emulate _the_page_walk_ while ignoring the EPT, 
but don't ignore the EPT for the emulation of the actual instruction.


So where in the first case we would have 2 EPT events, in the second we 
only have one (or if the instruction at RIP does not trigger an EPT 
event, we would have 1 event in the first case, and none in the second).

Hence the filtering mentioned.

So to answer your question: "how do you know whether to emulate", we do 
so only if npfec.kind != npfec_kind_with_gla && 
d->arch.monitor.inguest_pagefault_disabled.


I hope this clears it up somewhat.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7] x86/emulate: Send vm_event from emulate

2019-07-19 Thread Razvan Cojocaru

On 7/19/19 4:18 PM, Jan Beulich wrote:

On 19.07.2019 14:34, Alexandru Stefan ISAILA wrote:

On 18.07.2019 15:58, Jan Beulich wrote:

On 03.07.2019 12:56, Alexandru Stefan ISAILA wrote:

A/D bit writes (on page walks) can be considered benign by an introspection
agent, so receiving vm_events for them is a pessimization. We try here to
optimize by fitering these events out.


But you add the sending of more events - how does "filter out" match
the actual implementation?


The events are send only if there is a mem access violation therefore we
are filtering and only sending the events that are interesting to
introspection.


Where is it that you prevent any event from being sent? As said,
reading the patch I only see new sending sites to get added.


If we don't emulate, we would receive the page-walk-generated events 
_and_ the touching-the-page-the-instruction-is-touching events.


In comparison to the "hardware" case, the case where we emulate the 
instruction with the page walk ignoring the EPT results in less events, 
hence the prevention of some events being sent out.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5] x86/emulate: Send vm_event from emulate

2019-07-01 Thread Razvan Cojocaru

On 7/1/19 5:45 PM, Alexandru Stefan ISAILA wrote:



On 01.07.2019 16:13, Jan Beulich wrote:

On 04.06.19 at 13:49,  wrote:

This patch aims to have mem access vm events sent from the emulator.
This is useful where we want to only emulate a page walk without
checking the EPT, but we still want to check the EPT when emulating
the instruction that caused the page walk. In this case, the original
EPT fault is caused by the walk trying to set the accessed or dirty
bits, but executing the instruction itself might also cause an EPT
fault if permitted to run, and this second fault should not be lost.


I'm afraid I still can't translate this into what exactly is wanted and
why. While typically we don't use examples to demonstrate that is
wanted in commit messages, I think this is a rather good candidate
for actually using such an approach. This may then ...


We use hvmemul_map_linear_addr() to intercept r/w access and
__hvm_copy() to intercept exec access.

First we try to send a vm event and if the event is sent then emulation
returns X86EMUL_RETRY in order to stop emulation on instructions that
use access protected pages. If the event is not sent then the
emulation goes on as expected.


... also help understanding this part, which I continue to be confused
by, too.


Simply put, the introspection agent wants to treat all A/D bit writes as 
benign. Receiving vm_events about them is a big pessimization, and we 
want to optimize.


We'd like to filter these events out.

To this end, we're currently (in the present incarnation of the Xen 
code) fully emulating the instruction at RIP when the hardware sees an 
EPT fault with npfec.kind != npfec_kind_with_gla. This is, however, 
incorrect, because the instruction at RIP might legitimately cause an 
EPT fault of its own (while accessing a _different_ page from the 
original one, where A/D were set).


We've tried to solve this by actually writing the A/D bits and returning 
from p2m_mem_access_check(), however that has led to a slightly 
philosophical discussion about the proper way to achieve our goals while 
taking into account speculative setting of these bits. The issues raised 
have not been satisfactorily answered in an authoritative manner to this 
day.


So the only option we seem to have left at this point is to perform the 
whole emulation, _while_ ignoring EPT restrictions for the walk part, 
and taking them into account for the "actual" emulating of the 
instruction @ RIP.


If we're sending out a vm_event, then we don't want the emulation to 
complete - since in that case we won't be able to veto whatever is 
doing. That would mean that we can't actually prevent any malicious 
activity that happens here, instead we'd only be able to report on it.


So when we see a "send-vm_event" case while emulating, we need to do two 
things:


1. send the event out;
2. _don't_ complete the emulation (return X86EMUL_RETRY).

When 2. happens, we'll hit hvm_vm_event_do_resume() again _after_ the 
introspection agent treats the event and resumes the guest. There, the 
instruction at RIP will be fully emulated (with the EPT ignored) if the 
introspection application allows it, and the guest will continue to run 
past the instruction.


That's our plan, anyway. Hopefully we're not going in a wrong direction 
about it (all help is very much appreciated).


I hope this has made thing clearer. We're not after anything out of the 
ordinary or fitting a niche case - we just want to filter out vm_events 
caused by page walks in a manner that remains safe for the guest OS user.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RESEND] MAINTAINERS: hand over vm_event maintainership

2019-06-21 Thread Razvan Cojocaru

On 6/21/19 6:15 PM, Jan Beulich wrote:

On 13.06.19 at 17:11,  wrote:

On Thu, Jun 13, 2019 at 8:01 AM Razvan Cojocaru
 wrote:


Remove myself as vm_event maintaner, add Alexandru and Petre as
Bitdefender vm_event maintainers.

Signed-off-by: Razvan Cojocaru 


Acked-by: Tamas K Lengyel 


I'll take the liberty and apply this to the revised (designated reviewer)
additions.


If you mean removing my email address from the maintainers list, that's 
fine with me, unless (as previously discussed in private with George and 
Tamas) my stepping down at this point would put undue stress on the x86 
/ mm maintainers, in which case I can continue helping through the 
transition phase (although unfortunately much less promptly than before).



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] MAINTAINERS: Add myself as reviewer for vm_event

2019-06-20 Thread Razvan Cojocaru

On 6/20/19 10:56 AM, Roger Pau Monné wrote:

On Thu, Jun 20, 2019 at 10:17:46AM +0300, Petre Pircalabu wrote:

Signed-off-by: Petre Pircalabu 
---
  MAINTAINERS | 1 +
  1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ab32e7f..0151625 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -412,6 +412,7 @@ F:  unmodified_drivers/linux-2.6/
  
  VM EVENT, MEM ACCESS and MONITOR

  M:Razvan Cojocaru 
+R: Petre Pircalabu 
  M:Tamas K Lengyel 


I would place the addition after the list of M(aintainers).


Since this and Alexandru's are trivial patches and whatever patch ends 
up upstream first will possibly require a modification of the other + 
resend, could this be possibly fixed on commit (i.e. move both reviewers 
after the maintainers lines)?



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] MAINTAINERS: Add myself as reviewer for vm_event

2019-06-20 Thread Razvan Cojocaru

On 6/20/19 10:17 AM, Petre Pircalabu wrote:

Signed-off-by: Petre Pircalabu 
---
  MAINTAINERS | 1 +
  1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ab32e7f..0151625 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -412,6 +412,7 @@ F:  unmodified_drivers/linux-2.6/
  
  VM EVENT, MEM ACCESS and MONITOR

  M:Razvan Cojocaru 
+R: Petre Pircalabu 
  M:Tamas K Lengyel 
  S:Supported
  F:tools/tests/xen-access



Acked-by: Razvan Cojocaru 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] MAINTAINERS: Add myself as a Designated reviewer to vm_event

2019-06-19 Thread Razvan Cojocaru

On 6/19/19 6:02 PM, Alexandru Stefan ISAILA wrote:

Signed-off-by: Alexandru Isaila 
---
  MAINTAINERS | 1 +
  1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ab32e7f409..78e35012e0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -412,6 +412,7 @@ F:  unmodified_drivers/linux-2.6/
  
  VM EVENT, MEM ACCESS and MONITOR

  M:Razvan Cojocaru 
+R: Alexandru Isaila 
  M:Tamas K Lengyel 
  S:Supported
  F:tools/tests/xen-access


Acked-by: Razvan Cojocaru 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 2/2] x86/mm: Add mem access rights to NPT

2019-06-13 Thread Razvan Cojocaru

On 6/13/19 6:19 PM, Tamas Lengyel wrote:

On Wed, Sep 26, 2018 at 10:49 AM George Dunlap  wrote:


From: Isaila Alexandru 

This patch adds access control for NPT mode.

There aren’t enough extra bits to store the access rights in the NPT p2m
table, so we add a radix tree to store extra information.

For efficiency:
  - Only allocate this radix tree when we first store "non-default"
extra information

  - Remove entires which match the default extra information rather
than continuing to store them

  - For superpages, only store an entry for the first gfn in the
superpage.  Use the order of the p2m entry being read to determine
the proper place to look in the radix table.

Modify p2m_type_to_flags() to accept and interpret an access value,
parallel to the ept code.

Add a set_default_access() method to the p2m-pt and p2m-ept versions
of the p2m rather than setting it directly, to deal with different
default permitted access values.

Signed-off-by: Alexandru Isaila 
Signed-off-by: George Dunlap 


The mem_access/monitor bits are fairly trivial:

Acked-by: Tamas K Lengyel 


---
NB, this is compile-tested only.


Are you planning to do some actual testing? I would highly recommend
that we see real test results before this is merged to verify
functionality.


We did do some testing with xen-access at the time, but limited testing 
with the actual full-blown introspection agent (because not all the 
needed pieces align yet).


Things did appear to work as intended.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH RESEND] MAINTAINERS: hand over vm_event maintainership

2019-06-13 Thread Razvan Cojocaru
Remove myself as vm_event maintaner, add Alexandru and Petre as
Bitdefender vm_event maintainers.

Signed-off-by: Razvan Cojocaru 

---
Re-sent to add Tamas', Petre's and Alexandru's addresses.
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6fbdc2b..d60e3a5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -405,7 +405,8 @@ L:  xen-devel@lists.xenproject.org
 F: unmodified_drivers/linux-2.6/
 
 VM EVENT, MEM ACCESS and MONITOR
-M: Razvan Cojocaru 
+M: Alexandru Isaila 
+M: Petre Pircalabu 
 M: Tamas K Lengyel 
 S: Supported
 F: tools/tests/xen-access
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] MAINTAINERS: hand over vm_event maintainership

2019-06-12 Thread Razvan Cojocaru

On 6/12/19 7:02 PM, Razvan Cojocaru wrote:

Remove myself as vm_event maintaner, add Alexandru and Petre as
Bitdefender vm_event maintainers.

Signed-off-by: Razvan Cojocaru 
---
  MAINTAINERS | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6fbdc2b..d60e3a5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -405,7 +405,8 @@ L:  xen-devel@lists.xenproject.org
  F:unmodified_drivers/linux-2.6/
  
  VM EVENT, MEM ACCESS and MONITOR

-M: Razvan Cojocaru 
+M: Alexandru Isaila 
+M: Petre Pircalabu 
  M:Tamas K Lengyel 
  S:Supported
  F:tools/tests/xen-access



I'm not sure why get-maintainers.pl did not add Tamas' email address 
(added now).


I'll still be in Bitdefender, subscribed to xen-devel and following the 
project, but I'll be quite a bit more involved in other projects and 
that makes being a maintainer difficult.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] MAINTAINERS: hand over vm_event maintainership

2019-06-12 Thread Razvan Cojocaru
Remove myself as vm_event maintaner, add Alexandru and Petre as
Bitdefender vm_event maintainers.

Signed-off-by: Razvan Cojocaru 
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6fbdc2b..d60e3a5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -405,7 +405,8 @@ L:  xen-devel@lists.xenproject.org
 F: unmodified_drivers/linux-2.6/
 
 VM EVENT, MEM ACCESS and MONITOR
-M: Razvan Cojocaru 
+M: Alexandru Isaila 
+M: Petre Pircalabu 
 M: Tamas K Lengyel 
 S: Supported
 F: tools/tests/xen-access
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 4/4] x86/mem_sharing: compile mem_sharing subsystem only when kconfig is enabled

2019-06-04 Thread Razvan Cojocaru

On 6/3/19 7:38 PM, Tamas K Lengyel wrote:

On Mon, Jun 3, 2019 at 2:26 AM Jan Beulich  wrote:



On 16.05.19 at 23:37,  wrote:

Disable it by default as it is only an experimental subsystem.

Signed-off-by: Tamas K Lengyel 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: George Dunlap 

v4: add ASSERT_UNREACHABLE to inlined functions where applicable & other
fixups
---
  xen/arch/x86/Kconfig  |  6 +-
  xen/arch/x86/domain.c |  2 ++
  xen/arch/x86/domctl.c |  2 ++
  xen/arch/x86/mm/Makefile  |  2 +-
  xen/arch/x86/x86_64/compat/mm.c   |  2 ++
  xen/arch/x86/x86_64/mm.c  |  2 ++
  xen/common/Kconfig|  3 ---
  xen/common/domain.c   |  2 +-
  xen/common/grant_table.c  |  2 +-
  xen/common/memory.c   |  2 +-
  xen/common/vm_event.c |  6 +++---
  xen/include/asm-x86/mem_sharing.h | 28 
  xen/include/asm-x86/mm.h  |  3 +++
  xen/include/xen/sched.h   |  2 +-
  xen/include/xsm/dummy.h   |  2 +-
  xen/include/xsm/xsm.h |  4 ++--
  xen/xsm/dummy.c   |  2 +-
  xen/xsm/flask/hooks.c |  4 ++--
  18 files changed, 58 insertions(+), 18 deletions(-)


Daniel, it looks like you weren't Cc-ed here, but your ack is needed.


Indeed, I've also seem to have missed CC-ing Razvan (fixed now).


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/5] xen/vm-event: Fix interactions with the vcpu list

2019-06-03 Thread Razvan Cojocaru

On 6/3/19 3:25 PM, Andrew Cooper wrote:

vm_event_resume() should use domain_vcpu(), rather than opencoding it
without its Spectre v1 safety.

vm_event_wake_blocked() can't ever be invoked in a case where d->vcpu is
NULL, so drop the outer if() and reindent, fixing up style issues.

The comment, which is left alone, is false.  This algorithm still has
starvation issues when there is an asymetric rate of generated events.

However, the existing logic is sufficiently complicated and fragile that
I don't think I've followed it fully, and because we're trying to
obsolete this interface, the safest course of action is to leave it
alone, rather than to end up making things subtly different.

Therefore, no practical change that callers would notice.

Signed-off-by: Andrew Cooper


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/5] xen/vm-event: Remove unnecessary vm_event_domain indirection

2019-06-03 Thread Razvan Cojocaru

On 6/3/19 3:25 PM, Andrew Cooper wrote:

The use of (*ved)-> leads to poor code generation, as the compiler can't
assume the pointer hasn't changed, and results in hard-to-follow code.

For both vm_event_{en,dis}able(), rename the ved parameter to p_ved, and
work primarily with a local ved pointer.

This has a key advantage in vm_event_enable(), in that the partially
constructed vm_event_domain only becomes globally visible once it is
fully constructed.  As a consequence, the spinlock doesn't need holding.

Furthermore, rearrange the order of operations to be more sensible.
Check for repeated enables and an bad HVM_PARAM before allocating
memory, and gather the trivial setup into one place, dropping the
redundant zeroing.

No practical change that callers will notice.

Signed-off-by: Andrew Cooper


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/5] xen/vm-event: Misc fixups

2019-06-03 Thread Razvan Cojocaru

On 6/3/19 3:25 PM, Andrew Cooper wrote:

  * Drop redundant brackes, and inline qualifiers.
  * Insert newlines and spaces where appropriate.
  * Drop redundant NDEBUG - gdprint() is already conditional.  Fix the
logging level, as gdprintk() already prefixes the guest marker.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Razvan Cojocaru 
CC: Tamas K Lengyel 
CC: Petre Pircalabu 
---
  xen/common/vm_event.c | 21 -
  1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 72f42b4..e872680 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -102,6 +102,7 @@ static int vm_event_enable(
  static unsigned int vm_event_ring_available(struct vm_event_domain *ved)
  {
  int avail_req = RING_FREE_REQUESTS(&ved->front_ring);
+
  avail_req -= ved->target_producers;
  avail_req -= ved->foreign_producers;
  
@@ -168,7 +169,7 @@ static void vm_event_wake_queued(struct domain *d, struct vm_event_domain *ved)

   */
  void vm_event_wake(struct domain *d, struct vm_event_domain *ved)
  {
-if (!list_empty(&ved->wq.list))
+if ( !list_empty(&ved->wq.list) )
  vm_event_wake_queued(d, ved);
  else
  vm_event_wake_blocked(d, ved);
@@ -216,8 +217,8 @@ static int vm_event_disable(struct domain *d, struct 
vm_event_domain **p_ved)
  return 0;
  }
  
-static inline void vm_event_release_slot(struct domain *d,

- struct vm_event_domain *ved)
+static void vm_event_release_slot(struct domain *d,
+  struct vm_event_domain *ved)


But inline is still asking the compiler to try and generate code that 
doesn't end up CALLing an actual function, so is it really redundant 
here? I do realize that for most cases the compiler will have its way 
with this code anyway - especially since the function is static - but 
"static" is not guaranteed to also mean "inline", is it?


In any case,
Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/5] xen/vm-event: Expand vm_event_* spinlock macros and rename the lock

2019-06-03 Thread Razvan Cojocaru

On 6/3/19 3:25 PM, Andrew Cooper wrote:

These serve no purpose, but to add to the congnitive load of following
the code.  Remove the level of indirection.

Furthermore, the lock protects all data in vm_event_domain, making
ring_lock a poor choice of name.

For vm_event_get_response() and vm_event_grab_slot(), fold the exit
paths to have a single unlock, as the compiler can't make this
optimisation itself.

No functional change.

Signed-off-by: Andrew Cooper 


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/5] xen/vm-event: Drop unused u_domctl parameter from vm_event_domctl()

2019-06-03 Thread Razvan Cojocaru

On 6/3/19 3:25 PM, Andrew Cooper wrote:

This parameter isn't used at all.  Futhermore, elide the copyback in
failing cases, as it is only successful paths which generate data which
needs sending back to the caller.

Finally, drop a redundant d == NULL check, as that logic is all common
at the begining of do_domctl().

No functional change.

Signed-off-by: Andrew Cooper 


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/vm_event: fix rc check for uninitialized ring

2019-05-24 Thread Razvan Cojocaru
On 5/24/19 6:23 PM, Juergen Gross wrote:
> vm_event_claim_slot() returns -EOPNOTSUPP for an uninitialized ring
> since commit 15e4dd5e866b43bbc ("common/vm_event: Initialize vm_event
> lists on domain creation"), but the callers test for -ENOSYS.
> 
> Correct the callers.
> 
> Signed-off-by: Juergen Gross 
> ---
> V2: add blank line (Jan Beulich)
> replace two further ENOSYS returns with EOPNOTSUPP (Razvan Cojocaru)

Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen/vm_event: fix rc check for uninitialized ring

2019-05-24 Thread Razvan Cojocaru

On 5/24/19 4:15 PM, Juergen Gross wrote:

vm_event_claim_slot() returns -EOPNOTSUPP for an uninitialized ring
since commit 15e4dd5e866b43bbc ("common/vm_event: Initialize vm_event
lists on domain creation"), but the callers test for -ENOSYS.

Correct the callers.

Signed-off-by: Juergen Gross 
---
  xen/arch/x86/mm/p2m.c | 2 +-
  xen/common/monitor.c  | 2 +-
  xen/common/vm_event.c | 2 +-
  3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 57c5eeda91..8a9a1153a8 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1705,7 +1705,7 @@ void p2m_mem_paging_populate(struct domain *d, unsigned 
long gfn_l)
  
  /* We're paging. There should be a ring */

  int rc = vm_event_claim_slot(d, d->vm_event_paging);
-if ( rc == -ENOSYS )
+if ( rc == -EOPNOTSUPP )
  {
  gdprintk(XENLOG_ERR, "Domain %hu paging gfn %lx yet no ring "
   "in place\n", d->domain_id, gfn_l);
diff --git a/xen/common/monitor.c b/xen/common/monitor.c
index cb5f37fdb2..d5c9ff1cbf 100644
--- a/xen/common/monitor.c
+++ b/xen/common/monitor.c
@@ -98,7 +98,7 @@ int monitor_traps(struct vcpu *v, bool sync, 
vm_event_request_t *req)
  {
  case 0:
  break;
-case -ENOSYS:
+case -EOPNOTSUPP:
  /*
   * If there was no ring to handle the event, then
   * simply continue executing normally.
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 6e68be47bc..7b4ebced16 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -513,7 +513,7 @@ bool vm_event_check_ring(struct vm_event_domain *ved)
   * this function will always return 0 for a guest.  For a non-guest, we check
   * for space and return -EBUSY if the ring is not available.
   *
- * Return codes: -ENOSYS: the ring is not yet configured
+ * Return codes: -EOPNOTSUPP: the ring is not yet configured
   *   -EBUSY: the ring is busy
   *   0: a spot has been reserved
   *



But vm_event_grab_slot() (which ends up being what vm_event_wait_slot() 
returns), still returns -ENOSYS:


463 static int vm_event_grab_slot(struct vm_event_domain *ved, int foreign)
464 {
465 unsigned int avail_req;
466
467 if ( !ved->ring_page )
468 return -ENOSYS;

Although we can't get to that part if vm_event_check_ring() returns 
false, we should probably return -EOPNOTSUPP from there as well. In 
fact, I wonder if any of the -ENOSYS returns in vm_event.c should not be 
replaced with return -EOPNOTSUPPs.


Anyway, the change does clearly improve the code even without settling 
the above questions, so:


Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6] x86/altp2m: Aggregate get entry and populate into common funcs

2019-05-14 Thread Razvan Cojocaru

On 5/14/19 6:42 PM, Jan Beulich wrote:

On 23.04.19 at 16:30,  wrote:

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -478,6 +478,43 @@ void p2m_unlock_and_tlb_flush(struct p2m_domain *p2m)
  mm_write_unlock(&p2m->lock);
  }
  
+int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,

+   p2m_type_t *t, p2m_access_t *a,
+   bool prepopulate)
+{
+*mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL);
+
+/* Check host p2m if no valid entry in alternate */
+if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) )
+{
+struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain);
+unsigned int page_order;
+int rc;
+
+*mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a,
+ P2M_ALLOC | P2M_UNSHARE, &page_order, 0);
+
+rc = -ESRCH;
+if ( !mfn_valid(*mfn) || *t != p2m_ram_rw )
+return rc;
+
+/* If this is a superpage, copy that first */
+if ( prepopulate && page_order != PAGE_ORDER_4K )
+{
+unsigned long mask = ~((1UL << page_order) - 1);
+gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask);
+mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask);
+
+rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, 
*t, *a, 1);
+if ( rc )
+return rc;
+}
+}
+
+return 0;
+}
+
+
  mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
  p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
  unsigned int *page_order, bool_t locked)


May I ask how the placement of this function was chosen? It looks
as if all its callers live inside #ifdef CONFIG_HVM sections, just this
function doesn't (reportedly causing build issues together with
later changes).


I believe it's just an oversight. I've sent out a patch that hopefully 
fixes this in a satisfactory manner.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86/altp2m: move altp2m_get_effective_entry() under CONFIG_HVM

2019-05-14 Thread Razvan Cojocaru
All its callers live inside #ifdef CONFIG_HVM sections.

Signed-off-by: Razvan Cojocaru 
---
 xen/arch/x86/mm/p2m.c | 72 +++
 xen/include/asm-x86/p2m.h |  2 ++
 2 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index cc6661e..57c5eed 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -478,42 +478,6 @@ void p2m_unlock_and_tlb_flush(struct p2m_domain *p2m)
 mm_write_unlock(&p2m->lock);
 }
 
-int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
-   p2m_type_t *t, p2m_access_t *a,
-   bool prepopulate)
-{
-*mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL);
-
-/* Check host p2m if no valid entry in alternate */
-if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) )
-{
-struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain);
-unsigned int page_order;
-int rc;
-
-*mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a,
- P2M_ALLOC | P2M_UNSHARE, &page_order, 0);
-
-rc = -ESRCH;
-if ( !mfn_valid(*mfn) || *t != p2m_ram_rw )
-return rc;
-
-/* If this is a superpage, copy that first */
-if ( prepopulate && page_order != PAGE_ORDER_4K )
-{
-unsigned long mask = ~((1UL << page_order) - 1);
-gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask);
-mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask);
-
-rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, 
*t, *a, 1);
-if ( rc )
-return rc;
-}
-}
-
-return 0;
-}
-
 mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
 p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
 unsigned int *page_order, bool_t locked)
@@ -2378,6 +2342,42 @@ int unmap_mmio_regions(struct domain *d,
 
 #ifdef CONFIG_HVM
 
+int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
+   p2m_type_t *t, p2m_access_t *a,
+   bool prepopulate)
+{
+*mfn = ap2m->get_entry(ap2m, gfn, t, a, 0, NULL, NULL);
+
+/* Check host p2m if no valid entry in alternate */
+if ( !mfn_valid(*mfn) && !p2m_is_hostp2m(ap2m) )
+{
+struct p2m_domain *hp2m = p2m_get_hostp2m(ap2m->domain);
+unsigned int page_order;
+int rc;
+
+*mfn = __get_gfn_type_access(hp2m, gfn_x(gfn), t, a,
+ P2M_ALLOC | P2M_UNSHARE, &page_order, 0);
+
+rc = -ESRCH;
+if ( !mfn_valid(*mfn) || *t != p2m_ram_rw )
+return rc;
+
+/* If this is a superpage, copy that first */
+if ( prepopulate && page_order != PAGE_ORDER_4K )
+{
+unsigned long mask = ~((1UL << page_order) - 1);
+gfn_t gfn_aligned = _gfn(gfn_x(gfn) & mask);
+mfn_t mfn_aligned = _mfn(mfn_x(*mfn) & mask);
+
+rc = ap2m->set_entry(ap2m, gfn_aligned, mfn_aligned, page_order, 
*t, *a, 1);
+if ( rc )
+return rc;
+}
+}
+
+return 0;
+}
+
 void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
 {
 if ( altp2m_active(v->domain) )
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 2d0bda1..7e71bf0 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -514,6 +514,7 @@ static inline unsigned long mfn_to_gfn(struct domain *d, 
mfn_t mfn)
 return mfn_x(mfn);
 }
 
+#ifdef CONFIG_HVM
 #define AP2MGET_prepopulate true
 #define AP2MGET_query false
 
@@ -525,6 +526,7 @@ static inline unsigned long mfn_to_gfn(struct domain *d, 
mfn_t mfn)
 int altp2m_get_effective_entry(struct p2m_domain *ap2m, gfn_t gfn, mfn_t *mfn,
p2m_type_t *t, p2m_access_t *a,
bool prepopulate);
+#endif
 
 /* Deadlock-avoidance scheme when calling get_gfn on different gfn's */
 struct two_gfns {
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults

2019-05-14 Thread Razvan Cojocaru



On 5/14/19 5:16 PM, Jan Beulich wrote:

On 14.05.19 at 15:47,  wrote:

Mem event emulation failed (5): d5v0 32bit @ 001b:6d96efff -> c5 f9 f5
05 c0 be ad 6d c5 e1 fe 1d a0 20 af 6d

Looking at the source code, the emulator does appear to support
vpmaddwd, however only for EVEX:

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/x86_emulate/x
86_emulate.c;h=032995ea586aa7dd90a1953b6ded656436652049;hb=refs/heads/staging
#l6696

whereas our fail case uses VEX.

This may be in the works in the aforementioned series, but is
legitimately unsupported in 4.13 staging.


Hmm, interesting. The origin of the encoding is at MMX times,
which means it's more than just VPMADDWD that's missing, and
it's been an omission back in the MMX/SSE2 series then. That's
a genuine oversight, and in the light of this I'd like to apologize
for my unfriendly initial reaction. I'll see about getting this fixed.
(It would have helped if you had shared the encoding right away,
since the mnemonic and operands are now often insufficient.)


No problem at all. Indeed, sharing the encoding would have cleared 
things up faster.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults

2019-05-14 Thread Razvan Cojocaru

On 5/13/19 5:15 PM, Razvan Cojocaru wrote:

On 5/13/19 5:06 PM, Jan Beulich wrote:

On 13.05.19 at 15:58,  wrote:

On 11/27/18 12:49 PM, Razvan Cojocaru wrote:

With a sufficiently complete insn emulator, single-stepping should
not be needed at all imo. Granted we're not quite there yet with
the emulator, but we've made quite a bit of progress. As before,
if there are particular instructions you know of that the emulator
doesn't handle yet, please keep pointing these out. Last I know
were some AVX move instructions, which have long been
implemented.

True, I haven't seen emulator issues in that respect with staging - the
emulator appears lately to be sufficiently complete. Thank you very 
much

for your help and support - we'll definitely point out unsupported
instructions if we spot some again.


We've come accross a new instruction that the emulator can't handle in
Xen 4.13-unstable today:

vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20

Perhaps there are plans for this to go into the emulator as well?


You're kidding? This is already in 4.12.0, and if it weren't I'm sure
you're aware there are about 40 more AVX512 patches pending
review.


Right, I did indeed forget about the pending review part, for some 
reason I was sure they made it in. I've double-checked and we really are 
using 4.13-unstable - but we've also made changes to the emulator, 
working on the send-vm-events-from-the-emulator patch, so we'll revert 
to a pristine staging and retry, there's a chance this happens because 
of our changes.


We'll find out what's going on exactly.


I promised I'd return with more details. After some debugging, it 
certainly looks like the emulator returns UNIMPLEMENTED (5):


Mem event emulation failed (5): d5v0 32bit @ 001b:6d96efff -> c5 f9 f5 
05 c0 be ad 6d c5 e1 fe 1d a0 20 af 6d


Looking at the source code, the emulator does appear to support 
vpmaddwd, however only for EVEX:


http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/x86_emulate/x86_emulate.c;h=032995ea586aa7dd90a1953b6ded656436652049;hb=refs/heads/staging#l6696

whereas our fail case uses VEX.

This may be in the works in the aforementioned series, but is 
legitimately unsupported in 4.13 staging.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs

2019-05-13 Thread Razvan Cojocaru
On 5/13/19 7:18 PM, Mathieu Tarral wrote:
> Le vendredi, mai 10, 2019 5:21 PM, Andrew Cooper  
> a écrit :
> 
>> On 10/05/2019 16:17, Mathieu Tarral wrote:
>>
>>> Le jeudi, mai 9, 2019 6:42 PM, Andrew Cooper andrew.coop...@citrix.com a 
>>> écrit :
>>>
 Therefore, the conclusion to draw is that it is a logical bug somewhere.
>>> The bug is still here, so we can exclude a microcode issue.
>>
>> Good - that is one further angle excluded.  Always make sure you are
>> running with up-to-date microcode, but it looks like we back to
>> investigating a logical bug in libvmi or Xen.
> 
> I played with tool/tests/xen-access this afternoon.
> 
> The tool is working, i could intercept breakpoints, cpuid, write and exec mem 
> accesses, etc..
> 
> However, using altp2m related intercepts leads to a guest crash sometimes:
> 
> Windows 7 x64, 4 VCPUs
> - altp2m_write: crash
> - altp2m_exec: crash
> - altp2m_write_no_gpt: frozen
> 
> Windows 7 x64, 1 VCPU
> - altp2m_write: crash
> - altp2m_exec: OK
> - altp2m_write_no_gpt: frozen
> 
> "frozen" means that xen-access receives VMI events, bug the guest is frozen 
> until I decide to stop xen-access.
> I'm wondering what kind of exec events it received because they are not the 
> same, so it's not looping
> over the same RIP over and over. (?)
I think you're simply tripping some OS timer because you're slowing the
guest down in the crash case, and simply keep the guest too busy
handling events in the "freeze" case. Remember that there's quite a
delay running each offending instruction: one vm_event saying you've got
a violation, a reply saying "put this VCPU in single-step mode _and_
switch to the unrestricted EPT view", another vm_event saying
"instruction executed", followed by anoher reply saying "switch back to
the restricted EPT _and_ take the VCPU out of single-step mode".

Restricting the whole of the guest's memory (and so doing this dance for
_every_ instruction causing a fault) is practically guaranteed to upset
the OS. A little EPT restricting goes a long way.

Of course, if this could be improved so that even stress-tests (which is
basically what xen-access is) leave the guest running smoothly, that'd
be fantastic.


Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults

2019-05-13 Thread Razvan Cojocaru

On 5/13/19 5:06 PM, Jan Beulich wrote:

On 13.05.19 at 15:58,  wrote:

On 11/27/18 12:49 PM, Razvan Cojocaru wrote:

With a sufficiently complete insn emulator, single-stepping should
not be needed at all imo. Granted we're not quite there yet with
the emulator, but we've made quite a bit of progress. As before,
if there are particular instructions you know of that the emulator
doesn't handle yet, please keep pointing these out. Last I know
were some AVX move instructions, which have long been
implemented.

True, I haven't seen emulator issues in that respect with staging - the
emulator appears lately to be sufficiently complete. Thank you very much
for your help and support - we'll definitely point out unsupported
instructions if we spot some again.


We've come accross a new instruction that the emulator can't handle in
Xen 4.13-unstable today:

vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20

Perhaps there are plans for this to go into the emulator as well?


You're kidding? This is already in 4.12.0, and if it weren't I'm sure
you're aware there are about 40 more AVX512 patches pending
review.


Right, I did indeed forget about the pending review part, for some 
reason I was sure they made it in. I've double-checked and we really are 
using 4.13-unstable - but we've also made changes to the emulator, 
working on the send-vm-events-from-the-emulator patch, so we'll revert 
to a pristine staging and retry, there's a chance this happens because 
of our changes.


We'll find out what's going on exactly.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults

2019-05-13 Thread Razvan Cojocaru

On 11/27/18 12:49 PM, Razvan Cojocaru wrote:

With a sufficiently complete insn emulator, single-stepping should
not be needed at all imo. Granted we're not quite there yet with
the emulator, but we've made quite a bit of progress. As before,
if there are particular instructions you know of that the emulator
doesn't handle yet, please keep pointing these out. Last I know
were some AVX move instructions, which have long been
implemented.

True, I haven't seen emulator issues in that respect with staging - the
emulator appears lately to be sufficiently complete. Thank you very much
for your help and support - we'll definitely point out unsupported
instructions if we spot some again.


We've come accross a new instruction that the emulator can't handle in 
Xen 4.13-unstable today:


vpmaddwd xmm4,xmm4,XMMWORD PTR ds:0x513fbb20

Perhaps there are plans for this to go into the emulator as well?


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Altp2m use with PML can deadlock Xen

2019-05-10 Thread Razvan Cojocaru

On 5/10/19 5:42 PM, Tamas K Lengyel wrote:

On Thu, May 9, 2019 at 10:19 AM Andrew Cooper  wrote:


On 09/05/2019 14:38, Tamas K Lengyel wrote:

Hi all,
I'm investigating an issue with altp2m that can easily be reproduced
and leads to a hypervisor deadlock when PML is available in hardware.
I haven't been able to trace down where the actual deadlock occurs.

The problem seem to stem from hvm/vmx/vmcs.c:vmx_vcpu_flush_pml_buffer
that calls p2m_change_type_one on all gfns that were recorded the PML
buffer. The problem occurs when the PML buffer full vmexit happens
while the active p2m is an altp2m. Switching  p2m_change_type_one to
work with the altp2m instead of the hostp2m however results in EPT
misconfiguration crashes.

Adding to the issue is that it seem to only occur when the altp2m has
remapped GFNs. Since PML records entries based on GFN leads me to
question whether it is safe at all to use PML when altp2m is used with
GFN remapping. However, AFAICT the GFNs in the PML buffer are not the
remapped GFNs and my understanding is that it should be safe as long
as the GFNs being tracked by PML are never the remapped GFNs.

Booting Xen with ept=pml=0 resolves the issue.

If anyone has any insight into what might be happening, please let me know.



I could have sworn that George spotted a problem here and fixed it.  I
shouldn't be surprised if we have more.

The problem that PML introduced (and this is mostly my fault, as I
suggested the buggy solution) is that the vmexit handler from one vcpu
pauses others to drain the PML queue into the dirty bitmap.  Overall I
wasn't happy with the design and I've got some ideas to improve it, but
within the scope of how altp2m was engineered, I proposed
domain_pause_except_self().

As it turns out, that is vulnerable to deadlocks when you get two vcpus
trying to pause each other and waiting for each other to become
de-scheduled.


Makes sense.



I see this has been reused by the altp2m code, but it *should* be safe
to deadlocks now that it takes the hypercall_deadlock_mutext.


Is that already in staging or your x86-next branch? I would like to
verify that the problem is still present or not with that change. I
tested with Xen 4.12 release and that definitely still deadlocks.


I don't know if Andrew is talking about this patch (probably not, but it 
looks at least related):


http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=24d5282527f4647907b3572820b5335c15cd0356;hp=29d28b29190ba09d53ae7e475108def84e16e363

Since there's a "Release-acked" tag on it, I think it's in 4.12.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?

2019-05-09 Thread Razvan Cojocaru
On 5/10/19 2:03 AM, Tamas K Lengyel wrote:
>> Either way, everything comes down to what behaviour is wanted to start with.
> As I said, I think adding that monitoring capability is fine as long
> as its limitation is clearly documented.

Right. My thoughts as well.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?

2019-05-09 Thread Razvan Cojocaru
On 5/10/19 12:31 AM, Andrew Cooper wrote:
> What we'll have to do is end up in a position where we can have some
> real %dr settings given by the VMI agent, and some shadow %dr settings
> which the guest interacts with.  Also I should warn you at this point
> that, because of how the registers work, It will not be possible to have
> guest-shadowed %dr functioning at the same time as VMI-provided %dr
> settings.
> 
> I guess the main usecase here is simply hiding from the guest kernel
> that debugging activities are in use, and we are ok to break the real
> use of gdb/other inside the guest?  Razvan/Tamas: As your the
> maintainers, it is your call, ultimately.

What worries me here is that in that case it becomes easier for a rogue
application inside the guest to figure out that the guest's being
monitored, if I understand things correctly.

Of course, a dom0 introspection agent may choose to simply not subscribe
to DR events, and thus not alter the current flow at all, which makes
things better.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [VMI] How to add support for MOV-TO-DRx events ?

2019-05-09 Thread Razvan Cojocaru
On 5/9/19 11:57 PM, Mathieu Tarral wrote:
> Hi,
> 
> following a previous conversation, i would like to catch MOV-TO-DRx VMI 
> events to prevent the guest from disabling my hardware breakpoints.
> 
> @Tamas pointed me to this header:
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/vm_event.h;h=b2bafc0d77f9758e42b0d53c05a7e6bb86c86686;hb=HEAD#l154
> 
> And, as far as I can tell, I have to
> - add a new REASON
> #define VM_EVENT_REASON_WRITE_DEBUGREG  15
> 
> - add a new struct
> struct vm_event_write_debugreg {
> uint32_t index;
> uint32_t _pad;
> uint64_t new_value;
> uint64_t old_value;
> };
> 
> - insert it into the vm_event_st union
> 
> Can you give me more pointer and guidance how to implement this into Xen ?

You probably want to change the write_debugreg() macro into a function
that does what's currently being done + send out the vm_event. You also
probably need to think about whether you want the write to be
preemptable or not (I'm guessing you don't, in which case it's all simpler).

> I have never submitted a patch, nor looked into Xen source code.

https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches

> Should we create a single event for MOV-TO-REGx VMI events ?
> It would void copy pasting and duplicating code.

I don't understand this question.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] x86/vm_event: add gdtr_base to the vm_event structure

2019-05-03 Thread Razvan Cojocaru

On 5/3/19 11:09 AM, Jan Beulich wrote:

On 03.05.19 at 00:54,  wrote:

Receiving this register is useful for introspecting 32-bit Windows when the
event being trapped happened while in ring3.

Signed-off-by: Tamas K Lengyel 
Cc: Razvan Cojocaru 
Cc: Tamas K Lengyel 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
---
v2: add gdtr limit
v3: use uint32_t to fit the 20 bits


As per Andrew's response I think v2 is it.


Yes, please. This will also allow us to reuse the existing (remaining) 
pad bits in the future for another limit (for idtr, perhaps).



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86/vm_event: add gdtr_base to the vm_event structure

2019-05-02 Thread Razvan Cojocaru
On 5/3/19 12:42 AM, Tamas K Lengyel wrote:
> Receiving this register is useful for introspecting 32-bit Windows when the
> event being trapped happened while in ring3.
> 
> Signed-off-by: Tamas K Lengyel 
> Cc: Razvan Cojocaru 
> Cc: Tamas K Lengyel 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: Roger Pau Monne 
> ---
> v2: add gdtr limit

Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure

2019-05-02 Thread Razvan Cojocaru

On 5/2/19 4:09 PM, Tamas K Lengyel wrote:

On Thu, May 2, 2019 at 2:57 AM Jan Beulich  wrote:



On 02.05.19 at 10:25,  wrote:

On 5/2/19 2:57 AM, Tamas K Lengyel wrote:

Receiving this register is useful for introspecting 32-bit Windows when the
event being trapped happened while in ring3.

Signed-off-by: Tamas K Lengyel 
Cc: Razvan Cojocaru 
Cc: Tamas K Lengyel 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
---
   xen/arch/x86/vm_event.c   | 5 +
   xen/include/public/vm_event.h | 3 ++-
   2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 51c3493b1d..873788e32f 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum

x86_segment segment,

   reg->es_sel = seg.sel;
   break;

+case x86_seg_gdtr:
+reg->gdtr_base = seg.base;
+break;


Please also add limit, ar, sel, like the others do.


There's no ar or sel for GDT (and IDT). Instead, because of ...


In addition to
making this modification look less strange (since, in contrast to the
function name, nothing is packed for gdtr_base), it will also save
future work for applications wanting to use gdtr which also require
backwards compatibility with previous vm_event versions.

As you know, for each such modification we need to have a separate
vm_event_vX header in our applications so that we're ready to create a
ring buffer using requests and replies of the right size, and also to be
able to properly interpret the ring buffer data, so the least frequent
changes to the vm_event struct, the better.


... this I wonder whether the IDT details shouldn't get added at
the same time.


The churn of the header is not that big of a deal - I do the same in
LibVMI and it's a largely copy-paste job that takes perhaps ~5m to add
(see 
https://github.com/tklengyel/libvmi/commit/7509ce56d0408dbec4e374b8780da69a7bd212e8).
So that should not be a factor in deciding whether we add a needed
extra piece or not. Since the vm_event ABI is changing for Xen 4.13
now the ABI version will only be bumped once for 4.13. So we should be
able to add/remove/shuffle whatever we want while 4.13 merge window is
open.

That said I don't have a use for idt and gdtr_limit that warrants
having to receive it via the vm_event structure - those pieces are
always just a hypercall away if needed. So in the vm_event structure I
tend to just put the registers needed often enough to warrant avoiding
that extra hypercall. But if Razvan says he has a use for them, I can
add them here.


We do use them both - idtr and gdtr, both base and limit, but we are 
indeed getting them via hypercall now. Considering that, since we did 
add gdtr_base I think it would be great if we could also add gdtr_limit.


Adding idtr as well would _theoretically_ be a nice speed optimization 
(removing the need for a hypercall), but it might also be a speed 
pessimization generally applicable to increasing the vm_event size 
(since a VCPU with no more space left in the vm_event ring buffer will 
be blocked more and go through more locking logic).


My point is, at the moment it's fine to skip idtr if it's not required 
by Jan, but if we do add either then let's please add both _base and _limit.


In a hopefully near future we'll be able to use as much memory as 
necessary for storing vm_event data and just cache everything in the 
ring buffer, avoing all the "get context" hypercalls.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure

2019-05-02 Thread Razvan Cojocaru



On 5/2/19 11:52 AM, Andrew Cooper wrote:

On 02/05/2019 09:25, Razvan Cojocaru wrote:

On 5/2/19 2:57 AM, Tamas K Lengyel wrote:

Receiving this register is useful for introspecting 32-bit Windows
when the
event being trapped happened while in ring3.

Signed-off-by: Tamas K Lengyel 
Cc: Razvan Cojocaru 
Cc: Tamas K Lengyel 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
---
   xen/arch/x86/vm_event.c   | 5 +
   xen/include/public/vm_event.h | 3 ++-
   2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 51c3493b1d..873788e32f 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum
x86_segment segment,
   reg->es_sel = seg.sel;
   break;
   +    case x86_seg_gdtr:
+    reg->gdtr_base = seg.base;
+    break;


Please also add limit, ar, sel, like the others do.


In Xen, we model GDTR/IDTR just like all other segments in the segment
cache for convenience/consistency, including faking up of some default
attributes.

However, there is no such thing as a selector or access rights for them,
and the VMCS lacks the fields, while the VMCB marks the fields as
reserved.   It is almost certainly not worth wasting the space in the ring.


Right, I got carried away there: I saw gdtr_limit and didn't check that 
if the rest is available.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU

2019-05-02 Thread Razvan Cojocaru

On 5/2/19 11:45 AM, Andrew Cooper wrote:

On 02/05/2019 07:20, Razvan Cojocaru wrote:

On 5/2/19 2:52 AM, Tamas K Lengyel wrote:

Currently the gs_shadow value is only cached when the vCPU is being scheduled
out by Xen. Reporting this (usually) stale value through vm_event is incorrect,
since it doesn't represent the actual state of the vCPU at the time the event
was recorded. This prevents vm_event subscribers from correctly finding kernel
structures in the guest when it is trapped while in ring3.

Refresh shadow_gs value when the context being saved is for the current vCPU.

Signed-off-by: Tamas K Lengyel 
Cc: Razvan Cojocaru 
Cc: Jun Nakajima 
Cc: Kevin Tian 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
---
v2: move fix to hvm so vm_event doesn't have to know specifics
---
  xen/arch/x86/hvm/vmx/vmx.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 283eb7b34d..5154ecc2a8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct 
hvm_hw_cpu *data)
  
  static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)

  {
+if ( v == current )
+vmx_save_guest_msrs(v);

vmx_save_guest_msrs() is simple enough that the if is probably not
necessary here (we can just call the function directly, regardless of
what v holds).


Why?  Doing that would fully corrupt v's state when called in remote
context, as you'd clobber v's gs_shadow which whatever the value was
from the current vcpu.


Good point, I've missed that.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU

2019-05-02 Thread Razvan Cojocaru

On 5/2/19 11:36 AM, Jan Beulich wrote:

On 02.05.19 at 08:20,  wrote:

On 5/2/19 2:52 AM, Tamas K Lengyel wrote:

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct 
hvm_hw_cpu *data)
  
  static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)

  {
+if ( v == current )
+vmx_save_guest_msrs(v);


vmx_save_guest_msrs() is simple enough that the if is probably not
necessary here (we can just call the function directly, regardless of
what v holds).


Avoiding an MSR access is perhaps worth the conditional.


Fair enough.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/vm_event: add gdtr_base to the vm_event structure

2019-05-02 Thread Razvan Cojocaru

On 5/2/19 2:57 AM, Tamas K Lengyel wrote:

Receiving this register is useful for introspecting 32-bit Windows when the
event being trapped happened while in ring3.

Signed-off-by: Tamas K Lengyel 
Cc: Razvan Cojocaru 
Cc: Tamas K Lengyel 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Roger Pau Monne 
---
  xen/arch/x86/vm_event.c   | 5 +
  xen/include/public/vm_event.h | 3 ++-
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 51c3493b1d..873788e32f 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -179,6 +179,10 @@ static void vm_event_pack_segment_register(enum 
x86_segment segment,
  reg->es_sel = seg.sel;
  break;
  
+case x86_seg_gdtr:

+reg->gdtr_base = seg.base;
+break;


Please also add limit, ar, sel, like the others do. In addition to 
making this modification look less strange (since, in contrast to the 
function name, nothing is packed for gdtr_base), it will also save 
future work for applications wanting to use gdtr which also require 
backwards compatibility with previous vm_event versions.


As you know, for each such modification we need to have a separate 
vm_event_vX header in our applications so that we're ready to create a 
ring buffer using requests and replies of the right size, and also to be 
able to properly interpret the ring buffer data, so the least frequent 
changes to the vm_event struct, the better.


Petre is currently working on the vm_event changes that will hopefully 
enable us to just cache everything that the getcontext_partial hypercall 
retrieves.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86/vmx: correctly gather gs_shadow value for current vCPU

2019-05-01 Thread Razvan Cojocaru
On 5/2/19 2:52 AM, Tamas K Lengyel wrote:
> Currently the gs_shadow value is only cached when the vCPU is being scheduled
> out by Xen. Reporting this (usually) stale value through vm_event is 
> incorrect,
> since it doesn't represent the actual state of the vCPU at the time the event
> was recorded. This prevents vm_event subscribers from correctly finding kernel
> structures in the guest when it is trapped while in ring3.
> 
> Refresh shadow_gs value when the context being saved is for the current vCPU.
> 
> Signed-off-by: Tamas K Lengyel 
> Cc: Razvan Cojocaru 
> Cc: Jun Nakajima 
> Cc: Kevin Tian 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: Roger Pau Monne 
> ---
> v2: move fix to hvm so vm_event doesn't have to know specifics
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 283eb7b34d..5154ecc2a8 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -779,12 +779,17 @@ static void vmx_load_cpu_state(struct vcpu *v, struct 
> hvm_hw_cpu *data)
>  
>  static void vmx_save_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
>  {
> +if ( v == current )
> +vmx_save_guest_msrs(v);

vmx_save_guest_msrs() is simple enough that the if is probably not
necessary here (we can just call the function directly, regardless of
what v holds).

But that's not technically an issue, so if nobody else minds:

Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/vm_event: correctly gather gs_shadow value

2019-05-01 Thread Razvan Cojocaru
On 5/1/19 6:01 PM, Tamas K Lengyel wrote:
> On Wed, May 1, 2019 at 8:53 AM Tamas K Lengyel  wrote:
>>
>> On Wed, May 1, 2019 at 8:20 AM Razvan Cojocaru
>>  wrote:
>>>
>>> On 5/1/19 4:58 PM, Tamas K Lengyel wrote:
>>>>>> It might be worth introducing a "sync state from hw" hook which collects
>>>>>> all the data we intend to pass to the introspection agent.
>>>>>
>>>>> You mean adding another hvm hook?
>>>>
>>>> Actually, instead of another hook I think what would make sense it to
>>>> just update vmx_save_vmcs_ctxt to automatically refresh the cached
>>>> register values when it's called with "v == current". Thoughts?
>>>
>>> That's probably the better way to go about it, since otherwise the
>>> xc_hvm_getcontext_partial() hypercall will suffer from the same problem.
>>> (there are two ways of getting guest state: one is via the vm_event
>>> cached values, the other is via the aforementioned hypercall).
>>
>> True, although issuing the hypercall in the vm_event callback is
>> actually fine - that's how I found the issue to begin with, since the
>> vCPU will be scheduled out with the cached registers refreshed and
>> thus be different then what the vm_event itself had. But other callers
>> of the hypercall can run into the problem if the guest/vcpu is not
>> paused.
> 
> Actually, doing the "v == current" check wouldn't really do anything
> for the hypercall since it's not the domain issuing the hypercall on
> itself. The only way to ensure that hypercall is returning correct
> values would be to pause the vCPU.

I've discussed this with Andrew in the meantime and he's kindly pointed
out that for the hypercall we pause from remote context, which forces a
de-schedule. So the hypercall _should_ be fine. But we write data to the
vm_event ring from the current context, where state might actually be in
hardware.

He'll probably chime in with additional suggestions / comments.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/vm_event: correctly gather gs_shadow value

2019-05-01 Thread Razvan Cojocaru
On 5/1/19 4:58 PM, Tamas K Lengyel wrote:
>>> It might be worth introducing a "sync state from hw" hook which collects
>>> all the data we intend to pass to the introspection agent.
>>
>> You mean adding another hvm hook?
> 
> Actually, instead of another hook I think what would make sense it to
> just update vmx_save_vmcs_ctxt to automatically refresh the cached
> register values when it's called with "v == current". Thoughts?

That's probably the better way to go about it, since otherwise the
xc_hvm_getcontext_partial() hypercall will suffer from the same problem.
(there are two ways of getting guest state: one is via the vm_event
cached values, the other is via the aforementioned hypercall).


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/vm_event: correctly gather gs_shadow value

2019-05-01 Thread Razvan Cojocaru
On 5/1/19 7:22 AM, Tamas K Lengyel wrote:
> Currently the gs_shadow value is only cached when the vCPU is being scheduled
> out by Xen. Reporting this (usually) stale value through vm_event is 
> incorrect,
> since it doesn't represent the actual state of the vCPU at the time the event
> was recorded. This prevents vm_event subscribers from correctly finding kernel
> structures in the guest when it is trapped while in ring3.

Isn't the VCPU always scheduled out (since it is paused) with sync
vm_events? Is this an async fix only?

In any case,

Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.11.1: guest uses altp2m to crash host

2019-04-25 Thread Razvan Cojocaru
On 4/25/19 1:04 PM, Jan Beulich wrote:
 On 15.04.19 at 16:10,  wrote:
>> On 4/15/19 5:04 PM, Matt Leinhos wrote:
>>> You are correct -- cherry picking that commit into RELEASE-4.11.1 
>>> addresses the crash. The problem cannot be reproduced on 4.12.
>>>
>>> I hope this helps.
>>
>> Thank you for confirming the fix!
>>
>> To the deciders: would it be reasonable to ask patches like this to be 
>> backported to the future 4.11 release?
> 
> Well, first of all you can always send a request. And yes, in general
> I'd consider this reasonable, but I say so without having looked at
> the actual commit(s) in question.

Thank you for the answer. This is the commit in question:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1dddfff4c39d3db17dfa709b1c57f44e3ed352e3;hp=68465944c81fa2fbc40bf29ec6084072af5e48ec

AFAICT there's nothing controversial about it, and it should easily
apply to 4.11, so if possible I propose that we consider backporting it
to the next 4.11 release.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] VMI: singlestep event not received

2019-04-24 Thread Razvan Cojocaru



On 4/24/19 6:25 PM, Tamas K Lengyel wrote:

On Mon, Apr 22, 2019 at 6:00 PM Mathieu Tarral
 wrote:


On Monday 22 April 2019 11:22, Razvan Cojocaru  
wrote:


On 4/22/19 1:26 AM, Mathieu Tarral wrote:


The problem is that RtlUserThreadStart is paged-out, so i'm trying to reach it 
via singlestepping as a backup solution.


FWIW, you can just use xc_hvm_inject_trap() with a TRAP_page_fault
parameter to cause the OS to bring a paged out page back into main memory.


The problem is that i'm fully based on LibVMI, which doesn't provide an API for 
page injection yet.

I think Tamas was waiting for a full support of pagefault injection on Xen 
before opening an API.

Maybe we should open an issue to track the progress about it.

Thanks anyway for the suggestion !


Not having a LibVMI wrapper should not prevent you from using a libxc
function directly. There are many corner-cases regarding when it is
safe to inject a pagefault into the guest to bring back pages and that
logic is not trivial enough to be implemented in LibVMI. It's way too
specific to the usecase to decide when it's safe to do that. So in
this case, if you know you are in ring3 of your target process you can
just call the libxc function directly.


That's very very true, however I think we should assume that libraries 
such as libvmi and libbdvmi are used by client code smart enough to 
figure out, using said libraries, enough things about what's appropriate 
(or not) to do, and  let them do it rather than second-guess them. 
Hence, in libbdvmi I've wrapped that, and added a comment basically 
saying what you're saying above:


https://github.com/bitdefender/libbdvmi/blob/master/src/xendriver.cpp#L542

If we end up having the client code call backend-specific code directly 
we're not, after all, really abstracting it with our wrapper libraries.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] x86/mm: Make change_type_range return error

2019-04-24 Thread Razvan Cojocaru

On 4/24/19 5:46 PM, Roger Pau Monné wrote:

On Wed, Apr 24, 2019 at 02:27:32PM +, Alexandru Stefan ISAILA wrote:

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 9e81a30cc4..27697d5a77 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1028,7 +1028,7 @@ int p2m_change_type_one(struct domain *d, unsigned long 
gfn_l,
  }
  
  /* Modify the p2m type of [start, end_exclusive) from ot to nt. */

-static void change_type_range(struct p2m_domain *p2m,
+static int change_type_range(struct p2m_domain *p2m,
unsigned long start, unsigned long 
end_exclusive,
p2m_type_t ot, p2m_type_t nt)
  {
@@ -1053,15 +1053,11 @@ static void change_type_range(struct p2m_domain *p2m,
   * This should be revisited later, but for now post a warning.
   */
  if ( unlikely(end > host_max_pfn) )
-{
-printk(XENLOG_G_WARNING "Dom%d logdirty rangeset clipped to 
max_mapped_pfn\n",
-   d->domain_id);
-end = invalidate_end = host_max_pfn;
-}
+return -EINVAL;
  
  /* If the requested range is out of scope, return doing nothing. */

  if ( start > end )
-return;
+return 0;


Since you are already changing the behavior of the function above this
should also return EINVAL IMO.


My fault, I suggested 0 there. :)


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] VMI: singlestep event not received

2019-04-22 Thread Razvan Cojocaru

On 4/22/19 1:26 AM, Mathieu Tarral wrote:

The problem is that RtlUserThreadStart is paged-out, so i'm trying to reach it 
via singlestepping as a backup solution.


FWIW, you can just use xc_hvm_inject_trap() with a TRAP_page_fault 
parameter to cause the OS to bring a paged out page back into main memory.



HTH,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.11.1: guest uses altp2m to crash host

2019-04-15 Thread Razvan Cojocaru

On 4/15/19 5:04 PM, Matt Leinhos wrote:
You are correct -- cherry picking that commit into RELEASE-4.11.1 
addresses the crash. The problem cannot be reproduced on 4.12.


I hope this helps.


Thank you for confirming the fix!

To the deciders: would it be reasonable to ask patches like this to be 
backported to the future 4.11 release?



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.11.1: guest uses altp2m to crash host

2019-04-15 Thread Razvan Cojocaru

Hello Matt,

Judging by the description of your problem, I believe that it has 
already been fixed by this commit:


http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1dddfff4c39d3db17dfa709b1c57f44e3ed352e3;hp=68465944c81fa2fbc40bf29ec6084072af5e48ec

Could you please either try to reproduce the problem with 4.12 or apply 
this patch to 4.11 and confirm?



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/hvm: Fix altp2m_op hypercall continuations

2019-04-09 Thread Razvan Cojocaru

On 4/8/19 9:34 PM, Andrew Cooper wrote:

On 08/04/2019 19:13, Razvan Cojocaru wrote:

The changes certainly look correct. I'll run some tests tomorrow as well.


Thanks,


Continuation appears to work correctly with our tests (64-bit PV domain 
calling 64-bit Xen / dom0). Since we're only using the external altp2m 
model, HVMOPs from a 32-bit guest never occur, so the issues this patch 
fixes are not a problem for our introspection agent, at least.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/hvm: Fix altp2m_op hypercall continuations

2019-04-08 Thread Razvan Cojocaru
On 4/8/19 8:39 PM, Andrew Cooper wrote:
> c/s 9383de210 "x86/altp2m: support for setting restrictions for an array of
> pages" introduced this logic, but do_hvm_op() was already capable of handling
> -ERESTART correctly.
> 
> More problematic however is a continuation from compat_altp2m_op().  The arg
> written back into register state points into the hypercall XLAT area, not at
> the original parameter passed by the guest.  It may be truncated by the
> vmentry, but definitely won't be correct on the next invocation.
> 
> Delete the hypercall_create_continuation() call, and return -ERESTART, which
> will cause the compat case to start working correctly.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Razvan Cojocaru 
> CC: Petre Pircalabu 
> ---
>  xen/arch/x86/hvm/hvm.c | 12 ++--
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index e798b49..cc89ee7 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4729,12 +4729,10 @@ static int do_altp2m_op(
>  if ( rc > 0 )
>  {
>  a.u.set_mem_access_multi.opaque = rc;
> +rc = -ERESTART;
>  if ( __copy_field_to_guest(guest_handle_cast(arg, 
> xen_hvm_altp2m_op_t),
> &a, u.set_mem_access_multi.opaque) )
>  rc = -EFAULT;
> -else
> -rc = hypercall_create_continuation(__HYPERVISOR_hvm_op, "lh",
> -   HVMOP_altp2m, arg);
>  }
>  break;

Right, that part was taken from the XENMEM_access_op_set_access_multi
code and plopped in. Didn't follow the call chain all the way through
and so missed the simpler way of doing this.

The changes certainly look correct. I'll run some tests tomorrow as well.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] vm_event: Fix XEN_VM_EVENT_RESUME domctl

2019-04-05 Thread Razvan Cojocaru

On 4/5/19 12:33 PM, Petre Pircalabu wrote:

Make XEN_VM_EVENT_RESUME return 0 in case of success, instead of
-EINVAL.
Remove vm_event_resume form vm_event.h header and set the function's
visibility to static as is used only in vm_event.c.
Move the vm_event_check_ring test inside vm_event_resume in order to
simplify the code.

Signed-off-by: Petre Pircalabu 


Acked-by: Razvan Cojocaru 



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve

2019-04-04 Thread Razvan Cojocaru

On 4/4/19 3:46 PM, Razvan Cojocaru wrote:

On 4/3/19 6:30 PM, Jan Beulich wrote:

On 03.04.19 at 17:17,  wrote:

On 4/3/19 5:58 PM, Jan Beulich wrote:

On 03.04.19 at 16:29,  wrote:

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -3011,8 +3011,16 @@ int p2m_set_suppress_ve(struct domain *d, 
gfn_t gfn, bool suppress_ve,

   mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, NULL);
   if ( !mfn_valid(mfn) )
   {
-    rc = -ESRCH;
-    goto out;
+    unsigned int page_order;
+
+    mfn = __get_gfn_type_access(host_p2m, gfn_x(gfn), &t, &a,
+    P2M_ALLOC | P2M_UNSHARE, 
&page_order, 0);


I'm not entirely certain about P2M_ALLOC, but I'm pretty sure that
at least P2M_UNSHARE is too heavy: Why would you want to force
un-sharing of a page when all you want to alter is #VE behavior?


That logic was taken from p2m_set_altp2m_mem_access(), we thought the
two cases are very similar.


I see.


On the UNSHARE observation, we don't know why the author originally 
requested the flag. We decided to keep it on the assumption that it 
_probably_ handles some corner-case that somebody has come accross.


We'll prepare a mini-series factoring out the code we've been discussing 
in separate functions: one for getting things out of the hostp2m if the 
entry is not present in the altp2m, and one for the special 
page-order-dependent code (which is duplicated in 
p2m_set_altp2m_mem_access() and p2m_change_altp2m_gfn()).


Before going into that, are we now certain that ALLOC is sufficient? I 
believe it should be for _our_ use-cases, but we don't want to break 
anyone's code. Maybe Tamas knows more about this.


Sorry, I forgot to mention that p2m_change_altp2m_gfn() only uses ALLOC:

2649 /* Check host p2m if no valid entry in alternate */
2650 if ( !mfn_valid(mfn) )
2651 {
2652 mfn = __get_gfn_type_access(hp2m, gfn_x(old_gfn), &t, &a,
2653 P2M_ALLOC, &page_order, 0);
2654
2655 if ( !mfn_valid(mfn) || t != p2m_ram_rw )
2656 goto out;
2657
2658 /* If this is a superpage, copy that first */
2659 if ( page_order != PAGE_ORDER_4K )
2660 {
2661 gfn_t gfn;
2662 unsigned long mask;
2663
2664 mask = ~((1UL << page_order) - 1);
2665 gfn = _gfn(gfn_x(old_gfn) & mask);
2666 mfn = _mfn(mfn_x(mfn) & mask);
2667
2668 if ( ap2m->set_entry(ap2m, gfn, mfn, page_order, t, a, 1) )
2669 goto out;
2670 }
2671 }

Confusing...


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve

2019-04-04 Thread Razvan Cojocaru

On 4/3/19 6:30 PM, Jan Beulich wrote:

On 03.04.19 at 17:17,  wrote:

On 4/3/19 5:58 PM, Jan Beulich wrote:

On 03.04.19 at 16:29,  wrote:

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -3011,8 +3011,16 @@ int p2m_set_suppress_ve(struct domain *d, gfn_t gfn, 
bool suppress_ve,
   mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, NULL);
   if ( !mfn_valid(mfn) )
   {
-rc = -ESRCH;
-goto out;
+unsigned int page_order;
+
+mfn = __get_gfn_type_access(host_p2m, gfn_x(gfn), &t, &a,
+P2M_ALLOC | P2M_UNSHARE, &page_order, 0);


I'm not entirely certain about P2M_ALLOC, but I'm pretty sure that
at least P2M_UNSHARE is too heavy: Why would you want to force
un-sharing of a page when all you want to alter is #VE behavior?


That logic was taken from p2m_set_altp2m_mem_access(), we thought the
two cases are very similar.


I see.


On the UNSHARE observation, we don't know why the author originally 
requested the flag. We decided to keep it on the assumption that it 
_probably_ handles some corner-case that somebody has come accross.


We'll prepare a mini-series factoring out the code we've been discussing 
in separate functions: one for getting things out of the hostp2m if the 
entry is not present in the altp2m, and one for the special 
page-order-dependent code (which is duplicated in 
p2m_set_altp2m_mem_access() and p2m_change_altp2m_gfn()).


Before going into that, are we now certain that ALLOC is sufficient? I 
believe it should be for _our_ use-cases, but we don't want to break 
anyone's code. Maybe Tamas knows more about this.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve

2019-04-03 Thread Razvan Cojocaru
On 4/3/19 7:04 PM, Jan Beulich wrote:
 On 03.04.19 at 17:48,  wrote:
>> On 4/3/19 6:30 PM, Jan Beulich wrote:
>> On 03.04.19 at 17:17,  wrote:
 Changes to the hostp2m (also known as altp2m view 0) propagate to all
 existing altp2ms, but they do so in a lazy manner, and also that won't
 happen for altp2ms created after a while. So altp2ms will not
 necessarily know about a page that the hostp2m knows about, which should
 not stop us from setting mem access restrictions or the value of the SVE
 bit.
>>>
>>> But even if the propagation is lazy, a "get" should then return the
>>> propagated value, shouldn't it? Or else callers (like is the case here)
>>> can be mislead. I do recognize that there may be callers who
>>> intentionally _don't_ want the propagated value, but I'd expect this
>>> to be the exception, not the rule. I wonder therefore whether there
>>> shouldn't be a wrapper around ->get_entry() to take care of this for
>>> callers which want the propagated value. Calling the bare hook would
>>> remain as is.
>>
>> But I believe that some hostp2m entries will never be propagated. Sorry 
>> that I've not been clear about this. This is what I meant by "won't 
>> happen for altp2ms created after a while".
>>
>> Take, for example, the case where there's only the hostp2m for the first 
>> 30 minutes of a guest's life, and in this time somebody calls 
>> ept_set_entry(). This, in turn, calls p2m_altp2m_propagate_change() if ( 
>> entry_written && p2m_is_hostp2m(p2m) ).
>>
>> But at this point there are no altp2ms, so there's nowhere to propagate 
>> these changes to.
>>
>> Now, at minute 31 we create a new altp2m. The changes will never be 
>> propagated here, so altp2m->get_entry() will always (rightfully) return 
>> an invalid MFN.
> 
> I guess I'm missing something here: Why "rightfully"? If the guest
> runs on this altp2m, it'll die a quick death then, won't it? Yet if
> this is intended to be demand-populate, then why would there be
> automatic propagation for existing altp2m-s (as you explain
> further up, unless I'm getting this wrong)?
> 
>> We only want to make an exception for setting the SVE bit or mem access 
>> restrictions on an entry in the altp2m, but having get_entry() (or a 
>> wrapper) return the hostp2m values and MFN might not necessarily be what 
>> current callers expect.
> 
> Then I still don't see why you would want to force propagation
> in these cases: If it's generally demand-populate, it can't be right
> to _fully_ populate that slot. You ought to be setting the access
> type or the "suppress #VE" bit alone, without propagating the
> MFN and alike. The later, when the entry actually gets propagated,
> the entered values should be used as overrides to what comes
> from the host p2m.
To try to answer your question to the best of my knowledge, the altp2m
propagation appears to be a hybrid between lazy propagation, in
hvm_hap_nested_page_fault() (xen/arch/x86/hvm/hvm.c):

1748 ap2m_active = altp2m_active(currd);
1749
1750 /*
1751  * Take a lock on the host p2m speculatively, to avoid potential
1752  * locking order problems later and to handle unshare etc.
1753  */
1754 hostp2m = p2m_get_hostp2m(currd);
1755 mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma,
1756   P2M_ALLOC | (npfec.write_access ?
P2M_UNSHARE : 0),
1757   NULL);
1758
1759 if ( ap2m_active )
1760 {
1761 if ( p2m_altp2m_lazy_copy(curr, gpa, gla, npfec, &p2m) )
1762 {
1763 /* entry was lazily copied from host -- retry */
1764 __put_gfn(hostp2m, gfn);
1765 rc = 1;
1766 goto out;
1767 }
1768
1769 mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL);
1770 }
1771 else
1772 p2m = hostp2m;

and immediate propagation, in ept_set_entry() (xen/arch/x86/mm/p2m-ept.c):

 827 if ( entry_written && p2m_is_hostp2m(p2m) )
 828 {
 829 ret = p2m_altp2m_propagate_change(d, _gfn(gfn), mfn, order,
p2mt, p2ma);
 830 if ( !rc )
 831 rc = ret;
 832 }

Either way, doing what our patch does should pose no problem as far as I
can tell. For the lazy copy case, p2m_altp2m_lazy_copy() will do
nothing, as our function has already populated the slot correctly:

2378 bool_t p2m_altp2m_lazy_copy(struct vcpu *v, paddr_t gpa,
2379 unsigned long gla, struct npfec npfec,
2380 struct p2m_domain **ap2m)
2381 {
2382 struct p2m_domain *hp2m = p2m_get_hostp2m(v->domain);
2383 p2m_type_t p2mt;
2384 p2m_access_t p2ma;
2385 unsigned int page_order;
2386 gfn_t gfn = _gfn(paddr_to_pfn(gpa));
2387 unsigned long mask;
2388 mfn_t mfn;
2389 int rv;
2390
2391 *ap2m = p2m_get_altp2m(v);
2392
2393 mfn = get_gfn_type_access(*ap2m, gfn_x(gfn), &p2mt, &p2ma,
2394   0, &page_order);
2395   

Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve

2019-04-03 Thread Razvan Cojocaru
On 4/3/19 7:16 PM, Tamas K Lengyel wrote:
> It is right to fully populate a slot when for example we want
> different mem_access permissions in different altp2m views. We can't
> wait for the domain to on-demand populate the altp2m view because we
> don't know when (and if) that will actually happen. So
> p2m_set_altp2m_mem_access already copies the entry over from the
> hostp2m to the altp2m and then applies the requested mem_access
> setting. This is done even if the mem_access setting is the same as it
> was in the hostp2m.
> 
> Doing the same behavior for p2m_set_suppress_ve I think is consistent,
> albeit you could easily overcome the issue by first simply setting a
> mem_access permission on the gfn to trigger the aforementioned copy to
> the altp2m before you try to set the ve settings.

That's what we're doing now, but it wasn't at all intuitive to figure it
out. I'd expect that setting the SVE bit simply works without
maneuvering something else in place first.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve

2019-04-03 Thread Razvan Cojocaru

On 4/3/19 6:30 PM, Jan Beulich wrote:

On 03.04.19 at 17:17,  wrote:

Changes to the hostp2m (also known as altp2m view 0) propagate to all
existing altp2ms, but they do so in a lazy manner, and also that won't
happen for altp2ms created after a while. So altp2ms will not
necessarily know about a page that the hostp2m knows about, which should
not stop us from setting mem access restrictions or the value of the SVE
bit.


But even if the propagation is lazy, a "get" should then return the
propagated value, shouldn't it? Or else callers (like is the case here)
can be mislead. I do recognize that there may be callers who
intentionally _don't_ want the propagated value, but I'd expect this
to be the exception, not the rule. I wonder therefore whether there
shouldn't be a wrapper around ->get_entry() to take care of this for
callers which want the propagated value. Calling the bare hook would
remain as is.


But I believe that some hostp2m entries will never be propagated. Sorry 
that I've not been clear about this. This is what I meant by "won't 
happen for altp2ms created after a while".


Take, for example, the case where there's only the hostp2m for the first 
30 minutes of a guest's life, and in this time somebody calls 
ept_set_entry(). This, in turn, calls p2m_altp2m_propagate_change() if ( 
entry_written && p2m_is_hostp2m(p2m) ).


But at this point there are no altp2ms, so there's nowhere to propagate 
these changes to.


Now, at minute 31 we create a new altp2m. The changes will never be 
propagated here, so altp2m->get_entry() will always (rightfully) return 
an invalid MFN.


We only want to make an exception for setting the SVE bit or mem access 
restrictions on an entry in the altp2m, but having get_entry() (or a 
wrapper) return the hostp2m values and MFN might not necessarily be what 
current callers expect.


Whether the described scenario _should_ work the way it does is 
debatable, but it appears to do so by design.


Hopefully I'm not missing anything.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] Fix p2m_set_suppress_ve

2019-04-03 Thread Razvan Cojocaru

On 4/3/19 5:58 PM, Jan Beulich wrote:

On 03.04.19 at 16:29,  wrote:

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -3011,8 +3011,16 @@ int p2m_set_suppress_ve(struct domain *d, gfn_t gfn, 
bool suppress_ve,
  mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, NULL);
  if ( !mfn_valid(mfn) )
  {
-rc = -ESRCH;
-goto out;
+unsigned int page_order;
+
+mfn = __get_gfn_type_access(host_p2m, gfn_x(gfn), &t, &a,
+P2M_ALLOC | P2M_UNSHARE, &page_order, 0);


I'm not entirely certain about P2M_ALLOC, but I'm pretty sure that
at least P2M_UNSHARE is too heavy: Why would you want to force
un-sharing of a page when all you want to alter is #VE behavior?


That logic was taken from p2m_set_altp2m_mem_access(), we thought the 
two cases are very similar.


269 mfn = ap2m->get_entry(ap2m, gfn, &t, &old_a, 0, NULL, NULL);
270
271 /* Check host p2m if no valid entry in alternate */
272 if ( !mfn_valid(mfn) )
273 {
274
275 mfn = __get_gfn_type_access(hp2m, gfn_l, &t, &old_a,
276 P2M_ALLOC | P2M_UNSHARE, 
&page_order, 0);

277
278 rc = -ESRCH;
279 if ( !mfn_valid(mfn) || t != p2m_ram_rw )
280 return rc;
281
282 /* If this is a superpage, copy that first */
283 if ( page_order != PAGE_ORDER_4K )
284 {
285 unsigned long mask = ~((1UL << page_order) - 1);
286 gfn_t gfn2 = _gfn(gfn_l & mask);
287 mfn_t mfn2 = _mfn(mfn_x(mfn) & mask);
288
289 rc = ap2m->set_entry(ap2m, gfn2, mfn2, page_order, t, 
old_a, 1);

290 if ( rc )
291 return rc;
292 }
293 }
294
295 /*
296  * Inherit the old suppress #VE bit value if it is already set, 
or set it

297  * to 1 otherwise
298  */
299 return ap2m->set_entry(ap2m, gfn, mfn, PAGE_ORDER_4K, t, a, -1);
300 }

I wonder if we should put the whole logic in the "if ( !mfn_valid(mfn) 
)" body in its own function and reuse that for both functions - although 
it doesn't look like the extra superpage logic matters for setting the 
suppress #VE bit alone (since even the code above only sets it with 
PAGE_ORDER_4K).



Additionally, when you add such a lookup as error handling attempt,
I think it is important to leave a code comment. But I wonder
whether this shouldn't be done before the call to ->get_entry(), or
whether in fact there's a bug here in how get_entry() behaves in
this case.


Changes to the hostp2m (also known as altp2m view 0) propagate to all 
existing altp2ms, but they do so in a lazy manner, and also that won't 
happen for altp2ms created after a while. So altp2ms will not 
necessarily know about a page that the hostp2m knows about, which should 
not stop us from setting mem access restrictions or the value of the SVE 
bit.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86/altp2m: treat view 0 as the hostp2m in p2m_get_mem_access()

2019-04-03 Thread Razvan Cojocaru
p2m_set_mem_access() (and other places) treat view 0 as the
hostp2m, but p2m_get_mem_access() does not. Correct that
inconsistency.

Signed-off-by: Razvan Cojocaru 
---
 xen/arch/x86/mm/mem_access.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
index 56c06a4..a144bb0 100644
--- a/xen/arch/x86/mm/mem_access.c
+++ b/xen/arch/x86/mm/mem_access.c
@@ -505,7 +505,7 @@ int p2m_get_mem_access(struct domain *d, gfn_t gfn, 
xenmem_access_t *access,
 if ( altp2m_idx )
 return -EINVAL;
 }
-else
+else if ( altp2m_idx ) /* altp2m view 0 is treated as the hostp2m */
 {
 if ( altp2m_idx >= MAX_ALTP2M ||
  d->arch.altp2m_eptp[altp2m_idx] == mfn_x(INVALID_MFN) )
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 04/12] xen/arm: memaccess: Initialize correctly *access in __p2m_get_mem_access

2019-03-27 Thread Razvan Cojocaru
On 3/27/19 8:45 PM, Julien Grall wrote:
> The commit 8d84e701fd "xen/arm: initialize access" initializes
> *access using the wrong enumeration type. This result to a warning
> using clang:
> 
> mem_access.c:50:20: error: implicit conversion from enumeration type
> 'p2m_access_t' to different enumeration type 'xenmem_access_t'
> [-Werror,-Wenum-conversion]
> *access = p2m->default_access;
> ~ ~^~
> 
> The correct solution is to use the array memaccess that will do the
> conversion between the 2 enums.
> 
> Fixes: 8d84e701fd ("xen/arm: initialize access")
> Signed-off-by: Julien Grall 

Acked-by: Razvan Cojocaru 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Commit 16cc3362aed39e3093419b9df6ec73269071d063 configure failure

2019-03-13 Thread Razvan Cojocaru

On 3/13/19 5:02 PM, Wei Liu wrote:

On Wed, Mar 13, 2019 at 01:31:22PM +0200, Razvan Cojocaru wrote:

On 3/13/19 1:28 PM, Wei Liu wrote:

On Wed, Mar 13, 2019 at 01:24:06PM +0200, Razvan Cojocaru wrote:

Hello,

Commit "build/m4: make python_devel.m4 work with both python 2 and 3" makes
my configure run fail, at least on my Ubuntu 16.04.6 LTS machine.

http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=16cc3362aed39e3093419b9df6ec73269071d063

checking for python-config... /usr/bin/python-config
checking Python.h usability... yes
checking Python.h presence... yes
checking for Python.h... yes
checking for PyArg_ParseTuple... no
configure: error: Unable to find a suitable python development library
configure: error: ./configure failed for tools

Checking out the commit below it, configure works. I've been looking at
possibly installing some additional Python-related package on that machine
but so far nothing I could think of seems to be doing the trick.

Of course, if Xen's no longer supposed to build on setups such as mine then
sorry for the noise.



Yeah I'm aware of the failure. Thanks for reporting.

You're supposed to install python-dev now, but that still won't resolve
the issue.  I think there is a subtle difference between Debian and
Ubuntu. I will try to fix it today.


Indeed it won't - I do have it installed (2.7.12-1~16.04).


Pull from staging -- it should be fixed now.


It does build now indeed. Thanks for the quick fix.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Commit 16cc3362aed39e3093419b9df6ec73269071d063 configure failure

2019-03-13 Thread Razvan Cojocaru

On 3/13/19 1:28 PM, Wei Liu wrote:

On Wed, Mar 13, 2019 at 01:24:06PM +0200, Razvan Cojocaru wrote:

Hello,

Commit "build/m4: make python_devel.m4 work with both python 2 and 3" makes
my configure run fail, at least on my Ubuntu 16.04.6 LTS machine.

http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=16cc3362aed39e3093419b9df6ec73269071d063

checking for python-config... /usr/bin/python-config
checking Python.h usability... yes
checking Python.h presence... yes
checking for Python.h... yes
checking for PyArg_ParseTuple... no
configure: error: Unable to find a suitable python development library
configure: error: ./configure failed for tools

Checking out the commit below it, configure works. I've been looking at
possibly installing some additional Python-related package on that machine
but so far nothing I could think of seems to be doing the trick.

Of course, if Xen's no longer supposed to build on setups such as mine then
sorry for the noise.



Yeah I'm aware of the failure. Thanks for reporting.

You're supposed to install python-dev now, but that still won't resolve
the issue.  I think there is a subtle difference between Debian and
Ubuntu. I will try to fix it today.


Indeed it won't - I do have it installed (2.7.12-1~16.04).


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] Commit 16cc3362aed39e3093419b9df6ec73269071d063 configure failure

2019-03-13 Thread Razvan Cojocaru

Hello,

Commit "build/m4: make python_devel.m4 work with both python 2 and 3" 
makes my configure run fail, at least on my Ubuntu 16.04.6 LTS machine.


http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=16cc3362aed39e3093419b9df6ec73269071d063

checking for python-config... /usr/bin/python-config
checking Python.h usability... yes
checking Python.h presence... yes
checking for Python.h... yes
checking for PyArg_ParseTuple... no
configure: error: Unable to find a suitable python development library
configure: error: ./configure failed for tools

Checking out the commit below it, configure works. I've been looking at 
possibly installing some additional Python-related package on that 
machine but so far nothing I could think of seems to be doing the trick.


Of course, if Xen's no longer supposed to build on setups such as mine 
then sorry for the noise.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 3/4] x86/vmx: Fix security issue when a guest balloons out the #VE info page

2019-02-21 Thread Razvan Cojocaru
On 2/21/19 10:18 PM, Andrew Cooper wrote:
> The logic in altp2m_vcpu_{en,dis}able_ve() and vmx_vcpu_update_vmfunc_ve() is
> dangerous.  After #VE has been set up, the guest can balloon out and free the
> nominated GFN, after which the processor may write to it.  Also, the unlocked
> GFN query means the MFN is stale by the time it is used.  Alternatively, a
> guest can race two disable calls to cause one VMCS to still reference the
> nominated GFN after the tracking information was dropped.
> 
> Rework the logic from scratch to make it safe.
> 
> Hold an extra page reference on the underlying frame, to account for the
> VMCS's reference.  This means that if the GFN gets ballooned out, it isn't
> freed back to Xen until #VE is disabled, and the VMCS no longer refers to the
> page.
> 
> A consequence of this is that altp2m_vcpu_disable_ve() needs to be called
> during the domain_kill() path, to drop the reference for domains which shut
> down with #VE still enabled.
> 
> For domains using altp2m, we expect a single enable call and no disable for
> the remaining lifetime of the domain.  However, to avoid problems with
> concurrent calls, use cmpxchg() to locklessly maintain safety.
> 
> This doesn't have an XSA because altp2m is not yet a security-supported
> feature.
> 
> Signed-off-by: Andrew Cooper 
> Release-acked-by: Juergen Gross 

Tested-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/4] x86/vmx: Fix security issue when a guest balloons out the #VE info page

2019-02-20 Thread Razvan Cojocaru

On 2/20/19 12:18 AM, Andrew Cooper wrote:

The logic in altp2m_vcpu_{en,dis}able_ve() and vmx_vcpu_update_vmfunc_ve() is
dangerous.  After #VE has been set up, the guest can balloon out and free the
nominated GFN, after which the processor may write to it.  Also, the unlocked
GFN query means the MFN is stale by the time it is used.  Alternatively, a
guest can race two disable calls to cause one VMCS to still reference the
nominated GFN after the tracking information was dropped.

Rework the logic from scratch to make it safe.

Hold an extra page reference on the underlying frame, to account for the
VMCS's reference.  This means that if the GFN gets ballooned out, it isn't
freed back to Xen until #VE is disabled, and the VMCS no longer refers to the
page.

A consequence of this is that arch_vcpu_unmap_resources() now needs to call
altp2m_vcpu_disable_ve() to drop the reference during domain_kill(), to allow
all of the memory to be freed.

For domains using altp2m, we expect a single enable call and no disable for
the remaining lifetime of the domain.  However, to avoid problems with
concurrent calls, use cmpxchg() to locklessly maintain safety.

This doesn't have an XSA because altp2m is not yet a security-supported
feature.

Signed-off-by: Andrew Cooper


Tested-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/4] x86/vmx: Properly flush the TLB when an altp2m is modified

2019-02-19 Thread Razvan Cojocaru
On 2/20/19 12:18 AM, Andrew Cooper wrote:
> Modificaitons to an altp2m mark the p2m as needing flushing, but this was
> never wired up in the return-to-guest path.  As a result, stale TLB entries
> can remain after resuming the guest.
> 
> In practice, this manifests as a missing EPT_VIOLATION or #VE exception when
> the guest subsequently accesses a page which has had its permissions reduced.
> 
> vmx_vmenter_helper() now has 11 p2ms to potentially invalidate, but issuing 11
> INVEPT instructions isn't clever.  Instead, count how many contexts need
> invalidating, and use INVEPT_ALL_CONTEXT if two or more are in need of
> flushing.
> 
> This doesn't have an XSA because altp2m is not yet a security-supported
> feature.
> 
> Signed-off-by: Andrew Cooper 
Reviewed-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] x86/altp2m: Rework #VE enable/disable paths

2019-02-19 Thread Razvan Cojocaru
On 2/20/19 12:18 AM, Andrew Cooper wrote:
> Split altp2m_vcpu_{enable,disable}_ve() out of the
> HVMOP_altp2m_vcpu_{enable,disable}_notify marshalling logic.  A future change
> is going to need to call altp2m_vcpu_disable_ve() from the domain_kill() path.
> 
> While at it, clean up the logic in altp2m_vcpu_{initialise,destroy}().
> altp2m_vcpu_reset() has no external callers, so fold it into its two
> callsites.  This in turn allows for altp2m_vcpu_destroy() to reuse
> altp2m_vcpu_disable_ve() rather than opencoding it.
> 
> No practical change.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/4] xen/common: Break domain_unmap_resources() out of domain_kill()

2019-02-19 Thread Razvan Cojocaru
On 2/20/19 12:18 AM, Andrew Cooper wrote:
> A subsequent change is going to need an x86-specific unmapping step, so take
> the opportunity to split the current vcpu unmapping out into a dedicated path.
> 
> No practical change.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Razvan Cojocaru 
> CC: Tamas K Lengyel 
> CC: Juergen Gross 
> ---
>  xen/common/domain.c  | 16 +---
>  xen/include/xen/domain.h |  4 
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 32bca8d..e66f7ea 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -700,10 +700,21 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom, 
> struct domain **d)
>  return 0;
>  }
>  
> +static void domain_unmap_resources(struct domain *d)
> +{
> +struct vcpu *v;
> +
> +for_each_vcpu ( d, v )
> +{
> +unmap_vcpu_info(v);
> +
> +arch_vcpu_unmap_resources(v);
> +}
> +}
> +
>  int domain_kill(struct domain *d)
>  {
>  int rc = 0;
> -struct vcpu *v;
>  
>  if ( d == current->domain )
>  return -EINVAL;
> @@ -732,13 +743,12 @@ int domain_kill(struct domain *d)
>  d->tmem_client = NULL;
>  /* fallthrough */
>  case DOMDYING_dying:
> +domain_unmap_resources(d);
>  rc = domain_relinquish_resources(d);
>  if ( rc != 0 )
>  break;
>  if ( cpupool_move_domain(d, cpupool0) )
>  return -ERESTART;a
> -for_each_vcpu ( d, v )
> -unmap_vcpu_info(v);

So before this change there was a leak of some sort here? I see that
before this patch unmap_vcpu_info() was called only if rc == 0, but it
is now called unconditionally _before_ domain_relinquish_resources().

This does appear to change the behaviour of the code in the error case.

If that's intended:
Reviewed-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC 0/6] Slotted channels for sync vm_events

2019-02-19 Thread Razvan Cojocaru

On 2/12/19 7:01 PM, Tamas K Lengyel wrote:

On Thu, Feb 7, 2019 at 9:06 AM Petre Ovidiu PIRCALABU
 wrote:


On Thu, 2019-02-07 at 11:46 +, George Dunlap wrote:

On 2/6/19 2:26 PM, Petre Ovidiu PIRCALABU wrote:

On Wed, 2018-12-19 at 20:52 +0200, Petre Pircalabu wrote:

This patchset is a rework of the "multi-page ring buffer" for
vm_events
patch based on Andrew Cooper's comments.
For synchronous vm_events the ring waitqueue logic was
unnecessary as
the
vcpu sending the request was blocked until a response was
received.
To simplify the request/response mechanism, an array of slotted
channels
was created, one per vcpu. Each vcpu puts the request in the
corresponding slot and blocks until the response is received.

I'm sending this patch as a RFC because, while I'm still working
on
way to
measure the overall performance improvement, your feedback would
be a
great
assistance.



Is anyone still using asynchronous vm_event requests? (the vcpu is
not
blocked and no response is expected).
If not, I suggest that the feature should be removed as it
(significantly) increases the complexity of the current
implementation.


Could you describe in a bit more detail what the situation
is?  What's
the current state fo affairs with vm_events, what you're trying to
change, why async vm_events is more difficult?


The main reason for the vm_events modification was to improve the
overall performance in high throughput introspection scenarios. For
domus with a higher vcpu count, a vcpu could sleep for a certain period
of time while waiting for a ring slot to become available
(__vm_event_claim_slot)
The first patchset only increased the ring size, and the second
iteraton, based on Andrew Copper's comments, proposed a separate path
to handle synchronous events ( a slotted buffer for each vcpu) in order
to have the events handled independently of one another. To handle
asynchronous events, a dynamically allocated vm_event ring is used.
While the implementation is not exactly an exercise in simplicity, it
preserves all the needed functionality and offers fallback if the Linux
domain running the monitor application doesn't support
IOCTL_PRIVCMD_MMAP_RESOURCE.
However, the problem got a little bit more complicated when I tried
implementing the vm_events using an IOREQ server (based on Paul
Durrant's comments). For synchronous vm_events, it simplified things a
little, eliminating both the need for a special structure to hold the
processing state and the evtchns for each vcpu.
The asynchronous events were a little more tricky to handle. The
buffered ioreqs were a good candidate, but the only thing usable is the
corresponding evtchn in conjunction with an existing ring. In order to
use them, a mock buffered ioreq should be created and transmitted, with
the only meaningful field being the ioreq type.


I certainly think it would be better if you could write the new
vm_event
interface without having to spend a lot of effort supporting modes
that
you think nobody uses.

On the other hand, getting into the habit of breaking stuff, even for
people we don't know about, will be a hindrance to community growth;
a
commitment to keeping it working will be a benefit to growth.

But of course, we haven't declared the vm_event interface 'supported'
(it's not even mentioned in the SUPPORT.md document yet).

Just for the sake of discussion, would it be possible / reasonble,
for
instance, to create a new interface, vm_events2, instead?  Then you
could write it to share the ioreq interface without having legacy
baggage you're not using; we could deprecate and eventually remove
vm_events1, and if anyone shouts, we can put it back.

Thoughts?

  -George

Yes, it's possible and it will GREATLY simplify the implementation. I
just have to make sure the interfaces are mutually exclusive.


I'm for removing features from the vm_event interface that are no
longer in use, especially if they block more advantageous changes like
this one. We don't know what the use-case was for async events nor
have seen anyone even mention them since I've been working with Xen.
Creating a new interface, as mentioned above, would make sense if
there was a disagreement with retiring this feature. I don't think
that's the case. I certainly would prefer not having to maintain two
separate interfaces going forward without a clear justification and
documented use-case explaining why we keep the old interface around.


AFAICT, the async model is broken conceptually as well, so it makes no 
sense. It would make sense, IMHO, if it would be lossy (i.e. we just 
write in the ring buffer, if somebody manages to "catch" an event while 
it flies by then so be it, if not it will be overwritten). If it would 
have been truly lossy, I'd see a use-case for it, for gathering 
statistics maybe.


However, as the code is now, the VCPUs are not paused only if there's 
still space in the ring buffer. If there's no more space in the ring 
buffer, the VCPU trying to put an event in still gets pause

Re: [Xen-devel] [PATCH for-4.12 V5 2/2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-18 Thread Razvan Cojocaru

On 2/18/19 1:19 PM, Jan Beulich wrote:

On 15.02.19 at 16:41,  wrote:

HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to a in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

Signed-off-by: Razvan Cojocaru 


Acked-by: Jan Beulich 

Also I assume you mean to have the same justification here as
you have in patch 1, as to 4.12 inclusion. And for this reason
I'm also Cc-ing Jürgen again.


Ideed, I do. Thank you very much.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.12 V5 1/2] altp2m: Prevent deadlocks when a domain performs altp2m operations on itself

2019-02-15 Thread Razvan Cojocaru
domain_pause_except_self() was introduced to allow a domain to pause
itself while doing altp2m operations.  However, as written, it has a
risk fo deadlock if two vcpus enter the loop at the same time.

Luckily, there's already a solution for this: Attempt to call domain's
hypercall_deadlock_mutex, and restart the entire hypercall if you
fail.

Make domain_pause_except_self() attempt to grab this mutex when
pausing itself, returning -ERESTART if it fails.  Have the callers
check for errors and pass the value up.  In both cases, the top-level
do_hvm_op() should DTRT when -ERESTART is returned.

The (necessary) reuse of the hypercall deadlock mutex poses the risk
of getting called from a context where the lock was already acquired
(e.g. someone may (say) call domctl_lock(), then afterwards call
domain_pause_except_self()). However, in the interest of not
overcomplicating things, no changes are made here to the mutex.
Attempted nesting of this lock isn't a security issue, because all
that will happen is that the vcpu will livelock taking continuations.

Signed-off-by: George Dunlap 
Tested-by: Razvan Cojocaru 
---
Release exception justification:
- Fixes a bug in the current code
- Lays the foundation of another fix
- Only affects altp2m, which isn't a supported feature

NB two possible further improvements to put on the list for 4.13:
 - Replace send-interrupt-and-wait-for-each-one loop with
   hvmop_flush_tlb_all()'s more efficient
   send-all-the-interrupts-then-wait loop.
 - Modify hvmop_flush_tlb_all() to use domain_pause_except_self() to
   avoid code duplication
---
CC: George Dunlap 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Wei Liu 
CC: "Roger Pau Monné" 
CC: George Dunlap 
CC: Ian Jackson 
CC: Julien Grall 
CC: Konrad Rzeszutek Wilk 
CC: Stefano Stabellini 
CC: Tim Deegan 
---
 xen/arch/x86/mm/p2m.c   | 10 --
 xen/common/domain.c |  8 +++-
 xen/include/xen/sched.h |  2 +-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57..7232dbf 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2530,8 +2530,11 @@ int p2m_destroy_altp2m_by_id(struct domain *d, unsigned 
int idx)
 if ( !idx || idx >= MAX_ALTP2M )
 return rc;
 
-domain_pause_except_self(d);
+rc = domain_pause_except_self(d);
+if ( rc )
+return rc;
 
+rc = -EBUSY;
 altp2m_list_lock(d);
 
 if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) )
@@ -2561,8 +2564,11 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, 
unsigned int idx)
 if ( idx >= MAX_ALTP2M )
 return rc;
 
-domain_pause_except_self(d);
+rc = domain_pause_except_self(d);
+if ( rc )
+return rc;
 
+rc = -EINVAL;
 altp2m_list_lock(d);
 
 if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7470cd9..32bca8d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1134,18 +1134,24 @@ int domain_unpause_by_systemcontroller(struct domain *d)
 return 0;
 }
 
-void domain_pause_except_self(struct domain *d)
+int domain_pause_except_self(struct domain *d)
 {
 struct vcpu *v, *curr = current;
 
 if ( curr->domain == d )
 {
+/* Avoid racing with other vcpus which may want to be pausing us */
+if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
+return -ERESTART;
 for_each_vcpu( d, v )
 if ( likely(v != curr) )
 vcpu_pause(v);
+spin_unlock(&d->hypercall_deadlock_mutex);
 }
 else
 domain_pause(d);
+
+return 0;
 }
 
 void domain_unpause_except_self(struct domain *d)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index d633e1d..edee52d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -839,7 +839,7 @@ static inline int 
domain_pause_by_systemcontroller_nosync(struct domain *d)
 }
 
 /* domain_pause() but safe against trying to pause current. */
-void domain_pause_except_self(struct domain *d);
+int __must_check domain_pause_except_self(struct domain *d);
 void domain_unpause_except_self(struct domain *d);
 
 /*
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.12 V5 2/2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-15 Thread Razvan Cojocaru
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to a in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

Signed-off-by: Razvan Cojocaru 

---
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Wei Liu 
CC: "Roger Pau Monné" 
---
 xen/arch/x86/hvm/hvm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 410623d..79c7d81 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4538,6 +4538,10 @@ static int do_altp2m_op(
 break;
 }
 
+rc = domain_pause_except_self(d);
+if ( rc )
+break;
+
 ostate = d->arch.altp2m_active;
 d->arch.altp2m_active = !!a.u.domain_state.state;
 
@@ -4556,6 +4560,8 @@ static int do_altp2m_op(
 if ( ostate )
 p2m_flush_altp2m(d);
 }
+
+domain_unpause_except_self(d);
 break;
 }
 
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-15 Thread Razvan Cojocaru

On 2/15/19 3:37 PM, George Dunlap wrote:




On Feb 15, 2019, at 1:24 PM, Jan Beulich  wrote:


On 15.02.19 at 13:52,  wrote:

On Feb 12, 2019, at 11:42 AM, Razvan Cojocaru  wrote:

HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to a in-guest
agent, and a domain pausing itself is not a good idea).


Sorry to come in here on v4 and suggest changing everything, but I don’t
really like the solution you have here.  Not setting altp2m to ‘active’ until
after the vcpus are set up makes sense; but passing this true/false value in
seems ugly, and still seems a bit racy (i.e., what if p2m_active() is
disabled between the check in HVMOP_altp2m_switch_p2m and the time we
actually call altp2m_vcpu_update_p2m()?)

I certainly don’t think domain_pause() should be our go-to solution for race
avoidance, but in this case it really seems like switching the global p2m for
every vcpu at once makes the most sense; and trying to safely change this on
an unpaused domain is not only overly complicated, but probably not what we
wanted anyway.

p2m_altp2m_destroy_by_id() and p2m_switch_domain_altp2m_by_id() already use
domain_pause_except_self(); so it seems like not using it for
altp2m_set_domain_state was probably more of an oversight than an intentional
decision.  Using that here seems like a more robust solution.


Ah, I didn't even recall there was such a function. As this now
also allows covering a domain requesting the operation for itself,
I don't mind the pausing approach anymore.


Yeah, I forgot too until I was grepping for “domain_pause” to figure out what 
everyone else was doing. :-)


The one issue is that domain_pause_except_self() currently is actually a
deadlock risk if two different vcpus start it at the same time.  I think the
attached patch (compile-tested only) should fix this issue; after this patch
you should be able to use domain_pause_except_self() in
altp2m_set_domain_state instead.


There's one thing I don't really like here, which is a result of the
(necessary) re-use of the hypercall deadlock mutex: This
certainly poses the risk of getting called from a context where
the lock was already acquired. Therefore I'd like to suggest to
use this lock in a recursive way (here and elsewhere).




And two cosmetic remarks - there's no need to re-specify
__must_check on the function definition, as the function
declaration ought to be in scope anyway. And there's a stray
blank inside the likely() you add.


I don’t see that I added a ‘likely’; there’s one in context, but I don’t see 
any stray blanks there.

The other two points make sense — Razvan, would you be willing to make those 
changes (and test the result, as I haven’t done more than compile-test it)?


Of course, happy to. Just to make sure I understand where we stand: I'll 
try to leave the mutex alone for now and only switch to a recursive one 
if anything blows up.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-15 Thread Razvan Cojocaru

On Feb 12, 2019, at 11:42 AM, Razvan Cojocaru  wrote:

HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to a in-guest
agent, and a domain pausing itself is not a good idea).


Sorry to come in here on v4 and suggest changing everything, but I don’t really 
like the solution you have here.  Not setting altp2m to ‘active’ until after 
the vcpus are set up makes sense; but passing this true/false value in seems 
ugly, and still seems a bit racy (i.e., what if p2m_active() is disabled 
between the check in HVMOP_altp2m_switch_p2m and the time we actually call 
altp2m_vcpu_update_p2m()?)

I certainly don’t think domain_pause() should be our go-to solution for race 
avoidance, but in this case it really seems like switching the global p2m for 
every vcpu at once makes the most sense; and trying to safely change this on an 
unpaused domain is not only overly complicated, but probably not what we wanted 
anyway.

p2m_altp2m_destroy_by_id() and p2m_switch_domain_altp2m_by_id() already use 
domain_pause_except_self(); so it seems like not using it for 
altp2m_set_domain_state was probably more of an oversight than an intentional 
decision.  Using that here seems like a more robust solution.

The one issue is that domain_pause_except_self() currently is actually a 
deadlock risk if two different vcpus start it at the same time.  I think the 
attached patch (compile-tested only) should fix this issue; after this patch 
you should be able to use domain_pause_except_self() in altp2m_set_domain_state 
instead.

Does that sound reasonable?


Not only does that sound reasonable, I personally prefer this approach. 
I'll apply this patch and give it a spin.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-14 Thread Razvan Cojocaru
On 2/14/19 8:06 PM, George Dunlap wrote:
> On 2/12/19 11:42 AM, Razvan Cojocaru wrote:
>> HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
>> on purpose (as it was originally supposed to cater to a in-guest
>> agent, and a domain pausing itself is not a good idea).
>>
>> This can lead to domain crashes in the vmx_vmexit_handler() code
>> that checks if the guest has the ability to switch EPTP without an
>> exit. That code can __vmread() the host p2m's EPT_POINTER
>> (before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
>> chance to run altp2m_vcpu_initialise(), but after
>> d->arch.altp2m_active is set).
> 
> Sorry, where exactly does the crash happen?

3655 /*
3656  * If the guest has the ability to switch EPTP without an exit,
3657  * figure out whether it has done so and update the altp2m data.
3658  */
3659 if ( altp2m_active(v->domain) &&
3660 (v->arch.hvm.vmx.secondary_exec_control &
3661 SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
3662 {
3663 unsigned long idx;
3664
3665 if ( v->arch.hvm.vmx.secondary_exec_control &
3666 SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
3667 __vmread(EPTP_INDEX, &idx);
3668 else
3669 {
3670 unsigned long eptp;
3671
3672 __vmread(EPT_POINTER, &eptp);
3673
3674 if ( (idx = p2m_find_altp2m_by_eptp(v->domain, eptp)) ==
3675  INVALID_ALTP2M )
3676 {
3677 gdprintk(XENLOG_ERR, "EPTP not found in alternate
p2m list\n");
3678 domain_crash(v->domain);

Right here (at line 3678 in vmx.c).


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 6/6] xc_version: add vm_event interface version

2019-02-12 Thread Razvan Cojocaru
On 2/12/19 8:13 PM, Tamas K Lengyel wrote:
> On Tue, Jan 8, 2019 at 9:39 AM Razvan Cojocaru
>  wrote:
>>
>> On 1/8/19 6:27 PM, Jan Beulich wrote:
>>>>>> On 19.12.18 at 19:52,  wrote:
>>>> Signed-off-by: Petre Pircalabu 
>>>
>>> An empty description is not helpful. The immediate question is: Why?
>>> We don't do this for other interface versions. I'm unconvinced a
>>> special purpose piece of information like this one belongs into the
>>> rather generic version hypercall.
>>
>> For an introspection application meant to be deployed on several Xen
>> versions without recompiling, it is important to be able to decide at
>> runtime what size and layout the vm_event struct has.
>>
>> Currently this can somewhat be done by associating the current version
>> with the vm_event version, but that is not ideal for obvious reasons.
> 
> We do exactly that in LibVMI and it works fine - care to elaborate
> what problem you have with doing that? There is a 1:1 match between
> any stable Xen version and a vm_event interface version. I don't think
> we will ever have a situation where we bump the vm_event interface
> version but not the Xen release version.

XenServer. Any given version of XenServer may or may not fit the matrix
you're talking about (because some patches are backported, some are not,
etc.). In which case it all becomes very complicated.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-12 Thread Razvan Cojocaru

On 2/12/19 2:57 PM, Jan Beulich wrote:

On 12.02.19 at 12:42,  wrote:

HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to a in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

This patch reorganizes the code so that d->arch.altp2m_active
is set to true only after all the init work has been done, and
to false before the uninit work begins. This required adding
a new bool parameter altp2m_vcpu_update_p2m(), which relied
on d->arch.altp2m_active being set before it's called.

While at it, I've changed a couple of bool_t's to bool's.

Signed-off-by: Razvan Cojocaru 


FTR - this looks okay to me now. But I don't feel qualified to
give an R-b, so just for the parts where it's applicable
Acked-by: Jan Beulich 


Thanks for the ack and the help!


Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.12 V4] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-12 Thread Razvan Cojocaru
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to a in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

This patch reorganizes the code so that d->arch.altp2m_active
is set to true only after all the init work has been done, and
to false before the uninit work begins. This required adding
a new bool parameter altp2m_vcpu_update_p2m(), which relied
on d->arch.altp2m_active being set before it's called.

While at it, I've changed a couple of bool_t's to bool's.

Signed-off-by: Razvan Cojocaru 

---
Changes since V3:
 - Reverted p2m_get_altp2m() to its original state.
 - Added an altp2m_active() check to HVMOP_altp2m_switch_p2m.
 - Updated patch description.
---
 xen/arch/x86/hvm/hvm.c| 20 
 xen/arch/x86/hvm/vmx/vmx.c|  4 ++--
 xen/arch/x86/mm/altp2m.c  |  4 ++--
 xen/arch/x86/mm/p2m.c |  4 ++--
 xen/include/asm-x86/domain.h  |  2 +-
 xen/include/asm-x86/hvm/hvm.h |  6 +++---
 6 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21944e9..21ec059 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4525,7 +4525,7 @@ static int do_altp2m_op(
 case HVMOP_altp2m_set_domain_state:
 {
 struct vcpu *v;
-bool_t ostate;
+bool ostate, nstate;
 
 if ( nestedhvm_enabled(d) )
 {
@@ -4534,12 +4534,15 @@ static int do_altp2m_op(
 }
 
 ostate = d->arch.altp2m_active;
-d->arch.altp2m_active = !!a.u.domain_state.state;
+nstate = a.u.domain_state.state;
 
 /* If the alternate p2m state has changed, handle appropriately */
-if ( d->arch.altp2m_active != ostate &&
+if ( nstate != ostate &&
  (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
 {
+/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
+d->arch.altp2m_active = false;
+
 for_each_vcpu( d, v )
 {
 if ( !ostate )
@@ -4550,7 +4553,14 @@ static int do_altp2m_op(
 
 if ( ostate )
 p2m_flush_altp2m(d);
+
+/*
+ * Wait until altp2m_vcpu_initialise() is done before marking
+ * altp2m as being enabled for the domain.
+ */
+d->arch.altp2m_active = nstate;
 }
+
 break;
 }
 
@@ -4624,7 +4634,9 @@ static int do_altp2m_op(
 break;
 
 case HVMOP_altp2m_switch_p2m:
-rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view);
+rc = -EOPNOTSUPP;
+if ( altp2m_active(d) )
+rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view);
 break;
 
 case HVMOP_altp2m_set_suppress_ve:
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 64af8bf..e542568 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
 return !!cpu_has_monitor_trap_flag;
 }
 
-static void vmx_vcpu_update_eptp(struct vcpu *v)
+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
 {
 struct domain *d = v->domain;
 struct p2m_domain *p2m = NULL;
 struct ept_data *ept;
 
-if ( altp2m_active(d) )
+if ( altp2m_enabled )
 p2m = p2m_get_altp2m(v);
 if ( !p2m )
 p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c
index 930bdc2..c51303b 100644
--- a/xen/arch/x86/mm/altp2m.c
+++ b/xen/arch/x86/mm/altp2m.c
@@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v)
 vcpu_altp2m(v).p2midx = 0;
 atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
 
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, true);
 
 if ( v != current )
 vcpu_unpause(v);
@@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v)
 
 altp2m_vcpu_reset(v);
 
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, false);
 altp2m_vcpu_update_vmfunc_ve(v);
 
 if ( v != current )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57..6f991f8 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, 
unsigned int idx)
 atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
 vcpu_altp2m(v).p2midx = idx;
 atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
-altp2m_vcpu_update_p2m(v);
+  

Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-12 Thread Razvan Cojocaru

On 2/11/19 6:59 PM, Jan Beulich wrote:

Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for
any HVMOP_altp2m_* at all. One of the actual callers is guarded
by altp2m_active(), but the other isn't.


Actually I see that both places are guarded by altp2m_active().

In p2m.c we have:

2312 void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
2313 {
2314 if ( altp2m_active(v->domain) )
2315 p2m_switch_vcpu_altp2m_by_id(v, idx);
2316 }

and in vmx.c:

2225 static int vmx_vcpu_emulate_vmfunc(const struct cpu_user_regs *regs)
2226 {
2227 int rc = X86EMUL_EXCEPTION;
2228 struct vcpu *curr = current;
2229
2230 if ( !cpu_has_vmx_vmfunc && altp2m_active(curr->domain) &&
2231  regs->eax == 0 &&
2232  p2m_switch_vcpu_altp2m_by_id(curr, regs->ecx) )
2233 rc = X86EMUL_OKAY;
2234
2235 return rc;
2236 }

here there's an "&& altp2m_active(curr->domain)" in the if().

So I suppose in our scenario all that's needed it a similar check here:

4636 case HVMOP_altp2m_switch_p2m:
4637 rc = p2m_switch_domain_altp2m_by_id(d, a.u.view.view);
4638 break;

for the other function, p2m_switch_domain_altp2m_by_id().

Unless I'm missing something.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-12 Thread Razvan Cojocaru

On 2/12/19 9:31 AM, Jan Beulich wrote:

On 11.02.19 at 18:21,  wrote:

On 2/11/19 6:59 PM, Jan Beulich wrote:

Thanks for noticing, actually this appears to invalidate the whole
purpose of the patch (I should have tested this more before sumbitting
V3, sorry).

The whole point of the new boolean is to have p2m assigned an altp2m
regardless of altp2m_active() (hence the change) - which now no longer
happens. I got carried away with this change.

The fact that this is so easy to get wrong is the reason why I've
preferred the domain_pause() solution. There appears to be no clean way
to fix this otherwise, and if this is so easy to misunderstand it'll
break just as easily with further changes.

I suppose I could just pass the bool along to p2m_get_altp2m() (and
indeed remove the if())...


I think the best that can be done here is to check if altp2m_active()
early in p2m_switch_vcpu_altp2m_by_id() and
p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since
these are only called by HVMOP_altp2m_* (and thus serialized by the
domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state.


I'm confused: Where do you see the domain lock used there?
Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for
any HVMOP_altp2m_* at all. One of the actual callers is guarded
by altp2m_active(), but the other isn't.


do_altp2m_op() does d = rcu_lock_domain_by_any_id(a.domain);, and
unlocks it before it exits.


But that's not the "domain lock". This is just making sure the domain
won't disappear behind your back.


But you're right, p2m_switch_vcpu_altp2m_by_id() is not called for any
HVMOP_altp2m_*, I've misread that. Hence, I believe both callers
ofp2m_switch_vcpu_altp2m_by_id() may race with HVMOP_altp2m_*.

Would you like me to add the altp2m_active() check in both
p2m_switch_domain_altp2m_by_id() and p2m_switch_vcpu_altp2m_by_id(), and
make it harder to race (it still won't be impossible, since the bool may
become false between the check and the actual function logic for
p2m_switch_vcpu_altp2m_by_id(), as you've noticed)?


Well, altp2m being Tech Preview, any partial fix _could_ do in
principle. But personally I dislike such half hearted approaches,
and hence I'd recommend to address the issue in full, i.e.
eliminate race potential altogether. In the end you're going to
be bitten more than me by not getting this into proper shape.


I'll attempt a V4 doing my best to check altp2m_active() then, and see 
if that's at least satisfactory for sane use-cases.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-11 Thread Razvan Cojocaru
On 2/11/19 6:59 PM, Jan Beulich wrote:
>>> Thanks for noticing, actually this appears to invalidate the whole 
>>> purpose of the patch (I should have tested this more before sumbitting 
>>> V3, sorry).
>>>
>>> The whole point of the new boolean is to have p2m assigned an altp2m 
>>> regardless of altp2m_active() (hence the change) - which now no longer 
>>> happens. I got carried away with this change.
>>>
>>> The fact that this is so easy to get wrong is the reason why I've 
>>> preferred the domain_pause() solution. There appears to be no clean way 
>>> to fix this otherwise, and if this is so easy to misunderstand it'll 
>>> break just as easily with further changes.
>>>
>>> I suppose I could just pass the bool along to p2m_get_altp2m() (and 
>>> indeed remove the if())...
>>
>> I think the best that can be done here is to check if altp2m_active() 
>> early in p2m_switch_vcpu_altp2m_by_id() and 
>> p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since 
>> these are only called by HVMOP_altp2m_* (and thus serialized by the 
>> domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state.
> 
> I'm confused: Where do you see the domain lock used there?
> Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for
> any HVMOP_altp2m_* at all. One of the actual callers is guarded
> by altp2m_active(), but the other isn't.

do_altp2m_op() does d = rcu_lock_domain_by_any_id(a.domain);, and
unlocks it before it exits.

But you're right, p2m_switch_vcpu_altp2m_by_id() is not called for any
HVMOP_altp2m_*, I've misread that. Hence, I believe both callers
ofp2m_switch_vcpu_altp2m_by_id() may race with HVMOP_altp2m_*.

Would you like me to add the altp2m_active() check in both
p2m_switch_domain_altp2m_by_id() and p2m_switch_vcpu_altp2m_by_id(), and
make it harder to race (it still won't be impossible, since the bool may
become false between the check and the actual function logic for
p2m_switch_vcpu_altp2m_by_id(), as you've noticed)?

>> I see no other way out of this (aside from the domain_pause() fix).
> 
> If only that one would have been a complete fix, rather than just a
> partial one.

Agreed, but that one at least clearly fixes the external case, whereas
this doesn't seem to cover all corner cases for any situation.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-11 Thread Razvan Cojocaru

On 2/11/19 12:57 PM, Razvan Cojocaru wrote:

On 2/11/19 12:10 PM, Jan Beulich wrote:

On 11.02.19 at 10:13,  wrote:

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
  return !!cpu_has_monitor_trap_flag;
  }
-static void vmx_vcpu_update_eptp(struct vcpu *v)
+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
  {
  struct domain *d = v->domain;
  struct p2m_domain *p2m = NULL;
  struct ept_data *ept;
-    if ( altp2m_active(d) )
+    if ( altp2m_enabled )
  p2m = p2m_get_altp2m(v);
  if ( !p2m )
  p2m = p2m_get_hostp2m(d);


With the change you now make to p2m_get_altp2m(), this looks to be
a benign change. Which to me would suggest to either leave the code
alone, or to drop the if() (but - again - not its body) altogether. At
which point the code could be further streamlined, as then the NULL
initializer can go away and the assignment (or then perhaps initializer)
could become "p2m = p2m_get_altp2m(v) ?: p2m_get_hostp2m(d)".
(Generally I'd recommend to leave out the change here, and do the
transformation in a follow-on patch.)


Thanks for noticing, actually this appears to invalidate the whole 
purpose of the patch (I should have tested this more before sumbitting 
V3, sorry).


The whole point of the new boolean is to have p2m assigned an altp2m 
regardless of altp2m_active() (hence the change) - which now no longer 
happens. I got carried away with this change.


The fact that this is so easy to get wrong is the reason why I've 
preferred the domain_pause() solution. There appears to be no clean way 
to fix this otherwise, and if this is so easy to misunderstand it'll 
break just as easily with further changes.


I suppose I could just pass the bool along to p2m_get_altp2m() (and 
indeed remove the if())...


I think the best that can be done here is to check if altp2m_active() 
early in p2m_switch_vcpu_altp2m_by_id() and 
p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since 
these are only called by HVMOP_altp2m_* (and thus serialized by the 
domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state.


This of course means reverting p2m_get_altp2m() to its original 
non-intuitive state of returning a valid altp2m pointer even when 
altp2m_active() is false.


I see no other way out of this (aside from the domain_pause() fix).


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-11 Thread Razvan Cojocaru

On 2/11/19 12:10 PM, Jan Beulich wrote:

On 11.02.19 at 10:13,  wrote:

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
  return !!cpu_has_monitor_trap_flag;
  }
  
-static void vmx_vcpu_update_eptp(struct vcpu *v)

+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
  {
  struct domain *d = v->domain;
  struct p2m_domain *p2m = NULL;
  struct ept_data *ept;
  
-if ( altp2m_active(d) )

+if ( altp2m_enabled )
  p2m = p2m_get_altp2m(v);
  if ( !p2m )
  p2m = p2m_get_hostp2m(d);


With the change you now make to p2m_get_altp2m(), this looks to be
a benign change. Which to me would suggest to either leave the code
alone, or to drop the if() (but - again - not its body) altogether. At
which point the code could be further streamlined, as then the NULL
initializer can go away and the assignment (or then perhaps initializer)
could become "p2m = p2m_get_altp2m(v) ?: p2m_get_hostp2m(d)".
(Generally I'd recommend to leave out the change here, and do the
transformation in a follow-on patch.)


Thanks for noticing, actually this appears to invalidate the whole 
purpose of the patch (I should have tested this more before sumbitting 
V3, sorry).


The whole point of the new boolean is to have p2m assigned an altp2m 
regardless of altp2m_active() (hence the change) - which now no longer 
happens. I got carried away with this change.


The fact that this is so easy to get wrong is the reason why I've 
preferred the domain_pause() solution. There appears to be no clean way 
to fix this otherwise, and if this is so easy to misunderstand it'll 
break just as easily with further changes.


I suppose I could just pass the bool along to p2m_get_altp2m() (and 
indeed remove the if())...



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-11 Thread Razvan Cojocaru
HVMOP_altp2m_set_domain_state does not domain_pause(), presumably
on purpose (as it was originally supposed to cater to a in-guest
agent, and a domain pausing itself is not a good idea).

This can lead to domain crashes in the vmx_vmexit_handler() code
that checks if the guest has the ability to switch EPTP without an
exit. That code can __vmread() the host p2m's EPT_POINTER
(before HVMOP_altp2m_set_domain_state "for_each_vcpu()" has a
chance to run altp2m_vcpu_initialise(), but after
d->arch.altp2m_active is set).

This patch reorganizes the code so that d->arch.altp2m_active
is set to true only after all the init work has been done, and
to false before the uninit work begins. This required adding
a new bool parameter altp2m_vcpu_update_p2m(), which relied
on d->arch.altp2m_active being set before it's called.

p2m_get_altp2m() now returns NULL if !altp2m_active(), to
prevent it from returning a "valid" altp2m pointer between
the moment where d->arch.altp2m_active = false and the
point when v->p2midx actually becomes INVALID_ALTP2M.

While at it, I've changed a couple of bool_t's to bool's.

Signed-off-by: Razvan Cojocaru 

---
Changes since V2:
 - Removed leftover asm-x86/altp2m.h changes.
 - nstate = !!a.u.domain_state.state; becomes
   nstate = a.u.domain_state.state;
 - Removed the if() and else in do_altp2m_op() as
   recommended by Jan.
 - Using true explicitly instead of altp2m_active(d) for
   altp2m_vcpu_update_p2m() in p2m_switch_vcpu_altp2m_by_id()
   and p2m_switch_domain_altp2m_by_id().
 - Updated patch description.
 - Modified p2m_get_altp2m() to return NULL if !altp2m_active().
---
 xen/arch/x86/hvm/hvm.c| 16 +---
 xen/arch/x86/hvm/vmx/vmx.c|  4 ++--
 xen/arch/x86/mm/altp2m.c  |  4 ++--
 xen/arch/x86/mm/p2m.c |  4 ++--
 xen/include/asm-x86/domain.h  |  2 +-
 xen/include/asm-x86/hvm/hvm.h |  6 +++---
 xen/include/asm-x86/p2m.h |  8 +++-
 7 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21944e9..50d896d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4525,7 +4525,7 @@ static int do_altp2m_op(
 case HVMOP_altp2m_set_domain_state:
 {
 struct vcpu *v;
-bool_t ostate;
+bool ostate, nstate;
 
 if ( nestedhvm_enabled(d) )
 {
@@ -4534,12 +4534,15 @@ static int do_altp2m_op(
 }
 
 ostate = d->arch.altp2m_active;
-d->arch.altp2m_active = !!a.u.domain_state.state;
+nstate = a.u.domain_state.state;
 
 /* If the alternate p2m state has changed, handle appropriately */
-if ( d->arch.altp2m_active != ostate &&
+if ( nstate != ostate &&
  (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
 {
+/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
+d->arch.altp2m_active = false;
+
 for_each_vcpu( d, v )
 {
 if ( !ostate )
@@ -4550,7 +4553,14 @@ static int do_altp2m_op(
 
 if ( ostate )
 p2m_flush_altp2m(d);
+
+/*
+ * Wait until altp2m_vcpu_initialise() is done before marking
+ * altp2m as being enabled for the domain.
+ */
+d->arch.altp2m_active = nstate;
 }
+
 break;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 64af8bf..e542568 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
 return !!cpu_has_monitor_trap_flag;
 }
 
-static void vmx_vcpu_update_eptp(struct vcpu *v)
+static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
 {
 struct domain *d = v->domain;
 struct p2m_domain *p2m = NULL;
 struct ept_data *ept;
 
-if ( altp2m_active(d) )
+if ( altp2m_enabled )
 p2m = p2m_get_altp2m(v);
 if ( !p2m )
 p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/x86/mm/altp2m.c b/xen/arch/x86/mm/altp2m.c
index 930bdc2..c51303b 100644
--- a/xen/arch/x86/mm/altp2m.c
+++ b/xen/arch/x86/mm/altp2m.c
@@ -39,7 +39,7 @@ altp2m_vcpu_initialise(struct vcpu *v)
 vcpu_altp2m(v).p2midx = 0;
 atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
 
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, true);
 
 if ( v != current )
 vcpu_unpause(v);
@@ -58,7 +58,7 @@ altp2m_vcpu_destroy(struct vcpu *v)
 
 altp2m_vcpu_reset(v);
 
-altp2m_vcpu_update_p2m(v);
+altp2m_vcpu_update_p2m(v, false);
 altp2m_vcpu_update_vmfunc_ve(v);
 
 if ( v != current )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d14ce57..6f991f8 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, 
unsigned int idx

Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-11 Thread Razvan Cojocaru




Right, I now understand what you meant by removing "if ( ostate )" -
you'd like d->arch.altp2m_active to only be set _after_ the for, for the
enable, as well as for the disable case.


No, certainly not. Exactly because of ...


However, I wanted to keep both if()s to avoid another problem with this
code:
[...]
So for the disable case, I wanted altp2m_active(v->domain) to return
false immediately so that this code won't be triggered. Otherwise,
assuming the following scenario:


... this. Apparently "and keep what is now the body" was still not clear
enough.

Taking your original code, I mean

 if ( nstate != ostate &&
  (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
 {
 /* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
 d->arch.altp2m_active = false;

 for_each_vcpu( d, v )
 [...]

 if ( ostate )
 p2m_flush_altp2m(d);
 /*
  * Wait until altp2m_vcpu_initialise() is done before marking
  * altp2m as being enabled for the domain.
  */
 d->arch.altp2m_active = nstate;
 }


Ah! Thanks for the clarification.


Now, you're right that p2m_get_altp2m() may hand over a pointer to an
altp2m between the moment where d->arch.altp2m_active is false and the
point where v->p2midx actually becomes INVALID_ALTP2M with my code; but
I think it can be argued that I should rather fix p2m_get_altp2m(), to
return NULL if altp2m_active() == false and keep the if()s (which I've
hopefully been more eloquent about now).


Yes, that looks to be an option, provided it doesn't violate assumptions
elsewhere.


It doesn't, AFAICT. Additionally, it can be argued that if code relies 
on p2m_get_altp2m() returning not-NULL when !altp2m_active(), said code 
is broken.



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-08 Thread Razvan Cojocaru
On 2/8/19 6:44 PM, Jan Beulich wrote:
 On 08.02.19 at 17:30,  wrote:
>> On 2/8/19 5:50 PM, Jan Beulich wrote:
>> On 08.02.19 at 15:00,  wrote:
  /* If the alternate p2m state has changed, handle appropriately */
 -if ( d->arch.altp2m_active != ostate &&
 +if ( nstate != ostate &&
   (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
  {
 +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). 
 */
 +if ( ostate )
 +d->arch.altp2m_active = false;
>>>
>>> Why the if()? In the opposite case you'd simply write false into
>>> what already holds false.
>>
>> The value written into d->arch.altp2m_active is not the point here. The
>> point is that if ( ostate ), then we are disabling altp2m (because the
>> if above this one makes sure ostate != nstate).
>>
>> So in the disable case, first I wanted to set d->arch.altp2m_active to
>> false (which immediately causes altp2m_active(d) to return false as
>> well), and then actually perform the work.
> 
> Sure - I'm not asking to remove everything above, just the if()
> (and keep what is now the body). That is because
> - in the enable case the value is false and writing false into it is
>   benign,
> - in the disable case you want to actively change from true to
>   false.
> 
 @@ -4550,7 +4554,14 @@ static int do_altp2m_op(
  
  if ( ostate )
  p2m_flush_altp2m(d);
 +else
 +/*
 + * Wait until altp2m_vcpu_initialise() is done before 
 marking
 + * altp2m as being enabled for the domain.
 + */
 +d->arch.altp2m_active = true;
>>>
>>> Similarly here you could omit the "else" and simply store "nstate" afaict.
>>
>> As above, in the enable case I wanted to first setup altp2m on all VCPUs
>> with altp2m_vcpu_initialise(), and only after that set
>> d->arch.altp2m_active = true.
>>
>> In summary, if ( ostate ) we want to set d->arch.altp2m_active before
>> the for (we're disabling altp2m), and if ( !ostate ) (which is the else
>> above) we want to set d->arch.altp2m_active after said for.
>>
>> We can indeed store nstate. I just thought things look clearer with
>> "true" and "false", but if you prefer there's no problem assigning nstate.
> 
> Well, as always with mm and altp2m code, I'm not the maintainer, so
> I don't have the final say. It's just that I think unnecessary conditionals
> would better be avoided, even if they're not on a performance critical
> path.
> 
 --- a/xen/arch/x86/hvm/vmx/vmx.c
 +++ b/xen/arch/x86/hvm/vmx/vmx.c
 @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
  return !!cpu_has_monitor_trap_flag;
  }
  
 -static void vmx_vcpu_update_eptp(struct vcpu *v)
 +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
  {
  struct domain *d = v->domain;
  struct p2m_domain *p2m = NULL;
  struct ept_data *ept;
  
 -if ( altp2m_active(d) )
 +if ( altp2m_enabled )
  p2m = p2m_get_altp2m(v);
  if ( !p2m )
  p2m = p2m_get_hostp2m(d);
>>>
>>> Is this an appropriate transformation? That is, can there not be
>>> any domains for which altp2m_active() returns false despite
>>> altp2m_enabled being true? (Looking at p2m_get_altp2m() I can't
>>> really judge whether index would always be INVALID_ALTP2M in
>>> such a case.)
>>
>> Yes, it should be completely safe (please see below for details).
>>
 --- a/xen/arch/x86/mm/p2m.c
 +++ b/xen/arch/x86/mm/p2m.c
 @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, 
 unsigned int idx)
  atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
  vcpu_altp2m(v).p2midx = idx;
  atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
 -altp2m_vcpu_update_p2m(v);
 +altp2m_vcpu_update_p2m(v, altp2m_active(d));
  }
  rc = 1;
  }
 @@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, 
 unsigned int idx)
  atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
  vcpu_altp2m(v).p2midx = idx;
  atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
 -altp2m_vcpu_update_p2m(v);
 +altp2m_vcpu_update_p2m(v, altp2m_active(d));
  }
>>>
>>> In both cases, isn't altp2m_active() going to return true anyway
>>> when we get there (related to the question above)?
>>
>> Yes, it will return true. In order for it to return false,
>> altp2m_vcpu_destroy() would have had to run on that VCPU, which (among
>> other things) calls altp2m_vcpu_reset(), which does v->p2midx =
>> INVALID_ALTP2M.
> 
> Hmm, according to the change further up in this patch, altp2m_active()
> 

Re: [Xen-devel] [PATCH for-4.12 V2] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

2019-02-08 Thread Razvan Cojocaru
On 2/8/19 5:50 PM, Jan Beulich wrote:
 On 08.02.19 at 15:00,  wrote:
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -4525,7 +4525,7 @@ static int do_altp2m_op(
>>  case HVMOP_altp2m_set_domain_state:
>>  {
>>  struct vcpu *v;
>> -bool_t ostate;
>> +bool ostate, nstate;
>>  
>>  if ( nestedhvm_enabled(d) )
>>  {
>> @@ -4534,12 +4534,16 @@ static int do_altp2m_op(
>>  }
>>  
>>  ostate = d->arch.altp2m_active;
>> -d->arch.altp2m_active = !!a.u.domain_state.state;
>> +nstate = !!a.u.domain_state.state;
> 
> No need for !! here.

I'll remove it.

>>  /* If the alternate p2m state has changed, handle appropriately */
>> -if ( d->arch.altp2m_active != ostate &&
>> +if ( nstate != ostate &&
>>   (ostate || !(rc = p2m_init_altp2m_by_id(d, 0))) )
>>  {
>> +/* First mark altp2m as disabled, then altp2m_vcpu_destroy(). */
>> +if ( ostate )
>> +d->arch.altp2m_active = false;
> 
> Why the if()? In the opposite case you'd simply write false into
> what already holds false.

The value written into d->arch.altp2m_active is not the point here. The
point is that if ( ostate ), then we are disabling altp2m (because the
if above this one makes sure ostate != nstate).

So in the disable case, first I wanted to set d->arch.altp2m_active to
false (which immediately causes altp2m_active(d) to return false as
well), and then actually perform the work.

>> @@ -4550,7 +4554,14 @@ static int do_altp2m_op(
>>  
>>  if ( ostate )
>>  p2m_flush_altp2m(d);
>> +else
>> +/*
>> + * Wait until altp2m_vcpu_initialise() is done before 
>> marking
>> + * altp2m as being enabled for the domain.
>> + */
>> +d->arch.altp2m_active = true;
> 
> Similarly here you could omit the "else" and simply store "nstate" afaict.

As above, in the enable case I wanted to first setup altp2m on all VCPUs
with altp2m_vcpu_initialise(), and only after that set
d->arch.altp2m_active = true.

In summary, if ( ostate ) we want to set d->arch.altp2m_active before
the for (we're disabling altp2m), and if ( !ostate ) (which is the else
above) we want to set d->arch.altp2m_active after said for.

We can indeed store nstate. I just thought things look clearer with
"true" and "false", but if you prefer there's no problem assigning nstate.

>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -2150,13 +2150,13 @@ static bool_t vmx_is_singlestep_supported(void)
>>  return !!cpu_has_monitor_trap_flag;
>>  }
>>  
>> -static void vmx_vcpu_update_eptp(struct vcpu *v)
>> +static void vmx_vcpu_update_eptp(struct vcpu *v, bool altp2m_enabled)
>>  {
>>  struct domain *d = v->domain;
>>  struct p2m_domain *p2m = NULL;
>>  struct ept_data *ept;
>>  
>> -if ( altp2m_active(d) )
>> +if ( altp2m_enabled )
>>  p2m = p2m_get_altp2m(v);
>>  if ( !p2m )
>>  p2m = p2m_get_hostp2m(d);
> 
> Is this an appropriate transformation? That is, can there not be
> any domains for which altp2m_active() returns false despite
> altp2m_enabled being true? (Looking at p2m_get_altp2m() I can't
> really judge whether index would always be INVALID_ALTP2M in
> such a case.)

Yes, it should be completely safe (please see below for details).

>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -2332,7 +2332,7 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, 
>> unsigned int idx)
>>  atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
>>  vcpu_altp2m(v).p2midx = idx;
>>  atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
>> -altp2m_vcpu_update_p2m(v);
>> +altp2m_vcpu_update_p2m(v, altp2m_active(d));
>>  }
>>  rc = 1;
>>  }
>> @@ -2573,7 +2573,7 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, 
>> unsigned int idx)
>>  atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
>>  vcpu_altp2m(v).p2midx = idx;
>>  atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
>> -altp2m_vcpu_update_p2m(v);
>> +altp2m_vcpu_update_p2m(v, altp2m_active(d));
>>  }
> 
> In both cases, isn't altp2m_active() going to return true anyway
> when we get there (related to the question above)?

Yes, it will return true. In order for it to return false,
altp2m_vcpu_destroy() would have had to run on that VCPU, which (among
other things) calls altp2m_vcpu_reset(), which does v->p2midx =
INVALID_ALTP2M.

There's an if() above (not shown in your reply) that says "if ( idx !=
vcpu_altp2m(v).p2midx )", so, indeed, by the time we end up here we can
reasonably assume that altp2m_active(d) will return true.

I've just put in "altp2m_active(d)" to make sure there's absolutely no
functional change here, but AFA

  1   2   3   4   5   6   >