Re: [Xen-devel] [PATCH] VMX: sync CPU state upon vCPU destruction

2017-11-21 Thread Igor Druzhinin
On 09/11/17 14:49, Jan Beulich wrote:
> See the code comment being added for why we need this.
> 
> Reported-by: Igor Druzhinin <igor.druzhi...@citrix.com>
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> 
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -479,7 +479,13 @@ static void vmx_vcpu_destroy(struct vcpu
>   * we should disable PML manually here. Note that vmx_vcpu_destroy is called
>   * prior to vmx_domain_destroy so we need to disable PML for each vcpu
>   * separately here.
> + *
> + * Before doing that though, flush all state for the vCPU previously having
> + * run on the current CPU, so that this flushing of state won't happen from
> + * the TLB flush IPI handler behind the back of a vmx_vmcs_enter() /
> + * vmx_vmcs_exit() section.
>   */
> +sync_local_execstate();
>  vmx_vcpu_disable_pml(v);
>  vmx_destroy_vmcs(v);
>  passive_domain_destroy(v);
> 

Reviewed-by: Igor Druzhinin <igor.druzhi...@citrix.com>

Igor

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Ping: [PATCH] VMX: sync CPU state upon vCPU destruction

2017-11-21 Thread Igor Druzhinin
On 21/11/17 15:29, Jan Beulich wrote:
>>>> On 21.11.17 at 15:07, <igor.druzhi...@citrix.com> wrote:
>> On 21/11/17 13:22, Jan Beulich wrote:
>>>>>> On 09.11.17 at 15:49, <jbeul...@suse.com> wrote:
>>>> See the code comment being added for why we need this.
>>>>
>>>> Reported-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>>>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>>>
>>> I realize we aren't settled yet on where to put the sync call. The
>>> discussion appears to have stalled, though. Just to recap,
>>> alternatives to the placement below are
>>> - at the top of complete_domain_destroy(), being the specific
>>>   RCU callback exhibiting the problem (others are unlikely to
>>>   touch guest state)
>>> - in rcu_do_batch(), paralleling the similar call from
>>>   do_tasklet_work()
>>
>> rcu_do_batch() sounds better to me. As I said before, I think the
>> problem is general for the hypervisor (not VMX-only) and might
>> appear in other places as well.
> 
> The question here is: In what other cases do we expect an RCU
> callback to possibly touch guest state? I think the common use is
> to merely free some memory in a delayed fashion.
> 

I don't know for sure what the common scenario is for Xen, but drawing
parallels with Linux - you're probably right.

>> The choices you outlined differ in whether we solve the general problem
>> (probably with some minor performance impact) or solve only this ad-hoc
>> problem but make the system more entangled. I'm more inclined to the
>> first choice because in this particular scenario the performance impact
>> should be negligible.
> 
> For the problem at hand there's no question about a
> performance effect. The question is whether doing this for _other_
> RCU callbacks would introduce a performance drop in certain cases.
> 

Yes, right. In that case this placement would mean we lose the partial
context each time we handle RCU callbacks in idle - is that correct? If
so, that sounds like a common scenario to me and means there will be some
performance degradation, although I don't know how common it really is.

Anyway, if you're in favor of the previous approach I have no objections,
as my understanding of the Xen codebase is still partial.

Igor




Re: [Xen-devel] Ping: [PATCH] VMX: sync CPU state upon vCPU destruction

2017-11-21 Thread Igor Druzhinin
On 21/11/17 13:22, Jan Beulich wrote:
>>>> On 09.11.17 at 15:49, <jbeul...@suse.com> wrote:
>> See the code comment being added for why we need this.
>>
>> Reported-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> 
> I realize we aren't settled yet on where to put the sync call. The
> discussion appears to have stalled, though. Just to recap,
> alternatives to the placement below are
> - at the top of complete_domain_destroy(), being the specific
>   RCU callback exhibiting the problem (others are unlikely to
>   touch guest state)
> - in rcu_do_batch(), paralleling the similar call from
>   do_tasklet_work()

rcu_do_batch() sounds better to me. As I said before, I think the
problem is general for the hypervisor (not VMX-only) and might
appear in other places as well.

The choices you outlined differ in whether we solve the general problem
(probably with some minor performance impact) or solve only this ad-hoc
problem but make the system more entangled. I'm more inclined to the
first choice because in this particular scenario the performance impact
should be negligible.
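[Editor's sketch] The rcu_do_batch() placement being discussed - syncing lazy execstate once before running the callback batch, paralleling do_tasklet_work() - can be illustrated with a small standalone model. All names here (run_model, destroy_cb, etc.) are hypothetical; this is a toy simulation of the idea, not Xen code:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: after a lazy context switch to idle, curr_vcpu still points
 * at the last real vCPU whose state has not been flushed yet. */
struct vcpu { int id; };

static struct vcpu vcpu_x = { 1 };               /* last real vCPU run here */
static struct vcpu idle_vcpu = { 0 };

static struct vcpu *current_v = &idle_vcpu;      /* what is scheduled now */
static struct vcpu *curr_vcpu = &vcpu_x;         /* whose state is loaded */
static int full_switches;                        /* __context_switch() count */

static void sync_local_execstate(void)
{
    if (curr_vcpu != current_v) {    /* lazy state still pending */
        full_switches++;             /* models running __context_switch() */
        curr_vcpu = current_v;
    }
}

/* A callback that must not have state flushed behind its back mid-run. */
static void destroy_cb(void)
{
    /* The up-front sync guarantees no TLB-flush IPI can run
     * __context_switch() behind the back of this callback. */
    assert(curr_vcpu == current_v);
}

static void rcu_do_batch(void (*cbs[])(void), size_t n)
{
    sync_local_execstate();          /* the placement under discussion */
    for (size_t i = 0; i < n; i++)
        cbs[i]();
}

static int run_model(void)
{
    void (*cbs[1])(void) = { destroy_cb };
    rcu_do_batch(cbs, 1);
    return full_switches;            /* one deferred switch got flushed */
}
```

The performance question raised above corresponds to full_switches being incremented on every batch taken in idle, whether or not any callback actually touches guest state.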

Igor


> 
> Jan
> 
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -479,7 +479,13 @@ static void vmx_vcpu_destroy(struct vcpu
>>   * we should disable PML manually here. Note that vmx_vcpu_destroy is called
>>   * prior to vmx_domain_destroy so we need to disable PML for each vcpu
>>   * separately here.
>> + *
>> + * Before doing that though, flush all state for the vCPU previously having
>> + * run on the current CPU, so that this flushing of state won't happen from
>> + * the TLB flush IPI handler behind the back of a vmx_vmcs_enter() /
>> + * vmx_vmcs_exit() section.
>>   */
>> +sync_local_execstate();
>>  vmx_vcpu_disable_pml(v);
>>  vmx_destroy_vmcs(v);
>>  passive_domain_destroy(v);
>>
>>
>>
>>
> 
> 
> 



Re: [Xen-devel] [PATCH] VMX: sync CPU state upon vCPU destruction

2017-11-10 Thread Igor Druzhinin
On 10/11/17 10:30, Jan Beulich wrote:
 On 10.11.17 at 09:41,  wrote:
>> On Thu, 2017-11-09 at 07:49 -0700, Jan Beulich wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -479,7 +479,13 @@ static void vmx_vcpu_destroy(struct vcpu
>>> * we should disable PML manually here. Note that vmx_vcpu_destroy is called
>>>   * prior to vmx_domain_destroy so we need to disable PML for each vcpu
>>>   * separately here.
>>> + *
>>> + * Before doing that though, flush all state for the vCPU previously having
>>> + * run on the current CPU, so that this flushing of state won't happen from
>>> + * the TLB flush IPI handler behind the back of a vmx_vmcs_enter() /
>>> + * vmx_vmcs_exit() section.
>>>   */
>>> +sync_local_execstate();
>>>  vmx_vcpu_disable_pml(v);
>>>  vmx_destroy_vmcs(v);
>>>  passive_domain_destroy(v);
>>
>> This patch fixes only one particular issue and not the general problem.
>> What if vmcs is cleared, possibly in some future code, at another place?
> 
> As indicated in the earlier discussion, if we go this route, other
> future async accesses may need to do the same then.
> 
>> The original intent of vmx_vmcs_reload() is correct: it lazily loads
>> the VMCS when it's needed. It's just that the logic which checks
>> v->is_running inside vmx_ctxt_switch_from() is flawed: v might be
>> "running" on another pCPU.
>>
>> IMHO there are 2 possible solutions:
>>
>> 1. Add additional pCPU check into vmx_ctxt_switch_from()
> 
> I agree with Dario in not seeing this as a possible solution.
> 
>> 2. Drop the v->is_running check inside vmx_ctxt_switch_from(), making
>> vmx_vmcs_reload() unconditional.
> 
> This is an option, indeed (and I don't think it would have a
> meaningful performance impact, as vmx_vmcs_reload() does
> nothing if the right VMCS is already in place). Iirc I had added the
> conditional back then merely to introduce as little of a behavioral
> change as was (appeared to be at that time) necessary. What I'm
> not certain about, however, is the final state we'll end up in then.
> Coming back to your flow scheme (altered to represent the
> suggested new flow):
> 

I've been thinking about this approach for a while and couldn't find
anything dangerous that vmcs_reload() could potentially do, since it
looks like it already has all the necessary checks inside.
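[Editor's sketch] The "necessary checks" amount to the no-op property of the reload: if the right VMCS is already loaded on this pCPU, nothing happens. A minimal model of just that check (hypothetical names - it mirrors the condition in vmx_vmcs_reload(), nothing more):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t current_vmcs;    /* models this_cpu(current_vmcs) */
static unsigned int loads;       /* how often a real load happened */

static void load_vmcs(uint64_t pa)
{
    current_vmcs = pa;
    loads++;
}

/* Reload is free when the right VMCS is already loaded - this is why
 * making the call unconditional should have no meaningful cost. */
static void vmcs_reload(uint64_t vmcs_pa)
{
    if (vmcs_pa == current_vmcs)
        return;                  /* already owned: nothing to do */
    load_vmcs(vmcs_pa);
}
```

Calling vmcs_reload() twice with the same address performs only one real load; only a genuinely different VMCS triggers another.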

> pCPU1   pCPU2
> =   =
> current == vCPU1
> context_switch(next == idle)
> !! __context_switch() is skipped
> vcpu_migrate(vCPU1)
> RCU callbacks
> vmx_vcpu_destroy()
> vmx_vcpu_disable_pml()
> current_vmcs = 0
> 
> schedule(next == vCPU1)
> vCPU1->is_running = 1;
> context_switch(next == vCPU1)
> flush_tlb_mask(_mask);
> 
> <--- IPI
> 
> __sync_local_execstate()
> __context_switch(prev == vCPU1)
> vmx_ctxt_switch_from(vCPU1)
> vmx_vmcs_reload()
> ...
> 
> We'd now leave the being destroyed vCPU's VMCS active in pCPU1
> (at least I can't see where it would be deactivated again).

This would be VMCS of the migrated vCPU - not the destroyed one.

Igor

> Overall I think it is quite reasonable to terminate early a lazy
> context switch of a vCPU under destruction. From that abstract
> consideration, forcing this higher up the call stack of
> vmx_vcpu_destroy() (as I had suggested as an alternative
> previously, before actually moving it further down into VMX code,
> perhaps even right in RCU handling) would continue to be an
> option. In this context you may want to pay particular
> attention to the description of 346da00456 ("Synchronise lazy
> execstate before calling tasklet handlers").
> 
> Jan
> 



Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-07 Thread Igor Druzhinin
On 07/11/17 14:55, Jan Beulich wrote:
 On 07.11.17 at 15:24,  wrote:
>> On 07/11/17 08:07, Jan Beulich wrote:
>>> --- unstable.orig/xen/arch/x86/domain.c
>>> +++ unstable/xen/arch/x86/domain.c
>>> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>>>  
>>>  void vcpu_destroy(struct vcpu *v)
>>>  {
>>> +/*
>>> + * Flush all state for this vCPU before fully tearing it down. This is
>>> + * particularly important for HVM ones on VMX, so that this flushing of
>>> + * state won't happen from the TLB flush IPI handler behind the back of
>>> + * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
>>> + */
>>> +sync_vcpu_execstate(v);
>>> +
>>>  xfree(v->arch.vm_event);
>>>  v->arch.vm_event = NULL;
>>
>> I don't think this is going to fix the problem, since the vCPU we are
>> currently destroying has nothing to do with the vCPUx that actually
>> caused the problem by its migration. We are still going to call
>> vmx_vcpu_disable_pml(), which loads and clears the VMCS on the current pCPU.
> 
> Oh, right, wrong vCPU. This should be better:
> 
> --- unstable.orig/xen/arch/x86/domain.c
> +++ unstable/xen/arch/x86/domain.c
> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>  
>  void vcpu_destroy(struct vcpu *v)
>  {
> +/*
> + * Flush all state for the vCPU previously having run on the current CPU.
> + * This is in particular relevant for HVM ones on VMX, so that this
> + * flushing of state won't happen from the TLB flush IPI handler behind
> + * the back of a vmx_vmcs_enter() / vmx_vmcs_exit() section.
> + */
> +sync_local_execstate();
> +
>  xfree(v->arch.vm_event);
>  v->arch.vm_event = NULL;
>  
> In that case the question then is whether (rather than generalizing
> it, as mentioned for the earlier version) this wouldn't better go into
> vmx_vcpu_destroy(), assuming anything called earlier from
> hvm_vcpu_destroy() isn't susceptible to the problem (i.e. doesn't
> play with VMCSes).

Ah, ok. Does this also apply to the previous issue? May I revert that
change to test it?

There is one thing that worries me about this approach:

At this place we just sync the idle context because we know that we are
going to deal with the VMCS later. But what about other potential cases
(perhaps some softirqs) in which we access a vCPU data structure that is
currently shared between different pCPUs? Maybe we'd better sync the
context as soon as possible after we've switched to idle from a migrated
vCPU.

Igor

> 
> Jan
> 



Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-07 Thread Igor Druzhinin
On 07/11/17 08:07, Jan Beulich wrote:
 On 02.11.17 at 20:46,  wrote:
>>> Any ideas about the root cause of the fault and suggestions on how to
>>> reproduce it would be welcome. Does this crash really have something to
>>> do with PML? I doubt it, because the original environment may hardly be
>>> called PML-heavy.
> 
> Well, PML-heaviness doesn't matter. It's the mere fact that PML
> is enabled on the vCPU being destroyed.
> 
>> So we finally have complete understanding of what's going on:
>>
>> Some vCPU has just migrated to another pCPU and we switched to idle but
>> per_cpu(curr_vcpu) on the current pCPU is still pointing to it - this is
>> how the current logic works. While we're in idle we're issuing
>> vcpu_destroy() for some other domain which eventually calls
>> vmx_vcpu_disable_pml() and trashes VMCS pointer on the current pCPU. At
>> this moment we get a TLB flush IPI from that same vCPU which is now
>> context switching on another pCPU - it appears to clean TLB after
>> itself. This vCPU is already marked is_running=1 by the scheduler. In
>> the IPI handler we enter __sync_local_execstate() and try to call
>> vmx_ctxt_switch_from() for the migrated vCPU, which is supposed to call
>> vmcs_reload() but doesn't do it because is_running == 1. The next VMWRITE
>> crashes the hypervisor.
>>
>> So the state transition diagram might look like:
>> pCPU1: vCPUx -> migrate to pCPU2 -> idle -> RCU callbacks ->
> 
> I'm not really clear about who/what is "idle" here: pCPU1,
> pCPU2, or yet something else?

It's switching to the "current" idle context on pCPU1.

> If vCPUx migrated to pCPU2,
> wouldn't it be put back into runnable state right away, and
> hence pCPU2 can't be idle at this point? Yet for pCPU1 I don't
> think its idleness would matter much, i.e. the situation could
> also arise without it becoming idle afaics. pCPU1 making it
> anywhere softirqs are being processed would suffice.
> 

Idleness matters in that case because we are not switching
per_cpu(curr_vcpu) which I think is the main problem when vCPU migration
comes into play.

>> vcpu_destroy() -> vmx_vcpu_disable_pml() -> vmcs_clear()
>> pCPU2: context switch into vCPUx -> is_running = 1 -> TLB flush
>> pCPU1: IPI handler -> context switch out of vCPUx -> VMWRITE -> CRASH!
>>
>> We can basically just fix the condition around vmcs_reload() call but
>> I'm not completely sure that it's the right way to do - I don't think
>> leaving per_cpu(curr_vcpu) pointing to a migrated vCPU is a good idea
>> (maybe we need to clean it). What are your thoughts?
> 
> per_cpu(curr_vcpu) can only validly be written inside
> __context_switch(), hence the only way to achieve this would
> be to force __context_switch() to be called earlier than out of
> the TLB flush IPI handler, perhaps like in the (untested!) patch
> below. Two questions then remain:
> - Should we perhaps rather do this in an arch-independent way
>   (i.e. ahead of the call to vcpu_destroy() in common code)?
> - This deals with only a special case of the more general "TLB
>   flush behind the back of a vmx_vmcs_enter() /
>   vmx_vmcs_exit() section" - does this need dealing with in a
>   more general way? Here I'm thinking of introducing a
>   FLUSH_STATE flag to be passed to flush_mask() instead of
>   the current flush_tlb_mask() in context_switch() and
>   sync_vcpu_execstate(). This could at the same time be used
>   for a small performance optimization: At least for HAP vCPU-s
>   I don't think we really need the TLB part of the flushes here.
> 
> Jan
> 
> --- unstable.orig/xen/arch/x86/domain.c
> +++ unstable/xen/arch/x86/domain.c
> @@ -379,6 +379,14 @@ int vcpu_initialise(struct vcpu *v)
>  
>  void vcpu_destroy(struct vcpu *v)
>  {
> +/*
> + * Flush all state for this vCPU before fully tearing it down. This is
> + * particularly important for HVM ones on VMX, so that this flushing of
> + * state won't happen from the TLB flush IPI handler behind the back of
> + * a vmx_vmcs_enter() / vmx_vmcs_exit() section.
> + */
> +sync_vcpu_execstate(v);
> +
>  xfree(v->arch.vm_event);
>  v->arch.vm_event = NULL;
>  

I don't think this is going to fix the problem, since the vCPU we are
currently destroying has nothing to do with the vCPUx that actually
caused the problem by its migration. We are still going to call
vmx_vcpu_disable_pml(), which loads and clears the VMCS on the current pCPU.
Perhaps I should improve my diagram:

pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle context
-> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at this
point on pCPU1)

pCPU2: context switch into vCPUx -> vCPUx.is_running = 1 -> TLB flush
from context switch to clean TLB on pCPU1

(pCPU1 is still somewhere in vcpu_destroy() loop and with VMCS cleared
by vmx_vcpu_disable_pml())

pCPU1: IPI handler for TLB flush -> context switch out of vCPUx (this is
here because we haven't 
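[Editor's sketch] The sequence in the diagram above can be condensed into a small sequential model (hypothetical names and values - not Xen code; the two pCPUs' actions are serialized in the order they interleave):

```c
#include <assert.h>
#include <stdint.h>

#define VCPUX_VMCS_PA 0xabcd000ULL

static uint64_t current_vmcs = VCPUX_VMCS_PA;  /* vCPUx state still loaded
                                                  on pCPU1 (lazy switch) */
static int vcpux_is_running;                   /* set once pCPU2 runs it */

/* Model of vmx_ctxt_switch_from() with the flawed condition: the reload is
 * skipped because vCPUx looks "running" - but on another pCPU. */
static void ctxt_switch_from_vcpux(void)
{
    if (!vcpux_is_running)
        current_vmcs = VCPUX_VMCS_PA;          /* vmcs_reload() - skipped */
}

static int vmwrite_would_crash(void)
{
    return current_vmcs == 0;                  /* VMWRITE, no VMCS loaded */
}

static int run_race(void)
{
    /* pCPU1: RCU callback destroys vCPUy of domain Y;
     * vmx_vcpu_disable_pml() -> vmx_vmcs_clear() trashes the pointer. */
    current_vmcs = 0;

    /* pCPU2: context-switches into migrated vCPUx, marks it running and
     * sends the TLB flush IPI back to pCPU1. */
    vcpux_is_running = 1;

    /* pCPU1: IPI handler -> __sync_local_execstate() ->
     * context switch out of vCPUx. */
    ctxt_switch_from_vcpux();

    return vmwrite_would_crash();              /* 1: reload was skipped */
}
```

The model returns 1, i.e. the subsequent VMWRITE hits a cleared VMCS pointer - the crash described in this thread.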

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-11-02 Thread Igor Druzhinin
On 27/10/17 18:42, Igor Druzhinin wrote:
> On 16/02/17 11:15, Jan Beulich wrote:
>> When __context_switch() is being bypassed during original context
>> switch handling, the vCPU "owning" the VMCS partially loses control of
>> it: It will appear non-running to remote CPUs, and hence their attempt
>> to pause the owning vCPU will have no effect on it (as it already
>> looks to be paused). At the same time the "owning" CPU will re-enable
>> interrupts eventually (at the latest when entering the idle loop) and
>> hence becomes subject to IPIs from other CPUs requesting access to the
>> VMCS. As a result, when __context_switch() finally gets run, the CPU
>> may no longer have the VMCS loaded, and hence any accesses to it would
>> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
>>
>> Similarly, when __context_switch() is being bypassed also on the second
>> (switch-in) path, VMCS ownership may have been lost and hence needs
>> re-establishing. Since there's no existing hook to put this in, add a
>> new one.
>>
>> Reported-by: Kevin Mayer <kevin.ma...@gdata.de>
>> Reported-by: Anshul Makkar <anshul.mak...@citrix.com>
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>> ---
>> v2: Drop the spin loop from vmx_vmc_reload(). Use the function in
>> vmx_do_resume() instead of open coding it there (requiring the
>> ASSERT()s to be adjusted/dropped). Drop the new
>> ->ctxt_switch_same() hook.
>>
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -552,6 +552,20 @@ static void vmx_load_vmcs(struct vcpu *v
>>  local_irq_restore(flags);
>>  }
>>  
>> +void vmx_vmcs_reload(struct vcpu *v)
>> +{
>> +/*
>> + * As we may be running with interrupts disabled, we can't acquire
>> + * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
>> + * the VMCS can't be taken away from us anymore if we still own it.
>> + */
>> +ASSERT(v->is_running || !local_irq_is_enabled());
>> +if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
>> +return;
>> +
>> +vmx_load_vmcs(v);
>> +}
>> +
>>  int vmx_cpu_up_prepare(unsigned int cpu)
>>  {
>>  /*
>> @@ -1678,10 +1692,7 @@ void vmx_do_resume(struct vcpu *v)
>>  bool_t debug_state;
>>  
>>  if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
>> -{
>> -if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
>> -vmx_load_vmcs(v);
>> -}
>> +vmx_vmcs_reload(v);
>>  else
>>  {
>>  /*
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -936,6 +937,18 @@ static void vmx_ctxt_switch_from(struct
>>  if ( unlikely(!this_cpu(vmxon)) )
>>  return;
>>  
>> +if ( !v->is_running )
>> +{
>> +/*
>> + * When this vCPU isn't marked as running anymore, a remote pCPU's
>> + * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
>> + * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
>> + * away the VMCS from us. As we're running with interrupts disabled,
>> + * we also can't call vmx_vmcs_enter().
>> + */
>> +vmx_vmcs_reload(v);
>> +}
>> +
>>  vmx_fpu_leave(v);
>>  vmx_save_guest_msrs(v);
>>  vmx_restore_host_msrs();
>> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> @@ -174,6 +174,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
>>  void vmx_vmcs_enter(struct vcpu *v);
>>  bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
>>  void vmx_vmcs_exit(struct vcpu *v);
>> +void vmx_vmcs_reload(struct vcpu *v);
>>  
>>  #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
>>  #define CPU_BASED_USE_TSC_OFFSETING           0x00000008
>>
> 
> Hi Jan,
> 
> I'm not entirely sure if it's something related but the end result looks
> similar to the issue that this patch solved. We are now getting reports of
> a similar race condition with the following stack trace on 4.7.1 with this
> patch backported but I'm pretty sure this should be the case for master
> as well:
> 
> (XEN) [480198.570165] Xen call trace:
> (XEN) [480198.570168][] 
> vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely+0x136/0x1a8
> (XEN) [480198.570171][] 
> domain.c#__context_switch+0x10c/0x3a4
> (XEN) [480198.570176][

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths

2017-10-27 Thread Igor Druzhinin
On 16/02/17 11:15, Jan Beulich wrote:
> When __context_switch() is being bypassed during original context
> switch handling, the vCPU "owning" the VMCS partially loses control of
> it: It will appear non-running to remote CPUs, and hence their attempt
> to pause the owning vCPU will have no effect on it (as it already
> looks to be paused). At the same time the "owning" CPU will re-enable
> interrupts eventually (at the latest when entering the idle loop) and
> hence becomes subject to IPIs from other CPUs requesting access to the
> VMCS. As a result, when __context_switch() finally gets run, the CPU
> may no longer have the VMCS loaded, and hence any accesses to it would
> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
> 
> Similarly, when __context_switch() is being bypassed also on the second
> (switch-in) path, VMCS ownership may have been lost and hence needs
> re-establishing. Since there's no existing hook to put this in, add a
> new one.
> 
> Reported-by: Kevin Mayer 
> Reported-by: Anshul Makkar 
> Signed-off-by: Jan Beulich 
> ---
> v2: Drop the spin loop from vmx_vmc_reload(). Use the function in
> vmx_do_resume() instead of open coding it there (requiring the
> ASSERT()s to be adjusted/dropped). Drop the new
> ->ctxt_switch_same() hook.
> 
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -552,6 +552,20 @@ static void vmx_load_vmcs(struct vcpu *v
>  local_irq_restore(flags);
>  }
>  
> +void vmx_vmcs_reload(struct vcpu *v)
> +{
> +/*
> + * As we may be running with interrupts disabled, we can't acquire
> + * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
> + * the VMCS can't be taken away from us anymore if we still own it.
> + */
> +ASSERT(v->is_running || !local_irq_is_enabled());
> +if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
> +return;
> +
> +vmx_load_vmcs(v);
> +}
> +
>  int vmx_cpu_up_prepare(unsigned int cpu)
>  {
>  /*
> @@ -1678,10 +1692,7 @@ void vmx_do_resume(struct vcpu *v)
>  bool_t debug_state;
>  
>  if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
> -{
> -if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
> -vmx_load_vmcs(v);
> -}
> +vmx_vmcs_reload(v);
>  else
>  {
>  /*
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -936,6 +937,18 @@ static void vmx_ctxt_switch_from(struct
>  if ( unlikely(!this_cpu(vmxon)) )
>  return;
>  
> +if ( !v->is_running )
> +{
> +/*
> + * When this vCPU isn't marked as running anymore, a remote pCPU's
> + * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
> + * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
> + * away the VMCS from us. As we're running with interrupts disabled,
> + * we also can't call vmx_vmcs_enter().
> + */
> +vmx_vmcs_reload(v);
> +}
> +
>  vmx_fpu_leave(v);
>  vmx_save_guest_msrs(v);
>  vmx_restore_host_msrs();
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -174,6 +174,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
>  void vmx_vmcs_enter(struct vcpu *v);
>  bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
>  void vmx_vmcs_exit(struct vcpu *v);
> +void vmx_vmcs_reload(struct vcpu *v);
>  
>  #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
>  #define CPU_BASED_USE_TSC_OFFSETING           0x00000008
> 

Hi Jan,

I'm not entirely sure if it's something related but the end result looks
similar to the issue that this patch solved. We are now getting reports of
a similar race condition with the following stack trace on 4.7.1 with this
patch backported but I'm pretty sure this should be the case for master
as well:

(XEN) [480198.570165] Xen call trace:
(XEN) [480198.570168][] 
vmx.c#arch/x86/hvm/vmx/vmx.o.unlikely+0x136/0x1a8
(XEN) [480198.570171][] 
domain.c#__context_switch+0x10c/0x3a4
(XEN) [480198.570176][] __sync_local_execstate+0x35/0x51
(XEN) [480198.570179][] invalidate_interrupt+0x40/0x73
(XEN) [480198.570183][] do_IRQ+0x8c/0x5cb
(XEN) [480198.570186][] common_interrupt+0x5f/0x70
(XEN) [480198.570189][] vpmu_destroy+0/0x100
(XEN) [480198.570192][] vmx.c#vmx_vcpu_destroy+0x21/0x30
(XEN) [480198.570195][] hvm_vcpu_destroy+0x70/0x77
(XEN) [480198.570197][] vcpu_destroy+0x5d/0x72
(XEN) [480198.570201][] 
domain.c#complete_domain_destroy+0x49/0x182
(XEN) [480198.570204][] 
rcupdate.c#rcu_process_callbacks+0x141/0x1a3
(XEN) [480198.570207][] softirq.c#__do_softirq+0x75/0x80
(XEN) [480198.570209][] process_pending_softirqs+0xe/0x10
(XEN) [480198.570212][] mwait-idle.c#mwait_idle+0xf5/0x2c3
(XEN) [480198.570214][] vmx_intr_assist+0x3bf/0x4f2
(XEN) [480198.570216][] 

[Xen-devel] [PATCH v5] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-08-30 Thread Igor Druzhinin
We need to choose ACPI tables properly depending on the device
model version we are running. Previously, this decision was
made by BIOS type specific code in hvmloader, e.g. always load
QEMU traditional specific tables if it's ROMBIOS and always
load QEMU Xen specific tables if it's SeaBIOS.

This change preserves this behavior (for compatibility) but adds
an additional way (a xenstore key) to specify the correct
device model if we happen to run a non-default one. The toolstack
part makes use of it.

The enforcement of BIOS type depending on QEMU version will
be lifted later when the rest of ROMBIOS compatibility fixes
are in place.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
---
Changes in v5:
* various refinements

Changes in v4:
* Use V1 port location unconditionally as modern versions of
  Qemu-trad use it anyway
* Change confusing comments in ioreq.h

Changes in v3:
* move ACPI table externs into util.h

Changes in v2:
* fix insufficient allocation size of localent
---
 tools/firmware/hvmloader/ovmf.c|  3 ---
 tools/firmware/hvmloader/rombios.c |  3 ---
 tools/firmware/hvmloader/seabios.c |  3 ---
 tools/firmware/hvmloader/util.c| 17 +
 tools/firmware/hvmloader/util.h|  3 +++
 tools/libxl/libxl_create.c |  4 +++-
 xen/include/public/hvm/ioreq.h |  9 +++--
 7 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
index 4ff7f1d..a17a11c 100644
--- a/tools/firmware/hvmloader/ovmf.c
+++ b/tools/firmware/hvmloader/ovmf.c
@@ -41,9 +41,6 @@
 #define LOWCHUNK_MAXOFFSET  0x
 #define OVMF_INFO_PHYSICAL_ADDRESS 0x1000
 
-extern unsigned char dsdt_anycpu_qemu_xen[];
-extern int dsdt_anycpu_qemu_xen_len;
-
 #define OVMF_INFO_MAX_TABLES 4
 struct ovmf_info {
 char signature[14]; /* XenHVMOVMF\0\0\0\0 */
diff --git a/tools/firmware/hvmloader/rombios.c 
b/tools/firmware/hvmloader/rombios.c
index 56b39b7..c736fd9 100644
--- a/tools/firmware/hvmloader/rombios.c
+++ b/tools/firmware/hvmloader/rombios.c
@@ -42,9 +42,6 @@
 #define ROMBIOS_MAXOFFSET  0x
 #define ROMBIOS_END(ROMBIOS_BEGIN + ROMBIOS_SIZE)
 
-extern unsigned char dsdt_anycpu[], dsdt_15cpu[];
-extern int dsdt_anycpu_len, dsdt_15cpu_len;
-
 static void rombios_setup_e820(void)
 {
 /*
diff --git a/tools/firmware/hvmloader/seabios.c 
b/tools/firmware/hvmloader/seabios.c
index 870576a..801516d 100644
--- a/tools/firmware/hvmloader/seabios.c
+++ b/tools/firmware/hvmloader/seabios.c
@@ -29,9 +29,6 @@
 #include 
 #include 
 
-extern unsigned char dsdt_anycpu_qemu_xen[];
-extern int dsdt_anycpu_qemu_xen_len;
-
 struct seabios_info {
 char signature[14]; /* XenHVMSeaBIOS\0 */
 uint8_t length; /* Length of this struct */
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index db5f240..ab5448b 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -897,6 +897,23 @@ void hvmloader_acpi_build_tables(struct acpi_config 
*config,
 /* Allocate and initialise the acpi info area. */
 mem_hole_populate_ram(ACPI_INFO_PHYSICAL_ADDRESS >> PAGE_SHIFT, 1);
 
+/* If the device model is specified switch to the corresponding tables */
+s = xenstore_read("platform/device-model", "");
+if ( !strncmp(s, "qemu_xen_traditional", 21) )
+{
+config->dsdt_anycpu = dsdt_anycpu;
+config->dsdt_anycpu_len = dsdt_anycpu_len;
+config->dsdt_15cpu = dsdt_15cpu;
+config->dsdt_15cpu_len = dsdt_15cpu_len;
+}
+else if ( !strncmp(s, "qemu_xen", 9) )
+{
+config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
+config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
+config->dsdt_15cpu = NULL;
+config->dsdt_15cpu_len = 0;
+}
+
 config->lapic_base_address = LAPIC_BASE_ADDRESS;
 config->lapic_id = acpi_lapic_id;
 config->ioapic_base_address = ioapic_base_address;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 6062f0b..2ef854e 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -276,6 +276,9 @@ extern struct e820map memory_map;
 bool check_overlap(uint64_t start, uint64_t size,
uint64_t reserved_start, uint64_t reserved_size);
 
+extern const unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
+extern const int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
+
 struct acpi_config;
 void hvmloader_acpi_build_tables(struct acpi_config *config,
  unsigned int physical);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1158303..4f13b69 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -451,7 +451,7 @@ int libxl__domain_build

Re: [Xen-devel] [PATCH v4] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-08-30 Thread Igor Druzhinin
On 30/08/17 08:21, Roger Pau Monné wrote:
> On Tue, Aug 29, 2017 at 05:29:53PM +0100, Igor Druzhinin wrote:
>> We need to choose ACPI tables properly depending on the device
>> model version we are running. Previously, this decision was
>> made by BIOS type specific code in hvmloader, e.g. always load
>> QEMU traditional specific tables if it's ROMBIOS and always
>> load QEMU Xen specific tables if it's SeaBIOS.
>>
>> This change preserves this behavior (for compatibility) but adds
>> an additional way (a xenstore key) to specify the correct
>> device model if we happen to run a non-default one. The toolstack
>> part makes use of it.
>>
>> The enforcement of BIOS type depending on QEMU version will
>> be lifted later when the rest of ROMBIOS compatibility fixes
>> are in place.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>> Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
>> ---
>> Changes in v4:
>> * Use V1 port location unconditionally as modern versions of
>>   Qemu-trad use it anyway
>> * Change confusing comments in ioreq.h
>>
>> Changes in v3:
>> * move ACPI table externs into util.h
>>
>> Changes in v2:
>> * fix insufficient allocation size of localent
>> ---
>>  tools/firmware/hvmloader/ovmf.c|  3 ---
>>  tools/firmware/hvmloader/rombios.c |  3 ---
>>  tools/firmware/hvmloader/seabios.c |  3 ---
> 
> You forgot to remove the calls to HVM_PARAM_ACPI_IOPORTS_LOCATION from
> the above files.
> 

I think I did that if you mean the change that I had before. These files
now don't touch HVM_PARAM_ACPI_IOPORTS_LOCATION.

>> diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
>> index 2e5809b..cffee6b 100644
>> --- a/xen/include/public/hvm/ioreq.h
>> +++ b/xen/include/public/hvm/ioreq.h
>> @@ -103,14 +103,14 @@ typedef struct buffered_iopage buffered_iopage_t;
>>   * version number in HVM_PARAM_ACPI_IOPORTS_LOCATION.
>>   */
>>  
>> -/* Version 0 (default): Traditional Xen locations. */
>> +/* Version 0 (default): Traditional (obsolete) Xen locations. */
> 
> Could you please add a note saying this is only keep for migration
> purposes (being able to migrate from older Xen versions)?
> 

Sure.

> Thanks, Roger.
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
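The patch discussed in the thread above selects the DSDT tables by matching the device-model string read from the xenstore key "platform/device-model". A minimal standalone sketch of that matching logic (the helper name is hypothetical, not from the patch); note that, as in the patch, the lengths passed to strncmp (21 and 9) include the terminating NUL, so the comparison behaves as an exact match, and the longer "qemu_xen_traditional" must be tested before its prefix "qemu_xen":

```c
#include <string.h>

/* Hypothetical helper (not from the patch): classify the xenstore
 * "platform/device-model" value. Returns 0 for qemu-trad tables,
 * 1 for qemu-xen tables, -1 if the key is absent or unrecognised. */
int classify_device_model(const char *dm)
{
    /* "qemu_xen" is a prefix of "qemu_xen_traditional", so the longer
     * name must be tested first. The lengths include the terminating
     * NUL, making strncmp an exact-match comparison. */
    if ( !strncmp(dm, "qemu_xen_traditional", 21) )
        return 0;
    if ( !strncmp(dm, "qemu_xen", 9) )
        return 1;
    return -1;
}
```

With this ordering, an unrecognised or empty value falls through to the default behaviour, which is what lets hvmloader keep its BIOS-specific defaults for compatibility.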


[Xen-devel] [PATCH v4] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-08-29 Thread Igor Druzhinin
We need to choose ACPI tables properly depending on the device
model version we are running. Previously, this decision was
made by BIOS type specific code in hvmloader, e.g. always load
QEMU traditional specific tables if it's ROMBIOS and always
load QEMU Xen specific tables if it's SeaBIOS.

This change saves this behavior (for compatibility) but adds
an additional way (xenstore key) to specify the correct
device model if we happen to run a non-default one. Toolstack
bit makes use of it.

The enforcement of BIOS type depending on QEMU version will
be lifted later when the rest of ROMBIOS compatibility fixes
are in place.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
---
Changes in v4:
* Use V1 port location unconditionally as modern versions of
  Qemu-trad use it anyway
* Change confusing comments in ioreq.h

Changes in v3:
* move ACPI table externs into util.h

Changes in v2:
* fix insufficient allocation size of localent
---
 tools/firmware/hvmloader/ovmf.c|  3 ---
 tools/firmware/hvmloader/rombios.c |  3 ---
 tools/firmware/hvmloader/seabios.c |  3 ---
 tools/firmware/hvmloader/util.c| 17 +
 tools/firmware/hvmloader/util.h|  3 +++
 tools/libxl/libxl_create.c |  4 +++-
 xen/include/public/hvm/ioreq.h |  4 ++--
 7 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
index 4ff7f1d..a17a11c 100644
--- a/tools/firmware/hvmloader/ovmf.c
+++ b/tools/firmware/hvmloader/ovmf.c
@@ -41,9 +41,6 @@
 #define LOWCHUNK_MAXOFFSET  0x
 #define OVMF_INFO_PHYSICAL_ADDRESS 0x1000
 
-extern unsigned char dsdt_anycpu_qemu_xen[];
-extern int dsdt_anycpu_qemu_xen_len;
-
 #define OVMF_INFO_MAX_TABLES 4
 struct ovmf_info {
 char signature[14]; /* XenHVMOVMF\0\0\0\0 */
diff --git a/tools/firmware/hvmloader/rombios.c b/tools/firmware/hvmloader/rombios.c
index 56b39b7..c736fd9 100644
--- a/tools/firmware/hvmloader/rombios.c
+++ b/tools/firmware/hvmloader/rombios.c
@@ -42,9 +42,6 @@
 #define ROMBIOS_MAXOFFSET  0x
 #define ROMBIOS_END(ROMBIOS_BEGIN + ROMBIOS_SIZE)
 
-extern unsigned char dsdt_anycpu[], dsdt_15cpu[];
-extern int dsdt_anycpu_len, dsdt_15cpu_len;
-
 static void rombios_setup_e820(void)
 {
 /*
diff --git a/tools/firmware/hvmloader/seabios.c b/tools/firmware/hvmloader/seabios.c
index 870576a..801516d 100644
--- a/tools/firmware/hvmloader/seabios.c
+++ b/tools/firmware/hvmloader/seabios.c
@@ -29,9 +29,6 @@
 #include 
 #include 
 
-extern unsigned char dsdt_anycpu_qemu_xen[];
-extern int dsdt_anycpu_qemu_xen_len;
-
 struct seabios_info {
 char signature[14]; /* XenHVMSeaBIOS\0 */
 uint8_t length; /* Length of this struct */
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index db5f240..ab5448b 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -897,6 +897,23 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
 /* Allocate and initialise the acpi info area. */
 mem_hole_populate_ram(ACPI_INFO_PHYSICAL_ADDRESS >> PAGE_SHIFT, 1);
 
+/* If the device model is specified switch to the corresponding tables */
+s = xenstore_read("platform/device-model", "");
+if ( !strncmp(s, "qemu_xen_traditional", 21) )
+{
+config->dsdt_anycpu = dsdt_anycpu;
+config->dsdt_anycpu_len = dsdt_anycpu_len;
+config->dsdt_15cpu = dsdt_15cpu;
+config->dsdt_15cpu_len = dsdt_15cpu_len;
+}
+else if ( !strncmp(s, "qemu_xen", 9) )
+{
+config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
+config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
+config->dsdt_15cpu = NULL;
+config->dsdt_15cpu_len = 0;
+}
+
 config->lapic_base_address = LAPIC_BASE_ADDRESS;
 config->lapic_id = acpi_lapic_id;
 config->ioapic_base_address = ioapic_base_address;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 6062f0b..874916c 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -276,6 +276,9 @@ extern struct e820map memory_map;
 bool check_overlap(uint64_t start, uint64_t size,
uint64_t reserved_start, uint64_t reserved_size);
 
+extern unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
+extern int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
+
 struct acpi_config;
 void hvmloader_acpi_build_tables(struct acpi_config *config,
  unsigned int physical);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1158303..1d24209 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -451,7 +451,7 @@ int libxl__domain_build(libxl__gc *gc,
 vments[4] = "start_ti

Re: [Xen-devel] [PATCH] acpi: set correct address of the control/event blocks in the FADT

2017-08-29 Thread Igor Druzhinin


On 29/08/17 14:42, Jan Beulich wrote:
>>>> On 29.08.17 at 15:24, <andrew.coop...@citrix.com> wrote:
>> On 29/08/17 09:50, Roger Pau Monne wrote:
>>> Commit 149c6b unmasked an issue long present in Xen: the control/event
>>> block addresses provided in the ACPI FADT table were hardcoded to the
>>> V1 version. This was papered over because hvmloader would also always
>>> set HVM_PARAM_ACPI_IOPORTS_LOCATION to 1 regardless of the BIOS
>>> version.
>>>
>>> Fix this by passing the address of the control/event blocks to
>>> acpi_build_tables, so the values can be properly set in the FADT
>>> table provided to the guest.
>>>
>>> Signed-off-by: Roger Pau Monné <roger@citrix.com>
>>> ---
>>> Cc: Igor Druzhinin <igor.druzhi...@citrix.com>
>>> Cc: Jan Beulich <jbeul...@suse.com>
>>> Cc: Andrew Cooper <andrew.coop...@citrix.com>
>>> Cc: Ian Jackson <ian.jack...@eu.citrix.com>
>>> Cc: Wei Liu <wei.l...@citrix.com>
>>> ---
>>> This commit should fix the qemu-trad Windows errors seen by osstest.
>>
>> This changes windows behaviour, but does not fix windows.  Windows now
>> boots, but waits forever while trying to reboot after installing PV
>> drivers.  There is no hint in the qemu log that the ACPI shutdown event
>> was received.
> 
> Sounds to me like matching exactly the question I've raised: It
> would help to understand why things have worked originally.
> While PM1a/b are generally meant to help split brain environments
> like our Xen/qemu one, iirc we don't make use of PM1b, and hence
> it seems quite likely that both Xen and qemu monitor PM1a port
> accesses. If that's the case and things have worked before, it's
> quite possible that qemu-trad is now servicing the wrong port.
> 

That's what actually happens. It seems that modern versions of qemu-trad
service the V1 port location. I was initially confused by comments in Xen
suggesting that V0 is the right choice for qemu-trad. Seems they need to
be updated or removed.

I'll prepare an updated version of the fix today. And it looks like we
don't need Roger's fix in that case.

>> Unless someone has some very quick clever ideas, the original fix will
>> need reverting.
> 
> I agree.
> 
> Jan
> 

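The V0/V1 distinction in the thread above refers to the two PM1a register layouts Xen can advertise via HVM_PARAM_ACPI_IOPORTS_LOCATION. A rough sketch of the relationship, assuming the traditional (V0) event block at 0x1f40 and the V1 block at 0xb000 — the authoritative constants are the ACPI_PM1A_*_V0/_V1 definitions in xen/include/public/hvm/ioreq.h and should be taken from there:

```c
#include <stdint.h>

/* Assumed port bases for this sketch; see xen/include/public/hvm/ioreq.h
 * for the real ACPI_PM1A_EVT_BLK_ADDRESS_V0/_V1 values. */
#define PM1A_EVT_BLK_V0 0x1f40u   /* version 0: traditional Xen location */
#define PM1A_EVT_BLK_V1 0xb000u   /* version 1: qemu-xen location        */

/* In both layouts the PM1a control block sits 4 bytes above the event
 * block; the FADT must advertise addresses consistent with whichever
 * version HVM_PARAM_ACPI_IOPORTS_LOCATION selects, or the device model
 * ends up watching a port the guest never touches. */
uint16_t pm1a_cnt_blk_address(unsigned int version)
{
    uint16_t evt = version ? PM1A_EVT_BLK_V1 : PM1A_EVT_BLK_V0;
    return evt + 4;
}
```

The Windows reboot hang described above is exactly the failure mode this mismatch produces: the guest writes the ACPI shutdown event to the port the FADT advertised, while qemu listens on the other layout.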


Re: [Xen-devel] [PATCH] acpi: set correct address of the control/event blocks in the FADT

2017-08-29 Thread Igor Druzhinin
On 29/08/17 14:51, Wei Liu wrote:
> On Tue, Aug 29, 2017 at 02:37:50PM +0100, Igor Druzhinin wrote:
>> On 29/08/17 14:33, Wei Liu wrote:
>>> On Tue, Aug 29, 2017 at 02:24:49PM +0100, Andrew Cooper wrote:
>>>> On 29/08/17 09:50, Roger Pau Monne wrote:
>>>>> Commit 149c6b unmasked an issue long present in Xen: the control/event
>>>>> block addresses provided in the ACPI FADT table where hardcoded to the
>>>>> V1 version. This was papered over because hvmloader would also always
>>>>> set HVM_PARAM_ACPI_IOPORTS_LOCATION to 1 regardless of the BIOS
>>>>> version.
>>>>>
>>>>> Fix this by passing the address of the control/event blocks to
>>>>> acpi_build_tables, so the values can be properly set in the FADT
>>>>> table provided to the guest.
>>>>>
>>>>> Signed-off-by: Roger Pau Monné <roger@citrix.com>
>>>>> ---
>>>>> Cc: Igor Druzhinin <igor.druzhi...@citrix.com>
>>>>> Cc: Jan Beulich <jbeul...@suse.com>
>>>>> Cc: Andrew Cooper <andrew.coop...@citrix.com>
>>>>> Cc: Ian Jackson <ian.jack...@eu.citrix.com>
>>>>> Cc: Wei Liu <wei.l...@citrix.com>
>>>>> ---
>>>>> This commit should fix the qemu-trad Windows errors seen by osstest.
>>>>
>>>> This changes windows behaviour, but does not fix windows.  Windows now
>>>> boots, but waits forever while trying to reboot after installing PV
>>>> drivers.  There is no hint in the qemu log that the ACPI shutdown event
>>>> was received.
>>>>
>>>> Unless someone has some very quick clever ideas, the original fix will
>>>> need reverting.
>>>
>>> If I don't get a new fix by the end of today I'm going to revert Igor's
>>> patch (but keep Roger's patch in tree).
>>>
>>
>> I guess the easiest way to overcome it would be to set "qemu-xen" as a
>> device-model in libxl unconditionally.
> 
> I don't think that's right because libxl does support both qemu-xen and
> qemu-trad. The value written in xenstore should reflect the reality.
> 

In that case, probably worth reverting until we figure out why setting
the right port location causes such an effect.



Re: [Xen-devel] [PATCH] acpi: set correct address of the control/event blocks in the FADT

2017-08-29 Thread Igor Druzhinin
On 29/08/17 14:33, Wei Liu wrote:
> On Tue, Aug 29, 2017 at 02:24:49PM +0100, Andrew Cooper wrote:
>> On 29/08/17 09:50, Roger Pau Monne wrote:
>>> Commit 149c6b unmasked an issue long present in Xen: the control/event
>>> block addresses provided in the ACPI FADT table where hardcoded to the
>>> V1 version. This was papered over because hvmloader would also always
>>> set HVM_PARAM_ACPI_IOPORTS_LOCATION to 1 regardless of the BIOS
>>> version.
>>>
>>> Fix this by passing the address of the control/event blocks to
>>> acpi_build_tables, so the values can be properly set in the FADT
>>> table provided to the guest.
>>>
>>> Signed-off-by: Roger Pau Monné <roger@citrix.com>
>>> ---
>>> Cc: Igor Druzhinin <igor.druzhi...@citrix.com>
>>> Cc: Jan Beulich <jbeul...@suse.com>
>>> Cc: Andrew Cooper <andrew.coop...@citrix.com>
>>> Cc: Ian Jackson <ian.jack...@eu.citrix.com>
>>> Cc: Wei Liu <wei.l...@citrix.com>
>>> ---
>>> This commit should fix the qemu-trad Windows errors seen by osstest.
>>
>> This changes windows behaviour, but does not fix windows.  Windows now
>> boots, but waits forever while trying to reboot after installing PV
>> drivers.  There is no hint in the qemu log that the ACPI shutdown event
>> was received.
>>
>> Unless someone has some very quick clever ideas, the original fix will
>> need reverting.
> 
> If I don't get a new fix by the end of today I'm going to revert Igor's
> patch (but keep Roger's patch in tree).
> 

I guess the easiest way to overcome it would be to set "qemu-xen" as a
device-model in libxl unconditionally.

Igor



[Xen-devel] [PATCH v3] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-08-17 Thread Igor Druzhinin
We need to choose ACPI tables and ACPI IO port location
properly depending on the device model version we are running.
Previously, this decision was made by BIOS type specific
code in hvmloader, e.g. always load QEMU traditional specific
tables if it's ROMBIOS and always load QEMU Xen specific
tables if it's SeaBIOS.

This change saves this behavior (for compatibility) but adds
an additional way (xenstore key) to specify the correct
device model if we happen to run a non-default one. Toolstack
bit makes use of it.

The enforcement of BIOS type depending on QEMU version will
be lifted later when the rest of ROMBIOS compatibility fixes
are in place.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
---
Changes in v3:
* move ACPI table externs into util.h

Changes in v2:
* fix insufficient allocation size of localent
---
 tools/firmware/hvmloader/hvmloader.c |  2 --
 tools/firmware/hvmloader/ovmf.c  |  5 ++---
 tools/firmware/hvmloader/rombios.c   |  5 ++---
 tools/firmware/hvmloader/seabios.c   |  6 +++---
 tools/firmware/hvmloader/util.c  | 21 +
 tools/firmware/hvmloader/util.h  |  3 +++
 tools/libxl/libxl_create.c   |  4 +++-
 7 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index f603f68..db11ab1 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -405,8 +405,6 @@ int main(void)
 }
 
 acpi_enable_sci();
-
-hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
 }
 
 init_vm86_tss();
diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
index 4ff7f1d..17bd0fe 100644
--- a/tools/firmware/hvmloader/ovmf.c
+++ b/tools/firmware/hvmloader/ovmf.c
@@ -41,9 +41,6 @@
 #define LOWCHUNK_MAXOFFSET  0x
 #define OVMF_INFO_PHYSICAL_ADDRESS 0x1000
 
-extern unsigned char dsdt_anycpu_qemu_xen[];
-extern int dsdt_anycpu_qemu_xen_len;
-
 #define OVMF_INFO_MAX_TABLES 4
 struct ovmf_info {
 char signature[14]; /* XenHVMOVMF\0\0\0\0 */
@@ -127,6 +124,8 @@ static void ovmf_acpi_build_tables(void)
 .dsdt_15cpu_len = 0
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+
hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
 }
 
diff --git a/tools/firmware/hvmloader/rombios.c b/tools/firmware/hvmloader/rombios.c
index 56b39b7..b14d1f2 100644
--- a/tools/firmware/hvmloader/rombios.c
+++ b/tools/firmware/hvmloader/rombios.c
@@ -42,9 +42,6 @@
 #define ROMBIOS_MAXOFFSET  0x
 #define ROMBIOS_END(ROMBIOS_BEGIN + ROMBIOS_SIZE)
 
-extern unsigned char dsdt_anycpu[], dsdt_15cpu[];
-extern int dsdt_anycpu_len, dsdt_15cpu_len;
-
 static void rombios_setup_e820(void)
 {
 /*
@@ -181,6 +178,8 @@ static void rombios_acpi_build_tables(void)
 .dsdt_15cpu_len = dsdt_15cpu_len,
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
+
hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
 }
 
diff --git a/tools/firmware/hvmloader/seabios.c b/tools/firmware/hvmloader/seabios.c
index 870576a..c8792cd 100644
--- a/tools/firmware/hvmloader/seabios.c
+++ b/tools/firmware/hvmloader/seabios.c
@@ -28,9 +28,7 @@
 
 #include 
 #include 
-
-extern unsigned char dsdt_anycpu_qemu_xen[];
-extern int dsdt_anycpu_qemu_xen_len;
+#include 
 
 struct seabios_info {
 char signature[14]; /* XenHVMSeaBIOS\0 */
@@ -99,6 +97,8 @@ static void seabios_acpi_build_tables(void)
 .dsdt_15cpu_len = 0,
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+
hvmloader_acpi_build_tables(&config, rsdp);
 add_table(rsdp);
 }
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index db5f240..934b566 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -897,6 +897,27 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
 /* Allocate and initialise the acpi info area. */
 mem_hole_populate_ram(ACPI_INFO_PHYSICAL_ADDRESS >> PAGE_SHIFT, 1);
 
+/* If the device model is specified switch to the corresponding tables */
+s = xenstore_read("platform/device-model", "");
+if ( !strncmp(s, "qemu_xen_traditional", 21) )
+{
+config->dsdt_anycpu = dsdt_anycpu;
+config->dsdt_anycpu_len = dsdt_anycpu_len;
+config->dsdt_15cpu = dsdt_15cpu;
+config->dsdt_15cpu_len = dsdt_15cpu_len;
+
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
+}
+else if ( !strncmp(s, "qemu_xen", 9) )
+{
+config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
+config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
+config->dsdt_15cpu = NULL;
+config->dsdt_15cpu_len = 0;
+
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+}
+
 config->lapic_

Re: [Xen-devel] [PATCH v2] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-07-26 Thread Igor Druzhinin
On 26/07/17 14:30, Roger Pau Monné wrote:
> On Wed, Jul 26, 2017 at 02:21:49PM +0100, Igor Druzhinin wrote:
>> On 26/07/17 14:06, Roger Pau Monné wrote:
>>> On Wed, Jul 26, 2017 at 11:56:55AM +0100, Igor Druzhinin wrote:
>>>> On 26/07/17 08:31, Roger Pau Monné wrote:
>>>>> On Tue, Jul 25, 2017 at 08:55:30PM +0100, Igor Druzhinin wrote:
>>>>>> We need to choose ACPI tables and ACPI IO port location
>>>>>> properly depending on the device model version we are running.
>>>>>> Previously, this decision was made by BIOS type specific
>>>>>> code in hvmloader, e.g. always load QEMU traditional specific
>>>>>> tables if it's ROMBIOS and always load QEMU Xen specific
>>>>>> tables if it's SeaBIOS.
>>>>>>
>>>>>> This change saves this behavior but adds an additional way
>>>>>> (xenstore key) to specify the correct device model if we
>>>>>> happen to run a non-default one. Toolstack bit makes use of it.
>>>>>
>>>>> Should there also be a change to libxl to allow selecting rombios
>>>>> with qemu-xen or seabios with qemu-trad?
>>>>>
>>>>
>>>> It's already there (see libxl__domain_build_info_setdefault()).
>>>
>>> Current code in libxl__domain_build_info_setdefault will prevent you
>>> from selecting qemu-xen and rombios or qemu-trad and seabios (grep
>>> for "Enforce BIOS<->Device Model version relationship"), hence me
>>> asking if this should be lifted, so the new combinations that this
>>> patch seems to allow are available from libxl/xl.
>>>
>>> Roger.
>>>
>>
>> Yes, you're right. I think we need to change them to warnings rather
>> than errors. For instance, ROMBIOS is perfectly compatible with QEMU-Xen
>> with some small modifications so there is no need for enforcement.
> 
> Are those small modifications upstream in our tree(s)?
> 

Some of them are in review. I probably need to find some time to finish
them.

Igor

>> What
>> do you think?
> 
> I think adding a warning but allowing them should be fine, or else the
> changes made by this patch are mostly meaningless to libxl/xl.
> 
> Roger.
> 



Re: [Xen-devel] [PATCH v2] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-07-26 Thread Igor Druzhinin
On 26/07/17 14:06, Roger Pau Monné wrote:
> On Wed, Jul 26, 2017 at 11:56:55AM +0100, Igor Druzhinin wrote:
>> On 26/07/17 08:31, Roger Pau Monné wrote:
>>> On Tue, Jul 25, 2017 at 08:55:30PM +0100, Igor Druzhinin wrote:
>>>> We need to choose ACPI tables and ACPI IO port location
>>>> properly depending on the device model version we are running.
>>>> Previously, this decision was made by BIOS type specific
>>>> code in hvmloader, e.g. always load QEMU traditional specific
>>>> tables if it's ROMBIOS and always load QEMU Xen specific
>>>> tables if it's SeaBIOS.
>>>>
>>>> This change saves this behavior but adds an additional way
>>>> (xenstore key) to specify the correct device model if we
>>>> happen to run a non-default one. Toolstack bit makes use of it.
>>>
>>> Should there also be a change to libxl to allow selecting rombios
>>> with qemu-xen or seabios with qemu-trad?
>>>
>>
>> It's already there (see libxl__domain_build_info_setdefault()).
> 
> Current code in libxl__domain_build_info_setdefault will prevent you
> from selecting qemu-xen and rombios or qemu-trad and seabios (grep
> for "Enforce BIOS<->Device Model version relationship"), hence me
> asking if this should be lifted, so the new combinations that this
> patch seems to allow are available from libxl/xl.
> 
> Roger.
> 

Yes, you're right. I think we need to change them to warnings rather
than errors. For instance, ROMBIOS is perfectly compatible with QEMU-Xen
with some small modifications so there is no need for enforcement. What
do you think?

Igor



Re: [Xen-devel] [PATCH v2] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-07-26 Thread Igor Druzhinin
On 26/07/17 08:31, Roger Pau Monné wrote:
> On Tue, Jul 25, 2017 at 08:55:30PM +0100, Igor Druzhinin wrote:
>> We need to choose ACPI tables and ACPI IO port location
>> properly depending on the device model version we are running.
>> Previously, this decision was made by BIOS type specific
>> code in hvmloader, e.g. always load QEMU traditional specific
>> tables if it's ROMBIOS and always load QEMU Xen specific
>> tables if it's SeaBIOS.
>>
>> This change saves this behavior but adds an additional way
>> (xenstore key) to specify the correct device model if we
>> happen to run a non-default one. Toolstack bit makes use of it.
> 
> Should there also be a change to libxl to allow selecting rombios
> with qemu-xen or seabios with qemu-trad?
> 

It's already there (see libxl__domain_build_info_setdefault()).

>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>> Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
>> ---
>> Changes in v2:
>> * fix insufficient allocation size of localent
>> ---
>>  tools/firmware/hvmloader/hvmloader.c |  2 --
>>  tools/firmware/hvmloader/ovmf.c  |  2 ++
>>  tools/firmware/hvmloader/rombios.c   |  2 ++
>>  tools/firmware/hvmloader/seabios.c   |  3 +++
>>  tools/firmware/hvmloader/util.c  | 24 
>>  tools/libxl/libxl_create.c   |  4 +++-
>>  6 files changed, 34 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
>> index f603f68..db11ab1 100644
>> --- a/tools/firmware/hvmloader/hvmloader.c
>> +++ b/tools/firmware/hvmloader/hvmloader.c
>> @@ -405,8 +405,6 @@ int main(void)
>>  }
>>  
>>  acpi_enable_sci();
>> -
>> -hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
>>  }
>>  
>>  init_vm86_tss();
>> diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
>> index 4ff7f1d..ebadc64 100644
>> --- a/tools/firmware/hvmloader/ovmf.c
>> +++ b/tools/firmware/hvmloader/ovmf.c
>> @@ -127,6 +127,8 @@ static void ovmf_acpi_build_tables(void)
>>  .dsdt_15cpu_len = 0
>>  };
>>  
>> +hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> 
> This 1/0 seems very opaque, we should have a proper define for it in
> param.h (not that you should fix it).
> 
>> +
>>  hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
>>  }
>>  
>> diff --git a/tools/firmware/hvmloader/rombios.c b/tools/firmware/hvmloader/rombios.c
>> index 56b39b7..31a7c65 100644
>> --- a/tools/firmware/hvmloader/rombios.c
>> +++ b/tools/firmware/hvmloader/rombios.c
>> @@ -181,6 +181,8 @@ static void rombios_acpi_build_tables(void)
>>  .dsdt_15cpu_len = dsdt_15cpu_len,
>>  };
>>  
>> +hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
>> +
>>  hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
>>  }
>>  
>> diff --git a/tools/firmware/hvmloader/seabios.c b/tools/firmware/hvmloader/seabios.c
>> index 870576a..5878eff 100644
>> --- a/tools/firmware/hvmloader/seabios.c
>> +++ b/tools/firmware/hvmloader/seabios.c
>> @@ -28,6 +28,7 @@
>>  
>>  #include 
>>  #include 
>> +#include 
>>  
>>  extern unsigned char dsdt_anycpu_qemu_xen[];
>>  extern int dsdt_anycpu_qemu_xen_len;
>> @@ -99,6 +100,8 @@ static void seabios_acpi_build_tables(void)
>>  .dsdt_15cpu_len = 0,
>>  };
>>  
>> +hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
>> +
>>  hvmloader_acpi_build_tables(&config, rsdp);
>>  add_table(rsdp);
>>  }
>> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
>> index db5f240..45b777c 100644
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -31,6 +31,9 @@
>>  #include 
>>  #include 
>>  
>> +extern unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
>> +extern int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
> 
> Part of those extern declarations are now present in ovmf.c,
> seabios.c, rombios.c and now also util.c, maybe it would make sense to
> just declare them in util.h?
> 

Makes sense.

>>  /*
>>   * Check whether there exists overlap in the specified memory range.
>>   * Returns true if exists, else returns false.
>> @@ -897,6 +900,27 @@ void hvmloader_acpi_build_tables(struct acpi_config 
>>

[Xen-devel] [PATCH v2] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-07-25 Thread Igor Druzhinin
We need to choose ACPI tables and ACPI IO port location
properly depending on the device model version we are running.
Previously, this decision was made by BIOS type specific
code in hvmloader, e.g. always load QEMU traditional specific
tables if it's ROMBIOS and always load QEMU Xen specific
tables if it's SeaBIOS.

This change saves this behavior but adds an additional way
(xenstore key) to specify the correct device model if we
happen to run a non-default one. Toolstack bit makes use of it.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
---
Changes in v2:
* fix insufficient allocation size of localent
---
 tools/firmware/hvmloader/hvmloader.c |  2 --
 tools/firmware/hvmloader/ovmf.c  |  2 ++
 tools/firmware/hvmloader/rombios.c   |  2 ++
 tools/firmware/hvmloader/seabios.c   |  3 +++
 tools/firmware/hvmloader/util.c  | 24 
 tools/libxl/libxl_create.c   |  4 +++-
 6 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index f603f68..db11ab1 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -405,8 +405,6 @@ int main(void)
 }
 
 acpi_enable_sci();
-
-hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
 }
 
 init_vm86_tss();
diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
index 4ff7f1d..ebadc64 100644
--- a/tools/firmware/hvmloader/ovmf.c
+++ b/tools/firmware/hvmloader/ovmf.c
@@ -127,6 +127,8 @@ static void ovmf_acpi_build_tables(void)
 .dsdt_15cpu_len = 0
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+
hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
 }
 
diff --git a/tools/firmware/hvmloader/rombios.c b/tools/firmware/hvmloader/rombios.c
index 56b39b7..31a7c65 100644
--- a/tools/firmware/hvmloader/rombios.c
+++ b/tools/firmware/hvmloader/rombios.c
@@ -181,6 +181,8 @@ static void rombios_acpi_build_tables(void)
 .dsdt_15cpu_len = dsdt_15cpu_len,
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
+
hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
 }
 
diff --git a/tools/firmware/hvmloader/seabios.c b/tools/firmware/hvmloader/seabios.c
index 870576a..5878eff 100644
--- a/tools/firmware/hvmloader/seabios.c
+++ b/tools/firmware/hvmloader/seabios.c
@@ -28,6 +28,7 @@
 
 #include 
 #include 
+#include 
 
 extern unsigned char dsdt_anycpu_qemu_xen[];
 extern int dsdt_anycpu_qemu_xen_len;
@@ -99,6 +100,8 @@ static void seabios_acpi_build_tables(void)
 .dsdt_15cpu_len = 0,
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+
hvmloader_acpi_build_tables(&config, rsdp);
 add_table(rsdp);
 }
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index db5f240..45b777c 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -31,6 +31,9 @@
 #include 
 #include 
 
+extern unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
+extern int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
+
 /*
  * Check whether there exists overlap in the specified memory range.
  * Returns true if exists, else returns false.
@@ -897,6 +900,27 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
 /* Allocate and initialise the acpi info area. */
 mem_hole_populate_ram(ACPI_INFO_PHYSICAL_ADDRESS >> PAGE_SHIFT, 1);
 
+/* If the device model is specified switch to the corresponding tables */
+s = xenstore_read("platform/device-model", "");
+if ( !strncmp(s, "qemu_xen_traditional", 21) )
+{
+config->dsdt_anycpu = dsdt_anycpu;
+config->dsdt_anycpu_len = dsdt_anycpu_len;
+config->dsdt_15cpu = dsdt_15cpu;
+config->dsdt_15cpu_len = dsdt_15cpu_len;
+
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
+}
+else if ( !strncmp(s, "qemu_xen", 9) )
+{
+config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
+config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
+config->dsdt_15cpu = NULL;
+config->dsdt_15cpu_len = 0;
+
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+}
+
 config->lapic_base_address = LAPIC_BASE_ADDRESS;
 config->lapic_id = acpi_lapic_id;
 config->ioapic_base_address = ioapic_base_address;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1158303..1d24209 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -451,7 +451,7 @@ int libxl__domain_build(libxl__gc *gc,
 vments[4] = "start_time";
 vments[5] = GCSPRINTF("%lu.%02d", 
start_time.tv_sec,(int)start_time.tv_usec/1);
 
-localents = libxl__calloc(gc, 11, sizeof(char *));
+localents = libxl__

Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu

2017-07-25 Thread Igor Druzhinin
On 25/07/17 17:40, Alexey G wrote:
> On Mon, 24 Jul 2017 21:39:08 +0100
> Igor Druzhinin <igor.druzhi...@citrix.com> wrote:
>>> But, the problem is that overall MMIO hole(s) requirements are not known
>>> exactly at the time the HVM domain being created. Some PCI devices will
>>> be emulated, some will be merely passed through and yet there will be
>>> some RMRR ranges. libxl can't know all this stuff - some comes from the
>>> host, some comes from DM. So actual MMIO requirements are known to
>>> hvmloader at the PCI bus enumeration time.
>>>   
>>
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said it's much easier to provision the hole
>> statically in libxl during domain construction process and it doesn't
>> really compromise any functionality. Having one more entity responsible
>> for guest memory layout only makes things more convoluted.
> 
> If moving most tasks of hvmloader to libxl is a planned feature in Citrix,
> please let it be discussed on xen-devel first as it may affect many
> people... and not all of them might be happy. :)
> 

Everything always goes through the mailing list.

> (tons of IMO and TLDR ahead, be warned)
> 
> Moving PCI BAR allocation from guest side to libxl is a controversial step.
> This may be the architecturally wrong way in fact. There are properties and
> areas of responsibility. Among primary responsibilities of guest's firmware
> is PCI BARs and MMIO hole size allocation. That's a guest's territory.
> Guest relocates PCI BARs (and not just BIOS able to do this), guest
> firmware relocates MMIO hole base for them. If it was a real system, all
> tasks like PCI BAR allocation, remapping part of RAM above 4G etc were done
> by system BIOS. In our case some of SeaBIOS/OVMF responsibilities were
> offloaded to hvmloader, like PCI BARs allocation, sizing MMIO hole(s) for
> them and generating ACPI tables. And that's ok as hvmloader can be
> considered merely a 'supplemental' firmware to perform some tasks of
> SeaBIOS/OVMF before passing control to them. This solution has some
> architecture logic at least and doesn't look bad.
> 

libxl is also a part of firmware so to speak. It's incorrect to think
that only hvmloader and BIOS images are "proper" firmware.

> On other hand, moving PCI hole calculation to libxl just to let Xen/libxl
> know what the MMIO size value is might be a bad idea.
> Aside from some code duplication, straying too far from the real hw paths,
> or breaking existing (or future) interfaces this might have some other
> negative consequences. Ex. who will be initializing guest's ACPI tables if
> only libxl will know the memory layout? Some new interfaces between libxl
> and hvmloader just to let the latter know what values to write to ACPI
> tables being created? Or libxl will be initializing guest's ACPI tables as
> well (another guest's internal task)? Similar concerns are applicable to
> guest's final E820 construction.
> 

The information is not confined to libxl - it's passed to hvmloader,
which can finish the tasks libxl couldn't. Although ACPI tables could be
harmlessly initialized inside libxl as well (see the PVH implementation).

> Another thing is that handling ioreq/PT MMIO ranges is somewhat a property
> of the device model (at least for now). Right now it's QEMU who traps PCI
> BAR accesses and tells Xen how to handle specific ranges of MMIO space. If
> QEMU already talks to Xen which ranges should be passed through or trapped
> -- it can tell him the current overall MMIO limits as well... or handle
> these limits himself -- if the MMIO hole range check is all what required to
> avoid MMIO space misusing, this check can be easily implemented in QEMU,
> provided that QEMU knows what memory/MMIO layout is. There is a lot of
> implementation freedom where to place restrictions and checks, Xen or QEMU.
> Strictly speaking, the MMIO hole itself can be considered a property of the
> emulated machine and may have implementation differences for different
> emulated chipsets. For example, the real i440' NB do not have an idea of
> high MMIO hole at all.
> 
> We have already a sort of an interface between hvmloader and QEMU --
> hvmloader has to do basic initialization for some emulated chipset's
> registers (and this depends on the machine). Providing additional handling
> for few other registers (TOM/TOLUD/etc) will cost almost nothing and
> purpose of this registers will actually match their usage in real HW. This
> way we can use an existing available interface and don't stray too far from
> the real HW ways. 
> 
> I want to try this approach for Q35 bringup patches for Xen I'm currently
> working on. I'll send

Re: [Xen-devel] [PATCH] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-07-25 Thread Igor Druzhinin
On 25/07/17 15:33, Wei Liu wrote:
> On Wed, Jul 19, 2017 at 10:19:35PM +0100, Igor Druzhinin wrote:
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index 1158303..8dc8186 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -472,6 +472,8 @@ int libxl__domain_build(libxl__gc *gc,
>> info->u.hvm.mmio_hole_memkb << 10);
>>  }
>>  }
>> +localents[i++] = "platform/device-model";
>> +localents[i++] = (char *) 
>> libxl_device_model_version_to_string(info->device_model_version);
> 
> You probably want to enlarge localents array so that it can accommodate
> the new elements.
> 

Good catch, for some reason I thought it was already preallocated big
enough. It looks like somebody before me made the same mistake, as the
array is only just big enough to hold the existing items.

Igor

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu

2017-07-25 Thread Igor Druzhinin
On 25/07/17 08:03, Zhang, Xiong Y wrote:
>> On 24/07/17 17:42, Alexey G wrote:
>>> Hi,
>>>
>>> On Mon, 24 Jul 2017 10:53:16 +0100
>>> Igor Druzhinin <igor.druzhi...@citrix.com> wrote:
>>>>> [Zhang, Xiong Y] Thanks for your suggestion.
>>>>> Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
>>>>> For this I still have two questions, could you help me ?
>>>>> 1) If hvmloader do low memory relocation, hvmloader and qemu will see a
>>>>> different guest memory layout . So qemu ram maybe overlop with mmio,
>>>>> does xen have plan to fix this ?
>>>>
>>>> hvmloader doesn't do memory relocation - this ability is turned off by
>>>> default. The reason for the issue is that libxl initially sets the size
>>>> of lower MMIO hole (based on the RMRR regions present and their size)
>>>> and doesn't communicate it to QEMU using 'max-ram-below-4g' argument.
>>>>
>>>> When you set 'mmio_hole' size parameter you basically forces libxl to
>>>> pass this argument to QEMU.
>>>>
>>>> That means the proper fix would be to make libxl to pass this argument
>>>> to QEMU in case there are RMRR regions present.
>>>
>>> I tend to disagree a bit.
>>> What we lack actually is some way to perform a 'dynamical' physmem
>>> relocation, when a guest domain is running already. Right now it works only
>>> in the 'static' way - i.e. if memory layout was known for both QEMU and
>>> hvmloader before starting a guest domain and with no means of arbitrarily
>>> changing this layout at runtime when hvmloader runs.
>>>
>>> But, the problem is that overall MMIO hole(s) requirements are not known
>>> exactly at the time the HVM domain being created. Some PCI devices will be
>>> emulated, some will be merely passed through and yet there will be some
>>> RMRR ranges. libxl can't know all this stuff - some comes from the host,
>>> some comes from DM. So actual MMIO requirements are known to
>> hvmloader at
>>> the PCI bus enumeration time.
>>>
>>
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said it's much easier to provision the hole
>> statically in libxl during domain construction process and it doesn't
>> really compromise any functionality. Having one more entity responsible
>> for guest memory layout only makes things more convoluted.
>>
>>> libxl can be taught to retrieve all missing info from QEMU, but this way
>>> will require to perform all grunt work of PCI BARs allocation in libxl
>>> itself - in order to calculate the real MMIO hole(s) size, one needs to
>>> take into account all PCI BARs sizes and their alignment requirements
>>> diversity + existing gaps due to RMRR ranges... basically, libxl will
>>> need to do most of hvmloader/pci.c's job.
>>>
>>
>> The algorithm implemented in hvmloader for that is not complicated and
>> can be moved to libxl easily. What we can do is to provision a hole big
>> enough to include all the initially assigned PCI devices. We can also
>> account for emulated MMIO regions if necessary. But, to be honest, it
>> doesn't really matter since if there is no enough space in lower MMIO
>> hole for some BARs - they can be easily relocated to upper MMIO
>> hole by hvmloader or the guest itself (dynamically).
>>
>> Igor
> [Zhang, Xiong Y] yes, If we could supply a big enough mmio hole and don't 
> allow hvmloader to do relocate, things will be easier. But how could we 
> supply a big enough mmio hole ?
> a. statical set base address of mmio hole to 2G/3G.
> b. Like hvmloader to probe all the pci devices and calculate mmio size. But 
> this runs prior to qemu, how to probe pci devices ? 
> 

It's true that we don't know the space occupied by emulated devices
before QEMU is started.  But QEMU needs to be started with some lower
MMIO hole size statically assigned.

One of the possible solutions is to calculate a hole size large enough to
include all the assigned pass-through devices and round it up to the
nearest GB boundary, but no larger than 2GB total. If that's not enough to
also include all the emulated devices - it's not enough, and some of the
PCI devices are going to be relocated to the upper MMIO hole in that case.

Igor

> thanks
>>> My 2kop opinion here is that we don't need to move all PCI BAR allocation to
>>> libxl, or invent some new QMP-interfaces, or introduce new hypercalls or
>>> else. A simple and somewhat good solution woul

Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu

2017-07-24 Thread Igor Druzhinin
On 24/07/17 17:42, Alexey G wrote:
> Hi,
> 
> On Mon, 24 Jul 2017 10:53:16 +0100
> Igor Druzhinin <igor.druzhi...@citrix.com> wrote:
>>> [Zhang, Xiong Y] Thanks for your suggestion.
>>> Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
>>> For this I still have two questions, could you help me ?
>>> 1) If hvmloader do low memory relocation, hvmloader and qemu will see a
>>> different guest memory layout . So qemu ram maybe overlop with mmio,
>>> does xen have plan to fix this ? 
>>
>> hvmloader doesn't do memory relocation - this ability is turned off by
>> default. The reason for the issue is that libxl initially sets the size
>> of lower MMIO hole (based on the RMRR regions present and their size)
>> and doesn't communicate it to QEMU using 'max-ram-below-4g' argument.
>>
>> When you set 'mmio_hole' size parameter you basically forces libxl to
>> pass this argument to QEMU.
>>
>> That means the proper fix would be to make libxl to pass this argument
>> to QEMU in case there are RMRR regions present.
> 
> I tend to disagree a bit. 
> What we lack actually is some way to perform a 'dynamical' physmem
> relocation, when a guest domain is running already. Right now it works only
> in the 'static' way - i.e. if memory layout was known for both QEMU and
> hvmloader before starting a guest domain and with no means of arbitrarily
> changing this layout at runtime when hvmloader runs.
> 
> But, the problem is that overall MMIO hole(s) requirements are not known
> exactly at the time the HVM domain being created. Some PCI devices will be
> emulated, some will be merely passed through and yet there will be some
> RMRR ranges. libxl can't know all this stuff - some comes from the host,
> some comes from DM. So actual MMIO requirements are known to hvmloader at
> the PCI bus enumeration time.
> 

IMO hvmloader shouldn't really be allowed to relocate memory under any
conditions. As Andrew said it's much easier to provision the hole
statically in libxl during domain construction process and it doesn't
really compromise any functionality. Having one more entity responsible
for guest memory layout only makes things more convoluted.

> libxl can be taught to retrieve all missing info from QEMU, but this way
> will require to perform all grunt work of PCI BARs allocation in libxl
> itself - in order to calculate the real MMIO hole(s) size, one needs to
> take into account all PCI BARs sizes and their alignment requirements
> diversity + existing gaps due to RMRR ranges... basically, libxl will
> need to do most of hvmloader/pci.c's job.
> 

The algorithm implemented in hvmloader for that is not complicated and
can be moved to libxl easily. What we can do is provision a hole big
enough to include all the initially assigned PCI devices. We can also
account for emulated MMIO regions if necessary. But, to be honest, it
doesn't really matter, since if there is not enough space in the lower
MMIO hole for some BARs - they can easily be relocated to the upper MMIO
hole by hvmloader or the guest itself (dynamically).

Igor

> My 2kop opinion here is that we don't need to move all PCI BAR allocation to
> libxl, or invent some new QMP-interfaces, or introduce new hypercalls or
> else. A simple and somewhat good solution would be to implement this missing
> hvmloader <-> QEMU interface in the same manner how it is done in real
> hardware.
> 
> When we move some part of guest memory in 4GB range to address space above
> 4GB via XENMEM_add_to_physmap, we basically perform what chipset's
> 'remap' (aka reclaim) does. So we can implement this interface between
> hvmloader and QEMU via providing custom emulation for MCH's
> remap/TOLUD/TOUUD stuff in QEMU if xen_enabled().
> 
> In this way hvmloader will calculate MMIO hole sizes as usual, relocate
> some guest RAM above 4GB base and communicate this information to QEMU via
> emulated host bridge registers -- so then QEMU will sync its memory layout
> info to actual physmap's.
> 



Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu

2017-07-24 Thread Igor Druzhinin
On 24/07/17 09:07, Zhang, Xiong Y wrote:
>>> On Fri, 21 Jul 2017 10:57:55 +
>>> "Zhang, Xiong Y"  wrote:
>>>
 On an intel skylake machine with upstream qemu, if I add
 "rdm=strategy=host, policy=strict" to hvm.cfg, win 8.1 DomU couldn't
 boot up and continues reboot.

 Steps to reproduce this issue:

 1)   Boot xen with iommu=1 to enable iommu
 2)   hvm.cfg contain:

 builder="hvm"

 memory=

 disk=['win8.1 img']

 device_model_override='qemu-system-i386'

 device_model_version='qemu-xen'

 rdm="strategy=host,policy=strict"

 3)   xl cr hvm.cfg

 Conditions to reproduce this issue:

 1)   DomU memory size > the top address of RMRR. Otherwise, this
 issue will disappear.
 2)   rdm=" strategy=host,policy=strict" should exist
 3)   Windows DomU.  Linux DomU doesn't have such issue.
 4)   Upstream qemu.  Traditional qemu doesn't have such issue.

 In this situation, hvmloader will relocate some guest ram below RMRR to
 high memory, and it seems window guest access an invalid address. Could
 someone give me some suggestions on how to debug this ?
>>>
>>> You're likely have RMRR range(s) below 2GB boundary.
>>>
>>> You may try the following:
>>>
>>> 1. Specify some large 'mmio_hole' value in your domain configuration file,
>>> ex. mmio_hole=2560
>>> 2. If it won't help, 'xl dmesg' output might come useful
>>>
>>> Right now upstream QEMU still doesn't support relocation of parts
>>> of guest RAM to >4GB boundary if they were overlapped by MMIO ranges.
>>> AFAIR forcing allow_memory_relocate to 1 for hvmloader didn't bring
>>> anything good for HVM guest.
>>>
>>> Setting the mmio_hole size manually allows to create a "predefined"
>>> memory/MMIO hole layout for both QEMU (via 'max-ram-below-4g') and
>>> hvmloader (via a XenStore param), effectively avoiding MMIO/RMRR
>> overlaps
>>> or RAM relocation in hvmloader, so this might help.
>>
>> Wrote too soon, "policy=strict" means that you won't be able to create a
>> DomU if RMRR was below 2G... so it's actually should be above 2GB. Anyway,
>> try setting mmio_hole size.
> [Zhang, Xiong Y] Thanks for your suggestion.
> Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
> For this I still have two questions, could you help me ?
> 1) If hvmloader do low memory relocation, hvmloader and qemu will see a 
> different guest memory layout . So qemu ram maybe overlop with mmio, does xen 
> have plan to fix this ?
> 

hvmloader doesn't do memory relocation - this ability is turned off by
default. The reason for the issue is that libxl initially sets the size
of the lower MMIO hole (based on the RMRR regions present and their size)
but doesn't communicate it to QEMU using the 'max-ram-below-4g' argument.

When you set the 'mmio_hole' size parameter you basically force libxl to
pass this argument to QEMU.

That means the proper fix would be to make libxl pass this argument
to QEMU in case there are RMRR regions present.

Igor

> 2) Just now, I did an experiment: In hvmloader, I set HVM_BELOW_4G_RAM_END to 
> 3G and reserve one area for qemu_ram_allocate like 0xF000 ~ 0xFC00; 
> In Qemu, I modified xen_ram_alloc() to make sure it only allocate gfn in 
> 0xF000 ~ 0xFC00. In this case qemu_ram won't overlap with mmio, but 
> this workaround couldn't fix my issue.
>  It seems qemu still has another interface to allocate gfn except 
> xen_ram_alloc(), do you know this interface ?
> 
> thanks
> 



Re: [Xen-devel] [PULL for-2.10 6/7] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-21 Thread Igor Druzhinin

On 21/07/17 14:50, Anthony PERARD wrote:

On Tue, Jul 18, 2017 at 03:22:41PM -0700, Stefano Stabellini wrote:

From: Igor Druzhinin <igor.druzhi...@citrix.com>


...


+static uint8_t *xen_replace_cache_entry_unlocked(hwaddr old_phys_addr,
+ hwaddr new_phys_addr,
+ hwaddr size)
+{
+MapCacheEntry *entry;
+hwaddr address_index, address_offset;
+hwaddr test_bit_size, cache_size = size;
+
+address_index  = old_phys_addr >> MCACHE_BUCKET_SHIFT;
+address_offset = old_phys_addr & (MCACHE_BUCKET_SIZE - 1);
+
+assert(size);
+/* test_bit_size is always a multiple of XC_PAGE_SIZE */
+test_bit_size = size + (old_phys_addr & (XC_PAGE_SIZE - 1));
+if (test_bit_size % XC_PAGE_SIZE) {
+test_bit_size += XC_PAGE_SIZE - (test_bit_size % XC_PAGE_SIZE);
+}
+cache_size = size + address_offset;
+if (cache_size % MCACHE_BUCKET_SIZE) {
+cache_size += MCACHE_BUCKET_SIZE - (cache_size % MCACHE_BUCKET_SIZE);
+}
+
+entry = &mapcache->entry[address_index % mapcache->nr_buckets];
+while (entry && !(entry->paddr_index == address_index &&
+  entry->size == cache_size)) {
+entry = entry->next;
+}
+if (!entry) {
+DPRINTF("Trying to update an entry for %lx " \
+"that is not in the mapcache!\n", old_phys_addr);
+return NULL;
+}
+
+address_index  = new_phys_addr >> MCACHE_BUCKET_SHIFT;
+address_offset = new_phys_addr & (MCACHE_BUCKET_SIZE - 1);
+
+fprintf(stderr, "Replacing a dummy mapcache entry for %lx with %lx\n",
+old_phys_addr, new_phys_addr);


Looks likes this does not build on 32bits.
in: 
http://logs.test-lab.xenproject.org/osstest/logs/112041/build-i386/6.ts-xen-build.log

/home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c:
 In function 'xen_replace_cache_entry_unlocked':
/home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c:539:13:
 error: format '%lx' expects argument of type 'long unsigned int', but argument 
3 has type 'hwaddr' [-Werror=format=]
  old_phys_addr, new_phys_addr);
  ^
/home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/hw/i386/xen/xen-mapcache.c:539:13:
 error: format '%lx' expects argument of type 'long unsigned int', but argument 
4 has type 'hwaddr' [-Werror=format=]
cc1: all warnings being treated as errors
   CC  i386-softmmu/target/i386/gdbstub.o
/home/osstest/build.112041.build-i386/xen/tools/qemu-xen-dir/rules.mak:66: 
recipe for target 'hw/i386/xen/xen-mapcache.o' failed


+
+xen_remap_bucket(entry, entry->vaddr_base,
+ cache_size, address_index, false);
+if (!test_bits(address_offset >> XC_PAGE_SHIFT,
+test_bit_size >> XC_PAGE_SHIFT,
+entry->valid_mapping)) {
+DPRINTF("Unable to update a mapcache entry for %lx!\n", old_phys_addr);
+return NULL;
+}
+
+return entry->vaddr_base + address_offset;
+}
+




Please, accept the attached patch to fix the issue.

Igor
>From 69a3afa453e283e92ddfd76109b203a20a02524c Mon Sep 17 00:00:00 2001
From: Igor Druzhinin <igor.druzhi...@citrix.com>
Date: Fri, 21 Jul 2017 19:27:41 +0100
Subject: [PATCH] xen: fix compilation on 32-bit hosts

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 hw/i386/xen/xen-mapcache.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 84cc4a2..540406a 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -529,7 +529,7 @@ static uint8_t *xen_replace_cache_entry_unlocked(hwaddr old_phys_addr,
 entry = entry->next;
 }
 if (!entry) {
-DPRINTF("Trying to update an entry for %lx " \
+DPRINTF("Trying to update an entry for "TARGET_FMT_plx \
 "that is not in the mapcache!\n", old_phys_addr);
 return NULL;
 }
@@ -537,15 +537,16 @@ static uint8_t *xen_replace_cache_entry_unlocked(hwaddr old_phys_addr,
 address_index  = new_phys_addr >> MCACHE_BUCKET_SHIFT;
 address_offset = new_phys_addr & (MCACHE_BUCKET_SIZE - 1);
 
-fprintf(stderr, "Replacing a dummy mapcache entry for %lx with %lx\n",
-old_phys_addr, new_phys_addr);
+fprintf(stderr, "Replacing a dummy mapcache entry for "TARGET_FMT_plx \
+" with "TARGET_FMT_plx"\n", old_phys_addr, new_phys_addr);
 
 xen_remap_bucket(entry, entry->vaddr_base,
  cache_size, address_index, false);
 if(!test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHI

[Xen-devel] [PATCH] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-07-19 Thread Igor Druzhinin
We need to choose ACPI tables and ACPI IO port location
properly depending on the device model version we are running.
Previously, this decision was made by BIOS type specific
code in hvmloader, e.g. always load QEMU traditional specific
tables if it's ROMBIOS and always load QEMU Xen specific
tables if it's SeaBIOS.

This change saves this behavior but adds an additional way
(xenstore key) to specify the correct device model if we
happen to run a non-default one. Toolstack bit makes use of it.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 tools/firmware/hvmloader/hvmloader.c |  2 --
 tools/firmware/hvmloader/ovmf.c  |  2 ++
 tools/firmware/hvmloader/rombios.c   |  2 ++
 tools/firmware/hvmloader/seabios.c   |  3 +++
 tools/firmware/hvmloader/util.c  | 24 
 tools/libxl/libxl_create.c   |  2 ++
 6 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index f603f68..db11ab1 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -405,8 +405,6 @@ int main(void)
 }
 
 acpi_enable_sci();
-
-hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
 }
 
 init_vm86_tss();
diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
index 4ff7f1d..ebadc64 100644
--- a/tools/firmware/hvmloader/ovmf.c
+++ b/tools/firmware/hvmloader/ovmf.c
@@ -127,6 +127,8 @@ static void ovmf_acpi_build_tables(void)
 .dsdt_15cpu_len = 0
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+
+hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
 }
 
diff --git a/tools/firmware/hvmloader/rombios.c 
b/tools/firmware/hvmloader/rombios.c
index 56b39b7..31a7c65 100644
--- a/tools/firmware/hvmloader/rombios.c
+++ b/tools/firmware/hvmloader/rombios.c
@@ -181,6 +181,8 @@ static void rombios_acpi_build_tables(void)
 .dsdt_15cpu_len = dsdt_15cpu_len,
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
+
+hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
 }
 
diff --git a/tools/firmware/hvmloader/seabios.c 
b/tools/firmware/hvmloader/seabios.c
index 870576a..5878eff 100644
--- a/tools/firmware/hvmloader/seabios.c
+++ b/tools/firmware/hvmloader/seabios.c
@@ -28,6 +28,7 @@
 
 #include 
 #include 
+#include 
 
 extern unsigned char dsdt_anycpu_qemu_xen[];
 extern int dsdt_anycpu_qemu_xen_len;
@@ -99,6 +100,8 @@ static void seabios_acpi_build_tables(void)
 .dsdt_15cpu_len = 0,
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+
+hvmloader_acpi_build_tables(&config, rsdp);
 add_table(rsdp);
 }
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index db5f240..45b777c 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -31,6 +31,9 @@
 #include 
 #include 
 
+extern unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
+extern int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
+
 /*
  * Check whether there exists overlap in the specified memory range.
  * Returns true if exists, else returns false.
@@ -897,6 +900,27 @@ void hvmloader_acpi_build_tables(struct acpi_config 
*config,
 /* Allocate and initialise the acpi info area. */
 mem_hole_populate_ram(ACPI_INFO_PHYSICAL_ADDRESS >> PAGE_SHIFT, 1);
 
+/* If the device model is specified switch to the corresponding tables */
+s = xenstore_read("platform/device-model", "");
+if ( !strncmp(s, "qemu_xen_traditional", 21) )
+{
+config->dsdt_anycpu = dsdt_anycpu;
+config->dsdt_anycpu_len = dsdt_anycpu_len;
+config->dsdt_15cpu = dsdt_15cpu;
+config->dsdt_15cpu_len = dsdt_15cpu_len;
+
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
+}
+else if ( !strncmp(s, "qemu_xen", 10) )
+{
+config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
+config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
+config->dsdt_15cpu = NULL;
+config->dsdt_15cpu_len = 0;
+
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+}
+
 config->lapic_base_address = LAPIC_BASE_ADDRESS;
 config->lapic_id = acpi_lapic_id;
 config->ioapic_base_address = ioapic_base_address;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1158303..8dc8186 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -472,6 +472,8 @@ int libxl__domain_build(libxl__gc *gc,
info->u.hvm.mmio_hole_memkb << 10);
 }
 }
+localents[i++] = "platform/device-model";
+localents[i++] = (char *) 
libxl_device_model_version_to_string(info->device_model_version);
 
 break;
 case LIBXL_DOMAIN_TYPE_PV:
-- 
2.7.4




[Xen-devel] [PATCH v3 0/4] xen: don't save/restore the physmap on VM save/restore

2017-07-10 Thread Igor Druzhinin
Saving/restoring the physmap to/from xenstore was introduced to
QEMU mainly in order to work around the VRAM region restore issue.
The sequence of restore operations implies that we should know
the effective guest VRAM address *before* we have the VRAM region
restored (which happens later). Unfortunately, in a Xen environment
VRAM memory actually belongs to the guest - not to QEMU itself -
which means the position of this region is unknown beforehand and
it can't be mapped into QEMU's address space immediately.

Previously, recreating the xenstore keys holding the physmap in the
toolstack helped to get this information in place at the right
moment, ready to be consumed by QEMU to map the region properly.
But using xenstore for it has certain disadvantages: the toolstack
needs to be aware of these keys and save/restore them accordingly,
and accessing xenstore requires extra privileges, which hinders QEMU
sandboxing.

The previous attempt to get rid of that was to remember all the
VRAM pointers during QEMU initialization phase and then update
them all at once when an actual foreign mapping is established.
Unfortunately, this approach worked only for VRAM and only for
a predefined set of devices - stdvga and cirrus. QXL and other
possible future devices using a moving emulated MMIO region
would be equally broken.

The new approach leverages the xenforeignmemory_map2() call recently
introduced in libxenforeignmemory. It allows us to create a dummy
anonymous mapping for QEMU during its initialization and change
it to a real one later during the machine state restore phase.
---
Changes in v3:
* Patch 3: use dummy flag based checks to gate ram_block_notify_* functions
* Patch 3: switch to inline compat function instead of a straight define
* Patch 4: add additional XEN_COMPAT_PHYSMAP blocks

Changed in v2:
* Patch 2: set dummy flag in a new flags field in struct MapCacheEntry
* Patch 3: change xen_remap_cache_entry name and signature
* Patch 3: gate ram_block_notify_* functions in xen_remap_bucket
* Patch 3: rewrite the logic of xen_replace_cache_entry_unlocked to
   reuse the existing entry instead of allocating a new one
* Patch 4: don't use xen_phys_offset_to_gaddr in non-compat mode

---
Igor Druzhinin (4):
  xen: move physmap saving into a separate function
  xen/mapcache: add an ability to create dummy mappings
  xen/mapcache: introduce xen_replace_cache_entry()
  xen: don't use xenstore to save/restore physmap anymore

 configure |  18 +++
 hw/i386/xen/xen-hvm.c | 105 +++-
 hw/i386/xen/xen-mapcache.c| 121 ++
 include/hw/xen/xen_common.h   |  15 ++
 include/sysemu/xen-mapcache.h |  11 +++-
 5 files changed, 222 insertions(+), 48 deletions(-)

-- 
2.7.4




[Xen-devel] [PATCH v3 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-10 Thread Igor Druzhinin
This new call updates a requested map cache entry
according to the changes in the physmap. It searches
for the entry, unmaps it and maps it again at the same place using
the new guest address. If the mapping is dummy this call will
make it real.

This function makes use of a new xenforeignmemory_map2() call
with an extended interface that was recently introduced in
libxenforeignmemory [1].

[1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 configure | 18 +
 hw/i386/xen/xen-mapcache.c| 85 +++
 include/hw/xen/xen_common.h   | 14 +++
 include/sysemu/xen-mapcache.h | 11 +-
 4 files changed, 119 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index c571ad1..ad6156b 100755
--- a/configure
+++ b/configure
@@ -2021,6 +2021,24 @@ EOF
 # Xen unstable
 elif
cat > $TMPC <<EOF &&
#include <xenforeignmemory.h>
+int main(void) {
+  xenforeignmemory_handle *xfmem;
+
+  xfmem = xenforeignmemory_open(0, 0);
+  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
+
+  return 0;
+}
+EOF
+compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
+  then
+  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
+  xen_ctrl_version=41000
+  xen=yes
+elif
+cat > $TMPC <<EOF
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 39cb511..8bc63e0 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -151,6 +151,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 }
 
 static void xen_remap_bucket(MapCacheEntry *entry,
+ void *vaddr,
  hwaddr size,
  hwaddr address_index,
  bool dummy)
@@ -167,7 +168,9 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 err = g_malloc0(nb_pfn * sizeof (int));
 
 if (entry->vaddr_base != NULL) {
-ram_block_notify_remove(entry->vaddr_base, entry->size);
+if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
+ram_block_notify_remove(entry->vaddr_base, entry->size);
+}
 if (munmap(entry->vaddr_base, entry->size) != 0) {
 perror("unmap fails");
 exit(-1);
@@ -181,11 +184,11 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 }
 
 if (!dummy) {
-vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
-   PROT_READ | PROT_WRITE,
+vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid, vaddr,
+   PROT_READ | PROT_WRITE, 0,
nb_pfn, pfns, err);
 if (vaddr_base == NULL) {
-perror("xenforeignmemory_map");
+perror("xenforeignmemory_map2");
 exit(-1);
 }
 } else {
@@ -193,7 +196,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
  * We create dummy mappings where we are unable to create a foreign
  * mapping immediately due to certain circumstances (i.e. on resume 
now)
  */
-vaddr_base = mmap(NULL, size, PROT_READ | PROT_WRITE,
+vaddr_base = mmap(vaddr, size, PROT_READ | PROT_WRITE,
   MAP_ANON | MAP_SHARED, -1, 0);
 if (vaddr_base == NULL) {
 perror("mmap");
@@ -201,6 +204,10 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 }
 }
 
+if (!(entry->flags & XEN_MAPCACHE_ENTRY_DUMMY)) {
+ram_block_notify_add(vaddr_base, size);
+}
+
 entry->vaddr_base = vaddr_base;
 entry->paddr_index = address_index;
 entry->size = size;
@@ -213,7 +220,6 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 entry->flags &= ~(XEN_MAPCACHE_ENTRY_DUMMY);
 }
 
-ram_block_notify_add(entry->vaddr_base, entry->size);
 bitmap_zero(entry->valid_mapping, nb_pfn);
 for (i = 0; i < nb_pfn; i++) {
 if (!err[i]) {
@@ -286,14 +292,14 @@ tryagain:
 if (!entry) {
 entry = g_malloc0(sizeof (MapCacheEntry));
 pentry->next = entry;
-xen_remap_bucket(entry, cache_size, address_index, dummy);
+xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(entry, cache_size, address_index, dummy);
+xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);

[Xen-devel] [PATCH v3 4/4] xen: don't use xenstore to save/restore physmap anymore

2017-07-10 Thread Igor Druzhinin
If we have a system with xenforeignmemory_map2() implemented
we don't need to save/restore physmap on suspend/restore
anymore. In case we resume a VM without physmap - try to
recreate the physmap during memory region restore phase and
remap map cache entries accordingly. The old code is left
for compatibility reasons.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 hw/i386/xen/xen-hvm.c   | 48 ++---
 hw/i386/xen/xen-mapcache.c  |  4 
 include/hw/xen/xen_common.h |  1 +
 3 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index d259cf7..d24ca47 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -289,6 +289,7 @@ static XenPhysmap *get_physmapping(XenIOState *state,
 return NULL;
 }
 
+#ifdef XEN_COMPAT_PHYSMAP
 static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
ram_addr_t size, void *opaque)
 {
@@ -334,6 +335,12 @@ static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
 }
 return 0;
 }
+#else
+static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
+{
+return 0;
+}
+#endif
 
 static int xen_add_to_physmap(XenIOState *state,
   hwaddr start_addr,
@@ -368,6 +375,26 @@ go_physmap:
 DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
 start_addr, start_addr + size);
 
+mr_name = memory_region_name(mr);
+
+physmap = g_malloc(sizeof (XenPhysmap));
+
+physmap->start_addr = start_addr;
+physmap->size = size;
+physmap->name = mr_name;
+physmap->phys_offset = phys_offset;
+
+QLIST_INSERT_HEAD(&state->physmap, physmap, list);
+
+if (runstate_check(RUN_STATE_INMIGRATE)) {
+/* Now when we have a physmap entry we can replace a dummy mapping with
+ * a real one of guest foreign memory. */
+uint8_t *p = xen_replace_cache_entry(phys_offset, start_addr, size);
+assert(p && p == memory_region_get_ram_ptr(mr));
+
+return 0;
+}
+
 pfn = phys_offset >> TARGET_PAGE_BITS;
 start_gpfn = start_addr >> TARGET_PAGE_BITS;
 for (i = 0; i < size >> TARGET_PAGE_BITS; i++) {
@@ -382,17 +409,6 @@ go_physmap:
 }
 }
 
-mr_name = memory_region_name(mr);
-
-physmap = g_malloc(sizeof (XenPhysmap));
-
-physmap->start_addr = start_addr;
-physmap->size = size;
-physmap->name = mr_name;
-physmap->phys_offset = phys_offset;
-
-QLIST_INSERT_HEAD(&state->physmap, physmap, list);
-
 xc_domain_pin_memory_cacheattr(xen_xc, xen_domid,
start_addr >> TARGET_PAGE_BITS,
(start_addr + size - 1) >> TARGET_PAGE_BITS,
@@ -1158,6 +1174,7 @@ static void xen_exit_notifier(Notifier *n, void *data)
 xs_daemon_close(state->xenstore);
 }
 
+#ifdef XEN_COMPAT_PHYSMAP
 static void xen_read_physmap(XenIOState *state)
 {
 XenPhysmap *physmap = NULL;
@@ -1205,6 +1222,11 @@ static void xen_read_physmap(XenIOState *state)
 }
 free(entries);
 }
+#else
+static void xen_read_physmap(XenIOState *state)
+{
+}
+#endif
 
 static void xen_wakeup_notifier(Notifier *notifier, void *data)
 {
@@ -1331,7 +1353,11 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 state->bufioreq_local_port = rc;
 
 /* Init RAM management */
+#ifdef XEN_COMPAT_PHYSMAP
 xen_map_cache_init(xen_phys_offset_to_gaddr, state);
+#else
+xen_map_cache_init(NULL, state);
+#endif
 xen_ram_init(pcms, ram_size, ram_memory);
 
 qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state);
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 8bc63e0..84cc4a2 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -239,7 +239,9 @@ static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
 hwaddr address_offset;
 hwaddr cache_size = size;
 hwaddr test_bit_size;
+#ifdef XEN_COMPAT_PHYSMAP
 bool translated = false;
+#endif
 bool dummy = false;
 
 tryagain:
@@ -307,11 +309,13 @@ tryagain:
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
 mapcache->last_entry = NULL;
+#ifdef XEN_COMPAT_PHYSMAP
 if (!translated && mapcache->phys_offset_to_gaddr) {
phys_addr = mapcache->phys_offset_to_gaddr(phys_addr, size, mapcache->opaque);
 translated = true;
 goto tryagain;
 }
+#endif
 if (!dummy && runstate_check(RUN_STATE_INMIGRATE)) {
 dummy = true;
 goto tryagain;
diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index e28ed48..86c7f26 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -80,6 +80,7 @@ extern xenforeignmemory_handle *xen_fmem;
 
 #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 41000
 
+#define XEN_COMPAT_PHYSMAP
 #define xenforeignmemory_map2(h, d, a, p, f, ps, ar, e) \
 xenforeignmemory_map(h, d, p, ps, ar, e)

[Xen-devel] [PATCH v3 1/4] xen: move physmap saving into a separate function

2017-07-10 Thread Igor Druzhinin
Non-functional change.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
Reviewed-by: Stefano Stabellini <sstabell...@kernel.org>
Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
---
 hw/i386/xen/xen-hvm.c | 57 ---
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index cffa7e2..d259cf7 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -305,6 +305,36 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
 return start_addr;
 }
 
+static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
+{
+char path[80], value[17];
+
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
+xen_domid, (uint64_t)physmap->phys_offset);
+snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->start_addr);
+if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+return -1;
+}
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
+xen_domid, (uint64_t)physmap->phys_offset);
+snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->size);
+if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+return -1;
+}
+if (physmap->name) {
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
+xen_domid, (uint64_t)physmap->phys_offset);
+if (!xs_write(state->xenstore, 0, path,
+  physmap->name, strlen(physmap->name))) {
+return -1;
+}
+}
+return 0;
+}
+
 static int xen_add_to_physmap(XenIOState *state,
   hwaddr start_addr,
   ram_addr_t size,
@@ -316,7 +346,6 @@ static int xen_add_to_physmap(XenIOState *state,
 XenPhysmap *physmap = NULL;
 hwaddr pfn, start_gpfn;
 hwaddr phys_offset = memory_region_get_ram_addr(mr);
-char path[80], value[17];
 const char *mr_name;
 
 if (get_physmapping(state, start_addr, size)) {
@@ -368,31 +397,7 @@ go_physmap:
start_addr >> TARGET_PAGE_BITS,
(start_addr + size - 1) >> TARGET_PAGE_BITS,
XEN_DOMCTL_MEM_CACHEATTR_WB);
-
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
-xen_domid, (uint64_t)phys_offset);
-snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)start_addr);
-if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
-return -1;
-}
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
-xen_domid, (uint64_t)phys_offset);
-snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)size);
-if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
-return -1;
-}
-if (mr_name) {
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
-xen_domid, (uint64_t)phys_offset);
-if (!xs_write(state->xenstore, 0, path, mr_name, strlen(mr_name))) {
-return -1;
-}
-}
-
-return 0;
+return xen_save_physmap(state, physmap);
 }
 
 static int xen_remove_from_physmap(XenIOState *state,
-- 
2.7.4




[Xen-devel] [PATCH v3 2/4] xen/mapcache: add an ability to create dummy mappings

2017-07-10 Thread Igor Druzhinin
Dummies are simple anonymous mappings that are placed instead
of regular foreign mappings in certain situations when we need
to postpone the actual mapping but still have to give a
memory region to QEMU to play with.

This is planned to be used for restore on Xen.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
Reviewed-by: Stefano Stabellini <sstabell...@kernel.org>
---
 hw/i386/xen/xen-mapcache.c | 44 
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index e60156c..39cb511 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -53,6 +53,8 @@ typedef struct MapCacheEntry {
 uint8_t *vaddr_base;
 unsigned long *valid_mapping;
 uint8_t lock;
+#define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
+uint8_t flags;
 hwaddr size;
 struct MapCacheEntry *next;
 } MapCacheEntry;
@@ -150,7 +152,8 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
 
 static void xen_remap_bucket(MapCacheEntry *entry,
  hwaddr size,
- hwaddr address_index)
+ hwaddr address_index,
+ bool dummy)
 {
 uint8_t *vaddr_base;
 xen_pfn_t *pfns;
@@ -177,11 +180,25 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT)) + i;
 }
 
-vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid, 
PROT_READ|PROT_WRITE,
-  nb_pfn, pfns, err);
-if (vaddr_base == NULL) {
-perror("xenforeignmemory_map");
-exit(-1);
+if (!dummy) {
+vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
+   PROT_READ | PROT_WRITE,
+   nb_pfn, pfns, err);
+if (vaddr_base == NULL) {
+perror("xenforeignmemory_map");
+exit(-1);
+}
+} else {
+/*
+ * We create dummy mappings where we are unable to create a foreign
+ * mapping immediately due to certain circumstances (i.e. on resume now)
+ */
+vaddr_base = mmap(NULL, size, PROT_READ | PROT_WRITE,
+  MAP_ANON | MAP_SHARED, -1, 0);
+if (vaddr_base == NULL) {
+perror("mmap");
+exit(-1);
+}
 }
 
 entry->vaddr_base = vaddr_base;
@@ -190,6 +207,12 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 entry->valid_mapping = (unsigned long *) g_malloc0(sizeof(unsigned long) *
 BITS_TO_LONGS(size >> XC_PAGE_SHIFT));
 
+if (dummy) {
+entry->flags |= XEN_MAPCACHE_ENTRY_DUMMY;
+} else {
+entry->flags &= ~(XEN_MAPCACHE_ENTRY_DUMMY);
+}
+
 ram_block_notify_add(entry->vaddr_base, entry->size);
 bitmap_zero(entry->valid_mapping, nb_pfn);
 for (i = 0; i < nb_pfn; i++) {
@@ -211,6 +234,7 @@ static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
 hwaddr cache_size = size;
 hwaddr test_bit_size;
 bool translated = false;
+bool dummy = false;
 
 tryagain:
 address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
@@ -262,14 +286,14 @@ tryagain:
 if (!entry) {
 entry = g_malloc0(sizeof (MapCacheEntry));
 pentry->next = entry;
-xen_remap_bucket(entry, cache_size, address_index);
+xen_remap_bucket(entry, cache_size, address_index, dummy);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(entry, cache_size, address_index);
+xen_remap_bucket(entry, cache_size, address_index, dummy);
 }
 }
 
@@ -282,6 +306,10 @@ tryagain:
 translated = true;
 goto tryagain;
 }
+if (!dummy && runstate_check(RUN_STATE_INMIGRATE)) {
+dummy = true;
+goto tryagain;
+}
 trace_xen_map_cache_return(NULL);
 return NULL;
 }
-- 
2.7.4




Re: [Xen-devel] [PATCH v2 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-04 Thread Igor Druzhinin
On 04/07/17 17:42, Paul Durrant wrote:
>> -Original Message-
>> From: Igor Druzhinin
>> Sent: 04 July 2017 17:34
>> To: Paul Durrant <paul.durr...@citrix.com>; xen-de...@lists.xenproject.org;
>> qemu-de...@nongnu.org
>> Cc: sstabell...@kernel.org; Anthony Perard <anthony.per...@citrix.com>;
>> pbonz...@redhat.com
>> Subject: Re: [PATCH v2 3/4] xen/mapcache: introduce
>> xen_replace_cache_entry()
>>
>> On 04/07/17 17:27, Paul Durrant wrote:
>>>> -Original Message-
>>>> From: Igor Druzhinin
>>>> Sent: 04 July 2017 16:48
>>>> To: xen-de...@lists.xenproject.org; qemu-de...@nongnu.org
>>>> Cc: Igor Druzhinin <igor.druzhi...@citrix.com>; sstabell...@kernel.org;
>>>> Anthony Perard <anthony.per...@citrix.com>; Paul Durrant
>>>> <paul.durr...@citrix.com>; pbonz...@redhat.com
>>>> Subject: [PATCH v2 3/4] xen/mapcache: introduce
>>>> xen_replace_cache_entry()
>>>>
>>>> This new call is trying to update a requested map cache entry
>>>> according to the changes in the physmap. The call is searching
>>>> for the entry, unmaps it and maps again at the same place using
>>>> a new guest address. If the mapping is dummy this call will
>>>> make it real.
>>>>
>>>> This function makes use of a new xenforeignmemory_map2() call
>>>> with an extended interface that was recently introduced in
>>>> libxenforeignmemory [1].
>>>
>>> I don't understand how the compat layer works here. If
>> xenforeignmemory_map2() is not available then you can't control the
>> placement in virtual address space.
>>>
>>
>> If it's not 4.10 or newer xenforeignmemory_map2() doesn't exist and is
>> going to be defined as xenforeignmemory_map(). At the same time
>> XEN_COMPAT_PHYSMAP is defined and the entry replace function (which
>> relies on xenforeignmemory_map2 functionality) is never going to be called.
>>
>> If you mean that I should incorporate this into the description I can do it.
> 
> AFAICT XEN_COMPAT_PHYSMAP is not introduced until patch #4 though.
> 
> The problem really comes down to defining xenforeignmemory_map2() in terms of 
> xenforeignmemory_map(). It basically can't be safely done. Could you define 
> xenforeignmemory_map2() as abort() in the compat case instead? 
>

xen_replace_cache_entry() is not called in patch #3, which means it's
safe to use a fallback version (xenforeignmemory_map) in
xen_remap_bucket here.

Igor

>   Paul
> 
>>
>> Igor
>>
>>>   Paul
>>>
>>>>
>>>> [1] https://www.mail-archive.com/xen-
>> de...@lists.xen.org/msg113007.html
>>>>
>>>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>>>> ---
>>>>  configure | 18 ++
>>>>  hw/i386/xen/xen-mapcache.c| 79
>>>> ++-
>>>>  include/hw/xen/xen_common.h   |  7 
>>>>  include/sysemu/xen-mapcache.h | 11 +-
>>>>  4 files changed, 106 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/configure b/configure
>>>> index c571ad1..ad6156b 100755
>>>> --- a/configure
>>>> +++ b/configure
>>>> @@ -2021,6 +2021,24 @@ EOF
>>>>  # Xen unstable
>>>>  elif
>>>>  cat > $TMPC <>>> +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
>>>> +#include 
>>>> +int main(void) {
>>>> +  xenforeignmemory_handle *xfmem;
>>>> +
>>>> +  xfmem = xenforeignmemory_open(0, 0);
>>>> +  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +EOF
>>>> +compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
>>>> +  then
>>>> +  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
>>>> +  xen_ctrl_version=41000
>>>> +  xen=yes
>>>> +elif
>>>> +cat > $TMPC <<EOF
>>>>  #undef XC_WANT_COMPAT_DEVICEMODEL_API
>>>>  #define __XEN_TOOLS__
>>>>  #include <xendevicemodel.h>
>>>> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
>>>> index cd4e746..a988be7 100644
>>>> --- a/hw/i386/xen/xen-mapcache.c
>>>> +++ b/hw/i386/xen/xen-mapcache.c
>>>> @@ -151,

Re: [Xen-devel] [PATCH v2 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-04 Thread Igor Druzhinin
On 04/07/17 17:27, Paul Durrant wrote:
>> -Original Message-
>> From: Igor Druzhinin
>> Sent: 04 July 2017 16:48
>> To: xen-de...@lists.xenproject.org; qemu-de...@nongnu.org
>> Cc: Igor Druzhinin <igor.druzhi...@citrix.com>; sstabell...@kernel.org;
>> Anthony Perard <anthony.per...@citrix.com>; Paul Durrant
>> <paul.durr...@citrix.com>; pbonz...@redhat.com
>> Subject: [PATCH v2 3/4] xen/mapcache: introduce
>> xen_replace_cache_entry()
>>
>> This new call is trying to update a requested map cache entry
>> according to the changes in the physmap. The call is searching
>> for the entry, unmaps it and maps again at the same place using
>> a new guest address. If the mapping is dummy this call will
>> make it real.
>>
>> This function makes use of a new xenforeignmemory_map2() call
>> with an extended interface that was recently introduced in
>> libxenforeignmemory [1].
> 
> I don't understand how the compat layer works here. If 
> xenforeignmemory_map2() is not available then you can't control the placement 
> in virtual address space.
> 

If it's not 4.10 or newer xenforeignmemory_map2() doesn't exist and is
going to be defined as xenforeignmemory_map(). At the same time
XEN_COMPAT_PHYSMAP is defined and the entry replace function (which
relies on xenforeignmemory_map2 functionality) is never going to be called.

If you mean that I should incorporate this into the description I can do it.

Igor

>   Paul
> 
>>
>> [1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>> ---
>>  configure | 18 ++
>>  hw/i386/xen/xen-mapcache.c| 79
>> ++-
>>  include/hw/xen/xen_common.h   |  7 
>>  include/sysemu/xen-mapcache.h | 11 +-
>>  4 files changed, 106 insertions(+), 9 deletions(-)
>>
>> diff --git a/configure b/configure
>> index c571ad1..ad6156b 100755
>> --- a/configure
>> +++ b/configure
>> @@ -2021,6 +2021,24 @@ EOF
>>  # Xen unstable
>>  elif
>>  cat > $TMPC <<EOF
>> +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
>> +#include <xenforeignmemory.h>
>> +int main(void) {
>> +  xenforeignmemory_handle *xfmem;
>> +
>> +  xfmem = xenforeignmemory_open(0, 0);
>> +  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
>> +
>> +  return 0;
>> +}
>> +EOF
>> +compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
>> +  then
>> +  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
>> +  xen_ctrl_version=41000
>> +  xen=yes
>> +elif
>> +cat > $TMPC <<EOF
>>  #undef XC_WANT_COMPAT_DEVICEMODEL_API
>>  #define __XEN_TOOLS__
>>  #include <xendevicemodel.h>
>> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
>> index cd4e746..a988be7 100644
>> --- a/hw/i386/xen/xen-mapcache.c
>> +++ b/hw/i386/xen/xen-mapcache.c
>> @@ -151,6 +151,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f,
>> void *opaque)
>>  }
>>
>>  static void xen_remap_bucket(MapCacheEntry *entry,
>> + void *vaddr,
>>   hwaddr size,
>>   hwaddr address_index,
>>   bool dummy)
>> @@ -167,7 +168,9 @@ static void xen_remap_bucket(MapCacheEntry
>> *entry,
>>  err = g_malloc0(nb_pfn * sizeof (int));
>>
>>  if (entry->vaddr_base != NULL) {
>> -ram_block_notify_remove(entry->vaddr_base, entry->size);
>> +if (entry->vaddr_base != vaddr) {
>> +ram_block_notify_remove(entry->vaddr_base, entry->size);
>> +}
>>  if (munmap(entry->vaddr_base, entry->size) != 0) {
>>  perror("unmap fails");
>>  exit(-1);
>> @@ -181,11 +184,11 @@ static void xen_remap_bucket(MapCacheEntry
>> *entry,
>>  }
>>
>>  if (!dummy) {
>> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
>> -   PROT_READ|PROT_WRITE,
>> +vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid,
>> vaddr,
>> +   PROT_READ|PROT_WRITE, 0,
>> nb_pfn, pfns, err);
>>  if (vaddr_base == NULL) {
>> -perror("xenforeignmemory_map");
>> +perror("xenforeignmemory_map2");

[Xen-devel] [PATCH v2 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-04 Thread Igor Druzhinin
This new call is trying to update a requested map cache entry
according to the changes in the physmap. The call is searching
for the entry, unmaps it and maps again at the same place using
a new guest address. If the mapping is dummy this call will
make it real.

This function makes use of a new xenforeignmemory_map2() call
with an extended interface that was recently introduced in
libxenforeignmemory [1].

[1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 configure | 18 ++
 hw/i386/xen/xen-mapcache.c| 79 ++-
 include/hw/xen/xen_common.h   |  7 
 include/sysemu/xen-mapcache.h | 11 +-
 4 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index c571ad1..ad6156b 100755
--- a/configure
+++ b/configure
@@ -2021,6 +2021,24 @@ EOF
 # Xen unstable
 elif
 cat > $TMPC <<EOF
+#undef XC_WANT_COMPAT_MAP_FOREIGN_API
+#include <xenforeignmemory.h>
+int main(void) {
+  xenforeignmemory_handle *xfmem;
+
+  xfmem = xenforeignmemory_open(0, 0);
+  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
+
+  return 0;
+}
+EOF
+compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
+  then
+  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
+  xen_ctrl_version=41000
+  xen=yes
+elif
+cat > $TMPC <<EOF
 #undef XC_WANT_COMPAT_DEVICEMODEL_API
 #define __XEN_TOOLS__
 #include <xendevicemodel.h>
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index cd4e746..a988be7 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -151,6 +151,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
 }
 
 static void xen_remap_bucket(MapCacheEntry *entry,
+ void *vaddr,
  hwaddr size,
  hwaddr address_index,
  bool dummy)
@@ -167,7 +168,9 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 err = g_malloc0(nb_pfn * sizeof (int));
 
 if (entry->vaddr_base != NULL) {
-ram_block_notify_remove(entry->vaddr_base, entry->size);
+if (entry->vaddr_base != vaddr) {
+ram_block_notify_remove(entry->vaddr_base, entry->size);
+}
 if (munmap(entry->vaddr_base, entry->size) != 0) {
 perror("unmap fails");
 exit(-1);
@@ -181,11 +184,11 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 }
 
 if (!dummy) {
-vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
-   PROT_READ|PROT_WRITE,
+vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid, vaddr,
+   PROT_READ|PROT_WRITE, 0,
nb_pfn, pfns, err);
 if (vaddr_base == NULL) {
-perror("xenforeignmemory_map");
+perror("xenforeignmemory_map2");
 exit(-1);
 }
 entry->flags &= ~(XEN_MAPCACHE_ENTRY_DUMMY);
@@ -194,7 +197,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
  * We create dummy mappings where we are unable to create a foreign
  * mapping immediately due to certain circumstances (i.e. on resume now)
  */
-vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
+vaddr_base = mmap(vaddr, size, PROT_READ|PROT_WRITE,
   MAP_ANON|MAP_SHARED, -1, 0);
 if (vaddr_base == NULL) {
 perror("mmap");
@@ -203,13 +206,16 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 entry->flags |= XEN_MAPCACHE_ENTRY_DUMMY;
 }
 
+if (entry->vaddr_base == NULL || entry->vaddr_base != vaddr) {
+ram_block_notify_add(vaddr_base, size);
+}
+
 entry->vaddr_base = vaddr_base;
 entry->paddr_index = address_index;
 entry->size = size;
 entry->valid_mapping = (unsigned long *) g_malloc0(sizeof(unsigned long) *
 BITS_TO_LONGS(size >> XC_PAGE_SHIFT));
 
-ram_block_notify_add(entry->vaddr_base, entry->size);
 bitmap_zero(entry->valid_mapping, nb_pfn);
 for (i = 0; i < nb_pfn; i++) {
 if (!err[i]) {
@@ -282,14 +288,14 @@ tryagain:
 if (!entry) {
 entry = g_malloc0(sizeof (MapCacheEntry));
 pentry->next = entry;
-xen_remap_bucket(entry, cache_size, address_index, dummy);
+xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(entry, cache_size, address_index, dummy);
+xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);

[Xen-devel] [PATCH v2 0/4] xen: don't save/restore the physmap on VM save/restore

2017-07-04 Thread Igor Druzhinin
Saving/restoring the physmap to/from xenstore was introduced to
QEMU mainly in order to work around the VRAM region restore issue.
The sequence of restore operations implies that we should know
the effective guest VRAM address *before* we have the VRAM region
restored (which happens later). Unfortunately, in Xen environment
VRAM memory does actually belong to a guest - not QEMU itself -
which means the position of this region is unknown beforehand and
can't be mapped into QEMU address space immediately.

Previously, recreating xenstore keys, holding the physmap, by the
toolstack helped to get this information in place at the right
moment ready to be consumed by QEMU to map the region properly.
But using xenstore for it has certain disadvantages: toolstack
needs to be aware of these keys and save/restore them accordingly;
accessing xenstore requires extra privileges which hinders QEMU
sandboxing.

The previous attempt to get rid of that was to remember all the
VRAM pointers during QEMU initialization phase and then update
them all at once when an actual foreign mapping is established.
Unfortunately, this approach worked only for VRAM and only for
a predefined set of devices - stdvga and cirrus. QXL and other
possible future devices using a moving emulated MMIO region
would be equally broken.

The new approach leverages xenforeignmemory_map2() call recently
introduced in libxenforeignmemory. It allows to create a dummy
anonymous mapping for QEMU during its initialization and change
it to a real one later during machine state restore.

---
Changed in v2:
* Patch 2: set dummy flag in a new flags field in struct MapCacheEntry
* Patch 3: change xen_remap_cache_entry name and signature
* Patch 3: gate ram_block_notify_* functions in xen_remap_bucket
* Patch 3: rewrite the logic of xen_replace_cache_entry_unlocked to
   reuse the existing entry instead of allocating a new one
* Patch 4: don't use xen_phys_offset_to_gaddr in non-compat mode

---
Igor Druzhinin (4):
  xen: move physmap saving into a separate function
  xen/mapcache: add an ability to create dummy mappings
  xen/mapcache: introduce xen_replace_cache_entry()
  xen: don't use xenstore to save/restore physmap anymore

 configure |  18 +++
 hw/i386/xen/xen-hvm.c | 105 ++---
 hw/i386/xen/xen-mapcache.c| 107 ++
 include/hw/xen/xen_common.h   |   8 
 include/sysemu/xen-mapcache.h |  11 -
 5 files changed, 201 insertions(+), 48 deletions(-)

-- 
2.7.4




[Xen-devel] [PATCH v2 4/4] xen: don't use xenstore to save/restore physmap anymore

2017-07-04 Thread Igor Druzhinin
If we have a system with xenforeignmemory_map2() implemented
we don't need to save/restore physmap on suspend/restore
anymore. In case we resume a VM without physmap - try to
recreate the physmap during memory region restore phase and
remap map cache entries accordingly. The old code is left
for compatibility reasons.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 hw/i386/xen/xen-hvm.c   | 48 ++---
 include/hw/xen/xen_common.h |  1 +
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index d259cf7..d24ca47 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -289,6 +289,7 @@ static XenPhysmap *get_physmapping(XenIOState *state,
 return NULL;
 }
 
+#ifdef XEN_COMPAT_PHYSMAP
 static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
ram_addr_t size, void *opaque)
 {
@@ -334,6 +335,12 @@ static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
 }
 return 0;
 }
+#else
+static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
+{
+return 0;
+}
+#endif
 
 static int xen_add_to_physmap(XenIOState *state,
   hwaddr start_addr,
@@ -368,6 +375,26 @@ go_physmap:
 DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
 start_addr, start_addr + size);
 
+mr_name = memory_region_name(mr);
+
+physmap = g_malloc(sizeof (XenPhysmap));
+
+physmap->start_addr = start_addr;
+physmap->size = size;
+physmap->name = mr_name;
+physmap->phys_offset = phys_offset;
+
+QLIST_INSERT_HEAD(&state->physmap, physmap, list);
+
+if (runstate_check(RUN_STATE_INMIGRATE)) {
+/* Now when we have a physmap entry we can replace a dummy mapping with
+ * a real one of guest foreign memory. */
+uint8_t *p = xen_replace_cache_entry(phys_offset, start_addr, size);
+assert(p && p == memory_region_get_ram_ptr(mr));
+
+return 0;
+}
+
 pfn = phys_offset >> TARGET_PAGE_BITS;
 start_gpfn = start_addr >> TARGET_PAGE_BITS;
 for (i = 0; i < size >> TARGET_PAGE_BITS; i++) {
@@ -382,17 +409,6 @@ go_physmap:
 }
 }
 
-mr_name = memory_region_name(mr);
-
-physmap = g_malloc(sizeof (XenPhysmap));
-
-physmap->start_addr = start_addr;
-physmap->size = size;
-physmap->name = mr_name;
-physmap->phys_offset = phys_offset;
-
-QLIST_INSERT_HEAD(&state->physmap, physmap, list);
-
 xc_domain_pin_memory_cacheattr(xen_xc, xen_domid,
start_addr >> TARGET_PAGE_BITS,
(start_addr + size - 1) >> TARGET_PAGE_BITS,
@@ -1158,6 +1174,7 @@ static void xen_exit_notifier(Notifier *n, void *data)
 xs_daemon_close(state->xenstore);
 }
 
+#ifdef XEN_COMPAT_PHYSMAP
 static void xen_read_physmap(XenIOState *state)
 {
 XenPhysmap *physmap = NULL;
@@ -1205,6 +1222,11 @@ static void xen_read_physmap(XenIOState *state)
 }
 free(entries);
 }
+#else
+static void xen_read_physmap(XenIOState *state)
+{
+}
+#endif
 
 static void xen_wakeup_notifier(Notifier *notifier, void *data)
 {
@@ -1331,7 +1353,11 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
 state->bufioreq_local_port = rc;
 
 /* Init RAM management */
+#ifdef XEN_COMPAT_PHYSMAP
 xen_map_cache_init(xen_phys_offset_to_gaddr, state);
+#else
+xen_map_cache_init(NULL, state);
+#endif
 xen_ram_init(pcms, ram_size, ram_memory);
 
 qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state);
diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index 70a5cad..c04c5c9 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -80,6 +80,7 @@ extern xenforeignmemory_handle *xen_fmem;
 
 #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 41000
 
+#define XEN_COMPAT_PHYSMAP
 #define xenforeignmemory_map2(h, d, a, p, f, ps, ar, e) \
 xenforeignmemory_map(h, d, p, ps, ar, e)
 
-- 
2.7.4




[Xen-devel] [PATCH v2 1/4] xen: move physmap saving into a separate function

2017-07-04 Thread Igor Druzhinin
Non-functional change.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 hw/i386/xen/xen-hvm.c | 57 ---
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index cffa7e2..d259cf7 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -305,6 +305,36 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
 return start_addr;
 }
 
+static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
+{
+char path[80], value[17];
+
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
+xen_domid, (uint64_t)physmap->phys_offset);
+snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->start_addr);
+if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+return -1;
+}
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
+xen_domid, (uint64_t)physmap->phys_offset);
+snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->size);
+if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+return -1;
+}
+if (physmap->name) {
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
+xen_domid, (uint64_t)physmap->phys_offset);
+if (!xs_write(state->xenstore, 0, path,
+  physmap->name, strlen(physmap->name))) {
+return -1;
+}
+}
+return 0;
+}
+
 static int xen_add_to_physmap(XenIOState *state,
   hwaddr start_addr,
   ram_addr_t size,
@@ -316,7 +346,6 @@ static int xen_add_to_physmap(XenIOState *state,
 XenPhysmap *physmap = NULL;
 hwaddr pfn, start_gpfn;
 hwaddr phys_offset = memory_region_get_ram_addr(mr);
-char path[80], value[17];
 const char *mr_name;
 
 if (get_physmapping(state, start_addr, size)) {
@@ -368,31 +397,7 @@ go_physmap:
start_addr >> TARGET_PAGE_BITS,
(start_addr + size - 1) >> TARGET_PAGE_BITS,
XEN_DOMCTL_MEM_CACHEATTR_WB);
-
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
-xen_domid, (uint64_t)phys_offset);
-snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)start_addr);
-if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
-return -1;
-}
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
-xen_domid, (uint64_t)phys_offset);
-snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)size);
-if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
-return -1;
-}
-if (mr_name) {
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
-xen_domid, (uint64_t)phys_offset);
-if (!xs_write(state->xenstore, 0, path, mr_name, strlen(mr_name))) {
-return -1;
-}
-}
-
-return 0;
+return xen_save_physmap(state, physmap);
 }
 
 static int xen_remove_from_physmap(XenIOState *state,
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 2/4] xen/mapcache: add an ability to create dummy mappings

2017-07-04 Thread Igor Druzhinin
Dummies are simple anonymous mappings that are placed instead
of regular foreign mappings in certain situations when we need
to postpone the actual mapping but still have to give a
memory region to QEMU to play with.

This is planned to be used for restore on Xen.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 hw/i386/xen/xen-mapcache.c | 40 
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index e60156c..cd4e746 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -53,6 +53,8 @@ typedef struct MapCacheEntry {
 uint8_t *vaddr_base;
 unsigned long *valid_mapping;
 uint8_t lock;
+#define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
+uint8_t flags;
 hwaddr size;
 struct MapCacheEntry *next;
 } MapCacheEntry;
@@ -150,7 +152,8 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
 
 static void xen_remap_bucket(MapCacheEntry *entry,
  hwaddr size,
- hwaddr address_index)
+ hwaddr address_index,
+ bool dummy)
 {
 uint8_t *vaddr_base;
 xen_pfn_t *pfns;
@@ -177,11 +180,27 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT)) + i;
 }
 
-vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ|PROT_WRITE,
-  nb_pfn, pfns, err);
-if (vaddr_base == NULL) {
-perror("xenforeignmemory_map");
-exit(-1);
+if (!dummy) {
+vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
+   PROT_READ|PROT_WRITE,
+   nb_pfn, pfns, err);
+if (vaddr_base == NULL) {
+perror("xenforeignmemory_map");
+exit(-1);
+}
+entry->flags &= ~(XEN_MAPCACHE_ENTRY_DUMMY);
+} else {
+/*
+ * We create dummy mappings where we are unable to create a foreign
+ * mapping immediately due to certain circumstances (i.e. on resume now)
+ */
+vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
+  MAP_ANON|MAP_SHARED, -1, 0);
+if (vaddr_base == NULL) {
+perror("mmap");
+exit(-1);
+}
+entry->flags |= XEN_MAPCACHE_ENTRY_DUMMY;
 }
 
 entry->vaddr_base = vaddr_base;
@@ -211,6 +230,7 @@ static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
 hwaddr cache_size = size;
 hwaddr test_bit_size;
 bool translated = false;
+bool dummy = false;
 
 tryagain:
 address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
@@ -262,14 +282,14 @@ tryagain:
 if (!entry) {
 entry = g_malloc0(sizeof (MapCacheEntry));
 pentry->next = entry;
-xen_remap_bucket(entry, cache_size, address_index);
+xen_remap_bucket(entry, cache_size, address_index, dummy);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(entry, cache_size, address_index);
+xen_remap_bucket(entry, cache_size, address_index, dummy);
 }
 }
 
@@ -282,6 +302,10 @@ tryagain:
 translated = true;
 goto tryagain;
 }
+if (!dummy && runstate_check(RUN_STATE_INMIGRATE)) {
+dummy = true;
+goto tryagain;
+}
 trace_xen_map_cache_return(NULL);
 return NULL;
 }
-- 
2.7.4




Re: [Xen-devel] [PATCH 3/4] xen/mapcache: introduce xen_remap_cache_entry()

2017-07-03 Thread Igor Druzhinin
On 01/07/17 01:08, Stefano Stabellini wrote:
> On Fri, 30 Jun 2017, Igor Druzhinin wrote:
>> This new call is trying to update a requested map cache entry
>> according to the changes in the physmap. The call is searching
>> for the entry, unmaps it, tries to translate the address and
>> maps again at the same place. If the mapping is dummy this call
>> will make it real.
>>
>> This function makes use of a new xenforeignmemory_map2() call
>> with extended interface that was recently introduced in
>> libxenforeignmemory [1].
>>
>> [1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>> ---
>>  configure |  18 
>>  hw/i386/xen/xen-mapcache.c| 105 +++---
>>  include/hw/xen/xen_common.h   |   7 +++
>>  include/sysemu/xen-mapcache.h |   6 +++
>>  4 files changed, 130 insertions(+), 6 deletions(-)
>>
>> diff --git a/configure b/configure
>> index c571ad1..ad6156b 100755
>> --- a/configure
>> +++ b/configure
>> @@ -2021,6 +2021,24 @@ EOF
>>  # Xen unstable
>>  elif
>>  cat > $TMPC <<EOF
>> +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
>> +#include <xenforeignmemory.h>
>> +int main(void) {
>> +  xenforeignmemory_handle *xfmem;
>> +
>> +  xfmem = xenforeignmemory_open(0, 0);
>> +  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
>> +
>> +  return 0;
>> +}
>> +EOF
>> +compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
>> +  then
>> +  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
>> +  xen_ctrl_version=41000
>> +  xen=yes
>> +elif
>> +cat > $TMPC <<EOF
>>  #undef XC_WANT_COMPAT_DEVICEMODEL_API
>>  #define __XEN_TOOLS__
>>  #include <xendevicemodel.h>
>> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
>> index 05050de..5d8d990 100644
>> --- a/hw/i386/xen/xen-mapcache.c
>> +++ b/hw/i386/xen/xen-mapcache.c
>> @@ -149,6 +149,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
>>  }
>>  
>>  static void xen_remap_bucket(MapCacheEntry *entry,
>> + void *vaddr,
>>   hwaddr size,
>>   hwaddr address_index,
>>   bool dummy)
>> @@ -179,11 +180,11 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>>  }
>>  
>>  if (!dummy) {
>> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
>> -   PROT_READ|PROT_WRITE,
>> +vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid, vaddr,
>> +   PROT_READ|PROT_WRITE, 0,
>> nb_pfn, pfns, err);
>>  if (vaddr_base == NULL) {
>> -perror("xenforeignmemory_map");
>> +perror("xenforeignmemory_map2");
>>  exit(-1);
>>  }
>>  } else {
>> @@ -191,7 +192,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>>   * We create dummy mappings where we are unable to create a foreign
>>   * mapping immediately due to certain circumstances (i.e. on resume now)
>>   */
>> -vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
>> +vaddr_base = mmap(vaddr, size, PROT_READ|PROT_WRITE,
>>MAP_ANON|MAP_SHARED, -1, 0);
>>  if (vaddr_base == NULL) {
>>  perror("mmap");
>> @@ -278,14 +279,14 @@ tryagain:
>>  if (!entry) {
>>  entry = g_malloc0(sizeof (MapCacheEntry));
>>  pentry->next = entry;
>> -xen_remap_bucket(entry, cache_size, address_index, dummy);
>> +xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
>>  } else if (!entry->lock) {
>>  if (!entry->vaddr_base || entry->paddr_index != address_index ||
>>  entry->size != cache_size ||
>>  !test_bits(address_offset >> XC_PAGE_SHIFT,
>>  test_bit_size >> XC_PAGE_SHIFT,
>>  entry->valid_mapping)) {
>> -xen_remap_bucket(entry, cache_size, address_index, dummy);
>> +xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
>>  }
>>  }
>>  
>> @@ 

Re: [Xen-devel] [PATCH 2/4] xen/mapcache: add an ability to create dummy mappings

2017-07-03 Thread Igor Druzhinin
On 01/07/17 01:06, Stefano Stabellini wrote:
> On Fri, 30 Jun 2017, Igor Druzhinin wrote:
>> Dummys are simple anonymous mappings that are placed instead
>> of regular foreign mappings in certain situations when we need
>> to postpone the actual mapping but still have to give a
>> memory region to QEMU to play with.
>>
>> This is planned to be used for restore on Xen.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>>
>> ---
>>  hw/i386/xen/xen-mapcache.c | 36 
>>  1 file changed, 28 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
>> index e60156c..05050de 100644
>> --- a/hw/i386/xen/xen-mapcache.c
>> +++ b/hw/i386/xen/xen-mapcache.c
>> @@ -150,7 +150,8 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
>> *opaque)
>>  
>>  static void xen_remap_bucket(MapCacheEntry *entry,
>>   hwaddr size,
>> - hwaddr address_index)
>> + hwaddr address_index,
>> + bool dummy)
>>  {
>>  uint8_t *vaddr_base;
>>  xen_pfn_t *pfns;
>> @@ -177,11 +178,25 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>>  pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT)) + 
>> i;
>>  }
>>  
>> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid, 
>> PROT_READ|PROT_WRITE,
>> -  nb_pfn, pfns, err);
>> -if (vaddr_base == NULL) {
>> -perror("xenforeignmemory_map");
>> -exit(-1);
>> +if (!dummy) {
>> +vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
>> +   PROT_READ|PROT_WRITE,
>> +   nb_pfn, pfns, err);
>> +if (vaddr_base == NULL) {
>> +perror("xenforeignmemory_map");
>> +exit(-1);
>> +}
>> +} else {
>> +/*
>> + * We create dummy mappings where we are unable to create a foreign
>> + * mapping immediately due to certain circumstances (i.e. on resume 
>> now)
>> + */
>> +vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
>> +  MAP_ANON|MAP_SHARED, -1, 0);
>> +if (vaddr_base == NULL) {
>> +perror("mmap");
>> +exit(-1);
>> +}
> 
> For our sanity in debugging this in the future, I think it's best if we
> mark this mapcache entry as "dummy". Since we are at it, we could turn
> the lock field of MapCacheEntry into a flag field and #define LOCK as
> (1<<0) and DUMMY as (1<<1). Please do that as a separate patch.
>

Unfortunately, the lock field is a reference counter (or at least that's
what it looks like according to the source code). It seems to me that
it's technically possible to have one region locked from several places
in QEMU code. For that reason, I'd like to introduce a separate field -
something like uint8_t flags.
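To make the distinction concrete, here is a minimal sketch of what such a field could look like (names are illustrative, loosely following the later v2 patch; this is not the actual QEMU structure):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: 'lock' stays a plain reference count, while orthogonal
 * properties of a mapcache entry live in a separate bit mask. */
#define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)

typedef struct {
    uint8_t lock;   /* reference count; may legitimately exceed 1 */
    uint8_t flags;  /* combination of XEN_MAPCACHE_ENTRY_* bits */
} EntryState;

/* Returns non-zero when the entry is backed by a dummy mapping. */
static int entry_is_dummy(const EntryState *e)
{
    return (e->flags & XEN_MAPCACHE_ENTRY_DUMMY) != 0;
}
```

Keeping the flag bits separate from the counter avoids overloading the refcount with unrelated state.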

Igor

>>>  }
>>  
>>  entry->vaddr_base = vaddr_base;
>> @@ -211,6 +226,7 @@ static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, 
>> hwaddr size,
>>  hwaddr cache_size = size;
>>  hwaddr test_bit_size;
>>  bool translated = false;
>> +bool dummy = false;
>>  
>>  tryagain:
>>  address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
>> @@ -262,14 +278,14 @@ tryagain:
>>  if (!entry) {
>>  entry = g_malloc0(sizeof (MapCacheEntry));
>>  pentry->next = entry;
>> -xen_remap_bucket(entry, cache_size, address_index);
>> +xen_remap_bucket(entry, cache_size, address_index, dummy);
>>  } else if (!entry->lock) {
>>  if (!entry->vaddr_base || entry->paddr_index != address_index ||
>>  entry->size != cache_size ||
>>  !test_bits(address_offset >> XC_PAGE_SHIFT,
>>  test_bit_size >> XC_PAGE_SHIFT,
>>  entry->valid_mapping)) {
>> -xen_remap_bucket(entry, cache_size, address_index);
>> +xen_remap_bucket(entry, cache_size, address_index, dummy);
>>  }
>>  }
>>  
>> @@ -282,6 +298,10 @@ tryagain:
>>  translated = true;
>>  goto tryagain;
>>  }
>> +if (!dummy && runstate_check(RUN_STATE_INMIGRATE)) {
>> +dummy = true;
>> +goto tryagain;
>> +}
>>  trace_xen_map_cache_return(NULL);
>>  return NULL;
>>  }
>> -- 
>> 2.7.4
>>



Re: [Xen-devel] [PATCH] xen/balloon: don't online new memory initially

2017-07-03 Thread Igor Druzhinin
On 03/07/17 16:40, Juergen Gross wrote:
> When setting up the Xenstore watch for the memory target size the new
> watch will fire at once. Don't try to reach the configured target size
> by onlining new memory in this case, as the current memory size will
> be smaller in almost all cases due to e.g. BIOS reserved pages.
> 
> Onlining new memory will lead to more problems e.g. undesired conflicts
> with NVMe devices meant to be operated as block devices.
> 
> Instead remember the difference between target size and current size
> when the watch fires for the first time and apply it to any further
> size changes, too.
> 
> In order to avoid races between balloon.c and xen-balloon.c init calls
> do the xen-balloon.c initialization from balloon.c.
> 
> Signed-off-by: Juergen Gross 
> ---
>  drivers/xen/balloon.c |  3 +++
>  drivers/xen/xen-balloon.c | 20 
>  include/xen/balloon.h |  8 
>  3 files changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 50dcb68d8070..ab609255a0f3 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -780,6 +780,9 @@ static int __init balloon_init(void)
>   }
>  #endif
>  
> + /* Init the xen-balloon driver. */
> + xen_balloon_init();
> +
>   return 0;
>  }
>  subsys_initcall(balloon_init);
> diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c
> index e7715cb62eef..66ec519c825c 100644
> --- a/drivers/xen/xen-balloon.c
> +++ b/drivers/xen/xen-balloon.c
> @@ -59,6 +59,8 @@ static void watch_target(struct xenbus_watch *watch,
>  {
>   unsigned long long new_target;
>   int err;
> + static bool watch_fired;
> + static unsigned long target_diff;
>  
>   err = xenbus_scanf(XBT_NIL, "memory", "target", "%llu", &new_target);
>   if (err != 1) {
> @@ -69,7 +71,14 @@ static void watch_target(struct xenbus_watch *watch,
>   /* The given memory/target value is in KiB, so it needs converting to
>* pages. PAGE_SHIFT converts bytes to pages, hence PAGE_SHIFT - 10.
>*/
> - balloon_set_new_target(new_target >> (PAGE_SHIFT - 10));
> + new_target >>= PAGE_SHIFT - 10;
> + if (watch_fired) {
> + balloon_set_new_target(new_target - target_diff);
> + return;
> + }
> +
> + watch_fired = true;
> + target_diff = new_target - balloon_stats.target_pages;
>  }
>  static struct xenbus_watch target_watch = {
>   .node = "memory/target",
> @@ -94,13 +103,8 @@ static struct notifier_block xenstore_notifier = {
>   .notifier_call = balloon_init_watcher,
>  };
>  
> -static int __init balloon_init(void)
> +void __init xen_balloon_init(void)
>  {
> - if (!xen_domain())
> - return -ENODEV;
> -
> - pr_info("Initialising balloon driver\n");
> -
>   register_balloon(&balloon_dev);
>  
>   register_xen_selfballooning(&balloon_dev);
> @@ -109,7 +113,7 @@ static int __init balloon_init(void)
>  
>   return 0;
>  }
> -subsys_initcall(balloon_init);
> +EXPORT_SYMBOL_GPL(xen_balloon_init);
>  
>  #define BALLOON_SHOW(name, format, args...)  \
>   static ssize_t show_##name(struct device *dev,  \
> diff --git a/include/xen/balloon.h b/include/xen/balloon.h
> index d1767dfb0d95..8906361bb50c 100644
> --- a/include/xen/balloon.h
> +++ b/include/xen/balloon.h
> @@ -35,3 +35,11 @@ static inline int register_xen_selfballooning(struct device *dev)
>   return -ENOSYS;
>  }
>  #endif
> +
> +#ifdef CONFIG_XEN_BALLOON
> +void xen_balloon_init(void);
> +#else
> +static inline void xen_balloon_init(void)
> +{
> +}
> +#endif
> 

We came across the same issue just recently. The problem was that for
some kernel versions DMA buffers for emulated devices are allocated in
this recently hotplugged area. This area is not properly described for
QEMU so when a DMA request comes in QEMU treats it as "unassigned" and
skips by default. This eventually leads to cryptic failures of system
loading.

Internally we developed a workaround for QEMU with which we try to
satisfy all the "unassigned" requests. But it doesn't solve the problem
in a proper way IMHO.

I haven't completely understood your use-case, but we might try to come
up with a general solution for both of the problems because they are
obviously related.

> Onlining new memory will lead to more problems e.g. undesired conflicts
> with NVMe devices meant to be operated as block devices.

Could you explain this in more detail?

Igor



[Xen-devel] [PATCH 2/4] xen/mapcache: add an ability to create dummy mappings

2017-06-30 Thread Igor Druzhinin
Dummies are simple anonymous mappings that are placed instead
of regular foreign mappings in certain situations when we need
to postpone the actual mapping but still have to give a
memory region to QEMU to play with.

This is planned to be used for restore on Xen.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 hw/i386/xen/xen-mapcache.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index e60156c..05050de 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -150,7 +150,8 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
 
 static void xen_remap_bucket(MapCacheEntry *entry,
  hwaddr size,
- hwaddr address_index)
+ hwaddr address_index,
+ bool dummy)
 {
 uint8_t *vaddr_base;
 xen_pfn_t *pfns;
@@ -177,11 +178,25 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT)) + i;
 }
 
-vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ|PROT_WRITE,
-  nb_pfn, pfns, err);
-if (vaddr_base == NULL) {
-perror("xenforeignmemory_map");
-exit(-1);
+if (!dummy) {
+vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
+   PROT_READ|PROT_WRITE,
+   nb_pfn, pfns, err);
+if (vaddr_base == NULL) {
+perror("xenforeignmemory_map");
+exit(-1);
+}
+} else {
+/*
+ * We create dummy mappings where we are unable to create a foreign
+ * mapping immediately due to certain circumstances (i.e. on resume now)
+ */
+vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
+  MAP_ANON|MAP_SHARED, -1, 0);
+if (vaddr_base == NULL) {
+perror("mmap");
+exit(-1);
+}
 }
 
 entry->vaddr_base = vaddr_base;
@@ -211,6 +226,7 @@ static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
 hwaddr cache_size = size;
 hwaddr test_bit_size;
 bool translated = false;
+bool dummy = false;
 
 tryagain:
 address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
@@ -262,14 +278,14 @@ tryagain:
 if (!entry) {
 entry = g_malloc0(sizeof (MapCacheEntry));
 pentry->next = entry;
-xen_remap_bucket(entry, cache_size, address_index);
+xen_remap_bucket(entry, cache_size, address_index, dummy);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(entry, cache_size, address_index);
+xen_remap_bucket(entry, cache_size, address_index, dummy);
 }
 }
 
@@ -282,6 +298,10 @@ tryagain:
 translated = true;
 goto tryagain;
 }
+if (!dummy && runstate_check(RUN_STATE_INMIGRATE)) {
+dummy = true;
+goto tryagain;
+}
 trace_xen_map_cache_return(NULL);
 return NULL;
 }
-- 
2.7.4




[Xen-devel] [PATCH 4/4] xen: don't use xenstore to save/restore physmap anymore

2017-06-30 Thread Igor Druzhinin
If we have a system with xenforeignmemory_map2() implemented
we don't need to save/restore the physmap on suspend/restore
anymore. If we resume a VM without a physmap, try to
recreate the physmap during the memory region restore phase and
remap map cache entries accordingly. The old code is left
for compatibility reasons.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 hw/i386/xen/xen-hvm.c   | 45 ++---
 include/hw/xen/xen_common.h |  1 +
 2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index d259cf7..1b6a5ce 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -305,6 +305,7 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
 return start_addr;
 }
 
+#ifdef XEN_COMPAT_PHYSMAP
 static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
 {
 char path[80], value[17];
@@ -334,6 +335,12 @@ static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
 }
 return 0;
 }
+#else
+static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
+{
+return 0;
+}
+#endif
 
 static int xen_add_to_physmap(XenIOState *state,
   hwaddr start_addr,
@@ -368,6 +375,26 @@ go_physmap:
 DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
 start_addr, start_addr + size);
 
+mr_name = memory_region_name(mr);
+
+physmap = g_malloc(sizeof (XenPhysmap));
+
+physmap->start_addr = start_addr;
+physmap->size = size;
+physmap->name = mr_name;
+physmap->phys_offset = phys_offset;
+
+QLIST_INSERT_HEAD(&state->physmap, physmap, list);
+
+if (runstate_check(RUN_STATE_INMIGRATE)) {
+/* Now when we have a physmap entry we can remap a dummy mapping and change
+ * it to a real one of guest foreign memory. */
+uint8_t *p = xen_remap_cache_entry(phys_offset, size);
+assert(p && p == memory_region_get_ram_ptr(mr));
+
+return 0;
+}
+
 pfn = phys_offset >> TARGET_PAGE_BITS;
 start_gpfn = start_addr >> TARGET_PAGE_BITS;
 for (i = 0; i < size >> TARGET_PAGE_BITS; i++) {
@@ -382,21 +409,11 @@ go_physmap:
 }
 }
 
-mr_name = memory_region_name(mr);
-
-physmap = g_malloc(sizeof (XenPhysmap));
-
-physmap->start_addr = start_addr;
-physmap->size = size;
-physmap->name = mr_name;
-physmap->phys_offset = phys_offset;
-
-QLIST_INSERT_HEAD(&state->physmap, physmap, list);
-
 xc_domain_pin_memory_cacheattr(xen_xc, xen_domid,
start_addr >> TARGET_PAGE_BITS,
(start_addr + size - 1) >> TARGET_PAGE_BITS,
XEN_DOMCTL_MEM_CACHEATTR_WB);
+
 return xen_save_physmap(state, physmap);
 }
 
@@ -1158,6 +1175,7 @@ static void xen_exit_notifier(Notifier *n, void *data)
 xs_daemon_close(state->xenstore);
 }
 
+#ifdef XEN_COMPAT_PHYSMAP
 static void xen_read_physmap(XenIOState *state)
 {
 XenPhysmap *physmap = NULL;
@@ -1205,6 +1223,11 @@ static void xen_read_physmap(XenIOState *state)
 }
 free(entries);
 }
+#else
+static void xen_read_physmap(XenIOState *state)
+{
+}
+#endif
 
 static void xen_wakeup_notifier(Notifier *notifier, void *data)
 {
diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index 70a5cad..c04c5c9 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -80,6 +80,7 @@ extern xenforeignmemory_handle *xen_fmem;
 
 #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 41000
 
+#define XEN_COMPAT_PHYSMAP
 #define xenforeignmemory_map2(h, d, a, p, f, ps, ar, e) \
 xenforeignmemory_map(h, d, p, ps, ar, e)
 
-- 
2.7.4




[Xen-devel] [PATCH 3/4] xen/mapcache: introduce xen_remap_cache_entry()

2017-06-30 Thread Igor Druzhinin
This new call is trying to update a requested map cache entry
according to the changes in the physmap. The call is searching
for the entry, unmaps it, tries to translate the address and
maps again at the same place. If the mapping is dummy this call
will make it real.

This function makes use of a new xenforeignmemory_map2() call
with extended interface that was recently introduced in
libxenforeignmemory [1].

[1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 configure |  18 
 hw/i386/xen/xen-mapcache.c| 105 +++---
 include/hw/xen/xen_common.h   |   7 +++
 include/sysemu/xen-mapcache.h |   6 +++
 4 files changed, 130 insertions(+), 6 deletions(-)

diff --git a/configure b/configure
index c571ad1..ad6156b 100755
--- a/configure
+++ b/configure
@@ -2021,6 +2021,24 @@ EOF
 # Xen unstable
 elif
cat > $TMPC <<EOF
+#undef XC_WANT_COMPAT_MAP_FOREIGN_API
+#include <xenforeignmemory.h>
+int main(void) {
+  xenforeignmemory_handle *xfmem;
+
+  xfmem = xenforeignmemory_open(0, 0);
+  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
+
+  return 0;
+}
+EOF
+compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
+  then
+  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
+  xen_ctrl_version=41000
+  xen=yes
+elif
+cat > $TMPC <<EOF
 #undef XC_WANT_COMPAT_DEVICEMODEL_API
 #define __XEN_TOOLS__
 #include <xendevicemodel.h>
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 05050de..5d8d990 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -149,6 +149,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
 }
 
 static void xen_remap_bucket(MapCacheEntry *entry,
+ void *vaddr,
  hwaddr size,
  hwaddr address_index,
  bool dummy)
@@ -179,11 +180,11 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 }
 
 if (!dummy) {
-vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
-   PROT_READ|PROT_WRITE,
+vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid, vaddr,
+   PROT_READ|PROT_WRITE, 0,
nb_pfn, pfns, err);
 if (vaddr_base == NULL) {
-perror("xenforeignmemory_map");
+perror("xenforeignmemory_map2");
 exit(-1);
 }
 } else {
@@ -191,7 +192,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
  * We create dummy mappings where we are unable to create a foreign
+ * mapping immediately due to certain circumstances (i.e. on resume now)
  */
-vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
+vaddr_base = mmap(vaddr, size, PROT_READ|PROT_WRITE,
   MAP_ANON|MAP_SHARED, -1, 0);
 if (vaddr_base == NULL) {
 perror("mmap");
@@ -278,14 +279,14 @@ tryagain:
 if (!entry) {
 entry = g_malloc0(sizeof (MapCacheEntry));
 pentry->next = entry;
-xen_remap_bucket(entry, cache_size, address_index, dummy);
+xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(entry, cache_size, address_index, dummy);
+xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
 }
 }
 
@@ -482,3 +483,95 @@ void xen_invalidate_map_cache(void)
 
 mapcache_unlock();
 }
+
+static uint8_t *xen_remap_cache_entry_unlocked(hwaddr phys_addr, hwaddr size)
+{
+MapCacheEntry *entry, *pentry = NULL;
+hwaddr address_index;
+hwaddr address_offset;
+hwaddr cache_size = size;
+hwaddr test_bit_size;
+void *vaddr = NULL;
+uint8_t lock;
+
+address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
+address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
+
+/* test_bit_size is always a multiple of XC_PAGE_SIZE */
+if (size) {
+test_bit_size = size + (phys_addr & (XC_PAGE_SIZE - 1));
+if (test_bit_size % XC_PAGE_SIZE) {
+test_bit_size += XC_PAGE_SIZE - (test_bit_size % XC_PAGE_SIZE);
+}
+cache_size = size + address_offset;
+if (cache_size % MCACHE_BUCKET_SIZE) {
+cache_size += MCACHE_BUCKET_SIZE - (cache_size % MCACHE_BUCKET_SIZE);
+}
+} else {
+test_bit_size = XC_PAGE_SIZE;
+cache_size = MCACHE_BUCKET_SIZE;
+}
+
+/* Search for the requested map cache entry to invalidate */
+entry = &mapcache->entry[address_index % mapcache->nr_buckets];

[Xen-devel] [PATCH 1/4] xen: move physmap saving into a separate function

2017-06-30 Thread Igor Druzhinin
Non-functional change.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 hw/i386/xen/xen-hvm.c | 57 ---
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index cffa7e2..d259cf7 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -305,6 +305,36 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
 return start_addr;
 }
 
+static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
+{
+char path[80], value[17];
+
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
+xen_domid, (uint64_t)physmap->phys_offset);
+snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->start_addr);
+if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+return -1;
+}
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
+xen_domid, (uint64_t)physmap->phys_offset);
+snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->size);
+if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
+return -1;
+}
+if (physmap->name) {
+snprintf(path, sizeof(path),
+"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
+xen_domid, (uint64_t)physmap->phys_offset);
+if (!xs_write(state->xenstore, 0, path,
+  physmap->name, strlen(physmap->name))) {
+return -1;
+}
+}
+return 0;
+}
+
 static int xen_add_to_physmap(XenIOState *state,
   hwaddr start_addr,
   ram_addr_t size,
@@ -316,7 +346,6 @@ static int xen_add_to_physmap(XenIOState *state,
 XenPhysmap *physmap = NULL;
 hwaddr pfn, start_gpfn;
 hwaddr phys_offset = memory_region_get_ram_addr(mr);
-char path[80], value[17];
 const char *mr_name;
 
 if (get_physmapping(state, start_addr, size)) {
@@ -368,31 +397,7 @@ go_physmap:
start_addr >> TARGET_PAGE_BITS,
(start_addr + size - 1) >> TARGET_PAGE_BITS,
XEN_DOMCTL_MEM_CACHEATTR_WB);
-
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/start_addr",
-xen_domid, (uint64_t)phys_offset);
-snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)start_addr);
-if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
-return -1;
-}
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
-xen_domid, (uint64_t)phys_offset);
-snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)size);
-if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
-return -1;
-}
-if (mr_name) {
-snprintf(path, sizeof(path),
-"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
-xen_domid, (uint64_t)phys_offset);
-if (!xs_write(state->xenstore, 0, path, mr_name, strlen(mr_name))) {
-return -1;
-}
-}
-
-return 0;
+return xen_save_physmap(state, physmap);
 }
 
 static int xen_remove_from_physmap(XenIOState *state,
-- 
2.7.4




[Xen-devel] [PATCH 0/4] xen: don't save/restore the physmap on VM save/restore

2017-06-30 Thread Igor Druzhinin
Saving/restoring the physmap to/from xenstore was introduced to
QEMU mainly in order to work around the VRAM region restore issue.
The sequence of restore operations implies that we should know
the effective guest VRAM address *before* we have the VRAM region
restored (which happens later). Unfortunately, in Xen environment
VRAM memory does actually belong to a guest - not QEMU itself -
which means the position of this region is unknown beforehand and
can't be mapped into QEMU address space immediately.

Previously, recreating xenstore keys, holding the physmap, by the
toolstack helped to get this information in place at the right
moment ready to be consumed by QEMU to map the region properly.
But using xenstore for it has certain disadvantages: toolstack
needs to be aware of these keys and save/restore them accordingly;
accessing xenstore requires extra privileges which hinders QEMU
sandboxing.

The previous attempt to get rid of that was to remember all the
VRAM pointers during QEMU initialization phase and then update
them all at once when an actual foreign mapping is established.
Unfortunately, this approach worked only for VRAM and only for
a predefined set of devices - stdvga and cirrus. QXL and other
possible future devices using a moving emulated MMIO region
would be equally broken.

The new approach leverages xenforeignmemory_map2() call recently
introduced in libxenforeignmemory. It allows to create a dummy
anonymous mapping for QEMU during its initialization and change
it to a real one later during machine state restore.

Igor Druzhinin (4):
  xen: move physmap saving into a separate function
  xen/mapcache: add an ability to create dummy mappings
  xen/mapcache: introduce xen_remap_cache_entry()
  xen: don't use xenstore to save/restore physmap anymore

 configure |  18 ++
 hw/i386/xen/xen-hvm.c | 100 
 hw/i386/xen/xen-mapcache.c| 129 +++---
 include/hw/xen/xen_common.h   |   8 +++
 include/sysemu/xen-mapcache.h |   6 ++
 5 files changed, 217 insertions(+), 44 deletions(-)

-- 
2.7.4




[Xen-devel] [PATCH] tools/libxenforeignmemory: add xenforeignmemory_map2 function

2017-06-28 Thread Igor Druzhinin
The new function repeats the behavior of the first version, except
that it has an extended list of arguments, which are subsequently
passed to the mmap() call.

This is needed for QEMU deprivileging.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
---
 tools/libs/foreignmemory/Makefile   |  2 +-
 tools/libs/foreignmemory/compat.c   |  6 +++---
 tools/libs/foreignmemory/core.c | 18 +-
 tools/libs/foreignmemory/freebsd.c  |  7 +++
 tools/libs/foreignmemory/include/xenforeignmemory.h | 12 
 tools/libs/foreignmemory/libxenforeignmemory.map|  4 
 tools/libs/foreignmemory/linux.c|  7 +++
 tools/libs/foreignmemory/minios.c   |  4 ++--
 tools/libs/foreignmemory/netbsd.c   |  6 +++---
 tools/libs/foreignmemory/private.h  |  7 ---
 tools/libs/foreignmemory/solaris.c  |  5 ++---
 11 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/tools/libs/foreignmemory/Makefile 
b/tools/libs/foreignmemory/Makefile
index 2f2caa1..5e93ee7 100644
--- a/tools/libs/foreignmemory/Makefile
+++ b/tools/libs/foreignmemory/Makefile
@@ -2,7 +2,7 @@ XEN_ROOT = $(CURDIR)/../../..
 include $(XEN_ROOT)/tools/Rules.mk
 
 MAJOR= 1
-MINOR= 1
+MINOR= 2
 SHLIB_LDFLAGS += -Wl,--version-script=libxenforeignmemory.map
 
 CFLAGS   += -Werror -Wmissing-prototypes
diff --git a/tools/libs/foreignmemory/compat.c 
b/tools/libs/foreignmemory/compat.c
index b79ec1a..5f730ca 100644
--- a/tools/libs/foreignmemory/compat.c
+++ b/tools/libs/foreignmemory/compat.c
@@ -21,8 +21,8 @@
 
 #include "private.h"
 
-void *osdep_xenforeignmemory_map(xenforeignmemory_handle *fmem,
- uint32_t dom, int prot, size_t num,
+void *osdep_xenforeignmemory_map(xenforeignmemory_handle *fmem, uint32_t dom,
+ void *addr, int prot, int flags, size_t num,
  const xen_pfn_t arr[/*num*/], int 
err[/*num*/])
 {
 xen_pfn_t *pfn;
@@ -41,7 +41,7 @@ void *osdep_xenforeignmemory_map(xenforeignmemory_handle 
*fmem,
 }
 
 memcpy(pfn, arr, num * sizeof(*arr));
-ret = osdep_map_foreign_batch(fmem, dom, prot, pfn, num);
+ret = osdep_map_foreign_batch(fmem, dom, addr, prot, flags, pfn, num);
 
 if (ret) {
 for (i = 0; i < num; ++i)
diff --git a/tools/libs/foreignmemory/core.c b/tools/libs/foreignmemory/core.c
index 0ebd429..a6897dc 100644
--- a/tools/libs/foreignmemory/core.c
+++ b/tools/libs/foreignmemory/core.c
@@ -63,10 +63,10 @@ int xenforeignmemory_close(xenforeignmemory_handle *fmem)
 return rc;
 }
 
-void *xenforeignmemory_map(xenforeignmemory_handle *fmem,
-   uint32_t dom, int prot,
-   size_t num,
-   const xen_pfn_t arr[/*num*/], int err[/*num*/])
+void *xenforeignmemory_map2(xenforeignmemory_handle *fmem,
+uint32_t dom, void *addr,
+int prot, int flags, size_t num,
+const xen_pfn_t arr[/*num*/], int err[/*num*/])
 {
 void *ret;
 int *err_to_free = NULL;
@@ -77,7 +77,7 @@ void *xenforeignmemory_map(xenforeignmemory_handle *fmem,
 if ( err == NULL )
 return NULL;
 
-ret = osdep_xenforeignmemory_map(fmem, dom, prot, num, arr, err);
+ret = osdep_xenforeignmemory_map(fmem, dom, addr, prot, flags, num, arr, 
err);
 
 if ( ret && err_to_free )
 {
@@ -100,6 +100,14 @@ void *xenforeignmemory_map(xenforeignmemory_handle *fmem,
 return ret;
 }
 
+void *xenforeignmemory_map(xenforeignmemory_handle *fmem,
+   uint32_t dom, int prot,
+   size_t num,
+   const xen_pfn_t arr[/*num*/], int err[/*num*/])
+{
+return xenforeignmemory_map2(fmem, dom, NULL, prot, 0, num, arr, err);
+}
+
 int xenforeignmemory_unmap(xenforeignmemory_handle *fmem,
void *addr, size_t num)
 {
diff --git a/tools/libs/foreignmemory/freebsd.c 
b/tools/libs/foreignmemory/freebsd.c
index f6cd08c..dec4474 100644
--- a/tools/libs/foreignmemory/freebsd.c
+++ b/tools/libs/foreignmemory/freebsd.c
@@ -55,16 +55,15 @@ int osdep_xenforeignmemory_close(xenforeignmemory_handle 
*fmem)
 }
 
 void *osdep_xenforeignmemory_map(xenforeignmemory_handle *fmem,
- uint32_t dom, int prot,
- size_t num,
+ uint32_t dom, void *addr,
+ int prot, int flags, size_t num,
  const xen_pfn_t arr[/*num*/], int 
err[/*num*/])
 {
 int fd = fmem->fd;
 privcmd_mmapbatch_t ioctlx;
-

Re: [Xen-devel] Fwd: VM Live Migration with Local Storage

2017-06-20 Thread Igor Druzhinin
On 12/06/17 04:16, Bruno Alvisio wrote:
> Hello,
> 
> I think it would be beneficial to add local disk migration feature for
> ‘blkback' backend since it is one of the mostly used backends. I would
> like to start a discussion about the design of the machinery needed to
> achieve this feature.
> 
> ===
> Objective
> Add a feature to migrate VMs that have local storage and use the blkback
> iface.
> ===
> 
> ===
> User Interface
> Add a cmd line option in “xl migrate” command to specify if local disks
> need to be copied to the destination node.
> ===
> 
> ===
> Design
> 
>  1. As part of the libxl_domain_suspend, the “disk mirroring machinery”
> starts an asynchronous job that copies the disks blocks from source
> to the destination.
>  2. The protocol to copy the disks should resemble the one used for
> memory copy:
> 
>   * Do first initial copy of the disk.
>   * Check of sectors that have been written since copy started. For
> this, the blkback driver should be aware that migration of disk is
> happening and in this case forward the write request to
> the “migration machinery” so that a record of dirty blocks are logged.
>   * Migration machinery copies “dirty” blocks until convergence.

Be careful with that. You don't really want to merge block and memory
live migrations. They should be linked but proceed independently since
we don't want to have the last iteration of memory transfer stalled
waiting for disk convergence. Some mix of pre-copy and post-copy
approach might be suitable.

Igor

>   * Duplicate all the disk writes/reads to both disks in source and
> destinations node while VM is being suspended.
> 
> 
> Block Diagram
> 
>+—--+
>|  VM   |
>+---+
>   |
>   | I/O Write
>   |
>   V
> +--+   +---+   +-+
> |  blkback | > |  Source   |  sectors Stream   | Destination |
> +--+   |  mirror   |-->|   mirror|
>   || machinery |   I/O Writes  |  machinery  |
>   |+---+   +-+
>   |  |
>   |  |
>   | To I/O block layer   |
>   |  |
>   V  V
> +--+   +-+
> |   disk   |   |   Mirrored  |
> +--+   | Disk|
>+-+
> 
> 
> ==
> Initial Questions
> 
>  1. Is it possible to leverage the current design of QEMU for drive
> mirroring for Xen?
>  2. What is the best place to implement this protocol? As part of Xen or
> the kernel?
>  3. Is it possible to use the same stream currently used for migrating
> the memory to also migrate the disk blocks?
> 
> 
> Any guidance/feedback for a more specific design is greatly appreciated.
> 
> Thanks,
> 
> Bruno
> 
> On Wed, Feb 22, 2017 at 5:00 AM, Wei Liu wrote:
> 
> Hi Bruno
> 
> Thanks for your interest.
> 
> On Tue, Feb 21, 2017 at 10:34:45AM -0800, Bruno Alvisio wrote:
> > Hello,
> >
> > I have been to doing some research and as far as I know XEN supports
> > Live Migration
> > of VMs that only have shared storage. (i.e. iSCSI) If the VM has been
> > booted with local storage it cannot be live migrated.
> > QEMU seems to support live migration with local storage (I have tested 
> using
> > 'virsh migrate with the '--storage-copy-all' option)
> >
> > I am wondering if this still true in the latest XEN release. Are there 
> plans
> > to add this functionality in future releases? I would be interested in
> > contributing to the Xen Project by adding this functionality.
> >
> 
> No plan at the moment.
> 
> Xen supports a wide variety of disk backends. QEMU is one of them. The
> others are blktap (not upstreamed yet) and in-kernel blkback. The latter
> two don't have the capability to copy local storage to the remote end.
> 
> That said, I think it would be valuable to have such capability for QEMU
> backed disks. We also need to design the machinery so that other
> backends can be made to do the same thing in the future.
> 
> If you want to undertake this project, I suggest you setup a Xen system,
> read xl / libxl source code under tools directory and understand how
> everything is put together. Reading source code could be daunting at
> times, so don't hesitate to ask for pointers. After you have the big
> 

Re: [Xen-devel] [PATCH] firmware/vgabios: Port PCI based VBE LFB discovery method from QEMU fork

2017-06-13 Thread Igor Druzhinin
On 13/06/17 15:04, Andrew Cooper wrote:
> On 10/05/17 15:31, Igor Druzhinin wrote:
>> QEMU-traditional implements non-standard VBE registers for getting LFB
>> physical address from inside of VGA BIOS code. QEMU doesn't have
>> those registers implemented and returns 0 when an HVM guest is trying to
>> access them from the existing ROMBIOS code. This eventually leads to
>> a triple fault inside a guest which happened to use ROMBIOS instead of
>> SeaBIOS when in stdvga mode.
>>
>> QEMU maintains its own fork of VGA BIOS where the VBE LFB discovery is
>> implemented through a regular PCI BAR reading. In order to support that
>> we need to build a PCI compliant VGA BIOS version for stdvga and include
>> it into ROMBIOS instead of the old one.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
> 
> How much of this is ported from existing changes elsewhere?
> 

Only the ASM functions below for PCI config-space access are ported
from QEMU's vgabios fork. If I need to incorporate this somehow into the
commit message, could you point me to an example of how to do it properly?

>> ---
>> CC: Jan Beulich <jbeul...@suse.com>
>> CC: Andrew Cooper <andrew.coop...@citrix.com>
>> CC: Ian Jackson <ian.jack...@eu.citrix.com>
>> CC: Wei Liu <wei.l...@citrix.com>
>> ---
>>  tools/firmware/hvmloader/Makefile |  2 +-
>>  tools/firmware/vgabios/Makefile   | 29 +++--
>>  tools/firmware/vgabios/vbe.c  |  9 ++
>>  tools/firmware/vgabios/vgabios.c  | 68 
>> +++
>>  4 files changed, 105 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/firmware/hvmloader/Makefile 
>> b/tools/firmware/hvmloader/Makefile
>> index 80d7b44..5f6eacd 100644
>> --- a/tools/firmware/hvmloader/Makefile
>> +++ b/tools/firmware/hvmloader/Makefile
>> @@ -45,7 +45,7 @@ CIRRUSVGA_DEBUG ?= n
>>  ROMBIOS_DIR := ../rombios
>>  
>>  ifeq ($(CONFIG_ROMBIOS),y)
>> -STDVGA_ROM:= ../vgabios/VGABIOS-lgpl-latest.bin
>> +STDVGA_ROM:= ../vgabios/VGABIOS-lgpl-latest.stdvga.bin
>>  ifeq ($(CIRRUSVGA_DEBUG),y)
>>  CIRRUSVGA_ROM := ../vgabios/VGABIOS-lgpl-latest.cirrus.debug.bin
>>  else
>> diff --git a/tools/firmware/vgabios/Makefile 
>> b/tools/firmware/vgabios/Makefile
>> index 3284812..0f4026e 100644
>> --- a/tools/firmware/vgabios/Makefile
>> +++ b/tools/firmware/vgabios/Makefile
>> @@ -11,7 +11,7 @@ RELVERS = `pwd | sed "s-.*/--" | sed "s/vgabios//" | sed 
>> "s/-//"`
>>  VGABIOS_DATE = "-DVGABIOS_DATE=\"$(VGABIOS_REL_DATE)\""
>>  
>>  .PHONY: all
>> -all: bios cirrus-bios
>> +all: bios cirrus-bios stdvga-bios
>>  
>>  .PHONY: bios
>>  bios: biossums vgabios.bin vgabios.debug.bin 
>> @@ -19,6 +19,9 @@ bios: biossums vgabios.bin vgabios.debug.bin
>>  .PHONY: cirrus-bios
>>  cirrus-bios: vgabios-cirrus.bin vgabios-cirrus.debug.bin
>>  
>> +.PHONY: stdvga-bios
>> +stdvga-bios: vgabios-stdvga.bin vgabios-stdvga.debug.bin
>> +
>>  .PHONY: clean
>>  clean:
>>  rm -f  biossums vbetables-gen vbetables.h *.o *.s *.ld86 \
>> @@ -30,13 +33,15 @@ distclean: clean
>>  
>>  .PHONY: release
>>  release: 
>> -VGABIOS_VERS=\"-DVGABIOS_VERS=\\\"$(RELVERS)\\\"\" make bios cirrus-bios
>> +VGABIOS_VERS=\"-DVGABIOS_VERS=\\\"$(RELVERS)\\\"\" make bios 
>> cirrus-bios stdvga-bios
>>  /bin/rm -f  *.o *.s *.ld86 \
>>temp.awk.* vgabios.*.orig _vgabios_.*.c core *.bak .#*
>>  cp VGABIOS-lgpl-latest.bin ../$(RELEASE).bin
>>  cp VGABIOS-lgpl-latest.debug.bin ../$(RELEASE).debug.bin
>>  cp VGABIOS-lgpl-latest.cirrus.bin ../$(RELEASE).cirrus.bin
>>  cp VGABIOS-lgpl-latest.cirrus.debug.bin ../$(RELEASE).cirrus.debug.bin
>> +cp VGABIOS-lgpl-latest.stdvga.bin ../$(RELEASE).stdvga.bin
>> +cp VGABIOS-lgpl-latest.stdvga.debug.bin ../$(RELEASE).stdvga.debug.bin
>>  tar czvf ../$(RELEASE).tgz --exclude CVS -C .. $(RELEASE)/
>>  
>>  vgabios.bin: biossums vgabios.c vgabios.h vgafonts.h vgatables.h vbe.h 
>> vbe.c vbetables.h
>> @@ -59,6 +64,26 @@ vgabios.debug.bin: biossums vgabios.c vgabios.h 
>> vgafonts.h vgatables.h vbe.h vbe
>>  ./biossums VGABIOS-lgpl-latest.debug.bin
>>  ls -l VGABIOS-lgpl-latest.debug.bin
>>  
>> +vgabios-stdvga.bin: biossums vgabios.c vgabios.h vgafonts.h vgatables.h 
>> vbe.h vbe.c vbetables.h
>> +$(GCC) -E -P vgabios.c $(VGABIOS_VERS) -DVBE -DPCIBIOS -DPCI_VID=0x12

[Xen-devel] [PATCH] firmware/vgabios: Port PCI based VBE LFB discovery method from QEMU fork

2017-05-10 Thread Igor Druzhinin
QEMU-traditional implements non-standard VBE registers for getting the
LFB physical address from inside the VGA BIOS code. QEMU doesn't have
those registers implemented and returns 0 when an HVM guest tries to
access them from the existing ROMBIOS code. This eventually leads to
a triple fault inside a guest which happened to use ROMBIOS instead of
SeaBIOS when in stdvga mode.

QEMU maintains its own fork of the VGA BIOS where VBE LFB discovery is
implemented through a regular PCI BAR read. In order to support that
we need to build a PCI-compliant VGA BIOS version for stdvga and include
it into ROMBIOS instead of the old one.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
CC: Jan Beulich <jbeul...@suse.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
CC: Ian Jackson <ian.jack...@eu.citrix.com>
CC: Wei Liu <wei.l...@citrix.com>
---
 tools/firmware/hvmloader/Makefile |  2 +-
 tools/firmware/vgabios/Makefile   | 29 +++--
 tools/firmware/vgabios/vbe.c  |  9 ++
 tools/firmware/vgabios/vgabios.c  | 68 +++
 4 files changed, 105 insertions(+), 3 deletions(-)

diff --git a/tools/firmware/hvmloader/Makefile 
b/tools/firmware/hvmloader/Makefile
index 80d7b44..5f6eacd 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -45,7 +45,7 @@ CIRRUSVGA_DEBUG ?= n
 ROMBIOS_DIR := ../rombios
 
 ifeq ($(CONFIG_ROMBIOS),y)
-STDVGA_ROM:= ../vgabios/VGABIOS-lgpl-latest.bin
+STDVGA_ROM:= ../vgabios/VGABIOS-lgpl-latest.stdvga.bin
 ifeq ($(CIRRUSVGA_DEBUG),y)
 CIRRUSVGA_ROM := ../vgabios/VGABIOS-lgpl-latest.cirrus.debug.bin
 else
diff --git a/tools/firmware/vgabios/Makefile b/tools/firmware/vgabios/Makefile
index 3284812..0f4026e 100644
--- a/tools/firmware/vgabios/Makefile
+++ b/tools/firmware/vgabios/Makefile
@@ -11,7 +11,7 @@ RELVERS = `pwd | sed "s-.*/--" | sed "s/vgabios//" | sed 
"s/-//"`
 VGABIOS_DATE = "-DVGABIOS_DATE=\"$(VGABIOS_REL_DATE)\""
 
 .PHONY: all
-all: bios cirrus-bios
+all: bios cirrus-bios stdvga-bios
 
 .PHONY: bios
 bios: biossums vgabios.bin vgabios.debug.bin 
@@ -19,6 +19,9 @@ bios: biossums vgabios.bin vgabios.debug.bin
 .PHONY: cirrus-bios
 cirrus-bios: vgabios-cirrus.bin vgabios-cirrus.debug.bin
 
+.PHONY: stdvga-bios
+stdvga-bios: vgabios-stdvga.bin vgabios-stdvga.debug.bin
+
 .PHONY: clean
 clean:
rm -f  biossums vbetables-gen vbetables.h *.o *.s *.ld86 \
@@ -30,13 +33,15 @@ distclean: clean
 
 .PHONY: release
 release: 
-   VGABIOS_VERS=\"-DVGABIOS_VERS=\\\"$(RELVERS)\\\"\" make bios cirrus-bios
+   VGABIOS_VERS=\"-DVGABIOS_VERS=\\\"$(RELVERS)\\\"\" make bios 
cirrus-bios stdvga-bios
/bin/rm -f  *.o *.s *.ld86 \
   temp.awk.* vgabios.*.orig _vgabios_.*.c core *.bak .#*
cp VGABIOS-lgpl-latest.bin ../$(RELEASE).bin
cp VGABIOS-lgpl-latest.debug.bin ../$(RELEASE).debug.bin
cp VGABIOS-lgpl-latest.cirrus.bin ../$(RELEASE).cirrus.bin
cp VGABIOS-lgpl-latest.cirrus.debug.bin ../$(RELEASE).cirrus.debug.bin
+   cp VGABIOS-lgpl-latest.stdvga.bin ../$(RELEASE).stdvga.bin
+   cp VGABIOS-lgpl-latest.stdvga.debug.bin ../$(RELEASE).stdvga.debug.bin
tar czvf ../$(RELEASE).tgz --exclude CVS -C .. $(RELEASE)/
 
 vgabios.bin: biossums vgabios.c vgabios.h vgafonts.h vgatables.h vbe.h vbe.c 
vbetables.h
@@ -59,6 +64,26 @@ vgabios.debug.bin: biossums vgabios.c vgabios.h vgafonts.h 
vgatables.h vbe.h vbe
./biossums VGABIOS-lgpl-latest.debug.bin
ls -l VGABIOS-lgpl-latest.debug.bin
 
+vgabios-stdvga.bin: biossums vgabios.c vgabios.h vgafonts.h vgatables.h vbe.h 
vbe.c vbetables.h
+   $(GCC) -E -P vgabios.c $(VGABIOS_VERS) -DVBE -DPCIBIOS -DPCI_VID=0x1234 
-DPCI_DID=0x $(VGABIOS_DATE) > _vgabios-stdvga_.c
+   $(BCC) -o vgabios-stdvga.s -C-c -D__i86__ -S -0 _vgabios-stdvga_.c
+   sed -e 's/^\.text//' -e 's/^\.data//' vgabios-stdvga.s > 
_vgabios-stdvga_.s
+   $(AS86) _vgabios-stdvga_.s -b vgabios-stdvga.bin -u -w- -g -0 -j -O -l 
vgabios-stdvga.txt
+   rm -f _vgabios-stdvga_.s _vgabios-stdvga_.c vgabios-stdvga.s
+   cp vgabios-stdvga.bin VGABIOS-lgpl-latest.stdvga.bin
+   ./biossums VGABIOS-lgpl-latest.stdvga.bin
+   ls -l VGABIOS-lgpl-latest.stdvga.bin
+
+vgabios-stdvga.debug.bin: biossums vgabios.c vgabios.h vgafonts.h vgatables.h 
vbe.h vbe.c vbetables.h
+   $(GCC) -E -P vgabios.c $(VGABIOS_VERS) -DVBE -DPCIBIOS -DPCI_VID=0x1234 
-DPCI_DID=0x -DDEBUG $(VGABIOS_DATE) > _vgabios-stdvga-debug_.c
+   $(BCC) -o vgabios-stdvga-debug.s -C-c -D__i86__ -S -0 
_vgabios-stdvga-debug_.c
+   sed -e 's/^\.text//' -e 's/^\.data//' vgabios-stdvga-debug.s > 
_vgabios-stdvga-debug_.s
+   $(AS86) _vgabios-stdvga-debug_.s -b vgabios-stdvga-debug.bin -u -w- -g 
-0 -j -O -l vgabios-stdvga-debug.txt
+   rm -f _vgabios-stdvga-debug_.s _vgabios-stdvga-

[Xen-devel] [PATCH v2 for-4.9] x86/mm: Fix incorrect unmapping of 2MB and 1GB pages

2017-05-10 Thread Igor Druzhinin
The same set of functions is used to set as well as to clean
P2M entries, except that for clean operations INVALID_MFN (~0UL)
is passed as a parameter. Unfortunately, when calculating an
appropriate target order for a particular mapping, INVALID_MFN
is not taken into account, which leads to a 4K page target order
being set each time, even for 2MB and 1GB mappings. This eventually
breaks an EPT structure down irreversibly into 4K mappings, which
prevents subsequent high-order mappings of this area.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
Changes in v2:
* changed mistakenly used mfn_valid() to mfn_eq()
* aggregated gfn-mfn mask into one

CC: Jun Nakajima <jun.nakaj...@intel.com>
CC: Kevin Tian <kevin.t...@intel.com>
CC: George Dunlap <george.dun...@eu.citrix.com>
CC: Jan Beulich <jbeul...@suse.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>

Bugfix intended for 4.9 release.
---
 xen/arch/x86/mm/p2m-ept.c |  3 ++-
 xen/arch/x86/mm/p2m.c | 11 +++
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index f37a1f2..f98121d 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -681,6 +681,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, 
mfn_t mfn,
 ept_entry_t *table, *ept_entry = NULL;
 unsigned long gfn_remainder = gfn;
 unsigned int i, target = order / EPT_TABLE_ORDER;
+unsigned long fn_mask = !mfn_eq(mfn, INVALID_MFN) ? (gfn | mfn_x(mfn)) : 
gfn;
 int ret, rc = 0;
 bool_t entry_written = 0;
 bool_t direct_mmio = (p2mt == p2m_mmio_direct);
@@ -701,7 +702,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, 
mfn_t mfn,
  * 2. gfn not exceeding guest physical address width.
  * 3. passing a valid order.
  */
-if ( ((gfn | mfn_x(mfn)) & ((1UL << order) - 1)) ||
+if ( (fn_mask & ((1UL << order) - 1)) ||
  ((u64)gfn >> ((ept->wl + 1) * EPT_TABLE_ORDER)) ||
  (order % EPT_TABLE_ORDER) )
 return -EINVAL;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index ae70a92..e902f1a 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -543,12 +543,15 @@ int p2m_set_entry(struct p2m_domain *p2m, unsigned long 
gfn, mfn_t mfn,
 while ( todo )
 {
 if ( hap_enabled(d) )
-order = (!((gfn | mfn_x(mfn) | todo) &
-   ((1ul << PAGE_ORDER_1G) - 1)) &&
+{
+unsigned long fn_mask = !mfn_eq(mfn, INVALID_MFN) ?
+ (gfn | mfn_x(mfn) | todo) : (gfn | todo);
+
+order = (!(fn_mask & ((1ul << PAGE_ORDER_1G) - 1)) &&
  hap_has_1gb) ? PAGE_ORDER_1G :
-(!((gfn | mfn_x(mfn) | todo) &
-   ((1ul << PAGE_ORDER_2M) - 1)) &&
+(!(fn_mask & ((1ul << PAGE_ORDER_2M) - 1)) &&
  hap_has_2mb) ? PAGE_ORDER_2M : PAGE_ORDER_4K;
+}
 else
 order = 0;
 
-- 
2.7.4




Re: [Xen-devel] [PATCH for-4.9] x86/mm: Fix incorrect unmapping of 2MB and 1GB pages

2017-05-10 Thread Igor Druzhinin
On 10/05/17 11:51, George Dunlap wrote:
> On 10/05/17 11:26, Jan Beulich wrote:
> On 10.05.17 at 11:43,  wrote:
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -681,6 +681,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long 
>>> gfn, mfn_t mfn,
>>>  ept_entry_t *table, *ept_entry = NULL;
>>>  unsigned long gfn_remainder = gfn;
>>>  unsigned int i, target = order / EPT_TABLE_ORDER;
>>> +unsigned long mfn_mask = mfn_valid(mfn) ? mfn_x(mfn) : 0;
>>
>> Aiui MMIO pages will come here too, so an mfn_valid() check here
>> (and below) is too lax.
> 
> The resulting order will never be higher than the order passed in by the
> caller.  Assuming that the caller is setting an entire 2MiB (or 1GiB)
> region as MMIO, is it not valid to set a 2MiB or 1GiB entry as such?
> The code seems to be written in such a way that such entries are expected.
> 
>  -George
> 

Using mfn_valid() is my mistake here. I initially used mfn_eq(mfn,
INVALID_MFN) but then mixed them up eventually.

Igor




[Xen-devel] [PATCH for-4.9] x86/mm: Fix incorrect unmapping of 2MB and 1GB pages

2017-05-10 Thread Igor Druzhinin
The same set of functions is used to set as well as to clean
P2M entries, except that for clean operations INVALID_MFN (~0UL)
is passed as a parameter. Unfortunately, when calculating an
appropriate target order for a particular mapping, INVALID_MFN
is not taken into account, which leads to a 4K page target order
being set each time, even for 2MB and 1GB mappings. This eventually
breaks an EPT structure down irreversibly into 4K mappings, which
prevents subsequent high-order mappings of this area.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
CC: Jun Nakajima <jun.nakaj...@intel.com>
CC: Kevin Tian <kevin.t...@intel.com>
CC: George Dunlap <george.dun...@eu.citrix.com>
CC: Jan Beulich <jbeul...@suse.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>

Bugfix intended for 4.9 release.
---
 xen/arch/x86/mm/p2m-ept.c | 3 ++-
 xen/arch/x86/mm/p2m.c | 8 ++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index f37a1f2..8d82097 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -681,6 +681,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, 
mfn_t mfn,
 ept_entry_t *table, *ept_entry = NULL;
 unsigned long gfn_remainder = gfn;
 unsigned int i, target = order / EPT_TABLE_ORDER;
+unsigned long mfn_mask = mfn_valid(mfn) ? mfn_x(mfn) : 0;
 int ret, rc = 0;
 bool_t entry_written = 0;
 bool_t direct_mmio = (p2mt == p2m_mmio_direct);
@@ -701,7 +702,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, 
mfn_t mfn,
  * 2. gfn not exceeding guest physical address width.
  * 3. passing a valid order.
  */
-if ( ((gfn | mfn_x(mfn)) & ((1UL << order) - 1)) ||
+if ( ((gfn | mfn_mask) & ((1UL << order) - 1)) ||
  ((u64)gfn >> ((ept->wl + 1) * EPT_TABLE_ORDER)) ||
  (order % EPT_TABLE_ORDER) )
 return -EINVAL;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index ae70a92..fd57d41 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -536,6 +536,7 @@ int p2m_set_entry(struct p2m_domain *p2m, unsigned long 
gfn, mfn_t mfn,
 struct domain *d = p2m->domain;
 unsigned long todo = 1ul << page_order;
 unsigned int order;
+unsigned long mfn_mask;
 int set_rc, rc = 0;
 
 ASSERT(gfn_locked_by_me(p2m, gfn));
@@ -543,12 +544,15 @@ int p2m_set_entry(struct p2m_domain *p2m, unsigned long 
gfn, mfn_t mfn,
 while ( todo )
 {
 if ( hap_enabled(d) )
-order = (!((gfn | mfn_x(mfn) | todo) &
+{
+mfn_mask = mfn_valid(mfn) ? mfn_x(mfn) : 0;
+order = (!((gfn | mfn_mask | todo) &
((1ul << PAGE_ORDER_1G) - 1)) &&
  hap_has_1gb) ? PAGE_ORDER_1G :
-(!((gfn | mfn_x(mfn) | todo) &
+(!((gfn | mfn_mask | todo) &
((1ul << PAGE_ORDER_2M) - 1)) &&
  hap_has_2mb) ? PAGE_ORDER_2M : PAGE_ORDER_4K;
+}
 else
 order = 0;
 
-- 
2.7.4




Re: [Xen-devel] [PATCH v5] xen: don't save/restore the physmap on VM save/restore

2017-03-16 Thread Igor Druzhinin
On 16/03/17 12:54, Igor Druzhinin wrote:
> On 16/03/17 12:26, Anthony PERARD wrote:
>> On Wed, Mar 15, 2017 at 04:01:19PM +0000, Igor Druzhinin wrote:
>>> Saving/restoring the physmap to/from xenstore was introduced to
>>> QEMU majorly in order to cover up the VRAM region restore issue.
>>> The sequence of restore operations implies that we should know
>>> the effective guest VRAM address *before* we have the VRAM region
>>> restored (which happens later). Unfortunately, in Xen environment
>>> VRAM memory does actually belong to a guest - not QEMU itself -
>>> which means the position of this region is unknown beforehand and
>>> can't be mapped into QEMU address space immediately.
>>>
>>> Previously, recreating xenstore keys, holding the physmap, by the
>>> toolstack helped to get this information in place at the right
>>> moment ready to be consumed by QEMU to map the region properly.
>>>
>>> The extraneous complexity of having those keys transferred by the
>>> toolstack and unnecessary redundancy prompted us to propose a
>>> solution which doesn't require any extra data in xenstore. The idea
>>> is to defer the VRAM region mapping till the point we actually know
>>> the effective address and able to map it. To that end, we initially
>>> just skip the mapping request for the framebuffer if we unable to
>>> map it now. Then, after the memory region restore phase, we perform
>>> the mapping again, this time successfully, and update the VRAM region
>>> metadata accordingly.
>>>
>>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>>
>> I've tried to migrate a guest with this patch, but once migrated, the
>> screen is black (via VNC, keyboard is working fine).
>>
>> I haven't try to migrate a guest from QEMU without this patch to a QEMU
>> with it.
>>
> 
> Hmm. It works for me - I've tried to migrate between identical QEMUs
> with this patch on localhost. Save/restore also works fine.
> 
> What do you mean 'the screen is black'? Could you describe your actions
> so I could try to reproduce it?

OK, I've managed to track down the issue - starting from v4 the patch
doesn't work for cirrus. The reason is that the post_load handler is
different for cirrus and doesn't call the parent handler from the
common VGA code.

I managed to fix it by updating the corresponding handler, duplicating
the code. But is that a good solution? Would it be better to have the
common handler called in this function instead?

> 
> Igor
> 
> 



Re: [Xen-devel] [PATCH v5] xen: don't save/restore the physmap on VM save/restore

2017-03-16 Thread Igor Druzhinin
On 16/03/17 12:26, Anthony PERARD wrote:
> On Wed, Mar 15, 2017 at 04:01:19PM +0000, Igor Druzhinin wrote:
>> Saving/restoring the physmap to/from xenstore was introduced to
>> QEMU majorly in order to cover up the VRAM region restore issue.
>> The sequence of restore operations implies that we should know
>> the effective guest VRAM address *before* we have the VRAM region
>> restored (which happens later). Unfortunately, in Xen environment
>> VRAM memory does actually belong to a guest - not QEMU itself -
>> which means the position of this region is unknown beforehand and
>> can't be mapped into QEMU address space immediately.
>>
>> Previously, recreating xenstore keys, holding the physmap, by the
>> toolstack helped to get this information in place at the right
>> moment ready to be consumed by QEMU to map the region properly.
>>
>> The extraneous complexity of having those keys transferred by the
>> toolstack and unnecessary redundancy prompted us to propose a
>> solution which doesn't require any extra data in xenstore. The idea
>> is to defer the VRAM region mapping till the point we actually know
>> the effective address and able to map it. To that end, we initially
>> just skip the mapping request for the framebuffer if we unable to
>> map it now. Then, after the memory region restore phase, we perform
>> the mapping again, this time successfully, and update the VRAM region
>> metadata accordingly.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
> 
> I've tried to migrate a guest with this patch, but once migrated, the
> screen is black (via VNC, keyboard is working fine).
> 
> I haven't try to migrate a guest from QEMU without this patch to a QEMU
> with it.
> 

Hmm. It works for me - I've tried to migrate between identical QEMUs
with this patch on localhost. Save/restore also works fine.

What do you mean 'the screen is black'? Could you describe your actions
so I could try to reproduce it?

Igor



[Xen-devel] [PATCH v5] xen: don't save/restore the physmap on VM save/restore

2017-03-15 Thread Igor Druzhinin
Saving/restoring the physmap to/from xenstore was introduced to
QEMU mainly in order to work around the VRAM region restore issue.
The sequence of restore operations implies that we should know
the effective guest VRAM address *before* we have the VRAM region
restored (which happens later). Unfortunately, in a Xen environment
VRAM memory actually belongs to the guest - not to QEMU itself -
which means the position of this region is unknown beforehand, so it
can't be mapped into QEMU's address space immediately.

Previously, the toolstack recreated the xenstore keys holding the
physmap, which got this information in place at the right moment,
ready to be consumed by QEMU to map the region properly.

The extraneous complexity of having those keys transferred by the
toolstack, and the unnecessary redundancy, prompted us to propose a
solution which doesn't require any extra data in xenstore. The idea
is to defer the VRAM region mapping until the point where we actually
know the effective address and are able to map it. To that end, we
initially just skip the mapping request for the framebuffer if we are
unable to map it immediately. Then, after the memory region restore
phase, we perform the mapping again, this time successfully, and
update the VRAM region
metadata accordingly.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
v5:
* Add an assertion and debug printf

v4:
* Use VGA post_load handler for vram_ptr update

v3:
* Modify qemu_ram_ptr_length similarly with qemu_map_ram_ptr
* Add a comment explaining qemu_map_ram_ptr and qemu_ram_ptr_length
  semantic change for Xen
* Dropped some redundant changes

v2:
* Fix some building and coding style issues
---
 exec.c   |  16 +
 hw/display/vga.c |  11 ++
 xen-hvm.c| 104 ++-
 3 files changed, 46 insertions(+), 85 deletions(-)

diff --git a/exec.c b/exec.c
index aabb035..a1ac8cd 100644
--- a/exec.c
+++ b/exec.c
@@ -2008,6 +2008,14 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t 
addr)
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1);
+if (block->host == NULL) {
+/* In case we cannot establish the mapping right away we might
+ * still be able to do it later e.g. on a later stage of restore.
+ * We don't touch the block and return NULL here to indicate
+ * that intention.
+ */
+return NULL;
+}
 }
 return ramblock_ptr(block, addr);
 }
@@ -2041,6 +2049,14 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
ram_addr_t addr,
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1);
+if (block->host == NULL) {
+/* In case we cannot establish the mapping right away we might
+ * still be able to do it later e.g. on a later stage of restore.
+ * We don't touch the block and return NULL here to indicate
+ * that intention.
+ */
+return NULL;
+}
 }
 
 return ramblock_ptr(block, addr);
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 69c3e1d..7d85fd8 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2035,6 +2035,12 @@ static int vga_common_post_load(void *opaque, int 
version_id)
 {
 VGACommonState *s = opaque;
 
+if (xen_enabled() && !s->vram_ptr) {
+/* update VRAM region pointer in case we've failed
+ * the last time during init phase */
+s->vram_ptr = memory_region_get_ram_ptr(&s->vram);
+assert(s->vram_ptr);
+}
 /* force refresh */
 s->graphic_mode = -1;
 vbe_update_vgaregs(s);
@@ -2165,6 +2171,11 @@ void vga_common_init(VGACommonState *s, Object *obj, 
bool global_vmstate)
 vmstate_register_ram(&s->vram, global_vmstate ? NULL : DEVICE(obj));
 xen_register_framebuffer(&s->vram);
 s->vram_ptr = memory_region_get_ram_ptr(&s->vram);
+/* VRAM pointer might still be NULL here if we are restoring on Xen.
+   We try to get it again later at post-load phase. */
+#ifdef DEBUG_VGA_MEM
+printf("vga: vram ptr: %p\n", s->vram_ptr);
+#endif
 s->get_bpp = vga_get_bpp;
 s->get_offsets = vga_get_offsets;
 s->get_resolution = vga_get_resolution;
diff --git a/xen-hvm.c b/xen-hvm.c
index 5043beb..8bedd9b 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -317,7 +317,6 @@ static int xen_add_to_physmap(XenIOState *state,
 XenPhysmap *physmap = NULL;
 hwaddr pfn, start_gpfn;
 hwaddr phys_offset = memory_region_get_ram_addr(mr);
-char path[80], value[17];
 const char *mr_name;
 
 if (get_physmapping(state, start_addr, size)) {
@@ -340,6 +339,22 @@ go_physmap:
 DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
 start_addr, start_addr + size);
 
+mr_name = memory_region_name(mr);
+
+physmap = g_malloc(sizeof(XenPhysmap));

[Xen-devel] [PATCH v4] xen: don't save/restore the physmap on VM save/restore

2017-03-14 Thread Igor Druzhinin
Saving/restoring the physmap to/from xenstore was introduced to
QEMU mainly in order to work around the VRAM region restore issue.
The sequence of restore operations implies that we should know
the effective guest VRAM address *before* we have the VRAM region
restored (which happens later). Unfortunately, in a Xen environment
VRAM memory actually belongs to the guest - not to QEMU itself -
which means the position of this region is unknown beforehand and
it can't be mapped into the QEMU address space immediately.

Previously, recreating the xenstore keys holding the physmap by the
toolstack helped to get this information in place at the right
moment, ready to be consumed by QEMU to map the region properly.

The extraneous complexity of having those keys transferred by the
toolstack and unnecessary redundancy prompted us to propose a
solution which doesn't require any extra data in xenstore. The idea
is to defer the VRAM region mapping till the point we actually know
the effective address and are able to map it. To that end, we initially
just skip the mapping request for the framebuffer if we are unable to
map it now. Then, after the memory region restore phase, we perform
the mapping again, this time successfully, and update the VRAM region
metadata accordingly.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
v4:
* Use VGA post_load handler for vram_ptr update

v3:
* Modify qemu_ram_ptr_length similarly with qemu_map_ram_ptr
* Add a comment explaining qemu_map_ram_ptr and qemu_ram_ptr_length 
  semantic change for Xen
* Dropped some redundant changes

v2:
* Fix some building and coding style issues
---
 exec.c   |  16 +
 hw/display/vga.c |   5 +++
 xen-hvm.c| 104 ++-
 3 files changed, 40 insertions(+), 85 deletions(-)

diff --git a/exec.c b/exec.c
index aabb035..a1ac8cd 100644
--- a/exec.c
+++ b/exec.c
@@ -2008,6 +2008,14 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t 
addr)
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1);
+if (block->host == NULL) {
+/* In case we cannot establish the mapping right away we might
+ * still be able to do it later e.g. on a later stage of restore.
+ * We don't touch the block and return NULL here to indicate
+ * that intention.
+ */
+return NULL;
+}
 }
 return ramblock_ptr(block, addr);
 }
@@ -2041,6 +2049,14 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
ram_addr_t addr,
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1);
+if (block->host == NULL) {
+/* In case we cannot establish the mapping right away we might
+ * still be able to do it later e.g. on a later stage of restore.
+ * We don't touch the block and return NULL here to indicate
+ * that intention.
+ */
+return NULL;
+}
 }
 
 return ramblock_ptr(block, addr);
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 69c3e1d..f8aebe3 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2035,6 +2035,11 @@ static int vga_common_post_load(void *opaque, int 
version_id)
 {
 VGACommonState *s = opaque;
 
+if (xen_enabled() && !s->vram_ptr) {
+/* update VRAM region pointer in case we've failed
+ * the last time during init phase */
+s->vram_ptr = memory_region_get_ram_ptr(&s->vram);
+}
 /* force refresh */
 s->graphic_mode = -1;
 vbe_update_vgaregs(s);
diff --git a/xen-hvm.c b/xen-hvm.c
index 5043beb..8bedd9b 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -317,7 +317,6 @@ static int xen_add_to_physmap(XenIOState *state,
 XenPhysmap *physmap = NULL;
 hwaddr pfn, start_gpfn;
 hwaddr phys_offset = memory_region_get_ram_addr(mr);
-char path[80], value[17];
 const char *mr_name;
 
 if (get_physmapping(state, start_addr, size)) {
@@ -340,6 +339,22 @@ go_physmap:
 DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
 start_addr, start_addr + size);
 
+mr_name = memory_region_name(mr);
+
+physmap = g_malloc(sizeof(XenPhysmap));
+
+physmap->start_addr = start_addr;
+physmap->size = size;
+physmap->name = mr_name;
+physmap->phys_offset = phys_offset;
+
+QLIST_INSERT_HEAD(&state->physmap, physmap, list);
+
+if (runstate_check(RUN_STATE_INMIGRATE)) {
+/* If we are migrating the region has been already mapped */
+return 0;
+}
+
 pfn = phys_offset >> TARGET_PAGE_BITS;
 start_gpfn = start_addr >> TARGET_PAGE_BITS;
 for (i = 0; i < size >> TARGET_PAGE_BITS; i++) {
@@ -350,49 +365,17 @@ go_physmap:
 if (rc) {
 DPRINTF("add_to_physmap MFN %"PRI_xen_pfn" to PFN %"

Re: [Xen-devel] [PATCH v3] xen: don't save/restore the physmap on VM save/restore

2017-03-13 Thread Igor Druzhinin
On 13/03/17 21:15, Stefano Stabellini wrote:
> On Mon, 13 Mar 2017, Igor Druzhinin wrote:
>> Saving/restoring the physmap to/from xenstore was introduced to
>> QEMU majorly in order to cover up the VRAM region restore issue.
>> The sequence of restore operations implies that we should know
>> the effective guest VRAM address *before* we have the VRAM region
>> restored (which happens later). Unfortunately, in Xen environment
>> VRAM memory does actually belong to a guest - not QEMU itself -
>> which means the position of this region is unknown beforehand and
>> can't be mapped into QEMU address space immediately.
>>
>> Previously, recreating xenstore keys, holding the physmap, by the
>> toolstack helped to get this information in place at the right
>> moment ready to be consumed by QEMU to map the region properly.
>>
>> The extraneous complexity of having those keys transferred by the
>> toolstack and unnecessary redundancy prompted us to propose a
>> solution which doesn't require any extra data in xenstore. The idea
>> is to defer the VRAM region mapping till the point we actually know
>> the effective address and are able to map it. To that end, we initially
>> only register the pointer to the framebuffer without an actual mapping.
>> Then, during the memory region restore phase, we perform the mapping
>> of the known address and update the VRAM region metadata (including
>> previously registered pointer) accordingly.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
> 
> Let me get this straight. The current sequence is:
> 
> - read physmap from xenstore, including vram and rom addresses
> - vga initialization
>   - register framebuffer with xen-hvm.c
>   - set vram_ptr by mapping the vram region using xen_map_cache
> - rtl8139 initialization
>   - map rom files using xen_map_cache
> 
> The new sequence would be:
> 
> - vga initialization
>   - register framebuffer and _ptr with xen-hvm.c
>   - set vram_ptr to NULL because we don't know the vram address yet
> - rtl8139 initialization
>   - map rom files using xen_map_cache ???
> - the vram address is discovered as part of the savevm file
> - when the vram region is mapped into the guest, set vram_ptr to the right 
> value
> 
> 
> Is that right? If so, why can't we just move the
> 
>   s->vram_ptr = memory_region_get_ram_ptr(&s->vram);
> 
> line in vga.c to later? It would be better than changing the value of
> vram_ptr behind the scenes. Clearer for the vga maintainers too.
> 

Yes, it's one of the possible solutions. It would probably require more
changes in the VGA code, but I'll take a look at this.

> 
> But my main concern is actually rom files. The current physmap mechanism
> also covers roms, such as the rtl8139 rom, which is used for pxebooting
> from the VM. How do you plan to cover those?
> 

Here is an excerpt from xen_add_to_physmap() which clearly indicates
that the only region we track now is the VRAM region.

if (mr == framebuffer && start_addr > 0xbffff) {
goto go_physmap;
}
return -1;

Maybe I'm missing something?

Igor

> 
>> v3:
>> * Modify qemu_ram_ptr_length similarly with qemu_map_ram_ptr
>> * Add a comment explaining qemu_map_ram_ptr and qemu_ram_ptr_length 
>>   semantic change for Xen
>> * Dropped some redundant changes
>>
>> v2:
>> * Fix some building and coding style issues
>>
>> ---
>>  exec.c   |  16 
>>  hw/display/vga.c |   2 +-
>>  include/hw/xen/xen.h |   2 +-
>>  xen-hvm-stub.c   |   2 +-
>>  xen-hvm.c| 111 
>> ---
>>  5 files changed, 44 insertions(+), 89 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index aabb035..a1ac8cd 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -2008,6 +2008,14 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, 
>> ram_addr_t addr)
>>  }
>>  
>>  block->host = xen_map_cache(block->offset, block->max_length, 1);
>> +if (block->host == NULL) {
>> +/* In case we cannot establish the mapping right away we might
>> + * still be able to do it later e.g. on a later stage of 
>> restore.
>> + * We don't touch the block and return NULL here to indicate
>> + * that intention.
>> + */
>> +return NULL;
>> +}
>>  }
>>  return ramblock_ptr(block, addr);
>>  }
>> @@ -2041,6 +2049,14 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
>> ram_addr_t addr,
>>

[Xen-devel] [PATCH v3] xen: don't save/restore the physmap on VM save/restore

2017-03-13 Thread Igor Druzhinin
Saving/restoring the physmap to/from xenstore was introduced to
QEMU mainly in order to work around the VRAM region restore issue.
The sequence of restore operations implies that we should know
the effective guest VRAM address *before* we have the VRAM region
restored (which happens later). Unfortunately, in a Xen environment
VRAM memory actually belongs to the guest - not to QEMU itself -
which means the position of this region is unknown beforehand and
it can't be mapped into the QEMU address space immediately.

Previously, recreating the xenstore keys holding the physmap by the
toolstack helped to get this information in place at the right
moment, ready to be consumed by QEMU to map the region properly.

The extraneous complexity of having those keys transferred by the
toolstack and unnecessary redundancy prompted us to propose a
solution which doesn't require any extra data in xenstore. The idea
is to defer the VRAM region mapping till the point we actually know
the effective address and are able to map it. To that end, we initially
only register the pointer to the framebuffer without an actual mapping.
Then, during the memory region restore phase, we perform the mapping
of the known address and update the VRAM region metadata (including
previously registered pointer) accordingly.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
v3:
* Modify qemu_ram_ptr_length similarly with qemu_map_ram_ptr
* Add a comment explaining qemu_map_ram_ptr and qemu_ram_ptr_length 
  semantic change for Xen
* Dropped some redundant changes

v2:
* Fix some building and coding style issues

---
 exec.c   |  16 
 hw/display/vga.c |   2 +-
 include/hw/xen/xen.h |   2 +-
 xen-hvm-stub.c   |   2 +-
 xen-hvm.c| 111 ---
 5 files changed, 44 insertions(+), 89 deletions(-)

diff --git a/exec.c b/exec.c
index aabb035..a1ac8cd 100644
--- a/exec.c
+++ b/exec.c
@@ -2008,6 +2008,14 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t 
addr)
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1);
+if (block->host == NULL) {
+/* In case we cannot establish the mapping right away we might
+ * still be able to do it later e.g. on a later stage of restore.
+ * We don't touch the block and return NULL here to indicate
+ * that intention.
+ */
+return NULL;
+}
 }
 return ramblock_ptr(block, addr);
 }
@@ -2041,6 +2049,14 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
ram_addr_t addr,
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1);
+if (block->host == NULL) {
+/* In case we cannot establish the mapping right away we might
+ * still be able to do it later e.g. on a later stage of restore.
+ * We don't touch the block and return NULL here to indicate
+ * that intention.
+ */
+return NULL;
+}
 }
 
 return ramblock_ptr(block, addr);
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 69c3e1d..be554c2 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2163,7 +2163,7 @@ void vga_common_init(VGACommonState *s, Object *obj, bool 
global_vmstate)
 memory_region_init_ram(&s->vram, obj, "vga.vram", s->vram_size,
&error_fatal);
 vmstate_register_ram(&s->vram, global_vmstate ? NULL : DEVICE(obj));
-xen_register_framebuffer(&s->vram);
+xen_register_framebuffer(&s->vram, &s->vram_ptr);
 s->vram_ptr = memory_region_get_ram_ptr(&s->vram);
 s->get_bpp = vga_get_bpp;
 s->get_offsets = vga_get_offsets;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 09c2ce5..3831843 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -45,6 +45,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size,
struct MemoryRegion *mr, Error **errp);
 void xen_modified_memory(ram_addr_t start, ram_addr_t length);
 
-void xen_register_framebuffer(struct MemoryRegion *mr);
+void xen_register_framebuffer(struct MemoryRegion *mr, uint8_t **ptr);
 
 #endif /* QEMU_HW_XEN_H */
diff --git a/xen-hvm-stub.c b/xen-hvm-stub.c
index c500325..c89065e 100644
--- a/xen-hvm-stub.c
+++ b/xen-hvm-stub.c
@@ -46,7 +46,7 @@ qemu_irq *xen_interrupt_controller_init(void)
 return NULL;
 }
 
-void xen_register_framebuffer(MemoryRegion *mr)
+void xen_register_framebuffer(MemoryRegion *mr, uint8_t **ptr)
 {
 }
 
diff --git a/xen-hvm.c b/xen-hvm.c
index 5043beb..221334a 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -41,6 +41,7 @@
 
 static MemoryRegion ram_memory, ram_640k, ram_lo, ram_hi;
 static MemoryRegion *framebuffer;
+static uint8_t **framebuffer_ptr;
 static bool xen_in_migration;
 
 /* Compatibility with older version */
@@ -317,7 +318,6 @@ static int xen_add_to_physmap(XenIOState *state,
   

[Xen-devel] [PATCH net v4] xen-netback: fix race condition on XenBus disconnect

2017-03-10 Thread Igor Druzhinin
In some cases, during XenBus disconnect event handling and the subsequent
queue resource release, there may still be TX handlers active on
other processors. Use RCU in order to synchronize with them.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
v4:
 * Use READ_ONCE instead of rcu_dereference to stop sparse complaining

v3:
 * Fix unintended semantic change in xenvif_get_ethtool_stats
 * Dropped extra code

v2:
 * Add protection for xenvif_get_ethtool_stats
 * Additional comments and fixes
---
 drivers/net/xen-netback/interface.c | 26 +-
 drivers/net/xen-netback/netback.c   |  2 +-
 drivers/net/xen-netback/xenbus.c| 20 ++--
 3 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index 829b26c..8397f6c 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -165,13 +165,17 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
-   unsigned int num_queues = vif->num_queues;
+   unsigned int num_queues;
u16 index;
struct xenvif_rx_cb *cb;
 
BUG_ON(skb->dev != dev);
 
-   /* Drop the packet if queues are not set up */
+   /* Drop the packet if queues are not set up.
+* This handler should be called inside an RCU read section
+* so we don't need to enter it here explicitly.
+*/
+   num_queues = READ_ONCE(vif->num_queues);
if (num_queues < 1)
goto drop;
 
@@ -222,18 +226,18 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
+   unsigned int num_queues;
u64 rx_bytes = 0;
u64 rx_packets = 0;
u64 tx_bytes = 0;
u64 tx_packets = 0;
unsigned int index;
 
-   spin_lock(&vif->lock);
-   if (vif->queues == NULL)
-   goto out;
+   rcu_read_lock();
+   num_queues = READ_ONCE(vif->num_queues);
 
/* Aggregate tx and rx stats from each queue */
-   for (index = 0; index < vif->num_queues; ++index) {
+   for (index = 0; index < num_queues; ++index) {
queue = &vif->queues[index];
rx_bytes += queue->stats.rx_bytes;
rx_packets += queue->stats.rx_packets;
@@ -241,8 +245,7 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
tx_packets += queue->stats.tx_packets;
}
 
-out:
-   spin_unlock(&vif->lock);
+   rcu_read_unlock();
 
vif->dev->stats.rx_bytes = rx_bytes;
vif->dev->stats.rx_packets = rx_packets;
@@ -378,10 +381,13 @@ static void xenvif_get_ethtool_stats(struct net_device 
*dev,
 struct ethtool_stats *stats, u64 * data)
 {
struct xenvif *vif = netdev_priv(dev);
-   unsigned int num_queues = vif->num_queues;
+   unsigned int num_queues;
int i;
unsigned int queue_index;
 
+   rcu_read_lock();
+   num_queues = READ_ONCE(vif->num_queues);
+
for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++) {
unsigned long accum = 0;
for (queue_index = 0; queue_index < num_queues; ++queue_index) {
@@ -390,6 +396,8 @@ static void xenvif_get_ethtool_stats(struct net_device *dev,
}
data[i] = accum;
}
+
+   rcu_read_unlock();
 }
 
 static void xenvif_get_strings(struct net_device *dev, u32 stringset, u8 * 
data)
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index f9bcf4a..602d408 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -214,7 +214,7 @@ static void xenvif_fatal_tx_err(struct xenvif *vif)
netdev_err(vif->dev, "fatal error; disabling device\n");
vif->disabled = true;
/* Disable the vif from queue 0's kthread */
-   if (vif->queues)
+   if (vif->num_queues)
xenvif_kick_thread(&vif->queues[0]);
 }
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index d2d7cd9..a56d3ea 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -495,26 +495,26 @@ static void backend_disconnect(struct backend_info *be)
struct xenvif *vif = be->vif;
 
if (vif) {
+   unsigned int num_queues = vif->num_queues;
unsigned int queue_index;
-   struct xenvif_queue *queues;
 
xen_unregister_watchers(vif);
 #ifdef CONFIG_DEBUG_FS
xenvif_debugfs_delif(vif);
 #endif /* CONFIG_DEBUG_FS */
xenvif_disconnect_data(vif);
-   fo

[Xen-devel] [PATCH v2] xen: don't save/restore the physmap on VM save/restore

2017-03-10 Thread Igor Druzhinin
Saving/restoring the physmap to/from xenstore was introduced to
QEMU mainly in order to work around the VRAM region restore issue.
The sequence of restore operations implies that we should know
the effective guest VRAM address *before* we have the VRAM region
restored (which happens later). Unfortunately, in a Xen environment
VRAM memory actually belongs to the guest - not to QEMU itself -
which means the position of this region is unknown beforehand and
it can't be mapped into the QEMU address space immediately.

Previously, recreating the xenstore keys holding the physmap by the
toolstack helped to get this information in place at the right
moment, ready to be consumed by QEMU to map the region properly.

The extraneous complexity of having those keys transferred by the
toolstack and unnecessary redundancy prompted us to propose a
solution which doesn't require any extra data in xenstore. The idea
is to defer the VRAM region mapping till the point we actually know
the effective address and are able to map it. To that end, we initially
only register the pointer to the framebuffer without an actual mapping.
Then, during the memory region restore phase, we perform the mapping
of the known address and update the VRAM region metadata (including
previously registered pointer) accordingly.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
v2:
* Fix some building and coding style issues
---
 exec.c   |   3 ++
 hw/display/vga.c |   2 +-
 include/hw/xen/xen.h |   2 +-
 xen-hvm-stub.c   |   2 +-
 xen-hvm.c| 114 ---
 5 files changed, 33 insertions(+), 90 deletions(-)

diff --git a/exec.c b/exec.c
index aabb035..5f2809e 100644
--- a/exec.c
+++ b/exec.c
@@ -2008,6 +2008,9 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t 
addr)
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1);
+if (block->host == NULL) {
+return NULL;
+}
 }
 return ramblock_ptr(block, addr);
 }
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 69c3e1d..be554c2 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2163,7 +2163,7 @@ void vga_common_init(VGACommonState *s, Object *obj, bool 
global_vmstate)
 memory_region_init_ram(&s->vram, obj, "vga.vram", s->vram_size,
&error_fatal);
 vmstate_register_ram(&s->vram, global_vmstate ? NULL : DEVICE(obj));
-xen_register_framebuffer(&s->vram);
+xen_register_framebuffer(&s->vram, &s->vram_ptr);
 s->vram_ptr = memory_region_get_ram_ptr(&s->vram);
 s->get_bpp = vga_get_bpp;
 s->get_offsets = vga_get_offsets;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 09c2ce5..3831843 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -45,6 +45,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size,
struct MemoryRegion *mr, Error **errp);
 void xen_modified_memory(ram_addr_t start, ram_addr_t length);
 
-void xen_register_framebuffer(struct MemoryRegion *mr);
+void xen_register_framebuffer(struct MemoryRegion *mr, uint8_t **ptr);
 
 #endif /* QEMU_HW_XEN_H */
diff --git a/xen-hvm-stub.c b/xen-hvm-stub.c
index c500325..c89065e 100644
--- a/xen-hvm-stub.c
+++ b/xen-hvm-stub.c
@@ -46,7 +46,7 @@ qemu_irq *xen_interrupt_controller_init(void)
 return NULL;
 }
 
-void xen_register_framebuffer(MemoryRegion *mr)
+void xen_register_framebuffer(MemoryRegion *mr, uint8_t **ptr)
 {
 }
 
diff --git a/xen-hvm.c b/xen-hvm.c
index 5043beb..270cd99 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -41,6 +41,7 @@
 
 static MemoryRegion ram_memory, ram_640k, ram_lo, ram_hi;
 static MemoryRegion *framebuffer;
+static uint8_t **framebuffer_ptr;
 static bool xen_in_migration;
 
 /* Compatibility with older version */
@@ -302,7 +303,6 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
 return physmap->start_addr;
 }
 }
-
 return start_addr;
 }
 
@@ -317,7 +317,6 @@ static int xen_add_to_physmap(XenIOState *state,
 XenPhysmap *physmap = NULL;
 hwaddr pfn, start_gpfn;
 hwaddr phys_offset = memory_region_get_ram_addr(mr);
-char path[80], value[17];
 const char *mr_name;
 
 if (get_physmapping(state, start_addr, size)) {
@@ -340,6 +339,27 @@ go_physmap:
 DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
 start_addr, start_addr + size);
 
+mr_name = memory_region_name(mr);
+
+physmap = g_malloc(sizeof(XenPhysmap));
+
+physmap->start_addr = start_addr;
+physmap->size = size;
+physmap->name = mr_name;
+physmap->phys_offset = phys_offset;
+
+QLIST_INSERT_HEAD(&state->physmap, physmap, list);
+
+if (runstate_check(RUN_STATE_INMIGRATE)) {
+/* At this point we have a physmap entry for the framebuffer region
+ * established during the restore phase so we can safely update the
+

[Xen-devel] [PATCH net v3] xen-netback: fix race condition on XenBus disconnect

2017-03-09 Thread Igor Druzhinin
In some cases, during XenBus disconnect event handling and the subsequent
queue resource release, there may still be TX handlers active on
other processors. Use RCU in order to synchronize with them.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
v3:
 * Fix unintended semantic change in xenvif_get_ethtool_stats
 * Dropped extra code

v2:
 * Add protection for xenvif_get_ethtool_stats
 * Additional comments and fixes
---
 drivers/net/xen-netback/interface.c | 26 +-
 drivers/net/xen-netback/netback.c   |  2 +-
 drivers/net/xen-netback/xenbus.c| 20 ++--
 3 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index 829b26c..a3c018e 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -165,13 +165,17 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
-   unsigned int num_queues = vif->num_queues;
+   unsigned int num_queues;
u16 index;
struct xenvif_rx_cb *cb;
 
BUG_ON(skb->dev != dev);
 
-   /* Drop the packet if queues are not set up */
+   /* Drop the packet if queues are not set up.
+* This handler should be called inside an RCU read section
+* so we don't need to enter it here explicitly.
+*/
+   num_queues = rcu_dereference(vif)->num_queues;
if (num_queues < 1)
goto drop;
 
@@ -222,18 +226,18 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
+   unsigned int num_queues;
u64 rx_bytes = 0;
u64 rx_packets = 0;
u64 tx_bytes = 0;
u64 tx_packets = 0;
unsigned int index;
 
-   spin_lock(&vif->lock);
-   if (vif->queues == NULL)
-   goto out;
+   rcu_read_lock();
+   num_queues = rcu_dereference(vif)->num_queues;
 
/* Aggregate tx and rx stats from each queue */
-   for (index = 0; index < vif->num_queues; ++index) {
+   for (index = 0; index < num_queues; ++index) {
queue = &vif->queues[index];
rx_bytes += queue->stats.rx_bytes;
rx_packets += queue->stats.rx_packets;
@@ -241,8 +245,7 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
tx_packets += queue->stats.tx_packets;
}
 
-out:
-   spin_unlock(&vif->lock);
+   rcu_read_unlock();
 
vif->dev->stats.rx_bytes = rx_bytes;
vif->dev->stats.rx_packets = rx_packets;
@@ -378,10 +381,13 @@ static void xenvif_get_ethtool_stats(struct net_device 
*dev,
 struct ethtool_stats *stats, u64 * data)
 {
struct xenvif *vif = netdev_priv(dev);
-   unsigned int num_queues = vif->num_queues;
+   unsigned int num_queues;
int i;
unsigned int queue_index;
 
+   rcu_read_lock();
+   num_queues = rcu_dereference(vif)->num_queues;
+
for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++) {
unsigned long accum = 0;
for (queue_index = 0; queue_index < num_queues; ++queue_index) {
@@ -390,6 +396,8 @@ static void xenvif_get_ethtool_stats(struct net_device *dev,
}
data[i] = accum;
}
+
+   rcu_read_unlock();
 }
 
 static void xenvif_get_strings(struct net_device *dev, u32 stringset, u8 * 
data)
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index f9bcf4a..602d408 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -214,7 +214,7 @@ static void xenvif_fatal_tx_err(struct xenvif *vif)
netdev_err(vif->dev, "fatal error; disabling device\n");
vif->disabled = true;
/* Disable the vif from queue 0's kthread */
-   if (vif->queues)
+   if (vif->num_queues)
xenvif_kick_thread(&vif->queues[0]);
 }
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index d2d7cd9..a56d3ea 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -495,26 +495,26 @@ static void backend_disconnect(struct backend_info *be)
struct xenvif *vif = be->vif;
 
if (vif) {
+   unsigned int num_queues = vif->num_queues;
unsigned int queue_index;
-   struct xenvif_queue *queues;
 
xen_unregister_watchers(vif);
 #ifdef CONFIG_DEBUG_FS
xenvif_debugfs_delif(vif);
 #endif /* CONFIG_DEBUG_FS */
xenvif_disconnect_data(vif);
-   for (queue_index = 0;
-queue_index < vif->

[Xen-devel] [PATCH] xen: don't save/restore the physmap on VM save/restore

2017-03-09 Thread Igor Druzhinin
Saving/restoring the physmap to/from xenstore was introduced to
QEMU mainly in order to work around the VRAM region restore issue.
The sequence of restore operations implies that we should know
the effective guest VRAM address *before* we have the VRAM region
restored (which happens later). Unfortunately, in a Xen environment
VRAM memory actually belongs to the guest - not to QEMU itself -
which means the position of this region is unknown beforehand and
it can't be mapped into the QEMU address space immediately.

Previously, recreating the xenstore keys holding the physmap by the
toolstack helped to get this information in place at the right
moment, ready to be consumed by QEMU to map the region properly.

The extraneous complexity of having those keys transferred by the
toolstack and unnecessary redundancy prompted us to propose a
solution which doesn't require any extra data in xenstore. The idea
is to defer the VRAM region mapping till the point we actually know
the effective address and are able to map it. To that end, we initially
only register the pointer to the framebuffer without an actual mapping.
Then, during the memory region restore phase, we perform the mapping
of the known address and update the VRAM region metadata (including
previously registered pointer) accordingly.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 exec.c   |   3 ++
 hw/display/vga.c |   2 +-
 include/hw/xen/xen.h |   2 +-
 xen-hvm.c| 114 ---
 4 files changed, 32 insertions(+), 89 deletions(-)

diff --git a/exec.c b/exec.c
index aabb035..5f2809e 100644
--- a/exec.c
+++ b/exec.c
@@ -2008,6 +2008,9 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t 
addr)
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1);
+if (block->host == NULL) {
+return NULL;
+}
 }
 return ramblock_ptr(block, addr);
 }
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 69c3e1d..be554c2 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2163,7 +2163,7 @@ void vga_common_init(VGACommonState *s, Object *obj, bool 
global_vmstate)
 memory_region_init_ram(&s->vram, obj, "vga.vram", s->vram_size,
&error_fatal);
 vmstate_register_ram(&s->vram, global_vmstate ? NULL : DEVICE(obj));
-xen_register_framebuffer(&s->vram);
+xen_register_framebuffer(&s->vram, &s->vram_ptr);
 s->vram_ptr = memory_region_get_ram_ptr(&s->vram);
 s->get_bpp = vga_get_bpp;
 s->get_offsets = vga_get_offsets;
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 09c2ce5..3831843 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -45,6 +45,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size,
struct MemoryRegion *mr, Error **errp);
 void xen_modified_memory(ram_addr_t start, ram_addr_t length);
 
-void xen_register_framebuffer(struct MemoryRegion *mr);
+void xen_register_framebuffer(struct MemoryRegion *mr, uint8_t **ptr);
 
 #endif /* QEMU_HW_XEN_H */
diff --git a/xen-hvm.c b/xen-hvm.c
index 5043beb..ea5ed24 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -41,6 +41,7 @@
 
 static MemoryRegion ram_memory, ram_640k, ram_lo, ram_hi;
 static MemoryRegion *framebuffer;
+static uint8_t **framebuffer_ptr;
 static bool xen_in_migration;
 
 /* Compatibility with older version */
@@ -302,7 +303,6 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr start_addr,
 return physmap->start_addr;
 }
 }
-
 return start_addr;
 }
 
@@ -317,7 +317,6 @@ static int xen_add_to_physmap(XenIOState *state,
 XenPhysmap *physmap = NULL;
 hwaddr pfn, start_gpfn;
 hwaddr phys_offset = memory_region_get_ram_addr(mr);
-char path[80], value[17];
 const char *mr_name;
 
 if (get_physmapping(state, start_addr, size)) {
@@ -340,6 +339,27 @@ go_physmap:
 DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
 start_addr, start_addr + size);
 
+mr_name = memory_region_name(mr);
+
+physmap = g_malloc(sizeof (XenPhysmap));
+
+physmap->start_addr = start_addr;
+physmap->size = size;
+physmap->name = mr_name;
+physmap->phys_offset = phys_offset;
+
+QLIST_INSERT_HEAD(&state->physmap, physmap, list);
+
+if (runstate_check(RUN_STATE_INMIGRATE))
+{
+/* At this point we have a physmap entry for the framebuffer region
+ * established during the restore phase so we can safely update the 
+ * registered framebuffer address here. */
+ if (mr == framebuffer)
+*framebuffer_ptr = memory_region_get_ram_ptr(framebuffer);
+return 0;
+}
+
 pfn = phys_offset >> TARGET_PAGE_BITS;
 start_gpfn = start_addr >> TARGET_PAGE_BITS;
 for (i = 0; i < size >> TARGET_PAGE_BITS; i++) {
@@ -350,49 +370,17 @@ go_physmap:
 if (rc) {
   

Re: [Xen-devel] [PATCH net v2] xen-netback: fix race condition on XenBus disconnect

2017-03-06 Thread Igor Druzhinin
On 06/03/17 08:58, Paul Durrant wrote:
>> -Original Message-
>> From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com]
>> Sent: 03 March 2017 20:23
>> To: net...@vger.kernel.org; xen-de...@lists.xenproject.org
>> Cc: Paul Durrant <paul.durr...@citrix.com>; jgr...@suse.com; Wei Liu
>> <wei.l...@citrix.com>; Igor Druzhinin <igor.druzhi...@citrix.com>
>> Subject: [PATCH net v2] xen-netback: fix race condition on XenBus
>> disconnect
>>
>> In some cases during XenBus disconnect event handling and subsequent
>> queue resource release there may be some TX handlers active on
>> other processors. Use RCU in order to synchronize with them.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>> ---
>> v2:
>>  * Add protection for xenvif_get_ethtool_stats
>>  * Additional comments and fixes
>> ---
>>  drivers/net/xen-netback/interface.c | 29 ++---
>>  drivers/net/xen-netback/netback.c   |  2 +-
>>  drivers/net/xen-netback/xenbus.c| 20 ++--
>>  3 files changed, 33 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
>> netback/interface.c
>> index a2d32676..266b7cd 100644
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -164,13 +164,17 @@ static int xenvif_start_xmit(struct sk_buff *skb,
>> struct net_device *dev)
>>  {
>>  struct xenvif *vif = netdev_priv(dev);
>>  struct xenvif_queue *queue = NULL;
>> -unsigned int num_queues = vif->num_queues;
>> +unsigned int num_queues;
>>  u16 index;
>>  struct xenvif_rx_cb *cb;
>>
>>  BUG_ON(skb->dev != dev);
>>
>> -/* Drop the packet if queues are not set up */
>> +/* Drop the packet if queues are not set up.
>> + * This handler should be called inside an RCU read section
>> + * so we don't need to enter it here explicitly.
>> + */
>> +num_queues = rcu_dereference(vif)->num_queues;
>>  if (num_queues < 1)
>>  goto drop;
>>
>> @@ -221,18 +225,21 @@ static struct net_device_stats
>> *xenvif_get_stats(struct net_device *dev)
>>  {
>>  struct xenvif *vif = netdev_priv(dev);
>>  struct xenvif_queue *queue = NULL;
>> +unsigned int num_queues;
>>  u64 rx_bytes = 0;
>>  u64 rx_packets = 0;
>>  u64 tx_bytes = 0;
>>  u64 tx_packets = 0;
>>  unsigned int index;
>>
>> -spin_lock(&vif->lock);
>> -if (vif->queues == NULL)
>> +rcu_read_lock();
>> +
>> +num_queues = rcu_dereference(vif)->num_queues;
>> +if (num_queues < 1)
>>  goto out;
> 
> Is this if clause worth it? All it does is jump over the for loop, which 
> would not be executed anyway, since the initial test (0 < 0) would fail.

Probably not needed here, but it does make it consistent with other
similar checks across the file. Just looks more descriptive.

> 
>>
>>  /* Aggregate tx and rx stats from each queue */
>> -for (index = 0; index < vif->num_queues; ++index) {
>> +for (index = 0; index < num_queues; ++index) {
>>  queue = &vif->queues[index];
>>  rx_bytes += queue->stats.rx_bytes;
>>  rx_packets += queue->stats.rx_packets;
>> @@ -241,7 +248,7 @@ static struct net_device_stats
>> *xenvif_get_stats(struct net_device *dev)
>>  }
>>
>>  out:
>> -spin_unlock(&vif->lock);
>> +rcu_read_unlock();
>>
>>  vif->dev->stats.rx_bytes = rx_bytes;
>>  vif->dev->stats.rx_packets = rx_packets;
>> @@ -377,10 +384,16 @@ static void xenvif_get_ethtool_stats(struct
>> net_device *dev,
>>   struct ethtool_stats *stats, u64 * data)
>>  {
>>  struct xenvif *vif = netdev_priv(dev);
>> -unsigned int num_queues = vif->num_queues;
>> +unsigned int num_queues;
>>  int i;
>>  unsigned int queue_index;
>>
>> +rcu_read_lock();
>> +
>> +num_queues = rcu_dereference(vif)->num_queues;
>> +if (num_queues < 1)
>> +goto out;
>> +
> 
> You have introduced a semantic change with the above if clause. The 
> xenvif_stats array was previously zeroed if num_queues < 1. It appears that 
> ethtool does actually allocate a zeroed array to pass in here, but I wonder 
> whether it is still safer to have this function zero it anyway. 
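
Paul's point can be seen in a minimal userspace model (hypothetical names — `vif_model`, `get_stats_early_return` and friends are not the real driver symbols): with the early return, the caller's buffer keeps whatever it held before, whereas the pre-patch code always wrote every slot because the outer loop over the stats descriptors ran even with zero queues.

```c
#include <string.h>

#define NUM_STATS 3

struct queue_model { unsigned long stats[NUM_STATS]; };

struct vif_model {
    unsigned int num_queues;
    struct queue_model *queues;
};

/* Early-return variant: data[] is left untouched when there are no queues,
 * mirroring the semantic change Paul points out. */
void get_stats_early_return(const struct vif_model *vif, unsigned long *data)
{
    if (vif->num_queues < 1)
        return;
    for (int i = 0; i < NUM_STATS; i++) {
        unsigned long accum = 0;
        for (unsigned int q = 0; q < vif->num_queues; q++)
            accum += vif->queues[q].stats[i];
        data[i] = accum;
    }
}

/* Zeroing variant: data[] is always fully written, matching the old
 * behaviour where the outer loop stored accum == 0 for every slot. */
void get_stats_zeroing(const struct vif_model *vif, unsigned long *data)
{
    memset(data, 0, NUM_STATS * sizeof(*data));
    get_stats_early_return(vif, data);
}
```

Whether the difference matters depends on whether callers always pass a zeroed buffer, which is exactly the question raised above.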

[Xen-devel] [PATCH net v2] xen-netback: fix race condition on XenBus disconnect

2017-03-03 Thread Igor Druzhinin
In some cases during XenBus disconnect event handling and subsequent
queue resource release there may be some TX handlers active on
other processors. Use RCU in order to synchronize with them.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
v2:
 * Add protection for xenvif_get_ethtool_stats
 * Additional comments and fixes
---
 drivers/net/xen-netback/interface.c | 29 ++---
 drivers/net/xen-netback/netback.c   |  2 +-
 drivers/net/xen-netback/xenbus.c| 20 ++--
 3 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index a2d32676..266b7cd 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -164,13 +164,17 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
-   unsigned int num_queues = vif->num_queues;
+   unsigned int num_queues;
u16 index;
struct xenvif_rx_cb *cb;
 
BUG_ON(skb->dev != dev);
 
-   /* Drop the packet if queues are not set up */
+   /* Drop the packet if queues are not set up.
+* This handler should be called inside an RCU read section
+* so we don't need to enter it here explicitly.
+*/
+   num_queues = rcu_dereference(vif)->num_queues;
if (num_queues < 1)
goto drop;
 
@@ -221,18 +225,21 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
+   unsigned int num_queues;
u64 rx_bytes = 0;
u64 rx_packets = 0;
u64 tx_bytes = 0;
u64 tx_packets = 0;
unsigned int index;
 
-   spin_lock(&vif->lock);
-   if (vif->queues == NULL)
+   rcu_read_lock();
+
+   num_queues = rcu_dereference(vif)->num_queues;
+   if (num_queues < 1)
goto out;
 
/* Aggregate tx and rx stats from each queue */
-   for (index = 0; index < vif->num_queues; ++index) {
+   for (index = 0; index < num_queues; ++index) {
queue = &vif->queues[index];
rx_bytes += queue->stats.rx_bytes;
rx_packets += queue->stats.rx_packets;
@@ -241,7 +248,7 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
}
 
 out:
-   spin_unlock(&vif->lock);
+   rcu_read_unlock();
 
vif->dev->stats.rx_bytes = rx_bytes;
vif->dev->stats.rx_packets = rx_packets;
@@ -377,10 +384,16 @@ static void xenvif_get_ethtool_stats(struct net_device 
*dev,
 struct ethtool_stats *stats, u64 * data)
 {
struct xenvif *vif = netdev_priv(dev);
-   unsigned int num_queues = vif->num_queues;
+   unsigned int num_queues;
int i;
unsigned int queue_index;
 
+   rcu_read_lock();
+
+   num_queues = rcu_dereference(vif)->num_queues;
+   if (num_queues < 1)
+   goto out;
+
for (i = 0; i < ARRAY_SIZE(xenvif_stats); i++) {
unsigned long accum = 0;
for (queue_index = 0; queue_index < num_queues; ++queue_index) {
@@ -389,6 +402,8 @@ static void xenvif_get_ethtool_stats(struct net_device *dev,
}
data[i] = accum;
}
+out:
+   rcu_read_unlock();
 }
 
 static void xenvif_get_strings(struct net_device *dev, u32 stringset, u8 * 
data)
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index f9bcf4a..62fa74d 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -214,7 +214,7 @@ static void xenvif_fatal_tx_err(struct xenvif *vif)
netdev_err(vif->dev, "fatal error; disabling device\n");
vif->disabled = true;
/* Disable the vif from queue 0's kthread */
-   if (vif->queues)
+   if (vif->num_queues > 0)
xenvif_kick_thread(&vif->queues[0]);
 }
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index d2d7cd9..a56d3ea 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -495,26 +495,26 @@ static void backend_disconnect(struct backend_info *be)
struct xenvif *vif = be->vif;
 
if (vif) {
+   unsigned int num_queues = vif->num_queues;
unsigned int queue_index;
-   struct xenvif_queue *queues;
 
xen_unregister_watchers(vif);
 #ifdef CONFIG_DEBUG_FS
xenvif_debugfs_delif(vif);
 #endif /* CONFIG_DEBUG_FS */
xenvif_disconnect_data(vif);
-   for (queue_index = 0;
-queue_index < vif->num_queues;
- 

Re: [Xen-devel] [PATCH] xen-netback: fix race condition on XenBus disconnect

2017-03-03 Thread Igor Druzhinin
On 03/03/17 09:18, Paul Durrant wrote:
>> -Original Message-
>> From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com]
>> Sent: 02 March 2017 22:57
>> To: net...@vger.kernel.org; xen-de...@lists.xenproject.org
>> Cc: Paul Durrant <paul.durr...@citrix.com>; jgr...@suse.com; Wei Liu
>> <wei.l...@citrix.com>; Igor Druzhinin <igor.druzhi...@citrix.com>
>> Subject: [PATCH] xen-netback: fix race condition on XenBus disconnect
>>
>> In some cases during XenBus disconnect event handling and subsequent
>> queue resource release there may be some TX handlers active on
>> other processors. Use RCU in order to synchronize with them.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>> ---
>>  drivers/net/xen-netback/interface.c | 13 -
>>  drivers/net/xen-netback/xenbus.c| 17 +++--
>>  2 files changed, 15 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
>> netback/interface.c
>> index a2d32676..32e2cc6 100644
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -164,7 +164,7 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct
>> net_device *dev)
>>  {
>>  struct xenvif *vif = netdev_priv(dev);
>>  struct xenvif_queue *queue = NULL;
>> -unsigned int num_queues = vif->num_queues;
> 
> Do you not need an rcu_read_lock() around this and use of the num_queues 
> value (as you have below)?

Huh, missed this one. The point is that xenvif_start_xmit is already
called inside an RCU read section.

Igor

> 
>> +unsigned int num_queues = rcu_dereference(vif)->num_queues;
>>  u16 index;
>>  struct xenvif_rx_cb *cb;
>>
>> @@ -221,18 +221,21 @@ static struct net_device_stats
>> *xenvif_get_stats(struct net_device *dev)
>>  {
>>  struct xenvif *vif = netdev_priv(dev);
>>  struct xenvif_queue *queue = NULL;
>> +unsigned int num_queues;
>>  u64 rx_bytes = 0;
>>  u64 rx_packets = 0;
>>  u64 tx_bytes = 0;
>>  u64 tx_packets = 0;
>>  unsigned int index;
>>
>> -spin_lock(&vif->lock);
>> -if (vif->queues == NULL)
>> +rcu_read_lock();
>> +
>> +num_queues = rcu_dereference(vif)->num_queues;
>> +if (num_queues < 1)
>>  goto out;
>>
>>  /* Aggregate tx and rx stats from each queue */
>> -for (index = 0; index < vif->num_queues; ++index) {
>> +for (index = 0; index < num_queues; ++index) {
>>  queue = &vif->queues[index];
>>  rx_bytes += queue->stats.rx_bytes;
>>  rx_packets += queue->stats.rx_packets;
>> @@ -241,7 +244,7 @@ static struct net_device_stats
>> *xenvif_get_stats(struct net_device *dev)
>>  }
>>
>>  out:
>> -spin_unlock(&vif->lock);
>> +rcu_read_unlock();
>>
>>  vif->dev->stats.rx_bytes = rx_bytes;
>>  vif->dev->stats.rx_packets = rx_packets;
>> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-
>> netback/xenbus.c
>> index d2d7cd9..76efb01 100644
>> --- a/drivers/net/xen-netback/xenbus.c
>> +++ b/drivers/net/xen-netback/xenbus.c
>> @@ -495,26 +495,23 @@ static void backend_disconnect(struct
>> backend_info *be)
>>  struct xenvif *vif = be->vif;
>>
>>  if (vif) {
>> +unsigned int num_queues = vif->num_queues;
>>  unsigned int queue_index;
>> -struct xenvif_queue *queues;
>>
>>  xen_unregister_watchers(vif);
>>  #ifdef CONFIG_DEBUG_FS
>>  xenvif_debugfs_delif(vif);
>>  #endif /* CONFIG_DEBUG_FS */
>>  xenvif_disconnect_data(vif);
>> -for (queue_index = 0;
>> - queue_index < vif->num_queues;
>> - ++queue_index)
>> -xenvif_deinit_queue(&vif->queues[queue_index]);
>>
>> -spin_lock(&vif->lock);
>> -queues = vif->queues;
>>  vif->num_queues = 0;
>> -vif->queues = NULL;
>> -spin_unlock(&vif->lock);
>> +synchronize_net();
> 
> So, num_queues is your RCU protected value, rather than the queues pointer, 
> in which case I think you probably need to change code such as
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/xen-netback/netback.c?id=refs/tags/v4.10#n216
> 
> to be gated on num_queues.
> 
> Also shouldn't xenvif_up(), xenvif_down() and xenvif_get_ethtool_stats() not 
> be using rcu_read_lock() and rcu_dereference() of num_queues as well?
> 
>   Paul
> 
>>
>> -vfree(queues);
>> +for (queue_index = 0; queue_index < num_queues;
>> ++queue_index)
>> +xenvif_deinit_queue(&vif->queues[queue_index]);
>> +
>> +vfree(vif->queues);
>> +vif->queues = NULL;
>>
>>  xenvif_disconnect_ctrl(vif);
>>  }
>> --
>> 1.8.3.1
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xen-netback: fix race condition on XenBus disconnect

2017-03-02 Thread Igor Druzhinin
In some cases during XenBus disconnect event handling and subsequent
queue resource release there may be some TX handlers active on
other processors. Use RCU in order to synchronize with them.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 drivers/net/xen-netback/interface.c | 13 -
 drivers/net/xen-netback/xenbus.c| 17 +++--
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index a2d32676..32e2cc6 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -164,7 +164,7 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
-   unsigned int num_queues = vif->num_queues;
+   unsigned int num_queues = rcu_dereference(vif)->num_queues;
u16 index;
struct xenvif_rx_cb *cb;
 
@@ -221,18 +221,21 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
+   unsigned int num_queues;
u64 rx_bytes = 0;
u64 rx_packets = 0;
u64 tx_bytes = 0;
u64 tx_packets = 0;
unsigned int index;
 
-   spin_lock(&vif->lock);
-   if (vif->queues == NULL)
+   rcu_read_lock();
+
+   num_queues = rcu_dereference(vif)->num_queues;
+   if (num_queues < 1)
goto out;
 
/* Aggregate tx and rx stats from each queue */
-   for (index = 0; index < vif->num_queues; ++index) {
+   for (index = 0; index < num_queues; ++index) {
queue = &vif->queues[index];
rx_bytes += queue->stats.rx_bytes;
rx_packets += queue->stats.rx_packets;
@@ -241,7 +244,7 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
}
 
 out:
-   spin_unlock(&vif->lock);
+   rcu_read_unlock();
 
vif->dev->stats.rx_bytes = rx_bytes;
vif->dev->stats.rx_packets = rx_packets;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index d2d7cd9..76efb01 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -495,26 +495,23 @@ static void backend_disconnect(struct backend_info *be)
struct xenvif *vif = be->vif;
 
if (vif) {
+   unsigned int num_queues = vif->num_queues;
unsigned int queue_index;
-   struct xenvif_queue *queues;
 
xen_unregister_watchers(vif);
 #ifdef CONFIG_DEBUG_FS
xenvif_debugfs_delif(vif);
 #endif /* CONFIG_DEBUG_FS */
xenvif_disconnect_data(vif);
-   for (queue_index = 0;
-queue_index < vif->num_queues;
-++queue_index)
-   xenvif_deinit_queue(&vif->queues[queue_index]);
 
-   spin_lock(&vif->lock);
-   queues = vif->queues;
vif->num_queues = 0;
-   vif->queues = NULL;
-   spin_unlock(&vif->lock);
+   synchronize_net();
 
-   vfree(queues);
+   for (queue_index = 0; queue_index < num_queues; ++queue_index)
+   xenvif_deinit_queue(&vif->queues[queue_index]);
+
+   vfree(vif->queues);
+   vif->queues = NULL;
 
xenvif_disconnect_ctrl(vif);
}
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
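
The ordering this patch relies on can be sketched in plain userspace C with stubbed RCU primitives. This is only an illustration of the sequence, not the driver code: the real `rcu_read_lock()`/`synchronize_net()` do the cross-CPU grace-period work that the stubs here merely stand in for, and all names below are invented for the sketch.

```c
#include <stdlib.h>

struct queue { unsigned long rx_bytes; };

struct vif {
    unsigned int num_queues;   /* published count, read under RCU */
    struct queue *queues;
};

/* Stubs standing in for the kernel primitives; real RCU makes the
 * "wait for readers" step meaningful across CPUs. */
static void rcu_read_lock_stub(void) { }
static void rcu_read_unlock_stub(void) { }
static void synchronize_net_stub(void) { /* all prior readers have finished */ }

/* Reader: snapshot num_queues once and use only that snapshot, so the
 * loop bound cannot change under us mid-iteration. */
unsigned long sum_rx_bytes(const struct vif *vif)
{
    unsigned long total = 0;

    rcu_read_lock_stub();
    unsigned int num_queues = vif->num_queues;  /* rcu_dereference() in the patch */
    for (unsigned int i = 0; i < num_queues; i++)
        total += vif->queues[i].rx_bytes;
    rcu_read_unlock_stub();
    return total;
}

/* Writer: unpublish first, wait for readers, only then free. */
void disconnect(struct vif *vif)
{
    vif->num_queues = 0;        /* new readers see zero queues */
    synchronize_net_stub();     /* readers using the old count drain */
    free(vif->queues);          /* nothing can be iterating the array now */
    vif->queues = NULL;
}
```

The key invariant is that the array is freed only after every reader that could have observed the old `num_queues` has left its read section.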


Re: [Xen-devel] BUG due to "xen-netback: protect resource cleaning on XenBus disconnect"

2017-03-02 Thread Igor Druzhinin
On 02/03/17 12:19, Paul Durrant wrote:
>> -Original Message-
>> From: Juergen Gross [mailto:jgr...@suse.com]
>> Sent: 02 March 2017 12:13
>> To: Wei Liu <wei.l...@citrix.com>
>> Cc: Igor Druzhinin <igor.druzhi...@citrix.com>; xen-devel <de...@lists.xenproject.org>; Linux Kernel Mailing List <ker...@vger.kernel.org>; net...@vger.kernel.org; Boris Ostrovsky
>> <boris.ostrov...@oracle.com>; David Miller <da...@davemloft.net>; Paul
>> Durrant <paul.durr...@citrix.com>
>> Subject: Re: BUG due to "xen-netback: protect resource cleaning on XenBus
>> disconnect"
>>
>> On 02/03/17 13:06, Wei Liu wrote:
>>> On Thu, Mar 02, 2017 at 12:56:20PM +0100, Juergen Gross wrote:
>>>> With commits f16f1df65 and 9a6cdf52b we get in our Xen testing:
>>>>
>>>> [  174.512861] switch: port 2(vif3.0) entered disabled state
>>>> [  174.522735] BUG: sleeping function called from invalid context at
>>>> /home/build/linux-linus/mm/vmalloc.c:1441
>>>> [  174.523451] in_atomic(): 1, irqs_disabled(): 0, pid: 28, name: xenwatch
>>>> [  174.524131] CPU: 1 PID: 28 Comm: xenwatch Tainted: GW
>>>> 4.10.0upstream-11073-g4977ab6-dirty #1
>>>> [  174.524819] Hardware name: MSI MS-7680/H61M-P23 (MS-7680), BIOS
>> V17.0
>>>> 03/14/2011
>>>> [  174.525517] Call Trace:
>>>> [  174.526217]  show_stack+0x23/0x60
>>>> [  174.526899]  dump_stack+0x5b/0x88
>>>> [  174.527562]  ___might_sleep+0xde/0x130
>>>> [  174.528208]  __might_sleep+0x35/0xa0
>>>> [  174.528840]  ? _raw_spin_unlock_irqrestore+0x13/0x20
>>>> [  174.529463]  ? __wake_up+0x40/0x50
>>>> [  174.530089]  remove_vm_area+0x20/0x90
>>>> [  174.530724]  __vunmap+0x1d/0xc0
>>>> [  174.531346]  ? delete_object_full+0x13/0x20
>>>> [  174.531973]  vfree+0x40/0x80
>>>> [  174.532594]  set_backend_state+0x18a/0xa90
>>>> [  174.533221]  ? dwc_scan_descriptors+0x24d/0x430
>>>> [  174.533850]  ? kfree+0x5b/0xc0
>>>> [  174.534476]  ? xenbus_read+0x3d/0x50
>>>> [  174.535101]  ? xenbus_read+0x3d/0x50
>>>> [  174.535718]  ? xenbus_gather+0x31/0x90
>>>> [  174.536332]  ? ___might_sleep+0xf6/0x130
>>>> [  174.536945]  frontend_changed+0x6b/0xd0
>>>> [  174.537565]  xenbus_otherend_changed+0x7d/0x80
>>>> [  174.538185]  frontend_changed+0x12/0x20
>>>> [  174.538803]  xenwatch_thread+0x74/0x110
>>>> [  174.539417]  ? woken_wake_function+0x20/0x20
>>>> [  174.540049]  kthread+0xe5/0x120
>>>> [  174.540663]  ? xenbus_printf+0x50/0x50
>>>> [  174.541278]  ? __kthread_init_worker+0x40/0x40
>>>> [  174.541898]  ret_from_fork+0x21/0x2c
>>>> [  174.548635] switch: port 2(vif3.0) entered disabled state
>>>>
>>>> I believe calling vfree() when holding a spin_lock isn't a good idea.
>>>>
>>>
>>> Use vfree_atomic instead?
>>
>> Hmm, isn't this overkill here?
>>
>> You can just set a local variable with the address and do vfree() after
>> releasing the lock.
>>
> 
> Yep, that's what I was thinking. Patch coming shortly.
> 
>   Paul

We have an internal patch that was just recently tested without using
spinlocks. Calling vfree() inside the spinlock section is not the worst
that could happen, as our testing revealed. I switched to RCU for
protecting the environment from memory release. I'll share it today.

Igor

> 
>>
>> Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
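
The fix Juergen and Paul converge on — detach the pointer under the lock, free it after releasing the lock — can be sketched in userspace (all names here are hypothetical; an `atomic_flag` spin loop stands in for the kernel spinlock, and `free()` for the may-sleep `vfree()`):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct vif_state {
    atomic_flag lock;           /* stand-in for the kernel spinlock */
    unsigned int num_queues;
    void *queues;
};

static void vif_spin_lock(struct vif_state *vif)
{
    while (atomic_flag_test_and_set(&vif->lock))
        ;                       /* spin */
}

static void vif_spin_unlock(struct vif_state *vif)
{
    atomic_flag_clear(&vif->lock);
}

/* Only pointer manipulation happens inside the critical section; the
 * potentially sleeping free runs after the lock is dropped, avoiding
 * the "sleeping function called from invalid context" BUG above. */
void backend_disconnect_fixed(struct vif_state *vif)
{
    void *queues;

    vif_spin_lock(vif);
    queues = vif->queues;       /* detach under the lock */
    vif->queues = NULL;
    vif->num_queues = 0;
    vif_spin_unlock(vif);

    free(queues);               /* safe: lock no longer held */
}
```

Concurrent readers that check `vif->queues` under the same lock see either the old array or NULL, never a freed pointer.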


[Xen-devel] [PATCH v2 0/2] xen-netback: fix memory leaks on XenBus disconnect

2017-01-17 Thread Igor Druzhinin
Just split the initial patch in two as proposed by Wei.

Since the approach to locking netdev statistics is inconsistent across
the kernel (it tends not to have any locking at all), we'd better rely
on our internal lock for this purpose.

Igor Druzhinin (2):
  xen-netback: fix memory leaks on XenBus disconnect
  xen-netback: protect resource cleaning on XenBus disconnect

 drivers/net/xen-netback/interface.c |  6 --
 drivers/net/xen-netback/xenbus.c| 13 +
 2 files changed, 17 insertions(+), 2 deletions(-)

-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/2] xen-netback: fix memory leaks on XenBus disconnect

2017-01-17 Thread Igor Druzhinin
Eliminate memory leaks introduced several years ago by cleaning up the
queue resources which are allocated on the XenBus connection event,
namely the queue structure array and the pages used for IO rings.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 drivers/net/xen-netback/xenbus.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 6c57b02..3e99071 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -493,11 +493,20 @@ static int backend_create_xenvif(struct backend_info *be)
 static void backend_disconnect(struct backend_info *be)
 {
if (be->vif) {
+   unsigned int queue_index;
+
xen_unregister_watchers(be->vif);
 #ifdef CONFIG_DEBUG_FS
xenvif_debugfs_delif(be->vif);
 #endif /* CONFIG_DEBUG_FS */
xenvif_disconnect_data(be->vif);
+   for (queue_index = 0; queue_index < be->vif->num_queues; 
++queue_index)
xenvif_deinit_queue(&be->vif->queues[queue_index]);
+
+   vfree(be->vif->queues);
+   be->vif->num_queues = 0;
+   be->vif->queues = NULL;
+
xenvif_disconnect_ctrl(be->vif);
}
 }
@@ -1026,6 +1035,8 @@ static void connect(struct backend_info *be)
 err:
if (be->vif->num_queues > 0)
xenvif_disconnect_data(be->vif); /* Clean up existing queues */
+   for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index)
+   xenvif_deinit_queue(&be->vif->queues[queue_index]);
vfree(be->vif->queues);
be->vif->queues = NULL;
be->vif->num_queues = 0;
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
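
The shape of the leak fix — tear down every queue, then release the array itself, then clear the bookkeeping fields — can be mirrored in a small standalone sketch (hypothetical names; `deinit_queue` stands in for `xenvif_deinit_queue`, and a counter makes the per-queue teardown observable):

```c
#include <stdlib.h>

struct queue { int initialized; };

unsigned int deinit_calls;      /* counts per-queue teardowns */

/* Stand-in for xenvif_deinit_queue(): releases per-queue resources. */
void deinit_queue(struct queue *q)
{
    q->initialized = 0;
    deinit_calls++;
}

/* Mirror of the fix: deinit each queue before freeing the array that
 * holds them (freeing first would leak the per-queue resources), then
 * clear the fields so later disconnects see a consistent empty state. */
void release_queues(struct queue **queues, unsigned int *num_queues)
{
    for (unsigned int i = 0; i < *num_queues; i++)
        deinit_queue(&(*queues)[i]);
    free(*queues);
    *queues = NULL;
    *num_queues = 0;
}
```

The ordering matters: per-queue resources hang off the array elements, so the element-wise teardown must precede the `free()` of the array.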


[Xen-devel] [PATCH v2 2/2] xen-netback: protect resource cleaning on XenBus disconnect

2017-01-17 Thread Igor Druzhinin
vif->lock is used to protect statistics gathering agents from using the
queue structure during cleaning.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 drivers/net/xen-netback/interface.c | 6 --
 drivers/net/xen-netback/xenbus.c| 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index 41c69b3..c48252a 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -230,18 +230,18 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
-   unsigned int num_queues = vif->num_queues;
unsigned long rx_bytes = 0;
unsigned long rx_packets = 0;
unsigned long tx_bytes = 0;
unsigned long tx_packets = 0;
unsigned int index;
 
+   spin_lock(&vif->lock);
if (vif->queues == NULL)
goto out;
 
/* Aggregate tx and rx stats from each queue */
-   for (index = 0; index < num_queues; ++index) {
+   for (index = 0; index < vif->num_queues; ++index) {
queue = &vif->queues[index];
rx_bytes += queue->stats.rx_bytes;
rx_packets += queue->stats.rx_packets;
@@ -250,6 +250,8 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
}
 
 out:
+   spin_unlock(&vif->lock);
+
vif->dev->stats.rx_bytes = rx_bytes;
vif->dev->stats.rx_packets = rx_packets;
vif->dev->stats.tx_bytes = tx_bytes;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 3e99071..d82cd71 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -503,9 +503,11 @@ static void backend_disconnect(struct backend_info *be)
for (queue_index = 0; queue_index < be->vif->num_queues; 
++queue_index)
xenvif_deinit_queue(&be->vif->queues[queue_index]);
 
+   spin_lock(&be->vif->lock);
vfree(be->vif->queues);
be->vif->num_queues = 0;
be->vif->queues = NULL;
+   spin_unlock(&be->vif->lock);
 
xenvif_disconnect_ctrl(be->vif);
}
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen-netback: fix memory leaks on XenBus disconnect

2017-01-12 Thread Igor Druzhinin
On 12/01/17 17:51, Igor Druzhinin wrote:
> Eliminate memory leaks introduced several years ago by cleaning the queue
> resources which are allocated on XenBus connection event. Namely, queue
> structure array and pages used for IO rings.
> vif->lock is used to protect statistics gathering agents from using the
> queue structure during cleaning.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
> ---
>  drivers/net/xen-netback/interface.c |  6 --
>  drivers/net/xen-netback/xenbus.c| 13 +
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/interface.c 
> b/drivers/net/xen-netback/interface.c
> index e30ffd2..5795213 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -221,18 +221,18 @@ static struct net_device_stats *xenvif_get_stats(struct 
> net_device *dev)
>  {
>   struct xenvif *vif = netdev_priv(dev);
>   struct xenvif_queue *queue = NULL;
> - unsigned int num_queues = vif->num_queues;
>   unsigned long rx_bytes = 0;
>   unsigned long rx_packets = 0;
>   unsigned long tx_bytes = 0;
>   unsigned long tx_packets = 0;
>   unsigned int index;
>  
> + spin_lock(&vif->lock);
>   if (vif->queues == NULL)
>   goto out;
>  
>   /* Aggregate tx and rx stats from each queue */
> - for (index = 0; index < num_queues; ++index) {
> + for (index = 0; index < vif->num_queues; ++index) {
>   queue = &vif->queues[index];
>   rx_bytes += queue->stats.rx_bytes;
>   rx_packets += queue->stats.rx_packets;
> @@ -241,6 +241,8 @@ static struct net_device_stats *xenvif_get_stats(struct 
> net_device *dev)
>   }
>  
>  out:
> + spin_unlock(&vif->lock);
> +
>   vif->dev->stats.rx_bytes = rx_bytes;
>   vif->dev->stats.rx_packets = rx_packets;
>   vif->dev->stats.tx_bytes = tx_bytes;
> diff --git a/drivers/net/xen-netback/xenbus.c 
> b/drivers/net/xen-netback/xenbus.c
> index 3124eae..85b742e 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -493,11 +493,22 @@ static int backend_create_xenvif(struct backend_info 
> *be)
>  static void backend_disconnect(struct backend_info *be)
>  {
>   if (be->vif) {
> + unsigned int queue_index;
> +
>   xen_unregister_watchers(be->vif);
>  #ifdef CONFIG_DEBUG_FS
>   xenvif_debugfs_delif(be->vif);
>  #endif /* CONFIG_DEBUG_FS */
>   xenvif_disconnect_data(be->vif);
> + for (queue_index = 0; queue_index < be->vif->num_queues; 
> ++queue_index)
> + xenvif_deinit_queue(&be->vif->queues[queue_index]);
> +
> + spin_lock(&be->vif->lock);
> + vfree(be->vif->queues);
> + be->vif->num_queues = 0;
> + be->vif->queues = NULL;
> + spin_unlock(&be->vif->lock);
> +
>   xenvif_disconnect_ctrl(be->vif);
>   }
>  }
> @@ -1034,6 +1045,8 @@ static void connect(struct backend_info *be)
>  err:
>   if (be->vif->num_queues > 0)
>   xenvif_disconnect_data(be->vif); /* Clean up existing queues */
> + for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index)
> + xenvif_deinit_queue(&be->vif->queues[queue_index]);
>   vfree(be->vif->queues);
>   be->vif->queues = NULL;
>   be->vif->num_queues = 0;
> 

Add Juergen Gross to CC.

Igor

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xen-netback: fix memory leaks on XenBus disconnect

2017-01-12 Thread Igor Druzhinin
Eliminate memory leaks introduced several years ago by cleaning up the queue
resources that are allocated on the XenBus connection event, namely the queue
structure array and the pages used for the IO rings.
vif->lock is used to protect statistics-gathering agents from using the
queue structures while they are being freed.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
---
 drivers/net/xen-netback/interface.c |  6 ++++--
 drivers/net/xen-netback/xenbus.c    | 13 +++++++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index e30ffd2..5795213 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -221,18 +221,18 @@ static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
-   unsigned int num_queues = vif->num_queues;
unsigned long rx_bytes = 0;
unsigned long rx_packets = 0;
unsigned long tx_bytes = 0;
unsigned long tx_packets = 0;
unsigned int index;
 
+   spin_lock(&vif->lock);
if (vif->queues == NULL)
goto out;
 
/* Aggregate tx and rx stats from each queue */
-   for (index = 0; index < num_queues; ++index) {
+   for (index = 0; index < vif->num_queues; ++index) {
queue = &vif->queues[index];
rx_bytes += queue->stats.rx_bytes;
rx_packets += queue->stats.rx_packets;
@@ -241,6 +241,8 @@ static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
}
 
 out:
+   spin_unlock(&vif->lock);
+
vif->dev->stats.rx_bytes = rx_bytes;
vif->dev->stats.rx_packets = rx_packets;
vif->dev->stats.tx_bytes = tx_bytes;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 3124eae..85b742e 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -493,11 +493,22 @@ static int backend_create_xenvif(struct backend_info *be)
 static void backend_disconnect(struct backend_info *be)
 {
if (be->vif) {
+   unsigned int queue_index;
+
xen_unregister_watchers(be->vif);
 #ifdef CONFIG_DEBUG_FS
xenvif_debugfs_delif(be->vif);
 #endif /* CONFIG_DEBUG_FS */
xenvif_disconnect_data(be->vif);
+   for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index)
+           xenvif_deinit_queue(&be->vif->queues[queue_index]);
+
+   spin_lock(&be->vif->lock);
+   vfree(be->vif->queues);
+   be->vif->num_queues = 0;
+   be->vif->queues = NULL;
+   spin_unlock(&be->vif->lock);
+
xenvif_disconnect_ctrl(be->vif);
}
 }
@@ -1034,6 +1045,8 @@ static void connect(struct backend_info *be)
 err:
if (be->vif->num_queues > 0)
xenvif_disconnect_data(be->vif); /* Clean up existing queues */
+   for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index)
+           xenvif_deinit_queue(&be->vif->queues[queue_index]);
vfree(be->vif->queues);
be->vif->queues = NULL;
be->vif->num_queues = 0;
-- 
1.8.3.1
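[Editorial note: the lock-around-teardown pattern the patch above introduces — the stats reader and the disconnect path serializing on vif->lock, with a NULL check guarding the freed array — can be sketched in isolation with plain pthreads. All names below are illustrative models, not the real xen-netback structures.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Simplified model of the vif->lock usage in the patch: the reader checks
 * queues against NULL under the same lock the teardown path holds while
 * freeing the array and clearing the pointer, so the reader can never
 * dereference freed queue memory. */

struct queue {
    unsigned long rx_bytes;
};

struct vif {
    pthread_mutex_t lock;
    struct queue *queues;
    unsigned int num_queues;
};

/* Mirrors xenvif_get_stats(): aggregate per-queue counters under the lock. */
static unsigned long get_stats(struct vif *vif)
{
    unsigned long rx_bytes = 0;
    unsigned int i;

    pthread_mutex_lock(&vif->lock);
    if (vif->queues != NULL)
        for (i = 0; i < vif->num_queues; ++i)
            rx_bytes += vif->queues[i].rx_bytes;
    pthread_mutex_unlock(&vif->lock);

    return rx_bytes;
}

/* Mirrors backend_disconnect(): free the array and clear the pointer and
 * count atomically with respect to the reader above. */
static void disconnect(struct vif *vif)
{
    pthread_mutex_lock(&vif->lock);
    free(vif->queues);
    vif->queues = NULL;
    vif->num_queues = 0;
    pthread_mutex_unlock(&vif->lock);
}
```

A reader that instead captured `num_queues` into a local before taking the lock (as the pre-patch code did) could iterate over queues that disconnect has already freed; moving both the NULL check and the loop bound inside the critical section closes that window.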




Re: [Xen-devel] [PATCH] trace: Fix incorrect number of pages used for trace metadata

2016-10-04 Thread Igor Druzhinin

Checked that.

Tested-by: Igor Druzhinin <igor.druzhi...@citrix.com>

On 30/09/16 17:12, George Dunlap wrote:

On 30/09/16 17:04, Igor Druzhinin wrote:

On 30/09/16 15:46, George Dunlap wrote:

On 29/09/16 14:53, Igor Druzhinin wrote:

As long as t_info_first_offset is calculated in uint32_t offsets we need to
multiply it by sizeof(uint32_t) in order to get the right number of pages
for trace metadata. Not doing that makes it impossible to read the trace
buffer correctly from userspace for some corner cases.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>


Hmm, looks like we actually want to revert this c/s fbf96e6, "xentrace:
correct formula to calculate t_info_pages".  But that one was presumably
written (and Acked by me) because the variable name there,
t_info_first_offset, is confusing.

The other mistake in fbf96e6 is that before t_info_words was actually
denominated in words; but after it's denominated in bytes (which is
again confusing).

What about something like the attached instead?  This should fix your
problem while making the code clearer.

 -George




Reviewed-by: Igor Druzhinin <igor.druzhi...@citrix.com>


Thanks.  Any chance I could get a Tested-by as well? :-)

 -George
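[Editorial note: the unit mix-up discussed in this thread — a count of uint32_t words being converted to pages as if it were a byte count — is easy to demonstrate in isolation. The constants and helper names below are illustrative, not Xen's actual trace.c code.]

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Correct conversion: a t_info_first_offset-style value is counted in
 * uint32_t words, so scale by sizeof(uint32_t) to get bytes before
 * rounding up to whole pages. */
static unsigned int pages_for_words(unsigned int nr_words)
{
    return DIV_ROUND_UP(nr_words * (unsigned int)sizeof(uint32_t), PAGE_SIZE);
}

/* The buggy formula in effect treated the word count as a byte count,
 * under-allocating metadata pages in the corner cases mentioned above. */
static unsigned int pages_for_words_buggy(unsigned int nr_words)
{
    return DIV_ROUND_UP(nr_words, PAGE_SIZE);
}
```

With 1025 words of metadata the correct calculation needs two 4 KiB pages (4100 bytes), while the byte-count confusion yields only one, which is exactly the kind of off-by-one-page result that made the trace buffer unreadable from userspace.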





Re: [Xen-devel] [PATCH] trace: Fix incorrect number of pages used for trace metadata

2016-09-30 Thread Igor Druzhinin

On 30/09/16 15:46, George Dunlap wrote:

On 29/09/16 14:53, Igor Druzhinin wrote:

As long as t_info_first_offset is calculated in uint32_t offsets we need to
multiply it by sizeof(uint32_t) in order to get the right number of pages
for trace metadata. Not doing that makes it impossible to read the trace
buffer correctly from userspace for some corner cases.

Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>


Hmm, looks like we actually want to revert this c/s fbf96e6, "xentrace:
correct formula to calculate t_info_pages".  But that one was presumably
written (and Acked by me) because the variable name there,
t_info_first_offset, is confusing.

The other mistake in fbf96e6 is that before t_info_words was actually
denominated in words; but after it's denominated in bytes (which is
again confusing).

What about something like the attached instead?  This should fix your
problem while making the code clearer.

 -George




Reviewed-by: Igor Druzhinin <igor.druzhi...@citrix.com>
