Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.

2017-02-13 Thread Jan Beulich
>>> On 13.02.17 at 17:56,  wrote:
 On 13.02.17 at 17:32,  wrote:
>> On 09/02/17 16:22, Jan Beulich wrote:
>>> Mind giving the attached patch a try (which admittedly was only
>>> lightly tested so far - in particular I haven't seen the second of
>>> the debug messages being logged, yet)?
>> Patch looks promising. I couldn't do much thorough testing, but initial
>> reboot cycles (around 20 reboots of 32 VMs) went fine.
> 
> Thanks. The most interesting part, though, is whether the 2nd of the
> two log messages ever showed up. In any event I'll submit the cleaned-up
> patch, so that more formal discussion of the approach can happen there.

Actually, no: while putting together the commit message I thought
of a second situation which would likely also need addressing. If
we bypass __context_switch() on the context-switch-in path as well
(because the same vCPU is being scheduled again), the CPU may also
have lost control of the VMCS. There's no context-switch hook
there to place the reload invocation in, though, so I'll first have
to think about what the best model would be (of course I'd be more
than happy to hear suggestions).
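
To make the gap concrete, here is a minimal sketch of the kind of check
such a reload would have to perform somewhere on the resume path. The
helper name and the current_vmcs / vmcs_pa fields are purely
illustrative, not necessarily what an eventual patch would use:

/*
 * Illustrative sketch only: if __context_switch() was bypassed because
 * the same vCPU is being scheduled again, another pCPU may meanwhile
 * have VMCLEARed our VMCS (e.g. via a remote vmx_vmcs_enter()).  Before
 * any further __vmwrite(), ownership would have to be re-established.
 */
static void vmx_vmcs_reload_if_lost(struct vcpu *v)   /* hypothetical */
{
    if ( this_cpu(current_vmcs) == v->arch.hvm_vmx.vmcs_pa )
        return;                 /* still the active VMCS on this pCPU */

    __vmptrld(v->arch.hvm_vmx.vmcs_pa);      /* make it current again */
    this_cpu(current_vmcs) = v->arch.hvm_vmx.vmcs_pa;
}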

Jan




Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.

2017-02-13 Thread Jan Beulich
>>> On 13.02.17 at 17:32,  wrote:
> On 09/02/17 16:22, Jan Beulich wrote:
>> Mind giving the attached patch a try (which admittedly was only
>> lightly tested so far - in particular I haven't seen the second of
>> the debug messages being logged, yet)?
> Patch looks promising. I couldn't do much thorough testing, but initial
> reboot cycles (around 20 reboots of 32 VMs) went fine.

Thanks. The most interesting part, though, is whether the 2nd of the
two log messages ever showed up. In any event I'll submit the cleaned-up
patch, so that more formal discussion of the approach can happen there.

Jan




Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.

2017-02-13 Thread anshul makkar

Hi Jan,


On 09/02/17 16:22, Jan Beulich wrote:

On 08.02.17 at 16:32,  wrote:

On 07.02.17 at 18:26,  wrote:

Facing an issue where a bootstorm of guests leads to a host crash. I
debugged it and found that enabling PML introduces a race condition
during the guest teardown stage, between disabling PML on one vCPU and a
context switch happening for another vCPU.

The crash happens only on Broadwell processors, as PML was introduced in
this generation.

Here is my analysis:

Race condition:

vmcs.c: vmx_vcpu_disable_pml(vcpu) { vmx_vmcs_enter(); __vmwrite(<disable
PML>); vmx_vmcs_exit(); }

In between, I have a call path from another pCPU executing context
switch -> vmx_fpu_leave(), and a crash on the vmwrite:

    if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
    {
        v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
        __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);  /* crash */
    }

So that's after current has already changed, meaning it's effectively
dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
however, assumes that any user of a VMCS either owns the lock
or has current as the owner of the VMCS. Of course such a call
also can't be added here, as a vCPU on the context-switch-from
path can't vcpu_pause() itself.

Taking the above together with the bypassing of __context_switch()
when the incoming vCPU is the idle one (which means that, via
context_saved(), ->is_running will be cleared before
->ctxt_switch_from() is run), the vcpu_pause() invocation in
vmx_vmcs_try_enter() may not have to wait at all if the call
happens between the clearing of ->is_running and the
eventual invocation of vmx_ctxt_switch_from().

Mind giving the attached patch a try (which admittedly was only
lightly tested so far - in particular I haven't seen the second of
the debug messages being logged, yet)?
Patch looks promising. I couldn't do much thorough testing, but initial
reboot cycles (around 20 reboots of 32 VMs) went fine.


Jan


Anshul



Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.

2017-02-09 Thread Jan Beulich
>>> On 08.02.17 at 16:32,  wrote:
 On 07.02.17 at 18:26,  wrote:
>> Facing an issue where a bootstorm of guests leads to a host crash. I
>> debugged it and found that enabling PML introduces a race condition
>> during the guest teardown stage, between disabling PML on one vCPU and
>> a context switch happening for another vCPU.
>> 
>> The crash happens only on Broadwell processors, as PML was introduced
>> in this generation.
>> 
>> Here is my analysis:
>> 
>> Race condition:
>> 
>> vmcs.c: vmx_vcpu_disable_pml(vcpu) { vmx_vmcs_enter(); __vmwrite(<disable
>> PML>); vmx_vmcs_exit(); }
>> 
>> In between, I have a call path from another pCPU executing context
>> switch -> vmx_fpu_leave(), and a crash on the vmwrite:
>> 
>>     if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
>>     {
>>         v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
>>         __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);  /* crash */
>>     }
> 
> So that's after current has already changed, meaning it's effectively
> dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
> The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
> however, assumes that any user of a VMCS either owns the lock
> or has current as the owner of the VMCS. Of course such a call
> also can't be added here, as a vCPU on the context-switch-from
> path can't vcpu_pause() itself.
> 
> Taking the above together with the bypassing of __context_switch()
> when the incoming vCPU is the idle one (which means that, via
> context_saved(), ->is_running will be cleared before
> ->ctxt_switch_from() is run), the vcpu_pause() invocation in
> vmx_vmcs_try_enter() may not have to wait at all if the call
> happens between the clearing of ->is_running and the
> eventual invocation of vmx_ctxt_switch_from().
> 
> If the above makes sense (which I'm not at all sure of), the
> question is whether using this_cpu(curr_vcpu) instead of
> current in the VMCS enter/exit functions would help.

This won't help, as it won't make vcpu_pause() wait (which is the
core of the problem).

Jan




Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.

2017-02-08 Thread anshul makkar
Maybe "causes", but I'm not sure: disabling this feature resolves the
issue (no hypervisor crash), and we have reports that the same code base
works fine on Haswell (which has no hardware support for PML; the code
base is the same, but the PML code path is inherently never executed
there, so no crash).


Anshul


On 08/02/17 15:37, Jan Beulich wrote:

On 08.02.17 at 15:58,  wrote:

The code is trying to destroy multiple vCPUs held by the domain:
complete_domain_destroy() -> hvm_vcpu_destroy() for each vCPU.

In vmx_vcpu_destroy() there is a call to vmx_vcpu_disable_pml(), which
leads to a race while destroying foreign vCPUs when other domains are
rebooting on the same CPUs.

With a single domain reboot, no race is observed.

Commit e18d4274772e52ac81fda1acb246d11ef666e5fe causes this race condition.

Causes or exposes? I ask because the similar reports we have are
all on 4.4.x, i.e. no PML code there at all.

Jan






Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.

2017-02-08 Thread Jan Beulich
>>> On 08.02.17 at 15:58,  wrote:
> The code is trying to destroy multiple vCPUs held by the domain:
> complete_domain_destroy() -> hvm_vcpu_destroy() for each vCPU.
> 
> In vmx_vcpu_destroy() there is a call to vmx_vcpu_disable_pml(), which
> leads to a race while destroying foreign vCPUs when other domains are
> rebooting on the same CPUs.
> 
> With a single domain reboot, no race is observed.
> 
> Commit e18d4274772e52ac81fda1acb246d11ef666e5fe causes this race condition.

Causes or exposes? I ask because the similar reports we have are
all on 4.4.x, i.e. no PML code there at all.

Jan




Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.

2017-02-08 Thread Jan Beulich
>>> On 07.02.17 at 18:26,  wrote:
> Facing an issue where a bootstorm of guests leads to a host crash. I
> debugged it and found that enabling PML introduces a race condition
> during the guest teardown stage, between disabling PML on one vCPU and
> a context switch happening for another vCPU.
> 
> The crash happens only on Broadwell processors, as PML was introduced
> in this generation.
> 
> Here is my analysis:
> 
> Race condition:
> 
> vmcs.c: vmx_vcpu_disable_pml(vcpu) { vmx_vmcs_enter(); __vmwrite(<disable
> PML>); vmx_vmcs_exit(); }
> 
> In between, I have a call path from another pCPU executing context
> switch -> vmx_fpu_leave(), and a crash on the vmwrite:
> 
>     if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
>     {
>         v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
>         __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);  /* crash */
>     }

So that's after current has already changed, meaning it's effectively
dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
however, assumes that any user of a VMCS either owns the lock
or has current as the owner of the VMCS. Of course such a call
also can't be added here, as a vCPU on the context-switch-from
path can't vcpu_pause() itself.
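
To make that assumption concrete, here is a simplified sketch of the
enter-path logic being referred to (illustrative only, not the literal
code in xen/arch/x86/hvm/vmx/vmcs.c; names such as vmcs_lock and
current_vmcs are placeholders):

/* Simplified sketch of the locking pattern described above
 * (illustrative, not the literal Xen code). */
bool vmx_vmcs_try_enter_sketch(struct vcpu *v)
{
    if ( v == current )
        /* The caller owns the VMCS simply by running v on this pCPU. */
        return this_cpu(current_vmcs) == v->arch.hvm_vmx.vmcs_pa;

    /* Foreign VMCS: stop the vCPU, then take exclusive ownership. */
    vcpu_pause(v);                     /* only waits while v is running   */
    spin_lock(&v->arch.hvm_vmx.vmcs_lock);
    vmx_clear_vmcs(v);                 /* VMCLEAR it wherever it's loaded */
    vmx_load_vmcs(v);                  /* VMPTRLD it on this pCPU         */
    return true;
}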

Taking the above together with the bypassing of __context_switch()
when the incoming vCPU is the idle one (which means that, via
context_saved(), ->is_running will be cleared before
->ctxt_switch_from() is run), the vcpu_pause() invocation in
vmx_vmcs_try_enter() may not have to wait at all if the call
happens between the clearing of ->is_running and the
eventual invocation of vmx_ctxt_switch_from().
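
Spelled out as an interleaving, the window would look roughly like this
(only a sketch of the scenario just described, with call sites
simplified):

/*
 * pCPU0 (d1v0 -> idle, lazy switch)       pCPU1 (tearing down d1v0)
 * ---------------------------------       --------------------------------
 * context_switch(prev=d1v0, next=idle)
 *   next is idle, so __context_switch()
 *   is bypassed
 *   context_saved(d1v0)
 *     clears d1v0->is_running             vmx_vcpu_disable_pml(d1v0)
 *                                           vmx_vmcs_enter(d1v0)
 *                                             vcpu_pause(d1v0)
 *                                               is_running already clear,
 *                                               so this returns at once
 *                                             VMCLEAR d1v0's VMCS on pCPU0
 *                                             (IPI), VMPTRLD it on pCPU1
 * ... lazy state flushed later:
 * vmx_ctxt_switch_from(d1v0)
 *   vmx_fpu_leave(d1v0)
 *     __vmwrite(HOST_CR0, ...) <- no
 *     current VMCS on pCPU0: crash
 */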

If the above makes sense (which I'm not at all sure of), the
question is whether using this_cpu(curr_vcpu) instead of
current in the VMCS enter/exit functions would help.

Jan




Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.

2017-02-08 Thread anshul makkar

Hi,

The code is trying to destroy multiple vCPUs held by the domain:
complete_domain_destroy() -> hvm_vcpu_destroy() for each vCPU.


In vmx_vcpu_destroy() there is a call to vmx_vcpu_disable_pml(), which
leads to a race while destroying foreign vCPUs when other domains are
rebooting on the same CPUs.


With a single domain reboot, no race is observed.

Commit e18d4274772e52ac81fda1acb246d11ef666e5fe causes this race condition.
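
For reference, the disable-PML path in question is roughly the following
(condensed sketch; the exact control field and flag names are from
memory and may differ from the tree):

/* Condensed sketch of the teardown path under discussion (illustrative). */
static void vmx_vcpu_disable_pml_sketch(struct vcpu *v)
{
    vmx_vmcs_enter(v);        /* pause v if needed, VMPTRLD its VMCS here */

    /* Clear the PML enable bit in the secondary execution controls. */
    v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
              v->arch.hvm_vmx.secondary_exec_control);

    vmx_vmcs_exit(v);         /* release the VMCS again, unpause v */
}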

Anshul


On 07/02/17 17:58, anshul makkar wrote:

Hi, sorry, I forgot to include you in the cc list.

Anshul


On 07/02/17 17:26, anshul makkar wrote:

Hi,

Facing an issue where a bootstorm of guests leads to a host crash. I
debugged it and found that enabling PML introduces a race condition
during the guest teardown stage, between disabling PML on one vCPU and
a context switch happening for another vCPU.


The crash happens only on Broadwell processors, as PML was introduced
in this generation.


Here is my analysis:

Race condition:

vmcs.c: vmx_vcpu_disable_pml(vcpu) { vmx_vmcs_enter(); __vmwrite(<disable
PML>); vmx_vmcs_exit(); }


In between, I have a call path from another pCPU executing context
switch -> vmx_fpu_leave(), and a crash on the vmwrite:


    if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
    {
        v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
        __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);  /* crash */
    }

Debug logs:
(XEN) [221256.749928] VMWRITE VMCS Invalid !
(XEN) [221256.754870] **[00] { now c93b4341df1d, hw 
0035fffea000, op 0035fffea000 } vmclear
(XEN) [221256.765052] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221256.773969] **[01] { now c93b4341e099, hw 
, op 0035fffea000 } vmptrld
(XEN) [221256.784150] ** frames [ 82d0801f0765 
vmx_vmcs_try_enter+0x95/0xb0 ]


(XEN) [221256.792197] **[02] { now c93b4341e1f1, hw 
0035fffea000, op 0035fffea000 } vmclear
(XEN) [221256.802378] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221256.811298] **[03] { now c93b5784dd0a, hw 
, op 0039d7074000 } vmptrld
(XEN) [221256.821478] ** frames [ 82d0801f4c31 
vmx_do_resume+0x51/0x150 ]


(XEN) [221256.829139] **[04] { now c93b59d67b5b, hw 
0039d7074000, op 002b9a575000 } vmptrld
(XEN) [221256.839320] ** frames [ 82d0801f4c31 
vmx_do_resume+0x51/0x150 ]


(XEN) [221256.882850] **[07] { now c93b59e71e48, hw 
002b9a575000, op 0039d7074000 } vmptrld
(XEN) [221256.893034] ** frames [ 82d0801f4d13 
vmx_do_resume+0x133/0x150 ]


(XEN) [221256.900790] **[08] { now c93b59e78675, hw 
0039d7074000, op 0040077ae000 } vmptrld
(XEN) [221256.910968] ** frames [ 82d0801f0765 
vmx_vmcs_try_enter+0x95/0xb0 ]


(XEN) [221256.919015] **[09] { now c93b59e78ac8, hw 
0040077ae000, op 0040077ae000 } vmclear
(XEN) [221256.929196] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221256.938117] **[10] { now c93b59e78d72, hw 
, op 0040077ae000 } vmptrld
(XEN) [221256.948297] ** frames [ 82d0801f0765 
vmx_vmcs_try_enter+0x95/0xb0 ]


(XEN) [221256.956345] **[11] { now c93b59e78ff0, hw 
0040077ae000, op 0040077ae000 } vmclear
(XEN) [221256.966525] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221256.975445] **[12] { now c93b59e7deda, hw 
, op 0040077b3000 } vmptrld
(XEN) [221256.985626] ** frames [ 82d0801f0765 
vmx_vmcs_try_enter+0x95/0xb0 ]


(XEN) [221256.993672] **[13] { now c93b59e9fe00, hw 
0040077b3000, op 0040077b3000 } vmclear
(XEN) [221257.003852] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221257.012772] **[14] { now c93b59ea007e, hw 
, op 0040077b3000 } vmptrld
(XEN) [221257.022952] ** frames [ 82d0801f0765 
vmx_vmcs_try_enter+0x95/0xb0 ]


(XEN) [221257.031000] **[15] { now c93b59ea02ba, hw 
0040077b3000, op 0040077b3000 } vmclear
(XEN) [221257.041180] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221257.050101]  
(XEN) [221257.053008] vmcs_ptr:0x, 
vcpu->vmcs:0x2b9a575000



The VMCS is loaded, and before the next call to __vmwrite() it is
cleared by vmx_vcpu_disable_pml.


The log above shows that an IPI is clearing the VMCS between the vmptrld
and the vmwrite, but I also verified that interrupts are disabled during
the context switch and the execution of __vmwrite() in vmx_fpu_leave().
This has got me confused.


Also, I am not sure whether I understand the handling of foreign_vmcs
correctly; that could also be the cause of the race.


Please share any suggestions you may have.


Thanks

Anshul Makkar












[Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.

2017-02-07 Thread anshul makkar

Hi,

Facing an issue where a bootstorm of guests leads to a host crash. I
debugged it and found that enabling PML introduces a race condition
during the guest teardown stage, between disabling PML on one vCPU and a
context switch happening for another vCPU.


The crash happens only on Broadwell processors, as PML was introduced in
this generation.


Here is my analysis:

Race condition:

vmcs.c: vmx_vcpu_disable_pml(vcpu) { vmx_vmcs_enter(); __vmwrite(<disable
PML>); vmx_vmcs_exit(); }


In between, I have a call path from another pCPU executing context
switch -> vmx_fpu_leave(), and a crash on the vmwrite:


    if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
    {
        v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
        __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);  /* crash */
    }

Debug logs:
(XEN) [221256.749928] VMWRITE VMCS Invalid !
(XEN) [221256.754870] **[00] { now c93b4341df1d, hw 
0035fffea000, op 0035fffea000 } vmclear
(XEN) [221256.765052] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221256.773969] **[01] { now c93b4341e099, hw 
, op 0035fffea000 } vmptrld
(XEN) [221256.784150] ** frames [ 82d0801f0765 
vmx_vmcs_try_enter+0x95/0xb0 ]


(XEN) [221256.792197] **[02] { now c93b4341e1f1, hw 
0035fffea000, op 0035fffea000 } vmclear
(XEN) [221256.802378] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221256.811298] **[03] { now c93b5784dd0a, hw 
, op 0039d7074000 } vmptrld
(XEN) [221256.821478] ** frames [ 82d0801f4c31 
vmx_do_resume+0x51/0x150 ]


(XEN) [221256.829139] **[04] { now c93b59d67b5b, hw 
0039d7074000, op 002b9a575000 } vmptrld
(XEN) [221256.839320] ** frames [ 82d0801f4c31 
vmx_do_resume+0x51/0x150 ]


(XEN) [221256.882850] **[07] { now c93b59e71e48, hw 
002b9a575000, op 0039d7074000 } vmptrld
(XEN) [221256.893034] ** frames [ 82d0801f4d13 
vmx_do_resume+0x133/0x150 ]


(XEN) [221256.900790] **[08] { now c93b59e78675, hw 
0039d7074000, op 0040077ae000 } vmptrld
(XEN) [221256.910968] ** frames [ 82d0801f0765 
vmx_vmcs_try_enter+0x95/0xb0 ]


(XEN) [221256.919015] **[09] { now c93b59e78ac8, hw 
0040077ae000, op 0040077ae000 } vmclear
(XEN) [221256.929196] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221256.938117] **[10] { now c93b59e78d72, hw 
, op 0040077ae000 } vmptrld
(XEN) [221256.948297] ** frames [ 82d0801f0765 
vmx_vmcs_try_enter+0x95/0xb0 ]


(XEN) [221256.956345] **[11] { now c93b59e78ff0, hw 
0040077ae000, op 0040077ae000 } vmclear
(XEN) [221256.966525] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221256.975445] **[12] { now c93b59e7deda, hw 
, op 0040077b3000 } vmptrld
(XEN) [221256.985626] ** frames [ 82d0801f0765 
vmx_vmcs_try_enter+0x95/0xb0 ]


(XEN) [221256.993672] **[13] { now c93b59e9fe00, hw 
0040077b3000, op 0040077b3000 } vmclear
(XEN) [221257.003852] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221257.012772] **[14] { now c93b59ea007e, hw 
, op 0040077b3000 } vmptrld
(XEN) [221257.022952] ** frames [ 82d0801f0765 
vmx_vmcs_try_enter+0x95/0xb0 ]


(XEN) [221257.031000] **[15] { now c93b59ea02ba, hw 
0040077b3000, op 0040077b3000 } vmclear
(XEN) [221257.041180] ** frames [ 82d080134652 
smp_call_function_interrupt+0x92/0xa0 ]


(XEN) [221257.050101]  
(XEN) [221257.053008] vmcs_ptr:0x, vcpu->vmcs:0x2b9a575000


The VMCS is loaded, and before the next call to __vmwrite() it is
cleared by vmx_vcpu_disable_pml.


The log above shows that an IPI is clearing the VMCS between the vmptrld
and the vmwrite, but I also verified that interrupts are disabled during
the context switch and the execution of __vmwrite() in vmx_fpu_leave().
This has got me confused.
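
In timeline form, the race being described is roughly the following
(illustrative only, reconstructed from the log entries above):

/*
 * pCPU A (tearing down the guest)         pCPU B (has v's VMCS loaded)
 * -------------------------------         --------------------------------
 * vmx_vcpu_disable_pml(v)
 *   vmx_vmcs_enter(v)
 *     VMCLEAR v's VMCS on pCPU B           smp_call_function_interrupt()
 *     via IPI                                -> the vmclear log entries
 *     VMPTRLD v's VMCS on pCPU A
 *       (vmx_vmcs_try_enter frames)
 *   __vmwrite(<disable PML>)               vmx_fpu_leave(v)
 *   vmx_vmcs_exit(v)                         __vmwrite(HOST_CR0, ...)
 *                                              -> no current VMCS on B:
 *                                                 "VMWRITE VMCS Invalid !"
 */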


Also, I am not sure whether I understand the handling of foreign_vmcs
correctly; that could also be the cause of the race.


Please share any suggestions you may have.


Thanks

Anshul Makkar







___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel