Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
>>> On 13.02.17 at 17:56, wrote:
> On 13.02.17 at 17:32, wrote:
>> On 09/02/17 16:22, Jan Beulich wrote:
>>> Mind giving the attached patch a try (which admittedly was only
>>> lightly tested so far - in particular I haven't seen the second of
>>> the debug messages being logged, yet)?
>> Patch looks promising. I couldn't do much thorough testing, but initial
>> reboot cycles (around 20 reboots of 32 VMs) went fine.
>
> Thanks. The most interesting part though is whether the 2nd of the
> two log messages ever showed up. In any event I'll submit the cleaned
> up patch, for more formal discussion of the approach to happen there.

Actually, no: while putting together the commit message I thought of a
second situation which likely also needs addressing. If we bypass
__context_switch() on the context-switch-in path as well (because the
same vCPU is being scheduled again), the CPU may likewise have lost
control of the VMCS. There's no context-switch hook to place the reload
invocation in, though, so I'll first have to think about what the best
model here would be (of course I'd be more than happy about
suggestions).

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
>>> On 13.02.17 at 17:32, wrote:
> On 09/02/17 16:22, Jan Beulich wrote:
>> Mind giving the attached patch a try (which admittedly was only
>> lightly tested so far - in particular I haven't seen the second of
>> the debug messages being logged, yet)?
> Patch looks promising. I couldn't do much thorough testing, but initial
> reboot cycles (around 20 reboots of 32 VMs) went fine.

Thanks. The most interesting part though is whether the 2nd of the
two log messages ever showed up. In any event I'll submit the cleaned
up patch, for more formal discussion of the approach to happen there.

Jan
Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
Hi Jan,

On 09/02/17 16:22, Jan Beulich wrote:
>>>> On 08.02.17 at 16:32, wrote:
>>>> On 07.02.17 at 18:26, wrote:
>>> Facing an issue where a bootstorm of guests leads to a host crash. I
>>> debugged and found that enabling PML introduces a race condition
>>> during the guest teardown stage, between disabling PML on a vcpu and
>>> a context switch happening for another vcpu.
>>>
>>> The crash happens only on Broadwell processors, as PML was introduced
>>> in this generation.
>>>
>>> Here is my analysis:
>>>
>>> Race condition:
>>>
>>> vmcs.c:
>>> vmx_vcpu_disable_pml(vcpu) {
>>>     vmx_vmcs_enter();
>>>     vm_write(disable_PML);
>>>     vmx_vmcs_exit();
>>> }
>>>
>>> In between, I have a call path from another pcpu executing a context
>>> switch -> vmx_fpu_leave(), and a crash on the vmwrite:
>>>
>>>     if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
>>>     {
>>>         v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
>>>         __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); // crash
>>>     }
>>
>> So that's after current has changed already, so it's effectively
>> dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
>> The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
>> however, assumes that any user of a VMCS either owns the lock
>> or has current as the owner of the VMCS. Of course such a call
>> also can't be added here, as a vcpu on the context-switch-from
>> path can't vcpu_pause() itself.
>>
>> That, taken together with the bypassing of __context_switch()
>> when the incoming vCPU is the idle one (which means that via
>> context_saved() ->is_running will be cleared before running
>> ->ctxt_switch_from()), means the vcpu_pause() invocation in
>> vmx_vmcs_try_enter() may not have to wait at all if the call
>> happens between the clearing of ->is_running and the
>> eventual invocation of vmx_ctxt_switch_from().
>
> Mind giving the attached patch a try (which admittedly was only
> lightly tested so far - in particular I haven't seen the second of
> the debug messages being logged, yet)?

Patch looks promising. I couldn't do much thorough testing, but initial
reboot cycles (around 20 reboots of 32 VMs) went fine.
Anshul
Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
>>> On 08.02.17 at 16:32, wrote:
> On 07.02.17 at 18:26, wrote:
>> Facing an issue where a bootstorm of guests leads to a host crash. I
>> debugged and found that enabling PML introduces a race condition during
>> the guest teardown stage, between disabling PML on a vcpu and a context
>> switch happening for another vcpu.
>>
>> The crash happens only on Broadwell processors, as PML was introduced
>> in this generation.
>>
>> Here is my analysis:
>>
>> Race condition:
>>
>> vmcs.c:
>> vmx_vcpu_disable_pml(vcpu) {
>>     vmx_vmcs_enter();
>>     vm_write(disable_PML);
>>     vmx_vmcs_exit();
>> }
>>
>> In between, I have a call path from another pcpu executing a context
>> switch -> vmx_fpu_leave(), and a crash on the vmwrite:
>>
>>     if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
>>     {
>>         v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
>>         __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); // crash
>>     }
>
> So that's after current has changed already, so it's effectively
> dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
> The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
> however, assumes that any user of a VMCS either owns the lock
> or has current as the owner of the VMCS. Of course such a call
> also can't be added here, as a vcpu on the context-switch-from
> path can't vcpu_pause() itself.
>
> That, taken together with the bypassing of __context_switch()
> when the incoming vCPU is the idle one (which means that via
> context_saved() ->is_running will be cleared before running
> ->ctxt_switch_from()), means the vcpu_pause() invocation in
> vmx_vmcs_try_enter() may not have to wait at all if the call
> happens between the clearing of ->is_running and the
> eventual invocation of vmx_ctxt_switch_from().
>
> If the above makes sense (which I'm not sure at all), the
> question is whether using this_cpu(curr_vcpu) instead of
> current in the VMCS enter/exit functions would help.

This won't help, as it won't make vcpu_pause() wait (which is the core
of the problem).
Jan
Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
Maybe "causes". But I am not sure, as disabling this feature solves the
issue (no hypervisor crash), and we have reports that the same code base
works fine on Haswell (again, no hardware support for PML on Haswell,
but the code base is the same; inherently that means the PML code path
is not executed, so no crash).

Anshul

On 08/02/17 15:37, Jan Beulich wrote:
>>>> On 08.02.17 at 15:58, wrote:
>> Code is trying to destroy multiple vcpus held by the domain:
>> complete_domain_destroy() -> hvm_vcpu_destroy() for each vcpu.
>>
>> In vmx_vcpu_destroy(), we have a call to vmx_vcpu_disable_pml(), which
>> leads to a race while destroying foreign vcpus, with other domains
>> rebooting on the same vcpus. With a single domain reboot, no race is
>> observed.
>>
>> Commit e18d4274772e52ac81fda1acb246d11ef666e5fe causes this race
>> condition.
>
> Causes or exposes? I ask because the similar reports we have are all
> on 4.4.x, i.e. no PML code there at all.
>
> Jan
Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
>>> On 08.02.17 at 15:58, wrote:
> Code is trying to destroy multiple vcpus held by the domain:
> complete_domain_destroy() -> hvm_vcpu_destroy() for each vcpu.
>
> In vmx_vcpu_destroy(), we have a call to vmx_vcpu_disable_pml(), which
> leads to a race while destroying foreign vcpus, with other domains
> rebooting on the same vcpus.
>
> With a single domain reboot, no race is observed.
>
> Commit e18d4274772e52ac81fda1acb246d11ef666e5fe causes this race
> condition.

Causes or exposes? I ask because the similar reports we have are all on
4.4.x, i.e. no PML code there at all.

Jan
Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
>>> On 07.02.17 at 18:26, wrote:
> Facing an issue where a bootstorm of guests leads to a host crash. I
> debugged and found that enabling PML introduces a race condition during
> the guest teardown stage, between disabling PML on a vcpu and a context
> switch happening for another vcpu.
>
> The crash happens only on Broadwell processors, as PML was introduced
> in this generation.
>
> Here is my analysis:
>
> Race condition:
>
> vmcs.c:
> vmx_vcpu_disable_pml(vcpu) {
>     vmx_vmcs_enter();
>     vm_write(disable_PML);
>     vmx_vmcs_exit();
> }
>
> In between, I have a call path from another pcpu executing a context
> switch -> vmx_fpu_leave(), and a crash on the vmwrite:
>
>     if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
>     {
>         v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
>         __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); // crash
>     }

So that's after current has changed already, so it's effectively
dealing with a foreign VMCS, but it doesn't use vmx_vmcs_enter().
The locking done in vmx_vmcs_try_enter() / vmx_vmcs_exit(),
however, assumes that any user of a VMCS either owns the lock
or has current as the owner of the VMCS. Of course such a call
also can't be added here, as a vcpu on the context-switch-from
path can't vcpu_pause() itself.

That, taken together with the bypassing of __context_switch()
when the incoming vCPU is the idle one (which means that via
context_saved() ->is_running will be cleared before running
->ctxt_switch_from()), means the vcpu_pause() invocation in
vmx_vmcs_try_enter() may not have to wait at all if the call
happens between the clearing of ->is_running and the
eventual invocation of vmx_ctxt_switch_from().

If the above makes sense (which I'm not sure at all), the
question is whether using this_cpu(curr_vcpu) instead of
current in the VMCS enter/exit functions would help.

Jan
Re: [Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
Hi,

Code is trying to destroy multiple vcpus held by the domain:
complete_domain_destroy() -> hvm_vcpu_destroy() for each vcpu.

In vmx_vcpu_destroy(), we have a call to vmx_vcpu_disable_pml(), which
leads to a race while destroying foreign vcpus, with other domains
rebooting on the same vcpus. With a single domain reboot, no race is
observed.

Commit e18d4274772e52ac81fda1acb246d11ef666e5fe causes this race
condition.

Anshul

On 07/02/17 17:58, anshul makkar wrote:
> Hi,
>
> Sorry, forgot to include you in the cc list.
>
> Anshul
>
> On 07/02/17 17:26, anshul makkar wrote:
>> Hi,
>>
>> Facing an issue where a bootstorm of guests leads to a host crash. I
>> debugged and found that enabling PML introduces a race condition
>> during the guest teardown stage, between disabling PML on a vcpu and
>> a context switch happening for another vcpu.
>>
>> The crash happens only on Broadwell processors, as PML was introduced
>> in this generation.
>>
>> Here is my analysis:
>>
>> Race condition:
>>
>> vmcs.c:
>> vmx_vcpu_disable_pml(vcpu) {
>>     vmx_vmcs_enter();
>>     vm_write(disable_PML);
>>     vmx_vmcs_exit();
>> }
>>
>> In between, I have a call path from another pcpu executing a context
>> switch -> vmx_fpu_leave(), and a crash on the vmwrite:
>>
>>     if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
>>     {
>>         v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
>>         __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); // crash
>>     }
>>
>> Debug logs:
>>
>> (XEN) [221256.749928] VMWRITE VMCS Invalid !
>> (XEN) [221256.754870] **[00] { now c93b4341df1d, hw 0035fffea000, op 0035fffea000 } vmclear
>> (XEN) [221256.765052] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
>> (XEN) [221256.773969] **[01] { now c93b4341e099, hw , op 0035fffea000 } vmptrld
>> (XEN) [221256.784150] ** frames [ 82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
>> (XEN) [221256.792197] **[02] { now c93b4341e1f1, hw 0035fffea000, op 0035fffea000 } vmclear
>> (XEN) [221256.802378] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
>> (XEN) [221256.811298] **[03] { now c93b5784dd0a, hw , op 0039d7074000 } vmptrld
>> (XEN) [221256.821478] ** frames [ 82d0801f4c31 vmx_do_resume+0x51/0x150 ]
>> (XEN) [221256.829139] **[04] { now c93b59d67b5b, hw 0039d7074000, op 002b9a575000 } vmptrld
>> (XEN) [221256.839320] ** frames [ 82d0801f4c31 vmx_do_resume+0x51/0x150 ]
>> (XEN) [221256.882850] **[07] { now c93b59e71e48, hw 002b9a575000, op 0039d7074000 } vmptrld
>> (XEN) [221256.893034] ** frames [ 82d0801f4d13 vmx_do_resume+0x133/0x150 ]
>> (XEN) [221256.900790] **[08] { now c93b59e78675, hw 0039d7074000, op 0040077ae000 } vmptrld
>> (XEN) [221256.910968] ** frames [ 82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
>> (XEN) [221256.919015] **[09] { now c93b59e78ac8, hw 0040077ae000, op 0040077ae000 } vmclear
>> (XEN) [221256.929196] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
>> (XEN) [221256.938117] **[10] { now c93b59e78d72, hw , op 0040077ae000 } vmptrld
>> (XEN) [221256.948297] ** frames [ 82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
>> (XEN) [221256.956345] **[11] { now c93b59e78ff0, hw 0040077ae000, op 0040077ae000 } vmclear
>> (XEN) [221256.966525] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
>> (XEN) [221256.975445] **[12] { now c93b59e7deda, hw , op 0040077b3000 } vmptrld
>> (XEN) [221256.985626] ** frames [ 82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
>> (XEN) [221256.993672] **[13] { now c93b59e9fe00, hw 0040077b3000, op 0040077b3000 } vmclear
>> (XEN) [221257.003852] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
>> (XEN) [221257.012772] **[14] { now c93b59ea007e, hw , op 0040077b3000 } vmptrld
>> (XEN) [221257.022952] ** frames [ 82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
>> (XEN) [221257.031000] **[15] { now c93b59ea02ba, hw 0040077b3000, op 0040077b3000 } vmclear
>> (XEN) [221257.041180] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
>> (XEN) [221257.050101]
>> (XEN) [221257.053008] vmcs_ptr:0x, vcpu->vmcs:0x2b9a575000
>>
>> The VMCS is loaded, and between that and the next call to vm_write
>> there is a clear of the VMCS, caused by vmx_vcpu_disable_pml. The log
>> above shows that an IPI is clearing the VMCS between vmptrld and
>> vmwrite, but I also verified that interrupts are disabled during the
>> context switch and the execution of vm_write in vmx_fpu_leave, which
>> has got me confused. Also, I am not sure whether I understand the
>> handling of foreign VMCSes correctly; that could also be the cause of
>> the race.
>>
>> Please share any suggestions you may have.
>>
>> Thanks,
>> Anshul Makkar
[Xen-devel] PML causing race condition during guest bootstorm and host crash on Broadwell cpu.
Hi,

Facing an issue where a bootstorm of guests leads to a host crash. I
debugged and found that enabling PML introduces a race condition during
the guest teardown stage, between disabling PML on a vcpu and a context
switch happening for another vcpu.

The crash happens only on Broadwell processors, as PML was introduced in
this generation.

Here is my analysis:

Race condition:

vmcs.c:
vmx_vcpu_disable_pml(vcpu) {
    vmx_vmcs_enter();
    vm_write(disable_PML);
    vmx_vmcs_exit();
}

In between, I have a call path from another pcpu executing a context
switch -> vmx_fpu_leave(), and a crash on the vmwrite:

    if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) )
    {
        v->arch.hvm_vmx.host_cr0 |= X86_CR0_TS;
        __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); // crash
    }

Debug logs:

(XEN) [221256.749928] VMWRITE VMCS Invalid !
(XEN) [221256.754870] **[00] { now c93b4341df1d, hw 0035fffea000, op 0035fffea000 } vmclear
(XEN) [221256.765052] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221256.773969] **[01] { now c93b4341e099, hw , op 0035fffea000 } vmptrld
(XEN) [221256.784150] ** frames [ 82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
(XEN) [221256.792197] **[02] { now c93b4341e1f1, hw 0035fffea000, op 0035fffea000 } vmclear
(XEN) [221256.802378] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221256.811298] **[03] { now c93b5784dd0a, hw , op 0039d7074000 } vmptrld
(XEN) [221256.821478] ** frames [ 82d0801f4c31 vmx_do_resume+0x51/0x150 ]
(XEN) [221256.829139] **[04] { now c93b59d67b5b, hw 0039d7074000, op 002b9a575000 } vmptrld
(XEN) [221256.839320] ** frames [ 82d0801f4c31 vmx_do_resume+0x51/0x150 ]
(XEN) [221256.882850] **[07] { now c93b59e71e48, hw 002b9a575000, op 0039d7074000 } vmptrld
(XEN) [221256.893034] ** frames [ 82d0801f4d13 vmx_do_resume+0x133/0x150 ]
(XEN) [221256.900790] **[08] { now c93b59e78675, hw 0039d7074000, op 0040077ae000 } vmptrld
(XEN) [221256.910968] ** frames [ 82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
(XEN) [221256.919015] **[09] { now c93b59e78ac8, hw 0040077ae000, op 0040077ae000 } vmclear
(XEN) [221256.929196] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221256.938117] **[10] { now c93b59e78d72, hw , op 0040077ae000 } vmptrld
(XEN) [221256.948297] ** frames [ 82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
(XEN) [221256.956345] **[11] { now c93b59e78ff0, hw 0040077ae000, op 0040077ae000 } vmclear
(XEN) [221256.966525] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221256.975445] **[12] { now c93b59e7deda, hw , op 0040077b3000 } vmptrld
(XEN) [221256.985626] ** frames [ 82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
(XEN) [221256.993672] **[13] { now c93b59e9fe00, hw 0040077b3000, op 0040077b3000 } vmclear
(XEN) [221257.003852] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221257.012772] **[14] { now c93b59ea007e, hw , op 0040077b3000 } vmptrld
(XEN) [221257.022952] ** frames [ 82d0801f0765 vmx_vmcs_try_enter+0x95/0xb0 ]
(XEN) [221257.031000] **[15] { now c93b59ea02ba, hw 0040077b3000, op 0040077b3000 } vmclear
(XEN) [221257.041180] ** frames [ 82d080134652 smp_call_function_interrupt+0x92/0xa0 ]
(XEN) [221257.050101]
(XEN) [221257.053008] vmcs_ptr:0x, vcpu->vmcs:0x2b9a575000

The VMCS is loaded, and between that and the next call to vm_write there
is a clear of the VMCS, caused by vmx_vcpu_disable_pml. The log above
shows that an IPI is clearing the VMCS between vmptrld and vmwrite, but
I also verified that interrupts are disabled during the context switch
and the execution of vm_write in vmx_fpu_leave, which has got me
confused. Also, I am not sure whether I understand the handling of
foreign VMCSes correctly; that could also be the cause of the race.

Please share any suggestions you may have.

Thanks,
Anshul Makkar