Re: [Xen-devel] [stable-4.11] Heads-up: c719519 (x86/SMP: don't try to stop already stopped CPUs) causes 100% kexec/kdump failure

2019-10-29 Thread Jan Beulich
On 29.10.2019 12:29, Sergey Dyasli wrote:
> On 28/10/2019 17:30, Stonehouse, Robert wrote:
>> This is a heads-up as I have observed that the following commit (backported 
>> onto an Amazon 4.11 tree) causes kexec (and hence kdump) to fail. 
>> 
>> commit c719519a4183d0630121f6abeba420f49dbc3229
>> Author: Jan Beulich 
>> AuthorDate: Fri Jul 5 10:32:41 2019 +0200
>> Commit: Jan Beulich 
>> CommitDate: Fri Jul 5 10:32:41 2019 +0200
>>
>> x86/SMP: don't try to stop already stopped CPUs
>> 
>> In particular with an enabled IOMMU (but not really limited to this
>> case), trying to invoke fixup_irqs() after having already done
>> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
>> 
> 
> This was already fixed in staging by "x86/crash: fix kexec transition 
> breakage":
> 
>   
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=f56813f3470c5b4987963c3c41e4fe16b95c5a3f
> 
> Looks like it needs inclusion into 4.11 branch.

Hmm, in principle I did fish out this one and a few more for
backporting. But it looks like I've applied them to the 4.12
branch only. Thanks for noticing!

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

2019-10-29 Thread Sergey Dyasli
On 28/10/2019 17:30, Stonehouse, Robert wrote:
> This is a heads-up as I have observed that the following commit (backported 
> onto an Amazon 4.11 tree) causes kexec (and hence kdump) to fail. 
> 
> commit c719519a4183d0630121f6abeba420f49dbc3229
> Author: Jan Beulich 
> AuthorDate: Fri Jul 5 10:32:41 2019 +0200
> Commit: Jan Beulich 
> CommitDate: Fri Jul 5 10:32:41 2019 +0200
> 
> x86/SMP: don't try to stop already stopped CPUs
> 
> In particular with an enabled IOMMU (but not really limited to this
> case), trying to invoke fixup_irqs() after having already done
> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
> 

This was already fixed in staging by "x86/crash: fix kexec transition breakage":


https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=f56813f3470c5b4987963c3c41e4fe16b95c5a3f

Looks like it needs inclusion into 4.11 branch.

--
Thanks,
Sergey

2019-10-29 Thread Dietmar Hahn
Hi,

On Monday, 28 October 2019, 18:30:12 CET, Stonehouse, Robert wrote:
> This is a heads-up as I have observed that the following commit (backported 
> onto an Amazon 4.11 tree) causes kexec (and hence kdump) to fail. 
> 
> commit c719519a4183d0630121f6abeba420f49dbc3229
> Author: Jan Beulich 
> AuthorDate: Fri Jul 5 10:32:41 2019 +0200
> Commit: Jan Beulich 
> CommitDate: Fri Jul 5 10:32:41 2019 +0200
> 
> x86/SMP: don't try to stop already stopped CPUs
> 
> In particular with an enabled IOMMU (but not really limited to this
> case), trying to invoke fixup_irqs() after having already done
> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
> 
> 
> The test was performing "echo c > /proc/sysrq-trigger" in dom0 and the loaded 
> crash kernel fails to show any signs of starting. This is the end of the Xen 
> console ...
> 
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
> 
> 
> Expected behaviour is that the kdump kernel immediately loads and then 
> performs the crash dump

I can confirm this behavior, but with Xen version 4.11.0_08-1 from
SuSE SLES12 SP4, which doesn't contain the said commit
c719519a4183d0630121f6abeba420f49dbc3229. However, I see this only on
systems with newer Intel CPUs such as the
"Intel(R) Xeon(R) Gold 6242 CPU".



> 
> I'm sorry that I have not yet had time to check if this affects vanilla 
> stable-4.11 or master. I just wanted to be certain that you don't have the 
> same issue.
> 
> 
> Reverting one hunk via the following commit fixes things for me (this is an 
> experiment and not at all a proposed fix)
> 
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -303,15 +303,15 @@ static void stop_this_cpu(void *dummy)
>  void smp_send_stop(void)
>  {
>      unsigned int cpu = smp_processor_id();
> +
> +    local_irq_disable();
> +    fixup_irqs(cpumask_of(cpu), 0);
> +    local_irq_enable();
>  
>      if ( num_online_cpus() > 1 )
>      {
>          int timeout = 10;
>  
> -        local_irq_disable();
> -        fixup_irqs(cpumask_of(cpu), 0);
> -        local_irq_enable();
> -
>          smp_call_function(stop_this_cpu, NULL, 0);
>  
>          /* Wait 10ms for all other CPUs to go offline. */
> 
> 
> Regards
> Rob
> 

2019-10-29 Thread Jan Beulich
On 28.10.2019 18:30, Stonehouse, Robert wrote:
> Reverting one hunk via the following commit fixes things for me (this is an 
> experiment and not at all a proposed fix)
> 
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -303,15 +303,15 @@ static void stop_this_cpu(void *dummy)
>  void smp_send_stop(void)
>  {
>      unsigned int cpu = smp_processor_id();
> +
> +    local_irq_disable();
> +    fixup_irqs(cpumask_of(cpu), 0);
> +    local_irq_enable();
>  
>      if ( num_online_cpus() > 1 )
>      {
>          int timeout = 10;
>  
> -        local_irq_disable();
> -        fixup_irqs(cpumask_of(cpu), 0);
> -        local_irq_enable();

Are you saying we get here the first time only when num_online_cpus()
already returns 1 (but there are actually multiple CPUs, i.e. affinity
changes are actually needed)? If so - why?

Jan
