Re: [Xen-devel] [stable-4.11] Heads-up: c719519 (x86/SMP: don't try to stop already stopped CPUs) causes 100% kexec/kdump failure

2019-10-29 Thread Jan Beulich
On 29.10.2019 12:29, Sergey Dyasli wrote:
> On 28/10/2019 17:30, Stonehouse, Robert wrote:
>> This is a heads-up as I have observed that the following commit (backported 
>> onto an Amazon 4.11 tree) causes kexec (and hence kdump) to fail. 
>> 
>> commit c719519a4183d0630121f6abeba420f49dbc3229
>> Author: Jan Beulich 
>> AuthorDate: Fri Jul 5 10:32:41 2019 +0200
>> Commit: Jan Beulich 
>> CommitDate: Fri Jul 5 10:32:41 2019 +0200
>>
>> x86/SMP: don't try to stop already stopped CPUs
>> 
>> In particular with an enabled IOMMU (but not really limited to this
>> case), trying to invoke fixup_irqs() after having already done
>> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
>> 
> 
> This was already fixed in staging by "x86/crash: fix kexec transition 
> breakage":
> 
>   
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=f56813f3470c5b4987963c3c41e4fe16b95c5a3f
> 
> Looks like it needs inclusion into 4.11 branch.

Hmm, in principle I did fish out this one and a few more for
backporting. But it looks like I've applied them to the 4.12
branch only. Thanks for noticing!

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [stable-4.11] Heads-up: c719519 (x86/SMP: don't try to stop already stopped CPUs) causes 100% kexec/kdump failure

2019-10-29 Thread Sergey Dyasli
On 28/10/2019 17:30, Stonehouse, Robert wrote:
> This is a heads-up as I have observed that the following commit (backported 
> onto an Amazon 4.11 tree) causes kexec (and hence kdump) to fail. 
> 
> commit c719519a4183d0630121f6abeba420f49dbc3229
> Author: Jan Beulich 
> AuthorDate: Fri Jul 5 10:32:41 2019 +0200
> Commit: Jan Beulich 
> CommitDate: Fri Jul 5 10:32:41 2019 +0200
> 
> x86/SMP: don't try to stop already stopped CPUs
> 
> In particular with an enabled IOMMU (but not really limited to this
> case), trying to invoke fixup_irqs() after having already done
> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
> 

This was already fixed in staging by "x86/crash: fix kexec transition breakage":


https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=f56813f3470c5b4987963c3c41e4fe16b95c5a3f

Looks like it needs inclusion into 4.11 branch.

--
Thanks,
Sergey


Re: [Xen-devel] [stable-4.11] Heads-up: c719519 (x86/SMP: don't try to stop already stopped CPUs) causes 100% kexec/kdump failure

2019-10-29 Thread Dietmar Hahn
Hi,

On Monday, 28 October 2019, 18:30:12 CET, Stonehouse, Robert wrote:
> This is a heads-up as I have observed that the following commit (backported 
> onto an Amazon 4.11 tree) causes kexec (and hence kdump) to fail. 
> 
> commit c719519a4183d0630121f6abeba420f49dbc3229
> Author: Jan Beulich 
> AuthorDate: Fri Jul 5 10:32:41 2019 +0200
> Commit: Jan Beulich 
> CommitDate: Fri Jul 5 10:32:41 2019 +0200
> 
> x86/SMP: don't try to stop already stopped CPUs
> 
> In particular with an enabled IOMMU (but not really limited to this
> case), trying to invoke fixup_irqs() after having already done
> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
> 
> 
> The test was performing "echo c > /proc/sysrq-trigger" in dom0 and the loaded 
> crash kernel fails to show any signs of starting. This is the end of the Xen 
> console ...
> 
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
> 
> 
> Expected behaviour is that the kdump kernel immediately loads and then 
> performs the crash dump

I can confirm this behavior, but with the Xen version (4.11.0_08-1) from
SuSE SLES12 SP4, which doesn't contain the said commit
c719519a4183d0630121f6abeba420f49dbc3229. However, I see this only on
systems with newer Intel CPUs such as the
"Intel(R) Xeon(R) Gold 6242 CPU".



> 
> I'm sorry that I have not yet had time to check if this affects vanilla 
> stable-4.11 or master. I just wanted to be certain that you don't have the 
> same issue.
> 
> 
> Reverting one hunk via the following commit fixes things for me (this is an 
> experiment and not at all a proposed fix)
> 
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -303,15 +303,15 @@ static void stop_this_cpu(void *dummy)
>  void smp_send_stop(void)
>  {
>      unsigned int cpu = smp_processor_id();
> +
> +    local_irq_disable();
> +    fixup_irqs(cpumask_of(cpu), 0);
> +    local_irq_enable();
>  
>      if ( num_online_cpus() > 1 )
>      {
>          int timeout = 10;
>  
> -        local_irq_disable();
> -        fixup_irqs(cpumask_of(cpu), 0);
> -        local_irq_enable();
> -
>          smp_call_function(stop_this_cpu, NULL, 0);
>  
>          /* Wait 10ms for all other CPUs to go offline. */
> 
> 
> Regards
> Rob
> 

Re: [Xen-devel] [stable-4.11] Heads-up: c719519 (x86/SMP: don't try to stop already stopped CPUs) causes 100% kexec/kdump failure

2019-10-29 Thread Jan Beulich
On 28.10.2019 18:30, Stonehouse, Robert wrote:
> Reverting one hunk via the following commit fixes things for me (this is an 
> experiment and not at all a proposed fix)
> 
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -303,15 +303,15 @@ static void stop_this_cpu(void *dummy)
>  void smp_send_stop(void)
>  {
>      unsigned int cpu = smp_processor_id();
> +
> +    local_irq_disable();
> +    fixup_irqs(cpumask_of(cpu), 0);
> +    local_irq_enable();
>  
>      if ( num_online_cpus() > 1 )
>      {
>          int timeout = 10;
>  
> -        local_irq_disable();
> -        fixup_irqs(cpumask_of(cpu), 0);
> -        local_irq_enable();

Are you saying we get here the first time only when num_online_cpus()
already returns 1 (but there are actually multiple CPUs, i.e. affinity
changes are actually needed)? If so - why?

Jan


[Xen-devel] [stable-4.11] Heads-up: c719519 (x86/SMP: don't try to stop already stopped CPUs) causes 100% kexec/kdump failure

2019-10-28 Thread Stonehouse, Robert
This is a heads-up as I have observed that the following commit (backported 
onto an Amazon 4.11 tree) causes kexec (and hence kdump) to fail. 

commit c719519a4183d0630121f6abeba420f49dbc3229
Author: Jan Beulich 
AuthorDate: Fri Jul 5 10:32:41 2019 +0200
Commit: Jan Beulich 
CommitDate: Fri Jul 5 10:32:41 2019 +0200

x86/SMP: don't try to stop already stopped CPUs

In particular with an enabled IOMMU (but not really limited to this
case), trying to invoke fixup_irqs() after having already done
disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:


The test was performing "echo c > /proc/sysrq-trigger" in dom0 and the loaded 
crash kernel fails to show any signs of starting. This is the end of the Xen 
console ...

(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.


Expected behaviour is that the kdump kernel immediately loads and then performs 
the crash dump

I'm sorry that I have not yet had time to check if this affects vanilla 
stable-4.11 or master. I just wanted to be certain that you don't have the same 
issue.


Reverting one hunk via the following commit fixes things for me (this is an 
experiment and not at all a proposed fix)

--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -303,15 +303,15 @@ static void stop_this_cpu(void *dummy)
 void smp_send_stop(void)
 {
     unsigned int cpu = smp_processor_id();
+
+    local_irq_disable();
+    fixup_irqs(cpumask_of(cpu), 0);
+    local_irq_enable();
 
     if ( num_online_cpus() > 1 )
     {
         int timeout = 10;
 
-        local_irq_disable();
-        fixup_irqs(cpumask_of(cpu), 0);
-        local_irq_enable();
-
         smp_call_function(stop_this_cpu, NULL, 0);
 
         /* Wait 10ms for all other CPUs to go offline. */


Regards
Rob
