RE: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-09-25 Thread 河合英宏 / KAWAI,HIDEHIRO
Peter said in the previous discussion that the -tip tree doesn't have panic_on_unrecovered_nmi, but it still exists. So I didn't change anything about panic_on_unrecovered_nmi. Thanks, Hidehiro Kawai Hitachi, Ltd. Research & Development Group > From: Hidehiro Kawai [mailto:hidehiro.kawai...@hitachi.c

[V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-09-25 Thread Hidehiro Kawai
nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends an NMI IPI to non-panic cpus to stop them while saving their register information and doing some cleanups for crash dumping. So if a non-panic cpu is looping infinitely in NMI context, we fail to save its register information and lose the in

[V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-09-25 Thread Hidehiro Kawai
If panic on NMI happens just after panic() on the same CPU, panic() is called recursively. As a result, it stalls after failing to acquire panic_lock. To avoid this problem, don't call panic() in NMI context if we've already entered panic(). V4: - Improve comments in io_check_error() and panic
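The guard described in this patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code from the patch: `panic_owner`, `NO_PANIC_CPU`, `model_panic()` and `model_try_panic()` are hypothetical names, and a C11 atomic compare-and-exchange stands in for whatever check the kernel change actually uses to tell whether panic() has already been entered.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_PANIC_CPU (-1)   /* hypothetical sentinel: no CPU has entered panic yet */

/* Hypothetical state: which CPU entered the panic path first. */
static atomic_int panic_owner = NO_PANIC_CPU;

/* Counts how many times the modeled panic body actually ran. */
static int panic_runs;

/* Stand-in for panic(); the real function never returns. */
static void model_panic(int cpu)
{
    assert(cpu >= 0);
    panic_runs++;
}

/*
 * Entry point shared by normal and NMI context in this model.
 * Only the first caller claims ownership and runs the panic body;
 * a later caller (for example an NMI arriving right after panic()
 * on the same CPU) is turned away instead of recursing.
 */
static bool model_try_panic(int cpu)
{
    int expected = NO_PANIC_CPU;

    if (!atomic_compare_exchange_strong(&panic_owner, &expected, cpu))
        return false;   /* panic already entered: do not re-enter */

    model_panic(cpu);
    return true;
}
```

In this model a second call to `model_try_panic()` returns false without running the panic body again, which is the behaviour the patch description asks for in NMI context.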

[V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-09-25 Thread Hidehiro Kawai
This patch introduces a new boot option "noextnmi", which disables external NMI. This option is useful for the dump capture kernel so that an HA application or administrator wouldn't mistakenly shoot down the kernel by NMI. Currently, only x86 supports this option. Signed-off-by: Hidehiro Kawai Cc

[V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-25 Thread Hidehiro Kawai
Currently, panic() and crash_kexec() can be called at the same time. For example (x86 case):

CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
        nmi_shootdown_cpus() // stop other cpus

CPU 1:
  panic()
    crash_kexec()
      mutex_trylock() // failed to acquire
    sm
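The trylock outcome in the scenario above can be modeled in user space. This is a rough sketch under stated assumptions: `kexec_lock`, `lock_winner`, `model_trylock()` and `model_crash_kexec()` are hypothetical names, a C11 `atomic_flag` stands in for the kernel mutex, and the sketch covers only the part of the scenario visible in the preview (one caller acquires the lock, the other fails).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mutex that crash_kexec() tries to take. */
static atomic_flag kexec_lock = ATOMIC_FLAG_INIT;

/* Records which CPU won the lock in this model; -1 means nobody yet. */
static int lock_winner = -1;

/* Trylock semantics: returns true if acquired, false if already held. Never blocks. */
static bool model_trylock(void)
{
    return !atomic_flag_test_and_set(&kexec_lock);
}

/*
 * Model of entering crash_kexec() from a given CPU.
 * Returns true if this CPU acquired the lock and would go on to stop
 * the other CPUs; false if the lock was already taken by another path.
 */
static bool model_crash_kexec(int cpu)
{
    if (!model_trylock())
        return false;

    lock_winner = cpu;
    return true;
}
```

Calling `model_crash_kexec(0)` for the oops path and then `model_crash_kexec(1)` for the panic path reproduces the "acquired" and "failed to acquire" results from the listing.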

[V4 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec

2015-09-25 Thread Hidehiro Kawai
When HA clustering software or an administrator detects unresponsiveness of a host, they issue an NMI to the host to completely stop current work and take a crash dump. If the kernel has already panicked or is capturing a crash dump at that time, a further NMI can cause a crash dump failure. Also,

[PATCH V3] kexec: Use file name as the output message prefix

2015-09-25 Thread Minfei Huang
kexec output messages have been missing the "kexec" prefix since Dave Young split the kexec code. Now we use the file name as the output message prefix. Currently, the format of output messages is:
[ 140.290795] SYSC_kexec_load: hello, world
[ 140.291534] kexec: sanity_check_segment_list: hello, world
Ideally, th