> > On 15.10.25 at 08:36, lirongqing wrote:
> > From: Li RongQing <[email protected]>
> >
> > Currently, when 'hung_task_panic' is enabled, the kernel panics
> > immediately upon detecting the first hung task. However, some hung
> > tasks are transient and allow system recovery, while persistent hangs
> > should trigger a panic when accumulating beyond a threshold.
> >
> > Extend the 'hung_task_panic' sysctl to accept a threshold value
> > specifying the number of hung tasks that must be detected before
> > triggering a kernel panic. This provides finer control for
> > environments where transient hangs may occur but persistent hangs
> > should be fatal.
> >
> > The sysctl now accepts:
> > - 0: don't panic (maintains original behavior)
> > - 1: panic on first hung task (maintains original behavior)
> > - N > 1: panic after N hung tasks are detected in a single scan
> >
> > This maintains backward compatibility while providing flexibility for
> > different hang scenarios.
> >
> > Signed-off-by: Li RongQing <[email protected]>
> > Cc: Andrew Jeffery <[email protected]>
> > Cc: Anshuman Khandual <[email protected]>
> > Cc: Arnd Bergmann <[email protected]>
> > Cc: David Hildenbrand <[email protected]>
> > Cc: Florian Wesphal <[email protected]>
> > Cc: Jakub Kacinski <[email protected]>
> > Cc: Jason A. Donenfeld <[email protected]>
> > Cc: Joel Granados <[email protected]>
> > Cc: Joel Stanley <[email protected]>
> > Cc: Jonathan Corbet <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Lance Yang <[email protected]>
> > Cc: Liam Howlett <[email protected]>
> > Cc: Lorenzo Stoakes <[email protected]>
> > Cc: "Masami Hiramatsu (Google)" <[email protected]>
> > Cc: "Paul E . McKenney" <[email protected]>
> > Cc: Pawan Gupta <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > Cc: Phil Auld <[email protected]>
> > Cc: Randy Dunlap <[email protected]>
> > Cc: Russell King <[email protected]>
> > Cc: Shuah Khan <[email protected]>
> > Cc: Simon Horman <[email protected]>
> > Cc: Stanislav Fomichev <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > ---
> > diff with v3: comments modification, suggested by Lance, Masami, Randy
> > and Petr
> > diff with v2: do not add a new sysctl, extend
> > hung_task_panic, suggested by Kees Cook
> >
> >  Documentation/admin-guide/kernel-parameters.txt      | 20 +++++++++++++-------
> >  Documentation/admin-guide/sysctl/kernel.rst          |  9 +++++----
> >  arch/arm/configs/aspeed_g5_defconfig                 |  2 +-
> >  kernel/configs/debug.config                          |  2 +-
> >  kernel/hung_task.c                                   | 15 ++++++++++-----
> >  lib/Kconfig.debug                                    |  9 +++++----
> >  tools/testing/selftests/wireguard/qemu/kernel.config |  2 +-
> >  7 files changed, 36 insertions(+), 23 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index a51ab46..492f0bc 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1992,14 +1992,20 @@
> >                          the added memory block itself do not be affected.
> >
> >          hung_task_panic=
> > -                        [KNL] Should the hung task detector generate panics.
> > -                        Format: 0 | 1
> > +                        [KNL] Number of hung tasks to trigger kernel panic.
> > +                        Format: <int>
> > +
> > +                        When set to a non-zero value, a kernel panic will be triggered if
> > +                        the number of detected hung tasks reaches this value.
> > +
> > +                        0: don't panic
> > +                        1: panic immediately on first hung task
> > +                        N: panic after N hung tasks are detected in a single scan
> >
> > -                        A value of 1 instructs the kernel to panic when a
> > -                        hung task is detected. The default value is controlled
> > -                        by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
> > -                        option. The value selected by this boot parameter can
> > -                        be changed later by the kernel.hung_task_panic sysctl.
> > +                        The default value is controlled by the
> > +                        CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time option. The value
> > +                        selected by this boot parameter can be changed later by the
> > +                        kernel.hung_task_panic sysctl.
> >
> >          hvc_iucv=       [S390] Number of z/VM IUCV hypervisor console (HVC)
> >                          terminal devices. Valid values: 0..8
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index f3ee807..0065a55 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -397,13 +397,14 @@ a hung task is detected.
> >  hung_task_panic
> >  ===============
> >
> > -Controls the kernel's behavior when a hung task is detected.
> > +When set to a non-zero value, a kernel panic will be triggered if the
> > +number of hung tasks found during a single scan reaches this value.
> >  This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
> >
> > -= =================================================
> > += =======================================================
> >  0 Continue operation. This is the default behavior.
> > -1 Panic immediately.
> > -= =================================================
> > +N Panic when N hung tasks are found during a single scan.
> > += =======================================================
> >
> >
> >  hung_task_check_count
> > […]
>
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 3034e294..3976c90 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1258,12 +1258,13 @@ config DEFAULT_HUNG_TASK_TIMEOUT
> >            Keeping the default should be fine in most cases.
> >
> >  config BOOTPARAM_HUNG_TASK_PANIC
> > -        bool "Panic (Reboot) On Hung Tasks"
> > +        int "Number of hung tasks to trigger kernel panic"
> >          depends on DETECT_HUNG_TASK
> > +        default 0
> >          help
> > -          Say Y here to enable the kernel to panic on "hung tasks",
> > -          which are bugs that cause the kernel to leave a task stuck
> > -          in uninterruptible "D" state.
> > +          When set to a non-zero value, a kernel panic will be triggered
> > +          if the number of hung tasks found during a single scan reaches
> > +          this value.
> >
> >            The panic can be used in combination with panic_timeout,
> >            to cause the system to reboot automatically after a
> Why not leave the sentence about the uninterruptible "D" state in there?
>

That sentence seems to say that a hung task is always caused by a kernel
bug, but the cause may also be a hardware failure (or a virtio backend
bug), so I did not keep it.

> Also, it sounds like, some are actually using this in production. Maybe it
> should be moved out of `Kconfig.debug` too?
>

I think hung task panic is a useful feature; it should be moved out of
Kconfig.debug.

Thanks

-Li

>
> Kind regards,
>
> Paul
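
For reference, the threshold semantics described in the commit message
(0: never panic, 1: panic on the first hung task, N > 1: panic once N
hung tasks are found in a single scan) can be sketched in plain C as
below. This is only an illustration under stated assumptions: the helper
name hung_task_should_panic() and its parameters are hypothetical and are
not taken from kernel/hung_task.c, whose hunk is elided in the quote
above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only (not the kernel's code).
 *
 * threshold:    value of the hung_task_panic setting
 * hung_in_scan: number of hung tasks found so far in the current scan
 *
 * Returns true when the scan should end in a panic:
 *   threshold == 0 -> never panic
 *   threshold == N -> panic once at least N hung tasks were found
 */
static bool hung_task_should_panic(unsigned int threshold,
                                   unsigned int hung_in_scan)
{
        return threshold != 0 && hung_in_scan >= threshold;
}
```

With this reading, a threshold of 1 behaves like the old boolean setting,
which is why the extension stays backward compatible.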
