Hi, Eric, Oleg
Any comment?
From the previous discussions, I think this change is necessary, but
we need to confirm that moving the decrement of signal->live is
safe. Here are some of my considerations.
There are three places that are going to be called besides do_exit().
1. current_is_single_thr
From: Qianli Zhao
When init sub-threads running on different CPUs exit at the same time,
zap_pid_ns_processes()->BUG() may be triggered (timing is as below). Move
panic() before setting PF_EXITING to fix this problem.
In addition, if panic() is called after other sub-threads finish do_exit(),
some key variab
> documented. That is why I said it should be documented at least in the
> changelog.
Ok.
I will update the changelog as you suggest.
On Thu, Mar 25, 2021 at 2:12 AM Oleg Nesterov wrote:
>
> Hi,
>
> On 03/23, qianli zhao wrote:
> >
> > Hi,Oleg
> >
> > > You certainl
p, if sub-threads finish do_exit(), these variables of the sub-tasks
will be lost, and we cannot parse the coredump of the init process
normally with tools such as "gcore".
On Tue, Mar 23, 2021 at 5:00 PM Oleg Nesterov wrote:
>
> On 03/23, qianli zhao wrote:
> >
> > Hi,Oleg
> >
>
seems that we don't understand each other.
>
> If we move atomic_dec_and_test(signal->live) and do
>
> if (group_dead && is_global_init)
> panic(...);
>
>
> before setting PF_EXITING like your patch does, then zap_pid_ns
er this move is safe or not. From my
analysis, no side effects have been found.
Could you tell me how to prove that, or give me some suggestions?
Thanks
On Fri, Mar 19, 2021 at 3:09 AM Eric W. Biederman wrote:
>
> Oleg Nesterov writes:
>
> > On 03/18, qianli zhao wrote:
> >>
>
init! exitcode=0x%08x\n")
exit_notify()
find_alive_thread() //no alive sub-threads
zap_pid_ns_processes() //CONFIG_PID_NS is not set
BUG()
Oleg Nesterov wrote on Sat, Mar 20, 2021, AM
* immediately to get a useable coredump.
-*/
- if (unlikely(is_global_init(tsk)))
- panic("Attempted to kill init! exitcode=0x%08x\n",
- tsk->signal->group_exit_code ?: (int)code);
-
Eric W. Biederman
panic earlier before other init's sub-threads exit
On Fri, Mar 19, 2021 at 2:05 AM Oleg Nesterov wrote:
>
> On 03/18, qianli zhao wrote:
> >
> > Hi,Oleg
> >
> > Thank you for your reply.
> >
> > >> When init sub-threads running on different CPUs exit at the same
ocesses()
being called when init calls do_exit().
In addition, the patch also protects the init process state so that a
usable init coredump can be obtained.
In my test, this patch works.
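
The "last thread" test discussed in this thread relies on atomic_dec_and_test(&signal->live) returning true for exactly one exiting thread. A minimal userspace sketch of that semantics follows, using C11 atomics instead of the kernel helpers; struct group and thread_exit_is_last() are hypothetical names used only for illustration, not kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for signal_struct: only the live-thread counter. */
struct group {
	atomic_int live;
};

/*
 * Same semantics as the kernel's atomic_dec_and_test(): decrement the
 * counter and return true only for the caller that drops it to zero.
 */
bool thread_exit_is_last(struct group *g)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	return atomic_fetch_sub(&g->live, 1) == 1;
}
```

With two live threads, the first caller gets false and the second gets true, so a check of the form "group_dead && is_global_init" can fire in only one thread no matter how the exits interleave.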
On Wed, Mar 17, 2021 at 10:38 PM Oleg Nesterov wrote:
>
> On 03/17, Qianli Zhao wrote:
> >
> > From: Qianli Zhao
>
From: Qianli Zhao
When init sub-threads running on different CPUs exit at the same time,
zap_pid_ns_processes()->BUG() may be triggered.
Also, every thread's state is abnormal after exit (PF_EXITING set,
task->mm=NULL, etc.),
which makes it difficult to parse a coredump from a full dump normally.
In or
Hi Eric, Oleg
> As Oleg pointed out we need to do something like the code below.
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 04029e35e69a..bc676c06ef9a 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -785,15 +785,16 @@ void __noreturn do_exit(long code)
> sync_mm_
Hi, Eric
Thank you for your suggestion
> At the start of your changelog and your patch subject you describe what
> you are doing but not why. For the next revision of the patch please
> lead with the why it makes what you are trying to do much easier to
> understand.
Got it.
>
> It does not wor
UNKILLABLE to mark this state and prevent other init
threads from continuing to exit.
In addition, I use siglock to protect tsk->signal->flags.
> And iiuc with this patch the kernel will crash if init's sub-thread execs,
> signal_group_exit() returns T in this case.
Oleg Nesterov wrote on 2
From: Qianli Zhao
Once any init thread finds SIGNAL_GROUP_EXIT, trigger panic immediately
instead of waiting until the last thread of global init has exited, and do
not allow other init threads to exit. This protects the task/memory state of
all sub-threads so that a reliable init coredump can be obtained.
[ 24.705376] Kernel panic - not
Hi, tglx
Would you like to continue reviewing the new patch set? I made some
changes according to your suggestions,
unless this change will not be considered or is unnecessary.
Thanks
On Mon, Oct 12, 2020 at 11:00 AM Qianli Zhao wrote:
>
> From: Qianli Zhao
>
> kthread_work is not covered by debug
From: Qianli Zhao
kthread_work is not covered by debug objects, but the same problems as with
regular work objects apply.
Some of the issues like reinitialization of an active kthread_work are hard
to debug because the problem manifests itself later in a completely
different context.
Add
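
The failure mode described above, reinitializing a kthread_work that is still active, can be illustrated with a small userspace sketch of state tracking. It is only loosely modelled on the debugobjects idea; the enum, struct and function names are hypothetical and do not correspond to the kernel API or to this patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified object states; hypothetical names for illustration only. */
enum obj_state { OBJ_NONE, OBJ_INIT, OBJ_ACTIVE };

struct tracked_work {
	enum obj_state state;
};

/* Returns false (and warns) if the work is re-initialized while active. */
bool tracked_init(struct tracked_work *w)
{
	if (w->state == OBJ_ACTIVE) {
		fprintf(stderr, "init of active work %p\n", (void *)w);
		return false;
	}
	w->state = OBJ_INIT;
	return true;
}

/* Queuing marks the work active; completion returns it to the init state. */
void tracked_queue(struct tracked_work *w) { w->state = OBJ_ACTIVE; }
void tracked_done(struct tracked_work *w) { w->state = OBJ_INIT; }
```

The point of tracking state at init time is that the warning fires at the call that does the wrong thing, rather than later in an unrelated context.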
Hi, Thomas
Thanks for your reply.
On Thu, 1 Oct 2020 at 22:34, Thomas Gleixner wrote:
>
> On Mon, Aug 17 2020 at 14:37, Qianli Zhao wrote:
> > From: Qianli Zhao
> >
> > Add debugobject support to track the life time of kthread_work
> > which is used to detect reini
The following commit has been merged into the timers/core branch of tip:
Commit-ID: b952caf2d5ca898cc10d63be7722ae7a5daca696
Gitweb: https://git.kernel.org/tip/b952caf2d5ca898cc10d63be7722ae7a5daca696
Author: Qianli Zhao
AuthorDate: Thu, 13 Aug 2020 23:03:14 +08:00
Dear maintainer,
Has this change been ignored or rejected?
I'm not sure who the maintainer of kthread is; please give me a
definite reply.
Thanks
On Mon, 17 Aug 2020 at 14:37, Qianli Zhao wrote:
>
> From: Qianli Zhao
>
> Add debugobject support to track the life time of kthrea
From: Qianli Zhao
If a workqueue flushes itself then that will lead to
a deadlock. Print a warning and a stack trace when
this happens.
crash> ps 10856
PID    PPID  CPU  TASK          ST  COMM
10856  2     2    ffc873428080  UN  [kworker/u16:15]
crash> bt 10856
PID: 10856
owing commit (built with gcc-9):
>
> commit: 2e7d8748eba7e32150cbd4f57129ea77d1255892 ("[RFC V2] kthread: add
> object debug support")
> url:
> https://github.com/0day-ci/linux/commits/Qianli-Zhao/kthread-add-object-debug-support/20200812-131719
> base: https://git.k
Markus
Thanks for your suggestion, and sorry for my poor wording.
On Tue, Aug 25, 2020 at 4:00 PM Markus Elfring wrote:
>
> > Flushing own workqueue or work self in work context will lead to
> > a deadlock.
>
> I imagine that the wording “or work self” can become clearer another bit.
>
>
> > Catc
From: Qianli Zhao
Flushing its own workqueue or the work item itself from work context
will lead to a deadlock.
Catch this incorrect usage and issue a warning when it happens.
crash> ps 10856
PID    PPID  CPU  TASK          ST  COMM
10856  2     2    ffc873428080  UN  [kworker/u16:15]
crash> bt
From: Qianli Zhao
In work context, flushing its own workqueue or the work item itself
blocks the process (it enters state D), leading to a
deadlock. Catch this incorrect usage and warn when it happens.
crash> ps 10856
PID    PPID  CPU  TASK          ST  COMM
10856  2     2    ffc873428080
From: Qianli Zhao
Add debugobjects support to track the lifetime of kthread_work,
which is used to detect reinitialization or freeing of active objects.
Add kthread_init_work_onstack()/kthread_init_delayed_work_onstack() for
on-stack kthread_work support.
If we reinitialize a kthread_work that has been
Hi, Stephen
Thanks for your suggestion, I will improve my patch.
Thanks.
On Sat, Aug 15, 2020 at 4:18 PM Stephen Boyd wrote:
>
> Quoting Qianli Zhao (2020-08-13 02:55:16)
> > From: Qianli Zhao
> >
> > Add debugobject support to track the life time of kthread_work
>
On Thu, Aug 13, 2020 at 6:46 PM Thomas Gleixner wrote:
>
> Qianli Zhao writes:
>
> Please start the first word after the colon with an upper case letter.
>
> > do_init_timer can specify flags of timer_list,
>
> Please write do_init_timer() so it's entirely clear that this is ab
From: Qianli Zhao
do_init_timer() can specify the flags of a timer_list, but
only TIMER_DEFERRABLE, TIMER_PINNED and TIMER_IRQSAFE are legal.
Do a sanity check: mask out illegal flags and warn about them.
Signed-off-by: Qianli Zhao
---
V2:
- update changelog
- mask and warn on illegal flags
---
include/linux
From: Qianli Zhao
Add debugobjects support to track the lifetime of kthread_work,
which is used to detect reinitialization or freeing of active objects.
Add kthread_init_work_onstack/kthread_init_delayed_work_onstack for
on-stack kthread_work support.
If we reinitialize a kthread_work that has been
From: Qianli Zhao
Add debugobjects support to track the lifetime of kthread_work,
which is used to detect reinitialization or freeing of active objects.
Add kthread_init_work_onstack/kthread_init_delayed_work_onstack for
on-stack kthread_work support.
Signed-off-by: Qianli Zhao
---
I got a crash issue
From: Qianli Zhao
Add debugobjects support to track the lifetime of kthread_work,
which is used to detect reinitialization or freeing of active objects.
Add kthread_init_work_onstack/kthread_init_delayed_work_onstack for
on-stack kthread_work support.
Signed-off-by: Qianli Zhao
---
include/linux
From: Qianli Zhao
do_init_timer() can specify the flags of a timer_list,
but this function does not expect the caller to specify the CPU or idx.
If a user invokes do_init_timer() and specifies a CPU,
the result may not be what was expected.
E.g.:
do_init_timer() is invoked on core 2 with flags 0x1;
the final flags value is 0x3.
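
The arithmetic behind this example can be sketched in userspace C. The sketch assumes, as in the mainline code as I understand it, that do_init_timer() ORs the caller's flags with the current CPU number, and that the flag bit layout mirrors include/linux/timer.h; both should be checked against your tree. The two helper functions are hypothetical names, not kernel functions.

```c
#include <stdio.h>

/* Bit layout assumed to mirror include/linux/timer.h; verify before relying on it. */
#define TIMER_CPUMASK     0x0003FFFFu
#define TIMER_DEFERRABLE  0x00080000u
#define TIMER_PINNED      0x00100000u
#define TIMER_IRQSAFE     0x00200000u
#define TIMER_INIT_FLAGS  (TIMER_DEFERRABLE | TIMER_PINNED | TIMER_IRQSAFE)

/* Unchecked behaviour: caller flags are ORed with the CPU number. */
unsigned int init_flags_unchecked(unsigned int flags, unsigned int cpu)
{
	return flags | cpu;
}

/* Hypothetical checked variant: warn about illegal bits and drop them first. */
unsigned int init_flags_checked(unsigned int flags, unsigned int cpu)
{
	if (flags & ~TIMER_INIT_FLAGS) {
		fprintf(stderr, "illegal timer flags 0x%x\n",
			flags & ~TIMER_INIT_FLAGS);
		flags &= TIMER_INIT_FLAGS;
	}
	return flags | cpu;
}
```

On CPU 2 (binary 10), a stray flags value of 0x1 lands in the CPU field and yields 0x3, so the timer appears to belong to CPU 3. With the mask applied first, the stray bit is dropped and the CPU field stays at 2.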
From: Qianli Zhao
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer
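
As a brief illustration of the flexible array member style referred to here, the following standard C sketch declares a structure with a trailing array of unspecified size and allocates it in one block. struct msg and msg_alloc() are hypothetical names; kernel code would normally use kmalloc() together with the struct_size() helper rather than malloc() and open-coded arithmetic.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure used only for illustration. */
struct msg {
	size_t len;
	unsigned char data[];	/* flexible array member, must be last */
};

/* Allocate a msg with room for len trailing bytes; returns NULL on failure. */
struct msg *msg_alloc(const void *src, size_t len)
{
	/* sizeof(*m) does not include the flexible array, so add len. */
	struct msg *m = malloc(sizeof(*m) + len);

	if (!m)
		return NULL;
	m->len = len;
	memcpy(m->data, src, len);
	return m;
}
```

Unlike a one-element or zero-length array, the `data[]` form lets the compiler reject the member if it is not last in the structure.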