Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-21 Thread Valentin Schneider
On 20/07/20 15:21, Peter Zijlstra wrote: > On Mon, Jul 20, 2020 at 04:02:24PM +0200, Oleg Nesterov wrote: >> I have to admit, I do not understand the usage of prev_state in schedule(), >> it looks really, really subtle... > > Right, so commit dbfb089d360 solved a problem where schedule() re-read

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-21 Thread peterz
On Tue, Jul 21, 2020 at 12:52:52AM -0400, Paul Gortmaker wrote: > [Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917] On 20/07/2020 (Mon 16:21) > Peter Zijlstra wrote: > > > On Mon, Jul 20, 2020 at 04:02:24PM +0200, Oleg Nesterov wrote: > > > I have to admit, I do

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Paul Gortmaker
[Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917] On 20/07/2020 (Mon 16:21) Peter Zijlstra wrote: > On Mon, Jul 20, 2020 at 04:02:24PM +0200, Oleg Nesterov wrote: > > I have to admit, I do not understand the usage of prev_state in schedule(), > > it looks really, really subtle...

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Peter Zijlstra
On Mon, Jul 20, 2020 at 05:35:15PM +0200, Oleg Nesterov wrote: > On 07/20, Oleg Nesterov wrote: > > > > On 07/20, Peter Zijlstra wrote: > > > > > > --- a/kernel/sched/core.c > > > +++ b/kernel/sched/core.c > > > @@ -4193,9 +4193,6 @@ static void __sched notrace __schedule(bool preempt) > > >

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Oleg Nesterov
On 07/20, Oleg Nesterov wrote: > > On 07/20, Peter Zijlstra wrote: > > > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -4193,9 +4193,6 @@ static void __sched notrace __schedule(bool preempt) > > local_irq_disable(); > > rcu_note_context_switch(preempt); > > > > - /*

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Oleg Nesterov
On 07/20, Peter Zijlstra wrote: > > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -4193,9 +4193,6 @@ static void __sched notrace __schedule(bool preempt) > local_irq_disable(); > rcu_note_context_switch(preempt); > > - /* See deactivate_task() below. */ > -

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Valentin Schneider
On 20/07/20 14:17, pet...@infradead.org wrote: > On Mon, Jul 20, 2020 at 01:20:26PM +0100, Valentin Schneider wrote: >> On 20/07/20 12:26, pet...@infradead.org wrote: > >> > + /* >> > + * We must re-load prev->state in case ttwu_remote() changed it >> > + * before we acquired rq->lock. >> >

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Peter Zijlstra
On Mon, Jul 20, 2020 at 04:02:24PM +0200, Oleg Nesterov wrote: > I have to admit, I do not understand the usage of prev_state in schedule(), > it looks really, really subtle... Right, so commit dbfb089d360 solved a problem where schedule() re-read prev->state vs prev->on_rq = 0. That is,

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread peterz
On Mon, Jul 20, 2020 at 01:26:23PM +0200, pet...@infradead.org wrote: > kernel/sched/core.c | 34 -- > 1 file changed, 28 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index e15543cb84812..b5973d7fa521c 100644 > ---

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Oleg Nesterov
On 07/20, Peter Zijlstra wrote: > > Also, is there any way to not have ptrace do this? Well, we need to ensure that even SIGKILL can't wake the tracee up while debugger plays with its registers/etc. > How performance > critical is this ptrace path? This is a slow path. We can probably change

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread peterz
On Mon, Jul 20, 2020 at 01:20:26PM +0100, Valentin Schneider wrote: > On 20/07/20 12:26, pet...@infradead.org wrote: > > + /* > > +* We must re-load prev->state in case ttwu_remote() changed it > > +* before we acquired rq->lock. > > +*/ > > + tmp_state = prev->state; > > + if

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Christian Brauner
On Mon, Jul 20, 2020 at 01:26:23PM +0200, pet...@infradead.org wrote: > On Mon, Jul 20, 2020 at 12:59:24PM +0200, pet...@infradead.org wrote: > > On Mon, Jul 20, 2020 at 10:41:06AM +0200, Peter Zijlstra wrote: > > > On Mon, Jul 20, 2020 at 10:26:58AM +0200, Oleg Nesterov wrote: > > > > Peter, > >

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Valentin Schneider
On 20/07/20 12:26, pet...@infradead.org wrote: > --- > kernel/sched/core.c | 34 -- > 1 file changed, 28 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index e15543cb84812..b5973d7fa521c 100644 > ---

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Jiri Slaby
On 20. 07. 20, 13:26, pet...@infradead.org wrote: > On Mon, Jul 20, 2020 at 12:59:24PM +0200, pet...@infradead.org wrote: >> On Mon, Jul 20, 2020 at 10:41:06AM +0200, Peter Zijlstra wrote: >>> On Mon, Jul 20, 2020 at 10:26:58AM +0200, Oleg Nesterov wrote: Peter, Let me add another

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread peterz
On Mon, Jul 20, 2020 at 12:59:24PM +0200, pet...@infradead.org wrote: > On Mon, Jul 20, 2020 at 10:41:06AM +0200, Peter Zijlstra wrote: > > On Mon, Jul 20, 2020 at 10:26:58AM +0200, Oleg Nesterov wrote: > > > Peter, > > > > > > Let me add another note. TASK_TRACED/TASK_STOPPED was always

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread peterz
On Mon, Jul 20, 2020 at 10:41:06AM +0200, Peter Zijlstra wrote: > On Mon, Jul 20, 2020 at 10:26:58AM +0200, Oleg Nesterov wrote: > > Peter, > > > > Let me add another note. TASK_TRACED/TASK_STOPPED was always protected by > > ->siglock. In particular, ttwu(__TASK_TRACED) must be always called

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Peter Zijlstra
On Mon, Jul 20, 2020 at 10:26:58AM +0200, Oleg Nesterov wrote: > Peter, > > Let me add another note. TASK_TRACED/TASK_STOPPED was always protected by > ->siglock. In particular, ttwu(__TASK_TRACED) must be always called with > ->siglock held. That is why ptrace_freeze_traced() assumes it can

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Oleg Nesterov
Peter, Let me add another note. TASK_TRACED/TASK_STOPPED was always protected by ->siglock. In particular, ttwu(__TASK_TRACED) must be always called with ->siglock held. That is why ptrace_freeze_traced() assumes it can safely do s/TASK_TRACED/__TASK_TRACED/ under spin_lock(siglock). Can this
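The note above hinges on the difference between TASK_TRACED and __TASK_TRACED. Below is a minimal userspace sketch, assuming the task-state bit values from include/linux/sched.h around v5.8 and modelling only the first mask test in try_to_wake_up(). The names wakeup_matches(), freeze_traced() and fatal_signal_mask are hypothetical illustrations, not kernel functions. It shows why the freeze matters: a fatal-signal wakeup includes TASK_WAKEKILL, which matches TASK_TRACED but not the bare __TASK_TRACED left behind by the s/TASK_TRACED/__TASK_TRACED/ in ptrace_freeze_traced().

```c
#include <stdbool.h>

/* Task-state bits as in include/linux/sched.h around v5.8 (subset only). */
#define TASK_INTERRUPTIBLE 0x0001
#define __TASK_TRACED      0x0008
#define TASK_WAKEKILL      0x0100
#define TASK_TRACED        (TASK_WAKEKILL | __TASK_TRACED)

/*
 * Hypothetical helper: models only the early check in try_to_wake_up(),
 * which gives up when the task's state shares no bit with the wake mask.
 */
static bool wakeup_matches(unsigned int task_state, unsigned int wake_mask)
{
	return (task_state & wake_mask) != 0;
}

/*
 * Mask used for a fatal signal such as SIGKILL: signal_wake_up(t, 1)
 * ends up in wake_up_state(t, TASK_WAKEKILL | TASK_INTERRUPTIBLE).
 */
static const unsigned int fatal_signal_mask = TASK_WAKEKILL | TASK_INTERRUPTIBLE;

/*
 * Simplified, hypothetical model of ptrace_freeze_traced(): a traced task
 * loses TASK_WAKEKILL, so even SIGKILL can no longer wake it while the
 * debugger works on it. (The real code does this under ->siglock.)
 */
static unsigned int freeze_traced(unsigned int state)
{
	return state == TASK_TRACED ? __TASK_TRACED : state;
}
```

With these definitions, wakeup_matches(TASK_TRACED, fatal_signal_mask) is true (0x0108 & 0x0101 = 0x0100), while wakeup_matches(freeze_traced(TASK_TRACED), fatal_signal_mask) is false (0x0008 & 0x0101 = 0). This is also why the freeze and any ttwu(__TASK_TRACED) must agree on ->siglock: the state word itself is the only thing that decides whether the wakeup goes through.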

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Oleg Nesterov
On 07/20, Jiri Slaby wrote: > > On 18. 07. 20, 19:14, Oleg Nesterov wrote: > > > > This is already wrong. But > > > > Where does this __might_sleep() come from ??? I see no blocking calls > > in ptrace_stop(). Not to mention it is called with ->siglock held and > > right after this

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Oleg Nesterov
On 07/20, Jiri Slaby wrote: > > You tackled it, we cherry-picked dbfb089d360 to our kernels. Ccing more > people. Thanks... so with this patch __schedule() does prev_state = prev->state; ... if (!preempt && prev_state && prev_state == prev->state) { if
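One way to see why the quoted double load is subtle is a deterministic userspace model of it. This is a sketch under stated assumptions: struct fake_task, race_hook, freeze_hook() and would_deactivate() are hypothetical names, the "concurrent" writer is simulated by a hook that runs between the two loads, and the model stops at the outer condition, ignoring the nested check that follows it. If something like ptrace_freeze_traced() flips TASK_TRACED to __TASK_TRACED between the loads, prev_state == prev->state fails, the branch is skipped, and the task is not taken off the runqueue even though its ->state is no longer TASK_RUNNING.

```c
#include <stdbool.h>
#include <stddef.h>

#define __TASK_TRACED 0x0008
#define TASK_WAKEKILL 0x0100
#define TASK_TRACED   (TASK_WAKEKILL | __TASK_TRACED)

/* Hypothetical stand-in for the one task_struct field we care about. */
struct fake_task {
	volatile long state;
};

/* Hypothetical hook simulating a concurrent writer between the two loads. */
typedef void (*race_hook)(struct fake_task *);

/*
 * Models the outer condition quoted above:
 *   prev_state = prev->state;
 *   ...
 *   if (!preempt && prev_state && prev_state == prev->state)
 * Returns true when that condition holds, i.e. the path on which the
 * task may be deactivated.
 */
static bool would_deactivate(struct fake_task *prev, bool preempt, race_hook hook)
{
	long prev_state = prev->state;          /* first load */

	if (hook)
		hook(prev);                     /* "concurrent" change lands here */

	return !preempt && prev_state && prev_state == prev->state; /* second load */
}

/* Simulates the TASK_TRACED -> __TASK_TRACED change from ptrace_freeze_traced(). */
static void freeze_hook(struct fake_task *t)
{
	if (t->state == TASK_TRACED)
		t->state = __TASK_TRACED;
}
```

Without a hook, a TASK_TRACED task passes the check; with freeze_hook() the same call returns false and leaves the task in __TASK_TRACED, which is exactly the window Oleg is asking about.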

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-20 Thread Jiri Slaby
On 18. 07. 20, 19:14, Oleg Nesterov wrote: > On 07/18, Jiri Slaby wrote: >> >> On 17. 07. 20, 14:40, Oleg Nesterov wrote: >>> >>> please see the updated patch below, lets check ptrace_unfreeze() too. >> >> Sure, dmesg attached. > > Thanks a lot! > > But I am totally confused... > >> [

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-19 Thread Jiri Slaby
On 18. 07. 20, 19:44, Christian Brauner wrote: > On Sat, Jul 18, 2020 at 07:14:07PM +0200, Oleg Nesterov wrote: >> On 07/18, Jiri Slaby wrote: >>> >>> On 17. 07. 20, 14:40, Oleg Nesterov wrote: please see the updated patch below, lets check ptrace_unfreeze() too. >>> >>> Sure, dmesg

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-19 Thread Oleg Nesterov
Hi Hillf, On 07/19, Hillf Danton wrote: > > Dunno if the wheel prior to JOBCTL_TASK_WORK helps debug the warnings. > > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -2541,7 +2541,7 @@ bool get_signal(struct ksignal *ksig) > > relock: > spin_lock_irq(&sighand->siglock); > - current->jobctl

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-18 Thread Christian Brauner
On Sat, Jul 18, 2020 at 07:14:07PM +0200, Oleg Nesterov wrote: > On 07/18, Jiri Slaby wrote: > > > > On 17. 07. 20, 14:40, Oleg Nesterov wrote: > > > > > > please see the updated patch below, lets check ptrace_unfreeze() too. > > > > Sure, dmesg attached. > > Thanks a lot! > > But I am totally

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-18 Thread Oleg Nesterov
On 07/18, Jiri Slaby wrote: > > On 17. 07. 20, 14:40, Oleg Nesterov wrote: > > > > please see the updated patch below, lets check ptrace_unfreeze() too. > > Sure, dmesg attached. Thanks a lot! But I am totally confused... > [ 94.513944] [ cut here ] > [ 94.513985] do

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-18 Thread Jiri Slaby
On 17. 07. 20, 13:12, Christian Brauner wrote: > On Fri, Jul 17, 2020 at 01:04:38PM +0200, Jiri Slaby wrote: >> On 17. 07. 20, 12:45, Jiri Slaby wrote: >>> Hi, >>> >>> the strace testsuite triggers this on 5.8-rc4 and -rc5 both on x86_64 >>> and i586: >> >> make check needs -jsomething, running it

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-18 Thread Jiri Slaby
On 17. 07. 20, 14:40, Oleg Nesterov wrote: > On 07/17, Oleg Nesterov wrote: >> >> On 07/17, Jiri Slaby wrote: >>> >>> On 17. 07. 20, 12:45, Jiri Slaby wrote: Hi, the strace testsuite triggers this on 5.8-rc4 and -rc5 both on x86_64 and i586: >>> >>> make check needs

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-17 Thread Oleg Nesterov
On 07/17, Oleg Nesterov wrote: > > On 07/17, Jiri Slaby wrote: > > > > On 17. 07. 20, 12:45, Jiri Slaby wrote: > > > Hi, > > > > > > the strace testsuite triggers this on 5.8-rc4 and -rc5 both on x86_64 > > > and i586: > > > > make check needs -jsomething, running it sequentially (-j1) doesn't > >

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-17 Thread Oleg Nesterov
On 07/17, Jiri Slaby wrote: > > On 17. 07. 20, 12:45, Jiri Slaby wrote: > > Hi, > > > > the strace testsuite triggers this on 5.8-rc4 and -rc5 both on x86_64 > > and i586: > > make check needs -jsomething, running it sequentially (-j1) doesn't > trigger it. After the error, I cannot run anything.

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-17 Thread Christian Brauner
On Fri, Jul 17, 2020 at 01:04:38PM +0200, Jiri Slaby wrote: > On 17. 07. 20, 12:45, Jiri Slaby wrote: > > Hi, > > > > the strace testsuite triggers this on 5.8-rc4 and -rc5 both on x86_64 > > and i586: > > make check needs -jsomething, running it sequentially (-j1) doesn't > trigger it. After

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-17 Thread Jiri Slaby
On 17. 07. 20, 12:45, Jiri Slaby wrote: > Hi, > > the strace testsuite triggers this on 5.8-rc4 and -rc5 both on x86_64 > and i586: make check needs -jsomething, running it sequentially (-j1) doesn't trigger it. After the error, I cannot run anything. Like ps to find out what test caused the

5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-17 Thread Jiri Slaby
Hi, the strace testsuite triggers this on 5.8-rc4 and -rc5 both on x86_64 and i586: > kernel BUG at kernel/signal.c:1917! > invalid opcode: [#1] SMP NOPTI > CPU: 7 PID: 18367 Comm: filter-unavaila Not tainted > 5.8.0-rc4-3.g2cd7849-default #1 openSUSE Tumbleweed (unreleased) > Hardware