Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-29 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov wrote: > On 06/27, Oleg Nesterov wrote: >> >> On 06/27, Andy Lutomirski wrote: >> > >> > Want to send a patch? I could do it, but you understand this code >> > much better than I do. >> >> Well, I'll try to do this tomorrow unless you do it. > > I h

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-29 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 3:59 PM, Oleg Nesterov wrote: > On 06/28, Andy Lutomirski wrote: >> >> On Tue, Jun 28, 2016 at 1:12 PM, Oleg Nesterov wrote: >> > >> > So please forget unless you see another reason for this change. >> > >> >> But I might need to do that anyway for procfs to read the stac

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/28, Andy Lutomirski wrote: > > On Tue, Jun 28, 2016 at 1:12 PM, Oleg Nesterov wrote: > > > > So please forget unless you see another reason for this change. > > > > But I might need to do that anyway for procfs to read the stack, > right? Do you see another way to handle that case? Well,

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/28, Linus Torvalds wrote:
>
> Then try_get_task_stack(tsk) becomes
>
> void *try_get_task_stack(struct task_struct *tsk)
> {
>         void *stack = tsk->stack;
>         if (!atomic_inc_not_zero(&tsk->stackref))
>                 stack = NULL;
>         return stack;
> }
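
For readers following the refcount idea outside the kernel tree, here is a minimal userspace sketch of the same "take a reference only if the count is not already zero" pattern, written with C11 atomics. It is an illustration under stated assumptions, not the patch discussed in the thread: struct task_sketch, inc_not_zero(), try_get_task_stack_sketch() and put_task_stack_sketch() are hypothetical names, and inc_not_zero() only mimics the semantics of the kernel's atomic_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the two task_struct fields
 * mentioned in the quoted snippet; not the kernel's definition. */
struct task_sketch {
    void *stack;          /* base of the stack being protected */
    atomic_int stackref;  /* 0 means the stack is already on its way out */
};

/* Increment *ref unless it is zero. Assumed to mirror the semantics of
 * the kernel's atomic_inc_not_zero(); implemented here as a CAS loop. */
static bool inc_not_zero(atomic_int *ref)
{
    int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Return the stack with a reference held, or NULL if it is gone. */
static void *try_get_task_stack_sketch(struct task_sketch *tsk)
{
    void *stack = tsk->stack;

    if (!inc_not_zero(&tsk->stackref))
        stack = NULL;
    return stack;
}

/* Drop a reference. Returns true when the caller dropped the last one
 * and is therefore responsible for freeing the stack. */
static bool put_task_stack_sketch(struct task_sketch *tsk)
{
    int old = atomic_fetch_sub(&tsk->stackref, 1);

    assert(old > 0);
    return old == 1;
}
```

The point of the increment-if-not-zero step is that a reader (such as a procfs stack dump) can never resurrect a stack whose count has already dropped to zero; it simply gets NULL and skips the read.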

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 2:35 PM, Linus Torvalds wrote: > On Tue, Jun 28, 2016 at 2:21 PM, Andy Lutomirski wrote: > >> If so, that seems considerably more complicated than just adding a reference >> count. > > Fair enough. Ahh, and if you put the reference count just in the task_struct (next to

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 2:21 PM, Andy Lutomirski wrote: > > Or are you > suggesting that we actually make a list somewhere of stacks that are > nominally unused but are still around for RCU's benefit and then > scavenge from that list when we need a new stack? Yes. We'd have to make our own list

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 2:14 PM, Linus Torvalds wrote: > On Tue, Jun 28, 2016 at 1:54 PM, Andy Lutomirski wrote: >> >> But I might need to do that anyway for procfs to read the stack, >> right? Do you see another way to handle that case? > > I think the other way to handle the kernel stack read

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 2:14 PM, Linus Torvalds wrote: > > I think the other way to handle the kernel stack reading would be to > simply make the stack freeing be RCU-delayed, and use the RCU list > itself as the stack cache. That said, if you end up having to have a stack reference count for oth

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 1:54 PM, Andy Lutomirski wrote: > > But I might need to do that anyway for procfs to read the stack, > right? Do you see another way to handle that case? I think the other way to handle the kernel stack reading would be to simply make the stack freeing be RCU-delayed, an

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 1:12 PM, Oleg Nesterov wrote: > On 06/28, Andy Lutomirski wrote: >> >> On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov wrote: >> > >> > Then how (say) proc_pid_stack() can work? If it hits the task which is >> > already dead we are (probably) fine, valid_stack_ptr() should

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/28, Andy Lutomirski wrote: > > On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov wrote: > > > > Then how (say) proc_pid_stack() can work? If it hits the task which is > > already dead we are (probably) fine, valid_stack_ptr() should fail iiuc. > > > > But what if we race with the last schedule(

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov wrote: > On 06/27, Oleg Nesterov wrote: >> >> On 06/27, Andy Lutomirski wrote: >> > >> > Want to send a patch? I could do it, but you understand this code >> > much better than I do. >> >> Well, I'll try to do this tomorrow unless you do it. > > I h

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/27, Oleg Nesterov wrote: > > On 06/27, Andy Lutomirski wrote: > > > > Want to send a patch? I could do it, but you understand this code > > much better than I do. > > Well, I'll try to do this tomorrow unless you do it. I have cloned luto/linux.git to see if kthread_stop() can pin ->stack s

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Linus Torvalds
On Sun, Jun 26, 2016 at 10:22 PM, Andy Lutomirski wrote: > > kthread_stop is *sick*. > > struct kthread self; > > > current->vfork_done = &self.exited; > > > do_exit(ret); > > And then some other thread goes and waits for the completion, which is > *on the stack*, which, in any sane wo

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Oleg Nesterov
On 06/27, Andy Lutomirski wrote: > > On Mon, Jun 27, 2016 at 7:54 AM, Oleg Nesterov wrote: > > > >> Is there seriously no way to directly wait for a struct task_struct to > >> exit? Could we, say, kmalloc the completion (or maybe even the whole > >> struct kthread) and (ick!) hang it off ->vfork_

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Andy Lutomirski
On Mon, Jun 27, 2016 at 7:54 AM, Oleg Nesterov wrote: > On 06/26, Andy Lutomirski wrote: >> >> kthread_stop is *sick*. >> >> struct kthread self; >> >> ... >> >> current->vfork_done = &self.exited; >> >> ... >> >> do_exit(ret); >> >> And then some other thread goes and waits for the co

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Oleg Nesterov
On 06/26, Andy Lutomirski wrote: > > kthread_stop is *sick*. > > struct kthread self; > > ... > > current->vfork_done = &self.exited; > > ... > > do_exit(ret); > > And then some other thread goes and waits for the completion, which is > *on the stack*, which, in any sane world (e.g. wit

Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Peter Zijlstra
On Sun, Jun 26, 2016 at 10:22:32PM -0700, Andy Lutomirski wrote: > kthread_stop is *sick*. > > struct kthread self; > > ... > > current->vfork_done = &self.exited; > > ... > > do_exit(ret); > > And then some other thread goes and waits for the completion, which is > *on the stack*

kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-26 Thread Andy Lutomirski
My v4 series was doing pretty well until this explosion: On Sun, Jun 26, 2016 at 9:41 PM, kernel test robot wrote: > > > FYI, we noticed the following commit: > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git x86/vmap_stack > commit 26424589626d7f82d09d4e7c0569f9487b2e810a ("[DEB