Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-17 Thread Andrea Arcangeli
On Tue, Dec 17, 2013 at 05:53:35PM +0100, Oleg Nesterov wrote: > On 12/16, Oleg Nesterov wrote: > > > > On 12/16, Andrea Arcangeli wrote: > > > > > > On Mon, Dec 16, 2013 at 07:36:18PM +0100, Oleg Nesterov wrote: > > > > > > > > And compound_lock_irqsave() looks racy even after > > > >

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-17 Thread Oleg Nesterov
On 12/16, Oleg Nesterov wrote: > > On 12/16, Andrea Arcangeli wrote: > > > > On Mon, Dec 16, 2013 at 07:36:18PM +0100, Oleg Nesterov wrote: > > > > > > And compound_lock_irqsave() looks racy even after get_page_unless_zero(). > > > > > > For example, suppose that page_head was already freed and

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-17 Thread Oleg Nesterov
On 12/16, Oleg Nesterov wrote: On 12/16, Andrea Arcangeli wrote: On Mon, Dec 16, 2013 at 07:36:18PM +0100, Oleg Nesterov wrote: And compound_lock_irqsave() looks racy even after get_page_unless_zero(). For example, suppose that page_head was already freed and then

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-17 Thread Andrea Arcangeli
On Tue, Dec 17, 2013 at 05:53:35PM +0100, Oleg Nesterov wrote: On 12/16, Oleg Nesterov wrote: On 12/16, Andrea Arcangeli wrote: On Mon, Dec 16, 2013 at 07:36:18PM +0100, Oleg Nesterov wrote: And compound_lock_irqsave() looks racy even after get_page_unless_zero().

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-16 Thread Oleg Nesterov
On 12/16, Andrea Arcangeli wrote: > > On Mon, Dec 16, 2013 at 07:36:18PM +0100, Oleg Nesterov wrote: > > > > So it seems that put_compound_tail() should also do get/put(head) like > > put_compound_page() does, and this probably means we should factor out > > the common code somehow. > > > > Yes it

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-16 Thread Andrea Arcangeli
On Mon, Dec 16, 2013 at 07:36:18PM +0100, Oleg Nesterov wrote: > On 12/13, Andrea Arcangeli wrote: > > > > On Fri, Dec 13, 2013 at 05:22:40PM +0100, Oleg Nesterov wrote: > > > > > > I'll try to make v2 based on -mm and your suggestions. > > > > Ok great! > > Yes, it would be great, but I need

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-16 Thread Oleg Nesterov
On 12/13, Andrea Arcangeli wrote: > > On Fri, Dec 13, 2013 at 05:22:40PM +0100, Oleg Nesterov wrote: > > > > I'll try to make v2 based on -mm and your suggestions. > > Ok great! Yes, it would be great, but I need your help again ;) Let me quote the pseudo-code you sent me:

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-16 Thread Oleg Nesterov
On 12/13, Andrea Arcangeli wrote: On Fri, Dec 13, 2013 at 05:22:40PM +0100, Oleg Nesterov wrote: I'll try to make v2 based on -mm and your suggestions. Ok great! Yes, it would be great, but I need your help again ;) Let me quote the pseudo-code you sent me:

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-16 Thread Andrea Arcangeli
On Mon, Dec 16, 2013 at 07:36:18PM +0100, Oleg Nesterov wrote: On 12/13, Andrea Arcangeli wrote: On Fri, Dec 13, 2013 at 05:22:40PM +0100, Oleg Nesterov wrote: I'll try to make v2 based on -mm and your suggestions. Ok great! Yes, it would be great, but I need your help again ;)

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-16 Thread Oleg Nesterov
On 12/16, Andrea Arcangeli wrote: On Mon, Dec 16, 2013 at 07:36:18PM +0100, Oleg Nesterov wrote: So it seems that put_compound_tail() should also do get/put(head) like put_compound_page() does, and this probably means we should factor out the common code somehow. Yes it was supposed

Re: process 'stuck' at exit.

2013-12-14 Thread Oleg Nesterov
On 12/10, Dave Jones wrote: > > On Tue, Dec 10, 2013 at 07:23:30PM -0500, Dave Jones wrote: > > > I was distracted by seeing all the other threads exiting, so I was only > looking at > > what this one had already done. > > another thing that distracted me was that /proc/10818/stack was just

Re: process 'stuck' at exit.

2013-12-14 Thread Oleg Nesterov
On 12/10, Dave Jones wrote: On Tue, Dec 10, 2013 at 07:23:30PM -0500, Dave Jones wrote: I was distracted by seeing all the other threads exiting, so I was only looking at what this one had already done. another thing that distracted me was that /proc/10818/stack was just showing

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-13 Thread Andrea Arcangeli
On Fri, Dec 13, 2013 at 05:22:40PM +0100, Oleg Nesterov wrote: > Andrea. Thanks a lot for the detailed reply. > > On 12/13, Andrea Arcangeli wrote: > > > > On Wed, Dec 11, 2013 at 08:18:55PM +0100, Oleg Nesterov wrote: > > > > get_huge_page_tail checks different invariants in the VM_BUG_ON and is

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-13 Thread Oleg Nesterov
Andrea. Thanks a lot for the detailed reply. On 12/13, Andrea Arcangeli wrote: > > On Wed, Dec 11, 2013 at 08:18:55PM +0100, Oleg Nesterov wrote: > > get_huge_page_tail checks different invariants in the VM_BUG_ON and is > only used by gup.c not sure why to call that here. Compared to

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-13 Thread Andrea Arcangeli
On Wed, Dec 11, 2013 at 08:18:55PM +0100, Oleg Nesterov wrote: > --- a/mm/swap.c > +++ b/mm/swap.c > @@ -210,7 +210,7 @@ EXPORT_SYMBOL(put_page); > * This function is exported but must not be called by anything other > * than get_page(). It implements the slow path of get_page(). > */ >

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-13 Thread Andrea Arcangeli
On Wed, Dec 11, 2013 at 08:18:55PM +0100, Oleg Nesterov wrote: --- a/mm/swap.c +++ b/mm/swap.c @@ -210,7 +210,7 @@ EXPORT_SYMBOL(put_page); * This function is exported but must not be called by anything other * than get_page(). It implements the slow path of get_page(). */ -bool

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-13 Thread Oleg Nesterov
Andrea. Thanks a lot for the detailed reply. On 12/13, Andrea Arcangeli wrote: On Wed, Dec 11, 2013 at 08:18:55PM +0100, Oleg Nesterov wrote: get_huge_page_tail checks different invariants in the VM_BUG_ON and is only used by gup.c not sure why to call that here. Compared to

Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-13 Thread Andrea Arcangeli
On Fri, Dec 13, 2013 at 05:22:40PM +0100, Oleg Nesterov wrote: Andrea. Thanks a lot for the detailed reply. On 12/13, Andrea Arcangeli wrote: On Wed, Dec 11, 2013 at 08:18:55PM +0100, Oleg Nesterov wrote: get_huge_page_tail checks different invariants in the VM_BUG_ON and is only

Re: process 'stuck' at exit.

2013-12-12 Thread Linus Torvalds
On Thu, Dec 12, 2013 at 11:00 AM, Andrea Arcangeli wrote: > > However it wasn't me introducing the bug, my code when patched in > early 2011 would work fine. The bug was introduced half a year later > in commit 9ea71503a8ed9184d2d0b8ccc4d269d05f7940ae . I'd argue that half a year later the bug

Re: process 'stuck' at exit.

2013-12-12 Thread Andrea Arcangeli
Hello, On Tue, Dec 10, 2013 at 01:57:49PM -0800, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 1:41 PM, Dave Jones wrote: > > > > http://codemonkey.org.uk/junk/trace > > Hmm. Ok, so something is calling [__]get_user_pages_fast() and > put_page() in a loop, but the trace doesn't show what that

Re: process 'stuck' at exit.

2013-12-12 Thread Andrea Arcangeli
Hello, On Tue, Dec 10, 2013 at 01:57:49PM -0800, Linus Torvalds wrote: On Tue, Dec 10, 2013 at 1:41 PM, Dave Jones da...@redhat.com wrote: http://codemonkey.org.uk/junk/trace Hmm. Ok, so something is calling [__]get_user_pages_fast() and put_page() in a loop, but the trace doesn't show

Re: process 'stuck' at exit.

2013-12-12 Thread Linus Torvalds
On Thu, Dec 12, 2013 at 11:00 AM, Andrea Arcangeli aarca...@redhat.com wrote: However it wasn't me introducing the bug, my code when patched in early 2011 would work fine. The bug was introduced half a year later in commit 9ea71503a8ed9184d2d0b8ccc4d269d05f7940ae . I'd argue that half a year

Re: process 'stuck' at exit.

2013-12-11 Thread Darren Hart
On Wed, 2013-12-11 at 23:26 -0500, Dave Jones wrote: > On Tue, Dec 10, 2013 at 02:48:52PM -0800, Linus Torvalds wrote: > > > Dave, can you re-create that trinity run and test that patch? I think > > we've got this > > 24 hours later, all is well. I think we can call this one done. > >

Re: process 'stuck' at exit.

2013-12-11 Thread Dave Jones
On Tue, Dec 10, 2013 at 02:48:52PM -0800, Linus Torvalds wrote: > Dave, can you re-create that trinity run and test that patch? I think > we've got this 24 hours later, all is well. I think we can call this one done. Tested-by: Dave Jones Dave

PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-11 Thread Oleg Nesterov
On 12/11, Oleg Nesterov wrote: > > On 12/11, Thomas Gleixner wrote: > > > > On Wed, 11 Dec 2013, Oleg Nesterov wrote: > > > > > > I have to admit, I do not understand why we can't avoid this altogether. > > > > > > __get_page_tail() can find the stable ->first_page, why get_futex_key() > > > can't

Re: process 'stuck' at exit.

2013-12-11 Thread Mel Gorman
On Wed, Dec 11, 2013 at 05:38:55PM +0100, Thomas Gleixner wrote: > On Wed, 11 Dec 2013, Mel Gorman wrote: > > On Tue, Dec 10, 2013 at 11:42:15PM +0100, Thomas Gleixner wrote: > > > Now, if that map is RO, i.e. we took the fallback path then the THP > > > one will fail as it has write=1

Re: process 'stuck' at exit.

2013-12-11 Thread Oleg Nesterov
On 12/11, Thomas Gleixner wrote: > > On Wed, 11 Dec 2013, Oleg Nesterov wrote: > > > > I know almost nothing about THP, but why we may need write == true in > > this case? > > > > IOW, > > > > > - if (likely(__get_user_pages_fast(address, 1, 1, &page) == 1)) { > > > + if

Re: process 'stuck' at exit.

2013-12-11 Thread Thomas Gleixner
On Wed, 11 Dec 2013, Oleg Nesterov wrote: > On 12/10, Linus Torvalds wrote: > > > > I think what happens is: > > - get_user_pages_fast(address, 1, 1, &page) fails (because it's read-only) > > - get_user_pages_fast(address, 1, 0, &page) succeeds and gets a large-page > > - __get_user_pages_fast(address,

Re: process 'stuck' at exit.

2013-12-11 Thread Oleg Nesterov
On 12/10, Linus Torvalds wrote: > > I think what happens is: > - get_user_pages_fast(address, 1, 1, &page) fails (because it's read-only) > - get_user_pages_fast(address, 1, 0, &page) succeeds and gets a large-page > - __get_user_pages_fast(address, 1, 1, &page) fails (because it's read-only). > > so what

Re: process 'stuck' at exit.

2013-12-11 Thread Thomas Gleixner
On Wed, 11 Dec 2013, Mel Gorman wrote: > On Tue, Dec 10, 2013 at 11:42:15PM +0100, Thomas Gleixner wrote: > > Now, if that map is RO, i.e. we took the fallback path then the THP > > one will fail as it has write=1 unconditionally. > > > > if (likely(__get_user_pages_fast(address, 1, 1, &page) ==

Re: process 'stuck' at exit.

2013-12-11 Thread Mel Gorman
On Tue, Dec 10, 2013 at 11:42:15PM +0100, Thomas Gleixner wrote: > On Tue, 10 Dec 2013, Linus Torvalds wrote: > > > On Tue, Dec 10, 2013 at 1:57 PM, Linus Torvalds > > wrote: > > > > > > So it looks like __get_user_pages_fast() fails, and keeps failing. > > > > Hmm.. Is any of the addresses

Re: process 'stuck' at exit.

2013-12-11 Thread Ingo Molnar
* Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 1:32 PM, Dave Jones wrote: > > > > http://www.codemonkey.org.uk/junk/perf.data.xz > > "Forbidden > > You don't have permission to access /junk/perf.data.xz on this server." > > also, we'd need the vmlinux file to actually decode the data,

Re: process 'stuck' at exit.

2013-12-11 Thread Ingo Molnar
* Linus Torvalds torva...@linux-foundation.org wrote: On Tue, Dec 10, 2013 at 1:32 PM, Dave Jones da...@redhat.com wrote: http://www.codemonkey.org.uk/junk/perf.data.xz Forbidden You don't have permission to access /junk/perf.data.xz on this server. also, we'd need the vmlinux

Re: process 'stuck' at exit.

2013-12-11 Thread Mel Gorman
On Tue, Dec 10, 2013 at 11:42:15PM +0100, Thomas Gleixner wrote: On Tue, 10 Dec 2013, Linus Torvalds wrote: On Tue, Dec 10, 2013 at 1:57 PM, Linus Torvalds torva...@linux-foundation.org wrote: So it looks like __get_user_pages_fast() fails, and keeps failing. Hmm.. Is any of the

Re: process 'stuck' at exit.

2013-12-11 Thread Thomas Gleixner
On Wed, 11 Dec 2013, Mel Gorman wrote: On Tue, Dec 10, 2013 at 11:42:15PM +0100, Thomas Gleixner wrote: Now, if that map is RO, i.e. we took the fallback path then the THP one will fail as it has write=1 unconditionally. if (likely(__get_user_pages_fast(address, 1, 1, page) == 1))

Re: process 'stuck' at exit.

2013-12-11 Thread Oleg Nesterov
On 12/10, Linus Torvalds wrote: I think what happens is: - get_user_pages_fast(address, 1, 1, page) fails (because it's read-only) - get_user_pages_fast(address, 1, 0, page) succeeds and gets a large-page - __get_user_pages_fast(address, 1, 1, page) fails (because it's read-only). so
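The three steps Linus lists can be sketched as a small userspace toy model. This is only an illustration under stated assumptions, not the kernel code: `toy_mapping`, `toy_gup_fast()` and `toy_get_key()` are hypothetical names, the lookup is reduced to a writable/read-only check, and the `recheck_uses_first_mode` switch is invented here for comparison and is not taken from any patch in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy description of a user mapping; hypothetical, not a kernel structure. */
struct toy_mapping {
    bool writable; /* false models a read-only mapping */
    bool huge;     /* true models a transparent huge page */
};

/* Stand-in for a fast page lookup: 1 if the page was pinned, 0 if not. */
static int toy_gup_fast(const struct toy_mapping *m, int write)
{
    return (write && !m->writable) ? 0 : 1;
}

/*
 * Returns the number of passes through the retry loop on success,
 * or -1 if max_passes passes were made without progress.
 * recheck_uses_first_mode is a hypothetical switch: when false, the
 * huge-page recheck always asks for write access, as in the quoted steps.
 */
int toy_get_key(const struct toy_mapping *m, bool recheck_uses_first_mode,
                int max_passes)
{
    int pass;

    assert(max_passes > 0);
    for (pass = 1; pass <= max_passes; pass++) {
        int write = 1;
        int recheck_write;

        /* Step 1: try a write lookup; step 2: fall back to read-only.
         * The read-only lookup always succeeds in this model. */
        if (!toy_gup_fast(m, write))
            write = 0;

        if (!m->huge)
            return pass; /* small page: nothing left to recheck */

        /* Step 3: the huge-page recheck. */
        recheck_write = recheck_uses_first_mode ? write : 1;
        if (toy_gup_fast(m, recheck_write))
            return pass;
        /* Recheck failed: drop the page and start over. */
    }
    return -1;
}
```

With a read-only huge mapping and the recheck hard-wired to write access, every pass fails the same way, so only the `max_passes` cap ends the loop; the real loop described in the thread has no such cap.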

Re: process 'stuck' at exit.

2013-12-11 Thread Thomas Gleixner
On Wed, 11 Dec 2013, Oleg Nesterov wrote: On 12/10, Linus Torvalds wrote: I think what happens is: - get_user_pages_fast(address, 1, 1, page) fails (because it's read-only) - get_user_pages_fast(address, 1, 0, page) succeeds and gets a large-page - __get_user_pages_fast(address, 1,

Re: process 'stuck' at exit.

2013-12-11 Thread Oleg Nesterov
On 12/11, Thomas Gleixner wrote: On Wed, 11 Dec 2013, Oleg Nesterov wrote: I know almost nothing about THP, but why we may need write == true in this case? IOW, - if (likely(__get_user_pages_fast(address, 1, 1, page) == 1)) { + if

Re: process 'stuck' at exit.

2013-12-11 Thread Mel Gorman
On Wed, Dec 11, 2013 at 05:38:55PM +0100, Thomas Gleixner wrote: On Wed, 11 Dec 2013, Mel Gorman wrote: On Tue, Dec 10, 2013 at 11:42:15PM +0100, Thomas Gleixner wrote: Now, if that map is RO, i.e. we took the fallback path then the THP one will fail as it has write=1 unconditionally.

PATCH? introduce get_compound_page (Was: process 'stuck' at exit)

2013-12-11 Thread Oleg Nesterov
On 12/11, Oleg Nesterov wrote: On 12/11, Thomas Gleixner wrote: On Wed, 11 Dec 2013, Oleg Nesterov wrote: I have to admit, I do not understand why we can't avoid this altogether. __get_page_tail() can find the stable ->first_page, why get_futex_key() can't ? Because it can

Re: process 'stuck' at exit.

2013-12-11 Thread Dave Jones
On Tue, Dec 10, 2013 at 02:48:52PM -0800, Linus Torvalds wrote: Dave, can you re-create that trinity run and test that patch? I think we've got this 24 hours later, all is well. I think we can call this one done. Tested-by: Dave Jones da...@fedoraproject.org Dave

Re: process 'stuck' at exit.

2013-12-11 Thread Darren Hart
On Wed, 2013-12-11 at 23:26 -0500, Dave Jones wrote: On Tue, Dec 10, 2013 at 02:48:52PM -0800, Linus Torvalds wrote: Dave, can you re-create that trinity run and test that patch? I think we've got this 24 hours later, all is well. I think we can call this one done. Tested-by: Dave

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 02:48:52PM -0800, Linus Torvalds wrote: > Dave, can you re-create that trinity run and test that patch? Looks ok so far, but I'll leave it run overnight to be sure. Dave

Re: process 'stuck' at exit.

2013-12-10 Thread Mel Gorman
On Tue, Dec 10, 2013 at 08:18:29PM +0100, Thomas Gleixner wrote: > On Tue, 10 Dec 2013, Linus Torvalds wrote: > > > Hmm. Looks like the futex code is somehow stuck in a loop, calling > > get_user_pages_fast(). > > > > The futex code itself is apparently so low-overhead that it doesn't > > show

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 07:23:30PM -0500, Dave Jones wrote: > I was distracted by seeing all the other threads exiting, so I was only > looking at > what this one had already done. another thing that distracted me was that /proc/10818/stack was just showing that [] 0x

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 07:05:04PM -0500, Steven Rostedt wrote: > On Tue, Dec 10, 2013 at 06:00:09PM -0500, Dave Jones wrote: > > > > The only thing I'm still unclear on, is how that pid allegedly wasn't doing > > a futex call as part of its run. The only thing I can think of is that > > the

Re: process 'stuck' at exit.

2013-12-10 Thread Steven Rostedt
On Tue, Dec 10, 2013 at 06:00:09PM -0500, Dave Jones wrote: > > The only thing I'm still unclear on, is how that pid allegedly wasn't doing > a futex call as part of its run. The only thing I can think of is that > the other pid that _did_ do a futex call did it on a page that was MAP_SHARED >

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 02:48:52PM -0800, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 2:42 PM, Thomas Gleixner wrote: > > > > And yes, I remember that we do not do an extra check for the fshared > > case, because get_user_pages_fast() should do it for us already. If > > not we are

Re: process 'stuck' at exit.

2013-12-10 Thread Al Viro
On Tue, Dec 10, 2013 at 03:05:46PM -0800, Linus Torvalds wrote: > Nobody actually uses that argument any more (it goes back to the old > i386 "let's manually verify that we have write permissions, because > the CPU doesn't do it for us in the trap handling"), and it should > probably be removed.

Re: process 'stuck' at exit.

2013-12-10 Thread Thomas Gleixner
On Tue, 10 Dec 2013, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 2:57 PM, Thomas Gleixner wrote: > > > > But how does the access_ok() move do anything helpful here? > > Just making it all more obvious. > > > We really need it for the fastpath !fshared case, but for the fshared > > case you

Re: process 'stuck' at exit.

2013-12-10 Thread Al Viro
On Tue, Dec 10, 2013 at 11:42:15PM +0100, Thomas Gleixner wrote: > /* > * If write access is not required (eg. FUTEX_WAIT), try > * and get read-only access. > */ > if (err == -EFAULT && rw == VERIFY_READ) { > err =

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 2:57 PM, Thomas Gleixner wrote: > > But how does the access_ok() move do anything helpful here? Just making it all more obvious. > We really need it for the fastpath !fshared case, but for the fshared > case you actively break working code, because you force a

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 02:58:19PM -0800, Darren Hart wrote: > On Tue, 2013-12-10 at 14:48 -0800, Linus Torvalds wrote: > > On Tue, Dec 10, 2013 at 2:42 PM, Thomas Gleixner > > wrote: > > > > > > And yes, I remember that we do not do an extra check for the fshared > > > case, because

Re: process 'stuck' at exit.

2013-12-10 Thread Darren Hart
On Tue, 2013-12-10 at 14:48 -0800, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 2:42 PM, Thomas Gleixner wrote: > > > > And yes, I remember that we do not do an extra check for the fshared > > case, because get_user_pages_fast() should do it for us already. If > > not we are fubared not only

Re: process 'stuck' at exit.

2013-12-10 Thread Thomas Gleixner
On Tue, 10 Dec 2013, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 2:19 PM, Linus Torvalds > wrote: > > > > Shouldn't we do something like the attached? > > So I think that kernel/futex.c part of the patch might be a good idea, > but on x86-64 (which is what Dave is running), the > >

Re: process 'stuck' at exit.

2013-12-10 Thread Darren Hart
On Tue, 2013-12-10 at 23:42 +0100, Thomas Gleixner wrote: > On Tue, 10 Dec 2013, Linus Torvalds wrote: > > > On Tue, Dec 10, 2013 at 1:57 PM, Linus Torvalds > > wrote: > > > > > > So it looks like __get_user_pages_fast() fails, and keeps failing. > > > > Hmm.. Is any of the addresses unchecked,

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 2:42 PM, Thomas Gleixner wrote: > > And yes, I remember that we do not do an extra check for the fshared > case, because get_user_pages_fast() should do it for us already. If > not we are fubared not only in the futex code. Yeah. It turns out we do do the access check

Re: process 'stuck' at exit.

2013-12-10 Thread Thomas Gleixner
On Tue, 10 Dec 2013, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 1:57 PM, Linus Torvalds > wrote: > > > > So it looks like __get_user_pages_fast() fails, and keeps failing. > > Hmm.. Is any of the addresses unchecked, perhaps? > __get_user_pages_fast() does an access_ok() check, while >

Re: process 'stuck' at exit.

2013-12-10 Thread Darren Hart
On Tue, 2013-12-10 at 14:33 -0800, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 2:19 PM, Linus Torvalds > wrote: > > > > Shouldn't we do something like the attached? > > So I think that kernel/futex.c part of the patch might be a good idea, > but on x86-64 (which is what Dave is running), the

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 2:19 PM, Linus Torvalds wrote: > > Shouldn't we do something like the attached? So I think that kernel/futex.c part of the patch might be a good idea, but on x86-64 (which is what Dave is running), the if (end >> __VIRTUAL_MASK_SHIFT) test in

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 05:21:50PM -0500, Steven Rostedt wrote: > The trace-cmd show, shows you what's in the trace file, which was for the > "manual" version. > > Sorry for the confusion. ah, thanks. That shows.. version = 6 CPU 0 is empty CPU 2 is empty CPU 3 is empty cpus=4

Re: process 'stuck' at exit.

2013-12-10 Thread Steven Rostedt
On Tue, Dec 10, 2013 at 05:16:21PM -0500, Dave Jones wrote:

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 1:57 PM, Linus Torvalds wrote: > > So it looks like __get_user_pages_fast() fails, and keeps failing. Hmm.. Is any of the addresses unchecked, perhaps? __get_user_pages_fast() does an access_ok() check, while get_user_pages_fast() does *not* seem to do one. That looks a

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 05:09:46PM -0500, Steven Rostedt wrote: > So you can do either: > > trace-cmd record -p function -l get_user_pages_fast --func-stack sleep 5 > > Which will trace the get_user_pages_fast and spit out a full call trace. This gives me a 100M trace.dat, but trace show

Re: process 'stuck' at exit.

2013-12-10 Thread Steven Rostedt
On Tue, Dec 10, 2013 at 04:41:43PM -0500, Dave Jones wrote: > > > > OK, thanks. So it doesn't return to user-space. > > > > could you do > > > >cd /sys/kernel/debug/tracing/ > >echo 10818 >> set_ftrace_pid > >echo function_graph >> current_tracer > >echo 1 >> tracing_on

Re: process 'stuck' at exit.

2013-12-10 Thread Oleg Nesterov
On 12/10, Linus Torvalds wrote: > > So it looks like __get_user_pages_fast() fails, and keeps failing. And "again:" does get_user_pages_fast(). And according to this trace get_user_pages_fast() takes mmap_sem and calls __get_user_pages(). But __get_user_pages() should fail even before

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 01:59:30PM -0800, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 1:56 PM, Dave Jones wrote: > > > > Hmm, the only vmlinux I have still around is newer than the running kernel, > > so that's not going to be much help. > > Ok, /proc/kallsyms would do it, but never

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 01:57:49PM -0800, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 1:41 PM, Dave Jones wrote: > > > > http://codemonkey.org.uk/junk/trace > > Hmm. Ok, so something is calling [__]get_user_pages_fast() and > put_page() in a loop, but the trace doesn't show what that

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 1:56 PM, Dave Jones wrote: > > Hmm, the only vmlinux I have still around is newer than the running kernel, > so that's not going to be much help. Ok, /proc/kallsyms would do it, but never mind. I think you already pinpointed where the loop is with the trace file, so no

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 01:49:19PM -0800, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 1:32 PM, Dave Jones wrote: > > > > http://www.codemonkey.org.uk/junk/perf.data.xz > > "Forbidden > > You don't have permission to access /junk/perf.data.xz on this server." fixed. > also, we'd

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 1:41 PM, Dave Jones wrote: > > http://codemonkey.org.uk/junk/trace Hmm. Ok, so something is calling [__]get_user_pages_fast() and put_page() in a loop, but the trace doesn't show what that "something" is, because it is itself not ever called. However, that pattern does

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 1:32 PM, Dave Jones wrote: > > http://www.codemonkey.org.uk/junk/perf.data.xz "Forbidden You don't have permission to access /junk/perf.data.xz on this server." also, we'd need the vmlinux file to actually decode the data, I think. Linus

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 10:34:31PM +0100, Oleg Nesterov wrote: > On 12/10, Dave Jones wrote: > > > > On Tue, Dec 10, 2013 at 09:35:59PM +0100, Oleg Nesterov wrote: > > > > > > I am looking at the first message and I can't understand who stuck > > > "at exit". > > > > > > The trace

Re: process 'stuck' at exit.

2013-12-10 Thread Steven Rostedt
On Tue, 10 Dec 2013 22:28:01 +0100 (CET) Thomas Gleixner wrote: > On Tue, 10 Dec 2013, Darren Hart wrote: > > On Tue, 2013-12-10 at 15:55 -0500, Dave Jones wrote: > > > On Tue, Dec 10, 2013 at 09:34:24PM +0100, Thomas Gleixner wrote: > > > > > > > > > PS: Oleg - the whole thread is on lkml.

Re: process 'stuck' at exit.

2013-12-10 Thread Oleg Nesterov
On 12/10, Dave Jones wrote: > > On Tue, Dec 10, 2013 at 09:35:59PM +0100, Oleg Nesterov wrote: > > > > I am looking at the first message and I can't understand who stuck > > "at exit". > > > > The trace shows that the task with pid=10818 called sys_futex() ? > > > > Perhaps "exit" means the

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 01:18:20PM -0800, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 1:06 PM, Darren Hart wrote: > > > >. Knowing exactly what syscall was made would > > be very useful, but I don't know if that information is even available > > anymore. > > Well, the loop should

Re: process 'stuck' at exit.

2013-12-10 Thread Thomas Gleixner
On Tue, 10 Dec 2013, Darren Hart wrote: > On Tue, 2013-12-10 at 15:55 -0500, Dave Jones wrote: > > On Tue, Dec 10, 2013 at 09:34:24PM +0100, Thomas Gleixner wrote: > > > > > > > PS: Oleg - the whole thread is on lkml. Ping me if you need more > > context. > > > > > > > > btw, I've left the

Re: process 'stuck' at exit.

2013-12-10 Thread Darren Hart
On Tue, 2013-12-10 at 15:55 -0500, Dave Jones wrote: > On Tue, Dec 10, 2013 at 09:34:24PM +0100, Thomas Gleixner wrote: > > > > > PS: Oleg - the whole thread is on lkml. Ping me if you need more > context. > > > > > > btw, I've left the machine in that state, and will for as long as >

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 1:18 PM, Linus Torvalds wrote: > > to get a good profile for a minute, and then looking at the > instruction-level profiles in futex_requeue() should be possible. Ugh. Looking at kernel/futex.s even *without* debugging enabled is pretty messy. Although much of it seems to

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 1:06 PM, Darren Hart wrote: > >. Knowing exactly what syscall was made would > be very useful, but I don't know if that information is even available > anymore. Well, the loop should be visible in the profile, since it's still active. So doing something like

Re: process 'stuck' at exit.

2013-12-10 Thread Thomas Gleixner
On Tue, 10 Dec 2013, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 12:33 PM, Thomas Gleixner wrote: > > > > The -EAGAIN is when the user value changed, simplified: > > No it's not. > > Thomas, stop this crap already. Look at the f*cking code carefully > instead of just dismissing cases. > >

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 01:06:23PM -0800, Darren Hart wrote: > On Tue, 2013-12-10 at 15:49 -0500, Dave Jones wrote: > > On Tue, Dec 10, 2013 at 09:35:59PM +0100, Oleg Nesterov wrote: > > > Dave, I must have missed something, help. > > > > > > I am looking at the first message and I can't

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 12:57:57PM -0800, Darren Hart wrote: > > > Call Trace: > > > [] ? retint_restore_args+0xe/0xe > > > [] ? trace_hardirqs_on_thunk+0x3a/0x3f > > > [] ? native_sched_clock+0x24/0x80 > > > [] ? local_clock+0xf/0x50 > > > [] ? put_lock_stats.isra.28+0xe/0x30 > > >

Re: process 'stuck' at exit.

2013-12-10 Thread Darren Hart
On Tue, 2013-12-10 at 15:49 -0500, Dave Jones wrote: > On Tue, Dec 10, 2013 at 09:35:59PM +0100, Oleg Nesterov wrote: > > Dave, I must have missed something, help. > > > > I am looking at the first message and I can't understand who stuck > > "at exit". > > > > The trace shows that the

Re: process 'stuck' at exit.

2013-12-10 Thread Darren Hart
On Tue, 2013-12-10 at 10:40 -0800, Linus Torvalds wrote: > Hmm. Looks like the futex code is somehow stuck in a loop, calling > get_user_pages_fast(). > > The futex code itself is apparently so low-overhead that it doesn't > show up in your 'perf top' report (which is dominated by all the >

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 09:34:24PM +0100, Thomas Gleixner wrote: > > > PS: Oleg - the whole thread is on lkml. Ping me if you need more > > context. > > > > btw, I've left the machine in that state, and will for as long as necessary > > in case someone has any ideas for further tracing

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 09:35:59PM +0100, Oleg Nesterov wrote: > Dave, I must have missed something, help. > > I am looking at the first message and I can't understand who stuck > "at exit". > > The trace shows that the task with pid=10818 called sys_futex() ? > > Perhaps "exit" means

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 12:33 PM, Thomas Gleixner wrote: > > The -EAGAIN is when the user value changed, simplified: No it's not. Thomas, stop this crap already. Look at the f*cking code carefully instead of just dismissing cases. The worrisome EAGAIN case is futex_requeue

Re: process 'stuck' at exit.

2013-12-10 Thread Oleg Nesterov
Dave, I must have missed something, help. I am looking at the first message and I can't understand who stuck "at exit". The trace shows that the task with pid=10818 called sys_futex() ? Perhaps "exit" means the userspace paths? Oleg.

Re: process 'stuck' at exit.

2013-12-10 Thread Thomas Gleixner
On Tue, 10 Dec 2013, Dave Jones wrote: > On Tue, Dec 10, 2013 at 11:55:06AM -0800, Linus Torvalds wrote: > > On Tue, Dec 10, 2013 at 11:18 AM, Thomas Gleixner > wrote: > > > > > > So this is pretty unlikely. The retry requires: > > > > > >get_futex_value_locked() == EFAULT; > > > > >

Re: process 'stuck' at exit.

2013-12-10 Thread Thomas Gleixner
On Tue, 10 Dec 2013, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 11:18 AM, Thomas Gleixner wrote: > > > > So this is pretty unlikely. The retry requires: > > > >get_futex_value_locked() == EFAULT; > > > > Now we drop the hash bucket locks and do: > > > >get_user(); > > > > And if

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 11:55:06AM -0800, Linus Torvalds wrote: > On Tue, Dec 10, 2013 at 11:18 AM, Thomas Gleixner wrote: > > > > So this is pretty unlikely. The retry requires: > > > >get_futex_value_locked() == EFAULT; > > > > Now we drop the hash bucket locks and do: > > > >

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 11:18 AM, Thomas Gleixner wrote: > > So this is pretty unlikely. The retry requires: > >get_futex_value_locked() == EFAULT; > > Now we drop the hash bucket locks and do: > >get_user(); > > And if that get_user() faults again, we bail out. I think you need to look

Re: process 'stuck' at exit.

2013-12-10 Thread Thomas Gleixner
On Tue, 10 Dec 2013, Linus Torvalds wrote: > Hmm. Looks like the futex code is somehow stuck in a loop, calling > get_user_pages_fast(). > > The futex code itself is apparently so low-overhead that it doesn't > show up in your 'perf top' report (which is dominated by all the > expensive debug

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
Hmm. Looks like the futex code is somehow stuck in a loop, calling get_user_pages_fast(). The futex code itself is apparently so low-overhead that it doesn't show up in your 'perf top' report (which is dominated by all the expensive debug things that get_user_pages_fast() etc ends up doing), but

process 'stuck' at exit.

2013-12-10 Thread Dave Jones
I woke up to find my fuzzer in a curious state. 1121 pts/5SN+0:00 | \_ ../trinity -q -l off -N 99 -C 42 1130 pts/5SN+0:01 | \_ ../trinity -q -l off -N 99 -C 42 1131 pts/5SN+0:17 | \_ ../trinity -q -l off -N 99 -C 42 10818 ?

Re: process 'stuck' at exit.

2013-12-10 Thread Thomas Gleixner
On Tue, 10 Dec 2013, Linus Torvalds wrote: Hmm. Looks like the futex code is somehow stuck in a loop, calling get_user_pages_fast(). The futex code itself is apparently so low-overhead that it doesn't show up in your 'perf top' report (which is dominated by all the expensive debug things

Re: process 'stuck' at exit.

2013-12-10 Thread Linus Torvalds
On Tue, Dec 10, 2013 at 11:18 AM, Thomas Gleixner t...@linutronix.de wrote: So this is pretty unlikely. The retry requires: get_futex_value_locked() == EFAULT; Now we drop the hash bucket locks and do: get_user(); And if that get_user() faults again, we bail out. I think you need
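The retry sequence Thomas describes in the quoted text (a read under the hash bucket locks returns EFAULT, the locks are dropped, get_user() is called, and a second fault means bailing out) can be sketched as a userspace toy. This is a rough model under stated assumptions: `toy_uaddr`, `toy_read_locked()`, `toy_get_user()` and `toy_read_with_retry()` are hypothetical names, and "faulting in" a page is reduced to flipping a flag. It models only the path Thomas outlines, not the other retry paths discussed in the thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy state of a user address; hypothetical, not a kernel structure. */
struct toy_uaddr {
    bool mapped;   /* address is backed by a valid mapping */
    bool resident; /* page is present, so a no-fault read works */
};

/* Stand-in for the read done with the hash bucket locks held. */
static int toy_read_locked(const struct toy_uaddr *u)
{
    return u->resident ? 0 : -EFAULT;
}

/* Stand-in for get_user() after dropping the locks: may fault the page in. */
static int toy_get_user(struct toy_uaddr *u)
{
    if (!u->mapped)
        return -EFAULT;
    u->resident = true;
    return 0;
}

/* Returns 0 or -EFAULT; *attempts counts the locked reads that were made. */
int toy_read_with_retry(struct toy_uaddr *u, int *attempts)
{
    *attempts = 0;
    for (;;) {
        (*attempts)++;
        if (toy_read_locked(u) == 0)
            return 0;        /* value read under the locks */
        /* Locks dropped here in the described sequence. */
        if (toy_get_user(u) != 0)
            return -EFAULT;  /* faulted again: bail out */
        /* Page is now resident: retry the locked read. */
    }
}
```

In this model the loop ends after at most two locked reads, which matches the argument in the quoted text that this particular retry is an unlikely place to spin.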

Re: process 'stuck' at exit.

2013-12-10 Thread Dave Jones
On Tue, Dec 10, 2013 at 11:55:06AM -0800, Linus Torvalds wrote: On Tue, Dec 10, 2013 at 11:18 AM, Thomas Gleixner t...@linutronix.de wrote: So this is pretty unlikely. The retry requires: get_futex_value_locked() == EFAULT; Now we drop the hash bucket locks and do:

Re: process 'stuck' at exit.

2013-12-10 Thread Thomas Gleixner
On Tue, 10 Dec 2013, Linus Torvalds wrote: On Tue, Dec 10, 2013 at 11:18 AM, Thomas Gleixner t...@linutronix.de wrote: So this is pretty unlikely. The retry requires: get_futex_value_locked() == EFAULT; Now we drop the hash bucket locks and do: get_user(); And if that
