Re: splice vs execve lockdep trace.

2013-07-18 Thread Dave Chinner
On Thu, Jul 18, 2013 at 05:21:17PM -0500, Ben Myers wrote: > Dave, > > On Thu, Jul 18, 2013 at 04:16:32PM -0500, Ben Myers wrote: > > On Thu, Jul 18, 2013 at 01:42:03PM +1000, Dave Chinner wrote: > > > On Wed, Jul 17, 2013 at 05:17:36PM -0700, Linus Torvalds wrote: > > > > On Wed, Jul 17, 2013 at

Re: splice vs execve lockdep trace.

2013-07-18 Thread Ben Myers
Dave, On Thu, Jul 18, 2013 at 04:16:32PM -0500, Ben Myers wrote: > On Thu, Jul 18, 2013 at 01:42:03PM +1000, Dave Chinner wrote: > > On Wed, Jul 17, 2013 at 05:17:36PM -0700, Linus Torvalds wrote: > > > On Wed, Jul 17, 2013 at 4:40 PM, Ben Myers wrote: > > > >> > > > >> We're still talking at cro

Re: splice vs execve lockdep trace.

2013-07-18 Thread Ben Myers
Dave, On Thu, Jul 18, 2013 at 01:42:03PM +1000, Dave Chinner wrote: > On Wed, Jul 17, 2013 at 05:17:36PM -0700, Linus Torvalds wrote: > > On Wed, Jul 17, 2013 at 4:40 PM, Ben Myers wrote: > > >> > > >> We're still talking at cross purposes then. > > >> > > >> How the hell do you handle mmap() and

Re: splice vs execve lockdep trace.

2013-07-17 Thread Dave Chinner
On Wed, Jul 17, 2013 at 05:17:36PM -0700, Linus Torvalds wrote: > On Wed, Jul 17, 2013 at 4:40 PM, Ben Myers wrote: > >> > >> We're still talking at cross purposes then. > >> > >> How the hell do you handle mmap() and page faulting? > > > > __xfs_get_blocks serializes access to the block map with

Re: splice vs execve lockdep trace.

2013-07-17 Thread Dave Chinner
On Wed, Jul 17, 2013 at 09:03:11AM -0700, Linus Torvalds wrote: > On Tue, Jul 16, 2013 at 10:51 PM, Dave Chinner wrote: > > > > But When i say "stale data" I mean that the data being returned > > might not have originally belonged to the underlying file you are > > reading. > > We're still talkin

Re: splice vs execve lockdep trace.

2013-07-17 Thread Linus Torvalds
On Wed, Jul 17, 2013 at 4:40 PM, Ben Myers wrote: >> >> We're still talking at cross purposes then. >> >> How the hell do you handle mmap() and page faulting? > > __xfs_get_blocks serializes access to the block map with the i_lock on the > xfs_inode. This appears to be racy with respect to hole p

Re: splice vs execve lockdep trace.

2013-07-17 Thread Ben Myers
Linus, On Wed, Jul 17, 2013 at 09:03:11AM -0700, Linus Torvalds wrote: > On Tue, Jul 16, 2013 at 10:51 PM, Dave Chinner wrote: > > > > But When i say "stale data" I mean that the data being returned > > might not have originally belonged to the underlying file you are > > reading. > > We're stil

Re: splice vs execve lockdep trace.

2013-07-17 Thread Linus Torvalds
On Tue, Jul 16, 2013 at 10:51 PM, Dave Chinner wrote: > > But When i say "stale data" I mean that the data being returned > might not have originally belonged to the underlying file you are > reading. We're still talking at cross purposes then. How the hell do you handle mmap() and page faulting

Re: splice vs execve lockdep trace.

2013-07-16 Thread Dave Chinner
On Tue, Jul 16, 2013 at 09:54:09PM -0700, Linus Torvalds wrote: > On Tue, Jul 16, 2013 at 9:06 PM, Dave Chinner wrote: > > > > Right, and that's one of the biggest problems page based IO has - we > > can't serialise it against other IO and other page cache > > manipulation functions like hole punc

Re: splice vs execve lockdep trace.

2013-07-16 Thread Linus Torvalds
On Tue, Jul 16, 2013 at 9:06 PM, Dave Chinner wrote: > > Right, and that's one of the biggest problems page based IO has - we > can't serialise it against other IO and other page cache > manipulation functions like hole punching. What happens when a > splice read or mmap page fault races with a ho

Re: splice vs execve lockdep trace.

2013-07-16 Thread Dave Chinner
On Tue, Jul 16, 2013 at 02:02:12PM -0700, Linus Torvalds wrote: > On Tue, Jul 16, 2013 at 1:43 PM, Dave Chinner wrote: > > > > Yes - IO is serialised based on the ip->i_iolock, not i_mutex. We > > don't use i_mutex for many things IO related, and so internal > > locking is needed to serialise agai

Re: splice vs execve lockdep trace.

2013-07-16 Thread Linus Torvalds
On Tue, Jul 16, 2013 at 1:43 PM, Dave Chinner wrote: > > Yes - IO is serialised based on the ip->i_iolock, not i_mutex. We > don't use i_mutex for many things IO related, and so internal > locking is needed to serialise against stuff like truncate, hole > punching, etc, that are run through non-vf

Re: splice vs execve lockdep trace.

2013-07-16 Thread Dave Chinner
On Tue, Jul 16, 2013 at 01:18:06PM -0700, Linus Torvalds wrote: > On Tue, Jul 16, 2013 at 12:33 PM, Ben Myers wrote: > >> > > >> > And looking more at that, I'm actually starting to think this is an > >> > XFS locking problem. XFS really should not call back to splice while > >> > holding the inod

Re: splice vs execve lockdep trace.

2013-07-16 Thread Linus Torvalds
On Tue, Jul 16, 2013 at 12:33 PM, Ben Myers wrote: >> > >> > And looking more at that, I'm actually starting to think this is an >> > XFS locking problem. XFS really should not call back to splice while >> > holding the inode lock. .. that was misleading, normally "inode lock" would be i_lock, bu

Re: splice vs execve lockdep trace.

2013-07-16 Thread Ben Myers
Hi Dave, Linus, On Tue, Jul 16, 2013 at 04:03:51PM +1000, Dave Chinner wrote: > On Mon, Jul 15, 2013 at 08:25:14PM -0700, Linus Torvalds wrote: > > On Mon, Jul 15, 2013 at 7:38 PM, Dave Jones wrote: > > > > > > The recent trinity changes shouldn't have really made > > > any notable difference h

Re: splice vs execve lockdep trace.

2013-07-16 Thread Dave Jones
On Tue, Jul 16, 2013 at 09:59:35AM -0400, Vince Weaver wrote: > On Mon, 15 Jul 2013, Linus Torvalds wrote: > > > On Mon, Jul 15, 2013 at 7:38 PM, Dave Jones wrote: > > > > > Interestingly, the 'soft lockups' I was > > > seeing all the time on that box seem to have gone into hiding. > >

Re: splice vs execve lockdep trace.

2013-07-16 Thread Vince Weaver
On Mon, 15 Jul 2013, Linus Torvalds wrote: > On Mon, Jul 15, 2013 at 7:38 PM, Dave Jones wrote: > > > Interestingly, the 'soft lockups' I was > > seeing all the time on that box seem to have gone into hiding. > > Honestly, I'm somewhat inclined to blame the whole perf situation, and > saying th

Re: splice vs execve lockdep trace.

2013-07-15 Thread Dave Chinner
On Tue, Jul 16, 2013 at 07:16:02AM +0100, Al Viro wrote: > On Tue, Jul 16, 2013 at 04:03:51PM +1000, Dave Chinner wrote: > > > I posted patches to fix this i_mutex/i_iolock inversion a couple of > > years ago (july 2011): > > > > https://lkml.org/lkml/2011/7/18/4 > > > > And V2 was posted here a

Re: splice vs execve lockdep trace.

2013-07-15 Thread Dave Chinner
On Tue, Jul 16, 2013 at 07:16:02AM +0100, Al Viro wrote: > On Tue, Jul 16, 2013 at 04:03:51PM +1000, Dave Chinner wrote: > > > I posted patches to fix this i_mutex/i_iolock inversion a couple of > > years ago (july 2011): > > > > https://lkml.org/lkml/2011/7/18/4 > > > > And V2 was posted here a

Re: splice vs execve lockdep trace.

2013-07-15 Thread Al Viro
On Tue, Jul 16, 2013 at 04:03:51PM +1000, Dave Chinner wrote: > I posted patches to fix this i_mutex/i_iolock inversion a couple of > years ago (july 2011): > > https://lkml.org/lkml/2011/7/18/4 > > And V2 was posted here and reviewed (aug 2011): > > http://xfs.9218.n7.nabble.com/PATCH-0-2-spli

Re: splice vs execve lockdep trace.

2013-07-15 Thread Dave Chinner
On Mon, Jul 15, 2013 at 08:25:14PM -0700, Linus Torvalds wrote: > On Mon, Jul 15, 2013 at 7:38 PM, Dave Jones wrote: > > > > The recent trinity changes shouldn't have really made > > any notable difference here. > > Hmm. I'm not aware pf anything that has changed in this area since > 3.10 - nei

Re: splice vs execve lockdep trace.

2013-07-15 Thread Al Viro
On Mon, Jul 15, 2013 at 08:25:14PM -0700, Linus Torvalds wrote: > The "pipe -> cred_guard_mutex" lock chain is pretty direct, and can be > clearly attributed to splicing into /proc. Now, whether that is a > *good* idea or not is clearly debatable, and I do think that maybe we > should just not spl

Re: splice vs execve lockdep trace.

2013-07-15 Thread Dave Jones
On Mon, Jul 15, 2013 at 08:25:14PM -0700, Linus Torvalds wrote: > And looking more at that, I'm actually starting to think this is an > XFS locking problem. XFS really should not call back to splice while > holding the inode lock. > > But that XFS code doesn't seem new either. Is XFS a ne

Re: splice vs execve lockdep trace.

2013-07-15 Thread Linus Torvalds
On Mon, Jul 15, 2013 at 7:38 PM, Dave Jones wrote: > > The recent trinity changes shouldn't have really made > any notable difference here. Hmm. I'm not aware pf anything that has changed in this area since 3.10 - neither in execve, xfs or in splice. Not even since 3.9. But I may certainly hav

Re: splice vs execve lockdep trace.

2013-07-15 Thread Dave Jones
On Mon, Jul 15, 2013 at 07:32:51PM -0700, Linus Torvalds wrote: > So the problematic op *seems* to be the splice into /proc//attr/ > files, which causes that "pipe -> cred_guard_mutex" locking. > > While the *normal* ordering would be the other way around, coming from > execve(), which has t

Re: splice vs execve lockdep trace.

2013-07-15 Thread Linus Torvalds
Hmm. I don't have a lot of ideas, I'm just adding lockdep, splice and FS people (and Oleg, just because) to the cc to see if anybody else does. Al, Peter? So the problematic op *seems* to be the splice into /proc//attr/ files, which causes that "pipe -> cred_guard_mutex" locking. While the *norm