Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-08-01 Thread Al Viro
On Sat, Aug 01, 2015 at 09:09:24AM -0700, Linus Torvalds wrote: > On Sat, Aug 1, 2015 at 12:26 AM, Al Viro wrote: > > > > Actually, the shit had hit the fan earlier. Look: in > > commit b18825a7c8e37a7cf6abb97a12a6ad71af160de7 > > Author: David Howells > > Date: Thu Sep 12 19:22:53 2013 +0100

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-08-01 Thread Linus Torvalds
On Sat, Aug 1, 2015 at 12:26 AM, Al Viro wrote: > > Actually, the shit had hit the fan earlier. Look: in > commit b18825a7c8e37a7cf6abb97a12a6ad71af160de7 > Author: David Howells > Date: Thu Sep 12 19:22:53 2013 +0100 > > VFS: Put a small type field into struct dentry::d_flags > > we have

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-08-01 Thread Dominique Martinet
Dominique Martinet wrote on Sat, Aug 01, 2015: > I had to adapt a bit because using an old kernel (4bf46a272), will try > again with a recent master to doublecheck There have been more changes than what I thought, can't seem to reproduce in a while on linus' HEAD with that fix (it fell in that che

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-08-01 Thread Dominique Martinet
Al Viro wrote on Sat, Aug 01, 2015: > And that has turned the check done to an inode that *was* ours at some > point (i.e. fetching it had been followed by checking that ->d_seq had > been still valid) into something completely unprotected. Suppose we > are in lazy mode and somebody had evicted nd

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-08-01 Thread Al Viro
On Fri, Jul 31, 2015 at 03:52:38PM -0700, Linus Torvalds wrote: > Is that correct? Maybe, I haven't checked. And maybe it's a big bad > bug. Regardless, it sure as hell isn't just changing the order of the > access to those fields. That "DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU" > clearing came from __

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-07-31 Thread Dominique Martinet
Hugh Dickins wrote on Fri, Jul 31, 2015: > On Fri, 31 Jul 2015, Linus Torvalds wrote: >> I'd be more suspicious about other effects. For example, it's not at >> all obvious that the commit in question just changes the order of the >> flags/inode field accesses, there are potentially bigger changes

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-07-31 Thread Hugh Dickins
On Fri, 31 Jul 2015, Hugh Dickins wrote: > On Fri, 31 Jul 2015, Linus Torvalds wrote: > > > > So leave it running a while longer, but maybe it's 4bf46a272647 like > > Dominique suspects. Although I don't see how that could trigger > > anything either.. > > I restarted with a slightly different ve

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-07-31 Thread Hugh Dickins
On Fri, 31 Jul 2015, Linus Torvalds wrote: > > I'd be more suspicious about other effects. For example, it's not at > all obvious that the commit in question just changes the order of the > flags/inode field accesses, there are potentially bigger changes there. > For example, this part (in __d_obt

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-07-31 Thread Hugh Dickins
On Fri, 31 Jul 2015, Dominique Martinet wrote: > Hugh Dickins wrote on Fri, Jul 31, 2015: > > It will indeed be weird and odd if it confirms that DCACHE_DISCONNECTED > > revert is good. I agree that Dominique's 4bf46a272647 seems now more > > likely, if still unlikely; but that was included in v4.

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-07-31 Thread Linus Torvalds
On Fri, Jul 31, 2015 at 1:50 PM, Dominique Martinet wrote: > > It's probably an old race that was very hard to hit because of cache > coherency. > Basically, before the wmb/rmb, the dentry was always updated closely to > its flags, so the other CPU would "usually" get both updates at the same > ti

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-07-31 Thread Dominique Martinet
Hugh Dickins wrote on Fri, Jul 31, 2015: > It will indeed be weird and odd if it confirms that DCACHE_DISCONNECTED > revert is good. I agree that Dominique's 4bf46a272647 seems now more > likely, if still unlikely; but that was included in v4.1, and I saw > no problem with v4.1 once the rmap_walk(

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-07-31 Thread Hugh Dickins
On Fri, 31 Jul 2015, Linus Torvalds wrote: > On Fri, Jul 31, 2015 at 10:46 AM, Hugh Dickins wrote: > > > > Sounds like a dcache problem, and 75a6f82a0d10 seemed the only > > likely candidate, so I experimented with reverting it yesterday, > > and ran successfully for 24 hours. > > Hmm. Sounds odd

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-07-31 Thread Linus Torvalds
On Fri, Jul 31, 2015 at 10:46 AM, Hugh Dickins wrote: > > Sounds like a dcache problem, and 75a6f82a0d10 seemed the only > likely candidate, so I experimented with reverting it yesterday, > and ran successfully for 24 hours. Hmm. Sounds odd. Are you running nfsd? That would explain why it happens

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-07-31 Thread J. Bruce Fields
On Fri, Jul 31, 2015 at 10:46:51AM -0700, Hugh Dickins wrote: > I think there's something not quite right with the fs/dcache.c > commit 75a6f82a0d10 ("freeing unlinked file indefinitely delayed"). > > When running my old tmpfs swapping load (two repetitive make -j20 > kernel builds, one on tmpfs,

Re: v4.2-rc dcache regression, probably 75a6f82a0d10

2015-07-31 Thread Dominique Martinet
Hi, Hugh Dickins wrote on Fri, Jul 31, 2015: > I think there's something not quite right with the fs/dcache.c > commit 75a6f82a0d10 ("freeing unlinked file indefinitely delayed"). > > When running my old tmpfs swapping load (two repetitive make -j20 > kernel builds, one on tmpfs, one on ext4 over