Re: dcache shrink list corruption?

2014-04-29 Thread Dave Chinner
On Tue, Apr 29, 2014 at 08:10:15PM +0100, Al Viro wrote:
> On Tue, Apr 29, 2014 at 07:16:10PM +0100, Al Viro wrote:
> > On Tue, Apr 29, 2014 at 08:03:24PM +0200, Miklos Szeredi wrote:
> >
> > > Introducing a new per-sb lock should be OK.
> > >
> > > Another idea, which could have subtler

Re: dcache shrink list corruption?

2014-04-29 Thread Al Viro
On Tue, Apr 29, 2014 at 07:16:10PM +0100, Al Viro wrote:
> On Tue, Apr 29, 2014 at 08:03:24PM +0200, Miklos Szeredi wrote:
>
> > Introducing a new per-sb lock should be OK.
> >
> > Another idea, which could have subtler effects, is simply not to kill
> > a dentry that is on the shrink list

Re: dcache shrink list corruption?

2014-04-29 Thread Miklos Szeredi
On Tue, Apr 29, 2014 at 7:43 PM, Linus Torvalds wrote:
> On Tue, Apr 29, 2014 at 9:01 AM, Miklos Szeredi wrote:
>> This was reported by IBM for 3.12, but if my analysis is right, it affects
>> current kernel as well as older ones.
>>
>> So the question is: does anything protect the shrink list

Re: dcache shrink list corruption?

2014-04-29 Thread Linus Torvalds
On Tue, Apr 29, 2014 at 11:03 AM, Miklos Szeredi wrote:
>
> Because we no longer have that. It now uses the list_lru thing, with
> a "per-node" lock, whatever that one is.

Oh, yes. Right you are. I just started looking at that and went "ugh". The lru lists are all distributed now with multiple

Re: dcache shrink list corruption?

2014-04-29 Thread Al Viro
On Tue, Apr 29, 2014 at 08:03:24PM +0200, Miklos Szeredi wrote:
> Introducing a new per-sb lock should be OK.
>
> Another idea, which could have subtler effects, is simply not to kill
> a dentry that is on the shrink list (indicated by
> DCACHE_SHRINK_LIST), since it's bound to get killed

Re: dcache shrink list corruption?

2014-04-29 Thread Al Viro
On Tue, Apr 29, 2014 at 06:01:39PM +0200, Miklos Szeredi wrote:
> Attached patch is just a starting point (untested). Not sure how to minimize
> contention without adding too much complexity.

Contention isn't the worst problem here - I'd expect the cacheline ping-pong to hurt more... I agree

Re: dcache shrink list corruption?

2014-04-29 Thread Linus Torvalds
On Tue, Apr 29, 2014 at 9:01 AM, Miklos Szeredi wrote:
> This was reported by IBM for 3.12, but if my analysis is right, it affects
> current kernel as well as older ones.
>
> So the question is: does anything protect the shrink list from concurrent
> modification by one or more dput() instances?

dcache shrink list corruption?

2014-04-29 Thread Miklos Szeredi
This was reported by IBM for 3.12, but if my analysis is right, it affects current kernel as well as older ones. So the question is: does anything protect the shrink list from concurrent modification by one or more dput() instances? E.g. two dentries are on the shrink list, for both dget(),

Re: dcache shrink list corruption?

2014-04-29 Thread Linus Torvalds
On Tue, Apr 29, 2014 at 9:01 AM, Miklos Szeredi mik...@szeredi.hu wrote: This was reported by IBM for 3.12, but if my analysis is right, it affects current kernel as well as older ones. So the question is: does anything protect the shrink list from concurrent modification by one or more

Re: dcache shrink list corruption?

2014-04-29 Thread Al Viro
On Tue, Apr 29, 2014 at 06:01:39PM +0200, Miklos Szeredi wrote: Attached patch is just a starting point (untested). Not sure how to minimize contention without adding too much complexity. Contention isn't the worst problem here - I'd expect the cacheline ping-pong to hurt more... I agree

Re: dcache shrink list corruption?

2014-04-29 Thread Al Viro
On Tue, Apr 29, 2014 at 08:03:24PM +0200, Miklos Szeredi wrote: Introducing a new per-sb lock should be OK. Another idea, which could have subtler effects, is simply not to kill a dentry that is on the shrink list (indicated by DCACHE_SHRINK_LIST), since it's bound to get killed anyway.

Re: dcache shrink list corruption?

2014-04-29 Thread Linus Torvalds
On Tue, Apr 29, 2014 at 11:03 AM, Miklos Szeredi mik...@szeredi.hu wrote: Because we no longer have that. It now uses the list_lru thing, with a per-node lock, whatever that one is. Oh, yes. Right you are. I just started looking at that and went ugh. The lru lists are all distributed now

Re: dcache shrink list corruption?

2014-04-29 Thread Miklos Szeredi
On Tue, Apr 29, 2014 at 7:43 PM, Linus Torvalds torva...@linux-foundation.org wrote: On Tue, Apr 29, 2014 at 9:01 AM, Miklos Szeredi mik...@szeredi.hu wrote: This was reported by IBM for 3.12, but if my analysis is right, it affects current kernel as well as older ones. So the question is:

Re: dcache shrink list corruption?

2014-04-29 Thread Al Viro
On Tue, Apr 29, 2014 at 07:16:10PM +0100, Al Viro wrote: On Tue, Apr 29, 2014 at 08:03:24PM +0200, Miklos Szeredi wrote: Introducing a new per-sb lock should be OK. Another idea, which could have subtler effects, is simply not to kill a dentry that is on the shrink list (indicated by

Re: dcache shrink list corruption?

2014-04-29 Thread Dave Chinner
On Tue, Apr 29, 2014 at 08:10:15PM +0100, Al Viro wrote: On Tue, Apr 29, 2014 at 07:16:10PM +0100, Al Viro wrote: On Tue, Apr 29, 2014 at 08:03:24PM +0200, Miklos Szeredi wrote: Introducing a new per-sb lock should be OK. Another idea, which could have subtler effects, is simply

Re: dcache shrink list corruption?

2014-04-29 Thread Al Viro
On Wed, Apr 30, 2014 at 07:18:51AM +1000, Dave Chinner wrote: Seems like it would work, but it seems fragile to me - I'm wondering how we can ensure that the private shrink list manipulations can be kept private. We have a similar situation with the inode cache (private shrink list) but

Re: dcache shrink list corruption?

2014-04-29 Thread Linus Torvalds
On Tue, Apr 29, 2014 at 2:48 PM, Al Viro v...@zeniv.linux.org.uk wrote: Ummm... You mean, have d_lookup() et.al. fail on something that is on a shrink list? So I tried to see if that would work just consider it dead by the time it hits the shrink list, and if somebody does a lookup on the

Re: dcache shrink list corruption?

2014-04-29 Thread Al Viro
On Tue, Apr 29, 2014 at 04:04:11PM -0700, Linus Torvalds wrote: But at a minimum, we have d_op->d_prune() that would now be possibly be called for the old dentry *after* a new dentry has been allocated. Not to mention the inode not having been dropped. So it looks like a disaster where the

Re: dcache shrink list corruption?

2014-04-29 Thread Al Viro
On Wed, Apr 30, 2014 at 12:20:13AM +0100, Al Viro wrote: On Tue, Apr 29, 2014 at 04:04:11PM -0700, Linus Torvalds wrote: But at a minimum, we have d_op->d_prune() that would now be possibly be called for the old dentry *after* a new dentry has been allocated. Not to mention the inode not

Re: dcache shrink list corruption?

2014-04-29 Thread Linus Torvalds
On Tue, Apr 29, 2014 at 7:31 PM, Al Viro v...@zeniv.linux.org.uk wrote: OK, aggregate diff follows, more readable splitup (3 commits) attached. It seems to survive beating here; testing, review and comments are welcome. Miklos, did you have some particular load that triggered this, or was it

Re: dcache shrink list corruption?

2014-04-29 Thread Al Viro
On Tue, Apr 29, 2014 at 07:56:13PM -0700, Linus Torvalds wrote: On Tue, Apr 29, 2014 at 7:31 PM, Al Viro v...@zeniv.linux.org.uk wrote: OK, aggregate diff follows, more readable splitup (3 commits) attached. It seems to survive beating here; testing, review and comments are welcome.
