Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2018-01-26 Thread Michal Hocko
On Fri 26-01-18 12:12:00, Tetsuo Handa wrote: > Would you answer to Michal's questions > > Is this a permanent state or does the holder eventually releases the lock? > > Do you remember the last good kernel? > > and my guess > > Since commit 0bcac06f27d75285 was not backported to 4.14-sta

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2018-01-25 Thread Tetsuo Handa
Eric Wheeler wrote: > Hi Tetsuo, > > Thank you for looking into this! > > I tried running this C program in 4.14.15 but did not get a deadlock, just > OOM kills. Is the patch required to induce the deadlock? This reproducer must not trigger actual deadlock. Running this reproducer with this pat

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2018-01-25 Thread Eric Wheeler
On Thu, 25 Jan 2018, Tetsuo Handa wrote: > Using a debug patch and a reproducer shown below, we can trivially form > a circular locking dependency shown below. > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 8219001..240efb1 100644 > --- a/mm/oom_

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2018-01-25 Thread Michal Hocko
On Thu 25-01-18 19:56:59, Tetsuo Handa wrote: [...] > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 1afb2af..9858449 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -410,6 +410,9 @@ static unsigned long do_shrink_slab(struct shrink_control > *shrinkctl, > return freed; > } > > +struc

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2018-01-25 Thread Tetsuo Handa
Using a debug patch and a reproducer shown below, we can trivially form a circular locking dependency shown below. diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 8219001..240efb1 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -950,7 +950,7 @@ static vo

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2018-01-25 Thread Michal Hocko
On Thu 25-01-18 11:04:23, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Wed 15-11-17 09:00:20, Johannes Weiner wrote: > > > In any case, Minchan's lock breaking seems way preferable over that > > > level of headscratching complexity for an unusual case like Shakeel's. > > > > agreed! I would go

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2018-01-24 Thread Tetsuo Handa
Michal Hocko wrote: > On Wed 15-11-17 09:00:20, Johannes Weiner wrote: > > In any case, Minchan's lock breaking seems way preferable over that > > level of headscratching complexity for an unusual case like Shakeel's. > > agreed! I would go the more complex way only if it turns out that early > br

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-23 Thread Minchan Kim
On Thu, Nov 16, 2017 at 12:44:22PM -0500, Johannes Weiner wrote: > On Wed, Nov 15, 2017 at 09:56:02AM +0900, Minchan Kim wrote: > > @@ -498,6 +498,14 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int > > nid, > > sc.nid = 0; > > > > freed += do_shrink_slab(&

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-20 Thread Paul E. McKenney
On Mon, Nov 20, 2017 at 07:56:28PM +0900, Tetsuo Handa wrote: > Christoph Hellwig wrote: > > On Mon, Nov 20, 2017 at 10:42:37AM +0100, Michal Hocko wrote: > > > The patch has been dropped because allnoconfig failed to compile back > > > then > > > http://lkml.kernel.org/r/CAP=VYLr0rPWi1aeuk4w1On9C

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-20 Thread Tetsuo Handa
Christoph Hellwig wrote: > On Mon, Nov 20, 2017 at 10:42:37AM +0100, Michal Hocko wrote: > > The patch has been dropped because allnoconfig failed to compile back > > then > > http://lkml.kernel.org/r/CAP=VYLr0rPWi1aeuk4w1On9CYRNmnEWwJgGtaX=wevgabur...@mail.gmail.com > > I have problem to find the

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-20 Thread Christoph Hellwig
On Mon, Nov 20, 2017 at 10:42:37AM +0100, Michal Hocko wrote: > The patch has been dropped because allnoconfig failed to compile back > then > http://lkml.kernel.org/r/CAP=VYLr0rPWi1aeuk4w1On9CYRNmnEWwJgGtaX=wevgabur...@mail.gmail.com > I have problem to find the follow up discussion though. The m

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-20 Thread Michal Hocko
On Mon 20-11-17 01:33:09, Christoph Hellwig wrote: > On Mon, Nov 20, 2017 at 10:25:26AM +0100, Michal Hocko wrote: > > On Fri 17-11-17 09:35:21, Christoph Hellwig wrote: > > > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: > > > > Since do_shrink_slab() can reschedule, we cannot prot

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-20 Thread Christoph Hellwig
On Mon, Nov 20, 2017 at 10:25:26AM +0100, Michal Hocko wrote: > On Fri 17-11-17 09:35:21, Christoph Hellwig wrote: > > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: > > > Since do_shrink_slab() can reschedule, we cannot protect shrinker_list > > > using one RCU section. But using at

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-20 Thread Michal Hocko
On Fri 17-11-17 09:35:21, Christoph Hellwig wrote: > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: > > Since do_shrink_slab() can reschedule, we cannot protect shrinker_list > > using one RCU section. But using atomic_inc()/atomic_dec() for each > > do_shrink_slab() call will not im

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-17 Thread Christoph Hellwig
On Fri, Nov 17, 2017 at 09:41:46AM -0800, Shakeel Butt wrote: > On Fri, Nov 17, 2017 at 9:35 AM, Christoph Hellwig wrote: > > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: > >> Since do_shrink_slab() can reschedule, we cannot protect shrinker_list > >> using one RCU section. But us

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-17 Thread Shakeel Butt
On Fri, Nov 17, 2017 at 9:41 AM, Shakeel Butt wrote: > On Fri, Nov 17, 2017 at 9:35 AM, Christoph Hellwig wrote: >> On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: >>> Since do_shrink_slab() can reschedule, we cannot protect shrinker_list >>> using one RCU section. But using atomic_

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-17 Thread Shakeel Butt
On Fri, Nov 17, 2017 at 9:35 AM, Christoph Hellwig wrote: > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: >> Since do_shrink_slab() can reschedule, we cannot protect shrinker_list >> using one RCU section. But using atomic_inc()/atomic_dec() for each >> do_shrink_slab() call will n

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-17 Thread Christoph Hellwig
On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: > Since do_shrink_slab() can reschedule, we cannot protect shrinker_list > using one RCU section. But using atomic_inc()/atomic_dec() for each > do_shrink_slab() call will not impact so much. But you could use SRCU..

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-16 Thread Johannes Weiner
On Wed, Nov 15, 2017 at 09:56:02AM +0900, Minchan Kim wrote: > @@ -498,6 +498,14 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, > sc.nid = 0; > > freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible); > + /* > +

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-16 Thread Tetsuo Handa
Johannes Weiner wrote: > On Wed, Nov 15, 2017 at 07:58:09PM +0900, Tetsuo Handa wrote: > > I think that Minchan's approach depends on how > > > > In our production, we have observed that the job loader gets stuck for > > 10s of seconds while doing mount operation. It turns out that it was > >

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Minchan Kim
On Wed, Nov 15, 2017 at 05:41:41PM -0800, Shakeel Butt wrote: > On Wed, Nov 15, 2017 at 4:46 PM, Minchan Kim wrote: > > On Tue, Nov 14, 2017 at 10:28:10PM -0800, Shakeel Butt wrote: > >> On Tue, Nov 14, 2017 at 4:56 PM, Minchan Kim wrote: > >> > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Ha

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Shakeel Butt
On Wed, Nov 15, 2017 at 4:46 PM, Minchan Kim wrote: > On Tue, Nov 14, 2017 at 10:28:10PM -0800, Shakeel Butt wrote: >> On Tue, Nov 14, 2017 at 4:56 PM, Minchan Kim wrote: >> > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: >> >> When shrinker_rwsem was introduced, it was assumed th

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Minchan Kim
On Wed, Nov 15, 2017 at 12:51:43PM +0100, Michal Hocko wrote: < snip > > > Since it is possible for a local unpriviledged user to lockup the system at > > least > > due to mute_trylock(&oom_lock) versus (printk() or > > schedule_timeout_killable(1)), > > I suggest completely eliminating schedulin

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Minchan Kim
On Tue, Nov 14, 2017 at 10:28:10PM -0800, Shakeel Butt wrote: > On Tue, Nov 14, 2017 at 4:56 PM, Minchan Kim wrote: > > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: > >> When shrinker_rwsem was introduced, it was assumed that > >> register_shrinker()/unregister_shrinker() are real

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Michal Hocko
On Wed 15-11-17 09:00:20, Johannes Weiner wrote: > On Wed, Nov 15, 2017 at 10:02:51AM +0100, Michal Hocko wrote: > > On Tue 14-11-17 06:37:42, Tetsuo Handa wrote: > > > This patch uses polling loop with short sleep for unregister_shrinker() > > > rather than wait_on_atomic_t(), for we can save read

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Johannes Weiner
On Wed, Nov 15, 2017 at 10:02:51AM +0100, Michal Hocko wrote: > On Tue 14-11-17 06:37:42, Tetsuo Handa wrote: > > This patch uses polling loop with short sleep for unregister_shrinker() > > rather than wait_on_atomic_t(), for we can save reader's cost (plain > > atomic_dec() compared to atomic_dec_

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Johannes Weiner
On Wed, Nov 15, 2017 at 07:58:09PM +0900, Tetsuo Handa wrote: > I think that Minchan's approach depends on how > > In our production, we have observed that the job loader gets stuck for > 10s of seconds while doing mount operation. It turns out that it was > stuck in register_shrinker() and

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Michal Hocko
On Wed 15-11-17 19:58:09, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Tue 14-11-17 06:37:42, Tetsuo Handa wrote: > > > When shrinker_rwsem was introduced, it was assumed that > > > register_shrinker()/unregister_shrinker() are really unlikely paths > > > which are called during initialization

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Tetsuo Handa
Michal Hocko wrote: > On Tue 14-11-17 06:37:42, Tetsuo Handa wrote: > > When shrinker_rwsem was introduced, it was assumed that > > register_shrinker()/unregister_shrinker() are really unlikely paths > > which are called during initialization and tear down. But nowadays, > > register_shrinker()/unr

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Michal Hocko
On Wed 15-11-17 09:56:25, Michal Hocko wrote: > On Wed 15-11-17 09:56:02, Minchan Kim wrote: > > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: > > > When shrinker_rwsem was introduced, it was assumed that > > > register_shrinker()/unregister_shrinker() are really unlikely paths > >

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Michal Hocko
On Tue 14-11-17 06:37:42, Tetsuo Handa wrote: > When shrinker_rwsem was introduced, it was assumed that > register_shrinker()/unregister_shrinker() are really unlikely paths > which are called during initialization and tear down. But nowadays, > register_shrinker()/unregister_shrinker() might be ca

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-15 Thread Michal Hocko
On Wed 15-11-17 09:56:02, Minchan Kim wrote: > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: > > When shrinker_rwsem was introduced, it was assumed that > > register_shrinker()/unregister_shrinker() are really unlikely paths > > which are called during initialization and tear down.

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-14 Thread Shakeel Butt
On Tue, Nov 14, 2017 at 4:56 PM, Minchan Kim wrote: > On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: >> When shrinker_rwsem was introduced, it was assumed that >> register_shrinker()/unregister_shrinker() are really unlikely paths >> which are called during initialization and tear d

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-14 Thread Minchan Kim
On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote: > When shrinker_rwsem was introduced, it was assumed that > register_shrinker()/unregister_shrinker() are really unlikely paths > which are called during initialization and tear down. But nowadays, > register_shrinker()/unregister_shrink

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-13 Thread Shakeel Butt
On Mon, Nov 13, 2017 at 1:37 PM, Tetsuo Handa wrote: > When shrinker_rwsem was introduced, it was assumed that > register_shrinker()/unregister_shrinker() are really unlikely paths > which are called during initialization and tear down. But nowadays, > register_shrinker()/unregister_shrinker() mig

[PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-13 Thread Tetsuo Handa
When shrinker_rwsem was introduced, it was assumed that register_shrinker()/unregister_shrinker() are really unlikely paths which are called during initialization and tear down. But nowadays, register_shrinker()/unregister_shrinker() might be called regularly. This patch prepares for allowing paral