Re: [PATCH] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue

2018-05-24 Thread Tejun Heo
Hello, Jan. On Thu, May 24, 2018 at 12:19:00PM +0200, Jan Kara wrote: > > We're periodically seeing close to 256 kworkers getting stuck with the > > following stack trace and over time the entire system gets stuck. > > OK, but that means that you have to have 256 block devices, don't you? As

Re: [PATCH] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue

2018-05-24 Thread Jan Kara
On Wed 23-05-18 10:56:32, Tejun Heo wrote: > From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001 > From: Tejun Heo > Date: Wed, 23 May 2018 10:29:00 -0700 > > cgwb_release() punts the actual release to cgwb_release_workfn() on > system_wq. Depending on the number of cgroups

Re: [PATCH] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue

2018-05-23 Thread Tejun Heo
Hello, Rik. On Wed, May 23, 2018 at 06:03:15PM -0400, Rik van Riel wrote: > Dumb question. Does setting max_active to 1 mean > that every cgwb_release_workfn() ends up forcing > another RCU grace period on the whole system, while > today you might have a bunch of them waiting on the > same RCU

Re: [PATCH] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue

2018-05-23 Thread Rik van Riel
On Wed, 2018-05-23 at 10:56 -0700, Tejun Heo wrote: > The events leading to the lockup are... > > 1. A lot of cgwb_release_workfn() is queued at the same time and all > system_wq kworkers are assigned to execute them. > > 2. They all end up calling synchronize_rcu_expedited(). One of them >

Re: [PATCH] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue

2018-05-23 Thread Jens Axboe
On 5/23/18 11:56 AM, Tejun Heo wrote: > From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001 > From: Tejun Heo > Date: Wed, 23 May 2018 10:29:00 -0700 > > cgwb_release() punts the actual release to cgwb_release_workfn() on > system_wq. Depending on the number of cgroups or

Re: [PATCH] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue

2018-05-23 Thread Paul E. McKenney
On Wed, May 23, 2018 at 11:51:43AM -0700, Tejun Heo wrote: > On Wed, May 23, 2018 at 11:39:07AM -0700, Paul E. McKenney wrote: > > > While this resolves the problem at hand, it might be a good idea to > > > isolate rcu_exp_work to its own workqueue too as it can be used from > > > various paths

Re: [PATCH] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue

2018-05-23 Thread Tejun Heo
On Wed, May 23, 2018 at 11:39:07AM -0700, Paul E. McKenney wrote: > > While this resolves the problem at hand, it might be a good idea to > > isolate rcu_exp_work to its own workqueue too as it can be used from > > various paths and is prone to this sort of indirect A-A deadlocks. > > Commit

Re: [PATCH] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue

2018-05-23 Thread Paul E. McKenney
On Wed, May 23, 2018 at 10:56:32AM -0700, Tejun Heo wrote: > From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001 > From: Tejun Heo > Date: Wed, 23 May 2018 10:29:00 -0700 > > cgwb_release() punts the actual release to cgwb_release_workfn() on > system_wq. Depending on the

[PATCH] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue

2018-05-23 Thread Tejun Heo
From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 23 May 2018 10:29:00 -0700 cgwb_release() punts the actual release to cgwb_release_workfn() on system_wq. Depending on the number of cgroups or block devices, there can be a lot of