Hello, Jan.
On Thu, May 24, 2018 at 12:19:00PM +0200, Jan Kara wrote:
> > We're periodically seeing close to 256 kworkers getting stuck with the
> > following stack trace and over time the entire system gets stuck.
>
> OK, but that means that you have to have 256 block devices, don't you? As
On Wed 23-05-18 10:56:32, Tejun Heo wrote:
> From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001
> From: Tejun Heo
> Date: Wed, 23 May 2018 10:29:00 -0700
>
> cgwb_release() punts the actual release to cgwb_release_workfn() on
> system_wq. Depending on the number of cgroups
Hello, Rik.
On Wed, May 23, 2018 at 06:03:15PM -0400, Rik van Riel wrote:
> Dumb question. Does setting max_active to 1 mean
> that every cgwb_release_workfn() ends up forcing
> another RCU grace period on the whole system, while
> today you might have a bunch of them waiting on the
> same RCU
On Wed, 2018-05-23 at 10:56 -0700, Tejun Heo wrote:
> The events leading to the lockup are...
>
> 1. A lot of cgwb_release_workfn() is queued at the same time and all
>    system_wq kworkers are assigned to execute them.
>
> 2. They all end up calling synchronize_rcu_expedited(). One of them
>
On 5/23/18 11:56 AM, Tejun Heo wrote:
> From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001
> From: Tejun Heo
> Date: Wed, 23 May 2018 10:29:00 -0700
>
> cgwb_release() punts the actual release to cgwb_release_workfn() on
> system_wq. Depending on the number of cgroups or
On Wed, May 23, 2018 at 11:51:43AM -0700, Tejun Heo wrote:
> On Wed, May 23, 2018 at 11:39:07AM -0700, Paul E. McKenney wrote:
> > > While this resolves the problem at hand, it might be a good idea to
> > > isolate rcu_exp_work to its own workqueue too as it can be used from
> > > various paths
On Wed, May 23, 2018 at 11:39:07AM -0700, Paul E. McKenney wrote:
> > While this resolves the problem at hand, it might be a good idea to
> > isolate rcu_exp_work to its own workqueue too as it can be used from
> > various paths and is prone to this sort of indirect A-A deadlocks.
>
> Commit
On Wed, May 23, 2018 at 10:56:32AM -0700, Tejun Heo wrote:
> From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001
> From: Tejun Heo
> Date: Wed, 23 May 2018 10:29:00 -0700
>
> cgwb_release() punts the actual release to cgwb_release_workfn() on
> system_wq. Depending on the
From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Wed, 23 May 2018 10:29:00 -0700
cgwb_release() punts the actual release to cgwb_release_workfn() on
system_wq. Depending on the number of cgroups or block devices, there
can be a lot of