Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-13 Thread Phil Auld
On Wed, Mar 13, 2019 at 01:26:51PM -0700 bseg...@google.com wrote: > Phil Auld writes: > > > On Wed, Mar 13, 2019 at 10:44:09AM -0700 bseg...@google.com wrote: > >> Phil Auld writes: > >> > >> > On Mon, Mar 11, 2019 at 04:25:36PM -0400 Phil Auld wrote: > >> >> On Mon, Mar 11, 2019 at

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-13 Thread bsegall
Phil Auld writes: > On Wed, Mar 13, 2019 at 10:44:09AM -0700 bseg...@google.com wrote: >> Phil Auld writes: >> >> > On Mon, Mar 11, 2019 at 04:25:36PM -0400 Phil Auld wrote: >> >> On Mon, Mar 11, 2019 at 10:44:25AM -0700 bseg...@google.com wrote: >> >> > Letting it spin for 100ms and then only

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-13 Thread Phil Auld
On Wed, Mar 13, 2019 at 10:44:09AM -0700 bseg...@google.com wrote: > Phil Auld writes: > > > On Mon, Mar 11, 2019 at 04:25:36PM -0400 Phil Auld wrote: > >> On Mon, Mar 11, 2019 at 10:44:25AM -0700 bseg...@google.com wrote: > >> > Letting it spin for 100ms and then only increasing by 6% seems

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-13 Thread bsegall
Phil Auld writes: > On Mon, Mar 11, 2019 at 04:25:36PM -0400 Phil Auld wrote: >> On Mon, Mar 11, 2019 at 10:44:25AM -0700 bseg...@google.com wrote: >> > Letting it spin for 100ms and then only increasing by 6% seems extremely >> > generous. If we went this route I'd probably say "after looping N

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-12 Thread bsegall
Phil Auld writes: > On Mon, Mar 11, 2019 at 10:44:25AM -0700 bseg...@google.com wrote: >> Phil Auld writes: >> >> > On Wed, Mar 06, 2019 at 11:25:02AM -0800 bseg...@google.com wrote: >> >> Phil Auld writes: >> >> >> >> > On Tue, Mar 05, 2019 at 12:45:34PM -0800 bseg...@google.com wrote: >>

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-12 Thread Phil Auld
On Mon, Mar 11, 2019 at 04:25:36PM -0400 Phil Auld wrote: > On Mon, Mar 11, 2019 at 10:44:25AM -0700 bseg...@google.com wrote: > > Letting it spin for 100ms and then only increasing by 6% seems extremely > > generous. If we went this route I'd probably say "after looping N > > times, set the

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-11 Thread Phil Auld
On Mon, Mar 11, 2019 at 10:44:25AM -0700 bseg...@google.com wrote: > Phil Auld writes: > > > On Wed, Mar 06, 2019 at 11:25:02AM -0800 bseg...@google.com wrote: > >> Phil Auld writes: > >> > >> > On Tue, Mar 05, 2019 at 12:45:34PM -0800 bseg...@google.com wrote: > >> >> Phil Auld writes: > >>

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-11 Thread bsegall
Phil Auld writes: > On Wed, Mar 06, 2019 at 11:25:02AM -0800 bseg...@google.com wrote: >> Phil Auld writes: >> >> > On Tue, Mar 05, 2019 at 12:45:34PM -0800 bseg...@google.com wrote: >> >> Phil Auld writes: >> >> >> >> > Interestingly, if I limit the number of child cgroups to the number of

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-09 Thread Phil Auld
On Wed, Mar 06, 2019 at 11:25:02AM -0800 bseg...@google.com wrote: > Phil Auld writes: > > > On Tue, Mar 05, 2019 at 12:45:34PM -0800 bseg...@google.com wrote: > >> Phil Auld writes: > >> > >> > Interestingly, if I limit the number of child cgroups to the number of > >> > them I'm actually

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-06 Thread bsegall
Phil Auld writes: > On Tue, Mar 05, 2019 at 12:45:34PM -0800 bseg...@google.com wrote: >> Phil Auld writes: >> >> > Interestingly, if I limit the number of child cgroups to the number of >> > them I'm actually putting processes into (16 down from 2500) the problem >> > does not reproduce. >>

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-06 Thread Phil Auld
On Tue, Mar 05, 2019 at 12:45:34PM -0800 bseg...@google.com wrote: > Phil Auld writes: > > > Interestingly, if I limit the number of child cgroups to the number of > > them I'm actually putting processes into (16 down from 2500) the problem > > does not reproduce. > > That is indeed

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-05 Thread bsegall
Phil Auld writes: > On Tue, Mar 05, 2019 at 10:49:01AM -0800 bseg...@google.com wrote: >> Phil Auld writes: >> >> >> > >> >> > raw_spin_lock(&cfs_b->lock); >> >> > for (;;) { >> >> > overrun = hrtimer_forward_now(timer, cfs_b->period); >> >> > if

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-05 Thread Phil Auld
On Tue, Mar 05, 2019 at 10:49:01AM -0800 bseg...@google.com wrote: > Phil Auld writes: > > >> > > >> > raw_spin_lock(&cfs_b->lock); > >> > for (;;) { > >> > overrun = hrtimer_forward_now(timer, cfs_b->period); > >> > if (!overrun) > >> > break; > >> > > >> >

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-05 Thread bsegall
Phil Auld writes: >> > >> >raw_spin_lock(&cfs_b->lock); >> >for (;;) { >> >overrun = hrtimer_forward_now(timer, cfs_b->period); >> >if (!overrun) >> >break; >> > >> >idle = do_sched_cfs_period_timer(cfs_b, overrun); >> >} >> >if

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-04 Thread Phil Auld
On Mon, Mar 04, 2019 at 10:13:49AM -0800 bseg...@google.com wrote: > Phil Auld writes: > > > Hi, > > > > I have a reproducible case of this: > > > > [ 217.264946] NMI watchdog: Watchdog detected hard LOCKUP on cpu 24 > > [ 217.264948] Modules linked in: sunrpc iTCO_wdt gpio_ich > >

Re: [RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-04 Thread bsegall
Phil Auld writes: > Hi, > > I have a reproducible case of this: > > [ 217.264946] NMI watchdog: Watchdog detected hard LOCKUP on cpu 24 > [ 217.264948] Modules linked in: sunrpc iTCO_wdt gpio_ich > iTCO_vendor_support intel_powerclamp coretemp kvm_intel kvm ipmi_ssif > irqbypass

[RFC] sched/fair: hard lockup in sched_cfs_period_timer

2019-03-01 Thread Phil Auld
Hi, I have a reproducible case of this: [ 217.264946] NMI watchdog: Watchdog detected hard LOCKUP on cpu 24 [ 217.264948] Modules linked in: sunrpc iTCO_wdt gpio_ich iTCO_vendor_support intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul