Re: workqueue code needing preemption disabled

2013-03-18 Thread Steven Rostedt
On Mon, 2013-03-18 at 12:06 -0700, Tejun Heo wrote: > Me neither. Unfortunately, I'm out of ideas at the moment. > Hmm... last year, there was a similar issue, I think it was in AMD > cpufreq, which was caused by work function doing > set_cpus_allowed_ptr(), so the idle worker was on the correct

Re: workqueue code needing preemption disabled

2013-03-18 Thread Tejun Heo
On Mon, Mar 18, 2013 at 02:57:30PM -0400, Steven Rostedt wrote: > I like the theory, but it has one flaw. I agree that the update should > be wrapped in preempt_disable() but since this bug happens on the same > CPU, the state of the list will be the same when it was preempted to > when it bugged.

Re: workqueue code needing preemption disabled

2013-03-18 Thread Steven Rostedt
On Mon, 2013-03-18 at 11:21 -0700, Tejun Heo wrote: > I've been thinking about it and AFAICS the only way that BUG_ON() > could trigger from preemption is if preemption happens while the > idle_list head is becoming or stopping being empty. > ie. pool->worklist is half updated so list_empty() isn'

Re: workqueue code needing preemption disabled

2013-03-18 Thread Steven Rostedt
On Mon, 2013-03-18 at 11:26 -0700, Tejun Heo wrote: > > Hmm, the issue is that a "used to be" idle thread got migrated, and is > > now being woken up by another worker. What can cause an established > > worker to migrate without HOTPLUG being active? > > It doesn't. I think it's trying to wakeup

Re: workqueue code needing preemption disabled

2013-03-18 Thread Tejun Heo
On Mon, Mar 18, 2013 at 01:08:07PM -0400, Steven Rostedt wrote: > On Mon, 2013-03-18 at 09:43 -0700, Tejun Heo wrote: > > > Making gcwq locks disable preemption would be much safer / easier, but > > if that's not desirable, anything touching gcwq->idle_list would be a > > good place to start - wor

Re: workqueue code needing preemption disabled

2013-03-18 Thread Tejun Heo
On Mon, Mar 18, 2013 at 02:23:56PM -0400, Steven Rostedt wrote: > On Mon, 2013-03-18 at 09:43 -0700, Tejun Heo wrote: > > Hello, Steven. > > > > On Mon, Mar 18, 2013 at 12:30:43PM -0400, Steven Rostedt wrote: > > > If you happen to know the critical areas that require preemption to be > > > disabl

Re: workqueue code needing preemption disabled

2013-03-18 Thread Steven Rostedt
On Mon, 2013-03-18 at 09:43 -0700, Tejun Heo wrote: > Hello, Steven. > > On Mon, Mar 18, 2013 at 12:30:43PM -0400, Steven Rostedt wrote: > > If you happen to know the critical areas that require preemption to be > > disabled for real, we can encapsulate them with: > > > > preempt_disable_rt()

Re: workqueue code needing preemption disabled

2013-03-18 Thread Steven Rostedt
On Mon, 2013-03-18 at 09:43 -0700, Tejun Heo wrote: > Making gcwq locks disable preemption would be much safer / easier, but > if that's not desirable, anything touching gcwq->idle_list would be a > good place to start - worker_enter_idle() and worker_leave_idle(). > Hmmm... ignoring CPU hotplug,

Re: workqueue code needing preemption disabled

2013-03-18 Thread Tejun Heo
On Mon, Mar 18, 2013 at 12:41:23PM -0400, Steven Rostedt wrote: > But, I'm worried about the loops that are done while holding this lock. > Just looking at is_chained_work() that does for_each_busy_worker(), how > big can that list be? If it's bound by # of CPUs then that may be fine, > but if it c

Re: workqueue code needing preemption disabled

2013-03-18 Thread Tejun Heo
Hello, Steven. On Mon, Mar 18, 2013 at 12:30:43PM -0400, Steven Rostedt wrote: > If you happen to know the critical areas that require preemption to be > disabled for real, we can encapsulate them with: > > preempt_disable_rt(); > > preempt_enable_rt(); > > These are currently only

Re: workqueue code needing preemption disabled

2013-03-18 Thread Steven Rostedt
On Mon, 2013-03-18 at 09:27 -0700, Tejun Heo wrote: > Does that mean that a task holding gcwq->lock may be preempted? If > so, that sure could lead to weird problems. Maybe gcwq->lock should > be marked non-preemptible somehow? If the gcwq->lock is never held for a long time (really, more than

Re: workqueue code needing preemption disabled

2013-03-18 Thread Steven Rostedt
On Mon, 2013-03-18 at 12:27 -0400, Steven Rostedt wrote: > IOW, what can happen in -rt here is: > > spin_lock_irq(&gcwq->lock); > [...] > > -> preempt_schedule(); > schedule(); > try_to_wake_up_local(); > > [...] > sp

Re: workqueue code needing preemption disabled

2013-03-18 Thread Tejun Heo
Hey, Steven. On Mon, Mar 18, 2013 at 12:23:19PM -0400, Steven Rostedt wrote: > > Maybe I'm confused but I can't really see how the above would be a > > problem to workqueue in itself. Both rq->lock and gcwq->lock are > > irq-safe, so spin_lock() not disabling preemption shouldn't be a > > problem

Re: workqueue code needing preemption disabled

2013-03-18 Thread Steven Rostedt
On Mon, 2013-03-18 at 12:23 -0400, Steven Rostedt wrote: > > Maybe I'm confused but I can't really see how the above would be a > > problem to workqueue in itself. Both rq->lock and gcwq->lock are > > irq-safe, so spin_lock() not disabling preemption shouldn't be a > > problem. Are CPU hotplug o

Re: workqueue code needing preemption disabled

2013-03-18 Thread Steven Rostedt
On Mon, 2013-03-18 at 09:06 -0700, Tejun Heo wrote: > Hello, Steven. > > On Mon, Mar 18, 2013 at 10:36:23AM -0400, Steven Rostedt wrote: > > kernel BUG at kernel/sched/core.c:1731! > > invalid opcode: [#1] PREEMPT SMP > > CPU 5 > > Pid: 16637, comm: kworker/5:0 Not tainted 3.6.11-rt30.25.el

Re: workqueue code needing preemption disabled

2013-03-18 Thread Tejun Heo
Hello, Steven. On Mon, Mar 18, 2013 at 10:36:23AM -0400, Steven Rostedt wrote: > kernel BUG at kernel/sched/core.c:1731! > invalid opcode: [#1] PREEMPT SMP > CPU 5 > Pid: 16637, comm: kworker/5:0 Not tainted 3.6.11-rt30.25.el6rt.x86_64 #1 HP > ProLiant DL580 G7 ... > static void try_to_wak

workqueue code needing preemption disabled

2013-03-18 Thread Steven Rostedt
Hi Tejun, I'm debugging a crash on -rt that has the following: kernel BUG at kernel/sched/core.c:1731! invalid opcode: [#1] PREEMPT SMP CPU 5 Pid: 16637, comm: kworker/5:0 Not tainted 3.6.11-rt30.25.el6rt.x86_64 #1 HP ProLiant DL580 G7 RIP: 0010:[] [] __schedule+0x89a/0x8c0 RSP: 0018:fff