Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-06-05 Thread Michael Wang
On 06/05/2013 04:08 PM, Jiri Kosina wrote: > On Wed, 5 Jun 2013, Michael Wang wrote: > >>> Just to not let this thread sleep -- I am seeing this as well, even with >>> current Linus' tree (git HEAD == aa4f608). >> >> Have you tried this: >> >> diff --git a/drivers/cpufreq/cpufreq_governor.c >> b/

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-06-05 Thread Jiri Kosina
On Wed, 5 Jun 2013, Michael Wang wrote: > > Just to not let this thread sleep -- I am seeing this as well, even with > > current Linus' tree (git HEAD == aa4f608). > > Have you tried this: > > diff --git a/drivers/cpufreq/cpufreq_governor.c > b/drivers/cpufreq/cpufreq_governor.c > index 443442d

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-06-04 Thread Michael Wang
Hi, Jiri On 06/05/2013 05:20 AM, Jiri Kosina wrote: [snip] > > Just to not let this thread sleep -- I am seeing this as well, even with > current Linus' tree (git HEAD == aa4f608). Have you tried this: diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 4

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-06-04 Thread Jiri Kosina
On Fri, 17 May 2013, Borislav Petkov wrote: > commit f7ea0fd639c2c48d3c61b6eec75362be290c6874 > Author: Thomas Gleixner > Date: Mon May 13 21:40:27 2013 +0200 > > tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline > > Now, when I halt the box, I see these splats originat

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-21 Thread Michael Wang
On 05/21/2013 03:21 PM, Borislav Petkov wrote: > On Tue, May 21, 2013 at 10:20:51AM +0800, Michael Wang wrote: >> This is not enough to prove that policy->cpus is wrong, the cpu could >> be online when get from policy->cpus, but offline when checked here, >> since hotplug is able to happen during t

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-21 Thread Borislav Petkov
On Tue, May 21, 2013 at 10:20:51AM +0800, Michael Wang wrote: > This is not enough to prove that policy->cpus is wrong, the cpu could > be online when get from policy->cpus, but offline when checked here, > since hotplug is able to happen during the period. Strictly speaking you're correct but I d

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Michael Wang
On 05/21/2013 10:20 AM, Michael Wang wrote: [snip] > > If hotplug could not happen but still get an offline cpu from > policy->cpus, then we could say it's wrong, otherwise we proved nothing... like this: diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Michael Wang
On 05/20/2013 09:23 PM, Borislav Petkov wrote: > On Mon, May 20, 2013 at 05:24:05PM +0800, Michael Wang wrote: diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 443442d..449be88 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/d

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Borislav Petkov
On Mon, May 20, 2013 at 07:13:08PM +0530, Viresh Kumar wrote: > Hmm, so for sure there is some locking issue there. Have you tried my > patch? No, not yet. Pretty busy ATM. Btw, you could try reproducing it too, in the meantime - simply enable CONFIG_NO_HZ_COMMON=y # CONFIG_NO_HZ_IDLE is not set

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Viresh Kumar
On 20 May 2013 18:53, Borislav Petkov wrote: > I just confirmed that policy->cpus contains offlined cores with this: > > diff --git a/drivers/cpufreq/cpufreq_governor.c > b/drivers/cpufreq/cpufreq_governor.c > index 5af40ad82d23..e8c25f71e9b6 100644 > --- a/drivers/cpufreq/cpufreq_governor.c > ++

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Borislav Petkov
On Mon, May 20, 2013 at 05:24:05PM +0800, Michael Wang wrote: > >> diff --git a/drivers/cpufreq/cpufreq_governor.c > >> b/drivers/cpufreq/cpufreq_governor.c > >> index 443442d..449be88 100644 > >> --- a/drivers/cpufreq/cpufreq_governor.c > >> +++ b/drivers/cpufreq/cpufreq_governor.c > >> @@ -26,6 +

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Viresh Kumar
On 20 May 2013 15:10, Viresh Kumar wrote: > On 20 May 2013 15:01, Srivatsa S. Bhat > wrote: >> And Viresh, in the regular hotplug paths, the call to gov_cancel_work() is >> supposed to kill any pending workqueue functions pertaining to offline CPUs >> right? > > Yes.. It will cancel work for all

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Viresh Kumar
On 20 May 2013 15:01, Srivatsa S. Bhat wrote: > And Viresh, in the regular hotplug paths, the call to gov_cancel_work() is > supposed to kill any pending workqueue functions pertaining to offline CPUs > right? Yes.. It will cancel work for all cpus first and will start again for online cpus again

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Srivatsa S. Bhat
On 05/20/2013 01:40 PM, Frederic Weisbecker wrote: > 2013/5/20 Borislav Petkov : >> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote: >>> I suppose the reason is that the cpu we passed to >>> mod_delayed_work_on() has a chance to become offline before we >>> disabled irq, what about che

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Michael Wang
On 05/20/2013 05:09 PM, Viresh Kumar wrote: > On 20 May 2013 14:26, Michael Wang wrote: >> On 05/20/2013 03:25 PM, Michael Wang wrote: >>> Yeah, that's right, I guess the issue is, although the policy->cpus is >>> correct at a given time, after get cpu from it, it's possible to be >>> changed, unl

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Viresh Kumar
On 20 May 2013 14:26, Michael Wang wrote: > On 05/20/2013 03:25 PM, Michael Wang wrote: >> Yeah, that's right, I guess the issue is, although the policy->cpus is >> correct at a given time, after get cpu from it, it's possible to be >> changed, unless we disabled preempt or irq, or hotplug before

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Michael Wang
On 05/20/2013 03:25 PM, Michael Wang wrote: [] > > Yeah, that's right, I guess the issue is, although the policy->cpus is > correct at a given time, after get cpu from it, it's possible to be > changed, unless we disabled preempt or irq, or hotplug before we use it... > > Like such issue cases: >

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Frederic Weisbecker
2013/5/20 Borislav Petkov : > On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote: >> I suppose the reason is that the cpu we passed to >> mod_delayed_work_on() has a chance to become offline before we >> disabled irq, what about check it before send resched ipi? like: > > I think this is

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Tejun Heo
Hello, On Mon, May 20, 2013 at 08:47:27AM +0200, Borislav Petkov wrote: > > So there are two questions here: > > 1. Is gov_queue_work() want to queue the work on offline cpu? > > 2. Is mod_delayed_work_on() allow offline cpu? > > > > I guess both should be false? > > Well, if we don't allow queu

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Michael Wang
Hi, Viresh On 05/20/2013 03:12 PM, Viresh Kumar wrote: > Hi Michael, > > I haven't followed this mail chain earlier and saw this mail only as I am > added in cc now. I probably have answers to few questions here: Thanks for your quick respond :) > > On 20 May 2013 12:36, Michael Wang wrote: >>

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Viresh Kumar
Hi Michael, I haven't followed this mail chain earlier and saw this mail only as I am added in cc now. I probably have answers to few questions here: On 20 May 2013 12:36, Michael Wang wrote: > On 05/20/2013 02:58 PM, Michael Wang wrote: >> On 05/20/2013 02:47 PM, Borislav Petkov wrote: >>> On M

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-20 Thread Michael Wang
On 05/20/2013 02:58 PM, Michael Wang wrote: > On 05/20/2013 02:47 PM, Borislav Petkov wrote: >> On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote: >>> On 05/20/2013 12:50 PM, Borislav Petkov wrote: On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote: > I suppose the rea

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-19 Thread Michael Wang
On 05/20/2013 02:47 PM, Borislav Petkov wrote: > On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote: >> On 05/20/2013 12:50 PM, Borislav Petkov wrote: >>> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote: I suppose the reason is that the cpu we passed to mod_delayed_

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-19 Thread Borislav Petkov
On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote: > On 05/20/2013 12:50 PM, Borislav Petkov wrote: > > On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote: > >> I suppose the reason is that the cpu we passed to > >> mod_delayed_work_on() has a chance to become offline before we

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-19 Thread Michael Wang
On 05/20/2013 12:50 PM, Borislav Petkov wrote: > On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote: >> I suppose the reason is that the cpu we passed to >> mod_delayed_work_on() has a chance to become offline before we >> disabled irq, what about check it before send resched ipi? like: >

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-19 Thread Borislav Petkov
On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote: > I suppose the reason is that the cpu we passed to > mod_delayed_work_on() has a chance to become offline before we > disabled irq, what about check it before send resched ipi? like: I think this is only addressing the symptoms - what

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

2013-05-19 Thread Michael Wang
Hi, Borislav On 05/17/2013 09:56 PM, Borislav Petkov wrote: [snip] > [ 51.737378] [] native_smp_send_reschedule+0x58/0x60 > [ 51.744013] [] wake_up_nohz_cpu+0x2d/0xa0 I suppose the reason is that the cpu we passed to mod_delayed_work_on() has a chance to become offline before we disabled ir

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-15 Thread Paul E. McKenney
On Thu, May 16, 2013 at 12:43:58AM +0200, Borislav Petkov wrote: > On Wed, May 15, 2013 at 11:45:28AM -0700, Paul E. McKenney wrote: > > Does the following patch help? > > Hmm, I just tried on 3.10-rc1 > > CONFIG_NO_HZ_FULL_ALL=y > > on the one hand and then > > CONFIG_NO_HZ_FULL=y > # CONFIG_N

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-15 Thread Borislav Petkov
On Wed, May 15, 2013 at 11:45:28AM -0700, Paul E. McKenney wrote: > Does the following patch help? Hmm, I just tried on 3.10-rc1 CONFIG_NO_HZ_FULL_ALL=y on the one hand and then CONFIG_NO_HZ_FULL=y # CONFIG_NO_HZ_FULL_ALL is not set with "nohz_full=4-7 rcu_nocbs=4-7" on the cmdline and I don't

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-15 Thread Paul E. McKenney
On Thu, May 09, 2013 at 02:58:59PM +0200, Borislav Petkov wrote: > On Thu, May 09, 2013 at 02:50:40PM +0200, Borislav Petkov wrote: > > Looks like we're sending a resched IPI to a cpu which is not online > > yet in order to start the MCE polling timer. So the rcu* options are > > kinda unlikely to

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-13 Thread Jiri Kosina
On Mon, 13 May 2013, Thomas Gleixner wrote: > > > --- a/kernel/time/tick-sched.c > > > +++ b/kernel/time/tick-sched.c > > > @@ -650,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct > > > tick_sched *ts, > > > > > > ts->last_tick = hrtimer_get_expires(&ts->sched_time

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-13 Thread Thomas Gleixner
On Mon, 13 May 2013, Jiri Kosina wrote: > > --- a/kernel/time/tick-sched.c > > +++ b/kernel/time/tick-sched.c > > @@ -650,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct > > tick_sched *ts, > > > > ts->last_tick = hrtimer_get_expires(&ts->sched_timer); > >

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-13 Thread Jiri Kosina
On Fri, 10 May 2013, Frederic Weisbecker wrote: > The problem is that it doesn't catch issues with irqs that have been enabled > before in start_secondary(), then re-disabled somehow. Warning on offline > CPU from the place > that disables the tick should catch the issue. > > Jiri, could you t

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Frederic Weisbecker
On Fri, May 10, 2013 at 06:23:40PM +0200, Borislav Petkov wrote: > On Fri, May 10, 2013 at 05:43:50PM +0200, Frederic Weisbecker wrote: > > So either interrupts are spuriously enabled early, or ts->tick_stopped > > is not correctly initialized. > > Hmm, it can't be interrupts disabled because add_

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Borislav Petkov
On Fri, May 10, 2013 at 05:43:50PM +0200, Frederic Weisbecker wrote: > Right. But this is adding a timer locally, from CPU 1 to CPU 1, as > indicated in the trace with the "1 1" line. So the only way for > this IPI to be self-sent is if the tick is stopped locally (cf: > wake_up_full_nohz_cpu()). >

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Frederic Weisbecker
On Fri, May 10, 2013 at 05:21:02PM +0200, Borislav Petkov wrote: > On Fri, May 10, 2013 at 05:03:56PM +0200, Jiri Kosina wrote: > > [ ... snip ... ] > > Enabling non-boot CPUs ... > > smpboot: Booting Node 0 Processor 1 APIC 0x1 > > CPU1 microcode updated early to revision 0x60f, date = 2010-09-

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Jiri Kosina
On Fri, 10 May 2013, Frederic Weisbecker wrote: > In fact it would be nice to have DO_ONCE(something) and stuff whatever > we want inside. > All the printk_once() et. al could even be implemented using that. Sounds nice, but if it's going to be used for something else than purely debugging outpu

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Frederic Weisbecker
On Fri, May 10, 2013 at 11:45:56AM +0200, Borislav Petkov wrote: > On Fri, May 10, 2013 at 11:37:29AM +0200, Ingo Molnar wrote: > > The pattern I use in such cases is: > > > > if (WARN_ONCE(!cpu_online(cpu))) { > > printk("%d %d\n", cpu, smp_processor_id()); > > dump_st

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Borislav Petkov
On Fri, May 10, 2013 at 05:03:56PM +0200, Jiri Kosina wrote: > [ ... snip ... ] > Enabling non-boot CPUs ... > smpboot: Booting Node 0 Processor 1 APIC 0x1 > CPU1 microcode updated early to revision 0x60f, date = 2010-09-29 > Disabled fast string operations > 1 1 > CPU: 1 PID: 0 Comm: swapper

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Jiri Kosina
On Fri, 10 May 2013, Frederic Weisbecker wrote: > Like Borislav said, it's due to the scheduler IPI sent to an offline > target. Here this is because we enqueue a timer and we must ensure the > target handles this timer by rescheduling its tick if necessary. > > But it's weird because the mce tim

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Borislav Petkov
On Fri, May 10, 2013 at 11:37:29AM +0200, Ingo Molnar wrote: > The pattern I use in such cases is: > > if (WARN_ONCE(!cpu_online(cpu))) { > printk("%d %d\n", cpu, smp_processor_id()); > dump_stack(); > } Cool, and WARN_ONCE dumps stack already so:

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Ingo Molnar
* Frederic Weisbecker wrote: > 2013/5/10 Borislav Petkov : > > On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote: > >> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu) > >> { > >> if (tick_nohz_full_cpu(cpu)) { > >> if (cpu != smp_processor_i

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Borislav Petkov
On Fri, May 10, 2013 at 11:26:39AM +0200, Frederic Weisbecker wrote: > 2013/5/10 Borislav Petkov : > > On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote: > >> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu) > >> { > >> if (tick_nohz_full_cpu(cpu)) { > >>

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Frederic Weisbecker
2013/5/10 Borislav Petkov : > On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote: >> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu) >> { >> if (tick_nohz_full_cpu(cpu)) { >> if (cpu != smp_processor_id() || >> - tick_nohz_tick_s

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-10 Thread Borislav Petkov
On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote: > @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu) > { > if (tick_nohz_full_cpu(cpu)) { > if (cpu != smp_processor_id() || > - tick_nohz_tick_stopped()) > + tick_

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-09 Thread Frederic Weisbecker
On Thu, May 09, 2013 at 02:29:18PM +0200, Jiri Kosina wrote: > Hi, > > I just got the warning below when resuming from hibernation with kernel > that has NO_HZ_FULL_ALL=y. This is with topmost commit e0fd9affeb640. > > > [ ... snip ... ] > PM: Hibernation mode set to 'shutdown' > PM: Marking

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-09 Thread Borislav Petkov
On Thu, May 09, 2013 at 02:50:40PM +0200, Borislav Petkov wrote: > Looks like we're sending a resched IPI to a cpu which is not online > yet in order to start the MCE polling timer. So the rcu* options are > kinda unlikely to be related, AFAICT. On a second thought, they must be somehow indirectly

Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule

2013-05-09 Thread Borislav Petkov
On Thu, May 09, 2013 at 02:29:18PM +0200, Jiri Kosina wrote: > Hi, > > I just got the warning below when resuming from hibernation with kernel > that has NO_HZ_FULL_ALL=y. This is with topmost commit e0fd9affeb640. Did you boot with any of the NO_HZ_FULL options on the command line, i.e. rcu_noc