On 06/05/2013 04:08 PM, Jiri Kosina wrote:
> On Wed, 5 Jun 2013, Michael Wang wrote:
>
>>> Just to not let this thread sleep -- I am seeing this as well, even with
>>> current Linus' tree (git HEAD == aa4f608).
>>
>> Have you tried this:
>>
>> diff --git a/drivers/cpufreq/cpufreq_governor.c
>> b/
On Wed, 5 Jun 2013, Michael Wang wrote:
> > Just to not let this thread sleep -- I am seeing this as well, even with
> > current Linus' tree (git HEAD == aa4f608).
>
> Have you tried this:
>
> diff --git a/drivers/cpufreq/cpufreq_governor.c
> b/drivers/cpufreq/cpufreq_governor.c
> index 443442d
Hi, Jiri
On 06/05/2013 05:20 AM, Jiri Kosina wrote:
[snip]
>
> Just to not let this thread sleep -- I am seeing this as well, even with
> current Linus' tree (git HEAD == aa4f608).
Have you tried this:
diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 4
On Fri, 17 May 2013, Borislav Petkov wrote:
> commit f7ea0fd639c2c48d3c61b6eec75362be290c6874
> Author: Thomas Gleixner
> Date: Mon May 13 21:40:27 2013 +0200
>
> tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
>
> Now, when I halt the box, I see these splats originat
On 05/21/2013 03:21 PM, Borislav Petkov wrote:
> On Tue, May 21, 2013 at 10:20:51AM +0800, Michael Wang wrote:
>> This is not enough to prove that policy->cpus is wrong, the cpu could
>> be online when get from policy->cpus, but offline when checked here,
>> since hotplug is able to happen during t
On Tue, May 21, 2013 at 10:20:51AM +0800, Michael Wang wrote:
> This is not enough to prove that policy->cpus is wrong, the cpu could
> be online when get from policy->cpus, but offline when checked here,
> since hotplug is able to happen during the period.
Strictly speaking you're correct but I d
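The window described in the quoted text above (a CPU is online when it is read from policy->cpus but offline by the time it is used) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `online[]`, `hotplug_held`, `try_offline()` and `queue_work_model()` are hypothetical names, not kernel APIs, and the model refuses an offline request while hotplug is held off, whereas a real hotplug operation would block instead.

```c
#include <stdbool.h>

#define NR_CPUS 4

enum queue_result { SKIPPED, QUEUED_ONLINE, QUEUED_OFFLINE };

/* Hypothetical stand-in for the online mask. */
static bool online[NR_CPUS] = { true, true, true, true };

/* Non-zero while a reader holds hotplug off (hypothetical model state). */
static int hotplug_held;

/* Model of an offline request: it only succeeds if nobody holds hotplug off. */
static bool try_offline(int cpu)
{
    if (hotplug_held)
        return false;
    online[cpu] = false;
    return true;
}

/*
 * Check that the CPU is online, then "queue work" on it. An offline
 * request is injected between the check and the use to model the race.
 */
static enum queue_result queue_work_model(int cpu, bool hold_hotplug)
{
    enum queue_result res = SKIPPED;

    if (hold_hotplug)
        hotplug_held++;

    if (online[cpu]) {                  /* check */
        try_offline(cpu);               /* simulated concurrent offline */
        res = online[cpu] ? QUEUED_ONLINE : QUEUED_OFFLINE;  /* use */
    }

    if (hold_hotplug)
        hotplug_held--;

    return res;
}
```

With hotplug held across both the check and the use, `queue_work_model(1, true)` reports QUEUED_ONLINE. Without it, the same call ends up as QUEUED_OFFLINE, which is the case the thread is arguing about.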
On 05/21/2013 10:20 AM, Michael Wang wrote:
[snip]
>
> If hotplug could not happen but still get an offline cpu from
> policy->cpus, then we could say it's wrong, otherwise we proved nothing...
like this:
diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index
On 05/20/2013 09:23 PM, Borislav Petkov wrote:
> On Mon, May 20, 2013 at 05:24:05PM +0800, Michael Wang wrote:
diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 443442d..449be88 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/d
On Mon, May 20, 2013 at 07:13:08PM +0530, Viresh Kumar wrote:
> Hmm, so for sure there is some locking issue there. Have you tried my
> patch?
No, not yet. Pretty busy ATM. Btw, you could try reproducing it too, in
the meantime - simply enable
CONFIG_NO_HZ_COMMON=y
# CONFIG_NO_HZ_IDLE is not set
On 20 May 2013 18:53, Borislav Petkov wrote:
> I just confirmed that policy->cpus contains offlined cores with this:
>
> diff --git a/drivers/cpufreq/cpufreq_governor.c
> b/drivers/cpufreq/cpufreq_governor.c
> index 5af40ad82d23..e8c25f71e9b6 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> ++
On Mon, May 20, 2013 at 05:24:05PM +0800, Michael Wang wrote:
> >> diff --git a/drivers/cpufreq/cpufreq_governor.c
> >> b/drivers/cpufreq/cpufreq_governor.c
> >> index 443442d..449be88 100644
> >> --- a/drivers/cpufreq/cpufreq_governor.c
> >> +++ b/drivers/cpufreq/cpufreq_governor.c
> >> @@ -26,6 +
On 20 May 2013 15:10, Viresh Kumar wrote:
> On 20 May 2013 15:01, Srivatsa S. Bhat
> wrote:
>> And Viresh, in the regular hotplug paths, the call to gov_cancel_work() is
>> supposed to kill any pending workqueue functions pertaining to offline CPUs
>> right?
>
> Yes.. It will cancel work for all
On 20 May 2013 15:01, Srivatsa S. Bhat wrote:
> And Viresh, in the regular hotplug paths, the call to gov_cancel_work() is
> supposed to kill any pending workqueue functions pertaining to offline CPUs
> right?
Yes.. It will cancel work for all cpus first and will start again for
online cpus again
On 05/20/2013 01:40 PM, Frederic Weisbecker wrote:
> 2013/5/20 Borislav Petkov :
>> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
>>> I suppose the reason is that the cpu we passed to
>>> mod_delayed_work_on() has a chance to become offline before we
>>> disabled irq, what about che
On 05/20/2013 05:09 PM, Viresh Kumar wrote:
> On 20 May 2013 14:26, Michael Wang wrote:
>> On 05/20/2013 03:25 PM, Michael Wang wrote:
>>> Yeah, that's right, I guess the issue is, although the policy->cpus is
>>> correct at a given time, after get cpu from it, it's possible to be
>>> changed, unl
On 20 May 2013 14:26, Michael Wang wrote:
> On 05/20/2013 03:25 PM, Michael Wang wrote:
>> Yeah, that's right, I guess the issue is, although the policy->cpus is
>> correct at a given time, after get cpu from it, it's possible to be
>> changed, unless we disabled preempt or irq, or hotplug before
On 05/20/2013 03:25 PM, Michael Wang wrote:
[snip]
>
> Yeah, that's right, I guess the issue is, although the policy->cpus is
> correct at a given time, after get cpu from it, it's possible to be
> changed, unless we disabled preempt or irq, or hotplug before we use it...
>
> Like such issue cases:
>
2013/5/20 Borislav Petkov :
> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
>> I suppose the reason is that the cpu we passed to
>> mod_delayed_work_on() has a chance to become offline before we
>> disabled irq, what about check it before send resched ipi? like:
>
> I think this is
Hello,
On Mon, May 20, 2013 at 08:47:27AM +0200, Borislav Petkov wrote:
> > So there are two questions here:
> > 1. Is gov_queue_work() want to queue the work on offline cpu?
> > 2. Is mod_delayed_work_on() allow offline cpu?
> >
> > I guess both should be false?
>
> Well, if we don't allow queu
Hi, Viresh
On 05/20/2013 03:12 PM, Viresh Kumar wrote:
> Hi Michael,
>
> I haven't followed this mail chain earlier and saw this mail only as I am
> added in cc now. I probably have answers to few questions here:
Thanks for your quick response :)
>
> On 20 May 2013 12:36, Michael Wang wrote:
>>
Hi Michael,
I haven't followed this mail chain earlier and saw this mail only as I am
added in cc now. I probably have answers to few questions here:
On 20 May 2013 12:36, Michael Wang wrote:
> On 05/20/2013 02:58 PM, Michael Wang wrote:
>> On 05/20/2013 02:47 PM, Borislav Petkov wrote:
>>> On M
On 05/20/2013 02:58 PM, Michael Wang wrote:
> On 05/20/2013 02:47 PM, Borislav Petkov wrote:
>> On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote:
>>> On 05/20/2013 12:50 PM, Borislav Petkov wrote:
>>>> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
>>>>> I suppose the rea
On 05/20/2013 02:47 PM, Borislav Petkov wrote:
> On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote:
>> On 05/20/2013 12:50 PM, Borislav Petkov wrote:
>>> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
>>>> I suppose the reason is that the cpu we passed to
>>>> mod_delayed_
On Mon, May 20, 2013 at 02:23:37PM +0800, Michael Wang wrote:
> On 05/20/2013 12:50 PM, Borislav Petkov wrote:
> > On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
> >> I suppose the reason is that the cpu we passed to
> >> mod_delayed_work_on() has a chance to become offline before we
On 05/20/2013 12:50 PM, Borislav Petkov wrote:
> On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
>> I suppose the reason is that the cpu we passed to
>> mod_delayed_work_on() has a chance to become offline before we
>> disabled irq, what about check it before send resched ipi? like:
>
On Mon, May 20, 2013 at 11:16:33AM +0800, Michael Wang wrote:
> I suppose the reason is that the cpu we passed to
> mod_delayed_work_on() has a chance to become offline before we
> disabled irq, what about check it before send resched ipi? like:
I think this is only addressing the symptoms - what
Hi, Borislav
On 05/17/2013 09:56 PM, Borislav Petkov wrote:
[snip]
> [ 51.737378] [] native_smp_send_reschedule+0x58/0x60
> [ 51.744013] [] wake_up_nohz_cpu+0x2d/0xa0
I suppose the reason is that the cpu we passed to mod_delayed_work_on()
has a chance to become offline before we disabled ir
On Wed, May 15, 2013 at 04:55:13PM -0700, Paul E. McKenney wrote:
> I never saw the problem, so I have to defer to you on this one. I will
> hold off on the patch unless the problem shows up again.
Thanks Paul.
Well, it's not this problem, but another one. Let me check if everyone
is on CC... nop
On Thu, May 16, 2013 at 12:43:58AM +0200, Borislav Petkov wrote:
> On Wed, May 15, 2013 at 11:45:28AM -0700, Paul E. McKenney wrote:
> > Does the following patch help?
>
> Hmm, I just tried on 3.10-rc1
>
> CONFIG_NO_HZ_FULL_ALL=y
>
> on the one hand and then
>
> CONFIG_NO_HZ_FULL=y
> # CONFIG_N
On Wed, May 15, 2013 at 11:45:28AM -0700, Paul E. McKenney wrote:
> Does the following patch help?
Hmm, I just tried on 3.10-rc1
CONFIG_NO_HZ_FULL_ALL=y
on the one hand and then
CONFIG_NO_HZ_FULL=y
# CONFIG_NO_HZ_FULL_ALL is not set
with "nohz_full=4-7 rcu_nocbs=4-7" on the cmdline and I don't
On Thu, May 09, 2013 at 02:58:59PM +0200, Borislav Petkov wrote:
> On Thu, May 09, 2013 at 02:50:40PM +0200, Borislav Petkov wrote:
> > Looks like we're sending a resched IPI to a cpu which is not online
> > yet in order to start the MCE polling timer. So the rcu* options are
> > kinda unlikely to
On Mon, 13 May 2013, Thomas Gleixner wrote:
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -650,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct
> > > tick_sched *ts,
> > >
> > > ts->last_tick = hrtimer_get_expires(&ts->sched_time
On Mon, 13 May 2013, Jiri Kosina wrote:
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -650,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct
> > tick_sched *ts,
> >
> > ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
> >
On Fri, 10 May 2013, Frederic Weisbecker wrote:
> The problem is that it doesn't catch issues with irqs that have been enabled
> before in start_secondary(), then re-disabled somehow. Warning on offline
> CPU from the place that disables the tick should catch the issue.
>
> Jiri, could you t
On Fri, May 10, 2013 at 06:23:40PM +0200, Borislav Petkov wrote:
> On Fri, May 10, 2013 at 05:43:50PM +0200, Frederic Weisbecker wrote:
> > So either interrupts are spuriously enabled early, or ts->tick_stopped
> > is not correctly initialized.
>
> Hmm, it can't be interrupts disabled because add_
On Fri, May 10, 2013 at 05:43:50PM +0200, Frederic Weisbecker wrote:
> Right. But this is adding a timer locally, from CPU 1 to CPU 1, as
> indicated in the trace with the "1 1" line. So the only way for
> this IPI to be self-sent is if the tick is stopped locally (cf:
> wake_up_full_nohz_cpu()).
>
On Fri, May 10, 2013 at 05:21:02PM +0200, Borislav Petkov wrote:
> On Fri, May 10, 2013 at 05:03:56PM +0200, Jiri Kosina wrote:
> > [ ... snip ... ]
> > Enabling non-boot CPUs ...
> > smpboot: Booting Node 0 Processor 1 APIC 0x1
> > CPU1 microcode updated early to revision 0x60f, date = 2010-09-
On Fri, 10 May 2013, Frederic Weisbecker wrote:
> In fact it would be nice to have DO_ONCE(something) and stuff whatever
> we want inside.
> All the printk_once() et al. could even be implemented using that.
Sounds nice, but if it's going to be used for something else than purely
debugging outpu
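The DO_ONCE(something) idea quoted above can be sketched in user-space C. This is a minimal illustration and not code from the thread: the macro body, `print_once()`, `calls` and `poll_device()` are all assumptions made for the example, and the flag handling is not thread-safe, so anything used for more than debugging output would need atomics or a lock.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical DO_ONCE(): run a statement the first time this call site
 * is reached and skip it afterwards. Each expansion gets its own static
 * flag. Not thread-safe as written.
 */
#define DO_ONCE(stmt)               \
    do {                            \
        static bool done_;          \
        if (!done_) {               \
            done_ = true;           \
            stmt;                   \
        }                           \
    } while (0)

/* A printk_once()-like helper built on top of DO_ONCE(). */
#define print_once(...) DO_ONCE(printf(__VA_ARGS__))

/* Counts how often the DO_ONCE() body in poll_device() actually ran. */
static int calls;

/* May be called repeatedly; both bodies run only on the first call. */
static void poll_device(void)
{
    DO_ONCE(calls++);
    print_once("poll_device: first call\n");
}
```

Calling `poll_device()` several times leaves `calls` at 1 and prints the message once, since each DO_ONCE() expansion keeps its own flag.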
On Fri, May 10, 2013 at 11:45:56AM +0200, Borislav Petkov wrote:
> On Fri, May 10, 2013 at 11:37:29AM +0200, Ingo Molnar wrote:
> > The pattern I use in such cases is:
> >
> > if (WARN_ONCE(!cpu_online(cpu))) {
> > printk("%d %d\n", cpu, smp_processor_id());
> > dump_st
On Fri, May 10, 2013 at 05:03:56PM +0200, Jiri Kosina wrote:
> [ ... snip ... ]
> Enabling non-boot CPUs ...
> smpboot: Booting Node 0 Processor 1 APIC 0x1
> CPU1 microcode updated early to revision 0x60f, date = 2010-09-29
> Disabled fast string operations
> 1 1
> CPU: 1 PID: 0 Comm: swapper
On Fri, 10 May 2013, Frederic Weisbecker wrote:
> Like Borislav said, it's due to the scheduler IPI sent to an offline
> target. Here this is because we enqueue a timer and we must ensure the
> target handles this timer by rescheduling its tick if necessary.
>
> But it's weird because the mce tim
On Fri, May 10, 2013 at 11:37:29AM +0200, Ingo Molnar wrote:
> The pattern I use in such cases is:
>
> if (WARN_ONCE(!cpu_online(cpu))) {
> printk("%d %d\n", cpu, smp_processor_id());
> dump_stack();
> }
Cool, and WARN_ONCE dumps stack already so:
* Frederic Weisbecker wrote:
> 2013/5/10 Borislav Petkov :
> > On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote:
> >> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
> >> {
> >> if (tick_nohz_full_cpu(cpu)) {
> >> if (cpu != smp_processor_i
On Fri, May 10, 2013 at 11:26:39AM +0200, Frederic Weisbecker wrote:
> 2013/5/10 Borislav Petkov :
> > On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote:
> >> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
> >> {
> >> if (tick_nohz_full_cpu(cpu)) {
> >>
2013/5/10 Borislav Petkov :
> On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote:
>> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
>> {
>> if (tick_nohz_full_cpu(cpu)) {
>> if (cpu != smp_processor_id() ||
>> - tick_nohz_tick_s
On Fri, May 10, 2013 at 02:29:31AM +0200, Frederic Weisbecker wrote:
> @@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
> {
> if (tick_nohz_full_cpu(cpu)) {
> if (cpu != smp_processor_id() ||
> - tick_nohz_tick_stopped())
> + tick_
On Thu, May 09, 2013 at 02:29:18PM +0200, Jiri Kosina wrote:
> Hi,
>
> I just got the warning below when resuming from hibernation with kernel
> that has NO_HZ_FULL_ALL=y. This is with topmost commit e0fd9affeb640.
>
>
> [ ... snip ... ]
> PM: Hibernation mode set to 'shutdown'
> PM: Marking
On Thu, May 09, 2013 at 02:50:40PM +0200, Borislav Petkov wrote:
> Looks like we're sending a resched IPI to a cpu which is not online
> yet in order to start the MCE polling timer. So the rcu* options are
> kinda unlikely to be related, AFAICT.
On a second thought, they must be somehow indirectly
On Thu, May 09, 2013 at 02:29:18PM +0200, Jiri Kosina wrote:
> Hi,
>
> I just got the warning below when resuming from hibernation with kernel
> that has NO_HZ_FULL_ALL=y. This is with topmost commit e0fd9affeb640.
Did you boot with any of the NO_HZ_FULL options on the command line,
i.e. rcu_noc
Hi,
I just got the warning below when resuming from hibernation with kernel
that has NO_HZ_FULL_ALL=y. This is with topmost commit e0fd9affeb640.
[ ... snip ... ]
PM: Hibernation mode set to 'shutdown'
PM: Marking nosave pages: [mem 0x0009e000-0x000f]
PM: Marking nosave pages: [mem 0x7c4