Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-07-30 Thread Michel Lespinasse
On Sat, Jul 30, 2022 at 09:52:34PM +0200, Rafael J. Wysocki wrote:
> On Sat, Jul 30, 2022 at 11:48 AM Michel Lespinasse
>  wrote:
> > I'm not sure if that was the patch you meant to send though, as it
> > seems it's only adding a tracepoint so shouldn't make any difference
> > if I'm not actually using the tracepoint ?
> 
> You are right, it looks like I pasted a link to a different patch by
> mistake.  Sorry about that.
> 
> I meant this one:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=pm&id=d295ad34f236c3518634fb6403d4c0160456e470
> 
> which will appear in the final 5.19.

Thanks. I can confirm that this patch fixes the boot time debug
warnings for me. And I see that linus already merged it, nice!

--
Michel "walken" Lespinasse.



Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-07-30 Thread Michel Lespinasse
On Fri, Jul 29, 2022 at 04:59:50PM +0200, Rafael J. Wysocki wrote:
> On Fri, Jul 29, 2022 at 12:25 PM Michel Lespinasse
>  wrote:
> >
> > On Thu, Jul 28, 2022 at 10:20:53AM -0700, Paul E. McKenney wrote:
> > > On Mon, Jul 25, 2022 at 12:43:06PM -0700, Michel Lespinasse wrote:
> > > > On Wed, Jun 08, 2022 at 04:27:27PM +0200, Peter Zijlstra wrote:
> > > > > Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> > > > > Xeons") wrecked intel_idle in two ways:
> > > > >
> > > > >  - must not have tracing in idle functions
> > > > >  - must return with IRQs disabled
> > > > >
> > > > > Additionally, it added a branch for no good reason.
> > > > >
> > > > > Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on 
> > > > > Xeons")
> > > > > Signed-off-by: Peter Zijlstra (Intel) 
> > > >
> > > > After this change was introduced, I am seeing "WARNING: suspicious RCU
> > > > usage" when booting a kernel with debug options compiled in. Please
> > > > see the attached dmesg output. The issue starts with commit 32d4fd5751ea
> > > > and is still present in v5.19-rc8.
> > > >
> > > > I'm not sure, is this too late to fix or revert in v5.19 final ?
> > >
> > > I finally got a chance to take a quick look at this.
> > >
> > > The rcu_eqs_exit() function is making a lockdep complaint about
> > > being invoked with interrupts enabled.  This function is called from
> > > rcu_idle_exit(), which is an expected code path from cpuidle_enter_state()
> > > via its call to rcu_idle_exit().  Except that rcu_idle_exit() disables
> > > interrupts before invoking rcu_eqs_exit().
> > >
> > > The only other call to rcu_idle_exit() does not disable interrupts,
> > > but it is via rcu_user_exit(), which would be a very odd choice for
> > > cpuidle_enter_state().
> > >
> > > It seems unlikely, but it might be that it is the use of local_irq_save()
> > > instead of raw_local_irq_save() within rcu_idle_exit() that is causing
> > > the trouble.  If this is the case, then the commit shown below would
> > > help.  Note that this commit removes the warning from lockdep, so it
> > > is necessary to build the kernel with CONFIG_RCU_EQS_DEBUG=y to enable
> > > equivalent debugging.
> > >
> > > Could you please try your test with the -rce commit shown below applied?
> >
> > Thanks for looking into it.
> >
> > After checking out Peter's commit 32d4fd5751ea,
> > cherry picking your commit ed4ae5eff4b3,
> > and setting CONFIG_RCU_EQS_DEBUG=y in addition of my usual debug config,
> > I am now seeing this a few seconds into the boot:
> >
> > [3.010650] [ cut here ]
> > [3.010651] WARNING: CPU: 0 PID: 0 at kernel/sched/clock.c:397 
> > sched_clock_tick+0x27/0x60
> > [3.010657] Modules linked in:
> > [3.010660] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> > 5.19.0-rc1-test-5-g1be22fea0611 #1
> > [3.010662] Hardware name: LENOVO 30BFS44D00/1036, BIOS S03KT51A 
> > 01/17/2022
> > [3.010663] RIP: 0010:sched_clock_tick+0x27/0x60
> > [3.010665] Code: 1f 40 00 53 eb 02 5b c3 66 90 8b 05 2f c3 40 01 85 c0 
> > 74 18 65 8b 05 60 88 8f 4e 85 c0 75 0d 65 8b 05 a9 85 8f 4e 85 c0 74 02 
> > <0f> 0b e8 e2 6c 89 00 48 c7 c3 40 d5 02 00
> >  89 c0 48 03 1c c5 c0 98
> > [3.010667] RSP: :b2803e28 EFLAGS: 00010002
> > [3.010670] RAX: 0001 RBX: c8ce7fa07060 RCX: 
> > 0001
> > [3.010671] RDX:  RSI: b268dd21 RDI: 
> > b269ab13
> > [3.010673] RBP: 0001 R08: ffc300d5 R09: 
> > 0002be80
> > [3.010674] R10: 03625b53183a R11: a012b802b7a4 R12: 
> > b2aa9e80
> > [3.010675] R13: b2aa9e00 R14: 0001 R15: 
> > 
> > [3.010677] FS:  () GS:a012b800() 
> > knlGS:
> > [3.010678] CS:  0010 DS:  ES:  CR0: 80050033
> > [3.010680] CR2: a012f81ff000 CR3: 000c99612001 CR4: 
> > 003706f0
> > [3.010681] DR0:  DR1:  DR2: 
> > 
> > [3.010682] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [3.010683] Call Trace:

Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-07-30 Thread Michel Lespinasse
On Fri, Jul 29, 2022 at 08:26:22AM -0700, Paul E. McKenney wrote:> Would you be 
willing to try another shot in the dark, but untested
> this time?  I freely admit that this is getting strange.
> 
>   Thanx, Paul

Yes, adding this second change got rid of the boot time warning for me.

> 
> 
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index e374c0c923dae..279f557bf60bb 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -394,7 +394,7 @@ notrace void sched_clock_tick(void)
>   if (!static_branch_likely(&sched_clock_running))
>   return;
>  
> - lockdep_assert_irqs_disabled();
> + WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
>  
>   scd = this_scd();
>   __scd_stamp(scd);



Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-07-29 Thread Michel Lespinasse
On Thu, Jul 28, 2022 at 10:20:53AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 25, 2022 at 12:43:06PM -0700, Michel Lespinasse wrote:
> > On Wed, Jun 08, 2022 at 04:27:27PM +0200, Peter Zijlstra wrote:
> > > Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> > > Xeons") wrecked intel_idle in two ways:
> > > 
> > >  - must not have tracing in idle functions
> > >  - must return with IRQs disabled
> > > 
> > > Additionally, it added a branch for no good reason.
> > > 
> > > Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > 
> > After this change was introduced, I am seeing "WARNING: suspicious RCU
> > usage" when booting a kernel with debug options compiled in. Please
> > see the attached dmesg output. The issue starts with commit 32d4fd5751ea
> > and is still present in v5.19-rc8.
> > 
> > I'm not sure, is this too late to fix or revert in v5.19 final ?
> 
> I finally got a chance to take a quick look at this.
> 
> The rcu_eqs_exit() function is making a lockdep complaint about
> being invoked with interrupts enabled.  This function is called from
> rcu_idle_exit(), which is an expected code path from cpuidle_enter_state()
> via its call to rcu_idle_exit().  Except that rcu_idle_exit() disables
> interrupts before invoking rcu_eqs_exit().
> 
> The only other call to rcu_idle_exit() does not disable interrupts,
> but it is via rcu_user_exit(), which would be a very odd choice for
> cpuidle_enter_state().
> 
> It seems unlikely, but it might be that it is the use of local_irq_save()
> instead of raw_local_irq_save() within rcu_idle_exit() that is causing
> the trouble.  If this is the case, then the commit shown below would
> help.  Note that this commit removes the warning from lockdep, so it
> is necessary to build the kernel with CONFIG_RCU_EQS_DEBUG=y to enable
> equivalent debugging.
> 
> Could you please try your test with the -rce commit shown below applied?

Thanks for looking into it.

After checking out Peter's commit 32d4fd5751ea,
cherry picking your commit ed4ae5eff4b3,
and setting CONFIG_RCU_EQS_DEBUG=y in addition of my usual debug config,
I am now seeing this a few seconds into the boot:

[3.010650] [ cut here ]
[3.010651] WARNING: CPU: 0 PID: 0 at kernel/sched/clock.c:397 
sched_clock_tick+0x27/0x60
[3.010657] Modules linked in:
[3.010660] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
5.19.0-rc1-test-5-g1be22fea0611 #1
[3.010662] Hardware name: LENOVO 30BFS44D00/1036, BIOS S03KT51A 01/17/2022
[3.010663] RIP: 0010:sched_clock_tick+0x27/0x60
[3.010665] Code: 1f 40 00 53 eb 02 5b c3 66 90 8b 05 2f c3 40 01 85 c0 74 
18 65 8b 05 60 88 8f 4e 85 c0 75 0d 65 8b 05 a9 85 8f 4e 85 c0 74 02 <0f> 0b e8 
e2 6c 89 00 48 c7 c3 40 d5 02 00
 89 c0 48 03 1c c5 c0 98
[3.010667] RSP: :b2803e28 EFLAGS: 00010002
[3.010670] RAX: 0001 RBX: c8ce7fa07060 RCX: 0001
[3.010671] RDX:  RSI: b268dd21 RDI: b269ab13
[3.010673] RBP: 0001 R08: ffc300d5 R09: 0002be80
[3.010674] R10: 03625b53183a R11: a012b802b7a4 R12: b2aa9e80
[3.010675] R13: b2aa9e00 R14: 0001 R15: 
[3.010677] FS:  () GS:a012b800() 
knlGS:
[3.010678] CS:  0010 DS:  ES:  CR0: 80050033
[3.010680] CR2: a012f81ff000 CR3: 000c99612001 CR4: 003706f0
[3.010681] DR0:  DR1:  DR2: 
[3.010682] DR3:  DR6: fffe0ff0 DR7: 0400
[3.010683] Call Trace:
[3.010685]  
[3.010688]  cpuidle_enter_state+0xb7/0x4b0
[3.010694]  cpuidle_enter+0x29/0x40
[3.010697]  do_idle+0x1d4/0x210
[3.010702]  cpu_startup_entry+0x19/0x20
[3.010704]  rest_init+0x117/0x1a0
[3.010708]  arch_call_rest_init+0xa/0x10
[3.010711]  start_kernel+0x6d8/0x6ff
[3.010716]  secondary_startup_64_no_verify+0xce/0xdb
[3.010728]  
[3.010729] irq event stamp: 44179
[3.010730] hardirqs last  enabled at (44179): [] 
asm_sysvec_apic_timer_interrupt+0x1b/0x20
[3.010734] hardirqs last disabled at (44177): [] 
__do_softirq+0x3f0/0x498
[3.010736] softirqs last  enabled at (44178): [] 
__do_softirq+0x332/0x498
[3.010738] softirqs last disabled at (44171): [] 
irq_exit_rcu+0xab/0xf0
[3.010741] ---[ end trace  ]---



Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-07-25 Thread Michel Lespinasse
On Wed, Jun 08, 2022 at 04:27:27PM +0200, Peter Zijlstra wrote:
> Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> Xeons") wrecked intel_idle in two ways:
> 
>  - must not have tracing in idle functions
>  - must return with IRQs disabled
> 
> Additionally, it added a branch for no good reason.
> 
> Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
> Signed-off-by: Peter Zijlstra (Intel) 

After this change was introduced, I am seeing "WARNING: suspicious RCU
usage" when booting a kernel with debug options compiled in. Please
see the attached dmesg output. The issue starts with commit 32d4fd5751ea
and is still present in v5.19-rc8.

I'm not sure, is this too late to fix or revert in v5.19 final ?

Thanks,

--
Michel "walken" Lespinasse
[0.00] microcode: microcode updated early to revision 0x2006d05, date = 
2021-11-13
[0.00] Linux version 5.19.0-rc8-test-3-ge3a8d97e6a35 (walken@zeus) 
(gcc (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP 
PREEMPT_DYNAMIC Mon Jul 25 00:32:16 PDT 2022
[0.00] Command line: 
BOOT_IMAGE=/vmlinuz-5.19.0-rc8-test-3-ge3a8d97e6a35 
root=/dev/mapper/budai--vg-root ro consoleblank=600 quiet
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[0.00] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[0.00] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[0.00] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
[0.00] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
[0.00] x86/fpu: xstate_offset[5]:  960, xstate_sizes[5]:   64
[0.00] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]:  512
[0.00] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[0.00] x86/fpu: Enabled xstate features 0xff, context size is 2560 
bytes, using 'compacted' format.
[0.00] signal: max sigframe size: 3632
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0003efff] usable
[0.00] BIOS-e820: [mem 0x0003f000-0x0003] reserved
[0.00] BIOS-e820: [mem 0x0004-0x0009] usable
[0.00] BIOS-e820: [mem 0x000a-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x62a8efff] usable
[0.00] BIOS-e820: [mem 0x62a8f000-0x6aef4fff] reserved
[0.00] BIOS-e820: [mem 0x6aef5000-0x6b0c2fff] ACPI data
[0.00] BIOS-e820: [mem 0x6b0c3000-0x6be0dfff] ACPI NVS
[0.00] BIOS-e820: [mem 0x6be0e000-0x6c7f1fff] reserved
[0.00] BIOS-e820: [mem 0x6c7f2000-0x6c8e5fff] type 20
[0.00] BIOS-e820: [mem 0x6c8e6000-0x6c8e6fff] reserved
[0.00] BIOS-e820: [mem 0x6c8e7000-0x6c9a7fff] type 20
[0.00] BIOS-e820: [mem 0x6c9a8000-0x6e6f] usable
[0.00] BIOS-e820: [mem 0x6e70-0x8fff] reserved
[0.00] BIOS-e820: [mem 0xfd00-0xfe010fff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved
[0.00] BIOS-e820: [mem 0xfed2-0xfed44fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00107fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] efi: EFI v2.70 by American Megatrends
[0.00] efi: TPMFinalLog=0x6bd97000 ACPI=0x6b0c2000 ACPI 2.0=0x6b0c2014 
SMBIOS=0x6c47b000 SMBIOS 3.0=0x6c47a000 MEMATTR=0x5d04d018 ESRT=0x604e0718 
[0.00] SMBIOS 3.0.0 present.
[0.00] DMI: LENOVO 30BFS44D00/1036, BIOS S03KT51A 01/17/2022
[0.00] tsc: Detected 3700.000 MHz processor
[0.00] tsc: Detected 3699.850 MHz TSC
[0.000278] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.000281] e820: remove [mem 0x000a-0x000f] usable
[0.000289] last_pfn = 0x108 max_arch_pfn = 0x4
[0.000397] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[0.001626] e820: update [mem 0x7e00-0xf