Re: [patch V4 33/37] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE

2023-05-23 Thread Mark Brown
On Tue, May 23, 2023 at 01:12:26AM +0200, Thomas Gleixner wrote:

> Let me find a brown paperbag and go to sleep before I even try to
> compile the obvious fix.

That fixes the problem on TX2 - thanks!

Tested-by: Mark Brown 




Re: [patch V4 33/37] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE

2023-05-22 Thread Thomas Gleixner
On Mon, May 22 2023 at 23:27, Mark Brown wrote:
> On Mon, May 22, 2023 at 11:04:17PM +0200, Thomas Gleixner wrote:
>
>> That does not make any sense at all and my tired brain does not help
>> either.
>
>> Can you please apply the below debug patch and provide the output?
>
> Here's the log, a quick glance says the 
>
>   if (!--ncpus)
>   break;
>
> check is doing the wrong thing

Obviously.

Let me find a brown paperbag and go to sleep before I even try to
compile the obvious fix.

---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 005f863a3d2b..88a7ede322bd 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1770,9 +1770,6 @@ static void __init cpuhp_bringup_mask(const struct cpumask *mask, unsigned int n
for_each_cpu(cpu, mask) {
struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 
-   if (!--ncpus)
-   break;
-
if (cpu_up(cpu, target) && can_rollback_cpu(st)) {
/*
 * If this failed then cpu_up() might have only
@@ -1781,6 +1778,9 @@ static void __init cpuhp_bringup_mask(const struct cpumask *mask, unsigned int n
 */
                        WARN_ON(cpuhp_invoke_callback_range(false, cpu, st, CPUHP_OFFLINE));
}
+
+   if (!--ncpus)
+   break;
}
 }
 



Re: [patch V4 33/37] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE

2023-05-22 Thread Mark Brown
On Mon, May 22, 2023 at 11:04:17PM +0200, Thomas Gleixner wrote:

> That does not make any sense at all and my tired brain does not help
> either.

> Can you please apply the below debug patch and provide the output?

Here's the log, a quick glance says the 

if (!--ncpus)
break;

check is doing the wrong thing when CONFIG_NR_CPUS=256, as it is for the
arm64 defconfig, and the system actually has 256 CPUs. The Odroid looks
like the same issue: the Exynos defconfig that fails there has
CONFIG_NR_CPUS=8, which is what the board has.

[0.048542] smp: Bringing up secondary CPUs ...
[0.048545] Bringup max 256 CPUs to 235
[0.048547] Bringup CPU0 left 256
[0.048575] Bringup CPU0 0
[0.048577] Bringup CPU1 left 255
[0.124561] Detected PIPT I-cache on CPU1
[0.124586] GICv3: CPU1: found redistributor 100 region 0:0x00040108
[0.124595] GICv3: CPU1: using allocated LPI pending table @0x00088091
[0.124654] CPU1: Booted secondary processor 0x000100 [0x431f0af1]
[0.124759] Bringup CPU1 0
[0.124763] Bringup CPU2 left 254
[0.195421] Detected PIPT I-cache on CPU2
[0.195445] GICv3: CPU2: found redistributor 200 region 0:0x00040110
[0.195453] GICv3: CPU2: using allocated LPI pending table @0x00088092
[0.195510] CPU2: Booted secondary processor 0x000200 [0x431f0af1]
[0.195611] Bringup CPU2 0
[0.195615] Bringup CPU3 left 253
[0.273859] Detected PIPT I-cache on CPU3
[0.273885] GICv3: CPU3: found redistributor 300 region 0:0x00040118
[0.273893] GICv3: CPU3: using allocated LPI pending table @0x00088093
[0.273949] CPU3: Booted secondary processor 0x000300 [0x431f0af1]
[0.274050] Bringup CPU3 0
[0.274053] Bringup CPU4 left 252
[0.351345] Detected PIPT I-cache on CPU4
[0.351374] GICv3: CPU4: found redistributor 400 region 0:0x00040120
[0.351382] GICv3: CPU4: using allocated LPI pending table @0x00088094
[0.351438] CPU4: Booted secondary processor 0x000400 [0x431f0af1]
[0.351540] Bringup CPU4 0
[0.351543] Bringup CPU5 left 251
[0.431068] Detected PIPT I-cache on CPU5
[0.431099] GICv3: CPU5: found redistributor 500 region 0:0x00040128
[0.431107] GICv3: CPU5: using allocated LPI pending table @0x00088095
[0.431162] CPU5: Booted secondary processor 0x000500 [0x431f0af1]
[0.431264] Bringup CPU5 0
[0.431267] Bringup CPU6 left 250
[0.503403] Detected PIPT I-cache on CPU6
[0.503435] GICv3: CPU6: found redistributor 600 region 0:0x00040130
[0.503443] GICv3: CPU6: using allocated LPI pending table @0x00088096
[0.503498] CPU6: Booted secondary processor 0x000600 [0x431f0af1]
[0.503600] Bringup CPU6 0
[0.503604] Bringup CPU7 left 249
[0.580128] Detected PIPT I-cache on CPU7
[0.580162] GICv3: CPU7: found redistributor 700 region 0:0x00040138
[0.580171] GICv3: CPU7: using allocated LPI pending table @0x00088097
[0.580226] CPU7: Booted secondary processor 0x000700 [0x431f0af1]
[0.580328] Bringup CPU7 0
[0.580332] Bringup CPU8 left 248
[0.660158] Detected PIPT I-cache on CPU8
[0.660194] GICv3: CPU8: found redistributor 800 region 0:0x00040140
[0.660203] GICv3: CPU8: using allocated LPI pending table @0x00088098
[0.660258] CPU8: Booted secondary processor 0x000800 [0x431f0af1]
[0.660359] Bringup CPU8 0
[0.660363] Bringup CPU9 left 247
[0.741063] Detected PIPT I-cache on CPU9
[0.741102] GICv3: CPU9: found redistributor 900 region 0:0x00040148
[0.741110] GICv3: CPU9: using allocated LPI pending table @0x00088099
[0.741166] CPU9: Booted secondary processor 0x000900 [0x431f0af1]
[0.741268] Bringup CPU9 0
[0.741272] Bringup CPU10 left 246
[0.817643] Detected PIPT I-cache on CPU10
[0.817684] GICv3: CPU10: found redistributor a00 region 0:0x00040150
[0.817692] GICv3: CPU10: using allocated LPI pending table @0x0008809a
[0.817747] CPU10: Booted secondary processor 0x000a00 [0x431f0af1]
[0.817850] Bringup CPU10 0
[0.817854] Bringup CPU11 left 245
[0.896094] Detected PIPT I-cache on CPU11
[0.896137] GICv3: CPU11: found redistributor b00 region 0:0x00040158
[0.896145] GICv3: CPU11: using allocated LPI pending table @0x0008809b
[0.896202] CPU11: Booted secondary processor 0x000b00 [0x431f0af1]
[0.896304] Bringup CPU11 0
[0.896308] Bringup CPU12 left 244
[0.976966] Detected PIPT I-cache on CPU12
[0.977010] GICv3: CPU12: found redistributor c00 region 0:0x00040160
[0.977018] GICv3: CPU12: using allocated LPI pending table @0x0008809c
[0.977074] CPU12: Booted secondary processor 0x000c00 [0x431f0af1]
[0.977179] Bringup CPU12 0
[0.977183] Bringup CPU13 left 243
[1.053939] Detected PIPT I-cache on CPU1

Re: [patch V4 33/37] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE

2023-05-22 Thread Thomas Gleixner
On Mon, May 22 2023 at 20:45, Mark Brown wrote:
> On Fri, May 12, 2023 at 11:07:50PM +0200, Thomas Gleixner wrote:
>> From: Thomas Gleixner 
>> 
>> There is often significant latency in the early stages of CPU bringup, and
>> time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and
>> then waiting for it to respond before moving on to the next.
>> 
>> Allow a platform to enable parallel setup which brings all to be onlined
>> CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the
>> control CPU (BP) is single-threaded the important part is the last state
>> CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up.
>
> We're seeing a regression on ThunderX2 systems with 256 CPUs with an
> arm64 defconfig running -next which I've bisected to this patch.  Before
> this commit we bring up 256 CPUs:
>
> [   29.137225] GICv3: CPU254: found redistributor 11e03 region 1:0x000441f6
> [   29.137238] GICv3: CPU254: using allocated LPI pending table @0x0008818e
> [   29.137305] CPU254: Booted secondary processor 0x011e03 [0x431f0af1]
> [   29.292421] Detected PIPT I-cache on CPU255
> [   29.292635] GICv3: CPU255: found redistributor 11f03 region 1:0x000441fe
> [   29.292648] GICv3: CPU255: using allocated LPI pending table @0x0008818f
> [   29.292715] CPU255: Booted secondary processor 0x011f03 [0x431f0af1]
> [   29.292859] smp: Brought up 2 nodes, 256 CPUs
> [   29.292864] SMP: Total of 256 processors activated.
>
> but after we only bring up 255, missing the 256th:
>
> [   29.165888] GICv3: CPU254: found redistributor 11e03 region 1:0x000441f6
> [   29.165901] GICv3: CPU254: using allocated LPI pending table @0x0008818e
> [   29.165968] CPU254: Booted secondary processor 0x011e03 [0x431f0af1]
> [   29.166120] smp: Brought up 2 nodes, 255 CPUs
> [   29.166125] SMP: Total of 255 processors activated.
>
> I can't immediately see an issue with the patch itself; for systems
> without CONFIG_HOTPLUG_PARALLEL=y it should replace the loop over
> cpu_present_mask done by for_each_present_cpu() with an open-coded one.
> I didn't check the rest of the series yet.
>
> The KernelCI bisection bot also isolated an issue on Odroid XU3 (a 32
> bit arm system) with the final CPU of the 8 on the system not coming up
> to the same patch:
>
>   
> https://groups.io/g/kernelci-results/message/42480?p=%2C%2C%2C20%2C0%2C0%2C0%3A%3Acreated%2C0%2Call-cpus%2C20%2C2%2C0%2C9905
>
> Other boards I've checked (including some with multiple CPU clusters)
> seem to be bringing up all their CPUs so it doesn't seem to just be
> general breakage.

That does not make any sense at all and my tired brain does not help
either.

Can you please apply the below debug patch and provide the output?

Thanks,

tglx
---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 005f863a3d2b..90a9b2ae8391 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1767,13 +1767,20 @@ static void __init cpuhp_bringup_mask(const struct cpumask *mask, unsigned int n
 {
unsigned int cpu;
 
+   pr_info("Bringup max %u CPUs to %d\n", ncpus, target);
+
for_each_cpu(cpu, mask) {
struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+   int ret;
+
+   pr_info("Bringup CPU%u left %u\n", cpu, ncpus);
 
if (!--ncpus)
break;
 
-   if (cpu_up(cpu, target) && can_rollback_cpu(st)) {
+   ret = cpu_up(cpu, target);
+   pr_info("Bringup CPU%u %d\n", cpu, ret);
+   if (ret && can_rollback_cpu(st)) {
/*
 * If this failed then cpu_up() might have only
 * rolled back to CPUHP_BP_KICK_AP for the final



Re: [patch V4 33/37] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE

2023-05-22 Thread Mark Brown
On Fri, May 12, 2023 at 11:07:50PM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> There is often significant latency in the early stages of CPU bringup, and
> time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and
> then waiting for it to respond before moving on to the next.
> 
> Allow a platform to enable parallel setup which brings all to be onlined
> CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the
> control CPU (BP) is single-threaded the important part is the last state
> CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up.

We're seeing a regression on ThunderX2 systems with 256 CPUs with an
arm64 defconfig running -next which I've bisected to this patch.  Before
this commit we bring up 256 CPUs:

[   29.137225] GICv3: CPU254: found redistributor 11e03 region 1:0x000441f6
[   29.137238] GICv3: CPU254: using allocated LPI pending table @0x0008818e
[   29.137305] CPU254: Booted secondary processor 0x011e03 [0x431f0af1]
[   29.292421] Detected PIPT I-cache on CPU255
[   29.292635] GICv3: CPU255: found redistributor 11f03 region 1:0x000441fe
[   29.292648] GICv3: CPU255: using allocated LPI pending table @0x0008818f
[   29.292715] CPU255: Booted secondary processor 0x011f03 [0x431f0af1]
[   29.292859] smp: Brought up 2 nodes, 256 CPUs
[   29.292864] SMP: Total of 256 processors activated.

but after we only bring up 255, missing the 256th:

[   29.165888] GICv3: CPU254: found redistributor 11e03 region 1:0x000441f6
[   29.165901] GICv3: CPU254: using allocated LPI pending table @0x0008818e
[   29.165968] CPU254: Booted secondary processor 0x011e03 [0x431f0af1]
[   29.166120] smp: Brought up 2 nodes, 255 CPUs
[   29.166125] SMP: Total of 255 processors activated.

I can't immediately see an issue with the patch itself; for systems
without CONFIG_HOTPLUG_PARALLEL=y it should replace the loop over
cpu_present_mask done by for_each_present_cpu() with an open-coded one.
I didn't check the rest of the series yet.

The KernelCI bisection bot also isolated an issue on Odroid XU3 (a 32
bit arm system) with the final CPU of the 8 on the system not coming up
to the same patch:

  
https://groups.io/g/kernelci-results/message/42480?p=%2C%2C%2C20%2C0%2C0%2C0%3A%3Acreated%2C0%2Call-cpus%2C20%2C2%2C0%2C9905

Other boards I've checked (including some with multiple CPU clusters)
seem to be bringing up all their CPUs so it doesn't seem to just be
general breakage.

Log from my bisect:

git bisect start
# bad: [9f258af06b6268be8e960f63c3f66e88bdbbbdb0] Add linux-next specific files for 20230522
git bisect bad 9f258af06b6268be8e960f63c3f66e88bdbbbdb0
# good: [44c026a73be8038f03dbdeef028b642880cf1511] Linux 6.4-rc3
git bisect good 44c026a73be8038f03dbdeef028b642880cf1511
# good: [914db90ee0172753ab5298a48c63ac4f1fe089cf] Merge branch 'for-linux-next' of git://anongit.freedesktop.org/drm/drm-misc
git bisect good 914db90ee0172753ab5298a48c63ac4f1fe089cf
# good: [4624865b65777295cbe97cf1b98e6e49d81119d3] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git
git bisect good 4624865b65777295cbe97cf1b98e6e49d81119d3
# bad: [be7220c44fbc06825f7f122d06051630e1bf51e4] Merge branch 'for-next' of git://github.com/cminyard/linux-ipmi.git
git bisect bad be7220c44fbc06825f7f122d06051630e1bf51e4
# good: [cc677f7bec0da862a93d176524cdad5f416d58ef] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good cc677f7bec0da862a93d176524cdad5f416d58ef
# bad: [cdcc744aee1b886cbe4737798c0b8178b9ba5ae5] next-20230518/rcu
git bisect bad cdcc744aee1b886cbe4737798c0b8178b9ba5ae5
# bad: [8397dce1586a35af63fe9ea3e8fb3344758e55b5] Merge branch into tip/master: 'x86/mm'
git bisect bad 8397dce1586a35af63fe9ea3e8fb3344758e55b5
# bad: [0c7ffa32dbd6b09a87fea4ad1de8b27145dfd9a6] x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
git bisect bad 0c7ffa32dbd6b09a87fea4ad1de8b27145dfd9a6
# good: [ab24eb9abb9c60c45119370731735b79ed79f36c] x86/xen/hvm: Get rid of DEAD_FROZEN handling
git bisect good ab24eb9abb9c60c45119370731735b79ed79f36c
# good: [72b11aa7f8f93449141544cecb21b2963416902d] riscv: Switch to hotplug core state synchronization
git bisect good 72b11aa7f8f93449141544cecb21b2963416902d
# good: [f54d4434c281f38b975d58de47adeca671beff4f] x86/apic: Provide cpu_primary_thread mask
git bisect good f54d4434c281f38b975d58de47adeca671beff4f
# bad: [bea629d57d006733d155bdb65ba4867788da69b6] x86/apic: Save the APIC virtual base address
git bisect bad bea629d57d006733d155bdb65ba4867788da69b6
# bad: [18415f33e2ac4ab382cbca8b5ff82a9036b5bd49] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
git bisect bad 18415f33e2ac4ab382cbca8b5ff82a9036b5bd49
# first bad commit: [18415f33e2ac4ab382cbca8b5ff82a9036b5bd49] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE

