RE: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-05-04 Thread Thomas Gleixner
Michael!

On Thu, Apr 27 2023 at 14:48, Michael Kelley wrote:
> From: Thomas Gleixner  Sent: Friday, April 14, 2023 4:44 PM
>
> I smoke-tested several Linux guest configurations running on Hyper-V,
> using the "kernel/git/tglx/devel.git hotplug" tree as updated on April 26th.
> No functional issues, but encountered one cosmetic issue (details below).
>
> Configurations tested:
> *  16 vCPUs and 32 vCPUs
> *  1 NUMA node and 2 NUMA nodes
> *  Parallel bring-up enabled and disabled via kernel boot line
> *  "Normal" VMs and SEV-SNP VMs running with a paravisor on Hyper-V.
> This config can use parallel bring-up because most of the SNP-ness is
> hidden in the paravisor.  I was glad to see this work properly.
>
> There's not much difference in performance with and without parallel
> bring-up on the 32 vCPU VM.   Without parallel, the time is about 26
> milliseconds.  With parallel, it's about 24 ms.   So bring-up is already
> fast in the virtual environment.

Depends on the environment :)

> The cosmetic issue is in the dmesg log, and arises because Hyper-V
> enumerates SMT CPUs differently from many other environments.  In
> a Hyper-V guest, the SMT threads in a core are numbered as pairs.  Guest
> CPUs #0 & #1 are SMT threads in a core, as are #2 & #3, etc.  With
> parallel bring-up, here's the dmesg output:
>
> [0.444345] smp: Bringing up secondary CPUs ...
> [0.445139]  node  #0, CPUs:  #2  #4  #6  #8 #10 #12 #14 #16 #18 #20 #22 #24 #26 #28 #30
> [0.454112] x86: Booting SMP configuration:
> [0.456035]   #1  #3  #5  #7  #9 #11 #13 #15 #17 #19 #21 #23 #25 #27 #29 #31
> [0.466120] smp: Brought up 1 node, 32 CPUs
> [0.467036] smpboot: Max logical packages: 1
> [0.468035] smpboot: Total of 32 processors activated (153240.06 BogoMIPS)
>
> The function announce_cpu() is specifically testing for CPU #1 to output the
> "Booting SMP configuration" message.  In a Hyper-V guest, CPU #1 is the second
> SMT thread in a core, so it isn't started until all the even-numbered CPUs are
> started.

Ah. Didn't notice that because SMT siblings are usually enumerated after
all primary ones in ACPI.
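
A minimal sketch of the ordering effect discussed above, written as a
user-space model rather than kernel code. The function render_bringup(),
the "[header]" token and the two rule names are hypothetical and exist
only for illustration; the real announce_cpu() is not reproduced here.
The sketch assumes that primary SMT threads are brought up before their
siblings, as in the Hyper-V log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical rules for when the "Booting SMP configuration" header appears. */
enum header_rule {
	HEADER_ON_CPU1,		/* header tied to CPU #1, as described in the thread */
	HEADER_ON_FIRST,	/* header tied to the first CPU announced (assumed alternative) */
};

/*
 * Render the announcements for CPUs in bring-up order into buf.
 * "[header]" stands in for the real message. Returns the output length,
 * or a value >= len if the output did not fit.
 */
size_t render_bringup(char *buf, size_t len, const int *order, size_t n,
		      enum header_rule rule)
{
	bool first = true;
	size_t off = 0;

	if (len)
		buf[0] = '\0';

	for (size_t i = 0; i < n && off < len; i++) {
		bool header = (rule == HEADER_ON_CPU1) ? (order[i] == 1) : first;

		if (header)
			off += (size_t)snprintf(buf + off, len - off, "[header]");
		first = false;
		if (off < len)
			off += (size_t)snprintf(buf + off, len - off, " #%d", order[i]);
	}
	return off;
}
```

With a bring-up order of 2, 4, 6, 1, 3, 5, the CPU #1 rule places the
header in the middle of the list, while the first-CPU rule places it at
the front. Whether the V2 fix takes this form is not stated in the thread.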

> I don't know if this cosmetic issue is worth fixing, but I thought I'd
> point it out.

That's trivial enough to fix. I'll amend the topmost patch before
posting V2.

Thanks for giving it a ride!

   tglx



RE: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-27 Thread Michael Kelley (LINUX)
From: Thomas Gleixner  Sent: Friday, April 14, 2023 4:44 PM

[snip]

> 
> Conclusion
> --
> 
> Adding the basic parallel bringup mechanism as provided by this series
> makes a lot of sense. Improving particular issues as pointed out in the
> analysis makes sense too.
> 
> But trying to solve an application specific problem fully in the kernel
> with tons of complexity, without exploring straight forward and simple
> approaches first, does not make any sense at all.
> 
> Thanks,
> 
>   tglx
> 
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  20
>  Documentation/core-api/cpu_hotplug.rst          |  13
>  arch/Kconfig                                    |  23 +
>  arch/arm/Kconfig                                |   1
>  arch/arm/include/asm/smp.h                      |   2
>  arch/arm/kernel/smp.c                           |  18
>  arch/arm64/Kconfig                              |   1
>  arch/arm64/include/asm/smp.h                    |   2
>  arch/arm64/kernel/smp.c                         |  14
>  arch/csky/Kconfig                               |   1
>  arch/csky/include/asm/smp.h                     |   2
>  arch/csky/kernel/smp.c                          |   8
>  arch/mips/Kconfig                               |   1
>  arch/mips/cavium-octeon/smp.c                   |   1
>  arch/mips/include/asm/smp-ops.h                 |   1
>  arch/mips/kernel/smp-bmips.c                    |   1
>  arch/mips/kernel/smp-cps.c                      |  14
>  arch/mips/kernel/smp.c                          |   8
>  arch/mips/loongson64/smp.c                      |   1
>  arch/parisc/Kconfig                             |   1
>  arch/parisc/kernel/process.c                    |   4
>  arch/parisc/kernel/smp.c                        |   7
>  arch/riscv/Kconfig                              |   1
>  arch/riscv/include/asm/smp.h                    |   2
>  arch/riscv/kernel/cpu-hotplug.c                 |  14
>  arch/x86/Kconfig                                |  45 --
>  arch/x86/include/asm/apic.h                     |   5
>  arch/x86/include/asm/cpu.h                      |   5
>  arch/x86/include/asm/cpumask.h                  |   5
>  arch/x86/include/asm/processor.h                |   1
>  arch/x86/include/asm/realmode.h                 |   3
>  arch/x86/include/asm/sev-common.h               |   3
>  arch/x86/include/asm/smp.h                      |  26 -
>  arch/x86/include/asm/topology.h                 |  23 -
>  arch/x86/include/asm/tsc.h                      |   2
>  arch/x86/kernel/acpi/sleep.c                    |   9
>  arch/x86/kernel/apic/apic.c                     |  22 -
>  arch/x86/kernel/callthunks.c                    |   4
>  arch/x86/kernel/cpu/amd.c                       |   2
>  arch/x86/kernel/cpu/cacheinfo.c                 |  21
>  arch/x86/kernel/cpu/common.c                    |  50 --
>  arch/x86/kernel/cpu/topology.c                  |   3
>  arch/x86/kernel/head_32.S                       |  14
>  arch/x86/kernel/head_64.S                       | 121 +
>  arch/x86/kernel/sev.c                           |   2
>  arch/x86/kernel/smp.c                           |   3
>  arch/x86/kernel/smpboot.c                       | 508
>  arch/x86/kernel/topology.c                      |  98
>  arch/x86/kernel/tsc.c                           |  20
>  arch/x86/kernel/tsc_sync.c                      |  36 -
>  arch/x86/power/cpu.c                            |  37 -
>  arch/x86/realmode/init.c                        |   3
>  arch/x86/realmode/rm/trampoline_64.S            |  27 +
>  arch/x86/xen/enlighten_hvm.c                    |  11
>  arch/x86/xen/smp_hvm.c                          |  16
>  arch/x86/xen/smp_pv.c                           |  56 +-
>  drivers/acpi/processor_idle.c                   |   4
>  include/linux/cpu.h                             |   4
>  include/linux/cpuhotplug.h                      |  17
>  kernel/cpu.c                                    | 397 +-
>  kernel/smp.c                                    |   2
>  kernel/smpboot.c                                | 163 ---
>  62 files changed, 953 insertions(+), 976 deletions(-)
> 

I smoke-tested several Linux guest configurations running on Hyper-V,
using the "kernel/git/tglx/devel.git hotplug" tree as updated on April 26th.
No functional issues, but encountered one cosmetic issue (details below).

Configurations tested:
*  16 vCPUs and 32 vCPUs
*  1 NUMA node and 2 NUMA nodes
*  Parallel bring-up enabled and disabled via kernel boot line
*  "Normal" VMs and SEV-SNP VMs running with a paravisor on Hyper-V.
This config can use parallel bring-up because most of the SNP-ness is
hidden in the paravisor.  I was glad to see this work properly.

There's not much difference in performance with and without parallel
bring-up on the 32 vCPU VM.   Without parallel, the time is about 26
milliseconds.  With 

Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-25 Thread Thomas Gleixner
On Thu, Apr 20 2023 at 17:57, Thomas Gleixner wrote:
> On Thu, Apr 20 2023 at 07:51, Sean Christopherson wrote:
> Something like the completely untested below should just work whatever
> APIC ID the BIOS decided to dice.
>
> That might just work on SEV too without that GHCB muck, but what do I
> know.

It does not.

RDMSR(X2APIC_ID) is trapped via #VC which cannot be handled at that
point. Unfortunately the GHCB protocol does not provide a RDMSR
mechanism similar to the CPUID mechanism. Neither does the secure
firmware enforce CPUID(0xb):APICID to real APIC ID consistency.

So the hypervisor can dice the APIC IDs as long as they are consistent
with the provided ACPI/MADT table.

So no parallel startup for SEV for now.

Thanks,

tglx



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-24 Thread Paul Menzel

Dear Thomas,


On 20.04.23 at 21:10, Thomas Gleixner wrote:
> On Thu, Apr 20 2023 at 18:47, Paul Menzel wrote:
>> On 20.04.23 at 17:57, Thomas Gleixner wrote:
>> I quickly applied it on top of your branch, but I am getting:
>
> As I said it was untested. I was traveling and did not have access to a
> machine to even build it completely. Fixed up and tested version below.


Sorry if it sounded like a complaint. I just wanted to give quick
feedback.


[…]

I already tested your new version on Friday, and it worked fine, with no
ten-second delay. Please find the messages attached.


Thank you all for your great work.


Kind regards,

Paul


PS: I am going to try to test your updated branch at the end of the week.

[0.00] Linux version 6.3.0-rc3-00046-g8ba643d7e1c7 (root@bf16f3646a84) (gcc (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.40) #452 SMP PREEMPT_DYNAMIC Thu Apr 20 20:15:01 UTC 2023
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-6.3.0-rc3-00046-g8ba643d7e1c7 root=/dev/sda3 rw quiet noisapnp cryptomgr.notests ipv6.disable_ipv6=1 selinux=0
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[0.00] signal: max sigframe size: 1776
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x5fe4cfff] usable
[0.00] BIOS-e820: [mem 0x5fe4d000-0x7fff] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00017eff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 3.0.0 present.
[0.00] DMI: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS 4.18-9-gb640ed51b2 04/17/2023
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 3900.440 MHz processor
[0.000756] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.000759] e820: remove [mem 0x000a-0x000f] usable
[0.000763] last_pfn = 0x17f000 max_arch_pfn = 0x4
[0.000768] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[0.000938] last_pfn = 0x5fe4d max_arch_pfn = 0x4
[0.004000] Using GB pages for direct mapping
[0.004000] ACPI: Early table checksum verification disabled
[0.004000] ACPI: RSDP 0x000F6830 24 (v02 COREv4)
[0.004000] ACPI: XSDT 0x5FE5A0E0 74 (v01 COREv4 COREBOOT CORE 20200925)
[0.004000] ACPI: FACP 0x5FE5BBC0 000114 (v06 COREv4 COREBOOT CORE 20200925)
[0.004000] ACPI: DSDT 0x5FE5A280 00193A (v02 COREv4 COREBOOT 00010001 INTL 20200925)
[0.004000] ACPI: FACS 0x5FE5A240 40
[0.004000] ACPI: FACS 0x5FE5A240 40
[0.004000] ACPI: SSDT 0x5FE5BCE0 8A (v02 COREv4 COREBOOT 002A CORE 20200925)
[0.004000] ACPI: MCFG 0x5FE5BD70 3C (v01 COREv4 COREBOOT CORE 20200925)
[0.004000] ACPI: APIC 0x5FE5BDB0 62 (v03 COREv4 COREBOOT CORE 20200925)
[0.004000] ACPI: HPET 0x5FE5BE20 38 (v01 COREv4 COREBOOT CORE 20200925)
[0.004000] ACPI: HEST 0x5FE5BE60 0001D0 (v01 COREv4 COREBOOT CORE 20200925)
[0.004000] ACPI: IVRS 0x5FE5C030 70 (v02 AMDAMDIOMMU 0001 AMD  )
[0.004000] ACPI: SSDT 0x5FE5C0A0 00051F (v02 AMDALIB 0001 MSFT 0400)
[0.004000] ACPI: SSDT 0x5FE5C5C0 0006B2 (v01 AMDPOWERNOW 0001 AMD  0001)
[0.004000] ACPI: VFCT 0x5FE5CC80 00F269 (v01 COREv4 COREBOOT CORE 20200925)
[0.004000] ACPI: Reserving FACP table memory at [mem 0x5fe5bbc0-0x5fe5bcd3]
[0.004000] ACPI: Reserving DSDT table memory at [mem 0x5fe5a280-0x5fe5bbb9]
[0.004000] ACPI: Reserving FACS table memory at [mem 0x5fe5a240-0x5fe5a27f]
[0.004000] ACPI: Reserving FACS table memory at [mem 0x5fe5a240-0x5fe5a27f]
[0.004000] ACPI: Reserving SSDT table memory at [mem 0x5fe5bce0-0x5fe5bd69]
[0.004000] ACPI: Reserving MCFG table memory at [mem 0x5fe5bd70-0x5fe5bdab]
[0.004000] ACPI: Reserving APIC table memory at [mem 0x5fe5bdb0-0x5fe5be11]
[0.004000] ACPI: Reserving HPET table memory at [mem 0x5fe5be20-0x5fe5be57]
[0.004000] ACPI: Reserving HEST table memory at [mem 

Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-21 Thread Thomas Gleixner
On Thu, Apr 20 2023 at 21:10, Thomas Gleixner wrote:
> On Thu, Apr 20 2023 at 18:47, Paul Menzel wrote:
>> On 20.04.23 at 17:57, Thomas Gleixner wrote:
>> I quickly applied it on top of your branch, but I am getting:
>
> As I said it was untested. I was traveling and did not have access to a
> machine to even build it completely. Fixed up and tested version below.

I've updated

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hotplug

for your convenience.

Thanks,

tglx



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-20 Thread Thomas Gleixner
On Thu, Apr 20 2023 at 18:47, Paul Menzel wrote:
> On 20.04.23 at 17:57, Thomas Gleixner wrote:
> I quickly applied it on top of your branch, but I am getting:

As I said it was untested. I was traveling and did not have access to a
machine to even build it completely. Fixed up and tested version below.

Thanks,

tglx
---
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -138,7 +138,8 @@
 #define   APIC_EILVT_MASKED   (1 << 16)
 
 #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
-#define APIC_BASE_MSR  0x800
+#define APIC_BASE_MSR  0x800
+#define APIC_X2APIC_ID_MSR 0x802
 #define XAPIC_ENABLE   (1UL << 11)
 #define X2APIC_ENABLE  (1UL << 10)
 
@@ -162,6 +163,7 @@
 #define APIC_CPUID(apicid) ((apicid) & XAPIC_DEST_CPUS_MASK)
 #define NUM_APIC_CLUSTERS  ((BAD_APICID + 1) >> XAPIC_DEST_CPUS_SHIFT)
 
+#ifndef __ASSEMBLY__
 /*
  * the local APIC register structure, memory mapped. Not terribly well
  * tested, but we might eventually use this one in the future - the
@@ -435,4 +437,5 @@ enum apic_delivery_modes {
APIC_DELIVERY_MODE_EXTINT   = 7,
 };
 
+#endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_APICDEF_H */
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -195,14 +195,13 @@ extern void nmi_selftest(void);
 #endif
 
 extern unsigned int smpboot_control;
+extern unsigned long apic_mmio_base;
 
 #endif /* !__ASSEMBLY__ */
 
 /* Control bits for startup_64 */
-#define STARTUP_APICID_CPUID_1F 0x80000000
-#define STARTUP_APICID_CPUID_0B 0x40000000
-#define STARTUP_APICID_CPUID_01 0x20000000
-#define STARTUP_APICID_SEV_ES  0x10000000
+#define STARTUP_READ_APICID    0x80000000
+#define STARTUP_APICID_SEV_ES  0x40000000
 
 /* Top 8 bits are reserved for control */
 #define STARTUP_PARALLEL_MASK  0xFF000000
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -101,6 +101,8 @@ static int apic_extnmi __ro_after_init =
  */
 static bool virt_ext_dest_id __ro_after_init;
 
+unsigned long apic_mmio_base __ro_after_init;
+
 /*
  * Map cpu index to physical APIC ID
  */
@@ -2164,6 +2166,7 @@ void __init register_lapic_address(unsig
 
if (!x2apic_mode) {
set_fixmap_nocache(FIX_APIC_BASE, address);
+   apic_mmio_base = APIC_BASE;
apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n",
APIC_BASE, address);
}
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -24,8 +24,10 @@
 #include "../entry/calling.h"
 #include 
 #include 
+#include 
 #include 
 #include 
+
 #include 
 
 /*
@@ -237,37 +239,25 @@ SYM_INNER_LABEL(secondary_startup_64_no_
 
 #ifdef CONFIG_SMP
/*
-* For parallel boot, the APIC ID is retrieved from CPUID, and then
-* used to look up the CPU number.  For booting a single CPU, the
-* CPU number is encoded in smpboot_control.
+* For parallel boot, the APIC ID is either retrieved the APIC or
+* from CPUID, and then used to look up the CPU number.
+* For booting a single CPU, the CPU number is encoded in
+* smpboot_control.
 *
-* Bit 31   STARTUP_APICID_CPUID_1F flag (use CPUID 0x1f)
-* Bit 30   STARTUP_APICID_CPUID_0B flag (use CPUID 0x0b)
-* Bit 29   STARTUP_APICID_CPUID_01 flag (use CPUID 0x01)
-* Bit 28   STARTUP_APICID_SEV_ES flag (CPUID 0x0b via GHCB MSR)
+* Bit 31   STARTUP_APICID_READ (Read APICID from APIC)
+* Bit 30   STARTUP_APICID_SEV_ES flag (CPUID 0x0b via GHCB MSR)
 * Bit 0-23 CPU# if STARTUP_APICID_CPUID_xx flags are not set
 */
movl    smpboot_control(%rip), %ecx
+   testl   $STARTUP_READ_APICID, %ecx
+   jnz .Lread_apicid
 #ifdef CONFIG_AMD_MEM_ENCRYPT
testl   $STARTUP_APICID_SEV_ES, %ecx
jnz .Luse_sev_cpuid_0b
 #endif
-   testl   $STARTUP_APICID_CPUID_1F, %ecx
-   jnz .Luse_cpuid_1f
-   testl   $STARTUP_APICID_CPUID_0B, %ecx
-   jnz .Luse_cpuid_0b
-   testl   $STARTUP_APICID_CPUID_01, %ecx
-   jnz .Luse_cpuid_01
andl    $(~STARTUP_PARALLEL_MASK), %ecx
jmp .Lsetup_cpu
 
-.Luse_cpuid_01:
-   mov $0x01, %eax
-   cpuid
-   mov %ebx, %edx
-   shr $24, %edx
-   jmp .Lsetup_AP
-
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 .Luse_sev_cpuid_0b:
/* Set the GHCB MSR to request CPUID 0x0B_EDX */
@@ -292,24 +282,30 @@ SYM_INNER_LABEL(secondary_startup_64_no_
jmp .Lsetup_AP
 #endif
 
-.Luse_cpuid_0b:
-   mov $0x0B, %eax
-   xorl    %ecx, %ecx
-   cpuid
+.Lread_apicid:
+   mov $MSR_IA32_APICBASE, %ecx
+   rdmsr
+   testl   $X2APIC_ENABLE, %eax
+   jnz .Lread_apicid_msr
+
+   /* Read the APIC ID from the fix-mapped MMIO space. */
+   movq    apic_mmio_base(%rip), %rcx
+   addq    $APIC_ID, %rcx
+   movl    (%rcx), %eax
+   shr 

Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-20 Thread Paul Menzel

Dear Thomas,


On 20.04.23 at 17:57, Thomas Gleixner wrote:

On Thu, Apr 20 2023 at 07:51, Sean Christopherson wrote:

On Thu, Apr 20, 2023, Thomas Gleixner wrote:

On Thu, Apr 20 2023 at 10:23, Andrew Cooper wrote:

On 20/04/2023 9:32 am, Thomas Gleixner wrote:

On Wed, Apr 19, 2023, Andrew Cooper wrote:

This was changed in x2APIC, which made the x2APIC_ID immutable.



I'm pondering to simply deny parallel mode if x2APIC is not there.


I'm not sure if that will help much.


Spoilsport.


LOL, well let me pile on then.  x2APIC IDs aren't immutable on AMD hardware.
The ID is read-only when the CPU is in x2APIC mode, but any changes made to
the ID while the CPU is in xAPIC mode survive the transition to x2APIC.  From
the APM:

  A value previously written by software to the 8-bit APIC_ID register (MMIO
  offset 30h) is converted by hardware into the appropriate format and
  reflected into the 32-bit x2APIC_ID register (MSR 802h).

FWIW, my observations from testing on bare metal are that the xAPIC ID is
effectively read-only (writes are dropped) on Intel CPUs as far back as
Haswell, while the above behavior described in the APM holds true on at least
Rome and Milan.

My guess is that Intel's uArch specific behavior of the xAPIC ID being
read-only was introduced when x2APIC came along, but I didn't test farther
back than Haswell.


I'm not so worried about modern hardware. The horrorshow is the old muck
as demonstrated and of course there is virt :)

Something like the completely untested below should just work whatever
APIC ID the BIOS decided to dice.

That might just work on SEV too without that GHCB muck, but what do I
know.

Thanks,

 tglx
---
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -138,7 +138,8 @@
  #define   APIC_EILVT_MASKED   (1 << 16)
  
  #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))

-#define APIC_BASE_MSR  0x800
+#define APIC_BASE_MSR  0x800
+#define APIC_X2APIC_ID_MSR 0x802
  #define XAPIC_ENABLE  (1UL << 11)
  #define X2APIC_ENABLE (1UL << 10)
  
@@ -162,6 +163,7 @@

  #define APIC_CPUID(apicid)((apicid) & XAPIC_DEST_CPUS_MASK)
  #define NUM_APIC_CLUSTERS ((BAD_APICID + 1) >> XAPIC_DEST_CPUS_SHIFT)
  
+#ifndef __ASSEMBLY__

  /*
   * the local APIC register structure, memory mapped. Not terribly well
   * tested, but we might eventually use this one in the future - the
@@ -435,4 +437,5 @@ enum apic_delivery_modes {
APIC_DELIVERY_MODE_EXTINT   = 7,
  };
  
+#endif /* !__ASSEMBLY__ */

  #endif /* _ASM_X86_APICDEF_H */
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -195,14 +195,13 @@ extern void nmi_selftest(void);
  #endif
  
  extern unsigned int smpboot_control;

+extern unsigned long apic_mmio_base;
  
  #endif /* !__ASSEMBLY__ */
  
  /* Control bits for startup_64 */

-#define STARTUP_APICID_CPUID_1F 0x80000000
-#define STARTUP_APICID_CPUID_0B 0x40000000
-#define STARTUP_APICID_CPUID_01 0x20000000
-#define STARTUP_APICID_SEV_ES  0x10000000
+#define STARTUP_READ_APICID    0x80000000
+#define STARTUP_APICID_SEV_ES  0x40000000
  
  /* Top 8 bits are reserved for control */

  #define STARTUP_PARALLEL_MASK 0xFF000000
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -101,6 +101,8 @@ static int apic_extnmi __ro_after_init =
   */
  static bool virt_ext_dest_id __ro_after_init;
  
+unsigned long apic_mmio_base __ro_after_init;

+
  /*
   * Map cpu index to physical APIC ID
   */
@@ -2164,6 +2166,7 @@ void __init register_lapic_address(unsig
  
  	if (!x2apic_mode) {

set_fixmap_nocache(FIX_APIC_BASE, address);
+   apic_mmio_base = APIC_BASE;
apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n",
APIC_BASE, address);
}
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -24,8 +24,10 @@
  #include "../entry/calling.h"
  #include 
  #include 
+#include 
  #include 
  #include 
+
  #include 
  
  /*

@@ -237,37 +239,24 @@ SYM_INNER_LABEL(secondary_startup_64_no_
  
  #ifdef CONFIG_SMP

/*
-* For parallel boot, the APIC ID is retrieved from CPUID, and then
-* used to look up the CPU number.  For booting a single CPU, the
-* CPU number is encoded in smpboot_control.
+* For parallel boot, the APIC ID is either retrieved the APIC or
+* from CPUID, and then used to look up the CPU number.
+* For booting a single CPU, the CPU number is encoded in
+* smpboot_control.
 *
-* Bit 31   STARTUP_APICID_CPUID_1F flag (use CPUID 0x1f)
-* Bit 30   STARTUP_APICID_CPUID_0B flag (use CPUID 0x0b)
-* Bit 29   STARTUP_APICID_CPUID_01 flag (use CPUID 0x01)
-* Bit 28   STARTUP_APICID_SEV_ES flag (CPUID 0x0b via GHCB MSR)
+* Bit 31   STARTUP_APICID_READ (Read APICID from APIC)
+* Bit 30   STARTUP_APICID_SEV_ES flag (CPUID 0x0b via 

Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-20 Thread Thomas Gleixner
On Thu, Apr 20 2023 at 07:51, Sean Christopherson wrote:
> On Thu, Apr 20, 2023, Thomas Gleixner wrote:
>> On Thu, Apr 20 2023 at 10:23, Andrew Cooper wrote:
>> > On 20/04/2023 9:32 am, Thomas Gleixner wrote:
>> > > On Wed, Apr 19, 2023, Andrew Cooper wrote:
>> > > > This was changed in x2APIC, which made the x2APIC_ID immutable.
>>
>> >> I'm pondering to simply deny parallel mode if x2APIC is not there.
>> >
>> > I'm not sure if that will help much.
>> 
>> Spoilsport.
>
> LOL, well let me pile on then.  x2APIC IDs aren't immutable on AMD hardware.
> The ID is read-only when the CPU is in x2APIC mode, but any changes made to
> the ID while the CPU is in xAPIC mode survive the transition to x2APIC.  From
> the APM:
>
>   A value previously written by software to the 8-bit APIC_ID register (MMIO
>   offset 30h) is converted by hardware into the appropriate format and
>   reflected into the 32-bit x2APIC_ID register (MSR 802h).
>
> FWIW, my observations from testing on bare metal are that the xAPIC ID is
> effectively read-only (writes are dropped) on Intel CPUs as far back as
> Haswell, while the above behavior described in the APM holds true on at least
> Rome and Milan.
>
> My guess is that Intel's uArch specific behavior of the xAPIC ID being
> read-only was introduced when x2APIC came along, but I didn't test farther
> back than Haswell.

I'm not so worried about modern hardware. The horrorshow is the old muck
as demonstrated and of course there is virt :)

Something like the completely untested below should just work whatever
APIC ID the BIOS decided to dice.

That might just work on SEV too without that GHCB muck, but what do I
know.

Thanks,

tglx
---
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -138,7 +138,8 @@
 #define   APIC_EILVT_MASKED   (1 << 16)
 
 #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
-#define APIC_BASE_MSR  0x800
+#define APIC_BASE_MSR  0x800
+#define APIC_X2APIC_ID_MSR 0x802
 #define XAPIC_ENABLE   (1UL << 11)
 #define X2APIC_ENABLE  (1UL << 10)
 
@@ -162,6 +163,7 @@
 #define APIC_CPUID(apicid) ((apicid) & XAPIC_DEST_CPUS_MASK)
 #define NUM_APIC_CLUSTERS  ((BAD_APICID + 1) >> XAPIC_DEST_CPUS_SHIFT)
 
+#ifndef __ASSEMBLY__
 /*
  * the local APIC register structure, memory mapped. Not terribly well
  * tested, but we might eventually use this one in the future - the
@@ -435,4 +437,5 @@ enum apic_delivery_modes {
APIC_DELIVERY_MODE_EXTINT   = 7,
 };
 
+#endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_APICDEF_H */
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -195,14 +195,13 @@ extern void nmi_selftest(void);
 #endif
 
 extern unsigned int smpboot_control;
+extern unsigned long apic_mmio_base;
 
 #endif /* !__ASSEMBLY__ */
 
 /* Control bits for startup_64 */
-#define STARTUP_APICID_CPUID_1F 0x80000000
-#define STARTUP_APICID_CPUID_0B 0x40000000
-#define STARTUP_APICID_CPUID_01 0x20000000
-#define STARTUP_APICID_SEV_ES  0x10000000
+#define STARTUP_READ_APICID    0x80000000
+#define STARTUP_APICID_SEV_ES  0x40000000
 
 /* Top 8 bits are reserved for control */
 #define STARTUP_PARALLEL_MASK  0xFF000000
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -101,6 +101,8 @@ static int apic_extnmi __ro_after_init =
  */
 static bool virt_ext_dest_id __ro_after_init;
 
+unsigned long apic_mmio_base __ro_after_init;
+
 /*
  * Map cpu index to physical APIC ID
  */
@@ -2164,6 +2166,7 @@ void __init register_lapic_address(unsig
 
if (!x2apic_mode) {
set_fixmap_nocache(FIX_APIC_BASE, address);
+   apic_mmio_base = APIC_BASE;
apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n",
APIC_BASE, address);
}
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -24,8 +24,10 @@
 #include "../entry/calling.h"
 #include 
 #include 
+#include 
 #include 
 #include 
+
 #include 
 
 /*
@@ -237,37 +239,24 @@ SYM_INNER_LABEL(secondary_startup_64_no_
 
 #ifdef CONFIG_SMP
/*
-* For parallel boot, the APIC ID is retrieved from CPUID, and then
-* used to look up the CPU number.  For booting a single CPU, the
-* CPU number is encoded in smpboot_control.
+* For parallel boot, the APIC ID is either retrieved the APIC or
+* from CPUID, and then used to look up the CPU number.
+* For booting a single CPU, the CPU number is encoded in
+* smpboot_control.
 *
-* Bit 31   STARTUP_APICID_CPUID_1F flag (use CPUID 0x1f)
-* Bit 30   STARTUP_APICID_CPUID_0B flag (use CPUID 0x0b)
-* Bit 29   STARTUP_APICID_CPUID_01 flag (use CPUID 0x01)
-* Bit 28   STARTUP_APICID_SEV_ES flag (CPUID 0x0b via GHCB MSR)
+* Bit 31   STARTUP_APICID_READ (Read APICID from APIC)
+* Bit 30   STARTUP_APICID_SEV_ES flag (CPUID 0x0b via GHCB MSR)
 * Bit 0-23  

Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-20 Thread Sean Christopherson
On Thu, Apr 20, 2023, Thomas Gleixner wrote:
> On Thu, Apr 20 2023 at 10:23, Andrew Cooper wrote:
> > On 20/04/2023 9:32 am, Thomas Gleixner wrote:
> > > On Wed, Apr 19, 2023, Andrew Cooper wrote:
> > > > This was changed in x2APIC, which made the x2APIC_ID immutable.
>
> >> I'm pondering to simply deny parallel mode if x2APIC is not there.
> >
> > I'm not sure if that will help much.
> 
> Spoilsport.

LOL, well let me pile on then.  x2APIC IDs aren't immutable on AMD hardware.
The ID is read-only when the CPU is in x2APIC mode, but any changes made to
the ID while the CPU is in xAPIC mode survive the transition to x2APIC.  From
the APM:

  A value previously written by software to the 8-bit APIC_ID register (MMIO
  offset 30h) is converted by hardware into the appropriate format and
  reflected into the 32-bit x2APIC_ID register (MSR 802h).

FWIW, my observations from testing on bare metal are that the xAPIC ID is
effectively read-only (writes are dropped) on Intel CPUs as far back as
Haswell, while the above behavior described in the APM holds true on at least
Rome and Milan.

My guess is that Intel's uArch specific behavior of the xAPIC ID being
read-only was introduced when x2APIC came along, but I didn't test farther
back than Haswell.
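
To make the two register formats in the APM quote concrete, here is a
minimal sketch in plain C. It is a pure function over already-read
register values, not kernel code: read_apic_id() and its parameters are
hypothetical, and the enable-bit macros simply mirror the bit positions
used in the patch context elsewhere in this thread. It assumes the 8-bit
xAPIC ID lives in bits 31:24 of the APIC_ID register and that the
x2APIC_ID MSR carries the full 32-bit ID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits in the APIC base MSR; positions as in the patch context. */
#define XAPIC_ENABLE	(1ULL << 11)
#define X2APIC_ENABLE	(1ULL << 10)

/* True if the APIC base MSR value indicates x2APIC mode. */
static inline bool apic_in_x2apic_mode(uint64_t apic_base_msr)
{
	return (apic_base_msr & X2APIC_ENABLE) != 0;
}

/*
 * Hypothetical helper: pick the APIC ID from whichever register is
 * valid for the current mode.
 *
 * apic_base_msr  value read from the APIC base MSR
 * xapic_id_reg   value of the APIC_ID register (MMIO offset 30h)
 * x2apic_id_msr  value of the x2APIC_ID register (MSR 802h)
 */
uint32_t read_apic_id(uint64_t apic_base_msr, uint32_t xapic_id_reg,
		      uint64_t x2apic_id_msr)
{
	if (apic_in_x2apic_mode(apic_base_msr))
		return (uint32_t)x2apic_id_msr;	/* full 32-bit ID */

	return xapic_id_reg >> 24;		/* 8-bit ID in bits 31:24 */
}
```

An ID written in xAPIC mode, for example 0x11 stored as 0x11000000 in the
MMIO register, is returned as 0x11 by this helper, which is the value the
APM says is reflected into the x2APIC_ID register after the transition.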



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-20 Thread Thomas Gleixner
On Thu, Apr 20 2023 at 10:23, Andrew Cooper wrote:
> On 20/04/2023 9:32 am, Thomas Gleixner wrote:
>> I'm pondering to simply deny parallel mode if x2APIC is not there.
>
> I'm not sure if that will help much.

Spoilsport.

> Just because x2APIC is there doesn't mean it's in use.  There are
> several generations of Intel system which have x2APIC but also use the
> opt-out bit in ACPI tables.  There are some machines which have
> mismatched APIC-ness settings in the BIOS->OS handover.
>
> There's very little you can do on the BSP alone to know for certain that
> the APs come out of wait-for-SIPI already in x2APIC mode.

Yeah. Reading the APIC that early is going to be entertaining too :)

> One way is the ÆPIC Leak "locked into x2APIC mode" giant security
> bodge. 

Bah.

> If the system really does have a CPU with an APIC ID above 0xfe, then
> chances are good that the APs come out consistently...

Anything else would be really magic :)



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-20 Thread Andrew Cooper
On 20/04/2023 9:32 am, Thomas Gleixner wrote:
> On Wed, Apr 19 2023 at 17:21, Andrew Cooper wrote:
>> On 19/04/2023 2:50 pm, Andrew Cooper wrote:
>> For xAPIC, the APIC_ID register is writeable (at least, model
>> specifically), and CPUID is only the value it would have had at reset. 
>> So the AP bringup logic can't actually use CPUID reliably.
>>
>> This was changed in x2APIC, which made the x2APIC_ID immutable.
>>
>> I don't see an option other than the AP bringup code query for xAPIC vs
>> x2APIC mode, and either looking at the real APIC_ID register, or falling
>> back to CPUID.
> I'm pondering to simply deny parallel mode if x2APIC is not there.

I'm not sure if that will help much.

Just because x2APIC is there doesn't mean it's in use.  There are
several generations of Intel system which have x2APIC but also use the
opt-out bit in ACPI tables.  There are some machines which have
mismatched APIC-ness settings in the BIOS->OS handover.

There's very little you can do on the BSP alone to know for certain that
the APs come out of wait-for-SIPI already in x2APIC mode.

One way is the ÆPIC Leak "locked into x2APIC mode" giant security
bodge.  If the system really does have a CPU with an APIC ID above 0xfe,
then chances are good that the APs come out consistently...

~Andrew



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-20 Thread Thomas Gleixner
On Wed, Apr 19 2023 at 17:21, Andrew Cooper wrote:
> On 19/04/2023 2:50 pm, Andrew Cooper wrote:
>> What I'm confused by is why this system boots in the first place.  I can
>> only think that it is a system which only has 4-bit APIC IDs, and happens
>> to function when bit 4 gets truncated off the top of the SIPI destination...
>
> https://www.amd.com/system/files/TechDocs/42300_15h_Mod_10h-1Fh_BKDG.pdf
>
> This system does still require the IO-APICs to be at 0, and the LAPICs
> to start at some offset, which is clearly 16 in this case.  Also, this
> system has configurable 4-bit or 8-bit wide APIC IDs, and I can't tell
> which mode is active just from the manual.

That document contradicts itself:

  "The ApicId of core j must be enumerated/assigned as:
   ApicId[core=j] = (OFFSET_IDX) * MNC + j

   Where OFFSET_IDX is an integer offset (0 to N) used to shift up the
   core ApicId values to allow room for IOAPIC devices.

   It is recommended that BIOS use the following APIC ID assignments for
   the broadest operating system support. Given N = MNC and M =
   Number_Of_IOAPICs:

   • Assign the core ApicId’s first from 0 to N-1, and the IOAPIC IDs
 from N to N+(M-1)."

Oh well. If the rest of these docs is of the same quality then it's not
a surprise that BIOSes are trainwrecks.
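
A small sketch of the two quoted rules, to show where they diverge. The
function names are hypothetical, and the example values (OFFSET_IDX = 1,
MNC = 16) are assumptions picked only because they reproduce the APIC IDs
seen on the affected machine; the document quoted above does not state
them.

```c
/*
 * First quoted rule:
 *   ApicId[core=j] = OFFSET_IDX * MNC + j
 */
unsigned int bkdg_core_apic_id(unsigned int offset_idx, unsigned int mnc,
			       unsigned int j)
{
	return offset_idx * mnc + j;
}

/*
 * Second quoted rule (the recommendation): cores take IDs 0 .. N-1 and
 * the IOAPICs follow at N .. N+(M-1). Returns the ID of IOAPIC number i.
 */
unsigned int bkdg_recommended_ioapic_id(unsigned int n, unsigned int i)
{
	return n + i;
}
```

The two rules agree only for OFFSET_IDX = 0, where core j gets ID j. With
the assumed OFFSET_IDX = 1 and MNC = 16, cores 1 to 3 get IDs 17, 18 and
19, which leaves the low IDs free for the IOAPICs and is the opposite of
the recommendation.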

> But, it does mean that the BIOS has genuinely modified the APIC IDs of
> the logical processors.  This does highlight an error in reasoning with
> the parallel bringup code.

Yes.

> For xAPIC, the APIC_ID register is writeable (at least, model
> specifically), and CPUID is only the value it would have had at reset. 
> So the AP bringup logic can't actually use CPUID reliably.
>
> This was changed in x2APIC, which made the x2APIC_ID immutable.
>
> I don't see an option other than the AP bringup code query for xAPIC vs
> x2APIC mode, and either looking at the real APIC_ID register, or falling
> back to CPUID.

I'm pondering to simply deny parallel mode if x2APIC is not there.

Thanks,

tglx



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-19 Thread Paul Menzel

Dear Thomas,


On 19.04.23 at 14:38, Thomas Gleixner wrote:

On Wed, Apr 19 2023 at 11:38, Thomas Gleixner wrote:

On Tue, Apr 18 2023 at 22:10, Paul Menzel wrote:

Am 18.04.23 um 10:40 schrieb Thomas Gleixner:

Can you please provide the output of cpuid?


Of course. Here the top, and the whole output is attached.


Thanks for the data. Can you please apply the debug patch below and
provide the dmesg output? Just the line which is added by the patch is
enough. You can boot with cpuhp.parallel=off so you don't have to wait for
10 seconds.


Borislav found a machine which also refuses to boot. It turns out
the debug patch was spot on:

[0.462724]  node  #0, CPUs:  #1
[0.462731] smpboot: Kicking AP alive: 17
[0.465723]  #2
[0.465732] smpboot: Kicking AP alive: 18
[0.467641]  #3
[0.467641] smpboot: Kicking AP alive: 19

So the kernel gets APICID 17, 18, 19 from ACPI but CPUID leaf 0x1
ebx[31:24], which is the initial APICID, has:

CPU1    0x01
CPU2    0x02
CPU3    0x03

Which means the APICID to Linux CPU number lookup based on CPUID 0x01
fails for all of them and stops them dead in the low level startup code.


I am attaching the logs for completeness. Linux is built from your 
branch with the debug print on top. The firmware, coreboot based, is 
built from [1], but it also happened with non-parallel MP init. The code 
has better debug prints (attached) though, as far as I can see. As 
Borislav is able to reproduce this too with some non-coreboot firmware, 
I assume it’s unrelated to coreboot.


```
[0.259247] smp: Bringing up secondary CPUs ...
[0.259446] x86: Booting SMP configuration:
[0.259448]  node  #0, CPUs:  #1
[0.259453] smpboot: Kicking AP alive: 17
[   10.260918] CPU1 failed to report alive state
[   10.260998] smp: Brought up 1 node, 1 CPU
[   10.261000] smpboot: Max logical packages: 2
[   10.261001] smpboot: Total of 1 processors activated (7801.09 BogoMIPS)
```


IOW, the BIOS assigns random numbers to the AP APICs for whatever
raisins, which leaves the parallel startup low level code up a creek
without a paddle, except for actually reading the APICID back from the
APIC. *SHUDDER*

I'm leaning towards disabling the CPUID leaf 0x01 based discovery and be
done with it.



Kind regards,

Paul


[1]: https://review.coreboot.org/68169

[0.00] Linux version 6.3.0-rc3-00045-g64de4df9c80b (root@bf16f3646a84) 
(gcc (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.40) #449 SMP 
PREEMPT_DYNAMIC Wed Apr 19 16:13:54 UTC 2023
[0.00] Command line: 
BOOT_IMAGE=/boot/vmlinuz-6.3.0-rc3-00045-g64de4df9c80b root=/dev/sda3 rw quiet 
noisapnp cryptomgr.notests ipv6.disable_ipv6=1 selinux=0
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] signal: max sigframe size: 1776
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x5fe3cfff] usable
[0.00] BIOS-e820: [mem 0x5fe3d000-0x7fff] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00017eff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 3.0.0 present.
[0.00] DMI: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS 4.18-15-gc782ef4345 
04/19/2023
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 3900.549 MHz processor
[0.000756] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.000759] e820: remove [mem 0x000a-0x000f] usable
[0.000763] last_pfn = 0x17f000 max_arch_pfn = 0x4
[0.000768] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[0.000940] last_pfn = 0x5fe3d max_arch_pfn = 0x4
[0.004000] Using GB pages for direct mapping
[0.004000] ACPI: Early table checksum verification disabled
[0.004000] ACPI: RSDP 0x000F6830 24 (v02 COREv4)
[0.004000] ACPI: XSDT 0x5FE4A0E0 74 (v01 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: FACP 0x5FE4BBC0 000114 (v06 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: DSDT 0x5FE4A280 00193A (v02 COREv4 COREBOOT 
00010001 INTL 20200925)
[0.004000] ACPI: FACS 0x5FE4A240 

Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-19 Thread Andrew Cooper
On 19/04/2023 2:50 pm, Andrew Cooper wrote:
> On 19/04/2023 2:43 pm, Thomas Gleixner wrote:
>> On Wed, Apr 19 2023 at 14:38, Thomas Gleixner wrote:
>>> On Wed, Apr 19 2023 at 11:38, Thomas Gleixner wrote:
>>> IOW, the BIOS assigns random numbers to the AP APICs for whatever
>>> raisins, which leaves the parallel startup low level code up a creek
>>> without a paddle, except for actually reading the APICID back from the
>>> APIC. *SHUDDER*
>> So Andrew just pointed out on IRC that this might be related to the
>> ancient issue of the 3-wire APIC bus where IO/APIC and APIC shared the
>> ID space, but that system is definitely post 3-wire APIC :)
> Doesn't mean the BIOS code was updated adequately following that.
>
> What I'm confused by is why this system boots in the first place.  I can
> only think that this is a system which only has 4-bit APIC IDs, and happens
> to function when bit 4 gets truncated off the top of the SIPI destination...

https://www.amd.com/system/files/TechDocs/42300_15h_Mod_10h-1Fh_BKDG.pdf

This system does still require the IO-APICs to be at 0, and the LAPICs
to start at some offset, which is clearly 16 in this case.  Also, this
system has configurable 4-bit or 8-bit wide APIC IDs, and I can't tell
which mode is active just from the manual.

But, it does mean that the BIOS has genuinely modified the APIC IDs of
the logical processors.  This does highlight an error in reasoning with
the parallel bringup code.

For xAPIC, the APIC_ID register is writeable (at least, model
specifically), and CPUID is only the value it would have had at reset. 
So the AP bringup logic can't actually use CPUID reliably.

This was changed in x2APIC, which made the x2APIC_ID immutable.

I don't see an option other than the AP bringup code query for xAPIC vs
x2APIC mode, and either looking at the real APIC_ID register, or falling
back to CPUID.

~Andrew
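
A minimal sketch of the mode check described above, written as plain
userspace C so the decision logic can be read in isolation. The bit
positions are the architectural ones in the IA32_APIC_BASE MSR (0x1b),
and the xAPIC ID sits in bits 31:24 of the APIC ID register. The
function name and the way the three inputs are passed in are
assumptions for illustration; real AP startup code would have to read
the MSR, the APIC register and CPUID itself.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bits of the IA32_APIC_BASE MSR (0x1b). */
#define APIC_BASE_X2APIC_ENABLE	(1ULL << 10)	/* EXTD: x2APIC mode */
#define APIC_BASE_GLOBAL_ENABLE	(1ULL << 11)	/* EN: APIC enabled */

/*
 * Pick the APIC ID an AP should use to find its CPU number.
 *  - x2APIC mode: the ID is immutable, so the CPUID value is usable.
 *  - xAPIC mode:  the ID register is writable, so read it back
 *                 (ID is in bits 31:24 of the register value).
 * This sketch only handles an enabled APIC.
 */
uint32_t resolve_apic_id(uint64_t apic_base_msr, uint32_t xapic_id_reg,
			 uint32_t cpuid_x2apic_id)
{
	assert(apic_base_msr & APIC_BASE_GLOBAL_ENABLE);
	if (apic_base_msr & APIC_BASE_X2APIC_ENABLE)
		return cpuid_x2apic_id;
	return xapic_id_reg >> 24;
}
```

For an xAPIC-mode CPU whose ID register reads 0x11000000, this returns
17 regardless of what CPUID reports.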



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-19 Thread Andrew Cooper
On 19/04/2023 2:43 pm, Thomas Gleixner wrote:
> On Wed, Apr 19 2023 at 14:38, Thomas Gleixner wrote:
>> On Wed, Apr 19 2023 at 11:38, Thomas Gleixner wrote:
>> IOW, the BIOS assigns random numbers to the AP APICs for whatever
>> raisins, which leaves the parallel startup low level code up a creek
>> without a paddle, except for actually reading the APICID back from the
>> APIC. *SHUDDER*
> So Andrew just pointed out on IRC that this might be related to the
> ancient issue of the 3-wire APIC bus where IO/APIC and APIC shared the
> ID space, but that system is definitely post 3-wire APIC :)

Doesn't mean the BIOS code was updated adequately following that.

What I'm confused by is why this system boots in the first place.  I can
only think that this is a system which only has 4-bit APIC IDs, and happens
to function when bit 4 gets truncated off the top of the SIPI destination...

~Andrew
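
The truncation guess above is easy to check by hand. A small sketch,
assuming the destination is simply masked to the APIC ID width
(truncate_apic_id() is a hypothetical helper, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low `width` bits of an APIC ID. */
uint32_t truncate_apic_id(uint32_t apic_id, unsigned int width)
{
	assert(width > 0 && width < 32);
	return apic_id & ((1u << width) - 1);
}
```

With a 4-bit width, 17, 18 and 19 become 1, 2 and 3, i.e. exactly the
initial APIC IDs CPUID reports. With an 8-bit width they stay 17, 18
and 19.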



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-19 Thread Thomas Gleixner
On Wed, Apr 19 2023 at 14:38, Thomas Gleixner wrote:
> On Wed, Apr 19 2023 at 11:38, Thomas Gleixner wrote:
> IOW, the BIOS assigns random numbers to the AP APICs for whatever
> raisins, which leaves the parallel startup low level code up a creek
> without a paddle, except for actually reading the APICID back from the
> APIC. *SHUDDER*

So Andrew just pointed out on IRC that this might be related to the
ancient issue of the 3-wire APIC bus where IO/APIC and APIC shared the
ID space, but that system is definitely post 3-wire APIC :)

Thanks,

tglx





Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-19 Thread David Woodhouse
On Wed, 2023-04-19 at 14:38 +0200, Thomas Gleixner wrote:
> 
> I'm leaning towards disabling the CPUID leaf 0x01 based discovery and be
> done with it.

Makes sense. The large machines where users really want the parallel
startup all ought to have X2APIC and hence CPUID 0x0b.





Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-19 Thread Thomas Gleixner
On Wed, Apr 19 2023 at 11:38, Thomas Gleixner wrote:
> On Tue, Apr 18 2023 at 22:10, Paul Menzel wrote:
>> Am 18.04.23 um 10:40 schrieb Thomas Gleixner:
>>> Can you please provide the output of cpuid?
>>
>> Of course. Here the top, and the whole output is attached.
>
> Thanks for the data. Can you please apply the debug patch below and
> provide the dmesg output? Just the line which is added by the patch is
> enough. You can boot with cpuhp.parallel=off so you don't have to wait for
> 10 seconds.

Borislav found a machine which also refuses to boot. It turns out
the debug patch was spot on:

[0.462724]  node  #0, CPUs:  #1
[0.462731] smpboot: Kicking AP alive: 17
[0.465723]  #2
[0.465732] smpboot: Kicking AP alive: 18
[0.467641]  #3
[0.467641] smpboot: Kicking AP alive: 19

So the kernel gets APICID 17, 18, 19 from ACPI but CPUID leaf 0x1
ebx[31:24], which is the initial APICID, has:

CPU1    0x01
CPU2    0x02
CPU3    0x03

Which means the APICID to Linux CPU number lookup based on CPUID 0x01
fails for all of them and stops them dead in the low level startup code.
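
The failing lookup can be illustrated with a few lines of userspace C.
This is a sketch under stated assumptions: table[] stands in for the
kernel's APIC ID to CPU number map (the real data structure is not
shown in this thread), and both function names are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Initial APIC ID: CPUID leaf 0x1, EBX bits 31:24. */
uint32_t initial_apic_id(uint32_t cpuid1_ebx)
{
	return cpuid1_ebx >> 24;
}

/*
 * Return the index (CPU number) whose entry equals apic_id,
 * or -1 if no entry matches.
 */
int apic_id_to_cpu(const uint32_t *table, size_t n, uint32_t apic_id)
{
	assert(table != NULL);
	for (size_t i = 0; i < n; i++)
		if (table[i] == apic_id)
			return (int)i;
	return -1;
}
```

With the ACPI IDs 17, 18, 19 from the log in slots 1 to 3 (slot 0 set
to 16 purely as a placeholder for the boot CPU), looking up the
CPUID-derived values 1, 2 and 3 returns -1 each time, while looking up
17 returns 1.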

IOW, the BIOS assigns random numbers to the AP APICs for whatever
raisins, which leaves the parallel startup low level code up a creek
without a paddle, except for actually reading the APICID back from the
APIC. *SHUDDER*

I'm leaning towards disabling the CPUID leaf 0x01 based discovery and be
done with it.

Thanks,

tglx



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-19 Thread Thomas Gleixner
Paul!

On Tue, Apr 18 2023 at 22:10, Paul Menzel wrote:
> Am 18.04.23 um 10:40 schrieb Thomas Gleixner:
>> Can you please provide the output of cpuid?
>
> Of course. Here the top, and the whole output is attached.

Thanks for the data. Can you please apply the debug patch below and
provide the dmesg output? Just the line which is added by the patch is
enough. You can boot with cpuhp.parallel=off so you don't have to wait for
10 seconds.

Thanks,

tglx
---
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -814,6 +814,7 @@ static int wakeup_secondary_cpu_via_init
 	unsigned long send_status = 0, accept_status = 0;
 	int maxlvt, num_starts, j;
 
+	pr_info("Kicking AP alive: %d\n", phys_apicid);
 	preempt_disable();
 	maxlvt = lapic_get_maxlvt();
 



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-18 Thread Paul Menzel

Dear Thomas,


Am 18.04.23 um 10:40 schrieb Thomas Gleixner:

On Tue, Apr 18 2023 at 08:58, Thomas Gleixner wrote:

On Mon, Apr 17 2023 at 19:40, Paul Menzel wrote:

Am 17.04.23 um 16:48 schrieb Thomas Gleixner:


On Mon, Apr 17 2023 at 13:19, Paul Menzel wrote:

Am 15.04.23 um 01:44 schrieb Thomas Gleixner:
[0.258193] smpboot: CPU0: AMD A6-6400K APU with Radeon(tm) HD Graphics 
(family: 0x15, model: 0x13, stepping: 0x1)
[…]
[0.259329] smp: Bringing up secondary CPUs ...
[0.259527] x86: Booting SMP configuration:
[0.259528]  node  #0, CPUs:  #1
[0.261007] After schedule_preempt_disabled
[   10.260990] CPU1 failed to report alive state


Weird. CPU1 fails to come up and report that it has reached the
synchronization point.

Does it work when you add cpuhp.parallel=off on the kernel command line?


Yes, the ten seconds delay is gone with `cpuhp.parallel=off`.

There was a patch set in the past, that worked on that device. I think
up to v4 it did *not* work at all and hung [1]. I need some days to
collect the results again.


Can you please apply the patch below on top of the pile and remove the
command line option again?


Bah. That patch does not make any sense at all. Not enough coffee.

Can you please provide the output of cpuid?


Of course. Here the top, and the whole output is attached.

```
CPU 0:
   vendor_id = "AuthenticAMD"
   version information (1/eax):
  processor type  = primary processor (0)
  family  = 0xf (15)
  model   = 0x3 (3)
  stepping id = 0x1 (1)
  extended family = 0x6 (6)
  extended model  = 0x1 (1)
  (family synth)  = 0x15 (21)
  (model synth)   = 0x13 (19)
  (simple synth)  = AMD (unknown type) (Richland RL-A1) [Piledriver], 32nm

[…]
```
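
For readers wondering how the "(family synth)" and "(model synth)"
lines follow from the raw fields, here is a small sketch of the usual
decoding of CPUID leaf 0x1 EAX. The function names are made up for
this note; the check function only uses the field values printed in
the output above.

```c
#include <assert.h>
#include <stdint.h>

/* Effective family: extended family is added when base family is 0xf. */
uint32_t cpuid_family(uint32_t eax)
{
	uint32_t base = (eax >> 8) & 0xf;
	uint32_t ext = (eax >> 20) & 0xff;

	return base == 0xf ? base + ext : base;
}

/* Effective model: extended model is prepended for base family 0x6 or 0xf. */
uint32_t cpuid_model(uint32_t eax)
{
	uint32_t base_family = (eax >> 8) & 0xf;
	uint32_t base = (eax >> 4) & 0xf;
	uint32_t ext = (eax >> 16) & 0xf;

	if (base_family == 0x6 || base_family == 0xf)
		return (ext << 4) | base;
	return base;
}

/* Fields from the output above: family 0xf, model 0x3, stepping 0x1,
 * extended family 0x6, extended model 0x1. */
void check_reported_values(void)
{
	uint32_t eax = (0x6u << 20) | (0x1u << 16) | (0xfu << 8) |
		       (0x3u << 4) | 0x1u;

	assert(cpuid_family(eax) == 0x15);
	assert(cpuid_model(eax) == 0x13);
}
```

This matches the "family: 0x15, model: 0x13" that smpboot prints for
the A6-6400K in the dmesg excerpts.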


Kind regards,

Paul

CPU 0:
   vendor_id = "AuthenticAMD"
   version information (1/eax):
  processor type  = primary processor (0)
  family  = 0xf (15)
  model   = 0x3 (3)
  stepping id = 0x1 (1)
  extended family = 0x6 (6)
  extended model  = 0x1 (1)
  (family synth)  = 0x15 (21)
  (model synth)   = 0x13 (19)
  (simple synth)  = AMD (unknown type) (Richland RL-A1) [Piledriver], 32nm
   miscellaneous (1/ebx):
  process local APIC physical ID = 0x0 (0)
  maximum IDs for CPUs in pkg= 0x2 (2)
  CLFLUSH line size  = 0x8 (8)
  brand index= 0x0 (0)
   brand id = 0x00 (0): unknown
   feature information (1/edx):
  x87 FPU on chip= true
  VME: virtual-8086 mode enhancement = true
  DE: debugging extensions   = true
  PSE: page size extensions  = true
  TSC: time stamp counter= true
  RDMSR and WRMSR support= true
  PAE: physical address extensions   = true
  MCE: machine check exception   = true
  CMPXCHG8B inst.= true
  APIC on chip   = true
  SYSENTER and SYSEXIT   = true
  MTRR: memory type range registers  = true
  PTE global bit = true
  MCA: machine check architecture= true
  CMOV: conditional move/compare instr   = true
  PAT: page attribute table  = true
  PSE-36: page size extension= true
  PSN: processor serial number   = false
  CLFLUSH instruction= true
  DS: debug store= false
  ACPI: thermal monitor and clock ctrl   = false
  MMX Technology = true
  FXSAVE/FXRSTOR = true
  SSE extensions = true
  SSE2 extensions= true
  SS: self snoop = false
  hyper-threading / multi-core supported = true
  TM: therm. monitor = false
  IA64   = false
  PBE: pending break event   = false
   feature information (1/ecx):
  PNI/SSE3: Prescott New Instructions = true
  PCLMULDQ instruction= true
  DTES64: 64-bit debug store  = false
  MONITOR/MWAIT   = true
  CPL-qualified debug store   = false
  VMX: virtual machine extensions = false
  SMX: safer mode extensions  = false
  Enhanced Intel SpeedStep Technology = false
  TM2: thermal monitor 2  = false
  SSSE3 extensions= true
  context ID: adaptive or shared L1 data  = false
  SDBG: IA32_DEBUG_INTERFACE  = false
  FMA instruction = true
  CMPXCHG16B instruction  = true
  xTPR disable= false
  PDCM: perfmon and debug = false
  PCID: process context identifiers  

Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-18 Thread Thomas Gleixner
On Tue, Apr 18 2023 at 08:58, Thomas Gleixner wrote:
> On Mon, Apr 17 2023 at 19:40, Paul Menzel wrote:
>> Am 17.04.23 um 16:48 schrieb Thomas Gleixner:
>>
>>> On Mon, Apr 17 2023 at 13:19, Paul Menzel wrote:
>>>> Am 15.04.23 um 01:44 schrieb Thomas Gleixner:
>>>> [0.258193] smpboot: CPU0: AMD A6-6400K APU with Radeon(tm) HD
>>>> Graphics (family: 0x15, model: 0x13, stepping: 0x1)
>>>> […]
>>>> [0.259329] smp: Bringing up secondary CPUs ...
>>>> [0.259527] x86: Booting SMP configuration:
>>>> [0.259528]  node  #0, CPUs:  #1
>>>> [0.261007] After schedule_preempt_disabled
>>>> [   10.260990] CPU1 failed to report alive state
>>> 
>>> Weird. CPU1 fails to come up and report that it has reached the
>>> synchronization point.
>>> 
>>> Does it work when you add cpuhp.parallel=off on the kernel command line?
>>
>> Yes, the ten seconds delay is gone with `cpuhp.parallel=off`.
>>
>> There was a patch set in the past, that worked on that device. I think 
>> up to v4 it did *not* work at all and hung [1]. I need some days to 
>> collect the results again.
>
> Can you please apply the patch below on top of the pile and remove the
> command line option again?

Bah. That patch does not make any sense at all. Not enough coffee.

Can you please provide the output of cpuid?

Thanks,

tglx







Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-18 Thread Thomas Gleixner
Paul!

On Mon, Apr 17 2023 at 19:40, Paul Menzel wrote:
> Am 17.04.23 um 16:48 schrieb Thomas Gleixner:
>
>> On Mon, Apr 17 2023 at 13:19, Paul Menzel wrote:
>>> Am 15.04.23 um 01:44 schrieb Thomas Gleixner:
>>> [0.258193] smpboot: CPU0: AMD A6-6400K APU with Radeon(tm) HD
>>> Graphics (family: 0x15, model: 0x13, stepping: 0x1)
>>> […]
>>> [0.259329] smp: Bringing up secondary CPUs ...
>>> [0.259527] x86: Booting SMP configuration:
>>> [0.259528]  node  #0, CPUs:  #1
>>> [0.261007] After schedule_preempt_disabled
>>> [   10.260990] CPU1 failed to report alive state
>> 
>> Weird. CPU1 fails to come up and report that it has reached the
>> synchronization point.
>> 
>> Does it work when you add cpuhp.parallel=off on the kernel command line?
>
> Yes, the ten seconds delay is gone with `cpuhp.parallel=off`.
>
> There was a patch set in the past, that worked on that device. I think 
> up to v4 it did *not* work at all and hung [1]. I need some days to 
> collect the results again.

Can you please apply the patch below on top of the pile and remove the
command line option again?

Thanks,


tglx
---
 kernel/cpu.c |1 +
 1 file changed, 1 insertion(+)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1777,6 +1777,7 @@ static void __init cpuhp_bringup_mask(co
 */
WARN_ON(cpuhp_invoke_callback_range(false, cpu, st, 
CPUHP_OFFLINE));
}
+   msleep(20);
}
 }
 



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-17 Thread Paul Menzel

Dear Thomas,


Am 17.04.23 um 16:48 schrieb Thomas Gleixner:


On Mon, Apr 17 2023 at 13:19, Paul Menzel wrote:

Am 15.04.23 um 01:44 schrieb Thomas Gleixner:
[0.258193] smpboot: CPU0: AMD A6-6400K APU with Radeon(tm) HD
Graphics (family: 0x15, model: 0x13, stepping: 0x1)
[…]
[0.259329] smp: Bringing up secondary CPUs ...
[0.259527] x86: Booting SMP configuration:
[0.259528]  node  #0, CPUs:  #1
[0.261007] After schedule_preempt_disabled
[   10.260990] CPU1 failed to report alive state


Weird. CPU1 fails to come up and report that it has reached the
synchronization point.

Does it work when you add cpuhp.parallel=off on the kernel command line?


Yes, the ten seconds delay is gone with `cpuhp.parallel=off`.

There was a patch set in the past, that worked on that device. I think 
up to v4 it did *not* work at all and hung [1]. I need some days to 
collect the results again.



Kind regards,

Paul


[1]: 
https://lore.kernel.org/lkml/ab28d2ce-4a9c-387d-9eda-558045a0c...@molgen.mpg.de/

[0.00] Linux version 6.3.0-rc6-00311-gde8224969f66 (root@bf16f3646a84) 
(gcc (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.40) #446 SMP 
PREEMPT_DYNAMIC Sat Apr 15 14:12:29 UTC 2023
[0.00] Command line: 
BOOT_IMAGE=/boot/vmlinuz-6.3.0-rc6-00311-gde8224969f66 root=/dev/sda3 rw quiet 
noisapnp cryptomgr.notests ipv6.disable_ipv6=1 selinux=0 cpuhp.parallel=off
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] signal: max sigframe size: 1776
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x5fe4cfff] usable
[0.00] BIOS-e820: [mem 0x5fe4d000-0x7fff] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00017eff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 3.0.0 present.
[0.00] DMI: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS 4.18-9-gb640ed51b2 
04/17/2023
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Initial usec timer 9249065
[0.00] tsc: Detected 3899.954 MHz processor
[0.000755] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.000759] e820: remove [mem 0x000a-0x000f] usable
[0.000763] last_pfn = 0x17f000 max_arch_pfn = 0x4
[0.000768] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[0.000938] last_pfn = 0x5fe4d max_arch_pfn = 0x4
[0.004000] Using GB pages for direct mapping
[0.004000] ACPI: Early table checksum verification disabled
[0.004000] ACPI: RSDP 0x000F6830 24 (v02 COREv4)
[0.004000] ACPI: XSDT 0x5FE5A0E0 74 (v01 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: FACP 0x5FE5BBC0 000114 (v06 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: DSDT 0x5FE5A280 00193A (v02 COREv4 COREBOOT 
00010001 INTL 20200925)
[0.004000] ACPI: FACS 0x5FE5A240 40
[0.004000] ACPI: FACS 0x5FE5A240 40
[0.004000] ACPI: SSDT 0x5FE5BCE0 8A (v02 COREv4 COREBOOT 
002A CORE 20200925)
[0.004000] ACPI: MCFG 0x5FE5BD70 3C (v01 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: APIC 0x5FE5BDB0 62 (v03 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: HPET 0x5FE5BE20 38 (v01 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: HEST 0x5FE5BE60 0001D0 (v01 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: IVRS 0x5FE5C030 70 (v02 AMDAMDIOMMU 
0001 AMD  )
[0.004000] ACPI: SSDT 0x5FE5C0A0 00051F (v02 AMDALIB 
0001 MSFT 0400)
[0.004000] ACPI: SSDT 0x5FE5C5C0 0006B2 (v01 AMDPOWERNOW 
0001 AMD  0001)
[0.004000] ACPI: VFCT 0x5FE5CC80 00F269 (v01 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: Reserving FACP table memory at [mem 0x5fe5bbc0-0x5fe5bcd3]
[0.004000] ACPI: Reserving DSDT table memory at [mem 0x5fe5a280-0x5fe5bbb9]
[0.004000] ACPI: Reserving FACS table memory at [mem 0x5fe5a240-0x5fe5a27f]
[0.004000] ACPI: Reserving FACS table memory at [mem 0x5fe5a240-0x5fe5a27f]
[

Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-17 Thread Thomas Gleixner
Paul!

On Mon, Apr 17 2023 at 13:19, Paul Menzel wrote:
> Am 15.04.23 um 01:44 schrieb Thomas Gleixner:
> [0.258193] smpboot: CPU0: AMD A6-6400K APU with Radeon(tm) HD 
> Graphics (family: 0x15, model: 0x13, stepping: 0x1)
> […]
> [0.259329] smp: Bringing up secondary CPUs ...
> [0.259527] x86: Booting SMP configuration:
> [0.259528]  node  #0, CPUs:  #1
> [0.261007] After schedule_preempt_disabled
> [   10.260990] CPU1 failed to report alive state

Weird. CPU1 fails to come up and report that it has reached the
synchronization point.

Does it work when you add cpuhp.parallel=off on the kernel command line?

Thanks,

tglx



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-17 Thread Paul Menzel

[Correct David’s address]

Am 17.04.23 um 13:19 schrieb Paul Menzel:

Dear Thomas,


Am 15.04.23 um 01:44 schrieb Thomas Gleixner:


This is a complete rework of the parallel bringup patch series (V17)

 
https://lore.kernel.org/lkml/20230328195758.1049469-1-usama.a...@bytedance.com


to address the issues which were discovered in review:


[…]

Thank you very much for your rework.

I tested this on the ASUS F2A85-M PRO, and get a delay of ten seconds.

```
[…]
[    0.258193] smpboot: CPU0: AMD A6-6400K APU with Radeon(tm) HD 
Graphics (family: 0x15, model: 0x13, stepping: 0x1)

[…]
[    0.259329] smp: Bringing up secondary CPUs ...
[    0.259527] x86: Booting SMP configuration:
[    0.259528]  node  #0, CPUs:  #1
[    0.261007] After schedule_preempt_disabled
[   10.260990] CPU1 failed to report alive state
[   10.261070] smp: Brought up 1 node, 1 CPU
[   10.261073] smpboot: Max logical packages: 2
[   10.261074] smpboot: Total of 1 processors activated (7800.54 BogoMIPS)
[   10.261601] devtmpfs: initialized
[   10.261697] x86/mm: Memory block size: 128MB
```

This delay has been there with v6.3-rc6-46-gde4664485abbc and some 
custom (printk) patches on top and merging dwmw2/parallel-6.2-rc3-v16 
into it. I only tested this. I think dwmw2/parallel-6.2-v17 failed to 
build for me, when trying to merge it into Linus’ master version at that 
time. I didn’t come around to report it, and you posted your rework, so 
I am replying here.


I am going to try your branch directly in the next days, but just wanted 
to report back already.



Kind regards,

Paul




Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-17 Thread Paul Menzel

Dear Thomas,


Am 15.04.23 um 01:44 schrieb Thomas Gleixner:


This is a complete rework of the parallel bringup patch series (V17)

 
https://lore.kernel.org/lkml/20230328195758.1049469-1-usama.a...@bytedance.com

to address the issues which were discovered in review:


[…]

Thank you very much for your rework.

I tested this on the ASUS F2A85-M PRO, and get a delay of ten seconds.

```
[…]
[0.258193] smpboot: CPU0: AMD A6-6400K APU with Radeon(tm) HD 
Graphics (family: 0x15, model: 0x13, stepping: 0x1)

[…]
[0.259329] smp: Bringing up secondary CPUs ...
[0.259527] x86: Booting SMP configuration:
[0.259528]  node  #0, CPUs:  #1
[0.261007] After schedule_preempt_disabled
[   10.260990] CPU1 failed to report alive state
[   10.261070] smp: Brought up 1 node, 1 CPU
[   10.261073] smpboot: Max logical packages: 2
[   10.261074] smpboot: Total of 1 processors activated (7800.54 BogoMIPS)
[   10.261601] devtmpfs: initialized
[   10.261697] x86/mm: Memory block size: 128MB
```

This delay has been there with v6.3-rc6-46-gde4664485abbc and some 
custom (printk) patches on top and merging dwmw2/parallel-6.2-rc3-v16 
into it. I only tested this. I think dwmw2/parallel-6.2-v17 failed to 
build for me, when trying to merge it into Linus’ master version at that 
time. I didn’t come around to report it, and you posted your rework, so 
I am replying here.


I am going to try your branch directly in the next days, but just wanted 
to report back already.



Kind regards,

Paul

[0.00] Linux version 6.3.0-rc6-00311-gde8224969f66 (root@bf16f3646a84) 
(gcc (Debian 11.2.0-12) 11.2.0, GNU ld (GNU Binutils for Debian) 2.40) #446 SMP 
PREEMPT_DYNAMIC Sat Apr 15 14:12:29 UTC 2023
[0.00] Command line: 
BOOT_IMAGE=/boot/vmlinuz-6.3.0-rc6-00311-gde8224969f66 root=/dev/sda3 rw quiet 
noisapnp cryptomgr.notests ipv6.disable_ipv6=1 selinux=0
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] signal: max sigframe size: 1776
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x5fe4cfff] usable
[0.00] BIOS-e820: [mem 0x5fe4d000-0x7fff] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00017eff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 3.0.0 present.
[0.00] DMI: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS 4.18-9-g9917d2d915 
04/17/2023
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Initial usec timer 6035615
[0.00] tsc: Detected 3900.273 MHz processor
[0.000756] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.000759] e820: remove [mem 0x000a-0x000f] usable
[0.000763] last_pfn = 0x17f000 max_arch_pfn = 0x4
[0.000768] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[0.000942] last_pfn = 0x5fe4d max_arch_pfn = 0x4
[0.004000] Using GB pages for direct mapping
[0.004000] ACPI: Early table checksum verification disabled
[0.004000] ACPI: RSDP 0x000F6830 24 (v02 COREv4)
[0.004000] ACPI: XSDT 0x5FE5A0E0 74 (v01 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: FACP 0x5FE5BBC0 000114 (v06 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: DSDT 0x5FE5A280 00193A (v02 COREv4 COREBOOT 
00010001 INTL 20200925)
[0.004000] ACPI: FACS 0x5FE5A240 40
[0.004000] ACPI: FACS 0x5FE5A240 40
[0.004000] ACPI: SSDT 0x5FE5BCE0 8A (v02 COREv4 COREBOOT 
002A CORE 20200925)
[0.004000] ACPI: MCFG 0x5FE5BD70 3C (v01 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: APIC 0x5FE5BDB0 62 (v03 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: HPET 0x5FE5BE20 38 (v01 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: HEST 0x5FE5BE60 0001D0 (v01 COREv4 COREBOOT 
 CORE 20200925)
[0.004000] ACPI: IVRS 0x5FE5C030 70 (v02 AMDAMDIOMMU 
0001 AMD  )
[0.004000] ACPI: SSDT 0x5FE5C0A0 00051F (v02 AMDALIB 
0001 MSFT 0400)
[0.004000] ACPI: SSDT 0x5FE5C5C0 

Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-17 Thread Andrew Cooper
On 17/04/2023 11:30 am, Peter Zijlstra wrote:
> On Sat, Apr 15, 2023 at 01:44:13AM +0200, Thomas Gleixner wrote:
>
>> Background
>> ----------
>>
>> The reason why people are interested in parallel bringup is to shorten
>> the (kexec) reboot time of cloud servers to reduce the downtime of the
>> VM tenants. There are obviously other interesting use cases for this
>> like VM startup time, embedded devices...
> ...
>
>>   There are two issues there:
>>
>> a) The death by MCE broadcast problem
>>
>>Quite some (contemporary) x86 CPU generations are affected by
>>this:
>>
>>  - MCE can be broadcasted to all CPUs and not only issued locally
>>to the CPU which triggered it.
>>
>>  - Any CPU which has CR4.MCE == 0, even if it sits in a wait
>>for INIT/SIPI state, will cause an immediate shutdown of the
>>machine if a broadcasted MCE is delivered.
> When doing kexec, CR4.MCE should already have been set to 1 by the prior
> kernel, no?

No(ish).  Purgatory can't take #MC, or NMIs for that matter.

It's cleaner to explicitly disable CR4.MCE and let the system reset
(with all the MC banks properly preserved), than it is to take #MC while
the IDT isn't in sync with the handlers, and wander off into the weeds.

~Andrew



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-17 Thread Peter Zijlstra
On Sat, Apr 15, 2023 at 01:44:13AM +0200, Thomas Gleixner wrote:

> Background
> ----------
> 
> The reason why people are interested in parallel bringup is to shorten
> the (kexec) reboot time of cloud servers to reduce the downtime of the
> VM tenants. There are obviously other interesting use cases for this
> like VM startup time, embedded devices...

...

>   There are two issues there:
> 
> a) The death by MCE broadcast problem
> 
>Quite some (contemporary) x86 CPU generations are affected by
>this:
> 
>  - MCE can be broadcasted to all CPUs and not only issued locally
>to the CPU which triggered it.
> 
>  - Any CPU which has CR4.MCE == 0, even if it sits in a wait
>for INIT/SIPI state, will cause an immediate shutdown of the
>machine if a broadcasted MCE is delivered.

When doing kexec, CR4.MCE should already have been set to 1 by the prior
kernel, no?



Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-04-17 Thread Juergen Gross

On 15.04.23 01:44, Thomas Gleixner wrote:

Hi!

This is a complete rework of the parallel bringup patch series (V17)

 
https://lore.kernel.org/lkml/20230328195758.1049469-1-usama.a...@bytedance.com

to address the issues which were discovered in review:

  1) The X86 microcode loader serialization requirement

 https://lore.kernel.org/lkml/87v8iirxun.ffs@tglx

 Microcode loading on HT enabled X86 CPUs requires that the microcode is
 loaded on the primary thread. The sibling thread(s) must be in
 quiescent state; either looping in a place which is aware of potential
 changes by the microcode update (see late loading) or in fully quiescent
 state, i.e. waiting for INIT/SIPI.

 This is required by hardware/firmware on Intel. Aside of that it's a
 vendor independent software correctness issue. Assume the following
 sequence:

 CPU1.0                   CPU1.1
                          CPUID($A)
 Load microcode.
 Changes CPUID($A, $B)
                          CPUID($B)

 CPU1.1 makes a decision on $A and $B which might be inconsistent due
 to the microcode update.

 The solution for this is to bring up the primary threads first and the
 siblings after that. Loading microcode on the siblings is a NOOP on Intel
 and on AMD it is guaranteed to only modify thread local state.

 This ensures that the APs can load microcode before reaching the alive
 synchronization point w/o doing any further x86 specific
 synchronization between the core siblings.
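
 The two-pass ordering described above could be sketched roughly as
 follows. This is only an illustration, not code from the series:
 struct cpu_desc and bringup_order() are hypothetical names, and the
 "primary" flag is assumed to be known from topology enumeration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU descriptor for illustration; not a kernel structure. */
struct cpu_desc {
	unsigned int cpu;	/* logical CPU number */
	unsigned int core;	/* core this thread belongs to */
	bool primary;		/* true for the primary SMT thread of the core */
};

/*
 * Fill @order with logical CPU numbers in bringup order: all primary
 * threads first, then all SMT siblings. Returns the number of entries
 * written. The relative order within each group is preserved.
 */
static size_t bringup_order(const struct cpu_desc *cpus, size_t nr,
			    unsigned int *order)
{
	size_t n = 0;

	/* Pass 1: primary threads, which actually load the microcode */
	for (size_t i = 0; i < nr; i++)
		if (cpus[i].primary)
			order[n++] = cpus[i].cpu;

	/* Pass 2: siblings, brought up only after all primaries were handled */
	for (size_t i = 0; i < nr; i++)
		if (!cpus[i].primary)
			order[n++] = cpus[i].cpu;

	return n;
}
```

 With an enumeration where CPUs 0/1 and 2/3 are SMT pairs, such a
 function would order the bringup as 0, 2, 1, 3.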

  2) The general design issues discussed in V16

 https://lore.kernel.org/lkml/87pm8y6yme.ffs@tglx

 The previous parallel bringup patches just glued this mechanism into
 the existing code without a deeper analysis of the synchronization
 mechanisms and without generalizing it so that the control logic is
 mostly in the core code and not made an architecture specific tinker
 space.

 Much of that had been pointed out 2 years ago in the discussions about
 the early versions of parallel bringup already.


The series is based on:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip x86/apic

and also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hotplug


Background
----------

The reason why people are interested in parallel bringup is to shorten
the (kexec) reboot time of cloud servers to reduce the downtime of the
VM tenants. There are obviously other interesting use cases for this
like VM startup time, embedded devices...

The current fully serialized bringup does the following per AP:

 1) Prepare callbacks (allocate, initialize, create threads)
 2) Kick the AP alive (e.g. INIT/SIPI on x86)
 3) Wait for the AP to report alive state
 4) Let the AP continue through the atomic bringup
 5) Let the AP run the threaded bringup to full online state

There are two significant delays:

 #3 The time for an AP to report alive state in start_secondary() on x86
    has been measured in the range between 350us and 3.5ms depending on
    vendor and CPU type, BIOS microcode size etc.

 #4 The atomic bringup does the microcode update. This has been measured
    to take up to ~8ms on the primary threads depending on the microcode
    patch size to apply.

On a two socket SKL server with 56 cores (112 threads) the boot CPU spends
on current mainline about 800ms busy waiting for the APs to come up and
apply microcode. That's more than 80% of the actual onlining procedure.

By splitting the actual bringup mechanism into two parts, this can be
reduced to waiting for the first AP to report alive. If the system is
large enough, the first AP is already waiting by the time the boot CPU
has finished the wake-up of the last AP.
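
A back-of-the-envelope model of that argument, as a sketch only: the
function names are made up, the per-AP kick cost (kick_us) is an assumed
parameter which is not given in this mail, and using one alive/atomic
latency for every AP is a simplification (the ~8ms microcode figure above
applies to primary threads).

```c
#include <stdint.h>

/*
 * Fully serialized scheme: the boot CPU kicks one AP, waits for it to
 * report alive and to finish the atomic bringup, then moves on to the next.
 */
static uint64_t serial_wait_us(unsigned int nr_aps, uint64_t alive_us,
			       uint64_t atomic_us)
{
	return (uint64_t)nr_aps * (alive_us + atomic_us);
}

/*
 * Split scheme: the boot CPU kicks all APs back to back and only then
 * starts waiting. The first AP is kicked at t = kick_us and reports alive
 * at kick_us + alive_us; the last kick completes at nr_aps * kick_us.
 * Returns how long the boot CPU still has to wait for the first AP.
 */
static uint64_t split_first_wait_us(unsigned int nr_aps, uint64_t kick_us,
				    uint64_t alive_us)
{
	uint64_t kicks_done = (uint64_t)nr_aps * kick_us;
	uint64_t first_alive = kick_us + alive_us;

	/* On a large enough system the first AP is already waiting */
	return first_alive > kicks_done ? first_alive - kicks_done : 0;
}
```

With made-up inputs, split_first_wait_us(111, 10, 3500) leaves 2400us of
waiting for the first AP, while split_first_wait_us(1000, 10, 3500) yields
0 because the first AP reports alive before the last kick is done.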


The actual solution comes in several parts
------------------------------------------

  1) [P 1-2] General cleanups (init annotations, kernel doc...)

  2) [P 3] The obvious

 Avoid pointless delay calibration when TSC is synchronized across
 sockets. That removes a whopping 100ms delay for the first CPU of a
 socket. This is an improvement independent of parallel bringup and had
 been discussed two years ago already.

  2) [P 3-6] Removal of the CPU0 hotplug hack.

 This was added 11 years ago with the promise to make this a real
 hardware mechanism, but that never materialized. As physical CPU
 hotplug is not really supported and the physical unplugging of CPU0
 never materialized there is no reason to keep this cruft around. It's
 just maintenance ballast for no value and the removal makes
 implementing the parallel bringup feature way simpler.

  3) [P 7-16] Cleanup of the existing bringup mechanism:

  a) Code reorganisation so that the general hotplug specific code is
 in smpboot.c and not sprinkled all over the place

  b) Decouple MTRR/PAT initialization from smp_callout_mask to prepare
 for replacing