On 22/01/2026 3:56 pm, Jürgen Groß wrote:
> Just as a heads up: a hardware partner of SUSE has seen hard lockups
> of the Linux kernel during boot on a new machine. This machine has
> 8 NUMA nodes and 960 CPUs. The hang occurs in roughly 1.5% of the boot
> attempts in MTRR initialization of the APs.
>
> I have sent a small patch series to LKML which seems to fix the problem:
> https://lore.kernel.org/lkml/[email protected]/
>
> As Xen MTRR handling is taken from the Linux kernel, I guess the same
> problem could happen in Xen, too.
>
> As the hang always occurred while waiting for the lock, which is
> serializing the single CPUs doing MTRR initialization, my solution was
> to eliminate the lock, allowing all APs to init MTRRs in parallel.
>
> Maybe we want to do the same in Xen.
I suspect Xen might be insulated by the fact that we don't have parallel AP start (yet), so we don't have the whole system competing on the spinlock at once.

Nevertheless, there's a lot of room for improvement in our MTRR code. We still have a lot of pre-64bit logic that we haven't purged yet.

~Andrew
