On Thu, Jan 22, 2026 at 04:56:24PM +0100, Jürgen Groß wrote:
> Just as a heads up: a hardware partner of SUSE has seen hard lockups
> of the Linux kernel during boot on a new machine. This machine has
> 8 NUMA nodes and 960 CPUs. The hang occurs in roughly 1.5% of the boot
> attempts in MTRR initialization of the APs.

Do you know why you get hard lockups? Is some watchdog triggering on
Linux? Otherwise I think it should just be slow, but ultimately
succeed?

> I have sent a small patch series to LKML which seems to fix the problem:
> https://lore.kernel.org/lkml/[email protected]/
>
> As Xen MTRR handling is taken from the Linux kernel, I guess the same
> problem could happen in Xen, too.
>
> As the hang always occurred while waiting for the lock, which is
> serializing the single CPUs doing MTRR initialization, my solution was
> to eliminate the lock, allowing all APs to init MTRRs in parallel.
>
> Maybe we want to do the same in Xen.

Hm, yes, I think Xen would be equally affected with regard to
contending on a lock while updating MTRRs. The MTRR initialization is
deferred until all APs are up, and serialized on set_atomicity_lock.

Regards, Roger.
