On Thu, Jan 22, 2026 at 04:56:24PM +0100, Jürgen Groß wrote:
> Just as a heads up: a hardware partner of SUSE has seen hard lockups
> of the Linux kernel during boot on a new machine. This machine has
> 8 NUMA nodes and 960 CPUs. The hang occurs in roughly 1.5% of the boot
> attempts in MTRR initialization of the APs.

Do you know why you get hard lockups?  Is some watchdog triggering on
Linux?  Otherwise I think it should just be slow, but ultimately
succeed?

> I have sent a small patch series to LKML which seems to fix the problem:
> https://lore.kernel.org/lkml/[email protected]/
> 
> As Xen MTRR handling is taken from the Linux kernel, I guess the same
> problem could happen in Xen, too.
> 
> As the hang always occurred while waiting for the lock, which is
> serializing the single CPUs doing MTRR initialization, my solution was
> to eliminate the lock, allowing all APs to init MTRRs in parallel.
> 
> Maybe we want to do the same in Xen.

Hm, yes, I think Xen would be equally affected with regard to
contention on a lock while updating MTRRs.  The MTRR initialization is
deferred until all APs are up, and serialized on the
set_atomicity_lock lock.

Regards, Roger.
