On 22.01.26 18:18, Roger Pau Monné wrote:
On Thu, Jan 22, 2026 at 04:56:24PM +0100, Jürgen Groß wrote:
Just as a heads up: a hardware partner of SUSE has seen hard lockups
of the Linux kernel during boot on a new machine. This machine has
8 NUMA nodes and 960 CPUs. The hang occurs in roughly 1.5% of the boot
attempts in MTRR initialization of the APs.

Do you know why you get hard lockups?  Is some watchdog triggering on
Linux?  Otherwise I think it should just be slow, but ultimately
succeed?

The NMI watchdog triggered.


I have sent a small patch series to LKML which seems to fix the problem:
https://lore.kernel.org/lkml/[email protected]/

As Xen's MTRR handling is taken from the Linux kernel, I guess the same
problem could happen in Xen, too.

As the hang always occurred while waiting for the lock that serializes
the individual CPUs doing MTRR initialization, my solution was to
eliminate the lock, allowing all APs to init MTRRs in parallel.

Maybe we want to do the same in Xen.

Hm, yes, I think Xen would be equally affected with regard to being
contended on a lock while updating MTRRs.  The MTRR initialization is
deferred until all APs are up, and serialized on the
set_atomicity_lock lock.

Right.
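To make the difference between the two schemes concrete, here is a minimal user-space model in Python. It is not Xen or Linux code. The function name `mtrr_init_all`, the `work_s` delay, and the use of a `threading.Lock` to stand in for set_atomicity_lock are all hypothetical. Each thread plays an AP: all of them wait at a barrier (the "deferred until all APs are up" step) and then run a short critical section, either under one global lock or fully in parallel. The model only shows that the lock admits one CPU at a time, so with N CPUs the last one waits behind the other N-1. It does not reproduce the lockup or claim anything about its root cause.

```python
import contextlib
import threading
import time


def mtrr_init_all(n_cpus, serialize, work_s=0.001):
    """Model deferred per-AP MTRR init, optionally serialized on one lock.

    Returns (max number of APs inside the critical section at once,
    sorted list of APs that finished).
    """
    global_lock = threading.Lock()        # hypothetical stand-in for set_atomicity_lock
    guard = global_lock if serialize else contextlib.nullcontext()
    all_up = threading.Barrier(n_cpus)    # init is deferred until every AP is up
    stats_lock = threading.Lock()         # protects the bookkeeping counters only
    active = 0
    max_active = 0
    done = []

    def ap(cpu):
        nonlocal active, max_active
        all_up.wait()
        with guard:
            with stats_lock:
                active += 1
                max_active = max(max_active, active)
            time.sleep(work_s)            # placeholder for the actual MTRR update
            with stats_lock:
                active -= 1
                done.append(cpu)

    threads = [threading.Thread(target=ap, args=(i,)) for i in range(n_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active, sorted(done)
```

With `serialize=True` the first value is always 1. With `serialize=False` several APs usually overlap, though the exact number depends on thread scheduling.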


Juergen
