Public bug reported:

Problem:
  During kexec/reboot on ARM64 Grace systems, a CSD lock timeout occurs when
  KFENCE's toggle_allocation_gate() calls kick_all_cpus_sync() while CPU#0 is
  stuck in nbcon_atomic_flush_pending() with IRQs disabled. This causes system
  hangs or significant delays during kexec.

  The root cause is twofold:
  1. nbcon_atomic_flush_pending() keeps IRQs disabled for the entire console
     flush (including the pl011 UART busy-wait), which blocks CPU#0 from
     responding to CSD IPIs.
  2. KFENCE's toggle_allocation_gate() keeps firing during shutdown, sending
     IPIs to all CPUs via kick_all_cpus_sync().
  
https://lore.kernel.org/all/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu/
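
  To give a rough sense of how a console flush with IRQs disabled can outlast
  the roughly 5 s CSD wait seen in the log below, here is a minimal
  back-of-the-envelope sketch in userspace C. The baud rate, the 8N1 framing
  and the helper names (uart_emit_ns, flush_outlasts_csd_wait) are assumptions
  made for illustration; they are not taken from the report or from the
  kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative assumptions only: the report does not state the UART speed. */
#define ASSUMED_BAUD      115200ull      /* hypothetical console baud rate */
#define BITS_PER_CHAR     10ull          /* assumed 8N1: start + 8 data + stop */
#define CSD_WAIT_NS       5000000000ull  /* ~5 s, rounded from the reported wait */

/* Time needed to push `bytes` characters out of the UART, in nanoseconds. */
uint64_t uart_emit_ns(uint64_t bytes)
{
    return bytes * BITS_PER_CHAR * 1000000000ull / ASSUMED_BAUD;
}

/*
 * True if flushing `backlog_bytes` in one IRQ-off section would keep the
 * CPU from answering an IPI for longer than the CSD wait.
 */
bool flush_outlasts_csd_wait(uint64_t backlog_bytes)
{
    return uart_emit_ns(backlog_bytes) > CSD_WAIT_NS;
}
```

  Under these assumptions, a backlog of about 57600 characters is enough to
  keep IRQs off for 5 s; an 80000-character backlog takes close to 7 s.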

  Reproduction:
  Verified on a 176-CPU Grace system using a test module that simulates the
  nbcon_atomic_flush_pending() IRQ-off condition. With
  CONFIG_KFENCE_STATIC_KEYS=y and CONFIG_KFENCE_SAMPLE_INTERVAL=100:

  Without fix:
  smp: csd: Detected non-responsive CSD lock (#1) on CPU#145, waiting 5000000036 ns for CPU#00 do_nothing+0x0/0x10(0x0).
  smp:     csd: CSD lock (#1) unresponsive.
  Sending NMI from CPU 145 to CPUs 0:

  With all three fixes applied: clean kexec, no CSD lock timeout.

  Fix:

  1. ce2bba89566b mm/kfence: add reboot notifier to disable KFENCE on shutdown
    - Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
  2. 9bc9ccbf4c93 mm/kfence: fix potential deadlock in reboot notifier
    - Fixes: ce2bba89566b ("mm/kfence: add reboot notifier to disable KFENCE on shutdown")
  3. 9bd18e1262c0 printk/nbcon: Restore IRQ in atomic flush after each emitted record
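
  The idea behind the third commit can be modelled in userspace as follows.
  This is only a sketch of the general pattern (restore IRQs after every
  emitted record instead of once at the end), not the kernel patch itself:
  struct sim_cpu and all sim_* and flush_* names are hypothetical stand-ins,
  and the per-record costs are made-up inputs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for one CPU; not a kernel structure. */
struct sim_cpu {
    uint64_t now_ns;          /* simulated clock */
    uint64_t irq_off_since;   /* clock value when IRQs were last disabled */
    uint64_t max_irq_off_ns;  /* longest IRQ-off window seen so far */
};

static void sim_irq_save(struct sim_cpu *cpu)
{
    cpu->irq_off_since = cpu->now_ns;
}

static void sim_irq_restore(struct sim_cpu *cpu)
{
    uint64_t window = cpu->now_ns - cpu->irq_off_since;

    if (window > cpu->max_irq_off_ns)
        cpu->max_irq_off_ns = window;
}

/* Emitting a record only advances the simulated clock by its cost. */
static void sim_emit_record(struct sim_cpu *cpu, uint64_t cost_ns)
{
    cpu->now_ns += cost_ns;
}

/* One IRQ-off section around the whole backlog. */
uint64_t flush_irqs_off_throughout(const uint64_t *cost_ns, size_t n)
{
    struct sim_cpu cpu = {0};

    sim_irq_save(&cpu);
    for (size_t i = 0; i < n; i++)
        sim_emit_record(&cpu, cost_ns[i]);
    sim_irq_restore(&cpu);
    return cpu.max_irq_off_ns;
}

/* IRQs restored after each record, so a pending IPI could run in between. */
uint64_t flush_restore_each_record(const uint64_t *cost_ns, size_t n)
{
    struct sim_cpu cpu = {0};

    for (size_t i = 0; i < n; i++) {
        sim_irq_save(&cpu);
        sim_emit_record(&cpu, cost_ns[i]);
        sim_irq_restore(&cpu);
    }
    return cpu.max_irq_off_ns;
}
```

  In this model the first variant's longest IRQ-off window is the sum of all
  record costs, while the second is bounded by the single most expensive
  record.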
  
We can ignore the KFENCE commits, as we do not enable KFENCE.

** Affects: linux-nvidia-6.17 (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2146955

Title:
  CSD lock timeout during kexec/reboot when KFENCE is enabled

