Public bug reported:
[SRU Justification]
[Impact]
Systems on Jammy running high-throughput DMA workloads experience soft lockups
and RCU stalls in fq_flush_timeout, which result in system hangs.
The IOVA allocator in the 5.15 kernel uses a per-CPU magazine cache (rcache) to
avoid expensive rbtree operations. Each CPU has two magazines, "loaded" and
"prev", each holding up to 128 PFNs; when both are full, the "loaded" magazine
is pushed to a global depot (a fixed-size array of 32 magazines per size bin).
When the depot is also full, the overflow magazine is instead freed via
iova_magazine_free_pfns(), which acquires iova_rbtree_lock and performs up to
128 rbtree lookups and removals while holding it.
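To make the capacity limit concrete, the following userspace C sketch models the 5.15 insert order (loaded, then prev, then depot, then free). It is a toy model under stated assumptions: magazines are reduced to fill counts, there is a single CPU and a single size bin, no locking is modelled, and the names (toy_rcache, toy_cpu_rcache, toy_rcache_insert) are hypothetical rather than kernel APIs.

```c
#include <assert.h>

#define MAG_SIZE        128	/* PFNs per magazine (5.15 value quoted above) */
#define MAX_GLOBAL_MAGS 32	/* fixed depot capacity per size bin in 5.15 */

/* Toy per-CPU cache: each magazine is reduced to its fill level. */
struct toy_cpu_rcache {
	int loaded;	/* PFNs in the "loaded" magazine */
	int prev;	/* PFNs in the "prev" magazine */
};

/* Toy shared state for one size bin. */
struct toy_rcache {
	int depot_size;		/* full magazines parked in the fixed depot */
	long overflows;		/* magazines freed because the depot was full */
	long rbtree_ops;	/* PFNs returned to the rbtree by those frees */
};

/* Recycle one PFN: try loaded, then prev, then the depot, else free. */
static void toy_rcache_insert(struct toy_rcache *rc, struct toy_cpu_rcache *cpu)
{
	assert(cpu->loaded <= MAG_SIZE && cpu->prev <= MAG_SIZE);

	if (cpu->loaded < MAG_SIZE) {
		cpu->loaded++;
		return;
	}
	if (cpu->prev < MAG_SIZE) {
		/* swap(prev, loaded), then push into the new loaded magazine */
		int tmp = cpu->loaded;

		cpu->loaded = cpu->prev;
		cpu->prev = tmp;
		cpu->loaded++;
		return;
	}
	/* Both full: park "loaded" in the depot, or free it if the depot is full. */
	if (rc->depot_size < MAX_GLOBAL_MAGS) {
		rc->depot_size++;
	} else {
		rc->overflows++;
		rc->rbtree_ops += MAG_SIZE;	/* models iova_magazine_free_pfns() */
	}
	cpu->loaded = 1;	/* fresh magazine now holds the inserted PFN */
}
```

Under these assumptions a CPU absorbs 2 × 128 + 32 × 128 = 4352 PFNs before the first overflow; from then on, every 128th insert sends a full magazine through the free path.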
The problem manifests through the flush-queue timer. Every 10ms,
fq_flush_timeout fires in softirq context and drains all CPUs' flush queues in a
single non-preemptible loop. Because __iova_rcache_insert uses raw_cpu_ptr(),
which resolves to the rcache of the CPU it is currently running on, all
recycled IOVAs are funnelled into the timer CPU's magazines, no matter which
CPU's flush queue they came from. Once those magazines and the shared depot are
full, every subsequent overflowing magazine goes through the expensive
iova_magazine_free_pfns(), costing up to 128 rbtree operations under
iova_rbtree_lock each, all within the same softirq:
fq_flush_timeout (timer softirq on CPU X)
  iova_domain_flush
  for_each_possible_cpu(cpu):
    fq_ring_free (up to IOVA_FQ_SIZE=256 entries)
      free_iova_fast
        __iova_rcache_insert (into CPU X's rcache via raw_cpu_ptr)
          if depot_size >= 32:
            iova_magazine_free_pfns (128 rbtree ops under iova_rbtree_lock)
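The call tree implies a worst-case bound on how much rbtree work a single timer tick can do. The helper below computes it under hypothetical worst-case assumptions: every possible CPU's flush queue is full, all IOVAs fall into the same rcache size bin, and the timer CPU's loaded and prev magazines plus the depot are already full, so every 128 recycled PFNs force one overflow. The function name is made up for illustration.

```c
#include <assert.h>

#define IOVA_FQ_SIZE  256	/* flush-queue entries per CPU (5.15) */
#define IOVA_MAG_SIZE 128	/* PFNs per magazine (5.15) */

/*
 * Upper bound on rbtree removals inside one fq_flush_timeout softirq,
 * assuming a saturated timer-CPU rcache and a single size bin: each
 * IOVA_MAG_SIZE recycled PFNs push one full magazine through the
 * overflow free path.
 */
static long worst_case_rbtree_ops(int nr_possible_cpus)
{
	long recycled, overflows;

	assert(nr_possible_cpus > 0);
	recycled = (long)nr_possible_cpus * IOVA_FQ_SIZE;
	overflows = (recycled + IOVA_MAG_SIZE - 1) / IOVA_MAG_SIZE;
	return overflows * IOVA_MAG_SIZE;
}
```

For example, worst_case_rbtree_ops(64) evaluates to 16384: 128 overflowing magazines, each freed in a batch of 128 rbtree removals under iova_rbtree_lock, all inside one non-preemptible softirq.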
The RCU stall trace from an affected system on 5.15.0-117 confirms this exact
path with reliable stack frames:
  native_queued_spin_lock_slowpath+0x2c/0x40
  _raw_spin_lock_irqsave+0x3d/0x50
  iova_magazine_free_pfns.part.0+0x20/0xd0
  free_iova_fast+0x219/0x290
  fq_ring_free+0xa8/0x170
  fq_flush_timeout+0x74/0xc0
  call_timer_fn
  run_timer_softirq
  __do_softirq
[Fix]
Backport upstream commits, adapted for the 5.15 codebase:
1. 911aa1245da8 ("iommu/iova: Make the rcache depot scale better")
2. 233045378dbb ("iommu/iova: Manage the depot list size")
Cherry-pick upstream commit:
3. 7591c127f3b1 ("kmemleak: iommu/iova: fix transient kmemleak false positive")
Patch 1 replaces the fixed-size depot array with an unbounded singly-linked
list. Full magazines are always pushed to the depot, regardless of how many it
already holds. As a result, the overflow path and its inline call to
iova_magazine_free_pfns() are eliminated from __iova_rcache_insert.
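A minimal userspace sketch of the idea behind patch 1, an unbounded singly-linked depot: push never fails, so there is no overflow branch to fall into. The types and function names (struct mag, struct rcache, depot_push, depot_pop, depot_free_all) are hypothetical stand-ins rather than the upstream definitions, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAG_SIZE 128

/* Hypothetical stand-in for a magazine; "next" is only used in the depot. */
struct mag {
	size_t size;
	struct mag *next;
	unsigned long pfns[MAG_SIZE];
};

/* Hypothetical stand-in for one size bin's shared state. */
struct rcache {
	struct mag *depot;		/* head of an unbounded singly-linked list */
	unsigned long depot_size;	/* number of magazines on the list */
};

/* Always succeeds: no capacity check, so no overflow branch exists. */
static void depot_push(struct rcache *rc, struct mag *m)
{
	m->next = rc->depot;
	rc->depot = m;
	rc->depot_size++;
}

/* Take the most recently pushed magazine; NULL if the depot is empty. */
static struct mag *depot_pop(struct rcache *rc)
{
	struct mag *m = rc->depot;

	if (!m)
		return NULL;
	assert(rc->depot_size > 0);
	rc->depot = m->next;
	m->next = NULL;
	rc->depot_size--;
	return m;
}

/* Free every parked magazine, e.g. when the domain is torn down. */
static void depot_free_all(struct rcache *rc)
{
	struct mag *m;

	while ((m = depot_pop(rc)) != NULL)
		free(m);
}
```

Because depot_push() has no capacity check, the insert path never has to free a magazine inline; keeping the list bounded becomes a separate concern.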
Patch 2 prevents unbounded memory growth of the now-unlimited depot by adding a
delayed_work item, run in the background by a workqueue, that trims the depot
whenever it holds more than num_online_cpus() magazines. This reclaim runs in
process context, which is preemptible and sleepable, and therefore cannot cause
soft lockups.
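The trimming policy can be illustrated with the sketch below, which models only the decision logic: each run removes one magazine while the depot holds more than the online-CPU count and asks to be run again. Freeing exactly one magazine per run, and the toy_depot_* names, are assumptions of this sketch; in the kernel the equivalent work would run from a workqueue in process context rather than from a plain loop.

```c
#include <assert.h>
#include <stdbool.h>

#define MAG_SIZE 128	/* PFNs in each full magazine parked in the depot */

/* Toy depot state: only the counters matter for the trimming decision. */
struct toy_depot {
	unsigned long depot_size;	/* magazines currently parked */
	unsigned long pfns_freed;	/* PFNs returned to the rbtree so far */
};

/*
 * One run of a hypothetical trim worker.
 * Returns true if the caller should schedule another run.
 */
static bool toy_depot_trim_once(struct toy_depot *d, unsigned long online_cpus)
{
	if (d->depot_size <= online_cpus)
		return false;		/* within budget: nothing to do */

	d->depot_size--;		/* stands in for popping one magazine */
	d->pfns_freed += MAG_SIZE;	/* stands in for freeing its PFNs */
	return true;			/* more may remain: run again later */
}

/* Drive the worker until it stops rescheduling itself; returns the run count. */
static unsigned long toy_depot_trim_all(struct toy_depot *d, unsigned long online_cpus)
{
	unsigned long runs = 0;

	while (toy_depot_trim_once(d, online_cpus))
		runs++;
	assert(d->depot_size <= online_cpus);
	return runs;
}
```

For instance, starting from 100 parked magazines with 8 CPUs online, toy_depot_trim_all() performs 92 trimming runs and leaves 8 magazines in the depot. Spreading reclaim over many short runs outside the insert path is what keeps the softirq path free of bulk rbtree work.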
Patch 3 fixes a kmemleak false positive introduced by patch 1.
Adaptations made for 5.15 backport:
- Patches 1 and 2 modify both drivers/iommu/iova.c and include/linux/iova.h
because in 5.15, struct iova_rcache is defined in the header (upstream moved
it into iova.c in a prior refactoring series not present in 5.15).
- The rcache init function in 5.15 is init_iova_rcaches() (static void, called
unconditionally from init_iova_domain) rather than upstream's
iova_domain_init_rcaches() (exported, returns int with error cleanup). The
backport preserves the 5.15 function signature and error handling pattern.
- 5.15 uses top-of-function variable declarations rather than upstream's C99
in-loop declarations.
- The core logic (depot linked-list, overflow elimination, background worker) is
identical between upstream and the backport.
[Test Plan]
TODO
[Where problems could occur]
Regression risk is low because the changes in patches 1 and 2 are confined to
the IOVA rcache depot internals (drivers/iommu/iova.c and include/linux/iova.h).
IOVA allocation and free semantics are unchanged from the caller's perspective.
Patch 3 is purely diagnostic: it only affects kmemleak reporting and has no
runtime effect. Moreover, the fix is already available on Noble and Resolute,
where it has been thoroughly tested.
[Other Info]
Similar issues have been reported in [0], [1], and [2]. Since the fix is already
integrated into Noble and subsequent releases, backporting it brings the same
protection against these hangs to users of the 5.15 kernel on Jammy.
[0] - https://lkml.rescloud.iu.edu/2304.1/01286.html
[1] - https://mailweb.openeuler.org/archives/list/[email protected]/message/FAOBDKYWJ5SNADM625H2A4YCOPRAIRGB/
[2] - https://access.redhat.com/solutions/7031930
** Affects: linux (Ubuntu)
Importance: Undecided
Status: Fix Released
** Affects: linux (Ubuntu Jammy)
Importance: Undecided
Assignee: Munir Siddiqui (munirsid)
Status: In Progress
** Affects: linux (Ubuntu Noble)
Importance: Undecided
Status: Fix Released
** Affects: linux (Ubuntu Resolute)
Importance: Undecided
Status: Fix Released
** Also affects: linux (Ubuntu Jammy)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Resolute)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Noble)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Jammy)
Status: New => In Progress
** Changed in: linux (Ubuntu Noble)
Status: New => Fix Released
** Changed in: linux (Ubuntu Resolute)
Status: New => Fix Released
** Changed in: linux (Ubuntu)
Status: New => Fix Released
** Changed in: linux (Ubuntu Jammy)
Assignee: (unassigned) => Munir Siddiqui (munirsid)
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2158106
Title:
[Jammy] soft lockups and rcu stalls in fq_flush_timeout causing system
hangs
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs