An rdp's segcblist enabled state is treated differently on CPU hotplug
operations, depending on whether it is offloaded or not.

1) Not offloaded: An rdp is disabled on CPU down. All its callbacks are
   migrated and no more are supposed to be enqueued until it gets
   re-enabled on CPU up.

2) Offloaded: An rdp is not disabled on CPU down in order to let the
   CB/GP kthreads finish their jobs on remaining callbacks. Hence it is
   not re-enabled on CPU up either.

Since an rdp's offloaded state is set in stone at boot, we expect the
offloaded state to remain the same between CPU down and CPU up. So 1)
and 2) are symmetrical.

Now the offloaded state will become toggleable at runtime. Hence the new
possible asymmetrical scenarios:

3) An rdp goes into CPU down while in a not-offloaded state. It is
   later set to offloaded and finally goes into CPU up.

4) An rdp goes into CPU down while in an offloaded state. It is later
   set to not-offloaded and finally goes into CPU up.

Scenario 4) is currently well handled. The rdp isn't disabled on CPU
down and it gets re-initialized on CPU up. We require the segcblist to
be empty in order to toggle to the non-offloaded state while a CPU is
offlined.

Scenario 3) would run into trouble though, as the rdp is disabled on
CPU down and not re-initialized/re-enabled on CPU up.

In order to fix this, always re-initialize/re-enable an rdp on CPU up
unless it still has callbacks at that time. That can only happen when
the rdp went down and up in the offloaded state (case 2), which is the
only case that doesn't need re-initialization.

NOTE: The proper longer term fix will be to wait for all the offloaded
callbacks to be processed before completing CPU down operations, so
that we can unconditionally re-initialize on CPU up.

Inspired-by: Paul E. McKenney <paul...@kernel.org>
Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
Cc: Paul E. McKenney <paul...@kernel.org>
Cc: Josh Triplett <j...@joshtriplett.org>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
Cc: Lai Jiangshan <jiangshan...@gmail.com>
Cc: Joel Fernandes <j...@joelfernandes.org>
Cc: Neeraj Upadhyay <neer...@codeaurora.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Boqun Feng <boqun.f...@gmail.com>
---
 kernel/rcu/tree.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 48e8e63cdeb2..049433d0fa05 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4004,12 +4004,18 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	rdp->qlen_last_fqs_check = 0;
 	rdp->n_force_qs_snap = rcu_state.n_force_qs;
 	rdp->blimit = blimit;
-	if (rcu_segcblist_empty(&rdp->cblist) && /* No early-boot CBs? */
-	    !rcu_segcblist_is_offloaded(&rdp->cblist))
-		rcu_segcblist_init(&rdp->cblist);  /* Re-enable callbacks. */
 	rdp->dynticks_nesting = 1;	/* CPU not up, no tearing. */
 	rcu_dynticks_eqs_online();
 	raw_spin_unlock_rcu_node(rnp);		/* irqs remain disabled. */
+	/*
+	 * Lock in case the CB/GP kthreads are still around handling
+	 * old callbacks (longer term we should flush all callbacks
+	 * before completing CPU offline)
+	 */
+	rcu_nocb_lock(rdp);
+	if (rcu_segcblist_empty(&rdp->cblist)) /* No early-boot CBs? */
+		rcu_segcblist_init(&rdp->cblist);  /* Re-enable callbacks. */
+	rcu_nocb_unlock(rdp);
 	/*
 	 * Add CPU to leaf rcu_node pending-online bitmask.  Any needed
-- 
2.25.1
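
For readers who want to check the four scenarios from the changelog by
hand, here is a minimal user-space toy model. It is not kernel code:
struct toy_rdp and all toy_*() helpers are hypothetical names invented
for this sketch, locking is left out entirely, and re-initialization is
reduced to setting an "enabled" flag. It only mirrors the two CPU-up
conditions shown in the diff and assumes, as stated in the changelog,
that the segcblist is empty whenever the offloaded state is toggled
while the CPU is offline.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-CPU RCU data; NOT the kernel's struct rcu_data. */
struct toy_rdp {
	bool enabled;	/* models the segcblist "enabled" state */
	bool offloaded;	/* models the offloaded state */
	int ncbs;	/* number of callbacks still queued */
};

/*
 * CPU down: a non-offloaded rdp has its callbacks migrated and is
 * disabled; an offloaded rdp is left alone for the CB/GP kthreads.
 */
static void toy_cpu_down(struct toy_rdp *rdp)
{
	if (!rdp->offloaded) {
		rdp->ncbs = 0;
		rdp->enabled = false;
	}
}

/* CPU up, pre-patch condition: empty AND not offloaded. */
static void toy_cpu_up_old(struct toy_rdp *rdp)
{
	if (rdp->ncbs == 0 && !rdp->offloaded)
		rdp->enabled = true;
}

/* CPU up, post-patch condition: empty, whatever the offloaded state. */
static void toy_cpu_up_new(struct toy_rdp *rdp)
{
	if (rdp->ncbs == 0)
		rdp->enabled = true;
}

/*
 * Run one down/up cycle and report whether the rdp ends up enabled.
 * "pending" is the number of callbacks queued when the CPU goes down.
 */
bool toy_scenario(bool offloaded_at_down, bool offloaded_at_up,
		  int pending, bool use_new)
{
	struct toy_rdp rdp = {
		.enabled = true,
		.offloaded = offloaded_at_down,
		.ncbs = pending,
	};

	toy_cpu_down(&rdp);

	/* Assumption: a toggle while offline requires an empty list. */
	if (offloaded_at_down != offloaded_at_up)
		rdp.ncbs = 0;
	rdp.offloaded = offloaded_at_up;

	if (use_new)
		toy_cpu_up_new(&rdp);
	else
		toy_cpu_up_old(&rdp);

	return rdp.enabled;
}
```

In this model, toy_scenario(false, true, 0, false) corresponds to
scenario 3) with the old condition and leaves the rdp disabled, while
the same call with use_new set to true re-enables it. Scenarios 1), 2)
and 4) end up enabled with either condition.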