Hi Jon,
I believe the real deadlock scenario is as below:
CPU0                                    CPU1
====                                    =======
spin_lock_bh(&tn->node_list_lock);
                                        <SOFTIRQ>
                                        call_timer_fn()
                                        <IRQ IN>
                                          irq_fn()
                                        <IRQ_OUT>
del_timer_sync()
//wait until tipc_node_timeout() is completed
                                        <SOFTIRQ>
                                        tipc_node_timeout()
                                          tipc_node_cleanup()
                                            spin_lock_bh(&tn->node_list_lock);
tn->node_list_lock is already held on CPU0 when del_timer_sync() is
called, but del_timer_sync() must wait until tipc_node_timeout() has
completed on the other CPU, otherwise it never returns. More precisely,
as long as tipc_node_timeout() has not finished, del_timer_sync() cannot
exit from the loop below:
del_timer_sync()
        for (;;) {
                int ret = try_to_del_timer_sync(timer);

                if (ret >= 0)
                        return ret;
                cpu_relax();
        }
However, tipc_node_timeout() on CPU1 then tries to grab
tn->node_list_lock, which was already taken on CPU0 before
del_timer_sync() was called. As a result, the system is locked up
forever.
Regarding the root cause: as long as we do not synchronously wait for
tipc_node_timeout() to complete on another CPU, the deadlock can never
happen. In other words, if we replace del_timer_sync() with del_timer()
in tipc_node_delete(), the deadlock is eliminated. But we then need to
make sure the node instance remains valid, in particular for
tipc_node_timeout(), after tipc_node_put() has been called in
tipc_node_delete(). To achieve that, we have to increase and decrease
the node reference count very carefully.
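
To make the idea more concrete, here is a rough, untested sketch of what
I mean (just an illustration, not a real patch; it assumes
tipc_node_delete_from_list() keeps dropping the list's reference
internally, as it does today, and that the remaining reference is the
one held on behalf of the timer):

static void tipc_node_delete(struct tipc_node *node)
{
        /* Remove the node from the hash table and list; this also drops
         * the reference held by the node list.
         */
        tipc_node_delete_from_list(node);

        /* Asynchronous deletion: if del_timer() deactivated a still
         * pending timer, the handler cannot run any more, so the timer's
         * reference is dropped here. Otherwise tipc_node_timeout() is
         * (or already was) running, and it has to drop that reference
         * itself instead of re-arming the timer once it sees the node
         * is gone.
         */
        if (del_timer(&node->timer))
                tipc_node_put(node);
}

tipc_node_timeout() would then need a matching check so that it never
re-arms the timer after the node has been removed from the list; that is
exactly the get/put bookkeeping we would have to verify carefully.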
In your solution, tn->node_list_lock is no longer taken in
tipc_node_timeout(); instead, grabbing the lock is moved to workqueue
context, which also kills the deadlock.

If we use the solution I suggested above, it is a bit more natural and
we do not need to additionally introduce a workqueue. But, as I said, we
have to check carefully that the node reference count is properly taken
and released.

Actually, we originally used del_timer() to delete the node timer
asynchronously, but after some issues were found we later changed the
deletion to be synchronous.

Anyway, your solution is also fine with me, so you can decide which
approach you prefer.
Thanks,
Ying
On 11/11/2018 03:05 AM, Jon Maloy wrote:
> We see the following lockdep warning:
>
> [ 2284.078521] ======================================================
> [ 2284.078604] WARNING: possible circular locking dependency detected
> [ 2284.078604] 4.19.0+ #42 Tainted: G E
> [ 2284.078604] ------------------------------------------------------
> [ 2284.078604] rmmod/254 is trying to acquire lock:
> [ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
> [ 2284.078604]
> [ 2284.078604] but task is already holding lock:
> [ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
> [ 2284.078604]
> [ 2284.078604] which lock already depends on the new lock.
> [ 2284.078604]
> [ 2284.078604]
> [ 2284.078604] the existing dependency chain (in reverse order) is:
> [ 2284.078604]
> [ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
> [ 2284.078604] tipc_node_timeout+0x20a/0x330 [tipc]
> [ 2284.078604] call_timer_fn+0xa1/0x280
> [ 2284.078604] run_timer_softirq+0x1f2/0x4d0
> [ 2284.078604] __do_softirq+0xfc/0x413
> [ 2284.078604] irq_exit+0xb5/0xc0
> [ 2284.078604] smp_apic_timer_interrupt+0xac/0x210
> [ 2284.078604] apic_timer_interrupt+0xf/0x20
> [ 2284.078604] default_idle+0x1c/0x140
> [ 2284.078604] do_idle+0x1bc/0x280
> [ 2284.078604] cpu_startup_entry+0x19/0x20
> [ 2284.078604] start_secondary+0x187/0x1c0
> [ 2284.078604] secondary_startup_64+0xa4/0xb0
> [ 2284.078604]
> [ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
> [ 2284.078604] del_timer_sync+0x34/0xa0
> [ 2284.078604] tipc_node_delete+0x1a/0x40 [tipc]
> [ 2284.078604] tipc_node_stop+0xcb/0x190 [tipc]
> [ 2284.078604] tipc_net_stop+0x154/0x170 [tipc]
> [ 2284.078604] tipc_exit_net+0x16/0x30 [tipc]
> [ 2284.078604] ops_exit_list.isra.8+0x36/0x70
> [ 2284.078604] unregister_pernet_operations+0x87/0xd0
> [ 2284.078604] unregister_pernet_subsys+0x1d/0x30
> [ 2284.078604] tipc_exit+0x11/0x6f2 [tipc]
> [ 2284.078604] __x64_sys_delete_module+0x1df/0x240
> [ 2284.078604] do_syscall_64+0x66/0x460
> [ 2284.078604] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 2284.078604]
> [ 2284.078604] other info that might help us debug this:
> [ 2284.078604]
> [ 2284.078604] Possible unsafe locking scenario:
> [ 2284.078604]
> [ 2284.078604]        CPU0                    CPU1
> [ 2284.078604]        ----                    ----
> [ 2284.078604]   lock(&(&tn->node_list_lock)->rlock);
> [ 2284.078604]                                lock((&n->timer)#2);
> [ 2284.078604]                                lock(&(&tn->node_list_lock)->rlock);
> [ 2284.078604]   lock((&n->timer)#2);
> [ 2284.078604]
> [ 2284.078604] *** DEADLOCK ***
> [ 2284.078604]
> [ 2284.078604] 3 locks held by rmmod/254:
> [ 2284.078604] #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
> [ 2284.078604] #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
> [ 2284.078604] #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
> [...}
>
> The reason is that the node timer handler sometimes needs to delete a
> node which has been disconnected for too long. To do this, it grabs
> the lock 'node_list_lock', which may at the same time be held by the
> generic node cleanup function, tipc_node_stop(), during module removal.
> Since the latter is calling del_timer_sync() inside the same lock, we
> have a potential deadlock.
>
> We fix this by moving the timer-triggered delete function to work queue
> context.
>
> Fixes: 6a939f365bdb ("tipc: Auto removal of peer down node instance")
> Signed-off-by: Jon Maloy <[email protected]>
> ---
>  net/tipc/node.c | 74 ++++++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 58 insertions(+), 16 deletions(-)
>
> diff --git a/net/tipc/node.c b/net/tipc/node.c
> index 2afc4f8..40527e6 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -75,6 +75,12 @@ struct tipc_bclink_entry {
> struct sk_buff_head namedq;
> };
>
> +struct tipc_del_node_work {
> + struct work_struct work;
> + struct net *net;
> + u32 addr;
> +};
> +
> /**
> * struct tipc_node - TIPC node structure
> * @addr: network address of node
> @@ -581,25 +587,61 @@ static void tipc_node_clear_links(struct tipc_node *node)
> }
> }
>
> -/* tipc_node_cleanup - delete nodes that does not
> - * have active links for NODE_CLEANUP_AFTER time
> +static void tipc_exec_node_delete(struct work_struct *work)
> +{
> + struct tipc_del_node_work *dwork;
> + struct tipc_net *tn;
> + struct tipc_node *n;
> + struct net *net;
> + u32 addr;
> +
> + dwork = container_of(work, struct tipc_del_node_work, work);
> + addr = dwork->addr;
> + net = dwork->net;
> + kfree(dwork);
> +
> + /* In case the module has been removed: */
> + tn = tipc_net(net);
> + if (!tn)
> + return;
> + n = tipc_node_find(net, addr);
> + if (!n || node_is_up(n))
> + return;
> +
> + spin_lock_bh(&tn->node_list_lock);
> + tipc_node_write_lock(n);
> + tipc_node_clear_links(n);
> + tipc_node_delete_from_list(n);
> + tipc_node_write_unlock(n);
> + spin_unlock_bh(&tn->node_list_lock);
> +
> + /* For work queue item */
> + tipc_node_put(n);
> +}
> +
> +/* tipc_node_cleanup - delete node if it didn't have active
> + * links for NODE_CLEANUP_AFTER millisecods
> */
> static int tipc_node_cleanup(struct tipc_node *peer)
> {
> - struct tipc_net *tn = tipc_net(peer->net);
> - bool deleted = false;
> + struct tipc_del_node_work *dwork;
>
> - spin_lock_bh(&tn->node_list_lock);
> - tipc_node_write_lock(peer);
> + if (node_is_up(peer))
> + return false;
> + if (!time_after(jiffies, peer->delete_at))
> + return false;
>
> - if (!node_is_up(peer) && time_after(jiffies, peer->delete_at)) {
> - tipc_node_clear_links(peer);
> - tipc_node_delete_from_list(peer);
> - deleted = true;
> - }
> - tipc_node_write_unlock(peer);
> - spin_unlock_bh(&tn->node_list_lock);
> - return deleted;
> + /* Delete in work queue context to avoid potential deadlock */
> + dwork = kzalloc(sizeof(*dwork), GFP_ATOMIC);
> + if (!dwork)
> + return false;
> +
> + INIT_WORK(&dwork->work, tipc_exec_node_delete);
> + dwork->addr = peer->addr;
> + dwork->net = peer->net;
> + tipc_node_get(peer);
> + schedule_work(&dwork->work);
> + return true;
> }
>
> /* tipc_node_timeout - handle expiration of node timer
> @@ -613,8 +655,8 @@ static void tipc_node_timeout(struct timer_list *t)
> int bearer_id;
> int rc = 0;
>
> - if (!node_is_up(n) && tipc_node_cleanup(n)) {
> - /*Removing the reference of Timer*/
> + if (tipc_node_cleanup(n)) {
> + /* Remove reference to this timer */
> tipc_node_put(n);
> return;
> }
>