Hi Ying,

My reasoning was exactly like yours. I also thought about going back to del_timer(), but then remembered all the problems we had with that, and how well it is working now, so I opted for the more cowardly (and uglier) solution. But I'll give this a second glance. I don't like to use work queues unless absolutely necessary.
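For the record, something like the sketch below is roughly what I mean by "going back to del_timer()". It is only my illustration of the reference hand-off you describe, not a real patch; the function and symbol names are the existing ones from net/tipc/node.c, but the body is invented here, and it deliberately leaves open the window where tipc_node_timeout() is already running and may re-arm the timer, which is exactly where we got burnt last time:

/* Rough sketch only, not the posted patch: asynchronous timer deletion
 * with an explicit reference hand-off instead of the work queue.
 */
static void tipc_node_delete(struct tipc_node *node)
{
        tipc_node_delete_from_list(node);       /* as today, drops the list/hash reference */

        /* del_timer() does not wait for a running tipc_node_timeout(),
         * so the caller can keep holding tn->node_list_lock without
         * risking the CPU0/CPU1 scenario below.
         */
        if (del_timer(&node->timer))
                tipc_node_put(node);    /* timer was still pending: drop its reference here */

        /* If the timer was not pending, tipc_node_timeout() may still be
         * running or about to re-arm; it then has to drop the timer's
         * reference itself once it decides not to re-arm. Getting that
         * hand-off right is the part that bit us before.
         */
}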
///jon

> -----Original Message-----
> From: Ying Xue <[email protected]>
> Sent: 13-Nov-18 07:50
> To: Jon Maloy <[email protected]>; tipc-
> [email protected]
> Cc: Tung Quang Nguyen <[email protected]>; Hoang Huu Le
> <[email protected]>; Jon Maloy <[email protected]>;
> [email protected]; Canh Duc Luu <[email protected]>; Gordan
> Mihaljevic <[email protected]>;
> [email protected]
> Subject: Re: [net-next 2/2] tipc: fix lockdep warning during node delete
>
> Hi Jon,
>
> I believe the real deadlock scenario is as below:
>
>           CPU0                                  CPU1
>           ====                                  =======
> spin_lock_bh(&tn->node_list_lock);
>                                                 <SOFTIRQ>
>                                                 call_timer_fn()
>                                                 <IRQ IN>
>                                                 irq_fn()
>                                                 <IRQ_OUT>
> del_timer_sync()
> //wait until tipc_node_timeout() is completed
>                                                 <SOFTIRQ>
>                                                 tipc_node_timeout()
>                                                   tipc_node_cleanup()
>                                                     spin_lock_bh(&tn->node_list_lock);
>
> tn->node_list_lock has been held, but del_timer_sync() must wait until
> tipc_node_timeout() is completed on other CPUs, otherwise it will never
> return. Strictly speaking, if tipc_node_timeout() is not finished,
> del_timer_sync() cannot exit from the loop below:
>
> del_timer_sync()
>     for (;;) {
>         int ret = try_to_del_timer_sync(timer);
>         if (ret >= 0)
>             return ret;
>         cpu_relax();
>     }
>
> However, tipc_node_timeout() will try to grab tn->node_list_lock on
> CPU1, while the lock was already taken before del_timer_sync() on CPU0.
> As a result, the system will be locked forever.
>
> Regarding the root cause: as long as we don't synchronously wait for
> tipc_node_timeout() to complete on other CPUs, the deadlock can never
> happen. That means if we replace del_timer_sync() with del_timer() in
> tipc_node_delete(), the deadlock is eliminated. But we need to ensure
> the node instance is still valid, especially for tipc_node_timeout(),
> after tipc_node_put() is called in tipc_node_delete(). If so, we need
> to carefully increase and decrease the node reference count.
>
> In your solution, tn->node_list_lock is not held in tipc_node_timeout();
> instead, grabbing the lock is moved to a workqueue, which means the
> deadlock is killed too.
>
> If we use the solution I suggested above, it will be a bit more natural
> and we don't need to additionally introduce a workqueue. But as I said,
> we have to carefully check that we properly get/put the node reference
> count.
>
> Actually, we originally used del_timer() to asynchronously delete the
> node timer, but as some issues were found, we later changed its deletion
> to a synchronous manner.
>
> Anyway, your solution is also fine with me, so you can decide which
> approach you prefer.
>
> Thanks,
> Ying
>
> On 11/11/2018 03:05 AM, Jon Maloy wrote:
> > We see the following lockdep warning:
> >
> > [ 2284.078521] ======================================================
> > [ 2284.078604] WARNING: possible circular locking dependency detected
> > [ 2284.078604] 4.19.0+ #42 Tainted: G            E
> > [ 2284.078604] ------------------------------------------------------
> > [ 2284.078604] rmmod/254 is trying to acquire lock:
> > [ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
> > [ 2284.078604]
> > [ 2284.078604] but task is already holding lock:
> > [ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
> > [ 2284.078604]
> > [ 2284.078604] which lock already depends on the new lock.
> > [ 2284.078604]
> > [ 2284.078604]
> > [ 2284.078604] the existing dependency chain (in reverse order) is:
> > [ 2284.078604]
> > [ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
> > [ 2284.078604]        tipc_node_timeout+0x20a/0x330 [tipc]
> > [ 2284.078604]        call_timer_fn+0xa1/0x280
> > [ 2284.078604]        run_timer_softirq+0x1f2/0x4d0
> > [ 2284.078604]        __do_softirq+0xfc/0x413
> > [ 2284.078604]        irq_exit+0xb5/0xc0
> > [ 2284.078604]        smp_apic_timer_interrupt+0xac/0x210
> > [ 2284.078604]        apic_timer_interrupt+0xf/0x20
> > [ 2284.078604]        default_idle+0x1c/0x140
> > [ 2284.078604]        do_idle+0x1bc/0x280
> > [ 2284.078604]        cpu_startup_entry+0x19/0x20
> > [ 2284.078604]        start_secondary+0x187/0x1c0
> > [ 2284.078604]        secondary_startup_64+0xa4/0xb0
> > [ 2284.078604]
> > [ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
> > [ 2284.078604]        del_timer_sync+0x34/0xa0
> > [ 2284.078604]        tipc_node_delete+0x1a/0x40 [tipc]
> > [ 2284.078604]        tipc_node_stop+0xcb/0x190 [tipc]
> > [ 2284.078604]        tipc_net_stop+0x154/0x170 [tipc]
> > [ 2284.078604]        tipc_exit_net+0x16/0x30 [tipc]
> > [ 2284.078604]        ops_exit_list.isra.8+0x36/0x70
> > [ 2284.078604]        unregister_pernet_operations+0x87/0xd0
> > [ 2284.078604]        unregister_pernet_subsys+0x1d/0x30
> > [ 2284.078604]        tipc_exit+0x11/0x6f2 [tipc]
> > [ 2284.078604]        __x64_sys_delete_module+0x1df/0x240
> > [ 2284.078604]        do_syscall_64+0x66/0x460
> > [ 2284.078604]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [ 2284.078604]
> > [ 2284.078604] other info that might help us debug this:
> > [ 2284.078604]
> > [ 2284.078604]  Possible unsafe locking scenario:
> > [ 2284.078604]
> > [ 2284.078604]        CPU0                    CPU1
> > [ 2284.078604]        ----                    ----
> > [ 2284.078604]   lock(&(&tn->node_list_lock)->rlock);
> > [ 2284.078604]                                lock((&n->timer)#2);
> > [ 2284.078604]                                lock(&(&tn->node_list_lock)->rlock);
> > [ 2284.078604]   lock((&n->timer)#2);
> > [ 2284.078604]
> > [ 2284.078604]  *** DEADLOCK ***
> > [ 2284.078604]
> > [ 2284.078604] 3 locks held by rmmod/254:
> > [ 2284.078604]  #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
> > [ 2284.078604]  #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
> > [ 2284.078604]  #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
> > [...]
> >
> > The reason is that the node timer handler sometimes needs to delete a
> > node which has been disconnected for too long. To do this, it grabs
> > the lock 'node_list_lock', which may at the same time be held by the
> > generic node cleanup function, tipc_node_stop(), during module removal.
> > Since the latter calls del_timer_sync() inside the same lock, we
> > have a potential deadlock.
> >
> > We fix this by moving the timer-triggered delete function to work
> > queue context.
> >
> > Fixes: 6a939f365bdb ("tipc: Auto removal of peer down node instance")
> > Signed-off-by: Jon Maloy <[email protected]>
> > ---
> >  net/tipc/node.c | 74 ++++++++++++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 58 insertions(+), 16 deletions(-)
> >
> > diff --git a/net/tipc/node.c b/net/tipc/node.c
> > index 2afc4f8..40527e6 100644
> > --- a/net/tipc/node.c
> > +++ b/net/tipc/node.c
> > @@ -75,6 +75,12 @@ struct tipc_bclink_entry {
> >  	struct sk_buff_head namedq;
> >  };
> >
> > +struct tipc_del_node_work {
> > +	struct work_struct work;
> > +	struct net *net;
> > +	u32 addr;
> > +};
> > +
> >  /**
> >   * struct tipc_node - TIPC node structure
> >   * @addr: network address of node
> > @@ -581,25 +587,61 @@ static void tipc_node_clear_links(struct tipc_node *node)
> >  	}
> >  }
> >
> > -/* tipc_node_cleanup - delete nodes that does not
> > - * have active links for NODE_CLEANUP_AFTER time
> > +static void tipc_exec_node_delete(struct work_struct *work)
> > +{
> > +	struct tipc_del_node_work *dwork;
> > +	struct tipc_net *tn;
> > +	struct tipc_node *n;
> > +	struct net *net;
> > +	u32 addr;
> > +
> > +	dwork = container_of(work, struct tipc_del_node_work, work);
> > +	addr = dwork->addr;
> > +	net = dwork->net;
> > +	kfree(dwork);
> > +
> > +	/* In case the module has been removed: */
> > +	tn = tipc_net(net);
> > +	if (!tn)
> > +		return;
> > +	n = tipc_node_find(net, addr);
> > +	if (!n || node_is_up(n))
> > +		return;
> > +
> > +	spin_lock_bh(&tn->node_list_lock);
> > +	tipc_node_write_lock(n);
> > +	tipc_node_clear_links(n);
> > +	tipc_node_delete_from_list(n);
> > +	tipc_node_write_unlock(n);
> > +	spin_unlock_bh(&tn->node_list_lock);
> > +
> > +	/* For work queue item */
> > +	tipc_node_put(n);
> > +}
> > +
> > +/* tipc_node_cleanup - delete node if it didn't have active
> > + * links for NODE_CLEANUP_AFTER millisecods
> >   */
> >  static int tipc_node_cleanup(struct tipc_node *peer)
> >  {
> > -	struct tipc_net *tn = tipc_net(peer->net);
> > -	bool deleted = false;
> > +	struct tipc_del_node_work *dwork;
> >
> > -	spin_lock_bh(&tn->node_list_lock);
> > -	tipc_node_write_lock(peer);
> > +	if (node_is_up(peer))
> > +		return false;
> > +	if (!time_after(jiffies, peer->delete_at))
> > +		return false;
> >
> > -	if (!node_is_up(peer) && time_after(jiffies, peer->delete_at)) {
> > -		tipc_node_clear_links(peer);
> > -		tipc_node_delete_from_list(peer);
> > -		deleted = true;
> > -	}
> > -	tipc_node_write_unlock(peer);
> > -	spin_unlock_bh(&tn->node_list_lock);
> > -	return deleted;
> > +	/* Delete in work queue context to avoid potential deadlock */
> > +	dwork = kzalloc(sizeof(*dwork), GFP_ATOMIC);
> > +	if (!dwork)
> > +		return false;
> > +
> > +	INIT_WORK(&dwork->work, tipc_exec_node_delete);
> > +	dwork->addr = peer->addr;
> > +	dwork->net = peer->net;
> > +	tipc_node_get(peer);
> > +	schedule_work(&dwork->work);
> > +	return true;
> >  }
> >
> >  /* tipc_node_timeout - handle expiration of node timer
> > @@ -613,8 +655,8 @@ static void tipc_node_timeout(struct timer_list *t)
> >  	int bearer_id;
> >  	int rc = 0;
> >
> > -	if (!node_is_up(n) && tipc_node_cleanup(n)) {
> > -		/*Removing the reference of Timer*/
> > +	if (tipc_node_cleanup(n)) {
> > +		/* Remove reference to this timer */
> >  		tipc_node_put(n);
> >  		return;
> >  	}

_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
