On 02/19/2016 09:20 AM, Xue, Ying wrote: > Hi Jon, > > I understood our scenario. Whatever the patch and another patch(v2 tipc: fix > crash during node removal) seem all incorrect. I will post a patch to fix the > issue soon.
Ok. Actually, I think it is ok to use del_timer_sync(), which makes this very easy to fix. I was misled by our bad experience from the link timer, where we ran into deadlocks. This cannot happen with the node timer. ///jon > > Regards, > Ying > > -----Original Message----- > From: Jon Maloy [mailto:[email protected]] > Sent: 2016年2月19日 15:38 > To: Xue, Ying; [email protected]; Parthasarathy > Bhuvaragan; Richard Alpe; [email protected] > Cc: [email protected] > Subject: RE: [PATCH net-next 1/1] tipc: eliminate risk of finding > to-be-deleted node instance > > > >> -----Original Message----- >> From: Xue, Ying [mailto:[email protected]] >> Sent: Thursday, 18 February, 2016 08:50 >> To: Jon Maloy; [email protected]; Parthasarathy >> Bhuvaragan; Richard Alpe; [email protected] >> Cc: [email protected] >> Subject: RE: [PATCH net-next 1/1] tipc: eliminate risk of finding >> to-be-deleted node instance >> >> @@ -395,21 +396,19 @@ static void tipc_node_delete(struct tipc_node *node) { >> list_del_rcu(&node->list); >> hlist_del_rcu(&node->hash); >> - kfree(node->bc_entry.link); >> - kfree_rcu(node, rcu); >> + tipc_node_put(node); >> >> We cannot call the first tipc_node_put(node) before >> del_timer(&node->timer)), otherwise, this may cause a panic. For >> example, after the first >> tipc_node_put(node) is performed, its refcnt is zero. > It will never be zero here. During normal runtime, the counter is 2 or more > (depending on how many tipc_node_find()'ers are active at the moment). This > decrementation is done because we just removed the node from the lookup > table, but we still haven't tried deactivating the timer, so the value is > guaranteed 1 or more. > (see one more comment further down) > >> When the node is accessed >> in del_tmer(), the node instance might be freed by another CPU. Thus, >> panic happens at the moment. >> >> Regards, >> Ying >> >> + if (del_timer(&node->timer)) >> + tipc_node_put(node); > This is where we have the real problem. If we are unsuccessful in deleting > the timer here because the timer handler is running (and the timer therefore > already inactive), there is no way the handler can know that it must not > reactivate it before it returns. After mod_timer() returns, the timer is > always active, irrespective of its previous state and return value. Any > suggestion for how to fix this? Even v2 of my patch "tipc: fix crash during > node removal" seems to be wrong. All other solutions I have found suggest > del_timer_sync(), but we cannot use that, since we know it may cause deadlock. > > ///jon > >> } >> >> void tipc_node_stop(struct net *net) >> { >> - struct tipc_net *tn = net_generic(net, tipc_net_id); >> + struct tipc_net *tn = tipc_net(net); >> struct tipc_node *node, *t_node; >> >> spin_lock_bh(&tn->node_list_lock); >> - list_for_each_entry_safe(node, t_node, &tn->node_list, list) { >> - if (del_timer(&node->timer)) >> - tipc_node_put(node); >> - tipc_node_put(node); >> - } >> + list_for_each_entry_safe(node, t_node, &tn->node_list, list) >> + tipc_node_delete(node); >> spin_unlock_bh(&tn->node_list_lock); >> } >> >> -- >> 1.9.1 ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140 _______________________________________________ tipc-discussion mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/tipc-discussion
