Hi Jon, I understood our scenario. Whatever the patch and another patch(v2 tipc: fix crash during node removal) seem all incorrect. I will post a patch to fix the issue soon.
Regards, Ying -----Original Message----- From: Jon Maloy [mailto:[email protected]] Sent: 2016年2月19日 15:38 To: Xue, Ying; [email protected]; Parthasarathy Bhuvaragan; Richard Alpe; [email protected] Cc: [email protected] Subject: RE: [PATCH net-next 1/1] tipc: eliminate risk of finding to-be-deleted node instance > -----Original Message----- > From: Xue, Ying [mailto:[email protected]] > Sent: Thursday, 18 February, 2016 08:50 > To: Jon Maloy; [email protected]; Parthasarathy > Bhuvaragan; Richard Alpe; [email protected] > Cc: [email protected] > Subject: RE: [PATCH net-next 1/1] tipc: eliminate risk of finding > to-be-deleted node instance > > @@ -395,21 +396,19 @@ static void tipc_node_delete(struct tipc_node *node) { > list_del_rcu(&node->list); > hlist_del_rcu(&node->hash); > - kfree(node->bc_entry.link); > - kfree_rcu(node, rcu); > + tipc_node_put(node); > > We cannot call the first tipc_node_put(node) before > del_timer(&node->timer)), otherwise, this may cause a panic. For > example, after the first > tipc_node_put(node) is performed, its refcnt is zero. It will never be zero here. During normal runtime, the counter is 2 or more (depending on how many tipc_node_find()'ers are active at the moment). This decrementation is done because we just removed the node from the lookup table, but we still haven't tried deactivating the timer, so the value is guaranteed 1 or more. (see one more comment further down) > When the node is accessed > in del_tmer(), the node instance might be freed by another CPU. Thus, > panic happens at the moment. > > Regards, > Ying > > + if (del_timer(&node->timer)) > + tipc_node_put(node); This is where we have the real problem. If we are unsuccessful in deleting the timer here because the timer handler is running (and the timer therefore already inactive), there is no way the handler can know that it must not reactivate it before it returns. After mod_timer() returns, the timer is always active, irrespective of its previous state and return value. Any suggestion for how to fix this? Even v2 of my patch "tipc: fix crash during node removal" seems to be wrong. All other solutions I have found suggest del_timer_sync(), but we cannot use that, since we know it may cause deadlock. ///jon > } > > void tipc_node_stop(struct net *net) > { > - struct tipc_net *tn = net_generic(net, tipc_net_id); > + struct tipc_net *tn = tipc_net(net); > struct tipc_node *node, *t_node; > > spin_lock_bh(&tn->node_list_lock); > - list_for_each_entry_safe(node, t_node, &tn->node_list, list) { > - if (del_timer(&node->timer)) > - tipc_node_put(node); > - tipc_node_put(node); > - } > + list_for_each_entry_safe(node, t_node, &tn->node_list, list) > + tipc_node_delete(node); > spin_unlock_bh(&tn->node_list_lock); > } > > -- > 1.9.1 ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140 _______________________________________________ tipc-discussion mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/tipc-discussion
