On Feb 23, 2016 6:20 PM, "Xue, Ying" <ying....@windriver.com> wrote: > > Even if node timeout function restarts the node timer with mod_timer(), I don’t see any problem here. If you concern the usage about dealing with node timer and its refcount, please look at another similar example which can demonstrate how to maintain the relationship between sk_timer and sk_refcnt. For instance, you can refer to both sk_reset_timer() and sk_stop_timer().
I took a look and found that where sk_reset_timer() and sk_stop_timer() are called, the sock lock is always hold. Thus, the checking(for timer) and modificaion(for refcount) in those routines are atomic as a single operation. But ours are not. > > > > Regards, > > Ying > > > > From: jason [mailto:huzhiji...@gmail.com] > Sent: 2016年2月23日 17:55 > To: Xue, Ying > Cc: Jon Maloy; Richard Alpe; parthasarathy.bhuvara...@ericsson.com; tipc-discussion@lists.sourceforge.net; Jon Maloy > Subject: RE: [PATCH net-next v3 1/1] tipc: fix crash during node removal > > > > > On Feb 22, 2016 11:22 PM, "Xue, Ying" <ying....@windriver.com> wrote: > > > > Hi Jon, > > > > I think the scenario described below is not true. This is because del_timer() doesn't return 1 while node_timeout() is being called. Please take a look at __run_timers() defined in kernel/time/timer.c. When a timer is expired, __run_timers() will call the timeout function attached to the timer. But before __run_timers() runs timeout function, it first detaches the expired timer, > > Hi Ying, > But the timeout function here do mod_timer() which attach itself again? > > and then performs its timeout function. That means del_timer() will definitely return 0 while node_timeout() is under execution. So, the following scenario cannot happen at all. > > > > Regards, > > Ying > > > > -----Original Message----- > > From: Jon Maloy [mailto:jon.ma...@ericsson.com] > > Sent: 2016年2月20日 9:17 > > To: tipc-discussion@lists.sourceforge.net; parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; jon.ma...@ericsson.com; huzhiji...@gmail.com > > Cc: ma...@donjonn.com > > Subject: [PATCH net-next v3 1/1] tipc: fix crash during node removal > > > > When the TIPC module is unloaded, we have identified a race condition that allows a node reference counter to go to zero and the node instance being freed before the node timer is finished with accessing it. This leads to occasional crashes, especially in multi-namespace environments. > > > > The scenario goes as follows: > > > > CPU0:(node_stop) CPU1:(node_timeout) // ref == 2 > > > > 1: if(!mod_timer()) > > 2: if (del_timer()) > > 3: tipc_node_put() // ref -> 1 > > 4: tipc_node_put() // ref -> 0 > > 5: kfree_rcu(node); > > 6: tipc_node_get(node) > > 7: // BOOM! > > > > We now clean up this functionality as follows: > > > > 1) We remove the node pointer from the node lookup table before we > > attempt deactivating the timer. This way, we reduce the risk that > > tipc_node_find() may obtain a valid pointer to an instance marked > > for deletion; a harmless but undesirable situation. > > > > 2) We use del_timer_sync() instead of del_timer() to safely deactivate > > the node timer without any risk that it might be reactivated by the > > timeout handler. There is no risk of deadlock here, since the two > > functions never touch the same spinlocks. > > > > 3: We remove a pointless tipc_node_get() + tipc_node_put() from the > > timeout handler. > > > > v3: - changed to del_timer_sync() > > - don't test for return value of mod_timer() in tipc_node_timeout(), > > and don't touch kref. > > > > Reported-by: Zhijiang Hu <huzhiji...@gmail.com> > > Signed-off-by: Jon Maloy <jon.ma...@ericsson.com> > > --- > > net/tipc/node.c | 22 ++++++++++------------ > > 1 file changed, 10 insertions(+), 12 deletions(-) > > > > diff --git a/net/tipc/node.c b/net/tipc/node.c index f310e931..1fdaed0 100644 > > --- a/net/tipc/node.c > > +++ b/net/tipc/node.c > > @@ -225,9 +225,10 @@ static unsigned int tipc_hashfn(u32 addr) > > > > static void tipc_node_kref_release(struct kref *kref) { > > - struct tipc_node *node = container_of(kref, struct tipc_node, kref); > > + struct tipc_node *n = container_of(kref, struct tipc_node, kref); > > > > - tipc_node_delete(node); > > + kfree(n->bc_entry.link); > > + kfree_rcu(n, rcu); > > } > > > > static void tipc_node_put(struct tipc_node *node) @@ -395,8 +396,10 @@ static void tipc_node_delete(struct tipc_node *node) { > > list_del_rcu(&node->list); > > hlist_del_rcu(&node->hash); > > - kfree(node->bc_entry.link); > > - kfree_rcu(node, rcu); > > + tipc_node_put(node); > > + > > + del_timer_sync(&node->timer); > > + tipc_node_put(node); > > } > > > > void tipc_node_stop(struct net *net) > > @@ -405,11 +408,8 @@ void tipc_node_stop(struct net *net) > > struct tipc_node *node, *t_node; > > > > spin_lock_bh(&tn->node_list_lock); > > - list_for_each_entry_safe(node, t_node, &tn->node_list, list) { > > - if (del_timer(&node->timer)) > > - tipc_node_put(node); > > - tipc_node_put(node); > > - } > > + list_for_each_entry_safe(node, t_node, &tn->node_list, list) > > + tipc_node_delete(node); > > spin_unlock_bh(&tn->node_list_lock); > > } > > > > @@ -530,9 +530,7 @@ static void tipc_node_timeout(unsigned long data) > > if (rc & TIPC_LINK_DOWN_EVT) > > tipc_node_link_down(n, bearer_id, false); > > } > > - if (!mod_timer(&n->timer, jiffies + n->keepalive_intv)) > > - tipc_node_get(n); > > - tipc_node_put(n); > > + mod_timer(&n->timer, jiffies + n->keepalive_intv); > > } > > > > /** > > -- > > 1.9.1 > > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140 _______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion