On Feb 23, 2016 6:20 PM, "Xue, Ying" <ying....@windriver.com> wrote:
>
> Even if node timeout function restarts the node timer with mod_timer(), I
don’t see any problem here. If you concern the usage about dealing with
node timer and its refcount, please look at another similar example which
can demonstrate how to maintain the relationship between sk_timer and
sk_refcnt. For instance, you can refer to  both sk_reset_timer() and
sk_stop_timer().

I took a look and found that where sk_reset_timer() and sk_stop_timer() are
called, the sock lock is always hold. Thus, the checking(for timer) and
modificaion(for refcount) in those routines are atomic as a single
operation. But ours are not.

>
>
>
> Regards,
>
> Ying
>
>
>
> From: jason [mailto:huzhiji...@gmail.com]
> Sent: 2016年2月23日 17:55
> To: Xue, Ying
> Cc: Jon Maloy; Richard Alpe; parthasarathy.bhuvara...@ericsson.com;
tipc-discussion@lists.sourceforge.net; Jon Maloy
> Subject: RE: [PATCH net-next v3 1/1] tipc: fix crash during node removal
>
>
>
>
> On Feb 22, 2016 11:22 PM, "Xue, Ying" <ying....@windriver.com> wrote:
> >
> > Hi Jon,
> >
> > I think the scenario described below is not true. This is because
del_timer() doesn't return 1 while node_timeout() is being called. Please
take a look at __run_timers() defined in kernel/time/timer.c. When a timer
is expired, __run_timers() will call the timeout function attached to the
timer. But before __run_timers() runs timeout function, it first detaches
the expired timer,
>
> Hi Ying,
> But the timeout function here do mod_timer() which attach itself again?
>
> and then performs its timeout function. That means del_timer() will
definitely return 0 while node_timeout() is under execution. So, the
following scenario cannot happen at all.
> >
> > Regards,
> > Ying
> >
> > -----Original Message-----
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: 2016年2月20日 9:17
> > To: tipc-discussion@lists.sourceforge.net;
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com;
jon.ma...@ericsson.com; huzhiji...@gmail.com
> > Cc: ma...@donjonn.com
> > Subject: [PATCH net-next v3 1/1] tipc: fix crash during node removal
> >
> > When the TIPC module is unloaded, we have identified a race condition
that allows a node reference counter to go to zero and the node instance
being freed before the node timer is finished with accessing it. This leads
to occasional crashes, especially in multi-namespace environments.
> >
> > The scenario goes as follows:
> >
> > CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2
> >
> > 1:                                          if(!mod_timer())
> > 2: if (del_timer())
> > 3:   tipc_node_put()                                        // ref -> 1
> > 4: tipc_node_put()                                          // ref -> 0
> > 5:   kfree_rcu(node);
> > 6:                                               tipc_node_get(node)
> > 7:                                               // BOOM!
> >
> > We now clean up this functionality as follows:
> >
> > 1) We remove the node pointer from the node lookup table before we
> >    attempt deactivating the timer. This way, we reduce the risk that
> >    tipc_node_find() may obtain a valid pointer to an instance marked
> >    for deletion; a harmless but undesirable situation.
> >
> > 2) We use del_timer_sync() instead of del_timer() to safely deactivate
> >    the node timer without any risk that it might be reactivated by the
> >    timeout handler. There is no risk of deadlock here, since the two
> >    functions never touch the same spinlocks.
> >
> > 3: We remove a pointless tipc_node_get() + tipc_node_put() from the
> >    timeout handler.
> >
> > v3: - changed to del_timer_sync()
> >     - don't test for return value of mod_timer() in tipc_node_timeout(),
> >       and don't touch kref.
> >
> > Reported-by: Zhijiang Hu <huzhiji...@gmail.com>
> > Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> > ---
> >  net/tipc/node.c | 22 ++++++++++------------
> >  1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/net/tipc/node.c b/net/tipc/node.c index f310e931..1fdaed0
100644
> > --- a/net/tipc/node.c
> > +++ b/net/tipc/node.c
> > @@ -225,9 +225,10 @@ static unsigned int tipc_hashfn(u32 addr)
> >
> >  static void tipc_node_kref_release(struct kref *kref)  {
> > -       struct tipc_node *node = container_of(kref, struct tipc_node,
kref);
> > +       struct tipc_node *n = container_of(kref, struct tipc_node,
kref);
> >
> > -       tipc_node_delete(node);
> > +       kfree(n->bc_entry.link);
> > +       kfree_rcu(n, rcu);
> >  }
> >
> >  static void tipc_node_put(struct tipc_node *node) @@ -395,8 +396,10 @@
static void tipc_node_delete(struct tipc_node *node)  {
> >         list_del_rcu(&node->list);
> >         hlist_del_rcu(&node->hash);
> > -       kfree(node->bc_entry.link);
> > -       kfree_rcu(node, rcu);
> > +       tipc_node_put(node);
> > +
> > +       del_timer_sync(&node->timer);
> > +       tipc_node_put(node);
> >  }
> >
> >  void tipc_node_stop(struct net *net)
> > @@ -405,11 +408,8 @@ void tipc_node_stop(struct net *net)
> >         struct tipc_node *node, *t_node;
> >
> >         spin_lock_bh(&tn->node_list_lock);
> > -       list_for_each_entry_safe(node, t_node, &tn->node_list, list) {
> > -               if (del_timer(&node->timer))
> > -                       tipc_node_put(node);
> > -               tipc_node_put(node);
> > -       }
> > +       list_for_each_entry_safe(node, t_node, &tn->node_list, list)
> > +               tipc_node_delete(node);
> >         spin_unlock_bh(&tn->node_list_lock);
> >  }
> >
> > @@ -530,9 +530,7 @@ static void tipc_node_timeout(unsigned long data)
> >                 if (rc & TIPC_LINK_DOWN_EVT)
> >                         tipc_node_link_down(n, bearer_id, false);
> >         }
> > -       if (!mod_timer(&n->timer, jiffies + n->keepalive_intv))
> > -               tipc_node_get(n);
> > -       tipc_node_put(n);
> > +       mod_timer(&n->timer, jiffies + n->keepalive_intv);
> >  }
> >
> >  /**
> > --
> > 1.9.1
> >
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to