On Feb 22, 2016 11:22 PM, "Xue, Ying" <ying....@windriver.com> wrote:
>
> Hi Jon,
>
> I think the scenario described below is not true. This is because
del_timer() doesn't return 1 while node_timeout() is being called. Please
take a look at __run_timers() defined in kernel/time/timer.c. When a timer
is expired, __run_timers() will call the timeout function attached to the
timer. But before __run_timers() runs timeout function, it first detaches
the expired timer,

Hi Ying,
But the timeout function here do mod_timer() which attach itself again?

and then performs its timeout function. That means del_timer() will
definitely return 0 while node_timeout() is under execution. So, the
following scenario cannot happen at all.
>
> Regards,
> Ying
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: 2016年2月20日 9:17
> To: tipc-discussion@lists.sourceforge.net;
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com;
jon.ma...@ericsson.com; huzhiji...@gmail.com
> Cc: ma...@donjonn.com
> Subject: [PATCH net-next v3 1/1] tipc: fix crash during node removal
>
> When the TIPC module is unloaded, we have identified a race condition
that allows a node reference counter to go to zero and the node instance
being freed before the node timer is finished with accessing it. This leads
to occasional crashes, especially in multi-namespace environments.
>
> The scenario goes as follows:
>
> CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2
>
> 1:                                          if(!mod_timer())
> 2: if (del_timer())
> 3:   tipc_node_put()                                        // ref -> 1
> 4: tipc_node_put()                                          // ref -> 0
> 5:   kfree_rcu(node);
> 6:                                               tipc_node_get(node)
> 7:                                               // BOOM!
>
> We now clean up this functionality as follows:
>
> 1) We remove the node pointer from the node lookup table before we
>    attempt deactivating the timer. This way, we reduce the risk that
>    tipc_node_find() may obtain a valid pointer to an instance marked
>    for deletion; a harmless but undesirable situation.
>
> 2) We use del_timer_sync() instead of del_timer() to safely deactivate
>    the node timer without any risk that it might be reactivated by the
>    timeout handler. There is no risk of deadlock here, since the two
>    functions never touch the same spinlocks.
>
> 3: We remove a pointless tipc_node_get() + tipc_node_put() from the
>    timeout handler.
>
> v3: - changed to del_timer_sync()
>     - don't test for return value of mod_timer() in tipc_node_timeout(),
>       and don't touch kref.
>
> Reported-by: Zhijiang Hu <huzhiji...@gmail.com>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> ---
>  net/tipc/node.c | 22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/net/tipc/node.c b/net/tipc/node.c index f310e931..1fdaed0
100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -225,9 +225,10 @@ static unsigned int tipc_hashfn(u32 addr)
>
>  static void tipc_node_kref_release(struct kref *kref)  {
> -       struct tipc_node *node = container_of(kref, struct tipc_node,
kref);
> +       struct tipc_node *n = container_of(kref, struct tipc_node, kref);
>
> -       tipc_node_delete(node);
> +       kfree(n->bc_entry.link);
> +       kfree_rcu(n, rcu);
>  }
>
>  static void tipc_node_put(struct tipc_node *node) @@ -395,8 +396,10 @@
static void tipc_node_delete(struct tipc_node *node)  {
>         list_del_rcu(&node->list);
>         hlist_del_rcu(&node->hash);
> -       kfree(node->bc_entry.link);
> -       kfree_rcu(node, rcu);
> +       tipc_node_put(node);
> +
> +       del_timer_sync(&node->timer);
> +       tipc_node_put(node);
>  }
>
>  void tipc_node_stop(struct net *net)
> @@ -405,11 +408,8 @@ void tipc_node_stop(struct net *net)
>         struct tipc_node *node, *t_node;
>
>         spin_lock_bh(&tn->node_list_lock);
> -       list_for_each_entry_safe(node, t_node, &tn->node_list, list) {
> -               if (del_timer(&node->timer))
> -                       tipc_node_put(node);
> -               tipc_node_put(node);
> -       }
> +       list_for_each_entry_safe(node, t_node, &tn->node_list, list)
> +               tipc_node_delete(node);
>         spin_unlock_bh(&tn->node_list_lock);
>  }
>
> @@ -530,9 +530,7 @@ static void tipc_node_timeout(unsigned long data)
>                 if (rc & TIPC_LINK_DOWN_EVT)
>                         tipc_node_link_down(n, bearer_id, false);
>         }
> -       if (!mod_timer(&n->timer, jiffies + n->keepalive_intv))
> -               tipc_node_get(n);
> -       tipc_node_put(n);
> +       mod_timer(&n->timer, jiffies + n->keepalive_intv);
>  }
>
>  /**
> --
> 1.9.1
>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to