Hi Parth,

Sorry for late response.

As I could not find your v3 version, I just give comments based on the 
version.

On 11/22/2016 12:27 AM, Parthasarathy Bhuvaragan wrote:
> Commit 333f796235a527 ("tipc: fix a race condition leading to
> subscriber refcnt bug") reveals a soft lockup while acquiring
> nametbl_lock.
>
> Before commit 333f796235a527, we call tipc_conn_shutdown() from
> tipc_close_conn() in the context of tipc_topsrv_stop(). In that
> context, we are allowed to grab the nametbl_lock.
>
> In commit 333f796235a527, i moved the tipc_conn_release (renamed from
> tipc_conn_shutdown) to the connection refcount cleanup.

Can you please confirm whether the soft lockup doesn't happen any more 
if we don't adjust the sequence of tipc_conn_release?

If yes, I think we can propose other method to fix the issue solved by 
commit 333f796235a527, but in the new method it's unnecessary to adjust 
the order of tipc_conn_release.

In fact the order of tipc_conn_shutdown, tipc_unregister_callbacks and 
tipc_conn_release is very important. When I wrote that code, I spent 
much time considering how to carefully close connection.

In my opinion, the ideal order is still as belows:

1, Close connection;
2. Call tipc_unregister_callbacks to let sk->sk_user_data. As long as 
sk->sk_user_data is 0, no more data will be submitted to 
con->rwork/on->swork works.
3. Release socket.

Regards,
Ying

  This allows
> either tipc_nametbl_withdraw() or tipc_topsrv_stop() to perform
> tipc_sock_release().
>
> Since tipc_exit_net() first calls tipc_topsrv_stop() and then
> tipc_nametble_withdraw() increases the chances for the later to
> perform the connection cleanup.
>
> The soft lockup occurs in the call chain of tipc_nametbl_withdraw(),
> when it performs the tipc_conn_kref_release() as it tries to grab
> nametbl_lock again while holding it already.
> tipc_nametbl_withdraw() grabs nametbl_lock
>   tipc_nametbl_remove_publ()
>     tipc_subscrp_report_overlap()
>       tipc_subscrp_send_event()
>         tipc_conn_sendmsg()
>           << if (con->flags != CF_CONNECTED) we do conn_put(),
>              triggering the cleanup as refcount=0. >>
>           tipc_conn_kref_release
>             tipc_sock_release
>               tipc_conn_release
>                 tipc_subscrb_delete
>                   tipc_subscrp_delete
>                     tipc_nametbl_unsubscribe << Soft Lockup >>
>
> Until now, tipc_server_stop() grabs and releases the idr_lock twice
> for every connection. Once to find the connection and second to unset
> connection flag, remove it from list and decrement the refcount.
> The above lockup occurs when tipc_nametbl_withdraw() grabs the
> connection in between the two, thus owning the connection
> and triggering the cleanup while decrementing the refcount later.
>
> In this commit, we perform:
> - call tipc_nametbl_withdraw() before tipc_topsrv_stop() to avoid:
>   1. soft lockup
>   2. its too late to actually notify the subscribers, as the topology
>      server might already have started shutting down.
> - In tipc_server_stop(), we remove all the connections from connection
> list in the scope of the idr_lock to prevent any other thread finding
> a connection which has unset CF_CONNECTED in its flags.
>
> Fixes: 333f796235a52727 ("tipc: fix a race condition leading to subscriber 
> refcnt bug")
> Signed-off-by: Parthasarathy Bhuvaragan 
> <parthasarathy.bhuvara...@ericsson.com>
> ---
> v2: commit message update.
>     cleanup tipc_server_stop() as per Ying Xue.
> ---
>  net/tipc/core.c   |  4 ++++
>  net/tipc/net.c    |  2 --
>  net/tipc/server.c | 13 ++++++++-----
>  3 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/net/tipc/core.c b/net/tipc/core.c
> index 236b043a4156..3ea1452e2f06 100644
> --- a/net/tipc/core.c
> +++ b/net/tipc/core.c
> @@ -93,6 +93,10 @@ static int __net_init tipc_init_net(struct net *net)
>
>  static void __net_exit tipc_exit_net(struct net *net)
>  {
> +     struct tipc_net *tn = net_generic(net, tipc_net_id);
> +
> +     tipc_nametbl_withdraw(net, TIPC_CFG_SRV, tn->own_addr, 0,
> +                           tn->own_addr);
>       tipc_topsrv_stop(net);
>       tipc_net_stop(net);
>       tipc_bcast_stop(net);
> diff --git a/net/tipc/net.c b/net/tipc/net.c
> index 28bf4feeb81c..92696cc6e763 100644
> --- a/net/tipc/net.c
> +++ b/net/tipc/net.c
> @@ -130,8 +130,6 @@ void tipc_net_stop(struct net *net)
>       if (!tn->own_addr)
>               return;
>
> -     tipc_nametbl_withdraw(net, TIPC_CFG_SRV, tn->own_addr, 0,
> -                           tn->own_addr);
>       rtnl_lock();
>       tipc_bearer_stop(net);
>       tipc_node_stop(net);
> diff --git a/net/tipc/server.c b/net/tipc/server.c
> index 215849ce453d..54c23ecccd26 100644
> --- a/net/tipc/server.c
> +++ b/net/tipc/server.c
> @@ -623,16 +623,19 @@ int tipc_server_start(struct tipc_server *s)
>  void tipc_server_stop(struct tipc_server *s)
>  {
>       struct tipc_conn *con;
> -     int total = 0;
>       int id;
>
>       spin_lock_bh(&s->idr_lock);
> -     for (id = 0; total < s->idr_in_use; id++) {
> +     for (id = 0; s->idr_in_use; id++) {
>               con = idr_find(&s->conn_idr, id);
> -             if (con) {
> -                     total++;
> +             if (con && test_and_clear_bit(CF_CONNECTED, &con->flags)) {
> +                     idr_remove(&s->conn_idr, con->conid);
> +                     s->idr_in_use--;
> +                     /* release spinlock as kernel_sock_shutdown can sleep.
> +                      */
>                       spin_unlock_bh(&s->idr_lock);
> -                     tipc_close_conn(con);
> +                     kernel_sock_shutdown(con->sock, SHUT_RDWR);
> +                     conn_put(con);
>                       spin_lock_bh(&s->idr_lock);
>               }
>       }
>


------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to