Commit 333f796235a527 ("tipc: fix a race condition leading to
subscriber refcnt bug") reveals a soft lockup while acquiring
nametbl_lock.

Before commit 333f796235a527, we call tipc_conn_shutdown() from
tipc_close_conn() in the context of tipc_topsrv_stop(). In that
context, we are allowed to grab the nametbl_lock.

In commit 333f796235a527, i moved the tipc_conn_release (renamed from
tipc_conn_shutdown) to the connection refcount cleanup. This allows
either tipc_nametbl_withdraw() or tipc_topsrv_stop() to perform
tipc_sock_release().

Since tipc_exit_net() first calls tipc_topsrv_stop() and then
tipc_nametble_withdraw() increases the chances for the later to
perform the connection cleanup.

The soft lockup occurs in the call chain of tipc_nametbl_withdraw(),
when it performs the tipc_conn_kref_release() as it tries to grab
nametbl_lock again while holding it already.
tipc_nametbl_withdraw() grabs nametbl_lock
  tipc_nametbl_remove_publ()
    tipc_subscrp_report_overlap()
      tipc_subscrp_send_event()
        tipc_conn_sendmsg()
          << if (con->flags != CF_CONNECTED) we do conn_put(),
             triggering the cleanup as refcount=0. >>
          tipc_conn_kref_release
            tipc_sock_release
              tipc_conn_release
                tipc_subscrb_delete
                  tipc_subscrp_delete
                    tipc_nametbl_unsubscribe << Soft Lockup >>

Until now, tipc_server_stop() grabs and releases the idr_lock twice
for every connection. Once to find the connection and second to unset
connection flag, remove it from list and decrement the refcount.
The above lockup occurs when tipc_nametbl_withdraw() grabs the
connection in between the two, thus owning the connection
and triggering the cleanup while decrementing the refcount later.

In this commit, we perform:
- call tipc_nametbl_withdraw() before tipc_topsrv_stop() to avoid:
  1. soft lockup
  2. its too late to actually notify the subscribers, as the topology
     server might already have started shutting down.
- In tipc_server_stop(), we remove all the connections from connection
list in the scope of the idr_lock to prevent any other thread finding
a connection which has unset CF_CONNECTED in its flags.

Fixes: 333f796235a52727 ("tipc: fix a race condition leading to subscriber 
refcnt bug")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
---
v2: commit message update.
    cleanup tipc_server_stop() as per Ying Xue.
---
 net/tipc/core.c   |  4 ++++
 net/tipc/net.c    |  2 --
 net/tipc/server.c | 13 ++++++++-----
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/tipc/core.c b/net/tipc/core.c
index 236b043a4156..3ea1452e2f06 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -93,6 +93,10 @@ static int __net_init tipc_init_net(struct net *net)
 
 static void __net_exit tipc_exit_net(struct net *net)
 {
+       struct tipc_net *tn = net_generic(net, tipc_net_id);
+
+       tipc_nametbl_withdraw(net, TIPC_CFG_SRV, tn->own_addr, 0,
+                             tn->own_addr);
        tipc_topsrv_stop(net);
        tipc_net_stop(net);
        tipc_bcast_stop(net);
diff --git a/net/tipc/net.c b/net/tipc/net.c
index 28bf4feeb81c..92696cc6e763 100644
--- a/net/tipc/net.c
+++ b/net/tipc/net.c
@@ -130,8 +130,6 @@ void tipc_net_stop(struct net *net)
        if (!tn->own_addr)
                return;
 
-       tipc_nametbl_withdraw(net, TIPC_CFG_SRV, tn->own_addr, 0,
-                             tn->own_addr);
        rtnl_lock();
        tipc_bearer_stop(net);
        tipc_node_stop(net);
diff --git a/net/tipc/server.c b/net/tipc/server.c
index 215849ce453d..54c23ecccd26 100644
--- a/net/tipc/server.c
+++ b/net/tipc/server.c
@@ -623,16 +623,19 @@ int tipc_server_start(struct tipc_server *s)
 void tipc_server_stop(struct tipc_server *s)
 {
        struct tipc_conn *con;
-       int total = 0;
        int id;
 
        spin_lock_bh(&s->idr_lock);
-       for (id = 0; total < s->idr_in_use; id++) {
+       for (id = 0; s->idr_in_use; id++) {
                con = idr_find(&s->conn_idr, id);
-               if (con) {
-                       total++;
+               if (con && test_and_clear_bit(CF_CONNECTED, &con->flags)) {
+                       idr_remove(&s->conn_idr, con->conid);
+                       s->idr_in_use--;
+                       /* release spinlock as kernel_sock_shutdown can sleep.
+                        */
                        spin_unlock_bh(&s->idr_lock);
-                       tipc_close_conn(con);
+                       kernel_sock_shutdown(con->sock, SHUT_RDWR);
+                       conn_put(con);
                        spin_lock_bh(&s->idr_lock);
                }
        }
-- 
2.1.4


------------------------------------------------------------------------------
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to