Hi Ying, I tested with the 3 patches applied: 1/3: tipc: fix nametbl_lock soft lockup at node/link events 2/3: tipc: fix nametbl_lock soft lockup at module exit 3/3: tipc: move connection cleanup to a workqueue
In my case the soft lockup I was seeing was resolved by patch 3 ("tipc: move connection cleanup to a workqueue"). I have not tested without patch 2 applied. Regards, John On Tue, Dec 13, 2016 at 12:37 AM, Ying Xue <ying....@windriver.com> wrote: > Hi Parth, > > Sorry for late response. > > As I could not find your v3 version, I just give comments based on the > version. > > On 11/22/2016 12:27 AM, Parthasarathy Bhuvaragan wrote: > >> Commit 333f796235a527 ("tipc: fix a race condition leading to >> subscriber refcnt bug") reveals a soft lockup while acquiring >> nametbl_lock. >> >> Before commit 333f796235a527, we call tipc_conn_shutdown() from >> tipc_close_conn() in the context of tipc_topsrv_stop(). In that >> context, we are allowed to grab the nametbl_lock. >> >> In commit 333f796235a527, i moved the tipc_conn_release (renamed from >> tipc_conn_shutdown) to the connection refcount cleanup. >> > > Can you please confirm whether the soft lockup doesn't happen any more if > we don't adjust the sequence of tipc_conn_release? > > If yes, I think we can propose other method to fix the issue solved by > commit 333f796235a527, but in the new method it's unnecessary to adjust the > order of tipc_conn_release. > > In fact the order of tipc_conn_shutdown, tipc_unregister_callbacks and > tipc_conn_release is very important. When I wrote that code, I spent much > time considering how to carefully close connection. > > In my opinion, the ideal order is still as belows: > > 1, Close connection; > 2. Call tipc_unregister_callbacks to let sk->sk_user_data. As long as > sk->sk_user_data is 0, no more data will be submitted to > con->rwork/on->swork works. > 3. Release socket. > > Regards, > Ying > > > This allows > >> either tipc_nametbl_withdraw() or tipc_topsrv_stop() to perform >> tipc_sock_release(). >> >> Since tipc_exit_net() first calls tipc_topsrv_stop() and then >> tipc_nametble_withdraw() increases the chances for the later to >> perform the connection cleanup. >> >> The soft lockup occurs in the call chain of tipc_nametbl_withdraw(), >> when it performs the tipc_conn_kref_release() as it tries to grab >> nametbl_lock again while holding it already. >> tipc_nametbl_withdraw() grabs nametbl_lock >> tipc_nametbl_remove_publ() >> tipc_subscrp_report_overlap() >> tipc_subscrp_send_event() >> tipc_conn_sendmsg() >> << if (con->flags != CF_CONNECTED) we do conn_put(), >> triggering the cleanup as refcount=0. >> >> tipc_conn_kref_release >> tipc_sock_release >> tipc_conn_release >> tipc_subscrb_delete >> tipc_subscrp_delete >> tipc_nametbl_unsubscribe << Soft Lockup >> >> >> Until now, tipc_server_stop() grabs and releases the idr_lock twice >> for every connection. Once to find the connection and second to unset >> connection flag, remove it from list and decrement the refcount. >> The above lockup occurs when tipc_nametbl_withdraw() grabs the >> connection in between the two, thus owning the connection >> and triggering the cleanup while decrementing the refcount later. >> >> In this commit, we perform: >> - call tipc_nametbl_withdraw() before tipc_topsrv_stop() to avoid: >> 1. soft lockup >> 2. its too late to actually notify the subscribers, as the topology >> server might already have started shutting down. >> - In tipc_server_stop(), we remove all the connections from connection >> list in the scope of the idr_lock to prevent any other thread finding >> a connection which has unset CF_CONNECTED in its flags. >> >> Fixes: 333f796235a52727 ("tipc: fix a race condition leading to >> subscriber refcnt bug") >> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@eric >> sson.com> >> --- >> v2: commit message update. >> cleanup tipc_server_stop() as per Ying Xue. >> --- >> net/tipc/core.c | 4 ++++ >> net/tipc/net.c | 2 -- >> net/tipc/server.c | 13 ++++++++----- >> 3 files changed, 12 insertions(+), 7 deletions(-) >> >> diff --git a/net/tipc/core.c b/net/tipc/core.c >> index 236b043a4156..3ea1452e2f06 100644 >> --- a/net/tipc/core.c >> +++ b/net/tipc/core.c >> @@ -93,6 +93,10 @@ static int __net_init tipc_init_net(struct net *net) >> >> static void __net_exit tipc_exit_net(struct net *net) >> { >> + struct tipc_net *tn = net_generic(net, tipc_net_id); >> + >> + tipc_nametbl_withdraw(net, TIPC_CFG_SRV, tn->own_addr, 0, >> + tn->own_addr); >> tipc_topsrv_stop(net); >> tipc_net_stop(net); >> tipc_bcast_stop(net); >> diff --git a/net/tipc/net.c b/net/tipc/net.c >> index 28bf4feeb81c..92696cc6e763 100644 >> --- a/net/tipc/net.c >> +++ b/net/tipc/net.c >> @@ -130,8 +130,6 @@ void tipc_net_stop(struct net *net) >> if (!tn->own_addr) >> return; >> >> - tipc_nametbl_withdraw(net, TIPC_CFG_SRV, tn->own_addr, 0, >> - tn->own_addr); >> rtnl_lock(); >> tipc_bearer_stop(net); >> tipc_node_stop(net); >> diff --git a/net/tipc/server.c b/net/tipc/server.c >> index 215849ce453d..54c23ecccd26 100644 >> --- a/net/tipc/server.c >> +++ b/net/tipc/server.c >> @@ -623,16 +623,19 @@ int tipc_server_start(struct tipc_server *s) >> void tipc_server_stop(struct tipc_server *s) >> { >> struct tipc_conn *con; >> - int total = 0; >> int id; >> >> spin_lock_bh(&s->idr_lock); >> - for (id = 0; total < s->idr_in_use; id++) { >> + for (id = 0; s->idr_in_use; id++) { >> con = idr_find(&s->conn_idr, id); >> - if (con) { >> - total++; >> + if (con && test_and_clear_bit(CF_CONNECTED, >> &con->flags)) { >> + idr_remove(&s->conn_idr, con->conid); >> + s->idr_in_use--; >> + /* release spinlock as kernel_sock_shutdown can >> sleep. >> + */ >> spin_unlock_bh(&s->idr_lock); >> - tipc_close_conn(con); >> + kernel_sock_shutdown(con->sock, SHUT_RDWR); >> + conn_put(con); >> spin_lock_bh(&s->idr_lock); >> } >> } >> >> > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot _______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion