On 12/12/2016 12:37 PM, Ying Xue wrote: > Hi Parth, > > Sorry for late response. > > As I could not find your v3 version, I just give comments based on the > version. > > On 11/22/2016 12:27 AM, Parthasarathy Bhuvaragan wrote: >> Commit 333f796235a527 ("tipc: fix a race condition leading to >> subscriber refcnt bug") reveals a soft lockup while acquiring >> nametbl_lock. >> >> Before commit 333f796235a527, we call tipc_conn_shutdown() from >> tipc_close_conn() in the context of tipc_topsrv_stop(). In that >> context, we are allowed to grab the nametbl_lock. >> >> In commit 333f796235a527, i moved the tipc_conn_release (renamed from >> tipc_conn_shutdown) to the connection refcount cleanup. > > Can you please confirm whether the soft lockup doesn't happen any more > if we don't adjust the sequence of tipc_conn_release? > > If yes, I think we can propose other method to fix the issue solved by > commit 333f796235a527, but in the new method it's unnecessary to adjust > the order of tipc_conn_release. > > In fact the order of tipc_conn_shutdown, tipc_unregister_callbacks and > tipc_conn_release is very important. When I wrote that code, I spent > much time considering how to carefully close connection. > > In my opinion, the ideal order is still as belows: > > 1, Close connection; > 2. Call tipc_unregister_callbacks to let sk->sk_user_data. As long as > sk->sk_user_data is 0, no more data will be submitted to > con->rwork/on->swork works. > 3. Release socket. Yes, with your proposed change the soft lockup reported by John will go away but it does not avoid the problem which is fixed by commit 333f796235a527. As long as we yield and let the scheduler schedule a new work item, we will break the single threaded work queue assumption.
I tested with the proposed ideal order and does not fix the fault fixed by commit 333f796235a527. Thanks for the review. I will spend some more time to find a simpler solution for both. Regards Partha > > Regards, > Ying > > This allows >> either tipc_nametbl_withdraw() or tipc_topsrv_stop() to perform >> tipc_sock_release(). >> >> Since tipc_exit_net() first calls tipc_topsrv_stop() and then >> tipc_nametble_withdraw() increases the chances for the later to >> perform the connection cleanup. >> >> The soft lockup occurs in the call chain of tipc_nametbl_withdraw(), >> when it performs the tipc_conn_kref_release() as it tries to grab >> nametbl_lock again while holding it already. >> tipc_nametbl_withdraw() grabs nametbl_lock >> tipc_nametbl_remove_publ() >> tipc_subscrp_report_overlap() >> tipc_subscrp_send_event() >> tipc_conn_sendmsg() >> << if (con->flags != CF_CONNECTED) we do conn_put(), >> triggering the cleanup as refcount=0. >> >> tipc_conn_kref_release >> tipc_sock_release >> tipc_conn_release >> tipc_subscrb_delete >> tipc_subscrp_delete >> tipc_nametbl_unsubscribe << Soft Lockup >> >> >> Until now, tipc_server_stop() grabs and releases the idr_lock twice >> for every connection. Once to find the connection and second to unset >> connection flag, remove it from list and decrement the refcount. >> The above lockup occurs when tipc_nametbl_withdraw() grabs the >> connection in between the two, thus owning the connection >> and triggering the cleanup while decrementing the refcount later. >> >> In this commit, we perform: >> - call tipc_nametbl_withdraw() before tipc_topsrv_stop() to avoid: >> 1. soft lockup >> 2. its too late to actually notify the subscribers, as the topology >> server might already have started shutting down. >> - In tipc_server_stop(), we remove all the connections from connection >> list in the scope of the idr_lock to prevent any other thread finding >> a connection which has unset CF_CONNECTED in its flags. >> >> Fixes: 333f796235a52727 ("tipc: fix a race condition leading to >> subscriber refcnt bug") >> Signed-off-by: Parthasarathy Bhuvaragan >> <parthasarathy.bhuvara...@ericsson.com> >> --- >> v2: commit message update. >> cleanup tipc_server_stop() as per Ying Xue. >> --- >> net/tipc/core.c | 4 ++++ >> net/tipc/net.c | 2 -- >> net/tipc/server.c | 13 ++++++++----- >> 3 files changed, 12 insertions(+), 7 deletions(-) >> >> diff --git a/net/tipc/core.c b/net/tipc/core.c >> index 236b043a4156..3ea1452e2f06 100644 >> --- a/net/tipc/core.c >> +++ b/net/tipc/core.c >> @@ -93,6 +93,10 @@ static int __net_init tipc_init_net(struct net *net) >> >> static void __net_exit tipc_exit_net(struct net *net) >> { >> + struct tipc_net *tn = net_generic(net, tipc_net_id); >> + >> + tipc_nametbl_withdraw(net, TIPC_CFG_SRV, tn->own_addr, 0, >> + tn->own_addr); >> tipc_topsrv_stop(net); >> tipc_net_stop(net); >> tipc_bcast_stop(net); >> diff --git a/net/tipc/net.c b/net/tipc/net.c >> index 28bf4feeb81c..92696cc6e763 100644 >> --- a/net/tipc/net.c >> +++ b/net/tipc/net.c >> @@ -130,8 +130,6 @@ void tipc_net_stop(struct net *net) >> if (!tn->own_addr) >> return; >> >> - tipc_nametbl_withdraw(net, TIPC_CFG_SRV, tn->own_addr, 0, >> - tn->own_addr); >> rtnl_lock(); >> tipc_bearer_stop(net); >> tipc_node_stop(net); >> diff --git a/net/tipc/server.c b/net/tipc/server.c >> index 215849ce453d..54c23ecccd26 100644 >> --- a/net/tipc/server.c >> +++ b/net/tipc/server.c >> @@ -623,16 +623,19 @@ int tipc_server_start(struct tipc_server *s) >> void tipc_server_stop(struct tipc_server *s) >> { >> struct tipc_conn *con; >> - int total = 0; >> int id; >> >> spin_lock_bh(&s->idr_lock); >> - for (id = 0; total < s->idr_in_use; id++) { >> + for (id = 0; s->idr_in_use; id++) { >> con = idr_find(&s->conn_idr, id); >> - if (con) { >> - total++; >> + if (con && test_and_clear_bit(CF_CONNECTED, &con->flags)) { >> + idr_remove(&s->conn_idr, con->conid); >> + s->idr_in_use--; >> + /* release spinlock as kernel_sock_shutdown can sleep. >> + */ >> spin_unlock_bh(&s->idr_lock); >> - tipc_close_conn(con); >> + kernel_sock_shutdown(con->sock, SHUT_RDWR); >> + conn_put(con); >> spin_lock_bh(&s->idr_lock); >> } >> } >> > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot _______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion