Hi Ying,

I tested with the 3 patches applied:
1/3: tipc: fix nametbl_lock soft lockup at node/link events
2/3: tipc: fix nametbl_lock soft lockup at module exit
3/3: tipc: move connection cleanup to a workqueue

In my case the soft lockup I was seeing was resolved by patch 3 ("tipc:
move connection cleanup to a workqueue").

I have not tested without patch 2 applied.

Regards,
John



On Tue, Dec 13, 2016 at 12:37 AM, Ying Xue <ying....@windriver.com> wrote:

> Hi Parth,
>
> Sorry for late response.
>
> As I could not find your v3 version, I just give comments based on the
> version.
>
> On 11/22/2016 12:27 AM, Parthasarathy Bhuvaragan wrote:
>
>> Commit 333f796235a527 ("tipc: fix a race condition leading to
>> subscriber refcnt bug") reveals a soft lockup while acquiring
>> nametbl_lock.
>>
>> Before commit 333f796235a527, we call tipc_conn_shutdown() from
>> tipc_close_conn() in the context of tipc_topsrv_stop(). In that
>> context, we are allowed to grab the nametbl_lock.
>>
>> In commit 333f796235a527, i moved the tipc_conn_release (renamed from
>> tipc_conn_shutdown) to the connection refcount cleanup.
>>
>
> Can you please confirm whether the soft lockup doesn't happen any more if
> we don't adjust the sequence of tipc_conn_release?
>
> If yes, I think we can propose other method to fix the issue solved by
> commit 333f796235a527, but in the new method it's unnecessary to adjust the
> order of tipc_conn_release.
>
> In fact the order of tipc_conn_shutdown, tipc_unregister_callbacks and
> tipc_conn_release is very important. When I wrote that code, I spent much
> time considering how to carefully close connection.
>
> In my opinion, the ideal order is still as belows:
>
> 1, Close connection;
> 2. Call tipc_unregister_callbacks to let sk->sk_user_data. As long as
> sk->sk_user_data is 0, no more data will be submitted to
> con->rwork/on->swork works.
> 3. Release socket.
>
> Regards,
> Ying
>
>
>  This allows
>
>> either tipc_nametbl_withdraw() or tipc_topsrv_stop() to perform
>> tipc_sock_release().
>>
>> Since tipc_exit_net() first calls tipc_topsrv_stop() and then
>> tipc_nametble_withdraw() increases the chances for the later to
>> perform the connection cleanup.
>>
>> The soft lockup occurs in the call chain of tipc_nametbl_withdraw(),
>> when it performs the tipc_conn_kref_release() as it tries to grab
>> nametbl_lock again while holding it already.
>> tipc_nametbl_withdraw() grabs nametbl_lock
>>   tipc_nametbl_remove_publ()
>>     tipc_subscrp_report_overlap()
>>       tipc_subscrp_send_event()
>>         tipc_conn_sendmsg()
>>           << if (con->flags != CF_CONNECTED) we do conn_put(),
>>              triggering the cleanup as refcount=0. >>
>>           tipc_conn_kref_release
>>             tipc_sock_release
>>               tipc_conn_release
>>                 tipc_subscrb_delete
>>                   tipc_subscrp_delete
>>                     tipc_nametbl_unsubscribe << Soft Lockup >>
>>
>> Until now, tipc_server_stop() grabs and releases the idr_lock twice
>> for every connection. Once to find the connection and second to unset
>> connection flag, remove it from list and decrement the refcount.
>> The above lockup occurs when tipc_nametbl_withdraw() grabs the
>> connection in between the two, thus owning the connection
>> and triggering the cleanup while decrementing the refcount later.
>>
>> In this commit, we perform:
>> - call tipc_nametbl_withdraw() before tipc_topsrv_stop() to avoid:
>>   1. soft lockup
>>   2. its too late to actually notify the subscribers, as the topology
>>      server might already have started shutting down.
>> - In tipc_server_stop(), we remove all the connections from connection
>> list in the scope of the idr_lock to prevent any other thread finding
>> a connection which has unset CF_CONNECTED in its flags.
>>
>> Fixes: 333f796235a52727 ("tipc: fix a race condition leading to
>> subscriber refcnt bug")
>> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@eric
>> sson.com>
>> ---
>> v2: commit message update.
>>     cleanup tipc_server_stop() as per Ying Xue.
>> ---
>>  net/tipc/core.c   |  4 ++++
>>  net/tipc/net.c    |  2 --
>>  net/tipc/server.c | 13 ++++++++-----
>>  3 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/tipc/core.c b/net/tipc/core.c
>> index 236b043a4156..3ea1452e2f06 100644
>> --- a/net/tipc/core.c
>> +++ b/net/tipc/core.c
>> @@ -93,6 +93,10 @@ static int __net_init tipc_init_net(struct net *net)
>>
>>  static void __net_exit tipc_exit_net(struct net *net)
>>  {
>> +       struct tipc_net *tn = net_generic(net, tipc_net_id);
>> +
>> +       tipc_nametbl_withdraw(net, TIPC_CFG_SRV, tn->own_addr, 0,
>> +                             tn->own_addr);
>>         tipc_topsrv_stop(net);
>>         tipc_net_stop(net);
>>         tipc_bcast_stop(net);
>> diff --git a/net/tipc/net.c b/net/tipc/net.c
>> index 28bf4feeb81c..92696cc6e763 100644
>> --- a/net/tipc/net.c
>> +++ b/net/tipc/net.c
>> @@ -130,8 +130,6 @@ void tipc_net_stop(struct net *net)
>>         if (!tn->own_addr)
>>                 return;
>>
>> -       tipc_nametbl_withdraw(net, TIPC_CFG_SRV, tn->own_addr, 0,
>> -                             tn->own_addr);
>>         rtnl_lock();
>>         tipc_bearer_stop(net);
>>         tipc_node_stop(net);
>> diff --git a/net/tipc/server.c b/net/tipc/server.c
>> index 215849ce453d..54c23ecccd26 100644
>> --- a/net/tipc/server.c
>> +++ b/net/tipc/server.c
>> @@ -623,16 +623,19 @@ int tipc_server_start(struct tipc_server *s)
>>  void tipc_server_stop(struct tipc_server *s)
>>  {
>>         struct tipc_conn *con;
>> -       int total = 0;
>>         int id;
>>
>>         spin_lock_bh(&s->idr_lock);
>> -       for (id = 0; total < s->idr_in_use; id++) {
>> +       for (id = 0; s->idr_in_use; id++) {
>>                 con = idr_find(&s->conn_idr, id);
>> -               if (con) {
>> -                       total++;
>> +               if (con && test_and_clear_bit(CF_CONNECTED,
>> &con->flags)) {
>> +                       idr_remove(&s->conn_idr, con->conid);
>> +                       s->idr_in_use--;
>> +                       /* release spinlock as kernel_sock_shutdown can
>> sleep.
>> +                        */
>>                         spin_unlock_bh(&s->idr_lock);
>> -                       tipc_close_conn(con);
>> +                       kernel_sock_shutdown(con->sock, SHUT_RDWR);
>> +                       conn_put(con);
>>                         spin_lock_bh(&s->idr_lock);
>>                 }
>>         }
>>
>>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to