On 12/06/2017 03:31 AM, Jon Maloy wrote:
>> If you in remove the connection item from the idr
>> list *before* you flush the queues and remove the socket, there is no risk
>> that this will happen.
>> If tipc_conn_sendmsg() obtains a pointer, we can be certain that the queue
>> has not been flushed yet, and that this will not happen until conn_put() is
>> called.
>> And, you won't need to use the callback lock or touch the socket at all here.
> It's even better: if you introduce the !kref_get_unless_zero(&con->kref)) 
> test, there is no way tipc_conn_sendmsg() will obtain a pointer while  
> tipc_conn_kref_release() is being executed, i.e., after the queue is flushed.
> I suggest you make both these changes.


Please let me use below call chains to show what the race condition is:

  CPU 1:                                       CPU 2:
==========                                 ===============
T1: tipc_conn_sendmsg
T2:  con = tipc_conn_lookup()
T3:  test_bit(CF_CONNECTED, &con->flags)
T4:   =====> interrupted by another thread;
T5:                                        tipc_close_conn()
T6:                                        test_and_clear_bit(con->flag)
T7:                                        tipc_unregister_callbacks()
T8:                                        flush_work(&con->swork)
T9:                                        flush_work(&con->rwork)
T10: queue_work(s->send_wq, &con->swork)


The key point is that the whole process of identifying whether conn
is/being closed with test_bit(CF_CONNECTED, &con->flags) or
sk->sk_user_data and queuing swork item into workqueue is not atomic.
Even if all checking for connection status are well done and connection
is still connected before swork item is queued, if the thread of running
tipc_conn_sendmsg() is interrupted at T4 and it's backed at T10, a swork
might be queued into workqueue after its queue has been flushed in
tipc_close_conn() on CPU 2.

That's why we need use sk_callback_lock read lock to guarantee that the
entire process of identifying connection status and queuing swork item
cannot be interrupted.

> 
> ///jon
> 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to