> -----Original Message-----
> From: Ying Xue [mailto:[email protected]]
> Sent: Wednesday, December 06, 2017 03:29
> To: Jon Maloy <[email protected]>; [email protected]
> Cc: [email protected]
> Subject: Re: [net-next PATCH 1/2] tipc: prevent swork item from queuing
> workqueue after connection is closed
> 
> 
> 
> On 12/06/2017 03:31 AM, Jon Maloy wrote:
> >> If you remove the connection item from the idr list *before* you
> >> flush the queues and remove the socket, there is no risk that this
> >> will happen.
> >> If tipc_conn_sendmsg() obtains a pointer, we can be certain that the
> >> queue has not been flushed yet, and that this will not happen until
> >> conn_put() is called.
> >> And, you won't need to use the callback lock or touch the socket at
> >> all here.
> > It's even better: if you introduce a !kref_get_unless_zero(&con->kref)
> > test, there is no way tipc_conn_sendmsg() can obtain a pointer while
> > tipc_conn_kref_release() is being executed, i.e., after the queue has
> > been flushed.
> > I suggest you make both these changes.
> 
> 
> Please let me use the call chains below to show what the race condition is:
> 
>   CPU 1:                                       CPU 2:
> ==========                                 ===============
> T1: tipc_conn_sendmsg
> T2:  con = tipc_conn_lookup()
> T3:  test_bit(CF_CONNECTED, &con->flags)
> T4:   =====> interrupted by another thread;
> T5:                                        tipc_close_conn()
> T6:                                        test_and_clear_bit(con->flag)
> T7:                                        tipc_unregister_callbacks()
> T8:                                        flush_work(&con->swork)
> T9:                                        flush_work(&con->rwork)
> T10: queue_work(s->send_wq, &con->swork)
> 
> 
> The key point is that the whole sequence of checking whether the connection
> is closed or being closed (with test_bit(CF_CONNECTED, &con->flags) or
> sk->sk_user_data) and queuing the swork item onto the workqueue is not atomic.

That is true. But the kref test is: if the kref value is zero, nobody else 
has a pointer to the connection item. And vice versa, if it is non-zero, the 
queues cannot have been flushed yet.
I am not even sure that the CF_CONNECTED flag is needed anymore, but I need to 
think more about that.
What I am suggesting is that we move most of the functionality that is 
currently in tipc_close_conn() to tipc_conn_kref_release(). Something like this:

static void tipc_close_conn(struct tipc_conn *con, bool flush)
{
        if (test_and_clear_bit(CF_CONNECTED, &con->flags))
                conn_put(con);
} 

static void tipc_conn_kref_release(struct kref *kref)
{
        struct tipc_conn *con = container_of(kref, struct tipc_conn, kref);
        struct tipc_server *s = con->server;
        struct sockaddr_tipc *saddr = s->saddr;
        struct socket *sock = con->sock;
        struct sock *sk;

        /* Unpublish first: after this, tipc_conn_lookup() cannot find us */
        spin_lock_bh(&s->idr_lock);
        idr_remove(&s->conn_idr, con->conid);
        s->idr_in_use--;
        spin_unlock_bh(&s->idr_lock);

        if (sock) {
                sk = sock->sk;
                if (test_bit(CF_SERVER, &con->flags)) {
                        __module_get(sock->ops->owner);
                        __module_get(sk->sk_prot_creator->owner);
                }
                /* No new work can be queued at this point,
                 * so the flushes below are final */
                flush_work(&con->swork);
                flush_work(&con->rwork);
                saddr->scope = -TIPC_NODE_SCOPE;
                kernel_bind(sock, (struct sockaddr *)saddr, sizeof(*saddr));
                if (con->conid)
                        s->tipc_conn_release(con->conid, con->usr_data);
                sock_release(sock);
                con->sock = NULL;
        }
        tipc_clean_outqueues(con);
        kfree(con);
}
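For completeness, the lookup side I have in mind would look something like 
this. Only a sketch, and assuming tipc_conn_lookup() takes its reference under 
the idr lock, so that it can never race with the release above:

```c
static struct tipc_conn *tipc_conn_lookup(struct tipc_server *s, int conid)
{
        struct tipc_conn *con;

        spin_lock_bh(&s->idr_lock);
        con = idr_find(&s->conn_idr, conid);
        /* Grab a reference only if the item is still alive; once the
         * kref has dropped to zero, release is in progress and the
         * queues may already be flushed. */
        if (con && !kref_get_unless_zero(&con->kref))
                con = NULL;
        spin_unlock_bh(&s->idr_lock);
        return con;
}
```

Since tipc_conn_kref_release() above removes the item from the idr under the 
same lock before flushing, a successful lookup guarantees the queues have not 
been flushed and will not be until the caller does conn_put().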

///jon

> Even if all the connection status checks pass and the connection is still
> connected before the swork item is queued, if the thread running
> tipc_conn_sendmsg() is interrupted at T4 and resumes at T10, the swork
> item may be queued onto the workqueue after the queue has already been
> flushed by tipc_close_conn() on CPU 2.
> 
> That's why we need to take the sk_callback_lock read lock to guarantee that
> the entire sequence of checking the connection status and queuing the swork
> item cannot be interrupted.
> 
> >
> > ///jon
> >
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
