> -----Original Message-----
> From: Xue, Ying [mailto:[email protected]]
> Sent: Tuesday, 14 June, 2016 05:46
> To: Jon Maloy; [email protected]; Parthasarathy Bhuvaragan;
> Richard Alpe
> Cc: [email protected]; [email protected]
> Subject: RE: [PATCH net v3 1/1] tipc: fix socket timer deadlock
>
> I am concerned about whether it is really safe for us to test the socket
> owner instead of sk_lock.slock. We have identified that the reason for the
> deadlock is that tipc_sk_respond() is called under the socket lock.
>
> For the case of responding to CONN_MANAGER messages alone, testing the
> socket owner is safe. But the solution seems unsafe when a
> TIPC_ERR_OVERLOAD message is sent.
>
> When TIPC_ERR_OVERLOAD is sent out under BH context, the socket's owner is set.

Where? There are two locations where tipc_sk_respond() may be called under
BH, in tipc_sk_enqueue() and filter_rcv(), but it is either-or: either we
are under BH and the socket is not owned, or it is owned and we are not
under BH. I don't see at all what you are referring to.

Regards
///jon

> So the message will be sent out under BH through tipc_node_xmit_skb()
> instead of being inserted into sk->sk_write_queue.
>
> Regards,
> Ying
>
> -----Original Message-----
> From: Jon Maloy [mailto:[email protected]]
> Sent: Tuesday, June 14, 2016 1:51 AM
> To: [email protected]; [email protected]; Xue, Ying;
> [email protected]; [email protected]
> Cc: [email protected]; [email protected]
> Subject: [PATCH net v3 1/1] tipc: fix socket timer deadlock
>
> We sometimes observe a 'deadly embrace' type deadlock occurring between
> mutually connected sockets on the same node. This happens when the
> one-hour peer supervision timers happen to expire simultaneously in both
> sockets.
>
> The scenario is as follows:
>
> CPU 1:                        CPU 2:
> --------                      --------
> tipc_sk_timeout(sk1)          tipc_sk_timeout(sk2)
> lock(sk1.slock)               lock(sk2.slock)
> msg_create(probe)             msg_create(probe)
> unlock(sk1.slock)             unlock(sk2.slock)
> tipc_node_xmit_skb()          tipc_node_xmit_skb()
> tipc_node_xmit()              tipc_node_xmit()
> tipc_sk_rcv(sk2)              tipc_sk_rcv(sk1)
> lock(sk2.slock)               lock(sk1.slock)
> filter_rcv()                  filter_rcv()
> tipc_sk_proto_rcv()           tipc_sk_proto_rcv()
> msg_create(probe_rsp)         msg_create(probe_rsp)
> tipc_sk_respond()             tipc_sk_respond()
> tipc_node_xmit_skb()          tipc_node_xmit_skb()
> tipc_node_xmit()              tipc_node_xmit()
> tipc_sk_rcv(sk1)              tipc_sk_rcv(sk2)
> lock(sk1.slock)               lock(sk2.slock)
> ===> DEADLOCK                 ===> DEADLOCK
>
> Further analysis reveals that there are at least three different locations
> in the socket code where tipc_sk_respond() is called within the context of
> the socket lock, with ensuing risk of similar deadlocks.
>
> We solve this by ensuring that messages created by tipc_sk_respond() are
> sent directly only if the sk_lock.owned mutex is held. Otherwise they are
> queued up in the socket write queue and sent after the socket lock has
> been released.
>
> v2: - Test on the mutex sk_lock.owned instead of sk_lock.slock in
>       tipc_sk_respond(). This is safer, since sk_lock.slock may
>       occasionally and briefly be held (by concurrent user contexts)
>       even if we are in user context.
> v3: - By lowering the socket timeout to 36 ms instead of 3,600,000 ms
>       and setting up 1000 connections, I could easily reproduce the
>       deadlock and verify that my solution works (a user-space sketch
>       of such a setup is appended at the end of this mail).
>     - When killing one of the processes, I sometimes got a kernel crash
>       in the loop emptying the socket write queue. Realizing that there
>       may be concurrent processes emptying the write queue, I had to
>       add a test that the dequeuing actually returned a buffer. This
>       solved the problem.
>     - I tried Ying's suggestion of unconditionally adding all
>       CONN_MANAGER messages to the backlog queue, and it did not work.
>       This is because we would often add the message to the backlog
>       when the socket is *not* owned, so there would be nothing
>       triggering execution of backlog_rcv() within an acceptable time.
>       Apart from that, my solution solves the problem at all three
>       locations where this deadlock may happen, as already stated above.
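
To make the "either-or" point above concrete, below is a much-simplified
sketch of the receive-side dispatch that the fix relies on. It is not the
exact kernel code: the names follow net/tipc/socket.c, but the signatures
are simplified and backlog-limit/overload (TIPC_ERR_OVERLOAD) handling is
omitted.

#include <net/sock.h>
#include <linux/skbuff.h>

/* Stand-in for the TIPC-internal filter function (signature simplified). */
static void filter_rcv(struct sock *sk, struct sk_buff *skb);

/* Much-simplified per-skb dispatch, as done from tipc_sk_rcv() while
 * holding sk->sk_lock.slock with BH disabled.
 */
static void tipc_sk_enqueue_sketch(struct sock *sk, struct sk_buff *skb)
{
	if (!sock_owned_by_user(sk)) {
		/* BH context, socket not owned: filter_rcv() runs right
		 * here under the slock. Any response it creates via
		 * tipc_sk_respond() must not be transmitted now, since
		 * transmission may loop back into tipc_sk_rcv() on the
		 * peer socket and spin on a slock the peer CPU already
		 * holds (the deadly embrace above). The fix therefore
		 * parks such responses on sk->sk_write_queue; they are
		 * flushed by tipc_sk_rcv() after the slock is released.
		 */
		filter_rcv(sk, skb);
	} else {
		/* Socket owned by a user thread: defer to the backlog.
		 * backlog_rcv() -> filter_rcv() then runs later in that
		 * user context, where the slock is not held, so
		 * tipc_sk_respond() may transmit directly.
		 */
		sk_add_backlog(sk, skb, sk->sk_rcvbuf);
	}
}

In other words, direct transmission from tipc_sk_respond() is only safe in
the owned/user-context branch, which is exactly what the sock_owned_by_user()
test in the patch below distinguishes.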
>
> Reported-by: GUNA <[email protected]>
> Signed-off-by: Jon Maloy <[email protected]>
> ---
>  net/tipc/socket.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 88bfcd7..e8ed3a8 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -278,7 +278,11 @@ static void tipc_sk_respond(struct sock *sk, struct sk_buff *skb, int err)
>
>  	dnode = msg_destnode(buf_msg(skb));
>  	selector = msg_origport(buf_msg(skb));
> -	tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
> +
> +	if (sock_owned_by_user(sk))
> +		tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
> +	else
> +		skb_queue_tail(&sk->sk_write_queue, skb);
>  }
>
>  /**
> @@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq)
>  			tipc_sk_enqueue(inputq, sk, dport);
>  			spin_unlock_bh(&sk->sk_lock.slock);
>  		}
> +		/* Send pending response/rejected messages, if any */
> +		while (!skb_queue_empty(&sk->sk_write_queue)) {
> +			skb = skb_dequeue(&sk->sk_write_queue);
> +			if (!skb)
> +				break;
> +			dnode = msg_destnode(buf_msg(skb));
> +			tipc_node_xmit_skb(net, skb, dnode, dport);
> +		}
>  		sock_put(sk);
>  		continue;
>  	}
> --
> 1.9.1
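
For completeness, here is a minimal user-space sketch of the kind of
reproduction setup mentioned in the v3 note above: 1000 mutually connected
TIPC sockets on one node, left idle so their connection supervision (probe)
timers eventually fire. The service type and all names below are arbitrary
and only illustrative, and the 36 ms probe interval itself is a kernel-side
hack (presumably the CONN_PROBING_INTERVAL constant in net/tipc/socket.c),
not something that can be set from user space.

/* repro_sketch.c - illustrative only; needs the tipc module loaded and
 * "ulimit -n" raised above ~2100 to hold 1000 connection pairs.
 */
#include <linux/tipc.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SRV_TYPE 18888u			/* arbitrary service type */
#define NCONN	 1000

int main(void)
{
	static int csk[NCONN], ssk[NCONN];
	struct sockaddr_tipc srv = {
		.family = AF_TIPC,
		.addrtype = TIPC_ADDR_NAMESEQ,
		.scope = TIPC_NODE_SCOPE,
		.addr.nameseq = { .type = SRV_TYPE, .lower = 0, .upper = 0 },
	};
	int lsk, i;

	lsk = socket(AF_TIPC, SOCK_SEQPACKET, 0);
	if (lsk < 0 ||
	    bind(lsk, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
	    listen(lsk, NCONN) < 0) {
		perror("listener");
		return 1;
	}

	for (i = 0; i < NCONN; i++) {
		struct sockaddr_tipc dst;

		memset(&dst, 0, sizeof(dst));
		dst.family = AF_TIPC;
		dst.addrtype = TIPC_ADDR_NAME;
		dst.addr.name.name.type = SRV_TYPE;
		dst.addr.name.name.instance = 0;
		dst.addr.name.domain = 0;	/* look up on own node */

		csk[i] = socket(AF_TIPC, SOCK_SEQPACKET, 0);
		if (csk[i] < 0 ||
		    connect(csk[i], (struct sockaddr *)&dst, sizeof(dst)) < 0) {
			perror("connect");
			return 1;
		}
		ssk[i] = accept(lsk, NULL, NULL);
		if (ssk[i] < 0) {
			perror("accept");
			return 1;
		}
	}

	/* Idle until killed, so both ends of every connection keep
	 * running their probe timers.
	 */
	pause();
	return 0;
}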
