I'm concerned about whether it's really safe for us to test the socket owner instead of sk_lock.slock. We have identified that the deadlock occurs because tipc_sk_respond() is called under the socket lock.
Just for the case of responding to CONN_MANAGER messages, testing the socket owner is safe. But the solution seems unsafe when a TIPC_ERR_OVERLOAD message is sent: if TIPC_ERR_OVERLOAD is sent out in BH context while the socket's owner is set, the message will be sent out directly in BH context through tipc_node_xmit_skb() instead of being inserted into sk->sk_write_queue.

Regards,
Ying

-----Original Message-----
From: Jon Maloy [mailto:[email protected]]
Sent: Tuesday, June 14, 2016 1:51 AM
To: [email protected]; [email protected]; Xue, Ying; [email protected]; [email protected]
Cc: [email protected]; [email protected]
Subject: [PATCH net v3 1/1] tipc: fix socket timer deadlock

We sometimes observe a 'deadly embrace' type deadlock occurring between mutually connected sockets on the same node. This happens when the one-hour peer supervision timers happen to expire simultaneously in both sockets. The scenario is as follows:

CPU 1:                               CPU 2:
--------                             --------
tipc_sk_timeout(sk1)                 tipc_sk_timeout(sk2)
  lock(sk1.slock)                      lock(sk2.slock)
  msg_create(probe)                    msg_create(probe)
  unlock(sk1.slock)                    unlock(sk2.slock)
  tipc_node_xmit_skb()                 tipc_node_xmit_skb()
    tipc_node_xmit()                     tipc_node_xmit()
      tipc_sk_rcv(sk2)                     tipc_sk_rcv(sk1)
        lock(sk2.slock)                      lock(sk1.slock)
        filter_rcv()                         filter_rcv()
          tipc_sk_proto_rcv()                  tipc_sk_proto_rcv()
            msg_create(probe_rsp)                msg_create(probe_rsp)
            tipc_sk_respond()                    tipc_sk_respond()
              tipc_node_xmit_skb()                 tipc_node_xmit_skb()
                tipc_node_xmit()                     tipc_node_xmit()
                  tipc_sk_rcv(sk1)                     tipc_sk_rcv(sk2)
                    lock(sk1.slock)                      lock(sk2.slock)
                    ===> DEADLOCK                        ===> DEADLOCK

Further analysis reveals that there are at least three different locations in the socket code where tipc_sk_respond() is called within the context of the socket lock, with ensuing risk of similar deadlocks.

We solve this by ensuring that messages created by tipc_sk_respond() are only sent directly if the sk_lock.owned mutex is held. Otherwise they are queued up in the socket write queue and sent after the socket lock has been released.
v2: - Test on the mutex sk_lock.owned instead of sk_lock.slock in
      tipc_sk_respond(). This is safer, since sk_lock.slock may
      occasionally and briefly be held (by concurrent user contexts)
      even if we are in user context.

v3: - By lowering the socket timeout to 36 ms instead of 3,600,000 ms
      and setting up 1000 connections I could easily reproduce the
      deadlock and verify that my solution works.
    - When killing one of the processes I sometimes got a kernel crash
      in the loop emptying the socket write queue. Realizing that there
      may be concurrent processes emptying the write queue, I had to
      add a test that the dequeuing actually returned a buffer. This
      solved the problem.
    - I tried Ying's suggestion of unconditionally adding all
      CONN_MANAGER messages to the backlog queue, and it didn't work.
      This is because we will often add the message to the backlog when
      the socket is *not* owned, so there will be nothing triggering
      execution of backlog_rcv() within an acceptable time. Apart from
      that, my solution solves the problem at all three locations where
      this deadlock may happen, as already stated above.
Reported-by: GUNA <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
---
 net/tipc/socket.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 88bfcd7..e8ed3a8 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -278,7 +278,11 @@ static void tipc_sk_respond(struct sock *sk, struct sk_buff *skb, int err)
 
 	dnode = msg_destnode(buf_msg(skb));
 	selector = msg_origport(buf_msg(skb));
-	tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+
+	if (sock_owned_by_user(sk))
+		tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+	else
+		skb_queue_tail(&sk->sk_write_queue, skb);
 }
 
 /**
@@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq)
 			tipc_sk_enqueue(inputq, sk, dport);
 			spin_unlock_bh(&sk->sk_lock.slock);
 		}
+		/* Send pending response/rejected messages, if any */
+		while (!skb_queue_empty(&sk->sk_write_queue)) {
+			skb = skb_dequeue(&sk->sk_write_queue);
+			if (!skb)
+				break;
+			dnode = msg_destnode(buf_msg(skb));
+			tipc_node_xmit_skb(net, skb, dnode, dport);
+		}
 		sock_put(sk);
 		continue;
 	}
-- 
1.9.1

_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
