Good job Jon! Reviewed. //E On Jun 10, 2016 04:25, "Jon Maloy" <[email protected]> wrote:
> We have sometimes observed a deadlock situation occurring between > mutually connected sockets on the same node. This happens when the > one-hour peer supervision timers happen to expire simultaneously > in both sockets. > > The scenario is as follows: > > CPU 1: CPU 2: > -------- -------- > tipc_sk_timeout(sk1) tipc_sk_timeout(sk2) > lock(sk1.slock) lock(sk2.slock) > msg_create(probe) msg_create(probe) > unlock(sk1.slock) unlock(sk2.slock) > tipc_node_xmit_skb() tipc_node_xmit_skb() > tipc_node_xmit() tipc_node_xmit() > tipc_sk_rcv(sk2) tipc_sk_rcv(sk1) > lock(s2.slock) lock((sk1.slock) > filter_rcv() filter_rcv() > tipc_sk_proto_rcv() tipc_sk_proto_rcv() > msg_create(probe_rsp) msg_create(probe_rsp) > tipc_sk_respond() tipc_sk_respond() > tipc_node_xmit_skb() tipc_node_xmit_skb() > tipc_node_xmit() tipc_node_xmit() > tipc_sk_rcv(sk1) tipc_sk_rcv(sk2) > lock((sk1.slock) lock((sk2.slock) > ===> DEADLOCK ===> DEADLOCK > > Further analysis reveals that there are at least three different > locations in the socket code where tipc_sk_respond() is called within > the context of the socket lock, with ensuing risk of similar deadlocks. > > We solve this by ensuring that messages created by tipc_sk_respond() > only are sent directly if sk_lock.slock is not held. Otherwise they are > queued up in the socket write queue and sent after the lock has been > released. > > Reported-by: GUNA <[email protected]> > Signed-off-by: Jon Maloy <[email protected]> > --- > net/tipc/socket.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c > index 88bfcd7..b19c73c 100644 > --- a/net/tipc/socket.c > +++ b/net/tipc/socket.c > @@ -278,7 +278,11 @@ static void tipc_sk_respond(struct sock *sk, struct > sk_buff *skb, int err) > > dnode = msg_destnode(buf_msg(skb)); > selector = msg_origport(buf_msg(skb)); > - tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector); > + > + if (!spin_is_locked(&sk->sk_lock.slock)) > + tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector); > + else > + skb_queue_tail(&sk->sk_write_queue, skb); > } > > /** > @@ -1830,6 +1834,12 @@ void tipc_sk_rcv(struct net *net, struct > sk_buff_head *inputq) > tipc_sk_enqueue(inputq, sk, dport); > spin_unlock_bh(&sk->sk_lock.slock); > } > + /* Send pending response messages, if any */ > + while (!skb_queue_empty(&sk->sk_write_queue)) { > + skb = skb_dequeue(&sk->sk_write_queue); > + dnode = msg_destnode(buf_msg(skb)); > + tipc_node_xmit_skb(net, skb, dnode, dport); > + } > sock_put(sk); > continue; > } > -- > 1.9.1 > > > > ------------------------------------------------------------------------------ > What NetFlow Analyzer can do for you? Monitors network bandwidth and > traffic > patterns at an interface-level. Reveals which users, apps, and protocols > are > consuming the most bandwidth. Provides multi-vendor support for NetFlow, > J-Flow, sFlow and other flows. Make informed decisions using capacity > planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e > _______________________________________________ > tipc-discussion mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > ------------------------------------------------------------------------------ What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic patterns at an interface-level. Reveals which users, apps, and protocols are consuming the most bandwidth. Provides multi-vendor support for NetFlow, J-Flow, sFlow and other flows. Make informed decisions using capacity planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e _______________________________________________ tipc-discussion mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/tipc-discussion
