[tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Jon Maloy
We sometimes observe a 'deadly embrace' type deadlock occurring between mutually connected sockets on the same node. This happens when the one-hour peer supervision timers happen to expire simultaneously in both sockets. The scenario is as follows: CPU 1: CPU 2:

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Erik Hugne
On Mon, Jun 13, 2016 at 01:51:15PM -0400, Jon Maloy wrote: > --- a/net/tipc/socket.c > +++ b/net/tipc/socket.c > @@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head > *inputq) > tipc_sk_enqueue(inputq, sk, dport); >

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Jon Maloy
On 06/13/2016 02:24 PM, Erik Hugne wrote: > On Mon, Jun 13, 2016 at 01:51:15PM -0400, Jon Maloy wrote: >> --- a/net/tipc/socket.c >> +++ b/net/tipc/socket.c >> @@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head >> *inputq) >> tipc_sk_enqueue

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Erik Hugne
On Jun 13, 2016 20:29, "Jon Maloy" wrote: > I already thought about that, and checked the implementation of > skb_dequeue(). As I suspected it unconditionally grabs the queue lock > before checking if there is anything to return. We don't want that on > the critical data path. > What i suggested

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Jon Maloy
On 06/13/2016 03:20 PM, Erik Hugne wrote: > > > On Jun 13, 2016 20:29, "Jon Maloy" > wrote: > > I already thought about that, and checked the implementation of > > skb_dequeue(). As I suspected it unconditionally grabs the queue lock > > before checking if there is anyt

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Erik Hugne
On Jun 14, 2016 00:42, "Jon Maloy" wrote: > It definitely is. When checking for !skb_queue_empty() the only thing done is a comparison between the list->next pointer and the list pointer itself, i.e., the lock is not taken. Since this condition almost never is true, we can skip over the content o

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-14 Thread Jon Maloy
> -Original Message- > From: Xue, Ying [mailto:ying@windriver.com] > Sent: Tuesday, 14 June, 2016 05:46 > To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Parthasarathy > Bhuvaragan; > Richard Alpe > Cc: ma...@donjonn.com; gbala...@gmail.com > Subject: RE: [PATCH net v3 1/1] tip

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-15 Thread Xue, Ying
gbala...@gmail.com Subject: Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock On Mon, Jun 13, 2016 at 01:51:15PM -0400, Jon Maloy wrote: > --- a/net/tipc/socket.c > +++ b/net/tipc/socket.c > @@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_h

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-15 Thread Xue, Ying
I concern whether it's really safe for us by testing socket owner instead of sk_lock.slock. We have identified the deadlock reason is because tipc_sk_respond() is called under socket lock. Just for the case of responding to CONN_MANAGER messages, testing socket owner is safe. But the solution

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-15 Thread Xue, Ying
> Just for the case of responding to CONN_MANAGER messages, testing > socket owner is safe. But the solution seems unsafe when > TIPC_ERR_OVERLOAD message is sent. > > While TIPC_ERR_OVERLOAD is sent out under BH context, socket's owner is set. Where? There are two locations where tipc_sk_res