Thank you for your thorough testing. This is really the kind of work that helps us keeping out of trouble. See my comments below.
///jon > -----Original Message----- > From: Parthasarathy Bhuvaragan > Sent: Tuesday, 20 December, 2016 07:08 > To: Jon Maloy <[email protected]>; [email protected]; > Ying Xue <[email protected]> > Subject: Re: [tipc-discussion] [net-next v3 3/3] tipc: reduce risk of user > starvation > during link congestion > > On 12/20/2016 12:32 PM, Parthasarathy Bhuvaragan wrote: > > Hi Jon, > > > > This patch in the series caused multiple stability issues in my test. I > > spent couple of days to fix all of them in the patches attached. > > > > If you are ok with the fixes, then > > Reviewed-by: Parthasarathy Bhuvaragan > > <[email protected]> > > > > /Partha > I see that the attachments were not delivered. > So sending all the 4 patches inline. > > /Partha > > --- > > From 5e1d48de1e441c5583872d2c2d95cc9353ddd63e Mon Sep 17 00:00:00 2001 > From: Parthasarathy Bhuvaragan <[email protected]> > Date: Thu, 15 Dec 2016 16:02:20 +0100 > Subject: [PATCH net-next v1 1/4] tipc: fix node error handling at link > congestion > > Since the link can return ELINKCONG, but still transmits the frame > on the link, we need to adjust the return code handling in the node. > > Signed-off-by: Parthasarathy Bhuvaragan > <[email protected]> > --- > net/tipc/node.c | 16 +++++----------- > 1 file changed, 5 insertions(+), 11 deletions(-) > > diff --git a/net/tipc/node.c b/net/tipc/node.c > index f51d360dac77..edfb05746131 100644 > --- a/net/tipc/node.c > +++ b/net/tipc/node.c > @@ -1206,10 +1206,10 @@ int tipc_node_xmit(struct net *net, struct > sk_buff_head *list, > spin_unlock_bh(&le->lock); > tipc_node_read_unlock(n); > > - if (likely(rc == 0)) > - tipc_bearer_xmit(net, bearer_id, &xmitq, &le->maddr); > - else if (rc == -ENOBUFS) > + if (unlikely(rc == -ENOBUFS)) > tipc_node_link_down(n, bearer_id, false); > + else > + tipc_bearer_xmit(net, bearer_id, &xmitq, &le->maddr); This is logically correct but not really necessary. When the link returns -ELINKCONG xmitq will always be empty. We should still do this change for clarity. > > tipc_node_put(n); > > @@ -1219,22 +1219,16 @@ int tipc_node_xmit(struct net *net, struct > sk_buff_head *list, > /* tipc_node_xmit_skb(): send single buffer to destination > * Buffers sent via this functon are generally TIPC_SYSTEM_IMPORTANCE > * messages, which will not be rejected > - * The only exception is datagram messages rerouted after secondary > - * lookup, which are rare and safe to dispose of anyway. > - * TODO: Return real return value, and let callers use > - * tipc_wait_for_sendpkt() where applicable > */ > int tipc_node_xmit_skb(struct net *net, struct sk_buff *skb, u32 dnode, > u32 selector) > { > struct sk_buff_head head; > - int rc; > > skb_queue_head_init(&head); > __skb_queue_tail(&head, skb); > - rc = tipc_node_xmit(net, &head, dnode, selector); > - if (rc == -ELINKCONG) > - kfree_skb(skb); > + tipc_node_xmit(net, &head, dnode, selector); Yes, clearly a bug. Thank you. > + > return 0; > } > > -- > 2.1.4 > > From 2c8a355536bf2089b9e216fae919150acc357b40 Mon Sep 17 00:00:00 2001 > From: Parthasarathy Bhuvaragan <[email protected]> > Date: Tue, 20 Dec 2016 12:04:28 +0100 > Subject: [PATCH net-next v1 2/4] tipc: fix sending to own node even at link > congestion > > We let the broadcast layer to transmit a copy to the own node even > if the link returns ELINKCONG. > > Signed-off-by: Parthasarathy Bhuvaragan > <[email protected]> > --- > net/tipc/bcast.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c > index 1a56cabbf389..247f87518565 100644 > --- a/net/tipc/bcast.c > +++ b/net/tipc/bcast.c > @@ -196,8 +196,10 @@ int tipc_bcast_xmit(struct net *net, struct > sk_buff_head *list) > rc = tipc_link_xmit(l, list, &xmitq); > tipc_bcast_unlock(net); > > - /* Don't send to local node if adding to link failed */ > - if (unlikely(rc)) { > + /* Don't send to local node if adding to link failed > + * for reasons other than -ELINKCONG. > + */ > + if (unlikely(rc) && (rc != -ELINKCONG)) { > __skb_queue_purge(&rcvq); > return rc; > } Also a bug, although I had already fixed that through the code changes in the later upcoming "replicast" series; this is why I didn't catch it here. > -- > 2.1.4 > > From e69d41d9836062429069364244191c15442fca43 Mon Sep 17 00:00:00 2001 > From: Parthasarathy Bhuvaragan <[email protected]> > Date: Tue, 20 Dec 2016 08:33:21 +0100 > Subject: [PATCH net-next v1 3/4] tipc: fix return code from bcast_xmit > > We must send the return code from tipc_bcast_xmit() to the caller > tipc_sendmcast() to detect link congestion in socket layer. > > Signed-off-by: Parthasarathy Bhuvaragan > <[email protected]> > --- > net/tipc/bcast.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c > index 247f87518565..c91b276f6831 100644 > --- a/net/tipc/bcast.c > +++ b/net/tipc/bcast.c > @@ -208,7 +208,7 @@ int tipc_bcast_xmit(struct net *net, struct > sk_buff_head *list) > tipc_bcbase_xmit(net, &xmitq); > tipc_sk_mcast_rcv(net, &rcvq, &inputq); > __skb_queue_purge(list); > - return 0; > + return rc; > } Also obsoleted by the next series, but ok. > > /* tipc_bcast_rcv - receive a broadcast packet, and deliver to rcv link > -- > 2.1.4 > > From b3c1ad38d0fa710630cf6df02d1ec627277b5144 Mon Sep 17 00:00:00 2001 > From: Parthasarathy Bhuvaragan <[email protected]> > Date: Tue, 20 Dec 2016 10:21:26 +0100 > Subject: [PATCH net-next v1 4/4] tipc: cleanup the congested links for > multicast > > If the same socket is used for unicast and multicast, then the > user experiences congestion on the multicast link if any of the > unicast link was congested in prior send. This is also ok, but when I later introduce replicast this is exactly the effect we want, and that is why I did it this way. I could of course make it smarter and look into the "method" struct when the return value is -ELINKCONG, and only purge when I see that the broadcast link was used. What worries me is what would happen if the next message also happens to be a unicast to the congested/now-purged link. > > In this commit, we purge the congested links in tipc_sendmcast(). > > Signed-off-by: Parthasarathy Bhuvaragan > <[email protected]> > --- > net/tipc/socket.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c > index 3dd851aaf050..cc82b8ac8e97 100644 > --- a/net/tipc/socket.c > +++ b/net/tipc/socket.c > @@ -553,6 +553,7 @@ static int tipc_release(struct socket *sock) > /* Reject any messages that accumulated in backlog queue */ > release_sock(sk); > u32_list_purge(&tsk->cong_links); > + tsk->cong_link_cnt = 0; This part is ok. > call_rcu(&tsk->rcu, tipc_sk_callback); > sock->sk = NULL; > > @@ -760,6 +761,7 @@ static int tipc_sendmcast(struct socket *sock, > struct tipc_name_seq *seq, > > rc = tipc_bcast_xmit(net, &pkts); > if (unlikely(rc == -ELINKCONG)) { > + u32_list_purge(&tsk->cong_links); Less sure about this. ///jon > tsk->cong_link_cnt = 1; > rc = 0; > } > -- > 2.1.4 ------------------------------------------------------------------------------ Developer Access Program for Intel Xeon Phi Processors Access to Intel Xeon Phi processor-based developer platforms. With one year of Intel Parallel Studio XE. Training and support from Colfax. Order your platform today.http://sdm.link/intel _______________________________________________ tipc-discussion mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/tipc-discussion
