On 08/25/2016 05:41 AM, Xue, Ying wrote: > > -----Original Message----- > From: Jon Maloy [mailto:jon.ma...@ericsson.com] > Sent: Wednesday, August 17, 2016 2:09 AM > To: tipc-discussion@lists.sourceforge.net; > parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; > jon.ma...@ericsson.com > Cc: ma...@donjonn.com; gbala...@gmail.com > Subject: [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions > > As cluster sizes grow, so does the amount of identical or overlapping > broadcast NACKs generated by the packet receivers. This often leads to 'NACK > crunches' resulting in huge numbers of redundant retransmissions of the same > packet ranges. > > In this commit, we introduce rate control of broadcast retransmissions, so > that a retransmitted range cannot be retransmitted again until after at least > 10 ms. This reduces the frequency of duplicate retransmissions by an order of > magnitude, while having a significant positive impact on throughput and > scalability. > > Signed-off-by: Jon Maloy <jon.ma...@ericsson.com> > --- > net/tipc/link.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 47 insertions(+), 5 deletions(-) > > diff --git a/net/tipc/link.c b/net/tipc/link.c index 136316f..58bb44d 100644 > --- a/net/tipc/link.c > +++ b/net/tipc/link.c > @@ -181,7 +181,10 @@ struct tipc_link { > u16 acked; > struct tipc_link *bc_rcvlink; > struct tipc_link *bc_sndlink; > - int nack_state; > + unsigned long prev_retr; > + u16 prev_from; > + u16 prev_to; > + u8 nack_state; > bool bc_peer_is_up; > > /* Statistics */ > @@ -202,6 +205,8 @@ enum { > BC_NACK_SND_SUPPRESS, > }; > > +#define TIPC_BC_RETR_LIMIT 10 /* [ms] */ > > [Ying] I suggest we should define jiffies number rather than 10ms. If so, we > don't need to convert l->prev_retr from jiffies to microsecond.
jiffies is not a uniquely defined entity, as it depends on the clock speed of the actual CPU. I measured 10 ms to be a value that works better than e.g. 2 ms or 50 ms, so I don't think it is a good idea to add any more randomness to this. > > + > /* > * Interval between NACKs when packets arrive out of order > */ > @@ -1590,11 +1595,48 @@ void tipc_link_bc_init_rcv(struct tipc_link *l, > struct tipc_msg *hdr) > l->rcv_nxt = peers_snd_nxt; > } > > +/* link_bc_retr eval()- check if the indicated range can be > +retransmitted now > + * - Adjust permitted range if there is overlap with previous > +retransmission */ static bool link_bc_retr_eval(struct tipc_link *l, > +u16 *from, u16 *to) { > + unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr); > + > + if (less(*to, *from)) > + return false; > + > + /* New retransmission request */ > + if ((elapsed > TIPC_BC_RETR_LIMIT) || > + less(*to, l->prev_from) || more(*from, l->prev_to)) { > + l->prev_from = *from; > + l->prev_to = *to; > + l->prev_retr = jiffies; > + return true; > + } > + > > [Ying] In the my understanding, the statement above seems to be changed as > below: > > if ((elapsed > TIPC_BC_RETR_LIMIT) && (less(*to, l->prev_from) || more(*from, > l->prev_to))) > > Otherwise, I think the rate of retransmission may be still very high > especially when packet disorder frequently happens. > As it's very normal that BCast retransmission range requested from different > nodes is not same, I guess the rate is not well suppressed in above condition. > > Maybe, we can use the method which is being effective in tipc_sk_enqueue() to > limit the rate like: The condition indicates that the two retransmission requests are completely disjoint, e.g., 10-12 and 13-15. I don't see any reason why we should wait with retransmitting 13-15 in this case, as it is obvious that somebody is missing it, and we haven't retransmitted it before. But I do understand it can be problematic if re receive repeated disjoint nacks, e.g., 10-12, 13-15, 10-12, 13-15 etc. I will make a test and see if there is anything to gain from keeping a "history" of the retransmissions, so that we never repeat the same packets inside the same interval. I'll take a look. ///jon > > tipc_sk_enqueue() > { > .. > while (skb_queue_len(inputq)) { > if (unlikely(time_after_eq(jiffies, time_limit))) > return; > .. > } > > And we don’t need to consider the retransmission range. > > Regards, > Ying > > + /* Inside range of previous retransmit */ > + if (!less(*from, l->prev_from) && !more(*to, l->prev_to)) > + return false; > + > + /* Fully or partially outside previous range => exclude overlap */ > + if (less(*from, l->prev_from)) { > + *to = l->prev_from - 1; > + l->prev_from = *from; > + } > + if (more(*to, l->prev_to)) { > + *from = l->prev_to + 1; > + l->prev_to = *to; > + } > + l->prev_retr = jiffies; > + return true; > +} > + > /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state > */ > int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, > struct sk_buff_head *xmitq) > { > + struct tipc_link *snd_l = l->bc_sndlink; > u16 peers_snd_nxt = msg_bc_snd_nxt(hdr); > u16 from = msg_bcast_ack(hdr) + 1; > u16 to = from + msg_bc_gap(hdr) - 1; > @@ -1613,14 +1655,14 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct > tipc_msg *hdr, > if (!l->bc_peer_is_up) > return rc; > > + l->stats.recv_nacks++; > + > /* Ignore if peers_snd_nxt goes beyond receive window */ > if (more(peers_snd_nxt, l->rcv_nxt + l->window)) > return rc; > > - if (!less(to, from)) { > - rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq); > - l->stats.recv_nacks++; > - } > + if (link_bc_retr_eval(snd_l, &from, &to)) > + rc = tipc_link_retrans(snd_l, from, to, xmitq); > > l->snd_nxt = peers_snd_nxt; > if (link_bc_rcv_gap(l)) > -- > 2.7.4 > ------------------------------------------------------------------------------ _______________________________________________ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tipc-discussion