On 08/25/2016 05:41 AM, Xue, Ying wrote:
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Wednesday, August 17, 2016 2:09 AM
> To: tipc-discussion@lists.sourceforge.net; 
> parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
> jon.ma...@ericsson.com
> Cc: ma...@donjonn.com; gbala...@gmail.com
> Subject: [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions
>
> As cluster sizes grow, so does the amount of identical or overlapping 
> broadcast NACKs generated by the packet receivers. This often leads to 'NACK 
> crunches' resulting in huge numbers of redundant retransmissions of the same 
> packet ranges.
>
> In this commit, we introduce rate control of broadcast retransmissions, so 
> that a retransmitted range cannot be retransmitted again until after at least 
> 10 ms. This reduces the frequency of duplicate retransmissions by an order of 
> magnitude, while having a significant positive impact on throughput and 
> scalability.
>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> ---
>   net/tipc/link.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 47 insertions(+), 5 deletions(-)
>
> diff --git a/net/tipc/link.c b/net/tipc/link.c index 136316f..58bb44d 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -181,7 +181,10 @@ struct tipc_link {
>       u16 acked;
>       struct tipc_link *bc_rcvlink;
>       struct tipc_link *bc_sndlink;
> -     int nack_state;
> +     unsigned long prev_retr;
> +     u16 prev_from;
> +     u16 prev_to;
> +     u8 nack_state;
>       bool bc_peer_is_up;
>   
>       /* Statistics */
> @@ -202,6 +205,8 @@ enum {
>       BC_NACK_SND_SUPPRESS,
>   };
>   
> +#define TIPC_BC_RETR_LIMIT 10   /* [ms] */
>
> [Ying] I suggest we should define jiffies number rather than 10ms. If so, we 
> don't need to convert l->prev_retr  from jiffies to microsecond.

jiffies is not a uniquely defined entity, as it depends on the clock 
speed of the actual CPU.
I measured 10 ms to be a value that works better than e.g. 2 ms or 50 
ms, so I don't think it is a good idea to add any more randomness to this.

>
> +
>   /*
>    * Interval between NACKs when packets arrive out of order
>    */
> @@ -1590,11 +1595,48 @@ void tipc_link_bc_init_rcv(struct tipc_link *l, 
> struct tipc_msg *hdr)
>               l->rcv_nxt = peers_snd_nxt;
>   }
>   
> +/* link_bc_retr eval()- check if the indicated range can be
> +retransmitted now
> + * - Adjust permitted range if there is overlap with previous
> +retransmission  */ static bool link_bc_retr_eval(struct tipc_link *l,
> +u16 *from, u16 *to) {
> +     unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr);
> +
> +     if (less(*to, *from))
> +             return false;
> +
> +     /* New retransmission request */
> +     if ((elapsed > TIPC_BC_RETR_LIMIT) ||
> +         less(*to, l->prev_from) || more(*from, l->prev_to)) {
> +             l->prev_from = *from;
> +             l->prev_to = *to;
> +             l->prev_retr = jiffies;
> +             return true;
> +     }
> +
>
> [Ying] In the my understanding, the statement above seems to be changed as 
> below:
>
> if ((elapsed > TIPC_BC_RETR_LIMIT) && (less(*to, l->prev_from) || more(*from, 
> l->prev_to)))
>
> Otherwise, I think the rate of retransmission may be still very high 
> especially when packet disorder frequently happens.
> As it's very normal that BCast retransmission range requested from different 
> nodes is not same, I guess the rate is not well suppressed in above condition.
>
> Maybe, we can use the method which is being effective in tipc_sk_enqueue() to 
> limit the rate like:

The condition indicates that the two retransmission requests are 
completely disjoint, e.g., 10-12 and 13-15. I don't see any reason why 
we should wait with retransmitting 13-15 in this case, as it is obvious 
that somebody is missing it, and we haven't retransmitted it before.
But I do understand it can be problematic if re receive repeated 
disjoint nacks, e.g., 10-12, 13-15, 10-12, 13-15 etc. I will make a test 
and see if there is anything to gain from keeping a "history" of the 
retransmissions, so that we never repeat the same packets inside the 
same interval. I'll take a look.

///jon

>
> tipc_sk_enqueue()
> {
> ..
>          while (skb_queue_len(inputq)) {
>                  if (unlikely(time_after_eq(jiffies, time_limit)))
>                          return;
> ..
> }
>
> And we don’t need to consider the retransmission range.
>
> Regards,
> Ying
>
> +     /* Inside range of previous retransmit */
> +     if (!less(*from, l->prev_from) && !more(*to, l->prev_to))
> +             return false;
> +
> +     /* Fully or partially outside previous range => exclude overlap */
> +     if (less(*from, l->prev_from)) {
> +             *to = l->prev_from - 1;
> +             l->prev_from = *from;
> +     }
> +     if (more(*to, l->prev_to)) {
> +             *from = l->prev_to + 1;
> +             l->prev_to = *to;
> +     }
> +     l->prev_retr = jiffies;
> +     return true;
> +}
> +
>   /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
>    */
>   int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
>                         struct sk_buff_head *xmitq)
>   {
> +     struct tipc_link *snd_l = l->bc_sndlink;
>       u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
>       u16 from = msg_bcast_ack(hdr) + 1;
>       u16 to = from + msg_bc_gap(hdr) - 1;
> @@ -1613,14 +1655,14 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct 
> tipc_msg *hdr,
>       if (!l->bc_peer_is_up)
>               return rc;
>   
> +     l->stats.recv_nacks++;
> +
>       /* Ignore if peers_snd_nxt goes beyond receive window */
>       if (more(peers_snd_nxt, l->rcv_nxt + l->window))
>               return rc;
>   
> -     if (!less(to, from)) {
> -             rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
> -             l->stats.recv_nacks++;
> -     }
> +     if (link_bc_retr_eval(snd_l, &from, &to))
> +             rc = tipc_link_retrans(snd_l, from, to, xmitq);
>   
>       l->snd_nxt = peers_snd_nxt;
>       if (link_bc_rcv_gap(l))
> --
> 2.7.4
>


------------------------------------------------------------------------------
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to