On 08/25/2016 05:41 AM, Xue, Ying wrote:
>
> -----Original Message-----
> From: Jon Maloy [mailto:[email protected]]
> Sent: Wednesday, August 17, 2016 2:09 AM
> To: [email protected];
> [email protected]; Xue, Ying; [email protected];
> [email protected]
> Cc: [email protected]; [email protected]
> Subject: [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions
>
> As cluster sizes grow, so does the amount of identical or overlapping
> broadcast NACKs generated by the packet receivers. This often leads to 'NACK
> crunches' resulting in huge numbers of redundant retransmissions of the same
> packet ranges.
>
> In this commit, we introduce rate control of broadcast retransmissions, so
> that a retransmitted range cannot be retransmitted again until after at least
> 10 ms. This reduces the frequency of duplicate retransmissions by an order of
> magnitude, while having a significant positive impact on throughput and
> scalability.
>
> Signed-off-by: Jon Maloy <[email protected]>
> ---
> net/tipc/link.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 47 insertions(+), 5 deletions(-)
>
> diff --git a/net/tipc/link.c b/net/tipc/link.c index 136316f..58bb44d 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -181,7 +181,10 @@ struct tipc_link {
> u16 acked;
> struct tipc_link *bc_rcvlink;
> struct tipc_link *bc_sndlink;
> - int nack_state;
> + unsigned long prev_retr;
> + u16 prev_from;
> + u16 prev_to;
> + u8 nack_state;
> bool bc_peer_is_up;
>
> /* Statistics */
> @@ -202,6 +205,8 @@ enum {
> BC_NACK_SND_SUPPRESS,
> };
>
> +#define TIPC_BC_RETR_LIMIT 10 /* [ms] */
>
> [Ying] I suggest we should define jiffies number rather than 10ms. If so, we
> don't need to convert l->prev_retr from jiffies to microsecond.
jiffies is not a uniquely defined entity, as it depends on the clock
speed of the actual CPU.
I measured 10 ms to be a value that works better than e.g. 2 ms or 50
ms, so I don't think it is a good idea to add any more randomness to this.
>
> +
> /*
> * Interval between NACKs when packets arrive out of order
> */
> @@ -1590,11 +1595,48 @@ void tipc_link_bc_init_rcv(struct tipc_link *l,
> struct tipc_msg *hdr)
> l->rcv_nxt = peers_snd_nxt;
> }
>
> +/* link_bc_retr eval()- check if the indicated range can be
> +retransmitted now
> + * - Adjust permitted range if there is overlap with previous
> +retransmission */ static bool link_bc_retr_eval(struct tipc_link *l,
> +u16 *from, u16 *to) {
> + unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr);
> +
> + if (less(*to, *from))
> + return false;
> +
> + /* New retransmission request */
> + if ((elapsed > TIPC_BC_RETR_LIMIT) ||
> + less(*to, l->prev_from) || more(*from, l->prev_to)) {
> + l->prev_from = *from;
> + l->prev_to = *to;
> + l->prev_retr = jiffies;
> + return true;
> + }
> +
>
> [Ying] In the my understanding, the statement above seems to be changed as
> below:
>
> if ((elapsed > TIPC_BC_RETR_LIMIT) && (less(*to, l->prev_from) || more(*from,
> l->prev_to)))
>
> Otherwise, I think the rate of retransmission may be still very high
> especially when packet disorder frequently happens.
> As it's very normal that BCast retransmission range requested from different
> nodes is not same, I guess the rate is not well suppressed in above condition.
>
> Maybe, we can use the method which is being effective in tipc_sk_enqueue() to
> limit the rate like:
The condition indicates that the two retransmission requests are
completely disjoint, e.g., 10-12 and 13-15. I don't see any reason why
we should wait with retransmitting 13-15 in this case, as it is obvious
that somebody is missing it, and we haven't retransmitted it before.
But I do understand it can be problematic if re receive repeated
disjoint nacks, e.g., 10-12, 13-15, 10-12, 13-15 etc. I will make a test
and see if there is anything to gain from keeping a "history" of the
retransmissions, so that we never repeat the same packets inside the
same interval. I'll take a look.
///jon
>
> tipc_sk_enqueue()
> {
> ..
> while (skb_queue_len(inputq)) {
> if (unlikely(time_after_eq(jiffies, time_limit)))
> return;
> ..
> }
>
> And we don’t need to consider the retransmission range.
>
> Regards,
> Ying
>
> + /* Inside range of previous retransmit */
> + if (!less(*from, l->prev_from) && !more(*to, l->prev_to))
> + return false;
> +
> + /* Fully or partially outside previous range => exclude overlap */
> + if (less(*from, l->prev_from)) {
> + *to = l->prev_from - 1;
> + l->prev_from = *from;
> + }
> + if (more(*to, l->prev_to)) {
> + *from = l->prev_to + 1;
> + l->prev_to = *to;
> + }
> + l->prev_retr = jiffies;
> + return true;
> +}
> +
> /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
> */
> int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> struct sk_buff_head *xmitq)
> {
> + struct tipc_link *snd_l = l->bc_sndlink;
> u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
> u16 from = msg_bcast_ack(hdr) + 1;
> u16 to = from + msg_bc_gap(hdr) - 1;
> @@ -1613,14 +1655,14 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct
> tipc_msg *hdr,
> if (!l->bc_peer_is_up)
> return rc;
>
> + l->stats.recv_nacks++;
> +
> /* Ignore if peers_snd_nxt goes beyond receive window */
> if (more(peers_snd_nxt, l->rcv_nxt + l->window))
> return rc;
>
> - if (!less(to, from)) {
> - rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
> - l->stats.recv_nacks++;
> - }
> + if (link_bc_retr_eval(snd_l, &from, &to))
> + rc = tipc_link_retrans(snd_l, from, to, xmitq);
>
> l->snd_nxt = peers_snd_nxt;
> if (link_bc_rcv_gap(l))
> --
> 2.7.4
>
------------------------------------------------------------------------------
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion