-----Original Message-----
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Wednesday, August 31, 2016 12:06 AM
To: Xue, Ying; Jon Maloy; tipc-discussion@lists.sourceforge.net; Parthasarathy 
Bhuvaragan; Richard Alpe
Cc: gbala...@gmail.com
Subject: RE: [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions



> -----Original Message-----
> From: Xue, Ying [mailto:ying....@windriver.com]
> Sent: Tuesday, 30 August, 2016 05:55
> To: Jon Maloy <ma...@donjonn.com>; Jon Maloy <jon.ma...@ericsson.com>; 
> tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan 
> <parthasarathy.bhuvara...@ericsson.com>; Richard Alpe 
> <richard.a...@ericsson.com>
> Cc: gbala...@gmail.com
> Subject: RE: [PATCH net-next 2/3] tipc: rate limit broadcast 
> retransmissions
> 
> > +/* link_bc_retr_eval()- check if the indicated range can be retransmitted now
> > + * - Adjust permitted range if there is overlap with previous retransmission
> > + */
> > +static bool link_bc_retr_eval(struct tipc_link *l, u16 *from, u16 *to)
> > +{
> > +   unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr);
> > +
> > +   if (less(*to, *from))
> > +           return false;
> > +
> > +   /* New retransmission request */
> > +   if ((elapsed > TIPC_BC_RETR_LIMIT) ||
> > +       less(*to, l->prev_from) || more(*from, l->prev_to)) {
> > +           l->prev_from = *from;
> > +           l->prev_to = *to;
> > +           l->prev_retr = jiffies;
> > +           return true;
> > +   }
> > +
> >
> > [Ying] In my understanding, the statement above seems to need to be changed as below:
> >
> > if ((elapsed > TIPC_BC_RETR_LIMIT) &&
> >     (less(*to, l->prev_from) || more(*from, l->prev_to)))
> >
> > Otherwise, I think the retransmission rate may still be very high, especially
> > when packets are frequently reordered. As it is quite normal that the bcast
> > retransmission ranges requested by different nodes are not the same, I suspect
> > the rate is not well suppressed by the condition above.
> >
> > Maybe we can use the method that has proven effective in tipc_sk_enqueue()
> > to limit the rate, like:
> 
> The condition indicates that the two retransmission requests are
> completely disjoint, e.g., 10-12 and 13-15.
>
> [Ying] Understood, and it makes sense.
>
> I don't see any reason why we should hold back the retransmission of 13-15
> in this case, as it is obvious that somebody is missing it, and we haven't
> retransmitted it before.
> 
> [Ying] In most cases I have observed on bcast before, just after a NACK is
> submitted to ask the sender to retransmit missed packets, the requested
> packets arrive anyway shortly afterwards. But the sender still retransmits
> the requested packets through bcast later, wasting network bandwidth.
> When packets are received out of order, the situation gets even worse,
> which may be one of the reasons why we see so many duplicated bcast
> messages on the wire. To avoid this, it would be better either to postpone
> the retransmission of messages a little on the sender side, or,
> alternatively, to restrain the retransmission requests on the receiver side.
> In other words, when a receiver detects a gap for the first time, it would
> be better to delay a bit before submitting the NACK message.
> 

There is an implicit delay in the fact that the gap detection is made by
looking at STATE messages, and not directly at the packet flow. Reception of
a STATE message should typically happen after the indicated "last_sent" packet
has already been received, and after the detection we want one single
retransmission of the missing packets as soon as possible in order not to waste
bandwidth.
To further speed up detection and retransmission, you will see that in the
next commit I introduce detection and NACK-ing directly from the incoming packet
stream, although it is delayed and coordinated among the receivers.

In brief, I see only positive effects from early gap detection and sending of
NACKs, as long as we can apply intelligent rate limiting to the actual
retransmissions.
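
As a condensed illustration of that rate-limiting idea, the sketch below follows
the jiffies-based pattern used in link_bc_retr_eval(), but with hypothetical
names, an assumed time limit, and plain comparisons that ignore sequence-number
wrap-around (unlike the real less()/more() helpers); it also omits the
overlap-trimming step of the patch, so treat it as illustration only:

    #include <linux/jiffies.h>
    #include <linux/types.h>

    #define RETR_LIMIT_MS 10      /* assumed limit, for illustration only */

    struct bc_retr_state {
            unsigned long prev_retr;  /* jiffies of previous retransmission */
            u16 prev_from;            /* previously retransmitted range     */
            u16 prev_to;
    };

    /* Allow a retransmission when enough time has passed since the previous
     * one, or when the requested range is completely disjoint from it.
     */
    static bool bc_retr_allowed(struct bc_retr_state *s, u16 from, u16 to)
    {
            unsigned long elapsed = jiffies_to_msecs(jiffies - s->prev_retr);

            if (elapsed > RETR_LIMIT_MS ||
                to < s->prev_from || from > s->prev_to) {
                    s->prev_from = from;
                    s->prev_to = to;
                    s->prev_retr = jiffies;
                    return true;
            }
            /* Overlapping request arriving too soon: suppress it */
            return false;
    }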

[Ying] Thanks for the clear explanation, and it makes sense.

> Moreover, I have a more aggressive idea: now that NACKs are delivered
> through unicast, why not use unicast to retransmit bcast messages as well?
> After all, missed packets rarely happen in practice.

I tested this by running the "bcast-blast" program between two peers in a
200-node network, and I can assure you there were plenty of missed packets.
Moreover, it seems to me that when a packet is lost, it is most often because
it is dropped by an overwhelmed bridge, so that many, if not all, recipients
will miss the same packet. So I don't think unicast is the solution.

[Ying] Yes, for packet loss it's much more efficient to retransmit bcast
messages through bcast than through unicast. But compared to packet loss,
packet reordering seems to happen more easily. If we cannot properly restrain
the retransmission rate when packets are out of order, the entire network can
quickly be overwhelmed because too many packets are retransmitted repeatedly.
At least, I have observed that phenomenon before.

Of course, I agree with you. At present we should still use bcast to respond to
retransmission requests.


My testing also consisted of adding many atomic counters in the code and
printing them out on command. This gave good insight into the statistical
behavior of the protocol across all nodes (name spaces). In this rather extreme
test I could see a lot of retransmissions (almost 100 %), but counting the
absolute number of packets sent and received I could also observe that the
percentage of duplicate packets (i.e., redundant retransmissions) was very low
(< 10 %), something that led me to believe that the retransmission protocol is
about as efficient as it can be.
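
As a rough illustration of this kind of counter-based instrumentation, a
minimal sketch is shown below; the counter names, increment helpers and
read-out function are assumptions for illustration, not the actual counters
used in the test above:

    #include <linux/atomic.h>

    /* Hypothetical instrumentation sketch: global counters bumped at the
     * interesting events, read out later (e.g. via a debugfs file or a
     * netlink command) to compute retransmission and duplicate ratios.
     */
    static atomic_t bc_sent = ATOMIC_INIT(0); /* bcast packets sent           */
    static atomic_t bc_retr = ATOMIC_INIT(0); /* bcast packets retransmitted  */
    static atomic_t bc_dups = ATOMIC_INIT(0); /* duplicates seen by receiver  */

    static inline void count_bc_sent(void) { atomic_inc(&bc_sent); }
    static inline void count_bc_retr(void) { atomic_inc(&bc_retr); }
    static inline void count_bc_dup(void)  { atomic_inc(&bc_dups); }

    /* Example read-out: duplicates per thousand received packets */
    static unsigned int bc_dup_permille(unsigned int received)
    {
            if (!received)
                    return 0;
            return (1000u * (unsigned int)atomic_read(&bc_dups)) / received;
    }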

[Ying] I don't have an environment where 200 nodes are linked together as a
TIPC cluster. Your measurement method seems wonderful! But I still wonder how
efficient our current bcast retransmission algorithm is if packets are often
reordered.

It is possible that we can make throughput better by decreasing the absolute 
number of dropped packets, but I rather think this can be achieved by 
decreasing the absolute pressure on the bridge by using a slower sending rate, 
and I doubt there is much to achieve from improving the retransmission 
protocol. I am trying to figure out a way to do this, but just decreasing the 
window size didn't do the trick.
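
For clarity, the window here is the link's send window, i.e. the maximum number
of unacknowledged packets allowed in flight, so shrinking it indirectly slows
the sender down. A purely illustrative toy fragment (not TIPC code) of that
relationship:

    #include <linux/types.h>

    /* Toy illustration only: with a smaller window, fewer packets may be in
     * flight at once, which caps the burst pressure put on intermediate
     * bridges. Sequence arithmetic is modulo 2^16, as in TIPC.
     */
    struct toy_link {
            u16 snd_nxt;  /* next sequence number to send  */
            u16 acked;    /* highest sequence number acked */
            u16 window;   /* max packets allowed in flight */
    };

    static bool toy_can_send(const struct toy_link *l)
    {
            return (u16)(l->snd_nxt - l->acked) < l->window;
    }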

My feeling is that there is more that can be done to this, but I haven't 
figured out how, and I don't have much more time to spend on this right now.

[Ying] Yes, there is always a tradeoff when we improve the performance of the
bcast retransmission algorithm. But I once found that there are lots of papers
researching this topic, although I did not look carefully at how they did it.
I believe these papers may give us some ideas. If I find some papers that are
useful for us, I will share them here.

By the way, I have no other concerns about the series.

Thanks,
Ying

BR
///jon

> So when bcast messages are retransmitted through bcast, these messages are
> redundant for most nodes, which probably means it's unnecessary to send them
> through bcast.
> 
> Regards,
> Ying
> 
> But I do understand it can be problematic if we receive repeated
> disjoint NACKs, e.g., 10-12, 13-15, 10-12, 13-15 etc. I will run a
> test and see if there is anything to gain from keeping a "history" of
> the retransmissions, so that we never repeat the same packets inside
> the same interval. I'll take a look.
> 
> ///jon
> 
> >
> > tipc_sk_enqueue()
> > {
> > ..
> >          while (skb_queue_len(inputq)) {
> >                  if (unlikely(time_after_eq(jiffies, time_limit)))
> >                          return;
> > ..
> > }
> >
> > And we don’t need to consider the retransmission range.
> >
> > Regards,
> > Ying
> >
> > +   /* Inside range of previous retransmit */
> > +   if (!less(*from, l->prev_from) && !more(*to, l->prev_to))
> > +           return false;
> > +
> > +   /* Fully or partially outside previous range => exclude overlap */
> > +   if (less(*from, l->prev_from)) {
> > +           *to = l->prev_from - 1;
> > +           l->prev_from = *from;
> > +   }
> > +   if (more(*to, l->prev_to)) {
> > +           *from = l->prev_to + 1;
> > +           l->prev_to = *to;
> > +   }
> > +   l->prev_retr = jiffies;
> > +   return true;
> > +}
> > +
> >   /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
> >    */
> >   int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> >                       struct sk_buff_head *xmitq)
> >   {
> > +   struct tipc_link *snd_l = l->bc_sndlink;
> >     u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
> >     u16 from = msg_bcast_ack(hdr) + 1;
> >     u16 to = from + msg_bc_gap(hdr) - 1;
> > @@ -1613,14 +1655,14 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> >     if (!l->bc_peer_is_up)
> >             return rc;
> >
> > +   l->stats.recv_nacks++;
> > +
> >     /* Ignore if peers_snd_nxt goes beyond receive window */
> >     if (more(peers_snd_nxt, l->rcv_nxt + l->window))
> >             return rc;
> >
> > -   if (!less(to, from)) {
> > -           rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
> > -           l->stats.recv_nacks++;
> > -   }
> > +   if (link_bc_retr_eval(snd_l, &from, &to))
> > +           rc = tipc_link_retrans(snd_l, from, to, xmitq);
> >
> >     l->snd_nxt = peers_snd_nxt;
> >     if (link_bc_rcv_gap(l))
> > --
> > 2.7.4
> >

