-----Original Message-----
From: Jon Maloy [mailto:[email protected]] 
Sent: Wednesday, August 17, 2016 2:09 AM
To: [email protected]; 
[email protected]; Xue, Ying; [email protected]; 
[email protected]
Cc: [email protected]; [email protected]
Subject: [PATCH net-next 1/3] tipc: transfer broadcast nacks in link state 
messages

When we send broadcasts in clusters of more than 70-80 nodes, we sometimes see 
the broadcast link resetting because of an excessive number of retransmissions. 
This is caused by a combination of two factors:

1) A "NACK crunch", where loss of broadcast packets is discovered
   and NACK'ed by several nodes simultaneously, leading to multiple
   redundant broadcast retransmissions.

2) The fact that the NACKs themselves are also sent as broadcast, leading
   to excessive load and packet loss on the transmitting switch/bridge.

This commit deals with the latter problem by moving the sending of broadcast 
NACKs from the dedicated BCAST_PROTOCOL/NACK message type to regular unicast 
LINK_PROTOCOL/STATE messages. We allocate 10 unused bits in word 8 of that 
message for this purpose, and introduce a new capability bit, 
TIPC_BCAST_STATE_NACK, in order to keep the change backwards compatible.
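
For reference, the new field would presumably be read and written with the 
msg_bits()/msg_set_bits() helpers that TIPC already uses in net/tipc/msg.h. 
A minimal sketch, assuming the 10 bits sit at offset 0 of word 8 (the exact 
position is the patch author's choice):

static inline u32 msg_bc_gap(struct tipc_msg *m)
{
	/* Assumed layout: 10 bits (mask 0x3ff) at offset 0 of word 8 */
	return msg_bits(m, 8, 0, 0x3ff);
}

static inline void msg_set_bc_gap(struct tipc_msg *m, u32 n)
{
	msg_set_bits(m, 8, 0, 0x3ff, n);
}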

[Ying] Using unicast to send NACK messages is much better than broadcast.


/* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
 */
-void tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
-                          struct sk_buff_head *xmitq)
+int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
+                         struct sk_buff_head *xmitq)
 {
        u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
+       u16 from = msg_bcast_ack(hdr) + 1;
+       u16 to = from + msg_bc_gap(hdr) - 1;
+       int rc = 0;
 
        if (!link_is_up(l))
-               return;
+               return rc;
 
        if (!msg_peer_node_is_up(hdr))
-               return;
+               return rc;
 
        /* Open when peer acknowledges our bcast init msg (pkt #1) */
        if (msg_ack(hdr))
                l->bc_peer_is_up = true;
 
        if (!l->bc_peer_is_up)
-               return;
+               return rc;
 
        /* Ignore if peers_snd_nxt goes beyond receive window */
        if (more(peers_snd_nxt, l->rcv_nxt + l->window))
-               return;
+               return rc;
+
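+       /* Retransmit the range the peer reports as missing, if any */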
+       if (!less(to, from)) {
+               rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
+               l->stats.recv_nacks++;
+       }
+
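+       /* Track the peer's send position; used for gap detection below */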
+       l->snd_nxt = peers_snd_nxt;
+       if (link_bc_rcv_gap(l))
+               rc |= TIPC_LINK_SND_STATE;
+
+       /* Return now if sender supports nack via STATE messages */
+       if (l->peer_caps & TIPC_BCAST_STATE_NACK)
+               return rc;
+

[Ying] If peer nodes don't support the new NACK mechanism, it seems we not 
only retransmit the missed messages through tipc_link_retrans(), but also 
send a NACK message created with tipc_link_build_bc_proto_msg(). In the old 
mode, however, we don't retransmit the missed messages at this point. Do you 
think this difference in behavior is right?
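
For illustration, one way to keep the legacy path unchanged might be to guard 
the retransmission on the new capability bit. Just an untested sketch, not 
necessarily the intended design:

	if ((l->peer_caps & TIPC_BCAST_STATE_NACK) && !less(to, from)) {
		/* Only STATE-based NACKs trigger retransmission here;
		 * legacy peers keep the old BCAST_PROTOCOL/NACK path
		 */
		rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
		l->stats.recv_nacks++;
	}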

Thanks,
Ying

+       /* Otherwise, be backwards compatible */
 
        if (!more(peers_snd_nxt, l->rcv_nxt)) {
                l->nack_state = BC_NACK_SND_CONDITIONAL;
-               return;
+               return 0;
        }
 
        /* Don't NACK if one was recently sent or peeked */
        if (l->nack_state == BC_NACK_SND_SUPPRESS) {
                l->nack_state = BC_NACK_SND_UNCONDITIONAL;
-               return;
+               return 0;
        }
 
        /* Conditionally delay NACK sending until next synch rcv */
        if (l->nack_state == BC_NACK_SND_CONDITIONAL) {
                l->nack_state = BC_NACK_SND_UNCONDITIONAL;
                if ((peers_snd_nxt - l->rcv_nxt) < TIPC_MIN_LINK_WIN)
-                       return;
+                       return 0;
        }
 
        /* Send NACK now but suppress next one */
        tipc_link_build_bc_proto_msg(l, true, peers_snd_nxt, xmitq);
        l->nack_state = BC_NACK_SND_SUPPRESS;
+       return 0;
 }
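
For context, link_bc_rcv_gap(), called above, is presumably a small helper 
introduced elsewhere in this patch. Reconstructed from the surrounding logic 
(the exact form may differ), it would look roughly like:

static int link_bc_rcv_gap(struct tipc_link *l)
{
	struct sk_buff *skb = skb_peek(&l->deferdq);
	int gap = 0;

	/* Peer has sent packets we have not yet received */
	if (more(l->snd_nxt, l->rcv_nxt))
		gap = 1;
	/* Or there is a hole before the first deferred packet */
	else if (skb)
		gap = buf_seqno(skb) - l->rcv_nxt;
	return gap;
}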
