Hi all,

I'm currently working on the 4.4.0 kernel and am observing the following issues with TIPC multicast.

1. I have a system setup with 3 CPUs, each using TIPC to multicast to processes running on each CPU. After sending around 50 messages (the maximum window size), the far end no longer receives any messages. Looking at the tipc-config -ls output, the broadcast link had started bundling messages, and I also see the congestion counter incrementing.

2. The bundling is triggered because we are not handling all acks from the other peers correctly. Based on the traces I have so far, it seems we are receiving some invalid bc_ack values, which corrupt our local ack state (link->acked). The link then rejects the following bc_acks and does not release the packets on the transmq. Eventually we hit the window size of 50 and are not allowed to send any more.

3. I dumped the TIPC message data in the tipc_rcv function (in node.c) using some pr_info debug code and found that the hdr data changes after tipc_msg_validate is called.

    // First log, before tipc_msg_validate
    Feb 20 21:03:44 [SEQ 758012] lab204slot12 kernel: [ 49.736962] skb: ffff88034a775500, hdr: ffff88034f9e494e-ffff88034f9e494e, data:84057-1000406-5508000

    // Log after tipc_msg_validate
    Feb 20 21:03:44 [SEQ 758015] lab204slot12 kernel: [ 49.736966] skb4: ffff88034a775500, hdr: ffff88034f9e494e-ffff88034f9e494e, data:e0014057-0-2000000

    /* Excerpt only: declarations of hdr, oldData and dumpData omitted. */
    void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
    {
            /* Both values are read before the message is validated. */
            u16 bc_ack = msg_bcast_ack(hdr);
            u16 ack = msg_ack(hdr);

            /* Debug code that generates the first log. */
            pr_info("skb: %p, hdr: %p-%p, data:%x-%x-%x\n",
                    skb, hdr, hdr->hdr,
                    hdr->hdr[0], hdr->hdr[1], hdr->hdr[2]);
            oldData = hdr->hdr[1];

            /* Ensure message is well-formed */
            if (unlikely(!tipc_msg_validate(skb)))
                    goto discard;

            /* Debug code that generates the second log if the data changed. */
            if (oldData != hdr->hdr[1]) {
                    pr_info("skb4: %p, hdr: %p-%p, data:%x-%x-%x\n",
                            skb, hdr, hdr->hdr,
                            hdr->hdr[0], hdr->hdr[1], hdr->hdr[2]);
                    dumpData = true;
            }
            /* ... */
    }

4. It seems that tipc_msg_validate modifies the skb message and the hdr. The modified data looks fine and contains the expected bc_ack/ack values. However, bc_ack and ack are currently initialized before tipc_msg_validate is called, so we use the stale values, which may cause problems with my bc_ack update and comparison.

5. If I move the bc_ack and ack initialization to after tipc_msg_validate, the multicast stall no longer occurs. I have run it for half a day with multicast on 4 CPUs, and so far no multicast bundling has been triggered and no bogus bc_ack has been seen. All multicast messages have been sent and received properly.

6. Is this known behavior, and is it a bug? If so, is there a patch for it, and will 4.4.48 have the same issue? Is tipc_msg_validate supposed to modify the hdr data, and should we read the bc_ack/ack values only after the modification is complete?

Any comments are appreciated.

Regards,
Matthew
Sonus network.
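
P.S. To make the ordering problem in points 4 and 5 concrete, here is a minimal user-space sketch. It is not kernel code, and everything in it is hypothetical: struct fake_hdr, the field layout used by fake_bcast_ack(), and fake_validate(), which simply overwrites the header in place to mimic what the logs show. It makes no claim about why or how the real tipc_msg_validate changes the data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a message header: three 32-bit words. */
struct fake_hdr {
    uint32_t w[3];
};

/* Illustrative layout only (assumption): ack in the low 16 bits of word 1. */
static uint16_t fake_bcast_ack(const struct fake_hdr *h)
{
    return (uint16_t)(h->w[1] & 0xffffu);
}

/*
 * Hypothetical validator. Like the behaviour seen in the logs, it may
 * rewrite the header words in place before reporting success. Here it
 * always replaces the header with the contents of 'final'.
 */
static bool fake_validate(struct fake_hdr *h, const struct fake_hdr *final)
{
    *h = *final;
    return true;
}

/* Reads the ack BEFORE validation, so the caller gets the stale value. */
static uint16_t rcv_read_before(struct fake_hdr *h, const struct fake_hdr *final)
{
    uint16_t bc_ack = fake_bcast_ack(h);

    if (!fake_validate(h, final))
        return 0;
    return bc_ack;
}

/* Reads the ack AFTER validation, so the caller gets the final value. */
static uint16_t rcv_read_after(struct fake_hdr *h, const struct fake_hdr *final)
{
    if (!fake_validate(h, final))
        return 0;
    return fake_bcast_ack(h);
}
```

For example, if word 1 of the header is 0x1234 on entry and 0x0032 once validation has finished, rcv_read_before() returns 0x1234 while rcv_read_after() returns 0x0032. That is the same kind of mismatch I believe is corrupting link->acked.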