Hi all,
I'm currently working on the 4.4.0 kernel and am observing the following issues
with TIPC multicast.
1. I have a system setup with 3 CPUs, each using TIPC to multicast to
processes running on each CPU. After sending around 50 messages (the max window
size), the far end did not receive messages any more. When looking at the
tipc-config -ls data, it showed that the broadcast link had started bundling
messages, and I am also seeing some congestion increments.
2. The bundling is triggered because we are not handling all acks from
the other peers correctly. Based on the trace I have so far, it seems we are
receiving some invalid bc_ack which corrupts our local ack (link->acked). The
link then rejects the following bc_acks and does not release the packets on the
transmq properly. So eventually we hit the window size of 50 and are not
allowed to send any more.
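
To make the suspected failure mode in item 2 concrete, here is a small
user-space sketch. It is not the kernel code: sim_link, sim_send, sim_ack_rcv
and the other names are hypothetical, the window of 50 is taken from the
numbers above, and per-peer bookkeeping is left out. It only assumes that acks
are compared with a wrap-safe 16-bit test and that an ack which is not newer
than the stored one is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_WINDOW 50u /* assumed send window, matching the 50 in the report */

/* Hypothetical stand-in for a link's broadcast send state */
struct sim_link {
	uint16_t acked;    /* highest sequence number accepted as acked */
	uint16_t head;     /* oldest sequence number still queued */
	uint16_t next_out; /* sequence number the next send will use */
};

/* Wrap-safe 16-bit test: does a come after b in sequence space? */
static int seq_more(uint16_t a, uint16_t b)
{
	return a != b && (uint16_t)(a - b) < 32768u;
}

/* Packets sent but not yet released */
static unsigned int sim_queued(const struct sim_link *l)
{
	return (uint16_t)(l->next_out - l->head);
}

/* Queue one packet; returns 0 when the window is full */
static int sim_send(struct sim_link *l)
{
	if (sim_queued(l) >= SIM_WINDOW)
		return 0;
	l->next_out++;
	return 1;
}

/* Process an ack; returns 0 when it is rejected as not newer */
static int sim_ack_rcv(struct sim_link *l, uint16_t acked)
{
	if (!seq_more(acked, l->acked))
		return 0;
	/* Release every queued packet up to and including acked */
	while (l->head != l->next_out && !seq_more(l->head, acked))
		l->head++;
	l->acked = acked;
	return 1;
}

/* Count the sends that succeed after one suspect ack, capped at max_sends */
static unsigned int sim_sends_after_bogus_ack(uint16_t bogus,
					      unsigned int max_sends)
{
	struct sim_link l = { .acked = 0, .head = 1, .next_out = 1 };
	unsigned int sent = 0;
	int ok;

	sim_send(&l);              /* packet 1 */
	ok = sim_ack_rcv(&l, 1);   /* genuine ack, packet 1 is released */
	assert(ok);
	(void)ok;
	sim_ack_rcv(&l, bogus);    /* may push l.acked far ahead */

	while (sent < max_sends && sim_send(&l)) {
		sent++;
		/* The peer acks every packet; rejected while below l.acked */
		sim_ack_rcv(&l, (uint16_t)(l.next_out - 1));
	}
	return sent;
}
```

Calling sim_sends_after_bogus_ack(5000, 1000) stops after 50 sends, because
every genuine ack is older than the stored value and nothing is released.
Calling it with 1 (a harmless duplicate ack) runs until the cap of 1000.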
3. I dumped the tipc message data in the tipc_rcv function (in node.c)
using some pr_info debug code and found that the hdr data is changed after
the tipc_msg_validate function is called.
// First log before tipc_msg_validate
Feb 20 21:03:44 [SEQ 758012] lab204slot12 kernel: [ 49.736962] skb:
ffff88034a775500, hdr: ffff88034f9e494e-ffff88034f9e494e,
data:84057-1000406-5508000
// log after tipc_msg_validate
Feb 20 21:03:44 [SEQ 758015] lab204slot12 kernel: [ 49.736966] skb4:
ffff88034a775500, hdr: ffff88034f9e494e-ffff88034f9e494e,
data:e0014057-0-2000000
void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
{
	...
	u16 bc_ack = msg_bcast_ack(hdr);
	u16 ack = msg_ack(hdr);

	// debug code to generate the first log
	pr_info("skb: %p, hdr: %p-%p, data:%x-%x-%x\n", skb, hdr, hdr->hdr,
		hdr->hdr[0], hdr->hdr[1], hdr->hdr[2]);
	oldData = hdr->hdr[1];

	/* Ensure message is well-formed */
	if (unlikely(!tipc_msg_validate(skb)))
		goto discard;

	// debug code to generate the 2nd log if the data is changed.
	if (oldData != hdr->hdr[1]) {
		pr_info("skb4: %p, hdr: %p-%p, data:%x-%x-%x\n", skb, hdr,
			hdr->hdr, hdr->hdr[0], hdr->hdr[1], hdr->hdr[2]);
		dumpData = true;
	}
	...
4. It seems that tipc_msg_validate modified the skb message and the hdr.
The modified data looks fine and has the correct expected bc_ack/ack values in
the message. However, the bc_ack and ack values are currently initialized
before tipc_msg_validate is called, so we use the stale values, which may
cause the problem with my bc_ack update and comparison.
5. If I move the bc_ack and ack initialization after tipc_msg_validate, I no
longer see the tipc multicast getting stuck. I have run it for half a day with
multicast on 4 CPUs, and so far there has been no multicast bundling triggered
and no bogus bc_ack. All multicast messages have been sent and received
properly.
6. Is this a known behavior, and is it an issue? If yes, is there a
patch for it, and will 4.4.48 have the same issue? Is the tipc_msg_validate
function supposed to modify the hdr data, and should we read the bc_ack/ack
values only after the modification is completed?
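
The reordering described in items 4 and 5 can be illustrated with a small
user-space model. This is only a sketch: model_pkt, model_validate and the
rcv_read_* functions are hypothetical, and the model does not try to explain
why the header changes. It simply assumes that the bytes visible through hdr
can differ before and after validation, and that the broadcast ack sits in the
low 16 bits of header word 1 (my understanding of the layout, so treat it as
an assumption).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_WORDS 3

/* Hypothetical packet model: header words stored in network byte order */
struct model_pkt {
	uint8_t hdr[4 * MODEL_WORDS];
};

/* Read header word w as a big-endian 32-bit value */
static uint32_t model_word(const struct model_pkt *p, int w)
{
	const uint8_t *b;

	assert(w >= 0 && w < MODEL_WORDS);
	b = &p->hdr[4 * w];
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Broadcast ack: low 16 bits of word 1 (layout assumed for this model) */
static uint16_t model_bcast_ack(const struct model_pkt *p)
{
	return (uint16_t)(model_word(p, 1) & 0xffff);
}

/*
 * Stand-in for a validation step that leaves different bytes in the
 * header than were visible before it ran (the behaviour reported above).
 */
static int model_validate(struct model_pkt *p,
			  const uint8_t final_hdr[4 * MODEL_WORDS])
{
	memcpy(p->hdr, final_hdr, sizeof(p->hdr));
	return 1;
}

/* Pattern from the report: the field is cached before validation */
static uint16_t rcv_read_before(struct model_pkt *p,
				const uint8_t final_hdr[4 * MODEL_WORDS])
{
	uint16_t bc_ack = model_bcast_ack(p); /* may be stale */

	if (!model_validate(p, final_hdr))
		return 0;
	return bc_ack;
}

/* Reordered variant: the field is read only after validation succeeded */
static uint16_t rcv_read_after(struct model_pkt *p,
			       const uint8_t final_hdr[4 * MODEL_WORDS])
{
	if (!model_validate(p, final_hdr))
		return 0;
	return model_bcast_ack(p);
}
```

With a packet whose word 1 initially decodes to 5000 and decodes to 42 after
validation, rcv_read_before hands 5000 to the ack handling while
rcv_read_after hands it 42.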
Any comment is appreciated.
Regards,
Matthew
Sonus network.