Hi all,

   I'm currently working on the 4.4.0 kernel and am observing the following
issue with tipc multicast.


1.      I have a system setup with 3 CPUs, each using tipc to multicast to
processes running on each CPU. After sending around 50 messages (the max window
size), the far end did not receive any more messages. Looking at the
tipc-conf -ls data, it shows that the broadcast link has started bundling
messages, and I am also seeing the congestion counter increment.

2.      The bundling is triggered because we are not handling all acks from
the other peers correctly. Based on the trace I have so far, it seems we are
receiving some invalid bc_ack values, which corrupt our local ack (link->acked).
The link then rejects the following bc_acks and does not release the packets on
the transmq. So eventually we hit the window size of 50 and are not allowed to
send any more.

3.      I dumped the tipc message data in the tipc_rcv function (in node.c)
using some pr_info debug code and found that the hdr data is changed after
the tipc_msg_validate function is called.

// First log before tipc_msg_validate

Feb 20 21:03:44 [SEQ 758012] lab204slot12 kernel:  [   49.736962] skb: 
ffff88034a775500, hdr: ffff88034f9e494e-ffff88034f9e494e, 
data:84057-1000406-5508000

// log after tipc_msg_validate

Feb 20 21:03:44 [SEQ 758015] lab204slot12 kernel:  [   49.736966] skb4: 
ffff88034a775500, hdr: ffff88034f9e494e-ffff88034f9e494e, 
data:e0014057-0-2000000



void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
{
        /* both values are read from hdr before the message is validated */
        u16 bc_ack = msg_bcast_ack(hdr);
        u16 ack    = msg_ack(hdr);

        // debug code to generate the first log
        pr_info("skb: %p, hdr: %p-%p, data:%x-%x-%x\n", skb, hdr, hdr->hdr,
                hdr->hdr[0], hdr->hdr[1], hdr->hdr[2]);
        oldData = hdr->hdr[1];

        /* Ensure message is well-formed */
        if (unlikely(!tipc_msg_validate(skb)))
                goto discard;

        // debug code to generate the 2nd log if the data is changed.
        if (oldData != hdr->hdr[1]) {
                pr_info("skb4: %p, hdr: %p-%p, data:%x-%x-%x\n", skb, hdr,
                        hdr->hdr, hdr->hdr[0], hdr->hdr[1], hdr->hdr[2]);
                dumpData = true;
        }



4.      It seems tipc_msg_validate modified the skb message and the hdr.
The modified data looks fine and has the expected bc_ack/ack values in
the message. However, bc_ack and ack are currently initialized before
tipc_msg_validate is called, so we use the stale values, which may cause the
problem with my bc_ack update and comparison.



5.      If I move the bc_ack and ack reads after tipc_msg_validate, the tipc
multicast no longer gets stuck. I have run it for half a day with multicast
on 4 CPUs and so far no tipc multicast bundling has been triggered and there
has been no bogus bc_ack. All multicast messages have been sent and received
properly.



6.      Is this a known behavior, and is it an issue? If yes, is there a
patch for it, and will 4.4.48 have the same issue? Is the tipc_msg_validate
function supposed to modify the hdr data, and should we read the bc_ack/ack
values only after that modification is completed?



Any comments are appreciated.



Regards,

   Matthew

   Sonus network.

_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
