Hi Jon,

  Thanks for the comment. I totally agree that pskb_may_pull() is the one 
changing the data content, although that code path is a bit hard to follow. So 
the correct data is probably available only after validate has linearized the 
buffer. It seems safe to make the change anyway, since no one uses bc_ack 
before validate.
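
To make the reordering concrete, here is a minimal userspace sketch, not the 
kernel code. All names (toy_skb, toy_hdr, toy_validate, toy_rcv) are 
hypothetical, and the header layout is invented for illustration. It assumes 
only that the validate step may move the header to a new allocation, so ack 
and bc_ack are fetched strictly after validation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are the kernel's types. */
struct toy_hdr {
    uint16_t ack;
    uint16_t bc_ack;
};

struct toy_skb {
    uint8_t *data;   /* linear header area, may be reallocated */
    size_t len;      /* bytes available in data */
    int validated;   /* set once toy_validate() has succeeded */
};

/* Toy validate step: like a linearizing pull, it may move the header
 * to a new allocation, so pointers taken earlier must not be reused.
 * Returns 1 on success, 0 if the packet is too short or on OOM. */
static int toy_validate(struct toy_skb *skb)
{
    uint8_t *fresh;

    if (skb->validated)
        return 1;
    if (skb->len < sizeof(struct toy_hdr))
        return 0;
    fresh = malloc(skb->len);
    if (!fresh)
        return 0;
    memcpy(fresh, skb->data, skb->len);
    free(skb->data);
    skb->data = fresh;
    skb->validated = 1;
    return 1;
}

/* Receive path with the reordering discussed above: validate first,
 * then fetch the header and read ack/bc_ack. Returns 0 on success,
 * -1 if validation fails. */
static int toy_rcv(struct toy_skb *skb, uint16_t *ack, uint16_t *bc_ack)
{
    struct toy_hdr hdr;

    if (!toy_validate(skb))
        return -1;
    memcpy(&hdr, skb->data, sizeof(hdr));   /* read only after validation */
    *ack = hdr.ack;
    *bc_ack = hdr.bc_ack;
    return 0;
}
```

Since nothing consumes ack or bc_ack before the validate call, moving the 
reads below it should not change behavior for buffers that were already linear.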

  We are checking whether we have the patch you recommended. Is that patch 
already in kernel 4.9.11?

  Thanks,
    Matthew 


-----Original Message-----
From: Jon Maloy [mailto:[email protected]] 
Sent: Wednesday, February 22, 2017 2:57 PM
To: Wong, Matthew; [email protected]
Subject: RE: [tipc-discussion] tipc multicast stuck (hit max window) due to 
invalid bc_ack value

Hi Matthew,
See below for my comment.

Also, although this is about a different problem, you should check if you have 
the following patch, and the one it is referring to:
commit 06bd2b1ed04ca9f ("tipc: fix broadcast link synchronization problem")

 
> -----Original Message-----
> From: Wong, Matthew [mailto:[email protected]]
> Sent: Wednesday, February 22, 2017 12:41 PM
> To: [email protected]
> Subject: [tipc-discussion] tipc multicast stuck (hit max window) due 
> to invalid bc_ack value
> 
> 
> Hi all,
> 
>    I'm currently working on the 4.4.0 kernel and am observing the 
> following issue with tipc multicast.
> 
> 
> 1.      I have a system set up with 3 CPUs, each using tipc to multicast to
> processes running on each CPU. After sending around 50 messages (the 
> max window size), the far end no longer received any messages. 
> When looking at the tipc-config -ls data, it said the broadcast-link 
> started bundling

[...]

> 
> 4.      It seems that tipc_msg_validate modifies the skb message and the hdr.
> The modified data looks fine and has the expected bc_ack/ack 
> values in the message. However, the bc_ack and ack values are currently 
> initialized before tipc_msg_validate is called, so we use those stale values, 
> which may break the bc_ack update and comparison.

The only possible culprit here is the function pskb_may_pull(), which is called 
from tipc_msg_validate() in the rare case that the header part of the packet 
buffer is non-linear. The function is a little hard to follow, but as I 
understand it, it linearizes the buffer in such cases, so header fields read 
before the validation will obviously be wrong.
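
As a rough illustration of that effect, here is a toy userspace model, not the 
kernel's sk_buff. The struct, the helper names, and the field offset 
TOY_ACK_OFF are all hypothetical. It assumes a header split between a linear 
area and one fragment: a field read before the pull sees whatever happens to be 
in the linear area (zeros here, arbitrary bytes in practice), while the same 
read after the pull sees the real value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HDR_MAX 64   /* hypothetical capacity of the linear area */
#define TOY_ACK_OFF 6    /* hypothetical byte offset of a 16-bit ack field */

/* Toy packet buffer: the first `linear_len` bytes are in `linear`,
 * the remaining `frag_len` bytes still sit in a separate fragment. */
struct toy_buf {
    uint8_t linear[TOY_HDR_MAX];
    size_t linear_len;
    const uint8_t *frag;
    size_t frag_len;
};

/* Toy analogue of a linearizing pull: make sure the first `len` bytes
 * are contiguous in `linear`, copying from the fragment when needed.
 * Returns 1 on success, 0 if the packet is too short or too large. */
static int toy_may_pull(struct toy_buf *b, size_t len)
{
    size_t need;

    if (len <= b->linear_len)
        return 1;
    if (len > TOY_HDR_MAX || len - b->linear_len > b->frag_len)
        return 0;
    need = len - b->linear_len;
    memcpy(b->linear + b->linear_len, b->frag, need);
    b->linear_len += need;
    b->frag += need;
    b->frag_len -= need;
    return 1;
}

/* Read a big-endian 16-bit field straight from the linear area,
 * without checking that the bytes have been pulled in yet. */
static uint16_t toy_read_be16(const struct toy_buf *b, size_t off)
{
    assert(off + 2 <= TOY_HDR_MAX);
    return (uint16_t)((b->linear[off] << 8) | b->linear[off + 1]);
}

/* Build a packet whose header is split: 4 bytes linear, rest in `frag`.
 * The caller must keep `pkt` alive while the buffer is in use. */
static void toy_init_split(struct toy_buf *b, const uint8_t *pkt,
                           size_t pkt_len)
{
    assert(pkt_len >= 4 && pkt_len <= TOY_HDR_MAX);
    memset(b, 0, sizeof(*b));
    memcpy(b->linear, pkt, 4);
    b->linear_len = 4;
    b->frag = pkt + 4;
    b->frag_len = pkt_len - 4;
}
```

In this model, calling toy_read_be16(&b, TOY_ACK_OFF) before toy_may_pull() 
returns the placeholder contents of the linear area, and only the call made 
after a successful pull returns the value carried in the packet.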

> 
> 
> 
> 5.      If I move the bc_ack and ack initialization after tipc_msg_validate, 
> I no longer see the tipc multicast stuck issue. I have run it for half a day 
> with multicast on 4 CPUs, and so far tipc multicast bundling has not been 
> triggered and there is no bogus bc_ack issue. All multicast messages have 
> been sent and received properly.
> 
> 
> 
> 6.      Is this known behavior, and is it an issue? If yes, is there a 
> patch for it, and will 4.4.48 have the same issue? Is the tipc_msg_validate 
> function supposed to modify the hdr data, and should we read the 
> bc_ack/ack values only after the modification is complete?

We have never seen this before, but your diagnosis is totally credible. I 
will post a patch for this ASAP.
Nice job!

BR
///jon

> 
> 
> 
> Any comment is appreciated.
> 
> 
> 
> Regards,
> 
>    Matthew
> 
>    Sonus network.
> 
> ----------------------------------------------------------------------
> -------- Check out the vibrant tech community on one of the world's 
> most engaging tech sites, SlashDot.org! http://sdm.link/slashdot 
> _______________________________________________
> tipc-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion

