Hi,

Could you please share the kernel dmesg log from the failure, as well as the traffic captures from both nodes?
If the captured traffic logs are too big, please find a way to reduce their size. Let me check what happened.

Thanks,
Ying

On 07/17/2017 12:33 AM, Booth, Andrew wrote:
> Hi Canh,
>
> I have re-run the test several times with the provided patch and the
> issue still occurs, the symptoms seem to be the same. The patch did not
> seem to fix the issue.
>
> Any other ideas?
>
> Thanks for any info,
>
> Andrew
>
> ------------------------------------------------------------------------
> *From:* Butler, Peter
> *Sent:* Friday, July 14, 2017 10:15 AM
> *To:* LUU Duc Canh; Jon Maloy; Tung Quang Nguyen
> *Cc:* Booth, Andrew; Parthasarathy Bhuvaragan; Ying Xue;
> [email protected]
> *Subject:* RE: FW: TIPC issue: connection stalls when switch for bearer
> 0 recovers
>
> Hi Canh,
>
> For some reason Andrew Booth’s email address was incorrect in this
> thread – I have corrected it in this reply and will work with him on
> trying out this patch.
>
> Thanks
>
> *From:*LUU Duc Canh [mailto:[email protected]]
> *Sent:* July-13-17 5:19 PM
> *To:* Jon Maloy <[email protected]>; Tung Quang Nguyen
> <[email protected]>
> *Cc:* Andrew Booth ([email protected]) <[email protected]>; Butler, Peter
> <[email protected]>; Parthasarathy Bhuvaragan
> <[email protected]>; Ying Xue
> <[email protected]>; [email protected]
> *Subject:* Re: FW: TIPC issue: connection stalls when switch for bearer
> 0 recovers
>
> Hi Andrew,
>
> Could you help me apply and try with Jon's patch as file attachment?
>
> Please let me know the result when you have done.
>
> Regards,
> Canh
>
> On 13/07/2017 21:48, Jon Maloy wrote:
>
> Canh,Tung,
>
> This sounds like it might be the link synch bug we just identified
> and fixed. Maybe you could send that patch to Andrew and let him try?
>
> PS. I am on vacation, and will only sporadically be reading email
> the next three weeks.
>
> ///jon
>
> *From:* Booth, Andrew [mailto:[email protected]]
> *Sent:* Thursday, July 13, 2017 20:05
> *To:* Jon Maloy <[email protected]>; Parthasarathy Bhuvaragan
> <[email protected]>; Ying Xue <[email protected]>
> *Cc:* Butler, Peter <[email protected]>; Booth, Andrew <[email protected]>
> *Subject:* TIPC issue: connection stalls when switch for bearer 0
> recovers
>
> Hi,
>
> I am using a configuration with applications on 7 cards
> communicating using TIPC. Each card has two ethernet devices
> connecting to two disjoint subnets served by switch0 and switch1,
> respectively. TIPC is set to use two bearers on each card.
>
> When I reboot switch0 I occasionally see TIPC connections fail. More
> precisely, the applications send "keepalive" messages every 5
> seconds, and when switch0 recovers the keepalive messages are not
> answered within 5 seconds so the applications close the connection.
> I have wireshark captures of a connection during the period where it
> fails; this shows some of the keepalive request and ack packets
> exchanged on the network, but each application’s logs indicate that
> they are not received from the socket. The connection in this case
> is largely idle other than the keepalive exchanges.
>
> I'm looking for ways to narrow down the issue.
>
> The applications are select-based and I'm adding some more logging
> to ensure that the read bit is set correctly, I would be very
> surprised if it isn't.
>
> I'm considering adding an ioctl to TIPC to get some information from
> the socket (number of bytes accepted from the application, number of
> bytes sent to the application, etc) that could be called when our 5s
> timer expires.
> The idea is to try and see if the TIPC socket
> receives the packets that I see in the wireshark capture, or if they
> are held in (or dropped by) the kernel during earlier processing.
>
> I have wireshark captures if you're interested. The capture
> containing all TIPC traffic is about 10MB per slot, the capture
> showing only the connection traffic is about 30KB per slot.
>
> Most of the cards are running TIPC from the 4.4.0 Linux kernel with
> some patches for specific bugs (I'm not sure how to identify which
> ones). One of the cards is running an older version of TIPC
> (pre-2.0), I'm not monitoring this card for connection errors, but
> it is exchanging TIPC packets and there is a chance it could be
> causing interference.
>
> Any thoughts on how to proceed?
>
> Thanks for any info,
>
> Andrew

_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
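
Andrew's original message describes select-based applications and a wish to inspect the socket when the 5 s keepalive timer expires. Before adding a new ioctl to TIPC, one possible first step is to log what select() reported alongside a non-blocking MSG_PEEK on the same descriptor. The sketch below is only an illustration under assumptions: wait_for_keepalive and struct wait_report are hypothetical names, not part of TIPC or of the applications in this thread, and it uses only generic POSIX/Linux socket calls, so it cannot show what happened inside the kernel before the socket receive queue.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical report of one keepalive wait: what select() said and
 * what a non-consuming peek found on the socket afterwards. */
struct wait_report {
    int select_rc;     /* select() result: 0 = timeout, >0 = ready, -1 = error */
    int read_bit_set;  /* 1 if the read bit for fd was set after select() */
    long peeked_bytes; /* bytes visible via MSG_PEEK, or -1 if none or error */
};

/* Wait up to timeout_sec for fd to become readable, then peek at the
 * receive queue without consuming data and log both observations. */
static struct wait_report wait_for_keepalive(int fd, long timeout_sec)
{
    struct wait_report r = { 0, 0, -1 };
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    fd_set rfds;
    char buf[2048];
    ssize_t n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    r.select_rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    /* The fd_set is only meaningful when select() reports ready descriptors. */
    r.read_bit_set = (r.select_rc > 0) && FD_ISSET(fd, &rfds);

    /* Non-blocking peek: leaves any queued data in place for the real read. */
    n = recv(fd, buf, sizeof(buf), MSG_PEEK | MSG_DONTWAIT);
    if (n >= 0)
        r.peeked_bytes = (long)n;

    fprintf(stderr, "fd=%d select_rc=%d read_bit=%d peeked=%ld\n",
            fd, r.select_rc, r.read_bit_set, r.peeked_bytes);
    return r;
}
```

Called as wait_for_keepalive(sock, 5) at the point where the keepalive timer is armed, a timeout with peeked_bytes still at -1 would suggest the data never reached the socket receive queue, whereas a timeout with a non-negative peek would point back at the application's select handling. Either reading is a hint to follow up with the dmesg log and captures, not a diagnosis.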
