Hi Peter,
See below.

> -----Original Message-----
> From: Peter Koss <[email protected]>
> Sent: September 20, 2018 11:31 AM
> To: Jon Maloy <[email protected]>; [email protected]
> Subject: RE: What affects congestion beyond window size, and what might
> have reduced congestion thresholds in TIPC 2.0.x?
>
> Hi Jon,
>
> Again, thanks for thinking about this. Kernel version:
> Linux a33ems1 3.10.62-ltsi-WR6.0.0.36_standard #1 SMP PREEMPT Mon
> Aug 20 17:25:51 CDT 2018 x86_64 x86_64 x86_64 GNU/Linux

Ok. A pretty old kernel then.

> The retransmission error (i.e., the kind of catastrophic interruption) was
> only noted in experiments pushing the window size to 300-400; it was not
> noted at lower window sizes.
>
> Our thoughts are around how to see evidence of retransmission going on
> prior to that, or ways to see evidence of that in TIPC 2.0.5.

You use tipc-config to list the link statistics (e.g. /usr/sbin/tipc-config -ls).
There you will see the number of retransmissions, just as you can see the
number of congestion events.

> We compare to TIPC 1.7.7 under Wind River Linux 3 as a data point, but care
> mostly about addressing it under 2.0.5 and Wind River Linux 6.
>
> This is not running a VM. Not sure about your question on Remote
> Procedure Call

Sorry, I did of course mean RPS (Receive Packet Steering). But TIPC does not
support that in such an old kernel, so that cannot be the problem.

> activated, if there's a command to run or code construct to
> check I could get that.

I still don't understand what it is you consider a problem. If you use a
window size of 150, you don't have the link reset, you say. So, what is it
that is not working?

BR
///jon

> Regards
>
> PK
>
> -----Original Message-----
> From: Jon Maloy <[email protected]>
> Sent: Thursday, September 20, 2018 7:28 AM
> To: Peter Koss <[email protected]>; [email protected]
> Subject: RE: What affects congestion beyond window size, and what might
> have reduced congestion thresholds in TIPC 2.0.x?
>
> Hi Peter,
> See my comments below.
>
> > -----Original Message-----
> > From: Peter Koss <[email protected]>
> > Sent: September 19, 2018 6:11 PM
> > To: Jon Maloy <[email protected]>; [email protected]
> > Subject: RE: What affects congestion beyond window size, and what
> > might have reduced congestion thresholds in TIPC 2.0.x?
> >
> > Thanks for responding.
> >
> > There was code in TIPC 1.7.x that gave some node receive queue
> > information, but that is now obsolete in 2.0.x. We are using socket
> > receive calls to get data instead, which seems to suggest one of two
> > problems: either the receive-side queue is filling up and exceeding
> > limits, or the ack back to the sender is having trouble. We do see the
> > sender getting an errno=EAGAIN.
> > Overall, the performance levels we see with TIPC 2.0.x under Wind River
> > Linux 6 are much lower than with TIPC 1.7.x under Wind River Linux 3.
>
> Which Linux kernel version are you running?
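Regarding the errno=EAGAIN the sender sees above: with a non-blocking TIPC
socket, EAGAIN from send() only means the send path (socket buffer or link)
is congested at that instant, so the usual pattern is to wait for POLLOUT
and retry rather than treat it as a failure. A minimal sketch in plain
POSIX C (illustrative only; send_all() is a hypothetical helper, not part
of any TIPC API):

    #include <errno.h>
    #include <poll.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Send the whole buffer, blocking in poll() whenever the
     * non-blocking send() reports congestion with EAGAIN. */
    static ssize_t send_all(int fd, const char *buf, size_t len)
    {
            size_t done = 0;

            while (done < len) {
                    ssize_t n = send(fd, buf + done, len - done, 0);

                    if (n >= 0) {
                            done += n;
                            continue;
                    }
                    if (errno == EAGAIN || errno == EWOULDBLOCK) {
                            struct pollfd pfd = { .fd = fd, .events = POLLOUT };

                            if (poll(&pfd, 1, -1) < 0)
                                    return -1;      /* interrupted/failed */
                            continue;
                    }
                    return -1;                      /* real send error */
            }
            return (ssize_t)done;
    }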
> >     case TIPC_NODE_RECVQ_DEPTH:
> > +       value = (u32)atomic_read(&tipc_queue_size);   <== This is obsolete now; call occurs, we get 0.
> > +       break;
> > +   case TIPC_SOCK_RECVQ_DEPTH:
> > +       value = skb_queue_len(&sk->sk_receive_queue);
> > +       break;
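For what it's worth, on kernels where the TIPC_SOCK_RECVQ_DEPTH option
shown above is implemented, the same counter can be read from user space,
which makes it arguably the closest per-socket substitute for the obsolete
TIPC_NODE_RECVQ_DEPTH. A minimal sketch, assuming the SOL_TIPC and
TIPC_SOCK_RECVQ_DEPTH values from the kernel headers:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <linux/tipc.h>

    #ifndef SOL_TIPC
    #define SOL_TIPC 271                   /* from linux/socket.h */
    #endif
    #ifndef TIPC_SOCK_RECVQ_DEPTH
    #define TIPC_SOCK_RECVQ_DEPTH 132      /* from newer linux/tipc.h */
    #endif

    /* Return the number of buffers currently queued on this TIPC
     * socket's receive queue, or -1 if the option is unsupported. */
    static int tipc_recvq_depth(int fd)
    {
            unsigned int depth = 0;
            socklen_t optlen = sizeof(depth);

            if (getsockopt(fd, SOL_TIPC, TIPC_SOCK_RECVQ_DEPTH,
                           &depth, &optlen) < 0) {
                    perror("getsockopt(TIPC_SOCK_RECVQ_DEPTH)");
                    return -1;
            }
            return (int)depth;
    }

Sampling this periodically on the receiver would show whether the queue is
actually filling toward its limit, or whether the acks back to the sender
are the bottleneck.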
> > Questions we have currently:
> > - What is the socket receive queue limit (default)?
>
> That depends on the Linux version you are using. Prior to 4.6 it was 64 MB;
> in later versions it is 2 MB, but with a much better flow control algorithm.
>
> > - Is it wise to try a window size > 150?
>
> I have never done it myself except for experimental purposes, but I see no
> problem with it. Do you have any particular reason to do so? Does it give
> significantly better throughput than at 150?
>
> > - Is there a good way to control or influence the flow control
> >   sender/receiver coordination,
>
> You can increase the window size to potentially improve link-level
> throughput, and you can increase the sending socket's importance priority
> to reduce the risk of receive socket buffer overflow.
>
> >   or a best way to adjust receive buffer limit?
>
> If you want to change this, follow the instructions under section 5.2 at
> the following link:
> http://tipc.sourceforge.net/programming.html#incr_rcvbuf
> But I see no sign that buffer overflow is your problem.
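To make the two knobs above concrete: both are ordinary setsockopt() calls,
one per side. TIPC_IMPORTANCE and the importance levels are from
linux/tipc.h; note that the kernel silently caps SO_RCVBUF at
net.core.rmem_max, so that sysctl may need to be raised first. A minimal
sketch (function names are illustrative):

    #include <sys/socket.h>
    #include <linux/tipc.h>     /* TIPC_IMPORTANCE, TIPC_*_IMPORTANCE */

    #ifndef SOL_TIPC
    #define SOL_TIPC 271        /* from linux/socket.h */
    #endif

    /* Sender side: raise message importance so this socket's traffic
     * is less likely to be dropped on receive buffer overflow. */
    static int tipc_raise_importance(int fd)
    {
            int imp = TIPC_HIGH_IMPORTANCE; /* or TIPC_CRITICAL_IMPORTANCE */

            return setsockopt(fd, SOL_TIPC, TIPC_IMPORTANCE,
                              &imp, sizeof(imp));
    }

    /* Receiver side: ask for a larger receive buffer; verify the
     * effective value afterwards with getsockopt(SO_RCVBUF). */
    static int tipc_grow_rcvbuf(int fd, int bytes)
    {
            return setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                              &bytes, sizeof(bytes));
    }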
> > For context, the first sign of errors shows up as congestion, where
> > the max value will increase to slightly above whatever window size we
> > set (50, 150, 300, 400).
> >
> > pl0_1:~$ /usr/sbin/tipc-config -ls | grep "Send queue max"
> > Congestion link:0 Send queue max:2 avg:1
> > Congestion link:93121 Send queue max:162 avg:3
> > Congestion link:206724 Send queue max:164 avg:3
> > Congestion link:67839 Send queue max:167 avg:3
> > Congestion link:214788 Send queue max:166 avg:3
> > Congestion link:205240 Send queue max:165 avg:3
> > Congestion link:240955 Send queue max:166 avg:3
> > Congestion link:0 Send queue max:0 avg:0
> > Congestion link:0 Send queue max:1 avg:0
> > Congestion link:0 Send queue max:0 avg:0
>
> This is all normal and unproblematic. We allow an oversubscription of one
> message (max 46 1500-byte packets, i.e. one maximum-size message split
> into MTU-sized fragments) on each link to make the algorithm simpler. So
> you will often find the max value higher than the nominal upper limit.
>
> > The following error occurs only when the window size is high (300-400),
> > not at 50 or 150, so we think it may be extraneous to our issue. It
> > also makes us wonder whether going above 150 is wise, hence the
> > question above.
> >
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Retransmission failure on link
> > <1.1.5:bond1-1.1.2:bond1>
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Resetting link
>
> This is your real problem. For some reason a packet has been retransmitted
> >100 times on a link without going through. Then the link is reset, and
> all associated connections as well.
> We have seen this happen for various reasons over the years, and fixed
> them all.
> Is RPC possibly activated on your receiving node?
> Are you running a VM with a virtio interface? This one tends to be
> overwhelmed sometimes and just stops sending for 30 seconds, sometimes
> leading to broken links.
>
> But again, it all depends on which kernel and environment you are
> running. Please update me on this.
>
> BR
> ///jon
>
> > Sep 17 05:42:00 pl0_4 kernel: Link 1001002<eth:bond1>::WW
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Lost link <1.1.5:bond1-1.1.2:bond1>
> > on network plane A
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Lost contact with <1.1.2>
> > Sep 17 05:42:00 pl0_10 kernel: tipc: Resetting link
> > <1.1.2:bond1-1.1.5:bond1>, requested by peer
> > Sep 17 05:42:00 pl0_10 kernel: tipc: Lost link <1.1.2:bond1-1.1.5:bond1>
> > on network plane A
> >
> > Thanks in advance, advice is appreciated.
> >
> > PK
> >
> > -----Original Message-----
> > From: Jon Maloy <[email protected]>
> > Sent: Tuesday, September 18, 2018 12:15 PM
> > To: Peter Koss <[email protected]>; [email protected]
> > Subject: RE: What affects congestion beyond window size, and what
> > might have reduced congestion thresholds in TIPC 2.0.x?
> >
> > Hi Peter,
> > The only parameter of those mentioned below that would have any effect
> > on congestion is TIPC_MAX_LINK_WIN, which should reduce occurrences of
> > link level congestion.
> > However, you don't describe which symptoms you see caused by this
> > congestion.
> > - Is it only a higher 'congested' counter when you look at the link
> >   statistics? If so, you don't have a problem at all; this is a totally
> >   normal and frequent occurrence. (Maybe we should have given this
> >   field a different name to avoid confusion.)
> > - If this causes a severely reduced throughput you may have a problem,
> >   but I don't find that very likely.
> > - If you are losing messages at the socket level (dropped because of
> >   receive buffer overflow) you *do* have a problem, but this can most
> >   often be remedied by extending the socket receive buffer limit.
> >
> > BR
> > ///Jon Maloy
> >
> > -----Original Message-----
> > From: Peter Koss <[email protected]>
> > Sent: September 18, 2018 12:33 PM
> > To: [email protected]
> > Subject: [tipc-discussion] What affects congestion beyond window size,
> > and what might have reduced congestion thresholds in TIPC 2.0.x?
> >
> > In TIPC 1.7.6, we battled with congestion quite a bit. We ultimately
> > settled on adjusting these parameters in TIPC, which we also used in
> > TIPC 1.7.7. This was running on Wind River Linux 3, where TIPC was an
> > independent module from the kernel.
> >
> > SOL_TIPC changed from 271 to 50 (probably not affecting congestion).
> > TIPC_MAX_LINK_WIN changed from 50 to 150.
> > TIPC_NODE_RECVQ_DEPTH set to 131.
> >
> > Using Wind River Linux 6, we get TIPC 2.0.5 as part of the kernel, and
> > we see congestion occurring at much lower overall load levels (less
> > traffic overall) compared to TIPC 1.7.7 & WR3. We've made the same
> > changes as above via a loadable module for TIPC 2.0.5, and also noted
> > that TIPC_NODE_RECVQ_DEPTH is now obsolete. Upon observing congestion,
> > we have raised the default and maximum window sizes to 300 and even
> > 400. This helps congestion a little bit, but not sufficiently.
> >
> > Does anyone know:
> > - What has changed in TIPC 2.0.x that affects this?
> > - Are there other parameters to change, to assist this?
> > - Is there a replacement set of parameters that affect what
> >   TIPC_NODE_RECVQ_DEPTH influences?
