Hi Peter,
See below.

> -----Original Message-----
> From: Peter Koss <[email protected]>
> Sent: September 20, 2018 11:31 AM
> To: Jon Maloy <[email protected]>; [email protected]
> Subject: RE: What affects congestion beyond window size, and what might
> have reduced congestion thresholds in TIPC 2.0.x?
> 
> 
> Hi Jon,
> 
> Again, thanks for thinking about this.  Kernel version :
>      Linux a33ems1 3.10.62-ltsi-WR6.0.0.36_standard #1 SMP PREEMPT Mon
> Aug 20 17:25:51 CDT 2018 x86_64 x86_64 x86_64 GNU/Linux

OK, a pretty old kernel then.
> 
> The retransmission error (i.e., the kind of catastrophic interruption) was
> only noted in experiments pushing the window size to 300-400; it was not
> noted at lower window sizes.
> 
> Our thoughts are around how to see evidence of retransmission going on
> prior to that, or ways to see evidence of that in TIPC 2.0.5.  

You can use tipc-config to list the link statistics. There you will see the
number of retransmissions, just as you can see the number of congestion events.
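For instance (a minimal sketch; the exact counter labels in the tipc-config output can vary between versions, so the grep pattern below is an assumption):

```shell
# List link statistics and pick out the congestion/retransmission counters.
# The grep pattern is an assumption about the label text; adjust it to
# whatever your tipc-config version actually prints.
if command -v tipc-config >/dev/null 2>&1; then
    tipc-config -ls | grep -Ei 'congestion|retrans'
    result=checked
else
    echo "tipc-config is not installed on this machine"
    result=missing
fi
```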

> compare to TIPC 1.7.7 under Wind River Linux 3 as a data point, but care
> mostly about addressing it under 2.0.5 and Wind River Linux 6.
> 
> This is not running a VM.   Not sure about your question on Remote
> Procedure Call 

Sorry, I of course meant RPS (Receive Packet Steering). But TIPC does not 
support that in such an old kernel, so that cannot be the problem.

> activated, if there's a command to run or code construct to
> check I could get that.

I still don't understand what you consider the problem to be. You say that 
with a window size of 150 the link is not reset.
So, what is it that is not working?

BR
///jon

> 
> Regards
> 
> PK
> -----Original Message-----
> From: Jon Maloy <[email protected]>
> Sent: Thursday, September 20, 2018 7:28 AM
> To: Peter Koss <[email protected]>; [email protected]
> Subject: RE: What affects congestion beyond window size, and what might
> have reduced congestion thresholds in TIPC 2.0.x?
> 
> Hi Peter,
> See my comments below.
> 
> > -----Original Message-----
> > From: Peter Koss <[email protected]>
> > Sent: September 19, 2018 6:11 PM
> > To: Jon Maloy <[email protected]>; [email protected]
> > Subject: RE: What affects congestion beyond window size, and what
> > might have reduced congestion thresholds in TIPC 2.0.x?
> >
> > Thanks for responding.
> >
> > There was code in TIPC 1.7.x that gave some node receive queue
> > information, but that is now obsolete in 2.0.x.  We are using socket
> > receive calls to get data instead, which seems to suggest one of two
> > problems: either the receive-side queue is filling up and exceeding
> > limits, or the ack back to the sender is having trouble.  We do see
> > the sender getting errno=EAGAIN.
> > Overall, the performance levels we see with TIPC 2.0.x under Wind River
> > Linux 6 are much lower than with TIPC 1.7.x under Wind River Linux 3.
> 
> Which Linux kernel version are you running?
> 
> >
> > +       case TIPC_NODE_RECVQ_DEPTH:
> > +               value = (u32)atomic_read(&tipc_queue_size);  <== obsolete now; the call occurs and we get 0
> > +               break;
> > +       case TIPC_SOCK_RECVQ_DEPTH:
> > +               value = skb_queue_len(&sk->sk_receive_queue);
> > +               break;
> >
> >
> > Questions we have currently:
> > - What is the socket receive queue limit (default)?
> 
> That depends on the Linux kernel version you are using. Prior to 4.6 it was
> 64 MB; in later versions it is 2 MB, but with a much better flow control
> algorithm.
> 
> > - Is it wise to try a window size > 150?
> 
> I have never done it myself except for experimental purposes, but I see no
> problem with it.
> Do you have any particular reason to do so? Does it give significantly
> better throughput than at 150?
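If you do experiment, the link window can be changed at run time with tipc-config. The -lw flag syntax below is from my memory of the 2.0 tool, so verify it against tipc-config --help before relying on it:

```shell
# Set the link window on a named link, then re-read the statistics.
# The -lw flag syntax is an assumption -- verify with tipc-config --help.
LINK='1.1.5:bond1-1.1.2:bond1'   # example link name taken from the logs
if command -v tipc-config >/dev/null 2>&1; then
    tipc-config -lw="$LINK/300"
    tipc-config -ls
    result=applied
else
    echo "tipc-config is not installed on this machine"
    result=missing
fi
```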
> 
> > - Is there a good way to control or influence the flow control
> > sender/receiver coordination,
> You can increase the window size to potentially improve link-level
> throughput, and you can raise the sending socket's importance priority to
> reduce the risk of receive socket buffer overflow.
> 
> > or a best way to adjust receive buffer limit?
> If you want to change this, follow the instructions under 5.2 at the
> following link:
> http://tipc.sourceforge.net/programming.html#incr_rcvbuf
> But I see no sign that buffer overflow is your problem.
> 
> >
> > For context, the first sign of errors shows up as congestion, where
> > the max value will increase to slightly above whatever window size we
> > set (50,150,300,400).
> >
> > pl0_1:~$ /usr/sbin/tipc-config -ls | grep "Send queue max"
> >   Congestion link:0  Send queue max:2 avg:1
> >   Congestion link:93121  Send queue max:162 avg:3
> >   Congestion link:206724  Send queue max:164 avg:3
> >   Congestion link:67839  Send queue max:167 avg:3
> >   Congestion link:214788  Send queue max:166 avg:3
> >   Congestion link:205240  Send queue max:165 avg:3
> >   Congestion link:240955  Send queue max:166 avg:3
> >   Congestion link:0  Send queue max:0 avg:0
> >   Congestion link:0  Send queue max:1 avg:0
> >   Congestion link:0  Send queue max:0 avg:0
> 
This is all normal and unproblematic. We allow an oversubscription of one 
message (at most 46 1500-byte packets) on each link to keep the algorithm 
simple, so you will often find the max value slightly higher than the 
nominal upper limit.
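The 46-packet figure follows from TIPC's 66000-byte maximum message size. As a back-of-envelope check (the ~1460 bytes of payload per 1500-byte packet is my assumption about the header overhead, not a number from the thread):

```python
import math

# Worst-case oversubscription: one maximum-size message fragmented into
# 1500-byte packets. 66000 is TIPC's maximum user message size; the
# per-packet payload (~1460 bytes after headers) is an assumption.
MAX_MSG_BYTES = 66000
PAYLOAD_PER_PKT = 1460

packets = math.ceil(MAX_MSG_BYTES / PAYLOAD_PER_PKT)
print(packets)  # → 46
```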
> 
> >
> > The next error occurs only when the window size is high (300-400); it is
> > not seen at 50 or 150, so we think it may be extraneous to our issue.
> > It also makes us wonder whether going above 150 is wise, hence the
> > question above.
> >
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Retransmission failure on link <1.1.5:bond1-1.1.2:bond1>
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Resetting link
> 
> This is your real problem. For some reason a packet has been retransmitted
> more than 100 times on a link without going through. Then the link is
> reset, and all associated connections with it.
> We have seen this happen for various reasons over the years, and fixed
> them all.
> Is RPC possibly activated on your receiving node?
> Are you running a VM with a virtio interface? That one tends to be
> overwhelmed sometimes and just stops sending for 30 seconds, something
> that leads to broken links.
> 
> But again, it all depends on which kernel and environment you are running.
> Please update me on this.
> 
> BR
> ///jon
> 
> > Sep 17 05:42:00 pl0_4 kernel: Link 1001002<eth:bond1>::WW
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Lost link <1.1.5:bond1-1.1.2:bond1> on network plane A
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Lost contact with <1.1.2>
> > Sep 17 05:42:00 pl0_10 kernel: tipc: Resetting link <1.1.2:bond1-1.1.5:bond1>, requested by peer
> > Sep 17 05:42:00 pl0_10 kernel: tipc: Lost link <1.1.2:bond1-1.1.5:bond1> on network plane A
> >
> > Thanks in advance, advice is appreciated.
> >
> > PK
> >
> > -----Original Message-----
> > From: Jon Maloy <[email protected]>
> > Sent: Tuesday, September 18, 2018 12:15 PM
> > To: Peter Koss <[email protected]>; [email protected]
> > Subject: RE: What affects congestion beyond window size, and what
> > might have reduced congestion thresholds in TIPC 2.0.x?
> >
> > Hi Peter,
> > The only parameter of those mentioned below that would have any effect
> > on congestion is TIPC_MAX_LINK_WIN, which should reduce occurrences
> of
> > link level congestion.
> > However, you don't describe which symptoms you see caused by this
> > congestion.
> > - Is it only a higher 'congested' counter when you look at the link
> > statistics? If so, you don't have a problem at all; this is a totally
> > normal and frequent occurrence. (Maybe we should have given this field
> > a different name to avoid confusion.)
> > - If this causes a severely reduced throughput you may have a problem,
> > but I don't find that very likely.
> > - If you are losing messages at the socket level (dropped because of
> > receive buffer overflow) you *do* have a problem, but this can most
> > often be remedied by extending the socket receive buffer limit.
> >
> > BR
> > ///Jon Maloy
> >
> > -----Original Message-----
> > From: Peter Koss <[email protected]>
> > Sent: September 18, 2018 12:33 PM
> > To: [email protected]
> > Subject: [tipc-discussion] What affects congestion beyond window size,
> > and what might have reduced congestion thresholds in TIPC 2.0.x?
> >
> >
> > In TIPC 1.7.6, we battled with congestion quite a bit.  We ultimately
> > settled on adjusting these parameters in TIPC, which we also used in
> > TIPC 1.7.7.  This was running on Wind River Linux 3, where TIPC was an
> > independent module from the kernel.
> >
> > SOL_TIPC               changed from 271 to 50  (probably not affecting congestion)
> > TIPC_MAX_LINK_WIN      changed from 50 to 150
> > TIPC_NODE_RECVQ_DEPTH  set to 131
> >
> > Using Wind River Linux 6, we get TIPC 2.0.5 as part of the kernel, and
> > we see congestion occurring at much lower overall load levels (less
> > traffic overall) compared to TIPC 1.7.7 & WR3.  We've made the same
> > changes as above via a loadable module for TIPC 2.0.5, and also noted
> > that TIPC_NODE_RECVQ_DEPTH is now obsolete.  Upon observing congestion,
> > we have increased the default window size and max window size, up to
> > 300 and even 400.  This helps congestion a little, but not sufficiently.
> >
> >
> > Does anyone know:
> > -What has changed in TIPC 2.0.x that affects this?
> > -Are there other parameters to change, to assist this?
> > -Is there a replacement set of parameters that affect what
> > TIPC_NODE_RECVQ_DEPTH influences?
> >
> >
> >
> >
> > _______________________________________________
> > tipc-discussion mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/tipc-discussion

