Hi Jon,

To describe the problem: for us this is an issue of matching performance between 
TIPC 2.0.5 on Wind River Linux 6 and TIPC 1.7.7 on Wind River Linux 3.  Performance 
under the former is much lower, roughly half of what we see under the latter.

We see the sender getting EAGAIN with a window size of 150 (and also at 50 and 
300); that is probably the core issue for us, and it happens at a much lower 
traffic rate than in the older environment.  No link reset was noted at 150, 
but the EAGAIN errors appear at an unexpectedly low load.  We use non-blocking 
socket flags for this data, on both the old and new versions.  Our program is 
sensitive to small delays, hence the non-blocking choice.

We are most interested in any tuning available that is specific to 2.0.5, or in 
anything that has changed in TIPC that might be slowing these things down.

Regards

Peter




-----Original Message-----
From: Jon Maloy <[email protected]> 
Sent: Thursday, September 20, 2018 11:11 AM
To: Peter Koss <[email protected]>; [email protected]
Subject: RE: What affects congestion beyond window size, and what might have 
reduced congestion thresholds in TIPC 2.0.x?

Hi Peter,
See below.

> -----Original Message-----
> From: Peter Koss <[email protected]>
> Sent: September 20, 2018 11:31 AM
> To: Jon Maloy <[email protected]>; [email protected]
> Subject: RE: What affects congestion beyond window size, and what 
> might have reduced congestion thresholds in TIPC 2.0.x?
> 
> 
> Hi Jon,
> 
> Again, thanks for thinking about this.  Kernel version:
>      Linux a33ems1 3.10.62-ltsi-WR6.0.0.36_standard #1 SMP PREEMPT Mon
> Aug 20 17:25:51 CDT 2018 x86_64 x86_64 x86_64 GNU/Linux

Ok. A pretty old kernel then.
> 
> The retransmission error (i.e., the kind of catastrophic interruption)
> was noted only in experiments pushing the window size to 300-400; it was
> not seen at lower window sizes.
> 
> Our thoughts are around how to see evidence of retransmission going on
> before it gets to that point, or ways to see such evidence in TIPC 2.0.5.

You can use tipc-config to list the link statistics. There you will see the number 
of retransmissions, just as you can see the number of congestion events.
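For example (exact field names vary a bit between TIPC versions, so just grep
for the counters of interest):

    /usr/sbin/tipc-config -ls | grep -Ei "retrans|congestion|naks"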

> We compare to TIPC 1.7.7 under Wind River Linux 3 as a data point, but
> care mostly about addressing it under 2.0.5 and Wind River Linux 6.
> 
> This is not running a VM.   Not sure about your question on Remote
> Procedure Call

Sorry, I did of course mean RPS (Receive Packet Steering). But TIPC does not 
support that in such an old kernel, so that cannot be the problem.

> activated; if there's a command to run or a code construct to check, I
> could get that.

I still don't understand what it is you consider a problem. If you use a window 
size of 150, you say you don't get the link reset.
So, what is it that is not working?

BR
///jon

> 
> Regards
> 
> PK
> -----Original Message-----
> From: Jon Maloy <[email protected]>
> Sent: Thursday, September 20, 2018 7:28 AM
> To: Peter Koss <[email protected]>; [email protected]
> Subject: RE: What affects congestion beyond window size, and what 
> might have reduced congestion thresholds in TIPC 2.0.x?
> 
> Hi Peter,
> See my comments below.
> 
> > -----Original Message-----
> > From: Peter Koss <[email protected]>
> > Sent: September 19, 2018 6:11 PM
> > To: Jon Maloy <[email protected]>; [email protected]
> > Subject: RE: What affects congestion beyond window size, and what 
> > might have reduced congestion thresholds in TIPC 2.0.x?
> >
> > Thanks for responding.
> >
> > There was code in TIPC 1.7.x that gave some node receive queue
> > information, but that is now obsolete in 2.0.x.  We are using socket
> > receive calls to get data instead, which seems to suggest one of two
> > problems: either the receive-side queue is filling up and exceeding
> > limits, or the ack back to the sender is having trouble.  We do see the
> > sender getting an errno=EAGAIN.
> > Overall the performance levels we see are much lower with TIPC 2.0.x
> > under Wind River Linux 6 than with TIPC 1.7.x under Wind River Linux 3.
> 
> Which Linux kernel version are you running?
> 
> >
> >        case TIPC_NODE_RECVQ_DEPTH:
> >                value = (u32)atomic_read(&tipc_queue_size);  <== obsolete now; the call occurs, we get 0
> >                break;
> >        case TIPC_SOCK_RECVQ_DEPTH:
> >                value = skb_queue_len(&sk->sk_receive_queue);
> >                break;
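> > (For reference, the user-space query that hits the second case above is
> > roughly the following sketch, using SOL_TIPC and TIPC_SOCK_RECVQ_DEPTH
> > from linux/tipc.h, error handling omitted:
> >
> >     int depth = 0;
> >     socklen_t len = sizeof(depth);
> >     getsockopt(sd, SOL_TIPC, TIPC_SOCK_RECVQ_DEPTH, &depth, &len);
> >
> > whereas the same call with TIPC_NODE_RECVQ_DEPTH now always returns 0.)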
> >
> >
> > Questions we have currently:
> > - What is the socket receive queue limit (default)?
> 
> That depends on the Linux version you are using. Prior to 4.6 it was
> 64 MB; in later versions it is 2 MB, but with a much better flow control
> algorithm.
> 
> > - Is it wise to try a window size > 150?
> 
> I have never done it myself except for experimental purposes, but I 
> see no problem with it.
> Do you have any particular reason to do so? Does it give significantly
> better throughput than at 150?
> 
> > - Is there a good way to control or influence the flow control 
> > sender/receiver coordination,
> You can increase the window size to potentially improve link-level
> throughput, and you can increase the sending socket's importance priority
> to reduce the risk of receive socket buffer overflow.
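> Setting the importance is a per-socket option, roughly like this (a sketch
> using the constants from linux/tipc.h; error handling omitted):
>
>     int imp = TIPC_HIGH_IMPORTANCE;   /* or TIPC_CRITICAL_IMPORTANCE */
>     setsockopt(sd, SOL_TIPC, TIPC_IMPORTANCE, &imp, sizeof(imp));
>
> Messages of higher importance are allowed to fill more of the receiving
> socket's buffer before being dropped.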
> 
> > or a best way to adjust receive buffer limit?
> If you want to change this, follow the instructions under 5.2 at the
> following link:
> http://tipc.sourceforge.net/programming.html#incr_rcvbuf
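> In short that amounts to something like the following sketch (the value is
> only illustrative, and net.core.rmem_max may also need to be raised before
> a large value takes effect):
>
>     int rcvbuf = 32 * 1024 * 1024;    /* illustrative size only */
>     setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));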
> But I see no sign that buffer overflow is your problem.
> 
> >
> > For context, the first sign of errors shows up as congestion, where the
> > max value increases to slightly above whatever window size we set
> > (50, 150, 300, 400).
> >
> > pl0_1:~$ /usr/sbin/tipc-config -ls | grep "Send queue max"
> >   Congestion link:0  Send queue max:2 avg:1
> >   Congestion link:93121  Send queue max:162 avg:3
> >   Congestion link:206724  Send queue max:164 avg:3
> >   Congestion link:67839  Send queue max:167 avg:3
> >   Congestion link:214788  Send queue max:166 avg:3
> >   Congestion link:205240  Send queue max:165 avg:3
> >   Congestion link:240955  Send queue max:166 avg:3
> >   Congestion link:0  Send queue max:0 avg:0
> >   Congestion link:0  Send queue max:1 avg:0
> >   Congestion link:0  Send queue max:0 avg:0
> 
> This is all normal and unproblematic. We allow an oversubscription of 
> one message (max 46 1500 byte packets) on each link to make the 
> algorithm simpler. So you will often find the max value higher than 
> the nominal upper limit.
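> (Roughly: the largest TIPC message is 66000 bytes, which at a 1500-byte MTU
> fragments into about 46 packets once headers are accounted for; hence the
> "max 46" above, and why a send queue max of 162-167 at window 150 is
> nothing to worry about.)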
> 
> >
> > The next error occurs only when the window size is high (300-400); it is
> > not seen at 50 or 150, so we think it may be extraneous to our issue.  It
> > also makes us wonder whether going above 150 is wise, hence the question
> > above.
> >
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Retransmission failure on link <1.1.5:bond1-1.1.2:bond1>
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Resetting link
> 
This is your real problem. For some reason a packet has been retransmitted 
more than 100 times on a link without going through. Then the link is reset, 
and all associated connections as well.
We have seen this happen for various reasons over the years, and fixed them all.
Is RPC possibly activated on your receiving node?
Are you running a VM with a virtio interface? This one tends to be 
overwhelmed sometimes and just stops sending for 30 seconds, something 
that leads to broken links.
> 
> But again, it all depends on which kernel and environment you are running.
> Please update me on this.
> 
> BR
> ///jon
> 
> > Sep 17 05:42:00 pl0_4 kernel: Link 1001002<eth:bond1>::WW
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Lost link <1.1.5:bond1-1.1.2:bond1> on network plane A
> > Sep 17 05:42:00 pl0_4 kernel: tipc: Lost contact with <1.1.2>
> > Sep 17 05:42:00 pl0_10 kernel: tipc: Resetting link <1.1.2:bond1-1.1.5:bond1>, requested by peer
> > Sep 17 05:42:00 pl0_10 kernel: tipc: Lost link <1.1.2:bond1-1.1.5:bond1> on network plane A
> >
> > Thanks in advance, advice is appreciated.
> >
> > PK
> >
> > -----Original Message-----
> > From: Jon Maloy <[email protected]>
> > Sent: Tuesday, September 18, 2018 12:15 PM
> > To: Peter Koss <[email protected]>; [email protected]
> > Subject: RE: What affects congestion beyond window size, and what 
> > might have reduced congestion thresholds in TIPC 2.0.x?
> >
> > Hi Peter,
> > The only parameter of those mentioned below that would have any
> > effect on congestion is TIPC_MAX_LINK_WIN, which should reduce
> > occurrences of link level congestion.
> > However, you don't describe which symptoms you see caused by this 
> > congestion.
> > - Is it only a higher 'congested'  counter when you look at the link 
> > statistics? If so, you don't have a problem at all, this is a 
> > totally normal and frequent occurrence. (Maybe we should have given 
> > this field a different name to avert
> > confusion.)
> > - If this causes a severely reduced throughput you may have a 
> > problem, but I don't find that very likely.
> > - If you are losing messages at the socket level (dropped because of 
> > receive buffer overflow) you *do* have a problem, but this can most 
> > often be remedied by extending the socket receive buffer limit.
> >
> > BR
> > ///Jon Maloy
> >
> > -----Original Message-----
> > From: Peter Koss <[email protected]>
> > Sent: September 18, 2018 12:33 PM
> > To: [email protected]
> > Subject: [tipc-discussion] What affects congestion beyond window 
> > size, and what might have reduced congestion thresholds in TIPC 2.0.x?
> >
> >
> > In TIPC 1.7.6, we battled with congestion quite a bit.  We ultimately
> > settled on adjusting these parameters in TIPC, which we also used in
> > TIPC 1.7.7.  This was running on Wind River Linux 3, where TIPC was an
> > independent module from the kernel.
> >
> > SOL_TIPC                changed from 271 to 50  (probably not affecting congestion)
> > TIPC_MAX_LINK_WIN       changed from 50 to 150
> > TIPC_NODE_RECVQ_DEPTH   set to 131
> >
> > Using Wind River Linux 6, we get TIPC 2.0.5 as part of the kernel, and we
> > see congestion occurring at much lower overall load levels (less traffic
> > overall), compared to TIPC 1.7.7 & WR3.  We've made the same changes as
> > above via a loadable module for TIPC 2.0.5, and also noted that
> > TIPC_NODE_RECVQ_DEPTH is now obsolete.  Upon observing congestion, we
> > have changed the default window size, and the max window size, up to 300
> > and even 400.  This helps congestion a little bit, but not sufficiently.
> >
> >
> > Does anyone know:
> > -What has changed in TIPC 2.0.x that affects this?
> > -Are there other parameters to change, to assist this?
> > -Is there a replacement set of parameters that affect what 
> > TIPC_NODE_RECVQ_DEPTH influences?
> >
> >
> >


_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
