On Fri, Feb 28, 2014 at 08:27:13PM -0800, Bill Broadley wrote:
> On 02/26/2014 11:37 PM, Nick Schmalenberger wrote:
> > I need to increase throughput for long lived tcp connections over
> > ipsec over wan between Amazon in Ireland and a Level3 gigabit
> > link in Ashburn Virginia (currently running at about 20Mbps).
>
> Do you mean that the link is 20mbit? Or that the bandwidth you are
> achieving over it is 20 mbit?
>
The latter.

> What is the largest MTU supported across that link?
>
The interface MTU on the Ireland side is 9001, but it is really rather less over the path. Tracepath, and ping with PMTU discovery, both seem to think it is 1500 after the first hop, but packets get fragmented after that anyway, as I have seen with tcpdump and tshark. The actual MTU seems to be 1452. I assume this is because the Amazon ipsec router is simply disregarding the don't-fragment bit that PMTU discovery relies on.
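For anyone wanting to reproduce the probing, here is a sketch of the manual DF-ping approach I mean. remote.example is a placeholder for the far end of the tunnel, and the 28-byte subtraction assumes IPv4 + ICMP headers (ping -s takes the ICMP payload size, not the packet size):

```shell
# Convert a candidate path MTU into the ICMP payload size ping needs:
# 20 bytes IPv4 header + 8 bytes ICMP header.
mtu_to_payload() {
    echo $(( $1 - 28 ))
}
mtu_to_payload 1452    # -> 1424, the payload that tests a 1452-byte path MTU

# With DF set (-M do), a ping larger than the real path MTU should fail
# instead of fragmenting -- unless a middlebox (like the ipsec router here)
# ignores DF.  remote.example is a hypothetical far-side host:
# ping -M do -c 3 -s "$(mtu_to_payload 1452)" remote.example   # should succeed
# ping -M do -c 3 -s "$(mtu_to_payload 1453)" remote.example   # should fail
```

Binary-searching the size between the success and failure points converges on the real path MTU, fragmentation-by-middlebox aside.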
> > I've read various articles saying to enlarge the buffers and make
> > various other kernel tweaks. Some say to base it on the bandwidth
> > delay product, some say just on the link speed, and some say
> > don't bother, Linux does all that automatically now. A lot of it
> > seems random.
>
> Heh, well there's quite a few variables to consider. But in general
> bandwidth is much harder to utilize over a high latency link. So BDP is
> definitely relevant.
>
> What exactly are you trying to send/receive over this high latency link?
>
The protocol is Kafka, sending video playing info back from Europe for aggregation in Ashburn.

> > However, with wireshark I see that the "bytes in flight"
> > measurement, which counts unacknowledged bytes from the source,
> > never gets close to the window size sent by the destination. Does
> > this suggest anything in particular to tweak? I got some books on
>
> One cheat/hack is to just do more TCP connections. Generally the
> throughput will increase over high latency links with more TCP
> connections. Up to a point of course.
>
Yeah, we have had success with this, and compression by Kafka also helps, but I'm hoping to increase throughput at a lower level anyway if it's possible.

> > wireshark, which were quite helpful in how to use the graphs and
> > filters, but on tcp performance they mostly just talked about the
> > effect of packet loss. I'm not certain, but I don't think packet
> > loss is the main thing holding back my performance, because there
> > is some which causes a brief dip in the window size and then it
> > recovers. Throughput stays pretty flat.
> >
> > It would be really amazing if there was a flowchart on doing this
> > for linux that could be informed by wireshark io graphs and other
> > graphs. Has anybody ever seen such a chart? If this approach is
> > successful for me, and I can understand how to do it in several
> > scenarios, I think I will even like to make such a flowchart if
> > it doesn't already exist.
> > Thanks for any tips and disabusement of
> > my misunderstandings about tcp in linux :)
>
> TCP defaults are definitely suboptimal for transatlantic links. As are
> many assumptions that applications have. Much depends on what you are
> trying to do. I've seen various appliance-like widgets that will proxy
> a given protocol for a high latency link so that servers/clients with
> poor assumptions don't take quite as much of a hit.
>
> A pretty good overview of the related issues is:
> http://www.psc.edu/index.php/networking/641-tcp-tune
>
> What I'd do first is attempt to fix things manually by tinkering with
> the mentioned values. It wouldn't be particularly hard to write a
> traffic generator that played with bitrate and number of simultaneous
> connections to analyze available performance. Said tool could even
> explore the reasonable ranges for the various knobs you can tinker with
> by writing to /sys and /proc. Personally I'd be more likely to parse
> the tcpdump logs myself so I could use an arbitrary number of filters,
> statistics, and post processing. Shouldn't be too hard to graph the
> number of unack'd packets over time, for instance.
>
The unack'd packets are exactly what I've been focusing on with wireshark, and I think wireshark is actually the ideal tool to do the graphing. It has a metric called "bytes in flight" which counts unack'd bytes, and I graphed it in comparison to the window size: http://postimg.org/image/az0hspgkp/ . The spikes clearly match up with retransmissions, which makes sense because retransmissions should continue until those segments are acked. I'm also assuming the real packet loss shown is negligible. What I don't get, though, is why bytes in flight never seems to get close to the window size (except for this one time: http://postimg.org/image/8b3ia1qhh/ ). Also, why would the window size seem to sustain at a higher level after these dips recover?
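One way I could dig at that numerically is to compare the peak bytes in flight against the link's bandwidth-delay product, pulling the same field wireshark graphs out of a capture for post-processing. A rough sketch, where capture.pcap is a hypothetical sender-side capture, 80 ms is an assumed (not measured) transatlantic RTT, and tshark is assumed to be installed:

```shell
# Bandwidth-delay product: bytes that must be unacknowledged ("in flight")
# at any instant to fill the pipe.  $1 = bandwidth in bits/s, $2 = RTT in ms.
bdp_bytes() {
    echo $(( $1 / 8 * $2 / 1000 ))
}
bdp_bytes 20000000 80       # -> 200000: the observed 20 Mbit/s implies ~200 KB in flight
bdp_bytes 1000000000 80     # -> 10000000: a full gigabit at 80 ms needs ~10 MB in flight

# Extract the per-packet series wireshark's IO graph uses
# (tcp.analysis.bytes_in_flight) for offline post-processing.
if command -v tshark >/dev/null && [ -r capture.pcap ]; then
    tshark -r capture.pcap -Y 'tcp.analysis.bytes_in_flight' \
        -T fields -e frame.time_relative -e tcp.analysis.bytes_in_flight \
        > inflight.tsv
    # Peak in-flight bytes over the whole capture:
    awk 'max < $2 { max = $2 } END { print max }' inflight.tsv
fi
```

If that peak never approaches either the advertised window or the BDP, it would point at the sender side (application output or tx buffer) rather than the receiver's window.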
My expectation was that bytes in flight would follow a sawtooth pattern up to the window size. It's also possible that the window size is being reached and I just didn't set up the graphs to show that, because of smoothing or aggregation or something, although I hoped that selecting MAX bytes in flight and MIN window size to represent each interval would help with that.

I've heard of some of the appliances you mention, and I think it's probably not a coincidence that Riverbed is the primary sponsor of Wireshark! My boss has mentioned them as an option here, and I think they are even supported in the Amazon environment. But having a wireshark-informed approach to tuning Linux would be so cool...

What I had in mind with a flowchart was capturing facts like this: the max value of net.ipv4.tcp_wmem does not override net.core.wmem_max (according to https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt), so there could be a logic of how the various values relate as bottlenecks. Then, ideally, various patterns in the wireshark graphs would map to tunables in linux in a way that fits into the flowchart. For example, could a too-small tx buffer explain "bytes in flight" never reaching the window size sent from the destination? Do the spikes in bytes in flight during retransmission demonstrate that the tx buffer is actually plenty large, and the source application just can't sustain higher output? Does not reaching the window size at least mean I should focus on tuning the source host?

The way I generated the traffic: I created a 100M file from /dev/urandom and concatenated five copies of it to make a 500M file (sometimes I just used the 100M file). Then I ran nc -kl 10.0.4.61 8080 > /dev/null on the destination, and curl -T /tmp/500Mrandomfile http://10.0.4.61:8080 > /dev/null on the source.
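As a concrete sketch of that tcp_wmem/wmem_max relationship, this is the kind of fragment I have in mind. The 16 MB values are purely illustrative (roughly sized for a ~10 MB gigabit/80 ms BDP), not a recommendation for this link:

```shell
# Illustrative sysctl settings (values are examples, not recommendations).
# net.core.wmem_max / rmem_max cap what an application can request explicitly
# via setsockopt(SO_SNDBUF / SO_RCVBUF); the third field of tcp_wmem / tcp_rmem
# caps the kernel's autotuned buffers, and per ip-sysctl.txt the tcp_wmem max
# does not override wmem_max, so the pairs are usually raised together.
sysctl -w net.core.wmem_max=16777216
sysctl -w net.core.rmem_max=16777216
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
```

Whether raising the send-side cap actually moves the "bytes in flight" curve would be the flowchart's first decision point.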
On the source, /tmp is a tmpfs, so it should be quite high performance, and between hosts on the lan it goes right up near a gigabit/s, so I think this should be a pretty good http-like benchmark. Thanks for all your suggestions!
Nick
_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech