Hello!

We are on Joyent and are having an issue with large machines that receive
thousands of short-lived connections. We're using a base64 SmartOS image,
Joyent version 13.1.0. We're seeing a TCP retransmit rate of about 3 to 6
segments per second. This drives up our request queueing time by a
minimum of 1 second (which is the default TCP initial retransmit timeout)
for the requests that trigger a TCP retransmit.

Example:

[18:16:43] [email protected] [~]
> netstat -s 1 | grep tcpRetransSegs
tcpRetransSegs =5353360 tcpRetransBytes =489877464
tcpRetransSegs = 1 tcpRetransBytes = 0
tcpRetransSegs = 3 tcpRetransBytes = 0
tcpRetransSegs = 3 tcpRetransBytes = 1216
tcpRetransSegs = 2 tcpRetransBytes = 0
tcpRetransSegs = 3 tcpRetransBytes = 0
tcpRetransSegs = 2 tcpRetransBytes = 0
tcpRetransSegs = 3 tcpRetransBytes = 0
tcpRetransSegs = 6 tcpRetransBytes = 0
tcpRetransSegs = 2 tcpRetransBytes = 0
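
To put a number on the per-second rate, the interval samples above can be
averaged. This is only a minimal Python sketch that assumes the output format
shown above; the helper name retrans_per_interval is made up for illustration.
The first sample is skipped because it is the cumulative counter, not a
one-second interval.

```python
import re

# Output of `netstat -s 1 | grep tcpRetransSegs`, copied from the run above.
SAMPLE = """\
tcpRetransSegs =5353360 tcpRetransBytes =489877464
tcpRetransSegs = 1 tcpRetransBytes = 0
tcpRetransSegs = 3 tcpRetransBytes = 0
tcpRetransSegs = 3 tcpRetransBytes = 1216
tcpRetransSegs = 2 tcpRetransBytes = 0
tcpRetransSegs = 3 tcpRetransBytes = 0
tcpRetransSegs = 2 tcpRetransBytes = 0
tcpRetransSegs = 3 tcpRetransBytes = 0
tcpRetransSegs = 6 tcpRetransBytes = 0
tcpRetransSegs = 2 tcpRetransBytes = 0
"""


def retrans_per_interval(text):
    """Return per-interval tcpRetransSegs values, dropping the first
    (cumulative) sample."""
    values = [int(m.group(1))
              for m in re.finditer(r"tcpRetransSegs\s*=\s*(\d+)", text)]
    return values[1:]


rates = retrans_per_interval(SAMPLE)
print("min", min(rates), "max", max(rates),
      "mean", round(sum(rates) / len(rates), 2))
```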

We have on average about 6,300 established connections per app server.

We have over 15,000 TIME_WAIT connections on average:

[18:28:39] [email protected] [~]
> netstat -a | grep TIME_WAIT | wc -l
15124

All TCP statistics:

TCP tcpRtoAlgorithm = 4 tcpRtoMin = 400
tcpRtoMax = 60000 tcpMaxConn = -1
tcpActiveOpens =793317711 tcpPassiveOpens =913623229
tcpAttemptFails =1066594 tcpEstabResets =1744216
tcpCurrEstab = 6226 tcpOutSegs =37625293048
tcpOutDataSegs =1253992774 tcpOutDataBytes =4057941516
tcpRetransSegs =5355414 tcpRetransBytes =489906426
tcpOutAck =2123412674 tcpOutAckDelayed =3078925761
tcpOutUrg = 0 tcpOutWinUpdate = 73
tcpOutWinProbe = 127 tcpOutControl =3549154370
tcpOutRsts =132941554 tcpOutFastRetrans = 530
tcpInSegs =50209254986
tcpInAckSegs = 0 tcpInAckBytes =563239377
tcpInDupAck =1063545817 tcpInAckUnsent = 0
tcpInInorderSegs =500722163 tcpInInorderBytes =3376172909
tcpInUnorderSegs =858659 tcpInUnorderBytes =966995186
tcpInDupSegs =9439326 tcpInDupBytes =4088222
tcpInPartDupSegs = 13 tcpInPartDupBytes = 10196
tcpInPastWinSegs = 78 tcpInPastWinBytes =1610024294
tcpInWinProbe = 0 tcpInWinUpdate = 30
tcpInClosed =32204385 tcpRttNoUpdate = 5652
tcpRttUpdate =2528802492 tcpTimRetrans =2155791
tcpTimRetransDrop =101699 tcpTimKeepalive =663046
tcpTimKeepaliveProbe=410002 tcpTimKeepaliveDrop = 24406
tcpListenDrop = 6602 tcpListenDropQ0 = 0
tcpHalfOpenDrop = 0 tcpOutSackRetrans =157261
tcpInErrs = 8 udpNoPorts =120132424

Note that listen drops (tcpListenDrop) happen very infrequently, so they
don't seem to be a core issue.
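
As a rough check on that, the counters can be turned into ratios. The sketch
below is Python, uses only values copied from the statistics above, and
assumes none of the counters has wrapped; parse_counters is a hypothetical
helper name, not part of any tool.

```python
import re

# A subset of the `netstat -s` TCP counters listed above.
STATS = """\
tcpActiveOpens =793317711 tcpPassiveOpens =913623229
tcpOutDataSegs =1253992774 tcpOutDataBytes =4057941516
tcpRetransSegs =5355414 tcpRetransBytes =489906426
tcpListenDrop = 6602 tcpListenDropQ0 = 0
"""


def parse_counters(text):
    """Parse 'name = value' pairs into a dict of integers."""
    return {name: int(value)
            for name, value in re.findall(r"(\w+)\s*=\s*(-?\d+)", text)}


counters = parse_counters(STATS)

# Fraction of incoming connections that were dropped at the listen queue.
listen_drop_ratio = counters["tcpListenDrop"] / counters["tcpPassiveOpens"]

# Retransmitted segments relative to data segments sent.
retrans_ratio = counters["tcpRetransSegs"] / counters["tcpOutDataSegs"]

print(f"listen drops per passive open: {listen_drop_ratio:.2e}")
print(f"retransmits per data segment:  {retrans_ratio:.4%}")
```

With these numbers the listen-drop ratio is several orders of magnitude
smaller than the retransmit ratio.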

We've been able to reproduce this issue on a smaller app server that's
receiving no traffic by sending a high enough number of requests so that we
get over 8,000 total connections (TIME_WAIT and ESTABLISHED). Then about 1
out of every 15 to 20 requests has a delay of 1 second.
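
A reproduction along these lines could be scripted roughly as follows. This
is a Python sketch, not the load generator we actually used: the function
names, the 0.9 second threshold, and the host and port in the usage comment
are all assumptions. It only times the TCP handshake, on the assumption that
the 1 second delay shows up during connection setup.

```python
import socket
import time


def measure_connect_latencies(host, port, count, timeout=5.0):
    """Open `count` short-lived TCP connections one after another and
    return the time each connect took, in seconds."""
    latencies = []
    for _ in range(count):
        start = time.monotonic()
        # The connection is closed as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            latencies.append(time.monotonic() - start)
    return latencies


def count_slow(latencies, threshold=0.9):
    """Count connects that took at least `threshold` seconds, i.e. ones
    that plausibly waited for an initial retransmit."""
    return sum(1 for t in latencies if t >= threshold)


# Example usage (hypothetical host and port):
#   lat = measure_connect_latencies("10.0.0.5", 8080, 10000)
#   print(count_slow(lat), "of", len(lat), "connects were slow")
```

Running several copies in parallel would be needed to build up thousands of
TIME_WAIT connections quickly.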

Any suggested tunings? It seems as though the OS occasionally pauses
sending ACKs under a high connection rate, possibly to do some cleanup of
TIME_WAIT connections that need to be closed. Those pauses would then cause
the retransmits.

Thanks!
Paul


