Hello! We are on Joyent and are having an issue with large machines that receive thousands of short-lived connections. We're using a base64 SmartOS image, version 13.1.0. We're seeing a TCP retransmit rate of about 3 to 6 segments per second. Each request that triggers a TCP retransmit gains at least 1 second of request queueing time (1 second being the default initial TCP retransmit timeout).
Example:

    [18:16:43] [email protected] [~] > netstat -s 1 | grep tcpRetransSegs
    tcpRetransSegs =5353360 tcpRetransBytes =489877464
    tcpRetransSegs = 1 tcpRetransBytes = 0
    tcpRetransSegs = 3 tcpRetransBytes = 0
    tcpRetransSegs = 3 tcpRetransBytes = 1216
    tcpRetransSegs = 2 tcpRetransBytes = 0
    tcpRetransSegs = 3 tcpRetransBytes = 0
    tcpRetransSegs = 2 tcpRetransBytes = 0
    tcpRetransSegs = 3 tcpRetransBytes = 0
    tcpRetransSegs = 6 tcpRetransBytes = 0
    tcpRetransSegs = 2 tcpRetransBytes = 0

We have on average about 6,300 established connections per app server. We have over 15,000 TIME_WAIT connections on average:

    [18:28:39] [email protected] [~] > netstat -a | grep TIME_WAIT | wc -l
    15124

All TCP statistics:

    TCP tcpRtoAlgorithm     = 4            tcpRtoMin           = 400
        tcpRtoMax           = 60000        tcpMaxConn          = -1
        tcpActiveOpens      =793317711     tcpPassiveOpens     =913623229
        tcpAttemptFails     =1066594       tcpEstabResets      =1744216
        tcpCurrEstab        = 6226         tcpOutSegs          =37625293048
        tcpOutDataSegs      =1253992774    tcpOutDataBytes     =4057941516
        tcpRetransSegs      =5355414       tcpRetransBytes     =489906426
        tcpOutAck           =2123412674    tcpOutAckDelayed    =3078925761
        tcpOutUrg           = 0            tcpOutWinUpdate     = 73
        tcpOutWinProbe      = 127          tcpOutControl       =3549154370
        tcpOutRsts          =132941554     tcpOutFastRetrans   = 530
        tcpInSegs           =50209254986
        tcpInAckSegs        = 0            tcpInAckBytes       =563239377
        tcpInDupAck         =1063545817    tcpInAckUnsent      = 0
        tcpInInorderSegs    =500722163     tcpInInorderBytes   =3376172909
        tcpInUnorderSegs    =858659        tcpInUnorderBytes   =966995186
        tcpInDupSegs        =9439326       tcpInDupBytes       =4088222
        tcpInPartDupSegs    = 13           tcpInPartDupBytes   = 10196
        tcpInPastWinSegs    = 78           tcpInPastWinBytes   =1610024294
        tcpInWinProbe       = 0            tcpInWinUpdate      = 30
        tcpInClosed         =32204385      tcpRttNoUpdate      = 5652
        tcpRttUpdate        =2528802492    tcpTimRetrans       =2155791
        tcpTimRetransDrop   =101699        tcpTimKeepalive     =663046
        tcpTimKeepaliveProbe=410002        tcpTimKeepaliveDrop = 24406
        tcpListenDrop       = 6602         tcpListenDropQ0     = 0
        tcpHalfOpenDrop     = 0            tcpOutSackRetrans   =157261
        tcpInErrs           = 8            udpNoPorts          =120132424

Note that tcpListenDrops happen very infrequently, so these
don't seem to be a core issue.

We've been able to reproduce this issue on a smaller app server that's receiving no other traffic by sending enough requests to push it over 8,000 total connections (TIME_WAIT and ESTABLISHED). At that point about 1 out of every 15 to 20 requests has a delay of 1 second.

Any suggested tunings? It seems as though the OS occasionally pauses sending ACKs under a high connection rate, possibly to do some cleanup of TIME_WAIT connections that need to be closed. These pauses then cause the retransmits.

Thanks!
Paul
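
To put the counters quoted in this message in perspective, here is a minimal Python sketch that parses `name = value` pairs in the style of the `netstat -s` output and derives two figures: retransmitted segments as a fraction of data segments sent, and the mean of the per-interval tcpRetransSegs samples. The function names are hypothetical, the parsing assumes only the text format shown in this message, and it treats the first `netstat -s 1` line as a cumulative total rather than an interval sample (which is what the numbers suggest).

```python
import re

# Matches "name = value" pairs, with or without spaces around "=",
# e.g. "tcpRetransSegs =5355414" or "tcpTimKeepaliveProbe=410002".
PAIR = re.compile(r"([A-Za-z][A-Za-z0-9]*)\s*=\s*(-?\d+)")


def parse_counters(text):
    """Return {counter_name: int}. If a name repeats, the last value wins."""
    return {name: int(value) for name, value in PAIR.findall(text)}


def retrans_ratio(counters):
    """Retransmitted segments as a fraction of data segments sent."""
    return counters["tcpRetransSegs"] / counters["tcpOutDataSegs"]


def mean_interval_rate(samples):
    """Mean of per-interval counts (one sample per second here)."""
    return sum(samples) / len(samples)


# Values copied from the statistics quoted in this message.
stats = parse_counters(
    "tcpOutDataSegs =1253992774 tcpOutDataBytes =4057941516 "
    "tcpRetransSegs =5355414 tcpRetransBytes =489906426"
)
# Per-second tcpRetransSegs samples, excluding the first (cumulative) line.
samples = [1, 3, 3, 2, 3, 2, 3, 6, 2]

print(f"retransmit ratio: {retrans_ratio(stats):.2%}")
print(f"mean retransmits per second: {mean_interval_rate(samples):.2f}")
```

The ratio covers the whole period the counters have been accumulating, so it says nothing about bursts; the per-second samples are the better signal for the 1-second stalls described above.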
