Hi,

Did you monitor the count of TIME_WAIT sockets during the test? Did you check the number of free file handles for the process (jmeter/java)?

I see this (or similar) behaviour (connection timeout) on Windows 10 Pro. I noticed that I get it when I run JMeter on a laptop connected over VPN, but not when I use a Windows server on the internal network. The other components are similar (F5 and VIP). My current line of investigation is TIME_WAIT sockets on the Windows side; I have not checked the components in between yet.
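If it helps, on the RedHat generator described below you could sample both numbers from Java with something like the rough, untested sketch under this paragraph. The class name and the pid argument are only placeholders, it reads /proc so it is Linux-only, and /proc/net/tcp covers IPv4 only (/proc/net/tcp6 would need the same treatment):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    // Rough probe for a Linux load generator: counts TIME_WAIT sockets
    // system-wide and open file descriptors for one process.
    public class SocketStateProbe {
        public static void main(String[] args) throws IOException {
            // Placeholder: pass the JMeter pid as the first argument,
            // otherwise the probe inspects itself.
            String pid = args.length > 0 ? args[0] : "self";

            // /proc/net/tcp lists one IPv4 socket per line after the header;
            // the 4th column ("st") is the TCP state in hex, 06 == TIME_WAIT.
            try (Stream<String> lines = Files.lines(Paths.get("/proc/net/tcp"))) {
                long timeWait = lines.skip(1)
                        .map(l -> l.trim().split("\\s+"))
                        .filter(cols -> cols.length > 3 && "06".equals(cols[3]))
                        .count();
                System.out.println("TIME_WAIT sockets: " + timeWait);
            }

            // Every open socket also shows up as a file descriptor here,
            // so this approximates the "open files" figure for the process.
            try (Stream<Path> fds = Files.list(Paths.get("/proc", pid, "fd"))) {
                System.out.println("Open fds for pid " + pid + ": " + fds.count());
            }
        }
    }

Reading another process's fd directory usually needs the same user or root.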
Regards,
Mariusz

On Fri, 17 Apr 2020 at 11:48, Owen Pahl <[email protected]> wrote:
> Hi All,
>
> Apologies for the length of this post/question. This covers a problem that I have been attempting to debug for the past few months ...
>
> I am having an on-going problem with JMeter where I get connection timeout failures once load passes a threshold on our load generation machines. The error rate is 0.5-1% of the total number of transactions.
>
> Packet captures show JMeter sometimes not closing connections correctly and then re-using the port some time later, but before the other end has cleaned up the half-open connection. (I am also open to the possibility that this is a symptom rather than a cause.)
>
> Further details:
> In this particular test scenario, I am testing a load-balanced application that is fronted by a VIP hosted on an F5.
>
> From the trace mentioned above, at loads above some threshold (yet to be determined), when the F5 initiates the connection tear-down the JMeter machine responds with an ACK but never sends a FIN to close its side of the connection. This leaves a half-open connection on the F5 but a closed connection on the JMeter machine. Some time later, after the timed-wait period (2 mins) has passed but before the idle connection clean-up job runs on the F5 (5 mins), a connection is initiated on the same local port. The F5 sees this as further traffic on a connection it thought it closed and re-sends its FIN. This confuses the JMeter side, which thinks it is standing up a new connection and re-sends its SYN. This repeats with exponential back-off 7 times as per the TCP spec. After this the connection is reported as failing with a connection time of ~127 seconds.
>
> To prove this is the probable cause, I was able to temporarily get the clean-up timeout on the F5 reduced to 90 seconds. In this scenario, with the same applied load, we saw no connection timeout errors.
>
> The load machine is RedHat 7.7 running, as far as I am aware, the standard defaults (the host was built and is maintained by a 3rd-party provider).
> I have tried JMeter 5.0 (the current version we use at work) and 5.2.1, again running vanilla here other than setting a large JVM heap size. No changes to HttpClient config, JVM args etc.
> The test is triggered from Jenkins, although I doubt this has any impact as I am just starting JMeter as I would on the command line. Jenkins just helps with pulling the results back through many jumps.
> I have tried on the HotSpot 1.8.0_51 and OpenJDK 1.8.0_232 JVMs; both give the same behaviour.
> CPU, memory and GC activity all look low to normal during the test.
> The JMeter host is a virtual machine with 4 CPUs and 16 GB RAM.
> And to discount issues with the host application, I have run the same load across multiple machines (on the same VLAN etc.) and everything works as expected.
> During the test, at the point where the errors are being logged, there are still over 21k ephemeral ports available.
> The process is peaking at ~1500 threads, although I'm not sure if this is being reported correctly by RedHat. I was expecting around 6-7x that number as I have configured the script to download embedded resources with 6 threads.
> The machine is peaking at ~7500 open files, which roughly matches the number of open connections reported by ss, as expected.
>
> I see in the documentation for Apache HttpClient there is a section on connection management that seems to describe the behaviour displayed.
>
> http://hc.apache.org/httpcomponents-client-4.3.x/tutorial/html/connmgmt.html#d5e405
> "One of the major shortcomings of the classic blocking I/O model is that the network socket can react to I/O events only when blocked in an I/O operation. When a connection is released back to the manager, it can be kept alive however it is unable to monitor the status of the socket and react to any I/O events. If the connection gets closed on the server side, the client side connection is unable to detect the change in the connection state (and react appropriately by closing the socket on its end)."
>
> It suggests running a separate thread to periodically clean up the connection pools by calling HttpClientConnectionManager.closeExpiredConnections(). I can see this currently gets called, but only at the start of a new thread iteration.
>
> This leads to some obvious questions:
>
> - Is this behaviour anyone has seen before?
> - Is this a bug with JMeter/HttpClient?
> - Is this maybe a bug with Java? Or potentially RedHat reusing ports too aggressively?
> - Any suggestions for a work-around or further diagnosis?
>
> Cheers,
> Owen
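For reference, the eviction thread that tutorial section recommends is only a few lines. A minimal sketch against an HttpClient 4.x PoolingHttpClientConnectionManager could look like the class below; I am showing it only to illustrate the pattern, since as far as I know JMeter creates and owns its connection managers internally, so using it would mean changing the sampler code rather than the test plan. The interval and idle timeout values are arbitrary.

    import java.util.concurrent.TimeUnit;
    import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

    // Sketch of the periodic clean-up thread described in the HttpClient
    // tutorial: it evicts expired and long-idle connections so half-closed
    // sockets are not handed back out of the pool.
    public class IdleConnectionEvictor extends Thread {

        private final PoolingHttpClientConnectionManager connMgr;
        private volatile boolean shutdown;

        public IdleConnectionEvictor(PoolingHttpClientConnectionManager connMgr) {
            this.connMgr = connMgr;
            setDaemon(true);
        }

        @Override
        public void run() {
            try {
                while (!shutdown) {
                    synchronized (this) {
                        wait(5000); // wake up every 5 seconds (arbitrary)
                        // Drop connections whose keep-alive has expired ...
                        connMgr.closeExpiredConnections();
                        // ... and anything idle for longer than 30 seconds.
                        connMgr.closeIdleConnections(30, TimeUnit.SECONDS);
                    }
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }

        public void shutdown() {
            shutdown = true;
            synchronized (this) {
                notifyAll();
            }
        }
    }

Started once next to the client (new IdleConnectionEvictor(cm).start()), it would at least tell you whether eagerly evicting stale connections makes the ~127 second timeouts go away.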
