Hi,
Did you monitor the count of TIME_WAIT sockets during the test? Did you
check the free file-handle count for the process (jmeter/java)?
I see this (or similar) behaviour (connection timeout) on Windows 10 Pro.
But I noticed that while I have this behaviour when I run JMeter on a laptop
connected over VPN, I don't have it when I use a Windows server on the
internal network. The other components are similar (F5 and VIP). What I am
currently checking is TIME_WAIT sockets on the Windows side; I haven't
checked the components in between yet.
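
For watching TIME_WAIT on the Linux side (e.g. the RedHat load generator),
ss -s or netstat gives the count directly; below is a minimal Java sketch of
the same check, assuming Linux since it parses /proc/net/tcp (a hypothetical
helper, not anything shipped with JMeter). State code 06 in the "st" column
means TIME_WAIT:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    // Counts TIME_WAIT sockets by reading /proc/net/tcp (IPv4 only;
    // /proc/net/tcp6 has the same layout for IPv6 sockets).
    public class TimeWaitCounter {
        public static void main(String[] args) throws IOException {
            try (Stream<String> lines = Files.lines(Paths.get("/proc/net/tcp"))) {
                long timeWait = lines
                        .skip(1)                                  // column header row
                        .map(line -> line.trim().split("\\s+"))
                        .filter(f -> f.length > 3 && "06".equals(f[3]))
                        .count();
                System.out.println("TIME_WAIT sockets: " + timeWait);
            }
        }
    }

Sampling this once a second during the ramp-up would show whether the error
threshold lines up with TIME_WAIT sockets accumulating.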

Regards,
Mariusz

On Fri, 17 Apr 2020 at 11:48, Owen Pahl <[email protected]> wrote:

> Hi All,
>
> Apologies for the length of this post/question. This covers a problem that
> I have been attempting to debug for the past few months ...
>
> I am having an on-going problem with JMeter where I get connection timeout
> failures once load passes a threshold on our load generation machines.
> The error rate is 0.5-1% of the total number of transactions.
>
> Packet captures show JMeter sometimes not closing connections correctly and
> then re-using the port some time later, but before the other end has cleaned
> up the half-open connection.
> (I am also open to the possibility that this is a symptom rather than a
> cause.)
>
> Further details:
> In this particular test scenario, I am testing a load-balanced application
> that is fronted by a VIP hosted on an F5.
>
> From the trace mentioned above, at loads above some threshold (yet to be
> determined), when the F5 initiates the connection tear-down the JMeter
> machine responds with an ACK but never sends a FIN to close its side of the
> connection.
> This leaves a half-open connection on the F5 but a closed connection on the
> JMeter machine. Some time later, after the TIME_WAIT period (2 mins) has
> passed but before the idle-connection clean-up job runs on the F5 (5 mins),
> a connection is initiated on the same local port. The F5 sees this as
> further traffic on a connection it thought it had closed and re-sends its
> FIN. This confuses the JMeter side, which thinks it is standing up a new
> connection and re-sends its SYN. This repeats with exponential back-off
> (seven SYNs in total), after which the connection is reported as failing
> with a connection time of ~127 seconds.
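>
> (Worked arithmetic, assuming the Linux default of net.ipv4.tcp_syn_retries
> = 6 and an initial retransmission timeout of 1 second: seven SYNs go out
> with doubling waits between them, and 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127
> seconds, which matches the ~127 s connection time reported above.)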
>
> To prove this is the probable cause, I was able to temporarily get the
> clean-up timeout on the F5 reduced to 90 seconds. In this scenario, with
> the same applied load, we saw no connection timeout errors.
>
> The load machine is RedHat 7.7 running, as far as I am aware, the standard
> defaults (the host was built and maintained by a 3rd-party provider).
> I have tried JMeter 5.0 (the current version we use at work) and 5.2.1,
> again running vanilla here other than setting a large JVM heap size. No
> changes to HTTPClient config, JVM args etc.
> The test is triggered from Jenkins although I doubt this has any impact as
> I am just starting JMeter as I would on the command line. Jenkins just
> helps with pulling the results back through many jumps.
> I have tried the HotSpot 1.8.0_51 and OpenJDK 1.8.0_232 JVMs; both give
> the same behaviour.
> CPU, memory and GC activity all look low to normal during the test.
> The JMeter host is a virtual machine with 4 cpus and 16GB ram.
> And to discount issues with the host application, I have run the same load
> across multiple machines (on the same VLAN etc) and everything works as
> expected.
> During the test, at the point where the errors are being logged, there are
> still over 21k ephemeral ports available.
> The process is peaking at ~1500 threads, although I'm not sure if this is
> being reported correctly by RedHat. I was expecting around 6-7x that number,
> as I have configured the script to download embedded resources with 6
> threads.
> The machine is peaking at ~7500 open files, which roughly matches the number
> of open connections reported by ss, as expected.
>
> I see in the documentation for Apache HttpClient there is a section on
> connection management that seems to describe the behaviour displayed.
>
>
> http://hc.apache.org/httpcomponents-client-4.3.x/tutorial/html/connmgmt.html#d5e405
> "One of the major shortcomings of the classic blocking I/O model is that
> the network socket can react to I/O events only when blocked in an I/O
> operation. When a connection is released back to the manager, it can be
> kept alive however it is unable to monitor the status of the socket and
> react to any I/O events. If the connection gets closed on the server side,
> the client side connection is unable to detect the change in the connection
> state (and react appropriately by closing the socket on its end)."
>
> It suggests running a separate thread to periodically clean up the
> connection pools by calling
> HttpClientConnectionManager.closeExpiredConnections(). I can see this
> currently gets called, but only at the start of a new thread iteration.
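>
> For reference, here is a minimal sketch of the monitor thread the tutorial
> describes (the 5-second sweep and 30-second idle cut-off are illustrative
> values I picked, not anything JMeter configures):
>
>     import java.util.concurrent.TimeUnit;
>     import org.apache.http.conn.HttpClientConnectionManager;
>
>     // Periodically evicts expired and long-idle connections from the
>     // pool so that stale, half-closed sockets are not handed out again.
>     public class IdleConnectionMonitorThread extends Thread {
>         private final HttpClientConnectionManager connMgr;
>         private volatile boolean shutdown;
>
>         public IdleConnectionMonitorThread(HttpClientConnectionManager connMgr) {
>             this.connMgr = connMgr;
>             setDaemon(true);
>         }
>
>         @Override
>         public void run() {
>             try {
>                 while (!shutdown) {
>                     synchronized (this) {
>                         wait(5000);                 // sweep every 5 s
>                         connMgr.closeExpiredConnections();
>                         // close connections idle for more than 30 s
>                         connMgr.closeIdleConnections(30, TimeUnit.SECONDS);
>                     }
>                 }
>             } catch (InterruptedException ex) {
>                 // interrupted: fall through and terminate
>             }
>         }
>
>         public void shutdown() {
>             shutdown = true;
>             synchronized (this) {
>                 notifyAll();
>             }
>         }
>     }
>
> Whether something like this can be wired into the HttpClient instances
> JMeter manages is another question; I mention it mainly because the symptom
> matches what the tutorial describes.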
>
> This leads to some obvious questions:
>
> - Is this behaviour anyone has seen before?
> - Is this a bug with JMeter/HTTP Client?
> - Is this maybe a bug with Java, or potentially RedHat reusing ports too
> aggressively?
> - Any suggestions for a work-around or further diagnosis?
>
>
> Cheers,
> Owen
>
