Hi, I'm encountering a strange problem when running our JMeter test plan.
I effectively get a halt to all JMeter activity for up to 60 seconds (a strangely round number) in the middle of the test. JMeter treats this as elapsed time when it processes the HTTP responses after the delay, so we get a spike in our response times graph (screenshot available here: https://imgur.com/ayKTzOU). Test Description: A typical test contains 1000 users, over 2 hours, with a 15 minute ramp time on requests. Randomised delays are present between all requests. Platform Details from JMeter Log: Version 5.1.1 r1855137 java.version=1.8.0_211 java.vm.name=Java HotSpot(TM) 64-Bit Server VM os.name=Windows 10 os.arch=amd64 os.version=10.0 file.encoding=Cp1252 java.awt.headless=true Max memory =7654604800 Available Processors =4 Default Locale=English (EN) JMeter Locale=English (EN) During this window of zero activity (in these examples between 11:38:44 and 11:39:44), I have determined the following: - During this window there are zero HTTP requests being processed by the application servers. - After the delay, JMeter issues any requests whose delay timers have passed (no timers are longer than 1 or 2 seconds), causing a spike in requests. - Processor usage on the load source box where JMeter is running drops to almost zero during this period (you can see the perfmon capture for the test on the JMeter box here: https://imgur.com/vgVhB1X). - Disk queue length is also zero, which suggests it isn't a case of Windows blocking a process when it opens a file handle. - I've turned on GC Verbose logging, because my initial suspicion was GC STW pauses, but weirdly no GC happens at all during that period, the GC logs reports 60 seconds of application time, when normal time between GC is 1 second or less; here's an extract of the log: Metaspace used 53031K, capacity 83842K, committed 101364K, reserved 1112064K class space used 8006K, capacity 20674K, committed 38132K, reserved 1048576K } [Times: user=0.19 sys=0.00, real=0.05 secs] 2019-06-17T11:38:44.243+0100: 1550.963: Total time for which application threads were stopped: 0.0541356 seconds, Stopping threads took: 0.0002249 seconds 2019-06-17T11:38:44.246+0100: 1550.967: [GC concurrent-string-deduplication, 2702.9K->1178.3K(1524.6K), avg 54.2%, 0.0038746 secs] 2019-06-17T11:39:44.846+0100: 1611.566: Application time: 60.6031019 seconds {Heap before GC invocations=1196 (full 1): garbage-first heap total 7475200K, used 3054884K [0x0000000628400000, 0x0000000628607210, 0x00000007f0800000) region size 2048K, 256 young (524288K), 32 survivors (65536K) Metaspace used 53616K, capacity 85276K, committed 102388K, reserved 1114112K - If I turn on some of the debug logging in JMeter, (for the network activity), then JMeter does still write to the log during this period, but at a drastically reduced rate; for example, typically the PoolingHttpClientConnectionManager generates a 'Closing connections idle longer than 1 MICROSECONDS' log up to 10 times per second, but during this period it writes that message zero times. - Changing GC settings does have some effect on the frequency of this issue happening. With the default GC settings it happens quite often (multiple times per test). I'm currently running the following custom GC options, which reduces the frequency of the problem to about 50% probability of it happening once during a test: -XX:+HeapDumpOnOutOfMemoryError -Xms7299m -Xmx7299m -XX:MaxMetaspaceSize=256m -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:G1ReservePercent=20 -XX:+AlwaysPreTouch -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses -XX:NewSize=512m -XX:MaxNewSize=3g -XX:NewRatio=2 -XX:MaxMetaspaceExpansion=64M Any clues as to additional logging to further investigate, or JMeter settings to change, would be greatly appreciated. Thanks, Alistair ================================================================== Alistair Evans Chief Software Architect for Wholesale Banking 7 Layer Solutions Limited Tel: +44 (0)1242 225520 Email: [email protected]<mailto:[email protected]> Web: www.7layer.net<http://www.7layer.net/> ================================================================== -- This email was Virus checked by UTM 9. This email is confidential and intended solely for the use of the individual to whom it is addressed. Any views or opinions presented are solely those of the author and do not necessarily represent those of 7 Layer Solutions Limited. If you are not the intended recipient, be advised that you have received this email in error and that any use, dissemination, forwarding, printing or copying of this email is strictly prohibited. If you have received this email in error please contact the sender. 7 Layer Solutions Limited registered in England and Wales, company registration number 3817853. Registered offices: 17, St George’s Road, Cheltenham, Gloucestershire, GL50 3DT United Kingdom
