Hi,

I'm encountering a strange problem when running our JMeter test plan.

I effectively get a halt to all JMeter activity for up to 60 seconds (a 
strangely round number) in the middle of the test. JMeter treats this as 
elapsed time when it processes the HTTP responses after the delay, so we get a 
spike in our response times graph (screenshot available here: 
https://imgur.com/ayKTzOU).

Test Description: A typical test contains 1000 users, over 2 hours, with a 15 
minute ramp time on requests. Randomised delays are present between all 
requests.

Platform Details from JMeter Log:
Version 5.1.1 r1855137
java.version=1.8.0_211
java.vm.name=Java HotSpot(TM) 64-Bit Server VM
os.name=Windows 10
os.arch=amd64
os.version=10.0
file.encoding=Cp1252
java.awt.headless=true
Max memory     =7654604800
Available Processors =4
Default Locale=English (EN)
JMeter  Locale=English (EN)

During this window of zero activity (in these examples between 11:38:44 and 
11:39:44), I have determined the following:


-          During this window there are zero HTTP requests being processed by 
the application servers.



-          After the delay, JMeter issues any requests whose delay timers have 
passed (no timers are longer than 1 or 2 seconds), causing a spike in requests.


-          Processor usage on the load source box where JMeter is running drops 
to almost zero during this period (you can see the perfmon capture for the test 
on the JMeter box here: https://imgur.com/vgVhB1X).



-          Disk queue length is also zero, which suggests it isn't a case of 
Windows blocking a process when it opens a file handle.


-          I've turned on GC Verbose logging, because my initial suspicion was 
GC STW pauses, but weirdly no GC happens at all during that period, the GC logs 
reports 60 seconds of application time, when normal time between GC is 1 second 
or less; here's an extract of the log:


Metaspace       used 53031K, capacity 83842K, committed 101364K, reserved 
1112064K
  class space    used 8006K, capacity 20674K, committed 38132K, reserved 
1048576K
}
[Times: user=0.19 sys=0.00, real=0.05 secs]
2019-06-17T11:38:44.243+0100: 1550.963: Total time for which application 
threads were stopped: 0.0541356 seconds, Stopping threads took: 0.0002249 
seconds
2019-06-17T11:38:44.246+0100: 1550.967: [GC concurrent-string-deduplication, 
2702.9K->1178.3K(1524.6K), avg 54.2%, 0.0038746 secs]
2019-06-17T11:39:44.846+0100: 1611.566: Application time: 60.6031019 seconds
{Heap before GC invocations=1196 (full 1):
garbage-first heap   total 7475200K, used 3054884K [0x0000000628400000, 
0x0000000628607210, 0x00000007f0800000)
  region size 2048K, 256 young (524288K), 32 survivors (65536K)
Metaspace       used 53616K, capacity 85276K, committed 102388K, reserved 
1114112K

-          If I turn on some of the debug logging in JMeter, (for the network 
activity), then JMeter does still write to the log during this period, but at a 
drastically reduced rate; for example, typically the 
PoolingHttpClientConnectionManager generates a 'Closing connections idle longer 
than 1 MICROSECONDS' log up to 10 times per second, but during this period it 
writes that message zero times.


-          Changing GC settings does have some effect on the frequency of this 
issue happening. With the default GC settings it happens quite often (multiple 
times per test). I'm currently running the following custom GC options, which 
reduces the frequency of the problem to about 50% probability of it happening 
once during a test:



-XX:+HeapDumpOnOutOfMemoryError

-Xms7299m

-Xmx7299m

-XX:MaxMetaspaceSize=256m

-XX:+UseG1GC

-XX:MaxGCPauseMillis=100

-XX:G1ReservePercent=20

-XX:+AlwaysPreTouch

-XX:+ParallelRefProcEnabled

-XX:+UseStringDeduplication

-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses

-XX:NewSize=512m

-XX:MaxNewSize=3g

-XX:NewRatio=2

-XX:MaxMetaspaceExpansion=64M



Any clues as to additional logging to further investigate, or JMeter settings 
to change, would be greatly appreciated.

Thanks,

Alistair

==================================================================
Alistair Evans
Chief Software Architect for Wholesale Banking
7 Layer Solutions Limited
Tel: +44 (0)1242 225520
Email: [email protected]<mailto:[email protected]>
Web: www.7layer.net<http://www.7layer.net/>
==================================================================


-- 
This email was Virus checked by UTM 9.

This email is confidential and intended solely for the use of the individual to 
whom it is addressed. Any views or opinions presented are solely those of the 
author and do not necessarily represent those of 7 Layer Solutions Limited. If 
you are not the intended recipient, be advised that you have received this 
email in error and that any use, dissemination, forwarding, printing or copying 
of this email is strictly prohibited. If you have received this email in error 
please contact the sender.

7 Layer Solutions Limited registered in England and Wales, company registration 
number 3817853.
Registered offices: 17, St George’s Road, Cheltenham, Gloucestershire, GL50 3DT 
United Kingdom

Reply via email to