Looks like an application bug, probably a deadlock.
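If it is a Java-level deadlock, a single thread dump of the stuck JVM should show it directly; jstack prints a "Found one Java-level deadlock" section when it detects one. A minimal check, assuming a JDK's jstack is on the PATH (<jmeter-pid> is a placeholder):

    jstack -l <jmeter-pid>    # -l adds ownable synchronizer (java.util.concurrent lock) details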
On Mon, 17 Jun 2019, 22:11 Felix Schumacher <felix.schumac...@internetallee.de> wrote:

> On 17.06.19 at 15:32, Alistair Evans wrote:
> > Hi,
> >
> > I'm encountering a strange problem when running our JMeter test plan.
> >
> > I effectively get a halt to all JMeter activity for up to 60 seconds (a
> > strangely round number) in the middle of the test. JMeter treats this as
> > elapsed time when it processes the HTTP responses after the delay, so we
> > get a spike in our response times graph (screenshot available here:
> > https://imgur.com/ayKTzOU).
>
> Are you sure it is JMeter that is hanging for 60 seconds? Have you had a
> look at the system under test? Maybe use another JMeter instance with
> less load to check whether you see the same 60-second hang.
>
> Is this halt reproducible? If so, take a few thread dumps at those times
> (on both the JMeter side and the server side).
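A minimal sketch of capturing those dumps periodically on the Windows load box, assuming PowerShell and a JDK's jstack on the PATH (the PID 12345 and the output file name are placeholders):

    # one dump every 5 seconds for a minute, appended to a single file
    1..12 | ForEach-Object { jstack -l 12345 >> jmeter-threaddumps.txt; Start-Sleep -Seconds 5 }

Comparing which threads sit in BLOCKED or WAITING across consecutive dumps should show whether JMeter itself is stuck on a lock or simply waiting on the server.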
> > Test Description: A typical test contains 1000 users, over 2 hours, with
> > a 15 minute ramp time on requests. Randomised delays are present between
> > all requests.
> >
> > Platform Details from JMeter Log:
> > Version 5.1.1 r1855137
> > java.version=1.8.0_211
> > java.vm.name=Java HotSpot(TM) 64-Bit Server VM
> > os.name=Windows 10
> > os.arch=amd64
> > os.version=10.0
> > file.encoding=Cp1252
> > java.awt.headless=true
> > Max memory =7654604800
> > Available Processors =4
> > Default Locale=English (EN)
> > JMeter Locale=English (EN)
> >
> > During this window of zero activity (in these examples between 11:38:44
> > and 11:39:44), I have determined the following:
> >
> > - During this window there are zero HTTP requests being processed by the
> > application servers.
>
> As written above, have a look at the state of the server, too.
>
> > - After the delay, JMeter issues any requests whose delay timers have
> > passed (no timers are longer than 1 or 2 seconds), causing a spike in
> > requests.
>
> That is expected; handling this kind of overload gracefully (and notifying
> the user about such an overload period) is a JMeter feature still waiting
> to be implemented.
>
> > - Processor usage on the load source box where JMeter is running drops
> > to almost zero during this period (you can see the perfmon capture for
> > the test on the JMeter box here: https://imgur.com/vgVhB1X).
>
> Not unusual; threads waiting on locks or on the server don't need much CPU.
>
> > - Disk queue length is also zero, which suggests it isn't a case of
> > Windows blocking a process when it opens a file handle.
> >
> > - I've turned on GC verbose logging, because my initial suspicion was GC
> > stop-the-world pauses, but weirdly no GC happens at all during that
> > period; the GC log reports 60 seconds of application time, when the
> > normal time between GCs is 1 second or less. Here's an extract of the log:
> >
> >   Metaspace used 53031K, capacity 83842K, committed 101364K, reserved 1112064K
> >   class space used 8006K, capacity 20674K, committed 38132K, reserved 1048576K
> >   }
> >   [Times: user=0.19 sys=0.00, real=0.05 secs]
> >   2019-06-17T11:38:44.243+0100: 1550.963: Total time for which application threads were stopped: 0.0541356 seconds, Stopping threads took: 0.0002249 seconds
> >   2019-06-17T11:38:44.246+0100: 1550.967: [GC concurrent-string-deduplication, 2702.9K->1178.3K(1524.6K), avg 54.2%, 0.0038746 secs]
> >   2019-06-17T11:39:44.846+0100: 1611.566: Application time: 60.6031019 seconds
> >   {Heap before GC invocations=1196 (full 1):
> >   garbage-first heap total 7475200K, used 3054884K [0x0000000628400000, 0x0000000628607210, 0x00000007f0800000)
> >   region size 2048K, 256 young (524288K), 32 survivors (65536K)
> >   Metaspace used 53616K, capacity 85276K, committed 102388K, reserved 1114112K
>
> Nice to know that GC -- or rather memory usage -- is not a problem here.
>
> > - If I turn on some of the debug logging in JMeter (for the network
> > activity), then JMeter does still write to the log during this period,
> > but at a drastically reduced rate; for example, typically the
> > PoolingHttpClientConnectionManager generates a 'Closing connections idle
> > longer than 1 MICROSECONDS' message up to 10 times per second, but
> > during this period it writes that message zero times.
>
> Probably because all connections are still in use, waiting for the server.
>
> > - Changing GC settings does have some effect on how often this issue
> > happens. With the default GC settings it happens quite often (multiple
> > times per test). I'm currently running the following custom GC options,
> > which reduce the frequency of the problem to about a 50% chance of it
> > happening once during a test:
> >
> >   -XX:+HeapDumpOnOutOfMemoryError
> >   -Xms7299m
> >   -Xmx7299m
> >   -XX:MaxMetaspaceSize=256m
> >   -XX:+UseG1GC
> >   -XX:MaxGCPauseMillis=100
> >   -XX:G1ReservePercent=20
> >   -XX:+AlwaysPreTouch
> >   -XX:+ParallelRefProcEnabled
> >   -XX:+UseStringDeduplication
> >   -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
> >   -XX:NewSize=512m
> >   -XX:MaxNewSize=3g
> >   -XX:NewRatio=2
> >   -XX:MaxMetaspaceExpansion=64M
>
> Wow, those are a lot of GC options. I tend to start without any GC options
> (except Xmx/Xms) and only add more if there are problems with GC.
>
> > Any clues as to additional logging to further investigate, or JMeter
> > settings to change, would be greatly appreciated.
>
> Take thread dumps when the halt is happening.
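On the additional-logging question: JMeter can raise the level of a single log category from the command line with -L, without editing log4j2.xml. A hedged sketch for turning on the HttpClient debug output discussed above (testplan.jmx is a placeholder):

    jmeter -n -t testplan.jmx -Lorg.apache.http=DEBUG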
> Felix
>
> > Thanks,
> >
> > Alistair
> >
> > ==================================================================
> > Alistair Evans
> > Chief Software Architect for Wholesale Banking
> > 7 Layer Solutions Limited
> > Tel: +44 (0)1242 225520
> > Email: alistair.ev...@7layer.net
> > Web: www.7layer.net
> > ==================================================================
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@jmeter.apache.org
> For additional commands, e-mail: user-h...@jmeter.apache.org