Hello,
We’re running several instances of Tomcat 8.5.20 / JDK 8u144 / CentOS 7 at our 
company for various web sites on many hosts. Recently I’ve been trying to 
understand a performance problem we’re having on our e-commerce web site.
The configuration is the following:

HAProxy   <--> 2x Tomcat 8.5.20 <-->  JBoss 5.1 EJB <--> Postgres 9.6

Tomcat runs a web site built with Struts / Freemarker that calls JBoss EJBs 
over RMI.
Monitoring a specific task (adding a product to the cart), I see the following:

- With a freshly started Tomcat instance, the task takes around 0.8 
seconds. Most of that time is spent in the two RMI calls the task makes.
- With an instance that has been running for some time, the task can take 2-3 
seconds, occasionally 5-6 seconds. Most of the time is still spent in the RMI 
calls, i.e. it is the RMI calls that slow down.
- Restarting the JVM fixes the issue.
- ***It seems*** (I’m still testing this, since there appears to be no 
‘Metaspace GC trigger’ command available) that when Metaspace is garbage 
collected, Tomcat then performs like a fresh instance.
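
One way to test the Metaspace observation above is to sample the pool from inside the JVM, request a collection, and sample again. The following is only a minimal sketch under assumptions: the class name `MetaspaceProbe` and the helper `poolUsedBytes` are hypothetical, the pool name "Metaspace" is what HotSpot on Java 8 reports, and the code has to run inside the Tomcat JVM itself (for example from a diagnostic servlet) to say anything about that instance. `System.gc()` is only a request; on HotSpot it normally leads to a full collection, which also unloads classes, unless -XX:+DisableExplicitGC is set.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Tomcat or the application described here.
class MetaspaceProbe {

    /** Returns the bytes used by the named memory pool, or -1 if it is not available. */
    static long poolUsedBytes(String poolName) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().equals(poolName)) {
                MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // "Metaspace" is the pool name assumed here (HotSpot, Java 8 and later).
        long before = poolUsedBytes("Metaspace");
        System.gc(); // a request only; the JVM may ignore it
        long after = poolUsedBytes("Metaspace");
        System.out.printf("Metaspace used: before=%d bytes, after=%d bytes%n", before, after);
    }
}
```

Timing the cart task right before and right after such a collection would show whether the slowdown really follows Metaspace occupancy.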

Since we’re using more than one Tomcat instance (2 in production for this 
website, 1 for development), I can see that the issue is isolated to Tomcat or 
the JVM/host where it runs, because other Tomcat instances behave well at the 
same time one is slow. The same JBoss/Postgres backend is used by Java batches 
and fat clients, and it works well with consistent times.

To clarify: at the moment when a production Tomcat that has been running for 
some time finishes the task in 3 seconds, the development Tomcat or a freshly 
started production Tomcat instance does the same task in less than one second. 
Note that repeating the task always gives consistent results, i.e. the instance 
that has been running for some time is always slow, and the freshly started 
instance is always fast.
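
For reference, the per-call timings could be collected with a small wrapper like the one below. This is a sketch, not the code used on the site: `CallTimer`, `Timed` and `time` are hypothetical names, and the lambda in `main` is only a stand-in for a real EJB call. It uses `Supplier`, so a remote call that throws checked exceptions would need to be wrapped first.

```java
import java.util.function.Supplier;

// Hypothetical timing helper for wrapping a single remote call.
class CallTimer {

    /** Result of a timed call: the returned value plus the elapsed wall-clock time. */
    static final class Timed<T> {
        final T value;
        final long elapsedNanos;

        Timed(T value, long elapsedNanos) {
            this.value = value;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Runs the call once and measures it with the monotonic System.nanoTime() clock. */
    static <T> Timed<T> time(Supplier<T> call) {
        long start = System.nanoTime();
        T value = call.get();
        return new Timed<>(value, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in for an RMI/EJB invocation such as adding a product to the cart.
        Timed<String> timed = time(() -> "added");
        System.out.printf("result=%s elapsed=%.3f ms%n", timed.value, timed.elapsedNanos / 1e6);
    }
}
```

Logging the elapsed time together with the JVM uptime makes it easier to compare a long-running instance with a fresh one.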

Tomcat is running with these VM options: 

-Xms20480m -Xmx20480m -Dsun.rmi.dgc.client.gcInterval=3600000 
-Dsun.rmi.dgc.server.gcInterval=3600000 -Xloggc:/tmp/gc.txt 
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9012 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps -XX:+UseG1GC -XX:ReservedCodeCacheSize=1g 
-XX:InitialCodeCacheSize=256m -XX:+UseHugeTLBFS -XX:MetaspaceSize=1g 
-XX:MaxMetaspaceSize=2g

Some of the options were added recently (for example the increase in code 
cache size), but they seem to have had no impact on the issue.

Metaspace goes up to 1.6 GB before being collected. The value after garbage 
collection is around 200 MB. Heap usage is variable; it usually stays under 
10 GB and is around 1 GB after garbage collection.
CPU usage rarely goes over 10%. Loaded classes are between 20k and 40k. Active 
sessions are around 100-120 for each instance.
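
The class counts can also be read programmatically through the standard `ClassLoadingMXBean`, which additionally reports how many classes have been unloaded so far. A minimal sketch follows; `ClassCountProbe` and `snapshot` are hypothetical names, and the code must run inside the JVM being examined.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that summarizes class loading activity of the current JVM.
class ClassCountProbe {

    /** Returns currently loaded, total loaded and unloaded class counts as one line. */
    static String snapshot() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d totalLoaded=%d unloaded=%d",
                classes.getLoadedClassCount(),
                classes.getTotalLoadedClassCount(),
                classes.getUnloadedClassCount());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Comparing snapshots taken while an instance is fast and while it is slow would show whether the slowdown tracks the number of loaded classes.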

Any help or direction to understand what’s causing this is greatly appreciated.
Thank you
--
Ing. Andrea Vettori
