Hello all. I am going to do my best to describe my problem. Hopefully someone will have some insight.

Environment: Tomcat 7.0.41 (working on updating that), Java 1.6 (working on getting this updated to the latest minor release), RHEL Linux.

I inherited a multi-tenant setup. Individual user accounts on the system each have their own Tomcat instance, and each instance is started using sysinit. This is done to keep each website in its own world of permissions so that one website can't interfere with another's data. There are two load-balanced Apache proxies at the edge that point to one Tomcat server (I know, I know, but again, I inherited this). Apache sits in front of Tomcat to terminate SSL and uses AJP to ProxyPass to each Tomcat instance based on the user's assigned port. Things have run fine for years (so I am being told, anyway) until recently.

Let me give an example of an outage. User 1, user 2 and user 3 all use unique databases on a shared database server, SQL server 10. User 4 runs on a Windows JBoss server and also has a database on shared database server 10. Users 5-50 all run on the Linux server mentioned above using Tomcat and have databases on various *other* shared database servers, but have nothing to do with database server 10.

User 4 had a stored proc go wild on database server 10, basically knocking it offline. Now, one would expect sites 1-4 to experience an interruption of service because they use a shared DBMS platform. However, every single site goes down.

I monitor the connections for each site with a custom tool. When this happens, the connections start stacking up across all the components, from the proxies all the way through the stack. Looking at the AJP connection pool threads for user 9 shows that this user has exhausted their AJP connection pool threads. They are maxed out at 300, yet that user doesn't have high activity at all. The CPU load, memory usage and traffic for everything except SQL server 10 are stable during the outage. The proxies consume more and more memory the longer the outage lasts, but that's expected as the connection counts stack up into the thousands.

After a short time, the Apache / SSL termination layer for every site starts throwing AJP timeout errors. Shortly after that, the edge proxies naturally start throwing timeout errors of their own.

I am only watching user 9, using a tool that gives me insight into what's going on via JMX metrics, but I suspect that once I get all the others instrumented I will see the same thing: maxed-out AJP connection pools. Aren't those supposed to be unique per user / JVM? Am I missing something in the docs?

Any assistance from the Tomcat gods is much appreciated.

Thanks in advance.
TCD
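
For anyone who wants to look at the same numbers on their own instances, here is a minimal sketch of reading connector thread pool usage over JMX. It is not the custom tool mentioned above. It assumes each Tomcat instance has remote JMX enabled, that its MBeans live in the default "Catalina" domain, and that the ThreadPool MBeans expose the currentThreadsBusy and maxThreads attributes. The class name AjpPoolProbe and the host:port argument are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: lists "busy/max" threads for every connector thread pool
// visible on one JMX connection (i.e. one Tomcat JVM).
public class AjpPoolProbe {

    static Map<String, String> snapshot(MBeanServerConnection conn) throws Exception {
        Map<String, String> result = new LinkedHashMap<String, String>();
        // Assumption: the instance uses the default "Catalina" JMX domain.
        ObjectName pattern = new ObjectName("Catalina:type=ThreadPool,*");
        for (ObjectName name : conn.queryNames(pattern, null)) {
            // Assumption: these attribute names are present on the ThreadPool MBean.
            Object busy = conn.getAttribute(name, "currentThreadsBusy");
            Object max = conn.getAttribute(name, "maxThreads");
            // The "name" key identifies the connector, e.g. an AJP connector on its port.
            result.put(name.getKeyProperty("name"), busy + "/" + max);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No target given: probe this JVM, which finds nothing unless it hosts Tomcat.
            System.out.println(snapshot(ManagementFactory.getPlatformMBeanServer()));
            return;
        }
        // args[0] is host:port of the instance's JMX listener (assumed to be enabled).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            System.out.println(snapshot(connector.getMBeanServerConnection()));
        } finally {
            connector.close();
        }
    }
}
```

Running it once per instance (one JMX port per user) during an outage would show whether every JVM reports its own pool as full, or whether only some of them do.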