On 4 October 2017 15:17:25 BST, Mark Thomas <ma...@apache.org> wrote:
>On 04/10/17 13:51, TurboChargedDad . wrote:
>> Hello all..
>> I am going to do my best to describe my problem. Hopefully someone will
>> have some sort of insight.
>>
>> Tomcat 7.0.41 (working on updating that)
>> Java 1.6 (working on getting this updated to the latest minor release)
>> RHEL Linux
>>
>> I inherited a multi-tenant setup. Individual user accounts on the system
>> each have their own Tomcat instance, each is started using sysinit. This
>> is done to keep each website in its own permissible world so one website
>> can't interfere with another's data.
>>
>> There are two load balanced Apache proxies at the edge that point to one
>> Tomcat server (I know I know but again I inherited this)
>>
>> Apache lays over the top of Tomcat to terminate SSL and uses AJP to
>> proxypass to each Tomcat instance based on the user's assigned port.
>>
>> Things have run fine for years (so I am being told anyway) until recently.
>> Let me give an example of an outage.
>>
>> User1, user2 and user3 all use unique databases on a shared database
>> server, SQL server 10.
>>
>> User 4 runs on a Windows JBoss server and also has a database on shared
>> database server 10.
>>
>> Users 5-50 all run on the mentioned Linux server using Tomcat and have
>> databases on *other* various shared database servers but have nothing to
>> do with database server 10.
>>
>> User 4 had a stored proc go wild on database server 10, basically knocking
>> it offline.
>>
>> Now one would expect sites 1-4 to experience interruption of service
>> because they use a shared DBMS platform. However.
>>
>> Every single site goes down. I monitor the connections for each site with
>> a custom tool. When this happens, the connections start stacking up across
>> all the components.
>> (Proxies all the way through the stack)
>> Looking at the AJP connection pool threads for user 9 shows that user has
>> exhausted their AJP connection pool threads. They are maxed out at 300 yet
>> that user doesn't have high activity at all. The CPU load, memory usage and
>> traffic for everything except SQL server 10 is stable during this outage.
>> The proxies start consuming more and more memory the longer the outage
>> lasts but that's expected as the connection counts stack up into the
>> thousands. After a short time all the sites' Apache / SSL termination
>> layers start throwing AJP timeout errors. Shortly after that the edge
>> proxies will naturally also start throwing timeout errors of their own.
>>
>> I am only watching user 9 using a tool that allows me to have insight into
>> what's going on using JMX metrics but I suspect that once I get all the
>> others instrumented that I will see the same thing. Maxed out AJP
>> connection pools.
>>
>> Aren't those supposed to be unique per user / JVM? Am I missing something
>> in the docs?
>>
>> Any assistance from the Tomcat gods is much appreciated.
>
>TL;DR - Try switching to the NIO AJP connector on Tomcat.
>
>Take a look at this session I just uploaded from TomcatCon London last
>week. You probably want to start around 35:00 and the topic of thread
>exhaustion.
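To make the thread exhaustion point concrete, here is a toy model (Python,
not Tomcat code) of how a low-traffic instance can still hit its 300-thread
ceiling. It assumes the instances run Tomcat 7's default blocking (BIO) AJP
connector, which the original post does not confirm. That connector ties up
one thread per open connection, even an idle one, whereas the NIO connector
only needs a thread for a connection that has a request to process. The
function name and the figure of 350 open connections are made up for
illustration; 300 is the pool limit reported above.

```python
def threads_in_use(open_connections, active_requests, max_threads, blocking):
    """Toy model of a connector's thread pool (hypothetical helper).

    Returns (busy_threads, waiting), where `waiting` is the number of
    connections or requests that cannot get a thread.
    """
    if active_requests > open_connections:
        raise ValueError("active requests cannot exceed open connections")

    if blocking:
        # Thread per connection: every open connection pins a thread,
        # whether or not a request is currently in flight on it.
        demand = open_connections
    else:
        # Non-blocking: idle connections are parked on a poller, so only
        # connections with a request in flight need a worker thread.
        demand = active_requests

    busy = min(demand, max_threads)
    return busy, demand - busy


if __name__ == "__main__":
    # Assumed scenario: the proxies hold 350 persistent AJP connections to
    # one instance, but only 5 of them carry a request right now.
    print("blocking    :", threads_in_use(350, 5, 300, blocking=True))
    print("non-blocking:", threads_in_use(350, 5, 300, blocking=False))
```

In this model the blocking case pins all 300 threads with 50 connections
left waiting, while the non-blocking case uses 5 threads. If the BIO
assumption holds for your instances, the NIO implementation is selected in
Tomcat 7 by setting protocol="org.apache.coyote.ajp.AjpNioProtocol" on the
AJP Connector in each instance's server.xml.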
Whoops. Here is the link.

https://youtu.be/2QYWp1k5QQM

Mark

>
>HTH,
>
>Mark
>
>P.S. The other sessions we have are on the way. I plan to update the
>site and post links once I have them all uploaded.
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>For additional commands, e-mail: users-h...@tomcat.apache.org