>>The problem: periodically Tomcat stops serving up servlets and JSP from my
>>webapps.  However, the manager webapp and static pages continue to be
>>served normally.  If I click 'stop' for any of my webapps from the manager
>>app, the page load just hangs forever.  If you refresh the manager, it
>>shows the webapp as stopped.

>>You would think that I could just stop Tomcat with the
>>$CATALINA_HOME/bin/shutdown.sh script, but that doesn't seem to do anything
>>at all.  Generally I have to 'kill' the pid to get Tomcat to stop,
>
> Most likely you have non-daemon threads started up by web applications.
 
This turned out to be the case, but for a funky reason.  Using the AccessLogValve, we
narrowed the problem down to a few JSPs.  These JSPs get their data from a different
database than all the others, via a PostgreSQL JDBC driver (data embedded in business
objects, etc.).  On the PostgreSQL server, one drive in the hardware RAID mirror set that
holds the OS (also Red Hat Linux 8) had started to degrade; the data disks were fine.
The drive hadn't failed outright, but it was wreaking havoc in the Linux kernel;
apparently the RAID controller's firmware and the kernel were not getting along.
Whenever a new Connection to the database was attempted (which means read/write activity
on the OS disks, presumably to start a new backend process), Tomcat would hang as
described above.
 
It turned out that the SCSI backplane was the real problem, and it had to be replaced.
Since getting this server back up was the main priority, I didn't get to do more Java
debugging to see exactly what was causing Tomcat to hang.  My best hypothesis is that
the JDBC driver was the root cause, in that it didn't time out correctly when errors
occurred while creating the Connection.  I have since re-coded our ConnectionPool so
that it abandons the Connection-creation thread after 5 seconds (i.e. Thread.join(5000)).
There is undoubtedly still the possibility of a thread and memory leak here, since an
abandoned thread that never returns is never reclaimed, but that's the least of my
concerns at the moment.  I also plan to report this to the PostgreSQL JDBC list to see
whether it might be a bug in the driver.
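A minimal sketch of the timeout approach described above, assuming the pool opens each
Connection on a helper thread and waits for it with Thread.join(timeout).  The names
`BoundedConnect` and `createWithTimeout` are hypothetical, not the actual ConnectionPool
code, and a simulated hanging task stands in for a stuck `DriverManager.getConnection`
call.  The worker is marked as a daemon so that an abandoned thread cannot keep the JVM
alive, i.e. it is not the kind of non-daemon thread that can stop shutdown.sh from working.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class BoundedConnect {

    /**
     * Runs a blocking task (e.g. opening a JDBC Connection) on a helper
     * thread and stops waiting after timeoutMillis.
     */
    public static <T> T createWithTimeout(Callable<T> task, long timeoutMillis)
            throws Exception {
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<Exception> failure = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            try {
                result.set(task.call());
            } catch (Exception e) {
                failure.set(e);
            }
        }, "connection-creator");
        // Daemon: an abandoned worker must not keep the JVM alive at shutdown.
        worker.setDaemon(true);
        worker.start();

        // Wait at most timeoutMillis for the worker to finish.
        worker.join(timeoutMillis);

        if (worker.isAlive()) {
            // Give up. interrupt() is only a request; a thread stuck in
            // socket I/O may ignore it and keep running (a possible leak).
            worker.interrupt();
            throw new TimeoutException(
                    "task did not finish within " + timeoutMillis + " ms");
        }
        if (failure.get() != null) {
            throw failure.get(); // the task itself failed; pass its error on
        }
        return result.get();
    }

    public static void main(String[] args) throws Exception {
        // Simulated connection attempt that hangs, standing in for a stuck
        // DriverManager.getConnection(...) call.
        Callable<String> hanging = () -> {
            Thread.sleep(60_000);
            return "never";
        };
        try {
            createWithTimeout(hanging, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }

        // A task that completes normally returns its value.
        String conn = createWithTimeout(() -> "connected", 5_000);
        System.out.println(conn);
    }
}
```

Because `interrupt()` does not reliably unblock a thread stuck in socket I/O, the
abandoned thread (and any Connection it eventually returns) can still leak, which is the
leak mentioned above.  Where the driver honors it, `java.sql.DriverManager.setLoginTimeout(int seconds)`
is the standard JDBC way to bound connection attempts without an extra thread.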
 
>Turn on debug="99" in server.xml (wherever you see debug="0").  Same
>thing for the servlets defined in $CATALINA_HOME/conf/web.xml.
 
Wow... that's a lot of logging on a production machine.  It was very hard to find
anything useful in all the clutter, but by using debug="99" selectively on certain
containers it was at least manageable.

> Are you running with a security manager?  Actually I was going to ask this at
> the beginning: is this happening even when only the tomcat webapps
> (admin, manager, docs, examples) are installed?  Is it one of your
> webapps causing this behavior or one of tomcat's?

No security manager.  I removed all the Tomcat webapps except manager, and it made no
difference.  My real question is: why does this runaway thread break almost everything
in Tomcat?  It doesn't seem to consume many resources.  If a single execution thread
(from a servlet or JSP) hangs indefinitely, why does it bring down the whole servlet
container?  Or rather, part of the container, since static pages and the manager app
kept working fine even when things were bad.
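One plausible mechanism, offered only as a hypothesis since the ConnectionPool code isn't
shown, is lock contention: if the pool's `getConnection()` is `synchronized` and one
request hangs inside it, every later request that needs the pool blocks on the same
monitor, and each of those requests also holds a request-processing thread while it
waits.  Pages that never touch the pool, such as static content and the manager app,
keep working.  The sketch uses a hypothetical `SynchronizedPool` with a latch standing in
for the hung driver call; it illustrates the idea and is not a diagnosis.

```java
import java.util.concurrent.CountDownLatch;

public class StuckPoolDemo {

    // Hypothetical pool whose getConnection() holds the pool's monitor
    // for the whole (possibly endless) connection attempt.
    static class SynchronizedPool {
        final CountDownLatch entered = new CountDownLatch(1);
        private final CountDownLatch release = new CountDownLatch(1);

        synchronized String getConnection() throws InterruptedException {
            entered.countDown();
            release.await(); // stands in for a driver call that never returns
            return "connection";
        }

        void unblock() {
            release.countDown();
        }
    }

    // Polls until t reaches the given state; false if timeoutMillis elapses first.
    static boolean waitForState(Thread t, Thread.State state, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (t.getState() == state) {
                return true;
            }
            Thread.sleep(10);
        }
        return false;
    }

    /** Returns true if a second caller ends up BLOCKED behind the stuck first one. */
    public static boolean secondCallerBlocks() throws InterruptedException {
        SynchronizedPool pool = new SynchronizedPool();
        Runnable jspRequest = () -> {
            try {
                pool.getConnection();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread first = new Thread(jspRequest, "request-1");
        first.setDaemon(true);
        first.start();
        pool.entered.await(); // request-1 now holds the monitor and is stuck

        Thread second = new Thread(jspRequest, "request-2");
        second.setDaemon(true);
        second.start();
        boolean blocked = waitForState(second, Thread.State.BLOCKED, 2_000);

        // Work that never touches the pool (like a static page) is unaffected.
        Thread staticPage = new Thread(() -> { }, "static-page");
        staticPage.start();
        staticPage.join(1_000);
        System.out.println("static page finished: " + !staticPage.isAlive());

        pool.unblock(); // let both requests complete so the demo cleans up
        first.join();
        second.join();
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("request-2 blocked on pool monitor: " + secondCallerBlocks());
    }
}
```

A thread dump taken while the problem is happening (`kill -QUIT <pid>` on Unix prints
one to Tomcat's stdout log) would show whether request threads are all BLOCKED on the
same lock, which would confirm or rule out this explanation.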
 
On a related note, how do you know when your load exceeds your server's capacity?  I'm
happy to set up load balancing if need be, but I have no way to detect the need other
than anecdotal "slowness".
 
Thanks for your insightful questions and comments.
 
Roman
