Thanks to Tim Lucia I found the solution to my problem. In case anyone else runs into it, here's the answer.

Problem Briefly:
Is there any way to figure out what Tomcat is doing, or trying to do, when it hangs and does not respond to any http or https request?

Problem Details:
I am running Tomcat 5.1.12 on Redhat 9 on a 4-processor server. I get frequent but random Tomcat hangs. It has not happened on a 1-processor system, with either Linux or Windows. I can force the hang fairly reliably by bombarding the server with http requests (several per second). According to the logs, it happens after the end of processing one request and before the beginning of the next. It is apparently not within application code, unless it's a finalizer. I have run a higher-priority daemon thread in the same JVM that just writes the time to a log file, and it hangs at the same time, so it could be the JVM itself that's hanging, or whatever does the real threading underneath it. Mostly, but not always, 'top' shows the 'java' process using 99.9% of CPU, and 2 of the 4 processors at about 40%. I can kill the java process with 'kill -9', but I can't figure out what it was stuck doing. Any suggestions?

Answer:
The Linux command 'kill -QUIT <pid>' dumps the state of the JVM (a stack trace for every thread) to catalina.out, which shows, for example, where you are in your code if it is in an infinite loop or a wait-deadlock. kill -QUIT does not actually stop Tomcat.

You find the pid of Tomcat to use in 'kill -QUIT <pid>' with the command 'ps -ef | grep java', which gives output like this:

    root 30625 1 0 Jan22 ? 00:10:00 /pgm/java/bin/java -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/data/tomcat/conf/logging.properties -Djava.endorsed.dirs=/pgm/tomcat/common/endorsed -classpath :/pgm/tomcat/bin/bootstrap.jar:/pgm/tomcat/bin/commons-logging-api.jar -Dcatalina.base=/data/tomcat -Dcatalina.home=/pgm/tomcat -Djava.io.tmpdir=/data/tomcat/temp org.apache.catalina.startup.Bootstrap start
    root 11354 11056 0 08:30 pts/1 00:00:00 grep java

The pid is 30625 in this case, so the command is 'kill -QUIT 30625'.

If kill -QUIT does not write anything to catalina.out, the JVM itself is hung. This was my problem, and the cause was a kernel SMP threading bug. I switched from Redhat 9 (2.4.20 kernel) to Fedora Core 4 (2.6.11-1.1369_FC4smp kernel) and have now run for 48 hours without a hang.

Changing LD_ASSUME_KERNEL also made a difference. See the Tomcat release notes ...

    #GLIBC 2.2 / Linux 2.4 users should define an environment variable:
    #export LD_ASSUME_KERNEL=2.2.5
    #
    #Redhat Linux 9.0 users should use the following setting to avoid
    #stability problems:
    #export LD_ASSUME_KERNEL=2.4.1

On Redhat 9 running on the 4-way SMP, LD_ASSUME_KERNEL=2.2.5, or nothing at all, seemed to be more stable than the recommended LD_ASSUME_KERNEL=2.4.1. I am currently running Fedora Core 4 with LD_ASSUME_KERNEL=2.2.5 and it seems to be stable.

Dave
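
P.S. For anyone who wants to script the kill -QUIT check described above, here is a minimal sketch rather than a definitive tool. The function names, the 5 second default wait, and the return codes are all made up for illustration; only 'ps -ef', 'kill -QUIT' and the Bootstrap class name come from the post. It assumes that "the log file got bigger" is a good enough sign that the dump was written, which is only roughly true because any other log message in the same interval also counts.

```shell
#!/bin/sh
# Sketch only: function names, the default wait, and the return codes are
# illustrative assumptions, not part of Tomcat or of the procedure above.

# Read 'ps -ef' output on stdin and print the pid (column 2) of every line
# that mentions the Tomcat bootstrap class. Because the dots are escaped,
# awk's own command line (which contains the backslashes) does not match.
extract_tomcat_pids() {
    awk '/org\.apache\.catalina\.startup\.Bootstrap/ { print $2 }'
}

# Print the pid(s) of running Tomcat JVMs.
find_tomcat_pids() {
    ps -ef | extract_tomcat_pids
}

# Print the size of a file in bytes as a plain integer.
log_size() {
    echo $(( $(wc -c < "$1") ))
}

# Send SIGQUIT to a JVM and check whether its log grew afterwards.
#   $1 = pid, $2 = path to catalina.out, $3 = seconds to wait (default 5)
# Returns 0 if the log grew (something, presumably the dump, was written),
#         1 if it did not grow (the JVM itself may be hung),
#         2 if the signal could not be sent.
request_thread_dump() {
    _td_pid=$1
    _td_log=$2
    _td_wait=${3:-5}
    _td_before=$(log_size "$_td_log")
    kill -QUIT "$_td_pid" || return 2
    sleep "$_td_wait"
    _td_after=$(log_size "$_td_log")
    [ "$_td_after" -gt "$_td_before" ]
}
```

With a single Tomcat running, usage might look like: request_thread_dump "$(find_tomcat_pids)" /data/tomcat/logs/catalina.out || echo "no dump written, the JVM itself may be hung". The log path here is an assumption based on catalina.base=/data/tomcat and the default logs directory; adjust it for your install.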