Hi,

our customer is running a cluster of Tomcat servlet engines that host our web application. The basic setup is

    Loadbalancer <---> Apache 1.3.x with mod_jk <---> Tomcat

with 2-3 Apache servers and >30 Tomcat instances bundled into clusters of 3-5 instances each. The Apache and Tomcat servers are running on recent SUN multi-core machines under Solaris.

The basic setup hasn't changed much over the past few years, apart from occasional software and hardware updates and a steadily increasing number of Tomcat instances. Currently, they're using Tomcat-5.5.26 on SUN's JDK 1.5.0_10 (64 bit) and mod_jk 1.2.28. We have seen the same situation over the years, going back to before Tomcat-5.5.12.

Most of the time, things work nicely. Occasionally, though, the whole system comes to a complete halt. A post-mortem thread dump shows all (!) worker threads on all instances waiting for input from the Apache servers, e.g.:

    "TP-Processor2432" daemon prio=10 tid=0x00b2f258 nid=0x9f1 runnable [0x7cfbf000..0x7cfbfa70]
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        - locked <0x95947c70> (a java.io.BufferedInputStream)
        at org.apache.jk.common.ChannelSocket.read(ChannelSocket.java:626)
        at org.apache.jk.common.ChannelSocket.receive(ChannelSocket.java:564)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:691)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:895)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
        at java.lang.Thread.run(Thread.java:595)

Due to the large number of machines involved and the high number of client requests, it is impossible to see how such a situation evolves. We have ruled out lengthy garbage collection pauses (the CMS collector is enabled). There is no obviously relevant information in the logfiles.
Usually, the situation can be resolved by restarting Apache and/or (some) Tomcat servers, which makes DoS attacks unlikely, IMO.

Has anyone seen this situation before? Any ideas what could be the problem, and how to resolve it? Any idea how to gain more information?

Thanks,
Peter

-- 
Peter Conrad
Tivano Software GmbH
Bahnhofstr. 18
63263 Neu-Isenburg
Tel: 06102 / 8099070
Fax: 06102 / 8099071
HRB 11680, AG Offenbach/Main
Geschäftsführer: Martin Apel