Hi,

our customer is running a cluster of Tomcat servlet engines that host
our web application. The basic setup is

Loadbalancer <---> Apache 1.3.x with mod_jk <---> Tomcat

with 2-3 Apache servers and >30 Tomcat instances bundled into clusters
of 3-5 instances each. Apache + Tomcat servers are running on recent
Sun multi-core machines under Solaris. The basic setup hasn't changed
much over the past few years, apart from occasional software and
hardware updates and a steadily increasing number of Tomcat instances.

Currently, they're using Tomcat 5.5.26 on Sun's JDK 1.5.0_10 (64 bit)
and mod_jk 1.2.28. We have seen the same problem with earlier versions
over the years, going back to before Tomcat 5.5.12.

Most of the time, things work nicely. Occasionally, though, the whole
system comes to a complete halt. A post-mortem thread dump shows all (!)
worker threads on all instances waiting for input from the Apache servers,
e.g.:

"TP-Processor2432" daemon prio=10 tid=0x00b2f258 nid=0x9f1 runnable [0x7cfbf000..0x7cfbfa70]
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        - locked <0x95947c70> (a java.io.BufferedInputStream)
        at org.apache.jk.common.ChannelSocket.read(ChannelSocket.java:626)
        at org.apache.jk.common.ChannelSocket.receive(ChannelSocket.java:564)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:691)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:895)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
        at java.lang.Thread.run(Thread.java:595)

Due to the large number of machines involved and the high number of client
requests, we have not been able to observe how such a situation develops.
We have ruled out lengthy garbage collection pauses (CMS collector is
enabled). There is no obviously relevant information in the log files.
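
To get at least a rough picture of how the stall builds up, thread dumps
taken at intervals (kill -QUIT on the Tomcat JVM) could be summarised with
something like the following. This is only a minimal sketch: the class name
ThreadDumpSummary and its methods are made up for illustration, and it
assumes the dump format shown above (thread header lines starting with a
double quote, worker threads named "TP-Processor...").

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/** Hypothetical helper: counts AJP worker threads that sit in a socket read. */
public class ThreadDumpSummary {

    /**
     * Returns { number of TP-Processor threads,
     *           number of those blocked in ChannelSocket.receive -> socketRead0 }.
     */
    public static int[] summarize(String dump) {
        int workers = 0;
        int waiting = 0;
        boolean inWorker = false;
        boolean sawSocketRead = false;
        boolean sawReceive = false;

        for (String line : dump.split("\r?\n")) {
            if (line.startsWith("\"")) {
                // A new thread header: close out the previous thread first.
                if (inWorker && sawSocketRead && sawReceive) {
                    waiting++;
                }
                inWorker = line.startsWith("\"TP-Processor");
                if (inWorker) {
                    workers++;
                }
                sawSocketRead = false;
                sawReceive = false;
            } else if (inWorker) {
                if (line.contains("java.net.SocketInputStream.socketRead0")) {
                    sawSocketRead = true;
                }
                if (line.contains("org.apache.jk.common.ChannelSocket.receive")) {
                    sawReceive = true;
                }
            }
        }
        // Account for the last thread in the dump.
        if (inWorker && sawSocketRead && sawReceive) {
            waiting++;
        }
        return new int[] { workers, waiting };
    }

    /** Reads one thread dump from stdin and prints a one-line summary. */
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
        }
        int[] r = summarize(sb.toString());
        System.out.println(r[1] + " of " + r[0]
                + " TP-Processor threads are waiting for AJP input");
    }
}
```

Run against dumps from each instance every few minutes, the ratio should
show whether the workers fill up gradually or all at once. Note that an
idle worker holding a persistent mod_jk connection shows the same stack, so
a high count is only suspicious when requests are not being served.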

Usually, the situation can be resolved by restarting Apache and/or
(some) Tomcat servers, which makes a DoS attack unlikely, IMO.

Has anyone seen this situation before? Any ideas what could be the
problem, and how to resolve it? Any idea how to gain more information?

Thanks,
        Peter
-- 
Peter Conrad
Tivano Software GmbH
Bahnhofstr. 18
63263 Neu-Isenburg
Tel: 06102 / 8099070
Fax: 06102 / 8099071
HRB 11680, AG Offenbach/Main
Geschäftsführer: Martin Apel
