Six or seven weeks ago, we built some new servers and started having sudden failures... Tomcat just stops with no error message, no system error messages, nothing that I have been able to find so far.
To refresh everyone's memory, this is a new server, a Dell T110 with a Xeon 3440 processor and 4GB memory. I have turned off both the turbo mode and hyperthreading.

The environment:

64 bit Slackware Linux

java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

Tomcat: apache-tomcat-6.0.20

These are the current JAVA_OPTS="-Xms1024m -Xmx1024m -XX:PermSize=368m -XX:MaxPermSize=368m"

I have observed the memory usage and general performance with Java VisualVM and have seen nothing strange. I thought I was seeing GC as memory usage was going up and down, but in fact it was mostly people coming onto the system and leaving it. After several hours, the memory settles to a baseline of about 375MB. Forced GC never takes it below that value, and the ups and downs from people coming onto and leaving the system also return it to pretty much that value. The maximum memory used was never above 700MB for the entire day.

The server runs well, idling along at 2-5% load, except for a quick spike during GC, serving JSPs, etc. at a reasonable speed. Without warning and with no tracks in any log (Tomcat or system) or on the console, Tomcat just shuts down. I can usually simply restart it, as the ports used by Tomcat are closed... today, I needed to run shutdown.sh before I could run startup.sh (startup.sh gave no errors but would not start Tomcat until I ran shutdown.sh, and that process put nothing in the logs... this is the first time this has happened).

Sometimes the system will run for a week, sometimes for only several hours, sometimes only for a few minutes. Today, it ran until about 1:00PM and has been down four times since then. The failure (Tomcat shutting down) is not always at the same place in the code (I have some debugging messages going to catalina.out). Load does not seem to make a difference.
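The heap and PermGen figures described above are the kind of readings the standard java.lang.management API exposes from inside the JVM. As a minimal sketch only (the class name MemorySnapshot and its helper methods are hypothetical and not part of the application), something like the following could print the same heap and non-heap numbers to catalina.out. On a Java 6 HotSpot VM the non-heap figure includes PermGen. It reports only JVM-managed memory, so it would not account for any native memory that top includes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of the application described in this post.
public class MemorySnapshot {

    /** Converts a byte count to whole megabytes (1024 * 1024 bytes), rounding down. */
    public static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Formats one MemoryUsage reading as "label: used=... committed=... max=...". */
    public static String describe(String label, MemoryUsage usage) {
        // getMax() returns -1 when no maximum is defined for the region.
        String max = usage.getMax() < 0 ? "undefined" : toMb(usage.getMax()) + "MB";
        return label + ": used=" + toMb(usage.getUsed()) + "MB"
                + " committed=" + toMb(usage.getCommitted()) + "MB"
                + " max=" + max;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        // Heap is the region bounded by -Xms/-Xmx.
        System.out.println(describe("heap", memory.getHeapMemoryUsage()));
        // Non-heap covers JVM-managed memory outside the heap.
        System.out.println(describe("non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```

Logged at intervals alongside the existing debugging messages, a line like this would leave a record of what the JVM itself believed it was using shortly before a silent stop.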
I have tried another server (Dell T105, AMD processor, 6GB memory) and have observed the same results. I have run memTest86 on the T110 for about 30 hours and it showed nothing. I rebuilt the T110 with SUSE Linux, Java 1.6.18 and Tomcat 6.0.24... it lasted 15 minutes.

I have used the same server.xml on all the installs:

<Server port="8005" shutdown="SHUTDOWN">

  <!--APR library loader. Documentation at /docs/apr.html -->
  <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
  <!--Initialize Jasper prior to webapps are loaded. Documentation at /docs/jasper-howto.html -->
  <Listener className="org.apache.catalina.core.JasperListener" />
  <!-- JMX Support for the Tomcat server. Documentation at /docs/non-existent.html -->
  <Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" />
  <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />

  <!-- Global JNDI resources
       Documentation at /docs/jndi-resources-howto.html -->
  <GlobalNamingResources>
    <!-- Editable user database that can also be used by
         UserDatabaseRealm to authenticate users -->
    <Resource name="UserDatabase" auth="Container"
              type="org.apache.catalina.UserDatabase"
              description="User database that can be updated and saved"
              factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
              pathname="conf/tomcat-users.xml" />
  </GlobalNamingResources>

  <!-- A "Service" is a collection of one or more "Connectors" that share
       a single "Container" Note: A "Service" is not itself a "Container",
       so you may not define subcomponents such as "Valves" at this level.
       Documentation at /docs/config/service.html -->
  <Service name="Catalina">

    <!--The connectors can use a shared executor, you can define one or more named thread pools-->
    <!--
    <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
              maxThreads="150" minSpareThreads="4"/>
    -->

    <!-- A "Connector" represents an endpoint by which requests are received
         and responses are returned.
         Documentation at :
         Java HTTP Connector: /docs/config/http.html (blocking & non-blocking)
         Java AJP Connector: /docs/config/ajp.html
         APR (HTTP/AJP) Connector: /docs/apr.html
         Define a non-SSL HTTP/1.1 Connector on port 8080 -->
    <Connector port="8080" protocol="HTTP/1.1"
               maxHttpHeaderSize="8192"
               maxThreads="600" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443"
               scheme="http" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true" />

    <!-- A "Connector" using the shared thread pool-->
    <!--
    <Connector executor="tomcatThreadPool"
               port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
    -->

    <!-- Define a SSL HTTP/1.1 Connector on port 8443
         This connector uses the JSSE configuration, when using APR, the
         connector should be using the OpenSSL style configuration
         described in the APR documentation -->
    <Connector port="8443" maxHttpHeaderSize="8192"
               maxThreads="600" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" disableUploadTimeout="true"
               acceptCount="100" scheme="https" secure="true"
               clientAuth="false" sslProtocol="TLS" SSLEnabled="true"
               keystoreFile="/usr/local/certs/tomcat_keystore.ks"
               keystorePass="jellybean"/>

    <!-- Define a SSL HTTP/1.1 Connector on port 443 -->
    <Connector port="443" maxHttpHeaderSize="8192"
               maxThreads="600" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" disableUploadTimeout="true"
               acceptCount="100" scheme="https" secure="true"
               clientAuth="false" sslProtocol="TLS" SSLEnabled="true"
               keystoreFile="/usr/local/certs/tomcat_keystore.ks"
               keystorePass="jellybean"/>

    <!--
    <Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
               maxThreads="150" scheme="https" secure="true"
               clientAuth="false" sslProtocol="TLS" />
    -->

    <!-- Define an AJP 1.3 Connector on port 8009 -->
    <Connector port="8009" enableLookups="false" redirectPort="443" protocol="AJP/1.3" />

    <!-- An Engine represents the entry point (within Catalina) that processes
         every request.
         The Engine implementation for Tomcat stand alone
         analyzes the HTTP headers included with the request, and passes them
         on to the appropriate Host (virtual host).
         Documentation at /docs/config/engine.html -->

    <!-- You should set jvmRoute to support load-balancing via AJP ie :
    <Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm1">
    -->
    <Engine name="Catalina" defaultHost="localhost">

      <!--For clustering, please take a look at documentation at:
          /docs/cluster-howto.html (simple how to)
          /docs/config/cluster.html (reference documentation) -->
      <!--
      <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/>
      -->

      <!-- The request dumper valve dumps useful debugging information about
           the request and response data received and sent by Tomcat.
           Documentation at: /docs/config/valve.html -->
      <!--
      <Valve className="org.apache.catalina.valves.RequestDumperValve"/>
      -->

      <!-- This Realm uses the UserDatabase configured in the global JNDI
           resources under the key "UserDatabase". Any edits
           that are performed against this UserDatabase are immediately
           available for use by the Realm. -->
      <Realm className="org.apache.catalina.realm.UserDatabaseRealm"
             resourceName="UserDatabase"/>

      <!-- Define the default virtual host
           Note: XML Schema validation will not work with Xerces 2.2. -->
      <Host name="localhost" appBase="webapps"
            unpackWARs="true" autoDeploy="true" deployOnStartup="true"
            xmlValidation="false" xmlNamespaceAware="false">

        <!-- SingleSignOn valve, share authentication between web applications
             Documentation at: /docs/config/valve.html -->
        <!--
        <Valve className="org.apache.catalina.authenticator.SingleSignOn" />
        -->

        <!-- Access log processes all example.
             Documentation at: /docs/config/valve.html -->
        <!--
        <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
               prefix="localhost_access_log."
               suffix=".txt" pattern="common" resolveHosts="false"/>
        -->

      </Host>
    </Engine>
  </Service>
</Server>

When Tomcat shuts down, the memory that it was using seems to still be held (as seen from top), but it is nowhere near the machine's physical memory.

The application has been running on an older server (Dell 600SC, 32 bit Slackware, 2GB memory) for several years and, while the application will throw exceptions now and then, it never crashed. This led me to believe the problem had something to do with the 64 bit JVM but, without seeing errors anywhere, I can't be certain and don't know what I can do about it except go back to 32 bit.

One time, I observed the heap and PermGen memory usage with VisualVM. It was running around 600MB before I forced a GC and 375MB afterward. Speed was good. Memory usage from top was 2.4GB. Five minutes later, Tomcat stopped, leaving no tracks that I could find. The memory usage from top was around 2.4GB. The memory usage from VisualVM was still showing 400MB+ although the Tomcat process was gone.

I restarted Tomcat (did not reboot), so Tomcat had been shut down gracefully enough to close the ports (8080, 8443, 443). Tomcat stayed up for less than an hour (under light load) and stopped again. The memory used according to top was less than 3GB but I didn't get the exact number. I restarted it again (no server reboot) and it ran for the rest of the night (light load), and top was showing 3.3GB for memory in the morning.

Anyone have any ideas how I might track this problem down?

Thanks,
Carl