Taylan,
> The failures we've seen occurred after anywhere between 8 hours and a
> week of runtime.
The timing of the failures seems similar.
> We have also had failures with hotspot error files (hs_err) present, and
> the cause specified was indeed SIGSEGV indicating a page fault.
I have never seen any hs_* files but have seen core files where strace
showed the JVM stopped on a seg fault.
> We also use JDK 1.6.0_18; I'm downgrading the machines to 1.6.0_16 when
> the situation allows (during regular updates of the application, or a
> crash) to see if that helps.
I have used JDK 1.6.0_17 and 1.6.0_18 with the same results... have not
tried 1.6.0_16. Please post the results of this trial.
> Running tomcat in the
> foreground might show something, but then again I could be waiting for a
> month for it to happen.
Yes, this has been part of my problem: any time we change something, we
have to wait a week for the server to fail.
In one sense, I am fortunate that I have a little more flexibility than you.
I have two servers (different hardware) but only need one in service at a
time. Therefore, I always have one server I can test ideas on although I
have never been able to develop a meaningful stress test, i.e., the only way
I can test a change is to put it in production.
Thanks,
Carl
----- Original Message -----
From: "Taylan Develioglu"<tdevelio...@ebuddy.com>
To: "Tomcat Users List"<users@tomcat.apache.org>
Sent: Wednesday, February 24, 2010 8:31 AM
Subject: Re: jvm exits without trace
Hello Carl,
The failures we've seen occurred after anywhere between 8 hours and a week
of runtime. Most of the machines (there are ~100) are still running after
almost a month without failure.
Off the top of my head, I think we've had 10+ failures so far.
We have also had failures with hotspot error files (hs_err) present, and
the cause specified was indeed SIGSEGV indicating a page fault. But I
don't know if the two are related.
We also use JDK 1.6.0_18; I'm downgrading the machines to 1.6.0_16 when
the situation allows (during regular updates of the application, or a
crash) to see if that helps.
It might be useful to note that the failures happen with tomcat 6.0.20
as well as 6.0.24.
As far as load is concerned, I haven't had a failure on an idle machine.
The machines are well loaded, but only at a fraction of their limit with
regard to load and CPU utilization.
Most memory is committed to tomcat: a 24G machine would have 18G
allocated to heap, 128M to permgen, and some unspecified amount would get
used by JNI for APR. About 4G remains free after taking the JVM itself
into account.
A 16G machine would have 12G allocated to the heap.
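As a rough worked example of the 24G budget described above (a sketch only: it counts just the two fixed JVM pools named in the text, treats 24G as 24576M, and the variable names are made up for illustration):

```shell
#!/bin/sh
# Memory budget for the 24G machine, in megabytes.
TOTAL_MB=$((24 * 1024))   # physical memory
HEAP_MB=$((18 * 1024))    # -Xms/-Xmx
PERM_MB=128               # -XX:PermSize/-XX:MaxPermSize

# Whatever is left must cover thread stacks, JNI/APR allocations,
# the JVM's own native overhead and the operating system.
REMAINING_MB=$((TOTAL_MB - HEAP_MB - PERM_MB))
echo "${REMAINING_MB}M left outside heap and permgen"
```

With about 4G reported free, this would suggest that roughly 2G is in use by everything outside the heap and permgen.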
Besides the fact that our apps heavily use nio and mina I wouldn't say
there's anything else noteworthy. There can be anywhere up to 10000
concurrents on one machine.
I had searched for core dumps, but no luck. Running tomcat in the
foreground might show something, but then again I could be waiting a
month for it to happen.
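When no core dumps turn up, one thing that may be worth ruling out is that they are disabled for the Tomcat process. A minimal sketch for checking this on Linux follows. `show_core_settings` is a hypothetical helper name, and the limit it prints is that of the current shell, which is not necessarily what the init script gives the JVM:

```shell
#!/bin/sh
show_core_settings() {
    # Soft limit on core file size; 0 means no core files are written.
    echo "core size limit: $(ulimit -c)"
    # Kernel template that controls where core files go and how they are named.
    if [ -r /proc/sys/kernel/core_pattern ]; then
        echo "core pattern: $(cat /proc/sys/kernel/core_pattern)"
    fi
}

show_core_settings
```

If the limit turns out to be 0, `ulimit -c unlimited` would need to run in the script that starts Tomcat, before the JVM is launched.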
On Wed, 2010-02-24 at 12:42 +0100, Carl wrote:
Taylan,
I am the person who started the "Tomcat dies suddenly" thread, which I
still haven't resolved. I am curious about the pattern of failures you are
experiencing because it may provide some clues to my problem. In my case,
the system will run for 15 minutes to 10 days before failing (most of the
time it is several days to a week). It appears to die from a seg fault in
the JVM (I am using Sun 1.6.0_18 but have tried previous versions)... you
may be able to see the cause of the failure from the core file (the core
files on my systems were in several directories, so you may have to do a
'find' to locate them). Load may be a factor, but the failures generally
come after the load has been heavy for a while. I am running a couple of
applications and it seems the failures are more frequent when people are
hitting the additional apps (the primary app is always used, the remaining
apps are used sporadically).
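A minimal sketch of the kind of 'find' meant here, assuming the default core file names (`core` or `core.<pid>`). The `find_cores` helper name and the seven-day window are assumptions for illustration:

```shell
#!/bin/sh
find_cores() {
    # $1: directory to search (defaults to /).
    # -xdev stays on one filesystem; -mtime -7 keeps only files
    # modified within the last 7 days.
    find "${1:-/}" -xdev -type f \( -name core -o -name 'core.[0-9]*' \) \
        -mtime -7 2>/dev/null
}

# Usage example: search the current directory; pass / to scan the root filesystem.
find_cores .
```

If the kernel's core_pattern has been customized, the name patterns would need adjusting to match.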
How does this compare to what you are experiencing?
Thanks,
Carl
----- Original Message -----
From: "Taylan Develioglu"<tdevelio...@ebuddy.com>
To: "Tomcat Users List"<users@tomcat.apache.org>;<p...@pidster.com>
Sent: Wednesday, February 24, 2010 5:09 AM
Subject: Re: jvm exits without trace
The GC log shows plenty of heap space left in all the spaces.
I purposely didn't bother replacing the variables because I figured they
would not be relevant.
But if you think they might provide clues, they're as follows:
JAVA_HEAP_SIZE=18432M
JAVA_EDEN_SIZE=$(($(echo $JAVA_HEAP_SIZE|sed 's/M$\|G$//')/6))M
JAVA_PERM_SIZE=128M
JAVA_STCK_SIZE=128K
EDEN_SIZE is 1/6th of total heap.
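For reference, a minimal sketch of what the JAVA_EDEN_SIZE expression above evaluates to with the 18432M heap. It uses shell parameter expansion in place of sed (a swapped-in technique, not what the original script does) and assumes the value ends in a single M or G suffix. The `eden_size` function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: strip a trailing M or G unit, divide by 6, re-append M.
# Like the original expression, it always appends M, so a G-suffixed heap
# size would yield a value 1024 times too small.
eden_size() {
    heap=${1%[MG]}          # "18432M" -> "18432"
    echo "$((heap / 6))M"
}

JAVA_HEAP_SIZE=18432M
JAVA_EDEN_SIZE=$(eden_size "$JAVA_HEAP_SIZE")
echo "$JAVA_EDEN_SIZE"      # prints 3072M
```

So -XX:NewSize and -XX:MaxNewSize would both be 3072M on the 24G machines.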
And I said there was nothing in the system logs.
But you get a couple of points for trying.
On Wed, 2010-02-24 at 10:44 +0100, Pid wrote:
On 24/02/2010 09:36, Taylan Develioglu wrote:
I thought I'd add the connector definitions too:
<Connector port="80"
           protocol="org.apache.coyote.http11.Http11AprProtocol"
           compression="1024" keepAliveTimeout="60000"
           maxKeepAliveRequests="-1"
           enableLookups="false" redirectPort="443"
           maxThreads="150"
           pollerSize="32768"
           pollerThreadCount="4"/>
<Connector port="443"
           protocol="org.apache.coyote.http11.Http11AprProtocol"
           SSLEnabled="true"
           enableLookups="false" maxThreads="10" scheme="https"
           secure="true"
           SSLCertificateFile="/etc/ssl/private/something.crt"
           SSLCertificateKeyFile="/etc/ssl/private/something.key"
           SSLCACertificateFile="/etc/ssl/certs/ca.crt"/>
On Wed, 2010-02-24 at 10:23 +0100, Taylan Develioglu wrote:
Hi,
I have JVMs, running tomcat and our application, exiting mysteriously,
and was wondering if anyone could give me some advice on how to debug
this thing.
There is nothing in catalina.out, nor our application logs, and no
hotspot error file. GC log looks normal. No trace in system logs.
I am left completely clueless :(, has anyone dealt with a problem like
this before?
Any help appreciated.
- Tomcat 6.0.24
- TC native 1.1.18
- APR 1.3.9
- Sun JDK 6u18
- Debian Lenny, 2.6.31.10-amd64
2 servlets, one as ROOT. 2 HTTP connectors that use TCNative/APR.
JAVA_OPTS ( ):
-verbose:gc
-Djava.awt.headless=true
-Dsun.net.inetaddr.ttl=60
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=$TMP_DIR
-Djava.library.path=/usr/local/lib
-Djava.endorsed.dirs=$CATALINA_BASE/endorsed
-Dcatalina.base=$CATALINA_BASE
-Dcatalina.home=$CATALINA_HOME
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file="$CATALINA_BASE/conf/logging.properties"
-XX:+PrintGCDetails
-Xloggc:$CATALINA_BASE/logs/gc.log
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70
-Xms$JAVA_HEAP_SIZE
-Xmx$JAVA_HEAP_SIZE
-XX:NewSize=$JAVA_EDEN_SIZE
-XX:MaxNewSize=$JAVA_EDEN_SIZE
-XX:PermSize=$JAVA_PERM_SIZE
-XX:MaxPermSize=$JAVA_PERM_SIZE
-Xss$JAVA_STCK_SIZE
-XX:+UseLargePages
There are no actual heap size settings in the above. But you get a couple
of points for trying.
Google "Linux Out Of Memory killer" or "OOM Killer" and then check the
server logs carefully. (e.g. /var/log/messages)
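A minimal sketch of such a log check. The patterns are strings the Linux kernel typically logs when the OOM killer fires, though the exact wording varies by kernel version. `check_oom` is a hypothetical helper name, and on Debian the kernel log may be in /var/log/kern.log or /var/log/syslog rather than /var/log/messages:

```shell
#!/bin/sh
check_oom() {
    # $1: log file to scan. Case-insensitive match on common OOM killer messages.
    grep -i -E 'oom-killer|out of memory|killed process' "$1"
}

for log in /var/log/messages /var/log/kern.log; do
    if [ -r "$log" ]; then
        check_oom "$log" || echo "no OOM killer messages in $log"
    fi
done
```

Rotated logs (for example messages.1) would need the same check if the crash happened before the last rotation.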
p
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org