George,
Thanks for the thoughts.
My first thought was that the problem was hardware related, and that the
reason I could not see the problem was that memtest86 did not stress the
machine enough to change its temperature and cause a failure. Subsequently,
I built up another server with an entirely different architecture (AMD vs.
Intel, different memory, different disks, etc.) and it failed in exactly the
same manner. I have added memory to this second server just to confirm that
we were not running out of memory by some fluke, but those tests failed in
exactly the same manner. My conclusion is that the problem is not hardware
but rather either the Sun JVM (the only one I have used) or an errant piece
of native code somewhere. I have tried to find the errant code by reducing
the machine to the absolute minimum required to run the application, and
that also showed the same failure.
Chris suggested using strace (which I have done), but I inadvertently
overwrote the file containing the failure (not one of my brighter moves).
Thanks,
Carl
----- Original Message -----
From: "George Sexton" <geor...@mhsoftware.com>
To: "'Tomcat Users List'" <users@tomcat.apache.org>
Sent: Tuesday, February 23, 2010 7:45 PM
Subject: RE: Tomcat dies suddenly
-----Original Message-----
From: Carl [mailto:c...@etrak-plus.com]
Sent: Tuesday, February 23, 2010 5:09 AM
To: Tomcat Users List
Subject: Re: Tomcat dies suddenly
Just an update.
After 8 1/2 days, on the newly built Slackware machine with the JRE in the
Slackware distribution removed before installing the operating system and
using the newest version of the mysql-connector, the system failed in
exactly the same fashion as the previous attempts: it ran beautifully right
up to the point of failure, and the failure was the JVM being stopped with
a reported seg fault.
Changed this server to the IBM JVM. Tested it locally (directly accessed
the IP within the DMZ) and it worked great. Switched it to production early
this morning (4:30AM, before people start coming onto the system) and
everything seemed good. Then specific customers (the rest were able to
come in just fine) started getting 404s (we use only https, didn't have a
Carl,
Just out of curiosity, have you tried building out machines with DIFFERENT
hardware? For example, building out a server using an IBM or HP computer,
rather than the ones you already have. If I recall correctly, you started
this thread out with SIG 11s.
SIG 11s on Linux are quite often hardware problems. I know you've run
memtest, but sometimes that's not enough. Here's a link to a problem I had:
http://archive.lug.boulder.co.us/Week-of-Mon-20071210/035903.html
To make a long story short, there was random disk corruption happening.
When I stopped using the on-board controller and went to a PCI one, the
computer would reboot itself under heavy load. Some of the static burn-in
utilities can miss hardware defects because they don't actually stress the
whole system (power, CPU, disks, etc.). The problem was a specific revision
of a specific motherboard.
I think you need to step back, get a computer from a different manufacturer
and test. You've tried different OSes, different JVMs, different everything
but hardware. By your own admission, the app used to run flawlessly on an
older server.
When you've eliminated everything else, the thing that remains, however
unlikely, must be the culprit.
George Sexton
MH Software, Inc.
http://www.mhsoftware.com/
Voice: 303 438 9585
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org