George,

Thanks for the thoughts.

My first thought was that the problem was hardware related, and that memtest86 did not reveal it because it did not stress the machine enough to change the temperature and trigger a failure. Subsequently, I built another server with an entirely different architecture (AMD vs. Intel, different memory, different disks, etc.), and it failed in exactly the same manner. I added memory to this second server just to rule out running out of memory by some fluke, but those tests failed in exactly the same manner as well. My conclusion is that the problem is not hardware but rather either the Sun JVM (the only one I have used) or an errant piece of native code somewhere. I tried to find the errant code by reducing the machine to the absolute minimum required to run the application, and that also showed the same failure. Chris suggested using strace (which I have used), but I inadvertently overwrote the file containing the failure (not one of my brighter moves).

Thanks,

Carl
----- Original Message -----
From: "George Sexton" <geor...@mhsoftware.com>
To: "'Tomcat Users List'" <users@tomcat.apache.org>
Sent: Tuesday, February 23, 2010 7:45 PM
Subject: RE: Tomcat dies suddenly



-----Original Message-----
From: Carl [mailto:c...@etrak-plus.com]
Sent: Tuesday, February 23, 2010 5:09 AM
To: Tomcat Users List
Subject: Re: Tomcat dies suddenly

Just an update.

After 8 1/2 days, on the newly built Slackware machine (with the JRE in the Slackware distribution removed before installing the operating system, and using the newest version of the mysql-connector), the system failed in exactly the same fashion as the previous attempts: it ran beautifully right up to the point of failure, and the failure was the JVM being stopped with a reported seg fault.

Changed this server to the IBM JVM. Tested it locally (directly accessed the IP within the DMZ) and it worked great. Switched it to production early this morning (4:30AM, before people start coming onto the system) and everything seemed good. Then, specific customers (the rest were able to come in just fine) started getting 404's (we use only https, didn't have a


Carl,

Just out of curiosity, have you tried building out machines with DIFFERENT hardware? E.g., building out a server using an IBM or HP computer rather than the ones you already have. If I recall correctly, you started this thread out with SIG 11's.

SIG 11's on Linux are quite often hardware problems. I know you've done memtest, but sometimes that's not enough. Here's a link to a problem I had:

http://archive.lug.boulder.co.us/Week-of-Mon-20071210/035903.html

To make a long story short, random disk corruption was happening. When I stopped using the on-board controller and went to a PCI one, the computer would reboot itself under heavy load. Some of the static burn-in utilities can miss hardware defects because they don't actually stress the system (e.g., power, CPU, disks). The problem was a specific rev of a specific motherboard.

I think you need to step back, get a computer from a different manufacturer, and test. You've tried different OSes, different JVMs, different everything except different hardware. By your own admission, the app used to run flawlessly on an older server.

When you've eliminated everything else, the thing that remains, however unlikely, must be the culprit.




George Sexton
MH Software, Inc.
http://www.mhsoftware.com/
Voice: 303 438 9585



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org


