Andre,

You have the ability to boil things down to the bare essentials.

1) you never saw this issue under a previous JVM 1.5 and Tomcat version 5.5.x

Correct. (Running on a P4 with 32-bit Slackware.)

2) the problem happens on two separate servers, which seems to rule out a common server hardware issue

Correct.

3) it happens under different versions of Linux, which seems to rule out a problem with one particular Linux distribution

Correct: Slackware and openSUSE.

4) it seems to be a SegFault in the JVM, leaving a core dump but no traces in the logs (in my experience, SegFaults usually happen when trying to execute something that is not valid executable code for the platform at hand). Anyway, it does not seem to be due to running out of some resource, nor to a hidden call to System.exit().

Correct. There might be some strange code somewhere, but I can't find any.

5) not quite sure of this anymore, but it seems to happen also on different JVMs, which would tend to rule out a problem with a particular JVM port.

No, I have only used Sun's 64-bit JVM. I started with 1.6.0_17 and am now using 1.6.0_18.
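If we want to be sure which JVM Tomcat is actually running under, a tiny check like the one below could be run with the same JAVA_HOME that catalina.sh uses. It is only a sketch. The class name JvmInfo is made up, and "sun.arch.data.model" is a Sun/HotSpot-specific property that other vendors' JVMs may not set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Tomcat): reports which JVM is actually running.
public class JvmInfo {
    // "sun.arch.data.model" (32 or 64) is Sun/HotSpot-specific;
    // if a JVM does not set it, "(not set)" is reported instead.
    private static final String[] KEYS = {
        "java.vm.vendor", "java.vm.name", "java.version",
        "os.name", "os.arch", "sun.arch.data.model"
    };

    // Collects the properties in a fixed, readable order.
    // Explicit generics keep this compilable on Java 6 (1.6.0_xx).
    public static Map<String, String> collect() {
        Map<String, String> info = new LinkedHashMap<String, String>();
        for (String key : KEYS) {
            info.put(key, System.getProperty(key, "(not set)"));
        }
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }
}
```

The output is one key=value line per property. It can be pasted into a post to show exactly which vendor, version and bitness is in play.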

6) it does not happen immediately, and not in any obvious way related to what is being processed, except that it seems to happen more readily under load

Correct, although I am leaning more towards something related to accessing applications B, C and D. It is also correct that the failure does not seem tied to any particular point in the code or to any particular user activity.

7) it is obviously not a common problem with either JVM or Tomcat, or we would have had laments from others by now

Correct, I think it is something specific to my setup.

8) I don't know how a Java/Tomcat webapp could trigger a SegFault on its own, other than by having the JVM participate in it. And apparently your apps are working fine up to the moment of the sudden death, so for once they do not appear to be among the usual suspects.

Correct. I can see no degradation of speed right up to the moment of failure.

9) This, in one of your earlier posts, triggered my curiosity:
quote
This Tomcat is straight out of the box except for some modifications to JAVA_OPTS in tomcat/bin/catalina.sh (editor's note: canonically, a better place would be setenv.sh) and opening up ports and turning on SSL in tomcat/conf/server.xml.
unquote

So, maybe two suggestions, taking into account that I am just making wild guesses here (but that's pretty much what everyone is doing by now, so I don't feel too bad):

- Have you tried running Tomcat from the command line, with STDOUT/STDERR going to the console? Maybe something shows up there that doesn't show up anywhere else?

I have been starting Tomcat with startup.sh, which redirects STDOUT to catalina.out and STDERR somewhere (I will have to look at that more closely). Starting tomorrow morning, the server that will be running production will run under strace to see if that gives us anything, and I will be pounding on applications B, C and D to see if I can force a failure. (I keep the other server in reserve for failures, and the old server further back in case I can't keep up with the failures.)


- What about this SSL? It seems to me a likely candidate: something that is perhaps not used all the time, probably calls what should be native code, and is usually provided separately from Tomcat.
Can you turn it off and still be operational?
Also, if it is provided separately, it should be grouped fairly tightly in some directory, which makes it easier to check that everything is as it should be.

We use SSL for all communications because most of the data we handle is personal data about children. I can't really turn that off.
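On whether SSL pulls in native code: with Tomcat's Java (JSSE) HTTP connector, SSL is handled by the JVM's own security providers. The APR connector instead uses OpenSSL through the native tomcat-native library. Which of the two is in use here is an assumption to verify in server.xml. The sketch below uses a made-up class name, SslProviders. It lists the installed providers and the java.library.path, which is where the JVM would typically look for a native library such as tomcat-native.

```java
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tomcat): shows which security providers
// the JVM would use for JSSE-based SSL, plus where native libraries are searched.
public class SslProviders {
    // One line per installed provider, in the JVM's preference order.
    public static List<String> describeProviders() {
        List<String> lines = new ArrayList<String>();
        for (Provider provider : Security.getProviders()) {
            lines.add(provider.getName() + ": " + provider.getInfo());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeProviders()) {
            System.out.println(line);
        }
        // A native SSL library (e.g. tomcat-native for the APR connector)
        // would typically have to be found on this path.
        System.out.println("java.library.path=" + System.getProperty("java.library.path"));
    }
}
```

If only Java providers show up and no native library is on the path, the SSL work is staying inside the JVM. That would make a native SSL library a less likely source of the SegFault.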

Note also that, apart from a direct hardware similarity between the servers on which it happens, another common element seems to be the place where it happens, namely the server room. This is a long shot, but a power supply issue can also provoke hardware failures. Or is your server room on top of a mountain, or near a particle accelerator?
(re relativistic gamma rays, dark energy and all that stuff).
;-)

I am not certain, but I do know I don't have to use any lights at night; I provide enough glow to see where I am going.

All of the servers are on UPSes, which are tested periodically.

Thanks for your thoughts, you have such a great way of analyzing problems.

Carl

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
