Andre,
You have the ability to boil things down to the bare essentials.
1) you never saw this issue under a previous JVM 1.5 and Tomcat version
5.5.x
Correct. (Running on a P4 with 32-bit Slackware.)
2) the problem happens on two separate servers, which seems to rule out a
common server hardware issue
Correct.
3) it happens under different versions of Linux, which seems to rule out a
problem with one particular Linux distribution
Correct... Slackware and openSUSE.
4) it seems to be a SegFault in the JVM, leaving a core dump but no traces
in the logs.
(In my experience, SegFaults usually happen when trying to execute
something that is not valid executable code for the platform at hand.)
Anyway, it does not seem to be due to running out of some resource, nor to
a hidden call to System.exit().
Correct... might be some strange code someplace but I can't find any.
5) not quite sure of this anymore, but it seems to happen also on
different JVMs, which would tend to rule out a problem with a particular
JVM port.
No, I have only used Sun's 64-bit JVM. I started with 1.6.0_17 and am now using
1.6.0_18.
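To confirm exactly which JVM build and data model Tomcat is running on, a small diagnostic like the one below could be run with the same Java installation Tomcat is started with (JAVA_HOME or JRE_HOME). This is a minimal sketch and not something from this thread. The class name JvmInfo is hypothetical, and sun.arch.data.model is a Sun/Oracle-specific property that other JVMs may not set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JvmInfo {
    // System properties that identify the JVM vendor, build and platform.
    private static final String[] KEYS = {
        "java.vendor", "java.vm.name", "java.version",
        "os.name", "os.arch", "sun.arch.data.model"
    };

    // Collects the properties above, substituting "(not set)" when one is absent
    // (sun.arch.data.model is Sun/Oracle-specific and may be missing elsewhere).
    static Map<String, String> describe() {
        Map<String, String> info = new LinkedHashMap<String, String>();
        for (String key : KEYS) {
            info.put(key, System.getProperty(key, "(not set)"));
        }
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : describe().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Comparing the output on both servers (for example, expecting 64 for sun.arch.data.model) would help rule out Tomcat silently picking up a different JVM than intended.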
6) it does not happen immediately, and not in any obvious way related to what
is being processed, except that it seems to happen more readily under
load.
Correct, although I am leaning more towards something related to accessing
applications B, C and D. It is also correct that the failure does not seem to be
tied to any particular point in the code or to any particular user activity.
7) it is obviously not a common problem with either JVM or Tomcat, or we
would have had laments from others by now
Correct, I think it is something specific to my setup.
8) I don't know how a Java/Tomcat webapp application could trigger a
SegFault on its own, other than by having the JVM participate in it.
And apparently your apps are working fine up to the moment of the sudden
death, so for once they do not appear to be among the usual suspects.
Correct. I can see no degradation of speed right up to the moment of
failure.
9) This, in one of your earlier posts, triggered my curiosity:
quote
This Tomcat is straight out of the box except for some modifications to
JAVA_OPTS in tomcat/bin/catalina.sh (editor's note: canonically, a better place
would be setenv.sh) and opening up ports and turning on SSL in
tomcat/conf/server.xml.
unquote
So, maybe two suggestions, bearing in mind that I am just making wild
guesses here (but that's pretty much what everyone is doing by now, so
I don't feel too bad):
- have you tried running Tomcat from the command line, with STDOUT/STDERR
going to the console? Maybe something shows up there which doesn't show up
anywhere else?
I have been starting Tomcat from startup.sh, which redirects STDOUT to
catalina.out and STDERR to somewhere (I will have to look at it more closely).
Starting tomorrow morning, the server that will be running production (I
keep the other server in reserve for failures, and the old server further
back just in case I can't keep up with the failures) will be running under
strace to see if that gives us anything (and I will be pounding on
applications B, C and D to see if I can force a failure).
- what about this SSL? It seems to me a likely candidate: something that is
perhaps not used all the time, probably calls into native code, and is
usually provided separately from Tomcat.
Can you turn it off and still be operational?
Also, if it is provided separately, it should probably be relatively
"grouped" in some directory, making it easier to check if everything is as
it should be.
We use SSL for all communications because most of the data we handle is
personal data for children. Can't really turn that off.
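Since SSL has to stay on, one thing that could still be checked is which SSL implementation is actually in play. As far as I know, a Tomcat connector using JSSE does its SSL inside the JVM through the installed security providers, whereas the APR/native connector uses OpenSSL through the separately installed tcnative library, which is closer to the "native code provided separately" suspect. The sketch below only lists the JVM-side (JCA/JSSE) providers and the provider behind the default SSLContext; it says nothing about an APR/OpenSSL setup, which would instead show in server.xml and typically in the Tomcat startup log. The class name SslProviders is hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLContext;

public class SslProviders {
    // Names of the installed JCA/JSSE providers, in the JVM's preference order.
    static List<String> providerNames() {
        List<String> names = new ArrayList<String>();
        for (Provider p : Security.getProviders()) {
            names.add(p.getName());
        }
        return names;
    }

    // Name of the provider backing the JVM's default SSLContext (JSSE side only;
    // an APR/OpenSSL connector would not be reflected here).
    static String defaultSslProvider() throws NoSuchAlgorithmException {
        return SSLContext.getDefault().getProvider().getName();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        for (String name : providerNames()) {
            System.out.println("provider: " + name);
        }
        System.out.println("default SSLContext provider: " + defaultSslProvider());
    }
}
```

On a stock Sun JVM one would expect SunJSSE to be the default SSLContext provider; anything unexpected in the list (for example, a third-party provider added in java.security) would be worth a closer look.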
Note also that apart from a direct hardware similarity between the servers
on which it happens, another common element seems to be the place at which
it happens, namely the server room. This is a long shot, but a power
supply issue may also provoke hardware failures. Or is your server room
on top of a mountain, or near a particle accelerator?
(re relativistic gamma rays, dark energy and all that stuff).
;-)
I am not certain, but I do know I don't need any lights at night; I
glow enough to see where I am going.
All of the servers are on UPSes, which are tested periodically.
Thanks for your thoughts; you have such a great way of analyzing problems.
Carl
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org