Thomas,
On 7/23/24 13:44, James H. H. Lampert wrote:
Ladies and Gentlemen:
We still have a chronic Tomcat crashing problem at one of our
installations.
The weirdest thing about this is that while this is certainly *one* of
our heaviest-usage installations, it's not *the* heaviest.
We already have Tomcat shutting down and restarting itself every night.
And we have the Catalina job and its associated JVM job running in a
private subsystem, with a 7G private memory pool before it starts to dip
into the base memory pool. And we're launching with (according to
catalina.out) -Xms4096m -Xmx5120m.
Might I make a few recommendations and comments, here?
1. Don't bother setting the min and max heap sizes different from each
other. Assuming the process is going to grow to the max heap, you're
going to need the max anyway so you may as well allocate it all at once
up front.
2. What has to fit into that 7GiB private memory pool? Does it include
any OS, or is it just the JVM itself?
3. Note that JVM memory requirements aren't limited to the heap. There's
plenty of "native memory" that is necessary to run a JVM as well. For
example, I have a production application with heap settings -Xms2048M
-Xmx2048M and 'ps' tells me that the virtual size of the process is
9.9GiB and the resident size is 2.7GiB. With a ~5GiB heap, you run the
risk of hitting your memory limit. If you are getting "OOME heap" then
this is not your issue, but it's something to think about.
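
To see how the heap settings compare with the rest of the JVM's memory, a minimal sketch along these lines could be run with the same -Xms/-Xmx options. It uses only the standard java.lang.management API; the class name JvmMemoryReport and its helper methods are made up for illustration. Note that the "non-heap" figure covers only the JVM-managed non-heap pools (for example class metadata and compiled code), not thread stacks or other native allocations, so the process size reported by the OS will usually be larger still.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class: prints heap and non-heap usage as seen by the JVM.
public class JvmMemoryReport {

    // Converts a byte count to whole MiB; the MXBean reports -1 when a value is undefined.
    public static String toMiB(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / (1024L * 1024L)) + " MiB";
    }

    // Formats one MemoryUsage snapshot (init, used, committed, max) on a single line.
    public static String describe(String label, MemoryUsage usage) {
        return String.format("%-9s init=%s used=%s committed=%s max=%s",
                label,
                toMiB(usage.getInit()),
                toMiB(usage.getUsed()),
                toMiB(usage.getCommitted()),
                toMiB(usage.getMax()));
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        // With -Xms equal to -Xmx, heap "init" and "max" should be (nearly) the same.
        System.out.println(describe("heap", memory.getHeapMemoryUsage()));
        // Non-heap: JVM-managed pools only, not the full native footprint.
        System.out.println(describe("non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```

Comparing the "committed" values here against the resident size from 'ps' (or the platform's equivalent) gives a rough idea of how much memory the JVM needs beyond the heap.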
Our webapp has integration with M$ Office 365, which this installation
uses.
The usual pattern when it starts to get into trouble may be connected
with that integration. Looking at a typical crash in catalina.out, I see
several OAuth2 errors that appear to involve an expired token, producing
lengthy (over 50 line) Java stacktraces.
Other errors seem to involve messages from graph.microsoft.com involving
"item not found," that seem to be connected with email attachment
downloads from Office365.
Then a NullPointerException is thrown, producing a stacktrace of over 60
lines.
Long stack traces are not uncommon in web-based applications, especially
if an application framework is involved and/or if you have many
"forward" operations where a request gets forwarded through several
components before a response is finally generated.
I wouldn't draw too many conclusions from the stack-trace size(s).
Then another Microsoft "item not found," like the previous one.
Then a handshaking error. Not sure what the handshaking error is *with.*
Then Tomcat runs out of memory in the Java heap space, does a dump, and
everything hits the proverbial fan. 4775 lines of catalina.out entries
before we manually shut it down with extreme prejudice and restart it,
508 of them before the first out-of-memory error, the rest after.
And yes, I've packaged up an excerpt to send to our webapp developers,
to see if they can make head or tail of what went wrong.
Anybody have any suggestions of what to look for?
As Thomas suggested, a heap dump would be helpful but if you have
already provided that to your application developers there is no need to
send it to us. But... if they aren't sure how to read it or aren't sure
what it's telling them... perhaps you should send THEM to *us*.
It's helpful to know some more about the application. How many users do
you typically serve at a time (concurrent logins, not necessarily
concurrent requests)? What kind of information gets stored long-term in
the application's heap? This would be things such as session storage
(per user, should be freed on logout) or caches (typically global,
either never freed or some kind of eviction policy like "last 1000
entries"). Sometimes, the application really is just storing too much in
memory.
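
As an illustration of the "last 1000 entries" style of eviction policy mentioned above, here is a minimal sketch of a size-bounded cache built on the standard LinkedHashMap. The class name BoundedCache is made up for this example and is not taken from the application being discussed; it is also not thread-safe as written, so a real webapp would need to wrap it (for example with Collections.synchronizedMap) or use a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example: a cache that keeps only the most recently used entries.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        // accessOrder=true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a" so that "b" becomes the eldest entry
        cache.put("c", "3"); // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

A cache bounded like this has a predictable worst-case footprint (the limit times the typical entry size), which makes it much easier to reason about when sizing the heap than a cache that is never freed.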
I had an application that ran for years very happily with 64MiB of heap.
One day it started throwing OOMEs and when we investigated the cause, it
turned out we just had more users than we had in the past and we simply
needed to resize the heap.
(A much later analysis showed that handing much more heap to the memory
manager made the GC run a lot more efficiently and "stole" less time
away from the application over time. So we run with a 2GiB heap now,
even though we don't need nearly that much operating memory.)
-chris