Thomas,

On 7/23/24 13:44, James H. H. Lampert wrote:
Ladies and Gentlemen:

We still have a chronic Tomcat crashing problem at one of our installations.

The weirdest thing about this is that while this is certainly *one* of our heaviest-usage installations, it's not *the* heaviest.

We already have Tomcat shutting down and restarting itself every night. And we have the Catalina job and its associated JVM job running in a private subsystem, with a 7G private memory pool before it starts to dip into the base memory pool. And we're launching with (according to catalina.out) -Xms4096m -Xmx5120m.

Might I make a few recommendations and comments, here?

1. Don't bother setting the min and max heap sizes different from each other. Assuming the process is going to grow to the max heap, you're going to need the max anyway so you may as well allocate it all at once up front.

2. What has to fit into that 7GiB private memory pool? Does it include any OS, or is it just the JVM itself?

3. Note that JVM memory requirements aren't limited to the heap. There's plenty of "native memory" that is necessary to run a JVM as well. For example, I have a production application with heap settings -Xms2048M -Xmx2048M and 'ps' tells me that the virtual size of the process is 9.9GiB and the resident size is 2.7GiB. With a ~5GiB heap, you run the risk of hitting your memory limit. If you are getting "OOME heap" then this is not your issue, but it's something to think about.
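
To make points 1 and 3 a bit more concrete, here is a minimal sketch in Java (my own illustration, not something taken from your installation). The class and method names are made up for the example. It checks whether the JVM was started with equal -Xms and -Xmx values, and prints heap versus non-heap usage as seen by the standard java.lang.management beans. Note that the "non-heap" figure only covers memory the JVM itself tracks (class metadata, compiled code and the like); thread stacks and other native allocations do not show up there, so the OS-level process size will be larger still.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;

// Hypothetical helper for illustration only.
class JvmMemoryReport {

    /** Parses a size as written after -Xms/-Xmx (e.g. "4096m", "5g", "512k") into bytes. */
    static long parseSize(String text) {
        String s = text.trim().toLowerCase();
        char unit = s.charAt(s.length() - 1);
        long multiplier;
        switch (unit) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024; break;
            case 'g': multiplier = 1024L * 1024 * 1024; break;
            default:  return Long.parseLong(s); // no suffix: plain bytes
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * multiplier;
    }

    /** Returns the last value given for a flag such as "-Xmx", in bytes, or -1 if absent. */
    static long flagBytes(List<String> jvmArgs, String prefix) {
        long result = -1;
        for (String arg : jvmArgs) {
            // Require a digit right after the prefix so that other options
            // which merely start with the same letters are not misread.
            if (arg.startsWith(prefix)
                    && arg.length() > prefix.length()
                    && Character.isDigit(arg.charAt(prefix.length()))) {
                result = parseSize(arg.substring(prefix.length()));
            }
        }
        return result;
    }

    /** True only when both -Xms and -Xmx are present and equal. */
    static boolean heapFlagsMatch(List<String> jvmArgs) {
        long xms = flagBytes(jvmArgs, "-Xms");
        long xmx = flagBytes(jvmArgs, "-Xmx");
        return xms > 0 && xms == xmx;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("min == max heap: " + heapFlagsMatch(jvmArgs));

        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        System.out.println("heap committed bytes:     " + heap.getCommitted());
        System.out.println("non-heap committed bytes: " + nonHeap.getCommitted());
    }
}
```

Run standalone, this only describes the JVM it runs in. To learn anything about Tomcat, the same calls would have to execute inside the Tomcat JVM, or you can read the same beans remotely over JMX.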

Our webapp has integration with M$ Office 365, which this installation uses.

The usual pattern when it starts to get into trouble may be connected with that integration. Looking at a typical crash in catalina.out, I see several OAuth2 errors that appear to involve an expired token, producing lengthy (over 50 line) Java stacktraces.

Other errors seem to involve messages from graph.microsoft.com involving "item not found," that seem to be connected with email attachment downloads from Office365.

Then a NullPointerException is thrown, producing a stacktrace of over 60 lines.

Long stack traces are not uncommon in web-based applications, especially if an application framework is involved and/or if you have many "forward" operations where a request gets forwarded through several components before a response is finally generated.

I wouldn't draw too many conclusions from the stack-trace size(s).

Then another Microsoft "item not found," like the previous one.

Then a handshaking error. Not sure what the handshaking error is *with.*

Then Tomcat runs out of memory in the Java heap space, does a dump, and everything hits the proverbial fan. 4775 lines of catalina.out entries before we manually shut it down with extreme prejudice and restart it, 508 of them before the first out-of-memory error, the rest after.

And yes, I've packaged up an excerpt to send to our webapp developers, to see if they can make head or tail of what went wrong.

Anybody have any suggestions of what to look for?

As Thomas suggested, a heap dump would be helpful but if you have already provided that to your application developers there is no need to send it to us. But... if they aren't sure how to read it or aren't sure what it's telling them... perhaps you should send THEM to *us*.

It's helpful to know some more about the application. How many users do you typically serve at a time (concurrent logins, not necessarily concurrent requests)? What kind of information gets stored long-term in the application's heap? This would be things such as session storage (per user, should be freed on logout) or caches (typically global, either never freed or some kind of eviction policy like "last 1000 entries"). Sometimes, the application really is just storing too much in memory.
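
As an illustration of the "last 1000 entries" kind of eviction policy, here is a minimal sketch of a bounded cache in Java. I have no idea how your application actually caches things, so the class name and the limit are purely hypothetical. It relies on the standard LinkedHashMap hook removeEldestEntry, with access ordering turned on so that the least recently used entry is the one that gets dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example: a cache that keeps only the most recently used entries.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // Third argument true = iterate in access order (least recently used first).
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<Integer, String> cache = new BoundedCache<>(1000);
        for (int i = 0; i < 5000; i++) {
            cache.put(i, "value-" + i);
        }
        // Only the newest 1000 keys remain, however many were inserted.
        System.out.println("entries kept: " + cache.size());
    }
}
```

LinkedHashMap is not thread-safe, so in a webapp something like this would need to be wrapped (for example with Collections.synchronizedMap) or replaced by a proper caching library. The point is only that a cache with no upper bound will eventually eat the heap.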

I had an application that ran for years very happily with 64MiB of heap. One day it started throwing OOMEs and when we investigated the cause, it turned out we just had more users than we had in the past and we simply needed to resize the heap.

(A much later analysis showed that handing much more heap to the memory manager made the GC run a lot more efficiently and "stole" less time away from the application over time. So we run with a 2GiB heap now, even though we don't need nearly that much operating memory.)
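
If you want a rough number for how much time GC is "stealing", the standard management beans can give you one. The following is a minimal sketch under my own assumptions, and the class name is made up. It sums the cumulative collection time reported by every garbage collector and divides by JVM uptime. For concurrent collectors the reported time is not all stop-the-world pause time, so treat the result as an approximation, not a precise measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only.
class GcOverhead {

    /** Approximate milliseconds spent in GC since JVM start, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Approximate fraction of JVM uptime spent in GC. */
    static double gcFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.println("GC time (ms): " + totalGcMillis());
        System.out.println("GC fraction of uptime: " + gcFraction());
    }
}
```

Comparing this figure before and after a heap resize (under similar load) is one way to see whether the extra memory is actually buying you anything.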

-chris

