Thomas, James,

On 2/6/23 17:00, Thomas Hoffmann (Speed4Trade GmbH) wrote:
Hello James,

-----Original Message-----
From: James H. H. Lampert <jam...@touchtonecorp.com.INVALID>
Sent: Monday, February 6, 2023 18:18
To: Tomcat Users List <users@tomcat.apache.org>
Subject: Re: AW: Having trouble with Tomcat crashes. Interesting memory numbers in Manager

Thanks, Herr Hoffmann. Your questions were most helpful in determining
what information to gather and share. And thanks in advance to anybody
else who has any insights.

First, I will note that the seemingly non-sequitur nursery-survivor numbers
aren't just what we see during a crash; they're what we see when it's running
normally.

On 2/4/23 6:13 AM, Thomas Hoffmann (Speed4Trade GmbH) wrote:
Could you describe "crash" in a bit more detail?

Typically, the signed-on users start to see degraded response times before the server
becomes completely unresponsive.

- does the tomcat / java process run but is unresponsive?

Yes. Exactly. And shutting it down (and therefore freeing up the port for a
restart) takes a fairly sizeable amount of time, and leaves a core dump of
approximately 6G size, a Javacore dump of approximately 4M size, and a JIT
dump of approximately 20M size.

- does the java process crash itself (then there should be a logfile written)?
The job does not generally terminate itself, or even respond to a shutdown
request; it has to be forcibly terminated (given that it's running on an AS/400,
this would typically be either from WRKACTJOB, or from an ENDJOB
command, or from their GUI console equivalents).

This may be relevant: even when it is not in this state, the Tomcat server,
when being shut down, tends not to respond readily to shutdown requests.

- Is there any OOM message in the logfiles?
Not out-of-memory, but there are chronic problems with contacting outside
web services (many of them involving OAuth2), and with BIRT reporting.

Around the time of the shutdown, I typically see stuff like:
     Unhandled exception
     Type=Segmentation error vmState=0x00000000
     J9Generic_Signal_Number=00000004 Signal_Number=0000000b
     Error_Value=00000000 Signal_Code=00000032

I am not sure whether this is going into catalina.out before or after the job is
forcibly terminated.

- Is the process still alive but CPU at 100% ?
Yes.

We just had a near-miss as I was typing this: CPU pushing up into the high
80s, and the JVM job for Tomcat eating up most of it, but it backed down to
something more normal without my having to intervene, and without any
sign of anybody else intervening.

One of my colleagues managed to get into Manager during the near-miss,
and took a screen-shot. The "nursery-allocate" Used was at 400.97M (34%),
"nursery-survivor" was as I described last week, "tenured-LOA" Used was at
zero, and "tenured-SOA" was showing Initial 2918.40M, Total 3648.00M,
Maximum 4864.00M, and Used 1997.72M (41%).

--
JHHL

The observations look like Java is running out of memory and the garbage
collector can't keep up with freeing memory again.
Either the GC is using 100% of the CPU or the application has some CPU-intensive
procedures. My guess would be that it's the GC.
One option would be to open a JMX port on Tomcat and use VisualVM to connect to
the Java process and inspect memory and GC usage.
When the CPU is at 100%, you might also consider generating a thread dump
(kill -3) and checking whether any suspicious threads are running; both are sketched below.
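As a rough sketch only (the port number and the no-auth/no-SSL settings are
placeholder assumptions for a trusted network, and the pid is made up), remote
JMX is typically enabled by adding system properties to CATALINA_OPTS, e.g. in
bin/setenv.sh, and the thread dump is triggered with SIGQUIT:

    # bin/setenv.sh -- enable remote JMX so VisualVM can attach
    CATALINA_OPTS="$CATALINA_OPTS \
      -Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.port=9010 \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false"

    # Thread dump: send SIGQUIT to the Tomcat JVM (12345 is a placeholder pid).
    # HotSpot prints the dump to catalina.out; an IBM J9 VM writes a javacore file.
    kill -3 12345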

Setting the Java options HeapDumpOnOutOfMemoryError and HeapDumpPath might also
help, if the process stops because of an OOM.
If the GC can always free a few bytes, which the application instantly
consumes again, an OOM might never occur.
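For reference, and only as a sketch (the dump directory is an example), those
two options are normally passed as -XX arguments on the Tomcat JVM:

    # Write a heap dump when an OutOfMemoryError is thrown
    CATALINA_OPTS="$CATALINA_OPTS \
      -XX:+HeapDumpOnOutOfMemoryError \
      -XX:HeapDumpPath=/path/to/dumps"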

You can also add parameters to log some GC statistics, but I have never used that myself:
https://sematext.com/blog/java-garbage-collection-logs/
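The exact flags depend on the JVM: the nursery/tenured pool names in Manager
suggest an IBM J9 VM, which uses -Xverbosegclog, while HotSpot uses -Xlog:gc*
(Java 9+) or -Xloggc (Java 8). The file paths below are just examples:

    # IBM J9 / OpenJ9 (assumed from the nursery/tenured pool names)
    -Xverbosegclog:/path/to/gc.log

    # HotSpot, Java 9 and later (unified logging)
    -Xlog:gc*:file=/path/to/gc.log:time,uptime

    # HotSpot, Java 8
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc.log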

I'm not ready to blame this on GC /yet/, but Thomas's recommendations align with mine: enable the memory-error-related options and take thread dumps. They may be long, but several of them over time (even just a few seconds or minutes apart) can really shed light on what's going on.

If you can also get some heap numbers from the running JVM over time that would be very helpful. There are various ways of doing this; I'm not sure what's most convenient in your environment. One way is to run the 'jmap' command. This is how it's done on *NIX and Windows systems:

$ jmap -heap [pid]

That will spit out a human-readable set of heap-size/usage numbers. You may be able to copy/paste that into a spreadsheet, or script the parsing of those dumps over time. The idea is to get a heap-space graph so you can see what's happening to the heaps over time. The ideal graph looks like a nice saw-tooth shape, where the usage goes up steadily and then abruptly drops back to some benign size, then creeps back up again. If it looks like that, and achieves a steady state where the value it drops down to is roughly the same over time, you are good. If that lower bound is creeping up, it means that you either have a memory leak, or you have real memory needs which are increasing.
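Purely as a sketch (the pid, interval, and output file are placeholders), a small shell loop can collect those samples so they can be graphed later; note that 'jmap -heap' is the Java 8 form, and newer HotSpot JDKs moved it to 'jhsdb jmap --heap':

    # Sample heap usage every 60 seconds; 12345 is a placeholder pid.
    while true; do
      date
      jmap -heap 12345
      sleep 60
    done >> heap-samples.log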

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
