Christopher Schultz wrote:

Bill,

On 12/9/13, 5:38 PM, Bill Davidson wrote:
Last week, one of my servers got an OutOfMemoryError at
approximately 1:21pm.

:(

It's worth pointing out that this is not a trivial issue.

My monitoring software which does a heart beat check once per
minute did not notice until 3:01pm.  Heart beat kept working for
over an hour and a half.

Was it a transient error, or a chronic condition? A single thread can,
for instance, spew objects into its stack or eden space exhausting
memory but, when that thread hits the OOME, all those objects are
freed which basically recovers from the situation.

If, instead, you fill-up some shared cache, buffer, etc. and NO
threads can get more memory, then you're basically toast.

Which of the above was it?

During that time my high capacity high availability 24/7 application
was getting occasional OutOfMemoryError's until memory got bad
enough that even the heart beat check servlet failed.  Apparently
some things that allocate large chunks of memory started failing
first, but none of my customers called to complain.  Smaller stuff
continued to work.  I didn't know until my monitoring software
sent me an email about the heart beat failure.

That doesn't work for me.  I need to know sooner.

+1

I thought of trying to handle it with error-page in web.xml.
Apparently that does not work.  I used java.lang.Throwable as the
exception-type. I was already using this for a number of common
exceptions to send me email.

In most OOME situations, your recovery options are limited... because
the JVM might need to allocate (a small amount of) memory in order to
even report the error.

I see the OutOfMemoryError's logged in my catalina.out

Is there some way that I can catch this so that I can send email or
something? I need to know as soon as possible so that I can attempt
diagnosis and restart the server. Google has not been helpful.
Everything says that you have to fix the memory leak.
Duh.  I know that. We've fixed many over the years.  We haven't had
one in nearly 2 years. We thought we'd fixed them all.  We need to
find out about them sooner when they do happen.

There are a bunch of things you can try to do. They all have their
caveats, failure scenarios, and inefficacies.

1. Use -XX:OnOutOfMemoryError="cmd args;cmd args"

Rig this to email you, register a passive-check data point with your
monitoring server, etc. Just remember that OOMEs happen for a number
of reasons. You could have run out of file handles or you could have
run out of heap space.

2. Use JMX monitoring, set java.lang:MemoryPool/[heap
space]/UsageThreshold to whatever byte value you want to set as your
limit. Then, check java.lang:MemoryPool/[heap
space]/UsageThresholdExceeded to see if it is true. If so, your usage
threshold has been exceeded.

Note that this is not proof-positive that an OOME occurred. It's also
tough to tell what value to use for the threshold. You can't really
set it to MaxHeap - 1 byte, because you'll never get that value in
practice. If you set it too low, you'll get warnings all the time when
your heap usage rises in the normal course of business.
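
A minimal sketch of option 2 done in-process with the standard
java.lang.management API, rather than through a remote JMX client. The
class name, method names, and the 90% fraction are assumptions made for
illustration; pick a threshold that suits your own heap behaviour. Not
every heap pool supports a usage threshold, so the code skips those
that do not.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class HeapPoolThreshold {

    // Hypothetical limit: 90% of each pool's maximum size.
    static final double FRACTION = 0.90;

    /** Sets a usage threshold on every heap pool that supports one and returns those pools. */
    static List<MemoryPoolMXBean> armThresholds(double fraction) {
        List<MemoryPoolMXBean> armed = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue; // non-heap pool, or the pool cannot track a threshold
            }
            MemoryUsage usage = pool.getUsage();
            long max = (usage == null) ? -1 : usage.getMax();
            if (max <= 0) {
                continue; // maximum is undefined, so a fraction of it is meaningless
            }
            pool.setUsageThreshold((long) (max * fraction));
            armed.add(pool);
        }
        return armed;
    }

    /** Returns true if any armed pool currently reports usage at or above its threshold. */
    static boolean anyExceeded(List<MemoryPoolMXBean> pools) {
        for (MemoryPoolMXBean pool : pools) {
            if (pool.isUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<MemoryPoolMXBean> pools = armThresholds(FRACTION);
        for (MemoryPoolMXBean pool : pools) {
            System.out.println(pool.getName() + " threshold=" + pool.getUsageThreshold());
        }
        System.out.println("exceeded=" + anyExceeded(pools));
    }
}
```

A monitor would poll anyExceeded() (or read the same attribute over
JMX) on its own schedule. As noted above, a true result only says the
pool is full-ish, not that an OOME has happened.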

3. Catch OutOfMemoryError in a filter and set an application attribute.
Check this attribute from your monitor.

I've been considering doing this, because I can rig it so that the
error handler does not actually require any memory to run. The problem
is that sometimes OOMEs interrupt one thread and not another. You may
not catch the OOME in that thread -- it may happen in a background
thread that does not go through the filter.
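
A rough sketch of the idea in option 3, kept free of the servlet API so
it runs on its own. OomeFlag, guard() and installForUncaught() are
hypothetical names; in a real filter the body of the Runnable would be
the call to chain.doFilter(), and the monitor would read the flag
through a servlet or JMX. The flag is allocated up front so that
recording the error should not need a new allocation, though that is an
assumption about the JVM, not a guarantee.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class OomeFlag {

    // Created at class-load time, so setting it later allocates nothing new.
    static final AtomicBoolean SEEN = new AtomicBoolean(false);

    /** Runs the work; if it dies with an OutOfMemoryError, records that and rethrows. */
    static void guard(Runnable work) {
        try {
            work.run();
        } catch (OutOfMemoryError e) {
            SEEN.set(true);
            throw e; // let the container's normal error handling proceed
        }
    }

    /**
     * Partial cover for threads that never pass through guard(), such as
     * background workers. Only errors that escape a thread's run() method
     * reach this handler; a pool that catches Throwable itself will hide them.
     */
    static void installForUncaught() {
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (error instanceof OutOfMemoryError) {
                SEEN.set(true);
            }
            if (previous != null) {
                previous.uncaughtException(thread, error);
            } else {
                error.printStackTrace(); // roughly what the JVM prints when no handler is set
            }
        });
    }

    public static void main(String[] args) {
        try {
            guard(() -> { throw new OutOfMemoryError("simulated"); });
        } catch (OutOfMemoryError expected) {
            // rethrown on purpose by guard()
        }
        System.out.println("OOME seen: " + SEEN.get());
    }
}
```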

4. You can do what I do: simply look at your total heap space by
inspecting java.lang:Memory/HeapMemoryUsage["used"] and set a
threshold that will cause your monitor to alarm for WARNING and
CRITICAL conditions. You may recover and not have to check anything.
These days, I get a false-alarm about once every 3 weeks when the heap
space grows a hair higher than usual before a full GC runs and clears
everything out.

The nice thing about #4 is that you can find out early if you *might*
be having a problem. Then you can keep an eye on your service to make
sure it "recovers". If it never OOME's, great. If it does, you can
manually restart or whatever. If it OOME's, and #1-#3 above fail
because memory might be required to actually execute the
do-this-thing-on-OOME action, then you might never get notified. With
#4, you don't have to wait until an OOME to take action.
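
To illustrate #4, here is a small sketch that reads the same
HeapMemoryUsage data from inside the JVM and maps it to monitor-style
levels. The class name and the 80% / 95% thresholds are made up for the
example; a real check would more likely run in the monitoring system
and read the attribute over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapLevelCheck {

    enum Level { OK, WARNING, CRITICAL }

    // Hypothetical thresholds, expressed as fractions of the maximum heap.
    static final double WARN = 0.80;
    static final double CRIT = 0.95;

    /** Classifies a used/max pair against the given warning and critical fractions. */
    static Level classify(long used, long max, double warn, double crit) {
        if (max <= 0) {
            return Level.OK; // maximum undefined, nothing to compare against
        }
        double ratio = (double) used / max;
        if (ratio >= crit) {
            return Level.CRITICAL;
        }
        if (ratio >= warn) {
            return Level.WARNING;
        }
        return Level.OK;
    }

    /** Classifies the current heap usage of this JVM. */
    static Level current() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return classify(heap.getUsed(), heap.getMax(), WARN, CRIT);
    }

    public static void main(String[] args) {
        System.out.println("heap level: " + current());
    }
}
```

Because "used" includes garbage that has not been collected yet, a
single WARNING reading can be the kind of false alarm described above;
alarming only after several consecutive readings is one way to damp
that.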


Here is another discussion of the matter :
http://forum.dlang.org/thread/ikpzfqonfhvrrsthc...@forum.dlang.org?page=3#post-kjcscn:241sap:241:40digitalmars.com

and another :
http://stackoverflow.com/questions/6244055/why-there-are-no-outofmemoryerror-subclasses


Based on :
>> I see the OutOfMemoryError's logged in my catalina.out
If so, can't you "pipe" your catalina.out through a program that will
inspect each line (in real-time), and when it sees such a line,
immediately send a signal somewhere?
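
Something along these lines, as a sketch only: a tiny program that
reads log lines from standard input and fires a callback whenever a
line mentions java.lang.OutOfMemoryError. The class name is invented,
and the "alert" here just prints to stderr; sending mail or poking a
monitoring server would go in its place.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

class OomeLogWatcher {

    // Text to look for in each log line.
    static final String MARKER = "java.lang.OutOfMemoryError";

    /** Reads lines until end of input, calls onMatch for each matching line, returns the match count. */
    static int watch(BufferedReader in, Consumer<String> onMatch) throws IOException {
        int matches = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.contains(MARKER)) {
                matches++;
                onMatch.accept(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader stdin = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        // Placeholder action: replace with whatever notification you use.
        watch(stdin, line -> System.err.println("ALERT: " + line));
    }
}
```

It could be fed with something like "tail -F catalina.out | java
OomeLogWatcher". Since the watcher runs in its own JVM, it does not
depend on the struggling Tomcat process having memory to spare.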
