On 12/9/2013 3:12 PM, Christopher Schultz wrote:

Was it a transient error, or a chronic condition? A single thread can,
for instance, spew objects into its stack or eden space exhausting
memory but, when that thread hits the OOME, all those objects are
freed which basically recovers from the situation.

If, instead, you fill up some shared cache, buffer, etc. and NO
threads can get more memory, then you're basically toast.

Which of the above was it?

It looked more like the first one, though we still haven't tracked down the
cause.  We had several dozen threads running at the time.  That's
common for us.  It's not that unusual for us to have a couple of hundred
users with active sessions per server at any given time.

There are a bunch of things you can try to do. They all have their
caveats, failure scenarios, and inefficacies.

1. Use -XX:OnOutOfMemoryError="cmd args;cmd args"

Rig this to email you, register a passive-check data point with your
monitoring server, etc. Just remember that OOMEs happen for a number
of reasons. You could have run out of file handles or you could have
run out of heap space.

That looks interesting.  It wouldn't tell me about the error but at least I'd
know that there was an OOME.  Better than nothing and I can go check
catalina.out.  Of course, I still have the problem that threads silently fail
and show my users not so much as an error message.

2. Use JMX monitoring: set java.lang:MemoryPool/[heap
space]/UsageThreshold to whatever byte value you want to set as your
limit. Then, check java.lang:MemoryPool/[heap
space]/UsageThresholdExceeded to see if it is true. If so, your usage
threshold has been exceeded.

Note that this is not proof-positive that an OOME occurred. It's also
tough to tell what value to use for the threshold. You can't really
set it to MaxHeap - 1 byte, because you'll never get that value in
practice. If you set it too low, you'll get warnings all the time when
your heap usage rises in the normal course of business.
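For illustration, here is a rough sketch of option 2 done in-process rather than from a remote JMX client. It uses the standard java.lang.management API, whose MemoryPoolMXBean exposes the same UsageThreshold and UsageThresholdExceeded attributes. The class name, the method names, and the 90% figure are assumptions made up for this example, not a recommendation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper; the names and the fraction are illustrative only.
class HeapThresholdCheck {

    /**
     * Sets a usage threshold on every heap pool that supports one.
     * Returns the number of pools that were configured.
     */
    static int armThresholds(double fraction) {
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // pool no longer valid, or max is undefined (-1)
            }
            pool.setUsageThreshold((long) (usage.getMax() * fraction));
            armed++;
        }
        return armed;
    }

    /** True if any heap pool with a non-zero threshold currently reports it exceeded. */
    static boolean anyThresholdExceeded() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported()
                    && pool.getUsageThreshold() > 0
                    && pool.isUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumption: 90% of each pool's maximum; tune for the application.
        int armed = armThresholds(0.90);
        System.out.println("pools armed: " + armed);
        System.out.println("threshold exceeded: " + anyThresholdExceeded());
    }
}
```

Which pools support a usage threshold depends on the JVM and the collector in use, so the sketch skips any pool that does not.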

I'm less enthused about that one.

3. catch IOException in a filter and set an application attribute.
Check this attribute from your monitor.

I've been considering doing this, because I can rig it so that the
error handler does not actually require any memory to run. The problem
is that sometimes OOMEs interrupt one thread and not another. You may
not catch the OOME in that thread -- it may happen in a background
thread that does not go through the filter.
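A rough sketch of the idea in option 3, under two loudly stated assumptions: that the error of interest is OutOfMemoryError (the text above says IOException), and that a plain Runnable wrapper with a static flag stands in for the servlet filter and the application attribute so the example is self-contained. All names here are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a filter plus an application attribute.
class OomeFlag {

    // Created at class-load time, so the handler below allocates nothing.
    static final AtomicBoolean OOME_SEEN = new AtomicBoolean(false);

    /** Runs the task; on OutOfMemoryError it records the event and rethrows. */
    static void runGuarded(Runnable task) {
        try {
            task.run();
        } catch (OutOfMemoryError e) {
            OOME_SEEN.set(true); // flips an existing flag, no new objects
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            // Simulated error; a real OOME would come from the request itself.
            runGuarded(() -> { throw new OutOfMemoryError("simulated"); });
        } catch (OutOfMemoryError expected) {
            // The monitor reads the flag, not the exception.
        }
        System.out.println("OOME seen: " + OOME_SEEN.get());
    }
}
```

As noted above, this only sees errors raised on threads that pass through the wrapper; a background thread that runs out of memory would leave the flag untouched.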

I'm not sure I understand this one.  How does an IOException relate to an OOME?

4. You can do what I do: simply look at your total heap space by
inspecting java.lang:Memory/HeapMemoryUsage["used"] and set a
threshold that will cause your monitor to alarm for WARNING and
CRITICAL conditions. You may recover and not have to check anything.
These days, I get a false-alarm about once every 3 weeks when the heap
space grows a hair higher than usual before a full GC runs and clears
everything out.

The nice thing about #4 is that you can find out early if you *might*
be having a problem. Then you can keep an eye on your service to make
sure it "recovers". If it never OOME's, great. If it does, you can
manually restart or whatever. If it OOME's, and #1-#3 above fail
because memory might be required to actually execute the
do-this-thing-on-OOME action, then you might never get notified. With
#4, you don't have to wait until an OOME to take action.

Is there a way I can get to this from my heartbeat servlet?
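One possible approach, sketched under assumptions: the same HeapMemoryUsage figures should be readable from inside the JVM through the platform MemoryMXBean, so a servlet could call something like the helper below. The class name, method names, and the WARNING/CRITICAL fractions are invented for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper a heartbeat servlet could call; thresholds are made up.
class HeapStatus {

    static final double WARNING = 0.80;  // assumed fraction of max heap
    static final double CRITICAL = 0.95; // assumed fraction of max heap

    /** Maps used/max heap bytes to a monitoring status string. */
    static String classify(long used, long max) {
        if (max <= 0) {
            return "UNKNOWN"; // max is -1 when the JVM does not define it
        }
        double ratio = (double) used / max;
        if (ratio >= CRITICAL) {
            return "CRITICAL";
        }
        if (ratio >= WARNING) {
            return "WARNING";
        }
        return "OK";
    }

    /** Reads the current heap usage of this JVM and classifies it. */
    static String currentStatus() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return classify(heap.getUsed(), heap.getMax());
    }

    public static void main(String[] args) {
        System.out.println("heap status: " + currentStatus());
    }
}
```

The servlet would then write the result of HeapStatus.currentStatus() into its response. Because "used" includes garbage that has not been collected yet, an occasional false alarm just before a full GC is to be expected.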



