On 12/9/2013 3:12 PM, Christopher Schultz wrote:
Was it a transient error, or a chronic condition? A single thread can, for instance, spew objects into its stack or eden space, exhausting memory, but when that thread hits the OOME, all those objects are freed, which basically recovers from the situation. If, instead, you fill up some shared cache, buffer, etc. and NO threads can get more memory, then you're basically toast. Which of the above was it?
It looked more like the first one though we still haven't tracked down the cause. We had several dozen threads running at the time. That's common for us. It's not that unusual for us to have a couple of hundred users with active sessions per server at any given time.
There are a bunch of things you can try to do. They all have their caveats, failure scenarios, and inefficacies. 1. Use -XX:OnOutOfMemoryError="cmd args;cmd args" and rig it to email you, register a passive-check data point with your monitoring server, etc. Just remember that OOMEs happen for a number of reasons. You could have run out of file handles or you could have run out of heap space.
That looks interesting. It wouldn't tell me about the error but at least I'd know that there was an OOME. Better than nothing and I can go check catalina.out. Of course, I still have the problem that threads silently fail and show my users not so much as an error message.
2. Use JMX monitoring, set java.lang:MemoryPool/[heap space]/UsageThreshold to whatever byte value you want to set as your limit. Then, check java.lang:MemoryPool/[heap space]/UsageThresholdExceeded to see if it is true. If so, your usage threshold has been exceeded. Note that this is not proof-positive that an OOME occurred. It's also tough to tell what value to use for the threshold. You can't really set it to MaxHeap - 1 byte, because you'll never get that value in practice. If you set it too low, you'll get warnings all the time when your heap usage rises in the normal course of business.
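For reference, the same UsageThreshold and UsageThresholdExceeded attributes are exposed in-process through the standard java.lang.management API. The following is only a rough sketch: the class name PoolThresholdCheck, the method names, and the 90% fraction are made up for illustration, and which pools support a usage threshold depends on the JVM and garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Tomcat or the JDK.
final class PoolThresholdCheck {

    // True for heap pools on which a usage threshold can be set.
    private static boolean eligible(MemoryPoolMXBean pool) {
        return pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported();
    }

    // Sets the threshold to the given fraction (0 < fraction <= 1) of each
    // eligible pool's maximum size. Returns how many pools were configured.
    static int armHeapPools(double fraction) {
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!eligible(pool)) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            // getUsage() may return null for an invalid pool; getMax() is -1 when undefined.
            if (usage == null || usage.getMax() <= 0) {
                continue;
            }
            pool.setUsageThreshold((long) (usage.getMax() * fraction));
            armed++;
        }
        return armed;
    }

    // True if any eligible heap pool currently reports its threshold as exceeded.
    static boolean anyHeapPoolExceeded() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (eligible(pool) && pool.isUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int armed = armHeapPools(0.90); // 0.90 is an arbitrary example value
        System.out.println("pools armed: " + armed);
        System.out.println("threshold exceeded: " + anyHeapPoolExceeded());
    }
}
```

As noted above, a true result only says the threshold was crossed, not that an OOME happened.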
I'm less enthused about that one.
3. catch IOException in a filter and set an application attribute. Check this attribute from your monitor. I've been considering doing this, because I can rig it so that the error handler does not actually require any memory to run. The problem is that sometimes OOMEs interrupt one thread and not another. You may not catch the OOME in that thread -- it may happen in a background thread that does not go through the filter.
I'm not sure I understand this one. How does an IOException relate to an OOME?
4. You can do what I do: simply look at your total heap space by inspecting java.lang:Memory/HeapMemoryUsage["used"] and set a threshold that will cause your monitor to alarm for WARNING and CRITICAL conditions. You may recover and not have to check anything. These days, I get a false alarm about once every 3 weeks when the heap space grows a hair higher than usual before a full GC runs and clears everything out. The nice thing about #4 is that you can find out early if you *might* be having a problem. Then you can keep an eye on your service to make sure it "recovers". If it never OOME's, great. If it does, you can manually restart or whatever. If it OOME's, and #1-#3 above fail because memory might be required to actually execute the do-this-thing-on-OOME action, then you might never get notified. With #4, you don't have to wait until an OOME to take action.
Is there a way I can get to this from my heartbeat servlet?
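For what it's worth, the heap figure from #4 can also be read from inside the JVM through the platform MemoryMXBean, so a heartbeat servlet could call something along these lines. This is a minimal sketch, not a tested monitoring setup: the class name HeapHeartbeat, the Status enum, and the 0.80 and 0.95 ratios are assumptions made for illustration. Only the java.lang.management calls are standard.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper a heartbeat servlet could call; names are illustrative only.
final class HeapHeartbeat {

    enum Status { OK, WARNING, CRITICAL }

    // Classifies used/max against warning and critical ratios (warn <= crit).
    static Status classify(long used, long max, double warn, double crit) {
        if (max <= 0) {
            return Status.OK; // max is -1 when the JVM does not define a heap limit
        }
        double ratio = (double) used / max;
        if (ratio >= crit) {
            return Status.CRITICAL;
        }
        if (ratio >= warn) {
            return Status.WARNING;
        }
        return Status.OK;
    }

    // Reads the same values that JMX exposes as java.lang:type=Memory HeapMemoryUsage.
    static Status currentStatus(double warn, double crit) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return classify(heap.getUsed(), heap.getMax(), warn, crit);
    }

    public static void main(String[] args) {
        // Example thresholds only; pick values that fit your own heap behaviour.
        System.out.println(currentStatus(0.80, 0.95));
    }
}
```

The servlet would then write the returned status into its response for the monitor to parse. Since "used" rises until a full GC runs, a single WARNING reading is not by itself a sign of trouble.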