Agreed. We’re still bringing up some instrumentation using Grafana/KairosDB so my plan is to get some GC logs and analyze this over time and look at memory as it grows.
If I can’t get that working in the next few months I might just use data dog as I know they have a decent JVM memory interface. Kevin On Tue, Dec 30, 2014 at 1:42 PM, Tim Bain <tb...@alumni.duke.edu> wrote: > That sounds like a plausible explanation of why you DLQ filled in that > situation. The more interesting question is what caused the GCing > behavior. > > I assume you don't have any GC logs/memory dumps to do forensics against, > so you're stuck trying to reproduce the problem so you can have something > to analyze? If so, remember that you don't have to actually get to the > point of doing continual GCs (which might take weeks); it's enough to see > that the amount of Old Gen usage observed immediately after you force a > full GC (e.g. from the Memory tab of JConsole) increases between successive > full GCs over a non-trivial time period. Then just analyze what's causing > the increase and that should be your answer. > On Dec 30, 2014 2:06 PM, "Kevin Burton" <bur...@spinn3r.com> wrote: > > > I think I figured out a realistic theory as to why my broker died last > week > > and ended up with my entire queue moving to the dead letter queue. > > > > The problem happens when ActiveMQ goes into a full-GC loop where it > > continually spends 100% CPU doing full -GCs. > > > > messages are still served, but only say 1% of them (because there’s no > CPU > > left to do much work). > > > > The problem comes from sessions or client acks where you have to commit() > > or ack a message manually. > > > > This never happens. Because the broker is spending all time doing GCs. > > > > So the 1% of messages that leave, only have a 1% chance of being ack’d… > > which means that 99% of messages that are served by the broker aren’t > > ack’d. > > > > This keeps happening slowly and your entire queue will be drained with > > messages going into the DLQ. > > > > It would probably end up with around say a small percentage of your > > messages being processed. But the majority would end up in the DLQ. > > > > > > > > -- > > > > Founder/CEO Spinn3r.com > > Location: *San Francisco, CA* > > blog: http://burtonator.wordpress.com > > … or check out my Google+ profile > > <https://plus.google.com/102718274791889610666/posts> > > <http://spinn3r.com> > > > -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile <https://plus.google.com/102718274791889610666/posts> <http://spinn3r.com>