I spent time last week trying to tune the parallel GC to prevent any
objects from reaching OldGen once the broker was up and running in a
steady state, in order to avoid expensive full GCs.  My goal was zero
full GCs for a broker with 3-6 months of uptime, so that GC pauses would
never cause clients and other brokers to fail over from one broker to
another.

I increased the size of NewGen relative to OldGen, increased the size of
Survivor relative to Eden, and tweaked a few other settings, but I was
never able to stop a slow stream of objects from making it into OldGen,
objects that were already dead by the time a full GC happened (usually
because I triggered one manually).  I did cut the rate of object
promotion roughly in half, and a full GC should be less painful when
OldGen is only 5-10% of the total heap, so the changes should make full
GCs less frequent and less painful, but I wasn't able to eliminate them
entirely.
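
For reference, the knobs I was turning were along these lines.  The heap
size and exact ratios below are illustrative placeholders rather than
recommendations, so check them against your own GC logs:

  -Xms4g -Xmx4g               (whatever heap your broker actually needs)
  -XX:+UseParallelGC -XX:+UseParallelOldGC
  -XX:NewRatio=1              (NewGen:OldGen = 1:1 instead of the 1:2 default)
  -XX:SurvivorRatio=4         (bigger Survivor spaces; the default is 8)
  -XX:MaxTenuringThreshold=15 (hold objects in Survivor as long as possible)
  -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -Xloggc:gc.log

The PrintTenuringDistribution output is what tells you whether objects
are dying in the Survivor spaces or still being promoted to OldGen.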

So I've given up on the parallel GC and am now tweaking G1 to make it
behave as we'd like, and so far the results are far more promising.  I
second Ulrich's recommendation to use G1 rather than the parallel GC if
you're more interested in avoiding occasional lengthy pauses due to full
GCs than in getting the highest possible throughput from your broker,
even though the overhead of G1 is several times that of the parallel GC.
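
If anyone wants to try the same experiment, the G1 configuration I'm
starting from is deliberately minimal (200ms is just G1's default pause
target; treat the values as a sketch, not a recommendation):

  -Xms4g -Xmx4g
  -XX:+UseG1GC
  -XX:MaxGCPauseMillis=200    (the pause-time target G1 tries to meet)
  -XX:+PrintGCDetails -Xloggc:gc.log

Note that G1 sizes the generations dynamically to hit its pause target,
so the NewRatio/SurvivorRatio tuning from the parallel-GC experiments
shouldn't be carried over.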

On Tue, Oct 21, 2014 at 10:13 AM, Tim Bain <tb...@alumni.duke.edu> wrote:

> G1GC is great for reducing the duration of any single stop-the-world GC
> (and hence minimizing latency of any individual operation as well as
> avoiding timeouts), but the total time spent performing GCs (and hence the
> total amount of time the brokers are paused) is several times that of the
> parallel GC algorithm, based on some articles I read a couple weeks back.
> So although G1GC should work for a wide range (possibly all) of ActiveMQ
> memory usage patterns and may be the right option for you based on how your
> broker is used, you may get better overall throughput from sticking with
> ParallelGC but adjusting the ratio of YoungGen to OldGen to favor YoungGen
> (increasing the odds that a message gets GC'ed before it gets to OldGen)
> and the ratio of Eden to Survivor within YoungGen to favor Survivor (to
> increase the odds that a message can stick around in YoungGen long enough
> to die before it gets promoted to OldGen).  But you have to be confident
> that your usage patterns won't allow OldGen to fill during the life of your
> broker's uptime (whether that's hours or years), otherwise you'll end up
> doing a long full GC and you'd probably have been better off going with
> G1GC.
>
> For our broker, we expire undelivered messages quickly (under a minute),
> so in theory expanding both YoungGen and Survivor might prevent anything
> from getting into OldGen and thus prevent long full GCs.  I'm actually
> going to be doing this tuning this week, so I'll report out what I find,
> though obviously YMMV since everyone's message usage patterns are different.
>
> On Tue, Oct 21, 2014 at 5:25 AM, uromahn <ulr...@ulrichromahn.net> wrote:
>
>> Another update:
>>
>> I ran the broker with the pure-Java LevelDB and found that I am still
>> seeing the warnings in the log file as reported before.
>>
>> However, to my surprise the broker seems to perform better and even
>> slightly faster!  I always thought the native LevelDB should be
>> faster, but I guess the access via JNI may be less optimal than using
>> an embedded Java (or Scala) engine.
