Ok, thank you, Ryan. I certainly can't claim any deep knowledge of GC; I
just read Sun's GC guide, which doesn't seem to draw that clear a
distinction between a full GC and a tenured-generation GC, and since then
I've acted on the examples it gives for figuring pause times -- which seem
to refer to tenured-generation GC only, I guess.

On Wed, Sep 29, 2010 at 1:54 PM, Ryan Rawson <ryano...@gmail.com> wrote:

> FullGC is not the CMS cycle... During a FullGC the entire main heap is
> rewritten and compacted.  This can happen due to fragmentation issues,
> and it can happen if the CMS cycle does not finish releasing enough
> memory before the minor GC needs to promote (aka a "concurrent mode
> failure").  This is NOT the same as the CMS cycle, and during a FullGC
> (rewriting/compacting the heap) all threads are indeed paused.
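>
> (For anyone who wants to watch for these, a minimal sketch of the
> standard HotSpot GC-logging flags -- the heap size and log path here
> are just examples, not recommendations:
>
>   java -Xmx8g -XX:+UseConcMarkSweepGC \
>        -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
>        -Xloggc:/var/log/hbase/gc-regionserver.log ...
>
> A FullGC of this kind shows up in the log tagged "concurrent mode
> failure" or "promotion failed", followed by a long stop-the-world
> Full GC entry.)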
>
> On a modern i7-based architecture with fast memory buses, I have heard
> people say that it is about 1 second per GB of heap: an 8 GB heap means
> roughly an 8-second pause.
>
> However, experience has shown that this can be substantially longer, up
> to 20 seconds, sometimes two 20-second pauses back to back, causing ZK
> to time out and the master to presume the RS is dead and recover its
> log.  At that point, when the RS recovers, it must terminate itself
> because the regions don't 'belong' to it anymore.
>
>
>
> On Wed, Sep 29, 2010 at 10:51 AM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
> >> Full GCs do happen. We have it at 40 seconds here.
> >
> > Jean-Daniel, is that total including the concurrent CMS phases?
> >
> > 40 seconds is a plausible number for a full CMS cycle, even more
> > plausible for i-CMS, so I assume that's what you are quoting here.
> >
> > But CMS doesn't pause the JVM for that much time. Most of that time is
> > spent in a single background thread. It only pauses the entire JVM for
> > the non-concurrent steps, which are supposed to be optimized to be very
> > short. In my experience the non-concurrent pauses are normally under
> > 200 ms, even for processes with very heavy memory use (~16 GB heap),
> > unless you are having a really bad case of GC thrashing. I have never
> > seen one actually log more than 200-300 ms for the non-concurrent
> > remark phase -- and that was a really bad case. I haven't checked
> > HBase's GC logs so far, though. If HBase does have non-concurrent
> > pauses any bigger than that, >1 s, that's kind of... unusual and
> > severe, for lack of a better word, for a real-time process, IMO.
> >
> > So the process can't go 'numb' for that long from ZK's point of view;
> > if it does, then that's a problem, IMO. It's more likely that such a
> > delay is associated with VM swap activity, but if GC can freeze HBase
> > that badly... that would be bad indeed from an acceptance point of
> > view.
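> >
> > (To measure the actual stop-the-world time -- the number ZK cares
> > about -- rather than the length of the concurrent cycle, something
> > like the following flags would do; the occupancy value here is
> > illustrative, not a tuned recommendation:
> >
> >   java -Xmx16g -XX:+UseConcMarkSweepGC \
> >        -XX:CMSInitiatingOccupancyFraction=70 \
> >        -XX:+UseCMSInitiatingOccupancyOnly \
> >        -XX:+PrintGCApplicationStoppedTime ...
> >
> > Starting the CMS cycle earlier, as the occupancy flags do, should
> > also make the fallback to a stop-the-world full GC less likely.)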
> >
> >
> > On Wed, Sep 29, 2010 at 9:37 AM, Jean-Daniel Cryans
> > <jdcry...@apache.org> wrote:
> >
> >> I'd say it mostly depends on your tolerance to regions being
> >> unavailable while the recovery happens. You have to account for the ZK
> >> timeout (60 secs by default), plus the time to split the logs (I don't
> >> have any good metric for that; usually it's kinda fast, but you should
> >> try it with your data), plus the time to reassign the regions and,
> >> when they open, the replay of the split logs. Some tips:
> >>
> >>  - A lower ZK timeout means you are more likely to have long GC
> >> pauses taking down your region server (well, the master assumes the RS
> >> is dead). Full GCs do happen. We have it at 40 seconds here.
> >>  - Smaller HLogs / fewer HLogs mean you will force-flush regions a lot
> >> more in order to keep the amount of data to be replayed low. Here we
> >> have hbase.regionserver.maxlogs=8, so we do flush a lot, but we serve
> >> a live website (see the config sketch below). In the future,
> >> distributed log splitting (once committed) will make this faster.
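> >>
> >> A minimal hbase-site.xml sketch of the two knobs above (the values
> >> shown are the ones discussed here, not universal recommendations):
> >>
> >>   <property>
> >>     <name>zookeeper.session.timeout</name>
> >>     <value>60000</value>  <!-- ms; the 60-second default -->
> >>   </property>
> >>   <property>
> >>     <name>hbase.regionserver.maxlogs</name>
> >>     <value>8</value>  <!-- force flushes sooner; less log to replay -->
> >>   </property>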
> >>
> >> J-D
> >>
> >> On Wed, Sep 29, 2010 at 6:12 AM, Daniel Einspanjer
> >> <deinspan...@mozilla.com> wrote:
> >> >  Question regarding configuration and tuning...
> >> >
> >> > Our current configuration/schema has fairly low HLog rollover sizes
> >> > to keep the possibility of data loss to a minimum.  When we upgrade
> >> > to 0.89 with append support, I imagine we'll be able to safely set
> >> > this to a much larger size.  Are there any rough guidelines for what
> >> > good values should be now?
> >> >
> >> > -Daniel
> >> >
> >> > On 9/28/10 6:13 PM, Buttler, David wrote:
> >> >>
> >> >> Fantastic news, I look forward to it
> >> >> Dave
> >> >>
> >> >> -----Original Message-----
> >> >> From: Todd Lipcon [mailto:t...@cloudera.com]
> >> >> Sent: Tuesday, September 28, 2010 11:25 AM
> >> >> To: user@hbase.apache.org
> >> >> Subject: Re: Upgrading 0.20.6 ->  0.89
> >> >>
> >> >> On Tue, Sep 28, 2010 at 9:35 AM, Buttler, David <buttl...@llnl.gov>
> >> >> wrote:
> >> >>
> >> >>> I currently suggest that you use the CDH3 Hadoop package.
> >> >>> Apparently StumbleUpon has a production version of 0.89 that they
> >> >>> are using.  It would be helpful if Cloudera put that in their
> >> >>> distribution.
> >> >>>
> >> >>>
> >> >> Working on it ;-) CDH3b3 should be available in about 2 weeks and
> >> >> will include an HBase version that's very similar to what
> >> >> StumbleUpon has published.
> >> >>
> >> >> -Todd
> >> >>
> >> >>>
> >> >>> -----Original Message-----
> >> >>> From: Mark Laffoon [mailto:mlaff...@semanticresearch.com]
> >> >>> Sent: Tuesday, September 28, 2010 8:00 AM
> >> >>> To: user@hbase.apache.org
> >> >>> Subject: Upgrading 0.20.6 ->  0.89
> >> >>>
> >> >>> We're using 0.20.6; we have a non-trivial application using many
> >> >>> aspects of HBase; we have a couple of customers in production; we
> >> >>> understand this is still pre-release, but we don't want to lose
> >> >>> any data.
> >> >>>
> >> >>>
> >> >>>
> >> >>> Will upgrading to 0.89 be a PITA?
> >> >>>
> >> >>> Should we expect to be able to upgrade the servers without losing
> >> >>> data?
> >> >>>
> >> >>> Will there be tons of client code changes?
> >> >>>
> >> >>> What about configuration changes (especially little changes that
> >> >>> will bite us)?
> >> >>>
> >> >>> Do we need/want to upgrade Hadoop at all (we're on 0.20.2)?
> >> >>>
> >> >>> If we do upgrade, what is the recommended package to get it from?
> >> >>>
> >> >>>
> >> >>>
> >> >>> Thanks in advance for any or all answers,
> >> >>>
> >> >>> Mark
> >> >>>
> >> >>>
> >> >>
> >> >
> >>
> >
>
