Hey all, I spent some time this afternoon looking into this issue and think I found a good culprit: https://issues.apache.org/jira/browse/HBASE-3483
If you've been having this problem, please watch that JIRA for a patch to try (the one up there now is OK but not great).

-Todd

On Fri, Jan 14, 2011 at 10:06 AM, Geoff Hendrey <ghend...@decarta.com> wrote:

> This is not an answer to your question, but just an anecdote on cluster pauses/slowdowns. We had horrible problems with cluster-wide pauses. I think there were several keys to getting this resolved:
>
> 1) We used the default settings recommended for bulk inserts: http://people.apache.org/~jdcryans/HUG8/HUG8-rawson.pdf
> 2) We upgraded to HBase 0.20.6 because there was a deadlock bug in prior versions that basically just caused the entire cluster to "go to sleep".
>
> Finally, we had a very strange problem which took 3 weeks of debugging to get to the bottom of. I don't expect that this is your problem, but I'll just throw it out there. Most bulk HBase data-producing M/R jobs are going to do some processing, then write the data from the reducer into HBase (using autoflush=false and disabling the WAL). Since the reducers all receive keys in the same order, this causes all the reducers to load the same HBase region simultaneously. We had this "great idea" that if we reversed the keys that we wrote out of our mapper, then un-reversed them in the reducer, our reducers would be randomly writing to different region servers, not hitting a single region in lock step. Now, I have some theories on why this seemingly innocuous approach repeatedly destroyed our entire HBase database. I won't wax philosophical here, but one thing is certain: any table created via batch inserts of randomized keys got totally hosed. Scans became dirt slow and compactions ran constantly, even *days* after the table was created. None of these problems made a whole lot of sense, which is why it took 3-4 weeks of debugging for us to back this "key randomizing" out of our code. The hosed tables actually had to be dropped for the problem, and the ensuing chaos, to totally abate. Until we dropped the tables, the region server logs showed constant compaction. Like I said, it sounds crazy, but this definitely was the cause of our problem. I'm fully expecting a lot of "you're crazy" responses to this email, but we repeatedly reproduced the issue, and the fix was to stop the "key reversing". We just had to live with all the reducers loading individual regions in lock step, as this was really not a big deal (at least not as big a deal as hosing the entire installation).
>
> -g
>
> -----Original Message-----
> From: c...@tarnas.org [mailto:c...@tarnas.org] On Behalf Of Christopher Tarnas
> Sent: Friday, January 14, 2011 9:54 AM
> To: user@hbase.apache.org
> Subject: Re: Cluster Wide Pauses
>
> Thanks - I was not sure and had not received a response from the list on my related question earlier this week.
>
> It does seem like compactions are related to my problem, and if I understand correctly, does raising hbase.hregion.memstore.block.multiplier give it more of a buffer before writes are blocked while compactions happen? I'm writing via Thrift (about 30 clients) to a 5-node cluster when I see this problem. There is no I/O wait so I don't think it is disk bound, and it is not CPU starved. I'm waiting on IT to get me access to Ganglia for the network info.
>
> -chris
>
> On Fri, Jan 14, 2011 at 11:29 AM, Jonathan Gray <jg...@fb.com> wrote:
>
> > These are a different kind of pause (those caused by blockingStoreFiles).
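
For illustration only, here is a minimal 0.90-era sketch of the "key reversal" pattern Geoff describes above (the pattern his team ultimately backed out). The class name, column family, qualifier, and input layout are invented for the example; a real job would also need driver/setup code.

    // Hypothetical sketch: the mapper emits reversed row keys so sorted reducer
    // input spreads across regions; the reducer un-reverses the key before writing
    // to HBase, so the stored data is unchanged. Uses the 0.90-era client API.
    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class KeyReversalExample {

      public static class ReversingMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          // Assume each input line is "<rowkey>\t<value>".
          String[] parts = value.toString().split("\t", 2);
          String reversedKey = new StringBuilder(parts[0]).reverse().toString();
          context.write(new Text(reversedKey), new Text(parts[1]));
        }
      }

      public static class UnreversingReducer extends TableReducer<Text, Text, Text> {
        @Override
        protected void reduce(Text reversedKey, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
          // Restore the original key before writing.
          String rowKey = new StringBuilder(reversedKey.toString()).reverse().toString();
          for (Text v : values) {
            Put put = new Put(Bytes.toBytes(rowKey));
            put.setWriteToWAL(false); // matching the bulk-load setup described above
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(v.toString()));
            context.write(new Text(rowKey), put);
          }
        }
      }
    }

The net effect is that the reducers' sorted input keys no longer map to one region in lock step, even though the row keys actually stored are unchanged, which is exactly the behavior Geoff's team was after before the approach backfired.
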
> > This is HBase stepping in and actually blocking updates to a region because compactions have not been able to keep up with the write load. It could manifest itself in the same way, but this is different than shorter pauses caused by periodic offlining of regions during balancing and splits.
> >
> > Wayne, have you confirmed in your RegionServer logs that the pauses are associated with splits or region movement, and that you are not seeing the blocking store files issue?
> >
> > JG
> >
> > > -----Original Message-----
> > > From: c...@tarnas.org [mailto:c...@tarnas.org] On Behalf Of Christopher Tarnas
> > > Sent: Friday, January 14, 2011 7:29 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: Cluster Wide Pauses
> > >
> > > I have been seeing similar problems and found that by raising the hbase.hregion.memstore.block.multiplier to above 12 (default is two) and the hbase.hstore.blockingStoreFiles to 16, I managed to reduce the frequency of the pauses during loads. My nodes are pretty beefy (48 GB of RAM) so I had room to experiment.
> > >
> > > From what I understand, that gave the regionservers more buffer before they had to halt the world to catch up. The pauses still happen but their impact is less now.
> > >
> > > -chris
> > >
> > > On Fri, Jan 14, 2011 at 8:34 AM, Wayne <wav...@gmail.com> wrote:
> > >
> > > > We have not found any smoking gun here. Most likely these are region splits on a quickly growing/hot region that all clients get caught waiting for.
> > > >
> > > > On Thu, Jan 13, 2011 at 7:49 AM, Wayne <wav...@gmail.com> wrote:
> > > >
> > > > > Thank you for the lead! We will definitely look closer at the OS logs.
> > > > >
> > > > > On Thu, Jan 13, 2011 at 6:59 AM, Tatsuya Kawano <tatsuya6...@gmail.com> wrote:
> > > > >
> > > > >> Hi Wayne,
> > > > >>
> > > > >> > We are seeing some TCP Resets on all nodes at the same time, and sometimes quite a lot of them.
> > > > >>
> > > > >> Have you checked this article from Andrei and Cosmin? They had a busy firewall causing a network blackout.
> > > > >>
> > > > >> http://hstack.org/hbase-performance-testing/
> > > > >>
> > > > >> Maybe it's not your case, but just to be sure.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> --
> > > > >> Tatsuya Kawano (Mr.)
> > > > >> Tokyo, Japan
> > > > >>
> > > > >> On Jan 13, 2011, at 4:52 AM, Wayne <wav...@gmail.com> wrote:
> > > > >>
> > > > >> > We are seeing some TCP Resets on all nodes at the same time, and sometimes quite a lot of them. We have yet to correlate the pauses to the TCP resets, but I am starting to wonder if this is partly a network problem. Does Gigabit Ethernet break down on high volume nodes? Do high volume nodes use 10G or InfiniBand?
> > > > >> >
> > > > >> > On Wed, Jan 12, 2011 at 1:52 PM, Stack <st...@duboce.net> wrote:
> > > > >> >
> > > > >> >> Jon asks that you describe your loading in the issue. Would you mind doing so? Ted, stick up in the issue the workload and configs you are running if you don't mind. I'd like to try it over here.
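
For reference, the two settings Christopher mentions above are regionserver-side and would normally be raised in hbase-site.xml (with a regionserver restart to take effect); the values below are simply the ones he reports trying, not a recommendation:

    <!-- hbase-site.xml on the region servers -->
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <value>12</value>
      <!-- Block updates when a memstore reaches multiplier * flush size; the default was 2. -->
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <value>16</value>
      <!-- Block updates to a region when any store has more than this many store files,
           until compaction catches up or the blocking wait time expires. -->
    </property>
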
> > > > >> >> Thanks lads,
> > > > >> >> St.Ack
> > > > >> >>
> > > > >> >> On Wed, Jan 12, 2011 at 9:03 AM, Wayne <wav...@gmail.com> wrote:
> > > > >> >>
> > > > >> >>> Added: https://issues.apache.org/jira/browse/HBASE-3438.
> > > > >> >>>
> > > > >> >>> On Wed, Jan 12, 2011 at 11:40 AM, Wayne <wav...@gmail.com> wrote:
> > > > >> >>>
> > > > >> >>>> We are using 0.89.20100924, r1001068.
> > > > >> >>>>
> > > > >> >>>> We are seeing it during heavy write load (which is all the time), but yesterday we had read load as well as write load and saw both reads and writes stop for 10+ seconds. The region size is the biggest clue we have found from our tests: setting up a new cluster with a 1GB max region size and starting to load heavily, we will see this a lot, for long time frames. Maybe the bigger file gets hung up more easily with a split? Your description below also fits, in that early on the load is not balanced, so it is easier to stop everything on one node while the balance is not great. I will file a JIRA. I will also try to dig deeper into the logs during the pauses to find a node that might be stuck in a split.
> > > > >> >>>>
> > > > >> >>>> On Wed, Jan 12, 2011 at 11:17 AM, Stack <st...@duboce.net> wrote:
> > > > >> >>>>
> > > > >> >>>>> On Tue, Jan 11, 2011 at 2:34 PM, Wayne <wav...@gmail.com> wrote:
> > > > >> >>>>>> We have very frequent cluster wide pauses that stop all reads and writes for seconds.
> > > > >> >>>>>
> > > > >> >>>>> All reads and all writes?
> > > > >> >>>>>
> > > > >> >>>>> I've seen the pause too for writes. It's something I've always meant to look into. Friso postulates one cause. Another that we've talked of is a region taking a while to come back online after a split or a rebalance for whatever reason. Client loading might be 'random', spraying over lots of random regions, but they all get stuck waiting on one particular region to come back online.
> > > > >> >>>>>
> > > > >> >>>>> I suppose reads could be blocked for the same reason if all are trying to read from the offlined region.
> > > > >> >>>>>
> > > > >> >>>>> What version of hbase are you using? Splits should be faster in 0.90 now that the split daughters come up on the same regionserver.
> > > > >> >>>>>
> > > > >> >>>>> Sorry I don't have a better answer for you. Need to dig in.
> > > > >> >>>>>
> > > > >> >>>>> File a JIRA. If you want to help out some, stick some data up in it. Some suggestions would be to enable logging of when we look up region locations in the client and then note when requests go to zero.
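
A hedged sketch of Stack's suggestion about logging client-side region lookups, assuming a log4j-configured client as was typical at the time; exactly which lookup messages appear at DEBUG depends on the HBase version:

    # log4j.properties on the client side (hypothetical example):
    # surface region location lookups, cache hits/misses and retries from the HBase client
    log4j.logger.org.apache.hadoop.hbase.client=DEBUG
    # optionally also the RPC layer, to see which servers calls are going to
    log4j.logger.org.apache.hadoop.hbase.ipc=DEBUG
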
> > > > >> >>>>> Can you figure out what region the clients are waiting on (if they are waiting on any)? If you can pull out a particular one, try and elicit its history at the time of blockage. Is it being moved or mid-split? I suppose it makes sense that bigger regions would make the situation 'worse'. I can take a look at it too.
> > > > >> >>>>>
> > > > >> >>>>> St.Ack
> > > > >> >>>>>
> > > > >> >>>>>> We are constantly loading data to this cluster of 10 nodes. These pauses can happen as frequently as every minute but sometimes are not seen for 15+ minutes. Basically, watching the region server list with request counts is the only evidence of what is going on. All reads and writes totally stop, and if there is ever any activity it is on the node hosting the .META. table, with a request count of region count + 1. This problem seems to be worse with a larger region size. We tried a 1GB region size and saw this more than we saw actual activity (and stopped using a larger region size because of it). We went back to the default region size and it was better, but we had too many regions, so now we are up to 512M for a region size and we are seeing it more again.
> > > > >> >>>>>>
> > > > >> >>>>>> Does anyone know what this is? We have dug into all of the logs to find some sort of pause but are not able to find anything. Is this a WAL (HLog) roll? Is this a region split or compaction? Of course our biggest fear is a GC pause on the master, but we do not have Java GC logging turned on for the master to tell. What could possibly stop the entire cluster from working for seconds at a time, very frequently?
> > > > >> >>>>>>
> > > > >> >>>>>> Thanks in advance for any ideas of what could be causing this.

--
Todd Lipcon
Software Engineer, Cloudera
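
On Wayne's note about not having GC logging enabled for the master: a minimal sketch, assuming the stock conf/hbase-env.sh and bin/hbase scripts of that era (the log path is a placeholder), is to add the standard JVM GC flags and restart the master:

    # conf/hbase-env.sh (sketch; adjust the log path for your installation)
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"

A long stop-the-world pause in the resulting GC log that lines up with a cluster-wide stall would point at the JVM rather than splits or compactions.
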