Hey all, I spent some time this afternoon looking into this issue and think I found a good culprit: https://issues.apache.org/jira/browse/HBASE-3483
If you've been having this problem, please watch that JIRA for a patch to try (the one up there now is OK but not great).

-Todd

On Fri, Jan 14, 2011 at 10:06 AM, Geoff Hendrey <ghend...@decarta.com> wrote:

> This is not an answer to your question, but just an anecdote on cluster pauses/slowdowns. We had horrible problems with cluster-wide pauses. I think there were several keys to getting this resolved:
>
> 1) We used the default settings recommended for bulk inserts: http://people.apache.org/~jdcryans/HUG8/HUG8-rawson.pdf
> 2) We upgraded to HBase 0.20.6 because there was a deadlock bug in prior versions that basically just caused the entire cluster to "go to sleep".
>
> Finally, we had a very strange problem which took 3 weeks of debugging to get to the bottom of. I don't expect that this is your problem, but I'll just throw it out there. Most bulk HBase data-producing M/R jobs are going to do some processing, then write the data from the reducer into HBase (using autoflush=false and disabling the WAL). Since the reducers all receive keys in the same order, this causes all the reducers to load the same HBase region simultaneously. We had this "great idea" that if we reversed the keys that we wrote out of our mapper, then un-reversed them in the reducer, our reducers would be randomly writing to different region servers, not hitting a single region in lock step. Now, I have some theories on why this seemingly innocuous approach repeatedly destroyed our entire HBase database. I won't wax philosophical here, but one thing is certain: any table created via batch inserts of randomized keys got totally hosed. Scans became dirt slow and compactions ran constantly, even *days* after the table was created. None of these problems made a whole lot of sense, which is why it took 3-4 weeks of debugging for us to back this "key randomizing" out of our code. The hosed tables actually had to be dropped for the problem, and the ensuing chaos, to totally abate. Until we dropped the tables, the region server logs showed constant compaction. Like I said, it sounds crazy, but this definitely was the cause of our problem. I'm fully expecting a lot of "you're crazy" responses to this email, but we repeatedly reproduced the issue, and the fix was to stop the "key reversing". We just had to live with all the reducers loading individual regions in lock step, as this was really not a big deal (at least not as big a deal as hosing the entire installation).
>
> -g
>
> -----Original Message-----
> From: c...@tarnas.org [mailto:c...@tarnas.org] On Behalf Of Christopher Tarnas
> Sent: Friday, January 14, 2011 9:54 AM
> To: user@hbase.apache.org
> Subject: Re: Cluster Wide Pauses
>
> Thanks - I was not sure and had not received a response from the list on my related question earlier this week.
>
> It does seem like compactions are related to my problem, and if I understand correctly, does raising hbase.hregion.memstore.block.multiplier give it more of a buffer before writes are blocked while compactions happen? I'm writing via Thrift (about 30 clients) to a 5-node cluster when I see this problem. There is no I/O wait so I don't think it is disk bound, and it is not CPU starved. I'm waiting on IT to get me access to Ganglia for the network info.
>
> -chris
>
> On Fri, Jan 14, 2011 at 11:29 AM, Jonathan Gray <jg...@fb.com> wrote:
>
> > These are a different kind of pause (those caused by blockingStoreFiles).
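
For illustration only, here is a minimal 0.90-era sketch of the "key reversal" pattern Geoff describes above (the pattern his team ultimately backed out). The class name, column family, qualifier, and input layout are invented for the example; a real job would also need driver/setup code.

    // Hypothetical sketch: the mapper emits reversed row keys so sorted reducer
    // input spreads across regions; the reducer un-reverses the key before writing
    // to HBase, so the stored data is unchanged. Uses the 0.90-era client API.
    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class KeyReversalExample {

      public static class ReversingMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          // Assume each input line is "<rowkey>\t<value>".
          String[] parts = value.toString().split("\t", 2);
          String reversedKey = new StringBuilder(parts[0]).reverse().toString();
          context.write(new Text(reversedKey), new Text(parts[1]));
        }
      }

      public static class UnreversingReducer extends TableReducer<Text, Text, Text> {
        @Override
        protected void reduce(Text reversedKey, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
          // Restore the original key before writing.
          String rowKey = new StringBuilder(reversedKey.toString()).reverse().toString();
          for (Text v : values) {
            Put put = new Put(Bytes.toBytes(rowKey));
            put.setWriteToWAL(false); // matching the bulk-load setup described above
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(v.toString()));
            context.write(new Text(rowKey), put);
          }
        }
      }
    }

The net effect is that the reducers' sorted input keys no longer map to one region in lock step, even though the row keys actually stored are unchanged, which is exactly the behavior Geoff's team was after before the approach backfired.
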
> > This is HBase stepping in and actually blocking updates to a region because compactions have not been able to keep up with the write load. It could manifest itself in the same way, but this is different than shorter pauses caused by periodic offlining of regions during balancing and splits.
> >
> > Wayne, have you confirmed in your RegionServer logs that the pauses are associated with splits or region movement, and that you are not seeing the blocking store files issue?
> >
> > JG
> >
> > > -----Original Message-----
> > > From: c...@tarnas.org [mailto:c...@tarnas.org] On Behalf Of Christopher Tarnas
> > > Sent: Friday, January 14, 2011 7:29 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: Cluster Wide Pauses
> > >
> > > I have been seeing similar problems and found that by raising the hbase.hregion.memstore.block.multiplier to above 12 (default is two) and the hbase.hstore.blockingStoreFiles to 16, I managed to reduce the frequency of the pauses during loads. My nodes are pretty beefy (48 GB of RAM) so I had room to experiment.
> > >
> > > From what I understand, that gave the regionservers more buffer before they had to halt the world to catch up. The pauses still happen but their impact is less now.
> > >
> > > -chris
> > >
> > > On Fri, Jan 14, 2011 at 8:34 AM, Wayne <wav...@gmail.com> wrote:
> > >
> > > > We have not found any smoking gun here. Most likely these are region splits on a quickly growing/hot region that all clients get caught waiting for.
> > > >
> > > > On Thu, Jan 13, 2011 at 7:49 AM, Wayne <wav...@gmail.com> wrote:
> > > >
> > > > > Thank you for the lead! We will definitely look closer at the OS logs.
> > > > >
> > > > > On Thu, Jan 13, 2011 at 6:59 AM, Tatsuya Kawano <tatsuya6...@gmail.com> wrote:
> > > > >
> > > > >> Hi Wayne,
> > > > >>
> > > > >> > We are seeing some TCP Resets on all nodes at the same time, and sometimes quite a lot of them.
> > > > >>
> > > > >> Have you checked this article from Andrei and Cosmin? They had a busy firewall causing a network blackout.
> > > > >>
> > > > >> http://hstack.org/hbase-performance-testing/
> > > > >>
> > > > >> Maybe it's not your case, but just to be sure.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> --
> > > > >> Tatsuya Kawano (Mr.)
> > > > >> Tokyo, Japan
> > > > >>
> > > > >> On Jan 13, 2011, at 4:52 AM, Wayne <wav...@gmail.com> wrote:
> > > > >>
> > > > >> > We are seeing some TCP Resets on all nodes at the same time, and sometimes quite a lot of them. We have yet to correlate the pauses to the TCP resets, but I am starting to wonder if this is partly a network problem. Does Gigabit Ethernet break down on high volume nodes? Do high volume nodes use 10G or InfiniBand?
> > > > >> >
> > > > >> > On Wed, Jan 12, 2011 at 1:52 PM, Stack <st...@duboce.net> wrote:
> > > > >> >
> > > > >> >> Jon asks that you describe your loading in the issue. Would you mind doing so? Ted, stick up in the issue the workload and configs you are running if you don't mind. I'd like to try it over here.
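
For reference, the two settings Christopher mentions above are regionserver-side and would normally be raised in hbase-site.xml (with a regionserver restart to take effect); the values below are simply the ones he reports trying, not a recommendation:

    <!-- hbase-site.xml on the region servers -->
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <value>12</value>
      <!-- Block updates when a memstore reaches multiplier * flush size; the default was 2. -->
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <value>16</value>
      <!-- Block updates to a region when any store has more than this many store files,
           until compaction catches up or the blocking wait time expires. -->
    </property>
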
> > > > >> >> Thanks lads,
> > > > >> >> St.Ack
> > > > >> >>
> > > > >> >> On Wed, Jan 12, 2011 at 9:03 AM, Wayne <wav...@gmail.com> wrote:
> > > > >> >>
> > > > >> >>> Added: https://issues.apache.org/jira/browse/HBASE-3438.
> > > > >> >>>
> > > > >> >>> On Wed, Jan 12, 2011 at 11:40 AM, Wayne <wav...@gmail.com> wrote:
> > > > >> >>>
> > > > >> >>>> We are using 0.89.20100924, r1001068.
> > > > >> >>>>
> > > > >> >>>> We are seeing it during heavy write load (which is all the time), but yesterday we had read load as well as write load and saw both reads and writes stop for 10+ seconds. The region size is the biggest clue we have found from our tests: setting up a new cluster with a 1GB max region size and starting to load heavily, we will see this a lot, for long time frames. Maybe the bigger file gets hung up more easily with a split? Your description below also fits, in that early on the load is not balanced, so it is easier to stop everything on one node while the balance is not great. I will file a JIRA. I will also try to dig deeper into the logs during the pauses to find a node that might be stuck in a split.
> > > > >> >>>>
> > > > >> >>>> On Wed, Jan 12, 2011 at 11:17 AM, Stack <st...@duboce.net> wrote:
> > > > >> >>>>
> > > > >> >>>>> On Tue, Jan 11, 2011 at 2:34 PM, Wayne <wav...@gmail.com> wrote:
> > > > >> >>>>>> We have very frequent cluster wide pauses that stop all reads and writes for seconds.
> > > > >> >>>>>
> > > > >> >>>>> All reads and all writes?
> > > > >> >>>>>
> > > > >> >>>>> I've seen the pause too for writes. It's something I've always meant to look into. Friso postulates one cause. Another that we've talked of is a region taking a while to come back online after a split or a rebalance for whatever reason. Client loading might be 'random', spraying over lots of random regions, but they all get stuck waiting on one particular region to come back online.
> > > > >> >>>>>
> > > > >> >>>>> I suppose reads could be blocked for the same reason if all are trying to read from the offlined region.
> > > > >> >>>>>
> > > > >> >>>>> What version of hbase are you using? Splits should be faster in 0.90 now that the split daughters come up on the same regionserver.
> > > > >> >>>>>
> > > > >> >>>>> Sorry I don't have a better answer for you. Need to dig in.
> > > > >> >>>>>
> > > > >> >>>>> File a JIRA. If you want to help out some, stick some data up in it. Some suggestions would be to enable logging of when we look up region locations in the client and then note when requests go to zero.
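
A hedged sketch of Stack's suggestion about logging client-side region lookups, assuming a log4j-configured client as was typical at the time; exactly which lookup messages appear at DEBUG depends on the HBase version:

    # log4j.properties on the client side (hypothetical example):
    # surface region location lookups, cache hits/misses and retries from the HBase client
    log4j.logger.org.apache.hadoop.hbase.client=DEBUG
    # optionally also the RPC layer, to see which servers calls are going to
    log4j.logger.org.apache.hadoop.hbase.ipc=DEBUG
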
> > > > >> >>>>> Can you figure out what region the clients are waiting on (if they are waiting on any)? If you can pull out a particular one, try and elicit its history at the time of blockage. Is it being moved or mid-split? I suppose it makes sense that bigger regions would make the situation 'worse'. I can take a look at it too.
> > > > >> >>>>>
> > > > >> >>>>> St.Ack
> > > > >> >>>>>
> > > > >> >>>>>> We are constantly loading data to this cluster of 10 nodes. These pauses can happen as frequently as every minute but sometimes are not seen for 15+ minutes. Basically, watching the region server list with request counts is the only evidence of what is going on. All reads and writes totally stop, and if there is ever any activity it is on the node hosting the .META. table, with a request count of region count + 1. This problem seems to be worse with a larger region size. We tried a 1GB region size and saw this more than we saw actual activity (and stopped using a larger region size because of it). We went back to the default region size and it was better, but we had too many regions, so now we are up to 512M for a region size and we are seeing it more again.
> > > > >> >>>>>>
> > > > >> >>>>>> Does anyone know what this is? We have dug into all of the logs to find some sort of pause but are not able to find anything. Is this a WAL (HLog) roll? Is this a region split or compaction? Of course our biggest fear is a GC pause on the master, but we do not have Java GC logging turned on for the master to tell. What could possibly stop the entire cluster from working for seconds at a time, very frequently?
> > > > >> >>>>>>
> > > > >> >>>>>> Thanks in advance for any ideas of what could be causing this.

--
Todd Lipcon
Software Engineer, Cloudera
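
On Wayne's note about not having GC logging enabled for the master: a minimal sketch, assuming the stock conf/hbase-env.sh and bin/hbase scripts of that era (the log path is a placeholder), is to add the standard JVM GC flags and restart the master:

    # conf/hbase-env.sh (sketch; adjust the log path for your installation)
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"

A long stop-the-world pause in the resulting GC log that lines up with a cluster-wide stall would point at the JVM rather than splits or compactions.
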