After discussing your issue on the IRC channel, I think the biggest
problem is probably all the compactions taking place on all the small
files that are generated. If all your flushed files are around 100KB,
it will take a huge number of compactions before any of your families
reaches 256MB, which basically means rewriting your data internally
hundreds of times. We are wondering what could be done to ease that
problem, like flushing each family individually, but that wouldn't be
an ideal situation either and it would require a lot of work.

In the meantime, I would recommend either using a very small number
of families or, if you really need to have 20 families, loading them
one by one. You'll be impressed by the speedup.
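
To illustrate the "load them one by one" idea, here's a minimal sketch
(not code from your job; the table, family and record types below are
made up) that does one full import pass per family with the standard
client API:

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class OneFamilyAtATime {
  // One import pass restricted to a single family. Run it once per
  // family instead of writing all 20 families in the same Put.
  static void loadFamily(HTable table, byte[] family,
      Iterable<Record> records) throws IOException {
    for (Record r : records) {
      Put put = new Put(r.rowKey());
      // Only this family's columns go into the Put, so during this pass
      // each region fills one memstore to a decent size instead of 20
      // tiny ones.
      for (Column c : r.columnsOf(family)) {
        put.add(family, c.qualifier(), c.value());
      }
      table.put(put);
    }
  }

  // Placeholders for however your import job models its input.
  interface Record { byte[] rowKey(); Iterable<Column> columnsOf(byte[] family); }
  interface Column { byte[] qualifier(); byte[] value(); }
}

During each pass only one memstore per region is filling up, so the
flushed files are much closer to the flush size and there's far less
compaction churn.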

J-D

On Wed, Jul 14, 2010 at 1:32 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Extremely high IO is bad, it hints that there's a problem (since it
> means that most of the processes are waiting for either disk or
> network). That's what I see in the datanode's log: lots of threads
> timing out.
>
> 20 column families is usually unreasonable (it also has a deep impact
> on flushing when doing massive imports), so you should review your
> schema. I've never seen good reasons to use more than 2-3 families.
> Here is an example of what 20 families does:
>
> 2010-07-13 04:49:55,981 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush
> for region 
> Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112..
> Current region memstore size 65.7m
> 2010-07-13 04:49:57,314 INFO
> org.apache.hadoop.hbase.regionserver.Store: Added
> hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/agent/1177093464238110730,
> entries=8788, sequenceid=35, memsize=2.0m, filesize=144.1k to
> Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
> ...
> 2010-07-13 04:49:57,486 INFO
> org.apache.hadoop.hbase.regionserver.Store: Added
> hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/cluster/4834392580329156904,
> entries=8788, sequenceid=35, memsize=2.0m, filesize=141.6k to
> Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
> ...
>
> As you can see, we flush when we hit a global size of 64MB on a region,
> but each family is flushed to a different (very small) file. This is by
> design: HBase is column-oriented, and writing to all families at the
> same time usually points to bad usage/schema design. This must also
> account for a LOT of the IO you are seeing.
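>
> To put rough numbers on it (a back-of-the-envelope sketch of my own,
> using only the 64MB threshold and the 20 families mentioned above):
>
> public class FlushFileEstimate {
>   public static void main(String[] args) {
>     // Region-wide memstore flush threshold, as seen in the log (~64MB),
>     // and the number of families in the schema.
>     long flushSizeBytes = 64L * 1024 * 1024;
>     int families = 20;
>     // The flush triggers on the region total, but every family writes
>     // its own file, so each file only gets a fraction of that 64MB,
>     // and it shrinks further once compression is applied.
>     System.out.printf("~%.1f MB per family per flush, before compression%n",
>         flushSizeBytes / (double) (families * 1024 * 1024));
>   }
> }
>
> That's how you end up with ~140KB store files after compression, and
> why compactions then keep rewriting the same data over and over.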
>
> And about the compression, I do see an error in the region server log:
>
> 2010-07-13 04:49:57,137 WARN org.apache.hadoop.util.NativeCodeLoader:
> Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
>
> Something's missing in your configuration. Please review
> http://wiki.apache.org/hadoop/UsingLzoCompression
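>
> A quick way to check (my own snippet, not something from that wiki
> page) is to see whether the native-hadoop library actually loads in a
> JVM started with the same environment as the region server; the WARN
> above means it doesn't:
>
> import org.apache.hadoop.util.NativeCodeLoader;
>
> public class NativeLibCheck {
>   public static void main(String[] args) {
>     // Prints false when java.library.path doesn't point at the native
>     // libraries; LZO additionally needs its own native library and jar.
>     System.out.println("native-hadoop loaded: "
>         + NativeCodeLoader.isNativeCodeLoaded());
>   }
> }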
>
> J-D
>
> On Wed, Jul 14, 2010 at 1:16 PM, Jinsong Hu <jinsong...@hotmail.com> wrote:
>> The test has stopped, so I can't really tell now. What I noticed is
>> that I was running "iostat -x 5" all the time, and there are lots of
>> times when the %idle is less than 5%, or even close to 0%. I also
>> notice that the disk %util is higher than 90%.
>>
>>
>> This indicates that the disk is pretty busy. On the other hand, I was
>> running 3 mappers to insert records into HBase, and I artificially
>> restrict the insertion rate to 80 records/second.
>> The table has 20 column families, and each family has 4-5 columns. By
>> my estimate, each record is about 10KB, so the pumping rate is not
>> really that high.
>>
>> The table columns are compressed. I do notice that when I disable the
>> compression, the insertion rate is even slower. This indicates that
>> disk usage probably is indeed the bottleneck, but it is just hard to
>> believe that I can't even insert records at 3x80=240 records/second
>> with 10KB records, which is only about 2.4MB/second of raw data. The
>> load isn't heavy at all.
>> Right now I am reducing the speed to an even lower 20 records/second
>> to test and see if it will be in better shape.
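>>
>> The rate limiting itself is nothing fancy; a simplified sketch of the
>> idea (for illustration only, not the actual mapper code) is just to
>> sleep between puts so they never exceed a fixed records/second:
>>
>> public class SimpleThrottle {
>>   private final long minIntervalNanos;
>>   private long lastPut = System.nanoTime();
>>
>>   SimpleThrottle(int recordsPerSecond) {
>>     this.minIntervalNanos = 1000000000L / recordsPerSecond;
>>   }
>>
>>   // Called once per record, right before table.put(...).
>>   void acquire() throws InterruptedException {
>>     long waitNanos = minIntervalNanos - (System.nanoTime() - lastPut);
>>     if (waitNanos > 0) {
>>       Thread.sleep(waitNanos / 1000000L, (int) (waitNanos % 1000000L));
>>     }
>>     lastPut = System.nanoTime();
>>   }
>> }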
>>
>> Did you check the datanode log? There are lots of errors there. Can
>> you tell why they happen?
>>
>>
>> Jimmy.
>>
>> --------------------------------------------------
>> From: "Jean-Daniel Cryans" <jdcry...@apache.org>
>> Sent: Wednesday, July 14, 2010 12:10 PM
>> To: <user@hbase.apache.org>
>> Subject: Re: regionserver crash under heavy load
>>
>>> Thanks for the zookeeper log, I was able to pinpoint the problem a bit
>>> more. So I see this (it seems I didn't dig enough in the logs; I
>>> wasn't expecting a 3-minute difference):
>>>
>>> 2010-07-13 21:49:17,864 INFO org.apache.hadoop.io.compress.CodecPool:
>>> Got brand-new decompressor
>>> 2010-07-13 22:03:46,313 DEBUG
>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
>>> Total=3.3544312MB (3517376), Free=405.42056MB (425114272),
>>> Max=408.775MB (428631648), Counts: Blocks=0, Access=6580365, Hit=0,
>>> Miss=6580365, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%,
>>> Evicted/Run=NaN
>>> 2010-07-13 22:03:46,483 INFO org.apache.hadoop.io.compress.CodecPool:
>>> Got brand-new compressor
>>> 2010-07-13 22:03:46,404 INFO org.apache.zookeeper.ClientCnxn: Client
>>> session timed out, have not heard from server in 908177ms for
>>> sessionid 0x229c87a7f4e0011, closing socket connection and attempting
>>> reconnect
>>> 2010-07-13 22:03:46,400 INFO
>>> org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>>> 2010-07-13 22:03:46,313 WARN
>>> org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 2 on
>>> 60020 took 868374ms appending an edit to hlog; editcount=4274
>>>
>>> At the same time, we can see that the zookeeper log does expire the
>>> session:
>>>
>>> 2010-07-13 21:52:00,001 - INFO  [SessionTracker:zookeeperser...@315] -
>>> Expiring session 0x229c87a7f4e0011, timeout of 120000ms exceeded
>>> 2010-07-13 21:52:00,001 - INFO  [SessionTracker:zookeeperser...@315] -
>>> Expiring session 0x129c87a7f980015, timeout of 120000ms exceeded
>>> 2010-07-13 21:52:00,002 - INFO
>>> [ProcessThread:-1:preprequestproces...@385] - Processed session
>>> termination for sessionid: 0x229c87a7f4e0011
>>> 2010-07-13 21:52:00,002 - INFO
>>> [ProcessThread:-1:preprequestproces...@385] - Processed session
>>> termination for sessionid: 0x129c87a7f980015
>>>
>>> So your region server process was paused for more than 10 minutes; the
>>> logs show that gap, and the lines that follow talk about that big
>>> pause. Are your nodes swapping?
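>>>
>>> To connect the dots (this is just a sketch I'm adding, not something
>>> from your logs): the 120000ms timeout ZooKeeper enforces is the session
>>> timeout the region server asked for, which in HBase comes from the
>>> zookeeper.session.timeout setting (assuming that's how the 120000 was
>>> configured on your cluster). Raising it only hides long pauses; the
>>> real fix is to keep the JVM from being frozen or swapped out.
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>>
>>> public class SessionTimeoutCheck {
>>>   public static void main(String[] args) {
>>>     Configuration conf = new Configuration();
>>>     conf.addResource("hbase-default.xml");
>>>     conf.addResource("hbase-site.xml");
>>>     // Any pause longer than this (GC, swapping) gets the session
>>>     // expired, after which the region server aborts itself.
>>>     System.out.println("zookeeper.session.timeout = "
>>>         + conf.getInt("zookeeper.session.timeout", 60000) + "ms");
>>>   }
>>> }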
>>>
>>> J-D
>>>
>>> On Wed, Jul 14, 2010 at 11:48 AM, Jinsong Hu <jinsong...@hotmail.com>
>>> wrote:
>>>>
>>>> The ZooKeepers are on 3 separate physical machines, colocated with
>>>> the 3 masters.
>>>> I have put the logs for them here:
>>>> http://t-collectors1.proofpoint.com:8079/zookeeper.tar.gz
>>>> The version is also from the CDH3 June 10 release.
>>>>
>>>> I checked the datanode logs for that period; there are lots of
>>>> exceptions. I have put them here:
>>>> http://t-collectors1.proofpoint.com:8079/backup.tar.gz
>>>>
>>>> Jinsong
>>>>
>>>> --------------------------------------------------
>>>> From: "Jean-Daniel Cryans" <jdcry...@apache.org>
>>>> Sent: Wednesday, July 14, 2010 11:16 AM
>>>> To: <user@hbase.apache.org>
>>>> Subject: Re: regionserver crash under heavy load
>>>>
>>>>> So your region servers had their sessions expired, but I don't see
>>>>> any sign of GC activity. Usually that points to a case where ZooKeeper
>>>>> isn't able to answer requests fast enough because it is IO-starved.
>>>>> Are the ZooKeeper quorum members on the same nodes as the region
>>>>> servers?
>>>>>
>>>>> J-D
>>>>>
>>>>> On Wed, Jul 14, 2010 at 10:16 AM, Jinsong Hu <jinsong...@hotmail.com>
>>>>> wrote:
>>>>>>
>>>>>> I have uploaded the log files from the crashed region servers to
>>>>>>
>>>>>> http://somewhere
>>>>>> http://somewhere
>>>>>>
>>>>>> It includes the GC logs and config files too. Both of these region
>>>>>> servers crashed while I was using only 3 mappers to insert records
>>>>>> into HBase, and in each task I limit the data rate to 80
>>>>>> records/second.
>>>>>>
>>>>>> The machines I use are relatively old, having only 4GB of RAM, a
>>>>>> 4-core CPU, and a 250GB disk. I run a tasktracker, datanode, and
>>>>>> regionserver on each of them. I have 3 machines, and only 1
>>>>>> regionserver is still up after the overnight continuous insertion.
>>>>>> Before the test I created the HBase tables, so I started with empty
>>>>>> tables. I saw it running fine for several hours, but when I checked
>>>>>> again this morning, 2 regionservers had died. I backed up the logs
>>>>>> and restarted the regionservers, and then I found the regionserver
>>>>>> process is up but not listening on any port.
>>>>>>
>>>>>> The log doesn't show any error:
>>>>>> 2010-07-14 16:23:45,769 DEBUG
>>>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper
>>>>>> : Set watcher on master address ZNode /hbase/master
>>>>>> 2010-07-14 16:23:45,951 DEBUG
>>>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper
>>>>>> : Read ZNode /hbase/master got 10.110.24.48:60000
>>>>>> 2010-07-14 16:23:45,957 INFO
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>>>>>> Telling master at 10.110.24.48:60000 that we are up
>>>>>> 2010-07-14 16:23:46,283 INFO
>>>>>> org.apache.hadoop.hbase.regionserver.ShutdownHook:
>>>>>> Installed shutdown hook thread: Shutdownhook:regionserver60020
>>>>>>
>>>>>>
>>>>>> I am using the June 10 release of the Cloudera distribution for
>>>>>> both Hadoop and HBase.
>>>>>>
>>>>>> Jimmy
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------
>>>>>> From: "Jean-Daniel Cryans" <jdcry...@apache.org>
>>>>>> Sent: Tuesday, July 13, 2010 4:24 PM
>>>>>> To: <user@hbase.apache.org>
>>>>>> Subject: Re: regionserver crash under heavy load
>>>>>>
>>>>>>> Your region server doesn't look very loaded from the metrics POV.
>>>>>>> But I specifically asked for the lines around that, not just the
>>>>>>> dump, since they will contain the reason for the shutdown.
>>>>>>>
>>>>>>>> I do notice that the disk usage is pretty high. I am just thinking
>>>>>>>> that our problem probably is a hardware limit, but the server
>>>>>>>> should not crash when the hardware limit is reached.
>>>>>>>
>>>>>>> We still don't know why it crashed, and it may not even be related
>>>>>>> to HW limits; we need those bigger log traces. Also, use
>>>>>>> pastebin.com or anything like that.
>>>>>>>
>>>>>>>>
>>>>>>>> Do you have any idea when the official CDH3 release will be out?
>>>>>>>
>>>>>>> I don't work for Cloudera, but IIRC the next CDH3 beta is due in
>>>>>>> September.
>>>>>>>
>>>>>>>>
>>>>>>>> Jimmy
>>>>>>>>
>>>>>>>> --------------------------------------------------
>>>>>>>> From: "Jean-Daniel Cryans" <jdcry...@apache.org>
>>>>>>>> Sent: Tuesday, July 13, 2010 2:55 PM
>>>>>>>> To: <user@hbase.apache.org>
>>>>>>>> Subject: Re: regionserver crash under heavy load
>>>>>>>>
>>>>>>>>> Please use a pasting service for the log traces. I personally use
>>>>>>>>> pastebin.com
>>>>>>>>>
>>>>>>>>> You probably had a GC pause that lasted too long; this is
>>>>>>>>> something out of the control of the application (apart from trying
>>>>>>>>> to keep as little data in memory as possible, but you are
>>>>>>>>> inserting, so...). Your log doesn't contain enough information for
>>>>>>>>> us to tell; please look for a "Dump of metrics" line and paste the
>>>>>>>>> lines around it.
>>>>>>>>>
>>>>>>>>> J-D
>>>>>>>>>
>>>>>>>>> On Tue, Jul 13, 2010 at 2:49 PM, Jinsong Hu <jinsong...@hotmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi, Todd:
>>>>>>>>>>  I downloaded hadoop-0.20.2+320 and hbase-0.89.20100621+17 from
>>>>>>>>>> CDH3 and inserted data under full load; after a while the HBase
>>>>>>>>>> regionserver crashed. I checked the system with "iostat -x 5" and
>>>>>>>>>> noticed the disk is pretty busy. Then I modified my client code
>>>>>>>>>> and reduced the insertion rate by 6 times, and the test runs fine.
>>>>>>>>>> Is there any way that the regionserver could be modified so that
>>>>>>>>>> at least it doesn't crash under heavy load? I used the Apache
>>>>>>>>>> HBase 0.20.5 distribution and the same problem happens. I am
>>>>>>>>>> thinking that when the regionserver is too busy, it should
>>>>>>>>>> throttle the incoming data rate to protect the server. Could this
>>>>>>>>>> be done?
>>>>>>>>>>  Do you also know when the official CDH3 release will come out?
>>>>>>>>>> The one I downloaded is a beta version.
>>>>>>>>>>
>>>>>>>>>> Jimmy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Spam_MsgEventTable,56-2010-05-19 10:09:02\x099a420f4f31748828fd24aeea1d06b294,1278973678315.01dd22f517dabf53ddd135709b68ba6c.
>>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: m0002029.ppops.net,60020,1278969481450
>>>>>>>>>> 2010-07-13 02:24:34,389 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper; /hbase/root-region-server
>>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
>>>>>>>>>> 2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-10,5,main]
>>>>>>>>>> 2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
>>>>>>>>>> 2010-07-13 02:24:34,608 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/m0002029.ppops.net,60020,1278969481450/10.110.24.79%3A60020.1278987220794 : java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed  because recovery from primary datanode 10.110.24.80:50010 failed 6 times.  Pipeline was 10.110.24.80:50010. Aborting...
>>>>>>>>>> java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed  because recovery from primary datanode 10.110.24.80:50010 failed 6 times.  Pipeline was 10.110.24.80:50010. Aborting...
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3214)
>>>>>>>>>>   at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>>>>>>>>>>   at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>>>>>>>>>>   at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>>>>>>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:826)
>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1004)
>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:817)
>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1531)
>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1447)
>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1703)
>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2361)
>>>>>>>>>>   at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
>>>>>>>>>> 2010-07-13 02:24:34,610 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/Spam_MsgEventTable/079c7de876422e57e5f09fef5d997e06/.tmp/6773658134549268273 : java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
>>>>>>>>>> java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2603)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
>>>>>>>>>> 2010-07-13 02:24:34,729 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
