Please use a pasting service for the log traces; I personally use pastebin.com.

You probably had a GC pause that lasted too long. This is largely out of
the application's control (apart from trying to keep as little data in
memory as possible, but you are inserting, so...). Your log doesn't
contain enough information for us to tell; please look for a "Dump of
metrics" line and paste the lines around it.
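As for throttling on the client side in the meantime: since dropping your insertion rate 6x already keeps the server up, you can pace the insert loop explicitly instead of guessing. Below is a minimal plain-Java sketch; the class name, method names, and rate parameter are all invented for illustration, this is not an HBase API.

```java
// Hypothetical client-side pacer: caps how many put batches per second the
// insert loop issues, so a struggling regionserver gets breathing room.
// "BatchPacer" is invented for this sketch; it is not part of HBase.
public class BatchPacer {
    private final long minNanosBetweenBatches;
    private long lastBatchNanos;

    public BatchPacer(double batchesPerSecond) {
        this.minNanosBetweenBatches = (long) (1_000_000_000L / batchesPerSecond);
        // Start "one interval in the past" so the first batch is not delayed.
        this.lastBatchNanos = System.nanoTime() - minNanosBetweenBatches;
    }

    /** Blocks until at least 1/batchesPerSecond has elapsed since the previous batch. */
    public void awaitNextBatch() throws InterruptedException {
        long now = System.nanoTime();
        long waitNanos = (lastBatchNanos + minNanosBetweenBatches) - now;
        if (waitNanos > 0) {
            Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
        }
        lastBatchNanos = System.nanoTime();
    }
}
```

In your insert loop you would call awaitNextBatch() before each batched put; tuning batchesPerSecond down is the same thing your 6x reduction did by hand.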

J-D

On Tue, Jul 13, 2010 at 2:49 PM, Jinsong Hu <jinsong...@hotmail.com> wrote:
> Hi, Todd:
>  I downloaded hadoop-0.20.2+320 and hbase-0.89.20100621+17 from CDH3 and
> inserted data at full load; after a while the hbase regionserver crashed.
> I checked the system with "iostat -x 5" and noticed the disk was pretty busy.
> Then I modified my client code and reduced the insertion rate by 6 times,
> and the test runs fine. Is there any way the regionserver could be modified
> so that at least it doesn't crash under heavy load? I used the apache hbase
> 0.20.5 distribution and the same problem happened. I am thinking that when
> the regionserver is too busy, it should throttle the incoming data rate to
> protect the server. Could this be done?
>  Do you also know when the official CDH3 release will come out? The one I
> downloaded is a beta version.
>
> Jimmy
>
> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Spam_MsgEventTable,56-2010-05-19 10:09:02\x099a420f4f31748828fd24aeea1d06b294,1278973678315.01dd22f517dabf53ddd135709b68ba6c.
> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: m0002029.ppops.net,60020,1278969481450
> 2010-07-13 02:24:34,389 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper; /hbase/root-region-server
> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
> 2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-10,5,main]
> 2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
> 2010-07-13 02:24:34,608 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/m0002029.ppops.net,60020,1278969481450/10.110.24.79%3A60020.1278987220794 : java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed  because recovery from primary datanode 10.110.24.80:50010 failed 6 times.  Pipeline was 10.110.24.80:50010. Aborting...
> java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed  because recovery from primary datanode 10.110.24.80:50010 failed 6 times.  Pipeline was 10.110.24.80:50010. Aborting...
>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3214)
>       at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>       at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>       at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
>       at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:826)
>       at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1004)
>       at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:817)
>       at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1531)
>       at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1447)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1703)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2361)
>       at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
>       at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
> 2010-07-13 02:24:34,610 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/Spam_MsgEventTable/079c7de876422e57e5f09fef5d997e06/.tmp/6773658134549268273 : java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
> java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2603)
>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
> 2010-07-13 02:24:34,729 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
>
