Have you tried using setBatch() to limit the number of columns returned per Result? See the code example in section 9.4.4.3 of http://hbase.apache.org/book.html#client.filter.kvm
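Something along these lines, as a minimal sketch against the 0.92-era client API (the table name "mytable" and the concrete caching/batch values here are illustrative assumptions, not taken from your setup):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable"); // hypothetical table name

    Scan scan = new Scan();
    scan.setCaching(100); // rows (or partial rows) shipped per next() RPC
    scan.setBatch(1000);  // at most 1000 columns (KeyValues) per Result

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result result : scanner) {
        // With setBatch() set, one logical row may arrive as several
        // Results, each holding at most 1000 KeyValues.
        System.out.println(result.size() + " KVs for row "
            + Bytes.toString(result.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

The point of setBatch() here is that a very wide row gets split across several Results instead of arriving in one huge next() response, so the per-RPC response size stays bounded even with a caching value well above 1.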
On Fri, Nov 8, 2013 at 10:18 AM, Ivan Tretyakov <[email protected]> wrote:

> Hello!
>
> We have the following issue on our cluster running HBase 0.92.1-cdh4.1.1.
> When we start a full scan of the table, some of the servers shut down
> unexpectedly with the following lines in the log:
>
> 2013-11-07 21:19:12,173 WARN org.apache.hadoop.ipc.HBaseServer:
> (responseTooLarge):
> {"processingtimems":6723,"call":"next(-3171672497308828151, 1000), rpc
> version=1, client version=29, methodsFingerPrint=1891768260",
> "client":"10.0.241.99:43063","starttimems":1383859145449,"queuetimems":0,
> "class":"HRegionServer","responsesize":1059073884,"method":"next"}
> 2013-11-07 21:19:33,009 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
> 20545ms instead of 3000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-11-07 21:19:41,651 INFO org.apache.hadoop.hbase.util.VersionInfo:
> HBase 0.92.1-cdh4.1.1
>
> Or one more example:
>
> 2013-11-07 22:07:02,587 WARN org.apache.hadoop.ipc.HBaseServer:
> (responseTooLarge):
> {"processingtimems":12540,"call":"next(8031108008798991209, 1000), rpc
> version=1, client version=29, methodsFingerPrint=1891768260",
> "client":"10.0.240.211:33538","starttimems":1383862010045,"queuetimems":14955,
> "class":"HRegionServer","responsesize":1322737704,"method":"next"}
> 2013-11-07 22:08:00,413 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception for block
> BP-1892992341-10.10.122.111-1352825964285:blk_-2134516062062022634_68425527
> java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:670)
> 2013-11-07 22:08:09,394 INFO org.apache.hadoop.hbase.util.VersionInfo:
> HBase 0.92.1-cdh4.1.1
>
> The last line, 'HBase 0.92.1-cdh4.1.1', indicates that a new region server
> instance has just started. Every time, I see the 'responseTooLarge' message
> before the shutdown. The job is run with the '-caching' option set to 1000.
>
> My current assumption is that the problem is caused by a memory shortage on
> the RS and a long GC pause, which makes the ZK session expire and the
> server shut down (-Xmx for the RS is 8GB). Cloudera Manager then restarts
> it.
>
> I've tried running the job with '-caching' equal to 1: no servers were
> restarted, but the job didn't finish within a reasonable amount of time. I
> understand that decreasing the caching value can mitigate the problem, but
> it doesn't look like the right way to me, because the number of regions per
> server may increase in the future and we would hit a similar problem again.
> It would also slow down the job.
>
> Do you think the problem is caused by the reasons I assume?
> Is this a known issue?
> What do you think could be the ways to resolve it?
> Is there an option to send the response once it becomes too large,
> independent of the caching value?
>
> Thanks in advance for your answers.
> I'm ready to provide any additional information you may need to help me
> with this issue.
>
> --
> Best Regards
> Ivan Tretyakov
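Since the '-caching' flag suggests a MapReduce job (something like RowCounter), here is a hedged sketch of where the same knobs would go if the job is built on TableInputFormat; the table name, mapper, and concrete values are assumptions for illustration, not your actual job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class FullScanJob {
  // Trivial mapper that just touches each (possibly partial) row.
  static class ScanMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      // process the row here; with setBatch() a wide row may arrive
      // as several map() calls with partial Results
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "full-scan");
    job.setJarByClass(FullScanJob.class);

    Scan scan = new Scan();
    scan.setCacheBlocks(false); // don't pollute the block cache on a full scan
    scan.setCaching(100);       // fewer rows per RPC than the original 1000
    scan.setBatch(1000);        // cap columns per Result so wide rows split

    TableMapReduceUtil.initTableMapperJob("mytable", scan, ScanMapper.class,
        NullWritable.class, NullWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

On your last question: as far as I know, a size-based cap on scanner responses (hbase.client.scanner.max.result.size) only arrived in releases after 0.92, so on your version setBatch() is the main lever for bounding response size independently of the caching value.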
