Thank you for the answer, Ted. We were able to fix the issue by tuning the hbase.client.scanner.max.result.size parameter.
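For anyone who hits the same responseTooLarge restarts, here is a rough sketch of the client-side settings discussed in this thread. The table name, column family handling and the concrete values are placeholders only; hbase.client.scanner.max.result.size is the size cap we ended up tuning, setCaching() corresponds to the job's '-caching' option, and setBatch() is Ted's suggestion. Whether the size cap is honoured depends on your HBase client/server version, so treat this as an illustration rather than exact configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Cap how much data a single scanner next() call may return
    // (bytes; 64 MB here is only an example value).
    conf.setLong("hbase.client.scanner.max.result.size", 64L * 1024 * 1024);

    HTable table = new HTable(conf, "my_table");   // placeholder table name
    try {
      Scan scan = new Scan();
      scan.setCaching(1000); // rows fetched per next() RPC (the job's -caching value)
      scan.setBatch(100);    // max columns per Result, per Ted's setBatch() suggestion
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          // process each (possibly partial) row here
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}

Note that once setBatch() is in place a wide row may be split across several Result objects, so the processing loop has to cope with partial rows.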
P.S. "The HBase development team has affectionately dubbed this scenario a Juliet Pause — the master (Romeo) presumes the region server (Juliet) is dead when it’s really just sleeping, and thus takes some drastic action (recovery). When the server wakes up, it sees that a great mistake has been made and takes its own life. Makes for a good play, but a pretty awful failure scenario!"

On Fri, Nov 8, 2013 at 10:26 PM, Ted Yu <[email protected]> wrote:

> Have you tried using setBatch() to limit the number of columns returned?
>
> See the code example in 9.4.4.3. of
> http://hbase.apache.org/book.html#client.filter.kvm
>
>
> On Fri, Nov 8, 2013 at 10:18 AM, Ivan Tretyakov <[email protected]> wrote:
>
> > Hello!
> >
> > We have the following issue on our cluster running HBase 0.92.1-cdh4.1.1.
> > When we start a full scan of the table, some of the servers shut down
> > unexpectedly with the following lines in the log:
> >
> > 2013-11-07 21:19:12,173 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooLarge):
> > {"processingtimems":6723,"call":"next(-3171672497308828151, 1000), rpc version=1, client version=29, methodsFingerPrint=1891768260","client":"10.0.241.99:43063","starttimems":1383859145449,"queuetimems":0,"class":"HRegionServer","responsesize":1059073884,"method":"next"}
> > 2013-11-07 21:19:33,009 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
> > 20545ms instead of 3000ms, this is likely due to a long garbage collecting
> > pause and it's usually bad, see
> > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > 2013-11-07 21:19:41,651 INFO org.apache.hadoop.hbase.util.VersionInfo: HBase 0.92.1-cdh4.1.1
> >
> > or one more example:
> >
> > 2013-11-07 22:07:02,587 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooLarge):
> > {"processingtimems":12540,"call":"next(8031108008798991209, 1000), rpc version=1, client version=29, methodsFingerPrint=1891768260","client":"10.0.240.211:33538","starttimems":1383862010045,"queuetimems":14955,"class":"HRegionServer","responsesize":1322737704,"method":"next"}
> > 2013-11-07 22:08:00,413 WARN org.apache.hadoop.hdfs.DFSClient:
> > DFSOutputStream ResponseProcessor exception for block
> > BP-1892992341-10.10.122.111-1352825964285:blk_-2134516062062022634_68425527
> > java.io.EOFException: Premature EOF: no length prefix available
> >     at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
> >     at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
> >     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:670)
> > 2013-11-07 22:08:09,394 INFO org.apache.hadoop.hbase.util.VersionInfo: HBase 0.92.1-cdh4.1.1
> >
> > The last line 'HBase 0.92.1-cdh4.1.1' indicates that a new region server
> > instance has just started. Every time, I see a 'responseTooLarge' message
> > before the shutdown.
> > The job is running with the '-caching' option equal to 1000.
> >
> > My current assumption is that the problem is caused by memory shortage on
> > the RS and a long GC pause, which causes the ZK session to expire and the
> > server to shut down (-Xmx for the RS is 8GB). Cloudera Manager then
> > restarts it.
> >
> > I've tried to run the job with '-caching' equal to 1: there were no
> > restarted servers, but the job didn't finish within a reasonable amount of
> > time. I understand that decreasing the caching value can mitigate the
> > problem, but it doesn't look like the right way to me, because the number
> > of regions per server may grow in the future and we would have a similar
> > problem again. It would also slow down the job.
> >
> > Do you think the problem is caused by the reasons I assume?
> > Is this a known issue?
> > What do you think could be the ways to resolve it?
> > Is there an option to send the response once it becomes too large,
> > independent of the caching value?
> >
> > Thanks in advance for your answers.
> > I'm ready to provide any additional information you may need to help me
> > with this issue.
> >
> > --
> > Best Regards
> > Ivan Tretyakov
> >
>

--
Best Regards

Ivan Tretyakov
Deployment Engineer
Grid Dynamics
+7 812 640 38 76
Skype: ivan.v.tretyakov
www.griddynamics.com
[email protected]
