Have you tried using setBatch() to limit the number of columns returned per Result? See the code example in section 9.4.4.3 of http://hbase.apache.org/book.html#client.filter.kvm
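Something along these lines, as a minimal sketch against the 0.92-era client API (the table name "mytable" and the concrete caching/batch values here are illustrative assumptions, not taken from your setup):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable"); // hypothetical table name

    Scan scan = new Scan();
    scan.setCaching(100); // rows (or partial rows) shipped per next() RPC
    scan.setBatch(1000);  // at most 1000 columns (KeyValues) per Result

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result result : scanner) {
        // With setBatch() set, one logical row may arrive as several
        // Results, each holding at most 1000 KeyValues.
        System.out.println(result.size() + " KVs for row "
            + Bytes.toString(result.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

The point of setBatch() here is that a very wide row gets split across several Results instead of arriving in one huge next() response, so the per-RPC response size stays bounded even with a caching value well above 1.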
On Fri, Nov 8, 2013 at 10:18 AM, Ivan Tretyakov <[email protected]> wrote:

> Hello!
>
> We have the following issue on our cluster running HBase 0.92.1-cdh4.1.1.
> When we start a full scan of the table, some of the servers shut down
> unexpectedly with the following lines in the log:
>
> 2013-11-07 21:19:12,173 WARN org.apache.hadoop.ipc.HBaseServer:
> (responseTooLarge):
> {"processingtimems":6723,"call":"next(-3171672497308828151, 1000), rpc
> version=1, client version=29, methodsFingerPrint=1891768260",
> "client":"10.0.241.99:43063","starttimems":1383859145449,"queuetimems":0,
> "class":"HRegionServer","responsesize":1059073884,"method":"next"}
> 2013-11-07 21:19:33,009 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
> 20545ms instead of 3000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-11-07 21:19:41,651 INFO org.apache.hadoop.hbase.util.VersionInfo:
> HBase 0.92.1-cdh4.1.1
>
> Or one more example:
>
> 2013-11-07 22:07:02,587 WARN org.apache.hadoop.ipc.HBaseServer:
> (responseTooLarge):
> {"processingtimems":12540,"call":"next(8031108008798991209, 1000), rpc
> version=1, client version=29, methodsFingerPrint=1891768260",
> "client":"10.0.240.211:33538","starttimems":1383862010045,"queuetimems":14955,
> "class":"HRegionServer","responsesize":1322737704,"method":"next"}
> 2013-11-07 22:08:00,413 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception for block
> BP-1892992341-10.10.122.111-1352825964285:blk_-2134516062062022634_68425527
> java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:114)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:670)
> 2013-11-07 22:08:09,394 INFO org.apache.hadoop.hbase.util.VersionInfo:
> HBase 0.92.1-cdh4.1.1
>
> The last line, 'HBase 0.92.1-cdh4.1.1', indicates that a new region server
> instance has just started. Every time, I see the 'responseTooLarge' message
> before the shutdown. The job is run with the '-caching' option set to 1000.
>
> My current assumption is that the problem is caused by a memory shortage on
> the RS and a long GC pause, which makes the ZK session expire and the
> server shut down (-Xmx for the RS is 8GB). Cloudera Manager then restarts
> it.
>
> I've tried running the job with '-caching' equal to 1: no servers were
> restarted, but the job didn't finish within a reasonable amount of time. I
> understand that decreasing the caching value can mitigate the problem, but
> it doesn't look like the right way to me, because the number of regions per
> server may increase in the future and we would hit a similar problem again.
> It would also slow down the job.
>
> Do you think the problem is caused by the reasons I assume?
> Is this a known issue?
> What do you think could be the ways to resolve it?
> Is there an option to send the response once it becomes too large,
> independent of the caching value?
>
> Thanks in advance for your answers.
> I'm ready to provide any additional information you may need to help me
> with this issue.
>
> --
> Best Regards
> Ivan Tretyakov
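Since the '-caching' flag suggests a MapReduce job (something like RowCounter), here is a hedged sketch of where the same knobs would go if the job is built on TableInputFormat; the table name, mapper, and concrete values are assumptions for illustration, not your actual job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class FullScanJob {
  // Trivial mapper that just touches each (possibly partial) row.
  static class ScanMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      // process the row here; with setBatch() a wide row may arrive
      // as several map() calls with partial Results
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "full-scan");
    job.setJarByClass(FullScanJob.class);

    Scan scan = new Scan();
    scan.setCacheBlocks(false); // don't pollute the block cache on a full scan
    scan.setCaching(100);       // fewer rows per RPC than the original 1000
    scan.setBatch(1000);        // cap columns per Result so wide rows split

    TableMapReduceUtil.initTableMapperJob("mytable", scan, ScanMapper.class,
        NullWritable.class, NullWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

On your last question: as far as I know, a size-based cap on scanner responses (hbase.client.scanner.max.result.size) only arrived in releases after 0.92, so on your version setBatch() is the main lever for bounding response size independently of the caching value.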
