Forgot to mention that. It's version 0.92.1 (Cloudera CDH4.1.1), running on 64-bit CentOS 6 with Java 1.6.0_31.

On Dec 14, 2012, at 5:31 PM, lars hofhansl <lhofha...@yahoo.com> wrote:

> Hey Bryan,
>
> which version of HBase is this?
>
> -- Lars
>
> ________________________________
> From: Bryan Keller <brya...@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Friday, December 14, 2012 2:59 PM
> Subject: HBaseClient.call() hang
>
> I have encountered a problem with HBaseClient.call() hanging. This occurs
> when one of my regionservers goes down while performing a table scan.
>
> What exacerbates this problem is that the scan I am performing uses filters,
> and the region size of the table is large (4 GB). Because of this, it can take
> several minutes for a row to be returned when calling scanner.next().
> Apparently there is no keep-alive message being sent back to the scanner
> while the region server is busy, so I had to increase the hbase.rpc.timeout
> value to a large number (60 min); otherwise the next() call will time out
> waiting for the regionserver to send something back.
>
> The result is that this HBaseClient.call() hang is made much worse, because
> it won't time out for 60 minutes.
>
> I have a couple of questions:
>
> 1. Any thoughts on why the HBaseClient.call() is getting stuck? I noticed
> that call.wait() is not using any timeout, so it will wait indefinitely until
> interrupted externally.
>
> 2. Is there a solution where I do not need to set hbase.rpc.timeout to a very
> large number? My only thought would be to forgo using filters and do the
> filtering client side, which seems pretty inefficient.
>
> Here is a stack dump of the thread that was hung:
>
> Thread 10609: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
>  - java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
>  - org.apache.hadoop.hbase.ipc.HBaseClient.call(org.apache.hadoop.io.Writable, java.net.InetSocketAddress, java.lang.Class, org.apache.hadoop.hbase.security.User, int) @bci=51, line=904 (Interpreted frame)
>  - org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=52, line=150 (Interpreted frame)
>  - $Proxy12.next(long, int) @bci=26 (Interpreted frame)
>  - org.apache.hadoop.hbase.client.ScannerCallable.call() @bci=72, line=92 (Interpreted frame)
>  - org.apache.hadoop.hbase.client.ScannerCallable.call() @bci=1, line=42 (Interpreted frame)
>  - org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(org.apache.hadoop.hbase.client.ServerCallable) @bci=36, line=1325 (Interpreted frame)
>  - org.apache.hadoop.hbase.client.HTable$ClientScanner.next() @bci=117, line=1299 (Compiled frame)
>  - org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue() @bci=41, line=150 (Interpreted frame)
>  - org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue() @bci=4, line=142 (Interpreted frame)
>  - org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue() @bci=4, line=458 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue() @bci=4, line=76 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue() @bci=4, line=85 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.Mapper.run(org.apache.hadoop.mapreduce.Mapper$Context) @bci=6, line=139 (Interpreted frame)
>  - org.apache.hadoop.mapred.MapTask.runNewMapper(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex, org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=201, line=645 (Interpreted frame)
>  - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=100, line=325 (Interpreted frame)
>  - org.apache.hadoop.mapred.Child$4.run() @bci=29, line=268 (Interpreted frame)
>  - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
>  - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1332 (Interpreted frame)
>  - org.apache.hadoop.mapred.Child.main(java.lang.String[]) @bci=776, line=262 (Interpreted frame)
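
For readers following the thread: question 1 in the quoted message observes that call.wait() is invoked with no timeout, which matches the java.lang.Object.wait() frame at the top of the stack dump. The difference between an unbounded and a bounded wait can be sketched as below. This is only an illustration of the general Object.wait(long) deadline pattern, not a patch for HBase; the PendingCall class and its complete()/await() methods are hypothetical stand-ins and are not part of the HBase client API.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {

    /** Hypothetical stand-in for an in-flight RPC; not HBase's real call class. */
    static final class PendingCall {
        private boolean done;
        private String value;

        /** Called by the (hypothetical) thread that reads the server response. */
        synchronized void complete(String v) {
            value = v;
            done = true;
            notifyAll();
        }

        /** Waits for completion, but gives up once timeoutMillis has elapsed. */
        synchronized String await(long timeoutMillis)
                throws InterruptedException, TimeoutException {
            long deadline = System.nanoTime()
                    + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (!done) { // the loop also guards against spurious wakeups
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw new TimeoutException(
                            "no response within " + timeoutMillis + " ms");
                }
                // wait(0) means "wait forever", so never pass less than 1 ms.
                wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            }
            return value;
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: another thread delivers the response, so await() returns.
        final PendingCall answered = new PendingCall();
        Thread responder = new Thread(new Runnable() {
            public void run() {
                answered.complete("row-1");
            }
        });
        responder.start();
        System.out.println("answered: " + answered.await(1000));

        // Case 2: the "server" never responds; a plain wait() would hang here.
        PendingCall lost = new PendingCall();
        try {
            lost.await(100);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

With a plain wait(), the second case would block until the thread is interrupted externally, which is the behaviour described in the quoted message. A bounded wait only limits how long the caller stays stuck; it does not address the slow filtered scan that led to the 60-minute hbase.rpc.timeout in the first place.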