Bryan:

>> I have encountered a problem with HBaseClient.call() hanging. This occurs
>> when one of my regionservers goes down while performing a table scan.
Have you checked the issue of HBASE-6313?

Jieshan

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Saturday, December 15, 2012 2:00 PM
To: user@hbase.apache.org
Subject: Re: HBaseClient.call() hang

I should have mentioned that the original patches for HBASE-5416 were contributed by Max Lapan.

On Fri, Dec 14, 2012 at 9:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Bryan:
>
> bq. My only thought would be to forego using filters
> Please keep using filters.
>
> Sergey and I are working on HBASE-5416: Improve performance of scans with
> some kind of filters.
> This feature allows you to specify one column family as being essential.
> The other column family is only returned to the client when the essential
> column family matches. I wonder if this may be of help to you.
>
> You mentioned the regionserver going down or being busy. I assume it was not
> often that regionserver(s) went down. For a busy region server, did you try
> jstack'ing the regionserver process?
>
> Thanks
>
>
> On Fri, Dec 14, 2012 at 2:59 PM, Bryan Keller <brya...@gmail.com> wrote:
>
>> I have encountered a problem with HBaseClient.call() hanging. This occurs
>> when one of my regionservers goes down while performing a table scan.
>>
>> What exacerbates this problem is that the scan I am performing uses
>> filters, and the region size of the table is large (4 GB). Because of this,
>> it can take several minutes for a row to be returned when calling
>> scanner.next(). Apparently there is no keep-alive message being sent back
>> to the scanner while the region server is busy, so I had to increase the
>> hbase.rpc.timeout value to a large number (60 min), otherwise the next()
>> call will time out waiting for the regionserver to send something back.
>>
>> The result is that this HBaseClient.call() hang is made much worse,
>> because it won't time out for 60 minutes.
>>
>> I have a couple of questions:
>>
>> 1. Any thoughts on why the HBaseClient.call() is getting stuck? I noticed
>> that call.wait() is not using any timeout, so it will wait indefinitely
>> until interrupted externally.
>>
>> 2. Is there a solution where I do not need to set hbase.rpc.timeout to a
>> very large number?
>> My only thought would be to forego using filters and do
>> the filtering client side, which seems pretty inefficient.
>>
>> Here is a stack dump of the thread that was hung:
>>
>> Thread 10609: (state = BLOCKED)
>> - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
>> - java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
>> - org.apache.hadoop.hbase.ipc.HBaseClient.call(org.apache.hadoop.io.Writable, java.net.InetSocketAddress, java.lang.Class, org.apache.hadoop.hbase.security.User, int) @bci=51, line=904 (Interpreted frame)
>> - org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=52, line=150 (Interpreted frame)
>> - $Proxy12.next(long, int) @bci=26 (Interpreted frame)
>> - org.apache.hadoop.hbase.client.ScannerCallable.call() @bci=72, line=92 (Interpreted frame)
>> - org.apache.hadoop.hbase.client.ScannerCallable.call() @bci=1, line=42 (Interpreted frame)
>> - org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(org.apache.hadoop.hbase.client.ServerCallable) @bci=36, line=1325 (Interpreted frame)
>> - org.apache.hadoop.hbase.client.HTable$ClientScanner.next() @bci=117, line=1299 (Compiled frame)
>> - org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue() @bci=41, line=150 (Interpreted frame)
>> - org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue() @bci=4, line=142 (Interpreted frame)
>> - org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue() @bci=4, line=458 (Interpreted frame)
>> - org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue() @bci=4, line=76 (Interpreted frame)
>> - org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue() @bci=4, line=85 (Interpreted frame)
>> - org.apache.hadoop.mapreduce.Mapper.run(org.apache.hadoop.mapreduce.Mapper$Context) @bci=6, line=139 (Interpreted frame)
>> - org.apache.hadoop.mapred.MapTask.runNewMapper(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex, org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=201, line=645 (Interpreted frame)
>> - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=100, line=325 (Interpreted frame)
>> - org.apache.hadoop.mapred.Child$4.run() @bci=29, line=268 (Interpreted frame)
>> - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
>> - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
>> - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1332 (Interpreted frame)
>> - org.apache.hadoop.mapred.Child.main(java.lang.String[]) @bci=776, line=262 (Interpreted frame)
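The essential-column-family behaviour Ted describes in the thread above was exposed on the client side as Scan.setLoadColumnFamiliesOnDemand(true) once HBASE-5416 shipped. Below is a minimal sketch of how a filtered scan might use it, assuming a client/server version that actually includes HBASE-5416; the table, family, and qualifier names ("mytable", "meta", "flag") are placeholders, not names taken from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class EssentialFamilyScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Filter on a small "meta" family; the wide families should only be
    // loaded for rows this filter accepts. Names are placeholders.
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("meta"), Bytes.toBytes("flag"),
        CompareOp.EQUAL, Bytes.toBytes("1"));
    // With filterIfMissing set, the filter can treat "meta" as essential.
    filter.setFilterIfMissing(true);

    Scan scan = new Scan();
    scan.setFilter(filter);
    // HBASE-5416: load non-essential column families lazily, only for rows
    // where the essential family matched the filter.
    scan.setLoadColumnFamiliesOnDemand(true);

    HTable table = new HTable(conf, "mytable");
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process matching rows
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

As I understand the feature, which family counts as "essential" is decided by the filter itself (via the isFamilyEssential() hook added by HBASE-5416), so a filter such as SingleColumnValueFilter marks the family it inspects and everything else is fetched on demand.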
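On Bryan's question 2, the hbase.rpc.timeout workaround he describes does not have to be applied cluster-wide: the client reads it from its own Configuration, so it can be raised just for the process (or MapReduce job) running the slow, filtered scan. A small sketch under those assumptions; the 10-minute value and the "mytable" name are illustrative, not taken from the thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class LongScanClientConfig {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Client-side RPC timeout for this process only.
    // 600000 ms = 10 minutes (illustrative value).
    conf.setInt("hbase.rpc.timeout", 600000);

    // Run the slow, filtered scan with this configuration; for a MapReduce
    // scan like the one in the stack dump, the same property would be set
    // on the job's Configuration before submission.
    HTable table = new HTable(conf, "mytable");
    // ... table.getScanner(scan), iterate, etc. ...
    table.close();
  }
}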