ScannerTimeoutException during MapReduce

Jan Lukavský Thu, 11 Aug 2011 06:59:15 -0700

Hi,

we've recently moved to HBase 0.90.3 (cdh3u1) from 0.20.6, which resolved most of our previous issues, but we are now seeing many more ScannerTimeoutExceptions than before. All of these exceptions come from a trace like this:


org.apache.hadoop.hbase.client.ScannerTimeoutException: 307127ms passed since the last invocation, timeout is currently set to 60000
        at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1133)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:143)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:142)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)

After a bit of investigation, I suspect the cause is that the first call to scanner.next() after HTable.getScanner() times out. What could be causing this? I see neither regions moving around in the cluster nor any compaction on the regionserver side. As far as I can tell, everything looks just fine. This would suggest that it took too long to locate the regionserver in the call to HTable.getScanner(), but I cannot see any reason why. Could this issue be resolved on the TableRecordReader side? E.g., at TableRecordReaderImpl.java:143 the ScannerTimeoutException could be caught and the scanner restarted a couple more times (a configurable number, say).
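
To make the suggestion concrete, here is a minimal, self-contained sketch of a bounded retry. It deliberately does not use the HBase API: SketchScanner, SketchTimeoutException and RetryingReaderSketch are hypothetical stand-ins, and maxRetries is an assumed setting, not an existing configuration property. The reader reopens the scanner at the first row it has not yet returned, so nothing is skipped or repeated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ScannerTimeoutException (not the HBase class).
class SketchTimeoutException extends RuntimeException {
    SketchTimeoutException(String msg) { super(msg); }
}

// Hypothetical in-memory scanner that times out a given number of times.
class SketchScanner {
    private final List<String> rows;
    private final int[] timeoutsLeft; // shared, so reopened scanners see the remaining failures
    private int position;

    SketchScanner(List<String> rows, int startIndex, int[] timeoutsLeft) {
        this.rows = rows;
        this.position = startIndex;
        this.timeoutsLeft = timeoutsLeft;
    }

    // Returns the next row, or null when the scan is exhausted.
    String next() {
        if (timeoutsLeft[0] > 0) {
            timeoutsLeft[0]--;
            throw new SketchTimeoutException("simulated scanner timeout");
        }
        return position < rows.size() ? rows.get(position++) : null;
    }
}

class RetryingReaderSketch {
    private final List<String> rows;
    private final int[] timeoutsLeft;
    private final int maxRetries;   // assumed setting: retries allowed per nextRow() call
    private SketchScanner scanner;
    private int rowsReturned = 0;   // index of the first row not yet handed to the caller

    RetryingReaderSketch(List<String> rows, int simulatedTimeouts, int maxRetries) {
        this.rows = rows;
        this.timeoutsLeft = new int[] { simulatedTimeouts };
        this.maxRetries = maxRetries;
        this.scanner = new SketchScanner(rows, 0, timeoutsLeft);
    }

    String nextRow() {
        for (int attempt = 0; ; attempt++) {
            try {
                String row = scanner.next();
                if (row != null) {
                    rowsReturned++;
                }
                return row;
            } catch (SketchTimeoutException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted, let the task fail as it does today
                }
                // Reopen the scanner at the first row that was not yet returned.
                scanner = new SketchScanner(rows, rowsReturned, timeoutsLeft);
            }
        }
    }

    // Convenience helper: read every row, retrying on simulated timeouts.
    static List<String> readAll(List<String> rows, int simulatedTimeouts, int maxRetries) {
        RetryingReaderSketch reader = new RetryingReaderSketch(rows, simulatedTimeouts, maxRetries);
        List<String> out = new ArrayList<>();
        for (String row = reader.nextRow(); row != null; row = reader.nextRow()) {
            out.add(row);
        }
        return out;
    }
}
```

With two simulated timeouts and maxRetries set to 3, readAll returns every row; with maxRetries set to 1, the second timeout is rethrown.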

After looking at the code, it also seems to me that there may be a bug causing the reader to skip the first row of a region. The scenario is as follows:

 - the reader is initialized with TableRecordReader.init()
 - then nextKeyValue is called, causing a call to scanner.next() - here the ScannerTimeoutException occurs
 - the scanner is restarted by a call to restart() and then *two* calls to scanner.next() occur, so the first row is lost
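
The scenario can be modelled with a small, self-contained sketch. This is a simplified model of my reading of the code, not the HBase source: SkipFirstRowSketch and its fields are hypothetical, and the skipOnlyIfRowSeen guard is just one possible way to avoid the loss, shown for comparison.

```java
import java.util.ArrayList;
import java.util.List;

class SkipFirstRowSketch {
    private final List<String> rows;
    private final boolean skipOnlyIfRowSeen; // hypothetical guard, not an existing option
    private int position = 0;                // simulated scanner position
    private int restartIndex = 0;            // start row at first, then the last row returned
    private boolean rowSeen = false;         // true once a row has been handed to the caller
    private boolean timeoutPending = true;   // only the very first next() call times out

    SkipFirstRowSketch(List<String> rows, boolean skipOnlyIfRowSeen) {
        this.rows = rows;
        this.skipOnlyIfRowSeen = skipOnlyIfRowSeen;
    }

    // Simulated scanner.next(): returns null when the scan is exhausted.
    private String scannerNext() {
        if (timeoutPending) {
            timeoutPending = false;
            throw new IllegalStateException("simulated scanner timeout");
        }
        return position < rows.size() ? rows.get(position++) : null;
    }

    String nextKeyValue() {
        String value;
        try {
            value = scannerNext();
        } catch (IllegalStateException e) {
            position = restartIndex;          // models restart()
            if (!skipOnlyIfRowSeen || rowSeen) {
                scannerNext();                // skip the row presumed already delivered
            }
            value = scannerNext();
        }
        if (value != null) {
            restartIndex = position - 1;      // remember the last row returned
            rowSeen = true;
        }
        return value;
    }

    // Reads all rows; the first scanner call always hits the simulated timeout.
    static List<String> readAll(List<String> rows, boolean skipOnlyIfRowSeen) {
        SkipFirstRowSketch reader = new SkipFirstRowSketch(rows, skipOnlyIfRowSeen);
        List<String> out = new ArrayList<>();
        for (String row = reader.nextKeyValue(); row != null; row = reader.nextKeyValue()) {
            out.add(row);
        }
        return out;
    }
}
```

For rows r1, r2, r3, readAll(rows, false) returns only r2 and r3, which is the loss described above, while readAll(rows, true) returns all three rows.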


Can anyone confirm this?

Thanks,
 Jan
