It can be, for several reasons:

1) The reader feeds its data into an operation that cannot always consume data. A sort, for example, takes records until its buffer is full; the sort is then triggered, and the reader has to wait while it runs. A join may take some records and then pause until the necessary hash tables are fully built.
2) The JVM pauses for garbage collection. In some scenarios these pauses can be quite long (minutes).

Stephan

On Thu, Nov 27, 2014 at 6:10 PM, Flavio Pompermaier <[email protected]> wrote:

> Could it be that there are times in the TaskManager where there are large
> pauses between an inputFormat.nextRecord() and the next one..?
>
> On Thu, Nov 27, 2014 at 3:44 PM, Stefano Bortoli <[email protected]>
> wrote:
>
>> hi all,
>>
>> I am facing an odd issue while running a quite complex duplicates
>> detection process.
>>
>> The code runs like a charm on a dataset of a million with few duplicates
>> (3 minutes), but hits the scanner timeout over a dataset of 9.2M.
>>
>> The problem happens randomly, and I don't think it is related to the
>> business logic, or the scan configurations for that matter.
>>
>> The caching block is set to 100, and the scan timeout is 900,000
>> milliseconds (15 min). The job would run normally in around 0.5 seconds on
>> 100 entries... therefore I must be hitting something deep. Something
>> related to how Hadoop and HBase work together.
>>
>> My problem is that it may fail or it may not. Yesterday I could complete
>> the whole scan without problems, then the job failed over another error.
>> Today, the same code failed after 3.5h, a little before completion of the
>> first phase.
>>
>> I think it may be something about GC.
>>
>> I log the execution time of every single map, and everything finishes
>> within milliseconds. Even then the exception happens (as I catch it,
>> print it, and throw it again).
>>
>> Any idea of where the issue could be?
>>
>> thanks a lot for the support. Stack trace appended.
>>
>> saluti,
>> Stefano
>>
>> Error: org.apache.hadoop.hbase.client.ScannerTimeoutException: 2387347ms passed since the last invocation, timeout is currently set to 900000
>>     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:352)
>>     at org.apache.flink.addons.hbase.TableInputFormat.nextRecord(TableInputFormat.java:106)
>>     at org.apache.flink.addons.hbase.TableInputFormat.nextRecord(TableInputFormat.java:48)
>>     at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:195)
>>     at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:246)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 291, already closed?
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3043)
>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>>     at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:283)
>>     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:198)
>>     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
>>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
>>     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:336)
>>     ... 5 more
>> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.UnknownScannerException): org.apache.hadoop.hbase.UnknownScannerException: Name: 291, already closed?
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3043)
>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>>     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1458)
>>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1662)
>>     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1720)
>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29900)
>>     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:168)
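
One way to check whether the two causes named in the reply (a downstream operator that stops consuming, or a long GC pause) are what stretches the gap between successive nextRecord() calls is to measure that gap directly and compare it with the JVM's accumulated GC time. The following is only a minimal sketch: the class CallGapTracker and its methods are hypothetical helpers, not part of Flink or HBase, and calling recordCall() at the start of each nextRecord() is an assumption about how one might wire it in. The GC figure comes from the standard java.lang.management API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Hypothetical helper (not a Flink or HBase class): tracks the time that
 * passes between successive calls, e.g. calls to nextRecord().
 */
final class CallGapTracker {
    private final long thresholdMillis; // e.g. the scanner timeout
    private long lastCallMillis = -1;   // -1 means "no call seen yet"
    private long maxGapMillis = 0;
    private int gapsOverThreshold = 0;

    CallGapTracker(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Records a call at an explicit timestamp and returns the gap since the previous call. */
    long recordCall(long nowMillis) {
        long gap = 0;
        if (lastCallMillis >= 0) {
            gap = nowMillis - lastCallMillis;
            maxGapMillis = Math.max(maxGapMillis, gap);
            if (gap > thresholdMillis) {
                gapsOverThreshold++;
            }
        }
        lastCallMillis = nowMillis;
        return gap;
    }

    /** Records a call using the JVM's monotonic clock. */
    long recordCall() {
        return recordCall(System.nanoTime() / 1_000_000L);
    }

    long maxGapMillis() {
        return maxGapMillis;
    }

    int gapsOverThreshold() {
        return gapsOverThreshold;
    }

    /** Sums the accumulated collection time reported by all garbage collectors of this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }
}
```

Logging the returned gap together with the change in totalGcMillis() whenever the gap is large would show whether a long pause coincides with garbage collection or falls in a period where the consuming operator (a sort or join) was simply not requesting records.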
