It can be, for several reasons:

1) The reader feeds the data into an operation that cannot always consume
data immediately. A sort, for example, takes records until the buffer is
full and the sort is triggered. A join may take some records and then pause
until the necessary hash tables are fully built.

2) The JVM pauses for garbage collection. In some scenarios these pauses
can be quite long (minutes).
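
If the gaps come from such pauses, the usual workaround is to raise the
scanner lease/timeout above the longest expected pause. A minimal
configuration sketch, assuming HBase 0.96+ where the key is
hbase.client.scanner.timeout.period (older releases use
hbase.regionserver.lease.period); note that the value should also be set on
the region servers, since the lease is enforced server-side:

```xml
<!-- hbase-site.xml (client AND region servers): raise the scanner
     lease/timeout from the 60 s default to 30 minutes -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>1800000</value>
</property>
```

Lowering Scan.setCaching() also helps, since each next() then returns
sooner and renews the lease more often.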

Stephan

On Thu, Nov 27, 2014 at 6:10 PM, Flavio Pompermaier <[email protected]>
wrote:

> Could it be that there are times in the TaskManager when there is a large
> pause between one inputFormat.nextRecord() call and the next..?
>
> On Thu, Nov 27, 2014 at 3:44 PM, Stefano Bortoli <[email protected]>
> wrote:
>
>> hi all,
>>
>> I am facing an odd issue while running a fairly complex
>> duplicate-detection process.
>>
>> The code runs like a charm on a dataset of a million records with few
>> duplicates (3 minutes), but hits the scanner timeout on a dataset of 9.2M.
>>
>> The problem happens randomly, and I don't think it is related to the
>> business logic, or to the scan configuration for that matter.
>>
>> The scanner caching is set to 100, and the scan timeout is 900,000
>> milliseconds (15 min). The job normally runs in around 0.5 seconds on 100
>> entries... therefore I must be hitting something deeper, something
>> related to how Hadoop and HBase work together.
>>
>> My problem is that it may or may not fail. Yesterday I could complete
>> the whole scan without problems, then the job failed with another error.
>> Today, the same code failed after 3.5 h, a little before completing the
>> first phase.
>>
>> I think it may be something related to GC.
>>
>> I log the execution time of every single map, and each one finishes
>> within milliseconds. Even so, the exception happens (I catch it, print
>> it, and throw it again).
>>
>> Any idea of where the issue could be?
>>
>> thanks a lot for the support. Stack trace appended.
>>
>> saluti,
>> Stefano
>>
>> Error: org.apache.hadoop.hbase.client.ScannerTimeoutException: 2387347ms
>> passed since the last invocation, timeout is currently set to 900000
>> at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:352)
>> at org.apache.flink.addons.hbase.TableInputFormat.nextRecord(TableInputFormat.java:106)
>> at org.apache.flink.addons.hbase.TableInputFormat.nextRecord(TableInputFormat.java:48)
>> at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:195)
>> at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:246)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.hadoop.hbase.UnknownScannerException:
>> org.apache.hadoop.hbase.UnknownScannerException: Name: 291, already closed?
>> at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3043)
>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>> at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>> at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:283)
>> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:198)
>> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
>> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
>> at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:336)
>> ... 5 more
>> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.UnknownScannerException):
>> org.apache.hadoop.hbase.UnknownScannerException: Name: 291, already closed?
>> at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3043)
>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1458)
>> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1662)
>> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1720)
>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29900)
>> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:168)
>
>
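
One way to check Stephan's pause hypothesis in Stefano's setup is to log the
wall-clock gap between successive nextRecord() calls rather than the
per-record execution time, which stays in the milliseconds even when the
scanner times out. A minimal stdlib-only sketch (the GapLogger helper is
hypothetical, not part of Flink or HBase):

```java
/**
 * Hypothetical helper: measures the wall-clock gap between successive
 * calls, to reveal pauses *between* nextRecord() invocations that
 * per-record timing would miss (e.g. GC pauses or a stalled downstream
 * consumer).
 */
public class GapLogger {
    private long lastNanos = -1L;

    /** Returns the gap in ms since the previous call (0 on the first call). */
    public long tickMillis() {
        long now = System.nanoTime();
        long gapMs = (lastNanos < 0) ? 0 : (now - lastNanos) / 1_000_000L;
        lastNanos = now;
        return gapMs;
    }

    public static void main(String[] args) throws InterruptedException {
        GapLogger gaps = new GapLogger();
        gaps.tickMillis();      // first call: establishes the baseline
        Thread.sleep(50);       // simulate a pause between two records
        long gapMs = gaps.tickMillis();
        if (gapMs > 10) {
            System.out.println("long gap between records: " + gapMs + "ms");
        }
    }
}
```

Calling tickMillis() once per nextRecord() and logging gaps above some
threshold would show whether the 2,387,347 ms reported by the
ScannerTimeoutException accumulates between records rather than inside them.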
