Hi all,

I am facing an odd issue while running a fairly complex duplicate-detection
process.

The code runs like a charm on a dataset of one million records with few
duplicates (3 minutes), but hits the scanner timeout on a dataset of 9.2M.

The problem happens randomly, and I don't think it is related to the
business logic, or to the scan configuration for that matter.

The scanner caching is set to 100, and the scan timeout is 900,000
milliseconds (15 min). The job normally processes 100 entries in around
0.5 seconds, so I must be hitting something deeper, something related to
how Hadoop and HBase work together.
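
To make the numbers concrete, here is a rough model of how I understand the
timeout check. It is my own simplification, not HBase code: the class
`ScanLeaseModel` and its methods are made up for illustration, and I am
assuming that the clock runs between two consecutive `next()` round trips to
the region server, so that a whole cached batch of rows has to be consumed
within the timeout.

```java
// Simplified model of the scanner timeout, NOT the real HBase implementation.
// Class and method names are hypothetical.
final class ScanLeaseModel {

    /** True when the gap since the previous next() call exceeds the timeout. */
    static boolean leaseExpired(long lastNextMillis, long nowMillis, long timeoutMillis) {
        return nowMillis - lastNextMillis > timeoutMillis;
    }

    /** Average time budget per row (ms) if one cached batch must fit in the timeout. */
    static long perRowBudgetMillis(long timeoutMillis, int caching) {
        return timeoutMillis / caching;
    }

    public static void main(String[] args) {
        long timeoutMillis = 900_000L;   // scan timeout from my configuration
        long elapsedMillis = 2_387_347L; // gap reported in the stack trace
        int caching = 100;               // rows fetched per round trip

        System.out.println("lease expired: " + leaseExpired(0L, elapsedMillis, timeoutMillis));
        System.out.println("per-row budget (ms): " + perRowBudgetMillis(timeoutMillis, caching));
        System.out.println("reported gap (min): " + elapsedMillis / 60_000.0);
    }
}
```

If this model is right, each row has an average budget of 9 seconds, far more
than my maps take, so the gap in the trace (almost 40 minutes) can hardly come
from the per-record work alone.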

My problem is that it may fail or it may not. Yesterday I could complete
the whole scan without problems, but then the job failed with another error.
Today, the same code failed after 3.5h, shortly before completing the
first phase.

I think it may be something about GC.
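
To check this on the client side, something along these lines could be logged
periodically from the job. It is only a sketch: `GcSnapshot` is a hypothetical
helper name, it relies on the standard `java.lang.management` beans, and it
only sees the JVM it runs in (the Flink task), not the region servers, whose
GC logs would have to be checked separately.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports how much time this JVM has spent in GC so far.
final class GcSnapshot {

    /** Accumulated GC time in milliseconds, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0L;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("GC time so far (ms): " + totalGcMillis());
    }
}
```

Comparing two snapshots taken around a slow stretch would show whether a long
pause on the client lines up with the timeout.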

I log the execution time of every single map call, and each one finishes
within milliseconds. The exception happens even so (I catch it, print it,
and rethrow it).
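
Since the message talks about time passed since the last invocation, the
duration of each map may be the wrong thing to measure; the gap between two
consecutive calls might tell more. A minimal sketch of such a tracker,
assuming it is fed `System.nanoTime()` from wherever the records arrive (the
class name `CallGapTracker` is hypothetical):

```java
// Hypothetical helper: tracks the pause between consecutive invocations,
// rather than the duration of each invocation.
final class CallGapTracker {
    private boolean seenFirstCall = false;
    private long lastCallNanos;
    private long maxGapNanos = 0L;

    /**
     * Records one invocation at the given timestamp (e.g. System.nanoTime()).
     * Returns the gap since the previous invocation in ms, or 0 for the first one.
     */
    long record(long nowNanos) {
        long gapNanos = seenFirstCall ? nowNanos - lastCallNanos : 0L;
        seenFirstCall = true;
        lastCallNanos = nowNanos;
        if (gapNanos > maxGapNanos) {
            maxGapNanos = gapNanos;
        }
        return gapNanos / 1_000_000L;
    }

    /** Longest gap observed so far, in milliseconds. */
    long maxGapMillis() {
        return maxGapNanos / 1_000_000L;
    }
}
```

Calling `record(System.nanoTime())` at the start of every map and logging
`maxGapMillis()` now and then would show whether the calls themselves stall
for minutes even though each one is fast.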

Any idea of where the issue could be?

Thanks a lot for the support. The stack trace is appended below.

Regards,
Stefano

Error: org.apache.hadoop.hbase.client.ScannerTimeoutException: 2387347ms passed since the last invocation, timeout is currently set to 900000
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:352)
at org.apache.flink.addons.hbase.TableInputFormat.nextRecord(TableInputFormat.java:106)
at org.apache.flink.addons.hbase.TableInputFormat.nextRecord(TableInputFormat.java:48)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:195)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:246)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 291, already closed?
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3043)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
at java.lang.Thread.run(Thread.java:745)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:283)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:198)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:336)
... 5 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.UnknownScannerException): org.apache.hadoop.hbase.UnknownScannerException: Name: 291, already closed?
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3043)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
at java.lang.Thread.run(Thread.java:745)

at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1458)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1662)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1720)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29900)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:168)
