Hello,
I’m running HBase 1.4.4. I’ve got a simple endpoint coprocessor that sums
records when called. Whenever a region split occurs, the next call fails
with a NotServingRegionException. The client then spends 10 minutes
retrying the call, up to 35 times:
2019-02-19 09:42:34 INFO o.a.h.h.c.RpcRetryingCaller
[hconnection-0x100f9a76-shared--pool3-t215]: Call exception, tries=25,
retries=35, started=331810 ms ago, cancelled=false,
msg=org.apache.hadoop.hbase.NotServingRegionException: Region
coprocessor-test,1,1550568604433.63f03f2a494dc5756238ba08af437af6. is not
online on <hostname>,16020,1550568101996
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3082)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1275)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2201)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
row '1_pfx-cfb0e548-f399-4059-af80-54fe9b7a828f' on table
'coprocessor-test' at
region=coprocessor-test,1_pfx-7b2b6071-7d2c-4282-9645-31ca027327dc6549,1550568988094.f6cc0c6245702c544fb7fe65c1e3299b.,
hostname=<hostname>l,16020,1550568101996, seqNum=630
before eventually failing:
Tue Feb 19 09:37:02 UTC 2019,
RpcRetryingCaller{globalStartTime=1550569022304, pause=100, retries=35},
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region
coprocessor-test,9,1550568604433.2d98945e85cca401a2c5d8bd777a0451. is not
online on <hostname>,16020,1550568099593
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3082)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1275)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2201)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
If I then re-run the coprocessor call, it works without any issues. So I
need a way to catch this error quickly and retry the call manually until it
succeeds. I can't see a way to change any useful parameter – the 35 retries
and the pause between retries seem to be hardcoded.
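
To illustrate the kind of manual retry I have in mind, here is a rough
sketch. It is plain Java with no HBase dependency: CoprocessorRetry,
callWithRetry and isRegionNotServing are hypothetical names, the coprocessor
call is stood in for by a Callable, and the exception is matched by name
rather than by type. It only helps if each attempt fails fast instead of
going through all 35 internal retries first.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HBase: retries a call while the failure
// looks like a NotServingRegionException (e.g. right after a region split).
final class CoprocessorRetry {

    // Walks the cause chain and matches by name, so this sketch compiles
    // without HBase on the classpath. Real code would use instanceof.
    static boolean isRegionNotServing(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String name = c.getClass().getSimpleName();
            String msg = String.valueOf(c.getMessage());
            if (name.equals("NotServingRegionException")
                    || msg.contains("NotServingRegionException")) {
                return true;
            }
        }
        return false;
    }

    // Runs the call up to maxAttempts times, sleeping pauseMs between
    // attempts. Any other kind of failure is rethrown immediately.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long pauseMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!isRegionNotServing(e)) {
                    throw e;
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMs);
                }
            }
        }
        throw last;
    }
}
```

The Callable would wrap whatever invokes the endpoint, for example
CoprocessorRetry.callWithRetry(() -> sumRecords(table), 5, 200L), where
sumRecords is a placeholder for my existing client code.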
Can anyone suggest how I can go about solving this?
Regards,
Ben