Looks like the problem was caused by an index region split.

Index region split at prod013:

Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986., new regions: IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1537461904877.33cba925c7acb347ac3f5e70e839c3cb., IDX_MARK_O,\x107834005168\x0000000046200068=4YF!YI,1537461904877.acb8f16a004a894c8706f6e12cd26144.. Split took 0sec
Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] regionserver.SplitRequest: Split transaction journal:
Sep 20 19:45:05 prod013 hbase[193055]:         STARTED at 1537461904853
Sep 20 19:45:05 prod013 hbase[193055]:         PREPARED at 1537461904877
Sep 20 19:45:05 prod013 hbase[193055]:         BEFORE_PRE_SPLIT_HOOK at 1537461904877
Sep 20 19:45:05 prod013 hbase[193055]:         AFTER_PRE_SPLIT_HOOK at 1537461904877
Sep 20 19:45:05 prod013 hbase[193055]:         SET_SPLITTING at 1537461904880
Sep 20 19:45:05 prod013 hbase[193055]:         CREATE_SPLIT_DIR at 1537461904987
Sep 20 19:45:05 prod013 hbase[193055]:         CLOSED_PARENT_REGION at 1537461905002
Sep 20 19:45:05 prod013 hbase[193055]:         OFFLINED_PARENT at 1537461905002
Sep 20 19:45:05 prod013 hbase[193055]:         STARTED_REGION_A_CREATION at 1537461905056
Sep 20 19:45:05 prod013 hbase[193055]:         STARTED_REGION_B_CREATION at 1537461905131
Sep 20 19:45:05 prod013 hbase[193055]:         PONR at 1537461905192
Sep 20 19:45:05 prod013 hbase[193055]:         OPENED_REGION_A at 1537461905249
Sep 20 19:45:05 prod013 hbase[193055]:         OPENED_REGION_B at 1537461905252
Sep 20 19:45:05 prod013 hbase[193055]:         BEFORE_POST_SPLIT_HOOK at 1537461905439
Sep 20 19:45:05 prod013 hbase[193055]:         AFTER_POST_SPLIT_HOOK at 1537461905439
Sep 20 19:45:05 prod013 hbase[193055]:         COMPLETED at 1537461905439
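For reference, the journal's epoch-millisecond timestamps make it easy to see how long each split phase took. A minimal Python sketch over the values logged above (names and numbers are copied from the journal; nothing else is assumed):

```python
# Epoch-millisecond timestamps copied from the split transaction journal above.
journal = [
    ("STARTED", 1537461904853),
    ("PREPARED", 1537461904877),
    ("BEFORE_PRE_SPLIT_HOOK", 1537461904877),
    ("AFTER_PRE_SPLIT_HOOK", 1537461904877),
    ("SET_SPLITTING", 1537461904880),
    ("CREATE_SPLIT_DIR", 1537461904987),
    ("CLOSED_PARENT_REGION", 1537461905002),
    ("OFFLINED_PARENT", 1537461905002),
    ("STARTED_REGION_A_CREATION", 1537461905056),
    ("STARTED_REGION_B_CREATION", 1537461905131),
    ("PONR", 1537461905192),  # point of no return
    ("OPENED_REGION_A", 1537461905249),
    ("OPENED_REGION_B", 1537461905252),
    ("BEFORE_POST_SPLIT_HOOK", 1537461905439),
    ("AFTER_POST_SPLIT_HOOK", 1537461905439),
    ("COMPLETED", 1537461905439),
]

# Delta (ms) between each phase and the one before it.
deltas = {cur[0]: cur[1] - prev[1] for prev, cur in zip(journal, journal[1:])}

# End-to-end wall time -- sub-second, hence the "Split took 0sec" in the log.
total_ms = journal[-1][1] - journal[0][1]
print(f"total: {total_ms} ms")
```

Note that the parent region 3e44b85ddf407da831dbb9a871496986 was offlined at OFFLINED_PARENT; a client whose meta cache still points at the parent will get NotServingRegionException until its cache refreshes, which matches the prod002 failure below.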
Index update failed at prod002:

Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,520 WARN [hconnection-0x4f3242a0-shared--pool32-t36014] client.AsyncProcess: #220, table=IDX_MARK_O, attempt=1/1 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986. is not online on prod013,60020,1537304282885
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3081)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2365)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
Sep 20 20:09:24 prod002 hbase[97285]:  on prod013,60020,1537304282885, tracking started Thu Sep 20 20:09:24 MSK 2018; not retrying 1 - final failure
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x39beae45 connecting to ZooKeeper ensemble=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] zookeeper.ZooKeeper: Initiating client connection, connectString=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@3ef61f7
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,562 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Opening socket connection to server 10.0.0.3/10.0.0.3:2181. Will not attempt to authenticate using SASL (unknown error)
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,570 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Socket connection established to 10.0.0.3/10.0.0.3:2181, initiating session
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,572 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Session establishment complete on server 10.0.0.3/10.0.0.3:2181, sessionid = 0x30000e039e01c7f, negotiated timeout = 40000
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,628 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] index.PhoenixIndexFailurePolicy: Successfully update INDEX_DISABLE_TIMESTAMP for IDX_MARK_O due to an exception while writing updates. indexState=PENDING_DISABLE
Sep 20 20:09:24 prod002 hbase[97285]: org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: disableIndexOnFailure=true, Failed to write to multiple index tables: [IDX_MARK_O]
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:235)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:916)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:844)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2405)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,632 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] util.IndexManagementUtil: Rethrowing org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 1121 (XCL21): Write to the index failed.
disableIndexOnFailure=true, Failed to write to multiple index tables: [IDX_MARK_O] ,serverTimestamp=1537463364504,

> On 20 Sep 2018, at 21:01, Batyrshin Alexander <0x62...@gmail.com> wrote:
>
> Our setup:
> HBase-1.4.7
> Phoenix-4.14-hbase-1.4
>
>
>> On 20 Sep 2018, at 20:19, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>
>> Hello,
>> Looks like we got a deadlock with a repeating "ERROR 1120 (XCL20)" exception.
>> At this time all indexes are ACTIVE.
>> Can you help us make a deeper diagnosis?
>>
>> java.sql.SQLException: ERROR 1120 (XCL20): Writes to table blocked until index can be updated. tableName=TBL_MARK
>>         at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
>>         at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>>         at org.apache.phoenix.execute.MutationState.validateAndGetServerTimestamp(MutationState.java:815)
>>         at org.apache.phoenix.execute.MutationState.validateAll(MutationState.java:789)
>>         at org.apache.phoenix.execute.MutationState.send(MutationState.java:981)
>>         at org.apache.phoenix.execute.MutationState.send(MutationState.java:1514)
>>         at org.apache.phoenix.execute.MutationState.commit(MutationState.java:1337)
>>         at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:670)
>>         at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:666)
>>         at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>>         at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:666)
>>         at x.persistence.phoenix.PhoenixDao.$anonfun$doUpsert$1(PhoenixDao.scala:103)
>>         at scala.util.Try$.apply(Try.scala:209)
>>         at x.persistence.phoenix.PhoenixDao.doUpsert(PhoenixDao.scala:101)
>>         at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2(PhoenixDao.scala:45)
>>         at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2$adapted(PhoenixDao.scala:45)
>>         at scala.collection.immutable.Stream.flatMap(Stream.scala:486)
>>         at scala.collection.immutable.Stream.$anonfun$flatMap$1(Stream.scala:494)
>>         at scala.collection.immutable.Stream.$anonfun$append$1(Stream.scala:252)
>>         at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1169)
>>         at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1159)
>>         at scala.collection.immutable.Stream.length(Stream.scala:309)
>>         at scala.collection.SeqLike.size(SeqLike.scala:105)
>>         at scala.collection.SeqLike.size$(SeqLike.scala:105)
>>         at scala.collection.AbstractSeq.size(Seq.scala:41)
>>         at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:285)
>>         at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:283)
>>         at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
>>         at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$1(PhoenixDao.scala:45)
>>         at scala.util.Try$.apply(Try.scala:209)
>>         at x.persistence.phoenix.PhoenixDao.batchInsert(PhoenixDao.scala:45)
>>         at x.persistence.phoenix.PhoenixDao.$anonfun$insert$2(PhoenixDao.scala:35)
>>         at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:655)
>>         at scala.util.Success.$anonfun$map$1(Try.scala:251)
>>         at scala.util.Success.map(Try.scala:209)
>>         at scala.concurrent.Future.$anonfun$map$1(Future.scala:289)
>>         at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
>>         at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>         at java.lang.Thread.run(Thread.java:748)
>>
>
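While an index sits in PENDING_DISABLE, Phoenix blocks writes to the data table with ERROR 1120 (XCL20), and the write can succeed again once the index is re-enabled. One client-side mitigation is to back off and retry the commit rather than failing the batch outright. A minimal sketch of that retry loop, with hypothetical stand-ins (WriteBlockedError and the commit callable are placeholders for the real java.sql.SQLException and JDBC commit, not the Phoenix API):

```python
import time

class WriteBlockedError(Exception):
    """Stand-in for ERROR 1120 (XCL20) -- hypothetical, not a Phoenix class."""

def commit_with_backoff(commit, max_attempts=5, base_delay=0.1):
    """Retry a commit callable with exponential backoff.

    While an index is PENDING_DISABLE the server rejects writes to the
    data table; once the index is re-enabled the same commit should go
    through, so transient blocks are absorbed by waiting and retrying.
    """
    for attempt in range(max_attempts):
        try:
            return commit()
        except WriteBlockedError:
            if attempt == max_attempts - 1:
                raise  # still blocked after all attempts; surface the error
            time.sleep(base_delay * 2 ** attempt)
```

If the index never returns to ACTIVE on its own, the usual recovery path is a manual rebuild, e.g. ALTER INDEX IDX_MARK_O ON TBL_MARK REBUILD.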