On 21 Sep 2018, at 00:15, Batyrshin Alexander <[email protected]> wrote:
And here is how this split looks in the Master logs:
Sep 20 19:45:04 prod001 hbase[10838]: 2018-09-20 19:45:04,888 INFO [AM.ZK.Worker-pool5-t282] master.RegionStates: Transition {3e44b85ddf407da831dbb9a871496986 state=OPEN, ts=1537304859509, server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461904888, server=prod
Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461905340, server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 state=SPLIT, ts=1537461905340, server=pro
Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Offlined 3e44b85ddf407da831dbb9a871496986 from prod013,60020,1537304282885
Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition {33cba925c7acb347ac3f5e70e839c3cb state=SPLITTING_NEW, ts=1537461905340, server=prod013,60020,1537304282885} to {33cba925c7acb347ac3f5e70e839c3cb state=OPEN, ts=1537461905341, server=
Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition {acb8f16a004a894c8706f6e12cd26144 state=SPLITTING_NEW, ts=1537461905340, server=prod013,60020,1537304282885} to {acb8f16a004a894c8706f6e12cd26144 state=OPEN, ts=1537461905341, server=
Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,343 INFO [AM.ZK.Worker-pool5-t284] master.AssignmentManager: Handled SPLIT event; parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986., daughter a=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1
Sep 20 19:47:41 prod001 hbase[10838]: 2018-09-20 19:47:41,972 INFO [prod001,60000,1537304851459_ChoreService_2] balancer.StochasticLoadBalancer: Skipping load balancing because balanced cluster; total cost is 17.82282205608522, sum multiplier is 1102.0 min cost which need balance is 0.05
Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,021 INFO [prod001,60000,1537304851459_ChoreService_1] hbase.MetaTableAccessor: Deleted IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,022 INFO [prod001,60000,1537304851459_ChoreService_1] master.CatalogJanitor: Scanned 779 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
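
A quick way to double-check the result here (the parent region GC'd from hbase:meta, the two daughters online) is to list the index table's regions through the HBase 1.4 Java client. A minimal sketch, assuming the cluster configuration is on the classpath; the table name is the one from the logs above:

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class ListIndexRegions {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                // The old parent 3e44b85ddf407da831dbb9a871496986 should no longer
                // appear; only the two daughter regions should be listed.
                List<HRegionInfo> regions =
                        admin.getTableRegions(TableName.valueOf("IDX_MARK_O"));
                for (HRegionInfo r : regions) {
                    System.out.println(r.getEncodedName() + "  " + r.getRegionNameAsString());
                }
            }
        }
    }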
On 20 Sep 2018, at 21:43, Batyrshin Alexander <[email protected]> wrote:
Looks like the problem was caused by an index region split.
The index region split at prod013:
Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986., new regions: IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1537461904877.33cba925c7acb347ac3f5e70e839c3cb., IDX_MARK_O,\x107834005168\x0000000046200068=4YF!YI,1537461904877.acb8f16a004a894c8706f6e12cd26144.. Split took 0sec
Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] regionserver.SplitRequest: Split transaction journal:
Sep 20 19:45:05 prod013 hbase[193055]: STARTED at 1537461904853
Sep 20 19:45:05 prod013 hbase[193055]: PREPARED at 1537461904877
Sep 20 19:45:05 prod013 hbase[193055]: BEFORE_PRE_SPLIT_HOOK at 1537461904877
Sep 20 19:45:05 prod013 hbase[193055]: AFTER_PRE_SPLIT_HOOK at 1537461904877
Sep 20 19:45:05 prod013 hbase[193055]: SET_SPLITTING at 1537461904880
Sep 20 19:45:05 prod013 hbase[193055]: CREATE_SPLIT_DIR at 1537461904987
Sep 20 19:45:05 prod013 hbase[193055]: CLOSED_PARENT_REGION at 1537461905002
Sep 20 19:45:05 prod013 hbase[193055]: OFFLINED_PARENT at 1537461905002
Sep 20 19:45:05 prod013 hbase[193055]: STARTED_REGION_A_CREATION at 1537461905056
Sep 20 19:45:05 prod013 hbase[193055]: STARTED_REGION_B_CREATION at 1537461905131
Sep 20 19:45:05 prod013 hbase[193055]: PONR at 1537461905192
Sep 20 19:45:05 prod013 hbase[193055]: OPENED_REGION_A at 1537461905249
Sep 20 19:45:05 prod013 hbase[193055]: OPENED_REGION_B at 1537461905252
Sep 20 19:45:05 prod013 hbase[193055]: BEFORE_POST_SPLIT_HOOK at 1537461905439
Sep 20 19:45:05 prod013 hbase[193055]: AFTER_POST_SPLIT_HOOK at 1537461905439
Sep 20 19:45:05 prod013 hbase[193055]: COMPLETED at 1537461905439
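
Side note: the journal timestamps are epoch milliseconds, so the whole split took 1537461905439 - 1537461904853 = 586 ms, which matches "Split took 0sec". A throwaway snippet to break that down per phase (a subset of the timestamps, copied from the journal above):

    public class SplitJournalTimings {
        public static void main(String[] args) {
            // A subset of the journal's phase timestamps (epoch millis).
            String[] phases = {"STARTED", "PREPARED", "SET_SPLITTING", "CREATE_SPLIT_DIR",
                    "CLOSED_PARENT_REGION", "PONR", "OPENED_REGION_B", "COMPLETED"};
            long[] ts = {1537461904853L, 1537461904877L, 1537461904880L, 1537461904987L,
                    1537461905002L, 1537461905192L, 1537461905252L, 1537461905439L};
            for (int i = 1; i < ts.length; i++) {
                System.out.printf("%-20s -> %-20s %4d ms%n",
                        phases[i - 1], phases[i], ts[i] - ts[i - 1]);
            }
            // Prints 586 ms in total.
            System.out.printf("total: %d ms%n", ts[ts.length - 1] - ts[0]);
        }
    }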
The index update failed at prod002:
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,520 WARN [hconnection-0x4f3242a0-shared--pool32-t36014] client.AsyncProcess: #220, table=IDX_MARK_O, attempt=1/1 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986. is not online on prod013,60020,1537304282885
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3081)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2365)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
Sep 20 20:09:24 prod002 hbase[97285]: on prod013,60020,1537304282885, tracking started Thu Sep 20 20:09:24 MSK 2018; not retrying 1 - final failure
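
Worth noting the attempt=1/1 above: the index write gets a single attempt, so a NotServingRegionException from what is apparently a stale cached location of the just-split parent immediately becomes a final failure. For contrast, a rough sketch (not the Phoenix internals; the retry count is made up for the illustration) of an ordinary retry loop that backs off and gives the client a chance to refresh its hbase:meta location cache:

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;

    public class RetryingPut {
        // Hypothetical retry limit, just for the sketch.
        private static final int MAX_ATTEMPTS = 3;

        static void putWithRetry(Connection conn, Put put)
                throws IOException, InterruptedException {
            for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
                try (Table table = conn.getTable(TableName.valueOf("IDX_MARK_O"))) {
                    table.put(put);
                    return;
                } catch (IOException e) {
                    // A NotServingRegionException (possibly wrapped by the client)
                    // means the region moved or split; the next lookup refreshes the
                    // cached location from hbase:meta, so back off and retry.
                    if (attempt == MAX_ATTEMPTS) throw e;
                    Thread.sleep(100L * attempt);
                }
            }
        }
    }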
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x39beae45 connecting to ZooKeeper ensemble=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] zookeeper.ZooKeeper: Initiating client connection, connectString=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@3ef61f7
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,562 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Opening socket connection to server 10.0.0.3/10.0.0.3:2181. Will not attempt to authenticate using SASL (unknown error)
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,570 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Socket connection established to 10.0.0.3/10.0.0.3:2181, initiating session
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,572 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Session establishment complete on server 10.0.0.3/10.0.0.3:2181, sessionid = 0x30000e039e01c7f, negotiated timeout = 40000
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,628 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] index.PhoenixIndexFailurePolicy: Successfully update INDEX_DISABLE_TIMESTAMP for IDX_MARK_O due to an exception while writing updates. indexState=PENDING_DISABLE
Sep 20 20:09:24 prod002 hbase[97285]: org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: disableIndexOnFailure=true, Failed to write to multiple index tables: [IDX_MARK_O]
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:235)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:916)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:844)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2405)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
Sep 20 20:09:24 prod002 hbase[97285]: at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,632 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] util.IndexManagementUtil: Rethrowing org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 1121 (XCL21): Write to the index failed. disableIndexOnFailure=true, Failed to write to multiple index tables: [IDX_MARK_O] ,serverTimestamp=1537463364504,
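
The indexState=PENDING_DISABLE above can be watched directly in SYSTEM.CATALOG. A minimal sketch over Phoenix JDBC (the ZK quorum is the one from the logs in this thread; note that Phoenix stores the state as a one-letter code, e.g. 'a' for ACTIVE):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class IndexStateCheck {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:phoenix:10.0.0.1,10.0.0.2,10.0.0.3:2181";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement st = conn.createStatement();
                 // TABLE_TYPE = 'i' marks index tables in SYSTEM.CATALOG.
                 ResultSet rs = st.executeQuery(
                         "SELECT TABLE_NAME, INDEX_STATE FROM SYSTEM.CATALOG " +
                         "WHERE TABLE_TYPE = 'i' AND INDEX_STATE IS NOT NULL")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getString(2));
                }
            }
        }
    }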
On 20 Sep 2018, at 21:01, Batyrshin Alexander <[email protected]> wrote:
Our setup:
HBase-1.4.7
Phoenix-4.14-hbase-1.4
On 20 Sep 2018, at 20:19, Batyrshin Alexander <[email protected]> wrote:
Hello,
Looks like we hit a deadlock with a repeating "ERROR 1120 (XCL20)" exception. At the same time, all indexes are ACTIVE.
Can you help us make a deeper diagnosis?
java.sql.SQLException: ERROR 1120 (XCL20): Writes to table blocked until index can be updated. tableName=TBL_MARK
        at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
        at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
        at org.apache.phoenix.execute.MutationState.validateAndGetServerTimestamp(MutationState.java:815)
        at org.apache.phoenix.execute.MutationState.validateAll(MutationState.java:789)
        at org.apache.phoenix.execute.MutationState.send(MutationState.java:981)
        at org.apache.phoenix.execute.MutationState.send(MutationState.java:1514)
        at org.apache.phoenix.execute.MutationState.commit(MutationState.java:1337)
        at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:670)
        at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:666)
        at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
        at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:666)
        at x.persistence.phoenix.PhoenixDao.$anonfun$doUpsert$1(PhoenixDao.scala:103)
        at scala.util.Try$.apply(Try.scala:209)
        at x.persistence.phoenix.PhoenixDao.doUpsert(PhoenixDao.scala:101)
        at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2(PhoenixDao.scala:45)
        at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2$adapted(PhoenixDao.scala:45)
        at scala.collection.immutable.Stream.flatMap(Stream.scala:486)
        at scala.collection.immutable.Stream.$anonfun$flatMap$1(Stream.scala:494)
        at scala.collection.immutable.Stream.$anonfun$append$1(Stream.scala:252)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1169)
        at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1159)
        at scala.collection.immutable.Stream.length(Stream.scala:309)
        at scala.collection.SeqLike.size(SeqLike.scala:105)
        at scala.collection.SeqLike.size$(SeqLike.scala:105)
        at scala.collection.AbstractSeq.size(Seq.scala:41)
        at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:285)
        at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:283)
        at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
        at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$1(PhoenixDao.scala:45)
        at scala.util.Try$.apply(Try.scala:209)
        at x.persistence.phoenix.PhoenixDao.batchInsert(PhoenixDao.scala:45)
        at x.persistence.phoenix.PhoenixDao.$anonfun$insert$2(PhoenixDao.scala:35)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:655)
        at scala.util.Success.$anonfun$map$1(Try.scala:251)
        at scala.util.Success.map(Try.scala:209)
        at scala.concurrent.Future.$anonfun$map$1(Future.scala:289)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
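
In case it helps anyone else hitting XCL20: once the underlying region problem is gone, the index can be rebuilt by hand, which should bring it back to ACTIVE and unblock writes to the data table. A minimal sketch over Phoenix JDBC, assuming IDX_MARK_O is the index on TBL_MARK (which the two errors above suggest):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RebuildIndex {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:phoenix:10.0.0.1,10.0.0.2,10.0.0.3:2181";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement st = conn.createStatement()) {
                // Kicks off a full rebuild; the index is not used for queries
                // until it transitions back to ACTIVE.
                st.execute("ALTER INDEX IDX_MARK_O ON TBL_MARK REBUILD");
            }
        }
    }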