Hi, I have run into the following tricky problem:
Situation:
I successfully ran an upsert into multiple tables with transactions enabled
(there are many indexes created on these tables).
Problem:
After the first upsert completed successfully, I ran the same upsert a 2nd,
3rd, and further time. Sometimes the 2nd one works, but then the 3rd upsert
gets a timeout exception. At that point the whole of Phoenix seems to hang and
keeps retrying. I tried stopping the whole HBase cluster, including the Phoenix
Query Server and Tephra, and restarting it, but when I then try to connect with
sqlline.py, it hangs again.
hbase-site.xml settings:
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
<name>phoenix.transactions.enabled</name>
<value>true</value>
</property>
<property>
<name>data.tx.snapshot.dir</name>
<value>/tmp/tephra/snapshots</value>
</property>
<property>
<name>data.tx.timeout</name>
<value>120</value>
</property>
<property>
<name>phoenix.query.timeoutMs</name>
<value>1800000</value>
</property>
<property>
<name>hbase.regionserver.lease.period</name>
<value>1200000</value>
</property>
<property>
<name>hbase.rpc.timeout</name>
<value>1200000</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>1000</value>
</property>
<property>
<name>hbase.client.scanner.timeout.period</name>
<value>1200000</value>
</property>
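The timeouts above are not all in the same unit, which makes them hard to
compare at a glance. Below is a small Python sketch that puts them on one scale.
It assumes that Tephra's data.tx.timeout is in seconds (its documented default
is 30) and that the other four values are in milliseconds; the CONFIG string
simply re-wraps the timeout-related properties above in a <configuration> root
so the fragment parses. It only converts and compares numbers; it does not
claim to explain the hang.

```python
import xml.etree.ElementTree as ET

# Timeout-related properties copied from the hbase-site.xml above, wrapped in
# a <configuration> root element so the fragment is well-formed XML.
CONFIG = """<configuration>
<property><name>data.tx.timeout</name><value>120</value></property>
<property><name>phoenix.query.timeoutMs</name><value>1800000</value></property>
<property><name>hbase.regionserver.lease.period</name><value>1200000</value></property>
<property><name>hbase.rpc.timeout</name><value>1200000</value></property>
<property><name>hbase.client.scanner.timeout.period</name><value>1200000</value></property>
</configuration>"""

# Assumption: divisor that converts each property's raw value to seconds.
# data.tx.timeout is taken to be in seconds; the rest in milliseconds.
DIVISOR_TO_SECONDS = {
    "data.tx.timeout": 1,
    "phoenix.query.timeoutMs": 1000,
    "hbase.regionserver.lease.period": 1000,
    "hbase.rpc.timeout": 1000,
    "hbase.client.scanner.timeout.period": 1000,
}


def timeouts_in_seconds(xml_text):
    """Return {property name: timeout in seconds} for the known timeout keys."""
    root = ET.fromstring(xml_text)
    result = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name in DIVISOR_TO_SECONDS:
            result[name] = int(value) / DIVISOR_TO_SECONDS[name]
    return result


if __name__ == "__main__":
    secs = timeouts_in_seconds(CONFIG)
    tx_timeout = secs["data.tx.timeout"]
    # Print shortest first and flag anything longer than the Tephra timeout.
    for name, seconds in sorted(secs.items(), key=lambda kv: kv[1]):
        flag = "  (longer than data.tx.timeout)" if seconds > tx_timeout else ""
        print(f"{name}: {seconds:g} s{flag}")
```

With these values the transaction timeout (120 s) is much shorter than the RPC,
scanner and query timeouts (20 to 30 minutes), so a client could still be
waiting on a call after Tephra has already timed the transaction out.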
Below is part of the Query Server log:
18/04/08 05:47:12 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=xxxx.xxxx.local:2181 sessionTimeout=90000
watcher=org.apache.tephra.zookeeper.TephraZKClientService$5@6700104f
18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Opening socket connection to
server xxxx.xxxx.local/127.0.0.1:2181. Will not attempt to authenticate using
SASL (unknown error)
18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Socket connection established to
xxxx.xxxx.local/127.0.0.1:2181, initiating session
18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Session establishment complete on
server xxxx.xxxx.local/127.0.0.1:2181, sessionid = 0x162a3c72c9c0012,
negotiated timeout = 90000
18/04/08 05:57:39 INFO client.RpcRetryingCaller: Call exception, tries=10,
retries=35, started=38310 ms ago, cancelled=false, msg=row
'SYSTEM.CATALOG,xxxLOAD_*N**_DIM,99999999999999' on table 'hbase:meta' at
region=hbase:meta,,1.1588230740, hostname=xxxx.xxxx.local,16201,1523166165622,
seqNum=0
18/04/08 05:57:49 INFO client.RpcRetryingCaller: Call exception, tries=11,
retries=35, started=48335 ms ago, cancelled=false, msg=row
'SYSTEM.CATALOG,xxxxxLOAD_*N**_DIM,99999999999999' on table 'hbase:meta' at
region=hbase:meta,,1.1588230740, hostname=xxx.xxx.local,16201,1523166165622,
seqNum=0