Thanks for your suggestion. I found something interesting, though I am not sure whether it is the cause: the indexes on my tables. I had created a lot of indexes. After I removed all of them, things seem to work better (no more hanging like that). So I suspect there is some incompatibility or other issue in the way I set up my cluster.
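In case it helps anyone reproduce this, removing an index looks roughly like the statements below. This is only a sketch: `testing_IDX_2` and `xxx.xxx` are the placeholder names from the index definition further down, and the catalog query assumes the usual SYSTEM.CATALOG columns (TABLE_TYPE = 'i' for index rows), which I have not verified on every Phoenix version.

```sql
-- List the indexes Phoenix knows about, with their state.
-- Assumption: index header rows in SYSTEM.CATALOG carry TABLE_TYPE = 'i'.
SELECT TABLE_SCHEM, TABLE_NAME, DATA_TABLE_NAME, INDEX_STATE
FROM SYSTEM.CATALOG
WHERE TABLE_TYPE = 'i';

-- Drop one index; repeat for each index on the table.
-- testing_IDX_2 and xxx.xxx are placeholders, not real object names.
DROP INDEX IF EXISTS testing_IDX_2 ON xxx.xxx;
```

Dropping the indexes one at a time and retrying the upsert after each drop may show whether a single index is responsible or whether it is the combination of indexes and transactions.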
Some specifics of my setup. I created the table with these options:

)c.DATA_BLOCK_ENCODING='FAST_DIFF', SALT_BUCKETS=3, COMPRESSION='GZ',TRANSACTIONAL=true ;

and I created some indexes like this:

CREATE INDEX testing_IDX_2 ON xxx.xxx (field1, field2) INCLUDE (field3, field4)

On 2018/04/09 17:04:03, Josh Elser <[email protected]> wrote:
> Have you looked at DEBUG logging client and server(HBase) side?
>
> The "Call exception" log messages imply that the client is repeatedly
> trying to issue an RPC to a RegionServer and failing. This should be
> where you focus your attention. It may be something trivial to fix
> related to configuration/security setup.
>
> On 4/8/18 2:04 AM, [email protected] wrote:
> > Hi, I got below tricky problem:
> > Situation:
> > I successfully did a upsert into multiple tables with transaction
> > enabled(and there are many index created on these table).
> > Problem:
> > after the fist time upsert done successfully, I tried to do the 2nd,
> > 3rd.... and next same upsert, sometime, the 2nd works, then the 3rd upsert
> > will get timeout exception, at this time, the whole phoenix seems hangs
> > there and keep retrying. I tried to stop the whole hbase cluster including
> > phoenix queryserver and tepera and restart, then when I try to connect with
> > sqlline.py, it got hang again.
> >
> > hbase-site.xml setting:
> > <property>
> >   <name>hbase.regionserver.wal.codec</name>
> >   <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
> > </property>
> > <property>
> >   <name>phoenix.transactions.enabled</name>
> >   <value>true</value>
> > </property>
> > <property>
> >   <name>data.tx.snapshot.dir</name>
> >   <value>/tmp/tephra/snapshots</value>
> > </property>
> > <property>
> >   <name>data.tx.timeout</name>
> >   <value>120</value>
> > </property>
> > <property>
> >   <name>phoenix.query.timeoutMs</name>
> >   <value>1800000</value>
> > </property>
> > <property>
> >   <name>hbase.regionserver.lease.period</name>
> >   <value>1200000</value>
> > </property>
> > <property>
> >   <name>hbase.rpc.timeout</name>
> >   <value>1200000</value>
> > </property>
> > <property>
> >   <name>hbase.client.scanner.caching</name>
> >   <value>1000</value>
> > </property>
> > <property>
> >   <name>hbase.client.scanner.timeout.period</name>
> >   <value>1200000</value>
> > </property>
> >
> > Below is some queryserver log:
> > 18/04/08 05:47:12 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=xxxx.xxxx.local:2181 sessionTimeout=90000 watcher=org.apache.tephra.zookeeper.TephraZKClientService$5@6700104f
> > 18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Opening socket connection to server xxxx.xxxx.local/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
> > 18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Socket connection established to xxxx.xxxx.local/127.0.0.1:2181, initiating session
> > 18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Session establishment complete on server xxxx.xxxx.local/127.0.0.1:2181, sessionid = 0x162a3c72c9c0012, negotiated timeout = 90000
> > 18/04/08 05:57:39 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=38310 ms ago, cancelled=false, msg=row 'SYSTEM.CATALOG,xxxLOAD_*N**_DIM,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxxx.xxxx.local,16201,1523166165622, seqNum=0
> > 18/04/08 05:57:49 INFO client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=48335 ms ago, cancelled=false, msg=row 'SYSTEM.CATALOG,xxxxxLOAD_*N**_DIM,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx.xxx.local,16201,1523166165622, seqNum=0
