Hi, sorry for the late reply.
That was just part of my hbase-site.xml.
The full content is below:
************************************hbase/conf/hbase-site.xml**************************
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://broker.xxx-xxx.local:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>broker.xxx-xxx.local</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>broker.xxx-xxx.local</value>
</property>
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
<name>phoenix.transactions.enabled</name>
<value>true</value>
</property>
<property>
<name>data.tx.snapshot.dir</name>
<value>/tmp/tephra/snapshots</value>
</property>
<property>
<name>data.tx.timeout</name>
<value>120</value>
</property>
<property>
<name>phoenix.query.timeoutMs</name>
<value>2800000</value>
</property>
<property>
<name>hbase.regionserver.lease.period</name>
<value>2200000</value>
</property>
<property>
<name>hbase.rpc.timeout</name>
<value>2200000</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>2000</value>
</property>
<property>
<name>hbase.client.scanner.timeout.period</name>
<value>2200000</value>
</property>
</configuration>
*************************client: phoenix/bin/hbase-site.xml*************************
<configuration>
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
<name>phoenix.transactions.enabled</name>
<value>true</value>
</property>
<property>
<name>data.tx.snapshot.dir</name>
<value>/tmp/tephra/snapshots</value>
</property>
<property>
<name>data.tx.timeout</name>
<value>120</value>
</property>
<property>
<name>phoenix.query.timeoutMs</name>
<value>2800000</value>
</property>
<property>
<name>hbase.regionserver.lease.period</name>
<value>2200000</value>
</property>
<property>
<name>hbase.rpc.timeout</name>
<value>2200000</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>2000</value>
</property>
<property>
<name>hbase.client.scanner.timeout.period</name>
<value>2200000</value>
</property>
</configuration>
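Neither file above sets the RPC scheduler/controller properties or a handler count. As a rough sketch only, this is what the extra server-side entries might look like. The two Phoenix class properties are the ones I understand the secondary indexing page to list for some Phoenix/HBase version combinations (whether they apply depends on the versions in use), and the handler count of 60 is purely an illustrative value, not a recommendation.

```xml
<!-- Sketch only: candidate additions to the server-side hbase-site.xml. -->
<!-- Assumption: whether the two Phoenix RPC properties are needed depends on the Phoenix/HBase versions. -->
<property>
<name>hbase.region.server.rpc.scheduler.factory.class</name>
<value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
</property>
<property>
<name>hbase.rpc.controllerfactory.class</name>
<value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
</property>
<!-- Assumption: 60 is an illustrative value; it should be tuned for the actual cluster. -->
<property>
<name>hbase.regionserver.handler.count</name>
<value>60</value>
</property>
```

The region servers would need a restart for changes like these to take effect.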
On 2018/04/09 18:01:14, Josh Elser <[email protected]> wrote:
> The hbase-site.xml elements you shared earlier, were those your entire
> hbase-site contents or just part of it?
>
> Make sure you have the required properties set as described on
> https://phoenix.apache.org/secondary_indexing.html for your indexes. If
> you're still seeing problems, you may need to increase the number of
> handlers you configured HBase to use.
>
> While in the stuck state, you may benefit from getting a thread-dump or
> two from the client and your regionserver(s). This would help in
> figuring out exactly where things are stuck (like the DEBUG logs would do).
>
> On 4/9/18 1:30 PM, [email protected] wrote:
> > Thanks for your suggestion. I found something interesting, though I am not
> > sure whether it is the cause: the indexes I created on my tables.
> > I had created a lot of indexes. After I removed all of them, things
> > seem to go better (no more hanging like that). So I suspect there is
> > some incompatibility or other issue in the way I set up my cluster.
> >
> > Something special I used to create the table:
> > )c.DATA_BLOCK_ENCODING='FAST_DIFF', SALT_BUCKETS=3,
> > COMPRESSION='GZ',TRANSACTIONAL=true ;
> > and some indexes I created like this:
> > CREATE INDEX testing_IDX_2 ON xxx.xxx (field1, field2) INCLUDE (field3,
> > field4)
> >
> >
> >
> > On 2018/04/09 17:04:03, Josh Elser <[email protected]> wrote:
> >> Have you looked at DEBUG logging on the client and server (HBase) side?
> >>
> >> The "Call exception" log messages imply that the client is repeatedly
> >> trying to issue an RPC to a RegionServer and failing. This should be
> >> where you focus your attention. It may be something trivial to fix
> >> related to configuration/security setup.
> >>
> >> On 4/8/18 2:04 AM, [email protected] wrote:
> >>> Hi, I have the tricky problem below.
> >>> Situation:
> >>> I successfully did an upsert into multiple tables with transactions
> >>> enabled (and there are many indexes created on these tables).
> >>> Problem:
> >>> After the first upsert completed successfully, I tried to run the same
> >>> upsert a 2nd, 3rd, ... time. Sometimes the 2nd works, then the 3rd
> >>> upsert gets a timeout exception. At that point the whole of Phoenix
> >>> seems to hang and keeps retrying. I tried stopping the whole HBase
> >>> cluster, including the Phoenix Query Server and Tephra, and restarting;
> >>> then when I try to connect with sqlline.py, it hangs again.
> >>>
> >>> hbase-site.xml setting:
> >>> <property>
> >>> <name>hbase.regionserver.wal.codec</name>
> >>> <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
> >>> </property>
> >>> <property>
> >>> <name>phoenix.transactions.enabled</name>
> >>> <value>true</value>
> >>> </property>
> >>> <property>
> >>> <name>data.tx.snapshot.dir</name>
> >>> <value>/tmp/tephra/snapshots</value>
> >>> </property>
> >>> <property>
> >>> <name>data.tx.timeout</name>
> >>> <value>120</value>
> >>> </property>
> >>> <property>
> >>> <name>phoenix.query.timeoutMs</name>
> >>> <value>1800000</value>
> >>> </property>
> >>> <property>
> >>> <name>hbase.regionserver.lease.period</name>
> >>> <value>1200000</value>
> >>> </property>
> >>> <property>
> >>> <name>hbase.rpc.timeout</name>
> >>> <value>1200000</value>
> >>> </property>
> >>> <property>
> >>> <name>hbase.client.scanner.caching</name>
> >>> <value>1000</value>
> >>> </property>
> >>> <property>
> >>> <name>hbase.client.scanner.timeout.period</name>
> >>> <value>1200000</value>
> >>> </property>
> >>>
> >>>
> >>>
> >>> Below is some queryserver log:
> >>> 18/04/08 05:47:12 INFO zookeeper.ZooKeeper: Initiating client connection,
> >>> connectString=xxxx.xxxx.local:2181 sessionTimeout=90000
> >>> watcher=org.apache.tephra.zookeeper.TephraZKClientService$5@6700104f
> >>> 18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Opening socket connection to
> >>> server xxxx.xxxx.local/127.0.0.1:2181. Will not attempt to authenticate
> >>> using SASL (unknown error)
> >>> 18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Socket connection
> >>> established to xxxx.xxxx.local/127.0.0.1:2181, initiating session
> >>> 18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Session establishment
> >>> complete on server xxxx.xxxx.local/127.0.0.1:2181, sessionid =
> >>> 0x162a3c72c9c0012, negotiated timeout = 90000
> >>> 18/04/08 05:57:39 INFO client.RpcRetryingCaller: Call exception,
> >>> tries=10, retries=35, started=38310 ms ago, cancelled=false, msg=row
> >>> 'SYSTEM.CATALOG,xxxLOAD_*N**_DIM,99999999999999' on table 'hbase:meta' at
> >>> region=hbase:meta,,1.1588230740,
> >>> hostname=xxxx.xxxx.local,16201,1523166165622, seqNum=0
> >>> 18/04/08 05:57:49 INFO client.RpcRetryingCaller: Call exception,
> >>> tries=11, retries=35, started=48335 ms ago, cancelled=false, msg=row
> >>> 'SYSTEM.CATALOG,xxxxxLOAD_*N**_DIM,99999999999999' on table 'hbase:meta'
> >>> at region=hbase:meta,,1.1588230740,
> >>> hostname=xxx.xxx.local,16201,1523166165622, seqNum=0
> >>>
> >>
>