Yes, setting "hbase.client.retries.number" to 1 is the root cause of the above exception.
A general guideline (not official) for choosing the value would be:

(time taken for region movement in your cluster + ZooKeeper timeout) / hbase.client.pause

Or, going by intuition, you can set it to at least 10.

On Fri, Jul 14, 2017 at 7:59 AM, Tanujit Ghosh <[email protected]> wrote:

> Thanks Josh for the quick reply
>
> So we are not getting the Exception transiently, but it get thrown out our
> application code wrapped in an SQLException from Phoenix layer.
>
> For now, we have written kind of a backoff retry logic which re-tries the
> query after some time again but looking for a more elegant solution to this
> problem.
>
> Does the hbase.client.retries.number parameter has any effect on this kind
> of failure scenario, in our current cluster, we have set it to 1.
>
>
> On Thu, Jul 13, 2017 at 4:45 PM, Josh Elser <[email protected]> wrote:
>
>> On 7/13/17 1:48 PM, Tanujit Ghosh wrote:
>>
>>> Hi All,
>>>
>>> We are facing a problem in our cluster as stated below.
>>>
>>> We have a long running java process which does various select on
>>> underlying Phoenix/HBASE table structure and return data. This process gets
>>> requests from other upstream apps and responds with results from
>>> Phoenix/HBASE.
>>>
>>> We are facing an issue here is that if any one of the HBASE region
>>> servers goes down, then we start getting a RegionNotServingException when
>>> we run a query on a table whose regions were on the region server that went
>>> down. Although now the cluster has reassigned those regions to other region
>>> servers, somehow it does not reflect in the Phoenix query layer.
>>>
>>
>> This expected to a degree. Even after Regions move on the cluster, the
>> client will not re-query a Region's location until it is not where the
>> client thinks it was (invalidates the cache of Region->RS, and re-queries
>> it from hbase:meta).
>>
>> If you see transiently this for a region after it moves, that is
>> expected. You have to do nothing -- the client automatically recovers and
>> is just informing you. However, if your client becomes "stuck" (looping
>> with NotServingRegionExceptions), that's a completely different problem and
>> would likely be an HBase bug.
>>
>> I'm not sure which case you're describing.
>>
>>
>>> As per Phoenix documentation, we are creating a new PhoenixConnection
>>> object for each query and running the select statements.
>>>
>>> Has anyone faced a similar issue?
>>> Any suggestions/help in this regards will be appreciated.
>>>
>>> Thanks and Regards,
>>> Tanujit
>>>
>>
>
>
> --
> Regards,
> Tanujit
>
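
P.S. In case it helps, here is a rough Java sketch of the retry guideline from the top of this reply. It is only an illustration: the class name, method names and sample numbers are made up, the result is floored at 10 as suggested above, and it treats hbase.client.pause as a fixed wait between attempts, which is a simplification. It only builds a java.util.Properties object; whether and how you hand that to your Phoenix connection is something to verify against your own setup.

```java
import java.util.Properties;

// Hypothetical helper, not part of HBase or Phoenix.
final class RetryEstimate {

    // Floor taken from the "at least 10" rule of thumb in this thread.
    static final int MIN_RETRIES = 10;

    // (region movement time + ZooKeeper timeout) / hbase.client.pause,
    // rounded up and never below MIN_RETRIES. All arguments are in milliseconds.
    public static int estimateRetries(long regionMoveMs, long zkTimeoutMs, long pauseMs) {
        if (pauseMs <= 0) {
            throw new IllegalArgumentException("pauseMs must be positive");
        }
        if (regionMoveMs < 0 || zkTimeoutMs < 0) {
            throw new IllegalArgumentException("durations must not be negative");
        }
        long totalMs = regionMoveMs + zkTimeoutMs;
        long retries = (totalMs + pauseMs - 1) / pauseMs; // ceiling division
        long capped = Math.min(retries, Integer.MAX_VALUE);
        return (int) Math.max(MIN_RETRIES, capped);
    }

    // Puts the estimate and the pause into client-side properties.
    public static Properties clientProperties(long regionMoveMs, long zkTimeoutMs, long pauseMs) {
        Properties props = new Properties();
        int retries = estimateRetries(regionMoveMs, zkTimeoutMs, pauseMs);
        props.setProperty("hbase.client.retries.number", String.valueOf(retries));
        props.setProperty("hbase.client.pause", String.valueOf(pauseMs));
        return props;
    }

    public static void main(String[] args) {
        // Sample numbers are assumptions: 30 s region move, 90 s ZooKeeper timeout, 100 ms pause.
        Properties props = clientProperties(30_000L, 90_000L, 100L);
        System.out.println(props.getProperty("hbase.client.retries.number"));
    }
}
```

Measure the region movement time on your own cluster before trusting any number that comes out of this; the sample values are placeholders.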
