Yes, setting "hbase.client.retries.number" to 1 is the root cause of the
above exception.

A general guideline/formula (not official) could be:
(time taken for region movement in your cluster + ZooKeeper timeout) /
hbase.client.pause

Or, going by intuition, you can set it to at least 10.
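
To make the rule of thumb above concrete, here is a minimal sketch of the
arithmetic. The class name, method names, and sample timings (30 s for region
movement, 90 s ZooKeeper timeout, 100 ms pause) are illustrative assumptions,
not measured values from any cluster. The HBase client also applies its own
backoff between attempts, so treat the result as a rough estimate only.

```java
import java.util.Properties;

public class RetryEstimate {
    // Hypothetical helper: applies the unofficial guideline
    // (region move time + ZooKeeper timeout) / hbase.client.pause,
    // rounding up so a partial interval still counts as one retry.
    static long estimateRetries(long regionMoveMs, long zkTimeoutMs, long pauseMs) {
        if (pauseMs <= 0) {
            throw new IllegalArgumentException("pauseMs must be positive");
        }
        long totalMs = regionMoveMs + zkTimeoutMs;
        return (totalMs + pauseMs - 1) / pauseMs;
    }

    // Builds a Properties object carrying the chosen retry count
    // under the key discussed in this thread.
    static Properties clientProps(long retries) {
        Properties props = new Properties();
        props.setProperty("hbase.client.retries.number", Long.toString(retries));
        return props;
    }

    public static void main(String[] args) {
        // Sample timings are assumptions; substitute values from your cluster.
        long retries = estimateRetries(30_000, 90_000, 100);
        System.out.println(clientProps(retries));
    }
}
```

The same key can be set in the client-side hbase-site.xml instead. Passing
the Properties object to DriverManager.getConnection(url, props) should also
work with Phoenix, but verify that against your Phoenix version.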

On Fri, Jul 14, 2017 at 7:59 AM, Tanujit Ghosh <tanujit.gh...@gmail.com>
wrote:

> Thanks Josh for the quick reply
>
> So we are not getting the exception transiently; it gets thrown out of our
> application code wrapped in an SQLException from the Phoenix layer.
>
> For now, we have written a kind of backoff retry logic that retries the
> query after some time, but we are looking for a more elegant solution to
> this problem.
>
> Does the hbase.client.retries.number parameter have any effect on this kind
> of failure scenario? In our current cluster, we have set it to 1.
>
>
>
> On Thu, Jul 13, 2017 at 4:45 PM, Josh Elser <els...@apache.org> wrote:
>
>>
>>
>> On 7/13/17 1:48 PM, Tanujit Ghosh wrote:
>>
>>> Hi All,
>>>
>>> We are facing a problem in our cluster as stated below.
>>>
>>> We have a long-running Java process which runs various selects on the
>>> underlying Phoenix/HBase table structure and returns data. This process
>>> gets requests from other upstream apps and responds with results from
>>> Phoenix/HBase.
>>>
>>> The issue we are facing is that if any one of the HBase region servers
>>> goes down, we start getting a NotServingRegionException when we run a
>>> query on a table whose regions were on the region server that went down.
>>> Although the cluster has by then reassigned those regions to other region
>>> servers, this is somehow not reflected in the Phoenix query layer.
>>>
>>
>> This is expected to a degree. Even after Regions move on the cluster, the
>> client will not re-query a Region's location until it is not where the
>> client thinks it was (invalidates the cache of Region->RS, and re-queries
>> it from hbase:meta).
>>
>> If you see this transiently for a region after it moves, that is
>> expected. You have to do nothing -- the client automatically recovers and
>> is just informing you. However, if your client becomes "stuck" (looping
>> with NotServingRegionExceptions), that's a completely different problem and
>> would likely be an HBase bug.
>>
>> I'm not sure which case you're describing.
>>
>>
>>> As per Phoenix documentation, we are creating a new PhoenixConnection
>>> object for each query and running the select statements.
>>>
>>> Has anyone faced a similar issue?
>>> Any suggestions/help in this regard will be appreciated.
>>>
>>> Thanks and Regards,
>>> Tanujit
>>>
>>
>
>
> --
> Regards,
> Tanujit
>
