[ https://issues.apache.org/jira/browse/HBASE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huaxiang sun reassigned HBASE-13850:
------------------------------------

    Assignee: huaxiang sun

> Check for dead server on CallTimeoutException
> ---------------------------------------------
>
>                 Key: HBASE-13850
>                 URL: https://issues.apache.org/jira/browse/HBASE-13850
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, MTTR
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: Matteo Bertozzi
>            Assignee: huaxiang sun
>            Priority: Minor
>         Attachments: HBASE-13850-v0.patch, TestGetPerf.java
>
>
> WARNING: this may be a misconfiguration, so let me know if there is a timeout parameter I should set.
> {noformat}
> hbase-site.xml
> zookeeper.session.timeout 10000
> hbase.regionserver.storefile.refresh.period 10000
> hbase.client.operation.timeout 5000
> hbase.client.meta.operation.timeout 5000
> hbase.client.scanner.timeout.period 10000
> hbase.regionserver.lease.period 10000
> {noformat}
> I have a test that does a kill -STOP on a RS and then tries to query it.
> According to the conf, the ZK session timeout is 10 sec, and the master
> correctly reassigns the regions after 10 sec and meta is updated.
> The client keeps trying to query the RS for a specific row until it gets a
> response. The table.get(row) in the loop throws a CallTimeoutException every
> 5 sec (the configured setting), but instead of succeeding after 2 or 3
> retries (once the master has reassigned, i.e. after > 10 sec), it keeps
> retrying for up to 60 sec (I don't know where that 60 sec comes from; maybe
> a conf param that I'm not able to find).
> One simple fix in the code is to handle the CallTimeoutException in
> RegionServerCallable and clear the meta cache for the RS that is not
> responding (but maybe there is already a conf to reduce that 60 sec
> period).
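The proposed fix (on a call timeout, drop cached locations pointing at the unresponsive RS so the next attempt re-reads meta) can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not the HBase implementation: the class `DeadServerCacheSketch`, the method `simulate`, the local `CallTimeoutException` stand-in, and the names `r1`/`rs1`/`rs2` are all hypothetical. It assumes a frozen server never answers, each call to it costs exactly the 5 s operation timeout, the master moves the region once the 10 s ZK session timeout has elapsed, and a stale cached location is never refreshed unless the client clears it. The 60 s give-up bound is modeled only because the reporter observed it; its actual source is not identified here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the HBASE-13850 scenario; none of these classes are HBase APIs.
public class DeadServerCacheSketch {

    // Local stand-in for HBase's CallTimeoutException; records which server timed out.
    static class CallTimeoutException extends Exception {
        final String server;
        CallTimeoutException(String server) {
            super("call to " + server + " timed out");
            this.server = server;
        }
    }

    static final class Result {
        final boolean succeeded;
        final int attempts;
        final long elapsedMs;
        Result(boolean succeeded, int attempts, long elapsedMs) {
            this.succeeded = succeeded;
            this.attempts = attempts;
            this.elapsedMs = elapsedMs;
        }
    }

    // Region "r1" starts on "rs1", which is frozen at t=0 and never answers.
    // The master moves "r1" to "rs2" once reassignAfterMs has passed (assumed ZK session timeout).
    static Result simulate(boolean clearCacheOnTimeout, long callTimeoutMs,
                           long reassignAfterMs, long giveUpAfterMs) {
        Map<String, String> locationCache = new HashMap<>();
        locationCache.put("r1", "rs1"); // location cached before the server froze
        long now = 0;
        int attempts = 0;
        while (now < giveUpAfterMs) {
            attempts++;
            // Use the cached location, or "read meta" as of the current simulated time.
            final long t = now;
            String server = locationCache.computeIfAbsent("r1",
                    region -> t >= reassignAfterMs ? "rs2" : "rs1");
            try {
                call(server);
                return new Result(true, attempts, now);
            } catch (CallTimeoutException e) {
                now += callTimeoutMs; // each failed call burns one full timeout
                if (clearCacheOnTimeout) {
                    // Proposed fix: forget every cached location on the unresponsive server.
                    locationCache.values().removeIf(s -> s.equals(e.server));
                }
            }
        }
        return new Result(false, attempts, now);
    }

    // The frozen server "rs1" never answers; any other server answers immediately.
    static void call(String server) throws CallTimeoutException {
        if (server.equals("rs1")) {
            throw new CallTimeoutException(server);
        }
    }

    public static void main(String[] args) {
        // 5 s operation timeout and 10 s ZK session timeout, from the reporter's hbase-site.xml.
        Result stale = simulate(false, 5000, 10000, 60000);
        Result cleared = simulate(true, 5000, 10000, 60000);
        System.out.println("stale cache:   succeeded=" + stale.succeeded
                + " attempts=" + stale.attempts + " elapsedMs=" + stale.elapsedMs);
        System.out.println("cleared cache: succeeded=" + cleared.succeeded
                + " attempts=" + cleared.attempts + " elapsedMs=" + cleared.elapsedMs);
    }
}
```

Under these assumptions, the client that keeps its stale entry gives up after 12 timed-out attempts (60 s). The client that clears the cache re-reads meta after each timeout and reaches `rs2` on its third attempt at t = 10 s, which matches the "2 or 3 retries" the reporter expected.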



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
