[ https://issues.apache.org/jira/browse/HBASE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huaxiang sun reassigned HBASE-13850:
------------------------------------

    Assignee: huaxiang sun

> Check for dead server on CallTimeoutException
> ---------------------------------------------
>
>                 Key: HBASE-13850
>                 URL: https://issues.apache.org/jira/browse/HBASE-13850
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, MTTR
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: Matteo Bertozzi
>            Assignee: huaxiang sun
>            Priority: Minor
>         Attachments: HBASE-13850-v0.patch, TestGetPerf.java
>
>
> WARN: this may be a misconfiguration, so let me know if there is a timeout
> param to set.
> {noformat}
> hbase-site.xml
> zookeeper.session.timeout 10000
> hbase.regionserver.storefile.refresh.period 10000
> hbase.client.operation.timeout 5000
> hbase.client.meta.operation.timeout 5000
> hbase.client.scanner.timeout.period 10000
> hbase.regionserver.lease.period 10000
> {noformat}
> I have a test that does a kill STOP on a RS and then tries to query it.
> From the conf, the zk lease is 10sec; the master correctly does the
> reassign after 10sec and meta is updated.
> The client keeps trying to query the RS for a specific row until it gets a
> response. The table.get(row) in the loop throws a CallTimeoutException every
> 5sec (which is the configured setting), but instead of succeeding after 2/3
> retries (> 10sec, when the master reassigns) it keeps retrying for up to 60sec
> (I don't know what that 60sec is, maybe a conf param that I'm not able to find).
> One simple fix in the code is to handle the CallTimeoutException in
> RegionServerCallable and clear the meta cache for the RS that is not
> responding (but maybe there is already a conf to set to reduce that 60sec
> period).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
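
The fix proposed in the description (on a call timeout, drop the cached location of the unresponsive server so the next retry re-reads meta) can be illustrated with a small toy model. This is only a sketch and not HBase code: `Meta`, `Client`, `locate`, `getWithRetries` and the server names are hypothetical stand-ins, `java.util.concurrent.TimeoutException` stands in for CallTimeoutException, and the model says nothing about where the 60sec in the report comes from.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.TimeoutException;

/** Toy model of a client-side location cache; every name here is hypothetical. */
public class StaleLocationRetryDemo {

    /** Stands in for the meta table: the authoritative region -> server mapping. */
    static final class Meta {
        private final Map<String, String> regionToServer = new HashMap<>();
        void assign(String region, String server) { regionToServer.put(region, server); }
        String lookup(String region) { return regionToServer.get(region); }
    }

    /** Client that caches region locations and retries a get on timeout. */
    static final class Client {
        private final Meta meta;
        private final Set<String> hungServers;
        private final boolean clearCacheOnTimeout;
        private final Map<String, String> cache = new HashMap<>();

        Client(Meta meta, Set<String> hungServers, boolean clearCacheOnTimeout) {
            this.meta = meta;
            this.hungServers = hungServers;
            this.clearCacheOnTimeout = clearCacheOnTimeout;
        }

        /** Returns the cached location, consulting meta only on a cache miss. */
        String locate(String region) {
            return cache.computeIfAbsent(region, meta::lookup);
        }

        /** Simulated RPC: a hung (kill -STOP'ed) server never answers in time. */
        String call(String server, String row) throws TimeoutException {
            if (hungServers.contains(server)) {
                throw new TimeoutException("call to " + server + " timed out");
            }
            return row + "@" + server;
        }

        /** Returns the attempt number that succeeded, or -1 if retries ran out. */
        int getWithRetries(String region, String row, int maxAttempts) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                String server = locate(region);
                try {
                    call(server, row);
                    return attempt;
                } catch (TimeoutException e) {
                    if (clearCacheOnTimeout) {
                        // Evict every cached location that points at the silent server,
                        // so the next attempt goes back to meta.
                        cache.values().removeIf(server::equals);
                    }
                }
            }
            return -1;
        }
    }

    /** Scenario: location cached on rs1, rs1 hangs, master has already moved the region to rs2. */
    static int run(boolean clearCacheOnTimeout) {
        Meta meta = new Meta();
        meta.assign("region-a", "rs1");
        Set<String> hung = new HashSet<>();
        Client client = new Client(meta, hung, clearCacheOnTimeout);
        client.locate("region-a");       // warm the cache with rs1
        hung.add("rs1");                 // rs1 stops responding
        meta.assign("region-a", "rs2");  // reassignment is visible in meta
        return client.getWithRetries("region-a", "row-1", 5);
    }

    public static void main(String[] args) {
        System.out.println("with eviction:    " + run(true));
        System.out.println("without eviction: " + run(false));
    }
}
```

In this model the client that evicts on timeout succeeds on its second attempt, while the one that keeps the stale entry exhausts all five attempts against the hung server. Whether the real client should evict on every timeout, or only once the server is confirmed dead, is a design question the sketch does not settle.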