huaxiang sun created HBASE-17889:
------------------------------------

             Summary: ResultBoundedCompletionService's cancel() needs to interrupt the working thread and free it to the thread-pool
                 Key: HBASE-17889
                 URL: https://issues.apache.org/jira/browse/HBASE-17889
             Project: HBase
          Issue Type: Bug
          Components: Client
    Affects Versions: 2.0.0, 1.4.0, 1.2.6, 1.3.2
            Reporter: huaxiang sun
            Assignee: huaxiang sun

We ran into a case with read replicas: when the server hosting the primary region was shut down, a Get did not go to the replica region and instead stalled for about 50 seconds before it resumed.

Further debugging showed that when the server went down, one of the threads got stuck in a write while holding the lock at https://github.com/apache/hbase/blob/branch-1.3/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientImpl.java#L916. Subsequent write threads waited on this lock until every thread in the connection's thread pool was blocked on it. At that point no work could be done. Once the socket write timed out, all of the threads were freed and processing continued.

When QueueingFuture#cancel() is called, it neither interrupts the working thread nor returns the thread to the pool.

Attaching the jstack trace.
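
To illustrate the cancel behaviour described above, here is a minimal, self-contained sketch using only java.util.concurrent. It is not HBase code: the class CancelDemo and the method workerFreedAfterCancel are hypothetical names, and a CountDownLatch that is never released stands in for the stuck write. The sketch assumes the blocked call responds to interruption. A thread blocked on entry to a synchronized block does not, so this only demonstrates the difference between cancel(false) and cancel(true) on a pooled task, not a fix for the lock contention itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class, not part of HBase.
class CancelDemo {

    /**
     * Submits a blocking task to a single-thread pool, cancels it, and reports
     * whether the pool's only thread became available for a follow-up task.
     */
    static boolean workerFreedAfterCancel(boolean mayInterruptIfRunning) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CountDownLatch started = new CountDownLatch(1);
        // Assumption: this latch stands in for the write that never completes.
        CountDownLatch neverReleased = new CountDownLatch(1);
        try {
            Future<?> stuck = pool.submit(() -> {
                started.countDown();
                try {
                    neverReleased.await(); // interruptible blocking call
                } catch (InterruptedException e) {
                    // Interrupted by cancel(true): fall through so the task
                    // ends and the pool thread can pick up other work.
                }
            });

            started.await(); // make sure the task is actually running
            stuck.cancel(mayInterruptIfRunning);

            // If the worker was freed, this follow-up task completes quickly.
            Future<String> next = pool.submit(() -> "ran");
            try {
                next.get(500, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // the only pool thread is still blocked
            }
        } finally {
            neverReleased.countDown(); // unblock the task in the cancel(false) case
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cancel(false) frees worker: " + workerFreedAfterCancel(false));
        System.out.println("cancel(true) frees worker:  " + workerFreedAfterCancel(true));
    }
}
```

In this sketch, cancel(false) marks the future as cancelled but leaves the pool thread blocked, so the follow-up task times out. With cancel(true) the blocked thread is interrupted, the task exits, and the thread goes back to the pool.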