[ 
https://issues.apache.org/jira/browse/HBASE-27768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault reassigned HBASE-27768:
-----------------------------------------

    Assignee: Bryan Beaudreault

> Race conditions in BlockingRpcConnection
> ----------------------------------------
>
>                 Key: HBASE-27768
>                 URL: https://issues.apache.org/jira/browse/HBASE-27768
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>
> We've been experiencing strange timeouts since upgrading to the hbase2 
> client. We use BlockingRpcConnection for now, until we migrate our auth stack 
> to native TLS. While diagnosing the timeouts, I noticed a few issues in this 
> class:
>  # Most importantly, there is a race condition that can leave a single 
> BlockingRpcConnection instance with 2 reader threads running. Both threads 
> then compete for the socket, which causes weird timeouts and, in some cases, 
> corrupted responses (i.e. InvalidProtocolBufferException).
>  # The waitForWork loop does not properly handle interruption. When it gets 
> interrupted after the above race condition has occurred, the waitForWork loop 
> ends up spinning forever in a tight loop: the "wait()" call instantly throws 
> InterruptedException, we set the interrupted state back, and the loop 
> restarts. So no waiting occurs anymore.
> The race condition is somewhat rare, occurring only in certain failure 
> scenarios on our highest-volume clients. But once it happens, the affected 
> server connection keeps throwing a low level of errors until the client is 
> bounced.
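
The interruption problem in the second item can be reproduced outside HBase. The following is only a minimal sketch of the general pattern, not the actual BlockingRpcConnection code: the class WaitLoopSketch, its fields, and both methods are hypothetical names chosen for illustration, and the maxSpins cap exists only so the demo terminates. It relies on the documented behavior that Object.wait() throws InterruptedException immediately (and clears the flag) when the calling thread is already interrupted.

```java
// Hypothetical illustration; none of these names come from HBase.
class WaitLoopSketch {
    private final Object lock = new Object();
    private boolean hasWork = false; // never set here, so the loop always wants to wait

    // Problematic pattern: restore the interrupt flag, then keep looping.
    // Returns how many times wait() was attempted (capped at maxSpins).
    int spinningWaitForWork(int maxSpins) {
        int attempts = 0;
        synchronized (lock) {
            while (!hasWork && attempts < maxSpins) {
                attempts++;
                try {
                    lock.wait(1000);
                } catch (InterruptedException e) {
                    // The flag is set again, so the next wait() throws at once.
                    Thread.currentThread().interrupt();
                }
            }
        }
        return attempts;
    }

    // One possible alternative: restore the flag and stop waiting.
    int exitingWaitForWork(int maxSpins) {
        int attempts = 0;
        synchronized (lock) {
            while (!hasWork && attempts < maxSpins) {
                attempts++;
                try {
                    lock.wait(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break; // leave the loop so the caller can shut down
                }
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        WaitLoopSketch sketch = new WaitLoopSketch();

        Thread.currentThread().interrupt();
        System.out.println("spinning attempts: " + sketch.spinningWaitForWork(1000));

        // The flag is still set from the catch block above; exit on first wait().
        System.out.println("exiting attempts: " + sketch.exitingWaitForWork(1000));

        Thread.interrupted(); // clear the flag before returning
    }
}
```

With the flag already set, the first method burns through every allowed iteration without ever sleeping, while the second leaves after a single attempt. Whether breaking out of the loop is the right fix for the real class depends on how the reader thread is expected to shut down, which this sketch does not model.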



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
