[ https://issues.apache.org/jira/browse/IMPALA-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Ho resolved IMPALA-5537.
--------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

https://github.com/apache/incubator-impala/commit/23565166886d7e7b33b477e391c3e6b658a80b32

IMPALA-5537: Retry RPC on some exceptions with SSL connection

After the fix for IMPALA-5388, every TSSLException thrown is treated as
a fatal error and the query fails. This turns out to be too strict: in a
secure cluster under load, queries can easily time out waiting for an RPC
response.

When running without SSL, we call RetryRpcRecv() to retry the recv part
of an RPC if the TSocket underlying the RPC gets an EAGAIN during recv().
This change extends that logic to cover secure connections. In particular,
we pattern match against the exception string "SSL_read: Resource
temporarily unavailable", which corresponds to the EAGAIN error code
surfacing in the SSL_read() path. Similarly, we handle a closed connection
in the send() path of a secure connection by pattern matching against the
exception string "TTransportException: Transport not open".

To verify that the exception is thrown during the send part of an RPC
call, the RPC client interface has been augmented to take a bool* argument
which is set to true after the send part of the RPC has completed but
before the recv part starts. If DoRPC() catches an exception and the send
part isn't done yet, the entire RPC is retried if the exception string
matches certain substrings which are safe to retry.

The fault injection utility has also been updated to distinguish between
a timeout and a lost connection, to exercise the different error handling
paths in the send and recv paths.
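The retry decision described in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code from the commit: the names RpcAction, ClassifyRpcError, DoRpcWithRetry, RpcFn and RecvFn are hypothetical, std::runtime_error stands in for the Thrift exception types, and the attempt limit is an assumption. Only the two exception strings and the idea of a bool* "send done" flag come from the message above.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical outcome labels for this sketch; not Impala's actual types.
enum class RpcAction { kFail, kRetryRecv, kRetryWholeRpc };

// The two substrings quoted in the commit message.
const std::string kRecvTimeoutMsg = "SSL_read: Resource temporarily unavailable";
const std::string kSendClosedMsg = "TTransportException: Transport not open";

// Decides how to react to an exception message, given whether the send part
// of the RPC had already completed when the exception was thrown.
RpcAction ClassifyRpcError(const std::string& what, bool send_done) {
  // EAGAIN surfaced through SSL_read(): only the recv part needs retrying.
  if (what.find(kRecvTimeoutMsg) != std::string::npos) {
    return RpcAction::kRetryRecv;
  }
  // Closed connection seen before the request was sent: the peer cannot have
  // processed the request, so the sketch retries the entire RPC.
  if (!send_done && what.find(kSendClosedMsg) != std::string::npos) {
    return RpcAction::kRetryWholeRpc;
  }
  // Everything else is treated as fatal (assumption of this sketch).
  return RpcAction::kFail;
}

// Hypothetical RPC callable: sets *send_done to true once the request has
// been sent and before receiving starts; throws on failure.
using RpcFn = std::function<void(bool* send_done)>;
// Hypothetical callable that retries only the recv part; throws on failure.
using RecvFn = std::function<void()>;

// Runs the RPC and applies the classification above. Returns true on success.
bool DoRpcWithRetry(const RpcFn& rpc, const RecvFn& retry_recv, int max_attempts) {
  assert(max_attempts >= 1);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    bool send_done = false;
    try {
      rpc(&send_done);
      return true;
    } catch (const std::exception& e) {
      switch (ClassifyRpcError(e.what(), send_done)) {
        case RpcAction::kRetryRecv:
          try {
            retry_recv();
            return true;
          } catch (const std::exception&) {
            return false;
          }
        case RpcAction::kRetryWholeRpc:
          continue;  // Start the next attempt of the whole RPC.
        case RpcAction::kFail:
          return false;
      }
    }
  }
  return false;  // All attempts were used up.
}
```

The flag matters because a "Transport not open" error seen after the send completed is not retried here: the request may already have reached the remote side, so resending it could run the RPC twice.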
Change-Id: I8243d4cac93c453e9396b0e24f41e147c8637b8c
Reviewed-on: http://gerrit.cloudera.org:8080/7229
Reviewed-by: Dan Hecht <dhe...@cloudera.com>
Tested-by: Impala Public Jenkins

> Impala does not retry RPCs that fail in SSL_read()
> --------------------------------------------------
>
>                 Key: IMPALA-5537
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5537
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>   Affects Versions: Impala 2.9.0, Impala 2.10.0
>            Reporter: Lars Volker
>            Assignee: Michael Ho
>            Priority: Blocker
>             Fix For: Impala 2.10.0
>
>
> IMPALA-5388 changed the RPC retry logic to be much less aggressive. This
> increased the probability of failing queries under load when using SSL. We
> should consider retrying RPCs if the underlying socket throws an exception
> during SSL_read().

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)