[ 
https://issues.apache.org/jira/browse/IMPALA-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-5537.
--------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

https://github.com/apache/incubator-impala/commit/23565166886d7e7b33b477e391c3e6b658a80b32

IMPALA-5537: Retry RPC on some exceptions with SSL connection
After the fix for IMPALA-5388, every TSSLException thrown is
treated as a fatal error and the query fails. This turns out to be
too strict: in a secure cluster under load, queries can easily hit
a timeout while waiting for an RPC response.

When running without SSL, we call RetryRpcRecv() to retry the recv
part of an RPC if the TSocket underlying the RPC gets an EAGAIN
during recv(). This change extends that logic to cover secure
connections. In particular, we pattern match against the exception
string "SSL_read: Resource temporarily unavailable", which corresponds
to the EAGAIN error code being thrown in the SSL_read() path.
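
As a rough illustration of the string matching described above, a
check along these lines could classify the exception text. The helper
name IsSslRecvTimeout is hypothetical and is not taken from the Impala
source; only the matched substring comes from the commit message.

```cpp
#include <string>

// Hypothetical helper (not Impala's actual code): returns true if the
// exception text contains the substring that the commit message
// associates with EAGAIN in the SSL_read() path.
bool IsSslRecvTimeout(const std::string& what) {
  static const std::string kSslReadEagain =
      "SSL_read: Resource temporarily unavailable";
  return what.find(kSslReadEagain) != std::string::npos;
}
```

Matching on a substring rather than the whole message keeps the check
working when the exception text carries a prefix such as the exception
type name.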

Similarly, we handle a closed connection in the send() path of a
secure connection by pattern matching against the exception string
"TTransportException: Transport not open". To verify that the exception
was thrown during the send part of an RPC call, the RPC client interface
has been augmented to take a bool* argument, which is set to true after
the send part of the RPC has completed but before the recv part starts.
If DoRPC() catches an exception and the send part isn't done yet, the
entire RPC is retried if the exception string matches certain substrings
which are safe to retry.
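
The control flow described above might be sketched as follows. This is
a simplified model under stated assumptions, not the actual DoRPC()
implementation: DoRpcSketch, RpcOutcome and the two Looks... helpers are
hypothetical names, the RPC and the recv retry are passed in as
callbacks, each retry is attempted only once, and reopening the
connection is omitted.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical result type used only in this sketch.
enum class RpcOutcome { kOk, kRetriedWholeRpc, kRetriedRecv, kFailed };

// Substrings quoted in the commit message.
static bool LooksLikeClosedConnection(const std::string& msg) {
  return msg.find("TTransportException: Transport not open") !=
         std::string::npos;
}

static bool LooksLikeRecvTimeout(const std::string& msg) {
  return msg.find("SSL_read: Resource temporarily unavailable") !=
         std::string::npos;
}

// 'rpc' stands in for the generated client call; it must set *send_done
// to true once the request is fully sent and before it starts to recv.
// 'retry_recv' stands in for retrying only the recv part.
RpcOutcome DoRpcSketch(const std::function<void(bool*)>& rpc,
                       const std::function<void()>& retry_recv) {
  bool send_done = false;
  try {
    rpc(&send_done);
    return RpcOutcome::kOk;
  } catch (const std::exception& e) {
    const std::string msg = e.what();
    try {
      if (!send_done && LooksLikeClosedConnection(msg)) {
        // Nothing reached the server, so resending the whole RPC is safe.
        rpc(&send_done);
        return RpcOutcome::kRetriedWholeRpc;
      }
      if (send_done && LooksLikeRecvTimeout(msg)) {
        // The request was sent; only wait for the response again.
        retry_recv();
        return RpcOutcome::kRetriedRecv;
      }
    } catch (const std::exception&) {
      // The single retry also failed; fall through to kFailed.
    }
    return RpcOutcome::kFailed;
  }
}
```

The send_done flag is what makes the whole-RPC retry safe in this
sketch: once the request has been sent, resending it could execute the
RPC twice on the server, so only the recv part is retried after that
point.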

The fault injection utility has also been updated to distinguish between
timeouts and lost connections, so that it exercises the different error
handling paths in the send and recv paths.
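
To make the distinction concrete, a fault injector that separates the
two failure kinds could look roughly like this. It is an illustrative
sketch only: InjectedFault and InjectFault are hypothetical names, the
real utility's interface is not shown in this message, and
std::runtime_error stands in for the Thrift exception types.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical fault kinds for this sketch.
enum class InjectedFault { kNone, kTimeout, kLostConnection };

// Throws an exception whose text mimics the requested fault, using the
// two strings quoted in the commit message. Does nothing for kNone.
void InjectFault(InjectedFault fault) {
  switch (fault) {
    case InjectedFault::kNone:
      return;
    case InjectedFault::kTimeout:
      throw std::runtime_error("SSL_read: Resource temporarily unavailable");
    case InjectedFault::kLostConnection:
      throw std::runtime_error("TTransportException: Transport not open");
  }
}
```

Calling such a function before the send or before the recv of a test
RPC would drive the whole-RPC retry path and the recv-only retry path
separately.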

Change-Id: I8243d4cac93c453e9396b0e24f41e147c8637b8c
Reviewed-on: http://gerrit.cloudera.org:8080/7229
Reviewed-by: Dan Hecht <dhe...@cloudera.com>
Tested-by: Impala Public Jenkins

> Impala does not retry RPCs that fail in SSL_read()
> --------------------------------------------------
>
>                 Key: IMPALA-5537
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5537
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.9.0, Impala 2.10.0
>            Reporter: Lars Volker
>            Assignee: Michael Ho
>            Priority: Blocker
>             Fix For: Impala 2.10.0
>
>
> IMPALA-5388 changed the RPC retry logic to be much less aggressive. This 
> increased the probability of failing queries under load when using SSL. We 
> should consider retrying RPCs if the underlying socket throws an exception 
> during SSL_read().



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
