[ https://issues.apache.org/jira/browse/IMPALA-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060995#comment-17060995 ]
Norbert Luksa commented on IMPALA-9359:
---------------------------------------

Looks like the ASF Jira bot failed to copy the commit message, so here it is for reference:

IMPALA-9359: recover from corrupt kerberos ccache

This is a clean cherry-pick of KUDU-3050. The original commit message
is below.

KUDU-3050: recover from corrupt kerberos ccache

This handles two failure modes:
* krb5_cc_start_seq_get() can fail if the kerberos credential cache gets
  corrupted on disk, e.g. is truncated.
* the renewal can fail to find a credential in the credential cache,
  either if it is missing or the renewal thread hits an error while
  reading through credentials.

Also add some additional logging and limit the max backoff time
to make it easier to debug other kinds of renewal errors.

The test triggers a pre-existing memory leak bug in some older Kerberos
libraries. Added a suppression for leak sanitizer to
ClientNegotiation::CheckGSSAPI() to suppress it.

Test:
Add a test that exercises the recovery logic after truncating the
credential cache. The test failed before this change.

Change-Id: I86567f16816d1c6729679398ce56296744cb30c9
Reviewed-on: http://gerrit.cloudera.org:8080/15407
Reviewed-by: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Recover gracefully from corrupt kerberos credential cache
> ---------------------------------------------------------
>
>                 Key: IMPALA-9359
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9359
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Security
>    Affects Versions: Impala 3.3.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: kerberos
>             Fix For: Impala 3.4.0
>
>
> # Start up a kerberized Impala cluster.
> # Corrupt the kerberos ticket cache used by Impala, /tmp/krb5cc_impala_internal.
> # Observe queries fail. The details depend a lot on timing, etc. I have seen
> communication failures between impalads and with other systems, e.g. HDFS.
> # The system will stay wedged in this state indefinitely.
> We have seen this happen once in production from /tmp filling up.
> I prototyped a fix that amounts to re-running Kinit() to blow away the broken
> credential cache. It needs more work to be production-ready.
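
For readers following along, the recovery behaviour described in the commit message can be sketched roughly as below. This is not the KUDU-3050 code, which is C++ against the krb5 library. It is a minimal Python illustration, and every name in it (CacheError, next_backoff, renewal_step), as well as the 1 s base and 60 s cap, is an assumption made for the example. The commit message only says that the max backoff is limited, and the issue only says that the prototype re-runs Kinit().

```python
from typing import Callable, Tuple


class CacheError(Exception):
    """Hypothetical error: the credential cache is unreadable (e.g. truncated)
    or does not contain the credential the renewal is looking for."""


def next_backoff(failures: int, base_s: float = 1.0, max_s: float = 60.0) -> float:
    """Exponential backoff, capped at max_s (both values are assumed)."""
    return min(max_s, base_s * (2 ** max(0, failures - 1)))


def renewal_step(
    renew: Callable[[], None],
    relogin: Callable[[], None],
    failures: int,
) -> Tuple[str, int, float]:
    """Run one renewal attempt.

    Returns (action, consecutive_failures, delay_before_retry_s).
    `renew` stands in for ticket renewal from the existing cache and
    `relogin` for a fresh kinit that replaces the cache.
    """
    try:
        renew()
        return ("renewed", 0, 0.0)
    except CacheError:
        # The cache itself is the problem, so retrying the renewal cannot
        # help. Rebuild the cache from scratch instead.
        try:
            relogin()
            return ("reinitialized", 0, 0.0)
        except Exception:
            failures += 1
            return ("failed", failures, next_backoff(failures))
    except Exception:
        # Any other renewal error: retry later, with a bounded delay.
        failures += 1
        return ("failed", failures, next_backoff(failures))
```

A renewal thread would call renewal_step() in a loop and sleep for the returned delay. Capping the delay keeps retries, and their log lines, frequent enough to be useful when debugging other kinds of renewal errors.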