To my knowledge, the various RPC clients take care of renewal themselves (either reactively or with a renewal thread). Some examples:
https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L638
https://github.com/apache/kafka/blob/0.10.2/clients/src/main/java/org/apache/kafka/common/security/kerberos/KerberosLogin.java#L139
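To illustrate the "reactive" approach: the client makes the call, and only if the server rejects its Kerberos credentials does it re-login and retry. That is roughly the idea behind the Hadoop IPC code linked above. The block below is a simplified sketch, not Hadoop's code. `ReactiveRelogin`, `AuthFailedException`, `Relogin` and `callWithRelogin` are hypothetical names. In a real client, the `Relogin` action would call something like `UserGroupInformation.getLoginUser().reloginFromKeytab()`.

```java
import java.util.concurrent.Callable;

/**
 * Sketch of reactive credential renewal: run an RPC and, if it fails with an
 * authentication error, re-login and retry. All names here are hypothetical.
 */
class ReactiveRelogin {

    /** Hypothetical marker for "the server rejected our Kerberos credentials". */
    static class AuthFailedException extends Exception {
        AuthFailedException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a keytab re-login, e.g. UGI#reloginFromKeytab(). */
    @FunctionalInterface
    interface Relogin {
        void relogin() throws Exception;
    }

    /** Calls rpc; on AuthFailedException re-logs in and retries, at most maxRetries times. */
    static <T> T callWithRelogin(Callable<T> rpc, Relogin relogin, int maxRetries)
            throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return rpc.call();
            } catch (AuthFailedException e) {
                if (attempt >= maxRetries) {
                    throw e; // give up: refreshing the credentials did not help
                }
                attempt++;
                relogin.relogin(); // refresh the TGT, then loop to retry the call
            }
        }
    }
}
```

With this design, nothing runs in the background. The cost of an expired ticket is a single failed call plus one re-login.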
So I don't think Flink needs a renewal thread, but the overall situation is complex. Some stack traces and logs may be needed to understand the issue.

Eron

On Thu, Dec 14, 2017 at 8:17 AM, Oleksandr Nitavskyi <o.nitavs...@criteo.com> wrote:
> Hello all,
>
> I have a question about Kerberos authentication in a YARN environment for a long-running streaming job. According to the documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/security-kerberos.html#yarnmesos-mode), Flink's solution is to use a keytab to perform authentication within the YARN perimeter.
>
> If a keytab is configured, Flink uses the *UserGroupInformation#loginUserFromKeytab* method to perform authentication. The YARN Security documentation (https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md#keytabs-for-am-and-containers-distributed-via-yarn) says that this should be enough:
>
> *Launched containers must themselves log in via UserGroupInformation.loginUserFromKeytab(). UGI handles the login, and schedules a background thread to relogin the user periodically.*
>
> But in reality, if we check the source code of UGI, we can see that no background thread is created: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1153. It just creates a javax.security.auth.login.LoginContext and performs authentication. This seems to be true for several Hadoop branches: 2.7, 2.8, 3.0 and trunk. Flink doesn't create any background threads either: https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L69. So the job loses its credentials for the ResourceManager and HDFS after some time (12 hours in my case).
>
> It looks like UGI's code is not aligned with the documentation, and it doesn't relogin periodically.
>
> Do you think patching in a background thread that calls UGI#reloginUserFromKeytab could be a solution?
>
> P.S. We are running Flink as a single job on YARN.
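The background thread proposed in the question could look something like the sketch below. It uses a single daemon thread that periodically invokes a relogin action. Several assumptions apply. `KeytabRenewalThread` and its `Relogin` interface are hypothetical names. The Hadoop call is abstracted away so that the block compiles without Hadoop on the classpath. As far as I know, UGI has no `reloginUserFromKeytab` method. The existing instance methods are `reloginFromKeytab()` and `checkTGTAndReloginFromKeytab()`, and the action would typically be `() -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of proactive renewal: one daemon thread periodically runs a relogin
 * action. Hypothetical class; the Hadoop call is passed in as a lambda.
 */
class KeytabRenewalThread implements AutoCloseable {

    /** Hypothetical stand-in for the keytab relogin call. */
    @FunctionalInterface
    interface Relogin {
        void relogin() throws Exception;
    }

    private final ScheduledExecutorService scheduler;

    KeytabRenewalThread(Relogin relogin, long period, TimeUnit unit) {
        ThreadFactory daemonFactory = r -> {
            Thread t = new Thread(r, "keytab-relogin");
            t.setDaemon(true); // do not keep the JVM alive just for renewal
            return t;
        };
        scheduler = Executors.newSingleThreadScheduledExecutor(daemonFactory);
        scheduler.scheduleAtFixedRate(() -> {
            try {
                relogin.relogin();
            } catch (Exception e) {
                // An exception escaping the task would suppress all later runs,
                // so report it and try again at the next tick instead.
                System.err.println("keytab relogin failed: " + e);
            }
        }, period, period, unit);
    }

    /** Stops the renewal thread. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The period must be well below the ticket lifetime (12 hours in the case above). With `checkTGTAndReloginFromKeytab()`, a short period such as a few minutes should be cheap. That method only performs a real relogin once the TGT is close to expiry.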