To my knowledge, the various RPC clients take care of renewal themselves,
either reactively (re-login when authentication fails) or using a renewal
thread.  Some examples:
https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L638
https://github.com/apache/kafka/blob/0.10.2/clients/src/main/java/org/apache/kafka/common/security/kerberos/KerberosLogin.java#L139
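
To make "reactively" concrete, here is a rough sketch of the pattern, not the actual Hadoop or Kafka code. The names ReactiveRelogin, ReloginAction and AuthFailedException are made up for illustration. With Hadoop on the classpath the relogin action would presumably be something like UserGroupInformation.getLoginUser().reloginFromKeytab().

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of reactive re-login; not taken from Hadoop or Kafka. */
final class ReactiveRelogin {

    /** Stand-in for the real re-login call, which may throw (e.g. IOException). */
    interface ReloginAction {
        void relogin() throws Exception;
    }

    /** Hypothetical marker for "the server rejected our credentials". */
    static final class AuthFailedException extends Exception {
        AuthFailedException(String message) {
            super(message);
        }
    }

    /**
     * Runs the call; on an authentication failure, re-logs in and retries,
     * at most maxRetries times. Other exceptions propagate unchanged.
     */
    static <T> T callWithRelogin(Callable<T> call, ReloginAction relogin, int maxRetries)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (AuthFailedException e) {
                if (attempt >= maxRetries) {
                    throw e; // give up: retries exhausted
                }
                relogin.relogin(); // refresh credentials, then loop to retry
            }
        }
    }
}
```

The point is that nothing runs in the background: credentials are refreshed only when a call actually fails for authentication reasons.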

So I don't think Flink needs its own renewal thread, but the overall
situation is complex.  Stack traces and logs from the failing job would help
in understanding the issue.
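
For comparison, the renewal thread proposed in the quoted message would amount to roughly the following. This is only a sketch under assumptions: PeriodicRelogin and ReloginAction are hypothetical names, not Flink or Hadoop classes, and the period is up to the caller. With Hadoop on the classpath the action would presumably be () -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab().

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that re-runs a re-login action at a fixed delay. */
final class PeriodicRelogin implements AutoCloseable {

    /** Stand-in for the real re-login call, which may throw (e.g. IOException). */
    interface ReloginAction {
        void relogin() throws Exception;
    }

    private final ScheduledExecutorService scheduler;

    PeriodicRelogin(ReloginAction action, long period, TimeUnit unit) {
        ThreadFactory daemonFactory = r -> {
            Thread t = new Thread(r, "kerberos-relogin");
            t.setDaemon(true); // the renewal thread must not keep the JVM alive
            return t;
        };
        this.scheduler = Executors.newSingleThreadScheduledExecutor(daemonFactory);
        this.scheduler.scheduleWithFixedDelay(() -> {
            try {
                action.relogin();
            } catch (Exception e) {
                // Log and try again on the next tick; an exception escaping
                // the task would cancel all further executions.
                System.err.println("Re-login attempt failed: " + e);
            }
        }, period, period, unit);
    }

    /** Stops the renewal thread. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The thread is a daemon and failures are swallowed on purpose, so a transient KDC outage does not stop later attempts.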

Eron

On Thu, Dec 14, 2017 at 8:17 AM, Oleksandr Nitavskyi <o.nitavs...@criteo.com> wrote:

> Hello all,
>
>
>
> I have a question about Kerberos authentication in a YARN environment for
> a long-running streaming job. According to the documentation (
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/security-kerberos.html#yarnmesos-mode
> ), Flink’s solution is to use a keytab to perform authentication within
> the YARN perimeter.
>
>
>
> If a keytab is configured, Flink uses the
> *UserGroupInformation#loginUserFromKeytab* method to perform
> authentication. The YARN security documentation (
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md#keytabs-for-am-and-containers-distributed-via-yarn
> ) states that this should be enough:
>
>
>
> *Launched containers must themselves log in
> via UserGroupInformation.loginUserFromKeytab(). UGI handles the login, and
> schedules a background thread to relogin the user periodically.*
>
>
>
> But in reality, if we check the source code of UGI, we can see that no
> background thread is created:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1153
> The method only creates a javax.security.auth.login.LoginContext and
> performs the authentication. This looks to be true for different Hadoop
> branches: 2.7, 2.8, 3.0, and trunk. Flink doesn’t create any background
> thread either:
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L69
> So in my case the job loses its credentials for the ResourceManager and
> HDFS after some time (12 hours).
>
>
>
> It looks like UGI’s code is not aligned with the documentation, since it
> doesn’t relogin periodically.
>
> Do you think patching in a background thread that calls
> UGI#reloginFromKeytab could be a solution?
>
>
>
> P.S. We are running Flink as a single job on Yarn.
