Hi Thomas,

Niels (CC) and I found out that you need at least Hadoop version 2.6.1
to properly run Kerberos applications on Hadoop clusters. Earlier
versions have critical bugs in the internal security token handling
that can cause a token to be treated as expired although it is still
valid.

That said, Hadoop has another limitation: the maximum lifetime of its
internal tokens is one week by default. To work around this limit, you
have two options:

a) Increase the maximum token lifetime (the values are in milliseconds;
9223372036854775807 is Long.MAX_VALUE, i.e. effectively no limit)

In yarn-site.xml:

<property>
  <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
  <value>9223372036854775807</value>
</property>

In hdfs-site.xml:

<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>9223372036854775807</value>
</property>


b) Set up the YARN ResourceManager as a proxy user for the HDFS NameNode:

From
http://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_sg_yarn_long_jobs.html

"You can work around this by configuring the ResourceManager as a
proxy user for the corresponding HDFS NameNode so that the
ResourceManager can request new tokens when the existing ones are past
their maximum lifetime."
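
A minimal sketch of what option b) could look like, assuming the
ResourceManager runs as the user "yarn". I have not verified this on
your cluster, and the wildcard values are only placeholders that you
would probably want to restrict to the ResourceManager host and the
relevant groups.

In yarn-site.xml:

<property>
  <!-- lets the ResourceManager act as a proxy user to obtain new HDFS tokens -->
  <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
  <value>true</value>
</property>

In core-site.xml on the NameNode:

<!-- assumption: "yarn" is the user the ResourceManager runs as -->
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <!-- placeholder: "*" allows any host -->
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <!-- placeholder: "*" allows users of any group to be impersonated -->
  <value>*</value>
</property>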

@Niels: Could you comment on what worked best for you?

Best,
Max


On Mon, Mar 14, 2016 at 12:24 PM, Thomas Lamirault
<thomas.lamira...@ericsson.com> wrote:
>
> Hello everyone,
>
>
>
> We are facing the same problem now in our Flink applications, launched using 
> YARN.
>
> I just want to know if there is any update about this exception?
>
>
>
> Thanks
>
>
>
> Thomas
>
>
>
> ________________________________
>
> From: ni...@basj.es [ni...@basj.es] on behalf of Niels Basjes 
> [ni...@basjes.nl]
> Sent: Friday, December 4, 2015 10:40
> To: user@flink.apache.org
> Subject: Re: Flink job on secure Yarn fails after many hours
>
> Hi Maximilian,
>
> I just downloaded the version from your google drive and used that to run my 
> test topology that accesses HBase.
> I deliberately started it twice to double the chance to run into this 
> situation.
>
> I'll keep you posted.
>
> Niels
>
>
> On Thu, Dec 3, 2015 at 11:44 AM, Maximilian Michels <m...@apache.org> wrote:
>>
>> Hi Niels,
>>
>> Just got back from our CI. The build above would fail with a
>> Checkstyle error. I corrected that. Also I have built the binaries for
>> your Hadoop version 2.6.0.
>>
>> Binaries:
>>
>> https://github.com/mxm/flink/archive/kerberos-yarn-heartbeat-fail-0.10.1.zip
>>
>> Thanks,
>> Max
>>
>> On Wed, Dec 2, 2015 at 6:52 PM, Maximilian Michels <0.0.0.0:41281
>> >>>> >> >> > 21:30:28,185 ERROR 
>> >>>> >> >> > org.apache.flink.runtime.jobmanager.JobManager
>> >>>> >> >> > - Actor akka://flink/user/jobmanager#403236912 terminated,
>> >>>> >> >> > stopping
>> >>>> >> >> > process...
>> >>>> >> >> > 21:30:28,286 INFO
>> >>>> >> >> > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
>> >>>> >> >> > - Removing web root dir
>> >>>> >> >> > /tmp/flink-web-e1a44f94-ea6d-40ee-b87c-e3122d5cb9bd
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > --
>> >>>> >> >> > Best regards / Met vriendelijke groeten,
>> >>>> >> >> >
>> >>>> >> >> > Niels Basjes
>
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
