Update on the status so far: I suspect I have found a problem in a secure
setup.

I have created a very simple Flink topology consisting of a streaming
Source (that outputs the timestamp a few times per second) and a Sink (that
puts that timestamp into a single record in HBase).
Running this on a non-secure Yarn cluster works fine.

To run it on a secured Yarn cluster my main routine now looks like this:

public static void main(String[] args) throws Exception {
    System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
    UserGroupInformation.loginUserFromKeytab(
            "nbas...@xxxxxx.net",
            "/home/nbasjes/.krb/nbasjes.keytab");

    final StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);

    DataStream<String> stream = env.addSource(new TimerTicksSource());
    stream.addSink(new SetHBaseRowSink());
    env.execute("Long running Flink application");
}

When I run this:
     flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 4096 ./kerberos-1.0-SNAPSHOT.jar

I see after the startup messages:

17:13:24,466 INFO  org.apache.hadoop.security.UserGroupInformation - Login successful for user nbas...@xxxxxx.net using keytab file /home/nbasjes/.krb/nbasjes.keytab
11/03/2015 17:13:25 Job execution switched to status RUNNING.
11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to RUNNING

Which looks good.

However ... no data goes into HBase.
After some digging I found this error in the task manager's log:

17:13:42,677 WARN  org.apache.hadoop.hbase.ipc.RpcClient - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17:13:42,677 FATAL org.apache.hadoop.hbase.ipc.RpcClient - SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
        at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)


First starting a yarn-session and then loading my job gives the same error.

My best guess at this point is that Flink needs the same fix as described
here:

https://issues.apache.org/jira/browse/SPARK-6918
(https://github.com/apache/spark/pull/5586)

What do you guys think?

Niels Basjes



On Tue, Oct 27, 2015 at 6:12 PM, Maximilian Michels <m...@apache.org> wrote:

> Hi Niels,
>
> You're welcome. Some more information on how this would be configured:
>
> In the kdc.conf, there are two variables:
>
>         max_life = 2h 0m 0s
>         max_renewable_life = 7d 0h 0m 0s
>
> max_life is the maximum lifetime of a single ticket. However, a ticket may
> be renewed for up to max_renewable_life, counted from the first ticket
> issue. This means that, starting from the first ticket issue, new tickets
> may be requested for one week. Each renewed ticket has a lifetime of
> max_life (2 hours in this case).
>
> Please let us know about any difficulties with long-running streaming
> applications and Kerberos.
>
> Best regards,
> Max
>
> On Tue, Oct 27, 2015 at 2:46 PM, Niels Basjes <ni...@basjes.nl> wrote:
>
>> Hi,
>>
>> Thanks for your feedback.
>> So I guess I'll have to talk to the security guys about having special
>> kerberos ticket expiry times for these types of jobs.
>>
>> Niels Basjes
>>
>> On Fri, Oct 23, 2015 at 11:45 AM, Maximilian Michels <m...@apache.org>
>> wrote:
>>
>>> Hi Niels,
>>>
>>> Thank you for your question. Flink relies entirely on the Kerberos
>>> support of Hadoop. So your question could also be rephrased to "Does
>>> Hadoop support long-term authentication using Kerberos?". And the
>>> answer is: Yes!
>>>
>>> While Hadoop uses Kerberos tickets to authenticate users with services
>>> initially, the authentication process continues differently
>>> afterwards. Instead of saving the ticket to authenticate on a later
>>> access, Hadoop creates its own security tokens (DelegationToken) that
>>> it passes around. These are authenticated to Kerberos periodically. To
>>> my knowledge, the tokens have a life span identical to the Kerberos
>>> ticket maximum life span. So be sure to set the maximum life span very
>>> high for long streaming jobs. The renewal time, on the other hand, is
>>> not important because Hadoop abstracts this away using its own
>>> security tokens.
>>>
>>> I'm afraid there is no Kerberos how-to yet. If you are on Yarn, then
>>> it is sufficient to authenticate the client with Kerberos. On a Flink
>>> standalone cluster you need to ensure that, initially, all nodes are
>>> authenticated with Kerberos using the kinit tool.
>>>
>>> Feel free to ask if you have more questions and let us know about any
>>> difficulties.
>>>
>>> Best regards,
>>> Max
>>>
>>>
>>>
>>> On Thu, Oct 22, 2015 at 2:06 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>> > Hi,
>>> >
>>> > I want to write a long-running (i.e. never stopped) streaming Flink
>>> > application on a Kerberos-secured Hadoop/Yarn cluster. My application
>>> > needs to do things with files on HDFS and HBase tables on that cluster,
>>> > so having the correct Kerberos tickets is very important. The stream is
>>> > to be ingested from Kafka.
>>> >
>>> > One of the things with Kerberos is that the tickets expire after a
>>> > predetermined time. My knowledge about Kerberos is very limited so I
>>> > hope you guys can help me.
>>> >
>>> > My question is actually quite simple: Is there a howto somewhere on
>>> > how to correctly run a long-running Flink application with Kerberos
>>> > that includes a solution for the Kerberos ticket timeout?
>>> >
>>> > Thanks
>>> >
>>> > Niels Basjes
>>>
>>
>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>
>
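
To make the ticket arithmetic from the quoted kdc.conf explanation concrete,
here is a small Java sketch. It is illustrative only: the class
TicketRenewalWindow and its method names are made up for this mail and are
not part of Flink, Hadoop, or Kerberos. It assumes each renewal happens
exactly when the current ticket expires (real clients have to renew a bit
earlier), and it uses nothing beyond java.time from the JDK.

```java
import java.time.Duration;

// Illustrative arithmetic only; TicketRenewalWindow is a hypothetical helper,
// not part of Flink, Hadoop, or Kerberos.
public class TicketRenewalWindow {

    // Renewals needed to stay authenticated until the renewable window closes,
    // assuming (as a simplification) each renewal happens exactly at ticket expiry.
    public static long renewalsNeeded(Duration maxLife, Duration maxRenewableLife) {
        if (maxLife.isZero() || maxLife.isNegative()) {
            throw new IllegalArgumentException("maxLife must be positive");
        }
        long lifeSeconds = maxLife.getSeconds();
        long windowSeconds = maxRenewableLife.getSeconds();
        if (windowSeconds <= lifeSeconds) {
            return 0; // the first ticket already spans the whole window
        }
        // Tickets needed to cover the window (rounded up), minus the initial ticket.
        long ticketsNeeded = (windowSeconds + lifeSeconds - 1) / lifeSeconds;
        return ticketsNeeded - 1;
    }

    // True if a job of the given runtime ends before a fresh login would be required.
    public static boolean fitsInWindow(Duration jobRuntime, Duration maxLife,
                                       Duration maxRenewableLife) {
        Duration window = maxRenewableLife.compareTo(maxLife) > 0 ? maxRenewableLife : maxLife;
        return jobRuntime.compareTo(window) <= 0;
    }

    public static void main(String[] args) {
        Duration maxLife = Duration.ofHours(2);          // max_life = 2h 0m 0s
        Duration maxRenewableLife = Duration.ofDays(7);  // max_renewable_life = 7d 0h 0m 0s

        System.out.println("renewals needed: "
                + renewalsNeeded(maxLife, maxRenewableLife));
        System.out.println("30-day job covered: "
                + fitsInWindow(Duration.ofDays(30), maxLife, maxRenewableLife));
    }
}
```

With max_life = 2h and max_renewable_life = 7d this works out to 83 renewals,
after which a fresh login (kinit or keytab) is needed. So under these
assumptions renewal alone would not cover a job that has to run for longer
than a week.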


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
