Hi,

Excellent. What you can help me with are the commands to build the binary
distribution from source. I tried it last Thursday and the build seemed to
get stuck at some point (at the end of / just after building the dist
module). I haven't been able to figure out why yet.

Niels
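[For reference, a sketch of the usual Maven build, assuming the standard
layout of the Flink source tree at the time; the flags below are
illustrative and not confirmed anywhere in this thread:

    git clone https://github.com/apache/flink.git
    cd flink
    # The shading step in flink-dist can need extra heap; if the build
    # appears to hang there, giving Maven more memory sometimes helps
    # (Java 7 era JVM settings):
    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512m"
    # Build everything, skipping tests; the assembled distribution ends
    # up under flink-dist/target/
    mvn clean install -DskipTests
]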
On 5 Nov 2015 14:57, "Maximilian Michels" <m...@apache.org> wrote:

> Thank you for looking into the problem, Niels. Let us know if you need
> anything. We would be happy to merge a pull request once you have
> verified the fix.
>
> On Thu, Nov 5, 2015 at 1:38 PM, Niels Basjes <ni...@basjes.nl> wrote:
>
>> I created https://issues.apache.org/jira/browse/FLINK-2977
>>
>> On Thu, Nov 5, 2015 at 12:25 PM, Robert Metzger <rmetz...@apache.org>
>> wrote:
>>
>>> Hi Niels,
>>> thank you for analyzing the issue so thoroughly. I agree with you. It
>>> seems that HDFS and HBase are using their own tokens, which we need to
>>> transfer from the client to the YARN containers. We should be able to
>>> port the fix from Spark (which they got from Storm) into our YARN
>>> client. I think we would add this in
>>> org.apache.flink.yarn.Utils#setTokensFor().
>>>
>>> Do you want to implement and verify the fix yourself? If you are too
>>> busy at the moment, we can also discuss how we share the work (I'm
>>> implementing it, you test the fix).
>>>
>>> Robert
>>>
>>> On Tue, Nov 3, 2015 at 5:26 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>
>>>> Update on the status so far: I suspect I found a problem in a secure
>>>> setup.
>>>>
>>>> I have created a very simple Flink topology consisting of a streaming
>>>> Source (which outputs the timestamp a few times per second) and a
>>>> Sink (which puts that timestamp into a single record in HBase).
>>>> Running this on a non-secure YARN cluster works fine.
>>>>
>>>> To run it on a secured YARN cluster, my main routine now looks like
>>>> this:
>>>>
>>>> public static void main(String[] args) throws Exception {
>>>>     System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>>>>     UserGroupInformation.loginUserFromKeytab("nbas...@xxxxxx.net",
>>>>         "/home/nbasjes/.krb/nbasjes.keytab");
>>>>
>>>>     final StreamExecutionEnvironment env =
>>>>         StreamExecutionEnvironment.getExecutionEnvironment();
>>>>     env.setParallelism(1);
>>>>
>>>>     DataStream<String> stream = env.addSource(new TimerTicksSource());
>>>>     stream.addSink(new SetHBaseRowSink());
>>>>     env.execute("Long running Flink application");
>>>> }
>>>>
>>>> When I run this:
>>>>
>>>> flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 4096 ./kerberos-1.0-SNAPSHOT.jar
>>>>
>>>> I see after the startup messages:
>>>>
>>>> 17:13:24,466 INFO  org.apache.hadoop.security.UserGroupInformation
>>>>   - Login successful for user nbas...@xxxxxx.net using keytab
>>>>     file /home/nbasjes/.krb/nbasjes.keytab
>>>> 11/03/2015 17:13:25 Job execution switched to status RUNNING.
>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to RUNNING
>>>>
>>>> which looks good.
>>>>
>>>> However ... no data goes into HBase.
>>>> After some digging I found this error in the task manager's log:
>>>>
>>>> 17:13:42,677 WARN  org.apache.hadoop.hbase.ipc.RpcClient
>>>>   - Exception encountered while connecting to the server :
>>>>     javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>>>     GSSException: No valid credentials provided (Mechanism level:
>>>>     Failed to find any Kerberos tgt)]
>>>> 17:13:42,677 FATAL org.apache.hadoop.hbase.ipc.RpcClient
>>>>   - SASL authentication failed. The most likely cause is missing or
>>>>     invalid credentials. Consider 'kinit'.
>>>> javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>>> GSSException: No valid credentials provided (Mechanism level: Failed
>>>> to find any Kerberos tgt)]
>>>>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>>>     at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>>>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>>>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>>>>
>>>> First starting a yarn-session and then loading my job gives the same
>>>> error.
>>>>
>>>> My best guess at this point is that Flink needs the same fix as
>>>> described here:
>>>>
>>>> https://issues.apache.org/jira/browse/SPARK-6918
>>>> (https://github.com/apache/spark/pull/5586)
>>>>
>>>> What do you guys think?
>>>>
>>>> Niels Basjes
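[To make the proposed port concrete, here is a rough, unverified sketch of
what obtaining and shipping an HBase delegation token (the approach taken
in the Spark pull request above) might look like on the client side, e.g.
called from org.apache.flink.yarn.Utils#setTokensFor(). It assumes HBase's
TokenUtil.obtainToken(Configuration) API; the class and method names are
made up for illustration and this is not the actual FLINK-2977 patch:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.security.token.TokenUtil;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public final class HBaseTokenSketch {
    // Sketch only: obtain an HBase delegation token on the
    // Kerberos-authenticated client and add it to the credentials that
    // are shipped to the YARN containers, so the TaskManagers can talk
    // to HBase without holding a Kerberos TGT of their own.
    public static void addHBaseToken(Credentials credentials) throws IOException {
        Configuration hbaseConf = HBaseConfiguration.create();
        // Only needed when HBase itself is secured with Kerberos.
        if ("kerberos".equals(hbaseConf.get("hbase.security.authentication"))) {
            Token<?> token = TokenUtil.obtainToken(hbaseConf);
            credentials.addToken(token.getService(), token);
        }
    }
}

Spark invokes the HBase API via reflection so it avoids a hard
compile-time dependency on HBase; the same trick would presumably apply in
the Flink YARN client.]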
>>>> On Tue, Oct 27, 2015 at 6:12 PM, Maximilian Michels <m...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Niels,
>>>>>
>>>>> You're welcome. Some more information on how this would be
>>>>> configured:
>>>>>
>>>>> In the kdc.conf, there are two variables:
>>>>>
>>>>> max_life = 2h 0m 0s
>>>>> max_renewable_life = 7d 0h 0m 0s
>>>>>
>>>>> max_life is the maximum life of the current ticket. However, it may
>>>>> be renewed up to a time span of max_renewable_life from the first
>>>>> ticket issue on. This means that from the first ticket issue, new
>>>>> tickets may be requested for one week. Each renewed ticket has a
>>>>> life time of max_life (2 hours in this case).
>>>>>
>>>>> Please let us know about any difficulties with long-running
>>>>> streaming applications and Kerberos.
>>>>>
>>>>> Best regards,
>>>>> Max
>>>>>
>>>>> On Tue, Oct 27, 2015 at 2:46 PM, Niels Basjes <ni...@basjes.nl> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for your feedback.
>>>>>> So I guess I'll have to talk to the security guys about having
>>>>>> special Kerberos ticket expiry times for these types of jobs.
>>>>>>
>>>>>> Niels Basjes
>>>>>>
>>>>>> On Fri, Oct 23, 2015 at 11:45 AM, Maximilian Michels <m...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Niels,
>>>>>>>
>>>>>>> Thank you for your question. Flink relies entirely on the Kerberos
>>>>>>> support of Hadoop. So your question could also be rephrased as
>>>>>>> "Does Hadoop support long-term authentication using Kerberos?"
>>>>>>> And the answer is: Yes!
>>>>>>>
>>>>>>> While Hadoop uses Kerberos tickets to authenticate users with
>>>>>>> services initially, the authentication process continues
>>>>>>> differently afterwards. Instead of saving the ticket to
>>>>>>> authenticate on a later access, Hadoop creates its own security
>>>>>>> tokens (DelegationToken) that it passes around. These are
>>>>>>> authenticated to Kerberos periodically. To my knowledge, the
>>>>>>> tokens have a life span identical to the Kerberos ticket maximum
>>>>>>> life span. So be sure to set the maximum life span very high for
>>>>>>> long streaming jobs. The renewal time, on the other hand, is not
>>>>>>> important because Hadoop abstracts this away using its own
>>>>>>> security tokens.
>>>>>>>
>>>>>>> I'm afraid there is no Kerberos how-to yet. If you are on YARN,
>>>>>>> then it is sufficient to authenticate the client with Kerberos. On
>>>>>>> a Flink standalone cluster you need to ensure that, initially, all
>>>>>>> nodes are authenticated with Kerberos using the kinit tool.
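[For example, on each node of a standalone cluster; the principal and
keytab path below are made up for illustration:

    # obtain a ticket non-interactively from a keytab
    kinit -kt /etc/security/keytabs/flink.keytab flink/node1.example.com@EXAMPLE.COM
    # show the ticket and its expiry; useful for double-checking the
    # max_life / max_renewable_life settings discussed above
    klist
]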
>>>>>>>
>>>>>>> Feel free to ask if you have more questions and let us know about
>>>>>>> any difficulties.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Max
>>>>>>>
>>>>>>> On Thu, Oct 22, 2015 at 2:06 PM, Niels Basjes <ni...@basjes.nl>
>>>>>>> wrote:
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > I want to write a long-running (i.e. never stop it) streaming
>>>>>>> > Flink application on a Kerberos-secured Hadoop/YARN cluster. My
>>>>>>> > application needs to do things with files on HDFS and HBase
>>>>>>> > tables on that cluster, so having the correct Kerberos tickets
>>>>>>> > is very important. The stream is to be ingested from Kafka.
>>>>>>> >
>>>>>>> > One of the things with Kerberos is that the tickets expire after
>>>>>>> > a predetermined time. My knowledge about Kerberos is very
>>>>>>> > limited, so I hope you guys can help me.
>>>>>>> >
>>>>>>> > My question is actually quite simple: Is there a how-to
>>>>>>> > somewhere on how to correctly run a long-running Flink
>>>>>>> > application with Kerberos that includes a solution for the
>>>>>>> > Kerberos ticket timeout?
>>>>>>> >
>>>>>>> > Thanks
>>>>>>> >
>>>>>>> > Niels Basjes
>>>>>>
>>>>>> --
>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>
>>>>>> Niels Basjes
>>>>
>>>> --
>>>> Best regards / Met vriendelijke groeten,
>>>>
>>>> Niels Basjes
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes