Hi Dominique,

I had another quick look at the error trace you provided. The problem doesn’t 
seem to be related to Kerberos authentication.
For some reason the JobManager simply isn’t reachable from the client, as 
Robert has pointed out. There should be some clue about this in the JM logs.
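
To rule out a plain network problem, something like the following could be run 
from the client machine. It is only a sketch: the class name 
JobManagerReachability and the helper isReachable are made up for illustration 
and are not part of Flink. All it does is attempt a TCP connection to a given 
host and port, such as the JobManager address from the ConnectionUtils warning 
in your trace.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Flink: checks whether a TCP port is reachable.
public class JobManagerReachability {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Expected arguments: <host> <port>
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable: "
                + isReachable(host, port, 5000));
    }
}
```

For example, `java JobManagerReachability lfrar255.srv.allianz 56659` would 
test the address from the warning in your trace. The JobManager port is 
typically assigned per YARN session, so use the one reported by the current 
run. If the check fails while the session is up, a firewall or routing issue 
between the client and the cluster nodes would be a likely suspect.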


On 1 June 2017 at 6:08:42 PM, Robert Metzger (rmetz...@apache.org) wrote:

Can you check the logs of the JobManager (maybe at DEBUG level) to see whether 
anything tries to establish a connection with it?
Are you sure you are properly authenticated to access the JM?

On Tue, May 30, 2017 at 4:41 PM, Dominique Rondé <dominique.ro...@allsecur.de> 
wrote:
Hi Gordon,

we use Flink 1.2.0 bundled with Hadoop 2.6 and Scala 2.11, built on 
2017-02-02.

Cheers

Dominique


On 30.05.2017 at 16:31, Tzu-Li (Gordon) Tai wrote:
Hi Dominique,

Could you tell us the version / build commit of Flink that you’re using?

Cheers,
Gordon


On 30 May 2017 at 4:29:08 PM, Dominique Rondé (dominique.ro...@allsecur.de) 
wrote:

Hi folks,

I now need to run Flink on a YARN cluster that is configured with Kerberos. 
Following the documentation, I changed flink-conf.yaml as follows:

security.kerberos.login.use-ticket-cache: true
security.kerberos.login.contexts: Client

I know that providing a keytab is the preferred approach, but I have to file a 
special request to receive one. ;-)

After startup, provisioning stops with this error:

2017-05-30 16:16:48,684 INFO  org.apache.flink.yarn.YarnClusterClient - Waiting until all TaskManagers have connected
Waiting until all TaskManagers have connected
2017-05-30 16:16:48,685 INFO  org.apache.flink.yarn.YarnClusterClient - Starting client actor system.
2017-05-30 16:16:52,099 WARN  org.apache.flink.runtime.net.ConnectionUtils - Could not connect to lfrar255.srv.allianz/10.17.24.162:56659. Selecting a local address using heuristics.
2017-05-30 16:16:52,473 INFO  akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2017-05-30 16:16:52,512 INFO  Remoting - Starting remoting
2017-05-30 16:16:52,670 INFO  Remoting - Remoting started; listening on addresses :[akka.tcp://fl...@sla09037.srv.allianz:34579]
Exception in thread "main" java.lang.RuntimeException: Unable to get ClusterClient status from Application Client
        at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248)
        at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:520)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:660)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
        at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:141)
        at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:691)
        at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)
        ... 10 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:190)
        at scala.concurrent.Await.result(package.scala)
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:139)
        ... 12 more
2017-05-30 16:17:02,690 INFO  org.apache.flink.yarn.YarnClusterClient - Shutting down YarnClusterClient from the client shutdown hook
2017-05-30 16:17:02,691 INFO  org.apache.flink.yarn.YarnClusterClient - Disconnecting YarnClusterClient from ApplicationMaster
2017-05-30 16:17:03,693 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2017-05-30 16:17:03,696 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2017-05-30 16:17:03,744 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
 
Does anyone have an idea what is going wrong?

Best wishes

Dominique


