Hi Dominique,

I’m not exactly sure, but this looks more like a Hadoop or Hadoop-configuration problem to me. Could it be that the Hadoop version you’re running does not support specifying multiple KMS servers via kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms?
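For illustration only: a client that understands the ';'-separated KMS authority has to expand it into one URI per host *before* any DNS lookup happens (as far as I know, newer Hadoop versions do this in their load-balancing KMS provider). This is a minimal sketch of that splitting step, not Hadoop's actual code; the class and method names are made up, and the hostnames are the anonymized placeholders from this thread:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class KmsAuthoritySplit {
    // Expand an authority like "host1;host2:16000" into one URI per host,
    // applying the trailing port to every host.
    static List<URI> expand(String authority, String path) {
        String hostPart = authority;
        String port = "";
        int colon = authority.lastIndexOf(':');
        if (colon != -1) {
            hostPart = authority.substring(0, colon);
            port = authority.substring(colon);
        }
        List<URI> uris = new ArrayList<>();
        for (String host : hostPart.split(";")) {
            uris.add(URI.create("https://" + host + port + path));
        }
        return uris;
    }

    public static void main(String[] args) {
        // Prints one https URI per KMS host instead of one joined hostname.
        for (URI u : expand("lfrarXXX1.srv.company;lfrarXXX2.srv.company:16000", "/kms")) {
            System.out.println(u);
        }
    }
}
```

If the client skips this step and hands the joined string to DNS resolution as a single hostname, you get exactly the UnknownHostException shown in the quoted log below.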
Cheers,
Till

On Thu, May 11, 2017 at 4:06 PM, Dominique Rondé <dominique.ro...@allsecur.de> wrote:

> Dear all,
>
> I got some trouble during the start of Flink in a YARN container based on
> Cloudera. I have a start script like this:
>
> slaxxxx:/applvg/home/flink/mvp $ cat run.sh
> export FLINK_HOME_DIR=/applvg/home/flink/mvp/flink-1.2.0/
> export FLINK_JAR_DIR=/applvg/home/flink/mvp/cache
> export YARN_CONF_DIR=/etc/hadoop/conf
> export HADOOP_CONF_DIR=/etc/hadoop/conf
>
> /applvg/home/flink/mvp/flink-1.2.0/bin/yarn-session.sh -n 4 -s 3 -st -jm 2048 -tm 2048 -qu root.mr-spark.avp -d
>
> If I execute this script, it looks like the following:
>
> sla09037:/applvg/home/flink/mvp $ ./run.sh
> 2017-05-11 15:13:24,541 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
> 2017-05-11 15:13:24,542 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
> 2017-05-11 15:13:24,542 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
> 2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
> 2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
> 2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
> 2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
> 2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
> 2017-05-11 15:13:24,571 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
> 2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
> 2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
> 2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
> 2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
> 2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
> 2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
> 2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
> 2017-05-11 15:13:25,000 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to fl...@companyde.rootdom.net (auth:KERBEROS)
> 2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
> 2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
> 2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
> 2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
> 2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
> 2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
> 2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
> 2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
> 2017-05-11 15:13:25,050 INFO org.apache.flink.yarn.YarnClusterDescriptor - Using values:
> 2017-05-11 15:13:25,051 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 4
> 2017-05-11 15:13:25,051 INFO org.apache.flink.yarn.YarnClusterDescriptor - JobManager memory = 2048
> 2017-05-11 15:13:25,051 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager memory = 2048
> 2017-05-11 15:13:25,903 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2017-05-11 15:13:25,962 WARN org.apache.flink.yarn.YarnClusterDescriptor - The configuration directory ('/applvg/home/flink/mvp/flink-1.2.0/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
> 2017-05-11 15:13:25,972 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/lib to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/lib
> 2017-05-11 15:13:27,522 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/conf/log4j.properties to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/log4j.properties
> 2017-05-11 15:13:27,552 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/conf/logback.xml to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/logback.xml
> 2017-05-11 15:13:27,584 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/flink-dist_2.11-1.2.0.jar
> 2017-05-11 15:13:28,508 INFO org.apache.flink.yarn.Utils - Copying from /applvg/home/flink/mvp/flink-1.2.0/conf/flink-conf.yaml to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/flink-conf.yaml
> 2017-05-11 15:13:28,553 INFO org.apache.flink.yarn.YarnClusterDescriptor - Adding delegation token to the AM container..
> 2017-05-11 15:13:28,563 INFO org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 27247 for flink on ha-hdfs:nameservice1
> Error while deploying YARN cluster: Couldn't deploy Yarn cluster
> java.lang.RuntimeException: Couldn't deploy Yarn cluster
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:421)
>     at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
>     at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
>     at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
>     at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
>     at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: lfrar256.srv.company;lfrar257.srv.company
>     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
>     at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationTokenService(KMSClientProvider.java:823)
>     at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:779)
>     at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2046)
>     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
>     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>     at org.apache.flink.yarn.Utils.setTokensFor(Utils.java:154)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:753)
>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:419)
>     ... 9 more
> Caused by: java.net.UnknownHostException: lfrarXXX1.srv.company;lfrarXXX2.srv.company
>     ... 20 more
>
> It seems that Flink found these hosts here:
>
> slaxxxxx:/applvg/home/flink/mvp $ grep -r "lfrarXXX1.srv.company;lfrarXXX2.srv.company" /etc/hadoop/conf
> /etc/hadoop/conf/core-site.xml: <value>kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms</value>
> /etc/hadoop/conf/hdfs-site.xml: <value>kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms</value>
>
> So I guess that Flink took these connection strings from the Cloudera config and "forgot" to split them at the ";". If I ping each of the two hosts individually, everything works.
>
> Maybe you have some hints on how to avoid this problem?
>
> Best wishes
> Dominique
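Dominique's diagnosis fits the stack trace: SecurityUtil.buildTokenService ends up treating the un-split authority as a single hostname. A tiny standalone demonstration of that failure mode, using the anonymized hostnames from this thread (any string containing ';' fails DNS resolution the same way):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class JoinedHostLookup {
    public static void main(String[] args) {
        // The un-split KMS authority from core-site.xml, treated as ONE hostname.
        String joined = "lfrarXXX1.srv.company;lfrarXXX2.srv.company";
        try {
            InetAddress.getByName(joined);
            System.out.println("resolved (unexpected)");
        } catch (UnknownHostException e) {
            // Same failure the YARN client hits while collecting
            // KMS delegation tokens for the AM container.
            System.out.println("UnknownHostException: " + joined);
        }
    }
}
```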