Hi Dominique,

I’m not entirely sure, but this looks more like a Hadoop or Hadoop
configuration problem to me. Could it be that the Hadoop version you’re
running does not support specifying multiple KMS servers via
kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms?
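
If you want to rule that out, a quick check could look roughly like this
(just a sketch: hadoop version and hdfs getconf are standard Hadoop CLI
commands, but the property name dfs.encryption.key.provider.uri is an
assumption based on a typical CDH setup, not taken from your mail):

hadoop version
# which KMS provider URI the HDFS client actually picks up from /etc/hadoop/conf:
hdfs getconf -confKey dfs.encryption.key.provider.uri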

Cheers,
Till

On Thu, May 11, 2017 at 4:06 PM, Dominique Rondé <dominique.ro...@allsecur.de> wrote:

> Dear all,
>
> I ran into some trouble starting Flink in a YARN container on a
> Cloudera cluster. My start script looks like this:
>
> slaxxxx:/applvg/home/flink/mvp $ cat run.sh
> export FLINK_HOME_DIR=/applvg/home/flink/mvp/flink-1.2.0/
> export FLINK_JAR_DIR=/applvg/home/flink/mvp/cache
> export YARN_CONF_DIR=/etc/hadoop/conf
> export HADOOP_CONF_DIR=/etc/hadoop/conf
>
>
> /applvg/home/flink/mvp/flink-1.2.0/bin/yarn-session.sh -n 4 -s 3 -st -jm 2048 -tm 2048 -qu root.mr-spark.avp -d
>
> If I execute this script, the output looks like the following:
>
> sla09037:/applvg/home/flink/mvp $ ./run.sh
> 2017-05-11 15:13:24,541 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.rpc.address, localhost
> 2017-05-11 15:13:24,542 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.rpc.port, 6123
> 2017-05-11 15:13:24,542 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.heap.mb, 256
> 2017-05-11 15:13:24,543 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.heap.mb, 512
> 2017-05-11 15:13:24,543 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.numberOfTaskSlots, 1
> 2017-05-11 15:13:24,543 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.memory.preallocate, false
> 2017-05-11 15:13:24,543 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: parallelism.default, 1
> 2017-05-11 15:13:24,543 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.web.port, 8081
> 2017-05-11 15:13:24,571 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.rpc.address, localhost
> 2017-05-11 15:13:24,572 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.rpc.port, 6123
> 2017-05-11 15:13:24,572 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.heap.mb, 256
> 2017-05-11 15:13:24,572 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.heap.mb, 512
> 2017-05-11 15:13:24,572 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.numberOfTaskSlots, 1
> 2017-05-11 15:13:24,572 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.memory.preallocate, false
> 2017-05-11 15:13:24,572 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: parallelism.default, 1
> 2017-05-11 15:13:24,572 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.web.port, 8081
> 2017-05-11 15:13:25,000 INFO
> org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop
> user set to fl...@companyde.rootdom.net (auth:KERBEROS)
> 2017-05-11 15:13:25,030 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.rpc.address, localhost
> 2017-05-11 15:13:25,030 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.rpc.port, 6123
> 2017-05-11 15:13:25,030 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.heap.mb, 256
> 2017-05-11 15:13:25,030 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.heap.mb, 512
> 2017-05-11 15:13:25,031 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.numberOfTaskSlots, 1
> 2017-05-11 15:13:25,031 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: taskmanager.memory.preallocate, false
> 2017-05-11 15:13:25,031 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: parallelism.default, 1
> 2017-05-11 15:13:25,031 INFO
> org.apache.flink.configuration.GlobalConfiguration            - Loading
> configuration property: jobmanager.web.port, 8081
> 2017-05-11 15:13:25,050 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Using
> values:
> 2017-05-11 15:13:25,051 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> TaskManager count = 4
> 2017-05-11 15:13:25,051 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> JobManager memory = 2048
> 2017-05-11 15:13:25,051 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> TaskManager memory = 2048
> 2017-05-11 15:13:25,903 WARN
> org.apache.hadoop.util.NativeCodeLoader                       - Unable
> to load native-hadoop library for your platform... using builtin-java
> classes where applicable
> 2017-05-11 15:13:25,962 WARN
> org.apache.flink.yarn.YarnClusterDescriptor                   - The
> configuration directory ('/applvg/home/flink/mvp/flink-1.2.0/conf')
> contains both LOG4J and Logback configuration files. Please delete or
> rename one of them.
> 2017-05-11 15:13:25,972 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/lib to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/lib
> 2017-05-11 15:13:27,522 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/conf/log4j.properties to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/log4j.properties
> 2017-05-11 15:13:27,552 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/conf/logback.xml to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/logback.xml
> 2017-05-11 15:13:27,584 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/flink-dist_2.11-1.2.0.jar
> 2017-05-11 15:13:28,508 INFO  org.apache.flink.yarn.Utils                                   - Copying from /applvg/home/flink/mvp/flink-1.2.0/conf/flink-conf.yaml to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/flink-conf.yaml
> 2017-05-11 15:13:28,553 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Adding
> delegation token to the AM container..
> 2017-05-11 15:13:28,563 INFO
> org.apache.hadoop.hdfs.DFSClient                              - Created
> HDFS_DELEGATION_TOKEN token 27247 for flink on ha-hdfs:nameservice1
> Error while deploying YARN cluster: Couldn't deploy Yarn cluster
> java.lang.RuntimeException: Couldn't deploy Yarn cluster
>         at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:421)
>         at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
>         at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
>         at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
>         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
>         at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: lfrar256.srv.company;lfrar257.srv.company
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
>         at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationTokenService(KMSClientProvider.java:823)
>         at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:779)
>         at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2046)
>         at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
>         at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>         at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>         at org.apache.flink.yarn.Utils.setTokensFor(Utils.java:154)
>         at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:753)
>         at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:419)
>         ... 9 more
> Caused by: java.net.UnknownHostException: lfrarXXX1.srv.company;lfrarXXX2.srv.company
>         ... 20 more
>
> It seems that Flink found these hosts here:
> slaxxxxx:/applvg/home/flink/mvp $ grep -r "lfrarXXX1.srv.company;lfrarXXX2.srv.company" /etc/hadoop/conf
> /etc/hadoop/conf/core-site.xml:  <value>kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms</value>
> /etc/hadoop/conf/hdfs-site.xml:  <value>kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms</value>
>
> So I guess that Flink got this connection string from the Cloudera
> config but "forgets" to split it at the ";". If I ping each of the two
> hosts individually, everything works.
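> 
> For illustration, the individual hosts vs. the combined string (a
> plain-shell sketch using the host names from the grep output above):
> 
> for h in lfrarXXX1.srv.company lfrarXXX2.srv.company; do
>   ping -c 1 "$h"   # each host answers on its own
> done
> # the combined value from the config is not a resolvable host name,
> # which matches the UnknownHostException in the stack trace:
> ping -c 1 "lfrarXXX1.srv.company;lfrarXXX2.srv.company"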
>
> Maybe you have some hints on how to avoid this problem?
>
> Best wishes
> Dominique
>
>
