Hi Howard,

We run Flink 1.2 on YARN without issues. Sorry, I don't have a specific
solution, but are you sure you don't have some sort of Flink version mix? In
your logs I can see:

The configuration directory ('/home/software/flink-1.1.4/conf') contains
both LOG4J and Logback configuration files. Please delete or rename one of
them.

Note that it mentions 1.1.4 in the conf directory path instead of 1.2.
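
A quick way to check which distribution and conf directory the client is
actually picking up (just a rough sketch; adjust the path to wherever your
1.2 install lives):

    # empty output means the scripts fall back to <flink home>/conf
    echo $FLINK_CONF_DIR
    # /path/to/flink-1.2.0 is a placeholder for your actual 1.2 install;
    # the flink-dist jar in lib/ is what gets shipped to YARN
    ls /path/to/flink-1.2.0/lib/flink-dist_*.jar

If either of those still points at /home/software/flink-1.1.4, the job is
being submitted with the old distribution's config and libraries.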

Cheers,

Bruno

On Fri, 17 Feb 2017 at 08:50 Howard,Li(vip.com) <howard...@vipshop.com>
wrote:

> Hi,
>
>          I’m trying to run Flink on YARN using the command: bin/flink run
> -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar
>
>          But I got the following error:
>
>
>
> 2017-02-17 15:52:40,746 INFO
> org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for
> the flink jar passed. Using the location of class
> org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
>
> 2017-02-17 15:52:40,746 INFO
> org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for
> the flink jar passed. Using the location of class
> org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
>
> 2017-02-17 15:52:40,775 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Using
> values:
>
> 2017-02-17 15:52:40,775 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> TaskManager count = 2
>
> 2017-02-17 15:52:40,775 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> JobManager memory = 1024
>
> 2017-02-17 15:52:40,775 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   -
> TaskManager memory = 1024
>
> 2017-02-17 15:52:40,796 INFO
> org.apache.hadoop.yarn.client.RMProxy                         - Connecting
> to ResourceManager at /0.0.0.0:8032
>
> 2017-02-17 15:52:41,680 WARN
> org.apache.flink.yarn.YarnClusterDescriptor                   - The
> configuration directory ('/home/software/flink-1.1.4/conf') contains both
> LOG4J and Logback configuration files. Please delete or rename one of them.
>
> 2017-02-17 15:52:41,702 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.1.4/conf/logback.xml to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/logback.xml
>
> 2017-02-17 15:52:42,025 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.1.4/lib to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/lib
>
> 2017-02-17 15:52:42,695 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.1.4/conf/log4j.properties to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/log4j.properties
>
> 2017-02-17 15:52:42,722 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar to
> hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-dist_2.10-1.1.4.jar
>
> 2017-02-17 15:52:43,346 INFO
> org.apache.flink.yarn.Utils                                   - Copying
> from /home/software/flink-1.1.4/conf/flink-conf.yaml to hdfs://
> 10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-conf.yaml
>
> 2017-02-17 15:52:43,386 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting
> application master application_1487247313588_0017
>
> 2017-02-17 15:52:43,425 INFO
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted
> application application_1487247313588_0017
>
> 2017-02-17 15:52:43,425 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for
> the cluster to be allocated
>
> 2017-02-17 15:52:43,427 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying
> cluster, current state ACCEPTED
>
> 2017-02-17 15:52:48,471 INFO
> org.apache.flink.yarn.YarnClusterDescriptor                   - YARN
> application has been deployed successfully.
>
> Cluster started: Yarn cluster with application id
> application_1487247313588_0017
>
> Using address 10.199.202.162:43809 to connect to JobManager.
>
> JobManager web interface address
> http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0017/
>
> Using the parallelism provided by the remote cluster (8). To use another
> parallelism, set it at the ./bin/flink client.
>
> Starting execution of program
>
> 2017-02-17 15:52:49,278 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Starting
> program in interactive mode
>
> Executing WordCount example with default input data set.
>
> Use --input to specify file input.
>
> Printing result to stdout. Use --output to specify output path.
>
> 2017-02-17 15:52:49,609 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Waiting
> until all TaskManagers have connected
>
> Waiting until all TaskManagers have connected
>
> 2017-02-17 15:52:49,610 INFO  org.apache.flink.yarn.YarnClusterClient
>                       - Starting client actor system.
>
>
>
> ------------------------------------------------------------
>
> The program finished with the following exception:
>
>
>
> org.apache.flink.client.program.ProgramInvocationException: The main
> method caused an error.
>
>      at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
>
>      at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:404)
>
>      at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:321)
>
>      at
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:777)
>
>      at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)
>
>      at
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1005)
>
>      at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)
>
> Caused by: java.lang.RuntimeException: Unable to get ClusterClient status
> from Application Client
>
>      at
> org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)
>
>      at
> org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:514)
>
>      at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)
>
>      at
> org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:204)
>
>      at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:383)
>
>      at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:370)
>
>      at
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
>
>      at
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)
>
>      at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>
>      at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
>
>      at
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)
>
>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>      at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
>      at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>      at java.lang.reflect.Method.invoke(Method.java:498)
>
>      at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:510)
>
>      ... 6 more
>
> Caused by:
> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could
> not retrieve the leader gateway
>
>      at
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:127)
>
>      at
> org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:645)
>
>      at
> org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:237)
>
>      ... 21 more
>
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after
> [10000 milliseconds]
>
>      at
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>
>      at
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>
>      at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>
>      at
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>
>      at scala.concurrent.Await$.result(package.scala:107)
>
>      at scala.concurrent.Await.result(package.scala)
>
>      at
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:125)
>
>      ... 23 more
>
> 2017-02-17 15:53:20,084 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Sending
> shutdown request to the Application Master
>
> 2017-02-17 15:53:20,085 INFO
> org.apache.flink.yarn.YarnClusterClient                       - Start
> application client.
>
> 2017-02-17 15:53:20,088 WARN
> org.apache.flink.yarn.YarnClusterClient                       - YARN
> reported application state FAILED
>
> 2017-02-17 15:53:20,089 WARN
> org.apache.flink.yarn.YarnClusterClient                       -
> Diagnostics: Application application_1487247313588_0017 failed 1 times due
> to AM Container for appattempt_1487247313588_0017_000001 exited with
> exitCode: -103
>
> For more detailed output, check application tracking page:
> http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0017
> Then, click on links to logs of each attempt.
>
> Diagnostics: Container
> [pid=18733,containerID=container_1487247313588_0017_01_000001] is running
> beyond virtual memory limits. Current usage: 264.7 MB of 1 GB physical
> memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
>
> Dump of the process-tree for container_1487247313588_0017_01_000001 :
>
>      |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>
>      |- 18740 18733 18733 18733 (java) 955 64 2298933248 67430
> /home/software/jdk1.8.0_111/bin/java -Xmx424M
> -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log
> -Dlogback.configurationFile=file:logback.xml
> -Dlog4j.configuration=file:log4j.properties
> org.apache.flink.yarn.YarnApplicationMasterRunner
>
>      |- 18733 18731 18733 18733 (bash) 0 0 108605440 334 /bin/bash -c
> /home/software/jdk1.8.0_111/bin/java -Xmx424M
> -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log
> -Dlogback.configurationFile=file:logback.xml
> -Dlog4j.configuration=file:log4j.properties
> org.apache.flink.yarn.YarnApplicationMasterRunner
> 1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.out
> 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.err
>
>
>
>
> Container killed on request. Exit code is 143
>
> Container exited with a non-zero exit code 143
>
> Failing this attempt. Failing the application.
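>
> (As a side note on the diagnostics above: YARN's default
> yarn.nodemanager.vmem-pmem-ratio is 2.1, which is where the 2.1 GB
> virtual-memory cap for a 1 GB container comes from. Requesting larger
> containers might avoid the kill; a rough, untested variant of the submit
> command, assuming -yjm/-ytm set the JobManager/TaskManager container
> memory in MB:
>
>     bin/flink run -m yarn-cluster -yn 2 -ys 4 -yjm 2048 -ytm 2048 \
>         ./examples/batch/WordCount.jar   # 2048 MB is only an example value
>
> Alternatively, the check can be relaxed on the NodeManagers via
> yarn.nodemanager.vmem-check-enabled or yarn.nodemanager.vmem-pmem-ratio in
> yarn-site.xml.)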
>
> 2017-02-17 15:53:20,102 INFO
> org.apache.flink.yarn.ApplicationClient                       -
> Notification about new leader address akka.tcp://
> flink@10.199.202.162:43809/user/jobmanager with session ID null.
>
> 2017-02-17 15:53:20,106 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:53:20,107 INFO
> org.apache.flink.yarn.ApplicationClient                       - Received
> address of new leader akka.tcp://
> flink@10.199.202.162:43809/user/jobmanager with session ID null.
>
> 2017-02-17 15:53:20,108 INFO
> org.apache.flink.yarn.ApplicationClient                       - Disconnect
> from JobManager null.
>
> 2017-02-17 15:53:20,112 INFO
> org.apache.flink.yarn.ApplicationClient                       - Trying to
> register at JobManager akka.tcp://
> flink@10.199.202.162:43809/user/jobmanager.
>
> Listening for transport dt_socket at address: 5006
>
> 2017-02-17 15:53:20,624 INFO
> org.apache.flink.yarn.ApplicationClient                       - Trying to
> register at JobManager akka.tcp://
> flink@10.199.202.162:43809/user/jobmanager.
>
> 2017-02-17 15:53:21,124 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:53:21,645 INFO
> org.apache.flink.yarn.ApplicationClient                       - Trying to
> register at JobManager akka.tcp://
> flink@10.199.202.162:43809/user/jobmanager.
>
> 2017-02-17 15:53:22,145 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:53:23,165 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:53:23,664 INFO
> org.apache.flink.yarn.ApplicationClient                       - Trying to
> register at JobManager akka.tcp://
> flink@10.199.202.162:43809/user/jobmanager.
>
> 2017-02-17 15:53:24,185 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
> 2017-02-17 15:53:25,204 INFO
> org.apache.flink.yarn.ApplicationClient                       - Sending
> StopCluster request to JobManager.
>
>
>
> The main error is:
> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could
> not retrieve the leader gateway. Maybe it has some relationship with
> https://issues.apache.org/jira/browse/FLINK-2821. It is said that the IP
> address will always be used in the Akka address rather than the hostname,
> but I find a hostname in the Akka address in the leaderRetrievalService.
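>
> Given that YARN reports the AM container being killed for exceeding the
> virtual memory limit (see the diagnostics above), the timeout retrieving
> the leader gateway may simply be a consequence of the JobManager container
> going away. The full JobManager log of the failed attempt should be
> retrievable with:
>
>     yarn logs -applicationId application_1487247313588_0017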
>
>
>
> This problem does not appear with 1.1.4.
>
>
>
> Thank you all.
>
>
>
> Howard
>
