Hi all,
We've got a jar with Hadoop configuration files in it.

Previously we used blocking (attached) mode to deploy the jobs on YARN, and
they ran well. Recently we found that the client process occupies more and
more memory, so we tried detached mode instead.
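For reference, this is roughly how we invoke the two modes (standard Flink
CLI flags; the entry class and jar path below are placeholders):

    # attached (blocking) per-job deployment on YARN
    ./bin/flink run -m yarn-cluster -c com.example.MyJob /path/to/job.jar

    # detached: add -d so the client exits once the job is submitted
    ./bin/flink run -m yarn-cluster -d -c com.example.MyJob /path/to/job.jar

But with detached mode, the job failed to deploy, with the following error
information: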

The program finished with the following exception:

org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
        at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:82)
        at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:230)
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1533815330295_30183 failed 2 times due to AM Container for appattempt_xxxx exited with exitCode: 1
For more detailed output, check application tracking page: http:xxxx Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_xxxx
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:593)
        at org.apache.hadoop.util.Shell.run(Shell.java:490)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:784)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:298)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:324)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : user is streams
main : requested yarn user is user1


Then I found this email:
http://mail-archives.apache.org/mod_mbox/flink-user/201901.mbox/<tencent_0301f26148ceee21005e9...@qq.com>
After setting *yarn.per-job-cluster.include-user-jar: LAST* as it suggests,
some of our jobs could be deployed as expected.
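For reference, we put the setting in flink-conf.yaml. If I read the docs
right, it controls where the user jar is placed on the YARN application
classpath (FIRST, LAST, or ORDER):

    # flink-conf.yaml
    # LAST places the user jar after the Flink and Hadoop jars, so
    # classes and resources shipped in our jar no longer shadow the
    # cluster's own.
    yarn.per-job-cluster.include-user-jar: LAST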

But for some jobs that need to access a second HDFS, with Hadoop conf files
bundled in the jar, there is still a problem: the JobManager cannot resolve
the HDFS domain name. I guess that is because the Hadoop conf files inside
the jar are loaded instead of the conf files in the client's Hadoop conf
dir.
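One workaround we are considering is to stop bundling the conf files as
classpath resources and instead load the second cluster's configuration
explicitly in the job code. A minimal sketch of that idea using the plain
Hadoop API (the file paths and the nameservice name below are made up):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoteHdfsAccess {
        public static void main(String[] args) throws Exception {
            // Loads core-default.xml and any core-site.xml found on
            // the classpath first.
            Configuration conf = new Configuration();
            // Resources added afterwards override the classpath ones,
            // so the second cluster's settings win. These files would
            // be shipped with the job but kept out of the jar.
            conf.addResource(new Path("/path/to/other-cluster/core-site.xml"));
            conf.addResource(new Path("/path/to/other-cluster/hdfs-site.xml"));
            // Resolve the remote nameservice through this explicit
            // Configuration rather than the classpath default.
            FileSystem fs = FileSystem.get(
                URI.create("hdfs://other-nameservice/"), conf);
            System.out.println(fs.exists(new Path("/")));
        }
    }

That way the second cluster's settings would never compete with whatever
core-site.xml happens to be first on the JobManager's classpath.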

Can someone here help?
