[ https://issues.apache.org/jira/browse/YARN-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758365#comment-16758365 ]
Billie Rinaldi commented on YARN-9190:
--------------------------------------

[~tangzhankun] Since I haven't seen this locally, I'll suggest some debugging tips:
* Check the client output for ApiServiceClient vs. ServiceClient logs. You should see ApiServiceClient, which is the one that goes through the RM REST API.
* Check the RM log for a POST: createService log by ApiServer. After that there should be ServiceClient logs describing what it's doing: using an existing tarball, uploading a new tarball for reuse (only allowed for hdfs or yarn admin), or uploading a temporary tarball just for this app.
* Find the tarball the AM is using (check the AM log and/or look in the container directory) and check which jars it contains. Compare the jar list with the contents of a good tarball, such as the one uploaded with yarn app -enableFastLaunch.

> [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1 cluster
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9190
>                 URL: https://issues.apache.org/jira/browse/YARN-9190
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Zhankun Tang
>            Assignee: Sunil Govindan
>            Priority: Major
>
> This issue was found when verifying submarine in Hadoop 3.2.0 RC1 planning.
> The steps to reproduce are:
> # Init a new HDFS and YARN (LinuxContainerExecutor and Docker enabled)
> # Before running any other yarn service job, use the yarn user to submit a submarine job
> The job will fail with the error below:
>
> {code:java}
> LogType:serviceam-err.txt
> LogLastModifiedTime:Thu Jan 10 21:15:23 +0800 2019
> LogLength:86
> LogContents:
> Error: Could not find or load main class org.apache.hadoop.yarn.service.ServiceMaster
> End of LogType:serviceam-err.txt
> {code}
> This seems to be because the dependencies are not ready, as the service client reported:
> {code:java}
> 2019-01-10 21:50:47,380 WARN client.ServiceClient: Property yarn.service.framework.path has a value /yarn-services/3.2.0/service-dep.tar.gz, but is not a valid file
> 2019-01-10 21:50:47,381 INFO client.ServiceClient: Uploading all dependency jars to HDFS. For faster submission of apps, set config property yarn.service.framework.path to the dependency tarball location. Dependency tarball can be uploaded to any HDFS path directly or by using command: yarn app -enableFastLaunch [<Destination Folder>]{code}
>
> When this error happens, there is no “/yarn-services” directory in HDFS.
> But after I run “yarn app -launch my-sleeper sleeper”, “/yarn-services” is created in HDFS and the submarine job can then run successfully.
> {code:java}
> yarn@master0-VirtualBox:~/apache-hadoop-install-dir/hadoop-dev-workspace$ hdfs dfs -ls /yarn-services/3.2.0/*
> -rwxr-xr-x 1 yarn supergroup 93596476 2019-01-11 08:23 /yarn-services/3.2.0/service-dep.tar.gz{code}
> This seems to be an issue with yarn service in 3.2.0 RC1, and I filed this Jira to track it.
>
> I also verified that the trunk branch doesn't have this issue.
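
The last debugging tip in the comment (compare the jar list of the tarball the AM is using against a known-good tarball) could be scripted roughly as follows. This is only a sketch, not part of YARN: the helper names list_jars and compare_jars are made up for illustration, and it assumes both tarballs have already been copied to the local filesystem.

```python
import os
import tarfile


def list_jars(tarball_path):
    """Return the set of jar base names found inside a tarball.

    Hypothetical helper: assumes the tarball is a local file
    (compressed or not) and that jars are regular files ending in .jar.
    """
    with tarfile.open(tarball_path, "r:*") as tar:
        return {
            os.path.basename(member.name)
            for member in tar.getmembers()
            if member.isfile() and member.name.endswith(".jar")
        }


def compare_jars(suspect_path, good_path):
    """Compare a suspect tarball against a known-good one.

    Returns (missing, extra): jars present only in the good tarball,
    and jars present only in the suspect tarball, both sorted.
    """
    suspect = list_jars(suspect_path)
    good = list_jars(good_path)
    return sorted(good - suspect), sorted(suspect - good)
```

A non-empty "missing" list would point at jars the AM cannot load from its tarball, which may be worth checking given the "Could not find or load main class" error quoted above. The comparison is by base name only, so it would not catch a jar that is present but in a different version with the same file name.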