I'm stumped by this one. I'm using YARN on EMR to distribute my Spark job. The job appears to start up fine, but the Spark executor nodes have trouble pulling the jars from the HDFS location where the master just uploaded them.
[hadoop@ip-172-16-2-167 ~]$ SPARK_JAR=./spark/lib/spark-assembly-1.0.0-hadoop2.4.0.jar ./spark/bin/spark-class org.apache.spark.deploy.yarn.Client --jar /mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar --class com.evocalize.rickshaw.spark.applications.GenerateAssetContent --args yarn-standalone --num-workers 3 --master-memory 2g --worker-memory 2g --worker-cores 1

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: This client is deprecated and will be removed in a future version of Spark. Use ./bin/spark-submit with "--master yarn"
--args is deprecated. Use --arg instead.
--num-workers is deprecated. Use --num-executors instead.
--master-memory is deprecated. Use --driver-memory instead.
--worker-memory is deprecated. Use --executor-memory instead.
--worker-cores is deprecated. Use --executor-cores instead.
14/07/18 22:27:50 INFO client.RMProxy: Connecting to ResourceManager at /172.16.2.167:9022
14/07/18 22:27:51 INFO yarn.Client: Got Cluster metric info from ApplicationsManager (ASM), number of NodeManagers: 2
14/07/18 22:27:51 INFO yarn.Client: Queue info ...
queueName: default, queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0, queueApplicationCount = 0, queueChildQueueCount = 0
14/07/18 22:27:51 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 3072
14/07/18 22:27:51 INFO yarn.Client: Preparing Local resources
14/07/18 22:27:53 INFO yarn.Client: Uploading file:/mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar to hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/rickshaw-spark-0.0.1-SNAPSHOT.jar
14/07/18 22:27:57 INFO yarn.Client: Uploading file:/home/hadoop/spark/lib/spark-assembly-1.0.0-hadoop2.4.0.jar to hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/spark-assembly-1.0.0-hadoop2.4.0.jar
14/07/18 22:27:59 INFO yarn.Client: Setting up the launch environment
14/07/18 22:27:59 INFO yarn.Client: Setting up container launch context
14/07/18 22:27:59 INFO yarn.Client: Command for starting the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx2048m, -Djava.io.tmpdir=$PWD/tmp, -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ApplicationMaster, --class, com.evocalize.rickshaw.spark.applications.GenerateAssetContent, --jar , /mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar, --args 'yarn-standalone' , --executor-memory, 2048, --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
14/07/18 22:27:59 INFO yarn.Client: Submitting application to ASM
14/07/18 22:27:59 INFO impl.YarnClientImpl: Submitted application application_1405713259773_0014
...
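As an aside, the warnings above say this Client entry point is deprecated. A sketch of what the equivalent spark-submit invocation might look like (flag mapping taken from the deprecation messages; paths, class name, and the Spark 1.0 layout are copied from the command above, not verified on EMR; yarn-cluster replaces the old yarn-standalone mode):

```shell
# Deprecated yarn.Client flags mapped onto spark-submit
# (paths and class name assumed from the original command above).
./spark/bin/spark-submit \
  --master yarn-cluster \
  --class com.evocalize.rickshaw.spark.applications.GenerateAssetContent \
  --num-executors 3 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 1 \
  /mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar
```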
14/07/18 22:28:23 INFO yarn.Client: Application report from ASM:
  application identifier: application_1405713259773_0014
  appId: 14
  clientToAMToken: null
  appDiagnostics: Application application_1405713259773_0014 failed 2 times due to AM Container for appattempt_1405713259773_0014_000002 exited with exitCode: -1000 due to: File does not exist: hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/spark-assembly-1.0.0-hadoop2.4.0.jar .Failing this attempt.. Failing the application.
  appMasterHost: N/A
  appQueue: default
  appMasterRpcPort: -1
  appStartTime: 1405722479547
  yarnAppState: FAILED
  distributedFinalState: FAILED
  appTrackingUrl: ip-172-16-2-167.us-west-1.compute.internal:9026/cluster/app/application_1405713259773_0014
  appUser: hadoop

-----

I tried to ls the file at that location, and it doesn't exist either, although Spark could have cleaned it up before exiting. I verified that on EMR all ports are open between the nodes, so this can't be a port issue. What am I missing?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-YARN-on-AWS-EMR-Issues-finding-file-on-hdfs-tp10214.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
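To make the "ls the file" check concrete, commands along these lines could confirm whether the staged jar ever landed and whether the cluster's default filesystem matches the URI in the log (hypothetical commands; the NameNode address and application id are taken from the log above, and YARN may have already cleaned the staging directory up after the failure):

```shell
# Which fs.defaultFS does the cluster actually resolve? A mismatch with the
# hdfs://172.16.2.167:9000 URI in the log could explain "File does not exist".
hdfs getconf -confKey fs.defaultFS

# List the staging directory for the failed application
# (application id copied from the log output above).
hdfs dfs -ls hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/
```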