I'm stumped with this one. I'm using YARN on EMR to distribute my Spark
job. While the job initially seems to start up fine, the Spark executor
nodes have trouble pulling the jars from the HDFS location where the
master just put them.

[hadoop@ip-172-16-2-167 ~]$
SPARK_JAR=./spark/lib/spark-assembly-1.0.0-hadoop2.4.0.jar
./spark/bin/spark-class org.apache.spark.deploy.yarn.Client --jar
/mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar --class
com.evocalize.rickshaw.spark.applications.GenerateAssetContent --args
yarn-standalone --num-workers 3 --master-memory 2g --worker-memory 2g
--worker-cores 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/hadoop/.versions/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: This client is deprecated and will be removed in a future version
of Spark. Use ./bin/spark-submit with "--master yarn"
--args is deprecated. Use --arg instead.
--num-workers is deprecated. Use --num-executors instead.
--master-memory is deprecated. Use --driver-memory instead.
--worker-memory is deprecated. Use --executor-memory instead.
--worker-cores is deprecated. Use --executor-cores instead.
14/07/18 22:27:50 INFO client.RMProxy: Connecting to ResourceManager at
/172.16.2.167:9022
14/07/18 22:27:51 INFO yarn.Client: Got Cluster metric info from
ApplicationsManager (ASM), number of NodeManagers: 2
14/07/18 22:27:51 INFO yarn.Client: Queue info ... queueName: default,
queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
      queueApplicationCount = 0, queueChildQueueCount = 0
14/07/18 22:27:51 INFO yarn.Client: Max mem capabililty of a single resource
in this cluster 3072
14/07/18 22:27:51 INFO yarn.Client: Preparing Local resources
14/07/18 22:27:53 INFO yarn.Client: Uploading
file:/mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar to
hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/rickshaw-spark-0.0.1-SNAPSHOT.jar
14/07/18 22:27:57 INFO yarn.Client: Uploading
file:/home/hadoop/spark/lib/spark-assembly-1.0.0-hadoop2.4.0.jar to
hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/spark-assembly-1.0.0-hadoop2.4.0.jar
14/07/18 22:27:59 INFO yarn.Client: Setting up the launch environment
14/07/18 22:27:59 INFO yarn.Client: Setting up container launch context
14/07/18 22:27:59 INFO yarn.Client: Command for starting the Spark
ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx2048m,
-Djava.io.tmpdir=$PWD/tmp, 
-Dlog4j.configuration=log4j-spark-container.properties,
org.apache.spark.deploy.yarn.ApplicationMaster, --class,
com.evocalize.rickshaw.spark.applications.GenerateAssetContent, --jar ,
/mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar,  --args 
'yarn-standalone' , --executor-memory, 2048, --executor-cores, 1,
--num-executors , 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
14/07/18 22:27:59 INFO yarn.Client: Submitting application to ASM
14/07/18 22:27:59 INFO impl.YarnClientImpl: Submitted application
application_1405713259773_0014
...
14/07/18 22:28:23 INFO yarn.Client: Application report from ASM:
         application identifier: application_1405713259773_0014
         appId: 14
         clientToAMToken: null
         appDiagnostics: Application application_1405713259773_0014 failed 2 
times
due to AM Container for appattempt_1405713259773_0014_000002 exited with 
exitCode: -1000 due to: File does not exist:
hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/spark-assembly-1.0.0-hadoop2.4.0.jar
.Failing this attempt.. Failing the application.
         appMasterHost: N/A
         appQueue: default
         appMasterRpcPort: -1
         appStartTime: 1405722479547
         yarnAppState: FAILED
         distributedFinalState: FAILED
         appTrackingUrl:
ip-172-16-2-167.us-west-1.compute.internal:9026/cluster/app/application_1405713259773_0014
         appUser: hadoop

-----
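As an aside, the output above warns that this client is deprecated in
favor of spark-submit. Translating the deprecated flags per the warnings
in the log, the equivalent invocation should be roughly the following
(untested on my end):

  ./spark/bin/spark-submit \
    --master yarn-cluster \
    --class com.evocalize.rickshaw.spark.applications.GenerateAssetContent \
    --num-executors 3 \
    --driver-memory 2g \
    --executor-memory 2g \
    --executor-cores 1 \
    /mnt/tmp/GenerateAssetContent/rickshaw-spark-0.0.1-SNAPSHOT.jar \
    yarn-standalone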

Back to the actual failure: I tried to ls the file at that location -
it doesn't exist either, although Spark could have cleaned it up before
exiting. I also verified that on EMR all ports are open between the
instances, so this can't be a port issue. What am I missing?
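For reference, this is roughly how I checked the staging directory
(paths copied from the failure message above):

  hdfs dfs -ls hdfs://172.16.2.167:9000/user/hadoop/.sparkStaging/application_1405713259773_0014/

One workaround I may try next (untested; the /user/hadoop/spark-lib
directory name is just something I picked for this sketch): pre-stage
the assembly on HDFS once and point SPARK_JAR at the hdfs:// path, so
the client references the jar in place instead of uploading it for each
application:

  # put the assembly somewhere permanent on HDFS
  hadoop fs -mkdir -p /user/hadoop/spark-lib
  hadoop fs -put ./spark/lib/spark-assembly-1.0.0-hadoop2.4.0.jar /user/hadoop/spark-lib/
  # reference the assembly by hdfs:// URI instead of the local file
  export SPARK_JAR=hdfs://172.16.2.167:9000/user/hadoop/spark-lib/spark-assembly-1.0.0-hadoop2.4.0.jar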


