The last time I checked, if you launch EMR 4 with only Spark selected as an application, HDFS isn't correctly installed.
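For reference, here is a minimal sketch of launching an EMR 4 cluster from the AWS CLI with Hive selected alongside Spark. Hadoop is also listed explicitly; that is my assumption, not something stated above. The release label emr-4.0.0 is also an assumption, picked to match the Spark 1.4.1 assembly in the error below. The cluster name, instance type and instance count are placeholders. The command assumes the AWS CLI is configured and the default EMR roles already exist. The function name launch_spark_cluster is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: launch an EMR 4.x cluster with Hadoop and Hive selected alongside Spark.
# Assumptions: the AWS CLI is installed and configured, the default EMR roles
# exist (created with `aws emr create-default-roles`), and the name, instance
# type and instance count below are placeholders to adjust.
# launch_spark_cluster is a hypothetical helper name.
launch_spark_cluster() {
  aws emr create-cluster \
    --name "spark-with-hdfs" \
    --release-label emr-4.0.0 \
    --applications Name=Hadoop Name=Hive Name=Spark \
    --instance-type m3.xlarge \
    --instance-count 3 \
    --use-default-roles
}
```

Call launch_spark_cluster from a shell where the AWS CLI is set up. The same application selection can also be made in the EMR web console.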
Did you select another application, such as Hive, at launch time in addition to Spark? If not, try that.

Thanks,
Ewan

------ Original message ------
From: Dean Wampler
Date: Wed, 9 Sep 2015 22:29
To: shahab
Cc: user@spark.apache.org
Subject: Re: [Spark on Amazon EMR] : File does not exist: hdfs://ip-x-x-x-x:/.../spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

If you log into the cluster, do you see the file if you type the following (with the correct server address in place of "ipx-x-x-x")?

hdfs dfs -ls hdfs://ipx-x-x-x:8020/user/hadoop/.sparkStaging/application_123344567_0018/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

If not, is the server address correct and routable inside the cluster? Recall that EC2 instances have both public and private host names and IP addresses. Also, is the port number correct for HDFS in the cluster?

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly): http://shop.oreilly.com/product/0636920033073.do
Typesafe: http://typesafe.com
@deanwampler: http://twitter.com/deanwampler
http://polyglotprogramming.com

On Wed, Sep 9, 2015 at 9:28 AM, shahab <shahab.mok...@gmail.com> wrote:

Hi,

I am using Spark on Amazon EMR. So far I have not managed to submit the application successfully, and I'm not sure what the problem is. In the log file I see the following:

java.io.FileNotFoundException: File does not exist: hdfs://ipx-x-x-x:8020/user/hadoop/.sparkStaging/application_123344567_0018/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar

Even putting spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar inside the fat jar didn't solve the problem. I am out of ideas now.

I want to submit a Spark application as a step, using the AWS web console. I submit the application as:

spark-submit --deploy-mode cluster --class mypack.MyMainClass --master yarn-cluster s3://mybucket/MySparkApp.jar

Has anyone had a similar problem with EMR?

best,
/Shahab
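The checks Dean suggests can be sketched as a small Bash script to run on the master node. He asks whether the staging file exists and whether the host and port in the error match the cluster's HDFS address. The script assumes the hdfs CLI is on the PATH. It relies on two facts: `hdfs getconf -confKey fs.defaultFS` prints the NameNode URI the cluster is configured to use, and `hdfs dfs -test -e` exits with status 0 when a path exists. The names hdfs_authority and check_staging_file are hypothetical helpers, not part of Hadoop or Spark.

```shell
#!/usr/bin/env bash
# Sketch: diagnose a Spark "File does not exist: hdfs://..." error on YARN.
# Assumption: run on the EMR master node with the `hdfs` CLI on the PATH.
# hdfs_authority and check_staging_file are hypothetical helper names.

# Print the "host:port" part of an hdfs:// URI (everything before the first "/").
hdfs_authority() {
  local rest="${1#hdfs://}"
  printf '%s\n' "${rest%%/*}"
}

# Compare the URI from the error with fs.defaultFS, then test whether it exists.
# Returns 0 if the file exists, 1 otherwise.
check_staging_file() {
  local uri="$1"
  local configured expected actual
  # fs.defaultFS is the NameNode URI that HDFS clients use by default.
  configured="$(hdfs getconf -confKey fs.defaultFS)"
  expected="$(hdfs_authority "$configured")"
  actual="$(hdfs_authority "$uri")"
  echo "fs.defaultFS host:port: $expected"
  echo "URI host:port:          $actual"
  # Note: if fs.defaultFS omits the port, this reports a mismatch even when
  # the default NameNode port would apply.
  if [ "$expected" != "$actual" ]; then
    echo "WARNING: host or port differs from fs.defaultFS"
  fi
  # `hdfs dfs -test -e` exits with status 0 if the path exists.
  if hdfs dfs -test -e "$uri"; then
    echo "File exists"
    return 0
  fi
  echo "File NOT found"
  return 1
}
```

Pass it the full hdfs:// URI from the exception, with the real host substituted for "ipx-x-x-x". Spark normally deletes the .sparkStaging directory when the application finishes, unless spark.yarn.preserve.staging.files is set to true. A missing file after the job has ended is therefore not conclusive on its own.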