https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-10795
https://stackoverflow.com/questions/34632617/spark-python-submission-error-file-does-not-exist-pyspark-zip

> Sent: Thursday, July 16, 2020 at 6:54 PM
> From: "Davide Curcio" <davide.cur...@live.com>
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: "Pyspark.zip does not exist" using Spark in cluster mode with Yarn
>
> I'm trying to run some Spark scripts in cluster mode using Yarn, but I always
> get this error. I read in other, similar questions that the cause can be:
>
> - a "local" master hard-coded in the script, but I don't have one;
> - a wrong HADOOP_CONF_DIR environment variable inside spark-env.sh, but mine
>   seems right.
>
> I've tried several scripts, even trivial ones, but none of them work in
> cluster mode, even though they all work in local mode.
>
> Here is the log when I try to execute the code:
>
> spark/bin/spark-submit --deploy-mode cluster --master yarn ~/prova7.py
> log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 20/07/16 16:10:27 INFO Client: Requesting a new application from cluster with 2 NodeManagers
> 20/07/16 16:10:27 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (1536 MB per container)
> 20/07/16 16:10:27 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
> 20/07/16 16:10:27 INFO Client: Setting up container launch context for our AM
> 20/07/16 16:10:27 INFO Client: Setting up the launch environment for our AM container
> 20/07/16 16:10:27 INFO Client: Preparing resources for our AM container
> 20/07/16 16:10:27 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
> 20/07/16 16:10:31 INFO Client: Uploading resource file:/tmp/spark-750fb229-4166-4444-9c69-eb90e9a2318d/__spark_libs__4588035472069967339.zip -> file:/home/ubuntu/.sparkStaging/application_1594914119543_0010/__spark_libs__4588035472069967339.zip
> 20/07/16 16:10:31 INFO Client: Uploading resource file:/home/ubuntu/prova7.py -> file:/home/ubuntu/.sparkStaging/application_1594914119543_0010/prova7.py
> 20/07/16 16:10:31 INFO Client: Uploading resource file:/home/ubuntu/spark/python/lib/pyspark.zip -> file:/home/ubuntu/.sparkStaging/application_1594914119543_0010/pyspark.zip
> 20/07/16 16:10:31 INFO Client: Uploading resource file:/home/ubuntu/spark/python/lib/py4j-0.10.7-src.zip -> file:/home/ubuntu/.sparkStaging/application_1594914119543_0010/py4j-0.10.7-src.zip
> 20/07/16 16:10:32 INFO Client: Uploading resource file:/tmp/spark-750fb229-4166-4444-9c69-eb90e9a2318d/__spark_conf__1291791519024875749.zip -> file:/home/ubuntu/.sparkStaging/application_1594914119543_0010/__spark_conf__.zip
> 20/07/16 16:10:32 INFO SecurityManager: Changing view acls to: ubuntu
> 20/07/16 16:10:32 INFO SecurityManager: Changing modify acls to: ubuntu
> 20/07/16 16:10:32 INFO SecurityManager: Changing view acls groups to:
> 20/07/16 16:10:32 INFO SecurityManager: Changing modify acls groups to:
> 20/07/16 16:10:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set()
> 20/07/16 16:10:33 INFO Client: Submitting application application_1594914119543_0010 to ResourceManager
> 20/07/16 16:10:33 INFO YarnClientImpl: Submitted application application_1594914119543_0010
> 20/07/16 16:10:34 INFO Client: Application report for application_1594914119543_0010 (state: FAILED)
> 20/07/16 16:10:34 INFO Client:
>     client token: N/A
>     diagnostics: Application application_1594914119543_0010 failed 2 times due to AM Container for appattempt_1594914119543_0010_000002 exited with exitCode: -1000
> Failing this attempt.Diagnostics: [2020-07-16 16:10:34.391]File file:/home/ubuntu/.sparkStaging/application_1594914119543_0010/pyspark.zip does not exist
> java.io.FileNotFoundException: File file:/home/ubuntu/.sparkStaging/application_1594914119543_0010/pyspark.zip does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>     at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
>     at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
>     at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
>     at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> For more detailed output, check the application tracking page: http://ec2-3-215-190-32.compute-1.amazonaws.com:8088/cluster/app/application_1594914119543_0010
> Then click on links to logs of each attempt.
> . Failing the application.
>     ApplicationMaster host: N/A
>     ApplicationMaster RPC port: -1
>     queue: default
>     start time: 1594915833427
>     final status: FAILED
>     tracking URL: http://ec2-3-215-190-32.compute-1.amazonaws.com:8088/cluster/app/application_1594914119543_0010
>     user: ubuntu
> 20/07/16 16:10:34 INFO Client: Deleted staging directory file:/home/ubuntu/.sparkStaging/application_1594914119543_0010
> 20/07/16 16:10:34 ERROR Client: Application diagnostics message: Application application_1594914119543_0010 failed 2 times due to AM Container for appattempt_1594914119543_0010_000002 exited with exitCode: -1000
> Failing this attempt.Diagnostics: [2020-07-16 16:10:34.391]File file:/home/ubuntu/.sparkStaging/application_1594914119543_0010/pyspark.zip does not exist
> java.io.FileNotFoundException: File file:/home/ubuntu/.sparkStaging/application_1594914119543_0010/pyspark.zip does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>     at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
>     at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
>     at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
>     at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> For more detailed output, check the application tracking page: http://ec2-3-215-190-32.compute-1.amazonaws.com:8088/cluster/app/application_1594914119543_0010
> Then click on links to logs of each attempt.
> . Failing the application.
> Exception in thread "main" org.apache.spark.SparkException: Application application_1594914119543_0010 finished with failed status
>     at org.apache.spark.deploy.yarn.Client.run(Client.scala:1150)
>     at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1530)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 20/07/16 16:10:34 INFO ShutdownHookManager: Shutdown hook called
> 20/07/16 16:10:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-750fb229-4166-4444-9c69-eb90e9a2318d
> 20/07/16 16:10:34 INFO ShutdownHookManager: Deleting directory /tmp/spark-257b390a-3c40-49fd-b285-de35f27e3dfb
>
> Do you have any suggestions about how to solve this problem?
>
> Thanks in advance,
>
> Davide

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
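One detail worth checking, beyond the two causes already ruled out in the question: every staging path in the log uses the file: scheme (file:/home/ubuntu/.sparkStaging/...), which is what you get when fs.defaultFS in core-site.xml is still the local-filesystem default. In that case each NodeManager looks for pyspark.zip on its own local disk and fails exactly as in the trace. A minimal sketch of that check follows; the hostname and port in it are placeholders, not values from this thread, and the temp directory stands in for a real $HADOOP_CONF_DIR:

```shell
# Sketch only: verify that the cluster's default filesystem is HDFS,
# not the local filesystem. Hostname/port below are placeholders.
conf_dir=$(mktemp -d)   # stand-in for $HADOOP_CONF_DIR
cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
EOF

# Extract the URI scheme of fs.defaultFS. A "file" scheme here would
# explain staging paths like file:/home/ubuntu/.sparkStaging/... in the
# log, since other hosts then cannot see the uploaded pyspark.zip.
scheme=$(sed -n 's#.*<value>\([a-z]*\)://.*#\1#p' "$conf_dir/core-site.xml")
echo "fs.defaultFS scheme: $scheme"   # prints "fs.defaultFS scheme: hdfs"
```

On a healthy YARN cluster the scheme should be hdfs, and the Client's "Uploading resource" lines should then show hdfs:// destinations instead of file:.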