Hi all! I ran into this problem when trying to run a Python application on Amazon's EMR YARN cluster.
I can run the bundled example applications on EMR, but I cannot figure out how to run a slightly more complex Python application that depends on some other Python scripts. I tried adding those files with '--py-files'; it works fine in local mode, but when run on EMR it fails with the following message: "Error: Only local python files are supported: s3://pathtomybucket/mylibrary.py".

The simplest way to reproduce this locally:

bin/spark-submit --py-files s3://whatever.path.com/library.py main.py

The actual commands I use on EMR:

# launch cluster
aws emr create-cluster --name SparkCluster --ami-version 3.3.1 --instance-type m1.medium --instance-count 2 --ec2-attributes KeyName=key20141114 --log-uri s3://pathtomybucket/cluster_logs --enable-debugging --use-default-roles --bootstrap-action Name=Spark,Path=s3://pathtomybucket/bootstrap-actions/spark/install-spark,Args=["-s","http://pathtomybucket/bootstrap-actions/spark","-l","WARN","-v","1.2","-b","2014121700","-x"]
# {
#   "ClusterId": "j-2Y58DME79MPQJ"
# }

# run application
aws emr add-steps --cluster-id "j-2Y58DME79MPQJ" --steps ActionOnFailure=CONTINUE,Name=SparkPy,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--py-files,s3://pathtomybucket/tasks/demo/main.py,main.py]
# {
#   "StepIds": [
#     "s-2UP4PP75YX0KU"
#   ]
# }

In the stderr of that step I get "Error: Only local python files are supported: s3://pathtomybucket/tasks/demo/main.py".

What is the workaround or correct way to do this? Should I use Hadoop's distcp to copy the dependency files from S3 to the nodes as a separate pre-step?

Regards,
Vladimir
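
P.S. To make the "pre-step" idea concrete, here is a rough sketch of what I have in mind (the bucket paths, file names, and the wrapper script name copy_and_submit.sh are just placeholders based on my example above, and I'm assuming the AWS CLI is available on the master node; hadoop fs -copyToLocal could presumably be used instead). The wrapper would first pull the Python files down from S3 so that spark-submit only ever sees local paths:

#!/bin/bash
# copy_and_submit.sh - hypothetical wrapper to be run via script-runner.jar
# copy dependencies from S3 to the master node
aws s3 cp s3://pathtomybucket/tasks/demo/main.py /home/hadoop/main.py
aws s3 cp s3://pathtomybucket/tasks/demo/library.py /home/hadoop/library.py
# submit with local --py-files instead of s3:// URIs
/home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster --py-files /home/hadoop/library.py /home/hadoop/main.py

and the step would then become something like:

aws emr add-steps --cluster-id "j-2Y58DME79MPQJ" --steps ActionOnFailure=CONTINUE,Name=SparkPy,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://pathtomybucket/tasks/demo/copy_and_submit.sh]

Is that the intended way to do it, or is there something cleaner?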