That sounds like SPARK-5479 which is not in 1.4...

On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov <elkhan8...@gmail.com> wrote:
> In addition to previous emails, when I try to execute this command from
> the command line:
>
> ./bin/spark-submit --verbose --master yarn-cluster --py-files
> mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
> mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0
>
> - numpy-1.9.2.zip is the downloaded numpy package
> - kmeans.py is the default example which comes with Spark 1.4
> - kmeans_data.txt is the default data file which comes with Spark 1.4
>
> It fails saying that it could not find numpy:
>
>   File "kmeans.py", line 31, in <module>
>     import numpy
> ImportError: No module named numpy
>
> Has anyone run a Python Spark application in yarn-cluster mode (one which
> has 3rd party Python modules to be shipped with it)?
>
> What configuration or installation needs to be done before running a
> Python Spark job with 3rd party dependencies on yarn-cluster?
>
> Thanks in advance.
>

--
Marcelo