Please take a look at the pull request with the actual fix; that will explain why it's the same issue.
On Thu, Jun 25, 2015 at 12:51 PM, Elkhan Dadashov <elkhan8...@gmail.com> wrote:
> Thanks Marcelo.
>
> But my case is different. My mypython/libs/numpy-1.9.2.zip is in a *local
> directory* (it can also be put in HDFS), but the job still fails.
>
> SPARK-5479 <https://issues.apache.org/jira/browse/SPARK-5479>, however, is about
> PySpark on yarn mode needing to support *non-local* Python files.
>
> The job fails only when I try to include a 3rd-party dependency from the local
> machine with --py-files (in Spark 1.4).
>
> Both of these commands succeed:
>
> ./bin/spark-submit --master yarn-cluster --verbose hdfs:///pi.py
> ./bin/spark-submit --master yarn-cluster --deploy-mode cluster --verbose
> examples/src/main/python/pi.py
>
> But this particular example with the 3rd-party numpy module fails:
>
> ./bin/spark-submit --verbose --master yarn-cluster --py-files
> mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
> mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0
>
> mypython/libs/numpy-1.9.2.zip and mypython/scripts/kmeans.py are local
> files; kmeans_data.txt is in HDFS.
>
> Thanks.
>
> On Thu, Jun 25, 2015 at 12:22 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> That sounds like SPARK-5479, which is not in 1.4...
>>
>> On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov <elkhan8...@gmail.com>
>> wrote:
>>
>>> In addition to my previous emails: when I try to execute this command from
>>> the command line:
>>>
>>> ./bin/spark-submit --verbose --master yarn-cluster --py-files
>>> mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
>>> mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0
>>>
>>> - numpy-1.9.2.zip is the downloaded numpy package
>>> - kmeans.py is the default example which comes with Spark 1.4
>>> - kmeans_data.txt is the default data file which comes with Spark 1.4
>>>
>>> it fails, saying that it could not find numpy:
>>>
>>> File "kmeans.py", line 31, in <module>
>>>   import numpy
>>> ImportError: No module named numpy
>>>
>>> Has anyone run a Python Spark application in yarn-cluster mode, one which
>>> ships 3rd-party Python modules with it?
>>>
>>> What configuration or installation is needed before running a Python
>>> Spark job with 3rd-party dependencies on yarn-cluster?
>>>
>>> Thanks in advance.
>>>
>>
>> --
>> Marcelo
>
>
> --
> Best regards,
> Elkhan Dadashov

--
Marcelo
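
[Editor's note: for readers debugging the same ImportError, here is a minimal diagnostic sketch, not from the original thread. The script name check_pyfiles.py and everything in it are hypothetical; it simply reports, from the executors, whether a --py-files dependency is importable and what sys.path looks like there. Submit it the same way as kmeans.py above.]

# check_pyfiles.py -- hypothetical diagnostic, submitted e.g. as:
#   ./bin/spark-submit --master yarn-cluster --py-files \
#       mypython/libs/numpy-1.9.2.zip check_pyfiles.py
from pyspark import SparkContext


def probe(_):
    """Run on an executor: report whether numpy is importable there,
    and what the executor's sys.path contains."""
    import sys
    try:
        import numpy
        status = "numpy OK: " + numpy.__version__
    except ImportError as e:
        status = "numpy FAILED: " + str(e)
    return status + " | sys.path=" + ":".join(sys.path)


if __name__ == "__main__":
    sc = SparkContext(appName="PyFilesProbe")
    # Two partitions -> two probes; the collected reports appear in the
    # driver output (the YARN application logs in yarn-cluster mode).
    for line in sc.parallelize(range(2), 2).map(probe).collect():
        print(line)
    sc.stop()

[One caveat worth noting: even when a --py-files zip reaches the executors' sys.path, numpy in particular may still fail to import from a zip, because CPython's zipimport cannot load the compiled C extension modules numpy ships with; installing numpy on the cluster nodes avoids that limitation.]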