Please take a look at the pull request with the actual fix; that will
explain why it's the same issue.
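
One thing worth noting in the meantime: numpy ships compiled C extensions, which Python's zipimport cannot load from a zip, so shipping it via --py-files may not work even once the --py-files handling in yarn-cluster mode is fixed; installing numpy on the cluster nodes is the usual approach. To check whether numpy is importable on the executors, a minimal probe along these lines (check_numpy.py is a hypothetical name, just a sketch) could help:

# check_numpy.py - hypothetical probe script: reports whether numpy can
# be imported on the driver and on the executors.
from pyspark import SparkContext

def probe(_):
    try:
        import numpy
        return "numpy %s importable" % numpy.__version__
    except ImportError as e:
        return "import failed: %s" % e

sc = SparkContext(appName="NumpyProbe")
print("driver: " + probe(None))
# Run the same import attempt once per partition on the executors.
for result in set(sc.parallelize(range(4), 4).map(probe).collect()):
    print("executor: " + result)
sc.stop()

It would be submitted the same way as the failing job, e.g.:

./bin/spark-submit --master yarn-cluster --py-files mypython/libs/numpy-1.9.2.zip check_numpy.py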

On Thu, Jun 25, 2015 at 12:51 PM, Elkhan Dadashov <elkhan8...@gmail.com>
wrote:

> Thanks Marcelo.
>
> But my case is different. My mypython/libs/numpy-1.9.2.zip is in a *local
> directory* (it also fails if I put it in HDFS).
>
> But SPARK-5479 <https://issues.apache.org/jira/browse/SPARK-5479> is about
> PySpark on YARN needing to support *non-local* Python files.
>
> The job fails only when I try to include a 3rd-party dependency from the
> local machine with --py-files (in Spark 1.4).
>
> Both of these commands succeed:
>
> ./bin/spark-submit --master yarn-cluster --verbose hdfs:///pi.py
> ./bin/spark-submit --master yarn-cluster --deploy-mode cluster  --verbose
> examples/src/main/python/pi.py
>
> But this particular example with the 3rd-party numpy module fails:
>
> ./bin/spark-submit --verbose --master yarn-cluster --py-files
>  mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
> mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0
>
>
> All these files: mypython/libs/numpy-1.9.2.zip and
> mypython/scripts/kmeans.py are local files; kmeans_data.txt is in HDFS.
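>
> For debugging, a couple of lines like this sketch at the top of kmeans.py
> (before the numpy import) would show whether the zip ever reaches the
> driver's sys.path:
>
> import sys
> # Sketch only: print the driver's sys.path before importing numpy.
> for p in sys.path:
>     print(p)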
>
>
> Thanks.
>
>
> On Thu, Jun 25, 2015 at 12:22 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> That sounds like SPARK-5479, which is not in 1.4...
>>
>> On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov <elkhan8...@gmail.com>
>> wrote:
>>
>>> In addition to my previous emails: when I try to execute this command
>>> from the command line:
>>>
>>> ./bin/spark-submit --verbose --master yarn-cluster --py-files
>>>  mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
>>> mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0
>>>
>>>
>>> - numpy-1.9.2.zip is the downloaded numpy package
>>> - kmeans.py is the default example that ships with Spark 1.4
>>> - kmeans_data.txt is the default data file that ships with Spark 1.4
>>>
>>>
>>> It fails, saying that it could not find numpy:
>>>
>>> File "kmeans.py", line 31, in <module>
>>>     import numpy
>>> ImportError: No module named numpy
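>>>
>>> I also wonder whether the layout of the zip matters: --py-files expects
>>> the package to be importable from the zip root, while a downloaded
>>> archive often unpacks into a numpy-1.9.2/ top-level directory. A quick
>>> check like this sketch lists the zip's first entries:
>>>
>>> import zipfile
>>> # Sketch: check whether "numpy/" sits at the zip root.
>>> with zipfile.ZipFile("mypython/libs/numpy-1.9.2.zip") as zf:
>>>     for name in zf.namelist()[:5]:
>>>         print(name)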
>>>
>>> Has anyone run a Python Spark application in yarn-cluster mode that
>>> ships 3rd-party Python modules?
>>>
>>> What configuration or installation steps are needed before running a
>>> Python Spark job with 3rd-party dependencies in yarn-cluster mode?
>>>
>>> Thanks in advance.
>>>
>>>
>> --
>> Marcelo
>>
>
>
>
> --
>
> Best regards,
> Elkhan Dadashov
>



-- 
Marcelo
