You will have to install numpy on every machine for this to work.
sc.addPyFile() only helps with pure-Python modules: numpy bundles C
extension modules (multiarray.so among them), and CPython's zipimport
cannot load shared libraries out of a zip file, which is why shipping
numpy.zip fails with that ImportError. If numpy is present but broken on
some nodes, try reinstalling on all the machines:

sudo apt-get purge python-numpy
sudo pip uninstall numpy
sudo pip install numpy
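
Once that's done, a quick way to confirm every worker can actually import
numpy is to run a trivial job across many partitions. This is just a
sketch: it assumes a live SparkContext named sc, and the partition count
of 100 is arbitrary (pick something comfortably larger than your executor
count so every node gets at least one task):

def check_numpy(_):
    # Runs inside an executor; reports the numpy version or the import error
    try:
        import numpy
        return numpy.__version__
    except ImportError as e:
        return 'import failed: %s' % e

# Spread 100 dummy tasks over the cluster and collect the distinct outcomes
print sc.parallelize(range(100), 100).map(check_numpy).distinct().collect()

If the result contains an error string rather than a single version
number, at least one node still has a missing or broken install.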



Thanks
Best Regards

On Sun, Dec 20, 2015 at 8:33 AM, Abhinav M Kulkarni <
abhinavkulka...@gmail.com> wrote:

> I am running Spark programs on a large cluster (for which I do not have
> administrative privileges). numpy is not installed on the worker nodes.
> Hence, I bundled numpy with my program, but I get the following error:
>
> Traceback (most recent call last):
>   File "/home/user/spark-script.py", line 12, in <module>
>     import numpy
>   File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line
> 170, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line
> 13, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py",
> line 8, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/numpy/lib/type_check.py",
> line 11, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/numpy/core/__init__.py",
> line 6, in <module>
> ImportError: cannot import name multiarray
>
> The script is actually quite simple:
>
> from pyspark import SparkConf, SparkContext
> sc = SparkContext()
>
> sc.addPyFile('numpy.zip')
>
> import numpy
>
> a = sc.parallelize(numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]))
> print a.collect()
>
> I understand that the error occurs because numpy dynamically loads its
> multiarray.so dependency, and even though my numpy.zip file includes the
> multiarray.so file, somehow the dynamic loading doesn't work with Apache
> Spark. Why is that? And how do you otherwise create a standalone numpy
> module with static linking?
>
> P.S. The numpy.zip file I had included with the program was a zipped
> version of the numpy installation on my Ubuntu machine. I also tried
> downloading the numpy source, building it, and bundling it with the
> program, but the problem persisted.
>
> Thanks.
>
>
