I guess you will have to install numpy on all the machines for this to work. Try reinstalling it on each of them:
sudo apt-get purge python-numpy
sudo pip uninstall numpy
sudo pip install numpy

Thanks
Best Regards

On Sun, Dec 20, 2015 at 8:33 AM, Abhinav M Kulkarni <abhinavkulka...@gmail.com> wrote:

> I am running Spark programs on a large cluster (for which I do not have
> administrative privileges). numpy is not installed on the worker nodes.
> Hence, I bundled numpy with my program, but I get the following error:
>
> Traceback (most recent call last):
>   File "/home/user/spark-script.py", line 12, in <module>
>     import numpy
>   File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line 170, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line 13, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", line 8, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/numpy/lib/type_check.py", line 11, in <module>
>   File "/usr/local/lib/python2.7/dist-packages/numpy/core/__init__.py", line 6, in <module>
> ImportError: cannot import name multiarray
>
> The script is actually quite simple:
>
> from pyspark import SparkConf, SparkContext
> sc = SparkContext()
>
> sc.addPyFile('numpy.zip')
>
> import numpy
>
> a = sc.parallelize(numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]))
> print a.collect()
>
> I understand that the error occurs because numpy dynamically loads the
> multiarray.so dependency, and even though my numpy.zip file includes
> multiarray.so, the dynamic loading somehow doesn't work with Apache Spark.
> Why is that? And how would you otherwise create a standalone numpy module
> with static linking?
>
> P.S. The numpy.zip file I included with the program was a zipped copy of
> the numpy installation on my Ubuntu machine. I also tried downloading the
> numpy source, building it, and bundling that with the program, but the
> problem persisted.
>
> Thanks.
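As to the "why" in the quoted question: CPython's zipimport mechanism can only load pure-Python modules (.py/.pyc) from a zip archive; it cannot load C extension modules such as multiarray.so. So the pure-Python parts of numpy import fine from numpy.zip, but the import fails as soon as numpy.core tries to pull in its compiled code. A minimal sketch of the same failure outside Spark (assuming a numpy.zip like the one described above sits in the current directory):

import sys

# Zip archives on sys.path are handled by zipimport, which only knows how
# to load .py/.pyc entries -- the .so files inside the archive are invisible.
sys.path.insert(0, 'numpy.zip')

import numpy   # pure-Python modules load from the zip, then numpy.core
               # fails with "ImportError: cannot import name multiarray"

This is essentially what sc.addPyFile() does on each worker (ship the zip and put it on sys.path), which is why bundling numpy this way cannot work no matter how the zip is built.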
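Once numpy is actually installed on the nodes, it is worth verifying it from inside a task rather than on the driver, because a successful import on the driver says nothing about the workers. A rough sketch (the partition count is arbitrary, just large enough to very likely touch every worker):

from pyspark import SparkContext

sc = SparkContext()

def check_numpy(_):
    # This import runs on a worker, so it exercises the worker's
    # site-packages rather than the driver's.
    try:
        import numpy
        return [numpy.__version__]
    except ImportError as e:
        return [str(e)]

print sc.parallelize(range(1000), 100).mapPartitions(check_numpy).distinct().collect()

If every node is healthy this prints a single version string; any "cannot import name multiarray" or "No module named numpy" entries point at nodes that still need the reinstall.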