Dear all,

When I submit a pyspark application using this command:
./bin/spark-submit --master yarn-client examples/src/main/python/wordcount.py "hdfs://..."
I get the following exception:
Error from python worker:
  Traceback (most recent call last):
    File "/usr/ali/lib/python2.5/runpy.py", line 85, in run_module
      loader = get_loader(mod_name)
    File "/usr/ali/lib/python2.5/pkgutil.py", line 456, in get_loader
      return find_loader(fullname)
    File "/usr/ali/lib/python2.5/pkgutil.py", line 466, in find_loader
      for importer in iter_importers(fullname):
    File "/usr/ali/lib/python2.5/pkgutil.py", line 422, in iter_importers
      __import__(pkg)
  ImportError: No module named pyspark
PYTHONPATH was:
  /home/xxx/spark/python:/home/xxx/spark_on_yarn/python/lib/py4j-0.8.1-src.zip:/disk11/mapred/tmp/usercache/xxxx/filecache/11/spark-assembly-1.0.0-hadoop2.0.0-ydh2.0.0.jar
It looks like the `pyspark` modules and `py4j-0.8.1-src.zip` are not available on the
YARN workers. How can I distribute these files with my application? Can I use
`--py-files python.zip,py4j-0.8.1-src.zip`? Or how can I package the pyspark
modules into a .egg file?
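In case it helps frame the question, here is a sketch of what I am trying (paths are illustrative, and I am not sure `--py-files` is the right mechanism; it is supposed to add the listed .zip/.egg/.py files to the PYTHONPATH of the executors):

```shell
# Hypothetical location of the Spark installation; adjust to your setup.
SPARK_HOME=/home/xxx/spark

# Package the pyspark modules into a zip that YARN can ship to the workers.
(cd "$SPARK_HOME/python" && zip -qr /tmp/pyspark.zip pyspark)

# Submit with the archives listed after --py-files (comma-separated, no spaces).
"$SPARK_HOME/bin/spark-submit" --master yarn-client \
  --py-files /tmp/pyspark.zip,"$SPARK_HOME/python/lib/py4j-0.8.1-src.zip" \
  "$SPARK_HOME/examples/src/main/python/wordcount.py" "hdfs://..."
```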
