Hello,

2016-02-16 11:03 GMT+01:00 Mohannad Ali <man...@gmail.com>:
> Hello Everyone,
>
> I have the code in my project organized into packages and modules, but I
> keep getting the error "ImportError: No module named <package.module>"
> when I run Spark on YARN.
>
> My directory structure is something like this:
>
> project/
>     package/
>         module.py
>         __init__.py
>     bin/
>     docs/
>     setup.py
>     main_script.py
>     requirements.txt
>     tests/
>         package/
>             module_test.py
>             __init__.py
>         __init__.py
>
> So when I pass `main_script.py` to spark-submit with the master set to
> "yarn-client", the packages aren't found and I get the error above.
>
> With a code structure like this, adding every file as a py-file to the
> Spark context seems counterintuitive.
>
> I just want to organize my code as well as possible to make it more
> readable and maintainable. Is there a better way to achieve good code
> organization without running into such problems?

According to the docs [1], you should be able to zip your "project/" (or
"package/"?) directory and pass the zip file to spark-submit via --py-files.

Best,
Eike

[1] --py-files PY_FILES   Comma-separated list of .zip, .egg, or .py files
    to place on the PYTHONPATH for Python apps.
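As a rough, untested sketch of what that could look like (the zip-file name
is arbitrary, and it assumes the zip is built from inside project/ so that
package/ sits at the zip root):

    # build the zip so that "package/" is at the top level of the archive
    cd project/
    zip -r package.zip package/

    # ship the zip with the job; it is put on the PYTHONPATH of the executors
    spark-submit --master yarn-client --py-files package.zip main_script.py

With the package at the zip root, `import package.module` should then
resolve on the YARN executors as well as in the driver.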