Have you tried setting PYTHONPATH?

$ export PYTHONPATH="/path/to/project"
$ spark-submit --master yarn-client /path/to/project/main_script.py
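PYTHONPATH works because its entries are added to the module search path (`sys.path`) of every Python interpreter started with that variable in its environment. The sketch below reproduces this locally, assuming a throwaway stand-in for `project/package/module.py` (the `greet` function and the `make_project`/`try_import` helpers are hypothetical). It only shows the effect on processes that inherit the environment, such as the driver in yarn-client mode. Whether executors on the YARN nodes see the same path depends on how the cluster is set up.

```python
import os
import subprocess
import sys
import tempfile


def make_project(root):
    """Create a minimal stand-in for project/package/ (contents are hypothetical)."""
    pkg = os.path.join(root, "package")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "module.py"), "w") as f:
        f.write("def greet():\n    return 'hello from package.module'\n")


def try_import(pythonpath):
    """Start a fresh interpreter, optionally with PYTHONPATH set, and try the import."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # start from a clean slate
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    result = subprocess.run(
        [sys.executable, "-c", "import package.module as m; print(m.greet())"],
        env=env,
        # neutral working directory, so the cwd entry on sys.path cannot find the package
        cwd=tempfile.mkdtemp(),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


project_root = tempfile.mkdtemp()
make_project(project_root)

print(try_import(None))          # non-zero exit code: the import fails
print(try_import(project_root))  # (0, 'hello from package.module')
```

Pointing PYTHONPATH at the project root, not at `project/package/`, is what makes the dotted name `package.module` resolvable.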
Regards,
Ram

On 16 February 2016 at 15:33, Mohannad Ali <man...@gmail.com> wrote:
> Hello Everyone,
>
> I have code inside my project organized in packages and modules; however, I
> keep getting the error "ImportError: No module named <package.module>" when
> I run Spark on YARN.
>
> My directory structure is something like this:
>
> project/
>     package/
>         module.py
>         __init__.py
>     bin/
>     docs/
>     setup.py
>     main_script.py
>     requirements.txt
>     tests/
>         package/
>             module_test.py
>             __init__.py
>         __init__.py
>
> So when I pass `main_script.py` to spark-submit with master set to
> "yarn-client", the packages aren't found and I get the error above.
>
> With a code structure like this, adding every file to the Spark context
> as a py-file seems counterintuitive.
>
> I just want to organize my code as well as possible to make it more
> readable and maintainable. Is there a better way to achieve good code
> organization without running into such problems?
>
> Best Regards,
> Mo
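On the concern about adding every file as a py-file: spark-submit's `--py-files` option and `SparkContext.addPyFile` both accept a `.zip` archive, so a whole package can be shipped as a single file. A minimal sketch of building such an archive with the standard library is below, assuming the `project/package/` layout from the question; `zip_package` is a hypothetical helper, and the demo package contents are made up. The key detail is that paths inside the archive start with `package/`, so `import package.module` resolves once the zip is on the Python path (via Python's `zipimport`).

```python
import importlib
import os
import sys
import tempfile
import zipfile


def zip_package(project_root, package_name, dest_zip):
    """Zip project_root/package_name/ so the archive root contains package_name/."""
    package_dir = os.path.join(project_root, package_name)
    with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(package_dir):
            for name in filenames:
                if name.endswith(".pyc"):
                    continue  # skip compiled bytecode
                full_path = os.path.join(dirpath, name)
                # archive names relative to the project root keep the "package/" prefix
                zf.write(full_path, os.path.relpath(full_path, project_root))
    return dest_zip


# Hypothetical stand-in for project/package/ so the sketch runs on its own.
project_root = tempfile.mkdtemp()
pkg = os.path.join(project_root, "package")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "module.py"), "w") as f:
    f.write("VALUE = 42\n")

zip_path = zip_package(project_root, "package", os.path.join(project_root, "package.zip"))
with zipfile.ZipFile(zip_path) as zf:
    print(sorted(zf.namelist()))  # ['package/__init__.py', 'package/module.py']

# A zip archive on sys.path is importable through zipimport.
sys.path.insert(0, zip_path)
mod = importlib.import_module("package.module")
print(mod.VALUE)  # 42
```

The archive could then be passed once on the command line, for example `spark-submit --master yarn-client --py-files /path/to/project/package.zip /path/to/project/main_script.py`, instead of listing each module separately.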