Have you tried setting PYTHONPATH?
$ export PYTHONPATH="/path/to/project"
$ spark-submit --master yarn-client /path/to/project/main_script.py
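If that alone doesn't help: PYTHONPATH only reaches the driver process, and YARN executors don't inherit it, so another option is to ship the package itself as a zip via --py-files. A minimal sketch (the temp-dir layout here is made up to mirror your structure, and the spark-submit line is left as a comment since it needs a cluster):

```shell
# Sketch only: PYTHONPATH affects the driver, but YARN executors
# don't inherit it, so distribute the package itself with --py-files.
# The project layout created here is hypothetical.
set -e
PROJECT="$(mktemp -d)/project"
mkdir -p "$PROJECT/package"
touch "$PROJECT/package/__init__.py" "$PROJECT/package/module.py"

# Zip the package directory so executors can `import package.module`.
cd "$PROJECT"
python3 -m zipfile -c package.zip package/

# Then submit with the zip shipped to every executor (not run here):
# spark-submit --master yarn-client --py-files package.zip main_script.py
ls package.zip
```

The zip keeps the `package/` prefix in its entries, so imports resolve exactly as they do locally.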

Regards,
Ram


On 16 February 2016 at 15:33, Mohannad Ali <man...@gmail.com> wrote:

> Hello Everyone,
>
> My project's code is organized into packages and modules, but I keep
> getting the error "ImportError: No module named <package.module>" when
> I run Spark on YARN.
>
> My directory structure is something like this:
>
> project/
>     package/
>         module.py
>         __init__.py
>     bin/
>     docs/
>     setup.py
>     main_script.py
>     requirements.txt
>     tests/
>         package/
>             module_test.py
>             __init__.py
>         __init__.py
>
>
> So when I pass `main_script.py` to spark-submit with master set to
> "yarn-client", the packages aren't found and I get the error above.
>
> With a code structure like this, adding every file individually via
> --py-files to the Spark context seems counterintuitive.
>
> I just want to keep my code well organized so it stays readable and
> maintainable. Is there a better way to achieve good code organization
> without running into such problems?
>
> Best Regards,
> Mo
>
