Hello Everyone,

The code in my project is organized into packages and modules, but I keep
getting the error "ImportError: No module named <package.module>" when I
run Spark on YARN.
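
For reference, the failing import in the driver script is just an ordinary
package import, something like this (names are placeholders):

    # main_script.py: placeholder names, only to show the shape of the code
    from pyspark import SparkContext

    from package.module import transform  # raises ImportError on the YARN executors

    sc = SparkContext(appName="example")
    print(sc.parallelize(range(10)).map(transform).collect())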

My directory structure is something like this:

project/
    package/
        module.py
        __init__.py
    bin/
    docs/
    setup.py
    main_script.py
    requirements.txt
    tests/
        package/
            module_test.py
            __init__.py
        __init__.py


So when I pass `main_script.py` to spark-submit with master set to
"yarn-client", the packages aren't found and I get the error above.

With a code structure like this, adding everything as a py-file to the
Spark context seems counterintuitive.
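
As far as I can tell, the usual workaround is to zip the package and ship it
to the executors explicitly, roughly like this (the zip file and its build
step are hypothetical):

    # main_script.py: sketch of the addPyFile workaround I'd like to avoid
    from pyspark import SparkContext

    sc = SparkContext(appName="example")

    # package.zip would need to be rebuilt from project/package/ before each
    # submit (e.g. zip -r package.zip package); the path is a placeholder.
    sc.addPyFile("package.zip")

    # the import only works on the executors once the zip has been shipped
    from package.module import transform

or, equivalently, passing `--py-files package.zip` to spark-submit, and
doing that for every package in the project feels wrong.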

I just want to keep my code organized so that it stays readable and
maintainable. Is there a better way to achieve good code organization
without running into problems like this?

Best Regards,
Mo
