Hi,

We are evaluating PySpark and have successfully executed the PySpark examples on YARN.
As a next step, we have a Python project (a bunch of Python scripts using Anaconda packages).

Question: what is the recommended way to execute PySpark on YARN when the application consists of many Python files (~50)? Should they be packaged into an archive? What would the spark-submit command look like with that many files?

Currently the command looks like:

./bin/spark-submit --master yarn --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/src/main/python/wordcount.py 1000

Thanks,
Oleg
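For context, one approach we are considering (based on the spark-submit documentation, not yet verified by us) is to zip the project's modules and ship them to the executors with --py-files, keeping only the entry-point script as the main argument. A sketch, assuming a hypothetical project layout with a myproject/ package and a main.py entry point:

```shell
# Package the project's Python modules into an archive
# (the entry-point script itself is passed separately to spark-submit).
zip -r myproject.zip myproject/

# Submit on YARN, distributing the archive to the executors via --py-files
# so that "import myproject..." works inside tasks.
./bin/spark-submit \
  --master yarn \
  --num-executors 3 \
  --driver-memory 4g \
  --executor-memory 2g \
  --executor-cores 1 \
  --py-files myproject.zip \
  main.py 1000
```

Whether this also covers the Anaconda package dependencies (which would need to be present on the cluster nodes or shipped separately) is part of what we are asking about.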