I am trying to figure out Python library management.  So my question is:
Where do third-party Python libraries (e.g. numpy, scipy) need to exist
if I am running a Spark job via 'spark-submit' against my cluster in
'yarn-client' mode?  Do the libraries only need to exist on the client
(i.e. the server executing the driver code), or do they also need to
exist on the datanode/worker nodes where the tasks are executed?  The
documentation seems to indicate that under 'yarn-client' mode the
libraries are only needed on the client machine, not across the entire
cluster.  If the libraries are needed on all cluster machines, are there any
suggestions on a deployment strategy or dependency management model
that works well?
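
For context, if the libraries do need to reach the workers, the strategy I
had in mind was shipping them with the job, either via spark-submit's
--py-files option or from the driver itself.  This is just a sketch of what
I mean (the app name and deps.zip are placeholder names for an archive I
would build on the client), and I realize it probably only covers
pure-Python packages, not compiled ones like numpy/scipy:

    # sketch of driver code; deps.zip is a hypothetical zip of pure-Python
    # dependencies built on the client machine
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("deps-example")
    sc = SparkContext(conf=conf)

    # distributes deps.zip to every executor and puts it on their Python
    # path, similar to passing --py-files deps.zip to spark-submit
    sc.addPyFile("deps.zip")

    # after this, tasks should be able to import packages from deps.zip
    rdd = sc.parallelize(range(10))
    print(rdd.map(lambda x: x * 2).collect())

If that is not a reasonable model (e.g. because of compiled extensions),
I'd appreciate pointers on what people do instead.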

Thanks


