To run MLlib, you only need numpy on each node. For additional dependencies, you can call spark-submit with the --py-files option and pass a .zip or .egg file.
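For example (a minimal sketch; deps.zip and my_job.py are placeholder names, and the exact flags depend on your setup):

    # Ship pure-Python dependencies to the executors alongside the job
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --py-files deps.zip \
      my_job.py

Keep in mind that --py-files only covers pure-Python code; packages with compiled C extensions (such as numpy) generally still have to be installed on each node, which is why numpy needs to be present cluster-wide.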
https://spark.apache.org/docs/latest/submitting-applications.html

Cheers,
Christian

On Fri, Apr 24, 2015 at 1:56 AM, Hoai-Thu Vuong <thuv...@gmail.com> wrote:
> I use sudo pip install ... for each machine in the cluster, and don't
> think about how to submit the library.
>
> On Fri, Apr 24, 2015 at 4:21 AM dusts66 <dustin.davids...@gmail.com> wrote:
>>
>> I am trying to figure out Python library management, so my question is:
>> where do third-party Python libraries (e.g. numpy, scipy, etc.) need to
>> exist if I am running a Spark job via 'spark-submit' against my cluster
>> in 'yarn-client' mode? Do the libraries need to exist only on the client
>> (i.e. the server executing the driver code), or do they also need to
>> exist on the datanode/worker nodes where the tasks are executed? The
>> documentation seems to indicate that under 'yarn-client' mode the
>> libraries are needed only on the client machine, not across the entire
>> cluster. If the libraries are needed on all cluster machines, any
>> suggestions on a deployment strategy or dependency management model that
>> works well?
>>
>> Thanks
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-where-do-third-parties-libraries-need-to-be-installed-under-Yarn-client-mode-tp22639.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Christian Perez
Silicon Valley Data Science
Data Analyst
christ...@svds.com
@cp_phd