I run "sudo pip install ..." on each machine in the cluster, so I don't have to worry about submitting the libraries with the job.
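For example, a minimal sketch (assuming passwordless ssh to the workers and a hypothetical cluster_hosts.txt listing one hostname per line):

    # Install the needed packages on every node in the cluster.
    while read host; do
        ssh "$host" sudo pip install numpy scipy
    done < cluster_hosts.txt

For pure-Python dependencies (no compiled extensions), an alternative is to ship them with the job via --py-files, e.g. (deps.zip and my_job.py are placeholders for your zipped dependencies and driver script):

    spark-submit --master yarn-client --py-files deps.zip my_job.py

Note that --py-files does not work well for packages like numpy/scipy that contain compiled C extensions, which is why installing those on every node is the usual approach.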
On Fri, Apr 24, 2015 at 4:21 AM dusts66 <dustin.davids...@gmail.com> wrote:

> I am trying to figure out Python library management, so my question is:
> where do third-party Python libraries (e.g. numpy, scipy, etc.) need to
> exist if I am running a Spark job via 'spark-submit' against my cluster
> in 'yarn-client' mode? Do the libraries need to exist only on the client
> (i.e. the server executing the driver code), or do they also need to
> exist on the datanode/worker nodes where the tasks are executed? The
> documentation seems to indicate that under 'yarn-client' the libraries
> are only needed on the client machine, not across the entire cluster. If
> the libraries are needed on all cluster machines, any suggestions on a
> deployment strategy or dependency management model that works well?
>
> Thanks