I run "sudo pip install ..." on each machine in the cluster, so I don't have to worry about submitting the libraries with the job.
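For example, a minimal sketch (assuming passwordless ssh to the workers and a hypothetical cluster_hosts.txt listing one hostname per line):

    # Install the needed packages on every node in the cluster.
    while read host; do
        ssh "$host" sudo pip install numpy scipy
    done < cluster_hosts.txt

For pure-Python dependencies (no compiled extensions), an alternative is to ship them with the job via --py-files, e.g. (deps.zip and my_job.py are placeholders for your zipped dependencies and driver script):

    spark-submit --master yarn-client --py-files deps.zip my_job.py

Note that --py-files does not work well for packages like numpy/scipy that contain compiled C extensions, which is why installing those on every node is the usual approach.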
On Fri, Apr 24, 2015 at 4:21 AM dusts66 <dustin.davids...@gmail.com> wrote:

> I am trying to figure out Python library management, so my question is:
> where do third-party Python libraries (e.g. numpy, scipy, etc.) need to
> exist if I am running a Spark job via 'spark-submit' against my cluster
> in 'yarn-client' mode? Do the libraries need to exist only on the client
> (i.e. the server executing the driver code), or do they also need to
> exist on the datanode/worker nodes where the tasks are executed? The
> documentation seems to indicate that under 'yarn-client' the libraries
> are only needed on the client machine, not across the entire cluster. If
> the libraries are needed on all cluster machines, any suggestions on a
> deployment strategy or dependency management model that works well?
>
> Thanks