To run MLlib, you only need numpy on each node. For additional
dependencies, you can call spark-submit with the --py-files option and
add the .zip or .egg files.

https://spark.apache.org/docs/latest/submitting-applications.html
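
For example, something like this (deps.zip and my_job.py are
placeholder names for your packaged dependencies and driver script):

    spark-submit --master yarn --deploy-mode client \
      --py-files deps.zip \
      my_job.py

Keep in mind that --py-files only distributes pure-Python code;
packages with native extensions (like numpy or scipy) still need to be
installed on each node.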

Cheers,

Christian

On Fri, Apr 24, 2015 at 1:56 AM, Hoai-Thu Vuong <thuv...@gmail.com> wrote:
> I use sudo pip install ... on each machine in the cluster, and then I
> don't have to think about how to submit the libraries.
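>
> For example (hypothetical hostnames), something like:
>
>     for host in node01 node02 node03; do
>       ssh "$host" "sudo pip install numpy scipy"
>     done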
>
> On Fri, Apr 24, 2015 at 4:21 AM dusts66 <dustin.davids...@gmail.com> wrote:
>>
>> I am trying to figure out Python library management. So my question is:
>> where do third-party Python libraries (e.g. numpy, scipy, etc.) need to
>> exist if I am running a Spark job via 'spark-submit' against my cluster
>> in 'yarn client' mode? Do the libraries need to exist only on the client
>> (i.e. the server executing the driver code), or do they also need to
>> exist on the datanode/worker nodes where the tasks are executed? The
>> documentation seems to indicate that under 'yarn client' the libraries
>> are only needed on the client machine, not the entire cluster. If the
>> libraries are needed across all cluster machines, any suggestions on a
>> deployment strategy or dependency management model that works well?
>>
>> Thanks
>>
>



-- 
Christian Perez
Silicon Valley Data Science
Data Analyst
christ...@svds.com
@cp_phd
