I have been writing map-reduce jobs on Hadoop using Pig, and am now trying to
migrate to Spark.

My cluster consists of multiple nodes, and the jobs depend on a native
library (.so files).
In Hadoop and Pig I could distribute these files across the nodes using the
"-files" or "-archives" option, but I could not find any similar mechanism
for Spark.

Could someone please explain the best way to distribute dependent files
across the nodes?
I have seen SparkContext.addFile(), but it looks like this will copy large
files again for every job.
Moreover, I am not sure whether addFile() can automatically unpack archive files.
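
For context, my understanding of how addFile() would be used is roughly the
sketch below (the library name and path are just placeholders for my actual
.so file); please correct me if this is not the intended usage:

    import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

    object NativeLibShipping {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("native-lib-test"))

        // Ship the shared library from the driver to every executor.
        // "/path/to/libnative.so" is a placeholder for my real library.
        sc.addFile("/path/to/libnative.so")

        sc.parallelize(1 to 10).foreach { _ =>
          // On each worker, resolve the local copy that Spark downloaded
          val localPath = SparkFiles.get("libnative.so")
          System.load(localPath) // load the native library on that node
        }

        sc.stop()
      }
    }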

Thanks in advance.


