I have been writing MapReduce jobs on Hadoop using Pig, and am now trying to migrate to Spark.
My cluster consists of multiple nodes, and my jobs depend on a native library (.so files). In Hadoop and Pig, I could distribute the files across nodes using the "-files" or "-archives" option, but I could not find any similar mechanism in Spark.

Can someone please explain the best ways to distribute dependent files across nodes? I have seen SparkContext.addFile(), but it looks like this will copy large files again for every job. Moreover, I am not sure whether addFile() can automatically unpack archive files. Thanks in advance.
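For reference, this is roughly how I understand addFile() would be used, as a minimal Scala sketch (the library name, path, and the doubling computation are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

    object AddFileSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("addFile-sketch")
        val sc = new SparkContext(conf)

        // Ship the native library to every node; Spark copies it into
        // each executor's work directory when the job runs.
        sc.addFile("/path/on/driver/libnative.so") // hypothetical path

        val result = sc.parallelize(1 to 4).map { x =>
          // On a worker, resolve the local copy by its file name.
          val localPath = SparkFiles.get("libnative.so")
          System.load(localPath) // load the .so before calling into it
          x * 2 // stand-in for the actual native call
        }.collect()

        result.foreach(println)
        sc.stop()
      }
    }

If I read the API correctly, this ships the file with every application rather than caching it on the nodes, which is exactly the cost I am hoping to avoid for big files.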