I think with addJar() there is no caching, in the sense that files are copied every time, for each job. With the Hadoop distributed cache, by contrast, files are copied to a node only once, and a symlink to the cached file is created for subsequent runs: https://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/filecache/DistributedCache.html
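For comparison, here is a minimal sketch of the closest built-in Spark mechanism I'm aware of, SparkContext.addFile together with SparkFiles.get (the HDFS path is made up for illustration; this is a sketch, not a tested program):

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object AddFileSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("addFile-sketch"))

    // Ship a file to every executor for this application. Note it is
    // re-copied per application run, unlike Hadoop's DistributedCache,
    // which reuses a node-local copy across jobs.
    sc.addFile("hdfs:///data/lookup.txt")  // hypothetical path

    sc.parallelize(1 to 4).foreach { _ =>
      // SparkFiles.get resolves the executor-local path of the shipped file
      val localPath = SparkFiles.get("lookup.txt")
      println(localPath)
    }
    sc.stop()
  }
}
```

As far as I can tell, this covers distribution but not the cross-job caching or automatic archive extraction described below.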
Also, the Hadoop distributed cache can copy an archive file to the node and unzip it automatically into the current working directory. The advantage is that the copy is very fast. I am still looking for a similar mechanism in Spark.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Configuring-distributed-caching-with-Spark-and-YARN-tp1074p3566.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.