Hi Udit,
The persisted RDDs in memory are cleared by Spark using an LRU policy, and you
can also set the time after which persisted RDDs and metadata are cleared by
setting *spark.cleaner.ttl* (default: infinite). But I am not aware of any
property to clean older shuffle writes from disk.
thanks,
b
Thanks for the reply.
This will reduce the shuffle write to disk to an extent, but for a long
running job (multiple days), the shuffle write would still occupy a lot of
space on disk. Why do we need to keep the data from older map tasks in
memory?
On Tue, Mar 31, 2015 at 1:19 PM, Bijay Pathak
wrote:
The Spark sort-based shuffle (default since 1.1) keeps the data from
each map task in memory until it no longer fits, after which it is
sorted and spilled to disk. You can reduce the shuffle write to
disk by increasing spark.shuffle.memoryFraction (default 0.2).
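A configuration sketch of the suggestion above (0.4 is an example value; this setting exists in the Spark 1.x line this thread covers, and raising it leaves less of the heap for caching):

```
# spark-defaults.conf
# Example only: give the shuffle more of the heap before spilling to disk
# (default is 0.2; trade-off against spark.storage.memoryFraction).
spark.shuffle.memoryFraction  0.4
```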
I have noticed a similar issue when using Spark Streaming. The shuffle
write size grows very large (tens of GB) and then the app
crashes with:
java.io.FileNotFoundException:
/data/vol0/nodemanager/usercache/$user/appcache/application_1427480955913_0339/spark-local-20150330231234-db1a/0b/
Thanks Saisai. I will try your solution, but I still don't understand why
the filesystem should be used when there is plenty of memory available!
On Mon, Mar 30, 2015 at 11:22 AM, Saisai Shao
wrote:
Shuffle write will finally spill the data into the file system as a bunch of
files. If you want to avoid disk writes, you can mount a ramdisk and
point "spark.local.dir" at it. Shuffle output will then be written
to a memory-based FS and will not introduce disk IO.
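A sketch of the ramdisk setup described above (the mount point and size are example values; tmpfs contents are lost on reboot and count against system RAM, so size it below what the node can spare):

```
# One-time setup on each worker node (requires root; size is an example):
#   mkdir -p /mnt/spark-ramdisk
#   mount -t tmpfs -o size=16g tmpfs /mnt/spark-ramdisk

# spark-defaults.conf: point shuffle and spill files at the ramdisk
spark.local.dir  /mnt/spark-ramdisk
```

If the shuffle data outgrows the tmpfs, tasks will fail with no-space errors, so this trades robustness for IO speed.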
Thanks
Jerry
2015-03-30 17:15
Hi,
I was looking at the SparkUI Executors page, and I noticed that I have 597 MB
of "Shuffle Write" while I am using a cached temp-table and Spark had 2 GB of
free memory (the number under Memory Used is 597 MB / 2.6 GB)?!
Shouldn't the Shuffle Write be zero, with all map/reduce tasks done
in memory?