I have noticed a similar issue when using Spark Streaming. The shuffle write size grows to a large value (several GB) and then the app crashes with:

java.io.FileNotFoundException: /data/vol0/nodemanager/usercache/$user/appcache/application_1427480955913_0339/spark-local-20150330231234-db1a/0b/temp_shuffle_1b23808f-f285-40b2-bec7-1c6790050d7f (No such file or directory)
I don't understand why the shuffle size increases to such a large value for long-running jobs. (A minimal sketch of the spark.local.dir change Jerry suggests in the quoted thread is at the bottom of this mail.)

Thanks,
Udiy

On Mon, Mar 30, 2015 at 4:01 AM, shahab <shahab.mok...@gmail.com> wrote:

> Thanks Saisai. I will try your solution, but I still don't understand why
> the filesystem should be used when there is plenty of memory available!
>
>
> On Mon, Mar 30, 2015 at 11:22 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
>> Shuffle write will finally spill the data to the file system as a bunch of
>> files. If you want to avoid disk writes, you can mount a ramdisk and point
>> "spark.local.dir" at it. The shuffle output will then be written to a
>> memory-based FS and will not introduce disk IO.
>>
>> Thanks
>> Jerry
>>
>> 2015-03-30 17:15 GMT+08:00 shahab <shahab.mok...@gmail.com>:
>>
>>> Hi,
>>>
>>> I was looking at the SparkUI Executors tab and noticed that I have 597 MB
>>> of "Shuffle Write" while I am using a cached temp table and Spark has 2 GB
>>> of free memory (the number under Memory Used is 597 MB / 2.6 GB)?!
>>>
>>> Shouldn't Shuffle Write be zero, with all the map/reduce tasks done
>>> in memory?
>>>
>>> best,
>>>
>>> /Shahab
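
For reference, a minimal sketch of the spark.local.dir change Jerry describes above, assuming a tmpfs/ramdisk is already mounted at /mnt/ramdisk (the mount point and app name are just examples, not anything from this thread):

    // Point Spark's local scratch directory (shuffle spill and shuffle output
    // files) at the ramdisk. Must be set before the SparkContext is created.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("shuffle-on-ramdisk")
      .set("spark.local.dir", "/mnt/ramdisk")
    val sc = new SparkContext(conf)

The same setting can go in conf/spark-defaults.conf instead:

    spark.local.dir    /mnt/ramdisk

One caveat: on YARN (which the nodemanager path in the FileNotFoundException above suggests), spark.local.dir is overridden by the NodeManager's yarn.nodemanager.local-dirs, so the ramdisk would have to be configured there instead.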