My issue is that there is not enough pressure on the GC, so it does not kick in fast enough to delete the shuffle files from previous iterations.
Regards,
Keith.

http://keith-chapman.com

On Thu, Feb 22, 2018 at 6:58 PM, naresh Goud <nareshgoud.du...@gmail.com> wrote:

> It would be very difficult to tell without knowing what your
> application code is doing and what kind of transformations/actions it is
> performing. From my previous experience, tuning application code to avoid
> unnecessary objects reduces pressure on GC.
>
> On Thu, Feb 22, 2018 at 2:13 AM, Keith Chapman <keithgchap...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm benchmarking a Spark application by running it for multiple
>> iterations. It's a benchmark that's heavy on shuffle, and I run it on a
>> local machine with a very large heap (~200GB). The system has an SSD.
>> When running for 3 to 4 iterations I get into a situation where I run
>> out of disk space in the /tmp directory. On further investigation I was
>> able to figure out that the reason for this is that the shuffle files
>> are still around: because I have a very large heap, GC has not happened
>> and hence the shuffle files have not been deleted. I was able to confirm
>> this by lowering the heap size; I then see GC kicking in more often and
>> the size of /tmp staying under control. Is there any way I could
>> configure Spark to handle this issue?
>>
>> One option I have is to make GC run more often by
>> setting spark.cleaner.periodicGC.interval to a much lower value. Is
>> there a cleaner solution?
>>
>> Regards,
>> Keith.
>>
>> http://keith-chapman.com
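For reference, the periodic-GC workaround mentioned in the quoted message can be applied at submit time. This is only a sketch of the idea: the 5-minute interval, the scratch directory path, and the application jar name are placeholder values, not recommendations from the thread.

```shell
# spark.cleaner.periodicGC.interval makes the ContextCleaner trigger a
# JVM GC on this schedule (default: 30min), so weakly-referenced shuffle
# state is collected sooner and the shuffle files on disk get deleted.
# spark.local.dir moves shuffle spill files off /tmp to a larger volume.
spark-submit \
  --conf spark.cleaner.periodicGC.interval=5min \
  --conf spark.local.dir=/path/to/large/scratch/dir \
  my-benchmark.jar
```

Lowering the interval trades some GC pause overhead for more timely cleanup of /tmp; the right value depends on how quickly each iteration produces shuffle data.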