> that solved some problems

Is there any problem that was not solved by the tweak?
Thanks

On Thu, Mar 3, 2016 at 4:11 PM, Eugen Cepoi <cepoi.eu...@gmail.com> wrote:

> You can limit the amount of memory Spark will use for shuffle even in 1.6.
> You can do that by tweaking spark.memory.fraction and
> spark.storage.fraction. For example, if you want to have almost no shuffle
> cache, you can set storage.fraction to 1 or something close to it, leaving
> only a small place for the shuffle cache and using the rest for storage.
> And if you don't persist/broadcast data, then you can reduce the whole
> memory.fraction.
>
> Though I'm not sure how good it is to tweak those values, as it assumes
> Spark is mostly using the memory for caching. I have used similar tweaks
> in Spark 1.4 and tried them on Spark 1.6, and that solved some problems...
>
> Eugen
>
> 2016-03-03 15:59 GMT-08:00 Andy Dang <nam...@gmail.com>:
>
>> Spark's shuffling algorithm is very aggressive about storing everything
>> in RAM, and the behavior is worse in 1.6 with the UnifiedMemoryManager.
>> At least in previous versions you could limit the shuffle memory, but
>> Spark 1.6 will use as much memory as it can get. What I see is that Spark
>> seems to underestimate the amount of memory that objects take up, and
>> thus doesn't spill frequently enough. There's a dirty workaround (legacy
>> mode), but the common advice is to increase your parallelism (and keep in
>> mind that operations such as join have implicit parallelism, so you'll
>> want to be explicit about it).
>>
>> -------
>> Regards,
>> Andy
>>
>> On Mon, Feb 22, 2016 at 2:12 PM, Alex Dzhagriev <dzh...@gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> I'm using Spark 1.6 and trying to cache a dataset which is 1.5 TB. I
>>> have only ~800 GB of RAM in total, so I am choosing the DISK_ONLY
>>> storage level. Unfortunately, I'm hitting the overhead memory limit:
>>>
>>> Container killed by YARN for exceeding memory limits. 27.0 GB of 27 GB
>>> physical memory used. Consider boosting
>>> spark.yarn.executor.memoryOverhead.
>>>
>>> I'm giving 6 GB of overhead memory and using 10 cores per executor.
>>> Apparently, that's not enough. Without persisting the data, and instead
>>> computing the dataset (twice in my case), the job works fine. Can
>>> anyone please explain what the overhead is that consumes so much memory
>>> during persist to disk, and how I can estimate how much extra memory I
>>> should give the executors to make it not fail?
>>>
>>> Thanks, Alex.
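For reference, the knobs discussed in this thread can be sketched roughly as below. This is a minimal illustration, not a recommendation: the exact values are placeholders, and note that in Spark 1.6 the unified-memory property is spelled `spark.memory.storageFraction` (the messages above abbreviate it as `spark.storage.fraction`). `rdd1`, `rdd2`, and `dataset` are hypothetical stand-ins for real data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Sketch of the memory settings mentioned in the thread (Spark 1.6 names).
val conf = new SparkConf()
  .setAppName("memory-tuning-sketch")
  // Fraction of the heap shared by execution (shuffle) and storage.
  .set("spark.memory.fraction", "0.6")
  // Portion of that pool protected for cached blocks; shuffle spills
  // into whatever is left. Values near 1 leave little room for shuffle.
  .set("spark.memory.storageFraction", "0.5")
  // Off-heap cushion YARN allows beyond the executor heap, in MB
  // (the limit the original error message suggests boosting).
  .set("spark.yarn.executor.memoryOverhead", "6144")

val sc = new SparkContext(conf)

// Andy's advice: be explicit about join parallelism instead of relying
// on the implicit default, e.g.:
//   val joined = rdd1.join(rdd2, 2000)

// Alex's original setup: persist to disk only, no in-memory caching:
//   dataset.persist(StorageLevel.DISK_ONLY)
```

Whether tweaking the fractions helps depends, as Eugen notes, on how much of the executor heap is actually used for caching versus shuffle in the job at hand.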