bq. that 0.1 is "always" enough?

The answer is: it depends (on use cases). The value of 0.1 has been
validated by several users. I think it is a reasonable default.

Cheers

On Mon, Mar 2, 2015 at 8:36 AM, Ryan Williams
<ryan.blake.willi...@gmail.com> wrote:

> For reference, the initial version of #3525
> <https://github.com/apache/spark/pull/3525> (still open) made this
> fraction a configurable value, but consensus went against that being
> desirable, so I removed it and marked SPARK-4665
> <https://issues.apache.org/jira/browse/SPARK-4665> as "won't fix".
>
> My team wasted a lot of time on this failure mode as well and has settled
> on passing "--conf spark.yarn.executor.memoryOverhead=1024" to most jobs
> (that works out to 10-20% of --executor-memory, depending on the job).
>
> I agree that learning about this the hard way is a weak part of the
> Spark-on-YARN onboarding experience.
>
> The fact that our instinct here is to increase the 0.07 minimum instead
> of the alternate 384MB
> <https://github.com/apache/spark/blob/3efd8bb6cf139ce094ff631c7a9c1eb93fdcd566/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala#L93>
> minimum seems like evidence that the fraction is the thing we should let
> people configure, instead of the absolute amount that is currently
> configurable.
>
> Finally, do we feel confident that 0.1 is "always" enough?
>
> On Sat, Feb 28, 2015 at 4:51 PM, Corey Nolet <cjno...@gmail.com> wrote:
>
>> Thanks for taking this on, Ted!
>>
>> On Sat, Feb 28, 2015 at 4:17 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> I have created SPARK-6085 with pull request:
>>> https://github.com/apache/spark/pull/4836
>>>
>>> Cheers
>>>
>>> On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet <cjno...@gmail.com>
>>> wrote:
>>>
>>>> +1 to a better default as well.
>>>>
>>>> We were working fine until we ran against a real dataset that was much
>>>> larger than the test dataset we were using locally. It took me a
>>>> couple of days and digging through many logs to figure out this value
>>>> was what was causing the problem.
>>>>
>>>> On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Having a good out-of-box experience is desirable.
>>>>>
>>>>> +1 on increasing the default.
>>>>>
>>>>> On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen <so...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> There was a recent discussion about whether to increase, or indeed
>>>>>> make configurable, this kind of default fraction. I believe the
>>>>>> suggestion there too was that 9-10% is a safer default.
>>>>>>
>>>>>> Advanced users can lower the resulting overhead value; it may still
>>>>>> have to be increased in some cases, but a fatter default may make
>>>>>> this kind of surprise less frequent.
>>>>>>
>>>>>> I'd support increasing the default; any other thoughts?
>>>>>>
>>>>>> On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>> > hey,
>>>>>> > running my first map-red-like (meaning disk-to-disk, avoiding
>>>>>> > in-memory RDDs) computation in Spark on YARN, I immediately got
>>>>>> > bitten by a too-low spark.yarn.executor.memoryOverhead. however,
>>>>>> > it took me about an hour to find out this was the cause. at first
>>>>>> > I observed failing shuffles leading to restarting of tasks, then I
>>>>>> > realized this was because executors could not be reached, then I
>>>>>> > noticed that containers got shut down and reallocated in the
>>>>>> > resourcemanager logs (no mention of errors; it seemed the
>>>>>> > containers finished their business and shut down successfully),
>>>>>> > and finally I found the reason in the nodemanager logs.
>>>>>> >
>>>>>> > I don't think this is a pleasant first experience. I realize
>>>>>> > spark.yarn.executor.memoryOverhead needs to be set differently
>>>>>> > from situation to situation, but shouldn't the default be a
>>>>>> > somewhat higher value, so that these errors are unlikely, and then
>>>>>> > the experts who are willing to deal with these errors can tune it
>>>>>> > lower? so why not make the default 10% instead of 7%? that gives
>>>>>> > something that works in most situations out of the box (at the
>>>>>> > cost of being a little wasteful). it worked for me.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
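The calculation the thread is debating (overhead = a fraction of executor memory, with a 384 MB floor, per the YarnSparkHadoopUtil link above) can be sketched as below. This is an illustrative standalone sketch, not the actual Spark source; the object and method names are made up, and only the 0.07/0.10 fractions and the 384 MB minimum come from the discussion:

```scala
// Sketch of the executor memory overhead formula discussed in this thread.
// Assumptions: fraction and floor values are taken from the emails above;
// names like MemoryOverhead and overhead() are hypothetical.
object MemoryOverhead {
  val MemoryOverheadMinMb = 384 // the alternate 384MB minimum Ryan mentions

  // overhead = max(fraction * executorMemory, 384 MB)
  def overhead(executorMemoryMb: Int, fraction: Double): Int =
    math.max((fraction * executorMemoryMb).toInt, MemoryOverheadMinMb)

  def main(args: Array[String]): Unit = {
    // Old 7% default: a 10 GB executor gets ~716 MB of headroom.
    println(overhead(10240, 0.07)) // 716
    // Proposed 10% default: the same executor gets 1024 MB, matching the
    // value Ryan's team passes explicitly via --conf.
    println(overhead(10240, 0.10)) // 1024
    // Small executors hit the 384 MB floor under either fraction.
    println(overhead(2048, 0.07)) // 384
  }
}
```

This makes the trade-off concrete: raising the fraction from 0.07 to 0.10 only changes the result for executors large enough to clear the 384 MB floor, which is why the thread focuses on the fraction rather than the minimum.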