bq. that 0.1 is "always" enough?

The answer is: it depends (on use cases). The value of 0.1 has been
validated by several users. I think it is a reasonable default.

Cheers

On Mon, Mar 2, 2015 at 8:36 AM, Ryan Williams
<ryan.blake.willi...@gmail.com> wrote:

> For reference, the initial version of #3525
> <https://github.com/apache/spark/pull/3525> (still open) made this
> fraction a configurable value, but consensus went against that being
> desirable, so I removed it and marked SPARK-4665
> <https://issues.apache.org/jira/browse/SPARK-4665> as "won't fix".
>
> My team wasted a lot of time on this failure mode as well and has settled
> on passing "--conf spark.yarn.executor.memoryOverhead=1024" to most jobs
> (that works out to 10-20% of --executor-memory, depending on the job).
>
> I agree that learning about this the hard way is a weak part of the
> Spark-on-YARN onboarding experience.
>
> The fact that our instinct here is to increase the 0.07 minimum instead
> of the alternate 384MB
> <https://github.com/apache/spark/blob/3efd8bb6cf139ce094ff631c7a9c1eb93fdcd566/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala#L93>
> minimum seems like evidence that the fraction is the thing we should let
> people configure, instead of the absolute amount that is currently
> configurable.
>
> Finally, do we feel confident that 0.1 is "always" enough?
>
> On Sat, Feb 28, 2015 at 4:51 PM, Corey Nolet <cjno...@gmail.com> wrote:
>
>> Thanks for taking this on, Ted!
>>
>> On Sat, Feb 28, 2015 at 4:17 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> I have created SPARK-6085 with pull request:
>>> https://github.com/apache/spark/pull/4836
>>>
>>> Cheers
>>>
>>> On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet <cjno...@gmail.com>
>>> wrote:
>>>
>>>> +1 to a better default as well.
>>>>
>>>> We were working fine until we ran against a real dataset that was much
>>>> larger than the test dataset we were using locally. It took me a
>>>> couple of days and digging through many logs to figure out this value
>>>> was what was causing the problem.
>>>>
>>>> On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Having a good out-of-box experience is desirable.
>>>>>
>>>>> +1 on increasing the default.
>>>>>
>>>>> On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen <so...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> There was a recent discussion about whether to increase, or indeed
>>>>>> make configurable, this kind of default fraction. I believe the
>>>>>> suggestion there too was that 9-10% is a safer default.
>>>>>>
>>>>>> Advanced users can lower the resulting overhead value; it may still
>>>>>> have to be increased in some cases, but a fatter default may make
>>>>>> this kind of surprise less frequent.
>>>>>>
>>>>>> I'd support increasing the default; any other thoughts?
>>>>>>
>>>>>> On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>> > hey,
>>>>>> > running my first map-red-like (meaning disk-to-disk, avoiding
>>>>>> > in-memory RDDs) computation in Spark on YARN, I immediately got
>>>>>> > bitten by a too-low spark.yarn.executor.memoryOverhead. however,
>>>>>> > it took me about an hour to find out this was the cause. at first
>>>>>> > I observed failing shuffles leading to restarting of tasks, then I
>>>>>> > realized this was because executors could not be reached, then I
>>>>>> > noticed that containers got shut down and reallocated in the
>>>>>> > resourcemanager logs (no mention of errors; it seemed the
>>>>>> > containers finished their business and shut down successfully),
>>>>>> > and finally I found the reason in the nodemanager logs.
>>>>>> >
>>>>>> > I don't think this is a pleasant first experience. I realize
>>>>>> > spark.yarn.executor.memoryOverhead needs to be set differently
>>>>>> > from situation to situation, but shouldn't the default be a
>>>>>> > somewhat higher value, so that these errors are unlikely, and then
>>>>>> > the experts who are willing to deal with these errors can tune it
>>>>>> > lower? so why not make the default 10% instead of 7%? that gives
>>>>>> > something that works in most situations out of the box (at the
>>>>>> > cost of being a little wasteful). it worked for me.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
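The calculation the thread is debating (overhead = a fraction of executor memory, with a 384 MB floor, per the YarnSparkHadoopUtil link above) can be sketched as below. This is an illustrative standalone sketch, not the actual Spark source; the object and method names are made up, and only the 0.07/0.10 fractions and the 384 MB minimum come from the discussion:

```scala
// Sketch of the executor memory overhead formula discussed in this thread.
// Assumptions: fraction and floor values are taken from the emails above;
// names like MemoryOverhead and overhead() are hypothetical.
object MemoryOverhead {
  val MemoryOverheadMinMb = 384 // the alternate 384MB minimum Ryan mentions

  // overhead = max(fraction * executorMemory, 384 MB)
  def overhead(executorMemoryMb: Int, fraction: Double): Int =
    math.max((fraction * executorMemoryMb).toInt, MemoryOverheadMinMb)

  def main(args: Array[String]): Unit = {
    // Old 7% default: a 10 GB executor gets ~716 MB of headroom.
    println(overhead(10240, 0.07)) // 716
    // Proposed 10% default: the same executor gets 1024 MB, matching the
    // value Ryan's team passes explicitly via --conf.
    println(overhead(10240, 0.10)) // 1024
    // Small executors hit the 384 MB floor under either fraction.
    println(overhead(2048, 0.07)) // 384
  }
}
```

This makes the trade-off concrete: raising the fraction from 0.07 to 0.10 only changes the result for executors large enough to clear the 384 MB floor, which is why the thread focuses on the fraction rather than the minimum.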