Thanks for taking this on Ted!
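For anyone else who hits this before any new default lands: you can already raise the
overhead yourself at submit time. Something along these lines (the 8g and 1024 here are
purely illustrative; the value is in megabytes and is added on top of the executor heap
when YARN sizes the container):

    spark-submit \
      --executor-memory 8g \
      --conf spark.yarn.executor.memoryOverhead=1024 \
      <your usual arguments>

That at least turns the failure mode described below into a one-line change while the
default gets sorted out.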

On Sat, Feb 28, 2015 at 4:17 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> I have created SPARK-6085 with pull request:
> https://github.com/apache/spark/pull/4836
>
> Cheers
>
> On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet <cjno...@gmail.com> wrote:
>
>> +1 to a better default as well.
>>
>> We were working fine until we ran against a real dataset that was much
>> larger than the test dataset we had been using locally. It took me a couple
>> of days of digging through many logs to figure out that this value was what
>> was causing the problem.
>>
>> On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Having a good out-of-the-box experience is desirable.
>>>
>>> +1 on increasing the default.
>>>
>>>
>>> On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> There was a recent discussion about whether to increase this kind of
>>>> default fraction, or indeed make it configurable. I believe the suggestion
>>>> there too was that 9-10% is a safer default.
>>>>
>>>> Advanced users can lower the resulting overhead value; it may still
>>>> have to be increased in some cases, but a fatter default may make this
>>>> kind of surprise less frequent.
>>>>
>>>> I'd support increasing the default; any other thoughts?
>>>>
>>>> On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers <ko...@tresata.com>
>>>> wrote:
>>>> > hey,
>>>> > running my first map-red-like (meaning disk-to-disk, avoiding in-memory
>>>> > RDDs) computation in Spark on YARN, I immediately got bitten by a too-low
>>>> > spark.yarn.executor.memoryOverhead. However, it took me about an hour to
>>>> > find out this was the cause. At first I observed failing shuffles leading
>>>> > to restarted tasks, then I realized this was because executors could not
>>>> > be reached, then I noticed in the resourcemanager logs that containers
>>>> > got shut down and reallocated (no mention of errors; it seemed the
>>>> > containers finished their business and shut down successfully), and
>>>> > finally I found the reason in the nodemanager logs.
>>>> >
>>>> > I don't think this is a pleasant first experience. I realize
>>>> > spark.yarn.executor.memoryOverhead needs to be set differently from
>>>> > situation to situation, but shouldn't the default be a somewhat higher
>>>> > value, so that these errors are unlikely, and the experts who are willing
>>>> > to deal with them can tune it lower? So why not make the default 10%
>>>> > instead of 7%? That gives something that works in most situations out of
>>>> > the box (at the cost of being a little wasteful). It worked for me.
>>>>
>>>
>>
>

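To put rough numbers on the 7% vs 10% question above (back-of-the-envelope only, and
assuming the current rule is roughly max(384MB, fraction * executor memory), which is
what I remember from the YARN code): with an 8g executor, 7% works out to
max(384, 0.07 * 8192) ≈ 573MB of overhead, so the YARN container is sized at about
8192 + 573 = 8765MB; at 10% it would be max(384, 0.10 * 8192) ≈ 819MB, or about
9011MB total. An extra ~250MB per executor seems a reasonable price for not having
the NodeManager kill containers.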