hey,
running my first map-reduce-like (meaning disk-to-disk, avoiding in-memory
RDDs) computation in spark on yarn, i immediately got bitten by a too-low
spark.yarn.executor.memoryOverhead. however, it took me about an hour to
find out this was the cause. at first i observed failing shuffles leading
to restarted tasks, then i realized this was because executors could not
be reached, then i noticed in the resourcemanager logs that containers got
shut down and reallocated (no mention of errors; it looked like the
containers finished their business and shut down successfully), and
finally i found the real reason in the nodemanager logs.

i don't think this is a pleasant first experience. i realize
spark.yarn.executor.memoryOverhead needs to be set differently from
situation to situation. but shouldn't the default be a somewhat higher
value, so that these errors are unlikely and experts who are willing to
deal with them can tune it lower? why not make the default 10% instead of
7%? that gives something that works in most situations out of the box (at
the cost of being a little wasteful). it worked for me.
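
for the record, the workaround itself is just a one-line config override.
something along these lines is all it takes (the app name and memory sizes
below are only examples, and the ~573 MB figure assumes the 7% formula
max(384, 0.07 * executor memory)):

  import org.apache.spark.{SparkConf, SparkContext}

  // with 8g executors the 7% default gives max(384, 0.07 * 8192) ~= 573 MB of overhead;
  // bumping it to 1024 MB (roughly the 10% ballpark) leaves the executor JVM's off-heap
  // usage enough headroom so yarn no longer kills the containers
  val conf = new SparkConf()
    .setAppName("example-disk-to-disk-job")            // example name
    .set("spark.executor.memory", "8g")                // example size
    .set("spark.yarn.executor.memoryOverhead", "1024") // in MB
  val sc = new SparkContext(conf)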
