Hi Folks,

We have a Spark job that occasionally runs out of memory and hangs (I believe in GC). That's its own issue, which we're debugging, but in the meantime it has an unfortunate side effect. When the job is killed (most often because of GC errors), each worker attempts to kill its respective executor, but several of the executors fail to shut themselves down (I actually have to kill -9 them). Even though the worker fails to clean up the executor, it starts the next job as though all of the resources had been freed. This causes the Spark worker to exceed its configured memory limit, which in turn runs our boxes out of memory.

Is there a setting I can configure to prevent this? Perhaps one that makes the worker force-kill the executor, or hold off on starting the next job until it has confirmed the executor has exited? In the meantime we're considering a stopgap along the lines of the sketch below. Let me know if there's any additional information I can provide.
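For what it's worth, the stopgap we've been sketching just automates the kill -9 on each box. Roughly the Python below (a rough sketch, not battle-tested: it assumes standalone mode, where executors run as CoarseGrainedExecutorBackend JVMs, and it kills every executor JVM on the host, so we'd only run it when the worker should be idle):

  #!/usr/bin/env python
  # Stopgap: find executor JVMs that are still alive after the worker
  # thinks it has killed them, and force-kill them. Intended to be run
  # periodically (e.g. from cron) on each worker box while the worker
  # should be idle -- it does NOT distinguish healthy executors.
  import os
  import signal
  import subprocess

  # Standalone-mode executors run this main class; adjust if your
  # deployment launches executors differently.
  EXECUTOR_CLASS = "CoarseGrainedExecutorBackend"

  def lingering_executor_pids():
      """Return PIDs of processes running the executor backend."""
      out = subprocess.check_output(["ps", "-eo", "pid,args"])
      pids = []
      for line in out.decode().splitlines()[1:]:  # skip the ps header
          pid, _, args = line.strip().partition(" ")
          if EXECUTOR_CLASS in args:
              pids.append(int(pid))
      return pids

  if __name__ == "__main__":
      for pid in lingering_executor_pids():
          print("force-killing lingering executor pid %d" % pid)
          os.kill(pid, signal.SIGKILL)

Obviously we'd much rather have the worker itself do this (or wait for the executor to exit) than run an external reaper, hence the question above.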
Keith

P.S. We're running Spark 1.0.2.