Hi Folks,

We have a Spark job that occasionally runs out of memory and hangs (I believe in GC). That's its own issue, which we're debugging, but in the meantime it has an unfortunate side effect. When the job is killed (most often because of GC errors), each worker attempts to kill its respective executor, but several of the executors fail to shut themselves down (I actually have to kill -9 them). Even though the worker fails to clean up the executor, it starts the next job as though all of the resources had been freed. This causes the Spark worker to exceed its configured memory limit, which in turn runs our boxes out of memory.

Is there a setting I can configure to prevent this? Perhaps one that makes the worker force-kill the executor, or hold off on starting the next job until it has confirmed the executor has exited? In the meantime we're considering a stopgap along the lines of the sketch below. Let me know if there's any additional information I can provide.
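For what it's worth, the stopgap we've been sketching just automates the kill -9 on each box. Roughly the Python below (a rough sketch, not battle-tested: it assumes standalone mode, where executors run as CoarseGrainedExecutorBackend JVMs, and it kills every executor JVM on the host, so we'd only run it when the worker should be idle):

  #!/usr/bin/env python
  # Stopgap: find executor JVMs that are still alive after the worker
  # thinks it has killed them, and force-kill them. Intended to be run
  # periodically (e.g. from cron) on each worker box while the worker
  # should be idle -- it does NOT distinguish healthy executors.
  import os
  import signal
  import subprocess

  # Standalone-mode executors run this main class; adjust if your
  # deployment launches executors differently.
  EXECUTOR_CLASS = "CoarseGrainedExecutorBackend"

  def lingering_executor_pids():
      """Return PIDs of processes running the executor backend."""
      out = subprocess.check_output(["ps", "-eo", "pid,args"])
      pids = []
      for line in out.decode().splitlines()[1:]:  # skip the ps header
          pid, _, args = line.strip().partition(" ")
          if EXECUTOR_CLASS in args:
              pids.append(int(pid))
      return pids

  if __name__ == "__main__":
      for pid in lingering_executor_pids():
          print("force-killing lingering executor pid %d" % pid)
          os.kill(pid, signal.SIGKILL)

Obviously we'd much rather have the worker itself do this (or wait for the executor to exit) than run an external reaper, hence the question above.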
Keith

P.S. We're running Spark 1.0.2.