On the most recent several releases of EMR, Spark dynamicAllocation is enabled by default. This allows longer-running apps like Zeppelin's Spark interpreter to keep running in the background without holding on to executor resources unless Spark jobs are actively running.
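As a concrete illustration, explicitly enabling dynamicAllocation through EMR's spark-defaults configuration classification would look roughly like this (a sketch of the standard EMR configurations JSON, which you could pass at cluster creation, e.g. via `aws emr create-cluster --configurations file://config.json`):

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "true"
    }
  }
]
```

(For reference, the maximizeResourceAllocation setting itself lives under the separate "spark" classification rather than "spark-defaults".)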
However, if you are seeing resources still being used even after some idle time, you may be using maximizeResourceAllocation, which makes any Spark job use 100% of the cluster's resources, with one executor per slave node. maximizeResourceAllocation effectively disables dynamicAllocation, because it causes spark.executor.instances to be set.

If you still want to use dynamicAllocation along with maximizeResourceAllocation, just set spark.dynamicAllocation.enabled to true in the spark-defaults configuration classification. This signals the maximizeResourceAllocation feature not to set spark.executor.instances, so that dynamicAllocation will be used.

Keep in mind that this may not be the ideal way to use dynamicAllocation, though (especially if you don't have many nodes in the cluster), because maximizeResourceAllocation makes the executors very coarse-grained, since there is only one per node. It would still allow multiple applications to run at once, because executors from one application can spin down when idle, allowing another application to spin up executors.

Hope this helps,
Jonathan

On Mon, Oct 3, 2016 at 5:38 PM Jung, Soonoh <soonoh.j...@gmail.com> wrote:
> Hi everyone,
>
> I am using Zeppelin in AWS EMR (Zeppelin 0.6.1, Spark 2.0 on YARN RM).
> Basically, the Zeppelin Spark interpreter's Spark job is not finishing
> after executing a notebook.
> It looks like the Spark job is still occupying a lot of memory in my
> YARN cluster.
> Is there a way to restart the Spark interpreter automatically (or
> programmatically) every time I run a notebook, in order to release that
> memory in my YARN cluster?
>
> Regards,
> Soonoh