On recent releases of EMR, Spark dynamicAllocation is enabled by default.
This allows long-running applications like Zeppelin's Spark interpreter to
stay up in the background without holding on to executor resources when no
Spark jobs are actively running.

However, if you are seeing resources still in use even after some idle
time, you may be using maximizeResourceAllocation (which makes any Spark
job use 100% of the cluster's resources, with one executor per slave
node). maximizeResourceAllocation effectively disables dynamicAllocation
because it sets spark.executor.instances explicitly. If you still want to
use dynamicAllocation along with maximizeResourceAllocation, set
spark.dynamicAllocation.enabled to true in the spark-defaults configuration
classification. This signals the maximizeResourceAllocation feature not to
set spark.executor.instances, so dynamicAllocation will be used.
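
As a rough sketch, the configurations passed at cluster creation might
look like the following (the "spark" and "spark-defaults" classification
names and the two properties shown are the standard EMR ones; everything
else about your cluster setup will of course differ):

```json
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "true"
    }
  }
]
```

With both present, executors are still sized to fill each node, but their
count is managed by dynamic allocation rather than pinned by
spark.executor.instances.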

Keep in mind that this might not be the ideal way to use
dynamicAllocation, though (especially if you don't have many nodes in the
cluster), because maximizeResourceAllocation makes the executors very
coarse-grained: there is only one per node. It would still allow multiple
applications to run at once, though, because executors from one
application can spin down when idle, freeing room for another application
to spin up executors.

Hope this helps,
Jonathan
On Mon, Oct 3, 2016 at 5:38 PM Jung, Soonoh <soonoh.j...@gmail.com> wrote:

> Hi everyone,
>
> I am using Zeppelin in AWS EMR (Zeppelin 0.6.1, spark 2.0 on Yarn RM)
> Basically Zeppelin spark interpreter's spark job is not finishing after
> executing a notebook.
> It looks like the spark job still occupying memory a lot in my Yarn
> cluster.
> Is there a way to restart the spark interpreter automatically (or
> programmatically) every time I run a notebook, in order to release that
> memory in my Yarn cluster?
>
> Regards,
> Soonoh
>
