Hi Josh,

I see. I haven't heard of folks using a JVM heap size larger than the one you mentioned (30GB), but in your scenario what you're proposing does make sense.

I've created SPARK-5095 and we can continue our discussion there about how to address this.

Tim
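A minimal sketch of the coarse-grained Mesos configuration surface under discussion, in Scala. The first three properties are real Spark settings; the commented-out one is hypothetical, the kind of knob SPARK-5095 asks for, and the ZooKeeper address is a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://zk://zkhost:2181/mesos") // placeholder ZK address
      .set("spark.mesos.coarse", "true")    // one long-lived executor per slave
      .set("spark.cores.max", "256")        // total cores for the job, cluster-wide
      .set("spark.executor.memory", "30g")  // heap per executor (the 30GB ceiling)
      // Hypothetical, NOT a real property -- the missing knob this thread is about:
      // .set("spark.mesos.executors.max", "16")

    val sc = new SparkContext(conf)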
On Mon, Jan 5, 2015 at 1:22 AM, Josh Devins <j...@soundcloud.com> wrote:

> Hey Tim, sorry for the delayed reply, been on vacation for a couple of weeks.
>
> The reason we want to control the number of executors is that running
> executors with JVM heaps over 30GB causes significant garbage
> collection problems. We have observed this through much trial and error
> for jobs with a dozen or so stages that run for more than ~20 minutes.
> For example, if we run 8 executors with a 60GB heap each (we have also
> tried other values larger than 30GB), even after much tuning of heap
> parameters (% for RDD cache, etc.) we run into GC problems. GC overhead
> effectively becomes so high that it takes over all compute time from
> the JVM. If we then halve the heap (30GB) but double the number of
> executors (16), all GC problems are relieved and we get to use the full
> memory resources of the cluster. We talked with some engineers from
> Databricks at Strata in Barcelona recently and received the same
> advice: do not run executors with heaps larger than 30GB. Since our
> machines have 64GB of memory and we typically run only one or two jobs
> at a time on the cluster (for now), we can only use half the cluster
> memory with the current configuration options available in Mesos.
>
> Happy to hear your thoughts. I'm actually very curious how others are
> running Spark on Mesos with large heaps (as a result of large-memory
> machines). Perhaps this is a non-issue once we have more multi-tenancy
> in the cluster, but for now, this is not the case.
>
> Thanks,
>
> Josh
>
>
> On 24 December 2014 at 06:22, Tim Chen <t...@mesosphere.io> wrote:
> >
> > Hi Josh,
> >
> > If you cap the amount of memory per executor in coarse-grained mode,
> > then yes, you only get 240GB of memory as you mentioned. What's the
> > reason you don't want to raise the amount of memory you use per
> > executor?
> >
> > In coarse-grained mode the Spark executor is long-lived and internally
> > receives tasks distributed by Spark's coarse-grained scheduler. I
> > think the assumption is that it has already allocated the maximum
> > available on that slave, so we don't really assume we need another one.
> >
> > I think it's worth considering a configuration for the number of cores
> > per executor, especially once Mesos has inverse offers and optimistic
> > offers: we could then choose to launch more executors when resources
> > become available, even in coarse-grained mode, and support giving the
> > executors back when higher-priority tasks arrive.
> >
> > In fine-grained mode, the Spark executors are started by Mesos
> > executors configured from the Mesos scheduler backend. I believe RDDs
> > stay cached as long as the Mesos executor is running, since the
> > BlockManager is created on executor registration.
> >
> > Let me know if you need any more info.
> >
> > Tim
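To make Tim's caching point concrete, a small sketch assuming `sc` is a SparkContext like the one above; the HDFS path and parsing are placeholders. A cached RDD lives in each executor's BlockManager, so it survives only as long as that executor does:

    val events = sc.textFile("hdfs:///data/events")  // placeholder path
      .map(_.split("\t"))
      .cache()             // partitions stored in each executor's BlockManager

    events.count()         // first action materializes the cache
    events.filter(_.length > 1).count()  // reuses cached partitions only if
                                         // the executors (and caches) survived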
> >
> >> ---------- Forwarded message ----------
> >> From: Josh Devins <j...@soundcloud.com>
> >> Date: 22 December 2014 at 17:23
> >> Subject: Mesos resource allocation
> >> To: user@spark.apache.org
> >>
> >>
> >> We are experimenting with running Spark on Mesos after running
> >> successfully in Standalone mode for a few months. With the Standalone
> >> resource manager (as well as YARN), you have the option to define the
> >> number of cores, the number of executors, and memory per executor. In
> >> Mesos, however, it appears that you cannot specify the number of
> >> executors, even in coarse-grained mode. If this is the case, how do
> >> you control the number of executors a job runs with?
> >>
> >> Here's an example of why this matters (to us). Let's say we have the
> >> following cluster:
> >>
> >> num nodes: 8
> >> num cores: 256 (32 per node)
> >> total memory: 512GB (64GB per node)
> >>
> >> If I set my job to require 256 cores and the per-executor memory to
> >> 30GB, then Mesos will schedule a single executor per machine (8
> >> executors total) and each executor will get 32 cores to work with.
> >> This means we have 8 executors * 30GB each, for a total of 240GB of
> >> cluster memory in use, less than half of what is available. If you
> >> actually want 16 executors in order to increase the amount of memory
> >> in use across the cluster, how can you do this with Mesos? It seems
> >> that a parameter is missing (or I haven't found it yet) which would
> >> let me tune this for Mesos:
> >> * number of executors per n cores, OR
> >> * total number of executors
> >>
> >> Furthermore, in fine-grained mode in Mesos, how are the executors
> >> started/allocated? That is, since Spark tasks map to Mesos tasks, when
> >> and how are executors started? If they are transient and an executor
> >> is created per task, does this mean we cannot have cached RDDs?
> >>
> >> Thanks for any advice or pointers,
> >>
> >> Josh
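For concreteness, the arithmetic behind Josh's example, with all numbers taken from the thread:

    val nodes        = 8
    val coresPerNode = 32
    val memPerNodeGb = 64
    val heapGb       = 30                   // GC-safe heap ceiling per executor

    val totalGb    = nodes * memPerNodeGb   // 512 GB in the cluster
    val usedWith8  = 8  * heapGb            // 240 GB -> ~47% of cluster memory
    val usedWith16 = 16 * heapGb            // 480 GB -> ~94%, but requires
                                            // two 30GB executors per node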