Here's a specific example of what I want to do. My Spark application is
running with total-executor-cores=8. A request comes in, the application
spawns a thread to handle that request, and that thread starts a job. When
that job is scheduled, it should take only 4 cores, not all 8 of the cores
available to the driver.

Is there any way to accomplish this? This is on Mesos.
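
For concreteness, the pattern I have in mind looks roughly like this (just
a sketch; the object and method names, the pool name, and the job itself
are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object RequestServer {
      // one long-lived SparkContext shared by all request-handling threads
      val sc = new SparkContext(new SparkConf().setAppName("shared-app"))

      // called once per incoming request, each on its own thread
      def handleRequest(path: String): Unit = new Thread(new Runnable {
        override def run(): Unit = {
          // per-thread property from the job-scheduling docs; the pool
          // name "requests" is just an example
          sc.setLocalProperty("spark.scheduler.pool", "requests")
          // this is the job that should take only 4 of the 8 cores
          val count = sc.textFile(path).count()
          println(s"$path -> $count")
        }
      }).start()
    }

Pools let me separate the jobs, but as far as I can tell they don't let me
cap how many cores a single job can take.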

In order to support the use cases described in
https://spark.apache.org/docs/latest/job-scheduling.html, where a Spark
application runs for a long time and handles requests from multiple users,
I believe what I'm asking about is a very important feature. One of the
goals is lower latency for each request, but if the first request takes all
of the resources and we can't guarantee any free resources for the second
request, that defeats the purpose. Does that make sense?

Thanks in advance for any advice you can provide!

- Philip

On Sat, Sep 12, 2015 at 10:40 PM, Philip Weaver <philip.wea...@gmail.com>
wrote:

> I'm playing around with dynamic allocation in spark-1.5.0, with the FAIR
> scheduler, so I can define a long-running application capable of executing
> multiple simultaneous Spark jobs.
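>
> The relevant configuration looks roughly like this (sketch only; the app
> name is a placeholder, and the shuffle service setting is there because
> dynamic allocation requires it):
>
>     import org.apache.spark.SparkConf
>
>     val conf = new SparkConf()
>       .setAppName("long-running-app")
>       .set("spark.dynamicAllocation.enabled", "true")
>       .set("spark.shuffle.service.enabled", "true")
>       .set("spark.scheduler.mode", "FAIR")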
>
> The kinds of jobs I'm running don't benefit from more than 4 cores each,
> but I want my application to be able to take several times that, so that
> it can run multiple jobs at the same time.
>
> I suppose my question is more basic: How can I limit the number of cores
> used to load an RDD or DataFrame? I can immediately repartition or coalesce
> my RDD or DataFrame to 4 partitions after I load it, but that doesn't stop
> Spark from using more cores to load it.
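>
> For example (sketch only, assuming a spark-shell style sqlContext; the
> parquet path is a placeholder):
>
>     val df = sqlContext.read.parquet("/some/path").repartition(4)
>     // the scan stage that loads the data still uses however many cores
>     // are free; only the stage after the repartition runs with 4 tasks
>     df.count()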
>
> Does what I'm trying to accomplish make sense, and is there any way to
> do it?
>
> - Philip
>
>
