You can't really say 8 cores is not much horsepower when you have no idea
what my use case is. That's silly.

On Fri, Sep 18, 2015 at 10:33 PM, Adrian Tanase <atan...@adobe.com> wrote:

> Forgot to mention that you could also restrict the parallelism to 4,
> essentially using only 4 cores at any given time. However, if your job is
> complex, a stage might be broken into more than one task...
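>
> (A minimal sketch of that, assuming an existing SparkConf named sparkConf;
> note this sets a default partition count, not a hard cap on concurrent
> cores:)
>
>     // Default shuffles and parallelize() to 4 partitions, so most
>     // stages run at most 4 tasks at a time.
>     sparkConf.set("spark.default.parallelism", "4")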
>
> Sent from my iPhone
>
> On 19 Sep 2015, at 08:30, Adrian Tanase <atan...@adobe.com> wrote:
>
> Reading through the docs, it seems that with a combination of the FAIR
> scheduler and maybe pools you can get pretty far.
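>
> Roughly, FAIR scheduling is enabled with spark.scheduler.mode=FAIR on the
> SparkConf, and jobs are then assigned to a pool per thread (the pool name
> here is just an example; weight/minShare can be tuned in
> conf/fairscheduler.xml):
>
>     // Jobs submitted from this thread go into the "requests" pool.
>     sc.setLocalProperty("spark.scheduler.pool", "requests")
>
>     // ... run jobs ...
>
>     // Reset so later jobs from this thread use the default pool.
>     sc.setLocalProperty("spark.scheduler.pool", null)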
>
> However, the smallest unit of scheduled work is the task, so you probably
> need to think about the parallelism of each transformation.
>
> I'm guessing that by increasing the level of parallelism you get many
> smaller tasks that the scheduler can then run across the many jobs you
> might have - as opposed to fewer, longer tasks...
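>
> For example (rdd and the partition count are just placeholders):
>
>     // 200 small tasks instead of a handful of big ones gives the
>     // scheduler more chances to interleave work from other jobs.
>     val fineGrained = rdd.repartition(200)
>     fineGrained.count()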
>
> Lastly, 8 cores is not that much horsepower :)
> You might consider running beefier machines or a larger cluster to get at
> least tens of cores.
>
> Hope this helps,
> -adrian
>
> Sent from my iPhone
>
> On 18 Sep 2015, at 18:37, Philip Weaver <philip.wea...@gmail.com> wrote:
>
> Here's a specific example of what I want to do. My Spark application is
> running with total-executor-cores=8. A request comes in, a thread is spawned
> to handle it, and that thread starts a job. That job should use only 4 cores,
> not all 8 of the cores available to the cluster. When the first job is
> scheduled, it should take only 4 cores, not all 8 of the cores that are
> available to the driver.
>
> Is there any way to accomplish this? This is on Mesos.
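>
> In case it helps to make this concrete, here's a rough sketch of the kind
> of per-request handler I have in mind (names and the path are placeholders;
> setLocalProperty applies per thread):
>
>     import org.apache.spark.SparkContext
>
>     def handleRequest(sc: SparkContext, path: String): Thread = {
>       val t = new Thread(new Runnable {
>         def run(): Unit = {
>           // Local properties are thread-local, so each request's
>           // jobs can go into their own FAIR pool.
>           sc.setLocalProperty("spark.scheduler.pool", "requests")
>           // What I want: this whole job limited to 4 of the 8 cores.
>           val n = sc.textFile(path).coalesce(4).count()
>           println("request done: " + n + " records")
>         }
>       })
>       t.start()
>       t
>     }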
>
> In order to support the use cases described in
> https://spark.apache.org/docs/latest/job-scheduling.html, where a Spark
> application runs for a long time and handles requests from multiple users,
> I believe what I'm asking about is a very important feature. One of the
> goals is to get lower latency for each request, but if the first request
> takes all resources and we can't guarantee any free resources for the
> second request, then that defeats the purpose. Does that make sense?
>
> Thanks in advance for any advice you can provide!
>
> - Philip
>
> On Sat, Sep 12, 2015 at 10:40 PM, Philip Weaver <philip.wea...@gmail.com>
> wrote:
>
>> I'm playing around with dynamic allocation in spark-1.5.0, with the FAIR
>> scheduler, so I can define a long-running application capable of executing
>> multiple simultaneous Spark jobs.
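>>
>> For reference, the kind of configuration I mean looks roughly like this (a
>> sketch; the min/max executor values are just examples, and dynamic
>> allocation needs the external shuffle service running on each node):
>>
>>     import org.apache.spark.SparkConf
>>
>>     val conf = new SparkConf()
>>       .setAppName("long-running-app")
>>       .set("spark.scheduler.mode", "FAIR")
>>       .set("spark.dynamicAllocation.enabled", "true")
>>       .set("spark.shuffle.service.enabled", "true")
>>       .set("spark.dynamicAllocation.minExecutors", "1")
>>       .set("spark.dynamicAllocation.maxExecutors", "8")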
>>
>> The kinds of jobs I'm running don't benefit from more than 4 cores, but I
>> want my application to be able to take several times that so it can run
>> multiple jobs at the same time.
>>
>> I suppose my question is more basic: How can I limit the number of cores
>> used to load an RDD or DataFrame? I can immediately repartition or coalesce
>> my RDD or DataFrame to 4 partitions after I load it, but that doesn't stop
>> Spark from using more cores to load it.
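>>
>> For example (sqlContext and the path are placeholders), this is the pattern
>> I'm describing:
>>
>>     // The load stage is sized by the input splits, so Spark can use
>>     // every available core to read the data...
>>     val raw = sqlContext.read.parquet("/data/events")
>>
>>     // ...while repartition(4) only limits the stages after the
>>     // shuffle to 4 tasks.
>>     val df = raw.repartition(4)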
>>
>> Does it make sense what I am trying to accomplish, and is there any way
>> to do it?
>>
>> - Philip
>>
>>
>
