I'm playing around with dynamic allocation in Spark 1.5.0, together with the
FAIR scheduler, so I can define a long-running application capable of
executing multiple simultaneous Spark jobs.
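Concretely, this is roughly the kind of setup I mean; the app name and
executor counts below are just placeholders, not my real values:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("long-running-multi-job-app")
      // let executors come and go as jobs start and finish
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")   // required by dynamic allocation
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "8")
      // FAIR mode so concurrent jobs share the application's resources
      .set("spark.scheduler.mode", "FAIR")

    val sc = new SparkContext(conf)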

The jobs I'm running don't benefit from more than 4 cores each, but I want
my application to be able to take several times that so it can run multiple
jobs at the same time.
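The pattern I'm after looks roughly like this (the pool names and input
paths are made up, and sc is the context from the configuration above):

    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global

    // each job is submitted from its own thread and tagged with a FAIR pool,
    // so several of them can run against the shared SparkContext at once
    def runJob(pool: String, path: String): Future[Long] = Future {
      sc.setLocalProperty("spark.scheduler.pool", pool)
      sc.textFile(path)
        .repartition(4)   // a single job never needs more than ~4 cores' worth of work
        .count()
    }

    val jobs = Seq(
      runJob("pool-a", "/data/input-a"),
      runJob("pool-b", "/data/input-b")
    )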

I suppose my question is more basic: How can I limit the number of cores
used to load an RDD or DataFrame? I can immediately repartition or coalesce
my RDD or DataFrame to 4 partitions after I load it, but that doesn't stop
Spark from using more cores to load it.
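For example, with a DataFrame I currently do something like this (the path
is a placeholder):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // shrink to 4 partitions right after the load; the load itself still
    // fans out over all the input splits, so Spark can grab far more than
    // 4 cores before the coalesce takes effect
    val df = sqlContext.read.parquet("/data/some-table").coalesce(4)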

Does what I'm trying to accomplish make sense, and is there any way to do
it?

- Philip
