Thank you, Daniel and Yong!

On Wed, Jan 18, 2017 at 4:56 PM, Daniel Siegmann <
dsiegm...@securityscorecard.io> wrote:

> I am not too familiar with Spark Standalone, so unfortunately I cannot
> give you any definite answer. I do want to clarify something though.
>
> The properties spark.sql.shuffle.partitions and spark.default.parallelism
> affect how your data is split up, which will determine the *total* number
> of tasks, *NOT* the number of tasks that run in parallel. Of course you
> will never run more tasks in parallel than there are in total, so if your
> data is small you might be able to limit parallelism this way - but that
> isn't typically how you'd use these parameters.
>
> On YARN, as you noted, there are spark.executor.instances and
> spark.executor.cores; you'd multiply them to determine the maximum number
> of tasks that can run in parallel on your cluster. But there is no
> guarantee the executors will be distributed evenly across nodes.
>
> Unfortunately I'm not familiar with how this works on Spark Standalone.
> Your expectations seem reasonable to me. Sorry I can't be more helpful;
> hopefully someone else will be able to explain exactly how this works.
>
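
For anyone following along, here is a minimal sketch (not from this thread;
the property names are the ones discussed above and the numeric values are
just placeholders) of how these settings could be passed when building a
SparkSession. On YARN the executor settings are usually supplied to
spark-submit (--num-executors / --executor-cores) rather than set in code,
but the arithmetic is the same: executors * cores = maximum tasks running
concurrently.

import org.apache.spark.sql.SparkSession

object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical example values -- tune for your own cluster.
    val spark = SparkSession.builder()
      .appName("parallelism-sketch")
      // Total number of partitions (hence tasks) created by DataFrame/SQL shuffles.
      .config("spark.sql.shuffle.partitions", "200")
      // Default partition count for RDD operations (e.g. reduceByKey, parallelize).
      .config("spark.default.parallelism", "200")
      // On YARN: executors * cores per executor = max tasks running in parallel.
      // These two are normally passed to spark-submit rather than set here.
      .config("spark.executor.instances", "4")
      .config("spark.executor.cores", "2") // at most 4 * 2 = 8 concurrent tasks
      .getOrCreate()

    // ... job logic would go here ...

    spark.stop()
  }
}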



-- 
Saliya Ekanayake, Ph.D
Applied Computer Scientist
Network Dynamics and Simulation Science Laboratory (NDSSL)
Virginia Tech, Blacksburg
