Hi,

When launching a job in Spark, I have a hard time deciding on the number of
tasks. Some say it is better to create one task per HDFS block, i.e., make
sure each task processes 128 MB of input data; others suggest that the
number of tasks should be twice the total number of cores available to the
job. I have also seen the suggestion to launch small tasks in Spark, i.e.,
make sure each task lasts around 100 ms.
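For reference, this is roughly how I control the task count today. It is
just a sketch: the app name, input path, and the partition numbers are
placeholders, not values I am recommending.

    import org.apache.spark.{SparkConf, SparkContext}

    object TaskCountSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("task-count-sketch") // placeholder name
          // Default partition count for RDD shuffles (e.g. reduceByKey);
          // one heuristic is 2-3x the total executor cores.
          .set("spark.default.parallelism", "48")
        val sc = new SparkContext(conf)

        // For HDFS input, the number of map tasks follows the input splits
        // (roughly one per 128 MB block); minPartitions can only raise it.
        val lines = sc.textFile("hdfs:///data/input", 48)

        // After heavy filtering, partitions can become tiny; coalesce merges
        // them so each remaining task has enough work to amortize the
        // scheduling overhead.
        val filtered = lines.filter(_.nonEmpty).coalesce(16)

        println(s"partitions = ${filtered.getNumPartitions}")
        sc.stop()
      }
    }

So the knobs I know of are the input split size, spark.default.parallelism,
and explicit repartition/coalesce calls, but I am unsure how to pick the
actual numbers.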

I am quite confused by all these suggestions. Is there any general rule
for deciding the number of tasks in Spark? Many thanks!

Best
