Hi,

When launching a job in Spark, I have a lot of trouble deciding on the number of tasks. Some say it is best to create one task per HDFS block, i.e., make sure each task processes about 128 MB of input data; others suggest that the number of tasks should be twice the total number of cores available to the job. I have also seen the suggestion to keep tasks small, i.e., make sure each task lasts around 100 ms.
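For context, these are roughly the knobs I have been experimenting with so far (just a sketch; the input path and the multipliers are placeholders, not anything I am committed to):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TaskCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("task-count-sketch")
    val sc = new SparkContext(conf)

    // Option 1: one task per HDFS block -- let Spark derive the partition
    // count from the input splits (128 MB blocks by default on HDFS).
    val byBlock = sc.textFile("hdfs:///data/input")            // placeholder path

    // Option 2: tie the task count to the cores granted to this job,
    // e.g. twice the default parallelism.
    val totalCores = sc.defaultParallelism
    val byCores = sc.textFile("hdfs:///data/input", totalCores * 2)

    // Option 3: repartition an existing RDD to an explicit task count.
    val repartitioned = byBlock.repartition(totalCores * 3)

    println(s"by block: ${byBlock.getNumPartitions}, " +
            s"2x cores: ${byCores.getNumPartitions}, " +
            s"repartitioned: ${repartitioned.getNumPartitions}")

    sc.stop()
  }
}
```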
I am quite confused by all these suggestions. Is there a general rule for deciding the number of tasks in Spark? Many thanks!

Best