It may be that your system is running out of resources (i.e. 174 is the ceiling), given the following:
1. RDD Partition = (Spark) Task
2. RDD Partition != (Spark) Executor
3. (Spark) Task != (Spark) Executor
4. (Spark) Task = JVM Thread
5. (Spark) Executor = JVM instance

From: ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com]
Sent: Friday, June 5, 2015 10:48 AM
To: user
Subject: How to increase the number of tasks

I have a stage that spawns 174 tasks when I run repartition on Avro data. Tasks read between 512/317/316/214/173 MB of data. Even if I increase the number of executors or the number of partitions (when calling repartition), the number of tasks launched remains fixed at 174.

1) I want to speed up this task. How do I do it?
2) A few tasks finish in 20 mins, a few in 15, and a few in less than 10. Why does this happen? Since this is a repartition stage, it should not depend on the nature of the data.

It's taking more than 30 mins and I want to speed it up by throwing more executors at it. Please suggest.

Deepak
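
As a rough illustration of the "Partition = Task" and "Task = JVM Thread" points in the reply at the top of this thread, here is a minimal sketch in plain Python (no Spark required). It assumes that each executor runs one task per core at a time, so the number of tasks that can run concurrently is executors x cores per executor. The function name task_waves and the value of 6 cores per executor are hypothetical, chosen only for illustration; they do not come from the thread.

```python
import math


def task_waves(num_partitions: int, num_executors: int, cores_per_executor: int) -> int:
    """Estimate how many sequential "waves" of tasks a stage needs.

    Assumes one task per partition and one running task per executor core.
    """
    slots = num_executors * cores_per_executor  # tasks that can run at once
    if slots <= 0:
        raise ValueError("need at least one task slot")
    return math.ceil(num_partitions / slots)


# 174 tasks, as in the question; 6 cores per executor is an assumed value.
for executors in (10, 29, 50, 100):
    print(executors, "executors ->", task_waves(174, executors, 6), "wave(s)")
```

Under these assumptions, once there are at least 174 task slots the stage already runs in a single wave, so adding more executors cannot shorten it further; only a larger number of partitions in that stage would create more tasks to spread out.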