I did not change spark.default.parallelism. What is the recommended value for it?
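(Editor's note: the Spark tuning guide suggests roughly 2-3 tasks per CPU core in the cluster as a starting point. The sketch below shows one way to set the property through SparkConf before the context is created; the core count and app name are made-up assumptions, not values taken from this thread.)

import org.apache.spark.{SparkConf, SparkContext}

object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    // Assumed cluster size, for illustration only.
    val totalExecutorCores = 50

    // Rule of thumb from the tuning guide: roughly 2-3 tasks per core.
    val conf = new SparkConf()
      .setAppName("parallelism-sketch")
      .set("spark.default.parallelism", (totalExecutorCores * 3).toString)

    val sc = new SparkContext(conf)
    // Shuffles that do not specify an explicit partition count now default
    // to 150 partitions (= 150 tasks) instead of the cluster's built-in default.
    sc.stop()
  }
}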
On Fri, Jun 5, 2015 at 3:31 PM, 李铖 <lidali...@gmail.com> wrote:

> Did you change the value of 'spark.default.parallelism'? Try a bigger number.
>
> 2015-06-05 17:56 GMT+08:00 Evo Eftimov <evo.efti...@isecc.com>:
>
>> It may be that your system runs out of resources (i.e. 174 is the ceiling) due to the following:
>>
>> 1. RDD Partition = (Spark) Task
>> 2. RDD Partition != (Spark) Executor
>> 3. (Spark) Task != (Spark) Executor
>> 4. (Spark) Task = JVM Thread
>> 5. (Spark) Executor = JVM instance
>>
>> From: ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com]
>> Sent: Friday, June 5, 2015 10:48 AM
>> To: user
>> Subject: How to increase the number of tasks
>>
>> I have a stage that spawns 174 tasks when I run repartition on Avro data.
>> Tasks read between 512/317/316/214/173 MB of data. Even if I increase the number of executors or the number of partitions (when calling repartition), the number of tasks launched remains fixed at 174.
>>
>> 1) I want to speed up this stage. How do I do it?
>> 2) A few tasks finish in 20 minutes, a few in 15, and a few in less than 10. Why is that?
>> Since this is a repartition stage, it should not depend on the nature of the data.
>>
>> It is taking more than 30 minutes, and I want to speed it up by throwing more executors at it.
>>
>> Please suggest.
>>
>> Deepak

--
Deepak
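(Editor's note: Evo's point that one RDD partition maps to one task can be checked directly. The sketch below is illustrative only; the input path and partition counts are hypothetical. It shows that the task count of the read stage comes from the input splits, while repartition(n) only determines the partition count, and therefore the task count, of the stage that runs after the shuffle.)

import org.apache.spark.{SparkConf, SparkContext}

object TaskCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("task-count-sketch"))

    // Read stage: one task per input partition, determined by the input splits.
    val input = sc.textFile("hdfs:///path/to/input")   // hypothetical path
    println("read-stage partitions (tasks): " + input.partitions.length)

    // repartition(n) shuffles the data; the stage after the shuffle gets n
    // partitions (n tasks), while the map side keeps the original split count.
    val wider = input.repartition(400)                 // hypothetical target
    println("post-shuffle partitions (tasks): " + wider.partitions.length)

    sc.stop()
  }
}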