The problem is that equally-sized partitions take widely varying amounts of time to complete, depending on their contents?
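If you can estimate a complexity value per task up front, one option (just an untested sketch; the ComplexityPartitioner class, its build helper, and the id/complexity fields are made up for illustration) is to precompute partition assignments on the driver with a greedy longest-processing-time heuristic and wrap them in a custom Partitioner:

import org.apache.spark.Partitioner

// Sketch of a weight-aware partitioner. Partition assignments are precomputed on
// the driver with a greedy longest-processing-time heuristic and looked up by key.
// The per-task complexity estimate is assumed to be available (hypothetical field).
class ComplexityPartitioner(assignments: Map[Any, Int], numParts: Int) extends Partitioner {
  override def numPartitions: Int = numParts
  override def getPartition(key: Any): Int =
    // fall back to hash partitioning for keys not seen when the plan was built
    assignments.getOrElse(key, ((key.hashCode % numParts) + numParts) % numParts)
}

object ComplexityPartitioner {
  // Place the heaviest tasks first, each into the currently least-loaded partition.
  def build(weights: Seq[(Any, Double)], numParts: Int): ComplexityPartitioner = {
    val loads = Array.fill(numParts)(0.0)
    val assignments = weights.sortBy(-_._2).map { case (key, w) =>
      val target = loads.zipWithIndex.minBy(_._1)._2
      loads(target) += w
      key -> target
    }.toMap
    new ComplexityPartitioner(assignments, numParts)
  }
}

// Hypothetical usage: tasks is an RDD of objects with id and complexity fields.
// val weights  = tasks.map(t => (t.id, t.complexity)).collect().toSeq
// val part     = ComplexityPartitioner.build(weights, tasks.partitions.length)
// val balanced = tasks.keyBy(_.id).partitionBy(part)

Sorting by descending weight before greedy placement is the classic LPT bin-packing heuristic; it won't be optimal, but it usually keeps the per-partition totals within a small factor of each other.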
Sent from my mobile phone

On May 1, 2014 8:31 AM, "deenar.toraskar" <deenar.toras...@db.com> wrote:
> Hi
>
> I am using Spark to distribute computationally intensive tasks across the
> cluster. Currently I partition my RDD of tasks randomly. There is a large
> variation in how long each of the jobs takes to complete, so most
> partitions are processed quickly while a couple of partitions take forever
> to finish. I can mitigate this problem to some extent by increasing the
> number of partitions.
>
> Ideally I would like to partition tasks by complexity (let's assume I can
> get such a value from the task object) so that the total complexity of the
> elements in each partition is evenly distributed. Has anyone created such a
> partitioner before?
>
> Regards
> Deenar
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Equally-weighted-partitions-in-Spark-tp5171.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.