You can override the default partitioner with a RangePartitioner <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Partitioner.scala#L92>, which distributes data into roughly equal-sized partitions.
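If the skew comes from a handful of individually expensive keys rather than from uneven key counts, you can also write your own Partitioner and route the expensive instruments to their own block of partitions. A minimal sketch in Scala, assuming a hypothetical isComplex predicate (e.g. membership in a complexIds: Set[String]) and an arbitrary 3:1 split between vanilla and complex partitions:

import org.apache.spark.Partitioner

// Routes "complex" instrument keys to a reserved block of partitions so
// they spread out instead of queuing behind cheap, plain-vanilla keys.
// isComplex and the 3/4 split point below are illustrative assumptions.
class InstrumentPartitioner(val numPartitions: Int,
                            isComplex: String => Boolean) extends Partitioner {

  require(numPartitions >= 4, "need room for both vanilla and complex blocks")

  // First three quarters of the partitions hold vanilla instruments,
  // the remaining quarter holds the complex ones.
  private val complexStart = (numPartitions * 3) / 4

  private def nonNegativeMod(x: Int, mod: Int): Int = {
    val r = x % mod
    if (r < 0) r + mod else r
  }

  override def getPartition(key: Any): Int = {
    val id = key.toString
    if (isComplex(id))
      complexStart + nonNegativeMod(id.hashCode, numPartitions - complexStart)
    else
      nonNegativeMod(id.hashCode, complexStart)
  }

  // Spark compares partitioners to decide whether a shuffle is needed
  // (simplified here: the predicate is ignored in the comparison).
  override def equals(other: Any): Boolean = other match {
    case p: InstrumentPartitioner => p.numPartitions == numPartitions
    case _ => false
  }
  override def hashCode: Int = numPartitions
}

Then repartition a pair RDD with it, e.g.
instruments.partitionBy(new InstrumentPartitioner(16, complexIds.contains))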
On Thu, May 1, 2014 at 11:14 PM, deenar.toraskar <deenar.toras...@db.com> wrote:

> Yes
>
> On a job I am currently running, 99% of the partitions finish within
> seconds and a couple of partitions take around an hour to finish. I am
> pricing some instruments, and complex instruments take far longer to
> price than plain vanilla ones. If I could distribute these complex
> instruments evenly, the overall job times would greatly reduce.
>
> Deenar