Deenar, I haven't heard of any activity to do partitioning in that way, but it does seem more broadly valuable.
On Fri, May 2, 2014 at 10:15 AM, deenar.toraskar <deenar.toras...@db.com> wrote:

> I have equal sized partitions now, but I want the RDD to be partitioned
> such that the partitions are equally weighted by some attribute of each
> RDD element (e.g. size or complexity).
>
> I have been looking at the RangePartitioner code and I have come up with
> something like
>
> EquallyWeightedPartitioner(noOfPartitions, weightFunction)
>
> 1) take a sum of (a sample of) the complexities of all elements and
>    calculate the average weight per partition
> 2) take a histogram of weights
> 3) assign a list of partitions to each bucket
> 4) getPartition(key: Any): Int would
>    a) get the weight and then find the bucket
>    b) assign a random partition from the list of partitions associated
>       with each bucket
>
> Just wanted to know if someone else had come across this issue before and
> whether there was a better way of doing this.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Equally-weighted-partitions-in-Spark-tp5171p5212.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
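For what it's worth, the four steps quoted above could be sketched roughly as below. This is a self-contained toy, not Spark code: the class name, `weightOf` function, and the `sample` constructor argument are my own placeholders, and a real implementation would extend `org.apache.spark.Partitioner` and draw its sample via `RDD.sample` rather than taking a `Seq` up front.

```scala
import scala.util.Random

// Hypothetical sketch of the histogram-based scheme described above:
// bucket the sampled weights, give each bucket a share of the partitions
// proportional to its total weight, and route each key to a random
// partition from its bucket's list.
class EquallyWeightedPartitioner[K](
    val numPartitions: Int,
    weightOf: K => Double,
    sample: Seq[K],
    numBuckets: Int = 10) {

  private val weights = sample.map(weightOf)
  private val lo = weights.min
  private val hi = weights.max
  private val bucketWidth =
    math.max((hi - lo) / numBuckets, Double.MinPositiveValue)

  // Clamp so keys lighter/heavier than anything sampled still get a bucket.
  private def bucketOf(w: Double): Int =
    math.min(numBuckets - 1, math.max(0, ((w - lo) / bucketWidth).toInt))

  // Step 2: histogram of total weight per bucket.
  private val bucketWeight = Array.fill(numBuckets)(0.0)
  weights.foreach(w => bucketWeight(bucketOf(w)) += w)
  private val total = bucketWeight.sum

  // Step 3: each bucket gets partitions in proportion to its weight share
  // (at least one), wrapping modulo numPartitions.
  private val partitionsFor: Array[Seq[Int]] = {
    var next = 0
    bucketWeight.map { bw =>
      val n = math.max(1, math.round(bw / total * numPartitions).toInt)
      val ps: Seq[Int] = (0 until n).map(i => (next + i) % numPartitions)
      next += n
      ps
    }
  }

  // Step 4: weigh the key, find its bucket, pick a random partition
  // from that bucket's list.
  def getPartition(key: K): Int = {
    val ps = partitionsFor(bucketOf(weightOf(key)))
    ps(Random.nextInt(ps.size))
  }
}
```

One caveat with the random choice in step 4: the same key can land in different partitions on different calls, which breaks the Partitioner contract that equal keys always map to the same partition (operations like groupByKey rely on it). Hashing the key to pick deterministically within the bucket's list would avoid that.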