Deenar,

I haven't heard of any existing work on partitioning that way, but it does
seem more broadly valuable.
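
Something along these lines might work for your step 4. This is an untested
sketch in Scala (the class and parameter names are placeholders, not an
existing API); it assumes the weight can be recomputed from the key alone,
and that the histogram edges and per-bucket partition lists from steps 1-3
were already computed on the driver:

import org.apache.spark.Partitioner

// Untested sketch: routes each key to a weight bucket, then to one of the
// partitions assigned to that bucket. bucketUpperBounds and
// bucketPartitions come from steps 1-3 (sum or sample the weights, build a
// histogram, split the partition ids across the buckets).
class EquallyWeightedPartitioner(
    override val numPartitions: Int,
    weightOf: Any => Double,              // the proposed weightFunction
    bucketUpperBounds: Array[Double],     // ascending histogram bucket edges
    bucketPartitions: Array[Array[Int]])  // partition ids assigned per bucket
  extends Partitioner {

  override def getPartition(key: Any): Int = {
    val w = weightOf(key)
    // 4a) find the first bucket whose upper bound covers this weight
    val i = bucketUpperBounds.indexWhere(w <= _)
    val bucket = if (i < 0) bucketUpperBounds.length - 1 else i
    // 4b) spread keys over the bucket's partitions via a non-negative
    // hash of the key rather than a random draw (see caveat below)
    val candidates = bucketPartitions(bucket)
    val h = key.hashCode % candidates.length
    candidates(if (h < 0) h + candidates.length else h)
  }
}

One caveat on 4b: getPartition has to be deterministic (the same key must
always map to the same partition, or lookup() and joins against the
partitioned RDD will misbehave), so the sketch hashes the key instead of
picking a random partition. Over many keys that spreads the load across a
bucket's partitions much like a random draw would. You would then apply it
with something like rdd.partitionBy(new EquallyWeightedPartitioner(...)).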


On Fri, May 2, 2014 at 10:15 AM, deenar.toraskar <deenar.toras...@db.com> wrote:

> I have equal sized partitions now, but I want the RDD to be partitioned
> such
> that the partitions are equally weighted by some attribute of each RDD
> element (e.g. size or complexity).
>
> I have been looking at the RangePartitioner code and have come up with
> something like:
>
> EquallyWeightedPartitioner(noOfPartitions, weightFunction)
>
> 1) take the sum (or a sample) of the complexities of all elements and
> calculate the average weight per partition
> 2) take a histogram of the weights
> 3) assign a list of partitions to each bucket
> 4) getPartition(key: Any): Int would
>    a) get the element's weight and then find its bucket
>    b) assign a random partition from the list of partitions associated with
> that bucket
>
> Just wanted to know if someone else had come across this issue before and
> whether there was a better way of doing this.
>
