I think the word "partition" here means something a bit different from the
term "partition" as we use it in Spark. Basically, I want something similar to
Guava's Iterables.partition [1]. That is, if I have an RDD[People] and I
want to run an algorithm that can be optimized by working on 30 people at a
time, I'd like to be able to say:

val rdd: RDD[People] = .....
val partitioned: RDD[Seq[People]] = rdd.partition(30)....

I also don't want any shuffling; everything can still be processed locally.
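
For what it's worth, here is a rough sketch of how I imagine this could be
done with what already exists, assuming mapPartitions plus Scala's
Iterator.grouped is acceptable. The names Chunking and chunked are made up
for illustration and are not part of Spark's API:

```scala
import org.apache.spark.rdd.RDD

object Chunking {
  // Hypothetical helper (not part of Spark): groups the elements of each
  // Spark partition into batches of at most `size` elements.
  // mapPartitions is a narrow transformation, so no shuffle is triggered.
  def chunked[T](rdd: RDD[T], size: Int): RDD[Seq[T]] =
    rdd.mapPartitions[Seq[T]](iter => iter.grouped(size))
}
```

With that, the call would look like Chunking.chunked(rdd, 30). Batches never
span two Spark partitions, so the last batch in each partition may hold fewer
than 30 elements.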


[1]
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Iterables.html#partition(java.lang.Iterable,%20int)
