Hi Fei, How you tried coalesce(numPartitions: Int, shuffle: Boolean = false) ?
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395 coalesce is mostly used for reducing the number of partitions before writing to HDFS, but it might still be a narrow dependency (satisfying your requirements) if you increase the # of partitions. Best, Anastasios On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <hufe...@gmail.com> wrote: > Dear all, > > I want to equally divide a RDD partition into two partitions. That means, > the first half of elements in the partition will create a new partition, > and the second half of elements in the partition will generate another new > partition. But the two new partitions are required to be at the same node > with their parent partition, which can help get high data locality. > > Is there anyone who knows how to implement it or any hints for it? > > Thanks in advance, > Fei > > -- -- Anastasios Zouzias <a...@zurich.ibm.com>