Hi Rishi,

Thanks for your reply! The RDD has 24 partitions, and the cluster has one master node plus 24 compute nodes (12 cores per node). Each node holds one partition, and I want to split each partition into two sub-partitions on the same node, to improve parallelism while keeping high data locality.
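To make the intended split concrete, here is a minimal sketch of the index mapping and the halving step in plain Python. It is not Spark code: `parent_of`, `split_partition`, and the sample data are hypothetical names and values chosen for illustration. In Spark, one possible approach (an assumption, not something confirmed in this thread) would be a custom RDD subclass whose `getPartitions` returns two child partitions per parent, whose `compute` yields the matching half, and whose `getPreferredLocations` delegates to the parent partition. Preferred locations are only a scheduling hint, so same-node placement would be likely but not guaranteed.

```python
from typing import List, Tuple


def parent_of(child_index: int) -> Tuple[int, int]:
    """Map a child partition index to (parent index, half).

    half == 0 means the first half of the parent, half == 1 the second.
    """
    return child_index // 2, child_index % 2


def split_partition(elements: List[int], half: int) -> List[int]:
    """Return the requested half of one parent partition.

    The elements are materialized so the midpoint is known; when the
    count is odd, the first half receives the extra element.
    """
    items = list(elements)
    mid = (len(items) + 1) // 2
    return items[:mid] if half == 0 else items[mid:]


# Hypothetical data: three parent partitions of different sizes.
parents = [[1, 2, 3, 4], [5, 6, 7], [8]]

# Each parent yields exactly two children, in parent order.
children = [
    split_partition(parents[p], h)
    for p, h in (parent_of(i) for i in range(2 * len(parents)))
]

print(children)
```

Because child `i` always reads from parent `i // 2`, each child depends on exactly one parent partition, which is what would let a scheduler keep it on the parent's node without a shuffle.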
Thanks,
Fei

On Sun, Jan 15, 2017 at 2:33 AM, Rishi Yadav <ri...@infoobjects.com> wrote:

> Can you provide some more details:
> 1. How many partitions does RDD have
> 2. How big is the cluster
>
> On Sat, Jan 14, 2017 at 3:59 PM Fei Hu <hufe...@gmail.com> wrote:
>
>> Dear all,
>>
>> I want to equally divide a RDD partition into two partitions. That means,
>> the first half of elements in the partition will create a new partition,
>> and the second half of elements in the partition will generate another new
>> partition. But the two new partitions are required to be at the same node
>> with their parent partition, which can help get high data locality.
>>
>> Is there anyone who knows how to implement it or any hints for it?
>>
>> Thanks in advance,
>> Fei