Its a pretty nice question ! I'll trying to understand the problem, and see can help further.
When you say CustomRDD I believe you will using it in the transformation stage, once the data is read from a file source like HDFS or Cassandra or Kafka. Now the RDD.getPartitions() should return the partitions its having, and in case of HashPartitioner (which will be used in functions like reduceByKey), the partition of the key will be identified like partition_num=(key.hashCode%numOfParttions) ...... so when the RDD(partitions) reaches to nodes (reducer phase) say after reduceByKey, groupByKey can have few partitions based on the keys it contains which are mapped to partitions. So, I believe it is not required to match the numOfPartitons of the HashPartitoner to the getPartitions of the RDD. Thanks, Manish On Fri, Dec 2, 2016 at 1:53 PM, Amit Sela <amitsel...@gmail.com> wrote: > This might be a silly question, but I wanted to make sure, when > implementing my own RDD, if using a HashPartitioner as the RDD's > partitioner the number of partitions returned by the implementation of > getPartitions() has to match the number of partitions set in the > HashPartitioner, correct ? >