partitionedSource is an RDD, right? If so, then
partitionedSource.count should return the number of elements in the
RDD, regardless of how many
partitions it’s split into.
If you want to count the number of elements per partition, you’ll need to
use RDD.mapPartitions, I believe.
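A minimal sketch of both counts, assuming a live SparkContext and the partitionedSource RDD from your snippet; it uses mapPartitionsWithIndex (a variant of mapPartitions that also passes the partition index) so each partition's count is tagged with its index:

```scala
// Total element count across all partitions -- partitioning does not matter here
val total = partitionedSource.count()

// Per-partition counts: emit one (partitionIndex, size) pair from each partition
val perPartition = partitionedSource
  .mapPartitionsWithIndex { (idx, iter) => Iterator((idx, iter.size)) }
  .collect()

perPartition.foreach { case (idx, n) => println(s"partition $idx: $n elements") }
```

Note that iter.size consumes the partition's iterator, which is fine here since counting is all we do with it.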
On Sat, May 24,
Hi, dear user group:
I recently tried to use the parallelize method of SparkContext to slice the original
data into small pieces for further handling. Something like the below:
val partitionedSource = sparkContext.parallelize(seq, sparkPartitionSize)
The size of my original testing data is 88 objects.