partitionedSource is an RDD, right? If so, then
partitionedSource.count should return the number of elements in the
RDD, regardless of how many
partitions it’s split into.
If you want to count the number of elements per partition, you’ll need to
use RDD.mapPartitions, I believe.
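A minimal sketch of both counts, assuming a live SparkContext and the partitionedSource RDD from your snippet; it uses mapPartitionsWithIndex (a variant of mapPartitions that also passes the partition index) so each partition's count is tagged with its index:

```scala
// Total element count across all partitions -- partitioning does not matter here
val total = partitionedSource.count()

// Per-partition counts: emit one (partitionIndex, size) pair from each partition
val perPartition = partitionedSource
  .mapPartitionsWithIndex { (idx, iter) => Iterator((idx, iter.size)) }
  .collect()

perPartition.foreach { case (idx, n) => println(s"partition $idx: $n elements") }
```

Note that iter.size consumes the partition's iterator, which is fine here since counting is all we do with it.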
On Sat, May 24,
Hi, dear user group:
I recently tried to use the parallelize method of SparkContext to slice the original
data into small pieces for further handling. Something like the below:
val partitionedSource = sparkContext.parallelize(seq, sparkPartitionSize)
The size of my original testing data is 88 objects.