Try looking at the mapPartitions() method on RDD[T]. The function you pass to it receives an iterator over the elements of each partition, which is exactly what you need for the kind of within-partition hashtag counting you're describing.
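A minimal sketch of the idea in PySpark, whose RDD also has a mapPartitions() method. It assumes, hypothetically, that each RDD element is a tweet's text and that a hashtag is any whitespace-separated token starting with "#". The helper names top_hashtags and first_n are made up for illustration. Each function takes one partition's iterator and returns an iterable, which is the shape mapPartitions() expects:

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def top_hashtags(partition: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count hashtags within ONE partition and return the n most common.

    Assumption (hypothetical): elements are tweet texts, and a hashtag is any
    whitespace-separated token that starts with '#' and has at least one more
    character.
    """
    counts = Counter()
    for text in partition:
        for token in text.split():
            if token.startswith("#") and len(token) > 1:
                counts[token] += 1
    # mapPartitions accepts any iterable, so returning a list is fine
    return counts.most_common(n)


def first_n(partition: Iterable[str], n: int = 10) -> Iterator[str]:
    """Yield at most the first n elements of ONE partition."""
    # islice stops after n items and does not read the rest of the partition
    return islice(partition, n)


# Usage with a live SparkContext `sc` (not executed here):
#   rdd = sc.parallelize(tweets, 4)
#   rdd.mapPartitions(top_hashtags).collect()             # top 10 per partition
#   rdd.mapPartitions(lambda it: top_hashtags(it, 3)).collect()
#   rdd.mapPartitions(first_n).collect()                   # first 10 per partition
```

Because mapPartitions() hands over the whole partition at once, the Counter is built once per partition rather than once per element. Note that the per-partition results are independent; a global top ten would still need a merge step afterwards (for example reduceByKey on the (hashtag, count) pairs).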
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Selecting-first-ten-values-in-a-RDD-partition-tp6517p6534.html Sent from the Apache Spark User List mailing list archive at Nabble.com.