I have a DStream consisting of RDDs generated every 2 seconds. I have sorted each RDD and want to keep only the top 10 values and discard the rest. How can I retain only the top 10 values?
I am trying to get the top 10 hashtags. Instead of sorting the entire set of 5-minute counts (and thereby incurring the cost of a data shuffle), I am trying to get the top 10 hashtags in each partition. I am stuck on how to retain the top 10 hashtags in each partition.
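To make the idea concrete, here is a minimal sketch in plain Python of "top 10 per partition, then merge", with no Spark dependency so it can be run anywhere. It assumes each record is a (hashtag, count) pair and that the counts are already aggregated, so a given hashtag appears in only one partition; otherwise the per-partition top 10 may miss a hashtag whose count is split across partitions. The names top_n_in_partition and merge_partition_tops, and the sample data, are hypothetical and only for illustration.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

TOP_N = 10  # how many hashtags to keep


def top_n_in_partition(
    records: Iterable[Tuple[str, int]], n: int = TOP_N
) -> Iterator[Tuple[str, int]]:
    """Return the n records with the largest counts from one partition."""
    # heapq.nlargest keeps at most n items in memory, so the partition
    # does not have to be fully sorted.
    return iter(heapq.nlargest(n, records, key=lambda pair: pair[1]))


def merge_partition_tops(
    partition_tops: Iterable[Iterable[Tuple[str, int]]], n: int = TOP_N
) -> List[Tuple[str, int]]:
    """Combine the per-partition candidates into the overall top n."""
    candidates = [pair for top in partition_tops for pair in top]
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


# Two simulated partitions of already-aggregated (hashtag, count) pairs.
partitions = [
    [("#spark", 40), ("#scala", 12), ("#java", 7)],
    [("#python", 33), ("#bigdata", 25), ("#hadoop", 3)],
]

# Keep the top 2 per partition, then merge the small candidate lists.
per_partition = [list(top_n_in_partition(p, n=2)) for p in partitions]
overall = merge_partition_tops(per_partition, n=2)
```

In PySpark, a function of this shape (iterator in, iterator out) is what RDD.mapPartitions expects, so something like rdd.mapPartitions(top_n_in_partition) would leave at most 10 records in each partition; only those few records per partition then need to be collected and merged. RDD.top(10, key=...) may also be worth a look, since it returns the largest elements without a full sort.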