[ https://issues.apache.org/jira/browse/SPARK-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614562#comment-14614562 ]
Saisai Shao commented on SPARK-8827:
------------------------------------

I think you could get the top values with a pattern like the following:

{code}
dstream.foreachRDD(lambda r: r.top(...))
{code}

This will get the top values for each batch duration; you could give it a try.

> pyspark.DStream top method
> --------------------------
>
>                 Key: SPARK-8827
>                 URL: https://issues.apache.org/jira/browse/SPARK-8827
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Streaming
>            Reporter: Emanuele Cesena
>            Priority: Minor
>
> Is there a reason for not having DStream.top?
> Any issues with the following definition?
> {code}
> def topPartition(partition):
>     return sorted(partition, key=lambda p: p[1], reverse=True)[:10]
> def top(counts):
>     return counts.transform(lambda rdd: rdd.mapPartitions(topPartition)
>                                            .sortBy(lambda p: p[1], ascending=False))
> {code}
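
For illustration only, here is a minimal plain-Python sketch of the two-stage idea behind the quoted definition: take the top candidates within each partition, then merge them and cut to a global top n. It does not use Spark; the names `top_partition`, `top`, and `batch`, and the sample data, are hypothetical, and each "partition" is just a list of (key, count) pairs. The final cut to n appears to be the step that the `sortBy` in the quoted definition leaves out, since sorting alone only orders the per-partition candidates.

```python
import heapq


def top_partition(partition, n=10):
    """Keep only the n pairs with the largest counts from one partition."""
    return heapq.nlargest(n, partition, key=lambda p: p[1])


def top(partitions, n=10):
    """Merge the per-partition candidates and cut to a global top n."""
    candidates = [pair for part in partitions for pair in top_partition(part, n)]
    return heapq.nlargest(n, candidates, key=lambda p: p[1])


# Hypothetical batch of (word, count) pairs split across two partitions.
batch = [
    [("a", 5), ("b", 1), ("c", 9)],
    [("d", 7), ("e", 3)],
]

result = top(batch, n=3)
print(result)
```

In PySpark itself, `RDD.top(num, key=None)` already returns the largest elements to the driver, so something like `rdd.top(10, key=lambda p: p[1])` inside `foreachRDD` would likely cover the same need for each batch.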