I would think this should be done at the application level. After all, the core functionality of Spark Streaming is to slice the incoming data into RDDs at some real-time interval and process them - not to aggregate their results across the lifetime of the job.
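If you go the application-level route, here is a minimal sketch (assuming topic is the same DStream as in your code below, with an integer score in the second field). Each window's (score, count) histogram is small, so it can be collected to the driver and folded into a running total:

import scala.collection.mutable

// Running totals kept on the driver; this works because the body of
// foreachRDD executes on the driver, not on the executors.
val totals = mutable.Map[Int, Long]().withDefaultValue(0L)

val score_count_by_window = topic.map(r => r._2).countByValue()

score_count_by_window.foreachRDD { rdd =>
  // The per-window histogram is tiny, so collect() is cheap here.
  rdd.collect().foreach { case (score, count) =>
    totals(score) += count
  }
  println("Totals over all windows so far: " + totals)
}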
But maybe there is a better way...

On Thu, Nov 13, 2014 at 8:28 PM, SK <skrishna...@gmail.com> wrote:
> Hi,
>
> I am using the following code to generate the (score, count) for each
> window:
>
> val score_count_by_window = topic.map(r => r._2)  // r._2 is the integer score
>                                  .countByValue()
>
> score_count_by_window.print()
>
> E.g. output for a window is as follows, which means that within the
> DStream for that window, there are 2 records with score 0, 3 with
> score 1, and 1 with score -1:
> (0, 2)
> (1, 3)
> (-1, 1)
>
> I would like to get the aggregate count for each score over all windows
> until the program terminates. I tried countByValueAndWindow(), but the
> result is the same as countByValue() (i.e. it produces only per-window
> counts). reduceByWindow() also does not produce the result I am
> expecting. What is the correct way to sum up the counts over multiple
> windows?
>
> thanks

--
jay vyas