Yes, updateStateByKey worked.

There are still some complications, though.
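
For the archives, here is a minimal sketch of the updateStateByKey
approach (the socket source, port, batch interval, checkpoint path, and
object name below are placeholders for illustration, not the actual job):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CumulativeWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CumulativeWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))

    // updateStateByKey needs a checkpoint directory so the running
    // state survives across batches.
    ssc.checkpoint("/tmp/wordcount-checkpoint")

    val lines = ssc.socketTextStream("localhost", 9999)
    val pairs = lines.flatMap(_.split("\\s+")).map(word => (word, 1))

    // Merge each batch's counts into the running total per word.
    val totals = pairs.updateStateByKey[Int] {
      (newCounts: Seq[Int], runningCount: Option[Int]) =>
        Some(newCounts.sum + runningCount.getOrElse(0))
    }

    totals.print()  // publishes the cumulative counts on every batch

    ssc.start()
    ssc.awaitTermination()
  }
}

Two general caveats: updateStateByKey requires checkpointing to be
enabled, and it visits the state of every key on each batch, so the
per-batch cost grows with the number of distinct words seen so far.
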
On Oct 30, 2015 8:27 AM, "skaarthik oss" <skaarthik....@gmail.com> wrote:

> Did you consider the updateStateByKey operation?
>
> *From:* Sandeep Giri [mailto:sand...@knowbigdata.com]
> *Sent:* Thursday, October 29, 2015 3:09 PM
> *To:* user <user@spark.apache.org>; dev <d...@spark.apache.org>
> *Subject:* Maintaining overall cumulative data in Spark Streaming
>
> Dear All,
>
> If a continuous stream of text is coming in and you have to keep
> publishing the overall word count so far since 0:00 today, what would you
> do?
>
> Publishing the results for a window is easy, but if we have to keep
> aggregating the results, how do we go about it?
>
> I have tried keeping a StreamRDD with the aggregated counts and repeatedly
> doing a fullOuterJoin, but it didn't work. It seems like the StreamRDD gets
> reset.
>
> Kindly help.
>
> Regards,
> Sandeep Giri
