How do we reset the aggregated statistics to null?

Regards,
Sandeep Giri
+1 347 781 4573 (US) | +91-953-899-8962 (IN)
www.KnowBigData.com | Phone: +1-253-397-1945 (Office)
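(With the updateStateByKey approach from the quoted thread below, one way to "reset" an aggregate is to return None from the update function, which removes that key's state. A rough sketch of an update function that starts over when the calendar day changes; the (LocalDate, Long) state shape, the day-rollover check, and the name updateSinceMidnight are illustrative assumptions, not something settled in the thread.)

    import java.time.LocalDate

    // State kept per word: (day the count belongs to, running count since 0:00 of that day).
    // updateStateByKey removes a key's state when the update function returns None.
    def updateSinceMidnight(newValues: Seq[Int],
                            state: Option[(LocalDate, Long)]): Option[(LocalDate, Long)] = {
      val today = LocalDate.now()   // in practice, pin an explicit time zone
      val carried = state match {
        case Some((day, count)) if day == today => count   // same day: keep accumulating
        case _                                  => 0L      // new day or no prior state: start over
      }
      val total = carried + newValues.sum
      if (newValues.isEmpty && carried == 0L) None         // nothing left to keep: drop the key
      else Some((today, total))
    }

    // wordPairs: DStream[(String, Int)]
    // val totalsSinceMidnight = wordPairs.updateStateByKey(updateSinceMidnight _)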
On Fri, Oct 30, 2015 at 9:49 AM, Sandeep Giri <sand...@knowbigdata.com> wrote:

> Yes, updateStateByKey worked.
>
> There are some more complications, though.
>
> On Oct 30, 2015 8:27 AM, "skaarthik oss" <skaarthik....@gmail.com> wrote:
>
>> Did you consider the UpdateStateByKey operation?
>>
>> *From:* Sandeep Giri [mailto:sand...@knowbigdata.com]
>> *Sent:* Thursday, October 29, 2015 3:09 PM
>> *To:* user <user@spark.apache.org>; dev <d...@spark.apache.org>
>> *Subject:* Maintaining overall cumulative data in Spark Streaming
>>
>> Dear All,
>>
>> If a continuous stream of text is coming in and you have to keep
>> publishing the overall word count so far since 0:00 today, what would
>> you do?
>>
>> Publishing the results for a window is easy, but if we have to keep
>> aggregating the results, how do we go about it?
>>
>> I have tried keeping a StreamRDD with the aggregated count and doing a
>> fullOuterJoin, but it didn't work. It seems the StreamRDD gets reset.
>>
>> Kindly help.
>>
>> Regards,
>> Sandeep Giri
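(For reference, a minimal end-to-end running word count built on updateStateByKey, the operation the thread converged on. The socket source, 10-second batch interval, local master, and checkpoint path are placeholders chosen for the example, not details from the thread.)

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object CumulativeWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("CumulativeWordCount").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(10))

        // updateStateByKey requires a checkpoint directory for the state.
        ssc.checkpoint("/tmp/streaming-checkpoint")   // placeholder path

        val lines = ssc.socketTextStream("localhost", 9999)   // placeholder source
        val pairs = lines.flatMap(_.split("\\s+")).map(word => (word, 1))

        // Carry the per-word total forward across batches.
        val totals = pairs.updateStateByKey[Long] { (newCounts: Seq[Int], state: Option[Long]) =>
          Some(state.getOrElse(0L) + newCounts.sum)
        }

        totals.print()   // publish the cumulative counts so far on every batch
        ssc.start()
        ssc.awaitTermination()
      }
    }

The point of the design is that updateStateByKey carries each key's total across batches, so every batch can publish the cumulative count so far instead of only the current window's count.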