Hi Burak,

Thanks for the response. Can you please elaborate on your idea of storing the state of the unique IDs? Do you have any sample code or links I can refer to?

Thanks
On Wed, Jan 25, 2017 at 9:13 AM, Burak Yavuz <brk...@gmail.com> wrote:

> Off the top of my head... (Each may have its own issues)
>
> If upstream you add a uniqueId to all your records, then you may use a
> BloomFilter to approximate if you've seen a row before.
> The problem I can see with that approach is how to repopulate the bloom
> filter on restarts.
>
> If you are certain that you're not going to reprocess some data after a
> certain time, i.e. there is no way I'm going to get the same data in 2
> hours, it may only happen in the last 2 hours, then you may also keep the
> state of uniqueIds as well, and then age them out after a certain time.
>
>
> Best,
> Burak
>
> On Tue, Jan 24, 2017 at 9:53 PM, shyla deshpande <deshpandesh...@gmail.com
> > wrote:
>
>> Please share your thoughts.....
>>
>> On Tue, Jan 24, 2017 at 4:01 PM, shyla deshpande <
>> deshpandesh...@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Jan 24, 2017 at 9:44 AM, shyla deshpande <
>>> deshpandesh...@gmail.com> wrote:
>>>
>>>> My streaming application stores a lot of aggregations using mapWithState.
>>>>
>>>> I want to know what are all the possible ways I can make it idempotent.
>>>>
>>>> Please share your views.
>>>>
>>>> Thanks
>>>>
>>>> On Mon, Jan 23, 2017 at 5:41 PM, shyla deshpande <
>>>> deshpandesh...@gmail.com> wrote:
>>>>
>>>>> In a Wordcount application which stores the count of all the words
>>>>> input so far using mapWithState. How do I make sure my counts are not
>>>>> messed up if I happen to read a line more than once?
>>>>>
>>>>> Appreciate your response.
>>>>>
>>>>> Thanks
>>>>>
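
For reference, the second suggestion quoted above (keep the state of the uniqueIds and age them out after a certain time) could be sketched roughly as follows. This is plain Python, not Spark code, and it is only a sketch under assumptions: every record is assumed to carry a uniqueId and a timestamp in seconds, records are assumed to arrive in roughly time order, and the names `SeenIds`, `is_new`, and `count_words` are made up for illustration. In a Spark Streaming job the equivalent state would presumably be kept in mapWithState keyed by uniqueId, with a timeout on the StateSpec to age idle keys out; that wiring is not shown here.

```python
from collections import OrderedDict


class SeenIds:
    """Hypothetical helper: remembers unique IDs for ttl_seconds, then forgets them."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        # unique_id -> time it was first seen, oldest entry first
        self._first_seen = OrderedDict()

    def _expire(self, now):
        # Drop IDs that are at least ttl_seconds old. Entries are kept in
        # insertion order, which matches time order under the assumption above.
        while self._first_seen:
            _, seen_at = next(iter(self._first_seen.items()))
            if now - seen_at < self.ttl_seconds:
                break
            self._first_seen.popitem(last=False)

    def is_new(self, unique_id, now):
        """Return True only the first time an ID is seen within the TTL window."""
        self._expire(now)
        if unique_id in self._first_seen:
            return False
        self._first_seen[unique_id] = now
        return True


def count_words(records, seen, counts):
    """records: iterable of (unique_id, timestamp_seconds, line) tuples."""
    for unique_id, ts, line in records:
        if not seen.is_new(unique_id, ts):
            continue  # duplicate delivery: skip it so the counts are not inflated
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


TWO_HOURS = 2 * 60 * 60  # the "last 2 hours" window from the quoted reply
seen = SeenIds(TWO_HOURS)
counts = {}
batch = [("id-1", 0, "hello world"), ("id-2", 10, "hello spark")]
count_words(batch, seen, counts)
count_words(batch[:1], seen, counts)  # "id-1" is replayed and ignored
print(counts)
```

The replayed record does not change the counts because its uniqueId is still inside the two-hour window. Once an ID is older than the window it is forgotten, so a replay after that point would be counted again; that is the trade-off the quoted reply describes.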