Hi Burak,
Thanks for the response. Could you please elaborate on your idea of storing
the state of the unique IDs?
Do you have any sample code or links I can refer to?
Thanks

On Wed, Jan 25, 2017 at 9:13 AM, Burak Yavuz <brk...@gmail.com> wrote:

> Off the top of my head... (Each may have its own issues.)
>
> If you add a uniqueId to all your records upstream, then you can use a
> BloomFilter to check approximately whether you've seen a row before.
> The problem I can see with that approach is how to repopulate the Bloom
> filter on restarts.
>
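A minimal sketch of the Bloom filter idea, in plain Java with no Spark
dependency. Everything here is an assumption made for illustration: the class
name SeenIdFilter, the sizing parameters, and the snapshot()/restore() pair,
which is one possible way to repopulate the filter after a restart (the bytes
would have to be written to durable storage by the application). A Bloom
filter can report false positives, so some never-seen records may be skipped.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch: approximate "have I seen this uniqueId?" check. */
final class SeenIdFilter {
    private final BitSet bits;
    private final int numBits;    // size of the bit array
    private final int numHashes;  // number of bit positions probed per id

    SeenIdFilter(int numBits, int numHashes) {
        this(new BitSet(numBits), numBits, numHashes);
    }

    private SeenIdFilter(BitSet bits, int numBits, int numHashes) {
        this.bits = bits;
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Marks the id as seen; returns true if it may have been seen before. */
    boolean checkAndAdd(String id) {
        boolean maybeSeen = true;
        for (int i = 0; i < numHashes; i++) {
            int idx = index(id, i);
            if (!bits.get(idx)) {
                maybeSeen = false; // at least one bit was unset: definitely new
                bits.set(idx);
            }
        }
        return maybeSeen;
    }

    /** Derives the i-th bit position from two base hashes (double hashing). */
    private int index(String id, int i) {
        return Math.floorMod(id.hashCode() + i * fnv1a(id), numBits);
    }

    private static int fnv1a(String s) {
        int h = 0x811c9dc5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    /** Bytes that the application could persist and reload on restart. */
    byte[] snapshot() {
        return bits.toByteArray();
    }

    static SeenIdFilter restore(byte[] snapshot, int numBits, int numHashes) {
        return new SeenIdFilter(BitSet.valueOf(snapshot), numBits, numHashes);
    }

    public static void main(String[] args) {
        SeenIdFilter filter = new SeenIdFilter(1 << 16, 3);
        System.out.println(filter.checkAndAdd("id-1")); // first sighting
        System.out.println(filter.checkAndAdd("id-1")); // replayed record
        SeenIdFilter reloaded = restore(filter.snapshot(), 1 << 16, 3);
        System.out.println(reloaded.checkAndAdd("id-1")); // still known after reload
    }
}
```

The filter never forgets an id, so it only grows fuller over time; that is
the trade-off against the aging approach described next.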
> If you are certain that you will not reprocess data after a certain
> time (e.g. the same data can only reappear within the last 2 hours, never
> later), then you can also keep the uniqueIds in state and age them out
> after that time.
>
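A minimal sketch of keeping the uniqueIds as state and aging them out, in
plain Java with no Spark dependency so the logic can be run on its own. The
class name DedupWordCount, the two-hour window, and the caller-supplied
timestamps are assumptions made for illustration. In a real mapWithState
application the same bookkeeping would presumably live in the state itself,
for example keyed by uniqueId with StateSpec.timeout(...) doing the aging;
that mapping is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: word counts that skip records whose uniqueId was already seen. */
final class DedupWordCount {
    private final long ttlMillis;                              // how long an id is remembered
    private final Map<String, Long> seenIds = new HashMap<>(); // uniqueId -> time first seen
    private final Map<String, Long> counts = new HashMap<>();  // word -> running count

    DedupWordCount(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Applies one record; returns false if it was a duplicate and was skipped. */
    boolean process(String uniqueId, String line, long nowMillis) {
        expire(nowMillis);
        if (seenIds.putIfAbsent(uniqueId, nowMillis) != null) {
            return false; // already counted within the window
        }
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return true;
    }

    /** Ages out ids older than the window so the state stays bounded. */
    private void expire(long nowMillis) {
        seenIds.values().removeIf(firstSeen -> nowMillis - firstSeen >= ttlMillis);
    }

    long count(String word) {
        return counts.getOrDefault(word, 0L);
    }

    int trackedIds() {
        return seenIds.size();
    }

    public static void main(String[] args) {
        DedupWordCount wc = new DedupWordCount(2L * 60 * 60 * 1000); // 2-hour window
        wc.process("id-1", "hello world", 0L);
        wc.process("id-1", "hello world", 1_000L); // replayed line, ignored
        wc.process("id-2", "hello again", 2_000L);
        System.out.println("hello=" + wc.count("hello")); // hello=2
        System.out.println("world=" + wc.count("world")); // world=1
    }
}
```

A duplicate that arrives after its id has been aged out is counted again, so
this only works if the window really does cover every possible replay.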
>
> Best,
> Burak
>
> On Tue, Jan 24, 2017 at 9:53 PM, shyla deshpande <deshpandesh...@gmail.com
> > wrote:
>
>> Please share your thoughts.
>>
>> On Tue, Jan 24, 2017 at 4:01 PM, shyla deshpande <
>> deshpandesh...@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Jan 24, 2017 at 9:44 AM, shyla deshpande <
>>> deshpandesh...@gmail.com> wrote:
>>>
>>>> My streaming application stores a lot of aggregations using mapWithState.
>>>>
>>>> I want to know all the possible ways I can make it idempotent.
>>>>
>>>> Please share your views.
>>>>
>>>> Thanks
>>>>
>>>> On Mon, Jan 23, 2017 at 5:41 PM, shyla deshpande <
>>>> deshpandesh...@gmail.com> wrote:
>>>>
>>>>> In a word count application that stores the count of all the words
>>>>> input so far using mapWithState, how do I make sure my counts are not
>>>>> messed up if I happen to read a line more than once?
>>>>>
>>>>> Appreciate your response.
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>
>>
>
