Hi,

On Thu, Jan 29, 2015 at 1:54 AM, YaoPau <jonrgr...@gmail.com> wrote:
>
> My thinking is to maintain state in an RDD and update it and persist it with
> each 2-second pass, but this also seems like it could get messy.  Any
> thoughts or examples that might help me?
>

I have just implemented timestamp-based windowing on DStreams (I can't
share the code yet, but it should be published in a couple of months),
under the assumption that items arrive in timestamp order. The main
challenge, a rather technical one, was keeping proper state across RDD
boundaries and telling that state "you can mark the partial window from the
last interval as 'complete' now" without shuffling too much data around. For
example, if some intervals are empty, you don't know when the next item
belonging to the partial window will arrive, or whether one will arrive at
all. If you also want to tolerate out-of-order items, I guess it becomes
even trickier, because you need to define some timeout for the partial
windows held in your state...
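
To make the state handling a bit more concrete, here is a rough sketch in
plain Python. It only simulates micro-batches with lists; it does not use
the Spark API and it is not the implementation mentioned above. The names
WindowState and process_batch, and the fixed window size, are made up for
illustration. It assumes items arrive in timestamp order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Item = Tuple[int, str]  # (timestamp in seconds, payload); hypothetical shape


@dataclass
class WindowState:
    """Partial window carried over from one batch interval to the next."""
    window_start: Optional[int] = None
    items: List[Item] = field(default_factory=list)


def process_batch(state: WindowState, batch: List[Item], window_size: int):
    """Consume one micro-batch of in-order items.

    Returns (new_state, completed), where completed is a list of
    (window_start, items) pairs for windows that are known to be closed.
    """
    completed = []
    start, items = state.window_start, list(state.items)
    for ts, payload in batch:
        bucket = ts - ts % window_size  # start of the window this item falls in
        if start is not None and bucket != start:
            # An item from a later window arrived, so (given in-order input)
            # the carried-over window can be marked complete.
            completed.append((start, items))
            items = []
        start = bucket
        items.append((ts, payload))
    # Whatever is left is still partial and travels to the next interval.
    return WindowState(start, items), completed


if __name__ == "__main__":
    state = WindowState()
    # The second batch is an empty interval: nothing can be closed there.
    for batch in [[(0, "a"), (3, "b")], [], [(4, "c"), (11, "d")]]:
        state, done = process_batch(state, batch, window_size=5)
        print(done, state)
```

Note that the empty interval leaves the partial window open: only the
arrival of a later item closes it. If no later item ever arrives, the window
stays partial forever, which is where a timeout would come in.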

Tobias
