Hi,

sliding windows replicate their records for each window.
If you have use an incrementally aggregating function (ReduceFunction,
AggregateFunction) with a sliding, the space requirement should not be an
issue because each window stores a single value.
However, this also means that each window performs its aggregations
independently from the others. So, if you many concurrent sliding windows,
pre-aggregate the records in a tumbling window can reduce the computational
effort.

Best, Fabian



2017-12-12 8:10 GMT+01:00 Jinhua Luo <luajit...@gmail.com>:

> Hi All,
>
> Given one stream source which generates 20k events/sec, and I need to
> aggregate the element count using sliding window of 1 hour size.
>
> The problem is, the window may buffer too many elements (which may
> cause a lot of block I/O because of checkpointing?), and in fact it
> does not necessary to store them for one hour, because the elements
> should get folded incrementally. But unlike Tumbling Window, the
> sliding window would save elements for next window, right?
>
> So I am considering kind of workaround, should I chain two window like
> below:
>
>             .timeWindow(Time.minutes(1))
>             ...
>             .timeWindow(Time.hours(1), Time.minutes(1))
>
> Here the first window generate 1 minute aggregation units and the
> second window provides the sliding output.
>
> Any suggestions? Thanks.
>

Reply via email to