Hi Sachin,

We can optimize this problem in the following ways:

- use org.apache.flink.streaming.api.datastream.WindowedStream#aggregate(org.apache.flink.api.common.functions.AggregateFunction<T,ACC,R>) so that records are folded into a compact accumulator as they arrive, which reduces the amount of data held in state (see the sketch below)
- use state TTL to clean up data that is no longer needed
- enable incremental checkpoints
- use multi-level time window granularity for pre-aggregation, which can significantly improve performance and reduce computation latency (a rough two-stage sketch is also below)
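A minimal sketch of the first point, assuming the Data, ReducedData and KeySelector types from your pipeline; DataAggregator and DataAccumulator are illustrative names, and the accumulator stands in for whatever compact partial result your DataReducer computes:

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Illustrative: keep only a small summary per key and window in state,
// instead of buffering full records until the window fires.
public class DataAggregator implements AggregateFunction<Data, DataAccumulator, ReducedData> {

    @Override
    public DataAccumulator createAccumulator() {
        return new DataAccumulator();
    }

    @Override
    public DataAccumulator add(Data value, DataAccumulator acc) {
        // Fold each record into the accumulator as it arrives; only the
        // accumulator is held in window state, not the raw records.
        acc.update(value);
        return acc;
    }

    @Override
    public ReducedData getResult(DataAccumulator acc) {
        return acc.toReducedData();
    }

    @Override
    public DataAccumulator merge(DataAccumulator a, DataAccumulator b) {
        return a.combine(b);
    }
}

// Usage in the pipeline, replacing reduce() with aggregate():
SingleOutputStreamOperator<ReducedData> reducedData =
    data
        .keyBy(new KeySelector())
        .window(TumblingEventTimeWindows.of(Time.seconds(secs)))
        .aggregate(new DataAggregator())
        .name("aggregate");

For the multi-level window idea, a hypothetical two-stage layout (Partial, PartialAggregator and PartialCombiner are placeholders) would look roughly like this:

// Stage 1: pre-aggregate raw records into 5-minute partial results per key.
SingleOutputStreamOperator<Partial> partials =
    data
        .keyBy(new KeySelector())
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        .aggregate(new PartialAggregator());

// Stage 2: the 1-hour window then only combines 12 small partials per key
// instead of every raw record.
SingleOutputStreamOperator<ReducedData> hourly =
    partials
        .keyBy(p -> p.getKey())
        .window(TumblingEventTimeWindows.of(Time.hours(1)))
        .aggregate(new PartialCombiner());

Incremental checkpoints can be enabled with state.backend.incremental: true (or new EmbeddedRocksDBStateBackend(true)) when the RocksDB state backend is used, so each checkpoint only uploads the changed files instead of the full state.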
Best,
Zhongqiang Gong

Sachin Mittal <sjmit...@gmail.com> wrote on Fri, May 17, 2024 at 03:48:
> Hi,
> My pipeline step is something like this:
>
> SingleOutputStreamOperator<ReducedData> reducedData =
>     data
>         .keyBy(new KeySelector())
>         .window(
>             TumblingEventTimeWindows.of(Time.seconds(secs)))
>         .reduce(new DataReducer())
>         .name("reduce");
>
> This works fine for secs = 300.
> However, once I increase the time window to, say, 1 hour (3600 seconds), the
> state size increases as it now has a lot more records to reduce.
>
> Hence I need to allocate much more memory to the task manager.
>
> However, there is no upper limit to this memory allocation. If the volume of
> data increases by, say, 10 fold, I would have no option but to again increase
> the memory.
>
> Is there a better way to perform long window aggregation so that overall this
> step has a small memory footprint?
>
> Thanks,
> Sachin