Hi Sachin,

We can optimize this problem in the following ways:

- use org.apache.flink.streaming.api.datastream.WindowedStream#aggregate(org.apache.flink.api.common.functions.AggregateFunction<T,ACC,R>) so that records are folded into a compact accumulator as they arrive, which reduces the amount of data held in state (see the sketch below)
- use state TTL to clean up data that is no longer needed
- enable incremental checkpoints
- use multi-level time window granularity for pre-aggregation, which can significantly improve performance and reduce computation latency (a rough two-stage sketch is also below)
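A minimal sketch of the first point, assuming the Data, ReducedData and KeySelector types from your pipeline; DataAggregator and DataAccumulator are illustrative names, and the accumulator stands in for whatever compact partial result your DataReducer computes:

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Illustrative: keep only a small summary per key and window in state,
// instead of buffering full records until the window fires.
public class DataAggregator implements AggregateFunction<Data, DataAccumulator, ReducedData> {

    @Override
    public DataAccumulator createAccumulator() {
        return new DataAccumulator();
    }

    @Override
    public DataAccumulator add(Data value, DataAccumulator acc) {
        // Fold each record into the accumulator as it arrives; only the
        // accumulator is held in window state, not the raw records.
        acc.update(value);
        return acc;
    }

    @Override
    public ReducedData getResult(DataAccumulator acc) {
        return acc.toReducedData();
    }

    @Override
    public DataAccumulator merge(DataAccumulator a, DataAccumulator b) {
        return a.combine(b);
    }
}

// Usage in the pipeline, replacing reduce() with aggregate():
SingleOutputStreamOperator<ReducedData> reducedData =
    data
        .keyBy(new KeySelector())
        .window(TumblingEventTimeWindows.of(Time.seconds(secs)))
        .aggregate(new DataAggregator())
        .name("aggregate");

For the multi-level window idea, a hypothetical two-stage layout (Partial, PartialAggregator and PartialCombiner are placeholders) would look roughly like this:

// Stage 1: pre-aggregate raw records into 5-minute partial results per key.
SingleOutputStreamOperator<Partial> partials =
    data
        .keyBy(new KeySelector())
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        .aggregate(new PartialAggregator());

// Stage 2: the 1-hour window then only combines 12 small partials per key
// instead of every raw record.
SingleOutputStreamOperator<ReducedData> hourly =
    partials
        .keyBy(p -> p.getKey())
        .window(TumblingEventTimeWindows.of(Time.hours(1)))
        .aggregate(new PartialCombiner());

Incremental checkpoints can be enabled with state.backend.incremental: true (or new EmbeddedRocksDBStateBackend(true)) when the RocksDB state backend is used, so each checkpoint only uploads the changed files instead of the full state.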
Best,
Zhongqiang Gong

Sachin Mittal <sjmit...@gmail.com> wrote on Fri, May 17, 2024 at 03:48:
> Hi,
> My pipeline step is something like this:
>
> SingleOutputStreamOperator<ReducedData> reducedData =
>     data
>         .keyBy(new KeySelector())
>         .window(
>             TumblingEventTimeWindows.of(Time.seconds(secs)))
>         .reduce(new DataReducer())
>         .name("reduce");
>
> This works fine for secs = 300.
> However, once I increase the time window to, say, 1 hour (3600 seconds), the
> state size increases as it now has a lot more records to reduce.
>
> Hence I need to allocate much more memory to the task manager.
>
> However, there is no upper limit to this memory allocation. If the volume of
> data increases by, say, 10 fold, I would have no option but to again increase
> the memory.
>
> Is there a better way to perform long window aggregation so that overall this
> step has a small memory footprint?
>
> Thanks,
> Sachin