Hi,

Flink will have to maintain state of the defined aggregations per each window 
and key (the more names you have, the bigger the state). Flinkā€™s state backend 
will be used for that (for example memory or rocksdb).

However in most cases state will be small and not dependent on the length of 
the window, but only on number of keys. In your case per each key (name) only 
one counter will be maintained. Same applies to sums and averages (averages 
will use counter and sum).

There is no magic way to deal with too large state. Either add more RAM to the 
cluster, fallback to using disks or rewrite your query/application so it will 
not need that large state.

Piotrek

> On 23 Nov 2017, at 20:23, Shivam Sharma <28shivamsha...@gmail.com> wrote:
> 
> Hi All,
> 
> I have a small question regarding where does Flink stores data for doing 
> window aggregations. Lets say I am running following query on Flink table:
> 
> SELECT name, count(*)
> FROM testTable
> GROUP BY TUMBLE(rowtime, INTERVAL '1' MINUTE), name
> 
> So, If I understand above query properly so it must be saving data for 1 
> minute somewhere to find aggregations. If Flink is persisting this in memory 
> then my concern is if I increase interval to a DAY or more then it will store 
> the complete data for interval which can cross memory. If persistence is disk 
> then latency will be there.
> 
> Basically how do we solve such kind of use-cases using FLINK where 
> aggregation interval are quite high.
> 
> Thanks in advance
> 
> -- 
> Shivam Sharma
> 

Reply via email to