Re: Fwd: Incremental aggregation using Fold and failure recovery

2017-06-29 Thread Ahmad Hassan
Thanks a lot Gordon!

On 29 June 2017 at 13:39, Tzu-Li (Gordon) Tai wrote:
> Sorry, one typo.
>
> public AverageAccumulator merge(WindowStats a, WindowStats b) {
>
> should be:
>
> public WindowStats merge(WindowStats a, WindowStats b) {

Re: Fwd: Incremental aggregation using Fold and failure recovery

2017-06-29 Thread Tzu-Li (Gordon) Tai
Sorry, one typo.

public AverageAccumulator merge(WindowStats a, WindowStats b) {

should be:

public WindowStats merge(WindowStats a, WindowStats b) {

On 29 June 2017 at 8:22:34 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote:
> I see. Then yes, a fold operation would be more efficient here.

Re: Fwd: Incremental aggregation using Fold and failure recovery

2017-06-29 Thread Tzu-Li (Gordon) Tai
I see. Then yes, a fold operation would be more efficient here.

> However, can you give an idea on how to use AggregateFunction in the latest Flink to replace the following fold function?

Sure! The documentation for 1.3 still lags a bit behind for some of the new features, but the Javadoc for
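A rough sketch of the shape such a replacement takes. The interface below is modeled on Flink's AggregateFunction contract (createAccumulator/add/getResult/merge) but is inlined here so the example stands alone without the Flink dependency; exact method signatures vary between Flink versions, and WindowStats is reduced to a count/sum pair as a stand-in for the real statistics object:

```java
// Stand-in for Flink's AggregateFunction contract; the real interface
// lives in org.apache.flink.api.common.functions. Inlined so the sketch
// compiles on its own.
interface SimpleAggregateFunction<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC accumulator);
    OUT getResult(ACC accumulator);
    ACC merge(ACC a, ACC b);
}

// Simplified accumulator; the real WindowStats holds per-product statistics.
class WindowStats {
    long count;
    double sum;
}

class StatsAggregate implements SimpleAggregateFunction<Double, WindowStats, WindowStats> {
    public WindowStats createAccumulator() {
        return new WindowStats();
    }

    public WindowStats add(Double value, WindowStats acc) {
        acc.count++;            // incremental update per event, like fold
        acc.sum += value;
        return acc;
    }

    public WindowStats getResult(WindowStats acc) {
        return acc;
    }

    // The corrected signature from the thread: merge returns the
    // accumulator type (WindowStats), not a different accumulator class.
    public WindowStats merge(WindowStats a, WindowStats b) {
        a.count += b.count;
        a.sum += b.sum;
        return a;
    }
}
```

With the real API, an instance of such a function is passed to the windowed stream's aggregate call in place of the fold call.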

Re: Fwd: Incremental aggregation using Fold and failure recovery

2017-06-29 Thread Ahmad Hassan
Hi Gordon,

Thanks for the details. I am using fold to process events and maintain statistics per product ID within a WindowStats instance. Fold is much more efficient here because events can number in the millions, but unique products will be fewer than 50k. However, if I use a generic window function, it will
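The efficiency argument above can be sketched as a per-product map inside the accumulator: each incoming event updates one map entry, so the state held per window is bounded by the number of distinct products (~50k) rather than the number of events (millions). All names here are illustrative, not the poster's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-product statistics entry.
class PerProductStats {
    long count;
    double sum;
}

// Hypothetical WindowStats-style accumulator: one map entry per distinct
// product, updated incrementally as events are folded in.
class ProductWindowStats {
    final Map<String, PerProductStats> byProduct = new HashMap<>();

    ProductWindowStats addEvent(String productId, double value) {
        PerProductStats s =
            byProduct.computeIfAbsent(productId, k -> new PerProductStats());
        s.count++;
        s.sum += value;
        return this;   // returning the accumulator matches the fold contract
    }
}
```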

Re: Fwd: Incremental aggregation using Fold and failure recovery

2017-06-29 Thread Tzu-Li (Gordon) Tai
Hi Ahmad,

Yes, that is correct. The aggregated fold value (i.e. your WindowStats instance) will be checkpointed by Flink as managed state, and restored from the last completed checkpoint in case of failure. One comment on using the fold function: if what you're essentially doing in the fold is
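The fold contract under discussion can be sketched without the Flink dependency (the interface is inlined, the accumulator is reduced to an event counter, and all names are illustrative). The key point from the message above is that the object returned by fold is exactly the value Flink snapshots as managed window state and hands back on restore:

```java
// Inlined stand-in for Flink's FoldFunction contract; the real interface
// lives in org.apache.flink.api.common.functions.
interface SimpleFoldFunction<ACC, IN> {
    ACC fold(ACC accumulator, IN value);
}

// Illustrative accumulator: this is the object that would be
// checkpointed as managed state and restored after a failure.
class EventCountStats {
    long events;
}

class CountingFold implements SimpleFoldFunction<EventCountStats, String> {
    public EventCountStats fold(EventCountStats acc, String event) {
        acc.events++;   // incremental update, one event at a time
        return acc;     // the returned accumulator is the window's state
    }
}
```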

Fwd: Incremental aggregation using Fold and failure recovery

2017-06-29 Thread Ahmad Hassan
Any thoughts on this problem please?

Hi All,

I am collecting millions of events per 24 hours for 'N' products, where 'N' can be 50k. I use the following fold mechanism with a sliding window:

final DataStream eventStream = inputStream
    .keyBy(TENANT, CATEGORY)
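With a sliding window, each event is assigned to size/slide overlapping windows, which is why the per-window accumulator (rather than buffering raw events) matters at this volume. A minimal sketch of that assignment, simplified from what a sliding event-time window assigner does (names here are illustrative, not Flink API):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sliding-window assignment: an event with a given timestamp
// belongs to every window of length `size` that starts at a multiple of
// `slide` and still covers the timestamp. Each window is returned as a
// [start, end) pair of millisecond timestamps.
class SlidingWindowAssigner {
    static List<long[]> assign(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }
}
```

For example, with a 10-minute window sliding every 5 minutes, every event lands in 10/5 = 2 windows, so each event updates two accumulators.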