Hi,
the name pre-aggregation is a bit misleading. I have started calling it 
incremental aggregation because it does not work like a combiner.

What it does is to incrementally fold (or reduce) elements as they arrive at 
the window operator. This reduces the amount of required space, because, 
otherwise, all the elements would have to be stored before the window is 
triggered. When using an incremental fold (or reduce) the WindowFunction only 
get’s the one final result of the incremental aggregation.

Cheers,
Aljoscha
> On 17 Feb 2016, at 09:27, Stefano Baghino <stefano.bagh...@radicalbit.io> 
> wrote:
> 
> Hi Nirmalaya,
> 
> my reply was based on me misreading your original post, thinking you had a 
> batch of data, not a stream. I see that the apply method can also take a 
> reducer the pre-aggregates your data before passing it to the window 
> function. I suspect that pre-aggregation runs locally just like a combiner 
> would, but I'm really not sure about it. We should have more feedback on this 
> regard.
> 
> On Tue, Feb 16, 2016 at 2:19 AM, Nirmalya Sengupta 
> <sengupta.nirma...@gmail.com> wrote:
> Hello Stefano <stefano.bagh...@radicalbit.io>
> 
> Sorry for the late reply. Many thanks for taking effort to write and share an 
> example code snippet.
> 
> I have been playing with the countWindow behaviour for some weeks now and I 
> am generally aware of the functionality of countWindowAll(). For my useCase, 
> where I _have to observe_ the entire stream as it founts in, using 
> countWindowAll() is probably the most obvious solution. This is what you 
> recommend too. However, because this is going to use 1 thread only (or 1 node 
> only in a cluster), I was thinking about ways to make use of the 
> 'distributedness' of the framework. Hence, my question.
> 
> Your reply leads to me read and think a bit more. If I have to use 
> parallelism to achieve what I want to achieve, I think managing a ValueState 
> of my own is possibly the solution. If you have any other thoughts, please 
> share. 
> 
> From your  earlier response: '... you can still enjoy a high level of 
> parallelism up until the last operator by using a combiner, which is 
> basically a reducer that operates locally ...'. Could you elaborate this a 
> bit, whenever you have time?
> 
> -- Nirmalya
> 
> -- 
> Software Technologist
> http://www.linkedin.com/in/nirmalyasengupta
> "If you have built castles in the air, your work need not be lost. That is 
> where they should be.
> Now put the foundation under them."
> 
> 
> 
> -- 
> BR,
> Stefano Baghino
> 
> Software Engineer @ Radicalbit

Reply via email to