Combiners in streaming are a bit tricky because of their semantics:

1) Combiners always hold data back for the pre-aggregation. That adds
latency, and it also means the values do not reach the actual windows
immediately, where a trigger may expect them.

2) In batch, a combiner combines as long as there is input data, or as long
as there is space in the buffer. In streaming you would need to define
something like "combine, but do not hold back longer than 5 seconds" to
control the latency impact. Holding data back for a limited time makes the
combiner less effective (it combines fewer elements).
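Point 2 can be illustrated with a small, framework-independent sketch. It is Python rather than Flink API, and the class `BoundedCombiner` and its parameters `max_hold` and `max_keys` are hypothetical. It pre-aggregates per key but emits a partial result once that partial has been held back for `max_hold` time units, or once the buffer exceeds `max_keys` entries. Time is passed in explicitly so the example stays deterministic.

```python
from typing import Any, Callable, Dict, List, Tuple


class BoundedCombiner:
    """Hypothetical pre-aggregating buffer with a bound on how long it holds data."""

    def __init__(self, combine: Callable[[Any, Any], Any], max_hold: int, max_keys: int):
        self.combine = combine
        self.max_hold = max_hold  # max time a partial may be held back
        self.max_keys = max_keys  # max number of keys buffered at once
        # key -> (partial aggregate, time the first element of that partial arrived)
        self.buffer: Dict[str, Tuple[Any, int]] = {}

    def add(self, key: str, value: Any, now: int) -> List[Tuple[str, Any]]:
        """Combine one element into the buffer; return any partials flushed."""
        if key in self.buffer:
            acc, since = self.buffer[key]
            self.buffer[key] = (self.combine(acc, value), since)
        else:
            self.buffer[key] = (value, now)
        return self._flush(now)

    def _flush(self, now: int) -> List[Tuple[str, Any]]:
        out = []
        # Emit every partial that has been held back for max_hold or longer.
        expired = [k for k, (_, since) in self.buffer.items() if now - since >= self.max_hold]
        for k in expired:
            out.append((k, self.buffer.pop(k)[0]))
        # If the buffer is still over capacity, emit the oldest partials first.
        while len(self.buffer) > self.max_keys:
            oldest = min(self.buffer, key=lambda k: self.buffer[k][1])
            out.append((oldest, self.buffer.pop(oldest)[0]))
        return out

    def flush_all(self) -> List[Tuple[str, Any]]:
        """Emit everything that is still buffered, e.g. at the end of input."""
        out = [(k, acc) for k, (acc, _) in self.buffer.items()]
        self.buffer.clear()
        return out


def run(max_hold: int) -> List[Tuple[str, Any]]:
    """Feed 100 elements for one key, one per time unit, and collect the output."""
    c = BoundedCombiner(lambda a, b: a + b, max_hold=max_hold, max_keys=1000)
    emitted: List[Tuple[str, Any]] = []
    for t in range(100):
        emitted += c.add("sensor-1", 1, now=t)
    emitted += c.flush_all()
    return emitted
```

With one element per time unit, `run(5)` emits 17 partial records for the 100 inputs, while `run(50)` emits only 2: the tighter bound on hold time lowers latency but combines fewer elements. Note that this sketch only checks for expired partials when a new element arrives; a real operator would also need a timer to flush when the input goes quiet.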

I think these two points limit the benefit of combiners in streaming. There
are cases where they may still help, but I think there are far fewer of them
than in batch.
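One case where combining does pay off is the average discussed further down in this thread, when elements are assigned to windows by their own timestamps, as Aljoscha describes. The sketch below is Python and framework-independent; `local_combine`, `merge`, and the window length `WINDOW_SIZE` are hypothetical. Each parallel instance pre-aggregates into (sum, count) pairs per key and event-time window, and the pairs are merged downstream. It assumes the merge step sees all partials for a window, e.g. after the watermark has passed the window end.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SIZE = 10  # hypothetical tumbling event-time window length

Event = Tuple[str, int, float]  # (key, event timestamp, temperature)
Partial = Dict[Tuple[str, int], Tuple[float, int]]  # (key, window start) -> (sum, count)


def window_start(ts: int) -> int:
    # The window comes from the element's own timestamp, so it does not depend
    # on which instance sees the element or when it is processed.
    return ts - ts % WINDOW_SIZE


def local_combine(events: Iterable[Event]) -> Partial:
    """Step 1: pre-aggregate on one parallel instance into (sum, count)."""
    acc = defaultdict(lambda: (0.0, 0))
    for key, ts, temp in events:
        s, c = acc[(key, window_start(ts))]
        acc[(key, window_start(ts))] = (s + temp, c + 1)
    return dict(acc)


def merge(partials: Iterable[Partial]) -> Dict[Tuple[str, int], float]:
    """Step 2: merge the partials per (key, window) and finish the average."""
    total = defaultdict(lambda: (0.0, 0))
    for part in partials:
        for k, (s, c) in part.items():
            s0, c0 = total[k]
            total[k] = (s0 + s, c0 + c)
    return {k: s / c for k, (s, c) in total.items()}


events = [("room", 1, 20.0), ("room", 4, 22.0), ("room", 12, 30.0),
          ("room", 7, 24.0), ("room", 15, 26.0)]
# Split the stream arbitrarily across two instances, out of timestamp order.
split = merge([local_combine(events[:2]), local_combine(events[2:])])
single = merge([local_combine(events)])
```

Both the split run and the single-instance run produce {("room", 0): 22.0, ("room", 10): 28.0}. The partials carry (sum, count) rather than averages, because an average of averages is not the overall average when the instances saw different numbers of elements.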

Stephan


On Thu, Feb 18, 2016 at 11:03 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> They would be awesome, but it’s not yet possible in Flink Streaming, I’m
> afraid.
>
> > On 18 Feb 2016, at 10:59, Stefano Baghino <stefano.bagh...@radicalbit.io>
> > wrote:
> >
> > I think combiners are pretty awesome in certain cases to minimize
> > network usage (the average use case seems to fit perfectly). Maybe it would
> > be worthwhile adding a detailed description of the approach to the docs?
> >
> > On Thu, Feb 18, 2016 at 10:47 AM, Aljoscha Krettek <aljos...@apache.org>
> > wrote:
> > @Nirmalya: Yes, this is right if your temperatures don’t have any other
> > field on which you could partition them.
> >
> > @Stefano: Under some circumstances it would be possible to use a
> > combiner (I’m using the name as Hadoop MapReduce would use it here). When
> > the assignment of elements to windows is based on the timestamps in the
> > elements and window triggering is based on the watermark, it is possible to
> > combine locally. The reason is that the elements will end up in the same
> > windows regardless of the time at which the window is processed, so it can
> > be done in two steps. Does that make sense? It’s a very ad-hoc description
> > and I could make up a drawing or something if that helped. :D
> >
> >
> > > On 18 Feb 2016, at 10:04, Stefano Baghino
> > > <stefano.bagh...@radicalbit.io> wrote:
> > >
> > > Thanks, Aljoscha, for the explanation. Isn't there a way to apply the
> > > concept of the combiner to a streaming process?
> > >
> > > On Thu, Feb 18, 2016 at 3:56 AM, Nirmalya Sengupta
> > > <sengupta.nirma...@gmail.com> wrote:
> > > Hello Aljoscha  <aljos...@apache.org>
> > >
> > > Thanks very much for clarifying the role of Pre-Aggregation (rather,
> > > Incr-Aggregation, now that I understand the intention). It helps me
> > > understand. Thanks to Stefano too, for pursuing my original question.
> > >
> > > My current understanding is that if I have to compute the average of a
> > > streaming set of _temperatures_, then the *best* way to accomplish this is
> > > by employing one node (or thread, on my laptop), losing speed but gaining
> > > deterministic behaviour in the process. I can decide to capture the average
> > > either by grouping the temperatures by count or by time. Because I am
> > > sliding the window anyway, I don't run the risk of elements accumulating
> > > in the window and overrunning the buffer.
> > >
> > > Could you please confirm whether my understanding is correct? I feel happy
> > > if I 'understand' the basis of a design well! :-)
> > >
> > > --  Nirmalya
> > > --
> > > Software Technologist
> > > http://www.linkedin.com/in/nirmalyasengupta
> > > "If you have built castles in the air, your work need not be lost.
> > > That is where they should be.
> > > Now put the foundation under them."
> > >
> > >
> > >
> > > --
> > > BR,
> > > Stefano Baghino
> > >
> > > Software Engineer @ Radicalbit
> >
> >
> >
> >
> > --
> > BR,
> > Stefano Baghino
> >
> > Software Engineer @ Radicalbit
>
>
