Hi Nirmalya,
if you want to calculate the running average over all measurements
independent of the probe ID, then you cannot parallelize the computation.
In this case you have to use a global window.
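A minimal sketch of what this could look like, assuming the Flink Scala DataStream API; the `Reading` case class, its fields, and the example values are hypothetical, loosely based on the schema mentioned later in this thread. It needs a Flink dependency and runtime to execute:

```scala
// Sketch only: a running average over ALL probes via a global window.
// This stage cannot be parallelized, exactly as described above.
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger

case class Reading(probeID: String, ambientTemperature: Double)

val env = StreamExecutionEnvironment.getExecutionEnvironment
val readings: DataStream[Reading] = env.fromElements(
  Reading("p1", 20.0), Reading("p2", 22.0), Reading("p1", 24.0))

val runningAvg = readings
  .map(r => (r.ambientTemperature, 1L))          // (sum, count) pair
  .windowAll(GlobalWindows.create())             // one window, no key
  .trigger(CountTrigger.of(1))                   // fire on every element
  .reduce((a, b) => (a._1 + b._1, a._2 + b._2))  // accumulate sum/count
  .map(sc => sc._1 / sc._2)                      // emit running average
```

`CountTrigger.of(1)` fires (without purging) after each element, so the reduce state keeps growing and every emission reflects all measurements seen so far.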
Cheers,
Till
On Feb 19, 2016 6:30 PM, "Nirmalya Sengupta"
wrote:
Hello Aljoscha ,
My sincere apologies at the beginning, if I seem to repeat the same
question, almost interminably. If it is frustrating you, I ask for your
patience, but I really want to nail it down in my mind. :-)
The point about parallelizing is well taken. I understand why the stream
should be bro
Hi,
as I understand it the “temp_reading_timestamp” field is not a key on which you
can partition your data. This is a field that would be used for assigning the
elements to timestamps. In your data you also have the “probeID” field. This is
a field that could be used to parallelize computation,
Hello Aljoscha ,
You mentioned: '.. Yes, this is right if your temperatures don’t have any
other field on which you could partition them. '.
What I am failing to understand is that if temperatures are partitioned on
some other field (in my use-case, I have one such: the
temp_reading_timestamp), th
Combiners in streaming are a bit tricky, from their semantics:
1) Combiners always hold data back through pre-aggregation. That adds
latency and also means the values are not in the actual windows
immediately, where a trigger may expect them.
2) In batch, a combiner combines as long as there
They would be awesome, but it’s not yet possible in Flink Streaming, I’m afraid.
> On 18 Feb 2016, at 10:59, Stefano Baghino
> wrote:
I think combiners are pretty awesome for certain cases to minimize network
usage (the average use case seems to fit perfectly), maybe it would be
worthwhile adding a detailed description of the approach to the docs?
On Thu, Feb 18, 2016 at 10:47 AM, Aljoscha Krettek
wrote:
@Nirmalya: Yes, this is right if your temperatures don’t have any other field on
which you could partition them.
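A hedged sketch of that partitioned variant, assuming the Flink Scala DataStream API; the `(probeID, temperature)` tuple shape, the example values, and the window size are hypothetical. It needs a Flink runtime to execute:

```scala
// Sketch only: keying by probeID lets Flink compute one independent
// average per probe, so the computation can run in parallel.
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Hypothetical input: (probeID, temperature)
val readings: DataStream[(String, Double)] =
  env.fromElements(("p1", 20.0), ("p2", 22.0), ("p1", 24.0))

val perProbeAvg = readings
  .map(r => (r._1, r._2, 1L))                        // (id, sum, count)
  .keyBy(_._1)                                       // partition by probeID
  .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
  .reduce((a, b) => (a._1, a._2 + b._2, a._3 + b._3))
  .map(t => (t._1, t._2 / t._3))                     // per-probe average
```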
@Stefano: Under some circumstances it would be possible to use a combiner
(I’m using the name as Hadoop MapReduce would use it, here). When the
assignment of elements to windows hap
Thanks, Aljoscha, for the explanation. Isn't there a way to apply the
concept of the combiner to a streaming process?
On Thu, Feb 18, 2016 at 3:56 AM, Nirmalya Sengupta <
sengupta.nirma...@gmail.com> wrote:
Hello Aljoscha
Thanks very much for clarifying the role of Pre-Aggregation (rather,
Incr-Aggregation, now that I understand the intention). It helps me
understand. Thanks to Stefano too, for keeping at my original
question.
My current understanding is that if I have to compute the
Hi,
the name pre-aggregation is a bit misleading. I have started calling it
incremental aggregation because it does not work like a combiner.
What it does is incrementally fold (or reduce) elements as they arrive at
the window operator. This reduces the amount of required space, because,
oth
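The space-saving point can be illustrated without Flink at all. A plain-Scala sketch of buffering versus incremental folding (the temperature values are made up):

```scala
// Incremental aggregation keeps only a tiny accumulator (sum, count),
// whereas a buffering window must hold every element until it fires.
val temps = List(20.0, 22.0, 21.0, 23.0)   // made-up readings

// Buffering: all four elements are retained, then aggregated at once.
val buffered = temps.sum / temps.size

// Incremental: each element is folded into the accumulator on arrival,
// so the state stays O(1) regardless of window size.
val (sum, count) = temps.foldLeft((0.0, 0)) {
  case ((s, c), t) => (s + t, c + 1)
}
val incremental = sum / count

// Both approaches yield the same average, 21.5.
```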
Hi Nirmalya,
my reply was based on me misreading your original post, thinking you had a
batch of data, not a stream. I see that the apply method can also take a
reducer that pre-aggregates your data before passing it to the window
function. I suspect that pre-aggregation runs locally just like a c
Hello Stefano
Sorry for the late reply. Many thanks for taking effort to write and share
an example code snippet.
I have been playing with the countWindow behaviour for some weeks now and I
am generally aware of the functionality of countWindowAll(). For my
use case, where I _have to observe_ the
Hello again Nirmalya,
in this case you are keying by timestamp; think of keying as grouping: this
means that elements are grouped together according to their timestamp. I
misread your original post but now that I see the code I understand your
problem.
I've put some code here:
https://github.com/
Hello Stefano
I have tried to implement what I understood from your mail earlier in the
day, but it doesn't seem to produce the result I expect. Here's the code
snippet:
-
val env = StreamExecutionEnvironment.createLocalEnvironment
Hello Stefano
Many thanks for responding so quickly.
Your explanation not only confirms my understanding but gives a much
simpler solution. The facility of associating a specific parallelism to a
given operator didn't strike me at all. You are right that for my
particular use case, that is the si
Hi Nirmalya,
I think for your use case in particular it's enough to specify the reducer
that computes the average to have a parallelism of 1 by calling the
`setParallelism` API when you apply it. Keep in mind that you can still
enjoy a high level of parallelism up until the last operator by using
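A sketch of that idea, assuming the Flink Scala DataStream API; the input stream, the artificial key, and the parallelism value are hypothetical. It needs a Flink runtime to execute:

```scala
// Sketch only: upstream stages stay parallel, while the final
// averaging step effectively runs as a single instance.
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val readings: DataStream[Double] = env.fromElements(20.0, 22.0, 24.0)

val runningAvg = readings
  .map(t => (t, 1L)).setParallelism(4)          // parallel pre-processing
  .keyBy(_ => 0)                                // one artificial key: all
                                                // elements meet in one task
  .reduce((a, b) => (a._1 + b._1, a._2 + b._2)) // rolling (sum, count)
  .map(sc => sc._1 / sc._2)                     // running average, serial
```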
Hello Flinksters,
This is perhaps too trivial for most here in this forum, but I want to have
my understanding clear.
I want to find the average of temperatures coming in as a stream. The tuple
has the following fields:
probeID,readingTimeStamp,radiationLevel,photoSensor,humidity,ambientTemperatu