Re: Finding the average temperature

2016-02-21 Thread Till Rohrmann
Hi Nirmalya, if you want to calculate the running average over all measurements independent of the probe ID, then you cannot parallelize the computation. In this case you have to use a global window. Cheers, Till
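
A minimal sketch of why this case is sequential, in plain Scala with no Flink dependency. The `Reading` type, its field names, and `runningAverages` are hypothetical names chosen for illustration, not taken from the thread. Every output depends on all readings seen so far, regardless of probe, so a single accumulator has to see the whole stream in order:

```scala
object GlobalRunningAverage {
  // Hypothetical reading type; the field names are assumptions for illustration.
  final case class Reading(probeId: String, timestamp: Long, temperature: Double)

  // Running average after each reading, across all probes.
  // One (sum, count) accumulator is threaded through every element in order,
  // which is why the work cannot be split across parallel instances.
  def runningAverages(readings: Seq[Reading]): Seq[Double] =
    readings
      .scanLeft((0.0, 0L)) { case ((sum, count), r) => (sum + r.temperature, count + 1) }
      .tail // drop the initial empty accumulator
      .map { case (sum, count) => sum / count }
}
```

In Flink terms this corresponds to a non-keyed (global) window evaluated by a single operator instance; the sketch only illustrates the data dependency, not the Flink API.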

Re: Finding the average temperature

2016-02-19 Thread Nirmalya Sengupta
Hello Aljoscha, My sincere apologies at the beginning, if I seem to repeat the same question, almost interminably. If it is frustrating you, I seek your patience, but I really want to nail it down in my mind. :-) The point about parallelizing is well taken. I understand why

Re: Finding the average temperature

2016-02-19 Thread Aljoscha Krettek
Hi, as I understand it, the “temp_reading_timestamp” field is not a key on which you can partition your data. This is a field that would be used for assigning timestamps to the elements. In your data you also have the “probeID” field. This is a field that could be used to parallelize computation,
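
A minimal sketch of the distinction drawn here, in plain Scala with no Flink dependency. The `Reading` type and `averageByProbe` are hypothetical names for illustration. Grouping by probe ID produces independent groups, and each group's average can be computed without looking at any other group, which is what makes that field usable as a partitioning key:

```scala
object PerProbeAverage {
  // Hypothetical reading type; the field names are assumptions for illustration.
  final case class Reading(probeId: String, timestamp: Long, temperature: Double)

  // Average temperature per probe. Each group is self-contained,
  // so different probes could be handled by different parallel workers.
  def averageByProbe(readings: Seq[Reading]): Map[String, Double] =
    readings.groupBy(_.probeId).map { case (id, rs) =>
      id -> (rs.map(_.temperature).sum / rs.size)
    }
}
```

Grouping by the timestamp field instead would typically put each reading in a group of its own, so there would be nothing to average within a group.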

Re: Finding the average temperature

2016-02-18 Thread Nirmalya Sengupta
Hello Aljoscha, You mentioned: '.. Yes, this is right if your temperatures don’t have any other field on which you could partition them.' What I am failing to understand is that if temperatures are partitioned on some other field (in my use-case, I have one such: the

Re: Finding the average temperature

2016-02-18 Thread Stephan Ewen
Combiners in streaming are a bit tricky, from their semantics: 1) Combiners always hold data back, through the preaggregation. That adds latency and also means the values are not in the actual windows immediately, where a trigger may expect them. 2) In batch, a combiner combines as long as there

Re: Finding the average temperature

2016-02-18 Thread Aljoscha Krettek
They would be awesome, but it’s not yet possible in Flink Streaming, I’m afraid.

Re: Finding the average temperature

2016-02-18 Thread Stefano Baghino
I think combiners are pretty awesome for certain cases to minimize network usage (the average use case seems to fit perfectly). Maybe it would be worthwhile adding a detailed description of the approach to the docs?
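
A minimal sketch of the combiner idea for averages, in plain Scala with no Flink dependency (the other messages in this thread state that Flink Streaming did not support combiners at the time). `Partial`, `preAggregate`, and `combine` are hypothetical names for illustration. Each partition ships a small (sum, count) pair instead of its raw readings, and the pairs are merged downstream:

```scala
object AverageCombiner {
  // Partial aggregate for an average: carrying sum and count (rather than
  // a partial average) keeps the merge exact for partitions of unequal size.
  final case class Partial(sum: Double, count: Long) {
    def add(temperature: Double): Partial = Partial(sum + temperature, count + 1)
    def merge(other: Partial): Partial = Partial(sum + other.sum, count + other.count)
    def average: Option[Double] = if (count == 0) None else Some(sum / count)
  }

  val empty: Partial = Partial(0.0, 0L)

  // Runs locally on one partition, before anything crosses the network.
  def preAggregate(temperatures: Seq[Double]): Partial =
    temperatures.foldLeft(empty)((p, t) => p.add(t))

  // Runs downstream on the partial aggregates only.
  def combine(partials: Seq[Partial]): Partial =
    partials.foldLeft(empty)((a, b) => a.merge(b))
}
```

For example, partitions holding `Seq(10.0, 20.0)` and `Seq(30.0)` combine to an average of 20.0, whereas averaging the two partial averages (15.0 and 30.0) would give 22.5.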

Re: Finding the average temperature

2016-02-18 Thread Aljoscha Krettek
@Nirmalya: Yes, this is right if your temperatures don’t have any other field on which you could partition them. @Stefano: Under some circumstances it would be possible to use a combiner (I’m using the name as Hadoop MapReduce would use it, here). When the assignment of elements to windows

Re: Finding the average temperature

2016-02-17 Thread Nirmalya Sengupta
Hello Aljoscha, Thanks very much for clarifying the role of Pre-Aggregation (rather, Incr-Aggregation, now that I understand the intention). It helps me to understand. Thanks to Stefano too, for keeping at the original question of mine. My current understanding is that if

Re: Finding the average temperature

2016-02-17 Thread Stefano Baghino
Hi Nirmalya, my reply was based on me misreading your original post, thinking you had a batch of data, not a stream. I see that the apply method can also take a reducer that pre-aggregates your data before passing it to the window function. I suspect that pre-aggregation runs locally just like a

Re: Finding the average temperature

2016-02-15 Thread Nirmalya Sengupta
Hello Stefano, Sorry for the late reply. Many thanks for taking the effort to write and share an example code snippet. I have been playing with the countWindow behaviour for some weeks now and I am generally aware of the functionality of countWindowAll(). For my

Re: Finding the average temperature

2016-02-14 Thread Stefano Baghino
Hello again Nirmalya, in this case you are keying by timestamp; think of keying as grouping: this means that windows are brought together according to their timestamp. I misread your original post but now that I see the code I understand your problem. I've put some code here:

Re: Finding the average temperature

2016-02-14 Thread Nirmalya Sengupta
Hello Stefano, I have tried to implement what I understood from your mail earlier in the day, but it doesn't seem to produce the result I expect. Here's the code snippet: - val env =

Finding the average temperature

2016-02-13 Thread Nirmalya Sengupta
Hello Flinksters, This is perhaps too trivial for most here in this forum, but I want to have my understanding clear. I want to find the average of temperatures coming in as a stream. The tuple has the following fields:

Re: Finding the average temperature

2016-02-13 Thread Nirmalya Sengupta
Hello Stefano, Many thanks for responding so quickly. Your explanation not only confirms my understanding but also gives a much simpler solution. The facility of associating a specific parallelism with a given operator didn't strike me at all. You are right that for my