Re: aggregation by time window

2013-01-28 Thread Zhiwei Lin
Do you mean every 7 minutes? E.g., [10:07, 10:14), [10:14, 10:21).

On 28 January 2013 12:56, Oleg Ruchovets wrote:
> Hi,
> I have the following row data structure:
>
> event_id | time
> ==
> event1 | 10:07
> event2 | 10:10
> event3 | 10:12
>
> event4 | 10:

Re: aggregation by time window

2013-01-28 Thread Kai Voigt
Quick idea: since each of your events will go into several buckets, you could use map() to emit each item multiple times, once for each bucket.

On 28.01.2013 at 13:56, Oleg Ruchovets wrote:
> Hi,
> I have the following row data structure:
>
> event_id | time
> ==
> event1 | 10:07
>

Re: aggregation by time window

2013-01-28 Thread Oleg Ruchovets
Hi Kai. This is very interesting. Can you please explain your idea in more detail? What will be the key in the map phase? Suppose we have an event at 10:07. How would you emit it to multiple buckets?

Thanks, Oleg.

On Mon, Jan 28, 2013 at 3:17 PM, Kai Voigt wrote:
> Quick idea:
>
> since each

Re: aggregation by time window

2013-01-28 Thread Kai Voigt
Hi again, the idea is that you emit every event multiple times. So your map input record (event1, 10:07) will be emitted seven times during the map() call. Like I said, (10:04,event1), (10:05,event1), ..., (10:10,event1) will be the seven outputs for processing a single event. The output key w
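
The emit-multiple-times idea above can be sketched as follows. This is a minimal illustration in plain Python rather than actual Hadoop code; the function name `map_event` and the `half_width` parameter are hypothetical, and the window of three minutes on each side is inferred from the seven outputs listed in the message.

```python
from datetime import datetime, timedelta

def map_event(event_id, hhmm, half_width=3):
    """Emit one (minute_key, event_id) pair per minute in the window
    [t - half_width, t + half_width]. half_width=3 gives seven pairs."""
    t = datetime.strptime(hhmm, "%H:%M")
    return [
        ((t + timedelta(minutes=d)).strftime("%H:%M"), event_id)
        for d in range(-half_width, half_width + 1)
    ]

# One input record produces seven map outputs, keyed by minute.
pairs = map_event("event1", "10:07")
for key, value in pairs:
    print(key, value)
```

For the record (event1, 10:07) this yields the keys 10:04 through 10:10, each paired with event1.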

Re: aggregation by time window

2013-01-28 Thread Oleg Ruchovets
Hi Zhiwei. No :-). Every 7 minutes is easy: just transforming the time to millisecond/7*6 will give you a bucket key. I need to do the following: find the events that occurred during time T relative to event X. In a very naive approach I need to take the first event and find other events which
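
For contrast, the simpler fixed-bucket case mentioned here (non-overlapping 7-minute intervals) might look like the sketch below. It is not the formula from the message: it uses plain integer division on minutes, and the names `to_minutes` and `bucket_key` as well as the `origin` of 10:07 are assumptions chosen to match the intervals [10:07, 10:14), [10:14, 10:21) from earlier in the thread.

```python
def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def bucket_key(hhmm, width=7, origin="10:07"):
    """Return the start time of the fixed-width bucket containing hhmm.
    Buckets are [origin + k*width, origin + (k+1)*width) for integer k."""
    base = to_minutes(origin)
    offset = to_minutes(hhmm) - base
    start = base + (offset // width) * width  # floor to the bucket start
    return f"{start // 60:02d}:{start % 60:02d}"

for t in ["10:07", "10:10", "10:13", "10:14", "10:20"]:
    print(t, "->", bucket_key(t))
```

Each event gets exactly one key here, which is why this case is easy; the sliding-window requirement is what forces an event to appear under several keys.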

Re: aggregation by time window

2013-01-28 Thread Oleg Ruchovets
Well, much clearer, but I still have a question :-)

Suppose we have 3 map input records:

event1 | 10:07
event2 | 10:10
event3 | 10:12

Output from map(event1 | 10:07) will be:

mapOutput(10:04:event1)
mapOutput(10:05:event1)
mapOutput(10:06:event1)
mapOutput(10:07:event1)
mapOutput(10:08
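
To see what the three input records would look like after the shuffle, here is a small simulation in plain Python. It only groups the map outputs by key; what the reducer should do with each group is not covered in the visible part of the thread, so that step is left out. All function names and the three-minute `half_width` are assumptions for illustration.

```python
from collections import defaultdict

def to_minutes(hhmm):
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def to_hhmm(minutes):
    minutes %= 24 * 60  # wrap around midnight
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

def map_event(event_id, hhmm, half_width=3):
    """Yield (minute_key, event_id) for each minute in the event's window."""
    t = to_minutes(hhmm)
    for m in range(t - half_width, t + half_width + 1):
        yield to_hhmm(m), event_id

def group_by_key(records):
    """Simulate the shuffle: collect all map outputs under their key."""
    groups = defaultdict(list)
    for event_id, hhmm in records:
        for key, value in map_event(event_id, hhmm):
            groups[key].append(value)
    return dict(sorted(groups.items()))

records = [("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12")]
groups = group_by_key(records)
for key, events in groups.items():
    print(key, events)
```

With these inputs, keys 10:09 and 10:10 each receive all three events, while 10:04 receives only event1 and 10:15 only event3.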