Re: How to perform this join operation?

2016-05-26 Thread Till Rohrmann
Hi Elias, I like the idea of having a trailing / sliding window assigner to perform your join. However, the result would not be entirely correct with respect to your initial join specification. Given an events data set which contains the elements e1 = (4000, 1, 1) and e2 = (4500, 2, 2) and a changes data

Re: How to perform this join operation?

2016-05-25 Thread Stephan Ewen
Hi Elias! I think you brought up a couple of good issues. Let me try and summarize what we have so far: 1) Joining in a more flexible fashion => The problem you are solving with the trailing / sliding window combination: Is the right way to phrase the join problem "join records where key is

Re: How to perform this join operation?

2016-05-20 Thread Elias Levy
Till, An issue with your suggestion is that the job state may grow unbounded. You are managing expiration of data from the cache in the operator, but the state is partitioned by the stream key. That means if we no longer observe a key, the state associated with that key will never be removed. In
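
For what it is worth, later Flink releases added per-key timers via KeyedProcessFunction, which offer one way to bound this kind of keyed state: arm a cleanup timer on every element and clear the state when it fires, so even a key that is never observed again eventually gets dropped. The rough sketch below is only an illustration against that later API (it did not exist at the time of this thread); the (timestamp, key, payload) tuple layout, the one-hour idle timeout, and the class name are assumptions, and event-time timestamps and watermarks are assumed to be assigned upstream.

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // Hypothetical cleanup function: keeps the latest change per key and drops
    // the state after one idle hour, even if the key is never observed again.
    public class ExpiringKeyedState extends KeyedProcessFunction<
            Integer, Tuple3<Long, Integer, Integer>, Tuple3<Long, Integer, Integer>> {

        private static final long TTL_MS = 60 * 60 * 1000;  // assumed idle timeout

        private transient ValueState<Tuple3<Long, Integer, Integer>> latestChange;
        private transient ValueState<Long> cleanupTimer;

        @Override
        public void open(Configuration parameters) {
            latestChange = getRuntimeContext().getState(new ValueStateDescriptor<>(
                    "latest-change",
                    TypeInformation.of(new TypeHint<Tuple3<Long, Integer, Integer>>() {})));
            cleanupTimer = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("cleanup-timer", Long.class));
        }

        @Override
        public void processElement(Tuple3<Long, Integer, Integer> change, Context ctx,
                                   Collector<Tuple3<Long, Integer, Integer>> out) throws Exception {
            latestChange.update(change);

            // Replace the previous cleanup timer with one TTL_MS after this element.
            Long previousTimer = cleanupTimer.value();
            if (previousTimer != null) {
                ctx.timerService().deleteEventTimeTimer(previousTimer);
            }
            long newTimer = ctx.timestamp() + TTL_MS;
            ctx.timerService().registerEventTimeTimer(newTimer);
            cleanupTimer.update(newTimer);

            out.collect(change);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx,
                            Collector<Tuple3<Long, Integer, Integer>> out) throws Exception {
            // No activity for this key since the timer was armed: drop its state.
            latestChange.clear();
            cleanupTimer.clear();
        }
    }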

Re: How to perform this join operation?

2016-05-03 Thread Elias Levy
Till, Thanks again for putting this together. It is certainly along the lines of what I want to accomplish, but I see some problems with it. In your code you use a ValueState to store the priority queue. If you are expecting to store a lot of values in the queue, then you are likely to be using
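
For context, the pattern being questioned looks roughly like the sketch below, used on a keyed stream: the whole priority queue lives in a single ValueState slot, so every element triggers a read-modify-write of the entire queue, which is the cost at issue once the queue grows large. The Event shape (following the (timestamp, key, id) tuples mentioned earlier in the thread), the state name, and the class name are assumptions.

    import java.io.Serializable;
    import java.util.PriorityQueue;
    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class QueueInValueState
            extends RichFlatMapFunction<QueueInValueState.Event, QueueInValueState.Event> {

        // Assumed event shape, following the (timestamp, key, id) tuples above.
        public static class Event implements Comparable<Event>, Serializable {
            public long timestamp;
            public int key;
            public int id;

            public Event() {}

            @Override
            public int compareTo(Event other) {
                return Long.compare(timestamp, other.timestamp);
            }
        }

        private transient ValueState<PriorityQueue<Event>> queueState;

        @Override
        public void open(Configuration parameters) {
            queueState = getRuntimeContext().getState(new ValueStateDescriptor<>(
                    "buffered-events",
                    TypeInformation.of(new TypeHint<PriorityQueue<Event>>() {})));
        }

        @Override
        public void flatMap(Event event, Collector<Event> out) throws Exception {
            // Read-modify-write of the whole queue on every element.
            PriorityQueue<Event> queue = queueState.value();
            if (queue == null) {
                queue = new PriorityQueue<>();
            }
            queue.add(event);
            queueState.update(queue);
        }
    }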

Re: How to perform this join operation?

2016-05-03 Thread Aljoscha Krettek
Hi Elias, thanks for the long write-up. It's interesting that it actually kinda works right now. You might be interested in a design doc that we're currently working on. I posted it on the dev list but here it is:

Re: How to perform this join operation?

2016-05-02 Thread Elias Levy
Thanks for the suggestion. I ended up implementing it in a different way. What is needed is a mechanism to give each stream a different window assigner, and then let Flink perform the join normally given the assigned windows. Specifically, for my use case, what I need is a sliding window for one
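
For reference, the stock DataStream join takes exactly one window assigner, which both inputs share; that is the limitation the per-stream assigner idea works around. A minimal sketch of the standard form, with the tuple layout, key fields, and window sizes chosen only for illustration:

    import org.apache.flink.api.common.functions.JoinFunction;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class StandardWindowJoin {

        // Both inputs share the single assigner passed to window().
        public static DataStream<String> join(
                DataStream<Tuple3<Long, Integer, Integer>> events,    // (timestamp, key, id)
                DataStream<Tuple3<Long, Integer, Integer>> changes) { // (timestamp, key, version)
            return events.join(changes)
                    .where(new KeySelector<Tuple3<Long, Integer, Integer>, Integer>() {
                        @Override
                        public Integer getKey(Tuple3<Long, Integer, Integer> event) {
                            return event.f1;
                        }
                    })
                    .equalTo(new KeySelector<Tuple3<Long, Integer, Integer>, Integer>() {
                        @Override
                        public Integer getKey(Tuple3<Long, Integer, Integer> change) {
                            return change.f1;
                        }
                    })
                    .window(SlidingEventTimeWindows.of(Time.minutes(10), Time.minutes(1)))
                    .apply(new JoinFunction<Tuple3<Long, Integer, Integer>, Tuple3<Long, Integer, Integer>, String>() {
                        @Override
                        public String join(Tuple3<Long, Integer, Integer> event,
                                           Tuple3<Long, Integer, Integer> change) {
                            return "event " + event.f2 + " matched change " + change.f2;
                        }
                    });
        }
    }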

Re: How to perform this join operation?

2016-04-20 Thread Till Rohrmann
Hi Elias, sorry for the late reply. You're right that with the windowed join you would have to deal with pairs where the timestamp of (x,y) is not necessarily earlier than the timestamp of z. Moreover, by using sliding windows you would receive duplicates as you've described. Using tumbling
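
For example, with sliding windows of 10 minutes sliding every 5 minutes, a record with timestamp 7 min is assigned to both [0, 10) and [5, 15); if its join partner from the other stream also falls into both windows, the same joined pair is emitted once per shared window, i.e. twice.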

Re: How to perform this join operation?

2016-04-14 Thread Elias Levy
Anyone from Data Artisans have some idea of how to go about this? On Wed, Apr 13, 2016 at 5:32 PM, Maxim wrote: > You could simulate the Samza approach by having a RichFlatMapFunction over > cogrouped streams that maintains the sliding window in its ListState. As I >

Re: How to perform this join operation?

2016-04-13 Thread Maxim
You could simulate the Samza approach by having a RichFlatMapFunction over cogrouped streams that maintains the sliding window in its ListState. As I understand it, the drawback is that the list state is not maintained in managed memory. I'm interested to hear about the right way to implement
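
One way the described approach could look, written against the current keyed-state API (ListState obtained from the runtime context) rather than what was available at the time: events are buffered per key, entries older than the window are evicted as new ones arrive, and each change is joined against the buffer. The tuple layout, the one-hour window, which stream is buffered, and the class name are all assumptions.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
    import org.apache.flink.util.Collector;

    public class ManualWindowJoin extends RichCoFlatMapFunction<
            Tuple3<Long, Integer, Integer>,   // events: (timestamp, key, id)
            Tuple3<Long, Integer, Integer>,   // changes: (timestamp, key, version)
            Tuple2<Tuple3<Long, Integer, Integer>, Tuple3<Long, Integer, Integer>>> {

        private static final long WINDOW_MS = 60 * 60 * 1000;  // assumed one-hour window

        private transient ListState<Tuple3<Long, Integer, Integer>> bufferedEvents;

        @Override
        public void open(Configuration parameters) {
            bufferedEvents = getRuntimeContext().getListState(new ListStateDescriptor<>(
                    "buffered-events",
                    TypeInformation.of(new TypeHint<Tuple3<Long, Integer, Integer>>() {})));
        }

        // Events are buffered per key; entries older than the window are evicted.
        @Override
        public void flatMap1(Tuple3<Long, Integer, Integer> event,
                             Collector<Tuple2<Tuple3<Long, Integer, Integer>, Tuple3<Long, Integer, Integer>>> out)
                throws Exception {
            List<Tuple3<Long, Integer, Integer>> retained = new ArrayList<>();
            Iterable<Tuple3<Long, Integer, Integer>> current = bufferedEvents.get();
            if (current != null) {
                for (Tuple3<Long, Integer, Integer> buffered : current) {
                    if (buffered.f0 >= event.f0 - WINDOW_MS) {
                        retained.add(buffered);
                    }
                }
            }
            retained.add(event);
            bufferedEvents.update(retained);
        }

        // Each change is joined against the events currently buffered for its key.
        @Override
        public void flatMap2(Tuple3<Long, Integer, Integer> change,
                             Collector<Tuple2<Tuple3<Long, Integer, Integer>, Tuple3<Long, Integer, Integer>>> out)
                throws Exception {
            Iterable<Tuple3<Long, Integer, Integer>> current = bufferedEvents.get();
            if (current != null) {
                for (Tuple3<Long, Integer, Integer> event : current) {
                    out.collect(Tuple2.of(event, change));
                }
            }
        }
    }

It would be wired up with something like events.connect(changes).keyBy(e -> e.f1, c -> c.f1).flatMap(new ManualWindowJoin()); whether events or changes should be the buffered side depends on the intended join semantics, which the excerpt does not spell out.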