Hi Elias,
I like the idea of having a trailing / sliding window assigner to perform
your join. However, the result should not be entirely correct wrt your
initial join specification.
Given an events data set which contains the elements e1 = (4000, 1, 1) and
e2 = (4500, 2, 2) and a changes data
Hi Elias!
I think you brought up a couple of good issues. Let me try and summarize
what we have so far:
1) Joining in a more flexible fashion
=> The problem you are solving with the trailing / sliding window
combination: Is the right way to phrase the join problem "join records
where key is
Till,
An issue with your suggestion is that the job state may grow unbounded. You
are managing
expiration of data from the cache in the operator, but the state is
partitioned by the stream key.
That means if we no longer observe a key, the state associated with that
key will never be
removed.
In
Till,
Thanks again for putting this together. It is certainly along the lines of
what I want to accomplish, but I see some problem with it. In your code
you use a ValueStore to store the priority queue. If you are expecting to
store a lot of values in the queue, then you are likely to be using
Hi Elias,
thanks for the long write-up. It's interesting that it actually kinda works
right now.
You might be interested in a design doc that we're currently working on. I
posted it on the dev list but here it is:
Thanks for the suggestion. I ended up implementing it a different way.
What is needed is a mechanism to give each stream a different window
assigner, and then let Flink perform the join normally given the assigned
windows.
Specifically, for my use case what I need is a sliding window for one
Hi Elias,
sorry for the late reply. You're right that with the windowed join you
would have to deal with pairs where the timestamp of (x,y) is not
necessarily earlier than the timestamp of z. Moreover, by using sliding
windows you would receive duplicates as you've described. Using tumbling
Anyone from Data Artisans have some idea of how to go about this?
On Wed, Apr 13, 2016 at 5:32 PM, Maxim wrote:
> You could simulate the Samza approach by having a RichFlatMapFunction over
> cogrouped streams that maintains the sliding window in its ListState. As I
>
You could simulate the Samza approach by having a RichFlatMapFunction over
cogrouped streams that maintains the sliding window in its ListState. As I
understand the drawback is that the list state is not maintained in the
managed memory.
I'm interested to hear about the right way to implement