Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Till Rohrmann
If the data exceeds the main memory of your machine, then you should use the RocksDBStateBackend as a state backend. It allows you to store state (including windows) on disk. Thus, the size of state you can store is then limited by your hard disk capacity. If the expected data size can be kept in

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Yifei Li
Hi Till and Aljoscha, Thank you so much for your suggestions and I'll try them out. I have another question. Since S2 my be days delayed, so there are may be lots of windows and large amount of data stored in memory waiting for computation. How does Flink deal with that? Thanks, Yifei On Tue,

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Till Rohrmann
Hi Yifei, if you don't wanna implement your own join operator, then you could also chain two join operations. I created a small example to demonstrate that: https://gist.github.com/tillrohrmann/c074b4eedb9deaf9c8ca2a5e124800f3. However, bare in mind that for this approach you will construct two wi

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Aljoscha Krettek
Hi, right now, there is no built-in support for n-ary joins. I am working on this, however. For now you can simulate n-ary joins by using a tagged union and doing the join yourself in a WindowFunction. I created a small example that demonstrates this: https://gist.github.com/aljoscha/a2a213d90c7c1

Does Flink support joining multiple streams based on event time window now?

2016-04-18 Thread Yifei Li
Hi, I am new to Flink and I've read some documentation and think Flink may fit my scenario. Here is my scenario: 1. Assume I have 3 streams: S1(id, name, email, action, date), S2(id, name, email, level, date), S3(id, name, position, date). *2. S2 always delays(hours to days, not determined..) *