Re: [DISCUSS] Per-key event time

2017-02-28 Thread Jamie Grier
Thinking about this a bit more... I think it may be interesting to enable two modes for event-time advancement in Flink 1) The current mode which I'll call partition-based, pessimistic, event-time advancement 2) Key-based, eager, event-time advancement In this key-based eager mode it's actually

Re: [DISCUSS] Per-key event time

2017-02-28 Thread Aljoscha Krettek
@Tzu-Li Yes, the deluxe stream would not allow another keyBy(). Or we could allow it but then we would exit the world of the deluxe stream and per-key watermarks and go back to the realm of normal streams and keyed streams. On Tue, 28 Feb 2017 at 10:08 Tzu-Li (Gordon) Tai

Re: [DISCUSS] Per-key event time

2017-02-28 Thread Tzu-Li (Gordon) Tai
Throwing in some thoughts: When a source determines that no more data will come for a key (which  in itself is a bit of a tricky problem) then it should signal to downstream  operations to take the key out of watermark calculations, that is that we  can release some space.  I don’t think this is

Re: [DISCUSS] Per-key event time

2017-02-27 Thread Aljoscha Krettek
This is indeed an interesting topic, thanks for starting the discussion, Jamie! I now thought about this for a while, since more and more people seem to be asking about it lately. First, I thought that per-key watermark handling would not be necessary because it can be done locally (as Paris

Re: [DISCUSS] Per-key event time

2017-02-23 Thread Gábor Hermann
Hey all, Let me share some ideas about this. @Paris: The local-only progress tracking indeed seems easier, we do not need to broadcast anything. Implementation-wise it is easier, but performance-wise probably not. If one key can come from multiple sources, there could be a lot more network

Re: [DISCUSS] Per-key event time

2017-02-22 Thread Paris Carbone
Hey Jamie! Key-based progress tracking sounds like local-only progress tracking to me, there is no need to use a low watermarking mechanism at all since all streams of a key are handled by a single partition at a time (per operator). Thus, this could be much easier to implement and support

[DISCUSS] Per-key event time

2017-02-22 Thread Jamie Grier
Hi Flink Devs, Use cases that I see quite frequently in the real world would benefit from a different watermarking / event time model than the one currently implemented in Flink. I would call Flink's current approach partition-based watermarking or maybe subtask-based watermarking. In this