Re: Late data before window end is even close

2018-06-08 Thread Fabian Hueske
Thanks for reporting back and the debugging advice! Best, Fabian 2018-06-08 9:00 GMT+02:00 Juho Autio : > Flink was NOT at fault. Turns out our Kafka producer had OS level clock > sync problems :( > > Because of that, our Kafka occasionally had some messages in between with > an incorrect

Re: Late data before window end is even close

2018-06-08 Thread Juho Autio
Flink was NOT at fault. Turns out our Kafka producer had OS-level clock sync problems :( Because of that, our Kafka occasionally had some messages in between with an incorrect timestamp. In practice they were about 7 days older than they should have been. I'm really sorry for wasting your time on this.
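
For readers following the thread, here is a minimal sketch of why a record stamped about 7 days too old is treated as late even though the window for the current data is nowhere near closing. It is a simplified model of an event-time lateness check, not Flink's actual code: the class name LatenessSketch, the 1-day tumbling window, the 1-hour allowed lateness, and the watermark value are all assumptions made for illustration.

```java
import java.time.Duration;

public class LatenessSketch {
    // Start of the tumbling window containing the timestamp (epoch millis).
    public static long windowStart(long timestamp, long windowSize) {
        return timestamp - Math.floorMod(timestamp, windowSize);
    }

    // Simplified model: a record's window counts as late once the watermark
    // has reached the window's last timestamp plus the allowed lateness.
    public static boolean isWindowLate(long timestamp, long windowSize,
                                       long allowedLateness, long watermark) {
        long maxTimestamp = windowStart(timestamp, windowSize) + windowSize - 1;
        return maxTimestamp + allowedLateness <= watermark;
    }

    public static void main(String[] args) {
        long day = Duration.ofDays(1).toMillis();   // hypothetical window size
        long hour = Duration.ofHours(1).toMillis(); // hypothetical allowed lateness

        // Hypothetical watermark: 10 hours into day 100.
        long watermark = 100 * day + 10 * hour;

        // A record with a correct clock, slightly ahead of the watermark.
        long correct = watermark + Duration.ofMinutes(5).toMillis();
        // The same record as a producer with a skewed clock would stamp it.
        long skewed = correct - 7 * day;

        System.out.println(isWindowLate(correct, day, hour, watermark)); // false
        System.out.println(isWindowLate(skewed, day, hour, watermark));  // true
    }
}
```

The skewed record lands in a window that ended almost a week before the watermark, so under this model it would be classed as late no matter how far the current window is from closing.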

Re: Late data before window end is even close

2018-05-14 Thread Fabian Hueske
Thanks for correcting me, Piotr. I didn't look closely enough at the code. With the presently implemented logic, a record should not be emitted to a side output if its window wasn't closed yet. 2018-05-11 14:13 GMT+02:00 Piotr Nowojski : > Generally speaking best practise

Re: Late data before window end is even close

2018-05-11 Thread Piotr Nowojski
Generally speaking, best practise is always to simplify your program as much as possible to narrow down the scope of the search. Replace the data source with statically generated events, remove unnecessary components, etc. Either such a process helps you figure out what’s wrong on your own and if not,

Re: Late data before window end is even close

2018-05-11 Thread Juho Autio
Thanks for that code snippet, I should try it out to simulate my DAG. If you have any suggestions on how to debug further what's causing late data on a production stream job, please let me know. On Fri, May 11, 2018 at 2:18 PM, Piotr Nowojski wrote: > Hey, > > Actually I think

Re: Late data before window end is even close

2018-05-11 Thread Piotr Nowojski
Hey, Actually I think Fabian's initial message was incorrect. As far as I can see in the code of WindowOperator (last lines of org.apache.flink.streaming.runtime.operators.windowing.WindowOperator#processElement ), the element is sent to late side output if it is late AND it wasn’t assigned to

Re: Late data before window end is even close

2018-05-11 Thread Juho Autio
Thanks. What I still don't get is why my message got filtered in the first place. Even if the allowed lateness filtering would be done "on the window", data should not be dropped as late if it's not in fact late by more than the allowedLateness setting. Assuming that these conditions hold: -

Re: Late data before window end is even close

2018-05-11 Thread Fabian Hueske
Hi Juho, Thanks for bringing up this topic! I share your intuition. IMO, records should only be filtered out and sent to a side output if any of the windows they would be assigned to is closed already. I had a look into the code and found that records are filtered out as late based on the

Late data before window end is even close

2018-05-11 Thread Juho Autio
I don't understand why I'm getting some data discarded as late on my Flink stream job a long time before the window even closes. I cannot be 100% sure, but to me it seems like the Kafka consumer is basically causing the data to be dropped as "late", not the window. I didn't expect this to ever