Flink not outputting windows before all data is seen

2020-08-29 Thread Teodor Spæren
Hey! Second time posting to a mailing list, let's hope I'm doing this correctly :) My use case is to take data from the mediawiki dumps and stream it into Flink via the `readTextFile` method. The dumps are TSV files with one event per line; each event has a timestamp and a type. I want to use

Re: Flink not outputting windows before all data is seen

2020-08-29 Thread David Anderson
Teodor, This is happening because of the way that readTextFile works when it is executing in parallel, which is to divide the input file into a bunch of splits, which are consumed in parallel. This is making it so that the watermark isn't able to move forward until much or perhaps all of the file
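To illustrate the explanation above, here is a toy model in plain Python (not Flink code). It assumes a file with ascending timestamps is cut into contiguous splits, that each parallel reader emits a watermark equal to the last timestamp it has read, and that the downstream watermark is the minimum across readers. The function name `simulate` and the lockstep reading order are assumptions made for illustration only, and the sketch does not model the bug later reported as FLINK-19109.

```python
def simulate(timestamps, num_splits):
    """Return the downstream watermark after each lockstep read step.

    Toy model: `timestamps` is an ascending list that is cut into
    `num_splits` contiguous splits, one per hypothetical parallel reader.
    """
    if not timestamps:
        return []
    size = -(-len(timestamps) // num_splits)  # ceiling division
    splits = [timestamps[i:i + size] for i in range(0, len(timestamps), size)]
    watermarks = [float("-inf")] * len(splits)
    history = []
    for step in range(size):
        for i, split in enumerate(splits):
            if step < len(split):
                # Reader i has just read its next event.
                watermarks[i] = split[step]
            else:
                # An exhausted reader no longer holds the watermark back.
                watermarks[i] = float("inf")
        # The downstream operator sees the minimum over all readers.
        history.append(min(watermarks))
    return history


events = list(range(12))          # timestamps 0..11, already in order
parallel = simulate(events, 3)    # three readers, four events each
serial = simulate(events, 1)      # a single reader
print(parallel)
print(serial)
```

In this model, three readers have consumed all twelve events after four steps, yet the downstream watermark has only reached the end of the first split, so any window ending later than that cannot fire until the input is finished. With a single reader the watermark follows the data one event at a time.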

Re: Flink not outputting windows before all data is seen

2020-08-30 Thread Teodor Spæren
Hey David! I tried what you said, but it did not solve the problem. The job still has to wait until the very end before outputting anything. I mentioned in my original email that I had set the parallelism to 1 job-wide, but when I reran the job, I added your line. Are there any circumstance

Re: Flink not outputting windows before all data is seen

2020-08-30 Thread Teodor Spæren
Hey again David! I tried your proposed change of setting the parallelism higher. This worked, but I don't understand why it fixes the behavior. The only thing that happens to the query plan is that a "remapping" node is added. Thanks for the fix, and for any additi

Re: Flink not outputting windows before all data is seen

2020-09-01 Thread David Anderson
Teodor, I've concluded this is a bug, and have reported it: https://issues.apache.org/jira/browse/FLINK-19109
Best regards, David
On Sun, Aug 30, 2020 at 3:01 PM Teodor Spæren wrote:
> Hey again David!
>
> I tried your proposed change of setting the parallelism higher. This
> worked, but why do