Thanks, Jungtaek. Very useful information. Could I trouble you with one further question: what you said makes perfect sense, but to what exactly does SPARK-24156 <https://issues.apache.org/jira/browse/SPARK-24156> refer, if not fixing the "need to add a dummy record to move watermark forward"?
Kind regards,

Phillip

On Mon, Jul 27, 2020 at 11:41 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> I'm not sure what exactly your problem is, but given you've mentioned
> window and OutputMode.Append, you may want to remember that append mode
> doesn't produce the output of an aggregation until the watermark "passes by".
> It's expected behavior if you're seeing lazy outputs on OutputMode.Append
> compared to OutputMode.Update.
>
> Unfortunately there's no mechanism in SSS to move the watermark forward
> without actual input, so if you want to test some behavior on
> OutputMode.Append you would need to add a dummy record to move the
> watermark forward.
>
> Hope this helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Mon, Jul 27, 2020 at 8:10 PM Phillip Henry <londonjava...@gmail.com> wrote:
>
>> Sorry, I should have mentioned that Spark only seems reluctant to take the
>> last windowed, groupBy batch from Kafka when using OutputMode.Append.
>>
>> I've asked on StackOverflow:
>> https://stackoverflow.com/questions/62915922/spark-structured-streaming-wont-pull-the-final-batch-from-kafka
>> but am still struggling. Can anybody please help?
>>
>> How do people test their SSS code if you have to put a message on Kafka
>> to get Spark to consume a batch?
>>
>> Kind regards,
>>
>> Phillip
>>
>> On Sun, Jul 12, 2020 at 4:55 PM Phillip Henry <londonjava...@gmail.com> wrote:
>>
>>> Hi, folks.
>>>
>>> I noticed that SSS won't process a waiting batch if there are no batches
>>> after it. To put it another way, Spark must always leave one batch on
>>> Kafka waiting to be consumed.
>>>
>>> There is a JIRA for this at:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-24156
>>>
>>> It says it was resolved in 2.4.0, but my code
>>> <https://github.com/PhillHenry/SSSPlayground/blob/Spark2/src/test/scala/uk/co/odinconsultants/sssplayground/windows/TimestampedStreamingSpec.scala>
>>> is using 2.4.2, yet I still see Spark reluctant to consume another batch
>>> from Kafka if it means there is nothing else waiting to be processed in
>>> the topic.
>>>
>>> Do I have to do something special to exploit the behaviour that
>>> SPARK-24156 says it has addressed?
>>>
>>> Regards,
>>>
>>> Phillip
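For anyone following the thread: Jungtaek's explanation can be illustrated with a toy simulation in plain Python. This is NOT Spark code; all names (WATERMARK_DELAY, emitted_windows, etc.) are made up for illustration. It only models the rule he describes: in append mode, a window's aggregate is emitted once the watermark (max event time seen, minus the configured delay) has passed the window's end, which is why the final window sits unemitted until a later "dummy" record advances the watermark.

```python
from datetime import datetime, timedelta

# Toy model (not Spark) of append-mode watermark semantics.
WATERMARK_DELAY = timedelta(seconds=10)   # like withWatermark("ts", "10 seconds")
WINDOW = timedelta(seconds=30)            # tumbling 30-second windows

def window_end(ts: datetime) -> datetime:
    # End of the epoch-aligned tumbling window containing ts.
    epoch = datetime(1970, 1, 1)
    n = (ts - epoch) // WINDOW
    return epoch + (n + 1) * WINDOW

def emitted_windows(event_times):
    """Window-ends whose aggregates append mode would have emitted so far."""
    watermark = max(event_times) - WATERMARK_DELAY
    # Append mode only finalizes windows the watermark has passed.
    return sorted({window_end(t) for t in event_times if window_end(t) <= watermark})

base = datetime(2020, 7, 27, 12, 0, 0)
events = [base, base + timedelta(seconds=5)]   # both land in the same window
print(emitted_windows(events))                 # [] - window not emitted yet

dummy = base + timedelta(minutes=5)            # later "dummy" record
print(emitted_windows(events + [dummy]))       # first window is now emitted
```

The second call shows the workaround from the thread: only after the dummy record pushes the watermark past 12:00:30 does the first window appear in the output, mirroring why a test must put one more message on Kafka to flush the last batch.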