Have you tried printing the timestamp of the rows in each batch, along with
the watermark, while adding an artificial delay to batch processing?
First of all, you're technically using "processing time" in your query,
so in theory you will never have "late events". The watermark is there to
handle out-of-order events and
`foreachBatch` was added in Spark 2.4.x, if I understand correctly, so
in any language you'll need to upgrade Spark to 2.4.x to use
`foreachBatch`. PySpark is covered as well.
https://issues.apache.org/jira/browse/SPARK-24565
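
To illustrate the point about processing time and late events, here is a minimal pure-Python sketch, not Spark code. It assumes a simplified watermark model (maximum timestamp seen so far minus the allowed delay, updated after every event, whereas Spark updates the watermark per micro-batch). The function name `find_late_events` and the sample timestamps are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta


def find_late_events(timestamps, delay):
    """Return timestamps that fall behind the watermark when they arrive.

    Simplified model (an assumption, not Spark's exact behavior):
    watermark = max timestamp seen so far - delay, updated per event.
    """
    late = []
    max_seen = None
    for ts in timestamps:
        # An event is "late" if it is older than the current watermark.
        if max_seen is not None and ts < max_seen - delay:
            late.append(ts)
        # Advance the maximum timestamp seen so far.
        if max_seen is None or ts > max_seen:
            max_seen = ts
    return late


base = datetime(2020, 1, 22, 1, 0, 0)
delay = timedelta(seconds=10)

# Processing-time timestamps are assigned on arrival, so they never go backwards.
processing_times = [base + timedelta(seconds=s) for s in (0, 5, 12, 30, 31)]

# Event-time timestamps are set at the source, so they can arrive out of order.
event_times = [base + timedelta(seconds=s) for s in (0, 5, 30, 12, 25)]

late_processing = find_late_events(processing_times, delay)
late_event = find_late_events(event_times, delay)

print("late with processing time:", late_processing)
print("late with event time:", late_event)
```

With monotonically increasing processing-time timestamps nothing can fall behind the watermark, while the out-of-order event-time sequence produces one late event under a 10-second delay.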
On Wed, Jan 22, 2020 at 1:12 AM Nick Dawes wrote:
> Thanks for