Re: How to prevent and track data loss/drops due to watermark during Structured Streaming aggregation

2020-02-02 Thread Jungtaek Lim
Have you tried printing the timestamps of the rows in each batch along with the watermark, while adding an artificial delay to batch processing? First of all, you're technically using "processing time" in your query, so theoretically you will never have "late events". Watermark is meant to handle out-of-order events and
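
[Editor's note: a minimal PySpark sketch of the debugging approach suggested above. The source, column names, and delay length are illustrative assumptions, not from the original thread; the point is that foreachBatch lets you print each micro-batch's rows while the query's reported watermark is available from lastProgress.]

```python
# Print each micro-batch's rows and add an artificial processing delay so
# the watermark advances past some events; compare against lastProgress.
import time
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("watermark-debug").getOrCreate()

# "rate" is a built-in test source; replace with the real event-time source.
events = (spark.readStream
          .format("rate")
          .option("rowsPerSecond", 10)
          .load()
          .withColumnRenamed("timestamp", "eventTime"))

counts = (events
          .withWatermark("eventTime", "10 minutes")
          .groupBy(window(col("eventTime"), "5 minutes"))
          .count())

def debug_batch(batch_df, batch_id):
    batch_df.show(truncate=False)   # rows that survived the watermark
    time.sleep(30)                  # artificial delay in batch processing

query = (counts.writeStream
         .outputMode("update")
         .foreachBatch(debug_batch)
         .start())

# The current watermark is reported in the query progress, e.g.:
# query.lastProgress["eventTime"]["watermark"]
```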

Re: Extract value from streaming Dataframe to a variable

2020-02-02 Thread Jungtaek Lim
`foreachBatch` was added in Spark 2.4.x if I understand correctly, so whichever language you use, you'll want to upgrade to Spark 2.4.x to use `foreachBatch`. PySpark is covered as well. https://issues.apache.org/jira/browse/SPARK-24565 On Wed, Jan 22, 2020 at 1:12 AM Nick Dawes wrote: > Thanks for
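
[Editor's note: a hedged sketch of using PySpark's foreachBatch (Spark 2.4+, SPARK-24565) to pull a value out of a streaming DataFrame into a driver-side variable. The "rate" source, the aggregation, and the "state" holder are assumptions for illustration only.]

```python
# Each micro-batch handed to foreachBatch is a static DataFrame, so normal
# actions like collect() work, and the result can be stored on the driver.
from pyspark.sql import SparkSession
from pyspark.sql.functions import max as max_

spark = SparkSession.builder.appName("extract-value").getOrCreate()

stream_df = (spark.readStream
             .format("rate")              # placeholder streaming source
             .option("rowsPerSecond", 5)
             .load())

state = {"max_value": None}               # driver-side holder for the value

def capture(batch_df, batch_id):
    row = batch_df.agg(max_("value").alias("mx")).collect()[0]
    state["max_value"] = row["mx"]        # extracted value from this batch

query = (stream_df.writeStream
         .foreachBatch(capture)
         .start())

# After a few batches, state["max_value"] holds the latest extracted value.
```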