Hi Mans,

Watermark in Spark is used to decide when to clear the state, so if an event is delayed beyond the point at which Spark has cleared the state, it will be ignored. I recently wrote a blog post on this: http://vishnuviswanath.com/spark_structured_streaming.html#watermark
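To make the idea concrete, here is a rough plain-Python model of it, not Spark code. It assumes the watermark is simply the maximum event time seen so far minus an allowed delay, and it checks every event individually. The helper name `split_late_events` and the sample timestamps are made up for illustration. Real Spark recomputes the watermark per micro-batch and uses it to expire aggregation state, so an event older than the threshold is not guaranteed to be dropped there.

```python
from datetime import datetime, timedelta


def split_late_events(event_times, delay):
    """Simplified per-event model of watermarking.

    Hypothetical helper for illustration only, not a Spark API.
    The watermark is taken as (max event time seen so far - delay);
    an event whose time is older than the watermark is treated as ignored.
    """
    max_event_time = None
    accepted, dropped = [], []
    for t in event_times:
        # No watermark exists until at least one event has been seen.
        watermark = None if max_event_time is None else max_event_time - delay
        if watermark is not None and t < watermark:
            dropped.append(t)
        else:
            accepted.append(t)
            # Only accepted events can advance the maximum event time.
            if max_event_time is None or t > max_event_time:
                max_event_time = t
    return accepted, dropped


# Made-up sample data: event times in arrival order, 5 minute allowed delay.
base = datetime(2018, 1, 26, 13, 0, 0)
arrivals = [
    base,                          # 13:00, first event
    base + timedelta(minutes=10),  # 13:10, watermark moves to 13:05
    base + timedelta(minutes=2),   # 13:02, older than 13:05
    base + timedelta(minutes=7),   # 13:07, late but within the delay
    base + timedelta(minutes=12),  # 13:12, watermark moves to 13:07
]
accepted, dropped = split_late_events(arrivals, timedelta(minutes=5))
print("accepted:", [t.strftime("%H:%M") for t in accepted])
print("dropped: ", [t.strftime("%H:%M") for t in dropped])
```

In this model only the 13:02 event is ignored, because by the time it arrives the watermark has already advanced to 13:05.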
Yes, this state is applicable to aggregation only. If you only have a map function and don't want to process the late records, you could filter on the EventTime field, but I guess you will have to compare it with the processing time, since there is no API that lets the user access the watermark.

-Vishnu

On Fri, Jan 26, 2018 at 1:14 PM, M Singh <mans2si...@yahoo.com.invalid> wrote:

> Hi:
>
> I am trying to filter out records which are lagging behind (based on event
> time) by a certain amount of time.
>
> Is the watermark api applicable to this scenario (ie, filtering lagging
> records) or it is only applicable with aggregation ? I could not get a
> clear understanding from the documentation which only refers to it's usage
> with aggregation.
>
> Thanks
>
> Mans