Hi, In our use case the watermark is the processing time.
As per beam capability matrix ( https://beam.apache.org/documentation/runners/capability-matrix/) lateness is not supported by spark runner. But as per the output in our use case we are able to see late data getting emitted. So we wanted to know whether spark runner supports allowed lateness or not. Regards, Vishwas On Fri, Sep 7, 2018, 10:09 PM Raghu Angadi <[email protected]> wrote: > Lateness depends on watermark. How did you configure your KafkaIO reader? > Did you set custom timestamp function? By default watermark in KafkaIO is > set to same as processing time, in which case, your watermark could be > close to 13-38-37 (processing time). Note that this is in general true > across all the runners, though I am not aware of any subtle differences in > Spark runner. > > On Fri, Sep 7, 2018 at 7:03 AM rahul patwari <[email protected]> > wrote: > >> Hi, >> >> We are running a Beam program on Spark. We are using 2.5.0 Beam and >> SparkRunner versions. We are seeing Late data in the output emitted by >> Spark. As per the capability Matrix, Lateness is not supported in Spark. Is >> it supported now? or Are we missing something? >> >> Steps: >> Read from Kafka, Apply a Fixed Window of 1 Min with Lateness as 2 Min >> with Late firings when an element is found with Accumulating Fired Panes, >> GroupByKey, ParDo to display the result. >> >> Below is the output of the ParDo in which we are printing the GroupByKey >> result: >> Pane Timing : LATE >> Processing Time : 2018-09-07----13-38-37-7290----+0000 >> Element Time : 2018-09-07----13-36-59-9990----+0000 >> Window Start Time : 2018-09-07----13-36-00-0000----+0000 >> Window End Time : 2018-09-07----13-37-00-0000----+0000 >> Pane Index : 1 >> Pane NonSpeculativeIndex : 1 >> >> Regards, >> Rahul >> >
