I don't think the capability matrix is updated, the Spark runner uses
LateDataUtils to handle late elements -
https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkGroupAlsoByWindowViaWindowSet.java#L300
On Fri, Sep 7, 2018 at 6:43 PM Raghu Angadi <[email protected]> wrote:

> I see. Hopefully someone with more familiarity with Spark runner will
> chime in.
>
> On Fri, Sep 7, 2018 at 1:41 PM Vishwas Bm <[email protected]> wrote:
>
>> Hi,
>>
>> In our use case the watermark is the processing time.
>>
>> As per beam capability matrix (
>> https://beam.apache.org/documentation/runners/capability-matrix/)
>> lateness is not supported by spark runner.  But as per the output in our
>> use case we are able to see late data getting emitted.
>>
>> So we wanted to know whether spark runner supports allowed lateness or
>> not.
>>
>>
>> Regards,
>> Vishwas
>>
>>
>> On Fri, Sep 7, 2018, 10:09 PM Raghu Angadi <[email protected]> wrote:
>>
>>> Lateness depends on watermark. How did you configure your KafkaIO
>>> reader? Did you set custom timestamp function? By default watermark in
>>> KafkaIO is set to same as processing time, in which case, your watermark
>>> could be close to 13-38-37 (processing time).  Note that this is in general
>>> true across all the runners, though I am not aware of any subtle
>>> differences in Spark runner.
>>>
>>> On Fri, Sep 7, 2018 at 7:03 AM rahul patwari <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are running a Beam program on Spark. We are using 2.5.0 Beam and
>>>> SparkRunner versions. We are seeing Late data in the output emitted by
>>>> Spark. As per the capability Matrix, Lateness is not supported in Spark. Is
>>>> it supported now? or Are we missing something?
>>>>
>>>> Steps:
>>>>  Read from Kafka, Apply a Fixed Window of 1 Min with Lateness as 2 Min
>>>> with Late firings when an element is found with Accumulating Fired Panes,
>>>> GroupByKey, ParDo to display the result.
>>>>
>>>> Below is the output of the ParDo in which we are printing the
>>>> GroupByKey result:
>>>> Pane Timing                : LATE
>>>> Processing Time         : 2018-09-07----13-38-37-7290----+0000
>>>> Element Time              : 2018-09-07----13-36-59-9990----+0000
>>>> Window Start Time      : 2018-09-07----13-36-00-0000----+0000
>>>> Window End Time       : 2018-09-07----13-37-00-0000----+0000
>>>> Pane Index                  : 1
>>>> Pane NonSpeculativeIndex : 1
>>>>
>>>> Regards,
>>>> Rahul
>>>>
>>>

Reply via email to