Sorry, I can’t say much about SDF. Maybe Lukasz Cwik can provide more details on 
this.

> On 8 Sep 2020, at 09:01, Gaurav Nakum <gaurav.na...@oracle.com> wrote:
> 
> Thank you very much for your explanation!
> commitOffsetsInFinalize() -> although checkpointing depends on the runner, 
> is it not configurable in a connector implementation?
> Basically, I want to understand how this can be done with a new IO connector 
> implementation, esp. with the new SDF API. If I am right, in the traditional 
> UnboundedSource API, checkpointing was configured using 
> UnboundedSource.CheckpointMark, but I am not sure about the SDF API. 
> Also, since the KafkaIO SDF read does not provide commitOffsetsInFinalize 
> functionality, could you point to some resources that discuss checkpointing 
> using the new SDF API?
> 
> Thank you,
> Gaurav
> On 9/7/20 10:54 AM, Alexey Romanenko wrote:
>> From my understanding: 
>> - ENABLE_AUTO_COMMIT_CONFIG tells the Kafka consumer (used inside KafkaIO 
>> to read messages) to commit offsets periodically in the background;
>> - on the other hand, if "commitOffsetsInFinalize()" is used, then the Beam 
>> checkpoint mechanism will be leveraged to restart from checkpoints in case 
>> of failures. It won’t need to wait for the pipeline to finish, though it’s 
>> up to the runner to decide when and how often to save checkpoints.
>>  
>> In KafkaIO, it’s possible to use only one option for the same transform - 
>> either ENABLE_AUTO_COMMIT_CONFIG or commitOffsetsInFinalize().
>>  
>>  
>>> On 6 Sep 2020, at 07:24, Apple <gaurav.na...@oracle.com> wrote:
>>>  
>>> Hi everyone, 
>>> 
>>> 
>>> I have a question on KafkaIO.
>>> What is the difference between setting AUTO_COMMIT_CONFIG and 
>>> commitOffsetsInFinalize()? My understanding is that:
>>>  
>>> 1. AUTO_COMMIT_CONFIG commits Kafka records as soon as 
>>> KafkaIO.read() outputs messages, but I am not sure how this would be 
>>> helpful. E.g., if a consumer transform after KafkaIO.read() fails, the 
>>> messages would be lost (which sounds like at-most-once semantics).
>>>  
>>> 2. commitOffsetsInFinalize() commits when the pipeline is 
>>> finished. But when does the pipeline end? In other words, when is 
>>> PipelineResult.State = DONE in a streaming scenario?
>>>  
>>> Thanks!
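
For readers of this thread, the difference between the two commit options can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Beam or Kafka code: the class OffsetCommitSketch, the enum Strategy and the method run are hypothetical names invented for this example. It assumes the worst case for auto-commit (the background commit fires right after a batch is read, before it is processed) and it assumes that after a crash the reader resumes from the offset committed in Kafka rather than from a runner checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names exist in Beam or Kafka.
class OffsetCommitSketch {

    // Hypothetical stand-ins for the two options discussed in the thread.
    enum Strategy { AUTO_COMMIT, COMMIT_ON_FINALIZE }

    /**
     * Reads offsets 0..total-1 in batches. While handling the batch that
     * starts at crashBatchStart, the first record is processed and then the
     * pipeline crashes once. After the crash, reading resumes from the last
     * committed offset. Returns every record that was processed, in order.
     */
    static List<Integer> run(Strategy strategy, int total, int batchSize, int crashBatchStart) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;      // offset the consumer group would resume from
        int position = 0;       // next offset to read
        boolean crashed = false;

        while (position < total) {
            int end = Math.min(position + batchSize, total);

            // Assumption: auto-commit fires as soon as the batch has been read.
            if (strategy == Strategy.AUTO_COMMIT) {
                committed = end;
            }

            if (!crashed && position == crashBatchStart) {
                processed.add(position);  // one record gets through before the crash
                crashed = true;
                position = committed;     // restart from the committed offset
                continue;
            }

            for (int i = position; i < end; i++) {
                processed.add(i);
            }

            // Commit only after the batch is fully processed ("finalized").
            if (strategy == Strategy.COMMIT_ON_FINALIZE) {
                committed = end;
            }
            position = end;
        }
        return processed;
    }

    public static void main(String[] args) {
        // Offset 3 is skipped: the commit ran ahead of processing.
        System.out.println(run(Strategy.AUTO_COMMIT, 6, 2, 2));        // [0, 1, 2, 4, 5]
        // Offset 2 is seen twice: the commit lagged behind processing.
        System.out.println(run(Strategy.COMMIT_ON_FINALIZE, 6, 2, 2)); // [0, 1, 2, 2, 3, 4, 5]
    }
}
```

In this model, committing early can drop a record (closer to at-most-once), while committing only after a batch is finalized can replay a record (closer to at-least-once). How often a real runner finalizes checkpoints is up to the runner, as noted above.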
