Thank you very much for your explanation!

commitOffsetsInFinalize() -> Although checkpointing depends on the runner, is it 
not configurable in a connector implementation?

Basically, I want to understand how this can be done with a new IO connector 
implementation, esp. with the new SDF API. If I am right, in the traditional 
UnboundedSource API, checkpointing was configured using 
UnboundedSource.CheckpointMark, but I am not sure about the SDF API. 

Also, since the KafkaIO SDF read does not provide the commitOffsetsInFinalize 
functionality, could you point to some resources that discuss checkpointing 
using the new SDF API?

Thank you,
Gaurav

 

On 9/7/20 10:54 AM, Alexey Romanenko wrote:

From my understanding:

- ENABLE_AUTO_COMMIT_CONFIG tells the Kafka consumer (used inside KafkaIO to 
read messages) to periodically commit offsets in the background;

- on the other hand, if "commitOffsetsInFinalize()" is used, then the Beam 
checkpoint mechanism is leveraged to restart from checkpoints in case of 
failures. It does not need to wait for the pipeline to finish, though it is up 
to the runner to decide when and how often to save checkpoints.

 

In KafkaIO, it is possible to use only one of these options for the same 
transform: either ENABLE_AUTO_COMMIT_CONFIG or commitOffsetsInFinalize().
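
To make the two options concrete, here is a rough sketch of how each could be 
set on a KafkaIO read. It is only an illustration, not taken from a real 
pipeline: the class name, broker address, topic and group id are made-up 
placeholders, and it assumes Java 9+ (for Map.of) plus the Beam Kafka IO module 
and a runner on the classpath.

```java
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

// Hypothetical example class; all connection values below are placeholders.
public class KafkaCommitOptions {

  // Settings shared by both variants. A group.id is set because offsets are
  // committed on behalf of a consumer group.
  static KafkaIO.Read<String, String> baseRead() {
    return KafkaIO.<String, String>read()
        .withBootstrapServers("localhost:9092") // placeholder broker
        .withTopic("events")                    // placeholder topic
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withConsumerConfigUpdates(
            Map.<String, Object>of(ConsumerConfig.GROUP_ID_CONFIG, "example-group"));
  }

  // Option 1: the Kafka consumer commits offsets periodically in the
  // background, independently of Beam checkpoints.
  static KafkaIO.Read<String, String> autoCommitRead() {
    return baseRead()
        .withConsumerConfigUpdates(
            Map.<String, Object>of(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true));
  }

  // Option 2: offsets are committed when the runner finalizes a checkpoint.
  static KafkaIO.Read<String, String> finalizeCommitRead() {
    return baseRead().commitOffsetsInFinalize();
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Pick exactly one of the two variants for a given transform.
    p.apply("ReadFromKafka", finalizeCommitRead().withoutMetadata());
    p.run(); // needs a reachable Kafka broker to actually read anything
  }
}
```

Swapping finalizeCommitRead() for autoCommitRead() in main() gives the first 
option; the rest of the pipeline stays the same.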

 

 

On 6 Sep 2020, at 07:24, Apple <gaurav.na...@oracle.com> wrote:

 

Hi everyone, 


I have a question on KafkaIO.
What is the difference between setting AUTO_COMMIT_CONFIG and 
commitOffsetsInFinalize()? My understanding is that:

 

1. AUTO_COMMIT_CONFIG commits Kafka records as soon as KafkaIO.read() outputs 
messages, but I am not sure how this would be helpful. For example, if a 
consumer transform after KafkaIO.read() fails, the messages would be lost 
(which sounds like at-most-once semantics).

 

2. commitOffsetsInFinalize() commits when the pipeline is finished. But when 
does the pipeline end? In other words, when is PipelineResult.State = DONE in 
a streaming scenario?

 

Thanks!

 
