Thank you very much for your explanation! commitOffsetsInFinalize() -> although checkpointing depends on the runner is it not configurable in a connector implementation?
Basically, I want to understand how this can be done with a new IO connector implementation, esp. with the new SDF API. If I am right, in the traditional UnboundedSource API, checkpointing was configured using UnboundedSource.CheckpointMark, but I am not sure about the SDF API. Also, since KafkaIO SDF read does not provide commitOffsetsInFinalize functionality could you point to some resources which discuss checkpointing using the new SDF API? Thank you, Gaurav On 9/7/20 10:54 AM, Alexey Romanenko wrote: >From my understanding: - ENABLE_AUTO_COMMIT_CONFIG will say to Kafka consumer (used inside KafkaIO to read messages) to commit periodically offsets in the background; - on the other hand, if "commitOffsetsInFinalize()” is used, then Beam Checkpoint mechanism will be leveraged to restart from checkpoints in case of failures. It won’t need to wait for pipeline's finish, though it’s up to the runner to decide when and how often to save checkpoints. In KafkaIO, it’s possible to use only one option for the same transform - either ENABLE_AUTO_COMMIT_CONFIG or commitOffsetsInFinalize() On 6 Sep 2020, at 07:24, Apple <gaurav.na...@oracle.com> wrote: Hi everyone, I have a question on KafkaIO. What is the difference between setting AUTO_COMMIT_CONFIG and commitOffsetsInFinalize()? My understanding is that: 1. AUTO_COMMIT_CONFIG commits Kafka records as soon as KafkaIO.read() outputs messages, but I am not sure how would this be helpful, for e.g. if a consumer transform after KafkaIO.read() fails , the messages would be lost (which sounds like at-most once semantics) 2. commitOffsetsFinalize() commits when the pipeline is finished. But when does the pipeline end? In other words, when is PipelineResult.State = Done in a streaming scenario? Thanks!