Sorry, I can’ say much about SDF. Maybe Lukasz Cwik can provide more details on this.
> On 8 Sep 2020, at 09:01, Gaurav Nakum <gaurav.na...@oracle.com> wrote: > > Thank you very much for your explanation! > commitOffsetsInFinalize() -> although checkpointing depends on the runner is > it not configurable in a connector implementation? > Basically, I want to understand how this can be done with a new IO connector > implementation, esp. with the new SDF API. If I am right, in the traditional > UnboundedSource API, checkpointing was configured using > UnboundedSource.CheckpointMark, but I am not sure about the SDF API. > Also, since KafkaIO SDF read does not provide commitOffsetsInFinalize > functionality could you point to some resources which discuss checkpointing > using the new SDF API? > > Thank you, > Gaurav > On 9/7/20 10:54 AM, Alexey Romanenko wrote: >> From my understanding: >> - ENABLE_AUTO_COMMIT_CONFIG will say to Kafka consumer (used inside KafkaIO >> to read messages) to commit periodically offsets in the background; >> - on the other hand, if "commitOffsetsInFinalize()” is used, then Beam >> Checkpoint mechanism will be leveraged to restart from checkpoints in case >> of failures. It won’t need to wait for pipeline's finish, though it’s up to >> the runner to decide when and how often to save checkpoints. >> >> In KafkaIO, it’s possible to use only one option for the same transform - >> either ENABLE_AUTO_COMMIT_CONFIG or commitOffsetsInFinalize() >> >> >>> On 6 Sep 2020, at 07:24, Apple <gaurav.na...@oracle.com >>> <mailto:gaurav.na...@oracle.com>> wrote: >>> >>> Hi everyone, >>> >>> >>> I have a question on KafkaIO. >>> What is the difference between setting AUTO_COMMIT_CONFIG and >>> commitOffsetsInFinalize()? My understanding is that: >>> >>> 1. AUTO_COMMIT_CONFIG commits Kafka records as soon as >>> KafkaIO.read() outputs messages, but I am not sure how would this be >>> helpful, for e.g. if a consumer transform after KafkaIO.read() fails , the >>> messages would be lost (which sounds like at-most once semantics) >>> >>> 2. commitOffsetsFinalize() commits when the pipeline is >>> finished. But when does the pipeline end? In other words, when is >>> PipelineResult.State = Done in a streaming scenario? >>> >>> Thanks!