"Flink will only commit the Kafka offsets when the data has been saved to S3"
-> No. You can check the BucketingSink code: that would mean BucketingSink
depends on Kafka, which is not reasonable.
Flink keeps the consumed offsets in its own checkpoints, which live in Flink's
checkpoint storage (for example on the workers' disks), not in Kafka.
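
To make the point concrete, here is a toy model in plain Python. It is not Flink code: ToySource, ToySink, take_checkpoint and restore are hypothetical names made up for this illustration. It only sketches the idea that the source offset and the sink's finalized data are snapshotted together in a checkpoint, so recovery never has to ask Kafka for committed offsets and the sink never has to know about Kafka.

```python
# Toy model only (NOT Flink's API): all names here are hypothetical.

class ToySource:
    """Stands in for a Kafka partition reader that tracks its own offset."""
    def __init__(self, records):
        self.records = records
        self.offset = 0  # index of the next record to read

    def poll(self):
        record = self.records[self.offset]
        self.offset += 1
        return record


class ToySink:
    """Stands in for a file sink with pending and finalized data."""
    def __init__(self):
        self.pending = []    # written since the last checkpoint
        self.committed = []  # finalized when a checkpoint is taken

    def write(self, record):
        self.pending.append(record)


def take_checkpoint(source, sink):
    # Finalize pending sink data and snapshot the source offset together.
    sink.committed.extend(sink.pending)
    sink.pending = []
    return {"offset": source.offset, "committed": list(sink.committed)}


def restore(checkpoint, records):
    # Recovery uses only the checkpoint: pending data is dropped and
    # the source rewinds to the checkpointed offset.
    source = ToySource(records)
    source.offset = checkpoint["offset"]
    sink = ToySink()
    sink.committed = list(checkpoint["committed"])
    return source, sink


def run_demo():
    records = ["a", "b", "c", "d", "e"]
    source, sink = ToySource(records), ToySink()
    for _ in range(2):
        sink.write(source.poll())
    checkpoint = take_checkpoint(source, sink)  # offset 2, "a" and "b" final
    sink.write(source.poll())                   # "c" is pending, then a crash
    source, sink = restore(checkpoint, records)
    while source.offset < len(records):
        sink.write(source.poll())               # "c" is read again after rewind
    take_checkpoint(source, sink)
    return sink.committed


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the record read just before the simulated crash is discarded with the pending data and read again after the rewind, so it ends up in the finalized output once.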
(KafkaStream, the other streaming API prov
I’m working with a Kafka environment where I’m limited to 100 partitions with
1GB log.retention.bytes each. I’m looking to implement exactly-once
processing from this Kafka source to an S3 sink.
If I have understood correctly, Flink will only commit the Kafka offsets
when the data has been saved to S