Hi list,

in our current project we are implementing a streaming pipeline based on Google Dataflow.
Essentially we receive input via Kinesis, do some filtering, enrichment and sessionizing, and write the output to PubSub and/or Google Cloud Storage.

After a short investigation it is not clear to us how checkpointing works when running on Dataflow in connection with KinesisIO. Is there any documentation or discussion that would give us a better understanding of how this works? In particular, if we are forced to restart our pipelines, how can we ensure we do not lose any events?

As far as I understand, it should currently work 'auto-magically', but it is not yet clear to us how it will actually behave. Before we start testing our expectations, or even try to implement some watermark tracking ourselves, we hoped to get some insights from other users here.

Any help appreciated.

Best,
michel
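P.S. To make our expectation concrete, here is a tiny stdlib-only toy model of the resume-from-checkpoint behaviour we hope KinesisIO on Dataflow gives us (at-least-once: resume from the last completed sequence number after a restart, losing nothing). All class and method names here are ours for illustration, nothing is Beam API:

```python
# Toy model of resume-from-checkpoint semantics -- our *assumption* of
# how a checkpointed Kinesis reader behaves; names are ours, not Beam's.

class ToyShard:
    """A Kinesis-like shard: an append-only log addressed by sequence number."""
    def __init__(self, records):
        self.records = list(records)

    def read_from(self, seq):
        # Replayable source: re-reading from any sequence number is possible.
        return list(enumerate(self.records))[seq:]

class ToyConsumer:
    """Processes records and persists the last *completed* sequence number."""
    def __init__(self, shard):
        self.shard = shard
        self.checkpoint = 0            # durable state that survives restarts
        self.output = []

    def run(self, crash_after=None):
        for seq, rec in self.shard.read_from(self.checkpoint):
            if crash_after is not None and seq >= crash_after:
                return                 # simulate an unclean restart mid-stream
            self.output.append(rec)
            self.checkpoint = seq + 1  # advance checkpoint only after processing

shard = ToyShard(["a", "b", "c", "d", "e"])
consumer = ToyConsumer(shard)
consumer.run(crash_after=2)            # "crash" after processing "a" and "b"
consumer.run()                         # restart: resumes at the checkpoint
print(consumer.output)                 # -> ['a', 'b', 'c', 'd', 'e'], no loss
```

If Dataflow/KinesisIO does not behave roughly like this across pipeline restarts, we would like to know where our mental model breaks down.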