I would assume the main issue is resuming reading from the Kinesis stream at the position of the last read? In the case of Pub/Sub (just as another example of the idea), that position is part of the internal state of a pre-created subscription.
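
To make the idea concrete, here is a toy sketch in Python of "resume from the last read". It is not how KinesisIO or Dataflow implement checkpointing; `ToyStream`, `CheckpointStore` and `run_reader` are made-up names for illustration, and the "durable" store is just an in-memory dict. The only point is that the read position lives outside the reader, so a restarted reader can pick up where the previous one stopped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyStream:
    """Stand-in for a single shard: an append-only list of records."""
    records: List[str] = field(default_factory=list)

    def read_after(self, position: Optional[int]) -> List[Tuple[int, str]]:
        # `position` is the index of the last record already processed,
        # or None when nothing has been read yet.
        start = 0 if position is None else position + 1
        return [(i, r) for i, r in enumerate(self.records) if i >= start]


@dataclass
class CheckpointStore:
    """Hypothetical durable store for read positions (a dict in this sketch)."""
    positions: Dict[str, int] = field(default_factory=dict)


def run_reader(stream: ToyStream, store: CheckpointStore,
               reader_id: str, max_records: int) -> List[str]:
    """Process up to `max_records`, saving the position after each record."""
    processed = []
    last = store.positions.get(reader_id)
    for index, record in stream.read_after(last)[:max_records]:
        processed.append(record)
        # Save the position only after the record has been handled.
        store.positions[reader_id] = index
    return processed


stream = ToyStream(["e0", "e1", "e2", "e3", "e4"])
store = CheckpointStore()

# First run stops early, as if the pipeline had been shut down.
first_run = run_reader(stream, store, "pipeline-a", max_records=2)

# Simulated restart: the new run knows only what the store remembers.
second_run = run_reader(stream, store, "pipeline-a", max_records=10)
```

In this toy version the position is saved after each record, so a crash between handling a record and saving its position would replay that one record on restart (at-least-once rather than exactly-once).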

Kenn

On Tue, Apr 6, 2021 at 1:26 PM Michael Luckey <[email protected]> wrote:

> Hi list,
>
> with our current project we are implementing our streaming pipeline based
> on Google Dataflow.
>
> Essentially we receive input via Kinesis, do some filtering, enrichment
> and sessionizing, and output to PubSub and/or Google storage.
>
> After a short investigation it is not clear to us how checkpointing will
> work when running on Dataflow in connection with KinesisIO. Is there any
> documentation or discussion that would give us a better understanding of
> how that will work? In particular, if we are forced to restart our
> pipelines, how can we ensure that we do not lose any events?
>
> As far as I currently understand, it should work 'auto-magically', but it
> is not yet clear to us how it will actually behave. Before we start
> testing our expectations, or even try to implement some watermark tracking
> ourselves, we hoped to get some insights from other users here.
>
> Any help appreciated.
>
> Best,
>
> michel
>
