Hi list,

in our current project we are implementing our streaming pipeline on
Google Cloud Dataflow.

Essentially, we receive input via Kinesis, do some filtering, enrichment
and sessionizing, and write the output to Pub/Sub and/or Google Cloud Storage.

After a short investigation it is still not clear to us how checkpointing
works when running on Dataflow with KinesisIO. Is there any
documentation or discussion that would give us a better understanding of how
this works? In particular, if we are forced to restart our pipelines, how can
we ensure that we do not lose any events?
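To make the question concrete, here is a toy Python sketch of the general idea as we understand it (this is NOT Dataflow's or KinesisIO's actual implementation, just an illustration of sequence-number checkpointing): the source durably records the last processed sequence number per shard, and after a restart it resumes reading from that checkpoint, so events are replayed rather than lost.

```python
# Toy illustration (not the real KinesisIO/Dataflow code): a Kinesis-style
# reader that checkpoints the last processed sequence number of a shard.
# On restart it resumes from the checkpoint, so events after a crash are
# re-read rather than lost -- i.e. at-least-once delivery across restarts.

# Fake shard: events keyed by increasing sequence numbers.
SHARD = {seq: f"event-{seq}" for seq in range(10)}

def read_from(checkpoint):
    """Yield (seq, event) pairs starting after the checkpointed sequence number."""
    start = 0 if checkpoint is None else checkpoint + 1
    for seq in sorted(SHARD):
        if seq >= start:
            yield seq, SHARD[seq]

def run(checkpoint, crash_after=None):
    """Process events, updating the checkpoint after each one.
    Optionally 'crash' after a given number of events."""
    processed = []
    for n, (seq, event) in enumerate(read_from(checkpoint), start=1):
        processed.append(event)
        checkpoint = seq          # persisted durably by a real runner
        if crash_after is not None and n == crash_after:
            break                 # simulate a forced pipeline restart
    return processed, checkpoint

# First run crashes after 4 events; the restart resumes from the checkpoint.
first, ckpt = run(None, crash_after=4)
second, _ = run(ckpt)
assert first + second == [f"event-{i}" for i in range(10)]  # no events lost
```

What we would like to confirm is whether Dataflow persists the KinesisIO reader state in this spirit across restarts, or whether the checkpoint is lost when a pipeline is cancelled and resubmitted.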

As far as I currently understand, it should work 'auto-magically', but it is
not yet clear to us how it will actually behave. Before we start testing our
assumptions, or even try to implement some watermark tracking ourselves, we
hoped to get some insights from other users here.

Any help appreciated.

Best,

michel
