Are you draining[1] your pipeline or simply canceling it and starting a new
one? Draining should close open windows and attempt to flush all in-flight
data before shutting down. For Pub/Sub you may also need to read from a
subscription rather than a topic: when a pipeline reads straight from a
topic, the runner typically creates a subscription just for that job, so
messages published between the old job stopping and the new one starting
can be dropped. A subscription you create yourself outlives any single
job, so each message is processed by either the old job or the new one.
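
For illustration, here is a minimal sketch with the Python SDK (the
project and subscription names are placeholders, not from your setup)
of reading from a pre-created subscription so that a drained job and
its replacement hand off messages cleanly:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms.window import FixedWindows

    # Placeholder resource name; create the subscription yourself so it
    # is not tied to the lifetime of any one Dataflow job.
    SUBSCRIPTION = "projects/my-project/subscriptions/my-sub"

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (p
         # Read from the subscription, not the topic: unacked messages
         # wait here and are delivered to whichever job attaches next.
         | "Read" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
         # Draining closes these 10-minute windows and flushes their
         # contents before the old job shuts down.
         | "Window" >> beam.WindowInto(FixedWindows(10 * 60))
         | "Process" >> beam.Map(lambda msg: msg))

The handoff would then be: drain the old job (gcloud dataflow jobs
drain JOB_ID --region=REGION), wait for it to finish flushing, and
start the new job against the same subscription.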

[1] https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline#drain

On Mon, Apr 15, 2024 at 9:33 AM Juan Romero <jsrf...@gmail.com> wrote:

> Hi guys. Good morning.
>
> I have been doing some tests with Apache Beam on Dataflow to see
> whether I can do a hot update or hot swap while the pipeline is
> processing a bunch of messages that fall in a 10-minute time window.
> What I saw is that when I do a hot update on the pipeline while there
> are messages in the time window (before they are sent to the target),
> the current job is shut down and Dataflow creates a new one. The
> problem is that the messages that were being processed in the old job
> seem to be lost and are not picked up by the new one, which means we
> are losing data.
>
> Can you help me or recommend any strategy to me?
>
> Thanks!!
>
