Are you draining[1] your pipeline, or simply canceling it and starting a new one? Draining should close open windows and attempt to flush all in-flight data before shutting down. For Pub/Sub you may also need to read from subscriptions rather than topics, so that each message is delivered to exactly one job, either the old one or its replacement.
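For illustration, here is a minimal, untested sketch with the Beam Python SDK. The project/subscription names are placeholders, the 10-minute window matches your description, and the final print step just stands in for your real write-to-target logic:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms.window import FixedWindows

    # Placeholder subscription. Both the old job and the replacement job
    # should read from the same subscription, so each message is delivered
    # to exactly one of them instead of being lost.
    SUBSCRIPTION = "projects/my-project/subscriptions/my-subscription"

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
            | "Window" >> beam.WindowInto(FixedWindows(10 * 60))
            # ... your aggregation and write-to-target steps go here ...
            | "Debug" >> beam.Map(print)
        )

Then, instead of cancelling, drain the old job once the replacement is running, e.g.:

    gcloud dataflow jobs drain JOB_ID --region=REGION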
[1] https://cloud.google.com/dataflow/docs/guides/stopping-a-pipeline#drain

On Mon, Apr 15, 2024 at 9:33 AM Juan Romero <jsrf...@gmail.com> wrote:

> Hi guys. Good morning.
>
> I have run some tests in Apache Beam on Dataflow to see whether I can do a
> hot update or hot swap while the pipeline is processing a bunch of messages
> that fall in a 10-minute time window. What I saw is that when I do a hot
> update on the pipeline while there are still messages in the time window
> (before they are sent to the target), the current job is shut down and
> Dataflow creates a new one. The problem is that the messages that were
> being processed in the old job seem to be lost; they are not picked up by
> the new one, which means we are losing data.
>
> Can you help me or recommend a strategy?
>
> Thanks!!