Thanks Reuven and Alex. Yes, we are considering specifying the max time to
read on the Pub/Sub input connector first. If that doesn't work out for
some reason, we will consider the approach with GCS. Thanks for your input.
Regards,
Sumit Desai
On Mon, Jan 22, 2024 at 4:13 AM Reuven Lax via user wrote:
Cloud Storage subscriptions are a reasonable way to back up data to
storage, and you can then run a batch pipeline over the GCS files. Keep in
mind that these files might contain duplicates (the storage subscriptions
do not guarantee exactly-once writes). If this is a problem, you should add
a deduplication step to the batch pipeline.
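Since the storage subscription can write the same message more than once, the batch pipeline needs to drop duplicates, typically by keying on the Pub/Sub message ID (or an application-level ID attribute) and keeping the first occurrence. A minimal stand-alone sketch of that logic in plain Java (not Beam; the class name and the assumed "id,payload" record layout are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    // Keep only the first record seen for each message ID.
    // Records are assumed to be "id,payload" lines; in a real Beam
    // pipeline you would key by the Pub/Sub message ID instead.
    static List<String> dedupById(List<String> records) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String id = record.split(",", 2)[0];
            if (seen.add(id)) {  // add() returns false if already present
                out.add(record);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "m1,hello", "m2,world", "m1,hello");  // m1 delivered twice
        System.out.println(dedupById(records));   // prints [m1,hello, m2,world]
    }
}
```

In an actual Beam Java pipeline the same effect is usually achieved with a built-in transform such as Distinct or Deduplicate applied to the keyed records, rather than an in-memory set.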
There are some valid use cases where you want to handle data going over
Pub/Sub in batch. Running something like a simple daily extract over the
Pub/Sub data as a streaming job is way too expensive; batch is a lot
cheaper.
What we do is back up the data to Cloud Storage; Pub/Sub has recently added
a nice feature that can write messages directly to Cloud Storage (Cloud
Storage subscriptions).
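For reference, a Cloud Storage subscription can be created from the gcloud CLI roughly as below (the subscription, topic, and bucket names are placeholders; there are additional flags for file prefixes and batching windows in the gcloud docs):

```shell
# Create a subscription that writes messages from my-topic
# into the GCS bucket my-bucket as files.
gcloud pubsub subscriptions create my-sub \
  --topic=my-topic \
  --cloud-storage-bucket=my-bucket
```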
Some comments here:
1. "All messages in a Pub/Sub topic" is not a well-defined notion, as
there can always be more messages published. You may know that nobody will
publish any more messages, but the pipeline does not.
2. While it's possible to read from Pub/Sub in batch, it's usually not
recommended.
Hi all,
I want to create a Dataflow pipeline using Pub/Sub as an input connector,
but I want to run it in batch mode and not streaming mode. I know it's not
possible in Python, but how can I achieve this in Java? Basically, I want
my pipeline to read all messages in a Pub/Sub topic, process them, and
terminate.