Hi,

I'd like to confirm Beam's data guarantees when it is used with Google Cloud
Pub/Sub and Cloud Storage and run on Dataflow. I can't find any explicit
documentation on this.

If the Beam job is running successfully, then I believe all data will be
delivered to GCS at least once. If I stop the job with 'Drain', then any
in-flight data will be processed and saved.

What happens if the Beam job is not running successfully, for example because
it is throwing exceptions? Will the data still be available in Pub/Sub when I
cancel (not drain) the job? Does a drain complete successfully if the data
cannot be written to GCS because of the exceptions?

Thanks,
Andrew
