Hi folks,

Hoping to get some definitive answers with respect to streaming pipeline
bundle retry semantics on Dataflow.  I understand that a bundle containing
a "poison pill" (bad data that, say, causes a NullPointerException when
processed in a DoFn) will be retried indefinitely.  What I'm not clear
on are the implications of those retries.
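To make the scenario concrete, here is a rough sketch of the kind of DoFn I
have in mind (the class name and the "userId=" field format are made up
purely for illustration):

import org.apache.beam.sdk.transforms.DoFn;

// Illustrative only: an element with no "userId=" field leaves userId null,
// the trim() call throws a NullPointerException, and the whole bundle
// containing that element fails and is retried.
class ExtractUserIdFn extends DoFn<String, String> {
  @ProcessElement
  public void processElement(@Element String event, OutputReceiver<String> out) {
    String userId = null;
    for (String field : event.split(",")) {
      if (field.startsWith("userId=")) {
        userId = field.substring("userId=".length());
      }
    }
    // Poison pill: NPE here when the field is missing.
    out.output(userId.trim());
  }
}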


   1. Is it the case that a worker will continuously retry the same "poison
   pill" bundle, and be unable to work on any other/new bundles indefinitely
   after receiving the first poison pill? I've noticed that a small number of
   poison pills can cause all processing to stall, even when the bad data
   represents only a very small percentage of the overall data being
   processed.
   2. Does windowing interact with this retry/stall scenario in any way?
   I've noticed that the scenario where all processing stalls entirely is
   more common for a pipeline where all data is globally windowed (a
   simplified sketch of that setup follows below). I don't, however, have a
   solid understanding of how to explain that observation; I'd really
   appreciate any insights that can aid in understanding it.

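For reference on point 2, the globally windowed setup where I see the stall
looks roughly like this (simplified; the topic name and trigger interval are
placeholders, and ExtractUserIdFn is the sketch from above):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class GloballyWindowedPipeline {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(PubsubIO.readStrings().fromTopic("projects/my-project/topics/events"))
        // Everything stays in the single GlobalWindow; panes fire periodically
        // on processing time rather than closing per event-time window.
        .apply(Window.<String>into(new GlobalWindows())
            .triggering(Repeatedly.forever(
                AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardMinutes(1))))
            .withAllowedLateness(Duration.ZERO)
            .discardingFiredPanes())
        // A single bad element here stalls its bundle (see sketch above).
        .apply(ParDo.of(new ExtractUserIdFn()));

    p.run();
  }
}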
Thanks,
Evan
