Re: [Dataflow][Stateful] Bypass Dataflow Overrides?

2023-05-25 Thread Robert Bradshaw via user
The key in GroupIntoBatches is actually not semantically meaningful, and for a batch pipeline the use of state/timers is not needed either. If all you need to do is batch elements into groups of (at most) N, you can write a DoFn that collects things in its process method and emits them when the bat

Re: [Dataflow][Stateful] Bypass Dataflow Overrides?

2023-05-25 Thread Evan Galpin
Understood, thanks for the clarification, I'll need to look more in-depth at my pipeline code then. I'm definitely observing that all steps downstream from the Stateful step in my pipeline do not start until steps upstream of the Stateful step are fully completed. The Stateful step is a RateLimit

Re: [Dataflow][Stateful] Bypass Dataflow Overrides?

2023-05-25 Thread Robert Bradshaw via user
The GbkBeforeStatefulParDo is an implementation detail used to send all elements with the same key to the same worker (so that they can share state, which is itself partitioned by worker). This does cause a global barrier in batch pipelines. On Thu, May 25, 2023 at 2:15 PM Evan Galpin wrote: > H

[Dataflow][Stateful] Bypass Dataflow Overrides?

2023-05-25 Thread Evan Galpin
Hi all, I'm running into a scenario where I feel that Dataflow Overrides (specifically BatchStatefulParDoOverrides.GbkBeforeStatefulParDo ) are unnecessarily causing a batch pipeline to "pause" throughput since a GBK needs to have processed all the data in a window before it can output. Is it str

Re: 🗽 Join us in NYC at Beam Summit 2023

2023-05-25 Thread Pablo Estrada via user
let's goo On Thu, May 25, 2023 at 12:49 PM Carolina Escobar wrote: > *Get to know our speakers!* > > *Take a quick peek at our program:* > > >- > >*Beam IO: CDAP and SparkReceiver IO Connectors Overview * >Alex Kosolapov and Elizaveta Lomteva give an overview of a Beam IO >

🗽 Join us in NYC at Beam Summit 2023

2023-05-25 Thread Carolina Escobar
*Get to know our speakers!* *Take a quick peek at our program:* - *Beam IO: CDAP and SparkReceiver IO Connectors Overview * Alex Kosolapov and Elizaveta Lomteva give an overview of a Beam IO development process and practical insights gained from experience developing CDAP IO and