Hi everyone,

I'm new to Apache Beam and have a question regarding its usage.

I have a scenario where I need to read a stream of elements from a
PCollection and write them to a new file every 5 minutes.

Initially, I considered using Timers and state stores, but I discovered
that Timers are only applicable to KV pairs. If I convert my PCollection
into a key-value pair with a dummy key and then use timers, I encountered
several issues:

   1. It introduces an additional shuffle.
   2. With all elements sharing the same key, they would be processed by a
   single task in the Flink on Beam application. I prefer not to manually
   define the number of keys based on load because I plan to run multiple
   pipelines, each with varying loads.

One alternative I considered is using a custom executor thread within my
Writer DoFn to flush the records every 5 minutes. However, this approach
would require me to use a lock to make sure only one of the process element
and the flush blocks are running at a time.

Is there a more effective way to accomplish this?



Thanks,

Vamsi

Reply via email to