Hi Beam Community,
I am working on a Beam pipeline that reads data from both an unbounded and a bounded source, and I want to leverage Beam state management in the pipeline. To put data into Beam state, I have to convert it into key-value pairs (e.g. KV<String, Object>). Because I am reading from an unbounded source, I am forced to apply windowing + triggering before grouping the data by key. I have chosen to use GlobalWindows() (a simplified sketch of the setup is at the end of this message).

I am able to kick off the Dataflow job that runs my pipeline, but I have noticed that Dataflow uses only 1 worker node to perform the work and never scales the job out to more workers, so I am not getting the benefit of distributed processing.

I have posted the question on Stack Overflow: https://stackoverflow.com/questions/55242684/join-bounded-and-non-bounded-source-data-flow-job-not-scaling but I am also reaching out on the mailing list to get some help, or to learn what I am missing. Any help would be appreciated.

Thanks,
- Maulik
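
P.S. For reference, here is a rough sketch of the windowing + triggering I am applying before the GroupByKey. The key/value types and the 30-second trigger delay are simplified stand-ins, not my exact pipeline code:

    import org.apache.beam.sdk.transforms.GroupByKey;
    import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
    import org.apache.beam.sdk.transforms.windowing.Repeatedly;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    // Window the keyed elements into the global window with a repeating
    // processing-time trigger, so GroupByKey can emit panes on the
    // unbounded input instead of waiting for the (never-arriving) end
    // of the global window.
    static PCollection<KV<String, Iterable<String>>> windowAndGroup(
        PCollection<KV<String, String>> keyed) {
      return keyed
          .apply("GlobalWindow",
              Window.<KV<String, String>>into(new GlobalWindows())
                  .triggering(Repeatedly.forever(
                      AfterProcessingTime.pastFirstElementInPane()
                          .plusDelayOf(Duration.standardSeconds(30))))
                  .discardingFiredPanes())
          .apply("GroupByKey", GroupByKey.create());
    }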