Hi Maulik,

Have you submitted your job with the correct configuration to enable
autoscaling?

--autoscalingAlgorithm=
--maxNumWorkers=

I am on my phone right now and can't check whether the flag names are
100% correct.
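
For reference, a minimal sketch of setting these options on the Dataflow
runner in Java (the class name and worker count here are illustrative,
not from your job):

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SubmitWithAutoscaling {
  public static void main(String[] args) {
    // Picks up --autoscalingAlgorithm and --maxNumWorkers from the command line.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    // Or set them programmatically:
    options.setAutoscalingAlgorithm(AutoscalingAlgorithmType.THROUGHPUT_BASED);
    options.setMaxNumWorkers(10); // illustrative upper bound on worker count
    // ... build and run the pipeline with these options ...
  }
}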


Maulik Gandhi <[email protected]> schrieb am Di., 19. März 2019, 18:13:

>
> Hi Beam Community,
>
> I am working on a Beam processing pipeline that reads data from both an
> unbounded and a bounded source, and I want to leverage Beam state
> management in my pipeline.  To put data into Beam state, I have to
> transform the data into key-value pairs (e.g. KV<String, Object>).  As
> I am reading from unbounded and bounded sources, I am forced to apply
> windowing and triggering before grouping the data by key.  I have
> chosen to use GlobalWindows().
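>
> A simplified sketch of that windowing step (the method name, variable
> names, and trigger delay here are illustrative, not my exact code):
>
> import org.apache.beam.sdk.transforms.GroupByKey;
> import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
> import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
> import org.apache.beam.sdk.transforms.windowing.Repeatedly;
> import org.apache.beam.sdk.transforms.windowing.Window;
> import org.apache.beam.sdk.values.KV;
> import org.apache.beam.sdk.values.PCollection;
> import org.joda.time.Duration;
>
> // Windows the merged bounded + unbounded input into the global window
> // with a repeated processing-time trigger, then groups by key.
> static PCollection<KV<String, Iterable<Object>>> windowAndGroup(
>     PCollection<KV<String, Object>> keyed) {
>   return keyed
>       .apply(Window.<KV<String, Object>>into(new GlobalWindows())
>           .triggering(Repeatedly.forever(
>               AfterProcessingTime.pastFirstElementInPane()
>                   .plusDelayOf(Duration.standardMinutes(1))))
>           .discardingFiredPanes())
>       .apply(GroupByKey.create());
> }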
>
> I am able to kick off the Dataflow job that runs my Beam pipeline.  I
> have noticed that Dataflow uses only 1 worker node to perform the work
> and does not scale the job out to more worker nodes, so it is not
> leveraging the benefit of distributed processing.
>
> I have posted the question on Stack Overflow:
> https://stackoverflow.com/questions/55242684/join-bounded-and-non-bounded-source-data-flow-job-not-scaling
> but I am also reaching out on the mailing list to get some help or to
> learn what I am missing.
>
> Any help would be appreciated.
>
> Thanks.
> - Maulik
>
