Hi Mikhail,

Thanks for the reply.

I believe that the Dataflow optimizer assumes that running this pipeline on a single worker is faster and more cost-effective than spinning up 100 workers. Work items should still be handled in parallel on a single worker, though. The number of workers can increase if you increase the amount of source data.

- Yes. Even if it is running on one worker, it should make the requests in parallel, but that is not what is happening. Do you think Reshuffle would help with parallelism? I am not sure how it works.

Regards,
Anjana

________________________________
From: Mikhail Gryzykhin [[email protected]]
Sent: Friday, June 21, 2019 11:34 AM
To: [email protected]
Subject: [Sender Auth Failure] Re: Question about Parallel execution

Hi Anjana,

I believe that the Dataflow optimizer assumes that running this pipeline on a single worker is faster and more cost-effective than spinning up 100 workers. Work items should still be handled in parallel on a single worker, though. The number of workers can increase if you increase the amount of source data.

Another concern about this pipeline is that it can actually send more than 100 requests to the API. This can happen when a work item fails and is retried, or when Dataflow decides it is worth processing the same item on two workers and lets whichever copy completes first continue down the pipeline.

Regards,
Mikhail.

On Fri, Jun 21, 2019 at 11:26 AM Anjana Pydi <[email protected]> wrote:

Hi,

I have a Beam pipeline which creates 100 requests and posts them to an API endpoint, like below:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions()) as p:
        elements = (
            p
            | beam.Create(range(1, 101))
            | 'create requests' >> beam.ParDo(create_random_responses())
            | 'send to api' >> beam.Map(lambda input: send_to_api(input))
        )

When running the pipeline with the Dataflow runner, I expect it to make the 100 requests in parallel, but it makes them sequentially and the runner uses only 1 worker. Can someone please explain how to make it send the requests in parallel instead of sequentially?

Thanks,
Anjana

-----------------------------------------------------------------------------------------------------------------------
The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you are not the intended recipient, please notify us immediately by responding to this email and then delete it from your system. Bahwan Cybertek is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt.
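
On the Reshuffle question raised in this thread: a commonly suggested approach is to place beam.Reshuffle() right after beam.Create, so that the 100 elements are not fused into one sequential stage with the steps that follow. The sketch below is only an illustration under stated assumptions, not something confirmed by anyone in the thread. It also shows a second, independent option: grouping elements with beam.BatchElements and sending each batch through a thread pool, so that requests overlap even on a single worker. The names send_batch, build_pipeline and max_threads, the batch size of 10, and the body of send_to_api are hypothetical stand-ins. Whether Dataflow actually adds workers is still up to the runner.

```python
from concurrent.futures import ThreadPoolExecutor


def send_to_api(request):
    """Hypothetical stand-in for the real API call from the thread."""
    return {"request": request, "status": "ok"}


def send_batch(requests, max_threads=10):
    """Send one batch of requests concurrently from a single worker.

    pool.map returns results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(send_to_api, requests))


def build_pipeline(p):
    """Attach the steps to an existing beam.Pipeline object `p`."""
    # Imported here so the helpers above can be tried without Beam installed.
    import apache_beam as beam

    return (
        p
        | beam.Create(range(1, 101))
        # Reshuffle is intended to break fusion so the elements can be
        # redistributed before the expensive step.
        | 'break fusion' >> beam.Reshuffle()
        # Assumed batch size of 10; tune it to what the API tolerates.
        | 'batch' >> beam.BatchElements(min_batch_size=10, max_batch_size=10)
        # Each batch (a list) is sent through the thread pool.
        | 'send to api' >> beam.FlatMap(send_batch)
    )
```

To try it, call build_pipeline(p) inside a "with beam.Pipeline(options=PipelineOptions()) as p:" block. As Mikhail notes, work items can be retried or processed twice, so the API may still receive more than 100 requests, and the endpoint ideally tolerates duplicates.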
