Hi Evan,

What do you mean by startup delay? Is it the time from when you start the
pipeline to when you notice the first output record from PubSub?

On Sat, May 8, 2021 at 12:50 AM Ismaël Mejía <ieme...@gmail.com> wrote:

> Can you try running direct runner with the option
> `--experiments=use_deprecated_read`
>
> Seems like an instance of
> https://issues.apache.org/jira/browse/BEAM-10670?focusedCommentId=17316858&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17316858
> also reported in
> https://lists.apache.org/thread.html/re6b0941a8b4951293a0327ce9b25e607cafd6e45b69783f65290edee%40%3Cdev.beam.apache.org%3E
>
> We should roll back the change that made the SDF wrapper the default
> because of the usability and performance issues reported.
>
>
> On Sat, May 8, 2021 at 12:57 AM Evan Galpin <evan.gal...@gmail.com> wrote:
>
>> Hi all,
>>
>> I’m experiencing very slow performance and startup delay when testing a
>> pipeline locally. I’m reading data from a Google PubSub subscription as the
>> data source, and before each pipeline execution I ensure that data is
>> present in the subscription (readable from GCP console).
>>
>> I’m seeing startup delay on the order of minutes with DirectRunner (5-10
>> min). Is that expected? I did find a Jira ticket[1] that at first seemed
>> related, but I think it has more to do with BQ than DirectRunner.
>>
>> I’ve run the pipeline with a debugger connected and confirmed that it’s
>> minutes before the first DoFn in my pipeline receives any data. Is there a
>> way I can profile the direct runner to see what it’s churning on?
>>
>> Thanks,
>> Evan
>>
>> [1]
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/BEAM-4548
>>
>
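
Ismaël's suggested `--experiments=use_deprecated_read` flag can also be enabled in code instead of on the command line. The following is a minimal sketch that assumes the Beam Java SDK, because the thread does not say which SDK the pipeline uses. The class `DeprecatedReadExample` and the helper `withDeprecatedRead` are hypothetical names. `ExperimentalOptions.addExperiment` and `ExperimentalOptions.hasExperiment` are the Beam Java helpers for the `experiments` pipeline option.

```java
import java.util.List;
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Hypothetical example class: shows one way to turn on the
// "use_deprecated_read" experiment programmatically (Beam Java SDK assumed).
public class DeprecatedReadExample {

  static final String DEPRECATED_READ = "use_deprecated_read";

  // Parses the command-line args and makes sure the experiment is enabled,
  // which has the same effect as passing --experiments=use_deprecated_read.
  static PipelineOptions withDeprecatedRead(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    // Only add the experiment if it was not already passed on the command line.
    if (!ExperimentalOptions.hasExperiment(options, DEPRECATED_READ)) {
      ExperimentalOptions.addExperiment(options.as(ExperimentalOptions.class), DEPRECATED_READ);
    }
    return options;
  }

  public static void main(String[] args) {
    PipelineOptions options = withDeprecatedRead(args);
    List<String> experiments = options.as(ExperimentalOptions.class).getExperiments();
    System.out.println("Experiments: " + experiments);
    // Build and run the PubSub pipeline from these options as usual, e.g.
    // Pipeline p = Pipeline.create(options); ... p.run();
  }
}
```

Passing the flag on the command line is equivalent. The helper only checks that the experiment ends up in the options, so it does not duplicate an entry that was already supplied.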
