Re: Extremely Slow DirectRunner

2021-05-14 Thread Evan Galpin
Any further thoughts here? Or tips on profiling the Beam DirectRunner? Thanks, Evan On Wed, May 12, 2021 at 6:22 PM Evan Galpin wrote: > Ok gotcha. In my tests, all SDK versions 2.25.0 and higher exhibit slow > behaviour regardless of use_deprecated_read. Not sure if that points to > something dif

Re: Extremely Slow DirectRunner

2021-05-12 Thread Evan Galpin
Ok gotcha. In my tests, all SDK versions 2.25.0 and higher exhibit slow behaviour regardless of use_deprecated_read. Not sure if that points to something different then. Thanks, Evan On Wed, May 12, 2021 at 18:16 Steve Niemitz wrote: > I think it was only broken in 2.29. > > On Wed, May 12, 20

Re: Extremely Slow DirectRunner

2021-05-12 Thread Steve Niemitz
I think it was only broken in 2.29. On Wed, May 12, 2021 at 5:53 PM Evan Galpin wrote: > Ah ok thanks for that. Do you mean use_deprecated_read is broken > specifically in 2.29.0 (regression) or broken in all versions up to and > including 2.29.0 (i.e. never worked)? > > Thanks, > Evan > > On Wed

Re: Extremely Slow DirectRunner

2021-05-12 Thread Evan Galpin
Ah ok thanks for that. Do you mean use_deprecated_read is broken specifically in 2.29.0 (regression) or broken in all versions up to and including 2.29.0 (i.e. never worked)? Thanks, Evan On Wed, May 12, 2021 at 17:12 Steve Niemitz wrote: > Yeah, sorry my email was confusing. use_deprecated_rea

Re: Extremely Slow DirectRunner

2021-05-12 Thread Steve Niemitz
Yeah, sorry my email was confusing. use_deprecated_read is broken on the DirectRunner in 2.29. The behavior you describe is exactly the behavior I ran into as well when reading from pubsub with the new read method. I believe that soon the default is being reverted back to the old read method, n

Re: Extremely Slow DirectRunner

2021-05-12 Thread Evan Galpin
I'd be happy to share what I can. The applicable portion is this `expand` method of a PTransform (which does nothing more complex than group these other transforms together for re-use). The input to this PTransform is pubsub message bodies as strings. I'll paste it as plain-text. @Override

Re: Extremely Slow DirectRunner

2021-05-12 Thread Boyuan Zhang
Hi Evan, It seems the slow step is not the read that use_deprecated_read targets. Could you share your pipeline code, if possible? On Wed, May 12, 2021 at 1:35 PM Evan Galpin wrote: > I just tried with v2.29.0 and use_deprecated_read but unfortunately I > observed slow behavior

Re: Extremely Slow DirectRunner

2021-05-12 Thread Evan Galpin
I just tried with v2.29.0 and use_deprecated_read but unfortunately I observed slow behavior again. Is it possible that use_deprecated_read is broken in 2.29.0 as well? Thanks, Evan On Wed, May 12, 2021 at 3:21 PM Steve Niemitz wrote: > oops sorry I was off by 10...I meant 2.29 not 2.19. > > On

Re: Extremely Slow DirectRunner

2021-05-12 Thread Steve Niemitz
oops sorry I was off by 10...I meant 2.29 not 2.19. On Wed, May 12, 2021 at 2:55 PM Evan Galpin wrote: > Thanks for the link/info. v2.19.0 and v2.21.0 did exhibit the "faster" > behavior, as did v2.23.0. But that "fast" behavior stopped at v2.25.0 (for > my use case at least) regardless of use_d

Re: Extremely Slow DirectRunner

2021-05-12 Thread Evan Galpin
Thanks for the link/info. v2.19.0 and v2.21.0 did exhibit the "faster" behavior, as did v2.23.0. But that "fast" behavior stopped at v2.25.0 (for my use case at least) regardless of use_deprecated_read setting. Thanks, Evan On Wed, May 12, 2021 at 2:47 PM Steve Niemitz wrote: > use_deprecated_

Re: Extremely Slow DirectRunner

2021-05-12 Thread Steve Niemitz
use_deprecated_read was broken in 2.19 on the direct runner and didn't do anything. [1] I don't think the fix is in 2.20 either, but will be in 2.21. [1] https://github.com/apache/beam/pull/14469 On Wed, May 12, 2021 at 1:41 PM Evan Galpin wrote: > I forgot to also mention that in all tests I

Re: Extremely Slow DirectRunner

2021-05-12 Thread Evan Galpin
I forgot to also mention that in all tests I was setting --experiments=use_deprecated_read Thanks, Evan On Wed, May 12, 2021 at 1:39 PM Evan Galpin wrote: > Hmm, I think I spoke too soon. I'm still seeing an issue of overall > DirectRunner slowness, not just pubsub. I have a pipeline like so: >

Re: Extremely Slow DirectRunner

2021-05-12 Thread Evan Galpin
Hmm, I think I spoke too soon. I'm still seeing an issue of overall DirectRunner slowness, not just pubsub. I have a pipeline like so: Read pubsub | extract GCS glob patterns | FileIO.matchAll() | FileIO.readMatches() | Read file contents | etc I have temporarily set up a transform betwe

Re: Extremely Slow DirectRunner

2021-05-12 Thread Evan Galpin
On Mon, May 10, 2021 at 2:09 PM Boyuan Zhang wrote: > Hi Evan, > > What do you mean by startup delay? Is it the time from when you start the > pipeline to when you notice the first output record from PubSub? > Yes, that's what I meant: the seemingly idle system waiting for pubsub output des

Re: Extremely Slow DirectRunner

2021-05-10 Thread Boyuan Zhang
Hi Evan, What do you mean by startup delay? Is it the time from when you start the pipeline to when you notice the first output record from PubSub? On Sat, May 8, 2021 at 12:50 AM Ismaël Mejía wrote: > Can you try running direct runner with the option > `--experiments=use_deprecated_read`

Re: Extremely Slow DirectRunner

2021-05-08 Thread Ismaël Mejía
Can you try running the direct runner with the option `--experiments=use_deprecated_read`? This seems like an instance of https://issues.apache.org/jira/browse/BEAM-10670?focusedCommentId=17316858&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17316858 also reported in https:/
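
For reference, the experiment discussed throughout this thread is passed as an ordinary pipeline option on the command line. The following is only a sketch: the jar name and main class are hypothetical placeholders (not taken from any message), and it assumes the main class forwards its arguments to `PipelineOptionsFactory.fromArgs`. Only `--experiments=use_deprecated_read` comes from the thread itself; `--runner=DirectRunner` is the usual Beam option for selecting the direct runner.

```shell
# Hypothetical jar and main class; substitute your own pipeline.
# Assumes main() passes its args to PipelineOptionsFactory.fromArgs(args).
java -cp my-pipeline-bundled.jar com.example.MyPipeline \
  --runner=DirectRunner \
  --experiments=use_deprecated_read
```

Per Steve Niemitz's messages in this thread, the flag was broken on the DirectRunner in 2.29, so on that release it may make no observable difference.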