Any further thoughts here? Or tips on profiling Beam DirectRunner?
Thanks,
Evan
On Wed, May 12, 2021 at 6:22 PM Evan Galpin wrote:
Ok gotcha. In my tests, all SDK versions 2.25.0 and higher exhibit slow
behaviour regardless of use_deprecated_read. Not sure if that points to
something different, then.
Thanks,
Evan
On Wed, May 12, 2021 at 18:16 Steve Niemitz wrote:
I think it was only broken in 2.29.
On Wed, May 12, 2021 at 5:53 PM Evan Galpin wrote:
Ah ok, thanks for that. Do you mean use_deprecated_read is broken
specifically in 2.29.0 (a regression), or broken in all versions up to and
including 2.29.0 (i.e. it never worked)?
Thanks,
Evan
On Wed, May 12, 2021 at 17:12 Steve Niemitz wrote:
Yeah, sorry, my email was confusing. use_deprecated_read is broken on the
DirectRunner in 2.29.
The behavior you describe is exactly the behavior I ran into as well when
reading from pubsub with the new read method. I believe the default will soon
be reverted to the old read method, […]
I'd be happy to share what I can. The applicable portion is this `expand`
method of a PTransform (which does nothing more complex than group these
other transforms together for reuse). The input to this PTransform is the
pubsub message bodies, as strings. I'll paste it as plain text.
@Override
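For illustration only, here is a minimal sketch of a composite PTransform with the shape described in this thread: message bodies as strings in, then glob extraction, FileIO.matchAll(), FileIO.readMatches(), and reading the file contents. It is not the code that was pasted. The names ReadFilesFromGlobMessages and ReadContentsFn are hypothetical, and the sketch assumes each message body is itself a glob pattern (e.g. gs://bucket/dir/*.json), so the "extract" step only trims whitespace. Reading gs:// paths also assumes Beam's Google Cloud Platform filesystem module is on the classpath.

```java
import java.io.IOException;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

/**
 * Hypothetical composite transform: message bodies in, whole-file contents out.
 * Assumes each message body is itself a glob pattern, e.g. "gs://bucket/dir/*.json".
 */
public class ReadFilesFromGlobMessages
    extends PTransform<PCollection<String>, PCollection<String>> {

  @Override
  public PCollection<String> expand(PCollection<String> messageBodies) {
    return messageBodies
        // Stand-in for "extract GCS glob patterns": just strip surrounding whitespace.
        .apply("ExtractGlob",
            MapElements.into(TypeDescriptors.strings()).via((String body) -> body.trim()))
        // Expand each pattern into metadata for every file it currently matches.
        .apply("MatchAll", FileIO.matchAll())
        // Turn each match into a FileIO.ReadableFile.
        .apply("ReadMatches", FileIO.readMatches())
        // Emit each file's full contents as one UTF-8 string.
        .apply("ReadContents", ParDo.of(new ReadContentsFn()));
  }

  /** Hypothetical helper: reads an entire matched file into memory as UTF-8. */
  static class ReadContentsFn extends DoFn<FileIO.ReadableFile, String> {
    @ProcessElement
    public void processElement(@Element FileIO.ReadableFile file, OutputReceiver<String> out)
        throws IOException {
      out.output(file.readFullyAsUTF8String());
    }
  }
}
```

Because the input is a plain PCollection<String>, a transform like this can be exercised on the DirectRunner with Create.of(...) instead of a Pub/Sub source. That may help separate the cost of the Pub/Sub read from the cost of the file-matching stages.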
Hi Evan,
It seems like the slow step is not the read that use_deprecated_read
targets. Could you share your pipeline code, if possible?
On Wed, May 12, 2021 at 1:35 PM Evan Galpin wrote:
I just tried with v2.29.0 and use_deprecated_read but unfortunately I
observed slow behavior again. Is it possible that use_deprecated_read is
broken in 2.29.0 as well?
Thanks,
Evan
On Wed, May 12, 2021 at 3:21 PM Steve Niemitz wrote:
Oops, sorry, I was off by 10... I meant 2.29, not 2.19.
On Wed, May 12, 2021 at 2:55 PM Evan Galpin wrote:
Thanks for the link/info. v2.19.0 and v2.21.0 did exhibit the "faster"
behavior, as did v2.23.0. But that "fast" behavior stopped at v2.25.0 (for
my use case at least) regardless of use_deprecated_read setting.
Thanks,
Evan
On Wed, May 12, 2021 at 2:47 PM Steve Niemitz wrote:
use_deprecated_read was broken in 2.19 on the direct runner and didn't do
anything. [1] I don't think the fix is in 2.20 either, but will be in 2.21.
[1] https://github.com/apache/beam/pull/14469
On Wed, May 12, 2021 at 1:41 PM Evan Galpin wrote:
I forgot to also mention that in all tests I was setting
--experiments=use_deprecated_read
Thanks,
Evan
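As a sketch (the class DeprecatedReadOptions and its method are hypothetical), the same experiment that --experiments=use_deprecated_read enables on the command line can also be added in code through Beam's ExperimentalOptions. This can be convenient when a test harness builds the options itself:

```java
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

/** Hypothetical helper wrapping Beam's PipelineOptionsFactory and ExperimentalOptions. */
public class DeprecatedReadOptions {

  /** Parses command-line args, then adds the use_deprecated_read experiment in code. */
  public static PipelineOptions withDeprecatedRead(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    // In-code counterpart of passing --experiments=use_deprecated_read on the command line.
    ExperimentalOptions.addExperiment(options.as(ExperimentalOptions.class), "use_deprecated_read");
    return options;
  }
}
```

A check such as ExperimentalOptions.hasExperiment(options, "use_deprecated_read") can confirm the flag reached the options. Whether a given runner and SDK version actually honors it is a separate question, as this thread shows.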
On Wed, May 12, 2021 at 1:39 PM Evan Galpin wrote:
Hmm, I think I spoke too soon. I'm still seeing an issue of overall
DirectRunner slowness, not just pubsub. I have a pipeline like so:
Read pubsub | extract GCS glob patterns | FileIO.matchAll() |
FileIO.readMatches() | Read file contents | etc
I have temporarily set up a transform betwe
On Mon, May 10, 2021 at 2:09 PM Boyuan Zhang wrote:
> Hi Evan,
>
> What do you mean startup delay? Is it the time that from you start the
> pipeline to the time that you notice the first output record from PubSub?
>
Yes that's what I meant, the seemingly idle system waiting for pubsub
output des
Hi Evan,
What do you mean by startup delay? Is it the time from when you start the
pipeline to when you notice the first output record from PubSub?
On Sat, May 8, 2021 at 12:50 AM Ismaël Mejía wrote:
Can you try running direct runner with the option
`--experiments=use_deprecated_read`
Seems like an instance of
https://issues.apache.org/jira/browse/BEAM-10670?focusedCommentId=17316858&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17316858
also reported in
https:/