Hi Pablo,
Thanks a lot for taking the time out to answer my questions. It's been
fantastic to experience this.
For others reading this in the archives in the future, the trick seems to
be in the *FileIO.readMatches()* call which documents as
-
>
>Converts each result of match()
>
>
I've just learned that there are these transforms that should be useful:
p.apply(FileIO.match().filepattern(...))
.apply(WithKeys.of((Void) null))
.apply(GroupByKey.create())
.apply(Values.create())
.apply(Flatten.itearables())
.apply(FileIO.readMatches())
.apply(ParDo.of(new ConsumeFi
Hi Matt,
I am much more familiar with Python, so I usually answer questions using
that SDK. Also, it's quicker to type a fully detailed pipeline on an email
and the SDKs are similar enough that it should not be too difficult to
translate to Java from an IDE.
To your questions:
1. Grouping like th
Hi Pablo,
Apologies, I thought the cases were very simple and clear. Obviously I
should have also mentioned I'm in Java land, not used to the script kiddy
stuff :-)
On the output side: thanks for the grouping "trick". However, doesn't that
mean that all rows will end up in a single in-memory It
Hi Matt,
is this computation running as part of a larger pipeline that does run some
parallel processing? Otherwise, it's odd that it needs to run on Beam.
Nonetheless, you can certainly do this with a pipeline that has a single
element. Here's what that looks like in python:
p | beam.Create(['gs:
Hi Beam!
There's a bunch of stuff that I would like to support and it's probably
something silly but I couldn't find it immediately ... or I'm completely
dim and making too much of certain things.
The thing is, sometimes you just want to do a single threaded operations.
For example, we sometimes