Re: Single threaded processing

2019-01-08 Thread Matt Casters
Hi Pablo, Thanks a lot for taking the time out to answer my questions. It's been fantastic to experience this. For others reading this in the archives in the future, the trick seems to be in the *FileIO.readMatches()* call which documents as - > >Converts each result of match() > >

Re: Single threaded processing

2019-01-07 Thread Pablo Estrada
I've just learned that there are these transforms that should be useful: p.apply(FileIO.match().filepattern(...)) .apply(WithKeys.of((Void) null)) .apply(GroupByKey.create()) .apply(Values.create()) .apply(Flatten.itearables()) .apply(FileIO.readMatches()) .apply(ParDo.of(new ConsumeFi

Re: Single threaded processing

2019-01-07 Thread Pablo Estrada
Hi Matt, I am much more familiar with Python, so I usually answer questions using that SDK. Also, it's quicker to type a fully detailed pipeline on an email and the SDKs are similar enough that it should not be too difficult to translate to Java from an IDE. To your questions: 1. Grouping like th

Re: Single threaded processing

2019-01-07 Thread Matt Casters
Hi Pablo, Apologies, I thought the cases were very simple and clear. Obviously I should have also mentioned I'm in Java land, not used to the script kiddy stuff :-) On the output side: thanks for the grouping "trick". However, doesn't that mean that all rows will end up in a single in-memory It

Re: Single threaded processing

2019-01-07 Thread Pablo Estrada
Hi Matt, is this computation running as part of a larger pipeline that does run some parallel processing? Otherwise, it's odd that it needs to run on Beam. Nonetheless, you can certainly do this with a pipeline that has a single element. Here's what that looks like in python: p | beam.Create(['gs:

Single threaded processing

2019-01-07 Thread Matt Casters
Hi Beam! There's a bunch of stuff that I would like to support and it's probably something silly but I couldn't find it immediately ... or I'm completely dim and making too much of certain things. The thing is, sometimes you just want to do a single threaded operations. For example, we sometimes