Haha, thanks, Sourabh, you beat me to it :)

On Thu, Jun 1, 2017 at 2:55 PM, Dmitry Demeshchuk <[email protected]> wrote:
> Looks like the expand method should do the trick, similar to how it's done
> in GroupByKey?
>
> https://github.com/apache/beam/blob/dc4acfdd1bb30a07a9c48849f88a67f60bc8ff08/sdks/python/apache_beam/transforms/core.py#L1104
>
> On Thu, Jun 1, 2017 at 2:37 PM, Dmitry Demeshchuk <[email protected]> wrote:
>
>> Hi folks,
>>
>> I'm currently playing with the Python SDK, primarily 0.6.0, since 2.0.0
>> is apparently not supported by Dataflow, but I'm trying to understand the
>> 2.0.0 API better too.
>>
>> I've been trying to find a way of combining two or more DoFns into a
>> single one, so that one doesn't have to repeat the same pattern over and
>> over again.
>>
>> Specifically, my use case is getting data out of Redshift via the
>> "UNLOAD" command:
>>
>> 1. Connect to Redshift via the Postgres protocol and run the unload
>> <http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html>.
>> 2. Connect to S3 and fetch the files that Redshift unloaded there,
>> converting them into a PCollection.
>>
>> It's worth noting here that Redshift generates multiple files, usually at
>> least 10 or so; the exact number may depend on the number of cores of the
>> Redshift instance, some settings, etc. Reading these files in parallel
>> sounds like a good idea.
>>
>> So, it feels like this is just a combination of two FlatMaps:
>> 1. SQL query -> list of S3 files
>> 2. List of S3 files -> rows of data
>>
>> I could just create two DoFns for that and make people combine them, but
>> that feels like overkill. Instead, one should just call ReadFromRedshift
>> and not really care about what exactly happens under the hood.
>>
>> Plus, it just feels like the ability to take somewhat complex pieces of
>> the execution graph and encapsulate them into a DoFn would be a nice
>> capability.
>>
>> Are there any officially recommended ways to do that?
>>
>> Thank you.
>>
>> --
>> Best regards,
>> Dmitry Demeshchuk.
>
> --
> Best regards,
> Dmitry Demeshchuk.

--
Best regards,
Dmitry Demeshchuk.
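
A minimal sketch of the expand-based approach discussed above: subclass
beam.PTransform and override expand, the same pattern GroupByKey follows,
so the two FlatMap steps are hidden behind one ReadFromRedshift transform.
The helpers unload_to_s3 and read_s3_file below are hypothetical
placeholders for the real Redshift (Postgres protocol) and S3 client code.

    import apache_beam as beam


    def unload_to_s3(query, s3_prefix):
        """Hypothetical helper: run UNLOAD over the Postgres protocol and
        return the list of S3 paths that Redshift wrote. Placeholder only."""
        raise NotImplementedError


    def read_s3_file(path):
        """Hypothetical helper: fetch one unloaded file from S3 and yield
        its rows. Placeholder only."""
        raise NotImplementedError


    class ReadFromRedshift(beam.PTransform):
        """Composite transform: UNLOAD a Redshift query to S3, then read
        the unloaded files back as a PCollection of rows."""

        def __init__(self, query, s3_prefix):
            super(ReadFromRedshift, self).__init__()
            self.query = query
            self.s3_prefix = s3_prefix

        def expand(self, pbegin):
            return (
                pbegin
                # Seed the graph with the single SQL query string.
                | 'Seed' >> beam.Create([self.query])
                # Step 1: SQL query -> list of S3 files written by UNLOAD.
                | 'Unload' >> beam.FlatMap(
                    lambda query: unload_to_s3(query, self.s3_prefix))
                # Step 2: each S3 file -> its rows of data.
                | 'ReadFiles' >> beam.FlatMap(read_s3_file)
            )

Applied as rows = p | ReadFromRedshift('SELECT ...', 's3://bucket/unload/'),
so callers never have to wire up the two underlying FlatMaps themselves.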
