1) Your looking for SplittableDoFn[1]. It is still in development and a conversion of all the current IO connectors that exist today to be able to consume a PCollection of resources is yet to come. There is some limited usecases that exist already like FileIO.match[2] and if these fit your usecase then great.
2) Yes, for yet to be supported usecases, people have just been using ParDo and implement the "IO" logic themselves. 1: https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html 2: https://github.com/apache/beam/blob/2ac5b764e3450798661a97f2b51f2d602feafb23/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L133 On Thu, Mar 14, 2019 at 7:04 AM Jozef Vilcek <[email protected]> wrote: > Hello, > > I wanted to write a Beam code which expands incoming `PCollection<>`, > element wise, by use of existing IO components. Example could be to have a > `PCollection<ResourceId>` which will hold arbitrary paths to data and I > want to load them via `HadoopFormatIO.Read` which is of `PTransform<PBegin, > PCollection<KV<>>`. > > Problem is, I do not know how or if it is possible at all. > 1. I do not see a way how to element wise apply `PTransform<PBegin, ..> > (hence reuse some existing IOs). Is it poossible? > > 2. If I would want to write such logic custom, is it doable in Beam model? > > Thanks, > Jozef >
