1) You're looking for SplittableDoFn[1]. It is still in development, and the
existing IO connectors have not yet been converted to consume a PCollection
of resources.
A few limited use cases are already supported, such as FileIO.match[2]; if
one of them fits your use case, great.
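
For the file-based case, a rough sketch of what this can look like is below.
It assumes the paths arrive as a PCollection<String> of file patterns and that
the files are plain text; the class name, method name, and patterns are made
up for illustration, and it uses FileIO.matchAll (the variant of match that
takes its patterns from a PCollection) together with TextIO.readFiles.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example class; not part of Beam.
public class MatchAllSketch {
  /** Expands each incoming file pattern into the lines of the files it matches. */
  public static PCollection<String> readLines(PCollection<String> patterns) {
    return patterns
        .apply(FileIO.matchAll())     // pattern -> MatchResult.Metadata per matched file
        .apply(FileIO.readMatches())  // Metadata -> FileIO.ReadableFile
        .apply(TextIO.readFiles());   // ReadableFile -> one String per line
  }

  public static void main(String[] args) {
    // Uses default options, so a runner (e.g. the direct runner) must be on the classpath.
    Pipeline p = Pipeline.create();
    // Made-up patterns; normally these would come from an upstream transform.
    PCollection<String> patterns =
        p.apply(Create.of("/tmp/input/a-*.txt", "/tmp/input/b-*.txt"));
    readLines(patterns);
    p.run().waitUntilFinish();
  }
}
```

This only covers file formats that have a matching readFiles-style transform;
it does not help with HadoopFormatIO.Read.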

2) Yes. For use cases that are not yet supported, people have been using
ParDo and implementing the "IO" logic themselves.
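
As a minimal sketch of that approach, assuming the input is the
PCollection<ResourceId> from your question and the files are UTF-8 text (the
class and helper names here are hypothetical), a DoFn can open each resource
through FileSystems and emit its lines:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical DoFn; the "IO" logic is hand-written rather than reused from an IO connector.
public class ReadLinesFn extends DoFn<ResourceId, String> {

  /** Opens the resource via Beam's FileSystems and passes each line to the sink. */
  public static void forEachLine(ResourceId resource, Consumer<String> sink) {
    try (BufferedReader reader = new BufferedReader(
        Channels.newReader(FileSystems.open(resource), StandardCharsets.UTF_8.name()))) {
      String line;
      while ((line = reader.readLine()) != null) {
        sink.accept(line);
      }
    } catch (IOException e) {
      // Rethrow unchecked so the bundle fails and the runner can retry it.
      throw new UncheckedIOException(e);
    }
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    // One output element per line of the incoming resource.
    forEachLine(c.element(), c::output);
  }
}
```

It would be applied as resources.apply(ParDo.of(new ReadLinesFn())). Note
that each resource is read sequentially inside a single processElement call,
so a large file is not split across workers; that is the gap SplittableDoFn
is meant to close.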

1: https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
2: https://github.com/apache/beam/blob/2ac5b764e3450798661a97f2b51f2d602feafb23/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L133


On Thu, Mar 14, 2019 at 7:04 AM Jozef Vilcek <jozo.vil...@gmail.com> wrote:

> Hello,
>
> I want to write Beam code that expands an incoming `PCollection<>`
> element-wise using existing IO components. For example, given a
> `PCollection<ResourceId>` holding arbitrary paths to data, I want to
> load them via `HadoopFormatIO.Read`, which is a
> `PTransform<PBegin, PCollection<KV<>>>`.
>
> The problem is that I do not know how, or whether it is possible at all.
> 1. I do not see a way to apply a `PTransform<PBegin, ..>` element-wise
> (and hence reuse existing IOs). Is it possible?
>
> 2. If I wanted to write such logic myself, is it doable in the Beam model?
>
> Thanks,
> Jozef
>
