Good morning!

I have the following use case:

My program reads nested data (in this specific case XML) based on
projections (path expressions) of this data. Often multiple paths are
projected onto the same input. I would like each path to result in its own
dataset.

Is it possible to generate more than one dataset from a single readFile
operation, so that the input is not read twice?

I have thought about a workaround where the InputFormat would return
Tuple2s whose first field is the name of the dataset to which a record
belongs. This would, however, require me to filter the read data once for
each dataset, or to do a groupReduce, which is overhead I'm looking to
avoid.
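
To make the workaround concrete, here is a rough sketch of what I have in
mind. It uses plain Java collections as a stand-in for the DataSet, so it is
not actual Flink code. The class TaggedRecords, the Tagged type, readOnce(),
select() and the sample paths and values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class TaggedRecords {

    // Hypothetical stand-in for a Tuple2<String, String>:
    // f0 = name of the target dataset (here: the path expression),
    // f1 = the value projected by that path.
    static final class Tagged {
        final String f0;
        final String f1;

        Tagged(String f0, String f1) {
            this.f0 = f0;
            this.f1 = f1;
        }
    }

    // Stand-in for the single read of the input: every record is emitted
    // once, tagged with the path it belongs to. The data is invented.
    static List<Tagged> readOnce() {
        List<Tagged> out = new ArrayList<>();
        out.add(new Tagged("/book/title", "Dune"));
        out.add(new Tagged("/book/author", "Herbert"));
        out.add(new Tagged("/book/title", "Emma"));
        out.add(new Tagged("/book/author", "Austen"));
        return out;
    }

    // One filter pass per dataset: this is the overhead I would like to avoid,
    // because every pass looks at all records, including those of other paths.
    static List<String> select(List<Tagged> all, String path) {
        return all.stream()
                .filter(t -> t.f0.equals(path))
                .map(t -> t.f1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Tagged> all = readOnce();                       // input read once
        List<String> titles = select(all, "/book/title");    // filter pass 1
        List<String> authors = select(all, "/book/author");  // filter pass 2
        System.out.println(titles);
        System.out.println(authors);
    }
}
```

With n projected paths, every record is inspected n times after the read,
even though the reader already knows which dataset it belongs to.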

Is there a better (lower-overhead) workaround for doing this? Or is there
some mechanism in Flink that would allow me to do this directly?

Cheers!

- Pieter
