Good morning! I have the following use case:
My program reads nested data (in this specific case XML) based on projections (path expressions) of this data. Often multiple paths are projected onto the same input, and I would like each path to result in its own dataset. Is it possible to generate more than one dataset from a single readFile operation, so that the input is not read twice?

I have thought about a workaround where the InputFormat returns Tuple2s whose first field is the name of the dataset to which a record belongs. This would, however, require me either to filter the read data once for each dataset or to do a groupReduce, which is overhead I'm looking to avoid.

Is there a better (lower-overhead) workaround for this? Or is there some mechanism in Flink that would allow me to do this?

Cheers!
- Pieter
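
P.S. To make the workaround concrete, here is a rough sketch of the tagging idea. It deliberately uses plain Java collections as a stand-in for the Flink DataSet and InputFormat, so it is not Flink code. The class name, method names, and sample paths are all hypothetical and only meant to show where the extra filter pass per dataset comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TaggedProjectionSketch {

    // Single pass over the input: every element whose path matches one of the
    // projections is emitted as a (datasetName, value) pair, playing the role
    // of the Tuple2 the InputFormat would return. Here the path itself is
    // used as the dataset name (an assumption for this sketch).
    static List<Map.Entry<String, String>> readTagged(
            List<Map.Entry<String, String>> elements, Set<String> projections) {
        List<Map.Entry<String, String>> tagged = new ArrayList<>();
        for (Map.Entry<String, String> element : elements) {
            if (projections.contains(element.getKey())) {
                tagged.add(Map.entry(element.getKey(), element.getValue()));
            }
        }
        return tagged;
    }

    // One filter per dataset: this is the additional pass over the tagged
    // records that I would like to avoid.
    static List<String> select(List<Map.Entry<String, String>> tagged, String dataset) {
        return tagged.stream()
                .filter(t -> t.getKey().equals(dataset))
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical flattened view of the XML input as (path, value) pairs.
        List<Map.Entry<String, String>> elements = List.of(
                Map.entry("/book/title", "Dune"),
                Map.entry("/book/author", "Herbert"),
                Map.entry("/book/year", "1965"));

        // The input is read once for both projections ...
        List<Map.Entry<String, String>> tagged =
                readTagged(elements, Set.of("/book/title", "/book/author"));

        // ... but the tagged result is then scanned once per dataset.
        List<String> titles = select(tagged, "/book/title");
        List<String> authors = select(tagged, "/book/author");

        System.out.println(titles);
        System.out.println(authors);
    }
}
```

In the real job, readTagged would correspond to the custom InputFormat and each select call to a filter followed by a map on the resulting DataSet, so the number of filter passes grows with the number of projected paths.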