I have written a feature extraction pipeline in which I extract two features using ParDos and combine the results with a CoGroupByKey.

    with beam.Pipeline() as p:
        input = p | 'read input' >> beam.io.ReadFromText(input_path)
        first_feature = input | 'extract first feature' >> beam.ParDo(ExtractFirstFeatureFn())
        second_feature = input | 'extract second feature' >> beam.ParDo(ExtractSecondFeatureFn())
        combined = {'first feature': first_feature, 'second feature': second_feature} | 'combine' >> beam.CoGroupByKey()

I'd like to extend the pipeline to extract an arbitrary number of features while still aggregating them at the end with a CoGroupByKey. I'd also like to be able to decide at runtime (via command line arguments or a configuration file) which features will be extracted (e.g., extract features 1 and 3, but not feature 2).

How could I write such a pipeline?

Thanks in advance,
- Xander
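
One possible approach, as a minimal sketch rather than a definitive answer: since the pipeline graph is built by ordinary Python code, the branches can be created in a loop over a registry of extractors, and the resulting dict of PCollections passed to CoGroupByKey as before. Everything named below is an assumption made for illustration: the `FEATURE_REGISTRY` dict, the `--features` and `--input_path` flags, the `select_features` and `run` helpers, and the three toy extractor functions, which assume each input line looks like `key,text`. The sketch uses plain functions with `beam.Map`; the registry could hold DoFn classes instead and apply them with `beam.ParDo(cls())`.

```python
import argparse


# Hypothetical stand-ins for the real extractors. Each one turns an input
# line of the assumed form "key,text" into a (key, value) pair, because
# CoGroupByKey needs keyed elements on every branch.
def extract_first_feature(line):
    key, _, text = line.partition(',')
    return key, len(text)


def extract_second_feature(line):
    key, _, text = line.partition(',')
    return key, text.upper()


def extract_third_feature(line):
    key, _, text = line.partition(',')
    return key, text.count(' ') + 1


# Hypothetical registry mapping a feature name to its extractor.
FEATURE_REGISTRY = {
    'first': extract_first_feature,
    'second': extract_second_feature,
    'third': extract_third_feature,
}


def select_features(requested):
    """Return {name: extractor} for the requested names, in the given order."""
    unknown = [name for name in requested if name not in FEATURE_REGISTRY]
    if unknown:
        raise ValueError(f'unknown features: {unknown}')
    return {name: FEATURE_REGISTRY[name] for name in requested}


def run(argv=None):
    # Imported here so the selection logic above can be used and tested
    # without Beam installed.
    import apache_beam as beam

    parser = argparse.ArgumentParser()
    parser.add_argument('--input_path', required=True)
    parser.add_argument('--features', default='first,second',
                        help='comma-separated feature names to extract')
    args, pipeline_args = parser.parse_known_args(argv)

    names = [name.strip() for name in args.features.split(',') if name.strip()]
    selected = select_features(names)

    with beam.Pipeline(argv=pipeline_args) as p:
        lines = p | 'read input' >> beam.io.ReadFromText(args.input_path)
        # One branch per selected feature; each step label must be unique.
        branches = {
            name: lines | f'extract {name} feature' >> beam.Map(fn)
            for name, fn in selected.items()
        }
        combined = branches | 'combine' >> beam.CoGroupByKey()
        combined | 'print' >> beam.Map(print)
```

With this layout, a call such as `run(['--input_path', 'data.txt', '--features', 'first,third'])` (the file name is a placeholder) would build only the first and third branches. The same `select_features` call could just as well be fed from a configuration file, since the choice is made while the pipeline is being constructed.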