Re: Best way to process data in many files? (FLINK-BATCH)

2016-02-24 Thread Tim Conrad
Dear Till and others. I solved the issue by using the strategy suggested by Till like this: List fileListOfSpectra = ... SplittableList fileListOfSpectraSplitable = new SplittableList( fileListOfSpectra ); DataSource fileListOfSpectraDataSource = env.fromParallelCollect

Re: Best way to process data in many files? (FLINK-BATCH)

2016-02-23 Thread Tim Conrad
Hi Till (and others). Thank you very much for your helpful answer. On 23.02.2016 14:20, Till Rohrmann wrote: [...] In contrast, if you had a parallel data source which would consist of multiple source tasks, then these tasks would be independent and spread out across your cluster [...] Can yo

Best way to process data in many files? (FLINK-BATCH)

2016-02-23 Thread Tim Conrad
Dear FLINK community. I was wondering what would be the recommended (best?) way to achieve some kind of file conversion that runs in parallel on all available Flink nodes, since it is "embarrassingly parallel" (no dependencies between files). Say, I have an HDFS folder that contains multiple