Hi Flavio,

Which version of Flink are you using?

--
Thanks,
Amit

On Fri, May 4, 2018 at 6:14 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> Hi all,
> I've a Flink batch job that reads a parquet dataset and then applies 2
> flatMap to it (see pseudocode below).
> The problem is that this dataset is quite big and Flink duplicates it before
> sending the data to these 2 operators (I've guessed this from the doubling
> amount of sent bytes).
> Is there a way to avoid this behaviour?
>
> -------------------------------------------------------
> Here's the pseudo code of my job:
>
> DataSet X = readParquetDir();
> X1 = X.flatMap(...);
> X2 = X.flatMap(...);
>
> Best,
> Flavio