Hi all,
I have a Flink batch job that reads a Parquet dataset and then applies two
flatMap operators to it (see the pseudocode below).
The problem is that this dataset is quite large, and Flink appears to
duplicate it before sending the data to these two operators (I inferred this
from the number of bytes sent, which doubles).
Is there a way to avoid this behaviour?

-------------------------------------------------------
Here's the pseudocode of my job:

DataSet X = readParquetDir();   // large Parquet input
DataSet X1 = X.flatMap(...);    // first consumer of X
DataSet X2 = X.flatMap(...);    // second consumer of X

Best,
Flavio
