Question about datasource replication

Flavio Pompermaier Fri, 04 May 2018 05:45:07 -0700

Hi all,
I have a Flink batch job that reads a Parquet dataset and then applies two
flatMap operators to it (see the pseudocode below).
The problem is that this dataset is quite big, and Flink appears to duplicate
it before sending the data to these two operators (I'm guessing this from the
number of bytes sent, which is doubled).
Is there a way to avoid this behaviour?


-------------------------------------------------------
Here's the pseudocode of my job:

DataSet X = readParquetDir();   // large Parquet input
DataSet X1 = X.flatMap(...);    // first consumer of X
DataSet X2 = X.flatMap(...);    // second consumer of X
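
To make the doubling concrete, here is a toy model in plain Java with no Flink
dependency. The class FanOutSketch, the counter shippedRecords and the sample
strings are all made up for illustration. It only mimics the shape of the job
(one source handed to two flatMap consumers) and is not meant to describe
what the Flink runtime actually does:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model only: plain Java, hypothetical names, not Flink's runtime.
class FanOutSketch {

    // Counts every record handed from the source to a consumer.
    static long shippedRecords = 0;

    // Applies a flatMap-style function to each source record and collects the results.
    static <I, O> List<O> flatMap(List<I> source, Function<I, List<O>> fn) {
        List<O> out = new ArrayList<>();
        for (I record : source) {
            shippedRecords++;                 // one transfer per record per consumer
            out.addAll(fn.apply(record));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stands in for readParquetDir(); the real input is a Parquet directory.
        List<String> x = List.of("a b", "c d e");

        // Two independent consumers of the same source, as in the pseudocode.
        List<String> x1 = flatMap(x, line -> Arrays.asList(line.split(" ")));
        List<Integer> x2 = flatMap(x, line -> List.of(line.length()));

        System.out.println("source records:  " + x.size());
        System.out.println("shipped records: " + shippedRecords);
        System.out.println("x1 = " + x1 + ", x2 = " + x2);
    }
}
```

In this model every source record is shipped once per consumer, so the shipped
count is twice the source size, which is the same ratio I see in the bytes sent.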

Best,
Flavio
