Hi,

I am looking at implementation of zipWithIndex in DataSetUtils-
https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java

It works in two phases/steps
1) Count number of elements in each partition (using mapPartition)
2) In second mapPartition, unique ID is assigned by calculating offset
using number of elements computed in step 1.

Is there any chance the second mapPartition won't get same number of
elements as first mapPartition (assuming data is in HDFS)?

Thanks
Tarandeep

Reply via email to