Hi, I am looking at the implementation of zipWithIndex in DataSetUtils: https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java
It works in two phases:

1) Count the number of elements in each partition (using mapPartition).
2) In a second mapPartition, assign each element a unique ID by computing an offset from the per-partition counts collected in step 1.

Is there any chance that the second mapPartition will not see the same number of elements in each partition as the first mapPartition did (assuming the data is in HDFS)?

Thanks,
Tarandeep
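
P.S. To make the question concrete, here is a minimal plain-Java sketch of the two phases as I understand them. It is not Flink's code and does not use the Flink API: the class and method names (ZipWithIndexSketch, countPerPartition, zipWithIndex) are made up for illustration, and partitions are modeled as in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone model of the two-phase indexing scheme.
public class ZipWithIndexSketch {

    // Phase 1: count the number of elements in each partition.
    static long[] countPerPartition(List<List<String>> partitions) {
        long[] counts = new long[partitions.size()];
        for (int i = 0; i < partitions.size(); i++) {
            counts[i] = partitions.get(i).size();
        }
        return counts;
    }

    // Phase 2: the offset of partition i is the sum of the counts of
    // partitions 0..i-1; elements get consecutive IDs from that offset.
    static List<String> zipWithIndex(List<List<String>> partitions, long[] counts) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < partitions.size(); i++) {
            long index = offset;
            for (String element : partitions.get(i)) {
                out.add(index + ":" + element);
                index++;
            }
            // The offset advances by the count from phase 1, not by the
            // number of elements actually seen in this pass.
            offset += counts[i];
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e", "f"));
        long[] counts = countPerPartition(partitions);
        System.out.println(zipWithIndex(partitions, counts));
    }
}
```

In this model the IDs are unique and contiguous only if each partition holds the same number of elements in both passes. If a partition in the second pass held more elements than were counted in the first, its IDs would run into the next partition's range, which is the case I am asking about.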