Hi Mailing List, according to my questions (and your answers!) at this topic http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Performance-issues-with-GroupBy-td8130.html
I have eliminated my ArrayList<T> in my collect methods. Additional I want to emit partial results. My mapper has the following layout: ArrayList<ArrayList<Tuple> structure = ... For (Tuple tuple : input) { addTupleToStructure() } While(WorkNotDone) { doSomeWorkOnStructure() emitPartialResult(); } Instead of emitting the partial result as an ArrayList<T> ("not a valid POJO type") I do now iterate through this ArrayList<T> and emit each Tuple as Tuple2.of(Integer.valueOf(this.partitionIndex), tuple))); While iterating through this ArrayList and emitting tuples, my mapper seems to be blocked and can't continue to doSomeWorkOnStructure(). So I have three questions: - If I change back to emitting the ArrayList<T> would my Mapper also be blocked until Flink has serialized this ArrayList<T>? Or is Serialization done independent from my Mapper? - If emitting the ArrayList<T> won't block my Mapper, which variant would be more performant? - If I emit ArrayList<T>, but additionally implement a combiner, which o Merges all local ArrayLists<T> with the same partitionIndex o Iterates through the local-merged ArrayLists<T> and emits the containing tuples would that be the best variant? Because the combining is done locally, I would assume that no Serialization is required between Mapper and Combiner. Also, the Mapper is probably not blocked with emitting tuples and can continue doSomeWorkOnStructure() Thank you in advance! Robert