Hi Mailing List,

according to my questions (and your answers!) at this topic
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Performance-issues-with-GroupBy-td8130.html

I have eliminated my ArrayList<T> in my collect methods. Additional I want to 
emit partial results. My mapper has the following layout:

ArrayList<ArrayList<Tuple>  structure = ...

For (Tuple tuple : input) {
                addTupleToStructure()
}
While(WorkNotDone) {
                doSomeWorkOnStructure()
                emitPartialResult();
}

Instead of emitting the partial result as an ArrayList<T> ("not a valid POJO 
type") I do now iterate through this ArrayList<T> and emit each Tuple as 
Tuple2.of(Integer.valueOf(this.partitionIndex), tuple)));
While iterating through this ArrayList and emitting tuples, my mapper seems to 
be blocked and can't continue to doSomeWorkOnStructure().

So I have three questions:

-          If I change back to emitting the ArrayList<T> would my Mapper also 
be blocked until Flink has serialized this ArrayList<T>? Or is Serialization 
done independent from my Mapper?



-          If emitting the ArrayList<T> won't block my Mapper, which variant 
would be more performant?


-          If I emit ArrayList<T>, but additionally implement a combiner, which

o   Merges all local ArrayLists<T> with the same partitionIndex

o   Iterates through the local-merged ArrayLists<T> and emits the containing 
tuples
would that be the best variant? Because the combining is done locally, I would 
assume that no Serialization is required between Mapper and Combiner. Also, the 
Mapper is probably not blocked with emitting tuples and can continue 
doSomeWorkOnStructure()

Thank you in advance!
Robert



Reply via email to