One of the reasons is to ensure that data cannot be modified after it left a thread. A function that emits the same object several times (in order to reduce object creation & GC) might accidentally modify emitted records if they would be put as object in a queue. Moreover, it is easier to control the memory consumption if data is serialized into a fixed number of buffers instead of being put on the JVM heap.
Best, Fabian 2017-01-16 14:21 GMT+01:00 Dmitry Golubets <dgolub...@gmail.com>: > Hi Ufuk, > > Do you know what's the reason for serialization of data between different > threads? > > Also, thanks for the link! > > Best regards, > Dmitry > > On Mon, Jan 16, 2017 at 1:07 PM, Ufuk Celebi <u...@apache.org> wrote: > >> Hey Dmitry, >> >> this is not possible if I'm understanding you correctly. >> >> A task chain is executed by a single task thread and hence it is not >> possible to continue processing before the record "leaves" the thread, >> which only happens when the next task thread or the network stack >> consumes it. >> >> Hand over between chained tasks happens without serialization. Only >> data between different task threads is serialized. >> >> Depending on your use case the newly introduced async I/O feature >> might be worth a look (will be part of the upcoming 1.2 release): >> https://github.com/apache/flink/pull/2629 >> > >