One reason is to ensure that data cannot be modified after it has left
a thread.
A function that emits the same object several times (in order to reduce
object creation and GC pressure) might accidentally modify already-emitted
records if they were put into a queue as objects.
Moreover, it is easier to control memory consumption if data is
serialized into a fixed number of buffers instead of being kept as objects
on the JVM heap.
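
To illustrate the first point, here is a minimal sketch. It assumes a
hypothetical mutable Record type and a plain in-memory queue; none of these
names are Flink runtime classes, and the "serialization" is just a UTF-8
byte copy standing in for a real serializer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ReuseHazard {
    // Hypothetical mutable record that the producer reuses to avoid allocations.
    static final class Record {
        String value;
    }

    // Hand-over by reference: every queue entry aliases the same instance,
    // so each later write is visible through all earlier entries.
    static List<String> handOverByReference(String[] inputs) {
        Queue<Record> queue = new ArrayDeque<>();
        Record reused = new Record();
        for (String in : inputs) {
            reused.value = in;   // overwrites what earlier entries observe
            queue.add(reused);
        }
        List<String> seen = new ArrayList<>();
        for (Record r : queue) {
            seen.add(r.value);
        }
        return seen;
    }

    // Hand-over by serialization: the queue holds a byte snapshot of each
    // record, so later writes to the reused object cannot affect it.
    static List<String> handOverSerialized(String[] inputs) {
        Queue<byte[]> queue = new ArrayDeque<>();
        Record reused = new Record();
        for (String in : inputs) {
            reused.value = in;
            queue.add(reused.value.getBytes(StandardCharsets.UTF_8));
        }
        List<String> seen = new ArrayList<>();
        for (byte[] bytes : queue) {
            seen.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "b", "c"};
        System.out.println(handOverByReference(inputs)); // prints [c, c, c]
        System.out.println(handOverSerialized(inputs));  // prints [a, b, c]
    }
}
```

With the by-reference queue the consumer only ever sees the last value,
whereas the serialized hand-over preserves each record as it was emitted.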

Best, Fabian

2017-01-16 14:21 GMT+01:00 Dmitry Golubets <dgolub...@gmail.com>:

> Hi Ufuk,
>
> Do you know the reason for serializing data between different
> threads?
>
> Also, thanks for the link!
>
> Best regards,
> Dmitry
>
> On Mon, Jan 16, 2017 at 1:07 PM, Ufuk Celebi <u...@apache.org> wrote:
>
>> Hey Dmitry,
>>
>> this is not possible if I'm understanding you correctly.
>>
>> A task chain is executed by a single task thread and hence it is not
>> possible to continue processing before the record "leaves" the thread,
>> which only happens when the next task thread or the network stack
>> consumes it.
>>
>> Handover between chained tasks happens without serialization. Only
>> data passed between different task threads is serialized.
>>
>> Depending on your use case the newly introduced async I/O feature
>> might be worth a look (will be part of the upcoming 1.2 release):
>> https://github.com/apache/flink/pull/2629
>>
>
>
