Yes, another KeyBy will be used. The “small size” messages will be strings of 
length 500 to 1000.

Is there a concept of “global” state in flink? Is it possible to keep these 
lists in global state and only pass the list reference (by name?) in the 
LargeMessage?


From: Chesnay Schepler <ches...@apache.org>
Date: Friday, February 8, 2019 at 8:45 AM
To: "Aggarwal, Ajay" <ajay.aggar...@netapp.com>, "user@flink.apache.org" 
<user@flink.apache.org>
Subject: Re: stream of large objects

Whether a LargeMessage is serialized depends on how the job is structured.
For example, if you were to only apply map/filter functions after the 
aggregation it is likely they wouldn't be serialized.
If you were to apply another keyBy they will be serialized again.

When you say "small size" messages, what are we talking about here?

On 07.02.2019 20:37, Aggarwal, Ajay wrote:
In my use case my source stream contain small size messages, but as part of 
flink processing I will be aggregating them into large messages and further 
processing will happen on these large messages. The structure of this large 
message will be something like this:

   Class LargeMessage {
        String key
       List <String> messages; // this is where the aggregation of smaller 
messages happen
   }

In some cases this list field of LargeMessage can get very large (1000’s of 
messages). Is it ok to create an intermediate stream of these LargeMessages? 
What should I be concerned about while designing the flink job? Specifically 
with parallelism in mind. As these LargeMessages flow from one flink subtask to 
another, do they get serialized/deserialized ?

Thanks.



Reply via email to