Hi Alexandre,

According to this Stack Overflow conversation [1], it seems that both
AsyncFunction and FlatMapFunction require the output objects to fit into
memory, which does not seem feasible in the case you mentioned.

Maybe it could be done by creating a custom Flink Source with the new FLIP-27
API? It should allow splitting the entire job into multiple subtasks (both
bounded and unbounded ones) and yielding source records as required, without
storing the entire result batch in memory.
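
To make "yielding records as required" a bit more concrete, here is a rough
sketch in plain Java (no Flink dependency). Tile and TileIterator are
hypothetical names I made up for illustration, and the tile size and margin
parameters are assumptions about your setup. The point is only that tile
windows can be computed one at a time, so nothing forces the whole set of
tiles to exist in memory together.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical tile descriptor: half-open pixel window [x0, x1) x [y0, y1). */
final class Tile {
    final int x0, y0, x1, y1;

    Tile(int x0, int y0, int x1, int y1) {
        this.x0 = x0;
        this.y0 = y0;
        this.x1 = x1;
        this.y1 = y1;
    }
}

/** Lazily enumerates tile windows (core area plus margin), one per call to next(). */
final class TileIterator implements Iterator<Tile> {
    private final int imageWidth, imageHeight, tileSize, margin;
    private final int cols, rows;
    private int nextIndex = 0;

    TileIterator(int imageWidth, int imageHeight, int tileSize, int margin) {
        if (imageWidth <= 0 || imageHeight <= 0 || tileSize <= 0 || margin < 0) {
            throw new IllegalArgumentException("invalid image or tile dimensions");
        }
        this.imageWidth = imageWidth;
        this.imageHeight = imageHeight;
        this.tileSize = tileSize;
        this.margin = margin;
        // Number of tiles per row and per column, rounded up.
        this.cols = (imageWidth + tileSize - 1) / tileSize;
        this.rows = (imageHeight + tileSize - 1) / tileSize;
    }

    @Override
    public boolean hasNext() {
        return nextIndex < cols * rows;
    }

    @Override
    public Tile next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int col = nextIndex % cols;
        int row = nextIndex / cols;
        nextIndex++;
        // Extend the core area by the margin, clamped to the image bounds.
        int x0 = Math.max(0, col * tileSize - margin);
        int y0 = Math.max(0, row * tileSize - margin);
        int x1 = Math.min(imageWidth, (col + 1) * tileSize + margin);
        int y1 = Math.min(imageHeight, (row + 1) * tileSize + margin);
        return new Tile(x0, y0, x1, y1);
    }
}
```

In a FLIP-27 source, I would expect each call to SourceReader#pollNext to read
the pixels for one such window and emit a single tile record, but I have not
tried this on rasters of that size, so please treat it as an idea rather than
a tested design.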

Regards,
yux

[1] 
https://stackoverflow.com/questions/48141409/apache-flink-flatmap-with-millions-of-outputs


From: Alexandre KY <alexandre...@magellium.fr>
Date: Friday, 7 June 2024 at 16:24
To: user <user@flink.apache.org>
Subject: Flatmap "blocking" or not

Hi,



I am designing a Flink pipeline to process a stream of images (rasters, to be
more accurate, which are quite heavy: up to a dozen GB). To distribute the
processing of one image, we split it into tiles, apply to each tile the
processing steps that don't require the whole image, and then reassemble it.
Tiles are bigger than the area they are supposed to cover, which leads to
duplicate data since they include pixels that belong to their neighbor tiles
(called the margin). This provides a "context" for the algorithms so that the
processed tiles have some continuity with their neighbors.

So one of the first steps is the tiling, which splits the image into tiles. I
plan to use flatmap since it takes one input and can produce multiple outputs,
which is our case since we take an image as input and split it into tiles.
However, I want to know whether it sends each tile to the next operator as soon
as it is added to the output collector, or whether it waits for the function to
complete and then forwards the whole collector. This is very important since
images are heavy and all the tiles together are even bigger due to the margins
(hundreds of GB), so they can't be stored in memory all at once. If it is the
former, then there is no issue; however, if it is the latter, is there an
alternative to flatmap?



Sincerely,

Ky Alexandre
