Hi Sigalit,

Yes. In your case, most of the memory is consumed by the JVM heap and Flink network memory, both of which behave like pre-allocated memory pools managed by the JVM / Flink memory manager and typically do not return memory to the OS even when there is free space internally.
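For reference, here is a minimal sketch of how those pools might be spelled out in flink-conf.yaml, assuming your Helm values map onto the standard Flink memory options (the numbers are simply copied from your configuration, so treat this as illustrative rather than a drop-in config):

  # JVM heap: committed by the JVM at startup (-Xms = -Xmx); freed objects stay inside the heap
  taskmanager.memory.task.heap.size: 12g
  # Network buffer pool: sized within [min, max]; once allocated it is not returned to the OS
  taskmanager.memory.network.min: 1g
  taskmanager.memory.network.max: 4g
  # Managed memory and task off-heap memory
  taskmanager.memory.managed.size: 50m
  taskmanager.memory.task.off-heap.size: 2g
  # Headroom for metaspace, thread stacks, and other native allocations
  taskmanager.memory.jvm-overhead.min: 256mb
  taskmanager.memory.jvm-overhead.max: 2g

Because the heap and the network buffers are reserved up front like this, the pod's resident memory will stay close to the configured total even while the pipeline is idle.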
Best,
Zhanghao Chen

________________________________
From: Sigalit Eliazov <e.siga...@gmail.com>
Sent: Tuesday, June 4, 2024 15:46
To: Zhanghao Chen <zhanghao.c...@outlook.com>
Subject: Re: Task Manager memory usage

Hi, thanks for your reply. As you suggested, the memory did stop increasing at some point, but even after the pipeline had been idle for a long time I still see that the TM pod memory has not changed. I assume Flink 'saves' this memory in case data arrives, but is there any point at which Flink releases the memory at all?

Thanks,
Sigalit

On Thu, May 23, 2024 at 6:49 PM Zhanghao Chen <zhanghao.c...@outlook.com> wrote:

Hi Sigalit,

For states stored in memory, they will most probably stay alive for several rounds of GC, end up in the old generation of the heap, and won't get recycled until a full GC. As for the TM pod memory usage, it will most probably stop increasing at some point. You could try setting a larger taskmanager.memory.jvm-overhead and monitor it over a long period. If that's not the case, then there might be a native memory leak somewhere, but that may not be related to the state.

Best,
Zhanghao Chen

________________________________
From: Sigalit Eliazov <e.siga...@gmail.com>
Sent: Thursday, May 23, 2024 18:20
To: user <user@flink.apache.org>
Subject: Task Manager memory usage

Hi,

I am trying to understand the following behavior in our Flink application cluster. Any assistance would be appreciated.

We are running a Flink application cluster with 5 task managers, each with the following configuration:

* jobManagerMemory: 12g
* taskManagerMemory: 20g
* taskManagerMemoryHeapSize: 12g
* taskManagerMemoryNetworkMax: 4g
* taskManagerMemoryNetworkMin: 1g
* taskManagerMemoryManagedSize: 50m
* taskManagerMemoryOffHeapSize: 2g
* taskManagerMemoryNetworkFraction: 0.2
* taskManagerNetworkMemorySegmentSize: 4mb
* taskManagerMemoryFloatingBuffersPerGate: 64
* taskmanager.memory.jvm-overhead.min: 256mb
* taskmanager.memory.jvm-overhead.max: 2g
* taskmanager.memory.jvm-overhead.fraction: 0.1

Our pipeline includes stateful transformations, and we are verifying that we clear the state once it is no longer needed.

Through the Flink UI, we observe that the heap size increases and decreases during the job lifecycle. However, there is a noticeable delay between clearing the state and the reduction in heap usage, which I assume is related to the garbage collection frequency.

What is puzzling is the task manager pod memory usage. It appears that the memory usage increases intermittently and is not released. We verified the different state metrics and confirmed that they change according to the logic. Additionally, if we had state that was never released, I would expect to see the heap size increasing constantly as well.

Any insights or ideas?

Thanks,
Sigalit