Hi Sigalit,

Yes. Here, most of your memory is consumed by the JVM heap and Flink network 
memory. Both behave like pre-allocated memory pools, managed by the JVM and the 
Flink memory manager respectively, and they typically do not return memory to 
the OS even when there is free space internally.
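
If it helps, one way to see this from inside the TaskManager is to compare used 
vs. committed heap; the pod-level memory usage follows the committed figure, not 
the used one. Below is a minimal standalone sketch (hypothetical class name, not 
part of Flink) using the standard java.lang.management API:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    // Prints used vs. committed heap. "Committed" is what the JVM has reserved
    // from the OS and is what the pod-level memory metric reflects, even when
    // much of it is free internally.
    public class HeapUsagePrinter {
        public static void main(String[] args) {
            MemoryUsage heap =
                    ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
                    heap.getUsed() >> 20,
                    heap.getCommitted() >> 20,
                    heap.getMax() >> 20);
            // In a long-running TM, "committed" tends to stay near its
            // high-water mark rather than shrink back when the job goes idle.
        }
    }

The same used/committed numbers should also be visible through the JVM memory 
metrics in the Flink UI.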

Best,
Zhanghao Chen
________________________________
From: Sigalit Eliazov <e.siga...@gmail.com>
Sent: Tuesday, June 4, 2024 15:46
To: Zhanghao Chen <zhanghao.c...@outlook.com>
Subject: Re: Task Manager memory usage

Hi, thanks for your reply.
As you suggested, the memory did stop increasing at some point, but even after 
the pipeline had been idle for a long time, the TM pod memory still has not 
changed.
I assume Flink 'saves' this memory in case data arrives again, but is there any 
point at which Flink actually releases the memory?


thanks
Sigalit

On Thu, May 23, 2024 at 6:49 PM Zhanghao Chen 
<zhanghao.c...@outlook.com> wrote:
Hi Sigalit,

For state stored in memory, the objects will most probably stay alive for 
several rounds of GC and end up in the old generation of the heap, where they 
won't be recycled until a Full GC.
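
If you want to confirm when Full GCs actually happen on the TM, one option is 
to enable GC logging. A sketch, assuming JDK 11+ and that you pass JVM options 
via env.java.opts.taskmanager (adjust the log path to your image):

    env.java.opts.taskmanager: "-Xlog:gc*:file=/opt/flink/log/gc.log:time,uptime,level,tags"

The log will show when the old generation is collected and how much heap is 
freed at that point.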

As for the TM pod memory usage, it will most probably stop increasing at some 
point. You could try setting a larger taskmanager.memory.jvm-overhead and 
monitoring it over a longer period. If it keeps growing, there might be a 
native memory leak somewhere, though that may not be related to the state.
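
For example, in flink-conf.yaml (sizes here are only illustrative, not a 
recommendation for your setup):

    taskmanager.memory.jvm-overhead.min: 1g
    taskmanager.memory.jvm-overhead.max: 4g
    taskmanager.memory.jvm-overhead.fraction: 0.2

The overhead is computed from the fraction and then clamped to [min, max], so 
raising the max (and the fraction, if needed) leaves more headroom for native 
memory that is not managed by the JVM.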

Best,
Zhanghao Chen
________________________________
From: Sigalit Eliazov <e.siga...@gmail.com>
Sent: Thursday, May 23, 2024 18:20
To: user <user@flink.apache.org>
Subject: Task Manager memory usage


Hi,

I am trying to understand the following behavior in our Flink application 
cluster. Any assistance would be appreciated.

We are running a Flink application cluster with 5 task managers, each with the 
following configuration:

  *   jobManagerMemory: 12g
  *   taskManagerMemory: 20g
  *   taskManagerMemoryHeapSize: 12g
  *   taskManagerMemoryNetworkMax: 4g
  *   taskManagerMemoryNetworkMin: 1g
  *   taskManagerMemoryManagedSize: 50m
  *   taskManagerMemoryOffHeapSize: 2g
  *   taskManagerMemoryNetworkFraction: 0.2
  *   taskManagerNetworkMemorySegmentSize: 4mb
  *   taskManagerMemoryFloatingBuffersPerGate: 64
  *   taskmanager.memory.jvm-overhead.min: 256mb
  *   taskmanager.memory.jvm-overhead.max: 2g
  *   taskmanager.memory.jvm-overhead.fraction: 0.1
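
For reference, I believe these deployment values map onto the standard Flink 
options roughly as follows (a sketch only; the exact mapping depends on our 
Helm chart):

    jobmanager.memory.process.size: 12g
    taskmanager.memory.process.size: 20g
    taskmanager.memory.task.heap.size: 12g
    taskmanager.memory.network.min: 1g
    taskmanager.memory.network.max: 4g
    taskmanager.memory.network.fraction: 0.2
    taskmanager.memory.managed.size: 50m
    taskmanager.memory.task.off-heap.size: 2g
    taskmanager.memory.segment-size: 4mb
    taskmanager.network.memory.floating-buffers-per-gate: 64
    taskmanager.memory.jvm-overhead.min: 256mb
    taskmanager.memory.jvm-overhead.max: 2g
    taskmanager.memory.jvm-overhead.fraction: 0.1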

Our pipeline includes stateful transformations, and we are verifying that we 
clear the state once it is no longer needed.
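
For context, the cleanup we do looks roughly like this (a simplified sketch 
with hypothetical names, not our actual operator):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // Counts events per key and clears the keyed state when a timer fires.
    public class CountAndExpire extends KeyedProcessFunction<String, String, String> {

        private static final long TTL_MS = 60_000L; // illustrative retention time

        private transient ValueState<Long> counter;

        @Override
        public void open(Configuration parameters) {
            counter = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("counter", Long.class));
        }

        @Override
        public void processElement(String value, Context ctx,
                                   Collector<String> out) throws Exception {
            Long current = counter.value();
            counter.update(current == null ? 1L : current + 1);
            // Schedule cleanup for this key.
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + TTL_MS);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx,
                            Collector<String> out) throws Exception {
            out.collect(ctx.getCurrentKey() + "=" + counter.value());
            // Releasing the state makes the objects unreachable, but the heap
            // they occupy is only reclaimed by a later GC cycle.
            counter.clear();
        }
    }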

Through the Flink UI, we observe that the heap size increases and decreases 
during the job lifecycle.

However, there is a noticeable delay between clearing the state and the 
reduction in heap usage, which I assume is related to garbage collection 
frequency.

What is puzzling is the task manager pod memory usage. It appears that the 
memory usage increases intermittently and is not released. We verified the 
different state metrics and confirmed they are changing according to the logic.

Additionally, if we had a state that was never released, I would expect to see 
the heap size increasing constantly as well.

Any insights or ideas?

Thanks,

Sigalit
