Hi,
Where are you storing the state? Try the RocksDB state backend.
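
A minimal sketch of switching to RocksDB (assuming Flink 1.13+ with the
flink-statebackend-rocksdb dependency on the classpath; the checkpoint
path below is only an example):

    import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RocksDbJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // Keep keyed state in RocksDB (native memory plus local disk)
            // instead of the JVM heap.
            env.setStateBackend(new EmbeddedRocksDBStateBackend());

            // Since Flink 1.13, checkpoint storage is configured separately.
            env.getCheckpointConfig()
                .setCheckpointStorage("file:///tmp/flink-checkpoints");

            // ... build the pipeline, then env.execute("my-job");
        }
    }

Note that RocksDB draws from Flink's managed memory by default, so with
taskManagerMemoryManagedSize at 50m you would need to raise that value
considerably.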

Thanks
Sachin


On Thu, 23 May 2024 at 6:19 PM, Sigalit Eliazov <e.siga...@gmail.com> wrote:

> Hi,
>
> I am trying to understand the following behavior in our Flink application
> cluster. Any assistance would be appreciated.
>
> We are running a Flink application cluster with 5 task managers, each with
> the following configuration:
>
>    - jobManagerMemory: 12g
>    - taskManagerMemory: 20g
>    - taskManagerMemoryHeapSize: 12g
>    - taskManagerMemoryNetworkMax: 4g
>    - taskManagerMemoryNetworkMin: 1g
>    - taskManagerMemoryManagedSize: 50m
>    - taskManagerMemoryOffHeapSize: 2g
>    - taskManagerMemoryNetworkFraction: 0.2
>    - taskManagerNetworkMemorySegmentSize: 4mb
>    - taskManagerMemoryFloatingBuffersPerGate: 64
>    - taskmanager.memory.jvm-overhead.min: 256mb
>    - taskmanager.memory.jvm-overhead.max: 2g
>    - taskmanager.memory.jvm-overhead.fraction: 0.1
>
> Our pipeline includes stateful transformations, and we verify that state
> is cleared once it is no longer needed, as sketched below.
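>
> For illustration, a simplified sketch of the cleanup pattern we use
> (types reduced to String for brevity; the one-hour timeout is only an
> example):
>
>     import org.apache.flink.api.common.state.ValueState;
>     import org.apache.flink.api.common.state.ValueStateDescriptor;
>     import org.apache.flink.configuration.Configuration;
>     import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
>     import org.apache.flink.util.Collector;
>
>     public class StateCleanupFunction
>             extends KeyedProcessFunction<String, String, String> {
>
>         private transient ValueState<Long> countState;
>
>         @Override
>         public void open(Configuration parameters) {
>             countState = getRuntimeContext().getState(
>                 new ValueStateDescriptor<>("count", Long.class));
>         }
>
>         @Override
>         public void processElement(String event, Context ctx,
>                 Collector<String> out) throws Exception {
>             Long count = countState.value();
>             countState.update(count == null ? 1L : count + 1);
>             // Schedule cleanup one hour after this event (assumes
>             // event-time timestamps are assigned upstream).
>             ctx.timerService().registerEventTimeTimer(
>                 ctx.timestamp() + 60 * 60 * 1000L);
>         }
>
>         @Override
>         public void onTimer(long timestamp, OnTimerContext ctx,
>                 Collector<String> out) {
>             // Release this key's state once it is no longer needed.
>             countState.clear();
>         }
>     }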
>
> Through the Flink UI, we observe that the heap size increases and
> decreases during the job lifecycle.
>
> However, there is a noticeable delay between clearing the state and the
> corresponding drop in heap usage, which I assume is related to garbage
> collection frequency.
>
> What is puzzling is the task manager pod's memory usage: it increases
> intermittently and is never released. We checked the various state metrics
> and confirmed that they change in line with the application logic.
>
> Additionally, if some state were never released, I would expect the heap
> size to grow steadily as well.
>
> Any insights or ideas?
>
> Thanks,
>
> Sigalit
>
