Hello,
I'd be interested in the memory configurations that other community members
use for Flink TaskManagers, in particular with respect to the JVM overhead
that Flink's memory model reserves. We only run a few different streaming
jobs in our environment, but most of them need a JVM overhead of around 50%
of the total process memory, regardless of the pod size. The config for our
biggest TM currently looks like this:
taskmanager:
  memory:
    jvm-metaspace:
      size: 150m
    process:
      size: 5 gb
    network:
      fraction: '0.01'
      min: 80 mb
    jvm-overhead:
      fraction: '0.5'
      max: 3g
    managed:
      fraction: '0.4'
  numberOfTaskSlots: '2'
and we still see OOM kills.
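To make the split of the 5 GB concrete, here is a minimal Python sketch of Flink's documented TaskManager memory model for the case where taskmanager.memory.process.size is set. The helper tm_memory_breakdown is hypothetical, and the defaults it assumes (128 MiB framework heap, 128 MiB framework off-heap, 0 task off-heap, 192 MiB overhead minimum) come from the Flink documentation as I understand it, not from Flink itself. The network maximum is treated as unbounded, which does not matter here because the 80 MiB minimum wins.

```python
# Hypothetical helper sketching Flink's TaskManager memory model when
# taskmanager.memory.process.size is configured. All values are in MiB
# (Flink parses "5 gb" as 5 * 1024 MiB). Defaults for the framework and
# task off-heap sizes are assumed from the Flink docs, not read from Flink.


def clamp(value: float, lo: float, hi: float) -> float:
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(value, hi))


def tm_memory_breakdown(
    process: float,
    metaspace: float,
    overhead_fraction: float,
    overhead_min: float,
    overhead_max: float,
    network_fraction: float,
    network_min: float,
    network_max: float,
    managed_fraction: float,
    framework_heap: float = 128.0,
    framework_off_heap: float = 128.0,
    task_off_heap: float = 0.0,
) -> dict[str, float]:
    # JVM overhead is a fraction of the total *process* memory, clamped to [min, max].
    overhead = clamp(process * overhead_fraction, overhead_min, overhead_max)
    # Whatever is left after metaspace and overhead is the total Flink memory.
    total_flink = process - metaspace - overhead
    # Network and managed memory are fractions of the total *Flink* memory.
    network = clamp(total_flink * network_fraction, network_min, network_max)
    managed = total_flink * managed_fraction
    # The task heap receives the remainder.
    task_heap = (total_flink - framework_heap - framework_off_heap
                 - task_off_heap - network - managed)
    return {
        "jvm_overhead": overhead,
        "total_flink": total_flink,
        "network": network,
        "managed": managed,
        "task_heap": task_heap,
        # Roughly what ends up in -Xmx and -XX:MaxDirectMemorySize.
        "jvm_heap": framework_heap + task_heap,
        "jvm_direct": framework_off_heap + task_off_heap + network,
    }


if __name__ == "__main__":
    breakdown = tm_memory_breakdown(
        process=5 * 1024,           # process.size: 5 gb
        metaspace=150,              # jvm-metaspace.size: 150m
        overhead_fraction=0.5,      # jvm-overhead.fraction
        overhead_min=192,           # assumed default jvm-overhead.min
        overhead_max=3 * 1024,      # jvm-overhead.max: 3g
        network_fraction=0.01,      # network.fraction
        network_min=80,             # network.min: 80 mb
        network_max=float("inf"),   # assumed unbounded; the min wins anyway
        managed_fraction=0.4,       # managed.fraction
    )
    for name, mib in breakdown.items():
        print(f"{name:>13}: {mib:7.0f} MiB")
```

If this reading of the model is right, the config above yields about 2560 MiB of JVM overhead and 2410 MiB of total Flink memory, split into 964 MiB managed, 80 MiB network and 1110 MiB task heap. That gives a JVM heap of about 1238 MiB and a direct memory limit of about 208 MiB. With fraction 0.5 and max 3g, the overhead stops growing once the process size exceeds 6 GiB.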
We use the official Docker image, and jemalloc appears to be preloaded as expected:
$ cat /proc/1/environ | tr '\0' '\n' | grep LD_PRELOAD
LD_PRELOAD=:/usr/lib/x86_64-linux-gnu/libjemalloc.so
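LD_PRELOAD only shows that jemalloc was requested for the process. A somewhat stronger check is to look for a mapped libjemalloc file in /proc/<pid>/maps. The minimal Python sketch below does that. It assumes Linux procfs and that the JVM is PID 1, as in the environ check above. It also needs a Python interpreter, which the official Flink image may not ship, so it can instead be run on a copy of the maps file taken from the pod. The function names are hypothetical.

```python
# Sketch: check whether jemalloc is actually mapped into a process, rather
# than only requested via LD_PRELOAD. Assumes Linux /proc; the JVM being
# PID 1 in the container is an assumption carried over from the environ check.

from pathlib import Path


def jemalloc_mapped(maps_text: str) -> bool:
    """Return True if any mapping in a /proc/<pid>/maps dump is a jemalloc library."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Format: address perms offset dev inode [pathname]; anonymous
        # mappings have no pathname and therefore only five fields.
        if len(fields) >= 6 and "libjemalloc" in Path(fields[5]).name:
            return True
    return False


def check_pid(pid: int = 1) -> bool:
    """Read /proc/<pid>/maps and report whether jemalloc is mapped."""
    return jemalloc_mapped(Path(f"/proc/{pid}/maps").read_text())
```

Calling check_pid(1) inside the container should return True if the preload actually took effect.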
We have also enabled generational ZGC on Java 21:
-XX:+UseZGC -XX:+UseDynamicNumberOfGCThreads -XX:+ZGenerational
What is the experience of others? Is the JVM overhead always this high,
proportionally, for streaming workloads?