Hi,
If the TM is not responding, check the TM logs for long gaps between
entries. There might be three main reasons for such gaps:
1. The machine is swapping - configure your machine/processes so that the
machine never swaps (best to disable swap altogether)
2. Long GC full stops - look how to ana
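
To make the "long gap in logs" check concrete, here is a minimal Python sketch that scans a log and reports silences longer than a threshold. It assumes each entry starts with a timestamp such as "2019-03-01 12:00:00,123"; your logging pattern may differ. The names parse_timestamp and find_gaps and the 10 second threshold are made up for this illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each log entry, e.g.
# "2019-03-01 12:00:00,123 INFO ...". Adjust to your logging pattern.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_timestamp(line):
    """Return the datetime at the start of a log line, or None if absent."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None  # e.g. a stack trace continuation line
    base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(match.group(2)))


def find_gaps(lines, threshold=timedelta(seconds=10)):
    """Yield (previous, current, gap) for timestamps further apart than threshold."""
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue  # skip lines that carry no timestamp
        if previous is not None and current - previous > threshold:
            yield previous, current, current - previous
        previous = current
```

Passing an open log file to find_gaps (for example, for start, end, gap in find_gaps(open(path))) lists every silent period. A gap only tells you that the process logged nothing; whether swapping or GC caused it still has to be checked separately.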
Sorry, by "killed" I mean that the JM lost the TM. The TM instance is still
running inside Kubernetes, but it is not responding to any requests,
probably due to high load. On the JM side, the JM lost heartbeat tracking of
the TM, so it marked the TM as dead.
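
To illustrate the difference between "killed" and "lost", here is a toy Python sketch of timeout-based heartbeat tracking. It is not Flink's implementation; HeartbeatMonitor, its methods and the 50 second timeout are hypothetical names and values chosen only for this example.

```python
class HeartbeatMonitor:
    """Toy heartbeat tracker: a worker counts as dead once it has been silent too long."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # worker id -> time of the last heartbeat received

    def record_heartbeat(self, worker_id, now):
        """Remember that a heartbeat from worker_id arrived at time now."""
        self.last_seen[worker_id] = now

    def dead_workers(self, now):
        """Return workers whose last heartbeat is older than the timeout."""
        return [
            worker_id
            for worker_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        ]


monitor = HeartbeatMonitor(timeout_seconds=50)
monitor.record_heartbeat("tm-1", now=0)
# The worker process may still be running, but if it is too busy to answer,
# the monitor sees only silence and reports it as dead after the timeout.
print(monitor.dead_workers(now=30))
print(monitor.dead_workers(now=51))
```

The monitor cannot tell a crashed worker from an overloaded one; both look like missing heartbeats.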
The „volume“ of Kafka topics, I mean, the
Hi,
> In addition to your comments, what are the items retained by
> NetworkEnvironment? They seem to grow indefinitely; do they ever shrink?
>
Mostly the network buffers, which should be ok. They are always recycled and
should not be released until the network environment is GCed.
> I think
Thanks a lot! This is very helpful.
In addition to your comments, what are the items retained by
NetworkEnvironment? They seem to grow indefinitely; do they ever shrink?
I think there is a GC issue because my task manager somehow gets killed after
a job run. The duration correlates to the volume
Hi,
I cannot spot anything in your screenshots that indicates a leak. Maybe you
are misinterpreting the numbers? In your heap dump, there is only a single instance of
org.apache.flink.runtime.io.network.NetworkEnvironment and it retains about
400,000,000 bytes from being GCed because it holds refere