Hi!

As far as I remember, this was reported as a known issue a few years ago,
but Flink currently has no solution for it (correct me if I'm wrong). I see
that you're running jobs on a YARN session. Could you switch to YARN
per-job mode (where the JM and TMs are created and destroyed for each job)
as a workaround?

On Tue, Jan 4, 2022 at 23:39, David Clutter <dclut...@yahooinc.com> wrote:

> I am seeing an issue where class loaders are not being GCed and the
> Metaspace eventually runs out of memory (OOM).  Here is my setup:
>
> - Flink 1.13.1 on EMR using JDK 8 in session mode
> - The job manager is a long-running YARN session
> - New jobs are submitted every 5 minutes (and typically run for less than 5 minutes)
>
> I find that after a few hours the job manager gets killed with a Metaspace
> OOM.  I tried increasing the Metaspace size for the job manager, but that
> only delays the OOM.
>
> I did some debugging using jcmd and I noticed that the size of the classes
> loaded is always increasing.  Next I did a heap dump and found that
> instances of org.apache.flink.util.ChildFirstClassLoader are present long
> after the jobs complete.  Checking the GC roots, I found that there is a
> reference in java.io.ObjectStreamClass$Caches.  This seems to be the
> following JDK issue: https://bugs.openjdk.java.net/browse/JDK-8277072
>
> Are there any workarounds for this situation?
>
>
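
The growth described above (class count and Metaspace usage rising job after job) can also be watched through the standard java.lang.management API. This is a minimal sketch, assuming it runs inside the JVM being watched (for example from a hypothetical diagnostic hook in the JobManager; Flink does not ship this class). The class name MetaspaceMonitor is made up for illustration, and the pool name "Metaspace" is what HotSpot reports on JDK 8 and later. A loaded-class count and Metaspace usage that keep climbing across jobs while the unloaded-class count stays flat would be consistent with the ChildFirstClassLoader leak reported here.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of Flink): samples class-loading and
// Metaspace figures of the JVM it runs in, using only JDK 8 APIs.
public class MetaspaceMonitor {

    /** Immutable sample of class-loading and Metaspace figures. */
    static final class Snapshot {
        final int loadedClasses;    // classes currently loaded
        final long unloadedClasses; // classes unloaded since JVM start
        final long metaspaceUsed;   // bytes used in Metaspace, or -1 if no such pool

        Snapshot(int loadedClasses, long unloadedClasses, long metaspaceUsed) {
            this.loadedClasses = loadedClasses;
            this.unloadedClasses = unloadedClasses;
            this.metaspaceUsed = metaspaceUsed;
        }

        @Override
        public String toString() {
            return String.format("loaded=%d unloaded=%d metaspaceUsed=%d",
                    loadedClasses, unloadedClasses, metaspaceUsed);
        }
    }

    // Returns used bytes of the "Metaspace" pool, or -1 if the JVM
    // exposes no pool with that name (or the pool is no longer valid).
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage(); // null if the pool is invalid
                return usage != null ? usage.getUsed() : -1L;
            }
        }
        return -1L;
    }

    // Takes one sample of the current JVM's state.
    static Snapshot snapshot() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        return new Snapshot(cl.getLoadedClassCount(), cl.getUnloadedClassCount(),
                metaspaceUsedBytes());
    }

    // Prints a few samples; args: [sampleCount] [intervalMillis].
    public static void main(String[] args) throws InterruptedException {
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        long intervalMillis = args.length > 1 ? Long.parseLong(args[1]) : 1000L;
        for (int i = 0; i < samples; i++) {
            System.out.println(snapshot());
            if (i + 1 < samples) {
                Thread.sleep(intervalMillis);
            }
        }
    }
}
```

In practice one would log snapshot() after each job finishes and compare the numbers over a few hours, which mirrors the jcmd and heap-dump observations in the original report without needing a full dump each time.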
