Hi,

I am running a standalone cluster and submitting a Flink SQL job with a Python
UDF, following the example here:

<https://github.com/ververica/flink-sql-cookbook/blob/main/udfs/01_python_udfs/01_python_udfs.md>

I notice that after I repeatedly submit the job, cancel it, and resubmit it,
my task manager eventually throws an out-of-memory error. I am sure it is due
to a leaky class loader somewhere, but I am not sure how to track it down.
Has anyone experienced this issue before?


2023-03-24 04:55:46,380 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - Fatal error occurred while executing the TaskManager. Shutting it down...
java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JVM metaspace to load classes or there is a class loading leak. In the first case 'taskmanager.memory.jvm-metaspace.size' configuration option should be increased. If the error persists (usually in cluster after several job (re-)submissions) then there is probably a class loading leak in user code or some of its dependencies which has to be investigated and fixed. The task executor has to be shutdown...
    at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
    at java.lang.ClassLoader.defineClass(ClassLoader.java:1017) ~[?:?]
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174) ~[?:?]
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:555) ~[?:?]
    at java.net.URLClassLoader$1.run(URLClassLoader.java:458) ~[?:?]
    at java.net.URLClassLoader$1.run(URLClassLoader.java:452) ~[?:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
    at java.net.URLClassLoader.findClass(URLClassLoader.java:451) ~[?:?]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:589) ~[?:?]
    at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:67) ~[dpi-flink-sql-base-app-0.9.35.jar:?]
    at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:51) [dpi-flink-sql-base-app-0.9.35.jar:?]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:522) [?:?]
    at org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:500) [blob_p-bbc3c49fcdd79f0b3f7f6c99a18bd72516414de1-4563cd43f6f153fe0ec32993bf935209:1.16.1]
    at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238) [blob_p-bbc3c49fcdd79f0b3f7f6c99a18bd72516414de1-4563cd43f6f153fe0ec32993bf935209:1.16.1]
    at org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57) [blob_p-bbc3c49fcdd79f0b3f7f6c99a18bd72516414de1-4563cd43f6f153fe0ec32993bf935209:1.16.1]
    at org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:92) [blob_p-bbc3c49fcdd79f0b3f7f6c99a18bd72516414de1-4563cd43f6f153fe0ec32993bf935209:1.16.1]
    at org.apache.flink.table.runtime.operators.python.table.PythonTableFunctionOperator.open(PythonTableFunctionOperator.java:114) [blob_p-bbc3c49fcdd79f0b3f7f6c99a18bd72516414de1-4563cd43f6f153fe0ec32993bf935209:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:726) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$927/0x0000000800a4ac40.call(Unknown Source) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:702) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:669) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.taskmanager.Task$$Lambda$815/0x0000000800904840.run(Unknown Source) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728) [flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550) [flink-dist-1.16.1.jar:1.16.1]
    at java.lang.Thread.run(Thread.java:829) [?:?]
