Hi All,

Recently, FLIP-49 [1] introduced the new JVM Metaspace limit in the 1.10
release [2]. Flink scripts, which start the task manager JVM process, set
this limit by adding the corresponding JVM argument. This has been done to
properly plan resources, especially to derive the container size for
Yarn/Mesos/Kubernetes. It should also surface potential class loading
leaks. The limit can be changed with the option
'taskmanager.memory.jvm-metaspace.size' [3]. Its current default value is
96 MB.
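The short Python sketch below shows roughly how the option value becomes the JVM limit. It assumes that the scripts pass the configured size to the JVM as the HotSpot flag `-XX:MaxMetaspaceSize=<bytes>` and that sizes use binary (1024-based) units. `parse_memory_size` and `metaspace_jvm_arg` are hypothetical helpers for illustration, not Flink code. In practice, you only set the option itself, e.g. `taskmanager.memory.jvm-metaspace.size: 256m` in flink-conf.yaml.

```python
import re

# Hypothetical unit table, assuming Flink-style binary size suffixes
# (e.g. "96m", "256mb", "1g"); this is not Flink's MemorySize parser.
_UNITS = {
    "": 1, "b": 1, "bytes": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
}


def parse_memory_size(text: str) -> int:
    """Parse a size string such as '96m' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse memory size: {text!r}")
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in _UNITS:
        raise ValueError(f"unknown memory unit: {unit!r}")
    return int(number) * _UNITS[unit]


def metaspace_jvm_arg(option_value: str) -> str:
    """Build the HotSpot flag that caps Metaspace at the configured size
    (assumed form of the argument the start scripts add)."""
    return f"-XX:MaxMetaspaceSize={parse_memory_size(option_value)}"


print(metaspace_jvm_arg("96m"))   # -XX:MaxMetaspaceSize=100663296
print(metaspace_jvm_arg("256m"))  # -XX:MaxMetaspaceSize=268435456
```

Raising the option therefore raises the hard Metaspace cap of the task manager JVM. Under FLIP-49 the cap also counts toward the total process memory used to size the container.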

This change led to 'OutOfMemoryError: Metaspace' in certain cases after
upgrading to version 1.10. In some cases, a class loading leak has been
detected [4] and has to be investigated on its own. In other cases, simply
increasing the option value helped because the default was not enough,
presumably due to job specifics. In general, the required Metaspace size
depends on the job, and no single default value can cover all cases. There
is an issue to improve the documentation on this [5].

The purpose of this survey is to find the most reasonable default value for
this option. If you have encountered this issue and increasing the Metaspace
size helped (i.e. there was no class loading leak), please report the option
value that resolved it, along with any specifics of your job that you think
are relevant. There is also a dedicated Jira issue [6] for reporting.

Thanks,
Andrey

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#jvm-parameters
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#taskmanager-memory-jvm-metaspace-size
[4] https://issues.apache.org/jira/browse/FLINK-16142
[5] https://issues.apache.org/jira/browse/FLINK-16278
[6] https://jira.apache.org/jira/browse/FLINK-16406