Hi Maciek,
my understanding is that the jars in the JobManager should be cleaned up
after the job terminates (I assume that your jobs finished
successfully). The jars are managed by the BlobService, and the Dispatcher
triggers the jobCleanup in [1] after job termination. Are there any
suspicious log messages that might indicate an issue?
I'm adding Chesnay to this thread as he might have more insights here.
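
In the meantime, one way to check whether extracted jars really pile up on
the JobManager host is to list old jar files in the JVM temp directory.
Below is a minimal sketch in plain Java, not Flink code: the class name
StaleJarScan, the findStaleJars helper, the default of java.io.tmpdir and
the 24 hour threshold are all assumptions made for illustration, and the
directory Flink actually extracts into may differ in your setup.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Flink.
class StaleJarScan {

    /**
     * Returns the regular *.jar files directly inside dir whose
     * last-modified time is older than minAge, sorted by path.
     */
    static List<Path> findStaleJars(Path dir, Duration minAge) throws IOException {
        Instant cutoff = Instant.now().minus(minAge);
        List<Path> stale = new ArrayList<>();
        // The glob only matches file names ending in ".jar"; subdirectories are not scanned.
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                if (Files.isRegularFile(jar)
                        && Files.getLastModifiedTime(jar).toInstant().isBefore(cutoff)) {
                    stale.add(jar);
                }
            }
        }
        Collections.sort(stale);
        return stale;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the jars end up in java.io.tmpdir unless a directory is passed in.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        // Assumption: anything older than 24 hours is worth a look.
        long hours = args.length > 1 ? Long.parseLong(args[1]) : 24L;
        for (Path jar : findStaleJars(dir, Duration.ofHours(hours))) {
            System.out.println(jar + "  " + Files.size(jar) + " bytes");
        }
    }
}
```

Running it against the JobManager's temp directory with a threshold longer
than your longest job should show whether files are left behind. It only
lists files; as you observed, deleting them does not free the disk space
while the process still holds them open.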

[1]
https://github.com/apache/flink/blob/2c4e0ab921ccfaf003073ee50faeae4d4e4f4c93/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L797

On Mon, Jan 25, 2021 at 8:37 PM Maciek Próchniak <m...@touk.pl> wrote:

> Hello,
>
> in our setup we have:
>
> - Flink 1.11.2
>
> - job submission via the REST API (first we upload the jar, then we
> submit multiple jobs with it)
>
> - additional jars embedded in the lib directory of the main jar (this is
> the crucial part)
>
> When we submit jobs this way, Flink creates new temp jar files via the
> PackagedProgram.extractContainedLibraries method.
>
> We observe that they are not removed after the job finishes - it seems
> that PackagedProgram.deleteExtractedLibraries is not invoked when using
> the REST API.
>
> What's more, it seems that those jars remain open in the JobManager
> process. When we delete them manually via scripts, the disk space is not
> reclaimed until the process is restarted. We also see via heap dump
> inspection that java.util.zip.ZipFile$Source objects remain, pointing
> to those files. This is quite a problem for us, as we submit quite a few
> jobs, and after a while we run out of either heap or disk space on the
> JobManager process/host. Unfortunately, so far I cannot find where this
> leak happens...
>
> Does anybody have pointers on where we should look? Or on how to fix
> this behaviour?
>
>
> thanks,
>
> maciek
>
>
