Hi Matthias,

I think the problem lies somewhere in JarRunHandler, as this is the place where the files are created.

I don't think these files are managed by the BlobService, as they are not stored in the BlobService folders (I ran an experiment, changing the default BlobServer folders).

It seems to me that CliFrontend deletes those files explicitly:

https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L250

whereas I couldn't find such an invocation in JarRunHandler (although not deleting those files does not fully explain the leak on the heap...)
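
For illustration, the extract/delete pairing I would expect could look roughly like the sketch below. It uses only plain JDK classes and is not Flink's actual code: the class ExtractedLibCleanup and the helpers runWithExtractedLibraries and createDemoJar are hypothetical names, and the two other method names are borrowed from PackagedProgram only to show the correspondence.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.function.Consumer;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper, NOT Flink code: it only mirrors the extract/delete
// pairing that PackagedProgram exposes, using JDK classes.
final class ExtractedLibCleanup {

    // Copies every lib/*.jar entry of the given jar into its own temp file.
    static List<Path> extractContainedLibraries(Path jar) throws IOException {
        List<Path> extracted = new ArrayList<>();
        // try-with-resources closes the JarFile, so no zip handle stays open.
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith("lib/") || !name.endsWith(".jar")) {
                    continue;
                }
                Path tmp = Files.createTempFile("extracted_lib_", ".jar");
                extracted.add(tmp);
                try (InputStream in = jarFile.getInputStream(entry)) {
                    Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            // Do not leave partial results behind if extraction fails.
            deleteExtractedLibraries(extracted);
            throw e;
        }
        return extracted;
    }

    // Best-effort removal of the temp files created above.
    static void deleteExtractedLibraries(List<Path> libs) {
        for (Path lib : libs) {
            try {
                Files.deleteIfExists(lib);
            } catch (IOException e) {
                System.err.println("Could not delete " + lib + ": " + e.getMessage());
            }
        }
    }

    // Runs the job with the extracted libraries and always cleans up afterwards.
    static void runWithExtractedLibraries(Path jar, Consumer<List<Path>> job) throws IOException {
        List<Path> libs = extractContainedLibraries(jar);
        try {
            job.accept(libs);
        } finally {
            deleteExtractedLibraries(libs);
        }
    }

    // Builds a tiny jar with one fake nested library, for demonstration only.
    static Path createDemoJar() throws IOException {
        Path jar = Files.createTempFile("demo_", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry("lib/dep.jar"));
            out.write(new byte[] {1, 2, 3});
            out.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path jar = createDemoJar();
        List<Path> seen = new ArrayList<>();
        runWithExtractedLibraries(jar, seen::addAll);
        for (Path p : seen) {
            System.out.println(p + " still exists after run: " + Files.exists(p));
        }
        Files.deleteIfExists(jar);
    }
}
```

The point of the finally block is that the temp files are removed whether the submission succeeds or fails. This would only address the disk usage, not necessarily the objects we see on the heap.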


thanks,

maciek

On 26.01.2021 11:16, Matthias Pohl wrote:
Hi Maciek,
my understanding is that the jars in the JobManager should be cleaned up after the job is terminated (I assume that your jobs successfully finished). The jars are managed by the BlobService. The dispatcher will trigger the jobCleanup in [1] after job termination. Are there any suspicious log messages that might indicate an issue?
I'm adding Chesnay to this thread as he might have more insights here.

[1] https://github.com/apache/flink/blob/2c4e0ab921ccfaf003073ee50faeae4d4e4f4c93/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L797

On Mon, Jan 25, 2021 at 8:37 PM Maciek Próchniak <m...@touk.pl> wrote:

    Hello,

    in our setup we have:

    - Flink 1.11.2

    - job submission via REST API (first we upload jar, then we submit
    multiple jobs with it)

    - additional jars embedded in lib directory of main jar (this is
    crucial part)

    When we submit jobs this way, Flink creates new temp jar files via
    the PackagedProgram.extractContainedLibraries method.

    We observe that they are not removed after the job finishes - it seems
    that PackagedProgram.deleteExtractedLibraries is not invoked when using
    the REST API.

    What's more, it seems that those jars remain open in the JobManager
    process. When we delete them manually via scripts, the disk space is
    not reclaimed until the process is restarted. We also see via heap dump
    inspection that java.util.zip.ZipFile$Source objects remain, pointing
    to those files. This is quite a problem for us, as we submit quite a
    few jobs, and after a while we run out of either heap or disk space on
    the JobManager process/host. Unfortunately, so far I cannot find where
    this leak happens...

    Does anybody have some pointers where we can search? Or how to fix
    this behaviour?


    thanks,

    maciek
