Hi there,

Today I realized that we have accumulated a lot of unhousekept Flink distribution jar files, and I would like to know what to do about this, i.e. how to housekeep them properly.
In the HDFS home directory of the submitting user, I find a subdirectory called `.flink` with hundreds of subfolders like `application_1573731655031_0420`, each having the following structure:

    -rw-r--r--   3 dev dev    861 2020-01-27 21:17 /user/dev/.flink/application_1580155950981_0010/4797ff6e-853b-460c-81b3-34078814c5c9-taskmanager-conf.yaml
    -rw-r--r--   3 dev dev    691 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/application_1580155950981_0010-flink-conf.yaml2755466919863419496.tmp
    -rw-r--r--   3 dev dev    861 2020-01-27 21:17 /user/dev/.flink/application_1580155950981_0010/fdb5ef57-c140-4f6d-9791-c226eb1438ce-taskmanager-conf.yaml
    -rw-r--r--   3 dev dev 92.2 M 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/flink-dist_2.11-1.9.1.jar
    drwxr-xr-x   - dev dev      0 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/lib
    -rw-r--r--   3 dev dev  2.6 K 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/log4j.properties
    -rw-r--r--   3 dev dev  2.3 K 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/logback.xml
    drwxr-xr-x   - dev dev      0 2020-01-27 21:16 /user/dev/.flink/application_1580155950981_0010/plugins

With tons of those folders (one for each Flink session we launched and killed in our CI/CD pipeline), they sum up to some terabytes of used space in our HDFS.

I suppose I am killing our Flink sessions the wrong way. We start and stop sessions and jobs separately, like so:

Start:

    ${OS_ROOT}/flink/bin/yarn-session.sh -jm 4g -tm 32g --name "${FLINK_SESSION_NAME}" -d -Denv.java.opts="-XX:+HeapDumpOnOutOfMemoryError"
    ${OS_ROOT}/flink/bin/flink run -m ${FLINK_HOST} [..savepoint/checkpoint options...] -d -n "${JOB_JAR}" $*

Stop:

    ${OS_ROOT}/flink/bin/flink stop -p ${SAVEPOINT_BASEDIR}/${FLINK_JOB_NAME} -m ${FLINK_HOST} ${ID}
    yarn application -kill "${ID}"

"yarn application -kill" was the best I could find, as the Flink documentation only says that the Unix session process should be closed ("Stop the YARN session by stopping the unix process (using CTRL+C) or by entering 'stop' into the client.").

Now my questions: Is there a more elegant way to kill a YARN session (remotely, from some host in the cluster, not necessarily the one that started the detached session) which also does the housekeeping? (See the P.S. below for the route I am thinking of.) Or should I do the housekeeping myself manually? (Pretty easy to script; see the sketch in the P.P.S. below.) Do I need to expect any further side effects when killing the session with "yarn application -kill"?

Best regards
Theo
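P.S. Regarding the "more elegant way": from the docs quote above, my understanding is that one could re-attach the session client to the running application and enter 'stop' there, which should let Flink run its own shutdown (and, I would hope, delete the .flink/application_* staging directory as part of it). A minimal sketch of what I mean, assuming "yarn-session.sh -id" attaches to an existing session as documented and ${ID} holds the YARN application id:

    # Re-attach the session client to the running YARN application and
    # send the 'stop' command on stdin instead of killing the application.
    echo "stop" | ${OS_ROOT}/flink/bin/yarn-session.sh -id "${ID}"

I have not verified that this cleans up the staging directory the same way CTRL+C on the original client does.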
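P.P.S. Regarding the manual housekeeping: a rough sketch of the script I have in mind, assuming our HDFS home is /user/dev, the directory names under .flink are exactly the YARN application ids, and "yarn application -status" still reports finished applications:

    #!/usr/bin/env bash
    # Delete .flink staging directories whose YARN application is no
    # longer alive. The directory name doubles as the application id.
    for dir in $(hdfs dfs -ls -C /user/dev/.flink); do
      app_id=$(basename "$dir")
      state=$(yarn application -status "$app_id" 2>/dev/null \
                | awk '/^[[:space:]]*State :/ {print $3}')
      case "$state" in
        NEW|NEW_SAVING|SUBMITTED|ACCEPTED|RUNNING)
          ;;  # session still alive, keep its staging files
        *)
          echo "Removing $dir (state: ${state:-unknown})"
          hdfs dfs -rm -r -skipTrash "$dir"
          ;;
      esac
    done

A quick "hdfs dfs -du -h /user/dev/.flink" before and after shows how much the per-session flink-dist jars account for. Of course I would prefer not to need such a script at all.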