Gaël Renoux created FLINK-27687:
-----------------------------------

Summary: SpanningWrapper shouldn't assume temp folder exists
Key: FLINK-27687
URL: https://issues.apache.org/jira/browse/FLINK-27687
Project: Flink
Issue Type: New Feature
Components: Runtime / Network
Affects Versions: 1.14.4
Reporter: Gaël Renoux

SpanningWrapper.createSpillingChannel assumes that the folder in which it creates the spill file exists. However, this is not the case in the following scenario (which actually happened to us today):

* The temp folders were created a while ago (I assume on startup of the task-manager) in the /tmp folder. They weren't used for a while, probably because we didn't have any record big enough to trigger spilling.
* The cleanup cron for /tmp did its job and deleted those old folders in /tmp.
* We deployed a new version of the job that actually needed the folders, and it crashed.

=> I'm not sure whether it should be SpanningWrapper's responsibility to recreate the folder if it no longer exists, and I'm not familiar enough with Flink's internals to guess which class should do it. The problem occurred for us in SpanningWrapper, but it can probably happen in other places as well.

More generally, assuming that folders and files in /tmp will never be deleted doesn't seem correct to me. The [documentation for io.tmp.dirs|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/] recommends that these directories not be purged, but we do need to clean up /tmp at some point. If purging really isn't supported, then the documentation should be updated to state that this is a requirement rather than a recommendation, and that purges will break the jobs (not just trigger a recovery).
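
As an illustration of the defensive behavior discussed above, here is a minimal sketch that recreates the temp folder before opening a spill file in it. It is not Flink's actual code: the class name SpillFileSketch, the method openSpillChannel, and the ".spill" file suffix are all hypothetical, and where such a check belongs in Flink is left open, as in the report.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

// Hypothetical sketch, not part of Flink.
public class SpillFileSketch {

    // Opens a new spill file in tempDir, recreating tempDir first if it was deleted.
    static FileChannel openSpillChannel(Path tempDir) throws IOException {
        // Does nothing if the directory already exists; otherwise creates it,
        // including any missing parent directories.
        Files.createDirectories(tempDir);

        // The file name and suffix are assumptions made for this example.
        Path spillFile = tempDir.resolve(UUID.randomUUID() + ".spill");
        return FileChannel.open(
                spillFile, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("io-tmp-sketch");
        // Simulates a folder removed by a /tmp cleanup job: it does not exist yet.
        Path purgedDir = base.resolve("purged").resolve("spill-dir");

        try (FileChannel channel = openSpillChannel(purgedDir)) {
            System.out.println("Spill channel open: " + channel.isOpen());
        }
    }
}
```

Without the Files.createDirectories call, FileChannel.open fails with a NoSuchFileException when the parent folder is missing, which matches the crash scenario described in the list above.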