Gaël Renoux created FLINK-27687:
-----------------------------------

             Summary: SpanningWrapper shouldn't assume temp folder exists
                 Key: FLINK-27687
                 URL: https://issues.apache.org/jira/browse/FLINK-27687
             Project: Flink
          Issue Type: New Feature
          Components: Runtime / Network
    Affects Versions: 1.14.4
            Reporter: Gaël Renoux


SpanningWrapper.createSpillingChannel assumes that the folder in which it 
creates the spill file exists. However, this is not the case in the following 
scenario (which actually happened to us today):
 * The temp folders were created a while ago (I assume on startup of the 
task-manager) in the /tmp folder. They weren't used for a while, probably 
because we didn't have any record big enough to trigger spilling.
 * The cleanup cron for /tmp did its job and deleted those old folders in /tmp.
 * We deployed a new version of the job that actually needed the folders, and 
it crashed.

=> I'm not sure whether it should be SpanningWrapper's responsibility to 
recreate the folder if it no longer exists, and I'm not familiar enough with 
Flink's internals to guess which class should do it. The problem occurred for 
us in SpanningWrapper, but it can probably happen in other places as well.
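
To illustrate the kind of defensive check I have in mind, here is a minimal 
sketch. It is not Flink's actual code: the class SpillFileSketch, the helper 
createSpillFile and the ".inputchannel" suffix are hypothetical names used for 
illustration only. The idea is simply to (re)create the directory right before 
creating the file, rather than assuming it is still there.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

// Hypothetical sketch, NOT the real SpanningWrapper implementation.
final class SpillFileSketch {

    // Creates a fresh spill file in tempDir, recreating tempDir first if
    // something (e.g. a /tmp cleanup cron) removed it in the meantime.
    static Path createSpillFile(Path tempDir) throws IOException {
        // No-op when the directory already exists; otherwise creates it
        // together with any missing parent directories.
        Files.createDirectories(tempDir);

        // Hypothetical naming scheme: random name plus an assumed suffix.
        Path spillFile = tempDir.resolve(UUID.randomUUID() + ".inputchannel");

        // CREATE_NEW fails if the file already exists, so an existing file
        // is never silently reused.
        try (FileChannel ignored = FileChannel.open(
                spillFile, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            return spillFile;
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("spill-sketch");
        Path tempDir = base.resolve("flink-tmp-demo");

        Files.createDirectories(tempDir);
        Files.delete(tempDir); // simulate the cleanup cron purging the folder

        Path spillFile = createSpillFile(tempDir);
        System.out.println("Spill file exists: " + Files.exists(spillFile));
    }
}
```

Whether such a check belongs in SpanningWrapper itself or in whichever 
component hands out the temp directories is exactly the open question above.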

More generally, assuming that folders and files in /tmp will never get deleted 
doesn't seem correct to me. The [documentation for 
io.tmp.dirs|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/]
 recommends that it not be purged, but we do need to clean up /tmp at some 
point. If purging is in fact not supported, then the documentation should be 
updated to state that this is mandatory rather than a recommendation, and that 
purges will break jobs (not just trigger a recovery).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
