XComp edited a comment on pull request #19275:
URL: https://github.com/apache/flink/pull/19275#issuecomment-1085493232


   That's a good point. Thanks for the detailed explanation. I was thinking 
about it: Essentially, the question is whether we consider the `HistoryServer` 
archiving being part of the job (which means that we want to finish it before 
the cleanup phase starts and the `JobManagerRunner` is removed) or we want it 
to run concurrently to the cleanup logic. For the latter case, I even thought 
of integrating it into the cleanup phase by making it implement the 
`GloballyCleanableResource` interface. But I'm not a fan of it because, 
semantically, cleaning up and archiving are two different things (it would have 
the benefit of getting retries out-of-the-box in case of failure, though). This 
PR proposes a hybrid approach (i.e. triggering the archiving before the cleanup 
phase but letting it run concurrently to it) which contributes to the code 
becoming more complex. Hence, I'd propose going for one of the options I 
described depending on the usecase we want to cover.
   
   I still tend to lean towards the first approach (waiting for the archiving 
before triggering the cleanup) because that's where the job is considered 
finished and the user should expect a result in the `HistoryServer`. 
   
   That said, keep in mind that if we go for the latter option (i.e. archiving 
concurrently to the cleanup), it doesn't guarantee that the cluster waits for 
it to finish. We have a ticket for that: FLINK-26772 The reason is that the 
`Dispatcher.shutdownCluster` methods isn't waiting for the 
`Dispatcher.jobTerminationFutures` to complete.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to