[ https://issues.apache.org/jira/browse/YARN-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Filipiak updated YARN-9670:
-------------------------------
    Summary: Missing Fsync for localized resources before updating to finalized in statestore  (was: Missing Fsync for localized resostatestoreurces before updating to finalized in )

> Missing Fsync for localized resources before updating to finalized in statestore
> --------------------------------------------------------------------------------
>
>                 Key: YARN-9670
>                 URL: https://issues.apache.org/jira/browse/YARN-9670
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jan Filipiak
>            Priority: Major
>
> A resource that was localized is not properly fsynced before the state store is updated to track it as finalized. The download is currently considered finished as soon as the target local output stream is closed, but closing the stream does not guarantee that the data has reached the block device by the time the state store is updated. After recovery, containers relying on the resource may therefore see only part of it, which usually makes them crash.
>
> Possible fixes:
> Introduce a new step in the state machine that fsyncs the downloaded path before the state store is called.
> On recovery, compare the size of the localized resource against the expected size (archives would probably have to be unpacked again).

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
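The two fixes proposed in YARN-9670 (an extra fsync step before the state store update, and a size check on recovery) can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not NodeManager code: `DurableLocalization`, `StateStore`, `finishResourceLocalization`, `finalizeLocalization` and `isIntactOnRecovery` are hypothetical names standing in for the localization state machine and the NM state store. Only the JDK calls (`Files.walk`, `FileChannel.open`, `FileChannel.force`) are real APIs. It also assumes a POSIX-like platform where forcing a channel opened on a directory flushes its entries; where that is not possible, the directory sync is skipped.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not NodeManager code: fsync a localized resource
// before the state store marks it as finalized.
final class DurableLocalization {

    // Hypothetical stand-in for the NM state store update (not the real YARN API).
    interface StateStore {
        void finishResourceLocalization(Path localPath, long size) throws IOException;
    }

    // The resource path itself plus, for directories (e.g. unpacked archives), everything below it.
    static List<Path> listTree(Path root) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            return s.collect(Collectors.toList());
        }
    }

    // Flushes a regular file's data and metadata to the storage device.
    static void fsyncFile(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    // Flushes directory entries; best effort because some platforms
    // (e.g. Windows) cannot open a directory as a FileChannel.
    static void fsyncDirectory(Path dir) {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        } catch (IOException e) {
            // Directory sync unsupported here; file contents were still synced.
        }
    }

    // Syncs every file and directory of the resource, then its parent directory,
    // so that the entry naming the resource is durable too.
    static void fsyncTree(Path root) throws IOException {
        for (Path p : listTree(root)) {
            if (Files.isRegularFile(p)) {
                fsyncFile(p);
            } else if (Files.isDirectory(p)) {
                fsyncDirectory(p);
            }
        }
        Path parent = root.toAbsolutePath().getParent();
        if (parent != null) {
            fsyncDirectory(parent);
        }
    }

    // Total bytes of all regular files in the resource.
    static long totalSize(Path root) throws IOException {
        long sum = 0;
        for (Path p : listTree(root)) {
            if (Files.isRegularFile(p)) {
                sum += Files.size(p);
            }
        }
        return sum;
    }

    // Extra state-machine step: record "finalized" only once the data is on disk.
    static void finalizeLocalization(Path localPath, StateStore store) throws IOException {
        fsyncTree(localPath);
        store.finishResourceLocalization(localPath, totalSize(localPath));
    }

    // Recovery check: trust a finalized entry only if the on-disk size still matches.
    static boolean isIntactOnRecovery(Path localPath, long recordedSize) throws IOException {
        return Files.exists(localPath) && totalSize(localPath) == recordedSize;
    }
}
```

Under the same assumptions, the download step would call `finalizeLocalization(localPath, store)` after closing the output stream instead of updating the store directly, and on restart any resource for which `isIntactOnRecovery(localPath, recordedSize)` returns false would be localized again. Recording the size of the unpacked tree rather than of the original archive keeps the recovery check meaningful for archives, at the cost of walking the tree once more.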