Jan Filipiak created YARN-9670:
----------------------------------

             Summary: Missing Fsync for localized resources before updating to finalized in statestore
                 Key: YARN-9670
                 URL: https://issues.apache.org/jira/browse/YARN-9670
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.6.0
            Reporter: Jan Filipiak

A localized resource is not fsynced before the statestore is updated to track it as finalized. The download is currently considered finished as soon as the target local output stream is closed, but closing a stream does not guarantee that the data has reached the block device. If the node goes down after the statestore update but before the data is flushed, the statestore records the resource as complete while the file on disk is only partially written. After recovery, containers relying on the resource may see only parts of it, which usually leads to them crashing.

Possible fixes:

Introduce a new step in the state machine that fsyncs the downloaded path before calling the statestore.

On recovery, we can compare the size (and we probably have to unpack archives again).
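
To make the first proposed fix concrete, here is a minimal sketch of "fsync, then update the statestore", with a size check that could be run on recovery. It is not the NodeManager code: `ResourceStateStore`, `syncThenFinalize` and `looksComplete` are hypothetical names invented for illustration, and only a single downloaded file is handled (an unpacked archive would need every file and directory under the path synced). The only real API relied on is `FileChannel.force(true)` from `java.nio`. Syncing the parent directory is best effort, since opening a directory as a channel is not supported on every platform.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FsyncBeforeFinalize {

    /** Hypothetical stand-in for the NodeManager statestore; not a YARN API. */
    interface ResourceStateStore {
        void markFinalized(Path localPath, long size) throws IOException;
    }

    /** Forces the file's content and metadata to the storage device. */
    static void fsyncFile(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    /** Best-effort sync of the directory entry; silently skipped where unsupported. */
    static void fsyncDirectory(Path dir) {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        } catch (IOException e) {
            // Some platforms cannot open or sync a directory this way.
        }
    }

    /** Syncs the download first, and only then records it as finalized. */
    static long syncThenFinalize(Path file, ResourceStateStore store) throws IOException {
        fsyncFile(file);
        Path parent = file.toAbsolutePath().getParent();
        if (parent != null) {
            fsyncDirectory(parent);
        }
        long size = Files.size(file);
        store.markFinalized(file, size); // reached only if the sync succeeded
        return size;
    }

    /** Recovery-time check: does the file on disk match the recorded size? */
    static boolean looksComplete(Path file, long recordedSize) {
        try {
            return Files.isRegularFile(file) && Files.size(file) == recordedSize;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("localized-", ".bin");
        Files.write(file, "payload".getBytes(StandardCharsets.UTF_8));
        long size = syncThenFinalize(file,
                (p, s) -> System.out.println("finalized " + p + " size=" + s));
        System.out.println("complete on recovery: " + looksComplete(file, size));
        Files.delete(file);
    }
}
```

A size comparison only catches truncation. It would not detect a file of the right length with stale or zeroed blocks, so it complements the fsync step rather than replacing it.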