Jan Filipiak created YARN-9670:
----------------------------------

             Summary: Missing Fsync for localized resources before updating to 
finalized in statestore
                 Key: YARN-9670
                 URL: https://issues.apache.org/jira/browse/YARN-9670
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.6.0
            Reporter: Jan Filipiak


A resource that was localized is not properly fsynced before the state manager 
is updated to track this resource as finalized. The download is currently 
considered finished as soon as the target local output stream is closed, so the 
data might not yet have reached the block device when the state store is 
updated. After a recovery, containers relying on the resource may see only part 
of it, which usually causes them to crash.
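Closing an output stream only hands the bytes to the operating system; it does not guarantee that they are on the block device. The Java sketch below illustrates the ordering the report asks for: force every localized file (and, where the platform allows it, each directory) to disk with `FileChannel.force(true)`, and only then update the state store. `StateStore`, `markFinalized`, `fsyncTree` and `finishLocalization` are hypothetical names for illustration, not the actual NodeManager API.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DurableLocalize {

    /** Hypothetical stand-in for the NodeManager state store; not the real YARN interface. */
    interface StateStore {
        void markFinalized(Path resource) throws IOException;
    }

    /** Hypothetical helper: fsyncs every regular file under root, then every directory. */
    static void fsyncTree(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> s = Files.walk(root)) { // pre-order: parents before children
            paths = s.collect(Collectors.toList());
        }
        for (Path p : paths) {
            if (Files.isRegularFile(p)) {
                try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
                    ch.force(true); // flush file data and metadata to the device
                }
            }
        }
        // Reverse the walk order so children are synced before their parents.
        Collections.reverse(paths);
        for (Path p : paths) {
            if (Files.isDirectory(p)) {
                try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                    ch.force(true); // persist the directory entries themselves
                } catch (IOException e) {
                    // Some platforms (e.g. Windows) cannot open a directory as a
                    // channel; this sketch skips the directory sync there.
                }
            }
        }
    }

    /** Called after the download's output stream has been closed. */
    static void finishLocalization(Path downloaded, StateStore store) throws IOException {
        fsyncTree(downloaded);           // 1. make the bytes durable
        store.markFinalized(downloaded); // 2. only then record the resource as finalized
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("localized");
        Files.write(dir.resolve("job.jar"), new byte[] {1, 2, 3}); // stream is closed here
        List<Path> finalized = new ArrayList<>();
        finishLocalization(dir, finalized::add);
        System.out.println(finalized.size()); // 1
    }
}
```

The point of the sketch is the ordering: if the node crashes before `markFinalized` runs, the resource is simply localized again on recovery instead of being trusted half-written. The parent of the localized root is not synced here, which a real fix would also need to consider.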

 

Possible fixes:

Introduce a new step in the state machine that fsyncs the downloaded path 
before calling the state store.

On recovery, we can compare the size of the localized resource (and we 
probably have to unpack archives again).
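A minimal sketch of the recovery-side size check, assuming (hypothetically) that the expected size in bytes is written to the state store together with the finalized state. `RecoveryCheck`, `totalSize` and `isIntact` are illustrative names, not existing YARN code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class RecoveryCheck {

    /** Sums the sizes of all regular files under root (a single file counts as itself). */
    static long totalSize(Path root) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    /**
     * Returns true if the recovered resource can be reused; false means it should be
     * discarded and localized (and, for archives, unpacked) again.
     * expectedBytes is assumed to come from the state store (hypothetical field).
     */
    static boolean isIntact(Path localPath, long expectedBytes) throws IOException {
        if (!Files.exists(localPath)) {
            return false;
        }
        return totalSize(localPath) == expectedBytes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovered");
        Files.write(dir.resolve("part-0"), new byte[10]);
        Files.write(dir.resolve("part-1"), new byte[5]);
        System.out.println(isIntact(dir, 15)); // true
        System.out.println(isIntact(dir, 20)); // false: fewer bytes on disk than recorded
    }
}
```

A size comparison catches truncated files, but not a file that has the right length with wrong contents; recording a checksum instead of a size would be the stronger variant of the same idea.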

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
