[ https://issues.apache.org/jira/browse/YARN-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Filipiak updated YARN-9670:
-------------------------------
    Comment: was deleted

(was: It bites us in YARN 2.6. I scanned through master briefly and couldn't 
find anything that would fix it.)

> Missing Fsync for localized resources before updating to finalized in 
> statestore
> --------------------------------------------------------------------------------
>
>                 Key: YARN-9670
>                 URL: https://issues.apache.org/jira/browse/YARN-9670
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jan Filipiak
>            Priority: Major
>
> A resource that was localized is not properly fsynced before the state 
> manager is updated to track it as finalized. The download is currently 
> considered finished once the target local output stream is closed, but 
> closing the stream does not guarantee that the data has reached the block 
> device before the state store is updated. After recovery, containers relying 
> on the resource may see only part of it, which usually causes them to crash.
>  
> Possible fixes:
> Introduce a new step in the state machine that fsyncs the downloaded path 
> before calling the state store.
> On recovery, we can compare the file size (and we probably have to unpack 
> archives again).
>  
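
The ordering proposed in the description can be sketched in Java. This is a minimal illustration and not Hadoop code. The class and method names (`DurableLocalization`, `localizeAndFinalize`, `isIntactAfterRecovery`) are hypothetical, and a toy in-memory map stands in for the NodeManager state store. The sketch follows this order:

1. Write and close the downloaded file.
2. Force the file to the storage device with `FileChannel.force(true)`.
3. Make a best-effort fsync of the parent directory.
4. Only then record the resource as finalized.

On recovery it checks that the on-disk size matches the recorded size.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: fsync a localized resource before marking it finalized.
public class DurableLocalization {

    // Toy stand-in for the NodeManager state store (assumption): path -> expected size.
    static final Map<Path, Long> STATE_STORE = new HashMap<>();

    // Flush file contents and metadata to the storage device.
    static void fsyncFile(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    // Best-effort fsync of a directory so the new directory entry is durable too.
    static void fsyncDir(Path dir) {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        } catch (IOException e) {
            // Some platforms (e.g. Windows) cannot open a directory this way; ignore.
        }
    }

    // Write the downloaded bytes, fsync, and only then record the resource as finalized.
    static void localizeAndFinalize(Path target, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write(data);
        } // close() alone does not force the data to the block device
        fsyncFile(target);
        fsyncDir(target.toAbsolutePath().getParent());
        STATE_STORE.put(target, (long) data.length); // the "finalized" step
    }

    // On recovery, accept the resource only if its on-disk size matches the recorded size.
    static boolean isIntactAfterRecovery(Path target) throws IOException {
        Long expected = STATE_STORE.get(target);
        return expected != null && Files.exists(target) && Files.size(target) == expected;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("localized");
        Path file = dir.resolve("resource.jar");
        localizeAndFinalize(file, new byte[] {1, 2, 3, 4});
        System.out.println(isIntactAfterRecovery(file)); // prints "true"
        Files.write(file, new byte[] {1, 2}); // simulate a partially persisted file
        System.out.println(isIntactAfterRecovery(file)); // prints "false"
    }
}
```

A size comparison only catches truncated files. Detecting same-length corruption would need a checksum, and archives that were unpacked would still have to be re-extracted, as the description notes.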



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

