[ https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005493#comment-14005493 ]
Junping Du commented on YARN-1338:
----------------------------------

Thanks for addressing my comments, [~jlowe]! Some additional comments:

I think we currently use initStorage(conf) both to create the DB items for storing NMState when the NM starts for the first time and to locate those items when the NM restarts. Do we have any code to destroy the DB items for NMState when the NM is decommissioned (i.e., no short-term restart is expected)? If not, when the NM is recommissioned, at which point it should be recognized as a fresh node, it will still have stale NMState info if NM_RECOVERY_DIR and DB_NAME have not changed. Am I missing anything here?

In LocalResourcesTrackerImpl#recoverResource()
{code}
+        incrementFileCountForLocalCacheDirectory(localDir.getParent());
{code}
Given that localDir is already the parent of localPath, maybe we should just increment localDir rather than its parent?

I didn't see a unit test that checks the file count for the resource directory after recovery. Maybe we should add one?

> Recover localized resource cache state upon nodemanager restart
> ---------------------------------------------------------------
>
> Key: YARN-1338
> URL: https://issues.apache.org/jira/browse/YARN-1338
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: YARN-1338.patch, YARN-1338v2.patch,
> YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch, YARN-1338v5.patch
>
>
> Today when node manager restarts we clean up all the distributed cache files
> from disk. This is definitely not ideal from 2 aspects.
> * For work preserving restart we definitely want them as running containers
> are using them
> * For even non work preserving restart this will be useful in the sense that
> we don't have to download them again if needed by future tasks.

--
This message was sent by Atlassian JIRA
(v6.2#6252)