[ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794733#comment-13794733 ]
Zhijie Shen commented on YARN-975:
----------------------------------

The concern with the namespace issue is that if we have too many files per application, the NameNode may be overwhelmed when we run on top of HDFS. To reduce the number of files, we fit all the history data of an application, its application attempts, and its containers into one TFile. Each TFile will then contain:

||key||value||
|ApplicationId|ApplicationHistoryData|
|ApplicationAttemptId1|ApplicationAttemptHistoryData1|
|ApplicationAttemptId2|ApplicationAttemptHistoryData2|
|ContainerId1|ContainerHistoryData1|
|ContainerId2|ContainerHistoryData2|
|ContainerId3|ContainerHistoryData3|

The benefit is that we strictly limit the number of files per application to 1. However, even if we only read part of an application's history data, for example the application information, we still need to load the complete file. Hopefully, the meta information of an application will not be big and will not terribly affect I/O performance. In addition, we can add an application-level cache to avoid accessing the secondary storage system all the time. However, I propose that it be done separately.

Thoughts?

> Add a file-system implementation for history-storage
> ----------------------------------------------------
>
>                 Key: YARN-975
>                 URL: https://issues.apache.org/jira/browse/YARN-975
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch,
> YARN-975.4.patch, YARN-975.5.patch
>
>
> HDFS implementation should be a standard persistence strategy of history
> storage

--
This message was sent by Atlassian JIRA
(v6.1#6144)
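
For illustration only, the one-file-per-application layout proposed in the comment could be modeled roughly as below. This is a minimal in-memory sketch, not the patch's implementation: it does not use the real TFile or HDFS APIs, and the class name `HistoryStoreSketch`, its methods, and the ID strings are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the proposed layout. The outer map
// plays the role of the file system (one "file" per application); the inner
// map plays the role of the key/value pairs stored in that file.
class HistoryStoreSketch {
    private final Map<String, Map<String, String>> files = new LinkedHashMap<>();

    // Application, attempt, and container records all go into the same
    // per-application file, keyed by their own ID.
    void write(String applicationId, String entityId, String historyData) {
        files.computeIfAbsent(applicationId, id -> new LinkedHashMap<>())
             .put(entityId, historyData);
    }

    // Reading a single record still goes through the complete
    // per-application file, which is the trade-off noted in the comment.
    String read(String applicationId, String entityId) {
        Map<String, String> wholeFile = files.get(applicationId);
        return wholeFile == null ? null : wholeFile.get(entityId);
    }

    // The number of files grows with applications, not attempts or containers.
    int fileCount() {
        return files.size();
    }

    public static void main(String[] args) {
        HistoryStoreSketch store = new HistoryStoreSketch();
        // Placeholder IDs and payloads, named after the table in the comment.
        store.write("ApplicationId", "ApplicationId", "ApplicationHistoryData");
        store.write("ApplicationId", "ApplicationAttemptId1", "ApplicationAttemptHistoryData1");
        store.write("ApplicationId", "ContainerId1", "ContainerHistoryData1");
        System.out.println("files: " + store.fileCount());
        System.out.println(store.read("ApplicationId", "ContainerId1"));
    }
}
```

An application-level cache, which the comment proposes to handle separately, would sit in front of `read` so that repeated lookups for the same application do not reload its file.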