[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276358#comment-14276358 ]
Sangjin Lee commented on YARN-2928:
-----------------------------------

Regarding the per-node approach, I have some questions and observations in addition to the loss of isolation/attribution that was already discussed.

(1) While it may be faster to allocate with per-node companions, capacity-wise you would end up spending more with the per-node approach, since the per-node companions are always up even though they may be idle for large amounts of time. So if capacity is a concern, you may lose out. Under what circumstances would per-node companions be more advantageous in terms of capacity?

(2) I have a question about the work-preserving aspect of the per-node ATS companion. One implication of making this a per-node (i.e. long-running) process is that we need to handle work-preserving restart. What if we need to restart the ATS companion? Since the other YARN daemons (RM and NM) allow for work-preserving restarts, we cannot have the ATS companion break that. So that seems to be a requirement?

(3) We still need to handle the lifecycle management aspects. Previously we said that when the RM allocates an AM, it would tell the NM so that the NM could spawn the special container. With the per-node approach, the RM would *still* need to tell the NM so that the NM can talk to the per-node ATS companion to initialize the data structure for the given app.

These are quick observations. While I do see value in the per-node approach, it's not totally clear how much work it would save over the per-app approach given these observations. What do you think?
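To make the capacity question in (1) a bit more concrete, here is a rough back-of-the-envelope sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical (nothing here is YARN API), the numbers are made up, and it assumes a per-node companion and a per-app companion reserve the same amount of memory, which may well not hold in practice.

```java
import java.util.Arrays;

// Hypothetical model for comparing reserved capacity; not part of YARN.
public class CompanionCapacity {

    /** MB-hours reserved by always-on companions, one per node, over a time window. */
    static long perNodeCost(int nodes, long companionMb, long windowHours) {
        return (long) nodes * companionMb * windowHours;
    }

    /** MB-hours reserved by per-app companions that exist only while their app runs. */
    static long perAppCost(long[] appDurationsHours, long companionMb) {
        long total = 0;
        for (long hours : appDurationsHours) {
            total += hours * companionMb;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up cluster: 100 nodes, 512 MB per companion, 24-hour window.
        int nodes = 100;
        long companionMb = 512;
        long windowHours = 24;

        // Made-up workload: 50 apps in the window, each running for 2 hours.
        long[] appDurationsHours = new long[50];
        Arrays.fill(appDurationsHours, 2L);

        System.out.println("per-node MB-hours: "
            + perNodeCost(nodes, companionMb, windowHours));
        System.out.println("per-app  MB-hours: "
            + perAppCost(appDurationsHours, companionMb));
    }
}
```

Under these assumptions, the per-node approach only comes out ahead on capacity when the total app-hours in the window exceed nodes times window hours, i.e. when there are on average more concurrently running apps than nodes.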
> Application Timeline Server (ATS) next gen: phase 1
> ---------------------------------------------------
>
>                 Key: YARN-2928
>                 URL: https://issues.apache.org/jira/browse/YARN-2928
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and
> YARN-321. Although it is a great feature, we have recognized several critical
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those.
> This is phase 1 of this effort.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)