[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020191#comment-15020191 ]
Naganarasimha G R commented on YARN-3784:
-----------------------------------------

Thanks [~sunilg] for the offline update on this issue. I had missed that FiCaSchedulerApp subtracts the elapsed time, which is how you are able to send the effective timeout to the AM during heartbeat. Still, I can see the following issues with the current approach, compared with computing a probable timestamp (current time + preemption timeout) when the PreemptionContainer is created and sharing that with the AM:
* There can be a small delta between the timeout value sent to the AM and the time at which the container can actually time out.
* Some additional loops are needed while building the heartbeat response (not a high performance impact, but it can nevertheless be avoided).
* Additional storage of {{containersWithFirstNotifyTime}} is needed in FiCaSchedulerApp, which the timestamp approach would avoid.

But maybe the current approach is simpler for users, since a numerical value is easier to understand than a timestamp. Thoughts from others?

Also, a few issues with the patch:
* Possible leak in {{containersWithFirstNotifyTime}}, as remove is not being called?
* Can there be a case where {{containersWithFirstNotifyTime}} is not filled in for a preempted container? If not, I feel the additional check {{if (containersWithFirstNotifyTime.containsKey(c))}} in the for loop is not required.

> Indicate preemption timout along with the list of containers to AM (preemption message)
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3784
>                 URL: https://issues.apache.org/jira/browse/YARN-3784
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch, 0003-YARN-3784.patch, 0004-YARN-3784.patch
>
>
> Currently during preemption, the AM is notified with a list of containers which are marked for preemption. This issue introduces a timeout duration along with this container list, so that the AM knows how much time it has to gracefully shut down its containers (assuming one of the preemption policies is loaded in the AM).
> This will also help in NM decommissioning scenarios, where an NM is decommissioned after a timeout (also killing the containers on it). The timeout indicates to the AM that those containers can be forcefully killed by the RM once it expires.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
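
For readers following the comment above, the two alternatives it compares can be sketched roughly as follows. This is a minimal illustration and not code from any of the attached patches: the class {{PreemptionTimeoutSketch}}, the nested {{RemainingTimeTracker}}, and every method name here are hypothetical, container ids are plain strings, and times are plain millisecond values passed in by the caller.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the YARN-3784 patches.
public class PreemptionTimeoutSketch {

    /**
     * Approach A (remaining duration): remember when each container was first
     * reported for preemption and subtract the elapsed time on every heartbeat.
     */
    static final class RemainingTimeTracker {
        private final long timeoutMs;
        // Plays the role the comment attributes to containersWithFirstNotifyTime.
        private final Map<String, Long> firstNotifyTime = new HashMap<>();

        RemainingTimeTracker(long timeoutMs) {
            this.timeoutMs = timeoutMs;
        }

        /** Remaining time to report to the AM for this container at nowMs. */
        long remainingMs(String containerId, long nowMs) {
            // computeIfAbsent records the first notification, so in this sketch
            // no separate containsKey check is needed before reading the entry.
            long first = firstNotifyTime.computeIfAbsent(containerId, id -> nowMs);
            return Math.max(0L, timeoutMs - (nowMs - first));
        }

        /** Entries must be removed when the container finishes, or the map grows. */
        void containerCompleted(String containerId) {
            firstNotifyTime.remove(containerId);
        }

        int trackedCount() {
            return firstNotifyTime.size();
        }
    }

    /**
     * Approach B (absolute timestamp): compute the deadline once, when the
     * preemption entry is created. No per-container bookkeeping is required.
     */
    static long deadlineMs(long createdAtMs, long timeoutMs) {
        return createdAtMs + timeoutMs;
    }

    public static void main(String[] args) {
        RemainingTimeTracker tracker = new RemainingTimeTracker(15_000L);
        System.out.println(tracker.remainingMs("container_1", 1_000L)); // 15000
        System.out.println(tracker.remainingMs("container_1", 6_000L)); // 10000
        System.out.println(deadlineMs(1_000L, 15_000L));                // 16000
        tracker.containerCompleted("container_1");
        System.out.println(tracker.trackedCount());                     // 0
    }
}
```

In this sketch, approach A needs the extra map plus an explicit cleanup call, while approach B is a single addition, which mirrors the storage and leak concerns raised in the comment. The trade-off is that the AM has to interpret an absolute timestamp rather than a duration.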