[ 
https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020191#comment-15020191
 ] 

Naganarasimha G R commented on YARN-3784:
-----------------------------------------

Thanks [~sunilg] for the offline update on this issue. Basically, I was missing 
the part where FiCaSchedulerApp subtracts the elapsed time, which is how you 
are able to send the effective timeout to the AM during the heartbeat. However, 
I can see the following issues with the current approach, compared with 
computing a probable timestamp (current time + preemption timeout) when the 
PreemptionContainer is created and sharing that with the AM:
* There can be a small delta between the timeout value reported to the AM and 
the time at which the container can actually time out.
* Some additional loops are needed while building the heartbeat response 
(not a high performance impact, but nevertheless avoidable).
* Additional storage of {{containersWithFirstNotifyTime}} is needed in 
FiCaSchedulerApp, which the timestamp approach would avoid.
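
As a rough illustration of the two options, here is a minimal sketch. It is not 
taken from the patch: the class and method names are hypothetical, container 
ids are plain strings, and the clock is passed in explicitly to keep the 
example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the YARN-3784 patch. */
public class PreemptionTimeoutSketch {

  /**
   * Current approach (as I understand it): remember when each container was
   * first reported and subtract the elapsed time on every heartbeat.
   */
  public static final class RemainingTimeoutTracker {
    private final long timeoutMs;
    // Plays the role of containersWithFirstNotifyTime (assumed shape).
    private final Map<String, Long> firstNotifyTime = new HashMap<>();

    public RemainingTimeoutTracker(long timeoutMs) {
      this.timeoutMs = timeoutMs;
    }

    /** Remaining timeout in ms for this heartbeat, never negative. */
    public long remainingMs(String containerId, long nowMs) {
      long first = firstNotifyTime.computeIfAbsent(containerId, k -> nowMs);
      return Math.max(0L, timeoutMs - (nowMs - first));
    }
  }

  /**
   * Alternative: compute an absolute deadline once, when the preemption
   * message for the container is created. No per-container map is needed.
   */
  public static long deadlineMs(long createdAtMs, long timeoutMs) {
    return createdAtMs + timeoutMs;
  }
}
```

With a 15000 ms timeout and a first notification at t=1000, the tracker 
reports 15000, then 10000 at t=6000, while the deadline variant always reports 
16000 and leaves the subtraction to the AM.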

But maybe the current approach is simpler for users, since a numerical 
duration is easier to understand than a timestamp. Thoughts from others?

Also, a few issues with the patch:
* There is a possible leak in {{containersWithFirstNotifyTime}}, as remove is 
never called on it?
* Can there be a case where {{containersWithFirstNotifyTime}} is not filled in 
for a preempted container? If not, I feel the additional check {{if 
(containersWithFirstNotifyTime.containsKey(c))}} in the for loop is not 
required.
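
For the leak, what I have in mind is roughly the following. Again this is only 
a sketch under assumptions: the class, the method names and the use of plain 
string ids are hypothetical, and where the removal should really be hooked in 
FiCaSchedulerApp is for the patch to decide.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of keeping the notify-time map bounded. */
public class NotifyTimeCleanupSketch {
  // Assumed shape of the map discussed above (string ids for brevity).
  private final Map<String, Long> containersWithFirstNotifyTime = new HashMap<>();

  /** Record the first notification only; later calls keep the original time. */
  public void markForPreemption(String containerId, long nowMs) {
    containersWithFirstNotifyTime.putIfAbsent(containerId, nowMs);
  }

  /** Drop the entry once the container is gone, so the map cannot grow forever. */
  public void containerCompleted(String containerId) {
    containersWithFirstNotifyTime.remove(containerId);
  }

  /** First notify time, or null if the container was never marked. */
  public Long firstNotifyTime(String containerId) {
    return containersWithFirstNotifyTime.get(containerId);
  }

  public int trackedCount() {
    return containersWithFirstNotifyTime.size();
  }
}
```

If every preempted container is guaranteed to pass through something like 
markForPreemption before the response is built, the containsKey check in the 
loop becomes redundant.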


> Indicate preemption timeout along with the list of containers to AM 
> (preemption message)
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3784
>                 URL: https://issues.apache.org/jira/browse/YARN-3784
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch, 
> 0003-YARN-3784.patch, 0004-YARN-3784.patch
>
>
> Currently during preemption, the AM is notified with a list of containers 
> which are marked for preemption. This issue proposes including a timeout 
> duration along with this container list so that the AM knows how much time 
> it has to gracefully shut down its containers (assuming one of the 
> preemption policies is loaded in the AM).
> This will also help in NM decommissioning scenarios, where the NM is 
> decommissioned after a timeout (which also kills the containers on it). The 
> timeout tells the AM that those containers can be forcefully killed by the 
> RM once it expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
