[jira] [Comment Edited] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

Konstantinos Karanasos (JIRA) Tue, 12 Feb 2019 10:54:31 -0800


    [ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766325#comment-16766325
 ]


Konstantinos Karanasos edited comment on YARN-999 at 2/12/19 6:53 PM:
----------------------------------------------------------------------

Take a look at YARN-7934: we refactored some of the preemption code for 
federation (for the global queues in particular). The umbrella Jira is not 
finished, but I think this Jira will point you to some useful classes.

I am not sure exactly how the reduction of node resources is implemented, but 
opportunistic containers can be killed locally at the NMs. So if you need to 
free up resources after a resource reduction, you can go over the running 
opportunistic containers and kill the long-running ones.
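
Here is a rough sketch of the selection step I have in mind. It is not actual 
NM code: RunningContainer and selectVictims are made-up names for illustration, 
memory is the only resource considered, and "long-running" is assumed to mean 
"earliest start time". The real classes and kill path in the NM will look 
different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class OpportunisticKillSketch {

    /** Hypothetical stand-in for a container running on the NM; not a YARN class. */
    static final class RunningContainer {
        final String id;
        final boolean opportunistic;
        final long memoryMb;
        final long startTimeMs;

        RunningContainer(String id, boolean opportunistic, long memoryMb, long startTimeMs) {
            this.id = id;
            this.opportunistic = opportunistic;
            this.memoryMb = memoryMb;
            this.startTimeMs = startTimeMs;
        }
    }

    /**
     * Picks opportunistic containers to kill, longest-running first, until at
     * least deficitMb of memory would be freed or no candidates remain.
     * Guaranteed containers are never selected.
     */
    static List<String> selectVictims(List<RunningContainer> running, long deficitMb) {
        List<RunningContainer> candidates = new ArrayList<>();
        for (RunningContainer c : running) {
            if (c.opportunistic) {
                candidates.add(c);
            }
        }
        // Earliest start time first, i.e. the longest-running containers go first.
        candidates.sort(Comparator.comparingLong((RunningContainer c) -> c.startTimeMs));

        List<String> victims = new ArrayList<>();
        long freedMb = 0;
        for (RunningContainer c : candidates) {
            if (freedMb >= deficitMb) {
                break;
            }
            victims.add(c.id);
            freedMb += c.memoryMb;
        }
        return victims;
    }

    public static void main(String[] args) {
        List<RunningContainer> running = Arrays.asList(
            new RunningContainer("g1", false, 4096, 1_000L),
            new RunningContainer("o1", true, 2048, 2_000L),
            new RunningContainer("o2", true, 1024, 5_000L),
            new RunningContainer("o3", true, 2048, 9_000L));
        // Assume the node was shrunk so that 3072 MB must be released.
        System.out.println(selectVictims(running, 3072));
    }
}
```

With the sample data above, the two oldest opportunistic containers (o1 and o2) 
cover the 3072 MB deficit, so o3 and the guaranteed container g1 are left alone.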

As far as I remember, the regular preemption code in the RM will not touch 
opportunistic containers.



> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-999
>                 URL: https://issues.apache.org/jira/browse/YARN-999
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: graceful, nodemanager, scheduler
>            Reporter: Junping Du
>            Priority: Major
>
> In the current design and implementation, when we decrease the resource on a 
> node to less than the resource consumption of its currently running tasks, 
> those tasks can still run to completion. However, no new tasks get assigned 
> to the node (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is fine in most cases, but for long-running 
> tasks it could take too long for the new resource setting to take effect, so 
> preemption could be used here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
