[ 
https://issues.apache.org/jira/browse/YARN-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139397#comment-16139397
 ] 

Joep Rottinghuis commented on YARN-7019:
----------------------------------------

Agreed that we should generalize this to a score. The same score could be used 
when asking for additional resources.
For example, suppose thousands of mappers have run, several reducers are done, 
and many more are still running when a map output cannot be fetched (so a 
mapper has to be re-run). Getting one more resource to quickly finish that 
re-run and release a bunch of resources has one score.
Earlier in the job execution, when waves of mappers are still running, killing 
one would have a lower score.

Similarly, a score could be used to ask YARN not to kill an app-master, or to 
indicate that if one container gets killed, all the work is lost (as in the 
case of gang-scheduling needs).

A generalized score of the form "give me X containers and I can make Y 
progress" vs. "the cost of killing these containers is Y" would be useful.
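
To make the comparison concrete, here is a minimal sketch in Java. Everything in 
it is hypothetical: the ScoreSketch and Claim names, the per-container 
normalization, and the idea that progress is reported as a fraction of the job 
are all assumptions for illustration, not existing YARN APIs.

```java
/** Hypothetical sketch only; none of these types exist in YARN. */
final class ScoreSketch {

    /**
     * An application's claim about a set of containers: either the progress
     * it could make if granted them, or the progress lost if they are killed.
     */
    static final class Claim {
        final String appId;
        final int containers;
        final double progress; // assumed unit: fraction of the job, 0.0 to 1.0

        Claim(String appId, int containers, double progress) {
            if (containers <= 0) {
                throw new IllegalArgumentException("containers must be positive");
            }
            this.appId = appId;
            this.containers = containers;
            this.progress = progress;
        }

        /** Progress gained or lost per container, so claims of different sizes compare. */
        double perContainer() {
            return progress / containers;
        }
    }

    /** True when the gain from granting 'ask' exceeds the cost of killing 'victim'. */
    static boolean worthPreempting(Claim ask, Claim victim) {
        return ask.perContainer() > victim.perContainer();
    }
}
```

Under these assumptions, the late-stage mapper re-run described above would 
report a high gain for one container, while a mapper in an early wave would 
report a low cost, so worthPreempting(rerun, earlyMapper) would return true.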

I think the challenge will be to multiply an app- or framework-reported score 
by a factor that scales it to the fair share or priority of the application / 
user / queue, so that apps cannot lie about these scores and rig scheduling 
decisions.

> Ability for applications to notify YARN about container reuse
> -------------------------------------------------------------
>
>                 Key: YARN-7019
>                 URL: https://issues.apache.org/jira/browse/YARN-7019
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Jason Lowe
>
> During preemption calculations YARN can try to reduce the amount of work lost 
> by considering how long a container has been running.  However, when an 
> application framework like Tez reuses a container across multiple tasks it 
> changes the work lost calculation since the container has essentially 
> checkpointed between task assignments.  It would be nice if applications 
> could inform YARN when a container has been reused/checkpointed and therefore 
> is a better candidate for preemption wrt. lost work than other, younger 
> containers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
