[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506651#comment-16506651
 ] 

Konstantinos Karanasos commented on YARN-8394:
----------------------------------------------

Looks good to me. A couple of things to fix before committing:
 * "that tries best efforts to honor task locality constraint" -> "to honor 
task locality constraints"
 * "losing the locality constraint" -> "relaxing the locality constraint"
 * when the additional delay is -1, you can say that it is calculated from the 
formula L * C / N, capped by the cluster size, where L is the number of 
locations (nodes or racks) specified in the resource request, C is the number 
of requested containers, and N is the size of the cluster.
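
To make the formula in the last point concrete, here is a minimal sketch that 
transcribes it literally. The function and parameter names are hypothetical, 
chosen only for illustration; this is not the Capacity Scheduler's actual code, 
and details such as rounding are assumptions.

```python
def computed_delay(num_locations: int, num_containers: int, cluster_size: int) -> float:
    """Delay from the formula L * C / N, capped by the cluster size N.

    num_locations  -- L, locations (nodes or racks) named in the resource request
    num_containers -- C, number of requested containers
    cluster_size   -- N, number of nodes in the cluster

    Hypothetical helper: a literal reading of the formula in the comment,
    not the scheduler's implementation.
    """
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    uncapped = num_locations * num_containers / cluster_size
    # Cap the result by the cluster size.
    return min(uncapped, cluster_size)


# 4 locations, 10 containers, 20 nodes: 4 * 10 / 20 = 2.0 (below the cap)
small = computed_delay(4, 10, 20)
# 50 locations, 100 containers, 20 nodes: 250.0, capped to 20
large = computed_delay(50, 100, 20)
```

With a small request the formula gives a short delay, while a large request 
spread over many locations hits the cluster-size cap.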

> Improve data locality documentation for Capacity Scheduler
> ----------------------------------------------------------
>
>                 Key: YARN-8394
>                 URL: https://issues.apache.org/jira/browse/YARN-8394
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Major
>         Attachments: YARN-8394.001.patch
>
>
> YARN-6344 introduces a new parameter, 
> {{yarn.scheduler.capacity.rack-locality-additional-delay}}, in 
> capacity-scheduler.xml. We need to add documentation for it in 
> {{CapacityScheduler.md}} accordingly.
> Moreover, more and more clusters separate storage from computation, so the 
> file system is always remote. For such cases we need to document how to 
> relax data locality in CS; otherwise MR jobs suffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
