[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506651#comment-16506651 ]
Konstantinos Karanasos commented on YARN-8394:
----------------------------------------------

Looks good to me. A couple of things to fix before committing:
* that tries best efforts to honor task locality constraint -> to honor task locality constraints
* losing the locality constraint -> relaxing the locality constraint
* when additional is -1, you can say that it is calculated based on the formula L * C / N, capped by the cluster size, where L is the number of locations (nodes or racks) specified in the resource request, C is the number of requested containers, and N is the size of the cluster.

> Improve data locality documentation for Capacity Scheduler
> ----------------------------------------------------------
>
>                 Key: YARN-8394
>                 URL: https://issues.apache.org/jira/browse/YARN-8394
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Major
>         Attachments: YARN-8394.001.patch
>
>
> YARN-6344 introduces a new parameter
> {{yarn.scheduler.capacity.rack-locality-additional-delay}} in
> capacity-scheduler.xml, so we need to add documentation for it in
> {{CapacityScheduler.md}} accordingly.
> Moreover, we are seeing more and more clusters that separate storage from
> computation, where the file system is always remote. In such cases we need to
> explain how to relax data locality in CS; otherwise MR jobs suffer.
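The off-switch threshold described in the review comment can be sketched in Java, the language Hadoop itself is written in. This is a minimal illustration under stated assumptions, not the CapacityScheduler implementation: the class name OffSwitchDelaySketch and the method offSwitchThreshold are hypothetical. The branch for an explicit (non -1) value assumes the additional delay is simply added on top of yarn.scheduler.capacity.node-locality-delay. The comment does not spell that case out.

```java
// Hypothetical sketch of the off-switch delay threshold discussed in YARN-8394.
// Not the actual CapacityScheduler code; names are illustrative only.
public final class OffSwitchDelaySketch {

    /**
     * Returns the number of missed scheduling opportunities after which an
     * off-switch assignment would be allowed.
     *
     * @param nodeLocalityDelay value of yarn.scheduler.capacity.node-locality-delay
     * @param additionalDelay   value of yarn.scheduler.capacity.rack-locality-additional-delay
     * @param locations         L: number of locations (nodes or racks) in the request
     * @param containers        C: number of requested containers
     * @param clusterSize       N: number of nodes in the cluster
     */
    public static double offSwitchThreshold(int nodeLocalityDelay, int additionalDelay,
                                            int locations, int containers, int clusterSize) {
        if (clusterSize <= 0) {
            throw new IllegalArgumentException("cluster size must be positive");
        }
        if (additionalDelay == -1) {
            // Default behaviour per the review comment: L * C / N, capped by N.
            double estimate = (double) locations * containers / clusterSize;
            return Math.min(estimate, clusterSize);
        }
        // Assumption: an explicit value is added on top of node-locality-delay.
        return nodeLocalityDelay + additionalDelay;
    }

    public static void main(String[] args) {
        // L = 10, C = 50, N = 100: 10 * 50 / 100 = 5.0
        System.out.println(offSwitchThreshold(40, -1, 10, 50, 100));
        // L = 80, C = 500, N = 100: 400.0 exceeds N, so it is capped to 100.0
        System.out.println(offSwitchThreshold(40, -1, 80, 500, 100));
        // Explicit additional delay (assumed additive): 40 + 20 = 60.0
        System.out.println(offSwitchThreshold(40, 20, 10, 50, 100));
    }
}
```

Running main prints 5.0, 100.0 and 60.0. The second call shows why the cap matters: without it, a large request on a small cluster would wait for more missed opportunities than there are nodes to offer them.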