[ 
https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274007#comment-14274007
 ] 

M. C. Srivas commented on YARN-2791:
------------------------------------

The scope in https://issues.apache.org/jira/browse/YARN-2139 is just too 
bloated. We have this problem immediately with YARN overprovisioning since it 
doesn't take into account how performance is impacted by the number of disks on 
each node. We need this fix now, not later. YARN-2139 is too elaborate, and is 
trying to do too much. On the other hand, it doesn't take into account how 
running DataNodes on the same spindles will impact shuffle performance. I would 
say get this piece of work done now; YARN-2139 can follow whenever it is 
ready.

> Add Disk as a resource for scheduling
> -------------------------------------
>
>                 Key: YARN-2791
>                 URL: https://issues.apache.org/jira/browse/YARN-2791
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 2.5.1
>            Reporter: Swapnil Daingade
>            Assignee: Yuliya Feldman
>         Attachments: DiskDriveAsResourceInYARN.pdf
>
>
> Currently, the number of disks present on a node is not considered a factor 
> while scheduling containers on that node. Having a large amount of memory on 
> a node can lead to a high number of containers being launched on that node, 
> all of which compete for I/O bandwidth. This multiplexing of I/O across 
> containers can lead to slower overall progress and sub-optimal resource 
> utilization, as containers starved for I/O bandwidth hold on to other 
> resources like CPU and memory. This problem can be solved by considering disk 
> as a resource and including it in deciding how many containers can be 
> concurrently run on a node.
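
As a rough illustration of the description above (not taken from the attached 
design document or from YARN code), the sketch below caps the number of 
concurrent containers on a node by whichever resource runs out first. The 
Resources type, the max_containers helper, and all of the numbers are 
hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector; 'disks' is the dimension being proposed."""
    memory_mb: int
    vcores: int
    disks: int


def max_containers(node: Resources, per_container: Resources) -> int:
    """Return how many containers fit on the node at the same time.

    Each resource allows (capacity // demand) containers; the scarcest
    resource sets the limit. A demand of 0 means that resource is not
    considered at all, which models a scheduler that ignores it.
    """
    limits = []
    for name in ("memory_mb", "vcores", "disks"):
        demand = getattr(per_container, name)
        if demand > 0:
            limits.append(getattr(node, name) // demand)
    return min(limits) if limits else 0


# Assumed example node: 256 GB of memory, 32 vcores, 4 disks.
node = Resources(memory_mb=256 * 1024, vcores=32, disks=4)

# Disk ignored: memory and CPU alone allow 32 containers on 4 spindles.
ignoring_disk = max_containers(node, Resources(memory_mb=2048, vcores=1, disks=0))

# Disk counted, one disk per container: the node is capped at 4 containers.
counting_disk = max_containers(node, Resources(memory_mb=2048, vcores=1, disks=1))
```

With these assumed numbers, ignoring disk lets 32 containers share 4 spindles, 
while counting disk holds the node to 4 containers.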



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
