[ https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809885#comment-13809885 ]
Chris Douglas commented on YARN-1324:
-------------------------------------

bq. When does MR use multiple disks in the same task/container? Isn't the map output written to a single indexed partition file?

Spills are spread across all volumes, but merged into a single file at the end.

Would randomizing the order of disks be a reasonable short-term workaround for (1)? Future changes could weight/elide directories based on other criteria, but that's a simple change. Another would be changing the "random" selection to bias its search order using a hash of the task id (instead of disk usage when creating the spill), so the ShuffleHandler could search fewer directories on average. I agree with Vinod, it would be hard to prevent the search altogether...

bq. Requiring apps to specify the number of disks for a container is also a viable solution and can be done in a back-compatible manner by changing MR to specify multiple disks and leaving the default to 1 for apps that don't care.

This makes sense as a hint, but some users might interpret it as a constraint and be confused when a NM schedules them on a node that reports fewer local dirs (due to failure, heterogeneous config).

> NodeManager potentially causes unnecessary operations on all its disks
> ----------------------------------------------------------------------
>
>                 Key: YARN-1324
>                 URL: https://issues.apache.org/jira/browse/YARN-1324
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Bikas Saha
>
> Currently, for every container, the NM creates a directory on every disk and
> expects the container-task to choose 1 of them and load balance the use of
> the disks across all containers.
> 1) This may have worked fine in the MR world where MR tasks would randomly
> choose dirs but in general we cannot expect every app/task writer to
> understand these nuances and randomly pick disks. So we could end up
> overloading the first disk if most people decide to use the first disk.
> 2) This forces a number of NM operations to scan every disk (thus randomizing
> I/O on that disk) to locate the dir which the task has actually chosen to use
> for its files. This makes all these operations expensive for the NM as well as
> disruptive for users of disks that did not have the real task working dirs.
> I propose that the NM should decide up-front which disk it assigns to each task.
> It could choose to do so randomly or weighted-randomly by looking at space
> and load on each disk. So it could do a better job of load balancing. Then,
> it would associate the chosen working directory with the container context so
> that subsequent operations on the NM can directly seek to the correct
> location instead of having to seek on every disk.
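
For illustration only, the two selection strategies discussed in this thread could be sketched roughly as below. This is not NodeManager or MapReduce code: the class `LocalDirChooser` and its methods `pickWeighted` and `searchOrder` are hypothetical names, free space is the only weight considered (the proposal also mentions load), and the task id is assumed to be available as a plain string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class LocalDirChooser {

    /**
     * Weighted-random choice: returns an index into freeBytes, picked with
     * probability proportional to the free space reported for that dir.
     * Negative entries are treated as zero weight.
     */
    static int pickWeighted(long[] freeBytes, Random rng) {
        long total = 0;
        int lastUsable = -1;
        for (int i = 0; i < freeBytes.length; i++) {
            long w = Math.max(freeBytes[i], 0L);
            total += w;
            if (w > 0) {
                lastUsable = i;
            }
        }
        if (lastUsable < 0) {
            throw new IllegalStateException("no local dir has free space");
        }
        // Draw a point in [0, total) and walk the cumulative weights.
        long point = (long) (rng.nextDouble() * total);
        for (int i = 0; i < freeBytes.length; i++) {
            long w = Math.max(freeBytes[i], 0L);
            if (point < w) {
                return i;
            }
            point -= w;
        }
        // Only reachable through floating-point rounding on very large totals.
        return lastUsable;
    }

    /**
     * Hash-biased search order: a rotation of 0..numDirs-1 whose starting
     * index is derived from the task id. A writer that tries dirs in this
     * order and a reader that searches in the same order will usually meet
     * at the first or second entry instead of scanning every disk.
     */
    static List<Integer> searchOrder(String taskId, int numDirs) {
        if (numDirs <= 0) {
            throw new IllegalArgumentException("numDirs must be positive");
        }
        int start = Math.floorMod(taskId.hashCode(), numDirs);
        List<Integer> order = new ArrayList<>(numDirs);
        for (int i = 0; i < numDirs; i++) {
            order.add((start + i) % numDirs);
        }
        return order;
    }
}
```

In this sketch the NM-side proposal corresponds to calling `pickWeighted` once per container and recording the result in the container context, while the hash-biased order is the lighter workaround: it needs no extra state, but the reader may still have to probe more than one directory when the first choice was full at write time.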