[ https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Kanter updated YARN-5683:
--------------------------------
    Labels: oct16-hard (was: )

> Support specifying storage type for per-application local dirs
> --------------------------------------------------------------
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Affects Versions: 3.0.0-alpha2
> Reporter: Tao Yang
> Assignee: Tao Yang
> Labels: oct16-hard
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch, flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>
> h3. Introduction
> * Some applications of various frameworks (Flink, Spark, MapReduce, etc.) that use local storage (for checkpoints, shuffle data, etc.) may require high I/O performance. On heterogeneous clusters, it is useful to place the local directories of these applications on high-performance storage media.
> * YARN does not distinguish between storage types, so applications cannot selectively use storage media with different performance characteristics. Making YARN aware of storage media would allow it to make better decisions about where to place local directories.
> h3. Approach
> * The NodeManager will distinguish storage types for local directories.
> ** The yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configurations should allow the cluster administrator to optionally specify the storage type of each local directory. Example: [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir).
> ** StorageType defines the DISK and SSD storage types, with DISK as the default.
> ** StorageLocation separates the storage type from the directory path; LocalDirAllocator uses it to be aware of the type of each local dir. Directories without an explicit type default to DISK.
> ** The getLocalPathForWrite method of LocalDirAllocator will prefer a local directory of the specified storage type, and will fall back to ignoring the storage type if that requirement cannot be satisfied.
> ** ContainerLaunch adds support for container-related local/log directories. Any application framework can set the environment variables LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE to specify the desired storage type of its local/log directories, and can set ENSURE_LOCAL_STORAGE_TYPE and ENSURE_LOG_STORAGE_TYPE to choose not to launch the container when a fallback would otherwise occur.
> * Allow various frameworks to specify the storage type (taking MapReduce as an example):
> ** Add new configurations that allow the application administrator to optionally specify the storage type of local/log directories and the fallback strategy (MapReduce configurations: mapreduce.job.local-storage-type, mapreduce.job.log-storage-type, mapreduce.job.ensure-local-storage-type and mapreduce.job.ensure-log-storage-type).
> ** Support container work directories: set the environment variables LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE, according to the configurations above, in the ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce should update YARNRunner and TaskAttemptImpl.)
> ** Add a storage type prefix to the request path to support other framework-specific local directories (such as the shuffle directories of MapReduce). (MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to support output/work directories.)
> ** Flow diagram for the MapReduce framework:
> !flow_diagram_for_MapReduce-2.png!
> h3. Further Discussion
> * The storage type requirement for local/log directories may not be satisfiable on heterogeneous clusters. To achieve a global optimum, the scheduler should be aware of and manage disk resources.
> [YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that, but it does not seem to support multiple storage types; perhaps more is needed so that the scheduler is aware of the storage type of disk resources?
> * Node labels or node constraints ([YARN-3409|https://issues.apache.org/jira/browse/YARN-3409]) can also increase the chance of satisfying the specified storage type requirement.
> * The fallback strategy still needs consideration. Certain applications might not work well when the storage type requirement is not satisfied. When no disk of the desired storage type is available, should the container launch fail, or should the AM handle it? We have implemented a fallback strategy that fails the container launch when no disk of the desired storage type is available. Are there better approaches?
> This feature has been used for half a year to meet the needs of some applications on Alibaba search clusters.
> Please feel free to share your suggestions and opinions.
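The NodeManager-side behavior proposed under Approach can be sketched in plain Java: parsing storage-type-prefixed entries such as [SSD]/disk1/nm-local-dir with DISK as the default, preferring a directory of the requested type, falling back to any directory, and declining when ENSURE_LOCAL_STORAGE_TYPE is set. This is a minimal sketch under stated assumptions, not code from the attached patches: StorageType and StorageLocation borrow their names from the description, but the members shown here are invented for illustration; StorageAwareDirSelector and its methods are hypothetical; the ENSURE_LOCAL_STORAGE_TYPE value is assumed to be "true"/"false"; and the selector simply takes the first matching directory, without the free-space checks a real LocalDirAllocator performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the storage-type handling proposed in YARN-5683; not the patch code.
enum StorageType {
    DISK, SSD;

    // Entries without a [TYPE] prefix are treated as DISK.
    static final StorageType DEFAULT = DISK;
}

// Pairs a storage type with a directory path, e.g. "[SSD]/disk1/nm-local-dir".
final class StorageLocation {
    final StorageType type;
    final String path;

    StorageLocation(StorageType type, String path) {
        this.type = type;
        this.path = path;
    }

    // Parses one entry; an unknown type name makes valueOf throw IllegalArgumentException.
    static StorageLocation parse(String entry) {
        String s = entry.trim();
        if (s.startsWith("[")) {
            int close = s.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated storage type: " + entry);
            }
            String name = s.substring(1, close).trim().toUpperCase(Locale.ROOT);
            return new StorageLocation(StorageType.valueOf(name), s.substring(close + 1).trim());
        }
        return new StorageLocation(StorageType.DEFAULT, s);
    }

    // Parses a comma-separated value such as the yarn.nodemanager.local-dirs example.
    static List<StorageLocation> parseAll(String value) {
        List<StorageLocation> dirs = new ArrayList<>();
        for (String entry : value.split(",")) {
            if (!entry.trim().isEmpty()) {
                dirs.add(parse(entry));
            }
        }
        return dirs;
    }
}

// Hypothetical helper: prefer the requested type, else fall back unless the type is enforced.
final class StorageAwareDirSelector {
    static Optional<StorageLocation> select(List<StorageLocation> dirs,
                                            StorageType preferred,
                                            boolean ensureType) {
        for (StorageLocation dir : dirs) {
            if (dir.type == preferred) {
                return Optional.of(dir); // first directory of the preferred type
            }
        }
        if (ensureType || dirs.isEmpty()) {
            return Optional.empty(); // caller should refuse to launch the container
        }
        return Optional.of(dirs.get(0)); // fallback: ignore the storage type
    }

    // Reads the env vars named in the proposal; the value formats are assumptions.
    static Optional<StorageLocation> selectForContainer(List<StorageLocation> dirs,
                                                        Map<String, String> env) {
        String requested = env.get("LOCAL_STORAGE_TYPE");
        if (requested == null) {
            // No preference requested: any directory will do.
            return dirs.isEmpty() ? Optional.empty() : Optional.of(dirs.get(0));
        }
        StorageType preferred = StorageType.valueOf(requested.trim().toUpperCase(Locale.ROOT));
        // Boolean.parseBoolean(null) is false, so a missing variable allows fallback.
        boolean ensure = Boolean.parseBoolean(env.get("ENSURE_LOCAL_STORAGE_TYPE"));
        return select(dirs, preferred, ensure);
    }
}
```

Returning Optional.empty() rather than throwing leaves the fail-or-fallback decision to the caller (ContainerLaunch in the proposal), which keeps the open question from Further Discussion, failing the launch versus letting the AM handle it, outside the selection logic itself.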