[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5683:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: oct16-hard
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch, 
> flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>
> h3.  Introduction
> * Some applications of various frameworks (Flink, Spark, MapReduce, etc.) 
> that use local storage (checkpoint, shuffle, etc.) might require high I/O 
> performance. On heterogeneous clusters it is useful to place the local 
> directories of these applications on high-performance storage media.
> * YARN does not distinguish between storage types, so applications 
> cannot selectively use storage media with different performance 
> characteristics. Making YARN aware of storage media allows it to make 
> better decisions about the placement of local directories.
> h3.  Approach
> * NodeManager will distinguish storage types for local directories.
> ** The yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs 
> configurations should allow the cluster administrator to optionally specify 
> the storage type of each local directory. Example: 
> [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to 
> [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
> ** StorageType defines the DISK and SSD storage types, with DISK as the 
> default. 
> ** StorageLocation separates the storage type from the directory path and is 
> used by LocalDirAllocator to learn the storage type of each local dir; a 
> directory without a prefix defaults to DISK.
> ** The getLocalPathForWrite method of LocalDirAllocator will prefer a local 
> directory of the specified storage type, and will fall back to ignoring the 
> storage type if the requirement cannot be satisfied.
> ** Support for container-related local/log directories in ContainerLaunch. 
> Any application framework can set environment variables 
> (LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE) to specify the desired storage 
> type of local/log directories, and can choose not to launch the container 
> when a fallback would be needed by setting further environment variables 
> (ENSURE_LOCAL_STORAGE_TYPE and ENSURE_LOG_STORAGE_TYPE).
> * Allow frameworks to specify the storage type (taking MapReduce as an 
> example)
> ** Add new configurations that allow the application administrator to 
> optionally specify the storage type of local/log directories and the fallback 
> strategy (MapReduce configurations: mapreduce.job.local-storage-type, 
> mapreduce.job.log-storage-type, mapreduce.job.ensure-local-storage-type and 
> mapreduce.job.ensure-log-storage-type).
> ** Support for container work directories. Set environment variables, 
> including LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE, according to the 
> configurations above in ContainerLaunchContext and 
> ApplicationSubmissionContext. (MapReduce should update YARNRunner and 
> TaskAttemptImpl.)
> ** Add a storage type prefix to the requested path to support the other local 
> directories of frameworks (such as shuffle directories for MapReduce). 
> (MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
> support output/work directories.)
> ** Flow diagram for MapReduce framework
> !flow_diagram_for_MapReduce-2.png!
> h3.  Further Discussion
> * Scheduling: The storage type requirement for local/log directories may 
> not be satisfiable on some nodes of a heterogeneous cluster. To achieve a 
> global optimum, the scheduler should be aware of and manage disk resources. 
> ** Approach-1: Based on node attributes (YARN-3409), the scheduler can 
> allocate containers that require SSD on nodes with attribute:ssd=true.
> ** Approach-2: Based on the extended resource model (YARN-3926), it is easy 
> to support scheduling by extending the resource model with types like vdisk 
> and vssd using this feature, but such resources are hard to measure for 
> applications and hard to isolate on non-CFQ based disks.
> * The fallback strategy still needs consideration. Certain applications might 
> not work well when the storage type requirement is not satisfied. When 
> no disk of the desired storage type is available, should the container launch 
> fail, or should the AM handle it? We have implemented a fallback strategy that 
> fails the container launch when no disk of the desired storage type is 
> available. Is there a better method? 
> This feature has been used for half a year to meet 

[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-5683:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: oct16-hard
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch, 
> flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>

[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2018-11-16 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-5683:
-
Target Version/s: 3.3.0  (was: 3.2.0)

Bulk update: moved all 3.2.0 non-blocker issues, please move back if it is a 
blocker.

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: oct16-hard
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch, 
> flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>

[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2017-08-16 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Description: 
h3.  Introduction
* Some applications of various frameworks (Flink, Spark, MapReduce, etc.) 
that use local storage (checkpoint, shuffle, etc.) might require high I/O 
performance. On heterogeneous clusters it is useful to place the local 
directories of these applications on high-performance storage media.
* YARN does not distinguish between storage types, so applications 
cannot selectively use storage media with different performance 
characteristics. Making YARN aware of storage media allows it to make 
better decisions about the placement of local directories.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** The yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs 
configurations should allow the cluster administrator to optionally specify 
the storage type of each local directory. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines the DISK and SSD storage types, with DISK as the 
default. 
** StorageLocation separates the storage type from the directory path and is 
used by LocalDirAllocator to learn the storage type of each local dir; a 
directory without a prefix defaults to DISK.
** The getLocalPathForWrite method of LocalDirAllocator will prefer a local 
directory of the specified storage type, and will fall back to ignoring the 
storage type if the requirement cannot be satisfied.
** Support for container-related local/log directories in ContainerLaunch. 
Any application framework can set environment variables (LOCAL_STORAGE_TYPE 
and LOG_STORAGE_TYPE) to specify the desired storage type of local/log 
directories, and can choose not to launch the container when a fallback would 
be needed by setting further environment variables 
(ENSURE_LOCAL_STORAGE_TYPE and ENSURE_LOG_STORAGE_TYPE).
* Allow frameworks to specify the storage type (taking MapReduce as an 
example)
** Add new configurations that allow the application administrator to 
optionally specify the storage type of local/log directories and the fallback 
strategy (MapReduce configurations: mapreduce.job.local-storage-type, 
mapreduce.job.log-storage-type, mapreduce.job.ensure-local-storage-type and 
mapreduce.job.ensure-log-storage-type).
** Support for container work directories. Set environment variables, 
including LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE, according to the 
configurations above in ContainerLaunchContext and 
ApplicationSubmissionContext. (MapReduce should update YARNRunner and 
TaskAttemptImpl.)
** Add a storage type prefix to the requested path to support the other local 
directories of frameworks (such as shuffle directories for MapReduce). 
(MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
support output/work directories.)
** Flow diagram for MapReduce framework
!flow_diagram_for_MapReduce-2.png!
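
The directory syntax described in the Approach list can be illustrated with a minimal, self-contained sketch. This is not the code from the attached patches: the class StorageDirSketch and its parse/parseAll helpers are hypothetical, and only the names StorageType, StorageLocation, the [SSD]/[DISK] prefixes and the DISK default are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of parsing "[TYPE]/path" entries; not the patch's classes. */
final class StorageDirSketch {
    /** Storage types named in the description; DISK is the default. */
    enum StorageType { DISK, SSD }

    /** Separates the storage type from the directory path. */
    static final class StorageLocation {
        final StorageType type;
        final String path;

        StorageLocation(StorageType type, String path) {
            this.type = type;
            this.path = path;
        }

        @Override
        public String toString() {
            return "[" + type + "]" + path;
        }
    }

    /** Parses one entry such as "[SSD]/disk1/nm-local-dir" or "/disk2/nm-local-dir". */
    static StorageLocation parse(String entry) {
        String s = entry.trim();
        if (s.startsWith("[")) {
            int close = s.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated storage type prefix: " + entry);
            }
            String name = s.substring(1, close).trim().toUpperCase(Locale.ROOT);
            return new StorageLocation(StorageType.valueOf(name), s.substring(close + 1).trim());
        }
        // No prefix: fall back to the default storage type.
        return new StorageLocation(StorageType.DISK, s);
    }

    /** Parses a comma-separated value like the one given for yarn.nodemanager.local-dirs. */
    static List<StorageLocation> parseAll(String configValue) {
        List<StorageLocation> result = new ArrayList<>();
        for (String entry : configValue.split(",")) {
            if (!entry.trim().isEmpty()) {
                result.add(parse(entry));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [[SSD]/disk1/nm-local-dir, [DISK]/disk2/nm-local-dir, [DISK]/disk3/nm-local-dir]
        System.out.println(parseAll(
            "[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir"));
    }
}
```

The sketch rejects an unknown type name with an IllegalArgumentException (from StorageType.valueOf); whether the real NodeManager should reject or ignore such entries is not stated in the description.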

h3.  Further Discussion
* Scheduling: The storage type requirement for local/log directories may 
not be satisfiable on some nodes of a heterogeneous cluster. To achieve a 
global optimum, the scheduler should be aware of and manage disk resources. 
** Approach-1: Based on node attributes (YARN-3409), the scheduler can 
allocate containers that require SSD on nodes with attribute:ssd=true.
** Approach-2: Based on the extended resource model (YARN-3926), it is easy 
to support scheduling by extending the resource model with types like vdisk 
and vssd using this feature, but such resources are hard to measure for 
applications and hard to isolate on non-CFQ based disks.
* The fallback strategy still needs consideration. Certain applications might 
not work well when the storage type requirement is not satisfied. When no disk 
of the desired storage type is available, should the container launch fail, or 
should the AM handle it? We have implemented a fallback strategy that fails the 
container launch when no disk of the desired storage type is available. Is 
there a better method? 
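
To make the two fallback behaviours concrete, here is a small sketch under stated assumptions. The class StorageFallbackSketch, its LocalDir type, the usable flag and the choose method are all hypothetical; the real LocalDirAllocator selects among directories by its own policy, whereas this sketch simply takes the first usable match. The ensure parameter stands in for the ENSURE_LOCAL_STORAGE_TYPE / ENSURE_LOG_STORAGE_TYPE behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of "prefer the desired type, optionally fall back". */
final class StorageFallbackSketch {
    enum StorageType { DISK, SSD }

    /** A local directory with its storage type and a health/space flag (assumed). */
    static final class LocalDir {
        final StorageType type;
        final String path;
        final boolean usable;

        LocalDir(StorageType type, String path, boolean usable) {
            this.type = type;
            this.path = path;
            this.usable = usable;
        }
    }

    /**
     * Returns a usable directory of the desired type if one exists.
     * Otherwise, with ensure == false, returns any usable directory;
     * with ensure == true, returns empty so the caller can fail the launch.
     */
    static Optional<LocalDir> choose(List<LocalDir> dirs, StorageType desired, boolean ensure) {
        for (LocalDir d : dirs) {
            if (d.usable && d.type == desired) {
                return Optional.of(d);
            }
        }
        if (ensure) {
            return Optional.empty();
        }
        for (LocalDir d : dirs) {
            if (d.usable) {
                return Optional.of(d);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<LocalDir> dirs = Arrays.asList(
            new LocalDir(StorageType.SSD, "/disk1/nm-local-dir", false),  // SSD is full or bad
            new LocalDir(StorageType.DISK, "/disk2/nm-local-dir", true));

        // Lenient mode falls back to the DISK directory.
        System.out.println(choose(dirs, StorageType.SSD, false).map(d -> d.path).orElse("none"));
        // Strict mode reports that nothing suitable is available.
        System.out.println(choose(dirs, StorageType.SSD, true).map(d -> d.path).orElse("none"));
    }
}
```

Returning an empty Optional leaves the decision to the caller, which is one way to keep open the question of whether the NodeManager fails the launch or the AM handles it.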

This feature has been used for half a year to meet the needs of some 
applications on Alibaba search clusters.
Please feel free to give your suggestions and opinions.

  was:
h3.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local directories.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** 

[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-10-27 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-5683:

Labels: oct16-hard  (was: )

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>  Labels: oct16-hard
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch, 
> flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>

[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-10-12 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Attachment: YARN-5683-3.patch

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch, 
> flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>




[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-10-11 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Description: 
h3.  Introduction
* Some applications of various frameworks (Flink, Spark, MapReduce, etc.) 
that use local storage (checkpoint, shuffle, etc.) might require high I/O 
performance. On heterogeneous clusters it is useful to place the local 
directories of these applications on high-performance storage media.
* YARN does not distinguish between storage types, so applications 
cannot selectively use storage media with different performance 
characteristics. Making YARN aware of storage media allows it to make 
better decisions about the placement of local directories.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** The yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs 
configurations should allow the cluster administrator to optionally specify 
the storage type of each local directory. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines the DISK and SSD storage types, with DISK as the 
default. 
** StorageLocation separates the storage type from the directory path and is 
used by LocalDirAllocator to learn the storage type of each local dir; a 
directory without a prefix defaults to DISK.
** The getLocalPathForWrite method of LocalDirAllocator will prefer a local 
directory of the specified storage type, and will fall back to ignoring the 
storage type if the requirement cannot be satisfied.
** Support for container-related local/log directories in ContainerLaunch. 
Any application framework can set environment variables (LOCAL_STORAGE_TYPE 
and LOG_STORAGE_TYPE) to specify the desired storage type of local/log 
directories, and can choose not to launch the container when a fallback would 
be needed by setting further environment variables 
(ENSURE_LOCAL_STORAGE_TYPE and ENSURE_LOG_STORAGE_TYPE).
* Allow frameworks to specify the storage type (taking MapReduce as an 
example)
** Add new configurations that allow the application administrator to 
optionally specify the storage type of local/log directories and the fallback 
strategy (MapReduce configurations: mapreduce.job.local-storage-type, 
mapreduce.job.log-storage-type, mapreduce.job.ensure-local-storage-type and 
mapreduce.job.ensure-log-storage-type).
** Support for container work directories. Set environment variables, 
including LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE, according to the 
configurations above in ContainerLaunchContext and 
ApplicationSubmissionContext. (MapReduce should update YARNRunner and 
TaskAttemptImpl.)
** Add a storage type prefix to the requested path to support the other local 
directories of frameworks (such as shuffle directories for MapReduce). 
(MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
support output/work directories.)
** Flow diagram for MapReduce framework
!flow_diagram_for_MapReduce-2.png!
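
As a rough illustration of the directory syntax described in the Approach above, the following Java sketch parses a value such as [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir into (storage type, path) pairs and defaults to DISK when no prefix is given. The names StorageDirParser, Location and parse are hypothetical and are not taken from the attached patches; the actual StorageType and StorageLocation classes may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch only; not the classes from the YARN-5683 patches. */
class StorageDirParser {
    /** The two storage types named in the proposal; DISK is the default. */
    enum StorageType { DISK, SSD }

    /** A directory path paired with its storage type. */
    static final class Location {
        final StorageType type;
        final String path;

        Location(StorageType type, String path) {
            this.type = type;
            this.path = path;
        }

        @Override
        public String toString() {
            return "[" + type + "]" + path;
        }
    }

    /**
     * Parses a comma-separated list of directories, each optionally
     * prefixed with "[TYPE]". Entries without a prefix are treated as DISK.
     */
    static List<Location> parse(String value) {
        List<Location> result = new ArrayList<>();
        for (String raw : value.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            StorageType type = StorageType.DISK;
            String path = entry;
            if (entry.startsWith("[")) {
                int close = entry.indexOf(']');
                if (close < 0) {
                    throw new IllegalArgumentException("Missing ']' in: " + entry);
                }
                String name = entry.substring(1, close).trim();
                type = StorageType.valueOf(name.toUpperCase(Locale.ROOT));
                path = entry.substring(close + 1).trim();
            }
            result.add(new Location(type, path));
        }
        return result;
    }
}
```

In this sketch an unknown type name (for example [TAPE]) is rejected with an IllegalArgumentException from StorageType.valueOf; whether the real implementation rejects or ignores unknown types is not stated in this issue.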

h3.  Further Discussion
* The storage type requirement for local/log directories may not be 
satisfied on heterogeneous clusters. To achieve a global optimum, the 
scheduler should be aware of and manage disk resources. 
[YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that 
but does not seem to support multiple storage types; maybe we should do even 
more to make it aware of the storage type of disk resources?
* Node labels or node constraints 
([YARN-3409|https://issues.apache.org/jira/browse/YARN-3409]) can also increase 
the chance of satisfying the specified storage type requirement.
* The fallback strategy still needs consideration. Certain applications might 
not work well when the storage type requirement is not satisfied. When no disk 
of the desired storage type is available, should the container launch fail, or 
should the AM handle it? We have implemented a fallback strategy that fails 
the container launch when no disk of the desired storage type is available. 
Are there better approaches? 
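
The fallback behaviour discussed above can be sketched roughly as follows: prefer a usable directory of the desired storage type, fall back to any usable directory, and refuse to fall back when an "ensure" flag is set (mirroring the intent of ENSURE_LOCAL_STORAGE_TYPE and ENSURE_LOG_STORAGE_TYPE). This is a simplified sketch under stated assumptions. The names StorageFallbackSketch, Dir and choose are hypothetical, and picking the first usable directory is a placeholder that does not model free-space checks or whatever selection policy the real LocalDirAllocator applies.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch only; not the logic from the YARN-5683 patches. */
class StorageFallbackSketch {
    enum StorageType { DISK, SSD }

    /** A local directory with its storage type and a health flag. */
    static final class Dir {
        final StorageType type;
        final String path;
        final boolean usable;

        Dir(StorageType type, String path, boolean usable) {
            this.type = type;
            this.path = path;
            this.usable = usable;
        }
    }

    /**
     * Returns a directory path for the wanted storage type.
     * If none is usable and ensure is false, falls back to any usable
     * directory. If ensure is true, returns empty instead of falling back,
     * which the caller would treat as "do not launch the container".
     */
    static Optional<String> choose(List<Dir> dirs, StorageType wanted, boolean ensure) {
        for (Dir d : dirs) {
            if (d.usable && d.type == wanted) {
                return Optional.of(d.path);
            }
        }
        if (ensure) {
            return Optional.empty(); // strict mode: no fallback allowed
        }
        for (Dir d : dirs) {
            if (d.usable) {
                return Optional.of(d.path); // fallback: ignore storage type
            }
        }
        return Optional.empty(); // no usable directory at all
    }
}
```

Returning an empty Optional rather than throwing leaves the decision to the caller, which corresponds to the open question of whether the NodeManager should fail the launch or let the AM handle it.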

This feature has been used for half a year to meet the needs of some 
applications on Alibaba search clusters.
Please feel free to give your suggestions and opinions.

  was:
h3.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local directories.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 

[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-10-11 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Attachment: YARN-5683-2.patch
flow_diagram_for_MapReduce-2.png

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, 
> flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>
> h3.  Introduction
> * Some applications of various frameworks (Flink, Spark and MapReduce etc) 
> using local storage (checkpoint, shuffle etc) might require high IO 
> performance. It's useful to allocate local directories to high performance 
> storage media for these applications on heterogeneous clusters.
> * YARN does not distinguish different storage types and hence applications 
> cannot selectively use storage media with different performance 
> characteristics. Adding awareness of storage media can allow YARN to make 
> better decisions about the placement of local directories.
> h3.  Approach
> * NodeManager will distinguish storage types for local directories.
> ** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
> should allow the cluster administrator to optionally specify the storage type 
> for each local directories. Example: 
> [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equals to 
> [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
> ** StorageType defines DISK/SSD storage types and takes DISK as the default 
> storage type. 
> ** StorageLocation separates storage type and directory path, used by 
> LocalDirAllocator to aware the types of local dirs, the default storage type 
> is DISK.
> ** getLocalPathForWrite method of LocalDirAllcator will prefer to choose the 
> local directory of the specified storage type, and will fallback to not care 
> storage type if the requirement can not be satisfied.
> ** Support for container related local/log directories by ContainerLaunch. 
> All application frameworks can set the environment variables 
> (LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE) to specified the desired storage 
> type of local/log directories.
> * Allow specified storage type for various frameworks (Take MapReduce as an 
> example)
> ** Add new configurations should allow application administrator to 
> optionally specify the storage type of local/log directories. (MapReduce add 
> configurations: mapreduce.job.local-storage-type and 
> mapreduce.job.log-storage-type)
> ** Support for container work directories. Set the environment variables 
> includes LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to configurations 
> above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
> should update YARNRunner and TaskAttemptImpl)
> ** Add storage type prefix for request path to support for other local 
> directories of frameworks (such as shuffle directories for MapReduce). 
> (MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
> support for output/work directories)
> ** Flow diagram for MapReduce framework
> !flow_diagram_for_MapReduce.png!
> h3.  Further Discussion
> * The requirement of storage type for local/log directories may not be 
> satisfied on heterogeneous clusters. To achieve global optimum, scheduler 
> should aware and manage disk resources. 
> [YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that 
> but seems not support multiple storage types, maybe we should do even more to 
> aware the storage type of disk resource?
> * Node labels or node constraints 
> ([YARN-3409|https://issues.apache.org/jira/browse/YARN-3409]) can also make a 
> higher chance to satisfy the requirement of specified storage type.
> * Fallback strategy still needs to be concerned. Certain applications might 
> not work well when the requirement of storage type is not satisfied. When 
> none of desired storage type disk are available, should container launching 
> be failed? let AM handle?
> This feature has been used for half a year to meet the needs of some 
> applications on Alibaba search clusters.
> Please feel free to give your suggestions and opinions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-10-09 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-5683:
-
Target Version/s: 3.0.0-alpha2
   Fix Version/s: (was: 3.0.0-alpha2)

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-5683-1.patch, flow_diagram_for_MapReduce.png
>
>






[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-10-09 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-5683:
-
Assignee: Tao Yang

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-5683-1.patch, flow_diagram_for_MapReduce.png
>
>






[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-09-29 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Description: 
h3.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local directories.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
should allow the cluster administrator to optionally specify the storage type 
for each local directories. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equals to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines DISK/SSD storage types and takes DISK as the default 
storage type. 
** StorageLocation separates storage type and directory path, used by 
LocalDirAllocator to aware the types of local dirs, the default storage type is 
DISK.
** getLocalPathForWrite method of LocalDirAllcator will prefer to choose the 
local directory of the specified storage type, and will fallback to not care 
storage type if the requirement can not be satisfied.
** Support for container related local/log directories by ContainerLaunch. All 
application frameworks can set the environment variables (LOCAL_STORAGE_TYPE 
and LOG_STORAGE_TYPE) to specified the desired storage type of local/log 
directories.
* Allow specified storage type for various frameworks (Take MapReduce as an 
example)
** Add new configurations should allow application administrator to optionally 
specify the storage type of local/log directories. (MapReduce add 
configurations: mapreduce.job.local-storage-type and 
mapreduce.job.log-storage-type)
** Support for container work directories. Set the environment variables 
includes LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to configurations 
above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
should update YARNRunner and TaskAttemptImpl)
** Add storage type prefix for request path to support for other local 
directories of frameworks (such as shuffle directories for MapReduce). 
(MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
support for output/work directories)
** Flow diagram for MapReduce framework
!flow_diagram_for_MapReduce.png!

h3.  Further Discussion
* The requirement of storage type for local/log directories may not be 
satisfied on heterogeneous clusters. To achieve global optimum, scheduler 
should aware and manage disk resources. 
[YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that 
but seems not support multiple storage types, maybe we should do even more to 
aware the storage type of disk resource?
* Node labels or node constraints 
([YARN-3409|https://issues.apache.org/jira/browse/YARN-3409]) can also make a 
higher chance to satisfy the requirement of specified storage type.
* Fallback strategy still needs to be concerned. Certain applications might not 
work well when the requirement of storage type is not satisfied. When none of 
desired storage type disk are available, should container launching be failed? 
let AM handle?

This feature has been used for half a year to meet the needs of some 
applications on Alibaba search clusters.
Please feel free to give your suggestions and opinions.

  was:
h3.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local directories.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
should allow the cluster administrator to optionally specify the storage type 
for each local directories. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equals to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines DISK/SSD storage types and takes DISK as the default 
storage type. 
** StorageLocation separates 

[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-09-28 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Description: 
h3.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local directories.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
should allow the cluster administrator to optionally specify the storage type 
for each local directories. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equals to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines DISK/SSD storage types and takes DISK as the default 
storage type. 
** StorageLocation separates storage type and directory path, used by 
LocalDirAllocator to aware the types of local dirs, the default storage type is 
DISK.
** getLocalPathForWrite method of LocalDirAllcator will prefer to choose the 
local directory of the specified storage type, and will fallback to not care 
storage type if the requirement can not be satisfied.
** Support for container related local/log directories by ContainerLaunch. All 
application frameworks can set the environment variables (LOCAL_STORAGE_TYPE 
and LOG_STORAGE_TYPE) to specified the desired storage type of local/log 
directories.
* Allow specified storage type for various frameworks (Take MapReduce as an 
example)
** Add new configurations should allow application administrator to optionally 
specify the storage type of local/log directories. (MapReduce add 
configurations: mapreduce.job.local-storage-type and 
mapreduce.job.log-storage-type)
** Support for container work directories. Set the environment variables 
includes LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to configurations 
above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
should update YARNRunner and TaskAttemptImpl)
** Add storage type prefix for request path to support for other local 
directories of frameworks (such as shuffle directories for MapReduce). 
(MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
support for output/work directories)
** Flow diagram for MapReduce framework
!flow_diagram_for_MapReduce.png!

h3.  Further Discussion
* The requirement of storage type for local/log directories may not be 
satisfied on heterogeneous clusters. To achieve global optimum, scheduler 
should aware and manage disk resources. 
[YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that 
but seems not support multiple storage types, maybe we should do even more to 
aware the storage type of disk resource?
* Node labels or node constraints 
([YARN-3409|https://issues.apache.org/jira/browse/YARN-3409]) can also make a 
higher chance to satisfy the requirement of specified storage type.
* Fallback strategy still needs to be concerned. Certain applications might not 
work well when the requirement of storage type is not satisfied. When none of 
desired storage type disk are available, should container launching be failed? 
let AM handle?


  was:
h3.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local directories.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
should allow the cluster administrator to optionally specify the storage type 
for each local directories. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equals to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines DISK/SSD storage types and takes DISK as the default 
storage type. 
** StorageLocation separates storage type and directory path, used by 
LocalDirAllocator to aware the types of local dirs, the default storage type is 
DISK.
** getLocalPathForWrite method of 

[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-09-28 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Description: 
h3.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local directories.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
should allow the cluster administrator to optionally specify the storage type 
for each local directories. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equals to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines DISK/SSD storage types and takes DISK as the default 
storage type. 
** StorageLocation separates storage type and directory path, used by 
LocalDirAllocator to aware the types of local dirs, the default storage type is 
DISK.
** getLocalPathForWrite method of LocalDirAllcator will prefer to choose the 
local directory of the specified storage type, and will fallback to not care 
storage type if the requirement can not be satisfied.
** Support for container related local/log directories by ContainerLaunch. All 
application frameworks can set the environment variables (LOCAL_STORAGE_TYPE 
and LOG_STORAGE_TYPE) to specified the desired storage type of local/log 
directories.
* Allow specified storage type for various frameworks (Take MapReduce as an 
example)
** Add new configurations should allow application administrator to optionally 
specify the storage type of local/log directories. (MapReduce add 
configurations: mapreduce.job.local-storage-type and 
mapreduce.job.log-storage-type)
** Support for container work directories. Set the environment variables 
includes LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to configurations 
above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
should update YARNRunner and TaskAttemptImpl)
** Add storage type prefix for request path to support for other local 
directories of frameworks (such as shuffle directories for MapReduce). 
(MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
support for output/work directories)
** Flow diagram for MapReduce framework
!flow_diagram_for_MapReduce.png!

h3.  Further Discussion
* The requirement of storage type for local/log directories may not be 
satisfied on heterogeneous clusters. To achieve global optimum, scheduler 
should aware and management disk resources to. 
[YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that 
but seems not support multiple storage types, maybe we should do even more to 
aware the storage type of disk resource?
* Node labels or node constraints can also make a higher chance to satisfy the 
requirement of specified storage type.
* Fallback strategy still needs to be concerned. Certain applications might not 
work well when the requirement of storage type is not satisfied. When none of 
desired storage type disk are available, should container launching be failed? 
let AM handle?


  was:
h3.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local/log directories with input from 
applications. An application can choose the desired storage media by 
configuration based on its performance and requirements.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
should allow the cluster administrator to optionally specify the storage type 
for each local directory. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines the DISK and SSD storage types and takes DISK as the 
default storage type. 
** StorageLocation separates the storage type from the directory path and is 
used by LocalDirAllocator to be aware of the types of 

[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-09-28 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Description: 
h3.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local/log directories with input from 
applications. An application can choose the desired storage media by 
configuration based on its performance and requirements.

h3.  Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
should allow the cluster administrator to optionally specify the storage type 
for each local directory. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines the DISK and SSD storage types and takes DISK as the 
default storage type. 
** StorageLocation separates the storage type from the directory path and is 
used by LocalDirAllocator to be aware of the types of local dirs; the default 
storage type is DISK.
** The getLocalPathForWrite method of LocalDirAllocator will prefer a local 
directory of the specified storage type, and will fall back to ignoring the 
storage type if the requirement cannot be satisfied.
** Support container-related local/log directories in ContainerLaunch. All 
application frameworks can set the environment variables (LOCAL_STORAGE_TYPE 
and LOG_STORAGE_TYPE) to specify the desired storage type of local/log 
directories.
* Allow specifying the storage type for various frameworks (taking MapReduce as 
an example)
** New configurations should allow the application administrator to optionally 
specify the storage type of local/log directories. (MapReduce adds the 
configurations mapreduce.job.local-storage-type and 
mapreduce.job.log-storage-type)
** Support container work directories. Set the environment variables 
LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to the configurations 
above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
should update YARNRunner and TaskAttemptImpl)
** Add a storage type prefix to the request path to support other local 
directories of frameworks (such as shuffle directories for MapReduce). 
(MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
support output/work directories)
** Flow diagram for MapReduce framework
!flow_diagram_for_MapReduce.png!
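
The directory syntax in the Approach above could be parsed roughly as follows. This is only a minimal sketch under stated assumptions: the class StorageLocationSketch and its methods parseEntry and parseDirs are hypothetical names for illustration and are not taken from the attached patch. It only shows how an optional [SSD]/[DISK] prefix might be split from the path, with DISK as the default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; these are NOT the classes from the YARN-5683 patch.
final class StorageLocationSketch {

    // Storage types named in the proposal; DISK is the default.
    enum StorageType { DISK, SSD }

    // Immutable pair of storage type and directory path.
    static final class Location {
        final StorageType type;
        final String path;

        Location(StorageType type, String path) {
            this.type = type;
            this.path = path;
        }
    }

    // Parses one entry such as "[SSD]/disk1/nm-local-dir" or "/disk2/nm-local-dir".
    static Location parseEntry(String entry) {
        String trimmed = entry.trim();
        if (trimmed.startsWith("[")) {
            int close = trimmed.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("Missing ']' in: " + entry);
            }
            String typeName = trimmed.substring(1, close).trim().toUpperCase(Locale.ROOT);
            // valueOf throws IllegalArgumentException for an unknown type name.
            StorageType type = StorageType.valueOf(typeName);
            return new Location(type, trimmed.substring(close + 1).trim());
        }
        // No prefix: fall back to the default storage type.
        return new Location(StorageType.DISK, trimmed);
    }

    // Parses a comma-separated list, skipping empty entries.
    static List<Location> parseDirs(String value) {
        List<Location> result = new ArrayList<>();
        for (String entry : value.split(",")) {
            if (!entry.trim().isEmpty()) {
                result.add(parseEntry(entry));
            }
        }
        return result;
    }
}
```

With the example value from the description, parseDirs would yield one SSD location for /disk1/nm-local-dir and two DISK locations for the unprefixed entries.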

h3.  Further Discussion
* The storage type requirement for local/log directories may not be 
satisfied on heterogeneous clusters. To achieve a global optimum, the scheduler 
should also be aware of and manage disk resources. 
[YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that 
but does not seem to support multiple storage types; maybe we should do more to 
make it aware of the storage type of disk resources?
* Node labels or node constraints can also increase the chance of satisfying the 
requirement for a specified storage type.
* The fallback strategy still needs consideration. Certain applications might not 
work well when the storage type requirement is not satisfied. When no disk of 
the desired storage type is available, should the container launch fail, or 
should the AM handle it?


  was:
h1.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local/log directories with input from 
applications. An application can choose the desired storage media by 
configuration based on its performance and requirements.

h1.  Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
should allow the cluster administrator to optionally specify the storage type 
for each local directory. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines DISK/SSD storage types and takes 

[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-09-28 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Attachment: flow_diagram_for_MapReduce.png

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
> Affects Versions: 3.0.0-alpha2
> Reporter: Tao Yang
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-5683-1.patch, flow_diagram_for_MapReduce.png
>
>
> h1.  Introduction
> * Some applications of various frameworks (Flink, Spark and MapReduce etc) 
> using local storage (checkpoint, shuffle etc) might require high IO 
> performance. It's useful to allocate local directories to high performance 
> storage media for these applications on heterogeneous clusters.
> * YARN does not distinguish different storage types and hence applications 
> cannot selectively use storage media with different performance 
> characteristics. Adding awareness of storage media can allow YARN to make 
> better decisions about the placement of local/log directories with input from 
> applications. An application can choose the desired storage media by 
> configuration based on its performance and requirements.
> h1.  Approach
> * NodeManager will distinguish storage types for local directories.
> ** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
> should allow the cluster administrator to optionally specify the storage type 
> for each local directory. Example: 
> [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to 
> [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
> ** StorageType defines the DISK and SSD storage types and takes DISK as the 
> default storage type. 
> ** StorageLocation separates the storage type from the directory path and is 
> used by LocalDirAllocator to be aware of the types of local dirs; the default 
> storage type is DISK.
> ** The getLocalPathForWrite method of LocalDirAllocator will prefer a local 
> directory of the specified storage type, and will fall back to ignoring the 
> storage type if the requirement cannot be satisfied.
> ** Support container-related local/log directories in ContainerLaunch. 
> All application frameworks can set the environment variables 
> (LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE) to specify the desired storage 
> type of local/log directories.
> * Allow specifying the storage type for various frameworks (taking MapReduce 
> as an example)
> ** New configurations should allow the application administrator to 
> optionally specify the storage type of local/log directories. (MapReduce adds 
> the configurations mapreduce.job.local-storage-type and 
> mapreduce.job.log-storage-type)
> ** Support container work directories. Set the environment variables 
> LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to the configurations 
> above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
> should update YARNRunner and TaskAttemptImpl)
> ** Add a storage type prefix to the request path to support other local 
> directories of frameworks (such as shuffle directories for MapReduce). 
> (MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
> support output/work directories)
> h1.  Further Discussion
> * The storage type requirement for local/log directories may not be 
> satisfied on heterogeneous clusters. To achieve a global optimum, the 
> scheduler should also be aware of and manage disk resources. 
> [YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that 
> but does not seem to support multiple storage types; maybe we should do more 
> to make it aware of the storage type of disk resources?
> * Node labels or node constraints can also increase the chance of satisfying 
> the requirement for a specified storage type.
> * The fallback strategy still needs consideration. Certain applications might 
> not work well when the storage type requirement is not satisfied. When no 
> disk of the desired storage type is available, should the container launch 
> fail, or should the AM handle it?
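
The framework-side step from the Approach above (deriving LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE from the job configuration) might look roughly like this. It is a sketch under assumptions: a plain Map stands in for both the job configuration and the container environment, and the class StorageTypeEnvSketch and its storageEnv method are hypothetical names for illustration. Only the two configuration keys and the two environment variable names come from the description.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; plain Maps stand in for the job configuration
// and for the environment that a framework passes to its containers.
final class StorageTypeEnvSketch {

    // Configuration keys and environment variable names from the description.
    static final String LOCAL_KEY = "mapreduce.job.local-storage-type";
    static final String LOG_KEY = "mapreduce.job.log-storage-type";
    static final String LOCAL_ENV = "LOCAL_STORAGE_TYPE";
    static final String LOG_ENV = "LOG_STORAGE_TYPE";

    // Builds the environment entries for whichever storage types are configured.
    static Map<String, String> storageEnv(Map<String, String> jobConf) {
        Map<String, String> env = new HashMap<>();
        copyIfSet(jobConf, LOCAL_KEY, env, LOCAL_ENV);
        copyIfSet(jobConf, LOG_KEY, env, LOG_ENV);
        return env;
    }

    // Copies a non-blank value, normalized to upper case (an assumption of this sketch).
    private static void copyIfSet(Map<String, String> conf, String key,
                                  Map<String, String> env, String envName) {
        String value = conf.get(key);
        if (value != null && !value.trim().isEmpty()) {
            env.put(envName, value.trim().toUpperCase(Locale.ROOT));
        }
    }
}
```

Leaving a key unset produces no environment entry, which matches the optional nature of the settings in the proposal. In that case the NodeManager side would presumably apply its default.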



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-09-28 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Attachment: YARN-5683-1.patch

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
> Affects Versions: 3.0.0-alpha2
> Reporter: Tao Yang
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-5683-1.patch
>
>
> h1.  Introduction
> * Some applications of various frameworks (Flink, Spark and MapReduce etc) 
> using local storage (checkpoint, shuffle etc) might require high IO 
> performance. It's useful to allocate local directories to high performance 
> storage media for these applications on heterogeneous clusters.
> * YARN does not distinguish different storage types and hence applications 
> cannot selectively use storage media with different performance 
> characteristics. Adding awareness of storage media can allow YARN to make 
> better decisions about the placement of local/log directories with input from 
> applications. An application can choose the desired storage media by 
> configuration based on its performance and requirements.
> h1.  Approach
> * NodeManager will distinguish storage types for local directories.
> ** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
> should allow the cluster administrator to optionally specify the storage type 
> for each local directory. Example: 
> [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to 
> [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
> ** StorageType defines the DISK and SSD storage types and takes DISK as the 
> default storage type. 
> ** StorageLocation separates the storage type from the directory path and is 
> used by LocalDirAllocator to be aware of the types of local dirs; the default 
> storage type is DISK.
> ** The getLocalPathForWrite method of LocalDirAllocator will prefer a local 
> directory of the specified storage type, and will fall back to ignoring the 
> storage type if the requirement cannot be satisfied.
> ** Support container-related local/log directories in ContainerLaunch. 
> All application frameworks can set the environment variables 
> (LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE) to specify the desired storage 
> type of local/log directories.
> * Allow specifying the storage type for various frameworks (taking MapReduce 
> as an example)
> ** New configurations should allow the application administrator to 
> optionally specify the storage type of local/log directories. (MapReduce adds 
> the configurations mapreduce.job.local-storage-type and 
> mapreduce.job.log-storage-type)
> ** Support container work directories. Set the environment variables 
> LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to the configurations 
> above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
> should update YARNRunner and TaskAttemptImpl)
> ** Add a storage type prefix to the request path to support other local 
> directories of frameworks (such as shuffle directories for MapReduce). 
> (MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
> support output/work directories)
> h1.  Further Discussion
> * The storage type requirement for local/log directories may not be 
> satisfied on heterogeneous clusters. To achieve a global optimum, the 
> scheduler should also be aware of and manage disk resources. 
> [YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that 
> but does not seem to support multiple storage types; maybe we should do more 
> to make it aware of the storage type of disk resources?
> * Node labels or node constraints can also increase the chance of satisfying 
> the requirement for a specified storage type.
> * The fallback strategy still needs consideration. Certain applications might 
> not work well when the storage type requirement is not satisfied. When no 
> disk of the desired storage type is available, should the container launch 
> fail, or should the AM handle it?






[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2016-09-28 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-5683:
---
Description: 
h1.  Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local/log directories with input from 
applications. An application can choose the desired storage media by 
configuration based on its performance and requirements.

h1.  Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
should allow the cluster administrator to optionally specify the storage type 
for each local directory. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines the DISK and SSD storage types and takes DISK as the 
default storage type. 
** StorageLocation separates the storage type from the directory path and is 
used by LocalDirAllocator to be aware of the types of local dirs; the default 
storage type is DISK.
** The getLocalPathForWrite method of LocalDirAllocator will prefer a local 
directory of the specified storage type, and will fall back to ignoring the 
storage type if the requirement cannot be satisfied.
** Support container-related local/log directories in ContainerLaunch. All 
application frameworks can set the environment variables (LOCAL_STORAGE_TYPE 
and LOG_STORAGE_TYPE) to specify the desired storage type of local/log 
directories.
* Allow specifying the storage type for various frameworks (taking MapReduce as 
an example)
** New configurations should allow the application administrator to optionally 
specify the storage type of local/log directories. (MapReduce adds the 
configurations mapreduce.job.local-storage-type and 
mapreduce.job.log-storage-type)
** Support container work directories. Set the environment variables 
LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to the configurations 
above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
should update YARNRunner and TaskAttemptImpl)
** Add a storage type prefix to the request path to support other local 
directories of frameworks (such as shuffle directories for MapReduce). 
(MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
support output/work directories)
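
The "prefer the requested storage type, otherwise fall back" behaviour described for getLocalPathForWrite could be sketched as below. This is an illustration under assumptions and not the patch itself: StorageAwareAllocatorSketch, Dir and chooseDir are hypothetical names, free space is passed in as a plain number, and the sketch simply takes the first directory that fits rather than reproducing how the real LocalDirAllocator spreads writes across directories.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of "prefer the wanted storage type, else fall back".
final class StorageAwareAllocatorSketch {

    enum StorageType { DISK, SSD }

    // A candidate local directory with its type and free space in bytes.
    static final class Dir {
        final StorageType type;
        final String path;
        final long freeBytes;

        Dir(StorageType type, String path, long freeBytes) {
            this.type = type;
            this.path = path;
            this.freeBytes = freeBytes;
        }
    }

    // Returns a directory with at least `size` free bytes, preferring `wanted`.
    // If no directory of the wanted type fits, any other fitting directory is used.
    // Returns Optional.empty() when nothing has enough space.
    static Optional<String> chooseDir(List<Dir> dirs, StorageType wanted, long size) {
        Dir fallback = null;
        for (Dir dir : dirs) {
            if (dir.freeBytes < size) {
                continue; // not enough room, regardless of type
            }
            if (dir.type == wanted) {
                return Optional.of(dir.path); // requirement satisfied
            }
            if (fallback == null) {
                fallback = dir; // remember the first fitting dir of another type
            }
        }
        return fallback == null ? Optional.empty() : Optional.of(fallback.path);
    }
}
```

Whether the fallback branch should exist at all, or should be replaced by a failure that the AM can handle, is exactly the open question raised under Further Discussion.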

h1.  Further Discussion
* The storage type requirement for local/log directories may not be 
satisfied on heterogeneous clusters. To achieve a global optimum, the scheduler 
should also be aware of and manage disk resources. 
[YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that 
but does not seem to support multiple storage types; maybe we should do more to 
make it aware of the storage type of disk resources?
* Node labels or node constraints can also increase the chance of satisfying the 
requirement for a specified storage type.
* The fallback strategy still needs consideration. Certain applications might not 
work well when the storage type requirement is not satisfied. When no disk of 
the desired storage type is available, should the container launch fail, or 
should the AM handle it?


  was:
# Introduction
* Some applications of various frameworks (Flink, Spark and MapReduce etc) 
using local storage (checkpoint, shuffle etc) might require high IO 
performance. It's useful to allocate local directories to high performance 
storage media for these applications on heterogeneous clusters.
* YARN does not distinguish different storage types and hence applications 
cannot selectively use storage media with different performance 
characteristics. Adding awareness of storage media can allow YARN to make 
better decisions about the placement of local/log directories with input from 
applications. An application can choose the desired storage media by 
configuration based on its performance and requirements.

# Approach
* NodeManager will distinguish storage types for local directories.
** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
should allow the cluster administrator to optionally specify the storage type 
for each local directory. Example: 
[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equivalent to 
[SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
** StorageType defines DISK/SSD storage types and takes DISK as the default 
storage type. 
** StorageLocation separates storage type