Folks,

We (i.e. Microsoft) have started stabilization of 2.9 for our production
deployment. During planning, we realized that we need to backport 3.x
features to support GPUs (and more resource types like network IO) natively
as part of the upgrade. We'd like to share that work with the community.

Instead of stabilizing the base release and cherry-picking fixes back to
Apache, we want to work publicly and push fixes directly into
trunk/.../branch-2 for a stable 2.10.0 release. Our goal is to create a
bridge release for our production clusters to the 3.x series and to address
scalability problems in large clusters (N*10k nodes). As we find issues, we
will file JIRAs and track resolution of significant regressions/faults on
the wiki. LinkedIn also has committed plans for a production
deployment of the same branch. We welcome broad participation, particularly
since we'll be stabilizing relatively new features.

The exact list of features we would like to backport in YARN is:

   - Support for Resource types [1][2]
   - Native support for GPUs [3]
   - Absolute Resource configuration in CapacityScheduler [4]


With regard to HDFS, we are currently looking mainly at fixes to Router-based
Federation and Windows-specific fixes, which should flow normally
anyway.

Thoughts?

Thanks,
Subru/Arun

[1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg27786.html
[2] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg28281.html
[3] https://issues.apache.org/jira/browse/YARN-6223
[4] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg28772.html
