After much offline discussion with Wangda, Sunil, Varun V., and Andrew, we've agreed that it would make sense to pull resource types into branch-3.0 ahead of the Hadoop 3.0 RC0.  Resource types has already been merged into trunk/3.1.  Now I'd like to open a discussion about getting it into 3.0 GA.  Here's the run-down:

Feature Details
---------------
Resource types replaces the two primitives that tracked CPU and memory with an array of objects to track an arbitrary set of resources (that must always include CPU and memory).  The resource manager reads the master list of supported resources from its configs.  The node managers read their resource values from their configs and report them to the resource manager in their heartbeats.  The clients read the supported resource types from their configs (or an RM service) and specify them in the application submission.  At a high level, nothing else changes.
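
To make the model above concrete, here is a minimal sketch of a resource vector that always carries memory and CPU plus any number of extra types, with the kind of "does this request fit" check the scheduler needs.  This is an illustration only, not the actual YARN Resource class: the class name, the method names, and the "gpu" resource are all hypothetical, and the "memory-mb" and "vcores" keys are used here purely as labels.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; this is NOT the YARN Resource class. */
final class ResourceVectorSketch {
    // Names of the two mandatory resources (labels assumed for this sketch).
    static final String MEMORY = "memory-mb";
    static final String VCORES = "vcores";

    // Insertion-ordered map: memory and vcores always come first.
    private final Map<String, Long> values = new LinkedHashMap<>();

    // The constructor forces memory and CPU to be present in every vector.
    ResourceVectorSketch(long memoryMb, long vcores) {
        values.put(MEMORY, memoryMb);
        values.put(VCORES, vcores);
    }

    // Adds (or overwrites) an arbitrary extra resource type.
    ResourceVectorSketch with(String name, long value) {
        values.put(name, value);
        return this;
    }

    // A resource type that is not present is treated as zero.
    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    // True if every resource in the request is available in this vector.
    boolean fits(ResourceVectorSketch request) {
        for (Map.Entry<String, Long> e : request.values.entrySet()) {
            if (e.getValue() > get(e.getKey())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ResourceVectorSketch node = new ResourceVectorSketch(8192, 8).with("gpu", 2);
        ResourceVectorSketch request = new ResourceVectorSketch(4096, 4).with("gpu", 1);
        System.out.println("request fits on node: " + node.fits(request));
    }
}
```

A request that names only memory and CPU behaves exactly as it would with the two old primitives, which is the sense in which nothing else changes at a high level.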

The Resource object is a core construct in the resource manager and scheduler.  All application operations end up touching Resource objects as we determine fit or share-based priority for applications, queues, and nodes.  As this feature replaces the core of how Resource objects work, resource types impacts almost every aspect of the resource manager's operation.  The change is pervasive, but not radical.

The resource types patches as merged into trunk/3.1 include an additional feature called resource profiles.  Resource profiles are actually independent of resource types, and either is useful without the other.  The resource profiles code is still in a bit of flux, so the current plan is to pull only the resource types code into branch-3.0.  I have backported only the resource types patches into the resource-types branch.  Unit tests are passing, and I don't see any significant risk from the split.  The diff between the resource-types branch and branch-3.0 is available as a branch-3.0 patch on YARN-7013[1].

Justification for 3.0
---------------------
Resource types (leaving out resource profiles) is in a stable state and is well tested with unit tests, performance tests, and functional tests with both the fair scheduler and the capacity scheduler.  Tests were run on both the resource-types branch and the original YARN-3926 branch. There is some additional work to do, but none of it's critical (except maybe improving the docs).  Our confidence level in the feature is good.

Resource types doesn't introduce incompatible changes to any Public and Stable APIs. There are some incompatible changes to Public and Unstable APIs, but that's what a major release is for. The Resource object proto retains the CPU and memory fields and adds a new field for any additional resource types to retain wire compatibility. Other proto changes are all additive.
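
As a rough sketch of why the additive field keeps the wire format compatible, consider a message that keeps the two original fields and appends a list of extra entries.  Everything here is hypothetical (the class, field, and method names are not the real proto or generated code); it only shows that a reader that knows just the original fields still gets the same memory and CPU values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a Resource message; not the real proto. */
final class ResourceMessageSketch {
    /** One additional resource type carried in the new field. */
    static final class Extra {
        final String name;
        final long value;

        Extra(String name, long value) {
            this.name = name;
            this.value = value;
        }
    }

    final long memoryMb;      // retained original field
    final int vcores;         // retained original field
    final List<Extra> extras; // new additive field; empty when unused

    ResourceMessageSketch(long memoryMb, int vcores, List<Extra> extras) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
        this.extras = Collections.unmodifiableList(new ArrayList<>(extras));
    }

    // What a reader that predates the new field would see.
    static String legacyView(ResourceMessageSketch m) {
        return "memory=" + m.memoryMb + ", vcores=" + m.vcores;
    }

    // What a reader that understands the new field would see.
    static String fullView(ResourceMessageSketch m) {
        StringBuilder sb = new StringBuilder(legacyView(m));
        for (Extra e : m.extras) {
            sb.append(", ").append(e.name).append('=').append(e.value);
        }
        return sb.toString();
    }
}
```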

While it's not possible to turn resource types off per se, if the user does not activate the feature, the operation of YARN will be unchanged.  Getting this feature into Hadoop 3.0 gives us the required groundwork to make progress on tidying up the usage details without having to drag in a large set of invasive changes into 3.1.

If we don't pull resource types into 3.0, backporting becomes a persistent source of failures.  The differences introduced by resource types are significant enough that scheduler and resource manager patches written against 3.1 will not apply cleanly to 3.0, and every manual backport is a chance to introduce a bug.

From the other side, resource types is a pervasive change, and there's no turning it off.  Users will be impacted by it regardless of whether they choose to use it or not.  While we've tested it, the feature represents a large number of changes to core code that's critical to the resource manager's operation.  If we're going to introduce a large change like this, no matter how well tested, we should do it in 3.0 where users already expect some bumps in the road.  Bringing in a large change like this in a 3.1 release, when users expect the release to have stabilized, sounds like a bad idea.


What do folks think about pulling resource types back into branch-3.0 in time for RC0?  Any concerns?

Thanks to Varun Vasudev, Sunil Govind, Wangda Tan, Yufei Gu, Grant Sohn, Jason Lowe, Arun Suresh, Karthik Kambatla, Vinod Vavilapalli, and Andrew Wang for their work on getting the resource types work done, backported, tested, and on track for 3.0.

[1]: https://issues.apache.org/jira/secure/attachment/12892456/YARN-7013.branch-3.0.002.patch
