After much offline discussion with Wangda, Sunil, Varun V., and Andrew,
we've agreed that it would make sense to pull resource types into
branch-3.0 ahead of the Hadoop 3.0 RC0. Resource types has already been
merged into trunk/3.1. Now I'd like to open a discussion about getting it
into 3.0 GA. Here's the run-down:

Feature Details
---------------
Resource types replaces the two primitives that tracked CPU and memory
with an array of objects to track an arbitrary set of resources (that
must always include CPU and memory). The resource manager reads the
master list of supported resources from its configs. The node managers
read their resource values from their configs and report them to the
resource manager in their heartbeats. The clients read the supported
resource types from their configs (or an RM service) and specify them in
the application submission. At a high level, nothing else changes.
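
To make the data model concrete, here is a minimal sketch of a resource as an ordered set of named values in which memory and CPU must always be present. This is an illustration only, not the actual YARN Resource class: the class name ResourceVector, its methods, and the resource names "memory-mb", "vcores", and "gpu" are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration; not the actual YARN Resource implementation. */
final class ResourceVector {
    // Assumed names for the two mandatory resource types.
    static final String MEMORY = "memory-mb";
    static final String VCORES = "vcores";

    private final String[] names;
    private final long[] values;

    ResourceVector(Map<String, Long> amounts) {
        if (!amounts.containsKey(MEMORY) || !amounts.containsKey(VCORES)) {
            throw new IllegalArgumentException(
                "memory and CPU must always be specified");
        }
        // Keep the mandatory types at fixed positions 0 and 1, followed by
        // any additional types in the order they were supplied.
        Map<String, Long> ordered = new LinkedHashMap<>();
        ordered.put(MEMORY, amounts.get(MEMORY));
        ordered.put(VCORES, amounts.get(VCORES));
        ordered.putAll(amounts);

        names = ordered.keySet().toArray(new String[0]);
        values = new long[names.length];
        for (int i = 0; i < names.length; i++) {
            values[i] = ordered.get(names[i]);
        }
    }

    /** Returns the amount of the named resource, or 0 if it is not tracked. */
    long get(String name) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(name)) {
                return values[i];
            }
        }
        return 0L;
    }

    /** Returns a copy of the resource names in storage order. */
    String[] names() {
        return names.clone();
    }
}
```

Pinning memory and CPU to fixed positions is one way to keep the pre-existing two-resource code paths cheap while still allowing an arbitrary tail of extra types.
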
The Resource object is a core construct in the resource manager and
scheduler. All application operations end up touching Resource objects
as we determine fit or share-based priority for applications, queues,
and nodes. As this feature replaces the core of how Resource objects
work, resource types impacts almost every aspect of the resource
manager's operation. The change is pervasive, but not radical.
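
As a rough illustration of why the change touches so much scheduler code, a fit check that used to compare two numbers now has to walk every resource type in the request. The sketch below is hypothetical (the ResourceFit class and its fitsIn method are names invented for this example, and resources are modeled as plain maps rather than YARN Resource objects):

```java
import java.util.Map;

/** Hypothetical illustration of a fit check over arbitrary resource types. */
final class ResourceFit {
    private ResourceFit() {}

    /**
     * Returns true if every requested amount is no larger than the amount
     * available. A resource type missing from "available" counts as zero.
     */
    static boolean fitsIn(Map<String, Long> request,
                          Map<String, Long> available) {
        for (Map.Entry<String, Long> e : request.entrySet()) {
            long have = available.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > have) {
                return false;
            }
        }
        return true;
    }
}
```

With only memory and CPU in both maps, this reduces to the familiar two-comparison check, which is the sense in which the change is pervasive but not radical.
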
The resource types patches as merged into trunk/3.1 include an
additional feature called resource profiles. Resource profiles are
actually independent of resource types, and either is useful without the
other. The resource profiles code is still in a bit of flux, so the
current plan is to pull only the resource types code into branch-3.0. I
have backported only the resource types patches into the resource-types
branch. Unit tests are passing, and I don't see any significant risk
from the split. The diff between the resource-types branch and
branch-3.0 is available as a branch-3.0 patch on YARN-7013[1].

Justification for 3.0
---------------------
Resource types (leaving out resource profiles) is in a stable state and
is well tested with unit tests, performance tests, and functional tests
with both the fair scheduler and the capacity scheduler. Tests were run
on both the resource-types branch and the original YARN-3926 branch.
There is some additional work to do, but none of it is critical (except
maybe improving the docs). Our confidence level in the feature is good.

Resource types doesn't introduce incompatible changes to any Public and
Stable APIs. There are some incompatible changes to Public and Unstable
APIs, but that's what a major release is for. The Resource object proto
retains the CPU and memory fields and adds a new field for any
additional resource types to retain wire compatibility. Other proto
changes are all additive.

While it's not possible to turn resource types off per se, if the user
does not activate the feature, the operation of YARN will be unchanged.

Getting this feature into Hadoop 3.0 gives us the required groundwork to
make progress on tidying up the usage details without having to drag a
large set of invasive changes into 3.1.

If we don't pull resource types into 3.0, backporting will become a
persistent channel for introducing failures. The differences introduced
by resource types are significant enough that they will be an issue for
any scheduler or resource manager patch that has to move between 3.1
and 3.0.

From the other side, resource types is a pervasive change, and there's
no turning it off. Users will be impacted by it regardless of whether
they choose to use it or not. While we've tested it, the feature
represents a large number of changes to core code that's critical to the
resource manager's operation. If we're going to introduce a large
change like this, no matter how well tested, we should do it in 3.0
where users already expect some bumps in the road. Bringing in a large
change like this in a 3.1 release, when users expect the release to have
stabilized, sounds like a bad idea.

What do folks think about pulling resource types back into branch-3.0 in
time for RC0? Any concerns?

Thanks to Varun Vasudev, Sunil Govind, Wangda Tan, Yufei Gu, Grant Sohn,
Jason Lowe, Arun Suresh, Karthik Kambatla, Vinod Vavilapalli, and Andrew
Wang for their efforts in getting resource types done, backported,
tested, and on track for 3.0.

[1]: https://issues.apache.org/jira/secure/attachment/12892456/YARN-7013.branch-3.0.002.patch