[DISCUSS] Merge Resource Types (YARN-3926) to branch-3.0

Daniel Templeton Thu, 19 Oct 2017 12:53:45 -0700

After much offline discussion with Wangda, Sunil, Varun V., and Andrew, we've agreed that it would make sense to pull resource types into branch-3.0 ahead of the Hadoop 3.0 RC0. Resource types has already been merged into trunk/3.1. Now I'd like to open a discussion about getting it into 3.0 GA. Here's the run-down:


Feature Details
---------------

Resource types replaces the two primitives that tracked CPU and memory with an array of objects that can track an arbitrary set of resources (which must always include CPU and memory). The resource manager reads the master list of supported resources from its configs. The node managers read their resource values from their configs and report them to the resource manager in their heartbeats. The clients read the supported resource types from their configs (or from an RM service) and specify them in the application submission. At a high level, nothing else changes.
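To make the data model concrete, here is a minimal Java sketch of a resource vector that always carries memory and vcores and can hold any number of additional named types. The class `ResourceVector` and its methods are hypothetical illustrations, not YARN's actual `Resource` API, and the type names `memory-mb`, `vcores`, and `yarn.io/gpu` are assumed for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of a multi-type resource. This is NOT
// YARN's org.apache.hadoop.yarn.api.records.Resource; names are assumed.
final class ResourceVector {
    // The two mandatory types; every vector carries them.
    static final String MEMORY = "memory-mb";
    static final String VCORES = "vcores";

    // Insertion-ordered map from resource type name to its value.
    private final Map<String, Long> values = new LinkedHashMap<>();

    ResourceVector(long memoryMb, long vcores) {
        values.put(MEMORY, memoryMb);
        values.put(VCORES, vcores);
    }

    // Sets an additional (or overrides an existing) type; returns this for chaining.
    ResourceVector with(String name, long value) {
        values.put(name, value);
        return this;
    }

    // Unknown types read as zero, so a vector without "yarn.io/gpu" simply has none.
    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    // Read-only view of all types, mandatory ones first.
    Map<String, Long> asMap() {
        return Collections.unmodifiableMap(values);
    }
}
```

With only the two mandatory entries, such a vector carries the same information as the old pair of primitives, which is why a cluster that configures no extra types sees no change in behavior.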

The Resource object is a core construct in the resource manager and scheduler. All application operations end up touching Resource objects as we determine fit or share-based priority for applications, queues, and nodes. As this feature replaces the core of how Resource objects work, resource types impacts almost every aspect of the resource manager's operation. The change is pervasive, but not radical.
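Because fit and share calculations compare Resource objects, those comparisons have to generalize from two fields to every configured type. Below is a minimal Java sketch, assuming resources are plain maps from type name to amount: `fitsIn` checks a request against available capacity type by type, and `dominantShare` computes the largest per-type share of the cluster, as in dominant resource fairness (DRF). Both helpers are hypothetical; the real schedulers have their own implementations.

```java
import java.util.Map;

// Hypothetical helpers over map-based resources (type name -> amount);
// not the actual YARN scheduler code.
final class ResourceMath {
    // True if, for every requested type, the available amount is at least
    // the requested amount. Types absent from 'available' count as zero.
    static boolean fitsIn(Map<String, Long> request, Map<String, Long> available) {
        for (Map.Entry<String, Long> e : request.entrySet()) {
            long have = available.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > have) {
                return false;
            }
        }
        return true;
    }

    // Dominant share: the maximum over all types of usage / cluster total.
    // Types with a zero or missing cluster total are skipped.
    static double dominantShare(Map<String, Long> usage, Map<String, Long> cluster) {
        double max = 0.0;
        for (Map.Entry<String, Long> e : usage.entrySet()) {
            long total = cluster.getOrDefault(e.getKey(), 0L);
            if (total > 0) {
                max = Math.max(max, (double) e.getValue() / total);
            }
        }
        return max;
    }
}
```

For example, a request for 1024 MB, 1 vcore, and 1 GPU does not fit on a node that reports no GPUs, however much memory and CPU it has. With only memory and CPU configured, the check reduces to the familiar two-field comparison.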

The resource types patches as merged into trunk/3.1 include an additional feature called resource profiles. Resource profiles are actually independent of resource types, and either is useful without the other. The resource profiles code is still in a bit of flux, so the current plan is to pull only the resource types code into branch-3.0. I have backported only the resource types patches into the resource-types branch. Unit tests are passing, and I don't see any significant risk from the split. The diff between the resource-types branch and branch-3.0 is available as a branch-3.0 patch on YARN-7013[1].


Justification for 3.0
---------------------

Resource types (leaving out resource profiles) is in a stable state and is well tested with unit tests, performance tests, and functional tests with both the fair scheduler and the capacity scheduler. Tests were run on both the resource-types branch and the original YARN-3926 branch. There is some additional work to do, but none of it is critical (except maybe improving the docs). Our confidence level in the feature is good.

Resource types doesn't introduce incompatible changes to any Public and Stable APIs. There are some incompatible changes to Public and Unstable APIs, but that's what a major release is for. To preserve wire compatibility, the Resource object proto retains the CPU and memory fields and adds a new field for any additional resource types. Other proto changes are all additive.

While it's not possible to turn resource types off per se, if the user does not activate the feature, the operation of YARN will be unchanged. Getting this feature into Hadoop 3.0 gives us the required groundwork to make progress on tidying up the usage details without having to drag a large set of invasive changes into 3.1.

If we don't pull resource types into 3.0, it will open a persistent channel through which backports can introduce failures. The differences introduced by resource types are significant enough that they will be an issue for any scheduler and resource manager patches backported from 3.1 to 3.0.

From the other side, resource types is a pervasive change, and there's no turning it off. Users will be impacted by it regardless of whether they choose to use it or not. While we've tested it, the feature represents a large number of changes to core code that's critical to the resource manager's operation. If we're going to introduce a large change like this, no matter how well tested, we should do it in 3.0, where users already expect some bumps in the road. Bringing in a large change like this in a 3.1 release, when users expect the release to have stabilized, sounds like a bad idea.

What do folks think about pulling resource types back into branch-3.0 in time for RC0? Any concerns?

Thanks to Varun Vasudev, Sunil Govind, Wangda Tan, Yufei Gu, Grant Sohn, Jason Lowe, Arun Suresh, Karthik Kambatla, Vinod Vavilapalli, and Andrew Wang for their work on getting the resource types work done, backported, tested, and on track for 3.0.

[1] https://issues.apache.org/jira/secure/attachment/12892456/YARN-7013.branch-3.0.002.patch

