[
https://issues.apache.org/jira/browse/YARN-11834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shilun Fan resolved YARN-11834.
-------------------------------
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Resolution: Fixed
> [Capacity Scheduler] Application Stuck In ACCEPTED State due to Race Condition
> ------------------------------------------------------------------------------
>
> Key: YARN-11834
> URL: https://issues.apache.org/jira/browse/YARN-11834
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.4.0, 3.4.1
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
>
> It was noted that in a Hadoop 3.4.1 YARN deployment, a Spark application was
> stuck in the ACCEPTED state even though the cluster had enough resources.
>
> *Steps to replicate*
> 1. Launch a YARN cluster with total capacity of at least 1.59 TB memory and
> 660 vCores
> 2. Apply the following properties (a programmatic sketch of these settings
> follows after step 3):
> *{{capacity-scheduler}}*
> {{"yarn.scheduler.capacity.node-locality-delay": "-1",
> "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
> "yarn.scheduler.capacity.schedule-asynchronously.enable": "true"}}
>
> *{{yarn-site}}*
> {{"yarn.log-aggregation-enable": "true",
> "yarn.log-aggregation.retain-check-interval-seconds": "300",
> "yarn.log-aggregation.retain-seconds": "-1",
> "yarn.scheduler.capacity.max-parallel-apps": "1"}}
> 3. Submit multiple Spark jobs that launch a large number of containers. For
> example:
> {{spark-example --conf spark.dynamicAllocation.enabled=false --num-executors
> 2000 --driver-memory 1g --executor-memory 1g --executor-cores 1 SparkPi 1000}}
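>
> For a quick repro harness, the step-2 properties can also be applied
> programmatically. A minimal sketch, assuming only hadoop-common on the
> classpath (the ReproConfig class name is illustrative; the property keys and
> values are taken verbatim from step 2):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> // Builds a Configuration carrying the repro properties from step 2.
> public class ReproConfig {
>   public static Configuration build() {
>     Configuration conf = new Configuration(false);
>     // capacity-scheduler
>     conf.set("yarn.scheduler.capacity.node-locality-delay", "-1");
>     conf.set("yarn.scheduler.capacity.resource-calculator",
>         "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator");
>     conf.set("yarn.scheduler.capacity.schedule-asynchronously.enable", "true");
>     // yarn-site
>     conf.set("yarn.log-aggregation-enable", "true");
>     conf.set("yarn.log-aggregation.retain-check-interval-seconds", "300");
>     conf.set("yarn.log-aggregation.retain-seconds", "-1");
>     conf.set("yarn.scheduler.capacity.max-parallel-apps", "1");
>     return conf;
>   }
> }
> {code}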
>
> *Observations*
> On analysing the logs, the following observations were made:
> When Application 1 completes, there's a period where its resource requests
> are still being processed or "honored" by the scheduler. During this
> transition period, the following sequence could occur:
> 1. Application 1 completes and releases its resources
> 2. The scheduler is still processing some older allocation requests for
> Application 1
> 3. During this processing, the *cul.canAssign* flag for the user is set to
> false. Refer [Link #1|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1670] and
> [Link #2|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1268]
> 4. Application 2 (which is new) tries to get resources
> 5. The scheduler checks the user's cul.canAssign flag, finds it's false (due
> to [cache
> implementation|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1241]),
> and denies resources to Application 2
> 6. Application 2 remains in ACCEPTED state despite available resources
> This race condition occurs because the user's resource usage state (tracked
> in the CachedUserLimit object) isn't properly reset or synchronized
> between the completion of one application and the scheduling of another.
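>
> The pattern can be condensed into a small, self-contained Java sketch (this
> is not the actual scheduler code; CachedLimit and the cache map below are
> illustrative stand-ins for the per-user CachedUserLimit bookkeeping in
> AbstractLeafQueue):
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> public class CachedFlagRace {
>   // Illustrative stand-in for the per-user cached limit entry.
>   static class CachedLimit {
>     volatile boolean canAssign = true;
>   }
>
>   // One cached entry per user, reused across allocation passes.
>   static final Map<String, CachedLimit> cache = new ConcurrentHashMap<>();
>
>   public static void main(String[] args) {
>     CachedLimit cul = cache.computeIfAbsent("user1", u -> new CachedLimit());
>
>     // Steps 2-3: a late allocation pass for the already-finished app 1
>     // decides the user is over its limit and caches canAssign = false.
>     cul.canAssign = false;
>
>     // Steps 4-5: app 2 from the same user asks for resources; the scheduler
>     // consults the stale cached flag instead of recomputing the user limit
>     // and denies the request even though the cluster has free resources.
>     if (!cache.get("user1").canAssign) {
>       System.out.println("app 2 skipped: stale cached canAssign=false");
>     }
>   }
> }
> {code}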
>
> *Solutions*
> I can think of two solutions for this race condition:
> # *Cache Invalidation*: Invalidate the cache when no user information is
> fetched ([here|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1669]).
> By doing this, the new application (by the same user) will be forced to
> calculate new userLimits. The problem with this approach is the repeated
> calculation of userLimits.
> # *Skip setting the cul.canAssign flag*: In this approach, setting the
> cul.canAssign flag is skipped if the application has already completed or
> been removed from the applicationAttempt list - refer to
> [this|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractLeafQueue.java#L1267]
> code pointer. A rough sketch of this guard follows below.
>
> I am personally inclined towards approach 2.
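>
> A rough sketch of the approach-2 guard (every name here is hypothetical and
> only stands in for the actual AbstractLeafQueue code behind the pointer
> above): skip caching a negative result once the attempt is gone.
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> public class SkipStaleFlag {
>   static class CachedLimit {
>     volatile boolean canAssign = true;
>   }
>
>   // Stand-in for the queue's registry of live application attempts.
>   static final Map<String, Object> liveAttempts = new ConcurrentHashMap<>();
>
>   static void updateCachedFlag(CachedLimit cul, String attemptId,
>                                boolean computedCanAssign) {
>     if (!computedCanAssign && !liveAttempts.containsKey(attemptId)) {
>       // The attempt has already completed / been removed: skip caching
>       // the negative result so the user's next application recomputes
>       // its limit instead of inheriting a stale canAssign = false.
>       return;
>     }
>     cul.canAssign = computedCanAssign;
>   }
>
>   public static void main(String[] args) {
>     CachedLimit cul = new CachedLimit();
>     // A late negative result arrives for an attempt that is no longer live;
>     // the guard ignores it, so the cached flag stays true.
>     updateCachedFlag(cul, "appattempt_1", false);
>     System.out.println("canAssign still " + cul.canAssign); // prints true
>   }
> }
> {code}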