[ 
https://issues.apache.org/jira/browse/YARN-7003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470239#comment-16470239
 ] 

Tao Yang commented on YARN-7003:
--------------------------------

Thanks [~cheersyang] for your detailed analysis. Makes sense to me!

Attached v3 patch. Updates:
 * Check the queue state and, if it is STOPPED, recover the DRAINING state 
before calling {{queue.submitApplication}} in 
CapacityScheduler#addApplicationOnRecovery, which is only called when 
recovering running applications. 
 * Fix check-style warning
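
The ordering in the first update can be sketched roughly as follows. This is a simplified, self-contained model and not the patch itself: SketchQueue, its recoverDrainingState method, and the three-value QueueState enum are hypothetical stand-ins for the real scheduler classes.

```java
import java.util.ArrayList;
import java.util.List;

public class DrainingRecoverySketch {

  /** Simplified queue states, assumed for this sketch only. */
  enum QueueState { RUNNING, DRAINING, STOPPED }

  /** Hypothetical stand-in for a scheduler queue; not the real queue class. */
  static class SketchQueue {
    private QueueState state;
    private final List<String> apps = new ArrayList<>();

    SketchQueue(QueueState configuredState) {
      // After a restart the queue only knows its configured state.
      this.state = configuredState;
    }

    QueueState getState() {
      return state;
    }

    /** Moves a STOPPED queue back to DRAINING; other states are left alone. */
    void recoverDrainingState() {
      if (state == QueueState.STOPPED) {
        state = QueueState.DRAINING;
      }
    }

    void submitApplication(String appId) {
      apps.add(appId);
    }

    int getNumApplications() {
      return apps.size();
    }
  }

  /** Check the state, recover DRAINING if needed, then submit the app. */
  static void addApplicationOnRecovery(SketchQueue queue, String appId) {
    if (queue.getState() == QueueState.STOPPED) {
      queue.recoverDrainingState();
    }
    queue.submitApplication(appId);
  }

  public static void main(String[] args) {
    SketchQueue queue = new SketchQueue(QueueState.STOPPED);
    addApplicationOnRecovery(queue, "application_0001");
    System.out.println(queue.getState()); // prints "DRAINING"
  }
}
```

Because this path is only reached for running applications being recovered, a STOPPED queue that receives one must have been DRAINING before the restart.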

For comment #2, I got a "ServiceStateException: ResourceManager cannot enter 
state STARTED from state STOPPED" exception when using rm.stop. Simulating an 
RM restart by creating a new MockRM instance is consistent with the other RM 
restart test cases, so this logic is unchanged in the new patch.
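
A rough illustration of why the test creates a second instance instead of restarting the first one. This is a toy model, not MockRM: SketchRM, its reduced three-state lifecycle, and the map used as a state store are assumptions made for the sketch, and IllegalStateException stands in for ServiceStateException.

```java
import java.util.HashMap;
import java.util.Map;

public class RestartSimulationSketch {

  /** Reduced lifecycle, assumed for this sketch only. */
  enum State { NOTINITED, STARTED, STOPPED }

  /** Hypothetical stand-in for a ResourceManager with a one-way lifecycle. */
  static class SketchRM {
    private State state = State.NOTINITED;
    private final Map<String, String> stateStore;

    SketchRM(Map<String, String> stateStore) {
      this.stateStore = stateStore;
    }

    void start() {
      // STOPPED is terminal: a stopped instance cannot be started again.
      if (state == State.STOPPED) {
        throw new IllegalStateException(
            "cannot enter state STARTED from state STOPPED");
      }
      state = State.STARTED;
    }

    void stop() {
      state = State.STOPPED;
    }

    State getState() {
      return state;
    }

    int recoveredAppCount() {
      return stateStore.size();
    }
  }

  public static void main(String[] args) {
    Map<String, String> store = new HashMap<>();

    SketchRM rm1 = new SketchRM(store);
    rm1.start();
    store.put("application_0001", "RUNNING");
    rm1.stop();

    try {
      rm1.start(); // rejected, the old instance is terminal
    } catch (IllegalStateException e) {
      System.out.println("restart rejected: " + e.getMessage());
    }

    // Simulated restart: a fresh instance over the same state store.
    SketchRM rm2 = new SketchRM(store);
    rm2.start();
    System.out.println(rm2.getState() + " with "
        + rm2.recoveredAppCount() + " app(s) to recover");
  }
}
```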

 

> DRAINING state of queues can't be recovered after RM restart
> ------------------------------------------------------------
>
>                 Key: YARN-7003
>                 URL: https://issues.apache.org/jira/browse/YARN-7003
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.9.0, 3.0.0-alpha4
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-7003.001.patch, YARN-7003.002.patch, 
> YARN-7003.003.patch
>
>
> DRAINING is a temporary state held only in RM memory. When a queue's state 
> is set to STOPPED but it still has pending or active apps, the queue state 
> changes to DRAINING instead of STOPPED after refreshing queues. 
> We've encountered the problem that the state of such a queue will always be 
> STOPPED after the RM restarts, so it can be removed at any time, leaving 
> some apps in a non-existent queue.
> To fix this problem, we could recover the DRAINING state during the recovery 
> of pending/active apps. I will upload a patch with a test case later for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
