[ 
https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857831#comment-13857831
 ] 

Jian He commented on YARN-1493:
-------------------------------

bq. When submission is rejected by a parent queue, you need to call 
removeApplication. This existed before but your patch removed it.

The earlier addApplication has been renamed to addApplicationAttempt, and this 
addApplicationAttempt is called when the SchedulerAttemptAddedEvent arrives.
So we are no longer adding or removing any application data structure in the 
leaf queue at all; we are only adding/removing attempts in the leaf queue.
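
To illustrate the split described above, here is a minimal sketch, not the 
patch code. The class names Scheduler and LeafQueue, the String ids, and the 
method removeApplicationAttempt are hypothetical; only addApplicationAttempt, 
submitApplicationAttempt and finishApplicationAttempt are names taken from 
this discussion, and their bodies here are assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the app/attempt split; not the YARN-1493 patch code.
final class AppAttemptSplitSketch {

    // The leaf queue tracks attempts only and holds no app-level structure.
    static final class LeafQueue {
        private final Set<String> activeAttempts = new HashSet<>();

        void submitApplicationAttempt(String attemptId) {
            activeAttempts.add(attemptId);
        }

        void finishApplicationAttempt(String attemptId) {
            activeAttempts.remove(attemptId);
        }

        int numActiveAttempts() {
            return activeAttempts.size();
        }
    }

    // The scheduler owns app-level state and forwards attempt events to the queue.
    static final class Scheduler {
        // appId -> current attemptId (null while the app has no live attempt)
        private final Map<String, String> apps = new HashMap<>();
        private final LeafQueue queue = new LeafQueue();

        // App-level event: registers the app, does not touch the leaf queue.
        void addApplication(String appId) {
            apps.put(appId, null);
        }

        // Attempt-level event: only now does the leaf queue learn about anything.
        void addApplicationAttempt(String appId, String attemptId) {
            if (!apps.containsKey(appId)) {
                throw new IllegalStateException("Unknown application " + appId);
            }
            apps.put(appId, attemptId);
            queue.submitApplicationAttempt(attemptId);
        }

        // Attempt removal leaves the app-level entry in place.
        void removeApplicationAttempt(String appId, String attemptId) {
            queue.finishApplicationAttempt(attemptId);
            if (attemptId.equals(apps.get(appId))) {
                apps.put(appId, null);
            }
        }

        int numApplications() {
            return apps.size();
        }

        LeafQueue leafQueue() {
            return queue;
        }
    }
}
```

In this sketch, removing an attempt leaves the app entry in the scheduler, 
which is the property that matters for work-preserving AM restart.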

bq. finishApplicationAttempt: Should Inform the parent queue so that it can 
call finishApplicationAttempt itself. Similarly for submitApplicationAttempt.
ParentQueue's finishApplicationAttempt and submitApplicationAttempt logic is 
empty; ParentQueue only deals with app-specific logic in the current 
implementation. Do we still want to call parentQueue in attempt-specific APIs?

bq. We shouldn’t move to ACCEPTED directly before informing scheduler in case 
of recovery?
YARN-1507 saves the application after the app is accepted, so after YARN-1507 
a saved app is by definition an accepted app. Maybe leave it as is for now and 
fix it in YARN-1507?

bq. RMAppEventType.ATTEMPT_FAILED event should not come in at ACCEPTED state?
This is possible because RMAppRecoveredTransition is changed to return the 
ACCEPTED state and then wait for the AttemptFailed event to come (that is, wait 
for the previous AM to exit).
I changed it to ACCEPTED instead of RUNNING because, as said, after YARN-1507 
a saved app is an ACCEPTED app; the app may not necessarily have been in the 
RUNNING state earlier.

bq. When can this happen? During recovery? May be we should fix that correctly?
This can happen because I changed the app to return the ACCEPTED state on 
recovery, and on recovery the app once again goes through the scheduler and 
triggers one more APP_ACCEPTED event while already in the ACCEPTED state.
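
The two recovery cases above (ATTEMPT_FAILED and a repeated APP_ACCEPTED both 
arriving at ACCEPTED) can be sketched as a tiny transition function. This is 
an illustration only, not RMAppImpl: the enum and method names are 
hypothetical, treating the repeated APP_ACCEPTED as a no-op is an assumption, 
and sending ATTEMPT_FAILED straight to FAILED is a simplification (the real 
state machine may retry with a new attempt instead).

```java
// Hypothetical sketch of the recovery transitions; not the RMAppImpl state machine.
final class RecoveryTransitionSketch {

    enum State { NEW, ACCEPTED, FAILED }

    enum Event { RECOVER, APP_ACCEPTED, ATTEMPT_FAILED }

    // Returns the next state, or throws if the event is not valid in this state.
    static State next(State current, Event event) {
        switch (current) {
            case NEW:
                // From the comment: on recovery the app returns to ACCEPTED.
                if (event == Event.RECOVER) {
                    return State.ACCEPTED;
                }
                break;
            case ACCEPTED:
                // The recovered app goes through the scheduler again, so a second
                // APP_ACCEPTED arrives here. Assumption: treat it as a no-op.
                if (event == Event.APP_ACCEPTED) {
                    return State.ACCEPTED;
                }
                // The previous AM exits while the app waits in ACCEPTED.
                // Simplification: fail the app; a real implementation may retry.
                if (event == Event.ATTEMPT_FAILED) {
                    return State.FAILED;
                }
                break;
            default:
                break;
        }
        throw new IllegalStateException(
            "Invalid event " + event + " at state " + current);
    }
}
```

Without the two ACCEPTED cases, both events would fall through to the 
invalid-event error in this sketch, which is why the extra transitions are 
needed once recovery returns ACCEPTED.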

bq. TestFairScheduler: Why the conditional?
Because testAclSubmitApplication checks that app2 is null (assertNull("The 
application was allowed", app2)): the app is rejected, so no app exists.

> Schedulers don't recognize apps separately from app-attempts
> ------------------------------------------------------------
>
>                 Key: YARN-1493
>                 URL: https://issues.apache.org/jira/browse/YARN-1493
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, 
> YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch
>
>
> Today, scheduler is tied to attempt only.
> We need to separate app-level handling logic in scheduler. We can add new 
> app-level events to the scheduler and separate the app-level logic out. This 
> is good for work-preserving AM restart, RM restart, and also needed for 
> differentiating app-level metrics and attempt-level metrics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
