[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857831#comment-13857831 ]
Jian He commented on YARN-1493:
-------------------------------

bq. When submission is rejected by a parent queue, you need to call removeApplication. This existed before but your patch removed it.
The earlier addApplication is renamed to addApplicationAttempt, and this addApplicationAttempt is called when the SchedulerAttemptAddedEvent comes. So we are not adding or removing any application data structure in the leaf queue at all; we are only adding/removing attempts in the leaf queue.

bq. finishApplicationAttempt: Should Inform the parent queue so that it can call finishApplicationAttempt itself. Similarly for submitApplicationAttempt.
ParentQueue’s finishApplicationAttempt and submitApplicationAttempt logic is empty; ParentQueue only deals with app-specific logic in the current implementation. Do we still want to call parentQueue in attempt-specific APIs?

bq. We shouldn’t move to ACCEPTED directly before informing scheduler in case of recovery?
YARN-1507 saves the application after the app is accepted. So after YARN-1507, an app being saved means it has been accepted. Maybe leave it for now and fix it in YARN-1507?

bq. RMAppEventType.ATTEMPT_FAILED event should not come in at ACCEPTED state?
This is possible because RMAppRecoveredTransition is changed to return the ACCEPTED state and then waits for the AttemptFailed event to come (that is, for the previous AM to exit). I changed it to ACCEPTED instead of RUNNING because, as said above, after YARN-1507 an app being saved means it is ACCEPTED; the app may not necessarily have been in the RUNNING state earlier.

bq. When can this happen? During recovery? May be we should fix that correctly?
This can happen because I changed the app to return the ACCEPTED state on recovery, and on recovery the app once again goes through the scheduler and triggers one more APP_ACCEPTED event while in the ACCEPTED state.

bq. TestFairScheduler: Why the conditional?
Because testAclSubmitApplication is testing app2 to be null (AssertNull("The application was allowed", app2)): the app is rejected and no app exists.

> Schedulers don't recognize apps separately from app-attempts
> ------------------------------------------------------------
>
>                 Key: YARN-1493
>                 URL: https://issues.apache.org/jira/browse/YARN-1493
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch,
> YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch
>
>
> Today, scheduler is tied to attempt only.
> We need to separate app-level handling logic in scheduler. We can add new
> app-level events to the scheduler and separate the app-level logic out. This
> is good for work-preserving AM restart, RM restart, and also needed for
> differentiating app-level metrics and attempt-level metrics.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
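
For readers following the thread, the app/attempt split discussed here can be sketched roughly as below. This is only an illustration under assumptions: the class SchedulerSketch, the AppRecord type, and the on... method names are all hypothetical and are not the actual YARN scheduler classes, events, or signatures from the patch. The point it tries to show is that app-level events create and remove the application record, while attempt-level events only touch attempt bookkeeping (standing in here for the leaf queue).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: these are NOT the real YARN classes or signatures.
class SchedulerSketch {

    // App-level record: created once per application and kept across attempts.
    static final class AppRecord {
        final String queueName;
        int attemptsStarted = 0;
        String currentAttemptId = null; // null while no attempt is live

        AppRecord(String queueName) {
            this.queueName = queueName;
        }
    }

    // App-level state, owned by the scheduler itself.
    private final Map<String, AppRecord> apps = new HashMap<>();
    // Attempt-level state, standing in for what a leaf queue would track.
    private final Set<String> leafQueueAttempts = new HashSet<>();

    // App-level event: register the application; no attempt exists yet.
    void onAppAdded(String appId, String queueName) {
        apps.putIfAbsent(appId, new AppRecord(queueName));
    }

    // Attempt-level event: only attempt bookkeeping changes in the leaf queue.
    void onAttemptAdded(String appId, String attemptId) {
        AppRecord app = apps.get(appId);
        if (app == null) {
            throw new IllegalStateException("unknown application: " + appId);
        }
        app.attemptsStarted++;
        app.currentAttemptId = attemptId;
        leafQueueAttempts.add(attemptId);
    }

    // Attempt-level event: the attempt goes away, the app record stays.
    void onAttemptRemoved(String appId, String attemptId) {
        leafQueueAttempts.remove(attemptId);
        AppRecord app = apps.get(appId);
        if (app != null && attemptId.equals(app.currentAttemptId)) {
            app.currentAttemptId = null;
        }
    }

    // App-level event: drop the application and any attempt still attached.
    void onAppRemoved(String appId) {
        AppRecord app = apps.remove(appId);
        if (app != null && app.currentAttemptId != null) {
            leafQueueAttempts.remove(app.currentAttemptId);
        }
    }

    boolean hasApp(String appId) {
        return apps.containsKey(appId);
    }

    int liveAttempts() {
        return leafQueueAttempts.size();
    }

    int attemptsStarted(String appId) {
        AppRecord app = apps.get(appId);
        return app == null ? 0 : app.attemptsStarted;
    }
}
```

In this sketch, removing an attempt leaves the application registered, so a second attempt (for example after an AM restart) can be added against the same app record, and app-level and attempt-level counts can be reported separately.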