[ https://issues.apache.org/jira/browse/YARN-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Yang updated YARN-6737:
---------------------------
    Attachment: YARN-6737.001.patch

Uploaded v1 patch for trunk. Sorry for the late update.

I have scanned all usages of AbstractYarnScheduler#getApplicationAttempt and CapacityScheduler#getApplicationAttempt and found one potential problem in QueuePriorityContainerCandidateSelector#preChecksForMovingReservedContainerToNode:
{code}
FiCaSchedulerApp app =
    preemptionContext.getScheduler().getCurrentApplicationAttempt(
        reservedContainer.getApplicationAttemptId());
if (!app.getAppSchedulingInfo().canDelayTo(
    reservedContainer.getAllocatedSchedulerKey(), ResourceRequest.ANY)) {
  // This is a hard locality request
  return false;
}
{code}
An NPE can occur here if the app no longer exists. I think we can fix this by adding a null check for app, like this (the outer caller will skip this invalid reservedContainer):
{code}
FiCaSchedulerApp app =
    preemptionContext.getScheduler().getCurrentApplicationAttempt(
        reservedContainer.getApplicationAttemptId());
if (app == null || !app.getAppSchedulingInfo().canDelayTo(
    reservedContainer.getAllocatedSchedulerKey(), ResourceRequest.ANY)) {
  // This is a hard locality request
  return false;
}
{code}
[~sunilg] Please help to review this patch. Thanks!
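For reviewers who want to see the guard in isolation, here is a minimal, self-contained sketch of the same pattern. It is not the patch itself: App, Scheduler, canDelayToAny and the String application IDs are hypothetical stand-ins for FiCaSchedulerApp, the scheduler and AppSchedulingInfo#canDelayTo, and the sketch assumes that the lookup returns null once the application has been removed.

```java
import java.util.HashMap;
import java.util.Map;

public class ReservedContainerPreCheck {

    // Hypothetical stand-in for FiCaSchedulerApp; not the real YARN class.
    static final class App {
        private final boolean canDelay;

        App(boolean canDelay) {
            this.canDelay = canDelay;
        }

        // Hypothetical simplification of canDelayTo(schedulerKey, ResourceRequest.ANY).
        boolean canDelayToAny() {
            return canDelay;
        }
    }

    // Hypothetical stand-in for the scheduler's application lookup.
    static final class Scheduler {
        private final Map<String, App> currentAttempts = new HashMap<>();

        void addApp(String appId, App app) {
            currentAttempts.put(appId, app);
        }

        void removeApp(String appId) {
            currentAttempts.remove(appId);
        }

        // Assumption: returns null when the application no longer exists.
        App getCurrentAttempt(String appId) {
            return currentAttempts.get(appId);
        }
    }

    // Returns false when the reserved container should be skipped.
    static boolean preCheck(Scheduler scheduler, String appId) {
        App app = scheduler.getCurrentAttempt(appId);
        // || short-circuits, so canDelayToAny() is never called on a null app.
        if (app == null || !app.canDelayToAny()) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        scheduler.addApp("app_1", new App(true));
        scheduler.addApp("app_2", new App(false));

        System.out.println(preCheck(scheduler, "app_1")); // true
        System.out.println(preCheck(scheduler, "app_2")); // false (hard locality)

        scheduler.removeApp("app_1");
        System.out.println(preCheck(scheduler, "app_1")); // false (app gone, no NPE)
    }
}
```

Because the null test comes first in the || expression, a removed application takes the same "return false" path as a hard locality request, which is what lets the outer caller skip the stale reserved container.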
> Rename getApplicationAttempt to getCurrentAttempt in AbstractYarnScheduler/CapacityScheduler
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-6737
>                 URL: https://issues.apache.org/jira/browse/YARN-6737
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.9.0, 3.0.0-alpha3
>            Reporter: Tao Yang
>            Priority: Minor
>         Attachments: YARN-6737.001.patch
>
>
> As discussed in YARN-6714
> (https://issues.apache.org/jira/browse/YARN-6714?focusedCommentId=16052158&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16052158),
> AbstractYarnScheduler#getApplicationAttempt is inconsistent with its name: it
> discards the application_attempt_id and always returns the latest attempt. We
> should:
> 1) Rename it to getCurrentAttempt.
> 2) Change the parameter from attemptId to applicationId.
> 3) Scan all usages to see if any similar issue could happen.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)