[ https://issues.apache.org/jira/browse/YARN-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Yang updated YARN-7591:
---------------------------
    Attachment: YARN-7591.001.patch

Attaching an initial patch without unit tests for review.
[~leftnoteasy], please help review it when you have time.
Another question: writing detailed unit tests for these cases will take a lot of work. Could you give some suggestions?

> NPE in async-scheduling mode of CapacityScheduler
> -------------------------------------------------
>
>                 Key: YARN-7591
>                 URL: https://issues.apache.org/jira/browse/YARN-7591
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0-alpha4, 2.9.1
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>         Attachments: YARN-7591.001.patch
>
>
> Currently, in async-scheduling mode of CapacityScheduler, an NPE may be raised
> in the special scenarios below.
> (1) A user is removed after its last application finishes, so an NPE may be
> raised if async-scheduling threads read from the user object without a null
> check.
> (2) An NPE may be raised when trying to fulfill a reservation for a finished
> application in {{CapacityScheduler#allocateContainerOnSingleNode}}.
> {code}
>     RMContainer reservedContainer = node.getReservedContainer();
>     if (reservedContainer != null) {
>       FiCaSchedulerApp reservedApplication = getCurrentAttemptForContainer(
>           reservedContainer.getContainerId());
>       // NPE here: reservedApplication could be null after this application finished
>       // Try to fulfill the reservation
>       LOG.info(
>           "Trying to fulfill reservation for application " + reservedApplication
>               .getApplicationId() + " on node: " + node.getNodeID());
> {code}
> (3) If proposal1 (allocate containerX on node1) and proposal2 (reserve
> containerY on node1) were generated by different async-scheduling threads
> around the same time and proposal2 was submitted ahead of proposal1, an NPE
> is raised when trying to submit proposal2 in
> {{FiCaSchedulerApp#commonCheckContainerAllocation}}.
> {code}
>     if (reservedContainerOnNode != null) {
>       // NPE here: allocation.getAllocateFromReservedContainer() should be null for proposal2 in this case
>       RMContainer fromReservedContainer =
>           allocation.getAllocateFromReservedContainer().getRmContainer();
>       if (fromReservedContainer != reservedContainerOnNode) {
>         if (LOG.isDebugEnabled()) {
>           LOG.debug(
>               "Try to allocate from a non-existed reserved container");
>         }
>         return false;
>       }
>     }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
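
For scenarios (2) and (3) in the quoted description, the kind of null guard under discussion might look roughly like the sketch below. This is an illustration only, not the contents of YARN-7591.001.patch: every class, field, and method name here is a hypothetical stand-in (plain strings replace {{RMContainer}}, a small {{App}} class replaces {{FiCaSchedulerApp}}), and string equality replaces the reference comparison used in the original snippet.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-ins only; none of these are the real YARN classes. */
final class AsyncNullGuardSketch {

    /** Stand-in for an application attempt. */
    static final class App {
        final String applicationId;
        App(String applicationId) { this.applicationId = applicationId; }
    }

    /** Stand-in for an allocation proposal. */
    static final class Proposal {
        // Null when the proposal does not allocate from a reserved container.
        final String fromReservedContainer;
        Proposal(String fromReservedContainer) {
            this.fromReservedContainer = fromReservedContainer;
        }
    }

    // Live attempts keyed by container id; an entry disappears when its app finishes.
    private final Map<String, App> attemptsByContainer = new ConcurrentHashMap<>();

    void register(String containerId, App app) { attemptsByContainer.put(containerId, app); }

    void finish(String containerId) { attemptsByContainer.remove(containerId); }

    /** Scenario (2): skip the reservation if its application has already finished. */
    boolean tryFulfillReservation(String reservedContainerId, String nodeId) {
        if (reservedContainerId == null) {
            return false; // nothing reserved on this node
        }
        App reservedApp = attemptsByContainer.get(reservedContainerId);
        if (reservedApp == null) {
            // The app finished between the reservation and this scheduling pass.
            return false;
        }
        System.out.println("Trying to fulfill reservation for application "
            + reservedApp.applicationId + " on node: " + nodeId);
        return true;
    }

    /** Scenario (3): reject a proposal that carries no reserved container. */
    boolean commonCheck(Proposal proposal, String reservedContainerOnNode) {
        if (reservedContainerOnNode != null) {
            String from = proposal.fromReservedContainer;
            // Check for null before comparing, instead of dereferencing blindly.
            if (from == null || !from.equals(reservedContainerOnNode)) {
                return false;
            }
        }
        return true;
    }
}
```

In both methods the guard turns a would-be NPE into a rejected attempt, on the assumption that the async-scheduling thread can simply retry on its next pass. Whether the real fix should reject, retry, or log at these points is a design question for the patch review.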