[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15658402#comment-15658402 ]
Konstantinos Karanasos commented on YARN-4597: ---------------------------------------------- Here are some comments on the {{ContainerScheduler}}: - {{queuedOpportunisticContainers}} will have concurrency issues. We are updating it when containers arrive but also in the {{shedQueuedOpportunisticContainers}}. - {{queuedGuaranteedContainers}} and {{queuedOpportunisticContainers}}: I think we should use queues. I don't think we retrieve the container by the key anywhere either ways. - {{oppContainersMarkedForKill}}: could be a Set, right? - {{scheduledToRunContainers}} are containers that are either already running or are going to run very soon (transitioning from SCHEDULED to RUNNING state). Name is a bit misleading, because it sounds like they are only the ones belonging to the second category. I would rather say {{runningContainers}} and specify in a comment that they might not be running at this very moment but will be running very soon. - In the {{onContainerCompleted()}}, the {{scheduledToRunContainers.remove(container.getContainerId())}} and the {{startPendingContainers()}} can go inside the if statement above. If the container was not running and no resources were freed up, we don't need to call the {{startPendingContainers()}}. - fields of the {{opportunisticContainersStatus}} are set in different places. Due to that, when we call {{getOpportunisticContainersStatus()}} we may see an inconsistent object. Let's set the fields only in the {{getOpportunisticContainersStatus()}}. - line 252, indeed we can now do extraOpportContainersToKill -> opportContainersToKill, as Karthik mentioned at a comment. - line 87: increase -> increases - {{shedQueuedOpportunisticContainers}}: -- {{numAllowed}} is the number of allowed containers in the queue. Instead, we are killing numAllowed containers. In other words, we should not kill numAllowed, but {{queuedOpportunisticContainers.size() - numAllowed}}. -- "Container Killed to make room for Guaranteed Container." -> "Container killed to meet NM queuing limits". Instead of kill, you can also say de-queued. > Add SCHEDULE to NM container lifecycle > -------------------------------------- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager > Reporter: Chris Douglas > Assignee: Arun Suresh > Labels: oct16-hard > Attachments: YARN-4597.001.patch, YARN-4597.002.patch, > YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, > YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, > YARN-4597.009.patch > > > Currently, the NM immediately launches containers after resource > localization. Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org