[ 
https://issues.apache.org/jira/browse/YARN-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516505#comment-16516505
 ] 

Miklos Szegedi commented on YARN-6794:
--------------------------------------

Thank you for the patch [~haibochen]. I have the following comments:
{code:java}
283 Resources.subtractFrom(allocatedResourceGuaranteed, resource);{code}
I think this should refer to allocatedResourceOpportunistic
{code:java}
526     Resource resource = rmContainer.getContainer().getResource();
527     if (Resources.fitsIn(resource, node.getUnallocatedResource())) {
528     node.promoteOpportunisticContainer(rmContainer);{code}
This is outside the critical section. The unallocated space may change after 
the check but before the promotion.
{code:java}
529     attemptOpportunisticResourceUsage.decUsed(resource);
530     attemptResourceUsage.incUsed(resource);
531     getQueue().incUsedGuaranteedResource(resource);{code}
SchedulerApplicationAttempt.getResourceUsageReport may return a transient 0 
since the write lock is not held at this point.
{code:java}
1391    public Resource getGuaranteedResourceUsage() {{code}
It might be safer to do a clone here and return a read only copy.
{code:java}
162     @Override
163     public void promoteOpportunisticContainer(RMContainer rmContainer) {
164     super.promoteOpportunisticContainer(rmContainer);
165     priorityContainers.remove(rmContainer);
166     }{code}
This might cause a double promotion. The function is not synchronized, so we do 
the promotion but priorityContainers can be sampled again with the same 
container listed before this finishes.
{code:java}
321     public List<RMContainer> getPriorityContainers() {{code}
It is probably a good idea to have this synchronized as well. The list can 
change while you grab the snapshot.
{code:java}
1124    //boolean validReservation = 
attemptToAssignReservedResources(node);{code}
?
{code:java}
1165    if (!assigned) {
1166    // break out of the loop because assigned being false
1167    // indicates there is no more resources are available.
1168    break;
1169    }{code}
Is this really true? I could imagine that we try but cannot assign a reserved 
resource but we do not even try the opportunistic queue in that case.

 

> Fair Scheduler to explicitly promote OPPORTUNISITIC containers locally at the 
> node where they're running
> --------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6794
>                 URL: https://issues.apache.org/jira/browse/YARN-6794
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, scheduler
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Major
>         Attachments: YARN-6794-YARN-1011.00.patch, 
> YARN-6794-YARN-1011.prelim.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to