[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15661566#comment-15661566
 ] 

Arun Suresh edited comment on YARN-4597 at 11/13/16 2:31 PM:
-------------------------------------------------------------

Updating patch addressing [~kkaranasos]'s and [~kasha]'s comments.

Thanks both of you for the detailed reviews.

I've responded to Karthik's comments directly on github.
[~kkaranasos], I've addressed your comments except for the following.

1.
bq. queuedOpportunisticContainers will have concurrency issues. We are updating 
it when containers arrive but also in the shedQueuedOpportunisticContainers.
Good catch. I've fixed it in the latest patch by sending a 
'SHED_QUEUED_CONTAINERS' event to the ContainerScheduler when the Node HB 
response from the RM asks to shed queued containers. In addition to preserving 
the invariant that ContainerScheduler operations are serialized, this also 
ensures that the Node HB thread is not blocked.
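
To illustrate the hand-off (a minimal sketch of the event-dispatch pattern only; the class, enum, and method names below are placeholders, not the actual ContainerScheduler or dispatcher code):
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ShedEventSketch {

  enum SchedulerEventType { SHED_QUEUED_CONTAINERS }

  static class SchedulerEvent {
    final SchedulerEventType type;
    final int maxAllowedQueueLength; // payload from the RM's HB response
    SchedulerEvent(SchedulerEventType type, int maxAllowedQueueLength) {
      this.type = type;
      this.maxAllowedQueueLength = maxAllowedQueueLength;
    }
  }

  private final BlockingQueue<SchedulerEvent> eventQueue =
      new LinkedBlockingQueue<>();

  // Called from the node heartbeat thread: a non-blocking hand-off only.
  public void onShedRequestFromRM(int maxAllowedQueueLength) {
    eventQueue.offer(new SchedulerEvent(
        SchedulerEventType.SHED_QUEUED_CONTAINERS, maxAllowedQueueLength));
  }

  // A single scheduler thread drains events, so all queue mutations
  // (arrivals as well as shedding) stay serialized on one thread.
  public void processEvents() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      SchedulerEvent e = eventQueue.take();
      if (e.type == SchedulerEventType.SHED_QUEUED_CONTAINERS) {
        // shedQueuedOpportunisticContainers(e.maxAllowedQueueLength);
      }
    }
  }
}
{code}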

2.
bq. shedQueuedOpportunisticContainers: numAllowed is the number of allowed 
containers in the queue. Instead, we are killing numAllowed containers. In 
other words, we should not kill numAllowed, but 
queuedOpportunisticContainers.size() - numAllowed.
Even though I agreed with you offline, I took another look, and the logic is 
actually correct. The {{numAllowed}} counter is initialized to maxAllowed and 
then decremented in the loop; containers are killed only AFTER its value goes 
<= 0, so only the excess queuedOpportunisticContainers.size() - maxAllowed 
containers are killed. In any case, I've added a test case in 
'TestContainerSchedulerQueuing' to verify that this actually works.
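
For reference, the shape of the loop (an illustrative standalone version, not the patch itself):
{code:java}
import java.util.Iterator;
import java.util.LinkedHashMap;

public class ShedQueueSketch {

  // The first maxAllowed entries only decrement the counter; every entry
  // after that is removed, i.e. queue.size() - maxAllowed are killed.
  static <K, V> int shedQueued(LinkedHashMap<K, V> queued, int maxAllowed) {
    int numAllowed = maxAllowed;
    int killed = 0;
    Iterator<K> it = queued.keySet().iterator();
    while (it.hasNext()) {
      it.next();
      if (numAllowed > 0) {
        numAllowed--;          // still within the allowed queue length
      } else {
        it.remove();           // the real code sends a kill event here
        killed++;
      }
    }
    return killed;
  }

  public static void main(String[] args) {
    LinkedHashMap<String, String> q = new LinkedHashMap<>();
    for (int i = 0; i < 5; i++) {
      q.put("container_" + i, "opportunistic");
    }
    // Kills queue.size() - maxAllowed = 5 - 2 = 3 containers, not maxAllowed.
    System.out.println(shedQueued(q, 2) + " killed, " + q.size() + " remain");
  }
}
{code}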

3.
bq. line 252, indeed we can now do extraOpportContainersToKill -> 
opportContainersToKill, as Karthik mentioned at a comment.
I think 'extra' is still apt: as I mentioned to Karthik, these are 'extra' 
opportunistic containers over and above what is already present in 
'oppContainersToKill'.

4.
bq. queuedGuaranteedContainers and queuedOpportunisticContainers: I think we 
should use queues. I don't think we retrieve the container by the key anywhere 
either ways.
Karthik mentioned this in his comments too. A LinkedHashMap is essentially an 
indexed queue. Additionally, there is actually one case where we need to 
retrieve by key: when the AM asks to kill a container that is queued. 
Furthermore, queue re-ordering etc. might be easier with a map... Let's keep 
it as a LinkedHashMap unless we find it is detrimental in some way.
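
Roughly what the LinkedHashMap buys us (an illustrative sketch; the names are placeholders, with String standing in for ContainerId and Container):
{code:java}
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueuedContainersSketch {

  // ContainerId -> Container in the real code; insertion order is preserved.
  private final LinkedHashMap<String, String> queuedOpportunistic =
      new LinkedHashMap<>();

  // Enqueue at the tail: LinkedHashMap iteration order is FIFO.
  void enqueue(String containerId, String container) {
    queuedOpportunistic.put(containerId, container);
  }

  // Dequeue from the head, as a plain queue would.
  String pollFirst() {
    Iterator<Map.Entry<String, String>> it =
        queuedOpportunistic.entrySet().iterator();
    if (!it.hasNext()) {
      return null;
    }
    String head = it.next().getValue();
    it.remove();
    return head;
  }

  // The one keyed lookup: the AM asks to kill a container that is still queued.
  String removeQueued(String containerId) {
    return queuedOpportunistic.remove(containerId);
  }
}
{code}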

5.
bq. oppContainersMarkedForKill: could be a Set, right?
Technically yes, but I would have to modify Container to add equals and 
hashCode too, which I felt was too much of a hassle. I'd prefer to keep it as 
it is.

6.
bq. fields of the opportunisticContainersStatus are set in different places. 
Due to that, when we call getOpportunisticContainersStatus() we may see an 
inconsistent object. Let's set the fields only in the 
getOpportunisticContainersStatus().
Agreed... I've also added the oppMem, oppCores & numOpp values to the 
NodeManagerMetrics.
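
The idea, sketched with placeholder types (not the actual OpportunisticContainersStatus API): assemble the whole status in the getter, so a caller never sees a half-updated mix of fields.
{code:java}
public class OpportunisticStatusSketch {

  // Placeholder for the real OpportunisticContainersStatus record.
  static final class Status {
    final int queuedOppContainers;
    final long oppMemoryMB;
    final int oppVCores;
    Status(int queuedOppContainers, long oppMemoryMB, int oppVCores) {
      this.queuedOppContainers = queuedOppContainers;
      this.oppMemoryMB = oppMemoryMB;
      this.oppVCores = oppVCores;
    }
  }

  private volatile int queuedOppContainers;
  private volatile long oppMemoryMB;
  private volatile int oppVCores;

  // All fields are read and assembled in one place on every call, instead of
  // being set on a shared status object from different code paths.
  Status getOpportunisticContainersStatus() {
    return new Status(queuedOppContainers, oppMemoryMB, oppVCores);
  }
}
{code}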

7.
bq. There seem to be two redundant parameters at YarnConfiguration at the 
moment: NM_CONTAINER_QUEUING_MIN_QUEUE_LENGTH and 
NM_OPPORTUNISTIC_CONTAINERS_MAX_QUEUE_LENGTH. If I am not missing something, we 
should keep one of the two.
Actually, they are a bit different. NM_CONTAINER_QUEUING_MIN_QUEUE_LENGTH and 
the corresponding max value are used by the RM to calculate a limit value for 
the queue, and the queue can momentarily go above that. 
NM_OPPORTUNISTIC_CONTAINERS_MAX_QUEUE_LENGTH, on the other hand, is used by 
the NM to prevent queuing beyond that value; it is a hard limit in the 
configuration ([~kasha] had requested it).
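
A rough sketch of the difference in semantics (not the actual RM/NM code paths):
{code:java}
public class QueueLimitSketch {

  // RM side: the queuing limit advertised to NMs is derived from cluster load
  // and clamped to the configured [min, max] range; a node's queue can
  // momentarily exceed the computed value.
  static int rmComputedQueueLimit(int estimatedLimit,
                                  int minQueueLength,
                                  int maxQueueLength) {
    return Math.max(minQueueLength, Math.min(maxQueueLength, estimatedLimit));
  }

  // NM side: the opportunistic max queue length is a hard cap checked when a
  // new opportunistic container arrives; nothing is queued beyond it.
  static boolean canQueueOpportunistic(int currentQueueLength,
                                       int oppMaxQueueLength) {
    return currentQueueLength < oppMaxQueueLength;
  }
}
{code}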
 




> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>              Labels: oct16-hard
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, 
> YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, 
> YARN-4597.009.patch, YARN-4597.010.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.


