[ 
https://issues.apache.org/jira/browse/YARN-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636102#comment-13636102
 ] 

Bikas Saha commented on YARN-562:
---------------------------------

The name clusterTimeStamp confuses me. What time is it? That value is used 
essentially as an RM generation id or RM identifier. Currently we are using a 
timestamp, but tomorrow with HA we may have to move to a monotonically 
increasing number because machines may have time lag between them. So let's 
call it RMIdentifier or something like that. This also makes the NM code 
simpler to understand because we are saying containers must come from the RM 
that has the same identifier as the RM currently connected to the NM (which 
could be different RMs for different NMs if need be in the future).

Secondly, I don't think containerManager.setBlockNewContainerRequests() is 
needed. We can simply invalidate the RMIdentifier in the resync thread where 
we are currently calling setBlockNewContainerRequests(true). The identifier 
matching code would then reject all further containers. We don't need to 
remember to call setBlockNewContainerRequests(false), as the next registration 
will automatically set a new valid value for RMIdentifier. This will simply 
work as is for work-preserving restart.
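
To make the suggestion concrete, here is a rough sketch of the identifier-matching idea. It is not code from the patch: the class RmIdentifierGate, its method names, and the -1 sentinel for "no valid RM" are all assumptions made up for illustration.

```java
// Hypothetical sketch; names and the sentinel value are assumptions,
// not the actual NodeManager API.
final class RmIdentifierGate {
    // Assumed sentinel meaning "not currently registered with any RM".
    static final long INVALID_RM_IDENTIFIER = -1L;

    private long rmIdentifier = INVALID_RM_IDENTIFIER;

    // Called after a successful registration; installs the new RM's identifier.
    synchronized void onRegistered(long newRmIdentifier) {
        rmIdentifier = newRmIdentifier;
    }

    // Called from the resync path in place of setBlockNewContainerRequests(true).
    synchronized void onResync() {
        rmIdentifier = INVALID_RM_IDENTIFIER;
    }

    // Accept a container only if it was allocated by the RM we are registered with.
    synchronized boolean shouldAccept(long containerRmIdentifier) {
        return rmIdentifier != INVALID_RM_IDENTIFIER
                && containerRmIdentifier == rmIdentifier;
    }
}

final class RmIdentifierGateDemo {
    public static void main(String[] args) {
        RmIdentifierGate gate = new RmIdentifierGate();
        gate.onRegistered(1L);
        System.out.println(gate.shouldAccept(1L)); // container from the current RM
        gate.onResync();
        System.out.println(gate.shouldAccept(1L)); // rejected while resyncing
        gate.onRegistered(2L);
        System.out.println(gate.shouldAccept(1L)); // container from the previous RM
        System.out.println(gate.shouldAccept(2L)); // container from the new RM
    }
}
```

Note there is no explicit "unblock" step here: re-registration overwrites the invalid value, which is the point of the suggestion.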

Getting the value of RMIdentifier from NodeStatusUpdateImpl is not thread safe 
in the current patch. Instead of a synchronized method we could simply use an 
AtomicLong if it exists. Or we could use a synchronized setter in 
ContainerManager to set the new value. This should be rare enough.
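
As an illustration of the AtomicLong option, a minimal sketch follows. The holder class SharedRmIdentifier and the INVALID sentinel are hypothetical names, not something from the patch; only java.util.concurrent.atomic.AtomicLong is a real JDK class.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder shared between the status-updater thread (writer)
// and the container-launch path (reader). Names are illustrative only.
final class SharedRmIdentifier {
    static final long INVALID = -1L; // assumed sentinel for "no valid RM"

    private final AtomicLong value = new AtomicLong(INVALID);

    // Writer side: called on registration or resync.
    void set(long newIdentifier) {
        value.set(newIdentifier);
    }

    // Reader side: always observes the most recently published value.
    long get() {
        return value.get();
    }

    boolean matches(long containerRmIdentifier) {
        long current = value.get(); // read once so the check is self-consistent
        return current != INVALID && current == containerRmIdentifier;
    }
}

final class SharedRmIdentifierDemo {
    public static void main(String[] args) throws InterruptedException {
        SharedRmIdentifier id = new SharedRmIdentifier();
        // Stand-in for the status-updater thread publishing a new identifier.
        Thread statusUpdater = new Thread(() -> id.set(42L));
        statusUpdater.start();
        statusUpdater.join();
        System.out.println(id.matches(42L));
        System.out.println(id.matches(41L));
    }
}
```

Since updates only happen on registration or resync, either this or a synchronized setter should be cheap enough.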
                
> NM should reject containers allocated by previous RM
> ----------------------------------------------------
>
>                 Key: YARN-562
>                 URL: https://issues.apache.org/jira/browse/YARN-562
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-562.1.patch, YARN-562.2.patch, YARN-562.3.patch, 
> YARN-562.4.patch, YARN-562.5.patch, YARN-562.6.patch
>
>
> It's possible that after the RM shuts down, but before the AM goes down, the 
> AM still calls startContainer on the NM with containers allocated by the 
> previous RM. When the RM comes back, the NM doesn't know whether a container 
> launch request comes from the previous RM or the current RM. We should reject 
> containers allocated by the previous RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira