[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-6059:
    Attachment: YARN-6059-YARN-5972.012.patch

> Update paused container state in the state store
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Hitesh Sharma
> Assignee: Hitesh Sharma
> Priority: Blocker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-5216-YARN-6059.001.patch,
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch,
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch,
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch,
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch,
> YARN-6059-YARN-5972.009.patch, YARN-6059-YARN-5972.010.patch,
> YARN-6059-YARN-5972.011.patch, YARN-6059-YARN-5972.012.patch

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-6059:
    Attachment: YARN-6059-YARN-5972.010.patch

Retriggering Jenkins..
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-6059:
    Attachment: YARN-6059-YARN-5972.009.patch
[jira] [Commented] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895484#comment-15895484 ]

Hitesh Sharma commented on YARN-6059:

Thanks [~kkaranasos] for the excellent feedback, and sorry for the delay in getting back. I have resolved the comments in the latest patch. Unfortunately the logs for the style check and javadoc runs have been purged, so I will look at the results of the latest patch and resolve any remaining issues.
[jira] [Commented] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879783#comment-15879783 ]

Hitesh Sharma commented on YARN-5501:

Hi [~asuresh], thanks for the feedback and sorry for the delay in responding.

bq. From the doc, it looks like "detach" implies removing the pre-initialized container from the pool and "attach" refers to associating an app with a pooled container. It might be simpler if we treat the operation as atomic. In that sense, we can make do with just having an "attach" or "lease", where a pre-initialized container is associated with an app.

I'm not sure what atomic means here, but we need a detach so that the YARN machinery can be updated to reflect that the pre-initialized container was utilized. As part of detaching we can also associate the resources (files downloaded) by the pre-initialized container with the actual application container that is going to use it. This ensures that when the application container exits, all resources for the pre-initialized container get cleaned up as well.

bq. For the sake of simplicity, maybe we should assume that once an application is assigned a container from the pool and has "attached" to it, it is the application's container and the pooling framework relinquishes ownership of it. The container then completes normally and all resource accounting is billed against the app. The pool of containers can be re-populated externally by the pool manager component in the RM (beyond the scope of this currently).

Yes, agreed. We have the same thinking over here.

bq. This is one of the reasons why I feel generalized resources would be useful here. Assume initially we have a cluster with resources <10 vcores, 10 GB> spread across 2 NMs equally. Let's say we allocate 4 pre-initialized containers (via the pooling component in the RM) of type foo, each with <1 vcore, 1 GB>, and distribute them equally across the NMs. Once the pre-initialized containers have started, the total cluster resources would be <6 vcores, 6 GB, 4 foo>, and each NM would have <3 vcores, 3 GB, 2 foo> available. Now if an app asks for <0 vcores, 0 GB, 1 foo>, it will be allocated 1 pooled container, and the resources associated with 1 foo, <1 vcore, 1 GB>, can be accounted against the app. The app can also ask for <1 vcore, 1 GB, 1 foo>, in which case the app will still be assigned one of the pooled containers with the assumption that the container's size can expand by <1 vcore, 1 GB> if required. Cgroups/JobObjects would be used to enforce this.

Agreed.

bq. AM - container communication.

In our PoC we introduced a new API in the container executor (attachContainer) which is called when a pre-initialized container is used by an actual AM. Either the ContainerExecutor or the ContainerRuntime could be used for this purpose. For now, though, the application would need its own way of establishing communication with the pre-init container.

Thanks for the feedback, guys. Appreciate the time and help.

> Container Pooling in YARN
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Arun Suresh
> Assignee: Hitesh Sharma
> Attachments: Container Pooling in YARN.pdf, Container Pooling - one pager.pdf
>
> This JIRA proposes a method for reducing the container launch latency in YARN. It introduces a notion of pooling *Unattached Pre-Initialized Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create these unattached containers.
> * The NM would then advertise these containers as special resource types (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for launching a container requesting this specific type of resource, it will take one of these unattached pre-initialized containers from the pool and use it to service the container request.
> * Once the request is complete, the pre-initialized container would be released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby allow for development of more interactive applications on YARN.
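The generalized-resource arithmetic in the comment above can be sketched as a small program. Everything below is illustrative: the Res class is a made-up stand-in for a resource vector, not YARN's actual Resource API, and the numbers mirror the <vcores, GB, foo> example.

```java
// Sketch of the generalized-resource accounting from the <vcores, GB, foo>
// example above. "Res" is a hypothetical value type, not YARN's Resource API.
final class PoolAccounting {

    // An immutable resource vector: <vcores, memGB, foo>.
    static final class Res {
        final int vcores, memGB, foo;
        Res(int vcores, int memGB, int foo) {
            this.vcores = vcores;
            this.memGB = memGB;
            this.foo = foo;
        }
    }

    // Launching `pooled` pre-initialized containers of <1 vcore, 1 GB> each
    // converts that much plain capacity into `pooled` units of the special
    // "foo" resource type advertised by the NMs.
    static Res afterPreInit(Res cluster, int pooled) {
        return new Res(cluster.vcores - pooled, cluster.memGB - pooled, cluster.foo + pooled);
    }

    public static void main(String[] args) {
        Res cluster = new Res(10, 10, 0);      // 2 NMs x <5 vcores, 5 GB>
        Res avail = afterPreInit(cluster, 4);  // 4 pooled foo containers
        // Matches the comment: availability becomes <6 vcores, 6 GB, 4 foo>.
        System.out.println(avail.vcores + " " + avail.memGB + " " + avail.foo);  // 6 6 4
    }
}
```

An app request of <0 vcores, 0 GB, 1 foo> then simply decrements the foo count, and the <1 vcore, 1 GB> backing that foo unit is billed to the app.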
[jira] [Commented] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873303#comment-15873303 ]

Hitesh Sharma commented on YARN-6059:

[~kkaranasos], thank you for the great feedback. The updated patch resolves the issues you brought up. I'm not sure why those stylecheck issues are being reported, because things look OK to me; I will look at it again later.

One thing I'm not clear about is how to interpret the loadContainerState method in NMLeveldbStateStoreService. Can you elaborate on what the if conditions look for? They are quite confusing. I understand the part about suffixes not being removed, but what does the check on rcs.status do?

    else if (suffix.equals(CONTAINER_QUEUED_KEY_SUFFIX)) {
      if (rcs.status == RecoveredContainerStatus.REQUESTED) {
        rcs.status = RecoveredContainerStatus.QUEUED;
      }
    }
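One plausible reading of that guard can be illustrated in isolation. The sketch below uses simplified enum and key names rather than Hadoop's exact code; the assumption is that suffix keys accumulate under a container and are visited in lexicographic order, not event order.

```java
// Illustrative reconstruction of the recovery guard asked about above; enum
// and key names are simplified, this is not NMLeveldbStateStoreService itself.
import java.util.List;

final class RecoveryDemo {
    enum Status { REQUESTED, QUEUED, LAUNCHED }

    // Keys under one container are visited in lexicographic order, not in the
    // order the events happened, so a container that was queued and later
    // launched still has both suffix keys in the store.
    static Status recover(List<String> suffixes) {
        Status status = Status.REQUESTED;
        for (String suffix : suffixes) {
            if (suffix.equals("/launched")) {
                status = Status.LAUNCHED;
            } else if (suffix.equals("/queued")) {
                // The guard on the current status: only upgrade
                // REQUESTED -> QUEUED; never downgrade a LAUNCHED container.
                if (status == Status.REQUESTED) {
                    status = Status.QUEUED;
                }
            }
        }
        return status;
    }
}
```

On this (assumed) reading, the rcs.status check keeps a stale queued marker from masking a later launch, precisely because the suffix keys are not removed as the container moves forward.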
[jira] [Updated] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-5501:
    Attachment: Container Pooling in YARN.pdf
[jira] [Commented] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871293#comment-15871293 ]

Hitesh Sharma commented on YARN-5501:

Hi [~jlowe], thanks for the feedback. I have captured some of the discussion in the attached document. [~asuresh], [~vvasudev], please have a look and share your thoughts. Looking forward to the discussion.
[jira] [Comment Edited] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871293#comment-15871293 ]

Hitesh Sharma edited comment on YARN-5501 at 2/17/17 6:50 AM:

Hi [~jlowe], thanks for the feedback. I have captured some of the discussion in the attached document. [~asuresh], [~vvasudev], please have a look and share your thoughts. Looking forward to the discussion.

was (Author: hrsharma):
Hi [~jlowe], thanks for the feedback. I have captured some of the discussion in the attached document. [~arun suresh], [~vvasudev], please have a look and share your thoughts. Looking forward to the discussion.
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-6059:
    Attachment: YARN-6059-YARN-5972.008.patch
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-6059:
    Attachment: YARN-6059-YARN-5972.007.patch
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-6059:
    Attachment: YARN-6059-YARN-5972.006.patch
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-6059:
    Attachment: YARN-6059-YARN-5972.005.patch
[jira] [Commented] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864980#comment-15864980 ]

Hitesh Sharma commented on YARN-6059:

Thanks for the feedback, [~kkaranasos]. I have resolved the issues and posted a new patch. Please note that some of the failures in the unit tests and javadoc aren't due to my changes, so I think we can ignore them.
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-6059:
    Attachment: YARN-6059-YARN-5972.004.patch
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-6059:
    Attachment: YARN-6059-YARN-5972.003.patch

Resolving CR comments.
[jira] [Commented] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862607#comment-15862607 ]

Hitesh Sharma commented on YARN-6059:

Hi [~kkaranasos], thanks for the feedback! [~asuresh], can you also take a look, please?

bq. The paused events are not stored in the NMStateStore. You need to add that in the ContainerScheduler, as we do for the QUEUED containers, e.g., with this.context.getNMStateStore().storeContainerQueued.

Thanks, fixed this.

bq. You need to make sure that, when a PAUSED container is relaunched, we add a new entry to the NMStateStore to mark it as RUNNING again.

We don't launch the paused container on recovery; instead we simply kill it.

bq. In the RecoverPausedContainerLaunch, you should raise a ContainerEvent to indicate that the container finished its execution, like we do with the other *ContainerLaunch classes, with something like the following:

Hmm... since we kill the container, an event for that will be raised automatically. Am I missing something?

bq. In RecoveredContainerLaunch, indentation needs to be fixed.

Can you elaborate? I don't see any stylecheck errors for this.
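The storeContainerQueued pattern mentioned in that review could plausibly be extended to a paused marker along these lines. This is a hedged sketch only: a HashMap stands in for the leveldb store, and the key names are made up, not Hadoop's actual schema.

```java
// Sketch of persisting a paused marker next to the queued/launched markers,
// with a HashMap standing in for the leveldb store. Key names are made up.
import java.util.HashMap;
import java.util.Map;

final class StateStoreSketch {
    private final Map<String, byte[]> db = new HashMap<>();  // leveldb stand-in
    private static final byte[] EMPTY = new byte[0];

    private static String containerKey(String containerId, String suffix) {
        return "ContainerManager/containers/" + containerId + suffix;
    }

    // Called by the ContainerScheduler when a container is paused.
    void storeContainerPaused(String containerId) {
        db.put(containerKey(containerId, "/paused"), EMPTY);
    }

    // Removed on resume (or kill) so recovery never sees a stale paused state.
    void removeContainerPaused(String containerId) {
        db.remove(containerKey(containerId, "/paused"));
    }

    boolean isPaused(String containerId) {
        return db.containsKey(containerKey(containerId, "/paused"));
    }
}
```

Recovery would then treat a container whose paused key survives a restart the way the comment describes: not relaunched, just killed and cleaned up.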
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Sharma updated YARN-6059:
    Attachment: YARN-6059-YARN-5972.002.patch
[jira] [Commented] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860407#comment-15860407 ]

Hitesh Sharma commented on YARN-5501:

Thanks for the continued feedback, [~jlowe].

bq. Thanks for the detailed answers. I highly recommend these get captured in a new revision of the design document along with diagrams to help others come up to speed and join the discussion. Otherwise we have a wall of text in this JIRA that is essential reading for anyone understanding what is really being proposed.

Sure, I will do that.

bq. My thinking is the mere fact that the container finishes is indicative that it is ready for reuse. Maybe there's an explicit API they call at the end of the task or whatever, but the same has to be done for this existing design as well -- we need to know when the container is ready to be reused. The main difference I see in this approach is that there isn't an explicit 'pre-init' step where the users or admins need to premeditate what will be run. Instead the first run of the app framework has the same performance it does today, but subsequent runs are faster since it can reuse those cached containers. Seems to me the most difficult part of this is coming up with an efficient container request protocol so YARN can know when it can reuse an old, cached container and when it cannot. The existing proposal works around this by requiring the containers to be set up beforehand as special resource types, but that won't work for a general container caching approach.

Yes, once we have a way to know that the container is ready to be reused, we can issue a detach on it and add it to the container pool. We had a similar issue in our PoC, where we needed to know whether a container was pre-initialized or not (the launched process can take some minutes to become fully ready). Since, as you also said, there is no protocol between the YARN NM and the container to know what's going on, we ended up looking for a trigger file in the pre-init container's working directory to detect that pre-initialization is done; the container is inducted into the pool after that. We could use something similar here: containers create a trigger file, the launcher looks for it, and if found it detaches the container and inducts it into the pool.

bq. They certainly are enforced, but how does the app know about the new constraints so they can either avoid getting shot or take advantage of the new space? Simply updating the cgroup is not going to be sufficient. Either the process will OOM because it slams into the new lower limit (potentially instantly if it is already bigger than the new limit) or it will be completely oblivious that it now has acres of memory that it can use. If it tried to use it before it would fail, so how does it know it grew? For example, the JVM can't magically do this unless the app is doing some sort of explicit off-heap memory management via direct buffers, etc. and is told about its memory limit. Simply updating the cgroup setting doesn't seem to be a sufficient communication channel here, so I'm curious how that's all you need to do for your scenario.

It is OK for our scenario because the processes we pre-initialize don't do any work; they simply initialize themselves. These processes also generally aren't JVMs (and where we do use a JVM, I don't think we specify resource limits when starting the processes). When the actual AM comes around and issues work, the processes started a priori require resources, and at that time we adjust the cgroup or job object. Strictly speaking we aren't using resource resizing as it exists in YARN; we have our own mechanisms to update the resource constraints.
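The trigger-file convention described above might look like the following sketch. The file name and layout are assumptions for illustration; the PoC's actual convention isn't specified in the comment.

```java
// Sketch of the trigger-file readiness check: the pre-initialized process
// touches a marker file in its working directory once initialization is
// done, and the launcher polls for it. The file name is made up.
import java.nio.file.Files;
import java.nio.file.Path;

final class PreInitReadiness {
    static final String TRIGGER_FILE = "container.ready";

    // Files.exists does not throw, so the launcher can poll this cheaply;
    // once it returns true the container can be detached into the pool.
    static boolean isReady(Path workDir) {
        return Files.exists(workDir.resolve(TRIGGER_FILE));
    }
}
```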
[jira] [Commented] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858926#comment-15858926 ] Hitesh Sharma commented on YARN-5501: - [~jlowe], thanks for the great feedback and time taken to respond. Some more details on how attach and detach container actually work. PoolManager creates the pre-initialized containers and they are not different from regular containers in any real way. When ContainerManager receives a startContainer request then it issues a DETACH_CONTAINER event. The detach really exists to ensure that we can cleanup the state associated with the pre-init container but avoid cleaning up the resources localized. ContainerManager listens for CONTAINER_DETACHED event and once it receives that then it creates the ContainerImpl for the requested container, but passes the information related to the detached container as the ContainerImpl c'tor. The ContainerManager also follows through the regular code paths of starting the container, which means that resource localization happens for the new container, and when it comes to raising the launch event then the ContainerImpl instead raises the ATTACH_CONTAINER event. This allows the ContainersLauncher to call the attachContainer on the executor, which is where we make the choice of launching the other processes required for that container. I hope this helps clarify things a little bit more. bq. I'm thinking of a use-case where the container is a base set that applies to all instances of an app framework, but each app may need a few extra things localized to do an app-specific thing (think UDFs for Hive/Pig, etc.). Curious if that is planned and how to deal with the lifecycle of those "extra" per-app things. Yes, the base set of things applies to all instances of the app framework. But localization is still done for each instance so you can for e.g. 
download a set of binaries via pre-initialization while more job-specific things come later. bq. So it sounds like there is a new container ID generated in the application's container namespace as part of the "allocation" to fill the app's request, but this container ID is aliased to an already existing container ID in another application's namespace, not only at the container executor level but all the way up to the container ID seen at the app level, correct? The application gets a container ID from the YARN RM and uses that for all purposes. On the NM we internally switch to using the pre-init container ID as the PID. For example, say the pre-init container had the ID container1234 while the AM-requested container had the ID containerABCD. Even though we reuse the existing pre-init container1234 to service the start container request on the NM, we never surface container1234 to the application, and the app always sees containerABCD. bq. One idea is to treat these things like the page cache in Linux. In other words, we keep a cache of idle containers as apps run them. These containers, like page cache entries, will be quickly discarded if they are unused and we need to make room for other containers. We're simply caching successful containers that have been run on the cluster, ready to run another task just like it. Apps would still need to make some tweaks to their container code so it talks the yet-to-be-detailed-and-mysterious attach/detach protocol so they can participate in this automatic container cache, and there would need to be changes in how containers are requested so the RM can properly match a request to an existing container (something that already has to be done for any reuse approach). Seems like it would adapt well to shifting loads on the cluster and doesn't require a premeditated, static config by users to get their app load to benefit. Has something like that been considered? That is a very interesting idea.
If the app can provide some hints as to when it is good to consider a container pre-initialized, then when the container finishes we can carry out the required operations to go back to the pre-init state. Thanks for bringing this up. bq. I think that's going to be challenging for the apps in practice and will limit which apps can leverage this feature reliably. This is going to be challenging for containers running VMs whose memory limits need to be set up at startup (e.g.: JVMs). Minimally I think this feature needs a way for apps to specify that they do not have a way to communicate (or at least act upon) memory changes. In those cases YARN will have to decide on tradeoffs like a primed-but-oversized container that will run fast but waste grid resources and also avoid reusing a container that needs to grow to satisfy the app request. Hmm, let me look at the code and see how container resizing works today. What you are saying makes sense, but in that case container resizing won't work either. For our scenarios resourc
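The container ID aliasing described in this thread (the app always sees its own ID, e.g. containerABCD, while the NM internally reuses the pre-init container1234) can be sketched as a small mapping on the NM side. This is a hypothetical illustration; the class and method names are not actual YARN APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the NM-side aliasing described above: the app only
// ever sees its own container ID, while the NM resolves it to the pre-init
// container that actually backs it. Names are illustrative, not YARN APIs.
class PreInitAliasMap {
    // app-visible container ID -> pre-init container ID backing it
    private final Map<String, String> appToPreInit = new ConcurrentHashMap<>();

    // Called on attach: bind the AM-requested ID to an idle pre-init container.
    void attach(String appContainerId, String preInitContainerId) {
        appToPreInit.put(appContainerId, preInitContainerId);
    }

    // The executor resolves the backing pre-init ID; unaliased containers
    // simply resolve to themselves. The app never sees the pre-init ID.
    String resolveBackingId(String appContainerId) {
        return appToPreInit.getOrDefault(appContainerId, appContainerId);
    }

    // Called on cleanup: drop the alias once the container completes.
    void detach(String appContainerId) {
        appToPreInit.remove(appContainerId);
    }
}
```

The point of the sketch is that the aliasing stays entirely inside the NM: everything above the executor level keeps using the app-visible ID.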
[jira] [Commented] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858564#comment-15858564 ] Hitesh Sharma commented on YARN-5501: - Hi [~jlowe], First of all, a big thanks for taking the time to look at the document and share your thoughts. I appreciate it a lot. bq. I am confused on how this will be used in practice. To me pre-initialized containers means containers that have already started up with application- or framework-specific resources localized, processes have been launched using those resources, and potentially connections already negotiated to external services. I'm not sure how YARN is supposed to know what mix of local resources, users, and configs to use for preinitialized containers that will get a good "hit rate" on container requests. Maybe I'm misunderstanding what is really meant by "preinitialized," and some concrete, sample use cases with detailed walkthroughs of how they work in practice would really help crystallize the goals here. Your understanding of pre-initialized containers is correct here. In the proposed design the YARN RM has the config to start pre-initialized containers, and this config is pretty much a launch context: it contains the launch commands and the details of resources to localize, and we also provide the resource constraints with which the container should be started. This configuration is currently static, but in the future we intend it to be pluggable, so we can extend it to be dynamic and adjust based on cluster load. The first use case is a scenario where each container needs to start some processes that take a lot of time to initialize (localization and startup costs).
The YARN NM receives the config to start the pre-initialized container (there is a dummy application that is associated with the pre-init container for a specific application) and it follows the regular code paths for a container, which include localizing resources and launching the container. As you know, in YARN a container goes to RUNNING state once started, but a pre-initialized container instead goes to PREINITIALIZED state (there are some hooks which allow us to know that the container has initialized properly). From this point the container is not different from a regular container, as the container monitor is watching over it. The "Pool Manager" within the YARN NM is used to start the pre-initialized container and watches for container events like stop, in which case it simply tries to start the container again. In other words, at the moment we simply use the YARN RM to pick the nodes where pre-initialized containers should be started and let the "Pool Manager" in the NM manage the lifecycle of the container. When the AM for which we pre-initialized the container comes and asks for this container, the "Container Manager" takes the pre-initialized container by issuing a "detach" container event and "attaches" it to the application. We added attachContainer and detachContainer events into ContainerExecutor, which allow us to define what they mean. As an example, in attachContainer we start a new process within the cgroup of the pre-initialized container. The PID to container mapping within the ContainerExecutor is updated to reflect everything accordingly (pre-initialized containers have a different container ID and belong to a different application before they are taken up). As part of detachContainer, all the resources associated with the pre-initialized container are now associated with the new container and get cleaned up accordingly. The other use case where we have prototyped container pooling is the scenario where a container actually needs to be a virtual machine.
Creation of VMs can take a long time, so container pooling allows us to keep empty VM shells ready to go. bq. Reusing containers across different applications is going to create some interesting scenarios that don't exist today. For example, what does a container ID for one of these look like? How many things today assume that all container IDs for an application are essentially prefixed by the application ID? This would violate that assumption, unless we introduce some sort of container ID aliasing where we create a "fake" container ID that maps to the "real" ID of the reused container. It would be good to know how we're going to treat container IDs and what applications will see when they get one of these containers in response to their allocation request. All pre-initialized containers belong to a specific application type. There is a dummy application created to which the pre-initialized containers are mapped. As part of the containerAttach and containerDetach events we disassociate the containers from that application. Specifically, ContainerExecutor has a mapping of container ID to PID file, and as part of container detach we update this mapping.
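The container-ID-to-PID-file mapping update mentioned above can be sketched as follows. This is an illustrative stand-in under assumed names, not the actual ContainerExecutor code: on detach, the PID file registered under the pre-init container's ID is re-registered under the new container's ID, so monitoring and cleanup follow the new container from then on.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch (not actual YARN code) of remapping a PID file from a
// pre-initialized container's ID to the newly attached container's ID.
class PidFileRegistry {
    private final Map<String, String> containerIdToPidFile = new ConcurrentHashMap<>();

    // Record the PID file written when a container's process is launched.
    void register(String containerId, String pidFilePath) {
        containerIdToPidFile.put(containerId, pidFilePath);
    }

    // Move the pre-init container's PID file entry to the attached container,
    // so the executor keeps tracking the same underlying process.
    void reassignOnDetach(String preInitId, String newContainerId) {
        String pidFile = containerIdToPidFile.remove(preInitId);
        if (pidFile != null) {
            containerIdToPidFile.put(newContainerId, pidFile);
        }
    }

    String pidFileFor(String containerId) {
        return containerIdToPidFile.get(containerId);
    }
}
```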
[jira] [Comment Edited] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858366#comment-15858366 ] Hitesh Sharma edited comment on YARN-6059 at 2/8/17 6:37 PM: - [~asuresh], the patch is renamed. was (Author: hrsharma): Renaming the patch. > Update paused container state in the state store > > > Key: YARN-6059 > URL: https://issues.apache.org/jira/browse/YARN-6059 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Sharma >Assignee: Hitesh Sharma > Attachments: YARN-5216-YARN-6059.001.patch, > YARN-6059-YARN-5972.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-6059: Attachment: YARN-6059-YARN-5972.001.patch Renaming the patch.
[jira] [Assigned] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma reassigned YARN-5501: --- Assignee: Hitesh Sharma
[jira] [Updated] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5501: Attachment: Container Pooling - one pager.pdf
[jira] [Commented] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857192#comment-15857192 ] Hitesh Sharma commented on YARN-5501: - Attaching a one-pager design doc to capture some of the details. This is still an early draft, so I would appreciate feedback.
[jira] [Commented] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856654#comment-15856654 ] Hitesh Sharma commented on YARN-6059: - Ping, [~asuresh], [~kkaranasos], can you take a look at the patch? The current patch is a very raw implementation, and before I refine it, it would be good to agree on a high-level approach here. Thank you.
[jira] [Updated] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-6059: Attachment: YARN-5216-YARN-6059.001.patch
[jira] [Commented] (YARN-6059) Update paused container state in the state store
[ https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15808319#comment-15808319 ] Hitesh Sharma commented on YARN-6059: - [~asuresh], the strategy for paused containers would depend upon what we intend to do for opportunistic containers, which is something we still need to work out. We should open a separate JIRA to discuss how opportunistic containers can be recovered (there shouldn't be anything special there whether they are paused or running). In this JIRA I'm only looking to make the changes to the state store to ensure paused containers are reflected properly there.
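The kind of state-store change discussed here could look roughly like the following sketch: record a per-container "paused" flag so that recovery can distinguish paused from running containers. The key layout, method names, and the in-memory map standing in for the NM's backing store are all hypothetical, not the actual NMStateStoreService API.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of a paused-container record in the NM state store.
// A HashMap stands in for the real key-value backing store; key layout
// and method names are assumptions, not actual YARN APIs.
class ContainerStateStoreSketch {
    private final Map<String, String> kv = new HashMap<>();

    // Persist the paused flag when a container is paused.
    void storeContainerPaused(String containerId) {
        kv.put("containers/" + containerId + "/paused", "true");
    }

    // Clear the flag when the container resumes or completes.
    void removeContainerPaused(String containerId) {
        kv.remove("containers/" + containerId + "/paused");
    }

    // On recovery, check whether the container was paused at the time
    // the NM went down.
    boolean wasPaused(String containerId) {
        return "true".equals(kv.get("containers/" + containerId + "/paused"));
    }
}
```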
[jira] [Created] (YARN-6059) Update paused container state in the state store
Hitesh Sharma created YARN-6059: --- Summary: Update paused container state in the state store Key: YARN-6059 URL: https://issues.apache.org/jira/browse/YARN-6059 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Sharma Assignee: Hitesh Sharma
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15775615#comment-15775615 ] Hitesh Sharma commented on YARN-5216: - Resolving some feedback comments from [~asuresh]. Thanks! > Expose configurable preemption policy for OPPORTUNISTIC containers running on > the NM > > > Key: YARN-5216 > URL: https://issues.apache.org/jira/browse/YARN-5216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-scheduling >Reporter: Arun Suresh >Assignee: Hitesh Sharma > Labels: oct16-hard > Attachments: YARN-5216-YARN-5972.001.patch, > YARN-5216-YARN-5972.002.patch, YARN-5216-YARN-5972.003.patch, > YARN-5216-YARN-5972.004.patch, YARN-5216-YARN-5972.005.patch, > YARN-5216-YARN-5972.006.patch, YARN-5216-YARN-5972.007.patch, > YARN5216.001.patch, yarn5216.002.patch > > > Currently, the default action taken by the QueuingContainerManager, > introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM > with OPPORTUNISTIC containers using up resources, is to KILL the running > OPPORTUNISTIC containers. > This JIRA proposes to expose a configurable hook to allow the NM to take a > different action.
[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5216: Attachment: YARN-5216-YARN-5972.007.patch
[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5216: Attachment: YARN-5216-YARN-5972.006.patch
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768666#comment-15768666 ] Hitesh Sharma commented on YARN-5216: - Ok, fair point regarding the dispatcher. Updating the patch.
[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5216: Attachment: YARN-5216-YARN-5972.005.patch
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768498#comment-15768498 ] Hitesh Sharma commented on YARN-5216: - Hi [~asuresh], thanks for the feedback. I have incorporated it and improved the test case to exercise more code paths. bq. Instead of explicitly calling "dispatcher.getEventHandler().handle(..)" from within ContainerScheduler, can you create a method inside Container: sendPauseEvent(String) and sendResumeEvent(String) I'm not sure about adding anything to the Container interface, as pause/resume is only for opportunistic containers. We can do that when support for the same is added for guaranteed containers.
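The helper-method shape being debated here (sendPauseEvent/sendResumeEvent on the container instead of raw dispatcher calls from the scheduler) can be sketched as follows. All class and type names below are illustrative stand-ins, not actual YARN code; the point is only that the helpers hide the event construction and dispatch behind the container.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of wrapping pause/resume dispatch in helper methods
// on the container, so the scheduler never touches the dispatcher directly.
// Names are hypothetical, not YARN APIs.
enum SketchEventType { PAUSE, RESUME }

class SketchEvent {
    final SketchEventType type;
    final String diagnostic;
    SketchEvent(SketchEventType type, String diagnostic) {
        this.type = type;
        this.diagnostic = diagnostic;
    }
}

class OpportunisticContainerSketch {
    // Stand-in for the event handler; a real container would feed its
    // state machine instead of recording events.
    final List<SketchEvent> handled = new ArrayList<>();

    void sendPauseEvent(String diagnostic) {
        handle(new SketchEvent(SketchEventType.PAUSE, diagnostic));
    }

    void sendResumeEvent(String diagnostic) {
        handle(new SketchEvent(SketchEventType.RESUME, diagnostic));
    }

    private void handle(SketchEvent e) {
        handled.add(e);
    }
}
```

The trade-off in the comment is visible in the sketch: the helpers keep the dispatcher out of the scheduler, but they also become part of the container's surface, which is why adding them to the shared Container interface is questioned while only opportunistic containers support pause/resume.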
[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5216: Attachment: YARN-5216-YARN-5972.004.patch Fix build warning
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750141#comment-15750141 ] Hitesh Sharma commented on YARN-5216: - Hi all, thank you for the feedback, I really appreciate it. [~asuresh] and I discussed offline and decided to treat the reclaimResources API in the container executor as a separate JIRA in the future.
[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5216: Attachment: YARN-5216-YARN-5972.003.patch Resolving javac warning.
[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5216: Attachment: YARN-5216-YARN-5972.002.patch Resolving build issues
[jira] [Comment Edited] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15738832#comment-15738832 ] Hitesh Sharma edited comment on YARN-5216 at 12/11/16 2:22 AM: --- I will wait for [~asuresh] to clarify here, but I think one thing on the table is to add a preempt API in container executor, which has a knob to allow KILL vs PAUSE. I captured some concerns with that, but do agree that we don't need two params - one in scheduler and other in executor. was (Author: hrsharma): I will wait for [~asuresh] to clarify here, but I think one thing on the table is to add a preempt API in container executor, which has a knob to allow KILL vs PAUSE. I captured some concerns with that, but do agree that we don't need to params - one in scheduler and other in executor. > Expose configurable preemption policy for OPPORTUNISTIC containers running on > the NM > > > Key: YARN-5216 > URL: https://issues.apache.org/jira/browse/YARN-5216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-scheduling >Reporter: Arun Suresh >Assignee: Hitesh Sharma > Labels: oct16-hard > Attachments: YARN-5216-YARN-5972.001.patch, YARN5216.001.patch, > yarn5216.002.patch > > > Currently, the default action taken by the QueuingContainerManager, > introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM > with OPPORTUNISTIC containers using up resources, is to KILL the running > OPPORTUNISTIC containers. > This JIRA proposes to expose a configurable hook to allow the NM to take a > different action. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15738832#comment-15738832 ] Hitesh Sharma commented on YARN-5216: - I will wait for [~asuresh] to clarify here, but I think one thing on the table is to add a preempt API in container executor, which has a knob to allow KILL vs PAUSE. I captured some concerns with that, but do agree that we don't need two params - one in scheduler and other in executor. > Expose configurable preemption policy for OPPORTUNISTIC containers running on > the NM > > > Key: YARN-5216 > URL: https://issues.apache.org/jira/browse/YARN-5216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-scheduling >Reporter: Arun Suresh >Assignee: Hitesh Sharma > Labels: oct16-hard > Attachments: YARN-5216-YARN-5972.001.patch, YARN5216.001.patch, > yarn5216.002.patch > > > Currently, the default action taken by the QueuingContainerManager, > introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM > with OPPORTUNISTIC containers using up resources, is to KILL the running > OPPORTUNISTIC containers. > This JIRA proposes to expose a configurable hook to allow the NM to take a > different action. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15738826#comment-15738826 ] Hitesh Sharma commented on YARN-5216: - [~asuresh], I'm a little confused, so to be clear: are you suggesting that we have a preempt API in the container executor and a conf knob to select the type of preemption (PAUSE or KILL)? Preempt is a very overloaded term; the current semantics work for the scheduler, but I'm not sure whether they can be extended to work preservation or some other scenario. Currently, if a container is preempted via PAUSE then it gets RESUMED when there is free capacity, but such behavior may not be acceptable if the scenario for calling the preempt API is different. In other words, having the scheduler call PAUSE or KILL is a very clear choice that needs to be made, but we can't say that PAUSE should be the preemption policy for the container in all cases. Further, container PAUSE is only for opp. containers and not something we have enabled for GUARANTEED ones, thus I would scope it down to the scheduler for now. I do want to say that at some point in the future it would make sense to have a preempt API and a behavior that is consistent across all scenarios, but I don't think we are there yet, just my $0.02. > Expose configurable preemption policy for OPPORTUNISTIC containers running on > the NM > > > Key: YARN-5216 > URL: https://issues.apache.org/jira/browse/YARN-5216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-scheduling >Reporter: Arun Suresh >Assignee: Hitesh Sharma > Labels: oct16-hard > Attachments: YARN-5216-YARN-5972.001.patch, YARN5216.001.patch, > yarn5216.002.patch > > > Currently, the default action taken by the QueuingContainerManager, > introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM > with OPPORTUNISTIC containers using up resources, is to KILL the running > OPPORTUNISTIC containers. 
> This JIRA proposes to expose a configurable hook to allow the NM to take a > different action. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15738398#comment-15738398 ] Hitesh Sharma commented on YARN-5216: - Hi [~asuresh], are you saying that we add a preempt method in the container executor and have the scheduler call that instead? One issue with that approach is that we need to add a conf knob in the executor to pick the kind of preemption you want and that also becomes a little too generic and ambiguous. What we are looking for is preemption of opp. container to schedule a guaranteed one and I feel that's best captured in the scheduler state. LMK your thoughts. > Expose configurable preemption policy for OPPORTUNISTIC containers running on > the NM > > > Key: YARN-5216 > URL: https://issues.apache.org/jira/browse/YARN-5216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-scheduling >Reporter: Arun Suresh >Assignee: Hitesh Sharma > Labels: oct16-hard > Attachments: YARN-5216-YARN-5972.001.patch, YARN5216.001.patch, > yarn5216.002.patch > > > Currently, the default action taken by the QueuingContainerManager, > introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM > with OPPORTUNISTIC containers using up resources, is to KILL the running > OPPORTUNISTIC containers. > This JIRA proposes to expose a configurable hook to allow the NM to take a > different action. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5216: Attachment: YARN-5216-YARN-5972.001.patch Posting a patch based on YARN-5972. > Expose configurable preemption policy for OPPORTUNISTIC containers running on > the NM > > > Key: YARN-5216 > URL: https://issues.apache.org/jira/browse/YARN-5216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-scheduling >Reporter: Arun Suresh >Assignee: Hitesh Sharma > Labels: oct16-hard > Attachments: YARN-5216-YARN-5972.001.patch, YARN5216.001.patch, > yarn5216.002.patch > > > Currently, the default action taken by the QueuingContainerManager, > introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM > with OPPORTUNISTIC containers using up resources, is to KILL the running > OPPORTUNISTIC containers. > This JIRA proposes to expose a configurable hook to allow the NM to take a > different action. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5292) NM Container lifecycle and state transitions to support for PAUSED container state.
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731526#comment-15731526 ] Hitesh Sharma commented on YARN-5292: - Hi [~kasha], the design doc and the patch allow an opportunistic container to be paused and resumed. The actual implementation of the pluggable interface will come as part of [YARN-5196], where via yarn-site.xml you can specify the preemption policy for an opportunistic container when there is a guaranteed container waiting to run (i.e. pause or kill). Currently only the NM scheduler can raise the PAUSE/RESUME events and there is no support for doing so in the container management protocol. One of the questions on the table is whether there is a need to extend PAUSE/RESUME support to guaranteed containers and have the AMs initiate that. Would love to hear some thoughts on that, and on any use cases that could benefit from it. > NM Container lifecycle and state transitions to support for PAUSED container > state. > --- > > Key: YARN-5292 > URL: https://issues.apache.org/jira/browse/YARN-5292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Sharma >Assignee: Hitesh Sharma > Attachments: YARN-5292.001.patch, YARN-5292.002.patch, > YARN-5292.003.patch, YARN-5292.004.patch, YARN-5292.005.patch, yarn-5292.pdf > > > This JIRA addresses the NM Container and state machine and lifecycle changes > needed to support pausing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
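The config-driven choice described in the comment above can be sketched roughly as follows. This is a minimal illustration, assuming a string-valued policy knob; the values and the default are hypothetical, not the actual YARN configuration keys:

```java
// Hypothetical sketch: selecting the preemption action for an
// OPPORTUNISTIC container from a configured policy string (e.g. a
// value read from yarn-site.xml). The accepted values and the default
// here are illustrative assumptions, not the real YARN property names.
public class PreemptionPolicySketch {
    public enum PreemptionAction { KILL, PAUSE }

    public static PreemptionAction actionFor(String configuredPolicy) {
        // Fall back to KILL when the knob is unset or unrecognized,
        // mirroring the pre-existing NM behavior of killing running
        // OPPORTUNISTIC containers on preemption.
        if ("pause".equalsIgnoreCase(configuredPolicy)) {
            return PreemptionAction.PAUSE;
        }
        return PreemptionAction.KILL;
    }
}
```

With this shape the scheduler consults a single knob per preemption decision instead of threading a second parameter through the executor, in line with the single-param direction discussed in YARN-5216.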
[jira] [Commented] (YARN-5959) RM changes to support change of container ExecutionType
[ https://issues.apache.org/jira/browse/YARN-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727312#comment-15727312 ] Hitesh Sharma commented on YARN-5959: - Hello [~asuresh], can you share some design around how NM handles the change in execution type? I will look at the patch more closely but having that context in mind will help. Thanks a lot! > RM changes to support change of container ExecutionType > --- > > Key: YARN-5959 > URL: https://issues.apache.org/jira/browse/YARN-5959 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5959.combined.001.patch, YARN-5959.wip.002.patch, > YARN-5959.wip.patch > > > RM side changes to allow an AM to ask for change of ExecutionType. > Currently, there are two cases: > # *Promotion* : OPPORTUNISTIC to GUARANTEED. > # *Demotion* : GUARANTEED to OPPORTUNISTIC. > This is similar in YARN-1197 which allows for change in Container resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5966) AMRMClient changes to support ExecutionType update
[ https://issues.apache.org/jira/browse/YARN-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727307#comment-15727307 ] Hitesh Sharma commented on YARN-5966: - Just a quick clarification: how is this different from [YARN-5087]? Is this an extension of that patch? > AMRMClient changes to support ExecutionType update > -- > > Key: YARN-5966 > URL: https://issues.apache.org/jira/browse/YARN-5966 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5966.wip.001.patch > > > {{AMRMClient}} changes to support change of container ExecutionType -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5972) Add Support for Pausing/Freezing of containers
[ https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727256#comment-15727256 ] Hitesh Sharma commented on YARN-5972: - Hi folks, thanks for opening this JIRA and the feedback. Much appreciated. {quote} While notifying an AM of containers that are about to be preempted does allow the AM to check-point work, it does imply, as you pointed out, that AMs be modified to act on this input and make some decisions based on it. Container pausing/freezing on the other hand, given OS/VM level support (also exposed via Docker and LXC) to actually freeze a process (agreed, their definition of freeze might vary), is actually AM/application independent. This can be useful, for applications and deployments that do not really want to check-point on its own but at the same time like the idea of container preemption with work preservations. {quote} Agree with [~asuresh] here. What container pausing/freezing offers is the ability to delegate to the underlying OS how the resources used by a container should be reclaimed, and to restart the container when resources free up again. The gains of doing so will vary based on the container executor implementation. That said, it doesn't make PAUSE/RESUME the perfect solution for work preservation or a substitute for AM-specific checkpointing. [YARN-5292] adds PAUSE/RESUME for opportunistic containers and doesn't target guaranteed containers. I can think of scenarios where it would be good to have this functionality for guaranteed containers, but I would wait to see some need coming from the community. Allowing the ContainerManager to initiate a pause/resume on an opportunistic container was considered, but we decided not to have that functionality. There are some edge cases around what happens if the CM initiates a RESUME on a paused container and the NM tries to PAUSE it ([YARN-5216]). I think [~subru] is also touching on these edge cases. 
Overall I feel that the current design of allowing PAUSE/RESUME on opportunistic containers is a good starting point: it allows pausing an opportunistic container in favor of a guaranteed one, and when resources free up the paused container gets RESUMED ([YARN-5216]). We should probably implement pauseContainer and resumeContainer for Docker-based container executors, as opportunistic containers running inside Docker containers can benefit from it. If the community sees the need, we can extend the functionality to guaranteed containers. I personally think that may become more relevant as YARN containers become virtualized via Docker or virtual machines, but I would love to hear some scenarios before we do that. > Add Support for Pausing/Freezing of containers > -- > > Key: YARN-5972 > URL: https://issues.apache.org/jira/browse/YARN-5972 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Hitesh Sharma >Assignee: Hitesh Sharma > > YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add > capability to customize how OPPORTUNISTIC containers get preempted. > In this JIRA we propose introducing a PAUSED container state. > Instead of preempting a running container, the container can be moved to a > PAUSED state, where it remains until resources get freed up on the node then > the preempted container can resume to the running state. > Note that process freezing is already supported by the 'cgroups freezer', > which is used internally by the docker pause functionality. Windows also has > OS level support of a similar nature. > One scenario where this capability is useful is work preservation. How > preemption is done, and whether the container supports it, is implementation > specific. > For instance, if the container is a virtual machine, then the preempt call would > pause the VM and resume would restore it back to the running state. > If the container executor / runtime doesn't support preemption, then preempt > would default to killing the container. 
> -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
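The lifecycle discussed above (pause an OPPORTUNISTIC container in favor of a GUARANTEED one, resume it when resources free up) can be sketched as a small transition table. State and event names below are simplified stand-ins for illustration, not the actual NM container state machine from YARN-5292:

```java
// Hypothetical sketch of the PAUSED lifecycle: RUNNING -> PAUSED on a
// PAUSE event (guaranteed container needs resources), PAUSED -> RUNNING
// on RESUME (resources freed up), and KILL is terminal from either
// state. The enums are illustrative, not the real YARN container states.
public class PausedLifecycleSketch {
    public enum State { RUNNING, PAUSED, KILLED }
    public enum Event { PAUSE, RESUME, KILL }

    public static State next(State current, Event event) {
        switch (current) {
            case RUNNING:
                if (event == Event.PAUSE) return State.PAUSED;
                if (event == Event.KILL) return State.KILLED;
                return current; // RESUME on a running container is a no-op
            case PAUSED:
                if (event == Event.RESUME) return State.RUNNING;
                if (event == Event.KILL) return State.KILLED;
                return current; // PAUSE on a paused container is a no-op
            default:
                return current; // KILLED is terminal
        }
    }
}
```

Keeping the transitions this explicit is what makes it easy to reason about the edge cases raised above, such as a RESUME racing with a PAUSE on the same container.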
[jira] [Updated] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5292: Attachment: YARN-5292.005.patch Resolving review feedback. > Support for PAUSED container state > -- > > Key: YARN-5292 > URL: https://issues.apache.org/jira/browse/YARN-5292 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Hitesh Sharma >Assignee: Hitesh Sharma > Attachments: YARN-5292.001.patch, YARN-5292.002.patch, > YARN-5292.003.patch, YARN-5292.004.patch, YARN-5292.005.patch, yarn-5292.pdf > > > YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add > capability to customize how OPPORTUNISTIC containers get preempted. > In this JIRA we propose introducing a PAUSED container state. > When a running container gets preempted, it enters the PAUSED state, where it > remains until resources get freed up on the node then the preempted container > can resume to the running state. > > One scenario where this capability is useful is work preservation. How > preemption is done, and whether the container supports it, is implementation > specific. > For instance, if the container is a virtual machine, then preempt would pause > the VM and resume would restore it back to the running state. > If the container doesn't support preemption, then preempt would default to > killing the container. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15720879#comment-15720879 ] Hitesh Sharma commented on YARN-5292: - Hi [~asuresh], thanks a lot for the feedback! 1. The default behavior is to throw an exception, which is caught by the ContainerLauncher, which then proceeds to kill the container. So if no PAUSE/RESUME support exists then we kill the container. On a side note, we can open a JIRA to implement PAUSE/RESUME for some of the executors, like Docker. 2. Took care of collapsing the transitions into one. 3. If the container is REINITIALIZING and we get a PAUSE then the behavior is nondeterministic. Pausing the container when it hasn't finished reinitialization can be bad, thus we kill instead. I feel it would be quite complicated if we try to add the container back to the scheduler queue somehow, so let's not try to do so. 4. Good point. Done. Please have a look at the posted patch. > Support for PAUSED container state > -- > > Key: YARN-5292 > URL: https://issues.apache.org/jira/browse/YARN-5292 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Hitesh Sharma >Assignee: Hitesh Sharma > Attachments: YARN-5292.001.patch, YARN-5292.002.patch, > YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf > > > YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add > capability to customize how OPPORTUNISTIC containers get preempted. > In this JIRA we propose introducing a PAUSED container state. > When a running container gets preempted, it enters the PAUSED state, where it > remains until resources get freed up on the node then the preempted container > can resume to the running state. > > One scenario where this capability is useful is work preservation. How > preemption is done, and whether the container supports it, is implementation > specific. 
> For instance, if the container is a virtual machine, then preempt would pause > the VM and resume would restore it back to the running state. > If the container doesn't support preemption, then preempt would default to > killing the container. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
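Point 1 of the comment above (a default pauseContainer that throws, with the launcher falling back to killing the container) can be sketched like this. The class and method names are illustrative stand-ins for the NM's launcher/executor internals, not the actual API:

```java
// Minimal sketch of the fallback: if pause is not implemented by the
// executor, pauseContainer throws UnsupportedOperationException and
// the caller falls back to killing the container. Names are
// hypothetical simplifications of the NM ContainerLauncher path.
public class PauseFallbackSketch {
    // Simulates an executor where pause support is optional.
    static void pauseContainer(boolean pauseSupported) {
        if (!pauseSupported) {
            throw new UnsupportedOperationException("pause not implemented");
        }
        // A real executor would freeze the container process here.
    }

    /** Returns the action actually taken: "paused" or "killed". */
    public static String pauseOrKill(boolean pauseSupported) {
        try {
            pauseContainer(pauseSupported);
            return "paused";
        } catch (UnsupportedOperationException e) {
            // No PAUSE support in this executor: fall back to KILL.
            return "killed";
        }
    }
}
```

This keeps KILL as the safe default for every executor while letting executors that can freeze processes (e.g. Docker-backed ones) opt in by overriding pause.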
[jira] [Commented] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713883#comment-15713883 ] Hitesh Sharma commented on YARN-5292: - [~arun suresh], can you please look at the attached patch? Thanks! > Support for PAUSED container state > -- > > Key: YARN-5292 > URL: https://issues.apache.org/jira/browse/YARN-5292 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Hitesh Sharma >Assignee: Hitesh Sharma > Attachments: YARN-5292.001.patch, YARN-5292.002.patch, > YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf > > > YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add > capability to customize how OPPORTUNISTIC containers get preempted. > In this JIRA we propose introducing a PAUSED container state. > When a running container gets preempted, it enters the PAUSED state, where it > remains until resources get freed up on the node then the preempted container > can resume to the running state. > > One scenario where this capability is useful is work preservation. How > preemption is done, and whether the container supports it, is implementation > specific. > For instance, if the container is a virtual machine, then preempt would pause > the VM and resume would restore it back to the running state. > If the container doesn't support preemption, then preempt would default to > killing the container. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713883#comment-15713883 ] Hitesh Sharma edited comment on YARN-5292 at 12/2/16 3:21 AM: -- [~asuresh], can you please look at the attached patch? Thanks! was (Author: hrsharma): [~asuresh]], can you please look at the attached patch? Thanks! > Support for PAUSED container state > -- > > Key: YARN-5292 > URL: https://issues.apache.org/jira/browse/YARN-5292 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Hitesh Sharma >Assignee: Hitesh Sharma > Attachments: YARN-5292.001.patch, YARN-5292.002.patch, > YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf > > > YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add > capability to customize how OPPORTUNISTIC containers get preempted. > In this JIRA we propose introducing a PAUSED container state. > When a running container gets preempted, it enters the PAUSED state, where it > remains until resources get freed up on the node then the preempted container > can resume to the running state. > > One scenario where this capability is useful is work preservation. How > preemption is done, and whether the container supports it, is implementation > specific. > For instance, if the container is a virtual machine, then preempt would pause > the VM and resume would restore it back to the running state. > If the container doesn't support preemption, then preempt would default to > killing the container. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713883#comment-15713883 ] Hitesh Sharma edited comment on YARN-5292 at 12/2/16 3:21 AM: -- [~asuresh]], can you please look at the attached patch? Thanks! was (Author: hrsharma): [~arun suresh], can you please look at the attached patch? Thanks! > Support for PAUSED container state > -- > > Key: YARN-5292 > URL: https://issues.apache.org/jira/browse/YARN-5292 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Hitesh Sharma >Assignee: Hitesh Sharma > Attachments: YARN-5292.001.patch, YARN-5292.002.patch, > YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf > > > YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add > capability to customize how OPPORTUNISTIC containers get preempted. > In this JIRA we propose introducing a PAUSED container state. > When a running container gets preempted, it enters the PAUSED state, where it > remains until resources get freed up on the node then the preempted container > can resume to the running state. > > One scenario where this capability is useful is work preservation. How > preemption is done, and whether the container supports it, is implementation > specific. > For instance, if the container is a virtual machine, then preempt would pause > the VM and resume would restore it back to the running state. > If the container doesn't support preemption, then preempt would default to > killing the container. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3611) Support Docker Containers In LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710732#comment-15710732 ] Hitesh Sharma commented on YARN-3611: - Hi folks, Docker is now available on Windows and is fully supported by Docker Inc. (I'm talking about launching Windows containers via Docker). https://www.docker.com/microsoft Unfortunately, in the current design Docker support is limited to Linux only. I think we need to revisit this and find a way to share the same code across Docker support for Windows and Linux. Another goal to keep in mind is to have DockerContainerExecutor be completely OS-agnostic, since in certain cases the Docker client might actually be talking to a daemon on a remote machine or a VM (which may be Linux or Windows). Would love to hear some thoughts on how to achieve Docker support for Windows by reusing all the good work being done here. Thanks! > Support Docker Containers In LinuxContainerExecutor > --- > > Key: YARN-3611 > URL: https://issues.apache.org/jira/browse/YARN-3611 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > > Support Docker Containers In LinuxContainerExecutor > LinuxContainerExecutor provides useful functionality today with respect to > localization, cgroups based resource management and isolation for CPU, > network, disk etc. as well as security with a well-defined mechanism to > execute privileged operations using the container-executor utility. Bringing > docker support to LinuxContainerExecutor lets us use all of this > functionality when running docker containers under YARN, while not requiring > users and admins to configure and use a different ContainerExecutor. > There are several aspects here that need to be worked through : > * Mechanism(s) to let clients request docker-specific functionality - we > could initially implement this via environment variables without impacting > the client API. 
> * Security - both docker daemon as well as application > * Docker image localization > * Running a docker container via container-executor as a specified user > * “Isolate” the docker container in terms of CPU/network/disk/etc > * Communicating with and/or signaling the running container (ensure correct > pid handling) > * Figure out workarounds for certain performance-sensitive scenarios like > HDFS short-circuit reads > * All of these need to be achieved without changing the current behavior of > LinuxContainerExecutor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5292: Attachment: YARN-5292.004.patch Adding test case and raising an event for the scheduler to know that the container was paused. > Support for PAUSED container state > -- > > Key: YARN-5292 > URL: https://issues.apache.org/jira/browse/YARN-5292 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Hitesh Sharma >Assignee: Hitesh Sharma > Attachments: YARN-5292.001.patch, YARN-5292.002.patch, > YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf > > > YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add > capability to customize how OPPORTUNISTIC containers get preempted. > In this JIRA we propose introducing a PAUSED container state. > When a running container gets preempted, it enters the PAUSED state, where it > remains until resources get freed up on the node then the preempted container > can resume to the running state. > > One scenario where this capability is useful is work preservation. How > preemption is done, and whether the container supports it, is implementation > specific. > For instance, if the container is a virtual machine, then preempt would pause > the VM and resume would restore it back to the running state. > If the container doesn't support preemption, then preempt would default to > killing the container. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692011#comment-15692011 ] Hitesh Sharma commented on YARN-5292: - Hi [~jianhe], apologies for the late response. It seems that [YARN-4876] adds the functionality to do what you are describing. Please let me know if you have something else in mind here. Thanks! > Support for PAUSED container state > -- > > Key: YARN-5292 > URL: https://issues.apache.org/jira/browse/YARN-5292 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Hitesh Sharma >Assignee: Hitesh Sharma > Attachments: YARN-5292.001.patch, YARN-5292.002.patch, > YARN-5292.003.patch, yarn-5292.pdf > > > YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add > capability to customize how OPPORTUNISTIC containers get preempted. > In this JIRA we propose introducing a PAUSED container state. > When a running container gets preempted, it enters the PAUSED state, where it > remains until resources get freed up on the node then the preempted container > can resume to the running state. > > One scenario where this capability is useful is work preservation. How > preemption is done, and whether the container supports it, is implementation > specific. > For instance, if the container is a virtual machine, then preempt would pause > the VM and resume would restore it back to the running state. > If the container doesn't support preemption, then preempt would default to > killing the container. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686049#comment-15686049 ] Hitesh Sharma commented on YARN-5292: - Thanks [~subru] for the comments. I agree with you that we need to think separately about paused containers with regard to the opportunistic and guaranteed execution types. Most of the discussion in this JIRA is targeted towards opportunistic containers. I will open a new JIRA to discuss pause/resume for YARN containers in general, and this one can be used for opportunistic containers.
[jira] [Commented] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686030#comment-15686030 ] Hitesh Sharma commented on YARN-5292: - [~jianhe], can you elaborate a little on the use case and scenario of pause/resume for long-running services? It isn't fully clear to me how that would be used, so I'd appreciate the help.
[jira] [Commented] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682104#comment-15682104 ] Hitesh Sharma commented on YARN-5292: - Thanks for the comments, [~arun suresh]. Regarding 1, the actual JIRA to avoid killing opportunistic containers is [YARN-5216]. I'm working on a patch for that which builds on top of the new schedule state. In our offline discussions we have talked about having APIs in ContainerManagementProtocol that allow PAUSE/RESUME on a container. The current implementation is only for opportunistic containers, so there was no need to add anything to the ContainerManagementProtocol, but we can definitely extend it to guaranteed containers and make the required changes. I think the PAUSE/RESUME semantics are of particular interest for Docker containers, and I will be happy to help with any related work in this area. I have test cases as part of the patch for [YARN-5216] that will exercise this code path. Please take a look at the patch a bit more closely so I can address other feedback.
[jira] [Commented] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672702#comment-15672702 ] Hitesh Sharma commented on YARN-5292: - [~asuresh], can you also please take a look at this patch? Thank you so much!
[jira] [Updated] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5292: Attachment: YARN-5292.003.patch Rebasing with latest changes from trunk.
[jira] [Commented] (YARN-1593) support out-of-proc AuxiliaryServices
[ https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668649#comment-15668649 ] Hitesh Sharma commented on YARN-1593: - Thanks [~asuresh] for pointing to [YARN-5501]. Agree with you folks that there is some overlap, and we will be happy to converge and discuss the best way to leverage the efforts here. [~vvasudev], with regard to pooled containers, the behavior is to allow the NM to serve container requests even if the pre-initialized container is not ready. For container pooling this behavior makes sense, as we eventually want to advertise pre-initialized containers as a resource and have the AM ask for them. Regarding the 2nd point, the current implementation starts a fixed number of pre-initialized containers on each node (what to start, resources to localize, and other details are currently passed via config files). Eventually we intend for the RM to pick the nodes where the pre-initialized containers should be started. This is something we are starting to work on. > support out-of-proc AuxiliaryServices > - > > Key: YARN-1593 > URL: https://issues.apache.org/jira/browse/YARN-1593 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, rolling upgrade >Reporter: Ming Ma >Assignee: Varun Vasudev > Attachments: SystemContainersandSystemServices.pdf > > > AuxiliaryServices such as ShuffleHandler currently run in the same process as > NM. There are some benefits to hosting them in dedicated processes. > 1. NM rolling restart. If we want to upgrade YARN, NM restart will force a > ShuffleHandler restart. If ShuffleHandler runs as a separate process, > ShuffleHandler can continue to run during NM restart. NM can reconnect to the > running ShuffleHandler after restart. > 2. Resource management. It is possible another type of AuxiliaryService will > be implemented. AuxiliaryServices are considered YARN application specific > and could consume lots of resources. Running AuxiliaryServices in separate > processes allows easier resource management. NM could potentially stop a > specific AuxiliaryService process from running if it consumes resources way > above its allocation. > Here are some high level ideas: > 1. NM provides a hosting process for each AuxiliaryService. The existing > AuxiliaryService API doesn't change. > 2. The hosting process provides an RPC server for the AuxiliaryService proxy > object inside NM to connect to. > 3. When we rolling-restart NM, the existing AuxiliaryService processes will > continue to run. NM could reconnect to the running AuxiliaryService processes > upon restart. > 4. Policy and resource management of AuxiliaryServices. So far we don't have > an immediate need for this. An AuxiliaryService could run inside a container > and its resource utilization could be taken into account by RM, and RM could > consider whether a specific type of application overutilizes cluster > resources.
[jira] [Updated] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5292: Attachment: YARN-5292.002.patch Fixing the build issues. [~asuresh], can you please take a look?
[jira] [Updated] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5292: Attachment: YARN-5292.001.patch Initial implementation of PAUSE and RESUME in YARN container state machine.
[jira] [Comment Edited] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414834#comment-15414834 ] Hitesh Sharma edited comment on YARN-5501 at 8/10/16 7:11 AM: -- Thanks [~atris] for the comments. These are good points, but the answers depend upon the implementation of the container. 1) Yes, there will be some overhead to maintaining the pooled containers, but that's a trade-off to optimize for launch latencies. Containers can however implement some custom behaviors to lower the overhead. For example, if the container supports PAUSE and RESUME semantics ([YARN-5292]) then a pooled container could be PAUSED. Some other container could instead choose to resize its allocation to a minimum and resize again as per the actual resource request. 2) I'm not sure I follow the comment here. Pooled containers are useful for lowering launch latencies, and that's independent of the actual container runtime. 3) That would be implementation specific. A pooled container is in itself a resource, and when acquired by a request it would need to be adjusted accordingly. We will be posting some more design and implementation details which will hopefully help clarify the ideas here. > Container Pooling in YARN > - > > Key: YARN-5501 > URL: https://issues.apache.org/jira/browse/YARN-5501 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Arun Suresh > > This JIRA proposes a method for reducing the container launch latency in > YARN. It introduces a notion of pooling *Unattached Pre-Initialized > Containers*. > Proposal in brief: > * Have a *Pre-Initialized Container Factory* service within the NM to create > these unattached containers. > * The NM would then advertise these containers as special resource types > (this should be possible via YARN-3926). > * When a start container request is received by the node manager for > launching a container requesting this specific type of resource, it will take > one of these unattached pre-initialized containers from the pool, and use it > to service the container request. > * Once the request is complete, the pre-initialized container would be > released and ready to serve another request. > This capability would help reduce container launch latencies and thereby > allow for development of more interactive applications on YARN.
[jira] [Commented] (YARN-5501) Container Pooling in YARN
[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414834#comment-15414834 ] Hitesh Sharma commented on YARN-5501: - Thanks [~atris] for the comments. These are good points, but the answers depend upon the implementation of the container. 1) Yes, there will be some overhead to maintaining the pooled containers, but that's a trade-off to optimize for launch latencies. Containers can however implement some custom behaviors to lower the overhead. For example, if the container supports PAUSE and RESUME semantics [YARN-5292] then a pooled container could be PAUSED. Some other container could instead choose to resize its allocation to a minimum and resize again as per the actual resource request. 2) I'm not sure I follow the comment here. Pooled containers are useful for lowering launch latencies, and that's independent of the actual container runtime. 3) That would be implementation specific. A pooled container is in itself a resource, and when acquired by a request it would need to be adjusted accordingly. We will be posting some more design and implementation details which will hopefully help clarify the ideas here.
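The YARN-5501 proposal above boils down to a checkout/return cycle over unattached pre-initialized containers: the NM takes one from the pool to serve a start-container request and returns it when the request completes. A minimal sketch of that cycle, with class and method names that are illustrative assumptions rather than anything from the actual patches:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class PooledContainerSketch {
    static class PreInitContainer {
        final int id;
        PreInitContainer(int id) { this.id = id; }
    }

    private final Queue<PreInitContainer> pool = new ArrayDeque<>();

    PooledContainerSketch(int size) {
        // The factory service pre-initializes a fixed number of containers
        // per node (in the proposal, driven by config files).
        for (int i = 0; i < size; i++) pool.add(new PreInitContainer(i));
    }

    // A start-container request takes one container from the pool, or null
    // if the pool is exhausted and a cold start would be needed.
    PreInitContainer acquire() { return pool.poll(); }

    // Once the request completes, the container returns to the pool
    // ready to serve another request.
    void release(PreInitContainer c) { pool.add(c); }

    public static void main(String[] args) {
        PooledContainerSketch nm = new PooledContainerSketch(2);
        PreInitContainer a = nm.acquire();
        nm.acquire();                             // take the second one too
        System.out.println(nm.acquire() == null); // pool exhausted: prints true
        nm.release(a);
        System.out.println(nm.acquire().id);      // reused container: prints 0
    }
}
```

The sketch shows why pooling trades steady-state overhead (idle pre-initialized containers holding resources) for launch latency, which is exactly the trade-off point 1 of the comment discusses.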
[jira] [Updated] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5292: Attachment: yarn-5292.pdf Please find the attached document that describes some of the design and implementation details of adding PAUSE and RESUME states to YARN containers. Appreciate the feedback and comments.
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374174#comment-15374174 ] Hitesh Sharma commented on YARN-5216: - We investigated a few approaches here: * Have a subclass of {{QueuingContainersManagerImpl}}: this approach has some pros, but the problem is that subclassing just to override the preemption behavior isn't the right thing to do. * Having a pluggable policy in {{QueuingContainersManagerImpl}} requires extension points to select which containers to run, run the container, preempt the container, etc. This approach starts to get more complex as we look to add support for PAUSED containers [YARN-5292]. Based on the feedback here and the discussions we have had, I'm looking into adding support for PAUSED containers within {{QueuingContainersManagerImpl}}. That would simplify things quite a bit and allow us to have a more pluggable and cleaner design. [~asuresh], [~kkaranasos], thanks for all the feedback! > Expose configurable preemption policy for OPPORTUNISTIC containers running on > the NM > > > Key: YARN-5216 > URL: https://issues.apache.org/jira/browse/YARN-5216 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Arun Suresh >Assignee: Hitesh Sharma > Fix For: 2.9.0 > > Attachments: YARN5216.001.patch, yarn5216.002.patch > > > Currently, the default action taken by the QueuingContainerManager, > introduced in YARN-2883, when a GUARANTEED container is scheduled on an NM > with OPPORTUNISTIC containers using up resources, is to KILL the running > OPPORTUNISTIC containers. > This JIRA proposes to expose a configurable hook to allow the NM to take a > different action.
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365262#comment-15365262 ] Hitesh Sharma commented on YARN-5216: - Thank you for the insights, [~kkaranasos]! Sorry, rebalancing wasn't the right terminology to use. I was referring to the killing of queued containers that happens during {{shedQueuedOpportunisticContainers}} to enforce the queue limits, which in turn follows the paths you mention above. It might be a good idea to use start-container to imply resume when the container is paused, but at the same time it overloads the meaning of start-container, and given how different the two operations are, that can impose some challenges. Anyway, we can discuss this more in [YARN-5292]. {quote} As far as I can see, all you need from the NM to support preemption is (let me know if there are more things that I am missing): # Determine the way a container stops (option 1: kill, option 2: preempt). # Determine the way it starts (that is, resume it if it's paused, instead of starting it from the beginning). # Decide which container to start (you might want to start paused containers first instead of new ones). {quote} How do you propose to do 3 without having an extension point to pick a container to start? The moment we have an extension point to pick a container to start, we also need an extension point to pick a container to kill for enforcing queue limits or something else. Appreciate the feedback and help. Thanks a lot!
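The queue-limit enforcement that {{shedQueuedOpportunisticContainers}} is described as doing, namely removing excess queued opportunistic containers so the RM can be told to reallocate them elsewhere, can be illustrated with a short sketch. The queue type, the {{shed}} helper, and the limit value are assumptions for the example, not the actual NM code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class QueueShedSketch {
    // Remove containers from the tail of the queue until it fits the limit,
    // returning the shed ones so the RM can be notified to reallocate them.
    static List<String> shed(Deque<String> queued, int limit) {
        List<String> shedContainers = new ArrayList<>();
        while (queued.size() > limit) {
            shedContainers.add(queued.removeLast());
        }
        return shedContainers;
    }

    public static void main(String[] args) {
        Deque<String> q = new ArrayDeque<>(List.of("c1", "c2", "c3", "c4"));
        System.out.println(shed(q, 2)); // prints [c4, c3]
        System.out.println(q);          // prints [c1, c2]
    }
}
```

Shedding from the tail keeps the longest-waiting containers queued; the thread's point is that once a container may be PAUSED rather than merely queued, this simple policy no longer suffices, since a paused container cannot be reallocated to another node.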
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365085#comment-15365085 ] Hitesh Sharma commented on YARN-5216: - [~kkaranasos], I'm not sure subclassing would work. We need more control over how the opportunistic containers are queued and how we start/preempt them. From a design point of view, {{QueuingContainersPreemptionManagerImpl}} is not really a queuing container manager, but just a specific way to preempt queued opportunistic containers. Thus composition seems the better choice here. Thank you.
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364893#comment-15364893 ] Hitesh Sharma commented on YARN-5216: - [~asuresh], [~kkaranasos], thank you for the feedback and comments. Regarding the refactoring being done and the reason to pull queues into the currently named {{OpportunisticContainerManager}}: roughly speaking, {{QueuingContainersManagerImpl}} does the following for starting and stopping opportunistic containers: * A running container simply gets preempted, while a container waiting in the queue is removed and the RM is notified to reallocate it elsewhere. * It periodically checks whether there are too many waiting containers in the queue and removes them so the RM can rebalance them. * When a running container finishes, a waiting opportunistic container will be run if there are no guaranteed containers waiting in the queue. If the preemption policy is to kill the container, then things are a little simpler and you can leave the opportunistic container queue within {{QueuingContainersManagerImpl}}. However, if the preemption policy is different, then we need extension points to know about the operations that {{QueuingContainersManagerImpl}} wants to perform and respond appropriately. Say the preemption policy is to put the container in a paused state so that it can be resumed once there is room to run a container. This requires distinguishing whether {{QueuingContainersManagerImpl}} is looking to run pending containers (e.g. we want to resume a preempted container over an OC which is still waiting in the queue) or is looking to rebalance waiting containers to other nodes (e.g. we can't reallocate a container in the paused state). For pretty much these reasons the pluggable policy is named {{OpportunisticContainerManager}}, as it allows you to preempt and start the opportunistic containers and also manages the queue of opportunistic containers. I'm open to suggestions on how to do this differently without having to change {{QueuingContainersManagerImpl}} a lot. [~asuresh], can you elaborate a little on why {{queuedGuaranteedContainers}} should also be pulled into the {{OpportunisticContainerManagerImpl}}? I will look into using the ServiceLoader framework over reflection and add an extra constant to determine the default value. Thanks a lot for the feedback and comments.
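The configurable-hook idea discussed in this thread, loading a preemption policy implementation by class name from configuration, can be sketched as follows. The interface name echoes the {{OpportunisticContainerManager}} mentioned above, but the method, the kill/pause policies, and the reflection-based loader are illustrative assumptions, not the actual patch (which also considered the ServiceLoader framework instead of reflection):

```java
public class PolicyLoaderSketch {
    interface OpportunisticContainerManager {
        // Returns a description of the action taken on the container.
        String preempt(String containerId);
    }

    // Default policy: kill the opportunistic container
    // (the current QueuingContainerManager behavior).
    public static class KillPolicy implements OpportunisticContainerManager {
        public String preempt(String containerId) { return "KILL " + containerId; }
    }

    // Alternative policy: pause the container so it can later be
    // resumed (the YARN-5292 proposal).
    public static class PausePolicy implements OpportunisticContainerManager {
        public String preempt(String containerId) { return "PAUSE " + containerId; }
    }

    // Reflection-based instantiation, mirroring how configurable hooks
    // are typically wired up from a configuration value.
    static OpportunisticContainerManager load(String className) throws Exception {
        return (OpportunisticContainerManager)
            Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // "$" is the binary-name separator for nested classes.
        OpportunisticContainerManager policy =
            load("PolicyLoaderSketch$PausePolicy");
        System.out.println(policy.preempt("container_001")); // prints PAUSE container_001
    }
}
```

Loading the policy by name keeps {{QueuingContainersManagerImpl}} free of per-policy logic; swapping kill for pause becomes a configuration change rather than a code change.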
[jira] [Commented] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348806#comment-15348806 ] Hitesh Sharma commented on YARN-5292: - [~jianhe], the container would be resumed when the running containers on the node finish and resources become available. The main scenario here is work preservation. If the container supports preemption via pause/freeze, then it can be put in this hibernated mode and resumed when resources free up. For some applications it is quite expensive to throw away the work done by an opportunistic container, so we want the capability to preserve it. Thanks for the feedback. > Support for PAUSED container state > -- > > Key: YARN-5292 > URL: https://issues.apache.org/jira/browse/YARN-5292 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Hitesh Sharma > > YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add > the capability to customize how OPPORTUNISTIC containers get preempted. > In this JIRA we propose introducing a PAUSED container state. > When a running container gets preempted, it enters the PAUSED state, where it > remains until resources free up on the node, at which point it can resume to > the RUNNING state. > One scenario where this capability is useful is work preservation. How > preemption is done, and whether the container supports it, is implementation > specific. > For instance, if the container is a virtual machine, then preempt would pause > the VM and resume would restore it back to the running state. > If the container doesn't support preemption, then preempt would default to > killing the container.
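The resume path described here can be illustrated with a small sketch (hypothetical class and names, not YARN code): paused containers wait in FIFO order, and when a running container finishes and frees capacity, the longest-paused one is resumed so its partial work is preserved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: FIFO queue of paused containers, resumed as running
// containers finish and free up (simplified, one-vcore-per-container) capacity.
class PausedContainerTracker {
    private final Deque<String> paused = new ArrayDeque<>();
    private int freeVcores;

    PausedContainerTracker(int freeVcores) {
        this.freeVcores = freeVcores;
    }

    // A running container was preempted by pausing it.
    void onPaused(String containerId) {
        paused.addLast(containerId);
    }

    // A running container finished; resume the longest-paused container
    // (work preservation: its partial work is kept, not thrown away).
    // Returns the resumed container id, or null if nothing is waiting.
    String onContainerFinished(int releasedVcores) {
        freeVcores += releasedVcores;
        if (!paused.isEmpty() && freeVcores >= 1) {
            freeVcores -= 1; // assume each container needs one vcore
            return paused.pollFirst();
        }
        return null;
    }
}
```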
[jira] [Comment Edited] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348557#comment-15348557 ] Hitesh Sharma edited comment on YARN-5292 at 6/24/16 10:33 PM: --- [~jlowe] and [~asuresh], appreciate the feedback. It's a good idea to have the "PAUSING" state. If the container fails to pause, then we proceed to kill and terminate it. How the pausing is implemented is specific to the container, so I'm not sure we need APIs to store state. Thanks again for the feedback. was (Author: hrsharma): the same comment, with the mentions written as [~Jason Lowe] and [~Arun Suresh].
[jira] [Commented] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348557#comment-15348557 ] Hitesh Sharma commented on YARN-5292: - [~Jason Lowe] and [~Arun Suresh], appreciate the feedback. It's a good idea to have the "PAUSING" state. If the container fails to pause, then we proceed to kill and terminate it. How the pausing is implemented is specific to the container, so I'm not sure we need APIs to store state. Thanks again for the feedback.
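The lifecycle agreed on in this thread can be sketched as a small transition table (illustrative; the enum values and class are hypothetical, not the YARN-5292 implementation): RUNNING moves to PAUSING, a successful pause lands in PAUSED, resume returns to RUNNING, and a failed pause falls back to KILLING.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

enum PauseState { RUNNING, PAUSING, PAUSED, KILLING }

// Illustrative transition table for the pause/resume lifecycle; a failed
// pause attempt (PAUSING -> KILLING) falls back to killing the container.
class PauseLifecycle {
    private static final Map<PauseState, EnumSet<PauseState>> VALID =
        new EnumMap<>(PauseState.class);
    static {
        VALID.put(PauseState.RUNNING, EnumSet.of(PauseState.PAUSING, PauseState.KILLING));
        VALID.put(PauseState.PAUSING, EnumSet.of(PauseState.PAUSED, PauseState.KILLING));
        VALID.put(PauseState.PAUSED, EnumSet.of(PauseState.RUNNING, PauseState.KILLING));
        VALID.put(PauseState.KILLING, EnumSet.noneOf(PauseState.class));
    }

    private PauseState state = PauseState.RUNNING;

    PauseState current() { return state; }

    // Apply a transition, rejecting anything outside the table above.
    PauseState moveTo(PauseState next) {
        if (!VALID.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next);
        }
        state = next;
        return state;
    }
}
```

Keeping PAUSING as a distinct transitional state makes the kill fallback explicit instead of leaving a half-paused container in RUNNING.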
[jira] [Updated] (YARN-5292) Support for PAUSED container state
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5292: Summary: Support for PAUSED container state (was: Support for PAUSED state in a container)
[jira] [Created] (YARN-5292) Support for PAUSED state in a container
Hitesh Sharma created YARN-5292: --- Summary: Support for PAUSED state in a container Key: YARN-5292 URL: https://issues.apache.org/jira/browse/YARN-5292 Project: Hadoop YARN Issue Type: New Feature Reporter: Hitesh Sharma YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 adds the capability to customize how OPPORTUNISTIC containers get preempted. In this JIRA we propose introducing a PAUSED container state. When a running container gets preempted, it enters the PAUSED state, where it remains until resources free up on the node, at which point it can resume to the RUNNING state. One scenario where this capability is useful is work preservation. How preemption is done, and whether the container supports it, is implementation specific. For instance, if the container is a virtual machine, then preempt would pause the VM and resume would restore it back to the running state. If the container doesn't support preemption, then preempt would default to killing the container.
[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5216: Attachment: yarn5216.002.patch
[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343234#comment-15343234 ] Hitesh Sharma commented on YARN-5216: - Resolved the error and attached a new patch.
[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5216: Attachment: YARN5216.001.patch
[jira] [Assigned] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma reassigned YARN-5216: --- Assignee: Hitesh Sharma (was: Arun Suresh)
[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record
[ https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5127: Attachment: YARN-5127.005.patch > Expose ExecutionType in Container api record > > > Key: YARN-5127 > URL: https://issues.apache.org/jira/browse/YARN-5127 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Hitesh Sharma > Attachments: YARN-5127.002.patch, YARN-5127.003.patch, > YARN-5127.004.patch, YARN-5127.005.patch, YARN-5127.v1.patch > > > Currently the ExecutionType of the Container returned in response to the > allocate call is contained in the {{ContainerTokenIdentifier}}, which is > encoded into the ContainerToken. > Unfortunately, the client would need to decode the returned token to access > the ContainerTokenIdentifier, which probably should not be allowed. > This JIRA proposes to add a {{getExecutionType()}} method to the container > record.
[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record
[ https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5127: Attachment: YARN-5127.004.patch
[jira] [Updated] (YARN-5162) Exceptions thrown during AM registerAM call when Distributed Scheduling is Enabled
[ https://issues.apache.org/jira/browse/YARN-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5162: Attachment: YARN-5162.002.patch > Exceptions thrown during AM registerAM call when Distributed Scheduling is > Enabled > -- > > Key: YARN-5162 > URL: https://issues.apache.org/jira/browse/YARN-5162 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Hitesh Sharma > Attachments: YARN-5162.001.patch, YARN-5162.002.patch > > > The following Exception is seen and the AM fails to register with RM: > {noformat} > 16/05/24 17:09:26 INFO ipc.Server: Auth successful for > appattempt_146410856_0001_01 (auth:SIMPLE) > 16/05/24 17:09:26 INFO amrmproxy.AMRMProxyService: Registering application > master. Host: Port:0 Tracking Url: > 16/05/24 17:09:26 INFO amrmproxy.DefaultRequestInterceptor: Forwarding > registration request to the real YARN RM > 16/05/24 17:09:26 DEBUG nodemanager.NodeStatusUpdaterImpl: Node's > health-status : true, > 16/05/24 17:09:26 DEBUG nodemanager.NodeStatusUpdaterImpl: Sending out 1 > container statuses: [ContainerStatus: [ContainerId: > container_146410856_0001_01_000001, ExecutionType: GUARANTEED, State: > RUNNING, Capability: vCores:1>, Diagnostics: , ExitStatus: -1000, ]] > 16/05/24 17:09:26 WARN ipc.Client: Exception encountered while connecting to > the server : org.apache.hadoop.security.AccessControlException: Client cannot > authenticate via:[TOKEN] > {noformat}
[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record
[ https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5127: Attachment: YARN-5127.003.patch
[jira] [Updated] (YARN-5162) Exceptions thrown during AM registerAM call when Distributed Scheduling is Enabled
[ https://issues.apache.org/jira/browse/YARN-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5162: Attachment: YARN-5162.001.patch
[jira] [Commented] (YARN-5162) Exceptions thrown during AM registerAM call when Distributed Scheduling is Enabled
[ https://issues.apache.org/jira/browse/YARN-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302394#comment-15302394 ] Hitesh Sharma commented on YARN-5162: - On further debugging, there appear to be three issues: # This is caused by the {{SchedulerSecurityInfo}} class disallowing all protocols except the ApplicationMasterProtocol. # Once that was fixed, the AM always died with a {{NullPointerException}}, since the {{DistSchedRegisterResponse}} did not contain a min and incr allocation capability. # Finally, the {{finishApplicationMaster}} call does not work. This seems to be because the method is not declared in the proto file.
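The second issue, an NPE from a response missing its min/incr allocation fields, can be illustrated generically. The class and field names below are hypothetical stand-ins, not the actual {{DistSchedRegisterResponse}} API: an unset field surfaces as null on the client, so either the server must populate it or the client must guard the dereference.

```java
// Hypothetical stand-in for a registration response whose min-allocation
// field the server may have left unset; not the real YARN protocol record.
class RegisterResponseSketch {
    private final Integer minAllocMb; // null when the server never set it

    RegisterResponseSketch(Integer minAllocMb) {
        this.minAllocMb = minAllocMb;
    }

    // Unguarded access: unboxing throws NullPointerException when unset.
    int minAllocMb() {
        return minAllocMb;
    }

    // Guarded access: fall back to a client-side default instead of crashing.
    int minAllocMbOrDefault(int dflt) {
        return minAllocMb != null ? minAllocMb : dflt;
    }
}
```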
[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record
[ https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5127: Attachment: YARN-5127.002.patch
[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record
[ https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5127: Attachment: YARN-5127.v1.patch
[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record
[ https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5127: Attachment: (was: YARN-5127.0001.patch)
[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record
[ https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Sharma updated YARN-5127: Attachment: YARN-5127.0001.patch