[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-09-12 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.012.patch

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch, 
> YARN-6059-YARN-5972.009.patch, YARN-6059-YARN-5972.010.patch, 
> YARN-6059-YARN-5972.011.patch, YARN-6059-YARN-5972.012.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-04-17 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.010.patch

Retriggering jenkins..

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch, 
> YARN-6059-YARN-5972.009.patch, YARN-6059-YARN-5972.010.patch
>
>







[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-03-03 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.009.patch

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch, 
> YARN-6059-YARN-5972.009.patch
>
>







[jira] [Commented] (YARN-6059) Update paused container state in the state store

2017-03-03 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895484#comment-15895484
 ] 

Hitesh Sharma commented on YARN-6059:
-

Thanks [~kkaranasos] for the excellent feedback! Sorry for the delay in getting 
back to you.

I have addressed the comments in the latest patch. Unfortunately, the logs for 
the style check and Javadoc have been purged. I will look at the results for 
the latest patch and resolve any remaining issues.

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch
>
>







[jira] [Commented] (YARN-5501) Container Pooling in YARN

2017-02-22 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879783#comment-15879783
 ] 

Hitesh Sharma commented on YARN-5501:
-

Hi [~asuresh], thanks for the feedback and sorry for the delay in responding.

bq. From the doc, it looks like "detach" implies removing the pre-initialized 
container from the pool and "attach" refers to associating an app with a 
pooled container. It might be simpler if we treat the operation as atomic. In 
that sense, we can make do with just having an "attach" or "lease", where a 
pre-initialized container is associated with an app.

I'm not sure what atomic means here, but we need to detach so that the YARN 
machinery can be updated to reflect the fact that the pre-initialized container 
has been used. As part of detaching, we can also associate the resources 
(downloaded files) localized by the pre-initialized container with the actual 
application container that is going to use it. This ensures that when the 
application container exits, all resources for the pre-initialized container 
are cleaned up as well.
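
The ownership hand-off described above can be sketched as follows. This is a minimal illustration, not the PoC code: the class `LocalizedResourceTracker`, its method names, and the use of plain strings for container ids and file paths are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical tracker of which container owns which localized files. */
final class LocalizedResourceTracker {
  private final Map<String, Set<String>> resourcesByContainer = new HashMap<>();

  /** Records that a file was localized on behalf of the given container. */
  void localize(String containerId, String path) {
    resourcesByContainer
        .computeIfAbsent(containerId, k -> new LinkedHashSet<>())
        .add(path);
  }

  /** Detach: hands the pre-init container's files over to the app container. */
  void detach(String preInitContainerId, String appContainerId) {
    Set<String> moved = resourcesByContainer.remove(preInitContainerId);
    if (moved != null) {
      resourcesByContainer
          .computeIfAbsent(appContainerId, k -> new LinkedHashSet<>())
          .addAll(moved);
    }
  }

  /** Called when a container exits; returns every file that should be deleted. */
  Set<String> cleanup(String containerId) {
    Set<String> owned = resourcesByContainer.remove(containerId);
    return owned == null ? Collections.<String>emptySet() : owned;
  }
}
```

After `detach("preinit_1", "app_1")`, a later `cleanup("app_1")` returns the files localized for both containers, so nothing from the pre-initialized container is left behind.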

bq. For the sake of simplicity, maybe we should assume that once an application 
is assigned a container from the pool and it has "attached" to it, it is the 
application's container and the Pooling framework relinquishes ownership of it. 
The container then completes normally and all resource accounting is billed 
against the app. The pool of containers can be re-populated externally by the 
pool manager component in the RM (beyond the scope of this currently).

Yes, agreed. We have the same thinking over here.

bq. This is one of the reasons why I feel generalized resources would be 
useful here. Assume initially we have a cluster with resources <10 vcores, 10 
GB> spread equally across 2 NMs. Let's say we allocate 4 pre-initialized 
containers (via the pooling component in the RM) of type foo, each with <1 
vcore, 1 GB>, and distribute them equally across the NMs. Once the 
pre-initialized containers have started, the total cluster resources would be 
<6 vcores, 6 GB, 4 foo>.
Each NM would have <3 vcores, 3 GB, 2 foo> available resources. Now if an app 
asks for <0 vcores, 0 GB, 1 foo>, it will be allocated 1 pooled container and 
the resources associated with 1 foo <1 vcore, 1 GB> can be accounted against 
the app. The app can also maybe ask for <1 vcore, 1 GB, 1 foo>, in which case 
the app will still be assigned one of the pooled containers with the assumption 
that the container's size can expand by <1 vcore, 1 GB> if required. 
Cgroups/JobObjects would be used to enforce this.

Agreed. 
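
The arithmetic in the quoted example can be checked with a small sketch. The class `PooledResource` and its methods are hypothetical and exist only for this illustration; they are not part of YARN's resource model.

```java
/** Hypothetical resource vector: vcores, memory in GB, and pooled "foo" containers. */
final class PooledResource {
  final int vcores;
  final int memoryGb;
  final int foo;

  PooledResource(int vcores, int memoryGb, int foo) {
    this.vcores = vcores;
    this.memoryGb = memoryGb;
    this.foo = foo;
  }

  /** Capacity left after carving out `count` pre-initialized containers of the given size. */
  PooledResource preInitialize(int count, int vcoresEach, int memoryGbEach) {
    return new PooledResource(
        vcores - count * vcoresEach, memoryGb - count * memoryGbEach, foo + count);
  }

  /** Capacity left after an app claims one pooled container. */
  PooledResource claimOne() {
    if (foo == 0) {
      throw new IllegalStateException("no pooled container available");
    }
    return new PooledResource(vcores, memoryGb, foo - 1);
  }

  @Override
  public String toString() {
    return "<" + vcores + " vcores, " + memoryGb + " GB, " + foo + " foo>";
  }
}
```

Starting from a cluster of `new PooledResource(10, 10, 0)`, calling `preInitialize(4, 1, 1)` gives <6 vcores, 6 GB, 4 foo>, and a single NM starting from `new PooledResource(5, 5, 0)` with `preInitialize(2, 1, 1)` gives <3 vcores, 3 GB, 2 foo>, matching the figures in the quote.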

bq. AM Container communication.

In our PoC we introduced a new API in the container executor (attachContainer), 
which is called when a pre-initialized container is used by an actual AM. 
Either the ContainerExecutor or the ContainerRuntime could be used for this 
purpose. For now, though, the application would need its own way of 
establishing communication with the pre-init container.

Thanks for the feedback guys. Appreciate the time and help.




> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: Container Pooling in YARN.pdf, Container Pooling - one 
> pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby 
> allow for development of more interactive applications on YARN.
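
The take-and-release cycle in the proposal above can be sketched as a simple queue of idle containers. This is only an illustration under stated assumptions: the class `PreInitializedContainerPool`, its method names, and the use of strings as container ids are hypothetical, and the sketch follows the release-back-to-pool behaviour of the issue description rather than any later design decision.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/** Hypothetical pool of unattached pre-initialized containers, keyed by container id. */
final class PreInitializedContainerPool {
  private final Deque<String> idle = new ArrayDeque<>();

  /** Called by the (hypothetical) factory once a container has finished pre-initializing. */
  void add(String containerId) {
    idle.addLast(containerId);
  }

  /** Takes one idle container to service a start-container request, if any is available. */
  Optional<String> take() {
    return Optional.ofNullable(idle.pollFirst());
  }

  /** Returns a container to the pool once its request is complete. */
  void release(String containerId) {
    idle.addLast(containerId);
  }

  /** Number of containers currently idle in the pool. */
  int available() {
    return idle.size();
  }
}
```

An empty `Optional` from `take()` signals that no pooled container is available, in which case a caller would presumably fall back to a regular container launch.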






[jira] [Commented] (YARN-6059) Update paused container state in the state store

2017-02-18 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873303#comment-15873303
 ] 

Hitesh Sharma commented on YARN-6059:
-

[~kkaranasos], thank you for the great feedback.

The updated patch resolves the issues you brought up. I'm not sure why those 
stylecheck issues are showing up, because things look OK to me. I will look at 
them again later.

One thing I'm not clear about is how to interpret the loadContainerState method 
in NMLevelDbStateStoreService. Can you elaborate on what the if conditions 
check for? I find them confusing.

I understand the part about suffixes not being removed, but what does the check 
on rcs.status do?

  else if (suffix.equals(CONTAINER_QUEUED_KEY_SUFFIX)) {
    if (rcs.status == RecoveredContainerStatus.REQUESTED) {
      rcs.status = RecoveredContainerStatus.QUEUED;
    }
  }
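
One possible reading of that guard, offered as an assumption rather than as the reviewer's answer: LevelDB iterates keys in sorted order, not in the order they were written, and old suffix keys are not removed, so each branch only upgrades the status when nothing "later" has already been seen. The sketch below is a simplified stand-in; the enum, the suffix strings, and the LAUNCHED rule are hypothetical and do not reproduce the real loadContainerState method.

```java
import java.util.List;

/** Simplified stand-in for RecoveredContainerStatus; only the values used here. */
enum RecoveredStatus { REQUESTED, QUEUED, LAUNCHED, COMPLETED }

final class RecoverySketch {
  // Hypothetical suffix values, chosen for illustration only.
  static final String QUEUED_SUFFIX = "/queued";
  static final String LAUNCHED_SUFFIX = "/launched";
  static final String EXIT_CODE_SUFFIX = "/exitcode";

  /** Applies one stored key suffix to the status recovered so far. */
  static RecoveredStatus apply(RecoveredStatus current, String suffix) {
    if (suffix.equals(QUEUED_SUFFIX)) {
      // Only upgrade from the initial state; never downgrade LAUNCHED or COMPLETED.
      if (current == RecoveredStatus.REQUESTED) {
        return RecoveredStatus.QUEUED;
      }
    } else if (suffix.equals(LAUNCHED_SUFFIX)) {
      if (current == RecoveredStatus.REQUESTED || current == RecoveredStatus.QUEUED) {
        return RecoveredStatus.LAUNCHED;
      }
    } else if (suffix.equals(EXIT_CODE_SUFFIX)) {
      return RecoveredStatus.COMPLETED;
    }
    return current;
  }

  /** Folds all suffixes found for one container, in the order the iterator returns them. */
  static RecoveredStatus recover(List<String> suffixesInKeyOrder) {
    RecoveredStatus status = RecoveredStatus.REQUESTED;
    for (String suffix : suffixesInKeyOrder) {
      status = apply(status, suffix);
    }
    return status;
  }
}
```

With these guards, a container that has both a launched and a queued entry recovers as LAUNCHED regardless of which key the iterator returns first.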

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch
>
>







[jira] [Updated] (YARN-5501) Container Pooling in YARN

2017-02-16 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5501:

Attachment: Container Pooling in YARN.pdf

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: Container Pooling in YARN.pdf, Container Pooling - one 
> pager.pdf
>
>






[jira] [Commented] (YARN-5501) Container Pooling in YARN

2017-02-16 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871293#comment-15871293
 ] 

Hitesh Sharma commented on YARN-5501:
-

Hi [~jlowe], thanks for the feedback. I have captured some of the discussion in 
the attached document.

[~arun suresh], [~vvasudev], please have a look and share your thoughts. I look 
forward to the discussion.

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: Container Pooling in YARN.pdf, Container Pooling - one 
> pager.pdf
>
>






[jira] [Comment Edited] (YARN-5501) Container Pooling in YARN

2017-02-16 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871293#comment-15871293
 ] 

Hitesh Sharma edited comment on YARN-5501 at 2/17/17 6:50 AM:
--

Hi [~jlowe], thanks for the feedback. I have captured some of the discussion in 
the attached document.

[~asuresh], [~vvasudev], please have a look and share your thoughts. I look 
forward to the discussion.


was (Author: hrsharma):
Hi [~jlowe], thanks for the feedback. I have captured some of the discussion in 
the attached document.

[~arun suresh], [~vvasudev], please have a look and share your thoughts. I look 
forward to the discussion.

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: Container Pooling in YARN.pdf, Container Pooling - one 
> pager.pdf
>
>






[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-02-16 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.008.patch

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch, YARN-6059-YARN-5972.008.patch
>
>







[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-02-16 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.007.patch

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch, 
> YARN-6059-YARN-5972.007.patch
>
>







[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-02-16 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.006.patch

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch, YARN-6059-YARN-5972.006.patch
>
>







[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-02-16 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.005.patch

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch, 
> YARN-6059-YARN-5972.005.patch
>
>







[jira] [Commented] (YARN-6059) Update paused container state in the state store

2017-02-13 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864980#comment-15864980
 ] 

Hitesh Sharma commented on YARN-6059:
-

Thanks for the feedback, [~kkaranasos].

I have resolved the issues and posted a new patch.

Please note that some of the failures in the unit tests and Javadoc aren't due 
to my changes, so I think we can ignore them.

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch
>
>







[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-02-13 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.004.patch

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch, YARN-6059-YARN-5972.004.patch
>
>







[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-02-13 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.003.patch

Resolving CR comments.

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch, 
> YARN-6059-YARN-5972.003.patch
>
>







[jira] [Commented] (YARN-6059) Update paused container state in the state store

2017-02-11 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862607#comment-15862607
 ] 

Hitesh Sharma commented on YARN-6059:
-

Hi [~kkaranasos], thanks for the feedback! [~asuresh], can you also take a 
look, please?

bq. The paused events are not stored in the NMStateStore. You need to add that 
in the ContainerScheduler, as we do for the QUEUED containers, e.g., with 
this.context.getNMStateStore().storeContainerQueued.

Thanks, fixed this.
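
A minimal sketch of the kind of change the comment asks for, assuming a store method for paused containers that mirrors storeContainerQueued. Everything here is a stand-in: `InMemoryNMStateStore`, `storeContainerPaused`, the key suffixes, and `ContainerSchedulerSketch` are hypothetical names for illustration and are not taken from the patch.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the NM state store. */
final class InMemoryNMStateStore {
  private final Set<String> keys = new TreeSet<>();

  /** Persists a marker that the container is queued. */
  void storeContainerQueued(String containerId) {
    keys.add(containerId + "/queued");
  }

  /** Assumed counterpart for paused containers, mirroring the queued case. */
  void storeContainerPaused(String containerId) {
    keys.add(containerId + "/paused");
  }

  boolean contains(String key) {
    return keys.contains(key);
  }
}

/** Hypothetical scheduler that records each state change before acting on it. */
final class ContainerSchedulerSketch {
  private final InMemoryNMStateStore stateStore;

  ContainerSchedulerSketch(InMemoryNMStateStore stateStore) {
    this.stateStore = stateStore;
  }

  void queue(String containerId) {
    stateStore.storeContainerQueued(containerId);
  }

  void pause(String containerId) {
    // Persist the paused marker so recovery after an NM restart can see it.
    stateStore.storeContainerPaused(containerId);
  }
}
```

The point of the sketch is only that the paused transition is persisted at the same place and in the same way as the queued one.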

bq. You need to make sure that, when a PAUSED container is relaunched, we add a 
new entry to the NMStateStore to mark it as RUNNING again.

We don't relaunch the paused container; instead, we simply kill it. 

bq. In the RecoverPausedContainerLaunch, you should raise a ContainerEvent to 
indicate that the container finished its execution, like we do with the other 
*ContainerLaunch classes, with something like the following:

Hmm, since we kill the container, an event for that will be raised 
automatically. Am I missing something?

bq. In RecoveredContainerLaunch, indentation needs to be fixed.

Can you elaborate? I don't see any stylecheck errors for this.



> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch
>
>







[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-02-11 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.002.patch

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch, YARN-6059-YARN-5972.002.patch
>
>







[jira] [Commented] (YARN-5501) Container Pooling in YARN

2017-02-09 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860407#comment-15860407
 ] 

Hitesh Sharma commented on YARN-5501:
-

Thanks for the continued feedback, [~jlowe].

bq. Thanks for the detailed answers.  I highly recommend these get captured in 
a new revision of the design document along with diagrams to help others come 
up to speed and join the discussion.  Otherwise we have a wall of text in this 
JIRA that is essential reading for anyone understanding what is really being 
proposed.

Sure, I will do that.


bq. My thinking is the mere fact that the container finishes is indicative that 
it is ready for reuse.  Maybe there's an explicit API they call at the end of 
the task or whatever, but the same has to be done for this existing design as 
well -- we need to know when the container is ready to be reused.  The main 
difference I see in this approach is that there isn't an explicit 'pre-init' 
step where the users or admins need to premeditate what will be run.  Instead 
the first run of the app framework is the same performance it is today, but 
subsequent runs are faster since it can reuse those cached containers.  Seems 
to me the most difficult part of this is coming up with an efficient container 
request protocol so YARN can know when it can reuse an old, cached container 
and when it cannot.  The existing proposal works around this by requiring the 
containers to be set up beforehand as special resource types, but that won't 
work for a general container caching approach.

Yes, once we have a way to know that the container is ready to be reused, 
we can issue a detach on it and add it to the container pool. We had a similar 
issue in our PoC, where we needed to know whether the container was 
pre-initialized (the launched process can take some minutes to be fully 
ready). As you said, there is no protocol between the YARN NM and the container 
to know what's going on, so we ended up looking for a trigger file in the 
pre-init container's working directory to detect that pre-initialization is 
done; the container can be inducted into the pool after that. We can use 
something similar here: for example, containers can create a trigger file and 
the launcher looks for it. If the file is found, the launcher detaches the 
container and inducts it into the pool. 
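
A minimal sketch of the trigger-file check described above, assuming simple polling of the container's working directory. The class `TriggerFileWatcher`, the file name `preinit.done`, and the polling parameters are hypothetical; the PoC may well have used a different name or mechanism.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that detects when a pre-init container reports readiness. */
final class TriggerFileWatcher {
  // Assumed file name; the container would create this file once it is fully initialized.
  static final String TRIGGER_FILE_NAME = "preinit.done";

  /** Returns true if the trigger file exists in the container's working directory. */
  static boolean isReady(Path workDir) {
    return Files.exists(workDir.resolve(TRIGGER_FILE_NAME));
  }

  /**
   * Polls up to maxPolls times, sleeping pollMillis after each unsuccessful check.
   * Returns true as soon as the trigger file is seen, false if it never appears.
   */
  static boolean awaitReady(Path workDir, int maxPolls, long pollMillis)
      throws InterruptedException {
    for (int i = 0; i < maxPolls; i++) {
      if (isReady(workDir)) {
        return true;
      }
      Thread.sleep(pollMillis);
    }
    return false;
  }
}
```

Once `awaitReady` returns true, the launcher would detach the container and induct it into the pool; a false result leaves the container outside the pool.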

bq. They certainly are enforced, but how does the app know about the new 
constraints so they can either avoid getting shot or take advantage of the new 
space?  Simply updating the cgroup is not going to be sufficient.  Either the 
process will OOM because it slams into the new lower limit (potentially 
instantly if it is already bigger than the new limit) or it will be completely 
oblivious that it now has acres of memory that it can use.  If it tried to use 
it before it would fail, so how does it know it grew?  For example, the JVM 
can't magically do this unless the app is doing some sort of explicit off-heap 
memory management via direct buffers, etc. and is told about its memory limit.  
Simply updating the cgroup setting doesn't seem to be a sufficient 
communication channel here, so I'm curious how that's all you need to do for 
your scenario.

It is OK for our scenario because the processes we pre-initialize don't do any 
work and simply initialize themselves. These processes are also generally not 
JVMs (and where we do use a JVM, I don't think we specify resource limits when 
starting the processes). When the actual AM comes around and issues work, the 
processes started a priori require resources, and at that time we adjust the 
cgroup or job object. Strictly speaking, we aren't using resource resizing as 
it exists in YARN but have our own mechanisms to update the resource 
constraints.

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serv

[jira] [Commented] (YARN-5501) Container Pooling in YARN

2017-02-08 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858926#comment-15858926
 ] 

Hitesh Sharma commented on YARN-5501:
-

[~jlowe], thanks for the great feedback and time taken to respond.

Here are some more details on how attaching and detaching a container actually work.

PoolManager creates the pre-initialized containers, and they are not different 
from regular containers in any real way. When ContainerManager receives a 
startContainer request, it issues a DETACH_CONTAINER event. The detach exists 
to ensure that we can clean up the state associated with the pre-init container 
while keeping the resources that were already localized. ContainerManager 
listens for the CONTAINER_DETACHED event and, once it receives it, creates the 
ContainerImpl for the requested container, passing the information related to 
the detached container to the ContainerImpl constructor. The ContainerManager 
also follows the regular code paths for starting the container, which means 
that resource localization happens for the new container. When it comes to 
raising the launch event, the ContainerImpl raises the ATTACH_CONTAINER event 
instead. This allows the ContainersLauncher to call attachContainer on the 
executor, which is where we choose to launch the other processes required for 
that container. I hope this helps clarify things a little bit more.
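
The sequence described above can be sketched as a toy model. This is not the patch's code: the class and method names are illustrative assumptions, and only the event names (DETACH_CONTAINER, CONTAINER_DETACHED, ATTACH_CONTAINER) come from the comment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Container:
    container_id: str
    localized: List[str] = field(default_factory=list)
    detached_from: Optional[str] = None  # ID of the reused pre-init container
    state: str = "NEW"


class NodeManagerSketch:
    """Toy model of the detach -> localize -> attach sequence."""

    def __init__(self) -> None:
        self.pool: Dict[str, Container] = {}
        self.events: List[str] = []

    def add_preinit(self, container_id: str, resources: List[str]) -> None:
        self.pool[container_id] = Container(
            container_id, list(resources), state="PREINITIALIZED")

    def start_container(self, requested_id: str,
                        extra_resources: List[str]) -> Container:
        if not self.pool:
            # Nothing pooled: fall back to the regular launch path.
            self.events.append("LAUNCH_CONTAINER")
            return Container(requested_id, list(extra_resources),
                             state="RUNNING")
        # 1. Detach: drop pre-init bookkeeping, keep localized resources.
        preinit_id, preinit = self.pool.popitem()
        self.events += ["DETACH_CONTAINER", "CONTAINER_DETACHED"]
        # 2. Build the requested container from the detached container's info.
        container = Container(requested_id, list(preinit.localized),
                              detached_from=preinit_id)
        # 3. Regular start path: localization still runs for the new container.
        container.localized.extend(extra_resources)
        # 4. Raise attach instead of launch.
        self.events.append("ATTACH_CONTAINER")
        container.state = "RUNNING"
        return container
```

The point of the sketch is the ordering: the localized resources survive the detach, per-job localization is layered on top, and the launch event is replaced by an attach only when a pooled container was actually used.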

bq. I'm thinking of a use-case where the container is a base set that applies 
to all instances of an app framework, but each app may need a few extra things 
localized to do an app-specific thing (think UDFs for Hive/Pig, etc.). Curious 
if that is planned and how to deal with the lifecycle of those "extra" per-app 
things.

Yes, the base set of things applies to all instances of the app framework. But 
localization is still done for each instance, so you can, for example, download 
a set of binaries via pre-initialization while more job-specific things come 
later.

bq. So it sounds like there is a new container ID generated in the 
application's container namespace as part of the "allocation" to fill the app's 
request, but this container ID is aliased to an already existing container ID 
in another application's namespace, not only at the container executor level 
but all the way up to the container ID seen at the app level, correct?

The application gets a container ID from the YARN RM and uses that for all 
purposes. On the NM we internally switch to using the pre-init container ID as 
the PID. For example, say the pre-init container had the ID container1234 while 
the container requested by the AM had the ID containerABCD. Even though we 
reuse the existing pre-init container1234 to service the start container 
request on the NM, we never surface container1234 to the application, and the 
app always sees containerABCD.
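
A minimal sketch of this aliasing, assuming a simple lookup table on the NM side. The class and its methods are hypothetical; only the two example IDs come from the comment.

```python
from typing import Dict, List


class ContainerAliasTable:
    """Maps the ID the application sees to the ID the NM uses internally."""

    def __init__(self) -> None:
        self._internal_by_app_id: Dict[str, str] = {}

    def attach(self, app_container_id: str, preinit_container_id: str) -> None:
        # Record that the app-facing container is backed by a pooled one.
        self._internal_by_app_id[app_container_id] = preinit_container_id

    def internal_id(self, app_container_id: str) -> str:
        # Containers that were launched normally are their own backing ID.
        return self._internal_by_app_id.get(app_container_id, app_container_id)

    def app_visible_ids(self) -> List[str]:
        # Only app-facing IDs are ever reported outward.
        return sorted(self._internal_by_app_id)
```

With `attach("containerABCD", "container1234")`, internal lookups resolve to container1234 while anything reported to the application lists only containerABCD.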

bq. One idea is to treat these things like the page cache in Linux. In other 
words, we keep a cache of idle containers as apps run them. These containers, 
like page cache entries, will be quickly discarded if they are unused and we 
need to make room for other containers. We're simply caching successful 
containers that have been run on the cluster, ready to run another task just 
like it. Apps would still need to make some tweaks to their container code so 
it talks the yet-to-be-detailed-and-mysterious attach/detach protocol so they 
can participate in this automatic container cache, and there would need to be 
changes in how containers are requested so the RM can properly match a request 
to an existing container (something that already has to be done for any reuse 
approach). Seems like it would adapt well to shifting loads on the cluster and 
doesn't require a premeditated, static config by users to get their app load to 
benefit. Has something like that been considered?

That is a very interesting idea. If the app can provide some hints as to when 
it is good to consider a container pre-initialized, then when the container 
finishes we can carry out the required operations to go back to the pre-init 
state. Thanks for bringing this up.
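
The page-cache analogy from the quoted suggestion could look roughly like the sketch below. It is purely illustrative: the "signature" used to match a request to an idle container, the memory accounting, and the oldest-first eviction order are all assumptions, since the thread leaves the matching and attach/detach protocol undefined.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class IdleContainerCache:
    """Keeps finished containers around until their memory is needed."""

    def __init__(self, capacity_mb: int) -> None:
        self.capacity_mb = capacity_mb
        # container ID -> (signature, memory in MB), oldest entry first
        self._idle: "OrderedDict[str, Tuple[str, int]]" = OrderedDict()

    def used_mb(self) -> int:
        return sum(mem for _, mem in self._idle.values())

    def release(self, container_id: str, signature: str, memory_mb: int) -> None:
        # A container finished its task; keep it idle instead of tearing it down.
        self._idle[container_id] = (signature, memory_mb)
        self._idle.move_to_end(container_id)

    def make_room(self, needed_mb: int) -> List[str]:
        # Discard the oldest idle containers until enough memory is free.
        evicted = []
        while self._idle and self.capacity_mb - self.used_mb() < needed_mb:
            container_id, _ = self._idle.popitem(last=False)
            evicted.append(container_id)
        return evicted

    def take(self, signature: str) -> Optional[str]:
        # Reuse the oldest idle container that matches the request, if any.
        match = next((cid for cid, (sig, _) in self._idle.items()
                      if sig == signature), None)
        if match is not None:
            del self._idle[match]
        return match
```

Unlike a static pool configuration, nothing here has to be configured up front: the cache fills with whatever the cluster has recently run and shrinks as soon as regular containers need the space.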

bq. I think that's going to be challenging for the apps in practice and will 
limit which apps can leverage this feature reliably. This is going to be 
challenging for containers running VMs whose memory limits need to be set up at 
startup (e.g.: JVMs). Minimally I think this feature needs a way for apps to 
specify that they do not have a way to communicate (or at least act upon) 
memory changes. In those cases YARN will have to decide on tradeoffs like a 
primed-but-oversized container that will run fast but waste grid resources and 
also avoid reusing a container that needs to grow to satisfy the app 
request.

Hmm, let me look at the code and see how container resizing works today. What 
you are saying makes sense, but in that case container resizing won't work as 
well. For our scenarios resourc

[jira] [Commented] (YARN-5501) Container Pooling in YARN

2017-02-08 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858564#comment-15858564
 ] 

Hitesh Sharma commented on YARN-5501:
-

Hi [~jlowe],

First of all, a big thanks for taking the time to look at the document and 
sharing your thoughts. I appreciate it a lot.

bq. I am confused on how this will be used in practice.  To me pre-initialized 
containers means containers that have already started up with application- or 
framework-specific resources localized, processes have been launched using 
those resources, and potentially connections already negotiated to external 
services.  I'm not sure how YARN is supposed to know what mix of local 
resources, users, and configs to use for preinitialized containers that will 
get a good "hit rate" on container requests.  Maybe I'm misunderstanding what 
is really meant by "preinitialized," and some concrete, sample use cases with 
detailed walkthroughs of how they work in practice would really help 
crystallize the goals here.

Your understanding of pre-initialized containers is correct here. In the 
proposed design, the YARN RM has the config to start pre-initialized 
containers. This config is pretty much a launch context: it contains launch 
commands and details of resources to localize, and we also provide the resource 
constraints with which the container should be started. This configuration is 
currently static, but in the future we intend for it to be pluggable, so we can 
extend it to be dynamic and adjust based on cluster load.
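
As a rough illustration of a static configuration that could later be made pluggable, consider the sketch below. Every name in it (`PreInitSpec`, the provider classes, the load threshold) is a hypothetical stand-in; the comment only states that the config carries launch commands, resources to localize, and resource constraints.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class PreInitSpec:
    launch_commands: List[str]
    local_resources: Dict[str, str]  # link name -> source URI
    memory_mb: int
    vcores: int
    instances: int                   # how many pooled containers to keep


class SpecProvider(Protocol):
    def specs(self, cluster_load: float) -> List[PreInitSpec]: ...


class StaticSpecProvider:
    """Matches the current behaviour: the same specs regardless of load."""

    def __init__(self, specs: List[PreInitSpec]) -> None:
        self._specs = list(specs)

    def specs(self, cluster_load: float) -> List[PreInitSpec]:
        return list(self._specs)


class LoadAwareSpecProvider:
    """Hypothetical dynamic variant: halve the pool when the cluster is busy."""

    def __init__(self, base: List[PreInitSpec],
                 busy_threshold: float = 0.8) -> None:
        self._base = list(base)
        self._busy_threshold = busy_threshold

    def specs(self, cluster_load: float) -> List[PreInitSpec]:
        if cluster_load < self._busy_threshold:
            return list(self._base)
        return [replace(s, instances=max(1, s.instances // 2))
                for s in self._base]
```

Keeping the provider behind a small interface is one way to get the pluggability mentioned above without changing whatever consumes the specs.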

The first use case is a scenario where each container needs to start some 
processes that take a lot of time to initialize (localization and startup 
costs). The YARN NM receives the config to start the pre-initialized container 
(there is a dummy application that is associated with the pre-init container 
for a specific application) and follows the regular code paths for a container, 
which include localizing resources and launching the container. As you know, in 
YARN a container goes to the RUNNING state once started, but a pre-initialized 
container instead goes to the PREINITIALIZED state (there are some hooks which 
allow us to know that the container has initialized properly). From this point 
the container is no different from a regular container, as the container 
monitor is watching over it. The "Pool Manager" within the YARN NM is used to 
start the pre-initialized container and watches for container events such as 
stop, in which case it simply tries to start the container again. In other 
words, at the moment we simply use the YARN RM to pick the nodes where 
pre-initialized containers should be started and let the "Pool Manager" in the 
NM manage the lifecycle of the container.
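
The Pool Manager behaviour described above (start containers, mark them PREINITIALIZED once a hook reports readiness, restart on stop) might be modelled as follows. The class, its method names, the generated IDs, and the intermediate LOCALIZING state are assumptions made for the sketch.

```python
from typing import Dict, List


class PoolManagerSketch:
    """Keeps a fixed number of pre-initialized containers alive on one NM."""

    def __init__(self, target_size: int) -> None:
        self.target_size = target_size
        self.states: Dict[str, str] = {}
        self._next = 0

    def _start_one(self) -> str:
        container_id = f"preinit_{self._next}"
        self._next += 1
        # Regular code path: localize first, then launch.
        self.states[container_id] = "LOCALIZING"
        return container_id

    def fill(self) -> List[str]:
        # Start containers until the pool reaches its target size.
        started = []
        while len(self.states) < self.target_size:
            started.append(self._start_one())
        return started

    def on_initialized(self, container_id: str) -> None:
        # Called by the (hypothetical) readiness hook instead of going RUNNING.
        self.states[container_id] = "PREINITIALIZED"

    def on_stopped(self, container_id: str) -> List[str]:
        # A pooled container stopped: forget it and start a replacement.
        self.states.pop(container_id, None)
        return self.fill()

    def ready(self) -> List[str]:
        return sorted(cid for cid, state in self.states.items()
                      if state == "PREINITIALIZED")
```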

When the AM for which we pre-initialized the container comes and asks for this 
container, the "Container Manager" takes the pre-initialized container by 
issuing a "detach" container event and "attaches" it to the application. We 
added attachContainer and detachContainer events to ContainerExecutor, which 
allows us to define what they mean. As an example, in attachContainer we start 
a new process within the cgroup of the pre-initialized container. The PID to 
container mapping within the ContainerExecutor is updated to reflect this 
(pre-initialized containers have a different container ID and belong to a 
different application before they are taken up). As part of detachContainer, 
all the resources associated with the pre-initialized container become 
associated with the new container and get cleaned up accordingly.
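
One way to picture the executor-side bookkeeping is the sketch below. It is a simplification under stated assumptions: the real ContainerExecutor tracks PID files, whereas this toy keeps cgroup membership in dictionaries, and all method names other than attach/detach are invented for illustration.

```python
from typing import Dict, List


class ExecutorSketch:
    """Tracks which cgroup backs each container ID and which PIDs live in it."""

    def __init__(self) -> None:
        self.cgroup_of: Dict[str, str] = {}     # container ID -> cgroup name
        self.pids_in: Dict[str, List[int]] = {}  # cgroup name -> PIDs

    def launch(self, container_id: str, pid: int) -> None:
        # The cgroup is named after the container that created it.
        self.cgroup_of[container_id] = container_id
        self.pids_in[container_id] = [pid]

    def detach_container(self, preinit_id: str, new_id: str) -> None:
        # Hand the pre-init container's cgroup over to the requested container.
        self.cgroup_of[new_id] = self.cgroup_of.pop(preinit_id)

    def attach_container(self, new_id: str, pid: int) -> None:
        # Start the extra process inside the inherited cgroup.
        self.pids_in[self.cgroup_of[new_id]].append(pid)

    def cleanup(self, container_id: str) -> List[int]:
        # Everything inherited is cleaned up under the new container's ID.
        cgroup = self.cgroup_of.pop(container_id)
        return self.pids_in.pop(cgroup)
```

After a detach followed by an attach, cleaning up the requested container also reaps the process that the pre-initialized container started, which is the "cleaned up accordingly" behaviour described above.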

The other use case where we have prototyped container pooling is the scenario 
where a container actually needs to be a Virtual Machine. Creating VMs can take 
a long time, so container pooling allows us to keep empty VM shells ready to 
go.

bq. Reusing containers across different applications is going to create some 
interesting scenarios that don't exist today.  For example, what does a 
container ID for one of these look like?  How many things today assume that 
all container IDs for an application are essentially prefixed by the 
application ID?  This would violate that assumption, unless we introduce some 
sort of container ID aliasing where we create a "fake" container ID that maps 
to the "real" ID of the reused container.  It would be good to know how we're 
going to treat container IDs and what applications will see when they get one 
of these containers in response to their allocation request.

All pre-initialized containers belong to a specific application type. There is 
a dummy application created to which the pre-initialized containers are mapped. 
As part of the containerAttach and containerDetach events we change which 
application the containers are associated with. Specifically, ContainerExecutor 
has a mapping of container ID to PID file, and as part of container detach we 
update this 
mappi

[jira] [Comment Edited] (YARN-6059) Update paused container state in the state store

2017-02-08 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858366#comment-15858366
 ] 

Hitesh Sharma edited comment on YARN-6059 at 2/8/17 6:37 PM:
-

[~asuresh], the patch is renamed.


was (Author: hrsharma):
Renaming the patch.

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-02-08 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-6059-YARN-5972.001.patch

Renaming the patch.

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch, 
> YARN-6059-YARN-5972.001.patch
>
>







[jira] [Assigned] (YARN-5501) Container Pooling in YARN

2017-02-07 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma reassigned YARN-5501:
---

Assignee: Hitesh Sharma

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby 
> allow for development of more interactive applications on YARN.






[jira] [Updated] (YARN-5501) Container Pooling in YARN

2017-02-07 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5501:

Attachment: Container Pooling - one pager.pdf

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
> Attachments: Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby 
> allow for development of more interactive applications on YARN.






[jira] [Commented] (YARN-5501) Container Pooling in YARN

2017-02-07 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857192#comment-15857192
 ] 

Hitesh Sharma commented on YARN-5501:
-

Attaching a one-pager design doc to capture some of the details. This is still 
an early draft, so I would appreciate some feedback.

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
> Attachments: Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby 
> allow for development of more interactive applications on YARN.






[jira] [Commented] (YARN-6059) Update paused container state in the state store

2017-02-07 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856654#comment-15856654
 ] 

Hitesh Sharma commented on YARN-6059:
-

Ping [~asuresh], [~kkaranasos]: can you take a look at the patch? The 
current patch is a very raw implementation, and before I refine it, it would be 
good to agree on a high-level approach here. 

Thank you.

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch
>
>







[jira] [Updated] (YARN-6059) Update paused container state in the state store

2017-01-26 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-6059:

Attachment: YARN-5216-YARN-6059.001.patch

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch
>
>







[jira] [Commented] (YARN-6059) Update paused container state in the state store

2017-01-07 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15808319#comment-15808319
 ] 

Hitesh Sharma commented on YARN-6059:
-

[~asuresh], the strategy for paused containers would depend on what we intend 
to do for opportunistic containers, which is something we still need to work 
on. We should open a separate JIRA to discuss how opportunistic containers can 
be recovered (there shouldn't be anything special there whether they are paused 
or running opportunistic containers). In this JIRA I'm only looking to make the 
changes to the state store to ensure paused containers are reflected properly 
there.
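
To illustrate the narrow scope described here (recording the paused state so that it survives a restart, without deciding how recovery then treats the container), here is a minimal sketch using an in-memory key-value store. The key layout and method names are assumptions and do not reflect the NM state store's actual schema.

```python
from typing import Dict


class InMemoryStateStore:
    """Toy stand-in for a persistent NM state store."""

    def __init__(self) -> None:
        self.kv: Dict[str, str] = {}

    def store_container_launched(self, container_id: str) -> None:
        self.kv[f"{container_id}/launched"] = "1"

    def store_container_paused(self, container_id: str) -> None:
        self.kv[f"{container_id}/paused"] = "1"

    def remove_container_paused(self, container_id: str) -> None:
        # Called on resume so recovery does not see a stale paused marker.
        self.kv.pop(f"{container_id}/paused", None)

    def recover_status(self, container_id: str) -> str:
        # The paused marker takes precedence over the launched marker.
        if f"{container_id}/paused" in self.kv:
            return "PAUSED"
        if f"{container_id}/launched" in self.kv:
            return "LAUNCHED"
        return "REQUESTED"
```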

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
>







[jira] [Created] (YARN-6059) Update paused container state in the state store

2017-01-05 Thread Hitesh Sharma (JIRA)
Hitesh Sharma created YARN-6059:
---

 Summary: Update paused container state in the state store
 Key: YARN-6059
 URL: https://issues.apache.org/jira/browse/YARN-6059
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Sharma
Assignee: Hitesh Sharma









[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-24 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15775615#comment-15775615
 ] 

Hitesh Sharma commented on YARN-5216:
-

Resolving some feedback comments from [~asuresh]. Thanks!

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, 
> YARN-5216-YARN-5972.002.patch, YARN-5216-YARN-5972.003.patch, 
> YARN-5216-YARN-5972.004.patch, YARN-5216-YARN-5972.005.patch, 
> YARN-5216-YARN-5972.006.patch, YARN-5216-YARN-5972.007.patch, 
> YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.
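
A configurable hook of the kind proposed in the description could be sketched as below. The policy names, the function signature, and the order in which containers are picked are illustrative assumptions; the description only says that the default action is KILL and that the NM should be able to take a different one.

```python
from typing import List, Tuple

# Hypothetical mapping from a configured policy name to the action taken.
ACTIONS = {"kill": "KILL", "pause": "PAUSE"}


def reclaim(opportunistic: List[Tuple[str, int]], needed_mb: int,
            policy: str = "kill") -> List[Tuple[str, str]]:
    """Pick opportunistic containers to act on until enough memory is freed.

    `opportunistic` holds (container ID, memory in MB) pairs in the order
    they should be considered.
    """
    action = ACTIONS[policy]
    chosen = []
    freed = 0
    for container_id, memory_mb in opportunistic:
        if freed >= needed_mb:
            break
        chosen.append((action, container_id))
        freed += memory_mb
    return chosen
```

For example, `reclaim([("c1", 1024), ("c2", 2048)], 1500, "pause")` selects both containers for pausing, while the default policy with a 1000 MB request kills only c1.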






[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-24 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5216:

Attachment: YARN-5216-YARN-5972.007.patch

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, 
> YARN-5216-YARN-5972.002.patch, YARN-5216-YARN-5972.003.patch, 
> YARN-5216-YARN-5972.004.patch, YARN-5216-YARN-5972.005.patch, 
> YARN-5216-YARN-5972.006.patch, YARN-5216-YARN-5972.007.patch, 
> YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-21 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5216:

Attachment: YARN-5216-YARN-5972.006.patch

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, 
> YARN-5216-YARN-5972.002.patch, YARN-5216-YARN-5972.003.patch, 
> YARN-5216-YARN-5972.004.patch, YARN-5216-YARN-5972.005.patch, 
> YARN-5216-YARN-5972.006.patch, YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-21 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768666#comment-15768666
 ] 

Hitesh Sharma commented on YARN-5216:
-

Ok, fair point regarding the dispatcher. Updating the patch.

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, 
> YARN-5216-YARN-5972.002.patch, YARN-5216-YARN-5972.003.patch, 
> YARN-5216-YARN-5972.004.patch, YARN-5216-YARN-5972.005.patch, 
> YARN-5216-YARN-5972.006.patch, YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-21 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5216:

Attachment: YARN-5216-YARN-5972.005.patch

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, 
> YARN-5216-YARN-5972.002.patch, YARN-5216-YARN-5972.003.patch, 
> YARN-5216-YARN-5972.004.patch, YARN-5216-YARN-5972.005.patch, 
> YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-21 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768498#comment-15768498
 ] 

Hitesh Sharma commented on YARN-5216:
-

Hi [~asuresh], thanks for the feedback. I have incorporated it and 
improved the test case to exercise more code paths.

bq. Instead of explicitly calling "dispatcher.getEventHandler().handle(..)" 
from within ContainerScheduler, can you create a method inside Container: 
sendPauseEvent(String) and sendResumeEvent(String)

I'm not so sure about adding anything to the Container interface, as 
pause/resume is only for opportunistic containers. We can do that when support 
for it is added to guaranteed containers. 
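
The two options under discussion can be contrasted with a small sketch. The event tuple shape and the class names are assumptions; only `sendPauseEvent` / `sendResumeEvent` (rendered here in Python naming) and the direct dispatcher call come from the review comment.

```python
from typing import List, Tuple

Event = Tuple[str, str, str]  # (event type, container ID, reason)


class Dispatcher:
    def __init__(self) -> None:
        self.handled: List[Event] = []

    def handle(self, event: Event) -> None:
        self.handled.append(event)


class SchedulerSketch:
    """Option A: the scheduler builds and dispatches the event itself."""

    def __init__(self, dispatcher: Dispatcher) -> None:
        self.dispatcher = dispatcher

    def pause(self, container_id: str, reason: str) -> None:
        self.dispatcher.handle(("PAUSE_CONTAINER", container_id, reason))


class ContainerSketch:
    """Option B: the container exposes helpers that hide the dispatcher."""

    def __init__(self, container_id: str, dispatcher: Dispatcher) -> None:
        self.container_id = container_id
        self._dispatcher = dispatcher

    def send_pause_event(self, reason: str) -> None:
        self._dispatcher.handle(("PAUSE_CONTAINER", self.container_id, reason))

    def send_resume_event(self, reason: str) -> None:
        self._dispatcher.handle(("RESUME_CONTAINER", self.container_id, reason))
```

Both options emit the same event. The difference is where the knowledge of the dispatcher lives: option B widens the container interface for every container type, which is the concern raised in the reply above.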

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, 
> YARN-5216-YARN-5972.002.patch, YARN-5216-YARN-5972.003.patch, 
> YARN-5216-YARN-5972.004.patch, YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-14 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5216:

Attachment: YARN-5216-YARN-5972.004.patch

Fix build warning

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, 
> YARN-5216-YARN-5972.002.patch, YARN-5216-YARN-5972.003.patch, 
> YARN-5216-YARN-5972.004.patch, YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-14 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750141#comment-15750141
 ] 

Hitesh Sharma commented on YARN-5216:
-

Hi all, thank you for the feedback, I really appreciate it.

[~asuresh] and I discussed offline and decided to consider the reclaimResources 
API in container executor as a separate JIRA in the future.

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, 
> YARN-5216-YARN-5972.002.patch, YARN-5216-YARN-5972.003.patch, 
> YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-14 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5216:

Attachment: YARN-5216-YARN-5972.003.patch

Resolving javac warning.

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, 
> YARN-5216-YARN-5972.002.patch, YARN-5216-YARN-5972.003.patch, 
> YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-13 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5216:

Attachment: YARN-5216-YARN-5972.002.patch

Resolving build issues

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, 
> YARN-5216-YARN-5972.002.patch, YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Comment Edited] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-10 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15738832#comment-15738832
 ] 

Hitesh Sharma edited comment on YARN-5216 at 12/11/16 2:22 AM:
---

I will wait for [~asuresh] to clarify here, but I think one thing on the table 
is to add a preempt API in container executor, which has a knob to allow KILL 
vs PAUSE. I captured some concerns with that, but do agree that we don't need 
two params - one in scheduler and other in executor.


was (Author: hrsharma):
I will wait for [~asuresh] to clarify here, but I think one thing on the table 
is to add a preempt API in container executor, which has a knob to allow KILL 
vs PAUSE. I captured some concerns with that, but do agree that we don't need 
to params - one in scheduler and other in executor.

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, YARN5216.001.patch, 
> yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-10 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15738832#comment-15738832
 ] 

Hitesh Sharma commented on YARN-5216:
-

I will wait for [~asuresh] to clarify here, but I think one thing on the table 
is to add a preempt API in container executor, which has a knob to allow KILL 
vs PAUSE. I captured some concerns with that, but do agree that we don't need 
to params - one in scheduler and other in executor.

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, YARN5216.001.patch, 
> yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-10 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15738826#comment-15738826
 ] 

Hitesh Sharma commented on YARN-5216:
-

[~asuresh], I'm a little confused. To be clear, are you suggesting that we 
have a preempt API in the container executor and a conf knob to select the type 
of preemption (PAUSE or KILL)?

Preempt is a very overloaded term. The current semantics work for the scheduler, 
but I'm not sure whether they can be extended to work preservation or some other 
scenario. Currently, if a container is preempted via PAUSE, it gets RESUMED 
when there is free capacity, but such behavior may not be acceptable if the 
scenario for calling the preempt API is different. In other words, having the 
scheduler call PAUSE or KILL is a very clear choice that needs to be made, but 
we can't say that PAUSE should be the preemption policy for the container in 
all cases. Further, container PAUSE applies only to OPPORTUNISTIC containers and 
is not something we have enabled for GUARANTEED ones, so I would scope it down 
to the scheduler for now.

I do want to say that at some point in the future it would make sense to have a 
preempt API with behavior that is consistent across all scenarios, but I 
don't think we are there yet. Just my $0.02.
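
To make the scoping concrete, here is a minimal sketch of keeping the PAUSE vs 
KILL choice inside the scheduler. All names below (ExecType, PreemptAction, 
SchedulerPreemptionPolicy, the usePause flag) are hypothetical and invented for 
illustration; they are not the classes or conf keys from the patch.

```java
// Hypothetical types for illustration only; these are not the YARN classes.
enum ExecType { GUARANTEED, OPPORTUNISTIC }

enum PreemptAction { PAUSE, KILL, NONE }

// Scheduler-side policy: the single place where PAUSE vs KILL is decided.
final class SchedulerPreemptionPolicy {
    // Assumed to be read from a conf knob; the knob itself is not modeled here.
    private final boolean usePause;

    SchedulerPreemptionPolicy(boolean usePause) {
        this.usePause = usePause;
    }

    // Decide what happens to a running container when a GUARANTEED
    // container is waiting for resources on the same node.
    PreemptAction actionFor(ExecType victimType) {
        if (victimType == ExecType.GUARANTEED) {
            // PAUSE is not enabled for guaranteed containers, and the
            // scheduler does not preempt them in this sketch.
            return PreemptAction.NONE;
        }
        return usePause ? PreemptAction.PAUSE : PreemptAction.KILL;
    }

    public static void main(String[] args) {
        SchedulerPreemptionPolicy pausing = new SchedulerPreemptionPolicy(true);
        SchedulerPreemptionPolicy killing = new SchedulerPreemptionPolicy(false);
        System.out.println(pausing.actionFor(ExecType.OPPORTUNISTIC));
        System.out.println(killing.actionFor(ExecType.OPPORTUNISTIC));
        System.out.println(pausing.actionFor(ExecType.GUARANTEED));
    }
}
```

The point of the sketch is that the executor never sees a generic "preempt" 
call; it only receives the concrete action the scheduler has already chosen.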


> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, YARN5216.001.patch, 
> yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-10 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15738398#comment-15738398
 ] 

Hitesh Sharma commented on YARN-5216:
-

Hi [~asuresh], are you saying that we add a preempt method in the container 
executor and have the scheduler call that instead?

One issue with that approach is that we would need to add a conf knob in the 
executor to pick the kind of preemption you want, and that becomes a little too 
generic and ambiguous. What we are looking for is preemption of an OPPORTUNISTIC 
container to schedule a GUARANTEED one, and I feel that's best captured in the 
scheduler state.

LMK your thoughts.

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, YARN5216.001.patch, 
> yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-12-09 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5216:

Attachment: YARN-5216-YARN-5972.001.patch

Posting a patch based on YARN-5972.

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-scheduling
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>  Labels: oct16-hard
> Attachments: YARN-5216-YARN-5972.001.patch, YARN5216.001.patch, 
> yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5292) NM Container lifecycle and state transitions to support for PAUSED container state.

2016-12-08 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731526#comment-15731526
 ] 

Hitesh Sharma commented on YARN-5292:
-

Hi [~kasha], the design doc and the patch allow an opportunistic container to 
be paused and resumed. The actual implementation of the pluggable interface 
will come as part of [YARN-5196], where you can specify via yarn-site.xml the 
preemption policy for an opportunistic container when there is a guaranteed 
container waiting to run (i.e. pause or kill). Currently only the NM scheduler 
can raise the PAUSE/RESUME events, and there is no support for doing so in the 
container management protocol.

One of the questions on the table is whether there is a need to extend support 
for PAUSE/RESUME to guaranteed containers and have the AMs initiate it. I would 
love to hear some thoughts on that and whether there are any use cases that can 
benefit from it.

> NM Container lifecycle and state transitions to support for PAUSED container 
> state.
> ---
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, YARN-5292.004.patch, YARN-5292.005.patch, yarn-5292.pdf
>
>
> This JIRA addresses the NM Container state machine and lifecycle changes 
> needed to support pausing.






[jira] [Commented] (YARN-5959) RM changes to support change of container ExecutionType

2016-12-06 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727312#comment-15727312
 ] 

Hitesh Sharma commented on YARN-5959:
-

Hello [~asuresh], can you share some design details on how the NM handles the 
change in execution type? I will look at the patch more closely, but having 
that context in mind will help.

Thanks a lot!

> RM changes to support change of container ExecutionType
> ---
>
> Key: YARN-5959
> URL: https://issues.apache.org/jira/browse/YARN-5959
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5959.combined.001.patch, YARN-5959.wip.002.patch, 
> YARN-5959.wip.patch
>
>
> RM side changes to allow an AM to ask for change of ExecutionType.
> Currently, there are two cases:
> # *Promotion* : OPPORTUNISTIC to GUARANTEED.
> # *Demotion* : GUARANTEED to OPPORTUNISTIC.
> This is similar to YARN-1197, which allows for change in Container resources. 






[jira] [Commented] (YARN-5966) AMRMClient changes to support ExecutionType update

2016-12-06 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727307#comment-15727307
 ] 

Hitesh Sharma commented on YARN-5966:
-

Just a quick clarification: how is this different from [YARN-5087]? Is this an 
extension of that patch?

> AMRMClient changes to support ExecutionType update
> --
>
> Key: YARN-5966
> URL: https://issues.apache.org/jira/browse/YARN-5966
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5966.wip.001.patch
>
>
> {{AMRMClient}} changes to support change of container ExecutionType






[jira] [Commented] (YARN-5972) Add Support for Pausing/Freezing of containers

2016-12-06 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727256#comment-15727256
 ] 

Hitesh Sharma commented on YARN-5972:
-

Hi folks, thanks for opening this JIRA and the feedback. Much appreciated. 

{quote}
While notifying an AM of containers that are about to be preempted does allow 
the AM to check-point work, it does imply, as you pointed out, that AMs be 
modified to act on this input and make some decisions based on it.

Container pausing/freezing on the other hand, given OS/VM level support (also 
exposed via Docker and LXC) to actually freeze a process (agreed, their 
definition of freeze might vary), is actually AM/application independent. This 
can be useful, for applications and deployments that do not really want to 
check-point on its own but at the same time like the idea of container 
preemption with work preservations.
{quote}

Agree with [~asuresh] here. What container pausing/freezing offers is the 
ability to delegate to the underlying OS how the resources used by a container 
should be reclaimed and, when resources free up again, to restart the 
container. The gains of doing so will vary based on the container executor 
implementation. That said, this doesn't make PAUSE/RESUME a perfect solution 
for work preservation or a substitute for AM-specific checkpointing.

[YARN-5292] adds PAUSE/RESUME for opportunistic containers and doesn't target 
guaranteed containers. I can think of scenarios where it would be good to have 
this functionality in guaranteed containers, but I would wait to see the need 
arise in the community.

Allowing the ContainerManager to initiate a pause/resume on an opportunistic 
container was considered, but we decided not to have that functionality. There 
are some edge cases around what happens if the CM initiates a RESUME on a 
paused container and the NM tries to PAUSE it ([YARN-5216]). I think [~subru] 
is also touching on these edge cases.

Overall I feel that the current design of allowing PAUSE/RESUME on 
opportunistic containers is a good starting point: it allows us to PAUSE an 
opportunistic container in favor of a guaranteed one and, when resources free 
up, it gets RESUMED ([YARN-5216]). We should probably implement pauseContainer 
and resumeContainer for Docker based container executors, as opportunistic 
containers running inside Docker containers can benefit from it.

If the community feels the need, we can extend the functionality to guaranteed 
containers. I personally think that may become more relevant as YARN containers 
become virtualized via Docker or virtual machines, but I would love to hear 
some scenarios before we do that.
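
As a rough illustration of the pause-in-favor-of-guaranteed flow described 
above, here is a small self-contained model of one node. It is only a sketch 
under simplifying assumptions: resources are a single integer, victims are 
picked newest-first, and every name (NodeSketch, Container, State) is 
hypothetical rather than taken from the NM code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a single node; none of these names are YARN classes.
class NodeSketch {
    enum State { NEW, RUNNING, PAUSED, DONE }

    static final class Container {
        final String id;
        final int size;           // abstract resource units
        final boolean guaranteed; // false means opportunistic
        State state = State.NEW;

        Container(String id, int size, boolean guaranteed) {
            this.id = id;
            this.size = size;
            this.guaranteed = guaranteed;
        }
    }

    private final int capacity;
    private int used = 0;
    private final List<Container> running = new ArrayList<>();
    private final Deque<Container> paused = new ArrayDeque<>();

    NodeSketch(int capacity) {
        this.capacity = capacity;
    }

    // Opportunistic containers only start if they fit in the free capacity.
    boolean startOpportunistic(Container c) {
        if (used + c.size > capacity) {
            return false;
        }
        run(c);
        return true;
    }

    // Guaranteed containers pause opportunistic ones until they fit.
    void startGuaranteed(Container c) {
        for (int i = running.size() - 1; i >= 0 && used + c.size > capacity; i--) {
            Container victim = running.get(i);
            if (!victim.guaranteed) {
                running.remove(i);
                used -= victim.size;
                victim.state = State.PAUSED;
                paused.addLast(victim);
            }
        }
        if (used + c.size > capacity) {
            throw new IllegalStateException("not enough capacity for " + c.id);
        }
        run(c);
    }

    // When a container finishes, resume paused containers that now fit.
    void finish(Container c) {
        if (running.remove(c)) {
            used -= c.size;
        } else {
            paused.remove(c);
        }
        c.state = State.DONE;
        while (!paused.isEmpty() && used + paused.peekFirst().size <= capacity) {
            run(paused.pollFirst());
        }
    }

    private void run(Container c) {
        c.state = State.RUNNING;
        running.add(c);
        used += c.size;
    }

    public static void main(String[] args) {
        NodeSketch node = new NodeSketch(4);
        Container opp = new Container("opp-1", 3, false);
        Container gua = new Container("gua-1", 2, true);
        node.startOpportunistic(opp);
        node.startGuaranteed(gua);
        System.out.println(opp.id + " is " + opp.state);
        node.finish(gua);
        System.out.println(opp.id + " is " + opp.state);
    }
}
```

In a real executor the PAUSED transition would map onto whatever the runtime 
offers (for example a freezer-style mechanism), which this model leaves out.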

> Add Support for Pausing/Freezing of containers
> --
>
> Key: YARN-5972
> URL: https://issues.apache.org/jira/browse/YARN-5972
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> Instead of preempting a running container, the container can be moved to a 
> PAUSED state, where it remains until resources are freed up on the node; the 
> preempted container can then return to the running state.
> Note that process freezing is already supported by the 'cgroups freezer', 
> which is used internally by the docker pause functionality. Windows also has 
> OS level support of a similar nature.
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt call would 
> pause the VM and resume would restore it back to the running state.
> If the container executor / runtime doesn't support preemption, then preempt 
> would default to killing the container. 
>  






[jira] [Updated] (YARN-5292) Support for PAUSED container state

2016-12-04 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5292:

Attachment: YARN-5292.005.patch

Resolving review feedback.

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, YARN-5292.004.patch, YARN-5292.005.patch, yarn-5292.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources are freed up on the node; the preempted container can 
> then return to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Commented] (YARN-5292) Support for PAUSED container state

2016-12-04 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15720879#comment-15720879
 ] 

Hitesh Sharma commented on YARN-5292:
-

Hi [~asuresh], thanks a lot for the feedback!

1. The default behavior is to throw an exception, which is caught by the 
ContainerLauncher, which then proceeds to kill the container. So if no 
PAUSE/RESUME support exists, we kill the container. On a side note, we can open 
a JIRA to implement PAUSE/RESUME for some of the executors, such as Docker.

2. Took care of collapsing the transitions into one.

3. If the container is REINITIALIZING and we get a PAUSE, the behavior is 
nondeterministic. Pausing the container when it hasn't finished reinitialization 
can be bad, so we kill it instead. I feel it would be quite complicated to 
try to add the container back to the scheduler queue somehow, so let's not try 
to do so.

4. Good point. Done.
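
For point 1, the fallback can be sketched roughly as below. This is only an 
illustration of the "throw by default, catch and kill" idea; SketchExecutor, 
PausingExecutor, PlainExecutor and SketchLauncher are hypothetical names, and 
the exception type is an assumption, not necessarily what the patch uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a container executor; not the real YARN class.
abstract class SketchExecutor {
    final List<String> log = new ArrayList<>();

    // Default: pausing is unsupported, so signal that to the caller.
    void pauseContainer(String containerId) {
        throw new UnsupportedOperationException("pause not supported");
    }

    void killContainer(String containerId) {
        log.add("KILL " + containerId);
    }
}

// An executor that overrides the default and can freeze a container.
class PausingExecutor extends SketchExecutor {
    @Override
    void pauseContainer(String containerId) {
        log.add("PAUSE " + containerId);
    }
}

// An executor that inherits the default and therefore cannot pause.
class PlainExecutor extends SketchExecutor {
}

// Hypothetical launcher: try to pause, fall back to kill on failure.
class SketchLauncher {
    static String pauseOrKill(SketchExecutor exec, String containerId) {
        try {
            exec.pauseContainer(containerId);
            return "PAUSED";
        } catch (UnsupportedOperationException e) {
            exec.killContainer(containerId);
            return "KILLED";
        }
    }

    public static void main(String[] args) {
        System.out.println(pauseOrKill(new PausingExecutor(), "c1"));
        System.out.println(pauseOrKill(new PlainExecutor(), "c2"));
    }
}
```

With this shape, an executor opts in to PAUSE simply by overriding the method, 
and existing executors keep the current kill behavior without any change.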

Please have a look at the posted patch.

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources are freed up on the node; the preempted container can 
> then return to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Commented] (YARN-5292) Support for PAUSED container state

2016-12-01 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713883#comment-15713883
 ] 

Hitesh Sharma commented on YARN-5292:
-

[~arun suresh], can you please look at the attached patch? Thanks!

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources are freed up on the node; the preempted container can 
> then return to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Comment Edited] (YARN-5292) Support for PAUSED container state

2016-12-01 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713883#comment-15713883
 ] 

Hitesh Sharma edited comment on YARN-5292 at 12/2/16 3:21 AM:
--

[~asuresh], can you please look at the attached patch? Thanks!


was (Author: hrsharma):
[~asuresh]], can you please look at the attached patch? Thanks!

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources are freed up on the node; the preempted container can 
> then return to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Comment Edited] (YARN-5292) Support for PAUSED container state

2016-12-01 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713883#comment-15713883
 ] 

Hitesh Sharma edited comment on YARN-5292 at 12/2/16 3:21 AM:
--

[~asuresh]], can you please look at the attached patch? Thanks!


was (Author: hrsharma):
[~arun suresh], can you please look at the attached patch? Thanks!

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources are freed up on the node; the preempted container can 
> then return to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Commented] (YARN-3611) Support Docker Containers In LinuxContainerExecutor

2016-11-30 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710732#comment-15710732
 ] 

Hitesh Sharma commented on YARN-3611:
-

Hi folks,

Docker is now available on Windows and is fully supported by Docker Inc. (I'm 
talking about launching Windows containers via Docker).

https://www.docker.com/microsoft

Unfortunately, in the current design Docker support is limited to Linux only. I 
think we need to revisit this and find a way to share the same code across 
Docker support for Windows and Linux. Another goal to keep in mind is to make 
DockerContainerExecutor completely OS agnostic, since in certain cases the 
Docker client might actually be talking to a daemon on a remote machine or a VM 
(which may be Linux or Windows). I would love to hear some thoughts on how to 
achieve Docker support for Windows by reusing all the good work being done here.

Thanks!









> Support Docker Containers In LinuxContainerExecutor
> ---
>
> Key: YARN-3611
> URL: https://issues.apache.org/jira/browse/YARN-3611
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>
> Support Docker Containers In LinuxContainerExecutor
> LinuxContainerExecutor provides useful functionality today with respect to 
> localization, cgroups based resource management and isolation for CPU, 
> network, disk etc. as well as security with a well-defined mechanism to 
> execute privileged operations using the container-executor utility.  Bringing 
> docker support to LinuxContainerExecutor lets us use all of this 
> functionality when running docker containers under YARN, while not requiring 
> users and admins to configure and use a different ContainerExecutor. 
> There are several aspects here that need to be worked through :
> * Mechanism(s) to let clients request docker-specific functionality - we 
> could initially implement this via environment variables without impacting 
> the client API.
> * Security - both docker daemon as well as application
> * Docker image localization
> * Running a docker container via container-executor as a specified user
> * “Isolate” the docker container in terms of CPU/network/disk/etc
> * Communicating with and/or signaling the running container (ensure correct 
> pid handling)
> * Figure out workarounds for certain performance-sensitive scenarios like 
> HDFS short-circuit reads 
> * All of these need to be achieved without changing the current behavior of 
> LinuxContainerExecutor






[jira] [Updated] (YARN-5292) Support for PAUSED container state

2016-11-30 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5292:

Attachment: YARN-5292.004.patch

Adding a test case and raising an event so that the scheduler knows the 
container was paused.

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf
>
>






[jira] [Commented] (YARN-5292) Support for PAUSED container state

2016-11-23 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692011#comment-15692011
 ] 

Hitesh Sharma commented on YARN-5292:
-

Hi [~jianhe], apologies for the late response. It seems that [YARN-4876] adds 
the functionality to do what you are describing. Please let me know if you have 
something else in mind here. Thanks!

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, yarn-5292.pdf
>
>






[jira] [Commented] (YARN-5292) Support for PAUSED container state

2016-11-22 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686049#comment-15686049
 ] 

Hitesh Sharma commented on YARN-5292:
-

Thanks [~subru] for the comments. I agree with you that we need to think 
separately about paused containers for the OPPORTUNISTIC and GUARANTEED 
execution types. Most of the discussion in this JIRA is targeted towards 
OPPORTUNISTIC containers. I will open a new JIRA to discuss pause/resume for 
YARN containers, and this one can be used for OPPORTUNISTIC containers.

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, yarn-5292.pdf
>
>






[jira] [Commented] (YARN-5292) Support for PAUSED container state

2016-11-22 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15686030#comment-15686030
 ] 

Hitesh Sharma commented on YARN-5292:
-

[~jianhe], can you elaborate a little on the use case and scenario of 
pause/resume for a long-running service? It isn't fully clear to me how that 
will be used, so I'd appreciate the help.

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, yarn-5292.pdf
>
>






[jira] [Commented] (YARN-5292) Support for PAUSED container state

2016-11-20 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682104#comment-15682104
 ] 

Hitesh Sharma commented on YARN-5292:
-

Thanks for the comments, [~arun suresh].

Regarding 1, the actual JIRA to avoid killing OPPORTUNISTIC containers is 
[YARN-5216]. I'm working on a patch for that which works on top of the new 
schedule state.

In our offline discussions we have talked about having APIs in 
ContainerManagementProtocol that allow PAUSE/RESUME on a container. The current 
implementation is only for OPPORTUNISTIC containers, so there was no need to add 
anything to the ContainerManagementProtocol, but we can definitely extend it to 
GUARANTEED containers and make the required changes. I think the PAUSE/RESUME 
semantics are of particular interest for Docker containers, and I will be 
happy to help with any related work in this area.

I have test cases as part of the patch for [YARN-5216] and that will test this 
code path. Please take a look at the patch a bit more closely so I can address 
other feedback.


> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, yarn-5292.pdf
>
>






[jira] [Commented] (YARN-5292) Support for PAUSED container state

2016-11-16 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672702#comment-15672702
 ] 

Hitesh Sharma commented on YARN-5292:
-

[~asuresh], can you also please take a look at this patch? Thank you so much!

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, yarn-5292.pdf
>
>






[jira] [Updated] (YARN-5292) Support for PAUSED container state

2016-11-16 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5292:

Attachment: YARN-5292.003.patch

Rebasing with latest changes from trunk.

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, yarn-5292.pdf
>
>






[jira] [Commented] (YARN-1593) support out-of-proc AuxiliaryServices

2016-11-15 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668649#comment-15668649
 ] 

Hitesh Sharma commented on YARN-1593:
-

Thanks [~asuresh] for pointing to [YARN-5501]. I agree with you folks that there 
is some overlap, and we will be happy to converge and discuss the best way to 
leverage the efforts here.

[~vvasudev], with regard to pooled containers, the behavior is to allow the NM 
to serve container requests even if the pre-initialized container is not ready. 
For container pooling this behavior makes sense, as we eventually want to 
advertise the pre-initialized container as a resource and have the AM ask for it. 

Regarding the 2nd point, the current implementation starts a fixed number of 
pre-initialized containers on each node (what to start, which resources to 
localize, and other details are currently passed via config files). Eventually 
we intend for the RM to pick some nodes where the pre-initialized containers 
should be started. This is something we are starting to work on.



> support out-of-proc AuxiliaryServices
> -
>
> Key: YARN-1593
> URL: https://issues.apache.org/jira/browse/YARN-1593
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, rolling upgrade
>Reporter: Ming Ma
>Assignee: Varun Vasudev
> Attachments: SystemContainersandSystemServices.pdf
>
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process as 
> NM. There are some benefits to hosting them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN, an NM restart will force 
> the ShuffleHandler to restart. If ShuffleHandler runs as a separate process, 
> it can continue to run during the NM restart. NM can reconnect to the 
> running ShuffleHandler after restart.
> 2. Resource management. It is possible that other types of AuxiliaryServices 
> will be implemented. AuxiliaryServices are considered YARN application 
> specific and could consume lots of resources. Running AuxiliaryServices in 
> separate processes allows easier resource management. NM could potentially 
> stop a specific AuxiliaryServices process from running if it consumes 
> resources well above its allocation.
> Here are some high level ideas:
> 1. NM provides a hosting process for each AuxiliaryService. The existing 
> AuxiliaryService API doesn't change.
> 2. The hosting process provides an RPC server for the AuxiliaryService proxy 
> object inside NM to connect to.
> 3. When we do a rolling restart of NM, the existing AuxiliaryService processes 
> will continue to run. NM could reconnect to the running AuxiliaryService 
> processes upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't have 
> an immediate need for this. An AuxiliaryService could run inside a container, 
> its resource utilization could be taken into account by the RM, and the RM 
> could consider that a specific type of application overutilizes cluster 
> resources.






[jira] [Updated] (YARN-5292) Support for PAUSED container state

2016-11-14 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5292:

Attachment: YARN-5292.002.patch

Fixing the build issues.

[~asuresh], can you please take a look?

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, YARN-5292.002.patch, yarn-5292.pdf
>
>






[jira] [Updated] (YARN-5292) Support for PAUSED container state

2016-11-14 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5292:

Attachment: YARN-5292.001.patch

Initial implementation of PAUSE and RESUME in YARN container state machine.
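
As an illustration only, the transitions described in this JIRA can be modelled 
with a few lines of Python. This is a toy sketch, not the NodeManager container 
state machine from the patch (which is Java). The names ContainerState, 
Container, supports_pause, preempt and resume are hypothetical; only the 
behavior (a preempted container is PAUSED if it supports pausing, killed 
otherwise, and a PAUSED container can return to RUNNING) comes from the issue 
description.

```python
from enum import Enum


class ContainerState(Enum):
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    KILLED = "KILLED"


class Container:
    """Toy model of a container that may or may not support pausing."""

    def __init__(self, container_id, supports_pause):
        self.container_id = container_id
        self.supports_pause = supports_pause  # hypothetical capability flag
        self.state = ContainerState.RUNNING

    def preempt(self):
        """Pause a running container if it supports it, otherwise kill it."""
        if self.state is not ContainerState.RUNNING:
            raise ValueError(f"cannot preempt container in state {self.state.name}")
        if self.supports_pause:
            self.state = ContainerState.PAUSED
        else:
            self.state = ContainerState.KILLED
        return self.state

    def resume(self):
        """Return a paused container to RUNNING once resources are free."""
        if self.state is not ContainerState.PAUSED:
            raise ValueError(f"cannot resume container in state {self.state.name}")
        self.state = ContainerState.RUNNING
        return self.state
```

Rejecting resume on anything other than a PAUSED container is a choice made 
for this sketch; a real state machine would also need to handle kill requests 
arriving while the container is PAUSED.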

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5292.001.patch, yarn-5292.pdf
>
>






[jira] [Comment Edited] (YARN-5501) Container Pooling in YARN

2016-08-10 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414834#comment-15414834
 ] 

Hitesh Sharma edited comment on YARN-5501 at 8/10/16 7:11 AM:
--

Thanks [~atris] for the comments. These are good points but the answers depend 
upon the implementation of the container. 

1) Yes, there will be some overhead of maintaining the pooled containers but 
that's a trade-off to optimize for launch latencies. Containers can, however, 
implement some custom behaviors to lower the overhead. For example, if the 
container supports PAUSE and RESUME semantics ([YARN-5292]) then a pooled 
container could be PAUSED. Some other container could instead choose to resize 
the allocation to a minimum and resize as per the actual resource request.

2) I'm not sure if I follow the comment here. Pooled containers are useful to 
lower the launch latencies and that's independent of the actual container run 
time.

3) That would be implementation specific. A pooled container is in itself a 
resource and when acquired by a request would need to be adjusted accordingly.

We will be posting some more design and implementation details which will 
hopefully help clarify the ideas here.


was (Author: hrsharma):
Thanks [~atris] for the comments. These are good points but the answers depend 
upon the implementation of the container. 

1) Yes, there will be some overhead of maintaining the pooled containers but 
that's a trade-off to optimize for launch latencies. Containers can, however, 
implement some custom behaviors to lower the overhead. For example, if the 
container supports PAUSE and RESUME semantics [YARN-5292] then a pooled 
container could be PAUSED. Some other container could instead choose to resize 
the allocation to a minimum and resize as per the actual resource request.

2) I'm not sure if I follow the comment here. Pooled containers are useful to 
lower the launch latencies and that's independent of the actual container run 
time.

3) That would be implementation specific. A pooled container is in itself a 
resource and when acquired by a request would need to be adjusted accordingly.

We will be posting some more design and implementation details which will 
hopefully help clarify the ideas here.

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby 
> allow for development of more interactive applications on YARN.
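
The acquire/release cycle in the proposal above can be pictured with a small 
Python model. This is a sketch under assumptions, not the proposed NM service: 
the class PreInitializedContainerPool and its methods acquire, release and 
idle_count are hypothetical names, and returning None when the pool is empty 
stands in for falling back to a normal container launch.

```python
from collections import deque


class PreInitializedContainerPool:
    """Toy pool of unattached, pre-initialized containers on one node."""

    def __init__(self, container_ids):
        self._idle = deque(container_ids)  # unattached, ready to serve
        self._attached = {}                # request id -> container id

    def acquire(self, request_id):
        """Attach an idle container to a request, or return None if none is ready."""
        if not self._idle:
            return None  # caller falls back to a normal (cold) launch
        container_id = self._idle.popleft()
        self._attached[request_id] = container_id
        return container_id

    def release(self, request_id):
        """Detach the container once the request completes and return it to the pool."""
        container_id = self._attached.pop(request_id)
        self._idle.append(container_id)
        return container_id

    def idle_count(self):
        """Number of pre-initialized containers currently unattached."""
        return len(self._idle)
```

For example, a pool created with two container ids serves two requests 
immediately; a third request gets None until one of the first two is released.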






[jira] [Commented] (YARN-5501) Container Pooling in YARN

2016-08-10 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414834#comment-15414834
 ] 

Hitesh Sharma commented on YARN-5501:
-

Thanks [~atris] for the comments. These are good points but the answers depend 
upon the implementation of the container. 

1) Yes, there will be some overhead of maintaining the pooled containers but 
that's a trade-off to optimize for launch latencies. Containers can, however, 
implement some custom behaviors to lower the overhead. For example, if the 
container supports PAUSE and RESUME semantics [YARN-5292] then a pooled 
container could be PAUSED. Some other container could instead choose to resize 
the allocation to a minimum and resize as per the actual resource request.

2) I'm not sure if I follow the comment here. Pooled containers are useful to 
lower the launch latencies and that's independent of the actual container run 
time.

3) That would be implementation specific. A pooled container is in itself a 
resource and when acquired by a request would need to be adjusted accordingly.

We will be posting some more design and implementation details which will 
hopefully help clarify the ideas here.

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>






[jira] [Updated] (YARN-5292) Support for PAUSED container state

2016-08-04 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5292:

Attachment: yarn-5292.pdf

Please find the attached document that describes some of the design and 
implementation details of adding PAUSE and RESUME states to YARN containers.

Appreciate the feedback and comments.

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
> Attachments: yarn-5292.pdf
>
>






[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-07-12 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374174#comment-15374174
 ] 

Hitesh Sharma commented on YARN-5216:
-

We investigated a few approaches over here:

* Have a subclass of {{QueuingContainersManagerImpl}}: this approach has some 
pros but the problem here is that subclassing just to override the preemption 
behavior isn't the right thing to do.
* Having a pluggable policy in {{QueuingContainersManagerImpl}} requires 
extension points to select which containers to run, run the container, preempt 
the container, etc. This approach starts to get more complex as we look to add 
support for PAUSED containers [YARN-5292].

Based on the feedback here and the discussions we have had, I'm looking into 
adding support for PAUSED containers within {{QueuingContainersManagerImpl}}. 
That would simplify things quite a bit and allow us to have a more pluggable 
and cleaner design. 
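
To make the idea of a configurable preemption hook concrete, here is a rough 
Python model. It is not code from {{QueuingContainersManagerImpl}}; the names 
RunningContainer, kill_policy, pause_if_supported_policy and make_room are 
hypothetical, and the sketch assumes that pausing a container frees its 
resources the same way killing it would.

```python
from dataclasses import dataclass


@dataclass
class RunningContainer:
    container_id: str
    memory_mb: int
    opportunistic: bool
    supports_pause: bool = False  # hypothetical capability flag


def kill_policy(container):
    """Default behavior described in the issue: always kill."""
    return "KILL"


def pause_if_supported_policy(container):
    """Alternative hook: pause when possible, otherwise fall back to kill."""
    return "PAUSE" if container.supports_pause else "KILL"


def make_room(running, needed_mb, policy=kill_policy):
    """Pick OPPORTUNISTIC containers to preempt until needed_mb is freed.

    Returns a list of (container_id, action) pairs. GUARANTEED containers
    are never selected.
    """
    decisions = []
    freed = 0
    for container in running:
        if freed >= needed_mb:
            break
        if not container.opportunistic:
            continue
        decisions.append((container.container_id, policy(container)))
        freed += container.memory_mb
    return decisions
```

Passing the policy as a plain function keeps the selection loop unchanged 
while the action taken on each selected container varies, which is roughly the 
separation the discussion above is after.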

[~asuresh], [~kkaranasos], thanks for all the feedback!


> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Fix For: 2.9.0
>
> Attachments: YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-07-06 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365262#comment-15365262
 ] 

Hitesh Sharma commented on YARN-5216:
-

Thank you for the insights, [~kkaranasos]!

Sorry, rebalancing wasn't the right terminology to use. I was referring to 
killing of queued containers that happens during 
{{shedQueuedOpportunisticContainers}} to enforce the queue limits, which in 
turn follows the paths you mention above.

It might be a good idea to use start container to imply resume when the 
container is paused, but that also overloads the meaning of start container, 
and given how different the two operations are, it can pose some challenges. 
Anyway, we can discuss this more in [YARN-5292].

{quote}
As far as I can see, all you need from the NM to support preemption is (let me 
know if there are more things that I am missing):
# Determine the way a container stops (option 1: kill, option 2: preempt).
# Determine the way it starts (that is, resume it if it's paused, instead of 
starting it from the beginning).
# Decide which container to start (you might want to start first containers 
that are paused instead of new ones).
{quote}

How do you propose to do 3 without having an extension point to pick a 
container to start? The moment we have an extension point to pick a container 
to start, we also need an extension point to pick a container to kill for 
enforcing queue limits or for some other reason.
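
A minimal sketch of the two extension points in question, assuming a simplified model in which each waiting container is either QUEUED (never started) or PAUSED. The class and method names are hypothetical, and the choice to never shed paused containers is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class StartSelectionSketch {
    enum State { QUEUED, PAUSED }

    /** Hypothetical stand-in for a waiting opportunistic container. */
    static final class Candidate {
        final String id;
        final State state;
        Candidate(String id, State state) { this.id = id; this.state = state; }
    }

    /** Extension point 1: prefer a paused container, else the oldest waiting one. */
    static Optional<Candidate> pickToStart(List<Candidate> waiting) {
        for (Candidate c : waiting) {
            if (c.state == State.PAUSED) {
                return Optional.of(c);
            }
        }
        return waiting.isEmpty() ? Optional.empty() : Optional.of(waiting.get(0));
    }

    /** Extension point 2: shed the newest QUEUED containers beyond the limit (assumed: paused ones are skipped). */
    static List<String> pickToShed(List<Candidate> waiting, int maxQueued) {
        List<String> queuedIds = new ArrayList<>();
        for (Candidate c : waiting) {
            if (c.state == State.QUEUED) {
                queuedIds.add(c.id);
            }
        }
        if (queuedIds.size() <= maxQueued) {
            return new ArrayList<>();
        }
        return new ArrayList<>(queuedIds.subList(maxQueued, queuedIds.size()));
    }

    public static void main(String[] args) {
        List<Candidate> waiting = Arrays.asList(
            new Candidate("q1", State.QUEUED),
            new Candidate("p1", State.PAUSED),
            new Candidate("q2", State.QUEUED));
        System.out.println(pickToStart(waiting).get().id);
        System.out.println(pickToShed(waiting, 1));
    }
}
```

Both methods need to know about the PAUSED state, which is why one extension point tends to pull in the other.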

Appreciate the feedback and help. Thanks a lot!




> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-07-06 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365085#comment-15365085
 ] 

Hitesh Sharma commented on YARN-5216:
-

[~kkaranasos], I'm not sure if subclassing would work. We need to have more 
control over how the opportunistic containers are queued and how we 
start/preempt them. Also, from a design point of view, 
{{QueuingContainersPreemptionManagerImpl}} is not really a queuing container 
manager, but just a specific way to preempt queued opportunistic containers. 
Thus composition seems a better choice here. 
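
As a rough illustration of the composition argument, the manager can hold a preemption strategy instead of being subclassed. {{PreemptionStrategy}} and {{QueuingManager}} below are hypothetical names used only for this sketch; they are not classes from the patch.

```java
import java.util.Objects;

class CompositionSketch {
    /** Hypothetical strategy: only knows how to preempt one opportunistic container. */
    interface PreemptionStrategy {
        String preempt(String containerId);
    }

    /** Hypothetical manager that delegates preemption rather than overriding it in a subclass. */
    static final class QueuingManager {
        private final PreemptionStrategy strategy;

        QueuingManager(PreemptionStrategy strategy) {
            this.strategy = Objects.requireNonNull(strategy);
        }

        /** Makes room for a guaranteed container by preempting an opportunistic one. */
        String makeRoomFor(String guaranteedId, String opportunisticId) {
            return strategy.preempt(opportunisticId) + " to admit " + guaranteedId;
        }
    }

    public static void main(String[] args) {
        QueuingManager killing = new QueuingManager(id -> "kill " + id);
        QueuingManager pausing = new QueuingManager(id -> "pause " + id);
        System.out.println(killing.makeRoomFor("g1", "o1"));
        System.out.println(pausing.makeRoomFor("g1", "o1"));
    }
}
```

The manager stays a queuing container manager, and the preemption behavior can be swapped without touching its class hierarchy.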

Thank you.

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-07-06 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364893#comment-15364893
 ] 

Hitesh Sharma commented on YARN-5216:
-

[~asuresh], [~kkaranasos], thank you for the feedback and comments.

Regarding the refactoring being done and the reason to pull queues into the 
currently named {{OpportunisticContainerManager}}:

Roughly speaking the {{QueuingContainersManagerImpl}} does the following for 
starting and stopping opportunistic containers:


* A running container simply gets preempted, while a container waiting in the 
queue is removed and the RM is notified to reallocate it elsewhere.
* Periodically, it checks whether there are too many waiting containers in the 
queue and removes the excess so the RM can rebalance them. 
* When a running container finishes, a waiting opportunistic container will 
be run if there are no guaranteed containers waiting in the queue. 

If the preemption policy is to kill the container then things are a little 
simpler and you can leave the opportunistic container queue within 
{{QueuingContainersManagerImpl}}. However, if the preemption policy is different 
then we need extension points to know about the operations that the 
{{QueuingContainersManagerImpl}} wants to do and respond appropriately. Say the 
preemption policy is to put the container in a paused state so that it can be 
resumed once there is some room to run a container. This requires 
distinguishing whether the {{QueuingContainersManagerImpl}} is looking to 
run pending containers (e.g. we want to resume a preempted container over an OC 
which is still waiting in the queue) or is looking to rebalance waiting 
containers to other nodes (e.g. we can't reallocate a container in the paused 
state). For these reasons the pluggable policy is named 
{{OpportunisticContainerManager}}: it allows you to preempt and start the 
opportunistic containers and also manages the queue of opportunistic 
containers. I'm open to suggestions on how to do this differently without having 
to change {{QueuingContainersManagerImpl}} a lot.

[~asuresh], can you elaborate a little why {{queuedGuaranteedContainers}} 
should also be pulled into the {{OpportunisticContainerManagerImpl}}?

I will look into using the ServiceLoader framework over reflection and add an 
extra constant to determine the default value.
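
For reference, a minimal sketch of what loading the policy through {{java.util.ServiceLoader}} with a default fallback might look like. The {{PreemptionPolicy}} interface, {{KillPolicy}} and {{DEFAULT_POLICY_NAME}} are hypothetical names for this example; only {{ServiceLoader}} itself is a real JDK API.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

class ServiceLoaderSketch {
    /** Hypothetical service interface that a provider would implement. */
    interface PreemptionPolicy {
        String name();
    }

    /** Hypothetical built-in default, used when no provider is registered. */
    static final class KillPolicy implements PreemptionPolicy {
        @Override public String name() { return DEFAULT_POLICY_NAME; }
    }

    /** The extra constant that names the default. */
    static final String DEFAULT_POLICY_NAME = "kill";

    /** Returns the first provider found on the classpath, or the default if there is none. */
    static PreemptionPolicy loadPolicy() {
        Iterator<PreemptionPolicy> providers =
            ServiceLoader.load(PreemptionPolicy.class).iterator();
        return providers.hasNext() ? providers.next() : new KillPolicy();
    }

    public static void main(String[] args) {
        // With no META-INF/services entry for the interface, the default is used.
        System.out.println(loadPolicy().name());
    }
}
```

A provider would be registered through a {{META-INF/services}} file named after the interface, which avoids hand-written reflection in the NM.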

Thanks a lot for the feedback and comments.

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5292) Support for PAUSED container state

2016-06-24 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348806#comment-15348806
 ] 

Hitesh Sharma commented on YARN-5292:
-

[~jianhe], the container would be resumed when the running containers finish on 
the node and resources are available.

The main scenario here is work preservation. If the container supports 
preemption via pause/freeze, then it can be put into this hibernate mode and 
resumed when resources free up. For some applications it is quite expensive to 
throw away the work done by an opportunistic container, so they want 
the capability to preserve it.
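
A toy model of the pause/resume cycle described above, assuming for simplicity that a paused container gives back all of its vcores and that paused containers are resumed in pause order whenever a running container finishes. The {{Node}} class and its methods are hypothetical and exist only for this sketch.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class ResumeSketch {
    /** Hypothetical node that tracks free vcores plus running and paused containers. */
    static final class Node {
        private int free;
        private final Map<String, Integer> running = new LinkedHashMap<>();
        private final Map<String, Integer> paused = new LinkedHashMap<>();

        Node(int vcores) { this.free = vcores; }

        boolean start(String id, int vcores) {
            if (vcores > free) {
                return false;
            }
            free -= vcores;
            running.put(id, vcores);
            return true;
        }

        /** Preempt by pausing: the container keeps its identity but releases its vcores. */
        void pause(String id) {
            Integer vcores = running.remove(id);
            if (vcores != null) {
                free += vcores;
                paused.put(id, vcores);
            }
        }

        /** When a container finishes, resume paused containers that now fit. */
        void finish(String id) {
            Integer vcores = running.remove(id);
            if (vcores != null) {
                free += vcores;
            }
            Iterator<Map.Entry<String, Integer>> it = paused.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Integer> e = it.next();
                if (e.getValue() <= free) {
                    free -= e.getValue();
                    running.put(e.getKey(), e.getValue());
                    it.remove();
                }
            }
        }

        Set<String> running() { return running.keySet(); }
        Set<String> paused() { return paused.keySet(); }
    }

    public static void main(String[] args) {
        Node node = new Node(4);
        node.start("opp", 3);      // opportunistic container is running
        node.pause("opp");         // a guaranteed container needs the room
        node.start("g", 4);
        node.finish("g");          // resources free up, "opp" resumes
        System.out.println(node.running() + " " + node.paused());
    }
}
```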

Thanks for the feedback.

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources get freed up on the node; the preempted container 
> can then resume to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Comment Edited] (YARN-5292) Support for PAUSED container state

2016-06-24 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348557#comment-15348557
 ] 

Hitesh Sharma edited comment on YARN-5292 at 6/24/16 10:33 PM:
---

[~jlowe] and [~asuresh], appreciate the feedback. 

It's a good idea to have the "PAUSING" state. If the container fails to pause 
then we proceed to kill and terminate it. How the pausing is implemented is 
specific to the container so I'm not so sure if we need APIs to store state.
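
The transitions described here could be sketched as follows. This is an assumption-laden illustration: the state names other than PAUSED and PAUSING are invented for the example, and the container-specific pause operation is modeled as a plain predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PausingStateSketch {
    /** PAUSING and PAUSED come from the discussion; KILLING and DONE are placeholders. */
    enum State { RUNNING, PAUSING, PAUSED, KILLING, DONE }

    /**
     * Returns the states a running container passes through when preempted.
     * The pauser is container specific; returning false or throwing counts as a failed pause.
     */
    static List<State> preempt(String containerId, Predicate<String> pauser) {
        List<State> path = new ArrayList<>(Arrays.asList(State.RUNNING, State.PAUSING));
        boolean paused;
        try {
            paused = pauser.test(containerId);
        } catch (RuntimeException e) {
            paused = false;
        }
        if (paused) {
            path.add(State.PAUSED);
        } else {
            // Pause failed: proceed to kill and terminate the container.
            path.add(State.KILLING);
            path.add(State.DONE);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(preempt("c1", id -> true));
        System.out.println(preempt("c2", id -> false));
    }
}
```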

Thanks again for the feedback.


was (Author: hrsharma):
[~Jason Lowe] and [~Arun Suresh], appreciate the feedback. 

It's a good idea to have the "PAUSING" state. If the container fails to pause 
then we proceed to kill and terminate it. How the pausing is implemented is 
specific to the container so I'm not so sure if we need APIs to store state.

Thanks again for the feedback.

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources get freed up on the node; the preempted container 
> can then resume to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Commented] (YARN-5292) Support for PAUSED container state

2016-06-24 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348557#comment-15348557
 ] 

Hitesh Sharma commented on YARN-5292:
-

[~Jason Lowe] and [~Arun Suresh], appreciate the feedback. 

It's a good idea to have the "PAUSING" state. If the container fails to pause 
then we proceed to kill and terminate it. How the pausing is implemented is 
specific to the container so I'm not so sure if we need APIs to store state.

Thanks again for the feedback.

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Hitesh Sharma
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources get freed up on the node; the preempted container 
> can then resume to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Updated] (YARN-5292) Support for PAUSED container state

2016-06-23 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5292:

Summary: Support for PAUSED container state  (was: Support for PAUSED state 
in a container)

> Support for PAUSED container state
> --
>
> Key: YARN-5292
> URL: https://issues.apache.org/jira/browse/YARN-5292
> Project: Hadoop YARN
>  Issue Type: New Feature
>    Reporter: Hitesh Sharma
>
> JIRA 2877 introduced OPPORTUNISTIC containers, and JIRA 5216 adds capability 
> to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources get freed up on the node; the preempted container 
> can then resume to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  






[jira] [Created] (YARN-5292) Support for PAUSED state in a container

2016-06-23 Thread Hitesh Sharma (JIRA)
Hitesh Sharma created YARN-5292:
---

 Summary: Support for PAUSED state in a container
 Key: YARN-5292
 URL: https://issues.apache.org/jira/browse/YARN-5292
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Hitesh Sharma


JIRA 2877 introduced OPPORTUNISTIC containers, and JIRA 5216 adds capability to 
customize how OPPORTUNISTIC containers get preempted.

In this JIRA we propose introducing a PAUSED container state.
When a running container gets preempted, it enters the PAUSED state, where it 
remains until resources get freed up on the node; the preempted container 
can then resume to the running state.
 
One scenario where this capability is useful is work preservation. How 
preemption is done, and whether the container supports it, is implementation 
specific.

For instance, if the container is a virtual machine, then preempt would pause 
the VM and resume would restore it back to the running state.
If the container doesn't support preemption, then preempt would default to 
killing the container. 
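
One way to picture the "default to kill" behavior is an interface whose preempt operation falls back to kill unless a container type overrides it. The {{ContainerRuntime}}, {{ProcessRuntime}} and {{VmRuntime}} names below are hypothetical and only illustrate the idea.

```java
class PreemptDefaultSketch {
    /** Hypothetical runtime abstraction; preempt defaults to killing the container. */
    interface ContainerRuntime {
        String kill(String containerId);

        default String preempt(String containerId) {
            return kill(containerId);
        }
    }

    /** A plain process has no pause support, so it inherits the kill default. */
    static final class ProcessRuntime implements ContainerRuntime {
        @Override public String kill(String id) { return "killed " + id; }
    }

    /** A virtual machine can be paused and later restored to the running state. */
    static final class VmRuntime implements ContainerRuntime {
        @Override public String kill(String id) { return "killed " + id; }
        @Override public String preempt(String id) { return "paused " + id; }
        String resume(String id) { return "resumed " + id; }
    }

    public static void main(String[] args) {
        System.out.println(new ProcessRuntime().preempt("c1"));
        VmRuntime vm = new VmRuntime();
        System.out.println(vm.preempt("vm1"));
        System.out.println(vm.resume("vm1"));
    }
}
```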
 






[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-06-21 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5216:

Attachment: yarn5216.002.patch

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Commented] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-06-21 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343234#comment-15343234
 ] 

Hitesh Sharma commented on YARN-5216:
-

Resolved the error and attached a new patch.

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-06-21 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5216:

Attachment: YARN5216.001.patch

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN5216.001.patch
>
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Assigned] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM

2016-06-20 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma reassigned YARN-5216:
---

Assignee: Hitesh Sharma  (was: Arun Suresh)

> Expose configurable preemption policy for OPPORTUNISTIC containers running on 
> the NM
> 
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>
> Currently, the default action taken by the QueuingContainerManager, 
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM 
> with OPPORTUNISTIC containers using up resources, is to KILL the running 
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a 
> different action.






[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record

2016-05-27 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5127:

Attachment: YARN-5127.005.patch

> Expose ExecutionType in Container api record
> 
>
> Key: YARN-5127
> URL: https://issues.apache.org/jira/browse/YARN-5127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN-5127.002.patch, YARN-5127.003.patch, 
> YARN-5127.004.patch, YARN-5127.005.patch, YARN-5127.v1.patch
>
>
> Currently the ExecutionType of the Container returned as a response to the 
> allocate call is contained in the {{ContainerTokenIdentifier}} which is 
> encoded into the ContainerToken.
> Unfortunately, the client would need to decode the returned token to access 
> the ContainerTokenIdentifier, which probably should not be allowed.
> This JIRA proposes to add a {{getExecutionType()}} method in the container 
> record.
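
A minimal sketch of the proposed shape, assuming a simplified stand-in for the container record. {{ContainerRecord}} and its fields are hypothetical; only the {{getExecutionType()}} accessor and the GUARANTEED/OPPORTUNISTIC values come from the discussion.

```java
import java.util.Arrays;
import java.util.List;

class ExecutionTypeSketch {
    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    /** Hypothetical stand-in for the container record returned by the allocate call. */
    static final class ContainerRecord {
        private final String id;
        private final ExecutionType executionType;
        private final byte[] opaqueToken; // clients should not need to decode this

        ContainerRecord(String id, ExecutionType executionType, byte[] opaqueToken) {
            this.id = id;
            this.executionType = executionType;
            this.opaqueToken = opaqueToken.clone();
        }

        String getId() { return id; }

        /** Exposed directly on the record, so the token can stay opaque to the client. */
        ExecutionType getExecutionType() { return executionType; }
    }

    /** Example client-side use: count opportunistic allocations without touching tokens. */
    static long countOpportunistic(List<ContainerRecord> allocated) {
        long count = 0;
        for (ContainerRecord c : allocated) {
            if (c.getExecutionType() == ExecutionType.OPPORTUNISTIC) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<ContainerRecord> allocated = Arrays.asList(
            new ContainerRecord("c1", ExecutionType.GUARANTEED, new byte[0]),
            new ContainerRecord("c2", ExecutionType.OPPORTUNISTIC, new byte[0]));
        System.out.println(countOpportunistic(allocated));
    }
}
```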






[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record

2016-05-26 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5127:

Attachment: YARN-5127.004.patch

> Expose ExecutionType in Container api record
> 
>
> Key: YARN-5127
> URL: https://issues.apache.org/jira/browse/YARN-5127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN-5127.002.patch, YARN-5127.003.patch, 
> YARN-5127.004.patch, YARN-5127.v1.patch
>
>
> Currently the ExecutionType of the Container returned as a response to the 
> allocate call is contained in the {{ContainerTokenIdentifier}} which is 
> encoded into the ContainerToken.
> Unfortunately, the client would need to decode the returned token to access 
> the ContainerTokenIdentifier, which probably should not be allowed.
> This JIRA proposes to add a {{getExecutionType()}} method in the container 
> record.






[jira] [Updated] (YARN-5162) Exceptions thrown during AM registerAM call when Distributed Scheduling is Enabled

2016-05-26 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5162:

Attachment: YARN-5162.002.patch

> Exceptions thrown during AM registerAM call when Distributed Scheduling is 
> Enabled
> --
>
> Key: YARN-5162
> URL: https://issues.apache.org/jira/browse/YARN-5162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN-5162.001.patch, YARN-5162.002.patch
>
>
> The following Exception is seen and the AM fails to register with RM:
> {noformat}
> 16/05/24 17:09:26 INFO ipc.Server: Auth successful for appattempt_146410856_0001_01 (auth:SIMPLE)
> 16/05/24 17:09:26 INFO amrmproxy.AMRMProxyService: Registering application master. Host: Port:0 Tracking Url:
> 16/05/24 17:09:26 INFO amrmproxy.DefaultRequestInterceptor: Forwarding registration request to the real YARN RM
> 16/05/24 17:09:26 DEBUG nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
> 16/05/24 17:09:26 DEBUG nodemanager.NodeStatusUpdaterImpl: Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_146410856_0001_01_000001, ExecutionType: GUARANTEED, State: RUNNING, Capability:  vCores:1>, Diagnostics: , ExitStatus: -1000, ]]
> 16/05/24 17:09:26 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> {noformat}






[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record

2016-05-26 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5127:

Attachment: YARN-5127.003.patch

> Expose ExecutionType in Container api record
> 
>
> Key: YARN-5127
> URL: https://issues.apache.org/jira/browse/YARN-5127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN-5127.002.patch, YARN-5127.003.patch, 
> YARN-5127.v1.patch
>
>
> Currently the ExecutionType of the Container returned as a response to the 
> allocate call is contained in the {{ContainerTokenIdentifier}} which is 
> encoded into the ContainerToken.
> Unfortunately, the client would need to decode the returned token to access 
> the ContainerTokenIdentifier, which probably should not be allowed.
> This JIRA proposes to add a {{getExecutionType()}} method in the container 
> record.






[jira] [Updated] (YARN-5162) Exceptions thrown during AM registerAM call when Distributed Scheduling is Enabled

2016-05-26 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5162:

Attachment: YARN-5162.001.patch

> Exceptions thrown during AM registerAM call when Distributed Scheduling is 
> Enabled
> --
>
> Key: YARN-5162
> URL: https://issues.apache.org/jira/browse/YARN-5162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN-5162.001.patch
>
>
> The following Exception is seen and the AM fails to register with RM:
> {noformat}
> 16/05/24 17:09:26 INFO ipc.Server: Auth successful for appattempt_146410856_0001_01 (auth:SIMPLE)
> 16/05/24 17:09:26 INFO amrmproxy.AMRMProxyService: Registering application master. Host: Port:0 Tracking Url:
> 16/05/24 17:09:26 INFO amrmproxy.DefaultRequestInterceptor: Forwarding registration request to the real YARN RM
> 16/05/24 17:09:26 DEBUG nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
> 16/05/24 17:09:26 DEBUG nodemanager.NodeStatusUpdaterImpl: Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_146410856_0001_01_000001, ExecutionType: GUARANTEED, State: RUNNING, Capability:  vCores:1>, Diagnostics: , ExitStatus: -1000, ]]
> 16/05/24 17:09:26 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> {noformat}






[jira] [Commented] (YARN-5162) Exceptions thrown during AM registerAM call when Distributed Scheduling is Enabled

2016-05-26 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302394#comment-15302394
 ] 

Hitesh Sharma commented on YARN-5162:
-

On further debugging, there seem to be three issues:

# The exception is caused by the {{SchedulerSecurityInfo}} class 
disallowing all protocols except the ApplicationMasterProtocol.
# Once that was fixed, the AM always died with a 
{{NullPointerException}}, since the {{DistSchedRegisterResponse}} did not 
contain a min and incr allocation capability.
# Finally, it looks like the {{finishApplicationMaster}} call does not work. 
This seems to be because the method is not declared in the proto file.

> Exceptions thrown during AM registerAM call when Distributed Scheduling is 
> Enabled
> --
>
> Key: YARN-5162
> URL: https://issues.apache.org/jira/browse/YARN-5162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>
> The following Exception is seen and the AM fails to register with RM:
> {noformat}
> 16/05/24 17:09:26 INFO ipc.Server: Auth successful for appattempt_146410856_0001_01 (auth:SIMPLE)
> 16/05/24 17:09:26 INFO amrmproxy.AMRMProxyService: Registering application master. Host: Port:0 Tracking Url:
> 16/05/24 17:09:26 INFO amrmproxy.DefaultRequestInterceptor: Forwarding registration request to the real YARN RM
> 16/05/24 17:09:26 DEBUG nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
> 16/05/24 17:09:26 DEBUG nodemanager.NodeStatusUpdaterImpl: Sending out 1 container statuses: [ContainerStatus: [ContainerId: container_146410856_0001_01_000001, ExecutionType: GUARANTEED, State: RUNNING, Capability:  vCores:1>, Diagnostics: , ExitStatus: -1000, ]]
> 16/05/24 17:09:26 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
> {noformat}






[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record

2016-05-25 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5127:

Attachment: YARN-5127.002.patch

> Expose ExecutionType in Container api record
> 
>
> Key: YARN-5127
> URL: https://issues.apache.org/jira/browse/YARN-5127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN-5127.002.patch, YARN-5127.v1.patch
>
>
> Currently the ExecutionType of the Container returned as a response to the 
> allocate call is contained in the {{ContainerTokenIdentifier}} which is 
> encoded into the ContainerToken.
> Unfortunately, the client would need to decode the returned token to access 
> the ContainerTokenIdentifier, which probably should not be allowed.
> This JIRA proposes to add a {{getExecutionType()}} method in the container 
> record.
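
The difference between the current behaviour and the proposal can be sketched as follows. This is only an illustrative, self-contained sketch and not the patch itself: {{ContainerRecord}}, {{allocate}}, {{decodeFromToken}} and the Base64 "token" encoding are hypothetical stand-ins invented for the example. Only the {{getExecutionType()}} accessor name and the GUARANTEED / OPPORTUNISTIC execution types come from YARN.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ExecutionTypeSketch {

    // Mirrors the two YARN execution types; declared locally to stay self-contained.
    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    // Hypothetical stand-in for the container record returned by the allocate call.
    static final class ContainerRecord {
        private final String id;
        private final String encodedToken;          // meant to be opaque to clients
        private final ExecutionType executionType;

        ContainerRecord(String id, String encodedToken, ExecutionType executionType) {
            this.id = id;
            this.encodedToken = encodedToken;
            this.executionType = executionType;
        }

        String getId() { return id; }

        String getEncodedToken() { return encodedToken; }

        // The accessor proposed in this JIRA: no token decoding needed.
        ExecutionType getExecutionType() { return executionType; }
    }

    // Hypothetical allocate call: the execution type is also packed into the token,
    // loosely imitating an identifier that is encoded into the container token.
    static ContainerRecord allocate(String id, ExecutionType type) {
        String identifier = id + ":" + type.name();
        String token = Base64.getEncoder()
                .encodeToString(identifier.getBytes(StandardCharsets.UTF_8));
        return new ContainerRecord(id, token, type);
    }

    // What a client has to do without the accessor: decode the token itself.
    static ExecutionType decodeFromToken(String encodedToken) {
        String identifier = new String(
                Base64.getDecoder().decode(encodedToken), StandardCharsets.UTF_8);
        String typeName = identifier.substring(identifier.lastIndexOf(':') + 1);
        return ExecutionType.valueOf(typeName);
    }

    public static void main(String[] args) {
        ContainerRecord c = allocate("container_example", ExecutionType.OPPORTUNISTIC);
        System.out.println("via token decode: " + decodeFromToken(c.getEncodedToken()));
        System.out.println("via accessor:     " + c.getExecutionType());
    }
}
```

Both paths yield the same value, but only the accessor lets the client treat the token as opaque, which is the point of the proposal.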






[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record

2016-05-25 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5127:

Attachment: YARN-5127.v1.patch

> Expose ExecutionType in Container api record
> 
>
> Key: YARN-5127
> URL: https://issues.apache.org/jira/browse/YARN-5127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN-5127.v1.patch
>
>
> Currently the ExecutionType of the Container returned as a response to the 
> allocate call is contained in the {{ContainerTokenIdentifier}} which is 
> encoded into the ContainerToken.
> Unfortunately, the client would need to decode the returned token to access 
> the ContainerTokenIdentifier, which probably should not be allowed.
> This JIRA proposes to add a {{getExecutionType()}} method in the container 
> record.






[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record

2016-05-25 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5127:

Attachment: (was: YARN-5127.0001.patch)

> Expose ExecutionType in Container api record
> 
>
> Key: YARN-5127
> URL: https://issues.apache.org/jira/browse/YARN-5127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN-5127.v1.patch
>
>
> Currently the ExecutionType of the Container returned as a response to the 
> allocate call is contained in the {{ContainerTokenIdentifier}} which is 
> encoded into the ContainerToken.
> Unfortunately, the client would need to decode the returned token to access 
> the ContainerTokenIdentifier, which probably should not be allowed.
> This JIRA proposes to add a {{getExecutionType()}} method in the container 
> record.






[jira] [Updated] (YARN-5127) Expose ExecutionType in Container api record

2016-05-25 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5127:

Attachment: YARN-5127.0001.patch

> Expose ExecutionType in Container api record
> 
>
> Key: YARN-5127
> URL: https://issues.apache.org/jira/browse/YARN-5127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: YARN-5127.0001.patch
>
>
> Currently the ExecutionType of the Container returned as a response to the 
> allocate call is contained in the {{ContainerTokenIdentifier}} which is 
> encoded into the ContainerToken.
> Unfortunately, the client would need to decode the returned token to access 
> the ContainerTokenIdentifier, which probably should not be allowed.
> This JIRA proposes to add a {{getExecutionType()}} method in the container 
> record.


