[jira] [Updated] (YARN-1897) CLI and core support for signal container functionality

2015-09-29 Thread Ming Ma (JIRA)

 [ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated YARN-1897:
--
Attachment: YARN-1897-8.patch

Thanks [~xgong]. Here is the rebased patch. The unit test failures are unrelated.

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897-7.patch, YARN-1897-8.patch, 
> YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first, as 
> they are needed by other sub-tasks. SignalContainerRequest should use 
> OS-independent commands and provide a way for the application to specify a 
> "reason" for diagnosis. SignalContainerResponse might be empty.
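A minimal sketch of what the request/response pair described above could look like, assuming a fixed set of OS-independent commands plus an optional free-form reason. The names SignalContainerSketch, SignalCommand, SignalContainerRequestSketch, SignalContainerResponseSketch and parse() are hypothetical. The command values only reflect the kill and thread-dump scenarios discussed in this thread and should not be read as the final YARN API.

```java
import java.util.Locale;
import java.util.Objects;

/**
 * Hypothetical sketch of the request/response pair described in the issue.
 * None of these names are the actual YARN API.
 */
final class SignalContainerSketch {

  /** OS-independent commands; the NM would translate each one to a platform action. */
  enum SignalCommand {
    OUTPUT_THREAD_DUMP,
    GRACEFUL_SHUTDOWN,
    FORCEFUL_SHUTDOWN;

    /** Parses a CLI-style argument, falling back to a default when it is absent or blank. */
    static SignalCommand parse(String arg, SignalCommand defaultCommand) {
      if (arg == null || arg.trim().isEmpty()) {
        return defaultCommand;
      }
      // Throws IllegalArgumentException for an unknown command name.
      return valueOf(arg.trim().toUpperCase(Locale.ROOT));
    }
  }

  /** Request: target container, OS-independent command, and a reason for diagnosis. */
  static final class SignalContainerRequestSketch {
    final String containerId;
    final SignalCommand command;
    final String reason;

    SignalContainerRequestSketch(String containerId, SignalCommand command, String reason) {
      this.containerId = Objects.requireNonNull(containerId, "containerId");
      this.command = Objects.requireNonNull(command, "command");
      // The reason is optional; store an empty string instead of null.
      this.reason = reason == null ? "" : reason;
    }
  }

  /** Response: intentionally empty, as the description says it might be. */
  static final class SignalContainerResponseSketch {}

  private SignalContainerSketch() {}
}
```

With this shape the NM, not the client, decides how each command maps to a platform action (for example a POSIX signal on Linux), which is what keeps the request OS-independent.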



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1897) CLI and core support for signal container functionality

2015-09-17 Thread Ming Ma (JIRA)


Ming Ma updated YARN-1897:
--
Attachment: YARN-1897-7.patch

Thanks [~djp]!

bq. Preempted containers won't be counted as container failures from the AM's 
perspective and won't affect whether the application succeeds.
Got it. It makes sense to simulate it separately.

bq. If we want to emulate the case where the NM gets shut down (kill -9) suddenly 
and never comes back, and its impact on RMContainers.
Interesting scenario. Yes, this simulation needs to be handled differently.

bq. I would prefer YARN-4131 to address the second event source as an addendum 
to our approach here. What do you think?
Sounds good. I have several questions about the implementation in YARN-4131 
and will comment there.

Here is the updated patch, which addresses the TestContainerManager test 
failures. The TestNetworkedJob failure is unrelated.

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897-7.patch, YARN-1897.1.patch
>
>





[jira] [Updated] (YARN-1897) CLI and core support for signal container functionality

2015-09-16 Thread Ming Ma (JIRA)


Ming Ma updated YARN-1897:
--
Attachment: YARN-1897-6.patch

Thanks [~djp]! Yes, the approach taken in YARN-4131 is simpler because it 
leverages the existing protocol to accomplish the kill-container scenario. But 
changing the NM-RM protocol will allow us to support other useful scenarios 
besides killing a container and taking a thread dump:

* "Pause container" test case.
* Send the compound command "kill %pid%; sleep 50; kill -9 %pid%".
* Run a JVM command to capture performance data.
* Allow a container to map a custom signal, such as SIGUSR2, to any action it 
wants to run in the container process.
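The compound-command and custom-signal scenarios above could be sketched as follows. The sketch assumes that the NM receives a template with a %pid% placeholder and expands it into ordered steps, and that a container registers its own action for a signal name such as SIGUSR2. CompoundSignalSketch, expand() and CustomSignalTable are hypothetical names. The sketch only plans and dispatches; it does not send real OS signals or sleep.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: expand a compound command template into ordered steps,
 * and let a container register its own action for a custom signal.
 * None of these names are YARN APIs.
 */
final class CompoundSignalSketch {

  /** Substitutes %pid% and splits the template on ';' into trimmed, non-empty steps. */
  static List<String> expand(String template, long pid) {
    List<String> steps = new ArrayList<>();
    String withPid = template.replace("%pid%", Long.toString(pid));
    for (String part : withPid.split(";")) {
      String step = part.trim();
      if (!step.isEmpty()) {
        steps.add(step);
      }
    }
    return Collections.unmodifiableList(steps);
  }

  /** Per-container mapping from a custom signal name to the action it should run. */
  static final class CustomSignalTable {
    private final Map<String, Runnable> handlers = new HashMap<>();

    void register(String signalName, Runnable action) {
      handlers.put(signalName, action);
    }

    /** Runs the registered action; returns false when the container mapped nothing. */
    boolean deliver(String signalName) {
      Runnable action = handlers.get(signalName);
      if (action == null) {
        return false;
      }
      action.run();
      return true;
    }
  }

  private CompoundSignalSketch() {}
}
```

For example, expand("kill %pid%; sleep 50; kill -9 %pid%", 1234) yields the steps "kill 1234", "sleep 50" and "kill -9 1234", in that order. An executor on the NM side would still have to run them and cope with the process exiting before the last step.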

I would like to clarify the scenarios described in YARN-4131 to see whether 
signal container can cover them.

* Kill container via preemption. Here the RM learns about the kill before the 
NM, which is the opposite of the signal container order, where the container is 
killed without the RM's knowledge. Killing a container without the RM's 
knowledge seems to match the container-crash test case better, while killing it 
via preemption simulates preemption. But does the difference matter here, as 
long as the container is killed?
* Container expiration. Does that apply only to a container that has been 
allocated/acquired but is not yet in the running state? It seems the RM uses it 
to time out container allocation/acquisition. It will trigger 
{{RMContainerEventType.EXPIRE}} and won't have an impact on a running container.
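To make the two questions above concrete, here is a deliberately simplified model. It assumes the reading given in the comment: on the preemption path the RM observes the kill before the NM, on the signal path the order is reversed, and an EXPIRE event only affects containers that are allocated or acquired but not yet running. KillPathSketch, Observer, ContainerState and expireApplies() are hypothetical names. This is not the RMContainerImpl state machine, and the real transition table should be checked before relying on it.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/**
 * Simplified, hypothetical model of the two points above.
 * It is not the RMContainerImpl state machine.
 */
final class KillPathSketch {

  /** Subset of container states relevant to the discussion. */
  enum ContainerState { ALLOCATED, ACQUIRED, RUNNING }

  /** Component that observes the kill. */
  enum Observer { RM, NM }

  /** Preemption: the RM decides first and then tells the NM. */
  static List<Observer> preemptionOrder() {
    return Arrays.asList(Observer.RM, Observer.NM);
  }

  /** Signal container: the NM kills the process and the RM learns afterwards. */
  static List<Observer> signalOrder() {
    return Arrays.asList(Observer.NM, Observer.RM);
  }

  /** States in which an EXPIRE event is assumed to take effect. */
  private static final Set<ContainerState> EXPIRABLE =
      EnumSet.of(ContainerState.ALLOCATED, ContainerState.ACQUIRED);

  /** Returns true when EXPIRE would end the container under this model. */
  static boolean expireApplies(ContainerState state) {
    return EXPIRABLE.contains(state);
  }

  private KillPathSketch() {}
}
```

Under this model only the signal path reproduces a crash-like sequence in which the RM is the last to know, and EXPIRE never touches a RUNNING container.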

Here is the updated patch, which fixes some of the unit test failures. I still 
don't know why the mapred test fails, even though it passes on my machine.

I look forward to more comments from you.

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897.1.patch
>
>





[jira] [Updated] (YARN-1897) CLI and core support for signal container functionality

2015-09-14 Thread Ming Ma (JIRA)


Ming Ma updated YARN-1897:
--
Summary: CLI and core support for signal container functionality  (was: 
Define SignalContainerRequest and SignalContainerResponse)

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897.1.patch
>
>


