[jira] [Updated] (YARN-1897) CLI and core support for signal container functionality
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated YARN-1897:
--
    Attachment: YARN-1897-8.patch

Thanks [~xgong]. Here is the rebased patch. The failed unit tests aren't related to this change.

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch,
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897-7.patch, YARN-1897-8.patch,
> YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first, as
> they are needed by other sub-tasks. SignalContainerRequest should use
> OS-independent commands and provide a way for the application to specify a
> "reason" for diagnosis. SignalContainerResponse might be empty.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
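The issue description above asks for a request that carries an OS-independent command plus a free-form "reason", and a response that may be empty. A minimal sketch of that shape follows. It is not the patch's actual API: every class, enum, and method name here is hypothetical, and the mapping to Linux signal numbers (SIGQUIT = 3, SIGTERM = 15, SIGKILL = 9) is only an assumption about how a Linux NodeManager might translate the commands.

```java
import java.util.Objects;

// Hypothetical OS-independent commands; the names are illustrative only.
enum SketchSignalCommand {
    OUTPUT_THREAD_DUMP,
    GRACEFUL_SHUTDOWN,
    FORCEFUL_SHUTDOWN
}

// Hypothetical request: which container, which command, and why.
final class SketchSignalContainerRequest {
    private final String containerId;
    private final SketchSignalCommand command;
    private final String reason;

    SketchSignalContainerRequest(String containerId,
                                 SketchSignalCommand command,
                                 String reason) {
        this.containerId = Objects.requireNonNull(containerId, "containerId");
        this.command = Objects.requireNonNull(command, "command");
        // The reason is diagnostic text only, so a missing one becomes "".
        this.reason = reason == null ? "" : reason;
    }

    String getContainerId() { return containerId; }
    SketchSignalCommand getCommand() { return command; }
    String getReason() { return reason; }
}

// The description says the response might be empty, so this sketch has no fields.
final class SketchSignalContainerResponse {
}

// Assumed translation from an OS-independent command to a Linux signal number.
final class SketchPosixSignalMapper {
    static int toPosixSignal(SketchSignalCommand command) {
        switch (command) {
            case OUTPUT_THREAD_DUMP:
                return 3;  // SIGQUIT: a JVM prints a thread dump on this signal
            case GRACEFUL_SHUTDOWN:
                return 15; // SIGTERM
            case FORCEFUL_SHUTDOWN:
                return 9;  // SIGKILL
            default:
                throw new IllegalArgumentException("Unknown command: " + command);
        }
    }
}
```

Keeping the translation in a separate mapper means the request type itself never mentions a platform-specific signal, so a different platform could supply its own mapping.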
[jira] [Updated] (YARN-1897) CLI and core support for signal container functionality
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated YARN-1897:
--
    Attachment: YARN-1897-7.patch

Thanks [~djp]!

bq. Number of preempted containers won't be count as container failure in AM prospective and won't affect the success in application's running result.
Got it. It makes sense to simulate it separately.

bq. If we want to emulate the case NM get shutdown (kill -9) suddenly and never come back and its impact to RMContainers.
Interesting scenario. Yes, this simulation needs to be handled differently.

bq. I would prefer YARN-4131 to address 2nd sources event as an addendum to our approach here. What do you think?
Sounds good. I have several questions about the implementation in YARN-4131 and will comment there.

Here is the updated patch, which addresses the test failures in TestContainerManager. The TestNetworkedJob failure isn't related.
[jira] [Updated] (YARN-1897) CLI and core support for signal container functionality
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated YARN-1897:
--
    Attachment: YARN-1897-6.patch

Thanks [~djp]! Yes, the approach taken in YARN-4131 is simpler because it leverages the existing protocol to accomplish the kill container scenario. But changing the NM-RM protocol will allow us to support other useful scenarios besides kill container and thread dump:
* "Pause container" test case.
* Send a compound command such as "kill %pid%; sleep 50; kill -9 %pid%".
* Run some JVM command to capture perf data.
* Allow a container to map a custom signal such as SIGUSR2 to any action it wants to run in the container process.

I would like to clarify the scenarios described in YARN-4131 to see whether the signal container functionality can cover them.
* Kill container via preemption. This means the RM will know about the kill before the NM does, unlike the signal container order, which kills the container first without the RM's knowledge. Killing a container without the RM's knowledge seems to match the container crash test case better, while killing a container via preemption can simulate preemption. But does the difference matter here as long as the container is killed?
* Container expiration. Is that only for a container that has been allocated/acquired but is not yet in the running state? It seems the RM uses it to time out container allocation/acquisition. It will trigger {{RMContainerEventType.EXPIRE}} and won't have any impact on a running container.

Here is the updated patch to fix some of the unit test failures. I still don't know why the mapred test fails even though it works on my machine. I look forward to more comments from you.
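The compound command mentioned in the comment above, "kill %pid%; sleep 50; kill -9 %pid%", uses a %pid% placeholder. As a rough illustration only, the sketch below shows one way such a template could be expanded before being handed to a shell. The class and method names are hypothetical, and the thread does not say where or how the patch performs this substitution, so treat the whole block as an assumption.

```java
// Hypothetical helper; not part of YARN or of the attached patches.
final class SketchCommandTemplate {
    // Placeholder token taken from the compound command quoted in the comment.
    private static final String PID_TOKEN = "%pid%";

    // Replaces every %pid% occurrence with the container's process id.
    static String expand(String template, long pid) {
        if (template == null) {
            throw new IllegalArgumentException("template must not be null");
        }
        if (pid <= 0) {
            // Guard against accidental "kill 0" or "kill -1" style commands.
            throw new IllegalArgumentException("pid must be positive: " + pid);
        }
        return template.replace(PID_TOKEN, Long.toString(pid));
    }
}
```

For example, `SketchCommandTemplate.expand("kill %pid%; sleep 50; kill -9 %pid%", 4242)` yields `kill 4242; sleep 50; kill -9 4242`, which asks for a graceful stop, waits, and then forces termination.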
[jira] [Updated] (YARN-1897) CLI and core support for signal container functionality
[ https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated YARN-1897:
--
    Summary: CLI and core support for signal container functionality  (was: Define SignalContainerRequest and SignalContainerResponse)