[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI "list" command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882636#comment-13882636 ] Kenji Kikushima commented on YARN-1480: --- I tried "mvn test -Dtest=org.apache.hadoop.yarn.client.api.impl.TestNMClient", but no error occurred. Hmm... > RM web services getApps() accepts many more filters than ApplicationCLI > "list" command > -- > > Key: YARN-1480 > URL: https://issues.apache.org/jira/browse/YARN-1480 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Kenji Kikushima > Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, > YARN-1480.patch > > > Currently, RM web services getApps() accepts many more filters than the > ApplicationCLI "list" command, which only accepts "state" and "type". IMHO, > ideally, different interfaces should provide consistent functionality. Is it > better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
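For illustration, an extended ApplicationCLI "list" along the lines discussed here might accept state and type filters like this (the flag names are a sketch, not necessarily the committed syntax):
{code}
# Hypothetical filter flags for "yarn application -list"; names illustrative
yarn application -list -appStates ACCEPTED,RUNNING -appTypes MAPREDUCE
{code}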
[jira] [Updated] (YARN-1656) Return type of YarnRPC.getProxy() should be the given protocol class instead of Object
[ https://issues.apache.org/jira/browse/YARN-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Ikeda updated YARN-1656: Description: Writing code with explicit cast such as: {code} ((ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, rmAddress, appsManagerServerConf)); {code} is tedious. > Return type of YarnRPC.getProxy() should be the given protocol class instead > of Object > -- > > Key: YARN-1656 > URL: https://issues.apache.org/jira/browse/YARN-1656 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Hiroshi Ikeda >Priority: Minor > > Writing code with explicit cast such as: > {code} > ((ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, > rmAddress, appsManagerServerConf)); > {code} > is tedious. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1656) Return type of YarnRPC.getProxy() should be the given protocol class instead of Object
Hiroshi Ikeda created YARN-1656: --- Summary: Return type of YarnRPC.getProxy() should be the given protocol class instead of Object Key: YARN-1656 URL: https://issues.apache.org/jira/browse/YARN-1656 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Environment: Writing code with explicit cast such as: {code} ((ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, rmAddress, appsManagerServerConf)); {code} is tedious. Reporter: Hiroshi Ikeda Priority: Minor -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1656) Return type of YarnRPC.getProxy() should be the given protocol class instead of Object
[ https://issues.apache.org/jira/browse/YARN-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Ikeda updated YARN-1656: Environment: (was: Writing code with explicit cast such as: {code} ((ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, rmAddress, appsManagerServerConf)); {code} is tedious. ) > Return type of YarnRPC.getProxy() should be the given protocol class instead > of Object > -- > > Key: YARN-1656 > URL: https://issues.apache.org/jira/browse/YARN-1656 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Hiroshi Ikeda >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1656) Return type of YarnRPC.getProxy() should be the given protocol class instead of Object
[ https://issues.apache.org/jira/browse/YARN-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Ikeda updated YARN-1656: Attachment: YARN-1656.patch Added a sample patch for 2.2.0, which also fixes some similar issues for classes around YarnRPC. > Return type of YarnRPC.getProxy() should be the given protocol class instead > of Object > -- > > Key: YARN-1656 > URL: https://issues.apache.org/jira/browse/YARN-1656 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Hiroshi Ikeda >Priority: Minor > Attachments: YARN-1656.patch > > > Writing code with explicit cast such as: > {code} > ((ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, > rmAddress, appsManagerServerConf)); > {code} > is tedious. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
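A minimal sketch of what a generified signature could look like, using plain Java generics (the attached patch may differ in detail):
{code}
// Hypothetical generified signature on YarnRPC; callers can then drop the cast.
public abstract <T> T getProxy(Class<T> protocol, InetSocketAddress address,
    Configuration conf);

// Call site without the explicit cast:
ApplicationClientProtocol client =
    rpc.getProxy(ApplicationClientProtocol.class, rmAddress, appsManagerServerConf);
{code}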
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI "list" command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-1480: Hadoop Flags: Reviewed > RM web services getApps() accepts many more filters than ApplicationCLI > "list" command > -- > > Key: YARN-1480 > URL: https://issues.apache.org/jira/browse/YARN-1480 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Kenji Kikushima > Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, > YARN-1480.patch > > > Currently, RM web services getApps() accepts many more filters than the > ApplicationCLI "list" command, which only accepts "state" and "type". IMHO, > ideally, different interfaces should provide consistent functionality. Is it > better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI "list" command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882798#comment-13882798 ] Akira AJISAKA commented on YARN-1480: - +1, the timeout in TestNMClient is not related to the patch. > RM web services getApps() accepts many more filters than ApplicationCLI > "list" command > -- > > Key: YARN-1480 > URL: https://issues.apache.org/jira/browse/YARN-1480 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Kenji Kikushima > Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, > YARN-1480.patch > > > Currently, RM web services getApps() accepts many more filters than the > ApplicationCLI "list" command, which only accepts "state" and "type". IMHO, > ideally, different interfaces should provide consistent functionality. Is it > better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1657) Timeout occurs in TestNMClient
Akira AJISAKA created YARN-1657: --- Summary: Timeout occurs in TestNMClient Key: YARN-1657 URL: https://issues.apache.org/jira/browse/YARN-1657 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Akira AJISAKA A timeout occurs in TestNMClient when a patch is tested by Jenkins. The following comment can be seen in YARN-1480, YARN-1611, and YARN-888. {code} {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
[ https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1632: -- Attachment: yarn-1632v2.patch > TestApplicationMasterServices should be under > org.apache.hadoop.yarn.server.resourcemanager package > --- > > Key: YARN-1632 > URL: https://issues.apache.org/jira/browse/YARN-1632 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 0.23.9, 2.2.0 >Reporter: Chen He >Assignee: Chen He >Priority: Minor > Attachments: yarn-1632.patch, yarn-1632v2.patch > > > ApplicationMasterService is under > org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test > file TestApplicationMasterService is placed under > org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice > package which only contains one file (TestApplicationMasterService). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package
[ https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882965#comment-13882965 ] Hadoop QA commented on YARN-1632: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625373/yarn-1632v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2942//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2942//console This message is automatically generated. > TestApplicationMasterServices should be under > org.apache.hadoop.yarn.server.resourcemanager package > --- > > Key: YARN-1632 > URL: https://issues.apache.org/jira/browse/YARN-1632 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 0.23.9, 2.2.0 >Reporter: Chen He >Assignee: Chen He >Priority: Minor > Attachments: yarn-1632.patch, yarn-1632v2.patch > > > ApplicationMasterService is under > org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test > file TestApplicationMasterService is placed under > org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice > package which only contains one file (TestApplicationMasterService). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882997#comment-13882997 ] Karthik Kambatla commented on YARN-1618: [~bikassaha], [~vinodkv] - will you be able to take a look at the patch? It would be nice to include this in 2.3 if possible, though I wouldn't call it a blocker for 2.3. > Applications transition from NEW to FINAL_SAVING, and try to update > non-existing entries in the state-store > --- > > Key: YARN-1618 > URL: https://issues.apache.org/jira/browse/YARN-1618 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: yarn-1618-1.patch, yarn-1618-2.patch > > > YARN-891 augments the RMStateStore to store information on completed > applications. In the process, it adds transitions from NEW to FINAL_SAVING. > This leads to the RM trying to update entries in the state-store that do not > exist. On ZKRMStateStore, this leads to the RM crashing. > Previous description: > ZKRMStateStore fails to handle updates to znodes that don't exist. For > instance, this can happen when an app transitions from NEW to FINAL_SAVING. > In these cases, the store should create the missing znode and handle the > update. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-745) Move UnmanagedAMLauncher to yarn client package
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883018#comment-13883018 ] Bikas Saha commented on YARN-745: - That was the original plan of action for the unmanaged AM launcher. It's just a specialization of YarnClient: under a flag, the YarnClient implementation should be able to submit an unmanaged AM. However, running in-process or forking a new process should also be possible. Running in-process would be easier for debugging. Launching a separate process works for cases where people want to run their app in unmanaged mode (e.g. the LAMA AM). Also, when one already has an AM in a jar, one could launch it in a process with java opts to enable debugging, instead of writing code to invoke YarnClient in unmanaged mode inside the AM. > Move UnmanagedAMLauncher to yarn client package > --- > > Key: YARN-745 > URL: https://issues.apache.org/jira/browse/YARN-745 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Bikas Saha > Fix For: 2.4.0 > > > It's currently sitting in the yarn applications project, which sounds wrong. The client > project sounds better since it contains the utilities/libraries that clients > use to write and debug yarn applications. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
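A minimal sketch of that YarnClient-based flow, assuming the standard ApplicationSubmissionContext API (configuration, container setup, and error handling omitted; the application name is illustrative):
{code}
// Submit an application whose AM the RM will NOT launch; the caller then runs
// the AM logic itself, either in-process or in a forked JVM with debug opts.
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(conf);
yarnClient.start();

YarnClientApplication app = yarnClient.createApplication();
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
appContext.setApplicationName("unmanaged-am-demo");
appContext.setUnmanagedAM(true);
yarnClient.submitApplication(appContext);
{code}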
[jira] [Created] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
Cindy Li created YARN-1658: -- Summary: Webservice should redirect to active RM when HA is enabled. Key: YARN-1658 URL: https://issues.apache.org/jira/browse/YARN-1658 Project: Hadoop YARN Issue Type: Sub-task Reporter: Cindy Li Assignee: Cindy Li When HA is enabled, web service requests to the standby RM should be redirected to the active RM. This is related to YARN-1525. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883230#comment-13883230 ] Karthik Kambatla commented on YARN-1658: Shouldn't this be a part of YARN-1525? IOW, what do we plan to include here that doesn't go into YARN-1525? > Webservice should redirect to active RM when HA is enabled. > --- > > Key: YARN-1658 > URL: https://issues.apache.org/jira/browse/YARN-1658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Cindy Li >Assignee: Cindy Li > Labels: YARN > > When HA is enabled, web service requests to the standby RM should be redirected to the > active RM. This is related to YARN-1525. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1582: Attachment: YARN-1582-branch-0.23.patch Preliminary patch for branch-0.23. The downside is that when the application first gets an application id, it is told the cluster-level setting, which might be bigger than the per-queue setting. This shouldn't be a problem, as the application just fails later on. But it also makes the API less clean. > Capacity Scheduler: add a maximum-allocation-mb setting per queue > -- > > Key: YARN-1582 > URL: https://issues.apache.org/jira/browse/YARN-1582 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 3.0.0, 0.23.10, 2.2.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-1582-branch-0.23.patch > > > We want to allow certain queues to use larger container sizes while limiting > other queues to smaller container sizes. Setting it per queue will help > prevent abuse, help limit the impact of reservations, and allow changes in > the maximum container size to be rolled out more easily. > One reason this is needed is more application types are becoming available on > YARN and certain applications require more memory to run efficiently. While > we want to allow for that, we don't want other applications to abuse it and > start requesting bigger containers than they really need. > Note that we could have this based on application type, but that might not be > totally accurate either since for example you might want to allow certain > users on MapReduce to use larger containers, while limiting other users of > MapReduce to smaller containers. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883238#comment-13883238 ] Thomas Graves commented on YARN-1582: - Note the attached patch leaves the cluster-level setting in place. The per-queue settings must be less than or equal to the cluster-level setting. It also allows both the cluster-level and per-queue values to be refreshed (yarn rmadmin -refreshQueues) as long as the value increases. We can't allow it to decrease, since we've told the AMs the max size, and letting that decrease could mess them up. > Capacity Scheduler: add a maximum-allocation-mb setting per queue > -- > > Key: YARN-1582 > URL: https://issues.apache.org/jira/browse/YARN-1582 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 3.0.0, 0.23.10, 2.2.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-1582-branch-0.23.patch > > > We want to allow certain queues to use larger container sizes while limiting > other queues to smaller container sizes. Setting it per queue will help > prevent abuse, help limit the impact of reservations, and allow changes in > the maximum container size to be rolled out more easily. > One reason this is needed is more application types are becoming available on > YARN and certain applications require more memory to run efficiently. While > we want to allow for that, we don't want other applications to abuse it and > start requesting bigger containers than they really need. > Note that we could have this based on application type, but that might not be > totally accurate either since for example you might want to allow certain > users on MapReduce to use larger containers, while limiting other users of > MapReduce to smaller containers. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
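To make the proposal concrete, a per-queue override in capacity-scheduler.xml might look like the following (the property name and queue are illustrative, not necessarily what the patch defines):
{code}
<!-- Hypothetical per-queue override; must be <= the cluster-level maximum -->
<property>
  <name>yarn.scheduler.capacity.root.bigmem.maximum-allocation-mb</name>
  <value>16384</value>
</property>
{code}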
[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Acharya updated YARN-1630: - Attachment: diff-1.txt > Unbounded waiting for response in YarnClientImpl.java causes thread to hang > forever > --- > > Key: YARN-1630 > URL: https://issues.apache.org/jira/browse/YARN-1630 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0 >Reporter: Aditya Acharya >Assignee: Aditya Acharya > Attachments: diff-1.txt, diff.txt > > > I ran an MR2 application that would have been long running, and killed it > programmatically using a YarnClient. The app was killed, but the client hung > forever. The message that I saw, which spammed the logs, was "Watiting for > application application_1389036507624_0018 to be killed." > The RM log indicated that the app had indeed transitioned from RUNNING to > KILLED, but for some reason future responses to the RPC to kill the > application did not indicate that the app had been terminated. > I tracked this down to YarnClientImpl.java, and though I was unable to > reproduce the bug, I wrote a patch to introduce a bound on the number of > times that YarnClientImpl retries the RPC before giving up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883242#comment-13883242 ] Aditya Acharya commented on YARN-1630: -- Added updated diff with requested changes. > Unbounded waiting for response in YarnClientImpl.java causes thread to hang > forever > --- > > Key: YARN-1630 > URL: https://issues.apache.org/jira/browse/YARN-1630 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0 >Reporter: Aditya Acharya >Assignee: Aditya Acharya > Attachments: diff-1.txt, diff.txt > > > I ran an MR2 application that would have been long running, and killed it > programmatically using a YarnClient. The app was killed, but the client hung > forever. The message that I saw, which spammed the logs, was "Watiting for > application application_1389036507624_0018 to be killed." > The RM log indicated that the app had indeed transitioned from RUNNING to > KILLED, but for some reason future responses to the RPC to kill the > application did not indicate that the app had been terminated. > I tracked this down to YarnClientImpl.java, and though I was unable to > reproduce the bug, I wrote a patch to introduce a bound on the number of > times that YarnClientImpl retries the RPC before giving up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
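A sketch of the bounded-retry idea, assuming the existing forceKillApplication RPC; maxRetries and pollIntervalMillis are illustrative locals rather than actual YARN configs, and exception handling is elided:
{code}
// Poll for kill completion at most maxRetries times instead of forever.
KillApplicationRequest request = Records.newRecord(KillApplicationRequest.class);
request.setApplicationId(applicationId);
for (int attempt = 0; attempt < maxRetries; attempt++) {
  KillApplicationResponse response = rmClient.forceKillApplication(request);
  if (response.getIsKillCompleted()) {
    return;
  }
  LOG.info("Waiting for application " + applicationId + " to be killed.");
  Thread.sleep(pollIntervalMillis);
}
throw new YarnException("Timed out waiting for application " + applicationId
    + " to be killed.");
{code}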
[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883262#comment-13883262 ] Cindy Li commented on YARN-1658: Talked with Vinod@hortonworks offline. We would like to do this separately from the web UI part. > Webservice should redirect to active RM when HA is enabled. > --- > > Key: YARN-1658 > URL: https://issues.apache.org/jira/browse/YARN-1658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Cindy Li >Assignee: Cindy Li > Labels: YARN > > When HA is enabled, web service requests to the standby RM should be redirected to the > active RM. This is related to YARN-1525. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1636) Implement timeline related web-services inside AHS for storing and retrieving entities+eventies
[ https://issues.apache.org/jira/browse/YARN-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1636: -- Assignee: Zhijie Shen (was: Vinod Kumar Vavilapalli) > Implement timeline related web-services inside AHS for storing and retrieving > entities+eventies > --- > > Key: YARN-1636 > URL: https://issues.apache.org/jira/browse/YARN-1636 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1637) Implement a client library for java users to post entities+events
[ https://issues.apache.org/jira/browse/YARN-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-1637: - Assignee: Zhijie Shen (was: Vinod Kumar Vavilapalli) > Implement a client library for java users to post entities+events > - > > Key: YARN-1637 > URL: https://issues.apache.org/jira/browse/YARN-1637 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > This is a wrapper around the web-service to facilitate easy posting of > entity+event data to the time-line server. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1634) Define a ApplicationTimelineStore interface and an in-memory implementation
[ https://issues.apache.org/jira/browse/YARN-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1634: -- Assignee: Zhijie Shen (was: Vinod Kumar Vavilapalli) > Define a ApplicationTimelineStore interface and an in-memory implementation > > > Key: YARN-1634 > URL: https://issues.apache.org/jira/browse/YARN-1634 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > As per the design doc, the store needs to be pluggable. We need a base > interface, and an in-memory implementation for testing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore
[ https://issues.apache.org/jira/browse/YARN-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1635: -- Assignee: Vinod Kumar Vavilapalli (was: Zhijie Shen) > Implement a Leveldb based ApplicationTimelineStore > -- > > Key: YARN-1635 > URL: https://issues.apache.org/jira/browse/YARN-1635 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > > As per the design doc, we need a levelDB + local-filesystem based > implementation to start with and for small deployments. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1633) Define user-faced entity, entity-info and event objects
[ https://issues.apache.org/jira/browse/YARN-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1633: -- Summary: Define user-faced entity, entity-info and event objects (was: Define the entity, entity-info and event objects) > Define user-faced entity, entity-info and event objects > --- > > Key: YARN-1633 > URL: https://issues.apache.org/jira/browse/YARN-1633 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > Define the core objects of the application-timeline effort. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore
[ https://issues.apache.org/jira/browse/YARN-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-1635: - Assignee: Zhijie Shen (was: Vinod Kumar Vavilapalli) > Implement a Leveldb based ApplicationTimelineStore > -- > > Key: YARN-1635 > URL: https://issues.apache.org/jira/browse/YARN-1635 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > As per the design doc, we need a levelDB + local-filesystem based > implementation to start with and for small deployments. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1633) Define the entity, entity-info and event objects
[ https://issues.apache.org/jira/browse/YARN-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-1633: - Assignee: Zhijie Shen (was: Vinod Kumar Vavilapalli) > Define the entity, entity-info and event objects > > > Key: YARN-1633 > URL: https://issues.apache.org/jira/browse/YARN-1633 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > Define the core objects of the application-timeline effort. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883269#comment-13883269 ] Hadoop QA commented on YARN-1630: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625422/diff-1.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2943//console This message is automatically generated. > Unbounded waiting for response in YarnClientImpl.java causes thread to hang > forever > --- > > Key: YARN-1630 > URL: https://issues.apache.org/jira/browse/YARN-1630 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0 >Reporter: Aditya Acharya >Assignee: Aditya Acharya > Attachments: diff-1.txt, diff.txt > > > I ran an MR2 application that would have been long running, and killed it > programmatically using a YarnClient. The app was killed, but the client hung > forever. The message that I saw, which spammed the logs, was "Watiting for > application application_1389036507624_0018 to be killed." > The RM log indicated that the app had indeed transitioned from RUNNING to > KILLED, but for some reason future responses to the RPC to kill the > application did not indicate that the app had been terminated. > I tracked this down to YarnClientImpl.java, and though I was unable to > reproduce the bug, I wrote a patch to introduce a bound on the number of > times that YarnClientImpl retries the RPC before giving up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore
[ https://issues.apache.org/jira/browse/YARN-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi reassigned YARN-1635: Assignee: Billie Rinaldi (was: Vinod Kumar Vavilapalli) > Implement a Leveldb based ApplicationTimelineStore > -- > > Key: YARN-1635 > URL: https://issues.apache.org/jira/browse/YARN-1635 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Billie Rinaldi > > As per the design doc, we need a levelDB + local-filesystem based > implementation to start with and for small deployments. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1659) Define store-facing entity, entity-info and event objects
Billie Rinaldi created YARN-1659: Summary: Define store-facing entity, entity-info and event objects Key: YARN-1659 URL: https://issues.apache.org/jira/browse/YARN-1659 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi These will be used by the ApplicationTimelineStore interface. The web services will convert the store-facing objects to the user-facing objects. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-950) Ability to limit or avoid aggregating logs beyond a certain size
[ https://issues.apache.org/jira/browse/YARN-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883294#comment-13883294 ] Jason Lowe commented on YARN-950: - Ran into another case where a user filled a disk with a large stdout/stderr, and the NM took forever to recover the disk since it was trying to aggregate the huge file to HDFS. Not only was this a waste of HDFS space and network bandwidth, but ops were unable to manually recover easily by removing the large logfile. The NM process was holding the file open during log aggregation, so the disk space was not able to be freed until either the NM finished aggregating or the NM process exited. Many users would prefer the ability to grab a configurable number of bytes at the head of a large log and a number of bytes at the end of the large log. Of course the NM would need to inject some text into the log to indicate it was truncated, and bonus points if it includes the original log size and/or the amount that was truncated. > Ability to limit or avoid aggregating logs beyond a certain size > > > Key: YARN-950 > URL: https://issues.apache.org/jira/browse/YARN-950 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 0.23.9 >Reporter: Jason Lowe > > It would be nice if ops could configure a cluster such that any container log > beyond a configured size would either only have a portion of the log > aggregated or not aggregated at all. This would help speed up the recovery > path for cases where a container creates an enormous log and fills a disk, as > currently it tries to aggregate the entire, enormous log rather than only > aggregating a small portion or simply deleting it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
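A sketch of the head+tail idea in plain Java, with assumed (not actual NM) byte limits and an injected truncation marker; java.io imports are assumed:
{code}
// Copy at most headBytes + tailBytes of a large local log, marking the gap
// with the original size and the amount truncated.
static void copyTruncated(File log, DataOutputStream out,
    long headBytes, long tailBytes) throws IOException {
  long len = log.length();
  try (RandomAccessFile raf = new RandomAccessFile(log, "r")) {
    byte[] head = new byte[(int) Math.min(len, headBytes)];
    raf.readFully(head);
    out.write(head);
    if (len > headBytes + tailBytes) {
      out.writeBytes("\n...[" + (len - headBytes - tailBytes) + " of " + len
          + " bytes truncated]...\n");
      raf.seek(len - tailBytes);
      byte[] tail = new byte[(int) tailBytes];
      raf.readFully(tail);
      out.write(tail);
    } else if (len > headBytes) {
      // Log fits within the head+tail budget; copy the remainder as-is.
      byte[] rest = new byte[(int) (len - headBytes)];
      raf.readFully(rest);
      out.write(rest);
    }
  }
}
{code}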
[jira] [Updated] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
[ https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1629: - Target Version/s: 2.3.0 > IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer > -- > > Key: YARN-1629 > URL: https://issues.apache.org/jira/browse/YARN-1629 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch > > > This can occur when the second-to-last app in a queue's pending app list is > made runnable. The app is pulled out from under the iterator. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883388#comment-13883388 ] Jason Lowe commented on YARN-1600: -- +1, lgtm. Will commit this shortly. > RM does not startup when security is enabled without spnego configured > -- > > Key: YARN-1600 > URL: https://issues.apache.org/jira/browse/YARN-1600 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Jason Lowe >Assignee: Haohui Mai >Priority: Blocker > Attachments: YARN-1600.000.patch > > > We have a custom auth filter in front of our various UI pages that handles > user authentication. However currently the RM assumes that if security is > enabled then the user must have configured spnego as well for the RM web > pages which is not true in our case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883390#comment-13883390 ] Jason Lowe commented on YARN-1600: -- On second thought, holding off the commit until the recent branch-2.3 re-swizzle is sorted out. > RM does not startup when security is enabled without spnego configured > -- > > Key: YARN-1600 > URL: https://issues.apache.org/jira/browse/YARN-1600 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Assignee: Haohui Mai >Priority: Blocker > Attachments: YARN-1600.000.patch > > > We have a custom auth filter in front of our various UI pages that handles > user authentication. However currently the RM assumes that if security is > enabled then the user must have configured spnego as well for the RM web > pages which is not true in our case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1600: - Target Version/s: 2.3.0 (was: 2.4.0) Affects Version/s: (was: 2.4.0) 2.3.0 Hadoop Flags: Reviewed > RM does not startup when security is enabled without spnego configured > -- > > Key: YARN-1600 > URL: https://issues.apache.org/jira/browse/YARN-1600 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Assignee: Haohui Mai >Priority: Blocker > Attachments: YARN-1600.000.patch > > > We have a custom auth filter in front of our various UI pages that handles > user authentication. However currently the RM assumes that if security is > enabled then the user must have configured spnego as well for the RM web > pages which is not true in our case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1642) RMDTRenewer#getRMClient should use ClientRMProxy
[ https://issues.apache.org/jira/browse/YARN-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883392#comment-13883392 ] Sandy Ryza commented on YARN-1642: -- +1 > RMDTRenewer#getRMClient should use ClientRMProxy > > > Key: YARN-1642 > URL: https://issues.apache.org/jira/browse/YARN-1642 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: yarn-1642-1.patch > > > RMDTRenewer#getRMClient gets a proxy to the RM in the conf directly instead > of going through ClientRMProxy. > {code} > final YarnRPC rpc = YarnRPC.create(conf); > return > (ApplicationClientProtocol)rpc.getProxy(ApplicationClientProtocol.class, > addr, conf); > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
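For comparison, a ClientRMProxy-based version is essentially a one-liner (a sketch, assuming the standard createRMProxy API):
{code}
// Resolve the RM address through the HA-aware ClientRMProxy instead of raw conf.
ApplicationClientProtocol rmClient =
    ClientRMProxy.createRMProxy(conf, ApplicationClientProtocol.class);
{code}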
[jira] [Commented] (YARN-1600) RM does not startup when security is enabled without spnego configured
[ https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883470#comment-13883470 ] Hadoop QA commented on YARN-1600: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625140/YARN-1600.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2944//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2944//console This message is automatically generated. > RM does not startup when security is enabled without spnego configured > -- > > Key: YARN-1600 > URL: https://issues.apache.org/jira/browse/YARN-1600 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Assignee: Haohui Mai >Priority: Blocker > Attachments: YARN-1600.000.patch > > > We have a custom auth filter in front of our various UI pages that handles > user authentication. However currently the RM assumes that if security is > enabled then the user must have configured spnego as well for the RM web > pages which is not true in our case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883536#comment-13883536 ] Karthik Kambatla commented on YARN-1618: Made this a blocker for 2.3, as this leads to the RM going down when recovery is enabled. > Applications transition from NEW to FINAL_SAVING, and try to update > non-existing entries in the state-store > --- > > Key: YARN-1618 > URL: https://issues.apache.org/jira/browse/YARN-1618 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: yarn-1618-1.patch, yarn-1618-2.patch > > > YARN-891 augments the RMStateStore to store information on completed > applications. In the process, it adds transitions from NEW to FINAL_SAVING. > This leads to the RM trying to update entries in the state-store that do not > exist. On ZKRMStateStore, this leads to the RM crashing. > Previous description: > ZKRMStateStore fails to handle updates to znodes that don't exist. For > instance, this can happen when an app transitions from NEW to FINAL_SAVING. > In these cases, the store should create the missing znode and handle the > update. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1618: --- Target Version/s: 2.3.0 (was: 2.4.0) > Applications transition from NEW to FINAL_SAVING, and try to update > non-existing entries in the state-store > --- > > Key: YARN-1618 > URL: https://issues.apache.org/jira/browse/YARN-1618 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: yarn-1618-1.patch, yarn-1618-2.patch > > > YARN-891 augments the RMStateStore to store information on completed > applications. In the process, it adds transitions from NEW to FINAL_SAVING. > This leads to the RM trying to update entries in the state-store that do not > exist. On ZKRMStateStore, this leads to the RM crashing. > Previous description: > ZKRMStateStore fails to handle updates to znodes that don't exist. For > instance, this can happen when an app transitions from NEW to FINAL_SAVING. > In these cases, the store should create the missing znode and handle the > update. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Acharya updated YARN-1630: - Attachment: (was: diff-1.txt) > Unbounded waiting for response in YarnClientImpl.java causes thread to hang > forever > --- > > Key: YARN-1630 > URL: https://issues.apache.org/jira/browse/YARN-1630 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0 >Reporter: Aditya Acharya >Assignee: Aditya Acharya > Attachments: diff-1.txt, diff.txt > > > I ran an MR2 application that would have been long running, and killed it > programmatically using a YarnClient. The app was killed, but the client hung > forever. The message that I saw, which spammed the logs, was "Watiting for > application application_1389036507624_0018 to be killed." > The RM log indicated that the app had indeed transitioned from RUNNING to > KILLED, but for some reason future responses to the RPC to kill the > application did not indicate that the app had been terminated. > I tracked this down to YarnClientImpl.java, and though I was unable to > reproduce the bug, I wrote a patch to introduce a bound on the number of > times that YarnClientImpl retries the RPC before giving up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Acharya updated YARN-1630: - Attachment: diff-1.txt > Unbounded waiting for response in YarnClientImpl.java causes thread to hang > forever > --- > > Key: YARN-1630 > URL: https://issues.apache.org/jira/browse/YARN-1630 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0 >Reporter: Aditya Acharya >Assignee: Aditya Acharya > Attachments: diff-1.txt, diff.txt > > > I ran an MR2 application that would have been long running, and killed it > programmatically using a YarnClient. The app was killed, but the client hung > forever. The message that I saw, which spammed the logs, was "Watiting for > application application_1389036507624_0018 to be killed." > The RM log indicated that the app had indeed transitioned from RUNNING to > KILLED, but for some reason future responses to the RPC to kill the > application did not indicate that the app had been terminated. > I tracked this down to YarnClientImpl.java, and though I was unable to > reproduce the bug, I wrote a patch to introduce a bound on the number of > times that YarnClientImpl retries the RPC before giving up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883585#comment-13883585 ] Aditya Acharya commented on YARN-1630: -- Updated patch, including a unit test this time. > Unbounded waiting for response in YarnClientImpl.java causes thread to hang > forever > --- > > Key: YARN-1630 > URL: https://issues.apache.org/jira/browse/YARN-1630 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0 >Reporter: Aditya Acharya >Assignee: Aditya Acharya > Attachments: diff-1.txt, diff.txt > > > I ran an MR2 application that would have been long running, and killed it > programmatically using a YarnClient. The app was killed, but the client hung > forever. The message that I saw, which spammed the logs, was "Watiting for > application application_1389036507624_0018 to be killed." > The RM log indicated that the app had indeed transitioned from RUNNING to > KILLED, but for some reason future responses to the RPC to kill the > application did not indicate that the app had been terminated. > I tracked this down to YarnClientImpl.java, and though I was unable to > reproduce the bug, I wrote a patch to introduce a bound on the number of > times that YarnClientImpl retries the RPC before giving up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883587#comment-13883587 ] Bikas Saha commented on YARN-1618: -- Is this related? Does not look like a compatible change. If it was valid earlier then we should not change the logic now. {code} -Assert.assertTrue("application finish time is not greater then 0", -(application.getFinishTime() > 0)); +Assert.assertTrue("application start time is less than 0", +(application.getStartTime() >= 0)); {code} I am not sure this would happen in real life since only a START event would trigger going to the scheduler and introduce the possibility of a REJECTED event. If that is the case then this transition should not exist since it would be a bug if this got triggered. {code} +.addTransition(RMAppState.NEW, RMAppState.FAILED, +RMAppEventType.APP_REJECTED, new AppRejectedTransition()) {code} We should add a testAppNewKilled() test and possibly remove the testAppNewReject() if the previous comment is correct. > Applications transition from NEW to FINAL_SAVING, and try to update > non-existing entries in the state-store > --- > > Key: YARN-1618 > URL: https://issues.apache.org/jira/browse/YARN-1618 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: yarn-1618-1.patch, yarn-1618-2.patch > > > YARN-891 augments the RMStateStore to store information on completed > applications. In the process, it adds transitions from NEW to FINAL_SAVING. > This leads to the RM trying to update entries in the state-store that do not > exist. On ZKRMStateStore, this leads to the RM crashing. > Previous description: > ZKRMStateStore fails to handle updates to znodes that don't exist. For > instance, this can happen when an app transitions from NEW to FINAL_SAVING. > In these cases, the store should create the missing znode and handle the > update. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883598#comment-13883598 ] Karthik Kambatla commented on YARN-1618: bq. Is this related? Does not look like a compatible change. If it was valid earlier then we should not change the logic now. This isn't related. However, the test fails for me on trunk too occasionally. I can leave the fix out. Agree NEW -> FAILED shouldn't exist. Thanks for catching this. Will fix up the patch shortly. > Applications transition from NEW to FINAL_SAVING, and try to update > non-existing entries in the state-store > --- > > Key: YARN-1618 > URL: https://issues.apache.org/jira/browse/YARN-1618 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: yarn-1618-1.patch, yarn-1618-2.patch > > > YARN-891 augments the RMStateStore to store information on completed > applications. In the process, it adds transitions from NEW to FINAL_SAVING. > This leads to the RM trying to update entries in the state-store that do not > exist. On ZKRMStateStore, this leads to the RM crashing. > Previous description: > ZKRMStateStore fails to handle updates to znodes that don't exist. For > instance, this can happen when an app transitions from NEW to FINAL_SAVING. > In these cases, the store should create the missing znode and handle the > update. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1655) [YARN-1197] Add implementations to FairScheduler to support increase/decrease container resource
[ https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned YARN-1655: Assignee: Sandy Ryza > [YARN-1197] Add implementations to FairScheduler to support increase/decrease > container resource > > > Key: YARN-1655 > URL: https://issues.apache.org/jira/browse/YARN-1655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Sandy Ryza > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever
[ https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883614#comment-13883614 ] Hadoop QA commented on YARN-1630: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625490/diff-1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2945//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2945//console This message is automatically generated. > Unbounded waiting for response in YarnClientImpl.java causes thread to hang > forever > --- > > Key: YARN-1630 > URL: https://issues.apache.org/jira/browse/YARN-1630 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0 >Reporter: Aditya Acharya >Assignee: Aditya Acharya > Attachments: diff-1.txt, diff.txt > > > I ran an MR2 application that would have been long running, and killed it > programmatically using a YarnClient. The app was killed, but the client hung > forever. The message that I saw, which spammed the logs, was "Watiting for > application application_1389036507624_0018 to be killed." > The RM log indicated that the app had indeed transitioned from RUNNING to > KILLED, but for some reason future responses to the RPC to kill the > application did not indicate that the app had been terminated. > I tracked this down to YarnClientImpl.java, and though I was unable to > reproduce the bug, I wrote a patch to introduce a bound on the number of > times that YarnClientImpl retries the RPC before giving up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM
Arpit Gupta created YARN-1660: - Summary: add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM Key: YARN-1660 URL: https://issues.apache.org/jira/browse/YARN-1660 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Arpit Gupta Currently the user has to specify all the various host:port properties for each RM. We should follow the pattern used for the non-HA setup, where we can specify just yarn.resourcemanager.hostname.rm-id and the defaults are used for all other affected properties. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM
[ https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883658#comment-13883658 ] Arpit Gupta commented on YARN-1660: --- Here is a list of properties that one needs to set for each RM: yarn.resourcemanager.address.rm1 yarn.resourcemanager.scheduler.address.rm1 yarn.resourcemanager.webapp.address.rm1 yarn.resourcemanager.webapp.https.address.rm1 yarn.resourcemanager.resource-tracker.address.rm1 yarn.resourcemanager.admin.address.rm1 yarn.resourcemanager.ha.admin.address.rm1 > add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting > all the various host:port properties for RM > - > > Key: YARN-1660 > URL: https://issues.apache.org/jira/browse/YARN-1660 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Arpit Gupta > > Currently the user has to specify all the various host:port properties for > each RM. We should follow the pattern used for the non-HA setup, where we can > specify just yarn.resourcemanager.hostname.rm-id and the defaults are used for all > other affected properties. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
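Under the proposal, the HA section of yarn-site.xml could shrink to one hostname per RM id, with the port-bearing properties above derived from defaults (a sketch of the intended usage; hostnames illustrative):
{code}
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
{code}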
[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883659#comment-13883659 ] Karthik Kambatla commented on YARN-1618: bq. I am not sure this would happen in real life since only a START event would trigger going to the scheduler and introduce the possibility of a REJECTED event. Actually, looking at all the places an APP_REJECTED event is raised, I found that YARN-674 triggers APP_REJECTED on NEW. We could either change this to KILL, or update our comments in RMAppEventType to reflect that APP_REJECTED could come from places other than the scheduler. [~bikassaha] - thoughts? > Applications transition from NEW to FINAL_SAVING, and try to update > non-existing entries in the state-store > --- > > Key: YARN-1618 > URL: https://issues.apache.org/jira/browse/YARN-1618 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: yarn-1618-1.patch, yarn-1618-2.patch > > > YARN-891 augments the RMStateStore to store information on completed > applications. In the process, it adds transitions from NEW to FINAL_SAVING. > This leads to the RM trying to update entries in the state-store that do not > exist. On ZKRMStateStore, this leads to the RM crashing. > Previous description: > ZKRMStateStore fails to handle updates to znodes that don't exist. For > instance, this can happen when an app transitions from NEW to FINAL_SAVING. > In these cases, the store should create the missing znode and handle the > update. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM
[ https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883660#comment-13883660 ] Karthik Kambatla commented on YARN-1660: +1 to doing this. Thanks for filing this, Arpit. > add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting > all the various host:port properties for RM > - > > Key: YARN-1660 > URL: https://issues.apache.org/jira/browse/YARN-1660 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Arpit Gupta > > Currently the user has to specify all the various host:port properties for > the RM. We should follow the pattern that we use for the non-HA setup, where we can > specify yarn.resourcemanager.hostname.rm-id and the defaults are used for all > other affected properties. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM
[ https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated YARN-1660: -- Assignee: Xuan Gong > add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting > all the various host:port properties for RM > - > > Key: YARN-1660 > URL: https://issues.apache.org/jira/browse/YARN-1660 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Arpit Gupta >Assignee: Xuan Gong > > Currently the user has to specify all the various host:port properties for > the RM. We should follow the pattern that we use for the non-HA setup, where we can > specify yarn.resourcemanager.hostname.rm-id and the defaults are used for all > other affected properties. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1661) AppMaster logs say the AM failed even if an application does succeed.
Tassapol Athiapinya created YARN-1661: - Summary: AppMaster logs say the AM failed even if an application does succeed. Key: YARN-1661 URL: https://issues.apache.org/jira/browse/YARN-1661 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.4.0 Reporter: Tassapol Athiapinya Fix For: 2.4.0 Run: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar -shell_command ls and open the AM logs. The last line indicates AM failure even though the container logs show a good ls result.
{code}
2014-01-24 21:45:29,592 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:finish(599)) - Application completed. Signalling finish to RM
2014-01-24 21:45:29,612 INFO [main] impl.AMRMClientImpl (AMRMClientImpl.java:unregisterApplicationMaster(315)) - Waiting for application to be successfully unregistered.
2014-01-24 21:45:29,816 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:main(267)) - Application Master failed. exiting
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
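A minimal sketch of the reporting logic in question, with hypothetical method bodies (the real ApplicationMaster.finish() computes success from container completion; only the expected wiring is shown here):
{code}
public class AmExitReporting {
  /** Stand-in for ApplicationMaster.finish(): true iff all containers succeeded. */
  static boolean finish(int completedContainers, int failedContainers, int totalContainers) {
    return failedContainers == 0 && completedContainers >= totalContainers;
  }

  public static void main(String[] args) {
    // e.g. the single "ls" container completed successfully
    boolean success = finish(1, 0, 1);
    // main() should key its final message off the actual result; the report
    // above suggests this flag (or the logging of it) is wrong somewhere.
    if (success) {
      System.out.println("Application Master completed successfully. exiting");
    } else {
      System.out.println("Application Master failed. exiting");
    }
    System.exit(success ? 0 : 2);
  }
}
{code}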
[jira] [Updated] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1618: --- Attachment: yarn-1618-3.patch This patch retains receiving APP_REJECTED in the NEW state; tests are fixed accordingly. > Applications transition from NEW to FINAL_SAVING, and try to update > non-existing entries in the state-store > --- > > Key: YARN-1618 > URL: https://issues.apache.org/jira/browse/YARN-1618 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch > > > YARN-891 augments the RMStateStore to store information on completed > applications. In the process, it adds transitions from NEW to FINAL_SAVING. > This leads to the RM trying to update entries in the state-store that do not > exist. On ZKRMStateStore, this leads to the RM crashing. > Previous description: > ZKRMStateStore fails to handle updates to znodes that don't exist. For > instance, this can happen when an app transitions from NEW to FINAL_SAVING. > In these cases, the store should create the missing znode and handle the > update. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1578) Fix how to handle ApplicationHistory about the container
[ https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883816#comment-13883816 ] Zhijie Shen commented on YARN-1578: --- It should be fine if the container is not finished. In this case, the finish information will not be persisted into the store, and the finish information entry should not exist in the history file. However, the exception shown in the description indicates that either historyData or finishData is null. historyData cannot be null, because it has just been constructed in this method. That leaves finishData as the only candidate to be null, though it shouldn't be, because mergeContainerHistoryData is only called when the finish information entry is found. Would you please debug the following code in FileSystemApplicationHistoryStore again, or provide more of the log around the bug?
{code}
while ((!readStartData || !readFinishData) && hfReader.hasNext()) {
  HistoryFileReader.Entry entry = hfReader.next();
  if (entry.key.id.equals(containerId.toString())) {
    if (entry.key.suffix.equals(START_DATA_SUFFIX)) {
      ContainerStartData startData = parseContainerStartData(entry.value);
      mergeContainerHistoryData(historyData, startData);
      readStartData = true;
    } else if (entry.key.suffix.equals(FINISH_DATA_SUFFIX)) {
      ContainerFinishData finishData = parseContainerFinishData(entry.value);
      mergeContainerHistoryData(historyData, finishData);
      readFinishData = true;
    }
  }
}
{code}
> Fix how to handle ApplicationHistory about the container > > Key: YARN-1578 > URL: https://issues.apache.org/jira/browse/YARN-1578 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-321 >Reporter: Shinichi Yamashita >Assignee: Shinichi Yamashita > Attachments: YARN-1578.patch, screenshot.png > > > I ran a PiEstimator job on a Hadoop cluster which had YARN-321 applied. After the job ended, when I accessed the Web UI of the HistoryServer, it displayed "500", and the HistoryServer daemon log was output as follows.
> {code}
> 2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory/appattempt/appattempt_1389146249925_0008_01
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> (snip...)
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
> at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
> at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
> at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
> (snip...)
> {code}
> I confirmed from the ApplicationHistory file that there was a container which was not finished. > In the ResourceManager daemon log, the ResourceManager reserved this container but did not allocate it. > Therefore, it is necessary to change how a container that is never allocated is handled in ApplicationHistory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
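One possible hardening for the NPE discussed above, sketched with simplified stand-in types (not the actual FileSystemApplicationHistoryStore classes): treat a missing finish entry as "nothing to merge" instead of dereferencing null.
{code}
public class ContainerHistoryMergeDemo {
  static class ContainerHistoryData {
    String containerId;
    Integer exitStatus; // stays null until finish data is merged
  }
  static class ContainerFinishData {
    int exitStatus;
  }

  static void mergeContainerHistoryData(ContainerHistoryData history,
                                        ContainerFinishData finish) {
    if (finish == null) {
      // Container that never finished (e.g. reserved but never allocated):
      // leave the history entry partial rather than throwing NPE.
      return;
    }
    history.exitStatus = finish.exitStatus;
  }

  public static void main(String[] args) {
    ContainerHistoryData h = new ContainerHistoryData();
    h.containerId = "some-container-id"; // illustrative value only
    mergeContainerHistoryData(h, null);  // previously a NullPointerException
    System.out.println("exitStatus = " + h.exitStatus); // exitStatus = null
  }
}
{code}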
[jira] [Commented] (YARN-925) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications
[ https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883828#comment-13883828 ] Zhijie Shen commented on YARN-925: -- [~sinchii], thanks for taking care of the filters. I had a quick look at the patch. IMO, it's on the right track. However, the major task of this issue is to optimize the filtering in the implementation of the application history store, in particular FileSystemApplicationHistoryStore. The current patch still reads each individual history file and loads the full historical information of an application, and only then applies the filtering conditions. That is no different from doing the filtering in ApplicationHistoryManager. Given a million history files, it would be a disaster to read all of them. By pushing the filters down into the application history store implementation, which knows best how the historical data is stored, we can optimize there. In the FS implementation, ideally, we should build an index in some way and only read the history files that hit the filters (see the sketch below). > Augment HistoryStorage Reader Interface to Support Filters When Getting > Applications > > > Key: YARN-925 > URL: https://issues.apache.org/jira/browse/YARN-925 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Fix For: YARN-321 > > Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, > YARN-925-4.patch, YARN-925-5.patch, YARN-925-6.patch, YARN-925-7.patch, > YARN-925-8.patch > > > We need to allow filter parameters for getApplications, pushing filtering to > the implementations of the interface. The implementations should know > best how to optimize filtering. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
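As a sketch of the index idea (all class and method names here are hypothetical): keep a small side index from filter keys to history file names, so the store opens only files that can match instead of scanning every one.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HistoryIndexDemo {
  // e.g. "user=alice|type=MAPREDUCE|state=FINISHED" -> history file names
  private final Map<String, List<String>> index = new HashMap<>();

  private static String key(String user, String appType, String state) {
    return "user=" + user + "|type=" + appType + "|state=" + state;
  }

  /** Called when an application's history is written. */
  public void add(String user, String appType, String state, String historyFile) {
    index.computeIfAbsent(key(user, appType, state), k -> new ArrayList<>())
         .add(historyFile);
  }

  /** Only these files need to be opened and parsed for the given filters. */
  public List<String> candidates(String user, String appType, String state) {
    return index.getOrDefault(key(user, appType, state), new ArrayList<>());
  }

  public static void main(String[] args) {
    HistoryIndexDemo idx = new HistoryIndexDemo();
    idx.add("alice", "MAPREDUCE", "FINISHED", "application_0001.history");
    idx.add("bob", "MAPREDUCE", "FAILED", "application_0002.history");
    // Reads one file instead of scanning a million:
    System.out.println(idx.candidates("alice", "MAPREDUCE", "FINISHED"));
  }
}
{code}
A real index would also need to support partial filters (e.g. state only) and be persisted alongside the history files, but the point stands: the store, not the manager, decides which files to read.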
[jira] [Updated] (YARN-925) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications
[ https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-925: - Assignee: Shinichi Yamashita (was: Mayank Bansal) > Augment HistoryStorage Reader Interface to Support Filters When Getting > Applications > > > Key: YARN-925 > URL: https://issues.apache.org/jira/browse/YARN-925 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Mayank Bansal >Assignee: Shinichi Yamashita > Fix For: YARN-321 > > Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, > YARN-925-4.patch, YARN-925-5.patch, YARN-925-6.patch, YARN-925-7.patch, > YARN-925-8.patch > > > We need to allow filter parameters for getApplications, pushing filtering to > the implementations of the interface. The implementations should know > best how to optimize filtering. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1662) Capacity Scheduler reservation issue causes Job Hang
Sunil G created YARN-1662: - Summary: Capacity Scheduler reservation issue causes Job Hang Key: YARN-1662 URL: https://issues.apache.org/jira/browse/YARN-1662 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Environment: Suse 11 SP1 + Linux Reporter: Sunil G There are 2 NodeManagers in my cluster: NM1 with 8GB and NM2 with 8GB. I am submitting a job with the below details: the AM needs 2GB, each map needs 5GB, each reducer needs 3GB, and slowstart is enabled with 0.5. 10 maps and 50 reducers are assigned; 5 maps are completed, and a few reducers got scheduled. Now NM1 holds the 2GB AM and a 3GB Reducer_1 (5GB used), and NM2 holds a 3GB Reducer_2 (3GB used). A map has now reserved 5GB on NM1, which has only 3GB free, and the job hangs forever. The potential issue is that the reservation on NM1 is blocked for a map which needs 5GB, but Reducer_1 hangs waiting for the remaining map outputs. Reducer-side preemption also does not happen, as some headroom is still available. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
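The arithmetic behind the hang, worked through for clarity (illustrative only):
{code}
public class ReservationHangMath {
  public static void main(String[] args) {
    int nm1Capacity = 8, nm2Capacity = 8;          // GB per NodeManager
    int nm1Used = 2 /* AM */ + 3 /* Reducer_1 */;  // 5 GB used on NM1
    int nm2Used = 3 /* Reducer_2 */;               // 3 GB used on NM2
    int mapDemand = 5;                             // GB per map

    System.out.println("NM1 free: " + (nm1Capacity - nm1Used) + " GB"); // 3 GB
    System.out.println("NM2 free: " + (nm2Capacity - nm2Used) + " GB"); // 5 GB
    System.out.println("Map fits on NM1? " + (nm1Capacity - nm1Used >= mapDemand)); // false

    // The 5 GB map reservation sits on NM1 with only 3 GB free; it can only be
    // satisfied if Reducer_1 releases memory, but Reducer_1 is waiting for the
    // reserved map's output: a circular wait, so the job hangs.
  }
}
{code}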
[jira] [Commented] (YARN-1639) YARN RM HA requires different configs on different RM hosts
[ https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883863#comment-13883863 ] Xuan Gong commented on YARN-1639: - My proposal (a sketch of step 2 follows below): 1. yarn.resourcemanager.ha.id will become optional. 2. When the RM starts (if ha.id is not specified), it will automatically figure out its rm-id by checking whether a configured RM address matches its local address. For example, if the RM's local address is 1.1.1.1 and yarn.resourcemanager.address.rm2 is configured as 1.1.1.1, this RM can figure out that its rm-id is rm2. (There is an assumption here: one node can only launch one RM.) 3. We can still explicitly specify the ha.id; if this value is explicitly specified, the RM uses it directly. This is mostly for testing purposes, such as MiniYARNCluster, etc. > YARN RM HA requires different configs on different RM hosts > --- > > Key: YARN-1639 > URL: https://issues.apache.org/jira/browse/YARN-1639 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Arpit Gupta >Assignee: Xuan Gong > > We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you > want to be first or second. > This means we have different configs on different RM nodes. This is unlike > HDFS HA, where the same configs are pushed to both NNs, and it would be better > to have the same setup for the RM as this would make installation and managing > easier. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
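A sketch of the proposed detection, under the stated one-RM-per-node assumption (the helper and class names are hypothetical, not the actual patch):
{code}
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;
import java.util.Map;

public class RmIdDetector {

  /** True if host resolves to an address bound on one of this node's interfaces. */
  static boolean isLocalHost(String host) {
    try {
      InetAddress target = InetAddress.getByName(host);
      for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
        for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
          if (addr.equals(target)) {
            return true;
          }
        }
      }
    } catch (Exception e) {
      // unresolvable host or no interfaces: treat as non-local
    }
    return false;
  }

  /**
   * Pick this node's rm-id by matching each configured
   * yarn.resourcemanager.address.<rm-id> host against local addresses.
   * Returns null when nothing matches, so the caller can fall back to an
   * explicitly configured yarn.resourcemanager.ha.id.
   */
  static String detectRmId(Map<String, String> conf, String... rmIds) {
    for (String rmId : rmIds) {
      String address = conf.get("yarn.resourcemanager.address." + rmId); // host:port
      if (address != null && isLocalHost(address.split(":")[0])) {
        return rmId;
      }
    }
    return null;
  }
}
{code}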
[jira] [Commented] (YARN-1639) YARN RM HA requires different configs on different RM hosts
[ https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883864#comment-13883864 ] Xuan Gong commented on YARN-1639: - Tested the patch in a two-node HA cluster. > YARN RM HA requires different configs on different RM hosts > --- > > Key: YARN-1639 > URL: https://issues.apache.org/jira/browse/YARN-1639 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Arpit Gupta >Assignee: Xuan Gong > Attachments: YARN-1639.1.patch > > > We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you > want to be first or second. > This means we have different configs on different RM nodes. This is unlike > HDFS HA, where the same configs are pushed to both NNs, and it would be better > to have the same setup for the RM as this would make installation and managing > easier. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1639) YARN RM HA requires different configs on different RM hosts
[ https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1639: Attachment: YARN-1639.1.patch > YARN RM HA requires different configs on different RM hosts > --- > > Key: YARN-1639 > URL: https://issues.apache.org/jira/browse/YARN-1639 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Arpit Gupta >Assignee: Xuan Gong > Attachments: YARN-1639.1.patch > > > We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you > want to be first or second. > This means we have different configs on different RM nodes. This is unlike > HDFS HA, where the same configs are pushed to both NNs, and it would be better > to have the same setup for the RM as this would make installation and managing > easier. -- This message was sent by Atlassian JIRA (v6.1.5#6160)