[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue infos
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184870#comment-14184870 ] Sunil G commented on YARN-2647: --- The testResourceTrackerOnHA failure is not caused by this fix; it looks like a connection exception from registerNodeManager. Add yarn queue CLI to get queue infos - Key: YARN-2647 URL: https://issues.apache.org/jira/browse/YARN-2647 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-2647.patch, 0002-YARN-2647.patch, 0003-YARN-2647.patch, 0004-YARN-2647.patch, 0005-YARN-2647.patch, 0006-YARN-2647.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue infos
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14184892#comment-14184892 ] Wangda Tan commented on YARN-2647: -- Hi [~sunilg], Latest patch LGTM, +1. Thanks, Wangda Add yarn queue CLI to get queue infos - Key: YARN-2647 URL: https://issues.apache.org/jira/browse/YARN-2647 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-2647.patch, 0002-YARN-2647.patch, 0003-YARN-2647.patch, 0004-YARN-2647.patch, 0005-YARN-2647.patch, 0006-YARN-2647.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2750) Allow StateMachine has callback when transition fail
Jeff Zhang created YARN-2750: Summary: Allow StateMachine has callback when transition fail Key: YARN-2750 URL: https://issues.apache.org/jira/browse/YARN-2750 Project: Hadoop YARN Issue Type: Improvement Reporter: Jeff Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2750) Allow StateMachine has callback when transition fail
[ https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-2750: - Description: We have a situation where a Transition may sometimes fail, but we don't want to handle the failure in each Transition; we'd like to handle it in one centralized place, so allowing the StateMachine to have a callback would be good for us. Allow StateMachine has callback when transition fail Key: YARN-2750 URL: https://issues.apache.org/jira/browse/YARN-2750 Project: Hadoop YARN Issue Type: Improvement Reporter: Jeff Zhang We have a situation where a Transition may sometimes fail, but we don't want to handle the failure in each Transition; we'd like to handle it in one centralized place, so allowing the StateMachine to have a callback would be good for us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
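As a rough illustration of the idea only (a hypothetical sketch, not the attached YARN-2750.patch; names such as SimpleStateMachine and TransitionFailureListener are made up), a state machine could accept one centralized failure callback that is invoked whenever any transition throws:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

public class SimpleStateMachine<STATE, EVENT> {

  /** Hypothetical hook invoked whenever any transition throws. */
  public interface TransitionFailureListener<STATE, EVENT> {
    void onTransitionFailure(STATE currentState, EVENT event, Exception cause);
  }

  private final Map<STATE, Map<EVENT, BiFunction<STATE, EVENT, STATE>>> transitions =
      new HashMap<>();
  private final TransitionFailureListener<STATE, EVENT> failureListener;
  private STATE currentState;

  public SimpleStateMachine(STATE initialState,
      TransitionFailureListener<STATE, EVENT> failureListener) {
    this.currentState = initialState;
    this.failureListener = failureListener;
  }

  public void addTransition(STATE from, EVENT on,
      BiFunction<STATE, EVENT, STATE> transition) {
    transitions.computeIfAbsent(from, k -> new HashMap<>()).put(on, transition);
  }

  public STATE doTransition(EVENT event) {
    try {
      BiFunction<STATE, EVENT, STATE> transition =
          transitions.getOrDefault(currentState, new HashMap<>()).get(event);
      if (transition == null) {
        throw new IllegalStateException(
            "No transition from " + currentState + " on " + event);
      }
      currentState = transition.apply(currentState, event);
    } catch (Exception e) {
      // One centralized place to react to any failed transition,
      // instead of error handling duplicated inside every Transition.
      failureListener.onTransitionFailure(currentState, event, e);
    }
    return currentState;
  }
}
{code}

In YARN the equivalent hook would presumably be registered when the real org.apache.hadoop.yarn.state.StateMachineFactory is built, so that the RM/NM state machines could share a single error path; the actual patch may take a different shape.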
[jira] [Updated] (YARN-2750) Allow StateMachine has callback when transition fail
[ https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-2750: - Affects Version/s: 2.5.1 Allow StateMachine has callback when transition fail Key: YARN-2750 URL: https://issues.apache.org/jira/browse/YARN-2750 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Jeff Zhang We have a situation that sometimes Transition may fail, but we don't want to handle the fail in each Transition, we'd like to handle it in one centralized place, Allow StateMachine has a callback would be good for us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2750) Allow StateMachine has callback when transition fail
[ https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-2750: - Attachment: YARN-2750.patch Attach a patch for initial review. Allow StateMachine has callback when transition fail Key: YARN-2750 URL: https://issues.apache.org/jira/browse/YARN-2750 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Jeff Zhang Attachments: YARN-2750.patch We have a situation that sometimes Transition may fail, but we don't want to handle the fail in each Transition, we'd like to handle it in one centralized place, Allow StateMachine has a callback would be good for us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2743) Yarn jobs via oozie fail with failed to renew token (secure) or digest mismatch (unsecure) errors when RM is being killed
[ https://issues.apache.org/jira/browse/YARN-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185078#comment-14185078 ] Hudson commented on YARN-2743: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #725 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/725/]) YARN-2743. Fixed a bug in ResourceManager that was causing RMDelegationToken identifiers to be tampered and thus causing app submission failures in secure mode. Contributed by Jian He. (vinodkv: rev 018664550507981297fd9f91e29408e6b7801ea9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMDelegationTokenIdentifierData.java Yarn jobs via oozie fail with failed to renew token (secure) or digest mismatch (unsecure) errors when RM is being killed - Key: YARN-2743 URL: https://issues.apache.org/jira/browse/YARN-2743 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2743.1.patch, YARN-2743.2.patch During our HA testing we have seen yarn jobs run via oozie fail with failed to renew delegation token errors on secure clusters and digest mismatch errors on un secure clusters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file
[ https://issues.apache.org/jira/browse/YARN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185077#comment-14185077 ] Hudson commented on YARN-2734: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #725 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/725/]) YARN-2734. Skipped sub-folders in the local log dir when aggregating logs. Contributed by Xuan Gong. (zjshen: rev caecd9fffe7c6216be31f3ab65349182045451fa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java If a sub-folder is encountered by log aggregator it results in invalid aggregated file -- Key: YARN-2734 URL: https://issues.apache.org/jira/browse/YARN-2734 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2734.1.patch, YARN-2734.2.patch See YARN-2724 for some more context on how the error surfaces during yarn logs call. If aggregator sees a sub-folder today it results in the following error when reading the logs: {noformat} Container: container_1413512973198_0019_01_02 on c6401.ambari.apache.org_45454 LogType: cmd_data LogLength: 4096 Log Contents: Error aggregating log file. Log file : /hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data/hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data (Is a directory) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
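The general shape of such a fix, sketched very loosely below (an illustrative stand-in, not the actual AggregatedLogFormat change from YARN-2734.2.patch), is simply to skip anything that is not a regular file when enumerating a container's log directory, so that a stray sub-folder such as cmd_data can never be read as a log file:

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ContainerLogFileLister {

  /** Returns only plain files in the container log directory, sorted by name. */
  public static List<File> listLogFiles(File containerLogDir) {
    // listFiles with a filter: directories (e.g. cmd_data/) are skipped entirely.
    File[] candidates = containerLogDir.listFiles(File::isFile);
    if (candidates == null) {
      return new ArrayList<>();   // directory missing or unreadable
    }
    List<File> logFiles = new ArrayList<>(Arrays.asList(candidates));
    logFiles.sort(Comparator.comparing(File::getName));
    return logFiles;
  }
}
{code}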
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185104#comment-14185104 ] Varun Vasudev commented on YARN-2741: - [~cwelch] we should add some sort of unit test to confirm the behavior. I can see this bug getting re-introduced by mistake if someone is adding functionality or re-factoring code. Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-2741.1.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2750) Allow StateMachine has callback when transition fail
[ https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-2750: - Attachment: YARN-2750-2.patch Allow StateMachine has callback when transition fail Key: YARN-2750 URL: https://issues.apache.org/jira/browse/YARN-2750 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Jeff Zhang Attachments: YARN-2750-2.patch, YARN-2750.patch We have a situation that sometimes Transition may fail, but we don't want to handle the fail in each Transition, we'd like to handle it in one centralized place, Allow StateMachine has a callback would be good for us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file
[ https://issues.apache.org/jira/browse/YARN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185167#comment-14185167 ] Hudson commented on YARN-2734: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1914/]) YARN-2734. Skipped sub-folders in the local log dir when aggregating logs. Contributed by Xuan Gong. (zjshen: rev caecd9fffe7c6216be31f3ab65349182045451fa) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java If a sub-folder is encountered by log aggregator it results in invalid aggregated file -- Key: YARN-2734 URL: https://issues.apache.org/jira/browse/YARN-2734 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2734.1.patch, YARN-2734.2.patch See YARN-2724 for some more context on how the error surfaces during yarn logs call. If aggregator sees a sub-folder today it results in the following error when reading the logs: {noformat} Container: container_1413512973198_0019_01_02 on c6401.ambari.apache.org_45454 LogType: cmd_data LogLength: 4096 Log Contents: Error aggregating log file. Log file : /hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data/hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data (Is a directory) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2743) Yarn jobs via oozie fail with failed to renew token (secure) or digest mismatch (unsecure) errors when RM is being killed
[ https://issues.apache.org/jira/browse/YARN-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185168#comment-14185168 ] Hudson commented on YARN-2743: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1914/]) YARN-2743. Fixed a bug in ResourceManager that was causing RMDelegationToken identifiers to be tampered and thus causing app submission failures in secure mode. Contributed by Jian He. (vinodkv: rev 018664550507981297fd9f91e29408e6b7801ea9) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMDelegationTokenIdentifierData.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java Yarn jobs via oozie fail with failed to renew token (secure) or digest mismatch (unsecure) errors when RM is being killed - Key: YARN-2743 URL: https://issues.apache.org/jira/browse/YARN-2743 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2743.1.patch, YARN-2743.2.patch During our HA testing we have seen yarn jobs run via oozie fail with failed to renew delegation token errors on secure clusters and digest mismatch errors on un secure clusters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2678: - Attachment: YARN-2678-006.patch Patch 006 # javadoc and javac warnings believed fixed # purged the no-longer-used header logic # improved data checks on deserialization Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Gour Saha Assignee: Steve Loughran Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, YARN-2678-003.patch, YARN-2678-006.patch, yarnregistry.pdf In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2683) registry config options: document and move to core-default
[ https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2683: - Summary: registry config options: document and move to core-default (was: document registry config options) registry config options: document and move to core-default -- Key: YARN-2683 URL: https://issues.apache.org/jira/browse/YARN-2683 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2683-001.patch, YARN-2683-002.patch Original Estimate: 1h Remaining Estimate: 1h Add to {{yarn-site}} a page on registry configuration parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2683) registry config options: document and move to core-default
[ https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185213#comment-14185213 ] Steve Loughran commented on YARN-2683: -- I've added one more action to this JIRA: move the config defaults to {{core-default.xml}}. This may seem odd for a YARN project, but the registry was written so as to allow applications without any YARN artifacts on their classpath to resolve records. That is: the service is expected to be a YARN app (though not exclusively); clients may have a leaner classpath. While this works, the client applications do not get the default values from {{yarn-default.xml}}. registry config options: document and move to core-default -- Key: YARN-2683 URL: https://issues.apache.org/jira/browse/YARN-2683 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2683-001.patch, YARN-2683-002.patch Original Estimate: 1h Remaining Estimate: 1h Add to {{yarn-site}} a page on registry configuration parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
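A small sketch of the client-side effect being described (illustrative only; the key name hadoop.registry.zk.quorum and the fallback value used here are assumptions of this sketch, not taken from the patch): a bare Hadoop Configuration loads only core-default.xml and core-site.xml, so registry options resolve to sensible defaults for non-YARN clients only once those defaults live in core-default.xml:

{code}
import org.apache.hadoop.conf.Configuration;

public class RegistryClientDefaults {
  public static void main(String[] args) {
    // A plain Configuration pulls in core-default.xml + core-site.xml only;
    // yarn-default.xml is never loaded by a client without YARN on the classpath.
    Configuration conf = new Configuration();
    // If the registry defaults are not in core-default.xml, this falls back to
    // whatever hard-coded default the caller supplies.
    String quorum = conf.get("hadoop.registry.zk.quorum", "localhost:2181");
    System.out.println("registry quorum = " + quorum);
  }
}
{code}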
[jira] [Commented] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185230#comment-14185230 ] Hadoop QA commented on YARN-2678: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677307/YARN-2678-006.patch against trunk revision 0058ead. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5576//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5576//console This message is automatically generated. Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Gour Saha Assignee: Steve Loughran Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, YARN-2678-003.patch, YARN-2678-006.patch, yarnregistry.pdf In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. 
I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file
[ https://issues.apache.org/jira/browse/YARN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185231#comment-14185231 ] Hudson commented on YARN-2734: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1939 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1939/]) YARN-2734. Skipped sub-folders in the local log dir when aggregating logs. Contributed by Xuan Gong. (zjshen: rev caecd9fffe7c6216be31f3ab65349182045451fa) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java If a sub-folder is encountered by log aggregator it results in invalid aggregated file -- Key: YARN-2734 URL: https://issues.apache.org/jira/browse/YARN-2734 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2734.1.patch, YARN-2734.2.patch See YARN-2724 for some more context on how the error surfaces during yarn logs call. If aggregator sees a sub-folder today it results in the following error when reading the logs: {noformat} Container: container_1413512973198_0019_01_02 on c6401.ambari.apache.org_45454 LogType: cmd_data LogLength: 4096 Log Contents: Error aggregating log file. Log file : /hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data/hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data (Is a directory) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2743) Yarn jobs via oozie fail with failed to renew token (secure) or digest mismatch (unsecure) errors when RM is being killed
[ https://issues.apache.org/jira/browse/YARN-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185232#comment-14185232 ] Hudson commented on YARN-2743: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1939 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1939/]) YARN-2743. Fixed a bug in ResourceManager that was causing RMDelegationToken identifiers to be tampered and thus causing app submission failures in secure mode. Contributed by Jian He. (vinodkv: rev 018664550507981297fd9f91e29408e6b7801ea9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMDelegationTokenIdentifierData.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java Yarn jobs via oozie fail with failed to renew token (secure) or digest mismatch (unsecure) errors when RM is being killed - Key: YARN-2743 URL: https://issues.apache.org/jira/browse/YARN-2743 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2743.1.patch, YARN-2743.2.patch During our HA testing we have seen yarn jobs run via oozie fail with failed to renew delegation token errors on secure clusters and digest mismatch errors on un secure clusters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2683) registry config options: document and move to core-default
[ https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2683: - Attachment: YARN-2683-003.patch Patch -003. The configuration document is available [[rendered on github|https://github.com/steveloughran/hadoop-trunk/blob/YARN-913/trunk-YARN-2683-docs/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-configuration.md]] Along with the docs, this patch moves all the configuration options into {{core-default}}. They are only used in YARN applications today, but it is essential to place them there so that non-YARN clients pick up the default values. Configuration should go into {{core-site.xml}} too registry config options: document and move to core-default -- Key: YARN-2683 URL: https://issues.apache.org/jira/browse/YARN-2683 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2683-001.patch, YARN-2683-002.patch, YARN-2683-003.patch Original Estimate: 1h Remaining Estimate: 1h Add to {{yarn-site}} a page on registry configuration parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2186) Node Manager uploader service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2186: -- Attachment: YARN-2186-trunk-v4.patch Posted v4. Up to date with YARN-2183. To see the diffs on github, see https://github.com/ctrezzo/hadoop/compare/ctrezzo:trunk...sharedcache-4-YARN-2186-uploader Node Manager uploader service for cache manager --- Key: YARN-2186 URL: https://issues.apache.org/jira/browse/YARN-2186 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2186-trunk-v1.patch, YARN-2186-trunk-v2.patch, YARN-2186-trunk-v3.patch, YARN-2186-trunk-v4.patch Implement the node manager uploader service for the cache manager. This service is responsible for communicating with the node manager when it uploads resources to the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2683) registry config options: document and move to core-default
[ https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185356#comment-14185356 ] Hadoop QA commented on YARN-2683: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677320/YARN-2683-003.patch against trunk revision 0058ead. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5577//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5577//console This message is automatically generated. registry config options: document and move to core-default -- Key: YARN-2683 URL: https://issues.apache.org/jira/browse/YARN-2683 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2683-001.patch, YARN-2683-002.patch, YARN-2683-003.patch Original Estimate: 1h Remaining Estimate: 1h Add to {{yarn-site}} a page on registry configuration parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2186) Node Manager uploader service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185366#comment-14185366 ] Hadoop QA commented on YARN-2186: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677322/YARN-2186-trunk-v4.patch against trunk revision 0058ead. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5578//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5578//console This message is automatically generated. Node Manager uploader service for cache manager --- Key: YARN-2186 URL: https://issues.apache.org/jira/browse/YARN-2186 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2186-trunk-v1.patch, YARN-2186-trunk-v2.patch, YARN-2186-trunk-v3.patch, YARN-2186-trunk-v4.patch Implement the node manager uploader service for the cache manager. This service is responsible for communicating with the node manager when it uploads resources to the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185368#comment-14185368 ] bc Wong commented on YARN-2194: --- container-executor.c * L1188: If initialize_user() fails, do you not need to cleanup? * L1194: Same for create_log_dirs(). Seems that goto cleanup is still warranted. * L1207: Missing space before S_IRWXU. * L1243: Nit. Hardcoding 55 here is error-prone. You could allocate a 4K buffer here, and use snprintf. * L1244: You need to check the return value from malloc(). Since you're running as root here, everything has to be extra careful. * L1255: On failure, would log the command being executed. Add Cgroup support for RedHat 7 --- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2194-1.patch In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185372#comment-14185372 ] Wangda Tan commented on YARN-2729: -- Hi [~Naganarasimha], For 1, I think what I meant is to just check the label name locally in the NM. For the case where the NM register/heartbeat with the RM fails because of labels, I commented in YARN-2495: https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14184146&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14184146. bq. use a flag say if the last sync about node labels is success or not This should also be your proposal. For 2, I think for now let's keep it simple; I just don't want to change too much of what we have in NodeLabelsManager :). As per our discussion in YARN-2495, in the future we might need to return rejected labels, so we can change this at that time. What do you think? Thanks, Wangda Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup --- Key: YARN-2729 URL: https://issues.apache.org/jira/browse/YARN-2729 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
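To make Wangda's first point (a purely local label-name check in the NM) concrete, such a check could look roughly like the sketch below. The exact validation rules live with the labels manager on the RM side; the pattern and length limit used here are assumptions for illustration, not the authoritative rules:

{code}
import java.util.Set;
import java.util.regex.Pattern;

public class NodeLabelNameChecker {

  // Assumed rules for this sketch: alphanumerics, '-' and '_', up to 255 chars,
  // starting with an alphanumeric character.
  private static final Pattern LABEL_PATTERN =
      Pattern.compile("^[0-9a-zA-Z][0-9a-zA-Z_\\-]*$");
  private static final int MAX_LABEL_LENGTH = 255;

  /** Throws if any label reported by the script-based provider is malformed. */
  public static void checkLabels(Set<String> labels) {
    for (String label : labels) {
      if (label == null || label.isEmpty() || label.length() > MAX_LABEL_LENGTH
          || !LABEL_PATTERN.matcher(label).matches()) {
        throw new IllegalArgumentException(
            "Invalid node label reported by NodeLabelsProvider: " + label);
      }
    }
  }
}
{code}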
[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185449#comment-14185449 ] Xuan Gong commented on YARN-2726: - +1 lgtm. Will commit it. CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException("Configuration issue: " + "label=" + label + " is accessible from queue=" + queue + " but has no capacity set."); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist
[ https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185458#comment-14185458 ] Wangda Tan commented on YARN-2744: -- As discussed offline with [~vinodkv], what this patch has done is not only a fix for the memory-based-config-store; it is still possible that, when we use the filesystem-based-config-store, some labels will not be validated. We should fix that. Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist - Key: YARN-2744 URL: https://issues.apache.org/jira/browse/YARN-2744 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2744-20141025-1.patch, YARN-2744-20141025-2.patch Use the following steps: * Ensure default in-memory storage is configured for labels * Define some labels and assign nodes to labels (e.g. define two labels and assign both labels to the host on a one host cluster) * Invoke refreshQueues * Modify capacity scheduler to create two top level queues and allow access to the labels from both the queues * Assign appropriate label + queue specific capacities * Restart resource manager Noticed that RM starts without any issues. The labels are not preserved across restart and thus the capacity-scheduler ends up using labels that are no longer present. At this point submitting an application to YARN will not succeed as there are no resources available with the labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185462#comment-14185462 ] Xuan Gong commented on YARN-2726: - Committed to trunk, branch-2 and branch-2.6. Thanks wangda ! CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Fix For: 2.6.0 Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException(Configuration issue: + label= + label + is accessible from queue= + queue + but has no capacity set.); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185463#comment-14185463 ] Hudson commented on YARN-2726: -- FAILURE: Integrated in Hadoop-trunk-Commit #6354 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6354/]) YARN-2726. CapacityScheduler should explicitly log when an accessible label has no capacity. Contributed by Wangda Tan (xgong: rev ce1a4419a6c938447a675c416567db56bf9cb29e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Fix For: 2.6.0 Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException(Configuration issue: + label= + label + is accessible from queue= + queue + but has no capacity set.); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185464#comment-14185464 ] Wangda Tan commented on YARN-2726: -- Thanks [~xgong]'s review and commit! CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Fix For: 2.6.0 Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException(Configuration issue: + label= + label + is accessible from queue= + queue + but has no capacity set.); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist
[ https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2744: - Priority: Critical (was: Major) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist - Key: YARN-2744 URL: https://issues.apache.org/jira/browse/YARN-2744 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Wangda Tan Priority: Critical Attachments: YARN-2744-20141025-1.patch, YARN-2744-20141025-2.patch Use the following steps: * Ensure default in-memory storage is configured for labels * Define some labels and assign nodes to labels (e.g. define two labels and assign both labels to the host on a one host cluster) * Invoke refreshQueues * Modify capacity scheduler to create two top level queues and allow access to the labels from both the queues * Assign appropriate label + queue specific capacities * Restart resource manager Noticed that RM starts without any issues. The labels are not preserved across restart and thus the capacity-scheduler ends up using labels that are no longer present. At this point submitting an application to YARN will not succeed as there are no resources available with the labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2751) HdfsConstants#MEMORY_STORAGE_POLICY_ID and HdfsConstants#MEMORY_STORAGE_POLICY_ID are missing in branch-2
Jing Zhao created YARN-2751: --- Summary: HdfsConstants#MEMORY_STORAGE_POLICY_ID and HdfsConstants#MEMORY_STORAGE_POLICY_ID are missing in branch-2 Key: YARN-2751 URL: https://issues.apache.org/jira/browse/YARN-2751 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jing Zhao Priority: Minor When HDFS-6581 was merged to branch-2 and branch-2.6, HdfsConstants#MEMORY_STORAGE_POLICY_ID and HdfsConstants#HdfsConstants#MEMORY_STORAGE_POLICY_ID, which were defined in HDFS-7228, are missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data
[ https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185485#comment-14185485 ] Jian He commented on YARN-2591: --- looks good, +1 AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data --- Key: YARN-2591 URL: https://issues.apache.org/jira/browse/YARN-2591 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2591.1.patch, YARN-2591.2.patch AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data. Currently, it is going to return INTERNAL_SERVER_ERROR(500). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2751) HdfsConstants#MEMORY_STORAGE_POLICY_ID and HdfsConstants#MEMORY_STORAGE_POLICY_ID are missing in branch-2
[ https://issues.apache.org/jira/browse/YARN-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185490#comment-14185490 ] Jing Zhao commented on YARN-2751: - Oops, created in the wrong project... HdfsConstants#MEMORY_STORAGE_POLICY_ID and HdfsConstants#MEMORY_STORAGE_POLICY_ID are missing in branch-2 - Key: YARN-2751 URL: https://issues.apache.org/jira/browse/YARN-2751 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jing Zhao Priority: Minor When HDFS-6581 was merged to branch-2 and branch-2.6, HdfsConstants#MEMORY_STORAGE_POLICY_ID and HdfsConstants#MEMORY_STORAGE_POLICY_ID, which were defined in HDFS-7228, are missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data
[ https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185500#comment-14185500 ] Hudson commented on YARN-2591: -- FAILURE: Integrated in Hadoop-trunk-Commit #6355 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6355/]) YARN-2591. Fixed AHSWebServices to return FORBIDDEN(403) if the request user doesn't have access to the history data. Contributed by Zhijie Shen (jianhe: rev c05b581a5522eed499d3ba16af9fa6dc694563f6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/AuthorizationException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data --- Key: YARN-2591 URL: https://issues.apache.org/jira/browse/YARN-2591 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2591.1.patch, YARN-2591.2.patch AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data. Currently, it is going to return INTERNAL_SERVER_ERROR(500). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185534#comment-14185534 ] Jian He commented on YARN-2704: --- bq. if (this.token==null || this.applicationId==null || this.conf==null) { Because these null checks can never happen, and they were causing a previous findbugs warning. Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-2704.1.patch, YARN-2704.2.patch, YARN-2704.2.patch, YARN-2704.3.patch In secure mode, YARN requires the hdfs-delegation token to do localization and log aggregation on behalf of the user. But the hdfs delegation token will eventually expire after max-token-life-time. So, localization and log aggregation will fail after the token expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
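A minimal sketch of the point in the comment above, assuming nothing beyond standard Java; the class and field names are illustrative stand-ins, not the YARN-2704 code. Once the fields are provably non-null, a later null guard on them is dead code, which is what findbugs reports as a redundant null check.
{code}
import java.util.Objects;

// Illustrative only: the fields are guaranteed non-null by the constructor, so the
// guard inside describe() can never fire; findbugs flags such checks as redundant.
public class RedundantNullCheckSketch {
  private final String token;
  private final String applicationId;

  RedundantNullCheckSketch(String token, String applicationId) {
    this.token = Objects.requireNonNull(token);
    this.applicationId = Objects.requireNonNull(applicationId);
  }

  int describe() {
    // The kind of check being removed: both fields are provably non-null here,
    // so this branch is unreachable and only adds noise.
    if (this.token == null || this.applicationId == null) {
      return -1;
    }
    return token.length() + applicationId.length();
  }
}
{code}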
[jira] [Updated] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2704: -- Attachment: YARN-2704.4.patch Fixed previous comments Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-2704.1.patch, YARN-2704.2.patch, YARN-2704.2.patch, YARN-2704.3.patch, YARN-2704.4.patch In secure mode, YARN requires the hdfs-delegation token to do localization and log aggregation on behalf of the user. But the hdfs delegation token will eventually expire after max-token-life-time. So, localization and log aggregation will fail after the token expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-443) allow OS scheduling priority of NM to be different than the containers it launches
[ https://issues.apache.org/jira/browse/YARN-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185542#comment-14185542 ] Xuan Gong commented on YARN-443: [~tgraves] I found that the function ContainerExecutor.getRunCommand() in the trunk patch is different from that in branch-2/branch-0.23 patch. Is there any reason why we are doing that ? allow OS scheduling priority of NM to be different than the containers it launches -- Key: YARN-443 URL: https://issues.apache.org/jira/browse/YARN-443 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 0.23.7, 2.0.4-alpha Attachments: YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, YARN-443-branch-2.patch, YARN-443-branch-2.patch, YARN-443-branch-2.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch It would be nice if we could have the nodemanager run at a different OS scheduling priority than the containers so that you can still communicate with the nodemanager if the containers out of control. On linux we could launch the nodemanager at a higher priority, but then all the containers it launches would also be at that higher priority, so we need a way for the container executor to launch them at a lower priority. I'm not sure how this applies to windows if at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185572#comment-14185572 ] Craig Welch commented on YARN-2741: --- Will do, making patch available so I can see the change run against existing tests. Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-2741.1.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-443) allow OS scheduling priority of NM to be different than the containers it launches
[ https://issues.apache.org/jira/browse/YARN-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185578#comment-14185578 ] Thomas Graves commented on YARN-443: Can you be more specific, what is different about it and why it is a problem? The trunk patch shows that there was an existing getRunCommand() routine (before this change) where as the other didn't have one before (it looks like for windows support). allow OS scheduling priority of NM to be different than the containers it launches -- Key: YARN-443 URL: https://issues.apache.org/jira/browse/YARN-443 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 0.23.7, 2.0.4-alpha Attachments: YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, YARN-443-branch-2.patch, YARN-443-branch-2.patch, YARN-443-branch-2.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch It would be nice if we could have the nodemanager run at a different OS scheduling priority than the containers so that you can still communicate with the nodemanager if the containers out of control. On linux we could launch the nodemanager at a higher priority, but then all the containers it launches would also be at that higher priority, so we need a way for the container executor to launch them at a lower priority. I'm not sure how this applies to windows if at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
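For context, a hedged sketch of the idea behind getRunCommand() and the OS scheduling priority discussed in this issue; the configuration key name and the nice-based prefix below are assumptions for illustration, not the exact YARN-443 implementation.
{code}
import java.util.ArrayList;
import java.util.List;

// Sketch only: launch containers at a lower OS scheduling priority than the
// NodeManager by prefixing the launch command with "nice -n <adjustment>".
public class RunCommandSketch {
  // Assumed key name for illustration; check YarnConfiguration for the real one.
  static final String PRIORITY_KEY =
      "yarn.nodemanager.container-executor.os.sched.priority.adjustment";

  static String[] getRunCommand(String script, Integer priorityAdjustment) {
    List<String> cmd = new ArrayList<>();
    if (priorityAdjustment != null) {
      // A positive adjustment lowers the container's priority relative to the NM.
      cmd.add("nice");
      cmd.add("-n");
      cmd.add(Integer.toString(priorityAdjustment));
    }
    cmd.add("bash");
    cmd.add(script);
    return cmd.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // No adjustment configured: plain "bash <script>".
    System.out.println(String.join(" ", getRunCommand("/tmp/launch.sh", null)));
    // Adjustment configured: "nice -n 10 bash <script>".
    System.out.println(String.join(" ", getRunCommand("/tmp/launch.sh", 10)));
  }
}
{code}
A test like testRunCommandNoPriority (see YARN-2752 below) essentially asserts the first case: with no priority configured, the generated command must not start with the nice prefix.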
[jira] [Updated] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2571: - Attachment: YARN-2571-006.patch Patch 006: upgrade the use of a URI in the API from a SHOULD to a MUST; fix examples and tests accordingly RM to support YARN registry Key: YARN-2571 URL: https://issues.apache.org/jira/browse/YARN-2571 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2571-001.patch, YARN-2571-002.patch, YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-006.patch The RM needs to (optionally) integrate with the YARN registry: # startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) # app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. # attempt, container, app completion: remove service records with the matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2571: - Attachment: (was: YARN-2571-006.patch) RM to support YARN registry Key: YARN-2571 URL: https://issues.apache.org/jira/browse/YARN-2571 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2571-001.patch, YARN-2571-002.patch, YARN-2571-003.patch, YARN-2571-005.patch The RM needs to (optionally) integrate with the YARN registry: # startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) # app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. # attempt, container, app completion: remove service records with the matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2678: - Attachment: YARN-2678-007.patch patch 007: upgrade use a URI in the api from a SHOULD to a MUST; fix examples and tests accordingly Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Gour Saha Assignee: Steve Loughran Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, YARN-2678-003.patch, YARN-2678-006.patch, YARN-2678-007.patch, yarnregistry.pdf In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
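A small sketch of why the proposed addresses shape (recommendation 2 above: a list of maps with uri/host/port keys rather than a list of lists) is easier to consume on the client side. The field names follow the proposal in this JIRA, and the helper below is illustrative, not the registry API.
{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative client-side lookup against the proposed record shape.
public class AddressLookupSketch {
  // Proposed shape: each address is a map such as
  // { "uri": "https://host:46958/ws/v1/slider/agents", "host": "host", "port": 46958 }
  static String firstUri(List<Map<String, Object>> addresses) {
    if (addresses == null || addresses.isEmpty()) {
      return null;
    }
    // The URI is carried as-is; no concatenation of host + port + path on the client.
    return (String) addresses.get(0).get("uri");
  }

  public static void main(String[] args) {
    Map<String, Object> addr = new HashMap<>();
    addr.put("uri", "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents");
    addr.put("host", "c6408.ambari.apache.org");
    addr.put("port", 46958);
    System.out.println(firstUri(Collections.singletonList(addr)));
  }
}
{code}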
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185636#comment-14185636 ] Hadoop QA commented on YARN-2741: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677052/YARN-2741.1.patch against trunk revision c05b581. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 13 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/5580//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5580//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5580//console This message is automatically generated. Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-2741.1.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185637#comment-14185637 ] Hadoop QA commented on YARN-2678: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677366/YARN-2678-007.patch against trunk revision c05b581. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5581//console This message is automatically generated. Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Gour Saha Assignee: Steve Loughran Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, YARN-2678-003.patch, YARN-2678-006.patch, YARN-2678-007.patch, yarnregistry.pdf In the process of binding to Slider AM from Slider agent python code here are some of the items I stumbled upon and would recommend as improvements. This is how the Slider's registry looks today - {noformat} jsonservicerec{ description : Slider Application Master, external : [ { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf, addresses : [ [ c6408.ambari.apache.org, 34837 ] ] }, { api : org.apache.http.UI, addressType : uri, protocolType : webui, addresses : [ [ http://c6408.ambari.apache.org:43314; ] ] }, { api : org.apache.slider.management, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt; ] ] }, { api : org.apache.slider.publisher, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher; ] ] }, { api : org.apache.slider.registry, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry; ] ] }, { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST, addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider; ] ] } ], internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents; ] ] }, { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST, addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents; ] ] } ], yarn:persistence : application, yarn:id : application_1412974695267_0015 } {noformat} Recommendations: 1. I would suggest to either remove the string {color:red}jsonservicerec{color} or if it is desirable to have a non-null data at all times then loop the string into the json structure as a top-level attribute to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of list. I would recommend to convert it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs to avoid parsing on the client side. The URI should also be retained as a key say uri to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a proposed structure - {noformat} { ... 
internal : [ { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST, addresses : [ { uri : https://c6408.ambari.apache.org:46958/ws/v1/slider/agents;, host : c6408.ambari.apache.org, port: 46958 } ] } ], } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2752) TestContainerExecutor.testRunCommandNoPriority fails in branch-2
Xuan Gong created YARN-2752: --- Summary: TestContainerExecutor.testRunCommandNoPriority fails in branch-2 Key: YARN-2752 URL: https://issues.apache.org/jira/browse/YARN-2752 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2752) TestContainerExecutor.testRunCommandNoPriority fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2752: Description: TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it passed in trunk. The function code ContainerExecutor.getRunCommand() in trunk is different from that in branch-2. TestContainerExecutor.testRunCommandNoPriority fails in branch-2 Key: YARN-2752 URL: https://issues.apache.org/jira/browse/YARN-2752 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it passed in trunk. The function code ContainerExecutor.getRunCommand() in trunk is different from that in branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2752) TestContainerExecutor.testRunCommandNoPriority fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2752: Attachment: YARN-2752.1-branch-2.patch TestContainerExecutor.testRunCommandNoPriority fails in branch-2 Key: YARN-2752 URL: https://issues.apache.org/jira/browse/YARN-2752 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2752.1-branch-2.patch TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it passed in trunk. The function code ContainerExecutor.getRunCommand() in trunk is different from that in branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2669: -- Attachment: YARN-2669-3.patch Thanks for the comments, [~bcwalrus]. How about replacing the . with _dot_? FairScheduler: queueName shouldn't allow periods in the allocation.xml --- Key: YARN-2669 URL: https://issues.apache.org/jira/browse/YARN-2669 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch For an allocation file like: {noformat} <allocations> <queue name="root.q1"> <minResources>4096mb,4vcores</minResources> </queue> </allocations> {noformat} Users may wish to configure minResources for a queue with the full path root.q1. However, right now, the fair scheduler will treat this configuration as belonging to the queue with full name root.root.q1. We need to print out a warning message to notify users about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
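To make the _dot_ suggestion concrete, a hedged sketch of the escaping idea: a literal period inside a single queue name is replaced with a marker so it is not read as the parent.child separator, and mapped back for display. The method names are illustrative, not FairScheduler API, and the exact direction of the final fix may differ.
{code}
// Illustrative helpers only, not FairScheduler code.
public class QueueNameEscapeSketch {
  private static final String DOT_MARKER = "_dot_";

  // Escape literal periods typed inside a single queue name.
  static String escapeLiteralDots(String singleQueueName) {
    return singleQueueName.replace(".", DOT_MARKER);
  }

  // Map the marker back when showing the name to users.
  static String unescapeForDisplay(String storedQueueName) {
    return storedQueueName.replace(DOT_MARKER, ".");
  }

  public static void main(String[] args) {
    // A queue name containing a literal dot is stored unambiguously under root
    // instead of being parsed as a two-level path.
    String stored = "root." + escapeLiteralDots("root.q1");
    System.out.println(stored);                     // root.root_dot_q1
    System.out.println(unescapeForDisplay(stored)); // root.root.q1
  }
}
{code}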
[jira] [Updated] (YARN-2745) YARN new pluggable scheduler which does multi-resource packing
[ https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Grandl updated YARN-2745: Attachment: (was: tetris_design_doc.docx) YARN new pluggable scheduler which does multi-resource packing -- Key: YARN-2745 URL: https://issues.apache.org/jira/browse/YARN-2745 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Reporter: Robert Grandl Attachments: sigcomm_14_tetris_talk.pptx, tetris_paper.pdf In this umbrella JIRA we propose a new pluggable scheduler, which accounts for all resources used by a task (CPU, memory, disk, network) and it is able to achieve three competing objectives: fairness, improve cluster utilization and reduces average job completion time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185686#comment-14185686 ] Hadoop QA commented on YARN-2704: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677352/YARN-2704.4.patch against trunk revision c05b581. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5579//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5579//console This message is automatically generated. Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-2704.1.patch, YARN-2704.2.patch, YARN-2704.2.patch, YARN-2704.3.patch, YARN-2704.4.patch In secure mode, YARN requires the hdfs-delegation token to do localization and log aggregation on behalf of the user. But the hdfs delegation token will eventually expire after max-token-life-time. So, localization and log aggregation will fail after the token expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2683) registry config options: document and move to core-default
[ https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185730#comment-14185730 ] Gour Saha commented on YARN-2683: - Steve, It looks great, just one minor thing - Orig: It is configured by way of {color:red}Hadoop a{color} Configuration class... I think you meant: It is configured by way of {color:blue}a Hadoop{color} Configuration class... -Gour registry config options: document and move to core-default -- Key: YARN-2683 URL: https://issues.apache.org/jira/browse/YARN-2683 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2683-001.patch, YARN-2683-002.patch, YARN-2683-003.patch Original Estimate: 1h Remaining Estimate: 1h Add to {{yarn-site}} a page on registry configuration parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2752) TestContainerExecutor.testRunCommandNoPriority fails in branch-2
[ https://issues.apache.org/jira/browse/YARN-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185732#comment-14185732 ] Xuan Gong commented on YARN-2752: - Initially, the function ContainerExecutor.getRunCommand() was implemented slightly differently in trunk and branch-2 by YARN-443. Then, in HADOOP-8562, we did a big merge from trunk which did not merge this function correctly. That is why the code in trunk now differs from that in branch-2, and why this testcase passes in trunk but fails in branch-2. TestContainerExecutor.testRunCommandNoPriority fails in branch-2 Key: YARN-2752 URL: https://issues.apache.org/jira/browse/YARN-2752 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2752.1-branch-2.patch TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it passes in trunk. The code of ContainerExecutor.getRunCommand() in trunk is different from that in branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185813#comment-14185813 ] Hadoop QA commented on YARN-2669: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677381/YARN-2669-3.patch against trunk revision 5b1dfe7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5582//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5582//console This message is automatically generated. FairScheduler: queueName shouldn't allow periods the allocation.xml --- Key: YARN-2669 URL: https://issues.apache.org/jira/browse/YARN-2669 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch For an allocation file like: {noformat} allocations queue name=root.q1 minResources4096mb,4vcores/minResources /queue /allocations {noformat} Users may wish to config minResources for a queue with full path root.q1. However, right now, fair scheduler will treat this configureation for the queue with full name root.root.q1. We need to print out a warning msg to notify users about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2010) If RM fails to recover an app, it can never transition to active again
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185845#comment-14185845 ] Karthik Kambatla commented on YARN-2010: [~vinodkv], [~jlowe], [~jianhe] - will any of you be able to review this? Thanks. If RM fails to recover an app, it can never transition to active again -- Key: YARN-2010 URL: https://issues.apache.org/jira/browse/YARN-2010 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: bc Wong Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-2010.1.patch, YARN-2010.patch, issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, yarn-2010-6.patch Sometimes, the RM fails to recover an application. It could be because of turning security on, token expiry, or issues connecting to HDFS etc. The causes could be classified into (1) transient, (2) specific to one application, and (3) permanent and apply to multiple (all) applications. Today, the RM fails to transition to Active and ends up in STOPPED state and can never be transitioned to Active again. The initial stacktrace reported is at https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
zhihai xu created YARN-2753: --- Summary: potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. This is because when a Node is created, Node.labels can be null; in that case nm.labels may be null. So we need to check that originalLabels is not null before using it (originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
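A minimal sketch of the guard being described, with simplified stand-in types rather than the real CommonNodeLabelsManager internals: check that the node's current label set is non-null before calling containsAll on it.
{code}
import java.util.Collections;
import java.util.Set;

// Simplified stand-ins; not the real CommonNodeLabelsManager code.
public class RemoveLabelsCheckSketch {
  static class Node {
    Set<String> labels; // can legitimately still be null right after the node is created
  }

  static void checkRemoveLabelsFromNode(Node nm, Set<String> labelsToRemove) {
    Set<String> originalLabels = nm.labels;
    // Guard first: a null label set means the node has no labels to remove.
    if (originalLabels == null || !originalLabels.containsAll(labelsToRemove)) {
      throw new IllegalArgumentException(
          "Trying to remove labels that are not assigned to the node");
    }
  }

  public static void main(String[] args) {
    Node freshNode = new Node(); // labels is still null here
    try {
      checkRemoveLabelsFromNode(freshNode, Collections.singleton("gpu"));
    } catch (IllegalArgumentException expected) {
      System.out.println("Rejected cleanly instead of throwing an NPE");
    }
  }
}
{code}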
[jira] [Updated] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Attachment: YARN-2753.000.patch potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. It because when a Node is created, Node.labels can be null. In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2502) Changes in distributed shell to support specify labels
[ https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2502: - Attachment: YARN-2502-20141027-2.patch Changes in distributed shell to support specify labels -- Key: YARN-2502 URL: https://issues.apache.org/jira/browse/YARN-2502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2502-20141009.1.patch, YARN-2502-20141009.2.patch, YARN-2502-20141013.1.patch, YARN-2502-20141017-1.patch, YARN-2502-20141017-2.patch, YARN-2502-20141027-2.patch, YARN-2502.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2010) If RM fails to recover an app, it can never transition to active again
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185883#comment-14185883 ] Jian He commented on YARN-2010: --- sorry, was caught up with something. I'll review today. If RM fails to recover an app, it can never transition to active again -- Key: YARN-2010 URL: https://issues.apache.org/jira/browse/YARN-2010 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: bc Wong Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-2010.1.patch, YARN-2010.patch, issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, yarn-2010-6.patch Sometimes, the RM fails to recover an application. It could be because of turning security on, token expiry, or issues connecting to HDFS etc. The causes could be classified into (1) transient, (2) specific to one application, and (3) permanent and apply to multiple (all) applications. Today, the RM fails to transition to Active and ends up in STOPPED state and can never be transitioned to Active again. The initial stacktrace reported is at https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185924#comment-14185924 ] Xuan Gong commented on YARN-2749: - This is because of a race condition and it is a testcase issue. This happens when we call AppLogAggregatorImpl.doLogAggregationOutOfBand, which will notify and abort the wait so that log aggregation starts. The notify action might happen before the AppLogAggregatorImpl thread starts. The simplest fix could be adding a Thread.sleep before we call AppLogAggregatorImpl.doLogAggregationOutOfBand to make sure the AppLogAggregatorImpl thread has started. Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Some testcases from TestLogAggregationService fail in trunk. They can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
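A toy illustration of the lost-notify race described in the comment above and of the Thread.sleep workaround it proposes; this is just the wait/notify pattern involved, not the real AppLogAggregatorImpl.
{code}
// Toy reproduction of the race: if notify() runs before the worker reaches wait(),
// the wake-up is lost and the waiter blocks forever. The sleep below mirrors the
// workaround suggested in the comment: give the worker time to start first.
public class LogAggregationRaceSketch {
  private final Object lock = new Object();

  void doLogAggregationOutOfBand() {   // name borrowed from the comment above
    synchronized (lock) {
      lock.notifyAll();
    }
  }

  void waitThenAggregate() throws InterruptedException {
    synchronized (lock) {
      lock.wait();                     // no predicate: an early notify is simply lost
    }
    System.out.println("aggregating logs");
  }

  public static void main(String[] args) throws Exception {
    LogAggregationRaceSketch s = new LogAggregationRaceSketch();
    Thread worker = new Thread(() -> {
      try {
        s.waitThenAggregate();
      } catch (InterruptedException ignored) {
      }
    });
    worker.start();
    Thread.sleep(100);                 // suggested workaround; without it the notify can race ahead
    s.doLogAggregationOutOfBand();
    worker.join();
  }
}
{code}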
[jira] [Commented] (YARN-2502) Changes in distributed shell to support specify labels
[ https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185930#comment-14185930 ] Hadoop QA commented on YARN-2502: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677419/YARN-2502-20141027-2.patch against trunk revision 5b1dfe7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5583//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5583//console This message is automatically generated. Changes in distributed shell to support specify labels -- Key: YARN-2502 URL: https://issues.apache.org/jira/browse/YARN-2502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2502-20141009.1.patch, YARN-2502-20141009.2.patch, YARN-2502-20141013.1.patch, YARN-2502-20141017-1.patch, YARN-2502-20141017-2.patch, YARN-2502-20141027-2.patch, YARN-2502.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185931#comment-14185931 ] Hadoop QA commented on YARN-2753: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677417/YARN-2753.000.patch against trunk revision 5b1dfe7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1287 javac compiler warnings (more than the trunk's current 1266 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5584//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5584//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5584//console This message is automatically generated. potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. It because when a Node is created, Node.labels can be null. In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
Siqi Li created YARN-2755: - Summary: NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li reassigned YARN-2755: - Assignee: Siqi Li NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2755: -- Description: When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185948#comment-14185948 ] Siqi Li commented on YARN-2755: --- [~sjlee0] mentioned that NM's initialization is taking a LONG time because of this (this one's been doing this for 1 hour 41 minutes, and not all done), but monit isn't restarting it. To me, the ill effect is that the NM startup is taking a long time (and probably will get longer each time), and the directories are not getting cleaned up. NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
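For illustration only, a sketch of the kind of scan implied by the reports above: finding leftover usercache_DEL_<timestamp> directories under a NodeManager local dir. The path layout comes from the issue description; the scanning code is an assumption, not the YARN-2755 patch.
{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative scan of one NM local dir for leftover deletion-staging directories.
public class StaleUsercacheScanSketch {
  static List<File> findStaleUsercacheDirs(File localDir) {
    List<File> stale = new ArrayList<>();
    File[] children = localDir.listFiles();
    if (children == null) {
      return stale; // directory missing or unreadable
    }
    for (File child : children) {
      // Matches names like usercache_DEL_1414372756105 from the report above.
      if (child.isDirectory() && child.getName().startsWith("usercache_DEL_")) {
        stale.add(child);
      }
    }
    return stale;
  }

  public static void main(String[] args) {
    File localDir = new File("/data/disk1/yarn/local"); // example path from the report
    System.out.println(findStaleUsercacheDirs(localDir).size()
        + " leftover usercache_DEL_* directories");
  }
}
{code}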
[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185954#comment-14185954 ] Vinod Kumar Vavilapalli commented on YARN-2704: --- +1, looks good. Checking this in. Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-2704.1.patch, YARN-2704.2.patch, YARN-2704.2.patch, YARN-2704.3.patch, YARN-2704.4.patch In secure mode, YARN requires the hdfs-delegation token to do localization and log aggregation on behalf of the user. But the hdfs delegation token will eventually expire after max-token-life-time. So, localization and log aggregation will fail after the token expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2704: -- Priority: Critical (was: Major) This is critical for long running services. Getting this in as a critical item for 2.6. Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2704.1.patch, YARN-2704.2.patch, YARN-2704.2.patch, YARN-2704.3.patch, YARN-2704.4.patch In secure mode, YARN requires the hdfs-delegation token to do localization and log aggregation on behalf of the user. But the hdfs delegation token will eventually expire after max-token-life-time. So, localization and log aggregation will fail after the token expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185963#comment-14185963 ] Hudson commented on YARN-2704: -- FAILURE: Integrated in Hadoop-trunk-Commit #6357 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6357/]) YARN-2704. Changed ResourceManager to optionally obtain tokens itself for the sake of localization and log-aggregation for long-running services. Contributed by Jian He. (vinodkv: rev a16d022ca4313a41425c8e97841c841a2d6f2f54) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestProtocolRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/DummyContainerManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Priority:
[jira] [Commented] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185977#comment-14185977 ] Wangda Tan commented on YARN-2753: -- [~zxu], Nice finding! Thanks for the patch, looks good to me. And could you also remove the {code} labels == null {code} part of the check {code} if (labels == null || labels.isEmpty()) { continue; } {code} in the same patch, since the labels will never be null? Thanks, Wangda potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. This is because when a Node is created, Node.labels can be null; in that case nm.labels may be null. So we need to check that originalLabels is not null before using it (originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2753: - Issue Type: Sub-task (was: Bug) Parent: YARN-2492 potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. It because when a Node is created, Node.labels can be null. In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2755: -- Attachment: YARN-2755.v1.patch NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2755.v1.patch When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2755: -- Priority: Critical (was: Major) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2754) addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java.
[ https://issues.apache.org/jira/browse/YARN-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2754: - Issue Type: Sub-task (was: Bug) Parent: YARN-2492 addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. --- Key: YARN-2754 URL: https://issues.apache.org/jira/browse/YARN-2754 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2754.000.patch addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2754) addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java.
[ https://issues.apache.org/jira/browse/YARN-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185982#comment-14185982 ] Wangda Tan commented on YARN-2754: -- Zhihai, thanks for reporting this and for the fix. Patch LGTM, +1. Wangda addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. --- Key: YARN-2754 URL: https://issues.apache.org/jira/browse/YARN-2754 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2754.000.patch addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
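For context, the write-lock protection the issue asks for would look roughly like the sketch below. The lock field name and the method body are assumptions rather than the attached patch; the point is only that the mutation of labelCollections happens between lock() and unlock().
{code}
// Sketch: protect the mutation of labelCollections with the manager's write lock.
public void addToCluserNodeLabels(Set<String> labels) throws IOException {
  writeLock.lock();
  try {
    // ... validate the labels and add them to labelCollections ...
  } finally {
    writeLock.unlock();
  }
}
{code}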
[jira] [Created] (YARN-2756) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory.
zhihai xu created YARN-2756: --- Summary: use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. Key: YARN-2756 URL: https://issues.apache.org/jira/browse/YARN-2756 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource;) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2756) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory.
[ https://issues.apache.org/jira/browse/YARN-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2756: Description: use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. (was: use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource;) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation.) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. --- Key: YARN-2756 URL: https://issues.apache.org/jira/browse/YARN-2756 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2756) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory.
[ https://issues.apache.org/jira/browse/YARN-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2756: Attachment: YARN-2756.000.patch use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. --- Key: YARN-2756 URL: https://issues.apache.org/jira/browse/YARN-2756 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2756.000.patch use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
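In other words, the YARN-2756 proposal is essentially a one-line change on the node deactivation path, roughly as sketched below (the surrounding method is not shown and is assumed):
{code}
// Before: a fresh zero-valued Resource is allocated for every deactivated node.
nm.resource = Resource.newInstance(0, 0);
// After: reuse the shared zero-resource singleton, saving one allocation per node.
nm.resource = Resources.none();
{code}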
[jira] [Updated] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2741: -- Attachment: YARN-2741.6.patch Added unit tests Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2502) Changes in distributed shell to support specify labels
[ https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186009#comment-14186009 ] Hadoop QA commented on YARN-2502: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677419/YARN-2502-20141027-2.patch against trunk revision a16d022. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5586//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5586//console This message is automatically generated. Changes in distributed shell to support specify labels -- Key: YARN-2502 URL: https://issues.apache.org/jira/browse/YARN-2502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2502-20141009.1.patch, YARN-2502-20141009.2.patch, YARN-2502-20141013.1.patch, YARN-2502-20141017-1.patch, YARN-2502-20141017-2.patch, YARN-2502-20141027-2.patch, YARN-2502.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2755: -- Attachment: (was: YARN-2755.v1.patch) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2755: -- Attachment: YARN-2755.v1.patch NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Attachment: YARN-2753.001.patch potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. It because when a Node is created, Node.labels can be null. In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186048#comment-14186048 ] Hadoop QA commented on YARN-2755: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677439/YARN-2755.v1.patch against trunk revision a16d022. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5587//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5587//console This message is automatically generated. NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2754) addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java.
[ https://issues.apache.org/jira/browse/YARN-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186049#comment-14186049 ] Hadoop QA commented on YARN-2754: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677428/YARN-2754.000.patch against trunk revision 5b1dfe7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5585//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5585//console This message is automatically generated. addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. --- Key: YARN-2754 URL: https://issues.apache.org/jira/browse/YARN-2754 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2754.000.patch addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186047#comment-14186047 ] zhihai xu commented on YARN-2753: - [~leftnoteasy], thanks for reviewing the patch. I removed the labels == null check in the new patch YARN-2753.001.patch. potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. It because when a Node is created, Node.labels can be null. In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2749: Attachment: YARN-2749.1.patch Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2749.1.patch Some testcases from TestLogAggregationService fails in trunk. Those can be reproduced in centos Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186079#comment-14186079 ] Sangjin Lee commented on YARN-2755: --- @siqi, could you elaborate on the nature of the bug a little more? NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186094#comment-14186094 ] Hadoop QA commented on YARN-2741: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677445/YARN-2741.6.patch against trunk revision 00b4e44. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5589//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5589//console This message is automatically generated. Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186105#comment-14186105 ] Siqi Li commented on YARN-2755: --- When NM starts up, it moves usercache to usercache_DEL_timestamp, tries to delete everything inside usercache_DEL_timestamp, and then deletes usercache_DEL_timestamp itself. However, when there is nothing in usercache at NM startup, the usercache_DEL_timestamp directory is not deleted properly. The reason is that FileContext.listStatus(userDirPath) does not return null when userDirPath is a valid empty directory. That's why empty usercache_DEL_timestamp directories are not cleaned up properly. In particular, when the DN/NM is flapping, a large number of empty usercache_DEL_timestamp directories are generated, which takes up space and slows down the NM startup process. NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
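A minimal sketch of the cleanup path described above, assuming the rename-then-delete flow introduced by YARN-661 (the paths and helper calls here are illustrative, not the attached patch):
{code}
// Rename usercache aside, then delete the renamed directory recursively so that
// an empty usercache_DEL_<timestamp> is not left behind.
FileContext lfs = FileContext.getLocalFSFileContext();
Path usercache = new Path(localDir, "usercache");
Path renamed = new Path(localDir, "usercache_DEL_" + System.currentTimeMillis());
lfs.rename(usercache, renamed);
// A recursive delete removes the contents and the directory itself, even when
// listStatus would have returned an empty listing.
lfs.delete(renamed, true);
{code}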
[jira] [Created] (YARN-2757) potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.
zhihai xu created YARN-2757: --- Summary: potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. Key: YARN-2757 URL: https://issues.apache.org/jira/browse/YARN-2757 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. Since we check whether nodeLabels is null in {code} if (!str.trim().isEmpty() && (nodeLabels == null || !nodeLabels.contains(str.trim()))) { return false; } {code} we should also check whether nodeLabels is null in {code} if (!nodeLabels.isEmpty()) { return false; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2757) potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.
[ https://issues.apache.org/jira/browse/YARN-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2757: Issue Type: Bug (was: Improvement) potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. --- Key: YARN-2757 URL: https://issues.apache.org/jira/browse/YARN-2757 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. Since we check whether nodeLabels is null in {code} if (!str.trim().isEmpty() && (nodeLabels == null || !nodeLabels.contains(str.trim()))) { return false; } {code} we should also check whether nodeLabels is null in {code} if (!nodeLabels.isEmpty()) { return false; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
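One plausible shape of the additional guard requested in YARN-2757, assuming a null nodeLabels set should be treated the same as an empty one (a sketch, not the eventual patch):
{code}
// Treat a null label set as "no labels" so the empty-expression branch does not
// dereference null.
if (nodeLabels != null && !nodeLabels.isEmpty()) {
  return false;
}
{code}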
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186115#comment-14186115 ] Hadoop QA commented on YARN-2755: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677456/YARN-2755.v1.patch against trunk revision 00b4e44. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5590//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5590//console This message is automatically generated. NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2698: - Target Version/s: 2.6.0 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Mayank Bansal Priority: Critical YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2698: - Priority: Critical (was: Major) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Mayank Bansal Priority: Critical YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
Zhijie Shen created YARN-2758: - Summary: Update TestApplicationHistoryClientService to use the new generic history store Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. Hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186125#comment-14186125 ] Hadoop QA commented on YARN-2753: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677455/YARN-2753.001.patch against trunk revision 00b4e44. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5591//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5591//console This message is automatically generated. potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. It because when a Node is created, Node.labels can be null. In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186126#comment-14186126 ] sidharta seethana commented on YARN-1964: - Hi Abin, I applied the patch to a recent snapshot of 2.6 and tested it out. The patch needs to be re-based to fix some minor compilation issues (the changes needed are small). Also, the image mentioned in the example ( altiscale/hadoop-docker ) appears to be large and takes a long time to pull - what is the reason for this? Is there a way we can bring down the size of the image? thanks, -Sid Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186139#comment-14186139 ] Hadoop QA commented on YARN-2749: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677462/YARN-2749.1.patch against trunk revision 00b4e44. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5592//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5592//console This message is automatically generated. Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2749.1.patch Some testcases from TestLogAggregationService fails in trunk. Those can be reproduced in centos Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2759) addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource is reset.
zhihai xu created YARN-2759: --- Summary: addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource is reset. Key: YARN-2759 URL: https://issues.apache.org/jira/browse/YARN-2759 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource is reset. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2759) addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource is reset.
[ https://issues.apache.org/jira/browse/YARN-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2759: Attachment: YARN-2759.000.patch addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource is reset. - Key: YARN-2759 URL: https://issues.apache.org/jira/browse/YARN-2759 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2759.000.patch addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource is reset. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
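The behaviour proposed in YARN-2759 amounts to a put-if-absent on labelCollections, roughly as sketched below; the value type (NodeLabel) and its constructor are assumptions used only for illustration:
{code}
// Only create an entry for labels that are not already known, so an existing
// label keeps its accumulated resource instead of being reset.
for (String label : labels) {
  if (!labelCollections.containsKey(label)) {
    labelCollections.put(label, new NodeLabel(label));
  }
}
{code}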