[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277500#comment-14277500 ] Xuan Gong commented on YARN-2807: - Committed to trunk/branch-2. Thanks, Masatake Iwasaki Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.7.0 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But --forceactive does not work as expected. When transitioning the RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled, even with --forceactive. The option that does work is {{--forcemanual}}, but no place in the usage describes this option. I think we should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
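Per the error message quoted in the description, the flag that actually overrides the automatic-failover guard is {{--forcemanual}}. A minimal sketch of the invocation that should succeed, assuming the same target service id (rm2) as in the report:
{code}
# --forceactive is rejected while automatic failover is on; --forcemanual is the
# override the error message asks for. Use with care: it bypasses the check that
# normally prevents manual HA management (and possible split-brain).
yarn rmadmin -transitionToActive --forcemanual rm2
{code}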
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277501#comment-14277501 ] Sangjin Lee commented on YARN-2928: --- I agree with Karthik's comments. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3062: -- Affects Version/s: 2.4.0 2.5.0 2.6.0 timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277452#comment-14277452 ] Xuan Gong commented on YARN-2807: - +1 LGTM. Will commit it. Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277491#comment-14277491 ] Robert Kanter commented on YARN-2928: - +1 to starting with a clean slate. If we need to use something from the original ATS, it's pretty easy to import or copy/paste it. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277508#comment-14277508 ] Sangjin Lee commented on YARN-2928: --- bq. In the following, depending on how the writer is implemented, we may want to preserve the outstanding timeline data that is received by ATS companion but is still not be persisted into the storage backend. IAC, it seem to be the common requirement no matter it's per-node (e.g., restarting) or per-app (e.g., crashing). The point taken about the need to recover the in-memory data in either approach. I am fine with starting with the per-node companion approach with the understanding that at some point we need to have at least an option of the per-app companion. We can reword YARN-3033 to do that. What do others think? [~jrottinghuis]? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277693#comment-14277693 ] Hitesh Shah commented on YARN-3062: --- bq. Do you keep primaryFilters field set to the same filters when the entity is created? I understand it may be counter intuitive, but to post again to the existing entity and make all the records indexed by primaryFilters be updated to, you need to make sure primaryFilters field is properly set. bq. It doesn't update, but append. Say you have primaryFilter key1:value1. Then you update key1:value2. Finally you will get key1:[value1, value2]. Where is all of the above documented? timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277478#comment-14277478 ] Jian He commented on YARN-3019: --- thanks for committing, Junping ! Make work-preserving-recovery the default mechanism for RM recovery --- Key: YARN-3019 URL: https://issues.apache.org/jira/browse/YARN-3019 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-3019.1.patch The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled to true by default to flip recovery mode to work-preserving recovery from non-work-preserving recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
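For clusters on releases where this is not yet the default behavior, a minimal yarn-site.xml sketch that opts in explicitly (the property name is taken from the description above; RM recovery as a whole is still gated by yarn.resourcemanager.recovery.enabled):
{code}
<property>
  <!-- Flip RM recovery from non-work-preserving to work-preserving mode. -->
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>
{code}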
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277567#comment-14277567 ] Zhijie Shen commented on YARN-2928: --- bq. My vote is to start from a clean slate with a new source project Hm... It makes sense to me. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277483#comment-14277483 ] Hudson commented on YARN-2807: -- FAILURE: Integrated in Hadoop-trunk-Commit #6859 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6859/]) YARN-2807. Option --forceactive not works as described in usage of (xgong: rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277603#comment-14277603 ] Zhijie Shen commented on YARN-3062: --- bq. if I have 2 primaryFilters while creating the entity, any update to entity has to use the TimelineEntity.addPrimaryFilter for both primaryFilters? Yes bq. can a new primaryfilter be added in an update You can, but when you filter with 3rd primaryfilter, you will miss the information that is posted before. It's a limitation of primaryfilter. The recommended way to use primaryfilter is to come up with all filters you may want to use. bq. can the primaryFilter value be changed in an update? It doesn't update, but append. Say you have primaryFilter key1:value1. Then you update key1:value2. Finally you will get key1:[value1, value2]. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
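To make the update semantics described above concrete, a minimal sketch against the ATS v1 Java client, using the entity and filter names from this report (error handling and service lifecycle details omitted). The point is that an update must carry the same primary filters as the original post; otherwise only the un-filtered record picks up the new otherinfo value:
{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineUpdateSketch {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();

    TimelineEntity update = new TimelineEntity();
    update.setEntityType("TEZ_VERTEX_ID");
    update.setEntityId("vertex_1421164610335_0020_1_01");
    // Re-attach every primary filter the entity was originally posted with;
    // otherwise the filter-indexed copy keeps the stale otherinfo value (1009).
    update.addPrimaryFilter("TEZ_DAG_ID", "dag_1421164610335_0020_1");
    update.addOtherInfo("numTasks", 253);
    client.putEntities(update);

    client.stop();
  }
}
{code}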
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277573#comment-14277573 ] Prakash Ramachandran commented on YARN-3062: thanks for the clarification.[~zjshen] could be the reason, will check and update. so to clarify * if I have 2 primaryFilters while creating the entity, any update to entity has to use the TimelineEntity.addPrimaryFilter for both primaryFilters? * can a new primaryfilter be added in an update. (2 primaryfilters at entity creation time, add a new filter in update to make 3 filters, every subsequent update sets all 3 filters) * can the primaryFilter value be changed in an update? timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277516#comment-14277516 ] Hadoop QA commented on YARN-2933: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692282/YARN-2933-8.patch against trunk revision d336d13. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6332//console This message is automatically generated. Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277468#comment-14277468 ] Zhijie Shen commented on YARN-3062: --- [~pramachandran], how did you update the entity? Do you keep *primaryFilters* field set to the same filters when the entity is created? I understand it may be counter intuitive, but to post again to the existing entity and make all the records indexed by *primaryFilters* be updated to, you need to make sure *primaryFilters* field is properly set. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277597#comment-14277597 ] Zhijie Shen commented on YARN-2928: --- One correlated issue I want to raise here is the aggregated log service. Currently, only the JHS serves the aggregated logs for completed MR jobs and controls the log file retention. It doesn't cover the other workloads on YARN, nor the long-running services that never end. We thought of making ATS the hub to serve aggregated logs before, but didn't achieve it in the time frame of ATS current gen. Therefore, though the aggregated log service is not part of the major goal of ATS next gen - scalability - I hope we take it into account in the future when designing the reader and GUI. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3026: - Attachment: YARN-3026.2.patch Rebased against trunk and fixed findbugs warning. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3026.1.patch, YARN-3026.2.patch Have a discussion with [~vinodkv] and [~jianhe]: In existing Capacity Scheduler, all allocation logics of and under LeafQueue are located in LeafQueue.java in implementation. To make a cleaner scope of LeafQueue, we'd better move some of them to FiCaSchedulerApp. Ideal scope of LeafQueue should be: when a LeafQueue receives some resources from ParentQueue (like 15% of cluster resource), and it distributes resources to children apps, and it should be agnostic to internal logic of children apps (like delayed-scheduling, etc.). IAW, LeafQueue shouldn't decide how application allocating container from given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1422#comment-1422 ] Zhijie Shen edited comment on YARN-2928 at 1/14/15 10:23 PM: - how about timelineserver or yarntimelineserver because it doesn't limit to application data only, and the other sub modules all end in "er". bq. Also, any volunteers for creating the project skeleton? I can help on creating the new module. was (Author: zjshen): how about timelineservice or yarntimelineservice because it doesn't limit to application data only. bq. Also, any volunteers for creating the project skeleton? I can help on creating the new module. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2217: --- Attachment: YARN-2217-trunk-v8.patch [~kasha] Attached is V8. I am not really sure what the issue is. It was complaining about a NoSuchMethodException so that seems like a classpath issue. I am not sure how to see what classpath the jenkins job used on the jenkins slave. For now I have changed the syntax of the expected exception in the error test cases to use the annotation based syntax. Let's see if the build likes that better. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
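For readers unfamiliar with the phrasing, the "annotation based syntax" mentioned above refers to JUnit 4's declarative expected-exception form; a generic sketch follows (the class, test name, and thrown exception below are placeholders, not the actual YARN-2217 test code):
{code}
import java.io.IOException;
import org.junit.Test;

public class ExpectedExceptionSyntaxSketch {
  // Instead of wrapping the call in try/catch and calling fail(), declare the
  // expected exception type on the annotation and let JUnit enforce it.
  @Test(expected = IOException.class)
  public void testCallThatShouldFail() throws Exception {
    throw new IOException("placeholder for the client call under test");
  }
}
{code}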
[jira] [Assigned] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-2928: - Assignee: Vinod Kumar Vavilapalli (was: Sangjin Lee) Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3063: -- Attachment: YARN-3063.1.patch Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277879#comment-14277879 ] Jason Lowe commented on YARN-3055: -- Agree with Jian. I believe the concern being raised here is not specific to the changes in YARN-2704 and YARN-2964 but a long-standing issue with YARN's handling of delegation tokens shared between jobs. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-3055.001.patch, YARN-3055.002.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
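One way to picture the bookkeeping the description calls for (purely illustrative, not the actual DelegationTokenRenewer code): track which applications still reference each shared token and only cancel the renewal timer once that set is empty, so the app1/app2/app3 scenario above keeps token1 alive until all three finish:
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SharedTokenRefCountSketch {
  // token identifier -> applications still using it (illustrative types)
  private final Map<String, Set<String>> appsPerToken = new HashMap<String, Set<String>>();

  public synchronized void register(String token, String appId) {
    Set<String> apps = appsPerToken.get(token);
    if (apps == null) {
      apps = new HashSet<String>();
      appsPerToken.put(token, apps);
    }
    apps.add(appId);
  }

  // Returns true only when no running application references the token any more,
  // i.e. only then is it safe to cancel the renewal timer and drop it from allTokens.
  public synchronized boolean unregister(String token, String appId) {
    Set<String> apps = appsPerToken.get(token);
    if (apps == null) {
      return true;
    }
    apps.remove(appId);
    if (apps.isEmpty()) {
      appsPerToken.remove(token);
      return true;
    }
    return false;
  }
}
{code}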
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277969#comment-14277969 ] Swapnil Daingade commented on YARN-2139: Had a look at the latest design doc and was wondering if it would be possible to make the isolation part separate and optional from the avoiding over-allocation part. Enforcing isolation using Cgroups may not always work, especially in cases where HDFS is not the default dfs. [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Isolation_Scheduling_3.pdf, Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.
[ https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277833#comment-14277833 ] Jian He commented on YARN-2861: --- looks good, +1 Timeline DT secret manager should not reuse the RM's configs. - Key: YARN-2861 URL: https://issues.apache.org/jira/browse/YARN-2861 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2861.1.patch, YARN-2861.2.patch This is the configs for RM DT secret manager. We should create separate ones for timeline DT only. {code} @Override protected void serviceInit(Configuration conf) throws Exception { long secretKeyInterval = conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY, YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT); long tokenMaxLifetime = conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY, YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT); long tokenRenewInterval = conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY, YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT); secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval, tokenMaxLifetime, tokenRenewInterval, 360); secretManager.startThreads(); serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig()); super.init(conf); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
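A rough sketch of the separation being asked for: read timeline-scoped keys with their own defaults instead of the RM delegation-token keys (the key names and default values below are illustrative placeholders, not necessarily what the attached patches introduce):
{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineDtConfigSketch {
  // Illustrative only: timeline-specific keys instead of the RM DT keys above.
  static long[] readTimelineDtIntervals(Configuration conf) {
    long secretKeyInterval = conf.getLong(
        "yarn.timeline-service.delegation.key-update-interval", 24 * 60 * 60 * 1000L);
    long tokenMaxLifetime = conf.getLong(
        "yarn.timeline-service.delegation.token.max-lifetime", 7 * 24 * 60 * 60 * 1000L);
    long tokenRenewInterval = conf.getLong(
        "yarn.timeline-service.delegation.token.renew-interval", 24 * 60 * 60 * 1000L);
    // The secret manager would then be constructed from these values exactly as
    // in the serviceInit snippet quoted in the description.
    return new long[] {secretKeyInterval, tokenMaxLifetime, tokenRenewInterval};
  }
}
{code}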
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277940#comment-14277940 ] Hadoop QA commented on YARN-3063: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692368/YARN-3063.1.patch against trunk revision 6464a89. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineserver. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6335//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6335//console This message is automatically generated. Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278056#comment-14278056 ] Hadoop QA commented on YARN-3063: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692396/YARN-3063.2.patch against trunk revision 6464a89. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6336//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6336//console This message is automatically generated. Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch, YARN-3063.2.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277888#comment-14277888 ] Robert Kanter commented on YARN-2928: - For the aggregated log service, I'm guessing we're talking about the viewer only and not the aggregator service itself (which runs in the NodeManagers)? The viewer is just one of those HTML Block Java files. Anyway, I agree that the Timeline Service reader should service log files; we can reuse that code. On a related note, the JHS currently has a service that deletes aggregated log files after some amount of time. If we eventually get rid of the JHS, we'll have to move this somewhere else. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277889#comment-14277889 ] Sangjin Lee commented on YARN-3063: --- Thanks for the prompt patch [~zjshen]! Some quick comments: - Vinod suggested timeline *service* as opposed to timeline server; perhaps we should change timeline server in names to timeline service? -- let's also change the package name to org.apache.hadoop.yarn.server.timelineservice - how about adding applicationhistoryservice as a dependency? the idea is to depend on it and start using it soon Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278003#comment-14278003 ] Zhijie Shen commented on YARN-3063: --- [~sjlee0] Thanks for review! bq. Vinod suggested timeline service as opposed to timeline server; perhaps we should change timeline server in names to timeline service? I noticed that. It makes sense to me. Updated the name accordingly. bq. how about adding applicationhistoryservice as a dependency? the idea is to depend on it and start using it soon Sounds good. Added applicationhistoryservice as a dependency. Later on, we can add more dependency on demand. In addition, I fixed some project dependency issues and change the default class to TimelineAggregator. Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch, YARN-3063.2.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277761#comment-14277761 ] Sangjin Lee commented on YARN-2928: --- Let's settle on the name of the project. My vote was applicationtimelineservice, but I'm open to suggestions. :) And we can change our minds about this of course. Also, any volunteers for creating the project skeleton? I'm not sure if I have commit privileges for the branch yet, but if so I can do it as well. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1427#comment-1427 ] Zhijie Shen commented on YARN-3062: --- Unfortunately, we're still lacking the documentation. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277868#comment-14277868 ] Vinod Kumar Vavilapalli commented on YARN-2928: --- I've been calling it timelineservice everywhere I go - the word server has a specific connotation. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277872#comment-14277872 ] Sangjin Lee commented on YARN-2928: --- +1 with that. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bc Wong updated YARN-2194: -- Description:In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. (was: In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this.) Add Cgroup support for RedHat 7 --- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2194-1.patch In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277880#comment-14277880 ] Vinod Kumar Vavilapalli commented on YARN-2928: --- [~sjlee0] bq. (1) While it may be faster to allocate with the per-node companions, capacity-wise you would end up spending more capacity with the per-node approach. Since these per-node companions are always up although they may be idle for large amount of time. So if capacity is a concern you may lose out. Under what circumstances would per-node companions be more advantageous in terms of capacity? Agreed, we will have to carve out some capacity for the per-node companions. I see some sort of static allocation like 1GB similar to NodeManager. I've never seen anyone change the NM capacity as it usually simply forgets things or persists state to local store. The per-node agent can also take the same approach - a limited heap, and forget or spill over to the Timeline Storage (e.g. HBase). Only when we want to utilize some memory for short term aggregations, capacity will be a concern. The other point is that we anyways have to carve out this capacity for things like YARN-2965. bq. (2) I do have a question about the work-preserving aspect of the per-node ATS companion. One implication of making this a per-node thing (i.e. long-running) is that we need to handle the work-preserving restart. What if we need to restart the ATS companion? Since other YARN daemons (RM and NM) allow for work-preserving restarts, we cannot have the ATS companion break that. So that seems to be a requirement? Yes, recoverability is a requirement for ALA. I'd design it such that it is the responsibility of each app's aggregator (living inside the node agent) instead of of the node-agent itself. bq. (3) We still need to handle the lifecycle management aspects of it. Previously we said that when RM allocates an AM it would tell the NM so the NM could spawn the special container. With the per-node approach, the RM would still need to tell the NM so that the NM can talk to the per-node ATS companion to initialize the data structure for the given app. Yes again. That doesn't change. And it would exactly work the way you said - at no place in the system will it be assumed that the aggregator is running per node - except for the final 'launcher' who launches the aggregator. bq. These are quick observations. While I do see value in the per-node approach, it's not totally clear how much work it would save over the per-app approach given these observations. What do you think? Like I mentioned, it won't save anything. It does two things in my mind (1) Let us focus on the wire up first without thinking about scheduling aspects in RM and (2) Let's us figure out other parallel efforts like YARN-1012, YARN-2965, YARN-2984, YARN-2141 can be unified in terms of per-node stats collection. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277913#comment-14277913 ] Sangjin Lee commented on YARN-2928: --- I agree with the aggregated log viewer being part of ATS. Probably not for the first phase, but eventually. I'll update the doc accordingly. Speaking of which, I think we may want to standardize the names of the components, especially the one we launch to write data. We've used several names to refer to the same thing, and it'd be good if we just settle on one name so there is no confusion. May sound like a small thing, but it'd help discussing things rapidly. We used: ATS writer, ATS writer companion, aggregator, and ALA. I'm not married to any names here. How about Timeline aggregator? I'm open to suggestions. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277786#comment-14277786 ] Wangda Tan commented on YARN-2933: -- In addition to Jian's comment, it's better to use enum type instead of int in mockContainer, which can avoid call getValue() from enum. Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277808#comment-14277808 ] Sangjin Lee commented on YARN-2928: --- timelineserver sounds like a good name to me. Thanks! Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277873#comment-14277873 ] Vinod Kumar Vavilapalli commented on YARN-2928: --- [~zjshen], bq. One correlated issue I want to raise here is aggregated log service. [..] Therefore, though aggregated log service is [not] the part of the major goal of ATS next gen - scalability, I hope we'd better take into account in the future, when designing the reader and GUI. +1, we need a home for log viewer service and we were veering towards the Timeline Service itself. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3063: -- Target Version/s: YARN-2928 Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3062: -- Target Version/s: (was: YARN-2928) timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3062: -- Target Version/s: YARN-2928 timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
[ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277979#comment-14277979 ] Swapnil Daingade commented on YARN-2791: Hi [~kasha] and [~vinodkv], Went over the latest design doc for YARN-2139 and posted my comments. Add Disk as a resource for scheduling - Key: YARN-2791 URL: https://issues.apache.org/jira/browse/YARN-2791 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.5.1 Reporter: Swapnil Daingade Assignee: Yuliya Feldman Attachments: DiskDriveAsResourceInYARN.pdf Currently, the number of disks present on a node is not considered a factor while scheduling containers on that node. Having a large amount of memory on a node can lead to a high number of containers being launched on that node, all of which compete for I/O bandwidth. This multiplexing of I/O across containers can lead to slower overall progress and sub-optimal resource utilization as containers starved for I/O bandwidth hold on to other resources like cpu and memory. This problem can be solved by considering disk as a resource and including it in deciding how many containers can be concurrently run on a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277977#comment-14277977 ] Zhijie Shen commented on YARN-2928: --- +1 for Timeline Aggregator Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277784#comment-14277784 ] Jian He commented on YARN-2933: --- - Looks good overall; should we use priority.AMCONTAINER here?
{code}
if (setAMContainer && i == 0) {
  cLive.add(mockContainer(appAttId, cAlloc, unit, priority.CONTAINER.getValue()));
  if (priority.CONTAINER.getValue() == cpriority) {
    when(mC.isAMContainer()).thenReturn(true);
  }
{code}
Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting to support preemption respecting node labels, but we have some gaps in the code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now, the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, it is just a short-term enhancement; YARN-2498 will consider preemption respecting node-labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
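As a side note on the two review comments above, here is a minimal, self-contained sketch of what passing the priority enum itself (rather than its int value) into the mock helper could look like. The enum values and the mockContainer signature below are illustrative assumptions, not the actual test code from the YARN-2933 patches.
{code}
// Illustrative sketch only; names are hypothetical, not the real test helper.
public class MockContainerEnumSketch {
  enum ContainerPriority { AMCONTAINER, CONTAINER }

  static final class FakeContainer {
    final ContainerPriority priority;
    FakeContainer(ContainerPriority priority) { this.priority = priority; }
    boolean isAMContainer() { return priority == ContainerPriority.AMCONTAINER; }
  }

  // Taking the enum directly removes the need for priority.X.getValue() at call sites.
  static FakeContainer mockContainer(ContainerPriority priority) {
    return new FakeContainer(priority);
  }

  public static void main(String[] args) {
    FakeContainer am = mockContainer(ContainerPriority.AMCONTAINER);
    System.out.println("isAMContainer = " + am.isAMContainer()); // prints true
  }
}
{code}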
[jira] [Created] (YARN-3063) Bootstrap TimelineServer Next Gen Module
Zhijie Shen created YARN-3063: - Summary: Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277951#comment-14277951 ] Jian He commented on YARN-1055: --- With work-preserving RM restart, the max-attempts config is not required any more. We may close this out? Handle app recovery differently for AM failures and RM restart -- Key: YARN-1055 URL: https://issues.apache.org/jira/browse/YARN-1055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Ideally, we would like to tolerate container, AM, and RM failures. App recovery for AM and RM currently relies on the max-attempts config; tolerating AM failures requires it to be > 1 and tolerating RM failure/restart requires it to be >= 1. We should handle these two differently, with two separate configs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.
[ https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277962#comment-14277962 ] Hadoop QA commented on YARN-2861: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692169/YARN-2861.2.patch against trunk revision 6464a89. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6334//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6334//console This message is automatically generated. Timeline DT secret manager should not reuse the RM's configs. - Key: YARN-2861 URL: https://issues.apache.org/jira/browse/YARN-2861 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2861.1.patch, YARN-2861.2.patch These are the configs for the RM DT secret manager. We should create separate ones for timeline DT only.
{code}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  long secretKeyInterval =
      conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY,
          YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT);
  long tokenMaxLifetime =
      conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY,
          YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT);
  long tokenRenewInterval =
      conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
          YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
  secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval,
      tokenMaxLifetime, tokenRenewInterval, 360);
  secretManager.startThreads();
  serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig());
  super.init(conf);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
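For illustration only, a minimal sketch of the direction the YARN-2861 description suggests: reading dedicated timeline delegation-token settings instead of the RM delegation-token keys quoted above. The property names and defaults below are hypothetical placeholders, not the keys introduced by the actual patch.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: hypothetical timeline-specific keys, read separately from the
// yarn.resourcemanager delegation-token settings shown in the description.
public class TimelineDtConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    long keyUpdateInterval = conf.getLong(
        "yarn.timeline-service.delegation.key-update-interval-ms",
        24L * 60 * 60 * 1000);          // assumed 1-day default
    long tokenMaxLifetime = conf.getLong(
        "yarn.timeline-service.delegation.token.max-lifetime-ms",
        7L * 24 * 60 * 60 * 1000);      // assumed 7-day default
    long tokenRenewInterval = conf.getLong(
        "yarn.timeline-service.delegation.token.renew-interval-ms",
        24L * 60 * 60 * 1000);          // assumed 1-day default
    System.out.println(keyUpdateInterval + " / " + tokenMaxLifetime
        + " / " + tokenRenewInterval);
  }
}
{code}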
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1422#comment-1422 ] Zhijie Shen commented on YARN-2928: --- How about timelineservice or yarntimelineservice, since it isn't limited to application data only? bq. Also, any volunteers for creating the project skeleton? I can help with creating the new module. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277854#comment-14277854 ] Hadoop QA commented on YARN-3026: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692338/YARN-3026.2.patch against trunk revision 7fe0f25. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6333//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6333//console This message is automatically generated. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3026.1.patch, YARN-3026.2.patch Have a discussion with [~vinodkv] and [~jianhe]: In existing Capacity Scheduler, all allocation logics of and under LeafQueue are located in LeafQueue.java in implementation. To make a cleaner scope of LeafQueue, we'd better move some of them to FiCaSchedulerApp. Ideal scope of LeafQueue should be: when a LeafQueue receives some resources from ParentQueue (like 15% of cluster resource), and it distributes resources to children apps, and it should be agnostic to internal logic of children apps (like delayed-scheduling, etc.). IAW, LeafQueue shouldn't decide how application allocating container from given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
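To make the intent in the YARN-3026 description above concrete, here is a small, self-contained sketch of the separation of responsibilities it describes: the queue offers resources and stays agnostic to how each application turns the offer into containers. The class and method names are illustrative assumptions, not the actual CapacityScheduler code.
{code}
// Illustrative sketch only; not LeafQueue/FiCaSchedulerApp from the patch.
public class QueueAppSeparationSketch {
  interface SchedulableApp {
    // App-internal policies (e.g. delay scheduling) live behind this call.
    int assignContainers(int offeredMb);
  }

  static int distribute(int queueShareMb, Iterable<SchedulableApp> apps) {
    int assigned = 0;
    for (SchedulableApp app : apps) {
      // The queue only tracks how much of its share remains; it does not
      // decide how a given application allocates containers from it.
      assigned += app.assignContainers(queueShareMb - assigned);
    }
    return assigned;
  }

  public static void main(String[] args) {
    SchedulableApp app1 = offered -> Math.min(offered, 1024); // takes at most 1 GB
    SchedulableApp app2 = offered -> Math.min(offered, 2048); // takes at most 2 GB
    System.out.println("assigned MB = "
        + distribute(4096, java.util.Arrays.asList(app1, app2))); // 3072
  }
}
{code}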
[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277966#comment-14277966 ] Robert Kanter commented on YARN-2797: - +1 TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase Key: YARN-2797 URL: https://issues.apache.org/jira/browse/YARN-2797 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Attachments: yarn-2797-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3065) TestNodeManagerResync errors
Karthik Kambatla created YARN-3065: -- Summary: TestNodeManagerResync errors Key: YARN-3065 URL: https://issues.apache.org/jira/browse/YARN-3065 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, test Affects Versions: 2.6.0 Reporter: Karthik Kambatla TestNodeManagerResync started failing recently, mostly due to a test timeout. See attachment for a sample test output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3065) TestNodeManagerResync errors
[ https://issues.apache.org/jira/browse/YARN-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3065: --- Attachment: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.txt TestNodeManagerResync errors Key: YARN-3065 URL: https://issues.apache.org/jira/browse/YARN-3065 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Attachments: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.txt TestNodeManagerResync started failing recently, mostly due to a test timeout. See attachment for a sample test output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278072#comment-14278072 ] Hadoop QA commented on YARN-2217: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692399/YARN-2217-trunk-v8.patch against trunk revision 6464a89. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.cli.TestRMAdminCLI Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6337//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6337//console This message is automatically generated. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3064) TestRMRestart/TestContainerResourceUsage failure with allocation timeout in trunk
Wangda Tan created YARN-3064: Summary: TestRMRestart/TestContainerResourceUsage failure with allocation timeout in trunk Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Priority: Critical Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
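For readers unfamiliar with the failure mode in the log above, the assertion comes from a wait-and-check helper that times out if the attempt never reaches the expected state. A generic sketch of that polling pattern is shown below; it is an illustration, not the MockRM implementation.
{code}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Generic illustration of a waitForState-style helper: poll until the observed
// state matches the expected one, or give up after a timeout (which is what
// produces the "(timedout)" assertion message quoted above).
public class WaitForStateSketch {
  static boolean waitForState(Supplier<String> currentState, String expected,
                              long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (expected.equals(currentState.get())) {
        return true;
      }
      TimeUnit.MILLISECONDS.sleep(100);
    }
    return false;
  }

  public static void main(String[] args) throws InterruptedException {
    // A state supplier that never leaves SCHEDULED reproduces the timeout case.
    System.out.println(waitForState(() -> "SCHEDULED", "ALLOCATED", 1000)); // false
  }
}
{code}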
[jira] [Updated] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3064: -- Summary: TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk (was: TestRMRestart/TestContainerResourceUsage failure with allocation timeout in trunk) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278276#comment-14278276 ] Jian He commented on YARN-3064: --- Caused by YARN-3019. Some tests are still based on the non-work-preserving recovery mechanism. Uploaded a patch to fix the tests. TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278287#comment-14278287 ] Karthik Kambatla commented on YARN-2217: v8 patch looked mostly good, but for one nit: SharedCacheClientImpl#stopClientProxy checks for scmClient not null but doesn't set it to null. Posted v9 that fixes it. I am +1 on the v9 patch. Will commit it if Jenkins doesn't complain of anything. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
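The nit in the comment above is a common stop/cleanup pattern; a generic sketch is shown below. The field and method names mirror the ones mentioned in the comment, but the class body is illustrative only, not the actual SharedCacheClientImpl code.
{code}
// Illustrative sketch only: after stopping a proxy, also clear the reference so
// a repeated stop (or a later restart) behaves predictably.
public class StopProxySketch {
  private Object scmClient; // stands in for the real protocol proxy

  void stopClientProxy() {
    if (scmClient != null) {
      // In the real client this is where the RPC proxy would be stopped.
      scmClient = null; // the step the review comment says was missing
    }
  }
}
{code}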
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278073#comment-14278073 ] Wangda Tan commented on YARN-3026: -- Test failures should not relate to this patch, filed YARN-3064 to track it. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3026.1.patch, YARN-3026.2.patch Have a discussion with [~vinodkv] and [~jianhe]: In existing Capacity Scheduler, all allocation logics of and under LeafQueue are located in LeafQueue.java in implementation. To make a cleaner scope of LeafQueue, we'd better move some of them to FiCaSchedulerApp. Ideal scope of LeafQueue should be: when a LeafQueue receives some resources from ParentQueue (like 15% of cluster resource), and it distributes resources to children apps, and it should be agnostic to internal logic of children apps (like delayed-scheduling, etc.). IAW, LeafQueue shouldn't decide how application allocating container from given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3064) TestRMRestart/TestContainerResourceUsage failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-3064: - Assignee: Jian He TestRMRestart/TestContainerResourceUsage failure with allocation timeout in trunk - Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage
[ https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278251#comment-14278251 ] Karthik Kambatla commented on YARN-2984: The test failure is unrelated. It fails on trunk as well, filed YARN-3065 to fix it. Metrics for container's actual memory usage --- Key: YARN-2984 URL: https://issues.apache.org/jira/browse/YARN-2984 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2984-1.patch, yarn-2984-2.patch, yarn-2984-3.patch, yarn-2984-prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track memory usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
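As background for the per-container memory tracking described above, one simple (Linux-only) way to sample a process's resident memory is to read VmRSS from /proc; a minimal sketch follows. This is an illustration of the general idea, not the NodeManager's implementation.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustration only: reads the resident set size of the current JVM process
// from /proc/self/status (Linux-specific).
public class RssSampleSketch {
  public static void main(String[] args) throws IOException {
    for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
      if (line.startsWith("VmRSS:")) {
        System.out.println("resident set size = "
            + line.substring("VmRSS:".length()).trim());
      }
    }
  }
}
{code}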
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278273#comment-14278273 ] Zhijie Shen commented on YARN-3063: --- [~sjlee0], thanks for review. Will commit it to branch YARN-2928. Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch, YARN-3063.2.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3064: -- Attachment: YARN-3064.1.patch TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278118#comment-14278118 ] Sangjin Lee commented on YARN-3063: --- Thanks [~zjshen]! LGTM. Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch, YARN-3063.2.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2217: --- Attachment: YARN-2217-trunk-v9.patch Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2990) FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests
[ https://issues.apache.org/jira/browse/YARN-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278300#comment-14278300 ] Karthik Kambatla commented on YARN-2990: Looked more closely; unfortunately, this appears to be by design. Each application has an allowed locality level - initially node-local - which transitions to rack-local and off-switch after the corresponding delays. Instead, it might be better to track this allowed locality level per {{ResourceRequest}}. I propose: # In the short-term, to address the case where the AM has to go through the node-local and rack-local delays, we could start with a default locality level of off-switch and reset it to node-local after the AM is allocated. # In the long-term, let us augment ResourceRequest to include the allowed locality level. Thoughts? FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests --- Key: YARN-2990 URL: https://issues.apache.org/jira/browse/YARN-2990 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2990-test.patch Looking at the FairScheduler, it appears the node/rack locality delays are used for all requests, even those that are only off-rack. More details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
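To illustrate the "allowed locality level" mechanism discussed above, here is a simplified, self-contained model of how a delay-scheduling level relaxes from node-local to rack-local to off-switch as the configured delays expire. It is a sketch of the concept only, not the FairScheduler's actual per-application logic.
{code}
// Simplified model of delay scheduling's allowed locality level; illustrative only.
public class AllowedLocalitySketch {
  enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

  static Locality allowedLevel(long msSinceLastAssignment,
                               long nodeLocalDelayMs, long rackLocalDelayMs) {
    if (msSinceLastAssignment < nodeLocalDelayMs) {
      return Locality.NODE_LOCAL;
    }
    if (msSinceLastAssignment < nodeLocalDelayMs + rackLocalDelayMs) {
      return Locality.RACK_LOCAL;
    }
    return Locality.OFF_SWITCH;
  }

  public static void main(String[] args) {
    // With 5s node-local and 5s rack-local delays, an off-rack-only request
    // still waits ~10s before it is allowed to go off-switch.
    System.out.println(allowedLevel(7_000, 5_000, 5_000));  // RACK_LOCAL
    System.out.println(allowedLevel(12_000, 5_000, 5_000)); // OFF_SWITCH
  }
}
{code}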
[jira] [Commented] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support
[ https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276967#comment-14276967 ] Hadoop QA commented on YARN-2896: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691683/0003-YARN-2896.patch against trunk revision d336d13. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6331//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6331//console This message is automatically generated. Server side PB changes for Priority Label Manager and Admin CLI support --- Key: YARN-2896 URL: https://issues.apache.org/jira/browse/YARN-2896 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch, 0003-YARN-2896.patch Common changes: * PB support changes required for Admin APIs * PB support for File System store (Priority Label Store) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276978#comment-14276978 ] Hudson commented on YARN-3019: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2005 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2005/]) YARN-3019. Make work-preserving-recovery the default mechanism for RM recovery. (Contributed by Jian He) (junping_du: rev f92e5038000a012229c304bc6e5281411eff2883) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt Make work-preserving-recovery the default mechanism for RM recovery --- Key: YARN-3019 URL: https://issues.apache.org/jira/browse/YARN-3019 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-3019.1.patch The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled to true by default to flip recovery mode to work-preserving recovery from non-work-preserving recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
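For reference, a minimal sketch of reading the flag this change flips; the property name is taken from the description above, and after YARN-3019 the shipped default in yarn-default.xml is true. The surrounding class is illustrative only.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: read the work-preserving recovery flag through the Hadoop
// Configuration API, with true as the post-YARN-3019 default.
public class WorkPreservingRecoveryFlagSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    boolean workPreserving = conf.getBoolean(
        "yarn.resourcemanager.work-preserving-recovery.enabled", true);
    System.out.println("work-preserving recovery enabled = " + workPreserving);
  }
}
{code}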
[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277057#comment-14277057 ] Hudson commented on YARN-3019: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2024 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2024/]) YARN-3019. Make work-preserving-recovery the default mechanism for RM recovery. (Contributed by Jian He) (junping_du: rev f92e5038000a012229c304bc6e5281411eff2883) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt Make work-preserving-recovery the default mechanism for RM recovery --- Key: YARN-3019 URL: https://issues.apache.org/jira/browse/YARN-3019 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-3019.1.patch The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled to true by default to flip recovery mode to work-preserving recovery from non-work-preserving recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277058#comment-14277058 ] Hudson commented on YARN-2637: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2024 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2024/]) YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when activating applications. Contributed by Craig Welch (jianhe: rev c53420f58364b11fbda1dace7679d45534533382) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Fix For: 2.7.0 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch,
[jira] [Commented] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276945#comment-14276945 ] Steve Loughran commented on YARN-3061: -- {code} 2015-01-14 14:25:46,894 ERROR webapp.Dispatcher (Dispatcher.java:service(162)) - error handling URI: /cluster/app/application_1420734007650_0010 java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:116) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
[jira] [Created] (YARN-3061) NPE in RM AppBlock render
Steve Loughran created YARN-3061: Summary: NPE in RM AppBlock render Key: YARN-3061 URL: https://issues.apache.org/jira/browse/YARN-3061 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Priority: Minor An RM (running in a VM which did a sleep/resume) overnight no longer launches apps, and when you try to look at it, the Web UI says 500 "look at the logs", which show a stack trace -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276952#comment-14276952 ] Hudson commented on YARN-2637: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #70 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/70/]) YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when activating applications. Contributed by Craig Welch (jianhe: rev c53420f58364b11fbda1dace7679d45534533382) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Fix For: 2.7.0 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch,
[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276951#comment-14276951 ] Hudson commented on YARN-3019: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #70 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/70/]) YARN-3019. Make work-preserving-recovery the default mechanism for RM recovery. (Contributed by Jian He) (junping_du: rev f92e5038000a012229c304bc6e5281411eff2883) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml Make work-preserving-recovery the default mechanism for RM recovery --- Key: YARN-3019 URL: https://issues.apache.org/jira/browse/YARN-3019 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-3019.1.patch The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled to true by default to flip recovery mode to work-preserving recovery from non-work-preserving recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277038#comment-14277038 ] Hudson commented on YARN-3019: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #74 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/74/]) YARN-3019. Make work-preserving-recovery the default mechanism for RM recovery. (Contributed by Jian He) (junping_du: rev f92e5038000a012229c304bc6e5281411eff2883) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java Make work-preserving-recovery the default mechanism for RM recovery --- Key: YARN-3019 URL: https://issues.apache.org/jira/browse/YARN-3019 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-3019.1.patch The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled to true by default to flip recovery mode to work-preserving recovery from non-work-preserving recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277071#comment-14277071 ] Karthik Kambatla commented on YARN-2928: Starting from a clean slate sounds reasonable. I would punt on copying source during the development though, just add a dependency on applicationhistoryservice and use the necessary classes. Once phase 1 dev is mostly done, we will be able to make a call whether to merge the modules or copy the requirements over. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276956#comment-14276956 ] Steve Loughran commented on YARN-3061: -- in the source {{RMAppAttemptMetrics attemptMetrics = rmApp.getCurrentAppAttempt().getRMAppAttemptMetrics();}} clearly the app failed *before any app attempt was created* The root cause looks like some token renewal thing probably caused by the VM save/resume, related to kerberos renewal by the look of things {code} org.apache.slider.funtest.lifecycle.AgentWebPagesIT testAgentWeb(org.apache.slider.funtest.lifecycle.AgentWebPagesIT) Time elapsed: 194.768 sec FAILURE! java.lang.AssertionError: Application Launch Failure, exit code 65 Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 192.168.1.134:8188, Ident: (owner=stevel, renewer=yarn, realUser=, issueDate=1421245210012, maxDate=1421850010012, sequenceNumber=11, masterKeyId=6) at org.junit.Assert.fail(Assert.java:88) at org.apache.slider.funtest.framework.CommandTestBase.createTemplatedSliderApplication(CommandTestBase.groovy:691) at org.apache.slider.funtest.lifecycle.AgentWebPagesIT.testAgentWeb(AgentWebPagesIT.groovy:76) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} Server side {code} 2015-01-14 14:20:16,993 ERROR metrics.SystemMetricsPublisher (SystemMetricsPublisher.java:putEntity(427)) - Error when publishing entity [YARN_APPLICATION,application_1420734007650_0010] org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server. 
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301) at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.putEntity(SystemMetricsPublisher.java:425) at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishApplicationCreatedEvent(SystemMetricsPublisher.java:258) at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:213) at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:442) at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:437) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2015-01-14 14:20:35,026 INFO impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(285)) - Timeline service address: http://devix.cotham.uk:8188/ws/v1/timeline/ 2015-01-14 14:20:35,766 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(785)) - Unable to add the application to the delegation token renewer. java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 192.168.1.134:8188, Ident: (owner=stevel, renewer=yarn, realUser=, issueDate=1421245210012, maxDate=1421850010012, sequenceNumber=11, masterKeyId=6) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) at
[jira] [Assigned] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3061: -- Assignee: Varun Saxena NPE in RM AppBlock render - Key: YARN-3061 URL: https://issues.apache.org/jira/browse/YARN-3061 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor An RM (running in a VM which did a sleep/resume) overnight no longer launches apps, and when you try to look at an app, the web UI returns a 500 telling you to look at the logs, which show a stack trace -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276979#comment-14276979 ] Hudson commented on YARN-2637: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2005 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2005/]) YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when activating applications. Contributed by Craig Welch (jianhe: rev c53420f58364b11fbda1dace7679d45534533382) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Fix For: 2.7.0 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch,
[jira] [Commented] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276997#comment-14276997 ] Varun Saxena commented on YARN-3061: [~ste...@apache.org], this seems to have been fixed by YARN-2414 NPE in RM AppBlock render - Key: YARN-3061 URL: https://issues.apache.org/jira/browse/YARN-3061 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor An RM (running in a VM which did a sleep/resume) overnight no longer launches apps, and when you try to look at an app, the web UI returns a 500 telling you to look at the logs, which show a stack trace -- This message was sent by Atlassian JIRA (v6.3.4#6332)
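For context on the NPE itself: the line Steve quotes chains getCurrentAppAttempt() into getRMAppAttemptMetrics(), so an app that fails before any attempt exists dereferences null. A minimal sketch of the kind of guard that avoids this is below; it is an illustration, not the YARN-2414 patch, and the attempt-side package paths are assumptions.
{code}
// Hedged sketch of a null guard, not the committed fix. Package locations
// follow the RM source tree listed in the build output above; treat the
// exact paths as assumptions.
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics;

final class AttemptMetricsLookup {
  // Returns null instead of throwing when the app failed before any attempt
  // was created, so the page can render without per-attempt metrics.
  static RMAppAttemptMetrics currentAttemptMetrics(RMApp rmApp) {
    RMAppAttempt attempt = rmApp.getCurrentAppAttempt();
    return attempt == null ? null : attempt.getRMAppAttemptMetrics();
  }
}
{code}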
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277039#comment-14277039 ] Hudson commented on YARN-2637: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #74 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/74/]) YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when activating applications. Contributed by Craig Welch (jianhe: rev c53420f58364b11fbda1dace7679d45534533382) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Fix For: 2.7.0 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch,
[jira] [Commented] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.
[ https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278376#comment-14278376 ] Zhijie Shen commented on YARN-2861: --- The test failure is not related; it is reported in YARN-3064. Timeline DT secret manager should not reuse the RM's configs. - Key: YARN-2861 URL: https://issues.apache.org/jira/browse/YARN-2861 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2861.1.patch, YARN-2861.2.patch These are the configs for the RM DT secret manager. We should create separate ones for the timeline DT only.
{code}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  long secretKeyInterval =
      conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY,
          YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT);
  long tokenMaxLifetime =
      conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY,
          YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT);
  long tokenRenewInterval =
      conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
          YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
  secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval,
      tokenMaxLifetime, tokenRenewInterval, 360);
  secretManager.startThreads();
  serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig());
  super.init(conf);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
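To make "separate configs for the timeline DT" concrete, the sketch below reads timeline-specific keys instead of the RM DT keys shown above. The property names and defaults here are hypothetical placeholders chosen for illustration, not the keys added by the patch.
{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineDtConfigSketch {
  // Hypothetical, illustration-only property names (not the real keys).
  static final String KEY_UPDATE_INTERVAL =
      "yarn.timeline-service.delegation.key-update-interval";
  static final String TOKEN_MAX_LIFETIME =
      "yarn.timeline-service.delegation.token.max-lifetime";
  static final String TOKEN_RENEW_INTERVAL =
      "yarn.timeline-service.delegation.token.renew-interval";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Read timeline-specific values instead of reusing the RM DT keys quoted
    // in the description; the default values below are placeholders too.
    long secretKeyInterval = conf.getLong(KEY_UPDATE_INTERVAL, 24L * 60 * 60 * 1000);
    long tokenMaxLifetime = conf.getLong(TOKEN_MAX_LIFETIME, 7L * 24 * 60 * 60 * 1000);
    long tokenRenewInterval = conf.getLong(TOKEN_RENEW_INTERVAL, 24L * 60 * 60 * 1000);
    System.out.printf("keyUpdate=%d maxLife=%d renew=%d%n",
        secretKeyInterval, tokenMaxLifetime, tokenRenewInterval);
  }
}
{code}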
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278326#comment-14278326 ] Hadoop QA commented on YARN-2217: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692454/YARN-2217-trunk-v9.patch against trunk revision 5805dc0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.cli.TestRMAdminCLI org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6339//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6339//console This message is automatically generated. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278355#comment-14278355 ] Hadoop QA commented on YARN-3064: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692450/YARN-3064.1.patch against trunk revision 5805dc0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6338//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6338//console This message is automatically generated. TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
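For readers unfamiliar with the "timedout" assertions above: MockRM.waitForState is essentially a bounded poll on the attempt state, and the tests fail when ALLOCATED is not reached before the deadline. A generic sketch of that polling pattern (not the MockRM code itself) is:
{code}
import java.util.function.Supplier;

public final class WaitFor {
  // Poll until the supplier returns the expected value or the deadline passes.
  public static <T> boolean waitFor(Supplier<T> current, T expected, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (expected.equals(current.get())) {
        return true;
      }
      Thread.sleep(100); // re-check periodically, like the test helper does
    }
    return expected.equals(current.get()); // one last check after timing out
  }
}
{code}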
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278127#comment-14278127 ] Chris Trezzo commented on YARN-2217: The last test failure is unrelated. The patch should be good to go! Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3065) TestNodeManagerResync errors
[ https://issues.apache.org/jira/browse/YARN-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278268#comment-14278268 ] Jian He commented on YARN-3065: --- fixing as part of YARN-3064, closing as a dup. TestNodeManagerResync errors Key: YARN-3065 URL: https://issues.apache.org/jira/browse/YARN-3065 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Attachments: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.txt TestNodeManagerResync started failing recently, mostly due to a test timeout. See attachment for a sample test output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3065) TestNodeManagerResync errors
[ https://issues.apache.org/jira/browse/YARN-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-3065. --- Resolution: Duplicate TestNodeManagerResync errors Key: YARN-3065 URL: https://issues.apache.org/jira/browse/YARN-3065 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Attachments: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.txt TestNodeManagerResync started failing recently, mostly due to a test timeout. See attachment for a sample test output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277106#comment-14277106 ] Karthik Kambatla commented on YARN-2962: The proposed approach sounds reasonable. How about adding a config that controls the number of digits (decimal places) to get a 2:2 or 1:3 split? ZKRMStateStore: Limit the number of znodes under a znode Key: YARN-2962 URL: https://issues.apache.org/jira/browse/YARN-2962 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Varun Saxena Priority: Critical We ran into this issue where we were hitting the default ZK server message size configs, primarily because the message had too many znodes even though individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277112#comment-14277112 ] Varun Saxena commented on YARN-2962: Yup, that sounds good. ZKRMStateStore: Limit the number of znodes under a znode Key: YARN-2962 URL: https://issues.apache.org/jira/browse/YARN-2962 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Varun Saxena Priority: Critical We ran into this issue where we were hitting the default ZK server message size configs, primarily because the message had too many znodes even though individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
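To illustrate the digit-split idea being discussed, the sketch below derives a parent/child znode path from an application id so that no single parent accumulates every app znode. The split width, the config-style constant, and the root path are assumptions for illustration, not the eventual YARN-2962 layout.
{code}
public class ZnodeSplitSketch {
  // Hypothetical knob: how many trailing digits stay in the leaf znode name.
  static final int SPLIT_DIGITS = 2; // a "2:2" split of a 4-digit sequence

  static String znodePathFor(String appIdStr, String rootPath) {
    // appIdStr looks like application_1421164610335_0020
    int underscore = appIdStr.lastIndexOf('_');
    String seq = appIdStr.substring(underscore + 1);              // "0020"
    String prefix = seq.substring(0, seq.length() - SPLIT_DIGITS); // "00"
    String suffix = seq.substring(seq.length() - SPLIT_DIGITS);    // "20"
    // The parent groups up to 10^SPLIT_DIGITS apps; the leaf holds the rest.
    return rootPath + "/" + appIdStr.substring(0, underscore + 1) + prefix
        + "/" + suffix;
  }

  public static void main(String[] args) {
    System.out.println(znodePathFor("application_1421164610335_0020",
        "/rmstore/ZKRMStateRoot/RMAppRoot"));
    // -> /rmstore/ZKRMStateRoot/RMAppRoot/application_1421164610335_00/20
  }
}
{code}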
[jira] [Commented] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277173#comment-14277173 ] Steve Loughran commented on YARN-3061: -- You're right; closing as a duplicate. NPE in RM AppBlock render - Key: YARN-3061 URL: https://issues.apache.org/jira/browse/YARN-3061 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 An RM (running in a VM which did a sleep/resume) overnight no longer launches apps, and when you try to look at an app, the web UI returns a 500 telling you to look at the logs, which show a stack trace -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-3061. -- Resolution: Duplicate Fix Version/s: 2.7.0 NPE in RM AppBlock render - Key: YARN-3061 URL: https://issues.apache.org/jira/browse/YARN-3061 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 An RM (running in a VM which did a sleep/resume) overnight no longer launches apps, and when you try to look at an app, the web UI returns a 500 telling you to look at the logs, which show a stack trace -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277099#comment-14277099 ] Varun Saxena commented on YARN-2962: [~kasha], your views on this? ZKRMStateStore: Limit the number of znodes under a znode Key: YARN-2962 URL: https://issues.apache.org/jira/browse/YARN-2962 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Varun Saxena Priority: Critical We ran into this issue where we were hitting the default ZK server message size configs, primarily because the message had too many znodes even though individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3054) Preempt policy in FairScheduler may cause mapreduce job never finish
[ https://issues.apache.org/jira/browse/YARN-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277174#comment-14277174 ] Wei Yan commented on YARN-3054: --- Hi, [~peng.zhang]. First, FairScheduler checks whether the usage is over the fair share. {code} private boolean preemptContainerPreCheck() { return parent.getPolicy().checkIfUsageOverFairShare(getResourceUsage(), getFairShare()); } {code} bq. Mapreduce jobs can get additional resources when others are idle. I'm not sure what you mean by idle here. But in YARN, one queue can take more than its fair share of resources if the other queues are not using them. And in FairScheduler, each queue has a steady fairshare and a dynamic fairshare. For example, if we have two queues (Q1 and Q2), both with weight 1, Q1's steady share is 50% and Q2's is also 50%. Assuming only Q1 has jobs and no job is submitted to Q2, Q1's dynamic fairshare is 100% and Q2's is 0. The dynamic fairshare calculation only considers active queues. bq. Mapreduce jobs for one user in one queue can still progress with its min share when others preempt resources back. As I said above, each queue is guaranteed its minshare and fairshare. That means some jobs can still move on. We cannot assign a minshare to each job; otherwise, submitting multiple concurrent jobs could take over the cluster. Preempt policy in FairScheduler may cause mapreduce job never finish Key: YARN-3054 URL: https://issues.apache.org/jira/browse/YARN-3054 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Peng Zhang The preemption policy is tied to the scheduling policy now. Using the scheduling policy's comparator to find preemption candidates cannot guarantee that a subset of containers is never preempted, and this may cause tasks to be preempted repeatedly before they finish, so the job cannot make any progress. I think preemption in YARN should provide the assurances below: 1. Mapreduce jobs can get additional resources when others are idle; 2. Mapreduce jobs for one user in one queue can still progress with its min share when others preempt resources back. Maybe always preempting the latest app and container can achieve this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
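A small worked example of the steady versus dynamic fair-share point above, as plain arithmetic rather than FairScheduler code: steady share divides capacity across all queues, while dynamic share divides it across active queues only.
{code}
import java.util.Arrays;
import java.util.List;

public class FairShareSketch {
  // Dynamic fair share: divide capacity only among queues that are active;
  // inactive queues get 0. Steady share would divide among all queues.
  static double[] dynamicShares(List<Boolean> active, double clusterCapacity) {
    long activeCount = active.stream().filter(a -> a).count();
    double[] shares = new double[active.size()];
    for (int i = 0; i < active.size(); i++) {
      shares[i] = (active.get(i) && activeCount > 0)
          ? clusterCapacity / activeCount
          : 0.0;
    }
    return shares;
  }

  public static void main(String[] args) {
    // Two equally weighted queues; only Q1 has running jobs.
    double[] shares = dynamicShares(Arrays.asList(true, false), 100.0);
    System.out.printf("Q1 dynamic share = %.0f%%, Q2 = %.0f%%%n",
        shares[0], shares[1]);
    // Steady shares would stay at 50% / 50% regardless of activity.
  }
}
{code}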
[jira] [Commented] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenwer
[ https://issues.apache.org/jira/browse/YARN-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277142#comment-14277142 ] Karthik Kambatla commented on YARN-2919: Sorry for the delay here. The approach you suggest sounds about right. I might have more concrete comments based on the patch. Potential race between renew and cancel in DelegationTokenRenwer - Key: YARN-2919 URL: https://issues.apache.org/jira/browse/YARN-2919 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Naganarasimha G R Priority: Critical Attachments: YARN-2919.20141209-1.patch YARN-2874 fixes a deadlock in DelegationTokenRenewer, but there is still a race because of which a renewal in flight isn't interrupted by a cancel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
Prakash Ramachandran created YARN-3062: -- Summary: timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Prakash Ramachandran When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 1009 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 253 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-8.patch Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting support for preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will consider preemption respecting node labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277328#comment-14277328 ] Mayank Bansal commented on YARN-2933: - Thanks [~wangda] for the review. bq. 1) ProportionalCapacityPreemptionPolicy.setNodeLabels is too simple to be a method, it's better to remove it. Getters and setters are usually simple, but it's good practice to have them. I think we should keep it. bq. 2) It's better to use enum here instead of integer Done. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting support for preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will consider preemption respecting node labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
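As a rough illustration of the short-term behaviour described in this issue, the interim policy computes its totals only from nodes that carry no label. The Node type and the memory-only notion of capacity below are simplifications for the sketch, not CapacityScheduler internals.
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class NoLabelCapacitySketch {
  static final class Node {
    final long memoryMb;
    final Set<String> labels;
    Node(long memoryMb, Set<String> labels) {
      this.memoryMb = memoryMb;
      this.labels = labels;
    }
  }

  // Sum only the memory of nodes without any label; ideal_allocation and
  // preemption in the interim policy are computed against this total.
  static long noLabelCapacity(List<Node> nodes) {
    return nodes.stream()
        .filter(n -> n.labels.isEmpty())
        .mapToLong(n -> n.memoryMb)
        .sum();
  }

  public static void main(String[] args) {
    List<Node> cluster = Arrays.asList(
        new Node(8192, Collections.<String>emptySet()),  // unlabeled, counted
        new Node(8192, Collections.singleton("gpu")));   // labeled, excluded
    System.out.println("capacity considered for preemption (MB): "
        + noLabelCapacity(cluster));                     // 8192
  }
}
{code}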
[jira] [Updated] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated YARN-3062: --- Attachment: withoutfilter.json withfilter.json attaching sample output files. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 1009 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 253 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated YARN-3062: --- Description: When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. was: When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 1009 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 253 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.
[ https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276625#comment-14276625 ] Hadoop QA commented on YARN-2861: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692169/YARN-2861.2.patch against trunk revision f92e503. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6330//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6330//console This message is automatically generated. Timeline DT secret manager should not reuse the RM's configs. - Key: YARN-2861 URL: https://issues.apache.org/jira/browse/YARN-2861 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2861.1.patch, YARN-2861.2.patch This is the configs for RM DT secret manager. We should create separate ones for timeline DT only. {code} @Override protected void serviceInit(Configuration conf) throws Exception { long secretKeyInterval = conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY, YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT); long tokenMaxLifetime = conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY, YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT); long tokenRenewInterval = conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY, YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT); secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval, tokenMaxLifetime, tokenRenewInterval, 360); secretManager.startThreads(); serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig()); super.init(conf); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276766#comment-14276766 ] Hudson commented on YARN-2637: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #73 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/73/]) YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when activating applications. Contributed by Craig Welch (jianhe: rev c53420f58364b11fbda1dace7679d45534533382) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/CHANGES.txt maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Fix For: 2.7.0 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch,
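For readers who only want the gist of the YARN-2637 change being integrated above: maximum-am-resource-percent caps how much of a queue can be occupied by ApplicationMasters, and the fix makes that check hold per user as well as per LeafQueue when activating applications. The numbers and the per-user fraction in the sketch below are assumptions used purely to show the arithmetic, not values from the patch.
{code}
public class AmLimitSketch {
  public static void main(String[] args) {
    double queueCapacityMb = 100_000;     // assumed queue capacity
    double maxAmResourcePercent = 0.1;    // maximum-am-resource-percent
    double perUserAmFraction = 0.5;       // assumed per-user slice of the queue AM limit

    // Queue-wide AM head-room, and a per-user cap derived from it.
    double queueAmLimitMb = queueCapacityMb * maxAmResourcePercent;
    double userAmLimitMb = queueAmLimitMb * perUserAmFraction;

    double usedAmQueueMb = 9_000;  // AM memory already used in the queue
    double usedAmUserMb = 4_500;   // AM memory already used by this user
    double newAmMb = 2_048;        // AM size of the application being activated

    // An application is activated only if both limits still hold.
    boolean fitsQueue = usedAmQueueMb + newAmMb <= queueAmLimitMb;
    boolean fitsUser = usedAmUserMb + newAmMb <= userAmLimitMb;
    System.out.println("activate = " + (fitsQueue && fitsUser));
  }
}
{code}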