[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623199#comment-14623199 ] Jun Gong commented on YARN-3896: The failed test cases are not related; they are addressed in YARN-3909 and YARN-3910. Please review the patch. > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset > --- > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3896.01.patch, YARN-3896.02.patch, > YARN-3896.03.patch, YARN-3896.04.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node (10.208.132.153) reconnected with the RM. When it registered with the RM, the RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat came before the RM had finished setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
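As a side note for readers of this thread, here is a minimal, self-contained model of the response-id comparison behind the "Too far behind" log line above. It is an illustration only, not the actual ResourceTrackerService code, and the names rmResponseId/nmResponseId are invented for the sketch:
{code}
// rmResponseId: the id stored in the RM-side lastNodeHeartbeatResponse for this node.
// nmResponseId: the id the NM sends with its heartbeat.
static String checkHeartbeat(int rmResponseId, int nmResponseId) {
  if (nmResponseId + 1 == rmResponseId) {
    return "DUPLICATE"; // heartbeat already processed; resend the last response
  } else if (nmResponseId + 1 < rmResponseId) {
    return "RESYNC";    // "Too far behind rm response id ..." -> node is deactivated as REBOOTED
  }
  return "OK";          // normal case: process the heartbeat and bump the response id
}
{code}
In the log above, the reconnected NM heartbeats with id 0 while the RM-side id is still the stale 2506413, so checkHeartbeat(2506413, 0) takes the RESYNC branch; had the reset to 0 completed first, checkHeartbeat(0, 0) would take the normal path.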
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623191#comment-14623191 ] Hudson commented on YARN-3116: -- FAILURE: Integrated in Hadoop-trunk-Commit #8150 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8150/]) YARN-3116. RM notifies NM whether a container is an AM container or normal task container. Contributed by Giovanni Matteo Fumarola. (zjshen: rev 1ea36299a47af302379ae0750b571ec021eb54ad) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerContext.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerTerminationContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerInitializationContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ContainerType.java > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Fix For: 2.8.0 > > Attachments: YARN-3116.patch, YARN-3116.v10.patch, > YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, > 
YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, > YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623180#comment-14623180 ] Hadoop QA commented on YARN-3908: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 3s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 23s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 21s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 42m 51s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744857/YARN-3908-YARN-2928.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 2d4a8f4 | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8505/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8505/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8505/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8505/console | This message was automatically generated. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623177#comment-14623177 ] Hadoop QA commented on YARN-3116: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 4s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 46s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 7s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 6m 3s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 51m 9s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 107m 54s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer | | | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | | | hadoop.yarn.server.nodemanager.TestDeletionService | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744840/YARN-3116.v10.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 47f4c54 | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8503/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8503/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8503/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8503/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8503/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8503/console | This 
message was automatically generated. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v10.patch, > YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, > YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, > YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement.
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623176#comment-14623176 ] Zhijie Shen commented on YARN-3116: --- +1 for the last patch. Will commit it after jenkins comments. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v10.patch, > YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, > YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, > YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table
[ https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623170#comment-14623170 ] Zhijie Shen commented on YARN-3914: --- This will not block the implementation of getEntities (YARN-3049), but the performance will be bad without it, especially when the number of entities per type per app becomes huge, e.g., when there is a big job. > Entity created time should be part of the row key of entity table > - > > Key: YARN-3914 > URL: https://issues.apache.org/jira/browse/YARN-3914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > Entity created time should be part of the row key of entity table, between > entity type and entity Id. The reason to have it is to index the entities. > Though we cannot index the entities for all kinds of information, indexing > them according to the created time is very necessary. Without it, every query > for the latest entities that belong to an application and a type will scan > through all the entities that belong to them. For example, listing the 100 > most recently started containers in a YARN app would require such a full scan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3914) Entity created time should be part of the row key of entity table
Zhijie Shen created YARN-3914: - Summary: Entity created time should be part of the row key of entity table Key: YARN-3914 URL: https://issues.apache.org/jira/browse/YARN-3914 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Entity created time should be part of the row key of entity table, between entity type and entity Id. The reason to have it is to index the entities. Though we cannot index the entities for all kinds of information, indexing them according to the created time is very necessary. Without it, every query for the latest entities that belong to an application and a type will scan through all the entities that belong to them. For example, listing the 100 most recently started containers in a YARN app would require such a full scan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
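A rough sketch of what adding the created time to the row key could look like, purely for illustration (the actual YARN-2928 schema, separators, and prefix layout are assumed here, not taken from the branch). Writing the timestamp inverted makes newer entities sort first, which is exactly what a "latest N entities" query needs:
{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EntityRowKeySketch {
  // appPrefix stands in for the cluster/user/flow/app portion of the key.
  static byte[] entityRowKey(String appPrefix, String entityType,
      long createdTime, String entityId) {
    byte[] prefix = (appPrefix + "!" + entityType + "!").getBytes(StandardCharsets.UTF_8);
    byte[] suffix = ("!" + entityId).getBytes(StandardCharsets.UTF_8);
    ByteBuffer key = ByteBuffer.allocate(prefix.length + 8 + suffix.length);
    key.put(prefix);
    // Inverted timestamp: a newer createdTime yields a smaller stored value,
    // so a plain forward scan returns the most recently created entities first.
    key.putLong(Long.MAX_VALUE - createdTime);
    key.put(suffix);
    return key.array();
  }
}
{code}
With the created time sitting between the entity type and the entity id, a query for the newest entities of a type can stop after the first N rows instead of scanning them all.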
[jira] [Commented] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623167#comment-14623167 ] Hadoop QA commented on YARN-3904: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 18s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 13s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 26s | The applied patch generated 38 new checkstyle issues (total was 23, now 50). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 0m 52s | The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 23s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 39m 51s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-timelineservice | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744846/YARN-3904-YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 2d4a8f4 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8504/artifact/patchprocess/diffcheckstylehadoop-yarn-server-timelineservice.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8504/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8504/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8504/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8504/console | This message was automatically generated. > Adopt PhoenixTimelineWriter into time-based aggregation storage > --- > > Key: YARN-3904 > URL: https://issues.apache.org/jira/browse/YARN-3904 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-3904-YARN-2928.001.patch, > YARN-3904-YARN-2928.002.patch > > > After we finished the design for time-based aggregation, we can adopt our > existing Phoenix storage into the storage of the aggregated data. This JIRA > proposes to move the Phoenix storage implementation from > o.a.h.yarn.server.timelineservice.storage to > o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully > devoted writer for time-based aggregation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3908: -- Attachment: YARN-3908-YARN-2928.003.patch v.3 patch posted - fixed the javac warning > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623133#comment-14623133 ] Hadoop QA commented on YARN-3908: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 52s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 7m 46s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 48s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 58s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 20s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 42m 19s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744831/YARN-3908-YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 2d4a8f4 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8502/artifact/patchprocess/diffJavacWarnings.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8502/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8502/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8502/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8502/console | This message was automatically generated. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623119#comment-14623119 ] Hadoop QA commented on YARN-3910: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 6m 48s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 53s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 69m 47s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744782/YARN-3910.001.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 47f4c54 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8501/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8501/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8501/console | This message was automatically generated. > TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk > > > Key: YARN-3910 > URL: https://issues.apache.org/jira/browse/YARN-3910 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3910.001.patch > > > Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/ > {noformat} > Running > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.049 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.031 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3912) Fix typos in hadoop-yarn-project module
[ https://issues.apache.org/jira/browse/YARN-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623117#comment-14623117 ] Hadoop QA commented on YARN-3912: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 24m 24s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 36 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 6m 9s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 1m 9s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 10m 59s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 0s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:red}-1{color} | yarn tests | 6m 58s | Tests failed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 1m 57s | Tests failed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 56s | Tests passed in hadoop-yarn-registry. | | {color:green}+1{color} | yarn tests | 3m 14s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 6m 0s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 51m 22s | Tests failed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-server-web-proxy. 
| | | | 141m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart | | | hadoop.yarn.conf.TestHAUtil | | | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | | | hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch | | | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744765/YARN-3912.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 47f4c54 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8498/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8498/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8498/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8498/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8498/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-registry test log | https://builds.apache.org/job/PreCommit-YARN-Build/8498/artifact/patchprocess/testrun_hadoop-yarn-registry.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8498/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8498/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8498/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-
[jira] [Commented] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it
[ https://issues.apache.org/jira/browse/YARN-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623113#comment-14623113 ] Hadoop QA commented on YARN-3900: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 15s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 8s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 56s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 12s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:red}-1{color} | yarn tests | 51m 4s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 100m 6s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744815/YARN-3900.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 47f4c54 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8499/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8499/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8499/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8499/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8499/console | This message was automatically generated. 
> Protobuf layout of yarn_security_token causes errors in other protos that > include it > - > > Key: YARN-3900 > URL: https://issues.apache.org/jira/browse/YARN-3900 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3900.001.patch, YARN-3900.001.patch > > > Because of the subdirectory server used in > {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto}} > there are errors in other protos that include them. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623091#comment-14623091 ] Anubhav Dhoot commented on YARN-3910: - Hi Varun, It's possible that schedulerDispatcher.lastSchedulerEvent is null if it hasn't been set yet. Checking for not-null as well in the while loop would be good in that case. It would also be nicer to reduce the sleep to 100 ms instead of 1000 ms. > TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk > > > Key: YARN-3910 > URL: https://issues.apache.org/jira/browse/YARN-3910 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3910.001.patch > > > Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/ > {noformat} > Running > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.049 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.031 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
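To make the suggestion concrete, here is a small, generic polling helper in the spirit of the comment (the helper and its names are illustrative, not part of the actual test): it tolerates a condition that cannot be evaluated as true yet because the underlying field is still null, and it sleeps 100 ms per iteration instead of 1000 ms:
{code}
import java.util.concurrent.Callable;

public class WaitUtil {
  /** Polls the condition every 100 ms until it holds or the timeout expires. */
  public static boolean waitFor(Callable<Boolean> condition, long timeoutMs)
      throws Exception {
    long waited = 0;
    while (waited < timeoutMs) {
      if (Boolean.TRUE.equals(condition.call())) {
        return true;
      }
      Thread.sleep(100);
      waited += 100;
    }
    return false;
  }
}
{code}
A call site in the spirit of the comment would pass a condition that first checks schedulerDispatcher.lastSchedulerEvent for null and then compares its type against APP_REMOVED, so the wait neither trips over an unset event nor sleeps a full second per check.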
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623084#comment-14623084 ] Jian He commented on YARN-3866: --- thanks for the feedback. [~mding], we can keep that as is then. thanks ! > AM-RM protocol changes to support container resizing > > > Key: YARN-3866 > URL: https://issues.apache.org/jira/browse/YARN-3866 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-3866.1.patch, YARN-3866.2.patch > > > YARN-1447 and YARN-1448 are outdated. > This ticket deals with AM-RM Protocol changes to support container resize > according to the latest design in YARN-1197. > 1) Add increase/decrease requests in AllocateRequest > 2) Get approved increase/decrease requests from RM in AllocateResponse > 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3904: Attachment: YARN-3904-YARN-2928.002.patch Oops, forgot to add ASF licensing info for one file... > Adopt PhoenixTimelineWriter into time-based aggregation storage > --- > > Key: YARN-3904 > URL: https://issues.apache.org/jira/browse/YARN-3904 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-3904-YARN-2928.001.patch, > YARN-3904-YARN-2928.002.patch > > > After we finished the design for time-based aggregation, we can adopt our > existing Phoenix storage into the storage of the aggregated data. This JIRA > proposes to move the Phoenix storage implementation from > o.a.h.yarn.server.timelineservice.storage to > o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully > devoted writer for time-based aggregation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623069#comment-14623069 ] Hadoop QA commented on YARN-3904: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 13s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 19s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 0m 52s | The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 24s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 39m 23s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-timelineservice | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744768/YARN-3904-YARN-2928.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 2d4a8f4 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/8500/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8500/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8500/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8500/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8500/console | This message was automatically generated. > Adopt PhoenixTimelineWriter into time-based aggregation storage > --- > > Key: YARN-3904 > URL: https://issues.apache.org/jira/browse/YARN-3904 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-3904-YARN-2928.001.patch > > > After we finished the design for time-based aggregation, we can adopt our > existing Phoenix storage into the storage of the aggregated data. This JIRA > proposes to move the Phoenix storage implementation from > o.a.h.yarn.server.timelineservice.storage to > o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully > devoted writer for time-based aggregation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623048#comment-14623048 ] Jian He commented on YARN-1645: --- - This check should not be needed, because the AM should be able to resize an existing container whether or not the RM has restarted. {code} if (containerTokenIdentifier.getRMIdentifier() != nodeStatusUpdater .getRMIdentifier()) { // Is the container coming from unknown RM StringBuilder sb = new StringBuilder("\nContainer "); sb.append(containerTokenIdentifier.getContainerID().toString()) .append(" rejected as it is allocated by a previous RM"); throw new InvalidContainerException(sb.toString()); } {code} - A lot of code is duplicated between authorizeStartRequest and authorizeResourceIncreaseRequest - could you refactor the two methods to share the common code? - A portion of the code belongs to YARN-1644, and the patch won't compile. > ContainerManager implementation to support container resizing > - > > Key: YARN-1645 > URL: https://issues.apache.org/jira/browse/YARN-1645 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan >Assignee: MENG DING > Attachments: YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch > > > Implementation of ContainerManager for container resize, including: > 1) ContainerManager resize logic > 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
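To illustrate the kind of refactoring being asked for, here is a sketch of a shared helper that both authorizeStartRequest and authorizeResourceIncreaseRequest could call. The helper name and the particular checks are assumptions made for the example, not the contents of the patch:
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;

// Hypothetical shared validation; the real NM code has its own exception
// types and additional checks.
final class ContainerTokenChecks {
  static void checkContainerToken(ContainerId containerId,
      ContainerTokenIdentifier tokenId) throws YarnException {
    if (tokenId == null) {
      throw new YarnException("No container token identifier for " + containerId);
    }
    if (!containerId.equals(tokenId.getContainerID())) {
      throw new YarnException("Container " + containerId
          + " does not match the token's container " + tokenId.getContainerID());
    }
    if (tokenId.getExpiryTimeStamp() < System.currentTimeMillis()) {
      throw new YarnException("Container token for " + containerId + " has expired");
    }
  }
}
{code}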
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623047#comment-14623047 ] Sandy Ryza commented on YARN-3866: -- Hi [~jianhe]. Most application writers should be using AMRMClient, so not dealing with this interface directly. That said, given that they are separate data types, I think two different methods would be preferable. > AM-RM protocol changes to support container resizing > > > Key: YARN-3866 > URL: https://issues.apache.org/jira/browse/YARN-3866 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-3866.1.patch, YARN-3866.2.patch > > > YARN-1447 and YARN-1448 are outdated. > This ticket deals with AM-RM Protocol changes to support container resize > according to the latest design in YARN-1197. > 1) Add increase/decrease requests in AllocateRequest > 2) Get approved increase/decrease requests from RM in AllocateResponse > 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623025#comment-14623025 ] Giovanni Matteo Fumarola commented on YARN-3116: Done in V10.patch. Thanks for the feedback. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v10.patch, > YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, > YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, > YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-3116: --- Attachment: YARN-3116.v10.patch > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v10.patch, > YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, > YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, > YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623018#comment-14623018 ] Jian He commented on YARN-1449: --- - the initContainersToIncrease call is unnecessary; containersToIncrease is immediately cleared afterwards. {code} if (containersToIncrease == null) { return; } initContainersToIncrease(); this.containersToIncrease.clear(); this.containersToIncrease.addAll(containersToIncrease); {code} I have also refreshed the YARN-1197 branch; you can name your patch like "YARN-1449-YARN-1197.4.patch", which will trigger Jenkins to run YARN-1449 against the YARN-1197 branch. > AM-NM protocol changes to support container resizing > > > Key: YARN-1449 > URL: https://issues.apache.org/jira/browse/YARN-1449 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, > yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch > > > AM-NM protocol changes to support container resizing > 1) "IncreaseContainersResourceRequest" and > "IncreaseContainersResourceResponse" PB protocol and implementation > 2) "increaseContainersResources" method in ContainerManagementProtocol > 3) Update "ContainerStatus" protocol to include Resource > 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
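A sketch of the simplification implied by this comment, with the element type left generic because the quoted snippet does not show it, and with the protobuf-builder bookkeeping of the real PBImpl omitted. The point is simply to avoid rebuilding the local list from the proto (what initContainersToIncrease() does) when it is about to be cleared and replaced anyway:
{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative only; not the actual request PBImpl.
class IncreaseRequestSketch<T> {
  private List<T> containersToIncrease;

  public void setContainersToIncrease(List<T> containersToIncrease) {
    if (containersToIncrease == null) {
      return;
    }
    if (this.containersToIncrease == null) {
      // Lazily create the local list instead of deserializing it from the
      // proto just to clear it on the next line.
      this.containersToIncrease = new ArrayList<T>();
    } else {
      this.containersToIncrease.clear();
    }
    this.containersToIncrease.addAll(containersToIncrease);
  }
}
{code}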
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622998#comment-14622998 ] Zhijie Shen commented on YARN-3116: --- one nit: can we move ContainerType to server/api? > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, > YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622987#comment-14622987 ] Sangjin Lee commented on YARN-3908: --- A couple of comments on the latest patch: I find that essentially the event timestamp is stored in the same manner as a metric timestamp. So the mechanism of retrieving the event timestamp is nearly identical to {{ColumnPrefix.readTimeseriesResults()}}. The only restriction of the previous version of that method is that it explicitly cast the values to {{Number}}. In the patch I genericized the method to handle any value type. Having said that, I recognize that reading an event using a method named {{readTimeseriesResults()}} is rather awkward. But alternatives are not great either. {{ColumnPrefix}} is an enum, and it is quite awkward to introduce a method that is useful for only one value of that enum. Perhaps we could create a static wrapper method to bridge that gap. Let me know what you think. Finally, I find that we're still not persisting the metric types. I could be wrong but it appears that all metrics are treated as time series when they are stored. I'll see if it would be straightforward to implement that piece, but it could be bit involved. How about capturing that in its own subtask? > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3908: -- Attachment: YARN-3908-YARN-2928.002.patch v.2 patch posted - replaced the null check for {{getTimestamp()}} with a check for value 0 as it returns a primitive long - genericized {{ColumnPrefix.readTimeseriesResults()}} to return any value type - fixed the unit test to rely on {{EntityColumnPrefix.readTimeseriesResults()}} > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
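The first change above is simply the standard Java pattern for a primitive field; a minimal illustration follows (the surrounding writer code and the exact call site are assumed, not quoted from the patch).
{code:java}
// getTimestamp() returns a primitive long, so it can never be null; an unset
// timestamp shows up as the default value 0, which is what the check looks for.
private static boolean hasTimestamp(TimelineEvent event) {
  return event.getTimestamp() != 0;
}
{code}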
[jira] [Updated] (YARN-3900) Protobuf layout of yarn_security_token causes errors in other protos that include it
[ https://issues.apache.org/jira/browse/YARN-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3900: Attachment: YARN-3900.001.patch Retriggering jenkins as failures did not repro locally for me > Protobuf layout of yarn_security_token causes errors in other protos that > include it > - > > Key: YARN-3900 > URL: https://issues.apache.org/jira/browse/YARN-3900 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3900.001.patch, YARN-3900.001.patch > > > Because of the subdirectory server used in > {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto}} > there are errors in other protos that include them. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622859#comment-14622859 ] Zhijie Shen commented on YARN-3116: --- Sure, I'll review the latest patch this afternoon. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, > YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3910: --- Attachment: YARN-3910.001.patch > TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk > > > Key: YARN-3910 > URL: https://issues.apache.org/jira/browse/YARN-3910 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3910.001.patch > > > Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/ > {noformat} > Running > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.049 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.031 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3910: --- Attachment: (was: YARN-3910.01.patch) > TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk > > > Key: YARN-3910 > URL: https://issues.apache.org/jira/browse/YARN-3910 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > > Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/ > {noformat} > Running > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.049 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.031 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622766#comment-14622766 ] Varun Saxena commented on YARN-3910: {noformat} 2015-07-10 07:45:23,412 INFO [main] security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:(94)) - AMRMTokenKeyRollingInterval: 8640ms and AMRMTokenKeyActivationDelay: 90 ms 2015-07-10 07:45:23,412 INFO [main] security.RMContainerTokenSecretManager (RMContainerTokenSecretManager.java:(77)) - ContainerTokenKeyRollingInterval: 8640ms and ContainerTokenKeyActivationDelay: 90ms 2015-07-10 07:45:23,413 INFO [main] security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:(75)) - NMTokenKeyRollingInterval: 8640ms and NMTokenKeyActivationDelay: 90ms 2015-07-10 07:45:23,421 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:register(197)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler 2015-07-10 07:45:23,422 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:register(197)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions$TestApplicationAttemptEventDispatcher 2015-07-10 07:45:23,422 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:register(197)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions$TestApplicationEventDispatcher 2015-07-10 07:45:23,422 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:register(197)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions$TestApplicationManagerEventDispatcher 2015-07-10 07:45:23,422 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:register(197)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions$TestSchedulerEventDispatcher 2015-07-10 07:45:23,423 INFO [main] rmapp.TestRMAppTransitions (TestRMAppTransitions.java:testAppAcceptedAttemptKilled(725)) - --- START: testAppAcceptedAttemptKilled --- 2015-07-10 07:45:23,423 WARN [main] rmapp.RMAppImpl (RMAppImpl.java:(414)) - The specific max attempts: 0 for application: 33 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead. 
2015-07-10 07:45:23,425 INFO [main] rmapp.RMAppImpl (RMAppImpl.java:transition(1042)) - Storing application with id application_1436514322006_0033 2015-07-10 07:45:23,425 INFO [main] rmapp.RMAppImpl (RMAppImpl.java:handle(768)) - application_1436514322006_0033 State change from NEW to NEW_SAVING 2015-07-10 07:45:23,425 INFO [main] rmapp.RMAppImpl (RMAppImpl.java:handle(768)) - application_1436514322006_0033 State change from NEW_SAVING to SUBMITTED 2015-07-10 07:45:23,426 INFO [main] rmapp.RMAppImpl (RMAppImpl.java:handle(768)) - application_1436514322006_0033 State change from SUBMITTED to ACCEPTED 2015-07-10 07:45:23,427 INFO [main] rmapp.RMAppImpl (RMAppImpl.java:rememberTargetTransitionsAndStoreState(1061)) - Updating application application_1436514322006_0033 with final state: KILLED 2015-07-10 07:45:23,427 INFO [main] rmapp.RMAppImpl (RMAppImpl.java:handle(768)) - application_1436514322006_0033 State change from ACCEPTED to FINAL_SAVING 2015-07-10 07:45:23,428 INFO [main] rmapp.RMAppImpl (RMAppImpl.java:handle(768)) - application_1436514322006_0033 State change from FINAL_SAVING to KILLED 2015-07-10 07:45:23,428 INFO [AsyncDispatcher event handler] resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(699)) - Registering app attempt : appattempt_1436514322006_0033_01 2015-07-10 07:45:23,429 INFO [AsyncDispatcher event handler] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(793)) - appattempt_1436514322006_0033_01 State change from NEW to SUBMITTED {noformat} > TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk > > > Key: YARN-3910 > URL: https://issues.apache.org/jira/browse/YARN-3910 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3910.01.patch > > > Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/ > {noformat} > Running > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > Tests run: 44, Failures: 2, Errors: 0, Sk
[jira] [Created] (YARN-3913) TestResourceTrackerService#testReconnectNode fails on trunk
Varun Saxena created YARN-3913: -- Summary: TestResourceTrackerService#testReconnectNode fails on trunk Key: YARN-3913 URL: https://issues.apache.org/jira/browse/YARN-3913 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622756#comment-14622756 ] Subru Krishnan commented on YARN-3116: -- [~zjshen], I am working with [~kishorch] on YARN-2884 and we do *not* want to expose the _containerType_ to the user. We are updating the YARN-2884 patch to use _containerInitializationContext_ to determine if the container is an AM or not. Separately, [~kkaranasos] is working on YARN-2882, where he needs to expose the container*Request*Type to the user. That is complementary work & should *not* affect this patch. So can we get this committed soon, as YARN-2884 is blocked on this? Thanks! > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, > YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
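A minimal sketch of what that check could look like inside an aux service once the container type is exposed. This is a method fragment of an AuxiliaryService subclass, and it assumes the ContainerType accessor this JIRA proposes to add to the container context; it is illustrative rather than the committed API.
{code:java}
// Sketch: decide whether the just-started container is the AM from the NM side.
@Override
public void initializeContainer(ContainerInitializationContext context) {
  // getContainerType() is assumed to be the accessor added by this JIRA
  if (context.getContainerType() == ContainerType.APPLICATION_MASTER) {
    // e.g. start the per-application collector for this AM container
  }
}
{code}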
[jira] [Updated] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3910: --- Description: Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/ {noformat} Running org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.049 sec <<< FAILURE! java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.031 sec <<< FAILURE! java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) {noformat} was: {noformat} Running org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.049 sec <<< FAILURE! java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.031 sec <<< FAILURE! 
java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) {noformat} > TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk > > > Key: YARN-3910 > URL: https://issues.apache.org/jira/browse/YARN-3910 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3910.01.patch > > > Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/ > {noformat} > Running > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.049 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.j
[jira] [Updated] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
[ https://issues.apache.org/jira/browse/YARN-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3910: --- Attachment: YARN-3910.01.patch > TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk > > > Key: YARN-3910 > URL: https://issues.apache.org/jira/browse/YARN-3910 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3910.01.patch > > > Check https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/ > {noformat} > Running > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions > testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.049 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) > Time elapsed: 0.031 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3904: Attachment: YARN-3904-YARN-2928.001.patch I moved the Phoenix writer to the time based aggregation package, and rebuilt it so that it can accommodate flow and user types of offline aggregations. > Adopt PhoenixTimelineWriter into time-based aggregation storage > --- > > Key: YARN-3904 > URL: https://issues.apache.org/jira/browse/YARN-3904 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-3904-YARN-2928.001.patch > > > After we finished the design for time-based aggregation, we can adopt our > existing Phoenix storage into the storage of the aggregated data. This JIRA > proposes to move the Phoenix storage implementation from > o.a.h.yarn.server.timelineservice.storage to > o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully > devoted writer for time-based aggregation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622727#comment-14622727 ] Sangjin Lee commented on YARN-3908: --- Thanks [~vrushalic]! I'll look at it and see if more changes are needed. Other reviews are welcome too. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3912) Fix typos in hadoop-yarn-project module
[ https://issues.apache.org/jira/browse/YARN-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-3912: - Attachment: YARN-3912.001.patch Initial version based off HADOOP-11854.005.patch. > Fix typos in hadoop-yarn-project module > --- > > Key: YARN-3912 > URL: https://issues.apache.org/jira/browse/YARN-3912 > Project: Hadoop YARN > Issue Type: Task >Affects Versions: 2.7.1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Minor > Labels: supportability > Attachments: YARN-3912.001.patch > > > Fix a bunch of typos in comments, strings, variable names, and method names > in the hadoop-yarn-project module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3912) Fix typos in hadoop-yarn-project module
Ray Chiang created YARN-3912: Summary: Fix typos in hadoop-yarn-project module Key: YARN-3912 URL: https://issues.apache.org/jira/browse/YARN-3912 Project: Hadoop YARN Issue Type: Task Affects Versions: 2.7.1 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Fix a bunch of typos in comments, strings, variable names, and method names in the hadoop-yarn-project module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3911) Add tail of stderr to diagnostics if container fails to launch or its container logs are empty
Bikas Saha created YARN-3911: Summary: Add tail of stderr to diagnostics if container fails to launch or its container logs are empty Key: YARN-3911 URL: https://issues.apache.org/jira/browse/YARN-3911 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha The stderr may have useful info in those cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
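A small sketch of the kind of tailing this would involve. The file name, tail size, and where the text would be attached to the container diagnostics are all assumptions; this is not an actual patch.
{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class StderrTailExample {
  // Illustrative only: read roughly the last few KB of a container's stderr
  // file so it could be appended to the diagnostics on launch failure.
  static String tailOfStderr(File stderrFile, int maxBytes) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(stderrFile, "r")) {
      long start = Math.max(0, raf.length() - maxBytes);
      byte[] buf = new byte[(int) (raf.length() - start)];
      raf.seek(start);
      raf.readFully(buf);
      return new String(buf, StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(tailOfStderr(new File(args[0]), 4096));
  }
}
{code}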
[jira] [Created] (YARN-3910) TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk
Varun Saxena created YARN-3910: -- Summary: TestRMAppTransitions#testAppAcceptedAttemptKilled fails on trunk Key: YARN-3910 URL: https://issues.apache.org/jira/browse/YARN-3910 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Saxena Assignee: Varun Saxena {noformat} Running org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions Tests run: 44, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 3.515 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions testAppAcceptedAttemptKilled[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.049 sec <<< FAILURE! java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) testAppAcceptedAttemptKilled[1](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions) Time elapsed: 0.031 sec <<< FAILURE! java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppRemovedSchedulerEvent(TestRMAppTransitions.java:1032) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppAcceptedAttemptKilled(TestRMAppTransitions.java:742) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3909) TestAMRMRPCNodeUpdates#testAMRMUnusableNodes fails on trunk
Varun Saxena created YARN-3909: -- Summary: TestAMRMRPCNodeUpdates#testAMRMUnusableNodes fails on trunk Key: YARN-3909 URL: https://issues.apache.org/jira/browse/YARN-3909 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena {noformat} Running org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.413 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates) Time elapsed: 5.327 sec <<< FAILURE! java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:156) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622691#comment-14622691 ] Hitesh Shah commented on YARN-867: -- [~vinodkv] [~xgong] Is this still open or addressed elsewhere? > Isolation of failures in aux services > -- > > Key: YARN-867 > URL: https://issues.apache.org/jira/browse/YARN-867 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hitesh Shah >Assignee: Xuan Gong >Priority: Critical > Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, > YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, > YARN-867.sampleCode.2.patch > > > Today, a malicious application can bring down the NM by sending bad data to a > service. For example, sending data to the ShuffleService such that it results > any non-IOException will cause the NM's async dispatcher to exit as the > service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622665#comment-14622665 ] Giovanni Matteo Fumarola commented on YARN-3116: I looked at the test failures, they are unrelated to my patch. I get the same test failures when I tested locally on trunk without applying my patch. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, > YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622581#comment-14622581 ] Hadoop QA commented on YARN-3896: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 49s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 7s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 51s | Tests passed in hadoop-sls. | | {color:red}-1{color} | yarn tests | 51m 2s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 91m 50s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744729/YARN-3896.04.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b489080 | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8497/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8497/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8497/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8497/console | This message was automatically generated. 
> RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset > --- > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3896.01.patch, YARN-3896.02.patch, > YARN-3896.03.patch, YARN-3896.04.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622514#comment-14622514 ] Junping Du commented on YARN-3816: -- From discussions in YARN-3815, we haven't reached consensus on aggregation of framework-specific metrics/counters. In this JIRA, I think we should address YARN system metrics (container resource consumption) first and give framework-specific metrics more time for discussion. Will try to deliver a poc/demo patch over the weekend. > [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1643) Make ContainersMonitor can support change monitoring size of an allocated container in NM side
[ https://issues.apache.org/jira/browse/YARN-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1643: Attachment: YARN-1643.3.patch Updated the patch to resolve the conflicts with YARN-1012 based on the latest rebase. > Make ContainersMonitor can support change monitoring size of an allocated > container in NM side > -- > > Key: YARN-1643 > URL: https://issues.apache.org/jira/browse/YARN-1643 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan >Assignee: MENG DING > Attachments: YARN-1643.1.patch, YARN-1643.2.patch, YARN-1643.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1449: Attachment: YARN-1449.3.patch Updated patch to mark new public APIs as Unstable > AM-NM protocol changes to support container resizing > > > Key: YARN-1449 > URL: https://issues.apache.org/jira/browse/YARN-1449 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, > yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch > > > AM-NM protocol changes to support container resizing > 1) "IncreaseContainersResourceRequest" and > "IncreaseContainersResourceResponse" PB protocol and implementation > 2) "increaseContainersResources" method in ContainerManagementProtocol > 3) Update "ContainerStatus" protocol to include Resource > 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
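For readers unfamiliar with the convention, marking a new public API as Unstable uses Hadoop's audience/stability annotations; a minimal illustration follows (the class shown is hypothetical, not taken from the patch).
{code:java}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Hypothetical example of the annotation pattern referred to above: a new
// public but still-evolving record is tagged Public/Unstable so downstream
// users know its signature may change between releases.
@InterfaceAudience.Public
@InterfaceStability.Unstable
public abstract class ExampleResizeRequest {
  public abstract void setRequestedResourceCount(int count);
}
{code}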
[jira] [Updated] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3816: - Summary: [Aggregation] App-level Aggregation for YARN system metrics (was: [Aggregation] App-level Aggregation on entity tables) > [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId
[ https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622487#comment-14622487 ] Hudson commented on YARN-3445: -- FAILURE: Integrated in Hadoop-trunk-Commit #8148 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8148/]) YARN-3445. Cache runningApps in RMNode for getting running apps on given NodeId. (Junping Du via mingma) (mingma: rev 08244264c0583472b9c4e16591cfde72c6db62a2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java > Cache runningApps in RMNode for getting running apps on given NodeId > > > Key: YARN-3445 > URL: https://issues.apache.org/jira/browse/YARN-3445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du > Fix For: 2.8.0 > > Attachments: YARN-3445-v2.patch, YARN-3445-v3.1.patch, > YARN-3445-v3.patch, YARN-3445-v4.1.patch, YARN-3445-v4.patch, > YARN-3445-v5.1.patch, YARN-3445-v5.patch, YARN-3445.patch > > > Per discussion in YARN-3334, we need filter out unnecessary collectors info > from RM in heartbeat response. Our propose is to add cache for runningApps in > RMNode, so RM only send collectors for local running apps back. This is also > needed in YARN-914 (graceful decommission) that if no running apps in NM > which is in decommissioning stage, it will get decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
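A condensed sketch of the idea described in the issue above. Field and method names are assumptions for illustration and may differ from the committed change; this is a fragment of an RMNodeImpl-like class, not a complete file.
{code:java}
// Hypothetical excerpt: track which applications are currently running on this
// node so heartbeat handling can return only their collectors (and graceful
// decommission can tell when the node has drained).
private final Set<ApplicationId> runningApplications =
    Collections.newSetFromMap(new ConcurrentHashMap<ApplicationId, Boolean>());

public List<ApplicationId> getRunningApps() {
  return new ArrayList<ApplicationId>(runningApplications);
}
{code}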
[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622477#comment-14622477 ] Varun Saxena commented on YARN-3644: A typo. Meant "I think you *can* check whether flow has been hit or not using Mockito mock or spy" > Node manager shuts down if unable to connect with RM > > > Key: YARN-3644 > URL: https://issues.apache.org/jira/browse/YARN-3644 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Srikanth Sundarrajan >Assignee: Raju Bairishetti > Attachments: YARN-3644.001.patch, YARN-3644.001.patch, > YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch > > > When NM is unable to connect to RM, NM shuts itself down. > {code} > } catch (ConnectException e) { > //catch and throw the exception if tried MAX wait time to connect > RM > dispatcher.getEventHandler().handle( > new NodeManagerEvent(NodeManagerEventType.SHUTDOWN)); > throw new YarnRuntimeException(e); > {code} > In large clusters, if RM is down for maintenance for longer period, all the > NMs shuts themselves down, requiring additional work to bring up the NMs. > Setting the yarn.resourcemanager.connect.wait-ms to -1 has other side > effects, where non connection failures are being retried infinitely by all > YarnClients (via RMProxy). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622468#comment-14622468 ] Varun Saxena commented on YARN-3644: Moreover, I think most, if not all, of the My* classes added by you are not required. You can easily use mocking and use the current classes to achieve the same result. You just need to throw an exception while calling heartbeat. You can easily use Mockito to achieve it. We can probably change the visibility of the {{getRMClient}} method in one of the MyNodeStatusUpdater* classes so that it's visible for use with Mockito. This will greatly reduce unnecessary code. And IMHO, changing a method of a private class in test scope to public shouldn't be an issue. Thoughts? You can probably explore this option to refactor your code. > Node manager shuts down if unable to connect with RM > > > Key: YARN-3644 > URL: https://issues.apache.org/jira/browse/YARN-3644 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Srikanth Sundarrajan >Assignee: Raju Bairishetti > Attachments: YARN-3644.001.patch, YARN-3644.001.patch, > YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch > > > When NM is unable to connect to RM, NM shuts itself down. > {code} > } catch (ConnectException e) { > //catch and throw the exception if tried MAX wait time to connect > RM > dispatcher.getEventHandler().handle( > new NodeManagerEvent(NodeManagerEventType.SHUTDOWN)); > throw new YarnRuntimeException(e); > {code} > In large clusters, if RM is down for maintenance for longer period, all the > NMs shuts themselves down, requiring additional work to bring up the NMs. > Setting the yarn.resourcemanager.connect.wait-ms to -1 has other side > effects, where non connection failures are being retried infinitely by all > YarnClients (via RMProxy). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622459#comment-14622459 ] Varun Saxena commented on YARN-3644: Thanks [~raju.bairishetti] for working on this. Few comments: # No need to override {{MyNodeManager3#getNodeStatusUpdater}} and {{MyNodeManager3#serviceStop}}. serviceStop, for instance, only calls {{super.serviceStop()}}. # {{MyNodeStatusUpdater6#context}} is not required. # The test doesn't really check whether ConnectionException was thrown or the NM Shutdown event was called or not. I think you check whether the flow has been hit or not using Mockito mock or spy. For instance, if you call {{NodeManager#start}}, the service state will be STARTED irrespective of whether the code written by you has been hit or not. # I think the log added can be logged at WARN level instead of ERROR. # Also the log says "Not shutting down NodeManager. Retry after default heartbeat interval time". We can instead say something like "Unable to connect to RM...Retry after default heartbeat time". # The config name is {{yarn.nodemanager.shutdown.on.RM.connection.failures}}. All our config names are in lowercase, so just for the sake of consistency, maybe RM can be in lowercase too. Thoughts? > Node manager shuts down if unable to connect with RM > > > Key: YARN-3644 > URL: https://issues.apache.org/jira/browse/YARN-3644 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Srikanth Sundarrajan >Assignee: Raju Bairishetti > Attachments: YARN-3644.001.patch, YARN-3644.001.patch, > YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch > > > When NM is unable to connect to RM, NM shuts itself down. > {code} > } catch (ConnectException e) { > //catch and throw the exception if tried MAX wait time to connect > RM > dispatcher.getEventHandler().handle( > new NodeManagerEvent(NodeManagerEventType.SHUTDOWN)); > throw new YarnRuntimeException(e); > {code} > In large clusters, if RM is down for maintenance for longer period, all the > NMs shuts themselves down, requiring additional work to bring up the NMs. > Setting the yarn.resourcemanager.connect.wait-ms to -1 has other side > effects, where non connection failures are being retried infinitely by all > YarnClients (via RMProxy). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
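As a generic illustration of the mock/spy suggestion in the comment above (all class and method names here are made up for the example; they are not the real NM test helpers), the pattern looks roughly like this:
{code:java}
import static org.mockito.Mockito.doThrow;
import static org.mockito.Mockito.isA;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.spy;
import static org.mockito.Mockito.verify;

public class HeartbeatRetryExample {
  /** Hypothetical collaborator that talks to the RM. */
  public static class Heartbeater {
    public void heartbeat() { /* real RPC would go here */ }
  }

  /** Hypothetical shutdown hook the NM would invoke on fatal errors. */
  public interface ShutdownHandler {
    void shutdown(Exception cause);
  }

  /** Component under test: should swallow connection errors instead of shutting down. */
  public static class StatusUpdater {
    void runOnce(Heartbeater hb, ShutdownHandler handler) {
      try {
        hb.heartbeat();
      } catch (RuntimeException e) {
        // desired behaviour per this JIRA: log and retry, do not shut the NM down
      }
    }
  }

  public static void main(String[] args) {
    Heartbeater hb = spy(new Heartbeater());
    doThrow(new RuntimeException("connection refused")).when(hb).heartbeat();
    ShutdownHandler handler = mock(ShutdownHandler.class);

    new StatusUpdater().runOnce(hb, handler);

    // the shutdown path must not have been hit even though the heartbeat failed
    verify(handler, never()).shutdown(isA(RuntimeException.class));
    verify(hb).heartbeat();
  }
}
{code}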
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622450#comment-14622450 ] Arun Suresh commented on YARN-3453: --- The test case failures are unrelated and they run fine on my laptop. [~kasha], can you please give it a quick rev (only major thing changed apart from addressing your previous comments is addition of test cases) > Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator > even in DRF mode causing thrashing > > > Key: YARN-3453 > URL: https://issues.apache.org/jira/browse/YARN-3453 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Ashwin Shankar >Assignee: Arun Suresh > Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, > YARN-3453.4.patch, YARN-3453.5.patch > > > There are two places in preemption code flow where DefaultResourceCalculator > is used, even in DRF mode. > Which basically results in more resources getting preempted than needed, and > those extra preempted containers aren’t even getting to the “starved” queue > since scheduling logic is based on DRF's Calculator. > Following are the two places : > 1. {code:title=FSLeafQueue.java|borderStyle=solid} > private boolean isStarved(Resource share) > {code} > A queue shouldn’t be marked as “starved” if the dominant resource usage > is >= fair/minshare. > 2. {code:title=FairScheduler.java|borderStyle=solid} > protected Resource resToPreempt(FSLeafQueue sched, long curTime) > {code} > -- > One more thing that I believe needs to change in DRF mode is : during a > preemption round,if preempting a few containers results in satisfying needs > of a resource type, then we should exit that preemption round, since the > containers that we just preempted should bring the dominant resource usage to > min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
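To make the distinction in the description above concrete, here is a small standalone illustration (not code from the patch) of how the two calculators can disagree about whether usage has reached the fair share; the resource sizes are arbitrary example values.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class StarvationCheckExample {
  public static void main(String[] args) {
    Resource cluster = Resource.newInstance(100 * 1024, 100);
    Resource fairShare = Resource.newInstance(10 * 1024, 10);
    // Low memory usage, but CPU is already above the fair share.
    Resource usage = Resource.newInstance(2 * 1024, 12);

    // DefaultResourceCalculator only compares memory, so the queue looks starved...
    boolean starvedByMemory = Resources.lessThan(
        new DefaultResourceCalculator(), cluster, usage, fairShare);
    // ...while the dominant-resource view sees CPU already past the fair share.
    boolean starvedByDominant = Resources.lessThan(
        new DominantResourceCalculator(), cluster, usage, fairShare);

    System.out.println("default calculator says starved: " + starvedByMemory);    // true
    System.out.println("dominant calculator says starved: " + starvedByDominant); // false
  }
}
{code}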
[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3896: --- Attachment: YARN-3896.04.patch Update patch to fix whitespace error. Failed test cases are not related. I will create new issues to address them. > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset > --- > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3896.01.patch, YARN-3896.02.patch, > YARN-3896.03.patch, YARN-3896.04.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622435#comment-14622435 ] Hudson commented on YARN-1012: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #250 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/250/]) YARN-1012. Report NM aggregated container resource utilization in heartbeat. (Inigo Goiri via kasha) (kasha: rev 527c40e4d664c721b8f32d7cd8df21b2666fea8a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/package-info.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeStatus.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/ResourceUtilization.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/ResourceUtilizationPBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/NodeStatusPBImpl.java > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622430#comment-14622430 ] Hudson commented on YARN-3878: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #250 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/250/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/CHANGES.txt > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. 
> *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for
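A rough sketch of the hang described above, not the committed YARN-3878 fix: if serviceStop() waits unconditionally for the event queue to drain, but the handler thread was interrupted and will never consume the remaining events, the wait loop spins until the JVM is killed. One defensive shape (assumed here, the actual patch may differ) is to bound the wait and to stop waiting once the dispatcher thread is no longer alive.

{code}
// Sketch of a drain-on-stop path that cannot hang forever. Field and method
// names are simplified assumptions, not the real AsyncDispatcher code.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DrainOnStopSketch {
  private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
  private Thread eventHandlingThread; // started in serviceStart(), not shown

  // Waiting only on eventQueue.isEmpty() is the hang; the extra conditions let
  // stop() return if the handler thread died or the drain budget is exhausted.
  synchronized void serviceStop() throws InterruptedException {
    long deadline = System.currentTimeMillis() + 10_000; // bounded drain budget
    while (!eventQueue.isEmpty()
        && eventHandlingThread != null && eventHandlingThread.isAlive()
        && System.currentTimeMillis() < deadline) {
      wait(100);
    }
    if (eventHandlingThread != null) {
      eventHandlingThread.interrupt();
      eventHandlingThread.join(1_000);
    }
  }
}
{code}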
[jira] [Commented] (YARN-3800) Reduce storage footprint for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622434#comment-14622434 ] Hudson commented on YARN-3800: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #250 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/250/]) YARN-3800. Reduce storage footprint for ReservationAllocation. Contributed by Anubhav Dhoot. (Carlo Curino: rev 0e602fa3a1529134214452fba10a90307d9c2072) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestRLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/GreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java * hadoop-yarn-project/CHANGES.txt > Reduce storage footprint for ReservationAllocation > -- > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, > YARN-3800.005.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only 
thing we need. Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
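The idea behind YARN-3800, sketched below: since later accounting only needs the allocated Resource per interval, the allocation map can be converted from ReservationRequest to Resource once, up front, and the richer request objects dropped. ReservationSystemUtil appears in the changed-file list above, but the method name toResources and the simplified types here are assumptions for illustration.

{code}
// Sketch of the conversion (assumed names, not the committed YARN-3800 code):
// keep only concurrency * per-container resource for each interval.
import java.util.HashMap;
import java.util.Map;

final class ReservationFootprintSketch {

  static final class ReservationInterval { /* start/end times elided */ }

  static final class Resource {
    final long memoryMB; final int vcores;
    Resource(long memoryMB, int vcores) { this.memoryMB = memoryMB; this.vcores = vcores; }
  }

  static final class ReservationRequest {
    final Resource capability; final int numContainers;
    ReservationRequest(Resource capability, int numContainers) {
      this.capability = capability; this.numContainers = numContainers;
    }
  }

  // Convert the request-keyed map into a resource-keyed map up front, so the
  // stored allocation no longer carries the full ReservationRequest.
  static Map<ReservationInterval, Resource> toResources(
      Map<ReservationInterval, ReservationRequest> requests) {
    Map<ReservationInterval, Resource> resources = new HashMap<>();
    for (Map.Entry<ReservationInterval, ReservationRequest> e : requests.entrySet()) {
      ReservationRequest r = e.getValue();
      resources.put(e.getKey(),
          new Resource(r.capability.memoryMB * r.numContainers,
                       r.capability.vcores * r.numContainers));
    }
    return resources;
  }
}
{code}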
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622437#comment-14622437 ] Hudson commented on YARN-3888: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #250 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/250/]) YARN-3888. ApplicationMaster link is broken in RM WebUI when appstate is (xgong: rev 52148767924baf423172d26f2c6d8a4cfc6e143f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java > ApplicationMaster link is broken in RM WebUI when appstate is NEW > -- > > Key: YARN-3888 > URL: https://issues.apache.org/jira/browse/YARN-3888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch > > > When the application state is NEW in RM Web UI *Application Master* link is > broken. > {code} > 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > {code} > *URL formed* > http://:45020/cluster/app/application_1436191509558_0003 > The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
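For context on the YARN-3888 symptom: an application still in NEW has no current attempt and no tracking URL, so rendering the ApplicationMaster cell as a link yields a dead URL. A minimal sketch of the kind of guard a web-UI block can apply is below; it is illustrative only and not the committed RMAppsBlock change, and the host/attempt values in main() are hypothetical.

{code}
// Illustrative guard: only emit the ApplicationMaster link when the app has a
// current attempt and a tracking URL; otherwise fall back to plain text.
final class AmLinkSketch {
  static String amCell(String currentAttemptId, String trackingUrl) {
    if (currentAttemptId == null || trackingUrl == null || trackingUrl.isEmpty()) {
      return "N/A"; // e.g. application still in NEW, no attempt created yet
    }
    return "<a href='" + trackingUrl + "'>ApplicationMaster</a>";
  }

  public static void main(String[] args) {
    // App still in NEW: prints N/A instead of a broken link.
    System.out.println(amCell(null, null));
    // Running app (hypothetical proxy host and attempt id).
    System.out.println(amCell("appattempt_1436191509558_0003_000001",
        "http://rm-host:8088/proxy/application_1436191509558_0003/"));
  }
}
{code}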
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622367#comment-14622367 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #240 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/240/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. 
> *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0
[jira] [Commented] (YARN-3800) Reduce storage footprint for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622371#comment-14622371 ] Hudson commented on YARN-3800: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #240 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/240/]) YARN-3800. Reduce storage footprint for ReservationAllocation. Contributed by Anubhav Dhoot. (Carlo Curino: rev 0e602fa3a1529134214452fba10a90307d9c2072) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestRLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/GreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java > Reduce storage footprint for ReservationAllocation > -- > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, > YARN-3800.005.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only thing we 
need. Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622374#comment-14622374 ] Hudson commented on YARN-3888: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #240 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/240/]) YARN-3888. ApplicationMaster link is broken in RM WebUI when appstate is (xgong: rev 52148767924baf423172d26f2c6d8a4cfc6e143f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java * hadoop-yarn-project/CHANGES.txt > ApplicationMaster link is broken in RM WebUI when appstate is NEW > -- > > Key: YARN-3888 > URL: https://issues.apache.org/jira/browse/YARN-3888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch > > > When the application state is NEW in RM Web UI *Application Master* link is > broken. > {code} > 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > {code} > *URL formed* > http://:45020/cluster/app/application_1436191509558_0003 > The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622372#comment-14622372 ] Hudson commented on YARN-1012: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #240 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/240/]) YARN-1012. Report NM aggregated container resource utilization in heartbeat. (Inigo Goiri via kasha) (kasha: rev 527c40e4d664c721b8f32d7cd8df21b2666fea8a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/package-info.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/NodeStatusPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/ResourceUtilization.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/ResourceUtilizationPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeStatus.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622334#comment-14622334 ] Hadoop QA commented on YARN-3896: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 4s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 52s | Tests passed in hadoop-sls. | | {color:red}-1{color} | yarn tests | 51m 3s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 91m 32s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744706/YARN-3896.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b489080 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8496/artifact/patchprocess/whitespace.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8496/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8496/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8496/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8496/console | This message was automatically generated. 
> RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset > --- > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3896.01.patch, YARN-3896.02.patch, > YARN-3896.03.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622302#comment-14622302 ] Hudson commented on YARN-3888: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2198 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2198/]) YARN-3888. ApplicationMaster link is broken in RM WebUI when appstate is (xgong: rev 52148767924baf423172d26f2c6d8a4cfc6e143f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java > ApplicationMaster link is broken in RM WebUI when appstate is NEW > -- > > Key: YARN-3888 > URL: https://issues.apache.org/jira/browse/YARN-3888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch > > > When the application state is NEW in RM Web UI *Application Master* link is > broken. > {code} > 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > {code} > *URL formed* > http://:45020/cluster/app/application_1436191509558_0003 > The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3800) Reduce storage footprint for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622299#comment-14622299 ] Hudson commented on YARN-3800: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2198 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2198/]) YARN-3800. Reduce storage footprint for ReservationAllocation. Contributed by Anubhav Dhoot. (Carlo Curino: rev 0e602fa3a1529134214452fba10a90307d9c2072) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestRLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/GreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java > Reduce storage footprint for ReservationAllocation > -- > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, > YARN-3800.005.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only thing we 
need. Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622295#comment-14622295 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2198 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2198/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. 
> *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622300#comment-14622300 ] Hudson commented on YARN-1012: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2198 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2198/]) YARN-1012. Report NM aggregated container resource utilization in heartbeat. (Inigo Goiri via kasha) (kasha: rev 527c40e4d664c721b8f32d7cd8df21b2666fea8a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/ResourceUtilizationPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/ResourceUtilization.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeStatus.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/package-info.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/NodeStatusPBImpl.java > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622285#comment-14622285 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2179 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2179/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. 
> *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000700b7925
[jira] [Commented] (YARN-3800) Reduce storage footprint for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622289#comment-14622289 ] Hudson commented on YARN-3800: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2179 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2179/]) YARN-3800. Reduce storage footprint for ReservationAllocation. Contributed by Anubhav Dhoot. (Carlo Curino: rev 0e602fa3a1529134214452fba10a90307d9c2072) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/GreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestRLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java > Reduce storage footprint for ReservationAllocation > -- > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, > YARN-3800.005.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only thing we need. 
Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622290#comment-14622290 ] Hudson commented on YARN-1012: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2179 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2179/]) YARN-1012. Report NM aggregated container resource utilization in heartbeat. (Inigo Goiri via kasha) (kasha: rev 527c40e4d664c721b8f32d7cd8df21b2666fea8a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeStatus.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/package-info.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/ResourceUtilization.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/NodeStatusPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/ResourceUtilizationPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitor.java > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622292#comment-14622292 ] Hudson commented on YARN-3888: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2179 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2179/]) YARN-3888. ApplicationMaster link is broken in RM WebUI when appstate is (xgong: rev 52148767924baf423172d26f2c6d8a4cfc6e143f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java * hadoop-yarn-project/CHANGES.txt > ApplicationMaster link is broken in RM WebUI when appstate is NEW > -- > > Key: YARN-3888 > URL: https://issues.apache.org/jira/browse/YARN-3888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch > > > When the application state is NEW in RM Web UI *Application Master* link is > broken. > {code} > 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > {code} > *URL formed* > http://:45020/cluster/app/application_1436191509558_0003 > The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3896: --- Attachment: YARN-3896.03.patch Attached a new patch: add a write lock to reset lastNodeHeartBeatResponse's ID. > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset > --- > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3896.01.patch, YARN-3896.02.patch, > YARN-3896.03.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
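A rough sketch of the write-lock idea mentioned in that comment; the class shape and method names below are simplified assumptions, not the actual YARN-3896 patch.

{code}
// Minimal sketch: guard the reset of the last heartbeat response id with the
// node's write lock so a concurrent heartbeat never reads a half-applied
// reconnect. Names are illustrative, not the real RMNodeImpl fields.
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RMNodeSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private int lastHeartbeatResponseId = 0; // stands in for lastNodeHeartBeatResponse's id

  // Called on the reconnect/registration path, under the write lock.
  void resetLastHeartbeatResponse() {
    lock.writeLock().lock();
    try {
      lastHeartbeatResponseId = 0;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Called from the heartbeat path, under the read lock: it sees either the
  // pre-reconnect id or 0, never a torn or stale update.
  int getLastHeartbeatResponseId() {
    lock.readLock().lock();
    try {
      return lastHeartbeatResponseId;
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}

One plausible reading of the fix is that, combined with performing the reset on the registration path itself rather than via an asynchronous event, any heartbeat processed after re-registration is guaranteed to observe a response id of 0, so the "Too far behind" check no longer fires spuriously.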
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622133#comment-14622133 ] Hudson commented on YARN-3888: -- FAILURE: Integrated in Hadoop-Yarn-trunk #982 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/982/]) YARN-3888. ApplicationMaster link is broken in RM WebUI when appstate is (xgong: rev 52148767924baf423172d26f2c6d8a4cfc6e143f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java * hadoop-yarn-project/CHANGES.txt > ApplicationMaster link is broken in RM WebUI when appstate is NEW > -- > > Key: YARN-3888 > URL: https://issues.apache.org/jira/browse/YARN-3888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch > > > When the application state is NEW in RM Web UI *Application Master* link is > broken. > {code} > 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > {code} > *URL formed* > http://:45020/cluster/app/application_1436191509558_0003 > The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622126#comment-14622126 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Yarn-trunk #982 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/982/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. 
> *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000700b79250>
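The hang is easier to see with a stripped-down model of the drain-on-stop wait. This is a sketch under assumed names, not the committed AsyncDispatcher fix; the point is only that the stop path must not wait on a condition that can no longer become true once the handler thread has died.
{code}
// Simplified model of the drain-on-stop wait; names are illustrative.
class DrainOnStopSketch {
  private final Object waitForDrained = new Object();
  private boolean drained = false;

  // Called from serviceStop(): wait for the queue to drain, but re-check that
  // the event-handling thread is still alive so stop() cannot hang forever if
  // that thread exited after an InterruptedException.
  void awaitDrain(Thread eventHandlingThread) throws InterruptedException {
    synchronized (waitForDrained) {
      while (!drained && eventHandlingThread.isAlive()) {
        waitForDrained.wait(1000);
      }
    }
  }

  // Called by the event-handling thread once the queue is empty.
  void markDrained() {
    synchronized (waitForDrained) {
      drained = true;
      waitForDrained.notifyAll();
    }
  }
}
{code}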
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622131#comment-14622131 ] Hudson commented on YARN-1012: -- FAILURE: Integrated in Hadoop-Yarn-trunk #982 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/982/]) YARN-1012. Report NM aggregated container resource utilization in heartbeat. (Inigo Goiri via kasha) (kasha: rev 527c40e4d664c721b8f32d7cd8df21b2666fea8a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/ResourceUtilization.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/NodeStatusPBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/package-info.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/ResourceUtilizationPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeStatus.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
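As a rough illustration of what "aggregated container resource utilization" means in this change, the sketch below sums per-container usage into one node-wide figure that a heartbeat could carry. The type and field names are assumptions; the real record is the ResourceUtilization class in yarn-server-common.
{code}
// Illustrative aggregation only; not the actual ResourceUtilization API.
final class UtilizationSketch {
  final long physicalMemoryMB;
  final long virtualMemoryMB;
  final float cpuFraction; // fraction of the node's vcores in use

  UtilizationSketch(long pmem, long vmem, float cpu) {
    this.physicalMemoryMB = pmem;
    this.virtualMemoryMB = vmem;
    this.cpuFraction = cpu;
  }

  // Sum what every live container is using; the total rides along in the
  // node heartbeat so the RM can see actual (not just allocated) usage.
  static UtilizationSketch aggregate(Iterable<UtilizationSketch> perContainer) {
    long pmem = 0, vmem = 0;
    float cpu = 0f;
    for (UtilizationSketch u : perContainer) {
      pmem += u.physicalMemoryMB;
      vmem += u.virtualMemoryMB;
      cpu += u.cpuFraction;
    }
    return new UtilizationSketch(pmem, vmem, cpu);
  }
}
{code}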
[jira] [Commented] (YARN-3800) Reduce storage footprint for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622130#comment-14622130 ] Hudson commented on YARN-3800: -- FAILURE: Integrated in Hadoop-Yarn-trunk #982 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/982/]) YARN-3800. Reduce storage footprint for ReservationAllocation. Contributed by Anubhav Dhoot. (Carlo Curino: rev 0e602fa3a1529134214452fba10a90307d9c2072) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestRLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/GreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationAllocation.java > Reduce storage footprint for ReservationAllocation > -- > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, > YARN-3800.005.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only thing we need. 
Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
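The storage saving in the description is essentially "collapse each request down to the Resource it adds up to". Here is a hedged sketch with stand-in types; the real code uses ReservationRequest, Resource, and ReservationSystemUtil, and may differ in detail.
{code}
import java.util.HashMap;
import java.util.Map;

// Stand-in types; illustrative only.
final class SimpleResource {
  final long memoryMB;
  final int vcores;
  SimpleResource(long memoryMB, int vcores) { this.memoryMB = memoryMB; this.vcores = vcores; }
}

final class SimpleRequest {
  final SimpleResource perContainer;
  final int numContainers;
  SimpleRequest(SimpleResource perContainer, int numContainers) {
    this.perContainer = perContainer;
    this.numContainers = numContainers;
  }
}

final class ReservationStoreSketch {
  // Keep only the aggregate Resource per interval instead of the full request.
  static Map<Long, SimpleResource> toResources(Map<Long, SimpleRequest> byInterval) {
    Map<Long, SimpleResource> out = new HashMap<>();
    for (Map.Entry<Long, SimpleRequest> e : byInterval.entrySet()) {
      SimpleRequest r = e.getValue();
      out.put(e.getKey(), new SimpleResource(
          r.perContainer.memoryMB * r.numContainers,
          r.perContainer.vcores * r.numContainers));
    }
    return out;
  }
}
{code}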
[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622122#comment-14622122 ] Hudson commented on YARN-1012: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #252 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/252/]) YARN-1012. Report NM aggregated container resource utilization in heartbeat. (Inigo Goiri via kasha) (kasha: rev 527c40e4d664c721b8f32d7cd8df21b2666fea8a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/package-info.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeStatus.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/ResourceUtilizationPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/NodeStatusPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/ResourceUtilization.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java > Report NM aggregated container resource utilization in heartbeat > > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-1012-1.patch, YARN-1012-10.patch, > YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, > YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, > YARN-1012-9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622124#comment-14622124 ] Hudson commented on YARN-3888: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #252 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/252/]) YARN-3888. ApplicationMaster link is broken in RM WebUI when appstate is (xgong: rev 52148767924baf423172d26f2c6d8a4cfc6e143f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java * hadoop-yarn-project/CHANGES.txt > ApplicationMaster link is broken in RM WebUI when appstate is NEW > -- > > Key: YARN-3888 > URL: https://issues.apache.org/jira/browse/YARN-3888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch > > > When the application state is NEW in RM Web UI *Application Master* link is > broken. > {code} > 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > {code} > *URL formed* > http://:45020/cluster/app/application_1436191509558_0003 > The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3800) Reduce storage footprint for ReservationAllocation
[ https://issues.apache.org/jira/browse/YARN-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622121#comment-14622121 ] Hudson commented on YARN-3800: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #252 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/252/]) YARN-3800. Reduce storage footprint for ReservationAllocation. Contributed by Anubhav Dhoot. (Carlo Curino: rev 0e602fa3a1529134214452fba10a90307d9c2072) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestInMemoryPlan.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestSimpleCapacityReplanner.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestRLESparseResourceAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/InMemoryReservationAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/GreedyReservationAgent.java > Reduce storage footprint for ReservationAllocation > -- > > Key: YARN-3800 > URL: https://issues.apache.org/jira/browse/YARN-3800 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.8.0 > > Attachments: YARN-3800.001.patch, YARN-3800.002.patch, > YARN-3800.002.patch, YARN-3800.003.patch, YARN-3800.004.patch, > YARN-3800.005.patch > > > Instead of storing the ReservationRequest we store the Resource for > allocations, as thats the only thing we 
need. Ultimately we convert > everything to resources anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622117#comment-14622117 ] Hudson commented on YARN-3878: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #252 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/252/]) YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (kasha: rev aa067c6aa47b4c79577096817acc00ad6421180c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt > AsyncDispatcher can hang while stopping if it is configured for draining > events on stop > --- > > Key: YARN-3878 > URL: https://issues.apache.org/jira/browse/YARN-3878 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Varun Saxena >Assignee: Varun Saxena >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-3878.01.patch, YARN-3878.02.patch, > YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, > YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch > > > The sequence of events is as under : > # RM is stopped while putting a RMStateStore Event to RMStateStore's > AsyncDispatcher. This leads to an Interrupted Exception being thrown. > # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On > {{serviceStop}}, we will check if all events have been drained and wait for > event queue to drain(as RM State Store dispatcher is configured for queue to > drain on stop). > # This condition never becomes true and AsyncDispatcher keeps on waiting > incessantly for dispatcher event queue to drain till JVM exits. 
> *Initial exception while posting RM State store event to queue* > {noformat} > 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService > (AbstractService.java:enterState(452)) - Service: Dispatcher entered state > STOPPED > 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher > thread interrupted > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) > at > java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) > {noformat} > *JStack of AsyncDispatcher hanging on stop* > {noformat} > "AsyncDispatcher event handler" prio=10 tid=0x7fb980222800 nid=0x4b1e > waiting on condition [0x7fb9654e9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0
[jira] [Commented] (YARN-3381) A typographical error in "InvalidStateTransitonException"
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622111#comment-14622111 ] Hadoop QA commented on YARN-3381: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 7s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:red}-1{color} | javac | 2m 38s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744686/YARN-3381-009.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d66302e | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8495/artifact/patchprocess/trunkFindbugsWarningshadoop-mapreduce-client-app.html | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8495/console | This message was automatically generated. > A typographical error in "InvalidStateTransitonException" > - > > Key: YARN-3381 > URL: https://issues.apache.org/jira/browse/YARN-3381 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: Xiaoshuang LU >Assignee: Brahma Reddy Battula > Labels: BB2015-05-TBR > Attachments: YARN-3381-002.patch, YARN-3381-003.patch, > YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, > YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, > YARN-3381-009.patch, YARN-3381.patch > > > Appears that "InvalidStateTransitonException" should be > "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) A typographical error in "InvalidStateTransitonException"
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3381: --- Attachment: YARN-3381-009.patch Attached patch to address the javadoc issue. > A typographical error in "InvalidStateTransitonException" > - > > Key: YARN-3381 > URL: https://issues.apache.org/jira/browse/YARN-3381 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: Xiaoshuang LU >Assignee: Brahma Reddy Battula > Labels: BB2015-05-TBR > Attachments: YARN-3381-002.patch, YARN-3381-003.patch, > YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, > YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, > YARN-3381-009.patch, YARN-3381.patch > > > Appears that "InvalidStateTransitonException" should be > "InvalidStateTransitionException". Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
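For context on why a simple rename is not trivial here: the misspelled exception is public API, so a typical compatibility-preserving approach is to add the correctly spelled class and keep the old name as a deprecated alias. The sketch below shows that general pattern; it is not necessarily what the attached patch does.
{code}
// General pattern for renaming a misspelled public exception; illustrative.
class InvalidStateTransitionException extends Exception {
  InvalidStateTransitionException(String message) { super(message); }
}

/** @deprecated kept only for compatibility; use InvalidStateTransitionException. */
@Deprecated
class InvalidStateTransitonException extends InvalidStateTransitionException {
  InvalidStateTransitonException(String message) { super(message); }
}
{code}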
[jira] [Commented] (YARN-3884) RMContainerImpl transition from RESERVED to KILL apphistory status not updated
[ https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622030#comment-14622030 ] Bibin A Chundatt commented on YARN-3884: @Please review patch attached . > RMContainerImpl transition from RESERVED to KILL apphistory status not updated > -- > > Key: YARN-3884 > URL: https://issues.apache.org/jira/browse/YARN-3884 > Project: Hadoop YARN > Issue Type: Bug > Environment: Suse11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, > Elapsed Time.jpg, Test Result-Container status.jpg > > > Setup > === > 1 NM 3072 16 cores each > Steps to reproduce > === > 1.Submit apps to Queue 1 with 512 mb 1 core > 2.Submit apps to Queue 2 with 512 mb and 5 core > lots of containers get reserved and unreserved in this case > {code} > 2015-07-02 20:45:31,169 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > container_e24_1435849994778_0002_01_13 Container Transitioned from NEW to > RESERVED > 2015-07-02 20:45:31,170 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > Reserved container application=application_1435849994778_0002 > resource= queue=QueueA: capacity=0.4, > absoluteCapacity=0.4, usedResources=, > usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, > numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 > used= cluster= > 2015-07-02 20:45:31,170 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, > absoluteCapacity=0.4, usedResources=, > usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, > numContainers=6 > 2015-07-02 20:45:31,170 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > assignedContainer queue=root usedCapacity=0.96875 > absoluteUsedCapacity=0.96875 used= > cluster= > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > container_e24_1435849994778_0001_01_14 Container Transitioned from NEW to > ALLOCATED > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf > OPERATION=AM Allocated ContainerTARGET=SchedulerApp > RESULT=SUCCESS APPID=application_1435849994778_0001 > CONTAINERID=container_e24_1435849994778_0001_01_14 > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: > Assigned container container_e24_1435849994778_0001_01_14 of capacity > on host host-10-19-92-117:64318, which has 6 > containers, used and available > after allocation > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > assignedContainer application attempt=appattempt_1435849994778_0001_01 > container=Container: [ContainerId: > container_e24_1435849994778_0001_01_14, NodeId: host-10-19-92-117:64318, > NodeHttpAddress: host-10-19-92-117:65321, Resource: , > Priority: 20, Token: null, ] queue=default: capacity=0.2, > absoluteCapacity=0.2, usedResources=, > usedCapacity=2.0846906, absoluteUsedCapacity=0.4166, numApps=1, > numContainers=5 clusterResource= > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Re-sorting assigned queue: root.default stats: default: capacity=0.2, > absoluteCapacity=0.2, usedResources=, > usedCapacity=2.5016286, 
absoluteUsedCapacity=0.5, numApps=1, numContainers=6 > 2015-07-02 20:45:31,191 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 > used= cluster= > 2015-07-02 20:45:32,143 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > container_e24_1435849994778_0001_01_14 Container Transitioned from > ALLOCATED to ACQUIRED > 2015-07-02 20:45:32,174 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Trying to fulfill reservation for application application_1435849994778_0002 > on node: host-10-19-92-143:64318 > 2015-07-02 20:45:32,174 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > Reserved container application=application_1435849994778_0002 > resource= queue=QueueA: capacity=0.4, > absoluteCapacity=0.4, usedResources=, > usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, > numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 > used= cluster= > 2015
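A hedged sketch of the gap being reported, with invented names (not RMContainerImpl itself): when a container is killed straight out of RESERVED, the transition needs to publish a final status, otherwise the application history never records an end state for it.
{code}
// Illustrative only; the real fix would live in RMContainerImpl's transitions.
enum ContainerStateSketch { NEW, RESERVED, ALLOCATED, RUNNING, KILLED }

interface HistoryPublisher {
  void containerFinished(String containerId, ContainerStateSketch finalState, long finishTime);
}

final class ReservedToKilledTransition {
  private final HistoryPublisher publisher;

  ReservedToKilledTransition(HistoryPublisher publisher) { this.publisher = publisher; }

  void transition(String containerId) {
    // The reported bug is that this publication is skipped on the
    // RESERVED -> KILLED path, so the app history shows no final status.
    publisher.containerFinished(containerId, ContainerStateSketch.KILLED,
        System.currentTimeMillis());
  }
}
{code}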
[jira] [Commented] (YARN-3888) ApplicationMaster link is broken in RM WebUI when appstate is NEW
[ https://issues.apache.org/jira/browse/YARN-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622017#comment-14622017 ] Bibin A Chundatt commented on YARN-3888: [~xgong] Thank you for review and commit. > ApplicationMaster link is broken in RM WebUI when appstate is NEW > -- > > Key: YARN-3888 > URL: https://issues.apache.org/jira/browse/YARN-3888 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: 0001-YARN-3888.patch, 0002-YARN-3888.patch > > > When the application state is NEW in RM Web UI *Application Master* link is > broken. > {code} > 15/07/06 19:46:16 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:18 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > 15/07/06 19:46:20 INFO impl.YarnClientImpl: Application submission is not > finished, submitted application application_1436191509558_0003 is still in NEW > {code} > *URL formed* > http://:45020/cluster/app/application_1436191509558_0003 > The above link is broken -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621982#comment-14621982 ] Hadoop QA commented on YARN-3908: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 38s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 18s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 38m 27s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744663/YARN-3908-YARN-2928.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 2d4a8f4 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8494/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8494/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8494/console | This message was automatically generated. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621970#comment-14621970 ] Hadoop QA commented on YARN-3453: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 4s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 46s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 19s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 50s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 88m 53s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744652/YARN-3453.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d66302e | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8493/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8493/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8493/console | This message was automatically generated. > Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator > even in DRF mode causing thrashing > > > Key: YARN-3453 > URL: https://issues.apache.org/jira/browse/YARN-3453 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Ashwin Shankar >Assignee: Arun Suresh > Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, > YARN-3453.4.patch, YARN-3453.5.patch > > > There are two places in preemption code flow where DefaultResourceCalculator > is used, even in DRF mode. > Which basically results in more resources getting preempted than needed, and > those extra preempted containers aren’t even getting to the “starved” queue > since scheduling logic is based on DRF's Calculator. > Following are the two places : > 1. {code:title=FSLeafQueue.java|borderStyle=solid} > private boolean isStarved(Resource share) > {code} > A queue shouldn’t be marked as “starved” if the dominant resource usage > is >= fair/minshare. > 2. 
{code:title=FairScheduler.java|borderStyle=solid} > protected Resource resToPreempt(FSLeafQueue sched, long curTime) > {code} > -- > One more thing that I believe needs to change in DRF mode is : during a > preemption round,if preempting a few containers results in satisfying needs > of a resource type, then we should exit that preemption round, since the > containers that we just preempted should bring the dominant resource usage to > min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
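The first point in the description (a queue should not be marked "starved" when its dominant resource usage already meets its share) can be sketched independently of the scheduler code. The check below is a simplified stand-in, assuming usage and share are expressed as cluster fractions; it is not the FSLeafQueue implementation.
{code}
// Simplified DRF-style starvation check; illustrative only.
final class DrfStarvationCheck {
  static boolean isStarved(double memUsedFrac, double cpuUsedFrac,
                           double memShareFrac, double cpuShareFrac) {
    // Compare the dominant resource's usage against the share for that same
    // resource, instead of always comparing memory as the default calculator does.
    if (memUsedFrac >= cpuUsedFrac) {
      return memUsedFrac < memShareFrac;
    }
    return cpuUsedFrac < cpuShareFrac;
  }
}
{code}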
[jira] [Commented] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621952#comment-14621952 ] Hadoop QA commented on YARN-3857: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 9s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 48s | The applied patch generated 7 new checkstyle issues (total was 125, now 129). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 51m 1s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 3s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12744651/YARN-3857-2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d66302e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8492/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8492/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8492/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8492/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8492/console | This message was automatically generated. > Memory leak in ResourceManager with SIMPLE mode > --- > > Key: YARN-3857 > URL: https://issues.apache.org/jira/browse/YARN-3857 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: mujunchao >Assignee: mujunchao >Priority: Critical > Attachments: YARN-3857-1.patch, YARN-3857-2.patch, > hadoop-yarn-server-resourcemanager.patch > > > We register the ClientTokenMasterKey to avoid client may hold an invalid > ClientToken after RM restarts. In SIMPLE mode, we register > Pair , But we never remove it from HashMap, as > unregister only runing while in Security mode, so memory leak coming. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
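The leak pattern described above is independent of YARN specifics, so a small sketch may help. Names are assumptions (the real map lives in ClientToAMTokenSecretManagerInRM): since key registration happens in both SIMPLE and secure mode, the removal must happen unconditionally as well.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative registry; not the actual secret manager code.
final class ClientTokenKeyRegistrySketch {
  private final Map<String, byte[]> masterKeys = new ConcurrentHashMap<>();

  // Done on attempt registration regardless of security mode.
  void registerApplication(String attemptId, byte[] masterKey) {
    masterKeys.put(attemptId, masterKey);
  }

  // The leak: this was only invoked when security was enabled. Calling it
  // unconditionally when the attempt finishes keeps the map from growing forever.
  void unregisterApplication(String attemptId) {
    masterKeys.remove(attemptId);
  }
}
{code}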
[jira] [Commented] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621947#comment-14621947 ] Devaraj K commented on YARN-3857: - Thanks [~mujunchao] for updated patch with test. Please take care of these comments also along with the [~zxu] comments fix. 1.I don't think adding this new method is required. Can we just use the ClientToAMTokenSecretManagerInRM#getMasterKey() to know whether the master key present or not? {code:xml} + + @VisibleForTesting + public synchronized boolean hasMasterKey( + ApplicationAttemptId applicationAttemptID) { + return this.masterKeys.containsKey(applicationAttemptID); + } {code} 2. I see there are some format issues in the patch w.r.t braces and indentation with spaces. Please go through the 'Making Changes' section in https://wiki.apache.org/hadoop/HowToContribute and configure your IDE according. It will be one time job and you don't have to worry next time for creating patches. {code:xml} +if(isSecurityEnabled) +{ {code} {code:xml} +} +else +{ {code} 4. Remove unused imports in RMAppAttemptImpl.java. > Memory leak in ResourceManager with SIMPLE mode > --- > > Key: YARN-3857 > URL: https://issues.apache.org/jira/browse/YARN-3857 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: mujunchao >Assignee: mujunchao >Priority: Critical > Attachments: YARN-3857-1.patch, YARN-3857-2.patch, > hadoop-yarn-server-resourcemanager.patch > > > We register the ClientTokenMasterKey to avoid client may hold an invalid > ClientToken after RM restarts. In SIMPLE mode, we register > Pair , But we never remove it from HashMap, as > unregister only runing while in Security mode, so memory leak coming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621925#comment-14621925 ] zhihai xu commented on YARN-3857: - thanks for updating the patch quickly. The new patch looks good in general. It'd be nice to address a few small nits, mostly regarding the comment and style. # Change {{// this is to test master key is saved in the state store}} to {{// this is to test master key is saved in the secret manager}} # Remove comment {{//assumeTrue(!isSecurityEnabled);}} # Change {{if(isSecurityEnabled)}} to {{if (isSecurityEnabled)}} # Could you fix the indentation? It should be indented by 2(new paragraph) or 4 bytes. +1 non-binding once these are addressed. Thanks again for working on this. > Memory leak in ResourceManager with SIMPLE mode > --- > > Key: YARN-3857 > URL: https://issues.apache.org/jira/browse/YARN-3857 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: mujunchao >Assignee: mujunchao >Priority: Critical > Attachments: YARN-3857-1.patch, YARN-3857-2.patch, > hadoop-yarn-server-resourcemanager.patch > > > We register the ClientTokenMasterKey to avoid client may hold an invalid > ClientToken after RM restarts. In SIMPLE mode, we register > Pair , But we never remove it from HashMap, as > unregister only runing while in Security mode, so memory leak coming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621919#comment-14621919 ] Peng Zhang commented on YARN-3453: -- Yes, I agree with more fix in separate JIRA. And {{YARN-3453.5.patch}} LGTM. > Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator > even in DRF mode causing thrashing > > > Key: YARN-3453 > URL: https://issues.apache.org/jira/browse/YARN-3453 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Ashwin Shankar >Assignee: Arun Suresh > Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, > YARN-3453.4.patch, YARN-3453.5.patch > > > There are two places in preemption code flow where DefaultResourceCalculator > is used, even in DRF mode. > Which basically results in more resources getting preempted than needed, and > those extra preempted containers aren’t even getting to the “starved” queue > since scheduling logic is based on DRF's Calculator. > Following are the two places : > 1. {code:title=FSLeafQueue.java|borderStyle=solid} > private boolean isStarved(Resource share) > {code} > A queue shouldn’t be marked as “starved” if the dominant resource usage > is >= fair/minshare. > 2. {code:title=FairScheduler.java|borderStyle=solid} > protected Resource resToPreempt(FSLeafQueue sched, long curTime) > {code} > -- > One more thing that I believe needs to change in DRF mode is : during a > preemption round,if preempting a few containers results in satisfying needs > of a resource type, then we should exit that preemption round, since the > containers that we just preempted should bring the dominant resource usage to > min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3908: - Attachment: YARN-3908-YARN-2928.001.patch Hi [~zjshen], [~sjlee0], I have put together a quick patch tonight that perhaps [~sjlee0] can improve upon. I have addressed Zhijie's points and accordingly added the code to store the info map from the TimelineEvent and timestamp of the TimelineEvent. The timestamp will get stored as the hbase cell timestamp. If the TimelineEvent has a null value for the timestamp, the code picks the current timestamp. I have also added unit tests to check for correctness of writing and reading these additions. There is a TODO in the junit test for the reader side (not really for the writer, so in that sense we could have a separate jira to add that in or we could add it in with this patch). Just like there is a EntityColumnPrefix#readResults to read a map and a EntityColumnPrefix#readTimeseriesResults to read the timeseries data, we need to have a EntityColumnPrefix#readResults api that can read TimelineEvent data. For now, I have added some code in the unit test itself, it needs to be refactored to be moved into the right place in EntityColumnPrefix. Hope this helps [~sjlee0] to take this forward as needed while I am OOO. thanks Vrushali > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
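To make the write-path change concrete, here is a hedged sketch of turning one TimelineEvent into an HBase Put whose cell timestamp is the event timestamp, falling back to the current time when unset. The row-key layout and qualifier encoding are assumptions, not the schema used by the patch.
{code}
import java.util.Map;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative only; the real writer is HBaseTimelineWriterImpl.
final class EventWriteSketch {
  static Put eventToPut(byte[] rowKey, byte[] infoFamily, String eventId,
                        Long eventTimestamp, Map<String, Object> eventInfo) {
    long ts = (eventTimestamp != null) ? eventTimestamp : System.currentTimeMillis();
    Put put = new Put(rowKey);
    for (Map.Entry<String, Object> e : eventInfo.entrySet()) {
      // One cell per event info entry; the HBase cell timestamp carries the
      // event timestamp so it is no longer dropped.
      byte[] qualifier = Bytes.toBytes("e!" + eventId + "=" + e.getKey());
      put.addColumn(infoFamily, qualifier, ts, Bytes.toBytes(String.valueOf(e.getValue())));
    }
    return put;
  }
}
{code}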
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621902#comment-14621902 ] Arun Suresh commented on YARN-3453: --- [~peng.zhang], bq. After review above comments, I am reminded that the case (0 GB, non-zero cores) like (non-zero GB, 0 cores) will also cause preempt more resources than necessary. I agree... But I feel instead of fixing it here, if we can have a comprehensive fix as requested by YARN-2154 ( [~kasha] and myself had an offline discussion about how we should actually break from the preemption loop when incoming requests are satisfied), then we wont even hit this case. Further more, this JIRA fixes the {{isStarved()}} method in the Queue correctly, so at the very least, the {{toPreempt}} resource object would be smaller (and thus would implicitly result in less pre-emptions) I also agree fining the ratio of demand is definitely useful. But again, let us grab all the low hanging fruit first. I propose we create a separate JIRA for that. > Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator > even in DRF mode causing thrashing > > > Key: YARN-3453 > URL: https://issues.apache.org/jira/browse/YARN-3453 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Ashwin Shankar >Assignee: Arun Suresh > Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, > YARN-3453.4.patch, YARN-3453.5.patch > > > There are two places in preemption code flow where DefaultResourceCalculator > is used, even in DRF mode. > Which basically results in more resources getting preempted than needed, and > those extra preempted containers aren’t even getting to the “starved” queue > since scheduling logic is based on DRF's Calculator. > Following are the two places : > 1. {code:title=FSLeafQueue.java|borderStyle=solid} > private boolean isStarved(Resource share) > {code} > A queue shouldn’t be marked as “starved” if the dominant resource usage > is >= fair/minshare. > 2. {code:title=FairScheduler.java|borderStyle=solid} > protected Resource resToPreempt(FSLeafQueue sched, long curTime) > {code} > -- > One more thing that I believe needs to change in DRF mode is : during a > preemption round,if preempting a few containers results in satisfying needs > of a resource type, then we should exit that preemption round, since the > containers that we just preempted should bring the dominant resource usage to > min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621883#comment-14621883 ] Peng Zhang commented on YARN-3453: -- I understood your thought. My suggestion is based on our practice: I found it's confusing to use different policy in queue configuration: eg. parent use fair, child use drf may cause child queue has no resource on cpu dimension, so job will hang there. So we use only drf in one cluster, and change the code to support setting the calculator class in scheduler scope. After review above comments, I am reminded that the case (0 GB, non-zero cores) like (non-zero GB, 0 cores) will also cause preempt more resources than necessary. I mentioned before: bq. To decrease this kind of waste, I want to found what's the ratio of demand can be fulfilled by resourceUpperBound, and use this ratio * resourceUpperBound to be targetResource. Actually, current implementation ignored the resource boundary of each requested container, so even after above logic, it still will has some waste. As for YARN-2154, if we want to only preempt containers can satisfy incoming request, IMHO, we should to do preemption for each incoming request instead count them up with {{resourceDeficit}}. > Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator > even in DRF mode causing thrashing > > > Key: YARN-3453 > URL: https://issues.apache.org/jira/browse/YARN-3453 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Ashwin Shankar >Assignee: Arun Suresh > Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, > YARN-3453.4.patch, YARN-3453.5.patch > > > There are two places in preemption code flow where DefaultResourceCalculator > is used, even in DRF mode. > Which basically results in more resources getting preempted than needed, and > those extra preempted containers aren’t even getting to the “starved” queue > since scheduling logic is based on DRF's Calculator. > Following are the two places : > 1. {code:title=FSLeafQueue.java|borderStyle=solid} > private boolean isStarved(Resource share) > {code} > A queue shouldn’t be marked as “starved” if the dominant resource usage > is >= fair/minshare. > 2. {code:title=FairScheduler.java|borderStyle=solid} > protected Resource resToPreempt(FSLeafQueue sched, long curTime) > {code} > -- > One more thing that I believe needs to change in DRF mode is : during a > preemption round,if preempting a few containers results in satisfying needs > of a resource type, then we should exit that preemption round, since the > containers that we just preempted should bring the dominant resource usage to > min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3327) if NMClientAsync stopContainer failed because of IOException, there's no chance to stopContainer again
[ https://issues.apache.org/jira/browse/YARN-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621877#comment-14621877 ] Mohammad Shahid Khan commented on YARN-3327: Hi sandflee, Can you please attach the logs so we can do more analysis? > if NMClientAsync stopContainer failed because of IOException, there's no > chance to stopContainer again > --- > > Key: YARN-3327 > URL: https://issues.apache.org/jira/browse/YARN-3327 > Project: Hadoop YARN > Issue Type: Bug > Components: api, client >Affects Versions: 2.6.0 >Reporter: sandflee > > In the AM, I use NMClientAsync to control containers. When I use stopContainer to > kill a container, an IOException happens; then if I call stopContainer again, nothing > happens, because the container managed by NMClientAsync came to the FAILED state. > In my opinion, an IOException just means a temporary error; it shouldn't come to the > FAILED state, especially when NM Restart is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
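While the proper fix would belong in NMClientAsync itself, below is a hedged sketch of an AM-side workaround for the behaviour described above: keep a container-to-node map in the AM and fall back to the blocking NMClient when the asynchronous stop fails with an IOException. Class and field names are illustrative; only the NMClientAsync.CallbackHandler and NMClient.stopContainer APIs are taken from YARN.
{code:title=RetryingStopHandlerSketch.java|borderStyle=solid}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

// Sketch only: remember where each container runs and retry the stop through
// a blocking NMClient when the async stop fails with a (likely transient)
// IOException.
public class RetryingStopHandlerSketch implements NMClientAsync.CallbackHandler {

  private final Map<ContainerId, NodeId> nodes =
      new ConcurrentHashMap<ContainerId, NodeId>();
  private final NMClient syncClient; // init()/start() handled by the AM

  public RetryingStopHandlerSketch(NMClient syncClient) {
    this.syncClient = syncClient;
  }

  public void rememberNode(ContainerId id, NodeId node) {
    nodes.put(id, node);
  }

  @Override
  public void onStopContainerError(ContainerId containerId, Throwable t) {
    NodeId node = nodes.get(containerId);
    if (t instanceof IOException && node != null) {
      try {
        // Fall back to the synchronous client; a real AM would bound retries.
        syncClient.stopContainer(containerId, node);
      } catch (Exception e) {
        // Fallback failed as well; a real AM would log and surface this.
      }
    }
  }

  // Remaining callbacks are no-ops for this sketch.
  @Override public void onContainerStarted(ContainerId id, Map<String, ByteBuffer> resp) { }
  @Override public void onContainerStatusReceived(ContainerId id, ContainerStatus status) { }
  @Override public void onContainerStopped(ContainerId id) { nodes.remove(id); }
  @Override public void onStartContainerError(ContainerId id, Throwable t) { }
  @Override public void onGetContainerStatusError(ContainerId id, Throwable t) { }
}
{code}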
[jira] [Commented] (YARN-3771) "final" behavior is not honored for YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[]
[ https://issues.apache.org/jira/browse/YARN-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621878#comment-14621878 ] Hadoop QA commented on YARN-3771: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 48s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 11s | The applied patch generated 4 new checkstyle issues (total was 211, now 201). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 0m 46s | Tests passed in hadoop-mapreduce-client-common. | | {color:green}+1{color} | mapreduce tests | 107m 5s | Tests passed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 2s | Tests passed in hadoop-yarn-applications-distributedshell. | | | | 160m 21s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12737924/0001-YARN-3771.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 5214876 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8490/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-mapreduce-client-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8490/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/8490/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8490/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8490/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8490/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8490/console | This message was automatically generated. 
> "final" behavior is not honored for > YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH since it is a String[] > > > Key: YARN-3771 > URL: https://issues.apache.org/jira/browse/YARN-3771 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Assignee: nijel > Attachments: 0001-YARN-3771.patch > > > i was going through some find bugs rules. One issue reported in that is > public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = { > and > public static final String[] > DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH= > is not honoring the final qualifier. The string array contents can be re > assigned ! > Simple test > {code} > public class TestClass { > static final String[] t = { "1", "2" }; > public static void main(String[] args) { > System.out.println(12 < 10); > String[] t1={"u"}; > //t = t1; // this will show compilation error > t (1) = t1 (1) ; // But this works > } > } > {code} > One option is to use Collections.unmodifiableList > any thoughts ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)