[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask
[ https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255175#comment-15255175 ] Hudson commented on YARN-4335: -- FAILURE: Integrated in Hadoop-trunk-Commit #9660 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9660/]) YARN-4335. Allow ResourceRequests to specify ExecutionType of a request (arun suresh: rev b2a654c5ee6524f81c971ea0b70e58ea0a455f1d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java > Allow ResourceRequests to specify ExecutionType of a request ask > > > Key: YARN-4335 > URL: https://issues.apache.org/jira/browse/YARN-4335 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 3.0.0 > > Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch, > YARN-4335.003.patch > > > YARN-2882 introduced container types that are internal (not user-facing) and > are used by the ContainerManager during execution at the NM. > With this JIRA we are introducing (user-facing) resource request types that > are used by the AM to specify the type of the ResourceRequest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask
[ https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255173#comment-15255173 ] Arun Suresh commented on YARN-4335: --- Committed this to trunk.. > Allow ResourceRequests to specify ExecutionType of a request ask > > > Key: YARN-4335 > URL: https://issues.apache.org/jira/browse/YARN-4335 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 3.0.0 > > Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch, > YARN-4335.003.patch > > > YARN-2882 introduced container types that are internal (not user-facing) and > are used by the ContainerManager during execution at the NM. > With this JIRA we are introducing (user-facing) resource request types that > are used by the AM to specify the type of the ResourceRequest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask
[ https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-4335: -- Fix Version/s: 3.0.0 > Allow ResourceRequests to specify ExecutionType of a request ask > > > Key: YARN-4335 > URL: https://issues.apache.org/jira/browse/YARN-4335 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 3.0.0 > > Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch, > YARN-4335.003.patch > > > YARN-2882 introduced container types that are internal (not user-facing) and > are used by the ContainerManager during execution at the NM. > With this JIRA we are introducing (user-facing) resource request types that > are used by the AM to specify the type of the ResourceRequest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3150) [Documentation] Documenting the timeline service v2
[ https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255115#comment-15255115 ] Hadoop QA commented on YARN-3150: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 18s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 49s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 30s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 2s {color} | {color:green} YARN-2928 passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped branch modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 15s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 7s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patch modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 11s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {col
[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255109#comment-15255109 ] Hadoop QA commented on YARN-4390: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 12 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 30 new + 505 unchanged - 15 fixed = 535 total (was 520) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 17s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 37s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 12s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 170m 29s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | Should org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator$TQComparator be a _static_ inner class? At PreemptableResourceCalculator.java:inner class? At PreemptableResourceCalculator.java:[lines
[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255050#comment-15255050 ] Jian He commented on YARN-4390: --- Some comments after scanning the patch: - It looks like the approach is to loop over almost all containers in the cluster several times for every preemption cycle (3 secs by default), to see whether some containers can be preempted to make room for the reserved container on the same node. Will this cause too much overhead in a large cluster with a large number of containers? A unit test could measure the time cost of this mega loop. - Unnecessary line breaks are added in multiple places; could you clean those up, especially in the PreemptableResourceCalculator class? - Is this equal to node.getUnallocatedResource? {code} for (RMContainer c : sortedRunningContainers) { Resources.subtractFrom(available, c.getAllocatedResource()); } {code} - Isn't FifoCandidatesSelector the first selector, so that selectedCandidates is empty? {code} // Previous selectors (with higher priority) could have already // selected containers. We need to deduct preemptable resources // based on already selected candidates. CapacitySchedulerPreemptionUtils .deductPreemptableResourcesBasedSelectedCandidates(preemptionContext, selectedCandidates); {code} > Do surgical preemption based on reserved container in CapacityScheduler > --- > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, > YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, > YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
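To make the reviewer's question above concrete, here is a minimal, hypothetical sketch (not code from the attached patches) of the equivalence being asked about: subtracting every running container's allocation from the node's total resource should, assuming nothing else is charged against the node outside that list, match the value the scheduler node already tracks. The variable names simply mirror the snippet quoted in the comment.

{code}
// Illustrative only -- not code from YARN-4390. Types come from
// org.apache.hadoop.yarn.api.records (Resource),
// org.apache.hadoop.yarn.server.resourcemanager.rmcontainer (RMContainer)
// and org.apache.hadoop.yarn.util.resource (Resources).
Resource available = Resources.clone(node.getTotalResource());
for (RMContainer c : sortedRunningContainers) {
  Resources.subtractFrom(available, c.getAllocatedResource());
}
// If no reservations or other allocations are tracked outside this list,
// 'available' should equal the scheduler's own bookkeeping:
Resource unallocated = node.getUnallocatedResource();
{code}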
[jira] [Commented] (YARN-4807) MockAM#waitForState sleep duration is too long
[ https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255000#comment-15255000 ] Hadoop QA commented on YARN-4807: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 22 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 43s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 21s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 138m 58s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.serv
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254970#comment-15254970 ] Hitesh Shah commented on YARN-4844: --- Additionally, we are not talking about use in production, but rather about making upstream apps change as needed to work with 3.x and, over time, stabilizing 3.x. Making an API change earlier rather than later is actually better, as the API changes in this case have no relevance to production stability. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now; if a cluster has 10k nodes and each node has 210G of > memory, we will get a negative total cluster memory. > Another case that overflows int32 even more easily: we add all pending > resources of running apps to the cluster's total pending resources. If a > problematic app requests too many resources (say 1M+ containers, each of them > 3G), int32 will not be enough. > Even if we cap each app's pending request, we cannot handle the case where > there are many running apps, each with a capped but still significant > amount of pending resources. > So we may need to upgrade the int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
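As a quick illustration of the overflow this issue describes (illustrative arithmetic only, not code from any attached patch): with memory tracked in megabytes as a 32-bit int, 10k nodes at 210G each already exceed Integer.MAX_VALUE.

{code}
// Illustrative only: 10,000 nodes x 210 GB per node, memory tracked in MB.
int memoryPerNodeMb = 210 * 1024;                    // 215,040 MB per node
int nodes = 10000;
int totalMbAsInt = memoryPerNodeMb * nodes;          // 2,150,400,000 > Integer.MAX_VALUE (2,147,483,647): overflows
long totalMbAsLong = (long) memoryPerNodeMb * nodes; // widened to 64 bits before multiplying
System.out.println(totalMbAsInt);                    // prints a negative number
System.out.println(totalMbAsLong);                   // prints 2150400000
{code}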
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254967#comment-15254967 ] Hitesh Shah commented on YARN-4844: --- bq. considering there are hundreds of blockers and criticals for the 3.0.0 release, nobody will actually use the new release in production even if 3.0-alpha can be released. We can mark the Resource API in trunk as unstable and update it in future 3.x releases. So the plan is to force users to change their usage of these APIs in some version of 3.x but not in 3.0.0? > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now; if a cluster has 10k nodes and each node has 210G of > memory, we will get a negative total cluster memory. > Another case that overflows int32 even more easily: we add all pending > resources of running apps to the cluster's total pending resources. If a > problematic app requests too many resources (say 1M+ containers, each of them > 3G), int32 will not be enough. > Even if we cap each app's pending request, we cannot handle the case where > there are many running apps, each with a capped but still significant > amount of pending resources. > So we may need to upgrade the int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4920) ATS/NM should support a link to download/get the logs in text format
[ https://issues.apache.org/jira/browse/YARN-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-4920: --- Assignee: Xuan Gong > ATS/NM should support a link to download/get the logs in text format > --- > > Key: YARN-4920 > URL: https://issues.apache.org/jira/browse/YARN-4920 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
[ https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4983: Attachment: YARN-4983-trunk.002.patch Added a unit test for the standby metrics. The UT failures in the previous run appear to be independent of the changes in this patch. Trying them one more time. > JVM and UGI metrics disappear after RM is once transitioned to standby mode > --- > > Key: YARN-4983 > URL: https://issues.apache.org/jira/browse/YARN-4983 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch, > YARN-4983-trunk.002.patch > > > When transitioned to standby, the RM shuts down the existing metrics > system and launches a new one. This causes the JVM metrics and UGI > metrics to go missing from the new metrics system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
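To make the failure mode concrete, here is a minimal sketch of the sequence described above (illustrative only, not the attached patch); the class names are from the Hadoop metrics2 framework, and the missing sources are the JVM/UGI sources this issue is about.

{code}
// Illustrative sketch of the failure mode only.
// Classes: org.apache.hadoop.metrics2.MetricsSystem,
//          org.apache.hadoop.metrics2.lib.DefaultMetricsSystem,
//          org.apache.hadoop.metrics2.source.JvmMetrics.
JvmMetrics.initSingleton("ResourceManager", null);   // registered once at startup
DefaultMetricsSystem.shutdown();                     // happens when the RM transitions to standby
MetricsSystem ms = DefaultMetricsSystem.initialize("ResourceManager"); // fresh, empty system
// 'ms' now has no JVM or UGI source registered; unless those sources are
// explicitly re-registered after this re-initialization, their metrics disappear.
{code}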
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254944#comment-15254944 ] Hadoop QA commented on YARN-4844: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 53 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 22s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 3s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 7s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 50s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 61 new + 1404 unchanged - 47 fixed = 1465 total (was 1451) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 50s {color} | {color:green} the patch passed 
{color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) with tabs. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 47s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 55s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s {color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} |
[jira] [Commented] (YARN-3150) [Documentation] Documenting the timeline service v2
[ https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254914#comment-15254914 ] Sangjin Lee commented on YARN-3150: --- [~gtCarrera9]: bq. For the list of timeline v2 configs, maybe we'd like to distinguish the configs that we adopt from existing ATS v1.x configs and the newly introduced configs? We may want to stress the overridden configs. I introduced a column that marks whether the config is new in v.2 as opposed to existing. See if that format works. bq. Maybe we'd like to have a few more sentences about the timeline schema creator? There are some "hidden" functions that might be interesting. I did add a sentence about skipping existing tables. I didn't document the rest of the options as I thought those options are mostly geared towards us (TS v.2 developers) rather than general developers/users. Let me know your thoughts. bq. We may want to clarify the meaning of "system metrics" and "container metrics" in the document. When readers have some v1 background, it may be helpful to distinguish a few wordings in the document: "system metrics" vs. "application history data" in AHS, "container metrics" vs. the old "public container metrics" option in v1. I tried to clean up the terminology. I am mostly using "system metrics" to refer to YARN-generated metrics. "Container metrics" are not entirely accurate as we are aggregating them to be at the app level, flow level, etc. While we're at it, I did notice one of the config properties was not described correctly. {{yarn.rm.system-metrics-publisher.emit-container-events}} is about RM publisher emitting container *events*, not *metrics*. I corrected the description and related variable/method names. cc [~Naganarasimha] bq. We may want to explicitly mention in the "Publishing application specific data" section that this section is mainly for YARN application programmers, but not for cluster operators. Added a sentence. bq. Note the programmers that the return value of v2 APIs are changed to void? Good point. Added a couple of sentences. bq. Maybe we can be more precise about the "reasonable defaults" for flow contexts? Done. bq. We need separate docs for the REST APIs in the future. Right now the REST API doc is just a simple reference. I changed a word there to say "informal". Yes, this is not a complete REST API description. I'm not quite sure if we're at a point where we can generate a complete reference for that yet. So that will have to wait a little... I also added some more about the high level architecture and a diagram. > [Documentation] Documenting the timeline service v2 > --- > > Key: YARN-3150 > URL: https://issues.apache.org/jira/browse/YARN-3150 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > Attachments: TimelineServiceV2.html, YARN-3150-YARN-2928.01.patch, > YARN-3150-YARN-2928.02.patch > > > Let's make sure we will have a document to describe what's new in TS v2, the > APIs, the client libs and so on. We should do better around documentation in > v2 than v1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3150) [Documentation] Documenting the timeline service v2
[ https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3150: -- Attachment: YARN-3150-YARN-2928.02.patch Posted patch v.2. Addressed most of Li's comments. We still need to add some more regarding setting up HBase. More to come. To generate the html, go to {{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site}}, and do {{mvn site}}. > [Documentation] Documenting the timeline service v2 > --- > > Key: YARN-3150 > URL: https://issues.apache.org/jira/browse/YARN-3150 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > Attachments: TimelineServiceV2.html, YARN-3150-YARN-2928.01.patch, > YARN-3150-YARN-2928.02.patch > > > Let's make sure we will have a document to describe what's new in TS v2, the > APIs, the client libs and so on. We should do better around documentation in > v2 than v1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254897#comment-15254897 ] Wangda Tan commented on YARN-4844: -- [~hitesh], considering there are hundreds of blockers and criticals for the 3.0.0 release, nobody will actually use the new release in production even if 3.0-alpha can be released. We can mark the Resource API in trunk as unstable and update it in future 3.x releases. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now; if a cluster has 10k nodes and each node has 210G of > memory, we will get a negative total cluster memory. > Another case that overflows int32 even more easily: we add all pending > resources of running apps to the cluster's total pending resources. If a > problematic app requests too many resources (say 1M+ containers, each of them > 3G), int32 will not be enough. > Even if we cap each app's pending request, we cannot handle the case where > there are many running apps, each with a capped but still significant > amount of pending resources. > So we may need to upgrade the int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask
[ https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254877#comment-15254877 ] Arun Suresh commented on YARN-4335: --- ping [~leftnoteasy], [~kasha].. Planning on pushing this to trunk if you guys have no reservations.. > Allow ResourceRequests to specify ExecutionType of a request ask > > > Key: YARN-4335 > URL: https://issues.apache.org/jira/browse/YARN-4335 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch, > YARN-4335.003.patch > > > YARN-2882 introduced container types that are internal (not user-facing) and > are used by the ContainerManager during execution at the NM. > With this JIRA we are introducing (user-facing) resource request types that > are used by the AM to specify the type of the ResourceRequest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails
[ https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254872#comment-15254872 ] Hadoop QA commented on YARN-4556: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 8m 59s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 15s {color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} branch-2.7 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} branch-2.7 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 11s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in branch-2.7 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} branch-2.7 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1890 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 49s {color} | {color:red} The patch has 256 line(s) with tabs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 32s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 18s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 2m 16s {color} | {color:red} Patch generated 61 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 129m 23s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resource
[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
[ https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254865#comment-15254865 ] Hadoop QA commented on YARN-4983: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 29s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 8s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 12s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 6s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 6s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 11s {color} | {color:red} root: patch generated 1 new + 173 unchanged - 1 fixed = 174 total (was 174) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 50s {color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 39s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 55s {color} | {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 46s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s {col
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254861#comment-15254861 ] Wangda Tan commented on YARN-3215: -- Thanks [~Naganarasimha], committing the patch now. > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-3215.branch-2.8.v2.002.patch, > YARN-3215.v1.001.patch, YARN-3215.v2.001.patch, YARN-3215.v2.002.patch, > YARN-3215.v2.003.patch, YARN-3215.v2.branch-2.8.patch > > > In existing CapacityScheduler, when computing headroom of an application, it > will only consider "non-labeled" nodes of this application. > But it is possible the application is asking for labeled resources, so > headroom-by-label (like 5G resource available under node-label=red) is > required to get better resource allocation and avoid deadlocks such as > MAPREDUCE-5928. > This JIRA could involve both API changes (such as adding a > label-to-available-resource map in AllocateResponse) and also internal > changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254859#comment-15254859 ] Wangda Tan commented on YARN-4390: -- Sure, please [~eepayne]. Would really appreciate if you can get some feedbacks in early next week. I hope this can get in soon :). > Do surgical preemption based on reserved container in CapacityScheduler > --- > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, > YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, > YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4390: - Summary: Do surgical preemption based on reserved container in CapacityScheduler (was: Consider container request size during CS preemption) > Do surgical preemption based on reserved container in CapacityScheduler > --- > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, > YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, > YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4390: - Attachment: YARN-4390.5.patch Rebased to latest trunk, added a couple of tests, and simplified calculator code a little as suggested offline by [~jianhe]. (ver.5) > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, > YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, > YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-2885: -- Attachment: YARN-2885.010.patch This is a combo patch of YARN-2885 and YARN-4335 rebased against trunk to see if Jenkins is fine. > Create AMRMProxy request interceptor for distributed scheduling decisions for > queueable containers > -- > > Key: YARN-2885 > URL: https://issues.apache.org/jira/browse/YARN-2885 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2885-yarn-2877.001.patch, > YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, > YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, > YARN-2885-yarn-2877.v4.patch, YARN-2885-yarn-2877.v5.patch, > YARN-2885-yarn-2877.v6.patch, YARN-2885-yarn-2877.v7.patch, > YARN-2885-yarn-2877.v8.patch, YARN-2885-yarn-2877.v9.patch, > YARN-2885.010.patch, YARN-2885_api_changes.patch > > > We propose to add a Local ResourceManager (LocalRM) to the NM in order to > support distributed scheduling decisions. > Architecturally we leverage the RMProxy, introduced in YARN-2884. > The LocalRM makes distributed decisions for queueable container requests. > Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4851) Metric improvements for ATS v1.5 storage components
[ https://issues.apache.org/jira/browse/YARN-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254750#comment-15254750 ] Hitesh Shah commented on YARN-4851: --- Some general comments on usability (I have not reviewed the patch in detail): - Names need a bit of work, e.g. SummaryDataReadTimeNumOps and SummaryDataReadTimeAvgTime - it is not clear why NumOps is attached to ReadTime, and the "Time" in ReadTimeAvgTime seems redundant. - Would be good to have the scale in there, i.e. is time in millis or seconds? - Updates to the timeline server docs for these metrics seem to be missing. - What is the difference between CacheRefreshTimeNumOps and CacheRefreshOps? - Likewise for LogCleanTimeNumOps vs LogsDirsCleaned, or PutDomainTimeNumOps vs PutDomainOps. - Are cache eviction rates needed? - How do we get a count of how many cache refreshes were due to stale data vs never cached/evicted earlier? Do we need this? - Should there be 2 levels of metrics - one group enabled by default and a second group for more detailed monitoring, to reduce load on the metrics system? - Would be good to understand the request count at the ATSv1.5 level itself, to understand which calls end up going to summary vs cache vs fs-based lookups (i.e. across all gets). - At the overall ATS level, an overall avg latency across all reqs might be useful for a general health check. > Metric improvements for ATS v1.5 storage components > --- > > Key: YARN-4851 > URL: https://issues.apache.org/jira/browse/YARN-4851 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4851-trunk.001.patch, YARN-4851-trunk.002.patch > > > We can add more metrics to the ATS v1.5 storage systems, including purging, > cache hit/misses, read latency, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4807) MockAM#waitForState sleep duration is too long
[ https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-4807: --- Attachment: YARN-4807.015.patch > MockAM#waitForState sleep duration is too long > -- > > Key: YARN-4807 > URL: https://issues.apache.org/jira/browse/YARN-4807 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Yufei Gu > Labels: newbie > Attachments: YARN-4807.001.patch, YARN-4807.002.patch, > YARN-4807.003.patch, YARN-4807.004.patch, YARN-4807.005.patch, > YARN-4807.006.patch, YARN-4807.007.patch, YARN-4807.008.patch, > YARN-4807.009.patch, YARN-4807.010.patch, YARN-4807.011.patch, > YARN-4807.012.patch, YARN-4807.013.patch, YARN-4807.014.patch, > YARN-4807.015.patch > > > MockAM#waitForState sleep duration (500 ms) is too long. Also, there is > significant duplication with MockRM#waitForState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
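A rough sketch of the direction this JIRA points in: replace the fixed 500 ms sleep with a shorter poll plus a deadline, and put the loop in one shared helper so MockAM and MockRM stop duplicating it. The generic helper below is an assumption for illustration, not the actual MockAM/MockRM API.
{code}
import java.util.function.Supplier;

public final class WaitForStateSketch {
  private WaitForStateSketch() {
  }

  /** Poll until the supplier returns the expected value or the timeout expires. */
  public static <T> void waitForState(Supplier<T> current, T expected,
      long pollMillis, long timeoutMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!expected.equals(current.get())) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("Timed out waiting for " + expected
            + ", last state was " + current.get());
      }
      Thread.sleep(pollMillis); // e.g. 50 ms instead of a fixed 500 ms
    }
  }
}
{code}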
[jira] [Created] (YARN-4990) Re-direction of a particular log file within a container in NM UI does not redirect properly to Log Server ( history ) on container completion
Hitesh Shah created YARN-4990: - Summary: Re-direction of a particular log file within a container in NM UI does not redirect properly to Log Server ( history ) on container completion Key: YARN-4990 URL: https://issues.apache.org/jira/browse/YARN-4990 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah The NM does the redirection to the history server correctly. However, if the user is viewing, or has a link to, a particular file, the redirect ends up going to the top-level page for the container instead of the specific file. Additionally, the start parameter that shows logs from offset 0 also goes missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy
[ https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-4766: - Attachment: yarn4766.004.patch > NM should not aggregate logs older than the retention policy > > > Key: YARN-4766 > URL: https://issues.apache.org/jira/browse/YARN-4766 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: yarn4766.001.patch, yarn4766.002.patch, > yarn4766.003.patch, yarn4766.004.patch > > > When a log aggregation fails on the NM, the information for the attempt is > kept in the recovery DB. Log aggregation can fail for multiple reasons which > are often related to HDFS space or permissions. > On restart the recovery DB is read and if an application attempt needs its > logs aggregated, the files are scheduled for aggregation without any checks. > The log files could be older than the retention limit in which case we should > not aggregate them but immediately mark them for deletion from the local file > system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
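A minimal sketch of the check the description asks for, assuming the existing yarn.log-aggregation.retain-seconds setting is the retention limit being consulted; the class and method names are placeholders rather than code from the attached patches.
{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;

public class RetentionAwareUploadSketch {
  /** Decide whether a recovered local log file should still be aggregated. */
  static boolean shouldUpload(File logFile, Configuration conf, long nowMillis) {
    long retainSecs = conf.getLong("yarn.log-aggregation.retain-seconds", -1);
    if (retainSecs < 0) {
      return true; // retention disabled: keep today's behavior and upload everything
    }
    long ageMillis = nowMillis - logFile.lastModified();
    // Older than the retention limit: skip the upload and let the caller
    // schedule the file for local deletion instead.
    return ageMillis <= retainSecs * 1000L;
  }
}
{code}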
[jira] [Commented] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254626#comment-15254626 ] Hadoop QA commented on YARN-4390: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 12 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 28 new + 506 unchanged - 15 fixed = 534 total (was 521) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 15s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 10s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 179m 35s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_95 Timed out junit tests | org
[jira] [Commented] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
[ https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254625#comment-15254625 ] Hadoop QA commented on YARN-4984: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 51s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 22s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 37m 29s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12800285/YARN-4984-v2.patch | | JIRA Issue | YARN-4984 | | Optional Tests | asflicense
[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy
[ https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254624#comment-15254624 ] Haibo Chen commented on YARN-4766: -- @Robert Kanter, thanks very much for your review. I have addressed all issues in the latest patch. For #6, I didn't follow your comments exactly. Instead, I added a new method that takes the configs and the expected files. testAggregatorWithRetentionPolicyDisabled_shouldUploadAllFiles and testAggregatorWhenNoFileOlderThanRetentionPolicy_ShouldUploadAll are still very much alike, but most of the code duplication is removed. > NM should not aggregate logs older than the retention policy > > > Key: YARN-4766 > URL: https://issues.apache.org/jira/browse/YARN-4766 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: yarn4766.001.patch, yarn4766.002.patch, > yarn4766.003.patch > > > When a log aggregation fails on the NM, the information for the attempt is > kept in the recovery DB. Log aggregation can fail for multiple reasons which > are often related to HDFS space or permissions. > On restart the recovery DB is read and if an application attempt needs its > logs aggregated, the files are scheduled for aggregation without any checks. > The log files could be older than the retention limit in which case we should > not aggregate them but immediately mark them for deletion from the local file > system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
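A sketch of the kind of shared helper Haibo describes, with every name invented here for illustration: each test passes its configuration and the set of files it expects to be uploaded, so the assertion logic exists once.
{code}
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import static org.junit.Assert.assertEquals;

abstract class AggregationRetentionTestHelperSketch {
  /** Run aggregation under the given config and compare the uploaded file set. */
  void verifyFilesUploaded(Configuration conf, Set<String> expectedUploads)
      throws Exception {
    Set<String> uploaded = runAggregationAndListUploadedFiles(conf);
    assertEquals("unexpected set of aggregated files", expectedUploads, uploaded);
  }

  /** Each test supplies its own setup; elided in this sketch. */
  abstract Set<String> runAggregationAndListUploadedFiles(Configuration conf)
      throws Exception;
}
{code}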
[jira] [Commented] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails
[ https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254525#comment-15254525 ] Eric Badger commented on YARN-4556: --- [~eepayne], please review this patch and commit to branch-2.7 if you think it looks good. > TestFifoScheduler.testResourceOverCommit fails > --- > > Key: YARN-4556 > URL: https://issues.apache.org/jira/browse/YARN-4556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Akihiro Suda >Assignee: Akihiro Suda > Fix For: 2.8.0 > > Attachments: YARN-4556-1.patch, YARN-4556-branch-2.7.001.patch > > > From YARN-4548 Jenkins log: > https://builds.apache.org/job/PreCommit-YARN-Build/10181/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt > {code} > Running > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.004 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > testResourceOverCommit(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler) > Time elapsed: 4.746 sec <<< FAILURE! > java.lang.AssertionError: expected:<-2048> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler.testResourceOverCommit(TestFifoScheduler.java:1142) > {code} > https://github.com/apache/hadoop/blob/8676a118a12165ae5a8b80a2a4596c133471ebc1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java#L1142 > It seems that Jenkins has been hitting this intermittently since April 2015 > https://www.google.com/search?q=TestFifoScheduler.testResourceOverCommit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails
[ https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger reopened YARN-4556: --- Adding a 2.7 patch. > TestFifoScheduler.testResourceOverCommit fails > --- > > Key: YARN-4556 > URL: https://issues.apache.org/jira/browse/YARN-4556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Akihiro Suda >Assignee: Akihiro Suda > Fix For: 2.8.0 > > Attachments: YARN-4556-1.patch, YARN-4556-branch-2.7.001.patch > > > From YARN-4548 Jenkins log: > https://builds.apache.org/job/PreCommit-YARN-Build/10181/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt > {code} > Running > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.004 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > testResourceOverCommit(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler) > Time elapsed: 4.746 sec <<< FAILURE! > java.lang.AssertionError: expected:<-2048> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler.testResourceOverCommit(TestFifoScheduler.java:1142) > {code} > https://github.com/apache/hadoop/blob/8676a118a12165ae5a8b80a2a4596c133471ebc1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java#L1142 > It seems that Jenkins has been hitting this intermittently since April 2015 > https://www.google.com/search?q=TestFifoScheduler.testResourceOverCommit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails
[ https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-4556: -- Attachment: YARN-4556-branch-2.7.001.patch > TestFifoScheduler.testResourceOverCommit fails > --- > > Key: YARN-4556 > URL: https://issues.apache.org/jira/browse/YARN-4556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Akihiro Suda >Assignee: Akihiro Suda > Fix For: 2.8.0 > > Attachments: YARN-4556-1.patch, YARN-4556-branch-2.7.001.patch > > > From YARN-4548 Jenkins log: > https://builds.apache.org/job/PreCommit-YARN-Build/10181/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt > {code} > Running > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.004 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > testResourceOverCommit(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler) > Time elapsed: 4.746 sec <<< FAILURE! > java.lang.AssertionError: expected:<-2048> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler.testResourceOverCommit(TestFifoScheduler.java:1142) > {code} > https://github.com/apache/hadoop/blob/8676a118a12165ae5a8b80a2a4596c133471ebc1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java#L1142 > It seems that Jenkins has been hitting this intermittently since April 2015 > https://www.google.com/search?q=TestFifoScheduler.testResourceOverCommit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters
[ https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254507#comment-15254507 ] Ray Chiang commented on YARN-4971: -- +1 (nonbinding). The only new test I can think of would be to verify that the member variable address stays at 0.0.0.0 if it's initially 0.0.0.0--mainly useful as a "spec" for the class behavior. > RM fails to re-bind to wildcard IP after failover in multi homed clusters > - > > Key: YARN-4971 > URL: https://issues.apache.org/jira/browse/YARN-4971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg > Attachments: YARN-4971.1.patch > > > If the RM has the {{yarn.resourcemanager.bind-host}} set to 0.0.0.0, the first > time the service becomes active, binding to the wildcard works as expected. If > the service has transitioned from active to standby and then becomes active > again after failover, the service only binds to one of the IP addresses. > There is a difference between the services inside the RM: it only seems to > happen for the services listening on ports 8030 and 8032 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
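A toy sketch of the behavior Ray suggests pinning down, written under the assumption that the symptom comes from reusing a previously resolved address when the service is re-initialized after a failover; the fields and method are invented for illustration and are not the RM's actual code.
{code}
import java.net.InetSocketAddress;

class WildcardRebindSketch {
  private InetSocketAddress serverAddress;

  /** Re-derive the listen address each time the service becomes active. */
  void rebind(String configuredBindHost, int port, InetSocketAddress resolved) {
    if ("0.0.0.0".equals(configuredBindHost)) {
      // Configured for the wildcard: stay on it, even if an earlier activation
      // left a concrete node address behind.
      serverAddress = new InetSocketAddress("0.0.0.0", port);
    } else {
      serverAddress = resolved; // multi-homed host with an explicit bind address
    }
  }

  InetSocketAddress getServerAddress() {
    return serverAddress;
  }
}
{code}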
[jira] [Updated] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
[ https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4984: - Attachment: YARN-4984-v2.patch Thanks [~leftnoteasy] for review and comments! bq. We may need to remove following statement as well. Nice catch. Remove this unnecessary code in v2 patch. > LogAggregationService shouldn't swallow exception in handling createAppDir() > which cause thread leak. > - > > Key: YARN-4984 > URL: https://issues.apache.org/jira/browse/YARN-4984 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.2 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4984-v2.patch, YARN-4984.patch > > > Due to YARN-4325, many stale applications still exists in NM state store and > get recovered after NM restart. The app initiation will get failed due to > token invalid, but exception is swallowed and aggregator thread is still > created for invalid app. > Exception is: > {noformat} > 158 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService > (LogAggregationService.java:run(300)) - Failed to setup application log > directory for application_1448060878692_11842 > 159 > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be fo > und in cache > 160 at org.apache.hadoop.ipc.Client.call(Client.java:1427) > 161 at org.apache.hadoop.ipc.Client.call(Client.java:1358) > 162 at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > 163 at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source) > 164 at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771) > 165 at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown > Source) > 166 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 167 at java.lang.reflect.Method.invoke(Method.java:606) > 168 at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252) > 169 at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > 170 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > 171 at > org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116) > 172 at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315) > 173 at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311) > 174 at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > 175 at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311) > 176 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248) > 177 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67) > 178 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > 179 at java.security.AccessController.doPrivileged(Native Method) > 180 at javax.security.auth.Subject.doAs(Subject.java:415) > 181 at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > 182 at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261) > 183 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367) > 184 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > 185 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447) > 186 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
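A self-contained sketch of the leak being described, with placeholder names rather than the real LogAggregationService members: if the exception from createAppDir() is caught and only logged, control falls through and an aggregator thread is still created for an application whose delegation token is invalid; returning (or rethrowing) before that point avoids the leak.
{code}
class LogAggregationInitSketch {
  void initApp(String appId) {
    try {
      createAppDir(appId); // remote FS call; can fail with an invalid-token error
    } catch (Exception e) {
      System.err.println("Failed to set up log dir for " + appId + ": " + e);
      return; // do not create an aggregator thread for an app we cannot serve
    }
    startAppLogAggregator(appId); // only reached when the app dir really exists
  }

  void createAppDir(String appId) throws Exception {
    // HDFS directory creation elided in this sketch
  }

  void startAppLogAggregator(String appId) {
    // per-application aggregator thread creation elided in this sketch
  }
}
{code}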
[jira] [Commented] (YARN-4846) Random failures for TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers
[ https://issues.apache.org/jira/browse/YARN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254480#comment-15254480 ] Hudson commented on YARN-4846: -- FAILURE: Integrated in Hadoop-trunk-Commit #9656 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9656/]) YARN-4846. Fix random failures for (wangda: rev 7cb3a3da96e59fc9b6528644dae5fb0ac1e44eac) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerPreemption.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java > Random failures for > TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers > > > Key: YARN-4846 > URL: https://issues.apache.org/jira/browse/YARN-4846 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Fix For: 2.9.0 > > Attachments: 0001-YARN-4846.patch, 0002-YARN-4846.patch, > 0003-YARN-4846.patch, 0004-YARN-4846.patch, YARN-4846-update-PCPP.patch > > > {noformat} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPreemption.testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers(TestCapacitySchedulerPreemption.java:473) > {noformat} > https://builds.apache.org/job/PreCommit-YARN-Build/10826/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacitySchedulerPreemption/testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4851) Metric improvements for ATS v1.5 storage components
[ https://issues.apache.org/jira/browse/YARN-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4851: Target Version/s: 2.8.0 > Metric improvements for ATS v1.5 storage components > --- > > Key: YARN-4851 > URL: https://issues.apache.org/jira/browse/YARN-4851 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4851-trunk.001.patch, YARN-4851-trunk.002.patch > > > We can add more metrics to the ATS v1.5 storage systems, including purging, > cache hit/misses, read latency, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4717) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
[ https://issues.apache.org/jira/browse/YARN-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254448#comment-15254448 ] Vinod Kumar Vavilapalli commented on YARN-4717: --- [~templedf] / [~rkanter], does this exist on previous branches too? If so, can this be backported to 2.8.0 / 2.7.x etc? > TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails > Intermittently due to IllegalArgumentException from cleanup > --- > > Key: YARN-4717 > URL: https://issues.apache.org/jira/browse/YARN-4717 > Project: Hadoop YARN > Issue Type: Test > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Minor > Fix For: 2.9.0 > > Attachments: YARN-4717.001.patch > > > The same issue that was resolved by [~zxu] in YARN-3602 is back. Looks like > the commons-io package throws an IAE instead of an IOE now if the directory > doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4717) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup
[ https://issues.apache.org/jira/browse/YARN-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4717: -- Issue Type: Test (was: Bug) > TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails > Intermittently due to IllegalArgumentException from cleanup > --- > > Key: YARN-4717 > URL: https://issues.apache.org/jira/browse/YARN-4717 > Project: Hadoop YARN > Issue Type: Test > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Minor > Fix For: 2.9.0 > > Attachments: YARN-4717.001.patch > > > The same issue that was resolved by [~zxu] in YARN-3602 is back. Looks like > the commons-io package throws an IAE instead of an IOE now if the directory > doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254441#comment-15254441 ] Vinod Kumar Vavilapalli commented on YARN-4599: --- bq. We are likely better off setting hard limit for all yarn containers so they don't interfere anything else on the machine. We could disable OOM control on the cgroup corresponding to all yarn containers (not including NM) and if all containers are paused, the NM can decide what tasks to kill. This is particularly useful if we are oversubscribing the node. This seems like our only choice, given that none of the options to recover (when the per-container-limit is hit and when OOM-killer is disabled) are usable in practice for YARN containers. /cc [~sidharta-s], [~shanekumpf] > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support. We should also explicitly > set OOM control so that containers are not killed as soon as they go over > their usage. Today, one could set the swappiness to control this, but > clusters with swap turned off exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3602) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IOException from cleanup
[ https://issues.apache.org/jira/browse/YARN-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3602: -- Issue Type: Test (was: Bug) > TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails > Intermittently due to IOException from cleanup > -- > > Key: YARN-3602 > URL: https://issues.apache.org/jira/browse/YARN-3602 > Project: Hadoop YARN > Issue Type: Test > Components: test >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: BB2015-05-RFC > Fix For: 2.8.0, 2.7.3 > > Attachments: YARN-3602.000.patch > > > ResourceLocalizationService.testPublicResourceInitializesLocalDir fails > Intermittently due to IOException from cleanup. The stack trace is the > following from test report at > https://builds.apache.org/job/PreCommit-YARN-Build/7729/testReport/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer/TestResourceLocalizationService/testPublicResourceInitializesLocalDir/ > {code} > Error Message > Unable to delete directory > target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/2/filecache. > Stacktrace > java.io.IOException: Unable to delete directory > target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/2/filecache. > at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1541) > at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270) > at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653) > at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535) > at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270) > at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653) > at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.cleanup(TestResourceLocalizationService.java:187) > {code} > It looks like we can safely ignore the IOException in cleanup which is called > after test. > The IOException may be due to the test machine environment because > TestResourceLocalizationService/2/filecache is created by > ResourceLocalizationService#initializeLocalDir. > testPublicResourceInitializesLocalDir created 0/filecache, 1/filecache, > 2/filecache and 3/filecache > {code} > for (int i = 0; i < 4; ++i) { > localDirs.add(lfs.makeQualified(new Path(basedir, i + ""))); > sDirs[i] = localDirs.get(i).toString(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
[ https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254433#comment-15254433 ] Li Lu commented on YARN-4983: - Similar UT failures happened in HADOOP-12563. That patch has now been reverted. I'm launching another Jenkins run for this JIRA. > JVM and UGI metrics disappear after RM is once transitioned to standby mode > --- > > Key: YARN-4983 > URL: https://issues.apache.org/jira/browse/YARN-4983 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch > > > When get transitioned to standby, the RM will shutdown the existing metric > system and relaunch a new one. This will cause the jvm metrics and ugi > metrics to miss in the new metric system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
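A hedged sketch of one way the JVM source could be re-attached after the metrics system is shut down and relaunched, which is the symptom this JIRA describes; whether the attached patches do it exactly this way is not claimed here, and the prefix and process name below are assumptions.
{code}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.source.JvmMetrics;

class ReRegisterJvmMetricsSketch {
  /** Called after the standby-to-active transition relaunches the metrics system. */
  static void reattachJvmSource() {
    MetricsSystem ms = DefaultMetricsSystem.initialize("ResourceManager");
    // The fresh MetricsSystem instance knows nothing about the old sources,
    // so the JVM source has to be registered again or it silently disappears.
    JvmMetrics.create("ResourceManager", null, ms);
  }
}
{code}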
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254414#comment-15254414 ] Naganarasimha G R commented on YARN-3215: - Hi [~wangda], I have corrected the issue in {{YARN-3215.branch-2.8.v2.002.patch}}; *TestRMWebServicesNodes* is already tracked under YARN-4947. Can you please review? > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-3215.branch-2.8.v2.002.patch, > YARN-3215.v1.001.patch, YARN-3215.v2.001.patch, YARN-3215.v2.002.patch, > YARN-3215.v2.003.patch, YARN-3215.v2.branch-2.8.patch > > > In existing CapacityScheduler, when computing headroom of an application, it > will only consider "non-labeled" nodes of this application. > But it is possible the application is asking for labeled resources, so > headroom-by-label (like 5G resource available under node-label=red) is > required to get better resource allocation and avoid deadlocks such as > MAPREDUCE-5928. > This JIRA could involve both API changes (such as adding a > label-to-available-resource map in AllocateResponse) and also internal > changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4599: --- Attachment: yarn-4599-not-so-useful.patch > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support. We should also explicitly > set OOM control so that containers are not killed as soon as they go over > their usage. Today, one could set the swappiness to control this, but > clusters with swap turned off exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254348#comment-15254348 ] Karthik Kambatla commented on YARN-4599: FWIW, just posted the not useful version of the patch. > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support. We should also explicitly > set OOM control so that containers are not killed as soon as they go over > their usage. Today, one could set the swappiness to control this, but > clusters with swap turned off exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254335#comment-15254335 ] Karthik Kambatla commented on YARN-4599: Looked more into this, specifically on how to resume paused tasks. Per [this article|https://lwn.net/Articles/529927] {noformat} This operation is only allowed to the top cgroup of a sub-hierarchy. If OOM-killer is disabled, tasks under cgroup will hang/sleep in memory cgroup's OOM-waitqueue when they request accountable memory. For running them, you have to relax the memory cgroup's OOM status by * enlarge limit or reduce usage. To reduce usage, * kill some tasks. * move some tasks to other group with account migration. * remove some files (on tmpfs?) Then, stopped tasks will work again. At reading, current status of OOM is shown. oom_kill_disable 0 or 1 (if 1, oom-killer is disabled) under_oom0 or 1 (if 1, the memory cgroup is under OOM, tasks may be stopped.) {noformat} Looks like setting OOM control per each task is not particularly useful. We are likely better off setting hard limit for all yarn containers so they don't interfere anything else on the machine. We could disable OOM control on the cgroup corresponding to all yarn containers (not including NM) and if all containers are paused, the NM can decide what tasks to kill. This is particularly useful if we are oversubscribing the node. [~aw], [~vvasudev], [~vinodkv] - what do you think? > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > > YARN-1856 adds memory cgroups enforcing support. We should also explicitly > set OOM control so that containers are not killed as soon as they go over > their usage. Today, one could set the swappiness to control this, but > clusters with swap turned off exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
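For readers unfamiliar with the knob under discussion, here is a small sketch of how the cgroup v1 memory.oom_control file is driven; the /sys/fs/cgroup/memory mount point and the hadoop-yarn hierarchy name are assumptions, not values taken from any patch here. Writing 1 disables the kernel OOM killer for the group, and under_oom can then be polled so the NM can decide what to kill.
{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class YarnCgroupOomSketch {
  // Assumed location of the top-level YARN containers cgroup (cgroup v1).
  static final Path OOM_CONTROL =
      Paths.get("/sys/fs/cgroup/memory/hadoop-yarn/memory.oom_control");

  /** Disable the kernel OOM killer; tasks pause instead of being killed. */
  static void disableOomKiller() throws Exception {
    Files.write(OOM_CONTROL, "1".getBytes("UTF-8"));
  }

  /** True when the group is under OOM and some of its tasks are paused. */
  static boolean isUnderOom() throws Exception {
    List<String> lines = Files.readAllLines(OOM_CONTROL);
    // The file reads back lines such as "oom_kill_disable 1" and "under_oom 0".
    return lines.stream().anyMatch(l -> l.trim().equals("under_oom 1"));
  }
}
{code}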
[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN
[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254330#comment-15254330 ] Allen Wittenauer commented on YARN-4478: Just an FYI, but there are two key problems that folks should be aware of: #1: None of the current hadoop jenkins "build project" jobs actually report all of the unit test failures in a consistent manner due to how maven works. If a dependent module fails, then the parent module is never run. In other words: given two modules A and B, where B requires A, if A's unit tests succeed, then B's unit tests are executed. If A's unit tests fail, then B's unit tests are *never run*. #2 I've already determined that a large chunk of the YARN unit tests CANNOT be run simultaneously due to TCP port usage. (See YARN-4950). This means that if two YARN nightlies are running on the same box at the same time, it's pretty much a 100% certainty that there will be spurious failures. (Yetus guarantees some, but not total, isolation via docker, so precommit should be immune to this particular problem.) That said, I've been working on a Yetus-based replacement for full compiles (YETUS-156). This would at least solve major parts of both these issues. I've been running it in test for Hadoop for a while now: (https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-test ). I've had unit tests turned off, but I just kicked off another run with the unit tests turned on so that you can see what happens. > [Umbrella] : Track all the Test failures in YARN > > > Key: YARN-4478 > URL: https://issues.apache.org/jira/browse/YARN-4478 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Rohith Sharma K S > > Recently many test cases are failing either timed out or new bug fix caused > impact. Many test failures JIRA are raised and are in progress. > This is to track all the test failures JIRA's -- This message was sent by Atlassian JIRA (v6.3.4#6332)
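As a side note on point #2, a common way to dodge fixed-port collisions in unit tests is to let the kernel hand out an ephemeral port; a tiny sketch of that trick follows, not code from YARN-4950 itself.
{code}
import java.io.IOException;
import java.net.ServerSocket;

final class FreePortSketch {
  private FreePortSketch() {
  }

  /** Ask the OS for a currently unused port instead of hard-coding one. */
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) { // 0 = kernel picks the port
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }
}
{code}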
[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN
[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254297#comment-15254297 ] Vinod Kumar Vavilapalli commented on YARN-4478: --- bq. Point of concern is when a QA report test failures , contributors/committers has to search for the test failures JIRA IDs and comment on their respective JIRA may be like "test failures are unrelated to this patch. test failure is tracked by YARN-" This is very paining task when there are multiple module test failures. Instead of remembering all the test failures JIRA, Umbrella JIRA would help to find easily. Actually I've tried adding more and more to this umbrella, but it is going out of hand. I kind of agree with [~kasha], we will always have failing unit tests that need fixing. Let's just use the bug-type and component from now on - those are easy to search for. I'm going to use ticket-type from now on, appreciate others also doing the same. [~rohithsharma], let's use this umbrella for your current initiative and then close it down. > [Umbrella] : Track all the Test failures in YARN > > > Key: YARN-4478 > URL: https://issues.apache.org/jira/browse/YARN-4478 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Rohith Sharma K S > > Recently many test cases are failing either timed out or new bug fix caused > impact. Many test failures JIRA are raised and are in progress. > This is to track all the test failures JIRA's -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
[ https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254291#comment-15254291 ] Hadoop QA commented on YARN-4984: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 16s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 50s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 36m 41s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/ha
[jira] [Commented] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
[ https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254277#comment-15254277 ] Wangda Tan commented on YARN-4984: -- Thanks [~djp] for working on this, We may need to remove following statement as well: {code} if (appDirException != null) { throw appDirException; } {code} > LogAggregationService shouldn't swallow exception in handling createAppDir() > which cause thread leak. > - > > Key: YARN-4984 > URL: https://issues.apache.org/jira/browse/YARN-4984 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.2 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4984.patch > > > Due to YARN-4325, many stale applications still exists in NM state store and > get recovered after NM restart. The app initiation will get failed due to > token invalid, but exception is swallowed and aggregator thread is still > created for invalid app. > Exception is: > {noformat} > 158 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService > (LogAggregationService.java:run(300)) - Failed to setup application log > directory for application_1448060878692_11842 > 159 > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be fo > und in cache > 160 at org.apache.hadoop.ipc.Client.call(Client.java:1427) > 161 at org.apache.hadoop.ipc.Client.call(Client.java:1358) > 162 at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > 163 at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source) > 164 at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771) > 165 at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown > Source) > 166 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 167 at java.lang.reflect.Method.invoke(Method.java:606) > 168 at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252) > 169 at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > 170 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > 171 at > org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116) > 172 at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315) > 173 at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311) > 174 at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > 175 at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311) > 176 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248) > 177 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67) > 178 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > 179 at java.security.AccessController.doPrivileged(Native Method) > 180 at javax.security.auth.Subject.doAs(Subject.java:415) > 181 at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > 182 at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261) > 183 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367) > 184 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > 185 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447) > 186 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
[ https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254238#comment-15254238 ] Daniel Templeton commented on YARN-4983: I meant there are unit tests that are tripping on the problems you're trying to fix. :) > JVM and UGI metrics disappear after RM is once transitioned to standby mode > --- > > Key: YARN-4983 > URL: https://issues.apache.org/jira/browse/YARN-4983 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch > > > When get transitioned to standby, the RM will shutdown the existing metric > system and relaunch a new one. This will cause the jvm metrics and ugi > metrics to miss in the new metric system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
[ https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254222#comment-15254222 ] Li Lu commented on YARN-4983: - Thanks [~templedf]! bq. Some of the unit tests that trip on it are only creating disembodied schedulers. Did you mean the failures from the first patch, or from the second one? I'm trying to understand the second group of failures, where something is complaining at the protobuf level... > JVM and UGI metrics disappear after RM is once transitioned to standby mode > --- > > Key: YARN-4983 > URL: https://issues.apache.org/jira/browse/YARN-4983 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch > > > When get transitioned to standby, the RM will shutdown the existing metric > system and relaunch a new one. This will cause the jvm metrics and ugi > metrics to miss in the new metric system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
[ https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254209#comment-15254209 ] Junping Du commented on YARN-4984: -- Attach a patch to fix this issue. The fix is very straightforward, so no UT is needed. > LogAggregationService shouldn't swallow exception in handling createAppDir() > which cause thread leak. > - > > Key: YARN-4984 > URL: https://issues.apache.org/jira/browse/YARN-4984 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.2 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4984.patch > > > Due to YARN-4325, many stale applications still exists in NM state store and > get recovered after NM restart. The app initiation will get failed due to > token invalid, but exception is swallowed and aggregator thread is still > created for invalid app. > Exception is: > {noformat} > 158 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService > (LogAggregationService.java:run(300)) - Failed to setup application log > directory for application_1448060878692_11842 > 159 > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be fo > und in cache > 160 at org.apache.hadoop.ipc.Client.call(Client.java:1427) > 161 at org.apache.hadoop.ipc.Client.call(Client.java:1358) > 162 at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > 163 at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source) > 164 at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771) > 165 at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown > Source) > 166 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 167 at java.lang.reflect.Method.invoke(Method.java:606) > 168 at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252) > 169 at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > 170 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > 171 at > org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116) > 172 at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315) > 173 at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311) > 174 at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > 175 at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311) > 176 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248) > 177 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67) > 178 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > 179 at java.security.AccessController.doPrivileged(Native Method) > 180 at javax.security.auth.Subject.doAs(Subject.java:415) > 181 at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > 182 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261) > 183 
at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367) > 184 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > 185 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447) > 186 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
[ https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4984: - Attachment: YARN-4984.patch > LogAggregationService shouldn't swallow exception in handling createAppDir() > which cause thread leak. > - > > Key: YARN-4984 > URL: https://issues.apache.org/jira/browse/YARN-4984 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.2 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4984.patch > > > Due to YARN-4325, many stale applications still exists in NM state store and > get recovered after NM restart. The app initiation will get failed due to > token invalid, but exception is swallowed and aggregator thread is still > created for invalid app. > Exception is: > {noformat} > 158 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService > (LogAggregationService.java:run(300)) - Failed to setup application log > directory for application_1448060878692_11842 > 159 > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be fo > und in cache > 160 at org.apache.hadoop.ipc.Client.call(Client.java:1427) > 161 at org.apache.hadoop.ipc.Client.call(Client.java:1358) > 162 at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > 163 at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source) > 164 at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771) > 165 at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown > Source) > 166 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 167 at java.lang.reflect.Method.invoke(Method.java:606) > 168 at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252) > 169 at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > 170 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > 171 at > org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116) > 172 at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315) > 173 at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311) > 174 at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > 175 at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311) > 176 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248) > 177 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67) > 178 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > 179 at java.security.AccessController.doPrivileged(Native Method) > 180 at javax.security.auth.Subject.doAs(Subject.java:415) > 181 at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > 182 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261) > 183 at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367) > 184 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > 185 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447) > 186 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
[ https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254206#comment-15254206 ] Daniel Templeton commented on YARN-4983: This also comes up as an issue with some unit tests. I agree that the root issue is the erroneous assumption that the RM and scheduler won't be started a second time from within the same VM. Ideally we'd fix it below the level of the RM. Some of the unit tests that trip on it are only creating disembodied schedulers. > JVM and UGI metrics disappear after RM is once transitioned to standby mode > --- > > Key: YARN-4983 > URL: https://issues.apache.org/jira/browse/YARN-4983 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch > > > When get transitioned to standby, the RM will shutdown the existing metric > system and relaunch a new one. This will cause the jvm metrics and ugi > metrics to miss in the new metric system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
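As a rough illustration of the failure mode described in the comments above (a sketch under assumptions, not the YARN-4983 patch): when the RM tears down and re-creates its metrics system on a standby transition, sources registered from outside the RM, such as the JVM and UGI metrics, stay attached to the old instance unless they are registered again.
{code}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.source.JvmMetrics;

// Illustrative sketch only: re-registering the JVM source after the metrics
// system is re-created; UGI metrics would need the analogous treatment.
public class MetricsReinitSketch {
  static void recreateMetricsSystem() {
    // roughly what happens on a transition to standby today
    DefaultMetricsSystem.shutdown();
    MetricsSystem ms = DefaultMetricsSystem.initialize("ResourceManager");

    // the step the discussion points at: the new metrics system knows nothing
    // about the JVM source until it is registered again
    JvmMetrics.create("ResourceManager", null, ms);
  }
}
{code}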
[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4325: - Attachment: YARN-4325.patch Put a demo patch first, a completed patch with tests will come later. > Purge app state from NM state-store should cover more LOG_HANDLING cases > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: ApplicationImpl.PNG, YARN-4325.patch > > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. > After investigating, there are three issues cause app state leak in NM > state-store: > 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in > NMStateStore. > 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit > aggregator's doAppLogAggregation() exception case. > 3. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has > transition to remove app in NM state store. Application in other status - > like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to > remove this app from NM state store even after app get finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254095#comment-15254095 ] Junping Du edited comment on YARN-4325 at 4/22/16 3:43 PM: --- We hit the same issue in a cluster recently again. After checking log, related code and state machine graph for ApplicationImpl (attached). There are three issues cause app state leak in NM state-store 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in NMStateStore. 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit aggregator's doAppLogAggregation() exception case. 3. Only Application in *FINISHED* status receiving APPLICATION_LOG_FINISHED has transition to remove app in NM state store. Application in other status - like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to remove this app from NM state store even after app get finished. Will put up a patch soon to fix this issue. was (Author: djp): We hit the same issue in a cluster recently again. After checking log, related code and state machine graph for ApplicationImpl (attached). There are three issues cause app state leak in NM state-store 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in NMStateStore. 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit aggregator's doAppLogAggregation() exception case. 2. Only Application in *FINISHED* status receiving APPLICATION_LOG_FINISHED has transition to remove app in NM state store. Application in other status - like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to remove this app from NM state store even after app get finished. Will put up a patch soon to fix this issue. > Purge app state from NM state-store should cover more LOG_HANDLING cases > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: ApplicationImpl.PNG > > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. > After investigating, there are three issues cause app state leak in NM > state-store: > 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in > NMStateStore. > 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit > aggregator's doAppLogAggregation() exception case. > 3. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has > transition to remove app in NM state store. Application in other status - > like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to > remove this app from NM state store even after app get finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
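A minimal fragment showing the idea behind issue 1 above; it assumes it lives inside the NM's ApplicationImpl, where context and appId are visible, and it is not the actual YARN-4325 patch.
{code}
// Hypothetical fragment: when APPLICATION_LOG_HANDLING_FAILED is handled, the
// app should also be purged from the NM state store, so a stale app is not
// recovered again on the next NM restart.
void onLogHandlingFailed() {
  try {
    context.getNMStateStore().removeApplication(appId);
  } catch (IOException e) {
    LOG.warn("Failed to remove " + appId + " from the NM state store", e);
  }
}
{code}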
[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4325: - Description: >From a long running cluster, we found tens of thousands of stale apps still be >recovered in NM restart recovery. After investigating, there are three issues cause app state leak in NM state-store: 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in NMStateStore. 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit aggregator's doAppLogAggregation() exception case. 2. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has transition to remove app in NM state store. Application in other status - like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to remove this app from NM state store even after app get finished. was:From a long running cluster, we found tens of thousands of stale apps still be recovered in NM restart recovery. The reason is some wrong configuration setting to log aggregation so the end of log aggregation events are not received so stale apps are not purged properly. We should make sure the removal of app state to be independent of log aggregation life cycle. > Purge app state from NM state-store should cover more LOG_HANDLING cases > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: ApplicationImpl.PNG > > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. > After investigating, there are three issues cause app state leak in NM > state-store: > 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in > NMStateStore. > 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit > aggregator's doAppLogAggregation() exception case. > 2. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has > transition to remove app in NM state store. Application in other status - > like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to > remove this app from NM state store even after app get finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4325: - Description: >From a long running cluster, we found tens of thousands of stale apps still be >recovered in NM restart recovery. After investigating, there are three issues cause app state leak in NM state-store: 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in NMStateStore. 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit aggregator's doAppLogAggregation() exception case. 3. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has transition to remove app in NM state store. Application in other status - like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to remove this app from NM state store even after app get finished. was: >From a long running cluster, we found tens of thousands of stale apps still be >recovered in NM restart recovery. After investigating, there are three issues cause app state leak in NM state-store: 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in NMStateStore. 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit aggregator's doAppLogAggregation() exception case. 2. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has transition to remove app in NM state store. Application in other status - like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to remove this app from NM state store even after app get finished. > Purge app state from NM state-store should cover more LOG_HANDLING cases > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: ApplicationImpl.PNG > > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. > After investigating, there are three issues cause app state leak in NM > state-store: > 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in > NMStateStore. > 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit > aggregator's doAppLogAggregation() exception case. > 3. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has > transition to remove app in NM state store. Application in other status - > like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to > remove this app from NM state store even after app get finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4325: - Attachment: ApplicationImpl.PNG > Purge app state from NM state-store should cover more LOG_HANDLING cases > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: ApplicationImpl.PNG > > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. The reason is some wrong configuration > setting to log aggregation so the end of log aggregation events are not > received so stale apps are not purged properly. We should make sure the > removal of app state to be independent of log aggregation life cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4325: - Attachment: (was: ApplicationImpl) > Purge app state from NM state-store should cover more LOG_HANDLING cases > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: ApplicationImpl.PNG > > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. The reason is some wrong configuration > setting to log aggregation so the end of log aggregation events are not > received so stale apps are not purged properly. We should make sure the > removal of app state to be independent of log aggregation life cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4325: - Attachment: ApplicationImpl > Purge app state from NM state-store should cover more LOG_HANDLING cases > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: ApplicationImpl > > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. The reason is some wrong configuration > setting to log aggregation so the end of log aggregation events are not > received so stale apps are not purged properly. We should make sure the > removal of app state to be independent of log aggregation life cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4325: - Attachment: (was: ApplicationImpl.gv) > Purge app state from NM state-store should cover more LOG_HANDLING cases > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: ApplicationImpl > > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. The reason is some wrong configuration > setting to log aggregation so the end of log aggregation events are not > received so stale apps are not purged properly. We should make sure the > removal of app state to be independent of log aggregation life cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4325: - Attachment: ApplicationImpl.gv > Purge app state from NM state-store should cover more LOG_HANDLING cases > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: ApplicationImpl > > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. The reason is some wrong configuration > setting to log aggregation so the end of log aggregation events are not > received so stale apps are not purged properly. We should make sure the > removal of app state to be independent of log aggregation life cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4325: - Summary: Purge app state from NM state-store should cover more LOG_HANDLING cases (was: purge app state from NM state-store should be independent of log aggregation) > Purge app state from NM state-store should cover more LOG_HANDLING cases > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. The reason is some wrong configuration > setting to log aggregation so the end of log aggregation events are not > received so stale apps are not purged properly. We should make sure the > removal of app state to be independent of log aggregation life cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4325) purge app state from NM state-store should be independent of log aggregation
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254095#comment-15254095 ] Junping Du commented on YARN-4325: -- We hit the same issue in a cluster recently again. After checking log, related code and state machine graph for ApplicationImpl (attached). There are three issues cause app state leak in NM state-store 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in NMStateStore. 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit aggregator's doAppLogAggregation() exception case. 2. Only Application in *FINISHED* status receiving APPLICATION_LOG_FINISHED has transition to remove app in NM state store. Application in other status - like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to remove this app from NM state store even after app get finished. Will put up a patch soon to fix this issue. > purge app state from NM state-store should be independent of log aggregation > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. The reason is some wrong configuration > setting to log aggregation so the end of log aggregation events are not > received so stale apps are not purged properly. We should make sure the > removal of app state to be independent of log aggregation life cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4962) support filling up containers on node one by one
[ https://issues.apache.org/jira/browse/YARN-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253975#comment-15253975 ] Daniel Templeton commented on YARN-4962: Got it. I misunderstood your initial problem statement. You want jobs to be scheduled to fill a node completely before scheduling to the next node to avoid having the workload spread like peanut butter all over the cluster, making it hard to schedule a job that needs a full node. In Grid Engine, the scheduling formula is configurable. The scheduler will look for the node for which the scheduling formula has the highest value. The default scheduling formula is essentially "amount of free space." To get the fill-up behavior you want, you'd set the scheduling formula to "number of containers." Which is essentially your original suggestion. > support filling up containers on node one by one > - > > Key: YARN-4962 > URL: https://issues.apache.org/jira/browse/YARN-4962 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: sandflee > > we had a gpu cluster, jobs with bigger resource request couldn't be satisfied > for node is running the jobs with smaller resource request. we didn't open > reserve system because gpu jobs may run days or weeks. we expect scheduler > allocate containers to fill the node , then there will be resource to run > jobs with big resource request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
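A standalone sketch of the two ranking policies described in the comment above (it is neither YARN nor Grid Engine code): ranking nodes by free space spreads the workload out, while ranking by the number of running containers fills nodes one by one.
{code}
import java.util.Comparator;
import java.util.List;

class Node {
  int freeMemoryMb;        // stand-in for "amount of free space"
  int runningContainers;   // stand-in for "number of containers"
}

class NodeRanking {
  // default-style policy: prefer the node with the most free space (spread)
  static final Comparator<Node> SPREAD =
      Comparator.comparingInt((Node n) -> n.freeMemoryMb);

  // fill-up policy: prefer the node already running the most containers (pack)
  static final Comparator<Node> FILL_UP =
      Comparator.comparingInt((Node n) -> n.runningContainers);

  // pick the node with the highest score under the chosen policy
  static Node pick(List<Node> nodes, Comparator<Node> higherIsBetter) {
    return nodes.stream().max(higherIsBetter).orElse(null);
  }
}
{code}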
[jira] [Commented] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable
[ https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253902#comment-15253902 ] Naganarasimha G R commented on YARN-4963: - Thanks for the clarification [~wangda] & [~nroberts]. Yes, point 2 addresses the same issue; my mistake, I missed reading it. I also agree that the focus of this jira should be specific to the system-level OFF_SWITCH configuration. bq. so I think when we do the application-level support the default would need to be either unlimited or some high value, otherwise we force all applications to set this limit to something other than 1 to get decent OFF_SWITCH scheduling behavior. Once we have the system-level OFF_SWITCH configuration, do we still require an app-level default? IIUC, by default we would use the system-level OFF_SWITCH configuration unless it is explicitly overridden by the app (the implementation can be discussed further in that jira). bq. Sure, my application scheduled very quickly but my locality was terrible so I caused a lot of unnecessary cross-switch traffic. So I think we'll need some system-minimums that will prevent this type of abuse. This point is debatable: even though I agree with your point about controlling cross-switch traffic, the app is still operating within its capacity limits, so would it be right to restrict it further? bq. If application A meets its OFF-SWITCH-per-node limit, do we offer the node to other applications in the same queue? Are there any limitations if we offer the node to other applications in the same queue? It should be fine, right? > capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat > configurable > > > Key: YARN-4963 > URL: https://issues.apache.org/jira/browse/YARN-4963 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 3.0.0, 2.7.2 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-4963.001.patch > > > Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment > per heartbeat. With more and more non MapReduce workloads coming along, the > degree of locality is declining, causing scheduling to be significantly > slower. It's still important to limit the number of OFF_SWITCH assignments to > avoid densely packing OFF_SWITCH containers onto nodes. > Proposal is to add a simple config that makes the number of OFF_SWITCH > assignments configurable. > Will upload candidate patch shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
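To make the mechanism under discussion concrete, here is a small self-contained sketch of a per-heartbeat OFF_SWITCH limit (the class and its integration into the scheduler are assumptions, not the YARN-4963 patch): with a limit of 1 it reproduces today's behaviour, and raising the limit allows more OFF_SWITCH assignments per node heartbeat while still keeping them from packing densely onto one node.
{code}
// Illustrative sketch: cap the number of OFF_SWITCH assignments handed out on
// a single node heartbeat; the cap would come from a scheduler config value.
class OffSwitchLimiter {
  private final int maxOffSwitchPerHeartbeat;
  private int assignedThisHeartbeat;

  OffSwitchLimiter(int maxOffSwitchPerHeartbeat) {
    this.maxOffSwitchPerHeartbeat = maxOffSwitchPerHeartbeat;
  }

  // called at the start of each node heartbeat
  void reset() {
    assignedThisHeartbeat = 0;
  }

  // called before each prospective OFF_SWITCH assignment on this heartbeat
  boolean allowOffSwitchAssignment() {
    if (assignedThisHeartbeat >= maxOffSwitchPerHeartbeat) {
      return false;   // stop OFF_SWITCH containers from piling onto this node
    }
    assignedThisHeartbeat++;
    return true;
  }
}
{code}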
[jira] [Commented] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253884#comment-15253884 ] Eric Payne commented on YARN-4390: -- {quote} And since it uses R/W lock, write lock will be acquired only if node add / move or node resource update. So in most cases, nobody acquires write lock. I agree to cache node list inside PCPP if we do see performance issues. {quote} [~leftnoteasy], yes, that is a very good point. I was not thinking about {{ClusterNodeTracker#getNodes}} using the read lock, which, of course, can have multiple readers at any time. After thinking more about it, I don't think this will cause much of a strain on the RM. I still want to experiment with the patch a little more. > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, > YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, > YARN-4390.3.patch, YARN-4390.4.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
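A standalone illustration of the read/write-lock reasoning in the comment above (not the actual ClusterNodeTracker code): concurrent readers such as getNodes() never block each other, and the write lock is only taken on node add/remove or resource updates.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class NodeTrackerSketch<N> {
  private final Map<String, N> nodes = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // many callers (e.g. a preemption policy) can read concurrently
  List<N> getNodes() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(nodes.values());   // copy taken under the read lock
    } finally {
      lock.readLock().unlock();
    }
  }

  // write lock only for the rare mutations: node add/remove, resource update
  void addNode(String id, N node) {
    lock.writeLock().lock();
    try {
      nodes.put(id, node);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}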
[jira] [Commented] (YARN-4048) Linux kernel panic under strict CPU limits
[ https://issues.apache.org/jira/browse/YARN-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253789#comment-15253789 ] Naganarasimha G R commented on YARN-4048: - Hi [~scootli] It was a private code modification based on 2.7.0 and is not available outside. Hence no documentation of it either. > Linux kernel panic under strict CPU limits > -- > > Key: YARN-4048 > URL: https://issues.apache.org/jira/browse/YARN-4048 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chengbing Liu >Priority: Critical > Attachments: panic.png > > > With YARN-2440 and YARN-2531, we have seen some kernel panics happening under > heavy pressure. Even with YARN-2809, it still panics. > We are using CentOS 6.5, hadoop 2.5.0-cdh5.2.0 with the above patches. I > guess the latest version also has the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253681#comment-15253681 ] Rohith Sharma K S commented on YARN-4989: - Oops, the comment has come through twice. There was an issue connecting to JIRA, so I thought the earlier comment would not be displayed! > TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails > intermittently > --- > > Key: YARN-4989 > URL: https://issues.apache.org/jira/browse/YARN-4989 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Ajith S > > Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails > randomly. > {noformat} > java.lang.AssertionError: expected:<> but > was:<> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S reassigned YARN-4989: --- Assignee: Ajith S Assigning to Ajith since he asked me offline. > TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails > intermittently > --- > > Key: YARN-4989 > URL: https://issues.apache.org/jira/browse/YARN-4989 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Ajith S > > Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails > randomly. > {noformat} > java.lang.AssertionError: expected:<> but > was:<> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253675#comment-15253675 ] Rohith Sharma K S commented on YARN-4989: - In the test {{TestWorkPreservingRMRestart#testCapacitySchedulerRecovery}}, after the RM is recovered and the NMs are registered, the test waits for the apps to recover their containers. In the test code, there are 3 apps running before the RM restart, but after the RM restart the {{waitForNumContainersToRecover}} method is called for only 2 apps. {code} // Wait for RM to settle down on recovering containers; waitForNumContainersToRecover(2, rm2, am1_1.getApplicationAttemptId()); waitForNumContainersToRecover(2, rm2, am1_2.getApplicationAttemptId()); waitForNumContainersToRecover(2, rm2, am1_2.getApplicationAttemptId()); {code} In the above code, the third {{waitForNumContainersToRecover}} call should be for the third app instead of duplicating the 2nd app. > TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails > intermittently > --- > > Key: YARN-4989 > URL: https://issues.apache.org/jira/browse/YARN-4989 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S > > Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails > randomly. > {noformat} > java.lang.AssertionError: expected:<> but > was:<> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253667#comment-15253667 ] Rohith Sharma K S commented on YARN-4989: - In the test {{TestWorkPreservingRMRestart#testCapacitySchedulerRecovery}}, after the RM is restarted, the method {{waitForNumContainersToRecover}} is called for the submitted apps. There are 3 apps submitted, but the test only waits for 2 of them, i.e. am1_1 and am1_2. There is another AM, *am2*, which also needs to be waited on for container recovery. The code to wait is there, but it waits on am1_2 again. {code} // Wait for RM to settle down on recovering containers; waitForNumContainersToRecover(2, rm2, am1_1.getApplicationAttemptId()); waitForNumContainersToRecover(2, rm2, am1_2.getApplicationAttemptId()); waitForNumContainersToRecover(2, rm2, am1_2.getApplicationAttemptId()); {code} In the third waitForNumContainersToRecover call, using the variable am2 instead of am1_2 should fix this randomness. > TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails > intermittently > --- > > Key: YARN-4989 > URL: https://issues.apache.org/jira/browse/YARN-4989 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S > > Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails > randomly. > {noformat} > java.lang.AssertionError: expected:<> but > was:<> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
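Based on the two comments above, the presumed correction (not a committed patch) is a one-line change so that the third wait targets am2:
{code}
// Wait for RM to settle down on recovering containers;
waitForNumContainersToRecover(2, rm2, am1_1.getApplicationAttemptId());
waitForNumContainersToRecover(2, rm2, am1_2.getApplicationAttemptId());
waitForNumContainersToRecover(2, rm2, am2.getApplicationAttemptId());
{code}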
[jira] [Created] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently
Rohith Sharma K S created YARN-4989: --- Summary: TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently Key: YARN-4989 URL: https://issues.apache.org/jira/browse/YARN-4989 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Rohith Sharma K S Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails randomly. {noformat} java.lang.AssertionError: expected:<> but was:<> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289) at org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4048) Linux kernel panic under strict CPU limits
[ https://issues.apache.org/jira/browse/YARN-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253619#comment-15253619 ] lihuaqing commented on YARN-4048: - Hi Naganarasimha G R: I want to know how to configure the cgroup cpuset with Hadoop 2.7.1. I can't find it in the Hadoop documentation. Could you please point me to it? > Linux kernel panic under strict CPU limits > -- > > Key: YARN-4048 > URL: https://issues.apache.org/jira/browse/YARN-4048 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chengbing Liu >Priority: Critical > Attachments: panic.png > > > With YARN-2440 and YARN-2531, we have seen some kernel panics happening under > heavy pressure. Even with YARN-2809, it still panics. > We are using CentOS 6.5, hadoop 2.5.0-cdh5.2.0 with the above patches. I > guess the latest version also has the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4957) Add getNewReservation in ApplicationClientProtocol
[ https://issues.apache.org/jira/browse/YARN-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253576#comment-15253576 ] Hadoop QA commented on YARN-4957: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 9 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 57s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 57s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 33s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped branch modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 4s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 2s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 57s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 41s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s {color} | {color:green} root: patch generated 0 new + 
271 unchanged - 3 fixed = 271 total (was 274) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patch modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 50s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 38s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 42s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 44s {color} | {color:gr
[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
[ https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253535#comment-15253535 ] Hadoop QA commented on YARN-4983: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 38s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 37s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 36s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 37s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 37s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 7s {color} | {color:red} root: patch generated 1 new + 173 unchanged - 1 fixed = 174 total (was 174) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 8s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 45s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 3s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 13s {color} | {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 37s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s {color} | {colo
[jira] [Commented] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253522#comment-15253522 ] Hadoop QA commented on YARN-4390: -
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 12 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 29 new + 506 unchanged - 16 fixed = 535 total (was 522) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 14s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 50s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 165m 26s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
| | hadoop.yarn.server.resourcemanager.TestAppManager |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication |
| | hadoop.yarn.ser
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253472#comment-15253472 ] Hadoop QA commented on YARN-4844: -
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 53 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 1s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 4m 25s {color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_77 with JDK v1.8.0_77 generated 1 new + 2 unchanged - 1 fixed = 3 total (was 3) {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 48s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 61 new + 1408 unchanged - 47 fixed = 1469 total (was 1455) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 48s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 54s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK
[jira] [Created] (YARN-4988) Limit filter in ApplicationBaseProtocol#getApplications should return latest applications
Rohith Sharma K S created YARN-4988: ---
Summary: Limit filter in ApplicationBaseProtocol#getApplications should return latest applications
Key: YARN-4988
URL: https://issues.apache.org/jira/browse/YARN-4988
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Whenever the limit filter is used to fetch application reports via ApplicationBaseProtocol#getApplications, the applications returned are not the latest ones. The RM keeps applications in a map whose iteration order depends on the application id's hashcode, so the reports that come back are effectively arbitrary. For example, with 10 applications app-1 through app-10 and a limit of 5, one would expect app-6 through app-10 to be returned; instead, the first 5 applications encountered in the map are returned, i.e. an arbitrary 5. The limit filter should return only the latest applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
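A minimal sketch of the behavior this issue asks for, assuming the fix simply orders reports by start time before applying the limit. {{ApplicationReport#getStartTime}} is an existing accessor in the YARN API, but the class name, method name, and the idea of sorting by start time here are illustrative assumptions, not the actual YARN-4988 patch.
{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.hadoop.yarn.api.records.ApplicationReport;

// Illustrative sketch only (not the actual YARN-4988 fix): select the
// "latest" applications by start time instead of relying on the hash-based
// iteration order of the RM's application map.
public final class LatestAppsSketch {

  private LatestAppsSketch() {
  }

  // Returns at most 'limit' reports, newest first by start time.
  public static List<ApplicationReport> latest(List<ApplicationReport> reports,
      long limit) {
    return reports.stream()
        // newest applications first, so the limit keeps the most recent ones
        .sorted(Comparator.comparingLong(ApplicationReport::getStartTime)
            .reversed())
        .limit(limit)
        .collect(Collectors.toList());
  }
}
{code}
Sorting newest-first before truncating means the limit keeps the most recently started applications rather than whichever entries the hash-ordered map happens to yield first.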